Carlos S. Kubrusly
The Elements of Operator Theory Second Edition
Carlos S. Kubrusly Electrical Engineering Department Catholic University of Rio de Janeiro R. Marques de S. Vicente 225 22453-900, Rio de Janeiro, RJ, Brazil
[email protected]
ISBN 978-0-8176-4997-5 e-ISBN 978-0-8176-4998-2 DOI 10.1007/978-0-8176-4998-2 Springer New York Dordrecht Heidelberg London Library of Congress Control Number: 2011922537 Mathematics Subject Classification (2010): 47-01, 47A-xx, 47B-xx, 47C-xx, 47D-xx, 47L-xx © Springer Science+Business Media, LLC 2011 All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+ Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed on acid-free paper www.birkhauser-science.com
To the memory of my father
The truth, he thought, has never been of any real value to any human being — it is a symbol for mathematicians and philosophers to pursue. In human relations kindness and lies are worth a thousand truths. He involved himself in what he always knew was a vain struggle to retain the lies. Graham Greene
Preface to the Second Edition
This is a revised, corrected, enlarged, updated, and thoroughly rewritten version of Elements of Operator Theory (Birkhäuser, Boston, 2001) for a second edition. Although a considerable amount of new material has been added, the book was not altered in any significant way: the original focus and organization were preserved. In particular, the numbering system of the former edition (concerning chapters, sections, definitions, propositions, lemmas, theorems, corollaries, examples, and problems) has been kept. New material was either embedded in the text without changing those numbers (to keep pace with the previous edition for reference purposes, so that citations made to the previous edition still hold for the new edition) or included at the end of each chapter with subsequent numbering. All problems and references of the first edition have also been kept, and 33 new problems and 24 new references (22 books and 2 papers) were added to the present edition. The logical dependence of the various sections (and chapters) is roughly linear and reflects approximately the minimum amount of material needed to proceed further. A few parts might be compressed or even skipped in a first reading. Chapter 1 may be taken for self-study (and an important one at that), and a formal course of lectures might begin with Chapter 2. Sections 3.8, 3.9, and 4.7 may be postponed to a second reading, as may Section 6.8 if the reader has yet to make a first contact with measure theory. The first edition was written about ten years ago. During this period an extensive Errata was posted on the Web. All corrections listed in it were, of course, also incorporated in the present edition. I thank Torrey Adams, Patricia T. Bandeira, Renato A. A. da Costa, Moacyr V. Dutra, Jorge S. Garcia, Jessica Q. Kubrusly, Nhan Levan, José Luis C. Lyra Jr, Adrian H. Pizzinga, Regina Posternak, André L. Pulcherio, James M. Snyder, Guilherme P.
Temporão, Fernando Torres-Torija, Augusto C. Gadelha Vieira, and João Zanni, who helped in compiling that Errata. Rio de Janeiro, November 2010
Carlos S. Kubrusly
Preface
“Elements” in the title of this book has its standard meaning, namely, basic principles and elementary theory. The main focus is operator theory, and the topics range from sets to the spectral theorem. Chapter 1 (Set-Theoretic Structures) introduces the reader to ordering, lattices, and cardinality. Linear spaces are presented in Chapter 2 (Algebraic Structures), and metric (and topological) spaces are studied in Chapter 3 (Topological Structures). The purpose of Chapter 4 (Banach Spaces) is to put algebra and topology to work together. Continuity plays a central role in the theory of topological spaces, and linear transformations play a central role in the theory of linear spaces. When algebraic and topological structures are compatibly laid on the same underlying set, leading to the notion of topological vector spaces, we may consider the concept of continuous linear transformations. By an operator we mean a continuous linear transformation of a normed space into itself. Chapter 5 (Hilbert Spaces) is central. There a geometric structure is properly added to the algebraic and topological structures. The spectral theorem is a cornerstone in the theory of operators on Hilbert spaces. It gives a full statement on the nature and structure of normal operators, and is considered in Chapter 6 (The Spectral Theorem). The book is addressed to graduate students, whether in mathematics or in one of the sciences, and also to working mathematicians exploring operator theory and scientists willing to apply operator theory to their own subject. In the former case it actually is a first course. In the latter case it may serve as a basic reference for the so-called elementary theory of a single operator. Its primary intention is to introduce operator theory to a new generation of students and provide the necessary background for it.
Technically, the prerequisite for this book is some mathematical maturity that a first-year graduate student in mathematics, engineering, or in one of the formal sciences is supposed to have already acquired. The book is largely self-contained. Of course,
a formal introduction to analysis will be helpful, as well as an introductory course on functions of a complex variable. Measure and integration are not required up to the very last section of the last chapter. Each section of each chapter has a short and concise (sometimes a compound) title. They were selected in such a way that, when put together in the contents, they give a brief outline of the book to the right audience. The focus of this book is on concepts and ideas as an alternative to the computational approach. The proofs avoid computation whenever possible or convenient. Instead, I try to unfold the structural properties behind the statements of theorems, stressing mathematical ideas rather than long calculations. Tedious and ugly (all right, “ugly” is subjective) calculations were avoided when a more conceptual way to explain the stream of ideas was possible. Clearly, this is not new. In any event, every single proof in this book was specially tailored to meet this requirement, but they (at least the majority of them) are standard proofs, perhaps with a touch of what may reflect some of the author’s minor idiosyncrasies. In writing this book I kept my mind focused on the reader. Sometimes I am talking to my students and sometimes to my colleagues (they surely will identify in each case to whom I am talking). For my students, the objective is to teach mathematics (ideas, structures, and problems). There are 300 problems throughout [the first edition of] the book, many of them with multiple parts. These problems, at the end of each chapter, comprise complements and extensions of the theory, further examples and counterexamples, or auxiliary results that may be useful in the sequel. They are an integral part of the main text, which makes them different from traditional classroom exercises. Many of these problems are accompanied by hints, which may be a single word or a sketch, sometimes long, of a proof. 
The idea behind providing these long and detailed hints is that just talking to students is not enough. One has to motivate them too. In my view, motivation (in this context) is to reveal the beauty of pure mathematics, and to challenge students with a real chance to reconstruct a proof for a theorem that is “new” to them. Such a real chance can be offered by a suitable, sometimes rather detailed, hint. At the end of each chapter, just before the problems, the reader will find a list of suggested readings that contains only books. Some of them had a strong influence in preparing this book, and many of them are suggested as a second or third reading. The reference section comprises a list of all those books and just a few research papers (82 books and 11 papers — for the first edition), all of them quoted in the text. Research papers are only mentioned to complement occasional historical remarks so that the few articles cited there are, in fact, classical breakthroughs. For a glance at current research in operator theory the reader is referred to recent research monographs suggested in Chapters 5 and 6.
I started writing this book after lecturing on its subject at Catholic University of Rio de Janeiro for over 20 years. In general, the material is covered in two one-semester beginning graduate courses, where the audience comprises mathematics, engineering, economics, and physics students. Quite often senior undergraduate students joined the courses. The dividing line between these two one-semester courses depends a bit on the pace of lectures but is usually somewhere at the beginning of Chapter 5. Questions asked by generations of students and colleagues have been collected. When the collection was big enough, some former students, as well as current students, insisted upon a new book but urged that it should not be a mere collection of lecture notes and exercises bound together. I hope not to disappoint them too much. At this point, where a preface is coming to an end, one has the duty and pleasure to acknowledge the participation of those people who somehow effectively contributed in connection with writing the book. Certainly, the students in those courses were a big help and a source of motivation. Some friends among students and colleagues have collaborated by discussing the subject of this book for a long time on many occasions. They are: Gilberto O. Corrêa, Oswaldo L. V. Costa, Giselle M. S. Ferreira, Marcelo D. Fragoso, Ricardo S. Kubrusly, Abilio P. Lucena, Helios Malebranche, Carlos E. Pedreira, Denise O. Pinto, Marcos A. da Silveira, and Paulo César M. Vieira. Special thanks are due to my friend and colleague Augusto C. Gadelha Vieira, who read part of the manuscript and made many valuable suggestions. I am also grateful to Ruth F. Curtain, who, back in the early 1970s, introduced me to functional analysis. I wish to thank Catholic University of Rio de Janeiro for providing the release time that made this project possible.
Let me also thank the staff of Birkhäuser Boston and Elizabeth Loew of TEXniques for their ever-efficient and friendly partnership. Finally, it is just fair to mention that this project was supported in part by CNPq (Brazilian National Research Council) and FAPERJ (Rio de Janeiro State Research Council). Rio de Janeiro, November 2000
Carlos S. Kubrusly
Contents

Preface to the Second Edition
Preface

1 Set-Theoretic Structures
   1.1 Background
   1.2 Sets and Relations
   1.3 Functions
   1.4 Equivalence Relations
   1.5 Ordering
   1.6 Lattices
   1.7 Indexing
   1.8 Cardinality
   1.9 Remarks
   Problems

2 Algebraic Structures
   2.1 Linear Spaces
   2.2 Linear Manifolds
   2.3 Linear Independence
   2.4 Hamel Basis
   2.5 Linear Transformations
   2.6 Isomorphisms
   2.7 Isomorphic Equivalence
   2.8 Direct Sum
   2.9 Projections
   Problems

3 Topological Structures
   3.1 Metric Spaces
   3.2 Convergence and Continuity
   3.3 Open Sets and Topology
   3.4 Equivalent Metrics and Homeomorphisms
   3.5 Closed Sets and Closure
   3.6 Dense Sets and Separable Spaces
   3.7 Complete Spaces
   3.8 Continuous Extension and Completion
   3.9 The Baire Category Theorem
   3.10 Compact Sets
   3.11 Sequential Compactness
   Problems

4 Banach Spaces
   4.1 Normed Spaces
   4.2 Examples
   4.3 Subspaces and Quotient Spaces
   4.4 Bounded Linear Transformations
   4.5 The Open Mapping Theorem and Continuous Inverse
   4.6 Equivalence and Finite-Dimensional Spaces
   4.7 Continuous Linear Extension and Completion
   4.8 The Banach–Steinhaus Theorem and Operator Convergence
   4.9 Compact Operators
   4.10 The Hahn–Banach Theorem and Dual Spaces
   Problems

5 Hilbert Spaces
   5.1 Inner Product Spaces
   5.2 Examples
   5.3 Orthogonality
   5.4 Orthogonal Complement
   5.5 Orthogonal Structure
   5.6 Unitary Equivalence
   5.7 Summability
   5.8 Orthonormal Basis
   5.9 The Fourier Series Theorem
   5.10 Orthogonal Projection
   5.11 The Riesz Representation Theorem and Weak Convergence
   5.12 The Adjoint Operator
   5.13 Self-Adjoint Operators
   5.14 Square Root and Polar Decomposition
   Problems

6 The Spectral Theorem
   6.1 Normal Operators
   6.2 The Spectrum of an Operator
   6.3 Spectral Radius
   6.4 Numerical Radius
   6.5 Examples of Spectra
   6.6 The Spectrum of a Compact Operator
   6.7 The Compact Normal Case
   6.8 A Glimpse at the General Case
   Problems

References

Index
1 Set-Theoretic Structures
The purpose of this chapter is to present a brief review of some basic set-theoretic concepts that will be needed in the sequel. By basic concepts we mean standard notation and terminology, and a few essential results that will be required in later chapters. We assume the reader is familiar with the notion of set and elements (or members, or points) of a set, as well as with the basic set operations. It is convenient to reserve certain symbols for certain sets, especially for the basic number systems. The set of all nonnegative integers will be denoted by N₀, the set of all positive integers (i.e., the set of all natural numbers) by N, and the set of all integers by Z. The set of all rational numbers will be denoted by Q, the set of all real numbers (or the real line) by R, and the set of all complex numbers by C.
1.1 Background

We shall also assume that the reader is familiar with the basic rules of elementary (classical) logic, but acquaintance with formal logic is not necessary. The foundations of mathematics will not be reviewed in this book. However, before starting our brief review of set-theoretic concepts, we shall introduce some preliminary notation, terminology, and logical principles as a background for our discourse. If a predicate P( ) is meaningful for a subject x, then P(x) (or simply P) will denote a proposition. The terms statement and assertion will be used as synonyms for proposition. A statement on statements is sometimes called a formula (or a secondary proposition). Statements may be true or false (not true). A tautology is a formula that is true regardless of the truth of the statements in it. A contradiction is a formula that is false regardless of the truth of the statements in it. The symbol ⇒ denotes implies, and the formula P ⇒ Q (whose logical definition is “either P is false or Q is true”) means “the statement P implies the statement Q”. That is, “if P is true, then Q is true”, or “P is a sufficient condition for Q”. We shall also use the symbol ⇏ for the denial of ⇒, so that ⇏ denotes does not imply and the formula P ⇏ Q
means “the statement P does not imply the statement Q”. Accordingly, let ¬P stand for the denial of P (read: not P). If P is a statement, then ¬P is its contradictory. Let us first recall one of the basic rules of deduction called modus ponens: “if a statement P is true and if P implies Q, then the statement Q is true” — “anything implied by a true statement is true”. Symbolically, {P true and P ⇒ Q} ⇒ {Q true}. A direct proof is essentially a chain of modus ponens. For instance, if P is true, then the string of implications P ⇒ Q ⇒ R ensures that R is true. Indeed, if we can establish that P holds, and that P implies Q, then (modus ponens) Q holds. Moreover, if we can also establish that Q implies R, then (modus ponens again) R holds. However, modus ponens alone is not enough to ensure that such reasoning may be extended to an arbitrary (endless) string of implications. In certain cases the Principle of Mathematical Induction provides an alternative. Let N be the set of all natural numbers. A set S of natural numbers is called inductive if n + 1 is an element of S whenever n is. The Principle of Mathematical Induction states that “if 1 is an element of an inductive set S, then S = N”. This leads to a second scheme of proof, called proof by induction. For instance, for each natural number n let Pₙ be a proposition. If P₁ holds true and if Pₙ ⇒ Pₙ₊₁ for each n, then Pₙ holds true for every natural number n. The scheme of proof by induction also works with N replaced by N₀. There is nothing magical about the number 1 as far as a proof by induction is concerned. All that is needed is a “beginning” and the notion of “induction”. Example: let i be an arbitrary integer and let Zᵢ be the set made up of all integers greater than or equal to i. For each integer k in Zᵢ let Pₖ be a proposition. If Pᵢ holds true and if Pₖ ⇒ Pₖ₊₁ for each k, then Pₖ holds true for every integer k in Zᵢ (particular cases: Z₀ = N₀ and Z₁ = N).
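The base-case-plus-inductive-step pattern can be illustrated mechanically. The following is a Python sketch (not from the book): a program can only check the inductive step for finitely many n, whereas a genuine proof by induction establishes Pₙ ⇒ Pₙ₊₁ for every n at once. Here Pₙ is the familiar closed form for the sum 1 + 2 + ··· + n.

```python
# Illustrative sketch (not from the book): the shape of a proof by
# induction, checked over a finite range. P(n) asserts that
# 1 + 2 + ... + n = n(n+1)/2.

def P(n):
    return sum(range(1, n + 1)) == n * (n + 1) // 2

# Base case: P(1) holds.
assert P(1)

# Inductive step P(n) => P(n+1), verified here only for finitely many n
# (an actual proof must establish it for *every* natural number n).
assert all((not P(n)) or P(n + 1) for n in range(1, 100))
```

The expression `(not P(n)) or P(n + 1)` is exactly the logical definition of implication given above: either the hypothesis is false or the conclusion is true.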
“If a statement leads to a contradiction, then this statement is false”. This is the rule of a proof by contradiction — reductio ad absurdum. It relies on the Principle of Contradiction, which states that “P and ¬P are impossible”. In other words, the Principle of Contradiction says that the formula “P and ¬P” is a contradiction. But this alone does not ensure that one of P or ¬P must hold. The Law of the Excluded Middle (or Law of the Excluded Third — tertium non datur) does: “either P or ¬P holds”. That is, the Law of the Excluded Middle simply says that the formula “P or ¬P” is a tautology. Therefore, the formula ¬Q ⇒ ¬P means “P holds only if Q holds”, or “Q is a necessary condition for P”. If P ⇒ Q and Q ⇒ P, then we write P ⇔ Q, which means “P if and only if Q”, or “P is a necessary and sufficient condition for Q”, or still “P and Q are equivalent” (and vice versa). Indeed, the formulas P ⇒ Q and ¬Q ⇒ ¬P are equivalent: {P ⇒ Q} ⇐⇒ {¬Q ⇒ ¬P}. This equivalence is the basic idea behind a contrapositive proof: “to verify that a proposition P implies a proposition Q, prove, instead, that the denial of Q implies the denial of P”.
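These logical facts are all finite truth-table statements, so they can be verified exhaustively. A Python sketch (not from the book), with `implies` encoding the definition “either P is false or Q is true”:

```python
# Illustrative sketch (not from the book): truth tables for the tautology
# "P or not P", the contradiction "P and not P", and the equivalence of an
# implication with its contrapositive.

def implies(p, q):
    # logical definition of P => Q: either P is false or Q is true
    return (not p) or q

booleans = [True, False]

# "P or not P" is a tautology; "P and not P" is a contradiction.
assert all(p or not p for p in booleans)
assert not any(p and not p for p in booleans)

# (P => Q) <=> (not Q => not P): the basis of a contrapositive proof.
assert all(implies(p, q) == implies(not q, not p)
           for p in booleans for q in booleans)
```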
We conclude this introductory section by pointing out another usual but slightly different meaning for the term “proposition”. We shall often say “prove the following proposition” instead of “prove that the following proposition holds true”. Here the term proposition is being used as a synonym for theorem (a true statement for which we demand a proof of its truth), and not as a synonym for assertion or statement (that may be either true or false). A conjecture is a statement that has not been proved yet — it may turn out to be either true or false once a proof of its truth or falsehood is supplied. If a conjecture is proved to be true, then it becomes a theorem. Note that there is no “false theorem” — if it is false, it is not a theorem. Another synonym for theorem is lemma. There is no logical difference among the terms “theorem”, “lemma”, and “proposition”, but it is usual to endow them with a psychological hierarchy. Generally, a theorem is supposed to bear a greater importance (which is subjective) and a lemma is often viewed as an intermediate theorem (which may be very important indeed) that will be applied to prove a further theorem. Propositions are sometimes placed a step below, either as an isolated theorem or as an auxiliary result. A corollary is, of course, a theorem that comes out as a consequence of a previously proved theorem (i.e., whose proof is mainly based on an application of that previous theorem). Unlike “conjecture”, “proposition”, “lemma”, “theorem”, and “corollary”, the term axiom (or postulate) is applied to a fundamental statement (or assumption, or hypothesis) upon which a theory (i.e., a set of theorems) is built. Clearly, a set of axioms (or, more appropriately, a system of axioms) should be consistent (i.e., they should not lead to a contradiction), and they are said to be independent if none of them is a theorem (i.e., if none of them can be proved by the remaining axioms).
1.2 Sets and Relations

If x is an element of a set X, then we shall write x ∈ X (meaning that x belongs to X, or x is contained in X). Otherwise (i.e., if x is not an element of X), x ∉ X. We also write A ⊆ B to mean that a set A is a subset of a set B (A ⊆ B ⇐⇒ {x ∈ A ⇒ x ∈ B}). In such a case A is said to be included in B. The empty set, which is a subset of every set, will be denoted by ∅. Two sets A and B are equal (notation: A = B) if A ⊆ B and B ⊆ A. If A is a subset of B but not equal to B, then we say that A is a proper subset of B and write A ⊂ B. In such a case A is said to be properly included in B. A nontrivial subset of a set X is a nonempty proper subset of it. If P( ) is a predicate which is meaningful for every element x of a set X (so that P(x) is a proposition for each x in X), then {x ∈ X: P(x)} will denote the subset of X consisting of all those elements x of X for which the proposition P(x) is true. The complement of a subset A of a set X, denoted by X\A, is the subset {x ∈ X: x ∉ A}. If A and B are sets, the difference between A and B, or the relative complement of B in A, is the set
A\B = {x ∈ A: x ∉ B}.
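These notations have direct counterparts in any language with built-in sets. A Python sketch (not from the book), where a set comprehension plays the role of {x ∈ X: P(x)}:

```python
# Illustrative sketch (not from the book): subsets, complements, and
# relative complements rendered with Python sets.

X = set(range(10))
A = {x for x in X if x % 2 == 0}   # {x in X : x is even}
B = {x for x in X if x < 5}

# A is a subset of X, and a proper one (A != X).
assert A <= X and A != X

# Complement of A in X, and the relative complement A \ B.
assert X - A == {x for x in X if x not in A}
assert A - B == {x for x in A if x not in B}   # here {6, 8}
```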
We shall also use the standard notation ∪ and ∩ for union and intersection, respectively (x ∈ A ∪ B ⇐⇒ {x ∈ A or x ∈ B} and x ∈ A ∩ B ⇐⇒ {x ∈ A and x ∈ B}). The sets A and B are disjoint if A ∩ B = ∅ (i.e., if they have an empty intersection). The symmetric difference (or Boolean sum) of two sets A and B is the set A△B = (A\B) ∪ (B\A) = (A ∪ B)\(A ∩ B). The terms class, family, and collection (as well as their related terms prefixed with “sub”) will be used as synonyms for set (usually applied to sets of sets, but not necessarily) without imposing any hierarchy among them. If 𝒳 is a collection of subsets of a given set X, then ⋃𝒳 will denote the union of all sets in 𝒳. Similarly, ⋂𝒳 will denote the intersection of all sets in 𝒳 (alternative notation: ⋃_{A∈𝒳} A and ⋂_{A∈𝒳} A). An important statement about complements that exhibits the duality between union and intersection is given by the De Morgan laws:

X\⋃_{A∈𝒳} A = ⋂_{A∈𝒳} (X\A)   and   X\⋂_{A∈𝒳} A = ⋃_{A∈𝒳} (X\A).
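The symmetric difference identity and the De Morgan laws can be checked on concrete finite sets. A Python sketch (not from the book), with a small collection of subsets of X:

```python
# Illustrative sketch (not from the book): the symmetric difference
# identity and the De Morgan laws on finite sets.

X = set(range(8))
A, B = {0, 1, 2, 3}, {2, 3, 4, 5}

# A (sym. diff.) B = (A\B) u (B\A) = (A u B) \ (A n B)
assert A ^ B == (A - B) | (B - A) == (A | B) - (A & B)

# De Morgan: the complement of a union is the intersection of the
# complements, and the complement of an intersection is the union of them.
collection = [{0, 1}, {1, 2, 3}, {3, 4, 5}]
union = set().union(*collection)
inter = X.intersection(*collection)
assert X - union == set.intersection(*(X - S for S in collection))
assert X - inter == set.union(*(X - S for S in collection))
```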
The power set of any set X, denoted by ℘(X), is the collection of all subsets of X. Note that ⋃℘(X) = X ∈ ℘(X) and ⋂℘(X) = ∅ ∈ ℘(X). A singleton in a set X is a subset of X containing one and only one point of X (notation: {x} ⊆ X is a singleton on x ∈ X). A pair (or a doubleton) is a set containing just two points, say {x, y}, where x is an element of a set X and y is an element of a set Y. A pair of points x ∈ X and y ∈ Y is an ordered pair, denoted by (x, y), if x is regarded as the first member of the pair and y is regarded as the second. The Cartesian product of two sets X and Y, denoted by X×Y, is the set of all ordered pairs (x, y) with x ∈ X and y ∈ Y. A relation R between two sets X and Y is any subset of the Cartesian product X×Y. If R is a relation between X and Y and (x, y) is a pair in R ⊆ X×Y, then we say that x is related to y under R (or x and y are related by R), and write xRy (instead of (x, y) ∈ R). Tautologically, for any ordered pair (x, y) in X×Y, either (x, y) ∈ R or (x, y) ∉ R (i.e., either xRy or not xRy). A relation between a set X and itself is called a relation on X. If X and Y are sets and if R is a relation between X and Y, then the graph of the relation R is the subset of X×Y

G_R = {(x, y) ∈ X×Y: xRy}.

A relation R clearly coincides with its graph G_R.
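For a finite set the power set can be enumerated outright, and the two facts ⋃℘(X) = X and ⋂℘(X) = ∅ verified directly. A Python sketch (not from the book), building ℘(X) with itertools:

```python
# Illustrative sketch (not from the book): the power set of a finite set X,
# with its union equal to X and its intersection empty.

from itertools import chain, combinations

def power_set(X):
    xs = list(X)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))]

X = {0, 1, 2}
PS = power_set(X)
assert len(PS) == 2 ** len(X)                    # 2^3 = 8 subsets
assert frozenset() in PS and frozenset(X) in PS  # the empty set and X itself
assert set().union(*PS) == X                     # union of p(X) is X
assert frozenset(X).intersection(*PS) == frozenset()   # intersection is empty
```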
1.3 Functions

Let x be an arbitrary element of a set X and let y and z be arbitrary elements of a set Y. A relation F between the sets X and Y is a function if xFy and
xFz imply y = z. In other words, a relation F between a set X and a set Y is called a function from X to Y (or a mapping of X into Y) if for each x ∈ X there exists a unique y ∈ Y such that xFy. The terms map and transformation are often used as synonyms for function and mapping. (Sometimes the terms correspondence and operator are also used, but we shall keep them for special kinds of functions.) It is usual to write F: X → Y to indicate that F is a mapping of X into Y, and y = F(x) (or y = Fx) instead of xFy. If y = F(x), we say that F maps x to y, so that F(x) ∈ Y is the value of the function F at x ∈ X. Equivalently, F(x), which is a point in Y, is the image of the point x in X under F. It is also customary to use the abbreviation “the function X → Y defined by x ↦ F(x)” for a function from X to Y that assigns to each x in X the value F(x) in Y. A Y-valued function on X is precisely a function from X to Y. If Y is a subset of the set C, R, or Z, then complex-valued function, real-valued function, or integer-valued function, respectively, are the usual terminologies. An X-valued function on X (i.e., a function F: X → X from X to itself) is referred to as a function on X. The collection of all functions from a set X to a set Y will be denoted by Y^X. Indeed, Y^X ⊆ ℘(X×Y). Consider a function F: X → Y. The set X is called the domain of F and the set Y is called the codomain of F. If A is a subset of X, then the image of A under F, denoted by F(A), is the subset of Y consisting of all points y of Y such that y = F(x) for some x ∈ A:

F(A) = {y ∈ Y: y = F(x) for some x ∈ A ⊆ X}.

On the other hand, if B is a subset of Y, then the inverse image of B under F (or the pre-image of B under F), denoted by F⁻¹(B), is the subset of X made up of all points x in X such that F(x) lies in B:

F⁻¹(B) = {x ∈ X: F(x) ∈ B ⊆ Y}.

The range of F, denoted by R(F), is the image of X under F. Thus

R(F) = F(X) = {y ∈ Y: y = F(x) for some x ∈ X}.
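Images, inverse images, and the range can be computed directly when a function on a finite set is represented by its graph. A Python sketch (not from the book), with a function given as a dictionary from domain points to values:

```python
# Illustrative sketch (not from the book): image F(A), inverse image
# F^{-1}(B), and range R(F) for F: X -> Y given as a dict, with F(x) = x^2.

X = {-2, -1, 0, 1, 2}
Y = {0, 1, 2, 3, 4}
F = {x: x * x for x in X}

def image(F, A):
    return {F[x] for x in A}

def preimage(F, B):
    return {x for x in F if F[x] in B}

assert image(F, X) == {0, 1, 4}       # the range R(F) = F(X)
assert preimage(F, {1}) == {-1, 1}    # two points map to 1: F is not injective
assert image(F, X) != Y               # R(F) is a proper subset: not surjective
```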
If R(F) is a singleton, then F is said to be a constant function. If the range of F coincides with the codomain (i.e., if F(X) = Y), then F is a surjective function. In this case F is said to map X onto Y. A function F is injective (or F is a one-to-one mapping) if its domain X does not contain two elements with the same image. In other words, a function F: X → Y is injective if F(x) = F(x′) implies x = x′ for every x and x′ in X. A one-to-one correspondence between a set X and a set Y is a one-to-one mapping of X onto Y; that is, a surjective and injective function (also called a bijective function). If A is an arbitrary subset of X and F is a mapping of X into Y, then the function G: A → Y such that G(x) = F(x) for each x ∈ A is the restriction of F to A. Conversely, if G: A → Y is the restriction of F: X → Y to some subset
A of X, then F is an extension of G over X. It is usual to write G = F|A. Note that R(F|A) = F(A). Let A be a subset of X and consider a function F: A → X. An element x of A is a fixed point of F (or F leaves x fixed) if F(x) = x. The function J: A → X defined by J(x) = x for every x ∈ A is the inclusion map (or the embedding, or the injection) of A into X. In other words, the inclusion map of A into X is the function J: A → X that leaves each point of A fixed. The inclusion map of X into X is called the identity map on X and denoted by I, or by IX when necessary (i.e., the identity on X is the function I: X → X such that I(x) = x for every x ∈ X). Thus the inclusion map of a subset of X is the restriction to that subset of the identity map on X. Now consider a function on X; that is, a mapping F: X → X of X into itself. A subset of X, say A, is invariant for F (or invariant under F, or F-invariant) if F(A) ⊆ A. In this case the restriction of F to A, F|A: A → X, has its range included in A: R(F|A) = F(A) ⊆ A ⊆ X. Therefore, we shall often think of the restriction of F: X → X to an invariant subset A ⊆ X as a mapping of A into itself: F|A: A → A. It is in this sense that the inclusion map of a subset of X can be thought of as the identity map on that subset: they differ only in that one has a larger codomain than the other. Let F: X → Y be a function from a set X to a set Y, and let G: Y → Z be a function from the set Y to a set Z. Since the range of F is included in the domain of G, R(F) ⊆ Y, consider the restriction of G to the range of F, G|R(F): R(F) → Z. The composition of G and F, denoted by G ◦ F (or simply by GF), is the function from X to Z defined by (G ◦ F)(x) = G|R(F)(F(x)) = G(F(x)) for every x ∈ X. It is usual to say that the diagram

         F
    X -------> Y
      \        |
       \       |
     H  \      | G
         v     v
            Z

commutes if H = G ◦ F. Although the above diagram is said to be commutative whenever H is the composition of G and F, the composition itself is not a commutative operation even when such a commutation makes sense. For instance, if X = Y = Z and F is a constant function on X, say F(x) = a ∈ X for every x ∈ X, then G ◦ F and F ◦ G are constant functions on X as well: (G ◦ F)(x) = G(a) and (F ◦ G)(x) = a for every x ∈ X. However G ◦ F and F ◦ G need not be the same (unless a is a fixed point of G). Composition may not be commutative but it is always associative. If F maps X into Y, G maps Y into Z, and K maps Z into W, then we can consider the compositions K ◦ (G ◦ F): X → W and (K ◦ G) ◦ F: X → W. It is readily verified that K ◦ (G ◦ F) = (K ◦ G) ◦ F. For this reason we may and shall drop the parentheses. In other words, the diagram
         F
    X -------> Y
    |        / |
    |       /  |
    H     G    | L
    |    /     |
    v   v      v
    Z -------> W
         K

commutes (i.e., H = G ◦ F, L = K ◦ G, and K ◦ H = L ◦ F). If F is a function on a set X, then the composition of F: X → X with itself, F ◦ F, is denoted by F². Likewise, for any positive integer n ∈ N, Fⁿ denotes the composition of F with itself n times, F ◦ ··· ◦ F: X → X, which is called the nth power of F. A function F: X → X is idempotent if F² = F (and hence Fⁿ = F for every n ∈ N). It is easy to show that the range of an idempotent function is precisely the set of all its fixed points. In fact, F = F² if and only if R(F) = {x ∈ X: F(x) = x}. Suppose F: X → Y is an injective function. Thus, for an arbitrary element of R(F), say y, there exists a unique element of X, say x_y, such that y = F(x_y). This defines a function from R(F) to X, F⁻¹: R(F) → X, such that x_y = F⁻¹(y). Hence y = F(F⁻¹(y)). On the other hand, if x is an arbitrary element of X, then F(x) lies in R(F) so that F(x) = F(F⁻¹(F(x))). Since F is injective, x = F⁻¹(F(x)). Conclusion: For every injective function F: X → Y there exists a (unique) function F⁻¹: R(F) → X such that F⁻¹F: X → X is the identity on X (and FF⁻¹: R(F) → R(F) is the identity on R(F)). F⁻¹ is called the inverse of F on R(F): an injective function has an inverse on its range. If F is also surjective, then F⁻¹: Y → X is called the inverse of F. Thus, an injective and surjective function is also called an invertible function (in addition to its other names).
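The construction of the inverse on the range admits a direct finite illustration. In this sketch (the data and helper name are invented for the example), an injective function given as a Python dict is inverted on its range, and both identities F⁻¹(F(x)) = x and F(F⁻¹(y)) = y are checked:

```python
# An injective function F: X -> Y as a dict (illustrative data).
F = {1: 'x', 2: 'y', 3: 'z'}

def is_injective(F):
    """F is injective iff distinct domain points have distinct values."""
    return len(set(F.values())) == len(F)

assert is_injective(F)

# The inverse of F on its range R(F): swap keys and values.
F_inv = {y: x for x, y in F.items()}

# F_inv(F(x)) = x on the whole domain; F(F_inv(y)) = y on the range.
assert all(F_inv[F[x]] == x for x in F)
assert all(F[F_inv[y]] == y for y in F.values())
```

If F were not injective, the dict comprehension for `F_inv` would silently collapse colliding values, which is exactly why injectivity is checked first.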
1.4 Equivalence Relations

Let x, y, and z be arbitrary elements of a set X. A relation R on X is reflexive if xRx for every x ∈ X, transitive if xRy and yRz imply xRz, and symmetric if xRy implies yRx. An equivalence relation on a set X is a relation ∼ on X that is reflexive, transitive, and symmetric. If ∼ is an equivalence relation on a set X, then the equivalence class of an arbitrary element x of X (with respect to ∼) is the set [x] = {x′ ∈ X: x′ ∼ x}. Given an equivalence relation ∼ on a set X, the quotient space of X modulo ∼, denoted by X/∼, is the collection X/∼ = {[x] ⊆ X: x ∈ X}
of the equivalence classes (with respect to ∼) of every x ∈ X. Set π(x) = [x] in X/∼ for each x in X. This defines a surjective map π: X → X/∼, which is called the natural mapping of X onto X/∼. Let 𝒳 be any collection of nonempty subsets of a set X. It covers X (or 𝒳 is a covering of X) if X = ⋃𝒳 (i.e., if every point in X belongs to some set in 𝒳). The collection 𝒳 is disjoint if the sets in 𝒳 are pairwise disjoint (i.e., A ∩ B = ∅ whenever A and B are distinct sets in 𝒳). A partition of a set X is a disjoint covering of it. Let ≈ be an equivalence relation on a set X, and let X/≈ be the quotient space of X modulo ≈. It is clear that X/≈ is a partition of X. Conversely, let 𝒳 be any partition of a set X and define a relation ∼/𝒳 on X as follows: for every x, x′ in X, x is related to x′ under ∼/𝒳 (i.e., x ∼/𝒳 x′) if x and x′ belong to the same set in 𝒳. In fact, ∼/𝒳 is an equivalence relation on X, which is called the equivalence relation induced by the partition 𝒳. It is readily verified that the quotient space of X modulo the equivalence relation induced by the partition 𝒳 coincides with 𝒳 itself, just as the equivalence relation induced by the quotient space of X modulo the equivalence relation ≈ on X coincides with ≈. Symbolically, X/(∼/𝒳) = 𝒳
and
∼ /(X/≈ ) = ≈ .
Thus an equivalence relation ≈ on X induces a partition X/≈ of X, which in turn induces back an equivalence relation ∼/(X/≈) on X that coincides with ≈. On the other hand, a partition 𝒳 of X induces an equivalence relation ∼/𝒳 on X, which in turn induces back a partition X/(∼/𝒳) of X that coincides with 𝒳. Conclusion: The collection of all equivalence relations on a set X is in a one-to-one correspondence with the collection of all partitions of X.
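The one-to-one correspondence between equivalence relations and partitions can be verified mechanically on a small finite set. A sketch (the helper names `quotient` and `induced_relation` are illustrative):

```python
# A small set and an equivalence relation on it (congruence mod 3).
X = {0, 1, 2, 3, 4, 5}
same = lambda x, y: x % 3 == y % 3

def quotient(X, rel):
    """X modulo rel: the set of equivalence classes [x] = {x' in X: x' rel x}."""
    return {frozenset(xp for xp in X if rel(xp, x)) for x in X}

def induced_relation(partition):
    """The relation induced by a partition: x ~ x' iff they share a cell."""
    return lambda x, y: any(x in cell and y in cell for cell in partition)

P = quotient(X, same)                 # the partition X/~
rel2 = induced_relation(P)            # the equivalence relation induced by P

# Round trip: the induced relation gives back the same quotient space.
assert quotient(X, rel2) == P
print(sorted(sorted(cell) for cell in P))   # [[0, 3], [1, 4], [2, 5]]
```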
1.5 Ordering

Let x and y be arbitrary elements of a set X. A relation R on X is antisymmetric if xRy and yRx imply x = y. A relation ≤ on a nonempty set X is a partial ordering of X if it is reflexive, transitive, and antisymmetric. If ≤ is a partial ordering on a set X, the notation x < y means x ≤ y and x ≠ y. Moreover, y > x and y ≥ x are just another
way to write x < y and x ≤ y, respectively. Thus a partially ordered set is a pair (X, ≤) where X is a nonempty set and ≤ is a partial ordering of X (i.e., a nonempty set equipped with a partial ordering on it). Warning: It may happen that x ≰ y and y ≰ x for some (x, y) ∈ X×X. Let (X, ≤) be a partially ordered set, and let A be a subset of X. Note that (A, ≤) is a partially ordered set as well. An element x ∈ X is an upper bound for A if y ≤ x for every y ∈ A. Similarly, an element x ∈ X is a lower bound for A if x ≤ y for every y ∈ A. A subset A of X is bounded above in X if it has an upper bound in X, and bounded below in X if it has a lower bound in X. It is bounded if it is bounded both above and below. If a subset A of a partially ordered set X is bounded above in X and if some upper bound of A belongs to A, then this (unique) element of A is the maximum of A (or the greatest or biggest element of A), denoted by max A. Similarly, if A is bounded below in X and if some lower bound of A belongs to A, then this (unique) element of A is the minimum of A (or the least or smallest element of A), denoted by min A. An element x ∈ A is maximal in A if there is no element y ∈ A such that x < y (equivalently, if x ≮ y for every y ∈ A). Similarly, an element x ∈ A is minimal in A if there is no element y ∈ A such that y < x (equivalently, if y ≮ x for every y ∈ A). Note that x ≮ y (or y ≮ x) does not mean that y ≤ x (or x ≤ y), so that the concepts of a maximal (or a minimal) element in A and that of the maximum (or the minimum) element of A do not coincide. Example 1.A. A collection of many (e.g., two) pairwise disjoint nonempty subsets of a set, equipped with the partial ordering defined by the inclusion relation ⊆, has no maximum, no minimum, and every element in it is both maximal and minimal. On the other hand, the collection of all infinite subsets of an infinite set, whose complements are also infinite, has no maximal element in the inclusion ordering ⊆.
(The notion of infinite sets will be introduced later in Section 1.8 — for instance, the set of all even natural numbers is an infinite subset of N that has an infinite complement.) Let A be a subset of a partially ordered set X. Let UA ⊆ X be the set of all upper bounds of A, and let VA ⊆ X be the set of all lower bounds of A. If UA is nonempty and has a minimum element, say u = min UA, then u ∈ UA is called the supremum (or the least upper bound) of A (notation: u = sup A). Similarly, if VA is nonempty and has a maximum, say v = max VA, then v ∈ VA is called the infimum (or the greatest lower bound) of A (notation: v = inf A). A bounded set may not have a supremum or an infimum. However, if a set A has a maximum (or a minimum), then sup A = max A (or inf A = min A). Moreover, if a set A has a supremum (or an infimum) in A, then sup A = max A (or inf A = min A). If a pair {x, y} of elements of a partially ordered set X has a supremum or an infimum in X, then we shall use the following notation: x ∨ y = sup{x, y} and x ∧ y = inf{x, y}.
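The distinction between maximal elements and a maximum can be checked on a small example. Under the divisibility ordering of A = {2, 3, 4, 6}, both 4 and 6 are maximal, yet A has no maximum. A sketch (the function names are illustrative):

```python
# Divisibility partially orders A = {2, 3, 4, 6}: x <= y iff x divides y.
A = {2, 3, 4, 6}
leq = lambda x, y: y % x == 0

def maximal(A, leq):
    """Elements x with no y in A strictly above them."""
    return {x for x in A if not any(leq(x, y) and x != y for y in A)}

def maximum(A, leq):
    """The element lying above every element of A, if one exists."""
    for x in A:
        if all(leq(y, x) for y in A):
            return x
    return None

print(maximal(A, leq))    # {4, 6}: both are maximal ...
print(maximum(A, leq))    # None : ... but A has no maximum
```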
Let F: X → Y be a function from a set X to a partially ordered set Y. Thus the range of F, F(X) ⊆ Y, is a partially ordered set. An upper bound for F is an upper bound for F(X), and F is bounded above if it has an upper bound. Similarly, a lower bound for F is a lower bound for F(X), and F is bounded below if it has a lower bound. If a function F is bounded both above and below, then it is said to be bounded. The supremum of F, supx∈X F(x), and the infimum of F, infx∈X F(x), are defined by supx∈X F(x) = sup F(X) and infx∈X F(x) = inf F(X). Now suppose X also is partially ordered and take an arbitrary pair of points x1, x2 in X. F is an increasing function if x1 ≤ x2 in X implies F(x1) ≤ F(x2) in Y, and strictly increasing if x1 < x2 in X implies F(x1) < F(x2) in Y. (For notational simplicity we are using the same symbol ≤ to denote both the partial ordering of X and the partial ordering of Y.) In a similar way we can define decreasing and strictly decreasing functions between partially ordered sets. If a function is either decreasing or increasing, then it is said to be monotone.
1.6 Lattices

Let X be a partially ordered set. If every pair {x, y} of elements of X is bounded above, then X is a directed set (or the set X is said to be directed upward). If every pair {x, y} is bounded below, then X is said to be directed downward. X is a lattice if every pair of elements of X has a supremum and an infimum in X (i.e., if there exists a unique u ∈ X and a unique v ∈ X such that u = x ∨ y and v = x ∧ y for every pair x ∈ X and y ∈ X). A nonempty subset A of a lattice X that contains x ∨ y and x ∧ y for every x and y in A is a sublattice of X (and hence a lattice itself). Every lattice is directed both upward and downward. If every bounded subset of X has a supremum and an infimum, then X is a boundedly complete lattice. If every subset of X has a supremum and an infimum, then X is a complete lattice. The following chain of implications: complete lattice ⇒ boundedly complete lattice ⇒ lattice ⇒ directed set
is clear enough, and none of them can be reversed. If X is a complete lattice, then X has a supremum and an infimum in X, which actually are the maximum and the minimum of X, respectively. Since min X ∈ X and max X ∈ X, this shows that a complete lattice in fact is nonempty (even if this had not been assumed when we defined a partially ordered set). Likewise, the empty set ∅ of a complete lattice X has a supremum and an infimum. Since every element of X is both an upper and a lower bound for ∅, it follows that U∅ = V∅ = X. Hence sup ∅ = min X and inf ∅ = max X. Example 1.B. The power set ℘(X) of a set X is a complete lattice in the inclusion ordering ⊆, where A ∨ B = A ∪ B and A ∧ B = A ∩ B for every pair {A, B} of subsets of X. In this case, sup ∅ = min ℘(X) = ∅ and inf ∅ = max ℘(X) = X.
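Example 1.B can be traced concretely for a finite X: suprema are unions, infima are intersections, and the empty family has sup ∅ = ∅ and inf ∅ = X. A minimal sketch (helper names are illustrative):

```python
# The power set of X as a complete lattice under inclusion (finite case).
X = {1, 2, 3}

def sup(family, X):
    """Least upper bound in the power set of X: the union (sup of the empty family is the empty set)."""
    out = set()
    for A in family:
        out |= A
    return out

def inf(family, X):
    """Greatest lower bound in the power set of X: the intersection (inf of the empty family is X)."""
    out = set(X)
    for A in family:
        out &= A
    return out

print(sup([{1}, {2, 3}], X))   # {1, 2, 3}
print(sup([], X))              # set(): the minimum of the power set
print(inf([], X))              # {1, 2, 3}: the maximum of the power set
```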
Example 1.C. The real line R with its natural ordering ≤ is a boundedly complete lattice but not a complete lattice (and so is its sublattice Z of all integers). The set A = {x ∈ R: 0 ≤ x ≤ 1}, as a sublattice of (R, ≤), is a complete lattice where sup ∅ = min A = 0 and inf ∅ = max A = 1. The set of all rational numbers Q is a sublattice of R (in the natural ordering ≤) but not a boundedly complete lattice — e.g., the set {x ∈ Q: x² ≤ 2} is bounded in Q but has no infimum and no supremum in Q.

Example 1.D. The notion of connectedness needs topology, and we shall define it in due course. (Connectedness will be defined in Chapter 3.) However, if the reader is already familiar with the concept of a connected subset of the plane, then he can appreciate now a rather simple example of a directed set that is not a lattice. The subcollection of ℘(R²) made up of all connected subsets of the Euclidean plane R² is a directed set in the inclusion ordering ⊆ (both upward and downward) but not a lattice.

Lemma 1.1. (Knaster–Tarski). An increasing function on a complete lattice has a fixed point.

Proof. Let (X, ≤) be a partially ordered set, consider a function F: X → X, and set A = {x ∈ X: F(x) ≤ x}. Suppose X is a complete lattice. Then X has a supremum in X (sup X = max X). Since max X ∈ X, it follows that F(max X) ∈ X so that F(max X) ≤ max X. Conclusion: A is nonempty. Take x ∈ A arbitrary and let a be the infimum of A (a = inf A ∈ X). If F is increasing, then F(a) ≤ F(x) ≤ x since a ≤ x and x ∈ A. Hence F(a) is a lower bound for A, and so F(a) ≤ a. Thus a ∈ A. On the other hand, since F(x) ≤ x and F is increasing, F(F(x)) ≤ F(x). Thus F(x) ∈ A so that F(A) ⊆ A, and hence F(a) ∈ A (for a ∈ A), which implies that a = inf A ≤ F(a). Therefore a ≤ F(a) ≤ a. Thus (antisymmetry) F(a) = a.

The next theorem is an extremely important result that plays a central role in Section 1.8. Its proof is based on the previous lemma.

Theorem 1.2. (Cantor–Bernstein).
If there exist an injective mapping of X into Y and an injective mapping of Y into X, then there exists a one-to-one correspondence between the sets X and Y.

Proof. First note that the theorem statement can be translated into the following problem. Given an injective function from X to Y and also an injective function from Y to X, construct a bijective function from X to Y. Thus consider two functions F: X → Y and G: Y → X. Let ℘(X) be the power set of X. For each A ∈ ℘(X) set Φ(A) = X\G(Y\F(A)). It is readily verified that Φ: ℘(X) → ℘(X) is an increasing function with respect to the inclusion ordering of ℘(X). Therefore, by the Knaster–Tarski
Lemma, it has a fixed point in the complete lattice ℘(X). That is, there is an A0 ∈ ℘(X) such that Φ(A0) = A0. Hence A0 = X\G(Y\F(A0)) so that X\A0 = G(Y\F(A0)). Thus X\A0 is included in the range of G. If F: X → Y and G: Y → X are injective, then it is easy to show that the function H: X → Y defined by

    H(x) = F(x)      if x ∈ A0,
    H(x) = G⁻¹(x)    if x ∈ X\A0,

is injective and surjective.
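The proof of Theorem 1.2 is constructive enough to run on concrete data. In the sketch below the sets and maps are invented for illustration; on finite sets two opposite injections are automatically bijections, so the fixed-point iteration stabilizes almost immediately, but the steps mirror the proof: iterate Φ(A) = X\G(Y\F(A)) to a fixed point A0, then assemble H from F and G⁻¹.

```python
# Cantor-Bernstein construction on (small, illustrative) finite data.
X = {0, 1, 2}
Y = {'a', 'b', 'c'}
F = {0: 'b', 1: 'c', 2: 'a'}    # injective F: X -> Y
G = {'a': 2, 'b': 1, 'c': 0}    # injective G: Y -> X

def Phi(A):
    """Phi(A) = X \\ G(Y \\ F(A)); increasing in A under inclusion."""
    FA = {F[x] for x in A}
    return X - {G[y] for y in Y - FA}

# Iterate Phi from X until it stabilizes at a fixed point A0.
A0 = set(X)
while Phi(A0) != A0:
    A0 = Phi(A0)

G_inv = {x: y for y, x in G.items()}
H = {x: (F[x] if x in A0 else G_inv[x]) for x in X}

# H is a bijection of X onto Y.
assert set(H.values()) == Y and len(set(H.values())) == len(X)
```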
If X is a partially ordered set such that for every pair x, y of elements of X either x ≤ y or y ≤ x, then X is simply ordered (synonyms: linearly ordered, totally ordered). A simply ordered set is also called a chain. Note that, in this particular case, the concepts of maximal element and maximum element (as well as minimal element and minimum element) coincide. Also note that if F is a function from a simply ordered set X to any partially ordered set Y, then F is strictly increasing if and only if it is increasing and injective. It is clear that every simply ordered set is a lattice. For instance, any subset of the real line R (e.g., R itself or Z) is simply ordered. Example 1.E. Let ≤ be a simple ordering on a set X and recall that x < y means x ≤ y and x ≠ y. This defines a transitive relation < on X that satisfies the trichotomy law: for every pair {x, y} in X exactly one of the three statements x < y, x = y, or y < x is true. Conversely, if < is a transitive relation on a set X that satisfies the trichotomy law, and if a relation ≤ on X is defined by setting x ≤ y whenever either x < y or x = y, then ≤ is a simple ordering on X. Thus, according to the above notation, ≤ is a simple ordering on a set X if and only if < is a transitive relation on X that satisfies the trichotomy law. If X is a partially ordered set such that every nonempty subset of it has a minimum, then X is said to be well ordered. Every well-ordered set is simply ordered. Example: Any subset of N 0 (equipped with its natural ordering) is well ordered.
1.7 Indexing

Let F be a function from a set X to a set Y. Another way to look at the range of F is: for each x ∈ X set yx = F(x) ∈ Y and note that F(X) = {yx ∈ Y: x ∈ X}, which can also be written as {yx}x∈X. Thus the domain X can be thought of as an index set, the range {yx}x∈X as a family of elements of Y indexed by an index set X (an indexed family), and the function F: X → Y
as an indexing. An indexed family {yx}x∈X may contain elements ya and yb, for a and b in X, such that ya = yb. If {yx}x∈X has the property that ya ≠ yb whenever a ≠ b, then it is said to be an indexed family of distinct elements. Observe that {yx}x∈X is a family of distinct elements if and only if the function F: X → Y (i.e., the indexing process) is injective. The identity mapping on an arbitrary set X can be viewed as an indexing of X, the self-indexing of X. Thus any set X can be thought of as an indexed family (the range of the self-indexing of itself). A mapping of the set N (or N 0, but not Z) into a set Y is called a sequence (or an infinite sequence). Notation: {yn}n∈N, {yn}n≥1, {yn}∞n=1, or simply {yn}. Thus a Y-valued sequence (or a sequence of elements in Y, or even a sequence in Y) is precisely a function from N to Y, which is commonly thought of as an indexed family (indexed by N) where the indexing process (i.e., the function itself) is often omitted. The elements yn of {yn} are sometimes referred to as the entries of the sequence {yn}. If Y is a subset of the set C, R, or Z, then complex-valued sequence, real-valued sequence, or integer-valued sequence, respectively, are the usual terminologies. Let {Xγ}γ∈Γ be an indexed family of sets. The Cartesian product of {Xγ}γ∈Γ, denoted by ∏γ∈Γ Xγ, is the set consisting of all indexed families {xγ}γ∈Γ such that xγ ∈ Xγ for every γ ∈ Γ. In particular, if Xγ = X for all γ ∈ Γ, where X is a fixed set, then ∏γ∈Γ Xγ is precisely the collection of all functions from Γ to X. That is, ∏γ∈Γ X = X^Γ.
Recall: X^Γ denotes the collection of all functions from a set Γ to a set X. Suppose Γ = In, where In = {i ∈ N: i ≤ n} for some n ∈ N (In is called an initial segment of N). The Cartesian product of {Xi}i∈In (or {Xi}ni=1), denoted by ∏i∈In Xi or ∏ni=1 Xi, is the set X1 × ··· × Xn of all ordered n-tuples (x1, ..., xn) with xi ∈ Xi for every i ∈ In. Moreover, if Xi = X for all i ∈ In, then ∏i∈In X is the Cartesian product of n copies of X, which is denoted by X^n (instead of X^In). The n-tuples (x1, ..., xn) in X^n are also called finite sequences (as functions from an initial segment of N into X). Accordingly, ∏n∈N X is referred to as the Cartesian product of countably infinite copies of X, which coincides with X^N: the set of all X-valued (infinite) sequences. An exceptionally helpful way of defining an infinite sequence is given by the Principle of Recursive Definition, which says that, if F is a function from a nonempty set X into itself, and if x is an arbitrary element of X, then there exists a unique X-valued sequence {xn}n∈N such that x1 = x and xn+1 = F(xn) for every n ∈ N. The existence of such a unique sequence is intuitively clear, and it can be easily proved by induction (i.e., by using the Principle of Mathematical Induction). A slight generalization reads as follows. For each n ∈ N let Gn be a mapping of X^n into X, and let x be an arbitrary
element of X. Then there exists a unique X-valued sequence {xn }n∈N such that x1 = x and xn+1 = Gn (x1 , . . . , xn ) for every n ∈ N . Since sequences are functions of N (or of N 0 ) to a set X, the terms associated with the notion of being bounded clearly apply to sequences in a partially ordered set X. In particular, if X is a partially ordered set, and if {xn } is an X-valued sequence, then supn xn and inf n xn are defined as the supremum and infimum, respectively, of the partially ordered indexed family {xn }. Since N and N 0 (with their natural ordering) are partially ordered sets (well-ordered, really), the terms associated with the property of being monotone (such as increasing, decreasing, strictly increasing, strictly decreasing) also apply to sequences in a partially ordered set X. Let {zn}n∈N be a sequence in a set Z, and let {nk }k∈N be a strictly increasing sequence of positive integers (i.e., a strictly increasing sequence in N ). If we think of {nk } and {zn } as functions, then the range of the former is a subset of the domain of the latter (i.e., the indexed family {nk }k∈N is a subset of N ). Thus we may consider the composition of {zn } with {nk }, say {znk }, which is again a function of N to Z (i.e., {znk } is a sequence in Z). Since {nk } is strictly increasing, to each element of the indexed family {znk }k∈N there corresponds a unique element of the indexed family {zn }n∈N . In this case the Z-valued sequence {znk } is called a subsequence of {zn }. A sequence is a function whose domain is either N or N 0 , but a similar concept could be likewise defined for a function on any well-ordered domain. Even in this case, a function with domain Z (equipped with its natural ordering) would not be a sequence. Now recall the following string of (nonreversible) implications: well-ordered ⇒ simply ordered ⇒ lattice ⇒ directed set.
This might suggest an extension of the concept of sequence by allowing functions whose domains are directed sets. A net in a set X is a family of elements of X indexed by a directed set Γ. In other words, if Γ is a directed set and X is an arbitrary set, then an indexed family {xγ}γ∈Γ of elements of X indexed by Γ is called a net in X indexed by Γ. Examples: Every X-valued sequence {xn} is a net in X. In fact, sequences are prototypes of nets. Every X-valued function on Z (notation: {xk}k∈Z, {xk}∞k=−∞, or {xk; k = 0, ±1, ±2, ...}) is a net (such nets are sometimes called double sequences or bisequences, although they are not sequences themselves).
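The Principle of Recursive Definition from Section 1.7 translates directly into a loop. A minimal sketch (the function name is illustrative):

```python
# x1 = x and x_{n+1} = F(x_n): the unique sequence determined by F and x.
def recursive_sequence(F, x, n):
    """Return the first n terms x1, ..., xn of the recursively defined sequence."""
    terms = [x]
    for _ in range(n - 1):
        terms.append(F(terms[-1]))
    return terms

# Example on N with F(k) = 2k and x1 = 1: the sequence 1, 2, 4, 8, ...
print(recursive_sequence(lambda k: 2 * k, 1, 5))   # [1, 2, 4, 8, 16]
```

The generalization with maps Gn: X^n → X is analogous, with each new term computed from the whole list of previous terms rather than from the last one alone.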
1.8 Cardinality

Two sets, say X and Y, are said to be equivalent (denoted by X ↔ Y) if there exists a one-to-one correspondence between them. Clearly (see Problems 1.8 and 1.9), X ↔ X (reflexivity), X ↔ Y if and only if Y ↔ X (symmetry), and
X ↔ Z whenever X ↔ Y and Y ↔ Z for some set Z (transitivity). Thus, if there exists a set upon which ↔ is a relation, then it is an equivalence relation. For instance, if the notion of equivalent sets is restricted to subsets of a given set X, then ↔ is an equivalence relation on the power set ℘(X). If C = {xγ}γ∈Γ is an indexed family of distinct elements of a set X indexed by a set Γ (so that xα ≠ xβ for every α ≠ β in Γ), then C ↔ Γ (the very indexing process sets a one-to-one correspondence between Γ and C). Let N be the set of all natural numbers and, for each n ∈ N, consider the initial segment In = {i ∈ N: i ≤ n}. A set X is finite if it is either empty or equivalent to In for some n ∈ N. A set is infinite if it is not finite. If X is finite and Y is equivalent to X, then Y is finite. Therefore, if X is infinite and Y is equivalent to X, then Y is infinite. It is easy to show by induction that, for each n ∈ N, In has no proper subset equivalent to it. Thus (see Problem 1.12), no finite set has a proper subset equivalent to it. That is, if a set has a proper equivalent subset, then it is infinite. Moreover, such a subset must be infinite too (since it is equivalent to an infinite set). Example 1.F. N is infinite. Indeed, it is easy to show that N 0 is equivalent to N (the function F: N 0 → N such that F(n) = n + 1 for every n ∈ N 0 will do the job). Thus N 0 is infinite, because N is a proper subset of N 0 which is equivalent to it, and so is N. To verify the converse (i.e., to show that every infinite set has a proper equivalent subset) we apply the Axiom of Choice. Axiom of Choice. If {Xγ}γ∈Γ is an indexed family of nonempty sets indexed by a nonempty index set Γ, then there exists an indexed family {xγ}γ∈Γ such that xγ ∈ Xγ for each γ ∈ Γ. Theorem 1.3. A set is infinite if and only if it has a proper equivalent subset. Proof. We have already seen that every set with a proper equivalent subset is infinite.
To prove the converse, take an arbitrary element x0 from an infinite set X0, and an arbitrary k from N 0. The Principle of Mathematical Induction allows us to construct, for each k ∈ N 0, a finite family {Xn}k+1n=0 of infinite sets as follows. Set X1 = X0\{x0} and, for every nonnegative integer n ≤ k, let Xn+1 be recursively defined by the formula Xn+1 = Xn\{xn}, where {xn}kn=0 is a finite set of pairwise distinct elements, each xn being an arbitrary element taken from each Xn. Consider the (infinite) indexed family {Xn}n∈N 0 = ⋃k∈N 0 {Xn}k+1n=0 and use the Axiom of Choice to ensure the existence of the indexed family {xn}n∈N 0 = ⋃k∈N 0 {xn}kn=0, where each xn is arbitrarily taken from each Xn. Next consider the sets A0 = {xn}n∈N 0 ⊆ X0, A = {xn}n∈N ⊂ A0, and X = A ∪ (X0\A0) ⊂ A0 ∪ (X0\A0) = X0. Note that A0 ↔ N 0 and A ↔ N (since the elements of A0 are distinct). Thus A0 ↔ A (because N 0 ↔ N), and hence X0 ↔ X (see Problem 1.20). Conclusion: Any infinite set X0 has a proper equivalent subset (i.e., there exists a proper subset X of X0 such that X0 ↔ X).

If X is a finite set, so that it is equivalent to an initial segment In for some natural number n, then we say that its cardinality (or its cardinal number) is n. Thus the cardinality of a finite set X is just the number of elements of X (where, in this case, "numbering" means "indexing", as a finite set may be naturally indexed by an index set In). We shall use the symbol # for cardinality. Thus #In = n, and so #X = n whenever X ↔ In. For infinite sets the concept of cardinal number is a bit more complicated. We shall not define a cardinal number for an infinite set as we did for finite sets (which "number" should it be?) but define the following concept instead. Two sets X and Y are said to have the same cardinality if they are equivalent. Thus, to each set X we shall assign a symbol #X, called the cardinal number of X (or the cardinality of X), according to the following rule: #X = #Y ⟺ X ↔ Y — two sets have the same cardinality if and only if they are equivalent; otherwise (i.e., if they are not equivalent) we shall write #X ≠ #Y. We say that the cardinality of a set X is less than or equal to the cardinality of a set Y (notation: #X ≤ #Y) if there exists an injective mapping of X into Y (i.e., if there exists a subset Y′ of Y such that #X = #Y′). Equivalently, #X ≤ #Y if there exists a surjective mapping of Y onto X (see Problem 1.6). If #X ≤ #Y and #X ≠ #Y, then we shall write #X < #Y.

Theorem 1.4. (Cantor).
#X < #℘(X) for every set X.
Proof. Consider the function F: X → ℘(X) defined by F(x) = {x} for every x ∈ X, which is clearly injective. Thus #X ≤ #℘(X). Hence #X < #℘(X) if and only if #X ≠ #℘(X). Suppose #X = #℘(X) so that there is a surjective function G: X → ℘(X). Consider the set A = {x ∈ X: x ∉ G(x)} in ℘(X) and take a ∈ X such that G(a) = A (recall: G is surjective). If a ∈ A, then a ∉ G(a) and so a ∉ A, which is a contradiction. Conclusion 1: a ∉ A. On the other hand, if a ∉ A, then a ∈ G(a) and so a ∈ A, which is another contradiction. Conclusion 2: a ∈ A. Therefore {a ∉ A and a ∈ A}, which is impossible (i.e., which also is a contradiction). Final conclusion: #X ≠ #℘(X).

Let A be a subset of a set X. The characteristic function of the set A is the map χA: X → {0, 1} such that

    χA(x) = 1    if x ∈ A,
    χA(x) = 0    if x ∈ X\A.
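The diagonal set A = {x ∈ X: x ∉ G(x)} from the proof of Cantor's theorem can be exhibited for any concrete G. A sketch on a three-point set (the particular G is invented for illustration):

```python
# For ANY map G: X -> power set of X, the diagonal set A is missed by G,
# so no such G is surjective.
X = {0, 1, 2}
G = {0: set(), 1: {0, 1}, 2: {1, 2}}      # an arbitrary map from X into its power set

A = {x for x in X if x not in G[x]}       # the diagonal set
assert all(G[x] != A for x in X)          # A is not in the range of G
print(A)                                  # {0}
```

Whatever G is chosen, membership of x in A disagrees with membership of x in G(x), which is exactly the contradiction used in the proof.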
It is clear that the correspondence between the subsets of X and their characteristic functions is one to one. Hence #℘(X) coincides with the cardinality of the collection of all characteristic functions of subsets of X. More generally, let 2^X denote the collection of all maps of a set X into the set {0, 1} (i.e., set 2^X = {0, 1}^X). Theorem 1.5. #℘(X) = #2^X for every set X.
Proof. Let F be a function from the collection 2^X to the power set ℘(X) that assigns to each map ϕ: X → {0, 1} the inverse image of the singleton {1} under ϕ. That is, consider the function F: 2^X → ℘(X) defined by F(ϕ) = ϕ⁻¹({1}) for every ϕ in 2^X. Claim 1. F is surjective. Proof. If A is a subset of X, then the characteristic function of A, χA in 2^X, is such that χA⁻¹({1}) = A. Thus F(χA) = A for every A ∈ ℘(X). Therefore, ℘(X) = {F(χA): A ∈ ℘(X)} ⊆ {F(ϕ): ϕ ∈ 2^X} = R(F). Claim 2. F is injective. Proof. Take ϕ, ψ ∈ 2^X. If ϕ ≠ ψ, then ϕ(x) ≠ ψ(x) for some x ∈ X. Thus ϕ(x) = 0 and ψ(x) = 1 (or vice versa), so that x ∉ ϕ⁻¹({1}) and x ∈ ψ⁻¹({1}). Hence ϕ⁻¹({1}) ≠ ψ⁻¹({1}) so that F(ϕ) ≠ F(ψ).
Conclusion: F is a one-to-one correspondence between 2^X and ℘(X).
Although the next theorem may come as no surprise, it is all but trivial. This actually is a rewriting, in terms of the concept of cardinality, of the rather important Theorem 1.2. Theorem 1.6. (Cantor–Bernstein). Let X and Y be any sets. If #X ≤ #Y and #Y ≤ #X, then #X = #Y.
The Cantor–Bernstein Theorem exhibits an antisymmetry property. Note that reflexivity and transitivity are readily verified (see Problem 1.22) which, together with Theorem 1.6, lead to a partial ordering property. Behind the antisymmetry property of Theorem 1.6 there is in fact a simple ordering property. To establish it (Theorem 1.7 below) we shall rely on the following axiom. Zorn’s Lemma. Let X be a partially ordered set. If every simply ordered subset of X (i.e., if each chain in X) has an upper bound in X, then X has a maximal element. The label “Zorn’s Lemma” is inappropriate but has already been consecrated in the literature. It should read “Zorn’s Axiom” instead, for it really is an axiom equivalent to the Axiom of Choice. Theorem 1.7. For any two sets X and Y, either
#X ≤ #Y or #Y ≤ #X.
Proof. Consider two sets X and Y. Let I be the collection of all injective functions from subsets of X to Y. That is, I = {F ∈ Y^A: A ⊆ X and F is injective}. Recall that (Problem 1.17), as a subset of F = ⋃A∈℘(X) Y^A, I is a partially ordered set in the extension ordering, and every simply ordered subset of it has a supremum in I. Thus, by Zorn's Lemma, I contains a maximal function. Let F0: A0 → Y be a maximal function of I, where A0 ⊆ X and #F0(A0) = #A0 (since F0 is injective). Suppose A0 ≠ X and F0(A0) ≠ Y. Take x0 ∈ X\A0 and y0 ∈ Y\F0(A0), and consider the function F1: A0 ∪ {x0} → Y defined by

    F1(x) = F0(x) ∈ F0(A0)    if x ∈ A0,
    F1(x) = y0 ∈ Y\F0(A0)     if x = x0 ∈ X\A0,

which is injective (because F0 is injective and y0 ∉ F0(A0)). Since F1 ∈ I and F0 = F1|A0, it follows that F0 ≤ F1, which contradicts the fact that F0 is a maximal function of I (for F0 ≠ F1). Hence, either A0 = X or F0(A0) = Y. If A0 = X, then F0: X → Y is injective and so #X ≤ #Y. If F0(A0) = Y, then #Y = #F0(A0) = #A0 ≤ #X (for A0 ⊆ X — see Problem 1.21(a)).

We have already seen that N ↔ N 0. Thus N and N 0 have the same cardinality. It is usual to assign a special symbol ℵ0 (aleph-naught) to such a cardinal number: #N = #N 0 = ℵ0. We have also seen (cf. proof of Theorem 1.3) that, if X is an infinite set, then there exists a subset of it, say A, which is equivalent to N. Thus #N = #A ≤ #X, and hence ℵ0 is the smallest infinite cardinal number in the sense that ℵ0 ≤ #X for every infinite set X (see Problems 1.21(a) and 1.22). A set X such that #X = ℵ0 is said to be countably infinite (or denumerable). Therefore, every infinite set has a countably infinite subset. A set that is either finite or countably infinite (i.e., a set X such that #X ≤ ℵ0) is said to be countable; otherwise, it is said to be uncountable (or uncountably infinite, or nondenumerable).

Proposition 1.8.
# (X×X)
=
#X
for every countably infinite set X.
Proof. Suppose # X = # N. According to Problems 1.26, 1.23(b), and 1.25(a), we get # X ≤ # (X×X) ≤ # (N×N) = # N = # X. Hence the identity # X = # (X×X) follows by the Cantor–Bernstein Theorem.

Note that # X ≤ # (X×X) for any set X (see Problem 1.26). Moreover, it is easy to show that # X < # (X×X) whenever X is a finite set with more than one element. Thus, if a set X with more than one element is such that # X = # (X×X), then it is an infinite set. The previous proposition ensured the converse for countably infinite sets. The next theorem (which is another application of Zorn’s Lemma) ensures the converse for every infinite set. Therefore, the identity # X = # (X×X) actually characterizes the infinite sets (of any cardinality) among the sets with more than one element.
1.8 Cardinality
Theorem 1.9. If X is an infinite set, then # X = # (X×X).
Proof. First we verify the following auxiliary result.

Claim 0. Let C, D, and E be nonempty sets. If # (E×E) = # E, then # (C ∪ D) ≤ # E whenever # C ≤ # E and # D ≤ # E.

Proof. The claimed result is a straightforward application of Problems 1.26, 1.23(b), and 1.22: # (C ∪ D) ≤ # (C×D) ≤ # (E×E) = # E.

Now, back to the theorem statement. Let X be a set and let J be the collection of all injective functions from subsets of X to X×X such that the range of each function in J coincides with the Cartesian product of its domain with itself. That is,

J = {F ∈ (X×X)^A: A ⊆ X, F is injective and F(A) = A×A}.

Note that J is nonempty (at least the empty function is there). From now on suppose X is infinite. Thus X has a countably infinite subset, and so Proposition 1.8 ensures the existence of a function in J with infinite (at least countably infinite) domain. Recall that J is a partially ordered set in the extension ordering, and that every chain in J (i.e., every simply ordered subset of J) has an injective supremum (see Problem 1.17).

Claim 1. Such a supremum in fact lies in J.

Proof. Let {Fγ} be an arbitrary chain in J, and let D(Fγ) and R(Fγ) denote the domain and range of Fγ, respectively. Thus each Fγ is an injective function from Aγ to X×X, with Aγ = D(Fγ) ⊆ X and Fγ(Aγ) = Aγ×Aγ. Now let ⋃γ Fγ: ⋃γ Aγ → X×X be the supremum of {Fγ}, and note that (see Problem 1.17) (⋃γ Fγ)(⋃γ Aγ) = R(⋃γ Fγ) = ⋃γ R(Fγ) = ⋃γ (Aγ×Aγ). Clearly, ⋃γ (Aγ×Aγ) ⊆ (⋃γ Aγ)×(⋃γ Aγ). On the other hand, if {Fλ, Fμ} is an arbitrary pair from {Fγ}, then Fλ ≤ Fμ (or vice versa), so that Aλ×Aμ ⊆ Aμ×Aμ (because Aλ ⊆ Aμ), and therefore (⋃γ Aγ)×(⋃γ Aγ) ⊆ ⋃γ (Aγ×Aγ). Hence (⋃γ Fγ)(⋃γ Aγ) = (⋃γ Aγ)×(⋃γ Aγ), and so ⋃γ Fγ ∈ J.

Conclusion: Every chain in J has an upper bound (a supremum, actually) in J. Thus, according to Zorn’s Lemma, J contains a maximal element. Let F0: A0 → X×X be a maximal function of J, so that A0 ⊆ X and F0(A0) = A0×A0. Since F0 is injective,

# (A0×A0) = # A0,

where A0 is an infinite (at least countably infinite) set.

Claim 2. If # A0 < # X, then # A0 < # (X\A0).

Proof. If # (X\A0) ≤ # A0, then (cf. Problems 1.26 and 1.23(b)) # X = # (A0 ∪ X\A0) ≤ # (A0×(X\A0)) ≤ # (A0×A0) = # A0. Thus # (X\A0) ≤ # A0 implies # X ≤ # A0 (Problem 1.22). Equivalently, # A0 < # X implies # A0 < # (X\A0) by Theorem 1.7.

Note that # A0 ≤ # X (for A0 ⊆ X). Suppose # A0 < # X. In this case # A0 < # (X\A0) by Claim 2. Thus there exists a proper subset of X\A0, say A1, such that # A0 = # A1. Hence (cf. Problem 1.23(b)) # (Ai×Aj) ≤ # (A0×A0) = # A0 for all possible combinations of i, j in {0, 1}, and therefore # [(A0×A1) ∪ (A1×A0) ∪ (A1×A1)] ≤ # A0 according to Claim 0. Since the reverse inequality is trivially verified, it follows by Theorem 1.6 that # [(A0×A1) ∪ (A1×A0) ∪ (A1×A1)] = # A0. Set A = A0 ∪ A1 and observe that (A×A)\(A0×A0) = (A0×A1) ∪ (A1×A0) ∪ (A1×A1) because A0 and A1 are disjoint. Thus

# [(A×A)\(A0×A0)] = # A0 = # A1,

which ensures the existence of an injective function F1: A1 → X×X such that F1(A1) = (A×A)\(A0×A0). Now consider a function F from A to X×X defined as follows:

F(x) = F0(x) ∈ A0×A0 if x ∈ A0, and F(x) = F1(x) ∈ (A×A)\(A0×A0) if x ∈ A1 = A\A0.

F: A → X×X is injective (because F0 and F1 are injective functions with disjoint ranges) and F(A) = A×A (for F0(A0) ∪ F1(A1) = A×A). Since F ∈ J and F0 = F|A0, it follows that F0 ≤ F, which contradicts the fact that F0 is a maximal function of J (for F0 ≠ F). Therefore

# A0 = # X,

and hence (cf. Problem 1.23(b)) # (X×X) = # (A0×A0) = # A0 = # X. Conclusion: # X = # (X×X).
Theorem 1.9 is a natural extension of Proposition 1.8, which in turn generalizes Problem 1.25(a). Another important and particularly useful result along this line is given by the next theorem and its corollary.

Theorem 1.10. Let X be a set and consider an indexed family of sets {Xγ}γ∈Γ. If # Xγ ≤ # X for all γ ∈ Γ, then

# ⋃γ∈Γ Xγ ≤ # (Γ×X).
Proof. Take an indexed family of sets {Xγ}γ∈Γ and suppose there is a set X such that # Xγ ≤ # X for all γ ∈ Γ. Thus, for each γ ∈ Γ, there is a surjective function Fγ: X → Xγ of X onto Xγ. Let G: Γ×X → ⋃γ∈Γ Xγ be the function defined by G(γ, x) = Fγ(x) for every γ ∈ Γ and every x ∈ X. Take any y in ⋃γ∈Γ Xγ, so that y ∈ Xγ for some γ ∈ Γ. Since Fγ: X → Xγ is surjective, there exists x ∈ X such that y = Fγ(x). Thus y = G(γ, x); that is, y ∈ R(G). Hence G: Γ×X → ⋃γ∈Γ Xγ is surjective, and so # ⋃γ∈Γ Xγ ≤ # (Γ×X).

Corollary 1.11. A countable union of countable sets is countable.

Proof. Consider any countable family of countable sets, say {Xn}n∈N, so that # Xn ≤ # N for all n ∈ N. According to Theorem 1.10 we have # ⋃n∈N Xn ≤ # (N×N). However, # (N×N) = # N. Thus

# ⋃n∈N Xn ≤ # N.

(See Problems 1.23(b), 1.25(a), and 1.22.)
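The argument behind Corollary 1.11 is effectively algorithmic once each Xn comes with an explicit enumeration: the surjection G(n, k) = Fn(k), traversed diagonal by diagonal (n + k constant), lists the whole union. A Python sketch of this traversal (the function names and the sample family are mine, not the book's):

```python
def enumerate_union(member, diagonals):
    """List elements of the union of the sets X_n = {member(n, k): k >= 0},
    visiting index pairs (n, k) one diagonal n + k = s at a time, as the
    surjection G(n, k) = F_n(k) in the proof of Theorem 1.10 suggests."""
    seen, out = set(), []
    for s in range(diagonals):
        for n in range(s + 1):
            x = member(n, s - n)
            if x not in seen:        # record each element of the union once
                seen.add(x)
                out.append(x)
    return out

# Sample family: X_n = {2**n * (2k + 1): k >= 0}.  By unique factorization
# these sets partition the positive integers, so the traversal eventually
# reaches every one of them.
hits = enumerate_union(lambda n, k: 2**n * (2*k + 1), 40)
print(set(range(1, 21)) <= set(hits))  # True
```

Any finite number of diagonals visits only finitely many pairs, so the sketch only ever inspects an initial segment of the enumeration.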
1.9 Remarks

We assume the reader is familiar with the definition of an interval of the real line R. An interval of R is nondegenerate if it does not collapse to a singleton. It is easy to show that the cardinality of the real line R is the same as the cardinality of any nondegenerate interval of R. A typical example: The function F: [0, 1] → [−1, 1] given by F(x) = 2x − 1 for every x ∈ [0, 1] is injective and surjective, and so is the function G: (−1, 1) → R defined by G(x) = (1 − |x|)⁻¹x for every x ∈ (−1, 1). Thus # [0, 1] = # [−1, 1] and # (−1, 1) = # R. Since (−1, 1) ⊂ [−1, 1] ⊂ R, it follows that # R = # (−1, 1) ≤ # [−1, 1] ≤ # R, and hence (Cantor–Bernstein Theorem) # [−1, 1] = # R. Thus,

# [0, 1] = # (−1, 1) = # [−1, 1] = # R.
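The two maps in this example are easy to sanity-check numerically. Below, G is the bijection of (−1, 1) onto R from the text; the inverse formula y ↦ y/(1 + |y|) is my addition (the text does not spell it out):

```python
def G(x):
    # the text's bijection of (-1, 1) onto R: G(x) = (1 - |x|)^(-1) x
    return x / (1 - abs(x))

def G_inv(y):
    # candidate inverse (not stated in the text): y / (1 + |y|)
    return y / (1 + abs(y))

samples = [-0.99, -0.5, 0.0, 0.25, 0.9]
print(all(abs(G_inv(G(x)) - x) < 1e-12 for x in samples))  # True
```

The round-trip check is of course no proof of bijectivity, but it makes the formulas concrete.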
We can also prove that # R = # 2^N. Indeed, take the function F: 2^N → [0, 1] that assigns to each sequence {αn} in 2^N (i.e., to each sequence {αn}n∈N with values either 0 or 1) a real number in [0, 1], in ternary expansion, as follows:

F({αn}) = 0.(2α1)(2α2)... for every {αn} ∈ 2^N.

It can be shown that F is injective. Reason: Every real number x ∈ [0, 1] can be written as Σ_{n=1}^{∞} βn p^{−n} for a given positive integer p greater than 1 (i.e., for a given base p). In this case 0.β1β2... is a p-ary expansion of x, where {βn}n∈N is a sequence of nonnegative integers ranging from 0 to p − 1. That is, {βn}n∈N is a sequence of digits with respect to the base p — e.g., if p = 2, 3, or 10, then 0.β1β2... is a binary, ternary, or denary (i.e., decimal) expansion of x, respectively. A p-ary expansion (with respect to a base p) is not unique — e.g., 0.499... and 0.500... are decimal expansions for x = 1/2. However, if two p-ary expansions of x differ, then the absolute difference between the first digits in which they differ is equal to 1. Hence, if we take a p-ary expansion whose digits are either 0 or 2 (as we did), then it is unique. (This can only be done for a p-ary expansion with respect to a base p ≥ 3, so that a ternary expansion is enough.) Thus F is injective. Therefore, # 2^N ≤ # [0, 1].

On the other hand, let G: 2^N → [0, 1] be the function that assigns to each sequence {αn} in 2^N a real number in [0, 1], in binary expansion, as follows:

G({αn}) = 0.α1α2... for every {αn} ∈ 2^N.

It can also be shown that G is surjective. Reason: Every x ∈ [0, 1] can be written as Σ_{n=1}^{∞} αn 2^{−n}, so that 0.α1α2... is a binary expansion of it for some sequence {αn} ∈ 2^N. Thus G is surjective. Therefore, # [0, 1] ≤ # 2^N. Hence # [0, 1] = # 2^N by the Cantor–Bernstein Theorem, and so # R = # 2^N (since # [0, 1] = # R).
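The injectivity of F can be spot-checked on finite prefixes: distinct 0–1 strings give distinct ternary values whose digits are 0 and 2, precisely the case in which no expansion ambiguity arises. A small sketch using exact rationals (the helper name is mine):

```python
from fractions import Fraction
from itertools import product

def ternary_value(bits):
    """F({a_n}) = 0.(2a_1)(2a_2)... in base 3; every digit is 0 or 2,
    which is exactly the case the text argues gives a unique expansion."""
    return sum(Fraction(2 * a, 3**n) for n, a in enumerate(bits, start=1))

values = [ternary_value(bits) for bits in product((0, 1), repeat=8)]
print(len(set(values)) == 2**8)  # True: no two length-8 prefixes collide
```

Such a finite check illustrates, but of course does not prove, the injectivity of F on all of 2^N.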
Using Theorems 1.4 and 1.5 we may conclude that

# N < # R.
Such a fundamental result can also be derived by the celebrated Cantor’s diagonal procedure as follows. Clearly, # N ≤ # R (because N ⊂ R). Suppose # N = # R. This implies that # N = # [0, 1] (recall: # [0, 1] = # R). Thus the interval [0, 1] can be indexed by N so that [0, 1] = {xn}n∈N. Write each xn in decimal expansion: xn = 0.αn1αn2..., where each αnk (k ∈ N) is a nonnegative integer between 0 and 9. Now consider the point x ∈ [0, 1] with the following decimal expansion: x = 0.α1α2..., where, again, each αn (n ∈ N) is a nonnegative integer between 0 and 9 but α1 ≠ α11, α2 ≠ α22, and so on. That is, αn ≠ αnn for each n ∈ N (e.g., take αn diametrically opposite to αnn so that, for each n ∈ N, αn = αnn + 5 if 0 ≤ αnn ≤ 4 or αn = αnn − 5 if 5 ≤ αnn ≤ 9). Thus x ≠ xn for every n ∈ N. Hence x ∉ {xn}n∈N = [0, 1], which is a contradiction. Therefore # N ≠ # R. Equivalently (since # N ≤ # R), # N < # R.
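The diagonal recipe (replace the n-th digit of the n-th expansion by its “diametrically opposite” digit) is concrete enough to run on a finite list of digit strings; a sketch, with a made-up sample list:

```python
def diagonal_escape(xs):
    """Given a list of equal-length decimal digit strings, return a digit
    string differing from the n-th one in its n-th digit, using the text's
    rule: replace digit d by d + 5 if d <= 4, and by d - 5 otherwise."""
    out = []
    for n, x in enumerate(xs):
        d = int(x[n])
        out.append(str(d + 5 if d <= 4 else d - 5))
    return "".join(out)

xs = ["141592", "718281", "414213", "302585", "693147", "577215"]
x = diagonal_escape(xs)
print(x)                                              # 669090
print(all(x[n] != xs[n][n] for n in range(len(xs))))  # True
```

The constructed string differs from every listed string, which is the finite shadow of Cantor’s argument.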
We have denoted # N by ℵ0. Let us now denote # 2^N by 2^ℵ0, so that

# N = ℵ0 < 2^ℵ0 = # R.

Cantor conjectured in 1878 that there is no cardinal number between ℵ0 and 2^ℵ0. Such a conjecture is called the Continuum Hypothesis.
Continuum Hypothesis. There is no set whose cardinality is greater than # N and smaller than # R.

The Generalized Continuum Hypothesis is the conjecture that naturally generalizes the Continuum Hypothesis.

Generalized Continuum Hypothesis. For any infinite set X, there is no cardinal number between # X and # 2^X.

There are several different axiomatic set theories, each based on a somewhat different axiom system. The most popular is probably the axiom system ZFC. It comprises the axiom system ZF (“Z” for Zermelo and “F” for Fraenkel) plus the Axiom of Choice. The axioms of ZF are listed below.

Axiom of Empty Set: There exists the empty set ∅.
Axiom of Extension: {x ∈ A if and only if x ∈ B} defines A = B.
Axiom of Specification: There exists the set {x ∈ A: P(x)}.
Axiom of Pairing: There exists the set {a, b} = {x: x = a or x = b}.
Axiom of Union: There exists the set ⋃_{A∈S} A = {x: x ∈ A for some A ∈ S}.
Axiom of Power Set: There exists the set ℘(A) = {B: x ∈ B implies x ∈ A}.
Axiom of Replacement: There exists the set Fγ = {x: Pγ(x)} for each γ ∈ Γ.
Axiom of Infinity: There exists an infinite set.
Axiom of Foundation: There exists A ∈ S ≠ ∅ such that A ∩ S = ∅.
The Axiom of Empty Set states that there is a unique set ∅ such that, for every x, x ∉ ∅. After defining inclusion, the Axiom of Extension ensures that A = B if and only if {A ⊆ B and B ⊆ A}. In the Axiom of Specification (also called Axiom of Separation), P( ) is a meaningful predicate for elements of a set A. The Axiom of Pairing simply states the existence of a pair (of elements, which may be sets themselves), and the Axiom of Union states the existence of unions for every collection of sets. After defining inclusion, the Axiom of Power Set states that {B: B ⊆ A} is a set: the collection of all subsets of a set is itself a set. The Axiom of Replacement (also called Axiom of Substitution) says that, if Pγ( ) is a meaningful predicate for each γ in a set Γ such that each {x: Pγ(x)} is a set, then there exists a function F whose domain is Γ that assigns the set Fγ = {x: Pγ(x)} to each γ ∈ Γ. The Axiom of Infinity is equivalent to saying that N is a set. The Axiom of Foundation (also called Axiom of Regularity or Axiom of Restriction) says that if S ≠ ∅, then there exists A ∈ S such that B ∉ A for every B ∈ S. The axiom system ZF contains some redundancies (i.e., ZF is not an independent system of axioms). For instance, the Axiom of Empty Set may be viewed as a consequence of the Axiom of Specification. Indeed, if there exists a set A, then the set {x ∈ A: x ≠ x} is empty.
Observe that if {A: A ∉ A} is a set, say S, then S ∈ S implies S ∉ S; a contradiction (and S ∉ S implies S ∈ S; another contradiction). Therefore, {A: A ∉ A} is not a set — such a “thing” is too large to be a set: there is no set of all sets that are not members of themselves. Applying the Axiom of Specification we may consider the set R = {A ∈ S: A ∉ A} for any set S. If R ∈ S and R ∈ R, then R ∉ R; a contradiction. If R ∈ S and R ∉ R, then R ∈ R; another contradiction. Thus R ∉ S. Since S is an arbitrary set, we conclude that there exists something (viz., R) that does not belong to anything (i.e., that does not belong to any S). This is sometimes stated by saying that there is no universe. The above arguments were known (in the pre-axiomatic era) as the Russell paradox. It is worth noticing that the arguments in the preceding paragraph have nothing to do with the existence (or nonexistence) of a set A such that A ∈ A. However, it is natural to ask at this point whether there is any set A such that A ∈ A. The Axiom of Foundation is very little used, but it provides a way (set S = {A}) to ensure that no set is a member of itself. That is, A ∉ A always. The Axiom of Choice actually is a genuine axiom to be added to ZF. Indeed, Gödel proved in 1939 that the Axiom of Choice is consistent with ZF, and Cohen proved in 1963 that the Axiom of Choice is independent of ZF. The situation of the Continuum Hypothesis with respect to ZFC is somewhat similar to that of the Axiom of Choice with respect to ZF, although the Continuum Hypothesis itself is not as primitive as the Axiom of Choice (even if the Axiom of Choice might be regarded as not primitive enough). Gödel proved in 1939 that the Generalized Continuum Hypothesis is consistent with ZFC, and Cohen proved in 1963 that the denial of the Continuum Hypothesis also is consistent with ZFC.
Thus the Continuum Hypothesis and the Generalized Continuum Hypothesis are consistent with ZFC and also independent of ZFC: neither of them can be proved or disproved on the basis of ZFC alone (they are undecidable statements in ZFC). The Generalized Continuum Hypothesis in fact is stronger than the Axiom of Choice: Sierpinski showed in 1947 that the Generalized Continuum Hypothesis implies the Axiom of Choice. We have already observed that the Axiom of Choice and Zorn’s Lemma are equivalent. There is a myriad of axioms equivalent to the Axiom of Choice. Let us mention just two of them.

Hausdorff Maximal Principle. Every partially ordered set contains a maximal chain (i.e., a maximal simply ordered subset).

Zermelo Well-Ordering Principle. Every set may be well ordered.

In particular, the set R of all reals may be well ordered. This is a pure existence result, not exhibiting (either constructing or even defining) a well-ordering of R. Indeed, Feferman showed in 1965 that no defined partial ordering can be proved in ZFC to well-order the set R.
If X and Y are any sets, properly well ordered, and if there exists a one-to-one order-preserving correspondence between them (i.e., an injective and surjective mapping Φ: X → Y such that x1 < x2 in (X, ≤) if and only if Φ(x1) < Φ(x2) in (Y, ≤)), then X and Y are said to have the same ordinal number. If two well-ordered sets have the same ordinal number, then they have the same cardinal number. However, unlike the notion of cardinal number, the notion of ordinal number depends on the well-orderings that well-order the sets.

Proposition 1.12. There is an uncountable set X, well-ordered by a relation ≤ on it, with the following properties: X has a greatest element Ω, and the set {x ∈ X: x < z} is countable for every z in X\{Ω}.

Proof. Let Y be an uncountable set. By the Well-Ordering Principle, there exists a well-ordering of Y. Take ζ not in Y, set Z = Y ∪ {ζ}, and extend the well-ordering of Y to Z by setting y < ζ for every y ∈ Y. Consider the set

A = {α ∈ Z: {z ∈ Z: z < α} is an uncountable set}.

A is nonempty (ζ ∈ A because Y = {z ∈ Z: z < ζ} is uncountable), and hence it has a minimum (since ∅ ≠ A ⊆ Z and Z is well-ordered). Set Ω = min A, so that X = {z ∈ Z: z ≤ Ω} is the required set.

Moreover, it can be shown that such a well-ordered set X is unique in the sense that, if Y is any well-ordered set with the same properties as X, then there exists a one-to-one order-preserving correspondence between X and Y (i.e., then X and Y have the same ordinal number). The greatest (or the last) element Ω in X is called the least or first uncountable ordinal, and the elements x of X such that x < Ω are called countable ordinals. The greatest elements of the finite subsets of X are called finite ordinals. If ω is the first infinite ordinal (i.e., the least nonfinite ordinal), then the set {x ∈ X: x < ω} of all finite ordinals and the set of all natural numbers N (equipped with its natural ordering) have the same ordinal number.
It is usual to assign the symbol ω as the ordinal number of any well-ordered set that is in a one-to-one order-preserving correspondence with N .
Suggested Reading

Binmore [1], Brown and Pearcy [2], Cohen [2], Crossley et al. [1], Dugundji [1], Fraenkel, Bar-Hillel, and Levy [1], Halmos [3], Kelley [1], Kolmogorov and Fomin [1], Moore [1], Royden [1], Simmons [1], Suppes [1], Vaught [1], Wilder [1], Yandell [1].
Problems

Problem 1.1. Let A, B, and C be arbitrary sets. Prove the assertions below.
(a) (A\B) ∪ (B\A) = (A ∪ B)\(A ∩ B) and A ∩ B = (A ∪ B)\(A △ B), where A △ B denotes the symmetric difference of A and B.
(b) (A △ B) ∪ (B △ C) = (A ∪ B ∪ C)\(A ∩ B ∩ C).
(c) De Morgan laws: If A and B are subsets of X, then X\(A ∪ B) = (X\A) ∩ (X\B) and X\(A ∩ B) = (X\A) ∪ (X\B).

Problem 1.2. Consider a function F: X → Y from a set X to a set Y. Let A, A1, and A2 be arbitrary subsets of X, and let B, B1, and B2 be arbitrary subsets of Y. Verify the following propositions.
(a) F(X)\F(A) ⊆ F(X\A).
(b) F⁻¹(Y\B) = X\F⁻¹(B).
(c) A1 ⊆ A2 =⇒ F(A1) ⊆ F(A2).
(d) B1 ⊆ B2 =⇒ F⁻¹(B1) ⊆ F⁻¹(B2).
(e) F(A1 ∪ A2) = F(A1) ∪ F(A2).
(f) F(A1 ∩ A2) ⊆ F(A1) ∩ F(A2).
(g) F⁻¹(B1 ∪ B2) = F⁻¹(B1) ∪ F⁻¹(B2).
(h) F⁻¹(B1 ∩ B2) = F⁻¹(B1) ∩ F⁻¹(B2).
(i) A ⊆ F⁻¹(F(A)).
(j) F(F⁻¹(B)) ⊆ B.

Problem 1.3. Consider the setup of Problem 1.2. Show that
(a) F is injective if and only if the inverse image under F of each singleton in R(F) is a singleton in X;
(b) F is injective if and only if F(A1 ∩ A2) = F(A1) ∩ F(A2) for every A1, A2 ⊆ X;
(c) F is injective if and only if the images of disjoint sets in X are disjoint sets in Y;
(d) F is injective if and only if A = F⁻¹(F(A)) for every A ⊆ X;
(e) F is surjective if and only if the inverse image under F of each nonempty subset of Y is a nonempty subset of X;
(f) F is surjective if and only if F(F⁻¹(B)) = B for every B ⊆ Y.

Problem 1.4. Verify that a function F: X → X is idempotent if and only if the range of F coincides with the set of all fixed points of F.
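For finite sets, the propositions of Problem 1.2 can be spot-checked mechanically. A sketch (the helper names `image` and `preimage`, and the sample function, are mine); note that the image of an intersection is an inclusion that can be strict when F is not injective:

```python
def image(F, A):
    return {F(x) for x in A}

def preimage(F, X, B):
    return {x for x in X if F(x) in B}

X = {0, 1, 2, 3}
F = lambda x: x % 2              # not injective, so the inclusion can be strict
A1, A2 = {0, 1}, {2, 3}
B1, B2 = {0}, {1}

# F(A1 ∩ A2) ⊆ F(A1) ∩ F(A2) -- strict here, since A1 ∩ A2 = ∅
print(image(F, A1 & A2) < image(F, A1) & image(F, A2))                     # True
# preimages respect unions exactly
print(preimage(F, X, B1 | B2) == preimage(F, X, B1) | preimage(F, X, B2))  # True
```

Trying an injective F in the same harness makes the first inclusion an equality, matching Problem 1.3(b).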
Problem 1.5. A function L: Y → X is said to be a left inverse of a function F: X → Y if LF = IX, the identity on X. F is injective if and only if it has a left inverse. The restriction of a left inverse of F to the range of F is unique and coincides with the inverse of F on R(F) (recall: an injective function has an inverse on its range). Prove.

Problem 1.6. A function R: Y → X is said to be a right inverse of a function F: X → Y if FR = IY, the identity on Y. Show that F is surjective if and only if it has a right inverse. Note that any right inverse of F is injective (for it has a left inverse). Similarly, any left inverse of F is surjective (for it has a right inverse). Conclusion: There exists an injective mapping of X into Y if and only if there exists a surjective mapping of Y onto X.

Problem 1.7. A function F: X → Y is injective and surjective if and only if there is a function G: Y → X such that GF = IX (the identity on X) and FG = IY (the identity on Y). Prove the above proposition and show that, in this case, this function G is unique and coincides with the inverse of F.

Problem 1.8. If F: X → Y is an invertible function, then so is its inverse F⁻¹: Y → X and (F⁻¹)⁻¹ = F. Moreover, if A ⊆ X, then F|A: A → F(A) is invertible and (F|A)⁻¹ = F⁻¹|F(A). Prove.

Problem 1.9. Verify the following propositions.
(a) The composition of two injective functions is an injective function.
(b) The composition of two surjective functions is a surjective function.
(c) The composition of two invertible functions is an invertible function.
Note: When we speak of the composition G ◦ F of two functions F and G, it is assumed that the domain of G includes the range of F.

Problem 1.10. Let F: X → Y and G: Y → Z be invertible mappings.
(a) Show that (G ◦ F)⁻¹ = F⁻¹ ◦ G⁻¹.
That is, using the simplified notation GF = G ◦ F, this means that if F and G are invertible, then GF: X → Z is invertible and the inverse (GF)⁻¹: Z → X is given by (GF)⁻¹ = F⁻¹G⁻¹. Consider the nth power Fⁿ: X → X of F: X → X (i.e., the composition of F with itself n times) for any integer n ≥ 0 (with F⁰ = I, the identity on X).
(b) Show by induction that if F is invertible, then so is Fⁿ for every n ≥ 0, and (Fⁿ)⁻¹ = (F⁻¹)ⁿ for each n ≥ 0. In this case we write F⁻ⁿ = (F⁻¹)ⁿ.

Problem 1.11. A function F of a set X into itself is an involution if F² = I.
(a) Verify that an involution is precisely an invertible function on X that coincides with its inverse: F = F⁻¹. Show that the composition of two involutions is again an involution if and only if they commute.
A sequence {Fn}n≥0 of functions of a set X into itself has the semigroup property if F0 = I and Fm+n = Fm ◦ Fn (i.e., Fm+n = FmFn) for all m, n ≥ 0.
(b) Show that the only sequences with the semigroup property are the power sequences {Fⁿ}n≥0 for some F: X → X (where Fⁿ = F ··· F, n times). Hint: Set F = F1. If Fm+n = Fm ◦ Fn, then show by induction that Fmn = (Fm)ⁿ = (Fn)ᵐ for every m ≥ 0 and n ≥ 0. Thus Fn = Fⁿ.

Problem 1.12. Let F: X → Y be a one-to-one mapping of a set X onto a set Y. Let G: X → A be a one-to-one mapping of X onto a subset A of X. Prove: If A is a proper subset of X, then F(A) is a proper subset of Y. Apply this to show that a finite set has no proper equivalent subset. Hint: Consider the commutative diagram

          F⁻¹
    X ←──────── Y
    │           │
   G│           │H
    ↓           ↓
    A ────────→ F(A)
          F|A

Problem 1.13. Let X be a set with more than one element, and consider the following relations R1, R2, R3, R4, and R5 on the power set ℘(X) of X. For every pair {A, B} of subsets of X,

A R1 B if A △ B = ∅ (i.e., if A = B),
A R2 B if A △ B is finite,
A R3 B if A and B are singletons,
A R4 B if A ⊆ B or B ⊆ A (i.e., if A\B = ∅ or B\A = ∅),
A R5 B if A ⊆ B (i.e., if A\B = ∅).

Show that the table below properly classifies these relations according to reflexivity, transitivity, symmetry, and antisymmetry.

         Reflexive   Transitive   Symmetric   Antisymmetric
    R1       √           √            √             √
    R2       √           √            √
    R3                   √            √
    R4       √                        √
    R5       √           √                          √

Hint: To verify that R2 is transitive use Problem 1.1(b) and recall that the union of two sets is finite if and only if each of them is finite.
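The classification in Problem 1.13 can be verified by brute force over the power set of a small X; a sketch for R4 and R5 (the checker and its output format are mine):

```python
from itertools import combinations

def powerset(X):
    return [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]

def classify(P, rel):
    """Return (reflexive, transitive, symmetric, antisymmetric) for the
    relation `rel` on the collection P."""
    refl  = all(rel(A, A) for A in P)
    trans = all(rel(A, C) for A in P for B in P for C in P
                if rel(A, B) and rel(B, C))
    sym   = all(rel(B, A) for A in P for B in P if rel(A, B))
    anti  = all(A == B for A in P for B in P if rel(A, B) and rel(B, A))
    return refl, trans, sym, anti

P = powerset({1, 2, 3})
R4 = lambda A, B: A <= B or B <= A
R5 = lambda A, B: A <= B
print(classify(P, R4))  # (True, False, True, False): reflexive, symmetric
print(classify(P, R5))  # (True, True, False, True): a partial ordering
```

A failed property over a small X disproves it in general, while a passed one is only evidence; the proofs in the problem are still required.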
Problem 1.14. Consider the functions Φ: ℘(X) → ℘(X) and H: X → Y as in the proof of Theorem 1.2. Show that Φ is an increasing function with respect to the inclusion ordering of ℘(X), and that H is injective and surjective.

Problem 1.15. Let Y^X denote the collection of all functions from a set X to a set Y. Suppose Y is partially ordered and let ≤ be a partial ordering of Y. Now consider a relation on Y^X, also denoted by ≤ and defined as follows. For any pair {F, G} of functions in Y^X,

F ≤ G if F(x) ≤ G(x) for every x ∈ X.

(a) Show that ≤ is a partial ordering of Y^X.
(b) Prove: If (Y, ≤) is a lattice, then (Y^X, ≤) is a lattice. Hint: Suppose (Y, ≤) is a lattice. Take F and G from Y^X and let U and V be functions in Y^X defined as follows. U(x) = F(x) ∨ G(x) and V(x) = F(x) ∧ G(x) for every x ∈ X. Show that F ∨ G = U and F ∧ G = V.

Problem 1.16. Set Y = {0, 1} and let χA: X → {0, 1} be the characteristic function of an arbitrary subset A of a set X. Thus, for every A ∈ ℘(X), χA lies in Y^X = 2^X. Let A and B be subsets of X, and consider the partial ordering of 2^X introduced in Problem 1.15. Prove the following propositions.
(a) χA ≤ χB ⇐⇒ A ⊆ B.
(b) χA ∨ χB = χA∪B.
(c) χA ∧ χB = χA∩B = χAχB.
(d) A ∩ B = ∅ ⇐⇒ χA∩B = 0 ⇐⇒ χA∪B = χA + χB.
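Items (b) and (c) of Problem 1.16 are pointwise identities, so they can be checked directly on a small X; a sketch (names mine):

```python
def chi(A):
    # characteristic function of A, as a 0/1-valued map
    return lambda x: 1 if x in A else 0

X = range(6)
A, B = {0, 1, 2}, {2, 3}
join = lambda x: max(chi(A)(x), chi(B)(x))   # pointwise supremum in 2^X
meet = lambda x: chi(A)(x) * chi(B)(x)       # pointwise infimum = product
print(all(join(x) == chi(A | B)(x) for x in X))  # (b) holds: True
print(all(meet(x) == chi(A & B)(x) for x in X))  # (c) holds: True
```

The same harness exhibits a failure of χA∪B = χA + χB whenever A ∩ B ≠ ∅, as item (d) predicts.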
Problem 1.17. Let F be the collection of all functions from subsets of a set X to a set Y. That is,

F = {F ∈ Y^A: A ⊆ X} = ⋃_{A∈℘(X)} Y^A.

The unique function whose domain is the empty set ∅ is called the empty function in F. Consider the following relation ≤ on F. For any pair {F, G} of functions in F, F ≤ G if F is a restriction of G (equivalently, if G is an extension of F). That is, F ≤ G if and only if F: A → Y, G: B → Y, A ⊆ B ⊆ X, and F = G|A (i.e., F(x) = G(x) for every x ∈ A).
(a) Show that the relation ≤ on F is a partial ordering (called the extension ordering).
A function V: C → Y in F is a lower bound for a pair of functions F: A → Y and G: B → Y in F if C ⊆ B, C ⊆ A, and V = F|C = G|C. A function U: D → Y in F is an upper bound for the pair {F, G} if A ⊆ D, B ⊆ D, U|A = F, and U|B = G.
(b) Show that every pair of functions F and G in F has an infimum F ∧ G in F. (In particular, if the domain A of F and the domain B of G are disjoint, then F ∧ G is the empty function — which function is F ∧ G if A and B are not disjoint but F(A) and G(B) are disjoint?)
Let {Fγ} be an indexed family of functions in F. For each Fγ let D(Fγ) and R(Fγ) denote the domain and range of Fγ, respectively. Prove the following propositions.
(c) If {Fγ} is simply ordered (i.e., if {Fγ} is a chain in F), then it has a supremum ⋃γ Fγ in F, whose domain and range are D(⋃γ Fγ) = ⋃γ D(Fγ) and R(⋃γ Fγ) = ⋃γ R(Fγ). Moreover, if each Fγ is injective, then so is ⋃γ Fγ.
(d) If the domains {D(Fγ)} are pairwise disjoint, then {Fγ} has a supremum ⋃γ Fγ: ⋃γ D(Fγ) → ⋃γ R(Fγ) in F. Moreover, if the ranges {R(Fγ)} are also pairwise disjoint, and if each Fγ is injective, then so is ⋃γ Fγ.

Problem 1.18. Let {Xn}n∈N be a sequence of sets. Set Y1 = X1 and

Yn+1 = ⋃_{k=1}^{n+1} Xk \ ⋃_{k=1}^{n} Xk

for each n ∈ N. Show by induction that

⋃_{k=1}^{n} Yk = ⋃_{k=1}^{n} Xk

for every n ∈ N. (Hint: A ∪ (B\A) = A ∪ B.) Verify that Yn ⊆ Xn for each n ∈ N, and Ym ∩ Yn = ∅ for every pair of distinct natural numbers m and n. Moreover, show that

⋃_{n=1}^{∞} Yn = ⋃_{n=1}^{∞} Xn.
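The construction above is a one-pass procedure: each new set is stripped of everything already covered by its predecessors. A sketch for finitely many sets (the function name is mine):

```python
def disjointify(sets):
    """Y_1 = X_1 and Y_{n+1} = (X_1 ∪ ... ∪ X_{n+1}) \\ (X_1 ∪ ... ∪ X_n):
    each Y_n is X_n minus everything appearing in earlier sets."""
    seen, Ys = set(), []
    for Xn in sets:
        Yn = set(Xn) - seen
        Ys.append(Yn)
        seen |= Yn
    return Ys

Xs = [{1, 2, 3}, {2, 3, 4}, {1, 5}]
Ys = disjointify(Xs)
print(Ys)  # [{1, 2, 3}, {4}, {5}]: pairwise disjoint, same union as Xs
```

The output illustrates the three claims of the problem: Yn ⊆ Xn, the Yn are pairwise disjoint, and the unions agree.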
The sequence {Yn}n∈N is referred to as the disjointification of {Xn}n∈N.

Problem 1.19. Let ℘(X) be the power set of a set X and consider the inclusion ordering of ℘(X). Let {Xn}n∈N be an arbitrary ℘(X)-valued sequence (i.e., a sequence of subsets of a given set X). Recall that, for each n ∈ N, {Xk}k≥n is an indexed family of subsets of X, and hence a subset of the complete lattice ℘(X). Let inf_{n≤k} Xk and sup_{n≤k} Xk denote inf{Xk}k≥n and sup{Xk}k≥n, respectively, and set

Yn = inf_{n≤k} Xk = ⋂_{k=n}^{∞} Xk and Zn = sup_{n≤k} Xk = ⋃_{k=n}^{∞} Xk,

so that {Yn}n∈N and {Zn}n∈N are ℘(X)-valued sequences as well.
(a) Verify that {Yn}n∈N is an increasing sequence and {Zn}n∈N is decreasing.
The union ⋃_{n=1}^{∞} Yn and the intersection ⋂_{n=1}^{∞} Zn, which are elements of ℘(X), are usually denoted by

⋃_{n=1}^{∞} Yn = lim inf_n Xn and ⋂_{n=1}^{∞} Zn = lim sup_n Xn,

called limit inferior and limit superior of {Xn}n∈N, respectively. Show that
(b) lim inf_n Xn ⊆ lim sup_n Xn.
If lim inf_n Xn = lim sup_n Xn, then the sequence {Xn}n∈N is said to converge to the limit

lim_n Xn = lim inf_n Xn = lim sup_n Xn.

Prove the following propositions.
(c) If {Xn}n∈N is an increasing sequence, then Yn = Xn for each n ∈ N and Zn = ⋃_{m=1}^{∞} Xm = sup_m Xm for every n ∈ N, so that

lim inf_n Xn = lim sup_n Xn = sup_n Xn.

(d) If {Xn}n∈N is itself decreasing, then Yn = ⋂_{m=1}^{∞} Xm = inf_m Xm for every n ∈ N and Zn = Xn for each n ∈ N, so that

lim inf_n Xn = lim sup_n Xn = inf_n Xn.

Therefore, an increasing sequence of sets converges to its union (sup_n Xn = ⋃_n Xn) and, dually, a decreasing sequence of sets converges to its intersection (inf_n Xn = ⋂_n Xn), and so every monotone sequence of sets converges. Thus, since {Yn}n∈N and {Zn}n∈N are always increasing and decreasing, respectively, they do converge: {Yn}n∈N converges to its union ⋃_{n=1}^{∞} Yn = ⋃_{n=1}^{∞} ⋂_{k=n}^{∞} Xk, and {Zn}n∈N converges to its intersection ⋂_{n=1}^{∞} Zn = ⋂_{n=1}^{∞} ⋃_{k=n}^{∞} Xk.
(e) Verify the following identities.

lim inf_n Xn = lim_n inf_{n≤k} Xk and lim sup_n Xn = lim_n sup_{n≤k} Xk.

(f) Now show that

X\lim sup_n Xn = lim inf_n (X\Xn) and X\lim inf_n Xn = lim sup_n (X\Xn).

Thus a sequence {Xn}n∈N converges if and only if the sequence of its complements {X\Xn}n∈N converges and, in this case,

lim_n (X\Xn) = X\lim_n Xn.
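When the sequence {Xn} is eventually periodic, the tail intersections and unions stabilize, so lim inf and lim sup reduce to an intersection and a union over one period; this reduction is my simplification, not part of the problem. A sketch:

```python
def lim_inf(period):
    # ⋂ over one period of the tail: the elements lying in all but
    # finitely many X_n
    return set.intersection(*map(set, period))

def lim_sup(period):
    # ⋃ over one period of the tail: the elements lying in infinitely
    # many X_n
    return set.union(*map(set, period))

# X_n alternates between {1, 2} and {2, 3}
period = [{1, 2}, {2, 3}]
print(lim_inf(period))                     # {2}
print(lim_inf(period) <= lim_sup(period))  # True, as part (b) requires
```

Since {2} is strictly smaller than {1, 2, 3}, this alternating sequence does not converge in the sense defined above.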
Problem 1.20. Let A and B be subsets of the sets X and Y. Show that

A ↔ B and X\A ↔ Y\B imply X ↔ Y.

(Warning: X ↔ Y and A ↔ B do not imply X\A ↔ Y\B.)

Problem 1.21. Let A and B be any sets. Prove that
(a) A ⊆ B implies # A ≤ # B (hint: inclusion map),
(b) A ⊆ B, B is finite, and # A = # B imply A = B;
and show that assertion (b) does not hold if B is infinite.

Problem 1.22. For any sets A, B, and C verify that

# A ≤ # B ≤ # C implies # A ≤ # C.

Moreover, if # A ≤ # B < # C or # A < # B ≤ # C, then # A < # C.
Hint: Cantor–Bernstein Theorem.

Problem 1.23. Suppose the sets A, B, C, and D are such that

# A ≤ # C and # B ≤ # D.

Prove the following propositions.
(a) # (A ∪ B) ≤ # (C ∪ D) whenever C ∩ D = ∅.
(b) # (A×B) ≤ # (C×D).
Moreover, if # A = # C and # B = # D, then the cardinalities in (b) coincide.

Problem 1.24. Let Y^X denote the collection of all mappings of a set X into a set Y. Show that, if Y has more than one element, then

# ℘(X) = # 2^X ≤ # Y^X for every set X.

(In fact, # ℘(X) = # Y^X if X is infinite and 2 ≤ # Y ≤ # X.)
Problem 1.25. Let N, Z, and Q have their standard meanings and set ℵ0 = # N as usual. Verify the following identities.
(a) # (N×N) = ℵ0.
Hint: The function F: N×N → N defined, for each pair m, n ∈ N, by F(m, n) = (m+n−1)(m+n−2)/2 + m is injective and surjective. The array

    n↑
    5 | 11  ·  ·  ·  ·
    4 |  7 12  ·  ·  ·
    3 |  4  8 13  ·  ·
    2 |  2  5  9 14  ·
    1 |  1  3  6 10 15
        1  2  3  4  5   m→

may be suggestive.
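The hint's function F can be tested directly: the pairs on the first few diagonals (m + n constant) should be sent bijectively onto an initial segment of N. A sketch:

```python
def F(m, n):
    # the hint's pairing: F(m, n) = (m+n-1)(m+n-2)/2 + m
    return (m + n - 1) * (m + n - 2) // 2 + m

pairs = [(m, n) for m in range(1, 30) for n in range(1, 30) if m + n <= 30]
values = sorted(F(m, n) for m, n in pairs)
# the 435 pairs on diagonals m + n = 2, ..., 30 hit 1, ..., 435 exactly once
print(values == list(range(1, len(pairs) + 1)))  # True
```

Each diagonal m + n = s contributes the consecutive values (s−1)(s−2)/2 + 1 through (s−1)s/2, which is why no value is skipped or repeated.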
(b) # Z = ℵ0.
Hint: Let N_e denote the set of all nonnegative even integers (including zero), and let N_o denote the set of all positive odd integers. Recall that # N0 = # N, and note that # N0 = # N_e and # N = # N_o (reason: the functions F: N0 → N_e and G: N → N_o, defined by F(n) = 2n for every n ∈ N0 and G(n) = 2n − 1 for every n ∈ N, are injective and surjective). Set N_− = {k ∈ Z: −k ∈ N} so that # N_− = # N_o. Use Problem 1.23(a) to show that # Z ≤ # N.
(c) # Q = ℵ0.
Hint: The function F: Z×N → Q defined by F(k, n) = k/n for every k ∈ Z and every n ∈ N is surjective. Use Problem 1.23(b).
Problem 1.26. The purpose of this problem is to prove that, if X and Y are sets with more than one element, then

# (X ∪ Y) ≤ # (X×Y).

(a) First verify that the above assertion holds whenever X and Y are both finite sets with more than one element.
Consider the relations ∼X and ∼Y on the Cartesian product X×Y defined as follows. For every pair of pairs (x1, y1) and (x2, y2) in X×Y,

(x1, y1) ∼X (x2, y2) ⇐⇒ x1 = x2 and (x1, y1) ∼Y (x2, y2) ⇐⇒ y1 = y2.

(b) Show that ∼X and ∼Y are both equivalence relations on X×Y.
Now take x ∈ X and y ∈ Y arbitrary and consider the equivalence classes, [x] ⊆ X×Y and [y] ⊆ X×Y, of the point (x, y) ∈ X×Y with respect to ∼X and ∼Y, respectively.
(c) Show that [x] ↔ Y and [y] ↔ X.
Hint: In the rectangle X×Y, picture [x] as the vertical slice through (x, y) and [y] as the horizontal slice through (x, y).
Next suppose one of the sets X or Y, say X, is infinite and consider the singleton {(x, y)} on (x, y) ∈ X×Y.
1. Set-Theoretic Structures
(d) Verify that #X = #([y]\{(x, y)}) and #Y = #[x].
(e) Finally, apply the results of Problems 1.21(a), 1.22, and 1.23(a) to conclude that #(X ∪ Y) ≤ #(X×Y).

Problem 1.27. Prove the following propositions.

(a) The union of two sets is countable if and only if each of them is countable. Hint: Problems 1.22, 1.23, 1.25, and 1.26.

(b) Let X be an infinite set, and let A and B be arbitrary subsets of X. The relation ∼ on ℘(X) defined by

    A ∼ B   if   A △ B is countable

is an equivalence relation — compare with Problem 1.13.

Problem 1.28. Let E be an infinite set. Suppose A and B are sets such that #A ≤ #E and #B ≤ #E.
(a) The following propositions, which are naturally linked to Problems 1.23 and 1.26, are in fact corollaries of Theorem 1.9. Prove them:

    #(A ∪ B) ≤ #E   and   #(A×B) ≤ #E.
(b) Now use Problem 1.23(b) and Theorem 1.9 to show by induction that, if X is an infinite set, then #X^n = #X for every n ∈ N.
Problem 1.29. Let I be the collection of all injective functions from subsets of a set X to itself. That is, set

    I = {F ∈ X^A: A ⊆ X and F is injective}.

As a subset of ⋃_{A∈℘(X)} X^A, I is partially ordered in the extension ordering (cf. Problem 1.17). Let J be the collection of all those functions F in I for which the range R(F) is disjoint from the domain D(F):

    J = {F ∈ I: R(F) ⊆ X\D(F)}.

Problem 1.17 also tells us that every chain {Fγ} in J has a supremum ⋃γ Fγ in I, and also that D(⋃γ Fγ) = ⋃γ D(Fγ) and R(⋃γ Fγ) = ⋃γ R(Fγ).

(a) Show that ⋃γ Fγ in fact lies in J.

Hint: Take Fλ and Fμ arbitrary from {Fγ} ⊆ J so that Fλ ≤ Fμ (or vice versa). Note that R(Fλ) ∩ D(Fμ) ⊆ R(Fμ) ∩ D(Fμ) = ∅, and conclude that ⋃γ R(Fγ) ∩ ⋃γ D(Fγ) is empty too.
Thus every chain in J has an upper bound (a supremum, actually) in J, and so J contains a maximal function by Zorn's Lemma. Let F0 be a maximal function of J and let A0 be the domain of F0, so that F0(A0) ⊆ X\A0. Suppose #A0 < #(X\A0).

(b) Show that if X is an infinite set, then there exist two distinct points, say x0 and x1, in (X\A0)\F0(A0). Hint: Under the above assumption #F0(A0) < #(X\A0) (why?) and X\A0 is infinite — recall: the union of finite sets is finite.

Now set A1 = A0 ∪ {x0} and consider the function F1: A1 → X defined by

    F1(x) = F0(x) ∈ F0(A0)    if x ∈ A0,
    F1(x) = x1 ∈ X\F0(A0)     if x = x0 ∈ X\A0.

(c) Show that F1 ∈ J. Since F0 = F1|A0 it follows that F0 ≤ F1, which contradicts the fact that F0 is a maximal element of J (for F0 ≠ F1). Therefore (by Theorem 1.7),

    #(X\A0) ≤ #A0.

Next verify that #A0 = #F0(A0) ≤ #(X\A0) and conclude:

    #A0 = #(X\A0)

(cf. Cantor–Bernstein Theorem). Finally, using Problem 1.28(a), show that #A0 = #X.

Outcome: If X is an infinite set, then it has a subset, say A0, such that

    #A0 = #X = #(X\A0).
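For X = N the Outcome is witnessed by A0 = the set of even numbers; a finite-truncation sketch (Python, illustration only):

```python
# n -> 2n maps N one-to-one onto the evens A0, and n -> 2n - 1 maps N
# one-to-one onto the odds N \ A0, so #A0 = #N = #(N \ A0).
N = range(1, 101)                     # finite truncation of N
A0 = {2 * n for n in N}               # even numbers
complement = {2 * n - 1 for n in N}   # odd numbers
assert len(A0) == len(complement) == len(set(N))
assert A0.isdisjoint(complement)
```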
Problem 1.30. Let A be an arbitrary set and write A′ = A×{A}.

(a) Show that #A′ = #A.

Hint: The function that assigns to each a in A the ordered pair (a, A) in the Cartesian product A′ is a one-to-one mapping of A onto A′.

It is plain that A ∩ A′ = ∅. Moreover, if B is any set such that B ≠ A, then A′ ∩ B′ = ∅, where B′ = B×{B}.

Conclusion 1: If A and B are any sets, then there exist sets C and D such that #A = #C, #B = #D, and C ∩ D = ∅.

Now suppose C1, C2, D1, and D2 are sets with the following properties: C1 ∩ D1 = ∅, C2 ∩ D2 = ∅, #C1 = #C2, and #D1 = #D2.
(b) Show that #(C1 ∪ D1) = #(C2 ∪ D2).
Conclusion 2: #(C ∪ D) is independent of the particular pair of sets {C, D} employed in Conclusion 1. We are now prepared to define the sum of cardinal numbers. If A and B are sets, then

    #A + #B = #(C ∪ D)

for any pair of sets {C, D} such that #A = #C, #B = #D, and C ∩ D = ∅. In particular, if A ∩ B = ∅, then #A + #B = #(A ∪ B).

(c) Use Problem 1.29 to show that #X + #X = #X for every infinite set X.
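Conclusion 1 and the definition of cardinal sum can be acted out on finite sets. A Python set cannot contain itself, so string tags stand in for the sets A and B in A×{A} and B×{B}; a sketch (names mine):

```python
# Disjointify A and B by tagging: A' = A x {"A"}, B' = B x {"B"}.
A = {1, 2, 3}
B = {2, 3, 4}                         # overlaps A, so #(A ∪ B) < #A + #B
Aprime = {(a, "A") for a in A}
Bprime = {(b, "B") for b in B}

assert len(Aprime) == len(A) and len(Bprime) == len(B)   # #A' = #A, #B' = #B
assert Aprime.isdisjoint(Bprime)                          # C ∩ D = ∅
# cardinal sum #A + #B realized as #(A' ∪ B')
assert len(Aprime | Bprime) == len(A) + len(B)
```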
The definition of the product of cardinal numbers is much simpler:

    #A · #B = #(A×B)

for any sets A and B. According to Theorem 1.9, #X · #X = #X for every infinite set X.

(d) Prove: If X and Y are two sets, at least one of which is infinite, then

    #X + #Y = #X · #Y = max{#X, #Y}.

Hint: If #Y ≤ #X, then verify that #X ≤ #X + #Y ≤ #X + #X, and also that #X ≤ #X · #Y ≤ #X · #X.
Problem 1.31. Let A be an arbitrary nonempty set. The Zermelo Well-Ordering Principle says that A can be well ordered, so that there is a well-ordering, say ≤, of A. Let P(·) be a predicate that is meaningful for every a ∈ A (i.e., let P(a) be a proposition for each a ∈ A). Suppose that (i) P(a0) holds true for the minimum element a0 of A, and that (ii) P(a′) holds whenever P(a) holds for every a < a′. Then show that P(a) holds true for every a ∈ A. This is called the Principle of Transfinite Induction. Hint: Prove by contradiction. (Compare with the Principle of Mathematical Induction of Section 1.1.)
2 Algebraic Structures
The main algebraic structure involved with the subject of this book is that of a "linear space" (or "vector space"). A linear space is a set endowed with an extra structure in addition to its set-theoretic structure (i.e., an extra structure that goes beyond the notions of inclusion, union, complement, function, and ordering, for instance). Roughly speaking, linear spaces are sets where two operations, called "addition" and "scalar multiplication", are properly defined so that we can refer to the "sum" of two points in a linear space, as well as to the "product" of a point in it by a "scalar". Although the reader is supposed to have already had contact with linear algebra and, in particular, with "finite-dimensional vector spaces", we shall proceed from the very beginning. Our approach avoids the parochially "finite-dimensional" constructions (whenever this is possible), and focuses either on general results that do not depend on the "dimensionality" of the linear space, or on abstract "infinite-dimensional" linear spaces.
2.1 Linear Spaces

A binary operation on a set X is a mapping of X×X into X. If F is a function from X×X to X, then we generally write z = F(x, y) to indicate that z in X is the value of F at the point (x, y) in X×X. However, to emphasize the role of the binary operation (the outcome of a binary operation on two points of X is again a point of X), it is convenient (and customary) to adopt a different notation. Moreover, in order to emphasize the abstract character of a binary operation, it is also common to use a noncommittal symbol to denote it. Thus, if ∗ is a binary operation on X (so that ∗: X×X → X), then we shall write z = x ∗ y instead of z = ∗(x, y) to indicate that z in X is the value of ∗ at the point (x, y) in X×X. If a binary operation ∗ on X has the property that x ∗ (y ∗ z) = (x ∗ y) ∗ z for every x, y, and z in X, then it is said to be associative. In this case we shall drop the parentheses and write x ∗ y ∗ z. If there exists an element e in X such that

C.S. Kubrusly, The Elements of Operator Theory, DOI 10.1007/978-0-8176-4998-2_2, © Springer Science+Business Media, LLC 2011
2. Algebraic Structures
x ∗ e = e ∗ x = x for every x ∈ X, then e is said to be the neutral element (or the identity element) with respect to the binary operation ∗ on X. It is easy to show that, if a binary operation has a neutral element e, then e is unique. If an associative binary operation ∗ on X has a neutral element e in X, and if for some x ∈ X there exists x⁻¹ ∈ X such that x ∗ x⁻¹ = x⁻¹ ∗ x = e, then x⁻¹ is called the inverse of x with respect to ∗. It is also easy to show that, if the inverse of x exists with respect to an associative binary operation ∗, then it is unique. A group is a nonempty set X on which is defined a binary operation ∗ such that (a) ∗ is associative, (b) ∗ has a neutral element in X, and (c) every x in X has an inverse in X with respect to ∗. If a binary operation ∗ on X has the property that x ∗ y = y ∗ x for every x and y in X, then it is said to be commutative. If X is a group with respect to a binary operation ∗, and if (d) ∗ is commutative, then X is said to be an Abelian (or commutative) group.

Example 2.A. Let X be a set with more than three elements. The collection of all injective mappings of X onto itself (i.e., the collection of all invertible mappings on X) is a non-Abelian group with respect to the composition operation ◦. The neutral element (or the identity element) of such a group is, of course, the identity map on X.

An additive Abelian group is an Abelian group X for which the underlying binary operation is interpreted as an addition and denoted by + (instead of ∗). In this case the element x + y (which lies in X for every x and y in X) is called the sum of x and y. The (unique) neutral element with respect to addition is denoted by 0 (instead of e) and called zero. The (unique) inverse of x under addition is denoted by −x (instead of x⁻¹) and is called the negative of x. Thus x + 0 = 0 + x = x and x + (−x) = (−x) + x = 0 for every x ∈ X. Moreover, the operation of subtraction is defined by x − y = x + (−y), and x − y is called the difference between x and y.
If ∗′: X×X → X is another binary operation on X, and if

    x ∗′ (y ∗ z) = (x ∗′ y) ∗ (x ∗′ z)   and   (y ∗ z) ∗′ x = (y ∗′ x) ∗ (z ∗′ x)
for every x, y, and z in X, then ∗′ is said to be distributive with respect to ∗. The above properties are called the distributive laws. A ring is an additive Abelian group X with a second binary operation on X, called multiplication and denoted by · , such that (e) the multiplication operation is associative and (f) distributive with respect to the addition operation. In this case the element x · y (which lies in X for every x and y in X) is called the product of x and y (alternative notation: xy instead of x · y). A commutative ring is a ring for which (g) the multiplication operation is commutative. A ring with identity is a ring X such that (h) the multiplication operation has a neutral element in X. In this case such a (unique) neutral element in X with respect to the multiplication operation is denoted by 1 (so that x · 1 = 1 · x = x for every x ∈ X) and is called the identity.

Example 2.B. The power set ℘(X) of a nonempty set X is a commutative ring with identity if addition is interpreted as symmetric difference (or Boolean sum) and multiplication as intersection (i.e., A + B = A △ B and A · B = A ∩ B for all subsets A and B of X). Here the neutral element under addition (i.e., the zero) is the empty set ∅, and the neutral element under multiplication (i.e., the identity) is X itself.

A ring with identity is nontrivial if it has another element besides the identity. (The set {0} with the operations 0 + 0 = 0 · 0 = 0 is the trivial ring whose only element is the identity.) If a ring with identity is nontrivial, then the neutral element under addition and the neutral element under multiplication never coincide (i.e., 0 ≠ 1). In fact, x · 0 = 0 · x = 0 for every x in X whenever X is a ring (with or without identity). Incidentally (or not) this also shows that, in a nontrivial ring with identity, zero has no inverse with respect to the multiplication operation (i.e., there is no x in X such that 0 · x = x · 0 = 1).
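The ring of Example 2.B can be checked mechanically on a two-point set; a sketch (Python, helper names mine):

```python
from itertools import product

X = {1, 2}
subsets = [frozenset(s) for s in ((), (1,), (2,), (1, 2))]

def add(A, B):
    return A ^ B      # symmetric difference plays the role of addition

def mul(A, B):
    return A & B      # intersection plays the role of multiplication

# associativity of + and the distributive law, over all triples
for A, B, C in product(subsets, repeat=3):
    assert add(add(A, B), C) == add(A, add(B, C))
    assert mul(A, add(B, C)) == add(mul(A, B), mul(A, C))

# zero is ∅, identity is X, and each A is its own additive inverse
for A in subsets:
    assert add(A, frozenset()) == A
    assert mul(A, frozenset(X)) == A
    assert add(A, A) == frozenset()
```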
A ring X with identity is called a division ring if (i) each nonzero x in X has an inverse in X with respect to the multiplication operation. That is, if x ≠ 0 in X, then there exists a (unique) x⁻¹ ∈ X such that x · x⁻¹ = x⁻¹ · x = 1.

Example 2.C. Let the addition and multiplication operations have their ordinary ("numerical") meanings. The set of all natural numbers N is not a group under addition; neither is the set of all nonnegative integers N0. However, the set of all integers Z is a commutative ring with identity, but not a
division ring. The sets Q, R, and C (of all rational, real, and complex numbers, respectively), when equipped with their respective operations of addition and multiplication, are all commutative division rings. These are infinite commutative division rings, but there are finite commutative division rings (e.g., if we declare that 1 + 1 = 0 + 0 = 0, 0 + 1 = 1 + 0 = 1, 0 · 0 = 0 · 1 = 1 · 0 = 0, and 1 · 1 = 1, then {0, 1} is a commutative division ring). Roughly speaking, commutative division rings are the number systems of mathematics, and so they deserve a name of their own. A field is a nontrivial commutative division ring. The elements of a field are usually called scalars. We shall be particularly concerned with the fields R and C (the real field and the complex field). An arbitrary field will be denoted by F. Summing up: A field F is a set with more than one element (at least 0 and 1 are distinct elements of it) equipped with two binary operations, called addition and multiplication, that satisfy all the properties (a) through (i) — clearly, with ∗ replaced by + in properties (a) through (d).

Definition 2.1. A linear space (or vector space) over a field F is a nonempty set X (whose elements are called vectors) satisfying the following axioms.

Vector Addition. X is an additive Abelian group under a binary operation called vector addition.

Scalar Multiplication. There is given a mapping of F×X into X that assigns
to each scalar α in F and each vector x in X a vector αx in X. Such a mapping defines an operation, called scalar multiplication, with the following properties. For all scalars α and β in F, and all vectors x and y in X,

    1x = x,    α(βx) = (α · β)x,    α(x ⊕ y) = αx ⊕ αy,    (α + β)x = αx ⊕ βx.

Some remarks on notation and terminology. The underlying set of a linear space is the nonempty set upon which the linear space is built. We shall use the same notation X for both the linear space and its underlying set, even though the underlying set alone has no algebraic structure of its own. A set X needs a binary operation on it, a field, and another operation involving such a field with X to acquire the necessary algebraic structure that will grant it the status of a linear space. The scalar 1 in the above definition stands, of course, for the identity in the field F with respect to the multiplication · in F, and + denotes the addition in F. Observe that + (addition in the field F) and ⊕ (addition in the group X) are different binary operations. However, once the difference has been pointed out, we shall use the same symbol + to denote both addition in the field F and addition in the group X. Moreover, we shall also drop the dot from the multiplication notation in F, and write αβ instead
of α · β. The neutral element under the vector addition in X (i.e., the vector zero) is referred to as the origin of the linear space X. Again, we shall use one and the same symbol 0 to denote both the origin in X and the scalar zero in F. A linear space over R is called a real linear space, and a linear space over C is called a complex linear space.

Example 2.D. R itself is a linear space over R. That is, the plain set R when equipped with the ordinary binary operations of addition and multiplication becomes a field, also denoted by R. If vector addition is identified with scalar addition, then it becomes a real linear space, denoted again by R. More generally, for each n ∈ N, let F^n denote the Cartesian product of n copies of a field F (i.e., the set of all ordered n-tuples of scalars in F). Now define vector addition and scalar multiplication coordinatewise, as usual:

    x + y = (ξ1 + υ1, . . . , ξn + υn)
and
αx = (αξ1 , . . . , αξn )
for every x = (ξ1, . . . , ξn) and y = (υ1, . . . , υn) in F^n and every α in F. This makes F^n into a linear space over F. In particular, R^n (the Cartesian product of n copies of R) and C^n (the Cartesian product of n copies of C) become real and complex linear spaces, respectively, whenever vector addition and scalar multiplication are defined coordinatewise. However, if we restrict scalar multiplication to real multiplication only, then C^n can also be made into a real linear space.

Example 2.E. Let S be a nonempty set, and let F be an arbitrary field. Consider the set X = F^S of all functions from S to F (i.e., the set of all scalar-valued functions on S, where "scalar-valued" stands for "F-valued"). Let vector addition and scalar multiplication be defined pointwise. That is, if x and y are functions in X and α is a scalar in F, then x + y and αx are functions in X defined by

    (x + y)(s) = x(s) + y(s)
and
(αx)(s) = α(x(s))
for every s ∈ S. Now it is easy to show that X, when equipped with these two operations, in fact is a linear space over F. Particular cases: F^N (the set of all scalar-valued sequences) and F^[0,1] (the set of all scalar-valued functions on the interval [0, 1]) are linear spaces over F, whenever vector addition and scalar multiplication are defined pointwise. Note that the linear space F^n in the previous example also is a particular case of the present example, where the coordinatewise operations are identified with the pointwise operations (recall: F^n = F^{I_n}, where I_n = {i ∈ N: i ≤ n}).

Example 2.F. What was the role played by the field F in the previous example? Answer: Vector addition and scalar multiplication in F^S were defined
pointwise by using addition and multiplication in F. This suggests the following generalization of Example 2.E. Let S be a nonempty set, and let Y be an arbitrary linear space (over a field F). Consider the set X = Y^S of all functions from S to Y (i.e., the set of all Y-valued functions on S). Let vector addition and scalar multiplication in X be defined pointwise by using vector addition and scalar multiplication in Y. That is, if f and g are functions in X (so that f(s) and g(s) are elements of Y for each s ∈ S) and α is a scalar in F, then f + g and αf are functions in X defined by

    (f + g)(s) = f(s) + g(s)
and
(αf )(s) = α(f (s))
for every s ∈ S. As before, it is easily verified that X, when equipped with these operations, becomes a linear space over the same field F. The origin of X is the null function 0: S → Y (which is defined by 0(s) = 0 for all s ∈ S). Examples 2.D and 2.E can be thought of as particular cases of this one.

Example 2.G. Let X be a linear space over F, and let x, x′, y, and y′ be arbitrary vectors in X. An equivalence relation ∼ on X is compatible with vector addition if

    x ∼ x′ and y ∼ y′   imply   x + y ∼ x′ + y′.

Similarly, it is said to be compatible with scalar multiplication if, for x and x′ in X and α in F, x ∼ x′ implies αx ∼ αx′. If an equivalence relation ∼ on a linear space X is compatible with both vector addition and scalar multiplication, then we shall say that ∼ is a linear equivalence relation. Now consider X/∼, the quotient space of X modulo ∼ (i.e., the collection of all equivalence classes [x] with respect to ∼), and suppose the equivalence relation ∼ on X is linear. In this case a binary operation + on X/∼ can be defined by setting [x] + [y] = [x + y] for every [x] and [y] in X/∼. Indeed, since ∼ is compatible with vector addition, it follows that [x + y] does not depend on which particular members x and y of the equivalence classes [x] and [y] were taken. Thus the operation + actually is a function from X/∼ × X/∼ to X/∼. This defines vector addition in X/∼. Scalar multiplication in X/∼ can be similarly defined by setting α[x] = [αx] for every [x] in X/∼ and α in F. Therefore, if ∼ is a linear equivalence relation on a linear space X over a field F, then X/∼ becomes a linear space over F when vector addition and scalar multiplication in X/∼ are defined this way.
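The quotient construction of Example 2.G can be made concrete in R² with the relation x ∼ x′ iff x′ − x lies in the manifold M = span{(1, 0)}, i.e., iff the second coordinates agree; a sketch (Python, helper names mine):

```python
# X = R^2, M = span{(1, 0)}; x ~ x' iff x' - x ∈ M (equal 2nd coordinates).
def rep(x):
    # canonical representative of the coset [x] = x + M
    return (0.0, x[1])

def coset_add(x, y):
    # [x] + [y] = [x + y]
    return rep((x[0] + y[0], x[1] + y[1]))

def coset_scale(a, x):
    # a[x] = [ax]
    return rep((a * x[0], a * x[1]))

x, x_alt = (1.0, 2.0), (5.0, 2.0)     # x ~ x_alt: difference (4, 0) ∈ M
y = (0.0, 3.0)
# well-definedness: the result is independent of the representatives chosen
assert coset_add(x, y) == coset_add(x_alt, y) == (0.0, 5.0)
assert coset_scale(2.0, x) == coset_scale(2.0, x_alt) == (0.0, 4.0)
```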
It is clear by the definition of a linear space X that x + y + z is a well-defined vector in X whenever x, y, and z are vectors in X. Similarly, if {x_i}_{i=1}^{n} is a finite set of vectors in X, then the sum x1 + · · · + xn, denoted by Σ_{i=1}^{n} x_i, is again a vector in X. (The notion of infinite sums needs topology and we shall consider these topics in Chapters 4 and 5.)
2.2 Linear Manifolds

A linear manifold of a linear space X over F is a nonempty subset M of X with the following properties:

    x + y ∈ M   and   αx ∈ M
for every pair of vectors x, y in M and every scalar α in F. It is readily verified that a linear manifold M of a linear space X over a field F is itself a linear space over the same field F. The origin 0 of X is the origin of every linear manifold M of X. The zero linear manifold is {0}, consisting of the single vector 0. If a linear manifold M is a proper subset of X, then it is said to be a proper linear manifold. A nontrivial linear manifold M of a linear space X is a nonzero proper linear manifold of it ({0} ≠ M ≠ X).

Example 2.H. Let M be a linear manifold of a linear space X and consider a relation ∼ on X defined as follows. If x and x′ are vectors in X, then

    x ∼ x′   if   x′ − x ∈ M.
That is, x ∼ x′ if x′ is congruent to x modulo M — notation: x′ ≡ x (mod M). Since M is a linear manifold of X, the relation ∼ in fact is an equivalence relation on X (reason: 0 ∈ M — reflexivity; x′′ − x = (x′′ − x′) + (x′ − x) ∈ M whenever x′ − x and x′′ − x′ lie in M — transitivity; and x − x′ ∈ M whenever x′ − x ∈ M — symmetry). The equivalence class (with respect to ∼)

    [x] = {x′ ∈ X: x ∼ x′} = {x′ ∈ X: x′ = x + z for some z ∈ M}

of a vector x in X is called the coset of x modulo M — notation: [x] = x + M. The set of all cosets [x] modulo M for every x ∈ X (i.e., the collection of all equivalence classes [x] with respect to the equivalence relation ∼ for every x in X) is precisely the quotient space X/∼ of X modulo ∼. Following the terminology introduced in Example 2.G, ∼ is a linear equivalence relation on the linear space X. Indeed, if x′ − x ∈ M and y′ − y ∈ M, then (x′ + y′) − (x + y) = (x′ − x) + (y′ − y) ∈ M and αx′ − αx = α(x′ − x) ∈ M for every scalar α, so that x ∼ x′ and y ∼ y′ imply x + y ∼ x′ + y′ and αx ∼ αx′. Therefore, with vector addition and scalar multiplication defined by

    [x] + [y] = [x + y]
and
α[x] = [αx],
X /∼ is made into a linear space over the same scalar field. This is usually denoted by X /M (instead of X /∼ ), and called the quotient space of X modulo M. The origin of X /M is, of course, [0] = M. Let π: X → X /M be the natural mapping of X onto the quotient space X /M as defined in Section 1.4: π(x) = [x] = x + M
for every
x ∈ X.
The concept of "linear transformation" will be defined in Section 2.5. It can be easily shown that π is a linear transformation between the linear spaces X and X/M. The null space of π, viz., N(π) = {x ∈ X: π(x) = [0]}, is given by N(π) = M. Indeed, if π(x) = [0] = M, then x + M = [x] = [0] = M, and so x ∈ M. On the other hand, if u ∈ M, then π(u) = u + M = 0 + M = M = [0].

If M and N are linear manifolds of a linear space X, then the sum of M and N, denoted by M + N, is the subset of X made up of all sums x + y where x is a vector in M and y is a vector in N:

    M + N = {z ∈ X: z = x + y, x ∈ M and y ∈ N}.

It is trivially verified that M + N is a linear manifold of X. If {M_i}_{i=1}^{n} is a finite family of linear manifolds of a linear space X, then the sum Σ_{i=1}^{n} M_i is the linear manifold M1 + · · · + Mn of X consisting of all sums Σ_{i=1}^{n} x_i where each vector x_i lies in M_i. More generally, if {Mγ}γ∈Γ is an arbitrary indexed family of linear manifolds of a linear space X, then the sum Σ_{γ∈Γ} Mγ is defined as the set of all sums Σ_{γ∈Γ} xγ with xγ ∈ Mγ for each index γ and xγ = 0 except for some finite set of indices (i.e., Σ_{γ∈Γ} Mγ is the set made up of all finite sums with each summand being a vector in one of the linear manifolds Mγ). Clearly, Σ_{γ∈Γ} Mγ is itself a linear manifold of X, and Mα ⊆ Σ_{γ∈Γ} Mγ for every Mα ∈ {Mγ}γ∈Γ.

A linear manifold of a linear space X is never empty: the origin of X is always there. Note that the intersection M ∩ N of two linear manifolds M and N of a linear space X is itself a linear manifold of X. In fact, if {Mγ}γ∈Γ is an arbitrary collection of linear manifolds of a linear space X, then the intersection ⋂_{γ∈Γ} Mγ is again a linear manifold of X. Moreover, ⋂_{γ∈Γ} Mγ ⊆ Mα for every Mα ∈ {Mγ}γ∈Γ.

Consider the collection Lat(X) of all linear manifolds of a linear space X. Since Lat(X) is a subcollection of the power set ℘(X), it follows that Lat(X) is partially ordered in the inclusion ordering.
If {Mγ}γ∈Γ is a subcollection of Lat(X), then Σ_{γ∈Γ} Mγ in Lat(X) is an upper bound for {Mγ}γ∈Γ and ⋂_{γ∈Γ} Mγ in Lat(X) is a lower bound for {Mγ}γ∈Γ. If U ∈ Lat(X) is an upper bound for {Mγ}γ∈Γ (i.e., Mγ ⊆ U for all γ ∈ Γ), then Σ_{γ∈Γ} Mγ ⊆ U. Thus

    Σ_{γ∈Γ} Mγ = sup{Mγ}γ∈Γ.
Similarly, if V ∈ Lat(X) is a lower bound for {Mγ}γ∈Γ (i.e., if V ⊆ Mγ for all γ ∈ Γ), then V ⊆ ⋂_{γ∈Γ} Mγ. Thus

    ⋂_{γ∈Γ} Mγ = inf{Mγ}γ∈Γ.
Conclusion: Lat(X) is a complete lattice. The collection of all linear manifolds of a linear space is a complete lattice in the inclusion ordering. If {M, N} is a pair of elements of Lat(X), then M ∨ N = M + N and M ∧ N = M ∩ N.

Let A be an arbitrary subset of a linear space X, and consider the subcollection (a sublattice, actually) LA of the complete lattice Lat(X),

    LA = {M ∈ Lat(X): A ⊆ M},

consisting of all linear manifolds of X that include A. Set

    span A = inf LA = ⋂LA,

which is called the (linear) span of A. Since A ⊆ ⋂LA (for A ⊆ M for every M ∈ LA), it follows that inf LA = min LA so that span A ∈ LA. Thus span A is the smallest linear manifold of X that includes A, which coincides with the intersection of all linear manifolds of X that include A. It is readily verified that span ∅ = {0}, span M = M for every M ∈ Lat(X), and A ⊆ span A = span(span A) for every A ∈ ℘(X). Moreover, if A and B are subsets of X, then
implies
span A ⊆ span B.
If M and N are linear manifolds of a linear space X, then M ∪ N ⊆ M + N. Moreover, if K is a linear manifold of X such that M ∪ N ⊆ K, then x + y ∈ K for every x ∈ M and every y ∈ N, and hence M + N ⊆ K. Thus M + N is the smallest linear manifold of X that includes M ∪ N, which means that M + N = span(M ∪ N). More generally, let {Mγ}γ∈Γ be an arbitrary subcollection of Lat(X), and suppose K ∈ Lat(X) is such that ⋃_{γ∈Γ} Mγ ⊆ K. Then every (finite) sum Σ_{γ∈Γ} xγ with each xγ in Mγ is a vector in K. Thus Σ_{γ∈Γ} Mγ ⊆ K. Since ⋃_{γ∈Γ} Mγ ⊆ Σ_{γ∈Γ} Mγ, it follows that Σ_{γ∈Γ} Mγ is the smallest element of Lat(X) that includes ⋃_{γ∈Γ} Mγ. Equivalently,

    Σ_{γ∈Γ} Mγ = span (⋃_{γ∈Γ} Mγ).
2.3 Linear Independence

Let A be a nonempty subset of a linear space X. A vector x ∈ X is a linear combination of vectors in A if there exist a finite set {x_i}_{i=1}^{n} of vectors in A and a finite family of scalars {α_i}_{i=1}^{n} such that
    x = Σ_{i=1}^{n} α_i x_i.
Warning: A linear combination is, by definition, finite. That is, a linear combination of vectors in a set A is a weighted sum of a finite subset of vectors in A, weighted by a finite family of scalars, no matter whether A is a finite or an infinite set. Since X is a linear space, any linear combination of vectors in A is a vector in X.

Proposition 2.2. The set of all linear combinations of vectors in a nonempty subset A of a linear space X coincides with the linear manifold span A.

Proof. Let A be an arbitrary subset of a linear space X, consider the collection LA of all linear manifolds of X that include A, and recall that span A = min LA. Suppose A is nonempty and let Â denote the set of all linear combinations of vectors in A. It is plain that A ⊆ Â (every vector in A is a trivial linear combination of vectors in A), and that Â is a linear manifold of X (if x, y ∈ Â, then x + y and αx lie in Â). Therefore, Â ∈ LA. Moreover, if M is an arbitrary linear manifold of X, and if x ∈ X is a linear combination of vectors in M, then x ∈ M (because M is itself a linear space). Thus M̂ ⊆ M. Since M ⊆ M̂, it follows that M̂ = M for every linear manifold M of X. Furthermore, if M ∈ LA, then A ⊆ M and so Â ⊆ M̂ (reason: Â ⊆ B̂ whenever A and B are nonempty subsets of X such that A ⊆ B). Thus M ∈ LA implies Â ⊆ M. Conclusion: Â is the smallest element of LA. That is, Â = span A.
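Proposition 2.2 can be verified exhaustively in a finite model, the linear space GF(2)³ over the two-element field of Example 2.C; a sketch (Python, helper names mine):

```python
from itertools import chain, combinations, product

X = list(product((0, 1), repeat=3))   # the 8 vectors of GF(2)^3

def add(x, y):
    return tuple((a + b) % 2 for a, b in zip(x, y))

def span_of(A):
    # all (finite) linear combinations of vectors in A, coefficients in {0, 1}
    S = set()
    for coeffs in product((0, 1), repeat=len(A)):
        v = (0, 0, 0)
        for c, a in zip(coeffs, A):
            if c:
                v = add(v, a)
        S.add(v)
    return S

def is_manifold(M):
    # closed under addition; scalar multiples are trivial over GF(2)
    return (0, 0, 0) in M and all(add(x, y) in M for x in M for y in M)

A = [(1, 0, 0), (0, 1, 1)]
all_subsets = chain.from_iterable(combinations(X, r) for r in range(9))
L_A = [set(M) for M in all_subsets if set(A) <= set(M) and is_manifold(set(M))]
# Proposition 2.2: the linear combinations of A form the smallest
# linear manifold that includes A.
assert span_of(A) == set.intersection(*L_A)
```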
Following the notation introduced in the proof of Proposition 2.2, Â = span A whenever A ≠ ∅. Set ∅̂ = span ∅ so that ∅̂ = {0}, and hence Â is well defined for every subset A of X. We shall use one and the same notation, viz., span A, for both of them: the set of all linear combinations of vectors in A and the (linear) span of A. For this reason span A is also referred to as the linear manifold generated (or spanned) by A. If a linear manifold M of X (which may be X itself) is such that span A = M for some subset A of X, then we say that A spans M. A subset A of a linear space X is said to be linearly independent if each vector x in A is not a linear combination of vectors in A\{x}. Equivalently, A
is linearly independent if x ∉ span(A\{x}) for every x ∈ A. If a set A is not linearly independent, then it is said to be linearly dependent. Note that the empty set ∅ of a linear space X is linearly independent (there is no vector in ∅ that is a linear combination of vectors in ∅). Any singleton {x} of X such that x ≠ 0 is linearly independent. Indeed, span({x}\{x}) = span ∅ = {0} so that x ∉ span({x}\{x}) if x ≠ 0. However, 0 ∈ span({0}\{0}) = {0}, and hence the singleton {0} is not linearly independent. In fact, every subset of X that contains the origin of X is not linearly independent (indeed, if 0 ∈ A ⊆ X and A has another vector, say x ≠ 0, then 0 = 0x). Thus, if a vector x is an element of a linearly independent subset of a linear space X, then x ≠ 0.

Proposition 2.3. Let A be a nonempty subset of a linear space X. The following assertions are pairwise equivalent.

(a) A is linearly independent.

(b) Each nonzero vector in span A has a unique representation as a linear combination of vectors in A.

(c) Every finite subset of A is linearly independent.

(d) There is no proper subset of A whose span coincides with span A.

Proof. The statement (b) can be rewritten as follows.
(b′) For every nonzero x ∈ span A there exist a unique finite family of scalars {α_i}_{i=1}^{n} and a unique finite subset {a_i}_{i=1}^{n} of A such that x = Σ_{i=1}^{n} α_i a_i.

Proof of (a)⇒(b). Suppose A ≠ ∅ is linearly independent. Take an arbitrary nonzero x ∈ span A, and consider two representations of it as a linear combination of vectors in A:

    x = Σ_{i=1}^{n} β_i b_i = Σ_{i=1}^{m} γ_i c_i,
where each b_i and each c_i are vectors in A (and hence nonzero because A is linearly independent). Since x ≠ 0 we may assume that the scalars β_i and γ_i are all nonzero. Set B = {b_i}_{i=1}^{n} and C = {c_i}_{i=1}^{m}, both finite nonempty subsets of A. Take an arbitrary b ∈ B and note that b is a linear combination of vectors in (B\{b}) ∪ C. However, since b ∈ A and A is linearly independent, it follows that b is not a linear combination of any subset of A\{b}. Thus b ∈ C. Similarly, take an arbitrary c ∈ C and conclude that c ∈ B by using the same argument. Hence B ⊆ C ⊆ B. That is, B = C. Therefore x = Σ_{i=1}^{n} β_i b_i = Σ_{i=1}^{n} γ_i b_i, which implies that Σ_{i=1}^{n} (β_i − γ_i) b_i = 0. Since each b_i is not a linear combination of vectors in B\{b_i}, it follows that β_i = γ_i for every i. Summing up: The two representations of x coincide.

Proof of (b)⇒(a). If A is nonempty and every nonzero vector x in span A has a unique representation as a linear combination of vectors in A, then the unique representation of an arbitrary a in A as a linear combination of vectors
in A is a itself (recall: A ⊆ span A). Therefore, every a ∈ A is not a linear combination of vectors in A\{a}, which means that A is linearly independent.

Proof of (a)⇔(c). If A is linearly independent, then every subset of it clearly is linearly independent. If A is not linearly independent, then either A = {0} or there exists x ∈ A which is a linear combination of vectors, say {x_i}_{i=1}^{n} for some n ∈ N, in A\{x} ≠ ∅. In the former case A is itself a finite subset of A which is not linearly independent. In the latter case {x_i}_{i=1}^{n} ∪ {x} is a finite subset of A that is not linearly independent. Conclusion: If every finite subset of A is linearly independent, then A is itself linearly independent.

Proof of (a)⇒(d). Recalling that B ⊆ A implies span B ⊆ span A, the statement (d) can be rewritten as follows.
(d′) B ⊂ A implies span B ⊂ span A.
Suppose A is nonempty and linearly independent. Let B be an arbitrary proper subset of A. If B = ∅, then (d′) holds trivially (∅ ≠ A ≠ {0}, so that span ∅ = {0} ⊂ span A). Thus suppose B ≠ ∅ and take any x ∈ A\B. If x ∈ span B, then x is a linear combination of vectors in B. This implies that B ∪ {x} is a subset of A that is not linearly independent, and so A itself is not linearly independent, which is a contradiction. Therefore, x ∉ span B for every x ∈ A\B whenever ∅ ≠ B ⊂ A. Since x ∈ span A (for x ∈ A) and span B ⊆ span A (for B ⊂ A), it follows that span B ⊂ span A, so that (d′) holds true.
Proof of (d)⇒(a). If A is not linearly independent, then either A = {0} or there is an x ∈ A which is a linear combination of vectors in A\{x}. In the former case B = ∅ is the unique proper subset of A and span B = {0} = span A. In the latter case B = A\{x} is a proper subset of A such that span B = span A. (Indeed, span B ⊆ span A as B ⊆ A, and span A ⊆ span B as vectors in A are linear combinations of vectors in A\{x}.) Thus (d′) implies (a).
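The equivalence (a)⇔(b) of Proposition 2.3 can be tested concretely in the finite-dimensional rational space Q^n. The sketch below (the function name and the example vectors are ours, not the book's) decides linear independence by row reduction over exact rational arithmetic:

```python
from fractions import Fraction

def is_linearly_independent(vectors):
    # Row-reduce over Q; the set is independent iff rank equals its size.
    rows = [[Fraction(x) for x in v] for v in vectors]
    rank = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                f = rows[r][col] / rows[rank][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank == len(vectors)

A = [(1, 0, 1), (0, 1, 1)]
assert is_linearly_independent(A)                      # unique representations on span A
assert not is_linearly_independent(A + [(1, 1, 2)])    # (1,1,2) = (1,0,1) + (0,1,1)
```

Over the enlarged (dependent) set, the vector (1, 1, 2) admits two representations, 1·(1, 1, 2) and (1, 0, 1) + (0, 1, 1), which is exactly the failure of statement (b) that the proof exploits.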
2.4 Hamel Basis

A linearly independent subset of a linear space X that spans X is called a Hamel basis (or a linear basis) for X. In other words, a subset B of a linear space X is a Hamel basis for X if (i) B is linearly independent, and (ii) span B = X. Let B = {xγ}γ∈Γ be an indexed Hamel basis for a linear space X. If x is a nonzero vector in X, then Proposition 2.3 ensures the existence of a unique (similarly indexed) family of scalars {αγ}γ∈Γ (which may depend on x) such that αγ = 0 for all but a finite set of indices γ and x = ∑_{γ∈Γ} αγ xγ. The weighted sum ∑_{γ∈Γ} αγ xγ (i.e., the unique representation of x as a linear combination of vectors in B, or the unique (linear) representation of x in
terms of B) is called the expansion of x on B, and the coefficients of it (i.e., the unique indexed family of scalars {αγ}γ∈Γ) are called the coordinates of x with respect to the indexed basis B. If x = 0, then its unique expansion on B is the trivial one whose coefficients are all null. Since ∅ is linearly independent, and since span ∅ = {0}, it follows that the empty set ∅ is a Hamel basis for the zero linear space {0}. Now suppose X is a nonzero linear space. Every singleton {x} in X such that x ≠ 0 is linearly independent. Thus every nonzero linear space has many linearly independent subsets. If a linearly independent subset A of X is not already a Hamel basis for X, then we can construct a larger linearly independent subset of X.
Proposition 2.4. If A is a linearly independent subset of a linear space X, and if there exists a vector x in X\span A, then A ∪ {x} is another linearly independent subset of X.
Proof. Suppose there exists x ∈ X\span A. Note that x ≠ 0, and so X ≠ {0}. If A = ∅, then the result is trivially verified ({x} = ∅ ∪ {x} is linearly independent). Thus suppose A is nonempty and set C = A ∪ {x} ⊂ X. Since x ∉ span A, it follows that x ∉ span (C\{x}). Take an arbitrary a ∈ A. Suppose it is a linear combination of vectors in C\{a}. Since a ≠ αx for every scalar α (for x ∉ span A and a ≠ 0 because A is linearly independent), we get

a = α0 x + ∑_{i=1}^{n} αi ai,
where each ai is a vector in A\{a} and each αi is a nonzero scalar (recall: 0 ≠ a ≠ ∑_{i=1}^{n} αi ai because A is linearly independent). Thus x is a linear combination of vectors in A, which contradicts the assumption that x ∉ span A. Therefore, every a ∈ A is not a linear combination of vectors in C\{a}. Conclusion: Every c ∈ C is not a linear combination of vectors in C\{c}, which means that C is linearly independent.
Can we proceed this way, enlarging linearly independent subsets of X in order to form a chain of linearly independent subsets, so that an "ultimate" linearly independent subset becomes a Hamel basis for X? Yes, we can; and it seems reasonable that the Axiom of Choice (or any statement equivalent to it as, for instance, Zorn's Lemma) might be called into play. In fact, every linearly independent subset of any linear space X is included in some Hamel basis for X, so that every linear space has a large supply of Hamel bases.
Theorem 2.5. If A is a linearly independent subset of a linear space X, then there exists a Hamel basis B for X such that A ⊆ B.
Proof. Suppose A is a linearly independent subset of a linear space X. Set

IA = {B ∈ ℘(X): B is linearly independent and A ⊆ B},
the collection of all linearly independent subsets of X that include A. Recall that, as a nonempty subcollection (since A ∈ IA) of the power set ℘(X), IA is partially ordered in the inclusion ordering.
Claim 1. IA has a maximal element.
Proof. If X = {0}, then A = ∅ and IA = {A} = {∅} ≠ ∅, so that the claimed result is trivially verified. Thus suppose X ≠ {0}. In this case, the nonempty collection IA contains a nonempty set (e.g., if A = ∅, then every nonzero singleton in X belongs to IA; if A ≠ ∅, then A ∈ IA). Now consider an arbitrary chain C in IA containing a nonempty set. Recall that ⋃C denotes the union of all sets in C. Take an arbitrary finite nonempty subset of ⋃C, say, a set D ⊆ ⋃C such that #D = n for some n ∈ N. Each element of D belongs to a set in C (for D ⊆ ⋃C). Since C is a chain, we can arrange the elements of D as follows: D = {xi}_{i=1}^{n} such that xi ∈ Ci ∈ C for each index i, where C1 ⊆ · · · ⊆ Cn. Thus D ⊆ Cn. Since Cn is linearly independent (because Cn ∈ C ⊆ IA), it follows that D is linearly independent. Conclusion: Every finite subset of ⋃C is linearly independent. Therefore ⋃C is linearly independent by Proposition 2.3. Moreover, since A ⊆ C for all C ∈ C (for C ⊆ IA), it also follows that A ⊆ ⋃C. Hence ⋃C ∈ IA. Since ⋃C clearly is an upper bound for C, we may conclude: Every chain in IA has an upper bound in IA. Thus IA has a maximal element by Zorn's Lemma.
Claim 2. B ∈ IA is maximal in IA if and only if B is a Hamel basis for X.
Proof. Again, if X = {0}, then B = A = ∅ is the only (and so a maximal) element in IA and span B = X, so that the claimed result holds trivially. Thus suppose X ≠ {0}, which implies that IA contains nonempty sets, and take an arbitrary B in IA. If span B ≠ X (i.e., if span B ⊂ X), then take x ∈ X\span B so that B ∪ {x} ∈ IA (i.e., B ∪ {x} is linearly independent by Proposition 2.4, and A ⊂ B ∪ {x} because A ⊆ B). Hence B is not maximal in IA. Therefore, if B is maximal in IA, then span B = X.
On the other hand, if span B = X, then B ≠ ∅ (for X ≠ {0}) and every vector in X is a linear combination of vectors in B. Thus B ∪ {x} is not linearly independent for every x ∈ X\B. This implies that there is no B′ ∈ IA such that B ⊂ B′, which means that B is maximal in IA. Conclusion: If B ∈ IA, then B is maximal in IA if and only if span B = X. According to the definition of Hamel basis, B in IA is such that span B = X if and only if B is a Hamel basis for X.
Claims 1 and 2 ensure that, for each linearly independent subset A of X, there exists a Hamel basis B for X such that A ⊆ B.
Since the empty set is a linearly independent subset of any X, Theorem 2.5 ensures (by setting A = ∅) that every linear space has a Hamel basis. In this case I∅ is the collection of all linearly independent subsets of X, and so Claim 2 says that a Hamel basis for a linear space is precisely a maximal linearly independent subset of it (i.e., a Hamel basis is a maximal element of I∅).
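In a finite-dimensional space the Zorn's Lemma machinery of Theorem 2.5 collapses to a terminating greedy loop: keep applying the enlargement step of Proposition 2.4 until the set spans. A minimal sketch in Q^dim (the helper names and the choice of canonical candidate vectors are ours, not the book's):

```python
from fractions import Fraction

def rank(vectors):
    # Row-reduce over Q and count the pivots.
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def extend_to_basis(A, dim):
    # Theorem 2.5 in Q^dim: enlarge A via Proposition 2.4 until it spans.
    basis = list(A)
    canonical = [tuple(1 if j == i else 0 for j in range(dim)) for i in range(dim)]
    for e in canonical:
        if rank(basis + [e]) > rank(basis):   # e lies outside span(basis)
            basis.append(e)                   # so basis ∪ {e} is still independent
    return basis

B = extend_to_basis([(1, 1, 0)], 3)
assert len(B) == 3 and rank(B) == 3           # a Hamel basis for Q^3 containing (1,1,0)
```

The loop is the finite shadow of the chain argument: each pass produces a strictly larger member of IA, and maximality (a basis) is reached after at most dim steps.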
The idea behind the previous theorem was that of enlarging a linearly independent subset of X to get a Hamel basis for X. Another way of facing the same problem (i.e., another way to obtain a Hamel basis for a linear space X) is to begin with a set that spans X and then to weed out from it a linearly independent subset that also spans X.
Theorem 2.6. If a subset A of a linear space X spans X, then there exists a Hamel basis B for X such that B ⊆ A.
Proof. Let A be a subset of a linear space X such that span A = X, and consider the collection IA of all linearly independent subsets of A:

IA = {B ∈ ℘(X): B is linearly independent and B ⊆ A}.

If X = {0}, then either A = ∅ or A = {0}. In any case IA = {∅} trivially has a maximal element. If X ≠ {0}, then A has a nonzero vector (for span A = X) and every nonzero singleton {x} ⊆ A is an element of IA. Thus, proceeding exactly as in the proof of Theorem 2.5 (Claim 1), we can show that IA has a maximal element. Let A0 be a maximal element of IA. If A is linearly independent, then we are done (i.e., A is itself a Hamel basis for X since span A = X). Thus suppose A is not linearly independent so that A0 is a proper subset of A. Take an arbitrary a ∈ A\A0 and consider the set A0 ∪ {a} ⊆ A, which is not linearly independent because A0 is maximal in IA. Since A0 is linearly independent, it follows that a is a linear combination of vectors in A0. Thus A\A0 ⊆ span A0, and hence A = A0 ∪ (A\A0) ⊆ span A0. Therefore span A ⊆ span(span A0) = span A0 ⊆ span A, which implies that span A0 = span A = X. Conclusion: A0 is a Hamel basis for X.
Since X trivially spans X, Theorem 2.6 holds for A = X. In this case IX is the collection of all linearly independent subsets of X (i.e., IX = I∅), and the theorem statement again says that every linear space has a Hamel basis.
An ever-present purpose in mathematics is a quest for hidden invariants.
The concept of Hamel basis supplies a fundamental invariant for a given linear space X, namely, the common cardinality of all Hamel bases for X.
Theorem 2.7. Any two Hamel bases for a linear space have the same cardinality.
Proof. If X = {0}, then the result holds trivially. Suppose X ≠ {0} and let B and C be arbitrary Hamel bases for X (so that they are nonempty and do not contain the origin). Proposition 2.3 ensures that for every nonzero vector x in X there is a unique finite subset of the Hamel basis C, say Cx, such that x is a linear combination of all vectors in Cx ⊆ C. Now take an arbitrary c ∈ C and consider the unique representation of it as a linear combination of vectors in the Hamel basis B. Thus c is a linear combination of all vectors in {b} ∪ B′ for some (nonzero) b ∈ B and some finite subset B′ of B. Hence c = βb + d, where β is a nonzero scalar and d is a vector in X different from c (for βb ≠ 0). If d = 0, then c = βb so that Cb = {c}, and hence c ∈ Cb trivially. Suppose
d ≠ 0. Recalling again that C also is a Hamel basis for X, consider the unique representation of the nonzero vector d as a linear combination of vectors in C, so that βb = c − d ≠ 0 is a linear combination of vectors in C. Thus b is itself a linear combination of all vectors in {c} ∪ C′ for some subset C′ of C. Since such a representation is unique, {c} ∪ C′ = Cb. Therefore c ∈ Cb. Summing up: For every c ∈ C there exists b ∈ B such that c ∈ Cb. Hence

C ⊆ ⋃_{b∈B} Cb.
Now we shall split the proof into two parts, one dealing with the case of finite Hamel bases, and the other with infinite Hamel bases.
Claim 0. If a subset E of a linear space X has exactly n elements and spans X, then every subset of X with more than n elements is not linearly independent.
Proof. Assume the linear space X is nonzero (i.e., X ≠ {0}) to avoid trivialities. Take an integer n ∈ N and let E = {ei}_{i=1}^{n} be a subset of X with n elements such that span E = X. Take any subset of X with n + 1 elements, say D = {di}_{i=1}^{n+1}. Suppose D is linearly independent. Now consider the set S1 = {d1} ∪ E, which clearly spans X (because E already does). Since span E = X, it follows that d1 is a linear combination of vectors in E. Moreover, d1 ≠ 0 because D is linearly independent. Thus d1 = ∑_{i=1}^{n} αi ei, where at least one, say αk, of the scalars {αi}_{i=1}^{n} is nonzero. Therefore, if we delete ek from S1, then the set S1′ = S1\{ek} = {d1} ∪ E\{ek} still spans X. That is, in forming this new set S1′ that spans X we have traded off one vector in D for one vector in E. Next rename the elements of S1′ by setting si = ei for each i ≠ k and sk = d1, so that S1′ = {si}_{i=1}^{n}. Since D has at least two elements, set S2 = {d2} ∪ S1′ = {d1, d2} ∪ E\{ek}, which again spans X (for S1′ spans X). Since span S1′ = X, it follows that d2 is a linear combination of vectors in S1′, say d2 = ∑_{i=1}^{n} βi si for some family of scalars {βi}_{i=1}^{n}. Moreover, 0 ≠ d2 ≠ βk sk = βk d1 because D is linearly independent. Thus there is at least one nonzero scalar in {βi}_{i=1}^{n} different from βk, say βj. Hence, if we delete sj from S2 (recall: sj = ej ≠ ek), then the set S2′ = S2\{ej} = {d1, d2} ∪ E\{ek, ej} still spans X. Continuing this way, we eventually get down to the set

Sn′ = {di}_{i=1}^{n} = D\{dn+1},
which once again spans X. Thus dn+1 is a linear combination of vectors in D\{dn+1}, which contradicts the assumption that D is linearly independent. Conclusion: Every subset of X with n + 1 elements is not linearly independent. Recalling that every subset of a linearly independent set is again linearly independent, it follows that every subset of X with more than n elements is not linearly independent.
Claim 1. If B is finite, then #C = #B.
Proof. Recall that Cb is finite for every b in B. If B is finite, then ⋃_{b∈B} Cb is a finite union of finite sets. Hence any subset of it is finite. In particular, C is finite. Since C is linearly independent, it follows by Claim 0 that #C ≤ #B. Dually (swap the Hamel bases B and C), #B ≤ #C. Hence #C = #B.
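Claim 0 is the classical exchange argument, and its conclusion is easy to witness numerically: in Q², which is spanned by two vectors, no three vectors can be linearly independent. A sketch with an illustrative rank routine (the names and the example vectors are ours):

```python
from fractions import Fraction

def rank(vectors):
    # Row-reduce over Q and count the pivots.
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

# E = {(1,0), (0,1)} spans Q^2 with n = 2, so any set of 3 vectors is dependent.
D = [(2, 3), (5, 7), (11, 13)]
assert rank(D) <= 2 < len(D)   # rank cannot exceed n, hence D is not linearly independent
```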
Claim 2. If B is infinite, then #C = #B.
Proof. If B is infinite, and since Cb is finite for every b in B, it follows that #Cb ≤ #B for all b in B. Thus, according to Theorems 1.9 and 1.10,

#⋃_{b∈B} Cb ≤ #(B×B) = #B

because B is infinite. Hence #C ≤ #B (recall: C ⊆ ⋃_{b∈B} Cb, and use Problems 1.21(a) and 1.22). Moreover, Claim 1 ensures that B and C are finite together. Thus C must be infinite as B is infinite. Since C is infinite we may reverse the argument (swapping again the Hamel bases B and C) and get #B ≤ #C. Hence #C = #B by the Cantor–Bernstein Theorem (Theorem 1.6).
Claims 1 and 2 ensure that, if B and C are Hamel bases for a linear space X, then B and C have the same cardinal number. Such an invariant (i.e., the cardinality of any Hamel basis) is called the dimension (or the linear dimension) of the linear space X, denoted by dim X. Thus dim X = #B for any Hamel basis B for X. If the dimension of X is finite (equivalently, if any Hamel basis for X is a finite set), then we say that X is a finite-dimensional linear space. Otherwise (i.e., if any Hamel basis for X is an infinite set) we say that X is an infinite-dimensional linear space.
Example 2.I. The Kronecker delta (or Kronecker function) is the mapping in 2^{Z×Z} (i.e., the function from Z×Z to {0, 1}) defined by

δij = 1 if i = j,  δij = 0 if i ≠ j,

for all integers i, j. Now consider the linear space F^n (for an arbitrary positive integer n, over an arbitrary field F — see Example 2.D). The subset B = {ei}_{i=1}^{n} of F^n consisting of the n-tuples ei = (δi1, . . . , δin), with 1 at the ith position and zeros elsewhere, constitutes a Hamel basis for F^n. This is called
the canonical basis (or the natural basis) for F^n. Thus dim F^n = n. As we shall see later, F^n in fact is a prototype for every finite-dimensional linear space (of dimension n) over a field F.
Example 2.J. Let F^N be the linear space (over a field F) of all scalar-valued sequences (see Example 2.E), and let X be the subset of F^N defined as follows: x = {ξk}_{k∈N} belongs to X if and only if ξk = 0 except for some finite set of indices k in N. That is, X is the set consisting of all F-valued sequences with a finite number of nonzero entries, which clearly is a linear manifold of F^N, and hence a linear space itself over F. For each integer j ∈ N let ej be an F-valued sequence with just one nonzero entry (equal to 1) at the jth position; that is, ej = {δjk}_{k∈N} ∈ X for every j ∈ N. Now set B = {ej}_{j∈N} ⊂ X. It is readily verified that B is linearly independent and that span B = X (every vector in X is a linear combination of vectors in B). Thus B is a Hamel basis for X. Since B is countably infinite, X is an infinite-dimensional linear space with dim X = ℵ0. Therefore (see Problem 2.6(b)), F^N is an infinite-dimensional linear space. Note that B is not a Hamel basis for F^N (reason: span B = X and X is properly included in F^N). The next example shows that ℵ0 < dim F^N whenever F = Q, F = R, or F = C.
Example 2.K. Let C^N be the complex linear space of all complex-valued sequences. For each real number t ∈ (0, 1) consider the real sequence xt = {t^{k−1}}_{k∈N} = {t^k}_{k∈N0} = (1, t, t², . . .) ∈ C^N, whose entries are the nonnegative powers of t. Set A = {xt}_{t∈(0,1)} ⊆ C^N. We claim that A is linearly independent. A bit of elementary real analysis (rather than pure algebra) supplies a very simple proof as follows. Suppose A is not linearly independent. Then there exists s ∈ (0, 1) such that xs is a linear combination of vectors in A\{xs}.
That is, xs = ∑_{i=1}^{n} αi xti for some n ∈ N, where {αi}_{i=1}^{n} is a family of nonzero complex numbers and {xti}_{i=1}^{n} is a (finite) subset of A such that xti ≠ xs for every i = 1, . . . , n. Hence n > 1 (reason: if n = 1, then xs = α1 xt1 so that s^k = α1 t1^k for every k ∈ N0, which implies that xs = xt1). As the set {ti}_{i=1}^{n} consists of distinct points from (0, 1), suppose it is decreasingly ordered (reorder it if necessary) so that ti < t1 for each i = 2, . . . , n. Since s^k = ∑_{i=1}^{n} αi ti^k, we get (s/t1)^k = α1 + ∑_{i=2}^{n} αi (ti/t1)^k for every k ∈ N0. But limk ∑_{i=2}^{n} αi (ti/t1)^k = 0, because each ti/t1 lies in (0, 1), and hence limk (s/t1)^k = α1. Thus α1 = 0 (recall: xs ≠ xt1 so that s ≠ t1), which is a contradiction. Conclusion: A is linearly independent. Therefore, by Theorem 2.5 there is a Hamel basis B for C^N including A. Since A ⊆ B and #A = #(0, 1) = 2^ℵ0, it follows that 2^ℵ0 ≤ #B. However, #C = #R = 2^ℵ0 ≤ #B = dim C^N, and so #C^N = dim C^N (Problem 2.8). Conclusion: C^N is an infinite-dimensional linear space such that

2^ℵ0 ≤ dim C^N = #C^N.

Note that the whole argument does apply for C replaced by R, so that

2^ℵ0 ≤ dim R^N = #R^N;
but it does not apply to the rational field Q (the interval (0, 1) is not a subset of Q, and hence the set A is not included in Q^N). However, the final conclusion does hold for the linear space Q^N. Indeed, if F is an arbitrary infinite field, then 2^ℵ0 = #2^N ≤ #F^N = max{#F, dim F^N} according to Problems 1.24 and 2.8. Therefore, since #Q = ℵ0 < 2^ℵ0 (Problem 1.25(c)), it follows that

2^ℵ0 ≤ dim Q^N = #Q^N.
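The independence of the sequences xt in Example 2.K can be glimpsed in finite dimensions: truncating xt = (1, t, t², . . .) to its first n entries for n distinct values of t yields a Vandermonde matrix, which has full rank, and a vanishing finite combination of the full sequences would force the truncated combination to vanish too. A sketch over exact rationals (the helper name and the particular values of t are ours):

```python
from fractions import Fraction

def rank(vectors):
    # Row-reduce over Q and count the pivots.
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

ts = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]
V = [tuple(t ** k for k in range(len(ts))) for t in ts]   # truncated x_t's: a Vandermonde matrix
assert rank(V) == len(ts)                                 # distinct t's give full rank
```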
2.5 Linear Transformations

A mapping L: X → Y of a linear space X over a field F into a linear space Y over the same field F is homogeneous if L(αx) = αLx for every vector x ∈ X and every scalar α ∈ F. The scalar multiplication on the left-hand side is an operation on X and that on the right-hand side is an operation on Y (so that the linear spaces X and Y must indeed be defined over the same field F). L is additive if L(x1 + x2) = L(x1) + L(x2) for all vectors x1, x2 in X. Again, the vector addition on the left-hand side is an operation on X while the one on the right-hand side is an operation on Y. If X and Y are linear spaces over the same scalar field, and if L is a homogeneous and additive mapping of X into Y, then L is a linear transformation: a linear transformation is a homogeneous and additive mapping between linear spaces over the same scalar field. When we say that L: X → Y is a linear transformation, it is implicitly assumed that X and Y are linear spaces over the same field F. If X = Y and L: X → X is a linear transformation, then we refer to L as a linear transformation on X. Trivial example: The identity I: X → X (such that I(x) = x for every x ∈ X) is a linear transformation on X. Recall that a field F can be made into a linear space over F itself (see Example 2.D). If X is a linear space over F, then a linear transformation f: X → F is called a linear functional: a linear functional is a scalar-valued linear transformation (i.e., a linear transformation of a linear space X into its scalar field). If y ∈ Y is the value of a linear transformation L: X → Y at x ∈ X, then we shall often write y = Lx (instead of y = L(x)). Since Y is a linear space, it has an origin. The null space (or kernel) of a linear transformation L: X → Y is the subset

N(L) = {x ∈ X : Lx = 0} = L−1({0})

of X consisting of all vectors in X mapped into the origin of Y by L. Since X also is a linear space, it has an origin too. The origin of X is always in N(L)
(i.e., L0 = 0 for every linear transformation L). The null transformation (denoted by O) is the mapping O: X → Y such that Ox = 0 for every x ∈ X , which certainly is a linear transformation. In fact, if L: X → Y is a linear transformation, then L = O if and only if N (L) = X . Equivalently, L = O if and only if R(L) = {0}. The null space N (L) = L−1 ({0}) of any linear transformation L: X → Y is a linear manifold of X , and the range R(L) = L(X ) of L is a linear manifold of Y (Problem 2.10(a)). These are indeed particular cases of Problem 2.11: The linear image of a linear manifold is a linear manifold , and the inverse image of a linear manifold under a linear transformation is again a linear manifold . The theorem below supplies an elegant and useful, although very simple, necessary and sufficient condition that a linear transformation be injective. Theorem 2.8. A linear transformation L is injective if and only if N (L) = {0}. Proof. Let X and Y be linear spaces over the same scalar field and consider a linear transformation L: X → Y. If L is injective, then L−1 (L({0})) = {0} (see Problem 1.3(d)). But L({0}) = {0} (since L0 = 0) so that L−1 ({0}) = {0}, which means N (L) = {0}. On the other hand, suppose N (L) = {0}. Take x1 and x2 arbitrary in X , and note that Lx1 − Lx2 = L(x1 − x2 ) since L is linear. Thus, if Lx1 = Lx2 , then L(x1 − x2 ) = 0 and hence x1 = x2 (i.e., x1 − x2 = 0 because N (L) = {0}). Therefore L is injective. The collection Y S of all mappings of a set S into a linear space Y over a field F is itself a linear space over F (see Example 2.F). Now suppose X is a linear space (over the same field F ) and let L[X , Y ] denote the collection of all linear transformations of X into Y. Since L[X , Y ] is a linear manifold of Y X, it follows that L[X , Y ] is a linear space over the same field F (see Problem 2.13). Set L[X ] = L[X , X ] for short so that L[X ] ⊂ X X is the linear space of all linear transformations on X . 
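Theorem 2.8 has a direct computational counterpart for a transformation L: Q^n → Q^m given by a matrix: N(L) = {0} exactly when the columns of the matrix are linearly independent, i.e. when the column rank equals n. A sketch (the matrix entries and helper name are ours, not the book's):

```python
from fractions import Fraction

def rank(vectors):
    # Row-reduce over Q and count the pivots.
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

# L: Q^2 -> Q^3 with matrix rows (1,0), (1,1), (0,2); its columns are (1,1,0) and (0,1,2).
L_rows = [(1, 0), (1, 1), (0, 2)]
assert rank(list(zip(*L_rows))) == 2    # columns independent: N(L) = {0}, L injective

M_rows = [(1, 2), (2, 4)]               # second column is twice the first
assert rank(list(zip(*M_rows))) < 2     # N(M) contains (2, -1) != 0: M not injective
```

The second matrix sends both (0, 0) and (2, −1) to the origin, so distinct vectors share an image, which is exactly the failure of injectivity the theorem predicts.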
The linear space L[X , F ] of all linear functionals defined on a linear space X , which is a linear manifold of the linear space F X (see Example 2.E), is called the algebraic dual (or algebraic conjugate) of X and is denoted by X . (Dual spaces will be considered in Chapter 4.) Let X and Y be linear spaces over the same scalar field. Let L|M : M → Y be the restriction of a linear transformation L: X → Y to a linear manifold M of X . Since M is a linear space, it is easy to show that L|M is a linear transformation: The restriction of a linear transformation to a linear manifold is again a linear transformation (Problem 2.14). The next result ensures the converse: If M is a linear manifold of X and L ∈ L[M, Y ], then there exists T ∈ L[X , Y ] such that L = T |M . T is called a linear extension of L over X . Theorem 2.9. Let X and Y be linear spaces over the same field F , and let M be a linear manifold of X . If L: M → Y is a linear transformation, then there exists a linear extension T : X → Y of L defined on the whole space X .
Proof. Set

K = {K ∈ L[N, Y]: N ∈ Lat(X), M ⊆ N and L = K|M},

the collection of all linear transformations from linear manifolds of X to Y which are extensions of L. Note that K is nonempty (at least L is there). Moreover, as a subcollection of F = ⋃_{A∈℘(X)} Y^A, K is partially ordered in the extension ordering (see Problem 1.17). Problem 1.17 also tells us that every chain {Kγ} in K has a supremum ⋁γ Kγ in F with domain D(⋁γ Kγ) = ⋃γ D(Kγ) and range R(⋁γ Kγ) = ⋃γ R(Kγ). Since each D(Kγ) ∈ Lat(X) (each Kγ is a linear transformation defined on a linear manifold of X), and since Lat(X) is a complete lattice, it follows that D(⋁γ Kγ) is a linear manifold of X (i.e., ⋃γ D(Kγ) ∈ Lat(X)). Similarly, R(⋁γ Kγ) is a linear manifold of Y.
Claim. The supremum ⋁γ Kγ lies in K.
Proof. Take u and v arbitrary in D(⋁γ Kγ), so that u ∈ D(Kλ) for some Kλ ∈ {Kγ} and v ∈ D(Kμ) for some Kμ ∈ {Kγ}. Since {Kγ} is a chain, it follows that Kλ ≤ Kμ (or vice versa), so that D(Kλ) ⊆ D(Kμ). Thus αu + βv ∈ D(Kμ), and hence Kμ(αu + βv) = αKμu + βKμv for every α, β ∈ F (recall: each Kγ is linear). However, (⋁γ Kγ)|D(Kμ) = Kμ, which implies that (⋁γ Kγ)(αu + βv) = α(⋁γ Kγ)u + β(⋁γ Kγ)v. That is, ⋁γ Kγ: D(⋁γ Kγ) → Y is linear. Moreover, since each Kγ is such that Kγ|M = L, and since {Kγ} is
L = K0|M = K1|M. Thus K1 ∈ K, which contradicts the fact that K0 is maximal in K (for K0 ≠ K1). Therefore, N0 = X.
Let X and Y be nonzero linear spaces over the same field. Take x ≠ 0 in X and y ≠ 0 in Y, set M = span{x} in Lat(X), and let L: M → Y be defined by Lu = αy for every u = αx ∈ M. Clearly, L is linear and L ≠ O. Thus Theorem 2.9 ensures that, if X and Y are nonzero linear spaces over the same field, then there exist many T ≠ O in L[X, Y] (at least as many as one-dimensional linear manifolds in Lat(X)).
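The step K1x = K0x0 + αy in the proof of Theorem 2.9 is easy to see in coordinates. Below, a hypothetical functional L on the manifold M = {(a, b, 0)} ⊂ Q³ is extended to all of Q³ by choosing a value y for the complementary basis vector e3 (all names and numbers here are ours, for illustration only):

```python
# M = {(a, b, 0)} in Q^3 with L(a, b, 0) = a + b. Following the proof, extend to
# N1 = M + span{e3} by choosing any scalar value y for e3.
def make_extension(y):
    # T(a, b, c) = L(a, b, 0) + c*y  -- the K1 x = K0 x0 + alpha*y construction.
    return lambda v: v[0] + v[1] + v[2] * y

T = make_extension(0)            # y = 0 is a legitimate (and simplest) choice
assert T((3, 4, 0)) == 3 + 4     # T restricted to M agrees with L
assert T((3, 4, 5)) == 7         # with y = 0 the extension annihilates e3

T2 = make_extension(10)          # a different y gives a different extension...
assert T2((3, 4, 0)) == 7        # ...but both restrict to L on M
assert T2((3, 4, 5)) == 57
```

The two extensions T and T2 illustrate why the extension of Theorem 2.9 is generally far from unique: every choice of y yields one.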
2.6 Isomorphisms

Two exemplars of a mathematical structure are indistinguishable, in the context of the theory in which that structure is embedded, if there exists a one-to-one correspondence between them that preserves such a structure. This is a central concept in mathematics. From the point of view of linear space theory, two linear spaces are essentially the same if there exists a one-to-one correspondence between them that preserves all the linear relations — they may differ in the set-theoretic nature of their elements but, as far as the linear space (algebraic) structure is concerned, they are indistinguishable. In other words, two linear spaces X and Y over the same scalar field are regarded as essentially the same linear space if there exists a one-to-one correspondence between them that preserves vector addition and scalar multiplication; that is, if there exists at least one invertible linear transformation from X to Y whose inverse from Y to X also is linear. The theorem below shows that the inverse of an invertible linear transformation is always linear.
Theorem 2.10. Let X and Y be linear spaces over the same field F. If L: X → Y is an invertible linear transformation, then its inverse L−1: Y → X is a linear transformation.
Proof. Suppose L: X → Y is an invertible linear transformation. Recall that a function is invertible if and only if it is injective and surjective. Take y1 and y2 arbitrary in Y, so that there exist x1 and x2 in X such that y1 = Lx1 and y2 = Lx2 (because Y = R(L) — i.e., L is surjective). Since L is injective (i.e., L−1L is the identity on X — see Problems 1.5 and 1.7) and additive, it follows that

L−1(y1 + y2) = L−1(Lx1 + Lx2) = L−1L(x1 + x2) = x1 + x2 = L−1Lx1 + L−1Lx2 = L−1y1 + L−1y2,

and hence L−1 is additive. Similarly, since L is injective and homogeneous,

L−1(αy) = L−1(αLx) = L−1L(αx) = αx = αL−1Lx = αL−1y

for every y ∈ Y = R(L) and every α ∈ F, which implies that L−1 is homogeneous. Thus L−1 is a linear transformation.
An isomorphism between linear spaces (over the same scalar field) is an injective and surjective linear transformation. Equivalently, an isomorphism between linear spaces is an invertible linear transformation. Two linear spaces X and Y over the same field F are isomorphic if there exists an isomorphism (i.e., a linear one-to-one correspondence) of X onto Y. Thus, according to Theorem 2.8, a linear transformation L: X → Y of a linear space X into a linear space Y is an isomorphism if and only if N(L) = {0} and R(L) = Y. In particular, if N(L) = {0}, then X and the range of L (R(L) = L(X)) are isomorphic linear spaces. We noticed in Example 2.I that F^n is a "prototype" for every n-dimensional linear space over F. What this really means is that every n-dimensional linear space over a field F is isomorphic to F^n, and hence two n-dimensional linear spaces over the same scalar field are isomorphic. In fact, such an isomorphism between linear spaces with the same dimension holds in general, either for finite-dimensional or for infinite-dimensional linear spaces. We shall prove this below (Theorem 2.12), but first we need the following auxiliary result.
Proposition 2.11. Let X and Y be linear spaces over the same field, and let B be a Hamel basis for X. For each mapping F: B → Y there exists a unique linear transformation T: X → Y such that T|B = F.
Proof. If B = {xγ}γ∈Γ is a Hamel basis for X, indexed by an index set Γ (recall: any set can be thought of as an indexed set), then every vector x in X has a unique expansion on B, viz.,

x = ∑_{γ∈Γ} αγ xγ,

where {αγ}γ∈Γ is a similarly indexed family of scalars with αγ = 0 for all but a finite set of indices γ (the coordinates of x with respect to the basis B). Set

T x = ∑_{γ∈Γ} αγ F(xγ)
for every x ∈ X . This defines a mapping T : X → Y of X into Y which is homogeneous, additive, and equals F when restricted to B. That is, T is a linear transformation such that T |B = F . Moreover, if L: X → Y is a linear transformation of X into Y such that L|B = F , then L = T . Indeed, for every x ∈ X ,
Lx = L(∑_{γ∈Γ} αγ xγ) = ∑_{γ∈Γ} αγ L(xγ) = ∑_{γ∈Γ} αγ F(xγ) = T x.
Theorem 2.12. Two linear spaces X and Y over the same scalar field are isomorphic if and only if dim X = dim Y. Proof. (a) Let L: X → Y be an isomorphism of X onto Y, and let BX be a Hamel basis for X . Set BY = L(BX ), a subset of Y.
Claim 1. BY is linearly independent.
Proof. Recall that L is an injective and surjective linear transformation. If BY is not linearly independent, then there exists y ∈ BY which is a linear combination of vectors in BY\{y}, say y = ∑_{i=1}^{n} αi yi, where each yi is a vector in BY\{y}. Thus x = L−1y in BX = L−1(BY) is a linear combination of vectors in BX\{x}. (Indeed, x = ∑_{i=1}^{n} αi xi, where each xi = L−1yi is a vector in BX = L−1(BY) different from x = L−1y — recall: each yi is a vector in BY different from y, and L is injective.) But this contradicts the fact that BX is linearly independent. Conclusion: BY is linearly independent.
Claim 2. BY spans Y.
Proof. Take y ∈ Y arbitrary so that y = Lx for some x ∈ X (because L is surjective). Since span BX = X, it follows that x is a linear combination of vectors in BX. Hence y = Lx is a linear combination of vectors in BY = L(BX) (since L is linear), so that span BY = Y.
Therefore, BY is a Hamel basis for Y. Moreover, #BY = #BX because L sets a one-to-one correspondence between BX and BY. (In fact, the restriction L|BX: BX → BY is injective and surjective, since L is injective and BY = L(BX) by definition.) Thus dim X = dim Y.
(b) Let BX and BY be Hamel bases for X and Y, respectively. If dim X = dim Y, then #BY = #BX, which means that there exists a one-to-one mapping F: BX → BY of BX onto BY. Let T: X → Y be the unique linear transformation such that T|BX = F (see Proposition 2.11), and hence T(BX) = F(BX) = BY.
Claim 3. T is injective.
Proof. If X = {0}, then the result holds trivially. Thus suppose X ≠ {0}. Take any nonzero vector x in X and consider its (unique) representation as a linear combination of vectors in BX. Therefore, T x has a representation as a linear combination of vectors in BY = T(BX) because T is linear. Since BY is linearly independent, it follows that T x ≠ 0. That is, N(T) = {0}, which means, by Theorem 2.8, that T is injective.
Claim 4. T is surjective.
Proof.
Take an arbitrary vector y ∈ Y and consider its expansion on BY, say y = ∑_{i=1}^{n} αi yi with each yi in BY. Thus y = ∑_{i=1}^{n} αi T(xi) with each xi in BX, because BY = T(BX). But T is linear so that y = T(∑_{i=1}^{n} αi xi), where ∑_{i=1}^{n} αi xi is a vector in X (since X is a linear space). Hence y ∈ R(T). Therefore, T: X → Y is an isomorphism of X onto Y.
Example 2.L. Let X and Y be finite-dimensional linear spaces over the same field F, with dim X = n and dim Y = m. Let BX = {xj}_{j=1}^{n} and BY = {yi}_{i=1}^{m} be Hamel bases for X and Y, respectively. Take an arbitrary vector x in X and consider its unique expansion on BX,

  x = ∑_{j=1}^{n} ξj xj,
where the family of scalars {ξj}_{j=1}^{n} consists of the coordinates of x with respect to BX. Now let A: X → Y be any linear transformation so that

  Ax = ∑_{j=1}^{n} ξj Axj,
where Axj is a vector in Y for each j. Consider its unique expansion on BY,

  Axj = ∑_{i=1}^{m} αij yi,
where {αij}_{i=1}^{m} is a family of scalars (the coordinates of each Axj with respect to BY). Set y = Ax in Y and consider the unique expansion of y on BY,

  y = ∑_{i=1}^{m} υi yi.
Again, {υi}_{i=1}^{m} is a family of scalars consisting of the coordinates of y with respect to BY. Thus the identity y = Ax can be written as

  ∑_{i=1}^{m} υi yi = ∑_{i=1}^{m} (∑_{j=1}^{n} αij ξj) yi.
Since the expansion of y on BY is unique, it follows that

  υi = ∑_{j=1}^{n} αij ξj
for every i = 1, . . . , m. This gives an expression for each coordinate of Ax as a function of the coordinates of x. In terms of standard matrix notation, and according to the ordinary matrix operations, the matrix equation

  ⎛ υ1 ⎞   ⎛ α11 . . . α1n ⎞ ⎛ ξ1 ⎞
  ⎜  ⋮ ⎟ = ⎜  ⋮         ⋮ ⎟ ⎜  ⋮ ⎟
  ⎝ υm ⎠   ⎝ αm1 . . . αmn ⎠ ⎝ ξn ⎠

represents the identity y = Ax (the vector y is the value of the linear transformation A at the point x), and the m×n array of scalars

  [A] = ⎛ α11 . . . α1n ⎞
        ⎜  ⋮         ⋮ ⎟
        ⎝ αm1 . . . αmn ⎠
is the matrix that represents the linear transformation A: X → Y with respect to the bases BX and BY . The matrix [A] of a linear transformation A depends on the bases BX and BY . If the bases are changed, then the matrix that represents the linear transformation may change as well. Different matrices representing the same linear transformation are simply different representations of it with respect to different bases. However, for fixed bases BX and BY the representation [A] of A is unique. Uniqueness is not all. It is easy to show that (a) the set F m×n of all m×n matrices with entries in F is a linear space over F when equipped with the ordinary (entrywise) operations of matrix addition and scalar multiplication. Moreover, for fixed bases BX and BY , (b) F m×n is isomorphic to L[X , Y ]. Indeed, if we fix the bases BX and BY , then the relation between L[X , Y ] and
F m×n defined by “[A] represents A with respect to BX and BY ” in fact is a function from L[X , Y ] to F m×n . It is readily verified that such a function, say Φ: L[X , Y ] → F m×n , is homogeneous, additive, injective, and surjective. In
other words, Φ is an isomorphism. For this reason we may and shall identify a linear transformation A ∈ L[F^n, F^m] with its matrix [A] ∈ F^{m×n} relative to the canonical bases for F^n and F^m (which were introduced in Example 2.I).

Example 2.M. Let F denote either the real field or the complex field. For every nonnegative integer n let Pn[0,1] be the collection of all polynomials in the variable t ∈ [0,1] with coefficients in F of degree not greater than n:

  Pn[0,1] = { p ∈ F^{[0,1]}: p(t) = ∑_{i=0}^{n} αi t^i, t ∈ [0,1], with each αi in F }.

Recall that the degree of a nonzero polynomial p is m if p(t) = ∑_{i=0}^{m} αi t^i with αm ≠ 0 (e.g., the degree of a constant polynomial is zero), and the degree of the zero polynomial is undefined (thus not greater than any n ∈ N₀). It is readily verified that Pn[0,1] is a linear manifold of the linear space F^{[0,1]} (see Example 2.E), and hence a linear space over F. Now consider the mapping L: F^{n+1} → Pn[0,1] defined as follows. For each x = (ξ0, . . . , ξn) ∈ F^{n+1} let p = Lx in Pn[0,1] be given by

  p(t) = ∑_{i=0}^{n} ξi t^i
for every t ∈ [0, 1]. It is easy to show that L is a linear transformation. Moreover, N(L) = {0} (i.e., if p(t) = ∑_{i=0}^{n} ξi t^i = 0 for every t ∈ [0, 1], then x = (ξ0, . . . , ξn) = 0 — a nonzero polynomial has only a finite number of zeros) so that L is injective (see Theorem 2.8). Furthermore, every polynomial p in Pn[0,1] is of the form p(t) = ∑_{i=0}^{n} ξi t^i for some x = (ξ0, . . . , ξn) in F^{n+1}, which means that Pn[0,1] ⊆ R(L). Hence Pn[0,1] = R(L); that is, L is also surjective. Therefore, the linear transformation L is an isomorphism between the
linear spaces F^{n+1} and Pn[0,1]. Thus, since dim F^{n+1} = n + 1 (see Example 2.I), it follows by Theorem 2.12 that dim Pn[0,1] = n + 1. Next consider the collection P[0,1] of all polynomials in the variable t ∈ [0,1] with coefficients in F of any degree:

  P[0,1] = ⋃_{n∈N₀} Pn[0,1].
Note that P[0,1] contains the zero polynomial together with every polynomial of finite degree. It is again readily verified that, as a linear manifold of F^{[0,1]}, P[0,1] is itself a linear space over F. The functions pj: [0,1] → F, defined by pj(t) = t^j for every t ∈ [0,1], clearly belong to P[0,1] for each j ∈ N₀. Consider the set B = {pj}_{j∈N₀} ⊂ P[0,1]. Since any polynomial in P[0,1] is, by definition, a (finite) linear combination of vectors in B, we get P[0,1] ⊆ span B. Hence B spans P[0,1] (i.e., span B = P[0,1]). We claim that B is also linearly independent. Indeed, suppose B is not linearly independent. Then there exists in B a linear combination pk of vectors in B\{pk}. That is, pk = ∑_{i=1}^{m} αi p_{ji} for some m ∈ N, where {αi}_{i=1}^{m} is a family of nonzero scalars and {p_{ji}}_{i=1}^{m} is a finite subset of B such that p_{ji} ≠ pk (i.e., ji ≠ k) for every i = 1, . . . , m. Thus p = pk − ∑_{i=1}^{m} αi p_{ji} is the origin of P[0,1], which means that

  p(t) = t^k − ∑_{i=1}^{m} αi t^{ji} = 0

for all t ∈ [0, 1]. But this is a contradiction because p is a polynomial of degree equal to max({k} ∪ {ji}_{i=1}^{m}) ≥ 1. Conclusion: B is linearly independent. Therefore the set B = {pj}_{j∈N₀} is a Hamel basis for P[0,1], and hence dim P[0,1] = ℵ₀ (since #B = #N₀ = ℵ₀). Thus P[0,1] is isomorphic to the linear space X of all F-valued sequences with a finite number of nonzero entries (which was introduced in Example 2.J).
2.7 Isomorphic Equivalence Two linear spaces over the same scalar field are regarded as essentially the same linear space if they are isomorphic. Let X , Y, and Z be linear spaces over the same field F . It is clear that X is isomorphic to itself (reflexivity), and Y is isomorphic to X whenever X is isomorphic to Y (symmetry). Moreover, since the composition of two isomorphisms is again an isomorphism (see Problems 1.9(c) and 2.15), it follows that, if X is isomorphic to Y and Y is isomorphic to
Z, then X is isomorphic to Z (transitivity). Thus, if the notion of isomorphic linear spaces is restricted to a given set (for instance, to the collection of all linear manifolds Lat(X) of a linear space X), then it is an equivalence relation on that set. We shall now define an equivalence between linear transformations. As usual, let GF: X → Z denote the composition G ◦ F of a mapping G: Y → Z and a mapping F: X → Y.

Definition 2.13. Let X, X̃, Y, and Ỹ be linear spaces over the same scalar field, where X is isomorphic to X̃ and Y is isomorphic to Ỹ. Two linear transformations T: X → Y and L: X̃ → Ỹ are isomorphically equivalent if there exist isomorphisms X: X → X̃ and Y: Y → Ỹ such that

  Y T = LX.

That is, L and T are isomorphically equivalent if there are isomorphisms X and Y such that T = Y⁻¹LX (or L = Y T X⁻¹), which means that the diagram

              T
      X −−−−−−−−→ Y
     X│           ↑
      │           │Y⁻¹
      ↓     L     │
      X̃ −−−−−−−−→ Ỹ

commutes. Warning: If X is isomorphic to X̃ and Y is isomorphic to Ỹ, then there exists an uncountable supply of isomorphisms between X and X̃ and between Y and Ỹ. If we take arbitrary linear transformations T: X → Y and L: X̃ → Ỹ, it may happen that the above diagram does not commute (i.e., it may happen that Y T ≠ LX) for all isomorphisms of X onto X̃ and all isomorphisms of Y onto Ỹ. In this case T and L are not isomorphically equivalent. However, if there exists at least one pair of isomorphisms X and Y for which Y T = LX, then T and L are isomorphically equivalent.

Isomorphic equivalence deserves its name. In fact, every T in L[X, Y] is isomorphically equivalent to itself (reflexivity), and L in L[X̃, Ỹ] is isomorphically equivalent to T in L[X, Y] whenever T is isomorphically equivalent to L (symmetry). Moreover, if T in L[X, Y] is isomorphically equivalent to L in L[X̃, Ỹ] and L is isomorphically equivalent to K in L[X̂, Ŷ] (so that X, X̃, and X̂ are isomorphic linear spaces, as well as Y, Ỹ, and Ŷ), then it is easy to show that T is isomorphically equivalent to K (transitivity). Indeed, if X̃ = X and Ỹ = Y, and if we restrict the concept of isomorphic equivalence to the set L[X, Y] of all linear transformations of X into Y, then isomorphic equivalence actually is an equivalence relation on L[X, Y].

An important particular case is obtained when X = Y and X̃ = Ỹ so that T lies in L[X] and L lies in L[X̃]. Let X and X̃ be isomorphic linear spaces. Two linear transformations T: X → X and L: X̃ → X̃ are similar if there exists an isomorphism W: X → X̃ such that
W T = LW. To put it another way, T and L are similar if there is an isomorphism W such that the diagram

              T
      X −−−−−−−−→ X
     W│           │W
      ↓     L     ↓
      X̃ −−−−−−−−→ X̃
commutes. It should be noticed now that the concept of similarity will be redefined later in Chapter 4 where the linear spaces are endowed with an additional (topological) structure. Such a redefinition will assume that all linear transformations involved in the definition of similarity are “continuous” (including the inverse of W ). Example 2.N. Consider the setup of Example 2.L, where X and Y are finite-dimensional linear spaces over the same field F . Let X: X → F n and Y : Y → F m be two mappings defined by Xx = (ξ1 , . . . , ξn )
and
Y y = (υ1 , . . . , υm )
for every x ∈ X and every y ∈ Y, where {ξj}_{j=1}^{n} and {υi}_{i=1}^{m} consist of the coordinates of x and y with respect to the bases BX and BY, respectively. It is readily verified that X and Y are both isomorphisms (for fixed bases BX and BY). Let F^{n×1} denote the linear space (over the field F) of all n×1 matrices (or, if you like, the linear space of all “column n-vectors” with entries in F — Example 2.L). Now consider the map Wn: F^n → F^{n×1} that assigns to each n-tuple (ξ1, . . . , ξn) in F^n the n×1 matrix

  Wn(ξ1, . . . , ξn) = ⎛ ξ1 ⎞
                       ⎜  ⋮ ⎟
                       ⎝ ξn ⎠

in F^{n×1} whose entries are the (similarly ordered) coordinates of the ordered n-tuple with respect to the canonical basis for F^n. It is easy to show that Wn is an isomorphism between F^n and F^{n×1}. This is called the natural isomorphism of F^n onto F^{n×1}. Note that any m×n matrix (with entries in F) can be viewed as a linear transformation from F^{n×1} to F^{m×1}: the action of an m×n matrix [αij] ∈ F^{m×n} on an n×1 matrix [ξj] ∈ F^{n×1} is simply the matrix product

  [αij][ξj] = ⎛ α11 . . . α1n ⎞ ⎛ ξ1 ⎞
              ⎜  ⋮         ⋮ ⎟ ⎜  ⋮ ⎟
              ⎝ αm1 . . . αmn ⎠ ⎝ ξn ⎠,
which is an m×1 matrix in F^{m×1}. According to Example 2.L let [A] ∈ F^{m×n} be the unique matrix representing the linear transformation A ∈ L[X, Y] with respect to the bases BX and BY. Now, if this matrix is viewed as a linear transformation of F^{n×1} into F^{m×1}, then the diagram

               A
       X −−−−−−−−−→ Y
      X│            ↑
       │            │Y⁻¹
       ↓            │
      F^n          F^m
     Wn│            ↑
       │            │Wm⁻¹
       ↓    [A]     │
    F^{n×1} −−−−→ F^{m×1}

commutes. This shows that the linear transformation A: X → Y is isomorphically equivalent to its matrix [A] with respect to the bases BX and BY when this matrix is viewed as a linear transformation [A]: F^{n×1} → F^{m×1}. That is,

  (Wm Y)A = [A](Wn X).
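Once bases are fixed, commuting-diagram identities of this kind are equalities between concrete matrices and can be spot-checked numerically. The sketch below (an added illustration; NumPy assumed, and the particular matrices are arbitrary choices, not from the text) checks the analogous similarity identity W T = LW by constructing L = W T W⁻¹:

```python
import numpy as np

rng = np.random.default_rng(0)

T = rng.standard_normal((3, 3))   # any linear transformation on F^3
W = rng.standard_normal((3, 3))   # an isomorphism (invertible with probability 1)
L = W @ T @ np.linalg.inv(W)      # L is similar to T by construction

# The diagram commutes: W T = L W (up to floating-point rounding).
assert np.allclose(W @ T, L @ W)
```

Had W not been invertible, np.linalg.inv would raise, signalling that W is not an isomorphism at all.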
2.8 Direct Sum

Let {Xi}_{i=1}^{n} be a finite family of linear spaces over the same field F (not necessarily linear manifolds of the same linear space). The direct sum of {Xi}_{i=1}^{n}, denoted by ⊕_{i=1}^{n} Xi, is the set of all ordered n-tuples (x1, . . . , xn), with each xi in Xi, where vector addition and scalar multiplication are defined as follows.

  (x1, . . . , xn) ⊕ (y1, . . . , yn) = (x1 + y1, . . . , xn + yn),
  α(x1, . . . , xn) = (αx1, . . . , αxn)

for every (x1, . . . , xn) and (y1, . . . , yn) in ⊕_{i=1}^{n} Xi and every α in F. It is easy to verify that the direct sum ⊕_{i=1}^{n} Xi of the linear spaces {Xi}_{i=1}^{n} is a linear space over F when vector addition (denoted by ⊕) and scalar multiplication are defined as above. The underlying set of the linear space ⊕_{i=1}^{n} Xi is the Cartesian product ∏_{i=1}^{n} Xi of the underlying sets of each linear space Xi. The origin of ⊕_{i=1}^{n} Xi is the ordered n-tuple (01, . . . , 0n) of the origins of each Xi.

If M and N are linear manifolds of a linear space X, then we may consider both their ordinary sum M + N (defined as in Section 2.2) and their direct sum M ⊕ N. These are different linear spaces over the same field. There is however a natural mapping Φ: M ⊕ N → M + N, defined by Φ((x1, x2)) = x1 + x2, which assigns to each pair (x1, x2) in M ⊕ N their sum in M + N ⊆ X. It is readily verified that Φ is a surjective linear transformation of the linear space M ⊕ N onto the linear space M + N, but Φ is not always injective. We shall
establish below a necessary and sufficient condition that Φ be injective, viz., M ∩ N = {0}. In such a case the mapping Φ is an isomorphism (called the natural isomorphism) of M ⊕ N onto M + N, so that the direct sum M ⊕ N and the ordinary sum M + N become isomorphic linear spaces.

Theorem 2.14. Let M and N be linear manifolds of a linear space X. The following assertions are pairwise equivalent.

(a) M ∩ N = {0}.

(b) For each x in M + N there exists a unique u in M and a unique v in N such that x = u + v.

(c) The natural mapping Φ: M ⊕ N → M + N is an isomorphism.

Proof. Take an arbitrary x in M + N. If x = u1 + v1 = u2 + v2, with u1, u2 in M and v1, v2 in N, then u1 − u2 = v1 − v2 is in M ∩ N (for u1 − u2 ∈ M and v1 − v2 ∈ N). Thus M ∩ N = {0} implies that u1 = u2 and v1 = v2, and hence (a)⇒(b). On the other hand, if M ∩ N ≠ {0}, then there exists a nonzero vector w in M ∩ N. Take any nonzero vector x in M + N so that x = u + v with u in M and v in N. Thus x = (u + w) + (v − w), where u + w is in M and v − w is in N. Since w ≠ 0, it follows that u + w ≠ u, and hence the representation of x as a sum u + v with u in M and v in N is not unique. Thus, if (a) does not hold, then (b) does not hold. Equivalently, (b)⇒(a). Finally, recall that the natural mapping Φ is linear and surjective. Since Φ is injective if and only if (b) holds (by definition), it follows that (b)⇔(c).

Two linear manifolds M and N of a linear space X are said to be disjoint (or algebraically disjoint) if M ∩ N = {0}. (Note that, as linear manifolds of a linear space X, M and N can never be “disjoint” in the set-theoretical sense — the origin of X always belongs to both of them.) Therefore, if M and N are disjoint linear manifolds of a linear space X, then we may and shall identify their ordinary sum M + N with their direct sum M ⊕ N. Such an identification is carried out by the natural isomorphism Φ: M ⊕ N → M + N (Theorem 2.14).
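Theorem 2.14(b) can be made concrete in F³. The example below is an added numerical sketch (NumPy assumed; the particular manifolds are a hypothetical choice): M = span{e1, e2} and N = span{(1, 1, 1)} satisfy M ∩ N = {0}, so the summands of any x are found by solving one linear system.

```python
import numpy as np

basis_M = np.array([[1., 0., 0.],
                    [0., 1., 0.]]).T    # columns span M
basis_N = np.array([[1., 1., 1.]]).T    # column spans N

x = np.array([2., 5., 3.])

# M + N = F^3 and M ∩ N = {0}, so the combined basis is invertible and
# the coordinates of x in it are unique (Theorem 2.14(b)).
coords = np.linalg.solve(np.hstack([basis_M, basis_N]), x)
u = basis_M @ coords[:2]                # the summand in M
v = basis_N @ coords[2:]                # the summand in N

assert np.allclose(u + v, x)
assert np.allclose(u, [-1., 2., 0.]) and np.allclose(v, [3., 3., 3.])
```

If M and N were not disjoint, the combined basis matrix would be singular and np.linalg.solve would raise — the numerical face of Φ failing to be injective.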
When we identify M ⊕ N with M + N , which is a linear manifold of X , we are automatically identifying the pairs (u, 0) and (0, v) in M ⊕ N with u in M and with v in N , respectively. More generally, we shall be identifying the direct sums M ⊕ {0} and {0} ⊕ N with M and N , respectively. For instance, if x ∈ M ⊕ N and M ∩ N = {0}, then Theorem 2.14 ensures that there exists a unique u in M and a unique v in N such that x = (u, v). Hence x = (u, 0) ⊕ (0, v) where (u, 0) ∈ M ⊕ {0} and (0, v) ∈ {0} ⊕ N (recall: M ⊕ {0} and {0} ⊕ N are both linear manifolds of M ⊕ N ). Now identify (u, 0) with Φ((u, 0)) = u and (0, v) with Φ((0, v)) = v, and write x = u ⊕ v where u ∈ M and v ∈ N (instead of x = (u, 0) ⊕ (0, v) = Φ−1 (u) ⊕ Φ−1 (v)). Outcome: If M and N are disjoint linear manifolds of a linear space X , then every x in M ⊕ N has a unique decomposition with respect to M and N , denoted by x = u ⊕ v, which is referred to as the direct sum of u in M and v in
N. It should be noticed that u ⊕ v is just another notation for (u, v) that reminds us of the algebraic structure of the linear space M ⊕ N. What really is being added in M ⊕ N is (u, 0) ⊕ (0, v).

If M and N are disjoint linear manifolds of a linear space X, and if their (ordinary) sum is X, then we say that M and N are algebraic complements of each other. In other words, two linear manifolds M and N of a linear space X form a pair of algebraic complements in X if

  X = M + N   and   M ∩ N = {0}.

Accordingly, this can be written as

  X = M ⊕ N   and   M ∩ N = {0}
once we have identified the direct sum M ⊕ N with its isomorphic image Φ(M ⊕ N ) = M + N = X through the natural isomorphism Φ. Proposition 2.15. Let M and N be linear manifolds of a linear space X , and let BM and BN be Hamel bases for M and N , respectively. (a) M ∩ N = {0} if and only if BM ∩ BN = ∅ and BM ∪ BN is linearly independent . (b) M + N = X and BM ∪ BN is linearly independent if and only if BM ∪ BN is a Hamel basis for X . In particular, if BM ∪ BN ⊆ B, where B is a Hamel basis for X , then
(a′) M ∩ N = {0} if and only if BM ∩ BN = ∅,
(b′) M + N = X if and only if BM ∪ BN = B.

Proof. (a) Recall that {0} ⊆ span(BM ∩ BN) ⊆ span(M ∩ N) = M ∩ N. Thus M ∩ N = {0} implies span(BM ∩ BN) = {0}, which in turn implies BM ∩ BN = ∅ (for 0 ∉ BM ∪ BN). Moreover, if M ∩ N = {0}, then the union of the linearly independent sets BM and BN is again linearly independent (see Problem 2.3). On the other hand, recall that {0} ⊆ M ∩ N = span BM ∩ span BN = span(BM ∩ BN) if BM ∪ BN is linearly independent (see Problem 2.4). Thus BM ∩ BN = ∅ implies span(BM ∩ BN) = {0}, and hence M ∩ N = {0}.

(b) Next recall that

  span(BM ∪ BN) = span(M ∪ N) = M + N ⊆ X
whenever BM and BN are Hamel bases for M and N, respectively. Moreover, if BM ∪ BN is a Hamel basis for X, then BM ∪ BN is linearly independent and X = span(BM ∪ BN) so that M + N = X. On the other hand, if M + N = X, then span(BM ∪ BN) = X. Thus, according to Theorem 2.6, there exists a Hamel basis B′ for X such that B′ ⊆ BM ∪ BN. If BM ∪ BN is linearly independent, then Theorem 2.5 ensures that there exists a Hamel basis B for X such that BM ∪ BN ⊆ B. Therefore B′ ⊆ B. But a Hamel basis is maximal (see Claim 2 in the proof of Theorem 2.5) so that B′ = B. Hence BM ∪ BN = B.

Theorem 2.16. Every linear manifold has an algebraic complement.

Proof. Let M be a linear manifold of a linear space X, let BM be a Hamel basis for M, and let B be a Hamel basis for X such that BM ⊆ B (see Theorem 2.5). Set BN = B\BM (which, as a subset of a linearly independent set B, is linearly independent itself), and set N = span BN (a linear manifold of X). Thus BM and BN are Hamel bases for M and N, respectively, both included in the Hamel basis B for X. Since BM ∩ BN = ∅ and BM ∪ BN = B, it follows by Proposition 2.15 that N is an algebraic complement of M.

Lemma 2.17. Let M be a linear manifold of a linear space X. Every algebraic complement of M is isomorphic to the quotient space X/M.

Proof. Let M be a linear manifold of a linear space X over a field F, and let X/M be the quotient space of X modulo M, which is again a linear space over F (see Example 2.H). The natural mapping π: X → X/M, which assigns to each vector x in X the equivalence class π(x) = [x] = x + M in X/M, is a linear transformation (cf. Example 2.H). It is plain that π is surjective. Let K be a linear manifold of X and consider the restriction of π to K, π|K: K → X/M, which is again a linear transformation (Problem 2.14).

Claim 1. If M ∩ K = {0}, then π|K is injective.

Proof. Problem 2.14 also says that N(π|K) = K ∩ N(π).
Since N (π) = M (see Example 2.H), it follows that, if M ∩ K = {0}, then N (π|K ) = {0}, and so the linear transformation π|K is injective (Theorem 2.8). Claim 2. If X = M + K, then π|K is surjective. Proof. Take an arbitrary [x] in X /M so that [x] = x + M for some x in X . If X = M + K, then x = u + v with u in M and v in K. Thus, as π is linear, [x] = π(x) = π(u) + π(v). But u ∈ M = N (π) so that π(u) = [0], and hence π(u) + π(v) = [0] + [v] = [0 + v] = [v] = π(v). Therefore, [x] = π(v) = π|K (v), which lies in R(π|K ). Then X /M ⊆ R(π|K ), and so π|K is surjective. Thus, if K is an algebraic complement of M, then π|K is invertible by Claims 1 and 2 and, since π|K is linear, it is an isomorphism of K onto X /M.
Theorem 2.18. Let M be a linear manifold of a linear space X . Every algebraic complement of M has the same dimension. Proof. According to Theorem 2.12 the above statement can be rewritten as follows. If N and K are algebraic complements of M, then K and N are isomorphic. But this is a straightforward consequence of the previous lemma: N and K are both isomorphic to X /M, and hence isomorphic to each other. The dimension of an algebraic complement of M is therefore a property of M (i.e., it is an invariant for M). We refer to this invariant as the codimension of M: the codimension of a linear manifold M, denoted by codim M, is the (constant) dimension of any algebraic complement of M.
2.9 Projections

A projection is an idempotent linear transformation of a linear space into itself. Thus, if X is a linear space, then P ∈ L[X] is a projection if and only if P = P². Briefly, projections are the idempotent elements of L[X]. It is plain that the null transformation O and the identity I in L[X] are projections. A nontrivial projection in L[X] is a projection P such that O ≠ P ≠ I. It is easy to verify that, if P is a projection, then so is I − P. Moreover, the null spaces and ranges of P and I − P are related as follows (cf. Problem 1.4).

  R(P) = N(I − P)
and
N (P ) = R(I − P ).
Projections are singularly useful linear transformations. One of their main properties is that the range and the null space of a projection form a pair of algebraic complements.

Theorem 2.19. If P ∈ L[X] is a projection, then R(P) and N(P) are algebraic complements of each other.

Proof. Let X be a linear space and let P: X → X be a projection. Recall that both the range R(P) and the null space N(P) are linear manifolds of X (because P is linear). Since P is idempotent, it follows that

  R(P) = {x ∈ X: Px = x}

(the range of an idempotent mapping is the set of all its fixed points — Problem 1.4). If x ∈ R(P) ∩ N(P), then x = Px = 0, and hence R(P) ∩ N(P) = {0}. Moreover, write any vector x in X as x = Px + (x − Px). Since P is linear and idempotent, P(x − Px) = Px − P²x = 0, and so (x − Px) lies in N(P). Hence x = u + v with u = Px in R(P) and v = (x − Px) in N(P). Therefore, X = R(P) + N(P).
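A concrete instance of Theorem 2.19 (added here as an illustration; NumPy and the particular matrix are assumptions, not part of the text): the matrix below is the projection on M = span{e1, e2} along N = span{(1, 1, 1)} in F³, and its range and null space decompose every vector.

```python
import numpy as np

# Projection on M = span{e1, e2} along N = span{(1, 1, 1)}:
# P fixes e1 and e2 and annihilates (1, 1, 1).
P = np.array([[1., 0., -1.],
              [0., 1., -1.],
              [0., 0.,  0.]])

assert np.allclose(P @ P, P)            # idempotent: P = P^2

x = np.array([2., 5., 3.])
u = P @ x                               # summand in R(P) = M
v = (np.eye(3) - P) @ x                 # summand in N(P) = N

assert np.allclose(u + v, x)            # X = R(P) + N(P)
assert np.allclose(P @ u, u)            # u is a fixed point of P, so u ∈ R(P)
assert np.allclose(P @ v, 0.)           # v lies in N(P)
```

Note that P here is not an orthogonal projection (no inner product is in play yet); it is determined purely by the algebraic pair {M, N}.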
On the other hand, for any pair of algebraic complements there exists a unique projection whose range and null space coincide with them. Theorem 2.20. Let M and N be linear manifolds of a linear space X . If M and N are algebraic complements of each other, then there exists a unique projection P : X → X such that R(P ) = M and N (P ) = N . Proof. Let M and N be algebraic complements in a linear space X so that M+N = X
and
M ∩ N = {0}.
According to Theorem 2.14, for each x ∈ X there exists a unique u ∈ M and a unique v ∈ N such that x = u + v. Let P: X → X be the function that assigns to each x in X its unique summand u in M (i.e., Px = u). It is easy to verify that P is linear. Moreover, for each vector x in X, P²x = P(Px) = Pu = u = Px (reason: u is itself its unique summand in M), so that P is idempotent. By the very definition of P we get R(P) = M and N(P) = N. Conclusion: P: X → X is a projection with R(P) = M and N(P) = N. Now let P′: X → X be any projection with R(P′) = M and N(P′) = N. Take an arbitrary x ∈ X and consider again its unique representation as x = u + v with u ∈ M = R(P′) and v ∈ N = N(P′). Since P′ is linear and idempotent, it follows that P′x = P′u + P′v = u = Px. Therefore, P′ = P.

Remark: An immediate corollary of Theorems 2.16 and 2.20 says that any linear manifold of a linear space is the range of some projection. That is, if M is a linear manifold of a linear space X, then there exists a projection P: X → X such that R(P) = M.

If M and N are algebraic complements in a linear space X, then the unique projection P in L[X] with range R(P) = M and null space N(P) = N is called the projection on M along N. If P is the projection on M along N, then the projection on N along M is precisely the projection E = I − P in L[X], referred to as the complementary projection of P. Note that EP = PE = O.

Proposition 2.21. Let M and N be linear manifolds of a linear space X. If M and N are algebraic complements of each other, then the unique decomposition of each x in X = M ⊕ N as a direct sum x = u ⊕ v of u in M and v in N is such that u = Px
and
v = (I − P )x,
where P : X → X is the unique projection on M along N . Proof. Take an arbitrary x in X and consider its unique decomposition x = u ⊕ v in X = M ⊕ N . Note that the identification of M ⊕ N with
M + N = X is implicitly assumed in the proposition statement. Now write x = (u, v) and set Px = (u, 0). The very same argument used in the proof of Theorem 2.20 can be applied here to verify that this actually defines a unique projection P: M ⊕ N → M ⊕ N such that R(P) = M ⊕ {0} and N(P) = {0} ⊕ N. Finally, identify M ⊕ {0} and {0} ⊕ N with M and N (and hence (u, 0) and (0, v) with u and v), respectively.

According to Theorem 2.16 every linear space X can be represented as the sum X = M + N of a pair {M, N} of algebraic complements in X. If M ⊕ N is identified with M + N, then this means that every linear space X has a decomposition X = M ⊕ N as a direct sum of disjoint linear manifolds of X.

Proposition 2.22. Let X be a linear space and consider its decomposition X = M ⊕ N as a direct sum of disjoint linear manifolds M and N of X. Let P: X → X be the projection on M along N, and let E = I − P be the projection on N along M. Every linear transformation L: X → X can be written as a 2×2 matrix with linear transformation entries

  L = ⎛ A  B ⎞,
      ⎝ C  D ⎠

where A = PL|M: M → M, B = PL|N: N → M, C = EL|M: M → N, and D = EL|N: N → N.

Proof. Let M and N be linear manifolds of a linear space X. Suppose M and N are algebraic complements of each other, identify M ⊕ N with M + N, and consider the decomposition X = M ⊕ N. Let L be a linear transformation on M ⊕ N so that L ∈ L[X]. Take an arbitrary x ∈ X and consider its unique decomposition x = u ⊕ v in X = M ⊕ N with u in M and v in N. Now write x = (u, v) so that

  Lx = L(u, v) = L((u, 0) ⊕ (0, v)) = L(u, 0) ⊕ L(0, v) = L|M⊕{0}(u, 0) ⊕ L|{0}⊕N(0, v).

Identifying M ⊕ {0} and {0} ⊕ N with M and N (and so (u, 0) and (0, v) with u and v), respectively, it follows that

  Lx = L|M u ⊕ L|N v,

where L|M u and L|N v lie in X = M ⊕ N. By Proposition 2.21 we may write

  L|M u = PL|M u ⊕ EL|M u,
  L|N v = PL|N v ⊕ EL|N v,

where P is the unique projection on M along N and E = I − P.
Therefore, L x = (P L|M u + P L|N v) ⊕ (EL|M u + EL|N v), where P L|M u + P L|N v is in M and EL|M u + EL|N v is in N . Since the ranges of P L|M and P L|N are included in R(P ) = M, we may think of them
as linear transformations into M. Similarly, EL|M and EL|N can be thought of as linear transformations into N. Thus set A = PL|M in L[M], B = PL|N in L[N, M], C = EL|M in L[M, N], and D = EL|N in L[N] so that

  Lx = (Au + Bv, Cu + Dv) ∈ M ⊕ N

for every x = (u, v) ∈ M ⊕ N. In terms of standard matrix notation, the vector Lx in M ⊕ N can be viewed as a 2×1 matrix with the first entry in M and the other in N, namely,

  ⎛ Au + Bv ⎞.
  ⎝ Cu + Dv ⎠

This is precisely the action of the 2×2 matrix with linear transformation entries,

  ⎛ A  B ⎞,
  ⎝ C  D ⎠

on the 2×1 matrix with entries in M and N representing x, namely,

  ⎛ u ⎞.
  ⎝ v ⎠

Thus

  Lx = ⎛ A  B ⎞ ⎛ u ⎞,
       ⎝ C  D ⎠ ⎝ v ⎠

and hence we write

  L = ⎛ A  B ⎞.
      ⎝ C  D ⎠

Example 2.O. Consider the setup of Proposition 2.22. Note that the projection on M along N can be written as

  P = ⎛ I  O ⎞
      ⎝ O  O ⎠

with respect to the decomposition X = M ⊕ N, where I denotes the identity on M. Thus

  LP = ⎛ A  O ⎞   and   PLP = ⎛ A  O ⎞,
       ⎝ C  O ⎠              ⎝ O  O ⎠

so that LP = PLP if and only if C = O. Note that M is L-invariant (i.e., L(M) ⊆ M) if and only if PL|M = L|M (equivalently, if and only if EL|M = O with E = I − P). Thus L(M) ⊆ M
  ⇐⇒   A = L|M   ⇐⇒   C = O   ⇐⇒   LP = PLP.
Conclusion 1: The following assertions are pairwise equivalent.

(a) M is L-invariant.

(b) L = ⎛ L|M  B ⎞.
        ⎝  O   D ⎠

(c) LP = PLP.

Similarly, if we apply the same argument to N, then L(N) ⊆ N
  ⇐⇒   D = L|N   ⇐⇒   B = O   ⇐⇒   PL = PLP.
Conclusion 2: The following assertions are pairwise equivalent as well.
(a′) M and N are both L-invariant.

(b′) L = ⎛ L|M   O  ⎞.
         ⎝  O   L|N ⎠

(c′) L and P commute (i.e., PL = LP).

Let M and N be algebraic complements in a linear space X. If a linear transformation L in L[X] is represented as

  L = ⎛ A  O ⎞
      ⎝ O  D ⎠

in terms of the decomposition X = M ⊕ N (as in (b′) above), where A ∈ L[M] and D ∈ L[N], then it is usual to write L = A ⊕ D. For instance, the projection on M along N,
which is represented as

  P = ⎛ I  O ⎞
      ⎝ O  O ⎠

with respect to the same decomposition X = M ⊕ N, is usually written as P = I ⊕ O.

These are examples of the following concept. Let {Xi}_{i=1}^{n} be a finite family of linear spaces over the same scalar field and consider their direct sum ⊕_{i=1}^{n} Xi. Let {Li}_{i=1}^{n} be a family of linear transformations such that each Li lies in L[Xi]. The direct sum of {Li}_{i=1}^{n}, denoted by ⊕_{i=1}^{n} Li, is the mapping of ⊕_{i=1}^{n} Xi into itself defined by

  (⊕_{i=1}^{n} Li)(x1, . . . , xn) = (L1 x1, . . . , Ln xn)
for every (x1, . . . , xn) in ⊕_{i=1}^{n} Xi. It is readily verified that ⊕_{i=1}^{n} Li is linear (i.e., ⊕_{i=1}^{n} Li ∈ L[⊕_{i=1}^{n} Xi]) and also that, for every index i,

  (⊕_{i=1}^{n} Li)|Xi = Li.
The above identity is a short notation for the following assertion. “If 0i is the origin of each Xi and Oi is the unique (linear) transformation of {0i} onto itself, then each linear manifold {01} ⊕ ··· ⊕ {0i−1} ⊕ Xi ⊕ {0i+1} ⊕ ··· ⊕ {0n} of ⊕_{i=1}^{n} Xi is invariant for ⊕_{i=1}^{n} Li, and the restriction of ⊕_{i=1}^{n} Li to that invariant linear manifold is the direct sum O1 ⊕ ··· ⊕ Oi−1 ⊕ Li ⊕ Oi+1 ⊕ ··· ⊕ On”. Of course, we shall always use the short notation. Conversely, if L ∈ L[⊕_{i=1}^{n} Xi] is such that each restriction L|Xi lies in L[Xi], then L is the direct sum of {L|Xi}_{i=1}^{n}. That is, if each Xi in ⊕_{i=1}^{n} Xi is invariant for L ∈ L[⊕_{i=1}^{n} Xi], then

  L = ⊕_{i=1}^{n} L|Xi.

Summing up: Set X = ⊕_{i=1}^{n} Xi and consider linear transformations Li in L[Xi] for each i and L in L[X]. Then

  L = ⊕_{i=1}^{n} Li   if and only if   Li = L|Xi
for every index i (so that each Xi, viewed as a linear manifold of the linear space ⊕_{i=1}^{n} Xi, is invariant for L). The linear transformations {Li} are referred to as the direct summands of L. In particular, consider the decomposition X = M ⊕ N of a linear space X into the direct sum of a pair of algebraic complements M and N in X, and take linear transformations L ∈ L[X], A ∈ L[M], and D ∈ L[N]. Then

  L = ⎛ A  O ⎞ = A ⊕ D   if and only if   A = L|M and D = L|N
      ⎝ O  D ⎠

(so that M and N are both L-invariant), where A and D are the direct summands of L with respect to the decomposition X = M ⊕ N.
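The block-matrix criteria of Example 2.O and the direct-sum identity L = A ⊕ D can be checked on a small example. The sketch below is an added illustration (NumPy assumed), with X = F³, M = span{e1, e2}, N = span{e3}, and P = I ⊕ O:

```python
import numpy as np

P = np.diag([1., 1., 0.])               # the projection on M along N (P = I ⊕ O)

A = np.array([[1., 2.],
              [3., 4.]])                # A in L[M]
D = np.array([[5.]])                    # D in L[N]
L = np.block([[A, np.zeros((2, 1))],
              [np.zeros((1, 2)), D]])   # L = A ⊕ D (B = O and C = O)

assert np.allclose(P @ L, L @ P)        # (c'): L and P commute

# Destroy the invariance of M with a nonzero C block: LP = PLP now fails.
L2 = L.copy()
L2[2, 0] = 7.                           # C ≠ O
assert not np.allclose(L2 @ P, P @ L2 @ P)
```

Reading off the four blocks of an arbitrary L amounts to computing PL|M, PL|N, EL|M, and EL|N with E = I − P, exactly as in Proposition 2.22.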
Suggested Reading Birkhoff and MacLane [1] Brown and Pearcy [3] Halmos [2], [5] Herstein [1], [2] Hoffman and Kunze [1] Kaplansky [1]
Lax [2] MacLane and Birkhoff [1] Naylor and Sell [1] Roman [1] Simmons [1] Taylor and Lay [1]
Problems

Problem 2.1. Let X be a linear space over a field F. Take arbitrary α and β in F and arbitrary x, y, and z in X. Verify the following propositions.
(a) (−α)x = −(αx).
(b) 0x = 0 = α0.
(c) αx = 0 ⟹ α = 0 or x = 0.
(d) x + y = x + z ⟹ y = z.
(e) αx = αy ⟹ x = y if α ≠ 0.
(f) αx = βx ⟹ α = β if x ≠ 0.
Problem 2.2. Let X be a real or complex linear space. A subset C of X is convex if αx + (1 − α)y is in C for every x, y in C and every α in [0, 1]. A vector x ∈ X is a convex linear combination of vectors in X if there exist a finite set {xi}_{i=1}^n of vectors in X and a finite family of nonnegative scalars {αi}_{i=1}^n such that x = ∑_{i=1}^n αi xi and ∑_{i=1}^n αi = 1. If A is a subset of X, then the intersection of all convex sets containing A is called the convex hull of A, denoted by co(A).
(a) Show that the intersection of an arbitrary nonempty collection of convex sets is convex.
(b) Show that co(A) is the smallest (in the inclusion ordering) convex set that includes A.
(c) Show that C is convex if and only if every convex linear combination of vectors in C belongs to C.
Hint: To verify that every convex linear combination of vectors in a convex set C belongs to C, proceed as follows. Note that the italicized result holds for any convex linear combination of two vectors in C (by definition of convex set). Suppose it holds for every convex linear combination of n vectors in C, for some n ∈ N. This implies that α ∑_{i=1}^n α^{-1} αi xi + α_{n+1} x_{n+1} lies in C whenever {xi}_{i=1}^{n+1} ⊂ C and ∑_{i=1}^{n+1} αi = 1 with 0 < α_{n+1} < 1, where α = ∑_{i=1}^n αi (reason: ∑_{i=1}^n α^{-1} αi xi ∈ C). Conclude the proof by induction.
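Problem 2.2(c) can be illustrated concretely: the closed unit disk in R² is convex, so convex linear combinations of its points never leave it. A sketch using only the standard library (the sample points and weights are arbitrary random choices):

```python
import math
import random

# The closed unit disk in R^2 is convex, so by Problem 2.2(c) every convex
# linear combination of its points must again lie in the disk.
random.seed(0)

def random_disk_point():
    r, t = random.random(), 2.0 * math.pi * random.random()
    return (r * math.cos(t), r * math.sin(t))

points = [random_disk_point() for _ in range(5)]
weights = [random.random() for _ in points]
total = sum(weights)
weights = [w / total for w in weights]      # nonnegative and summing to 1

cx = sum(w * p[0] for w, p in zip(weights, points))
cy = sum(w * p[1] for w, p in zip(weights, points))
assert abs(sum(weights) - 1.0) < 1e-12
assert math.hypot(cx, cy) <= 1.0            # the combination stays in the disk
```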
(d) Show that co(A) coincides with the set of all convex linear combinations of vectors in A.
Hint: Let clc(A) denote the set of all convex linear combinations of vectors in A. Verify that clc(A) is a convex set. Now use (b) and (c) to show that co(A) ⊆ clc(A) ⊆ clc(co(A)) = co(A).

Problem 2.3. Let M and N be linear manifolds of a linear space X, and let A and B be linearly independent subsets of M and N, respectively. If M ∩ N = {0}, then A ∪ B is linearly independent.
Hint: If a ∈ A is a linear combination of vectors in (A\{a}) ∪ B, then a = a′ + b for some a′ ∈ M and some b ∈ N.

Problem 2.4. Let A be a linearly independent subset of a linear space X. If B and C are subsets of A, then span(B ∩ C) = span B ∩ span C.
Hint: Show that span B ∩ span C ⊆ span(B ∩ C) by Proposition 2.3.

Problem 2.5. The cardinality of a linearly independent subset of a linear space X is less than or equal to the cardinality of a subset of X that spans X.
Hint: That is, if A ⊆ X is such that span A = X, then #C ≤ #A for every linearly independent C ⊆ X. Indeed, if B ⊆ X is a Hamel basis for X, then show that #C ≤ #B ≤ #A. (Apply Theorems 2.5, 2.6, and 2.7 — see Problems 1.21(a) and 1.22.) Note that this generalizes Claim 0 in the proof of Theorem 2.7 for subsets of arbitrary cardinality.

Problem 2.6. Let X be a linear space, and let M be a linear manifold of X. Verify the following propositions.
(a) dim M = 0 if and only if M = {0}.
(b) dim M ≤ dim X.
(Hint : Problem 2.5.)
Problem 2.7. If M is a proper linear manifold of a finite-dimensional linear space X, then dim M < dim X. Prove the above statement and show that it does not hold for infinite-dimensional linear spaces.
Hint: Show that dim X0 = dim X, where X is the linear space of Example 2.J and X0 = {x = (ξ1, ξ2, ξ3, ...) ∈ X : ξ1 = 0}.

Problem 2.8. Let X be a nonzero linear space over an infinite field, and let B be a Hamel basis for X. Recall that every nonzero vector x in X has a unique representation in terms of B. That is, for each x ≠ 0 in X there exists
a unique nonempty finite subset Bx of B and a unique finite family of nonzero scalars {αb}_{b∈Bx} ⊂ F such that

x = ∑_{b∈Bx} αb b.

For each positive integer n ∈ N let Xn be the set of all nonzero vectors in X whose representations as a (finite) linear combination of vectors in B have exactly n (nonzero) summands. That is, for each n ∈ N, set Xn = {x ∈ X : #Bx = n}.
(a) Prove that #Xn = #(F×B) for all n ∈ N.
Hint: Show that #Xn ≤ #(F×B)^n and recall: if F is an infinite set, then #F^n = #F (Problems 1.23 and 1.28).
(b) Apply Theorem 1.10 to show that #⋃_{n∈N} Xn ≤ #(F×B).
(c) Verify that {Xn}_{n∈N} is a partition of X\{0}.
Thus conclude from (b) and (c) (see Problem 1.28(a)) that

#X = #(F×B) = max{#F, dim X}.
Problem 2.9. Prove the following proposition, which is known as the Principle of Superposition. A mapping L: X → Y, where X and Y are linear spaces over the same scalar field, is a linear transformation if and only if

L(∑_{i=1}^{n} αi xi) = ∑_{i=1}^{n} αi L xi

for all finite sets {xi}_{i=1}^{n} of vectors in X and all finite sets of scalars {αi}_{i=1}^{n}.

Problem 2.10. Let L: X → Y be a linear transformation.
(a) Show that the null space N(L) and the range R(L) of L are linear manifolds of the linear spaces X and Y, respectively. Moreover, show that they both are L-invariant. That is, L(N(L)) ⊆ N(L) and L(R(L)) ⊆ R(L). Now set X = Y and show that the positive integral powers of L are linear transformations. That is, L^n ∈ L[X] for every n ≥ 1 whenever L ∈ L[X].
(b) Show that the linear manifolds N(L^n) and R(L^n) are L-invariant. That is, L(N(L^n)) ⊆ N(L^n) and L(R(L^n)) ⊆ R(L^n) for every n ≥ 1.
(c) Show that L^n is injective or surjective if L is. That is, N(L) = {0} implies N(L^n) = {0}, and R(L) = X implies R(L^n) = X, for every n ≥ 1.

Problem 2.11. Let L: X → Y be a linear transformation of a linear space X into a linear space Y. Prove the following propositions.
(a) If M is a linear manifold of X, then L(M) is a linear manifold of Y (i.e., the linear image of a linear manifold is a linear manifold).
(b) If N is a linear manifold of Y, then L^{-1}(N) is a linear manifold of X (i.e., the inverse image of a linear manifold under a linear transformation is again a linear manifold).

Problem 2.12. Let X and Y be linear spaces, and let L: X → Y be a linear transformation. Show that the following assertions are equivalent.
(a) A ⊆ X is a linear manifold whenever L(A) ⊆ Y is a linear manifold.
(b) N(L) = {0}.
Hint: Give a direct proof for (b)⇒(a) by using Problems 1.3(d) and 2.11(b). Give a contrapositive proof for (a)⇒(b) — recall: if x is a nonzero vector in X, then {x} is not a linear manifold of X.

Problem 2.13. Prove that the set L[X, Y] of all linear transformations of a linear space X into a linear space Y is itself a linear space (over the same common field of X and Y) when vector addition and scalar multiplication in L[X, Y] are defined pointwise as in Example 2.F.

Problem 2.14. Show that the restriction L|M: M → Y of a linear transformation L: X → Y to a linear manifold M of X is itself a linear transformation. Moreover, also show that N(L|M) = M ∩ N(L).

Problem 2.15. Show that the composition of two linear transformations is again a linear transformation. That is, if X, Y, and Z are linear spaces over the same scalar field, and if L ∈ L[X, Y] and T ∈ L[Y, Z], then TL ∈ L[X, Z]. Moreover, also show that R(TL) = T(R(L)).

Problem 2.16. Let L: X → Y be a linear transformation. It is trivially verified that, if L is surjective, then dim R(L) = dim Y. Now verify that, if L is injective, then dim R(L) = dim X.

Problem 2.17. Let L: X → Y be a linear transformation of a linear space X into a linear space Y. The dimension of the range of L is the rank of L, and the dimension of the null space of L is the nullity of L. Show that rank and nullity are related as follows.

dim N(L) + dim R(L) = dim X.
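For a concrete matrix the rank–nullity identity is easy to observe numerically (a sketch assuming numpy; the matrix is an arbitrary choice whose third row is the sum of the first two):

```python
import numpy as np

# L: R^4 -> R^3 with third row = first row + second row, so rank L = 2.
L = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 0.0],
              [1.0, 3.0, 1.0, 1.0]])

# Two linearly independent vectors in the null space N(L):
v1 = np.array([2.0, -1.0, 1.0, 0.0])
v2 = np.array([-1.0, 0.0, 0.0, 1.0])
assert np.allclose(L @ v1, 0.0) and np.allclose(L @ v2, 0.0)

rank = np.linalg.matrix_rank(L)     # dim R(L)
assert rank == 2
assert 2 + rank == L.shape[1]       # dim N(L) + dim R(L) = dim X = 4
```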
Hint: Suppose L ≠ O. Let BN be a Hamel basis for N(L) and let BX be a Hamel basis for X that properly includes BN. (Theorem 2.5 — why is the inclusion proper?) Set BM = BX\BN and M = span BM. Show that BM is a Hamel basis for M, the restriction of L to M is injective (since N(L) and M
are algebraic complements by Proposition 2.15), and L|M: M → R(L) is an isomorphism (Problem 2.14). Thus dim R(L) = dim M (Theorem 2.12), and dim X = #BX = #BN + #BM = dim N(L) + dim M (Problem 1.30).

Problem 2.18. If dim R(L) is finite, then L is called a finite-dimensional (or a finite-rank) linear transformation. Clearly, if Y is a finite-dimensional linear space, then every L ∈ L[X, Y] is finite-dimensional. Verify that, if X is a finite-dimensional linear space, then every L ∈ L[X, Y] is finite-dimensional. Moreover, if L: X → Y is a finite-dimensional linear transformation (so that R(L) is a finite-dimensional linear manifold of Y), then show that
(a) L is injective if and only if dim R(L) = dim X,
(b) L is surjective if and only if dim R(L) = dim Y,
(c) L is injective if and only if it is surjective, whenever dim X = dim Y.

Problem 2.19. Let X be a linear space over a field F and let X^{N₀} be the linear space (over the same field F) of all X-valued sequences {xn}_{n∈N₀}. Take a linear transformation of X into itself, A ∈ L[X], and an arbitrary sequence u = {un}_{n∈N₀} in X^{N₀}. Consider the (unique) sequence x = {xn}_{n∈N₀} in X^{N₀} which is recursively defined as follows. Set x0 = u0 and, for each n ∈ N₀, set x_{n+1} = A xn + u_{n+1}. As usual, let A^n denote the composition of A with itself n times for each integer n ≥ 0, with A^0 = I, the identity in L[X]. Prove by induction that

xn = ∑_{i=0}^{n} A^{n−i} ui

for every n ∈ N₀. Now let L: X^{N₀} → X^{N₀} be the map that assigns to each sequence u in X^{N₀} this unique sequence x in X^{N₀}, so that x = Lu. Show that L is a linear transformation of X^{N₀} into itself. The recursive equation (or the difference equation) x_{n+1} = A xn + u_{n+1} is called a discrete linear dynamical system because L is linear. Its unique solution is given by x = Lu (i.e., xn = ∑_{i=0}^{n} A^{n−i} ui for every n ∈ N₀).

Problem 2.20. Let F denote either the real or complex field, and let X and Y be linear spaces over F. For any polynomial p in one variable in F with coefficients αi in F and of finite order n (i.e., p(z) = ∑_{i=0}^{n} αi z^i for z ∈ F), set

p(L) = ∑_{i=0}^{n} αi L^i,
where L ∈ L[X] and {αi}_{i=0}^{n} is a finite set of coefficients in F (note: L^0 = I). Show that p(L) ∈ L[X] (in particular, L^n ∈ L[X] for each n ≥ 0) for every L ∈ L[X]. Take L ∈ L[X], K ∈ L[Y], and M ∈ L[X, Y]. Prove the implication.
(a) If ML = KM, then M p(L) = p(K)M for any polynomial p. Thus conclude: p(L) is similar to p(K) whenever L is similar to K.
A linear transformation L in L[X] is called nilpotent if L^n = O for some integer n ∈ N, and algebraic if p(L) = O for some polynomial p. It is clear that every nilpotent linear transformation is algebraic. Prove the following propositions.
(b) A linear transformation is similar to an algebraic (nilpotent) linear transformation if and only if it is itself algebraic (nilpotent).
(c) Sum and composition of nilpotent linear transformations are not necessarily nilpotent.
Hint: The matrices T = (0 1; 0 0) and L = (0 0; 1 0) in L[C²] are both nilpotent. L + T is an involution. LT and TL are idempotent.

Problem 2.21. Let F denote either the real or complex field, and let X be a linear space over F. A subset K of X is a cone (with vertex at the origin) if αx ∈ K whenever x ∈ K and α ≥ 0. Recall the definition of a convex set in Problem 2.2 and verify the following assertions.
(a) Every linear manifold is a convex cone.
(b) The union of nonzero disjoint linear manifolds is a nonconvex cone.
Let S be a nonempty set and consider the linear space F^S. Show that
(c) {x ∈ F^S : x(s) ≥ 0 for all s ∈ S} is a convex cone in F^S.

Problem 2.22. Show that the implication (a)⇒(b) in Theorem 2.14 does not generalize to three linear manifolds, say M, N, and R, if we simply assume that they are pairwise disjoint. (Hint: R³.)

Problem 2.23. Let {Mi}_{i=1}^n be a finite collection of linear manifolds of a linear space X. Show that the following assertions are equivalent.
(a) Mi ∩ ∑_{j=1, j≠i}^n Mj = {0} for every i = 1, ..., n.
(b) For each x in ∑_{i=1}^n Mi there exists a unique n-tuple (x1, ..., xn) in ∏_{i=1}^n Mi such that x = ∑_{i=1}^n xi.
Hint: (a)⇒(b) for n = 2 by Theorem 2.14. Take any integer n > 2 and suppose (a)⇒(b) for every 2 ≤ m < n. Show that, if (a) holds true for m + 1, then (b) holds true for m + 1. Now conclude the proof of (a)⇒(b) by induction on n. Next show that (b)⇒(a) by Theorem 2.14.
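Returning to the discrete linear dynamical system of Problem 2.19, the closed-form solution xn = ∑_{i=0}^{n} A^{n−i} ui can be checked against the recursion numerically (a sketch assuming numpy; the matrix A and the input sequence u are arbitrary random choices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))                  # A in L[X] with X = R^3
u = [rng.standard_normal(3) for _ in range(6)]   # inputs u_0, ..., u_5

# The recursion: x_0 = u_0 and x_{n+1} = A x_n + u_{n+1}.
x = [u[0]]
for n in range(5):
    x.append(A @ x[n] + u[n + 1])

# The closed-form solution x_n = sum_{i=0}^{n} A^{n-i} u_i.
for n in range(6):
    closed = sum(np.linalg.matrix_power(A, n - i) @ u[i] for i in range(n + 1))
    assert np.allclose(x[n], closed)
```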
Problem 2.24. Let {Mi}_{i=1}^n be a finite collection of linear manifolds of a linear space X, and let Bi be a Hamel basis for each Mi. If Mi ∩ ∑_{j=1, j≠i}^n Mj = {0} for every i = 1, ..., n, then ⋃_{i=1}^n Bi is a Hamel basis for ⊕_{i=1}^n Mi. Prove.
Hint: Apply Proposition 2.15 for n = 2. Now use the hint to Problem 2.23.

Problem 2.25. Let M and N be linear manifolds of a linear space.
(a) If M and N are disjoint, then dim(M ⊕ N) = dim(M + N) = dim M + dim N.
Hint: Problem 1.30, Theorem 2.14, and Proposition 2.15.
(b) If M and N are finite-dimensional, then dim(M + N) = dim M + dim N − dim(M ∩ N).

Problem 2.26. Let M be a proper linear manifold of a linear space X so that M ∈ Lat(X)\{X}. Consider the inclusion ordering of Lat(X). Show that

M is maximal in Lat(X)\{X}  ⟺  codim M = 1.

Problem 2.27. Let ϕ be a nonzero linear functional on a linear space X (i.e., a nonzero element of X′, the algebraic dual of X). Prove the following results.
(a) N(ϕ) is maximal in Lat(X)\{X}. That is, the null space of every nonzero linear functional in X′ is a maximal proper linear manifold of X.
(b) Conversely, if M is a maximal linear manifold in Lat(X)\{X}, then there exists a nonzero ϕ in X′ such that M = N(ϕ). That is, every maximal element of Lat(X)\{X} is the null space of some nonzero ϕ in X′.

Problem 2.28. Let X be a linear space over a field F. The set

Hϕ,α = {x ∈ X : ϕ(x) = α},

determined by a nonzero ϕ in X′ and a scalar α in F, is called a hyperplane in X. It is clear that Hϕ,0 coincides with N(ϕ), but Hϕ,α is not a linear manifold of X if α is a nonzero scalar. A linear variety is a translation of a proper linear manifold. That is, a linear variety V is a subset of X that coincides with the coset of some x modulo M,

V = M + x = {y ∈ X : y = z + x for some z ∈ M},

for some x ∈ X and some M ∈ Lat(X)\{X}. If M is maximal in Lat(X)\{X}, then M + x is called a maximal linear variety. Show that a hyperplane is precisely a maximal linear variety.
Problem 2.29. Let X be a linear space over a field F, and let P and E be projections in L[X]. Suppose E ≠ O, and let α be an arbitrary nonzero scalar in F. Prove the following proposition.
(a) P + αE is a projection if and only if PE + EP = (1 − α)E.
Moreover, if P + αE is a projection, then show that
(b) P and E commute (i.e., PE = EP), and so PE is a projection;
(c) PE = O if and only if α = 1, and PE = E if and only if α = −1.
Therefore,
(d) P + αE is a projection implies α = 1 or α = −1.
Thus conclude:
(e) P + E is a projection if and only if PE = EP = O,
(f) P − E is a projection if and only if PE = EP = E.
Next prove that, for arbitrary projections P and E in L[X],
(g) R(P) ∩ R(E) ⊆ R(PE) ∩ R(EP).
Furthermore, if P and E commute, then show that
(h) PE is a projection and R(P) ∩ R(E) = R(PE),
and so (still under the assumption that E and P commute),
(i) PE = O if and only if R(P) ∩ R(E) = {0}.

Problem 2.30. An algebra (or a linear algebra) is a linear space A that is also a ring with respect to a second binary operation on A called product (notation: xy ∈ A is the product of x ∈ A and y ∈ A). The product is related to scalar multiplication by the property α(xy) = (αx)y = x(αy) for every x, y ∈ A and every scalar α. We shall refer to a real or complex algebra if A is a real or complex linear space. Recall that this new binary operation on A (i.e., the product in the ring A) is associative, x(yz) = (xy)z, and distributive with respect to vector addition,

x(y + z) = xy + xz    and    (y + z)x = yx + zx,

for every x, y, and z in A. If A possesses a neutral element 1 under the product operation (i.e., if there exists 1 ∈ A such that x1 = 1x = x for every x ∈ A), then A is said to be an algebra with identity (or a unital algebra). Such a
neutral element 1 is called the identity (or unit) of A. If A is an algebra with identity, and if x ∈ A has an inverse (denoted by x^{-1}) with respect to the product operation (i.e., if there exists x^{-1} ∈ A such that x x^{-1} = x^{-1} x = 1), then x is an invertible element of A. Recall that the identity is unique if it exists, and so is the inverse of an invertible element of A. If the product operation is commutative, then A is said to be a commutative algebra.
(a) Let X be a linear space of dimension greater than 1. Show that L[X] is a noncommutative algebra with identity when the product in L[X] is interpreted as composition (i.e., LT = L ∘ T for every L, T ∈ L[X]). The identity I in L[X] is precisely the neutral element under the product operation. L is an invertible element of L[X] if and only if L is injective and surjective.
A subalgebra of A is a linear manifold M of A (when A is viewed as a linear space) which is an algebra in its own right with respect to the product operation of A (i.e., uv ∈ M whenever u ∈ M and v ∈ M). A subalgebra M of A is a left ideal of A if ux ∈ M whenever u ∈ M and x ∈ A. A right ideal of A is a subalgebra M of A such that xu ∈ M whenever x ∈ A and u ∈ M. An ideal (or a two-sided ideal or a bilateral ideal) of A is a subalgebra I of A that is both a left ideal and a right ideal.
(b) Let X be an infinite-dimensional linear space. Show that the set of all finite-dimensional linear transformations in L[X] is a proper left ideal of L[X] with no identity. (Hint: Problem 2.25(b).)
(c) Show that, if A is an algebra and I is a proper ideal of A, then the quotient space A/I of A modulo I is an algebra. This is called the quotient algebra of A with respect to I. If A has an identity 1, then the coset 1 + I is the identity of A/I.
Hint: Recall that vector addition and scalar multiplication in the linear space A/I are defined by

(x + I) + (y + I) = (x + y) + I,    α(x + I) = αx + I,

for every x, y ∈ A and every scalar α (see Example 2.H). Now show that the product of cosets in A/I can be likewise defined by

(x + I)(y + I) = xy + I

for every x, y ∈ A (i.e., if x′ = x + u and y′ = y + v, with x, y ∈ A and u, v ∈ I, then x′y′ = xy + z for some z ∈ I, whenever I is a two-sided ideal of A).

Problem 2.31. Let A and B be algebras over the same scalar field. A linear transformation Φ: A → B (of the linear space A into the linear space B) that preserves products — i.e., such that Φ(xy) = Φ(x)Φ(y) for every x, y in A
— is called a homomorphism (or an algebra homomorphism) of A into B. A unital homomorphism between unital algebras is one that takes the identity of A to the identity of B. If Φ is an isomorphism (of the linear space A onto the linear space B) and also a homomorphism (of the algebra A onto the algebra B), then it is an algebra isomorphism of A onto B. In this case A and B are said to be isomorphic algebras.
(a) Let {eγ} be a Hamel basis for the linear space A. Show that a linear transformation Φ: A → B is an algebra homomorphism if and only if Φ(eα eβ) = Φ(eα)Φ(eβ) for every pair {eα, eβ} of elements of the basis {eγ}.
(b) Let I be an ideal of A and let π: A → A/I be the natural mapping of A onto the quotient algebra A/I. Show that π is a homomorphism such that N(π) = I. (Hint: Example 2.H.)
(c) Let X and Y be isomorphic linear spaces and let W: X → Y be an isomorphism between them. Consider the mapping Φ: L[X] → L[Y] defined by Φ(L) = W L W^{-1} for every L ∈ L[X]. Show that Φ is an algebra isomorphism of the algebra L[X] onto the algebra L[Y].

Problem 2.32. Here is a useful result, which holds in any ring with identity (sometimes referred to as the Matrix Inversion Lemma). Take A, B ∈ L[X] on a linear space X. If I − AB is invertible, then so is I − BA, and

(I − BA)^{-1} = I + B(I − AB)^{-1} A.

Hint: For every A, B, C ∈ L[X] verify that
(a) (I + BCA)(I − BA) = I − BA + BCA − BCABA,
(b) (I − BA)(I + BCA) = I − BA + BCA − BABCA,
(c) I − BA + BCA − B(C − I)A = I.
Now set C = (I − AB)^{-1}, so that C(I − AB) = I = (I − AB)C, and hence
(d) CAB = C − I = ABC.
Thus conclude that
(e) (I + BCA)(I − BA) = I = (I − BA)(I + BCA).

Problem 2.33. Take a linear transformation L ∈ L[X] on a linear space X and consider its nonnegative integral powers L^n. Verify that, for every n ≥ 0,

N(L^n) ⊆ N(L^{n+1})    and    R(L^{n+1}) ⊆ R(L^n).
Let n0 be an arbitrary nonnegative integer. Prove the following propositions.
(a) If N(L^{n0+1}) = N(L^{n0}), then N(L^{n+1}) = N(L^n) for every integer n ≥ n0.
(b) If R(L^{n0+1}) = R(L^{n0}), then R(L^{n+1}) = R(L^n) for every integer n ≥ n0.
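The Matrix Inversion Lemma of Problem 2.32 lends itself to a quick numerical spot check (a sketch assuming numpy; the random square matrices are arbitrary choices for which I − AB turns out to be invertible):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
I = np.eye(n)

# If I - AB is invertible, then so is I - BA, and
# (I - BA)^{-1} = I + B (I - AB)^{-1} A.
lhs = np.linalg.inv(I - B @ A)
rhs = I + B @ np.linalg.inv(I - A @ B) @ A
assert np.allclose(lhs, rhs)
```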
Hint: Rewrite the statements in (a) and (b) as follows.
(a) If N(L^{n0+1}) = N(L^{n0}), then N(L^{n0+k+1}) = N(L^{n0+k}) for every k ≥ 1.
(b) If L^{n0+1}(X) = L^{n0}(X), then L^{n0+k+1}(X) = L^{n0+k}(X) for every k ≥ 1.
Show that (a) holds for k = 1. Now show that the conclusion in (a) holds for k + 1 whenever it holds for k. Similarly, show that (b) holds for k = 1, then show that the conclusion in (b) holds for k + 1 whenever it holds for k. Thus conclude the proof of (a) and (b) by induction.

Problem 2.34. Set N̄₀ = N₀ ∪ {∞}, the set of all extended nonnegative integers with its natural (extended) ordering. The previous problem suggests the following definitions. The ascent of L ∈ L[X] (notation: asc(L)) is the least nonnegative integer such that N(L^{n+1}) = N(L^n), and the descent of L (notation: dsc(L)) is the least nonnegative integer such that R(L^{n+1}) = R(L^n):

asc(L) = min{n ∈ N̄₀ : N(L^{n+1}) = N(L^n)},
dsc(L) = min{n ∈ N̄₀ : R(L^{n+1}) = R(L^n)}.

It is plain that

asc(L) = 0 ⟺ N(L) = {0},    dsc(L) = 0 ⟺ R(L) = X.

Now prove the following propositions.
(a) asc(L) < ∞ and dsc(L) = 0 implies asc(L) = 0.
Hint: Suppose dsc(L) = 0 (i.e., suppose R(L) = X). If asc(L) ≠ 0 (i.e., if N(L) ≠ {0}), then take 0 ≠ x1 ∈ N(L) ∩ R(L) and x2, x3 in R(L) = X such that x1 = Lx2 and x2 = Lx3, and so x1 = L²x3. Proceed by induction to construct a sequence {xn}_{n≥1} of vectors in X = R(L) such that xn = Lx_{n+1} and 0 ≠ x1 = L^n x_{n+1} ∈ N(L), and so L^{n+1} x_{n+1} = 0. Then x_{n+1} ∈ N(L^{n+1})\N(L^n) for each n ≥ 1, and asc(L) = ∞ by Problem 2.33.
(b) asc(L) < ∞ and dsc(L) < ∞ implies asc(L) = dsc(L).
Hint: Set m = dsc(L), so that R(L^m) = R(L^{m+1}), and set T = L|_{R(L^m)}. Since R(L^m) is L-invariant, T ∈ L[R(L^m)] (Problem 2.10(b)). Verify that R(T) = T(R(L^m)) = R(TL^m) = R(L^{m+1}) = R(L^m) (see Problem 2.15). Thus conclude that dsc(T) = 0. Since asc(T) < ∞ (because asc(L) < ∞), it follows by (a) that asc(T) = 0. That is, N(T) = {0}. Take x ∈ N(L^{m+1}) and set y = L^m x in R(L^m). Show that Ty = L^{m+1} x = 0, so y = 0, and hence x ∈ N(L^m). Therefore, N(L^{m+1}) ⊆ N(L^m). Use Problem 2.33 to conclude that asc(L) ≤ m. On the other hand, suppose m ≠ 0 (otherwise apply (a)) and take z = L^{m−1}u in R(L^{m−1})\R(L^m) for some u ∈ X, so that Lz = L^m u is in R(L^m). Since L^m(R(L^m)) = R(L^{2m}) = R(L^m), infer that Lz = L^m v for some v ∈ R(L^m). Verify that L^m(u − v) = 0 and L^{m−1}(u − v) = z − L^{m−1}v ≠ 0 (reason: since v ∈ R(L^m), L^{m−1}v ∈ R(L^{2m−1}) = R(L^m) and z ∉ R(L^m)). Thus (u − v) ∈ N(L^m)\N(L^{m−1}), and so asc(L) ≥ m.
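Ascent and descent can be computed by brute force in finite dimensions, where both are always finite. A sketch (assuming numpy; the matrix is an arbitrary choice, a 2×2 nilpotent Jordan block direct-summed with the scalar 1, for which asc(L) = dsc(L) = 2):

```python
import numpy as np

L = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])    # J_2 (+) 1 acting on R^3

dim = L.shape[0]
powers = [np.eye(dim)]              # L^0, L^1, ..., L^4
for _ in range(4):
    powers.append(powers[-1] @ L)

ranks = [np.linalg.matrix_rank(P) for P in powers]   # dim R(L^n)
nullities = [dim - r for r in ranks]                 # dim N(L^n)

asc = next(n for n in range(4) if nullities[n + 1] == nullities[n])
dsc = next(n for n in range(4) if ranks[n + 1] == ranks[n])
assert asc == dsc == 2    # finite and equal, as in Problem 2.34(b)
```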
Problem 2.35. Consider the setup of the previous problem. If asc(L) and dsc(L) are both finite, then they are equal by Problem 2.34(b). Set m = asc(L) = dsc(L) in N₀. Show that the linear manifolds R(L^m) and N(L^m) of the linear space X are algebraic complements of each other. That is,

R(L^m) ∩ N(L^m) = {0}    and    X = R(L^m) ⊕ N(L^m).

Hint: If y is in R(L^m) ∩ N(L^m), then y = L^m x for some x ∈ X and L^m y = 0. Verify that x ∈ N(L^{2m}) = N(L^m), and infer that y = 0. Now consider the hint to Problem 2.34(b) with T = L|_{R(L^m)} ∈ L[R(L^m)]. Since R(T) = R(L^m), it follows that R(T^m) = R(L^m) (Problem 2.10(c)). Take any x ∈ X. Verify that there exists u ∈ R(L^m) such that T^m u = L^m u = L^m x, and so v = x − u is in N(L^m). Thus x = u + v ∈ R(L^m) + N(L^m). Finally, use Theorem 2.14.
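The decomposition X = R(L^m) ⊕ N(L^m) can also be seen concretely (a sketch assuming numpy; the matrix is the arbitrary choice J₂ ⊕ 1 on R³, for which m = 2):

```python
import numpy as np

L = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])    # J_2 (+) 1 on R^3, with asc = dsc = m = 2
L2 = L @ L

# R(L^2) = span{e3} and N(L^2) = span{e1, e2}:
e1, e2, e3 = np.eye(3)
assert np.allclose(L2 @ e1, 0.0) and np.allclose(L2 @ e2, 0.0)
assert np.allclose(L2 @ e3, e3)

# The three vectors together are linearly independent, so
# R^3 = R(L^2) (+) N(L^2), as Problem 2.35 predicts.
M = np.column_stack([e3, e1, e2])
assert np.linalg.matrix_rank(M) == 3
```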
3 Topological Structures
The basic concept behind the subject of point-set topology is the notion of “closeness” between two points in a set X. In order to get a numerical gauge of how close together two points in X may be, we shall provide an extra structure to X, viz., a topological structure, that again goes beyond its purely set-theoretic structure. For most of our purposes the notion of closeness associated with a metric will be sufficient, and this leads to the concept of “metric space”: a set upon which a “metric” is defined. The metric-space structure that a set acquires when a metric is defined on it is a special kind of topological structure. Metric spaces comprise the kernel of this chapter, but general topological spaces are also introduced.
3.1 Metric Spaces

A metric (or metric function, or distance function) is a real-valued function on the Cartesian product of an arbitrary set with itself that has the following four properties, called the metric axioms.

Definition 3.1. Let X be an arbitrary set. A real-valued function d on the Cartesian product X×X, d: X×X → R, is a metric on X if the following conditions are satisfied for all x, y, z in X.
(i) d(x, y) ≥ 0 and d(x, x) = 0 (nonnegativeness),
(ii) d(x, y) = 0 only if x = y (positiveness),
(iii) d(x, y) = d(y, x) (symmetry),
(iv) d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality).
A set X equipped with a metric on it is a metric space. A word on notation and terminology. The value of the metric d on a pair of points of X is called the distance between those points. According to the above definition a metric space actually is an ordered pair (X, d) where X is
an arbitrary set, called the underlying set of the metric space (X, d), and d is a metric function defined on it. We shall often refer to a metric space in several ways. Sometimes we shall speak of X itself as a metric space when the metric d is either clear in the context or is immaterial. In this case we shall simply say “X is a metric space”. On the other hand, in order to avoid confusion among different metric spaces, we may occasionally insert a subscript on the metrics. For instance, (X, dX) and (Y, dY) will stand for metric spaces where X and Y are the respective underlying sets, dX denotes the metric on X, and dY the metric on Y. Moreover, if a set X can be equipped with more than one metric, say d1 and d2, then (X, d1) and (X, d2) will represent different metric spaces with the same underlying set X. In brief, a metric space is an arbitrary set with an additional structure defined by means of a metric d. Such an additional structure is the topological structure induced by the metric d. If (X, d) is a metric space, and if A is a subset of X, then it is easy to show that the restriction d|_{A×A}: A×A → R of the metric d to A×A is a metric on A — called the relative metric. Equipped with the relative metric, A is a subspace of X. We shall drop the subscript A×A from d|_{A×A} and say that (A, d) is a subspace of (X, d). Thus a subspace of a metric space (X, d) is a subset A of the underlying set X equipped with the relative metric, which is itself a metric space. Roughly speaking, A inherits the metric of (X, d). If (A, d) is a subspace of (X, d) and A is a proper subset of X, then (A, d) is said to be a proper subspace of the metric space (X, d).

Example 3.A. The function d: R×R → R defined by d(α, β) = |α − β| for every α, β ∈ R is a metric on R. That is, it satisfies all the metric axioms in Definition 3.1, where |α| = (α²)^{1/2} stands for the absolute value of α ∈ R. This is the usual metric on R. The real line R equipped with its usual metric is the most important concrete metric space. If we refer to R as a metric space without specifying a metric on it, then it is understood that R has been equipped with its usual metric. Similarly, the function d: C×C → R given by d(ξ, υ) = |ξ − υ| for every ξ, υ ∈ C is a metric on C. Again, |ξ| = (ξ ξ̄)^{1/2} stands for the absolute value (or modulus) of ξ ∈ C, with the upper bar denoting the complex conjugate of a complex number. This is the usual metric on C. More generally, let F denote either the real field R or the complex field C, and let F^n be the set of all ordered n-tuples of scalars in F. For each real number p ≥ 1 consider the function dp: F^n×F^n → R defined by

dp(x, y) = (∑_{i=1}^{n} |ξi − υi|^p)^{1/p},

and also the function d∞: F^n×F^n → R given by

d∞(x, y) = max_{1≤i≤n} |ξi − υi|,
for every x = (ξ1, ..., ξn) and y = (υ1, ..., υn) in F^n. These are metrics on F^n. Indeed, all the metric axioms up to the triangle inequality are trivially verified. The triangle inequality follows from the Minkowski inequality (see Problem 3.4(a)). Note that (Q^n, dp) is a subspace of (R^n, dp) and (Q^n, d∞) is a subspace of (R^n, d∞). The special (very special, really) metric space (R^n, d2) is called n-dimensional Euclidean space and d2 is the Euclidean metric on R^n. The metric space (C^n, d2) is called n-dimensional unitary space. The singular role played by the metric d2 will become clear in due course.

Recall that the notion of a bounded subset was defined for partially ordered sets in Section 1.5. In particular, boundedness is well defined for subsets of the simply ordered set (R, ≤), the set of all real numbers R equipped with its natural ordering ≤ (see Section 1.6). Let us introduce a suitable and common notation for a subset of R that is bounded above. Since the simply ordered set R is a boundedly complete lattice (Example 1.C), it follows that a subset R of R is bounded above if and only if it has a supremum, sup R, in R. In such a case we shall write sup R < ∞. Thus the notation sup R < ∞ simply means that R is a subset of R which is bounded above. Otherwise (i.e., if R ⊆ R is not bounded above) we write sup R = ∞. With this in mind we shall extend the notion of boundedness from (R, ≤) to a metric space (X, d) as follows. A nonempty subset A of X is a bounded set in the metric space (X, d) if

sup_{x,y∈A} d(x, y) < ∞.

That is, A is bounded in (X, d) if {d(x, y) ∈ R : x, y ∈ A} is a bounded subset of (R, ≤) or, equivalently, if the set {d(x, y) ∈ R : x, y ∈ A} is bounded above in R (because 0 ≤ d(x, y) for every x, y ∈ X). An unbounded set is, of course, a set A that is not bounded in (X, d). The diameter of a nonempty bounded subset A of X (notation: diam(A)) is defined by

diam(A) = sup_{x,y∈A} d(x, y),

so that diam(A) < ∞ whenever a nonempty set A is bounded in (X, d). By convention the empty set ∅ is bounded and diam(∅) = 0. If A is unbounded we write diam(A) = ∞. Let F be a function of a set S to a metric space (Y, d). F is a bounded function if its range, R(F) = F(S), is a bounded subset of (Y, d); that is, if

sup_{s,t∈S} d(F(s), F(t)) < ∞.

Note that R is bounded as a subset of the metric space R equipped with its usual metric if and only if R is bounded as a subset of the simply ordered set R equipped with its natural ordering. Thus the notion of a bounded subset of R and the notion of a bounded real-valued function on an arbitrary set S are both unambiguously defined.
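The metrics d_p and d_∞ of Example 3.A are straightforward to compute; a sketch using only the standard library (the sample points are arbitrary, and the triangle inequality is merely spot-checked, not proved):

```python
import math

def d_p(x, y, p):
    """The metric d_p on F^n from Example 3.A."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def d_inf(x, y):
    """The sup metric d_infinity on F^n."""
    return max(abs(a - b) for a, b in zip(x, y))

x, y, z = (1.0, 2.0, -1.0), (0.0, 4.0, 1.0), (2.0, 2.0, 0.0)

for p in (1, 2, 3):
    # triangle inequality (a consequence of the Minkowski inequality)
    assert d_p(x, y, p) <= d_p(x, z, p) + d_p(z, y, p)
assert d_inf(x, y) <= d_inf(x, z) + d_inf(z, y)

assert abs(d_p(x, y, 2) - 3.0) < 1e-12   # the Euclidean distance sqrt(1+4+4)
```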
Proposition 3.2. Let S be a set and let F denote either the real field R or the complex field C. Equip F with its usual metric. A function ϕ ∈ F^S (i.e., a function ϕ: S → F) is bounded if and only if sup_{s∈S} |ϕ(s)| < ∞.

Proof. Consider a function ϕ from a set S to the field F. Take s and t arbitrary in S, and let d be the usual metric on F (see Example 3.A). Since ϕ(s), ϕ(t) ∈ F, it follows by Problem 3.1(a) that

|ϕ(s)| − |ϕ(t)| ≤ ||ϕ(s)| − |ϕ(t)|| = |d(ϕ(s), 0) − d(0, ϕ(t))| ≤ d(ϕ(s), ϕ(t)) = |ϕ(s) − ϕ(t)| ≤ |ϕ(s)| + |ϕ(t)|.

If sup_{s∈S} |ϕ(s)| < ∞ (i.e., if {|ϕ(s)| ∈ R : s ∈ S} is bounded above), then

d(ϕ(s), ϕ(t)) ≤ 2 sup_{s∈S} |ϕ(s)|,

and hence sup_{s,t∈S} d(ϕ(s), ϕ(t)) ≤ 2 sup_{s∈S} |ϕ(s)|, so that the function ϕ is bounded. On the other hand, if sup_{s,t∈S} d(ϕ(s), ϕ(t)) < ∞, then

|ϕ(s)| ≤ sup_{s,t∈S} d(ϕ(s), ϕ(t)) + |ϕ(t)|    for every s, t ∈ S,
and so the real number sups,t ∈S d(ϕ(s), ϕ(t)) + |ϕ(t)| is an upper bound for {|ϕ(s)| ∈ R: s ∈ S} for every t ∈ S. Thus sups∈S |ϕ(s)| < ∞. p denote the set of all scalarExample 3.B. For each real number p ≥ 1, let + valued (real or complex) infinite sequences {ξ } in C N (or in C N 0 ) such that k k∈ N ∞ p that the k=1 |ξk | < ∞. We shall refer to this condition ∞ by saying elements n p p p of + are p-summable sequences. Notation: |ξ | = sup k n∈ N k=1 k=1 |ξk | . ∞ p Thus, according to Proposition 3.2, k=1 |ξk | < ∞ means that the nonnegative sequence { nk=1 |ξk |p }n∈N is bounded as a real-valued function on N . p Note that, if {ξk }k∈N and {υk }k∈N are arbitrary sequences the ∞ in + , then p Minkowski inequality (Problem 3.4(b)) ensures that k=1 |ξk − υk | < ∞. p p Hence we may consider the function dp : + ×+ → R given by
dp (x, y) =
∞
|ξk − υk |p
p1
k=1 p for every x = {ξk }k∈N and y = {υk }k∈N in + . We claim that dp is a metric p on + . Indeed, as in Example 3.A, all the metric axioms up to the triangle inequality are readily verified, and the triangle inequality follows from the p Minkowski inequality (Problem 3.4(b)). Therefore (+ , dp ) is a metric space for p each p ≥ 1, and the metric dp is referred to as the usual metric on + . Now let ∞ + denote the set of all scalar-valued bounded sequences; that is, the set of all real or complex-valued sequences {ξk }k∈N such that supk∈N |ξk | < ∞. Again,
the Minkowski inequality (Problem 3.4(b)) ensures that sup_{k∈N} |ξk − υk| < ∞ whenever {ξk}k∈N and {υk}k∈N lie in ℓ+^∞, and hence we may consider the function d∞: ℓ+^∞ × ℓ+^∞ → R defined by

d∞(x, y) = sup_{k∈N} |ξk − υk|

for every x = {ξk}k∈N and y = {υk}k∈N in ℓ+^∞. Proceeding as before (using the Minkowski inequality to verify the triangle inequality), it follows that (ℓ+^∞, d∞) is a metric space, and the metric d∞ is referred to as the usual metric on ℓ+^∞. These metric spaces are the natural generalizations (for infinite sequences) of the metric spaces considered in Example 3.A, and again the metric space (ℓ+^2, d2) will play a central role in the forthcoming chapters. There are counterparts of ℓ+^p and ℓ+^∞ for nets in C^Z. In fact, for each p ≥ 1 let ℓ^p denote the set of all scalar-valued (real or complex) nets {ξk}k∈Z such that Σ_{k=−∞}^∞ |ξk|^p < ∞ (i.e., such that the nonnegative sequence {Σ_{k=−n}^n |ξk|^p}n∈N0 is bounded), and let ℓ^∞ denote the set of all bounded nets in C^Z (i.e., the set of all scalar-valued nets {ξk}k∈Z such that sup_{k∈Z} |ξk| < ∞). The functions dp: ℓ^p × ℓ^p → R and d∞: ℓ^∞ × ℓ^∞ → R, given by

dp(x, y) = ( Σ_{k=−∞}^∞ |ξk − υk|^p )^{1/p}

for every x = {ξk}k∈Z and y = {υk}k∈Z in ℓ^p, and

d∞(x, y) = sup_{k∈Z} |ξk − υk|

for every x = {ξk}k∈Z and y = {υk}k∈Z in ℓ^∞, are metrics on ℓ^p (for each p ≥ 1) and on ℓ^∞, respectively. Let (X, d) be a metric space. If x is an arbitrary point in X, and A is an arbitrary nonempty subset of X, then the distance from x to A is the number

d(x, A) = inf_{a∈A} d(x, a)

in R. If A and B are nonempty subsets of X, then the distance between A and B is the real number

d(A, B) = inf_{a∈A, b∈B} d(a, b).
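For finite sets the two infima above reduce to minima, which makes them easy to illustrate; a minimal sketch (function names and sample sets are ad hoc, not from the text):

```python
# Sketch: distance from a point to a set and between two sets, for finite
# subsets of the real line with the usual metric. Names are ad hoc.
def dist_point_set(x, A):
    # d(x, A) = inf over a in A of d(x, a)
    return min(abs(x - a) for a in A)

def dist_sets(A, B):
    # d(A, B) = inf over a in A, b in B of d(a, b)
    return min(abs(a - b) for a in A for b in B)

A, B = [0.0, 1.0, 2.0], [2.5, 4.0]
assert dist_point_set(3.0, A) == 1.0   # closest point of A to 3 is 2
assert dist_sets(A, B) == 0.5          # attained at a = 2, b = 2.5
# note: d(A, B) = 0 does not force A and B to intersect in general
```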
Example 3.C. Let S be a set and let (Y, d) be a metric space. Let B[S, Y] denote the subset of Y^S consisting of all bounded mappings of S into (Y, d). According to Problem 3.6, sup_{s∈S} d(f(s), g(s)) ≤ diam(R(f)) + diam(R(g)) + d(R(f), R(g)),
so that sup_{s∈S} d(f(s), g(s)) ∈ R for every f, g ∈ B[S, Y]. Thus we may consider the function d∞: B[S, Y] × B[S, Y] → R defined by d∞(f, g) = sup_{s∈S} d(f(s), g(s))
for each pair of mappings f, g ∈ B[S, Y]. This is a metric on B[S, Y]. Indeed, d∞ clearly satisfies conditions (i), (ii), and (iii) in Definition 3.1. To verify the triangle inequality (condition (iv)), proceed as follows. Take an arbitrary s ∈ S and note that, if f, g, and h are mappings in B[S, Y], then (by the triangle inequality in (Y, d)) d(f(s), g(s)) ≤ d(f(s), h(s)) + d(h(s), g(s)) ≤ d∞(f, h) + d∞(h, g). Hence d∞(f, g) ≤ d∞(f, h) + d∞(h, g), and therefore (B[S, Y], d∞) is a metric space. The metric d∞ is referred to as the sup-metric on B[S, Y]. Note that the metric spaces (ℓ+^∞, d∞) and (ℓ^∞, d∞) of the previous example are particular cases of (B[S, Y], d∞). Indeed, ℓ+^∞ = B[N, C] and ℓ^∞ = B[Z, C].

Example 3.D. The general concept of a continuous mapping between metric spaces will be defined in the next section. However, assuming that the reader is familiar with the particular notion of a scalar-valued continuous function of a real variable, we shall now consider the following example. Let C[0, 1] denote the set of all scalar-valued (real or complex) continuous functions defined on the interval [0, 1]. For every x, y ∈ C[0, 1] set

dp(x, y) = ( ∫_0^1 |x(t) − y(t)|^p dt )^{1/p},

where p is a real number such that p ≥ 1, and

d∞(x, y) = sup_{t∈[0,1]} |x(t) − y(t)|.
These are metrics on the set C[0, 1]. That is, dp: C[0, 1] × C[0, 1] → R and d∞: C[0, 1] × C[0, 1] → R are well-defined functions that satisfy all the conditions in Definition 3.1. Indeed, nonnegativeness and symmetry are trivially verified, positiveness for dp is ensured by the continuity of the elements in C[0, 1], and the triangle inequality comes by the Minkowski inequality (Problem 3.4(c)) as follows. For every x, y, z ∈ C[0, 1],

dp(x, y) = ( ∫_0^1 |x(t) − z(t) + z(t) − y(t)|^p dt )^{1/p}
         ≤ ( ∫_0^1 |x(t) − z(t)|^p dt )^{1/p} + ( ∫_0^1 |z(t) − y(t)|^p dt )^{1/p}
         = dp(x, z) + dp(z, y),

and
d∞(x, y) = sup_{t∈[0,1]} |x(t) − z(t) + z(t) − y(t)|
         ≤ sup_{t∈[0,1]} |x(t) − z(t)| + sup_{t∈[0,1]} |z(t) − y(t)|
         = d∞(x, z) + d∞(z, y).

Let B[0, 1] denote the set B[S, Y] of Example 3.C when S = [0, 1] and Y = F (with F standing either for the real field R or for the complex field C). Since C[0, 1] is a subset of B[0, 1] (reason: every scalar-valued continuous function defined on [0, 1] is bounded), it follows that (C[0, 1], d∞) is a subspace of the metric space (B[0, 1], d∞). The metric d∞ is called the sup-metric on C[0, 1] and, as we shall see later, the “sup” in its definition in fact is a “max”. Let X be an arbitrary set. A real-valued function d: X × X → R on the Cartesian product X × X is a pseudometric on X if it satisfies the axioms (i), (iii), and (iv) of Definition 3.1. A pseudometric space (X, d) is a set X equipped with a pseudometric d. The difference between a metric space and a pseudometric space is that a pseudometric does not necessarily satisfy the axiom (ii) in Definition 3.1 (i.e., it is possible for a pseudometric to vanish at a pair (x, y) even though x ≠ y). However, given a pseudometric space (X, d), there exists a natural way to obtain a metric space (X̃, d̃) associated with it, where d̃ is a metric on a set X̃ associated with the pseudometric d on X. Indeed, as we shall see next, a pseudometric d induces an equivalence relation ∼ on X, and X̃ is precisely the quotient space X/∼ (i.e., the collection of all equivalence classes [x] with respect to ∼ for every x in X).

Proposition 3.3. Let d be a pseudometric on a set X and consider the relation ∼ on X defined as follows. If x and x′ are elements of X, then x ∼ x′
if d(x′, x) = 0. The relation ∼ is an equivalence relation on X with the following property. For every x, x′, y, and y′ in X,

x ∼ x′ and y ∼ y′   imply   d(x′, y′) = d(x, y).

Let X/∼ be the quotient space of X modulo ∼. For each pair ([x], [y]) in X/∼ × X/∼ set d̃([x], [y]) = d(x, y) for an arbitrary pair (x, y) in [x] × [y]. This defines a function d̃: X/∼ × X/∼ → R, which is a metric on the quotient space X/∼.

Proof. It is clear that the relation ∼ on X is reflexive and symmetric because a pseudometric is nonnegative and symmetric. Transitivity comes from the
triangle inequality: 0 ≤ d(x, x″) ≤ d(x, x′) + d(x′, x″) for every x, x′, x″ ∈ X. Thus ∼ is an equivalence relation on X. Moreover, if x ∼ x′ and y ∼ y′ (i.e., if x′ ∈ [x] and y′ ∈ [y]), then the triangle inequality in the pseudometric space (X, d) ensures that d(x, y) ≤ d(x, x′) + d(x′, y′) + d(y′, y) = d(x′, y′) and, similarly, d(x′, y′) ≤ d(x, y). Therefore

d(x′, y′) = d(x, y)   whenever   x ∼ x′ and y ∼ y′.

That is, given a pair of equivalence classes [x] ⊆ X and [y] ⊆ X, the restriction of d to [x] × [y] ⊆ X × X, d|[x]×[y]: [x] × [y] → R, is a constant function. Thus, for each pair ([x], [y]) in X/∼ × X/∼, set d̃([x], [y]) = d|[x]×[y](x, y) = d(x, y) for any x ∈ [x] and y ∈ [y]. This defines a function d̃: X/∼ × X/∼ → R which is nonnegative, symmetric, and satisfies the triangle inequality (along with d). The reason for defining equivalence classes is to ensure positiveness for d̃ from the nonnegativeness of the pseudometric d: if d̃([x], [y]) = 0, then d(x, y) = 0 so that x ∼ y, and hence [x] = [y].

Example 3.E. The previous example exhibited different metric spaces with the same underlying set of all scalar-valued continuous functions on the interval [0, 1]. Here we allow discontinuous functions as well. Let S be a nondegenerate interval of the real line R (typical examples: S = [0, 1] or S = R). For each real number p ≥ 1 let rp(S) denote the set of all scalar-valued (real or complex) p-integrable functions on S. In this context, “p-integrable” means that a scalar-valued function x on S is Riemann integrable and ∫_S |x(s)|^p ds < ∞ (i.e., the Riemann integral ∫_S |x(s)|^p ds exists as a number in R). Consider the function δp: rp(S) × rp(S) → R given by

δp(x, y) = ( ∫_S |x(s) − y(s)|^p ds )^{1/p}

for every x, y ∈ rp(S). The Minkowski inequality (see Problem 3.4(c)) ensures that the function δp is well defined, and also that it satisfies the triangle inequality. Moreover, nonnegativeness and symmetry are readily verified, but positiveness fails. For instance, if 0 denotes the null function on S = [0, 1] (i.e., 0(s) = 0 for all s ∈ S), and if x(s) = 1 for s = 1/2 and zero elsewhere, then δp(x, 0) = 0 although x ≠ 0 (since x(1/2) ≠ 0(1/2)). Thus δp actually is a pseudometric on rp(S) rather than a metric, so that (rp(S), δp) is a pseudometric space. However, if we “redefine” rp(S) by endowing it with a new notion of equality, different from the usual pointwise equality for functions, then perhaps we might make δp a metric on such a “redefinition” of rp(S). This in fact is the idea behind Proposition 3.3. Consider the equivalence relation ∼ on rp(S)
defined as in Proposition 3.3: if x and x′ are functions in rp(S), then x ∼ x′ if δp(x′, x) = 0. Now set Rp(S) = rp(S)/∼, the collection of all equivalence classes [x] = {x′ ∈ rp(S): δp(x′, x) = 0} for every x ∈ rp(S). Thus, by Proposition 3.3, (Rp(S), dp) is a metric space with the metric dp: Rp(S) × Rp(S) → R defined by dp([x], [y]) = δp(x, y) for arbitrary x ∈ [x] and y ∈ [y] for every [x], [y] in Rp(S). Note that equality in Rp(S) is interpreted in the following way: if [x] and [y] are equivalence classes in Rp(S), and if x and y are arbitrary functions in [x] and [y], respectively, then [x] = [y] if and only if δp(x, y) = 0. If x is any element of [x], then, in this context, it is usual to write x for [x] and hence dp(x, y) for dp([x], [y]). Thus, following the common usage, we shall write x ∈ Rp(S) instead of [x] ∈ Rp(S), and also

dp(x, y) = ( ∫_S |x(s) − y(s)|^p ds )^{1/p}
for every x, y ∈ Rp (S) to represent the metric dp on Rp (S). This is referred to as the usual metric on Rp (S). Note that, according to this convention, x = y in Rp (S) if and only if dp (x, y) = 0.
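The quotient construction of Proposition 3.3 can be illustrated on a toy pseudometric (a sketch not from the text: the space X = R², the pseudometric, and all function names are ad hoc assumptions):

```python
# Toy sketch of Proposition 3.3: on X = R^2 the function
# d((a,b),(c,d)) = |a - c| is a pseudometric, since it vanishes on
# distinct points that share the first coordinate.
def pseudo(x, y):
    return abs(x[0] - y[0])

x, y = (1.0, 5.0), (1.0, -3.0)
assert pseudo(x, y) == 0 and x != y   # axiom (ii) of Definition 3.1 fails

# The relation x ~ y iff pseudo(x, y) = 0 identifies points with equal
# first coordinate, so a class [x] may be labeled by that coordinate, and
# the induced metric on the quotient is defined via representatives:
def cls(x):
    return x[0]          # label of the equivalence class [x]

def quotient_metric(cx, cy):
    return abs(cx - cy)  # the induced metric: d~([x],[y]) = d(x, y)

z = (2.5, 0.0)
x2 = (1.0, 99.0)         # another representative of [x]
assert quotient_metric(cls(x), cls(z)) == pseudo(x, z) == 1.5
assert quotient_metric(cls(x2), cls(z)) == pseudo(x, z)  # choice-independent
```

The last assertion is the content of the proposition: the value does not depend on which representative of a class is chosen.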
3.2 Convergence and Continuity

The notion of convergence, together with the notion of continuity, plays a central role in the theory of metric spaces.

Definition 3.4. Let (X, d) be a metric space. An X-valued sequence {xn} (or a sequence in X indexed by N or by N0) converges to a point x in X if for each real number ε > 0 there exists a positive integer nε such that

n ≥ nε   implies   d(xn, x) < ε.
If {xn} converges to x ∈ X, then {xn} is said to be a convergent sequence and x is said to be the limit of {xn} (usual notation: lim xn = x, limn xn = x, or xn → x; also lim_{n→∞} xn = x, xn → x as n → ∞, or even xn −→ x). As defined above, convergence depends on the metric d that equips the metric space (X, d). To emphasize the role played by the metric d, it is usual to refer to an X-valued convergent sequence {xn} by saying that {xn} converges in (X, d). If an X-valued sequence {xn} does not converge in (X, d) to the point x ∈ X, then we shall write xn ↛ x. Clearly, if xn ↛ x, then the sequence {xn} either converges in (X, d) to another point different from x or does not converge in (X, d) to any x in X. The notion of convergence in a metric space (X, d) is a natural extension of the ordinary notion of convergence in the real line R (equipped with its usual metric). Indeed, let (X, d) be a metric space, and consider an X-valued sequence {xn}. Let x be an arbitrary point in X and consider the real-valued sequence {d(xn, x)}. According to Definition 3.4,
xn → x   if and only if   d(xn, x) → 0.
This shows at once that a convergent sequence in a metric space has a unique limit (as we had anticipated in Definition 3.4 by referring to the limit of a convergent sequence). In fact, if a and b are points in X, then the triangle inequality says that 0 ≤ d(a, b) ≤ d(a, xn) + d(xn, b) for every index n. Thus, if xn → a and xn → b (i.e., d(a, xn) → 0 and d(xn, b) → 0), then d(a, b) = 0 (see Problems 3.10(c,e)). Hence a = b.

Example 3.F. Let C[0, 1] denote the set of all scalar-valued continuous functions on the interval [0, 1], and let {xn} be a C[0, 1]-valued sequence such that, for each integer n ≥ 1, xn: [0, 1] → R is defined by

xn(t) = 1 − nt for t ∈ [0, 1/n],   and   xn(t) = 0 for t ∈ (1/n, 1].

Consider the metric spaces (C[0, 1], dp) for p ≥ 1 and (C[0, 1], d∞) which were introduced in Example 3.D. It is readily verified that the sequence {xn} converges in (C[0, 1], dp) to the null function 0 ∈ C[0, 1] for every p ≥ 1. Indeed, take an arbitrary p ≥ 1 and note that

dp(xn, 0) = ( ∫_0^1 |xn(t)|^p dt )^{1/p} < (1/n)^{1/p}

for each n ≥ 1. Since the sequence of real numbers (1/n)^{1/p} converges to zero (when the real line R is equipped with its usual metric — apply Definition 3.4), it follows that dp(xn, 0) → 0 as n → ∞ (Problem 3.10(c)). That is, xn → 0 in (C[0, 1], dp). However, {xn} does not converge in the metric space (C[0, 1], d∞). Indeed, if there exists x ∈ C[0, 1] such that d∞(xn, x) → 0, then it is easy to show that x(0) = 1 and x(ε) = 0 for all ε ∈ (0, 1]. Hence x ∉ C[0, 1], which is a contradiction. Conclusion: There is no x ∈ C[0, 1] such that xn → x in (C[0, 1], d∞). Equivalently, {xn} does not converge in (C[0, 1], d∞).

Example 3.G. Consider the metric space (B[S, Y], d∞) introduced in Example 3.C, where B[S, Y] denotes the set of all bounded functions of a set S into a metric space (Y, d), and d∞ is the sup-metric. Let {fn} be a B[S, Y]-valued sequence (i.e., a sequence of functions in B[S, Y]), and let f be an arbitrary function in B[S, Y]. Since

0 ≤ d(fn(s), f(s)) ≤ sup_{s∈S} d(fn(s), f(s)) = d∞(fn, f)
for each index n and all s ∈ S, it follows by Problem 3.10(c) that

fn → f in (B[S, Y], d∞)   implies   fn(s) → f(s) in (Y, d)

for every s ∈ S. If fn → f in (B[S, Y], d∞), then we say that the sequence {fn} of functions in B[S, Y] converges uniformly to the function f in B[S, Y]. If fn(s) → f(s) in (Y, d) for every s ∈ S, then we say that {fn} converges pointwise to f. Thus uniform convergence implies pointwise convergence (to the same limit), but the converse fails. For instance, set S = [0, 1], Y = F (either the real field R or the complex field C equipped with their usual metric d), and set B[0, 1] = B[[0, 1], F]. Recall that the metric space (C[0, 1], d∞) of Example 3.D is a subspace of (B[0, 1], d∞). (Indeed, every scalar-valued continuous function defined on a bounded closed interval is a bounded function — we shall consider a generalized version of this well-known result later in this chapter.) If {gn} is a sequence of functions in C[0, 1] given by

gn(s) = s^2 / (s^2 + (1 − ns)^2)
for each integer n ≥ 1 and every s ∈ S = [0, 1], then it is easy to show that

gn(s) → 0 in (R, d)

for every s ∈ [0, 1] (cf. Definition 3.4), so that the sequence {gn} of functions in C[0, 1] converges pointwise to the null function 0 ∈ C[0, 1]. However, since 0 ≤ gn(s) ≤ 1 for all s ∈ [0, 1] and gn(1/n) = 1, for each n ≥ 1, it follows that

d∞(gn, 0) = sup_{s∈[0,1]} |gn(s)| = 1
for every n ≥ 1. Hence {gn} does not converge uniformly to the null function, and so it does not converge uniformly to any limit (for, if it converges uniformly, then it converges pointwise to the same limit). Thus the C[0, 1]-valued sequence {gn} does not converge in the metric space (C[0, 1], d∞). Briefly, {gn} does not converge in (C[0, 1], d∞). But it converges to the null function 0 ∈ C[0, 1] in the metric spaces (C[0, 1], dp) of Example 3.D. That is, for every p ≥ 1, gn → 0 in (C[0, 1], dp). Indeed, gn(0) = 0, gn(s) = (1 + fn(s))^{−1} with fn(s) = (n − 1/s)^2 ≥ 0 for every s ∈ (0, 1], and gn(1/n) = 1. Note that fn(1/n) = 0, fn(1) = (n − 1)^2 ≥ 0, and fn(s) = 0 only at s = 1/n. Thus each fn is decreasing on (0, 1/n] and increasing on [1/n, 1], and hence each gn is increasing on [0, 1/n] and decreasing on [1/n, 1]. Therefore, for an arbitrary ε ∈ (0, 1/2], and for every n ≥ 2 and every p ≥ 1,

∫_0^1 |gn(s)|^p ds = ∫_0^{1/n} |gn(s)|^p ds + ∫_{1/n}^{1/n+ε} |gn(s)|^p ds + ∫_{1/n+ε}^1 |gn(s)|^p ds ≤ 1/n + ε + gn(1/n + ε)^p.
Since fn(1/n + ε) = ε^2 n^4 (1 + εn)^{−2}, it follows that gn(1/n + ε)^p → 0 as n → ∞. This and the above inequality ensure that ∫_0^1 |gn(s)|^p ds → 0, and so

dp(gn, 0) → 0 as n → ∞   for every   p ≥ 1.
Proposition 3.5. An X-valued sequence converges in a metric space (X, d) to x ∈ X if and only if every subsequence of it converges in (X, d) to x. Proof. Let {xn } be an X-valued sequence. If every subsequence of it converges to a fixed limit, then, in particular, the sequence itself converges to the same limit. On the other hand, suppose xn → x in (X, d). That is, for every ε > 0 there exists a positive integer nε such that n ≥ nε implies d(xn , x) < ε. Take an arbitrary subsequence {xnk }k∈N of {xn }n∈N . Since k ≤ nk (reason: {nk }k∈N is a strictly increasing subsequence of the sequence {n}n∈N — see Section 1.7), it follows that k ≥ nε implies nk ≥ nε which in turn implies d(xnk , x) < ε. Therefore xnk → x in (X, d) as k → ∞. As we saw in Section 1.7, nets constitute a natural generalization of (infinite) sequences. Thus it comes as no surprise that the concept of convergence can be generalized from sequences to nets in a metric space (X, d). In fact, an X-valued net {xγ }γ∈Γ (or a net in X) indexed by a directed set Γ converges to x ∈ X if for each real number ε > 0 there exists an index γε in Γ such that γ ≥ γε
implies   d(xγ, x) < ε.
If {xγ }γ∈Γ converges to a point x in X, then {xγ }γ∈Γ is said to be a convergent net and x is said to be the limit of {xγ }γ∈Γ (usual notation: lim xγ = x, limγ xγ = x, or xγ → x). Just as in the particular case of sequences, a convergent net in a metric space has a unique limit. The notion of a real-valued continuous function on R is essential in classical analysis. One of the main reasons for investigating metric spaces is the generalization of the idea of continuity for maps between abstract metric spaces: a map between metric spaces is continuous if it preserves closeness. Definition 3.6. Let F : X → Y be a function from a set X to a set Y. Equip X and Y with metrics dX and dY , respectively, so that (X, dX ) and (Y, dY ) are metric spaces. F : (X, dX ) → (Y, dY ) is continuous at the point x0 in X if for each real number ε > 0 there exists a real number δ > 0 (which certainly depends on ε and may depend on x0 as well) such that dX (x, x0 ) < δ
implies   dY(F(x), F(x0)) < ε.
F is continuous (or continuous on X) if it is continuous at every point of X; and uniformly continuous (on X) if for each real number ε > 0 there exists a real number δ > 0 such that, for all x and x′ in X,

dX(x, x′) < δ   implies   dY(F(x), F(x′)) < ε.
It is plain that a uniformly continuous mapping is continuous, but the converse fails. The difference between continuity and uniform continuity is that if a function F is uniformly continuous, then for each ε > 0 it is possible to take a δ > 0 (which depends only on ε) so as to ensure that the implication {dX(x, x0) < δ =⇒ dY(F(x), F(x0)) < ε} holds for all points x0 of X. We say that a mapping F: (X, dX) → (Y, dY) is Lipschitzian if there exists a real number γ > 0 (called a Lipschitz constant) such that dY(F(x), F(x′)) ≤ γ dX(x, x′) for all x, x′ ∈ X (which is referred to as the Lipschitz condition). It is readily verified that every Lipschitzian mapping is uniformly continuous, but, again, the converse fails (see Problem 3.16). A contraction is a Lipschitzian mapping F: (X, dX) → (Y, dY) with a Lipschitz constant γ ≤ 1. That is, F is a contraction if dY(F(x), F(x′)) ≤ dX(x, x′) for all x, x′ ∈ X or, equivalently, if

sup_{x′≠x} dY(F(x), F(x′)) / dX(x, x′) ≤ 1.

A function F is said to be a strict contraction if it is a Lipschitzian mapping with a Lipschitz constant γ < 1, which means that

sup_{x′≠x} dY(F(x), F(x′)) / dX(x, x′) < 1.
Note that, if dY(F(x), F(x′)) < dX(x, x′) for all x, x′ ∈ X, then F is a contraction but not necessarily a strict contraction. Consider a function F from a metric space (X, dX) to a metric space (Y, dY). If F is continuous at a point x0 ∈ X, then x0 is said to be a point of continuity of F. Otherwise, if F is not continuous at a point x0 ∈ X, then x0 is said to be a point of discontinuity of F, and F is said to be discontinuous at x0. F is not continuous if there exists at least one point x0 ∈ X such that F is discontinuous at x0. According to Definition 3.6, a function F is discontinuous at x0 ∈ X if and only if the following assertion holds true: there exists an ε > 0 such that for every δ > 0 there exists an xδ ∈ X with the property that

dX(xδ, x0) < δ   and   dY(F(xδ), F(x0)) ≥ ε.
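A Lipschitz constant can be estimated by sampling the ratio in the definition of a contraction. A minimal sketch, not from the text: the map F(x) = cos(x)/2 and the sampling grid are ad hoc, and sampling only suggests (rather than proves) the value of the supremum; here the mean value theorem gives the exact bound |F(x) − F(x′)| ≤ (1/2)|x − x′|, so F is a strict contraction on R.

```python
# Sketch: sampled estimate of sup over x != x' of d(F(x), F(x')) / d(x, x')
# for F(x) = cos(x)/2 on R. By the mean value theorem the true supremum
# is 1/2 < 1, so F is a strict contraction.
import math

def F(x):
    return math.cos(x) / 2.0

pts = [i / 10.0 for i in range(-50, 51)]        # grid on [-5, 5]
ratios = [abs(F(a) - F(b)) / abs(a - b)
          for a in pts for b in pts if a != b]
est = max(ratios)        # sampled lower estimate of the supremum
assert est <= 0.5 + 1e-12   # consistent with Lipschitz constant 1/2
```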
Example 3.H. (a) Consider the set R2(R) defined in Example 3.E. Set Y = R2(R) and let X be the subset of Y made up of all functions x in R2(R) for which the formula

y(t) = ∫_{−∞}^t x(s) ds   for each t ∈ R

defines a function in R2(R). Briefly,
X = { x ∈ Y: ∫_{−∞}^∞ | ∫_{−∞}^t x(s) ds |^2 dt < ∞ }.
Recall that a “function” in Y is, in fact, an equivalence class of functions as discussed in Example 3.E. Thus consider the mapping F: X → Y that assigns to each function x in X the function y = F(x) in Y defined by the above formula. Now equip R2(R) with its usual metric d2 (cf. Example 3.E) so that (X, d2) is a subspace of the metric space (Y, d2). We claim that F: (X, d2) → (Y, d2) is nowhere continuous; that is, the mapping F is discontinuous at every x0 ∈ X (see Problem 3.17(a)).

(b) Now let S be a (nondegenerate) closed and bounded interval of the real line R (typical example: S = [0, 1]) and consider the set R2(S) defined in Example 3.E. If x is a function in R2(S) (so that it is Riemann integrable), then set

y(t) = ∫_{min S}^t x(s) ds   for each t ∈ S.
According to the Hölder inequality in Problem 3.3(c), we get ∫_S |x(s)| ds ≤ ( ∫_S ds )^{1/2} ( ∫_S |x(s)|^2 ds )^{1/2} for every x ∈ R2(S). Then |y(t)|^2 = | ∫_{min S}^t x(s) ds |^2 ≤ ( ∫_S |x(s)| ds )^2 ≤ diam(S) ∫_S |x(s)|^2 ds for each t ∈ S, and so

∫_S |y(t)|^2 dt ≤ diam(S)^2 ∫_S |x(s)|^2 ds < ∞
for every x ∈ R2(S). Thus the previous identity defines a function y in R2(S). Let F be a mapping of R2(S) into itself that assigns to each function x in R2(S) this function y in R2(S), so that y = F(x). Equip R2(S) with its usual metric d2 (Example 3.E). It is easy to show that F: (R2(S), d2) → (R2(S), d2) is uniformly continuous. As a matter of fact, the mapping F is Lipschitzian (cf. Problem 3.17(b)). Comparing the example in item (a) with the present one, we observe how different the metric spaces R2(R) and R2(S), both equipped with the usual metric d2, can be: the “same” integral transformation F that is nowhere continuous when defined on an appropriate subspace of (R2(R), d2) becomes Lipschitzian when defined on (R2(S), d2).

The concepts of convergence and continuity are tightly intertwined. A particularly important result on the connection of these central concepts says that a function is continuous if and only if it preserves convergence. This leads to a necessary and sufficient condition for continuity in terms of convergence.

Theorem 3.7. Consider a mapping F: (X, dX) → (Y, dY) of a metric space (X, dX) into a metric space (Y, dY) and let x0 be a point in X. The following assertions are equivalent.

(a) F is continuous at x0.

(b) The Y-valued sequence {F(xn)} converges in (Y, dY) to F(x0) ∈ Y whenever {xn} is an X-valued sequence that converges in (X, dX) to x0 ∈ X.
Proof. If {xn} is an X-valued sequence such that xn → x0 in (X, dX) for some x0 in X, then (Definition 3.4) for every δ > 0 there exists a positive integer nδ such that n ≥ nδ implies dX(xn, x0) < δ. If F: (X, dX) → (Y, dY) is continuous at x0, then (Definition 3.6) for each ε > 0 there exists δ > 0 such that

dX(x, x0) < δ   implies   dY(F(x), F(x0)) < ε.

Therefore, if xn → x0 and F is continuous at x0, then for each ε > 0 there exists a positive integer nε (e.g., nε = nδ) such that

n ≥ nε   implies   dY(F(xn), F(x0)) < ε,

which means that (a)⇒(b). On the other hand, if F is not continuous at x0, then there exists ε > 0 such that for every δ > 0 there exists xδ ∈ X with the property that dX(xδ, x0) < δ and dY(F(xδ), F(x0)) ≥ ε. In particular, for each positive integer n there exists xn ∈ X such that

dX(xn, x0) < 1/n   and   dY(F(xn), F(x0)) ≥ ε.

Thus xn → x0 in (X, dX) (since dX(xn, x0) → 0) and F(xn) ↛ F(x0) in (Y, dY) (since dY(F(xn), F(x0)) ↛ 0). That is, the Y-valued sequence {F(xn)} does not converge to F(x0) (it may not converge in (Y, dY) or, if it converges in (Y, dY), then it does not converge to F(x0)). Therefore, the denial of (a) implies the denial of (b). Equivalently, (b)⇒(a).

Note that the proof of (a)⇒(b) can be rewritten in terms of nets so that, if the mapping F: (X, dX) → (Y, dY) is continuous at x0 ∈ X, and if {xγ}γ∈Γ is an X-valued net that converges to x0, then {F(xγ)}γ∈Γ is a Y-valued net that converges to F(x0). That is, if (b′) is the statement obtained from (b) by changing “sequence” to “net” in (b), then (a)⇒(b′). Conversely, since (b′)⇒(b) trivially (in fact, (b) is a particular case of (b′)), it also follows that (b′)⇒(a) (because (b)⇒(a)).

We shall say that a mapping F: (X, dX) → (Y, dY) of a metric space (X, dX) into a metric space (Y, dY) preserves convergence if the Y-valued sequence {F(xn)} converges in (Y, dY) whenever the X-valued sequence {xn} converges in (X, dX), and F(lim xn) = lim F(xn).

Corollary 3.8. A map between metric spaces is continuous if and only if it preserves convergence.

Proof. Combine the above definition and the definition of a continuous function with Theorem 3.7.
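The forward direction of Theorem 3.7 can be probed numerically for a concrete continuous map. A sketch, not from the text: the choice F(x) = x², the sequence xn = x0 + 1/n, and the tolerance are ad hoc illustrations.

```python
# Sketch of (a) => (b) in Theorem 3.7: for the continuous map F(x) = x^2
# on R and a sequence x_n -> x_0, the distances d(F(x_n), F(x_0)) -> 0.
F = lambda x: x * x
x0 = 2.0
xs = [x0 + 1.0 / n for n in range(1, 10001)]     # x_n -> x_0 = 2
errs = [abs(F(x) - F(x0)) for x in xs]           # d(F(x_n), F(x_0))
# the errors shrink monotonically toward 0 for this particular sequence:
assert all(e2 <= e1 for e1, e2 in zip(errs, errs[1:]))
assert errs[-1] < 1e-3
```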
Example 3.I. Let C[0, 1] denote the set of all scalar-valued (real or complex) continuous functions defined on the interval [0, 1]. Consider the map ϕ: C[0, 1] → C defined by

ϕ(x) = ∫_0^1 x(t) dt
for every x in C[0, 1]. Equip C[0, 1] with the sup-metric d∞ and C with its usual metric d. Take an arbitrary convergent sequence {xn} in (C[0, 1], d∞) (i.e., an arbitrary C[0, 1]-valued sequence that converges in the metric space (C[0, 1], d∞)) and set x0 = lim xn ∈ C[0, 1]. Note that

0 ≤ d(ϕ(xn), ϕ(x0)) = | ∫_0^1 xn(t) dt − ∫_0^1 x0(t) dt |
  = | ∫_0^1 (xn(t) − x0(t)) dt | ≤ ∫_0^1 |xn(t) − x0(t)| dt
  ≤ sup_{t∈[0,1]} |xn(t) − x0(t)| ∫_0^1 dt = d∞(xn, x0)

for each positive integer n (by the Hölder inequality: Problem 3.3(c)). Since d∞(xn, x0) → 0, it follows that d(ϕ(xn), ϕ(x0)) → 0. Therefore ϕ(xn) → ϕ(x0) in (C, d) whenever xn → x0 in (C[0, 1], d∞) so that ϕ: (C[0, 1], d∞) → (C, d) is continuous by Corollary 3.8. (In fact, ϕ is a contraction, thus Lipschitzian, and hence uniformly continuous.)
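The contraction inequality |ϕ(x) − ϕ(y)| ≤ d∞(x, y) of Example 3.I can be checked numerically on a discretized [0, 1]. A sketch, not from the text: the grid, the quadrature rule, and the sample functions are ad hoc.

```python
# Numeric sketch of Example 3.I: phi(x) = int_0^1 x(t) dt satisfies
# |phi(x) - phi(y)| <= d_oo(x, y), so phi is a contraction.
def phi(x, m=10000):
    # midpoint-rule approximation of the integral over [0, 1]
    h = 1.0 / m
    return sum(x((k + 0.5) * h) for k in range(m)) * h

def d_oo(x, y, m=10000):
    # sup-metric approximated on a uniform grid
    return max(abs(x(k / m) - y(k / m)) for k in range(m + 1))

x = lambda t: t * t
y = lambda t: t
# |phi(x) - phi(y)| ~ |1/3 - 1/2| = 1/6, while d_oo(x, y) = 1/4 at t = 1/2
assert abs(phi(x) - phi(y)) <= d_oo(x, y) + 1e-9
```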
3.3 Open Sets and Topology

Let (X, d) be a metric space. For each point x0 in X and each nonnegative real number ρ, the set

Bρ(x0) = {x ∈ X: d(x, x0) < ρ}

is the open ball with center x0 and radius ρ (or the open ball centered at x0 with radius ρ). If ρ = 0, then B0(x0) is empty; otherwise, Bρ(x0) always contains at least its center. The set

Bρ[x0] = {x ∈ X: d(x, x0) ≤ ρ}

is the closed ball with center x0 and radius ρ. It is clear that Bρ(x0) ⊆ Bρ[x0]. If ρ = 0, then B0[x0] = {x0} for every x0 ∈ X.

Definition 3.9. A subset U of a metric space X is an open set in X if it includes a nonempty open ball centered at each one of its points. That is, U is an open set in X if and only if for every u in U there exists a positive number ρ such that Bρ(u) ⊆ U. Equivalently, U ⊆ X is open in the metric space (X, d) if and only if for every u ∈ U there exists ρ > 0 such that
d(x, u) < ρ   implies   x ∈ U.
Thus, according to Definition 3.9, a subset A of a metric space (X, d) is not open if and only if there exists at least one point a in A such that every open ball with positive radius ρ centered at a contains a point of X not in A. In other words, A ⊂ X is not open in the metric space (X, d) if and only if there exists at least one point a ∈ A with the following property: for every ρ > 0 there exists x ∈ X such that d(x, a) < ρ
and   x ∈ X\A.
This shows at once that the empty set ∅ is open in X (reason: if a set is not open, then it has at least one point); and also that the underlying set X is always open in the metric space (X, d) (reason: there is no point in X\X).

Proposition 3.10. An open ball is an open set.

Proof. Let Bρ(x0) be an open ball in a metric space (X, d) with center at an arbitrary x0 ∈ X and with an arbitrary radius ρ ≥ 0. Suppose ρ > 0 so that Bρ(x0) ≠ ∅ (otherwise Bρ(x0) is empty and hence trivially open). Take an arbitrary u ∈ Bρ(x0), which means that u ∈ X and d(u, x0) < ρ. Set β = ρ − d(u, x0) so that 0 < β ≤ ρ, and let x be a point in X. If d(x, u) < β, then the triangle inequality ensures that d(x, x0) ≤ d(x, u) + d(u, x0) < β + d(u, x0) = ρ, and so x ∈ Bρ(x0). Conclusion: For each u ∈ Bρ(x0) there is a β > 0 such that

d(x, u) < β   implies   x ∈ Bρ(x0).
That is, Bρ (x0 ) is an open set.
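The choice β = ρ − d(u, x0) in the proof above can be exercised numerically in the Euclidean plane. A sketch, not from the text: the center, radius, sample point, and random sampling are all ad hoc.

```python
# Sketch of the proof of Proposition 3.10 in (R^2, d2): for u in B_rho(x0),
# the radius beta = rho - d(u, x0) > 0 makes B_beta(u) a subset of B_rho(x0).
import math, random

def d2(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

x0, rho = (0.0, 0.0), 1.0
u = (0.6, 0.3)                      # a point of the open ball B_rho(x0)
assert d2(u, x0) < rho
beta = rho - d2(u, x0)              # the radius used in the proof
assert beta > 0

random.seed(0)
for _ in range(1000):
    # sample points x with d2(x, u) < beta and check x in B_rho(x0)
    t, r = random.uniform(0.0, 2 * math.pi), beta * random.random()
    x = (u[0] + r * math.cos(t), u[1] + r * math.sin(t))
    assert d2(x, x0) < rho          # the triangle inequality at work
```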
An open neighborhood of a point x in a metric space is an open set containing x. In particular (see Proposition 3.10), every open ball with positive radius centered at a point x in a metric space is an open neighborhood of x. A neighborhood of a point x in a metric space X is any subset of X that includes an open neighborhood of x. Clearly, every open neighborhood of x is a neighborhood of x. Open sets give an alternative definition of continuity and convergence. Lemma 3.11. Let F : X → Y be a mapping of a metric space X into a metric space Y, and let x0 be a point in X. The following assertions are equivalent. (a) F is continuous at x0 . (b) The inverse image of every neighborhood of F (x0 ) is a neighborhood of x0 . Proof. Consider the image F (x0 ) ∈ Y of x0 ∈ X. Take an arbitrary neighborhood N ⊆ Y of F (x0 ). Since N includes an open neighborhood U of F (x0 ),
it follows that there exists an open ball Bε (F (x0 )) ⊆ U ⊆ N with center at F (x0 ) and radius ε > 0. If the mapping F : X → Y is continuous at x0 (cf. Definition 3.6), then there exists δ > 0 such that dY (F (x), F (x0 )) < ε
whenever   dX(x, x0) < δ,
where dX and dY are the metrics on X and Y, respectively. In other words, there exists δ > 0 such that x ∈ Bδ (x0 )
implies   F(x) ∈ Bε(F(x0)).
Thus Bδ(x0) ⊆ F−1(Bε(F(x0))) ⊆ F−1(U) ⊆ F−1(N). Since the open ball Bδ(x0) is an open neighborhood of x0, and since Bδ(x0) ⊆ F−1(N), it follows that F−1(N) is a neighborhood of x0. Hence (a) implies (b).

Now suppose (b) holds true. Then, in particular, the inverse image F−1(Bε(F(x0))) of each open ball Bε(F(x0)) with center F(x0) and radius ε > 0 includes a neighborhood N ⊆ X of x0. This neighborhood N includes an open neighborhood U of x0, which in turn includes an open ball Bδ(x0) with center x0 and radius δ > 0 (cf. Definition 3.9). Therefore, for each ε > 0 there is a δ > 0 such that Bδ(x0) ⊆ U ⊆ N ⊆ F−1(Bε(F(x0))). Hence (see Problems 1.2(c,j)) F(Bδ(x0)) ⊆ Bε(F(x0)). Equivalently, if x ∈ Bδ(x0), then F(x) ∈ Bε(F(x0)). Thus, for each ε > 0 there exists δ > 0 such that

dX(x, x0) < δ   implies   dY(F(x), F(x0)) < ε,
where dX and dY denote the metrics on X and Y, respectively. That is, (a) holds true (Definition 3.6).

Theorem 3.12. A map between metric spaces is continuous if and only if the inverse image of each open set is an open set.

Proof. Let F : X → Y be a mapping of a metric space X into a metric space Y.

(a) Take any neighborhood N ⊆ Y of F(x) ∈ Y (for an arbitrary x ∈ X). Since N includes an open neighborhood of F(x), say U, it follows that F(x) ∈ U ⊆ N, which implies x ∈ F−1(U) ⊆ F−1(N). If the inverse image (under F) of each open set in Y is an open set in X, then F−1(U) is open in X, and so F−1(U) is an open neighborhood of x. Hence the inverse image F−1(N) is a neighborhood of x. Therefore, the inverse image of every neighborhood of F(x) (for any x ∈ X) is a neighborhood of x. Thus F is continuous by Lemma 3.11.
3.3 Open Sets and Topology
(b) Conversely, take an arbitrary open subset U of Y. Suppose R(F) ∩ U ≠ ∅ and take x ∈ F−1(U) ⊆ X arbitrary. Thus F(x) ∈ U so that U is an open neighborhood of F(x). If F is continuous, then it is continuous at x. Therefore, according to Lemma 3.11, F−1(U) is a neighborhood of x, and so it includes a nonempty open ball Bδ(x) centered at x. Thus Bδ(x) ⊆ F−1(U) so that F−1(U) is open in X (reason: it includes a nonempty open ball centered at each point of it). If R(F) ∩ U = ∅, then F−1(U) = ∅, which is open. Conclusion: F−1(U) is open in X for every open subset U of Y.

Corollary 3.13. The composition of two continuous functions is again a continuous function.

Proof. Let X, Y, and Z be metric spaces, and let F : X → Y and G: Y → Z be continuous functions. Take an arbitrary open set U in Z. Theorem 3.12 says that G−1(U) is an open set in Y, and so (GF)−1(U) = F−1(G−1(U)) is an open set in X. Thus, using Theorem 3.12 again, GF : X → Z is continuous.

An X-valued sequence {xn} is said to be eventually in a subset A of X if there exists a positive integer n0 such that

n ≥ n0   implies   xn ∈ A.
Theorem 3.14. Let {xn} be a sequence in a metric space X and let x be a point in X. The following assertions are equivalent.

(a) xn → x in X.
(b) {xn} is eventually in every neighborhood of x.

Proof. If xn → x, then (definition of convergence) {xn} is eventually in every nonempty open ball centered at x. Hence it is eventually in every neighborhood of x (cf. definitions of neighborhood and of open set). Conversely, if {xn} is eventually in every neighborhood of x, then, in particular, it is eventually in every nonempty open ball centered at x, which means that xn → x.

Theorem 3.14 is naturally extended from sequences to nets. A net {xγ}γ∈Γ in a metric space X converges to x ∈ X if and only if for every neighborhood N of x there exists an index γ0 ∈ Γ such that xγ ∈ N for every γ ≥ γ0.

Given a metric space X, the collection of all open sets in X is of paramount importance. Its fundamental properties are stated in the next theorem.

Theorem 3.15. If X is a metric space, then

(a) the whole set X and the empty set ∅ are open,
(b) the intersection of a finite collection of open sets is open,
(c) the union of an arbitrary collection of open sets is open.
Proof. We have already verified that assertion (a) holds true. Let {Un} be a finite collection of open subsets of X. Suppose ⋂n Un ≠ ∅ (otherwise ⋂n Un is an open set). Take an arbitrary u ∈ ⋂n Un so that u ∈ Un for every index n. As each Un is an open subset of X, there are open balls Bαn(u) ⊆ Un (with center at u and radius αn > 0) for each n. Consider the set {αn} consisting of the radii of the balls Bαn(u). Since {αn} is a finite set of positive numbers, it has a positive minimum. Set α = min{αn} > 0 so that Bα(u) ⊆ ⋂n Un. Thus ⋂n Un is open in X (i.e., for each u ∈ ⋂n Un there exists an open ball Bα(u) ⊆ ⋂n Un), which concludes the proof of (b).

The proof of (c) goes as follows. Let U be an arbitrary collection of open subsets of X. Suppose the union ⋃U is nonempty (otherwise it is open by (a)) and take an arbitrary u ∈ ⋃U so that u ∈ U for some U ∈ U. As U is an open subset of X, there exists a nonempty open ball Bρ(u) ⊆ U ⊆ ⋃U, which means that ⋃U is open in X.

Corollary 3.16. A subset of a metric space is open if and only if it is a union of open balls.

Proof. The union of open balls in a metric space X is an open set in X because open balls are open sets (cf. Proposition 3.10 and Theorem 3.15). On the other hand, let U be an open set in a metric space X. If U is empty, then it coincides with the empty open ball. If U is a nonempty open subset of X, then each u ∈ U is the center of an open ball, say Bρu(u), included in U. Thus U = ⋃u∈U {u} ⊆ ⋃u∈U Bρu(u) ⊆ U, and hence U = ⋃u∈U Bρu(u).

The collection T of all open sets in a metric space X (which is a subcollection of the power set ℘(X)) is called the topology (or the metric topology) on X. As the elements of T are the open sets in the metric space (X, d), and since the definition of an open set in X depends on the particular metric d that equips the metric space (X, d), the collection T is also referred to as the topology induced (or generated, or determined) by the metric d.
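As a numerical sketch (the interval representation and the `intersect` helper are ours, not the book's), assertion (b) of Theorem 3.15, and its failure for infinite collections, can be seen with open intervals of the real line:

```python
from fractions import Fraction

def intersect(intervals):
    """Intersection of open intervals (a, b) of R; None if empty."""
    a = max(lo for lo, hi in intervals)
    b = min(hi for lo, hi in intervals)
    return (a, b) if a < b else None

# Finite case: the intersection of (-1/n, 1/n) for n = 1..5 is (-1/5, 1/5),
# still an open ball around 0 with radius min{1/n} > 0, as in the proof above.
finite = intersect([(Fraction(-1, n), Fraction(1, n)) for n in range(1, 6)])
assert finite == (Fraction(-1, 5), Fraction(1, 5))

# Infinite collections may fail: the radius min{1/n} tends to 0, and the
# intersection of ALL the intervals (-1/n, 1/n) collapses to the singleton
# {0}, which is not open in R.
for n in (10, 100, 1000):
    lo, hi = intersect([(Fraction(-1, k), Fraction(1, k))
                        for k in range(1, n + 1)])
    assert hi - lo == Fraction(2, n)   # diameter 2/n -> 0
```

The exact rational arithmetic of `Fraction` avoids any floating-point blur in the shrinking radii.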
Our starting point in this chapter was the definition of a metric space. A metric has been defined on a set X as a real-valued function on X×X that satisfies the metric axioms of Definition 3.1. A possible and different approach is to define axiomatically an abstract notion of open sets (instead of an abstract notion of distance as we did in Definition 3.1), and then to build up a theory based on it. Such a "new" beginning goes as follows.

Definition 3.17. A subcollection T of the power set ℘(X) of a set X is a topology on X if it satisfies the following three axioms.

(i) The whole set X and the empty set ∅ belong to T.
(ii) The intersection of a finite collection of sets in T belongs to T.
(iii) The union of an arbitrary collection of sets in T belongs to T.

A set X equipped with a topology T is referred to as a topological space (denoted by (X, T) or simply by X), and the elements of T are called the open
subsets of X with respect to T. Thus a topology T on an underlying set X is always identified with the collection of all open subsets of X: U is open in X with respect to T if and only if U ∈ T.

It is clear (see Theorem 3.15) that every metric space (X, d) is a topological space, where the topology T (the metric topology, that is) is that induced by the metric. This topology T induced by the metric d, and the topological space (X, T) obtained by equipping X with T, are said to be metrized by d. If (X, T) is a topological space and if there is a metric d on X that metrizes T, then the topological space (X, T) and the topology T are called metrizable. The notion of topological space is broader than the notion of metric space. Although every metric space is a topological space, the converse fails: there are topological spaces that are not metrizable.

Example 3.J. Let X be an arbitrary set. Define a function d : X×X → R by

d(x, y) = 0 if x = y,   and   d(x, y) = 1 if x ≠ y,

for every x and y in X. It is readily verified that d is a metric on X. This is the discrete metric on X. A set X equipped with the discrete metric is called a discrete space. In a discrete space every open ball with radius ρ ∈ (0, 1) is a singleton in X: Bρ(x0) = {x0} for every x0 ∈ X and every ρ ∈ (0, 1). Thus, according to Definition 3.9, every subset of X is an open set in the metric space (X, d) equipped with the discrete metric d. That is, the metric topology coincides with the power set of X. Conversely, if X is an arbitrary set, then the collection T = ℘(X) is a topology on X (since T = ℘(X) trivially satisfies the above three axioms), called the discrete topology, which is the largest topology on X (any other topology on X is a subcollection of the discrete topology). Summing up: The discrete topology T = ℘(X) is metrizable by the discrete metric.
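A short computational sketch of Example 3.J (the three-point set and helper names are ours): on a finite set one can check the metric axioms exhaustively and see that every small ball is a singleton, so every subset is open.

```python
# Discrete metric: distance 0 to itself, distance 1 to anything else.
def d(x, y):
    return 0 if x == y else 1

X = {"a", "b", "c"}

# The metric axioms hold: nonnegativity, d(x,y)=0 iff x=y, symmetry,
# and the triangle inequality.
for x in X:
    for y in X:
        assert d(x, y) >= 0 and (d(x, y) == 0) == (x == y)
        assert d(x, y) == d(y, x)
        for z in X:
            assert d(x, z) <= d(x, y) + d(y, z)

# Every open ball of radius rho in (0, 1) is a singleton, so every subset
# of X is a union of balls, i.e., open: the metric topology is all of P(X).
def ball(x0, rho):
    return {x for x in X if d(x, x0) < rho}

assert all(ball(x0, 0.5) == {x0} for x0 in X)
```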
On the other extreme lies the topology T = {∅, X}, called the indiscrete topology, which is the smallest topology on X (it is a subcollection of any other topology on X). If X has more than one point, then the indiscrete topology T = {∅, X} is not metrizable. Indeed, suppose there is a metric d on X that induces the indiscrete topology. Take u in X arbitrary and consider the set X\{u}. Since ∅ ≠ X\{u} ≠ X, it follows that this set is not open (with respect to the indiscrete topology). Thus there exists v ∈ X\{u} with the following property: for every ρ > 0 there exists x ∈ X such that

d(x, v) < ρ   and   x ∈ X\(X\{u}) = {u}.

Hence x = u so that d(u, v) < ρ for every ρ > 0. Therefore u = v (i.e., d(u, v) = 0), which is a contradiction (because v ∈ X\{u}). Conclusion: There is no metric on X that induces the indiscrete topology.
Continuity and convergence in a topological space can be defined as follows. A mapping F : X → Y of a topological space (X, TX) into a topological space (Y, TY) is continuous if F−1(U) ∈ TX for every U ∈ TY. An X-valued sequence {xn} converges in a topological space (X, T) to a limit x ∈ X if it is eventually in every U ∈ T that contains x. Carefully note that, for the particular case of metric spaces (or of metrizable topological spaces), the above definitions of continuity and convergence agree with Definitions 3.6 and 3.4 when the topological spaces are equipped with their metric topology. Indeed, these definitions are the topological-space versions of Theorems 3.12 and 3.14.

Many (but not all) of the theorems in the following sections hold for general topological spaces (metrizable or not), and we shall prove them in a topological-space style (based on open sets rather than on open balls) whenever this is possible and convenient. However, as anticipated in the introduction to this chapter, our attention will focus mainly on metric spaces.
3.4 Equivalent Metrics and Homeomorphisms

Let (X, d1) and (X, d2) be two metric spaces with the same underlying set X. The metrics d1 and d2 are said to be equivalent (or d1 and d2 are equivalent metrics on X — notation: d1 ∼ d2) if they induce the same topology (i.e., a subset of X is open in (X, d1) if and only if it is open in (X, d2)). This notion of equivalence is in fact an equivalence relation on the collection of all metrics defined on a given set X. If T1 and T2 are the metric topologies on X induced by the metrics d1 and d2, respectively, then

d1 ∼ d2   if and only if   T1 = T2.
If T1 ⊆ T2 (i.e., if every open set in (X, d1) is open in (X, d2)), then T2 is said to be stronger than T1. In this case we also say that T1 is weaker than T2. The terms finer and coarser are also used as synonyms for "stronger" and "weaker", respectively. If either T1 ⊆ T2 or T2 ⊆ T1, then T1 and T2 are said to be commensurable. Otherwise (i.e., if neither T1 ⊆ T2 nor T2 ⊆ T1), the topologies are said to be incommensurable.

As we shall see below, if T2 is stronger than T1, then continuity with respect to T1 implies continuity with respect to T2. On the other hand, if T2 is stronger than T1, then convergence with respect to T2 implies convergence with respect to T1. Briefly and roughly: "Strong convergence" implies "weak convergence" but "weak continuity" implies "strong continuity".

Theorem 3.18. Let d1 and d2 be metrics on a set X, and consider the topologies T1 and T2 induced by d1 and d2, respectively. The following assertions are pairwise equivalent.

(a) T2 is stronger than T1 (i.e., T1 ⊆ T2).
(b) Every mapping F : X → Y that is continuous at x0 ∈ X as a mapping of (X, d1) into the metric space (Y, d) is continuous at x0 as a mapping of (X, d2) into (Y, d).

(c) Every X-valued sequence that converges in (X, d2) to a limit x ∈ X converges in (X, d1) to the same limit x.

(d) The identity map of (X, d2) onto (X, d1) is continuous.

Proof. Consider the topologies T1 and T2 on X induced by the metrics d1 and d2 on X. Let T denote the topology on a set Y induced by a metric d on Y.

Proof of (a)⇒(b). If F : (X, d1) → (Y, d) is continuous at x0 ∈ X, then (Lemma 3.11) for every U ∈ T that contains F(x0) there exists U′ ∈ T1 containing x0 such that U′ ⊆ F−1(U). If T1 ⊆ T2, then U′ ∈ T2: the inverse image (under F) of every open neighborhood of F(x0) in T includes an open neighborhood of x0 in T2, which clearly implies that the inverse image (under F) of every neighborhood of F(x0) in T is a neighborhood of x0 in T2. Thus, applying Lemma 3.11 again, F : (X, d2) → (Y, d) is continuous at x0.

Proof of (a)⇒(c). Let {xn} be an X-valued sequence. If xn → x ∈ X in (X, d2), then (Theorem 3.14) {xn} is eventually in every open neighborhood of x in T2. If T1 ⊆ T2 then, in particular, {xn} is eventually in every neighborhood of x in T1. Therefore, applying Theorem 3.14 again, xn → x in (X, d1).

Proof of (b)⇒(d). The identity map I : (X, d1) → (X, d1) of a metric space onto itself is trivially continuous. Thus, by setting (Y, d) = (X, d1) in (b), it follows that (b) implies (d).

Proof of (c)⇒(d). Corollary 3.8 ensures that (c) implies (d).

Proof of (d)⇒(a). According to Theorem 3.12, (d) implies (a) (i.e., if the identity I : (X, d2) → (X, d1) is continuous, then U = I−1(U) is open in T2 whenever U is open in T1, and hence T1 ⊆ T2).

As the discrete topology is the strongest topology on X, the above theorem ensures that any function F : X → Y that is continuous in some topology on X is continuous in the discrete topology.
Actually, since every subset of X is open in the discrete topology, it follows that the inverse image of every subset of Y — no matter which topology equips the set Y — is an open subset of X when X is equipped with the discrete topology. Therefore, every function defined on a discrete topological space is continuous. On the other hand, if an X-valued (infinite) sequence converges in the discrete topology, then it is eventually constant (i.e., it has only a finite number of entries not equal to its limit), and hence it converges in any topology on X.

Corollary 3.19. Let (X, d1) and (X, d2) be metric spaces with the same underlying set X. The following assertions are pairwise equivalent.
(a) d2 and d1 are equivalent metrics on X.

(b) A mapping of X into a set Y is continuous at x0 ∈ X as a mapping of (X, d1) into the metric space (Y, d) if and only if it is continuous at x0 as a mapping of (X, d2) into (Y, d).

(c) An X-valued sequence converges in (X, d1) to x ∈ X if and only if it converges in (X, d2) to x.

(d) The identity map of (X, d1) onto (X, d2) and its inverse (i.e., the identity map of (X, d2) onto (X, d1)) are both continuous.

Proof. Recall that, by definition, two metrics d1 and d2 on a set X are equivalent if the topologies T1 and T2 on X, induced by d1 and d2 respectively, coincide (i.e., if T1 = T2). Now apply Theorem 3.18.

A one-to-one mapping G of a metric space X onto a metric space Y is a homeomorphism if both G: X → Y and G−1 : Y → X are continuous. Equivalently, a homeomorphism between metric spaces is an invertible (i.e., injective and surjective) mapping that is continuous and has a continuous inverse. Thus G is a homeomorphism from X to Y if and only if G−1 is a homeomorphism from Y to X. Two metric spaces are homeomorphic if there exists a homeomorphism between them.

A function F : X → Y of a metric space X into a metric space Y is an open map (or an open mapping) if the image of each open set in X is open in Y (i.e., F(U) is open in Y whenever U is open in X).

Theorem 3.20. Let X and Y be metric spaces. If G: X → Y is invertible, then

(a) G is open if and only if G−1 is continuous,
(b) G is continuous if and only if G−1 is open,
(c) G is a homeomorphism if and only if G and G−1 are both open.

Proof. If G is invertible, then the inverse image of B (B ⊆ Y) under G coincides with the image of B under the inverse of G (tautologically: G−1(B) = G−1(B)). Applying the same argument to the inverse G−1 of G (which is clearly invertible), (G−1)−1(A) = G(A) for each A ⊆ X.
Thus the theorem is a straightforward combination of the definitions of open map and homeomorphism, using the alternative characterization of continuity in Theorem 3.12.

Thus a homeomorphism provides simultaneously a one-to-one correspondence between the underlying sets X and Y (so that X ↔ Y, since a homeomorphism is injective and surjective) and between their topologies (so that TX ↔ TY, since a homeomorphism puts the open sets of TX into a one-to-one correspondence with the open sets of TY). Indeed, if TX and TY are the topologies on X and Y, respectively, then a homeomorphism G: X → Y induces a map Ĝ: TX → TY, defined by Ĝ(U) = G(U) for every U ∈ TX, which is injective and surjective according to Theorem 3.20. Thus any property of a metric
space X expressed entirely in terms of set operations and open sets is also possessed by each metric space homeomorphic to X. We call a property of a metric space a topological property or a topological invariant if, whenever it is true for one metric space, say X, it is true for every metric space homeomorphic to X (trivial examples: the cardinality of the underlying set and the cardinality of the topology).

A map F : X → Y of a metric space X into a metric space Y is a topological embedding of X into Y if it establishes a homeomorphism of X onto its range R(F) (i.e., F : X → Y is a topological embedding of X into Y if F : X → F(X) is a homeomorphism of X onto the subspace F(X) of Y).

Example 3.K. Suppose G: X → Y is a homeomorphism of a metric space X onto a metric space Y. Let A be a subspace of X and consider the subspace G(A) of Y. According to Problem 3.30 the restriction G|A : A → G(A) of G to A onto G(A) is continuous. Similarly, the restriction G−1|G(A) : G(A) → A of the inverse of G to G(A) onto G−1(G(A)) = A is continuous as well. Since G−1|G(A) = (G|A)−1 (Problem 1.8), it follows that G|A : A → G(A) is a homeomorphism, and so A and G(A) are homeomorphic metric spaces (as subspaces of X and Y, respectively). Therefore, the restriction G|A : A → Y of a homeomorphism G: X → Y to any subset A of X is a topological embedding of A into Y.

The notions of homeomorphism, open map, topological invariant, and topological embedding are germane to topological spaces in general (and to metric spaces in particular). For instance, both Theorem 3.20 and Example 3.K can likewise be stated (and proved) in a topological-space setting. In other words, the metric has played no role in the above paragraph, and "metric space" can be replaced with "topological space" there. Next we shall consider a couple of concepts that only make sense in a metric space.
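To make the distinction concrete (the example is ours, not the book's): G(x) = tan x is a homeomorphism of the bounded open interval (−π/2, π/2) onto all of R, so boundedness is not a topological invariant; and, anticipating the metric notions introduced next, G fails to be uniformly continuous even though its inverse arctan is.

```python
import math

# G = tan maps (-pi/2, pi/2) bijectively onto R; G and its inverse atan
# are both continuous, so the two spaces are homeomorphic.
for x in [-1.5, -0.3, 0.0, 0.7, 1.5]:
    assert abs(math.atan(math.tan(x)) - x) < 1e-9       # atan o tan = id
for y in [-1e6, -1.0, 0.0, 2.5, 1e6]:
    assert abs(math.tan(math.atan(y)) - y) <= 1e-6 * max(1.0, abs(y))

# Yet G is NOT uniformly continuous: points a fixed distance delta apart
# are mapped arbitrarily far apart as x approaches pi/2.
delta = 1e-3

def gap(x):
    return math.tan(x + delta) - math.tan(x)

assert gap(0.0) < gap(1.0) < gap(1.5) < gap(math.pi / 2 - 2 * delta)
```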
A homeomorphism G of a metric space (X, dX) onto a metric space (Y, dY) is a uniform homeomorphism if both G and G−1 are uniformly continuous. Two metric spaces are uniformly homeomorphic if there exists a uniform homeomorphism mapping one of them onto the other.

An isometry between metric spaces is a map that preserves distance. Precisely, a mapping J : (X, dX) → (Y, dY) of a metric space (X, dX) into a metric space (Y, dY) is an isometry if

dY(J(x), J(x′)) = dX(x, x′)

for every pair of points x, x′ in X. It is clear that every isometry is an injective contraction, and hence an injective and uniformly continuous mapping. Thus every surjective isometry is a uniform homeomorphism (the inverse of a surjective isometry is again a surjective isometry — trivial example: the identity mapping of a metric space into itself is a surjective isometry on that space). Two metric spaces are isometric (or isometrically equivalent) if there exists a surjective isometry between them, so that two isometrically equivalent metric
spaces are uniformly homeomorphic. It is trivially verified that a composition of surjective isometries is a surjective isometry (transitivity), and this shows that the notion of isometrically equivalent metric spaces deserves its name: it is indeed an equivalence relation on any collection of metric spaces. If two metric spaces are isometrically equivalent, then they can be thought of as being essentially the same metric space — they may differ in the set-theoretic nature of their points but, as far as the metric space (topological) structure is concerned, they are indistinguishable. A surjective isometry not only preserves open sets (for it is a homeomorphism), but it also preserves distance.

Now consider two metric spaces (X, d1) and (X, d2) with the same underlying set X. According to Corollary 3.19 the metrics d1 and d2 are equivalent if and only if the identity map of (X, d1) onto (X, d2) is a homeomorphism (i.e., if and only if I : (X, d1) → (X, d2) and its inverse I−1 : (X, d2) → (X, d1) are both continuous). We say that the metrics d1 and d2 are uniformly equivalent if the identity map of (X, d1) onto (X, d2) is a uniform homeomorphism (i.e., if I : (X, d1) → (X, d2) and its inverse I−1 : (X, d2) → (X, d1) are both uniformly continuous). For instance, if I and I−1 are both Lipschitzian, which means that there exist real numbers α > 0 and β > 0 such that

α d1(x, x′) ≤ d2(x, x′) ≤ β d1(x, x′)

for every x, x′ in X, then the metrics d1 and d2 are uniformly equivalent, and hence equivalent. Thus, if d1 and d2 are equivalent metrics on X, then (X, d1) and (X, d2) are homeomorphic metric spaces. However, the converse fails: there exist uniformly homeomorphic metric spaces with the same underlying set for which the identity is not a homeomorphism.

Example 3.L. Take two metric spaces (X, d1) and (X, d2) with the same underlying set X.
Consider the product spaces (X×X, d) and (X×X, d′), where

d((x, y), (u, v)) = d1(x, u) + d2(y, v),
d′((x, y), (u, v)) = d2(x, u) + d1(y, v),

for all ordered pairs (x, y) and (u, v) in X×X. In other words, (X×X, d) = (X, d1)×(X, d2) and (X×X, d′) = (X, d2)×(X, d1) — see Problem 3.9. Suppose the metrics d1 and d2 on X are not equivalent, so that either the identity map of (X, d1) onto (X, d2) or the identity map of (X, d2) onto (X, d1) (or both) is not continuous. Let I : (X, d1) → (X, d2) be the one that is not continuous. The identity map I : (X×X, d) → (X×X, d′) is not continuous. Indeed, if it were continuous, then its restriction to a subspace of (X×X, d) would be continuous (Problem 3.30). In particular, its restriction to (X, d1) — viewed as a subspace of (X×X, d) = (X, d1)×(X, d2) — would be continuous. But such a restriction is clearly identified with the identity map of (X, d1)
onto (X, d2), which is not continuous. Thus I : (X×X, d) → (X×X, d′) is not continuous, and hence the metrics d and d′ on X×X are not equivalent. Now let J : X×X → X×X be the involution (Problem 1.11) on X×X defined by

J((x, y)) = (y, x)   for every   (x, y) ∈ X×X.
It is easy to show that J : (X×X, d) → (X×X, d′) is a surjective isometry. Thus J : (X×X, d) → (X×X, d′) is a uniform homeomorphism. Summing up: The metric spaces (X×X, d) and (X×X, d′), with the same underlying set X×X, are uniformly homeomorphic (more than that, they are isometrically equivalent), but the metrics d and d′ on X×X are not equivalent.

Since two metric spaces with the same underlying set may be homeomorphic even if the identity between them is not a homeomorphism, a weaker version of Corollary 3.19 is obtained by replacing the homeomorphic identity with an arbitrary homeomorphism. This in fact can be formulated for arbitrary metric spaces (not necessarily with the same underlying set).

Theorem 3.21. Let X and Y be metric spaces and let G be an invertible mapping of X onto Y. The following assertions are pairwise equivalent.

(a) G is a homeomorphism.
(b) A mapping F of X into a metric space Z is continuous if and only if the composition F G−1 : Y → Z is continuous.
(c) An X-valued sequence {xn} converges in X to a limit x ∈ X if and only if the Y-valued sequence {G(xn)} converges in Y to G(x).

Proof. Let G: X → Y be an invertible mapping of a metric space X onto a metric space Y.

Proof of (a)⇒(b). Let F : X → Z be a mapping of X into a metric space Z, and consider the commutative diagram formed by G−1 : Y → X, F : X → Z, and H : Y → Z, so that H = F G−1 : Y → Z. Suppose (a) holds true, and consider the following assertions.

(b1) F : X → Z is continuous.
(b2) F−1(U) is an open set in X whenever U is an open set in Z.
(b3) G(F−1(U)) is an open set in Y whenever U is an open set in Z.
(b4) (F G−1)−1(U) is an open set in Y whenever U is an open set in Z.
(b5) H = F G−1 : Y → Z is continuous.

Theorem 3.12 says that (b1) and (b2) are equivalent. But (b2) holds true if and only if (b3) holds true by Theorem 3.20 (the homeomorphism G: X → Y puts the open sets of X into a one-to-one correspondence with the open sets of Y). Now note that, as G is invertible,

G(F−1(A)) = {G(x) ∈ Y : F(x) ∈ A} = {y ∈ Y : F(G−1(y)) ∈ A} = (F G−1)−1(A)

for every subset A of Z. Thus (b3) is equivalent to (b4), which in turn is equivalent to (b5) (cf. Theorem 3.12 again). Conclusion: (b1)⇔(b5) whenever (a) holds true.

Proof of (b)⇒(a). If (b) holds, then it holds in particular for Z = X and for Z = Y. Thus (b) ensures that the following assertions hold true.
(b′) If a mapping F : X → X of X into itself is continuous, then the mapping H = F G−1 : Y → X is continuous.

(b′′) A mapping F : X → Y of X into Y is continuous whenever the mapping H = F G−1 : Y → Y is continuous.

Since the identity of X onto itself is continuous, (b′) implies that G−1 : Y → X is continuous. By setting F = G in (b′′) it follows that G: X → Y is continuous (because the identity I = GG−1 : Y → Y is continuous). Summing up: (b) implies that both G and G−1 are continuous, which means that (a) holds true.

Proof of (a)⇔(c). According to Corollary 3.8 an invertible mapping G between metric spaces is continuous and has a continuous inverse if and only if both G and G−1 preserve convergence.
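Assertion (c) can be illustrated numerically (the particular map and sequence are ours): G(x) = eˣ is a homeomorphism of R onto (0, ∞) with continuous inverse log, so a sequence converges in R exactly when its image under G converges in (0, ∞).

```python
import math

# x_n = 2 + (-1)^n / (n + 1) converges to x = 2 in R.
x = 2.0
xn = [x + (-1) ** n / (n + 1) for n in range(200)]

# The image sequence G(x_n) = exp(x_n) converges to G(x) = exp(2):
errs = [abs(math.exp(t) - math.exp(x)) for t in xn]
assert errs[-1] < errs[0] and errs[-1] < 0.1

# Conversely, the inverse G^{-1} = log carries the image sequence back,
# recovering x_n (so convergence is preserved in both directions):
assert all(abs(math.log(math.exp(t)) - t) < 1e-12 for t in xn)
```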
3.5 Closed Sets and Closure

A subset V of a metric space X is closed in X if its complement X\V is an open set in X.

Theorem 3.22. If X is a metric space, then

(a) the whole set X and the empty set ∅ are closed,
(b) the union of a finite collection of closed sets is closed,
(c) the intersection of an arbitrary collection of closed sets is closed.

Proof. Apply the De Morgan laws to each item of Theorem 3.15.
Thus the concepts "closed" and "open" are dual to each other (U is open in X if and only if its complement X\U is closed in X, and V is closed in X if and only if its complement X\V is open in X); but they are neither exclusive (a set in a metric space may be both open and closed) nor exhaustive (a set in a metric space may be neither open nor closed).

Theorem 3.23. A map between metric spaces is continuous if and only if the inverse image of each closed set is a closed set.

Proof. Let F : X → Y be a mapping of a metric space X into a metric space Y. Recall that F−1(Y\B) = X\F−1(B) for every subset B of Y (Problem 1.2(b)). Suppose F is continuous and take an arbitrary closed set V in Y. Since Y\V is open in Y, it follows by Theorem 3.12 that F−1(Y\V) is open in X. Thus F−1(V) = X\F−1(Y\V) is closed in X. Therefore, the inverse image under F of an arbitrary closed set V in Y is closed in X. Conversely, suppose the inverse image under F of each closed set in Y is a closed set in X and take an arbitrary open set U in Y. Thus F−1(Y\U) is closed in X (because Y\U is closed in Y) so that F−1(U) = X\F−1(Y\U) is open in X. Conclusion: The inverse image under F of an arbitrary open set U in Y is open in X. Therefore F is continuous by Theorem 3.12.

A function F : X → Y of a metric space X into a metric space Y is a closed map (or a closed mapping) if the image of each closed set in X is closed in Y (i.e., F(V) is closed in Y whenever V is closed in X). In general, a map F : X → Y of a metric space X into a metric space Y may possess any combination of the attributes "continuous", "open", and "closed" (i.e., these are independent concepts). However, if F : X → Y is invertible (i.e., injective and surjective), then it is a closed map if and only if it is an open map.

Theorem 3.24. Let X and Y be metric spaces.
If a map G: X → Y is invertible, then

(a) G is closed if and only if G−1 is continuous,
(b) G is continuous if and only if G−1 is closed,
(c) G is a homeomorphism if and only if G and G−1 are both closed.

Proof. Replace "open map" with "closed map" in the proof of Theorem 3.20 and use Theorem 3.23 instead of Theorem 3.12.

Let A be a set in a metric space X and let VA be the collection of all closed subsets of X that include A:

VA = {V ∈ ℘(X) : V is closed in X and A ⊆ V}.

The whole set X always belongs to VA so that VA is never empty. The intersection of all sets in VA is called the closure of A in X, denoted by A− (i.e., A− = ⋂VA). According to Theorem 3.22(c) it follows that
A− is closed in X   and   A ⊆ A−.

If V ∈ VA, then it is plain that A− = ⋂VA ⊆ V. Thus, with respect to the inclusion ordering of ℘(X), A− is the smallest closed subset of X that includes A, and hence (since A− is closed in X)

A is closed in X   if and only if   A = A−.
From the above displayed results it is readily verified that

∅− = ∅,   X− = X,   (A−)− = A−,

and, if B also is a set in X,

A ⊆ B   implies   A− ⊆ B−.
Moreover, since both A and B are subsets of A ∪ B, we get A− ⊆ (A ∪ B)− and B− ⊆ (A ∪ B)−, so that A− ∪ B− ⊆ (A ∪ B)−. On the other hand, since (A ∪ B)− is the smallest closed subset of X that includes A ∪ B, and since A− ∪ B− is closed (Theorem 3.22(b)) and includes A ∪ B (because A ⊆ A− and B ⊆ B−, so that A ∪ B ⊆ A− ∪ B−), it follows that (A ∪ B)− ⊆ A− ∪ B−. Therefore, if A and B are subsets of X, then

(A ∪ B)− = A− ∪ B−.

It is easy to show by induction that the above identity holds for any finite collection of subsets of X. That is, the closure of the union of a finite collection of subsets of X coincides with the union of their closures. In general (i.e., by allowing infinite collections as well) one has inclusion rather than equality. Indeed, if {Aγ}γ∈Γ is an arbitrary indexed family of subsets of X, then
⋃γ A−γ ⊆ (⋃γ Aγ)−

since Aα ⊆ ⋃γ Aγ, and hence A−α ⊆ (⋃γ Aγ)−, for each index α ∈ Γ. Similarly,

(⋂γ Aγ)− ⊆ ⋂γ A−γ

since ⋂γ Aγ ⊆ ⋂γ A−γ and ⋂γ A−γ is closed in X by Theorem 3.22(c). However, these inclusions are not reversible in general, so that equality does not hold.
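The first inclusion can be strict, as Example 3.M below shows with the sets An = [0, 1 − 1/n]; the following numerical check (the endpoint representation is ours, not the book's) confirms that every point of the union stays strictly below 1 while 1 is nevertheless adherent to the union.

```python
from fractions import Fraction

# A_n = [0, 1 - 1/n] is closed in R; track the right endpoints exactly.
endpoints = [1 - Fraction(1, n) for n in range(1, 1001)]

# Each A_n^- = A_n, so the union of the closures is U A_n = [0, 1):
assert all(e < 1 for e in endpoints)            # 1 is never attained

# Yet 1 is a point of adherence of U A_n: the endpoints strictly increase
# toward 1, so every ball B_eps(1) meets the union; hence (U A_n)^- = [0, 1].
assert all(a < b for a, b in zip(endpoints, endpoints[1:]))
assert 1 - endpoints[-1] == Fraction(1, 1000)   # distance to 1 tends to 0
```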
Example 3.M. Set X = R with its usual metric and consider the following subsets of R: An = [0, 1 − 1/n], which is closed in R for each positive integer n, and A = [0, 1), which is not closed in R. Since

⋃∞n=1 An = A,

it follows that the union of an infinite collection of closed sets is not necessarily closed (cf. Theorem 3.22(b)). Moreover, as A−n = An for each n and A− = [0, 1],

[0, 1) = ⋃∞n=1 A−n ⊂ (⋃∞n=1 An)− = [0, 1],
which is a proper inclusion. If B = [1, 2] (so that B− = B), then ∅ = (A ∩ B)− ⊂ A− ∩ B− = {1}, so that the closure of any (even finite) intersection of sets may be a proper subset of the intersection of their closures.

A point x in X is adherent to A (or an adherent point of A, or a point of adherence of A) if it belongs to the closure A− of A. It is clear that every point of A is an adherent point of A (i.e., A ⊆ A−).

Proposition 3.25. Let A be a subset of a metric space X and let x be a point in X. The following assertions are pairwise equivalent.

(a) x is a point of adherence of A.
(b) Every open set in X that contains x meets A (i.e., if U is open in X and x ∈ U, then A ∩ U ≠ ∅).
(c) Every neighborhood of x contains at least one point of A (which may be x itself).

Proof. Suppose there is an open set U in X containing x for which A ∩ U = ∅. Then A ⊆ X\U, the set X\U is closed in X, and x ∉ X\U. Since A− is the smallest closed subset of X that includes A, it follows that A− ⊆ X\U so that x ∉ A−. Thus the denial of (b) implies the denial of (a), which means that (a) implies (b). Conversely, if x ∉ A−, then x lies in the open set X\A−, which does not meet A (A ∩ (X\A−) = ∅). Therefore, the denial of (a) implies the denial of (b); that is, (b) implies (a). Finally note that (b) is equivalent to (c) as an obvious consequence of the definition of neighborhood.

A point x in X is a point of accumulation (or an accumulation point, or a cluster point) of A if it is a point of adherence of A\{x}. The set of all accumulation points of A is the derived set of A, denoted by A′. Thus x ∈ A′ if and only if x ∈ (A\{x})−. It is clear that every point of accumulation of A is also a point of adherence of A; that is, A′ ⊆ A− (since A\{x} ⊆ A implies (A\{x})− ⊆ A−). Actually,

A− = A ∪ A′.
118
3. Topological Structures
Indeed, since A ⊆ A− and A′ ⊆ A−, it follows that A ∪ A′ ⊆ A−. On the other hand, if x ∉ A ∪ A′, then (A\{x})− = A− (since A\{x} = A whenever x ∉ A), and hence x ∉ A− (because x ∉ A′ so that x ∉ (A\{x})−). Thus if x ∈ A−, then x ∈ A ∪ A′, which means that A− ⊆ A ∪ A′. Hence A− = A ∪ A′. So

A = A−  if and only if  A′ ⊆ A.
That is, A is closed in X if and only if it contains all its accumulation points. It is trivially verified that

A ⊆ B  implies  A′ ⊆ B′

whenever A and B are subsets of X. Also note that A− = ∅ if and only if A = ∅ (for ∅− = ∅ and ∅ ⊆ A ⊆ A−), and A′ = ∅ whenever A = ∅ (because A′ ⊆ A−), but the converse fails (e.g., the derived set of a singleton is empty).

Proposition 3.26. Let A be a subset of a metric space X and let x be a point in X. The following assertions are pairwise equivalent.
(a) x is a point of accumulation of A.
(b) Every open set in X that contains x also contains at least one point of A other than x.
(c) Every neighborhood of x contains at least one point of A distinct from x.

Proof. Since x ∈ X is a point of accumulation of A if and only if it is a point of adherence of A\{x}, it follows by Proposition 3.25 that the assertions (a), (b), and (c) are equivalent (replace A with A\{x} in Proposition 3.25).

Everything that has been written so far in this section pertains to the realm of topological spaces (metrizable or not). However, the following results are typical of metric spaces.

Proposition 3.27. Let A be a subset of a metric space (X, d) and let x be a point in X. The following assertions are pairwise equivalent.
(a) x is a point of adherence of A.
(b) Every nonempty open ball centered at x meets A.
(c) A ≠ ∅ and d(x, A) = 0.
(d) There exists an A-valued sequence that converges to x in (X, d).

Proof. The equivalence (a)⇔(b) follows by Proposition 3.25 (recall: every nonempty open ball centered at x is a neighborhood of x and, conversely, every neighborhood of x includes a nonempty open ball centered at x, so that every nonempty open ball centered at x meets A if and only if every neighborhood of x meets A). Clearly (b)⇔(c) (i.e., for each ε > 0 there exists a ∈ A such that d(x, a) < ε if and only if A ≠ ∅ and inf_{a∈A} d(x, a) = 0). Theorem 3.14
3.5 Closed Sets and Closure
119
ensures that (d)⇒(b). On the other hand, if (b) holds true, then for each positive integer n the open ball B1/n(x) meets A (i.e., B1/n(x) ∩ A ≠ ∅). Take xn ∈ B1/n(x) ∩ A so that xn ∈ A and 0 ≤ d(xn, x) < 1/n for each n. Thus {xn} is an A-valued sequence such that d(xn, x) → 0. Therefore (b)⇒(d).

Proposition 3.28. Let A be a subset of a metric space (X, d) and let x be a point in X. The following assertions are pairwise equivalent.
(a) x is a point of accumulation of A.
(b) Every nonempty open ball centered at x contains a point of A distinct from x.
(c) Every nonempty open ball centered at x contains infinitely many points of A.
(d) There exists an A\{x}-valued sequence of pairwise distinct points that converges to x in (X, d).

Proof. (d)⇒(c) by Theorem 3.14, (c)⇒(b) trivially, and (d)⇒(a)⇒(b) by the previous proposition. To complete the proof, it remains to show that (b)⇒(d). Let Bε(x) be an open ball centered at x ∈ X with radius ε > 0. We shall say that an A-valued sequence {xk}k∈N has Property Pn, for some integer n ∈ N, if xk is in B1/k(x)\{x} for each k = 1, ..., n+1 and if d(xk+1, x) < d(xk, x) for every k = 1, ..., n.

Claim. If assertion (b) holds true, then there exists an A-valued sequence that has Property Pn for every n ∈ N.

Proof. Suppose assertion (b) holds true so that (Bε(x)\{x}) ∩ A ≠ ∅ for every ε > 0. Now take an arbitrary x1 in (B1(x)\{x}) ∩ A and an arbitrary x2 in (Bε2(x)\{x}) ∩ A with ε2 = min{1/2, d(x1, x)}. Every A-valued sequence whose first two entries coincide with x1 and x2 has Property P1. Suppose there exists an A-valued sequence that has Property Pn for some integer n ∈ N. Take any point from (Bεn+2(x)\{x}) ∩ A, where εn+2 = min{1/(n+2), d(xn+1, x)}, and replace the (n+2)th entry of that sequence with this point. The resulting sequence has Property Pn+1.

Thus there exists an A-valued sequence that has Property Pn+1 whenever there exists one that has Property Pn, and this concludes the proof by induction. However, an A-valued sequence {xk}k∈N that has Property Pn for every n ∈ N in fact is an A\{x}-valued sequence of pairwise distinct points such that 0 < d(xk, x) < 1/k for every k ∈ N. Therefore (b)⇒(d).

Recall that "point of adherence" and "point of accumulation" are concepts defined for sets, while "limit of a convergent sequence" is, of course, a concept defined for sequences. But the range of a sequence is a set, and it can have (many) accumulation points. Let (X, d) be a metric space and let {xn} be an X-valued sequence. A point x in X is a cluster point of the sequence {xn} if
some subsequence of {xn} converges to x. The cluster points of a sequence are precisely the accumulation points of its range (Proposition 3.28). If a sequence is convergent, then (Proposition 3.5) its range has only one point of accumulation, which coincides with the unique limit of the sequence.

Corollary 3.29. The derived set A′ of every subset A of a metric space (X, d) is closed in (X, d).

Proof. Let A be an arbitrary subset of a metric space (X, d). We want to show that (A′)− = A′ (i.e., A′ is closed) or, equivalently, (A′)− ⊆ A′ (recall: every set is included in its closure). If A′ is empty, then the result is trivially verified (∅− = ∅). Thus suppose A′ is nonempty. Take an arbitrary x″ in (A′)− and an arbitrary ε > 0. Proposition 3.27 ensures that Bε(x″) ∩ A′ ≠ ∅. Take x′ in Bε(x″) ∩ A′ and set δ = ε − d(x′, x″). Note that 0 < δ ≤ ε (because 0 ≤ d(x′, x″) < ε). Since x′ ∈ A′, we get by Proposition 3.28 that Bδ(x′) ∩ A contains infinitely many points. Take x in Bδ(x′) ∩ A distinct from x″ and from x′. Thus

0 < d(x, x″) ≤ d(x, x′) + d(x′, x″) < δ + d(x′, x″) = ε

by the triangle inequality. Therefore x ∈ Bε(x″) and x ≠ x″. Conclusion: Every nonempty ball Bε(x″) centered at x″ contains a point x of A other than x″. Thus x″ ∈ A′ by Proposition 3.28, and so (A′)− ⊆ A′.

The preceding corollary does not hold in a general topological space. Indeed, if a set X containing more than one point is equipped with the indiscrete topology (where the only open sets are ∅ and X), then the derived set {x}′ of a singleton {x} is X\{x}, which is not closed in that topology.

Theorem 3.30. (The Closed Set Theorem). A subset A of a set X is closed in the metric space (X, d) if and only if every A-valued sequence that converges in (X, d) has its limit in A.

Proof. (a) Take an arbitrary A-valued sequence {xn} that converges to x ∈ X in (X, d). By Theorem 3.14 {xn} is eventually in every neighborhood of x, and hence every neighborhood of x contains a point of A. Thus x is a point of adherence of A (Proposition 3.25); that is, x ∈ A−. If A = A− (equivalently, if A is closed in (X, d)), then x ∈ A.

(b) Conversely, take an arbitrary point x ∈ A− (i.e., an arbitrary point of adherence of A). According to Proposition 3.27, there exists an A-valued sequence that converges to x in (X, d). If every A-valued sequence that converges in (X, d) has its limit in A, then x ∈ A. Thus A− ⊆ A, and hence A = A− (since A ⊆ A− for every set A). That is, A is closed in (X, d).

This is a particularly useful result that will often be applied throughout this book. Part (a) of the proof holds for general topological spaces but not part (b). The counterpart of the above theorem for general (not necessarily metrizable) topological spaces is stated in terms of nets (instead of sequences).
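The Closed Set Theorem lends itself to a quick numerical illustration. The following Python sketch (an addition here, not part of the book's formal development) takes X = R with its usual metric and the A-valued sequence xn = 1/n, which converges to 0 in R; whether the limit stays in A is exactly what distinguishes the closed set [0, 1] from the non-closed (0, 1].

```python
# Numerical sketch of Theorem 3.30 in R with the usual metric.
# The sequence x_n = 1/n takes values in both [0, 1] and (0, 1] and
# converges to 0 in R; only the closed set contains the limit.

def in_closed_unit(x):      # A = [0, 1], closed in R
    return 0.0 <= x <= 1.0

def in_half_open(x):        # A = (0, 1], not closed in R
    return 0.0 < x <= 1.0

seq = [1.0 / n for n in range(1, 10001)]
limit = 0.0

# Every term lies in both sets...
assert all(in_closed_unit(x) and in_half_open(x) for x in seq)
# ...and the sequence converges to 0 in R,
assert abs(seq[-1] - limit) < 1e-3
# but only the closed set contains the limit.
assert in_closed_unit(limit)
assert not in_half_open(limit)
```

So (0, 1] fails the criterion of Theorem 3.30 and hence is not closed in R, while this particular sequence is consistent with [0, 1] being closed.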
Example 3.N. Consider the set B[X, Y] of all bounded mappings of a metric space (X, dX) into a metric space (Y, dY), and let BC[X, Y] denote the subset of B[X, Y] consisting of all bounded continuous mappings of (X, dX) into (Y, dY). Equip B[X, Y] with the sup-metric d∞ as in Example 3.C. We shall use the Closed Set Theorem to show that BC[X, Y] is closed in (B[X, Y], d∞). Take any BC[X, Y]-valued sequence {fn} that converges in (B[X, Y], d∞) to a mapping f ∈ B[X, Y]. The triangle inequality in (Y, dY) ensures that

dY(f(u), f(v)) ≤ dY(f(u), fn(u)) + dY(fn(u), fn(v)) + dY(fn(v), f(v))

for each integer n and every u, v ∈ X. Take an arbitrary real number ε > 0. Since fn → f in (B[X, Y], d∞), it follows that there exists a positive integer nε such that d∞(fn, f) = sup_{x∈X} dY(fn(x), f(x)) < ε/3, and so dY(fn(x), f(x)) < ε/3 for all x ∈ X, whenever n ≥ nε (uniform convergence — see Example 3.G). Since fnε is continuous, it follows that there exists a real number δε > 0 (which may depend on v) such that dY(fnε(u), fnε(v)) < ε/3 whenever dX(u, v) < δε. Therefore dY(f(u), f(v)) < ε whenever dX(u, v) < δε, so that f is continuous. That is, f ∈ BC[X, Y]. Thus, according to Theorem 3.30, BC[X, Y] is a closed subset of the metric space (B[X, Y], d∞). Particular case (see Examples 3.D and 3.G): C[0, 1] is closed in (B[0, 1], d∞).
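Uniform convergence in the sup-metric can also be checked numerically. The sketch below (an illustrative addition, with the sup-metric approximated on a finite grid, so only an assumption-laden stand-in for the true supremum over X) uses the continuous functions fn(x) = √(x² + 1/n), which converge uniformly on [−1, 1] to the continuous limit f(x) = |x|, with d∞(fn, f) ≤ 1/√n.

```python
import math

# Numerical sketch of uniform convergence in the sup-metric d_infinity of
# Example 3.C, approximated on a finite grid over [-1, 1].
# f_n(x) = sqrt(x^2 + 1/n) is continuous and converges uniformly to |x|.

def d_inf(f, g, grid):
    # sup-metric approximated on a finite grid (an approximation, not the
    # exact supremum over all of [-1, 1])
    return max(abs(f(x) - g(x)) for x in grid)

grid = [k / 1000.0 for k in range(-1000, 1001)]
f = abs

for n in (1, 10, 100, 10000):
    fn = lambda x, n=n: math.sqrt(x * x + 1.0 / n)
    # 0 <= f_n(x) - |x| = (1/n)/(sqrt(x^2 + 1/n) + |x|) <= 1/sqrt(n)
    assert d_inf(fn, f, grid) <= 1.0 / math.sqrt(n) + 1e-12
```

Consistently with Example 3.N, the uniform limit |x| is again continuous.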
3.6 Dense Sets and Separable Spaces

Let A be a set in a metric space X, and let 𝒰A be the collection of all open subsets of X included in A:

𝒰A = {U ∈ ℘(X): U is open in X and U ⊆ A}.

The empty set ∅ of X always belongs to 𝒰A so that 𝒰A is never empty. The union of all sets in 𝒰A is called the interior of A in X, denoted by A◦ (i.e., A◦ = ⋃𝒰A). According to Theorem 3.15(c), it follows that A◦ is open in X and A◦ ⊆ A. If U ∈ 𝒰A, then it is plain that U ⊆ ⋃𝒰A = A◦. Thus, with respect to the inclusion ordering of ℘(X),

A◦ is the largest open subset of X that is included in A,

and hence (since A◦ is open in X)

A is open in X  if and only if  A◦ = A.

From the above displayed results it is readily verified that

∅◦ = ∅,  X◦ = X,  (A◦)◦ = A◦
and, if B also is a set in X,

A ⊆ B  implies  A◦ ⊆ B◦.

Moreover, since A ∩ B is a subset of both A and B, we get (A ∩ B)◦ ⊆ A◦ ∩ B◦. On the other hand, since (A ∩ B)◦ is the largest open subset of X that is included in A ∩ B, and since A◦ ∩ B◦ is open (Theorem 3.15(b)) and is included in A ∩ B (because A◦ ⊆ A and B◦ ⊆ B so that A◦ ∩ B◦ ⊆ A ∩ B), it follows that A◦ ∩ B◦ ⊆ (A ∩ B)◦. Therefore, if A and B are subsets of X, then

(A ∩ B)◦ = A◦ ∩ B◦.

It is shown by induction that the above identity holds for any finite collection of subsets of X. That is, the interior of the intersection of a finite collection of subsets of X coincides with the intersection of their interiors. In general (i.e., by allowing infinite collections as well) one has inclusion rather than equality. Indeed, if {Aγ}γ∈Γ is an arbitrary indexed family of subsets of X, then

(⋂γ Aγ)◦ ⊆ ⋂γ Aγ◦

since ⋂γ Aγ ⊆ Aα and hence (⋂γ Aγ)◦ ⊆ Aα◦ for each index α ∈ Γ. Similarly,

⋃γ Aγ◦ ⊆ (⋃γ Aγ)◦

since ⋃γ Aγ◦ ⊆ ⋃γ Aγ and ⋃γ Aγ◦ is open in X by Theorem 3.15(c). However, these inclusions are not reversible in general, so that equality does not hold.

Example 3.O. This is the dual of Example 3.M. Consider the setup of Example 3.M and set Cn = X\An, which is open in R for each positive integer n, and C = X\A, which is not open in R. Since

⋂_{n=1}^∞ Cn = ⋂_{n=1}^∞ (X\An) = X\⋃_{n=1}^∞ An = X\A = C,

it follows that the intersection of an infinite collection of open sets is not necessarily open (see Theorem 3.15(b)). Moreover, as Cn◦ = Cn for each n,

(X\A)◦ = C◦ = (⋂_{n=1}^∞ Cn)◦ ⊂ ⋂_{n=1}^∞ Cn◦ = C = X\A,

which is a proper inclusion. Now set D = X\B = (−∞, 1) ∪ (2, ∞) (so that D◦ = D). Thus C◦ ∪ D◦ is a proper subset of (C ∪ D)◦:

R\{1} = C◦ ∪ D◦ ⊂ (C ∪ D)◦ = R.
Remark: The duality between "interior" and "closure" is clear:

(X\A)− = X\A◦  and  (X\A)◦ = X\A−
for every A ⊆ X. Indeed, U ∈ 𝒰A if and only if X\U ∈ 𝒱X\A (i.e., U is open in X and U ⊆ A if and only if X\U is closed in X and X\A ⊆ X\U) and, dually, V ∈ 𝒱X\A if and only if X\V ∈ 𝒰A. Thus A◦ = ⋃_{U∈𝒰A} U = X\⋂_{U∈𝒰A} (X\U) = X\⋂_{V∈𝒱X\A} V = X\(X\A)−, and so X\A◦ = (X\A)−, which implies (swap A and X\A) that X\(X\A)◦ = A− and hence (X\A)◦ = X\A−. This confirms the above identities and also their equivalent forms:

A◦ = X\(X\A)−  and  A− = X\(X\A)◦.

Thus it is easy to show that A−\(X\A) = A and A\(X\A◦) = A◦.

A point x ∈ X is an interior point of A if it belongs to the interior A◦ of A. It is clear that every interior point of A is a point of A (i.e., A◦ ⊆ A), and it is readily verified that x ∈ A is an interior point of A if and only if there exists a neighborhood of x included in A (reason: A◦ is the largest open neighborhood of every interior point of A that is included in A). The interior of the complement of A, (X\A)◦, is called the exterior of A, and a point x ∈ X is an exterior point of A if it belongs to the exterior (X\A)◦ of A.

A subset A of a metric space X is called dense in X (or dense everywhere) if its closure A− coincides with X (i.e., if A− = X). More generally, suppose A and B are subsets of a metric space X such that A ⊆ B. A is dense in B if B ⊆ A− or, equivalently, if A− = B− (why?). Clearly, if A ⊆ B and A− = X, then B− = X. Note that the only closed set dense in X is X itself.

Proposition 3.31. Let A be a subset of a metric space X. The following assertions are pairwise equivalent.
(a) A− = X (i.e., A is dense in X).
(b) Every nonempty open subset of X meets A.
(c) 𝒱A = {X}.
(d) (X\A)◦ = ∅ (i.e., the complement of A has empty interior).

Proof. Take any nonempty open subset U of X, and take an arbitrary u in U ⊆ X. If (a) holds true, then every point of X is adherent to A. In particular, u is adherent to A. Thus Proposition 3.25 ensures that U meets A. Conclusion: (a)⇒(b). Now take an arbitrary proper closed subset V of X so that ∅ ≠ X\V is open in X. If (b) holds true, then (X\V) ∩ A ≠ ∅. Thus V does not include A, and so V ∉ 𝒱A. Hence (b)⇒(c). Since A− ∈ 𝒱A, it follows that (c)⇒(a). The equivalence (a)⇔(d) is obvious from the identity A− = X\(X\A)◦.

The reader has probably observed that the concepts and results so far in this section apply to topological spaces in general. From now on the metric will play its role.
Note that a point in a subset A of a metric space X is an interior point of A if and only if it is the center of a nonempty open ball
included in A (reason: every nonempty open ball is a neighborhood and every neighborhood includes a nonempty open ball). We shall say that (A, d) is a dense subspace of a metric space (X, d) if the subset A of X is dense in (X, d).

Proposition 3.32. Let (X, d) be a metric space and let A and B be subsets of X such that ∅ ≠ A ⊆ B ⊆ X. The following assertions are pairwise equivalent.
(a) A− = B− (i.e., A is dense in B).
(b) Every nonempty open ball centered at any point b of B meets A.
(c) inf_{a∈A} d(b, a) = 0 for every b ∈ B.
(d) For every point b in B there exists an A-valued sequence {an} that converges in (X, d) to b.

Proof. Recall that A− = B− if and only if B ⊆ A−. Let b be an arbitrary point in B. Thus assertion (a) can be rewritten as follows.

(a′) Every point b in B is a point of adherence of A.

Now notice that assertions (a′), (b), (c), and (d) are pairwise equivalent by Proposition 3.27.

Corollary 3.33. Let F and G be continuous mappings of a metric space X into a metric space Y. If F and G coincide on a dense subset of X, then they coincide on the whole space X.

Proof. Suppose X is nonempty to avoid trivialities. Let A be a nonempty dense subset of X. Take an arbitrary x ∈ X and let {an} be an A-valued sequence that converges in X to x (whose existence is ensured by Proposition 3.32). If F: X → Y and G: X → Y are continuous mappings such that F|A = G|A, then

F(x) = F(lim an) = lim F(an) = lim G(an) = G(lim an) = G(x)

(Corollary 3.8). Thus F(x) = G(x) for every x ∈ X; that is, F = G.
A metric space (X, d) is separable if there exists a countable dense set in X. The density criteria in Proposition 3.32 (with B = X) are particularly useful to check separability.

Example 3.P. Take an arbitrary integer n ≥ 1, an arbitrary real p ≥ 1, and consider the metric space (Rn, dp) of Example 3.A. Since the set of all rational numbers Q is dense in the real line R equipped with its usual metric, it follows that Qn (the set of all rational n-tuples) is dense in (Rn, dp). Indeed, Q− = R implies that inf_{υ∈Q} |ξ − υ| = 0 for every ξ ∈ R, which in turn implies that

inf_{y∈Qn} dp(x, y) = inf_{y=(υ1,...,υn)∈Qn} (∑_{i=1}^{n} |ξi − υi|^p)^{1/p} = 0

for every vector x = (ξ1, ..., ξn) in Rn. Hence (Qn)− = Rn according to Proposition 3.32. Moreover, since #Qn = #Q = ℵ0 (Problems 1.25(c) and 2.8), it follows that Qn is countably infinite. Thus Qn is a countable dense subset of (Rn, dp), and so
(Rn, dp) is a separable metric space.

Now consider the metric space (ℓp+, dp) for any p ≥ 1 as in Example 3.B, where ℓp+ is the set of all real-valued p-summable infinite sequences. Let ℓ0+ be the subset of ℓp+ made up of all real-valued infinite sequences with a finite number of nonzero entries, and let X be the subset of ℓ0+ consisting of all rational-valued infinite sequences with a finite number of nonzero entries. The set ℓ0+ is dense in (ℓp+, dp) — Problem 3.44(b). Since Q− = R, it follows that X is dense in (ℓ0+, dp) — the proof is essentially the same as the proof that Qn is dense in (Rn, dp). Thus X− = (ℓ0+)− = ℓp+, and so X is dense in (ℓp+, dp). Next we show that X is countably infinite. In fact, X is a linear space over the rational field Q and dim X = ℵ0 (see Example 2.J). Thus #X = max{#Q, dim X} = ℵ0 by Problem 2.8. Conclusion: X is a countable dense subset of (ℓp+, dp), and so (ℓp+, dp) is a separable metric space.

The same argument is readily extended to complex spaces so that (Cn, dp) also is separable, as well as (ℓp+, dp) when ℓp+ is made up of all complex-valued p-summable infinite sequences. Finally we show that (see Example 3.D) (C[0, 1], d∞) is a separable metric space. Actually, the set P[0, 1] of all polynomials on [0, 1] is dense in (C[0, 1], d∞). This is the well-known Weierstrass Theorem, which says that every continuous function in C[0, 1] is the uniform limit of a sequence of polynomials in P[0, 1] (i.e., for every x ∈ C[0, 1] there exists a P[0, 1]-valued sequence {pn} such that d∞(pn, x) → 0). Moreover, it is easy to show that the set X of all polynomials on [0, 1] with rational coefficients is dense in (P[0, 1], d∞), and so X is dense in (C[0, 1], d∞). Since X is a linear space over the rational field Q, and since dim X = ℵ0 (essentially the same proof as in Example 2.M), we get by Problem 2.8 that X is countable. Thus X is a countable dense subset of (C[0, 1], d∞).

A collection B of open subsets of a metric space X is a base (or a topological base) for X if every open set in X is the union of some subcollection of B. For instance, the collection of all open balls in a metric space (including the empty ball) is a base for X (cf. Corollary 3.16). Note that the above definition forces the empty set ∅ of X to be a member of any base for X if the subcollection is required to be nonempty.

Proposition 3.34. Let B be a collection of open subsets of a metric space X that contains the empty set. The following assertions are pairwise equivalent.
(a) B is a base for X.
(b) For every nonempty open subset U of X and every point x in U there exists a set B in B such that x ∈ B ⊆ U.
(c) For every x in X and every neighborhood N of x there exists a set B in B such that x ∈ B ⊆ N.
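The density of Qn in (Rn, dp) can be made concrete. The sketch below (an illustrative addition, not part of the book) picks, for a given x ∈ R³ and ε > 0, rational coordinates close enough to each entry that the dp-distance falls below ε, in the spirit of Proposition 3.32(c).

```python
import math
from fractions import Fraction

# Numerical sketch: Q^n is dense in (R^n, d_p). Given x in R^n and eps > 0,
# choose rational coordinates within eps/n of each entry; then d_p(x, y) < eps.

def dp(x, y, p):
    # the metric d_p of Example 3.A
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x = (math.pi, math.e, math.sqrt(2))   # a vector in R^3
p, eps = 2, 1e-6

# rational approximants with denominators large enough for the tolerance
y = tuple(Fraction(c).limit_denominator(10**9) for c in x)
assert all(abs(c - float(q)) < eps / len(x) for c, q in zip(x, y))
assert dp(x, tuple(float(q) for q in y), p) < eps
```

Shrinking ε produces rational n-tuples arbitrarily dp-close to x, which is exactly the criterion inf_{y∈Qn} dp(x, y) = 0 of Example 3.P.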
Proof. Take an arbitrary open subset U of the metric space X and set BU = {B ∈ B: B ⊆ U}. If B is a base for X, then U = ⋃BU by the definition of base. Thus, if x ∈ U, then x ∈ B for some B ∈ BU so that x ∈ B ⊆ U. That is, (a) implies (b). On the other hand, if (b) holds, then any open subset U of X clearly coincides with ⋃BU, which shows that (a) holds true. Finally note that (b) and (c) are trivially equivalent: every neighborhood of x includes an open set containing x, and every open set containing x is a neighborhood of x.

Theorem 3.35. A metric space is separable if and only if it has a countable base.

Proof. Suppose B = {Bn} is a countable base for X. Consider a set {bn} with each bn taken from each nonempty set Bn in B. Proposition 3.34(b) ensures that for every nonempty open subset U of X there exists a set Bn such that Bn ⊆ U, and therefore U ∩ {bn} ≠ ∅. Thus Proposition 3.31(b) says that the countable set {bn} is dense in X, and so X is separable. On the other hand, suppose X is separable, which means that there is a countable subset A of X that is dense in X. Consider the collection B = {B1/n(a): n ∈ N and a ∈ A} of nonempty open balls, which is a double indexed family, indexed by N×A (i.e., by two countable sets), and thus a countable collection itself. In other words, #B = #(N×A) = max{#N, #A} = #N — cf. Problem 1.30(b).

Claim. For every x ∈ X and every neighborhood N of x there exists a ball in B containing x and included in N.

Proof. Take an arbitrary x ∈ X and an arbitrary neighborhood N of x. Let Bε(x) be an open ball of radius ε > 0, centered at x, and included in N. Take a positive integer n such that 1/n < ε/2 and a point a ∈ A such that a ∈ B1/n(x) (recall: since A− = X, it follows by Proposition 3.32(b) that there exists a ∈ A such that a ∈ Bρ(x) for every x ∈ X and every ρ > 0). Obviously, x ∈ B1/n(a). Moreover, if y ∈ B1/n(a), then d(x, y) ≤ d(x, a) + d(a, y) < 2/n < ε so that y ∈ Bε(x), and hence B1/n(a) ⊆ Bε(x).

Thus x ∈ B1/n(a) ⊆ Bε(x) ⊆ N. Therefore the countable collection B ∪ {∅} of open balls is a base for X by Proposition 3.34(c).

Corollary 3.36. Every subspace of a separable metric space is itself separable.

Proof. Let S be a subspace of a separable metric space X and, according to Theorem 3.35, let B be a countable base for X. Set BS = {S ∩ B: B ∈ B}, which is a countable collection of subsets of S. Since the sets in B are open subsets of X, it follows that the sets in BS are open relative to S (see Problem 3.38(c)). Take an arbitrary nonempty relatively open subset A of S so that A = S ∩ U for some open subset U of X (Problem 3.38(c)). Since U = ⋃B′ for some subcollection B′ of B, it follows that A = S ∩ ⋃_{B∈B′} B = ⋃_{B∈B′} (S ∩ B) = ⋃B′S, where B′S = {S ∩ B: B ∈ B′} is a subcollection of BS. Thus BS is a base for S. Therefore the subspace S has a countable base, which means by the previous theorem that S is separable.
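The Claim in the proof of Theorem 3.35 is constructive, and can be traced numerically. The sketch below (an illustrative addition, not part of the book) works in X = R with the countable dense set A = Q: given x and ε > 0, it picks n with 1/n < ε/2 and a rational a within 1/n of x, so that x ∈ B1/n(a) ⊆ Bε(x).

```python
import math
from fractions import Fraction

# Numerical sketch of the Claim in the proof of Theorem 3.35 for X = R, A = Q.

def ball_in_ball(x, eps):
    n = int(2.0 / eps) + 2                      # guarantees 1/n < eps/2
    a = Fraction(x).limit_denominator(10 * n)   # rational with |x - a| < 1/n
    assert abs(x - float(a)) < 1.0 / n          # x lies in B_{1/n}(a)
    # triangle inequality: y in B_{1/n}(a) gives |y - x| < 2/n < eps,
    # so B_{1/n}(a) is included in B_eps(x)
    assert 2.0 / n < eps
    return n, a

for x in (math.pi, -math.sqrt(2), 0.0):
    for eps in (1.0, 0.1, 1e-5):
        ball_in_ball(x, eps)
```

Every neighborhood of every real x thus contains a ball B1/n(a) from the countable family indexed by N×Q, which is the countable base of Theorem 3.35.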
Let A be a subset of a metric space. An isolated point of A is a point in A that is not an accumulation point of A. That is, a point x is an isolated point of A if x ∈ A\A′.

Proposition 3.37. Let A be a subset of a metric space X and let x be a point in A. The following assertions are pairwise equivalent.
(a) x is an isolated point of A.
(b) There exists an open set U in X such that A ∩ U = {x}.
(c) There exists a neighborhood N of x such that A ∩ N = {x}.
(d) There exists an open ball Bρ(x) centered at x such that A ∩ Bρ(x) = {x}.

Proof. Assertion (a) is equivalent to assertion (b) by Proposition 3.26. Assertions (b), (c), and (d) are trivially pairwise equivalent.

A subset A of X consisting entirely of isolated points is a discrete subset of X. This means that in the subspace A every set is open, and hence the subspace A is homeomorphic to a discrete space (i.e., to a metric space equipped with the discrete metric). According to Theorem 3.35 and Corollary 3.36, a discrete subset of a separable metric space is countable. Thus, if a metric space has an uncountable discrete subset, then it is not separable.

Example 3.Q. Let S be a set, let (Y, d) be a metric space, and consider the metric space (B[S, Y], d∞) of all bounded mappings of S into (Y, d) equipped with the sup-metric d∞ (Example 3.C). Suppose Y has more than one point, and let y0 and y1 be two distinct points in Y. As usual, let 2S denote the set of all mappings on S with values either y0 or y1 (i.e., the set of all mappings of S into {y0, y1} so that 2S = {y0, y1}S ⊆ B[S, Y]). If f, g ∈ 2S and f ≠ g (i.e., if f and g are two distinct mappings on S with values either y0 or y1), then

d∞(f, g) = sup_{s∈S} d(f(s), g(s)) = d(y0, y1) > 0.

Therefore, any open ball Bρ(g) = {f ∈ 2S: d∞(f, g) < ρ} centered at an arbitrary point g of 2S with radius ρ = d(y0, y1)/2 is such that 2S ∩ Bρ(g) = {g}. This means that every point of 2S is an isolated point of it, and hence 2S is a discrete set in (B[S, Y], d∞). If S is an infinite set, then 2S is an uncountable subset of B[S, Y] (recall: if S is infinite, then ℵ0 ≤ #S < #2S by Theorems 1.4 and 1.5). Thus (B[S, Y], d∞) is not separable whenever 2 ≤ #Y and ℵ0 ≤ #S. Concrete example:

(ℓ∞+, d∞) is not a separable metric space.

Indeed, set S = N and Y = C (or Y = R) with its usual metric d, so that (B[S, Y], d∞) = (ℓ∞+, d∞): the set of all scalar-valued bounded sequences
equipped with the sup-metric, as introduced in Example 3.B. The set 2N, consisting of all sequences with values either 0 or 1, is an uncountable discrete subset of (ℓ∞+, d∞).

In a discrete subset every point is isolated. The opposite notion is that of a set where no point is isolated. A subset A of a metric space X is dense in itself if A has no isolated point or, equivalently, if every point in A is an accumulation point of A; that is, if A ⊆ A′. Since A− = A ∪ A′ for every subset A of X, it follows that a set A is dense in itself if and only if A′ = A−. A subset A of X that is both closed in X and dense in itself (i.e., such that A′ = A) is a perfect set: a closed set without isolated points. For instance, Q ∩ [0, 1] is a countable perfect subset of the metric space Q, but it is not perfect in the metric space R (since it is not closed in R). As a matter of fact, every nonempty perfect subset of R is uncountable because R is a "complete" metric space, a concept that we shall define next.
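The discreteness of 2S in Example 3.Q can be checked directly on a finite index set. The sketch below (an illustrative addition, not part of the book; a finite S is only a stand-in, since uncountability of course requires S infinite) takes Y = R with y0 = 0 and y1 = 1, so that any two distinct members of 2S sit at sup-distance exactly d(y0, y1) = 1 from each other.

```python
from itertools import product

# Numerical sketch of Example 3.Q: distinct {0,1}-valued mappings on a
# (here finite) index set S are at sup-distance 1, so each one is isolated
# in (B[S, R], d_inf): the ball of radius 1/2 about g meets 2^S only in g.

def d_inf(f, g):
    # sup-metric over the finite index set (mappings stored as tuples)
    return max(abs(a - b) for a, b in zip(f, g))

S = range(4)                                  # finite stand-in for S
maps = list(product((0, 1), repeat=len(S)))   # all of 2^S

for f in maps:
    for g in maps:
        if f != g:
            assert d_inf(f, g) == 1           # = d(y0, y1)
```

With S = N the same computation of d∞(f, g) = 1 goes through for any two distinct 0-1 sequences, and since 2N is uncountable, (ℓ∞+, d∞) cannot be separable.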
3.7 Complete Spaces

Consider the metric space (R, d), where d denotes the usual metric on the real line R, and let (A, d) be the subspace of (R, d) with A = (0, 1]. Let {αn}n∈N be the A-valued sequence such that αn = 1/n for each n ∈ N. Does {αn} converge in the metric space (A, d)? It is clear that {αn} converges to 0 in (R, d), and hence we might at first glance think that it also converges in (A, d). But the point 0 simply does not exist in A, so that it is nonsense to say that "αn → 0 in (A, d)". In fact, {αn} does not converge in the metric space (A, d). However, the sequence {αn} seems to possess a "special property" that makes it apparently convergent in spite of the particular underlying set A, and the metric space (A, d) in turn seems to bear a "peculiar characteristic" that makes such a sequence fail to converge in it. The "special property" of the sequence {αn} is that it is a Cauchy sequence in (A, d), and the "peculiar characteristic" of the metric space (A, d) is that it is not complete.

Definition 3.38. Let (X, d) be a metric space. An X-valued sequence {xn} (indexed either by N or by N0) is a Cauchy sequence in (X, d) (or satisfies the Cauchy criterion) if for each real number ε > 0 there exists a positive integer nε such that n, m ≥ nε implies d(xm, xn) < ε.

A usual notation for the Cauchy criterion is lim_{m,n} d(xm, xn) = 0. Equivalently, an X-valued sequence {xn} is a Cauchy sequence if diam({xk}n≤k) → 0 as n → ∞ (i.e., limn diam({xk}n≤k) = 0). Basic facts about Cauchy sequences are stated in the following proposition. In particular, it shows that every convergent sequence is bounded, and that a Cauchy sequence has a convergent subsequence if and only if every subsequence of it converges (see Proposition 3.5).
Proposition 3.39. Let (X, d) be a metric space.
(a) Every convergent sequence in (X, d) is a Cauchy sequence.
(b) Every Cauchy sequence in (X, d) is bounded.
(c) If a Cauchy sequence in (X, d) has a subsequence that converges in (X, d), then it converges itself in (X, d) and its limit coincides with the limit of that convergent subsequence.

Proof. (a) Take an arbitrary ε > 0. If an X-valued sequence {xn} converges to x ∈ X, then there exists an integer nε ≥ 1 such that d(xn, x) < ε/2 whenever n ≥ nε. Since d(xm, xn) ≤ d(xm, x) + d(x, xn) for every pair of indices m, n (triangle inequality), it follows that d(xm, xn) < ε whenever m, n ≥ nε.

(b) If {xn} is a Cauchy sequence, then there exists an integer n1 ≥ 1 such that d(xm, xn) < 1 whenever m, n ≥ n1. The set {d(xm, xn) ∈ R: m, n ≤ n1} has a maximum in R, say β, because it is finite. Thus d(xm, xn) ≤ d(xm, xn1) + d(xn1, xn) ≤ 2 max{1, β} for every pair of indices m, n.

(c) Suppose {xnk} is a subsequence of an X-valued Cauchy sequence {xn} that converges to a point x ∈ X (i.e., xnk → x as k → ∞). Take an arbitrary ε > 0. Since {xn} is a Cauchy sequence, it follows that there exists a positive integer nε such that d(xm, xn) < ε/2 whenever m, n ≥ nε. Since {xnk} converges to x, it follows that there exists a positive integer kε such that d(xnk, x) < ε/2 whenever k ≥ kε. Thus, if j is any integer with the property that j ≥ kε and nj ≥ nε (for instance, if j = max{nε, kε}), then d(xn, x) ≤ d(xn, xnj) + d(xnj, x) < ε for every n ≥ nε, and therefore {xn} converges to x.

Although a convergent sequence always is a Cauchy sequence, the converse may fail. For instance, the (0, 1]-valued sequence {1/n}n∈N is a Cauchy sequence in the metric space ((0, 1], d), where d is the usual metric on R, that does not converge in ((0, 1], d). There are, however, metric spaces with the notable property that Cauchy sequences in them are convergent.
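The Cauchy criterion of Definition 3.38 for the sequence {1/n} can be exhibited explicitly. The sketch below (an illustrative addition, not part of the book; the criterion is verified exactly by the bound |1/m − 1/n| ≤ 1/min(m, n), while the assertions merely spot-check index pairs) shows an index nε that works for each ε, even though the sequence has no limit in the subspace ((0, 1], d).

```python
# Numerical sketch: {1/n} satisfies the Cauchy criterion of Definition 3.38
# in the usual metric, yet converges to no point of the incomplete
# subspace ((0, 1], d).

def cauchy_index(eps):
    # n_eps such that m, n >= n_eps implies |1/m - 1/n| < eps:
    # |1/m - 1/n| <= 1/min(m, n) <= 1/n_eps, so int(1/eps) + 1 works
    return int(1.0 / eps) + 1

for eps in (0.5, 0.01, 1e-6):
    n_eps = cauchy_index(eps)
    # spot-check the criterion on a sample of index pairs beyond n_eps
    for m in range(n_eps, n_eps + 50):
        for n in range(n_eps, n_eps + 50):
            assert abs(1.0 / m - 1.0 / n) < eps
```

The "limit" the tail distances point to is 0, a point outside (0, 1]; the same sequence viewed in the complete space R converges, which is the situation Theorem 3.40 describes.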
Metric spaces possessing this property are so important that we give them a name. A metric space X is complete if every Cauchy sequence in X is a convergent sequence in X. Theorem 3.40. Let A be a subset of a metric space X. (a) If the subspace A is complete, then A is closed in X. (b) If X is complete and if A is closed in X, then the subspace A is complete. Proof. (a) Take an arbitrary A-valued sequence {an } that converges in X. Since every convergent sequence is a Cauchy sequence, it follows that {an } is a Cauchy sequence in X, and therefore a Cauchy sequence in the subspace A. If the subspace A is complete, then {an } converges in A. Conclusion: If A is complete as a subspace of X, then every A-valued sequence that converges in X has its limit in A. Thus, according to the Closed Set Theorem (Theorem 3.30), A is closed in X.
(b) Take an arbitrary A-valued Cauchy sequence {an}. If X is complete, then {an} converges in X to a point a ∈ X. If A is closed in X, then Theorem 3.30 (the Closed Set Theorem again) ensures that a ∈ A, and hence {an} converges in the subspace A. Conclusion: If X is complete and A is closed in X, then every Cauchy sequence in the subspace A converges in A. That is, A is complete as a subspace of X.

An important immediate corollary of the above theorem says that "inside" a complete metric space the properties of being closed and complete coincide.

Corollary 3.41. Let X be a complete metric space. A subset A of X is closed in X if and only if the subspace A is complete.

Example 3.R. (a) A basic property of the real number system is that every bounded sequence of real numbers has a convergent subsequence. This and Proposition 3.39 ensure that the metric space R (equipped with its usual metric) is complete; and so is the metric space C of all complex numbers equipped with its usual metric (reason: if {αk} is a Cauchy sequence in C, then {Re αk} and {Im αk} are both Cauchy sequences in R, so that they converge in R, and hence {αk} converges in C). Since the set Q of all rational numbers is not closed in R (recall: Q⁻ = R), it follows by Corollary 3.41 that the metric space Q is not complete. More generally (but similarly), Rⁿ and Cⁿ are complete metric spaces when equipped with any of their metrics dp for p ≥ 1 or d∞ (as in Example 3.A), for every positive integer n, while Qⁿ is not a complete metric space.
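The incompleteness of Q can also be seen numerically. The sketch below (Python, exact rational arithmetic; the function name is ours, not the book's) builds a Cauchy sequence of rationals whose limit, √2, lies outside Q.

```python
from fractions import Fraction

def sqrt2_cauchy(n_terms):
    """Babylonian iteration x_{k+1} = (x_k + 2/x_k)/2, carried out in Q.
    Every iterate is rational, and the sequence is Cauchy in Q, yet its
    limit sqrt(2) is irrational, so the sequence has no limit in Q."""
    x = Fraction(2)
    terms = []
    for _ in range(n_terms):
        terms.append(x)
        x = (x + 2 / x) / 2
    return terms

terms = sqrt2_cauchy(8)
# successive gaps shrink rapidly (Cauchy behaviour) ...
gaps = [abs(terms[k + 1] - terms[k]) for k in range(len(terms) - 1)]
# ... while no rational term can satisfy t*t == 2
```

Viewed through Corollary 3.41: this sequence converges in R (to √2), so its limit witnesses that Q is not closed in R, hence not complete.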
(b) Now let F denote either the real field R or the complex field C equipped with their usual metrics. As we have just seen, F is a complete metric space. For each real number p ≥ 1 let (ℓ₊ᵖ, dp) be the metric space of all F-valued p-summable sequences equipped with its usual metric dp as in Example 3.B. Take an arbitrary Cauchy sequence in (ℓ₊ᵖ, dp), say {xn}n∈N. Recall that this is a sequence of sequences; that is, xn = {ξn(k)}k∈N is a sequence in ℓ₊ᵖ for each integer n ∈ N. The Cauchy criterion says: for every ε > 0 there exists an integer nε ≥ 1 such that dp(xm, xn) < ε whenever m, n ≥ nε. Thus

$$ |\xi_m(k) - \xi_n(k)| \le \Bigl(\sum_{i=1}^{\infty} |\xi_m(i) - \xi_n(i)|^p\Bigr)^{1/p} = d_p(x_m, x_n) < \varepsilon $$

for every k ∈ N whenever m, n ≥ nε. Therefore, for each k ∈ N the scalar-valued sequence {ξn(k)}n∈N is a Cauchy sequence in F, and hence it converges in F (since F is complete) to, say, ξ(k) ∈ F. Consider the scalar-valued sequence x = {ξ(k)}k∈N consisting of those limits ξ(k) ∈ F for every k ∈ N. First we show that x ∈ ℓ₊ᵖ. Since {xn}n∈N is a Cauchy sequence in (ℓ₊ᵖ, dp), it follows by
Proposition 3.39 that it is bounded (i.e., sup_{m,n} dp(xm, xn) < ∞), and hence sup_m dp(xm, 0) < ∞, where 0 denotes the null sequence in ℓ₊ᵖ. (Indeed, for every m ∈ N the triangle inequality ensures that dp(xm, 0) ≤ sup_{m,n} dp(xm, xn) + dp(xn, 0) for an arbitrary n ∈ N.) Therefore,

$$ \Bigl(\sum_{k=1}^{j} |\xi_n(k)|^p\Bigr)^{1/p} \le \Bigl(\sum_{k=1}^{\infty} |\xi_n(k)|^p\Bigr)^{1/p} = d_p(x_n, 0) \le \sup_m d_p(x_m, 0) $$

for every n ∈ N and each integer j ≥ 1. Since ξn(k) → ξ(k) in F as n → ∞ for each k ∈ N, it follows that

$$ \Bigl(\sum_{k=1}^{j} |\xi(k)|^p\Bigr)^{1/p} = \lim_n \Bigl(\sum_{k=1}^{j} |\xi_n(k)|^p\Bigr)^{1/p} \le \sup_m d_p(x_m, 0) $$

for every j ∈ N. Thus

$$ \Bigl(\sum_{k=1}^{\infty} |\xi(k)|^p\Bigr)^{1/p} = \sup_j \Bigl(\sum_{k=1}^{j} |\xi(k)|^p\Bigr)^{1/p} \le \sup_m d_p(x_m, 0), $$

which means that x = {ξ(k)}k∈N ∈ ℓ₊ᵖ. Next we show that xn → x in (ℓ₊ᵖ, dp). Again, as {xn}n∈N is a Cauchy sequence in (ℓ₊ᵖ, dp), for any ε > 0 there exists an integer nε ≥ 1 such that dp(xm, xn) < ε whenever m, n ≥ nε. Thus

$$ \sum_{k=1}^{j} |\xi_n(k) - \xi_m(k)|^p \le \sum_{k=1}^{\infty} |\xi_n(k) - \xi_m(k)|^p < \varepsilon^p $$

for every integer j ≥ 1 whenever m, n ≥ nε. Since lim_m ξm(k) = ξ(k) for each k ∈ N, it follows that Σ_{k=1}^{j} |ξn(k) − ξ(k)|ᵖ ≤ εᵖ, and hence

$$ d_p(x_n, x) = \Bigl(\sum_{k=1}^{\infty} |\xi_n(k) - \xi(k)|^p\Bigr)^{1/p} = \sup_j \Bigl(\sum_{k=1}^{j} |\xi_n(k) - \xi(k)|^p\Bigr)^{1/p} \le \varepsilon $$

whenever n ≥ nε, which means that xn → x in (ℓ₊ᵖ, dp). Therefore (ℓ₊ᵖ, dp) is a complete metric space for every p ≥ 1. Similarly (see Example 3.B), for each p ≥ 1, (ℓᵖ, dp) is a complete metric space.

Example 3.S. Let S be a nonempty set, let (Y, d) be a metric space, and consider the metric space (B[S, Y], d∞) of all bounded mappings of S into (Y, d) equipped with the sup-metric d∞ (Example 3.C). We claim that (B[S, Y], d∞) is complete if and only if (Y, d) is complete.
(a) Indeed, suppose (Y, d) is a complete metric space. Let {fn} be a Cauchy sequence in (B[S, Y], d∞). Thus {fn(s)} is a Cauchy sequence in (Y, d) for every s ∈ S ≠ ∅ (because d(fm(s), fn(s)) ≤ sup_{s∈S} d(fm(s), fn(s)) = d∞(fm, fn) for each pair of integers m, n and every s ∈ S), and hence {fn(s)} converges in (Y, d) for every s ∈ S (since (Y, d) is complete). Set f(s) = lim_n fn(s) for each s ∈ S (i.e., fn(s) → f(s) in (Y, d)), which defines a function f of S into Y. We shall show that f ∈ B[S, Y] and that fn → f in (B[S, Y], d∞), thus proving that (B[S, Y], d∞) is complete whenever (Y, d) is complete.

First note that, for each positive integer n and every pair of points s, t in S,

$$ d(f(s), f(t)) \le d(f(s), f_n(s)) + d(f_n(s), f_n(t)) + d(f_n(t), f(t)) $$

by the triangle inequality. Now take an arbitrary real number ε > 0. Since {fn} is a Cauchy sequence in (B[S, Y], d∞), it follows that there exists a positive integer nε such that d∞(fm, fn) = sup_{s∈S} d(fm(s), fn(s)) < ε, and hence d(fm(s), fn(s)) ≤ ε for all s ∈ S, whenever m, n ≥ nε. Moreover, since fm(s) → f(s) in (Y, d) for every s ∈ S, and since the metric is continuous (i.e., d(·, y): Y → R is a continuous function from the metric space Y to the metric space R for each y ∈ Y), it also follows that d(f(s), fn(s)) = d(lim_m fm(s), fn(s)) = lim_m d(fm(s), fn(s)) for each positive integer n and every s ∈ S (see Problem 3.14 or 3.34 and Corollary 3.8). Thus d(f(s), fn(s)) ≤ ε for all s ∈ S whenever n ≥ nε. Furthermore, since each fn lies in B[S, Y], it follows that there exists a real number γ_{nε} such that

$$ \sup_{s,t\in S} d(f_{n_\varepsilon}(s), f_{n_\varepsilon}(t)) \le \gamma_{n_\varepsilon}. $$

Therefore, for any ε > 0 there exists a positive integer nε such that d(f(s), f(t)) ≤ 2ε + γ_{nε} for all s, t ∈ S, so that f ∈ B[S, Y], and

$$ d_\infty(f, f_n) = \sup_{s\in S} d(f(s), f_n(s)) \le \varepsilon $$

whenever n ≥ nε, so that fn → f in (B[S, Y], d∞).

(b) Conversely, suppose (B[S, Y], d∞) is a complete metric space. Take an arbitrary Y-valued sequence {yn} and set fn(s) = yn for each integer n and all s ∈ S ≠ ∅. This defines a sequence {fn} of constant mappings of S into Y, with each fn clearly in B[S, Y] (a constant mapping is obviously bounded). Note that d∞(fm, fn) = sup_{s∈S} d(fm(s), fn(s)) = d(ym, yn) for every pair of integers m, n. Thus {fn} is a Cauchy sequence in (B[S, Y], d∞) if and only if
{yn} is a Cauchy sequence in (Y, d). Moreover, {fn} converges in (B[S, Y], d∞) if and only if {yn} converges in (Y, d). (Reason: If d(yn, y) → 0 for some y ∈ Y, then d∞(fn, f) → 0, where f ∈ B[S, Y] is the constant mapping f(s) = y for all s ∈ S; on the other hand, if d∞(fn, f) → 0 for some f ∈ B[S, Y], then d(yn, f(s)) = d(fn(s), f(s)) for each n and every s, so that d(yn, f(s)) → 0 for all s ∈ S, and hence f must be a constant mapping.) Now suppose (Y, d) is not complete, which implies that there exists a Cauchy sequence in (Y, d), say {yn}, that fails to converge in (Y, d). Thus the sequence {fn} of constant mappings fn(s) = yn for each integer n and all s ∈ S is a Cauchy sequence in (B[S, Y], d∞) that fails to converge in (B[S, Y], d∞), and so (B[S, Y], d∞) is not complete. Conclusion: If (B[S, Y], d∞) is complete, then (Y, d) is complete.

(c) Concrete example: Set S = N or S = Z and Y = F (either the real field R or the complex field C equipped with their usual metric). Then (ℓ₊∞, d∞) and (ℓ∞, d∞) are complete metric spaces.
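For a finite index set S the supremum in the sup-metric of Example 3.S is an honest maximum, so d∞ can be computed directly. A minimal Python sketch (the names are ours; Y = R with its usual metric):

```python
# Sup-metric d_inf on B[S, R] from Example 3.S, computed for a finite
# nonempty index set S, where the supremum reduces to a maximum.

S = range(10)  # a finite index set; the theory allows any nonempty set

def d_inf(f, g):
    """d_inf(f, g) = sup over s in S of |f(s) - g(s)|."""
    return max(abs(f(s) - g(s)) for s in S)

f = lambda s: s / 10
g = lambda s: s / 10 + 0.5   # a uniform shift of f by 0.5
h = lambda s: 0.0

# The triangle inequality for d_inf is inherited from the pointwise one:
lhs = d_inf(f, h)
rhs = d_inf(f, g) + d_inf(g, h)
```

The worst-case (supremum) distance is exactly what makes convergence in d∞ "uniform over S", the property exploited in part (a) of the example.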
Example 3.T. Consider the set B[X, Y] of all bounded mappings of a nonempty metric space (X, dX) into a metric space (Y, dY) and equip it with the sup-metric d∞ as in the previous example. Let BC[X, Y] be the set of all continuous mappings from B[X, Y] (Example 3.N), so that (BC[X, Y], d∞) is the subspace of (B[X, Y], d∞) made up of all bounded continuous mappings of (X, dX) into (Y, dY). If (Y, dY) is complete, then (B[X, Y], d∞) is complete according to Example 3.S. Since BC[X, Y] is closed in (B[X, Y], d∞) (Example 3.N), it follows by Theorem 3.40 that (BC[X, Y], d∞) is complete. On the other hand, the very same construction used in item (b) of the previous example shows that (BC[X, Y], d∞) is not complete unless (Y, dY) is. Conclusion: (BC[X, Y], d∞) is complete if and only if (Y, dY) is complete. In particular (see Examples 3.D, 3.G, and 3.N), (C[0, 1], d∞) is a complete metric space because R and C (equipped with their usual metrics, as always) are complete metric spaces (Example 3.R). However, for any p ≥ 1 (see Problem 3.58), (C[0, 1], dp) is not a complete metric space.

The concept of completeness leads to the next useful result on contractions.

Theorem 3.42. (Contraction Mapping Theorem or Method of Successive Approximations or Banach Fixed Point Theorem). A strict contraction F of a nonempty complete metric space (X, d) into itself has a unique fixed point x ∈ X, which is the limit in (X, d) of every X-valued sequence of the form {F^n(x0)}n∈N₀ for any x0 ∈ X.
Proof. Take any x0 ∈ X. Consider the X-valued sequence {xn}n∈N₀ such that xn = F^n(x0) for each n ∈ N₀. Recall that F^n denotes the composition of F: X → X with itself n times (and that F⁰ is by convention the identity map on X). It is clear that the sequence {xn}n∈N₀ satisfies the difference equation xn+1 = F(xn) for every n ∈ N₀. Conversely, if an X-valued sequence {xn}n∈N₀ is recursively defined from any point x0 ∈ X onwards as xn+1 = F(xn) for every n ∈ N₀, then it is of the form xn = F^n(x0) for each n ∈ N₀ (proof: induction).

Now suppose F: (X, d) → (X, d) is a strict contraction and let γ ∈ (0, 1) be any Lipschitz constant for F, so that d(F(x), F(y)) ≤ γ d(x, y) for every x, y in X. A trivial induction shows that d(F^n(x), F^n(y)) ≤ γ^n d(x, y) for every nonnegative integer n and every x, y ∈ X. Next take an arbitrary pair of distinct nonnegative integers, say m < n. Note that xn = F^n(x0) = F^m(F^{n−m}(x0)) = F^m(x_{n−m}), and hence d(xm, xn) = d(F^m(x0), F^m(x_{n−m})) ≤ γ^m d(x0, x_{n−m}). By using the triangle inequality we get

$$ d(x_0, x_{n-m}) \le \sum_{i=0}^{n-m-1} d(x_i, x_{i+1}), $$

and therefore

$$ d(x_m, x_n) \le \gamma^m \sum_{i=0}^{n-m-1} d(x_i, x_{i+1}) \le \gamma^m \sum_{i=0}^{n-m-1} \gamma^i\, d(x_0, x_1). $$

Another trivial induction shows that $\sum_{i=0}^{k-1} \alpha^i = \frac{1-\alpha^k}{1-\alpha}$ for every real α ≠ 1 and each integer k ≥ 1, so that $\sum_{i=0}^{n-m-1} \gamma^i = \frac{1-\gamma^{n-m}}{1-\gamma} < \frac{1}{1-\gamma}$. Thus

$$ d(x_m, x_n) < \frac{\gamma^m}{1-\gamma}\, d(x_0, x_1), $$

and γ^m → 0 as m → ∞ for any γ ∈ (0, 1). This ensures that {xn} is a Cauchy sequence in (X, d) (reason: for any ε > 0 there is an integer nε such that (γ^m/(1−γ)) d(x0, x1) < ε, which implies d(xm, xn) < ε
whenever n > m ≥ nε ). Hence {xn } converges in the complete metric space (X, d). Set x = lim xn ∈ X. Since a contraction is continuous, we get by Corollary 3.8 that {F (xn )} converges in (X, d) and F (lim xn ) = lim F (xn ). Thus x = lim xn = lim xn+1 = lim F (xn ) = F (lim xn ) = F (x) so that the limit of {xn } is a fixed point of F . Moreover, if y is any fixed point of F , then d(x, y) = d(F (x), F (y)) ≤ γ d(x, y), which implies that d(x, y) = 0 (since γ ∈ (0, 1)), and so x = y. Conclusion: For every x0 ∈ X the sequence {F n (x0 )} converges in (X, d), and its limit is the unique fixed point of F .
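The proof of Theorem 3.42 is itself an algorithm: iterate xn+1 = F(xn) and stop once the a priori bound γ^m/(1−γ) d(x0, x1) falls below the desired tolerance. A hedged sketch on (R, |·|) follows; the function name and the sample contraction are ours, chosen only for illustration.

```python
def successive_approximations(F, x0, gamma, eps):
    """Fixed-point iteration of Theorem 3.42 on (R, |.|).
    F must be a strict contraction with Lipschitz constant gamma < 1.
    Stops when the a priori bound gamma^m/(1-gamma) * d(x0, x1) < eps,
    which guarantees the returned iterate is within eps of the fixed point."""
    x_prev, x = x0, F(x0)
    d01 = abs(x - x_prev)             # d(x0, x1)
    m = 1                             # invariant: x == F^m(x0)
    while gamma**m / (1 - gamma) * d01 >= eps:
        x = F(x)
        m += 1
    return x

# F(x) = x/2 + 1 is a strict contraction with gamma = 1/2;
# its unique fixed point solves x = x/2 + 1, i.e. x = 2.
approx = successive_approximations(lambda x: x / 2 + 1, 0.0, 0.5, 1e-9)
```

Note that the stopping rule needs only γ and the first two iterates, exactly the data used in the Cauchy estimate of the proof.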
3.8 Continuous Extension and Completion

Recall that continuity preserves convergence (Corollary 3.8). Uniform continuity, as one might expect, goes beyond that. In fact, uniform continuity also preserves Cauchy sequences.

Lemma 3.43. Let F: X → Y be a uniformly continuous mapping of a metric space X into a metric space Y. If {xn} is a Cauchy sequence in X, then {F(xn)} is a Cauchy sequence in Y.

Proof. The proof is straightforward from the definitions of Cauchy sequence and uniform continuity. Indeed, let dX and dY denote the metrics on X and Y, respectively, and take an arbitrary X-valued sequence {xn}. If F: X → Y is uniformly continuous, then for every ε > 0 there exists δε > 0 such that

dX(xm, xn) < δε   implies   dY(F(xm), F(xn)) < ε.

However, associated with δε there exists a positive integer nε such that

m, n ≥ nε   implies   dX(xm, xn) < δε

whenever {xn} is a Cauchy sequence in X. Hence, for every real number ε > 0 there exists a positive integer nε such that

m, n ≥ nε   implies   dY(F(xm), F(xn)) < ε,

which means that {F(xn)} is a Cauchy sequence in Y.
Thus, if G: X → Y is a uniform homeomorphism between two metric spaces X and Y, then {xn } is a Cauchy sequence in X if and only if {G(xn )} is a Cauchy sequence in Y, and therefore a uniform homeomorphism takes a complete metric space onto a complete metric space. Theorem 3.44. Take two uniformly homeomorphic metric spaces. One of them is complete if and only if the other is.
Proof. Let X and Y be metric spaces and let G: X → Y be a uniform homeomorphism. Take an arbitrary Cauchy sequence {yn} in Y and consider the sequence {xn} in X such that xn = G⁻¹(yn) for each n. Lemma 3.43 ensures that {xn} is a Cauchy sequence in X. If X is complete, then {xn} converges in X to, say, x ∈ X. Since G is continuous, it follows by Corollary 3.8 that the sequence {yn}, which is such that yn = G(xn) for each n, converges in Y to y = G(x). Thus Y is complete.

The preceding theorem does not hold if uniform homeomorphism is replaced by plain homeomorphism: if X and Y are homeomorphic metric spaces, then it is not necessarily true that X is complete if and only if Y is complete. In other words, completeness is not a topological invariant (continuity preserves convergence but not Cauchy sequences). Therefore, there may exist homeomorphic metric spaces such that just one of them is complete. A Polish space is a separable metric space homeomorphic to a complete metric space.

Example 3.U. Let R be the real line with its usual metric. Set A = (0, 1] and B = [1, ∞), both subsets of R. Consider the function G: A → B such that G(α) = 1/α for every α ∈ A. As is readily verified, G is a homeomorphism of A onto B, so that A and B are homeomorphic subspaces of R. Now consider the A-valued sequence {αn} with αn = 1/n for each n ∈ N, which is a Cauchy sequence in A. However, G(αn) = n for every n ∈ N, and so {G(αn)} is certainly not a Cauchy sequence in B (since it is not even bounded in B). Thus G: A → B (which is continuous) is not uniformly continuous, by Lemma 3.43. Actually, B is a complete subspace of R since B is a closed subset of the complete metric space R (Corollary 3.41) and, as we have just seen, A is not a complete subspace of R: the Cauchy sequence {αn} does not converge in A because its continuous image {G(αn)} does not converge in B (Corollary 3.8).
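A quick numeric look at Example 3.U (Python; the names are ours): the tail of {1/n} has arbitrarily small diameter, while its image under G spreads out instead of tightening.

```python
# Example 3.U numerically: alpha_n = 1/n is Cauchy in A = (0, 1],
# but its image under the homeomorphism G(a) = 1/a is G(alpha_n) = n,
# which is unbounded, hence not Cauchy, in B = [1, infinity).

def G(a):
    return 1.0 / a

alphas = [1.0 / n for n in range(100, 200)]   # a tail of {1/n}
images = [G(a) for a in alphas]               # approximately 100..199

alpha_diameter = max(alphas) - min(alphas)    # small: Cauchy-like tail
image_diameter = max(images) - min(images)    # large: not Cauchy
```

Since G is continuous but destroys the Cauchy property, Lemma 3.43 forces the conclusion drawn in the example: G is not uniformly continuous.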
Lemma 3.43 also leads to an extremely useful result on extensions of uniformly continuous mappings of a dense subspace of a metric space into a complete metric space. Theorem 3.45. Every uniformly continuous mapping F : A → Y of a dense subspace A of a metric space X into a complete metric space Y has a unique continuous extension over X, which in fact is uniformly continuous. Proof. Suppose the metric space X is nonempty (to avoid trivialities) and let A be a dense subset of X. Take an arbitrary point x in X. Since A− = X, it follows by Proposition 3.32 that there exists an A-valued sequence {an } that converges in X to x, and hence {an } is a Cauchy sequence in the metric space X (Proposition 3.39) so that {an } is a Cauchy sequence in the subspace A of X. Now suppose F : A → Y is a uniformly continuous mapping of A into a metric space Y. Thus, according to Lemma 3.43, {F (an )} is a Cauchy sequence in Y. If Y is a complete metric space, then the Y -valued sequence {F (an )} converges in it. Let y ∈ Y be the (unique) limit of {F (an )} in Y :
y = lim F(an). We shall show now that y, which obviously depends on x ∈ X, does not depend on the A-valued sequence {an} that converges in X to x. Indeed, let {a′n} be an A-valued sequence converging in X to x, and set y′ = lim F(a′n). Since both sequences {an} and {a′n} converge in X to the same limit x, it follows that dX(an, a′n) → 0 (Problem 3.14(b)), where dX denotes the metric on X. Thus for every real number δ > 0 there exists an index nδ such that

n ≥ nδ   implies   dX(an, a′n) < δ.

Moreover, since the mapping F: A → Y is uniformly continuous, for every real number ε > 0 there exists a real number δε > 0 such that

dX(a, a′) < δε   implies   dY(F(a), F(a′)) < ε

for all a and a′ in A, where dY denotes the metric on Y. Conclusion: Given any ε > 0 there is a δε > 0, associated with which there is an nδε, such that

n ≥ nδε   implies   dY(F(an), F(a′n)) < ε.

Thus (Problem 3.14(c)) 0 ≤ dY(y, y′) ≤ ε for all ε > 0, and so dY(y, y′) = 0. That is, y = y′. Therefore, for each x ∈ X set

F̂(x) = lim F(an) in Y,

where {an} is any A-valued sequence that converges in X to x. This defines a mapping F̂: X → Y of X into Y.

Claim 1. F̂ is an extension of F over X.

Proof. Take an arbitrary a in A and consider the A-valued constant sequence {an} such that an = a for every index n. As the Y-valued sequence {F(an)} is constant, it trivially converges in Y to F(a). Thus F̂(a) = F(a) for every a in A. That is, F̂|A = F. This means that F: A → Y is a restriction of F̂: X → Y to A ⊆ X or, equivalently, F̂ is an extension of F over X.

Claim 2. F̂ is uniformly continuous.

Proof. Take a pair of arbitrary points x and x′ in X. Let {an} and {a′n} be any pair of A-valued sequences converging in X to x and x′, respectively (recall: the existence of these sequences is ensured by Proposition 3.32 because A is dense in X). Note that dX(an, a′n) ≤ dX(an, x) + dX(x, x′) + dX(x′, a′n) for every index n by the triangle inequality in X. Thus, as an → x and a′n → x′ in X, for any δ > 0 there exists an index nδ such that (Definition 3.4)

dX(x, x′) < δ   implies   dX(an, a′n) < 3δ for every n ≥ nδ.

Since F: A → Y is uniformly continuous, it follows by Definition 3.6 that for every ε > 0 there exists δε > 0, which depends only on ε, such that

dX(an, a′n) < 3δε   implies   dY(F(an), F(a′n)) < ε.

Thus, associated with each ε > 0 there exists δε > 0 (depending only on ε), which in turn ensures the existence of an index nδε, such that

dX(x, x′) < δε   implies   dY(F(an), F(a′n)) < ε for every n ≥ nδε.

Moreover, since F(an) → F̂(x) and F(a′n) → F̂(x′) in Y by the very definition of F̂: X → Y, it follows by Problem 3.14(c) that

dY(F(an), F(a′n)) < ε for every n ≥ nδε   implies   dY(F̂(x), F̂(x′)) ≤ ε.

Therefore, given an arbitrary ε > 0 there exists δε > 0 such that

dX(x, x′) < δε   implies   dY(F̂(x), F̂(x′)) ≤ ε

for all x, x′ ∈ X. Thus F̂: X → Y is uniformly continuous (Definition 3.6).

Finally, since F̂: X → Y is continuous, it follows by Corollary 3.33 that if Ĝ: X → Y is a continuous extension of F: A → Y over X, then Ĝ = F̂ (because A is dense in X and Ĝ|A = F̂|A = F). Therefore, F̂ is the unique continuous extension of F over X.

Corollary 3.46. Let X and Y be complete metric spaces, and let A and B be dense subspaces of X and Y, respectively. If G: A → B is a uniform homeomorphism of A onto B, then there exists a unique uniform homeomorphism Ĝ: X → Y of X onto Y that extends G over X (i.e., Ĝ|A = G).

Proof. Since A is dense in X, Y is complete, and G: A → B ⊆ Y is uniformly continuous, it follows by the previous theorem that G has a unique uniformly continuous extension Ĝ: X → Y. Also, the inverse G⁻¹: B → A of G: A → B has a unique uniformly continuous extension $\widehat{G^{-1}}$: Y → X. Now observe that $(\widehat{G^{-1}}\,\widehat G)|_A = G^{-1}G = I_A$, where I_A: A → A is the identity on A (reason: Ĝ|A = G: A → B and $\widehat{G^{-1}}|_B = G^{-1}$: B → A). The identity I_A is uniformly continuous (because its domain and range are subspaces of the same metric space X), and hence it has a unique continuous extension over X (by the previous theorem), which clearly is I_X: X → X, the identity on X (recall: I_X in fact is uniformly continuous because its domain and range are equipped with the same metric). Thus $\widehat{G^{-1}}\,\widehat G = I_X$, since $\widehat{G^{-1}}\,\widehat G$ is continuous (composition of continuous mappings) and is an extension of the uniformly continuous mapping G⁻¹G = I_A over X. Similarly, $\widehat G\,\widehat{G^{-1}} = I_Y$, where I_Y: Y → Y is the identity on Y. Therefore $\widehat{G^{-1}} = \widehat G^{-1}$. Summing up: Ĝ: X → Y is an invertible uniformly continuous mapping with a uniformly continuous inverse (i.e., a uniform homeomorphism) which is the unique uniformly continuous extension of G: A → B over X.
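The extension in Theorem 3.45 is constructive: F̂(x) = lim F(an) for any A-valued sequence an → x. The sketch below mimics this recipe for X = R and A = Q (as exact Fractions), with a Lipschitz, hence uniformly continuous, F. All names are ours, and a single rational approximant of fixed accuracy stands in for the limit.

```python
from fractions import Fraction

def F(a):
    """A uniformly continuous map defined on the dense subspace Q only
    (Lipschitz with constant 3)."""
    return 3 * a + 1

def F_hat(x, n_digits=12):
    """Approximate F_hat(x) = lim F(a_n) using one rational a near x."""
    a = Fraction(round(x * 10**n_digits), 10**n_digits)  # a in Q, close to x
    return float(F(a))

# On rational points the extension agrees with F itself:
at_half = F_hat(0.5)          # F(1/2) = 5/2
# At an irrational point the recipe still produces a value:
at_sqrt2 = F_hat(2**0.5)      # close to 3*sqrt(2) + 1
```

Uniform continuity is what makes the value independent of the approximating sequence, which is exactly the point of the y = y′ argument in the proof.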
Recall that every surjective isometry is a uniform homeomorphism. Suppose the uniform homeomorphism G of the above corollary is a surjective isometry. Take an arbitrary pair of points x and x′ in X, so that Ĝ(x) = lim G(an) and Ĝ(x′) = lim G(a′n) in Y, where {an} and {a′n} are A-valued sequences converging in X to x and x′, respectively (cf. proof of Theorem 3.45). Since G is an isometry, it follows by Problem 3.14(b) that

$$ d_Y(\widehat G(x), \widehat G(x')) = \lim d_Y(G(a_n), G(a'_n)) = \lim d_X(a_n, a'_n) = d_X(x, x'). $$

Thus Ĝ is an isometry as well, and so a surjective isometry (since Ĝ is a homeomorphism). This proves the following further corollary of Theorem 3.45.

Corollary 3.47. Let A and B be dense subspaces of complete metric spaces X and Y, respectively. If J: A → B is a surjective isometry of A onto B, then there exists a unique surjective isometry Ĵ: X → Y of X onto Y that extends J over X (i.e., Ĵ|A = J).

If a metric space X is a subspace of a complete metric space Z, then its closure X⁻ in Z is a complete metric space by Theorem 3.40. In this case X can be thought of as being "completed" by joining to it all its accumulation points from Z (recall: X⁻ = X ∪ X′), and X⁻ can be viewed as a "completion" of X. However, if a metric space X is not specified as being a subspace of a complete metric space Z, then the above approach of simply taking the closure of X in Z obviously collapses; but the idea of "completion" behind such an approach survives. To begin with, recall that two metric spaces, say X and X̂, are isometrically equivalent if there exists a surjective isometry of one of them onto the other (notation: X ≅ X̂). Isometrically equivalent metric spaces are regarded (as far as purely metric-space structure is concerned) as being essentially the same metric space. If X̂ is a subspace of a complete metric space, then its closure X̂⁻ in that complete metric space is itself a complete metric space. With this in mind, consider the following definition.

Definition 3.48. If the image of an isometry on a metric space X is a dense subspace of a metric space X̂, then X is said to be densely embedded in X̂. If a metric space X is densely embedded in a complete metric space X̂, then X̂ is a completion of X.

In other words, if J: X → X̂ is an isometry and J(X)⁻ = X̂, then X is densely embedded in X̂. Moreover, if X̂ is complete, then X̂ is a completion of X. Even if a metric space fails to be complete, it can always be densely embedded in a complete metric space. Lemma 3.43 plays a central role in the proof of this result, which is stated below.

Theorem 3.49. Every metric space has a completion.

Proof. Let (X, dX) be an arbitrary metric space and let CS(X) denote the collection of all Cauchy sequences in (X, dX). If x = {xn} and y = {yn} are
sequences in CS(X), then the real-valued sequence {dX(xn, yn)} converges in R (see Problem 3.53(a)). Thus, for each pair (x, y) in CS(X)×CS(X), set

$$ d(x, y) = \lim d_X(x_n, y_n). $$

This defines a function d: CS(X)×CS(X) → R which is a pseudometric on CS(X). Indeed, nonnegativeness and symmetry are trivially verified, and the triangle inequality in (CS(X), d) follows at once from the triangle inequality in (X, dX). Consider a relation ∼ on CS(X) defined as follows. If x = {xn} and x′ = {x′n} are Cauchy sequences in (X, dX), then

x ∼ x′   if   d(x, x′) = 0.

Proposition 3.3 asserts that ∼ is an equivalence relation on CS(X). Let X̂ be the collection of all equivalence classes [x] ⊆ CS(X) with respect to ∼ for every sequence x = {xn} in CS(X). In other words, set X̂ = CS(X)/∼, the quotient space of CS(X) modulo ∼. For each pair ([x], [y]) in X̂×X̂, set

$$ \hat d([x], [y]) = d(x, y) = \lim d_X(x_n, y_n) $$

for an arbitrary pair (x, y) in [x]×[y] (i.e., {xn} and {yn} are any Cauchy sequences from the equivalence classes [x] and [y], respectively). Proposition 3.3 also asserts that this actually defines a function d̂: X̂×X̂ → R, and that such a function d̂ is a metric on X̂. Thus (X̂, d̂) is a metric space.

Now consider the mapping K: X → X̂ defined as follows. For each x ∈ X take the constant sequence x = {xn} ∈ CS(X) such that xn = x for all indices n, and set K(x) = [x] ∈ X̂. That is, for each x in X, K(x) is the equivalence class in X̂ containing the constant sequence with entries equal to x. Note that

K: (X, dX) → (X̂, d̂) is an isometry.

Indeed, if x, y ∈ X, let x = {xn} and y = {yn} be constant sequences with xn = x and yn = y for all n. Then d̂(K(x), K(y)) = lim dX(xn, yn) = dX(x, y).

Claim 1. K(X)⁻ = X̂.

Proof. Take any [x] ∈ X̂ and any {xn} ∈ [x], so that {xn} is a Cauchy sequence in (X, dX). Thus for each ε > 0 there is an index nε such that dX(xn, x_{nε}) < ε for every n ≥ nε. Set [xε] = K(x_{nε}) ∈ K(X): the equivalence class in X̂ containing the constant sequence with entries equal to x_{nε}. Therefore, for each [x] ∈ X̂ and each ε > 0 there exists an [xε] ∈ K(X) such that d̂([xε], [x]) = lim dX(xn, x_{nε}) < ε. Hence K(X) is dense in (X̂, d̂) (Proposition 3.32).

Claim 2. The metric space (X̂, d̂) is complete.
Proof. Take an arbitrary Cauchy sequence {[x]k}k≥1 in (X̂, d̂). Since K(X) is dense in (X̂, d̂), for each k ≥ 1 there exists [y]k ∈ K(X) such that

$$ \hat d([x]_k, [y]_k) < \tfrac{1}{k} $$

(cf. Proposition 3.32). Then, since

$$ \hat d([y]_j, [y]_k) \le \hat d([y]_j, [x]_j) + \hat d([x]_j, [x]_k) + \hat d([x]_k, [y]_k) $$

for every j, k ≥ 1, and since {[x]k}k≥1 is a Cauchy sequence in (X̂, d̂), it follows that the K(X)-valued sequence

{[y]k}k≥1 is a Cauchy sequence in (X̂, d̂).

Now take any k ≥ 1 and notice that, as [y]k lies in K(X), there exists yk ∈ X such that the constant sequence y_k = {yk}n≥1 belongs to the equivalence class [y]k = K(yk), and so yk = K⁻¹([y]k). Indeed, K: X → K(X) ⊆ X̂ is a surjective isometry of X onto the subspace K(X) of X̂, so that K⁻¹: K(X) → X is again a surjective isometry, thus uniformly continuous. Therefore, since {[y]k}k≥1 is a Cauchy sequence in (K(X), d̂), it follows by Lemma 3.43 that

{yk}k≥1 is a Cauchy sequence in (X, dX).

Then y = {yk}k≥1 ∈ CS(X), and so the equivalence class [y] is in X̂. Moreover,

{[x]k}k≥1 converges in (X̂, d̂) to [y] ∈ X̂.

Indeed, for every k ≥ 1,

$$ 0 \le \hat d([x]_k, [y]) \le \hat d([x]_k, [y]_k) + \hat d([y]_k, [y]) \le \tfrac{1}{k} + \hat d([y]_k, [y]). $$

Take y′ = {y′n}n≥1 ∈ [y] and, for each k ≥ 1, take the constant sequence y_k = {yk}n≥1 ∈ [y]k (so that yk = K⁻¹([y]k)). By the definition of the metric d̂ on X̂, d̂([y]k, [y]) = lim_n dX(yk, y′n) for every k ≥ 1, and so lim_k d̂([y]k, [y]) = 0 because {yk}k≥1 is a Cauchy sequence in (X, dX). Therefore, d̂([x]k, [y]) → 0 as k → ∞.

Conclusion: Every Cauchy sequence in (X̂, d̂) converges in (X̂, d̂), so that (X̂, d̂) is a complete metric space.

Summing up: X ≅ K(X), K(X)⁻ = X̂, and X̂ is complete. That is, X is densely embedded in a complete metric space X̂, which means that X̂ is a completion of X.

Corollary 3.47 leads to the proof that a completion of a metric space is essentially unique; that is, the completion of a metric space is unique up to a surjective isometry.
Theorem 3.50. Any two completions of a metric space are isometrically equivalent.

Proof. Let X be a metric space and, according to Theorem 3.49, let X̂ and X̂′ be two completions of X. This means that there exist surjective isometries

J: X̃ → X   and   J′: X̃′ → X,

where X̃ is a dense subspace of X̂ and X̃′ is a dense subspace of X̂′. Recall that a surjective isometry is invertible, and also that its inverse is again a surjective isometry. Thus set

J̃ = J′⁻¹J: X̃ → X̃′,

which, as a composition of surjective isometries, is a surjective isometry itself. Since X̃ and X̃′ are dense subspaces of the complete metric spaces X̂ and X̂′, it follows by Corollary 3.47 that there exists a unique surjective isometry

Ĵ: X̂ → X̂′

that extends J̃ over X̂. Thus X̂ and X̂′ are isometrically equivalent.
is a completion According to Definition 3.48, a complete metric space X say X, which is of a metric space X if there exists a dense subspace of X, isometrically equivalent to X; that is, if there exists a surjective isometry →X J: X onto X for some dense subspace X of X. Now let Y be another metric of X space, consider a mapping F:X → Y of X into Y, and let
→Y F: X
of X into Y such that F(x) = F (J(x)) for be a mapping of a completion X or, every x ∈ X. That is, F is an extension of the composition FJ over X equivalently, FJ: X → Y is the restriction of F to X. It is customary to refer of X (which in fact is a to F as an extension of F over the completion X slight abuse of terminology). The situation so far is illustrated by the following commutative diagram (recall: F|X = FJ ). ∼ X =X
X ⏐ F| ⏐ F X J
←−−−
Y
⊆ ! F
=X − X
The next theorem says that, if F is uniformly continuous and Y is complete, then there exists an essentially unique continuous extension F̂ of F over a completion X̂ of X.

Theorem 3.51. Let X̂ be a completion of a metric space X and let Y be a complete metric space. Every uniformly continuous mapping F: X → Y has a uniformly continuous extension F̂: X̂ → Y over the completion X̂ of X. Moreover, F̂ is unique up to a surjective isometry.

Proof. Existence. Let X̂ be a completion of a metric space X. Thus there exists a dense subspace X̃ of the metric space X̂, and a surjective isometry J: X̃ → X of X̃ onto X. Suppose F: X → Y is a uniformly continuous mapping of X into a metric space Y. Consider the composition FJ: X̃ → Y, which is uniformly continuous as well (reason: J is uniformly continuous). Since X̃ is dense in X̂, Y is complete, and FJ: X̃ → Y is uniformly continuous, it follows by Theorem 3.45 that there exists a unique continuous extension F̂: X̂ → Y of FJ: X̃ → Y over X̂, which in fact is uniformly continuous. Thus F̂: X̂ → Y is a uniformly continuous extension of F: X → Y over the completion X̂ of X.

Uniqueness. Suppose F̂′: X̂′ → Y is another continuous extension of F: X → Y over some completion X̂′ of X, so that F̂′|X̃′ = FJ′, where X̃′ is a dense subspace of X̂′ and J′: X̃′ → X is a surjective isometry of X̃′ onto X. Set J̃ = J′⁻¹J: X̃ → X̃′ as in the proof of Theorem 3.50, and let Ĵ: X̂ → X̂′ be the surjective isometry of X̂ onto X̂′ such that Ĵ|X̃ = J̃. Thus, since F̂′|X̃′ = FJ′,

F̂′Ĵ|X̃ = F̂′J̃ = FJ′J′⁻¹J = FJ = F̂|X̃.

Therefore, the continuous mappings F̂′Ĵ: X̂ → Y (composition of two continuous mappings) and F̂: X̂ → Y coincide on a dense subset X̃ of X̂, and hence F̂ = F̂′Ĵ by Corollary 3.33. Conclusion: If F̂′: X̂′ → Y is another continuous extension of F over some completion X̂′ of X, then there exists a surjective isometry Ĵ: X̂ → X̂′ such that F̂ = F̂′Ĵ. In other words, a continuous extension of F over a completion of X is unique up to a surjective isometry.

[Commutative diagram, where ⊂ denotes dense inclusion: X̃ ⊂ X̂ and X̃′ ⊂ X̂′ are joined by J̃ = J′⁻¹J and its extension Ĵ, with J and J′ mapping onto X, and F, F̂, F̂′ mapping into Y; it illustrates the uniqueness proofs of Theorems 3.50 and 3.51.]
144
3. Topological Structures
3.9 The Baire Category Theorem We close our discussion on complete metric spaces with an important classification of subsets of a metric space into two categories. The basic notion behind such a classification is the following one. A subset A of a metric space X is nowhere dense (or rare) in X if (A− )◦ = ∅ (i.e., if the interior of its closure is empty). Clearly, a closed subset of X is nowhere dense in X if and only if it has an empty interior. Note that (A\A◦ )◦ = ∅ for every subset A of a metric space X. Indeed, A◦ is the largest open subset of X that is included in A, so that the only open subset of X that is included in A\A◦ is the empty set ∅ of X. Therefore, if V is a closed subset of X, then V \V ◦ is nowhere dense in X (reason: V \V ◦ = V − \V ◦ = ∂V, which is closed in X — see Problem 3.41). Dually, if U is an open subset of X, then U − \U is nowhere dense in X (recall that U − \U = (X\V )− \(X\V ) = (X\V ◦ )\(X\V ) = V \V ◦ ). Carefully note that ∂ Q = Q − \Q ◦ = (Q − )◦ = R in the metric space R (usual metric). Proposition 3.52. A singleton {x} on a point x in a metric space X is nowhere dense in X if and only if x is not an isolated point of X. Proof. Recall that every singleton in a metric space X is a closed subset of X (Problem 3.37), and hence {x} = {x}− for every x in X. According to Proposition 3.37, a point x in X is an isolated point of X if and only if the singleton {x} is an open set in X; that is, if and only if {x}◦ = {x}. Thus a point x in X is not an isolated point of X if and only if {x}◦ ⊂ {x} (i.e., {x}◦ = {x}) or, equivalently, {x}◦ = ∅ (since the empty set is the only proper subset of any singleton). But {x}◦ = ∅ if and only if ({x}− )◦ = ∅ (because {x} = {x}− for every singleton {x}), which means that {x} is nowhere dense in X. The next proposition gives alternative characterizations of nowhere dense sets that will be required in the sequel. Proposition 3.53. Let X be a metric space and let A be a subset of X. 
The following assertions are pairwise equivalent.

(a) (A−)◦ = ∅ (i.e., A is nowhere dense in X).
(b) For every nonempty open subset U of X there exists a nonempty open subset U′ of X such that U′ ⊆ U and U′ ∩ A = ∅.
(c) For every nonempty open subset U of X and every real number ρ > 0 there exists an open ball Bε with radius ε ∈ (0, ρ) such that Bε− ⊂ U and Bε− ∩ A = ∅.

Proof. Suppose A is nonempty (otherwise the proposition is trivial).

Proof of (a)⇔(b). Take an arbitrary nonempty open subset U of X. If (A−)◦ = ∅, then U\A− ≠ ∅ (i.e., A− includes no nonempty open set), U\A− is open in X (since U\A− = (X\A−) ∩ U), U\A− ⊆ U, and (U\A−) ∩ A = ∅. Thus
(a) implies (b). Conversely, suppose (A−)◦ ≠ ∅, so that there exists an open subset U0 of X such that ∅ ≠ U0 ⊆ A−. Then every point of U0 is a point of adherence of A, and hence every nonempty open subset of U0 meets A (cf. Proposition 3.25). Therefore, the denial of (a) implies the denial of (b). Equivalently, (b) implies (a).

Proof of (b)⇔(c). If (b) holds true, then (c) holds true for every open ball whose closure lies in the set supplied by (b). Precisely, suppose (b) holds true, take a nonempty open subset U of X and a real number ρ > 0, and let U′ be a nonempty open subset of X such that U′ ⊆ U and U′ ∩ A = ∅. If u is any point of the open set U′, then there exists a radius ρ′ ∈ (0, ρ) such that Bρ′(u) ⊆ U′. Take an open ball Bε(u) with center at u and radius ε ∈ (0, ρ′). Since Bε(u)− ⊆ Bρ′(u), it follows that Bε(u)− ⊂ U and Bε(u)− ∩ A = ∅. Thus (b) implies (c). Conversely, suppose (c) holds true and set U′ = Bε, so that (c) implies (b).

By using Proposition 3.53 it is easy to show that A ∪ B is nowhere dense in X whenever the sets A and B are both nowhere dense in X. Thus a trivial induction ensures that any finite union of nowhere dense subsets of a metric space X is again nowhere dense in X. However, a countable union of nowhere dense subsets of X does not need to be nowhere dense in X.

A subset A of a metric space X is of first category (or meager) in X if it is a countable union of nowhere dense subsets of X. That is, A is of first category in X if A = ⋃n∈N An, where each An is nowhere dense in X. The complement of a set of first category in X is a residual (or comeager) set in X. A subset B of X is of second category (or nonmeager) in X if it is not of first category in X.

Example 3.V. Let X be a metric space. Recall that a subset of X is dense in itself if and only if it has no isolated point. Thus, according to Proposition 3.52, if X is nonempty and dense in itself, then every singleton in X is nowhere dense in X. Moreover, a nonempty countable subset of X is a countable union of singletons in X.
Therefore, if A is a nonempty countable subset of X, and if X is dense in itself, then A is a countable union of nowhere dense subsets of X. Summing up: If a metric space X is dense in itself (i.e., has no isolated point), then every countable subset of it is of first category in X. For instance, Q is a (dense) subset of first category in R. Equivalently, if a metric space X has no isolated point, then every subset of second category in X is uncountable.

The following basic properties of sets of first category are (almost) immediate consequences of the definition. Note that assertions (a) and (b), but not assertion (c), in the following proposition still hold if we replace "sets of first category" by "nowhere dense sets".

Proposition 3.54. Let X be a metric space.

(a) A subset of a set of first category in X is of first category in X.
(b) The intersection of an arbitrary collection of subsets of X is a set of first category in X if at least one of the subsets is of first category in X.
(c) The union of a countable collection of sets of first category in X is a set of first category in X.

Proof. If B = ⋃n Bn and if A ⊆ B ⊆ X, then A = ⋃n An, with each An = Bn ∩ A ⊆ Bn, and so (An−)◦ ⊆ (Bn−)◦. Hence (a) holds true by the definitions of nowhere dense set and set of first category. Let {Aγ}γ∈Γ be an arbitrary collection of subsets of X. Since ⋂γ∈Γ Aγ ⊆ Aα for every α ∈ Γ, it follows by item (a) that ⋂γ∈Γ Aγ is of first category in X whenever at least one of the sets Aα in {Aγ}γ∈Γ is of first category in X. This proves assertion (b). If {An} is a countable collection of subsets of X, and if each An is a countable union of nowhere dense subsets of X, then ⋃n An is itself a countable union of nowhere dense subsets of X (recall: a countable union of a countable collection is again a countable collection — Corollary 1.11). Thus ⋃n An is a set of first category in X, which concludes the proof of assertion (c).

Example 3.V may suggest that sets of second category are particularly important. The next theorem, which plays a fundamental role in the theory of metric spaces, shows that they really are very important.

Theorem 3.55. (Baire Category Theorem). Every nonempty open subset of a complete metric space X is of second category in X.

Proof. Let {An}n∈N be an arbitrary countable collection of nowhere dense subsets of a metric space X and set A = ⋃n∈N An ⊆ X. Let U be an arbitrary nonempty open subset of X.

Claim. For each integer k ≥ 1 there exists a collection B1, . . . , Bk+1 of open balls Bn with radius εn ∈ (0, 1/n) such that: Bn− ⊂ U and Bn− ∩ An = ∅ for each n = 1, . . . , k+1, and Bn+1 ⊂ Bn for every n = 1, . . . , k.

Proof. Since each An is nowhere dense in X, it follows by Proposition 3.53 that there exist open balls B1 and B2 with centers at x1 and x2 and positive radii ε1 < 1 and ε2 < 1/2, respectively, such that Bi− ⊂ U and Bi− ∩ Ai = ∅ for i = 1, 2, and B2 ⊂ B1 (apply Proposition 3.53(c) first to U and then to the open set B1). Thus the claimed result holds for k = 1.
Suppose it holds for some k ≥ 1, and take 0 < εk+1 < min{εk, 1/(k+1)}. Proposition 3.53, applied to the nonempty open set Bk, ensures again the existence of an open ball Bk+1 with center at xk+1 and radius εk+1 such that Bk+1− ⊂ Bk (so that Bk+1− ⊂ U as well) and Bk+1− ∩ Ak+1 = ∅. It is plain that Bk+1 ⊂ Bk, and so the claimed result holds for k+1 whenever it holds for some k ≥ 1, which concludes the proof by induction.

Consider the collection {Bn}n∈N = ⋃k∈N {B1, . . . , Bk+1}. Since each open ball Bn = Bεn(xn) ⊂ U is such that 0 < εn < 1/n, it follows that the sequence {xn}n∈N of centers xn ∈ U is a Cauchy sequence in X (reason: for each ε > 0 take a positive integer nε > 1/ε so that, if n > m ≥ nε, then Bεn(xn) ⊂ Bεm(xm) with εm ≤ 1/m ≤ 1/nε < ε, and hence d(xm, xn) < ε). Now suppose the metric space X is complete. This ensures that the Cauchy sequence {xn}n∈N converges in X to, say, x ∈ X. Take an arbitrary integer i ≥ 1. Since x is the limit of the sequence {xn}n≥i (a subsequence of {xn}n∈N), and since {xn}n≥i ⊂ Bi
(Bn+1 ⊂ Bn for every n ∈ N), it follows that x ∈ Bi− (i.e., x is an adherent point of Bi). Therefore, x ∉ A because Bi− ∩ Ai = ∅ for every i ∈ N and A = ⋃i∈N Ai. However, x ∈ U because Bi− ⊂ U for all i ∈ N. Hence x ∈ U\A, and so U ≠ A. Summing up: If U is a nonempty open subset of a complete metric space X, and if A is a set of first category in X (i.e., if A is a countable union of nowhere dense sets in X), then U ≠ A. Conclusion: Every nonempty open subset of a complete metric space X is not a set of first category in X.

In particular, as a metric space is always open in itself, we get at once the following corollary of Theorem 3.55 on nonempty complete metric spaces.

Corollary 3.56. A complete metric space is of second category in itself.

Of course, Corollary 3.56 refers to nonempty complete metric spaces. As a matter of fact, the metric spaces in the next three results are clearly nonempty too, although, for simplicity, we shall drop "nonempty" from their statements.

Corollary 3.57. The complement of a set of first category in a complete metric space is a dense subset of second category in that space.

Proof. Let X be a nonempty complete metric space. The union of two sets of first category in X is a set of first category in X (Proposition 3.54). Since the union of a subset A of X and its complement X\A is the whole space X, it follows by Corollary 3.56 that A and X\A cannot both be of first category in X. Thus X\A is of second category in X whenever A is of first category in X. Moreover, if (X\A)− ≠ X, then A◦ is a nonempty open subset of X (Proposition 3.31), and so A◦ is a set of second category in X (Theorem 3.55), which implies that A is a set of second category in X (reason: if A is of first category in X, then A◦ is of first category in X because A◦ ⊆ A — Proposition 3.54). Thus, if A is of first category in X, then X\A is dense in X.
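A concrete instance of Corollary 3.57, sketched here as a supplementary illustration (the enumeration {qn} of the rationals is the only ingredient not fixed above): in the complete metric space R the set of irrational numbers is the complement of a set of first category. Indeed,

```latex
\mathbb{Q}=\bigcup_{n\in\mathbb{N}}\{q_n\}
\quad\text{with}\quad
\bigl(\{q_n\}^-\bigr)^\circ=\{q_n\}^\circ=\varnothing
\quad\text{for every } n\in\mathbb{N},
```

so Q is a countable union of nowhere dense subsets of R (Proposition 3.52, since R has no isolated point), and hence of first category in R. By Corollary 3.57 the set R\Q of irrationals is therefore dense and of second category in R; in particular, the irrationals cannot be written as a countable union of nowhere dense subsets of R.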
In other words, if X ≠ ∅ is a complete metric space, then every residual set in X is both dense in X and of second category in X.

Theorem 3.58. If a complete metric space is a countable union of closed sets, then at least one of them has nonempty interior.

Proof. According to Corollary 3.56, a nonempty complete metric space is not a countable union of nowhere dense subsets of it. Thus, if a nonempty complete metric space X is a countable union of subsets of X, then at least one of them is not nowhere dense in X (i.e., the closure of at least one of them has nonempty interior).

Theorem 3.58 is a particularly useful version of the Baire Category Theorem. A further version of it, which is the dual statement of Theorem 3.58, reads as follows. (This in fact is the classical Theorem of Baire.)
Theorem 3.59. Every countable intersection of open and dense subsets of a complete metric space X is dense in X.

Proof. Let {Un} be a countable collection of open and dense subsets of a nonempty complete metric space X. Set Vn = X\Un so that {Vn} is a countable collection of closed subsets of X with empty interior (recall: Un− = X means (X\Un)◦ = ∅ by Proposition 3.31). If (⋂n Un)− ≠ X, then X\(⋂n Un)− ≠ ∅, and hence X\⋂n Un ≠ ∅. However, X\⋂n Un = ⋃n(X\Un) — De Morgan laws — so that ⋃n Vn ≠ ∅, which implies (⋃n Vn)− ≠ ∅. Thus, according to Theorem 3.58, the nonempty subspace (⋃n Vn)− of X is not complete (reason: each Vn is a closed subset of (⋃n Vn)− — see Problem 3.38 — with Vn◦ = ∅). On the other hand, Corollary 3.41 ensures that (⋃n Vn)− is a complete subspace of the complete metric space X (since (⋃n Vn)− is a closed subset of X), which leads to a contradiction. Conclusion: (⋂n Un)− = X.

The Baire Category Theorem is a nonconstructive existence theorem. For instance, for an arbitrary countable collection {An} of nowhere dense sets in a nonempty complete metric space X, Corollary 3.57 asserts the existence of a dense set of points in X with the property that none of them lies in An for any n, but it does not tell us how to find those points. However, the unusual (and remarkable) fact about the Baire Category Theorem is that, while its hypothesis (completeness) has been defined in a metric space and is not a topological invariant (completeness is preserved by uniform homeomorphism but not by plain homeomorphism — see Example 3.U), its conclusion is of purely topological nature and is a topological invariant. For instance, the conclusion in Theorem 3.55 (being of second category) is a topological invariant in a general topological space. Indeed, if G: X → Y is a homeomorphism between topological spaces X and Y and A is an arbitrary subset of X, then it is easy to show that G(A)− = G(A−) and G(A)◦ = G(A◦).
Thus the property of being nowhere dense is a topological invariant (i.e., (A−)◦ = ∅ if and only if (G(A)−)◦ = ∅), and so is the property of being of first or second category (since G(A) = G(⋃n An) = ⋃n G(An) whenever A = ⋃n An). Such a purely topological conclusion suggests the following definition. A topological space is a Baire space if the conclusion of the classical Theorem of Baire holds on it. Precisely, a Baire space is a topological space X on which every countable intersection of open and dense subsets of X is dense in X. Thus Theorem 3.59 simply says that every complete metric space is a Baire space.

Example 3.W. We shall now unfold three further consequences of the Baire Category Theorem, each resulting from one of the above three versions of it.

(a) If A is a set of first category in a complete metric space X, then Corollary 3.57 says that (X\A)− = X; equivalently, A◦ = ∅. Conclusion: A set of first category in a complete metric space has empty interior. Corollary: A closed set of first category in a complete metric space is nowhere dense in that space: if A = A− is a set of first category in a complete metric space, then (A−)◦ = ∅.
(b) Recall that a set without isolated points (i.e., dense in itself) in a complete metric space may be countable (example: Q in R). Suppose A is a nonempty perfect subset of a complete metric space X, which means that A is a closed set without isolated points. If A is a countable set, then it is the countable union of all singletons in it. Since every point in A is not an isolated point of A, it follows by Proposition 3.52 that every singleton in A is nowhere dense in the subspace A, so that every singleton in A has empty interior in A (recall: a singleton in a metric space A is closed in A — Problem 3.37). Then A is the countable union of closed sets in A, all of them with empty interior in A, and therefore the subspace A is not complete according to Theorem 3.58. However, since A is a closed subset of a complete metric space X, it follows by Corollary 3.41 that the subspace A is complete. Thus the assumption that A is countable leads to a contradiction. Conclusion: A nonempty perfect set in a complete metric space is uncountable.

(c) A subset of a metric space X is a Gδ (read: G-delta) if it is a countable intersection of open subsets of X, and an Fσ (read: F-sigma) if it is a countable union of closed subsets of X. First observe that, if the complement X\A of a subset A of X is a countable union of subsets of X, then A includes a Gδ. In fact, if X\A = ⋃n Cn, then ⋂n(X\Cn−) ⊆ ⋂n(X\Cn) = X\⋃n Cn = X\(X\A) = A, and so A includes a Gδ; viz., ⋂n(X\Cn−). Moreover, if each Cn is nowhere dense in X (i.e., (Cn−)◦ = ∅), then (X\Cn−)− = X (see Proposition 3.31) so that X\Cn− is open and dense in X for every index n. Thus, according to Theorem 3.59, ⋂n(X\Cn−) is a dense Gδ in X whenever X is a complete metric space. Summing up: If X\A is of first category (i.e., a countable union of nowhere dense subsets of X) in a complete metric space X, then A includes a dense Gδ.
Conversely, if a subset A of a metric space X includes a Gδ, say ⋂n Un ⊆ A where each Un is open in X, then X\A ⊆ X\⋂n Un = ⋃n(X\Un). If the Gδ is dense in X (i.e., (⋂n Un)− = X), then (X\⋂n Un)◦ = ∅ (see Proposition 3.31 again) so that [(X\Un)−]◦ = (X\Un)◦ ⊆ (X\⋂m Um)◦ = ∅, and so each X\Un is nowhere dense in X. Thus ⋃n(X\Un) is a set of first category in X, which implies that X\A is of first category in X as well (because every subset of a set of first category is itself of first category — Proposition 3.54). Summing up: If a set A in a metric space X includes a dense Gδ, then X\A is of first category in X. Conclusion: A subset of a complete metric space is a residual set (i.e., its complement is of first category) if and only if it includes a dense Gδ. (This generalizes Corollary 3.57.) Dually, a subset of a complete metric space is of first category if and only if it is included in an Fσ with empty interior. (This generalizes the results of item (a).)
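A classical application of item (c), sketched here as a supplementary illustration: Q is an Fσ in R (a countable union of singletons, which are closed), but Q is not a Gδ in R. Indeed, if Q were a Gδ, say

```latex
\mathbb{Q}=\bigcap_{n\in\mathbb{N}}U_n
\quad\text{with each } U_n \text{ open in } \mathbb{R},
\qquad\text{then}\qquad
\bigcap_{n\in\mathbb{N}}U_n\;\cap\;\bigcap_{q\in\mathbb{Q}}\bigl(\mathbb{R}\setminus\{q\}\bigr)=\varnothing
```

would exhibit an empty countable intersection of open and dense subsets of the complete metric space R (each Un includes the dense set Q and so is dense, and each R\{q} is open and dense), contradicting Theorem 3.59.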
3.10 Compact Sets

Recall that a collection 𝒜 of nonempty subsets of a set X is a covering of a subset A of X (or 𝒜 covers A ⊆ X) if A ⊆ ⋃𝒜. If 𝒜 is a covering of A, then
any subcollection of 𝒜 that also covers A is a subcovering of 𝒜. A covering of A comprising only open subsets of X is called an open covering.

Definition 3.60. A metric space X is compact if every open covering of X includes a finite subcovering. A subset A of a metric space X is compact if it is compact as a subspace of X.

The notion of compactness plays an extremely important role in general topology. Note that any topology T on a metric space X clearly is an open covering of X which trivially has a finite subcovering; namely, the collection {X} consisting of X alone. However, the definition of a compact space demands that every open covering of it has a finite subcovering. The idea behind the definition of a compact space is that even open coverings made up of "very small" open sets have a finite subcovering. Also note that the definition of a compact subspace A of a metric space X is given in terms of the relative topology on A: an open covering of the subspace A consists of relatively open subsets of A. The next elementary result says that this can be equally defined in terms of the topology on X.

Proposition 3.61. A subset A of a metric space X is compact if and only if every covering of A made up of open subsets of X has a finite subcovering.

Proof. If 𝒰 is a covering of A (i.e., A ⊆ ⋃𝒰) consisting of open subsets of X, then {U ∩ A: U ∈ 𝒰} is an open covering of the subspace A (see Problem 3.38). Conversely, every open covering 𝒰A of the subspace A consisting of relatively open subsets of A is of the form {U ∩ A: U ∈ 𝒰} for some covering 𝒰 of A consisting of open subsets of X (reason: UA ∈ 𝒰A if and only if UA = A ∩ U for some open subset U of X — see Problem 3.38 again).

The properties of being a closed subset and of being a compact subset of a metric space are certainly different from each other (trivial example: every metric space is closed in itself). However, "inside" a compact metric space these properties coincide.

Theorem 3.62.
Let A be a subset of a metric space X.

(a) If A is compact, then A is closed in X.
(b) If X is compact and if A is closed in X, then A is compact.

Proof. (a) Let A be a compact subset of a metric space X. If either A = ∅ or A = X, then A is trivially closed in X. Thus suppose ∅ ≠ X\A ≠ X and take an arbitrary point x in X\A. Since x is distinct from every point in A, it follows that for each a ∈ A there exists an open neighborhood Aa of a and an open neighborhood Xa of x such that Aa ∩ Xa = ∅ (reason: every metric space is a Hausdorff space — see Problem 3.37). But A ⊆ ⋃a∈A Aa so that {Aa}a∈A is a covering of A consisting of nonempty open subsets of X. If A is compact, then there is a finite subset of A, say a1, . . . , an, such that A ⊆ Aa1 ∪ · · · ∪ Aan (Proposition
3.61). Set Ux = Xa1 ∩ · · · ∩ Xan, which is an open neighborhood of x (recall: each Xai is an open neighborhood of x). Since Aai ∩ Ux = ∅ for each i, it follows that (Aa1 ∪ · · · ∪ Aan) ∩ Ux = ∅, and hence A ∩ Ux = ∅. Therefore Ux ⊆ X\A. Thus X\A is open in X (it includes an open neighborhood of each one of its points).

(b) Let A be a closed subset of a compact metric space X. Take an arbitrary covering of A, say 𝒰A, consisting of open subsets of X. Thus 𝒰A ∪ {X\A} is an open covering of X. As X is compact, this covering includes a finite subcovering, say 𝒰, so that 𝒰\{X\A} ⊆ 𝒰A is a finite subcovering of 𝒰A. Therefore, every covering of A consisting of open subsets of X has a finite subcovering, and hence (Proposition 3.61) A is compact.

Corollary 3.63. Let X be a compact metric space. A subset A of X is closed in X if and only if it is compact.

A subset of a metric space X is relatively compact (or conditionally compact) if it has a compact closure. It is clear by Corollary 3.63 that every subset of a compact metric space is relatively compact. Another important property of a compact set is that the continuous image of a compact set is compact.

Theorem 3.64. Let F: X → Y be a continuous mapping of a metric space X into a metric space Y.

(a) If A is a compact subset of X, then F(A) is compact in Y.
(b) If X is compact, then F is a closed mapping.
(c) If X is compact and F is injective, then F is a homeomorphism of X onto F(X).

Proof. (a) Let 𝒰 be a covering of F(A) (i.e., F(A) ⊆ ⋃U∈𝒰 U) consisting of open subsets U of Y. If F is continuous, then F−1(U) is an open subset of X for every U ∈ 𝒰 according to Theorem 3.12. Set F−1(𝒰) = {F−1(U): U ∈ 𝒰}, a collection of open subsets of X. Clearly (see Problem 1.2), A ⊆ F−1(F(A)) ⊆
F−1(⋃U∈𝒰 U) = ⋃U∈𝒰 F−1(U), so that F−1(𝒰) is a covering of A made up of open subsets of X. If A is compact, then (Proposition 3.61) there exists a finite subcollection of F−1(𝒰) covering A; that is, there exist U1, . . . , Un in 𝒰 such that A ⊆ F−1(U1) ∪ · · · ∪ F−1(Un) ⊆ X. Thus F(A) ⊆ F(F−1(U1) ∪ · · · ∪ F−1(Un)) ⊆ U1 ∪ · · · ∪ Un ⊆ Y (have another look at Problem 1.2), and so F(A) is compact by Proposition 3.61.

(b) If X is compact and if A is a closed subset of X, then A is compact by Theorem 3.62(b). Hence F(A) is a compact subset of Y by item (a), so that F(A) is closed in Y according to Theorem 3.62(a).

(c) If X is compact and F is injective, then F is a continuous invertible closed mapping of X onto F(X) by item (b), and therefore a homeomorphism of X onto F(X) (Theorem 3.24).
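To see why compactness of the domain cannot be dropped in Theorem 3.64(c), consider the following standard example (added here for illustration): the mapping

```latex
F\colon[0,2\pi)\to S^1=\{(a,b)\in\mathbb{R}^2\colon a^2+b^2=1\},
\qquad F(t)=(\cos t,\ \sin t),
```

is continuous, injective, and maps [0, 2π) onto the circle S1, yet it is not a homeomorphism: F−1 is discontinuous at the point (1, 0), since points of S1 just below the positive horizontal axis have F−1-images near 2π, far from F−1(1, 0) = 0. There is no contradiction with Theorem 3.64(c) because the domain [0, 2π) is not compact.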
Since compactness is preserved under continuous mappings, it is preserved by homeomorphisms, and so compactness is a topological invariant. Moreover, a one-to-one continuous correspondence between compact metric spaces is a homeomorphism. These are straightforward corollaries of Theorem 3.64.

Corollary 3.65. If X and Y are homeomorphic metric spaces, then one is compact if and only if the other is.

Corollary 3.66. If X and Y are compact metric spaces, then every injective continuous mapping of X onto Y is a homeomorphism.

Probably the reader has already noticed two important features: the metric has not played its role yet in this section, and the concepts of completeness and compactness share some common properties (e.g., compare Theorems 3.40 and 3.62). Indeed, the compactness proofs so far apply to topological spaces that are not necessarily metrizable. Actually, they all apply to Hausdorff spaces (metrizable or not), and Theorems 3.62(b) and 3.64(a) do hold for general topological spaces (not necessarily Hausdorff). As for the connection between completeness and compactness, first note that the notion of completeness introduced in Section 3.7 needs a metric. Moreover, as we have just seen, compactness is a topological invariant while completeness is not preserved by plain homeomorphism (just by uniform homeomorphism — Theorem 3.44 and Example 3.U). However, as we shall see in the next section, in a metric space compactness implies completeness.

Some of the most important results of mathematical analysis deal with continuous mappings on compact metric spaces. Theorem 3.64 is a special instance of such results that leads to many relevant corollaries (e.g., Corollary 3.85 in the next section is an extremely useful corollary of Theorem 3.64). Another important result along this line reads as follows.

Theorem 3.67. Every continuous mapping of a compact metric space into an arbitrary metric space is uniformly continuous.

Proof.
Let (X, dX) and (Y, dY) be metric spaces and take an arbitrary real number ε > 0. If F: X → Y is a continuous mapping, then for each x ∈ X there exists a real number δε(x) > 0 such that

dX(x′, x) < 2δε(x)  implies  dY(F(x′), F(x)) < ε.
Let Bδε(x)(x) be the open ball with center at the point x and radius δε(x). Consider the collection {Bδε(x)(x)}x∈X, which surely covers X (i.e., X = ⋃x∈X Bδε(x)(x)). If X is compact (Definition 3.60), then this covering of X includes a finite subcovering, say Bδε(x1)(x1), . . . , Bδε(xn)(xn) with xi ∈ X for i = 1, . . . , n. Take any x′ ∈ X. Since the balls Bδε(x1)(x1), . . . , Bδε(xn)(xn) cover X,

dX(xj, x′) < δε(xj)
for some j = 1, . . . , n (i.e., since the balls cover X, every point of X belongs to some ball Bδε(xj)(xj)). Thus dY(F(xj), F(x′)) < ε. Set δε = min{δε(x1), . . . , δε(xn)}, which is a positive number. If dX(x′, x) < δε, then dX(x, xj) ≤ dX(x, x′) + dX(x′, xj) < δε + δε(xj) ≤ 2δε(xj) by the triangle inequality, and hence dY(F(x), F(xj)) < ε. Therefore, since dY(F(x), F(x′)) ≤ dY(F(x), F(xj)) + dY(F(xj), F(x′)), it follows that dY(F(x′), F(x)) < 2ε. Conclusion: Given an arbitrary ε > 0 there exists δε > 0 such that

dX(x′, x) < δε  implies  dY(F(x′), F(x)) < 2ε
for all x, x′ ∈ X. That is, F: X → Y is uniformly continuous.
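Compactness is essential in Theorem 3.67. A standard counterexample on a noncompact domain, sketched here for illustration:

```latex
f\colon(0,1]\to\mathbb{R},\qquad f(x)=\frac1x,\qquad
\Bigl|\tfrac1n-\tfrac1{n+1}\Bigr|\longrightarrow 0
\quad\text{while}\quad
\Bigl|f\bigl(\tfrac1n\bigr)-f\bigl(\tfrac1{n+1}\bigr)\Bigr|=1
\ \text{ for every } n\in\mathbb{N}.
```

Thus f is continuous on (0, 1] but not uniformly continuous: for ε = 1, no single δε > 0 works for all pairs of points, since the pairs 1/n, 1/(n+1) get arbitrarily close while their images stay a distance 1 apart. The domain (0, 1] is, of course, not compact.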
We shall now investigate alternative characterizations of compact sets that, unlike the fundamental concept posed in Definition 3.60, will be restricted to metrizable spaces.

Definition 3.68. Let A be a subset of a metric space (X, d). A subset Aε of A is an ε-net for A if for every point x of A there exists a point y in Aε such that d(x, y) < ε. A subset A of X is totally bounded in (X, d) if for every real number ε > 0 there exists a finite ε-net for A.

Proposition 3.69. Let A be a subset of a metric space X. The following assertions are equivalent.

(a) A is totally bounded.
(b) For every real number ε > 0 there exists a finite partition of A into sets of diameter less than ε.

Proof. Take any ε > 0 and set ρ = ε/2. If there is a finite ρ-net Aρ for A, then the finite collection of open balls {Bρ(y)}y∈Aρ covers A. That is, A ⊆ ⋃y∈Aρ Bρ(y), since every x ∈ A belongs to Bρ(y) for some y ∈ Aρ whenever Aρ is a ρ-net for A. Set Ay = Bρ(y) ∩ A for each y ∈ Aρ so that A = ⋃y∈Aρ Ay. A disjointification (Problem 1.18) of the finite collection {Ay}y∈Aρ is a finite partition of A into sets of diameter not greater than maxy∈Aρ diam(Bρ(y) ∩ A) ≤ 2ρ = ε. Thus (a) implies (b) according to Definition 3.68. On the other hand, if {Ai}i=1,...,n is a finite partition of A into (nonempty) sets of diameter less than ε, then by taking one point ai of each set Ai we get a finite set {ai}i=1,...,n which is an ε-net for A. Therefore (b) implies (a).
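For a concrete instance of Definition 3.68 (a worked example added here): the interval [0, 1] with the usual metric is totally bounded. Given ε > 0, the finite set of equally spaced points

```latex
A_\varepsilon=\Bigl\{\tfrac{k}{n}\colon k=0,1,\dots,n\Bigr\}\subset[0,1],
\qquad n>\tfrac1\varepsilon ,
```

is a finite ε-net for [0, 1]: every x ∈ [0, 1] lies within a distance 1/(2n) < ε of its nearest grid point k/n. The same device, applied coordinatewise, shows that every bounded subset of Rⁿ is totally bounded.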
Note that every finite subset of a metric space X is totally bounded: it is a finite ε-net for itself for every positive ε. In particular, the empty set of X is totally bounded (vacuously: the empty set is an ε-net for itself for every positive ε). It is also readily verified that every subset of a totally bounded set is totally bounded (indeed, if Aε is a finite ε-net for A and B ⊆ A, then picking one point of B in each ball Bε(y), y ∈ Aε, that meets B yields a finite 2ε-net for B; since ε > 0 is arbitrary, B is totally bounded). Moreover, the closure of a totally bounded set is again totally bounded (reason: Aε is a 2ε-net for A− whenever Aε is an ε-net for A).

Proposition 3.70. Let A be a subset of a metric space X. If A has a finite ε-net for some ε > 0, then A is bounded in X.

Proof. Suppose a nonempty subset A of a metric space (X, d) has a finite ε-net Aε for some ε > 0. Take x, y ∈ A arbitrary, so that there exist a, b ∈ Aε for which d(x, a) < ε and d(y, b) < ε. So d(x, y) ≤ d(x, a) + d(a, b) + d(b, y) < d(a, b) + 2ε by the triangle inequality. Hence diam(A) ≤ diam(Aε) + 2ε. Therefore, since diam(Aε) < ∞ (Aε is a finite set), it follows that diam(A) < ∞.

Corollary 3.71. Every totally bounded set is bounded.

Example 3.X. The converse fails. That is, a bounded set is not necessarily totally bounded. For instance, the closed unit ball B1[0] centered at the null sequence 0 in the metric space (ℓ₊², d2) of Example 3.B is obviously bounded in (ℓ₊², d2); actually, diam(B1[0]) = 2. We show that there is no finite ε-net for B1[0] with ε ≤ √2/2, and hence B1[0] is not totally bounded in (ℓ₊², d2). Indeed, consider the countable subset E = {ej}j∈N of B1[0] made up of all scalar-valued sequences ej = {δjk}k∈N; that is, each sequence ej has just one nonzero entry (equal to 1) at the jth position. If Aε is an ε-net for B1[0], then Aε contains a point within a distance less than ε of each ej in E, and hence Aε must have a point in each open ball Bε(ej).
Since d2(ei, ej) = √2 whenever i ≠ j, it follows that Bε(ei) ∩ Bε(ej) = ∅ (i.e., Bε(ej)\Bε(ei) = Bε(ej)) whenever i ≠ j and ε ≤ √2/2. Thus, if ε ≤ √2/2, then for each ej ∈ E there exists bj ∈ Aε ∩ Bε(ej), and bj ∉ Bε(ei) for every i ≠ j. This establishes an injective function from E to Aε, and therefore #E ≤ #Aε. Thus Aε is at least countably infinite (i.e., ℵ0 ≤ #Aε). Conclusion: B1[0] is a closed and bounded set in the complete metric space (ℓ₊², d2) that is not totally bounded in (ℓ₊², d2).

Proposition 3.72. A totally bounded metric space is separable.

Proof. Suppose X is a totally bounded metric space. For each positive integer n let Xn be a finite 1/n-net for X. Set A = ⋃n∈N Xn, which is a countable subset of X (Corollary 1.11) and dense in X. Thus X is separable. To verify that A is dense in X proceed as follows. Take an arbitrary x in X. For each positive integer n there exists xn ∈ Xn such that d(x, xn) < 1/n (since Xn is a 1/n-net for X), and the A-valued sequence {xn}n∈N converges in X to x. Thus A− = X by Proposition 3.32.
Lemma 3.73. A set A in a metric space X is totally bounded if and only if every A-valued sequence has a Cauchy subsequence.

Proof. If A is a finite set, then the result holds trivially (in this case every A-valued sequence has a constant subsequence). Thus suppose A is an infinite set and let d denote the metric on X.

(a) We shall say that an A-valued sequence {xk}k∈N0 has Property Pn(ε), for some integer n ∈ N and some ε > 0, if d(xj, xk) ≥ ε for every pair {j, k} of distinct integers j, k = 0, 1, . . . , n.

Claim. If A is not totally bounded, then there exist an ε > 0 and an A-valued sequence that has Property Pn(ε) for every n ∈ N.

Proof. Suppose the infinite set A is not totally bounded and let ε be any positive real number for which there is no finite ε-net for A. In particular (and trivially), no singleton in A is an ε-net for A, and hence there exists a pair of points in A, say x0 and x1, for which d(x0, x1) ≥ ε. Thus every A-valued sequence whose first two entries coincide with x0 and x1 has Property P1(ε). Suppose there exists an A-valued sequence {xk}k∈N0 that has Property Pn(ε) for some integer n ∈ N, so that d(xj, xk) ≥ ε for every j, k = 0, 1, . . . , n such that j ≠ k. Since the set {xk}k=0,...,n is not an ε-net for A (recall that there is no finite ε-net for A), it follows that there exists a point in A, say x′n+1, for which d(xk, x′n+1) ≥ ε for every k = 0, 1, . . . , n. Replace the entry xn+1 of the sequence with x′n+1, so that the resulting sequence has Property Pn+1(ε). This concludes the proof by induction.

If an A-valued sequence {xk}k∈N0 has Property Pn(ε) for every n ∈ N, then d(xj, xk) ≥ ε for all distinct nonnegative integers j and k, and hence it has no Cauchy subsequence. Conclusion: If A is not totally bounded, then there exists an A-valued sequence that has no Cauchy subsequence. Equivalently, if every A-valued sequence has a Cauchy subsequence, then A is totally bounded.
(b) Conversely, suppose A is totally bounded and let {xk}k∈N be an arbitrary A-valued sequence. According to Proposition 3.69 there exists a finite partition 𝒜 of A into sets of diameter less than 1. Since 𝒜 is a finite partition of A, it follows that at least one of its sets, say A1 ⊆ A, has the property that the (infinite) A-valued sequence x = {xk}k∈N has an (infinite) subsequence, say x1 = {x(1)k}k∈N, whose entries lie in A1. Note that A1 is totally bounded (because A is). Thus there exists a finite partition 𝒜1 of A1, consisting of subsets of A1 with diameter less than 1/2, such that at least one of its sets, say A2 ⊆ A1 ⊆ A, has the property that the A1-valued sequence x1 = {x(1)k}k∈N has a subsequence, say x2 = {x(2)k}k∈N, whose entries lie in A2. This leads to the inductive construction of a decreasing sequence {An}n∈N of subsets of A with diam(An) < 1/n, each including a subsequence xn = {x(n)k}k∈N of the A-valued sequence {xk}k∈N, for every n ∈ N. Moreover, the sequence of sequences {xn}n∈N (i.e., the sequence whose entries are the An-valued sequences xn for each n ∈ N) has the property that xn+1 is a
subsequence of xn for each n ∈ N. The kernel of the proof relies on the diagonal procedure (Section 1.9). Consider the sequence {x(n)n}n∈N, where each x(n)n is the nth entry of xn, which has the following properties: (1) it is an A-valued sequence (each x(n)n lies in An ⊆ A), (2) it is a subsequence of {xk}k∈N (since xn+1 is a subsequence of xn for each n ∈ N and x1 is a subsequence of x = {xk}k∈N), and (3) d(x(m)m, x(n)n) < 1/m if m < n (because x(m)m ∈ Am, x(n)n ∈ An, An ⊆ Am, and diam(Am) < 1/m for every m ∈ N). Therefore, the "diagonal" sequence {x(n)n}n∈N is a Cauchy subsequence of {xk}k∈N.

Total boundedness is not a topological invariant, but it is preserved by uniform homeomorphism. In fact, the same example that shows that completeness is not preserved by plain homeomorphism (Example 3.U) also shows that total boundedness is not preserved by plain homeomorphism (the sets (0, 1] and [1, ∞) are homeomorphic, but (0, 1] is totally bounded — see Example 3.Y below — while [1, ∞) is not even bounded).

Theorem 3.74. Let F: X → Y be a uniformly continuous mapping of a metric space X into a metric space Y. If A is a totally bounded subset of X, then F(A) is totally bounded in Y.

Proof. Let A be a nonempty subset of X and consider its image F(A) in Y under a mapping F: X → Y (if A is empty the result is trivially verified). Take an arbitrary F(A)-valued sequence {yn} and consider any A-valued sequence {xn} such that yn = F(xn) for every n. If A is totally bounded, then Lemma 3.73 ensures that {xn} has a Cauchy subsequence, say {xnk}. If F: X → Y is uniformly continuous, then {F(xnk)} is a Cauchy sequence in Y (Lemma 3.43) which is a subsequence of {yn}. Thus every F(A)-valued sequence has a Cauchy subsequence; that is, F(A) is totally bounded by Lemma 3.73. In particular, if F: X → Y is a surjective uniformly continuous mapping of a totally bounded metric space X onto a metric space Y, then Y must be totally bounded.
Corollary 3.75. If X and Y are uniformly homeomorphic metric spaces, then one is totally bounded if and only if the other is. Total boundedness is sometimes referred to as precompactness. Lemma 3.73 links completeness to compactness in a metric space. It actually leads to the proof that a metric space is compact if and only if it is complete and totally bounded . We shall prove this important assertion in the next section.
3.11 Sequential Compactness

The notions of compactness and total boundedness can be thought of as topological counterparts of the set-theoretic notion of "finiteness" in the sense that they may suggest "approximate finiteness" (see Propositions 3.61 and 3.69).
Definition 3.76. A metric space X is sequentially compact if every X-valued sequence has a subsequence that converges in X. A subset A of a metric space X is sequentially compact if it is sequentially compact as a subspace of X. Proposition 3.77. Let A be a subset of a metric space X. The following assertions are equivalent. (a) A is sequentially compact. (b) Every infinite subset of A has at least one accumulation point in A. Proof. If A is empty, then the result holds trivially (it is sequentially compact because there is no A-valued sequence, and it satisfies condition (b) because it includes no infinite subset). Thus let A be a nonempty set. Recall that the limits of the convergent subsequences of a given sequence are precisely the accumulation points of its range. Proof of (a)⇒(b). If B is an infinite subset of A, then B includes a countably infinite set, and so there exists a B-valued sequence {bn}n∈N of distinct points. If A is sequentially compact, then this A-valued sequence has a subsequence that converges in X to a point a ∈ A (Definition 3.76). If bn ≠ a for all n ∈ N, then {bn}n∈N is a B\{a}-valued sequence of distinct points that converges in X to a, and therefore a ∈ A is an accumulation point of B (Proposition 3.28). If bm = a for some m ∈ N, then remove this (unique) point from {bn}n∈N and get a B\{a}-valued sequence of distinct points that converges in X to a, so that a ∈ A is again an accumulation point of B. Conclusion: (a) implies (b). Proof of (b)⇒(a). Let A be a subset of X for which assertion (b) holds true. In particular, every countably infinite subset of A has an accumulation point. Then every (infinite) A-valued sequence has a convergent subsequence: if the range of the sequence is finite, then some value repeats infinitely often and yields a constant (hence convergent) subsequence; if the range is infinite, then it has an accumulation point a in A, and a is the limit of some subsequence (Proposition 3.28). Hence A is sequentially compact. That is, (b) implies (a). A subset A of a metric space X (which may be X itself) is said to have the Bolzano–Weierstrass Property if it satisfies condition (b) of Proposition 3.77.
Then a metric space is sequentially compact if and only if it has the Bolzano–Weierstrass Property. Note: Every finite subset of a metric space is sequentially compact, since it includes no infinite subset, and so it has the Bolzano–Weierstrass Property. In particular, the empty set is sequentially compact. Theorem 3.78. A metric space is sequentially compact if and only if it is totally bounded and complete. Proof. Let {xn}n∈N be an arbitrary X-valued sequence. If the metric space X is both totally bounded and complete, then {xn}n∈N has a Cauchy subsequence (because X is totally bounded; Lemma 3.73), which converges in X (because X is complete). Thus X is sequentially compact (Definition 3.76). On the other hand, sequential compactness clearly implies total boundedness (see Definition 3.76, Proposition 3.39(a), and Lemma 3.73). Moreover, if X is sequentially compact and if {xn}n∈N is an X-valued Cauchy sequence, then this sequence has a convergent subsequence (Definition 3.76), and hence {xn}n∈N converges in X (Proposition 3.39(c)). Therefore, in a sequentially compact metric space X, every Cauchy sequence converges; that is, X is complete. The next lemma establishes another necessary and sufficient condition for sequential compactness. We shall apply this lemma to prove the equivalence between compactness and sequential compactness in a metric space. Lemma 3.79. (Cantor). A metric space X is sequentially compact if and only if every decreasing sequence {Vn}n∈N of nonempty closed subsets of X has a nonempty intersection (i.e., is such that ⋂_{n∈N} Vn ≠ ∅). Proof. (a) Let {Vn}n∈N be a decreasing sequence (i.e., Vn+1 ⊆ Vn for every n ∈ N) of nonempty closed subsets of a metric space X. For each n ∈ N let vn be a point of Vn and consider the X-valued sequence {vn}n∈N. If X is sequentially compact, then {vn}n∈N has a subsequence that converges in X to, say, v ∈ X (Definition 3.76). Now take an arbitrary m ∈ N and note that this convergent subsequence is eventually in Vm (because {Vn}n∈N is decreasing), and hence it has a Vm-valued subsequence that converges in X to v (Proposition 3.5). Since Vm is closed in X, it follows by the Closed Set Theorem that v ∈ Vm (Theorem 3.30). Therefore, as m is arbitrary, v ∈ ⋂_{m∈N} Vm. (b) Conversely, let {xk}k∈N be an arbitrary X-valued sequence and set Xn = {xk ∈ X: k ≥ n} for each n ∈ N. Note that {Xn}n∈N is a decreasing sequence of nonempty subsets of X, and so is the sequence of closed subsets of X, {Xn−}n∈N, consisting of the closures of each Xn. By hypothesis ⋂_{n∈N} Xn− ≠ ∅, and so there exists x ∈ Xn− for all n ∈ N. Take an arbitrary real number ε > 0 and consider the open ball Bε(x). Since Bε(x) ∩ Xn ≠ ∅ for every n ∈ N (reason: set A = Xn and B = Xn− in Proposition 3.32(b)), it follows that for every n ∈ N there exist integers k ≥ n for which xk ∈ Bε(x).
Thus every nonempty open ball centered at x meets the range of the sequence {xk}k∈N infinitely often, and so x is the limit of some convergent subsequence of {xk}k∈N (Proposition 3.28). Conclusion: Every X-valued sequence has a convergent subsequence, which means that X is sequentially compact. Theorem 3.80. A metric space is compact if and only if it is sequentially compact. Proof. (a) Suppose X is a compact metric space (Definition 3.60). Let {Vn}_{n=1}^∞ be an arbitrary decreasing sequence of nonempty closed subsets of X. Set Un = X\Vn for each n ∈ N so that {Un}_{n=1}^∞ is an increasing sequence of proper open subsets of X. If {Un}_{n=1}^∞ covered X (i.e., if ⋃_{n=1}^∞ Un = X), then Um = ⋃_{n=1}^m Un = X for some m ∈ N (since {Un}_{n=1}^∞ is increasing and X is compact), which would contradict the fact that Un ≠ X for every n ∈ N. Outcome: ⋃_{n=1}^∞ Un ≠ X and hence (De Morgan laws) ⋂_{n=1}^∞ Vn ≠ ∅. Therefore X is sequentially compact by Lemma 3.79.
(b) On the other hand, suppose X is a sequentially compact metric space. Since X is separable (Proposition 3.72 and Theorem 3.78), it follows by Theorem 3.35 that X has a countable base B of open subsets of X. Let U be an arbitrary open covering of X. Claim. There exists a countable subcollection U′ of U that covers X. Proof. For each U ∈ U set BU = {B ∈ B: B ⊆ U}. Since B is a base for X, and U is an open subset of X, we get by the definition of base that U = ⋃BU. The collection B′ = ⋃_{U∈U} BU of open subsets of X has the following properties: #B′ ≤ #B and ⋃U ⊆ ⋃B′. Indeed, since BU ⊆ B for every U ∈ U, it follows that ⋃_{U∈U} BU ⊆ B. Thus B′ ⊆ B so that #B′ ≤ #B. Moreover, if U is an arbitrary set in U, then U = ⋃BU = ⋃_{B∈BU} B ⊆ ⋃_{B∈B′} B = ⋃B′, and hence ⋃U ⊆ ⋃B′. Another property of the collection B′ is that every set in B′ is included in some set in U (reason: if B′ ∈ B′ = ⋃_{U∈U} BU, then B′ ∈ BU so that B′ ⊆ ⋃BU = U for some U ∈ U). For each set B′ in B′ take one set U in U that includes B′, and consider the subcollection U′ of U consisting of all those sets U. The very construction of U′ establishes a surjective map of B′ onto U′. Thus #U′ ≤ #B′ and ⋃B′ ⊆ ⋃U′. Therefore, by transitivity,

#U′ ≤ #B and ⋃U ⊆ ⋃U′.

Conclusion: U′ is a countable subcollection of U (because B is a countable base for X) which covers X (because U covers X). If U′ is finite, then it is itself a finite subcovering of U so that X is compact. If U′ is countably infinite, then it can be indexed by N so that U′ = {Un}_{n=1}^∞, where each Un is in U. Set Vn = X\⋃_{i=1}^n Ui for each n ∈ N so that {Vn}_{n=1}^∞ is a decreasing sequence of closed subsets of X. Since ⋃U′ = ⋃_{n=1}^∞ Un = X (because U′ covers X), it follows that ⋂_{n=1}^∞ Vn = ∅. Thus, according to Lemma 3.79, at least one of the sets in {Vn}_{n=1}^∞, say Vm, must be empty (because X is sequentially compact). Then ⋂_{n=1}^m Vn = ∅, and so ⋃_{n=1}^m Un = X. Conclusion: U includes a finite subcovering, viz., {Un}_{n=1}^m, so that X is compact. The theorems have been proved. Let us now harvest the corollaries. Corollary 3.81. If X is a metric space, then the following assertions are pairwise equivalent. (a) X is compact. (b) X is sequentially compact. (c) X is complete and totally bounded.
Proof. Theorems 3.78 and 3.80.
As we have already observed, completeness and total boundedness are preserved by uniform homeomorphisms but not by plain homeomorphisms, whereas compactness is preserved by plain homeomorphisms. Thus completeness and total boundedness are not topological invariants. However, when taken together they mean compactness, which is a topological invariant. Corollary 3.82. A compact subset of any metric space is closed and bounded . Proof. Theorem 3.62(a) and Corollaries 3.71 and 3.81.
Recall that the converse fails. Indeed, we exhibited in Example 3.X a closed and bounded subset of a (complete) metric space that is not totally bounded, and hence not compact. Theorem 3.83. (Heine–Borel). A subset of Rⁿ is compact if and only if it is closed and bounded. Proof. The condition is clearly necessary by Corollary 3.82. We shall prove that it is also sufficient. Consider the real line R equipped with its usual metric. Let Vρ be any nondegenerate closed and bounded interval, say Vρ = [α, α + ρ] for some real number α and some ρ > 0. Take an arbitrary ε > 0 and let nε be a positive integer large enough so that ρ < (nε + 1)ε/2. For each integer k = 0, 1, ..., nε consider the interval Ak = [α + kε/2, α + (k + 1)ε/2) of diameter ε/2. Since Aj ∩ Ak = ∅ whenever j ≠ k, and Vρ ⊂ [α, α + (nε + 1)ε/2) = ⋃_{k=0}^{nε} Ak, it follows that {Ak ∩ Vρ}_{k=0}^{nε} is a finite partition of Vρ into sets of diameter less than ε. Thus every closed and bounded interval of the real line is totally bounded (Proposition 3.69). Now equip Rⁿ with any of the metrics d∞ or dp for some p ≥ 1 as in Example 3.A (recall: these are uniformly equivalent metrics on Rⁿ; Problem 3.33). Take an arbitrary bounded subset B of Rⁿ and consider a closed interval Vρ of diameter ρ = diam(B) such that B ⊆ Vρⁿ. Since Vρ is totally bounded in R, it follows by Problem 3.64(a) that the Cartesian product Vρⁿ is totally bounded in Rⁿ. Hence, as a subset of a totally bounded set, B is totally bounded. Conclusion: Every bounded subset of Rⁿ is totally bounded. Moreover, since Rⁿ is a complete metric space (when equipped with any of these metrics; Example 3.R(a)), it follows by Theorem 3.40(b) that every closed subset of Rⁿ is a complete subspace of Rⁿ. Therefore every closed and bounded subset of Rⁿ is compact (Corollary 3.81).
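The ε/2-interval construction in the proof above is easy to make concrete. A minimal Python sketch (the helper name `finite_eps_net` is ours, not the book's), exhibiting a finite ε-net for a closed and bounded interval:

```python
# Build the finite partition {A_k} from the proof of Theorem 3.83:
# A_k = [alpha + k*eps/2, alpha + (k+1)*eps/2), k = 0, 1, ..., n_eps,
# which covers V_rho = [alpha, alpha + rho] by sets of diameter eps/2 < eps.
import math

def finite_eps_net(alpha, rho, eps):
    """Return the left endpoints of the intervals A_k covering [alpha, alpha + rho]."""
    n_eps = math.ceil(2 * rho / eps)      # so that rho < (n_eps + 1) * eps / 2
    return [alpha + k * eps / 2 for k in range(n_eps + 1)]

# Every point of [alpha, alpha + rho] lies within eps/2 of some left endpoint,
# so these endpoints form a finite eps-net: the interval is totally bounded.
net = finite_eps_net(0.0, 1.0, 0.3)
assert all(min(abs(t - a) for a in net) < 0.3 for t in [0.0, 0.17, 0.5, 0.99, 1.0])
```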
The Heine–Borel Theorem is readily extended to C n (again equipped with any of the uniformly equivalent metrics d∞ or dp for some p ≥ 1 as in Example 3.A): A subset of C n is compact if and only if it is closed and bounded in C n. Corollary 3.84. Let X be a complete metric space and let A be a subset of X. (a) A is compact if and only if it is closed and totally bounded in X. (b) A is relatively compact if and only if it is totally bounded in X.
Proof. By Theorem 3.40(b) and Corollaries 3.81 and 3.82 we get the result in (a), which in turn leads to the result in (b) by recalling that A− is totally bounded if and only if A is totally bounded. Corollary 3.85. A continuous image of a compact set is closed and bounded .
Proof. Theorem 3.64(a) and Corollary 3.82.
Theorem 3.86. (Weierstrass). If ϕ: X → R is a continuous real-valued function on a metric space X, then ϕ assumes both a maximum and a minimum value on each nonempty compact subset of X. Proof. If ϕ is a continuous real-valued function defined on a (nonempty) compact metric space, then its (nonempty) range R(ϕ) is both closed and bounded in the real line R equipped with its usual metric (Corollary 3.85). Thus the bounded subset R(ϕ) of R has an infimum and a supremum in R, which actually lie in R(ϕ) because R(ϕ) is closed in R (recall: a closed subset contains all its adherent points). Example 3.Y. Consider the set C[X, Y ] of all continuous mappings of a metric space X into a metric space Y, and let B[X, Y ] be the set of all bounded mappings of X into Y. According to Corollary 3.85, C[X, Y ] ⊆ B[X, Y ] whenever X is compact. Thus the sup-metric d∞ on B[X, Y ] (Example 3.C) is inherited by C[X, Y ] if X is compact, and hence (C[X, Y ], d∞ ) is a subspace of (B[X, Y ], d∞ ). In other words, if X is compact, then the sup-metric d∞ is well defined on C[X, Y ] so that, in this case, (C[X, Y ], d∞ ) is a metric space. In particular, C[0, 1] ⊂ B[0, 1] because [0, 1] is compact in R by the Heine–Borel Theorem (Theorem 3.83), and so (C[0, 1], d∞ ) is a subspace of (B[0, 1], d∞ ), as we had anticipated in Examples 3.D and 3.G, and used in Examples 3.N and 3.T. Moreover, since the absolute value function | · |: [0, 1] → R is continuous, since (x − y) ∈ C[0, 1] for every x, y ∈ C[0, 1], since the interval [0, 1] is compact, and since the composition of continuous functions is continuous, it follows by the Weierstrass Theorem (Theorem 3.86) that d∞ (x, y) = max |x(t) − y(t)|
for every x, y ∈ C[0, 1], where the maximum is taken over t ∈ [0, 1]; that is, the supremum defining d∞ is attained.
Now let BC[X, Y ] be the set of all bounded continuous mappings of X into Y and equip it with the sup-metric d∞ as in Example 3.T. If X is compact, then C[X, Y ] = BC[X, Y ] and (C[X, Y ], d∞ ) is a metric space that coincides with the metric space (BC[X, Y ], d∞ ). Since (BC[X, Y ], d∞ ) is complete if and only if Y is complete (Example 3.T), it follows that (C[X, Y ], d∞ ) is complete if X is compact and Y is complete.
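Since the maximum in d∞ is attained for continuous functions on the compact interval [0, 1], the sup-metric can be approximated by sampling a fine grid. A hedged numerical sketch (the helper name `sup_metric` is ours, not from the text):

```python
# Approximate d_infty(x, y) = max_{t in [0,1]} |x(t) - y(t)| by sampling a fine
# grid; for continuous functions on the compact set [0, 1] the maximum is
# attained (Weierstrass, Theorem 3.86), so a fine grid approximates it well.
def sup_metric(x, y, n=10_000):
    grid = [k / n for k in range(n + 1)]
    return max(abs(x(t) - y(t)) for t in grid)

# Example: x(t) = t, y(t) = t**2; |t - t^2| is maximized at t = 1/2, value 1/4.
d = sup_metric(lambda t: t, lambda t: t * t)
assert abs(d - 0.25) < 1e-3
```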
Example 3.Z. Suppose (X, dX) is a compact metric space, let (Y, dY) be any metric space, and consider the metric space (C[X, Y], d∞) of Example 3.Y. Let Φ be a subset of C[X, Y]. We shall investigate a necessary and sufficient condition that Φ be totally bounded. To begin with, let us pose the following definitions. (i) A subset Φ of C[X, Y] is pointwise totally bounded if for each x in X the set Φ(x) = {f(x) ∈ Y: f ∈ Φ} is totally bounded in Y. Similarly, Φ is pointwise bounded if the set Φ(x) is bounded in Y for each x in X (i.e., if sup_{f,g∈Φ} dY(f(x), g(x)) < ∞ for each x ∈ X). (ii) A subset Φ of C[X, Y] is equicontinuous at a point x0 ∈ X if for each ε > 0 there exists a δ > 0 such that dY(f(x), f(x0)) < ε whenever dX(x, x0) < δ for every f ∈ Φ (note: δ depends on ε and may depend on x0 but it does not depend on f; hence the term "equicontinuity"). Φ is equicontinuous (or equicontinuous on X) if it is equicontinuous at every point of X. Remark: If a subset Φ is equicontinuous on X, and if for each ε > 0 there exists a δ > 0 (which depends only on ε) such that dX(x, x′) < δ implies dY(f(x), f(x′)) < ε for all x, x′ ∈ X and every f ∈ Φ, then Φ is said to be uniformly equicontinuous on X. Uniform equicontinuity coincides with equicontinuity on a compact space (Theorem 3.67). (a) A totally bounded Φ is pointwise totally bounded and equicontinuous. Indeed, take ε > 0, x0 ∈ X, and f ∈ Φ arbitrary. If Φ is totally bounded, then there is a finite ε-net Φε for Φ so that the set Φε(x0) = {g(x0) ∈ Y: g ∈ Φε} is a finite ε-net for Φ(x0). Hence, if Φ is totally bounded, then Φ(x0) is totally bounded for every x0 in X, which means that Φ is pointwise totally bounded. Moreover, since Φε is an ε-net for Φ, there exists g ∈ Φε such that d∞(f, g) < ε, and so, for every x ∈ X, dY(f(x), g(x)) ≤ sup_{x∈X} dY(f(x), g(x)) = d∞(f, g) < ε.
Therefore, dY(f(x), f(x0)) ≤ dY(f(x), g(x)) + dY(g(x), g(x0)) + dY(g(x0), f(x0)) ≤ 2ε + dY(g(x), g(x0)) for every x ∈ X and every g ∈ Φε. Since each g ∈ Φε is continuous, it follows that for each g ∈ Φε there exists a δg = δg(ε, x0) > 0 such that dX(x, x0) < δg implies dY(g(x), g(x0)) < ε. Since Φε is a finite ε-net for Φ, set δ = δ(ε, x0) = min{δg}g∈Φε so that dY(g(x), g(x0)) < ε whenever dX(x, x0) < δ. Thus there exists a δ > 0 (that does not depend on f) such that

dX(x, x0) < δ implies dY(f(x), f(x0)) < 3ε.
Hence, if Φ is totally bounded, then Φ is equicontinuous at an arbitrary x0 in X, which means that Φ is equicontinuous on X. Summing up: If Φ is totally bounded, then Φ is pointwise totally bounded and equicontinuous on X.
(b) A pointwise totally bounded and equicontinuous Φ is totally bounded. This is the converse of (a). In fact, recall that X is separable (because it is compact; Proposition 3.72 and Corollary 3.81) and take a countable dense subset A of X. Consider the (infinite) A-valued sequence {ai}i≥1 consisting of an enumeration of all points of A (followed by an arbitrary repetition of points of A if A is finite). Let f = {fn}n≥1 be an arbitrary Φ-valued sequence, and suppose Φ is pointwise totally bounded so that Φ(x) is totally bounded in Y for every x ∈ X. Thus, according to Lemma 3.73, for each x ∈ X the Φ(x)-valued sequence {fn(x)}n≥1 has a Cauchy subsequence. In particular, {fn(a1)}n≥1 has a Cauchy subsequence, say {fn(1)(a1)}n≥1. Set f1 = {fn(1)}n≥1, which is a Φ-valued subsequence of f such that {fn(1)(a1)}n≥1 is a Cauchy sequence in Y. Consider for each x ∈ X the Φ(x)-valued sequence {fn(1)(x)}n≥1. Since Φ(x) is totally bounded for every x ∈ X, it follows by Lemma 3.73 that {fn(1)(x)}n≥1 has a Cauchy subsequence for every x ∈ X. In particular, {fn(1)(a2)}n≥1 has a Cauchy subsequence, say {fn(2)(a2)}n≥1. Set f2 = {fn(2)}n≥1, which is a Φ-valued subsequence of f1 such that {fn(2)(a2)}n≥1 is a Cauchy sequence in Y and {fn(2)(a1)}n≥1 is a Cauchy sequence in Y as well (reason: f2 is a subsequence of f1, and hence {fn(2)(a1)}n≥1 is a subsequence of the Cauchy sequence {fn(1)(a1)}n≥1). This leads to the inductive construction of a sequence {fk}k≥1 of Φ-valued subsequences of f with the following properties. Property (1). Each fk+1 = {fn(k+1)}n≥1 is a subsequence of fk = {fn(k)}n≥1. Property (2). {fn(k)(ai)}n≥1 is a Cauchy sequence in Y whenever 1 ≤ i ≤ k. As it happened in part (b) of the proof of Lemma 3.73, the diagonal procedure plays a central role in this proof too. Take any integer i ≥ 1.
By Property (1) the Y-valued sequence {fn(n)(ai)}n≥i is a subsequence of {fn(i)(ai)}n≥1, which is a Cauchy sequence in Y by Property (2). Thus {fn(n)(ai)}n≥i is a Cauchy sequence in Y, and so is {fn(n)(ai)}n≥1. Therefore, the "diagonal" sequence {fn(n)}n≥1, which is a subsequence of the Φ-valued sequence f = {fn}n≥1 (cf. Property (1)), is such that {fn(n)(a)}n≥1 is a Cauchy sequence in Y for every a ∈ A. Now take ε > 0 and x ∈ X arbitrary. Suppose Φ is equicontinuous on X. Thus there exists a δε = δε(x) > 0 such that

dX(x′, x) < δε implies dY(fn(n)(x′), fn(n)(x)) < ε for all n.

Since A is dense in X, it follows that there exists a ∈ A such that dX(a, x) < δε, and hence

dY(fn(n)(a), fn(n)(x)) < ε
for all n. But {fn(n)(a)}n≥1 is a Cauchy sequence, which means that there exists a positive integer nε = nε(a) such that

dY(fm(m)(a), fn(n)(a)) < ε whenever m, n ≥ nε.

Then, by the triangle inequality,

dY(fm(m)(x), fn(n)(x)) ≤ dY(fm(m)(x), fm(m)(a)) + dY(fm(m)(a), fn(n)(a)) + dY(fn(n)(a), fn(n)(x)) < 3ε

whenever m, n ≥ nε. Therefore, as nε does not depend on x,

d∞(fm(m), fn(n)) = sup_{x∈X} dY(fm(m)(x), fn(n)(x)) < 3ε

whenever m, n ≥ nε, and so the subsequence {fn(n)}n≥1 of f = {fn}n≥1 is a Cauchy sequence in Φ. Thus Φ is totally bounded by Lemma 3.73. Summing up: If X is compact, and if Φ is pointwise totally bounded and equicontinuous, then Φ is totally bounded.

(c) Arzelà–Ascoli Theorem. If X is compact, then a subset of the metric space (C[X, Y], d∞) is totally bounded if and only if it is pointwise totally bounded and equicontinuous.
This is a consequence of (a) and (b). The next corollary follows at once (cf. Example 3.Y and Corollary 3.84). (d) If X is a compact metric space and Y is a complete metric space, then a subset of the metric space (C[X, Y ], d∞ ) is compact if and only if it is pointwise totally bounded, equicontinuous, and closed in (C[X, Y ], d∞ ). Recall that total boundedness coincides with plain boundedness in the real line (see proof of Theorem 3.83). Thus we get the following particular case of (d): A subset Φ of the metric space (C[0, 1], d∞ ) is compact if and only if it is pointwise bounded, equicontinuous and closed in (C[0, 1], d∞ ). Note: In this case pointwise boundedness means supx∈Φ |x(t)| < ∞ for each t ∈ [0, 1].
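To see why equicontinuity cannot be dropped in (d), consider the family fn(t) = tⁿ in C[0, 1]: it is pointwise bounded by 1 yet not equicontinuous at t0 = 1, and accordingly it is not totally bounded in (C[0, 1], d∞). A small numerical illustration (our own example, not from the text):

```python
# The family f_n(t) = t**n in C[0,1] is pointwise bounded by 1 but fails to be
# equicontinuous at t0 = 1: for any fixed delta > 0, some f_n still varies by
# almost 1 on (1 - delta, 1].  Hence the family is not totally bounded in
# (C[0,1], d_infty), in line with the Arzela-Ascoli criterion.
def oscillation_near_one(n, delta):
    """Sample sup of |f_n(1) - f_n(t)| over t in [1 - delta, 1]."""
    ts = [1 - delta * k / 100 for k in range(101)]
    return max(abs(1 - t ** n) for t in ts)

delta = 0.01
# No single delta works for every n: the oscillation tends to 1 as n grows.
assert oscillation_near_one(1, delta) <= delta + 1e-9   # f_1 is 1-Lipschitz
assert oscillation_near_one(1000, delta) > 0.9          # but f_1000 is not tame
```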
Suggested Reading

Bachman and Narici [1], [2]; Brown and Pearcy [3]; Dieudonné [1]; Dugundji [1]; Gelbaum and Olmsted [1]; Goffman and Pedrick [1]; Kaplansky [2]; Kantorovich and Akilov [1]; Kelley [1]; Kolmogorov and Fomin [1]; McDonald and Weiss [1]; Munkres [1]; Naylor and Sell [1]; Royden [1]; Schwartz [1]; Simmons [1]; Smart [1]; Sutherland [1].
Problems

Problem 3.1. Let (X, d) be a metric space. Then, for every x, y, z in X,

(a) |d(x, y) − d(y, z)| ≤ d(x, z).

Hint: Use symmetry and the triangle inequality only. Incidentally, the above inequality shows that the metric axioms (i) to (iv) in Definition 3.1 are not independent. For instance, the property d(x, y) ≥ 0 in axiom (i) follows from symmetry and the triangle inequality. That is, the statement "d(x, y) ≥ 0 for every x, y ∈ X" in fact is a theorem derived from axioms (iii) and (iv). Moreover, for every u, v, x, y in X,

(b) |d(x, u) − d(v, y)| ≤ d(x, v) + d(u, y).

Hint: d(x, u) ≤ d(x, v) + d(v, y) + d(y, u) and, similarly, d(v, y) ≤ d(v, x) + d(x, u) + d(u, y); use symmetry.

Problem 3.2. Suppose d1: X×X → R and d2: X×X → R are metrics on a set X. Define the functions d: X×X → R and d′: X×X → R by

d(x, y) = d1(x, y) + d2(x, y) and d′(x, y) = max{d1(x, y), d2(x, y)}

for every x, y ∈ X. Show that both d and d′ are metrics on X.

Problem 3.3. Let p and q be real numbers. If p > 1 and if q = p/(p − 1) > 1 is the unique solution to the equation 1/p + 1/q = 1 (or, equivalently, the unique solution to the equation p + q = pq), then p and q are said to be Hölder conjugates of each other. Prove the following inequalities.
(a) If p > 1 and q > 1 are Hölder conjugates, and if x = (ξ1, ..., ξn) and y = (υ1, ..., υn) are arbitrary n-tuples in Cⁿ, then

Σ_{i=1}^n |ξi υi| ≤ (Σ_{i=1}^n |ξi|^p)^{1/p} (Σ_{i=1}^n |υi|^q)^{1/q}.

Hint: Show that αβ ≤ α^p/p + β^q/q for every pair of positive real numbers α and β whenever p and q are Hölder conjugates. Set ‖x‖p = (Σ_{i=1}^n |ξi|^p)^{1/p}. Moreover,

Σ_{i=1}^n |ξi υi| ≤ (max_{1≤i≤n} |ξi|) Σ_{i=1}^n |υi|.

These are the Hölder inequalities for finite sums.
(b) Let x = {ξk} and y = {υk} be arbitrary complex-valued sequences (i.e., sequences in C^N). If p > 1 and q > 1 are Hölder conjugates, then

Σ_{k=1}^∞ |ξk υk| ≤ (Σ_{k=1}^∞ |ξk|^p)^{1/p} (Σ_{k=1}^∞ |υk|^q)^{1/q}

whenever Σ_{k=1}^∞ |ξk|^p < ∞ and Σ_{k=1}^∞ |υk|^q < ∞; and

Σ_{k=1}^∞ |ξk υk| ≤ (sup_{k∈N} |ξk|) Σ_{k=1}^∞ |υk|

whenever sup_{k∈N} |ξk| < ∞ and Σ_{k=1}^∞ |υk| < ∞. These are called Hölder inequalities for infinite sums.
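The finite-sum Hölder inequality of part (a) is easy to check numerically on sample data. A minimal Python sketch (the helper name `p_norm` and the sample tuples are ours):

```python
# Numerical check of the finite-sum Hoelder inequality with p = 3, q = 3/2
# (Hoelder conjugates: 1/3 + 2/3 = 1) on arbitrary complex n-tuples.
def p_norm(xs, p):
    return sum(abs(v) ** p for v in xs) ** (1 / p)

x = [1.0, -2.0, 0.5 + 1.0j, 3.0]
y = [0.25, 1.5, -1.0j, 2.0 - 1.0j]
p, q = 3.0, 1.5
lhs = sum(abs(a * b) for a, b in zip(x, y))
assert abs(1 / p + 1 / q - 1) < 1e-12           # p and q are Hoelder conjugates
assert lhs <= p_norm(x, p) * p_norm(y, q) + 1e-12
```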
(c) Finally, let Ω be a nonempty subset of C, and let x and y be arbitrary complex-valued functions on Ω (i.e., functions in C^Ω). If p > 1 and q > 1 are Hölder conjugates, then

∫_Ω |x y| dω ≤ (∫_Ω |x|^p dω)^{1/p} (∫_Ω |y|^q dω)^{1/q}

for every pair of integrable functions x, y in C^Ω such that ∫_Ω |x|^p dω < ∞ and ∫_Ω |y|^q dω < ∞. Moreover, if x, y ∈ C^Ω are integrable functions such that sup_{ω∈Ω} |x(ω)| < ∞ and ∫_Ω |y| dω < ∞, then

∫_Ω |x y| dω ≤ (sup_{ω∈Ω} |x(ω)|) ∫_Ω |y| dω.

These are the Hölder inequalities for integrals.

Problem 3.4. Let p be any real number such that p ≥ 1. Use the preceding problem to verify the following results.

(a) Minkowski inequalities for finite sums. Take an arbitrary pair of n-tuples in Cⁿ, say x = (ξ1, ..., ξn) and y = (υ1, ..., υn) in Cⁿ. Then

(Σ_{i=1}^n |ξi + υi|^p)^{1/p} ≤ (Σ_{i=1}^n |ξi|^p)^{1/p} + (Σ_{i=1}^n |υi|^p)^{1/p}

and

max_{1≤i≤n} |ξi + υi| ≤ max_{1≤i≤n} |ξi| + max_{1≤i≤n} |υi|.

Hint: Verify that |ξ + υ| ≤ |ξ| + |υ| for every pair {ξ, υ} of complex numbers, and show that (α + β)^p = (α + β)^{p−1} α + (α + β)^{p−1} β for every pair {α, β} of nonnegative real numbers.
(b) Minkowski inequalities for infinite sums. If x = {ξk} and y = {υk} are sequences in C^N (i.e., if x = {ξk} and y = {υk} are complex-valued sequences) such that Σ_{k=1}^∞ |ξk|^p < ∞ and Σ_{k=1}^∞ |υk|^p < ∞, then

(Σ_{k=1}^∞ |ξk + υk|^p)^{1/p} ≤ (Σ_{k=1}^∞ |ξk|^p)^{1/p} + (Σ_{k=1}^∞ |υk|^p)^{1/p}.

Moreover,

sup_{k∈N} |ξk + υk| ≤ sup_{k∈N} |ξk| + sup_{k∈N} |υk|

whenever sup_{k∈N} |ξk| < ∞ and sup_{k∈N} |υk| < ∞.

(c) Minkowski inequalities for integrals. If x and y are integrable functions in C^Ω (i.e., integrable complex-valued functions on Ω) with ∫_Ω |x|^p dω < ∞ and ∫_Ω |y|^p dω < ∞, where Ω is a nonempty subset of C, then

(∫_Ω |x + y|^p dω)^{1/p} ≤ (∫_Ω |x|^p dω)^{1/p} + (∫_Ω |y|^p dω)^{1/p}.

If sup_{ω∈Ω} |x(ω)| < ∞ and sup_{ω∈Ω} |y(ω)| < ∞, then

sup_{ω∈Ω} |x(ω) + y(ω)| ≤ sup_{ω∈Ω} |x(ω)| + sup_{ω∈Ω} |y(ω)|.
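Likewise, the finite-sum Minkowski inequalities can be verified numerically. A minimal sketch (sample data and helper names are ours, not the book's):

```python
# Numerical check of the finite-sum Minkowski inequality (triangle inequality
# for the p-norm) and of its sup-norm analogue, on arbitrary complex n-tuples.
def p_norm(xs, p):
    return sum(abs(v) ** p for v in xs) ** (1 / p)

x = [1.0, -2.0, 0.5 + 1.0j, 3.0]
y = [0.25, 1.5, -1.0j, 2.0 - 1.0j]
s = [a + b for a, b in zip(x, y)]
for p in (1.0, 2.0, 3.5):
    assert p_norm(s, p) <= p_norm(x, p) + p_norm(y, p) + 1e-12
assert max(abs(v) for v in s) <= max(abs(v) for v in x) + max(abs(v) for v in y)
```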
Problem 3.5. (a) Let {αk} be a sequence of nonnegative real numbers such that Σ_{k=1}^∞ αk < ∞. Take an arbitrary real number r > 1. Verify that

Σ_{k=1}^∞ αk^r ≤ (Σ_{k=1}^∞ αk)^r.

Hint: αk ≤ Σ_{k=1}^∞ αk, and so Σ_{k=1}^∞ (αk / Σ_{j=1}^∞ αj)^r ≤ Σ_{k=1}^∞ (αk / Σ_{j=1}^∞ αj) = 1.

(b) Jensen inequality. If p and q are real numbers such that 0 < p < q, then

(Σ_{k=1}^∞ |ξk|^q)^{1/q} ≤ (Σ_{k=1}^∞ |ξk|^p)^{1/p}

for every x = {ξk} in C^N such that Σ_{k=1}^∞ |ξk|^p < ∞ (i.e., for every p-summable complex-valued sequence). Prove.

Hint: Apply the inequality in (a) with αk = |ξk|^p and r = q/p > 1, noting that Σ_{k=1}^∞ |ξk|^q = Σ_{k=1}^∞ |ξk|^{pr}.

(c) Let ℓ^p_+ and ℓ^∞_+ be the sets of all p-summable and bounded sequences from F, respectively, where F = R or F = C (i.e., the sets of all p-summable and bounded scalar-valued sequences, respectively; see Example 3.B). Show that

1 ≤ p < q implies ℓ^p_+ ⊂ ℓ^q_+ ⊂ ℓ^∞_+,

and verify that these are in fact proper inclusions.
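The Jensen inequality of part (b) says that the q-"norm" shrinks as the exponent grows; on a truncated sequence this is easy to check numerically. A sketch (helper name and sample data are ours):

```python
# Jensen's inequality for sums: for 0 < p < q, the q-norm of a p-summable
# sequence is bounded by its p-norm.  Checked here on a truncated sequence,
# which also illustrates the inclusion of l^p in l^q for p < q.
def p_norm(xs, p):
    return sum(abs(v) ** p for v in xs) ** (1 / p)

x = [1 / k for k in range(1, 200)]     # truncation of a p-summable sequence
p, q = 1.5, 3.0
assert p_norm(x, q) <= p_norm(x, p) + 1e-12
```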
Problem 3.6. If A and B are nonempty and bounded subsets of a metric space (X, d), then

sup_{x∈A, y∈B} d(x, y) ≤ diam(A) + diam(B) + d(A, B).
Hint: d(x, y) ≤ d(x, a) + d(a, b) + d(b, y) for every x, a ∈ A and every y, b ∈ B. Now conclude that A ∪ B is bounded. Show that the union of a finite collection of bounded subsets of X is a bounded subset of X.

Problem 3.7. Let (X, d) be a metric space and define di: X×X → R for i = 1, 2 as follows:

d1(x, y) = d(x, y)/(1 + d(x, y)) and d2(x, y) = min{1, d(x, y)}

for every x, y ∈ X. Show that d1 and d2 are metrics on X. Moreover, verify that every set in the metric spaces (X, d1) and (X, d2) is bounded.

Problem 3.8. Let p, q, and r be positive numbers such that

1/p + 1/q = 1/r.

(a) Prove the following extension of the Hölder inequality for integrals.
(∫_Ω |x y|^r dω)^{1/r} ≤ (∫_Ω |x|^p dω)^{1/p} (∫_Ω |y|^q dω)^{1/q}

for all integrable functions x, y in C^Ω such that ∫_Ω |x|^p dω < ∞ and ∫_Ω |y|^q dω < ∞. (Similar inequalities hold for finite and infinite sums.)

Hint: Verify that p/r and q/r are Hölder conjugates.
(b) Let S be a bounded interval of the real line and let dp be the usual metric on Rp(S) (see Example 3.E). Show that

1 ≤ r < p implies Rp(S) ⊂ Rr(S).

Hint: dr(x, y) ≤ diam(S)^{(p−r)/(pr)} dp(x, y) for every x, y in Rp(S) if 1 ≤ r < p.

Problem 3.9. Let {(Xi, di)}_{i=1}^n be a finite collection of metric spaces and set Z = ∏_{i=1}^n Xi, the Cartesian product of their underlying sets. Consider the functions dp: Z×Z → R (for each real number p ≥ 1) and d∞: Z×Z → R defined as follows:

dp(x, y) = (Σ_{i=1}^n di(xi, yi)^p)^{1/p} and d∞(x, y) = max_{1≤i≤n} di(xi, yi)

for every x = (x1, ..., xn) and y = (y1, ..., yn) in Z = ∏_{i=1}^n Xi. Show that these are metrics on Z.
Remark: Let d: Z×Z → R be any of the above metrics. The Cartesian product ∏_{i=1}^n Xi equipped with d is referred to as a product space of the metric spaces {(Xi, di)}_{i=1}^n, and the metric d as a product metric. Sometimes, when the metric d has been previously specified, it is convenient to denote the product space (∏_{i=1}^n Xi, d) by ∏_{i=1}^n (Xi, di). This notation is particularly suitable for a product space of metric spaces with the same underlying set but different metrics. For instance, (X, d1)×(X, d2) and (X, d2)×(X, d1) are different metric spaces with the same underlying set X×X.

Problem 3.10. Consider the real line R with its usual metric. Let {αn} be a real-valued sequence (indexed by N or N₀) and recall the definitions of bounded, increasing, decreasing, and monotone sequences:

(i) {αn} is bounded if supn |αn| < ∞.
(ii) {αn} is increasing if αn ≤ αn+1 for every index n.
(iii) {αn} is decreasing if αn+1 ≤ αn for every index n.
(iv) {αn} is monotone if it is either increasing or decreasing.
Prove the following statements. (a) If {αn } converges, then it is bounded. (b) If {αn } is monotone and bounded, then it converges. Therefore, for a real-valued monotone sequence, boundedness becomes equivalent to convergence. Now suppose {αn } is a nonnegative sequence (i.e., 0 ≤ αn for every index n) and let {βn } be another real-valued sequence. (c) If 0 ≤ αn ≤ βn for every n and βn → 0, then αn → 0. Next let {αn } and {βn } be arbitrary scalar-valued sequences (either in R or C). (d) supn |αn | inf n |βn | ≤ supn (|αn | |βn |). Hint : |αn | inf n |βn | ≤ |αn | |βn |. (e) If αn → α and βn → β, then αn + βn → α + β. (f) If αn → α and βn → β, then αn βn → α β. Hint : αn βn = (αn − α)βn + α(βn − β) + α β. Show that (e) and (f) can be extended to finite sums and finite products of convergent sequences. A subsequence {nk } of the sequence of all nonnegative integers is of bounded increments if supk≥0 (nk+1 − nk ) < ∞. Use the extension of (e) to finite sums to prove the following proposition. (g) If {αn } is a sequence of nonnegative numbers, and if there exists a subsequence of bounded increments {nk } such that αnk +j → 0 as k → ∞ for every j ≥ 0, then αn → 0 as n → ∞.
170
3. Topological Structures
Hint: Set J = sup_{k≥0}(nk+1 − nk) and write any nonnegative integer as n = nk + jk ≤ nk + J for some 0 ≤ jk ≤ J. The extension of (e) ensures that Σ_{j=1}^J α_{nk+j} → 0 as nk → ∞. Therefore, sup_{0≤jk≤J} α_{nk+jk} → 0 as nk → ∞. Now show that for every ε > 0 there is an integer kε for which αn = α_{nk+jk} ≤ sup_{0≤jk≤J} α_{nk+jk} < ε whenever nk ≥ n_{kε}. Thus conclude that αn < ε whenever n ≥ n_{kε} + J.

Problem 3.11. Consider again the real line R with its usual metric and let {αn} be a sequence of nonnegative real numbers. For each integer n ≥ 1 set

σn = Σ_{i=1}^n αi.
{σn} is called the sequence of partial sums of {αn}, which clearly is nonnegative and increasing. Let us introduce the following usual notation and terminology.

(i) If {σn} converges, then we say that Σ_{k=1}^∞ αk converges.
(ii) If {σn} is bounded, then we write Σ_{k=1}^∞ αk < ∞.
(iii) For each index n write Σ_{k=n+1}^∞ αk = supm σm − σn.

The purpose of this problem is to show that the assertions

Σ_{k=1}^∞ αk converges, Σ_{k=1}^∞ αk < ∞, and Σ_{k=n+1}^∞ αk → 0 as n → ∞

are pairwise equivalent. Prove the following propositions.

(a) If supm σm < ∞, then σn → supm σm as n → ∞.
(b) If {σn} converges, then limn σn = supm σm.

Obviously, the above convergences are all in R with its usual metric.

Problem 3.12. Let {αn} be a sequence of nonnegative real numbers and equip the real line with its usual metric. Prove the following assertions.

(a) If Σ_{n=1}^∞ αn < ∞, then limn αn = 0.
(b) If {αn} is decreasing and Σ_{n=1}^∞ αn < ∞, then limn nαn = 0.

Hint: Show that if limn α2n = limn α2n−1 = α, then limn αn = α.

Now exhibit a pair of nonnegative sequences {βn} and {γn} with the following properties.

(c) Σ_{n=1}^∞ βn < ∞ and supn nβn = ∞.
(d) {γn} is decreasing, Σ_{n=1}^∞ γn < ∞, and supn n²γn = ∞.
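The pairwise-equivalent assertions of Problem 3.11 can be observed on a geometric series, where σn = 1 − (1/2)ⁿ: the partial sums are bounded and increasing, they converge to their supremum, and the tails vanish. A numerical sketch (our own illustration):

```python
# For alpha_k = (1/2)**k the partial sums sigma_n are bounded and increasing,
# hence convergent (to 1), and the tails sum_{k>n} alpha_k = (1/2)**n -> 0,
# illustrating the pairwise-equivalent assertions of Problem 3.11.
def sigma(n):
    return sum(0.5 ** k for k in range(1, n + 1))

assert all(sigma(n) <= sigma(n + 1) <= 1.0 for n in range(1, 50))  # bounded, increasing
assert abs(sigma(50) - 1.0) < 1e-12                                # converges to sup
assert 1.0 - sigma(20) == 0.5 ** 20                                # tail -> 0
```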
Problem 3.13. Let {β_n} be a real-valued bounded sequence. The real line is a boundedly complete lattice in the natural ordering ≤ of R (Example 1.C). Thus both sup_{n≤k} β_k = sup{β_k: n ≤ k} and inf_{n≤k} β_k = inf{β_k: n ≤ k} exist in R for each index n. Set

α_n = inf_{n≤k} β_k and γ_n = sup_{n≤k} β_k

for each n, consider the real-valued sequences {α_n} and {γ_n}, and equip the real line R with its usual metric.
(a) Show that both sequences {α_n} and {γ_n} converge in R.
Hint: They are bounded monotone sequences.
Since these sequences converge in R, set

α = lim_n α_n = lim_n inf_{n≤k} β_k and γ = lim_n γ_n = lim_n sup_{n≤k} β_k.

The limits α ∈ R and γ ∈ R of the bounded monotone sequences {α_n} and {γ_n} are usually denoted by

α = lim inf_n β_n and γ = lim sup_n β_n,

and are called the limit inferior (or lower limit) and the limit superior (or upper limit) of the sequence {β_n}, respectively (see Problem 1.19). First show that
(b) lim inf_n β_n ≤ lim sup_n β_n.
Now prove the following propositions, which say that a real-valued sequence converges if and only if its lower and upper limits coincide.
(c) If {β_n} converges in R, then lim inf_n β_n = lim sup_n β_n = lim_n β_n.
Hint: Show that |α_n − β| ≤ sup_{n≤k} |β_k − β| for each n and every β ∈ R. Similarly, show that |γ_n − β| ≤ sup_{n≤k} |β_k − β| for each n and every β ∈ R.
(d) If lim inf_n β_n = lim sup_n β_n = β, then {β_n} converges in R to β.
Hint: Show that α_n ≤ β_n ≤ γ_n for each n, then apply Problem 3.10(c).
Remark: If the real-valued sequence {β_n} is not bounded above, then {β_k}_{n≤k} is not bounded above for every index n. In this case we write sup_{n≤k} β_k = ∞ for every n and also lim sup_n β_n = ∞. Similarly, if {β_n} is not bounded below, then {β_k}_{n≤k} is not bounded below for every index n. In this case we write inf_{n≤k} β_k = −∞ for every n and lim inf_n β_n = −∞. Observe that the notations lim sup_n β_n = ∞ and lim inf_n β_n = −∞ are mere formalisms; they simply say that {β_n} is not bounded above or below, respectively. It should also be noticed that the concepts and results in this problem naturally generalize from sequences to nets of real numbers.
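The tail infima and suprema defined above can be approximated numerically. In this sketch the sample sequence β_n = (−1)^n(1 + 1/n) and the truncation of each infinite tail at N terms are illustrative assumptions, not part of the problem:

```python
# A numerical sketch of Problem 3.13: for beta_n = (-1)^n (1 + 1/n),
# the tail infima alpha_n increase toward liminf = -1 and the tail
# suprema gamma_n decrease toward limsup = +1.  Tails are truncated
# at N terms, which is only an approximation of the true (infinite) tails.

N = 10_000

def beta(n):
    return (-1) ** n * (1.0 + 1.0 / n)

def tail_inf(n):
    # alpha_n = inf_{n <= k} beta_k, approximated over k = n..N
    return min(beta(k) for k in range(n, N + 1))

def tail_sup(n):
    # gamma_n = sup_{n <= k} beta_k, approximated over k = n..N
    return max(beta(k) for k in range(n, N + 1))

alphas = [tail_inf(n) for n in (1, 10, 100, 1000)]
gammas = [tail_sup(n) for n in (1, 10, 100, 1000)]
print(alphas)  # increasing toward -1
print(gammas)  # decreasing toward +1
```

Here lim inf ≠ lim sup, so by part (d)'s contrapositive the sequence β_n does not converge.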
172
3. Topological Structures
Warning: Let {β_n} be an arbitrary real-valued sequence (bounded or not). Consider the sets B = {β_n} and B_n = {β_k}_{n≤k} for each n, which are the ranges of the sequences {β_n} and {β_k}_{n≤k}. Since {B_n} is a decreasing sequence of subsets of B ⊆ R in the inclusion ordering, it follows by Problem 1.19 that

lim_n B_n = lim inf_n B_n = lim sup_n B_n = ∩_n B_n,

which always exists as a set in the power set ℘(R) (e.g., if β_n = n for each n ∈ N, then ∩_{n∈N} B_n = ∅). Carefully note that the present problem deals with the natural ordering ≤ of R, and not with the inclusion ordering ⊆ of ℘(R).

Problem 3.14. Let {x_n} and {y_n} be two convergent sequences (both indexed by N or N_0) in a metric space (X, d). Set x = lim x_n and y = lim y_n in X. Prove the following propositions.
(a) d(x_n, u) → d(x, u) and d(v, y_n) → d(v, y) for each u, v ∈ X.
(b) d(x_n, y_n) → d(x, y). Hint: Problems 3.1(b) and 3.10(c).
(c) If there exist α > 0 and an integer n_0 such that d(x_n, y_n) < α for every n ≥ n_0, then d(x, y) ≤ α.
Hint: Use the triangle inequality to show that d(x, y) < 2ε + α for every ε > 0, so that d(x, y) ≤ inf_{ε>0}(2ε + α).

Problem 3.15. Two sequences {x_n}, {y_n} in a metric space (X, d) are equiconvergent if lim d(x_n, y_n) = 0. Prove the following propositions.
(a) {x_n} converges to x in (X, d) if and only if it is equiconvergent with the constant sequence {y_n} where y_n = x for all n.
(b) Two sequences that converge to the same limit are equiconvergent.
(c) If one of two equiconvergent sequences is convergent, then so is the other, and both have the same limit.

Problem 3.16. Obviously, uniform continuity implies continuity.
(a) Show that the function f: R → R defined by f(x) = x² for every x ∈ R is continuous but not uniformly continuous.
(b) Prove that every Lipschitzian mapping is uniformly continuous.
(c) Show that the function g: [0, ∞) → [0, ∞) defined by g(x) = x^{1/2} for every x ∈ [0, ∞) is uniformly continuous but not Lipschitzian.

Problem 3.17. Consider the setup and the mappings of Example 3.H.
(a) Show that F: (X, d_2) → (Y, d_2) of Example 3.H(a) is nowhere continuous.
Hint: Take x_0 ∈ X and δ > 0 arbitrary. Let α be any positive real number such that δ³/18 ≤ α² < δ³/6 and consider the function e_δ: R → R defined by

e_δ(t) = α if 0 ≤ t ≤ δ/6, e_δ(t) = −α if δ/6 < t ≤ δ/3, and e_δ(t) = 0 otherwise.

Verify that e_δ lies in X so that x_0 + e_δ ∈ X. Set x_δ = x_0 + e_δ in X, y_δ = F(x_δ) and y_0 = F(x_0) in Y. Compute y_δ(t) − y_0(t) for t ∈ R and conclude that d_2(x_δ, x_0) < δ and d_2(F(x_δ), F(x_0)) ≥ 1.
(b) Show that F: (R²(S), d_2) → (R²(S), d_2) of Example 3.H(b) is Lipschitzian (and hence uniformly continuous — hint: Hölder inequality).

Problem 3.18. Let C′[0,1] be the subset of C[0,1] consisting of all differentiable functions from C[0,1] whose derivatives lie in C[0,1]. Consider the subspace (C′[0,1], d_∞) of the metric space (C[0,1], d_∞) — see Example 3.D. Let D: C′[0,1] → C[0,1] be the mapping that assigns to each x ∈ C′[0,1] its derivative in C[0,1].
(a) Show that D: (C′[0,1], d_∞) → (C[0,1], d_∞) is nowhere continuous.
Now consider the function d: C′[0,1]×C′[0,1] → R defined by

d(x, y) = d_∞(x, y) + d_∞(D(x), D(y))

for every x, y ∈ C′[0,1], which is a metric on C′[0,1] (cf. Problem 3.2).
(b) Show that D: (C′[0,1], d) → (C[0,1], d_∞) is a contraction (and therefore Lipschitzian, and hence uniformly continuous).

Problem 3.19. Consider the real line R equipped with its usual metric. Set B[0,∞) = B[[0,∞), R] and R¹[0,∞) = R¹([0,∞)) as in Examples 3.C and 3.E. Let d_∞ be the sup-metric on B[0,∞) and let d_1 be the usual metric on R¹[0,∞). Consider the set of all real-valued functions x on [0,∞) that are integrable (i.e., x is Riemann integrable on [0,∞) and ∫_0^∞ |x(s)| ds < ∞) and bounded (i.e., sup_{s≥0} |x(s)| < ∞). Allowing a slight abuse of notation, write X[0,∞) = B[0,∞) ∩ R¹[0,∞) and set, for any real number α > 0,

X[0, α] = {x ∈ X[0,∞): x(s) = 0 for all s > α}.

For each x ∈ X[0,∞) consider the function y: [0,∞) → R defined by

y(t) = ∫_0^t x(s) ds
for every t ≥ 0. Let BC[0,∞) ⊆ B[0,∞) denote the set of all real-valued bounded and continuous functions on [0,∞).
(a) Show that y ∈ BC[0,∞).
Now consider the mapping F: X[0,∞) → BC[0,∞) that assigns to each function x in X[0,∞) the function y = F(x) in BC[0,∞) defined by the above integral. For simplicity we shall use the same notation F for the restriction F|_{X[0,α]} of F to X[0,α]. By using the appropriate definitions, show that
(b) F: (X[0,∞), d_1) → (BC[0,∞), d_∞) is a contraction,
(c) F: (X[0,∞), d_∞) → (BC[0,∞), d_∞) is nowhere continuous,
(d) F: (X[0,α], d_∞) → (BC[0,∞), d_∞) is Lipschitzian.

Problem 3.20. Let I be the identity map of C[0,1] onto itself and consider the metrics d_∞ and d_p (for any p ≥ 1) on C[0,1] as in Example 3.D. Show that
(a) I: (C[0,1], d_∞) → (C[0,1], d_p) is a contraction,
(b) I: (C[0,1], d_p) → (C[0,1], d_∞) is nowhere continuous.
Hint: Take the C[0,1]-valued sequence {x_n} of Example 3.F. Apply Theorem 3.7 to show that I: (C[0,1], d_p) → (C[0,1], d_∞) is not continuous at 0 ∈ C[0,1].

Problem 3.21. Recall that ℓ+^p ⊂ ℓ+^∞ for every p ≥ 1 (Problem 3.5) and consider the subspace (ℓ+^p, d_∞) of the metric space (ℓ+^∞, d_∞). Let I be the identity map of ℓ+^p onto itself. Show that, for each p ≥ 1,
(a) I: (ℓ+^p, d_p) → (ℓ+^p, d_∞) is a contraction,
(b) I: (ℓ+^p, d_∞) → (ℓ+^p, d_p) is nowhere continuous.
Hint: Use Theorem 3.7 to show that I: (ℓ+^p, d_∞) → (ℓ+^p, d_p) is not continuous at 0 ∈ ℓ+^p. Take x_n = n^{−1/p}(1, 1, …, 1, 0, 0, …), whose first n entries equal 1.
Problem 3.22. Let F denote either the real field R or the complex field C and let F^N be the collection of all scalar-valued sequences indexed by N. Consider the metric space (ℓ+^p, d_p) of Example 3.B for an arbitrary p ≥ 1 and, for every a = {α_k} ∈ F^N, consider the following subset of ℓ+^p:

X_a^p = {x = {ξ_k} ∈ ℓ+^p: Σ_{k=1}^{∞} |α_k ξ_k|^p < ∞}.

Let D_a: (X_a^p, d_p) → (ℓ+^p, d_p) be the diagonal mapping that assigns to each sequence x = {ξ_k} in X_a^p the sequence {α_k ξ_k} in ℓ+^p; that is,

D_a(x) = {α_k ξ_k} for every x = {ξ_k} ∈ X_a^p.

(a) Show that X_a^p = ℓ+^p and D_a is Lipschitzian if a ∈ ℓ+^∞ (i.e., if sup_k |α_k| < ∞).
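A quick numerical check of the Lipschitz bound in part (a) — the particular sequences, the exponent p = 2, and the truncation to 50 entries are arbitrary illustrative choices, not from the text:

```python
# Numerical sketch of Problem 3.22(a): for a bounded sequence a = {alpha_k},
# the diagonal map D_a satisfies d_p(D_a x, D_a y) <= (sup_k |alpha_k|) d_p(x, y).
# Sequences are truncated to finitely many entries, which is an approximation.

def dp(x, y, p):
    # the l^p metric on (truncated) sequences
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def diag(a, x):
    # D_a(x) = {alpha_k * xi_k}
    return [ak * xk for ak, xk in zip(a, x)]

p = 2
a = [(-1) ** k / (k + 1) for k in range(50)]   # bounded: sup_k |alpha_k| = 1
x = [1.0 / (k + 1) for k in range(50)]
y = [1.0 / (k + 2) for k in range(50)]

lhs = dp(diag(a, x), diag(a, y), p)
rhs = max(abs(ak) for ak in a) * dp(x, y, p)
print(lhs, rhs)  # lhs <= rhs
```

The same computation with α_k = k (as in part (b) below) would show the bound failing, since then sup_k |α_k| is not finite.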
(b) Show that D_a is not continuous if α_k = k for each k ∈ N.
Hint: Use Theorem 3.7. Take x_n = n^{−1}(0, …, 0, 1, 0, …), whose only nonzero entry sits in the nth position.

Problem 3.23. Let α be an arbitrary nonnegative real number and set

β = β(α) = 1 − α if α ≤ 1, and β = β(α) = (α − 1)/α if α ≥ 1.

Consider the real-valued sequence {ξ_n} recursively defined as follows:

(i) ξ_0 = 0 and ξ_{n+1} = ½(β + ξ_n²)

for every n ≥ 0. Verify that
(a) {ξ_n} is an increasing sequence,
(b) 0 ≤ ξ_n ≤ β for all n ≥ 0,
(c) {ξ_n} converges in R, and
(d) lim ξ_n = 1 − (1 − β)^{1/2}.
Hint: According to Problem 3.16(a), the function f: R → R such that f(x) = x² for every x ∈ R is continuous. Use Theorem 3.7.
Thus prove the square root algorithm: For every nonnegative real number α,

α^{1/2} = 1 − lim ξ_n if α ≤ 1, and α^{1/2} = (1 − lim ξ_n)^{−1} if α ≥ 1,

where {ξ_n} is recursively defined as in (i).

Problem 3.24. Take an arbitrary C[0,1]-valued sequence {x_n} that converges in (C[0,1], d_∞) to x ∈ C[0,1]. Take an arbitrary [0,1]-valued sequence {t_n} that converges in R to t ∈ [0,1]. Show that

x_n(t_n) → x(t) in R.

Hint: α_k + β_k → α + β whenever α_k → α and β_k → β in R (Problem 3.10(e)). Now use Problem 3.10(c) and Theorem 3.7.

Problem 3.25. Consider the standard notion of curve length in the plane and let D[0,1] denote the subset of C[0,1] consisting of all real-valued functions on [0,1] whose graph has a finite length. Let ϕ: D[0,1] → R be the mapping that assigns to each function x ∈ D[0,1] the length of its graph (e.g., if x ∈ C[0,1] is given by x(t) = (t − t²)^{1/2} for every t ∈ [0,1], then x ∈ D[0,1] and ϕ(x) = π/2). Now consider the D[0,1]-valued sequence {x_n} defined as follows. For each
n ≥ 1 the graph of x_n forms n equilateral triangles of the same height when intersected with the horizontal axis.

[Figure: the graph of x_n(t) for t ∈ [0,1] — n equilateral triangles of equal height with bases meeting the t-axis at 0, 1/n, …, (n−1)/n, 1.]
Use this sequence to show that the mapping ϕ is not continuous when D[0,1] is equipped with the sup-metric d_∞ and R is equipped with its usual metric. (Hint: Theorem 3.7.)

Problem 3.26. Let C[0,∞) denote the set of all real-valued continuous functions on [0,∞) and set

XC[0,∞) = X[0,∞) ∩ C[0,∞) = B[0,∞) ∩ R¹[0,∞) ∩ C[0,∞),

where the set X[0,∞) was defined in Problem 3.19 as the intersection of B[0,∞) and R¹[0,∞). Now consider the mapping ϕ: XC[0,∞) → R given by

ϕ(x) = ∫_0^∞ |x(t)| dt

for every x ∈ XC[0,∞). Let {x_n} be the XC[0,∞)-valued sequence such that, for each n ≥ 1,

x_n(t) = n^{−2} t if t ∈ [0, n], x_n(t) = n^{−2}(2n − t) if t ∈ [n, 2n], and x_n(t) = 0 otherwise.

Use this sequence to show that ϕ: XC[0,∞) → R is not continuous when XC[0,∞) is equipped with the sup-metric d_∞ from B[0,∞) and R is equipped with its usual metric. (Hint: Theorem 3.7 — compare with Example 3.I.)

Problem 3.27. Let (X, d) be a metric space. A real-valued function ϕ: X → R is upper semicontinuous at a point x_0 of X if for each real number β such that ϕ(x_0) < β there exists a positive number δ such that

d(x, x_0) < δ implies ϕ(x) < β.

It is upper semicontinuous on X if it is upper semicontinuous at every point of X. Similarly, a real-valued function ψ: X → R is lower semicontinuous at a point x_0 of X if for each real number α such that α < ψ(x_0) there exists a positive number δ such that

d(x, x_0) < δ implies α < ψ(x).

It is lower semicontinuous on X if it is lower semicontinuous at every point of X. Now equip R with its usual metric and prove the next proposition (which can be thought of as a real-valued semicontinuous version of Theorem 3.7).
(a) ϕ: X → R is upper semicontinuous at x_0 if and only if

lim sup_n ϕ(x_n) ≤ ϕ(x_0)

for every X-valued sequence {x_n} that converges in (X, d) to x_0. Similarly, ψ: X → R is lower semicontinuous at x_0 if and only if

lim inf_n ψ(x_n) ≥ ψ(x_0)

for every X-valued sequence {x_n} that converges in (X, d) to x_0.
Hint: Take ε > 0 and x_0 arbitrary, and set β = ϕ(x_0) + ε. Suppose ϕ is upper semicontinuous at x_0 and show that there exists δ > 0 such that

d(x, x_0) < δ implies ϕ(x) < ε + ϕ(x_0).

If x_n → x_0 in (X, d), then show that there is an integer n_δ > 0 such that

n ≥ n_δ implies ϕ(x_n) < ε + ϕ(x_0).

Now conclude: lim sup_n ϕ(x_n) ≤ ϕ(x_0). Conversely, if ϕ: X → R is not upper semicontinuous at x_0, then verify that there exists β > ϕ(x_0) such that for every δ > 0 there exists x_δ ∈ X such that

d(x_δ, x_0) < δ and ϕ(x_δ) ≥ β.

Set δ_n = 1/n and x_n = x_{δ_n} for each integer n ≥ 1. Thus x_n → x_0 in (X, d) while ϕ(x_n) ≥ β for all n, and hence lim sup_n ϕ(x_n) > ϕ(x_0).
(b) Show that ϕ: X → R is continuous if and only if it is both upper and lower semicontinuous on X. Hint: Problem 3.13 and Theorem 3.7.

Problem 3.28. Show that the composition of uniformly continuous mappings is again uniformly continuous.

Problem 3.29. Let X, Y, and Z be metric spaces.
(a) If F: X → Y is continuous at x_0 and G: Y → Z is continuous at F(x_0), then the composition G∘F: X → Z is continuous at x_0. (Hint: Lemma 3.11.)
(b) Now use Corollary 3.13 to show by induction that if F: X → X is continuous, then so is its nth power F^n: X → X for every n ≥ 1.

Problem 3.30. The restriction of a continuous mapping to a subspace is continuous. That is, if F: X → Y is a continuous mapping of a metric space X into a metric space Y, and if A is a subspace of X, then the restriction F|_A: A → Y of F to A is continuous.
Hint: F|_A = F∘J, where J: A → X is the inclusion map — use Corollary 3.13.
Problem 3.31. Let X be an arbitrary set. The largest (or strongest) topology on X is the discrete topology ℘(X) (where every subset of X is open), and the smallest (or weakest) topology on X is the indiscrete topology (where the only open subsets of X are the empty set and the whole space). The collection of all topologies on X is partially ordered by the inclusion ordering ⊆. Recall that T_1 ⊆ T_2 (i.e., T_1 is weaker than T_2 or, equivalently, T_2 is stronger than T_1) if every element of T_1 also is an element of T_2. Prove the following propositions.
(a) If U ⊆ ℘(X) is an arbitrary collection of subsets of X, then there exists a smallest (weakest) topology T on X such that U ⊆ T.
Hint: The power set ℘(X) is a topology on X. The intersection of a nonempty collection of topologies on X is again a topology on X.
(b) Show that the collection of all topologies on X is a complete lattice in the inclusion ordering.

Problem 3.32. Let d_1 and d_2 be two metrics on a set X. Show that d_1 and d_2 are equivalent if and only if for each x_0 ∈ X and each ε > 0 the following two conditions hold.
(i) There exists δ_1 > 0 such that d_1(x, x_0) < δ_1 implies d_2(x, x_0) < ε.
(ii) There exists δ_2 > 0 such that d_2(x, x_0) < δ_2 implies d_1(x, x_0) < ε.
Hint: d_1 ∼ d_2 if and only if the identity I: (X, d_1) → (X, d_2) is a homeomorphism.

Problem 3.33. Let {(X_i, d_i)}_{i=1}^{n} be a finite collection of metric spaces and let Z = ∏_{i=1}^{n} X_i be the Cartesian product of their underlying sets. Consider the metrics d_p for each p ≥ 1 and d_∞ on Z that were defined in Problem 3.9. Show that, for an arbitrary p ≥ 1,

d_∞(x, y) ≤ d_p(x, y) ≤ d_1(x, y) ≤ n d_∞(x, y)

for every x = (x_1, …, x_n) and y = (y_1, …, y_n) in Z = ∏_{i=1}^{n} X_i.
Hint: d_p(x, y) ≤ d_1(x, y) by the Jensen inequality (Problem 3.5). Thus conclude that the metrics d_∞ and d_p for every p ≥ 1 are all uniformly equivalent on Z, so that the product spaces (∏_{i=1}^{n} X_i, d_∞) and (∏_{i=1}^{n} X_i, d_p) for every p ≥ 1 are all uniformly homeomorphic.

Problem 3.34. Let (X, d) be a metric space and equip the real line R with its usual metric. Take u, v ∈ X arbitrary and note that both functions

d(·, u): (X, d) → R and d(v, ·): (X, d) → R
preserve convergence (Problem 3.14(a)), and hence they are continuous by Corollary 3.8. Now consider the Cartesian product X×X equipped with the metric d_1 of Problem 3.9:

d_1(z, w) = d(x, u) + d(y, v) for every z = (x, y) and w = (u, v) in X×X.

(a) Show that d(·, ·): (X×X, d_1) → R is continuous.
Hint: Verify that x_n → x and y_n → y in (X, d) whenever (x_n, y_n) → (x, y) in (X×X, d_1). Now use Problem 3.14(b) and Corollary 3.8.
Next let d′ denote any of the metrics d_p (for an arbitrary p ≥ 1) or the metric d_∞ on X×X as in Problem 3.9.
(b) Show that d(·, ·): (X×X, d′) → R is continuous.
Hint: According to Problem 3.33 and Corollary 3.19, it can be verified that the identity I: (X×X, d′) → (X×X, d_1) is a (uniform) homeomorphism. Now use item (a) and Corollary 3.13.

Problem 3.35. If d and d′ are equivalent metrics on a set X, then an X-valued sequence {x_n} converges in (X, d) to x ∈ X if and only if it converges in (X, d′) to the same limit x (Corollary 3.19). If d and d′ are not equivalent, then it may happen that an X-valued sequence {x_n} converges to x ∈ X in (X, d) but does not converge (to any point) in (X, d′) (e.g., see Examples 3.F and 3.G). Can a sequence converge in (X, d) to a point x ∈ X and also converge in (X, d′) to a different point x′ ∈ X? Yes, it can. We shall equip a set X with two metrics d and d′, and exhibit an X-valued sequence {x_n} and a pair of distinct points x and x′ in X such that

x_n → x in (X, d) and x_n → x′ in (X, d′).

Consider the set R² and let d denote the Euclidean metric on it (or any of the metrics on R² introduced in Example 3.A, which are uniformly equivalent according to Problem 3.33). Set v = (0, 1) ∈ R² and let V be the vertical axis joining v to the point 0 = (0, 0) ∈ R² (in the jargon of Chapter 2, set V = span{v}). Now consider a function d′: R²×R² → R defined as follows. If x and y are both in R²\V or both in V, then set d′(x, y) = d(x, y). If one of them is in V but not the other, then d′(x, y) = d(x + v, y) if x ∈ V and y ∈ R²\V, or d′(x, y) = d(x, y + v) if x ∈ R²\V and y ∈ V.
(a) Show that d′ is a metric on R².
Hint: If x, y ∈ V, then d′(x, y) = d(x + v, y + v).
Take the R²-valued sequence {x_n} with x_n = (1/n, 1) for each n ≥ 1. Show that
(b) x_n → v in (R², d) and x_n → 0 in (R², d′).
(This construction was communicated by Ivo Fernandez Lopez.)
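The construction above can be checked numerically. This sketch implements d and d′ for the specific example of the problem; the function names and the sampled values of n are ours:

```python
# Numerical sketch of Problem 3.35: the same sequence x_n = (1/n, 1)
# converges to v = (0, 1) in the Euclidean metric d but to 0 = (0, 0)
# in the modified metric d'.

def d(x, y):
    # Euclidean metric on R^2
    return ((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2) ** 0.5

v = (0.0, 1.0)

def in_V(x):
    # V = span{v} is the vertical axis {(0, t): t in R}
    return x[0] == 0

def shift(x):
    # translate by v
    return (x[0] + v[0], x[1] + v[1])

def d_prime(x, y):
    # d'(x, y) = d(x, y) if x and y are both in V or both outside V;
    # otherwise the point lying in V is translated by v first.
    if in_V(x) == in_V(y):
        return d(x, y)
    return d(shift(x), y) if in_V(x) else d(x, shift(y))

for n in (1, 10, 100, 1000):
    xn = (1.0 / n, 1.0)
    print(n, d(xn, v), d_prime(xn, (0.0, 0.0)))  # both distances equal 1/n
```

Since x_n never lies in V, d′(x_n, 0) measures the distance from x_n to 0 + v = v, which is exactly why the d′-limit is 0 while the d-limit is v.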
Problem 3.36. Upper and lower semicontinuity were defined in Problem 3.27. Equip the real line with its usual metric, let X be a metric space, and consider the following statement. (i) Let ϕ : X → R be an upper semicontinuous function on X and let ψ: X → R be a lower semicontinuous function on X. If ϕ(x) ≤ ψ(x) for every x ∈ X, then there is a continuous function η : X → R such that ϕ(x) ≤ η(x) ≤ ψ(x) for every x ∈ X. This is the Hahn Interpolation Theorem. Use it to prove the Tietze Extension Theorem, which is stated below. (ii) If f : A → R is a bounded and continuous function on a nonempty and closed subset A of a metric space X, then f has a continuous extension g: X → R over the whole space X such that inf x∈X g(x) = inf a∈A f (a) and supx∈X g(x) = supa∈A f (a). Hint : Verify that the functions ϕ : X → R and ψ: X → R defined by f (x), x ∈ A, f (x), x ∈ A, and ψ(x) = ϕ(x) = supa∈A f (a), x ∈ X\A, inf a∈A f (a), x ∈ X\A, extend f : A → R over X and satisfy the hypothesis of the Hahn Interpolation Theorem. To show that ϕ is upper semicontinuous at an arbitrary point a0 ∈ A, use the fact that f is continuous on A (take β > f (a0 ) and set ε = β − f (a0 )). To show that ϕ is upper semicontinuous at an arbitrary point x0 ∈ X\A, use the fact that X\A is open in X and so it includes an open ball Bδ (x0 ) centered at x0 for some δ > 0 (d(x, x0 ) < δ implies ϕ(x) < β for every β > ϕ(x0 )). Problem 3.37. We can define neighborhoods in a general topological space as we did in a metric space. Precisely, a neighborhood of a point x in a topological space X is any subset of X that includes an open subset which contains x. (a) Show that a subset of a topological space X is open in X if and only if it is a neighborhood of each one of its points. A topological space X is a Hausdorff space if for every pair of distinct points x and y in X there exist neighborhoods Nx and Ny of x and y, respectively, such that Nx ∩ Ny = ∅. Prove the following propositions. 
(b) Each singleton in a Hausdorff space is closed (i.e., X\{x} is open in X for every x ∈ X). (c) Every metric space is a Hausdorff space (regarding the metric topology). (d) For every pair of distinct points x and y in a metric space X there exist nonempty open balls Bε (x) and Bρ (y) centered at x and y, respectively, such that Bε (x) ∩ Bρ (y) = ∅.
Problem 3.38. Let (X, T_X) be a topological space and let A be a subset of X. A set U ⊆ A is said to be open relative to A if U = A ∩ U′ for some U′ ∈ T_X. Prove the assertions (a) through (g).
(a) The collection T_A of all relatively open subsets of A is a topology on A.
T_A is called the relative topology on A. When a subset A of X is equipped with this relative topology it is called a subspace of X; that is, (A, T_A) is a subspace of (X, T_X). If a subspace A of X is an open subset of X, then it is called an open subspace of X. Similarly, if it is a closed subset of X, then it is called a closed subspace of X. If (Y, T_Y) is a topological space, and if F: X → Y is a mapping of a set X into Y, then the collection F^{−1}(T_Y) = {F^{−1}(U): U ∈ T_Y} of all inverse images F^{−1}(U) of open sets U in Y forms a topology on X. This is the topology inversely induced on X by F, which is the weakest topology on X that makes F continuous.
(b) The relative topology on A is the topology inversely induced on A by the inclusion map of A into X. (Recall that the inclusion map of A into X is the function J: A → X defined by J(x) = x for every x ∈ A.)
Let (X, d) be a metric space and let T_X be the metric topology on X. Suppose A is a subset of X and consider the (metric) subspace (A, d) of the metric space (X, d). Let T_A′ be the metric topology induced on A by the relative metric (i.e., let T_A′ be the collection of all open sets in the metric space (A, d)).
(c) U ⊆ A is open in (A, d) if and only if U = A ∩ U′ for some U′ ⊆ X open in (X, d).
Thus the metric topology T_A′ induced on A by the relative metric coincides with the relative topology T_A induced on A by the metric topology T_X on X; that is, T_A′ = T_A, and hence the notion of subspace is unambiguously defined in a metric space. Let A be a subspace of a metric space X.
(d) V ⊆ A is closed in A (or closed relative to A) if and only if V = A ∩ V′ for some closed subset V′ of X.
Hint: A\(A ∩ B) = A ∩ (X\B) for arbitrary subsets A and B of X.
(e) Open subsets of an open subspace of X are open sets in X. Dually, closed subsets of a closed subspace of X are closed sets in X.
Let A be a subset of B (A ⊆ B ⊆ X) and let A− and B− be the closures of A and B in X, respectively.
(f) B ∩ A− coincides with the closure of A in the subspace B.
(g) A is dense in the subspace B if and only if A− = B−.

Problem 3.39. A point x in a metric space X is a condensation point of a subset A of X if the intersection of A with every nonempty open ball centered at x is uncountable. Let Ã denote the set of all condensation points of an arbitrary set A in X. Show that Ã is closed in X and Ã ⊆ A′, where A′ is the derived set of A.

Problem 3.40. Take an arbitrary real number p ≥ 1 and consider the metric space (R^p[0,1], d_p) of Example 3.E. Let C[0,1] be the set of all scalar-valued continuous functions on the interval [0,1] as in Example 3.D. Recall that the inclusion C[0,1] ⊂ R^p[0,1] is interpreted in the following sense: if x ∈ C[0,1], then [x] ∈ R^p[0,1]. Therefore, we are identifying C[0,1] with the collection of all equivalence classes [x] = {x′ ∈ r^p[0,1]: δ_p(x′, x) = 0} that contain a continuous function x ∈ C[0,1] (see Example 3.E). Use the Closed Set Theorem (Theorem 3.30) to show that C[0,1] is neither closed nor open in (R^p[0,1], d_p).
Hint: As usual, write x for [x]. Consider the C[0,1]-valued sequence {x_n}_{n=1}^{∞} and the R^p[0,1]\C[0,1]-valued sequence {y_n}_{n=1}^{∞} defined by

x_n(t) = 1 if t ∈ [0, 1/2], x_n(t) = n + 1 − 2nt if t ∈ [1/2, (n+1)/2n], x_n(t) = 0 if t ∈ [(n+1)/2n, 1],

and

y_n(t) = 1 if t ∈ [0, 1/n), y_n(t) = 0 if t ∈ [1/n, 1].
Problem 3.41. Let A be a subset of a metric space X. The boundary of A is the set ∂A = A−\A°. A point x ∈ X is a boundary point of A if it belongs to ∂A. Prove the following propositions.
(a) ∂A = A− ∩ (X\A)− = X\(A° ∪ (X\A)°) = ∂(X\A).
(b) A− = A° ∪ ∂A, so that A = A− if and only if ∂A ⊆ A.
(c) ∂A is a closed subset of X (i.e., (∂A)− = ∂A).
(d) A° ∩ ∂A = ∅, so that A = A° if and only if A ∩ ∂A = ∅.
(e) The collection {A°, ∂A, (X\A)°} is a partition of X.
(f) ∂A ∩ ∂B = ∅ implies (A ∪ B)° = A° ∪ B° (for B ⊆ X).

Problem 3.42. Let (X, d) be a metric space.
(a) Show that a closed ball is a closed set.
Let B_ρ(x) and B_ρ[x] be arbitrary nonempty open and closed balls, respectively, both centered at the same point x ∈ X and with the same radius ρ > 0. Prove the following propositions.
(b) B_ρ(x) ⊆ B_ρ[x]° and ∂B_ρ[x] ⊆ {y ∈ X: d(y, x) = ρ}.
(c) B_ρ(x)− ⊆ B_ρ[x] and ∂B_ρ(x) ⊆ {y ∈ X: d(y, x) = ρ}.
(d) Show that the above inclusions may be proper. Also show that, in general, ∂B_ρ(x) and ∂B_ρ[x] are not related by inclusion.
Hint: X = [0, 1] ∪ [2, 3], x = 1, and ρ = 1.
Problem 3.43. Let A be an arbitrary set in a metric space (X, d). Show that
(a) max{diam(A°), diam(∂A)} ≤ diam(A) = diam(A−),
(b) d(x, A) = d(x, A−) and d(x, A) = 0 if and only if x ∈ A−.

Problem 3.44. For an arbitrary p ≥ 1 let ℓ+^p be the set of all scalar-valued p-summable sequences as in Example 3.B. Let ℓ+^e be the set of all scalar-valued sequences {ξ_k}_{k∈N} for which there exist ρ > 1 and α ∈ (0, 1) such that |ξ_k| ≤ ρα^k for every k ∈ N; and let ℓ+^0 be the set of all scalar-valued sequences with a finite number of nonzero entries.
(a) Prove: If p and q are real numbers such that 1 ≤ p < q, then

ℓ+^0 ⊂ ℓ+^e ⊂ ℓ+^p ⊂ ℓ+^q,

where the above inclusions are all proper. (Hint: Problem 3.5.)
(b) Moreover, show that the sets ℓ+^p, ℓ+^0, and ℓ+^e are all dense in the metric space (ℓ+^q, d_q) of Example 3.B. (Hint: Example 3.P.)
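A numerical illustration of why one inclusion in part (a) is proper: the sequence ξ_k = 1/k is 2-summable but not 1-summable. Partial sums over finitely many terms only suggest the limits; they prove nothing (the cutoffs below are arbitrary choices of ours):

```python
# Numerical sketch of Problem 3.44(a): xi_k = 1/k lies in l^2_+ but not
# in l^1_+, so the inclusion of l^1_+ in l^2_+ is proper.

def partial_sum(xi, p, N):
    # N-term partial sum of sum_k |xi(k)|^p
    return sum(abs(xi(k)) ** p for k in range(1, N + 1))

xi = lambda k: 1.0 / k

s1 = [partial_sum(xi, 1, N) for N in (10, 100, 1000, 10000)]  # grows like log N
s2 = [partial_sum(xi, 2, N) for N in (10, 100, 1000, 10000)]  # bounded by pi^2/6

print(s1)
print(s2)
```

The p = 1 partial sums keep climbing (harmonic series), while the p = 2 partial sums stay below π²/6, consistent with ℓ+^1 ⊂ ℓ+^2 being proper.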
Problem 3.45. Let A and B be subsets of a metric space X. Recall that (A ∩ B)− ⊆ A− ∩ B− and this inclusion may be proper. For instance, the sets in Example 3.M are disjoint while their closures are not. We shall now exhibit a pair of sets A and B with the following property:

A ∩ B ≠ ∅ and (A ∩ B)− ≠ A− ∩ B−.

Consider the metric space (ℓ+^2, d_2) as in Example 3.B. Set

A = ℓ+^1 and B = {x_β ∈ ℓ+^2: x_β = β{1/k}_{k≥1} for some β ∈ C}.

Recall that ℓ+^1 ⊂ ℓ+^2 (Problem 3.5) and (ℓ+^1)− = ℓ+^2 in (ℓ+^2, d_2) (i.e., the set ℓ+^1 is dense in the metric space (ℓ+^2, d_2) — Problem 3.44(b)). Show that B = B−; that is, B is closed in (ℓ+^2, d_2). (Hint: Theorem 3.30.) Thus conclude that A ∩ B = {0}, (A ∩ B)− = {0}, and A− ∩ B− = B.
Problem 3.46. Suppose F: X → Y is a continuous mapping of a metric space X into a metric space Y. Show that

(a) F(A−) ⊆ F(A)−

for every A ⊆ X. (Hint: Problem 1.2 and Theorem 3.23.) Now use the above result to conclude that, if A ⊆ X and C ⊆ Y, then

(b) F(A) ⊆ C implies F(A−) ⊆ C−.

Finally, prove that

(c) A− = B− implies F(A)− = F(B)−

whenever A ⊆ B ⊆ X. (Hint: Proposition 3.32 and Corollary 3.8.) Thus, if A is dense in X and if F is continuous and has a dense range (in particular, if F is surjective), then F(A) is dense in Y.

Problem 3.47. Consider the metric space (ℓ+^p, d_p) for any p ≥ 1. Take any scalar-valued bounded sequence a = {α_n}_{n≥1} from ℓ+^∞, and consider the diagonal mapping D_a: (ℓ+^p, d_p) → (ℓ+^p, d_p) defined in Problem 3.22. Suppose the bounded sequence {α_n}_{n≥1} is such that α_n ≠ 0 for every n ≥ 1. Let ℓ+^0 be the set of all scalar-valued sequences with a finite number of nonzero entries.
(a) Show that R(D_a)− = ℓ+^p (i.e., the range of D_a is dense in (ℓ+^p, d_p)).
Hint: Verify that ℓ+^0 ⊆ R(D_a) ⊆ ℓ+^p (see Problem 3.44).
(b) Show that D_a(ℓ+^0)− = ℓ+^p.
Hint: Problems 3.22(a), 3.44(b), and 3.46(c).

Problem 3.48. Prove the following results.
(a) If X is a separable metric space and F: X → Y is a continuous and surjective mapping of X onto a metric space Y, then Y is separable (i.e., a continuous mapping preserves separability).
Hint: Recall that, if there exists a surjective function of a set A onto a set B, then #B ≤ #A. Use Problem 3.46.
(b) Separability is a topological invariant.

Problem 3.49. Verify the following propositions.
(a) A metric space X is separable if and only if there exists a countable subset A of X such that every nonempty open ball centered at each x ∈ X meets A (i.e., if and only if there is a countable subset A of X such that every x ∈ X is a point of adherence of A). (Hint: Propositions 3.27 and 3.32.)
(b) The product space (∏_{i=1}^{n} X_i, d) of a finite collection {(X_i, d_i)}_{i=1}^{n} of separable metric spaces is a separable metric space. (Note: d is any of the metrics on ∏_{i=1}^{n} X_i that were defined in Problem 3.9.)
(c) If ℓ+^co is the set of all scalar-valued sequences that converge to zero, then (ℓ+^co, d_∞) is a separable metric space.
Hint: According to Proposition 3.39(b), (ℓ+^co, d_∞) is a subspace of the nonseparable metric space (ℓ+^∞, d_∞) of Example 3.Q. Show that the set of all rational-valued sequences with a finite number of nonzero entries is countable and dense in (ℓ+^co, d_∞) (but not dense in (ℓ+^∞, d_∞)). See Example
3.P. (Note: We say that a complex number is "rational" if its real and imaginary parts are rational numbers.)
(d) The metric space (ℓ+^co, d_∞) is not homeomorphic to (ℓ+^∞, d_∞).

Problem 3.50. A subset A of a topological space X that is both closed and open is called clopen (or closed-open). A partition {A, B} of X into the union of two nonempty disjoint clopen sets A and B is a disconnection of X (i.e., {A, B} is a disconnection of X = A ∪ B if A ∩ B = ∅ and A and B are both clopen subsets of X). If there exists a disconnection of X, then X is called disconnected. Otherwise X is called connected. In other words, a topological space X is connected if and only if the only clopen subsets of X are the whole space X and the empty set ∅. A subset A of X is a connected set if, as a subspace of X, A is a connected topological space. A topological space is totally disconnected if there is no connected subset of it containing two distinct points. Prove the following propositions.
(a) X is disconnected if and only if it is the union of two disjoint nonempty open sets.
(b) X is disconnected if and only if it is the union of two disjoint nonempty closed sets.
(c) If A is a connected set in a topological space X, and if A ⊆ B ⊆ A−, then B is connected. In particular, A− is connected whenever A is connected. Note: The closure A− of a set A is defined in a topological space X in the same way we have defined it in a metric space: the smallest closed subset of X including A.
(d) The continuous image of a connected set is a connected set.
(e) Connectedness is a topological invariant.
Recall that a subset A of a topological space X is discrete if it consists entirely of isolated points (i.e., if every point x in A does not belong to (A\{x})−, which means that in the subspace A every set is open in A). Suppose A is a discrete subset of X. If B is an arbitrary subset of A that contains more than one point, and if x ∈ B, then B\{x} and {x} are both nonempty and open in A. Thus B is disconnected, and hence A is totally disconnected. Conclusion: A discrete subset of a topological space is totally disconnected.
(f) Show that the converse of the above italicized proposition fails. Hint : Verify that Q is a totally disconnected subset of R that is dense in itself (i.e., it has no isolated point). Problem 3.51. Prove the following proposition. (a) {xn } is a Cauchy sequence in a metric space (X, d) if and only if
186
3. Topological Structures
lim sup d(xn+k , xn ) = 0 n
k≥1
(i.e., if and only if {d(xn+k , xn )} converges to zero uniformly in k). (b) Show that the real-valued sequence {xn } such that xn = log n for each n ≥ 1 has the following property. lim d(xn+k , xn ) = 0 n
for every integer k ≥ 1, where d is the usual metric on R. However, {x_n} is not a Cauchy sequence (it is not even bounded). Hint: log: (0, ∞) → R is continuous. Use Corollary 3.8.
Problem 3.52. Let (X, d) be a metric space and let {x_n} be an X-valued sequence. We say that {x_n} is of bounded variation if
∑_{n=1}^∞ d(x_{n+1}, x_n) < ∞.
If there exist real constants ρ ≥ 1 and α ∈ (0, 1) such that d(x_{n+1}, x_n) ≤ ρα^n for every n, then we say that {x_n} has exponentially decreasing increments. Prove the following propositions. (a) If a sequence in a metric space has exponentially decreasing increments, then it is of bounded variation. (b) If a sequence in a metric space is of bounded variation, then it is a Cauchy sequence. Thus, if (X, d) is a complete metric space, then every sequence of bounded variation converges in (X, d), which implies that every sequence with exponentially decreasing increments converges in (X, d). (c) Every Cauchy sequence in a metric space has a subsequence with exponentially decreasing increments (and therefore every Cauchy sequence in a metric space has a subsequence of bounded variation). Now prove the converse of the above italicized statement: (d) If every sequence with exponentially decreasing increments in a metric space (X, d) converges in (X, d), then (X, d) is complete.
Hints: (a) Verify that ∑_{n=0}^m α^n → 1/(1 − α) as m → ∞ for any α ∈ (0, 1). (b) Use the triangle inequality and Problems 3.10 and 3.11. (c) Show that for each integer k ≥ 1 there exists an integer n_k ≥ 1 such that d(x_n, x_{n_k}) ≤ (1/2)^k for every n ≥ n_k whenever {x_n} is a Cauchy sequence. (d) Proposition 3.39(c).
Problem 3.53. Let {x_n} and {y_n} be Cauchy sequences in a metric space (X, d) and consider the real sequence {d(x_n, y_n)}. Show that (a) {d(x_n, y_n)} converges in R. Hint: Use Problems 3.1(b) and 3.10(c) to show that {d(x_n, y_n)} is a Cauchy sequence in R. Moreover, if {x_n′} and {y_n′} are Cauchy sequences in a metric space (X, d) such that lim d(x_n, x_n′) = 0 and lim d(y_n, y_n′) = 0 (i.e., if {x_n′} and {y_n′} are Cauchy sequences in (X, d) equiconvergent with {x_n} and {y_n}, respectively — see Problem 3.15), then (b) lim d(x_n, y_n) = lim d(x_n′, y_n′). Hint: Set α_n = d(x_n, y_n), α_n′ = d(x_n′, y_n′), α = lim α_n, and α′ = lim α_n′. Use Problems 3.1(b) and 3.10(c) to show that |α_n − α_n′| → 0. Now note that 0 ≤ |α − α′| ≤ |α − α_n| + |α_n − α_n′| + |α_n′ − α′| for each n.
Problem 3.54. Suppose {x_n} and {x_n′} are two equiconvergent sequences in a metric space X — see Problem 3.15. (a) Show that if one of them is a Cauchy sequence, then so is the other. (b) A metric space X is complete whenever there exists a dense subset A of X such that every Cauchy sequence in A converges in X. Prove.
Problem 3.55. Let X be an arbitrary set. A function d: X×X → R is an ultrametric on X if it satisfies conditions (i), (ii), and (iii) in Definition 3.1 and also the ultrametric inequality, viz.,
d(x, y) ≤ max{d(x, z), d(z, y)}
for every x, y, and z in X. Clearly, the ultrametric inequality implies the triangle inequality, so that an ultrametric is a metric (example: the discrete metric is an ultrametric). Let d be an ultrametric on X and let x, y, and z be arbitrary points in X. Prove the following propositions. (a) If d(x, z) ≠ d(z, y), then d(x, y) = max{d(x, z), d(z, y)}. (b) Every point in a nonempty open ball is a center of that ball. That is, if ε > 0 and z ∈ B_ε(y), then B_ε(y) = B_ε(z).
Hint: Suppose z ∈ B_ε(y) and take any x ∈ B_ε(y). If d(x, z) = d(z, y), then x ∈ B_ε(z). Use item (a) to show that if d(x, z) ≠ d(z, y), then x ∈ B_ε(z). Thus conclude that B_ε(y) ⊆ B_ε(z) whenever z ∈ B_ε(y).
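Item (b) can also be probed concretely. The sketch below (an illustration, not part of the problem set) uses the 2-adic metric d(x, y) = 2^(−v(x − y)) on the integers, where v counts the factors of 2 in x − y; this d satisfies the ultrametric inequality, and every point of a nonempty open ball is indeed a center of it:

```python
# Sketch of Problem 3.55(b) with a concrete ultrametric: the 2-adic metric on
# the integers, d(x, y) = 2**(-v2(x - y)), with d(x, x) = 0.  The ultrametric
# inequality follows from v2(a + b) >= min(v2(a), v2(b)).
def v2(n):
    # number of factors of 2 in a nonzero integer n
    n = abs(n)
    c = 0
    while n % 2 == 0:
        n //= 2
        c += 1
    return c

def d(x, y):
    return 0.0 if x == y else 2.0 ** (-v2(x - y))

points = range(-64, 65)          # a finite window of Z, enough to illustrate

def ball(center, eps):
    return {x for x in points if d(x, center) < eps}

eps, y = 0.3, 8
B = ball(y, eps)                 # here: all x with x = y (mod 4)
for z in B:
    assert ball(z, eps) == B     # B_eps(y) = B_eps(z) for every z in B_eps(y)
```

The last assertion is exactly the statement of item (b), restricted to this finite window of the integers.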
(c) If two nonempty open balls meet, then one of them is included in the other. In particular, if two nonempty open balls of the same radius meet, then they coincide with each other. (d) Any nonempty open ball is a closed set. (Hint: Theorem 3.30.) Thus infer that the metric space (X, d) is totally disconnected. (Hint: Proposition 3.10 and Problem 3.50.) (e) A sequence {x_n} in (X, d) is a Cauchy sequence if and only if
lim_n d(x_{n+1}, x_n) = 0.
Then conclude that {d(x_{n+k}, x_n)} converges to zero uniformly in k if and only if it converges to zero for some integer k ≥ 1; that is, verify that lim_n sup_{k≥1} d(x_{n+k}, x_n) = 0 if and only if lim_n d(x_{n+k}, x_n) = 0 for some k ≥ 1. (Compare with Problem 3.51.)
Problem 3.56. Let S be a nonempty set. Consider the collection S^N of all S-valued sequences. For any distinct sequences x = {x_k} and y = {y_k} in S^N, set k(x, y) = min{k ∈ N: x_k ≠ y_k}. Define a function d: S^N × S^N → R as follows:
d(x, y) = 0 if x = y,  and  d(x, y) = 1/k(x, y) if x ≠ y.
(a) Show that d is an ultrametric on S^N and (S^N, d) is a complete metric space. This metric d is called the Baire metric on S^N.
Set S = F (where F denotes either the real field R or the complex field C). Let ℓ_+^∞ be the set of all bounded scalar-valued sequences (i.e., x = {ξ_k} ∈ ℓ_+^∞ if and only if sup_k |ξ_k| < ∞). Let d be the Baire metric on F^N and consider the subspace (ℓ_+^∞, d) of the complete metric space (F^N, d). Take the following ℓ_+^∞-valued sequence {x_n}: for each n ∈ N, x_n = {ξ_n(k)}_{k∈N}, where
ξ_n(k) = n if k = n,  and  ξ_n(k) = 0 if k ≠ n.
Let d_∞ be the sup-metric on ℓ_+^∞ and recall that (ℓ_+^∞, d_∞) is a complete metric space (Example 3.S(c)).
(b) Show that {x_n} converges to 0 (the null sequence) in (ℓ_+^∞, d) but is unbounded in (ℓ_+^∞, d_∞).
(c) Show that the metric space (ℓ_+^∞, d) is not complete.
Hint: Consider the following ℓ_+^∞-valued sequence {y_n}: for each n ∈ N, y_n = {υ_n(k)}_{k∈N}, where
υ_n(k) = k if k ≤ n,  and  υ_n(k) = 0 if k > n.
Verify that {y_n} converges in (F^N, d) to y = {k}_{k∈N} ∈ F^N\ℓ_+^∞. Use item (a) and Theorems 3.30 and 3.40(a).
Problem 3.57. Recall that (ℓ_+^p, d_p) is a complete metric space for every p ≥ 1 (Example 3.R(b)) and ℓ_+^0 ⊂ ℓ_+^p ⊂ ℓ_+^q whenever 1 ≤ p < q (Problem 3.44(b)). Consider the subspace (ℓ_+^p, d_q) of (ℓ_+^q, d_q) and show that
(ℓ_+^p, d_q) is not a complete metric space.
Now consider the subspace (ℓ_+^0, d_p) of (ℓ_+^p, d_p) and show that
(ℓ_+^0, d_p) is not a complete metric space.
Hint: Problem 3.44(b) and Theorem 3.40(a).
Problem 3.58. Take an arbitrary real number p ≥ 1 and consider the metric space (C[0, 1], d_p) of Example 3.D. Prove that
(C[0, 1], d_p) is not a complete metric space.
Hint: Consider the C[0, 1]-valued sequence {x_n} defined in Problem 3.40. First take an arbitrary pair of integers m and n such that 1 ≤ m < n and show that d_p(x_m, x_n)^p < 1/2m. Then infer that {x_n} is a Cauchy sequence in (C[0, 1], d_p). Suppose there is a function x in C[0, 1] such that d_p(x_n, x) → 0. Show that
(i) ∫_0^{1/2} |1 − x(t)|^p dt = 0  and  (ii) ∫_{(n+1)/2n}^1 |x(t)|^p dt → 0 as n → ∞.
From (i) conclude that x(t) = 1 for all t ∈ [0, 1/2]; in particular, x(1/2) = 1. From (ii) conclude that x(t) = 0 for all t ∈ [(n+1)/2n, 1] and every n ≥ 1; in particular, x((n+1)/2n) = 0 for every n ≥ 1 so that x(1/2) = x(lim (n+1)/2n) = lim x((n+1)/2n) = 0 by Corollary 3.8. This leads to a contradiction (viz., 0 = 1), and hence there is no function x in C[0, 1] such that d_p(x_n, x) → 0. Thus the C[0, 1]-valued Cauchy sequence {x_n} does not converge in (C[0, 1], d_p).
Problem 3.59. Recall that (ℓ_+^∞, d_∞) is a complete metric space (Example 3.S). Let ℓ_+^c denote the set of all scalar-valued convergent sequences (i.e., x = {ξ_i} ∈ ℓ_+^c if and only if |ξ_i − ξ| → 0 for some scalar ξ) and let ℓ_+^{c0} denote the subset of ℓ_+^c consisting of all sequences that converge to zero. Since every convergent sequence is bounded (Proposition 3.39), it follows that
ℓ_+^0 ⊂ ℓ_+^p ⊂ ℓ_+^{c0} ⊂ ℓ_+^c ⊂ ℓ_+^∞,
with the sets ℓ_+^p (p ≥ 1) and ℓ_+^0 defined as before (Problems 3.44 and 3.57). Use the Closed Set Theorem (Theorem 3.30) to verify the following propositions.
(a) (ℓ_+^{c0}, d_∞) and (ℓ_+^c, d_∞) are complete metric spaces.
(b) (ℓ_+^p, d_∞) and (ℓ_+^0, d_∞) are not complete metric spaces.
Hint: (a) To show that ℓ_+^c is closed in (ℓ_+^∞, d_∞), proceed as follows. Take any ε > 0. Let {x_n}_{n≥1} be an ℓ_+^c-valued sequence so that each x_n = {ξ_n(k)}_{k≥1} converges in R. Thus, for each n ≥ 1, there is an integer k_{ε,n} ≥ 1 such that
|ξ_n(k) − ξ_n(j)| < ε whenever j, k ≥ k_{ε,n}.
Now suppose {x_n}_{n≥1} converges in (ℓ_+^∞, d_∞) to x = {ξ(k)}_{k≥1} in ℓ_+^∞ so that sup_k |ξ_n(k) − ξ(k)| → 0 as n → ∞. Thus there exists an integer n_ε ≥ 1 such that
|ξ_n(k) − ξ(k)| < ε for every k ≥ 1 whenever n ≥ n_ε.
Therefore,
|ξ(k) − ξ(j)| ≤ |ξ(k) − ξ_{n_ε}(k)| + |ξ_{n_ε}(k) − ξ_{n_ε}(j)| + |ξ_{n_ε}(j) − ξ(j)| < 3ε
whenever j, k ≥ k_{ε,n_ε}. Next conclude that x lies in ℓ_+^c. (b) To show that both sets ℓ_+^p and ℓ_+^0 are not closed in the metric space (ℓ_+^∞, d_∞), take the sequence {x_n}_{n≥1} with x_n = {1, (1/2)^{1/p}, . . . , (1/n)^{1/p}, 0, 0, 0, . . .} ∈ ℓ_+^0 for each n ≥ 1, which converges in (ℓ_+^∞, d_∞) to x = {(1/k)^{1/p}}_{k≥1} ∈ ℓ_+^{c0}\ℓ_+^p.
Remark: Note that ℓ_+^p is not dense in (ℓ_+^∞, d_∞). Indeed, if y = {υ_k}_{k≥1} is the constant sequence in ℓ_+^∞ with υ_k = 1 for all k ≥ 1, then d_∞(x, y) ≥ 1 for every x in ℓ_+^p ⊂ ℓ_+^{c0}. Hence ℓ_+^0 is not dense in (ℓ_+^∞, d_∞) either.
Problem 3.60. Let {(X_i, d_i)}_{i=1}^n be a finite collection of metric spaces and let Z = ∏_{i=1}^n X_i be the Cartesian product of their underlying sets. Let d denote any of the metrics d_p (for an arbitrary p ≥ 1) or d_∞ on Z = ∏_{i=1}^n X_i as in Problem 3.9. Show that the product space (∏_{i=1}^n X_i, d) is complete if and only if (X_i, d_i) is a complete metric space for every i = 1, . . . , n.
Hint: Consider the metric d_1 on Z (see Problem 3.9). Show that (∏_{i=1}^n X_i, d_1) is complete if and only if each (X_i, d_i) is complete. Now use Problem 3.33 and Lemma 3.43.
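The reduction to d_1 in the hint above rests on the fact that the product metrics bound one another. The sketch below (an illustration, not part of the text) checks numerically that d_∞ ≤ d_1 ≤ n·d_∞ on a product of n = 2 copies of the real line, which is why Cauchy sequences, and hence completeness, can be tested in whichever product metric is convenient:

```python
# Problem 3.60 in miniature: on Z = X_1 x X_2 the metrics d_1 and d_infinity
# of Problem 3.9 are uniformly equivalent, d_infinity <= d_1 <= n*d_infinity.
import random

def d1(x, y, ds):
    return sum(d(a, b) for d, a, b in zip(ds, x, y))

def dinf(x, y, ds):
    return max(d(a, b) for d, a, b in zip(ds, x, y))

usual = lambda a, b: abs(a - b)
ds = (usual, usual)                  # two copies of (R, usual metric)

random.seed(0)
for _ in range(100):
    x = (random.random(), random.random())
    y = (random.random(), random.random())
    a, b = d1(x, y, ds), dinf(x, y, ds)
    assert b <= a + 1e-12            # d_infinity <= d_1
    assert a <= 2 * b + 1e-12        # d_1 <= n * d_infinity with n = 2
```

The same two inequalities hold for any finite number of factors, with n in place of 2.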
Problem 3.61. Take an arbitrary nondegenerate closed interval (of positive length) of the real line, say I = [α, β] ⊂ R of length λ = β − α. Consider the pair of closed intervals [α, α + λ/3], [β − λ/3, β] consisting of the first and third closed subintervals of I = [α, β] of length λ/3, which will be referred to as the closed intervals derived from I by removal of the central open third of I. If I = {I_i}_{i=1}^n is a finite disjoint collection of nondegenerate closed intervals in R, then let I′ = {I_i′}_{i=1}^{2n} be the corresponding finite collection obtained by replacing each closed interval I_i in I with the pair of closed subintervals derived from I_i by removal of the central open third of I_i. Now consider the unit interval [0, 1] and set
I_0 = {[0, 1]}. The intervals derived from [0, 1] by removal of the central open third are [0, 1/3] and [2/3, 1]. Set
I_1 = I_0′ = {[0, 1/3], [2/3, 1]}.
Similarly, replacing each closed interval in I_1 by the pair of closed subintervals derived from it by removal of its central open third, set
I_2 = I_1′ = {[0, 1/9], [2/9, 1/3], [2/3, 7/9], [8/9, 1]}.
Take an arbitrary positive integer n. Suppose the collection of intervals I_k has already been defined for each k = 0, 1, . . . , n, and set I_{n+1} = I_n′. This leads to an inductive construction of a disjoint collection I_n of 2^n nondegenerate closed subintervals of the unit interval [0, 1], each having length 1/3^n, for every n ∈ N_0. Set C_n = ∪I_n (the union of the intervals in I_n) for each n ∈ N_0 so that {C_n}_{n∈N_0} is a decreasing sequence of subsets of the unit interval (i.e., C_{n+1} ⊂ C_n ⊂ C_0 = [0, 1]) such that each set C_n is the union of 2^n disjoint nondegenerate subintervals of length 1/3^n.
[Figure: the first stages of the construction, C_0 = [0, 1], C_1 = [0, 1/3] ∪ [2/3, 1], and C_2 = [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1].]
The Cantor set is the set
C = ∩_{n∈N_0} C_n.
Equip the real line R with its usual metric and prove the following assertions.
(a) The Cantor set is a nonempty, closed, and bounded subset of R.
(b) The Cantor set has an empty interior and hence it is nowhere dense. Hint: Recall that each set C_n consists of 2^n intervals of length 1/3^n. Take an arbitrary point γ ∈ C and an arbitrary ε > 0. Verify that there exists a positive integer n_ε such that the open ball B_ε(γ) is not included in C_{n_ε}. Now conclude that the nonempty open ball B_ε(γ) is not included in C, which means that γ is not an interior point of C.
(c) The Cantor set has no isolated point and hence it is a perfect subset of R. Hint: Consider the hint to item (b). Verify that there exists a positive integer n_ε such that the open ball B_ε(γ) includes an end point of some of the closed intervals of C_{n_ε}.
(d) The Cantor set is uncountable. Hint: See Example 3.W(b), and recall that R is a complete metric space.
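The inductive construction can be carried out exactly with rational arithmetic. The sketch below (an illustration, not part of the text) builds the collections I_n and checks the counts and lengths quoted above: 2^n intervals of length 1/3^n at stage n, with total length (2/3)^n, and with endpoints such as 1/3 and 2/3 surviving every stage:

```python
# Problem 3.61, computationally: the collections I_n of closed intervals.
from fractions import Fraction

def derive(interval):
    # the two closed thirds left after removing the central open third
    a, b = interval
    lam = (b - a) / 3
    return [(a, a + lam), (b - lam, b)]

I = [(Fraction(0), Fraction(1))]            # I_0 = {[0, 1]}
for n in range(1, 8):
    I = [j for i in I for j in derive(i)]   # I_n = I_{n-1}'
    assert len(I) == 2 ** n
    assert all(b - a == Fraction(1, 3 ** n) for a, b in I)
    assert sum(b - a for a, b in I) == Fraction(2 ** n, 3 ** n)

# Interval endpoints are never removed, so points like 1/3 and 2/3 lie in C.
assert any(a <= Fraction(1, 3) <= b for a, b in I)
assert any(a <= Fraction(2, 3) <= b for a, b in I)
```

The vanishing total length (2/3)^n is exactly the computation made in the remark following item (e): the Cantor set has length zero.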
(e) The Cantor set is totally disconnected. Hint: If α and γ are two points of an arbitrary subset A of C such that α < γ, and if n is a positive integer such that 1/3^n < γ − α, then α and γ cannot both belong to a single interval of length 1/3^n. Thus α and γ must belong to different intervals in C_n, so that there exists a real number β such that α < β < γ and β ∉ C_n. Verify that {A ∩ (−∞, β), A ∩ (β, ∞)} is a disconnection of the set A. See Problem 3.50.
Remark: Let μ(C_n) denote the length of each set C_n, which consists of 2^n disjoint intervals of length 1/3^n. Therefore, μ(C_n) = (2/3)^n. If we agree that the length μ(A) of a subset A of the real line can be defined somehow so as to bear the property that 0 ≤ μ(A) ≤ μ(B) whenever A ⊆ B ⊆ R (provided the lengths μ(A) and μ(B) are "well defined"), then the Cantor set C is such that 0 ≤ μ(C) = μ(∩_n C_n) ≤ μ(C_m) for all m (note: the length μ(∩_n C_n) of ∩_n C_n is "well defined" whenever the length μ(C_n) is "well defined" for every C_n). Hence μ(C) = 0. That is, the Cantor set has length zero.
Problem 3.62. Consider the construction of the previous problem. Note that each set C_n (for n ≥ 1) is obtained from C_{n−1} by removing 2^{n−1} central open subintervals, each of length 1/3^n. Now, instead of removing at each iteration 2^{n−1} central open subintervals of length 1/3^n, remove at each iteration 2^{n−1} central open subintervals of length 1/4^n. Let {S_n}_{n∈N_0} be the resulting collection of closed subsets of the unit interval S_0 = [0, 1]. Note that the length of S_n is
μ(S_n) = 1 − ∑_{i=0}^{n−1} 2^i/4^{i+1} = 1/2 + 1/2^{n+1}
for each n ∈ N. Consider the sequence {x_n} consisting of the characteristic functions of the subsets S_n of S_0; that is, for every n ∈ N,
x_n(t) = 1 if t ∈ S_n,  and  x_n(t) = 0 if t ∈ S_0\S_n.
Each x_n belongs to R^p[0, 1] for every p ≥ 1 (see Example 3.E), so that {x_n} is an R^p[0, 1]-valued sequence. Equip R^p[0, 1] with its usual metric d_p.
(a) Show that {x_n} is a Cauchy sequence in (R^p[0, 1], d_p).
(b) Show that {x_n} does not converge in (R^p[0, 1], d_p).
Hint: Suppose there exists x in R^p[0, 1] such that d_p(x_n, x) → 0. Consider the null function 0 ∈ R^p[0, 1] and show that d_p(x_n, 0) > (1/2)^{1/p} for all n. Use the triangle inequality to infer that d_p(x, 0) ≥ (1/2)^{1/p}. On the other hand, set S = ∩_n S_n so that S_0\S = ∪_n (S_0\S_n). Take any m ≥ 0 and show that
∫_{∪_{n=1}^m (S_0\S_n)} |x(t)|^p dt ≤ ∫_{S_0} |x(t) − x_k(t)|^p dt
for every k ≥ m. Thus conclude that ∫_{S_0\S} |x(t)|^p dt = 0, and hence the lower Riemann integral of |x|^p is zero. Since |x|^p is Riemann integrable, we get ∫_{S_0} |x(t)|^p dt = 0. This contradicts the fact that ∫_{S_0} |x(t)|^p dt ≥ 1/2. From (a) and (b) it follows that, for any p ≥ 1,
(R^p[0, 1], d_p) is not a complete metric space.
Remark: The failure of R^p[0, 1] to be complete when equipped with its usual metric d_p is regarded as one of the defects in the definition of the Riemann integral. A more general concept of integral, viz., the Lebesgue integral, corrects this and other drawbacks of the Riemann integral. Let L^p[0, 1] be the collection of all equivalence classes (as in Example 3.E) of scalar-valued functions x on [0, 1] such that ∫_0^1 |x(t)|^p dt < ∞ (the integral now is the Lebesgue integral). Every Riemann integrable function is Lebesgue integrable (and if a function is Riemann integrable, then its Riemann and Lebesgue integrals coincide), and so R^p[0, 1] ⊂ L^p[0, 1]. Moreover, it can be shown that R^p[0, 1] is dense in (L^p[0, 1], d_p). Therefore, (L^p[0, 1], d_p) is a completion of (R^p[0, 1], d_p). In fact, let F stand either for R or C, as usual, and consider the sets
P[0, 1] ⊂ C[0, 1] ⊂ R^p[0, 1] ⊂ L^p[0, 1]
of all F-valued polynomials with coefficients in F, of all F-valued continuous functions, of all F-valued Riemann p-integrable functions, and of all F-valued Lebesgue p-integrable functions, respectively, on [0, 1], as in Examples 3.D, 3.E, and 3.P, and Problem 3.58, where the above inclusions are to be interpreted in the sense of equivalence classes as in Problem 3.40. We saw in Example 3.P that P[0, 1] is dense in (C[0, 1], d_∞), and so it is readily verified that P[0, 1] is dense in (C[0, 1], d_p). Since it can also be shown that C[0, 1] is dense in (R^p[0, 1], d_p), it follows that (L^p[0, 1], d_p) is also a completion of (P[0, 1], d_p) and (C[0, 1], d_p).
Problem 3.63.
A metric space X is complete if and only if every decreasing sequence {V_n}_{n=1}^∞ of nonempty closed subsets of X for which diam(V_n) → 0 is such that ∩_{n=1}^∞ V_n ≠ ∅.
Hint: This result, like Lemma 3.79, is also attributed to Cantor. Its proof follows closely the proof of Lemma 3.79. Consider the same X-valued sequence {v_n}_{n∈N} that was defined in part (a) of the proof of Lemma 3.79. Show that
{v_n}_{n∈N} is a Cauchy sequence if diam(V_n) → 0. Suppose X is complete, set v = lim v_n, and verify that v ∈ V_n for every n ∈ N so that ∩_{m∈N} V_m ≠ ∅. Conversely, let {x_n}_{n∈N} be an arbitrary X-valued Cauchy sequence and consider the decreasing sequence {V_m^−}_{m∈N} of nonempty closed subsets of X that was defined in part (b) of the proof of Lemma 3.79. Show that diam(V_n^−) → 0. If ∩_{m∈N} V_m^− ≠ ∅, then there exists v ∈ V_n^− for every n ∈ N. Verify that x_n → v and conclude that X is complete.
Problem 3.64. Let {(X_i, d_i)}_{i=1}^n be a finite collection of metric spaces and let Z = ∏_{i=1}^n X_i be the Cartesian product of their underlying sets. Let d denote any of the metrics d_p (for any p ≥ 1) or d_∞ on Z = ∏_{i=1}^n X_i as in Problem 3.9.
(a) Show that the product space (∏_{i=1}^n X_i, d) is totally bounded if and only if (X_i, d_i) is totally bounded for every i = 1, . . . , n.
Hint: Use Lemma 3.73 to show that (∏_{i=1}^n X_i, d_1) is totally bounded if and only if each (X_i, d_i) is totally bounded. Also apply Problem 3.33 and Corollary 3.81.
(b) Show that (∏_{i=1}^n X_i, d) is compact if and only if (X_i, d_i) is compact for every i = 1, . . . , n.
Hint: Item (a), Problem 3.60, and Corollary 3.81.
Remark: Let {X_γ}_{γ∈Γ} be an indexed family of nonempty topological spaces. Let Z = ∏_{γ∈Γ} X_γ be the Cartesian product of the underlying sets {X_γ}_{γ∈Γ}. The product topology on Z is the topology inversely induced on Z by the family {π_γ}_{γ∈Γ} of projections of Z onto each X_γ (the weakest topology on Z that makes each projection π_γ: Z → X_γ continuous). Compactness in a topological space is defined as in Definition 3.60. The important Tikhonov Theorem says that ∏_{γ∈Γ} X_γ is compact if and only if X_γ is compact for every γ ∈ Γ.
Problem 3.65. Every closed ball of positive radius in (ℓ_+^p, d_p) is not totally bounded (and hence not compact).
Hint: Consider the metric space (ℓ_+^p, d_p) of Example 3.B for any p ≥ 1. Let B_ρ[x_0] be a closed ball of radius ρ > 0 centered at an arbitrary x_0 ∈ ℓ_+^p. Take the ℓ_+^p-valued sequence {e_i}_{i∈N} of Example 3.X and set x_i = ρe_i + x_0 for each i ≥ 1. Instead of following the approach of Example 3.X, show that {x_i} is a B_ρ[x_0]-valued sequence that has no Cauchy subsequence. Apply Lemma 3.73.
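The hint's sequence can be written out in coordinates. The sketch below (an illustration only; sequences are truncated to finitely many coordinates, which does not affect the distances involved) verifies that the points x_i = ρe_i + x_0 stay on the sphere of the closed ball while remaining uniformly separated, so no subsequence can be Cauchy:

```python
# Problem 3.65 in coordinates: with x_i = rho*e_i + x_0, any two distinct
# members of this B_rho[x_0]-valued sequence are d_p-distance rho * 2**(1/p)
# apart, so the sequence has no Cauchy subsequence (and Lemma 3.73 then says
# the closed ball is not totally bounded).
rho, p = 1.0, 2.0
K = 50                                   # truncate sequences to K coordinates
x0 = [1.0 / (k + 1) for k in range(K)]   # an arbitrary center (here in l^2_+)

def e(i):
    return [1.0 if k == i else 0.0 for k in range(K)]

def dp(x, y):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

xs = [[rho * ei + c for ei, c in zip(e(i), x0)] for i in range(K)]
for i in range(K):
    assert abs(dp(xs[i], x0) - rho) < 1e-12     # x_i lies in B_rho[x_0]
    for j in range(i + 1, K):
        assert abs(dp(xs[i], xs[j]) - rho * 2 ** (1 / p)) < 1e-12
```

Since x_i − x_j = ρ(e_i − e_j) has exactly two nonzero coordinates, the pairwise distance ρ·2^{1/p} is exact, not an approximation.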
Problem 3.66. Prove the following propositions.
(a) Every closed ball of positive radius in (C[0, 1], d_∞) is not compact.
Hint: Consider the metric space (C[0, 1], d_∞) of Example 3.D. Let B_ρ[x_0] be the closed ball of radius ρ > 0 centered at an arbitrary x_0 ∈ C[0, 1]. Consider the mapping ϕ: B_ρ[x_0] → R defined by
ϕ(x) = ∫_0^1 |x(t) − x_0(t)| dt − |x(0) − x_0(0)|
for every x ∈ B_ρ[x_0]. Equip B_ρ[x_0] with the sup-metric d_∞ and the real line R with its usual metric. Show that ϕ is continuous, ϕ(x) < ρ for all x ∈ B_ρ[x_0], and sup_{x∈B_ρ[x_0]} ϕ(x) = ρ. Now use the Weierstrass Theorem (Theorem 3.86) to verify that B_ρ[x_0] is not compact.
(b) Every closed ball of positive radius in (C[0, 1], d_∞) is a complete subspace. Hint: Example 3.T, Problem 3.42(a), and Theorem 3.40(b).
(c) Every closed ball of positive radius in (C[0, 1], d_∞) is not totally bounded. Hint: Corollary 3.81 and item (a).
Problem 3.67. A topological space X is locally compact if every point of X has a compact neighborhood. Prove the following assertions.
(a) A metric space X is locally compact if and only if there exists a compact closed ball of positive radius centered at each point of X.
(b) R^n and C^n (equipped with any of their uniformly equivalent metrics of Example 3.A) are locally compact.
(c) (ℓ_+^p, d_p) and (C[0, 1], d_∞) are not locally compact.
(d) Every open subspace and every closed subspace of a locally compact metric space is locally compact.
Problem 3.68. Consider the metric space (ℓ_+^p, d_p) for an arbitrary p ≥ 1.
(a) Prove that a subset A of ℓ_+^p is totally bounded if and only if
sup_{{ξ_k}∈A} ∑_{k=1}^∞ |ξ_k|^p < ∞  and  lim_n sup_{{ξ_k}∈A} ∑_{k=n}^∞ |ξ_k|^p = 0
(i.e., A is bounded and ∑_{k=n}^∞ |ξ_k|^p → 0 as n → ∞ uniformly on A).
(b) Show that a subset A of ℓ_+^p is compact if and only if it is closed and satisfies the above conditions.
(c) Verify that every closed ball of positive radius in ℓ_+^p is not compact.
Problem 3.69. Let {ξ_k(0)} be an arbitrary point in ℓ_+^p and set
S_0 = {{ξ_k} ∈ ℓ_+^p: |ξ_k| ≤ |ξ_k(0)| for every k}.
Use the preceding problem to show that S_0 is a compact subset of the metric space (ℓ_+^p, d_p). In particular, the set
S = {{ξ_k}_{k≥1} ∈ ℓ_+^2: |ξ_k| ≤ 1/k for every k ≥ 1},
which is known as the Hilbert cube, is compact in (ℓ_+^2, d_2). Show that the Hilbert cube has an empty interior. (Hint: Verify that (ℓ_+^2\S)^− = ℓ_+^2.) Then conclude that the Hilbert cube is nowhere dense.
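The two conditions of Problem 3.68 are easy to check numerically for the Hilbert cube. The sketch below (an illustration, not part of the text) verifies that the bound sup_S ∑_k |ξ_k|² is finite, dominated by ∑_k 1/k² = π²/6, and that the tails ∑_{k≥n} 1/k² ≤ 1/(n − 1) shrink to zero uniformly on S:

```python
# Problem 3.69 via the criterion of Problem 3.68: for any {xi_k} in the
# Hilbert cube, sum_k |xi_k|^2 <= sum_k 1/k^2 = pi^2/6, and the common tail
# bound sum_{k>=n} 1/k^2 <= 1/(n-1) holds by comparison with the integral of
# 1/t^2 over [n-1, infinity).
import math

bound = sum(1.0 / k ** 2 for k in range(1, 100000))
assert bound < math.pi ** 2 / 6           # partial sums increase to pi^2/6

def tail(n, K=100000):
    return sum(1.0 / k ** 2 for k in range(n, K))

for n in range(2, 30, 3):
    assert tail(n) <= 1.0 / (n - 1)       # uniform tail estimate, -> 0
```

Both quantities are independent of the particular point of S, which is exactly the uniformity demanded by Problem 3.68(a).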
Problem 3.70. Suppose X is a compact metric space, let Y be any metric space, and consider the metric space (C[X, Y], d_∞) of Example 3.Y. Take an arbitrary real number γ > 0 and let C_γ[X, Y] denote the subset of C[X, Y] consisting of all Lipschitzian mappings of X into Y that have a Lipschitz constant less than or equal to γ.
(a) Show that C_γ[X, Y] is equicontinuous and closed in (C[X, Y], d_∞).
Hint: Set δ = ε/γ for equicontinuity. Then apply the Closed Set Theorem: if f_n ∈ C_γ[X, Y] and if f_n → f ∈ C[X, Y], then by the triangle inequality
d_Y(f(x), f(y)) ≤ d_Y(f(x), f_n(x)) + d_Y(f_n(x), f_n(y)) + d_Y(f_n(y), f(y)) ≤ 2 d_∞(f_n, f) + γ d_X(x, y).
From now on suppose the space Y is compact. Thus Y is complete (Corollary 3.81), and hence (C[X, Y], d_∞) is complete (Example 3.Y).
(b) Show that C_γ[X, Y] is pointwise totally bounded and then conclude that C_γ[X, Y] is a compact subset of the metric space (C[X, Y], d_∞).
Particular case (γ = 1): The set C_1[X, Y] of all contractions of a compact metric space X into a compact metric space Y is a compact subset of (C[X, Y], d_∞). Let J[X, Y] denote the set of all isometries of a compact metric space X into a compact metric space Y so that J[X, Y] ⊂ C_1[X, Y].
(c) Show that J[X, Y] is closed in (C[X, Y], d_∞) and then conclude that J[X, Y] is a compact subset of the metric space (C[X, Y], d_∞).
Hint: Again, apply the Closed Set Theorem: if {f_n} is a J[X, Y]-valued sequence that converges to f ∈ C[X, Y], then (cf. Problem 3.1(b)) verify that
|d_X(x, y) − d_Y(f(x), f(y))| = |d_Y(f_n(x), f_n(y)) − d_Y(f(x), f(y))| ≤ d_Y(f_n(x), f(x)) + d_Y(f_n(y), f(y)) ≤ 2 d_∞(f_n, f).
Problem 3.71. Let m, g, and T be positive constants and let u: [0, T] → R be a continuous function (i.e., u ∈ C[0, T] = C([0, T], R)). Consider the ordinary differential equation
m (d²x/dt²)(t) = u(t) − mg for every t ∈ [0, T],
with x(0) = (dx/dt)(0) = (d²x/dt²)(0) = (dx/dt)(T) = 0.
(a) Show that
(d²x/dt²)(0) = 0 ⟺ u(0) = mg,
(dx/dt)(0) = (dx/dt)(T) ⟺ ∫_0^T u(t) dt = mgT,
x(0) = (dx/dt)(0) = 0 ⟹ x(T) = (1/m) ∫_0^T ∫_0^t u(s) ds dt − gT²/2.
Hint: Verify that the solution x: [0, T] → R to the above differential equation (which is a twice continuously differentiable function) under the initial conditions x(0) = (dx/dt)(0) = 0 is given by
(dx/dt)(t) = (1/m) ∫_0^t u(s) ds − gt for every t ∈ [0, T],
x(t) = (1/m) ∫_0^t ∫_0^τ u(s) ds dτ − gt²/2 for every t ∈ [0, T].
Let M and K be positive constants such that mg < M. Consider the following subsets of C[0, T]:
U_1 = {u ∈ C[0, T]: 0 ≤ u(t) for every t ∈ [0, T]},
U_2 = {u ∈ C[0, T]: u(t) ≤ M for every t ∈ [0, T]},
U_3 = {u ∈ C[0, T]: |u(t) − u(s)| ≤ K|t − s| for every t, s ∈ [0, T]},
U_4 = {u ∈ C[0, T]: u(0) = mg},
U_5 = {u ∈ C[0, T]: ∫_0^T u(t) dt = mgT}.
Set U = ∩_{i=1}^5 U_i ⊂ C[0, T] and consider the functional f: U → R defined by
f(u) = x(T) for every u ∈ U.
This comprises the proper setup for a classical optimal control problem. If u(t) stands for the downward thrust of a rocket of mass m burning over the time interval [0, T], then the above differential equation is a model for the rocket altitude x(t) — Newton's Second Law, where g is the gravitational constant. The constraints on the admissible thrusts are described in the set U. Indeed, U_1 and U_2 say that the thrust is nonnegative and has a maximum value, U_3 affirms that the changes of thrust are limited, and U_4 and U_5 take into account conditions on the thrust that ensure null initial acceleration in U_4 and coincident initial and final velocities in U_5 — which actually are null velocities when the final altitude x(T) is given as in (a). An optimal control problem is that of selecting u ∈ U that maximizes the final altitude x(T) in (a). Many optimal control techniques start by assuming that an optimal solution u∗ ∈ U
exists, and then proceed to build up algorithms to determine it. Next we shall use the Arzelà–Ascoli Theorem (Example 3.Z) to prove the existence of u∗. Consider the metric spaces (R, d) and (C[0, T], d_∞), where d is the usual metric on R (Example 3.A) and d_∞ is the sup-metric on C[0, T] (Example 3.D).
(b) Show that U is nonempty and that f: (U, d_∞) → (R, d) is continuous. Hint: Verify that f is Lipschitzian with Lipschitz constant T/m.
(c) Show that U is closed in (C[0, T], d_∞). Hint: Show that each U_i is closed and use Theorem 3.22(b).
(d) Show that U ⊂ C[0, T] is pointwise totally bounded. Hint: Verify that the closure of U_t = {u(t) ∈ R: u ∈ U} is bounded in (R, d) for each t ∈ [0, T], and so compact (Theorem 3.83), and hence totally bounded in (R, d) (Corollary 3.81). Conclude by using Example 3.Z(i).
(e) Show that U ⊂ C[0, T] is equicontinuous on [0, T]. Hint: If u ∈ U, then u ∈ U_3. Thus for any ε > 0 there is a δ = ε/K such that |t − s| ≤ δ implies |u(t) − u(s)| ≤ K|t − s| ≤ ε. Then U ⊂ (C([0, T], R), d_∞) is (uniformly) equicontinuous on [0, T] (cf. Example 3.Z(ii)).
(f) Show that U is compact in (C[0, T], d_∞). Hint: ([0, T], d) is a compact metric space and (R, d) is a complete metric space (aren't they?). Apply (c), (d), and (e) to the Arzelà–Ascoli Theorem in Example 3.Z(d) and conclude that U is compact in (C([0, T], R), d_∞).
(g) Show that there exists u∗ ∈ U such that f(u∗) = max_{u∈U} f(u). Hint: Use (b) and (f) together with Theorem 3.86.
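The functional f(u) = x(T) of item (a) can be evaluated numerically. The sketch below (an illustration only; the thrust profiles are hypothetical, and the second one is not in every U_i) checks that the hovering thrust u ≡ mg exactly cancels gravity, giving final altitude zero, and that extra early thrust raises x(T):

```python
# Problem 3.71(a), numerically: the final-altitude functional
#   f(u) = x(T) = (1/m) * int_0^T int_0^t u(s) ds dt - g*T^2/2,
# approximated with nested left Riemann sums.
m, g, T = 2.0, 9.8, 3.0

def final_altitude(u, steps=20000):
    h = T / steps
    inner = 0.0      # running value of int_0^t u(s) ds
    outer = 0.0      # running value of int_0^T inner dt
    for i in range(steps):
        inner += u(i * h) * h
        outer += inner * h
    return outer / m - g * T ** 2 / 2

hover = lambda t: m * g                  # lies in each of U_1, ..., U_5
assert abs(final_altitude(hover)) < 1e-2

# A hypothetical front-loaded burn (it violates U_4 and U_5, and is used only
# to show that the functional responds to extra early thrust): the iterated
# integral weights early thrust more heavily, which is why an optimizer pushes
# u toward its bound M as early as the Lipschitz constraint U_3 allows.
front = lambda t: m * g + (T - t)
assert final_altitude(front) > 0.0
```

This also makes item (b) plausible: f depends on u only through integrals over [0, T], so nearby thrusts in the sup-metric give nearby final altitudes.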
4 Banach Spaces
Our purpose now is to put algebra and topology to work together. For instance, from algebra we get the notion of finite sums (either ordinary or direct sums of vectors, linear manifolds, or linear transformations), and from topology the notion of convergent sequences. If algebraic and topological structures are suitably laid on the same underlying set, then we may consider the concept of infinite sums and convergent series. More importantly, as continuity plays a central role in the theory of topological spaces, and linear transformation plays a central role in the theory of linear spaces, when algebra and topology are properly combined they yield the concept of continuous linear transformation: the very central theme of this book.
4.1 Normed Spaces
C.S. Kubrusly, The Elements of Operator Theory, DOI 10.1007/978-0-8176-4998-2_4, © Springer Science+Business Media, LLC 2011.
To begin with let us point out, once and for all, that throughout this chapter F will denote either the real field R or the complex field C, both equipped with their usual topologies induced by their usual metrics. If we intend to combine algebra and topology so that a given set is endowed with both algebraic and topological structures, then we might simply equip a linear space with some metric, and hence it would become a linear space that is also a metric space. However, an arbitrary metric on a linear space may induce a topological structure that has nothing to do with the algebraic structure (i.e., these structures may live apart on the same underlying set). A richer and more useful structure is obtained when the metric recognizes the operations of vector addition and scalar multiplication that come with the linear space, and incorporates these operations in its own definition. With this in mind, let us first define a couple of concepts. A metric (or a pseudometric) d on a linear space X over F is said to be additively invariant if
d(x, y) = d(x + z, y + z)
for every x, y, and z in X. This means that the translation mapping X → X defined by x → x + z for any z ∈ X is an isometry. If d is such that
d(αx, αy) = |α|d(x, y)
for every x and y in X and every α in F, then the metric d is called absolutely homogeneous. A program for equipping a linear space with a metric that has the above "linear-like" properties goes as follows. Let p: X → R be a real-valued functional on a linear space over F (recall: F is either R or C so that R is always embedded in F). It is nonnegative homogeneous if p(αx) = αp(x) for every x ∈ X and every nonnegative (real) scalar α ∈ F, and subadditive if p(x + y) ≤ p(x) + p(y) for every x and y in X. If p is both nonnegative homogeneous and subadditive, then it is called a sublinear functional. If p(αx) = |α|p(x) for every x in X and α in F, then p is absolutely homogeneous. A subadditive absolutely homogeneous functional is a convex functional. (Note that this includes the classical definition of a convex functional, namely, if p: X → R is convex, then p(αx + βy) ≤ αp(x) + βp(y) for every x, y ∈ X and every α ∈ [0, 1] with β = 1 − α.) If p(x) ≥ 0 for all x in X, then p is nonnegative. A nonnegative convex functional is a seminorm (or a pseudonorm — i.e., a nonnegative absolutely homogeneous subadditive functional). If
p(x) > 0 whenever x ≠ 0,
then p is called positive. A positive seminorm is a norm (i.e., a positive absolutely homogeneous subadditive functional). Summing up: A norm is a real-valued functional on a linear space with the following four properties, called the norm axioms.
Definition 4.1. Let X be a linear space over F. A real-valued function ‖·‖: X → R is a norm on X if the following conditions are satisfied for all vectors x and y in X and all scalars α in F.
(i) ‖x‖ ≥ 0 (nonnegativeness),
(ii) ‖x‖ > 0 if x ≠ 0 (positiveness),
(iii) ‖αx‖ = |α|‖x‖ (absolute homogeneity),
(iv) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (subadditivity, i.e., the triangle inequality).
A linear space X equipped with a norm on it is a normed space (synonyms: normed linear space or normed vector space). If X is a real or complex linear space (so that F = R or F = C) equipped with a norm on it, then it is referred to as a real or complex normed space, respectively. Note that these are not independent axioms. For instance, axiom (i) can be derived from axioms (ii) and (iii): an absolutely homogeneous positive functional is necessarily nonnegative. Indeed, for α = 0 in (iii) we get ‖0‖ = 0 and, conversely, x = 0 whenever ‖x‖ = 0 by positiveness in (ii). Therefore, if ‖·‖: X → R is a norm, then
‖x‖ = 0 if and only if x = 0.
Proposition 4.2. If ‖·‖: X → R is a norm on a linear space X, then the function d: X×X → R, defined by d(x, y) = ‖x − y‖ for every x, y ∈ X, is a metric on X.
Proof. From (i) and (ii) in Definition 4.1 we get the metric axiom (i) of Definition 3.1. Positiveness (ii) and absolute homogeneity (iii) of Definition 4.1 imply positiveness (ii) and symmetry (iii) of Definition 3.1, respectively. Finally, the triangle inequality (iv) of Definition 3.1 follows from the triangle inequality (iv) and absolute homogeneity (iii) of Definition 4.1.
A word on notation and terminology. According to Definition 4.1 a normed space actually is an ordered pair (X, ‖·‖), where X is a linear space and ‖·‖ is a norm on X. As in the case of a metric space, we shall refer to a normed space in several ways. We may speak of X itself as a normed space when the norm ‖·‖ is either clear in the context or immaterial and, in this case, we shall simply say "X is a normed space". However, in order to avoid confusion among different normed spaces, we may occasionally insert a subscript on the norms (e.g., (X, ‖·‖_X) and (Y, ‖·‖_Y)). If a linear space X can be equipped with more than one norm, say ‖·‖_1 and ‖·‖_2, then (X, ‖·‖_1) and (X, ‖·‖_2) will represent different normed spaces with the same linear space X. The metric d of Proposition 4.2 is the metric generated by the norm ‖·‖, and so a normed space is a special kind of linear metric space. Whenever we refer to the topological (metric) structure of a normed space (X, ‖·‖), it will always be understood that such a topology on X is that induced by the metric d generated by the norm ‖·‖. This is called the norm topology. Note that the norm ‖x‖ of every vector x in a normed space (X, ‖·‖) is precisely the distance (in the norm topology) between x and the origin 0 of the linear space X (i.e., ‖x‖ = d(x, 0), where d is the metric generated by the norm ‖·‖).
Proposition 4.2 says that every norm on a linear space generates a metric, but an arbitrary metric on a linear space may not be generated by any norm on it. The next proposition tells us when a metric on a linear space is generated by a norm.
202
4. Banach Spaces
Proposition 4.3. Let X be a linear space. A metric on X is generated by a norm on X if and only if it is additively invariant and absolutely homogeneous. Moreover, for each additively invariant and absolutely homogeneous metric on X there exists a unique norm on X that generates it.

Proof. If d is a metric on a normed space X generated by a norm ‖·‖ on X, then it is additively invariant (d(x, y) = ‖x − y‖ = ‖x + z − (y + z)‖ = d(x + z, y + z) for every x, y and z in X) and also absolutely homogeneous (d(αx, αy) = ‖αx − αy‖ = ‖α(x − y)‖ = |α|‖x − y‖ = |α|d(x, y) for every x, y in X and every scalar α). Conversely, if d is an additively invariant and absolutely homogeneous metric on a linear space X, then the function ‖·‖_d : X → R defined by ‖x‖_d = d(x, 0) for every x in X is a norm on X. Indeed, properties (i) and (ii) of Definition 4.1 are trivially verified by the first two metric axioms in Definition 3.1. Properties (iii) and (iv) of Definition 4.1 follow from absolute homogeneity (‖αx‖_d = d(αx, 0) = |α|d(x, 0) = |α|‖x‖_d for every x in X and every scalar α) and from additive invariance (‖x + y‖_d = d(x + y, 0) = d(x, −y) ≤ d(x, 0) + d(0, −y) = d(x, 0) + d(0, y) = ‖x‖_d + ‖y‖_d for every x, y in X). This norm ‖·‖_d generates the metric d since ‖x − y‖_d = d(x − y, 0) = d(x, y) for every x, y in X, and uniqueness is straightforward: if ‖·‖_1 and ‖·‖_2 generate d, then ‖x‖_1 = d(x, 0) = ‖x‖_2 for all x in X.

Let (X, ‖·‖) be a normed space. By Proposition 4.2 and Problem 3.1 it follows at once that

    | ‖x‖ − ‖y‖ | ≤ ‖x − y‖   for every x, y ∈ X.

Thus the norm ‖·‖ : X → R is a continuous mapping with respect to the norm topology of X (see Problem 3.34). In fact, the preceding inequality says that every norm is a contraction (thus Lipschitzian and hence uniformly continuous). Therefore (cf. Corollary 3.8 and Lemma 3.43), a norm preserves convergence: if x_n → x in the norm topology of X, then ‖x_n‖ → ‖x‖ in R; and it also preserves Cauchy sequences: if {x_n} is a Cauchy sequence in X with respect to the metric generated by the norm on X, then {‖x_n‖} is a Cauchy sequence in R.

A Banach space is a complete normed space. Obviously, completeness refers to the norm topology: a Banach space is a normed space that is complete as a metric space with respect to the metric generated by the norm. A real or complex Banach space is a complete real or complex normed space.

Let X be a linear space and let {x_n} be an X-valued sequence indexed by N (or by N_0). For each n ≥ 1 set

    y_n = ∑_{i=1}^{n} x_i

in X, so that {y_n}_{n=1}^∞ is again an X-valued sequence. This is called the sequence of partial sums of {x_n}_{n=1}^∞. Now equip X with a norm ‖·‖. If the sequence of
partial sums {y_n}_{n=1}^∞ converges in the normed space X to a point y in X (i.e., if ‖y_n − y‖ → 0), then we say that {x_n}_{n=1}^∞ is a summable sequence (or that the infinite series ∑_{k=1}^∞ x_k converges in X to y — notation: y = ∑_{k=1}^∞ x_k). If the real-valued sequence {‖x_n‖}_{n=1}^∞ is summable (i.e., if the infinite series ∑_{k=1}^∞ ‖x_k‖ converges in R or, equivalently, if ∑_{k=1}^∞ ‖x_k‖ < ∞ — see Problem 3.11), then we say that {x_n}_{n=1}^∞ is an absolutely summable sequence (or that the infinite series ∑_{k=1}^∞ x_k is absolutely convergent).

Proposition 4.4. A normed space is a Banach space if and only if every absolutely summable sequence is summable.

Proof. Let (X, ‖·‖) be a normed space and let {x_n}_{n=0}^∞ be an arbitrary X-valued sequence.

(a) Consider the sequence {y_n}_{n=0}^∞ of partial sums of {x_n}_{n=0}^∞,

    y_n = ∑_{i=0}^{n} x_i

in X for each n ≥ 0. It is readily verified by induction (with a little help from the triangle inequality) that

    ‖y_{n+k} − y_n‖ = ‖∑_{i=n+1}^{n+k} x_i‖ ≤ ∑_{i=n+1}^{n+k} ‖x_i‖

for every pair of integers n ≥ 0 and k ≥ 1. Suppose {x_n}_{n=0}^∞ is an absolutely summable sequence (i.e., ∑_{j=0}^∞ ‖x_j‖ < ∞) so that

    0 ≤ sup_{k≥1} ‖y_{n+k} − y_n‖ ≤ ∑_{j=n+1}^∞ ‖x_j‖ → 0   as n → ∞,

and hence lim_n sup_{k≥1} ‖y_{n+k} − y_n‖ = 0 (Problems 3.10(c) and 3.11). Equivalently, {y_n}_{n=0}^∞ is a Cauchy sequence in X (Problem 3.51). Therefore, if X is a Banach space, then {y_n}_{n=0}^∞ converges in X, which means that {x_n}_{n=0}^∞ is a summable sequence. Conclusion: An arbitrary absolutely summable sequence in a Banach space is summable.

(b) Conversely, suppose {x_n}_{n=0}^∞ is a Cauchy sequence. By Problem 3.52(c) {x_n}_{n=0}^∞ has a subsequence {x_{n_k}}_{k=0}^∞ of bounded variation. Set z_0 = x_{n_0} and z_{k+1} = x_{n_{k+1}} − x_{n_k} in X, so that x_{n_{k+1}} = x_{n_k} + z_{k+1}, for every k ≥ 0. Thus

    x_{n_k} = ∑_{i=0}^{k} z_i

for every k ≥ 0 (cf. Problem 2.19). Since {x_{n_k}}_{k=0}^∞ is of bounded variation (i.e., ∑_{k=0}^∞ ‖x_{n_{k+1}} − x_{n_k}‖ < ∞), it follows that {z_k}_{k=0}^∞ is an absolutely summable sequence in X. If every absolutely summable sequence in X is summable, then {z_k}_{k=0}^∞ is a summable sequence, and so the subsequence {x_{n_k}}_{k=0}^∞ converges in X. Thus (see Proposition 3.39(c)) the Cauchy sequence {x_n}_{n=0}^∞ converges in X. Conclusion: Every Cauchy sequence in X converges in X, which means that the normed space X is complete (i.e., X is a Banach space).
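A small numerical sketch of part (a) of the proof, in Python (the particular sequence x_n and the space R², a Banach space by Example 4.A, are illustrative choices, not from the text): the distance between partial sums is dominated by the tail of the series of norms, and the partial sums converge:

```python
import math

# X = R^2 with the Euclidean norm (a Banach space).
def norm(v):
    return math.sqrt(sum(t * t for t in v))

# An absolutely summable sequence: sum_n ||x_n|| <= sum_n sqrt(2)/2^n < oo.
def x(n):
    return (1.0 / 2 ** n, (-1) ** n / 2 ** n)

def partial_sum(n):  # y_n = sum_{i=0}^{n} x_i
    s = [0.0, 0.0]
    for i in range(n + 1):
        xi = x(i)
        s[0] += xi[0]
        s[1] += xi[1]
    return s

# The tail bound from the proof: ||y_{n+k} - y_n|| <= sum_{j=n+1}^{n+k} ||x_j||
for n in range(5, 20):
    for k in range(1, 10):
        yn, ynk = partial_sum(n), partial_sum(n + k)
        diff = norm([a - b for a, b in zip(ynk, yn)])
        tail = sum(norm(x(j)) for j in range(n + 1, n + k + 1))
        assert diff <= tail + 1e-12

# The limit exists: the two geometric series give y_n -> (2, 2/3).
y = partial_sum(60)
assert abs(y[0] - 2.0) < 1e-12 and abs(y[1] - 2.0 / 3.0) < 1e-12
```

In an incomplete normed space the same tail bound holds, but the partial sums may fail to have a limit in the space, which is exactly what Proposition 4.4 rules out for Banach spaces.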
4.2 Examples

Many of the examples of metric spaces exhibited in Chapter 3 are in fact examples of normed spaces: linear spaces equipped with an additively invariant and absolutely homogeneous metric.

Example 4.A. Let F^n be the linear space over F of Example 2.D (with either F = R or F = C). Consider the functions ‖·‖_p : F^n → R (for each real number p ≥ 1) and ‖·‖_∞ : F^n → R defined by

    ‖x‖_p = (∑_{i=1}^{n} |ξ_i|^p)^{1/p}   and   ‖x‖_∞ = max_{1≤i≤n} |ξ_i|

for every x = (ξ_1, …, ξ_n) in F^n. It is easy to verify that these are norms on F^n (the triangle inequality comes from the Minkowski inequality of Problem 3.4) and also that the metrics generated by each of them are precisely the metrics d_p (for p ≥ 1) and d_∞ of Example 3.A. Since F^n, when equipped with any of these metrics, is a complete metric space (Example 3.R(a)), it follows that

    F^n is a Banach space

when equipped with any of the norms ‖·‖_p or ‖·‖_∞. In particular, for n = 1 all of these norms reduce to the absolute value function |·| : F → R, which is the usual norm on F. The norm ‖·‖_2 plays a special role. On R^n it is the Euclidean norm, and the real Banach space (R^n, ‖·‖_2) is the n-dimensional Euclidean space. The complex Banach space (C^n, ‖·‖_2) is sometimes referred to as the n-dimensional unitary space.

Example 4.B. According to Example 2.E the set F^N (or F^{N_0}) of all scalar-valued sequences is a linear space over F. Now consider the subsets ℓ₊^p and ℓ₊^∞ of F^N defined as in Example 3.B. These are linear manifolds of F^N (vector addition and scalar multiplication — pointwise defined — of p-summable or bounded sequences are again p-summable or bounded sequences, respectively, by the Minkowski inequality of Problem 3.4), and hence ℓ₊^p and ℓ₊^∞ are linear spaces over F. For each p ≥ 1 consider the function ‖·‖_p : ℓ₊^p → R defined by

    ‖x‖_p = (∑_{k=1}^{∞} |ξ_k|^p)^{1/p}
for every x = {ξ_k}_{k∈N} in ℓ₊^p (i.e., ‖x‖_p^p = sup_{n∈N} ∑_{k=1}^{n} |ξ_k|^p — see Example 3.B), and the function ‖·‖_∞ : ℓ₊^∞ → R given by

    ‖x‖_∞ = sup_{k∈N} |ξ_k|

for every x = {ξ_k}_{k∈N} in ℓ₊^∞. It is readily verified that ‖·‖_p is a norm on ℓ₊^p and ‖·‖_∞ is a norm on ℓ₊^∞ (as before, the Minkowski inequality leads to the triangle inequality). Moreover, the norm ‖·‖_p generates the metric d_p and the norm ‖·‖_∞ generates the metric d_∞ of Example 3.B. These are the usual norms on ℓ₊^p and ℓ₊^∞. Since (ℓ₊^p, d_p) is a complete metric space, and since (ℓ₊^∞, d_∞) also is a complete metric space (see Examples 3.R(b) and 3.S), it follows that

    (ℓ₊^p, ‖·‖_p) and (ℓ₊^∞, ‖·‖_∞) are Banach spaces.
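The norms of Examples 4.A and 4.B are straightforward to compute. A short Python sketch (illustrative, not from the text) checks the Minkowski inequality and the standard comparison ‖x‖_∞ ≤ ‖x‖_p ≤ n^{1/p}‖x‖_∞ on F^n, which shows that all these norms generate the same topology in finite dimension:

```python
import math
import random

def norm_p(x, p):   # ||x||_p on F^n (Example 4.A)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def norm_inf(x):    # ||x||_oo on F^n
    return max(abs(t) for t in x)

random.seed(1)
n = 5
x = [random.uniform(-1, 1) for _ in range(n)]
y = [random.uniform(-1, 1) for _ in range(n)]

for p in (1, 2, 3, 7):
    # Minkowski inequality: the triangle inequality for || ||_p
    s = [a + b for a, b in zip(x, y)]
    assert norm_p(s, p) <= norm_p(x, p) + norm_p(y, p) + 1e-12
    # ||x||_oo <= ||x||_p <= n^(1/p) ||x||_oo
    assert norm_inf(x) <= norm_p(x, p) + 1e-12
    assert norm_p(x, p) <= n ** (1.0 / p) * norm_inf(x) + 1e-12
```

Truncating an ℓ₊^p sequence to its first n terms reduces the ℓ₊^p norm to the finite-dimensional `norm_p` above, which is how the infinite series in Example 4.B is approximated in practice.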
Similarly (see Examples 2.E, 3.B, 3.R(b) and 3.S again), (ℓ^p, ‖·‖_p) and (ℓ^∞, ‖·‖_∞) are Banach spaces, where the functions ‖·‖_p : ℓ^p → R and ‖·‖_∞ : ℓ^∞ → R, defined by

    ‖x‖_p = (∑_{k=−∞}^{∞} |ξ_k|^p)^{1/p}   and   ‖x‖_∞ = sup_{k∈Z} |ξ_k|

for x = {ξ_k}_{k∈Z} in ℓ^p or in ℓ^∞, respectively, are the usual norms on the linear manifolds ℓ^p and ℓ^∞ of the linear space F^Z.

Let X be a linear space. A real-valued function ‖·‖ : X → R is a seminorm (or a pseudonorm) on X if it satisfies the three axioms (i), (iii), and (iv) of Definition 4.1. It is worth noticing that the inequality | ‖x‖ − ‖y‖ | ≤ ‖x − y‖ for every x, y in X still holds for a seminorm. The difference between a norm and a seminorm is that a seminorm does not necessarily satisfy axiom (ii) of Definition 4.1 (i.e., a seminorm surely vanishes at the origin but it may also vanish at a nonzero vector). A seminorm generates a pseudometric as in Proposition 4.2: if ‖·‖ is a seminorm on X, then d(x, y) = ‖x − y‖ for every x, y ∈ X defines an additively invariant and absolutely homogeneous pseudometric on X. Moreover, if a pseudometric is additively invariant and absolutely homogeneous, then it is generated by a seminorm as in Proposition 4.3: if d is an additively invariant and absolutely homogeneous pseudometric on X, then ‖x‖ = d(x, 0) for every x ∈ X defines a seminorm on X such that d(x, y) = ‖x − y‖ for every x, y ∈ X.

Proposition 4.5. Let ‖·‖ be a seminorm on a linear space X. The set N of all vectors x in X for which ‖x‖ = 0,

    N = {x ∈ X : ‖x‖ = 0},

is a linear manifold of X. Consider the quotient space X/N and set
    ‖[x]‖_∼ = ‖x‖

for every coset [x] in X/N, where x is an arbitrary vector in [x]. This defines a norm on the linear space X/N so that (X/N, ‖·‖_∼) is a normed space.

Proof. Indeed, N is a linear manifold of X (if u, v ∈ N, then 0 ≤ ‖u + v‖ ≤ ‖u‖ + ‖v‖ = 0 and ‖αu‖ = |α|‖u‖ = 0, so that u + v ∈ N and αu ∈ N for every scalar α). Consider the quotient space X/N of X modulo N (Example 2.H), which is a linear space over the same scalar field as X. Take an arbitrary coset

    [x] = x + N = {x′ ∈ X : x′ = x + z for some z ∈ N}

in X/N and note that ‖u‖ = ‖v‖ for every u and v in [x] (if u, v ∈ [x], then u − v ∈ N and 0 ≤ | ‖u‖ − ‖v‖ | ≤ ‖u − v‖ = 0). Thus set ‖[x]‖_∼ = ‖x‖ for an arbitrary x ∈ [x] (i.e., for an arbitrary representative of the equivalence class [x]), which defines a function from X/N to R, ‖·‖_∼ : X/N → R. It is clear that ‖[x]‖_∼ ≥ 0. If ‖[x]‖_∼ = 0, then [x] = N = [0], the origin of the linear space X/N (reason: ‖[x]‖_∼ = 0 implies that every x in [x] belongs to N and also that every u in N belongs to [x]). Moreover,

    ‖α[x]‖_∼ = ‖[αx]‖_∼ = ‖αx‖ = |α|‖x‖ = |α|‖[x]‖_∼,
    ‖[x] + [y]‖_∼ = ‖[x + y]‖_∼ = ‖x + y‖ ≤ ‖x‖ + ‖y‖ = ‖[x]‖_∼ + ‖[y]‖_∼,

for every [x], [y] ∈ X/N and every scalar α (Example 2.H). Therefore (Definition 4.1), ‖·‖_∼ is a norm on the linear space X/N.

Remark: Note that the relation ∼ on X defined by

    x′ ∼ x  if  ‖x′ − x‖ = 0

or, equivalently,

    x′ ∼ x  if  x′ − x ∈ N,

is a linear equivalence relation on the linear space X in the sense of Example 2.G. Consider the quotient space X/∼ of X modulo ∼. If vector addition and scalar multiplication are defined in X/∼ as in Example 2.H, then X/∼ is a linear space that coincides with X/N. In this case (i.e., when X is a linear space and ‖·‖ is a seminorm on X), the metric d_∼ on X/N generated by the norm ‖·‖_∼ is precisely the metric on X/∼ of Proposition 3.3 obtained from the pseudometric d on X generated by the seminorm ‖·‖.

Example 4.C. Take an arbitrary real number p ≥ 1 and consider the setup of Example 3.E: r^p(S) is the set of all F-valued Riemann p-integrable functions
on a nondegenerate interval S of the real line R. This is a linear manifold of the linear space F^S (see Example 2.E), and hence r^p(S) is a linear space over F. Indeed, addition and scalar multiplication of Riemann p-integrable functions on S are again Riemann p-integrable functions on S (Minkowski inequality). It is clear that the pseudometric δ_p on r^p(S) defined in Example 3.E is additively invariant and absolutely homogeneous. Thus δ_p is generated by a seminorm |·|_p on r^p(S),

    |x|_p = δ_p(x, 0) = (∫_S |x(s)|^p ds)^{1/p}

for every x ∈ r^p(S), so that δ_p(x, y) = |x − y|_p for every x, y ∈ r^p(S). Consider the linear manifold N = {x ∈ r^p(S) : |x|_p = 0} and set R^p(S) = r^p(S)/N, the quotient space of r^p(S) modulo N (i.e., the collection of all equivalence classes [x] = {x′ ∈ r^p(S) : |x′ − x|_p = 0} for every x ∈ r^p(S)). By Proposition 4.5 the function ‖·‖_p : R^p(S) → R defined by ‖[x]‖_p = |x|_p for every [x] in R^p(S) is a norm on R^p(S) (where x is an arbitrary representative of the equivalence class [x]). This is the usual norm on R^p(S). Note that R^p(S) is precisely the quotient space r^p(S)/∼ of Example 3.E (see the remark that follows Proposition 4.5). Moreover, ‖·‖_p is the norm on R^p(S) that generates the usual metric d_p of Example 3.E:

    d_p([x], [y]) = ‖[x] − [y]‖_p = ‖[x − y]‖_p = |x − y|_p = δ_p(x, y)

for every [x], [y] ∈ R^p(S), where x and y are arbitrary vectors in [x] and [y], respectively. According to common usage, we shall write x ∈ R^p(S) instead of [x] ∈ R^p(S), and also

    ‖x‖_p = d_p(x, 0) = (∫_S |x(s)|^p ds)^{1/p}

for every x ∈ R^p(S) to represent the norm ‖·‖_p on R^p(S). Therefore,

    (R^p(S), ‖·‖_p) is a normed space but not a Banach space

(Problem 3.62). Its completion, of course, is: (L^p(S), ‖·‖_p) is a Banach space. (We shall discuss the completion of a normed space in Section 4.7.)

Example 4.D. Let C[0, 1] be the set of all F-valued continuous functions on the interval [0, 1] as in Example 3.D. Again, this is a linear manifold of the linear space F^[0,1] of Example 2.E (addition and scalar multiplication of continuous functions are continuous functions), and hence C[0, 1] is a linear space over F. In fact, C[0, 1] is a linear manifold of the linear space r^p[0, 1] of the previous example (every continuous function on [0, 1] is Riemann p-integrable). For each p ≥ 1 consider the function ‖·‖_p : C[0, 1] → R defined by
    ‖x‖_p = (∫_0^1 |x(t)|^p dt)^{1/p}

for every x ∈ C[0, 1]. This is the norm on C[0, 1] that generates the metric d_p of Example 3.D so that (C[0, 1], ‖·‖_p) is a normed space. According to Problem 3.58, (C[0, 1], ‖·‖_p) is not a Banach space. Recall that C[0, 1] can be viewed as a subset of R^p[0, 1] (in the sense of Problem 3.40) and, as such, it can be shown that C[0, 1] is dense in (R^p[0, 1], ‖·‖_p). Therefore, since R^p[0, 1] is dense in (L^p[0, 1], ‖·‖_p), it follows that the Banach space (L^p[0, 1], ‖·‖_p) is a completion of (C[0, 1], ‖·‖_p) — see Problem 3.38(g) and the remark that follows Problem 3.62.

Let {X_γ}_{γ∈Γ} be an indexed family of linear spaces over the same field F. The set ⊕_{γ∈Γ} X_γ of all indexed families {x_γ}_{γ∈Γ}, where x_γ ∈ X_γ for each index γ ∈ Γ, becomes a linear space over F if vector addition and scalar multiplication are defined on ⊕_{γ∈Γ} X_γ as

    {x_γ}_{γ∈Γ} ⊕ {y_γ}_{γ∈Γ} = {x_γ + y_γ}_{γ∈Γ}   and   α{x_γ}_{γ∈Γ} = {αx_γ}_{γ∈Γ}

for every {x_γ}_{γ∈Γ} and {y_γ}_{γ∈Γ} in ⊕_{γ∈Γ} X_γ and every α in F. This is the direct sum (or the full direct sum) of the family {X_γ}_{γ∈Γ}. The underlying set of the linear space ⊕_{γ∈Γ} X_γ is the Cartesian product ∏_{γ∈Γ} X_γ of the underlying sets of each linear space X_γ (cf. Section 2.8).

Example 4.E. Let {(X_i, ‖·‖_i)}_{i=1}^{n} be a finite collection of normed spaces, where the linear spaces X_i are all over the same field F, and let ⊕_{i=1}^{n} X_i be the direct sum of the family {X_i}_{i=1}^{n}. Consider the functions ‖·‖_p : ⊕_{i=1}^{n} X_i → R (for each real number p ≥ 1) and ‖·‖_∞ : ⊕_{i=1}^{n} X_i → R defined by
    ‖x‖_p = (∑_{i=1}^{n} ‖x_i‖_i^p)^{1/p}   and   ‖x‖_∞ = max_{1≤i≤n} ‖x_i‖_i

for every x = (x_1, …, x_n) in ⊕_{i=1}^{n} X_i. It is easy to verify that these are norms on the direct sum ⊕_{i=1}^{n} X_i. For instance, the triangle inequality for the norm ‖·‖_p comes from the Minkowski inequality (Problem 3.4): for every x = (x_1, …, x_n) and y = (y_1, …, y_n) in ⊕_{i=1}^{n} X_i,

    ‖x ⊕ y‖_p = (∑_{i=1}^{n} ‖x_i + y_i‖_i^p)^{1/p} ≤ (∑_{i=1}^{n} (‖x_i‖_i + ‖y_i‖_i)^p)^{1/p}
              ≤ (∑_{i=1}^{n} ‖x_i‖_i^p)^{1/p} + (∑_{i=1}^{n} ‖y_i‖_i^p)^{1/p} = ‖x‖_p + ‖y‖_p.

Moreover, these norms generate the metrics d_p and d_∞ of Problem 3.9 (recall: the underlying set of the linear space ⊕_{i=1}^{n} X_i is the Cartesian product ∏_{i=1}^{n} X_i of the underlying sets of each linear space X_i), and therefore (Problem 3.60)
    (⊕_{i=1}^{n} X_i, ‖·‖_p) and (⊕_{i=1}^{n} X_i, ‖·‖_∞) are Banach spaces

if and only if each (X_i, ‖·‖_i) is a Banach space. If the normed spaces (X_i, ‖·‖_i) coincide with a fixed normed space (X, ‖·‖), then ⊕_{i=1}^{n} X is the direct sum of n copies of X (a linear space whose underlying set is the Cartesian product ∏_{i=1}^{n} X = X^n of n copies of the underlying set of the linear space X — Section 1.7). We can identify ⊕_{i=1}^{n} X with X^n (where the linear operations on X^n are defined coordinatewise) so that (X^n, ‖·‖_p) and (X^n, ‖·‖_∞) are Banach spaces whenever (X, ‖·‖) is a Banach space. Note that this generalizes the Banach spaces of Example 4.A.

Example 4.F. Let {(X_k, ‖·‖_k)} be a countably infinite collection of normed spaces, where the linear spaces X_k are over the same field F, and let ⊕_k X_k be the direct sum of {X_k}. For each p ≥ 1 consider the subset [⊕_k X_k]_p of ⊕_k X_k consisting of all p-summable families {x_k} in ⊕_k X_k. That is, {x_k} ∈ [⊕_k X_k]_p if and only if ∑_k ‖x_k‖_k^p < ∞, where ∑_k ‖x_k‖_k^p is the supremum of the set of all finite sums of positive numbers from {‖x_k‖_k^p}. This is a linear manifold of the linear space ⊕_k X_k and so [⊕_k X_k]_p is a linear space over F. As in Example 4.E, it is readily verified that the function ‖·‖_p : [⊕_k X_k]_p → R, defined by
    ‖x‖_p = (∑_k ‖x_k‖_k^p)^{1/p}

for every x = {x_k} ∈ [⊕_k X_k]_p, is a norm on [⊕_k X_k]_p. Now consider the subset [⊕_k X_k]_∞ of ⊕_k X_k consisting of all bounded families {x_k} in ⊕_k X_k (i.e., {x_k} ∈ [⊕_k X_k]_∞ if and only if sup_k ‖x_k‖_k < ∞). This again is a linear manifold of the linear space ⊕_k X_k so that [⊕_k X_k]_∞ is itself a linear space over F. It is easy to show that the function ‖·‖_∞ : [⊕_k X_k]_∞ → R, defined by

    ‖x‖_∞ = sup_k ‖x_k‖_k

for every x = {x_k} ∈ [⊕_k X_k]_∞, is a norm on [⊕_k X_k]_∞. Moreover, it can also be shown that

    ([⊕_k X_k]_p, ‖·‖_p) and ([⊕_k X_k]_∞, ‖·‖_∞) are Banach spaces

if and only if each (X_k, ‖·‖_k) is a Banach space (hint: Example 3.R(b)). Again, if the normed spaces (X_k, ‖·‖_k) coincide with a fixed normed space (X, ‖·‖), then ⊕_k X is the direct sum of countably infinite copies of X. (Recall that ⊕_k X is a linear space whose underlying set is the Cartesian product ∏_k X of countably infinite copies of the underlying set of the linear space X, which coincides with X^N, X^{N_0}, or X^Z if the indices k run over N, N_0, or Z, respectively — Section 1.7.) As before, we shall adopt the identifications ⊕_{k∈N} X = X^N,
⊕_{k∈N_0} X = X^{N_0}, and ⊕_{k∈Z} X = X^Z (the linear operations on X^N, X^{N_0}, and X^Z are defined as in Example 2.F). It is usual to denote [⊕_k X]_p in X^N (or in X^{N_0}) by ℓ₊^p(X): the linear manifold of X^N consisting of all p-summable X-valued sequences; and [⊕_k X]_∞ in X^N (or in X^{N_0}) by ℓ₊^∞(X): the linear manifold of X^N consisting of all bounded X-valued sequences. That is,

    ℓ₊^p(X) = {{x_k} ∈ X^N : ∑_{k=1}^{∞} ‖x_k‖^p < ∞},
    ℓ₊^∞(X) = {{x_k} ∈ X^N : sup_{k∈N} ‖x_k‖ < ∞}.

The norms ‖x‖_p = (∑_{k=1}^{∞} ‖x_k‖^p)^{1/p} and ‖x‖_∞ = sup_{k∈N} ‖x_k‖, for every x = {x_k} either in ℓ₊^p(X) or in ℓ₊^∞(X), are the usual norms on ℓ₊^p(X) and ℓ₊^∞(X), respectively, and

    (ℓ₊^p(X), ‖·‖_p) and (ℓ₊^∞(X), ‖·‖_∞) are Banach spaces

whenever (X, ‖·‖) is a Banach space. Similarly, [⊕_k X]_p in X^Z is denoted by ℓ^p(X) and [⊕_k X]_∞ in X^Z is denoted by ℓ^∞(X) and, when equipped with their usual norms ‖·‖_p and ‖·‖_∞,

    (ℓ^p(X), ‖·‖_p) and (ℓ^∞(X), ‖·‖_∞) are Banach spaces

whenever (X, ‖·‖) is a Banach space. This example generalizes the Banach spaces of Example 4.B.
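The direct-sum norms of Example 4.E can be sketched in a few lines of Python (the two component spaces and their norms are illustrative assumptions, not from the text): take X₁ = R² with ‖·‖_1 and X₂ = R³ with ‖·‖_∞, and form ‖·‖_p on X₁ ⊕ X₂:

```python
import math
import random

# Component norms: || ||_1 on R^2 and || ||_oo on R^3 (illustrative choices).
def norm1(v):
    return sum(abs(t) for t in v)

def norm_inf(v):
    return max(abs(t) for t in v)

def direct_sum_norm(x, p):
    # ||(x1, x2)||_p = (||x1||_1^p + ||x2||_oo^p)^(1/p), as in Example 4.E
    x1, x2 = x
    return (norm1(x1) ** p + norm_inf(x2) ** p) ** (1.0 / p)

def add(x, y):  # coordinatewise vector addition on the direct sum
    return ([a + b for a, b in zip(x[0], y[0])],
            [a + b for a, b in zip(x[1], y[1])])

random.seed(2)
x = ([random.uniform(-1, 1) for _ in range(2)],
     [random.uniform(-1, 1) for _ in range(3)])
y = ([random.uniform(-1, 1) for _ in range(2)],
     [random.uniform(-1, 1) for _ in range(3)])

for p in (1, 2, 4):
    # Minkowski again gives the triangle inequality on the direct sum.
    assert direct_sum_norm(add(x, y), p) <= \
        direct_sum_norm(x, p) + direct_sum_norm(y, p) + 1e-12
```

The same pattern extends to the countable direct sums of Example 4.F, with the finite sum over components replaced by a (convergent) series.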
4.3 Subspaces and Quotient Spaces

If (X, ‖·‖) is a normed space, and if M is a linear manifold of the linear space X, then it is easy to show that the restriction ‖·‖_M : M → R of the norm ‖·‖ : X → R to M is a norm on M so that (M, ‖·‖_M) is a normed space. Moreover, the metric d_M : M×M → R generated by the norm ‖·‖_M on M coincides with the restriction to M×M of the metric d : X×X → R generated by the norm ‖·‖ on X. Thus (M, d_M) is a subspace of the metric space (X, d). If a linear manifold of a normed space is regarded as a normed space, then it will be understood that the norm on it is the restricted norm ‖·‖_M. We shall drop the subscripts and write (M, ‖·‖) and (M, d) instead of (M, ‖·‖_M) and (M, d_M), respectively, and often refer to the normed space (M, ‖·‖) by simply saying that "M is a linear manifold of X".

Proposition 4.6. Let M be a linear manifold of a normed space X. If M is open in X, then M = X.

Proof. Since M is a linear manifold of the linear space X, the origin 0 of X lies in M. If M is open in X (in the norm topology, of course), then M includes a nonempty open ball with center at the origin: B_ε(0) = {y ∈ X : ‖y‖ < ε} ⊂ M for some ε > 0. Take an arbitrary x ≠ 0 in X and set z = ε(2‖x‖)^{-1} x in X so that ‖z‖ = ε/2, and hence z ∈ B_ε(0) ⊂ M. Thus x = 2‖x‖ε^{-1} z lies in M (since M is a linear space). Therefore, every nonzero vector in X also lies in M. Conclusion: X ⊆ M so that M = X (because M ⊆ X).

This shows that a normed space X is itself the only open linear manifold of X. On the other hand, the closed linear manifolds of a normed space are far more interesting. In fact, they are so important that we give them a name. A closed linear manifold of a normed space X is called a subspace of X. (Warning: A subspace of a metric space (X, d) is simply a subset of X equipped with the "same" metric d, while a subspace of a normed space (X, ‖·‖) is a linear manifold of X equipped with the "same" norm ‖·‖ that is closed in (X, ‖·‖).)

Proposition 4.7. A linear manifold of a Banach space X is itself a Banach space if and only if it is a subspace of X.
Proof. Corollary 3.41.
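Proposition 4.7 can be illustrated with a standard example (an assumption of this sketch, not taken from the text): the finitely supported sequences form a linear manifold of ℓ¹ that is not closed, hence not a Banach space in the ℓ¹ norm. The truncations of x = (1/2, 1/4, 1/8, …) are finitely supported, yet their ℓ¹ limit x has infinite support:

```python
# x_n = (1/2, ..., 1/2^n, 0, 0, ...) is finitely supported; its distance
# to x = (1/2^k)_{k>=1} in the l^1 norm is the geometric tail:
# ||x - x_n||_1 = sum_{k > n} 1/2^k = 1/2^n.
def l1_tail(n):
    return 2.0 ** (-n)

dists = [l1_tail(n) for n in range(1, 30)]
# x_n -> x in the l^1 norm ...
assert all(d > 0 for d in dists) and dists[-1] < 1e-8
# ... through strictly decreasing distances: x is a limit point of the
# manifold that lies outside it, so the manifold is not closed in l^1.
assert all(a > b for a, b in zip(dists, dists[1:]))
```

By Proposition 4.7 this manifold, though a linear space in its own right, fails to be complete under the restricted norm.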
Example 4.G. As usual, let Y^S denote the collection of all functions of a nonempty set S into a set Y. If Y is a linear space over F, then Y^S is a linear space over F (Example 2.F). Suppose Y is a normed space and consider the subset B[S, Y] of Y^S consisting of all bounded mappings of S into Y. This is a linear manifold of Y^S (vector addition and scalar multiplication of bounded mappings are bounded mappings), and hence B[S, Y] is a linear space over F. Now consider the function ‖·‖_∞ : B[S, Y] → R defined by

    ‖f‖_∞ = sup_{s∈S} ‖f(s)‖

for every f ∈ B[S, Y], where ‖·‖ : Y → R is the norm on Y. It is easy to verify that (B[S, Y], ‖·‖_∞) is a normed space. The norm ‖·‖_∞ on B[S, Y], which is referred to as the sup-norm, generates the sup-metric d_∞ of Example 3.C. Thus, according to Example 3.S,

    (B[S, Y], ‖·‖_∞) is a Banach space if and only if Y is a Banach space.

Moreover, if X is a nonempty metric space and if BC[X, Y] is the subset of B[X, Y] made up of all continuous mappings from B[X, Y], then BC[X, Y] is a linear manifold of the linear space B[X, Y] (addition and scalar multiplication of continuous functions are continuous functions), and so BC[X, Y] is itself a linear space over F. According to Example 3.N, the linear manifold BC[X, Y] is closed in (B[X, Y], ‖·‖_∞), and so (BC[X, Y], ‖·‖_∞) is a subspace of the normed space (B[X, Y], ‖·‖_∞). Thus (cf. Example 3.T and Proposition 4.7),

    (BC[X, Y], ‖·‖_∞) is a Banach space if and only if Y is a Banach space.

If the metric space X is compact, then C[X, Y] = BC[X, Y], where C[X, Y] stands for the set of all continuous mappings of the compact metric space X into the normed space Y (see Example 3.Y). Therefore,

    (C[X, Y], ‖·‖_∞) is a Banach space if X is compact and Y is Banach.

By setting X = [0, 1] equipped with the usual metric on R (which is compact — Heine–Borel Theorem), it follows that (C[0, 1], ‖·‖_∞) is a Banach space, where C[0, 1] = C([0, 1], F) is the linear space over F of all F-valued continuous functions defined on the closed interval [0, 1] (recall: (F, |·|) is a Banach space — Example 4.A), and

    ‖x‖_∞ = sup_{0≤t≤1} |x(t)| = max_{0≤t≤1} |x(t)|   for every x ∈ C[0, 1]

is the sup-norm on C[0, 1] (cf. Examples 3.T and 3.Y).

In any normed space X the zero linear manifold {0} and the whole space X are subspaces of X. If a subspace M is a proper subset of X, then it is said to be a proper subspace. A nontrivial subspace M of a normed space X is a nonzero proper subspace of it ({0} ≠ M ≠ X). The next proposition shows that, if the dimension of the linear space X is greater than 1, then there are many nontrivial subspaces of the normed space X.

Proposition 4.8. Let X be a normed space.

(a) The closure M⁻ of a linear manifold M of X is a subspace of X.

(b) The intersection of an arbitrary nonempty collection of subspaces of X is again a subspace of X.

Proof. (a) The closure M⁻ of a linear manifold M of a normed space X is clearly a closed subset of X. What is left to be shown is that this closed subset of X is also a linear manifold of X. Take two arbitrary points x and y in M⁻. According to Proposition 3.27, there exist M-valued sequences {x_n} and {y_n} that converge to x and y, respectively, in the norm topology of X. Therefore (cf. Problem 4.1), the M-valued sequence {x_n + y_n} converges in X to x + y, and hence x + y ∈ M⁻ (see Proposition 3.27 again). Similarly, αx ∈ M⁻ for every scalar α. Thus M⁻ is a linear manifold of X.

(b) The intersection of an arbitrary nonempty collection of linear manifolds of a linear space is a linear manifold (cf. Section 2.2), and the intersection of an arbitrary collection of closed subsets of a metric space is a closed set (cf. Theorem 3.22). Thus the intersection of an arbitrary nonempty collection of closed linear manifolds of a normed space is a closed linear manifold.

Let A be a subset of a normed space X. The (linear) span of A, span A, was defined in Section 2.2 as the intersection of all linear manifolds of X that include A, which coincides with the smallest linear manifold of X that includes A. Recall that the smallest closed subset of X that includes span A is precisely
its closure (span A)⁻ in X (by the very definition of closure). According to Proposition 4.8(a), (span A)⁻ is a subspace of X. This is the smallest closed linear manifold of X that includes A. Set

    ⋁A = (span A)⁻,

which is called the subspace spanned by A. If a subspace M of X (which may be X itself) is such that M = ⋁A for some subset A of X, then we say that A spans M or that A is a spanning set for M (note: the same terminology of Section 2.2 but now with a different meaning); or still that A is a total set for M. Note that the intersection of all subspaces of X that include A is the smallest subspace of X (see Proposition 4.8(b)) that includes A. Summing up: ⋁A is the smallest subspace of X that includes A, which coincides with the intersection of all subspaces of X that include A. It is readily verified that ⋁∅ = {0}, ⋁M = M⁻ for every linear manifold M of X, and A ⊆ ⋁A = ⋁⋁A for every subset A of X. Moreover, if A and B are subsets of X, then A ⊆ B implies ⋁A ⊆ ⋁B.

Proposition 4.9. Let X be a normed space.

(a) A set A spans X if and only if every linear manifold of X that includes A is dense in X.

(b) X is separable if and only if it is spanned by a countable set.

Proof. (a) Let A be a subset of a normed space X. Take an arbitrary linear manifold M of X such that A ⊆ M. Recall: ⋁A ⊆ ⋁M = M⁻. Thus ⋁A = X implies M⁻ = X. Conversely, note that span A is a linear manifold of X that includes A. If every linear manifold of X that includes A is dense in X, then ⋁A = (span A)⁻ = X.

(b) Let X be a normed space. If X is separable as a metric space, then (by definition) there exists a countable set, say A ⊆ X, such that A⁻ = X. Since A⁻ ⊆ (span A)⁻ = ⋁A, it follows that ⋁A = X. Conversely, suppose (span A)⁻ = X for some countable subset A of X. Recall that span A is the set of all (finite) linear combinations of vectors in A (Proposition 2.2). Let M denote the set of all (finite) linear combinations of vectors in A with rational coefficients (note: we say that a complex number is "rational" if its real and imaginary parts are rational numbers). Since Q⁻ = R, it follows that M⁻ = (span A)⁻ (Example 3.P), and hence M⁻ = X. Moreover, it is readily verified that M is a linear space over the rational field Q and so M = span A (over Q). Thus M has a Hamel basis included in A (Theorem 2.6) so that dim M ≤ #A ≤ ℵ₀, and therefore #M = max{#Q, dim M} = ℵ₀ (Problem 2.8). Conclusion: M is a countable dense subset of X. Outcome: X is separable.

In Section 2.2 we considered the subcollection Lat(X) of the power set ℘(X) of a linear space X consisting of all linear manifolds of X. Lat(X) was
shown to be a complete lattice (in the inclusion ordering of ℘(X)): if {M_γ}_{γ∈Γ} is an arbitrary nonempty subcollection of Lat(X) (i.e., an arbitrary nonempty indexed family of linear manifolds of X), then

    inf{M_γ}_{γ∈Γ} = ⋂_{γ∈Γ} M_γ,
    sup{M_γ}_{γ∈Γ} = span(⋃_{γ∈Γ} M_γ) = ∑_{γ∈Γ} M_γ,

both in Lat(X). In particular, if {M, N} is a pair of linear manifolds of X, then M ∧ N = M ∩ N and M ∨ N = span(M ∪ N) = M + N are both linear manifolds of X.

Now shift from linear manifolds to subspaces. Let Lat(X) now denote the subcollection of the power set ℘(X) of a normed space X made up of all subspaces of X. Clearly, every subspace is a linear manifold, so this collection is included in the lattice of linear manifolds of X. If {M_γ}_{γ∈Γ} is an arbitrary nonempty subcollection of Lat(X) (i.e., an arbitrary nonempty indexed family of subspaces of the normed space X), then ⋂_{γ∈Γ} M_γ ∈ Lat(X) (by Proposition 4.8(b)) and ⋂_{γ∈Γ} M_γ ⊆ M_α for all M_α ∈ {M_γ}_{γ∈Γ}, so that ⋂_{γ∈Γ} M_γ is a lower bound for {M_γ}_{γ∈Γ}. Moreover, if V in Lat(X) is a lower bound for {M_γ}_{γ∈Γ} (i.e., if V ⊆ M_γ for all γ ∈ Γ), then V ⊆ ⋂_{γ∈Γ} M_γ. Thus ⋂_{γ∈Γ} M_γ = inf{M_γ}_{γ∈Γ}. If we adopt the usual notation ⋀_{γ∈Γ} M_γ = inf{M_γ}_{γ∈Γ}, then

    ⋀_{γ∈Γ} M_γ = ⋂_{γ∈Γ} M_γ.

Similarly, M_α ⊆ ⋁(⋃_{γ∈Γ} M_γ) ∈ Lat(X) for all M_α ∈ {M_γ}_{γ∈Γ} and, if U in Lat(X) is an upper bound for {M_γ}_{γ∈Γ} (i.e., if M_γ ⊆ U for all γ ∈ Γ so that ⋃_{γ∈Γ} M_γ ⊆ U), then ⋁(⋃_{γ∈Γ} M_γ) ⊆ U. Thus ⋁(⋃_{γ∈Γ} M_γ) = sup{M_γ}_{γ∈Γ}. Again, if we take up the usual notation ⋁_{γ∈Γ} M_γ = sup{M_γ}_{γ∈Γ}, then

    ⋁_{γ∈Γ} M_γ = ⋁(⋃_{γ∈Γ} M_γ) = (∑_{γ∈Γ} M_γ)⁻.

This is the topological sum of {M_γ}_{γ∈Γ}. Conclusion:

    Lat(X) is a complete lattice. The collection of all subspaces of a normed space is a complete lattice in the inclusion ordering.

If M and N are subspaces of X, then M ∧ N = M ∩ N and M ∨ N = ⋁(M ∪ N) = (M + N)⁻ lie in Lat(X). However (and this is rather important), it may happen that M + N ≠ (M + N)⁻: the (algebraic) sum of subspaces is not necessarily a subspace (it is a linear manifold but not necessarily a closed linear manifold). We shall see later (next chapter) an example of a pair of subspaces (of a Banach space) whose sum is not closed.

Remark: Let {M_i}_{i=1}^{n} be a finite collection
of linear manifolds of a normed space X. Although it may happen that ∑_{i=1}^{n} M_i⁻ ≠ (∑_{i=1}^{n} M_i)⁻, we have

    (∑_{i=1}^{n} M_i⁻)⁻ = (∑_{i=1}^{n} M_i)⁻.

Indeed, take x ∈ (∑_{i=1}^{n} M_i⁻)⁻ and ε > 0 arbitrary. Thus ‖x − u′‖ < ε for some u′ = ∑_{i=1}^{n} u′_i in ∑_{i=1}^{n} M_i⁻ with each u′_i in M_i⁻, and so ‖u′_i − u_i‖ < ε for some u_i in M_i, for every integer i. Set u = ∑_{i=1}^{n} u_i in ∑_{i=1}^{n} M_i. Thus

    ‖x − u‖ = ‖x − ∑_{i=1}^{n} u_i‖ = ‖x − ∑_{i=1}^{n} u′_i + ∑_{i=1}^{n} (u′_i − u_i)‖
            = ‖x − u′ + ∑_{i=1}^{n} (u′_i − u_i)‖ ≤ ‖x − u′‖ + ∑_{i=1}^{n} ‖u′_i − u_i‖ < (n+1)ε.

Then (since n does not depend on ε), for every δ > 0 set ε = δ/(n+1) so that ‖x − u‖ < δ, and hence x ∈ (∑_{i=1}^{n} M_i)⁻.
Therefore, (∑_{i=1}^{n} M_i⁻)⁻ ⊆ (∑_{i=1}^{n} M_i)⁻. The converse inclusion is trivial.

Next we consider the quotient space of a normed space modulo a subspace. Suppose M is a subspace of a normed space (X, ‖·‖_X). Let

    [x] = x + M = {x′ ∈ X : x′ = x + u for some u ∈ M}

be the coset (of x modulo M) in the quotient space X/M (of X modulo M), which is a linear space over the same field as the linear space X (cf. Example 2.H). Consider the function ‖·‖ : X/M → R given by

    ‖[x]‖ = inf_{x′∈[x]} ‖x′‖_X = inf_{u∈M} ‖x + u‖_X

for every [x] in X/M, where x is any vector of X in [x]. This in fact defines a function from X/M to R, for ‖[x]‖ depends only on the coset [x] and not on a particular representative vector x in [x]. Note that inf_{u∈M} ‖x + u‖_X = inf_{u∈M} ‖x − u‖_X because M is a linear manifold. Thus, with d_X denoting the metric on X generated by the norm ‖·‖_X,

    ‖[x]‖ = inf_{u∈M} d_X(x, u) = d_X(x, M)
for every [x] ∈ X /M, where x is any vector of X in [x]. We claim that # # is a norm on the quotient space X /M. Indeed, nonnegativeness, #[x]# ≥ 0 for every [x] ∈ X /M, is trivially verified. To verify positiveness, proceed as follows. Take an arbitrary x ∈ [x]. If #[x]# = 0, then dX (x, M) = 0 so that x ∈ M− = M (cf. Problem 3.43(b) — recall: M is closed in X ). This means that x ∈ [0] (reason: the origin [0] of X /M is M — see Example 2.H), and hence [x] = [0]. Equivalently, #[x]# > 0
whenever
[x] = 0.
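The identity ‖[x]‖ = d_X(x, M) can be checked numerically in a finite-dimensional setting. The following sketch is not from the text; it assumes NumPy is available, takes X = R² with the Euclidean norm and M = span{m}, brute-forces the infimum over a fine grid on M, and compares it with the exact distance to the line:

```python
import numpy as np

def quotient_norm(x, m, samples=4001, span=10.0):
    # ‖[x]‖ = inf_{u in M} ‖x - u‖ for the one-dimensional subspace
    # M = span{m}: brute-force the infimum over a fine grid of M.
    m = np.asarray(m, dtype=float)
    m = m / np.linalg.norm(m)
    ts = np.linspace(-span, span, samples)
    return np.linalg.norm(x[None, :] - ts[:, None] * m[None, :], axis=1).min()

x = np.array([3.0, 4.0])
m = np.array([1.0, 0.0])                 # M = the first coordinate axis
q = quotient_norm(x, m)
exact = np.linalg.norm(x - (x @ m) * m)  # Euclidean distance from x to M
# q ≈ exact = 4.0, and ‖[x]‖ ≤ ‖x‖ = 5.0, as shown for the quotient norm below
```

The grid includes the minimizing point t = 3, so the brute-force value agrees with the exact distance here.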
Absolute homogeneity and subadditivity (i.e., the triangle inequality) are also easily verified. Recall that (Example 2.H)

    α[x] = [αx]    and    [x] + [y] = [x + y]

for every [x], [y] in X/M and every scalar α. Since M is a linear manifold, it follows by absolute homogeneity and subadditivity of the norm ‖ ‖_X that
    ‖α[x]‖ = ‖[αx]‖ = inf_{u∈M} ‖αx + u‖_X = inf_{u∈M} ‖αx + αu‖_X = |α| inf_{u∈M} ‖x + u‖_X = |α| ‖[x]‖,

    ‖[x] + [y]‖ = ‖[x + y]‖ = inf_{u∈M} ‖x + y + u‖_X = inf_{u,v∈M} ‖x + u + y + v‖_X ≤ inf_{u∈M} ‖x + u‖_X + inf_{v∈M} ‖y + v‖_X = ‖[x]‖ + ‖[y]‖,
for every [x], [y] in X/M and every scalar α. Such a norm is referred to as the quotient norm. When a quotient space X/M (of a normed space X modulo a subspace M) is regarded as a normed space, it is this quotient norm that is assumed to equip it. Since inf_{u∈M} ‖x + u‖_X ≤ ‖x‖_X + inf_{u∈M} ‖u‖_X = ‖x‖_X, we get ‖[x]‖ ≤ ‖x‖_X for every [x] ∈ X/M and every x ∈ [x]. Thus ‖[x] − [y]‖ = ‖[x − y]‖ ≤ ‖x − y‖_X for every x, y ∈ X. Thus the natural mapping π of X onto X/M (defined by π(x) = [x] for every x ∈ X) is uniformly continuous (a contraction, actually), and so it preserves convergence and Cauchy sequences (Corollary 3.8 and Lemma 3.43). If {xn} converges in X to x ∈ X, then {[xn]} converges in X/M to [x] ∈ X/M, and if {xn} is a Cauchy sequence in X, then {[xn]} is a Cauchy sequence in X/M.

Proposition 4.10. If M is a subspace of a Banach space X, then the quotient space X/M is a Banach space.

Proof. Let M be a subspace of a normed space (X, ‖ ‖_X). Consider the quotient space X/M equipped with the quotient norm ‖ ‖ and let {[xn]}_{n=0}^∞ be an arbitrary Cauchy sequence in X/M. According to Problem 3.52(c), {[xn]}_{n=0}^∞ has a subsequence {[x_{n_k}]}_{k=0}^∞ of bounded variation so that

    Σ_{k=0}^∞ ‖[x_{n_{k+1}} − x_{n_k}]‖ = Σ_{k=0}^∞ ‖[x_{n_{k+1}}] − [x_{n_k}]‖ < ∞.
If x ∈ X and ε > 0, then there exists a vector u_{ε,x} ∈ M such that

    ‖x + u_{ε,x}‖_X ≤ inf_{u∈M} ‖x + u‖_X + ε = ‖[x]‖ + ε.

In particular, for each integer k ≥ 0 there exists a vector u_k ∈ M such that

    ‖(x_{n_{k+1}} − x_{n_k}) + u_k‖_X ≤ ‖[x_{n_{k+1}} − x_{n_k}]‖ + (1/2)^k.

Therefore, as Σ_{k=0}^∞ (1/2)^k < ∞,

    Σ_{k=0}^∞ ‖(x_{n_{k+1}} − x_{n_k}) + u_k‖_X < ∞.
In words, the X-valued sequence {(x_{n_{k+1}} − x_{n_k}) + u_k}_{k=0}^∞ is absolutely summable. Take the sequence {y_k}_{k=0}^∞ in X of partial sums of {(x_{n_{k+1}} − x_{n_k}) + u_k}_{k=0}^∞. A trivial induction shows that

    y_k = Σ_{i=0}^k ((x_{n_{i+1}} − x_{n_i}) + u_i) = (x_{n_{k+1}} − x_{n_0}) + Σ_{i=0}^k u_i

for each k ≥ 0. If X is a Banach space, then the absolutely summable sequence {(x_{n_{k+1}} − x_{n_k}) + u_k}_{k=0}^∞ is summable (cf. Proposition 4.4), which means that {y_k}_{k=0}^∞ converges in X. Thus {[y_k]}_{k=0}^∞ converges in X/M (since the natural mapping of X onto X/M, x → [x], is continuous). But

    [x_{n_{k+1}}] − [x_{n_0}] = [x_{n_{k+1}} − x_{n_0}] = [y_k]

for each k ≥ 0 (because Σ_{i=0}^k u_i lies in M for every k ≥ 0). Then the subsequence {[x_{n_k}]}_{k=0}^∞ converges in X/M, and so the Cauchy sequence {[x_n]}_{n=0}^∞ converges in X/M (Proposition 3.39(c)). Conclusion: Every Cauchy sequence in X/M converges in X/M; that is, X/M is Banach.
4.4 Bounded Linear Transformations

Let X and Y be normed spaces. A continuous linear transformation of X into Y is a linear transformation of the linear space X into the linear space Y that is continuous with respect to the norm topologies of X and Y. (Note that X and Y are necessarily linear spaces over the same scalar field.) This is the most important concept that results from the combination of algebraic and topological structures.

Definition 4.11. A linear transformation T of a normed space X into a normed space Y is bounded if there exists a constant β ≥ 0 such that ‖Tx‖ ≤ β ‖x‖ for every x ∈ X. (The norm on the left-hand side is the norm on Y and that on the right-hand side is the norm on X.)

Proposition 4.12. A linear transformation of a normed space X into a normed space Y is bounded if and only if it maps bounded subsets of X into bounded subsets of Y.

Proof. Let A be a bounded subset of X so that sup_{a∈A} ‖a‖ < ∞ (Problem 4.5). If T is bounded, then there exists a real number β > 0 such that ‖Tx‖ ≤ β‖x‖ for every x ∈ X. Therefore

    sup_{y∈T(A)} ‖y‖ = sup_{a∈A} ‖Ta‖ ≤ β sup_{a∈A} ‖a‖ < ∞,
and so T(A) is bounded in Y. Conversely, suppose T(A) is bounded in Y for every bounded set A in X. In particular, T(B_1[0]) is bounded in Y: the closed unit ball centered at the origin of X, B_1[0], is certainly bounded in X. Thus

    sup_{‖b‖≤1} ‖Tb‖ = sup_{b∈B_1[0]} ‖Tb‖ = sup_{y∈T(B_1[0])} ‖y‖ < ∞.

Take an arbitrary nonzero x in X. Since ‖x/‖x‖‖ = 1, it follows that

    ‖Tx‖ = ‖x‖ ‖T(x/‖x‖)‖ ≤ sup_{‖b‖≤1} ‖Tb‖ ‖x‖

for every 0 ≠ x ∈ X, and hence T is bounded (since the inequality ‖Tx‖ ≤ sup_{‖b‖≤1} ‖Tb‖ ‖x‖ holds trivially for x = 0).

The next elementary result is extremely important.

Proposition 4.13. Let T be a bounded linear transformation of a normed space X into a normed space Y. The null space N(T) of T is a subspace of X.

Proof. The null space (or kernel) of T, N(T) = {x ∈ X : Tx = 0} = T^{-1}({0}), is a linear manifold of X (Problem 2.10). The Closed Set Theorem (Theorem 3.30) ensures that it is closed in X. Indeed, if {x_n} is an N(T)-valued sequence (i.e., Tx_n = 0 for every n) that converges in X to x ∈ X, then 0 ≤ ‖Tx‖ = ‖Tx_n − Tx‖ = ‖T(x_n − x)‖ ≤ β‖x_n − x‖ → 0 as n → ∞ (because T is linear and bounded), and hence x ∈ N(T).
Theorem 4.14. Let T be a linear transformation of a normed space X into a normed space Y. The following assertions are pairwise equivalent.

(a) T is bounded.
(b) T is Lipschitzian.
(c) T is uniformly continuous.
(d) T is continuous.
(e) T is continuous at some point x_0 of X.
(f) T is continuous at the origin 0 ∈ X.

Proof. Let T: X → Y be a linear transformation. If T is bounded, then there exists β ≥ 0 such that, for every x_1, x_2 ∈ X,

    ‖Tx_1 − Tx_2‖ = ‖T(x_1 − x_2)‖ ≤ β‖x_1 − x_2‖,

and hence (a)⇒(b). Recall that (b)⇒(c)⇒(d)⇒(e) trivially. Now suppose T is continuous at x_0 ∈ X. Then for every ε > 0 there exists a δ > 0 such that ‖Tx − T0‖ = ‖Tx‖ = ‖T(x + x_0) − Tx_0‖ < ε whenever ‖x − 0‖ = ‖x‖ = ‖(x + x_0) − x_0‖ < δ, and so T is continuous at 0 ∈ X. Therefore (e)⇒(f). Next suppose T is continuous at 0 ∈ X. Thus (for ε = 1) there exists a δ > 0 such that ‖Tx‖ = ‖Tx − T0‖ < 1 whenever ‖x‖ = ‖x − 0‖ < δ. Since ‖(δ/2‖x‖)x‖ < δ for every nonzero x in X,

    ‖Tx‖ = (2‖x‖/δ) ‖T((δ/2‖x‖)x)‖ < (2/δ)‖x‖

for every 0 ≠ x ∈ X, and hence T is bounded. Thus (f)⇒(a).
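In finite dimensions every linear transformation is bounded, so an unbounded (hence discontinuous) one must live on an infinite-dimensional space. The following sketch is not from the text; assuming NumPy, it truncates the classical example T{ξ_k} = {k ξ_k} on the finitely supported sequences with the sup norm: the values ‖T e_k‖/‖e_k‖ grow without bound, so no single constant β can witness Definition 4.11.

```python
import numpy as np

def T(x):
    # Truncation of the unbounded diagonal map T(e_k) = k * e_k on the
    # linear space of finitely supported sequences (sup norm).
    return np.arange(1, len(x) + 1) * x

ratios = []
for k in range(1, 6):
    e_k = np.zeros(k)
    e_k[-1] = 1.0                                   # the unit vector e_k
    ratios.append(np.linalg.norm(T(e_k), np.inf))   # = ‖T e_k‖ / ‖e_k‖, since ‖e_k‖ = 1
# ratios == [1.0, 2.0, 3.0, 4.0, 5.0]: sup_k ‖T e_k‖ = ∞, so T is not bounded
```

Linearity of T is immediate, yet T fails every item of Theorem 4.14, illustrating that linearity alone does not force continuity outside finite dimensions.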
Observe from Theorem 4.14 that the terms “bounded linear transformation” and “continuous linear transformation” are synonyms, and as such we shall use them interchangeably. If X and Y are normed spaces (over the same field F), then the collection of all bounded linear transformations of X into Y will be denoted by B[X, Y]. It is easy to verify (triangle inequality and homogeneity for the norm on Y) that B[X, Y] is a linear manifold of the linear space L[X, Y] of all linear transformations of X into Y (see Section 2.5), and so B[X, Y] is a linear space over F. The origin of B[X, Y] is the null transformation, denoted by O (Ox = 0 for every x ∈ X). For each T ∈ B[X, Y] set

    ‖T‖ = inf {β ≥ 0 : ‖Tx‖ ≤ β‖x‖ for every x ∈ X}.

(Recall: If T ∈ B[X, Y], then there is a β ≥ 0 such that ‖Tx‖ ≤ β‖x‖ for every x ∈ X, and hence the nonnegative number ‖T‖ exists for every T ∈ B[X, Y].) If x is any vector in X, then ‖Tx‖ ≤ (‖T‖ + ε)‖x‖ for every ε > 0 so that ‖Tx‖ ≤ inf_{ε>0} (‖T‖ + ε)‖x‖. Thus

    ‖Tx‖ ≤ ‖T‖ ‖x‖    for every    x ∈ X,

and hence ‖T‖ = min {β ≥ 0 : ‖Tx‖ ≤ β‖x‖ for every x ∈ X}. Therefore

    ‖T‖ ≥ 0    and    ‖T‖ = 0 if and only if T = O.

Moreover, for any α in F and any S in B[X, Y] we get ‖(αT)x‖ = ‖α(Tx)‖ = |α|‖Tx‖ and ‖(T + S)x‖ = ‖Tx + Sx‖ ≤ ‖Tx‖ + ‖Sx‖ ≤ (‖T‖ + ‖S‖)‖x‖, for every x ∈ X. This implies that

    ‖αT‖ = |α|‖T‖    and    ‖T + S‖ ≤ ‖T‖ + ‖S‖

for every T, S ∈ B[X, Y] and every scalar α ∈ F. Conclusion: The function B[X, Y] → R defined by T → ‖T‖ is a norm on the linear space B[X, Y]. Thus B[X, Y] is a normed space, and ‖T‖ is referred to as the norm of T ∈ B[X, Y],
or the usual norm on B[X, Y], or yet as the induced uniform norm on B[X, Y] — the norm ‖T‖ on B[X, Y] is induced by the norm ‖Tx‖ on Y and the norm ‖x‖ on X. This is the norm that will be assumed to equip B[X, Y] whenever B[X, Y] is regarded as a normed space. The reader is invited to show that

    ‖T‖ = sup_{‖x‖≤1} ‖Tx‖ = sup_{‖x‖<1} ‖Tx‖ = sup_{‖x‖=1} ‖Tx‖ = sup_{x≠0} ‖T(x/‖x‖)‖,

where the last two expressions make sense only if X ≠ {0}. A contraction in B[X, Y] is a bounded linear transformation T ∈ B[X, Y] such that ‖T‖ ≤ 1. Clearly,

    ‖T‖ ≤ 1  ⟺  ‖Tx‖ ≤ ‖x‖ for every x ∈ X.

If X ≠ {0}, then a transformation T ∈ B[X, Y] is a contraction if and only if sup_{x≠0} (‖Tx‖/‖x‖) ≤ 1. A strict contraction in B[X, Y] is a bounded linear transformation T ∈ B[X, Y] such that ‖T‖ < 1. It is obvious that every strict contraction is a contraction. If X ≠ {0}, then a transformation T ∈ B[X, Y] is a strict contraction if and only if sup_{x≠0} (‖Tx‖/‖x‖) < 1. Carefully note that, if T ∈ B[X, Y] and X ≠ {0}, then

    ‖T‖ < 1  ⟹  ‖Tx‖ < ‖x‖ for every 0 ≠ x ∈ X  ⟹  ‖T‖ ≤ 1.
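For the Euclidean norm on R^n the induced norm ‖T‖ = sup_{‖x‖=1} ‖Tx‖ is the largest singular value of the representing matrix, so the sup formulas can be sanity-checked numerically. A sketch, not from the text, assuming NumPy — sampling unit vectors can only underestimate the supremum over the whole unit sphere:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))

# sample the unit sphere: the max over samples underestimates sup_{‖x‖=1} ‖Ax‖
xs = rng.standard_normal((3, 100_000))
xs /= np.linalg.norm(xs, axis=0)            # normalize each column to ‖x‖ = 1
sampled = np.linalg.norm(A @ xs, axis=0).max()

exact = np.linalg.norm(A, 2)    # induced Euclidean norm = largest singular value
# sampled <= exact, with near equality for this many samples
```

With 10^5 random directions in R³ the sampled value sits just below the exact induced norm, never above it, exactly as the supremum characterization demands.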
A transformation that satisfies the middle inequality is called a proper contraction. Every strict contraction is a proper contraction, which in turn is a (plain) contraction. The converses to the above one-way implications fail in general. It is clear that there exist contractions that are not proper (the identity, for instance). The next example exhibits a proper contraction that is not strict. A nonstrict contraction is precisely a bounded linear transformation T ∈ B[X, Y] with norm 1 (i.e., ‖T‖ ≤ 1 and not ‖T‖ < 1, which happens if and only if ‖T‖ = 1).

Example 4.H. Let ℓ+^p be the Banach space of Example 4.B for some p ≥ 1. We shall drop the subscript “p” from the norm ‖ ‖_p on ℓ+^p and write simply ‖ ‖. Now consider the diagonal mapping D_a : ℓ+^p → ℓ+^p of Problem 3.22 for some bounded sequence a = {α_k}_{k=0}^∞ ∈ ℓ+^∞:

    D_a x = {α_k ξ_k}_{k=0}^∞    for every    x = {ξ_k}_{k=0}^∞ ∈ ℓ+^p

(i.e., D_a(ξ_0, ξ_1, ξ_2, ...) = (α_0 ξ_0, α_1 ξ_1, α_2 ξ_2, ...)). Common notation for a diagonal mapping of ℓ+^p into itself includes the following usual representation as an infinite diagonal matrix:

    D_a = diag({α_k}_{k=0}^∞) = diag(α_0, α_1, α_2, ...).

This bounded linear transformation is called a diagonal operator. Linearity is trivially verified. Boundedness (i.e., continuity) follows from Problem 3.22(a). Indeed, if a = {α_k}_{k=0}^∞ is a bounded sequence (i.e., a sequence in ℓ+^∞), then
    ‖D_a x‖ = (Σ_{k=0}^∞ |α_k ξ_k|^p)^{1/p} ≤ sup_k |α_k| (Σ_{k=0}^∞ |ξ_k|^p)^{1/p} = sup_k |α_k| ‖x‖

for every x = {ξ_k}_{k=0}^∞ ∈ ℓ+^p, so that ‖D_a‖ ≤ sup_k |α_k| = ‖a‖_∞. On the other hand, consider the ℓ+^p-valued sequence {e_j}_{j=0}^∞ where, for each j ≥ 0, e_j is a scalar-valued sequence with just one nonzero entry (equal to 1) at the jth position (i.e., e_j = {δ_{jk}}_{k=0}^∞ for every j ≥ 0, with δ_{jk} = 1 if k = j and δ_{jk} = 0 if k ≠ j). Since D_a e_j = α_j e_j and ‖e_j‖ = 1, it follows that ‖D_a e_j‖ = |α_j| and hence ‖D_a‖ = sup_{‖x‖=1} ‖D_a x‖ ≥ ‖D_a e_j‖ = |α_j|, for every j ≥ 0. Thus ‖D_a‖ ≥ sup_j |α_j| = ‖a‖_∞. Therefore,
    ‖D_a‖ = ‖a‖_∞.

If a = {α_k}_{k=0}^∞ is a constant sequence, say α_k = α for all k ≥ 0, then D_a = diag(α, α, α, ...) = αI, a multiple of the identity I on ℓ+^p. In this case D_a is called a scalar operator (see Problem 4.19). It is clear that ‖αIx‖ = |α|‖x‖ for every x ∈ ℓ+^p and ‖αI‖ = |α|. If a = {α_k}_{k=0}^∞ is the increasing sequence with α_k = (k+1)/(k+2) for every k ≥ 0, then D_a = diag(1/2, 2/3, 3/4, ...) is a nonstrict proper contraction: ‖D_a x‖ < ‖x‖ for every 0 ≠ x ∈ ℓ+^p and ‖D_a‖ = 1.

Proposition 4.15. B[X, Y] is a Banach space if Y is a Banach space.

Proof. Let {T_n} be an arbitrary Cauchy sequence in B[X, Y]. Thus, for each x ∈ X, the sequence {T_n x} is a Cauchy sequence in Y (reason: ‖T_m x − T_n x‖ ≤ ‖T_m − T_n‖ ‖x‖ for every x ∈ X and every pair of indices m and n). Since Y is complete, {T_n x} converges in Y. Then for each x ∈ X there is a vector y_x ∈ Y such that T_n x → y_x in Y. Let T be the mapping that assigns to each x in X this (unique) limit vector y_x in Y; that is, Tx = lim_n T_n x for every x ∈ X.

Claim 1. T: X → Y is linear.

Proof. Since T_n is linear for each n, and since the linear operations are continuous (Problem 4.1), it follows that

    T(α_1 x_1 + α_2 x_2) = lim_n T_n(α_1 x_1 + α_2 x_2) = lim_n (α_1 T_n x_1 + α_2 T_n x_2) = α_1 lim_n T_n x_1 + α_2 lim_n T_n x_2 = α_1 Tx_1 + α_2 Tx_2

for every x_1, x_2 ∈ X and every pair of scalars α_1 and α_2.

Claim 2. T: X → Y is bounded.

Proof. Take an arbitrary real number ε > 0. Since {T_n} is a Cauchy sequence in B[X, Y], it follows that there exists an integer n_ε ≥ 1 such that ‖T_n − T_m‖ < ε whenever m, n ≥ n_ε. Thus ‖T_n x − T_m x‖ ≤ ‖T_n − T_m‖ ‖x‖ < ε‖x‖ for every x ∈ X and every m, n ≥ n_ε. Therefore,

    ‖(T_n − T)x‖ = ‖T_n x − Tx‖ = ‖T_n x − lim_m T_m x‖ = ‖lim_m (T_n x − T_m x)‖ = lim_m ‖T_n x − T_m x‖ ≤ ε‖x‖
for every x ∈ X and every n ≥ n_ε (reason: every norm is continuous — see Corollary 3.8). Hence the linear transformation T_n − T is bounded for each n ≥ n_ε, and so is T = T_n − (T_n − T).

Claim 3. T_n → T in B[X, Y].

Proof. For every ε > 0 there exists an integer n_ε ≥ 1 such that

    n ≥ n_ε    implies    ‖T_n − T‖ = sup_{‖x‖≤1} ‖(T_n − T)x‖ ≤ ε

according to the above displayed inequality. Conclusion: Every Cauchy sequence {T_n} in B[X, Y] converges in B[X, Y] to T ∈ B[X, Y], which means that the normed space B[X, Y] is complete (with respect to the induced uniform norm topology of B[X, Y]). That is, B[X, Y] is a Banach space.

Comparing the preceding proposition with Example 4.G, we might expect that, if X ≠ {0}, then B[X, Y] is a Banach space if and only if Y is a Banach space. This indeed is the case, and we shall prove the converse of Proposition 4.15 in Section 4.10.

Recall that bounded linear transformations can be multiplied (where multiplication means composition). The resulting transformation is linear as well as bounded.

Proposition 4.16. Let X, Y, and Z be normed spaces over the same scalar field. If T ∈ B[X, Y] and S ∈ B[Y, Z], then ST ∈ B[X, Z] and ‖ST‖ ≤ ‖S‖ ‖T‖.

Proof. The composition ST: X → Z is linear (Problem 2.15) and ‖STx‖ ≤ ‖S‖‖Tx‖ ≤ ‖S‖‖T‖‖x‖ for every x in X.

The above inequality is a rather important additional property shared by the (induced uniform) norm of bounded linear transformations. We shall refer to it as the operator norm property. Let X be a normed space and set B[X] = B[X, X] for short. The elements of B[X] are called operators. In other words, by an operator (or a bounded linear operator) we simply mean a bounded linear transformation of a normed space X into itself, so that B[X] is the normed space of all operators on X.

Example 4.I. Let {(X_k, ‖ ‖_k)}_{k∈I} be a countable (either finite or infinite) indexed family of normed spaces (over the same scalar field), and consider the direct sum ⊕_{k∈I} X_k of the linear spaces {X_k}_{k∈I}. (To avoid trivialities, assume that each X_k is nonzero.) Equip the linear space ⊕_{k∈I} X_k with any of the norms ‖ ‖_∞ or ‖ ‖_p (for any p ≥ 1) as in Examples 4.E and 4.F. If the index set I is
infinite, then write (⊕_{k∈I} X_k, ‖ ‖_∞) for the normed space ((⊕_{k∈I} X_k)_∞, ‖ ‖_∞) and (⊕_{k∈I} X_k, ‖ ‖_p) for ((⊕_{k∈I} X_k)_p, ‖ ‖_p). For each i ∈ I define a mapping P_i : ⊕_{k∈I} X_k → ⊕_{k∈I} X_k by

    P_i x = x_i    for every    x = {x_k}_{k∈I} ∈ ⊕_{k∈I} X_k,

where we are identifying each vector x_i ∈ X_i with the indexed family {x_i(k)}_{k∈I} in ⊕_{k∈I} X_k that has just one nonzero entry (equal to x_i) at the ith position (i.e., set x_i(k) = δ_{ik} x_k so that x_i(k) = 0_k ∈ X_k, the origin of X_k, if k ≠ i and x_i(i) = x_i). Briefly, we are writing x_i for (0_1, ..., 0_{i−1}, x_i, 0_{i+1}, ...). Each P_i is a linear transformation. Indeed,

    P_i(αu ⊕ βv) = P_i(α{u_k}_{k∈I} ⊕ β{v_k}_{k∈I}) = P_i{αu_k + βv_k}_{k∈I} = αu_i + βv_i = αP_i u ⊕ βP_i v

for every u = {u_k}_{k∈I} and v = {v_k}_{k∈I} in ⊕_{k∈I} X_k and every pair of scalars {α, β}. Moreover, R(P_i) = X_i if we identify each normed space X_i with the linear manifold ⊕_{k∈I} X_i(k) of ⊕_{k∈I} X_k, such that X_i(k) = {0_k} for k ≠ i and X_i(i) = X_i, equipped with any of the norms ‖ ‖_∞ or ‖ ‖_p. (To identify X_i with ⊕_{k∈I} X_i(k) simply means that these normed spaces are isometrically isomorphic, a concept that will be introduced in Section 4.7.) Note that {x_i(k)}_{k∈I} lies in ⊕_{k∈I} X_i(k) for every x_i in X_i and ‖x_i‖_i = ‖{x_i(k)}_{k∈I}‖_∞ = ‖{x_i(k)}_{k∈I}‖_p for any p ≥ 1. Also note that each P_i is a projection (i.e., an idempotent linear transformation): P_i² x = P_i{x_i(k)}_{k∈I} = {x_i(k)}_{k∈I} = P_i x for every x = {x_k}_{k∈I} in ⊕_{k∈I} X_k, so that P_i = P_i² for each i ∈ I. In fact, each P_i is a continuous projection. To show that P_i : ⊕_{k∈I} X_k → ⊕_{k∈I} X_k is continuous, when ⊕_{k∈I} X_k is equipped with any of the norms ‖ ‖_∞ or ‖ ‖_p, note that

    ‖P_i x‖_∞ = ‖{x_i(k)}_{k∈I}‖_∞ = ‖x_i‖_i ≤ sup_{k∈I} ‖x_k‖_k = ‖x‖_∞

for every x = {x_k}_{k∈I} in (⊕_{k∈I} X_k, ‖ ‖_∞), and

    ‖P_i x‖_p^p = ‖{x_i(k)}_{k∈I}‖_p^p = ‖x_i‖_i^p ≤ Σ_{k∈I} ‖x_k‖_k^p = ‖x‖_p^p

for every x = {x_k}_{k∈I} in (⊕_{k∈I} X_k, ‖ ‖_p). Thus P_i is a bounded linear transformation, and hence continuous (Theorem 4.14). In other words, each P_i is a projection operator in B[⊕_{k∈I} X_k]. Actually, if we use the same symbols ‖ ‖_∞ and ‖ ‖_p to denote the induced uniform norms on B[⊕_{k∈I} X_k] whenever ⊕_{k∈I} X_k is equipped with either ‖ ‖_∞ or ‖ ‖_p, respectively, then the above inequalities ensure that ‖P_i‖_∞ ≤ 1 and ‖P_i‖_p ≤ 1. Thus each P_i is a contraction with respect to any of the norms ‖ ‖_∞ or ‖ ‖_p. On the other hand, since P_i = P_i², it follows that ‖P_i‖_∞ = ‖P_i²‖_∞ ≤ ‖P_i‖_∞² and ‖P_i‖_p = ‖P_i²‖_p ≤ ‖P_i‖_p² (operator norm property). Hence 1 ≤ ‖P_i‖_∞ and 1 ≤ ‖P_i‖_p (since P_i ≠ O), so that

    ‖P_i‖_∞ = ‖P_i‖_p = 1.
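A finite-dimensional analogue, not from the text (NumPy assumed): on the direct sum R² ⊕ R³ with the Euclidean norm, the coordinate projection P₁ onto the first summand is idempotent and has operator norm exactly 1, mirroring the computation for the natural projections above.

```python
import numpy as np

# P1 keeps the R^2 block of R^2 (+) R^3 and annihilates the R^3 block
P1 = np.diag([1.0, 1.0, 0.0, 0.0, 0.0])

idempotent = np.array_equal(P1 @ P1, P1)    # P1^2 = P1
norm_P1 = np.linalg.norm(P1, 2)             # induced Euclidean norm
# idempotent is True and norm_P1 == 1.0
```

The two facts interact exactly as in the argument above: ‖P₁‖ ≤ 1 by the coordinate inequality, while idempotency and the operator norm property force ‖P₁‖ ≥ 1.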
Summing up: If ⊕_{k∈I} X_k is equipped with any of the norms ‖ ‖_∞ or ‖ ‖_p (for any p ≥ 1), then {P_i}_{i∈I} is a family of projections in B[⊕_{k∈I} X_k] with unit norm. When we identify ⊕_{k∈I} X_i(k) = R(P_i) with X_i, as we actually did, then each mapping P_i : ⊕_{k∈I} X_k → ⊕_{k∈I} X_i(k) ⊆ ⊕_{k∈I} X_k can be viewed as a function from ⊕_{k∈I} X_k onto X_i, and hence we write P_i : ⊕_{k∈I} X_k → X_i. This is called the natural projection of ⊕_{k∈I} X_k onto X_i.

Definition 4.17. A normed space A that is also an algebra with respect to a product operation A×A → A (see Problem 2.30), say (A, B) → AB, such that

    ‖AB‖ ≤ ‖A‖ ‖B‖

is called a normed algebra. A Banach algebra is a normed algebra that is complete as a normed space. If a normed algebra possesses an identity I such that ‖I‖ = 1, then it is a unital normed algebra (or a normed algebra with identity). If a unital normed algebra is complete as a normed space, then it is a unital Banach algebra (or a Banach algebra with identity).

Let L[X] = L[X, X] denote the linear space of all linear transformations on X (i.e., the linear space of all linear transformations of a linear space X into itself — see Section 2.5). We have already seen that B[X] is a linear manifold of L[X]. If X ≠ {0}, then L[X] is much more than a mere linear space; it is an algebra with identity, where the product in L[X] is interpreted as composition (Problem 2.30). The operator norm property ensures that B[X] is a subalgebra (in the algebraic sense) of L[X], and hence an algebra in its own right. Moreover, the operator norm property also ensures that B[X] in fact is a normed algebra. Let I denote the identity operator in B[X] (i.e., Ix = x for every x ∈ X), which is the identity of the algebra B[X]. Of course, ‖I‖ = 1 by the very definition of norm (the induced uniform norm, that is) on B[X]. Thus B[X] is a normed algebra with identity and, according to Proposition 4.15, B[X] is a Banach algebra with identity if X ≠ {0} is a Banach space.
We know from Problem 4.1 that addition and scalar multiplication in B[X] are continuous (as they are in any normed space). The operator norm property allows us to conclude that multiplication is continuous too. Indeed, if {S_n} and {T_n} are B[X]-valued sequences that converge in B[X] to S ∈ B[X] and to T ∈ B[X], respectively, then

    ‖S_n T_n − ST‖ = ‖S_n(T_n − T) + (S_n − S)T‖ ≤ ‖S_n‖ ‖T_n − T‖ + ‖S_n − S‖ ‖T‖

for every n. Since sup_n ‖S_n‖ < ∞ (every convergent sequence is bounded), it follows that {S_n T_n} converges in B[X] to ST. Thus the product operation B[X]×B[X] → B[X] given by (S, T) → ST is continuous (cf. Corollary 3.8), where the topology on B[X]×B[X] is that induced by any of the equivalent metrics of Problems 3.9 and 3.33.
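The inequality ‖S_nT_n − ST‖ ≤ ‖S_n‖‖T_n − T‖ + ‖S_n − S‖‖T‖ can be watched at work on matrices. A sketch, not from the text (NumPy assumed), perturbing fixed S and T by E·I with E → 0:

```python
import numpy as np

rng = np.random.default_rng(2)
S = rng.standard_normal((3, 3))
T = rng.standard_normal((3, 3))

def opnorm(A):
    return np.linalg.norm(A, 2)   # induced Euclidean (spectral) norm

errs = []
for n in range(1, 6):
    E = 10.0 ** -n
    Sn, Tn = S + E * np.eye(3), T + E * np.eye(3)
    lhs = opnorm(Sn @ Tn - S @ T)
    # Sn Tn - S T = Sn (Tn - T) + (Sn - S) T, so the operator norm property gives:
    rhs = opnorm(Sn) * opnorm(Tn - T) + opnorm(Sn - S) * opnorm(T)
    assert lhs <= rhs + 1e-12
    errs.append(lhs)
# errs decreases toward 0: multiplication (S, T) -> S T is continuous
```

The bound holds at every step, and the product error shrinks with the perturbation, which is precisely the continuity argument above in numerical form.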
4.5 The Open Mapping Theorem and Continuous Inverse

Recall that a function F: X → Y from a set X to a set Y has an inverse on its range R(F) if there exists a (unique) function F^{-1}: R(F) → X (called the inverse of F on R(F)) such that F^{-1}F = I_X, where I_X stands for the identity on X. Moreover, F has an inverse on its range if and only if it is injective. If F is injective and surjective, then it is called invertible (in the set-theoretic sense) and F^{-1}: Y → X is the inverse of F. Furthermore, F: X → Y is invertible if and only if there exists a unique function F^{-1}: Y → X (the inverse of it) such that F^{-1}F = I_X and FF^{-1} = I_Y, where I_Y stands for the identity on Y. (See Section 1.3 and Problems 1.5 through 1.8.)

Now let T be a linear transformation of a linear space X into a linear space Y. Theorems 2.8 and 2.10 say that N(T) = {0} if and only if T has a linear inverse on its range R(T). That is,

    N(T) = {0}  ⟺  T ∈ L[X, Y] is injective  ⟺  T^{-1} ∈ L[R(T), X] exists.

Clearly, N(T) = {0} if and only if 0 < ‖Tx‖ for every nonzero x ∈ X (whenever Y is a normed space). This leads to the following proposition.

Proposition 4.18. Take T ∈ L[X, Y], where X is a linear space and Y is a normed space. The following assertions are equivalent.

(a) There exists T^{-1} ∈ L[R(T), X] (i.e., T has an inverse on its range).
(b) 0 < ‖Tx‖ for every nonzero x in X.

Definition 4.19. A linear transformation T of a normed space X into a normed space Y is bounded below if there exists a constant α > 0 such that α‖x‖ ≤ ‖Tx‖ for every x ∈ X (where the norm on the left-hand side is the norm on X and that on the right-hand side is the norm on Y). Note that T ∈ B[X, Y] actually means that T ∈ L[X, Y] is bounded above. The next result is a continuous version of the previous proposition.

Proposition 4.20. Take T ∈ L[X, Y], where X and Y are normed spaces. The following assertions are equivalent.

(a) There exists T^{-1} ∈ B[R(T), X] (i.e., T has a continuous inverse on its range).
(b) T is bounded below.

Moreover, if T ∈ B[X, Y] and if X is a Banach space, then each of the above equivalent assertions implies that R(T)^- = R(T) (i.e., R(T) is closed in Y).

Proof. Suppose T is a linear transformation between normed spaces X and Y. Recall that, if T has an inverse, then its inverse is linear (Theorem 2.10).
Proof of (a)⇒(b). If (a) holds, then there exists a constant β > 0 such that ‖T^{-1}y‖ ≤ β‖y‖ for every y ∈ R(T) (Definition 4.11). Take an arbitrary x ∈ X so that Tx ∈ R(T). Thus ‖x‖ = ‖T^{-1}Tx‖ ≤ β‖Tx‖, and hence (1/β)‖x‖ ≤ ‖Tx‖.

Proof of (b)⇒(a). If (b) holds, then 0 < ‖Tx‖ for every nonzero x in X, which implies that there exists T^{-1} ∈ L[R(T), X] by Proposition 4.18. Take any y in R(T) so that y = Tx for some x ∈ X. Thus ‖T^{-1}y‖ = ‖T^{-1}Tx‖ = ‖x‖ ≤ (1/α)‖Tx‖ = (1/α)‖y‖ if (b) holds, and so T^{-1} is bounded (Definition 4.11).

Now take an arbitrary R(T)-valued convergent sequence {y_n}. Since each y_n is in R(T), there exists an X-valued sequence {x_n} such that y_n = Tx_n for each n. Thus {Tx_n} converges in Y, and hence {Tx_n} is a Cauchy sequence in Y. If T is bounded below, then there exists α > 0 such that 0 ≤ α‖x_m − x_n‖ ≤ ‖T(x_m − x_n)‖ = ‖Tx_m − Tx_n‖ for every pair of indices m and n (recall: T is linear), so that {x_n} also is a Cauchy sequence (in X), and so it converges in X to, say, x ∈ X (whenever X is a Banach space). But T is continuous and so y_n = Tx_n → Tx (Corollary 3.8), which implies that the (unique) limit of {y_n} lies in R(T). Conclusion: R(T) is closed in Y by the Closed Set Theorem (Theorem 3.30).

Example 4.J. Take a sequence a = {α_k}_{k=0}^∞ ∈ ℓ+^∞ and consider the diagonal operator D_a ∈ B[ℓ+^p] (for some p ≥ 1) of Example 4.H. If α_k ≠ 0 for every k ≥ 0 and D_a x = 0 for some x = {ξ_k}_{k=0}^∞ ∈ ℓ+^p, then x = 0 (i.e., if α_k ≠ 0 and α_k ξ_k = 0 for every k ≥ 0, then ξ_k = 0 for every k ≥ 0). Thus α_k ≠ 0 for every k ≥ 0 implies N(D_a) = {0}. Conversely, if N(D_a) = {0}, then D_a x ≠ 0 for every nonzero x ∈ ℓ+^p. In particular, D_a e_j = α_j e_j ≠ 0 so that ‖D_a e_j‖ = |α_j| ≠ 0, and hence α_j ≠ 0, for every j ≥ 0 (see Example 4.H). Conclusion:

    N(D_a) = {0} if and only if α_k ≠ 0 for every k ≥ 0.

Thus, since a linear transformation has a (linear) inverse on its range if and only if its null space is zero (i.e., if and only if it is injective), we may conclude:

    There exists D_a^{-1} ∈ L[R(D_a), ℓ+^p] if and only if α_k ≠ 0 for every k ≥ 0.

In this case, the linear (but not necessarily bounded) transformation D_a^{-1} is again a diagonal mapping,

    D_a^{-1} y = {α_k^{-1} υ_k}_{k=0}^∞    for every    y = {υ_k}_{k=0}^∞ ∈ R(D_a) ⊆ ℓ+^p,

whose domain D(D_a^{-1}) is precisely the range R(D_a) of D_a (reason: this mapping is a diagonal, as diagonals were defined in Problem 3.22, and D_a^{-1}D_a is the identity on ℓ+^p). Recall that D(D_a^{-1}) = ℓ+^p if sup_k |α_k^{-1}| < ∞ (see Problem 3.22(a)), and hence R(D_a) = ℓ+^p if inf_k |α_k| > 0 (because inf_k |α_k| > 0 implies sup_k |α_k^{-1}| < ∞). Moreover,
    ‖D_a x‖ = (Σ_{k=0}^∞ |α_k ξ_k|^p)^{1/p} ≥ inf_k |α_k| (Σ_{k=0}^∞ |ξ_k|^p)^{1/p} = inf_k |α_k| ‖x‖

for every x = {ξ_k}_{k=0}^∞ ∈ ℓ+^p, so that D_a is bounded below if inf_k |α_k| > 0. Conversely, if D_a is bounded below, then there exists α > 0 such that α ≤ ‖D_a e_j‖ = |α_j| for every j ≥ 0, so that inf_j |α_j| ≥ α > 0. Outcome (cf. Proposition 4.20):

    There exists D_a^{-1} ∈ B[ℓ+^p] if and only if inf_k |α_k| > 0.

That is, the inverse of a diagonal operator D_a in B[ℓ+^p] exists and is itself a diagonal operator in B[ℓ+^p] if and only if the bounded sequence a = {α_k}_{k=0}^∞ is bounded away from zero (see Problem 4.5).
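Truncations make the dichotomy visible (a numerical sketch, not from the text; NumPy assumed). With α_k = 1/(k+1) every entry is nonzero, so each n×n truncation of D_a is invertible; but the entries are not bounded away from zero, and the inverses' norms 1/min_k |α_k| = n blow up with n, so no bounded inverse survives on all of ℓ+^p.

```python
import numpy as np

inv_norms = []
for n in (2, 4, 8):
    a = 1.0 / np.arange(1, n + 1)       # alpha_k = 1/(k+1), truncated at n terms
    D = np.diag(a)                      # injective: every alpha_k != 0
    assert np.isclose(np.linalg.norm(D, 2), 1.0)   # ‖D_a‖ = max_k |alpha_k| = 1
    Dinv = np.linalg.inv(D)             # = diag(1, 2, ..., n)
    inv_norms.append(np.linalg.norm(Dinv, 2))
# inv_norms ~ [2, 4, 8] = 1 / min_k |alpha_k|, growing without bound with n
```

Each truncation has norm 1, yet the inverse norms double with each doubling of n: exactly the failure of inf_k |α_k| > 0 seen at finite resolution.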
If T ∈ B[X, Y], if X is a Banach space, and if there exists T^{-1} ∈ B[Y, X], then Y is a Banach space. Proof: Theorem 4.14, Lemma 3.43, and Corollary 3.8 (note: this supplies another proof for part of Proposition 4.20). Question: Does the converse hold? That is, if X and Y are Banach spaces, and if T ∈ B[X, Y] is invertible in the set-theoretic sense (i.e., injective and surjective), does it follow that T^{-1} ∈ B[Y, X]? Yes, it does. This is the Inverse Mapping Theorem. To prove it we need a fundamental result in the theory of bounded linear transformations between Banach spaces, viz., the Open Mapping Theorem. Recall that a function F: X → Y between metric spaces X and Y is an open mapping if F(U) is open in Y whenever U is open in X (Section 3.4). The Open Mapping Theorem says that every continuous linear transformation of a Banach space onto a Banach space is an open mapping. That is, a surjective bounded linear transformation between Banach spaces maps open sets onto open sets.

Theorem 4.21. (The Open Mapping Theorem). If X and Y are Banach spaces and T ∈ B[X, Y] is surjective, then T is an open mapping.

Proof. Let T be a surjective bounded linear transformation of a Banach space X onto a Banach space Y. The nonempty open ball with center at the origin of X and radius ε will be denoted by X_ε, and the nonempty open ball with center at the origin of Y and radius δ will be denoted by Y_δ.

Claim 1. For every ε > 0 there exists δ > 0 such that Y_δ ⊆ T(X_ε)^- (i.e., T(X_ε)^- is a neighborhood of 0 in Y).

Proof. Since each x in X belongs to X_n for every integer n > ‖x‖, it follows that the sequence of open balls {X_n} covers X. That is, X = ∪_{n=1}^∞ X_n, and hence T(X) = ∪_{n=1}^∞ T(X_n) (see Problem 1.2(e)). Thus the countable family {T(X_n)} of subsets of Y covers T(X). If T(X) = Y and Y is complete, then the Baire Category Theorem (Theorem 3.58) ensures the existence of a positive integer m such that T(X_m)^- has nonempty interior. Since T is linear, T(X_ε) = (ε/m)T(X_m), and so T(X_ε)^- = (ε/m)(T(X_m))^- has nonempty interior for every ε > 0 (because multiplication by a nonzero scalar is a homeomorphism). Take an arbitrary ε > 0 and an arbitrary y_0 ∈ [T(X_{ε/2})^-]°, so that there exists a nonempty open ball with center at y_0, say B_δ(y_0), such that B_δ(y_0) ⊆ T(X_{ε/2})^-.
If y is an arbitrary point of Y_δ, then ‖y + y_0 − y_0‖ = ‖y‖ < δ, and therefore y + y_0 ∈ B_δ(y_0). Thus both y + y_0 and y_0 lie in T(X_{ε/2})^-, which means that

    inf_{u∈X_{ε/2}} ‖Tu − y − y_0‖ = 0    and    inf_{v∈X_{ε/2}} ‖Tv − y_0‖ = 0.

Hence (recall: u, v ∈ X_{ε/2} implies u − v ∈ X_ε),

    inf_{x∈X_ε} ‖Tx − y‖ ≤ inf_{u,v∈X_{ε/2}} ‖T(u − v) − y‖ = inf_{u,v∈X_{ε/2}} ‖Tu − y − y_0 + y_0 − Tv‖ ≤ inf_{u∈X_{ε/2}} ‖Tu − y − y_0‖ + inf_{v∈X_{ε/2}} ‖Tv − y_0‖ = 0,

and so y ∈ T(X_ε)^-. Conclusion: Y_δ ⊆ T(X_ε)^-.
and so y ∈ T (Xε )−. Conclusion: Yδ ⊆ T (Xε )−. Claim 2. For every ε > 0 there exists δ > 0 such that Yδ ⊆ T (Xε ) — i.e., we may even erase the closure symbol from Claim 1. Proof. Take an arbitrary ε > 0, and set εn = 2εn for every n ∈ N . According to Claim 1, for each n ∈ N there exists δn ∈ (0, εn ) such that Yδn ⊆ T (Xεn )− . If yn ∈ Yδn ⊆ T (Xεn )− for some n ∈ N , then inf x∈Xεn #yn − T x# = 0, and so there exists xn ∈ Xn such that #yn − T xn # < δn+1 . Set δ = δ1 and take an arbitrary y ∈ Yδ . We claim that there is an X -valued sequence {xn } such that xn ∈ Xεn
and
y−
n
T xi ∈ Yδn+1
i=1
for every n ∈ N. Indeed, y ∈ Y_{δ_1} implies that there exists x_1 ∈ X_{ε_1} such that ‖y − Tx_1‖ < δ_2, and hence the above properties hold for x_1 (i.e., they hold for i = n = 1). Now suppose they hold for each i = 1, ..., n up to some n ≥ 1, so that there exists x_i ∈ X_{ε_i} for i = 1, ..., n and y − Σ_{i=1}^n Tx_i ∈ Y_{δ_{n+1}}. This implies that there is an x_{n+1} ∈ X_{ε_{n+1}} such that ‖y − Σ_{i=1}^n Tx_i − Tx_{n+1}‖ < δ_{n+2}, and so there are x_k ∈ X_{ε_k} for k = 1, ..., n+1 such that y − Σ_{k=1}^{n+1} Tx_k ∈ Y_{δ_{n+2}}. Therefore, assuming that the above properties hold for some n ≥ 1, we conclude that they hold for n + 1, which completes the induction argument.

Now we shall use this sequence {x_n} to show that y = Tx for some x ∈ X_ε. In fact, since ‖x_n‖ < ε/2^n for each n ∈ N, we get Σ_{k=1}^∞ ‖x_k‖ < ε Σ_{k=1}^∞ 1/2^k = ε (because Σ_{k=0}^∞ 1/2^k = 2), and so {x_n} is absolutely summable. But since X is a Banach space, this implies that {x_n} is summable (Proposition 4.4), which means that {Σ_{i=1}^n x_i} converges in X. That is, there exists x ∈ X such that

    Σ_{i=1}^n x_i → x in X    as    n → ∞.
Moreover, ‖Σ_{i=1}^n x_i‖ ≤ Σ_{i=1}^n ‖x_i‖ < ε for all n, so that

    ‖x‖ = ‖lim_n Σ_{i=1}^n x_i‖ = lim_n ‖Σ_{i=1}^n x_i‖ ≤ sup_n ‖Σ_{i=1}^n x_i‖ < ε

(recall: every norm is continuous), and hence x ∈ X_ε. Since T is continuous and linear, and since
    Σ_{i=1}^n Tx_i → y in Y    as    n → ∞

(because ‖y − Σ_{i=1}^n Tx_i‖ < δ_{n+1} < ε_{n+1} = ε/2^{n+1} for every n), it follows that

    Tx = T(lim_n Σ_{k=1}^n x_k) = lim_n T(Σ_{k=1}^n x_k) = lim_n Σ_{k=1}^n Tx_k = y,

and so y ∈ T(X_ε). That is, an arbitrary point of Y_δ lies in T(X_ε). Therefore Y_δ ⊆ T(X_ε).

Let U be any open subset of X and take an arbitrary z ∈ T(U). Thus there exists u ∈ U such that z = Tu. But u is an interior point of U (since U is open in X), which means that there is a nonempty open ball included in U with center at u, say B_ε(u) ⊆ U. Translate this ball to 0 in X and note that the translated ball coincides with the open ball of radius ε about the origin of X:

    X_ε = B_ε(u) − u = {x ∈ X : x = v − u for some v ∈ B_ε(u)}.

According to Claim 2 there is a nonempty open ball Y_δ about the origin of Y such that Y_δ ⊆ T(X_ε). Translate this ball to the point z and get the open ball with center at z and radius δ:

    B_δ(z) = Y_δ + z = {w ∈ Y : w = y + z for some y ∈ Y_δ}.

Then, since T is linear,

    B_δ(z) ⊆ T(X_ε) + Tu = T(X_ε + u) = T(B_ε(u)) ⊆ T(U),

and hence z is an interior point of T(U). Conclusion: Every point of T(U) is interior, which means that T(U) is open in Y.

A straightforward application of the Open Mapping Theorem says that an injective and surjective bounded linear transformation between Banach spaces has a bounded inverse.
4. Banach Spaces
Theorem 4.22. (The Inverse Mapping Theorem or The Banach Continuous Inverse Theorem). If X and Y are Banach spaces and T ∈ B[X , Y ] is injective and surjective, then T −1 ∈ B[Y, X ]. Proof. Theorems 2.10, 3.20, and 4.21.
If X and Y are normed spaces, then an element T of B[X, Y] is called invertible if it is an invertible mapping in the set-theoretic sense (i.e., injective and surjective) and its inverse T⁻¹ lies in B[Y, X]. According to Theorem 2.10, this can be briefly stated as: T ∈ B[X, Y] is invertible if it has a bounded inverse. We shall denote the set of all invertible elements of B[X, Y] by G[X, Y]. The Inverse Mapping Theorem says that, if X and Y are Banach spaces, then both meanings of the term “invertible” coincide, and T⁻¹ ∈ G[Y, X] whenever T ∈ G[X, Y] (for (T⁻¹)⁻¹ = T — Problem 1.8). Note that G[X, Y] is not a linear manifold of B[X, Y]: addition is not a binary operation on G[X, Y] (e.g., diag({−k/(k+1)}_{k=1}^∞) + I = diag({1/(k+1)}_{k=1}^∞) — see Example 4.J). On the other hand, the composition (product) of invertible bounded linear transformations is again an invertible bounded linear transformation.
Corollary 4.23. If T ∈ G[X, Y] and S ∈ G[Y, Z], then S T ∈ G[X, Z] and (S T)⁻¹ = T⁻¹ S⁻¹ whenever X, Y, and Z are Banach spaces.
Proof. Problem 1.10, Proposition 4.16, and Theorem 4.22.
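The diagonal example above can be probed numerically. The following sketch (an illustration only, with both operators truncated to their first n diagonal entries; the helper `inverse_norm` is ours, not the book's) shows that diag({−k/(k+1)}) and I are invertible with inverses of norm at most 2 and 1, while the inverse of their sum diag({1/(k+1)}) has norm n + 1, which blows up with the truncation size:

```python
# For a diagonal operator D = diag(d_1, ..., d_n) on F^n, ||D^{-1}|| = 1/min_k |d_k|.
def inverse_norm(diagonal):
    return 1.0 / min(abs(d) for d in diagonal)

n = 1000
a = [-k / (k + 1) for k in range(1, n + 1)]   # entries of A = diag({-k/(k+1)})
s = [d + 1.0 for d in a]                      # entries of A + I = diag({1/(k+1)})

assert inverse_norm(a) == 2.0                 # A is invertible, uniformly in n
assert abs(inverse_norm(s) - (n + 1)) < 1e-6  # but ||(A+I)^{-1}|| = n + 1 grows without bound
```

The sum of two invertible operators thus fails to stay invertible in the limit, which is exactly why G[X, Y] is not a linear manifold.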
Here is a useful corollary of Theorem 4.22, which is sometimes referred to as the Bounded Inverse Theorem. Corollary 4.24. Let X and Y be Banach spaces and take T ∈ B[X , Y ]. The following assertions are pairwise equivalent. (a) There exists T −1 ∈ B[R(T ), X ] (T has a bounded inverse on its range). (b) T is bounded below. (c) N (T ) = {0} and R(T )− = R(T ) (T is injective and has a closed range). Proof. According to Proposition 4.20, assertions (a) and (b) are equivalent and each of them implies R(T )− = R(T ) whenever X is a Banach space. Moreover, (b) trivially implies N (T ) = {0}. Thus either of the equivalent assertions (a) or (b) implies (c). On the other hand, if N (T ) = {0}, then T is injective. If, in addition, the linear manifold R(T ) is closed in the Banach space Y, then it is itself a Banach space (Proposition 4.7), so that T : X → R(T ) is an injective and surjective bounded linear transformation of the Banach space X onto the Banach space R(T ). Therefore, its inverse T −1 lies in B[R(T ), X ] by the Inverse Mapping Theorem. That is, (c) implies (a). The role played by the Inverse Mapping Theorem (and so by the Open Mapping Theorem) in the proof of Corollary 4.24 is restricted to show that (c) implies (a); the rest of the proof does not need the Open Mapping Theorem.
4.5 The Open Mapping Theorem and Continuous Inverse
Let X ≠ {0} be a normed space. Consider the algebra L[X] (of all linear transformations on X), and also the normed algebra B[X] (of all operators on X), which is a subalgebra (in the purely algebraic sense) of L[X]. Let G[X] denote the set of all invertible operators in B[X] (i.e., set G[X] = G[X, X]). Suppose X is a Banach space, and let T ∈ B[X] be an invertible element of the algebra L[X] (i.e., there exists T⁻¹ ∈ L[X] such that T⁻¹T = TT⁻¹ = I, where I stands for the identity of L[X]). Thus, as we had already noticed, the Inverse Mapping Theorem ensures that the concept of an invertible element of the Banach algebra B[X] is unambiguously defined (since T⁻¹ ∈ B[X]). In other words, the set-theoretic inverse of an operator on a Banach space is again an operator on the same Banach space. Moreover (since the inverse of an invertible operator is itself invertible), the set G[X] of all invertible operators in B[X] forms a group under multiplication (every operator in G[X] ⊂ B[X] has an inverse in G[X]) whenever X is a Banach space.
We close this section with another important application of the Open Mapping Theorem. Let X and Y be normed spaces (over the same scalar field) and let X ⊕ Y be their direct sum, which is a linear space whose underlying set is the Cartesian product X×Y of the underlying sets of the linear spaces X and Y. Recall that the graph of a transformation T: X → Y is the set
  G_T = {(x, y) ∈ X×Y : y = T x} = {(x, T x) ∈ X ⊕ Y : x ∈ X}
(cf. Section 1.2). If T is linear, then G_T is a linear manifold of the linear space X ⊕ Y (since α(u, T u) + β(v, T v) = (αu + βv, T(αu + βv)) for every u, v in X and every pair of scalars α, β). Equip the linear space X ⊕ Y with any of the norms of Example 4.E and consider the normed space X ⊕ Y.
Theorem 4.25. (The Closed Graph Theorem). If X and Y are Banach spaces and T ∈ L[X, Y], then T is continuous (i.e., T ∈ B[X, Y]) if and only if G_T is closed in X ⊕ Y.
Proof.
Let P_X : X ⊕ Y → X and P_Y : X ⊕ Y → Y be defined by P_X(x, y) = x and P_Y(x, y) = y
for every (x, y) ∈ X ⊕ Y. Consider the restriction PX |GT : GT → X of PX to the linear manifold GT of the normed space X ⊕ Y. Observe that PX and PY are the natural projections of X ⊕ Y onto X and Y, respectively, which are both linear and bounded (see Example 4.I). Thus PY lies in B[X ⊕ Y, Y ] and PX |GT lies in B[GT , X ] (Problems 2.14 and 3.30). Moreover, PX |GT is clearly surjective. Furthermore, PX |GT is also injective (if PX (x, T x) = 0, then x = 0 and hence (x, T x) = (0, 0) ∈ GT ; that is, N (PX |GT ) = {0}). (a) Recall that X ⊕ Y is a Banach space whenever X and Y are Banach spaces (Example 4.E). If GT is closed in X ⊕ Y, then GT is itself a Banach space (Proposition 4.7), and the Inverse Mapping Theorem (Theorem 4.22) ensures that the inverse (PX |GT )−1 of PX |GT lies in B[X , GT ]. Since T = PY (PX |GT )−1 (because T PX|GT = PY |GT ), it follows by Proposition 4.16 that T is bounded.
(b) Conversely, take an arbitrary sequence {(xn , T xn )} in GT that converges in X ⊕ Y to, say, (x, y) ∈ X ⊕ Y. Since PX and PY are continuous, it follows by Corollary 3.8 that lim xn = lim PX (xn , T xn ) = PX lim(xn , T xn ) = PX (x, y) = x, lim T xn = lim PY (xn , T xn ) = PY lim(xn , T xn ) = PY (x, y) = y. If T is continuous, then (by Corollary 3.8 again) y = lim T xn = T lim xn = T x, and hence (x, y) = (x, T x) ∈ GT . Therefore GT is closed in X ⊕ Y by the Closed Set Theorem (Theorem 3.30).
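The sequence criterion in part (b) can be watched failing for a classical unbounded map (our own illustration, not from the text): differentiation D on polynomials, viewed inside C[0,1] with the sup norm approximated on a grid. For x_n = tⁿ/n we have x_n → 0 in the sup norm while ‖D x_n‖ = 1 for every n, so D x_n does not converge to D0 = 0 and D cannot be continuous; note that the Closed Graph Theorem does not apply here, since the polynomials are not complete in the sup norm.

```python
# Polynomials are stored by their coefficient lists; sup norm on [0,1] via a grid.
def sup_norm(coeffs, grid=1001):
    return max(abs(sum(c * (k / (grid - 1)) ** j for j, c in enumerate(coeffs)))
               for k in range(grid))

def derivative(coeffs):
    # D(sum c_j t^j) = sum j*c_j t^(j-1)
    return [j * c for j, c in enumerate(coeffs)][1:] or [0.0]

for n in range(1, 8):
    xn = [0.0] * n + [1.0 / n]                           # the polynomial t^n / n
    assert abs(sup_norm(xn) - 1.0 / n) < 1e-9            # ||x_n|| = 1/n -> 0
    assert abs(sup_norm(derivative(xn)) - 1.0) < 1e-9    # ||D x_n|| = 1 for all n
```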
4.6 Equivalence and Finite-Dimensional Spaces

Recall that two sets are said to be equivalent if there exists a one-to-one correspondence (i.e., an injective and surjective mapping or, equivalently, an invertible mapping) between them (Chapter 1). Two linear spaces are isomorphic if there exists an isomorphism (i.e., an invertible linear transformation) of one of them onto the other (recall: the inverse of an invertible linear transformation is again a linear transformation). An isomorphism (or a linear-space isomorphism) is then a one-to-one correspondence that preserves the linear operations between the linear spaces, and hence preserves the algebraic structure (Chapter 2). Two topological spaces are homeomorphic if there exists a homeomorphism (i.e., an invertible continuous mapping whose inverse also is continuous) of one of them onto the other. A homeomorphism provides a one-to-one correspondence between the topologies on the respective spaces, thus preserving the topological structure. In particular, a homeomorphism preserves convergence. A uniform homeomorphism between metric spaces is a homeomorphism where continuity (in both senses) is strengthened to uniform continuity, so that a uniform homeomorphism also preserves Cauchy sequences. Two metric spaces are uniformly homeomorphic if there exists a uniform homeomorphism of one of them onto the other (Chapter 3). Now, as one would expect, we shall be interested in preserving both algebraic and topological structures between two normed spaces. A mapping of a normed space X onto a normed space Y that is (simultaneously) a homeomorphism and an isomorphism is called a topological isomorphism (or an equivalence), and X and Y are said to be topologically isomorphic (or equivalent) if there exists a topological isomorphism between them.
Clearly, continuity refers to the norm topologies: a topological isomorphism is a mapping of a normed space X onto a normed space Y, which is a homeomorphism when X and Y are viewed as metric spaces (equipped with the metrics generated by their respective norms), and also is an isomorphism between the linear spaces
X and Y. Since an isomorphism is just an injective and surjective linear transformation between linear spaces, it follows that a topological isomorphism is simply a linear homeomorphism between normed spaces. Thus the topological isomorphisms between X and Y are precisely the elements of G[X, Y]: if X and Y are normed spaces, then W: X → Y is a topological isomorphism if and only if W is an invertible element of B[X, Y]. Therefore, X and Y are topologically isomorphic if and only if there exists a linear homeomorphism between them or, equivalently, if and only if G[X, Y] ≠ ∅. Conversely, the elements of G[X, Y] are also characterized as those linear-space isomorphisms that are bounded above and below (Theorem 4.14 and Proposition 4.20). Thus X and Y are topologically isomorphic if and only if there exists an isomorphism W in L[X, Y] and a pair of positive constants α and β such that
  α‖x‖ ≤ ‖W x‖ ≤ β‖x‖ for every x ∈ X.
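Numerically, for a linear isomorphism of R² (with the Euclidean norm), such constants α and β can be estimated by sampling ‖Wx‖ over the unit circle; the matrix below is our own example, not one from the text:

```python
import math

W = [[2.0, 1.0], [0.0, 1.0]]          # an invertible 2x2 matrix: a linear homeomorphism of R^2

def apply_W(x):
    return (W[0][0] * x[0] + W[0][1] * x[1], W[1][0] * x[0] + W[1][1] * x[1])

def norm(x):
    return math.hypot(x[0], x[1])

# alpha and beta may be taken as min and max of ||Wx|| over the unit sphere.
vals = []
for k in range(10000):
    t = 2 * math.pi * k / 10000
    vals.append(norm(apply_W((math.cos(t), math.sin(t)))))
alpha, beta = min(vals), max(vals)

assert 0 < alpha <= beta              # W is bounded below and above
for x in [(3.0, -4.0), (0.5, 2.5)]:
    r = norm(apply_W(x)) / norm(x)
    assert alpha - 1e-4 <= r <= beta + 1e-4
```

For this W, α and β are (up to sampling error) the smallest and largest singular values of the matrix.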
The Inverse Mapping Theorem says that an injective and surjective bounded linear transformation of a Banach space onto a Banach space is a homeomorphism, and hence a topological isomorphism: if X and Y are Banach spaces, then they are topologically isomorphic if and only if there exists an isomorphism in B[X, Y].
Two norms on the same linear space are said to be equivalent if the metrics that they generate are equivalent. In other words (see Section 3.4), ‖·‖₁ and ‖·‖₂ are equivalent norms on a linear space X if and only if they induce the same norm topology on X.
Proposition 4.26. Let ‖·‖₁ and ‖·‖₂ be two norms on the same linear space X. The following assertions are pairwise equivalent.
(a) ‖·‖₁ and ‖·‖₂ are equivalent norms.
(b) The identity map between the normed spaces (X, ‖·‖₁) and (X, ‖·‖₂) is a topological isomorphism.
(c) There exist real constants α > 0 and β > 0 such that
  α‖x‖₁ ≤ ‖x‖₂ ≤ β‖x‖₁ for every x ∈ X.
Proof. Recall that the identity obviously is an isomorphism of a linear space onto itself, and hence it is a topological isomorphism between the normed spaces (X, ‖·‖₁) and (X, ‖·‖₂) if and only if it is a homeomorphism. Thus assertions (a) and (b) are equivalent by Corollary 3.19. Assertion (c) simply says that the identity I: (X, ‖·‖₁) → (X, ‖·‖₂) is bounded above and below or, equivalently, that it is continuous and its inverse I⁻¹: (X, ‖·‖₂) → (X, ‖·‖₁) also is continuous (Theorem 4.14 and Proposition 4.20), which means that it is a homeomorphism. That is, assertions (b) and (c) are equivalent.
It is worth noticing that metrics generated by norms are equivalent if and only if they are uniformly equivalent. Indeed, continuity coincides with uniform continuity for linear transformations between normed spaces (Theorem 4.14), and hence the identity I: (X, ‖·‖₁) → (X, ‖·‖₂) is a homeomorphism if and only if it is a uniform homeomorphism. Observe that, according to Problem 3.33, all norms of Example 4.E are equivalent. In particular, all norms of Example 4.A are equivalent.
Theorem 4.27. If X is a finite-dimensional linear space, then any two norms on X are equivalent.
Proof. Let B = {e_i}_{i=1}^n be a Hamel basis for an n-dimensional linear space X over F so that every vector in X has a unique expansion on B,
  x = Σ_{i=1}^n ξ_i e_i,
where {ξ_i}_{i=1}^n is a family of scalars: the coordinates of x with respect to B. It is easy to show that the function ‖·‖₀ : X → R, defined by
  ‖x‖₀ = max_{1≤i≤n} |ξ_i|
for every x ∈ X, is a norm on X. Now take an arbitrary norm ‖·‖ on X. We shall verify that ‖·‖₀ and ‖·‖ are equivalent. First observe that
  ‖x‖ = ‖Σ_{i=1}^n ξ_i e_i‖ ≤ Σ_{i=1}^n |ξ_i| ‖e_i‖ ≤ (Σ_{i=1}^n ‖e_i‖) ‖x‖₀
for every x ∈ X. To show the other inequality, consider the linear space F^n equipped with the norm ‖·‖_∞ of Example 4.A, that is,
  ‖a‖_∞ = max_{1≤i≤n} |α_i|
for every a = (α_1, ..., α_n) ∈ F^n, and let L: (F^n, ‖·‖_∞) → (X, ‖·‖) be a transformation defined by
  La = Σ_{i=1}^n α_i e_i
for every a = (α_1, ..., α_n) ∈ F^n. It is readily verified that L is linear. Moreover,
  ‖La‖ = ‖Σ_{i=1}^n α_i e_i‖ ≤ (Σ_{i=1}^n ‖e_i‖) ‖a‖_∞,
so that L is bounded, that is, L is continuous (Theorem 4.14), and so is the real-valued function ϕ: (F^n, ‖·‖_∞) → R such that ϕ(a) = ‖La‖
for every a = (α_1, ..., α_n) ∈ F^n (because ‖·‖ : X → R is continuous, and composition of continuous functions is a continuous function). The unit sphere S = ∂B_1[0] = {a ∈ F^n : ‖a‖_∞ = 1} is compact in (F^n, ‖·‖_∞) by Theorem 3.83, and so the continuous function ϕ assumes a minimum value on S (Theorem 3.86). Since {e_i}_{i=1}^n is linearly independent, it follows that the linear transformation L is injective (L(a) = 0 if and only if a = 0), so that this minimum value of ϕ on S is positive (since ϕ(a) = ‖L(a)‖ > 0 for every a ∈ S). Summing up: There exists a* ∈ S such that 0 < ϕ(a*) ≤ ϕ(a) for all a ∈ S. Let x̃ = (ξ_1, ..., ξ_n) ∈ F^n be the n-tuple whose entries are the coordinates {ξ_i}_{i=1}^n of an arbitrary x = Σ_{i=1}^n ξ_i e_i in X with respect to the basis B. Note that ‖x‖₀ = ‖x̃‖_∞ ≠ 0 whenever x ≠ 0, and hence
  ϕ(a*)‖x‖₀ ≤ ϕ(x̃/‖x̃‖_∞)‖x̃‖_∞ = ‖L(x̃/‖x̃‖_∞)‖ ‖x̃‖_∞ = ‖Lx̃‖ = ‖Σ_{i=1}^n ξ_i e_i‖ = ‖x‖
for every nonzero x in X. Therefore, by setting α = ϕ(a*) > 0 and β = Σ_{i=1}^n ‖e_i‖ > 0 (which do not depend on x), it follows that α‖x‖₀ ≤ ‖x‖ ≤ β‖x‖₀ for every x ∈ X. Conclusion: Every norm on X is equivalent to the norm ‖·‖₀, and hence every two norms on X are equivalent.
Corollary 4.28. Every finite-dimensional normed space is a Banach space.
Proof. Let X be an n-dimensional linear space over F and let B = {e_i}_{i=1}^n be a Hamel basis for X. Take an arbitrary X-valued sequence {x_k}_{k≥1} so that
  x_k = Σ_{i=1}^n ξ_k(i) e_i,
where {ξ_k(i)}_{i=1}^n is a family of scalars (the coordinates of x_k with respect to the Hamel basis B) for each integer k ≥ 1. Equip X with a norm ‖·‖ and suppose {x_k}_{k≥1} is a Cauchy sequence in (X, ‖·‖). Consequently, for every ε > 0 there exists an integer k_ε ≥ 1 such that ‖x_j − x_k‖ < ε whenever j, k ≥ k_ε. Now consider the equivalent norm ‖·‖₀ on X (which was defined in the proof of Theorem 4.27) so that, for every j, k ≥ k_ε,
  max_{1≤i≤n} |ξ_j(i) − ξ_k(i)| = ‖x_j − x_k‖₀ ≤ (1/α)‖x_j − x_k‖ < ε/α
for some positive constant α. Hence {ξ_k(i)}_{k≥1} is a Cauchy sequence in (F, | |) for each i = 1, ..., n, where | | stands for the usual norm on F. As (F, | |) is a Banach space (Example 4.A), each sequence {ξ_k(i)}_{k≥1} converges in (F, | |) to, say, ξ(i) ∈ F. That is, for every ε > 0 there exists an integer k_{i,ε} ≥ 1 for each i = 1, ..., n such that |ξ_k(i) − ξ(i)| < ε whenever k ≥ k_{i,ε}. By setting x = Σ_{i=1}^n ξ(i) e_i in X we get
  ‖x_k − x‖ ≤ Σ_{i=1}^n |ξ_k(i) − ξ(i)| ‖e_i‖ ≤ (Σ_{i=1}^n ‖e_i‖) ε
whenever k ≥ k_ε = max{k_{i,ε}}_{i=1}^n, and hence x_k → x in (X, ‖·‖) as k → ∞. Conclusion: If X is a finite-dimensional linear space and ‖·‖ is any norm on X, then every Cauchy sequence in (X, ‖·‖) converges in (X, ‖·‖), and therefore (X, ‖·‖) is a Banach space.
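The bounds produced by the proof of Theorem 4.27 can be checked numerically. In the sketch below (our own illustration: X = Rⁿ with the standard basis, so ‖·‖₀ is the max-norm, and the arbitrary norm is taken to be ℓ¹), the proof yields β = Σᵢ‖eᵢ‖ = n and α = 1:

```python
import random

n = 4

def norm0(x):               # ||x||_0 = max_i |xi|, the norm built in the proof
    return max(abs(t) for t in x)

def norm1(x):               # the "arbitrary" norm of the proof, here l^1
    return sum(abs(t) for t in x)

alpha, beta = 1.0, float(n) # the proof's constants for this pair of norms

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    assert alpha * norm0(x) <= norm1(x) <= beta * norm0(x)
```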
Corollary 4.29. Every finite-dimensional linear manifold of any normed space X is a subspace of X .
Proof. Corollary 4.28 and Theorem 3.40(a).
Corollary 4.30. Let X and Y ≠ {0} be normed spaces. Every linear transformation of X into Y is continuous if and only if X is finite dimensional.
Proof. If Y = {0}, then {0}^X = L[X, {0}] = B[X, {0}] = {O} for every normed space X regardless of its dimension. Therefore, the right statement must exclude the case of Y = {0}, and it can be rewritten as follows. If Y ≠ {0}, then
  dim X < ∞ if and only if L[X, Y] = B[X, Y].
(a) If dim X = 0, then X = {0} and so L[X, Y] = B[X, Y] = {O}. Thus suppose dim X = n for some integer n ≥ 1 and let B = {e_i}_{i=1}^n be a Hamel basis for X. Take an arbitrary x ∈ X and consider its (unique) expansion on B,
  x = Σ_{i=1}^n ξ_i e_i,
where {ξ_i}_{i=1}^n is a family of scalars consisting of the coordinates of x with respect to the basis B. Let ‖·‖_X and ‖·‖_Y denote the norms on X and Y, respectively. If T ∈ L[X, Y], then
  ‖T x‖_Y = ‖Σ_{i=1}^n ξ_i T e_i‖_Y ≤ Σ_{i=1}^n |ξ_i| ‖T e_i‖_Y ≤ (Σ_{i=1}^n ‖T e_i‖_Y) max_{1≤i≤n} |ξ_i|.
The norm ‖·‖₀ on X that was defined in the proof of Theorem 4.27 is equivalent to ‖·‖_X. Hence there exists α > 0 such that
  ‖T x‖_Y ≤ (Σ_{i=1}^n ‖T e_i‖_Y) ‖x‖₀ ≤ (1/α)(Σ_{i=1}^n ‖T e_i‖_Y) ‖x‖_X,
and so T ∈ B[X, Y]. Thus L[X, Y] ⊆ B[X, Y] (i.e., L[X, Y] = B[X, Y]).
(b) Conversely, suppose X is infinite dimensional and let {e_γ}_{γ∈Γ} be an indexed Hamel basis for X. Since {e_γ}_{γ∈Γ} is an infinite set, it has a countably infinite subset, say {e_k}_{k∈N}. Set M = span{e_k}_{k∈N}, a linear manifold of X for which {e_k}_{k∈N} is a Hamel basis. Thus every vector x in M ⊆ X has a unique representation as a (finite) linear combination of vectors in {e_k}_{k∈N}. This means that there exists a unique (similarly indexed) family of scalars {ξ_k}_{k∈N} such that ξ_k = 0 for all but a finite set of indices k, and
  x = Σ_{k∈N} ξ_k e_k
(see Section 2.4). Take an arbitrary nonzero vector y in Y and consider the mapping L_M : M → Y defined by
  L_M x = Σ_{k∈N} k ξ_k ‖e_k‖ y
for every x ∈ M. It is readily verified that L_M is linear (recall: the above sums are finite and unique). Extend L_M to X, according to Theorem 2.9, and get a linear transformation L: X → Y such that L|_M = L_M. Since L e_k = L_M e_k = k‖e_k‖y and e_k ≠ 0 for each k ∈ N, it follows that
  k‖y‖ = ‖L e_k‖/‖e_k‖ ≤ sup_{x≠0} ‖Lx‖/‖x‖
for every k ∈ N. Outcome: There is no constant β ≥ 0 for which ‖Lx‖ ≤ β‖x‖ for all x ∈ X. Thus the linear transformation L: X → Y is not bounded; i.e., L ∈ L[X, Y]\B[X, Y]. Conclusion: If L[X, Y] = B[X, Y], then dim X < ∞.
Corollary 4.30 holds, in particular, if X is a Banach space. In this case, the above construction (proof of Corollary 4.30, part (b)) exhibits (via Theorem 2.9 — and therefore via Zorn’s Lemma) our first example of an unbounded linear transformation L whose domain X is an infinite-dimensional Banach space and whose codomain Y is any nonzero normed space.
Corollary 4.31. Two finite-dimensional normed spaces are topologically isomorphic if and only if they have the same dimension.
Proof. Let X and Y be finite-dimensional normed spaces. Corollaries 4.28 and 4.30 ensure that X and Y are Banach spaces and B[X, Y] = L[X, Y]. Thus X and Y are topologically isomorphic if and only if they are isomorphic linear spaces (Theorem 4.22), which means that dim X = dim Y (Theorem 2.12).
Corollary 3.82 says that every compact subset of any metric space is closed and bounded. The Heine–Borel Theorem (Theorem 3.83) asserts that every closed and bounded subset of F^n is compact. This equivalence (i.e., in F^n a set is compact if and only if it is closed and bounded) actually is a property that characterizes the finite-dimensional normed spaces.
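The unbounded transformation built in part (b) can be sketched concretely (our encoding, not the book's: finitely supported coordinate families as Python dicts, with ‖e_k‖ = 1 and y a fixed unit vector, so Lx reduces to the scalar Σ_k k ξ_k):

```python
def L(coeffs):
    # coeffs: {k: xi_k} with finitely many nonzero entries; with ||e_k|| = 1 and
    # y a unit vector, L x = (sum_k k * xi_k) * y, so we return that scalar.
    return sum(k * xi for k, xi in coeffs.items())

# ||L e_k|| / ||e_k|| = k grows without bound: no beta with ||Lx|| <= beta ||x||.
ratios = [abs(L({k: 1.0})) for k in range(1, 6)]
assert ratios == [1.0, 2.0, 3.0, 4.0, 5.0]
```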
Corollary 4.32. If X is a finite-dimensional normed space, then every closed and bounded subset of X is compact.
Proof. Let X be a finite-dimensional normed space over F and let A be a subset of X that is closed and bounded (in the norm topology of X). According to Corollary 4.31, X is topologically isomorphic to F^n, where n = dim X and F^n is equipped with any norm (which are all equivalent by Theorem 4.27). Take a topological isomorphism W ∈ G[X, F^n] and note that W(A) is closed (because W is a closed map — Theorem 3.24) and bounded (since sup_{a∈A} ‖Wa‖ ≤ ‖W‖ sup_{a∈A} ‖a‖ — see Problem 4.5) in F^n. Thus W(A) is compact in F^n (Theorem 3.83). Therefore, since W⁻¹ ∈ G[F^n, X] is a homeomorphism, and since compactness is a topological invariant (Theorem 3.64), it follows that A = W⁻¹(W(A)) is compact in X.
Since compact sets are always closed and bounded, Corollary 4.32 extends the Heine–Borel Theorem to any finite-dimensional normed space: a subset of a finite-dimensional normed space is compact if and only if it is closed and bounded. To establish the converse of Corollary 4.32, we need the next result.
Lemma 4.33. (Riesz). Let M be a proper subspace of a normed space X. For each real α ∈ (0, 1) there exists a vector x_α in X such that ‖x_α‖ = 1 and α ≤ d(x_α, M) ≤ 1.
Proof. The second inequality holds trivially because M is a linear manifold of X, and hence it contains the origin of X (i.e., d(x, M) = inf_{u∈M} ‖x − u‖ ≤ ‖x − 0‖ = ‖x‖ for every x ∈ X). Since M is a proper closed subset of X, it follows that X\M is nonempty and open in X. Take any vector z ∈ X\M such that 0 < d(z, M) (e.g., take the center of an arbitrary nonempty open ball included in the open set X\M). Thus for each ε > 0 there exists a vector v_ε ∈ M such that
  0 < d(z, M) = inf_{u∈M} ‖z − u‖ ≤ ‖z − v_ε‖ ≤ (1 + ε) d(z, M).
Set x_ε = ‖z − v_ε‖⁻¹(z − v_ε) in X so that ‖x_ε‖ = 1 and
  ‖z − v_ε‖ d(x_ε, M) = inf_{u∈M} ‖ ‖z − v_ε‖x_ε − ‖z − v_ε‖u ‖ = inf_{v∈M} ‖z − v_ε − v‖ = d(z, M).
Hence 1/(1 + ε) ≤ d(x_ε, M), which concludes the proof by setting α = 1/(1 + ε).
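A numerical rehearsal of the construction (our own example, not from the text: X = R² with the ℓ¹ norm, M = span{(1,1)}, z = (1,0), and distances to M computed by a fine grid search):

```python
def l1(x):
    return abs(x[0]) + abs(x[1])

def dist_to_M(x, grid=20001, span=20.0):
    # distance from x to M = span{(1,1)}, approximated over the multiples t*(1,1)
    return min(l1((x[0] - t, x[1] - t))
               for t in (span * (2 * k / (grid - 1) - 1) for k in range(grid)))

z = (1.0, 0.0)
eps = 0.25
d_z = dist_to_M(z)
assert abs(d_z - 1.0) < 1e-9                      # d(z, M) = 1 in the l^1 norm
v_eps = (0.0, 0.0)                                # ||z - v_eps|| = 1 <= (1+eps) d(z, M)
scale = l1((z[0] - v_eps[0], z[1] - v_eps[1]))
x_eps = ((z[0] - v_eps[0]) / scale, (z[1] - v_eps[1]) / scale)
assert abs(l1(x_eps) - 1.0) < 1e-12               # ||x_eps|| = 1
assert dist_to_M(x_eps) >= 1.0 / (1.0 + eps) - 1e-3   # d(x_eps, M) >= 1/(1+eps)
```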
This result is sometimes referred to as the Almost Orthogonality Lemma. (Why? — Draw a picture.) It is worth remarking that the lemma does not ensure the existence of a vector x₁ in X with ‖x₁‖ = 1 and d(x₁, M) = 1.
Such a vector may not exist in a Banach space, but it certainly exists in a Hilbert space (next chapter).
Corollary 4.34. Let X be a normed space. If the closed unit ball B₁[0] = {x ∈ X : ‖x‖ ≤ 1} is compact, then X is finite dimensional.
Proof. If B₁[0] is compact, then it is totally bounded (Corollary 3.81), so that there exists a finite ½-net for B₁[0] (Definition 3.68), say {u_i}_{i=1}^n ⊂ B₁[0]. Set M = span{u_i}_{i=1}^n, which is a finite-dimensional subspace of X (reason: span{u_i}_{i=1}^n is a finite-dimensional linear manifold of X because there exists a Hamel basis for it included in {u_i}_{i=1}^n, by Theorem 2.6, and hence it is a subspace of X according to Corollary 4.29). If X is infinite dimensional, then M ≠ X and, in this case, Lemma 4.33 ensures the existence of a vector x in B₁[0] such that ½ < d(x, M). Thus ½ < ‖x − u‖ for every u ∈ M and, in particular, ½ < ‖x − u_i‖ for every i = 1, ..., n, which contradicts the fact that {u_i}_{i=1}^n is a ½-net for B₁[0]. Conclusion: If B₁[0] is compact in X, then X is finite dimensional.
Corollary 4.34 says that if X is infinite dimensional, then there is a closed and bounded subset of X that is not compact. Equivalently, if every closed and bounded subset of a normed space X is compact, then X is finite dimensional, which is the converse of Corollary 4.32. Thus Corollaries 4.32 and 4.34 lead to the following characterization. A normed space is finite dimensional if and only if compact sets are precisely the closed and bounded ones.
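The failure of compactness in infinite dimensions is easy to see concretely. In the sketch below (our own illustration: X = ℓ¹, with vectors stored as dicts of their finitely many nonzero entries), the unit vectors e₁, e₂, ... all lie in B₁[0] yet are pairwise at distance 2, so no finite ½-net can cover them:

```python
def l1_dist(x, y):
    keys = set(x) | set(y)
    return sum(abs(x.get(k, 0.0) - y.get(k, 0.0)) for k in keys)

basis = [{k: 1.0} for k in range(50)]                     # the unit vectors e_k
assert all(l1_dist(e, {}) == 1.0 for e in basis)          # all lie in B_1[0]
assert all(l1_dist(basis[i], basis[j]) == 2.0
           for i in range(50) for j in range(i + 1, 50))  # pairwise distance 2
# A ball of radius 1/2 contains at most one point of a set whose points are
# pairwise 2 apart, so finitely many such balls can never cover {e_k}.
```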
4.7 Continuous Linear Extension and Completion

We shall now transfer the extension results of Section 3.8 on uniformly continuous mappings between metric spaces to bounded linear transformations between normed spaces. The normed-space versions of Theorem 3.45 and Corollary 3.46 read as follows.
Theorem 4.35. Let M be a dense linear manifold of a normed space X and let Y be a Banach space. Every T in B[M, Y] has a unique extension T̃ in B[X, Y]. Moreover, ‖T̃‖ = ‖T‖.
Proof. Let M and Y be normed spaces. Recall that every T in B[M, Y] is uniformly continuous (Theorem 4.14). If M is a dense linear manifold of a normed space X, and Y is a Banach space (i.e., a complete normed space), then Theorem 3.45 ensures that there exists a unique continuous extension T̃: X → Y of T over X. It remains to verify that T̃ is linear and ‖T̃‖ = ‖T‖. Let u and v be arbitrary vectors in X. Since M⁻ = X, it follows by Proposition 3.32 that there exist M-valued sequences {u_n} and {v_n} converging in X to u and v, respectively. Then, by Corollary 3.8,
  T̃(u + v) = T̃(lim u_n + lim v_n) = T̃(lim(u_n + v_n)) = lim T̃(u_n + v_n)
because addition is a continuous operation (Problem 4.1) and T̃ is continuous. But T = T̃|_M, M is a linear space, and T is a linear transformation. Hence T̃(u_n + v_n) = T(u_n + v_n) = T u_n + T v_n = T̃u_n + T̃v_n for every n. The same continuity argument (Problem 4.1 and Corollary 3.8) makes the backward path: lim T̃(u_n + v_n) = lim(T̃u_n + T̃v_n) = lim T̃u_n + lim T̃v_n = T̃u + T̃v. This shows that T̃ is additive. Similarly, for every scalar α, T̃(αu) = T̃(α lim u_n) = T̃(lim αu_n) = lim T̃(αu_n) = lim T(αu_n) = lim αT u_n = lim αT̃u_n = α lim T̃u_n = αT̃u, so that T̃ is homogeneous. That is, T̃ is linear. Thus T̃ ∈ B[X, Y]. Moreover, ‖T̃u_n‖ = ‖T u_n‖ ≤ ‖T‖ ‖u_n‖ for every n. Applying once again the same continuity argument, and recalling that the norm is continuous, it follows that lim ‖u_n‖ = ‖u‖ and
  ‖T̃u‖ = ‖T̃ lim u_n‖ = lim ‖T̃u_n‖ ≤ ‖T‖ lim ‖u_n‖ = ‖T‖‖u‖,
and so ‖T̃‖ = sup_{‖u‖=1} ‖T̃u‖ ≤ ‖T‖. On the other hand,
  ‖T‖ = sup_{0≠w∈M} ‖T w‖/‖w‖ = sup_{0≠w∈M} ‖T̃w‖/‖w‖ ≤ sup_{0≠x∈X} ‖T̃x‖/‖x‖ = ‖T̃‖.
Corollary 4.36. Let X and Y be Banach spaces. Let M and N be dense linear manifolds of X and Y, respectively. If W ∈ G[M, N], then there exists a unique Ŵ ∈ G[X, Y] that extends W over X.
Proof. Every W ∈ G[M, N] is a linear homeomorphism, and hence a uniform homeomorphism (Theorem 4.14). Thus Corollary 3.46 ensures that there is a unique homeomorphism Ŵ: X → Y that extends W over X. On the other hand, Theorem 4.35 says that there exists a unique continuous linear extension of W: M → Y over X, say W̃ ∈ B[X, Y]. Thus Ŵ and W̃ are both continuous mappings of X into Y that coincide on a dense subset M of X: Ŵ|_M = W = W̃|_M. Thus Ŵ = W̃ by Corollary 3.33, and so the homeomorphism Ŵ of X onto Y lies in B[X, Y]. Conclusion: Ŵ is a linear homeomorphism, which means a topological isomorphism; that is, Ŵ ∈ G[X, Y].
An even stronger form of homeomorphism is that of a surjective isometry (a surjective mapping that preserves distance — recall that every isometry is
an injective contraction, and the inverse of a surjective isometry is again a surjective isometry). If there exists a surjective isometry between two metric spaces, then they are said to be isometric or isometrically equivalent (Chapter 3). Thus a linear isometry of a normed space X into a normed space Y (i.e., a linear transformation of X into Y that is also an isometry) is necessarily an element of B[X, Y], since an isometry is continuous. A surjective isometry in B[X, Y] (i.e., a linear surjective isometry or, equivalently, a linear-space isomorphism that is also an isometry) is called an isometric isomorphism. Two normed spaces are isometrically isomorphic if there exists an isometric isomorphism between them. The next proposition places linear isometries in a normed-space setting.
Proposition 4.37. Take V ∈ L[X, Y], where X and Y are normed spaces. The following assertions are equivalent.
(a) V is an isometry (i.e., ‖V u − V v‖ = ‖u − v‖ for every u, v ∈ X).
(b) ‖V x‖ = ‖x‖ for every x ∈ X.
If Y = X, then each of the above assertions also is equivalent to
(c) ‖V^n x‖ = ‖x‖ for every x ∈ X and every integer n ≥ 1.
Proof. Assertions (a) and (b) are clearly equivalent because V is linear. Indeed, set v = 0 in (a) to get (b) and, conversely, set x = u − v in (b) to get (a). Moreover, take V ∈ L[X] and suppose (b) holds true. Then (c) holds trivially for n = 1. If (c) holds for some n ≥ 1, then ‖V^{n+1} x‖ = ‖V^n (V x)‖ = ‖V x‖ = ‖x‖ for every x ∈ X. Conclusion: (b) implies (c) by induction. Finally, note that (c) trivially implies (b).
It is plain that every isometric isomorphism is a topological isomorphism, and that the identity I of B[X] is an isometric isomorphism. It is also clear that the inverse of a topological (isometric) isomorphism is itself a topological (isometric) isomorphism, and that the composition (product) of two topological (isometric) isomorphisms is again a topological (isometric) isomorphism.
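Assertion (c) can be watched numerically for a standard example (our choice, not the text's): the unilateral shift on finitely supported square-summable sequences, a linear isometry that is not surjective:

```python
import math

def shift(x):                 # V(x0, x1, ...) = (0, x0, x1, ...)
    return [0.0] + list(x)

def norm(x):                  # the l^2 norm of a finitely supported sequence
    return math.sqrt(sum(t * t for t in x))

x = [3.0, -1.0, 2.0]
v = list(x)
for n in range(1, 6):
    v = shift(v)
    assert abs(norm(v) - norm(x)) < 1e-12    # ||V^n x|| = ||x|| for every n >= 1
```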
Hence these concepts (topological isomorphism and isometric isomorphism, that is) have the defining properties of an equivalence relation (viz., reflexivity, symmetry, and transitivity).
Corollary 4.38. Let M and N be dense linear manifolds of Banach spaces X and Y, respectively. If U: M → N is an isometric isomorphism of M onto N, then there exists a unique isometric isomorphism Û: X → Y of X onto Y that extends U over X.
Proof. According to Corollary 4.36, there exists a unique topological isomorphism Û: X → Y that extends U: M → N over X. Note that the mappings ‖·‖ : X → R and ‖Û(·)‖ : X → R are continuous (composition of continuous functions is again a continuous function), and also that they coincide on a
dense subset M of X (‖Ûu‖ = ‖U u‖ = ‖u‖ for every u ∈ M since Û|_M = U and U is an isometry — Proposition 4.37). Then, according to Corollary 3.33, they coincide on the whole space X; that is, ‖Ûx‖ = ‖x‖ for every x ∈ X. Thus (Proposition 4.37 again) the topological isomorphism Û is an isometry, and hence an isometric isomorphism.
Since every normed space is a metric space, every normed space has a completion in the sense of Definition 3.48. The question that naturally arises is whether a normed space has a completion that is a Banach space. In other words, is a completion (in the norm topology, of course) of a normed space a complete normed space? We shall first redefine the concept of completion in a normed-space setting.
Definition 4.39. If the image of a linear isometry on a normed space X is a dense linear manifold of a Banach space X̂, then X̂ is a completion of X. Equivalently, if a normed space X is isometrically isomorphic to a dense linear manifold of a Banach space X̂, then X̂ is a completion of X. (In particular, any Banach space is a completion of every dense linear manifold of it, since the inclusion map is a linear isometry.)
Theorem 4.40. Every normed space has a completion.
Proof. Let (X, ‖·‖_X) be a normed space. Consider the linear space X^N of all X-valued sequences. It is readily verified that the collection CS(X) of all Cauchy sequences in (X, ‖·‖_X) is a linear manifold of X^N, and so a linear space (over the same field of the linear space X). The program is to rewrite the proof of Theorem 3.49 in terms of the metric d_X generated by the norm ‖·‖_X. Thus
  ‖x‖ = lim ‖x_n‖_X
defines a seminorm on the linear space CS(X), where x = {x_n} is an arbitrary sequence in CS(X). Set
  N = {x ∈ CS(X) : ‖x‖ = 0}.
Proposition 4.5 ensures that N is a linear manifold of CS(X), and also that
  ‖[x]‖_X̂ = ‖x‖
defines a norm on the quotient space X̂ = CS(X)/N, where x is an arbitrary element of an arbitrary coset [x] = x + N in X̂, so that (X̂, ‖·‖_X̂) is a normed space. Now consider the mapping K: X → X̂ that assigns to each x ∈ X the coset [x] ∈ X̂ containing the constant sequence x = {x_n} ∈ CS(X) such that x_n = x for all n. It is easy to show that
K: X → X is a linear isometry. Moreover, by Claims 1 and 2 in the proof of Theorem 3.49, the range of K (which is clearly a linear manifold of X) is dense in (X, # # ), and (X, # # ) X X is a complete normed space. That is, K(X )− = X
X is a Banach space.
and
Summing up: X is isometrically isomorphic to K(X), which is a dense linear manifold of X̂, and X̂ is a Banach space. Thus X̂ is a completion of X.
The completion of a normed space is essentially unique; that is, the completion of a normed space is unique up to an isometric isomorphism (i.e., up to a linear surjective isometry).
Theorem 4.41. Any two completions of a normed space are isometrically isomorphic.
Proof. Replace “metric space”, “surjective isometry”, and “subspace”, respectively, with “normed space”, “isometric isomorphism”, and “linear manifold” in the proof of Theorem 3.50.
According to Definition 4.39, a Banach space X̂ is a completion of a normed space X if there exists a dense linear manifold of X̂, say X̃, which is isometrically isomorphic to X; in other words, if there exists a surjective isometry U_X in G[X̃, X] for some dense linear manifold X̃ of X̂. Let Ŷ be a completion of a normed space Y, so that Y is isometrically isomorphic to a dense linear manifold Ỹ of the Banach space Ŷ, and let U_Y ∈ G[Y, Ỹ] be the surjective isometry between Y and Ỹ. Take T ∈ B[X, Y] and set T̃ = U_Y T U_X ∈ B[X̃, Ỹ] as in the following commutative diagram.

              T
        X ───────→ Y
        ↑ U_X      │ U_Y
        X̃ ───────→ Ỹ
              T̃
Define T̂ ∈ B[X̂, Ŷ] as the extension of T̃ ∈ B[X̃, Ỹ] over X̂ (Theorem 4.35), so that ‖T̂‖ = ‖T̃‖ = ‖U_Y⁻¹ T U_X‖ = ‖T‖ (see Problem 4.41). It is again usual to refer to T̂ as the extension of T over the completion X̂ of X into the completion Ŷ of Y. Moreover, since any pair of completions of X and Y, say {X̂, X̂′} and {Ŷ, Ŷ′}, are isometrically isomorphic (i.e., since there exist surjective isometries Û_X in G[X̂′, X̂] and Û_Y in G[Ŷ′, Ŷ]), it follows (see the proof of Theorem 3.51) that any two extensions T̂ ∈ B[X̂, Ŷ] and T̂′ ∈ B[X̂′, Ŷ′] of T ∈ B[X, Y] over the completions X̂ and X̂′ into the completions Ŷ and Ŷ′, respectively, are unique up to isometric isomorphisms in the sense that T̂ = Û_Y T̂′ Û_X⁻¹, equivalently T̂ Û_X = Û_Y T̂′ (and hence ‖T̂‖ = ‖T̂′‖ = ‖T‖). The commutative diagram

            T̂′
     X̂′ ──────────→ Ŷ′
     │               │
    Û_X             Û_Y
     ↓               ↓
     X̂  ──────────→ Ŷ
            T̂

illustrates such an extension program, which is finally stated as follows.

Theorem 4.42. Let the Banach spaces X̂ and Ŷ be completions of the normed spaces X and Y, respectively. Every T ∈ B[X, Y] has an extension T̂ ∈ B[X̂, Ŷ] over the completion X̂ of X into the completion Ŷ of Y. Moreover, T̂ is unique up to isometric isomorphisms and ‖T̂‖ = ‖T‖.
4.8 The Banach–Steinhaus Theorem and Operator Convergence

Let X and Y be normed spaces and let Θ be a subset of B[X, Y]. We shall say that Θ is strongly bounded (or pointwise bounded) if for each x ∈ X the set Θx = {Tx ∈ Y: T ∈ Θ} is bounded in Y (with respect to the norm topology of Y), and uniformly bounded if Θ is itself bounded in B[X, Y] (with respect to the uniform norm topology of B[X, Y]). It is clear that uniform boundedness implies strong boundedness:

  sup_{T∈Θ} ‖T‖ < ∞  implies  sup_{T∈Θ} ‖Tx‖ < ∞ for every x ∈ X,

since ‖Tx‖ ≤ ‖T‖‖x‖ for every x ∈ X. The Banach–Steinhaus Theorem ensures the converse whenever X is a Banach space:

  sup_{T∈Θ} ‖Tx‖ < ∞ for every x in a Banach space X  implies  sup_{T∈Θ} ‖T‖ < ∞.

That is, a collection Θ of bounded linear transformations of a Banach space X into a normed space Y is uniformly bounded if and only if it is strongly bounded. This will be our second important application of the Baire Category Theorem (the first was the Open Mapping Theorem).

Theorem 4.43. (The Banach–Steinhaus Theorem or The Uniform Boundedness Principle). Let {T_γ}_{γ∈Γ} be an arbitrary indexed family of bounded linear transformations of a Banach space X into a normed space Y. If sup_{γ∈Γ} ‖T_γ x‖ < ∞ for every x ∈ X, then sup_{γ∈Γ} ‖T_γ‖ < ∞.

Proof. For each n ∈ ℕ set

  A_n = {x ∈ X: ‖T_γ x‖ ≤ n for all γ ∈ Γ}.

Claim 1. A_n is closed in X.

Proof. Let {x_k} be an A_n-valued sequence that converges in X to, say, x ∈ X. Take an arbitrary γ ∈ Γ and note that
  ‖T_γ x‖ = ‖T_γ(x − x_k) + T_γ x_k‖ ≤ ‖T_γ(x − x_k)‖ + ‖T_γ x_k‖ ≤ ‖T_γ‖‖x − x_k‖ + n

for every integer k. Since ‖x − x_k‖ → 0 as k → ∞, it follows that ‖T_γ x‖ ≤ n. Thus, as γ is arbitrary in Γ, ‖T_γ x‖ ≤ n for all γ ∈ Γ, and hence x ∈ A_n. Conclusion: A_n is closed in X by the Closed Set Theorem (Theorem 3.30).

Claim 2. If sup_{γ∈Γ} ‖T_γ x‖ < ∞ for every x ∈ X, then {A_n} covers X.

Proof. Take an arbitrary x ∈ X. If sup_{γ∈Γ} ‖T_γ x‖ < ∞, then there exists an integer n_x ∈ ℕ such that sup_{γ∈Γ} ‖T_γ x‖ ≤ n_x, and hence x ∈ A_{n_x}. Therefore, X ⊆ ⋃_{n∈ℕ} A_n (i.e., X = ⋃_{n∈ℕ} A_n).

Then, by the Baire Category Theorem (Theorem 3.58), at least one of the sets A_n, say A_{n₀}, has a nonempty interior. This means that there is a vector x₀ in X and a radius ρ > 0 such that the open ball B_{2ρ}(x₀) is included in A_{n₀} (and so ‖T_γ x₀‖ < n₀ for all γ ∈ Γ). Take any nonzero vector x in X and set x′ = ρ‖x‖⁻¹x ∈ X. Since x′ ∈ B_{2ρ}(0), it follows that x′ + x₀ ∈ B_{2ρ}(x₀) ⊂ A_{n₀}. Thus, for every γ ∈ Γ, ‖T_γ(x′ + x₀)‖ < n₀ and

  ρ‖x‖⁻¹‖T_γ x‖ = ‖T_γ x′‖ = ‖T_γ(x′ + x₀) − T_γ x₀‖ ≤ ‖T_γ(x′ + x₀)‖ + ‖T_γ x₀‖ < 2n₀.

Therefore, ‖T_γ‖ = sup_{x≠0} ‖T_γ x‖/‖x‖ ≤ 2n₀/ρ for all γ ∈ Γ.
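Completeness of X is essential here. On the incomplete normed space of finitely supported scalar sequences (sup norm), strong (pointwise) boundedness does not imply uniform boundedness. A small sketch of the standard counterexample T_n x = n·ξ_n, with illustrative names (vectors are represented as dicts mapping index to value):

```python
# On the incomplete space of finitely supported sequences (sup norm),
# the functionals T_n(x) = n * x_n are pointwise bounded but not
# uniformly bounded -- Banach-Steinhaus fails without completeness.

def T(n, x):
    return n * x.get(n, 0.0)

x = {0: 1.0, 3: -2.0, 7: 0.5}   # a finitely supported sequence

# Pointwise bounded: for THIS x, sup_n |T_n x| is finite, since
# T_n x = 0 once n exceeds the support of x.
print(max(abs(T(n, x)) for n in range(100)))   # finite (here 6.0, at n = 3)

# Not uniformly bounded: ||T_n|| = n, attained at the unit vector e_n,
# so sup_n ||T_n|| is infinite.
for n in (1, 10, 100):
    e_n = {n: 1.0}              # unit vector, sup norm 1
    print(n, abs(T(n, e_n)))    # |T_n e_n| = n, so ||T_n|| >= n
```

Each fixed vector "sees" only finitely many of the T_n, which is exactly what the Baire argument rules out in a complete space.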
Let {L_n} be a sequence of linear transformations of a normed space X into a normed space Y (i.e., an L[X, Y]-valued sequence). {L_n} is pointwise convergent if for every x ∈ X the Y-valued sequence {L_n x} converges in Y; that is, if for each x in X there exists a y_x in Y such that L_n x → y_x in Y as n → ∞. As the limit is unique, this actually defines a mapping L: X → Y given by L(x) = y_x for every x ∈ X. Moreover, it is readily verified that L is linear (cf. Problem 4.1). Thus an L[X, Y]-valued sequence {L_n} is pointwise convergent if and only if there is an L ∈ L[X, Y] such that ‖(L_n − L)x‖ → 0 as n → ∞ for every x ∈ X. Now consider a sequence {T_n} of bounded linear transformations of a normed space X into a normed space Y (i.e., a B[X, Y]-valued sequence).

Proposition 4.44. If X is a Banach space and Y is a normed space, then a B[X, Y]-valued sequence {T_n} is pointwise convergent if and only if there is a T ∈ B[X, Y] such that ‖(T_n − T)x‖ → 0 for every x ∈ X.

Proof. According to the above italicized result (about pointwise convergence of L[X, Y]-valued sequences), all that remains to be shown is that T is bounded whenever ‖(T_n − T)x‖ → 0 for every x ∈ X. Since the sequence {T_n x} converges in Y, it is bounded in Y (Proposition 3.39). That is, sup_n ‖T_n x‖ < ∞ for every x ∈ X (Problem 4.5). Then sup_n ‖T_n‖ < ∞ by the Banach–Steinhaus Theorem (Theorem 4.43) because X is a Banach space. Therefore,
  ‖Tx‖ = lim_n ‖T_n x‖ ≤ (lim sup_n ‖T_n‖)‖x‖ ≤ (sup_n ‖T_n‖)‖x‖

for every x ∈ X, so that T ∈ B[X, Y] and ‖T‖ ≤ lim sup_n ‖T_n‖.
Definition 4.45. Let X and Y be normed spaces, and let {T_n} be a B[X, Y]-valued sequence. If {T_n} converges in B[X, Y], that is, if there exists T in B[X, Y] such that ‖T_n − T‖ → 0, then we say that {T_n} converges uniformly (or converges in the uniform topology, or in the operator norm topology) to T, which is called the uniform limit of {T_n}. Notation: T_n →ᵘ T (or T_n − T →ᵘ O). If {T_n} does not converge uniformly to T, then we write T_n ↛ᵘ T. If there exists T in B[X, Y] such that

  ‖(T_n − T)x‖ → 0  for every  x ∈ X,

then we say that {T_n} converges strongly (or converges in the strong (operator) topology) to T, which is called the strong limit of {T_n}. Notation: T_n →ˢ T (or T_n − T →ˢ O). Again, if {T_n} does not converge strongly to T, then we write T_n ↛ˢ T.

Uniform convergence implies strong convergence (to the same limit). In fact, for each n, 0 ≤ ‖(T_n − T)x‖ ≤ ‖T_n − T‖‖x‖ for every x ∈ X, and hence

  T_n →ᵘ T  ⟹  T_n →ˢ T.
The next proposition says that on a finite-dimensional normed space the concepts of strong and uniform convergence coincide.

Proposition 4.46. Let X and Y be normed spaces. Take a B[X, Y]-valued sequence {T_k} and a transformation T in B[X, Y]. If X is finite dimensional, then T_k →ˢ T if and only if T_k →ᵘ T.

Proof. Suppose (X, ‖·‖_X) is a finite-dimensional normed space and let B = {e_i}_{i=1}^n be a Hamel basis for the linear space X. Take an arbitrary x in X and consider its unique expansion on B,

  x = Σ_{i=1}^n ξ_i e_i,

where the scalars {ξ_i}_{i=1}^n are the coordinates of x with respect to B. Set

  ‖x‖₀ = max_{1≤i≤n} |ξ_i|,

which defines a norm on X, as we saw in the proof of Theorem 4.27. If {T_k} converges strongly to T ∈ B[X, Y], then ‖(T_k − T)e_i‖_Y → 0 as k → ∞ for each e_i ∈ B. Thus for each i = 1, …, n and for every ε > 0 there exists a positive integer k_{i,ε} such that ‖(T_k − T)e_i‖_Y < ε whenever k ≥ k_{i,ε}. Therefore,
  ‖(T_k − T)x‖_Y = ‖Σ_{i=1}^n ξ_i (T_k − T)e_i‖_Y ≤ Σ_{i=1}^n |ξ_i| ‖(T_k − T)e_i‖_Y
                 ≤ n max_{1≤i≤n} |ξ_i| max_{1≤i≤n} ‖(T_k − T)e_i‖_Y < n ‖x‖₀ ε

whenever k ≥ k_ε = max{k_{i,ε}}_{i=1}^n. However, ‖·‖₀ and ‖·‖_X are equivalent norms on X (according to Theorem 4.27), so that there exists a constant α > 0 such that α‖x‖₀ ≤ ‖x‖_X. Since n and α do not depend on x, it follows that

  ‖(T_k − T)x‖_Y < (n/α) ε ‖x‖_X

for every x ∈ X whenever k ≥ k_ε. Hence, for every ε > 0 there exists a positive integer k_ε such that

  k ≥ k_ε  implies  ‖T_k − T‖ = sup_{‖x‖≤1} ‖(T_k − T)x‖_Y < (n/α) ε,

and so {T_k} converges uniformly to T. Summing up: T_k →ˢ T implies T_k →ᵘ T whenever X is a finite-dimensional normed space. This concludes the proof, since uniform convergence always implies strong convergence.
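The key estimate of the proof, that convergence on a finite basis already controls the whole operator norm, can be checked numerically. A sketch for the Euclidean norm on ℝ³, where the matrix D plays the role of the difference T_k − T (all names illustrative):

```python
import math
import random

# In a finite-dimensional space, the proof's estimate is
#   ||(T_k - T)x|| <= n * max_i |xi_i| * max_i ||(T_k - T)e_i||,
# so a small "basis gap" forces a small operator norm.

n = 3

def apply(A, x):
    # matrix-vector product on R^3
    return [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

def norm(y):
    return math.sqrt(sum(t * t for t in y))

random.seed(1)
D = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]  # T_k - T

basis = ([1, 0, 0], [0, 1, 0], [0, 0, 1])
basis_gap = max(norm(apply(D, e)) for e in basis)   # max_i ||D e_i||

# Check the estimate on many random vectors:
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(n)]
    assert norm(apply(D, x)) <= n * max(map(abs, x)) * basis_gap + 1e-12

print("estimate holds; basis gap =", basis_gap)
```

In infinite dimensions no such finite basis exists, and the examples below show the estimate genuinely fails there.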
Proposition 4.47. If a Cauchy sequence in B[X, Y] is strongly convergent, then it is uniformly convergent.

Proof. Let X and Y be normed spaces. If T lies in B[X, Y] and if {T_n} is a B[X, Y]-valued sequence, then, for each pair of indices m and n,

  ‖(T_n − T)x‖ = ‖(T_n − T_m)x + (T_m − T)x‖ ≤ ‖T_n − T_m‖‖x‖ + ‖(T_m − T)x‖

for every x ∈ X. Take an arbitrary ε > 0. If {T_n} is a Cauchy sequence in B[X, Y], then there is a positive integer n_ε such that ‖T_n − T_m‖ < ε/2, and so

  ‖(T_n − T)x‖ < (ε/2)‖x‖ + ‖(T_m − T)x‖

for every x ∈ X, whenever m, n ≥ n_ε. If {T_n} converges strongly to T, then for each x ∈ X there exists an integer m_{ε,x} ≥ n_ε such that ‖(T_m − T)x‖ < ε/2 whenever m ≥ m_{ε,x}. Hence

  ‖(T_n − T)x‖ < (ε/2)‖x‖ + ε/2

for every x ∈ X, and therefore

  ‖T_n − T‖ = sup_{‖x‖≤1} ‖(T_n − T)x‖ < ε

if n ≥ n_ε. Conclusion: For every ε > 0 there exists an integer n_ε ≥ 1 such that

  n ≥ n_ε  implies  ‖T_n − T‖ < ε,

which means that {T_n} converges in B[X, Y] to T.
Outcome: If X and Y are normed spaces and {T_n} is a B[X, Y]-valued sequence, then {T_n} is uniformly convergent if and only if it is a strongly convergent Cauchy sequence in B[X, Y].

Remark: If {T_n} is strongly convergent, then it is strongly bounded (Proposition 3.39). Now, if X is a Banach space, then pointwise convergence coincides with strong convergence (Proposition 4.44), and strong boundedness coincides with uniform boundedness (Theorem 4.43). Hence

  T_n →ˢ T  ⟹  sup_n ‖T_n‖ < ∞  whenever X is Banach.

Take T ∈ B[X] and consider the power sequence {Tⁿ} in the normed algebra B[X]. The operator T is called uniformly stable if the power sequence {Tⁿ} converges uniformly to the null operator; that is, if Tⁿ →ᵘ O (equivalently, ‖Tⁿ‖ → 0). T is strongly stable if the power sequence {Tⁿ} converges strongly to the null operator; that is, if Tⁿ →ˢ O (equivalently, ‖Tⁿx‖ → 0 for every x ∈ X). T is called power bounded if {Tⁿ} is a bounded sequence in B[X]; that is, if sup_n ‖Tⁿ‖ < ∞. If X is a Banach space, then the Banach–Steinhaus Theorem ensures that T is power bounded if and only if sup_n ‖Tⁿx‖ < ∞ for every x ∈ X. Clearly, uniform stability implies strong stability,

  Tⁿ →ᵘ O  ⟹  Tⁿ →ˢ O,

which in turn implies power boundedness whenever X is a Banach space:

  Tⁿ →ˢ O  ⟹  sup_n ‖Tⁿ‖ < ∞  if X is Banach.
However, the converses fail.

Example 4.K. Consider the diagonal operator D_a ∈ B[ℓ₊ᵖ] of Example 4.H (for some p ≥ 1, where a = {α_k}_{k=0}^∞ lies in ℓ₊^∞), and recall that ‖D_a‖ = sup_k |α_k| = ‖a‖_∞. It is readily verified by induction that the nth power of D_a, D_aⁿ, is again a diagonal operator in B[ℓ₊ᵖ]. Indeed,

  D_aⁿ x = {α_kⁿ ξ_k}_{k=0}^∞  for every  x = {ξ_k}_{k=0}^∞ ∈ ℓ₊ᵖ,

so that ‖D_aⁿ‖ = sup_k |α_k|ⁿ = ‖a‖_∞ⁿ for every n ≥ 0. Hence D_a ∈ B[ℓ₊ᵖ] is uniformly stable if and only if ‖a‖_∞ < 1; that is,

  D_aⁿ →ᵘ O  if and only if  sup_k |α_k| < 1.

Next we shall investigate strong stability for D_a. If ‖D_aⁿx‖ → 0 for every x ∈ ℓ₊ᵖ, then (Example 4.H) ‖D_aⁿe_j‖ = |α_j|ⁿ → 0 as n → ∞, and so |α_j| < 1 for every j ≥ 0. On the other hand, take an arbitrary x = {ξ_k}_{k=0}^∞ in ℓ₊ᵖ. Note that
  0 ≤ ‖D_aⁿx‖ᵖ = Σ_{k=0}^∞ |α_k|^{np}|ξ_k|ᵖ = Σ_{k=0}^m |α_k|^{np}|ξ_k|ᵖ + Σ_{k=m+1}^∞ |α_k|^{np}|ξ_k|ᵖ
               ≤ max_{0≤k≤m} |α_k|^{np} Σ_{k=0}^m |ξ_k|ᵖ + sup_{k>m} |α_k|^{np} Σ_{k=m+1}^∞ |ξ_k|ᵖ

for every pair of integers m, n ≥ 0. If |α_k| < 1 for every integer k ≥ 0, then sup_k |α_k|^{np} ≤ 1 for all n ≥ 0. Thus, as Σ_{k=0}^m |ξ_k|ᵖ ≤ ‖x‖ᵖ for all m ≥ 0,

  0 ≤ ‖D_aⁿx‖ᵖ ≤ max_{0≤k≤m} |α_k|^{np} ‖x‖ᵖ + Σ_{k=m+1}^∞ |ξ_k|ᵖ

for every m, n ≥ 0. Now take an arbitrary nonzero x and an arbitrary ε > 0 (the case x = 0 is trivial). Since Σ_{k=m+1}^∞ |ξ_k|ᵖ → 0 as m → ∞ (because Σ_{k=0}^∞ |ξ_k|ᵖ = ‖x‖ᵖ < ∞ — Problem 3.11), it follows that there exists a positive integer m_ε such that

  m ≥ m_ε  implies  Σ_{k=m+1}^∞ |ξ_k|ᵖ < εᵖ/2.

Moreover, if |α_k| < 1 for every integer k ≥ 0, then max{|α_k|ᵖ}_{k=0}^m < 1, so that lim_n max_{0≤k≤m} |α_k|^{np} = lim_n (max_{0≤k≤m} |α_k|ᵖ)ⁿ = 0 for every nonnegative integer m. In particular, lim_n max_{0≤k≤m_ε} |α_k|^{np} = 0. Then there exists a positive integer n_ε such that

  n ≥ n_ε  implies  max_{0≤k≤m_ε} |α_k|^{np} < εᵖ/(2‖x‖ᵖ).

But 0 ≤ ‖D_aⁿx‖ᵖ ≤ max_{0≤k≤m_ε} |α_k|^{np} ‖x‖ᵖ + Σ_{k=m_ε+1}^∞ |ξ_k|ᵖ, and hence

  n ≥ n_ε  implies  0 ≤ ‖D_aⁿx‖ < ε.

Therefore, if |α_k| < 1 for every k ≥ 0, then ‖D_aⁿx‖ → 0 for every x ∈ ℓ₊ᵖ. Hence D_a ∈ B[ℓ₊ᵖ] is strongly stable if and only if |α_k| < 1 for every k ≥ 0; that is,

  D_aⁿ →ˢ O  if and only if  |α_k| < 1 for every k ≥ 0.

For instance, the diagonal operator D_a = diag({(k+1)/(k+2)}_{k=0}^∞) ∈ B[ℓ₊ᵖ] of Example 4.H is strongly stable but not uniformly stable.
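The behavior of this last diagonal operator can be observed numerically. A sketch with α_k = (k+1)/(k+2) and p = 2, where sequences are truncated to finitely many entries (so sizes and names are purely illustrative):

```python
# Stability of the diagonal operator D_a x = {alpha_k xi_k} on l^2,
# with alpha_k = (k+1)/(k+2): every |alpha_k| < 1, but sup_k |alpha_k| = 1.

def norm(x, p=2):
    return sum(abs(t)**p for t in x)**(1 / p)

def Da_pow(alpha, x, n):
    # n-th power of D_a: entrywise multiplication by alpha_k^n
    return [a**n * xk for a, xk in zip(alpha, x)]

m = 2000
alpha = [(k + 1) / (k + 2) for k in range(m)]
x = [1 / (k + 1) for k in range(m)]        # a fixed vector in l^2

# Strongly stable: ||D_a^n x|| -> 0 for each FIXED x.
for n in (1, 100, 10000):
    print(n, norm(Da_pow(alpha, x, n)))    # decreases toward 0

# Not uniformly stable: ||D_a^n|| = sup_k |alpha_k|^n = 1 for every n.
# With a fixed power n = 100, the sup over the first m2 diagonal
# entries approaches 1 as more entries are included:
for m2 in (10, 100, 10000):
    print(m2, max(((k + 1) / (k + 2))**100 for k in range(m2)))
```

For each fixed vector the decay eventually wins, but for each fixed power there are diagonal entries whose nth power is still close to 1; this is precisely the gap between strong and uniform stability.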
Example 4.L. Consider the mapping S₋: ℓ₊ᵖ → ℓ₊ᵖ defined by

  S₋x = {ξ_{k+1}}_{k=0}^∞  for every  x = {ξ_k}_{k=0}^∞ ∈ ℓ₊ᵖ

(i.e., S₋(ξ₀, ξ₁, ξ₂, …) = (ξ₁, ξ₂, ξ₃, …)), which is also represented as an infinite matrix as follows:

        ⎛ 0  1          ⎞
        ⎜    0  1       ⎟
  S₋ =  ⎜       0  1    ⎟ ,
        ⎝           ⋱  ⎠

where every entry immediately above the main diagonal is equal to 1 and the remaining entries are all zero. This is the backward unilateral shift on ℓ₊ᵖ. It is readily verified that S₋ is linear and bounded, so that S₋ ∈ B[ℓ₊ᵖ]. Actually, ‖S₋x‖ᵖ = Σ_{k=1}^∞ |ξ_k|ᵖ ≤ Σ_{k=0}^∞ |ξ_k|ᵖ = ‖x‖ᵖ for every x = {ξ_k}_{k=0}^∞ ∈ ℓ₊ᵖ, so that S₋ is, in fact, a contraction (i.e., ‖S₋‖ ≤ 1). Consider the power sequence {S₋ⁿ} in B[ℓ₊ᵖ]. A trivial induction shows that, for each n ∈ ℕ₀,

  S₋ⁿx = {ξ_{k+n}}_{k=0}^∞  for every  x = {ξ_k}_{k=0}^∞ ∈ ℓ₊ᵖ.

Hence ‖S₋ⁿx‖ᵖ = Σ_{k=n}^∞ |ξ_k|ᵖ → 0 as n → ∞ for every x = {ξ_k}_{k=0}^∞ ∈ ℓ₊ᵖ (cf. Problem 3.11), so that S₋ is strongly stable; that is,

  S₋ⁿ →ˢ O.

However, S₋ is not uniformly stable. Indeed, ‖S₋ⁿ‖ ≤ ‖S₋‖ⁿ ≤ 1 for every integer n ≥ 0 (see Problem 4.47(a)). On the other hand, consider the ℓ₊ᵖ-valued sequence {e_j}_{j=0}^∞ where, for each integer j ≥ 0, e_j is a scalar-valued sequence with just one nonzero entry (equal to 1) at the jth position (i.e., e_j = {δ_{jk}}_{k=0}^∞ ∈ ℓ₊ᵖ for every j ≥ 0). Observe that ‖e_j‖ = 1 for every j ≥ 0 and S₋ⁿe_{n+1} = e₁ for every n ≥ 0. Thus ‖S₋ⁿ‖ = sup_{‖x‖=1} ‖S₋ⁿx‖ ≥ ‖S₋ⁿe_{n+1}‖ = ‖e₁‖ = 1 for every nonnegative integer n. Therefore,

  ‖S₋ⁿ‖ = 1  for every  n ≥ 0,

so that S₋ⁿ ↛ᵘ O. Conclusion: The power sequence {S₋ⁿ} does not converge uniformly. Reason: Since S₋ⁿ →ˢ O, and since uniform convergence implies strong convergence to the same limit, it follows that either S₋ⁿ →ᵘ O or {S₋ⁿ} does not converge uniformly.
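The two faces of the shift can be seen side by side in a small sketch (p = 2, with ℓ² vectors represented as finite lists; names illustrative):

```python
# The backward unilateral shift S_-(xi_0, xi_1, xi_2, ...) =
# (xi_1, xi_2, xi_3, ...) on l^2, via truncated sequences.

def norm(x):
    return sum(t * t for t in x)**0.5

def shift_pow(x, n):
    return x[n:]                 # S_-^n drops the first n entries

x = [1 / (k + 1) for k in range(100000)]   # a fixed vector in l^2

# Strong stability: ||S_-^n x||^2 is the tail sum from k = n, so -> 0.
for n in (0, 10, 1000, 50000):
    print(n, norm(shift_pow(x, n)))

# No uniform stability: ||S_-^n|| = 1 for every n, since
# S_-^n e_{n+1} = e_1 has norm 1 while ||e_{n+1}|| = 1.
def e(j, m=200):
    return [1.0 if k == j else 0.0 for k in range(m)]

for n in (1, 10, 100):
    print(n, norm(shift_pow(e(n + 1), n)))   # exactly 1.0 for every n
```

Every fixed vector is eventually "shifted away", yet a unit vector placed ever deeper in the sequence survives every fixed power at full norm.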
Let X and Y be normed spaces and let Θ be a subset of B[X, Y]. According to the Closed Set Theorem, Θ is closed in B[X, Y] if and only if every Θ-valued sequence that converges in B[X, Y] has its limit in Θ. Equivalently, every uniformly convergent sequence {T_n} of bounded linear transformations in Θ ⊆ B[X, Y] has its (uniform) limit T in Θ. In this case the set Θ ⊆ B[X, Y] is also called uniformly closed (or closed in the uniform topology of B[X, Y]). We say that a subset Θ of B[X, Y] is strongly closed in B[X, Y] if every Θ-valued strongly convergent sequence {T_n} has its (strong) limit T in Θ.

Proposition 4.48. If Θ ⊆ B[X, Y] is strongly closed in B[X, Y], then it is (uniformly) closed in B[X, Y].

Proof. Take an arbitrary Θ-valued uniformly convergent sequence, say {T_n}, and let T ∈ B[X, Y] be its (uniform) limit. Since uniform convergence implies strong convergence to the same limit, it follows that {T_n} converges strongly to T. If every Θ-valued strongly convergent sequence has its (strong) limit in Θ, then T ∈ Θ. Conclusion: Every Θ-valued uniformly convergent sequence has its (uniform) limit in Θ.
Remark: If X is finite dimensional, then strong convergence coincides with uniform convergence (Proposition 4.46). Thus the concepts of strongly and uniformly closed in B[X, Y] coincide if X is a finite-dimensional normed space.

Example 4.M. Take an arbitrary p ≥ 1 and let Δ ⊂ B[ℓ₊ᵖ] be the collection of all diagonal operators in B[ℓ₊ᵖ]. That is (see Example 4.H),

  Δ = {D_a ∈ L[ℓ₊ᵖ]: D_a = diag({α_k}_{k=0}^∞) and a = {α_k}_{k=0}^∞ ∈ ℓ₊^∞}.

Set

  Δ_∞ = {D_a ∈ Δ: α_k → 0 as k → ∞},

the collection of all diagonal operators D_a = diag({α_k}_{k=0}^∞) in B[ℓ₊ᵖ] such that the scalar-valued sequence a = {α_k}_{k=0}^∞ converges to zero. As a matter of fact, both Δ and Δ_∞ are subalgebras of the Banach algebra B[ℓ₊ᵖ]. Let {D_{aₙ}}_{n≥1} be an arbitrary Δ_∞-valued uniformly convergent sequence. Hence D_{aₙ} →ᵘ D for some D ∈ B[ℓ₊ᵖ], with each D_{aₙ} = diag({α_k(n)}_{k=0}^∞) in Δ_∞, so that each aₙ = {α_k(n)}_{k=0}^∞ converges to zero. This implies that D_{aₙ} →ˢ D and, according to Problem 4.51, D is a diagonal operator; that is, D = D_a = diag({α_k}_{k=0}^∞) in Δ for some a = {α_k}_{k=0}^∞ ∈ ℓ₊^∞. Thus (see Example 4.H),

  ‖D_{aₙ} − D_a‖ = ‖aₙ − a‖_∞ = sup_k |α_k(n) − α_k|.

Now take an arbitrary ε > 0. Since sup_k |α_k(n) − α_k| → 0 as n → ∞ (because ‖D_{aₙ} − D_a‖ → 0), there is a positive integer n_ε such that

  sup_k |α_k(n_ε) − α_k| < ε.

Since α_k(n_ε) → 0 as k → ∞, there is a positive integer k_ε such that

  |α_k(n_ε)| < ε  whenever  k ≥ k_ε.

Recall that | |α_k| − |α_k(n_ε)| | ≤ |α_k − α_k(n_ε)|, and hence

  |α_k| ≤ sup_k |α_k − α_k(n_ε)| + |α_k(n_ε)|

for every k. Therefore,

  k ≥ k_ε  implies  |α_k| < 2ε,

so that α_k → 0. Thus a = {α_k}_{k=0}^∞ converges to zero, and so D_a ∈ Δ_∞. Conclusion:

  Δ_∞ is closed in B[ℓ₊ᵖ].

Next consider a Δ_∞-valued sequence {D_n}_{n≥1}, where each D_n is a diagonal operator whose only nonzero entries are the first n entries in the main diagonal, which are all equal to 1:

  D_n = diag(1, …, 1, 0, 0, 0, …) ∈ Δ_∞.
Note that ‖D_nx − x‖ᵖ = Σ_{k=n+1}^∞ |ξ_k|ᵖ → 0 as n → ∞ for every x = {ξ_k}_{k=0}^∞ (because Σ_{k=0}^∞ |ξ_k|ᵖ = ‖x‖ᵖ < ∞ — Problem 3.11). Hence

  D_n →ˢ I.

But the identity operator in B[ℓ₊ᵖ] is a diagonal operator that does not lie in Δ_∞ (i.e., I ∈ Δ\Δ_∞). Outcome:

  Δ_∞ is not strongly closed in B[ℓ₊ᵖ].
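A numerical sketch of this last point, with the truncations acting on finite lists standing in for ℓ² vectors (names illustrative):

```python
# The truncations D_n = diag(1, ..., 1, 0, 0, ...) converge strongly to
# the identity I, but ||D_n - I|| = 1 for every n, so not uniformly.

def norm(x):
    return sum(t * t for t in x)**0.5

def D(n, x):
    # keep the first n entries, annihilate the rest
    return [xk if k < n else 0.0 for k, xk in enumerate(x)]

x = [1 / (k + 1) for k in range(100000)]

# Strong convergence to I: ||D_n x - x|| is a tail norm of x, -> 0.
for n in (1, 100, 10000):
    print(n, norm([a - b for a, b in zip(D(n, x), x)]))

# But (D_n - I) e_n = -e_n, so ||D_n - I|| >= 1 for every n:
for n in (1, 100, 10000):
    e_n = [1.0 if k == n else 0.0 for k in range(n + 1)]
    print(n, norm([a - b for a, b in zip(D(n, e_n), e_n)]))   # exactly 1.0
```

Each fixed vector is eventually reproduced almost exactly, yet a unit vector sitting just past the truncation point is annihilated entirely, which keeps the operator-norm gap at 1.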
4.9 Compact Operators Let X and Y be normed spaces. A linear transformation T ∈ L[X , Y ] is compact (or completely continuous) if it maps bounded subsets of X into relatively compact subsets of Y; that is, if T (A)− is compact in Y whenever A is bounded in X . Equivalently, T ∈ L[X , Y ] is compact if T (A) lies in a compact subset of Y whenever A is bounded in X (see Theorem 3.62). Theorem 4.49. If T ∈ L[X , Y ] is compact, then T ∈ B[X , Y ]. Proof. Take an arbitrary bounded subset A of X . If T ∈ L[X , Y ] is compact, then T (A)− is a compact subset of Y. Thus T (A)− is totally bounded in Y (Corollary 3.81), and so T (A)− is bounded in Y (Corollary 3.71). Hence T (A) is a bounded subset of Y. Thus T is bounded by Proposition 4.12. In other words, every compact linear transformation is continuous. The converse is clearly false, for the identity I of an infinite-dimensional normed space X into itself is not compact by Corollary 4.34. Recall that a transformation T ∈ L[X , Y ] is of finite rank (or T is a finite-dimensional linear transformation) if it has a finite-dimensional range (see Problem 2.18). Proposition 4.50. Let X and Y be normed spaces. If T ∈ B[X , Y ] is of finite rank, then it is compact . Proof. Take an arbitrary bounded subset A of X . If T ∈ B[X , Y ], then T (A) is a bounded subset of Y (Proposition 4.12). Thus T (A)− is closed and bounded in Y. If T is of finite rank, then the range R(T ) of T is a finite-dimensional subspace of Y (Corollary 4.29), and hence R(T ) is a closed subset of Y. Thus T (A)− is closed and bounded in R(T ), according to Problem 3.38(d), and so it is a compact subset of R(T ) by Corollary 4.32. Since the metric space R(T ) is a subspace of the metric space Y, it follows that T (A)− is compact in Y. Remark : The assumption “T is bounded” cannot be removed from the statement of Proposition 4.50. 
Actually, we have exhibited (see the proof of Corollary 4.30, part (b)) an unbounded linear transformation L of an arbitrary infinite-dimensional normed space X into an arbitrary nonzero normed space Y. If dim Y = 1, then L: X → Y is a linear transformation of rank 1 (dim R(L) = 1) that is not even bounded, and so L is not compact by Theorem 4.49.
Corollary 4.51. If X is a finite-dimensional normed space and Y is any normed space, then every T ∈ L[X, Y] is of finite rank and compact.

Proof. If X is finite dimensional and T ∈ L[X, Y], then T ∈ B[X, Y] (Corollary 4.30) and dim R(T) < ∞ (Problems 2.17 and 2.18). Thus T is bounded and of finite rank, and hence compact by Proposition 4.50.

Theorem 4.52. Let T be a linear transformation of a normed space X into a normed space Y. The following four assertions are pairwise equivalent.

(a) T is compact (i.e., T maps bounded sets into relatively compact sets).

(b) T maps the unit ball B₁[0] into a relatively compact set.

(c) T maps every B₁[0]-valued sequence into a sequence that has a convergent subsequence.

(d) T maps bounded sequences into sequences that have a convergent subsequence.

Moreover, each of the above equivalent assertions implies that

(e) T maps bounded sets into totally bounded sets,

which in turn implies that

(f) T maps the unit ball B₁[0] into a totally bounded set.

Furthermore, if Y is a Banach space, then these six assertions are all pairwise equivalent.

Proof. Note that (a)⇒(b) trivially. Hence, in order to verify that (a), (b), (c), and (d) are pairwise equivalent, it is enough to show that (b)⇒(c)⇒(d)⇒(a).

Proof of (b)⇒(c). Take an arbitrary B₁[0]-valued sequence {x_n}. If (b) holds, then {Tx_n} lies in a compact subset of Y or, equivalently, in a sequentially compact subset of Y (Corollary 3.81). Thus, according to Definition 3.76, the Y-valued sequence {Tx_n} has a convergent subsequence. Then (b) implies (c).

Proof of (c)⇒(d). Take an arbitrary X-valued bounded sequence {x_n}, so that there is a real number β ≥ sup_n ‖x_n‖ for which {β⁻¹x_n} is a B₁[0]-valued sequence. If (c) holds, then {T(β⁻¹x_n)} = β⁻¹{Tx_n} has a convergent subsequence, and so {Tx_n} has a convergent subsequence. Thus (c) implies (d).

Proof of (d)⇒(a). Take an arbitrary bounded subset A of X.
If (d) holds, then every T (A)-valued sequence has a convergent subsequence, and hence every T (A)−-valued sequence has a subsequence that converges in T (A)− by Theorem 3.30, which means that T (A)− is sequentially compact (Definition 3.76). Therefore (d) implies (a) by Corollary 3.81. Moreover, (a) implies (e). Indeed, if T (A)− is a compact subset of Y whenever A is a bounded subset of X , then T (A)− (and so T (A)) is totally bounded in Y whenever A is bounded in X , by Corollary 3.81. Also, (e) trivially implies (f).
Conversely, suppose (f) holds true so that T (B1 [0]) is a totally bounded subset of Y. If Y is a Banach space, then Corollary 3.84(b) ensures that T (B1 [0]) is relatively compact in Y. Thus (f) implies (b) if Y is a Banach space. Remark : On a finite-dimensional normed space every operator is compact (cf. Corollary 4.51). So, by Theorem 4.52 and Corollary 4.34, the identity operator on a normed space X is compact if and only if X is finite dimensional . Let B∞[X , Y ] denote the collection of all compact linear transformations between normed spaces X and Y. Thus B∞[X , Y ] ⊆ B[X , Y ] by Theorem 4.49, and B∞[X , Y ] = B[X , Y ] if dim X < ∞ by Corollary 4.51. Write B∞[X ] for B∞[X , X ]: the collection of all compact operators on a normed space X . Theorem 4.53. Let X and Y be normed spaces. (a) B∞[X , Y ] is a linear manifold of B[X , Y ]. (b) If Y is a Banach space, then B∞[X , Y ] is a subspace of B[X , Y ]. Proof. It is trivially verified that αT ∈ B∞[X , Y ] for every scalar α whenever T ∈ B∞[X , Y ] (Theorem 4.52(d)). In order to verify that S + T ∈ B∞[X , Y ] for every S, T ∈ B∞[X , Y ], proceed as follows. Take S, T in B∞[X , Y ] ⊆ B[X , Y ] and let {xn }n≥1 be an arbitrary X -valued bounded sequence. Theorem 4.52(d) ensures that there exists a subsequence of {T xn }n≥1 , say {T xnk }k≥1 , that converges in Y. Now consider the subsequence {xnk }k≥1 of {xn }n≥1 , which is clearly bounded. Then (Theorem 4.52(d) again) the sequence {Sxnk }k≥1 has a subsequence, say {Sxnkj }j≥1 , that converges in Y. Since {T xnkj }j≥1 is a subsequence of the convergent sequence {T xnk }k≥1 , it follows that {T xnkj }j≥1 also converges in Y (Proposition 3.5). Thus {(S + T )xnkj }j≥1 = {Sxnkj }j≥1 + {T xnkj }j≥1 is a convergent subsequence of {(S + T )xn }n≥1 , and therefore S + T ∈ B[X , Y ] is compact by Theorem 4.52(d). Hence S + T ∈ B∞[X , Y ]. Conclusion: B∞[X , Y ] is a linear manifold of B[X , Y ]. Claim . If Y is a Banach space, then B∞[X , Y ] is closed in B[X , Y ]. 
Proof. Take an arbitrary B∞[X, Y]-valued sequence {T_n} that converges (uniformly) in B[X, Y] to, say, T ∈ B[X, Y]. Thus, for each ε > 0 there exists a positive integer n_ε such that

  ‖(T − T_{n_ε})x‖ ≤ ‖T − T_{n_ε}‖‖x‖ < (ε/2)‖x‖

for every x ∈ X. Since T_{n_ε} is compact, it follows by Theorem 4.52(f) that the image T_{n_ε}(B₁[0]) of the unit ball B₁[0] is totally bounded in Y, and hence T_{n_ε}(B₁[0]) has a finite ε/2-net, say Y_ε (Definition 3.68). Therefore, for each x ∈ B₁[0] there exists y ∈ Y_ε such that ‖T_{n_ε}x − y‖ < ε/2, and so

  ‖Tx − y‖ = ‖Tx − T_{n_ε}x + T_{n_ε}x − y‖ ≤ ‖(T − T_{n_ε})x‖ + ‖T_{n_ε}x − y‖ < (ε/2)‖x‖ + ε/2 ≤ ε.
That is, Y_ε is a finite ε-net for T(B₁[0]), which means that T(B₁[0]) is totally bounded in Y. Thus, if Y is a Banach space, then T is compact by Theorem 4.52. Conclusion: Every B∞[X, Y]-valued sequence that converges in B[X, Y] has its limit in B∞[X, Y]. Hence B∞[X, Y] is closed in B[X, Y] by the Closed Set Theorem (Theorem 3.30). Outcome: If Y is a Banach space, then B∞[X, Y] is a subspace of B[X, Y] (so that both are Banach spaces, by Propositions 4.15 and 4.7).

Recall that a two-sided ideal I of an algebra A is a subalgebra of A such that the product (both left and right products) of every element of I with any element of A is again an element of I (see Problem 2.30).

Proposition 4.54. If X is a normed space, then B∞[X] is a two-sided ideal of the normed algebra B[X].

Proof. B∞[X] is a linear manifold of B[X] by Theorem 4.53. Take S ∈ B∞[X] and T ∈ B[X] arbitrary.

Claim. Both ST and TS lie in B∞[X].

Proof. Let A be any bounded subset of X, so that T(A) is bounded by Proposition 4.12. Since S is compact, it follows (by definition) that S(T(A))⁻ is compact. Thus the composition ST maps bounded sets into relatively compact sets, which means that ST is compact. Moreover, S(A)⁻ is compact as well. Since T is continuous, it follows that T(S(A)⁻) is compact by Theorem 3.64 (and so T(S(A)⁻) is closed — Theorem 3.62), and also that T(S(A))⁻ = T(S(A)⁻)⁻ = T(S(A)⁻) according to Problem 3.46. Therefore T(S(A))⁻ is compact. Thus the composition TS maps bounded sets into relatively compact sets, which means that TS is compact.

Conclusion: B∞[X] is a two-sided ideal of B[X].
Let B0 [X , Y ] denote the collection of all finite-rank bounded linear transformations of a normed space X into a normed space Y. Proposition 4.50 says that B0 [X , Y ] ⊆ B∞[X , Y ], and B0 [X , Y ] = B∞[X , Y ] = B[X , Y ] = L[X , Y ] if X is finite dimensional by Corollaries 4.30 and 4.51. We shall write B0 [X ] for B0 [X , X ]: the collection of all finite-rank operators on X . It is readily verified that both S T and T S lie in B0 [X ] for every S ∈ B0 [X ] and every T ∈ B[X ]. Indeed, it is clear that S T is of finite rank (because the range of S T is trivially included in the range of S). Moreover, T S is of finite rank since T S = T |R(S) S and the domain of T |R(S) is the finite-dimensional range of S (and so its own range is finite dimensional as well — see Problem 2.17). Therefore B0 [X ] is also a two-sided ideal of B[X ]. Corollary 4.55. Let X and Y be normed spaces. If Y is a Banach space, then every B0 [X , Y ]-valued sequence that converges (uniformly) in B[X , Y ] has its limit in B∞[X , Y ].
Proof. This is a straightforward application of Theorem 4.53(b) because B₀[X, Y] ⊆ B∞[X, Y]. Indeed, since B∞[X, Y] is closed in B[X, Y], the Closed Set Theorem says that every B∞[X, Y]-valued sequence (and so every B₀[X, Y]-valued sequence) that converges in B[X, Y] has its limit in B∞[X, Y].

Example 4.N. Consider the diagonal operator D_a ∈ B[ℓ₊ᵖ] of Example 4.K (for some p ≥ 1, where a = {α_k}_{k=0}^∞ lies in ℓ₊^∞). We shall show that

  D_a is compact if and only if α_k → 0 as k → ∞.

Consider the ℓ₊ᵖ-valued sequence {e_j}_{j=0}^∞ with e_j = {δ_{jk}}_{k=0}^∞ ∈ ℓ₊ᵖ for each j ≥ 0 (just one nonzero entry, equal to 1, at the jth position) of Example 4.H.

(a) For each nonnegative integer n set D_{aₙ} = diag(α₀, …, α_n, 0, 0, 0, …) in B[ℓ₊ᵖ]. It is readily verified that each D_{aₙ} is of finite rank. (Indeed, y is in R(D_{aₙ}) if and only if y = Σ_{i=0}^n α_i ξ_i e_i for some x = {ξ_j}_{j=0}^∞ in ℓ₊ᵖ, so that R(D_{aₙ}) ⊆ span{e_i}_{i=0}^n, and hence dim R(D_{aₙ}) ≤ n + 1 by Theorem 2.6.) Moreover (Example 4.H again), ‖D_{aₙ} − D_a‖ = sup_{k≥n+1} |α_k| for every n ≥ 0. If lim_n |α_n| = 0, then lim sup_n |α_n| = lim_n sup_{k≥n} |α_k| = 0 (Problem 3.13), and so lim_n ‖D_{aₙ} − D_a‖ = 0. Thus α_n → 0 implies D_{aₙ} →ᵘ D_a, which in turn implies that D_a is compact by Corollary 4.55.

(b) Conversely, if α_j ↛ 0, then {α_j}_{j=0}^∞ has a subsequence, say {α_{jₙ}}_{n=0}^∞, such that inf_n |α_{jₙ}| > 0. Set ε = inf_n |α_{jₙ}| > 0, and note that ‖D_ae_{j_m} − D_ae_{jₙ}‖ᵖ = ‖α_{j_m}e_{j_m} − α_{jₙ}e_{jₙ}‖ᵖ = |α_{j_m}|ᵖ + |α_{jₙ}|ᵖ ≥ 2εᵖ whenever m ≠ n. Then no subsequence of {D_ae_{jₙ}}_{n=0}^∞ is a Cauchy sequence, and so none converges in ℓ₊ᵖ. Therefore the bounded sequence {e_{jₙ}}_{n=0}^∞ is such that its image under D_a, {D_ae_{jₙ}}_{n=0}^∞, has no convergent subsequence. Hence D_a is not compact by Theorem 4.52(c). Conclusion: If D_a is compact, then α_k → 0 as k → ∞.
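Both halves of the example can be sketched numerically. Below, α_k = 1/(k+1) (so α_k → 0), the tail sup is approximated over finitely many indices, and all names are illustrative:

```python
# Example 4.N numerically: with alpha_k = 1/(k+1), the finite-rank
# truncations D_{a_n} = diag(alpha_0, ..., alpha_n, 0, ...) converge
# uniformly to D_a, since for diagonal operators
#     ||D_{a_n} - D_a|| = sup_{k > n} |alpha_k|.

def alpha(k):
    return 1.0 / (k + 1)

def trunc_gap(n, m=10**6):
    # sup of the dropped diagonal entries alpha_{n+1}, alpha_{n+2}, ...
    return max(abs(alpha(k)) for k in range(n + 1, m))

for n in (1, 10, 1000):
    print(n, trunc_gap(n))       # equals 1/(n+2), tending to 0

# If alpha_k does NOT tend to 0 (inf |alpha_{j_n}| = eps > 0 along some
# subsequence), the images of the unit vectors stay uniformly apart:
#   ||D_a e_{j_m} - D_a e_{j_n}||^2 = |alpha_{j_m}|^2 + |alpha_{j_n}|^2
#                                  >= 2 * eps**2     for m != n,
# so {D_a e_{j_n}} has no Cauchy, hence no convergent, subsequence.
eps = 1.0
print((eps**2 + eps**2)**0.5)    # pairwise distance >= sqrt(2) * eps
```

The uniform gap to the nearest finite-rank truncation is exactly the first dropped diagonal entry, which is why decay of the diagonal is equivalent to compactness here.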
Example 4.O. According to Theorem 4.53 and Corollary 4.55, it follows that

  B₀[X, Y]⁻ lies in B∞[X, Y], and B∞[X, Y] is closed in B[X, Y],

whenever Y is a Banach space. Now consider the setup of Examples 4.M and 4.N, and note that Δ_∞ = {D_a ∈ Δ: α_k → 0} is precisely the collection of all compact diagonal operators on the Banach space ℓ₊ᵖ. It was shown in Example 4.M that Δ_∞ is not strongly closed in B[ℓ₊ᵖ] by exhibiting a sequence {D_n} of finite-rank diagonal operators (and hence a sequence of compact diagonal operators) that converges strongly to the identity I on ℓ₊ᵖ, which is not even compact. Moreover, strong convergence to compact diagonals is shown in Problem 4.53. Conclusion:

  B₀[ℓ₊ᵖ] is not strongly closed in B∞[ℓ₊ᵖ],

and

  B₀[ℓ₊ᵖ] and B∞[ℓ₊ᵖ] are not strongly closed in B[ℓ₊ᵖ].
It may be tempting to think that the converse of Corollary 4.55 holds. It in fact does hold whenever the Banach space Y has a Schauder basis (Problem 4.11), and it also holds if Y is a Hilbert space (next chapter). In these cases, every T in B∞[X , Y ] is the uniform limit of a B0 [X , Y ]-valued sequence. But such an italicized result fails in general (see Problem 4.58). However, every compact linear transformation comes close to having a finite-dimensional range in the following sense. Proposition 4.56. Let X and Y be normed spaces and take T in L[X , Y]. If T is compact, then for each ε > 0 there exists a finite-dimensional subspace Rε of the range R(T ) of T such that d(T x, Rε ) ≤ ε #x#
x ∈ X.
for every
Proof. Take an arbitrary ε > 0. If T ∈ L[X, Y] is compact, then the image T(B₁[0]) of the closed unit ball B₁[0] is totally bounded in the normed space R(T) by Theorem 4.52(f). Thus there exists a finite ε-net for T(B₁[0]), say {v_i}_{i=1}^{n_ε} ⊂ R(T). That is, for every y ∈ T(B₁[0]) there exists v_y ∈ {v_i}_{i=1}^{n_ε} such that ‖y − v_y‖ < ε. Set R_ε = span{v_i}_{i=1}^{n_ε} ⊆ R(T). R_ε is a finite-dimensional subspace of R(T) (Theorem 2.6, Proposition 4.7, and Corollary 4.28), and

(1/‖x‖) d(Tx, R_ε) = (1/‖x‖) inf_{u∈R_ε} ‖Tx − u‖ = inf_{u∈R_ε} ‖Tx/‖x‖ − u/‖x‖‖ = inf_{v∈R_ε} ‖T(x/‖x‖) − v‖ ≤ inf_{v∈{v_i}_{i=1}^{n_ε}} ‖T(x/‖x‖) − v‖ < ε

for every nonzero x in X, which concludes the proof (since 0 ∈ R_ε).
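Proposition 4.56 can be illustrated concretely for a compact diagonal operator. In the sketch below (our example, not the book's: diagonal entries a_k = 1/(k+1) on a finite truncation), the subspace R_ε is taken to be the span of the first ⌈1/ε⌉ coordinates, so the distance bound d(Tx, R_ε) ≤ ε‖x‖ reduces to a tail estimate.

```python
import numpy as np

N = 500
diag = 1.0 / (np.arange(N) + 1.0)   # compact diagonal operator: a_k = 1/(k+1)

eps = 0.05
n_eps = int(np.ceil(1.0 / eps))     # beyond index n_eps every diagonal entry is < eps

rng = np.random.default_rng(0)
ratios = []
for _ in range(100):
    x = rng.standard_normal(N)
    Tx = diag * x
    # distance from Tx to R_eps = span{e_1, ..., e_{n_eps}} is just the tail norm
    dist = np.linalg.norm(Tx[n_eps:])
    ratios.append(dist / np.linalg.norm(x))
```

Here the ε-net of the proof is bypassed: for a diagonal operator the tail of the diagonal already controls the distance to the coordinate subspace.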
Proposition 4.57. The range of a compact linear transformation is separable.

Proof. Let X be a normed space. Consider the collection {Bₙ(0)}ₙ∈ℕ of all open balls with center at the origin of X and radius n. This covers X; that is, X = ⋃ₙ Bₙ(0). Let Y be a normed space and let T: X → Y be any mapping of X into Y, so that (Problem 1.2(e))

R(T) = T(X) = T(⋃ₙ Bₙ(0)) = ⋃ₙ T(Bₙ(0)).

If T ∈ B∞[X, Y], then each T(Bₙ(0)) is separable (Theorem 4.52 and Proposition 3.72), and hence each T(Bₙ(0)) has a countable dense subset. That is, for each n ∈ ℕ there exists a countable set Aₙ ⊆ T(Bₙ(0)) such that Aₙ⁻ = T(Bₙ(0))⁻ (Problem 3.38(g)). Therefore

⋃ₙ Aₙ ⊆ ⋃ₙ T(Bₙ(0)) ⊆ ⋃ₙ Aₙ⁻ ⊆ (⋃ₙ Aₙ)⁻
4. Banach Spaces
(Section 3.5). Thus (⋃ₙ Aₙ)⁻ = R(T)⁻, so that ⋃ₙ Aₙ is dense in R(T) (Problem 3.38(g)). Moreover, since each Aₙ is countable, it follows by Corollary 1.11 that the countable union ⋃ₙ Aₙ is countable as well. Outcome: ⋃ₙ Aₙ is a countable dense subset of R(T), which means that R(T) is separable.

If T: X → Y is a compact linear transformation of a normed space X into a normed space Y, then the restriction T|M: M → Y of T to a linear manifold M of X is a linear transformation (Problem 2.14). Moreover, it is clear by the very definition of compact linear transformations that T|M is compact as well: the restriction of a compact linear transformation T: X → Y to a linear manifold of X is again a compact linear transformation (i.e., T|M lies in B∞[M, Y] whenever T lies in B∞[X, Y]). On the other hand, if M is a linear manifold of an infinite-dimensional Banach space X, and T: M → Y is a compact linear transformation of M into a Banach space Y, then it is easy to show that an arbitrary bounded linear extension of T over X may not be compact (e.g., see Problem 4.60). However, if M is dense in X, then the extension of T over X must be compact. In fact, the extension of a compact linear transformation T: X → Y over a completion of X into a completion of Y is again a compact linear transformation. Recall that every bounded linear transformation T of a normed space X into a normed space Y has a unique (up to an isometric isomorphism) bounded linear extension T̂ over the (essentially unique) completion X̂ of X into the (essentially unique) completion Ŷ of Y (Theorems 4.40 through 4.42). The next theorem says that T̂ ∈ B[X̂, Ŷ] is compact whenever T ∈ B[X, Y] is compact.

Theorem 4.58. Let the Banach spaces X̂ and Ŷ be completions of the normed spaces X and Y, respectively. If T lies in B∞[X, Y], then its bounded linear extension T̂: X̂ → Ŷ lies in B∞[X̂, Ŷ].

Proof. Let X̂ and Ŷ be completions of X and Y.
Thus there exist dense linear manifolds X̃ and Ỹ of X̂ and Ŷ that are isometrically isomorphic to X and Y, respectively (cf. Definition 4.39 and Theorem 4.40). Let U_X ∈ G[X̃, X] and U_Y ∈ G[Y, Ỹ] denote such isometric isomorphisms. Take T ∈ B[X, Y] and set T̃ = U_Y T U_X ∈ B[X̃, Ỹ], so that the diagram

        U_X
   X̃ ------> X
   |          |
   | T̃        | T
   v          v
   Ỹ <------ Y
        U_Y

commutes. Now take an arbitrary bounded X̂-valued sequence {x̂ₙ}. Since X̃⁻ = X̂ (i.e., since inf{‖x̂ − x̃‖ : x̃ ∈ X̃} = 0 for every x̂ ∈ X̂; see Proposition 3.32), it follows that there exists an X̃-valued sequence {x̃ₙ} equiconvergent with {x̂ₙ} (i.e., such that ‖x̂ₙ − x̃ₙ‖ → 0; for instance, for each integer n take x̃ₙ in X̃ such that ‖x̂ₙ − x̃ₙ‖ < 1/(n+1)), which is bounded (since ‖x̃ₙ‖ ≤ ‖x̂ₙ − x̃ₙ‖ + ‖x̂ₙ‖ for every n). Consider the X-valued sequence {xₙ} such that xₙ = U_X x̃ₙ for each n, which is bounded too: ‖xₙ‖ = ‖U_X x̃ₙ‖ = ‖x̃ₙ‖ for every n. If T is compact, then the Y-valued sequence {Txₙ} has a convergent subsequence (Theorem 4.52), say {Txₙₖ}. Thus {U_Y Txₙₖ} converges in Ỹ (because U_Y is a homeomorphism). Therefore {T̃x̃ₙₖ} converges in Ŷ (since T̃x̃ₙₖ = U_Y T U_X x̃ₙₖ = U_Y Txₙₖ for each k) to, say, ŷ ∈ Ỹ ⊆ Ŷ. Hence {T̂x̂ₙₖ} converges in Ŷ. Indeed,

‖T̂x̂ₙₖ − ŷ‖ = ‖T̂(x̂ₙₖ − x̃ₙₖ) + T̃x̃ₙₖ − ŷ‖ ≤ ‖T̂‖ ‖x̂ₙₖ − x̃ₙₖ‖ + ‖T̃x̃ₙₖ − ŷ‖

for every k because T̃ = T̂|X̃, so that T̂x̂ₙₖ → ŷ as k → ∞ (reason: ‖T̃x̃ₙₖ − ŷ‖ → 0 and ‖x̂ₙₖ − x̃ₙₖ‖ → 0; see Proposition 3.5). Conclusion: T̂ maps bounded sequences into sequences that have a convergent subsequence; that is, T̂ is compact (Theorem 4.52).
4.10 The Hahn–Banach Theorem and Dual Spaces

Three extremely important results on continuous (i.e., bounded) linear transformations, which yield a solid foundation for a large portion of modern analysis, are the Open Mapping Theorem, the Banach–Steinhaus Theorem, and the Hahn–Banach Theorem. The Hahn–Banach Theorem is concerned with the existence of bounded linear extensions for bounded linear functionals (i.e., for scalar-valued bounded linear transformations), and it is the basis for several existence results that are often applied in functional analysis. In particular, the Hahn–Banach Theorem ensures the existence of a large supply of continuous linear functionals on a normed space X, and hence it is of fundamental importance for introducing the dual space of X (the collection of all continuous linear functionals on X).

Let M be any linear manifold of a linear space X and consider a linear transformation L: M → Y of M into a linear space Y. From a purely algebraic point of view, a plain linear extension L̂: X → Y of L over X has already been investigated in Theorem 2.9. On the other hand, if M is a dense linear manifold of the normed space X and T: M → Y is a bounded linear transformation of M into a Banach space Y, then T has a unique bounded linear extension T̂: X → Y over X (Theorem 4.35). In particular, every bounded linear functional on a dense linear manifold M of a normed space X has a bounded linear extension over X. The results of Section 3.8 (and also of Section 4.7), which ensure the existence of a uniformly continuous extension over a metric space X of a uniformly continuous mapping on a dense subset of X,
are called extension by continuity. What the Hahn–Banach Theorem does is to ensure the existence of a bounded linear extension f̂: X → F over X for every bounded linear functional f: M → F on any linear manifold M of the normed space X. (Here M is not necessarily dense in X, so that extension by continuity collapses.) We shall approach the Hahn–Banach Theorem step by step. The first steps are purely algebraic and, as such, could have been introduced in Chapter 2. To begin with, we consider the following lemma on linear functionals, acting on a linear manifold of a real linear space, that are dominated by a sublinear functional (i.e., by a nonnegative homogeneous and subadditive functional).

Lemma 4.59. Let M₀ be a proper linear manifold of a real linear space X. Take x₁ ∈ X\M₀ and consider the linear manifold M₁ of X generated by M₀ and x₁, M₁ = M₀ + span{x₁}. Let p: X → R be a sublinear functional on X. If f₀: M₀ → R is a linear functional on M₀ such that

f₀(x) ≤ p(x) for every x ∈ M₀,

then there exists a linear extension f₁: M₁ → R of f₀ over M₁ such that

f₁(x) ≤ p(x) for every x ∈ M₁.
Proof. Take an arbitrary vector x₁ in X\M₀.

Claim. There exists a real number c₁ such that

−p(−w − x₁) − f₀(w) ≤ c₁ ≤ p(w + x₁) − f₀(w) for every w ∈ M₀.

Proof. Since the linear functional f₀: M₀ → R is dominated by a subadditive functional p: X → R, it follows that

f₀(v) − f₀(u) = f₀(v − u) ≤ p(v − u) = p(v + x₁ − u − x₁) ≤ p(v + x₁) + p(−u − x₁)

for every u, v ∈ M₀. Therefore, −p(−u − x₁) − f₀(u) ≤ p(v + x₁) − f₀(v) for every u ∈ M₀ and every v ∈ M₀. Set

a₁ = sup_{u∈M₀} (−p(−u − x₁) − f₀(u)) and b₁ = inf_{v∈M₀} (p(v + x₁) − f₀(v)).

The above inequality ensures that a₁ and b₁ are real numbers, and also that a₁ ≤ b₁. Thus the claimed result holds for any c₁ ∈ [a₁, b₁].
Recall that every x in M₁ = M₀ + span{x₁} can be uniquely written as x = x₀ + αx₁ with x₀ in M₀ and α in R. Consider the functional f₁: M₁ → R defined by the formula f₁(x) = f₀(x₀) + αc₁ for every x ∈ M₁, where the pair (x₀, α) in M₀×R stands for the unique representation of x in M₀ + span{x₁}. It is readily verified that f₁ is a linear extension of f₀ over M₁ (i.e., f₁ inherits the linearity of f₀ and f₁|M₀ = f₀). We show now that p also dominates f₁. Take an arbitrary x = x₀ + αx₁ in M₁ and consider the three possibilities, viz., α = 0, α > 0, or α < 0. If α = 0, then f₁(x) ≤ p(x) trivially (in this case, f₁(x) = f₀(x₀) ≤ p(x₀) = p(x)). Next recall that p is nonnegative homogeneous (i.e., p(γz) = γp(z) for every z ∈ X and every γ ≥ 0), and consider the above claimed inequalities. If α > 0, then

f₁(x) = f₀(x₀) + αc₁ ≤ f₀(x₀) + αp(x₀/α + x₁) − αf₀(x₀/α) = p(x₀ + αx₁) = p(x).

On the other hand, if α < 0, then

f₁(x) = f₀(x₀) + αc₁ = f₀(x₀) − |α|c₁ ≤ f₀(x₀) + |α|p(−x₀/α − x₁) + |α|f₀(x₀/α) = p(x₀ − |α|x₁) = p(x₀ + αx₁) = p(x),

which concludes the proof.
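The interval [a₁, b₁] in the proof can be computed numerically in a toy case. In the sketch below (all choices are ours, for illustration only: X = R², M₀ = span{(1, 0)}, f₀(t, 0) = t, p the ℓ¹-norm, and x₁ = (0, 1)), the two bounds are approximated on a grid, and any c₁ between them yields an extension f₁ still dominated by p.

```python
import numpy as np

# p is the l1-norm on R^2, a sublinear functional; f0 on M0 is f0(t, 0) = t.
p = lambda v: abs(v[0]) + abs(v[1])
f0 = lambda t: t

ts = np.linspace(-50.0, 50.0, 100001)
a1 = max(-p((-t, -1.0)) - f0(t) for t in ts)   # sup of the lower bounds in the Claim
b1 = min(p((t, 1.0)) - f0(t) for t in ts)      # inf of the upper bounds

c1 = 0.5 * (a1 + b1)                            # any c1 in [a1, b1] works
f1 = lambda t, alpha: f0(t) + alpha * c1        # extension to M1 = M0 + span{x1}

# Domination f1 <= p checked on a grid of points x = (t, alpha) of M1.
for t in np.linspace(-5.0, 5.0, 41):
    for alpha in np.linspace(-5.0, 5.0, 41):
        assert f1(t, alpha) <= p((t, alpha)) + 1e-9
```

For this choice the interval works out to [−1, 1], consistent with the fact that f₁(t, α) = t + αc₁ is dominated by |t| + |α| exactly when |c₁| ≤ 1.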
Theorem 4.60. (Real Hahn–Banach Theorem). Let M be a linear manifold of a real linear space X and let p: X → R be a sublinear functional on X. If f: M → R is a linear functional on M such that

f(x) ≤ p(x) for every x ∈ M,

then there exists a linear extension f̂: X → R of f over X such that

f̂(x) ≤ p(x) for every x ∈ X.
Proof. First note that, except for the dominance condition that travels from f to f̂, this could be viewed as a particular case of Theorem 2.9, the kernel of whose proof is Zorn's Lemma. Let

K = {φ ∈ L[N, R] : N ∈ Lat(X), M ⊆ N and f = φ|N... no, f = φ|M}

be the collection of all linear functionals on linear manifolds of the real linear space X which extend the linear functional f: M → R, and set

K′ = {φ ∈ K : φ(x) ≤ p(x) for every x ∈ N = D(φ)}.

Note that K′ is not empty (for f ∈ K′). Following the proof of Theorem 2.9, K is partially ordered (in the extension ordering), and so is its subcollection K′,
and every chain in K has a supremum in K. Then every chain {φ_γ} in K′ has a supremum ⋁_γ φ_γ in K, which actually lies in K′. Indeed, since each φ_γ is such that φ_γ(x) ≤ p(x) for every x in the domain D(φ_γ) of φ_γ, and since {φ_γ} is a chain, it follows that (⋁_γ φ_γ)(x) ≤ p(x) for every x in the domain ⋃_γ D(φ_γ) of ⋁_γ φ_γ. (In fact, if x ∈ ⋃_γ D(φ_γ), then x ∈ D(φ_μ) for some φ_μ ∈ {φ_γ}, and so (⋁_γ φ_γ)(x) = φ_μ(x) ≤ p(x) because (⋁_γ φ_γ)|D(φ_μ) = φ_μ.) Therefore, every chain in K′ has a supremum (and hence an upper bound) in K′. Thus, according to Zorn's Lemma, K′ has a maximal element, say φ₀: N₀ → R. Now we apply Lemma 4.59 to show that N₀ = X. Suppose N₀ ≠ X. Take x₁ ∈ X\N₀ and consider the linear manifold N₁ of X generated by N₀ and x₁, N₁ = N₀ + span{x₁}, which properly includes N₀. Since φ₀ ∈ K′, it follows that φ₀(x) ≤ p(x) for every x ∈ N₀. Thus, according to Lemma 4.59, there exists a linear extension φ₁: N₁ → R of φ₀ over N₁ such that φ₁(x) ≤ p(x) for every x ∈ N₁. Therefore φ₀ ≤ φ₁ ∈ K′, which contradicts the fact that φ₀ is maximal in K′ (for φ₀ ≠ φ₁). Conclusion: N₀ = X. Outcome: φ₀ is a linear extension of f over X which is dominated by p.

Theorem 4.61. (Hahn–Banach Theorem). Let M be a linear manifold of a linear space X over F and let p: X → R be a seminorm on X. If f: M → F is a linear functional on M such that

|f(x)| ≤ p(x) for every x ∈ M,

then there exists a linear extension f̂: X → F of f over X such that

|f̂(x)| ≤ p(x) for every x ∈ X.
Proof. As we have agreed in the introduction of this chapter, F denotes either the complex field C or the real field R. Recall that a seminorm is a convex (i.e., an absolutely homogeneous and subadditive) nonnegative functional.

(a) If F = R, then this is an easy corollary of the previous theorem. Indeed, if F = R, then the condition |f| ≤ p trivially implies f ≤ p on M. As a seminorm is a sublinear functional, Theorem 4.60 ensures the existence of an extension f̂: X → R of f over X such that f̂ ≤ p on X. Since f̂ is linear and p is absolutely homogeneous, it follows that −f̂(x) = f̂(−x) ≤ p(−x) = |−1|p(x) = p(x) for every x ∈ X. Hence −p ≤ f̂, and so |f̂| ≤ |p| = p on X (for p is nonnegative).

(b) Suppose F = C. Note that the complex linear space X can also be viewed as a real linear space (where scalar multiplication now means multiplication by real scalars only). Moreover, if M is a linear manifold of the (complex) linear space X, then it is also a (real) linear manifold of X when X is regarded as a real linear space. Furthermore, if f: M → C is a complex-valued functional on M, and if g: M → R and h: M → R are defined by g(x) = Re f(x) and h(x) = Im f(x) for every x ∈ M, then
f = g + ih. Now recall that f: M → C is linear. Thus, for an arbitrary α ∈ R,

g(αx) + ih(αx) = f(αx) = αf(x) = αg(x) + iαh(x),

and hence g(αx) = αg(x) and h(αx) = αh(x) for every x ∈ M (because g and h are real-valued). Similarly, for every x, y ∈ M,

g(x + y) + ih(x + y) = f(x + y) = f(x) + f(y) = g(x) + ih(x) + g(y) + ih(y),

and so g(x + y) = g(x) + g(y) and h(x + y) = h(x) + h(y). Conclusion: g: M → R and h: M → R are linear functionals on M when M is regarded as a real linear space. Observe that f(ix) = if(x), and hence g(ix) + ih(ix) = ig(x) − h(x), for every x ∈ M. Since g(x), g(ix), h(x), and h(ix) are real numbers, it follows that h(x) = −g(ix), and therefore f(x) = g(x) − ig(ix), for every x ∈ M. If |f| ≤ p on M, then g = Re f ≤ p on M. Since g is a linear functional on the (real) linear manifold M, and since p is a sublinear functional on the (real) linear space X (because it is a seminorm on the complex linear space X), it follows by Theorem 4.60 that there exists a real-valued linear extension ĝ of g over the (real) linear space X such that ĝ ≤ p on X. Consider the functional f̂: X → C (on the complex linear space X) defined by

f̂(x) = ĝ(x) − iĝ(ix) for every x ∈ X.

It is clear that f̂ extends f over X (reason: if x ∈ M, so that ix ∈ M, then f̂(x) = ĝ(x) − iĝ(ix) = g(x) − ig(ix) = f(x)). It is also readily verified that f̂ is a linear functional on the complex space X. Indeed, additivity and (real) homogeneity are trivially verified (because ĝ is additive and homogeneous on the real linear space X). Thus it suffices to verify that f̂(ix) = if̂(x) for every x ∈ X. In fact,

f̂(ix) = ĝ(ix) − iĝ(−x) = ĝ(ix) + iĝ(x) = i(ĝ(x) − iĝ(ix)) = if̂(x)

for every x ∈ X. Therefore, f̂ is a linear extension of f over X. Finally, we show that |f̂(x)| ≤ p(x) for every x ∈ X. Take an arbitrary x in X and write the complex number f̂(x) in polar form: f̂(x) = ρe^{iθ} (if f̂(x) = 0, then ρ = 0 and θ is any number in [0, 2π]).
Since f̂ is a linear functional on the complex space X, it follows that f̂(e^{−iθ}x) = e^{−iθ}f̂(x) = ρ = |f̂(x)|, a real number. Then

|f̂(x)| = f̂(e^{−iθ}x) = ĝ(e^{−iθ}x) ≤ p(e^{−iθ}x) = |e^{−iθ}|p(x) = p(x),

since p: X → R is absolutely homogeneous on the complex space X.
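The identity f(x) = g(x) − ig(ix), which recovers a complex-linear functional from its real part alone, is easy to sanity-check numerically; the functional in the sketch below is an arbitrary choice of ours on C³.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(3) + 1j * rng.standard_normal(3)

f = lambda x: np.dot(a, x)   # a complex-linear functional on C^3
g = lambda x: f(x).real      # g = Re f, which is only real-linear

errs = []
for _ in range(100):
    x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    # recover f from its real part: f(x) = g(x) - i g(ix)
    errs.append(abs(f(x) - (g(x) - 1j * g(1j * x))))

max_err = max(errs)
```

The point of the proof is that only g needs to be extended by the real Hahn–Banach Theorem; the formula above then rebuilds a complex-linear extension automatically.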
Theorems 4.60 and 4.61 are called Dominated Extension Theorems (in which no topology is involved). The next one is the Continuous Extension Theorem.

Theorem 4.62. (Hahn–Banach Theorem in Normed Space). Let M be a linear manifold of a normed space X. Every bounded linear functional f: M → F defined on M has a bounded linear extension f̂: X → F over the whole space X such that ‖f̂‖ = ‖f‖.

Proof. Take an arbitrary f ∈ B[M, F], so that |f(x)| ≤ ‖f‖ ‖x‖ for each x ∈ M. Set p(x) = ‖f‖ ‖x‖ for every x ∈ X, which defines a seminorm on X. (Indeed, p: X → R is a norm on X whenever f ≠ 0, because it is a multiple of a norm ‖·‖: X → R.) Since |f(x)| ≤ p(x) for every x ∈ M, it follows by the previous theorem that there exists a linear extension f̂: X → F of f over X such that |f̂(x)| ≤ p(x) = ‖f‖ ‖x‖ for every x ∈ X. Thus f̂ is bounded (i.e., f̂ ∈ B[X, F]) and ‖f̂‖ ≤ ‖f‖. On the other hand, f(x) = f̂(x) for every x ∈ M (because f = f̂|M), and hence

‖f‖ = sup_{x∈M, ‖x‖≤1} |f̂(x)| ≤ sup_{x∈X, ‖x‖≤1} |f̂(x)| = ‖f̂‖.

Therefore ‖f̂‖ = ‖f‖.
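In finite dimensions a norm-preserving extension can sometimes be written down explicitly. The sketch below is our illustration, not the book's construction: on X = (R⁶, ‖·‖∞), where a functional f(y) = ⟨a, y⟩ has dual norm ‖a‖₁, a functional defined on the one-dimensional manifold span{v} is extended by reading off a coordinate of maximal modulus, and the extension keeps the same norm, as Theorem 4.62 guarantees some extension must.

```python
import numpy as np

rng = np.random.default_rng(2)
v = rng.standard_normal(6)
v_inf = np.max(np.abs(v))

# f on M = span{v}, with X = R^6 under the sup-norm: f(t v) = t, so ||f|| = 1 / ||v||_inf.
norm_f = 1.0 / v_inf

# Extension: pick a coordinate where |v_k| = ||v||_inf and read it off.
k = int(np.argmax(np.abs(v)))
coeff = np.zeros(6)
coeff[k] = 1.0 / v[k]
f_hat = lambda x: np.dot(coeff, x)

# f_hat agrees with f on M, and its dual norm (the l1-norm of coeff) equals ||f||.
for t in (-2.0, 0.5, 3.0):
    assert abs(f_hat(t * v) - t) < 1e-12
```

Choosing any coordinate other than a maximal one would still extend f, but with a strictly larger norm; the theorem says a norm-preserving choice always exists.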
Here are some useful consequences of the Hahn–Banach Theorem.

Corollary 4.63. Let M be a proper subspace of a normed space X. For each vector x in X\M there exists a bounded linear functional f: X → F such that f(x) = 1, f(M) = {0}, and ‖f‖ = d(x, M)⁻¹.

Proof. Take an arbitrary x ∈ X\M. Since M = M⁻, the distance from x to M is strictly positive; that is, d(x, M) ≠ 0 (Problem 3.43(b)). Thus d(x, M)⁻¹ is well defined. Consider the linear manifold M_x of X generated by M and x, M_x = M + span{x}, so that every y in M_x can be uniquely written as y = u + αx with u in M and α in F. Let f_x: M_x → F be a functional on M_x defined by f_x(u + αx) = α for every y = u + αx ∈ M_x. It is easy to verify that f_x is linear, f_x(M) = {0}, and f_x(x) = 1. Next we show that f_x is bounded (so that f_x ∈ B[M_x, F]) and ‖f_x‖ = d(x, M)⁻¹. Consider the set

S = {(u, α) ∈ M×F : (u, α) ≠ (0, 0)}

and its partition {S₁, S₂}, where

S₁ = {(u, α) ∈ M×F : α ≠ 0}, S₂ = {(u, α) ∈ M×F : u ≠ 0 and α = 0}.

Note that y = u + αx = 0 (in M_x) if and only if u = 0 (in M) and α = 0 (in F) (reason: M is a linear manifold of X and x ∈ X\M, so that span{x} ∩ M = {0}). Hence y = u + αx ≠ 0 in M_x if and only if (u, α) ∈ S, which implies that

‖f_x‖ = sup_{0≠y∈M_x} |f_x(y)|/‖y‖ = sup_{(u,α)∈S} |α|/‖u + αx‖ = sup_{(u,α)∈S₁} |α|/‖u + αx‖,

since sup_{(u,α)∈S₂} |α|/‖u + αx‖ = 0 and S = S₁ ∪ S₂. However, inf_{v∈M} ‖v + x‖ = inf_{v∈M} ‖x − v‖ = d(x, M) ≠ 0, and so (Problem 4.5)

sup_{(u,α)∈S₁} |α|/‖u + αx‖ = sup_{(u,α)∈S₁} 1/‖α⁻¹u + x‖ = sup_{v∈M} 1/‖v + x‖ = 1/d(x, M).

Summing up: f_x is a bounded linear functional on the linear manifold M_x of X such that f_x(x) = 1, f_x(M) = {0}, and ‖f_x‖ = d(x, M)⁻¹. Thus, according to Theorem 4.62, f_x: M_x → F has a bounded linear extension f: X → F over X such that ‖f‖ = ‖f_x‖ = d(x, M)⁻¹. Moreover, since f|M_x = f_x, we get f(x) = f_x(x) = 1 and f(M) = f_x(M) = {0}, which concludes the proof.

Recall that {0} is a proper subspace of any nonzero normed space X. Thus, according to Corollary 4.63, for each x ≠ 0 in X ≠ {0} there exists a bounded linear functional f: X → F such that f(x) = 1 and ‖f‖ = d(x, {0})⁻¹ = ‖x‖⁻¹. Now set f_x = ‖x‖f, which is again a bounded linear functional on X, so that f_x(x) = ‖x‖ and ‖f_x‖ = 1. Moreover, if x = 0, then take x′ ≠ 0 in X and a bounded linear functional f_{x′} on X such that f_{x′}(x′) = ‖x′‖ and ‖f_{x′}‖ = 1. Since f_{x′} is linear, f_{x′}(x) = f_{x′}(0) = 0 = ‖x‖. This proves the next corollary.

Corollary 4.64. For each vector x in a normed space X ≠ {0} there exists a bounded linear functional f: X → F such that ‖f‖ = 1 and f(x) = ‖x‖.

Consequently, there exist nonzero bounded linear functionals defined on every nonzero normed space. Let X and Y be normed spaces over the same field, and consider the normed space B[X, Y] of all bounded linear transformations of X into Y. If X ≠ {0}, then Corollary 4.64 ensures the existence of f ≠ 0 in B[X, F]. Suppose Y ≠ {0}, take any y ≠ 0 in Y, and set T(x) = f(x)y for every x ∈ X. This defines a nonzero mapping T: X → Y which certainly is linear and bounded. Conclusion: There exists T ≠ O in B[X, Y] whenever X and Y are nonzero normed spaces.

Example 4.P. Proposition 4.15 says that B[X, Y] is a Banach space whenever Y is a Banach space. If X = {0}, then B[X, Y] = {O}, which is a trivial
Banach space regardless of whether Y is Banach or not. Thus the converse of Proposition 4.15 should read as follows: if X ≠ {0} and B[X, Y] is a Banach space, then Y is a Banach space. Corollary 4.64 asserts that there exists f ≠ 0 in B[X, F] whenever X ≠ {0}. Take an arbitrary Cauchy sequence {yₙ} in Y and consider the B[X, Y]-valued sequence {Tₙ} such that, for each n, Tₙx = f(x)yₙ for every x ∈ X. Each Tₙ in fact lies in B[X, Y] because f lies in B[X, F]. Indeed, for each integer n,

‖Tₙ‖ = sup_{‖x‖≤1} ‖Tₙx‖ = sup_{‖x‖≤1} |f(x)| ‖yₙ‖ = ‖f‖ ‖yₙ‖,

and, for any pair of integers m and n,

‖Tₘ − Tₙ‖ = sup_{‖x‖≤1} ‖(Tₘ − Tₙ)x‖ = ‖f‖ ‖yₘ − yₙ‖.

Hence {Tₙ} is a Cauchy sequence in B[X, Y]. If B[X, Y] is complete, then {Tₙ} converges in B[X, Y] to, say, T ∈ B[X, Y]. Since f ≠ 0, there exists x₀ ∈ X such that f(x₀) ≠ 0. Therefore

yₙ = f(x₀)⁻¹Tₙ(x₀) → f(x₀)⁻¹T(x₀) in Y
(since uniform convergence implies strong convergence to the same limit), and so {yₙ} converges in Y. Conclusion: If X ≠ {0} and B[X, Y] is complete, then Y is complete.

The dual space (or conjugate space) of a normed space X, denoted by X*, is the normed space of all continuous linear functionals on X (i.e., X* = B[X, F], where F stands either for the real field R or the complex field C, according as X is a real or complex normed space). Obviously, X* = {0} whenever X = {0}. Corollary 4.64 ensures the converse: X* ≠ {0} whenever X ≠ {0}. Indeed, if f(x) = 0 for all f ∈ X*, then x = 0. As a matter of fact, if x ≠ y in X, then there exists f ∈ X* such that f(x) − f(y) = f(x − y) = ‖x − y‖ ≠ 0 (Corollary 4.64 again), and so f(x) ≠ f(y). This is usually expressed by saying that X* separates the points of X. Still from Corollary 4.64, for each nonzero x ∈ X there exists f₀ ∈ X* such that ‖f₀‖ = 1 and ‖x‖ = |f₀(x)|. Therefore,

‖x‖ = |f₀(x)|/‖f₀‖ ≤ sup_{f∈X*, ‖f‖≤1} |f(x)| ≤ ‖x‖ and ‖x‖ = |f₀(x)|/‖f₀‖ ≤ sup_{0≠f∈X*} |f(x)|/‖f‖ ≤ ‖x‖

(recall that |f(x)| ≤ ‖f‖ ‖x‖ for every x ∈ X and every f ∈ X*), which shows a symmetry in the definitions of the norms in X and X*:

‖x‖ = sup_{f∈X*, ‖f‖≤1} |f(x)| = sup_{0≠f∈X*} |f(x)|/‖f‖
for every x ∈ X. Note that, by Proposition 4.15, X* is a Banach space for every normed space X (reason: X* = B[X, F] and (F, | |) is a Banach space).

Proposition 4.65. If the dual space X* of a normed space X is separable, then X itself is separable.

Proof. If X* = {0}, then X = {0} and the result holds trivially. Thus suppose X* ≠ {0} is separable and consider the unit sphere about the origin of X*, viz., S₁ = {f ∈ X* : ‖f‖ = 1}. Since every subset of a separable metric space is separable (Corollary 3.36), it follows that S₁ includes a countable dense subset, say {fₙ}. For each fₙ there exists xₙ ∈ X such that

‖xₙ‖ = 1 and 1/2 < |fₙ(xₙ)|.
Indeed, sup_{‖x‖=1} |fₙ(x)| = ‖fₙ‖ = 1 because fₙ ∈ S₁. Consider the countable subset {xₙ} of X and set M = span{xₙ}. If M⁻ ≠ X, then Corollary 4.63 ensures that there exists 0 ≠ f ∈ X* such that f(M) = {0}. Set f₀ = ‖f‖⁻¹f in X*, so that

f₀ ∈ S₁ and f₀(xₙ) = 0 for every n.
Hence |fₙ(xₙ)| ≤ |(fₙ − f₀)(xₙ)| ≤ ‖fₙ − f₀‖, and therefore

1/2 ≤ ‖fₙ − f₀‖ for every n.

But this implies that the set {fₙ} is not dense in S₁ (see Proposition 3.32), which contradicts the very definition of {fₙ}. Outcome: M⁻ = X, and so X is separable by Proposition 4.9(b).

If X ≠ {0}, then X* ≠ {0}, and so (X*)*, the dual of X*, is again a nonzero Banach space. We shall write X** instead of (X*)*, which is called the second dual (or bidual) of X. It is clear that X, X*, and X** are normed spaces over the same scalar field. The next result shows that X can be identified with a linear manifold of X**. Thus X is naturally embedded in its second dual X**.

Theorem 4.66. Every normed space X is isometrically isomorphic to a linear manifold of X**.

Proof. Suppose X ≠ {0} (otherwise the result is trivial), take an arbitrary x in X, and consider the functional φ_x: X* → F defined on the dual X* of X by φ_x(f) = f(x) for every f in X*. Since X* is a linear space,
φ_x(αf + βg) = (αf + βg)(x) = αf(x) + βg(x) = αφ_x(f) + βφ_x(g)

for every f, g ∈ X* and every α, β ∈ F, so that φ_x is linear. Moreover, since the elements of X* are bounded and linear, it also follows that |φ_x(f)| = |f(x)| ≤ ‖f‖ ‖x‖ for every f ∈ X*, and so φ_x is bounded. Thus φ_x ∈ X**. Indeed, ‖φ_x‖ = ‖x‖, since Corollary 4.64 ensures the existence of f₀ ∈ X* such that ‖f₀‖ = 1 and |f₀(x)| = ‖x‖, and therefore

‖φ_x‖ = sup_{‖f‖≤1} |f(x)| ≤ ‖x‖ = |f₀(x)| = |φ_x(f₀)| ≤ ‖φ_x‖ ‖f₀‖ = ‖φ_x‖.
Let Φ: X → X** be the mapping that assigns to each vector x in X the functional φ_x in X**; that is, Φ(x) = φ_x for every x ∈ X. It is easy to verify that Φ is linear. Since ‖Φ(x)‖ = ‖x‖ for every x ∈ X, it follows that Φ is a linear isometry of X into X** (Proposition 4.37). Hence Φ: X → R(Φ) ⊆ X** is an isometric isomorphism of X onto R(Φ) = Φ(X), the range of Φ. Thus the range Φ(X) of Φ is a linear manifold of X** isometrically isomorphic to X.

Remark: If X is a Banach space, then the linear isometry Φ: X → X** has a closed range R(Φ) (see Problem 4.41(d)). Thus the previous theorem ensures that every Banach space X is isometrically isomorphic to a subspace of X**. This linear isometry Φ: X → X** is known as the natural embedding of the normed space X into its second dual X**. If Φ is surjective (i.e., if Φ(X) = X**), then we say that X is reflexive. Equivalently, X is reflexive if and only if the natural embedding Φ: X → X** is an isometric isomorphism of X onto X**. Thus, if X is reflexive, then X and X** are isometrically isomorphic (notation: X ≅ X**). The converse, however, fails: X ≅ X** clearly implies Φ(X) ≅ X** (since Φ(X) ≅ X ≅ X** and a composition of isometric isomorphisms is again an isometric isomorphism) but does not imply Φ(X) = X**. Since X** (the dual of X*) always is a Banach space, it follows by Problem 4.37(b) that every reflexive normed space is a Banach space. Again, the converse fails (i.e., there exist nonreflexive Banach spaces, as we shall see in Example 4.S). Recall that separability is a topological invariant (see Problem 3.48), so that the converse of Proposition 4.65 holds for reflexive Banach spaces. Indeed, if X ≅ X**, then X is separable if and only if X** is separable, which implies that X* is separable by Proposition 4.65.
Therefore, if X is separable and X* is not separable, then X ≇ X**, and so X is not reflexive: a separable Banach space with a nonseparable dual is not reflexive. This provides a necessary condition for reflexivity. Here is an equivalent condition.
Proposition 4.67. A Banach space X is reflexive if and only if for each φ ∈ X** there exists x ∈ X such that

φ(f) = f(x) for every f ∈ X*.
Proof. Let Φ: X → X** be the natural embedding of X into X** and take an arbitrary φ ∈ X**. There exists x ∈ X such that φ(f) = f(x) for every f ∈ X* if and only if φ = φ_x, which means that φ ∈ R(Φ) (cf. proof of Theorem 4.66). Therefore, for each φ ∈ X** there exists an x ∈ X such that φ(f) = f(x) for every f ∈ X* if and only if Φ is surjective.

Example 4.Q. Let X be a finite-dimensional normed space. In this case, dim X = dim X* by Problem 4.64. Moreover, dim X* = dim X** because X* is finite-dimensional (Problem 4.64 again). Hence dim X = dim X**. Take the natural embedding Φ: X → X** of X into X**. Since Φ(X) is a linear manifold of the finite-dimensional linear space X**, it follows by Problem 2.7 that Φ(X) also is a finite-dimensional linear space. Thus, as X and Φ(X) are topologically isomorphic finite-dimensional normed spaces, dim Φ(X) = dim X by Corollary 4.31. Then Φ(X) is a linear manifold of the finite-dimensional space X** and dim Φ(X) = dim X**. Hence Φ(X) = X** (Problem 2.7 again). Conclusion: Every finite-dimensional normed space is reflexive.
Conclusion: p is a reflexive Banach space for every p > 1. + 2 In particular, the very special space + , besides being reflexive, is also isometrically equivalent to its own dual (actually, as we shall see in Section 5.11, the 2 2 ∗ real space + is isometrically isomorphic to (+ ) ).
Example 4.S. There are, however, nonreflexive (also referred to as irreflexive) ∞ 1 Banach spaces. For instance, consider the linear spaces + and + equipped co with their usual norms (# #∞ and # #1 , respectively). Since + is a linear ∞ manifold of the linear space + , equip it with the sup-norm as well. Recall co 1 ∞ that + and + are separable Banach spaces but the Banach space + is not
270
4. Banach Spaces
separable (see Examples 3.P to 3.S and also Problems 3.49 and 3.59). It is not co ∗ ∼ 1 co ∗∗ ∼ ∞ 1 ∗ ∼ ∞ very difficult to verify that (+ ) = + and (+ ) = + , and so (+ ) = + 1 (Problem 4.65). Thus + is a separable Banach space with a nonseparable 1 ∗ 1 ∗ ∼ ∞ dual (reason: (+ ) is not separable because (+ ) = + and separability is a topological invariant). Hence, 1 + is a nonreflexive Banach space . co co ∗∗ ∼ ∞ ∞ is not even homeomorphic to + (Problem 3.49) and (+ ) = + , Since + co ∼ co ∗∗ it follows that + = (+ ) . Therefore, co is a nonreflexive Banach space . +
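The duality pairing behind Example 4.R is Hölder's inequality, which is easy to check numerically on finite sequences. The sketch below (our own choice p = 3, so q = 3/2) verifies |⟨x, y⟩| ≤ ‖x‖_p ‖y‖_q for random y, with equality for the norming sequence y_k = sign(x_k)|x_k|^{p−1}, which is how J_q identifies ℓ^q-type sequences with functionals on ℓ^p-type sequences.

```python
import numpy as np

p = 3.0
q = p / (p - 1.0)                     # the Hoelder conjugate: 1/p + 1/q = 1
rng = np.random.default_rng(4)
x = rng.standard_normal(50)

lp = lambda v, r: np.sum(np.abs(v) ** r) ** (1.0 / r)   # finite-sequence p-norm

# Hoelder's inequality |<x, y>| <= ||x||_p ||y||_q for random y ...
for _ in range(200):
    y = rng.standard_normal(50)
    assert abs(np.dot(x, y)) <= lp(x, p) * lp(y, q) + 1e-9

# ... with equality for the norming sequence y0.
y0 = np.sign(x) * np.abs(x) ** (p - 1.0)
```

A short computation shows why y0 works: ⟨x, y0⟩ = Σ|x_k|^p while ‖y0‖_q = (Σ|x_k|^p)^{1/q}, so the ratio is exactly ‖x‖_p.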
Suggested Reading

Bachman and Narici [1], Banach [1], Beauzamy [1], Berberian [2], Brown and Pearcy [2], Conway [1], Day [1], Douglas [1], Dunford and Schwartz [1], Goffman and Pedrick [1], Goldberg [1], Hille and Phillips [1], Istrăţescu [1], Kantorovich and Akilov [1], Kolmogorov and Fomin [1], Kreyszig [1], Lax [1], Maddox [1], Naylor and Sell [1], Pietsch [1], Reed and Simon [1], Robertson and Robertson [1], Rudin [1], Rynne and Youngson [1], Schwartz [1], Simmons [1], Taylor and Lay [1], Yosida [1]
Problems

Problem 4.1. We shall say that a topology T on a linear space X over a field F is compatible with the linear structure of X if vector addition and scalar multiplication are continuous mappings of X×X into X and of F×X into X, respectively. In this case T is said to be a compatible topology (or a linear topology) on X. When we refer to continuity of the mappings X×X → X and F×X → X defined by (x, y) → x + y and (α, x) → αx, respectively, it is understood that X×X and F×X are equipped with their product topology. If X is a metric space, then these are the topologies induced by any of the uniformly equivalent metrics of Problems 3.9 and 3.33. If X is a general topological space, then these are the product topologies (cf. remark in Problem 3.64). A topological vector space (or topological linear space) is a linear space X equipped with a compatible topology.
(a) Show that, for each y in a topological vector space X and each nonzero α in F, the translation mapping x → x + y and the scaling mapping x → αx are homeomorphisms of X onto itself.

(b) Show that every normed space is a topological vector space (a metrizable topological vector space, that is). In other words, show that vector addition and scalar multiplication are continuous mappings of X×X into X and of F×X into X, respectively, with respect to the norm topology on X.

Problem 4.2. Consider the definitions of convex set and convex hull in a linear space (Problem 2.2). Recall that the closure of a subset of a topological space is the intersection of all closed subsets that include it. Prove the following assertions.

(a) In a topological vector space the closure of a convex set is convex.

The intersection of all closed and convex subsets that include a subset A of a topological vector space is called the closed convex hull of A.

(b) In a topological vector space the closed convex hull of a set A coincides with co(A)⁻ (i.e., it coincides with the closure of the convex hull of A).

A subset A of a linear space is balanced if αx ∈ A for every vector x ∈ A and every scalar α such that |α| ≤ 1 (i.e., if αA ⊆ A whenever |α| ≤ 1). A subset of a linear space is absolutely convex if it is both convex and balanced.

(c) A subset A of a linear space is absolutely convex if and only if αx + βy ∈ A for every x, y ∈ A and all scalars α, β such that |α| + |β| ≤ 1 (hence the term "absolutely convex").

(d) The interior A° of an absolutely convex set A contains the origin whenever A° is nonempty.

(e) In a topological vector space the closure of a balanced set is balanced, and therefore the closure of an absolutely convex set is absolutely convex.

A subset A of a linear space X is absorbing (or absorbent) if for each vector x ∈ X there exists ε > 0 such that αx ∈ A for every scalar α with |α| ≤ ε. Equivalently, if for each x ∈ X there exists λ > 0 such that x ∈ μA for every scalar μ with |μ| ≥ λ.

(f) In a topological vector space every neighborhood of the origin is absorbing.

A subset A of a linear space X absorbs a subset B of X (or B is absorbed by A) if there is a β > 0 such that x ∈ B implies x ∈ μA for every scalar μ with |μ| ≥ β (i.e., if B ⊆ μA whenever |μ| ≥ β). Thus A is absorbing if and only if it absorbs every singleton {x} in X. A subset B of a topological vector space is said to be bounded if it is absorbed by every neighborhood of the origin.
4. Banach Spaces
Problem 4.3. Let X be a linear space over a field F (either F = C or F = R). A quasinorm on X is a real-valued positive subadditive functional ‖·‖: X → R on X that satisfies the axioms (i), (ii), and (iv) of Definition 4.1 but, instead of axiom (iii), it satisfies the following ones.
(iii′) ‖αx‖ ≤ ‖x‖ whenever |α| ≤ 1,
(iii″) ‖αₙx‖ → 0 whenever αₙ → 0,
for every x ∈ X (where α stands for a scalar in F and {αₙ} for a scalar-valued sequence). A linear space X equipped with a quasinorm is called a quasinormed space. Consider the mapping d: X×X → R defined by d(x, y) = ‖x − y‖ for every x, y ∈ X, where ‖·‖: X → R is a quasinorm on X.
(a) Show that d is an additively invariant metric on X that also satisfies d(αx, αy) ≤ d(x, y) for every x, y ∈ X and every α ∈ F such that |α| ≤ 1. This is called the metric generated by the quasinorm ‖·‖.
(b) Show that a norm on X is a quasinorm on X, so that every normed space is a quasinormed space.
A quasinormed space that is complete as a metric space (with respect to the metric generated by the quasinorm) is called an F-space.
(c) Show that every Banach space is an F-space.

Problem 4.4. A neighborhood base at a point x in a topological vector space is a collection N of neighborhoods of x with the property that every neighborhood of x includes some neighborhood in N. A locally convex space (or simply a convex space) is a topological vector space that has a neighborhood base at the origin consisting of convex sets.
(a) Show that every normed space is locally convex.
A barrel is a subset of a locally convex space that is convex, balanced, absorbing, and closed. It can be shown that every locally convex space has a neighborhood base at the origin consisting of barrels. A locally convex space is called barreled if every barrel is a neighborhood of the origin. Barreled spaces can be thought of as a generalization of Banach spaces. Indeed, a sequence {xₙ} in a locally convex space is a Cauchy sequence if for each neighborhood N of the origin there exists an integer n_N such that x_m − x_n ∈ N for all m, n ≥ n_N. It can be verified that every convergent sequence in a metrizable locally convex space X (i.e., every sequence that is eventually in every open neighborhood of a point x in X) is a Cauchy sequence.
A set A in a metrizable locally convex space is complete if every Cauchy sequence in A converges to a point of A. A complete metrizable locally convex space is called a Fréchet space. Recall that every Banach space is an F-space (see Problem 4.3).
(b) Show that every F-space is a Fréchet space.
(c) Show that every Fréchet space is a barreled space.
Hint: A Fréchet space X is a complete metric space. Take any barrel B in X. Show that the countable collection {nB}_{n≥1} of closed sets covers X. Apply the Baire Category Theorem (Theorem 3.58); see Problem 4.2(d). We shall return to barreled spaces in Problem 4.44.

Problem 4.5. Recall that a subset A of a metric space is bounded if and only if diam(A) < ∞ (see Section 3.1). Prove the following propositions.
(a) A subset A of a normed space is bounded if and only if sup_{x∈A} ‖x‖ < ∞. (By convention, sup_{x∈∅} ‖x‖ = 0.)
Now consider the definition of a bounded subset of a topological vector space as given in Problem 4.2.
(b) A set A is bounded as a subset of a normed space X if and only if it is bounded as a subset of the topological vector space X. In other words, sup_{x∈A} ‖x‖ < ∞ if and only if A is absorbed by every neighborhood of the origin of X.
Thus the notion of a bounded subset of a normed space is unambiguously defined. Let A be a subset of a normed space that contains a nonzero vector (A\{0} ≠ ∅).
(c) sup_{x∈A} ‖x‖ < ∞ implies inf_{x∈A\{0}} ‖x‖⁻¹ = (sup_{x∈A} ‖x‖)⁻¹.
(d) inf_{x∈A} ‖x‖ ≠ 0 implies sup_{x∈A\{0}} ‖x‖⁻¹ = (inf_{x∈A} ‖x‖)⁻¹.
(e) inf_{x∈A} ‖x‖ ≠ 0 if and only if sup_{x∈A\{0}} ‖x‖⁻¹ < ∞.
Note: inf_{x∈A} ‖x‖ ≠ 0 if and only if inf_{x∈A} ‖x‖ > 0, inf_{x∈A} ‖x‖ ≤ sup_{x∈A} ‖x‖, and inf_{x∈A} ‖x‖ < ∞ even if A is unbounded. A subset A ≠ ∅ of a normed space is bounded away from zero if inf_{x∈A} ‖x‖ ≠ 0. A mapping F of a nonempty set S into a normed space X is bounded if and only if sup_{s∈S} ‖F(s)‖ < ∞, and it is bounded away from zero if and only if inf_{s∈S} ‖F(s)‖ ≠ 0. Thus an X-valued sequence {xₙ} is bounded if and only if supₙ ‖xₙ‖ < ∞, and it is bounded away from zero if and only if infₙ ‖xₙ‖ ≠ 0.

Problem 4.6. This problem is entirely based on the triangle inequality. Consider the spaces (ℓ_+^1, ‖·‖_1) and (ℓ_+^∞, ‖·‖_∞) of Example 4.B. Take x = {ξ_k}_{k=1}^∞ in ℓ_+^∞ and let {x_n}_{n=1}^∞ be an ℓ_+^1-valued sequence (i.e., each x_n = {ξ_n(k)}_{k=1}^∞ lies in ℓ_+^1). Recall: ℓ_+^1 ⊂ ℓ_+^∞. Show that
(a) if ‖x_n − x‖_∞ → 0 and sup_n ‖x_n‖_1 < ∞, then x ∈ ℓ_+^1.
Hint: ∑_{k=1}^m |ξ_k| ≤ m‖x_n − x‖_∞ + sup_n ‖x_n‖_1 for each m ≥ 1.
Now suppose x ∈ ℓ_+^1 and show that
(b) if ‖x_n − x‖_∞ → 0 and ‖x_n‖_1 → ‖x‖_1, then ‖x_n − x‖_1 → 0.
Hint: If x = {ξ_k}_{k=1}^∞ and z = {ζ_k}_{k=1}^∞ are in ℓ_+^1, then, for each m ≥ 1,
  ‖z‖_1 + ‖x‖_1 ≤ ‖z − x‖_1 + 2m‖z‖_∞ + 2 ∑_{k=m+1}^∞ |ξ_k|.
Observe that ‖z‖_1 + ‖x‖_1 = ∑_{k=1}^m (|ξ_k| − |ζ_k|) + ∑_{k=m+1}^∞ (|ζ_k| − |ξ_k|) + 2(∑_{k=1}^m |ζ_k| + ∑_{k=m+1}^∞ |ξ_k|) and ||ξ_k| − |ζ_k|| ≤ |ξ_k − ζ_k|. Prove the above auxiliary inequality and conclude: if x, y ∈ ℓ_+^1, then
  ‖y − x‖_1 ≤ |‖y‖_1 − ‖x‖_1| + 2m‖y − x‖_∞ + 2 ∑_{k=m+1}^∞ |ξ_k|
for each m ≥ 1. Show that, under the assumptions of (b), lim sup_n ‖x_n − x‖_1 ≤ 2 ∑_{k=m+1}^∞ |ξ_k| for all m ≥ 1.
Next suppose the sequence {x_n}_{n=1}^∞ is dominated by a vector y in ℓ_+^1. That is, |ξ_n(k)| ≤ |υ_k| for each k ≥ 1 and all n ≥ 1, for some y = {υ_k}_{k=1}^∞ in ℓ_+^1. Equivalently, ∑_{k=1}^∞ sup_n |ξ_n(k)| < ∞. Show that
(c) if ‖x_n − x‖_∞ → 0 and ∑_{k=1}^∞ sup_n |ξ_n(k)| < ∞, then ‖x_n − x‖_1 → 0.
Hint: Item (a) and the dominance condition ensure that x ∈ ℓ_+^1 (since sup_n ‖x_n‖_1 ≤ ‖y‖_1 for some y ∈ ℓ_+^1). Moreover, for each m ≥ 1,
  ‖x_n − x‖_1 ≤ m‖x_n − x‖_∞ + ∑_{k=m+1}^∞ |ξ_n(k)| + ∑_{k=m+1}^∞ |ξ_k|.
Extend these results to ℓ_+^1(X) and ℓ_+^∞(X) as in Example 4.F.
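Item (c) is a sequential dominated convergence statement, and it lends itself to a numerical sanity check. The sketch below is illustrative only and not part of the text: the particular sequences, the truncation length K, and all names are assumptions made for the example.

```python
# Illustrative check of Problem 4.6(c): x_n = {(1 + 1/n) 2^{-k}}_k converges
# to x = {2^{-k}}_k in the sup norm and is dominated by y = {2^{-k+1}}_k,
# which lies in l^1_+; domination then forces convergence in the l^1 norm too.
# Sequences are truncated at K terms; the geometric tails are negligible.

K = 200  # truncation length (tail below 2**-200)

def x_n(n):
    return [(1 + 1 / n) * 2.0**-k for k in range(1, K + 1)]

x = [2.0**-k for k in range(1, K + 1)]

def sup_dist(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

def l1_dist(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

sup_dists = [sup_dist(x_n(n), x) for n in (1, 10, 100, 1000)]
l1_dists = [l1_dist(x_n(n), x) for n in (1, 10, 100, 1000)]

# both distances decrease to 0: sup-norm convergence plus domination
# yields l^1 convergence, as claimed in item (c)
assert all(a > b for a, b in zip(sup_dists, sup_dists[1:]))
assert all(a > b for a, b in zip(l1_dists, l1_dists[1:]))
```

Here the difference x_n − x is (1/n){2^{-k}}_k, so both distances scale like 1/n, which is exactly the behavior the assertions verify.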
Problem 4.7. Let {x_i}_{i=1}^∞ be a sequence in a normed space X and consider the sequence {y_n}_{n=1}^∞ of partial sums of {x_i}_{i=1}^∞; that is, y_n = ∑_{i=1}^n x_i for each n ≥ 1. Prove that the following assertions are pairwise equivalent.
(a) {y_n}_{n=1}^∞ is a Cauchy sequence in X.
(b) The real sequence {‖∑_{i=n}^{n+k} x_i‖}_{n=1}^∞ converges to zero uniformly in k; that is,
  lim_n sup_{k≥0} ‖∑_{i=n}^{n+k} x_i‖ = 0.
(c) For each real number ε > 0 there exists an integer n_ε ≥ 1 such that
  ‖∑_{i=m}^n x_i‖ < ε whenever n_ε ≤ m ≤ n.
(Hint: Problem 3.51.) Observe that, according to item (b), x_n → 0 in X whenever {∑_{i=1}^n x_i}_{n=1}^∞ is a Cauchy sequence in X and, in particular, whenever the infinite series ∑_{i=1}^∞ x_i converges in X. That is, x_n → 0 in X for every summable sequence {x_i}_{i=1}^∞ in X. Now suppose X is a Banach space. Since X is complete, the very definition of completeness says that each of the above equivalent assertions also is equivalent to the following one.
(d) The infinite series ∑_{i=1}^∞ x_i converges in X (i.e., the sequence {x_i}_{i=1}^∞ is summable).
If X is a Banach space, then condition (c) (or condition (b)) is referred to as the Cauchy criterion for convergent infinite series.

Problem 4.8. Proposition 4.4 says that every absolutely summable sequence in a Banach space is summable.
(a) Consider the ℓ_+^2-valued sequence {x_i}_{i=1}^∞ with each x_i = (1/i)e_i, where e_i = {δ_{ik}}_{k=1}^∞ in ℓ_+^2 (just one nonzero entry, equal to 1, at the ith position) for each positive integer i. Show that {x_i}_{i=1}^∞ is a summable sequence in the Banach space ℓ_+^2 but is not absolutely summable.
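The two halves of item (a) can be made concrete numerically. The following sketch is an illustration only (function names and the truncation parameters are assumptions, not part of the text):

```python
# Illustrative check of Problem 4.8(a): for x_i = (1/i) e_i the blocks
# || sum_{i=n}^{n+k} x_i ||_2 shrink as n grows (so the partial sums are
# Cauchy in l^2 by the Cauchy criterion of Problem 4.7), while
# sum_i ||x_i||_2 = sum_i 1/i is the divergent harmonic series, so the
# sequence is not absolutely summable.
import math

def l2_block(n, k):
    # || sum_{i=n}^{n+k} x_i ||_2 = sqrt(sum_{i=n}^{n+k} 1/i^2),
    # since the unit vectors e_i have disjoint supports
    return math.sqrt(sum(1.0 / i**2 for i in range(n, n + k + 1)))

def harmonic(n):
    # sum_{i=1}^{n} ||x_i||_2 = n-th harmonic number
    return sum(1.0 / i for i in range(1, n + 1))

print([round(l2_block(n, 10**5), 4) for n in (1, 10, 100, 1000)])
print([round(harmonic(10**p), 2) for p in (1, 2, 3, 4)])
```

The first list decreases toward zero while the second grows without bound, which is precisely the announced contrast between summability and absolute summability.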
Problem 4.7 says that {x_i}_{i=1}^∞ is a summable sequence in a Banach space if and only if lim_n sup_{k≥0} ‖∑_{i=n}^{n+k} x_i‖ = 0, which implies that lim_n ‖∑_{i=n}^{n+k} x_i‖ = 0 for every integer k ≥ 0.
(b) Give an example of a nonsummable sequence (or, equivalently, an example of a nonconvergent series) in a Banach space such that lim_n ‖∑_{i=n}^{n+k} x_i‖ = 0 for every integer k ≥ 0. (Hint: ξ_i = 1/i in R.)

Problem 4.9. Prove the following propositions.
(a) If {x_n}_{n=1}^∞ and {y_n}_{n=1}^∞ are summable sequences in a normed space X, then {αx_n + βy_n}_{n=1}^∞ = α{x_n}_{n=1}^∞ + β{y_n}_{n=1}^∞ is again a summable sequence in X, and
  ∑_{j=1}^∞ (αx_j + βy_j) = α ∑_{j=1}^∞ x_j + β ∑_{j=1}^∞ y_j,
for every pair of scalars α, β ∈ F. This shows that the collection of all X-valued summable sequences is a linear manifold of the linear space X^N (see Example 2.F), and hence is a linear space itself.
(b) If {x_n}_{n=1}^∞ is a summable sequence in a normed space X, then
  ∑_{j=n}^∞ x_j → 0 in X as n → ∞.
Hint: ∑_{j=1}^m x_j = ∑_{j=1}^{n−1} x_j + ∑_{j=n}^m x_j for every 1 < n ≤ m. Now use item (a) to verify that the series ∑_{j=n}^∞ x_j = ∑_{j=1}^∞ x_{n−1+j} converges in X, and ∑_{j=1}^∞ x_j = ∑_{j=1}^{n−1} x_j + ∑_{j=n}^∞ x_j for every n > 1. Thus the result follows by uniqueness of the limit.

Problem 4.10. Take a sequence x = {x_i}_{i=1}^∞ of linearly independent vectors in a normed space X. Let A_x be the collection of all scalar-valued sequences a = {α_i}_{i=1}^∞ for which the series ∑_{j=1}^∞ α_j x_j converges in X (i.e., such that {α_i x_i}_{i=1}^∞ is a summable sequence in X). Show that
(a) A_x is a linear manifold of the linear space F^N.
Recall: F = C or F = R, according as X is a complex or real linear space.
(b) Verify that the function ‖·‖: A_x → R, defined by
  ‖a‖ = sup_n ‖∑_{i=1}^n α_i x_i‖_X
for every a = {α_i}_{i=1}^∞ in A_x, is a norm on A_x.
Take an arbitrary integer i ≥ 1. Show that
  |α_i| ‖x_i‖_X ≤ 2‖a‖
for every a = {α_i}_{i=1}^∞ in A_x. (Hint: α_i x_i = ∑_{j=1}^i α_j x_j − ∑_{j=1}^{i−1} α_j x_j.) Hence, since A_x is a normed space by (a) and (b),
(c) |α_i − β_i| ‖x_i‖_X ≤ 2‖a − b‖
for every a = {α_i}_{i=1}^∞ and b = {β_i}_{i=1}^∞ in A_x. Now let {a_k}_{k=1}^∞ be a Cauchy sequence in the normed space A_x. That is, each a_k = {α_k(i)}_{i=1}^∞ lies in A_x and the A_x-valued sequence {a_k}_{k=1}^∞ is Cauchy in (A_x, ‖·‖), where ‖·‖ is the norm in (b). According to (c), |α_k(i) − α_ℓ(i)| ≤ 2‖x_i‖_X⁻¹ ‖a_k − a_ℓ‖ for each i ≥ 1 and every k, ℓ ≥ 1 (note that ‖x_i‖_X ≠ 0 for every i ≥ 1 because {x_i}_{i=1}^∞ is linearly independent). Thus the scalar-valued sequence {α_k(i)}_{k=1}^∞ is Cauchy in F, and so convergent in F, for every integer i ≥ 1. Set
  α̂_i = lim_k α_k(i)
in F for each i ≥ 1 and consider the sequence â = {α̂_i}_{i=1}^∞ in F^N. Take an arbitrary ε > 0. Show that there exists an integer k_ε ≥ 1 such that
  ‖∑_{i=m}^n (α_k(i) − α_ℓ(i)) x_i‖_X < 2ε
for every 1 ≤ m ≤ n, whenever k, ℓ ≥ k_ε. (Hint: ∑_{i=m}^n (α_k(i) − α_ℓ(i)) x_i = ∑_{i=1}^n (α_k(i) − α_ℓ(i)) x_i − ∑_{i=1}^{m−1} (α_k(i) − α_ℓ(i)) x_i, and {a_k}_{k=1}^∞ is a Cauchy sequence in A_x.) Hence
(d) ‖∑_{i=m}^n (α_k(i) − α̂_i) x_i‖_X ≤ 2ε
for every pair of integers 1 ≤ m ≤ n, whenever k ≥ k_ε. (Hint: Note that ‖∑_{i=m}^n (α_k(i) − lim_ℓ α_ℓ(i)) x_i‖_X = lim_ℓ ‖∑_{i=m}^n (α_k(i) − α_ℓ(i)) x_i‖_X. Why?) Next prove the following claims.
(e) If X is a Banach space, then â ∈ A_x.
(Hint: (d) implies that the infinite series ∑_{i=1}^∞ (α_k(i) − α̂_i) x_i converges in the Banach space X for every k ≥ k_ε — see Problem 4.7: Cauchy criterion — and hence a_k − â ∈ A_x for each k ≥ k_ε.)
(f) If â ∈ A_x, then a_k → â in A_x.
(Hint: â ∈ A_x implies ‖a_k − â‖ ≤ 2ε whenever k ≥ k_ε by setting m = 1 in (d).) Finally conclude from (e) and (f): If (X, ‖·‖_X) is a Banach space, then (A_x, ‖·‖) is a Banach space.

Problem 4.11. Let X be a normed space. A sequence {x_k} of vectors in X (indexed either by N or by N₀) is a Schauder basis for X if for each x in X there exists a unique (similarly indexed) sequence of scalars {α_k} such that
  x = ∑_k α_k x_k
(i.e., x = lim_n ∑_{k=1}^n α_k x_k). The entries of the scalar-valued sequence {α_k} are called the coefficients of x with respect to the Schauder basis {x_k}, and the (convergent) series ∑_k α_k x_k is the expansion of x with respect to {x_k}. Prove the following assertions.
(a) Every Schauder basis for X is a sequence of linearly independent vectors in X that spans X. That is, if {x_k} is a Schauder basis for X, then
(i) {x_k} is linearly independent, and
(ii) ⋁{x_k} = X.
(b) If a normed space has a Schauder basis, then it is separable.
Hint: Proposition 4.9(b).
Remark: An infinite sequence of linearly independent vectors exists only in an infinite-dimensional linear space. It is readily verified that finite-dimensional normed spaces are separable Banach spaces (see Problem 4.37) but, in this case, the purely algebraic notion of a (finite) Hamel basis is enough. Therefore, when the concept of a Schauder basis is under discussion, only infinite-dimensional spaces are considered. Does every separable Banach space have a Schauder basis? This is a famous question, raised by Banach himself in the early 1930s, that remained open for a long period. Each separable Banach space that ever came up in analysis during that period (and this includes all classical examples) had a Schauder basis. The surprising negative answer to that question was given by Enflo in 1973, who constructed a separable Banach space that has no Schauder basis. See also the remark in Problem 4.58.

Problem 4.12. For each integer k ≥ 1 let e_k be a scalar-valued sequence with just one nonzero entry (equal to 1) at the kth position (i.e., e_k = {δ_{jk}}_{j=1}^∞ for each k ≥ 1). Consider the Banach spaces ℓ_+^p and ℓ_+^∞ for any p ≥ 1 as in Example 4.B. Show that the sequence {e_k} is a Schauder basis for every ℓ_+^p, and verify that ℓ_+^∞ has no Schauder basis. (Hint: Example 3.Q.)
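The contrast in Problem 4.12 can be seen numerically: partial expansions in {e_k} converge in the ℓ¹ norm but fail to approach a typical bounded sequence in the sup norm. The sketch below is an illustration with assumed data (the vector x, the constant sequence, and the truncation length K are not from the text):

```python
# Illustration of why {e_k} is a Schauder basis for l^1_+ while no expansion
# in {e_k} can reach, e.g., the constant sequence (1, 1, 1, ...) in l^inf_+.
K = 60  # working truncation length

x = [2.0**-k for k in range(1, K + 1)]   # a vector in l^1_+
ones = [1.0] * K                          # (1, 1, 1, ...) in l^inf_+

def partial(coeffs, n):
    # sum_{k=1}^{n} coeff_k e_k, padded with zeros
    return coeffs[:n] + [0.0] * (K - n)

l1_err = [sum(abs(a - b) for a, b in zip(x, partial(x, n))) for n in (5, 10, 20)]
sup_err = [max(abs(a - b) for a, b in zip(ones, partial(ones, n))) for n in (5, 10, 20)]

assert l1_err[0] > l1_err[1] > l1_err[2]   # l^1 tails vanish: expansion converges
assert sup_err == [1.0, 1.0, 1.0]          # sup-norm error stays 1: no convergence
```

The sup-norm obstruction is the same uncountable-separation phenomenon behind the nonseparability of ℓ_+^∞ invoked in the hint (Example 3.Q).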
Problem 4.13. Let M be a subspace of a normed space X. If X is a Banach space, then M and X/M are both Banach spaces (Propositions 4.7 and 4.10). Conversely, if M and X/M are both Banach spaces, then X is a Banach space. Indeed, suppose M and X/M are Banach spaces, and let {x_n} be an arbitrary Cauchy sequence in X (indexed, for instance, by N).
(a) Show that {[x_n]} is a Cauchy sequence in X/M and conclude that {[x_n]} converges in X/M to, say, [x] ∈ X/M.
(b) Take any x in [x]. For each n there exists x̃_n in [x_n − x] such that
  0 ≤ ‖x̃_n‖_X ≤ ‖[x_n − x]‖ + 1/n = ‖[x_n] − [x]‖ + 1/n.
Prove the above assertion and conclude: x̃_n → 0 in X. Observe that x̃_n − x_n + x lies in M for each n. In fact, [x̃_n] = [x_n − x] (since x̃_n ∈ [x_n − x]), and so [x̃_n − x_n + x] = [x̃_n] − [x_n − x] = [0] = M.
(c) Set u_n = x̃_n − x_n + x in M and show that {u_n} is a Cauchy sequence in M. Thus conclude that {u_n} converges in M (and so in X) to, say, u ∈ M.
Since x_n = x̃_n + x − u_n for each n, it follows that the sequence {x_n} converges in X to x − u ∈ X (recall that vector addition and scalar multiplication are continuous mappings — see Problem 4.1). Outcome: Every Cauchy sequence in X converges in X, which means that X is a Banach space.

Problem 4.14. Consider the linear space c₀ and let {a_k}_{k=1}^∞ be a c₀-valued sequence. That is, each a_k = {α_k(n)}_{n=1}^∞ is a scalar-valued sequence that converges to zero:
  lim_n |α_k(n)| = 0 for every k ≥ 1.
Suppose further that there exists a real number α ≥ 0 such that |α_k(n)| ≤ α for all k, n ≥ 1 or, equivalently, suppose
  sup_n sup_k |α_k(n)| < ∞.
Take an arbitrary x = {ξ_k}_{k=1}^∞ in ℓ_+^1, so that
  ∑_{k=1}^∞ |ξ_k| < ∞.
Consider the above assumptions and prove the following proposition.
(a) lim_n sup_k |α_k(n)| |ξ_k| = 0 and ∑_{k=1}^∞ sup_n |α_k(n)| |ξ_k| < ∞.
Hint: sup_k |α_k(n)| |ξ_k| ≤ max{ max_{1≤k≤m} |α_k(n)| |ξ_k|, α ∑_{k=m+1}^∞ |ξ_k| } for every m, n ≥ 1, which implies
  lim sup_n sup_k |α_k(n)| |ξ_k| ≤ α ∑_{k=m+1}^∞ |ξ_k| for every m ≥ 1.
Next use the dominated convergence of Problem 4.6(c) to show that
(b) lim_n ∑_{k=1}^∞ |α_k(n)| |ξ_k| = 0.
Then conclude: For each n ≥ 1 the infinite series ∑_{k=1}^∞ α_k(n) ξ_k converges (in the Banach space F) and the scalar-valued sequence {∑_{k=1}^∞ α_k(n) ξ_k}_{n=1}^∞ is an element of c₀. Therefore, every infinite matrix A = [α_k(n)]_{k,n≥1} whose rows a_k satisfy the above assumptions represents a mapping of ℓ_+^1 into c₀. Equip ℓ_+^1 and c₀ with their usual norms (‖·‖_1 on ℓ_+^1 and ‖·‖_∞ on c₀). Observe that the assumptions on A simply say that {a_k}_{k=1}^∞ is a bounded (i.e., sup_k ‖a_k‖_∞ < ∞) c₀-valued sequence. Show that, under these assumptions, such a mapping in fact is a bounded linear transformation of ℓ_+^1 into c₀:
  A ∈ B[ℓ_+^1, c₀] and ‖A‖ = sup_k ‖a_k‖_∞.
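A finite section of such an infinite matrix makes the norm bound tangible. The sketch below is an assumption-laden illustration (the rows a_k(n) = 1/(k + n), the vector x, and the truncation sizes are chosen for the example and are not from the text):

```python
# Finite-section sketch of Problem 4.14: rows a_k(n) = 1/(k + n) vanish in n
# and are uniformly bounded by 1/2, so the induced map
# x -> ( sum_k a_k(n) x_k )_n sends l^1 vectors to null sequences with
# ||Ax||_inf <= (sup_k ||a_k||_inf) * ||x||_1.
K, N = 300, 300  # truncation sizes for the row and column indices

def a(k, n):
    return 1.0 / (k + n)

x = [(-0.5)**k for k in range(1, K + 1)]        # an l^1 vector
x_l1 = sum(abs(v) for v in x)

y = [sum(a(k, n) * x[k - 1] for k in range(1, K + 1)) for n in range(1, N + 1)]

sup_rows = max(a(k, 1) for k in range(1, K + 1))   # sup_k ||a_k||_inf = a(1, 1)
assert max(abs(v) for v in y) <= sup_rows * x_l1   # the operator-norm bound
assert abs(y[-1]) < abs(y[0])                      # entries of Ax tend to zero
```

The two assertions mirror the two claims of the problem: boundedness with norm at most sup_k ‖a_k‖_∞, and output landing in c₀.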
Problem 4.15. Let T ∈ B[X, Y] be a continuous linear transformation of a normed space X into a normed space Y.
(a) If {x_k}_{k=0}^∞ is a summable sequence in X, then show that {T x_k}_{k=0}^∞ is a summable sequence in Y and
  ∑_{k=0}^∞ T x_k = T ∑_{k=0}^∞ x_k.
(b) If A is any set of vectors in X, then show that
  T(A⁻)⁻ = T(A)⁻.

Problem 4.16. Let (X₁, ‖·‖₁) and (X₂, ‖·‖₂) be normed spaces and consider the normed space X₁ ⊕ X₂ equipped with any of the norms of Example 4.E, which we shall denote by ‖·‖. That is, for every (x₁, x₂) in X₁ ⊕ X₂, either ‖(x₁, x₂)‖^p = ‖x₁‖₁^p + ‖x₂‖₂^p for some p ≥ 1 or ‖(x₁, x₂)‖ = max{‖x₁‖₁, ‖x₂‖₂}. Let T₁ and T₂ be operators on X₁ and X₂ (i.e., T₁ ∈ B[X₁] and T₂ ∈ B[X₂]). Consider the direct sum T ∈ L[X₁ ⊕ X₂] of T₁ and T₂ (defined in Section 2.9):
  T = T₁ ⊕ T₂ = [ T₁  O
                  O   T₂ ],
where T₁ = T|X₁ and T₂ = T|X₂ are the direct summands of T.
(a) Show that T ∈ B[X₁ ⊕ X₂] and ‖T‖ = max{‖T₁‖, ‖T₂‖}.
Hint: For any norm ‖·‖ that equips X₁ ⊕ X₂ (among those of Example 4.E), ‖T(x₁, x₂)‖ ≤ max{‖T₁‖, ‖T₂‖} ‖(x₁, x₂)‖ for every (x₁, x₂) ∈ X₁ ⊕ X₂.
Generalize to a countable direct sum. That is, let {X_k} be a countable family of normed spaces and consider the normed space ⊕_k X_k of Examples 4.E or 4.F (equipped with any of those norms, so that either ⊕_k X_k = (⊕_k X_k)_p or ⊕_k X_k = (⊕_k X_k)_∞ in case of a countably infinite family as in Example 4.F). Let {T_k} be a similarly indexed countable family of operators on X_k (each T_k lying in B[X_k]) such that sup_k ‖T_k‖ < ∞. Set T{x_k} = {T_k x_k} for each {x_k} ∈ ⊕_k X_k.
(b) Show that this actually defines a bounded linear transformation T of ⊕_k X_k into itself. Such an operator is usually denoted by
  T = ⊕_k T_k in B[⊕_k X_k],
and referred to as the direct sum of {T_k}. Moreover, verify that T_k = T|X_k (in the sense of Section 2.9) for each k. These are the direct summands of T. Finally, show that
  ‖T‖ = sup_k ‖T_k‖.
If X_k = X for a single normed space X, then T = ⊕_k T_k is sometimes referred to as a block diagonal operator acting on ℓ_+^p(X) or on ℓ_+^∞(X).

Problem 4.17. Let X_i and Y_i be normed spaces for i = 1, 2 and consider the normed spaces consisting of the direct sums X₁ ⊕ X₂ and Y₁ ⊕ Y₂ equipped with any of the norms of Example 4.E.
(a) Take T_ij in B[X_j, Y_i] for i, j = 1, 2, so that T₁₁x₁ + T₁₂x₂ lies in Y₁ and T₂₁x₁ + T₂₂x₂ lies in Y₂ for every (x₁, x₂) in X₁ ⊕ X₂. Set
  T(x₁, x₂) = (T₁₁x₁ + T₁₂x₂, T₂₁x₁ + T₂₂x₂)
in Y₁ ⊕ Y₂. Show that this defines a mapping T: X₁ ⊕ X₂ → Y₁ ⊕ Y₂, which in fact lies in B[X₁ ⊕ X₂, Y₁ ⊕ Y₂], and
  max_{i,j=1,2} ‖T_ij‖ ≤ ‖T‖ ≤ 4 max_{i,j=1,2} ‖T_ij‖.
(b) Conversely, suppose T ∈ B[X₁ ⊕ X₂, Y₁ ⊕ Y₂]. If x₁ is any vector in X₁, then T(x₁, 0) = (T₁₁x₁, T₂₁x₁) in Y₁ ⊕ Y₂, where T₁₁ is a mapping of X₁ into Y₁ and T₂₁ is a mapping of X₁ into Y₂. Similarly, if x₂ is any vector in X₂, then T(0, x₂) = (T₁₂x₂, T₂₂x₂) in Y₁ ⊕ Y₂, where T₁₂ is a mapping of X₂ into Y₁ and T₂₂ is a mapping of X₂ into Y₂. Show that T_ij ∈ B[X_j, Y_i] and ‖T_ij‖ ≤ ‖T‖ for every i, j = 1, 2.
Consider the bounded linear transformation T ∈ B[X₁ ⊕ X₂, Y₁ ⊕ Y₂] in (b). Since T(x₁, x₂) = T(x₁, 0) ⊕ T(0, x₂) in Y₁ ⊕ Y₂, it follows that T(x₁, x₂) = (T₁₁x₁ + T₁₂x₂, T₂₁x₁ + T₂₂x₂) for every (x₁, x₂) in X₁ ⊕ X₂ as in (a). This establishes a one-to-one correspondence between each T in B[X₁ ⊕ X₂, Y₁ ⊕ Y₂] and a 2×2 matrix of bounded linear transformations [T_ij], called the operator matrix for T, which we shall represent by the same symbol T (instead of, for instance, [T]) and write
  T = [ T₁₁  T₁₂
        T₂₁  T₂₂ ].
If Y_i = X_i for i = 1, 2, then T is the direct sum T₁₁ ⊕ T₂₂ in B[X₁ ⊕ X₂] of the previous problem if and only if T₁₂ = O in B[X₂, X₁] and T₂₁ = O in B[X₁, X₂].

Problem 4.18. Let T be an operator on a normed space X (i.e., T ∈ B[X]). A subset A of X is T-invariant (or A is an invariant subset for T) if T(A) ⊆ A (i.e., if Tx ∈ A for every x ∈ A). If M is a linear manifold (or a subspace) of X and, as a subset of X, is T-invariant, then we say that M is an invariant linear manifold (or an invariant subspace) for T. Prove the following propositions.
(a) If A is T-invariant, then A⁻ is T-invariant.
(b) If M is an invariant linear manifold for T, then M⁻ is an invariant subspace for T.
(c) {0} and X are invariant subspaces for every T in B[X].

Problem 4.19. Let Lat(X) be the lattice of all subspaces of a normed space X. Since {0} and X are subspaces of X, they lie in Lat(X). These are the trivial elements of Lat(X): a nontrivial subspace of X is a proper nonzero subspace of X (i.e., M ∈ Lat(X) is nontrivial if {0} ≠ M ≠ X). Prove (a) and (b).
(a) There are nontrivial subspaces in Lat(X) if and only if the dimension of X is greater than 1 (i.e., Lat(X) ≠ {{0}, X} if and only if dim X > 1).
Let B[X] be the unital algebra of all operators on a normed space X and let T be an operator in B[X]. A nontrivial invariant subspace for T is a nontrivial element of Lat(X) which is invariant for T (i.e., a subspace M ∈ Lat(X) such that {0} ≠ M ≠ X and T(M) ⊆ M). An element of B[X] is a scalar operator if it is a multiple of the identity, say αI for some scalar α.
(b) Every subspace in Lat(X) is invariant for any scalar operator in B[X], and so every scalar operator has a nontrivial invariant subspace if dim X > 1.

Problem 4.20. Let X be a normed space and take T ∈ B[X]. Show that
(a) N(T) and R(T)⁻ are invariant subspaces for T,
(b) N(T) = {0} and R(T)⁻ = X if T has no nontrivial invariant subspace.
Take S and T in B[X]. We say that S and T commute if ST = TS. Show that
(c) N(S), N(T), R(S)⁻, and R(T)⁻ are invariant subspaces for both S and T whenever S and T commute.

Problem 4.21. Let S ∈ B[X] and T ∈ B[X] be nonzero operators on a normed space X. Suppose ST = O and show that
(a) T(N(S)) ⊆ T(X) = R(T) ⊆ N(S),
(b) {0} ≠ N(S) ≠ X and {0} ≠ R(T)⁻ ≠ X,
(c) S(R(T)⁻) ⊆ S(R(T))⁻ ⊆ R(T)⁻.
Conclusion: If S ≠ O, T ≠ O, and ST = O, then N(S) and R(T)⁻ are nontrivial invariant subspaces for both S and T.

Problem 4.22. Take T ∈ B[X] on a normed space X. Verify that p(T) ∈ B[X] for every nonzero polynomial p(T) of T. In particular, Tⁿ ∈ B[X] for every integer n ≥ 0. (Hint: B[X] is an algebra; see Problems 2.20 and 3.29.)
(a) Show that N(p(T)) and R(p(T))⁻ are invariant subspaces for T.
Recall that an operator in B[X] is nilpotent if Tⁿ = O for some positive integer n, and algebraic if p(T) = O for some nonzero polynomial p (cf. Problem 2.20).
(b) Show that every nilpotent operator in B[X] (with dim X > 1) has a nontrivial invariant subspace.
(c) Suppose X is a complex normed space and dim X > 1. Show that every algebraic operator in B[X] has a nontrivial invariant subspace.
Hint: Every polynomial (in one complex variable and with complex coefficients) of degree n ≥ 1 is the product of a polynomial of degree n − 1 and a polynomial of degree 1.
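A minimal finite-dimensional instance ties Problem 4.21 to the nilpotent case of Problem 4.22(b). The example below is illustrative only (the particular matrix and helper function are assumptions for the sketch):

```python
# Concrete instance of Problems 4.21/4.22(b): for the nilpotent operator
# T = [[0, 1], [0, 0]] on R^2 we have T^2 = O, so taking S = T gives ST = O,
# and R(T)^- = N(T) = span{(1, 0)} is a nontrivial invariant subspace.

def apply(t, v):
    # apply a 2x2 matrix to a vector in R^2
    return [t[0][0]*v[0] + t[0][1]*v[1], t[1][0]*v[0] + t[1][1]*v[1]]

T = [[0.0, 1.0], [0.0, 0.0]]

# T^2 = O: T annihilates its own range
assert apply(T, apply(T, [3.0, -7.0])) == [0.0, 0.0]

# span{(1, 0)} is invariant: T maps (a, 0) to (0, 0), which stays in the span
assert apply(T, [5.0, 0.0]) == [0.0, 0.0]
```

Here N(T) = R(T) is one-dimensional, hence neither {0} nor R², exactly as the conclusion of Problem 4.21 predicts.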
Since span{Tⁿx}_{n≥0} is a linear manifold of X, it follows that its closure, (span{Tⁿx}_{n≥0})⁻ = ⋁{Tⁿx}_{n≥0}, is a subspace of X (Proposition 4.8(b)). That is, ⋁{Tⁿx}_{n≥0} ∈ Lat(X).
(c) Show that ⋁{Tⁿx}_{n≥0} ∈ Lat(T).
These are the cyclic subspaces in Lat(T): M ∈ Lat(T) is cyclic for T if M = ⋁{Tⁿx}_{n≥0} for some x ∈ X. If ⋁{Tⁿx}_{n≥0} = X, then x is said to be a cyclic vector for T. We say that a linear manifold M of X is totally cyclic for T if every nonzero vector in M is cyclic for T.
(d) Verify that T has no nontrivial invariant subspace if and only if X is totally cyclic for T.

Problem 4.24. Let X and Y be normed spaces. A bounded linear transformation X ∈ B[X, Y] intertwines an operator T ∈ B[X] to an operator S ∈ B[Y] if XT = SX. If there exists an X intertwining T to S, then we say T is intertwined to S. Suppose XT = SX. Show by induction that
(a) XTⁿ = SⁿX
for every positive integer n. Thus verify that
(b) Xp(T) = p(S)X
for every polynomial p. Now use Problem 4.23(b) to prove that
(c) X(span{Tⁿx}_{n≥0}) = span{SⁿXx}_{n≥0} for each x ∈ X,
and therefore (see Problem 3.46(a))
(d) X(⋁{Tⁿx}_{n≥0}) ⊆ ⋁{SⁿXx}_{n≥0} for every x ∈ X.
An operator T ∈ B[X] is densely intertwined to an operator S ∈ B[Y] if there is a bounded linear transformation X ∈ B[X, Y] with dense range intertwining T to S. If XT = SX and R(X)⁻ = Y, then show that
(e) ⋁{Tⁿx}_{n≥0} = X implies Y = ⋁{SⁿXx}_{n≥0}.
Conclusion: Suppose T in B[X] is densely intertwined to S in B[Y]. Let X in B[X, Y] be a transformation with dense range intertwining T to S. If x ∈ X is a cyclic vector for T, then Xx ∈ Y is a cyclic vector for S. Thus, if a linear manifold M of X is totally cyclic for T, then the linear manifold X(M) of Y is totally cyclic for S.

Problem 4.25. Here is a sufficient condition for transferring nontrivial invariant subspaces from S to T whenever T is densely intertwined to S. Let X and Y be normed spaces and take T ∈ B[X], S ∈ B[Y], and X ∈ B[X, Y] such that
XT = SX. Prove the following assertions.
(a) If M ⊆ Y is an invariant subspace for S, then the inverse image of M under X, X⁻¹(M) ⊆ X, is an invariant subspace for T.
(b) If, in addition, M ≠ Y (i.e., M is a proper subspace), R(X) ∩ M ≠ {0}, and R(X)⁻ = Y, then {0} ≠ X⁻¹(M) ≠ X.
Hint: Problems 1.2 and 2.11, and Theorem 3.23.
Conclusion: If T is densely intertwined to S, then the inverse image under the intertwining transformation X of a nontrivial invariant subspace M for S is a nontrivial invariant subspace for T, provided that the range of X is not (algebraically) disjoint with M. Show that the condition R(X) ∩ M ≠ {0} in (b) is not redundant. That is, if M is a subspace of Y, then show that
(c) {0} ≠ M ≠ Y and R(X)⁻ = Y does not imply R(X) ∩ M ≠ {0}.
However, if X is surjective, then the condition R(X) ∩ M ≠ {0} in (b) is trivially satisfied whenever M ≠ {0}. Actually, with the assumption XT = SX still in force, check the proposition below.
(d) If S has a nontrivial invariant subspace, and if R(X) = Y, then T has a nontrivial invariant subspace.
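The intertwining relation XT = SX and its propagation to powers (Problem 4.24(a)) can be checked on a small example. The matrices below are a hypothetical 2×2 illustration, not taken from the text; here X is invertible, so T and S are in fact similar.

```python
# A 2x2 sketch of the intertwining relation XT = SX: the upper triangular T
# is diagonalized by the invertible X, so S = X T X^{-1} is diagonal.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

T = [[1.0, 1.0], [0.0, 2.0]]
X = [[1.0, -1.0], [0.0, 1.0]]   # invertible intertwiner
S = [[1.0, 0.0], [0.0, 2.0]]    # S = X T X^{-1}

assert matmul(X, T) == matmul(S, X)   # X intertwines T to S

# the relation propagates to powers: X T^n = S^n X (Problem 4.24(a))
assert matmul(X, matmul(T, T)) == matmul(matmul(S, S), X)
```

When X fails to be invertible but still has dense range, only the one-sided transfer of Problem 4.25 survives; the invertible case is the similarity of Problem 4.29.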
Tx is never empty (for instance, x ∈ Tx because I ∈ {T } ). In fact, 0 ∈ Tx for every x ∈ X , and Tx = {0} if and only if x = 0. Prove the next proposition.
(b) For each x ∈ X, T_x⁻ is a hyperinvariant subspace for T.
Hint: As an algebra, {T}′ is a linear space. This implies that T_x is a linear manifold of X. If y = C₀x for some C₀ ∈ {T}′, then Cy = CC₀x ∈ T_x for every C ∈ {T}′ (i.e., T_x is hyperinvariant for T because {T}′ is an algebra). See Problem 4.18(b).

Problem 4.27. Let X and Y be normed spaces. Take T ∈ B[X], S ∈ B[Y], X ∈ B[X, Y], and Y ∈ B[Y, X] such that
  XT = SX and YS = TY.
Show that if C ∈ B[X] commutes with T, then XCY commutes with S. That is (see Problem 4.26), show that
(a) XCY ∈ {S}′ for every C ∈ {T}′.
Now consider the subspace T_x⁻ of X that, according to Problem 4.26, is nonzero and hyperinvariant for T for every nonzero x in X. Under the above assumptions on T and S, prove the following propositions.
(b) Suppose M is a nontrivial hyperinvariant subspace for S. If R(X)⁻ = Y and N(Y) ∩ M = {0}, then Y(M) ≠ {0} and T_x⁻ ≠ X for every nonzero x in Y(M). Consequently, T_x⁻ is a nontrivial hyperinvariant subspace for T whenever x is a nonzero vector in Y(M).
Hint: Since M is hyperinvariant for S, it follows from (a) that M is invariant for XCY whenever C ∈ {T}′. Use this fact to show that X(T_x) ⊆ M for every x ∈ Y(M), and hence X(T_x⁻) ⊆ M⁻ = M (Problem 3.46(a)). Now verify that T_x⁻ = X implies R(X)⁻ = X(X)⁻ = X(T_x⁻)⁻ ⊆ M. Thus, if M ≠ Y and R(X)⁻ = Y, then T_x⁻ ≠ X for every vector x in Y(M). Next observe that if Y(M) = {0} (i.e., if M ⊆ N(Y)), then N(Y) ∩ M = M. Conclude: If M ≠ {0} and N(Y) ∩ M = {0}, then Y(M) ≠ {0}. Finally recall that T_x ≠ {0} for every x ≠ 0 in X, and so {0} ≠ T_x⁻ ≠ X for every nonzero vector x in Y(M).
(c) If S has a nontrivial hyperinvariant subspace, and if R(X)⁻ = Y and N(Y) = {0}, then T has a nontrivial hyperinvariant subspace.

Problem 4.28. A bounded linear transformation X of a normed space X into a normed space Y is quasiinvertible (or a quasiaffinity) if it is injective and has a dense range (i.e., N(X) = {0} and R(X)⁻ = Y). An operator T ∈ B[X] is a quasiaffine transform of an operator S ∈ B[Y] if there exists a quasiinvertible transformation X ∈ B[X, Y] intertwining T to S. Two operators are quasisimilar if they are quasiaffine transforms of each other. In other words, T ∈ B[X] and S ∈ B[Y] are quasisimilar (notation: T ∼ S) if there exist X ∈ B[X, Y] and Y ∈ B[Y, X] such that
  N(X) = {0}, R(X)⁻ = Y, N(Y) = {0}, R(Y)⁻ = X,
  XT = SX and YS = TY.
Prove the following propositions. (a) Quasisimilarity has the defining properties of an equivalence relation. (b) If two operators are quasisimilar and if one of them has a nontrivial hyperinvariant subspace, then so has the other. Problem 4.29. Let X and Y be normed spaces. Two operators T ∈ B[X ] and S ∈ B[Y ] are similar (notation: T ≈ S) if there exists an injective and surjective bounded linear transformation X of X onto Y, with a bounded inverse X −1 of Y onto X , that intertwines T to S. That is, T ∈ B[X ] and S ∈ B[Y ] are similar if there exists X ∈ B[X , Y ] such that N (X) = {0}, R(X) = Y, X −1 ∈ B[Y, X ], and XT = SX. (a) Let T be an operator on X and let S be an operator on Y. If X is a bounded linear transformation of X onto Y with a bounded inverse X −1 of Y onto X (which is always linear), then check that XT = SX ⇐⇒ T = X −1SX ⇐⇒ S = X T X −1 ⇐⇒ X −1S = T X −1. Now prove the following assertions. (b) If T and S are similar, then they are quasisimilar. (c) Similarity has the defining properties of an equivalence relation. (d) If two operators are similar, and if one of them has a nontrivial invariant subspace, then so has the other. (Hint : Problem 4.25.) Note that we are using the same terminology of Section 2.7, namely, “similar”, but now with a different meaning. The linear transformation X: X → Y in fact is a (linear) isomorphism so that X and Y are isomorphic linear spaces, and hence the concept of similarity defined above implies the purely algebraic homonymous concept defined in Section 2.7. However, we are now imposing that all linear transformations involved are continuous (equivalently, that all of them are bounded), viz., T , S, X and also the inverse X −1 of X. Problem 4.30. Let {xk }∞ k=1 be a Schauder basis for a (separable) Banach space X (see Problem 4.11) so that every x ∈ X has a unique expansion x =
∑_{k=1}^∞ αk(x) xk
with respect to {xk}_{k=1}^∞. For each k ≥ 1 consider the functional ϕk: X → F that assigns to each x ∈ X its unique coefficient αk(x) in the above expansion: ϕk(x) = αk(x)
for every x ∈ X. Show that ϕk is a bounded linear functional (i.e., ϕk ∈ B[X, F] for each k ≥ 1). In other words, each coefficient in a Schauder basis expansion for a vector x in a Banach space X is a bounded linear functional on X.
Hint: Let Ax be the Banach space defined in Problem 4.10. Consider the mapping Φ: Ax → X given by
Φ(a) = ∑_{k=1}^∞ αk xk
for every a = {αk}_{k=1}^∞ in Ax. Verify that Φ is linear, injective, surjective, and bounded (actually, Φ is a contraction: ‖Φ(a)‖ ≤ ‖a‖ for every a ∈ Ax). Now apply Theorem 4.22 to conclude that Φ ∈ G[Ax, X]. For each integer k ≥ 1 consider the functional ψk: Ax → F given by ψk(a) = αk for every a = {αk}_{k=1}^∞ in Ax. Show that each ψk is linear and bounded. Finally, observe that the following diagram commutes (that is, ϕk = ψk ∘ Φ⁻¹):

X  −Φ⁻¹→  Ax
  ϕk ↘   ↙ ψk
       F
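In symbols, the commuting diagram in the hint yields the required bound at once (a sketch, with Φ and ψk as in the hint):

```latex
\varphi_k = \psi_k \circ \Phi^{-1}
\quad\Longrightarrow\quad
|\varphi_k(x)| = |\psi_k(\Phi^{-1}x)|
  \le \|\psi_k\|\,\|\Phi^{-1}\|\,\|x\|
\quad\text{for every } x \in \mathcal{X},
```

so each ϕk is bounded with ‖ϕk‖ ≤ ‖ψk‖ ‖Φ⁻¹‖.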
Problem 4.31. Let X and Y be normed spaces (over the same scalar field) and let M be a linear manifold of X. Equip the direct sum of M and Y with any of the norms of Example 4.E and consider the normed space M ⊕ Y. A linear transformation L: M → Y is closed if its graph is closed in M ⊕ Y. Since a subspace means a closed linear manifold, and recalling that the graph of a linear transformation of M into Y is a linear manifold of the linear space M ⊕ Y, such a definition can be rewritten as follows. A linear transformation L: M → Y is closed if its graph is a subspace of the normed space M ⊕ Y. Take an arbitrary L ∈ L[M, Y] and prove that the assertions below are equivalent.
(i) L is closed.
(ii) If {un} is an M-valued sequence that converges in X, and if its image under L converges in Y, then lim un ∈ M and lim Lun = L lim un.
Symbolically, L is closed if and only if
{ un ∈ M,  un → u ∈ X,  Lun → y ∈ Y }  ⟹  { u ∈ M  and  y = Lu }.
Hint: Apply the Closed Set Theorem. Use the norm ‖ ‖1 on M ⊕ Y (i.e., ‖(u, y)‖1 = ‖u‖_X + ‖y‖_Y for every (u, y) ∈ M ⊕ Y).
Problem 4.32. Consider the setup of the previous problem and prove the following propositions.
(a) If L ∈ B[M, Y] and M is closed in X, then L is closed.
Every bounded linear transformation defined on a subspace of a normed space is closed. In particular (set M = X), if L ∈ B[X, Y] then L is closed.
(b) If M and Y are Banach spaces and if L ∈ L[M, Y] is closed, then L ∈ B[M, Y].
Every closed linear transformation between Banach spaces is bounded.
(c) If Y is a Banach space and L ∈ B[M, Y] is closed, then M is closed in X.
Every closed and bounded linear transformation into a Banach space has a closed domain.
Hint: Closed Graph Theorem and Closed Set Theorem. Recall that continuity means convergence preservation in the sense of Theorem 3.7, and also that the notions of “bounded” and “continuous” coincide for a linear transformation between normed spaces (Theorem 4.14). Now compare Corollary 3.8 with Problem 4.31 and prove the next proposition.
(d) If M and Y are Banach spaces, then L ∈ L[M, Y] is continuous if and only if it is closed.
Problem 4.33. Let X and Y be Banach spaces and let M be a linear manifold of X. Take L ∈ L[M, Y] and consider the following assertions.
(i) M is closed in X (so that M is a Banach space).
(ii) L is a closed linear transformation.
(iii) L is bounded (i.e., L is continuous).
According to Problem 4.32 these three assertions are related as follows: each pair of them implies the other.
(a) Exhibit a bounded linear transformation that is not closed.
Hint: ℓ+^1 is a dense linear manifold of (ℓ+^2, ‖ ‖2). Take the inclusion map of (ℓ+^1, ‖ ‖2) into (ℓ+^2, ‖ ‖2).
The classical example of a closed linear transformation that is not bounded is the differential mapping D: C′[0, 1] → C[0, 1] defined in Problem 3.18. It is easy to show that C′[0, 1], the set of all differentiable functions in C[0, 1] whose derivatives lie in C[0, 1], is a linear manifold of the Banach space C[0, 1]
equipped with the sup-norm. It is also easy to show that D is linear. Moreover, according to Problem 3.18(a), D is not continuous (and so unbounded). However, if {un} is a uniformly convergent sequence of continuously differentiable functions whose derivative sequence {Dun} also converges uniformly, then lim Dun = D(lim un). This is a standard result from advanced calculus. Thus D is closed by Problem 4.31.
(b) Give another example of an unbounded closed linear transformation.
Hint: X = Y = ℓ+^1, M = {x = {ξk}_{k=1}^∞ ∈ ℓ+^1: ∑_{k=1}^∞ k|ξk| < ∞}, and D = diag({k}_{k=1}^∞) = diag(1, 2, 3, ...): M → ℓ+^1. Verify that M is a linear manifold of ℓ+^1. Use xn = (1/n²)(1, ..., 1, 0, 0, 0, ...) ∈ M (the first n entries are all equal to 1/n²; the rest is zero) to show that D is not continuous (Corollary 3.8). Suppose un → u ∈ ℓ+^1, with un = {ξn(k)}_{k=1}^∞ in M, and Dun → y = {υ(k)}_{k=1}^∞ ∈ ℓ+^1. Set ξ(k) = (1/k)υ(k) so that x = {ξ(k)}_{k=1}^∞ lies in M. Now show that ‖un − x‖1 ≤ ‖Dun − y‖1, and so un → x. Thus u = x ∈ M (uniqueness of the limit) and y = Du (since y = Dx). Apply Problem 4.31 to conclude that D is closed. Generalize to injective diagonal mappings with unbounded entries.
Problem 4.34. Let M and N be subspaces of a normed space X. If M and N are algebraic complements of each other (i.e., M + N = X and M ∩ N = {0}), then we say that M and N are complementary subspaces in X. According to Theorem 2.14 the natural mapping Φ: M ⊕ N → M + N, defined by Φ((u, v)) = u + v for every (u, v) ∈ M ⊕ N, is an isomorphism between the linear spaces M ⊕ N and M + N if M ∩ N = {0}. Consider the direct sum M ⊕ N equipped with any of the norms of Example 4.E. Prove the statement: If M and N are complementary subspaces in a Banach space X, then the natural mapping Φ: M ⊕ N → M + N is a topological isomorphism.
Hint: Show that the isomorphism Φ is a contraction when M ⊕ N is equipped with the norm ‖ ‖1.
Recall that M and N are Banach spaces (Proposition 4.7) and conclude that M ⊕ N is again a Banach space (Example 4.E). Apply the Inverse Mapping Theorem to prove that Φ is a topological isomorphism when M ⊕ N is equipped with the norm ‖ ‖1. Also recall that the norms of Example 4.E are equivalent (see the remarks that follow Proposition 4.26).
Problem 4.35. Prove the following propositions.
(a) If P: X → X is a continuous projection on a normed space X, then R(P) and N(P) are complementary subspaces in X.
Hint: R(P) = N(I − P). Apply Theorem 2.19 and Proposition 4.13.
(b) Conversely, if M and N are complementary subspaces in a Banach space X, then the unique projection P: X → X with R(P) = M and N(P) = N of Theorem 2.20 is continuous and ‖P‖ ≥ 1.
Hint: Consider the natural mapping Φ: M ⊕ N → M + N of the direct sum M ⊕ N (equipped with any of the norms of Example 4.E) onto X = M + N. Let PM: M ⊕ N → M ⊆ X be the map defined by PM(u, v) = u for every (u, v) ∈ M ⊕ N, which is a contraction (indeed, ‖PM‖ = 1, see Example 4.I). Apply the previous problem to verify that the diagram

M ⊕ N  ←Φ⁻¹−  M + N
    PM ↘     ↙ P
         X

commutes (that is, P = PM ∘ Φ⁻¹). Thus show that P is continuous (note that Pu = u for every u ∈ M = R(P)).
Remarks: PM is, in fact, a continuous projection of M ⊕ N into itself whose range is R(PM) = M ⊕ {0}. If we identify M ⊕ {0} with M (as we did in Example 4.I), then PM: M ⊕ N → M ⊕ {0} ⊆ M ⊕ N can be viewed as a map from M ⊕ N onto M, and hence we wrote PM: M ⊕ N → M ⊆ X; the continuous natural projection of M ⊕ N onto M. Also notice that the above propositions hold for the complementary projection E = (I − P): X → X as well, since N(E) = R(P) and R(E) = N(P).
Problem 4.36. Consider a bounded linear transformation T ∈ B[X, Y] of a Banach space X into a Banach space Y. Let M be a complementary subspace of N(T) in X. That is, M is a subspace of X that is also an algebraic complement of the null space N(T) of T:
M = M⁻,  X = M + N(T),  and  M ∩ N(T) = {0}.
Set TM = T|M: M → Y, the restriction of T to M, and verify the following propositions.
(a) TM ∈ B[M, Y], R(TM) = R(T), and N(TM) = {0}.
Hint: Problems 2.14 and 3.30.
(b) R(TM) = R(TM)⁻ if and only if there exists TM⁻¹ ∈ B[R(TM), M].
Hint: Proposition 4.7 and Corollary 4.24.
(c) If A ⊆ R(T) and TM⁻¹(A)⁻ = M, then T⁻¹(A)⁻ = X.
Hint: Take an arbitrary x = u + v ∈ X = M + N(T), with u ∈ M and v ∈ N(T). Verify that there exists a TM⁻¹(A)-valued sequence {un} that converges to u. Set xn = un + v in X and show that {xn} is a T⁻¹(A)-valued sequence that converges to x. Apply Proposition 3.32. Now use the above results to prove the following assertion.
(d) If A ⊆ R(T) and A⁻ = R(T) = R(T)⁻, then T⁻¹(A)⁻ = X.
That is, the inverse image under T of a dense subset of the range of T is dense in X whenever X and Y are Banach spaces and T ∈ B[X, Y] has a closed range and a null space with a complementary subspace in X. This can be viewed as a converse to Problem 3.46(c).
Problem 4.37. Prove the following propositions.
(a) Every finite-dimensional normed space is a separable Banach space.
Hint: Example 3.P, Problem 3.48, and Corollaries 4.28 and 4.31.
(b) If X and Y are topologically isomorphic normed spaces and if one of them is a (separable) Banach space, then so is the other.
Hint: Theorems 3.44 and 4.14.
Problem 4.38. Let X and Y be normed spaces and take T ∈ L[X, Y]. If either X or Y is finite dimensional, then T is of finite rank (Problems 2.6 and 2.17). R(T) is a subspace of Y whenever T is of finite rank (Corollary 4.29). If T is injective and of finite rank, then X is finite dimensional (Theorem 2.8 and Problems 2.6 and 2.17). Use Problem 2.7 and Corollaries 4.24 and 4.28 to prove the following assertions.
(a) If Y is a Banach space and T ∈ B[X, Y] is of finite rank and injective, then T has a bounded inverse on its range.
(b) If X is finite dimensional, then an injective operator in B[X] is invertible.
(c) If X is finite dimensional and T ∈ L[X], then N(T) = {0} if and only if T ∈ G[X]. This means that a linear transformation of a finite-dimensional normed space into itself is a topological isomorphism if and only if it is injective.
(d) If X is finite dimensional, then every linear isometry of X into itself is an isometric isomorphism. That is, every linear isometry of a finite-dimensional normed space into itself is surjective.
Problem 4.39. The previous problem says that nonsurjective isometries in B[X] may exist only if the normed space X is infinite dimensional. Here is an example. Let (ℓ+, ‖ ‖) denote either the normed space (ℓ+^p, ‖ ‖p) for some p ≥ 1 or (ℓ+^∞, ‖ ‖∞).
Consider the mapping S+: ℓ+ → ℓ+ defined by
S+x = {υk}_{k=0}^∞ with υ0 = 0 and υk = ξ_{k−1} for k ≥ 1,
for every x = {ξk}_{k=0}^∞ ∈ ℓ+. That is, S+(ξ0, ξ1, ξ2, ...) = (0, ξ0, ξ1, ξ2, ...) for every (ξ0, ξ1, ξ2, ...) in ℓ+, which is also represented by the infinite matrix
S+ =
⎛ 0           ⎞
⎜ 1  0        ⎟
⎜    1  0     ⎟
⎝       ⋱  ⋱ ⎠ ,
where every entry immediately below the main diagonal is equal to 1 and the remaining entries are all zero. This is the unilateral shift on ℓ+.
(a) Show that S+ is a linear nonsurjective isometry.
Since S+ is a linear isometry, it follows by Proposition 4.37 that ‖S+ⁿx‖ = ‖x‖ for every x ∈ ℓ+ and all n ≥ 1, and hence
S+ ∈ B[ℓ+] with ‖S+ⁿ‖ = 1 for all n ≥ 0.
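The defining identities of S+ can be checked numerically on a finite section (a sketch; the 6×6 truncation, the sample vector, and NumPy are assumptions, since S+ itself acts on infinite sequences):

```python
import numpy as np

n = 6
# Finite section of S+: 1's immediately below the main diagonal, zeros elsewhere.
S_plus = np.diag(np.ones(n - 1), k=-1)

# A vector whose last entry is 0, so the truncation loses no information.
x = np.array([3.0, -1.0, 4.0, 1.0, 5.0, 0.0])

# Isometry in every p-norm: shifting the entries down one slot changes nothing.
for p in (1, 2, np.inf):
    assert np.isclose(np.linalg.norm(S_plus @ x, p), np.linalg.norm(x, p))

# Nonsurjective: every vector in the range of S+ has first coordinate 0,
# so (1, 0, 0, ...) is never attained.
assert (S_plus @ x)[0] == 0.0
```

On the full sequence space the same computation reads coordinatewise, with no truncation error for vectors of finite support.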
Consider the backward unilateral shift S− of Example 4.L, now acting either on ℓ+^p or on ℓ+^∞. Recall that S− ∈ B[ℓ+] and ‖S−ⁿ‖ = 1 for all n ≥ 0 (this has been verified in Example 4.L for (ℓ+, ‖ ‖) = (ℓ+^p, ‖ ‖p), but the same argument ensures that it holds for (ℓ+, ‖ ‖) = (ℓ+^∞, ‖ ‖∞) as well).
(b) Show that S−S+ = I: ℓ+ → ℓ+, the identity on ℓ+. Therefore, S− ∈ B[ℓ+] is a left inverse of S+ ∈ B[ℓ+].
(c) Conclude that S− is surjective but not injective.
Problem 4.40. Let (ℓ, ‖ ‖) denote either the normed space (ℓ^p, ‖ ‖p) for some p ≥ 1 or (ℓ^∞, ‖ ‖∞). Consider the mapping S: ℓ → ℓ defined by
Sx = {ξ_{k−1}}_{k=−∞}^∞
for every x = {ξk}_{k=−∞}^∞ ∈ ℓ (i.e., S(..., ξ−2, ξ−1, (ξ0), ξ1, ξ2, ...) = (..., ξ−3, ξ−2, (ξ−1), ξ0, ξ1, ...)), which is also represented by the (doubly) infinite matrix

S =
⎛ ⋱             ⎞
⎜ 1  0          ⎟
⎜    1  (0)     ⎟
⎜        1  0   ⎟
⎝           ⋱ ⋱ ⎠

(with the inner parenthesis indicating the zero–zero position), where every entry immediately below the main diagonal is equal to 1 and the remaining entries are all zero. This is the bilateral shift on ℓ.
(a) Show that S is a linear surjective isometry. That is, S is an isometric isomorphism, and hence
S ∈ G[ℓ] with ‖Sⁿ‖ = 1 for all n ≥ 0.
Its inverse S⁻¹ is then again an isometric isomorphism, so that
S⁻¹ ∈ G[ℓ] with ‖(S⁻¹)ⁿ‖ = 1 for all n ≥ 0.
(b) Verify that the inverse S⁻¹ of S is given by the formula
S⁻¹x = {ξ_{k+1}}_{k=−∞}^∞ for every x = {ξk}_{k=−∞}^∞ ∈ ℓ
(that is, S⁻¹(..., ξ−2, ξ−1, (ξ0), ξ1, ξ2, ...) = (..., ξ−1, ξ0, (ξ1), ξ2, ξ3, ...)), which is also represented by a (doubly) infinite matrix

S⁻¹ =
⎛ ⋱  1           ⎞
⎜    0  1        ⎟
⎜      (0)  1    ⎟
⎝            0 ⋱ ⎠ ,

where every entry immediately above the main diagonal is equal to 1 and the remaining entries are all zero. This is the backward bilateral shift on ℓ.
Problem 4.41. Use Proposition 4.37 to prove the following assertions.
(a) Let W, X, Y, and Z be normed spaces (over the same scalar field) and take T ∈ B[Y, Z] and S ∈ B[W, X]. If V ∈ B[X, Y] is an isometry, then
and
#V S# = #S#.
(b) The product of two isometries is again an isometry. Hint : If T is an isometry, then #T V x# = #V x# = #x# for every x ∈ X . (c) #V n # = 1 for every n ≥ 1 whenever V is an isometry in B[X ]. (d) A linear isometry of a Banach space into a normed space has closed range. Hint : Propositions 4.20 and 4.37. Problem 4.42. Let X and Y be normed spaces. Verify that T ∈ B[X ] and S ∈ B[Y ] are similar (in the sense of Problem 4.29 — notation: T ≈ S) if and only if there exists a topological isomorphism intertwining T to S; that is, if and only if there exists W ∈ G[X , Y ] such that W T = SW. Thus X and Y are topologically isomorphic normed spaces if there are similar operators in B[X ] and B[Y ]. A stronger form of similarity is obtained when there is an isometric isomorphism, say U in G[X , Y ], intertwining T to S; i.e., U T = SU.
If this happens, then we say that T and S are isometrically equivalent (notation: T ≅ S). Again, X and Y are isometrically isomorphic normed spaces if there are isometrically equivalent operators in B[X] and B[Y]. As in the case of similarity, show that isometric equivalence has the defining properties of an equivalence relation. An important difference between similarity and isometric equivalence is that isometric equivalence is norm-preserving: if T and S are isometrically equivalent, then ‖T‖ = ‖S‖. Prove this identity and show that it may fail if T and S are simply similar. Now let X and Y be Banach spaces. Show that, in this case, T ∈ B[X] and S ∈ B[Y] are similar if and only if there exists an injective and surjective bounded linear transformation in B[X, Y] intertwining T to S.
Problem 4.43. Let X be a normed space. Verify that the following three conditions are pairwise equivalent.
(a) X is separable (as a metric space).
(b) There exists a countable subset of X that spans X.
(c) There exists a dense linear manifold M of X such that dim M ≤ ℵ0.
Hint: Proposition 4.9.
(d) Moreover, show also that a completion X̂ of a separable normed space X is itself separable.
Problem 4.44. In many senses barreled spaces in a locally convex-space setting play a role similar to that of Banach spaces in a normed-space setting. In fact, as we saw in Problem 4.4, a Banach space is barreled. Barreled spaces actually are the spaces where the Banach–Steinhaus Theorem holds in a locally convex-space setting: Every pointwise bounded collection of continuous linear transformations of a barreled space into a locally convex space is equicontinuous. To see that this is exactly the locally convex-space version of Theorem 4.43, we need the notion of equicontinuity in a locally convex space. Let X and Y be topological vector spaces. A subset Θ of L[X, Y] is equicontinuous if for each neighborhood NY of the origin of Y there exists a neighborhood NX of the origin of X such that T(NX) ⊆ NY for all T ∈ Θ.
(a) Show that if X and Y are normed spaces, then Θ ⊆ L[X, Y] is equicontinuous if and only if Θ ⊆ B[X, Y] and sup_{T∈Θ} ‖T‖ < ∞.
The notion of a bounded set in a topological vector space (and, in particular, in a locally convex space) was defined in Problem 4.2. Moreover, it was shown in Problem 4.5(b) that this in fact is the natural extension to topological vector spaces of the usual notion of a bounded set in a normed space.
(b) Show that the Banach–Steinhaus Theorem (Theorem 4.43) can be stated as follows: Every pointwise bounded collection of continuous linear transformations of a Banach space into a normed space is equicontinuous.
Problem 4.45. Let {Tn} be a sequence in B[X, Y], where X and Y are normed spaces. Prove the following results.
(a) If Tn →ˢ T for some T ∈ B[X, Y], then ‖Tnx‖ → ‖Tx‖ for every x ∈ X and ‖T‖ ≤ lim infn ‖Tn‖.
(b) If supn ‖Tn‖ < ∞ and {Tna} is a Cauchy sequence in Y for every a in a dense set A in X, then {Tnx} is a Cauchy sequence in Y for every x in X.
Hint: Tnx − Tmx = Tnx − Tnak + Tnak − Tmak + Tmak − Tmx.
(c) If there exists T ∈ B[X, Y] such that Tna → Ta for every a in a dense set A in X, and if supn ‖Tn‖ < ∞, then Tn →ˢ T.
Hint: (Tn − T)x = (Tn − T)(x − aε) + (Tn − T)aε.
(d) If X is a Banach space and {Tnx} is a Cauchy sequence for every x ∈ X, then supn ‖Tn‖ < ∞.
(e) If X and Y are Banach spaces and {Tnx} is a Cauchy sequence for every x ∈ X, then Tn →ˢ T for some T ∈ B[X, Y].
Problem 4.46. Let {Tn} be a sequence in B[X, Y] and let {Sn} be a sequence in B[Y, Z], where X, Y, and Z are normed spaces. Suppose
Tn →ˢ T and Sn →ˢ S
for T ∈ B[X, Y] and S ∈ B[Y, Z]. Prove the following propositions.
(a) If supn ‖Sn‖ < ∞, then SnTn →ˢ ST.
(b) If Y is a Banach space, then SnTn →ˢ ST.
(c) If Sn →ᵘ S, then SnTn →ˢ ST.
(d) If Sn →ᵘ S and Tn →ᵘ T, then SnTn →ᵘ ST.
Finally, show that addition of strongly (uniformly) convergent sequences of bounded linear transformations is again a strongly (uniformly) convergent sequence of bounded linear transformations whose strong (uniform) limit is the sum of the strong (uniform) limits of each summand.
Problem 4.47. Let X be a Banach space and let T be an operator in B[X]. If λ is any scalar such that ‖T‖ < |λ|, then λI − T is an invertible element of B[X] (i.e., (λI − T) ∈ G[X] — see the paragraph that follows Theorem 4.22) and the series ∑_{k=0}^∞ T^k/λ^{k+1} converges in B[X] to (λI − T)⁻¹. That is,
‖T‖ < |λ| implies (λI − T)⁻¹ = (1/λ) ∑_{k=0}^∞ (T/λ)^k ∈ B[X].
This is a rather important result, known as the von Neumann expansion. The purpose of this problem is to prove it. Take T ∈ B[X] and 0 ≠ λ ∈ F arbitrary. Show by induction that, for each integer n ≥ 0,
(a) ‖Tⁿ‖ ≤ ‖T‖ⁿ,
(b) (λI − T)(1/λ) ∑_{i=0}^n (T/λ)^i = (1/λ) ∑_{i=0}^n (T/λ)^i (λI − T) = I − (T/λ)^{n+1}.
From now on suppose ‖T‖ < |λ| and consider the power sequence {(T/λ)ⁿ}_{n=0}^∞ in B[X]. Use the result in (a) to show that
(c) {(T/λ)ⁿ}_{n=0}^∞ is absolutely summable.
Thus conclude that (cf. Problem 3.12)
(d) (T/λ)ⁿ →ᵘ O (i.e., T/λ is uniformly stable),
and also that (see Proposition 4.4)
(e) {(T/λ)^k}_{k=0}^∞ is summable.
This means that the infinite series ∑_{k=0}^∞ (T/λ)^k converges in B[X] or, equivalently, that there exists an operator in B[X], say ∑_{k=0}^∞ (T/λ)^k, such that
∑_{k=0}^n (T/λ)^k →ᵘ ∑_{k=0}^∞ (T/λ)^k.
Apply the results in (b) and (d) to check the following convergences.
(f) (λI − T)(1/λ) ∑_{k=0}^n (T/λ)^k →ᵘ I and (1/λ) ∑_{k=0}^n (T/λ)^k (λI − T) →ᵘ I.
Now use (e) and (f) to show that
(g) (λI − T)(1/λ) ∑_{k=0}^∞ (T/λ)^k = (1/λ) ∑_{k=0}^∞ (T/λ)^k (λI − T) = I.
Then (1/λ) ∑_{k=0}^∞ (T/λ)^k ∈ B[X] is the inverse of (λI − T) ∈ B[X] (Problem 1.7). So λI − T is an invertible element of B[X] (i.e., (λI − T) ∈ G[X]) whose inverse (λI − T)⁻¹ is the (uniform) limit of the sequence {∑_{i=0}^n T^i/λ^{i+1}}_{n=0}^∞. That is,
(λI − T)⁻¹ = (1/λ) ∑_{k=0}^∞ (T/λ)^k ∈ B[X].
Finally, verify that
(h) ‖(λI − T)⁻¹‖ ≤ (1/|λ|) ∑_{k=0}^∞ (‖T‖/|λ|)^k = (|λ| − ‖T‖)⁻¹.
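The expansion and the bound in (h) can be illustrated numerically (a sketch; the particular 3×3 matrix, λ = 1, and NumPy are assumptions):

```python
import numpy as np

# A concrete T with ||T|| < |lam|.
T = np.array([[0.2, 0.1, 0.0],
              [0.0, 0.1, 0.3],
              [0.1, 0.0, 0.2]])
lam = 1.0
norm_T = np.linalg.norm(T, 2)        # spectral norm: the operator norm for l^2
assert norm_T < abs(lam)

# Partial sum (1/lam) * sum_{k=0}^{N} (T/lam)^k of the von Neumann expansion.
N = 60
S, power = np.zeros_like(T), np.eye(3)
for _ in range(N + 1):
    S += power
    power = power @ (T / lam)
S /= lam

inv = np.linalg.inv(lam * np.eye(3) - T)
assert np.allclose(S, inv)           # the series converges to (lam*I - T)^{-1}

# Bound (h): ||(lam*I - T)^{-1}|| <= (|lam| - ||T||)^{-1}.
assert np.linalg.norm(inv, 2) <= 1.0 / (abs(lam) - norm_T) + 1e-12
```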
Remark: Exactly the same proof applies if, instead of B[X], we were working on an abstract unital Banach algebra A.
Problem 4.48. If T is a strict contraction on a Banach space X, then
(I − T)⁻¹ = ∑_{k=0}^∞ T^k ∈ B[X].
This is the special case of Problem 4.47 for λ = 1. Use it to prove assertion (a) below. Then prove the next three assertions.
(a) Every operator in the open unit ball centered at the identity I of B[X] is invertible. That is, if ‖I − S‖ < 1, then S ∈ G[X].
(b) Let X and Y be Banach spaces. Centered at each invertible transformation T ∈ G[X, Y] there exists a nonempty open ball Bε(T) ⊂ G[X, Y] such that sup_{S∈Bε(T)} ‖S⁻¹‖ < ∞. In particular, G[X, Y] is open in B[X, Y].
Hint: Suppose ‖T − S‖ < ε < ‖T⁻¹‖⁻¹ so that ‖IX − T⁻¹S‖ = ‖T⁻¹(T − S)‖ < ‖T⁻¹‖ ε < 1. Thus T⁻¹S = IX − (IX − T⁻¹S) lies in G[X] by (a), and therefore S = T T⁻¹S also lies in G[X, Y], with (cf. Corollary 4.23 and Proposition 4.16) ‖S⁻¹‖ = ‖S⁻¹T T⁻¹‖ ≤ ‖T⁻¹‖ ‖S⁻¹T‖. But, according to Problem 4.47(h), ‖S⁻¹T‖ = ‖(T⁻¹S)⁻¹‖ = ‖[IX − (IX − T⁻¹S)]⁻¹‖ ≤ (1 − ‖IX − T⁻¹S‖)⁻¹. Conclude: ‖S⁻¹‖ ≤ ‖T⁻¹‖ (1 − ‖T⁻¹‖ε)⁻¹.
(c) Inversion is a continuous mapping. That is, if X and Y are Banach spaces, then the map T ↦ T⁻¹ of G[X, Y] into G[Y, X] is continuous.
Hint: T⁻¹ − S⁻¹ = T⁻¹(S − T)S⁻¹. If Tn ∈ G[X, Y] and {Tn} converges in B[X, Y] to S ∈ G[X, Y], then supn ‖Tn⁻¹‖ < ∞, and so Tn⁻¹ → S⁻¹.
(d) If T ∈ B[X] is an invertible contraction on a normed space X, then ‖Tⁿx‖ ≤ ‖x‖ ≤ ‖T⁻ⁿx‖ for every x ∈ X and every integer n ≥ 0.
Hint: Show that Tⁿ is a contraction if T is (cf. Problems 1.10 and 4.22).
Problem 4.49. Let X be a normed space. Show that the set of all strict contractions in B[X] is not closed in B[X] (and so not strongly closed in B[X]).
Hint: Each Dn = diag(1/2, ..., n/(n+1), 0, 0, 0, ...), with just the first n entries different from zero, is a strict contraction in any B[ℓ+^p] (and in B[ℓ+^∞]).
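The hint's operators can be checked numerically on a finite section (a sketch; the section size is an assumption):

```python
import numpy as np

m = 50  # finite section of l^p_+ (assumption)

def D(n):
    # D_n = diag(1/2, ..., n/(n+1), 0, ..., 0) truncated to R^m.
    d = np.zeros(m)
    k = np.arange(1, n + 1)
    d[:n] = k / (k + 1.0)
    return np.diag(d)

norms = [np.linalg.norm(D(n), 2) for n in (1, 5, 20, 49)]
assert all(v < 1 for v in norms)     # each D_n is a strict contraction
assert norms == sorted(norms)        # the norms n/(n+1) increase toward 1
# The (uniform) limit diag(k/(k+1)) has norm sup_k k/(k+1) = 1: a contraction,
# but not a strict one, so the set of strict contractions is not closed.
```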
On the other hand, show that the set of all contractions in B[X] is strongly closed in B[X], and so (uniformly) closed in B[X].
Hint: | ‖Tnx‖ − ‖Tx‖ | ≤ ‖(Tn − T)x‖.
Problem 4.50. Show that the strong limit of a sequence of linear isometries is again a linear isometry. In other words, the set of all isometries from B[X, Y] is strongly closed, and so uniformly closed (where X and Y are normed spaces).
Hint: Proposition 4.37 and Problem 4.45(a).
Problem 4.51. Take an arbitrary p ≥ 1 and consider the normed space ℓ+^p of Example 4.B. Let {Dn} be a sequence of diagonal operators in B[ℓ+^p]. If {Dn} converges strongly to D ∈ B[ℓ+^p], then D is a diagonal operator. Prove.
Problem 4.52. Let {Pk}_{k=1}^∞ be a sequence of diagonal operators in B[ℓ+^p] for any p ≥ 1 such that, for each k ≥ 1,
Pk = diag(ek) = diag(0, ..., 0, 1, 0, 0, 0, ...)
(the only nonzero entry is equal to 1 and lies at the kth position). Set
En = ∑_{k=1}^n Pk = diag(1, ..., 1, 0, 0, 0, ...) ∈ B[ℓ+^p]
for every integer n ≥ 1. Show that En →ˢ I, the identity in B[ℓ+^p], but {En}_{n=1}^∞ does not converge uniformly because ‖En − I‖ = 1 for all n ≥ 1.
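A finite-section sketch of the two claims (the section size m and the sample vector are assumptions):

```python
import numpy as np

m = 200
x = 1.0 / np.arange(1, m + 1)        # a fixed vector in l^2_+

def E(n):
    # Diagonal of E_n = P_1 + ... + P_n: n ones followed by zeros.
    d = np.zeros(m)
    d[:n] = 1.0
    return d

# Strong convergence: for the fixed x, ||E_n x - x|| -> 0 as n grows.
errs = [np.linalg.norm(E(n) * x - x) for n in (10, 50, 150)]
assert errs[0] > errs[1] > errs[2]

# No uniform convergence: ||E_n - I|| = 1, witnessed by the basis vector
# at position n + 1 (0-indexed position n), which E_n annihilates.
n = 50
e = np.zeros(m)
e[n] = 1.0
assert np.isclose(np.linalg.norm(E(n) * e - e), 1.0)
```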
Problem 4.53. Let a = {αk}_{k=1}^∞ be a scalar-valued sequence and consider a sequence {Dn}_{n=1}^∞ of diagonal mappings of the Banach space ℓ+^p (for any p ≥ 1) into itself such that, for each integer n ≥ 1,
Dn = diag(α1, ..., αn, 0, 0, 0, ...),
where the entries in the main diagonal are all null except perhaps for the first n entries. It is clear that Dn lies in B[ℓ+^p] for each n ≥ 1 (reason: B0[ℓ+^p] ⊂ B[ℓ+^p]). If a ∈ ℓ+^∞, then consider the diagonal operator Da = diag({αk}_{k=1}^∞) ∈ B[ℓ+^p] of Examples 4.H and 4.K. Prove the following assertions.
(a) If supk |αk| < ∞, then Dn →ˢ Da.
Hint: ‖Dnx − Dax‖^p ≤ supk |αk|^p ∑_{k=n}^∞ |ξk|^p for x = {ξk}_{k=1}^∞ ∈ ℓ+^p.
(b) Conversely, if {Dnx}_{n=1}^∞ converges in ℓ+^p for every vector x ∈ ℓ+^p, then supk |αk| < ∞, and hence Dn →ˢ Da.
Hint : Proposition 3.39, Theorem 4.43, and Example 4.H.
(c) If limk |αk| = 0, then Dn →ᵘ Da.
Hint: Verify that ‖(Dn − Da)x‖^p ≤ sup_{n≤k} |αk|^p ‖x‖^p and conclude that limn ‖Dn − Da‖ ≤ limn sup_{n≤k} |αk| = lim sup_k |αk|.
(d) Conversely, if {Dn}_{n=1}^∞ converges uniformly, then limk |αk| = 0, and hence Dn →ᵘ Da.
Hint: Uniform convergence implies strong convergence. Apply (c). Compute (Dn − Da)ek for every k, n.
Problem 4.54. Take any α ∈ C such that α ≠ 1. Consider the operators A and P in B[C²] identified with the matrices

A = ⎛  0    1  ⎞        P = (α − 1)⁻¹ ⎛ α  −1 ⎞
    ⎝ −α  1+α ⎠ ,                     ⎝ α  −1 ⎠

(i.e., these matrices are the representations of A and P with respect to the canonical basis for C²).
(a) Show that PA = AP = P = P².
(b) Prove by induction that Aⁿ⁺¹ = αAⁿ + (1 − α)P, and hence (see Problem 2.19 or supply another induction) Aⁿ = αⁿ(I − P) + P, for every integer n ≥ 0.
(c) Finally, show that
|α| < 1 implies Aⁿ →ᵘ P and ‖P‖ > 1,
where ‖ ‖ denotes the norm on B[C²] induced by any of the norms ‖ ‖p (for p ≥ 1) or ‖ ‖∞ on the linear space C² as in Example 4.A.
Hint: 1 < ‖Pe1‖∞ ≤ ‖Pe1‖p (cf. Problem 3.33).
Problem 4.55. Take a linear transformation T ∈ L[X] on a normed space X. Suppose the power sequence {Tⁿ} is pointwise convergent, which means that there exists P ∈ L[X] such that Tⁿx → Px in X for every x ∈ X. Show that
(a) PT^k = T^kP = P = P^k for every integer k ≥ 1,
(b) (T − P)ⁿ = Tⁿ − P for every integer n ≥ 1.
Now suppose T lies in B[X] and prove the following propositions.
(c) If Tⁿ →ˢ P ∈ B[X], then P is a projection and (T − P)ⁿ →ˢ O.
(d) If Tⁿx → Px for every x ∈ X, then P ∈ L[X] is a projection. If, in addition, X is a Banach space, then P is a continuous projection and T − P in B[X] is strongly stable.
Problem 4.56. Let F: X → X be a mapping of a set X into itself. Recall that F is injective if and only if it has a left inverse F⁻¹: R(F) → X on its range R(F) = F(X). Therefore, if F is injective and idempotent (i.e., F = F²), then F = F⁻¹FF = F⁻¹F = I, and hence
(a) the unique idempotent injective mapping is the identity.
This is a purely set-theoretic result (no algebra or topology is involved). Now let X be a metric space and recall that every isometry is injective. Thus,
(b) the unique idempotent isometry is the identity.
In particular, if X is a normed space and F: X → X is a projection (i.e., an idempotent linear transformation) and an isometry, then F = I: the identity is the unique isometry that also is a projection. Show that
(c) the unique linear isometry that has a strongly convergent power sequence is the identity.
Hint: Problems 4.50 and 4.55.
Problem 4.57. Let {Tn} be a sequence of bounded linear transformations in B[Y, Z], where Y is a Banach space and Z is a normed space, and take T in B[Y, Z]. Show that, if M is a finite-dimensional subspace of Y, then
(a) Tn →ˢ T implies (Tn − T)|M →ᵘ O.
(Hint: Proposition 4.46.) Now let K be a compact linear transformation of a normed space X into Y (i.e., K ∈ B∞[X, Y]). Prove that
(b) Tn →ˢ T implies TnK →ᵘ TK.
Hint: Take any x ∈ X. Use Proposition 4.56 to show that for each ε > 0 there exists a finite-dimensional subspace Rε of Y and a vector rε,x in Rε such that
‖Kx − rε,x‖ < 2ε‖x‖ and ‖rε,x‖ < (2ε + ‖K‖)‖x‖.
Then verify that
‖(TnK − TK)x‖ ≤ ‖Tn − T‖ ‖Kx − rε,x‖ + ‖(Tn − T)|Rε‖ ‖rε,x‖
< ( 2ε‖Tn − T‖ + (2ε + ‖K‖) ‖(Tn − T)|Rε‖ ) ‖x‖.
Finally, apply the Banach–Steinhaus Theorem (Theorem 4.43) to ensure that supn ‖Tn − T‖ < ∞ and conclude from item (a): for every ε > 0,
lim supn ‖TnK − TK‖ < 2 supn ‖Tn − T‖ ε.
Problem 4.58. Prove the converse of Corollary 4.55 under the assumption that the Banach space Y has a Schauder basis. In other words, prove the following proposition. If Y is a Banach space with a Schauder basis and X is a normed space, then every compact linear transformation in B∞[X, Y] is the uniform limit of a sequence of finite-rank linear transformations in B0[X, Y].
Hint: Suppose Y is infinite dimensional (otherwise the result is trivial) and has a Schauder basis. Take an arbitrary K in B∞[X, Y]. R(K)⁻ is a Banach space possessing a Schauder basis, say {yi}_{i=1}^∞, so that every y ∈ R(K)⁻ has a unique expansion y = ∑_{i=1}^∞ αi(y) yi (Problem 4.11). For each nonnegative integer n consider the mapping En: R(K)⁻ → R(K)⁻ defined by En y = ∑_{i=1}^n αi(y) yi. Show that each En is bounded and linear (Problem 4.30), and of finite rank (since R(En) is included in the span of {yi}_{i=1}^n). Also show that {En}_{n=1}^∞ converges strongly to the identity operator I on R(K)⁻ (Problem 4.9(b)). Use the previous problem to conclude that EnK →ᵘ K. Finally, check that EnK ∈ B0[X, Y] for each n.
Remark: Consider the remark in Problem 4.11. There we commented on the construction of a separable Banach space that has no Schauder basis. Such a breakthrough was achieved by Enflo in 1973 when he exhibited a separable (and reflexive) Banach space X for which B0[X] is not dense in B∞[X], so that there exist compact operators on X that are not the (uniform) limit of finite-rank operators (and so the converse of Corollary 4.55 fails in general). Thus, according to the above proposition, such an X is a separable Banach space without a Schauder basis.
Problem 4.59. Recall that the concepts of strong and uniform convergence coincide in a finite-dimensional space (Proposition 4.46). Consider the Banach space ℓ+^p for any p ≥ 1 (which has a Schauder basis — Problem 4.12). Exhibit a sequence of finite-rank (compact) operators on ℓ+^p that converges strongly to a finite-rank (compact) operator but is not uniformly convergent.
Hint: Let Pk be the diagonal operator defined in Problem 4.52.
Problem 4.60. Let M be a subspace of an infinite-dimensional Banach space X. An extension over X of a compact operator on M may not be compact.
Hint: Let M and N be Banach spaces over the same field. Suppose N is infinite dimensional. Set X = M ⊕ N and consider the direct sum T = K ⊕ I in B[X], where K is a compact operator in B∞[M] and I is the identity operator in B[N], as in Problem 4.16.
Problem 4.61. Let X and Y be nonzero normed spaces over the same field. Let M be a proper subspace of X. Show that there exists O ≠ T ∈ B[X, Y] such that M ⊆ N(T) (i.e., such that T(M) = {0}). (Hint: Corollary 4.63.)
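One concrete sequence answering Problem 4.59 (an assumption — the hint's Pk may be combined differently): Fn = P1 + P_{n+1} has rank two, converges strongly to the rank-one P1, yet ‖Fn − P1‖ = ‖P_{n+1}‖ = 1 for every n. A finite-section sketch:

```python
import numpy as np

m = 300  # finite section of l^p_+ (assumption)

def P(k):
    # Rank-one diagonal projection onto the k-th basis vector.
    d = np.zeros(m)
    d[k - 1] = 1.0
    return np.diag(d)

def F(n):
    return P(1) + P(n + 1)

x = 1.0 / np.arange(1, m + 1)            # fixed vector with entries -> 0

# Strong convergence to P_1: ||F_n x - P_1 x|| = |x_{n+1}| -> 0 ...
errs = [np.linalg.norm(F(n) @ x - P(1) @ x) for n in (5, 50, 250)]
assert errs[0] > errs[1] > errs[2]

# ... but no uniform convergence: ||F_n - P_1|| = ||P_{n+1}|| = 1 for all n.
for n in (5, 50, 250):
    assert np.isclose(np.linalg.norm(F(n) - P(1), 2), 1.0)
```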
Problem 4.62. Let X be a normed space. Since | ‖x‖ − ‖y‖ | ≤ ‖x − y‖ for every x, y ∈ X, it follows that the norm on X is a real-valued contraction that takes each vector of X to its norm. Show that for each vector in X there exists a real-valued linear contraction on X that takes that vector to its norm.
Problem 4.63. Let A be an arbitrary subset of a normed space X. The annihilator of A is the following subset of the dual space X∗:
A⊥ = {f ∈ X∗: A ⊆ N(f)}.
(a) If A ≠ ∅, then show that A⊥ = {f ∈ X∗: f(A) = {0}}.
(b) Show that ∅⊥ = {0}⊥ = X∗, X⊥ = {0}, and B⊥ ⊆ A⊥ whenever A ⊆ B.
(c) Show that A⊥ is a subspace of X∗.
(d) Show that A⁻ ⊆ ⋂_{f∈A⊥} N(f).
Hint: If f ∈ A⊥, then A ⊆ N(f). Thus conclude that A⁻ ⊆ N(f) for every f ∈ A⊥ (Proposition 4.13).
Now let M be a linear manifold of X and prove the following assertions.
(e) M⁻ = ⋂_{f∈M⊥} N(f).
Hint: If x0 ∈ X∖M⁻, then there exists an f ∈ M⊥ such that f(x0) = 1 (Corollary 4.63), and therefore x0 ∉ ⋂_{f∈M⊥} N(f). Thus conclude that ⋂_{f∈M⊥} N(f) ⊆ M⁻. Next use item (d).
(f) M⁻ = X if and only if M⊥ = {0}.
Problem 4.64. Let {ei}_{i=1}^n be a Hamel basis for a finite-dimensional normed space X. Verify the following propositions.
(a) For each i = 1, ..., n there exists fi ∈ X∗ such that fi(ej) = δij for every j = 1, ..., n.
Hint: Set fi(x) = ξi for every x = ∑_{i=1}^n ξi ei ∈ X.
(b) {fi}_{i=1}^n is a Hamel basis for X∗.
Hint: If f ∈ X∗, then f = ∑_{i=1}^n f(ei) fi.
Now conclude that dim X = dim X∗ whenever dim X < ∞.
Problem 4.65. Let J: Y → X be an isometric isomorphism of a normed space Y onto a normed space X, and consider the mapping J∗: X∗ → F^Y defined by J∗f = f ∘ J for every f ∈ X∗. Show that
Problems
303
(a) J*(X*) = Y*, so that J*: X* → Y* is surjective,

(b) J*: X* → Y* is linear, and

(c) ‖J*f‖ = ‖f‖ for every f ∈ X*.
(Hint : Problem 4.41(a).)
Conclude: If X and Y are isometrically isomorphic normed spaces, then their duals X* and Y* are isometrically isomorphic too. That is,

X ≅ Y implies X* ≅ Y*.
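The dual-basis construction of Problem 4.64 can be made concrete in Fⁿ. A minimal sketch under stated assumptions: the basis {e₁, e₂} of R² below is an arbitrary example of mine, and the dual functionals are read off from the inverse of the matrix whose columns are the basis vectors (a standard fact, not spelled out in the problem itself).

```python
# Dual basis in R^2 (assumed example basis): given a Hamel basis {e1, e2},
# the functionals f_i with f_i(e_j) = delta_ij are the rows of B^{-1},
# where B is the matrix with columns e1, e2; f_i(x) is then the i-th
# coordinate of x in the basis {e1, e2}.

e1, e2 = (2.0, 1.0), (1.0, 1.0)  # a basis of R^2 (det B = 1, so invertible)

# Inverse of B = [[2, 1], [1, 1]], computed by hand: B^{-1} = [[1, -1], [-1, 2]].
# Row i of B^{-1} represents the functional f_i.
f1 = (1.0, -1.0)
f2 = (-1.0, 2.0)

def apply(f, x):
    """Evaluate the linear functional represented by the row f at x."""
    return f[0] * x[0] + f[1] * x[1]

# The defining property of the dual basis: f_i(e_j) = delta_ij.
delta = [[apply(f, e) for e in (e1, e2)] for f in (f1, f2)]

# Part (b): every functional g decomposes as g = g(e1) f1 + g(e2) f2.
# Check the identity at a sample point x for a sample functional g = (3, 5).
g = (3.0, 5.0)
c1, c2 = apply(g, e1), apply(g, e2)          # coefficients g(e1), g(e2)
x = (-1.0, 4.0)
reconstructed = c1 * apply(f1, x) + c2 * apply(f2, x)   # should equal g(x)
```

Since the rows of B⁻¹ always exist when {eᵢ} is a basis, this also makes the conclusion dim X = dim X* (for dim X < ∞) tangible: the dual basis has exactly n elements.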
Problem 4.66. Consider the normed space ℓ₊^∞ equipped with its usual sup-norm. Recall that ℓ₊^c ⊆ ℓ₊^∞ (Problem 3.59). Let S₋ ∈ B[ℓ₊^∞] be the backward unilateral shift on ℓ₊^∞ (Example 4.L and Problem 4.39). A bounded linear functional f: ℓ₊^∞ → F is a Banach limit if it satisfies the following conditions.

(i) ‖f‖ = 1,

(ii) f(x) = f(S₋x) for every x ∈ ℓ₊^∞,

(iii) if x = {ξₖ} lies in ℓ₊^c, then f(x) = limₖ ξₖ,

(iv) if x = {ξₖ} in ℓ₊^∞ is such that ξₖ ≥ 0 for all k, then f(x) ≥ 0.
Condition (iii) says that Banach limits extend to ℓ₊^∞ the limit function on ℓ₊^c (i.e., f is defined on ℓ₊^∞ and its restriction to ℓ₊^c, f|ℓ₊^c, assigns to each convergent sequence its own limit). The remaining conditions represent fundamental properties of a limit function. Condition (i) ensures that |limₖ ξₖ| ≤ supₖ|ξₖ|, and condition (ii) that limₖ ξₖ = limₖ ξₖ₊ₙ for every positive integer n, whenever {ξₖ} ∈ ℓ₊^c. Condition (iv) says that f is order-preserving for real-valued sequences in ℓ₊^∞ (i.e., if x = {ξₖ} and y = {υₖ} are real-valued sequences in ℓ₊^∞, then f(x), f(y) ∈ R — why? — and f(x) ≤ f(y) if ξₖ ≤ υₖ for every k). The purpose of this problem is to show how the Hahn–Banach Theorem ensures the existence of Banach limits on ℓ₊^∞.

(a) Suppose F = R (so that the sequences in ℓ₊^∞ are all real-valued). Let e be the constant sequence in ℓ₊^∞ whose entries are all ones, e = (1, 1, 1, …), and set M = R(I − S₋). Show that d(e, M) = 1.
Hint: Verify that d(e, M) ≤ 1 (for ‖e‖∞ = 1 and 0 ∈ M). Now take an arbitrary u = {υₖ} in M. If υ_{k₀} ≤ 0 for some integer k₀, then show that 1 ≤ ‖e − u‖∞. But u ∈ R(I − S₋), and so υₖ = ξₖ − ξₖ₊₁ for some x = {ξₖ} in ℓ₊^∞. If υₖ ≥ 0 for all k, then {ξₖ} is decreasing and bounded. Check that {ξₖ} converges in R (Problem 3.10), show that υₖ → 0 (Problem 3.51), and conclude that 1 ≤ ‖e − u‖∞ whenever υₖ ≥ 0 for all k. Hence 1 ≤ ‖e − u‖∞ for every u ∈ M, so that d(e, M) ≥ 1. Therefore (it does not matter whether M is closed or not), M⁻ is a subspace of ℓ₊^∞ (Proposition 4.9(a)) and d(e, M⁻) = 1 (Problem 3.43(b)). Then, by Corollary 4.63, there exists a bounded linear functional ϕ: ℓ₊^∞ → R such that

ϕ(e) = 1,  ϕ(M⁻) = {0},  and  ‖ϕ‖ = 1.
(b) Show that ϕ(x) = ϕ(S₋ⁿx) for every x ∈ ℓ₊^∞ and all n ≥ 1.

Hint: ϕ((I − S₋)x) = 0 because ϕ(M) = {0}. This leads to ϕ(x) = ϕ(S₋x) for every x in ℓ₊^∞. Conclude the proof by induction.

(c) Show that ϕ satisfies condition (iii).

Hint: Take an arbitrary x = {ξₖ} in ℓ₊^c so that ξₖ → ξ in R for some ξ ∈ R. Observe that |ξₖ₊ₙ − ξ| ≤ |ξₖ₊ₙ − ξₙ| + |ξₙ − ξ| for every pair of positive integers n, k. Now use Problem 3.51(a) to show that ‖S₋ⁿx − ξe‖∞ → 0. That is, S₋ⁿx → ξe in ℓ₊^∞. Next verify that ϕ(x) = ξϕ(e).

(d) Show that ϕ satisfies condition (iv).

Hint: Take any 0 ≠ x = {ξₖ} in ℓ₊^∞ such that ξₖ ≥ 0 for all k, and set x′ = (‖x‖∞)⁻¹x = {ξₖ′}. Verify that 0 ≤ ξₖ′ ≤ 1 for all k, and so ‖e − x′‖∞ ≤ 1. Finally, show that 1 − ϕ(x′) = ϕ(e − x′) ≤ 1, and conclude: ϕ(x) ≥ 0.

Thus, in the real case, ϕ: ℓ₊^∞ → R is a Banach limit on ℓ₊^∞.

(e) Now suppose F = C (so that complex-valued sequences are allowed in ℓ₊^∞). For each x in ℓ₊^∞ write x = x₁ + ix₂, where x₁ and x₂ are real-valued sequences in ℓ₊^∞, and set

f(x) = ϕ(x₁) + iϕ(x₂).

Show that this defines a bounded linear functional f: ℓ₊^∞ → C.
Hint: ‖f‖ ≤ 2.

(f) Verify that f satisfies conditions (ii), (iii), and (iv).

(g) Prove that ‖f‖ = 1.

Hint: Let ℓ₊^# be the set of all scalar-valued sequences that take on only a finite number of values (i.e., that have a finite range). Clearly, ℓ₊^# ⊂ ℓ₊^∞. If y ∈ ℓ₊^# with ‖y‖∞ ≤ 1, then there is a finite partition of N, say {Nⱼ}ⱼ₌₁^m, and a finite set of scalars {αⱼ}ⱼ₌₁^m with |αⱼ| ≤ 1 for all j such that y = Σⱼ₌₁^m αⱼχ_{Nⱼ}. Here χ_{Nⱼ} is the characteristic function of Nⱼ which, in fact, is an element of ℓ₊^# (i.e., χ_{Nⱼ} = {χ_{Nⱼ}(k)}ₖ∈N, where χ_{Nⱼ}(k) = 1 if k ∈ Nⱼ and χ_{Nⱼ}(k) = 0 if k ∈ N∖Nⱼ). Verify: f(y) = Σⱼ₌₁^m αⱼf(χ_{Nⱼ}) = Σⱼ₌₁^m αⱼϕ(χ_{Nⱼ}) and Σⱼ₌₁^m ϕ(χ_{Nⱼ}) = ϕ(Σⱼ₌₁^m χ_{Nⱼ}) = ϕ(χ_N) = ϕ(e). Recall that ϕ satisfies condition (iv) and show that |f(y)| ≤ (supⱼ|αⱼ|)ϕ(e).

Conclusion 1. If y ∈ ℓ₊^# and ‖y‖∞ ≤ 1, then |f(y)| ≤ 1.

The closed unit ball B with center at the origin of C is compact. For each positive integer n, take a finite 1/n-net for B, say Bₙ ⊂ B. If x = {ξₖ} ∈ ℓ₊^∞ is such that ‖x‖∞ ≤ 1, then ξₖ ∈ B for all k. Thus for each k there exists υₙ(k) ∈ Bₙ such that |υₙ(k) − ξₖ| < 1/n. This defines for each n a Bₙ-valued sequence yₙ = {υₙ(k)} with ‖yₙ − x‖∞ < 1/n, which defines an ℓ₊^#-valued sequence {yₙ} with ‖yₙ‖∞ ≤ 1 for all n that converges in ℓ₊^∞ to x.
Conclusion 2. Every x ∈ ℓ₊^∞ with ‖x‖∞ ≤ 1 is the limit in ℓ₊^∞ of an ℓ₊^#-valued sequence {yₙ} with supₙ‖yₙ‖∞ ≤ 1.

Recall that f: ℓ₊^∞ → C is continuous. Apply Conclusion 2 to show that f(yₙ) → f(x), and so |f(yₙ)| → |f(x)|. Since |f(yₙ)| ≤ 1 for every n (by Conclusion 1), it follows that |f(x)| ≤ 1 for every x ∈ ℓ₊^∞ with ‖x‖∞ ≤ 1. Then ‖f‖ ≤ 1. Verify that ‖f‖ ≥ 1 (since f(e) = ϕ(e) and ‖e‖∞ = 1). Thus, in the complex case, f: ℓ₊^∞ → C is a Banach limit on ℓ₊^∞.
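Banach limits exist only through the Hahn–Banach Theorem and are not computable. Still, the defining conditions can be illustrated numerically with the Cesàro averages Cₙ(x) = (1/n)(ξ₁ + ⋯ + ξₙ), which satisfy shift invariance approximately (|Cₙ(x) − Cₙ(S₋x)| ≤ 2 supₖ|ξₖ|/n), recover the limit on convergent sequences, and preserve nonnegativity. This is a sketch on finite truncations, not a construction of a Banach limit.

```python
# Cesaro averages as "approximate Banach limits" on truncated sequences.

def cesaro(x, n):
    """C_n(x) = (1/n) * (x[0] + ... + x[n-1])."""
    return sum(x[:n]) / n

def shift(x):
    """Backward unilateral shift S_-: drop the first entry."""
    return x[1:]

# The bounded, non-convergent sequence 0, 1, 0, 1, ...  Conditions (i)-(ii)
# force any Banach limit to assign it the value 1/2 (since x + S_-x = e and
# f(x) = f(S_-x)), and the Cesaro means approach 1/2 as well.
alternating = [k % 2 for k in range(10_001)]
c_alt = cesaro(alternating, 10_000)

# Approximate shift invariance, mirroring condition (ii).
invariance_gap = abs(cesaro(alternating, 10_000)
                     - cesaro(shift(alternating), 10_000))

# Condition (iii): on a convergent sequence the means recover the limit.
convergent = [1.0 + 1.0 / (k + 1) for k in range(10_000)]
c_conv = cesaro(convergent, 10_000)
```

The averaging functional is linear, has norm one on the truncation, and is order-preserving, so it exhibits all four conditions up to an error vanishing with n.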
Problem 4.67. Let X be a normed space. An X-valued sequence {xₙ} is said to be weakly convergent if there exists x ∈ X such that {f(xₙ)} converges in F to f(x) for every f ∈ X*. In this case we say that {xₙ} converges weakly to x ∈ X (notation: xₙ −w→ x) and x is said to be the weak limit of {xₙ}. Prove the following assertions.

(a) The weak limit of a weakly convergent sequence is unique. Hint: f(x) = 0 for all f ∈ X* implies x = 0.

(b) {xₙ} converges weakly to x if and only if every subsequence of {xₙ} converges weakly to x. Hint: Proposition 3.5.

(c) If {xₙ} converges in the norm topology to x, then it converges weakly to x (i.e., xₙ → x ⟹ xₙ −w→ x). Hint: |f(xₙ − x)| ≤ ‖f‖‖xₙ − x‖.

(d) If {xₙ} converges weakly, then it is bounded in the norm topology (i.e., xₙ −w→ x ⟹ supₙ‖xₙ‖ < ∞). Hint: For each x ∈ X there exists ϕₓ ∈ X** such that ϕₓ(f) = f(x) for every f ∈ X* and ‖ϕₓ‖ = ‖x‖. This is the natural embedding of X into X** (Theorem 4.66). Verify that supₙ|f(xₙ)| < ∞ for every f ∈ X* whenever xₙ −w→ x, and show that supₙ|ϕ_{xₙ}(f)| < ∞ for every f ∈ X*. Now use the Banach–Steinhaus Theorem (recall: X* is a Banach space).

Problem 4.68. Let X and Y be normed spaces. A B[X, Y]-valued sequence {Tₙ} converges weakly in B[X, Y] if {Tₙx} converges weakly in Y for every x ∈ X. In other words, {Tₙ} converges weakly in B[X, Y] if {f(Tₙx)} converges in F for every f ∈ Y* and every x ∈ X. Prove the following assertions.

(a) If {Tₙ} converges weakly in B[X, Y], then there exists a unique T ∈ L[X, Y] (called the weak limit of {Tₙ}) such that Tₙx −w→ Tx in Y for every x ∈ X. Hint: That there exists such a unique mapping T: X → Y follows by Problem 4.67(a). This mapping is linear because every f in Y* is linear. Notation: Tₙ −w→ T (or Tₙ − T −w→ O). If {Tₙ} does not converge weakly to T, then we write Tₙ −w/→ T.
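The gap between weak and norm convergence has a standard witness in ℓ²: the canonical basis vectors eₙ converge weakly to 0 (every functional f(x) = ⟨x ; y⟩ with y ∈ ℓ² gives f(eₙ) = ȳₙ → 0) while ‖eₙ‖ = 1 for all n. A minimal sketch under stated assumptions: everything below lives in a finite truncation of ℓ², and the single y chosen stands in for "every functional".

```python
# Weak-but-not-norm convergence of the canonical basis in (a truncation of) l^2.

DIM = 5_000  # truncation length (an assumption of this sketch)

def basis_vector(n):
    e = [0.0] * DIM
    e[n] = 1.0
    return e

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

# A fixed square-summable real y, representing one functional on l^2.
y = [1.0 / (k + 1) for k in range(DIM)]

# f(e_n) = y_n -> 0: the functional values vanish along the sequence ...
values = [inner(basis_vector(n), y) for n in (10, 100, 1000)]

# ... while the norms stay equal to 1, so {e_n} has no norm limit.
norms = [inner(basis_vector(n), basis_vector(n)) ** 0.5
         for n in (10, 100, 1000)]
```

This also illustrates part (d) in reverse: a weakly convergent sequence need not converge in norm, but it is norm-bounded, as the constant norms here show.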
(b) If X is a Banach space and Tₙ −w→ T, then supₙ‖Tₙ‖ < ∞ and T ∈ B[X, Y].
Hint: Apply the Banach–Steinhaus Theorem and Problem 4.67(d) to prove (uniform) boundedness for {Tₙ}. Show that |f(Tx)| ≤ ‖f‖(supₙ‖Tₙ‖)‖x‖ for every x ∈ X and every f ∈ Y*. Thus conclude that, for every x ∈ X, ‖Tx‖ = sup_{f∈Y*, ‖f‖=1}|f(Tx)| ≤ supₙ‖Tₙ‖‖x‖.

(c) Tₙ −s→ T implies Tₙ −w→ T. Hint: |f((Tₙ − T)x)| ≤ ‖f‖‖(Tₙ − T)x‖.
Take T ∈ B[X] and consider the power sequence {Tⁿ} in the normed algebra B[X]. The operator T is weakly stable if Tⁿ −w→ O.

(d) Strong stability implies weak stability,

Tⁿ −s→ O  ⟹  Tⁿ −w→ O,

which in turn implies power boundedness if X is a Banach space:

Tⁿ −w→ O  ⟹  supₙ‖Tⁿ‖ < ∞.
Problem 4.69. Let X and Y be normed spaces. Consider the setup of the previous problem and prove the following propositions.

(a) If T ∈ B[X, Y], then Txₙ −w→ Tx in Y whenever xₙ −w→ x in X. That is, a continuous linear transformation of X into Y takes weakly convergent sequences in X into weakly convergent sequences in Y. Hint: If f ∈ Y*, then f ∘ T ∈ X*.

(b) If T ∈ B∞[X, Y], then ‖Txₙ − Tx‖ → 0 whenever xₙ −w→ x in X. That is, a compact linear transformation of X into Y takes weakly convergent sequences in X into convergent sequences in Y. Hint: Suppose xₙ −w→ x in X and take T ∈ B∞[X, Y]. Use Theorem 4.49, part (a) of this problem, and Problem 4.67(d) to show that

(b₁)  Txₙ −w→ Tx in Y  and  supₙ‖xₙ‖ < ∞.
Suppose {Txₙ} does not converge (in the norm topology of Y) to Tx. Use Proposition 3.5 to show that {Txₙ} has a subsequence, say {Tx_{n_k}}, that does not converge to Tx. Conclude: there exist ε₀ > 0 and a positive integer k_{ε₀} such that

(b₂)  ‖T(x_{n_k} − x)‖ > ε₀ for every k ≥ k_{ε₀}.

Verify from (b₁) that sup_k‖x_{n_k}‖ < ∞. Apply Theorem 4.52 to show that {Tx_{n_k}} has a subsequence, say {Tx_{n_{k_j}}}, that converges in the norm topology of Y. Now use the weak convergence in (b₁) and Problem 4.67(b) to show that {Tx_{n_{k_j}}} in fact converges to Tx (i.e., Tx_{n_{k_j}} → Tx in Y). Therefore, for each ε > 0 there exists a positive integer j_ε such that
(b₃)  ‖T(x_{n_{k_j}} − x)‖ < ε for every j ≥ j_ε.
Finally, verify that (b₃) contradicts (b₂) and conclude that {Txₙ} must converge in Y to Tx.

Problem 4.70. Let X be a normed space. An X*-valued sequence {fₙ} is weakly convergent if there exists f ∈ X* such that {ϕ(fₙ)} converges in F to ϕ(f) for every ϕ ∈ X** (cf. Problem 4.67). In this case we write fₙ −w→ f in X*. An X*-valued sequence {fₙ} is weakly* convergent if there exists f ∈ X* such that {fₙ(x)} converges in F to f(x) for every x ∈ X (notation: fₙ −w*→ f). Thus weak* convergence in X* means pointwise convergence of B[X, F]-valued sequences to an element of B[X, F].

(a) Show that weak convergence in X* implies weak* convergence in X* (i.e., fₙ −w→ f ⟹ fₙ −w*→ f). Hint: According to the natural embedding of X into X** (Theorem 4.66), for each x ∈ X there exists ϕₓ ∈ X** such that ϕₓ(f) = f(x) for every f ∈ X*. Verify that, for each x ∈ X, fₙ(x) → f(x) if ϕₓ(fₙ) → ϕₓ(f).

(b) If X is reflexive, then the concepts of weak convergence in X* and weak* convergence in X* coincide. Prove.

Problem 4.71. Let K be a compact (thus totally bounded — see Corollary 3.81) subset of a normed space X. Take an arbitrary ε > 0 and let A_ε be a finite ε-net for K (Definition 3.68). Take the closed ball B_ε[a] of radius ε centered at each a ∈ A_ε, and consider the functional ψ_a: K → [0, ε] defined by

ψ_a(x) = ε − ‖x − a‖ if x ∈ B_ε[a],  ψ_a(x) = 0 if x ∉ B_ε[a].

Define the function Φ_{A_ε}: K → X by the formula

Φ_{A_ε}(x) = (Σ_{a∈A_ε} a ψ_a(x)) / (Σ_{a∈A_ε} ψ_a(x))  for every x ∈ K.

Prove that Φ_{A_ε} is continuous and ‖Φ_{A_ε}(x) − x‖ < ε for every x ∈ K. This is a technical result that will be needed in the next problem. Hint: Verify that Σ_{a∈A_ε} ψ_a(x) > 0 for every x ∈ K, so that the function Φ_{A_ε} is well defined. Show that each ψ_a is continuous, and infer that Φ_{A_ε} is continuous as well. Take any x ∈ K. If ψ_a(x) ≠ 0 for some a ∈ A_ε, then ‖x − a‖ < ε. Thus

‖Φ_{A_ε}(x) − x‖ ≤ (Σ_{a∈A_ε} ‖a − x‖ ψ_a(x)) / (Σ_{a∈A_ε} ψ_a(x)) < ε.

Problem 4.72. An important classical result in topology reads as follows. Let B₁[0] be the closed unit ball (radius 1 with center at the origin) in Rⁿ. Recall that all norms on Rⁿ are equivalent (Theorem 4.27).
(i) If F: B₁[0] → B₁[0] is a continuous function, then it has a fixed point in B₁[0] (i.e., there exists x ∈ Rⁿ with ‖x‖ ≤ 1 such that F(x) = x).

This is the Brouwer Fixed Point Theorem. A useful corollary extends it from closed unit balls (which are compact and convex in Rⁿ) to compact and convex sets in a finite-dimensional normed space as follows.

(ii) Let K be a nonempty compact and convex subset of a finite-dimensional normed space. If F: K → K is a continuous function, then it has a fixed point in K (i.e., there exists x ∈ K such that F(x) = x).

We borrow the notion of a compact mapping from nonlinear functional analysis. Let D be a nonempty subset of a normed space X. A mapping F: D → X is compact if it is continuous and F(B)⁻ is compact in X whenever B is a bounded subset of D. Recall that a continuous image of any compact set is a compact set (Theorem 3.64). Thus, if D is a compact subset of X, then every continuous mapping F: D → X is compact. However, we are now concerned with the case where D (the domain of F) is not compact but bounded. In this case, if F is continuous and F(D)⁻ is compact, then F is a compact mapping (for F(B)⁻ ⊆ F(D)⁻ if B ⊆ D). The next result is the Schauder Fixed Point Theorem. It is a generalization of (ii) to infinite-dimensional spaces. Prove it.

(iii) Let D be a nonempty closed bounded convex subset of a normed space X, and let F: D → X be a compact mapping. If D is F-invariant, then F has a fixed point (i.e., if F(D) ⊆ D, then F(x) = x for some x ∈ D).

Hint: Set K = F(D)⁻ ⊆ D, which is compact. For each n ≥ 1 let Aₙ be a finite 1/n-net for K, and take Φ_{Aₙ}: K → X as in Problem 4.71. Verify by the definition of Φ_{Aₙ} that Φ_{Aₙ}(K) ⊆ co(K) ⊆ D since D is convex (Problem 2.2). So infer that D is (Φ_{Aₙ}∘F)-invariant. Set Fₙ = (Φ_{Aₙ}∘F): D → D. Use Problem 4.71 to conclude that, for each n ≥ 1 and every x ∈ D, ‖Fₙ(x) − F(x)‖ < 1/n. Set Dₙ = D ∩ span(Aₙ), each Dₙ being a nonempty compact and convex subset of the finite-dimensional space span(Aₙ). Also, Dₙ is Fₙ-invariant and Fₙ|Dₙ: Dₙ → Dₙ is continuous. Use (ii) to conclude that each Fₙ has a fixed point in Dₙ (say, xₙ ∈ Dₙ such that Fₙ(xₙ) = xₙ). Take the sequence {F(xₙ)} of points in the sequentially compact set K, so that there is a point x₀ ∈ K ⊆ D and a subsequence {F(x_{n_k})} of {F(xₙ)} such that F(x_{n_k}) → x₀ (Definition 3.76 and Theorem 3.80). Since F_{n_k}(x_{n_k}) = x_{n_k} and ‖F_{n_k}(x_{n_k}) − F(x_{n_k})‖ < 1/n_k, it follows that ‖x_{n_k} − x₀‖ ≤ ‖F_{n_k}(x_{n_k}) − F(x_{n_k})‖ + ‖F(x_{n_k}) − x₀‖ → 0. Since F is continuous, Corollary 3.8 ensures that F(x₀) = lim F(x_{n_k}) = x₀.
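The partition-of-unity map Φ_{A_ε} of Problem 4.71 is easy to realize in a finite-dimensional example. A sketch under stated assumptions: K is taken to be the unit square [0,1]² in R² with the Euclidean norm, the ε-net is a regular grid (both choices are mine, purely for illustration), and the key estimate ‖Φ_{A_ε}(x) − x‖ < ε is checked at sample points.

```python
# Phi moves every point of K by less than eps into the convex hull of the
# net -- the quantitative fact driving the Schauder proof above.

import math

EPS = 0.25
STEP = 0.125  # grid fine enough that every point of [0,1]^2 is within EPS of a node
NET = [(i * STEP, j * STEP) for i in range(9) for j in range(9)]

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def psi(a, x):
    """psi_a(x) = eps - ||x - a|| on the closed ball B_eps[a], else 0."""
    return max(EPS - dist(a, x), 0.0)

def phi(x):
    """Phi_{A_eps}(x): weighted average of net points near x."""
    weights = [psi(a, x) for a in NET]
    total = sum(weights)  # > 0 since NET is an eps-net for K
    return tuple(sum(w * a[i] for w, a in zip(weights, NET)) / total
                 for i in range(2))

samples = [(0.0, 0.0), (0.3, 0.7), (0.51, 0.49), (1.0, 1.0)]
displacements = [dist(phi(x), x) for x in samples]
```

Only net points within ε of x get positive weight, so the average cannot move x by ε or more; that is exactly the inequality in the hint.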
5 Hilbert Spaces
What is missing? The algebraic structure of a normed space allowed us to operate with vectors (addition and scalar multiplication), and its topological structure (the one endowed by the norm) gave us a notion of closeness (by means of the metric generated by the norm), which interacts harmoniously with the algebraic operations. In particular, it provided the notion of length of a vector. So what is missing if algebra and topology have already been properly laid on the same underlying set? A full geometric structure is still missing. Algebra and topology are not enough to extend to abstract spaces the geometric concept of relative direction (or angle) between vectors that is familiar in Euclidean geometry. The keyword here is orthogonality, a concept that emerges when we equip a linear space with an inner product . This supplies a tremendously rich structure that leads to remarkable simplifications.
5.1 Inner Product Spaces

We assume throughout this chapter (as we did in Chapter 4) that F denotes either the real field R or the complex field C, both equipped with their usual topologies induced by their usual metrics. Recall that an upper bar stands for complex conjugate in C. That is, for each complex number λ = α + iβ in standard form the real numbers α = Re λ and β = Im λ are the real and imaginary parts of λ, respectively, and the complex number λ̄ = α − iβ is the conjugate of λ. The following are basic properties of conjugates: for every λ, μ in C, (μ̄)̄ = μ, (λ + μ)̄ = λ̄ + μ̄, (λμ)̄ = λ̄μ̄, λ + λ̄ = 2 Re λ, λ − λ̄ = 2i Im λ, λλ̄ = |λ|² = (Re λ)² + (Im λ)², and λ̄ = λ if and only if λ ∈ R.

Let X be a linear space over F. Consider a functional σ: X×X → F defined on the Cartesian product of the linear space X with itself. If σ( · , v): X → F and σ(u, · ): X → F are linear functionals on X for every u, v ∈ X (i.e., if σ is linear in both arguments), then σ is called a bilinear form (or a bilinear functional) on X. If σ( · , v): X → F is a linear functional on X for each v ∈ X, and if σ(u, · ): X → F is a conjugate linear functional on X for each u ∈ X, then σ is said to be a sesquilinear form (or a sesquilinear functional) on X. Equivalently, σ is a sesquilinear form on X if, for every u, v, x, y ∈ X and every α, β ∈ F,

C.S. Kubrusly, The Elements of Operator Theory, DOI 10.1007/978-0-8176-4998-2_5, © Springer Science+Business Media, LLC 2011
σ(αu + βx, v) = ασ(u, v) + βσ(x, v),
σ(u, αv + βy) = ᾱσ(u, v) + β̄σ(u, y).

In other words, a sesquilinear form is additive in both arguments, homogeneous in the first argument, and conjugate homogeneous in the second argument ("sesqui" means "one and a half"). A functional σ: X×X → F is symmetric if σ(x, y) = σ(y, x), and Hermitian symmetric if σ(x, y) = σ(y, x)̄, for every x, y ∈ X. If F = R, then it is clear that σ is symmetric if and only if it is Hermitian symmetric, and the notions of bilinear and sesquilinear forms coincide in the real case. It is also readily verified that a functional σ: X×X → F that is linear in the first argument (i.e., such that σ( · , v): X → F is a linear functional for every v ∈ X) and Hermitian symmetric is a sesquilinear form. Thus a Hermitian symmetric sesquilinear form is precisely a Hermitian symmetric functional on X×X that is linear in the first argument. If σ is a sesquilinear form on X, then the functional φ: X → F defined by φ(x) = σ(x, x) for every x ∈ X is called a quadratic form on X induced (or generated) by σ. If σ is Hermitian symmetric, then the induced quadratic form φ is real-valued (i.e., σ(x, x) ∈ R for every x ∈ X if σ is Hermitian symmetric). Note that if σ is a sesquilinear form, then σ(0, v) = σ(u, 0) = 0 for every u, v ∈ X, and so σ(0, 0) = 0. A quadratic form induced by a Hermitian symmetric sesquilinear form σ is nonnegative or positive if

σ(x, x) ≥ 0 for every x ∈ X,  or  σ(x, x) > 0 for every nonzero x ∈ X,

respectively. Therefore, a quadratic form φ is positive if it is nonnegative and σ(x, x) = 0 only if x = 0. An inner product (or a scalar product) on a linear space X is a Hermitian symmetric sesquilinear form that induces a positive quadratic form. In other words, an inner product on a linear space X is a functional on the Cartesian product X×X that satisfies the following properties, called the inner product axioms.

Definition 5.1. Let X be a linear space over F. A functional ⟨ ; ⟩: X×X → F is an inner product on X if the following conditions are satisfied for all vectors x, y, and z in X and all scalars α in F.

(i) ⟨x + y ; z⟩ = ⟨x ; z⟩ + ⟨y ; z⟩ (additivity),

(ii) ⟨αx ; y⟩ = α⟨x ; y⟩ (homogeneity),

(iii) ⟨x ; y⟩ = ⟨y ; x⟩̄ (Hermitian symmetry),

(iv) ⟨x ; x⟩ ≥ 0 (nonnegativeness),

(v) ⟨x ; x⟩ = 0 only if x = 0 (positiveness).
A linear space X equipped with an inner product on it is an inner product space (or a pre-Hilbert space). If X is a real or complex linear space (so that F = R or F = C) equipped with an inner product on it, then it is referred to as a real or complex inner product space, respectively. Observe that ⟨ ; ⟩: X×X → F is actually a sesquilinear form. In fact,

(i′) ⟨x + y ; z⟩ = ⟨x ; z⟩ + ⟨y ; z⟩ and ⟨x ; w + z⟩ = ⟨x ; w⟩ + ⟨x ; z⟩,

(ii′) ⟨αx ; y⟩ = α⟨x ; y⟩ and ⟨x ; αy⟩ = ᾱ⟨x ; y⟩,

for all vectors x, y, w, z in X and all scalars α in F. Properties (i′) and (ii′) define a sesquilinear form. For an inner product, (i′) and (ii′) are obtained from axioms (i), (ii), and (iii), and are enough by themselves to ensure that

⟨Σᵢ₌₁ⁿ αᵢxᵢ ; β₀y₀⟩ = Σᵢ₌₁ⁿ αᵢβ̄₀⟨xᵢ ; y₀⟩,  ⟨α₀x₀ ; Σᵢ₌₁ⁿ βᵢyᵢ⟩ = Σᵢ₌₁ⁿ α₀β̄ᵢ⟨x₀ ; yᵢ⟩,

and so

⟨Σᵢ₌₀ⁿ αᵢxᵢ ; Σⱼ₌₀ⁿ βⱼyⱼ⟩ = Σᵢ,ⱼ₌₀ⁿ αᵢβ̄ⱼ⟨xᵢ ; yⱼ⟩,

for every αᵢ, βᵢ ∈ F and every xᵢ, yᵢ ∈ X, with i = 0, …, n, for each integer n ≥ 1. Let ‖ ‖²: X → F denote the quadratic form induced by the sesquilinear form ⟨ ; ⟩ on X (the notation ‖ ‖² for the quadratic form induced by an inner product is certainly not a mere coincidence, as we shall see shortly); that is, ‖x‖² = ⟨x ; x⟩ for every x ∈ X. The preceding identities ensure that, for every x, y ∈ X,

‖x + y‖² = ‖x‖² + ⟨x ; y⟩ + ⟨y ; x⟩ + ‖y‖²

and

⟨x ; 0⟩ = ⟨0 ; x⟩ = ⟨0 ; 0⟩ = 0.

The above results hold for every sesquilinear form. Now, since ⟨ ; ⟩ is also Hermitian symmetric (i.e., since ⟨ ; ⟩ also satisfies axiom (iii)), it follows that ⟨x ; y⟩ + ⟨y ; x⟩ = ⟨x ; y⟩ + ⟨x ; y⟩̄ = 2 Re⟨x ; y⟩ for every x, y ∈ X, and hence

‖x + y‖² = ‖x‖² + 2 Re⟨x ; y⟩ + ‖y‖²

by axioms (i) and (iii). Moreover, by using axioms (ii) and (v) we get

⟨x ; y⟩ = 0 for all y ∈ X if and only if x = 0.
The next result is of fundamental importance. It is referred to as the Schwarz (or Cauchy–Schwarz, or even Cauchy–Bunyakovski–Schwarz) inequality.

Lemma 5.2. Let ⟨ ; ⟩: X×X → F be an inner product on a linear space X. Set ‖x‖ = ⟨x ; x⟩^{1/2} for each x ∈ X. If x, y ∈ X, then
|⟨x ; y⟩| ≤ ‖x‖ ‖y‖.

Proof. Take an arbitrary pair of vectors x and y in X, and consider just the first four axioms of Definition 5.1, viz., axioms (i), (ii), (iii), and (iv). Thus

0 ≤ ⟨x − αy ; x − αy⟩ = ⟨x ; x⟩ − ᾱ⟨x ; y⟩ − α⟨y ; x⟩ + |α|²⟨y ; y⟩

for every α ∈ F. Note that ⟨z ; z⟩ ≥ 0 by axiom (iv), and so it has a square root ‖z‖ = ⟨z ; z⟩^{1/2}, for every z ∈ X. Now set α = ⟨x ; y⟩/β for any β > 0, so that

0 ≤ ‖x‖² − (1/β)(2 − ‖y‖²/β)|⟨x ; y⟩|².

If ‖y‖ ≠ 0, then set β = ‖y‖² to get the Schwarz inequality. If ‖y‖ = 0, then 0 ≤ 2|⟨x ; y⟩|² ≤ β‖x‖² for all β > 0, and hence |⟨x ; y⟩| = 0 (which trivially satisfies the Schwarz inequality).

Proposition 5.3. If ⟨ ; ⟩: X×X → F is an inner product on a linear space X, then the function ‖ ‖: X → R, defined by

‖x‖ = ⟨x ; x⟩^{1/2} for each x ∈ X,

is a norm on X.

Proof. Axioms (ii), (iii), (iv), and (v) in Definition 5.1 imply the norm axioms (i), (ii), and (iii) of Definition 4.1. The triangle inequality (axiom (iv) of Definition 4.1) is a consequence of the Schwarz inequality:

0 ≤ ‖x + y‖² = ‖x‖² + 2 Re⟨x ; y⟩ + ‖y‖² ≤ (‖x‖ + ‖y‖)²

for every x and y in X (reason: Re⟨x ; y⟩ ≤ |⟨x ; y⟩| ≤ ‖x‖‖y‖).
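Lemma 5.2 and Proposition 5.3 can be sanity-checked numerically on the standard inner product of Cⁿ (this is only a spot check on random vectors, not a proof; the dimension, sample count, and tolerance are arbitrary choices of this sketch).

```python
# Numerical spot check of the Schwarz inequality |<x;y>| <= ||x|| ||y||
# and the derived triangle inequality on C^n with <x;y> = sum x_i conj(y_i).

import random

random.seed(7)

def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return inner(x, x).real ** 0.5

def random_vector(n=6):
    return [complex(random.uniform(-1, 1), random.uniform(-1, 1))
            for _ in range(n)]

TOL = 1e-12  # slack for floating-point rounding

schwarz_ok = all(
    abs(inner(x, y)) <= norm(x) * norm(y) + TOL
    for x, y in [(random_vector(), random_vector()) for _ in range(100)]
)
triangle_ok = all(
    norm([a + b for a, b in zip(x, y)]) <= norm(x) + norm(y) + TOL
    for x, y in [(random_vector(), random_vector()) for _ in range(100)]
)
```

Both inequalities hold on every sampled pair, as the lemma and proposition guarantee for any inner product space.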
A word on notation and terminology. An inner product space is in fact an ordered pair (X, ⟨ ; ⟩), where X is a linear space and ⟨ ; ⟩ is an inner product on X. We shall often refer to an inner product space by simply saying that "X is an inner product space" without explicitly mentioning the inner product ⟨ ; ⟩ that equips the linear space X. However, there may be occasions when the role played by different inner products should be emphasized and, in these cases, we shall insert a subscript on the inner products (e.g., (X, ⟨ ; ⟩_X) and (Y, ⟨ ; ⟩_Y)). If a linear space X can be equipped with more than one inner product, say ⟨ ; ⟩₁ and ⟨ ; ⟩₂, then (X, ⟨ ; ⟩₁) and (X, ⟨ ; ⟩₂) will represent different inner product spaces with the same linear space X. The norm ‖ ‖ of Proposition 5.3 is the norm induced (or defined, or generated) by the inner product ⟨ ; ⟩, so that every inner product space is a special kind of normed space (and hence a very special kind of linear metric space). Whenever we refer to the topological structure of an inner product space (X, ⟨ ; ⟩), it will always be understood that such a topology on X is that defined by the metric d that is generated by the norm ‖ ‖, which in turn is the one induced by the inner product ⟨ ; ⟩. That is,
d(x, y) = ‖x − y‖ = ⟨x − y ; x − y⟩^{1/2}

for every x, y ∈ X (cf. Propositions 4.2 and 5.3). This is the norm topology on X induced by the inner product. Since every inner product on a linear space induces a norm, it follows that every inner product space is a normed space (equipped with the induced norm). However, an arbitrary norm on a linear space may not be induced by any inner product on it (so that an arbitrary normed space may not be an inner product space). The next result leads to a necessary and sufficient condition that a norm be induced by an inner product.

Proposition 5.4. Let ⟨ ; ⟩ be an inner product on a linear space X and let ‖ ‖ be the induced norm on X. Then
‖x + y‖² + ‖x − y‖² = 2(‖x‖² + ‖y‖²)

for every x, y ∈ X. This is called the parallelogram law. If (X, ⟨ ; ⟩) is a complex inner product space, then

⟨x ; y⟩ = ¼(‖x + y‖² − ‖x − y‖² + i‖x + iy‖² − i‖x − iy‖²)

for every x, y ∈ X. If (X, ⟨ ; ⟩) is a real inner product space, then

⟨x ; y⟩ = ¼(‖x + y‖² − ‖x − y‖²)

for every x, y ∈ X. The above two expressions are referred to as the complex and real polarization identities, respectively.

Proof. Axioms (i), (ii), and (iii) in Definition 5.1 lead to properties (i′) and (ii′), which in turn, by setting ‖x‖² = ⟨x ; x⟩ for every x ∈ X, ensure that

‖x + αy‖² = ‖x‖² + ᾱ⟨x ; y⟩ + α⟨y ; x⟩ + |α|²‖y‖²

for every x, y ∈ X and every α ∈ F. For the parallelogram law, set α = 1 and α = −1. For the complex polarization identity, also set α = i and α = −i. For the real polarization identity, set α = 1, α = −1, and use axiom (iii). Remark: The parallelogram law and the complex polarization identity hold for every sesquilinear form.

Theorem 5.5. (von Neumann). Let X be a linear space. A norm on X is induced by an inner product on X if and only if it satisfies the parallelogram law. Moreover, if a norm on X satisfies the parallelogram law, then the unique inner product that induces it is given by the polarization identity.

Proof. Proposition 5.4 ensures that if a norm on X is induced by an inner product, then it satisfies the parallelogram law, and the inner product on X can be written in terms of this norm according to the polarization identity. Conversely, suppose a norm ‖ ‖ on X satisfies the parallelogram law and consider the mapping ⟨ ; ⟩: X×X → F defined by the polarization identity. Take x, y, and z arbitrary in X. Note that
x + z = ((x + y)/2 + z) + (x − y)/2  and  y + z = ((x + y)/2 + z) − (x − y)/2.

Thus, by the parallelogram law,

‖x + z‖² + ‖y + z‖² = 2(‖(x + y)/2 + z‖² + ‖(x − y)/2‖²).

Suppose F = R, so that ⟨ ; ⟩: X×X → R is the mapping defined by the real polarization identity (on the real normed space X). Hence

⟨x ; z⟩ + ⟨y ; z⟩ = ¼(‖x + z‖² − ‖x − z‖² + ‖y + z‖² − ‖y − z‖²)
= ¼((‖x + z‖² + ‖y + z‖²) − (‖x − z‖² + ‖y − z‖²))
= ½(‖(x + y)/2 + z‖² + ‖(x − y)/2‖² − ‖(x + y)/2 − z‖² − ‖(x − y)/2‖²)
= ½(‖(x + y)/2 + z‖² − ‖(x + y)/2 − z‖²) = 2⟨(x + y)/2 ; z⟩.

The above identity holds for arbitrary x, y, z ∈ X, and so it holds for y = 0. Moreover, the polarization identity ensures that ⟨0 ; z⟩ = 0 for every z ∈ X. Thus, by setting y = 0 above, we get ⟨x ; z⟩ = 2⟨x/2 ; z⟩ for every x, z ∈ X. Then

(i) ⟨x ; z⟩ + ⟨y ; z⟩ = ⟨x + y ; z⟩

for arbitrary x, y, and z in X. It is readily verified (using exactly the same argument) that such an identity still holds if F = C, where the mapping ⟨ ; ⟩: X×X → C now satisfies the complex polarization identity (on the complex normed space X). This is axiom (i) of Definition 5.1 (additivity). To verify axiom (ii) of Definition 5.1 (homogeneity in the first argument), proceed as follows. Take x and y arbitrary in X. The polarization identity ensures that ⟨−x ; y⟩ = −⟨x ; y⟩. Since (i) holds true, it follows by a trivial induction that ⟨nx ; y⟩ = n⟨x ; y⟩, and hence ⟨x ; y⟩ = ⟨n(1/n)x ; y⟩ = n⟨(1/n)x ; y⟩, so that

⟨(1/n)x ; y⟩ = (1/n)⟨x ; y⟩,

for every positive integer n. The above three expressions imply that ⟨qx ; y⟩ = q⟨x ; y⟩ for every rational number q (since ⟨0 ; y⟩ = 0 by the polarization identity). Take an arbitrary α ∈ R and recall that Q is dense in R. Thus there exists a rational-valued sequence {qₙ} that converges in R to α. Moreover, according to (i) and recalling that ⟨−αx ; y⟩ = −⟨αx ; y⟩,

|⟨qₙx ; y⟩ − ⟨αx ; y⟩| = |⟨(qₙ − α)x ; y⟩|.
The polarization identity ensures that |⟨αₙx ; y⟩| → 0 whenever αₙ → 0 in R (because the norm is continuous). Hence |⟨(qₙ − α)x ; y⟩| → 0, and therefore |⟨qₙx ; y⟩ − ⟨αx ; y⟩| → 0, which means ⟨qₙx ; y⟩ → ⟨αx ; y⟩. This implies that ⟨αx ; y⟩ = limₙ⟨qₙx ; y⟩ = limₙ qₙ⟨x ; y⟩ = α⟨x ; y⟩. Outcome:

(ii(a)) ⟨αx ; y⟩ = α⟨x ; y⟩

for every α ∈ R. If F = C, then the complex polarization identity (on the complex space X) ensures that ⟨ix ; y⟩ = i⟨x ; y⟩. Take an arbitrary λ = α + iβ in C and observe by (i) and (ii(a)) that ⟨λx ; y⟩ = ⟨(α + iβ)x ; y⟩ = α⟨x ; y⟩ + iβ⟨x ; y⟩ = (α + iβ)⟨x ; y⟩ = λ⟨x ; y⟩. Conclusion:

(ii(b)) ⟨λx ; y⟩ = λ⟨x ; y⟩

for every λ ∈ C. Axioms (iii), (iv), and (v) of Definition 5.1 (Hermitian symmetry and positiveness) emerge as immediate consequences of the polarization identity. Thus the mapping ⟨ ; ⟩: X×X → F defined by the polarization identity is, in fact, an inner product on X. Moreover, this inner product induces the norm ‖ ‖; that is, ⟨x ; x⟩ = ‖x‖² for every x ∈ X (polarization identity again). Finally, if ⟨ ; ⟩₀: X×X → F is an inner product on X that induces the same norm ‖ ‖ on X, then it must coincide with ⟨ ; ⟩. That is, ⟨x ; y⟩₀ = ⟨x ; y⟩ for every x, y ∈ X (polarization identity once again).

A Hilbert space is a complete inner product space. That is, a Hilbert space is an inner product space that is complete as a metric space with respect to the metric generated by the norm induced by the inner product. A real or complex Hilbert space is a complete real or complex inner product space. In fact, every Hilbert space is a special kind of Banach space: a Hilbert space is a Banach space whose norm is induced by an inner product. By Theorem 5.5, a Hilbert space is a Banach space whose norm satisfies the parallelogram law.
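Theorem 5.5 can be illustrated numerically on Cⁿ (again only a spot check on randomly chosen vectors, not a proof; dimension and seed are arbitrary assumptions of this sketch): the norm induced by the standard inner product satisfies the parallelogram law, and the complex polarization identity recovers the inner product from that norm.

```python
# Parallelogram law and complex polarization identity on C^n with the
# standard inner product <x;y> = sum x_i conj(y_i).

import random

random.seed(11)

def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm_sq(x):
    return inner(x, x).real

def add(x, y, c=1):
    """Return x + c*y componentwise (c may be complex)."""
    return [a + c * b for a, b in zip(x, y)]

def polarization(x, y):
    # <x;y> = 1/4 (||x+y||^2 - ||x-y||^2 + i||x+iy||^2 - i||x-iy||^2)
    return 0.25 * (norm_sq(add(x, y)) - norm_sq(add(x, y, -1))
                   + 1j * norm_sq(add(x, y, 1j))
                   - 1j * norm_sq(add(x, y, -1j)))

x = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(5)]
y = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(5)]

parallelogram_gap = abs(norm_sq(add(x, y)) + norm_sq(add(x, y, -1))
                        - 2 * (norm_sq(x) + norm_sq(y)))
polarization_gap = abs(polarization(x, y) - inner(x, y))
```

Both gaps vanish up to floating-point rounding, as the theorem predicts for any norm that comes from an inner product.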
5.2 Examples

Theorem 5.5 may suggest that just a few of the classical examples of Section 4.2 survive as inner product spaces. This indeed is the case.

Example 5.A. Consider the linear space Fⁿ over F (with either F = R or F = C) and set

⟨x ; y⟩ = Σᵢ₌₁ⁿ ξᵢῡᵢ

for every x = (ξ₁, …, ξₙ) and y = (υ₁, …, υₙ) in Fⁿ. It is readily verified that this defines an inner product on Fⁿ (check the axioms in Definition 5.1), which
induces the norm ‖ ‖₂ on Fⁿ. In particular, it induces the Euclidean norm on Rⁿ, so that (Rⁿ, ⟨ ; ⟩) is the n-dimensional Euclidean space (see Example 4.A). Since (Fⁿ, ‖ ‖₂) is a Banach space, it follows that (Fⁿ, ⟨ ; ⟩) is a Hilbert space. Now consider the norms ‖ ‖ₚ (for p ≥ 1) and ‖ ‖∞ on Fⁿ defined in Example 4.A. If n > 1, then none of them, except the norm ‖ ‖₂, is induced by any inner product on Fⁿ. Indeed, set x = (1, 0, …, 0) and y = (0, 1, 0, …, 0) in Fⁿ and verify that the parallelogram law fails for every norm ‖ ‖ₚ with p ≠ 2, as it also fails for the sup-norm ‖ ‖∞. Therefore, if n > 1, then (Fⁿ, ‖ ‖₂) is the only Hilbert space among the Banach spaces of Example 4.A.

Example 5.B. Consider the Banach spaces (ℓ₊ᵖ, ‖ ‖ₚ) for each p ≥ 1 and (ℓ₊^∞, ‖ ‖∞) of Example 4.B. It is easy to show that, except for (ℓ₊², ‖ ‖₂), these are not Hilbert spaces: the norms ‖ ‖ₚ for every p ≠ 2 and ‖ ‖∞ do not pass the parallelogram law test of Theorem 5.5, and hence are not induced by any possible inner product on ℓ₊ᵖ (p ≠ 2) or on ℓ₊^∞ (e.g., take x = e₁ = (1, 0, 0, 0, …) and y = e₂ = (0, 1, 0, 0, 0, …) in ℓ₊ᵖ ∩ ℓ₊^∞). On the other hand, the function ⟨ ; ⟩: ℓ₊²×ℓ₊² → F given by
⟨x ; y⟩ = Σₖ₌₁^∞ ξₖῡₖ

for every x = {ξₖ}ₖ∈N and y = {υₖ}ₖ∈N in ℓ₊² is well defined (i.e., the above infinite series converges in F for every x, y ∈ ℓ₊² by the Hölder inequality for p = q = 2 and Proposition 4.4). Moreover, it actually is an inner product on ℓ₊² (i.e., it satisfies the axioms of Definition 5.1), which induces the norm ‖ ‖₂ on ℓ₊². Thus, as (ℓ₊², ‖ ‖₂) is a Banach space,

(ℓ₊², ⟨ ; ⟩) is a Hilbert space.
Similarly, the Banach spaces (ℓᵖ, ‖ ‖ₚ) for any 1 ≤ p ≠ 2 and (ℓ^∞, ‖ ‖∞) are not Hilbert spaces. However, the function ⟨ ; ⟩: ℓ²×ℓ² → F given by

⟨x ; y⟩ = Σₖ₌₋∞^∞ ξₖῡₖ

for every x = {ξₖ}ₖ∈Z and y = {υₖ}ₖ∈Z in ℓ² is an inner product on ℓ², which induces the norm ‖ ‖₂ on ℓ². Indeed, the sequence of nonnegative numbers {Σₖ₌₋ₙⁿ |ξₖῡₖ|}ₙ∈N₀ converges in R if the sequences {Σₖ₌₋ₙⁿ |ξₖ|²}ₙ∈N₀ and {Σₖ₌₋ₙⁿ |υₖ|²}ₙ∈N₀ of nonnegative numbers converge in R (the Hölder inequality for p = q = 2), and so {Σₖ₌₋ₙⁿ ξₖῡₖ}ₙ∈N₀ converges in F (by Proposition 4.4). Therefore, the function ⟨ ; ⟩ is well defined and, as it is easy to check, it satisfies the axioms of Definition 5.1. Thus, as (ℓ², ‖ ‖₂) is a Banach space, (ℓ², ⟨ ; ⟩) is a Hilbert space.
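The parallelogram-law test used in Examples 5.A and 5.B is a one-line computation for x = e₁, y = e₂, since e₁ ± e₂ has only two nonzero entries. The sketch below evaluates the gap ‖x + y‖² + ‖x − y‖² − 2(‖x‖² + ‖y‖²) in closed form for the ℓᵖ norms and for the sup-norm; the particular set of p values is an arbitrary choice.

```python
# Parallelogram-law gap for x = e1, y = e2 under the l^p and sup norms.
# For these vectors ||x+y||_p = ||x-y||_p = 2**(1/p) and ||e_i||_p = 1,
# so the gap is 2 * 2**(2/p) - 4, which vanishes exactly at p = 2.

def lp_gap(p):
    s = 2.0 ** (1.0 / p)       # ||e1 + e2||_p = ||e1 - e2||_p
    return 2 * s ** 2 - 4.0    # gap in the parallelogram law

gaps = {p: lp_gap(p) for p in (1, 1.5, 2, 3, 4)}

# Sup-norm: ||e1 + e2||_inf = ||e1 - e2||_inf = 1, so the gap is 2 - 4 = -2.
sup_norm_gap = 2 * 1.0 ** 2 - 4.0
```

A positive gap for p < 2, a negative gap for p > 2 and for the sup-norm, and a zero gap only at p = 2: exactly the dichotomy Theorem 5.5 turns into a characterization.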
Example 5.C. Consider the linear space $C[0,1]$ equipped with any of the norms $\|\cdot\|_p$ ($p \ge 1$) of Example 4.D or with the sup-norm $\|\cdot\|_\infty$ of Example 4.G. Among these norms on $C[0,1]$, the only one that is induced by an inner product on $C[0,1]$ is the norm $\|\cdot\|_2$. Indeed, take $x$ and $y$ in $C[0,1]$ such that $xy = 0$ and $\|x\| = \|y\| \ne 0$, where $\|\cdot\|$ denotes either $\|\cdot\|_p$ for some $p \ge 1$ or $\|\cdot\|_\infty$. That is, suppose $x$ and $y$ are nonzero continuous functions on $[0,1]$ of equal norms such that their nonzero values are attained on disjoint subsets of $[0,1]$. (The figure in the original sketches such a pair: two "bumps" $x$ and $y$ supported on disjoint subintervals of $[0,1]$.)

Observe that $\|x + y\|_p^p = \|x - y\|_p^p = 2\|x\|_p^p$ for every $p \ge 1$ and $\|x + y\|_\infty = \|x - y\|_\infty = \|x\|_\infty$. Thus $\|\cdot\|_p$ for $p \ne 2$ and $\|\cdot\|_\infty$ do not satisfy the parallelogram law, and so these norms are not induced by any inner product on $C[0,1]$ (Theorem 5.5). Now consider the function $\langle\cdot\,;\cdot\rangle : C[0,1] \times C[0,1] \to F$ given by

$$\langle x \,; y \rangle = \int_0^1 x(t)\,\overline{y(t)}\, dt$$

for every $x, y \in C[0,1]$. It is readily verified that $\langle\cdot\,;\cdot\rangle$ is an inner product on $C[0,1]$ that induces the norm $\|\cdot\|_2$. Hence $(C[0,1], \langle\cdot\,;\cdot\rangle)$ is an inner product space. However, $(C[0,1], \langle\cdot\,;\cdot\rangle)$ is not a Hilbert space (reason: $(C[0,1], \|\cdot\|_2)$ is not a Banach space — Example 4.D). As a matter of fact, among the normed spaces $(C[0,1], \|\cdot\|_p)$ for each $p \ge 1$ and $(C[0,1], \|\cdot\|_\infty)$, the only Banach space is $(C[0,1], \|\cdot\|_\infty)$. This leads to a dichotomy: either equip $C[0,1]$ with $\|\cdot\|_2$ to get an inner product space that is not a Banach space, or equip it with $\|\cdot\|_\infty$ to get a Banach space whose norm is not induced by an inner product. In any case, $C[0,1]$ cannot be made into a Hilbert space. Roughly speaking, the set of continuous functions on $[0,1]$ is not large enough to be a Hilbert space.

Let $X$ be a linear space over a field $F$. A functional $\langle\cdot\,;\cdot\rangle : X \times X \to F$ is a semi-inner product on $X$ if it satisfies the first four axioms of Definition 5.1. The difference between an inner product and a semi-inner product is that a semi-inner product is a Hermitian symmetric sesquilinear form that induces a nonnegative quadratic form which is not necessarily positive (i.e., axiom (v) of Definition 5.1 may not be satisfied by a semi-inner product). A semi-inner product $\langle\cdot\,;\cdot\rangle$ on $X$ induces a seminorm $\|\cdot\|$, which in turn generates a pseudometric $d$, namely, $d(x,y) = \|x - y\| = \langle x - y \,; x - y \rangle^{1/2}$ for every $x, y$ in $X$. A semi-inner product space is a linear space equipped with a semi-inner product. Remark: The identity $\|x + y\|^2 = \|x\|^2 + 2\,\mathrm{Re}\,\langle x \,; y \rangle + \|y\|^2$ for every $x, y \in X$ still holds for a semi-inner product and its induced seminorm. Indeed, the
Schwarz inequality, the parallelogram law, and the polarization identities remain valid in a semi-inner product space (i.e., they still hold if we replace "inner product" and "norm" with "semi-inner product" and "seminorm", respectively — cf. proofs of Lemma 5.2 and Proposition 5.4). The same happens with respect to Theorem 5.5.

Proposition 5.6. Let $\|\cdot\|$ be the seminorm induced by a semi-inner product $\langle\cdot\,;\cdot\rangle$ on a linear space $X$. Consider the quotient space $X/N$, where $N = \{x \in X : \|x\| = 0\}$ is a linear manifold of $X$. Set $\langle [x] \,; [y] \rangle_\sim = \langle x \,; y \rangle$ for every $[x]$ and $[y]$ in $X/N$, where $x$ and $y$ are arbitrary vectors in $[x]$ and $[y]$, respectively. This defines an inner product on $X/N$ so that $(X/N, \langle\cdot\,;\cdot\rangle_\sim)$ is an inner product space.

Proof. The seminorm $\|\cdot\|$ is induced by a semi-inner product so that it satisfies the parallelogram law of Proposition 5.4. Consider the norm $\|\cdot\|_\sim$ on $X/N$ of Proposition 4.5 and note that

$$\|[x] + [y]\|_\sim^2 + \|[x] - [y]\|_\sim^2 = \|[x+y]\|_\sim^2 + \|[x-y]\|_\sim^2 = \|x+y\|^2 + \|x-y\|^2 = 2\bigl(\|x\|^2 + \|y\|^2\bigr) = 2\bigl(\|[x]\|_\sim^2 + \|[y]\|_\sim^2\bigr)$$

for every $[x], [y] \in X/N$. Thus $\|\cdot\|_\sim$ satisfies the parallelogram law. This means that it is induced by a (unique) inner product $\langle\cdot\,;\cdot\rangle_\sim$ on $X/N$, which is given in terms of the norm $\|\cdot\|_\sim$ by the polarization identity (Theorem 5.5). On the other hand, the semi-inner product $\langle\cdot\,;\cdot\rangle$ on $X$ also is given in terms of the seminorm $\|\cdot\|$ through the polarization identity as in Proposition 5.4. Since $\|[x] + \alpha[y]\|_\sim = \|x + \alpha y\|$ for every $[x], [y] \in X/N$ and every $\alpha \in F$ (with $x$ and $y$ being arbitrary elements of $[x]$ and $[y]$, respectively), it is readily verified by the polarization identity that $\langle [x] \,; [y] \rangle_\sim = \langle x \,; y \rangle$.

Example 5.D. For each $p \ge 1$ let $r^p(S)$ be the linear space of all scalar-valued Riemann $p$-integrable functions, on a nondegenerate interval $S$ of the real line, equipped with the seminorm $|\ |_p$ of Example 4.C. Again (see Example 5.C), it is easy to show that, except for the seminorm $|\ |_2$, these seminorms do not satisfy the parallelogram law. Moreover,

$$\langle x \,; y \rangle = \int_S x(s)\,\overline{y(s)}\, ds$$

for every $x, y \in r^2(S)$ defines a semi-inner product that induces the seminorm $|\ |_2$ given by $|x|_2 = \bigl(\int_S |x(s)|^2\, ds\bigr)^{1/2}$ for each $x \in r^2(S)$. Consider the linear manifold $N = \{x \in r^2(S) : |x|_2 = 0\}$ of $r^2(S)$, and let $R^2(S)$ be the quotient space $r^2(S)/N$ as in Example 4.C. Set
$\langle [x] \,; [y] \rangle = \langle x \,; y \rangle$ for every $[x], [y] \in R^2(S)$, where $x$ and $y$ are arbitrary vectors in $[x]$ and $[y]$, respectively. According to Proposition 5.6, this defines an inner product on $R^2(S)$, which is the one that induces the norm $\|\cdot\|_2$ of Example 4.C. Since $(R^2(S), \|\cdot\|_2)$ is not a Banach space, it follows that $(R^2(S), \langle\cdot\,;\cdot\rangle)$ is an inner product space but not a Hilbert space. The completion $(L^2(S), \|\cdot\|_2)$ of $(R^2(S), \|\cdot\|_2)$ is a Banach space whose norm is induced by the inner product $\langle\cdot\,;\cdot\rangle$, so that $(L^2(S), \langle\cdot\,;\cdot\rangle)$ is a Hilbert space. This, in fact, is the completion of the inner product space $(C[0,1], \langle\cdot\,;\cdot\rangle)$ of Example 5.C (if $S = [0,1]$ — see Examples 4.C and 4.D). We shall discuss the completion of an inner product space in Section 5.6.

Example 5.E. Let $\{(X_i, \langle\cdot\,;\cdot\rangle_i)\}_{i=1}^n$ be a finite collection of inner product spaces, where all the linear spaces $X_i$ are over the same field $F$, and let $\bigoplus_{i=1}^n X_i$ be the direct sum of the family $\{X_i\}_{i=1}^n$. For each $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ in $\bigoplus_{i=1}^n X_i$, set

$$\langle x \,; y \rangle = \sum_{i=1}^{n} \langle x_i \,; y_i \rangle_i.$$

It is easy to check that this defines an inner product on $\bigoplus_{i=1}^n X_i$ that induces the norm $\|\cdot\|_2$ of Example 4.E. Indeed, if $\|\cdot\|_i$ is the norm on each $X_i$ induced by the inner product $\langle\cdot\,;\cdot\rangle_i$, then $\langle x \,; x \rangle = \sum_{i=1}^n \langle x_i \,; x_i \rangle_i = \sum_{i=1}^n \|x_i\|_i^2 = \|x\|_2^2$ for every $x = (x_1, \dots, x_n)$ in $\bigoplus_{i=1}^n X_i$. Since $\bigl(\bigoplus_{i=1}^n X_i, \|\cdot\|_2\bigr)$ is a Banach space if and only if each $(X_i, \|\cdot\|_i)$ is a Banach space, it follows that $\bigl(\bigoplus_{i=1}^n X_i, \langle\cdot\,;\cdot\rangle\bigr)$ is a Hilbert space if and only if each $(X_i, \langle\cdot\,;\cdot\rangle_i)$ is a Hilbert space. If the inner product spaces $(X_i, \langle\cdot\,;\cdot\rangle_i)$ coincide with a fixed inner product space $(X, \langle\cdot\,;\cdot\rangle_X)$, then $\langle x \,; y \rangle = \sum_{i=1}^n \langle x_i \,; y_i \rangle_X$ defines an inner product on $X^n = \bigoplus_{i=1}^n X$, and $(X^n, \langle\cdot\,;\cdot\rangle)$ is a Hilbert space whenever $(X, \langle\cdot\,;\cdot\rangle_X)$ is a Hilbert space. This generalizes Example 5.A.

Example 5.F. Let $\{(X_k, \langle\cdot\,;\cdot\rangle_k)\}$ be a countably infinite collection of inner product spaces indexed by $\mathbb{N}$ (or by $\mathbb{N}_0$), where all the linear spaces $X_k$ are over the same field $F$. Consider the full direct sum $\bigoplus_{k=1}^\infty X_k$ of $\{X_k\}_{k=1}^\infty$, which is a linear space over $F$. Let $\bigl(\bigoplus_{k=1}^\infty X_k\bigr)_2$ be the linear manifold of $\bigoplus_{k=1}^\infty X_k$ made up of all square-summable sequences $\{x_k\}_{k=1}^\infty$ in $\bigoplus_{k=1}^\infty X_k$. That is (see Example 4.F),
$$\Bigl(\bigoplus_{k=1}^\infty X_k\Bigr)_2 = \Bigl\{\{x_k\}_{k=1}^\infty \in \bigoplus_{k=1}^\infty X_k : \sum_{k=1}^\infty \|x_k\|_k^2 < \infty\Bigr\},$$

where each $\|\cdot\|_k$ is the norm on $X_k$ induced by the inner product $\langle\cdot\,;\cdot\rangle_k$. Take arbitrary sequences $\{x_k\}_{k=1}^\infty$ and $\{y_k\}_{k=1}^\infty$ in $\bigl(\bigoplus_{k=1}^\infty X_k\bigr)_2$ so that the real-valued sequences $\{\|x_k\|_k\}_{k=1}^\infty$ and $\{\|y_k\|_k\}_{k=1}^\infty$ lie in $\ell_+^2$. Write $\langle\cdot\,;\cdot\rangle_{\ell_+^2}$ and $\|\cdot\|_{\ell_+^2}$ for inner product and norm on $\ell_+^2$ (see Example 5.B). Use the Schwarz inequality in each inner product space $X_k$ and also in the Hilbert space $\ell_+^2$ to get

$$\sum_{k=1}^\infty |\langle x_k \,; y_k \rangle_k| \le \sum_{k=1}^\infty \|x_k\|_k \|y_k\|_k = \bigl\langle \{\|x_k\|_k\}_{k=1}^\infty \,; \{\|y_k\|_k\}_{k=1}^\infty \bigr\rangle_{\ell_+^2} \le \bigl\|\{\|x_k\|_k\}_{k=1}^\infty\bigr\|_{\ell_+^2}\, \bigl\|\{\|y_k\|_k\}_{k=1}^\infty\bigr\|_{\ell_+^2}.$$

Therefore $\sum_{k=1}^\infty |\langle x_k \,; y_k \rangle_k| < \infty$, and so the infinite series $\sum_{k=1}^\infty \langle x_k \,; y_k \rangle_k$ is absolutely convergent in the Banach space $(F, |\cdot|)$, which implies that it converges in $(F, |\cdot|)$ by Proposition 4.4. Set

$$\langle x \,; y \rangle = \sum_{k=1}^\infty \langle x_k \,; y_k \rangle_k$$

for every $x = \{x_k\}_{k=1}^\infty$ and $y = \{y_k\}_{k=1}^\infty$ in $\bigl(\bigoplus_{k=1}^\infty X_k\bigr)_2$. It is easy to show that this defines an inner product on $\bigl(\bigoplus_{k=1}^\infty X_k\bigr)_2$ that induces the norm $\|\cdot\|_2$ of Example 4.F. Moreover, since $\bigl(\bigl(\bigoplus_{k=1}^\infty X_k\bigr)_2, \|\cdot\|_2\bigr)$ is a Banach space if and only if each $(X_k, \|\cdot\|_k)$ is a Banach space, it follows that $\bigl(\bigl(\bigoplus_{k=1}^\infty X_k\bigr)_2, \langle\cdot\,;\cdot\rangle\bigr)$ is a Hilbert space if and only if each $(X_k, \langle\cdot\,;\cdot\rangle_k)$ is a Hilbert space.

A similar argument holds if the collection $\{(X_k, \langle\cdot\,;\cdot\rangle_k)\}$ is indexed by $\mathbb{Z}$. Indeed, if we set

$$\Bigl(\bigoplus_{k=-\infty}^\infty X_k\Bigr)_2 = \Bigl\{\{x_k\}_{k=-\infty}^\infty \in \bigoplus_{k=-\infty}^\infty X_k : \sum_{k=-\infty}^\infty \|x_k\|_k^2 < \infty\Bigr\},$$

the linear manifold of the full direct sum $\bigoplus_{k=-\infty}^\infty X_k$ of $\{X_k\}_{k=-\infty}^\infty$ made up of all square-summable nets $\{x_k\}_{k=-\infty}^\infty$ in $\bigoplus_{k=-\infty}^\infty X_k$, then

$$\langle x \,; y \rangle = \sum_{k=-\infty}^\infty \langle x_k \,; y_k \rangle_k$$

for every $x = \{x_k\}_{k=-\infty}^\infty$ and $y = \{y_k\}_{k=-\infty}^\infty$ in $\bigl(\bigoplus_{k=-\infty}^\infty X_k\bigr)_2$ defines the inner product on $\bigl(\bigoplus_{k=-\infty}^\infty X_k\bigr)_2$ that induces the norm $\|\cdot\|_2$ of Example 4.F. Again, if each $(X_k, \langle\cdot\,;\cdot\rangle_k)$ is a Hilbert space, then $\bigl(\bigl(\bigoplus_{k=-\infty}^\infty X_k\bigr)_2, \langle\cdot\,;\cdot\rangle\bigr)$ is a Hilbert space. If the inner product spaces $(X_k, \langle\cdot\,;\cdot\rangle_k)$ coincide with a fixed inner product space $(X, \langle\cdot\,;\cdot\rangle_X)$, then set
$$\ell_+^2(X) = \Bigl(\bigoplus_{k=1}^\infty X\Bigr)_2 \qquad\text{and}\qquad \ell^2(X) = \Bigl(\bigoplus_{k=-\infty}^\infty X\Bigr)_2$$

as in Example 4.F. If $(X, \langle\cdot\,;\cdot\rangle_X)$ is a Hilbert space, then $(\ell_+^2(X), \langle\cdot\,;\cdot\rangle)$ and $(\ell^2(X), \langle\cdot\,;\cdot\rangle)$ are Hilbert spaces.
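A finite orthogonal direct sum of the kind described in Examples 5.E and 5.F is easy to emulate numerically. In the sketch below, the two component spaces $\mathbb{R}^2$ and $\mathbb{R}^3$ with their usual dot products are illustrative assumptions, not part of the text; the direct-sum inner product is the sum of the componentwise inner products, and $\langle x \,; x \rangle$ reproduces $\sum_i \|x_i\|_i^2$.

```python
import numpy as np

# Components of x = (x1, x2) and y = (y1, y2) in the direct sum R^2 (+) R^3.
x = (np.array([1.0, 2.0]), np.array([0.0, 1.0, -1.0]))
y = (np.array([3.0, -1.0]), np.array([2.0, 2.0, 0.0]))

# <x ; y> = <x1 ; y1>_1 + <x2 ; y2>_2
ip = sum(float(xi @ yi) for xi, yi in zip(x, y))

# <x ; x> = ||x1||_1^2 + ||x2||_2^2, the square of the norm ||.||_2
norm_sq = sum(float(xi @ xi) for xi in x)

print(ip, norm_sq)   # 3.0 7.0
```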
5.3 Orthogonality

Let $a$ and $b$ be nonzero vectors in the Euclidean plane $\mathbb{R}^2$, and let $\theta_{ab}$ be the angle between the line segments joining these points to the origin (this is usually called the angle between $a$ and $b$). Set $\tilde a = \|a\|^{-1} a = (\alpha_1, \alpha_2)$ and $\tilde b = \|b\|^{-1} b = (\beta_1, \beta_2)$ in the unit circle about the origin. It is an exercise of elementary plane geometry to verify that $\cos\theta_{ab} = \alpha_1\beta_1 + \alpha_2\beta_2 = \langle \tilde a \,; \tilde b \rangle = \|a\|^{-1}\|b\|^{-1}\langle a \,; b \rangle$. We shall be particularly concerned with the notion of orthogonal (or perpendicular) vectors $a$ and $b$. The line segments joining $a$ and $b$ to the origin are perpendicular if $\theta_{ab} = \frac{\pi}{2}$ (equivalently, if $\cos\theta_{ab} = 0$), which means that $\langle a \,; b \rangle = 0$. The notions of angle and orthogonality can be extended from the Euclidean plane to a real inner product space $(X, \langle\cdot\,;\cdot\rangle)$ by setting

$$\cos\theta_{xy} = \frac{\langle x \,; y \rangle}{\|x\|\,\|y\|}$$

whenever $x$ and $y$ are nonzero vectors in $X \ne \{0\}$. Note that $-1 \le \cos\theta_{xy} \le 1$ by the Schwarz inequality, and also that $\cos\theta_{xy} = 0$ if and only if $\langle x \,; y \rangle = 0$.

Definition 5.7. Two vectors $x$ and $y$ in any (real or complex) inner product space $(X, \langle\cdot\,;\cdot\rangle)$ are said to be orthogonal (notation: $x \perp y$) if $\langle x \,; y \rangle = 0$. A vector $x$ in $X$ is orthogonal to a subset $A$ of $X$ (notation: $x \perp A$) if it is orthogonal to every vector in $A$ (i.e., if $\langle x \,; y \rangle = 0$ for every $y \in A$). Two subsets $A$ and $B$ of $X$ are orthogonal (notation: $A \perp B$) if every vector in $A$ is orthogonal to every vector in $B$ (i.e., if $\langle x \,; y \rangle = 0$ for every $x \in A$ and every $y \in B$).

Thus $A$ and $B$ are orthogonal if there is no $x$ in $A$ and no $y$ in $B$ such that $\langle x \,; y \rangle \ne 0$. In this sense the empty set $\varnothing$ is orthogonal to every subset of $X$. Clearly, $x \perp y$ if and only if $y \perp x$, and hence $A \perp B$ if and only if $B \perp A$, so that $\perp$ is a symmetric relation both on $X$ and on the power set $\wp(X)$. We write $x \not\perp y$ if $x \in X$ and $y \in X$ are not orthogonal. Similarly, $A \not\perp B$ means that $A \subseteq X$ and $B \subseteq X$ are not orthogonal. Note that if there exists a nonzero vector $x$ in $A \cap B$, then $\langle x \,; x \rangle = \|x\|^2 \ne 0$, and hence $A \not\perp B$. Therefore,

$$A \perp B \qquad\text{implies}\qquad A \cap B \subseteq \{0\}.$$

We shall say that a subset $A$ of an inner product space $X$ is an orthogonal set (or a set of pairwise orthogonal vectors) if $x \perp y$ for every pair $\{x, y\}$ of distinct vectors in $A$. Similarly, an $X$-valued sequence $\{x_k\}$ is an orthogonal sequence (or a sequence of pairwise orthogonal vectors) if $x_k \perp x_j$ whenever
$k \ne j$. Since $\|x + y\|^2 = \|x\|^2 + 2\,\mathrm{Re}\,\langle x \,; y \rangle + \|y\|^2$ for every $x$ and $y$ in $X$, it follows as an immediate consequence of the definition of orthogonality that

$$x \perp y \qquad\text{implies}\qquad \|x + y\|^2 = \|x\|^2 + \|y\|^2.$$

This is the Pythagorean Theorem. The next result is a generalization of it for a finite orthogonal set.

Proposition 5.8. If $\{x_i\}_{i=0}^n$ is a finite set of pairwise orthogonal vectors in an inner product space, then

$$\Bigl\|\sum_{i=0}^n x_i\Bigr\|^2 = \sum_{i=0}^n \|x_i\|^2.$$

Proof. We have already seen that the result holds for $n = 1$ (i.e., it holds for every pair of distinct orthogonal vectors). Suppose it holds for some $n \ge 1$ (i.e., suppose $\|\sum_{i=0}^n x_i\|^2 = \sum_{i=0}^n \|x_i\|^2$ for every orthogonal set $\{x_i\}_{i=0}^n$ with $n+1$ elements). Let $\{x_i\}_{i=0}^{n+1}$ be an arbitrary orthogonal set with $n+2$ elements. Since $x_{n+1} \perp \{x_i\}_{i=0}^n$, it follows that $x_{n+1} \perp \sum_{i=0}^n x_i$ (since $\langle x_{n+1} \,; \sum_{i=0}^n x_i \rangle = \sum_{i=0}^n \langle x_{n+1} \,; x_i \rangle$). Hence

$$\Bigl\|\sum_{i=0}^{n+1} x_i\Bigr\|^2 = \Bigl\|\sum_{i=0}^{n} x_i + x_{n+1}\Bigr\|^2 = \Bigl\|\sum_{i=0}^{n} x_i\Bigr\|^2 + \|x_{n+1}\|^2 = \sum_{i=0}^{n+1} \|x_i\|^2,$$

so that the result holds for $n+1$ (i.e., it holds for every orthogonal set with $n+2$ elements whenever it holds for every orthogonal set with $n+1$ elements), which completes the proof by induction.

Recall that an $X$-valued sequence $\{x_k\}_{k=1}^\infty$ (where $X$ is any normed space) is square-summable if $\sum_{k=1}^\infty \|x_k\|^2 < \infty$. Here is a countably infinite version of the Pythagorean Theorem.

Corollary 5.9. Let $\{x_k\}_{k=1}^\infty$ be a sequence of pairwise orthogonal vectors in an inner product space $X$.

(a) If the infinite series $\sum_{k=1}^\infty x_k$ converges in $X$, then $\{x_k\}_{k=1}^\infty$ is a square-summable sequence and $\bigl\|\sum_{k=1}^\infty x_k\bigr\|^2 = \sum_{k=1}^\infty \|x_k\|^2$.

(b) If $X$ is a Hilbert space and $\{x_k\}_{k=1}^\infty$ is a square-summable sequence, then the infinite series $\sum_{k=1}^\infty x_k$ converges in $X$.
Proof. Let $\{x_k\}_{k=1}^\infty$ be an orthogonal sequence in $X$.

(a) If the infinite series $\sum_{k=1}^\infty x_k$ converges in $X$; that is, if $\sum_{k=1}^n x_k \to \sum_{k=1}^\infty x_k$ in $X$ as $n \to \infty$, then $\bigl\|\sum_{k=1}^n x_k\bigr\|^2 \to \bigl\|\sum_{k=1}^\infty x_k\bigr\|^2$ as $n \to \infty$ (reason: norm and squaring are continuous mappings). Proposition 5.8 says that $\bigl\|\sum_{k=1}^n x_k\bigr\|^2 = \sum_{k=1}^n \|x_k\|^2$ for every $n \ge 1$, and hence $\sum_{k=1}^n \|x_k\|^2 \to \bigl\|\sum_{k=1}^\infty x_k\bigr\|^2$ as $n \to \infty$.
(b) Consider the $X$-valued sequence $\{y_n\}_{n=1}^\infty$ of partial sums of $\{x_k\}_{k=1}^\infty$; that is, set $y_n = \sum_{k=1}^n x_k$ for each integer $n \ge 1$. By Proposition 5.8 we know that $\|y_{n+m} - y_n\|^2 = \sum_{k=n+1}^{n+m} \|x_k\|^2$ for every $m, n \ge 1$. If $\sum_{k=1}^\infty \|x_k\|^2 < \infty$, then $\sup_{m \ge 1} \|y_{n+m} - y_n\|^2 = \sum_{k=n+1}^\infty \|x_k\|^2 \to 0$ as $n \to \infty$ (Problem 3.11), and hence $\{y_n\}_{n=1}^\infty$ is a Cauchy sequence in $X$ (Problem 3.51). If $X$ is Hilbert, then $\{y_n\}_{n=1}^\infty$ converges in $X$, which means that the infinite series $\sum_{k=1}^\infty x_k$ converges in $X$.
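Proposition 5.8 lends itself to a quick numerical check. The sketch below (NumPy and the Gram–Schmidt step are illustrative assumptions, not part of the text) produces four pairwise orthogonal vectors in $\mathbb{R}^6$ and compares $\bigl\|\sum_i x_i\bigr\|^2$ with $\sum_i \|x_i\|^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
vs = rng.standard_normal((4, 6))

# Gram-Schmidt: orthogonalize each vector against the previous ones.
ortho = []
for v in vs:
    for u in ortho:
        v = v - (v @ u) / (u @ u) * u   # subtract the component along u
    ortho.append(v)

lhs = np.linalg.norm(sum(ortho))**2             # ||sum of the x_i||^2
rhs = sum(np.linalg.norm(u)**2 for u in ortho)  # sum of the ||x_i||^2
print(lhs, rhs)   # equal up to floating-point error
```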
Therefore, if $\{x_k\}_{k=1}^\infty$ is an orthogonal sequence in a Hilbert space $H$, then $\sum_{k=1}^\infty \|x_k\|^2 < \infty$ if and only if the infinite series $\sum_{k=1}^\infty x_k$ converges in $H$ and, in this case, $\bigl\|\sum_{k=1}^\infty x_k\bigr\|^2 = \sum_{k=1}^\infty \|x_k\|^2$.

Example 5.G. Let $\{(X_k, \langle\cdot\,;\cdot\rangle_k)\}_{k=1}^\infty$ be a sequence of Hilbert spaces. Consider the Hilbert space $\bigl(\bigl(\bigoplus_{k=1}^\infty X_k\bigr)_2, \langle\cdot\,;\cdot\rangle\bigr)$, where $\bigl(\bigoplus_{k=1}^\infty X_k\bigr)_2$ is the linear space of all square-summable sequences in the full direct sum $\bigoplus_{k=1}^\infty X_k$ and $\langle\cdot\,;\cdot\rangle$ is the inner product of Example 5.F; that is,
∞
xk ; yk k
k=1
. -∞ ∞ for every x = {xk }∞ k=1 and y = {yk }k=1 in k=1 Xk 2 . This is referred to as an (external) orthogonal direct sum. An orthogonal direct sum actually deserves its if we identify each linear space Xi with the linear - ∞ . name. Indeed, manifold ∞ k=1 Xi (k) of k=1 Xk 2 such that Xi (k) = {0k } ⊆ Xk for k = i and Xi (i) = Xi (as in Example 4.I), then it is clear that Xi ⊥ Xj
i = j, ∞ ∞ where such an orthogonality is interpreted -∞ as . k=1 Xi (k) ⊥ k=1 Xj (k) with respect to the inner product ; on k=1 Xk 2 . Observe that the norm # # -∞ . induced on X by this inner product is given by k k=1 2 #x#2 = x ; x =
whenever
∞ k=1
-∞
xk ; xk k =
∞
#xk #2k
k=1
.
for every x = {xk }∞ via Corollary k=1 in k=1 Xk 2 . This can also be -verified . ∞ 5.9 as follows. Take an arbitrary vector x = {xk }∞ in X k 2 (i.e., take k=1 k=1 ∞ an arbitrary square-summable sequence from k=1 Xk ). Set xi (k) = δik xk in Xk for every k, i ≥ 1 (i.e., xi (k) = 0k if k = i and xi (i) = xi ). For each i ≥ 1 consider the vector x i = {xi (k)}∞ k=1 so that x 1 = (x1 , 02 , 03 , . . .) and, for i ≥ 2, . -∞ x i = {xi (k)}∞ k=1 = (01 , . . . , 0i−1 , xi , 0i+1 , . . .) in k=1 Xk 2 . Note that {x i }∞ i=1 is an orthogonal square-summable sequence. (Indeed, x i lies ∞ in k=1 Xi (k), #x i # = #xi #i , and {xi }∞ i=1 is square-summable.) Thus Corollary
324
5. Hilbert Spaces
5.9 that (i) the infiniteseries ∞ converges in the Hilbert space i=1 x -ensures . i ∞ ∞ ∞ 2 2 i # . A word on notation: k=1 Xk 2 , ; , and (ii) # i=1 x i # = i=1 #x ∞ and vector We are denoting vector addition in the linear space k=1 Xk by ⊕ ∞ n subtraction by &, as usual. From (i), x = x (because x & ( i i=1 i=1 x i ) = ∞ ∞ 2 2 x for each n ≥ 1) and from (ii), #x# = #x # . Therefore, i i i=n+1 i=1 #x#2 =
∞ i=1
#x i #2 =
∞
#xi #2i .
i=1
If $(X, \langle\cdot\,;\cdot\rangle)$ is an inner product space, and if $\mathcal{M}$ is a linear manifold of the linear space $X$, then it is easy to show that the restriction $\langle\cdot\,;\cdot\rangle_\mathcal{M} : \mathcal{M} \times \mathcal{M} \to F$ of the inner product $\langle\cdot\,;\cdot\rangle : X \times X \to F$ to $\mathcal{M} \times \mathcal{M}$ is an inner product on $\mathcal{M}$, so that $(\mathcal{M}, \langle\cdot\,;\cdot\rangle_\mathcal{M})$ is an inner product space. Moreover, the norm $\|\cdot\|_\mathcal{M} : \mathcal{M} \to \mathbb{R}$ induced by the inner product $\langle\cdot\,;\cdot\rangle_\mathcal{M}$ on $\mathcal{M}$ coincides with the restriction to $\mathcal{M}$ of the norm $\|\cdot\| : X \to \mathbb{R}$ induced by the inner product $\langle\cdot\,;\cdot\rangle$ on $X$. Thus $(\mathcal{M}, \|\cdot\|_\mathcal{M})$ is a linear manifold of the normed space $(X, \|\cdot\|)$. Whenever a linear manifold of an inner product space is regarded as an inner product space, it will always be understood that the inner product on it is the restricted inner product $\langle\cdot\,;\cdot\rangle_\mathcal{M}$. We shall drop the subscript and write $(\mathcal{M}, \langle\cdot\,;\cdot\rangle)$ instead of $(\mathcal{M}, \langle\cdot\,;\cdot\rangle_\mathcal{M})$, and refer to the inner product space $(\mathcal{M}, \langle\cdot\,;\cdot\rangle)$ by simply saying that "$\mathcal{M}$ is a linear manifold of $X$".

Recall that a subspace of a normed space is a closed linear manifold of it. Hence a subspace of an inner product space $X$ is a linear manifold of the linear space $X$ that is closed in the inner product topology. That is, a subspace $\mathcal{M}$ of an inner product space $X$ is a linear manifold of $X$ that is closed as a subset of $X$ when $X$ is regarded as a metric space whose metric is that generated by the norm that is induced by the inner product. According to Proposition 4.7, a linear manifold of a Hilbert space $H$ is a Hilbert space if and only if it is a subspace of $H$. We observed in Section 4.3 that the sum of subspaces is not necessarily a subspace (it is a linear manifold but may not be closed). An important consequence of the Pythagorean Theorem is that the sum of orthogonal subspaces of a Hilbert space is a subspace.

Theorem 5.10. (a) If $\mathcal{M}$ and $\mathcal{N}$ are complete orthogonal linear manifolds of an inner product space $X$, then $\mathcal{M} + \mathcal{N}$ is a complete linear manifold of $X$. (b) If $\mathcal{M}$ and $\mathcal{N}$ are orthogonal subspaces of a Hilbert space $H$, then the sum $\mathcal{M} + \mathcal{N}$ is a subspace of $H$.

Proof. Let $\mathcal{M}$ and $\mathcal{N}$ be orthogonal linear manifolds of an inner product space $X$. Take an arbitrary Cauchy sequence $\{x_n\}$ in $\mathcal{M} + \mathcal{N}$, so that $x_n = u_n + v_n$ with $u_n$ in $\mathcal{M}$ and $v_n$ in $\mathcal{N}$ for each $n$. Since $\mathcal{M}$ and $\mathcal{N}$ are linear manifolds of $X$, it follows that $u_m - u_n$ lies in $\mathcal{M}$ and $v_m - v_n$ lies in $\mathcal{N}$, and hence $u_m - u_n \perp v_m - v_n$, for every pair of integers $m$ and $n$ (since $\mathcal{M} \perp \mathcal{N}$). Writing $x_m - x_n = (u_m - u_n) + (v_m - v_n)$, the Pythagorean Theorem ensures that

$$\|x_m - x_n\|^2 = \|u_m - u_n\|^2 + \|v_m - v_n\|^2$$
for every $m$ and $n$. This implies that $\{u_n\}$ and $\{v_n\}$ are Cauchy sequences in $\mathcal{M}$ and $\mathcal{N}$, respectively (because $\{x_n\}$ is a Cauchy sequence).

(a) If $\mathcal{M}$ and $\mathcal{N}$ are complete, then $\{u_n\}$ converges in $\mathcal{M}$ and $\{v_n\}$ converges in $\mathcal{N}$. Recalling that addition is a continuous operation (Problem 4.1), we get from Corollary 3.8 that $\{x_n\}$ converges in $\mathcal{M} + \mathcal{N}$. Conclusion: Every Cauchy sequence in $\mathcal{M} + \mathcal{N}$ converges in $\mathcal{M} + \mathcal{N}$. Thus $\mathcal{M} + \mathcal{N}$ is complete (and hence closed in $X$ by Theorem 3.40(a)).

(b) If $\mathcal{M}$ and $\mathcal{N}$ are closed linear manifolds of a complete inner product space $H$, then they are complete (Theorem 3.40(b)), and the linear manifold $\mathcal{M} + \mathcal{N}$ of $H$ is complete (and therefore closed in $H$) according to item (a).

Remark: Theorem 5.10(a) fails if $\mathcal{M}$ and $\mathcal{N}$ are either (1) orthogonal but not complete or (2) complete but not orthogonal. Equivalently, Theorem 5.10(b) fails if $\mathcal{M}$ and $\mathcal{N}$ are either (1) orthogonal subspaces of an incomplete inner product space or (2) nonorthogonal subspaces of a Hilbert space. That is, completeness and orthogonality are both crucial assumptions in the statement of Theorem 5.10. This will be verified in Problems 5.12 and 5.13.

Let $\{\mathcal{M}_\gamma\}_{\gamma\in\Gamma}$ be an arbitrary nonempty indexed family of subspaces of a Hilbert space $H$ (i.e., an arbitrary nonempty subcollection of $\operatorname{Lat}(H)$). Recall that the sum $\sum_{\gamma\in\Gamma} \mathcal{M}_\gamma$ of $\{\mathcal{M}_\gamma\}_{\gamma\in\Gamma}$ is the linear manifold of $H$ consisting of all finite sums of vectors in $H$ with each summand being a vector in one of the subspaces $\mathcal{M}_\gamma$. That is,

$$\sum_{\gamma\in\Gamma} \mathcal{M}_\gamma = \operatorname{span} \bigcup_{\gamma\in\Gamma} \mathcal{M}_\gamma.$$
Corollary 5.11. Every finite sum of pairwise orthogonal subspaces of a Hilbert space $H$ is itself a subspace of $H$.

Proof. Let $H$ be a Hilbert space. Theorem 5.10(b) says that the sum of every pair of orthogonal subspaces of $H$ is again a subspace of $H$. Take an arbitrary integer $n \ge 2$. Suppose the sum of every set of $n$ pairwise orthogonal subspaces of $H$ is a subspace of $H$. Now take an arbitrary collection of $n+1$ pairwise orthogonal subspaces of $H$, say, $\{\mathcal{M}_i\}_{i=1}^{n+1}$. If $x \in \sum_{i=1}^n \mathcal{M}_i$, then $x = \sum_{i=1}^n x_i$ with each $x_i$ in $\mathcal{M}_i$, and hence $\langle x \,; x_{n+1} \rangle = \sum_{i=1}^n \langle x_i \,; x_{n+1} \rangle = 0$ whenever $x_{n+1}$ lies in $\mathcal{M}_{n+1}$ (because $\mathcal{M}_i \perp \mathcal{M}_{n+1}$ for every $i = 1, \dots, n$). Thus $\sum_{i=1}^n \mathcal{M}_i \perp \mathcal{M}_{n+1}$. As $\sum_{i=1}^n \mathcal{M}_i$ is assumed to be a subspace of $H$, and since

$$\sum_{i=1}^{n+1} \mathcal{M}_i = \operatorname{span} \bigcup_{i=1}^{n+1} \mathcal{M}_i = \operatorname{span}\Bigl(\bigcup_{i=1}^{n} \mathcal{M}_i \cup \mathcal{M}_{n+1}\Bigr) = \sum_{i=1}^{n} \mathcal{M}_i + \mathcal{M}_{n+1},$$

it follows by Theorem 5.10(b) that $\sum_{i=1}^{n+1} \mathcal{M}_i$ is a subspace of $H$. This completes the proof by induction.
5.4 Orthogonal Complement

If $A$ is a subset of an inner product space $X$, then the orthogonal complement of $A$ is the set

$$A^\perp = \bigl\{x \in X : x \perp A\bigr\} = \bigl\{x \in X : \langle x \,; y \rangle = 0 \text{ for every } y \in A\bigr\}$$

consisting of all vectors in $X$ that are orthogonal to every vector in $A$. If $A$ is the empty set $\varnothing$, then for every $x$ in $X$ there is no vector $y$ in $A$ for which $\langle x \,; y \rangle \ne 0$, and hence $\varnothing^\perp = X$. It is plain that $x \perp \{0\}$ for every $x \in X$, and $x \perp X$ if and only if $x = 0$. Hence

$$\{0\}^\perp = X \qquad\text{and}\qquad X^\perp = \{0\}.$$

Let $A$ and $B$ be nonempty subsets of $X$. The next results are immediate consequences of the definition of orthogonal complement:

$$A \perp A^\perp, \qquad A \cap A^\perp \subseteq \{0\}, \qquad\text{and}\qquad A \cap A^\perp = \{0\} \text{ whenever } 0 \in A$$

(reason: if there exists $x \in A \cap A^\perp$, then $\langle x \,; x \rangle = 0$), and

$$A \perp B \qquad\text{if and only if}\qquad A \subseteq B^\perp.$$

Since $\perp$ is a symmetric relation (i.e., $A \perp B$ if and only if $B \perp A$), the above equivalent assertions also are equivalent to $B \subseteq A^\perp$. Moreover,

$$A \perp B \qquad\text{implies}\qquad A \cap B \subseteq \{0\}$$

(if $A \subseteq B^\perp$, then $A \cap B \subseteq B^\perp \cap B \subseteq \{0\}$). It is readily verified that

$$A \subseteq B \qquad\text{implies}\qquad B^\perp \subseteq A^\perp \quad\text{and so}\quad A^{\perp\perp} \subseteq B^{\perp\perp},$$

where $A^{\perp\perp} = (A^\perp)^\perp$. Since $A \perp A^\perp$ and $A^\perp \perp A^{\perp\perp}$, we get $A \subseteq A^{\perp\perp}$ (so that $A^{\perp\perp\perp} \subseteq A^\perp$) and $A^\perp \subseteq A^{\perp\perp\perp}$, where $A^{\perp\perp\perp} = (A^{\perp\perp})^\perp$. Therefore,

$$A \subseteq A^{\perp\perp} \qquad\text{and}\qquad A^\perp = A^{\perp\perp\perp}.$$
Proposition 5.12. The orthogonal complement $A^\perp$ of every subset $A$ of any inner product space $X$ is a subspace of $X$. Moreover,

$$A^\perp = (A^\perp)^- = (A^-)^\perp = (\operatorname{span} A)^\perp = \bigl(\textstyle\bigvee A\bigr)^\perp.$$

The orthogonal complement of every dense subset of $X$ is the zero space:

$$A^\perp = \{0\} \qquad\text{whenever}\qquad A^- = X.$$

Proof. Suppose $A \ne \varnothing$ (otherwise the results are trivially verified). Since the inner product is linear in the first argument, it follows that $A^\perp$ is a linear manifold of the linear space $X$. If $x \perp A$, then $x \perp \sum_{i=1}^n \alpha_i y_i$ for every integer $n \ge 1$
whenever $y_i \in A$ and $\alpha_i \in F$ for each $i = 1, \dots, n$, and hence $A^\perp \subseteq (\operatorname{span} A)^\perp$. On the other hand, $A \subseteq \operatorname{span} A$, so that $(\operatorname{span} A)^\perp \subseteq A^\perp$. Then

$$A^\perp = (\operatorname{span} A)^\perp.$$

That $A^\perp$ is closed in $X$ is a consequence of the continuity of the inner product (Problem 5.6). Actually, if $\{x_n\}$ is an $A^\perp$-valued sequence that converges in $X$ to $x \in X$, then (cf. Corollary 3.8) $\langle x \,; y \rangle = \langle \lim x_n \,; y \rangle = \lim \langle x_n \,; y \rangle = 0$ for every $y \in A$, which implies that $x \in A^\perp$. Therefore, $A^\perp$ is closed in $X$ by the Closed Set Theorem (Theorem 3.30); that is, $A^\perp = (A^\perp)^-$, and so $A^\perp$ is a subspace (i.e., a closed linear manifold) of $X$. Now take an arbitrary $x$ in $A^\perp$ and an arbitrary $y$ in $A^-$. By Proposition 3.27 there is an $A$-valued sequence $\{y_n\}$ that converges in $X$ to $y$. Using Corollary 3.8 again, and recalling that the inner product is continuous, we get $\langle y \,; x \rangle = \langle \lim y_n \,; x \rangle = \lim \langle y_n \,; x \rangle = 0$. Thus $A^\perp \perp A^-$, so that $A^\perp \subseteq (A^-)^\perp$. But $(A^-)^\perp \subseteq A^\perp$ because $A \subseteq A^-$. Hence

$$A^\perp = (A^-)^\perp.$$

Since $A^\perp = (\operatorname{span} A)^\perp$ and $A^\perp = (A^-)^\perp$ for every subset $A$ of $X$,

$$A^\perp = (\operatorname{span} A)^\perp = \bigl((\operatorname{span} A)^-\bigr)^\perp = \bigl(\textstyle\bigvee A\bigr)^\perp.$$

Finally, if $A^- = X$, then $A^\perp = (A^-)^\perp = X^\perp = \{0\}$.

Remark: If $L \in \mathcal{L}[X, Y]$, where $X$ is an inner product space, then the linear transformation $L|_{\mathcal{N}(L)^\perp}$ is injective (i.e., $\mathcal{N}(L|_{\mathcal{N}(L)^\perp}) = \{0\}$). In fact, if $v \in \mathcal{N}(L)^\perp$ lies in $\mathcal{N}(L|_{\mathcal{N}(L)^\perp})$, then $v \in \mathcal{N}(L) \cap \mathcal{N}(L)^\perp = \{0\}$.

The next theorem is of critical importance; it may be thought of as a pivotal result in the theory of Hilbert spaces. Recall that the distance $d(x, M)$ of a point $x$ in a normed space $X$ to a nonempty subset $M$ of $X$ is the real number

$$d(x, M) = \inf_{u \in M} \|x - u\|.$$
Theorem 5.13. Let $x$ be an arbitrary vector in a Hilbert space $H$.

(a) If $M$ is a closed convex nonempty subset of $H$, then there exists a unique vector $u_x$ in $M$ such that $\|x - u_x\| = d(x, M)$.

(b) Moreover, if $\mathcal{M}$ is a subspace of $H$, then the unique vector $u_x$ in $\mathcal{M}$ for which $\|x - u_x\| = d(x, \mathcal{M})$ is the unique vector in $\mathcal{M}$ such that the difference $x - u_x$ is orthogonal to $\mathcal{M}$; that is, such that $x - u_x \in \mathcal{M}^\perp$.
Proof. (a) Let $x$ be an arbitrary vector in $H$ and let $M$ be a nonempty subset of $H$, so that $d(x, M) = \inf_{u \in M} \|x - u\|$ exists in $\mathbb{R}$. Therefore, for each integer $n \ge 1$ there exists $u_n \in M$ such that

$$d(x, M) \le \|x - u_n\| < d(x, M) + \tfrac{1}{n}.$$

Consider the $M$-valued sequence $\{u_n\}$. $H$ is an inner product space, and so the parallelogram law ensures that

$$\|2x - u_m - u_n\|^2 + \|u_n - u_m\|^2 = 2\bigl(\|x - u_m\|^2 + \|x - u_n\|^2\bigr)$$

for each $m, n \ge 1$. Since $M$ is convex, it follows that $\frac{1}{2}(u_m + u_n) \in M$, and hence $2\,d(x, M) \le 2\,\|\frac{1}{2}(u_m + u_n) - x\| = \|2x - u_m - u_n\|$, so that

$$0 \le \|u_m - u_n\|^2 \le 2\bigl(\|x - u_m\|^2 + \|x - u_n\|^2 - 2\,d(x, M)^2\bigr)$$

for every $m, n \ge 1$. This inequality and the fact that $\|x - u_n\| \to d(x, M)$ as $n \to \infty$ are enough to ensure that $\{u_n\}$ is a Cauchy sequence in $H$, and therefore it converges in the Hilbert space $H$ to, say, $u_x \in H$. But the norm is a continuous function, so that (see Corollary 3.8)

$$\|x - u_x\| = \|x - \lim u_n\| = \lim \|x - u_n\| = d(x, M).$$

Moreover, since $M$ is closed in $H$ and $\{u_n\}$ is an $M$-valued sequence that converges to $u_x$ in $H$, it follows by the Closed Set Theorem (Theorem 3.30) that $u_x \in M$. Conclusion: There exists $u_x$ in $M$ such that $\|x - u_x\| = d(x, M)$. To prove uniqueness take an arbitrary $u$ in $M$ such that $\|x - u\| = d(x, M)$. Observe that $\frac{1}{2}(u_x + u)$ lies in $M$ because $M$ is convex, and hence $d(x, M) \le \|\frac{1}{2}(u_x + u) - x\|$. Thus $4\,d(x, M)^2 \le \|u_x + u - 2x\|^2$. This inequality and the parallelogram law imply that

$$4\,d(x, M)^2 + \|u_x - u\|^2 \le \|u_x + u - 2x\|^2 + \|u_x - u\|^2 = 2\bigl(\|u_x - x\|^2 + \|u - x\|^2\bigr) = 4\,d(x, M)^2.$$

Outcome: $\|u_x - u\|^2 = 0$; that is, $u = u_x$.

(b) Now let $x$ be an arbitrary vector in $H$ and suppose $\mathcal{M}$ is a subspace of $H$, which obviously implies that $\mathcal{M}$ is a closed convex nonempty subset of $H$. According to item (a) there exists a unique $u_x \in \mathcal{M}$ such that $\|x - u_x\| = d(x, \mathcal{M})$. Take an arbitrary nonzero $u \in \mathcal{M}$. Since $(u_x + \alpha u) \in \mathcal{M}$ for every scalar $\alpha$, it follows that

$$d(x, \mathcal{M})^2 \le \|x - u_x - \alpha u\|^2 = \|x - u_x\|^2 + |\alpha|^2 \|u\|^2 - 2\,\mathrm{Re}\bigl(\overline{\alpha}\,\langle x - u_x \,; u \rangle\bigr).$$

Set $\alpha = \|u\|^{-2} \langle x - u_x \,; u \rangle$ in the above inequality and recall that $\|x - u_x\|^2 = d(x, \mathcal{M})^2$. Thus $2\,|\langle x - u_x \,; u \rangle|^2 \le |\langle x - u_x \,; u \rangle|^2$, and hence $|\langle x - u_x \,; u \rangle| = 0$. Conclusion: $x - u_x \perp u$ for every nonzero $u$ in $\mathcal{M}$, which implies
$x - u_x \perp \mathcal{M}$. Moreover, this $u_x$ is the unique vector in $\mathcal{M}$ with the above property. Indeed, if $v$ is a vector in $\mathcal{M}$ such that $x - v \perp \mathcal{M}$, then $\langle x - v \,; v - u \rangle = \langle x - v \,; v \rangle - \langle x - v \,; u \rangle = 0$ whenever $u \in \mathcal{M}$, so that $x - v \perp v - u$ for every $u \in \mathcal{M}$. Thus, by the Pythagorean Theorem,

$$\|x - v\|^2 \le \|x - v\|^2 + \|v - u\|^2 = \|x - v + v - u\|^2 = \|x - u\|^2$$

for all $u \in \mathcal{M}$. In particular, for $u = u_x$,

$$d(x, \mathcal{M}) \le \|x - v\| \le \|x - u_x\| = d(x, \mathcal{M}),$$

so that $d(x, \mathcal{M}) = \|x - v\|$, and hence $v = u_x$ because $u_x$ is the unique vector in $\mathcal{M}$ for which $d(x, \mathcal{M}) = \|x - u_x\|$.

Let $A$ be any nonempty subset of a Hilbert space $H$. Recall that the subspace $\bigvee A = (\operatorname{span} A)^-$ of $H$ is the closure in $H$ of the set of all (finite) linear combinations of vectors in $A$. Take an arbitrary vector $x$ in $H$. The vector $u_x$ in $\bigvee A$ that minimizes the distance of $x$ to $\bigvee A$ is called the best linear approximation of $x$ in terms of $A$, and the difference $x - u_x$ in $H$ is called the error of the approximation of $x$ in $H$ by $u_x$ in $\bigvee A$. The next result is a straightforward consequence of Theorem 5.13, which is illustrated in the figure below.

Corollary 5.14. The best linear approximation of any vector $x$ in a Hilbert space $H$ in terms of a nonempty subset $A$ of $H$ is the vector $u_x$ in $\bigvee A$ for which the error $x - u_x$ is orthogonal to $\bigvee A$.

(The figure in the original depicts this decomposition: $x$ in $H$ split into $u_x$ lying in $\bigvee A$ and the error $x - u_x$ orthogonal to $\bigvee A$.)
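Theorem 5.13(b) and Corollary 5.14 are exactly what numerical least squares computes. In the sketch below (the particular matrix $A$ and vector $x$ are illustrative assumptions, not from the text), the columns of $A$ span a subspace of $\mathbb{R}^3$; `np.linalg.lstsq` returns the coefficients of the best linear approximation $u_x$, and the error $x - u_x$ comes out orthogonal to every column of $A$.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])   # columns span a 2-dimensional subspace of R^3
x = np.array([1.0, 2.0, 3.0])

c, *_ = np.linalg.lstsq(A, x, rcond=None)   # minimizes ||x - A c||
u_x = A @ c                                  # best linear approximation of x
err = x - u_x                                # error of the approximation

print(A.T @ err)   # ~0: the error is orthogonal to the column span
```

By uniqueness (Theorem 5.13(a)), any other point of the subspace is strictly farther from $x$ than $u_x$ is.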
Proposition 5.15. Let $\mathcal{M}$ be a linear manifold of a Hilbert space $H$. Then

$$\mathcal{M}^{\perp\perp} = \mathcal{M}^- \qquad\text{and}\qquad \mathcal{M}^\perp = \{0\} \text{ if and only if } \mathcal{M}^- = H.$$

In particular, if $A$ is any subset of a Hilbert space $H$, then

$$A^{\perp\perp} = \textstyle\bigvee A \qquad\text{and}\qquad A^\perp = \{0\} \text{ if and only if } \textstyle\bigvee A = H.$$

Proof. Recall that $\mathcal{M} \subseteq \mathcal{M}^{\perp\perp}$ and $\mathcal{M}^{\perp\perp}$ is closed in $H$ according to Proposition 5.12. Then $\mathcal{M}^- \subseteq \mathcal{M}^{\perp\perp}$. Since $\mathcal{M}^{\perp\perp}$ is a subspace (i.e., a closed linear manifold) of a Hilbert space, it follows that it is itself a Hilbert space, and hence $\mathcal{M}^-$ is a subspace of the Hilbert space $\mathcal{M}^{\perp\perp}$. Take an arbitrary $x$ in $\mathcal{M}^{\perp\perp}$. Theorem 5.13 ensures that
there exists $u_x \in \mathcal{M}^-$ such that $x - u_x \in (\mathcal{M}^-)^\perp = \mathcal{M}^\perp$ (Proposition 5.12). But $x - u_x \in \mathcal{M}^{\perp\perp}$ because $u_x \in \mathcal{M}^- \subseteq \mathcal{M}^{\perp\perp}$. Thus $x - u_x \in \mathcal{M}^\perp \cap \mathcal{M}^{\perp\perp}$, and hence $x = u_x \in \mathcal{M}^-$. Conclusion: $\mathcal{M}^{\perp\perp} \subseteq \mathcal{M}^-$. Therefore $\mathcal{M}^- = \mathcal{M}^{\perp\perp}$. If $\mathcal{M}^\perp = \{0\}$, then $\mathcal{M}^{\perp\perp} = \{0\}^\perp = H$, and so

$$\mathcal{M}^- = H \qquad\text{whenever}\qquad \mathcal{M}^\perp = \{0\}.$$

On the other hand, Proposition 5.12 says that $\mathcal{M}^\perp = \{0\}$ whenever $\mathcal{M}^- = H$. Finally, if $A$ is any subset of $H$, then $A^\perp = (\bigvee A)^\perp$, so that $A^{\perp\perp} = (\bigvee A)^{\perp\perp}$ (Proposition 5.12 again). Since $\bigvee A = (\bigvee A)^-$ is a subspace of $H$, it follows that $A^{\perp\perp} = \bigvee A$, and $A^\perp = \{0\}$ if and only if $\bigvee A = H$.

It is worth noticing that the inner product space $H$ in Theorem 5.13 was supposed to be complete only to ensure that the closed (convex and nonempty) subset $M$ and the closed linear manifold $\mathcal{M}$ are complete (Theorem 3.40(b)). In fact, Theorem 5.13 can be formulated in an inner product space setting by assuming that $M$ is complete (as a metric space in the inner product topology) and $\mathcal{M}$ is a complete linear manifold, instead of assuming that $M$ and $\mathcal{M}$ are closed in an inner product space $H$ that is itself complete. We shall see next that Theorem 5.13 does not hold without the completeness assumption.

Example 5.H. Let $X$ be a proper dense linear manifold of a Hilbert space $H$. ($X$ is an inner product space that is not complete by Proposition 4.7.) Take $z \in H \backslash X$, consider the orthogonal complement $\{z\}^\perp$ of $\{z\}$ in $H$, and set $\mathcal{M} = \{z\}^\perp \cap X$.

(a) Since $\{z\}^\perp$ is a closed linear manifold of $H$ (Proposition 5.12), it follows by Problem 3.38(d) that $\{z\}^\perp \cap X$ is closed in $X$. Thus the intersection $\mathcal{M}$ of the linear manifolds $\{z\}^\perp$ and $X$ of $H$ is a linear manifold of $X$ that is closed in $X$ (i.e., $\mathcal{M}$ is a subspace of $X$). Now, if $\mathcal{M} = X$, then $X \subseteq \{z\}^\perp$, which implies $\{z\}^{\perp\perp} \subseteq X^\perp$. But $X^\perp = \{0\}$ since $X^- = H$ (Proposition 5.12). Hence $\{z\} \subseteq \{z\}^{\perp\perp} = \{0\}$. Then $z = 0$, which contradicts the fact that $z \notin X$. So $\mathcal{M}$ is a proper subspace of $X$. Take $x \in X \backslash \mathcal{M}$. Since $x \in H$, and since $\{z\}^\perp$ is a subspace of $H$, it follows by Theorem 5.13 that there is a unique $u_x \in \{z\}^\perp$ such that $x - u_x \in \{z\}^{\perp\perp} = \bigvee\{z\}$ (see Proposition 5.15), and hence $u_x = x + \alpha z$ for some nonzero scalar $\alpha$ (recall: $\bigvee\{z\} = \operatorname{span}\{z\}$). Therefore, $u_x \notin X$ because $z \notin X$ and $x \in X$. Theorem 5.13 also says that $u_x$ is the unique vector in $\{z\}^\perp$ such that
$\|x - u_x\| = d(x, \{z\}^\perp)$. If $\mathcal{M}$ is dense in $\{z\}^\perp$ (i.e., if $\mathcal{M}^- = \{z\}^\perp$, where $\mathcal{M}^-$ is the closure of $\mathcal{M}$ in $H$), then $d(x, \mathcal{M}) = d(x, \{z\}^\perp)$ by Problem 3.43(b), and in this case there is no vector $u$ in $\mathcal{M} = \{z\}^\perp \cap X$ such that $\|x - u\| = d(x, \mathcal{M})$. This shows that Theorem 5.13 does not hold in the incomplete inner product space $X$.

(b) Next we give a concrete example of the above setup by exhibiting a Hilbert space $H$, a dense linear manifold $X$ of $H$, and a vector $z$ in $H \backslash X$ for which $\mathcal{M} = \{z\}^\perp \cap X$ is dense in $\{z\}^\perp$. Set

$$H = \ell_+^2 \qquad\text{and}\qquad X = \ell_+^0.$$

Recall that $\ell_+^0$ is dense in $\ell_+^2$ (Problem 3.44). Set $z = \{(\frac{1}{2})^k\}_{k=1}^\infty \in \ell_+^2 \backslash \ell_+^0$ and take an arbitrary $y \in \{z\}^\perp$. That is, $y = \{\upsilon_k\}_{k=1}^\infty \in \ell_+^2$ is such that

$$\sum_{k=1}^\infty \bigl(\tfrac{1}{2}\bigr)^k \upsilon_k = 0.$$

For each integer $n \ge 1$ consider the sequence

$$y_n = \Bigl(\upsilon_1, \dots, \upsilon_n,\; -2^{n+1} \sum_{k=1}^n \bigl(\tfrac{1}{2}\bigr)^k \upsilon_k,\; 0, 0, 0, \dots\Bigr)$$

in $\ell_+^0$. It is clear that $\langle y_n \,; z \rangle = 0$, and hence $y_n \in \mathcal{M} = \{z\}^\perp \cap \ell_+^0$, for every $n \ge 1$. Moreover,

$$\|y_n - y\|^2 = \Bigl|\upsilon_{n+1} + 2^{n+1} \sum_{k=1}^n \bigl(\tfrac{1}{2}\bigr)^k \upsilon_k\Bigr|^2 + \sum_{k=n+2}^\infty |\upsilon_k|^2$$

for each integer $n \ge 1$. Since $y \in \{z\}^\perp \subset \ell_+^2$, it follows that $\sum_{k=n+2}^\infty |\upsilon_k|^2 \to 0$ as $n \to \infty$ (Problem 3.11) and $\sum_{k=1}^n (\frac{1}{2})^k \upsilon_k = -\sum_{k=n+1}^\infty (\frac{1}{2})^k \upsilon_k$ for every $n \ge 1$. Recalling that $\sum_{k=n+1}^\infty (\frac{1}{2})^k = (\frac{1}{2})^n$ for each $n \ge 0$, we get

$$0 \le 2^{n+1} \Bigl|\sum_{k=1}^n \bigl(\tfrac{1}{2}\bigr)^k \upsilon_k\Bigr| = 2^{n+1} \Bigl|\sum_{k=n+1}^\infty \bigl(\tfrac{1}{2}\bigr)^k \upsilon_k\Bigr| \le \sup_{k \ge n+1} |\upsilon_k|\; 2^{n+1} \sum_{k=n+1}^\infty \bigl(\tfrac{1}{2}\bigr)^k = 2 \sup_{k \ge n+1} |\upsilon_k| \to 0$$

as $n \to \infty$ (since $\lim_n \sup_{k \ge n} |\upsilon_k| = \limsup_n |\upsilon_k| = \lim_n |\upsilon_n| = 0$). Thus $y_n \to y$ in $\ell_+^2$. Conclusion: For every $y \in \{z\}^\perp$ there exists an $\mathcal{M}$-valued sequence $\{y_n\}_{n=1}^\infty$ that converges in $\ell_+^2$ to $y$, which means that $\mathcal{M}$ is dense in $\{z\}^\perp$ (Proposition 3.32(d)). That is,
$$\mathcal M^- \,=\, \{z\}^\perp.$$

Note that $\mathcal M \ne \{z\}^\perp$ (e.g., the sequence $(\zeta_1 - \|z\|^2\bar\zeta_1^{-1}, \zeta_2, \zeta_3, \ldots)$ lies in $\{z\}^\perp\backslash\ell_+^0$ for every $z = \{\zeta_k\}_{k=1}^\infty$ in $\ell_+^2\backslash\ell_+^0$ with $\zeta_1 \ne 0$), which implies that $\mathcal M$ is not closed in the Hilbert space $\mathcal H = \ell_+^2$, and hence $\mathcal M$ is not complete (Corollary 3.41).

The above example shows that completeness of $\mathcal H$ was crucial in the proof of Theorem 5.13. But it is not enough. Indeed, Theorem 5.13 does not necessarily hold in a Banach space that is not Hilbert (i.e., in a Banach space whose norm does not satisfy the parallelogram law). This is closely related to Lemma 4.33. In fact, what is behind that lemma is that there may exist a proper subspace $\mathcal M$ of a Banach space $\mathcal X$ for which $d(x, \mathcal M) < 1$ for all $x \in \mathcal X$ with $\|x\| = 1$ or, equivalently, $d(x, \mathcal M) < \|x\|$ for every nonzero $x$ in $\mathcal X$. Suppose this is the case, and take an arbitrary $x \in \mathcal X\backslash\mathcal M$. Since $d(x, \mathcal M) = d(x - u, \mathcal M)$ for every $x \in \mathcal X$ and every $u \in \mathcal M$, it follows that $d(x, \mathcal M) < \|x - u\|$ whenever $u$ lies in $\mathcal M$ and $x$ lies in $\mathcal X\backslash\mathcal M$. Therefore, if $x$ is a vector in $\mathcal X\backslash\mathcal M$, then there is no vector $u$ in $\mathcal M$ for which $\|x - u\| = d(x, \mathcal M)$.
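The density argument of Example 5.H(b) can be probed numerically in a finite truncation. The sketch below is not part of the text: it assumes NumPy, truncates all sequences to $d = 50$ coordinates, and picks the illustrative tail $\upsilon_k = (\frac13)^k$ for $k \ge 2$, solving for $\upsilon_1$ so that $y \perp z$. It then checks that each finitely supported $y_n$ stays in $\{z\}^\perp$ while $\|y_n - y\|$ decreases toward $0$.

```python
import numpy as np

d = 50                                    # truncation dimension (illustrative assumption)
idx = np.arange(1, d + 1)
z = 0.5 ** idx                            # z = {(1/2)^k}: no zero entries, so z is not finitely supported

# Build y in {z}^perp with (in the truncation) all entries nonzero:
# fix v_k = (1/3)^k for k >= 2 and solve for v_1 so that <y ; z> = 0.
v = (1.0 / 3.0) ** idx
v[0] = -np.dot(z[1:], v[1:]) / z[0]
y = v
assert abs(np.dot(y, z)) < 1e-12          # y is orthogonal to z

def y_n(n):
    """The finitely supported vector y_n of Example 5.H(b)."""
    out = np.zeros(d)
    out[:n] = y[:n]
    # entry n+1 is chosen exactly so that <y_n ; z> = 0
    out[n] = -2.0 ** (n + 1) * np.dot(0.5 ** np.arange(1, n + 1), y[:n])
    return out

errs = []
for n in range(2, 12):
    yn = y_n(n)
    assert abs(np.dot(yn, z)) < 1e-10     # y_n lies in {z}^perp, with finite support
    errs.append(np.linalg.norm(yn - y))   # and ||y_n - y|| -> 0
```

With this particular choice of $\upsilon$, each step shrinks the error by roughly a factor of $3$, mirroring the bound $2\sup_{k\ge n+1}|\upsilon_k|$ obtained in the text.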
5.5 Orthogonal Structure

Let $\{\mathcal M_\gamma\}_{\gamma\in\Gamma}$ be an arbitrary nonempty indexed family of subspaces of a Hilbert space $\mathcal H$ and consider their topological sum

$$\bigvee_{\gamma\in\Gamma}\mathcal M_\gamma \,=\, \Bigl(\operatorname{span}\bigcup_{\gamma\in\Gamma}\mathcal M_\gamma\Bigr)^{-} \,=\, \Bigl(\sum_{\gamma\in\Gamma}\mathcal M_\gamma\Bigr)^{-}.$$
If $\{\mathcal M_i\}_{i=1}^n$ is a finite (nonempty) family of pairwise orthogonal (i.e., $\mathcal M_i \perp \mathcal M_j$ whenever $i \ne j$) subspaces of a Hilbert space, then Corollary 5.11 says that $\bigvee_{i=1}^n \mathcal M_i = \sum_{i=1}^n \mathcal M_i$. That is, the topological sum and the ordinary (algebraic) sum of a finite family of pairwise orthogonal subspaces of a Hilbert space coincide. The next theorem is a countably infinite counterpart of the above italicized result, which emerges as an important consequence of Theorem 5.13.

Theorem 5.16. (The Orthogonal Structure Theorem). Let $\mathcal H$ be a Hilbert space and suppose $\{\mathcal M_k\}_{k\in\mathbb N}$ is a countably infinite family of pairwise orthogonal subspaces of $\mathcal H$. If $x \in \bigvee_{k\in\mathbb N}\mathcal M_k$, then there exists a unique $\mathcal H$-valued sequence $\{u_k\}_{k=1}^\infty$ with $u_k \in \mathcal M_k$ for each $k$ such that

$$x \,=\, \sum_{k=1}^\infty u_k.$$

Moreover, $\|x\|^2 = \sum_{k=1}^\infty \|u_k\|^2$. Conversely, if $\{u_k\}_{k=1}^\infty$ is an $\mathcal H$-valued sequence such that $u_k \in \mathcal M_k$ for each $k$ and $\sum_{k=1}^\infty \|u_k\|^2 < \infty$, then the infinite series $\sum_{k=1}^\infty u_k$ converges in $\mathcal H$ and $\sum_{k=1}^\infty u_k \in \bigvee_{k\in\mathbb N}\mathcal M_k$.
Proof. If $x \in \bigvee_{k\in\mathbb N}\mathcal M_k = (\sum_{k\in\mathbb N}\mathcal M_k)^-$, then there exists a sequence $\{x(n)\}_{n=1}^\infty$ of vectors in $\sum_{k\in\mathbb N}\mathcal M_k$ that converges to $x$ in $\mathcal H$ (Proposition 3.27). Take an arbitrary integer $n \ge 1$. Since $x(n) \in \sum_{k\in\mathbb N}\mathcal M_k$, it follows that $x(n)$ is a finite sum where each summand lies in one of the subspaces $\mathcal M_k$. Thus $x(n)$ can be written as

$$x(n) \,=\, \sum_{k=1}^{m_n} x(n)_k \,\in\, \sum_{k=1}^{m_n}\mathcal M_k$$

with $x(n)_k \in \mathcal M_k$ for each $k = 1, \ldots, m_n$, where $m_n$ is an integer that depends on $n$. Clearly, we may take $m_{n+1} \ge m_n \ge n$. Observe that the above finite sum may contain finitely many zero summands, and $\{\mathcal M_k\}_{k=1}^{m_n} \subseteq \{\mathcal M_k\}_{k=1}^{m_{n+1}}$.

Existence. Note that $\sum_{k=1}^{m_n}\mathcal M_k$ is a subspace of the Hilbert space $\mathcal H$ (Corollary 5.11). According to Theorem 5.13 there exists a unique vector

$$u_x(n) \,=\, \sum_{k=1}^{m_n} u_k \,\in\, \sum_{k=1}^{m_n}\mathcal M_k,$$

with each $u_k$ in $\mathcal M_k$, such that $\|x - u_x(n)\| \le \|x - u\|$ for all vectors $u$ in $\sum_{k=1}^{m_n}\mathcal M_k$ and $x - u_x(n) \perp \sum_{k=1}^{m_n}\mathcal M_k$. In particular,

$$\Bigl\|x - \sum_{k=1}^{m_n} u_k\Bigr\| \,\le\, \|x - x(n)\|.$$

Claim. The vectors $u_k$ in the expansion of $u_x(n)$ do not depend on $n$.

Proof. Since $\sum_{k=1}^{m_{n+1}}\mathcal M_k = \sum_{k=1}^{m_n}\mathcal M_k + \sum_{k=m_n+1}^{m_{n+1}}\mathcal M_k$ is a subspace of $\mathcal H$, it follows by Theorem 5.13 that there exists a unique

$$u_x(n{+}1) \,=\, v + w \,\in\, \sum_{k=1}^{m_{n+1}}\mathcal M_k,$$

with $v \in \sum_{k=1}^{m_n}\mathcal M_k$ and $w \in \sum_{k=m_n+1}^{m_{n+1}}\mathcal M_k$, such that $\|x - u_x(n{+}1)\| \le \|x - u\|$ for all vectors $u$ in $\sum_{k=1}^{m_{n+1}}\mathcal M_k$ and $x - u_x(n{+}1) \perp \sum_{k=1}^{m_{n+1}}\mathcal M_k$. Take an arbitrary $z \in \sum_{k=1}^{m_n}\mathcal M_k \subseteq \sum_{k=1}^{m_{n+1}}\mathcal M_k$ and note that

$$0 \,=\, \langle x - u_x(n{+}1) ; z\rangle \,=\, \langle x - v - w ; z\rangle \,=\, \langle x - v ; z\rangle - \langle w ; z\rangle \,=\, \langle x - v ; z\rangle$$

(since $\mathcal M_j \perp \mathcal M_k$ whenever $j \ne k$, so that $\sum_{k=m_n+1}^{m_{n+1}}\mathcal M_k \perp \sum_{k=1}^{m_n}\mathcal M_k$, and hence $\langle w ; z\rangle = 0$). Thus $x - v \perp \sum_{k=1}^{m_n}\mathcal M_k$. But $u_x(n)$ is the only vector in $\sum_{k=1}^{m_n}\mathcal M_k$ for which $x - u_x(n) \perp \sum_{k=1}^{m_n}\mathcal M_k$. Therefore $v = u_x(n)$. Outcome: $u_x(n{+}1) = u_x(n) + w = \sum_{k=1}^{m_{n+1}} u_k$ with $u_k \in \mathcal M_k$.

Set $\{u_k\}_{k=1}^\infty = \bigcup_{n\ge1}\{u_k\}_{k=1}^{m_n}$, which is a sequence of pairwise orthogonal vectors in $\mathcal H$ because $\{\mathcal M_k\}_{k\in\mathbb N}$ is an orthogonal family. Since $n \le m_n$, it follows by Proposition 5.8 that

$$\sum_{k=1}^n \|u_k\|^2 \,\le\, \sum_{k=1}^{m_n}\|u_k\|^2 \,=\, \Bigl\|\sum_{k=1}^{m_n} u_k\Bigr\|^2 \,\le\, \bigl(\|x - x(n)\| + \|x\|\bigr)^2.$$

But $x(n) \to x$ in $\mathcal H$ as $n \to \infty$ so that $\sum_{k=1}^\infty \|u_k\|^2 \le \|x\|^2 < \infty$. Thus the infinite series $\sum_{k=1}^\infty u_k$ converges in the Hilbert space $\mathcal H$, and $\|\sum_{k=1}^\infty u_k\|^2 = \sum_{k=1}^\infty \|u_k\|^2$ (see Corollary 5.9). Moreover, it in fact converges to $x$ since the sequence of partial sums $\{\sum_{k=1}^n u_k\}_{n=1}^\infty$ has a subsequence, namely $\{\sum_{k=1}^{m_n} u_k\}_{n=1}^\infty$, that converges to $x$ (see Proposition 3.5), and so $\|x\|^2 = \sum_{k=1}^\infty \|u_k\|^2$.

Uniqueness. Suppose $x = \sum_{k=1}^\infty u_k = \sum_{k=1}^\infty v_k$, where $u_k$ and $v_k$ lie in $\mathcal M_k$ for each $k$. Thus $\sum_{k=1}^\infty (u_k - v_k) = 0$ (see Problem 4.9(a)). But $\{(u_k - v_k)\}_{k=1}^\infty$ is a sequence of pairwise orthogonal vectors in the inner product space $\mathcal H$ (since $u_k - v_k \in \mathcal M_k$ for each $k$ and $\{\mathcal M_k\}_{k\in\mathbb N}$ is an orthogonal family), and therefore $\sum_{k=1}^\infty \|u_k - v_k\|^2 = \|\sum_{k=1}^\infty (u_k - v_k)\|^2 = 0$ by Corollary 5.9(a). Hence $u_k = v_k$ for every $k \ge 1$.

Converse. If $\{u_k\}_{k=1}^\infty$ is an $\mathcal H$-valued sequence with $u_k \in \mathcal M_k$ for each $k$, then it is a sequence of pairwise orthogonal vectors in $\mathcal H$ (since $\mathcal M_j \perp \mathcal M_k$ whenever $j \ne k$). If $\sum_{k=1}^\infty \|u_k\|^2 < \infty$, then the infinite series $\sum_{k=1}^\infty u_k$ converges in $\mathcal H$ according to Corollary 5.9(b); that is, $\sum_{k=1}^n u_k \to \sum_{k=1}^\infty u_k$ in $\mathcal H$ as $n \to \infty$. Since $\sum_{k=1}^n u_k$ lies in $\sum_{k\in\mathbb N}\mathcal M_k$ for each integer $n \ge 1$, it follows by Proposition 3.27 that the limit $\sum_{k=1}^\infty u_k$ lies in $(\sum_{k\in\mathbb N}\mathcal M_k)^- = \bigvee_{k\in\mathbb N}\mathcal M_k$.

Here are two immediate consequences of the Orthogonal Structure Theorem.

Corollary 5.17. If $\{\mathcal M_k\}_{k\in\mathbb N}$ is a countably infinite family of pairwise orthogonal subspaces of a Hilbert space $\mathcal H$, then
$$\bigvee_{k\in\mathbb N}\mathcal M_k \,=\, \Bigl\{\sum_{k=1}^\infty u_k \in \mathcal H:\ u_k \in \mathcal M_k \ \text{and}\ \sum_{k=1}^\infty \|u_k\|^2 < \infty\Bigr\}.$$

Corollary 5.18. Let $\{\mathcal M_k\}_{k\in\mathbb N}$ be a countably infinite orthogonal family of subspaces of a Hilbert space $\mathcal H$. If it spans $\mathcal H$; that is, if

$$\bigvee_{k\in\mathbb N}\mathcal M_k \,=\, \mathcal H,$$

then every vector $x$ in $\mathcal H$ is uniquely expressed as

$$x \,=\, \sum_{k=1}^\infty u_k$$

in terms of an orthogonal sequence $\{u_k\}_{k=1}^\infty$ with each $u_k$ in each $\mathcal M_k$.

Example 5.I. Let $\{\mathcal M_k\}$ be a countable collection of subspaces of a Hilbert space $(\mathcal H, \langle\,;\,\rangle)$ so that each $(\mathcal M_k, \langle\,;\,\rangle)$ is itself a Hilbert space. If $\{\mathcal M_k\}$ is an infinite collection, then let it be indexed by $\mathbb N$. Consider the full direct sum $\bigoplus_{k=1}^\infty \mathcal M_k$. Recall that $\bigoplus_k \mathcal M_k$ is itself a linear space (but it is not a subset of $\mathcal H$). Let $[\bigoplus_k \mathcal M_k]_2$ denote the linear manifold of the full direct sum $\bigoplus_{k=1}^\infty \mathcal M_k$ made up of all square-summable sequences $\{x_k\}$. That is,
$$\Bigl[\bigoplus_k \mathcal M_k\Bigr]_2 \,=\, \Bigl\{\{x_k\} \in \bigoplus_k \mathcal M_k:\ \sum_k \|x_k\|^2 < \infty\Bigr\},$$

where $\|\,\|$ is the norm induced on $\mathcal H$ by the inner product $\langle\,;\,\rangle$. (It is clear that $[\bigoplus_k \mathcal M_k]_2$ coincides with $\bigoplus_k \mathcal M_k$ if the collection $\{\mathcal M_k\}$ is finite.) The function $\langle\,;\,\rangle_\oplus: [\bigoplus_k \mathcal M_k]_2 \times [\bigoplus_k \mathcal M_k]_2 \to \mathbb F$, given by

$$\langle x ; y\rangle_\oplus \,=\, \sum_k \langle x_k ; y_k\rangle$$
for every $x = \{x_k\} \in [\bigoplus_k \mathcal M_k]_2$ and $y = \{y_k\} \in [\bigoplus_k \mathcal M_k]_2$, is an inner product on $[\bigoplus_k \mathcal M_k]_2$ that makes it into a Hilbert space (Examples 5.E and 5.F). If the collection $\{\mathcal M_k\}$ consists of pairwise orthogonal subspaces of $\mathcal H$, then the Hilbert space $([\bigoplus_k \mathcal M_k]_2, \langle\,;\,\rangle_\oplus)$ is referred to as an (internal) orthogonal direct sum. In this case (i.e., if $\{\mathcal M_k\}$ is a collection of orthogonal subspaces), then Corollary 5.17 (or Corollary 5.11, if the collection is finite) says that

$$\bigvee_k \mathcal M_k \,=\, \Bigl\{\sum_k x_k \in \mathcal H:\ \{x_k\} \in \Bigl[\bigoplus_k \mathcal M_k\Bigr]_2\Bigr\} \quad\text{and}\quad \Bigl[\bigoplus_k \mathcal M_k\Bigr]_2 \,=\, \Bigl\{\{x_k\} \in \bigoplus_k \mathcal M_k:\ \sum_k x_k \in \bigvee_k \mathcal M_k\Bigr\}.$$

This establishes a natural mapping $\Phi: [\bigoplus_k \mathcal M_k]_2 \to \bigvee_k \mathcal M_k$,

$$\Phi\bigl(\{x_k\}\bigr) \,=\, \sum_k x_k$$

for every $\{x_k\} \in [\bigoplus_k \mathcal M_k]_2$, which is injective and surjective (by the Orthogonal Structure Theorem) and linear as well (addition and scalar multiplication of square-summable sequences again yield a square-summable sequence — Problem 4.9). Thus $\Phi$ is an isomorphism of the linear space $[\bigoplus_k \mathcal M_k]_2$ onto the linear manifold $\bigvee_k \mathcal M_k$ of the linear space $\mathcal H$. Hence the orthogonal direct sum $[\bigoplus_k \mathcal M_k]_2$ and the topological sum $\bigvee_k \mathcal M_k$ of pairwise orthogonal subspaces of a Hilbert space are isomorphic linear spaces. They are, in fact, isometrically isomorphic Hilbert spaces: the natural mapping $\Phi$ actually is an isometric isomorphism of the Hilbert space $[\bigoplus_k \mathcal M_k]_2$ onto the subspace $\bigvee_k \mathcal M_k$ of the Hilbert space $\mathcal H$ (next section). This provides a natural identification between them. It is customary to use the same notation of the full direct sum to denote $[\bigoplus_k \mathcal M_k]_2$ when it is equipped with the above inner product $\langle\,;\,\rangle_\oplus$. Notation: $\bigoplus_k \mathcal M_k = ([\bigoplus_k \mathcal M_k]_2, \langle\,;\,\rangle_\oplus)$. We shall follow the common usage. From now on $\bigoplus_k \mathcal M_k$ will denote the Hilbert space whose linear space is the set of all square-summable sequences in the full direct sum of a collection $\{\mathcal M_k\}$ of pairwise orthogonal subspaces of a Hilbert space $\mathcal H$.

Two subspaces $\mathcal M$ and $\mathcal N$ of an inner product space $\mathcal X$ are complementary in $\mathcal X$ if they are algebraic complements of each other (i.e., $\mathcal M + \mathcal N = \mathcal X$ and $\mathcal M \cap \mathcal N = \{0\}$; see Problem 4.34). Recall that $\mathcal M \perp \mathcal N$ implies $\mathcal M \cap \mathcal N = \{0\}$.
Proposition 5.19. Orthogonal complementary subspaces in an inner product space are orthogonal complements of each other.

Proof. Let $\mathcal M$ and $\mathcal N$ be orthogonal complementary subspaces in an inner product space $\mathcal X$. Take an arbitrary $x$ in $\mathcal M^\perp \subseteq \mathcal X$. Since $\mathcal M + \mathcal N = \mathcal X$, it follows that $x = u + v$ with $u$ in $\mathcal M$ and $v$ in $\mathcal N$, and so $\langle x ; u\rangle = \langle u ; u\rangle + \langle v ; u\rangle$. But $\langle x ; u\rangle = \langle v ; u\rangle = 0$ (because $\mathcal M \perp \mathcal M^\perp$ and $\mathcal M \perp \mathcal N$). Hence $\|u\|^2 = 0$, which means that $u = 0$. Thus $x = v \in \mathcal N$. Therefore, $\mathcal M^\perp \subseteq \mathcal N$. But $\mathcal N \subseteq \mathcal M^\perp$ because $\mathcal M \perp \mathcal N$. Outcome: $\mathcal M^\perp = \mathcal N$.

The next result is the central theorem of Hilbert space geometry.

Theorem 5.20. (Projection Theorem – First version). Every Hilbert space $\mathcal H$ can be decomposed as

$$\mathcal H \,=\, \mathcal M + \mathcal M^\perp,$$

where $\mathcal M$ is any subspace of $\mathcal H$.

Proof. Let $\mathcal M$ be any subspace of a Hilbert space $\mathcal H$. Since $\mathcal M^\perp$ is a subspace of $\mathcal H$ (Proposition 5.12) which is orthogonal to $\mathcal M$ (by definition), it follows by Theorem 5.10 that $\mathcal M + \mathcal M^\perp$ is a subspace of $\mathcal H$. Moreover, $\mathcal M \subseteq \mathcal M + \mathcal M^\perp$ and $\mathcal M^\perp \subseteq \mathcal M + \mathcal M^\perp$. Thus

$$(\mathcal M + \mathcal M^\perp)^\perp \,\subseteq\, \mathcal M^\perp \cap \mathcal M^{\perp\perp} \,=\, \mathcal M^\perp \cap \mathcal M^- \,=\, \mathcal M^\perp \cap \mathcal M \,=\, \{0\},$$

and so $\mathcal M + \mathcal M^\perp = (\mathcal M + \mathcal M^\perp)^- = \mathcal H$ (Proposition 5.15).

Let $\mathcal M$ be an arbitrary subspace of a Hilbert space $\mathcal H$. Since $\mathcal M^\perp$ is again a subspace of $\mathcal H$, and since $\mathcal M \cap \mathcal M^\perp = \{0\}$, what Theorem 5.20 says is: If $\mathcal M$ is any subspace of a Hilbert space $\mathcal H$, then $\mathcal M$ and $\mathcal M^\perp$ are complementary subspaces of $\mathcal H$. This in fact is the converse of Proposition 5.19 in a Hilbert space setting. Moreover, according to Theorem 2.14, for each $x \in \mathcal H = \mathcal M + \mathcal M^\perp$ there exists a unique $u$ in $\mathcal M$ and a unique $v$ in $\mathcal M^\perp$ such that $x = u + v$ and, by the Pythagorean Theorem, $\|x\|^2 = \|u\|^2 + \|v\|^2$.
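In a finite-dimensional Hilbert space the decomposition $\mathcal H = \mathcal M + \mathcal M^\perp$ can be computed directly. The sketch below is not part of the text (it assumes NumPy, with $\mathcal M$ spanned by two random columns of a matrix $A$): it finds the best approximation $u \in \mathcal M$ of Theorem 5.13 by least squares, and verifies that $v = x - u$ lies in $\mathcal M^\perp$, that $x = u + v$, and the Pythagorean identity $\|x\|^2 = \|u\|^2 + \|v\|^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))            # columns span a subspace M of H = R^5
x = rng.standard_normal(5)

# u = unique best approximation of x from M (Theorem 5.13), via least squares
coef, *_ = np.linalg.lstsq(A, x, rcond=None)
u = A @ coef
v = x - u                                  # the M^perp component of x

assert np.allclose(A.T @ v, 0)             # v is orthogonal to M, i.e. v lies in M^perp
assert np.allclose(u + v, x)               # x decomposes as u + v: H = M + M^perp
pythagoras_gap = abs(np.dot(x, x) - (np.dot(u, u) + np.dot(v, v)))
```

The uniqueness asserted by Theorem 2.14 corresponds here to the fact that the least-squares solution is unique when the columns of $A$ are linearly independent.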
5.6 Unitary Equivalence

An isometry between metric spaces is a map that preserves distance, which is obviously continuous. Proposition 4.37 placed linear isometries in a normed-space setting. The next one places them in an inner-product-space setting.

Proposition 5.21. Let $\mathcal X$ and $\mathcal Y$ be inner product spaces. A linear transformation $V \in \mathcal L[\mathcal X, \mathcal Y]$ is an isometry if and only if

$$\langle Vx_1 ; Vx_2\rangle \,=\, \langle x_1 ; x_2\rangle$$

for every $x_1, x_2 \in \mathcal X$.
Proof. The inner product on the left-hand side is the inner product on $\mathcal Y$, and that on the right-hand side is the inner product on $\mathcal X$. If the above identity holds, then $\|Vx\|^2 = \langle Vx ; Vx\rangle = \langle x ; x\rangle = \|x\|^2$ for every $x \in \mathcal X$. Conversely, if $\|Vx\| = \|x\|$ for every $x \in \mathcal X$, then $\langle Vx_1 ; Vx_2\rangle = \langle x_1 ; x_2\rangle$ for every $x_1, x_2 \in \mathcal X$ by the polarization identity of Proposition 5.4. Thus $\langle Vx_1 ; Vx_2\rangle = \langle x_1 ; x_2\rangle$ for every $x_1, x_2 \in \mathcal X$ if and only if $\|Vx\| = \|x\|$ for every $x \in \mathcal X$ or, equivalently, if and only if $V \in \mathcal L[\mathcal X, \mathcal Y]$ is an isometry (Proposition 4.37).

In other words, a linear isometry between inner product spaces is a linear transformation that preserves the inner product. Now recall that an isomorphism is an invertible linear transformation between linear spaces. These concepts (isometry and isomorphism) were combined in Section 4.7 to yield the notion of an isometric isomorphism between normed spaces, which is precisely a linear surjective isometry. Between inner product spaces it has a name of its own: an isometric isomorphism between inner product spaces is called a unitary transformation. That is, a unitary transformation of an inner product space onto an inner product space is a linear surjective isometry or, equivalently, an invertible linear isometry. According to Proposition 5.21, a unitary transformation between inner product spaces is a linear-space isomorphism that preserves the inner product. Thus a unitary transformation preserves the algebraic structure, the topological structure, and also the geometric structure between inner product spaces. In particular, it preserves convergence and Cauchy sequences (see the introduction to Section 4.6), and so it also preserves separability and completeness (cf. Problem 3.48(b) and Theorem 3.44). Two inner product spaces, say $\mathcal X_1$ and $\mathcal X_2$, are called unitarily equivalent if there exists a unitary transformation between them (equivalently, if they are isometrically isomorphic — Notation: $\mathcal X_1 \cong \mathcal X_2$).
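Proposition 5.21 can be illustrated concretely: a real orthogonal matrix is a linear surjective isometry of $\mathbb R^n$, hence a unitary transformation. The sketch below is not part of the text (NumPy assumed, with $Q$ produced by a QR factorization of a random matrix); it checks norm preservation, inner product preservation, and the real polarization identity underlying the proof.

```python
import numpy as np

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # Q orthogonal: a linear isometry of R^4

x1, x2 = rng.standard_normal(4), rng.standard_normal(4)

norm_gap = abs(np.linalg.norm(Q @ x1) - np.linalg.norm(x1))   # ||Vx|| = ||x||
ip_gap = abs(np.dot(Q @ x1, Q @ x2) - np.dot(x1, x2))         # <Vx1;Vx2> = <x1;x2>

# real polarization identity: <x1;x2> = (||x1+x2||^2 - ||x1-x2||^2)/4
polar = 0.25 * (np.linalg.norm(x1 + x2) ** 2 - np.linalg.norm(x1 - x2) ** 2)
polar_gap = abs(polar - np.dot(x1, x2))
```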
Unitarily equivalent inner product spaces are regarded as essentially the same inner product space (i.e., they are indistinguishable, except perhaps by the nature of their points). The continuous linear extension results of Section 4.7 (viz., Theorem 4.35 and Corollaries 4.36 and 4.38) are trivially extended to inner product spaces (simply replace “normed space”, “Banach space”, and “isometric isomorphism” with “inner product space”, “Hilbert space”, and “unitary transformation”, respectively). The completion results of Section 4.7 are (almost) immediately translated into the inner-product-space language by just recalling that in an inner-product-space setting two spaces are unitarily equivalent if and only if they are isometrically isomorphic.

Definition 5.22. If the image of a linear isometry on an inner product space $\mathcal X$ is a dense linear manifold of a Hilbert space $\mathcal H$, then $\mathcal H$ is a completion of $\mathcal X$. Equivalently, if an inner product space $\mathcal X$ is unitarily equivalent to a dense linear manifold of a Hilbert space $\mathcal H$, then $\mathcal H$ is a completion of $\mathcal X$.

Theorem 5.23. Every inner product space has a completion. Any two completions of an inner product space are unitarily equivalent. If $\mathcal H$ and $\mathcal K$ are completions of inner product spaces $\mathcal X$ and $\mathcal Y$, respectively, then every $T \in \mathcal B[\mathcal X, \mathcal Y]$ has an extension $\widehat T \in \mathcal B[\mathcal H, \mathcal K]$ over the completion $\mathcal H$ of $\mathcal X$ into the completion $\mathcal K$ of $\mathcal Y$. Moreover, $\widehat T$ is unique up to unitary transformations, and $\|\widehat T\| = \|T\|$.

Proof. Every inner product space has a completion as a normed space (Theorem 4.40). The only question that has not been answered in Theorems 4.40 through 4.42 is whether this completion (which is a Banach space) is a Hilbert space; in other words, whether the norm $\|\,\|_{\widehat{\mathcal X}}$ in the proof of Theorem 4.40 satisfies the parallelogram law whenever the norm $\|\,\|_{\mathcal X}$ does. Consider the setup of the proof of Theorem 4.40 and suppose $\mathcal X$ is an inner product space. Recall that

$$\|[x]\|_{\widehat{\mathcal X}} \,=\, \lim \|x_n\|_{\mathcal X},$$

where $x = \{x_n\}$ is an arbitrary element of an arbitrary coset $[x]$ in the quotient space $\widehat{\mathcal X}$. Take any pair of cosets $[x]$ and $[y]$ in $\widehat{\mathcal X}$. Since the norm $\|\,\|_{\mathcal X}$ on the inner product space $\mathcal X$ satisfies the parallelogram law,

$$\|[x] + [y]\|_{\widehat{\mathcal X}}^2 + \|[x] - [y]\|_{\widehat{\mathcal X}}^2 \,=\, \|[x + y]\|_{\widehat{\mathcal X}}^2 + \|[x - y]\|_{\widehat{\mathcal X}}^2$$
$$=\, \lim \|x_n + y_n\|_{\mathcal X}^2 + \lim \|x_n - y_n\|_{\mathcal X}^2 \,=\, \lim \bigl(\|x_n + y_n\|_{\mathcal X}^2 + \|x_n - y_n\|_{\mathcal X}^2\bigr)$$
$$=\, \lim 2\bigl(\|x_n\|_{\mathcal X}^2 + \|y_n\|_{\mathcal X}^2\bigr) \,=\, 2\bigl(\lim \|x_n\|_{\mathcal X}^2 + \lim \|y_n\|_{\mathcal X}^2\bigr) \,=\, 2\bigl(\|[x]\|_{\widehat{\mathcal X}}^2 + \|[y]\|_{\widehat{\mathcal X}}^2\bigr)$$

by continuity (apply Corollary 3.8 recalling that squaring, addition, and scalar multiplication are continuous). Thus the norm $\|\,\|_{\widehat{\mathcal X}}$ on $\widehat{\mathcal X}$ also satisfies the parallelogram law so that this Banach space is, in fact, a Hilbert space.

We shall now return to orthogonal complements. Let $\mathcal M$ and $\mathcal N$ be linear manifolds of an inner product space $\mathcal X$. If $\mathcal M$ and $\mathcal N$ are algebraic complements of each other (i.e., if $\mathcal M + \mathcal N = \mathcal X$ and $\mathcal M \cap \mathcal N = \{0\}$), then they are said to be complementary linear manifolds in $\mathcal X$. Consider the natural mapping $\Phi$ of the linear space $\mathcal M \oplus \mathcal N$ into the linear space $\mathcal M + \mathcal N$, which was defined by $\Phi((u, v)) = u + v$ for each $(u, v) \in \mathcal M \oplus \mathcal N$ in Section 2.8. Let $\langle\,;\,\rangle$ be the inner product on $\mathcal X$ and equip the direct sum $\mathcal M \oplus \mathcal N$ with the inner product $\langle\,;\,\rangle_\oplus$, viz.,

$$\langle (u_1, v_1) ; (u_2, v_2)\rangle_\oplus \,=\, \langle u_1 ; u_2\rangle + \langle v_1 ; v_2\rangle$$

for every $(u_1, v_1)$ and $(u_2, v_2)$ in $\mathcal M \oplus \mathcal N$, as in Example 5.E.

Proposition 5.24. Let $\mathcal M$ and $\mathcal N$ be orthogonal complementary linear manifolds in an inner product space $\mathcal X$. The natural mapping $\Phi: \mathcal M \oplus \mathcal N \to \mathcal M + \mathcal N$
is a unitary transformation so that $\mathcal M \oplus \mathcal N$ and $\mathcal X = \mathcal M + \mathcal N$ are unitarily equivalent (i.e., $\mathcal X \cong \mathcal M \oplus \mathcal N$).

Proof. Since $\mathcal M$ and $\mathcal N$ are complementary linear manifolds in $\mathcal X$, it follows by Theorem 2.14 that $\Phi$ is an isomorphism of the linear space $\mathcal M \oplus \mathcal N$ onto the linear space $\mathcal M + \mathcal N = \mathcal X$. If $\mathcal M \perp \mathcal N$, then

$$\langle \Phi((u_1, v_1)) ; \Phi((u_2, v_2))\rangle \,=\, \langle u_1 + v_1 ; u_2 + v_2\rangle \,=\, \langle u_1 ; u_2\rangle + \langle v_1 ; v_2\rangle \,=\, \langle (u_1, v_1) ; (u_2, v_2)\rangle_\oplus$$

for every $(u_1, v_1)$ and $(u_2, v_2)$ in $\mathcal M \oplus \mathcal N$. Thus the natural mapping $\Phi$ is a linear-space isomorphism that preserves the inner product; that is, $\Phi$ is a unitary transformation.

In light of Proposition 5.24, we may identify the inner product space $\mathcal X$ with the orthogonal direct sum $\mathcal M \oplus \mathcal N$ (equipped with the above inner product $\langle\,;\,\rangle_\oplus$) through the natural mapping $\Phi$, whenever $\mathcal M$ and $\mathcal N$ are orthogonal complementary linear manifolds in $\mathcal X$. Next suppose $\mathcal M$ and $\mathcal N$ are orthogonal complementary subspaces in a Hilbert space $\mathcal H$. That is, suppose $\mathcal M$ and $\mathcal N$ are closed linear manifolds of a Hilbert space $\mathcal H$ such that $\mathcal M + \mathcal N = \mathcal H$ and $\mathcal M \perp \mathcal N$ (thus $\mathcal M \cap \mathcal N = \{0\}$). According to Proposition 5.19, $\mathcal N = \mathcal M^\perp$ so that $\mathcal M + \mathcal M^\perp = \mathcal H$, and hence $\mathcal M \oplus \mathcal M^\perp \cong \mathcal H$ by Proposition 5.24. In this case it is usual to identify the orthogonal direct sum $\mathcal M \oplus \mathcal M^\perp$ with its unitarily equivalent image $\Phi(\mathcal M \oplus \mathcal M^\perp) = \mathcal M + \mathcal M^\perp = \mathcal H$, and write $\mathcal M \oplus \mathcal M^\perp = \mathcal H$. Under this identification the central Theorem 5.20 can be restated as follows.

Theorem 5.25. (Projection Theorem – Second version). Every Hilbert space $\mathcal H$ has an orthogonal direct sum decomposition

$$\mathcal H \,=\, \mathcal M \oplus \mathcal M^\perp,$$

where $\mathcal M$ is any subspace of $\mathcal H$.

Observe that both $\mathcal M$ and $\mathcal M^\perp$ are themselves Hilbert spaces, so that each of them can be further decomposed as direct sums of orthogonal complementary subspaces. The decomposition of Theorem 5.25 leads to a useful notation for the orthogonal complement of a subspace $\mathcal M$ of a Hilbert space $\mathcal H$, namely, $\mathcal M^\perp = \mathcal H \ominus \mathcal M$.

Example 5.J. Let $\{\mathcal M_k\}$ be a countable collection of orthogonal subspaces of a Hilbert space $(\mathcal H, \langle\,;\,\rangle)$. If $\{\mathcal M_k\}$ is countably infinite, then assume that it is indexed by $\mathbb N$. Consider the natural isomorphism $\Phi$ of $\bigoplus_k \mathcal M_k$ onto $\bigvee_k \mathcal M_k$ defined in Example 5.I,

$$\Phi\bigl(\{x_k\}\bigr) \,=\, \sum_k x_k \qquad\text{for every}\qquad \{x_k\} \in \bigoplus_k \mathcal M_k,$$
where $\bigoplus_k \mathcal M_k$ denotes the Hilbert space of all square-summable sequences in the full direct sum of $\{\mathcal M_k\}$, and the inner product on $\bigoplus_k \mathcal M_k$ is given by

$$\langle \{x_k\} ; \{y_k\}\rangle_\oplus \,=\, \sum_k \langle x_k ; y_k\rangle$$

for every $\{x_k\}$ and $\{y_k\}$ in $\bigoplus_k \mathcal M_k$ (see Example 5.I). Since $\mathcal M_j \perp \mathcal M_k$ whenever $j \ne k$, and since the inner product is continuous (in both arguments — Problem 5.6), it follows that

$$\bigl\langle \Phi(\{x_k\}) ; \Phi(\{y_k\})\bigr\rangle \,=\, \Bigl\langle \sum_j x_j ; \sum_k y_k\Bigr\rangle \,=\, \sum_j\sum_k \langle x_j ; y_k\rangle \,=\, \sum_k \langle x_k ; y_k\rangle \,=\, \bigl\langle \{x_k\} ; \{y_k\}\bigr\rangle_\oplus$$

for every pair of square-summable sequences $\{x_k\}$ and $\{y_k\}$ in $\bigoplus_k \mathcal M_k$. Thus the linear-space isomorphism $\Phi$ preserves the inner product, which means that it is a unitary transformation. Conclusion: The orthogonal direct sum $\bigoplus_k \mathcal M_k$ and the topological sum $\bigvee_k \mathcal M_k$ of pairwise orthogonal subspaces of a Hilbert space are unitarily equivalent Hilbert spaces. That is,

$$\bigoplus_k \mathcal M_k \,\cong\, \bigvee_k \mathcal M_k$$

if $\{\mathcal M_k\}$ is a collection of pairwise orthogonal subspaces of a Hilbert space. This, in fact, is a restatement of the Orthogonal Structure Theorem (Theorem 5.16). (Recall: For a finite collection, $\bigvee_k \mathcal M_k = \sum_k \mathcal M_k$ by Corollary 5.11, so that $\bigoplus_k \mathcal M_k \cong \sum_k \mathcal M_k$.) If, in addition, the orthogonal collection $\{\mathcal M_k\}$ spans $\mathcal H$ (i.e., if $\bigvee_k \mathcal M_k = \mathcal H$), then it is usual to identify the orthogonal direct sum $\bigoplus_k \mathcal M_k$ with its unitarily equivalent image $\Phi(\bigoplus_k \mathcal M_k) = \bigvee_k \mathcal M_k = \mathcal H$, and write $\bigoplus_k \mathcal M_k = \mathcal H$. Under this identification the result in Corollary 5.18 can be restated as follows. If the orthogonal collection $\{\mathcal M_k\}$ spans $\mathcal H$, then $\mathcal H = \bigoplus_k \mathcal M_k$.

Remark: If $\{\mathcal M_i\}_{i=1}^n$ is a finite orthogonal collection of linear manifolds of a Hilbert space, then $\bigvee_{i=1}^n \mathcal M_i = \sum_{i=1}^n \mathcal M_i^-$ (by induction and Problem 5.7).
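The unitary equivalence of Example 5.J can be seen in coordinates. In the sketch below (not part of the text; NumPy assumed), three pairwise orthogonal subspaces of $\mathbb R^6$ are given by disjoint coordinate blocks, and the natural mapping $\Phi(\{x_k\}) = \sum_k x_k$ is checked to preserve the inner product $\langle \{x_k\} ; \{y_k\}\rangle_\oplus = \sum_k \langle x_k ; y_k\rangle$.

```python
import numpy as np

rng = np.random.default_rng(2)
# three pairwise orthogonal subspaces of H = R^6: disjoint coordinate blocks
blocks = [slice(0, 2), slice(2, 3), slice(3, 6)]

def element():
    """A sequence {x_k} with x_k in M_k, each x_k represented as a vector of H."""
    xs = []
    for b in blocks:
        x = np.zeros(6)
        x[b] = rng.standard_normal(b.stop - b.start)
        xs.append(x)
    return xs

xs, ys = element(), element()
phi_x, phi_y = sum(xs), sum(ys)                       # Phi({x_k}) = sum_k x_k

lhs = np.dot(phi_x, phi_y)                            # inner product taken in H
rhs = sum(np.dot(xk, yk) for xk, yk in zip(xs, ys))   # <{x_k};{y_k}> in the direct sum
gap = abs(lhs - rhs)                                  # cross terms <x_j;y_k>, j != k, vanish
```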
5.7 Summability

The first half of this section pertains to Banach spaces and, as such, could have been introduced in Chapter 4. The final part, however, is a genuine Hilbert-space subject.

Definition 5.26. Let $\Gamma$ be any index set and let $\{x_\gamma\}_{\gamma\in\Gamma}$ be an indexed family of vectors in a normed space $\mathcal X$. $\{x_\gamma\}_{\gamma\in\Gamma}$ is a summable family with sum $x \in \mathcal X$ (notation: $x = \sum_{\gamma\in\Gamma} x_\gamma$) if for each $\varepsilon > 0$ there exists a finite set of indices $N_\varepsilon \subseteq \Gamma$ such that
$$\Bigl\|\sum_{k\in N} x_k - \sum_{\gamma\in\Gamma} x_\gamma\Bigr\| \,<\, \varepsilon$$

for every finite subset $N$ of $\Gamma$ that includes $N_\varepsilon$ (i.e., for every finite $N$ such that $N_\varepsilon \subseteq N \subseteq \Gamma$). It is called a $p$-summable family for some $p \ge 1$ if $\{\|x_\gamma\|^p\}_{\gamma\in\Gamma}$ is a summable family of positive numbers. In particular, $\{x_\gamma\}_{\gamma\in\Gamma}$ is an absolutely summable family if $\{\|x_\gamma\|\}_{\gamma\in\Gamma}$ is a summable family, and a square-summable family if $\{\|x_\gamma\|^2\}_{\gamma\in\Gamma}$ is a summable family.

Remark: It is readily verified from Definition 5.26 that if $\{x_\gamma\}_{\gamma\in\Gamma}$ and $\{y_\gamma\}_{\gamma\in\Gamma}$ are two (similarly indexed) summable families of vectors in a normed space $\mathcal X$ with sums $\sum_{\gamma\in\Gamma} x_\gamma$ and $\sum_{\gamma\in\Gamma} y_\gamma$ in $\mathcal X$, respectively, then $\{\alpha x_\gamma + \beta y_\gamma\}_{\gamma\in\Gamma} = \alpha\{x_\gamma\}_{\gamma\in\Gamma} + \beta\{y_\gamma\}_{\gamma\in\Gamma}$ is again a summable family of vectors in $\mathcal X$ with sum

$$\sum_{\gamma\in\Gamma}(\alpha x_\gamma + \beta y_\gamma) \,=\, \alpha\sum_{\gamma\in\Gamma} x_\gamma + \beta\sum_{\gamma\in\Gamma} y_\gamma$$

for every pair of scalars $\alpha$ and $\beta$. This shows that the collection of all summable families of vectors in $\mathcal X$ is a linear manifold of the linear space $\mathcal X^\Gamma$ (cf. Example 2.F), and so is a linear space itself. It is also easy to verify that if $\{x_\gamma\}_{\gamma\in\Gamma}$ is a summable family of vectors in $\mathcal X$ with sum $\sum_{\gamma\in\Gamma} x_\gamma$ in $\mathcal X$, and if $T \in \mathcal B[\mathcal X, \mathcal Y]$ is a bounded linear transformation of $\mathcal X$ into some normed space $\mathcal Y$, then $\{Tx_\gamma\}_{\gamma\in\Gamma}$ is a summable family of vectors in $\mathcal Y$ with sum

$$\sum_{\gamma\in\Gamma} Tx_\gamma \,=\, T\sum_{\gamma\in\Gamma} x_\gamma.$$

Theorem 5.27. (Cauchy Criterion). Let $\{x_\gamma\}_{\gamma\in\Gamma}$ be an indexed family of vectors in a normed space $\mathcal X$ and consider the following assertions.

(a) $\{x_\gamma\}_{\gamma\in\Gamma}$ is a summable family.

(b) For each $\varepsilon > 0$ there exists a finite set of indices $N_\varepsilon \subseteq \Gamma$ such that

$$\Bigl\|\sum_{k\in N} x_k\Bigr\| \,<\, \varepsilon$$

for every finite subset $N$ of $\Gamma$ that is disjoint with $N_\varepsilon$ (i.e., whenever $N \subseteq \Gamma$ is finite and $N \cap N_\varepsilon = \varnothing$).

Claim: (a) implies (b), and (b) implies (a) if $\mathcal X$ is a Banach space.

Proof. Suppose (a) holds true and take an arbitrary $\varepsilon > 0$. According to Definition 5.26 there exists a finite $N_\varepsilon \subseteq \Gamma$ such that

$$\Bigl\|\sum_{k\in N} x_k - \sum_{\gamma\in\Gamma} x_\gamma\Bigr\| \,<\, \tfrac{\varepsilon}{2}$$

whenever $N$ is finite and $N_\varepsilon \subseteq N \subseteq \Gamma$. If $N$ is a finite subset of $\Gamma$ such that $N \cap N_\varepsilon = \varnothing$, then $\sum_{k\in N} x_k = \sum_{k\in N\cup N_\varepsilon} x_k - \sum_{k\in N_\varepsilon} x_k$. Since $N \cup N_\varepsilon$ is finite, and since $N_\varepsilon \subseteq N \cup N_\varepsilon \subseteq \Gamma$, it follows that
$$\Bigl\|\sum_{k\in N} x_k\Bigr\| \,\le\, \Bigl\|\sum_{k\in N\cup N_\varepsilon} x_k - \sum_{\gamma\in\Gamma} x_\gamma\Bigr\| + \Bigl\|\sum_{k\in N_\varepsilon} x_k - \sum_{\gamma\in\Gamma} x_\gamma\Bigr\| \,<\, \varepsilon.$$

Thus (a)$\Rightarrow$(b). Conversely, if (b) holds true, then there exists a sequence $\{N_i\}_{i=1}^\infty$ of finite subsets of $\Gamma$ such that, for each $i \ge 1$, $\|\sum_{k\in N} x_k\| < \frac1i$ whenever $N \subseteq \Gamma$ is finite and $N \cap N_i = \varnothing$. Set $\widehat N_i = \bigcup_{j=1}^i N_j$, which is a finite subset of $\Gamma$ such that $\widehat N_i \subseteq \widehat N_{i+1}$ for each $i \ge 1$. Take any $i \ge 1$ and an arbitrary finite $N \subseteq \Gamma$ such that $N \cap \widehat N_i = \varnothing$. Since $N \cap N_i = \varnothing$, it follows that $\|\sum_{k\in N} x_k\| < \frac1i$. Conclusion: $\{\widehat N_i\}_{i=1}^\infty$ is an increasing sequence of finite subsets of $\Gamma$ such that, for each $i \ge 1$,

$$\Bigl\|\sum_{k\in N} x_k\Bigr\| \,<\, \tfrac1i$$

whenever $N \subseteq \Gamma$ is finite and $N \cap \widehat N_i = \varnothing$. Now set $y_i = \sum_{k\in\widehat N_i} x_k$ in $\mathcal X$ for each $i \ge 1$. Since $\{\widehat N_i\}_{i=1}^\infty$ is an increasing sequence, we get

$$y_{i+j} - y_i \,=\, \sum_{k\in\widehat N_{i+j}} x_k - \sum_{k\in\widehat N_i} x_k \,=\, \sum_{k\in\widehat N_{i+j}\backslash\widehat N_i} x_k$$

for every $i, j \ge 1$, which implies

$$\|y_{i+j} - y_i\| \,\le\, \Bigl\|\sum_{k\in\widehat N_{i+j}\backslash\widehat N_i} x_k\Bigr\| \,<\, \tfrac1i$$

for every $i \ge 1$ and all $j \ge 1$ (reason: $\widehat N_{i+j}\backslash\widehat N_i$ is a finite subset of $\Gamma$ such that $(\widehat N_{i+j}\backslash\widehat N_i) \cap \widehat N_i = \varnothing$ for every $i, j \ge 1$). Hence (see Problem 3.51) $\{y_i\}_{i=1}^\infty$ is a Cauchy sequence in $\mathcal X$. If $\mathcal X$ is a Banach space, then $\{y_i\}_{i=1}^\infty$ converges in $\mathcal X$ to, say, $x \in \mathcal X$. Therefore, for each $\varepsilon > 0$ there exists (i) an integer $i_\varepsilon \ge 1$ such that $\|\sum_{k\in\widehat N_i} x_k - x\| < \frac{\varepsilon}{2}$ whenever $i \ge i_\varepsilon$ and, as (b) holds true, (ii) a finite $N_\varepsilon \subseteq \Gamma$ such that $\|\sum_{k\in N} x_k\| < \frac{\varepsilon}{2}$ whenever $N \subseteq \Gamma$ is finite and $N \cap N_\varepsilon = \varnothing$.

Since $N_\varepsilon \cup \widehat N_{i_\varepsilon}$ is finite and since $\{\widehat N_i\}_{i=1}^\infty$ is increasing, it follows that there exists an integer $i^*_\varepsilon \ge i_\varepsilon$, and consequently a finite subset $\widehat N_{i^*_\varepsilon}$ of $\Gamma$, such that $N_\varepsilon \cup \widehat N_{i_\varepsilon} \subseteq \widehat N_{i^*_\varepsilon}$. If $N$ is finite and $\widehat N_{i^*_\varepsilon} \subseteq N \subseteq \Gamma$, then $N\backslash\widehat N_{i^*_\varepsilon} \subseteq \Gamma$ is finite and $(N\backslash\widehat N_{i^*_\varepsilon}) \cap N_\varepsilon = \varnothing$. Hence

$$\Bigl\|\sum_{k\in N} x_k - x\Bigr\| \,=\, \Bigl\|\sum_{k\in N\backslash\widehat N_{i^*_\varepsilon}} x_k + \sum_{k\in\widehat N_{i^*_\varepsilon}} x_k - x\Bigr\| \,\le\, \Bigl\|\sum_{k\in N\backslash\widehat N_{i^*_\varepsilon}} x_k\Bigr\| + \Bigl\|\sum_{k\in\widehat N_{i^*_\varepsilon}} x_k - x\Bigr\| \,<\, \varepsilon,$$

and therefore (b)$\Rightarrow$(a).
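For an absolutely summable family indexed by $\mathbb N$, Theorem 5.27 (together with Corollary 5.29 below) implies that the sum does not depend on the enumeration of the index set. The sketch below is not part of the text (NumPy assumed; the family $\{(-1)^k/k^2\}$ is truncated to $K = 20000$ terms for illustration): it sums the family along several random enumerations and observes that all results agree.

```python
import numpy as np

K = 20000
k = np.arange(1, K + 1)
family = (-1.0) ** k / k ** 2        # absolutely summable: sum of 1/k^2 is finite

rng = np.random.default_rng(3)
sums = []
for _ in range(5):
    order = rng.permutation(K)       # a different enumeration of the (truncated) index set
    sums.append(family[order].sum())

spread = max(sums) - min(sums)       # every enumeration yields (numerically) the same sum
```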
Corollary 5.28. If $\{x_\gamma\}_{\gamma\in\Gamma}$ is a summable family of vectors in a normed space $\mathcal X$, then the set $\{\gamma \in \Gamma: x_\gamma \ne 0\}$ is countable.

Proof. Let $\{x_\gamma\}_{\gamma\in\Gamma}$ be a summable family of vectors in a normed space $\mathcal X$. The previous theorem ensures that for every integer $n \ge 1$ there is a finite subset $N_n$ of $\Gamma$ such that $\|\sum_{k\in N} x_k\| < \frac1n$ whenever $N \subseteq \Gamma$ is finite and $N \cap N_n = \varnothing$. Set $S = \bigcup_{n=1}^\infty N_n \subseteq \Gamma$ and recall from Corollary 1.11 that $S$ is a countable set. If $\gamma \in \Gamma\backslash S$, then $\{\gamma\} \cap N_n = \varnothing$, and hence $\|x_\gamma\| < \frac1n$, for every $n \ge 1$, which implies that $x_\gamma = 0$. Thus $x_\gamma$ is a nonzero vector in $\mathcal X$ only if $\gamma$ lies in the countable set $S$.

What Corollary 5.28 says is that an uncountable indexed family of vectors in a normed space may be summable but, in this case, it has only a countable number of nonzero vectors.

Corollary 5.29. Every absolutely summable family of vectors in a Banach space is a summable family.

Proof. Let $\{x_\gamma\}_{\gamma\in\Gamma}$ be an absolutely summable family of vectors in a normed space $\mathcal X$ so that $\{\|x_\gamma\|\}_{\gamma\in\Gamma}$ is a summable family of nonnegative numbers. Thus (cf. Theorem 5.27) for each $\varepsilon > 0$ there exists a finite $N_\varepsilon \subseteq \Gamma$ such that

$$\Bigl\|\sum_{k\in N} x_k\Bigr\| \,\le\, \sum_{k\in N}\|x_k\| \,<\, \varepsilon$$

for every finite $N \subseteq \Gamma$ such that $N \cap N_\varepsilon = \varnothing$. Another application of Theorem 5.27 ensures that $\{x_\gamma\}_{\gamma\in\Gamma}$ is a summable family if $\mathcal X$ is a Banach space.

The converse of Corollary 5.29 holds for finite-dimensional Banach spaces. That is, if $\mathcal X$ is a finite-dimensional normed space, then every summable family of vectors in $\mathcal X$ is absolutely summable. But it fails in general. In fact, Dvoretzky and Rogers proved in 1950 that there are summable families of vectors in infinite-dimensional Banach spaces that are not absolutely summable.

Proposition 5.30. If $\mathcal X$ is a finite-dimensional normed space, then $\{x_\gamma\}_{\gamma\in\Gamma}$ is a summable family of vectors in $\mathcal X$ if and only if it is absolutely summable.

Proof. Recall that a finite-dimensional normed space is a Banach space (Corollary 4.28). Then, according to Corollary 5.29, it remains to show that every summable family of vectors in a finite-dimensional normed space is absolutely summable. Consider a normed space $(\mathcal X, \|\,\|)$ with $\dim\mathcal X = n$ for some positive integer $n$ and let $B = \{e_j\}_{j=1}^n$ be a Hamel basis for $\mathcal X$. Take an arbitrary $x \in \mathcal X$ and consider its unique expansion on $B$,
$$x \,=\, \sum_{j=1}^n \xi_j e_j,$$

where $\{\xi_j\}_{j=1}^n$ is a family of scalars; the coordinates of $x$ with respect to the basis $B$. It is readily verified that the function $\|\,\|_1: \mathcal X \to \mathbb R$, defined by
$$\|x\|_1 \,=\, \sum_{j=1}^n |\xi_j|$$

for every $x \in \mathcal X$, is a norm on $\mathcal X$. Since any two norms on $\mathcal X$ are equivalent, it follows that there exist real constants $\alpha > 0$ and $\beta > 0$ such that

$$\|x\| \,\le\, \alpha\|x\|_1 \qquad\text{and}\qquad \|x\|_1 \,\le\, \beta\|x\|$$

for every $x \in \mathcal X$ (Proposition 4.26 and Theorem 4.27). Now suppose $\{x_\gamma\}_{\gamma\in\Gamma}$ is a summable family of vectors in $\mathcal X$ and take an arbitrary $\varepsilon > 0$. Theorem 5.27 ensures the existence of a finite $N_\varepsilon \subseteq \Gamma$ such that

$$\sum_{j=1}^n \Bigl|\sum_{k\in N}\xi_j(k)\Bigr| \,=\, \Bigl\|\sum_{k\in N} x_k\Bigr\|_1 \,\le\, \beta\,\Bigl\|\sum_{k\in N} x_k\Bigr\| \,<\, \beta\varepsilon,$$

and hence

$$\Bigl|\sum_{k\in N}\xi_j(k)\Bigr| \,<\, \beta\varepsilon \qquad\text{for every}\qquad j = 1, \ldots, n,$$

whenever $N \subseteq \Gamma$ is finite and $N \cap N_\varepsilon = \varnothing$, where $\{\xi_j(k)\}_{j=1}^n$ are the coordinates of each $x_k \in \mathcal X$ with respect to the basis $B$. That is, $x_k = \sum_{j=1}^n \xi_j(k)\,e_j$ with $\|x_k\|_1 = \sum_{j=1}^n |\xi_j(k)|$, and therefore $\sum_{k\in N} x_k = \sum_{k\in N}\sum_{j=1}^n \xi_j(k)\,e_j = \sum_{j=1}^n \bigl(\sum_{k\in N}\xi_j(k)\bigr)e_j$ is the unique expansion of $\sum_{k\in N} x_k$ on $B$, which implies that $\|\sum_{k\in N} x_k\|_1 = \sum_{j=1}^n |\sum_{k\in N}\xi_j(k)|$. Take any finite subset $N$ of $\Gamma$ for which $N \cap N_\varepsilon = \varnothing$ and observe that

$$\sum_{k\in N}\|x_k\| \,\le\, \alpha\sum_{k\in N}\|x_k\|_1 \,=\, \alpha\sum_{k\in N}\sum_{j=1}^n |\xi_j(k)| \,\le\, \alpha\sum_{j=1}^n\sum_{k\in N}|\operatorname{Re}\xi_j(k)| + \alpha\sum_{j=1}^n\sum_{k\in N}|\operatorname{Im}\xi_j(k)|.$$

Set $N_j^+ = \{k \in N: \operatorname{Re}\xi_j(k) \ge 0\}$ and $N_j^- = \{k \in N: \operatorname{Re}\xi_j(k) < 0\}$ for each $j = 1, \ldots, n$, which (since they are subsets of $N$) are finite subsets of $\Gamma$ such that $N_j^+ \cap N_\varepsilon = \varnothing$ and $N_j^- \cap N_\varepsilon = \varnothing$ for every $j = 1, \ldots, n$. Therefore, $|\sum_{k\in N_j^+}\xi_j(k)| < \beta\varepsilon$ and $|\sum_{k\in N_j^-}\xi_j(k)| < \beta\varepsilon$ for all $j = 1, \ldots, n$, and hence
$$\sum_{j=1}^n\sum_{k\in N}|\operatorname{Re}\xi_j(k)| \,=\, \sum_{j=1}^n\Bigl|\sum_{k\in N_j^+}\operatorname{Re}\xi_j(k)\Bigr| + \sum_{j=1}^n\Bigl|\sum_{k\in N_j^-}\operatorname{Re}\xi_j(k)\Bigr|$$
$$=\, \sum_{j=1}^n\Bigl|\operatorname{Re}\sum_{k\in N_j^+}\xi_j(k)\Bigr| + \sum_{j=1}^n\Bigl|\operatorname{Re}\sum_{k\in N_j^-}\xi_j(k)\Bigr| \,\le\, \sum_{j=1}^n\Bigl|\sum_{k\in N_j^+}\xi_j(k)\Bigr| + \sum_{j=1}^n\Bigl|\sum_{k\in N_j^-}\xi_j(k)\Bigr| \,<\, 2n\beta\varepsilon.$$

Similarly, $\sum_{j=1}^n\sum_{k\in N}|\operatorname{Im}\xi_j(k)| < 2n\beta\varepsilon$, and so

$$\sum_{k\in N}\|x_k\| \,<\, 4n\alpha\beta\varepsilon.$$

Conclusion: $\{x_\gamma\}_{\gamma\in\Gamma}$ is an absolutely summable family.
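The proof of Proposition 5.30 rests on the equivalence of the norms $\|\,\|$ and $\|\,\|_1$ in finite dimensions. For the Euclidean norm on $\mathbb R^n$ the constants can be taken as $\alpha = 1$ and $\beta = \sqrt{n}$ (the latter by the Cauchy–Schwarz inequality); the sketch below is not part of the text (NumPy assumed) and verifies both inequalities on a random sample of vectors.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 8
X = rng.standard_normal((1000, n))        # a sample of vectors in R^n
l1 = np.abs(X).sum(axis=1)                # ||x||_1 = sum of |xi_j|
l2 = np.linalg.norm(X, axis=1)            # Euclidean norm ||x||

# ||x|| <= alpha*||x||_1 with alpha = 1, and ||x||_1 <= beta*||x|| with beta = sqrt(n)
ok_upper = bool(np.all(l2 <= l1 + 1e-12))
ok_lower = bool(np.all(l1 <= np.sqrt(n) * l2 + 1e-12))
```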
Remark: If a family of vectors in a normed space $\mathcal X$ is indexed by $\mathbb N$ (or by $\mathbb N_0$), then it can be viewed as an $\mathcal X$-valued sequence (the very indexing process establishes a function from $\mathbb N$ to $\mathcal X$). If $\{x_k\}_{k\in\mathbb N}$ is a summable family, then $\{x_k\}_{k=1}^\infty$ is a summable sequence (or, equivalently, the infinite series $\sum_{k=1}^\infty x_k$ converges). Indeed, if for each $\varepsilon > 0$ there exists a finite $N_\varepsilon \subset \mathbb N$ such that $\|\sum_{k\in N} x_k - x\| < \varepsilon$ for some $x \in \mathcal X$ whenever $N$ is finite and $N_\varepsilon \subseteq N \subset \mathbb N$, then by setting $n_\varepsilon = \max N_\varepsilon$ it follows that $\|\sum_{k=1}^n x_k - x\| < \varepsilon$ whenever $n \ge n_\varepsilon$. However, the converse fails even for scalar-valued sequences. For instance, consider the sequence $\{x_k\}_{k=1}^\infty$ with $x_{2k-1} = -x_{2k} = \frac1k$. The infinite series $\sum_{k=1}^\infty x_k$ converges (to zero, since $|\sum_{k=1}^n x_k| \le \frac{2}{n+1}$ for each $n \ge 1$), but it is not an absolutely convergent series (since $\sum_{k=1}^{2n}|x_k| = 2\sum_{k=1}^n \frac1k$ for each $n \ge 1$). Thus $\{x_k\}_{k=1}^\infty$ is a summable sequence but not an absolutely summable sequence, and so $\{x_k\}_{k\in\mathbb N}$ is not an absolutely summable family of vectors in the one-dimensional normed space $\mathbb R$. Hence $\{x_k\}_{k\in\mathbb N}$ is not a summable family by Proposition 5.30. An $\mathcal X$-valued sequence $\{x_k\}_{k=1}^\infty$ for which $\{x_k\}_{k\in\mathbb N}$ is a summable family of vectors in $\mathcal X$ is referred to as an unconditionally summable sequence. In this case it is also common to say that the infinite series $\sum_{k=1}^\infty x_k$ is an unconditionally convergent series.

Proposition 5.31. Let $\{x_\gamma\}_{\gamma\in\Gamma}$ be a family of vectors in a normed space $\mathcal X$ and take any $p \ge 1$. The following assertions are equivalent.

(a) $\{x_\gamma\}_{\gamma\in\Gamma}$ is a $p$-summable family.

(b) $\sup_N \sum_{k\in N}\|x_k\|^p < \infty$, where the supremum is taken over all finite subsets $N$ of $\Gamma$, which is expressed by writing $\sum_{\gamma\in\Gamma}\|x_\gamma\|^p < \infty$.

Moreover, if any of the above equivalent assertions holds, then

$$\sum_{\gamma\in\Gamma}\|x_\gamma\|^p \,=\, \sup_N \sum_{k\in N}\|x_k\|^p.$$
Proof. Suppose (a) holds true. Theorem 5.27 ensures that for each $\varepsilon > 0$ there is a finite subset $N_\varepsilon$ of $\Gamma$ such that $\sum_{k\in N}\|x_k\|^p < \varepsilon$ for every finite subset $N$ of $\Gamma$ that is disjoint with $N_\varepsilon$ (i.e., for every finite $N \subseteq \Gamma$ such that $N \cap N_\varepsilon = \varnothing$). If (b) fails, then for every integer $n \ge 1$ there is a finite subset $N_n$ of $\Gamma$ such that $\sum_{k\in N_n}\|x_k\|^p > n$. Therefore, since $(N_n\backslash N_\varepsilon) \cap N_\varepsilon = \varnothing$, it follows that

$$n \,<\, \sum_{k\in N_n}\|x_k\|^p \,=\, \sum_{k\in N_n\cap N_\varepsilon}\|x_k\|^p + \sum_{k\in N_n\backslash N_\varepsilon}\|x_k\|^p \,<\, \sum_{k\in N_\varepsilon}\|x_k\|^p + \varepsilon$$
for every integer $n \ge 1$, which is a contradiction. Hence (a) implies (b). Conversely, suppose (b) holds and set $\alpha = \sup_N \sum_{k\in N}\|x_k\|^p$. Thus for every $\varepsilon > 0$ there exists a finite subset $N_\varepsilon$ of $\Gamma$ for which $\sum_{k\in N_\varepsilon}\|x_k\|^p > \alpha - \varepsilon$. If $N$ is any finite subset of $\Gamma$ such that $N \cap N_\varepsilon = \varnothing$, then

$$\sum_{k\in N}\|x_k\|^p \,=\, \sum_{k\in N\cup N_\varepsilon}\|x_k\|^p - \sum_{k\in N_\varepsilon}\|x_k\|^p \,<\, \alpha - (\alpha - \varepsilon) \,=\, \varepsilon.$$

Hence (b) implies (a) by Theorem 5.27 (in the Banach space $\mathbb R$). Finally, if any of the above equivalent assertions holds, then for every $\varepsilon > 0$ there exists a finite subset $N_\varepsilon$ of $\Gamma$ such that

$$\Bigl|\sum_{k\in N}\|x_k\|^p - \sum_{\gamma\in\Gamma}\|x_\gamma\|^p\Bigr| \,<\, \varepsilon$$

or, equivalently, such that

$$\sum_{k\in N}\|x_k\|^p \,<\, \sum_{\gamma\in\Gamma}\|x_\gamma\|^p + \varepsilon \qquad\text{and}\qquad \sum_{\gamma\in\Gamma}\|x_\gamma\|^p \,<\, \sum_{k\in N}\|x_k\|^p + \varepsilon,$$

whenever $N$ is finite and $N_\varepsilon \subseteq N \subseteq \Gamma$ (Definition 5.26). Hence

$$\sup_N \sum_{k\in N}\|x_k\|^p \,\le\, \sum_{\gamma\in\Gamma}\|x_\gamma\|^p + \varepsilon \,\le\, \sup_N \sum_{k\in N}\|x_k\|^p + 2\varepsilon$$

for every $\varepsilon > 0$, with the supremum taken over all finite subsets of $\Gamma$ (recall: $\sum_{k\in N_1}\|x_k\|^p \le \sum_{k\in N_2}\|x_k\|^p$ whenever $N_1$ and $N_2$ are finite subsets of $\Gamma$ such that $N_1 \subseteq N_2$). By taking the infimum over all $\varepsilon > 0$ in the above inequalities,

$$\sum_{\gamma\in\Gamma}\|x_\gamma\|^p \,=\, \sup_N \sum_{k\in N}\|x_k\|^p.$$
Remark: Let {x_γ}_{γ∈Γ} be a family of vectors in a normed space X. It is obvious that ∑_{k∈N} ‖x_k‖^p < ∞ for every finite subset N of Γ. By convention, the empty sum is null (i.e., ∑_{k∈∅} ‖x_k‖^p = 0 and, in general, ∑_{k∈∅} x_k = 0 ∈ X). If {x_γ}_{γ∈Γ} is a summable family, then it has only a countable set of nonzero vectors (cf. Corollary 5.28). If this set is infinite (countably infinite, that is), then the family of all nonzero vectors from {x_γ}_{γ∈Γ} can be indexed by ℕ, say {x_k}_{k∈ℕ} (or by any other countably infinite index set). Clearly, {x_γ}_{γ∈Γ} is a summable (a p-summable) family if and only if {x_k}_{k∈ℕ} is. If {x_k}_{k=1}^{∞} is any sequence containing all vectors from {x_k}_{k∈ℕ}, then it follows by the remark that precedes Proposition 5.31 that {x_γ}_{γ∈Γ} is a summable family if and only if {x_k}_{k=1}^{∞} is an unconditionally summable sequence. In particular, {x_γ}_{γ∈Γ} is a p-summable family if and only if {‖x_k‖^p}_{k=1}^{∞} is an unconditionally summable sequence of positive numbers; but a sequence of positive numbers is unconditionally summable if and only if it is (plain) summable, isn't it? (Indeed, Proposition 5.30 ensures that a series of real numbers is unconditionally convergent if and only if it is absolutely convergent.)
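The dichotomy in the Remark above (summable but not absolutely summable) is easy to probe numerically. The following Python sketch is an illustration of ours, not part of the text, and all function names are invented. It checks the bound |∑_{k=1}^{n} x_k| ≤ 2/(n+1) and the harmonic growth of the absolute partial sums for the sequence x_{2k−1} = −x_{2k} = 1/k.

```python
import math

def x(k: int) -> float:
    """k-th term (1-indexed): x_{2j-1} = 1/j and x_{2j} = -1/j."""
    j = (k + 1) // 2
    return 1.0 / j if k % 2 == 1 else -1.0 / j

def partial_sum(n: int) -> float:
    return sum(x(k) for k in range(1, n + 1))

def absolute_partial_sum(n: int) -> float:
    return sum(abs(x(k)) for k in range(1, n + 1))

# |sum_{k=1}^n x_k| <= 2/(n+1): the series converges (to zero) ...
for n in range(1, 2001):
    assert abs(partial_sum(n)) <= 2.0 / (n + 1) + 1e-12

# ... but sum_{k=1}^{2n} |x_k| = 2(1 + 1/2 + ... + 1/n) grows without
# bound, so the series is not absolutely convergent.
assert absolute_partial_sum(2 * 1000) > 2 * math.log(1000)
```

By Proposition 5.30, absolute and unconditional convergence coincide for scalar series, so the family {x_k}_{k∈ℕ} in this demonstration is indeed not summable.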
5.7 Summability
Example 5.K. Let Γ be any index set and let ℓ²_Γ denote the collection of all square-summable families {ξ_γ}_{γ∈Γ} of scalars in F (as usual, F stands either for the real field ℝ or for the complex field ℂ). It is easy to check that ℓ²_Γ is a linear space over F. Observe that {ξ_γ \overline{υ_γ}}_{γ∈Γ} is a summable family whenever x = {ξ_γ}_{γ∈Γ} and y = {υ_γ}_{γ∈Γ} are square-summable families. Indeed, by the Hölder inequality for finite sums,

    ∑_{k∈N} |ξ_k \overline{υ_k}| ≤ (∑_{k∈N} |ξ_k|²)^{1/2} (∑_{k∈N} |υ_k|²)^{1/2}

for every finite subset N of Γ. Therefore,

    ∑_{γ∈Γ} |ξ_γ \overline{υ_γ}| ≤ (∑_{γ∈Γ} |ξ_γ|²)^{1/2} (∑_{γ∈Γ} |υ_γ|²)^{1/2}

by Proposition 5.31, which implies that {ξ_γ \overline{υ_γ}}_{γ∈Γ} is an absolutely summable family of scalars, and hence a summable family in the Banach space F (Corollary 5.29). Thus the function ⟨· ; ·⟩: ℓ²_Γ × ℓ²_Γ → F given by

    ⟨x ; y⟩ = ∑_{γ∈Γ} ξ_γ \overline{υ_γ}

for every x = {ξ_γ}_{γ∈Γ} and y = {υ_γ}_{γ∈Γ} in ℓ²_Γ is well defined. Moreover, it is readily verified that it is an inner product on ℓ²_Γ. In fact, it is not difficult to show that (ℓ²_Γ, ⟨· ; ·⟩) is a Hilbert space. In particular, if Γ = ℕ, then (ℓ²_Γ, ⟨· ; ·⟩) becomes the Hilbert space (ℓ²₊, ⟨· ; ·⟩) of Example 5.B. The natural generalization is easy. Let ℓ²_Γ(H) denote the linear space of all square-summable families {x_γ}_{γ∈Γ} of vectors in a Hilbert space (H, ⟨· ; ·⟩_H). Proceeding as above, it can also be shown that the function ⟨· ; ·⟩: ℓ²_Γ(H) × ℓ²_Γ(H) → F given by

    ⟨{x_γ}_{γ∈Γ} ; {y_γ}_{γ∈Γ}⟩ = ∑_{γ∈Γ} ⟨x_γ ; y_γ⟩_H

for every {x_γ}_{γ∈Γ} and {y_γ}_{γ∈Γ} in ℓ²_Γ(H) is an inner product on ℓ²_Γ(H), and (ℓ²_Γ(H), ⟨· ; ·⟩) is a Hilbert space. Again, if Γ = ℕ, then (ℓ²_Γ(H), ⟨· ; ·⟩) becomes the Hilbert space (ℓ²₊(H), ⟨· ; ·⟩) of Example 5.F.
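The finite-sum Hölder bound in Example 5.K can be illustrated numerically. The Python sketch below is ours, not from the text, with hypothetical names throughout; it truncates two square-summable real families and checks the Cauchy–Schwarz inequality (Hölder with p = q = 2) that makes the inner product on ℓ²_Γ well defined.

```python
import math

N = 5000
xi  = [1.0 / k for k in range(1, N + 1)]                 # sum 1/k^2 < pi^2/6
ups = [(-1.0) ** k / k ** 1.5 for k in range(1, N + 1)]  # sum 1/k^3 < zeta(3)

# Hölder inequality for the finite truncation:
lhs = sum(abs(a * b) for a, b in zip(xi, ups))
rhs = math.sqrt(sum(a * a for a in xi)) * math.sqrt(sum(b * b for b in ups))
assert lhs <= rhs

# Hence the partial inner products sum xi_k * ups_k settle down as the
# truncation grows (for real scalars the conjugate is the identity):
ip = lambda n: sum(xi[k] * ups[k] for k in range(n))
assert abs(ip(N) - ip(N // 2)) < 1e-3
```

The absolute summability guaranteed by the inequality is what lets the sum ⟨x ; y⟩ be taken over the unordered index set Γ rather than over an enumeration of it.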
From now on suppose X is an inner product space. It is easy to show, using Definition 5.26, that if {x_γ}_{γ∈Γ} and {y_γ}_{γ∈Γ} are (similarly indexed) summable families of vectors in X with sums ∑_{γ∈Γ} x_γ and ∑_{γ∈Γ} y_γ in X, respectively, and if x and y are arbitrary vectors in X, then {⟨x_γ ; y⟩}_{γ∈Γ} and {⟨x ; y_γ⟩}_{γ∈Γ} are summable families of scalars with sums

    ⟨∑_{γ∈Γ} x_γ ; y⟩ = ∑_{γ∈Γ} ⟨x_γ ; y⟩   and   ⟨x ; ∑_{γ∈Γ} y_γ⟩ = ∑_{γ∈Γ} ⟨x ; y_γ⟩.
The next result is the general version of the Pythagorean Theorem. It extends Corollary 5.9 to any infinite (not necessarily countable) orthogonal family of vectors in an inner product space.

Theorem 5.32. Let {x_γ}_{γ∈Γ} be a family of pairwise orthogonal vectors in an inner product space X.

(a) If {x_γ}_{γ∈Γ} is a summable family, then it is a square-summable family and ‖∑_{γ∈Γ} x_γ‖² = ∑_{γ∈Γ} ‖x_γ‖².

(b) If X is a Hilbert space and {x_γ}_{γ∈Γ} is a square-summable family, then {x_γ}_{γ∈Γ} is a summable family.

Proof. Let {x_γ}_{γ∈Γ} be an orthogonal family of vectors in X.

(a) If {x_γ}_{γ∈Γ} is a summable family, then for each ε > 0 there exists a finite N_ε ⊆ Γ such that

    ∑_{k∈N} ‖x_k‖² = ‖∑_{k∈N} x_k‖² < ε²

for every finite N ⊆ Γ such that N ∩ N_ε = ∅ (by Proposition 5.8 and Theorem 5.27). Another application of Theorem 5.27 ensures that {‖x_γ‖²}_{γ∈Γ} is a summable family in the Banach space ℝ. Moreover, since x_α ⊥ x_β whenever x_α and x_β are distinct vectors from {x_γ}_{γ∈Γ} (i.e., ⟨x_α ; x_β⟩ = 0 for every α, β ∈ Γ such that α ≠ β), it follows that {⟨x_γ ; x_γ⟩}_{γ∈Γ} includes the family of all nonzero scalars from the family {⟨x_α ; x_β⟩}_{(α,β)∈Γ×Γ}. Therefore,

    ‖∑_{γ∈Γ} x_γ‖² = ⟨∑_{α∈Γ} x_α ; ∑_{β∈Γ} x_β⟩ = ∑_{α∈Γ} ∑_{β∈Γ} ⟨x_α ; x_β⟩ = ∑_{γ∈Γ} ⟨x_γ ; x_γ⟩ = ∑_{γ∈Γ} ‖x_γ‖².

(b) If {‖x_γ‖²}_{γ∈Γ} is a summable family of nonnegative numbers, then for every ε > 0 there exists a finite N_ε ⊆ Γ such that

    ‖∑_{k∈N} x_k‖ = (∑_{k∈N} ‖x_k‖²)^{1/2} < ε^{1/2}

whenever N ⊆ Γ is finite and N ∩ N_ε = ∅ (by Proposition 5.8 and Theorem 5.27 again). Another application of Theorem 5.27 ensures that {x_γ}_{γ∈Γ} is a summable family in the Hilbert space X. ∎
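In a finite-dimensional real space the identity of Theorem 5.32(a) can be verified directly. The following small Python check is an illustration of ours, not part of the text:

```python
def dot(u, v):
    """Euclidean inner product on R^4."""
    return sum(a * b for a, b in zip(u, v))

family = [
    (1.0,  1.0, 0.0, 0.0),
    (1.0, -1.0, 0.0, 0.0),
    (0.0,  0.0, 2.0, 0.0),
]

# the family is pairwise orthogonal
for i in range(len(family)):
    for j in range(i + 1, len(family)):
        assert dot(family[i], family[j]) == 0.0

# Pythagorean identity: ||sum x_k||^2 = sum ||x_k||^2
total = [sum(v[i] for v in family) for i in range(4)]
assert dot(total, total) == sum(dot(v, v) for v in family)
```

Part (b) of the theorem is where completeness enters: in an incomplete inner product space a square-summable orthogonal family may fail to have a sum in the space.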
5.8 Orthonormal Basis

An orthogonal set in an inner product space X may contain the origin of X, which is orthogonal to every vector in X. The next proposition presents the main algebraic property of orthogonal sets that do not contain the origin.

Proposition 5.33. An orthogonal set consisting of nonzero vectors in an inner product space is linearly independent.

Proof. Let A be an orthogonal set consisting of nonzero vectors (i.e., a set of pairwise orthogonal nonzero vectors) in an inner product space X. Suppose there exists a finite subset of A containing more than one vector that is not linearly independent; for instance, {x_i}_{i=0}^{n} ⊆ A such that x₀ = ∑_{i=1}^{n} α_i x_i for some integer n ≥ 1 and some set of scalars {α_i}_{i=1}^{n}. Since x₀ ⊥ x_i for every i = 1, ..., n, we get ‖x₀‖² = ⟨x₀ ; x₀⟩ = ⟨∑_{i=1}^{n} α_i x_i ; x₀⟩ = 0, which is a contradiction (because A does not contain the origin). Conclusion: Every finite subset of A containing more than one vector is linearly independent. Recall that every singleton {x} in a linear space X such that x ≠ 0 is linearly independent. Therefore, every finite subset of A is linearly independent, which means that A is itself linearly independent (Proposition 2.3). ∎

A unit vector in a normed space is a vector with norm equal to 1. An orthonormal set in an inner product space X is an orthogonal set consisting of unit vectors. That is, a subset A of X is an orthonormal set if x ⊥ y for every pair {x, y} of distinct vectors in A and ‖x‖ = 1 for every x ∈ A. Equivalently, {e_γ}_{γ∈Γ} is an orthonormal family of vectors in X if ⟨e_α ; e_β⟩ = δ_{αβ} for every α, β ∈ Γ, where δ_{αβ} is the Kronecker delta (i.e., ⟨e_α ; e_β⟩ = 0 for every α, β ∈ Γ such that α ≠ β, and ‖e_γ‖² = ⟨e_γ ; e_γ⟩ = 1 for every γ ∈ Γ). Each orthogonal set of nonzero vectors can be normalized. In fact, if A is an orthogonal set of nonzero vectors in X, then the set {‖x‖⁻¹x ∈ X : x ∈ A} ⊆ X is orthonormal.
The next two results are immediate consequences of the definition of orthonormal sets (Proposition 5.34 below is a particular case of Proposition 5.33).

Proposition 5.34. Every orthonormal set in any inner product space is linearly independent.

Proposition 5.35. If A is an orthonormal set in an inner product space X, and if there exists x ∈ X such that x ⊥ A and ‖x‖ = 1, then A ∪ {x} is an orthonormal set in X (that properly includes A).

Let O be the collection of all orthonormal sets in an inner product space X. Note that any singleton {x} with ‖x‖ = 1 is an orthonormal set in X (although the expression "pairwise orthogonal" is meaningless in this case). As a subcollection of the power set ℘(X), O is partially ordered in the inclusion ordering of ℘(X). Recall from Section 1.5 that a set A ∈ O (i.e., an orthonormal set A in an inner product space X) is maximal in O if there is no set A′ ∈ O such that A ⊂ A′ (i.e., if there is no
orthonormal set A′ in X that properly includes A). If A is maximal in O, then we say that A is a maximal orthonormal set in the inner product space X.

Proposition 5.36. Let X be an inner product space and let A be an orthonormal set in X. The following assertions are pairwise equivalent.

(a) A is a maximal orthonormal set in X.

(b) There is no unit vector x for which A ∪ {x} is an orthonormal set.

(c) If x ⊥ A, then x = 0 (i.e., A^⊥ = {0}).

Proof. Let A be an orthonormal set in an inner product space X.

Proof of (a)⇒(b): If there exists a unit vector x in X for which A ∪ {x} is an orthonormal set, then A ∪ {x} is an orthonormal set that properly includes A (since x ⊥ A). Thus A is not a maximal orthonormal set in X.

Proof of (b)⇒(c): If there exists a nonzero vector x in X such that x ⊥ A, then there exists a unit vector x′ = ‖x‖⁻¹x in X such that A ∪ {x′} is an orthonormal set.

Proof of (c)⇒(a): If (a) fails, then there exists an orthonormal set A′ in X that properly includes A, so that A′\A ≠ ∅. Take any x in A′\A, which is a nonzero vector (actually, x is a unit vector) orthogonal to A (because x ∈ A′, A ⊂ A′, and A′ is an orthonormal set). Thus (c) fails. ∎

Proposition 5.37. If A is an orthonormal set in an inner product space X, then there exists a maximal orthonormal set B in X such that A ⊆ B.

Proof. Let A be an orthonormal set in an inner product space X. Set O_A = {S ∈ ℘(X): S is an orthonormal set in X and A ⊆ S}, the collection of all orthonormal sets in X that include A. As a nonempty subcollection (e.g., A ∈ O_A) of the power set ℘(X), O_A is partially ordered in the inclusion ordering. Take an arbitrary chain C in O_A and consider the union ∪C of all sets in C. If x and y are distinct vectors in ∪C, then x ∈ C and y ∈ D, where C, D ∈ C ⊆ O_A. Since C is a chain, it follows that either C ⊆ D or D ⊆ C. Suppose (without loss of generality) that C ⊆ D. Thus x, y ∈ D, and so ∪C is an orthonormal set (since D ∈ O_A).
Moreover, A ⊆ ∪C (reason: every set in C includes A). Outcome: ∪C ∈ O_A. Since ∪C is an upper bound for C, we may conclude that every chain in O_A has an upper bound in O_A. Hence O_A has a maximal element by Zorn's Lemma. Thus let B be a maximal element of O_A, which clearly is an orthonormal set in X that includes A. If there is a unit vector x in X such that B ∪ {x} is an orthonormal set, then B ∪ {x} lies in O_A and properly includes B, which contradicts the fact that B is a maximal element of O_A. Therefore, there is no unit vector x in X such that B ∪ {x} is an orthonormal set, and hence B is a maximal orthonormal set in X (Proposition 5.36). ∎
Proposition 5.37 says that there are plenty of maximal orthonormal sets in any inner product space of dimension greater than 1. The next proposition says that the maximal orthonormal sets in a Hilbert space H are precisely those orthonormal sets that span H.

Proposition 5.38. Let A be an orthonormal set in an inner product space X.

(a) If (span A)⁻ = X, then A is a maximal orthonormal set.

(b) If X is a Hilbert space and A is a maximal orthonormal set, then (span A)⁻ = X.

Proof. Take any orthonormal set A in X. According to Proposition 5.36, A is a maximal orthonormal set if and only if A^⊥ = {0}. If (span A)⁻ = X, then A^⊥ = X^⊥ = {0}; indeed Proposition 5.12 says that A^⊥ = ((span A)⁻)^⊥, and hence A^⊥ = {0}. The converse is an immediate consequence of Proposition 5.15: if X is a Hilbert space and A^⊥ = {0}, then (span A)⁻ = X. ∎

An orthonormal set in an inner product space X that spans X is called an orthonormal basis (or a Hilbert basis) for X. In other words, a subset B of an inner product space X is an orthonormal basis if

(i) B is an orthonormal set, and

(ii) (span B)⁻ = X.
This is a combined topological and algebraic concept, while the Hamel basis of Section 2.4 is a purely algebraic concept. However, every orthonormal basis for a given inner product space X is included in a Hamel basis for the linear space X (Proposition 5.34 and Theorem 2.5). Recall from Proposition 5.37 that every nonzero inner product space has a maximal orthonormal set. Using the above terminology, Proposition 5.38(a) says that if B is an orthonormal basis for an inner product space X, then B is a maximal orthonormal set in X. Note that this does not ensure the existence of orthonormal bases in incomplete inner product spaces, but in nonzero Hilbert spaces they do exist. In fact, in a Hilbert-space setting the concepts of maximal orthonormal set and orthonormal basis coincide (Proposition 5.38). That is, B is an orthonormal basis for a Hilbert space H if and only if B is a maximal orthonormal set in H. Therefore (cf. Proposition 5.37 again), every nonzero Hilbert space has an orthonormal basis. As we could expect (suggested perhaps by Section 2.4), the cardinality of all maximal orthonormal sets in an inner product space X is an invariant for X. First we prove this statement for finite-dimensional spaces.

Proposition 5.39. Let X be an inner product space.

(a) If X is finite dimensional, then every orthonormal basis for X is a Hamel basis for X.

(b) If there exists an orthonormal basis for X with a finite number of vectors, then every orthonormal basis for X is a Hamel basis for X.
Consequently, in both cases, every orthonormal basis for X has the same finite cardinality, which coincides with the linear dimension of X.

Proof. Let X be an inner product space.

(a) If X is finite dimensional, then M⁻ = M for every linear manifold M of X. In particular, (span B)⁻ = span B for every orthonormal basis B for X. Recall that every orthonormal basis for X is linearly independent (Proposition 5.34). Therefore, if B is an orthonormal basis for X, then B is a linearly independent subset of X such that span B = X; that is, B is a Hamel basis for X.
(b) If B = {e_i}_{i=1}^{n} is an orthonormal basis for X, then span{e_i}_{i=1}^{n} is an n-dimensional linear manifold of X, which in fact is a subspace of X (Corollary 4.29). Hence span B = (span B)⁻. But (span B)⁻ = X, so that B is a linearly independent subset of X (Proposition 5.34) such that span B = X. In other words, B is a Hamel basis for X. Finally, recall that the cardinality of any Hamel basis for X is an invariant for X: the linear dimension of X (Theorem 2.7). Thus, in both cases, #B = dim X in ℕ for every orthonormal basis B for X. ∎

To verify such an invariance for infinite-dimensional spaces, we shall use the following fundamental inequality.

Lemma 5.40. (Bessel Inequality). Let {e_γ}_{γ∈Γ} be a family of vectors in an inner product space X and let x be any vector in X. If {e_γ}_{γ∈Γ} is an orthonormal family, then {⟨x ; e_γ⟩}_{γ∈Γ} is a square-summable family of scalars and

    ∑_{γ∈Γ} |⟨x ; e_γ⟩|² ≤ ‖x‖².
Proof. Take an arbitrary x ∈ X and an arbitrary finite set of indices N ⊆ Γ. Since {e_γ}_{γ∈Γ} is an orthonormal family, it follows by the Pythagorean Theorem (Proposition 5.8) that

    0 ≤ ‖∑_{k∈N} ⟨x ; e_k⟩e_k − x‖²
      = ‖∑_{k∈N} ⟨x ; e_k⟩e_k‖² − 2 Re⟨∑_{k∈N} ⟨x ; e_k⟩e_k ; x⟩ + ‖x‖²
      = ∑_{k∈N} |⟨x ; e_k⟩|² − 2 ∑_{k∈N} |⟨x ; e_k⟩|² + ‖x‖²,

and hence

    ∑_{k∈N} |⟨x ; e_k⟩|² ≤ ‖x‖².
Therefore, as N is an arbitrary finite subset of Γ,

    ∑_{γ∈Γ} |⟨x ; e_γ⟩|² = sup_N ∑_{k∈N} |⟨x ; e_k⟩|² ≤ ‖x‖²,
where the supremum is taken over all finite subsets of Γ. The family of scalars {⟨x ; e_γ⟩}_{γ∈Γ} is then square-summable by Proposition 5.31. ∎

Corollary 5.41. Let {e_γ}_{γ∈Γ} be any orthonormal family of vectors in an inner product space X. For each x ∈ X the set {γ ∈ Γ: ⟨x ; e_γ⟩ ≠ 0} is countable.

Proof. According to Lemma 5.40, {|⟨x ; e_γ⟩|²}_{γ∈Γ} is a summable family of nonnegative numbers. Thus Corollary 5.28 ensures that {γ ∈ Γ: |⟨x ; e_γ⟩|² ≠ 0} is a countable set. Equivalently, {γ ∈ Γ: ⟨x ; e_γ⟩ ≠ 0} is a countable set. ∎

Theorem 5.42. Every orthonormal basis for a given Hilbert space has the same cardinality.

Proof. Let H be a Hilbert space. If H is finite dimensional, then the above result follows by Proposition 5.39. Suppose H is infinite dimensional and let B and C be orthonormal bases for H. Proposition 5.39(b) ensures that B and C are infinite, and so #ℕ ≤ #B. For each b ∈ B set C_b = {c ∈ C: ⟨c ; b⟩ ≠ 0}. According to Corollary 5.41, #C_b ≤ #ℕ, and hence #C_b ≤ #B for all b ∈ B. Then, since B is an infinite set (cf. Theorems 1.10 and 1.9),

    # ∪_{b∈B} C_b ≤ #(B×B) = #B.

If c ∈ C, then c ∈ C_b for some b ∈ B (reason: since B is a maximal orthonormal set, it follows that if c ∈ C and c ⊥ B, then c = 0, which contradicts the fact that ‖c‖ = 1; hence ⟨c ; b⟩ ≠ 0 for some b ∈ B). Therefore C ⊆ ∪_{b∈B} C_b. But ∪_{b∈B} C_b ⊆ C (each C_b is a subset of C). Thus C = ∪_{b∈B} C_b and, consequently, #C ≤ #B. Since C also is infinite, we may swap B and C, apply the same argument, and get #B ≤ #C. Hence #C = #B by the Cantor–Bernstein Theorem (Theorem 1.6). ∎

The preceding theorem might be restated as "every maximal orthonormal set in a given inner product space X has the same cardinality". The proof remains essentially the same. Such an invariant (i.e., the cardinality of every orthonormal basis for a given Hilbert space) is called the orthogonal dimension of the Hilbert space H. According to Proposition 5.39, the orthogonal dimension of H is finite if and only if its linear dimension is finite and, in this case, these two dimensions coincide. Thus the concepts of "infinite-dimensional" and "finite-dimensional" Hilbert spaces are unambiguously defined.

Proposition 5.43. If X is a separable inner product space, then every orthonormal set in X is countable.

Proof. Let B = {e_γ}_{γ∈Γ} be any family of orthonormal vectors in the inner product space X. If X is separable (as a metric space), then there exists a
countable dense subset A = {a_k}_{k∈ℕ} of X. Since A is dense in X, it follows by Proposition 3.32 that every nonempty open ball centered at any point of X meets A. In particular, every open ball of radius, say ½, centered at any point of B meets A. Then for each γ ∈ Γ there exists a least integer k_γ ∈ ℕ such that ‖e_γ − a_{k_γ}‖ < ½. This establishes a function F: Γ → ℕ that assigns to each γ ∈ Γ the integer k_γ ∈ ℕ. We show that this function is injective. In fact, consider the family {a_{k_γ}}_{γ∈Γ}. Recall that B is an orthonormal set. Thus

    √2 = ‖e_α − e_β‖ = ‖e_α − a_{k_α} − e_β + a_{k_β} + a_{k_α} − a_{k_β}‖ ≤ ‖e_α − a_{k_α}‖ + ‖e_β − a_{k_β}‖ + ‖a_{k_α} − a_{k_β}‖ ≤ 1 + ‖a_{k_α} − a_{k_β}‖,

and so ‖a_{k_α} − a_{k_β}‖ ≥ √2 − 1 > 0, for every pair of distinct indices α, β ∈ Γ. This implies that k_α ≠ k_β whenever α ≠ β, which means that F: Γ → ℕ is injective. Therefore #Γ ≤ #ℕ, and so B is countable. ∎

Recall that every (nonzero) Hilbert space has an orthonormal basis. The next theorem characterizes the (nonzero) separable Hilbert spaces in terms of their orthogonal dimension.

Theorem 5.44. A nonzero Hilbert space is separable if and only if it has a countable orthonormal basis.

Proof. Propositions 4.9(b) and 5.43. ∎
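The Bessel inequality of Lemma 5.40 is strict whenever x has a component orthogonal to the orthonormal family, which a small finite-dimensional check makes concrete. The Python sketch below is an illustration of ours, not from the text.

```python
import math

def dot(u, v):
    """Euclidean inner product on R^3."""
    return sum(a * b for a, b in zip(u, v))

# An orthonormal family of two vectors in R^3 (it does not span R^3):
e1 = (1 / math.sqrt(2),  1 / math.sqrt(2), 0.0)
e2 = (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0)
assert abs(dot(e1, e1) - 1.0) < 1e-12 and abs(dot(e2, e2) - 1.0) < 1e-12
assert abs(dot(e1, e2)) < 1e-12

x = (3.0, 1.0, 2.0)
bessel = dot(x, e1) ** 2 + dot(x, e2) ** 2   # sum of squared Fourier coefficients
assert bessel <= dot(x, x)                   # 10 <= 14: strict, since x has a
                                             # component outside span{e1, e2}
```

Equality would hold exactly when x lies in the closed span of the family, which is the situation of the Parseval identity in the next section.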
We close this section with a useful result for constructing countable orthonormal sets, which is known as the Gram–Schmidt orthogonalization process.

Proposition 5.45. If A = {a_k} is a countable linearly independent nonempty set in an inner product space X, then there exists a countable orthonormal set B = {e_k} in X with the following property: span{e_k}_{k=1}^{n} = span{a_k}_{k=1}^{n} for every integer n such that 1 ≤ n ≤ #A, and hence span B = span A.

Proof. Let A = {a_k} be a countable (either a finite and nonempty or a countably infinite) linearly independent subset of an inner product space X.

Claim. For every integer n ≥ 1 (n ≤ #A) there exists an orthonormal subset {e_k}_{k=1}^{n} of X such that span{e_k}_{k=1}^{n} = span{a_k}_{k=1}^{n}.

Proof. a₁ ≠ 0 (because A = {a_k} is a linearly independent subset of X). Set e₁ = ‖a₁‖⁻¹a₁ in X so that the result holds for n = 1. (Recall that any singleton {x} such that ‖x‖ = 1 is an orthonormal set in X.) If #A = 1 the proposition is proved. Thus assume that 1 < #A ≤ ℵ₀. Suppose the result holds for some integer n ≥ 1 such that n < #A. Observe that

    a_{n+1} − ∑_{k=1}^{n} ⟨a_{n+1} ; e_k⟩e_k ≠ 0

(otherwise a_{n+1} ∈ span{e_k}_{k=1}^{n} = span{a_k}_{k=1}^{n}, which contradicts the fact that A is linearly independent). Set

    e_{n+1} = β_{n+1} (a_{n+1} − ∑_{k=1}^{n} ⟨a_{n+1} ; e_k⟩e_k),

where β_{n+1} = ‖a_{n+1} − ∑_{k=1}^{n} ⟨a_{n+1} ; e_k⟩e_k‖⁻¹ so that ‖e_{n+1}‖ = 1. Take an arbitrary j = 1, ..., n. Since {e_k}_{k=1}^{n} is an orthonormal set, we get

    ⟨e_{n+1} ; e_j⟩ = β_{n+1} (⟨a_{n+1} ; e_j⟩ − ∑_{k=1}^{n} ⟨a_{n+1} ; e_k⟩⟨e_k ; e_j⟩) = β_{n+1} (⟨a_{n+1} ; e_j⟩ − ⟨a_{n+1} ; e_j⟩) = 0,

and hence e_{n+1} ⊥ {e_k}_{k=1}^{n}. Therefore {e_k}_{k=1}^{n+1} is an orthonormal set. But e_{n+1} ∈ span({e_k}_{k=1}^{n} ∪ {a_{n+1}}) = span{a_k}_{k=1}^{n+1}. Thus

    span{e_k}_{k=1}^{n+1} = span({e_k}_{k=1}^{n} ∪ {e_{n+1}}) = span({a_k}_{k=1}^{n} ∪ {e_{n+1}}) = span{a_k}_{k=1}^{n+1},

which completes the proof by induction.

Finally, set B = ∪_{1≤n≤#A} {e_k}_{k=1}^{n}. Since {e_k}_{k=1}^{n} ⊆ {e_k}_{k=1}^{n+1} for each integer n ≥ 1 such that n < #A, it follows that if e and e′ are distinct vectors in B, then they both belong to {e_k}_{k=1}^{m} for some integer m such that 1 ≤ m ≤ #A. Since {e_k}_{k=1}^{m} is an orthonormal set, e ⊥ e′ and ‖e‖ = ‖e′‖ = 1. Thus B is an orthonormal set. Moreover, since span{e_k}_{k=1}^{n} = span{a_k}_{k=1}^{n} for every integer n ≥ 1 such that n ≤ #A, then span B = span A. ∎

Corollary 5.46. There is no Hilbert space with a countably infinite Hamel basis. In other words, a Hamel basis for a Hilbert space is either finite or uncountable.

Proof. Let H be a Hilbert space and suppose there exists a countably infinite Hamel basis for the linear space H, say {f_k}_{k=1}^{∞}. The Gram–Schmidt orthogonalization process ensures the existence of a countably infinite orthonormal set, say {e_k}_{k=1}^{∞}, such that span{e_k}_{k=1}^{n} = span{f_k}_{k=1}^{n} for every n ≥ 1. If {α_k}_{k=1}^{∞} is a square-summable sequence of scalars, then {α_k e_k}_{k=1}^{∞} is a square-summable sequence of pairwise orthogonal vectors in H, and so the infinite series ∑_{k=1}^{∞} α_k e_k converges in the Hilbert space H by Corollary 5.9(b). Take any square-summable sequence {α_k}_{k=1}^{∞} of nonzero scalars (e.g., α_k = 1/k for each k ≥ 1) and put x = ∑_{k=1}^{∞} α_k e_k in H. Since x ∉ span{e_k}_{k=1}^{n}, and since span{e_k}_{k=1}^{n} = span{f_k}_{k=1}^{n}, for every n ≥ 1, it follows that x ∉ span{f_k}_{k=1}^{∞} (i.e., x is not a finite linear combination of vectors from {f_k}_{k=1}^{∞}), which contradicts the assumption that {f_k}_{k=1}^{∞} is a Hamel basis for H. Conclusion: There is no countably infinite Hamel basis for H. ∎
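Proposition 5.45 is constructive, and its induction step is exactly the familiar Gram–Schmidt recursion. The Python sketch below is ours, not from the text (finite real case only, with invented names); it implements the recursion and checks ⟨e_i ; e_j⟩ = δ_{ij}.

```python
import math

def dot(u, v):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalize finitely many linearly independent vectors in R^n,
    so that span{e_1,...,e_k} = span{a_1,...,a_k} for every k."""
    es = []
    for a in vectors:
        r = list(a)
        for e in es:                      # subtract sum_k <a ; e_k> e_k
            c = dot(a, e)
            r = [ri - c * ei for ri, ei in zip(r, e)]
        norm = math.sqrt(dot(r, r))       # nonzero by linear independence
        if norm == 0.0:
            raise ValueError("vectors are linearly dependent")
        es.append([ri / norm for ri in r])
    return es

a = [(1.0, 1.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
e = gram_schmidt(a)
for i in range(3):
    for j in range(3):
        target = 1.0 if i == j else 0.0   # <e_i ; e_j> = Kronecker delta
        assert abs(dot(e[i], e[j]) - target) < 1e-12
```

In the complex or infinite-dimensional setting the recursion is identical, with the inner product of the space in place of the Euclidean dot product.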
This result is the main reason why the concept of linear dimension is of no interest in Hilbert space theory. Here the useful notion is orthogonal dimension and, from now on, dim H will always mean the orthogonal dimension of the Hilbert space H. Observe that the notions of finite-dimensional and infinite-dimensional Hilbert spaces remain unchanged.
5.9 The Fourier Series Theorem

The Fourier Series Theorem states the fundamental properties of an orthonormal basis in a Hilbert space. Precisely, it exhibits a collection of necessary and sufficient conditions for an orthonormal set to be an orthonormal basis for a Hilbert space and, in particular, it ensures existence and uniqueness of the expansion of any vector in a Hilbert space in terms of each orthonormal basis. Before stating and proving it, we need the following auxiliary result.

Proposition 5.47. Let {e_k}_{k∈N} be a finite orthonormal set in a Hilbert space H and let x be an arbitrary vector in H. Then

    u_x = ∑_{k∈N} ⟨x ; e_k⟩e_k

is the unique vector in span{e_k}_{k∈N} such that

    ‖x − u_x‖ ≤ ‖x − u‖   for every   u ∈ span{e_k}_{k∈N}.

Proof. Since span{e_k}_{k∈N} is a finite-dimensional linear manifold of H (Theorem 2.6), it follows by Corollary 4.29 that it is a subspace of H, so that span{e_k}_{k∈N} = (span{e_k}_{k∈N})⁻. Theorem 5.13 says that there exists a unique vector u_x in span{e_k}_{k∈N} such that ‖x − u_x‖ ≤ ‖x − u‖ for every u ∈ span{e_k}_{k∈N}. Moreover, Theorem 5.13 also says that this u_x is the unique vector in span{e_k}_{k∈N} such that x − u_x ⊥ span{e_k}_{k∈N}. Since u_x ∈ span{e_k}_{k∈N}, it follows that u_x = ∑_{k∈N} α_k e_k for some finite family of scalars {α_k}_{k∈N}. Since x − u_x ⊥ span{e_k}_{k∈N}, it follows that x − u_x ⊥ e_j for every j ∈ N. Thus, recalling that {e_k}_{k∈N} is an orthonormal set, we get

    0 = ⟨x − u_x ; e_j⟩ = ⟨x ; e_j⟩ − ∑_{k∈N} α_k ⟨e_k ; e_j⟩ = ⟨x ; e_j⟩ − α_j

for every j ∈ N, and hence u_x = ∑_{k∈N} ⟨x ; e_k⟩e_k. ∎
Theorem 5.48. (The Fourier Series Theorem). Let B = {e_γ}_{γ∈Γ} be an orthonormal set in a Hilbert space H. The following assertions are pairwise equivalent.

(a) B is an orthonormal basis for H.

(b) Every x ∈ H has a unique expansion on B, namely

    x = ∑_{γ∈Γ} ⟨x ; e_γ⟩e_γ.

This is referred to as the Fourier series expansion of x. The elements of the family of scalars {⟨x ; e_γ⟩}_{γ∈Γ} are called the Fourier coefficients of x with respect to B.

(c) For every pair of vectors x, y in H,

    ⟨x ; y⟩ = ∑_{γ∈Γ} ⟨x ; e_γ⟩\overline{⟨y ; e_γ⟩}.

(d) For every x ∈ H,

    ‖x‖² = ∑_{γ∈Γ} |⟨x ; e_γ⟩|².
This is the Parseval identity.

(e) Every linear manifold of H that includes B is dense in H.

Proof. We shall verify that (e)⇔(a)⇔(d) and (b)⇒(c)⇒(d)⇒(b).

Proof of (a)⇔(d): Take any x in H. If (span B)⁻ = H, then every nonempty open ball centered at x meets span B (Proposition 3.32). That is, for every ε > 0 there exist a finite set N_ε ⊆ Γ and a vector u ∈ span{e_k}_{k∈N_ε} such that ‖x − u‖ < ε. Set u_x = ∑_{k∈N_ε} ⟨x ; e_k⟩e_k in span{e_k}_{k∈N_ε}. Proposition 5.47 ensures that ‖x − u_x‖ ≤ ‖x − u‖, and hence

    ‖x − ∑_{k∈N_ε} ⟨x ; e_k⟩e_k‖² < ε².

Since {e_k}_{k∈N_ε} is an orthonormal set, it follows by the Pythagorean Theorem (Proposition 5.8) that

    ‖∑_{k∈N_ε} ⟨x ; e_k⟩e_k − x‖² = ‖∑_{k∈N_ε} ⟨x ; e_k⟩e_k‖² − 2 Re⟨∑_{k∈N_ε} ⟨x ; e_k⟩e_k ; x⟩ + ‖x‖²
      = ∑_{k∈N_ε} |⟨x ; e_k⟩|² − 2 ∑_{k∈N_ε} |⟨x ; e_k⟩|² + ‖x‖²
      = ‖x‖² − ∑_{k∈N_ε} |⟨x ; e_k⟩|².

Therefore, by using the Bessel inequality (Lemma 5.40) and recalling that ∑_{k∈N_ε} |⟨x ; e_k⟩|² ≤ ∑_{γ∈Γ} |⟨x ; e_γ⟩|² (Proposition 5.31), we get

    | ‖x‖² − ∑_{γ∈Γ} |⟨x ; e_γ⟩|² | = ‖x‖² − ∑_{γ∈Γ} |⟨x ; e_γ⟩|² ≤ ‖x‖² − ∑_{k∈N_ε} |⟨x ; e_k⟩|² = ‖x − ∑_{k∈N_ε} ⟨x ; e_k⟩e_k‖² < ε²
for all ε > 0. Hence ‖x‖² = ∑_{γ∈Γ} |⟨x ; e_γ⟩|². Outcome: (a) implies (d). Conversely, if the orthonormal set B is not an orthonormal basis for H, then it is not maximal in the Hilbert space H (Proposition 5.38(b)). Thus there exists a unit vector e in H such that B ∪ {e} is an orthonormal set (Proposition 5.36). Therefore ⟨e ; e_γ⟩ = 0 for all γ ∈ Γ, and hence 1 = ‖e‖² = ∑_{γ∈Γ} |⟨e ; e_γ⟩|² = 0. Conclusion: If (a) fails, then (d) fails. Equivalently, (d) implies (a).

Proposition 4.9(a) ensures that (a)⇔(e). It is readily verified that (b)⇒(c). Indeed, if (b) holds, then

    ⟨x ; y⟩ = ⟨∑_{α∈Γ} ⟨x ; e_α⟩e_α ; ∑_{β∈Γ} ⟨y ; e_β⟩e_β⟩ = ∑_{α∈Γ} ⟨x ; e_α⟩⟨e_α ; ∑_{β∈Γ} ⟨y ; e_β⟩e_β⟩ = ∑_{α∈Γ} ⟨x ; e_α⟩ ∑_{β∈Γ} \overline{⟨y ; e_β⟩}⟨e_α ; e_β⟩ = ∑_{α∈Γ} ⟨x ; e_α⟩\overline{⟨y ; e_α⟩}
for every x, y ∈ H (because {e_γ}_{γ∈Γ} is an orthonormal set). Hence (b) implies (c). Moreover, (c)⇒(d) trivially.

Proof of (d)⇒(b): Take any x ∈ H. Assertion (d) implies that {⟨x ; e_γ⟩e_γ}_{γ∈Γ} is a square-summable family of orthogonal vectors in H. Thus {⟨x ; e_γ⟩e_γ}_{γ∈Γ} is a summable family of orthogonal vectors in the Hilbert space H by Theorem 5.32(b). Let x′ be the sum of {⟨x ; e_γ⟩e_γ}_{γ∈Γ}; that is, x′ = ∑_{γ∈Γ} ⟨x ; e_γ⟩e_γ ∈ H. Since {e_γ}_{γ∈Γ} is an orthonormal set, it follows by Theorem 5.32(a) that

    ‖x − x′‖² = ‖∑_{γ∈Γ} ⟨x ; e_γ⟩e_γ − x‖²
      = ‖∑_{γ∈Γ} ⟨x ; e_γ⟩e_γ‖² − 2 Re⟨∑_{γ∈Γ} ⟨x ; e_γ⟩e_γ ; x⟩ + ‖x‖²
      = ∑_{γ∈Γ} |⟨x ; e_γ⟩|² − 2 ∑_{γ∈Γ} |⟨x ; e_γ⟩|² + ‖x‖² = 0

because assertion (d) holds true (i.e., because ‖x‖² = ∑_{γ∈Γ} |⟨x ; e_γ⟩|²). Therefore x = x′, so that

    x = ∑_{γ∈Γ} ⟨x ; e_γ⟩e_γ.

Finally, if x = ∑_{γ∈Γ} α_γ(x)e_γ for some family of scalars {α_γ(x)}_{γ∈Γ}, then ∑_{γ∈Γ} (α_γ(x) − ⟨x ; e_γ⟩)e_γ = 0, so that ∑_{γ∈Γ} |α_γ(x) − ⟨x ; e_γ⟩|² = 0 by Theorem 5.32(a). Thus α_γ(x) = ⟨x ; e_γ⟩ for every γ ∈ Γ, which proves that the Fourier series expansion of x is unique. ∎
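For a finite orthonormal basis the assertions of Theorem 5.48 reduce to finite sums that can be checked directly. The Python sketch below is an illustration of ours, not part of the text: it expands a vector of ℝ³ on a non-canonical orthonormal basis and verifies the expansion (b) and the Parseval identity (d).

```python
import math

def dot(u, v):
    """Euclidean inner product on R^3."""
    return sum(a * b for a, b in zip(u, v))

s2, s3, s6 = math.sqrt(2.0), math.sqrt(3.0), math.sqrt(6.0)
basis = [                      # an orthonormal basis for R^3
    (1 / s3,  1 / s3,  1 / s3),
    (1 / s2, -1 / s2,  0.0),
    (1 / s6,  1 / s6, -2 / s6),
]
x = (2.0, -1.0, 3.0)

coeffs = [dot(x, e) for e in basis]   # Fourier coefficients <x ; e_k>

# (b): x = sum_k <x ; e_k> e_k
recon = [sum(c * e[i] for c, e in zip(coeffs, basis)) for i in range(3)]
assert all(abs(a - b) < 1e-12 for a, b in zip(x, recon))

# (d): ||x||^2 = sum_k |<x ; e_k>|^2  (Parseval identity)
assert abs(sum(c * c for c in coeffs) - dot(x, x)) < 1e-12
```

In the infinite-dimensional separable case the finite sums above become the unconditionally convergent series described in the Remark that follows.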
Remark: Consider the sums in (b), (c), and (d) of Theorem 5.48. If the Hilbert space H is finite dimensional, then any orthonormal basis for H is finite (Proposition 5.39), and so these are finite sums. If H is infinite dimensional and separable, then any orthonormal basis B for H is countably infinite (Proposition 5.43) so that B can be indexed by ℕ (or ℕ₀, or ℤ). For instance, if B = {e_k}_{k∈ℕ} is an orthonormal basis for an infinite-dimensional separable Hilbert space H, then we have a countable summable family of vectors {⟨x ; e_k⟩e_k}_{k∈ℕ} in (b), a countable summable family of scalars {⟨x ; e_k⟩\overline{⟨y ; e_k⟩}}_{k∈ℕ} in (c), and a countable summable family of nonnegative numbers {|⟨x ; e_k⟩|²}_{k∈ℕ} in (d). If {e_k}_{k=1}^{∞} is any H-valued orthonormal sequence containing all vectors from the orthonormal basis {e_k}_{k∈ℕ} for H (which is itself an orthonormal basis for H), then the Fourier series expansion for any x ∈ H can be written as

    x = ∑_{k=1}^{∞} ⟨x ; e_k⟩e_k.

Similarly,

    ⟨x ; y⟩ = ∑_{k=1}^{∞} ⟨x ; e_k⟩\overline{⟨y ; e_k⟩}

for every x, y in H, and

    ‖x‖² = ∑_{k=1}^{∞} |⟨x ; e_k⟩|²

for every x ∈ H, where all the above infinite series are unconditionally convergent. In particular, the Fourier series expansion is unconditionally convergent. If H is a nonseparable Hilbert space, then any orthonormal basis for H is uncountable (Theorem 5.44). However, Corollary 5.41 says that, even in this case, the sums in (b), (c), and (d) have only a countable number of nonzero summands for each x, y ∈ H.

Example 5.L. In this example we shall exhibit orthonormal bases for some classical separable Hilbert spaces.

(a) First recall that, for any integer n ≥ 1, Fⁿ is a Hilbert space (see Example 5.A). Moreover, the finite set {e_k}_{k=1}^{n}, where each e_k = (δ_{k1}, ..., δ_{kn}) ∈ Fⁿ has 1 at the kth position and zeros elsewhere, clearly is an orthonormal set in Fⁿ and also a Hamel basis for the linear space Fⁿ (Example 2.I). Then {e_k}_{k=1}^{n} is an orthonormal basis for the finite-dimensional Hilbert space Fⁿ, which is called the canonical orthonormal basis for Fⁿ.

(b) Now let ℓ²₊ be the Hilbert space of Example 5.B. Consider the ℓ²₊-valued sequence {e_k}_{k∈ℕ}, where each e_k is a scalar-valued sequence in ℓ²₊ with just one nonzero entry (equal to 1) at the kth position. That is, e_k = {δ_{kj}}_{j∈ℕ} ∈ ℓ²₊ for every k ∈ ℕ, with δ_{kj} = 1 if j = k and δ_{kj} = 0 if j ≠ k. Again, it is clear that {e_k}_{k∈ℕ} is an orthonormal sequence of vectors in ℓ²₊. Moreover, recall that x =
{ξ_j}_{j∈ℕ} lies in ℓ²₊ if and only if ∑_{j=1}^{∞} |ξ_j|² < ∞. Therefore, if x = {ξ_j}_{j∈ℕ} ∈ ℓ²₊, then x = lim_n x_n, where x_n = (ξ₁, ..., ξ_n, 0, 0, 0, ...) ∈ ℓ²₊ for every n ∈ ℕ. Since each x_n ∈ span{e_k}_{k∈ℕ} (for x_n = ∑_{k=1}^{n} ξ_k e_k), it follows by the Closed Set Theorem that x ∈ (span{e_k}_{k∈ℕ})⁻. Hence ℓ²₊ ⊆ (span{e_k}_{k∈ℕ})⁻ ⊆ ℓ²₊, which means that (span{e_k}_{k∈ℕ})⁻ = ℓ²₊. Conclusion: The orthonormal sequence {e_k}_{k∈ℕ} is an orthonormal basis for ℓ²₊. It can be similarly shown that {e_k}_{k∈ℤ}, where each e_k is a scalar-valued net in ℓ² with just one nonzero entry (equal to 1) at the kth position (i.e., e_k = {δ_{kj}}_{j∈ℤ} ∈ ℓ² for every k ∈ ℤ, with δ_{kj} = 1 if j = k and δ_{kj} = 0 if j ≠ k), is an orthonormal basis for the Hilbert space ℓ² of Example 5.B. These are referred to as the canonical orthonormal bases for ℓ²₊ and ℓ², respectively.

Next consider the complex Hilbert space L²(S) for some nondegenerate interval S of the real line (see Example 5.D). Recall that L²(S) is the completion of R²(S), which in turn is the quotient space r²(S)/∼ of Example 3.E. Also recall that, according to the usual convention, we write x ∈ L²(S) instead of [x] ∈ L²(S), where x is any representative of the equivalence class [x].

(c) Suppose S = [a, b] for some pair of real numbers a < b, so that L²(S) = L²[a, b]. It is not difficult to verify that the countable set {e_k}_{k∈ℤ}, with each e_k ∈ L²[a, b] given by
1 t−a for every t ∈ [a, b], ek (t) = (b − a)− 2 exp 2πik b−a is a maximal orthonormal set in L2 [a, b], and hence an orthonormal basis for the Hilbert space L2 [a, b]. In particular, for S = [0 , 2π], the countable set {ek }k∈Z , with each ek ∈ L2 [0 , 2π] given by
ek(t) = (1/√(2π)) e^{ikt} = (1/√(2π))(cos kt + i sin kt) for every t ∈ [0, 2π],

is an orthonormal basis for the Hilbert space L²[0, 2π]. Similarly (and equivalently), let D be the open unit disk (about the origin) in the complex plane C, and let T = ∂D denote the unit circle in C. Suppose the length of T is normalized (i.e., suppose L²(T) is the Hilbert space of all equivalence classes of square-integrable complex functions defined on T with respect to normalized Lebesgue measure μ on the unit circle, so that μ(T) = 1). The countable set {ek}k∈Z, with each ek ∈ L²(T) given by

ek(z) = z^k for every z = e^{iθ} ∈ T (with 0 ≤ θ < 2π),
is an orthonormal basis for the Hilbert space L²(T).

(d) If S = [0, ∞), then it can be shown that the sequence {en}n≥0 with each en ∈ L²[0, ∞) given by

en(t) = (1/n!) e^{−t/2} Ln(t) for every t ≥ 0,

where each Ln ∈ L²[0, ∞) is defined by

Ln(t) = e^t (dⁿ/dtⁿ)(tⁿ e^{−t}) = ∑_{k=0}^{n} (−1)^k (n!/k!) (n choose k) t^k for every t ≥ 0,

is an orthonormal basis for the Hilbert space L²[0, ∞). (Note: The elements of {en}n≥0 and {Ln}n≥0 are referred to as Chebyshev–Laguerre functions and Chebyshev–Laguerre polynomials, respectively.) If S = R, then it can also be shown that the sequence {en}n≥0 with each en ∈ L²(−∞, ∞) given by

en(t) = (2ⁿ n! √π)^{−1/2} e^{−t²/2} Hn(t) for every t ∈ R,

where each Hn ∈ L²(−∞, ∞) is defined by

Hn(t) = (−1)ⁿ e^{t²} (dⁿ/dtⁿ) e^{−t²} for every t ∈ R,
is an orthonormal basis for the Hilbert space L²(−∞, ∞). (Note: The elements of {en}n≥0 and {Hn}n≥0 are called Chebyshev–Hermite functions and Chebyshev–Hermite polynomials, respectively.)

(e) Let S and U be bilateral shifts of infinite multiplicity acting on a separable Hilbert space H as defined in Problem 5.30 (which are unitary operators, i.e., invertible isometries). Suppose the bilateral shifts S and U in B[H] do not commute but S intertwines U² to U; that is, suppose

S U ≠ U S and S U² = U S.
If a nonzero w in H makes the double-indexed family {S^m U^n w}(m,n)∈Z×Z an orthonormal basis for H, then w is called a wavelet (or an orthogonal wavelet), and the vectors w_{m,n} = S^m U^n w in H are called the wavelet vectors generated by the wavelet w. The Fourier series expansion x = ∑_{m,n} ⟨x ; w_{m,n}⟩ w_{m,n}, in terms of the double-indexed orthonormal basis {w_{m,n}}(m,n)∈Z×Z for H consisting of wavelet vectors, is referred to as a wavelet expansion of x ∈ H. If H represents some concrete Hilbert space of functions, then the term "wavelet functions" is used instead of "wavelet vectors". For instance, consider a pair of operators S and U on H = L²(R) defined by

y = Sx ∈ L²(R) with y(t) = √2 x(2t) for t ∈ R, and
y = Ux ∈ L²(R) with y(t) = x(t − 1) for t ∈ R,

for every x ∈ L²(R). It can be verified that S and U are bilateral shifts on L²(R) satisfying the preceding assumptions. A well-known wavelet associated with them is the Haar wavelet, which is the function w in L²(R) defined by

w(t) = 1 if 0 < t ≤ 1/2, w(t) = −1 if 1/2 < t ≤ 1, and w(t) = 0 if t ∈ R\(0, 1].
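As a numerical aside (not from the text; NumPy, the grid, and the particular index range are illustrative choices), the orthonormality of a few Haar wavelet vectors w_{m,n}(t) = 2^{m/2} w(2^m t − n) can be checked by midpoint quadrature on a dyadic grid, which is exact for these piecewise constant functions:

```python
import numpy as np

def haar(t):
    # the Haar wavelet: 1 on (0, 1/2], -1 on (1/2, 1], 0 elsewhere
    return np.where((t > 0) & (t <= 0.5), 1.0,
                    np.where((t > 0.5) & (t <= 1.0), -1.0, 0.0))

def w_mn(m, n, t):
    # wavelet vectors w_{m,n} = S^m U^n w, i.e. 2^{m/2} w(2^m t - n)
    return 2.0 ** (m / 2) * haar(2.0 ** m * t - n)

h = 2.0 ** -10
t = np.arange(-2.0, 4.0, h) + h / 2   # midpoints of a dyadic grid covering all supports used
idx = [(m, n) for m in (-1, 0, 1, 2) for n in (-1, 0, 1)]
# Gram matrix of inner products <w_a ; w_b> via midpoint Riemann sums
G = np.array([[np.sum(w_mn(*a, t) * w_mn(*b, t)) * h for b in idx] for a in idx])
print(np.allclose(G, np.eye(len(idx)), atol=1e-8))
```

The jump points of every w_{m,n} used here are dyadic rationals aligned with the grid edges, so the midpoint sums reproduce the integrals and the Gram matrix comes out as the identity.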
Example 5.M. All the orthonormal bases in the previous example are countable, and so those Hilbert spaces are all separable Hilbert spaces. However, there exist nonseparable Hilbert spaces. Actually, there exist Hilbert spaces of arbitrary orthogonal dimension. Indeed, let Γ be any index set and let ℓ²_Γ be the Hilbert space of Example 5.K. Consider the family {eγ}γ∈Γ of vectors in ℓ²_Γ, where each eγ = {δγα}α∈Γ is a family of scalars such that δγα = 1 if α = γ and δγα = 0 if α ≠ γ. Note that each eγ is a square-summable family of scalars (with just one nonzero element, equal to 1), and so each eγ in fact lies in ℓ²_Γ. It is also clear that ⟨eα ; eβ⟩ = δαβ for every α, β ∈ Γ, and hence {eγ}γ∈Γ is an orthonormal family of vectors in ℓ²_Γ. Let x be any unit vector in ℓ²_Γ. That is, x = {ξγ}γ∈Γ is a square-summable family of scalars such that ‖x‖² = ∑_{γ∈Γ} |ξγ|² = 1. If x is orthogonal to every eγ (i.e., if x ⊥ eγ for all γ ∈ Γ), then 0 = ⟨x ; eγ⟩ = ∑_{α∈Γ} ξα δγα = ξγ for all γ ∈ Γ, and hence x = 0. But this contradicts the fact that ‖x‖ = 1. Therefore, there is no unit vector x in ℓ²_Γ for which {eγ}γ∈Γ ∪ {x} is an orthonormal set. That is (Proposition 5.36), {eγ}γ∈Γ is a maximal orthonormal set in the Hilbert space ℓ²_Γ, which means that {eγ}γ∈Γ is an orthonormal basis for ℓ²_Γ (Proposition 5.38). Hence

dim ℓ²_Γ = #Γ.
If the index set Γ is uncountable, then {eγ}γ∈Γ is an uncountable orthonormal basis for ℓ²_Γ. In this case the Hilbert space ℓ²_Γ is not separable (Theorem 5.44).

Theorem 5.49. Two Hilbert spaces are unitarily equivalent if and only if they have the same orthogonal dimension.

Proof. Let H and K be Hilbert spaces (over the same field F) and let B = {eγ}γ∈Γ be an orthonormal basis for H, indexed by an index set Γ.

(a) If H ≅ K (i.e., if H and K are unitarily equivalent), then there is a unitary transformation U ∈ B[H, K] such that U(B) is an orthonormal basis for K. In fact, since B is an orthonormal set in H, and U preserves the inner product, it follows that U(B) is an orthonormal set in K. Moreover, as U is surjective, for every y ∈ K there is an x ∈ H such that y = Ux, and hence (Theorem 5.48)

y = U(∑_{γ∈Γ} ⟨x ; eγ⟩ eγ) = ∑_{γ∈Γ} ⟨Ux ; Ueγ⟩ Ueγ = ∑_{γ∈Γ} ⟨y ; Ueγ⟩ Ueγ

because U is a bounded linear transformation that preserves the inner product. Thus the orthonormal set U(B) = {Ueγ}γ∈Γ is an orthonormal basis for K by Theorem 5.48. Since U is invertible, it establishes a one-to-one correspondence between B and U(B), and so #B = #U(B). Therefore, dim H = dim K.

(b) Conversely, suppose dim H = dim K. Then #B = #C, where C is an arbitrary orthonormal basis for K, so that C and B can be similarly indexed, say C = {fγ}γ∈Γ. Consider the Hilbert space ℓ²_Γ of Example 5.K (over the field F). First we show that H and ℓ²_Γ are unitarily equivalent, that is,
H ≅ ℓ²_Γ. In fact, take an arbitrary x ∈ H. The Parseval identity (Theorem 5.48) says that ‖x‖² = ∑_{γ∈Γ} |⟨x ; eγ⟩|², and so {⟨x ; eγ⟩}γ∈Γ lies in ℓ²_Γ. Consider the mapping U: H → ℓ²_Γ defined by Ux = {⟨x ; eγ⟩}γ∈Γ for every x ∈ H. It is readily verified that U is a linear transformation (by the linearity of the inner product in the first argument), and

‖Ux‖² = ∑_{γ∈Γ} |⟨x ; eγ⟩|² = ‖x‖²

for every x ∈ H (Parseval identity again). Thus U is a linear isometry. Next we verify that U is surjective. If {αγ}γ∈Γ is any family of scalars in ℓ²_Γ (i.e., if ∑_{γ∈Γ} |αγ|² < ∞; cf. Proposition 5.31), then {αγ eγ}γ∈Γ is a summable family of vectors in H. Indeed, as {eγ}γ∈Γ is an orthonormal set, ∑_{γ∈Γ} ‖αγ eγ‖² = ∑_{γ∈Γ} |αγ|² < ∞ so that {αγ eγ}γ∈Γ is a square-summable family, and hence a summable family of vectors in the Hilbert space H (Proposition 5.31 and Theorem 5.32). Therefore, for every {αγ}γ∈Γ ∈ ℓ²_Γ there is an x in H such that x = ∑_{γ∈Γ} αγ eγ. But the Fourier series expansion of x is unique, which implies that αγ = ⟨x ; eγ⟩ for every γ ∈ Γ. Then Ux = {⟨x ; eγ⟩}γ∈Γ = {αγ}γ∈Γ so that {αγ}γ∈Γ ∈ R(U). That is, ℓ²_Γ ⊆ R(U). Since R(U) ⊆ ℓ²_Γ trivially, it follows that R(U) = ℓ²_Γ. Thus the linear isometry U is surjective, which means that U is a unitary transformation. Hence H and ℓ²_Γ are unitarily equivalent. Similarly, since B and C are indexed by a common index set Γ, the same argument shows that K and ℓ²_Γ are unitarily equivalent as well: ℓ²_Γ ≅ K. Therefore, H and K are unitarily equivalent, that is, H ≅ K, because unitary equivalence is transitive: the composition of isometric isomorphisms is again an isometric isomorphism.

According to Theorem 5.44 and Examples 5.L and 5.M, the next result is an immediate consequence of Theorem 5.49.

Corollary 5.50. Let Γ be an arbitrary index set. Every Hilbert space H with dim H = #Γ is unitarily equivalent to ℓ²_Γ. In particular, every n-dimensional Hilbert space (for any integer n ∈ N) is unitarily equivalent to Fⁿ, and every infinite-dimensional separable Hilbert space is unitarily equivalent to ℓ²₊.

Our first example of an unbounded linear transformation defined on a Banach space appeared in the proof of Corollary 4.30 part (b). It is easy to exhibit unbounded linear transformations defined on linear manifolds of a Hilbert space (see Problem 4.33(b)). Now we apply the Fourier Series Theorem
to exhibit an unbounded linear transformation defined on a whole Hilbert space; precisely, defined on an infinite-dimensional separable Hilbert space.

Example 5.N. Let H be an infinite-dimensional separable Hilbert space and let {ek}k∈N be an orthonormal basis for H. Let B = {fγ}γ∈Γ be a Hamel basis for H that properly includes {ek}k∈N (see Theorem 2.5 and Corollary 5.46). Take f0 ∈ B\{ek}k∈N (obviously f0 ≠ 0) and consider the mapping F: B → H such that Ff0 = f0 and Ff = 0 for all f ≠ f0 in B. Now consider the transformation L: H → H defined as follows. For each x ∈ H take its unique representation as a (finite) linear combination of vectors in the Hamel basis B,

x = ∑_{k=1}^{n(x)} αk(x) fk.

Here n(x) is a positive integer and {αk(x)}k=1,…,n(x) is a finite family of scalars containing all nonzero coordinates of x with respect to the Hamel basis B. Set

Lx = ∑_{k=1}^{n(x)} αk(x) Ffk.
It is not difficult to verify that L is linear (i.e., L ∈ L[H]). Moreover, Lf = 0 for every f ∈ B such that f ≠ f0, and Lf0 = f0 (so that L ≠ O). In particular, Lek = 0 for all k ≥ 1. Take an arbitrary x ∈ H and consider its Fourier series expansion, viz., x = ∑_{k=1}^{∞} ⟨x ; ek⟩ ek. If L is bounded (i.e., if L is continuous), then it follows by Corollary 3.8 that Lx = ∑_{k=1}^{∞} ⟨x ; ek⟩ Lek = 0. But this implies that Lx = 0 for all x ∈ H (i.e., L = O), which is a contradiction. Thus L is unbounded; that is, L ∈ L[H]\B[H].
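The coordinate unitary Ux = {⟨x ; eγ⟩}γ∈Γ from the proof of Theorem 5.49 can be illustrated numerically in finite dimensions (a sketch, not from the text; the dimension, random basis, and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# a random orthonormal basis {e_k} of C^n obtained via QR factorization
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Q, _ = np.linalg.qr(A)                    # columns of Q form the basis
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
coeffs = Q.conj().T @ x                   # U x = {<x ; e_k>}, the Fourier coefficients of x

print(np.isclose(np.linalg.norm(coeffs), np.linalg.norm(x)))   # Parseval: ||Ux|| = ||x||
print(np.allclose(Q @ coeffs, x))         # expansion: x = sum_k <x ; e_k> e_k
```

The two checks are exactly the isometry and surjectivity arguments of the proof, specialized to dim H = n.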
5.10 Orthogonal Projection

A projection is an idempotent linear transformation of a linear space into itself (Section 2.9). An orthogonal projection on an inner product space X is a projection P: X → X such that R(P) ⊥ N(P). Since orthogonal projections are projections, all the results of Section 2.9 apply to orthogonal projections in particular. If P is an orthogonal projection on X, then so is the complementary projection E = (I − P): X → X (reason: E = (I − P) is a projection with R(E) = N(P) and N(E) = R(P)).

Proposition 5.51. An orthogonal projection P: X → X on an inner product space X has the following basic properties.

(a) P ∈ B[X] (i.e., P is bounded) and ‖P‖ = 1 whenever P ≠ O,
(b) N(P) and R(P) are subspaces of X,
(c) N(P) = R(P)⊥ and R(P) = N(P)⊥,
(d) R(P) and N(P) are orthogonal complementary subspaces in X. That is, besides being orthogonal subspaces of X, R(P) and N(P) are such that X = R(P) + N(P). Thus X can be decomposed as an orthogonal direct sum X = R(P) ⊕ N(P).

Proof. Let P be an orthogonal projection on an inner product space X.

(a) Take an arbitrary x ∈ X. Since R(P) and N(P) are algebraic complements of each other (Theorem 2.19), we can write x = u + v with u ∈ R(P) and v ∈ N(P). Moreover, u ⊥ v because R(P) ⊥ N(P). Then ‖x‖² = ‖u‖² + ‖v‖² by the Pythagorean Theorem. Recalling that R(P) = {u ∈ X: Pu = u}, we get Px = Pu + Pv = u so that ‖Px‖² = ‖u‖² ≤ ‖x‖². Hence ‖P‖ ≤ 1. That is, P is a contraction. If P ≠ O, then R(P) ≠ {0}, and so ‖Pu‖ = ‖u‖ ≠ 0 (because Pu = u) for all nonzero u in R(P). Thus ‖P‖ ≥ 1. Outcome: ‖P‖ = 1.

(b) According to item (a), P ∈ B[X]. Thus Proposition 4.13 says that N(P) is a subspace of X. Since E = I − P is an orthogonal projection on X, it follows that E ∈ B[X] and R(P) = N(E) is a subspace of X (Proposition 4.13 again).

(c) Recall from the definition of orthogonal complement that if A and B are subsets of X such that A ⊥ B, then A ⊆ B⊥. Therefore, as N(P) ⊥ R(P), we get N(P) ⊆ R(P)⊥. Now take an arbitrary x ∈ R(P)⊥ so that x ⊥ R(P). Theorem 2.19 says that x = u + v with u ∈ R(P) and v ∈ N(P). Hence 0 = ⟨x ; u⟩ = ⟨u ; u⟩ + ⟨v ; u⟩ = ‖u‖² (since R(P) ⊥ N(P)) and so u = 0, which implies that x = v ∈ N(P). Therefore R(P)⊥ ⊆ N(P). Outcome: N(P) = R(P)⊥. Considering the complementary orthogonal projection E = I − P, we conclude that R(P) = N(E) = R(E)⊥ = N(P)⊥.

(d) Theorem 2.19 says that N(P) and R(P) are algebraic complements of each other so that X = R(P) + N(P). Since R(P) ⊥ N(P), it follows by Proposition 5.24 that X ≅ R(P) ⊕ N(P).
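These basic properties can be observed concretely in Fⁿ (a numerical sketch, not from the text; the subspace is an arbitrary random choice): with Q having orthonormal columns spanning a subspace M, the matrix P = QQᵀ is the orthogonal projection onto M.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 7, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))   # orthonormal basis of a k-dim subspace M
P = Q @ Q.T                                        # orthogonal projection onto M

print(np.allclose(P @ P, P))                       # idempotent: P^2 = P
print(np.isclose(np.linalg.norm(P, 2), 1.0))       # ||P|| = 1 (since P != O)
x = rng.standard_normal(n)
print(np.isclose((P @ x) @ (x - P @ x), 0.0))      # R(P) ⊥ N(P): Px ⊥ (I - P)x
```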
As usual, we identify the orthogonal direct sum R(P) ⊕ N(P) with its unitarily equivalent image R(P) + N(P) = X, and write X = R(P) ⊕ N(P).

Theorem 5.52. (Projection Theorem — Third version). For every subspace M of a Hilbert space H there exists a unique orthogonal projection P ∈ B[H] with R(P) = M.

Proof. Existence. Theorem 5.20 says that H can be decomposed as

H = M + M⊥.
Since M and M⊥ are algebraic complements of each other, for every x ∈ H there exists a unique u ∈ M and a unique v ∈ M⊥ such that x = u + v (Theorem 2.14). For each x ∈ H set Px = P(u + v) = u in M ⊆ H. It is easy to check that this defines a linear transformation P: H → H with R(P) = M. Moreover, P²x = Pu = u = Px (because u = u + 0 is the unique decomposition of u ∈ M in M + M⊥) so that P is idempotent. Furthermore, Px = 0 if and only if x = 0 + v for some v ∈ M⊥, and hence N(P) = M⊥. Conclusion: P is an orthogonal projection on H with R(P) = M.

Uniqueness. Take an arbitrary vector x ∈ H and consider its unique decomposition x = u + v ∈ M + M⊥ = H with u ∈ M and v ∈ M⊥. Suppose P′ is an orthogonal projection on H such that R(P′) = M. Then P′u = u = Pu for every u ∈ M (since the range of any projection is the set of all its fixed points — see the proof of Theorem 2.19), and P′v = 0 = Pv for every v ∈ M⊥. (Reason: if M = R(P′) = R(P), then M⊥ = R(P′)⊥ = N(P′) and M⊥ = R(P)⊥ = N(P).) Therefore, P′x = P′(u + v) = P(u + v) = Px, and hence P′ = P.

Let M be any subspace of a Hilbert space H. The unique orthogonal projection P: H → H for which R(P) = M is called the orthogonal projection onto M. The above proof shows that Theorem 5.20 implies Theorem 5.52. In fact, all versions of the Projection Theorem, viz., Theorems 5.20, 5.25, and 5.52, are pairwise equivalent. That Theorems 5.20 and 5.25 are equivalent is an immediate consequence of Proposition 5.24. We shall verify that Theorem 5.52 implies Theorem 5.20. Indeed, if P: H → H is any orthogonal projection on a Hilbert space H, then H = R(P) + N(P) by Theorem 2.19, and hence H = R(P) + R(P)⊥, where R(P) is a subspace of H (Proposition 5.51). But Theorem 5.52 says that for every subspace M of H there exists an orthogonal projection P: H → H such that R(P) = M. Therefore, for every subspace M of H we get H = M + M⊥. Outcome: Theorem 5.52 implies Theorem 5.20.
Such an equivalence justifies the terminology "Projection Theorem" for Theorems 5.20 and 5.25. The pivotal result of Theorem 5.13 can also be translated into the orthogonal projection language. Actually, the unique vector ux ∈ M of Theorem 5.13 is given by ux = Px, where P: H → H is the orthogonal projection onto M (i.e., with R(P) = M).

Theorem 5.53. Let M be a subspace of a Hilbert space H and let P ∈ B[H] be the orthogonal projection onto M. Take any x in H. Px is the unique vector in M such that ‖x − Px‖ = d(x, M). Moreover, Px also is the unique vector in M such that x − Px ⊥ M.

Proof. Let P be the orthogonal projection onto M. P(x − Px) = Px − P²x = 0 so that x − Px ∈ N(P) = R(P)⊥ = M⊥. Thus Px is the unique vector in
M such that x − Px ∈ M⊥, which in turn is the unique vector in M such that ‖x − Px‖ = d(x, M) (Theorem 5.13).

The next two results connect the notion of orthogonal projection with the Fourier Series Theorem.

Theorem 5.54. Let {eγ}γ∈Γ be an orthonormal family of vectors in a Hilbert space H and set M = ⋁{eγ}γ∈Γ. For every x ∈ H, {⟨x ; eγ⟩ eγ}γ∈Γ is a summable family of vectors in M, and the mapping P: H → H, defined by

Px = ∑_{γ∈Γ} ⟨x ; eγ⟩ eγ,

is the orthogonal projection onto M.

Proof. Take an arbitrary x ∈ H. The Bessel inequality (Lemma 5.40) says that ∑_{γ∈Γ} |⟨x ; eγ⟩|² ≤ ‖x‖², and hence {⟨x ; eγ⟩ eγ}γ∈Γ is a square-summable family (cf. Proposition 5.31) of vectors in the Hilbert space M (see Proposition 4.7). Then {⟨x ; eγ⟩ eγ}γ∈Γ is a summable family of vectors in M by Theorem 5.32. Set Px = ∑_{γ∈Γ} ⟨x ; eγ⟩ eγ ∈ M. It is readily verified that this defines a linear transformation P: H → H such that R(P) ⊆ M. Moreover, ‖Px‖² = ∑_{γ∈Γ} |⟨x ; eγ⟩|² (Theorem 5.32 again), and hence ‖Px‖² ≤ ‖x‖². Thus P lies in B[H] (i.e., P is continuous — a contraction, actually), which implies that

P²x = P(∑_{γ∈Γ} ⟨x ; eγ⟩ eγ) = ∑_{γ∈Γ} ⟨x ; eγ⟩ Peγ.

But the very definition of P ensures that Peα = ∑_{γ∈Γ} ⟨eα ; eγ⟩ eγ = eα for every α ∈ Γ. Therefore,

P²x = ∑_{γ∈Γ} ⟨x ; eγ⟩ eγ = Px,

and so P is a projection. Since {eγ}γ∈Γ is an orthonormal basis for the Hilbert space M, the Fourier Series Theorem ensures that every u ∈ M has a unique Fourier series expansion

u = ∑_{γ∈Γ} ⟨u ; eγ⟩ eγ.

Hence Pu = ∑_{γ∈Γ} ⟨u ; eγ⟩ Peγ = ∑_{γ∈Γ} ⟨u ; eγ⟩ eγ = u so that u ∈ R(P). Thus M ⊆ R(P) and so R(P) = M. Note that v ∈ N(P) (i.e., Pv = 0) if and only if ‖Pv‖² = ∑_{γ∈Γ} |⟨v ; eγ⟩|² = 0. Equivalently, if and only if ⟨v ; eγ⟩ = 0 for all γ ∈ Γ, which means that v ∈ {eγ}γ∈Γ⊥ = (⋁{eγ}γ∈Γ)⊥ = M⊥ = R(P)⊥. Thus N(P) = R(P)⊥, which implies that R(P) ⊥ N(P). Conclusion: P: H → H is an orthogonal projection with R(P) = M; that is, the orthogonal projection onto M (Theorem 5.52).

Corollary 5.55. Let {eγ}γ∈Γ be any orthonormal basis for a subspace M of a Hilbert space H. The orthogonal projection P ∈ B[H] onto M is given by
Px = ∑_{γ∈Γ} ⟨x ; eγ⟩ eγ for every x ∈ H.
Note that if M = H in Theorem 5.54 (and Corollary 5.55), then P = I, thus pointing back to the Fourier Series Theorem (Theorem 5.48). Now we consider "orthogonal families of orthogonal projections". This is a rather important notion upon which the Spectral Theorem of the next chapter will be built.

Proposition 5.56. Let P1 ∈ B[X] and P2 ∈ B[X] be orthogonal projections on an inner product space X. The following assertions are pairwise equivalent.

(a) R(P1) ⊥ R(P2).
(b) P1P2 = O.
(c) P2P1 = O.
(d) R(P2) ⊆ N(P1).
(e) R(P1) ⊆ N(P2).

Proof. Clearly, R(P1) ⊥ R(P2) implies R(P2) ⊆ R(P1)⊥ = N(P1). Conversely, if R(P2) ⊆ N(P1) = R(P1)⊥, then R(P2) ⊥ R(P1). Hence (a)⇔(d). Similarly (recall: orthogonality is symmetric, so that R(P1) ⊥ R(P2) if and only if R(P2) ⊥ R(P1)) we get (a)⇔(e). Since R(P2) ⊆ N(P1) if and only if P1P2 = O, it follows that (d)⇔(b). Swap P1 and P2 to get (e)⇔(c).

If two orthogonal projections P1 and P2 on an inner product space X satisfy any of the equivalent assertions of Proposition 5.56, then they are said to be orthogonal to each other (or mutually orthogonal). If {Pγ}γ∈Γ is a family of orthogonal projections on an inner product space X which are orthogonal to each other (i.e., R(Pα) ⊥ R(Pβ) whenever α ≠ β), then we say that {Pγ}γ∈Γ is an orthogonal family of orthogonal projections on X. An orthogonal sequence of orthogonal projections {Pk}k≥0 is similarly defined. If {Pγ}γ∈Γ is an orthogonal family of orthogonal projections and

∑_{γ∈Γ} Pγ x = x for every x ∈ X,

then {Pγ}γ∈Γ is called a resolution of the identity on X. (For each x ∈ X the sum x = ∑_{γ∈Γ} Pγ x has only a countable number of nonzero vectors — why?) If {Pk}k≥0 is an infinite sequence, then the above identity in X means convergence in the strong operator topology:

∑_{k=0}^{n} Pk →ˢ I.
If {Pi}i=0,…,n is a finite family, then the above identity in X obviously coincides with the identity ∑_{i=0}^{n} Pi = I in B[X]. For instance, if P is an orthogonal
projection on X and E = I − P is the complementary projection on X, then P and E are orthogonal projections orthogonal to each other (since PE = P − P² = O). Moreover, {P, E} clearly is a resolution of the identity on X since P + E = I.

Proposition 5.57. Let {eγ}γ∈Γ be an orthonormal basis for a Hilbert space H. For each γ ∈ Γ define the mapping Pγ: H → H by

Pγ x = ⟨x ; eγ⟩ eγ for every x ∈ H.

Claim: {Pγ}γ∈Γ is a resolution of the identity on H.

Proof. Each Pγ is the orthogonal projection onto the one-dimensional space M = span{eγ}. This is a particular case of Theorem 5.54. It is plain that Pα Pβ x = ⟨x ; eβ⟩⟨eβ ; eα⟩ eα, and so Pα Pβ x = 0 whenever α ≠ β, for every x ∈ H. Thus Pα Pβ = O for every α, β ∈ Γ such that α ≠ β. Hence {Pγ}γ∈Γ is an orthogonal family of orthogonal projections on H. The Fourier Series Theorem ensures that {⟨x ; eγ⟩ eγ}γ∈Γ is a summable family of vectors in H and, for every x ∈ H,

x = ∑_{γ∈Γ} ⟨x ; eγ⟩ eγ = ∑_{γ∈Γ} Pγ x.

Conclusion: {Pγ}γ∈Γ is a resolution of the identity on H.
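In Fⁿ this construction is completely concrete (a sketch, not from the text; basis and dimension are arbitrary): the rank-one matrices Pₖ = eₖeₖᵀ built from an orthonormal basis are mutually orthogonal projections summing to the identity.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))      # orthonormal basis {e_k} of R^n
P = [np.outer(Q[:, k], Q[:, k]) for k in range(n)]    # rank-one projections P_k x = <x;e_k> e_k

# mutually orthogonal: P_j P_k = O for j != k
print(all(np.allclose(P[j] @ P[k], 0) for j in range(n) for k in range(n) if j != k))
# resolution of the identity: sum_k P_k = I
print(np.allclose(sum(P), np.eye(n)))
```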
Example 5.O. Let {ek}k∈N be an orthonormal basis for an infinite-dimensional separable Hilbert space H. According to Theorem 5.54 and Proposition 5.57, for each k ≥ 1 the mapping Pk: H → H defined by

Pk x = ⟨x ; ek⟩ ek for every x ∈ H

is an orthogonal projection and {Pk}k∈N is a resolution of the identity on H. Thus the sequence of operators {∑_{k=1}^{n} Pk}n∈N converges strongly to the identity I on H. In fact, take an arbitrary x ∈ H. By the Fourier Series Theorem,

∑_{k=1}^{n} Pk x − x = ∑_{k=1}^{n} ⟨x ; ek⟩ ek − ∑_{k=1}^{∞} ⟨x ; ek⟩ ek = −∑_{k=n+1}^{∞} ⟨x ; ek⟩ ek → 0

as n → ∞ because the infinite series ∑_{k=1}^{∞} ⟨x ; ek⟩ ek converges in H (see Problem 4.9(b)). This confirms that

∑_{k=1}^{n} Pk →ˢ I.

We shall see now that the sequence of operators {∑_{k=1}^{n} Pk}n∈N, which converges strongly to the identity I on H, does not converge uniformly. Indeed,
if {∑_{k=1}^{n} Pk}n∈N converges uniformly, then the identity I must be its uniform limit. (Reason: ∑_{k=1}^{n} Pk →ˢ I and uniform convergence implies strong convergence to the same limit.) However, for each n ≥ 1,

‖(I − ∑_{k=1}^{n} Pk) e_{n+1}‖ = ‖∑_{k=n+1}^{∞} ⟨e_{n+1} ; ek⟩ ek‖ = ‖e_{n+1}‖ = 1.

Thus ‖I − ∑_{k=1}^{n} Pk‖ = sup_{‖x‖=1} ‖(I − ∑_{k=1}^{n} Pk)x‖ ≥ 1 for every n ≥ 1, and hence ∑_{k=1}^{n} Pk does not converge uniformly to I. Conclusion: {∑_{k=1}^{n} Pk}n∈N does not converge uniformly.

Proposition 5.58. If {Pk}k∈N is an orthogonal sequence of orthogonal projections on a Hilbert space H, then {∑_{k=1}^{n} Pk}n∈N converges strongly to the orthogonal projection P: H → H onto (∑_{k∈N} R(Pk))⁻. That is, ∑_{k=1}^{n} Pk →ˢ P, where P ∈ B[H] is the orthogonal projection with R(P) = (∑_{k∈N} R(Pk))⁻.

Proof. Set M = (∑_{k∈N} R(Pk))⁻, which is a subspace of the Hilbert space H, and consider the decomposition H = M + M⊥ of Theorem 5.20. Take any x ∈ H so that x = u + v with u ∈ M and v ∈ M⊥. Observe that v ∈ (∑_{k∈N} R(Pk))⊥ by Proposition 5.12, so that v ⊥ R(Pk) for every k ∈ N. Thus v ∈ R(Pk)⊥ = N(Pk), and hence Pk v = 0, for every k ∈ N, which implies that ∑_{k=1}^{∞} Pk v = 0. Since u ∈ (∑_{k∈N} R(Pk))⁻ and {R(Pk)}k∈N is a countably infinite family of pairwise orthogonal subspaces of H, it follows by the Orthogonal Structure Theorem (Theorem 5.16) that u = ∑_{k=1}^{∞} uk with uk ∈ R(Pk) for each k. Since Pj is continuous, we get Pj u = ∑_{k=1}^{∞} Pj uk = uj (reason: uk ∈ R(Pk), Pj Pk = O whenever j ≠ k, and Pj uj = uj) for each j ∈ N. Hence u = ∑_{k=1}^{∞} Pk u. Therefore, for every x ∈ H,

∑_{k=1}^{∞} Pk x = ∑_{k=1}^{∞} Pk u + ∑_{k=1}^{∞} Pk v = u,

so that the B[H]-valued sequence {∑_{k=1}^{n} Pk}n∈N is strongly convergent (Proposition 4.44). Let P ∈ B[H] be the strong limit of {∑_{k=1}^{n} Pk}n∈N so that

Px = ∑_{k=1}^{∞} Pk x for every x ∈ H.

Then P²x = ∑_{k=1}^{∞} P Pk x = ∑_{k=1}^{∞} ∑_{j=1}^{∞} Pj Pk x = ∑_{k=1}^{∞} Pk x = Px for every x in H (since P is continuous and Pj Pk = O whenever j ≠ k), and so P is idempotent. Moreover, R(P) = M and N(P) = M⊥. Indeed, Px = P(u + v) = u for every x in H = M + M⊥, where u is the unique vector in M and v is the unique vector in M⊥ such that x = u + v. Thus R(P) ⊥ N(P). Outcome: P ∈ B[H] is the unique orthogonal projection onto M (cf. Theorem 5.52).
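The strong-but-not-uniform convergence of Example 5.O can be mimicked in a finite truncation of ℓ²₊ (an illustrative sketch, not from the text; the cutoff N and the vector x are arbitrary choices). In the canonical basis, ∑_{k≤n} Pk simply keeps the first n coordinates:

```python
import numpy as np

N = 200                                   # truncation dimension, standing in for l^2
x = 1.0 / np.arange(1, N + 1)             # a fixed vector with xi_k = 1/k

def tail_norm(n):
    # ||x - sum_{k<=n} P_k x||: the norm of the tail of x beyond position n
    return np.linalg.norm(x[n:])

def op_defect(n):
    # ||(I - sum_{k<=n} P_k) e_{n+1}||, evaluated on the basis vector e_{n+1}
    e = np.zeros(N)
    e[n] = 1.0                            # e_{n+1} (0-based index n)
    e[:n] = 0.0                           # apply I - sum_{k<=n} P_k
    return np.linalg.norm(e)

# strong convergence: for the fixed x the tails shrink toward 0 ...
print(tail_norm(10), tail_norm(50), tail_norm(150))
# ... yet the operator defect equals 1 for every n < N, as in Example 5.O
print(all(np.isclose(op_defect(n), 1.0) for n in range(N - 1)))
```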
The preceding proposition is a consequence of the Projection Theorem (i.e., Theorems 5.20 and 5.52) and the Orthogonal Structure Theorem (Theorem 5.16). Here is the full version of the Orthogonal Structure Theorem in terms of orthogonal projections.

Theorem 5.59. Let H be a Hilbert space. If {Pk}k∈N is a resolution of the identity on H, then

H = (∑_{k∈N} R(Pk))⁻.

Conversely, if {Mk}k∈N is a sequence of pairwise orthogonal subspaces of H such that H = (∑_{k∈N} Mk)⁻, then the sequence {Pk}k∈N of the orthogonal projections Pk ∈ B[H] onto each Mk is a resolution of the identity on H.

Proof. Set M = (∑_{k∈N} R(Pk))⁻, which is a subspace of the Hilbert space H. If x ∈ M⊥ = (∑_{k∈N} R(Pk))⊥, then x ⊥ R(Pk) for every k ∈ N. Therefore, x ∈ R(Pk)⊥ = N(Pk), so that Pk x = 0, for every k ∈ N. Then ∑_{k∈N} Pk x = 0. But ∑_{k∈N} Pk x = x for every x ∈ H because {Pk}k∈N is a resolution of the identity on H. Thus x ∈ M⊥ implies x = 0 so that M⊥ = {0}, and hence M = H by Proposition 5.15. Conversely, let {Mk}k∈N be a sequence of pairwise orthogonal subspaces of H such that H = (∑_{k∈N} Mk)⁻. For each k ∈ N let Pk ∈ B[H] be the orthogonal projection onto Mk (Theorem 5.52), and note that R(Pj) = Mj ⊥ Mk = R(Pk) if j ≠ k. Thus {Pk}k∈N is an orthogonal sequence of orthogonal projections on H. Therefore, by Proposition 5.58,

∑_{k=1}^{n} Pk →ˢ I,

where the identity I ∈ B[H] is the unique orthogonal projection on H with R(I) = H = (∑_{k∈N} Mk)⁻ = (∑_{k∈N} R(Pk))⁻. Outcome: {Pk}k∈N is a resolution of the identity on H.

It is worth noticing that, as {R(Pk)}k∈N is a sequence of pairwise orthogonal subspaces of a Hilbert space H, then (∑_{k∈N} R(Pk))⁻ ≅ ⊕_{k∈N} R(Pk) (see Example 5.J). Therefore, under the usual identification, Proposition 5.58 says that ∑_{k=1}^{n} Pk →ˢ P, where P ∈ B[H] is the orthogonal projection with R(P) = ⊕_{k∈N} R(Pk), and Theorem 5.59 says that H = ⊕_{k∈N} R(Pk) whenever {Pk}k∈N is a resolution of the identity on H.

Definition 5.60. Let {Pγ}γ∈Γ be a resolution of the identity on a Hilbert space H ≠ {0} with Pγ ≠ O for every γ ∈ Γ. Let {λγ}γ∈Γ be a similarly indexed family of scalars. Set

D(T) = {x ∈ H: {λγ Pγ x}γ∈Γ is a summable family of vectors in H}.
The mapping T: D(T) → H, defined by

Tx = ∑_{γ∈Γ} λγ Pγ x for every x ∈ D(T),

is said to be a weighted sum of projections.

Proposition 5.61. Every weighted sum of projections is a linear transformation. Moreover, if T ∈ L[D(T), H] is a weighted sum of projections, then the following assertions are pairwise equivalent.

(a) D(T) = H.
(b) {λγ}γ∈Γ is a bounded family of scalars.
(c) T is bounded.

If any of the above equivalent assertions holds true, then T ∈ B[H] is such that ‖T‖ = sup_{γ∈Γ} |λγ|.

Proof. It is readily verified that the domain D(T) of a weighted sum of projections is a linear manifold of H, and also that the transformation T: D(T) → H is linear. (See the remark that follows Definition 5.26.)

Proof of (a)⇒(b). If {λγ}γ∈Γ is not bounded, then for each integer n ≥ 1 there exists a γn ∈ Γ such that |λγn| ≥ n. Consider the scalar-valued sequence {λγn}n∈N, which is clearly unbounded. Consider the sequence {Pγn}n∈N from {Pγ}γ∈Γ and, for each n ≥ 1, take eγn ∈ R(Pγn) such that ‖eγn‖ = 1 (recall: R(Pγn) ≠ {0} because Pγ ≠ O for every γ ∈ Γ). Observe that the orthogonal sequence {λγn⁻¹ eγn}n∈N is square-summable (because ∑_{n=1}^{∞} ‖λγn⁻¹ eγn‖² = ∑_{n=1}^{∞} |λγn|⁻² ≤ ∑_{n=1}^{∞} 1/n² < ∞), and hence it is a summable sequence in the Hilbert space H (Corollary 5.9(b)). Set x = ∑_{k=1}^{∞} λγk⁻¹ eγk in H and note that (since each Pγn is continuous)

λγn Pγn x = ∑_{k=1}^{∞} λγn λγk⁻¹ Pγn eγk = eγn for each n ≥ 1

(because Pγn eγk = 0 if k ≠ n and Pγn eγn = eγn, once eγk ∈ R(Pγk) for each k ≥ 1 and Pγn Pγk = O whenever k ≠ n). But {eγn}n∈N is not a square-summable sequence (since it is an orthogonal sequence of unit vectors). Thus {λγn Pγn x}n∈N is not a square-summable sequence of vectors in H, which means that sup_{n≥1} ∑_{k=1}^{n} ‖λγk Pγk x‖² = ∞, and hence the orthogonal family {λγ Pγ x}γ∈Γ of vectors in H is not square-summable (cf. Proposition 5.31). Therefore, {λγ Pγ x}γ∈Γ is not a summable family of vectors in H by Theorem 5.32, which means that x ∉ D(T). Conclusion: If {λγ}γ∈Γ is not bounded, then D(T) ≠ H. Equivalently, (a) implies (b).

Proof of (b)⇒(a,c). Let x be an arbitrary vector in H. Since {Pγ}γ∈Γ is a resolution of the identity on H, it follows that {Pγ x}γ∈Γ is a summable family (for
∑_{γ∈Γ} Pγ x = x) of orthogonal vectors in H, and so a square-summable family by Theorem 5.32(a). Suppose {λγ}γ∈Γ is bounded and set β = sup_{γ∈Γ} |λγ| (which is a nonnegative real number). Then, for any finite N ⊆ Γ,

∑_{k∈N} ‖λk Pk x‖² ≤ β² ∑_{k∈N} ‖Pk x‖² ≤ β² ∑_{γ∈Γ} ‖Pγ x‖² < ∞,

so that {λγ Pγ x}γ∈Γ is a square-summable family of orthogonal vectors in H (Proposition 5.31), and hence a summable family of vectors in the Hilbert space H by Theorem 5.32(b). Thus x ∈ D(T), and therefore (b) implies (a). Moreover, since {λγ Pγ x}γ∈Γ is an orthogonal family of vectors in H, and {Pγ}γ∈Γ is a resolution of the identity on H, we get

‖Tx‖² = ‖∑_{γ∈Γ} λγ Pγ x‖² = ∑_{γ∈Γ} |λγ|² ‖Pγ x‖² ≤ β² ∑_{γ∈Γ} ‖Pγ x‖² = β² ‖∑_{γ∈Γ} Pγ x‖² = β² ‖x‖²

(see Theorem 5.32 again). Hence (b) implies (c).

Proof of (c)⇒(b). For each γ ∈ Γ take a unit vector eγ in R(Pγ). Thus

T eγ = ∑_{α∈Γ} λα Pα eγ = λγ eγ.

If T is bounded, then |λγ| = ‖λγ eγ‖ = ‖T eγ‖ ≤ ‖T‖ ‖eγ‖ = ‖T‖ for all γ ∈ Γ, so that (c) implies (b). Finally, by the above inequality, sup_{γ∈Γ} |λγ| ≤ ‖T‖. On the other hand, we have already seen that ‖Tx‖ ≤ β ‖x‖ for every x ∈ H, where β = sup_{γ∈Γ} |λγ|, which implies that ‖T‖ ≤ sup_{γ∈Γ} |λγ|. Hence ‖T‖ = sup_{γ∈Γ} |λγ|.

Let the infinite sequence {Pk}k∈N be a resolution of the identity on H, where Pk ≠ O for every k ≥ 1, and let {λk}k∈N be a bounded sequence of scalars. Observe that the identity in H

Tx = ∑_{k=1}^{∞} λk Pk x for every x ∈ H

that defines the weighted sum of projections T ∈ B[H] actually means convergence in the strong topology; that is,

∑_{k=1}^{n} λk Pk →ˢ T.
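In finite dimensions a weighted sum of projections is just an operator that is diagonal in the orthonormal basis {eₖ}, and the norm identity ‖T‖ = sup |λγ| can be checked directly (an illustrative sketch, not from the text; basis, weights, and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))    # orthonormal basis {e_k}, P_k = e_k e_k^T
lam = np.array([3.0, -1.0, 0.5, 2.0, -2.5, 1.0])    # a bounded family of weights
T = sum(lam[k] * np.outer(Q[:, k], Q[:, k]) for k in range(n))   # T = sum_k lam_k P_k

print(np.isclose(np.linalg.norm(T, 2), np.max(np.abs(lam))))     # ||T|| = sup |lam_k|
x = rng.standard_normal(n)
coeffs = Q.T @ x
print(np.allclose(T @ x, Q @ (lam * coeffs)))       # T acts diagonally in the basis {e_k}
```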
5.11 The Riesz Representation Theorem and Weak Convergence

Associated with an arbitrary vector y in an inner product space X, consider the functional f: X → F defined by

f(x) = ⟨x ; y⟩ for every x ∈ X.
This is a linear and bounded functional. Indeed, f is linear because the inner product is linear in the first argument, and bounded by the Schwarz inequality: |f(x)| = |⟨x ; y⟩| ≤ ‖y‖ ‖x‖ for every x ∈ X. That is, f ∈ B[X, F] and ‖f‖ = sup_{‖x‖=1} |f(x)| ≤ ‖y‖. On the other hand, observe that ‖y‖ ‖y‖ = |⟨y ; y⟩| = |f(y)| ≤ ‖f‖ ‖y‖ so that ‖y‖ ≤ ‖f‖. Therefore, ‖f‖ = ‖y‖. Outcome: A bounded (i.e., continuous) linear functional f is naturally associated with each vector y in an inner product space X. The remarkable fact is that the converse holds true in a Hilbert space.

Theorem 5.62. (The Riesz Representation Theorem). For every bounded linear functional f on a Hilbert space H, there exists a unique vector y ∈ H such that f(x) = ⟨x ; y⟩ for every x ∈ H. Moreover, ‖f‖ = ‖y‖. Such a unique vector y in H is called the Riesz representation of the functional f in B[H, F].

Proof. If f = 0, then it is clear that y = 0 is the unique vector in H for which f(x) = ⟨x ; y⟩ for every x ∈ H. Thus suppose f ≠ 0.

Existence. Consider the null space N(f) of f ∈ B[H, F], which is a proper subspace of H (i.e., N(f) is a subspace of H by Proposition 4.13 and N(f) ≠ H since f ≠ 0). Thus, as H is a Hilbert space, N(f)⊥ ≠ {0} by Proposition 5.15. Let z be any nonzero vector in N(f)⊥. Since z ∉ N(f) (for N(f) ∩ N(f)⊥ = {0}), it follows that f(z) ≠ 0. Now take an arbitrary x in H and note that

f(x − (f(x)/f(z)) z) = f(x) − (f(x)/f(z)) f(z) = 0.

Thus x − (f(x)/f(z)) z ∈ N(f). Since z ∈ N(f)⊥ we get

0 = ⟨x − (f(x)/f(z)) z ; z⟩ = ⟨x ; z⟩ − (f(x)/f(z)) ‖z‖²,

and hence f(x) = ⟨x ; f(z)̄ ‖z‖⁻² z⟩, where f(z)̄ denotes the complex conjugate of f(z). Then there exists a vector y = f(z)̄ ‖z‖⁻² z in H, which does not depend on x, such that f(x) = ⟨x ; y⟩.
Uniqueness. If y′ ∈ H is such that f(x) = ⟨x ; y′⟩ for every x ∈ H, then ⟨x ; y⟩ = ⟨x ; y′⟩ so that ⟨x ; y − y′⟩ = 0 for every x ∈ H. Therefore, y − y′ ∈ H⊥ = {0}.
Finally, as we have already seen in the introduction of this section, if y ∈ H is such that f(x) = ⟨x ; y⟩ for every x ∈ H, then ‖f‖ = ‖y‖.

Corollary 5.63. For every Hilbert space H there exists a surjective isometry Ψ: H∗ → H of the dual H∗ of H onto H, which is additive and conjugate homogeneous (i.e., Ψ(αf) = ᾱΨ(f) for every f ∈ H∗ and every α ∈ F).

Proof. Let H be a Hilbert space and let H∗ = B[H, F] be the dual of H. By the Riesz Representation Theorem, for each f ∈ H∗ there exists a unique y ∈ H such that f(x) = ⟨x ; y⟩ for every x ∈ H and ‖f‖ = ‖y‖. Conversely, for each y ∈ H the functional f: H → F given by f(x) = ⟨x ; y⟩ for every x ∈ H is linear and bounded, which means that f ∈ H∗. This establishes a surjective isometry (i.e., an invertible isometry) Ψ: H∗ → H of the dual H∗ of H onto H:

Ψ(f) = y  for every  f ∈ H∗,

where y ∈ H is the (unique) Riesz representation of f ∈ H∗. Therefore, every f in H∗ is such that

f(x) = ⟨x ; Ψ(f)⟩  for every  x ∈ H.
Observe that Ψ is additive. Indeed, if f, g ∈ H∗, then ⟨x ; Ψ(f + g)⟩ = (f + g)(x) = f(x) + g(x) = ⟨x ; Ψ(f)⟩ + ⟨x ; Ψ(g)⟩ = ⟨x ; Ψ(f) + Ψ(g)⟩ for every x ∈ H, so that Ψ(f + g) = Ψ(f) + Ψ(g). Moreover, if f ∈ H∗ and α ∈ F, then ⟨x ; Ψ(αf)⟩ = αf(x) = f(αx) = ⟨αx ; Ψ(f)⟩ = ⟨x ; ᾱΨ(f)⟩ for every x ∈ H, and hence Ψ(αf) = ᾱΨ(f).
From the above corollary we may conclude: Every Hilbert space is isometrically equivalent to its dual. In particular, every real Hilbert space is isometrically isomorphic to its dual.

Corollary 5.64. Every Hilbert space is reflexive.

Proof. Let Ψ: H∗ → H be the surjective isometry of Corollary 5.63, which is additive and conjugate homogeneous. Consider the mapping ⟨· ; ·⟩∗: H∗ × H∗ → F given by

⟨f ; g⟩∗ = ⟨Ψ(g) ; Ψ(f)⟩  for every  f, g ∈ H∗,

where ⟨· ; ·⟩ stands for the inner product on H. This defines an inner product on H∗. Indeed, ⟨· ; ·⟩∗ is additive because Ψ is additive. Since Ψ is conjugate homogeneous,

⟨αf ; g⟩∗ = ⟨Ψ(g) ; Ψ(αf)⟩ = ⟨Ψ(g) ; ᾱΨ(f)⟩ = α⟨Ψ(g) ; Ψ(f)⟩ = α⟨f ; g⟩∗
5. Hilbert Spaces
for every f, g ∈ H∗ and every α ∈ F, and so ⟨· ; ·⟩∗ is homogeneous in the first argument. It is clear that ⟨· ; ·⟩∗ is Hermitian symmetric and positive. Actually, ‖f‖∗ = ‖Ψ(f)‖ = ‖f‖ for every f ∈ H∗, so that the norm ‖·‖∗ induced on H∗ by the inner product ⟨· ; ·⟩∗ coincides with the usual (induced uniform) norm on H∗ = B[H, F]. Since the dual space of every normed space is a Banach space, (H∗, ‖·‖) is a Banach space, and hence (H∗, ‖·‖∗) is a Hilbert space. We shall now apply the Riesz Representation Theorem to the Hilbert space H∗. Take an arbitrary ϕ ∈ H∗∗. Theorem 5.62 ensures that there exists a unique g ∈ H∗ such that

ϕ(f) = ⟨f ; g⟩∗ = ⟨Ψ(g) ; Ψ(f)⟩

for every f ∈ H∗. According to Theorem 5.62, every f ∈ H∗ is given by f(x) = ⟨x ; y⟩ for every x ∈ H, where y = Ψ(f) ∈ H. Set z = Ψ(g) ∈ H so that f(z) = ⟨z ; y⟩ = ⟨Ψ(g) ; Ψ(f)⟩. Therefore, there exists z ∈ H such that

ϕ(f) = f(z)  for every  f ∈ H∗.

Hence H is reflexive by Proposition 4.67.
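In the finite-dimensional case the theorem is concrete: on Cⁿ a functional f(x) = Σi ci ξi has Riesz representation y = (\overline{c1}, …, \overline{cn}), and ‖f‖ = ‖y‖ is attained at x = y/‖y‖. A minimal Python sketch (the coefficients are illustrative, not taken from the text):

```python
import math

def inner(x, y):
    """Inner product on C^n: <x ; y> = sum_i x_i * conj(y_i)."""
    return sum(xi * yi.conjugate() for xi, yi in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

# A bounded linear functional f(x) = sum_i c_i x_i on C^2 (illustrative c).
c = [2 + 1j, -3j]
f = lambda x: sum(ci * xi for ci, xi in zip(c, x))

# Its Riesz representation: y_i = conj(c_i), so that f(x) = <x ; y>.
y = [ci.conjugate() for ci in c]

x = [1 - 2j, 4 + 0.5j]
assert abs(f(x) - inner(x, y)) < 1e-12

# ||f|| = sup over unit x of |f(x)| is attained at x = y/||y||, giving ||f|| = ||y||.
unit = [yi / norm(y) for yi in y]
assert abs(abs(f(unit)) - norm(y)) < 1e-12
```

The uniqueness part of the theorem is visible here as well: two vectors representing the same functional would have zero inner product against every x, hence coincide.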
Let T ∈ B[H, Y] be a bounded linear transformation of a Hilbert space H to an inner product space Y. Take an arbitrary y ∈ Y and consider the functional fy: H → F defined by fy(x) = ⟨Tx ; y⟩ for every x ∈ H, where the above inner product is that on Y. It is easy to show that fy: H → F is a bounded linear functional (i.e., fy ∈ H∗). Indeed, fy is linear because T is linear and the inner product is linear in the first argument, and bounded by the Schwarz inequality: |fy(x)| ≤ ‖Tx‖‖y‖ ≤ ‖T‖‖y‖‖x‖ for every x ∈ H. The Riesz Representation Theorem says that there exists a unique zy in H such that ⟨Tx ; y⟩ = fy(x) = ⟨x ; zy⟩ for every x ∈ H, where the right-hand side inner product is that on H. This establishes a mapping T∗: Y → H that assigns to each y in Y this unique zy in H (i.e., T∗y = zy for every y ∈ Y), and therefore satisfies the following identity for every x in H and every y in Y:

⟨Tx ; y⟩ = ⟨x ; T∗y⟩.

The mapping T∗: Y → H is referred to as the adjoint of T ∈ B[H, Y]. In fact, as we shall see below, the adjoint T∗ of T is unique.

Proposition 5.65. Take any T ∈ B[H, Y], where H is a Hilbert space and Y is an inner product space.
(a) The adjoint T∗ of T is the unique mapping of Y into H such that ⟨Tx ; y⟩ = ⟨x ; T∗y⟩ for every x ∈ H and every y ∈ Y.

(b) T∗ is a bounded linear transformation (i.e., T∗ ∈ B[Y, H]).

Moreover, if Y also is a Hilbert space, then

(c) T∗∗ = T,  and

(d) ‖T∗‖² = ‖T∗T‖ = ‖TT∗‖ = ‖T‖².

Proof. Take T ∈ B[H, Y] and let T∗: Y → H be a mapping such that ⟨Tx ; y⟩ = ⟨x ; T∗y⟩ for every x ∈ H and every y ∈ Y.

Proof of (a). If T′: Y → H satisfies the identity ⟨Tx ; y⟩ = ⟨x ; T′y⟩ for every x ∈ H and every y ∈ Y, then for each y in Y ⟨x ; T′y⟩ = ⟨x ; T∗y⟩ — i.e., ⟨x ; T′y − T∗y⟩ = 0 — for every x ∈ H. Hence T′y = T∗y for every y ∈ Y.

Proof of (b). Take y1, y2 ∈ Y and α1, α2 ∈ F arbitrary. Note that ⟨x ; T∗(α1y1 + α2y2)⟩ = ⟨Tx ; α1y1 + α2y2⟩ = ᾱ1⟨Tx ; y1⟩ + ᾱ2⟨Tx ; y2⟩ = ᾱ1⟨x ; T∗y1⟩ + ᾱ2⟨x ; T∗y2⟩ = ⟨x ; α1T∗y1 + α2T∗y2⟩ for every x ∈ H, and hence T∗(α1y1 + α2y2) = α1T∗y1 + α2T∗y2 so that T∗ is linear. Moreover, by the Schwarz inequality, ‖T∗y‖² = ⟨T∗y ; T∗y⟩ = ⟨TT∗y ; y⟩ ≤ ‖TT∗y‖‖y‖ ≤ ‖T‖‖T∗y‖‖y‖ for every y ∈ Y. This implies that ‖T∗y‖ ≤ ‖T‖‖y‖ for every y ∈ Y. Thus T∗ is bounded and ‖T∗‖ ≤ ‖T‖.

Now suppose Y is a Hilbert space so that T∗ ∈ B[Y, H] has a unique adjoint (T∗)∗ ∈ B[H, Y] (notation: T∗∗ = (T∗)∗) such that ⟨T∗y ; x⟩ = ⟨y ; T∗∗x⟩ for every x ∈ H and every y ∈ Y.

Proof of (c). Since ⟨y ; Tx⟩ = \overline{⟨Tx ; y⟩} = \overline{⟨x ; T∗y⟩} = ⟨T∗y ; x⟩ for every x ∈ H and every y ∈ Y, it follows by the above identity that ⟨y ; Tx⟩ = ⟨y ; T∗∗x⟩ for every x ∈ H and every y ∈ Y, and hence Tx = T∗∗x for every x ∈ H.

Proof of (d). We have already seen that ‖T∗‖ ≤ ‖T‖. Then, as T = T∗∗, we get ‖T‖ = ‖T∗∗‖ ≤ ‖T∗‖. Thus ‖T∗‖ = ‖T‖, and so ‖T∗T‖ ≤ ‖T∗‖‖T‖ = ‖T‖². However, the Schwarz inequality ensures that
‖Tx‖² = ⟨Tx ; Tx⟩ = ⟨T∗Tx ; x⟩ ≤ ‖T∗Tx‖‖x‖ ≤ ‖T∗T‖‖x‖²

for every x ∈ H. Hence ‖T‖² ≤ ‖T∗T‖, and therefore ‖T‖² = ‖T∗T‖. Again, as T∗∗ = T, we get ‖T∗‖² = ‖T∗∗T∗‖ = ‖TT∗‖.

Let X be an inner product space and consider the definition of weak convergence (cf. Problem 4.67): An X-valued sequence {xn} converges weakly to x ∈ X (notation: xn −w→ x) if the scalar-valued sequence {f(xn)} converges in F to f(x) for every f ∈ X∗ (i.e., if f(xn − x) → 0 for every f ∈ X∗). Recall from Problem 4.67 that convergence in the norm topology implies weak convergence (to the same and unique limit):

xn → x  implies  xn −w→ x.

Since ⟨· ; y⟩: X → F lies in X∗ for every y ∈ X, it follows that

xn −w→ x  implies  ⟨xn ; y⟩ → ⟨x ; y⟩ for every y ∈ X.

The Riesz Representation Theorem ensures the converse in a Hilbert space: If X is a Hilbert space, then

xn −w→ x  if and only if  ⟨xn ; y⟩ → ⟨x ; y⟩ for every y ∈ X.

In particular, xn −w→ x implies ⟨xn ; x⟩ → ⟨x ; x⟩ = ‖x‖². By recalling that ‖xn − x‖² = ‖xn‖² − 2 Re⟨xn ; x⟩ + ‖x‖², we may conclude the nontrivial part of the next equivalence, which holds in any inner product space:

xn → x  if and only if  xn −w→ x and ‖xn‖ → ‖x‖.
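The norm condition here is not redundant: in ℓ² the sequence of standard basis vectors {en} satisfies ⟨en ; y⟩ → 0 for every y ∈ ℓ² (so en −w→ 0), while ‖en‖ = 1 for all n, so {en} converges weakly but not in norm. A small numerical sketch (the square-summable vector y below is illustrative):

```python
import math

# A fixed y in l^2, truncated for computation: y_k = 1/k (illustrative choice).
N = 10_000
y = [1.0 / k for k in range(1, N + 1)]
assert sum(v * v for v in y) < math.pi ** 2 / 6   # partial sums of sum 1/k^2

def inner_en_y(n):
    """<e_n ; y> for the n-th standard basis vector e_n of l^2 (1 <= n <= N)."""
    return y[n - 1]

# <e_n ; y> = 1/n -> 0 as n grows: e_n tends weakly to 0 against this y ...
assert abs(inner_en_y(5000)) < 1e-3
# ... yet ||e_n|| = 1 for every n, so {e_n} cannot converge to 0 in norm.
```

The same computation against any fixed square-summable y gives a tail of an ℓ² sequence, which is why the weak limit is 0.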
Now suppose X and Y are inner product spaces and let {Tn} be a B[X, Y]-valued sequence. If the Y-valued sequence {Tn x} converges weakly for every x ∈ X, then we say that {Tn} converges weakly (or converges in the weak operator topology). If X is a Hilbert space, then {Tn} converges weakly if there exists a unique T ∈ B[X, Y] such that Tn x −w→ Tx in Y for every x ∈ X, which is called the weak limit of {Tn} (see Problem 4.68). Notation: Tn −w→ T. Since ⟨· ; y⟩: Y → F lies in Y∗ for every y ∈ Y, it follows that

Tn −w→ T  implies  ⟨(Tn − T)x ; y⟩ → 0 for every x ∈ X and every y ∈ Y.
We shall see in Proposition 5.67 that the converse holds if Y is a Hilbert space.

Remark: Recalling Definition 4.45 and Problem 4.68, uniform convergence implies strong convergence, which implies weak convergence (to the same limit):

Tn −u→ T  =⇒  Tn −s→ T  =⇒  Tn −w→ T.

Since yn → y if and only if yn −w→ y and ‖yn‖ → ‖y‖, it follows that

Tn −s→ T  if and only if  Tn −w→ T and ‖Tn x‖ → ‖Tx‖ for every x ∈ X.
We have already seen that strong and uniform convergence coincide if X is finite dimensional. In fact, uniform, strong, and weak convergence coincide if X and Y are finite-dimensional inner product spaces.

Proposition 5.66.

(a) In a finite-dimensional inner product space, weak convergence and convergence in the norm topology coincide.

(b) Let X and Y be inner product spaces. Consider a B[X, Y]-valued sequence {Tk} and let T be a transformation in B[X, Y]. If Y is finite dimensional, then Tk −w→ T if and only if Tk −s→ T.

Proof. (a) Suppose X is a finite-dimensional inner product space, and so X is a Hilbert space (Corollary 4.28). Let B = {ei}_{i=1}^{n} be an orthonormal basis for X. Take an arbitrary weakly convergent sequence {xk} in X so that xk −w→ x for some x ∈ X, and hence ⟨xk − x ; ei⟩ → 0 as k → ∞ for each ei ∈ B. Therefore, for each i = 1, …, n and every ε > 0, there exists a positive integer ki,ε such that |⟨xk − x ; ei⟩| < ε whenever k ≥ ki,ε. Then the Fourier Series Theorem (Theorem 5.48) ensures that

‖xk − x‖² = Σ_{i=1}^{n} |⟨xk − x ; ei⟩|² < nε²

whenever k ≥ kε = max{ki,ε}_{i=1}^{n}. Hence ‖xk − x‖ → 0 as k → ∞. That is, xk → x. Summing up: xk −w→ x implies xk → x if X is a finite-dimensional inner product space. This concludes the proof of (a), since norm convergence always implies weak convergence.

(b) Tk −w→ T means Tk x −w→ Tx in Y for every x ∈ X. But item (a) says that this is equivalent to Tk x → Tx in Y for every x ∈ X if Y is finite dimensional, which means Tk −s→ T. The preceding remark ensures the converse.
Here are equivalent conditions for weak convergence of bounded linear transformations between Hilbert spaces.

Proposition 5.67. Let {Tn} be a sequence of bounded linear transformations of a Hilbert space H into a Hilbert space K (i.e., {Tn} is a B[H, K]-valued sequence). The following three assertions are pairwise equivalent.

(a) There exists T ∈ B[H, K] such that {Tn} converges weakly to T (i.e., Tn −w→ T or, equivalently, Tn − T −w→ O).

(b) There exists T ∈ B[H, K] such that ⟨Tn x ; y⟩ → ⟨Tx ; y⟩ as n → ∞ for every x in H and every y in K.

(c) The scalar-valued sequence {⟨Tn x ; y⟩} converges in F for every x in H and every y in K.

Now set K = H and consider the following further assertions.

(d) There exists T ∈ B[H] such that ⟨Tn x ; x⟩ → ⟨Tx ; x⟩ as n → ∞ for every x ∈ H.
(e) The scalar-valued sequence {⟨Tn x ; x⟩} converges in F for every x ∈ H.

Clearly, (b) implies (d), which implies (e). If K = H is a complex Hilbert space, then these five assertions are all pairwise equivalent.

Proof. If there is T ∈ B[H, K] such that f(Tn x) → f(Tx) in F as n → ∞ for every f ∈ K∗ and every x ∈ H, then ⟨Tn x ; y⟩ → ⟨Tx ; y⟩ as n → ∞ for every y ∈ K and every x ∈ H (since ⟨· ; y⟩: K → F lies in K∗ for every y ∈ K). Hence (a)⇒(b). Conversely, suppose (b) holds true and take any f ∈ K∗. Since K is a Hilbert space, the Riesz Representation Theorem (Theorem 5.62) ensures that f(Tn x) → f(Tx) in F as n → ∞ for every x ∈ H. Thus (b)⇒(a). It is clear that (b)⇒(c). Now suppose assertion (c) holds true. Take an arbitrary y in K and consider the functional fy: H → F defined by

fy(x) = limn ⟨Tn x ; y⟩

for every x ∈ H. Observe that fy is linear (because Tn is linear for each n, the inner product is linear in the first argument, and the linear operations in F are continuous). Since {⟨Tn x ; y⟩} converges in F for every x ∈ H, it follows that {⟨Tn x ; y⟩} is a bounded sequence for every x ∈ H (Proposition 3.39). Then supn |⟨Tn x ; y⟩| < ∞ for every x ∈ H and every y ∈ K, and so supn ‖Tn‖ < ∞ (as a consequence of the Banach–Steinhaus Theorem — see Problem 5.5). Hence, for an arbitrary y in K, |fy(x)| = |limn ⟨Tn x ; y⟩| = limn |⟨Tn x ; y⟩| ≤ supn |⟨Tn x ; y⟩| ≤ supn ‖Tn x‖‖y‖ ≤ supn ‖Tn‖‖x‖‖y‖ for every x ∈ H, so that

‖fy‖ ≤ supn ‖Tn‖‖y‖.

That is, fy is bounded, and therefore fy ∈ H∗ (fy is a bounded linear functional on H). Since H is a Hilbert space, the Riesz Representation Theorem says that there exists a unique zy ∈ H such that fy(x) = ⟨x ; zy⟩ for every x ∈ H. Consider the mapping S: K → H that assigns to each y in K this unique zy in H, Sy = zy for every y ∈ K, so that

limn ⟨Tn x ; y⟩ = fy(x) = ⟨x ; zy⟩ = ⟨x ; Sy⟩

for every x ∈ H and every y ∈ K. Note that S is linear and bounded (i.e., S ∈ B[K, H]). Indeed, if y1, y2 ∈ K and α1, α2 ∈ F, then ⟨x ; S(α1y1 + α2y2)⟩ = limn ⟨Tn x ; α1y1 + α2y2⟩ = ᾱ1 limn ⟨Tn x ; y1⟩ + ᾱ2 limn ⟨Tn x ; y2⟩ = ᾱ1⟨x ; Sy1⟩ + ᾱ2⟨x ; Sy2⟩ = ⟨x ; α1Sy1 + α2Sy2⟩ for every x ∈ H. Hence S(α1y1 + α2y2) = α1Sy1 + α2Sy2, which means that S is linear. Moreover,
‖Sy‖ = ‖zy‖ = ‖fy‖ ≤ supn ‖Tn‖‖y‖

for every y ∈ K so that S is bounded with ‖S‖ ≤ supn ‖Tn‖. Setting T = S∗ ∈ B[H, K], the adjoint of S, we get (cf. Proposition 5.65)

limn ⟨Tn x ; y⟩ = ⟨x ; Sy⟩ = ⟨x ; S∗∗y⟩ = ⟨S∗x ; y⟩ = ⟨Tx ; y⟩

for every x ∈ H and every y ∈ K. Therefore, (c)⇒(b). Finally, set K = H so that (b)⇒(d) and (c)⇒(e) trivially. By Problem 5.3(b) (with L = I), it follows that (d)⇒(b) and (e)⇒(c) whenever H is a complex Hilbert space. Carefully note that the latter two conditions are equivalent to the former three equivalent conditions only if the Hilbert space H = K is complex.

Remark: Observe from Proposition 5.67 and Problem 5.5 that

Tn −w→ T  =⇒  supn ‖Tn‖ < ∞  whenever H and K are Hilbert.
Take T ∈ B[X], where X is an inner product space, and consider the power sequence {Tⁿ} so that each Tⁿ lies in B[X]. The operator T is weakly stable if the power sequence {Tⁿ} converges weakly to the null operator; that is, Tⁿ −w→ O. If X is a Hilbert space, then Proposition 5.67 says that this is equivalent to any of those five assertions with Tn replaced with Tⁿ and T replaced with O. In particular, if X is a complex Hilbert space, then Tⁿ −w→ O if and only if ⟨Tⁿx ; x⟩ → 0 as n → ∞ for every x ∈ X. It is plain that uniform stability implies strong stability, which implies weak stability,

Tⁿ −u→ O  =⇒  Tⁿ −s→ O  =⇒  Tⁿ −w→ O,

which in turn implies power boundedness whenever X is a Hilbert space:

Tⁿ −w→ O  =⇒  supn ‖Tⁿ‖ < ∞  if X is Hilbert.
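As an illustration (a hedged numerical sketch, not an example taken from the text), the unilateral shift S on ℓ², Sx = (0, ξ1, ξ2, …), is weakly stable but not strongly stable: ⟨Sⁿx ; y⟩ → 0 for all x, y, while ‖Sⁿx‖ = ‖x‖ for every n. Truncated vectors stand in for ℓ² elements:

```python
import math

def shift(x, n):
    """S^n x for the unilateral shift, on a truncated l^2 vector."""
    return [0.0] * n + list(x)

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(sum(a * a for a in x))

x = [1.0 / (k + 1) for k in range(2000)]        # illustrative x in l^2
y = [1.0 / (k + 1) ** 2 for k in range(2000)]   # illustrative y in l^2

# <S^n x ; y> -> 0 as n grows: weak stability ...
assert abs(inner(shift(x, 1000), y)) < 1e-3
# ... but ||S^n x|| = ||x|| for every n, so S is not strongly stable.
assert abs(norm(shift(x, 1000)) - norm(x)) < 1e-12
```

Shifting only pads zeros in front, which leaves the norm untouched while pushing the overlap with any fixed y into its square-summable tail.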
The converses, however, fail (see Example 4.K and Problem 5.29(c)). Let X and Y be inner product spaces. We say that a subset Θ of B[X , Y ] is weakly closed in B[X , Y ] if every Θ-valued weakly convergent sequence {Tn } has its (weak) limit T in Θ. Recall from Proposition 4.48: If Θ is strongly closed in B[X , Y ], then it is (uniformly) closed in B[X , Y ]. Proposition 5.68. If Θ ⊆ B[X , Y ] is weakly closed in B[X , Y ], then it is strongly closed in B[X , Y ]. Proof. Take an arbitrary Θ-valued strongly convergent sequence, say {Tn }, and let T ∈ B[X , Y ] be its (strong) limit. Since strong convergence implies weak convergence to the same limit, it follows that {Tn } converges weakly to T . If every Θ-valued weakly convergent sequence has its (weak) limit in Θ,
then T ∈ Θ. Conclusion: Every Θ-valued strongly convergent sequence has its (strong) limit in Θ.

Remark: If Y is finite dimensional, then weak convergence coincides with strong convergence (Proposition 5.66(b)), and so the concepts of weakly closed and strongly closed in B[X, Y] coincide whenever Y is a finite-dimensional inner product space. Then (see the remark after Proposition 4.48), if X and Y are finite-dimensional inner product spaces, then all three concepts of weakly, strongly, and uniformly closed in B[X, Y] coincide.

Recall that a subset A of a metric space is compact if and only if it is sequentially compact (Theorem 3.80), which means by Definition 3.76 that every A-valued sequence has a convergent subsequence (that converges to a point in A). The Heine–Borel Theorem (Theorem 3.83) ensures that every bounded subset B of a finite-dimensional inner product space has a compact closure. Thus every B-valued sequence has a convergent subsequence (that converges to a point in B⁻). Hence, every bounded nonempty subset of a finite-dimensional inner product space has a convergent sequence (whose limit lies in its closure). This no longer holds in an infinite-dimensional inner product space (e.g., if {en} is an infinite orthonormal sequence, then ‖en − em‖² = 2 for every m ≠ n). Now recall that a finite-dimensional inner product space is a Hilbert space where convergence in the norm topology coincides with weak convergence (Proposition 5.66(a)), so that the next lemma actually is an extension to infinite-dimensional Hilbert spaces of the above italicized result.

Lemma 5.69. Every bounded nonempty subset of a Hilbert space has a weakly convergent sequence. In particular, every bounded sequence in a Hilbert space has a weakly convergent subsequence.

Proof. Let B be a bounded nonempty subset of an inner product space H. The result is trivial if B = {0}. Suppose B ≠ {0} and let A ≠ {0} be a nonempty countable subset of B, so that 0 < sup_{x∈A} ‖x‖ < ∞.
Set X = (span A)⁻, the closure of the linear span of A, which is a subspace of H, so that X is a separable inner product space by Proposition 4.9(b). Consider the following subset of the dual of X:

Φ = {⟨· ; x⟩: X → F : x ∈ A} ⊂ X∗.

Recall the definition of "pointwise total boundedness" and "equicontinuity" of Example 3.Z.

Claim 1. Φ is pointwise totally bounded.

Proof. Since total boundedness coincides with plain boundedness in F (cf. proof of Theorem 3.83), it follows that Φ is pointwise totally bounded if and only if it is pointwise bounded. But |⟨w ; x⟩| ≤ (sup_{x∈A} ‖x‖)‖w‖ for every w ∈ X, and so Φ is pointwise bounded.

Claim 2. Φ is (uniformly) equicontinuous.
Proof. Indeed, |⟨u ; x⟩ − ⟨v ; x⟩| = |⟨u − v ; x⟩| ≤ (sup_{x∈A} ‖x‖)‖u − v‖ for every u, v ∈ X and every x ∈ A. This implies that for every ε > 0 there exists a δ = (sup_{x∈A} ‖x‖)⁻¹ε > 0 such that |⟨u ; x⟩ − ⟨v ; x⟩| < ε whenever ‖u − v‖ < δ for all u, v ∈ X and every x ∈ A.

Then, according to the proof of Example 3.Z (set fn = ⟨· ; xn⟩ ∈ Φ), we may conclude: Φ is totally bounded because X is separable, which means that every Φ-valued sequence has a Cauchy subsequence (Lemma 3.73). Take an arbitrary A-valued sequence {xn} so that the Φ-valued sequence {⟨· ; xn⟩} has a subsequence, say {⟨· ; xnk⟩}, that is Cauchy in X∗. But X∗ is always complete so that {⟨· ; xnk⟩} converges in X∗. That is, there exists an f ∈ X∗ such that ⟨· ; xnk⟩ → f in X∗. If H is a Hilbert space, then the subspace X is itself a Hilbert space, and the Riesz Representation Theorem says that there exists an x ∈ X for which f = ⟨· ; x⟩. Therefore, ⟨xnk ; w⟩ → ⟨x ; w⟩ for every w ∈ X. Using the Riesz Representation Theorem again, we get xnk −w→ x in X. Hence xnk −w→ x in H (cf. Problem 5.20).

Theorem 5.70. Let {Tn} be a B[H, K]-valued sequence, where H and K are Hilbert spaces. If H is a separable Hilbert space and supn ‖Tn‖ < ∞, then {Tn} has a weakly convergent subsequence.

Proof. Suppose H ≠ {0} to avoid trivialities. If H is separable, then there exists a countably infinite dense subset A of the nonzero linear space H. Let {ai}_{i≥1} be an A-valued sequence consisting of an enumeration of all points of A. Since A⁻ = H, it follows by Proposition 3.32 that

infi ‖x − ai‖ = 0  for every  x ∈ H.

If supn ‖Tn‖ < ∞, then {Tn a1}_{n≥1} is a bounded sequence in K. Thus Lemma 5.69 ensures the existence of a subsequence of {Tn}_{n≥1}, say {Tn(1)}_{n≥1}, such that {Tn(1) a1}_{n≥1} converges weakly in K. Again, since supn ‖Tn‖ < ∞, it follows that {Tn a2}_{n≥1} is bounded in K and, in particular, the subsequence {Tn(1) a2}_{n≥1} is bounded in K. Hence another application of Lemma 5.69 ensures that there exists a subsequence of {Tn(1)}_{n≥1}, say {Tn(2)}_{n≥1}, such that {Tn(2) a2}_{n≥1} converges weakly in K. Note that {Tn(2) a1}_{n≥1} also converges weakly in K (reason: {Tn(2)}_{n≥1} is a subsequence of {Tn(1)}_{n≥1} and {Tn(1) a1}_{n≥1} converges weakly in K — see Problem 4.67(b)). This leads to an inductive construction of a sequence of B[H, K]-valued sequences, {{Tn(k)}_{n≥1}}_{k≥1}, with the following properties.

Property (1). {Tn(k+1)}_{n≥1} is a subsequence of {Tn(k)}_{n≥1}, which is a subsequence of {Tn}_{n≥1}, for every k ≥ 1.

Property (2). {Tn(k) ai}_{n≥1} converges weakly in K whenever k ≥ i.

Take the "diagonal" sequence {Tn(n)}_{n≥1}, which is a subsequence of {Tn}_{n≥1}. If {Tn(n)}_{n≥1} is weakly convergent, then the theorem is proved.

Claim. {Tn(n)}_{n≥1} is weakly convergent.
Proof. Take x ∈ H, y ∈ K, and ε > 0 arbitrary. Since infi ‖x − ai‖ = 0, there exists an integer iε ≥ 1 such that ‖x − aiε‖ < ε. According to Property (1), {Tn(n)}n≥iε is a subsequence of {Tn(iε)}n≥1, and so {Tn(n) aiε}n≥iε is a subsequence of {Tn(iε) aiε}n≥1. Since {Tn(iε) aiε}n≥1 converges weakly in K by Property (2), it follows that its subsequence {Tn(n) aiε}n≥iε also converges weakly in K. This implies that {Tn(n) aiε}n≥1 converges weakly in K so that {⟨Tn(n) aiε ; y⟩}n≥1 converges in F, and therefore is a Cauchy sequence in F. That is, there exists an integer nε ≥ 1 such that

m, n ≥ nε  implies  |⟨(Tn(n) − Tm(m)) aiε ; y⟩| < ε.

Note that ‖Tn(n) − Tm(m)‖ ≤ 2 supk ‖Tk‖ for all m, n ≥ 1 because {Tn(n)}n≥1 is a subsequence of {Tn}n≥1. Hence

|⟨(Tn(n) − Tm(m)) x ; y⟩| = |⟨(Tn(n) − Tm(m))(aiε + x − aiε) ; y⟩|
 ≤ |⟨(Tn(n) − Tm(m)) aiε ; y⟩| + 2 supk ‖Tk‖ ‖x − aiε‖ ‖y‖
 ≤ (1 + 2 supk ‖Tk‖ ‖y‖) ε

whenever m, n ≥ nε. Conclusion: {⟨Tn(n) x ; y⟩}n≥1 is a Cauchy sequence in F so that it converges in F (since F is complete). As x and y are arbitrary vectors in H and K, respectively, this implies that the scalar-valued sequence {⟨Tn(n) x ; y⟩}n≥1 converges in F for every x ∈ H and every y ∈ K, which means that Tn(n) −w→ T for some T ∈ B[H, K] by Proposition 5.67.
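The "diagonal" trick in the proof can be isolated at the level of index sets: if each refinement selects its indices from the previous one, then from position k on the diagonal sequence runs inside the k-th refinement, so it inherits every convergence valid along that refinement. A toy sketch in Python, with an illustrative refinement rule (each step keeps every other index):

```python
# The k-th refinement consists of the indices { m * 2**k : m >= 1 }
# (illustrative rule: each refinement keeps every other index of the previous).
def refinement(k, length):
    return [m * 2 ** k for m in range(1, length + 1)]

# Diagonal sequence: the n-th term of the n-th refinement, i.e. n * 2**n.
diagonal = [refinement(n, n)[n - 1] for n in range(1, 12)]

# Mirror of Property (1): from position k on, the diagonal runs through
# indices of the k-th refinement, and it is strictly increasing.
for k in range(1, 12):
    tail = diagonal[k - 1:]
    assert all(d % 2 ** k == 0 for d in tail)          # lies in refinement k
    assert all(a < b for a, b in zip(tail, tail[1:]))  # strictly increasing
```

In the proof this is exactly why {Tn(n) aiε}n≥iε converges weakly: it is, from iε on, a subsequence of a weakly convergent sequence.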
5.12 The Adjoint Operator

Let T ∈ B[H, Y] be a bounded linear transformation of a Hilbert space H into an inner product space Y. The adjoint of T was defined in the previous section as the unique mapping T∗: Y → H such that ⟨Tx ; y⟩ = ⟨x ; T∗y⟩ for every x ∈ H and every y ∈ Y, whose existence was established in Section 5.11 as a consequence of the Riesz Representation Theorem. The basic facts about the adjoint T∗ were stated in Proposition 5.65. In particular, it is linear and bounded (i.e., T∗ ∈ B[Y, H]). Here is a useful corollary of Proposition 5.65.

Corollary 5.71. If H and K are Hilbert spaces and T ∈ B[H, K], then

‖T‖ = sup_{‖x‖=‖y‖=1} |⟨Tx ; y⟩|.

Proof. Since ‖z‖ = sup_{‖y‖=1} |⟨z ; y⟩| for every z ∈ K (see Problem 5.1), it follows that ‖Tx‖ = sup_{‖y‖=1} |⟨Tx ; y⟩| for every x ∈ H. Hence
‖T‖ = sup_{‖x‖=1} sup_{‖y‖=1} |⟨Tx ; y⟩|.

Recalling that T∗∗ = T (see Proposition 5.65), it also follows that ‖T∗y‖ = sup_{‖x‖=1} |⟨T∗y ; x⟩| = sup_{‖x‖=1} |⟨y ; Tx⟩| = sup_{‖x‖=1} |⟨Tx ; y⟩| for every y ∈ K. Therefore, as ‖T∗‖ = ‖T‖ (cf. Proposition 5.65 again),

‖T‖ = ‖T∗‖ = sup_{‖y‖=1} sup_{‖x‖=1} |⟨Tx ; y⟩|.
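Corollary 5.71 can be checked numerically for a 2×2 real matrix: there ‖T‖ is the largest singular value, i.e. the square root of the larger eigenvalue of TᵗT (computed below by the quadratic formula), while the supremum over unit y of |⟨Tx ; y⟩| equals ‖Tx‖, attained at y = Tx/‖Tx‖, exactly as in the proof. A sketch with an illustrative matrix:

```python
import math

T = [[3.0, 1.0],
     [0.0, 2.0]]  # illustrative operator on R^2

def apply(T, x):
    return [T[0][0] * x[0] + T[0][1] * x[1], T[1][0] * x[0] + T[1][1] * x[1]]

# ||T|| = sqrt(largest eigenvalue of T^t T), by the quadratic formula.
a = T[0][0] ** 2 + T[1][0] ** 2              # (T^t T)_11
b = T[0][0] * T[0][1] + T[1][0] * T[1][1]    # (T^t T)_12
d = T[0][1] ** 2 + T[1][1] ** 2              # (T^t T)_22
lam_max = ((a + d) + math.sqrt((a - d) ** 2 + 4 * b * b)) / 2
op_norm = math.sqrt(lam_max)

# sup over unit x, y of |<Tx ; y>|: for each unit x, the inner sup over
# unit y equals ||Tx||, so a fine grid over x suffices.
steps = 2000
best = 0.0
for i in range(steps):
    th = 2 * math.pi * i / steps
    Tx = apply(T, [math.cos(th), math.sin(th)])
    best = max(best, math.hypot(Tx[0], Tx[1]))

assert abs(best - op_norm) < 1e-4
```

The grid never exceeds ‖T‖ and approaches it, which is the content of the double-supremum formula.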
Let us see some further elementary properties of the adjoint. Consider the linear space B[H, Y], where H is a Hilbert space and Y is an inner product space. First observe that O∗ = O. That is, the adjoint O∗ ∈ B[Y, H] of the null transformation O ∈ B[H, Y] coincides with the null transformation O ∈ B[Y, H]. In fact, ⟨x ; O∗y⟩ = ⟨Ox ; y⟩ = 0 for every x ∈ H and every y ∈ Y, and hence O∗y = 0 for every y ∈ Y. Now take S and T in B[H, Y] so that S + T and αT lie in B[H, Y], where α is any scalar. Consider their adjoints: the unique transformations in B[Y, H] such that ⟨(S + T)x ; y⟩ = ⟨x ; (S + T)∗y⟩ and ⟨αTx ; y⟩ = ⟨x ; (αT)∗y⟩ for every x ∈ H and y ∈ Y, respectively. These identities imply that ⟨x ; (S + T)∗y⟩ = ⟨Sx ; y⟩ + ⟨Tx ; y⟩ = ⟨x ; (S∗ + T∗)y⟩ and ⟨x ; (αT)∗y⟩ = ⟨x ; ᾱT∗y⟩ for every x ∈ H and every y ∈ Y. Therefore,

(S + T)∗ = S∗ + T∗  and  (αT)∗ = ᾱT∗.

Next take T in B[H, K] and S in B[K, Y], where H and K are Hilbert spaces and Y is an inner product space, so that ST lies in B[H, Y] by Proposition 4.16. Consider its adjoint, which is the unique transformation in B[Y, H] such that ⟨STx ; y⟩ = ⟨x ; (ST)∗y⟩ for every x ∈ H and y ∈ Y. This implies that ⟨x ; (ST)∗y⟩ = ⟨x ; T∗S∗y⟩ for every x ∈ H and every y ∈ Y. Then

(ST)∗ = T∗S∗.

Finally, consider the algebra B[H], where H is a Hilbert space. It is clear by the very definition of adjoint that I∗ = I, where I is the identity operator in B[H]. If T ∈ G[H, K], where H and K are Hilbert spaces (i.e., if T is invertible in B[H, K] so that T⁻¹ ∈ B[K, H] by the Inverse Mapping Theorem), then the above identities ensure that I = I∗ = (T⁻¹T)∗ = T∗(T⁻¹)∗, the identity operator in B[H], and I = I∗ = (TT⁻¹)∗ = (T⁻¹)∗T∗, the identity operator in B[K]. Hence T∗ ∈ G[K, H]. Dually (since T∗∗ = T), this implies that T ∈ G[H, K]. Therefore, T ∈ G[H, K] if and only if T∗ ∈ G[K, H], and

(T∗)⁻¹ = (T⁻¹)∗.
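These algebraic rules can be verified concretely on F², where the adjoint of a matrix operator is its transpose conjugate (as the example that follows works out in detail). A hedged Python sketch with illustrative 2×2 complex matrices:

```python
def adj(M):
    """Adjoint of a 2x2 matrix operator on C^2: the transpose conjugate."""
    return [[M[j][i].conjugate() for j in range(2)] for i in range(2)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def madd(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def scal(a, M):
    return [[a * M[i][j] for j in range(2)] for i in range(2)]

def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))

S = [[1 + 2j, 0.5 + 0j], [3j, 4 - 1j]]   # illustrative
T = [[2 + 0j, 1 - 1j], [1j, 5 + 0j]]     # illustrative
alpha = 2 - 3j

assert close(adj(madd(S, T)), madd(adj(S), adj(T)))                 # (S+T)* = S* + T*
assert close(adj(scal(alpha, T)), scal(alpha.conjugate(), adj(T)))  # (aT)* = conj(a) T*
assert close(adj(matmul(S, T)), matmul(adj(T), adj(S)))             # (ST)* = T* S*
assert close(adj(adj(T)), T)                                        # T** = T
```

Note the reversal of factors in (ST)∗ = T∗S∗, mirrored by the transpose of a matrix product.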
Example 5.P. Consider the Hilbert spaces Fⁿ and Fᵐ (for arbitrary positive integers m and n) equipped with their usual inner products as in Example 5.A. Take A ∈ B[Fⁿ, Fᵐ] and recall that B[Fⁿ, Fᵐ] = L[Fⁿ, Fᵐ] by Corollary 4.30. As usual (cf. Examples 2.L and 2.N), we shall identify the linear transformation A ∈ L[Fⁿ, Fᵐ] with its matrix

[A] = [αij] ∈ F^{m×n}  (the m×n array with entry αij in row i, column j)

relative to the canonical bases for Fⁿ and Fᵐ. Thus for every x = (ξ1, …, ξn) in Fⁿ the vector y = Ax = (υ1, …, υm) in Fᵐ is such that

υi = Σ_{j=1}^{n} αij ξj

for every i = 1, …, m. In terms of common matrix notation, and according to ordinary matrix operations, the matrix equation [y] = [A][x] represents the identity y = Ax. Here

[y] = (υ1, …, υm)^T ∈ F^{m×1}  and  [x] = (ξ1, …, ξn)^T ∈ F^{n×1}

are the (column) matrices of y and x with respect to the canonical bases for Fᵐ and Fⁿ, respectively. Recall that \overline{[A]} = [ᾱij] ∈ F^{m×n}, [A]^T = [αji] ∈ F^{n×m}, and \overline{[A]}^T = \overline{[A]^T} = [ᾱji] ∈ F^{n×m} denote the conjugate, transpose, and transpose conjugate of [A] = [αij] ∈ F^{m×n}, respectively. Now observe that the inner product on Fᵐ can be written as

⟨y ; z⟩ = Σ_{i=1}^{m} υi ζ̄i = [y]^T \overline{[z]}

for every y = (υ1, …, υm) ∈ Fᵐ and every z = (ζ1, …, ζm) ∈ Fᵐ. Using standard matrix algebra, we get

⟨Ax ; y⟩ = ([A][x])^T \overline{[y]} = [x]^T [A]^T \overline{[y]}

for every x ∈ Fⁿ and every y ∈ Fᵐ. Next consider the adjoint A∗ ∈ B[Fᵐ, Fⁿ] of A ∈ B[Fⁿ, Fᵐ], and let [A∗] ∈ F^{n×m} denote the matrix of A∗ relative to the canonical bases for Fᵐ and Fⁿ. Then

⟨Ax ; y⟩ = ⟨x ; A∗y⟩ = [x]^T \overline{[A∗][y]} = [x]^T \overline{[A∗]} \overline{[y]}
for every x ∈ Fⁿ and every y ∈ Fᵐ. Therefore, by uniqueness of the matrix representation (with respect to fixed bases — cf. Example 2.L), we get

[A∗] = \overline{[A]}^T.

That is, the matrix of the adjoint A∗ of A is the transpose conjugate of the matrix of A.

Example 5.Q. Let S be a nondegenerate interval of the real line R. Consider the Hilbert space L²(S) equipped with its usual inner product (see Example 5.D). As always, we write x ∈ L²(S) instead of [x] ∈ L²(S), where x is any representative of the equivalence class [x]. Take an arbitrary a ∈ L²(S×S) so that a(s,·) ∈ L²(S) for each s ∈ S, a(·,t) ∈ L²(S) for each t ∈ S, and

‖a‖² = ∫_S ‖a(s,·)‖² ds = ∫_S ‖a(·,t)‖² dt = ∫_S ∫_S |a(s,t)|² dt ds = ∫_S ∫_S |a(s,t)|² ds dt < ∞.

The preceding identities are due to a well-known result in integration theory referred to as the Fubini Theorem (which ensures that the order of the integrals can be interchanged). Now define the integral mapping A: L²(S) → L²(S) as follows. For each x ∈ L²(S) let z = Ax ∈ L²(S) be given by

z(s) = ∫_S a(s,t)x(t) dt = ⟨a(s,·) ; x̄⟩ = ⟨x ; \overline{a(s,·)}⟩

for every s ∈ S. Note that z = Ax actually lies in L²(S) because a lies in L²(S×S). Indeed, by the Schwarz inequality,

‖z‖² = ∫_S |⟨x ; \overline{a(s,·)}⟩|² ds ≤ ‖x‖² ∫_S ‖a(s,·)‖² ds = ‖x‖²‖a‖²,

and hence ‖Ax‖ ≤ ‖a‖‖x‖ for every x ∈ L²(S) so that A is bounded. Since A is certainly linear, it follows that A ∈ B[L²(S)]. Also note that

⟨Ax ; y⟩ = ∫_S (∫_S a(s,t)x(t) dt) \overline{y(s)} ds = ∫_S ∫_S a(s,t)x(t)\overline{y(s)} dt ds = ∫_S x(t) (∫_S a(s,t)\overline{y(s)} ds) dt

for every x, y ∈ L²(S). Consider the adjoint A∗ ∈ B[L²(S)] of A ∈ B[L²(S)]. For each y ∈ L²(S) set w = A∗y ∈ L²(S) so that

⟨Ax ; y⟩ = ⟨x ; A∗y⟩ = ⟨x ; w⟩ = ∫_S x(t)\overline{w(t)} dt

for every x, y ∈ L²(S). Then w = A∗y ∈ L²(S) is given by
w(s) = ∫_S \overline{a(t,s)} y(t) dt = ∫_S a∗(s,t)y(t) dt

for every s ∈ S. Therefore, the adjoint A∗ of the integral operator A is again an integral operator, whose kernel a∗ ∈ L²(S×S) is related to the kernel a ∈ L²(S×S) of A as follows. For every (s,t) ∈ S×S,

a∗(s,t) = \overline{a(t,s)}.

An isometry between metric spaces is a map that preserves distance, and so every isometry is an injective contraction. A linear isometry between normed spaces is a linear transformation that preserves norm and, between inner product spaces, a linear isometry is a linear transformation that preserves inner product. Propositions 4.37 and 5.21 gave necessary and sufficient conditions for a linear transformation to be an isometry. Here is another one for linear transformations between Hilbert spaces, stated in terms of the adjoint.

Proposition 5.72. A transformation V ∈ B[H, K] of a Hilbert space H into a Hilbert space K is an isometry if and only if V∗V = I.

Proof. According to Proposition 5.21, V is an isometry if and only if it preserves the inner product; that is, ⟨Vx ; Vy⟩ = ⟨x ; y⟩ for every x, y ∈ H. Equivalently, ⟨(V∗V − I)x ; y⟩ = 0 for every x, y ∈ H, and hence (V∗V − I)x = 0 for every x ∈ H, where I is the identity on H.

A coisometry is a transformation T in B[H, K] such that its adjoint T∗ in B[K, H] is an isometry. By the previous proposition, T is a coisometry if and only if TT∗ = I (identity on K). Recall that a unitary transformation U in B[H, K] is an isometric isomorphism between H and K. Equivalently, U in B[H, K] is unitary if it is an invertible isometry (i.e., an isometry in G[H, K]), which means a surjective isometry (since isometries are always injective).

Proposition 5.73. Take U ∈ B[H, K], where H and K are Hilbert spaces. The following assertions are pairwise equivalent.

(a) U is unitary (i.e., U is a surjective isometry, or an invertible isometry).

(b) U is invertible and U⁻¹ = U∗ (i.e., U lies in G[H, K] and U⁻¹ = U∗).

(c) U is invertible and ‖U‖ = ‖U⁻¹‖ = 1.

(d) U is invertible, ‖U‖ ≤ 1, and ‖U⁻¹‖ ≤ 1.

(e) U∗U = I (identity on H) and UU∗ = I (identity on K).

(f) U is an isometry and a coisometry.

(g) U is invertible and both U and U⁻¹ are isometries.

(h) ‖Ux‖ = ‖x‖ and ‖U∗y‖ = ‖y‖ for every x ∈ H and every y ∈ K.

If H = K, then each of the above is equivalent to each of the following.
(i) U∗U = UU∗ = I.

(j) U is an isometry such that U∗U = UU∗.

(k) ‖U∗x‖ = ‖Ux‖ = ‖x‖ for every x ∈ H.

(l) ‖U∗ⁿx‖ = ‖Uⁿx‖ = ‖x‖ for every x ∈ H and every integer n ≥ 1.

(m) U∗ⁿUⁿ = UⁿU∗ⁿ = I for every integer n ≥ 1.

(n) U is invertible and ‖Uⁿ‖ = ‖U⁻ⁿ‖ = 1 for every integer n ≥ 1.

Proof. Take U ∈ B[H, K]. First we show that (a) and (b) are equivalent. If U is unitary, then it is an isometry in G[H, K]. Thus there exists U⁻¹ ∈ G[K, H] such that UU⁻¹ = I (identity on K). Proposition 5.21 ensures that

⟨Ux1 ; Ux2⟩ = ⟨x1 ; x2⟩  for every  x1, x2 ∈ H.

Hence

⟨x ; U⁻¹y⟩ = ⟨Ux ; UU⁻¹y⟩ = ⟨Ux ; y⟩ = ⟨x ; U∗y⟩

for every x ∈ H and y ∈ K. Then, as the adjoint is unique (Proposition 5.65), U⁻¹ = U∗. Conversely, if U ∈ G[H, K] and U⁻¹ = U∗, then U∗U = I (identity on H), and so U is a surjective isometry (Proposition 5.72). Thus (a) ⇔ (b). It is plain that (b) ⇔ (e) by Theorem 4.22, (e) ⇔ (f) by Proposition 5.72, and (f) ⇔ (h) by Proposition 4.37. Tautologically (g) ⇒ (a), and (e) ⇒ (g) by Proposition 5.72 since (U⁻¹)∗U⁻¹ = (UU∗)⁻¹. If (b) holds, then 1 = ‖U⁻¹U‖ = ‖U∗U‖ = ‖U‖² = ‖U∗‖² = ‖U⁻¹‖² by Proposition 5.65, and so (b) ⇒ (c). Trivially (c) ⇒ (d). If (d) holds, then ‖x‖ = ‖U⁻¹Ux‖ ≤ ‖Ux‖ ≤ ‖x‖ so that ‖Ux‖ = ‖x‖ for each x in H, and therefore (d) ⇒ (a). Suppose H = K. Thus (i) ⇔ (e) ⇔ (j) by Proposition 5.72, (l) ⇔ (f) by Proposition 4.37(c), (k) ⇔ (h) trivially, (m) ⇒ (i) and the converse follows by induction, and (b, l) ⇒ (n) ⇒ (c).

Let T be an operator in B[X], where X is an inner product space, and let M be a subspace of X. Recall that M is an invariant subspace for T (or invariant under T, or T-invariant) if T(M) ⊆ M (i.e., Tx ∈ M whenever x ∈ M). A nontrivial invariant subspace for T is an invariant subspace M for T such that {0} ≠ M ≠ X (see Problems 4.18 to 4.20). If M and its orthogonal complement M⊥ are both invariant for T (i.e., if T(M) ⊆ M and T(M⊥) ⊆ M⊥), then we say that M reduces T (or M is a reducing subspace for T). Accordingly, a nontrivial reducing subspace for T is a reducing subspace for T such that {0} ≠ M ≠ X. An operator is reducible if it has a nontrivial reducing subspace. Now let X = H be a Hilbert space and consider the orthogonal direct sum H = M ⊕ M⊥ of Theorem 5.25. Observe from Example 2.O (also see Problem 4.16) that the following assertions are pairwise equivalent.

(a) M reduces T.

(b) T = T|M ⊕ T|M⊥ = [ T|M  O ; O  T|M⊥ ] : H = M ⊕ M⊥ → H = M ⊕ M⊥.
(c) PT = TP, where P = [ I  O ; O  O ] : H = M ⊕ M⊥ → H = M ⊕ M⊥ is the orthogonal projection onto M.

If M is nontrivial, then T is reducible. This suggests that, if M reduces T, then the investigation of T is reduced to the investigation of smaller operators (viz., T|M and T|M⊥), which justifies the terminology "reducing subspace".

Proposition 5.74. Let T be any operator on a Hilbert space H. A subspace M of H is invariant for T if and only if M⊥ is invariant for T∗. Thus T has a nontrivial invariant subspace if and only if T∗ has.

Proof. Let M be a subspace of a Hilbert space H, let T be any operator in B[H], and take any y ∈ M⊥. If Tx ∈ M whenever x ∈ M, then ⟨x ; T∗y⟩ = ⟨Tx ; y⟩ = 0 for every x ∈ M so that T∗y ⊥ M. Then T∗y ∈ M⊥ for every y ∈ M⊥. Conclusion: T(M) ⊆ M implies T∗(M⊥) ⊆ M⊥. Conversely, since the above implication holds for every T ∈ B[H], it follows that T∗(M⊥) ⊆ M⊥ implies T∗∗(M⊥⊥) ⊆ M⊥⊥. But T∗∗ = T and M⊥⊥ = M⁻ = M (Propositions 5.15 and 5.65(c)). Thus T∗(M⊥) ⊆ M⊥ implies T(M) ⊆ M. Finally, note that {0} ≠ M ≠ H if and only if {0} ≠ M⊥ ≠ H (Proposition 5.15).

Corollary 5.75. A subspace M of a Hilbert space H reduces T ∈ B[H] if and only if it is invariant for both T and T∗. In this case (T|M)∗ = T∗|M.

Proof. Since M⊥⊥ = M, the previous proposition says that T(M⊥) ⊆ M⊥ if and only if T∗(M) ⊆ M. Therefore, T(M) ⊆ M and T(M⊥) ⊆ M⊥ if and only if T(M) ⊆ M and T∗(M) ⊆ M. Moreover, in this case, ⟨(T|M)x ; y⟩ = ⟨Tx ; y⟩ = ⟨x ; T∗y⟩ = ⟨x ; (T∗|M)y⟩ for every x, y in the Hilbert space M.

Recall that N(T) and R(T)⁻ are invariant subspaces for T ∈ B[H] (see Problems 4.20 to 4.22). Null spaces of Hilbert space operators constitute an important source of invariant subspaces. The next result shows that N(T∗)⊥, R(T∗)⊥, N(T∗T), and R(TT∗)⁻ also are invariant subspaces for T ∈ B[H].

Proposition 5.76. If T is a bounded linear transformation of a Hilbert space H into a Hilbert space K, then

(a) N(T) = R(T∗)⊥ = N(T∗T),
(b) R(T)⁻ = N(T∗)⊥ = R(TT∗)⁻,
(a∗) N(T∗) = R(T)⊥ = N(TT∗),
(b∗) R(T∗)⁻ = N(T)⊥ = R(T∗T)⁻.
Proof. Note that x ∈ R(T∗)⊥ if and only if ⟨x ; T∗y⟩ = 0 for every y ∈ K. By the definition of adjoint, this is equivalent to ⟨Tx ; y⟩ = 0 for every y ∈ K, which means that Tx = 0; that is, x ∈ N(T). Hence R(T∗)⊥ = N(T).
Moreover, since ‖Tx‖² = ⟨Tx ; Tx⟩ = ⟨T∗Tx ; x⟩ for every x ∈ H, it follows that N(T∗T) ⊆ N(T). But N(T) ⊆ N(T∗T) trivially, and so N(T) = N(T∗T), which completes the proof of (a). Since (a) holds true for every T ∈ B[H, K], it also holds for T∗ ∈ B[K, H] and TT∗ ∈ B[K]. Therefore (cf. Propositions 5.15 and 5.65(c)),

R(T)⁻ = R(T∗∗)⊥⊥ = N(T∗)⊥ = N(T∗∗T∗)⊥ = N(TT∗)⊥ = R((TT∗)∗)⊥⊥ = R(T∗∗T∗)⊥⊥ = R(TT∗)⁻,

which proves (b). Since T∗∗ = T we get the dual expressions (a∗) and (b∗).

Here is a useful result concerning closed ranges and adjoints.

Proposition 5.77. Let H and K be Hilbert spaces and take any T ∈ B[H, K]. The following assertions are pairwise equivalent.

(a) R(T) = R(T)⁻.
(b) R(T∗) = R(T∗)⁻.
(c) R(T∗T) = R(T∗T)⁻.
(d) R(TT∗) = R(TT∗)⁻.

Proof. Let T be an arbitrary bounded linear transformation of a Hilbert space H into a Hilbert space K.

Proof of (a)⇔(b). Set T₀ = T|N(T)⊥ ∈ B[N(T)⊥, K], the restriction of T to N(T)⊥. Since H = N(T) + N(T)⊥ (cf. Proposition 4.13 and Theorem 5.20), it follows that every x ∈ H can be written as x = u + v with u ∈ N(T) and v ∈ N(T)⊥. If y ∈ R(T), then y = Tx = Tu + Tv = Tv = T|N(T)⊥ v for some x ∈ H, and hence y ∈ R(T|N(T)⊥). Thus R(T) ⊆ R(T|N(T)⊥). Since R(T|N(T)⊥) ⊆ R(T) trivially, we get

R(T₀) = R(T)  and  N(T₀) = {0}

because N(T|N(T)⊥) = {0}. If R(T) = R(T)⁻, then Corollary 4.24 ensures the existence of T₀⁻¹ ∈ B[R(T), N(T)⊥]. Now take an arbitrary w ∈ N(T)⊥ and consider the functional f_w : R(T) → F defined by f_w(y) = ⟨T₀⁻¹y ; w⟩ for every y ∈ R(T), which is linear (reason: T₀⁻¹ is linear and the inner product is linear in the first argument) and bounded (in fact, |f_w(y)| ≤ ‖T₀⁻¹‖‖w‖‖y‖ for every y ∈ R(T)). The Riesz Representation Theorem (Theorem 5.62) says
that there is a z_w in the Hilbert space R(T) (recall: R(T) is a subspace of the Hilbert space K, and so a Hilbert space itself, if R(T) = R(T)⁻) such that f_w(y) = ⟨y ; z_w⟩ for every y ∈ R(T). Consider the decomposition H = N(T) + N(T)⊥ (again) and take any x ∈ H so that x = u + v with u ∈ N(T) and v ∈ N(T)⊥. Then

⟨x ; T∗z_w⟩ = ⟨Tx ; z_w⟩ = ⟨Tu ; z_w⟩ + ⟨Tv ; z_w⟩ = ⟨Tv ; z_w⟩ = f_w(Tv) = ⟨T₀⁻¹Tv ; w⟩ = ⟨T₀⁻¹T₀v ; w⟩ = ⟨v ; w⟩ = ⟨u ; w⟩ + ⟨v ; w⟩ = ⟨x ; w⟩.

Thus ⟨x ; T∗z_w − w⟩ = 0 for every x ∈ H, which means that T∗z_w = w. Therefore, w ∈ R(T∗). This ensures the inclusion N(T)⊥ ⊆ R(T∗). On the other hand, R(T∗) ⊆ R(T∗)⁻ = N(T)⊥ by Proposition 5.76(b∗) so that

R(T∗) = N(T)⊥.

Hence (a) implies (b) by Proposition 5.12 (or by Proposition 5.76(b∗)). Since (a) implies (b), it follows that (b) implies (a) because T∗∗ = T.

Proof of (a)⇒(c) and (b)⇒(d). Let T₁ ∈ B[H, R(T)] be defined by T₁x = Tx for every x ∈ H (i.e., T₁ is surjective and coincides with T on H). Clearly, R(T) = R(T₁). Let T₁∗ ∈ B[R(T), H] be the adjoint of T₁ ∈ B[H, R(T)], consider the restriction to R(T) of the adjoint T∗ ∈ B[K, H] of T, viz., T∗|R(T) ∈ B[R(T), H], and note that ⟨x ; T₁∗y⟩ = ⟨T₁x ; y⟩ = ⟨Tx ; y⟩ = ⟨x ; T∗y⟩ = ⟨x ; T∗|R(T) y⟩ for every x ∈ H and every y ∈ R(T). Then T₁∗y = T∗|R(T) y for every y ∈ R(T) (i.e., T₁∗ = T∗|R(T)), and hence R(T₁∗) = R(T∗|R(T)). Observe that x lies in R(T∗|R(T)) if and only if x = T∗|R(T) y = T∗y for some y in R(T). But this is equivalent to saying that x = T∗Tu for some u in H, which means that x lies in R(T∗T). That is, R(T∗|R(T)) = R(T∗T), so that

R(T₁∗) = R(T∗T).

If R(T) = R(T)⁻, then R(T₁) = R(T₁)⁻. Since (a) implies (b), it follows that R(T₁∗) = R(T₁∗)⁻. Therefore, R(T∗T) = R(T∗T)⁻.
Conclusion: (a) implies (c), and hence (b) implies (d) (because T∗∗ = T).

Proof of (d)⇒(a) and (c)⇒(b). According to Proposition 5.76(b),

R(TT∗) ⊆ R(T) ⊆ R(T)⁻ = R(TT∗)⁻.

Then (d) implies (a), and so (c) implies (b) (because T∗∗ = T).
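In finite dimensions every linear subspace is automatically closed, so Proposition 5.77 collapses to a statement about ranks. The following numerical sketch (not from the text; numpy and a randomly generated matrix are assumptions of the illustration) checks that the four ranges have the same dimension, as forced by N(T) = N(T∗T) and its duals:

```python
import numpy as np

# Finite-dimensional sketch of Proposition 5.77: for a complex matrix T,
# rank T = rank T* = rank T*T = rank TT*, since N(T) = N(T*T) etc.
rng = np.random.default_rng(0)
T = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

r = np.linalg.matrix_rank
ranks = {r(T), r(T.conj().T), r(T.conj().T @ T), r(T @ T.conj().T)}
assert len(ranks) == 1  # all four ranks coincide
```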
We close this section by introducing an important notion. Let A be a unital (complex) Banach algebra (cf. Definition 4.17) and suppose there exists a mapping A → A defined by A ↦ A∗ that satisfies the following conditions for all A, B ∈ A and all α ∈ C.

(i) (A∗)∗ = A,
(ii) (AB)∗ = B∗A∗,
(iii) (A + B)∗ = A∗ + B∗,
(iv) (αA)∗ = ᾱA∗.

Such a mapping is called an involution on A. A C*-algebra is a unital Banach algebra with an involution ∗ : A → A such that

(v) ‖A∗A‖ = ‖A‖²

for every A ∈ A. It is clear that if H is a complex Hilbert space, then B[H] is a C*-algebra, where ∗ is the adjoint operation. Every T ∈ B[H] determines a C*-subalgebra of B[H], denoted by C∗(T), which is the smallest C*-algebra of operators from B[H] containing both T and the identity I. It can be shown that C∗(T) = P(T, T∗)⁻, the closure in B[H] of all polynomials in T and T∗ with complex coefficients. We mention the Gelfand–Naimark Theorem that asserts the converse: Every C*-algebra is isometrically ∗-isomorphic to a C*-subalgebra of B[H]. That is, for every C*-algebra A there exists an isometric algebra isomorphism of A onto a C*-subalgebra of B[H] that preserves the involution ∗. A great deal of the rest of this book can be posed in an abstract C*-algebra. However, we shall stick to B[H].
5.13 Self-Adjoint Operators

Throughout this section H and K will stand for Hilbert spaces. An operator T ∈ B[H] is self-adjoint (or Hermitian) if T∗ = T. By the definition of the adjoint operator, T ∈ B[H] is self-adjoint if and only if

⟨Tx ; y⟩ = ⟨x ; Ty⟩  for every  x, y ∈ H.

Proposition 5.78. If T ∈ B[H] is self-adjoint, then

‖T‖ = sup_{‖x‖=1} |⟨Tx ; x⟩|.
Proof. Let T be any operator in B[H]. The Schwarz inequality says that |⟨Tx ; x⟩| ≤ ‖Tx‖‖x‖ ≤ ‖T‖‖x‖² for every x ∈ H. Then

sup_{‖u‖=1} |⟨Tu ; u⟩| ≤ ‖T‖.

On the other hand,

⟨T(x ± y) ; x ± y⟩ = ⟨Tx ; x⟩ ± ⟨Tx ; y⟩ ± ⟨Ty ; x⟩ + ⟨Ty ; y⟩

for every x, y ∈ H. Therefore, if T = T∗, then ⟨Ty ; x⟩ = ⟨y ; Tx⟩, the complex conjugate of ⟨Tx ; y⟩, and hence ⟨Tx ; y⟩ + ⟨Ty ; x⟩ = 2 Re ⟨Tx ; y⟩, which implies that

⟨T(x ± y) ; x ± y⟩ = ⟨Tx ; x⟩ ± 2 Re ⟨Tx ; y⟩ + ⟨Ty ; y⟩.

Thus

⟨T(x + y) ; x + y⟩ − ⟨T(x − y) ; x − y⟩ = 4 Re ⟨Tx ; y⟩

for every x, y ∈ H. But |⟨Tz ; z⟩| = |⟨T(‖z‖⁻¹z) ; ‖z‖⁻¹z⟩| ‖z‖² for every nonzero vector z in H, and so

|⟨Tz ; z⟩| ≤ sup_{‖u‖=1} |⟨Tu ; u⟩| ‖z‖²

for every z ∈ H. By the above two relations and the parallelogram law,

4 Re ⟨Tx ; y⟩ ≤ |⟨T(x + y) ; x + y⟩| + |⟨T(x − y) ; x − y⟩|
≤ sup_{‖u‖=1} |⟨Tu ; u⟩| (‖x + y‖² + ‖x − y‖²)
= 2 sup_{‖u‖=1} |⟨Tu ; u⟩| (‖x‖² + ‖y‖²) = 4 sup_{‖u‖=1} |⟨Tu ; u⟩|

if ‖x‖ = ‖y‖ = 1. Consider the polar representation ⟨Tx ; y⟩ = |⟨Tx ; y⟩|e^{iθ}, set x′ = e^{−iθ}x, and get |⟨Tx ; y⟩| = e^{−iθ}⟨Tx ; y⟩ = ⟨Tx′ ; y⟩. Since ‖x′‖ = ‖x‖, and since |⟨Tx ; y⟩| = ⟨Tx′ ; y⟩ = Re ⟨Tx′ ; y⟩, it follows that

|⟨Tx ; y⟩| ≤ sup_{‖u‖=1} |⟨Tu ; u⟩|  whenever  ‖x‖ = ‖y‖ = 1.

Therefore, according to Corollary 5.71,

‖T‖ = sup_{‖x‖=‖y‖=1} |⟨Tx ; y⟩| ≤ sup_{‖u‖=1} |⟨Tu ; u⟩|.
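A finite-dimensional sketch of Proposition 5.78 (numpy and the sample matrices are assumptions of the illustration, not from the text): for a Hermitian matrix the supremum of |⟨Tx ; x⟩| over unit vectors is the largest eigenvalue in modulus, which equals the spectral norm; for the nilpotent 2×2 shift the supremum is 1/2, strictly below ‖N‖ = 1, so self-adjointness matters.

```python
import numpy as np

# Hermitian case: sup_{||x||=1} |<Tx ; x>| equals ||T||.
rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
T = B + B.conj().T                          # Hermitian
num_radius = np.abs(np.linalg.eigvalsh(T)).max()
assert np.isclose(num_radius, np.linalg.norm(T, 2))

# Nilpotent shift: ||N|| = 1 but |<Nx ; x>| = |x1 x2|/||x||^2 <= 1/2.
N = np.array([[0.0, 1.0], [0.0, 0.0]])
vals = [abs(np.vdot(x, N @ x)) / (x @ x)
        for x in rng.standard_normal((2000, 2))]
assert max(vals) <= 0.5 + 1e-12
```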
It is worth noticing now that there are non-self-adjoint operators T in B[H] such that ‖T‖ = sup_{‖x‖=1} |⟨Tx ; x⟩|. The class of all operators for which this norm identity holds will be characterized in Chapter 6 (these are the normaloid operators; a class of operators that includes the class of the normal operators of Section 6.1). The next proposition gives a necessary and sufficient condition for an operator to be self-adjoint on a complex Hilbert space.

Proposition 5.79. If H is a complex Hilbert space, then T ∈ B[H] is self-adjoint if and only if ⟨Tx ; x⟩ ∈ R for every x ∈ H.
Proof. If T = T∗, then ⟨Tx ; x⟩ = ⟨x ; Tx⟩, which is the complex conjugate of ⟨Tx ; x⟩, and so ⟨Tx ; x⟩ is a real number, for every x ∈ H. Conversely, if ⟨Tx ; x⟩ ∈ R for every x ∈ H, then ⟨Tx ; x⟩ coincides with its own conjugate, that is, ⟨Tx ; x⟩ = ⟨x ; Tx⟩ for every x ∈ H, and hence (cf. Problem 5.3(b)) ⟨Tx ; y⟩ = ⟨x ; Ty⟩ for every x, y ∈ H; that is, T is self-adjoint.
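On Cⁿ this can be sketched numerically (numpy and random test vectors assumed): the quadratic form of a Hermitian matrix is real for every vector, while a generic non-Hermitian matrix has vectors where it is not.

```python
import numpy as np

# Sketch of Proposition 5.79: <Tx ; x> is real for every x iff T is Hermitian.
rng = np.random.default_rng(3)
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
T = B + B.conj().T                      # Hermitian: real quadratic form
xs = rng.standard_normal((100, 3)) + 1j * rng.standard_normal((100, 3))

# x.conj() @ M @ x is the quadratic form <Mx ; x>.
assert all(abs(np.imag(x.conj() @ T @ x)) < 1e-10 for x in xs)
# Generic non-Hermitian B: some x gives a non-real quadratic form.
assert any(abs(np.imag(x.conj() @ B @ x)) > 1e-6 for x in xs)
```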
Remark: If H is a real Hilbert space, then ⟨Tx ; x⟩ ∈ R for every x ∈ H for all T ∈ B[H]. Since there are non-self-adjoint operators on real Hilbert spaces, the proposition fails when H is a real Hilbert space. Example: If A = [ 0  −1 ; 1  0 ] ∈ B[R²], then O ≠ A ≠ A∗ = [ 0  1 ; −1  0 ] and ⟨Ax ; x⟩ = 0 for all x ∈ R².

Take any z in K, where K is a (real or complex) Hilbert space. Recall that ⟨z ; y⟩ = 0 for every y ∈ K if and only if z = 0. Now take T in B[H, K]. Since T = O if and only if Tx = 0 in K for all x ∈ H, it follows that T = O if and only if ⟨Tx ; y⟩ = 0 for every x ∈ H and every y ∈ K. Next set K = H and take T in B[H]. If H is a complex Hilbert space, then T = O if and only if ⟨Tx ; x⟩ = 0 for all x ∈ H (cf. Problem 5.4). This is in general false for a real Hilbert space (see the above example). However, this certainly holds for every operator (on any Hilbert space) that satisfies the norm identity of Proposition 5.78. In particular, it holds for self-adjoint operators on a real Hilbert space.

Corollary 5.80. Let H be any Hilbert space. If T ∈ B[H] is self-adjoint, then T = O if and only if ⟨Tx ; x⟩ = 0 for all x ∈ H.

Proposition 5.79 leads to a partial ordering of the set of all self-adjoint operators. Let Q ∈ B[H] be a self-adjoint operator. We say that Q is nonnegative (notation: O ≤ Q or Q ≥ O) if 0 ≤ ⟨Qx ; x⟩ for every x ∈ H. If 0 < ⟨Qx ; x⟩ for every nonzero x in H, then Q is called positive (notation: O < Q or Q > O). If there exists a real number α > 0 such that α‖x‖² ≤ ⟨Qx ; x⟩ for every x ∈ H, then Q is called strictly positive (notation: O ≺ Q or Q ≻ O). Trivially,

O ≺ Q  =⇒  O < Q  =⇒  O ≤ Q.
Observe that T∗T ∈ B[H] and TT∗ ∈ B[K] are self-adjoint operators for every T ∈ B[H, K]. In fact, since 0 ≤ ‖Tx‖² = ⟨Tx ; Tx⟩ = ⟨T∗Tx ; x⟩ for every x ∈ H, it follows that O ≤ T∗T. Dually, since T∗∗ = T, it also follows that O ≤ TT∗. The next proposition uses this fact to give necessary and sufficient conditions that a projection be an orthogonal projection.

Proposition 5.81. If P ∈ B[H] is a nonzero projection, then the following assertions are pairwise equivalent.

(a) P is an orthogonal projection.
(b) P is self-adjoint.
(c) P is nonnegative.
(d) ‖P‖ = 1.
(e) ‖P‖ ≤ 1.

Proof. Let P be an orthogonal projection. Proposition 5.51 says that R(P) = R(P)⁻ (and so R(P∗) = R(P∗)⁻ by Proposition 5.77) and R(P) = N(P)⊥. But N(P)⊥ = R(P∗)⁻ by Proposition 5.76. Therefore, R(P) = R(P∗). Now take an arbitrary x ∈ H. The above identity ensures that Px = P∗z for some z ∈ H, and hence P∗Px = (P∗)²z = (P²)∗z = P∗z = Px. That is, P∗P = P, which implies that P is self-adjoint. Therefore (a)⇒(b). If P = P∗ (i.e., if P is self-adjoint), then ⟨Px ; x⟩ = ⟨P²x ; x⟩ = ⟨Px ; Px⟩ = ‖Px‖² ≥ 0 for every x ∈ H. This shows that (b)⇒(c). If P ≥ O (i.e., if P is nonnegative), then P = P∗ (by definition) so that ‖P‖² = ‖P∗P‖ = ‖P²‖ = ‖P‖ (cf. Proposition 5.65), and so ‖P‖ = 1 (since P ≠ O), which implies that ‖P‖ ≤ 1. Thus (c)⇒(d)⇒(e). Finally, suppose ‖P‖ ≤ 1 and take an arbitrary v ∈ N(P)⊥. Since R(I − P) = N(P), we get (I − P)v ∈ N(P). Then (I − P)v ⊥ v so that 0 = ⟨(I − P)v ; v⟩ = ‖v‖² − ⟨Pv ; v⟩ and hence ‖v‖² = ⟨Pv ; v⟩ ≤ ‖Pv‖‖v‖ ≤ ‖P‖‖v‖² ≤ ‖v‖². This implies that ‖Pv‖ = ‖v‖ = ⟨Pv ; v⟩^{1/2}. Therefore, ‖(I − P)v‖² = ‖Pv − v‖² = ‖Pv‖² − 2 Re ⟨Pv ; v⟩ + ‖v‖² = 0, and so v ∈ N(I − P) = R(P). Thus N(P)⊥ ⊆ R(P). On the other hand, if y ∈ R(P), then y = u + v, where u ∈ N(P) and v ∈ N(P)⊥ (because R(P) ⊆ H = N(P) + N(P)⊥). Since N(P)⊥ ⊆ R(P) = {x ∈ H: Px = x}, we get y = Py = Pu + Pv = Pv = v, and so y ∈ N(P)⊥. Hence R(P) ⊆ N(P)⊥. Then R(P) = N(P)⊥ so that R(P) ⊥ N(P). Outcome: (e)⇒(a).

Take S, T ∈ B[H]. If T − S is self-adjoint (in particular, if T and S are self-adjoint) and O ≤ T − S, then we write S ≤ T so that S ≤ O means O ≤ −S. It is easy to show that ≤ defines a reflexive, transitive, and antisymmetric relation on the set of all self-adjoint operators on H, and so a partial ordering of it. Similarly, we write S < T if O < T − S and S ≺ T if O ≺ T − S (and hence S < O means O < −S and S ≺ O means O ≺ −S). Observe that

O ≺ Q  ⇐⇒  αI ≤ Q for some α > 0.
Moreover, ‖T‖ ≤ 1 (i.e., T is a contraction) if and only if T∗T ≤ I, and ‖T‖ < 1 (i.e., T is a strict contraction) if and only if T∗T ≺ I (i.e., T∗T ≤ βI for some β ∈ (0, 1)). Why?

Proposition 5.82. If Q ∈ B[H] is nonnegative, then

|⟨Qx ; y⟩|² ≤ ⟨Qx ; x⟩⟨Qy ; y⟩

for every x, y ∈ H, and hence

‖Qx‖² ≤ ‖Q‖⟨Qx ; x⟩  for every  x ∈ H.
Proof. Take O ≤ Q ∈ B[H]. Consider the function ⟨· ; ·⟩_Q : H×H → F given by

⟨x ; y⟩_Q = ⟨Qx ; y⟩  for every  x, y ∈ H.

It is easy to verify that ⟨· ; ·⟩_Q is a semi-inner product on H. Let ‖·‖_Q be the seminorm induced on H by ⟨· ; ·⟩_Q so that ‖x‖²_Q = ⟨Qx ; x⟩ for every x ∈ H. Since the Schwarz inequality holds in a semi-inner product space, we get

|⟨Qx ; y⟩|² = |⟨x ; y⟩_Q|² ≤ ‖x‖²_Q ‖y‖²_Q = ⟨Qx ; x⟩⟨Qy ; y⟩

for every x, y ∈ H. In particular, by setting y = Qx,

‖Qx‖⁴ = ⟨Qx ; Qx⟩² ≤ ⟨Qx ; x⟩⟨Q²x ; Qx⟩ ≤ ⟨Qx ; x⟩‖Q²x‖‖Qx‖ ≤ ⟨Qx ; x⟩‖Q‖‖Qx‖²,

which implies that ‖Qx‖² ≤ ‖Q‖⟨Qx ; x⟩, for every x ∈ H.
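Both inequalities of Proposition 5.82 can be sketched numerically on Cⁿ with a random positive semidefinite Q (numpy and the random data are assumptions of the illustration):

```python
import numpy as np

# Sketch of Proposition 5.82: generalized Schwarz inequality for Q >= O.
rng = np.random.default_rng(4)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q = B.conj().T @ B                      # Q = B*B is positive semidefinite
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y = rng.standard_normal(4) + 1j * rng.standard_normal(4)

qf = lambda u, v: v.conj() @ Q @ u      # the semi-inner product <Qu ; v>
assert abs(qf(x, y)) ** 2 <= (qf(x, x) * qf(y, y)).real + 1e-8
assert np.linalg.norm(Q @ x) ** 2 <= np.linalg.norm(Q, 2) * qf(x, x).real + 1e-8
```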
Positive operators do not necessarily have a bounded inverse. Indeed,

O < Q  ⇐⇒  O ≤ Q and N(Q) = {0}  ⇐⇒  O ≤ Q and R(Q)⁻ = H.

The nontrivial part of the first equivalence follows from Proposition 5.82, and the second one is a straightforward consequence of Propositions 5.15 and 5.76(a∗): R(Q)⊥ = N(Q∗) = N(Q) so that R(Q)⁻ = H if and only if N(Q) = {0}. Therefore, Q > O has an inverse on its range, which is not necessarily bounded. However, strictly positive operators are invertible (i.e., have a bounded inverse). In fact, if Q ≻ O, then Q is bounded below (Schwarz inequality). Conversely, if Q ≥ O is bounded below, then Q ≻ O (cf. Proposition 5.82). Thus, by the above displayed equivalence and Proposition 4.24,

O ≺ Q  ⇐⇒  O ≤ Q ∈ G[H]  ⇐⇒  O ≤ Q and R(Q) = H.
Note: If H is finite dimensional, then Q ≻ O if and only if Q > O. Why?

Example 5.R. Take a sequence a = {αk}_{k=0}^∞ in ℓ₊∞ and consider the diagonal operator Da = diag({αk}_{k=0}^∞) in B[ℓ₊²] of Examples 4.H and 4.J, now acting on the Hilbert space ℓ₊². It is readily verified that

Da = Da∗  if and only if  αk ∈ R for all k ≥ 0,
O ≤ Da   if and only if  αk ≥ 0 for all k ≥ 0,
O < Da   if and only if  αk > 0 for all k ≥ 0,
O ≺ Da   if and only if  Da = Da∗ and inf_k αk > 0.
Let B +[H] denote the weakly closed convex cone of all nonnegative operators in B[H] (see Problem 5.49) and set G + [H] = B + [H] ∩ G[H]: the class of all strictly positive operators on H. The following is a rather useful corollary of Proposition 5.82.
Corollary 5.83. If {Qn} is a sequence of nonnegative operators on H (i.e., if Qn ∈ B⁺[H] for every integer n), then

Qn −w→ O  if and only if  Qn −s→ O.

Proof. Recall that strong convergence always implies weak convergence (to the same limit). On the other hand, since H is a Hilbert space, it follows that sup_n ‖Qn‖ < ∞ whenever {Qn} converges weakly. Therefore, according to Proposition 5.82, Qn −w→ O implies Qn −s→ O, because

‖Qn x‖² ≤ sup_k ‖Qk‖ ⟨Qn x ; x⟩

for every x ∈ H and every integer n (see Proposition 5.67).

An immediate consequence of the above corollary reads as follows. If Tn in B[H] is such that, for each n ≥ 1, either O ≤ T − Tn or O ≤ Tn − T, then Tn −w→ T if and only if Tn −s→ T. This is what is behind the next proposition. Let {Tn} be a B[H]-valued sequence. We say that {Tn} is increasing or decreasing if O ≤ Tn+1 − Tn or O ≤ Tn − Tn+1, respectively, for every integer n. If it is increasing or decreasing, then we say that {Tn} is monotone.

Proposition 5.84. A bounded monotone sequence of self-adjoint operators converges strongly.

Proof. Suppose {Tn} is a bounded decreasing sequence of self-adjoint operators in B[H] and take an arbitrary x ∈ H. Thus {⟨Tn x ; x⟩} is a real-valued bounded decreasing sequence (see Proposition 5.79 and recall that sup_n |⟨Tn x ; x⟩| ≤ sup_n ‖Tn‖‖x‖²). Therefore, {⟨Tn x ; x⟩} converges in R (Problem 3.10), and so in F, which implies that Tn − T −w→ O for some T ∈ B[H] (Problem 5.48). Moreover, as {Tn} is decreasing, ⟨Tn x ; x⟩ ≥ ⟨Tn+k x ; x⟩ → ⟨Tx ; x⟩ as k → ∞ so that Tn − T ∈ B⁺[H] for every integer n, and hence Corollary 5.83 ensures that Tn − T −s→ O. If {Tn} is a bounded increasing sequence of self-adjoint operators, then it is clear that {−Tn} is a bounded decreasing sequence of self-adjoint operators. The above argument ensures that {−Tn} also converges strongly, and so does {Tn}.
5.14 Square Root and Polar Decomposition

Throughout this section H and K will again stand for Hilbert spaces. If T is an operator in B[H], and if there exists an operator S in B[H] such that S² = T, then S is referred to as a square root of T. Nonnegative operators have a unique nonnegative square root.

Theorem 5.85. Every operator Q in B⁺[H] has a unique square root Q^{1/2} in B⁺[H], which commutes with every operator in B[H] that commutes with Q.
Proof. (a) First we show that there is no loss of generality in assuming that Q is a nonnegative contraction (i.e., that O ≤ Q ≤ I — cf. Problem 5.55). In fact, if O ≤ Q and O ≠ Q, then O ≤ ‖Q‖⁻¹Q ≤ I (because ⟨‖Q‖⁻¹Qx ; x⟩ = ‖Q‖⁻¹⟨Qx ; x⟩ ≤ ‖Q‖⁻¹‖Q‖‖x‖² = ⟨x ; x⟩ for each x ∈ H). If there is a unique S ≥ O such that S² = ‖Q‖⁻¹Q, then the operator Q^{1/2} = ‖Q‖^{1/2}S is such that Q^{1/2} ≥ O and (Q^{1/2})² = ‖Q‖S² = Q. Moreover, Q^{1/2} is the unique nonnegative operator such that (Q^{1/2})² = Q (since S = ‖Q‖^{−1/2}Q^{1/2} is the unique nonnegative operator such that S² = ‖Q‖⁻¹Q). Finally, if ‖Q‖⁻¹QT = T‖Q‖⁻¹Q implies ST = TS, then (as S = ‖Q‖^{−1/2}Q^{1/2}) QT = TQ implies Q^{1/2}T = TQ^{1/2}.

(b) Thus suppose O ≤ Q ≤ I and set R = I − Q so that O ≤ R ≤ I. Consider the sequence {Bn}_{n=0}^∞ in B[H] recursively defined by

Bn+1 = ½(R + Bn²)  with  B0 = O.
Claim 1. O ≤ Bn for every integer n ≥ 0.

Proof. The proof goes by induction. First observe that B0 is trivially self-adjoint. If Bn is self-adjoint for some n ≥ 0, then Bn+1 is self-adjoint because (Bn²)∗ = (Bn∗)² = Bn² and R = R∗. Hence Bn is self-adjoint for every n ≥ 0. Now note that O ≤ B0 trivially. If O ≤ Bn for some n ≥ 0, then ⟨Bn+1 x ; x⟩ = ½(⟨Rx ; x⟩ + ‖Bn x‖²) ≥ 0 for every x ∈ H because O ≤ R.

Claim 2. ‖Bn‖ ≤ 1 so that Bn ≤ I for all n ≥ 0.

Proof. Since O ≤ R ≤ I it follows that ‖R‖ ≤ 1 by Proposition 5.78: ‖R‖ = sup_{‖x‖=1} ⟨Rx ; x⟩ ≤ sup_{‖x‖=1} ⟨x ; x⟩ = 1. Trivially, ‖B0‖ ≤ 1. If ‖Bn‖ ≤ 1 for some n ≥ 0, then ‖Bn+1‖ ≤ ½(‖R‖ + ‖Bn‖²) ≤ 1, proving by induction that ‖Bn‖ ≤ 1 for all n ≥ 0. Then O ≤ Bn ≤ I for all n ≥ 0 (cf. Problem 5.55).

Claim 3. {Bn} is an increasing sequence.

Proof. It is readily verified by induction that each Bn is a polynomial in R with positive coefficients. Hence Bn Bm = Bm Bn for every m, n ≥ 0. Thus

Bn+2 − Bn+1 = ½(Bn+1² − Bn²) = ½(Bn+1 − Bn)(Bn+1 + Bn)

for each n ≥ 0. Note that B1 − B0 = ½R and B2 − B1 = ⅛R² are polynomials in R with positive coefficients. Since Bn+1 + Bn is a polynomial in R with positive coefficients (because each Bn is), it follows by the above displayed identity that Bn+2 − Bn+1 is a polynomial in R with positive coefficients whenever Bn+1 − Bn is. This proves by induction that Bn+1 − Bn is a polynomial in R with positive coefficients for every n ≥ 0, and so Bn+1 − Bn ≥ O for every n ≥ 0 because R ≥ O (cf. Problem 5.52(e)). Since {Bn} is a bounded monotone sequence of self-adjoint operators, it converges strongly by Proposition 5.84. That is,

Bn −s→ B  and hence  I − Bn −s→ I − B
for some B ∈ B[H]. Since O ≤ Bn ≤ I (i.e., Bn and I − Bn lie in B⁺[H]) for each n, and since B⁺[H] is weakly (thus strongly) closed in B[H] (cf. Problem 5.49 and Proposition 5.68), it follows that O ≤ B ≤ I. Moreover, as Bn −s→ B, we get Bn² −s→ B² (cf. Problem 4.46(a)). Therefore,

O ≤ B = ½(R + B²) ≤ I.

Thus R = 2B − B², and so Q = I − R = B² − 2B + I = (I − B)². Recalling that O ≤ I − B ≤ I, set Q^{1/2} = I − B. Then O ≤ Q^{1/2} ≤ I and (Q^{1/2})² = Q so that there exists a nonnegative square root Q^{1/2} of Q.

(c) Suppose TQ = QT for some T ∈ B[H]. Thus T(I − R) = (I − R)T so that TR = RT, and hence Tp(R) = p(R)T for every polynomial p in R. Therefore TBn = BnT (cf. proof of Claim 3) for each n. Since TBn −s→ TB and since BnT −s→ BT, it follows that TB = BT (the strong limit is unique). Then T(I − B) = (I − B)T so that TQ^{1/2} = Q^{1/2}T. That is, Q^{1/2} commutes with every operator that commutes with Q.
(d) We show that the nonnegative square root Q^{1/2} of Q is unique. If A ≥ O is such that A² = Q, then AQ = A³ = QA, and so AQ^{1/2} = Q^{1/2}A by (c). Then

(Q^{1/2} − A)Q^{1/2}(Q^{1/2} − A) + (Q^{1/2} − A)A(Q^{1/2} − A)
= (Q^{1/2} − A)(Q^{1/2} + A)(Q^{1/2} − A) = (Q^{1/2} − A)(Q − A²) = O.

But (Q^{1/2} − A)Q^{1/2}(Q^{1/2} − A) ≥ O and (Q^{1/2} − A)A(Q^{1/2} − A) ≥ O (since Q^{1/2} and A are nonnegative — cf. Problem 5.51(a)), which implies that these two operators are null, and so is their difference. That is,

(Q^{1/2} − A)³ = (Q^{1/2} − A)Q^{1/2}(Q^{1/2} − A) − (Q^{1/2} − A)A(Q^{1/2} − A) = O,

and therefore (Q^{1/2} − A)⁴ = O. Since (Q^{1/2} − A) is self-adjoint, ‖Q^{1/2} − A‖⁴ = ‖(Q^{1/2} − A)⁴‖ = 0 (see Problem 5.45(e)) so that A = Q^{1/2}.

Proposition 5.86. If Q ∈ B⁺[H], then
(a) ‖Q^{1/2}‖² = ‖Q‖ = ‖Q²‖^{1/2},

(b) N(Q^{1/2}) = N(Q) = N(Q²)  and  R(Q^{1/2})⁻ = R(Q)⁻ = R(Q²)⁻.
Proof. If Q is nonnegative, then Q² is nonnegative (Problem 5.52). Hence Q is the unique nonnegative square root of Q² (Theorem 5.85). Therefore, the identities involving Q and Q² follow at once if the identities involving Q^{1/2} and Q are established.

(a) Q^{1/2} = (Q^{1/2})∗ implies ‖Q^{1/2}‖² = ‖(Q^{1/2})²‖ = ‖Q‖ (Proposition 5.65).

(b) Since Q = Q^{1/2}Q^{1/2}, we get N(Q^{1/2}) ⊆ N(Q). On the other hand, ‖Q^{1/2}x‖² = ⟨Q^{1/2}x ; Q^{1/2}x⟩ = ⟨Qx ; x⟩ for each x ∈ H and so N(Q) ⊆ N(Q^{1/2}). Thus N(Q) = N(Q^{1/2}). Then R(Q)⁻ = N(Q)⊥ = N(Q^{1/2})⊥ = R(Q^{1/2})⁻ (Proposition 5.76).
A partial isometry is a bounded linear transformation that acts isometrically on the orthogonal complement of its null space. That is, W ∈ B[H, K] is a partial isometry if W|N(W)⊥ : N(W)⊥ → K, the restriction of W to N(W)⊥, is an isometry. Note: In this context, "isometry" means "linear isometry".

Proposition 5.87. If W : H → K is a partial isometry, then W = VP, where V : N(W)⊥ → K is an isometry and P : H → H is the orthogonal projection onto N(W)⊥. Conversely, let M be any subspace of H. If V : M → K is an isometry and P : H → H is the orthogonal projection onto M, then W = VP : H → K is a partial isometry.

Proof. If W : H → K is a partial isometry, then let P : H → H be the orthogonal projection onto N(W)⊥ (so that R(P) = N(W)⊥, and therefore R(I − P) = N(P) = R(P)⊥ = N(W) — cf. Propositions 4.13, 5.15, and 5.51). Set V = W|N(W)⊥ : N(W)⊥ → K, which is an isometry. Thus, for every x ∈ H,

Wx = W(Px + (I − P)x) = WPx + W(I − P)x = WPx = W|N(W)⊥ Px = VPx.

Conversely, let V : M → K be an isometry, and let P : H → H be the orthogonal projection onto M, where M is an arbitrary subspace of H. If W = VP, then W|M v = Wv = VPv = Vv for every v ∈ R(P) = M, and hence W acts isometrically on M. Moreover, if Wu = 0, then VPu = 0 so that Pu ∈ N(V) = {0} (recall that V is an isometry), which implies that u ∈ N(P). Therefore, N(W) ⊆ N(P), and so N(W) = N(P) (reason: N(P) ⊆ N(W) because W = VP). Thus M = R(P) = N(P)⊥ = N(W)⊥. Conclusion: W = VP : H → K is a bounded linear transformation (composition of bounded linear transformations) that acts isometrically on N(W)⊥; that is, W is a partial isometry.

Since W = VP and V = W|N(W)⊥, ‖V‖ ≤ ‖W‖ = ‖VP‖ ≤ ‖V‖‖P‖ and R(W) ⊆ R(V) = R(W|N(W)⊥) ⊆ R(W). Then

‖W‖ = ‖V‖ = 1  and  R(W) = R(V)

(Proposition 5.81(e) and Problem 4.41(a)). Recall that the range of a linear isometry on a Banach space is a subspace (Problem 4.41(b)). That is,

R(V) = R(V)⁻  and hence  R(W) = R(W)⁻.
Thus the range of a partial isometry is also a subspace. The subspaces N (W )⊥ and R(W ) (of H and K) are called initial and final spaces of the partial isometry W ∈ B[H, K], respectively. Proposition 5.88. Take W ∈ B[H, K]. The following assertions are pairwise equivalent .
(a) W is a partial isometry with initial space M and final space R.
(b) W∗W is the orthogonal projection onto M and R(W) = R.
(c) WW∗W = W.
(d) W∗WW∗ = W∗.
(e) WW∗ is the orthogonal projection onto R and N(W)⊥ = M.
(f) W∗ is a partial isometry with initial space R and final space M.

Proof. Let W be a bounded linear transformation of H into K.

Proof of (a)⇒(c). If (a) holds, then W = VP, where V : M → K is an isometry, P : H → H is the orthogonal projection onto M, and M = N(W)⊥ (Proposition 5.87). Consider the adjoint V∗ : K → M of V, and recall from Proposition 5.72 that V∗V = I, the identity on the Hilbert space M (since M = N(W)⊥ is a subspace of the Hilbert space H by Proposition 5.12, and hence a Hilbert space itself). Since P is an orthogonal projection, it follows by Proposition 5.81 that P = P∗ and therefore W∗ = P∗V∗ = PV∗. Hence

WW∗W = VPPV∗VP = VP = W.

Proof of (c)⇒(b). If WW∗W = W, then (W∗W)² = W∗WW∗W = W∗W so that W∗W is a (continuous) projection, and so it has a closed range (reason: if E = E², then R(E) = N(I − E), and N(I − E) is closed whenever the linear transformation E is bounded — see Proposition 4.13). Therefore,

R(W∗W) = R(W∗W)⁻ = N((W∗W)∗)⊥ = N(W∗W)⊥ = N(W)⊥

(Proposition 5.76). Thus R(W∗W) = N(W)⊥ and R(W∗W) ⊥ N(W∗W), which implies that W∗W is the orthogonal projection onto M = N(W)⊥.

Proof of (b)⇒(a). If (b) holds, then M = R(W∗W) = R(W∗W)⁻ = N(W)⊥ (Propositions 5.51 and 5.76). If v ∈ M = R(W∗W), then W∗Wv = v, and hence ‖Wv‖² = ⟨Wv ; Wv⟩ = ⟨W∗Wv ; v⟩ = ⟨v ; v⟩ = ‖v‖² so that W|M is an isometry. Since M = N(W)⊥, it follows that (b) implies (a).

Conclusion: Assertions (a), (b), and (c) are pairwise equivalent. If W∗W is an orthogonal projection, then R(W∗W) is closed, which implies that R(W∗) and R(W) are closed as well. Similarly, if W∗ is a partial isometry, then R(W∗) is closed, and so is R(W).
Therefore, in both cases, R(W ∗ ) = N (W )⊥ and N (W ∗ )⊥ = R(W ). Since assertions (a), (b), and (c) are pairwise equivalent, it follows that the dual assertions (f), (e), and (d) are pairwise equivalent too. Finally, observe that (c) and (d) are trivially equivalent. If a transformation T ∈ B[H, K] is the product of a partial isometry W ∈ B[H, K] and a nonnegative operator Q ∈ B[H], and if N (W ) = N (Q), then the representation T = W Q is called the polar decomposition of T . The next theorem says that every bounded linear transformation in B[H, K] has a unique polar decomposition.
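A concrete partial isometry is easy to build on Cⁿ from a singular value decomposition: keeping only the singular vectors with nonzero singular value gives a matrix that satisfies the characterization WW∗W = W of Proposition 5.88. A numerical sketch (numpy, the SVD construction, and the random rank-deficient matrix are assumptions of the illustration):

```python
import numpy as np

# Build a partial isometry from the SVD of a rank-2 matrix T = U S V*:
# W = U_r V_r* maps the initial space N(W)^perp isometrically onto R(W).
rng = np.random.default_rng(6)
T = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))  # rank <= 2

U, s, Vh = np.linalg.svd(T)
r = int(np.sum(s > 1e-10))
W = U[:, :r] @ Vh[:r, :]

assert np.allclose(W @ W.T @ W, W)       # Proposition 5.88(c)
P = W.T @ W                              # orthogonal projection onto N(W)^perp
assert np.allclose(P, P.T) and np.allclose(P @ P, P)
```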
Theorem 5.89. If T lies in B[H, K], then there exists a partial isometry W in B[H, K] with initial space N(T)⊥ and final space R(T)⁻ such that

T = W(T∗T)^{1/2}

and N(W) = N((T∗T)^{1/2}). Moreover, if T = ZQ, where Q is a nonnegative operator in B[H] and Z is a partial isometry in B[H, K] with N(Z) = N(Q), then Q = (T∗T)^{1/2} and Z = W.

Proof. Take T in B[H, K]. Recall that (Propositions 5.76 and 5.86) T∗T is in B⁺[H] and

R((T∗T)^{1/2})⁻ = R(T∗T)⁻ = N(T∗T)⊥ = N((T∗T)^{1/2})⊥ = N(T)⊥.
Existence. Consider a mapping V₀ : R((T∗T)^{1/2}) ⊆ H → K defined by

V₀(T∗T)^{1/2}x = Tx  for every  x ∈ H

so that R(V₀) ⊆ R(T). Note that V₀ is linear. Indeed, if y, z ∈ R((T∗T)^{1/2}), then y = (T∗T)^{1/2}u and z = (T∗T)^{1/2}v for some u and v in H, and hence

V₀(αy + βz) = V₀(α(T∗T)^{1/2}u + β(T∗T)^{1/2}v) = V₀(T∗T)^{1/2}(αu + βv) = T(αu + βv) = αTu + βTv = αV₀(T∗T)^{1/2}u + βV₀(T∗T)^{1/2}v = αV₀y + βV₀z

for every α, β ∈ F. Moreover, since

‖Tx‖² = ⟨Tx ; Tx⟩ = ⟨T∗Tx ; x⟩ = ⟨(T∗T)^{1/2}x ; (T∗T)^{1/2}x⟩ = ‖(T∗T)^{1/2}x‖²,

it follows that ‖V₀(T∗T)^{1/2}x‖ = ‖Tx‖ = ‖(T∗T)^{1/2}x‖, for every x ∈ H. Hence ‖V₀y‖ = ‖y‖ for every y ∈ R((T∗T)^{1/2}) so that V₀ : R((T∗T)^{1/2}) → K is a linear isometry. Extend it to R(T∗T)⁻ = R((T∗T)^{1/2})⁻ and get the mapping

V : R(T∗T)⁻ → K.

This is a linear isometry with R(V) ⊆ R(V₀)⁻ ⊆ R(T)⁻. Indeed, since the mapping V₀ : R((T∗T)^{1/2}) → R(V₀) is an isometric isomorphism, it follows by Corollary 4.38 that V : R((T∗T)^{1/2})⁻ → R(V₀)⁻ is again an isometric isomorphism. Since V(T∗T)^{1/2}x = V₀(T∗T)^{1/2}x = Tx for every x ∈ H,

T = V(T∗T)^{1/2},

and so R(T) ⊆ R(V), where

V : N(T)⊥ → K is a linear isometry with R(V) = R(T)⁻.

(Recall: N(T)⊥ is a Hilbert space so that R(V) = R(V)⁻.) Let P : H → H be the orthogonal projection onto N(T)⊥. Then R(P) = N(T)⊥, and so VPy = Vy for every y ∈ N(T)⊥ = R((T∗T)^{1/2})⁻. Thus VP(T∗T)^{1/2}x = V(T∗T)^{1/2}x for each x ∈ H. That is, VP(T∗T)^{1/2} = V(T∗T)^{1/2}. With W = VP : H → K we get
T = W(T∗T)^{1/2}.

Since V = W|R(P) = W|N(T)⊥ is an isometry and P is an orthogonal projection, we get N(W) = N(VP) = N(P) = R(P)⊥ = N(T) = N((T∗T)^{1/2}) and R(W) = R(V) = R(T)⁻. Therefore,

W : H → K is a partial isometry

with initial space N(T)⊥, final space R(T)⁻, and N(W) = N((T∗T)^{1/2}).

Uniqueness. Let Z ∈ B[H, K] be a partial isometry with N(Z) = N(Q), where Q is a nonnegative operator in B[H]. Suppose T = ZQ. Since Z∗Z is the orthogonal projection onto N(Z)⊥ = N(Q)⊥ = R(Q)⁻ (Propositions 5.76(b∗) and 5.88(b)), we get T∗T = QZ∗ZQ = Q². Thus Q = (T∗T)^{1/2} by uniqueness of the nonnegative square root. Thus Z(T∗T)^{1/2} = T = W(T∗T)^{1/2} so that Z|R((T∗T)^{1/2}) = W|R((T∗T)^{1/2}), and therefore

N(Z) = N(W)  and  Z|N(Z)⊥ = W|N(W)⊥,

for R((T∗T)^{1/2})⁻ = R(T∗T)⁻ = N(T)⊥ and N(Z) = N(Q) = N((T∗T)^{1/2}) = N(T) = N(W). Conclusion: The partial isometries Z : H → K and W : H → K have the same initial space and they coincide there. That is, Z = W.

If T = WQ is the polar decomposition of T, then W∗W is the orthogonal projection onto N(W)⊥ = N(Q)⊥ = R(Q)⁻ so that W∗WQ = Q. Thus

T = WQ  implies  W∗T = Q.
Here is another corollary of Theorem 5.89.

Corollary 5.90. Let T = WQ be the polar decomposition of T ∈ B[H, K].
(a) W ∈ B[H, K] is an isometry if and only if N(T) = {0}.
(b) W ∈ B[H, K] is a coisometry if and only if R(T)⁻ = K.

Proof. Recall that N(T) = N(W) and R(W) = R(T)⁻ whenever T = WQ is the polar decomposition of T. (a) A partial isometry W is an isometry if and only if N(W)⊥ = H, which means that N(W) = {0}. (b) W is a coisometry if and only if W* is an isometry. But W* is a partial isometry with N(W*)⊥ = R(W) (Proposition 5.88). Hence W* is an isometry if and only if R(W) = K.

Recall that a unitary transformation is precisely a surjective isometry or, equivalently, an isometry and a coisometry. Also recall that a transformation is called quasiinvertible if it is injective and has dense range. An immediate consequence of the preceding results reads as follows.
Corollary 5.91. If T = WQ is the polar decomposition of T ∈ B[H, K], then W is unitary if and only if T is quasiinvertible.

In particular, if T is invertible (i.e., if T ∈ G[H, K]), then T = U(T*T)^{1/2}, where U ∈ G[H, K] is unitary. If T is an operator on H (i.e., if T ∈ B[H]), then both W and Q are again operators on H. However, the factors of the polar decomposition of an operator T do not necessarily commute (i.e., in general WQ ≠ QW). The class of all operators for which the polar decomposition commutes will be characterized in Chapter 6 (cf. Proposition 6.4). These are the quasinormal operators, a class of operators that includes the class of the normal operators of Section 6.1.
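The factorization can be seen concretely in finite dimensions. The following numpy sketch (an illustration, not the book's construction; the matrix size and random seed are arbitrary choices) computes Q = (T*T)^{1/2} from the spectral decomposition of the positive operator T*T and, for an invertible T, recovers the unitary factor promised by Corollary 5.91:

```python
import numpy as np

# Sketch: polar decomposition T = WQ of an invertible complex matrix.
rng = np.random.default_rng(0)
T = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

TsT = T.conj().T @ T                      # T*T is positive (and invertible here)
w, V = np.linalg.eigh(TsT)                # spectral decomposition T*T = V diag(w) V*
Q = V @ np.diag(np.sqrt(w)) @ V.conj().T  # the unique nonnegative square root
W = T @ np.linalg.inv(Q)                  # since T is invertible, W is unitary

assert np.allclose(W @ Q, T)                              # T = WQ
assert np.allclose(W.conj().T @ W, np.eye(4), atol=1e-8)  # W*W = I
assert np.allclose(W @ W.conj().T, np.eye(4), atol=1e-8)  # WW* = I: W is unitary
```

For a noninvertible T the same Q still exists, but W must then be built as a partial isometry on N(T)⊥, exactly as in the proof of Theorem 5.89.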
Suggested Reading

Akhiezer and Glazman [1], Arveson [1], Bachman and Narici [1], Balakrishnan [1], Beals [1], Berberian [3], Berezansky, Sheftel, and Us [1], Conway [1], [2], Davidson [1], Dunford and Schwartz [2], Fillmore [1], [2], Furuta [1], Gohberg and Kreĭn [1], Goldberg [1], Halmos [1], [4], Kato [1], Kubrusly [1], [2], Lax [1], Murphy [1], Naylor and Sell [1], Pearcy [1], [2], Putnam [1], Radjavi and Rosenthal [1], Reed and Simon [1], Riesz and Sz.-Nagy [1], Schatten [1], Stone [1], Sz.-Nagy and Foiaş [1], Weidmann [1], Yoshino [1]
Problems

Problem 5.1. Let ‖ ‖ be the norm induced by an inner product ⟨ ; ⟩ on a linear space X ≠ {0}. Show that, for every x ∈ X,

‖x‖ = sup_{‖y‖=1} |⟨x ; y⟩| = sup_{‖y‖≤1} |⟨x ; y⟩| = sup_{y≠0} |⟨x ; y⟩|/‖y‖.

Hint: If ‖x‖ ≠ 0, then ‖x‖ = ⟨x ; x/‖x‖⟩. Use the Schwarz inequality.
Problem 5.2. Let (X, ⟨ ; ⟩) be an inner product space and let ‖ ‖ be the norm on X induced by ⟨ ; ⟩. Take arbitrary vectors x and y in X and consider the following assertions.
(a) ‖x + y‖ = ‖x‖ + ‖y‖.
(a′) ‖x − y‖ = |‖x‖ − ‖y‖|.
(b) |⟨x ; y⟩| = ‖x‖‖y‖.
(b′) ⟨x ; y⟩ = ‖x‖‖y‖.
(c) x and y are collinear (i.e., either one of them is null or y = αx for some nonzero α in F).
(c′) x and y are proportional (i.e., either one of them is null or y = αx for some real α > 0).
Show that the diagram below exhibits all possible implications among the above assertions.

(a) ⟺ (a′) ⟺ (b′) ⟺ (c′)
              ⇓       ⇓
             (b)  ⟺  (c)

Hint: Consider the auxiliary assertion
(b″) Re⟨x ; y⟩ = ‖x‖‖y‖.
Recall that ‖x ± y‖² = ‖x‖² ± 2 Re⟨x ; y⟩ + ‖y‖² for every x, y ∈ X. Use this identity to show that (a)⇔(b″), (a′)⇔(b″) and, by setting z = ⟨x ; y⟩‖y‖⁻²y and z = |⟨x ; y⟩|‖y‖⁻²y, that (b)⇒(c) and (b′)⇒(c′). Now use the Schwarz inequality to show that (b′)⇔(b″). The remaining implications are trivially verified. Show that (c) does not imply (c′) and conclude that there may be no other implication in the above diagram.

Problem 5.3. Let X and Y be inner product spaces and take arbitrary T and L in L[X, Y]. Prove the following identities.
(a) ⟨Tx ; Ly⟩ + ⟨Ty ; Lx⟩ = ½(⟨T(x+y) ; L(x+y)⟩ − ⟨T(x−y) ; L(x−y)⟩) for every x, y ∈ X.
If X is a complex inner product space, then
(b) ⟨Tx ; Ly⟩ = ¼(⟨T(x+y) ; L(x+y)⟩ − ⟨T(x−y) ; L(x−y)⟩ + i⟨T(x+iy) ; L(x+iy)⟩ − i⟨T(x−iy) ; L(x−iy)⟩)
for every x, y ∈ X. These yield the polarization identities of Proposition 5.4.
Hint: Verify: ⟨T(x+αy) ; L(x+αy)⟩ = ⟨Tx ; Lx⟩ + ᾱ⟨Tx ; Ly⟩ + α⟨Ty ; Lx⟩ + |α|²⟨Ty ; Ly⟩. Then set α = ±1 to get (a), and also α = ±i to get (b).

Problem 5.4. Take T ∈ B[X, Y], where X and Y are inner product spaces. Show that the following assertions are pairwise equivalent.
(a) T = O.
(b) Tx = 0 for all x ∈ X.
(c) ⟨Tx ; y⟩ = 0 for all x ∈ X and y ∈ Y.
Now set Y = X so that T ∈ B[X], and consider the following further assertion.
(d) ⟨Tx ; x⟩ = 0 for all x ∈ X.
Clearly, (c) implies (d). If X is a complex inner product space, then prove the converse: if X = Y is complex, these four assertions are all pairwise equivalent.
Hint: Use Problem 5.3(b) to show that (d) implies (c) if X = Y is complex.

Problem 5.5. Let {Tn} be a sequence of bounded linear transformations of a Hilbert space H into a Hilbert space K (i.e., Tn ∈ B[H, K] for each n). Show that uniform, strong, and weak boundedness coincide. In other words, show that the following assertions are pairwise equivalent.
(a) sup_n ‖Tn‖ < ∞.
(b) sup_n ‖Tn x‖ < ∞ for every x ∈ H.
(c) sup_n |⟨Tn x ; y⟩| < ∞ for every x ∈ H and every y ∈ K.
Now set K = H so that Tn ∈ B[H], and consider the following further assertion.
(d) sup_n |⟨Tn x ; x⟩| < ∞ for every x ∈ H.
Clearly, (c) implies (d). If H = K is a complex Hilbert space, then prove the converse: if H = K is complex, these four assertions are all pairwise equivalent.
Hint: Check that (a)⇒(b)⇒(c). Now take an arbitrary x ∈ H. For each n consider the functional xn: K → F given by

xn(y) = ⟨y ; Tn x⟩  for every  y ∈ K.

Show that each xn is linear and bounded. If (c) holds (so that sup_n |xn(y)| < ∞ for every y ∈ K), then the Banach–Steinhaus Theorem (Theorem 4.43) ensures that sup_n ‖xn‖ < ∞ because K is a Banach space. But

‖Tn x‖ = sup_{‖y‖=1} |xn(y)| = ‖xn‖

for each n (cf. Problem 5.1). Thus conclude that (c)⇒(b). A straightforward application of the Banach–Steinhaus Theorem ensures that (b)⇒(a) because H is a Banach space. Finally, if K = H is complex, then use Problem 5.3(b) to show that (d)⇒(c).

Problem 5.6. Let (X, ⟨ ; ⟩) be an inner product space and equip the field F with its usual metric. Use the Schwarz inequality and Corollary 3.8 to prove the following assertions.
(a) ⟨· ; y⟩: X → F is a continuous function for every y ∈ X.
(b) ⟨x ; ·⟩: X → F is a continuous function for every x ∈ X.
Equip the direct sum X ⊕ X with the inner product ⟨ ; ⟩⊕ of Examples 5.E and 5.I. That is, set ⟨x ; y⟩⊕ = ⟨x₁ ; y₁⟩ + ⟨x₂ ; y₂⟩ for every x = (x₁, x₂) and y = (y₁, y₂) in X ⊕ X. Show that
(c) ⟨ ; ⟩⊕: X ⊕ X → F is a continuous function.
Hint: Take an arbitrary convergent sequence {x(n) = (x₁(n), x₂(n))} in X ⊕ X and let x = (x₁, x₂) ∈ X ⊕ X be its limit. Verify that

⟨x₁(n) − x₁ + x₁ ; x₂(n) − x₂ + x₂⟩ = ⟨x₁(n) − x₁ ; x₂(n) − x₂⟩ + ⟨x₁(n) − x₁ ; x₂⟩ + ⟨x₁ ; x₂(n) − x₂⟩ + ⟨x₁ ; x₂⟩

and use the Schwarz inequality to show that

|⟨x₁(n) ; x₂(n)⟩ − ⟨x₁ ; x₂⟩| ≤ ‖x₁(n) − x₁‖‖x₂(n) − x₂‖ + ‖x₂‖‖x₁(n) − x₁‖ + ‖x₁‖‖x₂(n) − x₂‖.

Since ‖x(n) − x‖²⊕ = ‖x₁(n) − x₁‖² + ‖x₂(n) − x₂‖², we get x(n) → x in X ⊕ X if and only if x₁(n) → x₁ and x₂(n) → x₂ in X, which implies that ⟨x₁(n) ; x₂(n)⟩ → ⟨x₁ ; x₂⟩ in F.

Problem 5.7. Let A and B be subsets of a Hilbert space H. Show that
(a) A ⊆ B and A⊥ ⊆ B implies B⊥⊥ = H.
Hint: B⊥⊥ = H because B⊥ ⊆ A⊥ ∩ A⊥⊥ = {0}. Use Proposition 5.15.
In particular, if M is a subspace of H, then
(b) A ⊆ M and A⊥ ⊆ M implies M = H.
Let M and N be linear manifolds of an inner product space X. Show that
(c) M ⊥ N implies M⁻ ⊥ N⁻.
Hint: Verify that if u = lim_m u_m ∈ M⁻ and v = lim_m v_m ∈ N⁻ for u_m in M and v_m in N, then ⟨u ; v⟩ = lim_m ⟨u_m ; v⟩ = lim_m lim_n ⟨u_m ; v_n⟩ = 0.
In particular, if X is a Hilbert space, then
(d) M ⊥ N implies (M + N)⁻ = M⁻ + N⁻.
Hint: Apply the remark after Proposition 4.9, Theorem 5.10(b), and item (c) to conclude that (M + N)⁻ = (M⁻ + N⁻)⁻ = M⁻ + N⁻ if M ⊥ N.
Let M and N be linear manifolds of Hilbert spaces H and K, respectively, and consider the (orthogonal) direct sum H ⊕ K. Prove the following assertions.
(e) (M ⊕ N)⁻ = M⁻ ⊕ N⁻.
Hint: By Proposition 5.24 and item (d).
(f) (M ⊕ N)⁻ = M ⊕ N ≠ H ⊕ K if and only if M⁻ = M, N⁻ = N, and M ≠ H or N ≠ K.
Hint: By (e) the result is equivalent to M⁻ ⊕ N⁻ = M ⊕ N ≠ H ⊕ K.
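A finite-dimensional numerical sketch of item (d) of Problem 5.7 (my own illustration; the subspaces of R⁴ and the `proj` helper are chosen for the example): for orthogonal subspaces M and N the orthogonal projection onto M + N is the sum of the individual projections. In finite dimensions every linear manifold is closed, so the closures in the statement come for free.

```python
import numpy as np

def proj(A):
    """Orthogonal projection onto the column space of A."""
    Qm, _ = np.linalg.qr(A)
    return Qm @ Qm.T

M = np.array([[1., 0.], [0., 1.], [0., 0.], [0., 0.]])  # M = span{e1, e2}
N = np.array([[0.], [0.], [1.], [0.]])                  # N = span{e3}, so M ⊥ N

PM, PN = proj(M), proj(N)
PMN = proj(np.hstack([M, N]))        # projection onto M + N
assert np.allclose(PM @ PN, 0)       # orthogonality: the projections annihilate
assert np.allclose(PMN, PM + PN)     # P_{M+N} = P_M + P_N when M ⊥ N
```

Problems 5.12 and 5.13 below show that both hypotheses behind Theorem 5.10(b), completeness and orthogonality, are needed once the spaces are infinite-dimensional.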
Problem 5.8. Let X be an inner product space and let H be a Hilbert space. Prove the following propositions.
If {A_γ}_{γ∈Γ} is a nonempty family of subsets of X, then
(a) ⋂_{γ∈Γ} A_γ⊥ = (⋃_{γ∈Γ} A_γ)⊥.
If {M_γ}_{γ∈Γ} is a nonempty family of linear manifolds of X, then
(b) ⋂_{γ∈Γ} M_γ⊥ = (∑_{γ∈Γ} M_γ)⊥.
Hint: Proposition 5.12. Recall: ⋁_{γ∈Γ} M_γ = (∑_{γ∈Γ} M_γ)⁻.
If {M_γ}_{γ∈Γ} is a nonempty family of subspaces of H, then
(c) ⋂_{γ∈Γ} M_γ = {0} if and only if (∑_{γ∈Γ} M_γ⊥)⁻ = H,
(d) (∑_{γ∈Γ} M_γ)⁻ = H if and only if ⋂_{γ∈Γ} M_γ⊥ = {0}.
Hint: Propositions 5.12 and 5.15 and item (b).
If {M_γ}_{γ∈Γ} is a nonempty family of orthogonal subspaces of H, then
(e) H = (∑_{γ∈Γ} M_γ)⁻ implies M_α = (∑_{α≠γ∈Γ} M_γ)⊥ for every α ∈ Γ.
Hint: Show that M_α ⊥ ∑_{α≠γ∈Γ} M_γ, and so M_α ⊥ (∑_{α≠γ∈Γ} M_γ)⁻ (cf. Proposition 5.12). Verify each of the following steps (see Theorem 5.10(b)):

H = (∑_{γ∈Γ} M_γ)⁻ = (M_α + ∑_{α≠γ∈Γ} M_γ)⁻ ⊆ (M_α + (∑_{α≠γ∈Γ} M_γ)⁻)⁻ = M_α + (∑_{α≠γ∈Γ} M_γ)⁻ ⊆ H.

Hence H = M_α + (∑_{α≠γ∈Γ} M_γ)⁻. Conclude that M_α⊥ = (∑_{α≠γ∈Γ} M_γ)⁻ by Proposition 5.19, and therefore M_α = (∑_{α≠γ∈Γ} M_γ)⊥ (Propositions 5.12 and 5.15). Apply (b).
Remark: The collection Lat(H) of all subspaces of H is a complete lattice in the inclusion ordering, where M ∧ N = M ∩ N and M ∨ N = (M + N)⁻ (Section 4.3). Thus, by item (b) and Proposition 5.15, M ∨ N = (M⊥ ∩ N⊥)⊥.

Problem 5.9. Let X and Y be inner product spaces and consider the setup of Problem 4.42. If there exists a unitary transformation U ∈ G[X, Y] intertwining T ∈ B[X] and S ∈ B[Y], that is, if UT = SU, then we say that T and S are unitarily equivalent (notation: T ≅ S). If this happens, then it is clear that X and Y are unitarily equivalent inner product spaces. Unitary equivalence is a rather important special case of isometric equivalence. In particular (see Problems 4.41 and 4.42),

T ≅ S  implies  ‖T‖ = ‖S‖.

First prove the following assertion that mirrors similarity on linear spaces, and isometric equivalence on normed spaces.
(a) Unitary equivalence has the defining properties of an equivalence relation.
Now take any polynomial p in one variable (cf. Problem 2.20) and show that
(b) Up(T) = p(S)U whenever UT = SU for every unitary U ∈ G[X, Y]. In particular, T ≅ S implies Tⁿ ≅ Sⁿ for every n ≥ 0 (see Problem 4.24).
Next show that product does not preserve unitary equivalence. That is, show that
(c) T′ ≅ T and S′ ≅ S do not imply T′S′ ≅ TS.
Hint: Let T be a nilpotent and let U be a unitary, both operators acting on the same Hilbert space, such that UT is a nonzero idempotent (e.g., see Problem 5.43(e)). Take S = T and set S′ = ISI = S and T′ = UTU, so that T′ ≅ T and S′ ≅ S. Verify that T′S′ = UTUT = (UT)² = UT, a nonzero idempotent, which cannot be unitarily equivalent to the nilpotent TS = T².
Let X and Y be Hilbert spaces and let M be a subspace of X. Suppose T ≅ S and let U ∈ G[X, Y] be any unitary transformation intertwining T to S. Theorem 3.24 ensures that U(M) is a subspace of Y. Show that
(d) M is invariant for T (or reduces T) if and only if U(M) is invariant for S (or reduces S). Moreover, M is nontrivial if and only if U(M) is.
Hint: Use Proposition 5.73 to show that

UT = SU  implies  UT* = S*U.

Corollary 5.75 says that M reduces T if and only if T(M) ⊆ M and T*(M) ⊆ M. Thus verify that (cf. Problem 1.2(c)) T(M) ⊆ M implies S[U(M)] ⊆ U(M) and T*(M) ⊆ M implies S*[U(M)] ⊆ U(M). Therefore we may infer that T has a nontrivial invariant subspace (or a reducing subspace) if and only if S has (also see Problems 4.25 through 4.29).

Problem 5.10. This is the uncountable version of the Orthogonal Structure Theorem. Prove it.
(a) Let H be a Hilbert space. Suppose {M_γ}_{γ∈Γ} is an uncountable family of pairwise orthogonal subspaces of H. If x ∈ (∑_{γ∈Γ} M_γ)⁻, then there exists a unique summable family {u_γ}_{γ∈Γ} of vectors in H with u_γ ∈ M_γ for each γ such that x = ∑_{γ∈Γ} u_γ. Moreover, ‖x‖² = ∑_{γ∈Γ} ‖u_γ‖². Conversely, if {u_γ}_{γ∈Γ} is a square-summable family of vectors in H with u_γ ∈ M_γ for each γ ∈ Γ, then {u_γ}_{γ∈Γ} is summable and ∑_{γ∈Γ} u_γ lies in (∑_{γ∈Γ} M_γ)⁻.
Hint: Set x(n) = ∑_{k∈N_n} x(n)_k in ∑_{k∈N_n} M_k, where each N_n is a finite subset of Γ such that N_n ⊆ N_{n+1} and #N_n ≥ n. Consider the set N_∞ = ⋃_{n≥1} N_n and construct a countable family of orthogonal vectors in H, say {u_k}_{k∈N_∞}, as in the proof of Theorem 5.16.
(b) Show that H must be nonseparable if {M_γ}_{γ∈Γ} has uncountably many nonzero subspaces. (Hint: Proposition 5.43 and the Axiom of Choice.) Even in this case, the sum x = ∑_{γ∈Γ} u_γ has only a countable number of nonzero vectors. (Why?)
(c) Restate Corollaries 5.17 and 5.18 in light of item (a). Rewrite Examples 5.I and 5.J for an uncountable family {M_γ}_{γ∈Γ} of pairwise orthogonal subspaces of a Hilbert space (H, ⟨ ; ⟩) and conclude that

(∑_{γ∈Γ} M_γ)⁻ ≅ ⊕_{γ∈Γ} M_γ.

Problem 5.11. Let {M_γ} be a collection of orthogonal subspaces of a Hilbert space H that spans H (i.e., ⋁_γ M_γ = (∑_γ M_γ)⁻ = H) and let B_γ be an orthonormal basis for each M_γ. Show that ⋃_γ B_γ is an orthonormal basis for H.
Hint: ⋃_γ B_γ is an orthonormal set in H and B_γ ⊆ ⋁_γ B_γ — Sections 2.3, 3.5, and 4.3. Verify each of the following steps:

H = ⋁_γ M_γ = ⋁_γ (⋁ B_γ) = ⋁(⋃_γ B_γ) ⊆ H.

Problem 5.12. Completeness is necessary in Theorem 5.10(b): If M and N are orthogonal subspaces of an incomplete inner product space X, then it may happen that M + N is not closed in X. Indeed, set X = span{e_k}_{k=0}^∞, where e₀ = {1/i}_{i=1}^∞ ∈ ℓ₊² and {e_k}_{k=1}^∞ is the canonical orthonormal basis for the Hilbert space ℓ₊² (cf. Example 5.L(b)). Recall that X is the linear manifold of ℓ₊² consisting of all (finite) linear combinations of vectors from {e_k}_{k=0}^∞. Now consider the following linear manifolds of X:

M = {u = {υ_k}_{k=1}^∞ ∈ X: υ_{2k−1} = 0 for all k ≥ 1},
N = {v = {ν_k}_{k=1}^∞ ∈ X: ν_{2k} = 0 for all k ≥ 1}.

It is clear that M ⊥ N, isn't it? Moreover, they are subspaces of X.
(a) Show that M and N are both closed in the inner product space X.
Hint: Take a sequence {u_n}_{n=1}^∞, with each u_n = {υ_k(n)}_{k=1}^∞ in M, such that u_n → x = {ξ_k}_{k=1}^∞ in X. Split the series ‖u_n − x‖² = ∑_{k=1}^∞ |υ_k(n) − ξ_k|² into the sum of a series running over the odd integers and a series running over the even integers, and conclude that ‖u_n − x‖² = ∑_{k=1}^∞ |ξ_{2k−1}|² + ∑_{k=1}^∞ |υ_{2k}(n) − ξ_{2k}|². This implies that x ∈ M.
(b) Exhibit an (M + N)-valued sequence that converges in X to e₀ ∈ X.
Hint: For each n ≥ 1 define u_n = {υ_k(n)}_{k=1}^∞ and v_n = {ν_k(n)}_{k=1}^∞ as follows: υ_k(n) = 1/k if 1 ≤ k ≤ n is an even integer, and υ_k(n) = 0 otherwise; ν_k(n) = 1/k if 1 ≤ k ≤ n is an odd integer, and ν_k(n) = 0 otherwise. Show that u_n, v_n ∈ span{e_k}_{k=1}^n ⊂ X, u_n ∈ M, v_n ∈ N, and that u_n + v_n = (1, 1/2, ..., 1/n, 0, 0, 0, ...) for every n ≥ 1.
(c) Show (by contradiction) that e₀ ∉ M + N.
Conclude from (b) and (c) that M + N is not closed in X.
Problem 5.13. Orthogonality is necessary in Theorem 5.10(b): If M and N are nonorthogonal subspaces of a Hilbert space H, then it may happen that M + N is not closed in H even if M ∩ N = {0}. In fact, set

v_k = (1 − 1/k²)^{1/2} e_{2k−1} + (1/k) e_{2k}  in  ℓ₊²

for every k ≥ 1, where {e_k}_{k=1}^∞ is the canonical orthonormal basis for the Hilbert space ℓ₊², and consider the following subspaces of ℓ₊²:

M = ⋁{e_{2k−1}}_{k=1}^∞  and  N = ⋁{v_k}_{k=1}^∞.

(a) Show that {v_k}_{k=1}^∞ is an orthonormal sequence, and hence an orthonormal basis for the Hilbert space N.
(b) Apply the Fourier Series Theorem to show that M ∩ N = {0}.
Hint: Take any x in M ∩ N and consider its Fourier series expansions: x = ∑_{k=1}^∞ ⟨x ; e_{2k−1}⟩e_{2k−1} = ∑_{k=1}^∞ ⟨x ; v_k⟩v_k. Verify that 0 = ⟨x ; e_{2j}⟩ = ∑_{k=1}^∞ ⟨x ; v_k⟩⟨v_k ; e_{2j}⟩ = ⟨x ; v_j⟩(1/j) for all j ≥ 1. Conclude that x = 0.
Show that the series ∑_{k=1}^∞ (1/k)e_{2k} converges in ℓ₊² and set x = ∑_{k=1}^∞ (1/k)e_{2k} in ℓ₊². Note that x_n = ∑_{k=1}^n (1/k)e_{2k} = ∑_{k=1}^n v_k − ∑_{k=1}^n (1 − 1/k²)^{1/2} e_{2k−1} lies in M + N for each n ≥ 1. Moreover, check that x_n → x in ℓ₊² as n → ∞.
(c) Apply the Fourier Series Theorem to show that x ∉ M + N.
Hint: Suppose x = u + v with u ∈ M and v ∈ N. Verify that ⟨x ; e_{2n}⟩ = 1/n, ⟨u ; e_{2n}⟩ = 0, and ⟨v ; e_{2n}⟩ = ⟨v ; v_n⟩(1/n) for each n ≥ 1. Now conclude that (1/n)⟨v ; v_n⟩ = 1/n, and hence ⟨v ; v_n⟩ = 1 for every n ≥ 1, which is a contradiction (since ‖v‖² = ∑_{n=1}^∞ |⟨v ; v_n⟩|²).
Therefore, M + N is not closed in ℓ₊² by the Closed Set Theorem.
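The mechanism behind Problem 5.13 can be made quantitative (my own illustration, not part of the problem): the angle between v_k and M shrinks to zero, since ⟨v_k ; e_{2k−1}⟩ = (1 − 1/k²)^{1/2} → 1. A vanishing infimum angle between two closed subspaces is exactly what allows M + N to fail to be closed in infinite dimensions; in finite dimensions the infimum is attained and positive.

```python
import numpy as np

# Angle between the unit vectors v_k and e_{2k-1} for growing k.
ks = np.array([1, 10, 100, 1000], dtype=float)
angles = np.arccos(np.sqrt(1.0 - 1.0 / ks**2))  # arccos of |<v_k ; e_{2k-1}>|

assert np.all(np.diff(angles) < 0)   # the angle strictly decreases with k ...
assert angles[-1] < 1.1e-3           # ... and is about 0.001 rad at k = 1000
```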
Problem 5.14. Let {e_γ}_{γ∈Γ} be any orthonormal basis for a complex Hilbert space H. For each x ∈ H consider its Fourier series expansion with respect to {e_γ}_{γ∈Γ}; that is, x = ∑_{γ∈Γ} ⟨x ; e_γ⟩e_γ. Verify that {Re⟨x ; e_γ⟩e_γ}_{γ∈Γ} and {Im⟨x ; e_γ⟩e_γ}_{γ∈Γ} are both summable families of vectors in H. Set Re x = ∑_{γ∈Γ} Re⟨x ; e_γ⟩e_γ and Im x = ∑_{γ∈Γ} Im⟨x ; e_γ⟩e_γ in H. Show that x = Re x + i Im x and prove the following identities.
(a) ⟨Re x ; e_γ⟩ = Re⟨x ; e_γ⟩ and ⟨Im x ; e_γ⟩ = Im⟨x ; e_γ⟩ for every γ ∈ Γ.
(b) ⟨Re x ; Im x⟩ = ⟨Im x ; Re x⟩ and ‖x‖² = ‖Re x‖² + ‖Im x‖².
(c) Re(αx) = Re α Re x − Im α Im x and Im(αx) = Re α Im x + Im α Re x.
(d) ⟨Re(αx) ; Im(αx)⟩ = Re α Im α (‖Re x‖² − ‖Im x‖²) + ⟨Re x ; Im x⟩((Re α)² − (Im α)²).
Now prove the Orthogonal Normalization Lemma: For every nonzero vector x in a complex Hilbert space there exists an α ∈ C such that ‖αx‖ = 1 and Re(αx) ⊥ Im(αx).
Hint: Use the above results and the polar representation for α ∈ C.

Problem 5.15. Let {e_γ}_{γ∈Γ} be an orthonormal family of vectors e_γ in an infinite-dimensional Hilbert space H, and let {e_k}_{k=1}^∞ be a countably infinite subset of {e_γ}_{γ∈Γ}. Set M = ⋁{e_k}_{k=1}^∞ and M_n = span{e_k}_{k=1}^n for each integer n ≥ 1. Let P: H → H be the orthogonal projection onto M, and let P_n: H → H be the orthogonal projection onto each finite-dimensional subspace M_n. Show that P_n →s P. (Hint: Corollary 5.55.)

Problem 5.16. Let {P_k}_{k=0}^∞ be a resolution of the identity on a Hilbert space H, so that ∑_{k=0}^∞ P_k x = x for every x ∈ H; that is, ∑_{k=0}^n P_k →s I as n → ∞. Suppose P_k ≠ O for every k ≥ 0 and let {λ_k}_{k=0}^∞ be a bounded sequence of scalars. By Proposition 5.61 the weighted sum of projections ∑_{k=0}^∞ λ_k P_k is the operator on H for which ∑_{k=0}^n λ_k P_k →s ∑_{k=0}^∞ λ_k P_k as n → ∞. Theorem 5.59 says that H = (∑_{k=0}^∞ R(P_k))⁻, and H is unitarily equivalent to the orthogonal direct sum ⊕_{k=0}^∞ R(P_k) equipped with its usual inner product (see Examples 5.I and 5.J). Moreover, the natural unitary transformation

Φ: ⊕_{k=0}^∞ R(P_k) → (∑_{k=0}^∞ R(P_k))⁻

between the Hilbert spaces ⊕_{k=0}^∞ R(P_k) and (∑_{k=0}^∞ R(P_k))⁻ is given by

Φ({u_k}_{k=0}^∞) = ∑_{k=0}^∞ u_k  for every  {u_k}_{k=0}^∞ ∈ ⊕_{k=0}^∞ R(P_k).

Set I_k = P_k|R(P_k) in B[R(P_k)] for each k and consider the operator

⊕_{k=0}^∞ λ_k I_k  in  B[⊕_{k=0}^∞ R(P_k)]

given by (⊕_{k=0}^∞ λ_k I_k){u_k}_{k=0}^∞ = {λ_k u_k}_{k=0}^∞ for every {u_k}_{k=0}^∞ in ⊕_{k=0}^∞ R(P_k) (cf. Problem 4.16). Show that

Φ(⊕_{k=0}^∞ λ_k I_k)Φ⁻¹x = ∑_{k=0}^∞ λ_k P_k x

for every x in H. (Hint: Φ⁻¹(∑_{k=0}^∞ P_k x) = {P_k x}_{k=0}^∞.) In other words, as the orthogonal projections {P_k}_{k=0}^∞ are orthogonal to each other, a weighted sum of projections is identified with an orthogonal direct sum of scalar operators. In fact, these are unitarily equivalent operators:

∑_{k=0}^∞ λ_k P_k ≅ ⊕_{k=0}^∞ λ_k I_k  with  I_k = P_k|R(P_k).
Problem 5.17. Let {e_k}_{k=1}^∞ be an orthonormal basis for a (separable infinite-dimensional) Hilbert space H and let {λ_k}_{k=1}^∞ be a sequence of scalars. For each integer m ≥ 1 consider the mapping T_m: H → H defined by

T_m x = ∑_{k=1}^m λ_k⟨x ; e_k⟩e_k  for every  x ∈ H.

(a) Verify that T_m lies in B[H] for each m and show that

sup_k |λ_k| < ∞  if and only if  T_m →s T,

where T ∈ B[H] is the weighted sum of projections (cf. Definition 5.60)

Tx = ∑_{k=1}^∞ λ_k⟨x ; e_k⟩e_k  for every  x = ∑_{k=1}^∞ ⟨x ; e_k⟩e_k ∈ H,

with ‖T‖ = sup_k |λ_k| (see Theorem 5.48, and Propositions 5.57 and 5.61). In this case, T is called a diagonal operator with respect to the basis {e_k}_{k=1}^∞.
(b) Conversely, take T ∈ B[H]. If there exists an orthonormal basis for H and a bounded sequence of scalars such that T is the strong limit of {T_m}, then T is a diagonalizable operator. Let sup_k |λ_k| < ∞. Show that (see Problem 4.53)

lim_k λ_k = 0  if and only if  T_m →u T.

(c) Still under the assumption that sup_k |λ_k| < ∞, which ensures the existence of T in B[H], prove the following assertions (see Example 4.J).
(c1) There exists T⁻¹ ∈ L[R(T), H] if and only if λ_k ≠ 0 for every k ≥ 1. In this case, R(T)⁻ = H.
Hint: span{e_k}_{k=1}^∞ ⊆ R(T) if λ_k ≠ 0 for all k (see Problem 3.47).
(c2) There exists T⁻¹ ∈ B[H] if and only if inf_k |λ_k| > 0. In this case,

T⁻¹x = ∑_{k=1}^∞ λ_k⁻¹⟨x ; e_k⟩e_k  for every  x ∈ H.

Problem 5.18. Consider the setup of the previous problem under the assumption that sup_k |λ_k| < ∞. Use the Fourier expansion of x ∈ H (Theorem 5.48) to show by induction that

Tⁿx = ∑_{k=1}^∞ λ_kⁿ⟨x ; e_k⟩e_k  for every  x ∈ H

and every positive integer n. Now prove the following propositions.
(a) Tⁿ →u O if and only if sup_k |λ_k| < 1.
(b) Tⁿ →s O if and only if |λ_k| < 1 for every k ≥ 1.
(c) Tⁿ →w O if and only if Tⁿ →s O.
(d) lim_n ‖Tⁿx‖ = ∞ for every x ≠ 0 if and only if 1 < |λ_k| for every k ≥ 1.
Hint: For (a) and (b), see Example 4.H. For (c), note that Tⁿe_j = λ_jⁿe_j, and so |⟨Tⁿe_j ; e_j⟩| = |λ_j|ⁿ. If |λ_j| ≥ 1 for some j, then Tⁿ does not converge weakly to O. For (d), note that for every x ≠ 0 the expansion ‖Tⁿx‖² = ∑_k |⟨x ; e_k⟩|²|λ_k|^{2n} has nonzero terms, and these grow without bound if and only if 1 < |λ_k| for every k (see Example 4.J).
Suppose T has an inverse T⁻¹ ∈ L[R(T), H] on its range. Prove the assertion.
(e) lim_n ‖Tⁿx‖ = ∞ or lim_n ‖T⁻ⁿx‖ = ∞ for every x ≠ 0 if and only if 0 ≠ |λ_k| ≠ 1 for every k ≥ 1.

Problem 5.19. Let {e_k}_{k=1}^∞ be an orthonormal basis for a Hilbert space H. Show that M (defined below) is a dense linear manifold of H:

M = {x ∈ H: ∑_{k=1}^∞ |⟨x ; e_k⟩| < ∞}.

Hint: Let T be a diagonal operator (Problem 5.17) with λ_k ≠ 0 for all k (so that R(T)⁻ = H) and ∑_{k=1}^∞ |λ_k|² < ∞. Show that (Schwarz inequality in ℓ₊²)

∑_{j=1}^∞ |⟨Tx ; e_j⟩| = ∑_{j=1}^∞ |λ_j||⟨x ; e_j⟩| ≤ (∑_{j=1}^∞ |λ_j|²)^{1/2}(∑_{j=1}^∞ |⟨x ; e_j⟩|²)^{1/2} < ∞

for all x ∈ H. Hence R(T) ⊆ M.

Problem 5.20. Let {x_n} be an M-valued sequence and let x be a vector in M, where M is a subspace of a Hilbert space H. Show that

x_n →w x in M  if and only if  x_n →w x in H.

Hint: x_n →w x in M if and only if ⟨x_n ; u⟩ → ⟨x ; u⟩ for every u ∈ M because M is a Hilbert space (Theorem 5.62). Recall: H = M + M⊥ (Theorem 5.20), and show that x_n →w x in M implies x_n →w x in H.
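A finite section of a diagonal operator makes the claims of Problems 5.17 and 5.18 tangible (my own illustration; the size n = 6 and the weights λ_k = 1/k are arbitrary choices): the operator norm is sup|λ_k|, powers act with weights λ_kⁿ, and since every λ_k ≠ 0 the finite section is invertible. In the genuinely infinite case inf_k |λ_k| = 0 for these weights, so by Problem 5.17(c2) the inverse exists on the range but is unbounded.

```python
import numpy as np

# Finite section of T x = Σ λ_k <x ; e_k> e_k with λ_k = 1/k.
n = 6
lam = 1.0 / np.arange(1, n + 1)
T = np.diag(lam)

assert np.isclose(np.linalg.norm(T, 2), lam.max())                 # ||T|| = sup|λ_k| = 1
assert np.allclose(np.linalg.matrix_power(T, 3), np.diag(lam**3))  # T^n has weights λ_k^n
assert np.allclose(np.linalg.inv(T) @ T, np.eye(n))                # λ_k ≠ 0: invertible section
```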
Problem 5.21. Let {T_n} be a sequence of transformations in B[X, Y], where X and Y are inner product spaces. Prove the following propositions.
(a) If T_n →w T for some T ∈ B[X, Y], then ‖T‖ ≤ lim inf_n ‖T_n‖.
Hint: If {T_{n_k}} is a subsequence of {T_n} such that lim_k ‖T_{n_k}‖ = lim inf_n ‖T_n‖, and if T_n →w T, then ⟨Tx ; y⟩ = lim_k ⟨T_{n_k}x ; y⟩ (why?), and so |⟨Tx ; y⟩| ≤ lim inf_n ‖T_n‖‖x‖‖y‖ for every x ∈ X and y ∈ Y. Now apply Corollary 5.71.
(b) If sup_n ‖T_n‖ < ∞ and {⟨T_n a ; b⟩} converges in F for every a in a dense subset A of X and every b in a dense subset B of Y, then {⟨T_n x ; y⟩} converges in F for every x ∈ X and every y ∈ Y.
Hint: See the hint to Problem 4.45(b) and recall that F is complete.
(c) Take T ∈ B[X, Y]. If sup_n ‖T_n‖ < ∞ and ⟨T_n a ; b⟩ → ⟨Ta ; b⟩ for every a in a dense subset A of X and every b in a dense subset B of Y, then ⟨T_n x ; y⟩ → ⟨Tx ; y⟩ for every x ∈ X and every y ∈ Y.
Hint: ⟨(T_n − T)x ; y⟩ = ⟨(T_n − T)(x − a_ε) + (T_n − T)a_ε ; y − b_ε + b_ε⟩.
(d) If X and Y are Hilbert spaces, and if the hypothesis of (b) or (c) holds true, then T_n →w T.

Problem 5.22. Let {T_n} and {S_n} be sequences of transformations in B[X, Y] and in B[Y, Z], respectively, where X, Y, and Z are inner product spaces. Take T ∈ B[X, Y], S ∈ B[Y, Z], and prove the following propositions.
(a) If sup_n ‖S_n‖ < ∞, S_n →w S, and T_n →s T, then S_nT_n →w ST.
Hint: S_nT_n − ST = S_n(T_n − T) + (S_n − S)T.
(b) If Y and Z are Hilbert spaces, S_n →w S, and T_n →s T, then S_nT_n →w ST.
(c) If S_n →u S, sup_n ‖T_n‖ < ∞, and T_n →w T, then S_nT_n →w ST.
Hint: S_nT_n − ST = (S_n − S)T_n + S(T_n − T).
(d) If X and Y are Hilbert spaces, S_n →u S, and T_n →w T, then S_nT_n →w ST.
Show that addition of weakly convergent sequences of bounded linear transformations is again a weakly convergent sequence of bounded linear transformations whose limit is the sum of the limits of each summand.
Remarks: It is easy to show, even if X = Y = Z is a Hilbert space, that

S_n →w S and T_n →u T  do not imply  S_nT_n →s ST.

For instance, if {T_n} is a constant sequence, say T_n = I for all n, and {S_n} is any operator sequence that converges weakly to zero but does not converge strongly, then S_nT_n = S_n does not converge strongly to O = OI. It is also easy to show that

S_n →s S and T_n →w T  do not imply  S_nT_n →w ST.

Indeed, there exists an isometry S₊ (thus S₊*ⁿS₊ⁿ = I and ‖S₊ⁿx‖ = ‖x‖ for all n and all x, and so S₊*ⁿS₊ⁿ does not converge weakly to O and S₊ⁿ does not converge strongly to O) for which S₊*ⁿ →s O (and hence S₊*ⁿ →w O and S₊ⁿ →w O) — the isometry S₊ is introduced in Problem 5.29.

Problem 5.23. This problem deals with a characterization of weak convergence for sequences of Hilbert space operators, and also with its relation to strong convergence. Prove the following proposition.
(a) Let {e_k}_{k=1}^∞ be an orthonormal basis for a separable Hilbert space H. Let {T_n} be a sequence of operators in B[H], and let T be an operator in B[H]. Then

T_n →w T  if and only if  sup_n ‖T_n‖ < ∞ and ⟨T_ne_j ; e_k⟩ → ⟨Te_j ; e_k⟩ as n → ∞ for every j, k ≥ 1.
Hint: The "only if" part is easy (cf. Problem 5.5 and Proposition 5.67). Conversely, suppose sup_n ‖T_n‖ < ∞ and lim_n ⟨(T_n − T)e_j ; e_k⟩ = 0 for every j, k ≥ 1. Use Theorem 5.48 and Problem 4.14(b) to show that

lim sup_n |⟨(T_n − T)e_j ; y⟩| ≤ lim sup_n ∑_{k=1}^∞ |⟨(T_n − T)e_j ; e_k⟩||⟨e_k ; y⟩| = 0

for each j ≥ 1 and every y ∈ M, where M is the linear manifold of Problem 5.19. Repeat the argument to show that

lim sup_n |⟨(T_n − T)x ; y⟩| = lim sup_n |⟨x ; (T_n − T)*y⟩| ≤ lim sup_n ∑_{k=1}^∞ |⟨x ; e_k⟩||⟨(T_n − T)e_k ; y⟩| = 0,

and so lim_n |⟨(T_n − T)x ; y⟩| = 0, for every x, y ∈ M. Now use Problems 5.19 and 5.21(d) to conclude that T_n →w T.
(b) Weak convergence from below implies strong convergence, which coincides with weak convergence and norm convergence together. That is, if {T_n} is a sequence of operators in B[H] and T is an operator in B[H], then

T_n →w T and ‖T_nx‖ ≤ ‖Tx‖ for every x  ⟹  T_n →s T  ⟺  T_n →w T and ‖T_nx‖ → ‖Tx‖ for every x.

(b1) If a sequence of Hilbert space contractions converges weakly to an isometry, then it converges strongly.
(b2) It may happen that T_n →s T and ‖Tx‖ < ‖T_nx‖ for every x ≠ 0 and all n.
Hint: Show that if ‖T_nx‖² ≤ ‖Tx‖² = ⟨Tx ; Tx⟩ for every x ∈ H and every n, then ‖(T_n − T)x‖² ≤ 2 Re⟨(T − T_n)x ; Tx⟩ for every x ∈ H and every n. Thus conclude the claimed implication in (b). For the equivalence in (b), see the remark between Propositions 5.65 and 5.66. Apply Proposition 4.37 to verify that (b1) is a straightforward corollary of (b). Set T_n = ((n+1)/n)I and T = I to get an example for (b2).

Problem 5.24. Take L ∈ L[X] on a linear space X and consider its nth power Lⁿ ∈ L[X] for each integer n ≥ 0 with L⁰ = I (see Problem 2.20). Recall that if X is a normed space, then each Lⁿ lies in B[X] whenever L lies in B[X] (see Problem 4.22). Verify that the power sequence {Lⁿ}_{n≥0} can be recursively defined by Lⁿ⁺¹ = LLⁿ for every n ≥ 0. Show by induction:
(b) P T k = T k P = P = P k for every k ≥ 1, so that P is a projection (not necessarily an orthogonal projection). Show by induction that w (c) (T − P )n = T n − P for every n ≥ 1, so that (T − P )n −→ O.
If X is a Hilbert space and T ∈ B[X ], then show that (d) T n∗ = T ∗n and #T ∗n # = #T n # for every n ≥ 0. Problem 5.25. Let X and Y be two normed spaces. Recall that T ∈ B[X ] and S ∈ B[Y ] are similar if there exists W ∈ G[X , Y ] such that W T = SW (i.e., T = W −1SW — see Problem 4.42). Consider a sequence {Tn } of operators in B[X ] and a sequence {Sn } of operators in B[Y ]. Suppose there exists W ∈ G[X , Y ] such that W Tn = Sn W for every integer n. Show that u S Sn −→
implies
u Tn −→ W −1SW,
s S Sn −→
implies
s Tn −→ W −1SW.
s u P (or S n −→ P ) for some P ∈ B[Y ], In particular, if W T = SW and S n −→ n s −1 n u −1 then show that T −→ W P W (or T −→ W P W ) and W −1P W is a projection in B[X ] (cf. Problem 4.55).
Hint: T_n − W⁻¹SW = W⁻¹(S_n − S)W.
Now let X and Y be two Hilbert spaces. Recall that T ∈ B[X] and S ∈ B[Y] are unitarily equivalent if there exists a unitary U ∈ G[X, Y] such that UT = SU (i.e., T = U⁻¹SU — see Problem 5.9). Again, suppose there exists a unitary U ∈ G[X, Y] such that UT_n = S_nU for every integer n. Show that

S_n →w S  implies  T_n →w U⁻¹SU.

In particular, if UT = SU and Sⁿ →w P for some P ∈ B[Y], then show that Tⁿ →w U⁻¹PU, and that U⁻¹PU is a projection in B[X] (not necessarily orthogonal — cf. Problem 5.24).
Hint: ⟨(T_n − U⁻¹SU)x ; y⟩ = ⟨U⁻¹(S_n − S)Ux ; y⟩ = ⟨(S_n − S)Ux ; Uy⟩.

Problem 5.26. Take a B[H, K]-valued sequence {T_n}, where H and K are Hilbert spaces, and take T ∈ B[H, K]. Show that

T_n →u T  if and only if  T_n* →u T*,
T_n →w T  if and only if  T_n* →w T*.

The adjoint operation preserves uniform and weak convergence. But it does not preserve strong convergence. In fact, as we shall see in Problem 5.29(b,c),

T_n →s T  does not imply  T_n* →s T*.

However, if T_n →s T and T_n* →w S, then S = T*. Why?
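The uniform half of Problem 5.26 rests on the adjoint being isometric for the operator norm: ‖T_n − T‖ = ‖(T_n − T)*‖ = ‖T_n* − T*‖. A small finite-dimensional check (my own illustration; the matrix size, seed, and the sequence T_n = T + R/n are arbitrary choices) using the spectral norm:

```python
import numpy as np

rng = np.random.default_rng(3)
T = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
R = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

gaps, gaps_star = [], []
for n in range(1, 6):
    D = (1.0 / n) * R                           # D = T_n - T for T_n = T + R/n
    gaps.append(np.linalg.norm(D, 2))           # ||T_n - T||
    gaps_star.append(np.linalg.norm(D.conj().T, 2))  # ||T_n* - T*||

assert np.allclose(gaps, gaps_star)   # the two gaps agree for every n
assert gaps[-1] < gaps[0]             # and both tend to 0 as n grows
```

No such identity links ‖(T_n − T)x‖ to ‖(T_n − T)*x‖ for individual vectors x, which is precisely why strong convergence is not preserved (Problem 5.29(b,c)).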
Problem 5.27. Let H and K be Hilbert spaces. Show that
(a) V ∈ B[H, K] is an isometry if and only if V*ⁿVⁿ = I (where I is the identity on H) for every n ≥ 1.
(b) U ∈ B[H, K] is a unitary transformation if and only if U*ⁿUⁿ = I (identity on H) and UⁿU*ⁿ = I (identity on K) for every n ≥ 1.
Hint: Propositions 5.72 and 5.73.
Let T ∈ B[H] be a diagonalizable operator (Problem 5.17). Show that
(c) T*x = ∑_{k=1}^∞ λ̄_k⟨x ; e_k⟩e_k for every x ∈ H,
(d) T is unitary if and only if |λ_k| = 1 for every k ≥ 1.

Problem 5.28. Let H_i and K_i be Hilbert spaces for i = 1, 2 and consider the Hilbert spaces H₁ ⊕ H₂ and K₁ ⊕ K₂ of Example 5.E. Take T_ij ∈ B[H_j, K_i] for i, j = 1, 2 and consider the transformation

    T = ⎡ T₁₁  T₁₂ ⎤   in B[H₁ ⊕ H₂, K₁ ⊕ K₂]
        ⎣ T₂₁  T₂₂ ⎦

of Problem 4.17. Show that

    T* = ⎡ T₁₁*  T₂₁* ⎤   in B[K₁ ⊕ K₂, H₁ ⊕ H₂].
         ⎣ T₁₂*  T₂₂* ⎦

In particular, if K_i = H_i for i = 1, 2 so that T, T* ∈ B[H₁ ⊕ H₂], then

    T = T₁₁ ⊕ T₂₂ = ⎡ T₁₁  O  ⎤   implies   T* = T₁₁* ⊕ T₂₂* = ⎡ T₁₁*  O   ⎤.
                    ⎣ O   T₂₂ ⎦                               ⎣ O    T₂₂* ⎦

Let {H_k} be a sequence of Hilbert spaces and let ⊕_k H_k stand for the Hilbert space (⊕_{k=1}^∞ H_k, ⟨ ; ⟩) of Examples 5.F and 5.G. Let {T_k} be a sequence of operators in B[H_k] and consider their direct sum T = ⊕_k T_k ∈ B[⊕_k H_k] as in Problem 4.16. Show that T* = ⊕_k T_k* ∈ B[⊕_k H_k]. Next consider the usual identification H_k ≅ ⊕_{j<k}{0} ⊕ H_k ⊕ ⊕_{k<j}{0}, so that we may interpret each H_k as a subspace of ⊕_k H_k. Show that (cf. Corollary 5.75) H_k reduces T = ⊕_k T_k, T|H_k = T_k, and (T|H_k)* = T*|H_k, for each k.
Now let {H_γ} be a family of Hilbert spaces and consider the orthogonal direct sum ⊕_γ H_γ as in Problem 5.10. Rewrite Examples 5.F and 5.G for uncountable orthogonal sums. Restate the above results for uncountable families {T_γ} of operators in B[H_γ], considering their direct sum T = ⊕_γ T_γ ∈ B[⊕_γ H_γ].

Problem 5.29. An operator S₊ acting on a Hilbert space H is a unilateral shift if there exists an infinite sequence {H_k}_{k=0}^∞ of nonzero pairwise orthogonal subspaces of H such that H = ⊕_{k=0}^∞ H_k and S₊ maps each H_k isometrically onto H_{k+1}. That is, H_j ⊥ H_k whenever j ≠ k, H = ⊕_{k=0}^∞ H_k, and S₊: H → H is such that S₊|H_k: H_k → H_{k+1} is a surjective isometry.
420
5. Hilbert Spaces
Thus each S+|Hk: Hk → Hk+1 is a unitary transformation (a surjective isometry), and so dim Hk+1 = dim Hk for every k ≥ 0 (Theorem 5.49). Such a common dimension is the multiplicity of S+. The adjoint S+∗ ∈ B[H] of S+ ∈ B[H] is referred to as a backward unilateral shift. We shall write ⊕_{k=0}^∞ xk for {xk}_{k=0}^∞ in ⊕_{k=0}^∞ Hk. Prove the following assertions.
(a) S+: H → H and S+∗: H → H are given by the formulas

    S+x = 0 ⊕ ⊕_{k=1}^∞ Uk xk−1    and    S+∗x = ⊕_{k=0}^∞ Uk+1∗ xk+1

for every x = ⊕_{k=0}^∞ xk in H = ⊕_{k=0}^∞ Hk, with 0 denoting the origin of H0, where {Uk+1}_{k=0}^∞ is a sequence of unitary transformations Uk+1: Hk → Hk+1 with S+|Hk = Uk+1 for each k ≥ 0. The operators S+ and S+∗ are identified with the infinite matrices

    S+ = [ O             ]          S+∗ = [ O  U1∗           ]
         [ U1  O         ]                [    O   U2∗       ]
         [     U2  O     ]    and         [        O   U3∗   ]
         [         U3  O ]                [            O  ⋱ ]
         [            ⋱ ]

of transformations, where every entry immediately below (above) the main block diagonal in the matrix of S+ (of S+∗) is a unitary transformation, and the remaining entries are all null transformations.
(b) S+∗ is a strongly stable coisometry. That is, S+ is an isometry and S+∗ⁿ →s O (so that S+ is an isometry that is not a coisometry).
Hint: ‖S+x‖² = ‖x‖² and S+∗ⁿx = ⊕_{k=0}^∞ Uk+1∗ ··· Uk+n∗ xk+n (by induction), so that ‖S+∗ⁿx‖² = Σ_{k=n}^∞ ‖xk‖².
(c) S+ⁿ →w O but {S+ⁿ} does not converge strongly.
Hint: S+∗ⁿ →w O and ‖S+ⁿx‖ = ‖x‖ (i.e., S+∗ⁿS+ⁿ = I).
Let K0 be any Hilbert space unitarily equivalent to H0, and let U0: K0 → H0 be a unitary transformation, so that dim Hk = dim K0 for all k. Consider the Hilbert space ℓ²₊(K0) = ⊕_{k=0}^∞ K0 of Example 5.F.
(d) U = ⊕_{k=0}^∞ Uk ··· U0: ℓ²₊(K0) → H is a unitary transformation, and

    U∗S+U = [ O          ]
            [ I  O       ]
            [    I  O    ]   in B[ℓ²₊(K0)].
            [       I  O ]
            [         ⋱ ]

Thus S+ is unitarily equivalent to U∗S+U, which is a unilateral shift of multiplicity dim K0. This is called the canonical unilateral shift on ℓ²₊(K0).
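As a numerical aside (not part of the original problem), the algebra of items (a)–(c) can be illustrated on a finite truncation with numpy. Everything below is an assumption of the sketch: the size N, the random 1-by-1 unitaries Uk = e^{iθk}, and the truncation itself, which can only mimic, never realize, a genuine shift.

```python
import numpy as np

# Finite truncation of a unilateral shift of multiplicity 1 (illustrative:
# the genuine object lives on an infinite-dimensional space).  The 1-by-1
# unitaries U_k are unimodular scalars.
N = 8
rng = np.random.default_rng(0)
U = np.exp(1j * rng.uniform(0, 2 * np.pi, N - 1))
S = np.zeros((N, N), dtype=complex)
for k in range(N - 1):
    S[k + 1, k] = U[k]                   # entries immediately below the diagonal

adj = S.conj().T                         # the adjoint (backward shift)

# S*S = I except at the last basis vector (an artifact of truncation),
# and SS* is the projection complementary to span{e_0}:
assert np.allclose(adj @ S, np.diag([1] * (N - 1) + [0]))
assert np.allclose(S @ adj, np.diag([0] + [1] * (N - 1)))

# ||S*^n x||^2 = sum_{k >= n} |x_k|^2, the tail series of item (b):
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
for n in range(N):
    lhs = np.linalg.norm(np.linalg.matrix_power(adj, n) @ x) ** 2
    assert np.isclose(lhs, np.sum(np.abs(x[n:]) ** 2))
```

The tail identity makes the strong stability of S+∗ visible: as n grows the tail Σ_{k≥n}|xk|² of a fixed square-summable sequence vanishes.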
Item (b) says that the adjoint of a unilateral shift is a strongly stable coisometry. It can be shown that the converse holds true, so that a unilateral shift is precisely an operator whose adjoint is a strongly stable coisometry: S+ is a unilateral shift if and only if S+∗ is a strongly stable coisometry.
(e) If an isometry commutes with a unilateral shift, then their product is again a unilateral shift.
Hint: If V is an isometry and S+ a unilateral shift, then, since S+ is an isometry, both S+V and VS+ are isometries (Problem 4.41(b)). Suppose S+V = VS+, so that ‖(S+V)∗ⁿx‖ = ‖((S+V)ⁿ)∗x‖ = ‖(S+ⁿVⁿ)∗x‖. Hence, since an isometry is a contraction, ‖(S+V)∗ⁿx‖ = ‖V∗ⁿ(S+∗ⁿx)‖ ≤ ‖S+∗ⁿx‖ for every n ≥ 1 and x ∈ H. Thus, since S+∗ⁿ →s O, we get (S+V)∗ⁿ →s O.
We shall see later (Problem 5.31(c)) that if an isometry does not commute with a unilateral shift, then their product may not be a unilateral shift.

Problem 5.30. An operator S acting on a Hilbert space H is a bilateral shift if there exists an infinite family {Hk}_{k=−∞}^∞ of nonzero pairwise orthogonal subspaces of H such that H = ⊕_{k=−∞}^∞ Hk and S maps each Hk isometrically onto Hk+1. That is, Hj ⊥ Hk whenever j ≠ k, H = ⊕_{k=−∞}^∞ Hk, and S: H → H is such that S|Hk: Hk → Hk+1 is a surjective isometry. As in the case of a unilateral shift, each S|Hk: Hk → Hk+1 is a unitary transformation. Thus the subspaces Hk of H are all unitarily equivalent, and their common dimension is the multiplicity of S. The adjoint S∗ ∈ B[H] of S ∈ B[H] is referred to as a backward bilateral shift. Prove the following assertions.
(a) S: H → H and S∗: H → H are given by the formulas

    Sx = ⊕_{k=−∞}^∞ Uk xk−1    and    S∗x = ⊕_{k=−∞}^∞ Uk+1∗ xk+1

for every x = ⊕_{k=−∞}^∞ xk in H = ⊕_{k=−∞}^∞ Hk, where {Uk}_{k=−∞}^∞ is a family of unitary transformations Uk+1: Hk → Hk+1 with S|Hk = Uk+1 for each integer k.
The operators S and S∗ are identified with the following (doubly) infinite matrices of transformations (the parenthesized entry marks the zero–zero position):

    S = [ ⋱              ]          S∗ = [ ⋱                  ]
        [ U−1  O          ]              [  O  U−1∗            ]
        [      U0  (O)    ]    and       [      O   U0∗        ]
        [          U1  O  ]              [         (O)  U1∗    ]
        [             ⋱  ]              [              O  ⋱  ]

Again, every entry immediately below (above) the main block diagonal in the matrix of S (of S∗) is a unitary transformation, and the remaining entries are all null transformations.
(b) S is unitarily equivalent to the canonical bilateral shift acting on ℓ²(K0) = ⊕_{k=−∞}^∞ K0,

    [ ⋱             ]
    [  I  O          ]
    [     I  (O)     ]   in B[ℓ²(K0)],
    [         I  O   ]
    [            ⋱  ]

where K0 is any Hilbert space with the property that dim K0 is the multiplicity of S. (Hint: Problem 5.29(d).)
(c) S is a weakly stable unitary operator. In other words, S is an isometry and a coisometry (i.e., S∗ⁿSⁿ = SⁿS∗ⁿ = I) and Sⁿ →w O.
Hint: Let S0 denote the above canonical bilateral shift on ℓ²(H0), which is unitarily equivalent to S. Take x = ⊕_{k=−∞}^∞ xk in ℓ²(H0) = ⊕_{k=−∞}^∞ H0. Show by induction that S0ⁿx = ⊕_{k=−∞}^∞ xk−n. Suppose n is even. Thus

    ⟨S0ⁿx ; x⟩ = Σ_{k=−∞}^∞ ⟨xk−n ; xk⟩ = Σ_{k=−∞}^{−(n/2)−1} ⟨xk ; xk+n⟩ + Σ_{k=−n/2}^∞ ⟨xk ; xk+n⟩.

Now apply the Schwarz inequality in H0 and in ℓ² to show that

    Σ_{k=−∞}^{−(n/2)−1} |⟨xk ; xk+n⟩| ≤ Σ_{k=−∞}^{−(n/2)−1} ‖xk‖‖xk+n‖
        ≤ (Σ_{k=−∞}^{−(n/2)−1} ‖xk‖²)^{1/2} (Σ_{k=−∞}^{−(n/2)−1} ‖xk+n‖²)^{1/2}
        ≤ (Σ_{k=−∞}^{−(n/2)−1} ‖xk‖²)^{1/2} ‖x‖.

Similarly,

    Σ_{k=−n/2}^∞ |⟨xk ; xk+n⟩| ≤ (Σ_{k=−n/2}^∞ ‖xk+n‖²)^{1/2} ‖x‖ = (Σ_{j=n/2}^∞ ‖xj‖²)^{1/2} ‖x‖.

Next verify that

    lim_n Σ_{k=−∞}^{−(n/2)−1} ‖xk‖² = lim_n Σ_{j=n/2}^∞ ‖xj‖² = 0.

A similar argument holds for n odd. Thus infer that S0ⁿ →w O (cf. Proposition 5.67). Use Problem 5.25 to conclude that Sⁿ →w O.
(d) Neither {Sⁿ} nor {S∗ⁿ} converges strongly.
Hint: S∗ⁿ →w O and ‖Sⁿx‖ = ‖S∗ⁿx‖ = ‖x‖.
Problem 5.31. According to Problem 5.29, unilateral shifts exist only on infinite-dimensional spaces. Now let H be a separable Hilbert space.
(a) S+ ∈ B[H] is a unilateral shift of multiplicity 1 if and only if

    S+ek = ek+1    for every    k ∈ N0

for some orthonormal basis {ek}_{k=0}^∞ for H; that is, if and only if it shifts some orthonormal basis indexed by N0 (or by any set in a one-to-one order-preserving correspondence with N0). In this case, also show that

    S+∗e0 = 0    and    S+∗ek = ek−1    for every    k ∈ N.
Hint: Set Hk = span{ek}, so that dim Hk = 1, for each k ∈ N0.
(b) Verify that the operator S+ ∈ B[ℓ²₊] of Problem 4.39 is, in fact, the canonical unilateral shift of multiplicity 1 on ℓ²₊, which shifts the canonical orthonormal basis for ℓ²₊. (Hint: ℓ²₊ = ⊕_{k=0}^∞ C.)
(c) Take the unilateral shift S+ ∈ B[ℓ²₊] of item (b). Exhibit a unitary operator U ∈ B[ℓ²₊] such that neither product US+ nor S+U is a unilateral shift.
Hint: U = W ⊕ I in B[ℓ²₊] is unitary, where W = (0 1; 1 0) in B[C²] and I is the identity in B[ℓ²₊]. US+ = 1 ⊕ S+ is an isometry (Problem 4.41(b)), but

    (US+)∗ⁿ = 1 ⊕ S+∗ⁿ →s 1 ⊕ O = diag(1, 0, 0, 0, ...) ≠ O,

and so the isometry US+ is not a unilateral shift (see Problem 5.29). Moreover, since the unitary U is also self-adjoint, we get

    (S+U)∗^(n+1) = (US+∗)^(n+1) = U(S+∗U)ⁿS+∗ = U(US+)∗ⁿS+∗ →s U(1 ⊕ O)S+∗,

where U(1 ⊕ O)S+∗ = diag(0, 1, 0, 0, ...) ≠ O. Again, S+U is an isometry with an adjoint that is not strongly stable, and so it is not a unilateral shift.
(d) Let {ek}_{k=−∞}^∞ be the canonical orthonormal basis for ℓ². Reindex this basis as follows. Set, for every n ∈ N,
    fn = e(1−n)/2 if n is odd    and    fn = en/2 if n is even.

Check that {fn}_{n=1}^∞ is an orthonormal basis for ℓ², and that the operator S+ in B[ℓ²] whose (doubly) infinite matrix has a single entry 1 in each column, placed according to

    S+e0 = e1,    S+ek = e−k  and  S+e−k = ek+1    for every  k ≥ 1

(all remaining entries being 0, with the zero–zero entry marked (0)), is a unilateral shift on ℓ² that shifts the orthonormal basis {fn}_{n=1}^∞.
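The reindexing in item (d) can be sanity-checked mechanically. In the Python sketch below (the function name f and the map succ are ours), the e-index successor map ek ↦ S+ek induced by fn ↦ fn+1 turns out to be injective but to miss e0 — exactly the behavior a unilateral (rather than bilateral) shift of the basis must have.

```python
# Sanity check of the reindexing in item (d) (names are illustrative).
# f_n carries e-index (1-n)/2 for odd n and n/2 for even n; the induced
# successor map e_k -> S+ e_k shifts the basis of l^2(Z) "spirally".

def f(n):                  # e-index of the n-th reindexed basis vector, n >= 1
    return (1 - n) // 2 if n % 2 else n // 2

N = 1000
succ = {f(n): f(n + 1) for n in range(1, N)}   # e_k  |->  S+ e_k

assert len(set(succ.values())) == len(succ)    # injective (a basis isometry)
assert 0 not in succ.values()                  # misses e_0 = f_1: not surjective,
                                               # hence unilateral, not bilateral
```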
Problem 5.32. According to Problem 5.30, bilateral shifts exist only on infinite-dimensional spaces. Now let H be a separable Hilbert space.
(a) S ∈ B[H] is a bilateral shift of multiplicity 1 if and only if

    Sek = ek+1    for every    k ∈ Z

for some orthonormal basis {ek}_{k=−∞}^∞ for H; that is, if and only if it shifts some orthonormal basis indexed by Z (or by any set in a one-to-one order-preserving correspondence with Z). In this case, also show that

    S∗ek = ek−1    for every    k ∈ Z.

Hint: Set Hk = span{ek}, so that dim Hk = 1, for each k ∈ Z.
(b) Verify that the operator S ∈ B[ℓ²] of Problem 4.40 is, in fact, the canonical bilateral shift of multiplicity 1 on ℓ², which shifts the canonical orthonormal basis for ℓ². (Hint: ℓ² = ⊕_{k=−∞}^∞ C.)
(c) Let {en}_{n=1}^∞ be the canonical orthonormal basis for ℓ²₊. Reindex this basis as follows. Set, for every k ∈ Z,

    fk = e1−2k if k ≤ 0    and    fk = e2k if k > 0.

Check that {fk}_{k=−∞}^∞ is an orthonormal basis for ℓ²₊ and that the operator S in B[ℓ²₊] given by the infinite matrix

    S = [ b  A          ]
        [    B  A       ]        with  b = [0] ,  A = [0 1] ,  and  B = [0 0] ,
        [       B  A    ]                  [1]        [0 0]             [1 0]
        [          B  ⋱]

is a bilateral shift on ℓ²₊ that shifts the orthonormal basis {fk}_{k=−∞}^∞.
Problem 5.33. Consider the orthonormal basis {ek}_{k∈Z} for the Hilbert space L²(T) of Example 5.L(c), where T denotes the unit circle about the origin of the complex plane and, for each k ∈ Z, ek(z) = zᵏ for every z ∈ T. Define a map U: L²(T) → L²(T) as follows. If f ∈ L²(T), then Uf is given by

    (Uf)(z) = zf(z)    for every    z ∈ T.

(a) Verify that Uf ∈ L²(T) for every f ∈ L²(T), and U ∈ B[L²(T)].
(b) Show that U is a bilateral shift of multiplicity 1 on L²(T) that shifts the orthonormal basis {ek}_{k∈Z}.
(c) Prove the Riemann–Lebesgue Lemma: If f ∈ L²(T), then ∫_T zᵏf(z) dz → 0 as k → ±∞.
Hint: (Uᵏf)(z) = zᵏf(z), so that ⟨Uᵏf ; 1⟩ = ∫_T zᵏf(z) dz, where 1(z) = 1 for all z ∈ T. Recall that Uᵏ →w O (cf. Problem 5.30(c)).

Problem 5.34. Let H be a Hilbert space and take T, S ∈ B[H]. Use Problem 4.20 and Corollary 5.75 to prove the following assertion. If S commutes with both T and T∗, then N(S) and R(S)⁻ reduce T.

Problem 5.35. Take T ∈ B[H, K], where H and K are Hilbert spaces. Prove the following propositions.
(a) N(T) = {0} ⇐⇒ N(T∗T) = {0} ⇐⇒ R(T∗)⁻ = H ⇐⇒ R(T∗T)⁻ = H. Moreover, R(T∗) = H ⇐⇒ R(T∗T) = H.
(a∗) N(T∗) = {0} ⇐⇒ N(TT∗) = {0} ⇐⇒ R(T)⁻ = K ⇐⇒ R(TT∗)⁻ = K. Moreover, R(T) = K ⇐⇒ R(TT∗) = K.
Hint: Use Propositions 5.15, 5.76, and 5.77 and recall that R(T) = K if and only if R(T) = R(T)⁻ = K.
(b) R(T) = K ⇐⇒ T∗ has a bounded inverse on R(T∗).
(b∗) R(T∗) = H ⇐⇒ T has a bounded inverse on R(T).
Hint: Corollary 4.24 and Proposition 5.77.

Problem 5.36. Consider the following assertions (setup of Problem 5.35):

    (a)  N(T) = {0}.           (a∗)  N(T∗) = {0}.
    (b)  dim R(T) = n.         (b∗)  dim R(T∗) = m.
    (c)  R(T∗) = H.            (c∗)  R(T) = K.
    (d)  T∗T ∈ G[H].           (d∗)  TT∗ ∈ G[K].

If dim H = n, then (a), (b), (c), and (d) are pairwise equivalent. If dim K = m, then (a∗), (b∗), (c∗), and (d∗) are pairwise equivalent. Prove.

Problem 5.37. Let H and K be Hilbert spaces and take T ∈ B[H, K]. If y ∈ R(T), then there is a solution x ∈ H to the equation y = Tx. It is clear that this solution is unique whenever T is injective. If, in addition, R(T) is closed in K, then this unique solution is given by x = (T∗T)⁻¹T∗y. In other words, suppose N(T) = {0} and R(T) = R(T)⁻. According to Corollary 4.24, there exists T⁻¹ ∈ B[R(T), H]. Use Propositions 5.76 and 5.77 to show that there exists (T∗T)⁻¹ ∈ B[R(T∗), H] = B[H] and

    T⁻¹ = (T∗T)⁻¹T∗    on R(T).
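The formula of Problem 5.37 is the normal-equations inverse familiar from numerical linear algebra, and it is easy to check numerically. A minimal numpy sketch (the matrix sizes and random data are arbitrary choices of ours):

```python
import numpy as np

# Check of T^{-1} = (T*T)^{-1} T* on R(T) (Problem 5.37) for an injective
# matrix T with (automatically) closed range.
rng = np.random.default_rng(1)
T = rng.standard_normal((5, 3))             # injective from R^3 into R^5 (generically)
x = rng.standard_normal(3)
y = T @ x                                   # y lies in R(T)

x_rec = np.linalg.solve(T.T @ T, T.T @ y)   # (T*T)^{-1} T* y
assert np.allclose(x_rec, x)                # recovers the unique solution
```

In finite dimensions every range is closed, so the closed-range hypothesis of the problem is automatic here; it is exactly the hypothesis that fails for general operators on infinite-dimensional spaces.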
Problem 5.38. (Least-Squares). Let H and K be Hilbert spaces and take T ∈ B[H, K]. If y ∈ K∖R(T), then there is no solution x ∈ H to the equation y = Tx. Question: Is there a vector x in H that minimizes ‖y − Tx‖? Use Theorem 5.13, Proposition 5.76, and Problem 5.37 to prove the following proposition. If R(T) = R(T)⁻, then for each y ∈ K there is an x_y ∈ H such that

    ‖y − Tx_y‖ = inf_{x∈H} ‖y − Tx‖    and    T∗Tx_y = T∗y.
Moreover, if T is injective, then x_y is unique and given by x_y = (T∗T)⁻¹T∗y.

Problem 5.39. Let H and K be Hilbert spaces and take T ∈ B[H, K]. If y ∈ R(T) and R(T) = R(T)⁻, then show that there is an x0 in H such that y = Tx0 and ‖x0‖ ≤ ‖x‖ for all x ∈ H such that y = Tx. That is, if R(T) = R(T)⁻, then for each y ∈ R(T) there exists a solution x0 ∈ H to the equation y = Tx with minimum norm. Moreover, if T∗ is injective, then show that x0 is unique and given by x0 = T∗(TT∗)⁻¹y.
Hint: If R(T) = R(T)⁻, then R(TT∗) = R(T) (Propositions 5.76 and 5.77). Take y ∈ R(T), so that y = TT∗z for some z in K. Set x0 = T∗z in H, and so y = Tx0. If x ∈ H is such that y = Tx, then ‖x0‖² = ⟨T∗z ; x0⟩ = ⟨z ; Tx⟩ = ⟨T∗z ; x⟩ = ⟨x0 ; x⟩ ≤ ‖x0‖‖x‖. If N(T∗) = {0}, then N(TT∗) = {0} (Proposition 5.76). Since R(TT∗) = R(T) = R(T)⁻, there exists (TT∗)⁻¹ in B[R(T), K] (Corollary 4.24). Thus z = (TT∗)⁻¹y is unique, and so is x0 = T∗z.

Problem 5.40. Show that T ∈ B0[H, K] if and only if T∗ ∈ B0[K, H], where H and K are Hilbert spaces. Moreover, dim R(T) = dim R(T∗).
Hint: B0[H, K] denotes the set of all finite-rank bounded linear transformations of H into K. If T ∈ B0[H, K], then R(T) = R(T)⁻. (Why?) Now use Propositions 5.76 and 5.77 to show that R(T∗) = T∗(R(T)). Thus conclude: dim R(T∗) ≤ dim R(T) (cf. Problems 2.17 and 2.18).

Problem 5.41. Let T ∈ B[H, Y] be a bounded linear transformation of a Hilbert space H into a normed space Y. Show that the following assertions are pairwise equivalent.
(a) T is compact (i.e., T ∈ B∞[H, Y]).
(b) Txn → Tx in Y whenever xn →w x in H.
(c) Txn → 0 in Y whenever xn →w 0 in H.
Hint: Problem 4.69 for (a)⇒(b). Conversely, let {xn} be a bounded sequence in H. Apply Lemma 5.69 to ensure the existence of a subsequence {xn_k} of {xn} such that {Txn_k} converges in Y whenever (b) holds true. Now conclude that T is compact (Theorem 4.52(d)). Hence (b)⇒(a). Trivially, (b)⇒(c). On the other hand, if xn →w x in H, then verify that T(xn − x) → 0 in Y whenever (c) holds; that is, (c)⇒(b).

Problem 5.42. If T ∈ B[H, K], where H and K are Hilbert spaces, then show that the following assertions are pairwise equivalent.
(a) T is compact (i.e., T ∈ B∞[H, K]).
(b) T is the (uniform) limit in B[H, K] of a sequence of finite-rank bounded linear transformations of H into K. That is, there exists a B0[H, K]-valued sequence {Tn} such that ‖Tn − T‖ → 0.
(c) T∗ is compact (i.e., T∗ ∈ B∞[K, H]).
Hint: Take any T ∈ B∞[H, K] and let {ek}_{k=1}^∞ be an orthonormal basis for R(T)⁻. If Pn: K → K is the orthogonal projection onto the span of {ek}_{k=1}^n, then PnT →u T. Indeed, R(T)⁻ is separable (Proposition 4.57), and Theorem 5.52 ensures the existence of Pn. Show: Pn →s P, where P: K → K is the orthogonal projection onto R(T)⁻ (Problem 5.15). Use Problem 4.57 to verify that PnT →u PT = T. Set Tn = PnT and show that each Tn lies in B0[H, K]. Hence (a)⇒(b). For the converse, see Corollary 4.55. Thus (a)⇔(b), which implies (a)⇔(c) (Proposition 5.65(d) and Problem 5.40). Now prove the following proposition:

    B0[H, K] is dense in B∞[H, K].
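In finite dimensions the density assertion is vacuous (every operator has finite rank), but the mechanism — approximating T in operator norm by rank-n truncations — can still be illustrated. The sketch below substitutes the singular value decomposition for the basis projections Pn of the hint, relying on the classical fact that the best rank-n approximation error in operator norm is the (n+1)-st singular value:

```python
import numpy as np

# Rank-n truncations converge to T in operator norm; here via the SVD
# (a finite-dimensional stand-in for the hint's basis projections P_n).
rng = np.random.default_rng(2)
T = rng.standard_normal((6, 6))
U, s, Vt = np.linalg.svd(T)

for n in range(1, 6):
    Tn = U[:, :n] * s[:n] @ Vt[:n, :]      # a rank-n operator
    err = np.linalg.norm(T - Tn, ord=2)    # operator (spectral) norm
    assert np.isclose(err, s[n])           # the error is the next singular value
```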
Problem 5.43. An operator J ∈ B[H] on a Hilbert space H is an involution if J² = I (cf. Problem 1.11). A symmetry is a unitary involution.
(a) Take S ∈ B[H]. Show that the following assertions are pairwise equivalent.
(i) S is a unitary involution.
(ii) S is a self-adjoint involution.
(iii) S is self-adjoint and unitary.
(iv) S is an involution such that S∗S = SS∗.
(b) Exhibit an involution on C² that is not self-adjoint.
Hint: J = (√2 i; i −√2) in B[C²].
(c) Exhibit a unitary on C² that is not self-adjoint.
Hint: U = (cos θ −sin θ; sin θ cos θ) in B[C²] for any θ ∈ (0, π).
(d) Consider the symmetry S = (0 1; 1 0) in B[C²]. Find a resolution of the identity on C², say {P1, P−1}, such that S = P1 − P−1. (As we shall see in Chapter 6, P1 − P−1 is the spectral decomposition of S.)
Hint: {P1, P−1} with P1 = ½(1 1; 1 1) and P−1 = ½(1 −1; −1 1) in B[C²] is a resolution of the identity on C² (i.e., P1² = P1 = P1∗, P−1² = P−1 = P−1∗, P1P−1 = P−1P1 = O, P1 + P−1 = I) such that S = P1 − P−1.
(e) Exhibit a symmetry S and a nilpotent T (both acting on the same Hilbert space) such that ST is a nonzero idempotent. That is, exhibit S, T ∈ B[H], where S = S∗ = S⁻¹ and T² = O, such that O ≠ ST = (ST)².
Hint: T = (0 0; 1 0) and S = (0 1; 1 0).
(f) Exhibit an operator in B[H] that is unitarily equivalent (through a symmetry) to its adjoint but is not self-adjoint. That is, exhibit T ∈ B[H] such that STS∗ = T∗ ≠ T for some S ∈ B[H] with S = S∗ = S⁻¹.
Hint: T = (α 0; 0 ᾱ) with α ∈ C∖R, or T = (0 α; β 0) with β ≠ α, α, β ∈ R; with S = (0 1; 1 0).

Problem 5.44. Let H be a Hilbert space. Show that the set of all self-adjoint operators from B[H] is weakly closed in B[H].
Hint: Verify: |⟨Tx ; y⟩ − ⟨x ; Ty⟩| = |⟨Tx ; y⟩ − ⟨Tnx ; y⟩ + ⟨x ; Tny⟩ − ⟨x ; Ty⟩| ≤ |⟨(Tn − T)x ; y⟩| + |⟨(Tn − T)y ; x⟩| whenever Tn∗ = Tn.

Problem 5.45. Let S and T be self-adjoint operators in B[H], where H is a Hilbert space. Prove the following results.
(a) T + S is self-adjoint.
(b) αT is self-adjoint if and only if α ∈ R. Therefore, if H is a real Hilbert space, then the set of all self-adjoint operators from B[H] is a subspace of B[H].
(c) TS is self-adjoint if and only if TS = ST.
(d) p(T) = p(T)∗ for every polynomial p with real coefficients.
(e) T^{2n} ≥ O and ‖T^{2ⁿ}‖ = ‖T‖^{2ⁿ} for each n ≥ 1. (Hint: Proposition 5.78.)

Problem 5.46. If an operator T ∈ B[H] acting on a complex Hilbert space H is such that T = A + iB, where A and B are self-adjoint operators in B[H], then the representation T = A + iB is called the Cartesian decomposition of T. Prove the following propositions.
(a) Every operator T ∈ B[H] on a complex Hilbert space H has a unique Cartesian decomposition.
Hint: Set A = ½(T∗ + T) and B = (i/2)(T∗ − T).
(b) T∗T = TT∗ if and only if AB = BA. In this case, T∗T = A² + B² and max{‖A‖², ‖B‖²} ≤ ‖T‖² ≤ ‖A²‖ + ‖B²‖.
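A quick numpy verification of the Cartesian decomposition (the random matrix is, of course, only illustrative data):

```python
import numpy as np

# Cartesian decomposition T = A + iB (Problem 5.46); data illustrative.
rng = np.random.default_rng(3)
T = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

A = (T.conj().T + T) / 2            # A = (T* + T)/2, self-adjoint
B = 1j * (T.conj().T - T) / 2       # B = i(T* - T)/2, self-adjoint

assert np.allclose(A, A.conj().T) and np.allclose(B, B.conj().T)
assert np.allclose(T, A + 1j * B)   # the decomposition is exact
# ||A|| <= ||T|| and ||B|| <= ||T|| always hold (triangle inequality):
assert np.linalg.norm(A, 2) <= np.linalg.norm(T, 2) + 1e-12
assert np.linalg.norm(B, 2) <= np.linalg.norm(T, 2) + 1e-12
```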
Problem 5.47. If T ∈ B[H] is a self-adjoint operator acting on a real Hilbert space H, then show that

    ⟨Tx ; y⟩ = ¼(⟨T(x + y) ; x + y⟩ − ⟨T(x − y) ; x − y⟩)

for every x, y ∈ H. (Hint: Problem 5.3(a).)

Problem 5.48. Let H be any (real or complex) Hilbert space.
(a) If {Tn} is a sequence of self-adjoint operators, then the five assertions of Proposition 5.67 are all pairwise equivalent, even in a real Hilbert space.
Hint: If Tn∗ = Tn and the real sequence {⟨Tnx ; x⟩} converges in R for every x ∈ H, and if H is real, then use Problem 5.47 to show that {⟨Tnx ; y⟩} converges in R for every x, y ∈ H. Now apply Proposition 5.67.
(b) If {Tn} is a sequence of self-adjoint operators, then the four assertions of Problem 5.5 are all pairwise equivalent, even in a real Hilbert space.
Hint: Problems 5.5 and 5.47.

Problem 5.49. The set B⁺[H] of all nonnegative operators on a Hilbert space H is a weakly closed convex cone in B[H].
Hint: If Qn ≥ O for every positive integer n and Qn →w Q, then Q ≥ O since 0 ≤ ⟨Qnx ; x⟩ = ⟨(Qn − Q)x ; x⟩ + ⟨Qx ; x⟩. See Problems 2.2 and 2.21.
Problem 5.50. Let H and K be Hilbert spaces and take T ∈ B[H, K]. Verify that T∗T ∈ B⁺[H] and TT∗ ∈ B⁺[K], and prove the following assertions.
(a) T∗T > O if and only if T is injective.
(b) T∗T ∈ G⁺[H] if and only if T ∈ G[H, K].
(a∗) TT∗ > O if and only if T∗ is injective.
(b∗) TT∗ ∈ G⁺[K] if and only if T∗ ∈ G[K, H].

Problem 5.51. Let H be a Hilbert space and take Q, R, and T in B[H]. Prove the following implications.
(a) Q ≥ O implies T∗QT ≥ O.
(b) Q ≥ O and R ≥ O imply Q + R ≥ O.
(c) Q > O and R ≥ O imply Q + R > O.
(d) Q ≫ O and R ≥ O imply Q + R ≫ O.

Problem 5.52. Let Q be an operator acting on a Hilbert space H. Prove the following propositions.
(a) Q ≥ O implies Qⁿ ≥ O for every integer n ≥ 0.
(b) Q > O implies Qⁿ > O for every integer n ≥ 0.
(c) Q ≫ O implies Qⁿ ≫ O for every integer n ≥ 0.
(d) Q ≫ O implies Q⁻¹ ≫ O.
(e) If p is an arbitrary polynomial with positive coefficients, then Q ≥ O implies p(Q) ≥ O, Q > O implies p(Q) > O, and Q ≫ O implies p(Q) ≫ O.
Hints: (a), (b), and (c) are trivially verified for n = 0, 1. Suppose n ≥ 2.
(a) Show that ⟨Qⁿx ; x⟩ = ‖Q^{n/2}x‖² for every x ∈ H if n is even, and ⟨Qⁿx ; x⟩ = ⟨Q Q^{(n−1)/2}x ; Q^{(n−1)/2}x⟩ for every x ∈ H if n is odd.
(b, c) Q > O if and only if Q ≥ O and N(Q) = {0}; and Q ≫ O if and only if Q ≥ O and Q is bounded below. In both cases, Q ≠ O. Note that (i) ⟨Q^{2n}x ; x⟩ = ‖Qⁿx‖², and (ii) ⟨Q^{2n−1}x ; x⟩ ≥ ‖Q‖⁻¹‖Qⁿx‖² (since, by Proposition 5.82, ‖Qⁿx‖² = ‖Q Q^{n−1}x‖² ≤ ‖Q‖⟨Q Q^{n−1}x ; Q^{n−1}x⟩). Apply (i) to show that (b) and (c) hold for n = 2, and hence they hold for n = 3 by (ii). Conclude the proofs by induction.
(d) ‖x‖² = ‖Q Q⁻¹x‖² ≤ ‖Q‖⟨Q Q⁻¹x ; Q⁻¹x⟩ = ‖Q‖⟨Q⁻¹x ; x⟩. Why?

Problem 5.53. Let H be a Hilbert space and take Q, R ∈ B[H]. Prove that
(a) O ≪ Q ≪ R implies O ≪ R⁻¹ ≪ Q⁻¹,
(b) O ≪ Q ≤ R implies O ≪ R⁻¹ ≤ Q⁻¹,
(c) O ≪ Q < R implies O ≪ R⁻¹ < Q⁻¹.
Hints: Consider the result in Problem 5.52(d).
(a) If O ≪ Q ≪ R, then Q⁻¹ ≫ O, R⁻¹ ≫ O, and (R − Q)⁻¹ ≫ O. Observe that Q⁻¹ − R⁻¹ = Q⁻¹(R − Q)R⁻¹ = ((R − Q + Q)(R − Q)⁻¹Q)⁻¹ = (Q + Q(R − Q)⁻¹Q)⁻¹ and Q + Q(R − Q)⁻¹Q ≫ O. So Q⁻¹ − R⁻¹ ≫ O.
(b) If O ≪ Q ≤ R, then Q⁻¹ ≫ O, R⁻¹ ≫ O, and O ≪ Q ≤ R ≪ ((n+1)/n)R. Note that Q⁻¹ − R⁻¹ = Q⁻¹ − (((n+1)/n)R)⁻¹ + (((n+1)/n)R)⁻¹ − R⁻¹ = Q⁻¹ − (((n+1)/n)R)⁻¹ − (1/(n+1))R⁻¹, and Q⁻¹ − (((n+1)/n)R)⁻¹ ≫ O by item (a). Thus ⟨(Q⁻¹ − R⁻¹)x ; x⟩ = ⟨(Q⁻¹ − (((n+1)/n)R)⁻¹)x ; x⟩ − (1/(n+1))⟨R⁻¹x ; x⟩ ≥ −(1/(n+1))‖R⁻¹‖‖x‖² for all n ≥ 1, and so ⟨(Q⁻¹ − R⁻¹)x ; x⟩ ≥ 0, for every x ∈ H.
(c) If O ≪ Q < R, then Q⁻¹ ≫ O, R⁻¹ ≫ O, and R − Q > O. Therefore, there is an α > 0 such that α‖x‖ ≤ ‖Q⁻¹x‖ for every x ∈ H, R⁻¹ ∈ G[H], and N(R − Q) = {0}. Hence 0 < α‖(R − Q)R⁻¹x‖ ≤ ‖Q⁻¹(R − Q)R⁻¹x‖ = ‖(Q⁻¹ − R⁻¹)x‖ for every nonzero vector x in H, and so N(Q⁻¹ − R⁻¹) = {0}. Recall that Q⁻¹ − R⁻¹ ≥ O by item (b).
Problem 5.54. Show that the following equivalences hold for every T in B[H, K], where H and K are Hilbert spaces (apply Corollary 5.83).

    T∗ⁿTⁿ →s O    ⇐⇒    T∗ⁿTⁿ →w O    ⇐⇒    Tⁿ →s O.

Now conclude that for a self-adjoint operator the concepts of strong and weak stability coincide (i.e., if T∗ = T, then Tⁿ →s O ⇐⇒ Tⁿ →w O).

Problem 5.55. Take Q, T ∈ B[H], where H is a Hilbert space. Prove the following assertions.
(a) −I ≤ T∗ = T ≤ I if and only if T∗ = T and ‖T‖ ≤ 1.
Hint: Use Propositions 5.78 and 5.79 to show the "only if" part. On the other hand, use Proposition 5.79 and recall that |⟨Tx ; x⟩| ≤ ‖T‖‖x‖².
(b) O ≤ Q ≤ I ⇐⇒ O ≤ Q and ‖Q‖ ≤ 1 ⇐⇒ Q∗ = Q and Q² ≤ Q.
Hint: These are equivalent characterizations of a nonnegative contraction.

Problem 5.56. Take P, Q, T ∈ B[H] on a Hilbert space H. Prove the following results.
(a) If T∗ = T and Tⁿ →w P, then P is an orthogonal projection.
Hint: Problems 5.24 and 5.44 and Proposition 5.81.
(b) If O ≤ Q ≤ I, then Q^{n+1} ≤ Qⁿ for every integer n ≥ 0.
Hint: Take n ≥ 1 and x ∈ H. If n is even, use Problem 5.55(b) and Proposition 5.82 to show that ⟨Qⁿx ; x⟩ = ‖Q^{n/2}x‖² ≤ ‖Q‖⟨Q Q^{(n−2)/2}x ; Q^{(n−2)/2}x⟩ ≤ ⟨Q^{n−1}x ; x⟩. If n is odd, then show that ⟨Qⁿx ; x⟩ = ⟨Q Q^{(n−1)/2}x ; Q^{(n−1)/2}x⟩ ≤ ⟨Q^{(n−1)/2}x ; Q^{(n−1)/2}x⟩ = ⟨Q^{n−1}x ; x⟩.
(c) If O ≤ Q ≤ I, then Qⁿ →s P and P is an orthogonal projection.
Hint: Problems 5.55(b), 4.47(a), 5.24, items (a, b), and Proposition 5.84.

Problem 5.57. This is our first problem that uses the square root of a nonnegative operator (Theorem 5.85). Take T ∈ B[H] acting on a complex Hilbert space H and prove the following propositions.
(a) If T ≠ O is self-adjoint, then U±(T) = ‖T‖⁻¹(T ± i(‖T‖²I − T²)^{1/2}) are unitary operators in B[H].
Hint: ‖T‖⁻²T² ≤ I, so that O ≤ ‖T‖²I − T² (cf. Problems 5.45 and 5.55). See Proposition 5.73.
(b) Every operator on a complex Hilbert space is a linear combination of four unitary operators.
Hint: If O ≠ T = T∗, then show that T = (‖T‖/2)U₊(T) + (‖T‖/2)U₋(T). Apply the Cartesian decomposition (Problem 5.46) if O ≠ T ≠ T∗.
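Problem 5.57(a) is easy to test numerically. In the sketch below the square root (‖T‖²I − T²)^{1/2} is computed from an eigendecomposition — our finite-dimensional stand-in for Theorem 5.85 — and both U± come out unitary, with (‖T‖/2)(U₊ + U₋) recovering T as in item (b):

```python
import numpy as np

# Numerical sketch of Problem 5.57 (illustrative data).
rng = np.random.default_rng(4)
M = rng.standard_normal((4, 4))
T = (M + M.T) / 2                          # a nonzero self-adjoint T
t = np.linalg.norm(T, 2)                   # ||T||

w, V = np.linalg.eigh(t ** 2 * np.eye(4) - T @ T)
root = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T   # (||T||^2 I - T^2)^{1/2}

U_plus = (T + 1j * root) / t               # U+(T)
U_minus = (T - 1j * root) / t              # U-(T)

for U in (U_plus, U_minus):
    assert np.allclose(U.conj().T @ U, np.eye(4))        # both are unitary
assert np.allclose((t / 2) * (U_plus + U_minus), T)      # item (b): T recovered
```

The unitarity check works because the square root, being a limit of polynomials in T², commutes with T, so the cross terms in U±∗U± cancel.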
Problem 5.58. If Q ∈ B⁺[H], where H is a Hilbert space, then show that (cf. Theorem 5.85 and Proposition 5.86)
(a) ⟨Qx ; x⟩ = ‖Q^{1/2}x‖² ≤ ‖Q‖^{1/2}⟨Q^{1/2}x ; x⟩ for every x ∈ H,
(b) ⟨Q^{1/2}x ; x⟩ ≤ ⟨Qx ; x⟩^{1/2}‖x‖ for every x ∈ H,
(c) Q^{1/2} > O if and only if Q > O,
(d) Q^{1/2} ≫ O if and only if Q ≫ O.

Problem 5.59. Take Q, R ∈ B⁺[H] on a Hilbert space H. Prove the following two assertions.
(a) If Q ≤ R and QR = RQ, then Q² ≤ R².
(b) Q ≤ R does not imply Q² ≤ R².
Hints: (a) ⟨RQ^{1/2}x ; Q^{1/2}x⟩ = ⟨QR^{1/2}x ; R^{1/2}x⟩. (b) Q = (1 0; 0 0) and R = (2 1; 1 1).
Remark: Applying the Spectral Theorem of Section 6.8 and the square root of Theorem 5.85, it can be shown that

    Q² ≤ R² implies Q ≤ R,    and so    Q ≤ R implies Q^{1/2} ≤ R^{1/2}.

Problem 5.60. Let Q and R be nonnegative operators acting on a Hilbert space. Use Problem 5.52 and Theorem 5.85 to prove that

    QR = RQ    implies    QⁿRᵐ ≥ O for every m, n ≥ 1.

Show that p(Q)q(R) ≥ O for every pair of polynomials p and q with positive coefficients whenever Q ≥ O and R ≥ O commute.

Problem 5.61. Let H and K be Hilbert spaces. Take any T in B[H, K] and recall that T∗T lies in B⁺[H]. Set
    |T| = (T∗T)^{1/2}

in B⁺[H], so that |T|² = T∗T. Prove the following assertions.
(a) ‖T‖ = ‖|T|²‖^{1/2} = ‖|T|‖ = ‖|T|^{1/2}‖².
(b) ⟨|T|x ; x⟩ = ‖|T|^{1/2}x‖² ≤ ‖|T|x‖‖x‖ for every x ∈ H.
(c) ‖Tx‖² = ‖|T|x‖² ≤ ‖T‖⟨|T|x ; x⟩ for every x ∈ H.
Moreover, if H = K (i.e., if T ∈ B[H]), then show that
(d) Tⁿ →s O ⇐⇒ |Tⁿ| →w O ⇐⇒ |Tⁿ| →s O,
(e) B⁺[H] = {T ∈ B[H] : T = |T|} (i.e., T ≥ O if and only if T = |T|).
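For matrices, |T| = (T∗T)^{1/2} can be computed from the eigendecomposition of T∗T, and items (a) and (c) checked directly (a sketch with illustrative random data):

```python
import numpy as np

# Sketch: |T| = (T*T)^{1/2} for a real 4x4 matrix, via the eigendecomposition
# of T*T (Theorem 5.85 supplies the square root in general).
rng = np.random.default_rng(5)
T = rng.standard_normal((4, 4))

w, V = np.linalg.eigh(T.T @ T)
absT = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T   # |T| >= O

assert np.allclose(absT @ absT, T.T @ T)                 # |T|^2 = T*T
x = rng.standard_normal(4)
assert np.isclose(np.linalg.norm(T @ x), np.linalg.norm(absT @ x))  # ||Tx|| = |||T|x||
assert np.isclose(np.linalg.norm(T, 2), np.linalg.norm(absT, 2))    # part of item (a)
```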
Problem 5.62. Let Q be a nonnegative operator on a Hilbert space. Show that

    Q is compact if and only if Q^{1/2} is compact.

Hint: If Q^{1/2} is compact, then Q is compact by Proposition 4.54. On the other hand, ‖Q^{1/2}xn‖² = ⟨Qxn ; xn⟩ ≤ sup_k‖xk‖ ‖Qxn‖ (Problem 5.41).
Take T ∈ B[H, K], where H and K are Hilbert spaces. Also prove that

    T ∈ B∞[H, K] ⇐⇒ T∗T ∈ B∞[H] ⇐⇒ |T| ∈ B∞[H] ⇐⇒ |T|^{1/2} ∈ B∞[H].

Problem 5.63. Consider a sequence {Qn} of nonnegative operators on a Hilbert space H (i.e., Qn ≥ O for every n). Prove the following propositions.
(a) Qn →s Q implies Qn^{1/2} →s Q^{1/2}.
(b) If Qn is compact for every n and Qn →u Q, then Qn^{1/2} →u Q^{1/2}.
Hints: Q ≥ O by Problem 5.49 and Propositions 5.68 and 4.48.
(a) Recall that Q^{1/2} is the strong limit of a sequence {pk(Q)} of polynomials in Q, where the polynomials {pk} themselves do not depend on Q; that is, pk(Q) →s Q^{1/2} for every Q ≥ O (cf. proof of Theorem 5.85). First verify that ‖(Qn^{1/2} − Q^{1/2})x‖ ≤ ‖(Qn^{1/2} − pk(Qn))x‖ + ‖(pk(Qn) − pk(Q))x‖ + ‖(pk(Q) − Q^{1/2})x‖. Now take an arbitrary ε > 0 and any x ∈ H. Show that there are positive integers nε and kε such that ‖(pkε(Q) − Q^{1/2})x‖ < ε, ‖(pkε(Qn) − pkε(Q))x‖ < ε for every n ≥ nε (since Qnʲ →s Qʲ for every positive integer j by Problem 4.46), and ‖(Qnε^{1/2} − pkε(Qnε))x‖ < ε.
(b) Note that Q ∈ B∞[H] by Theorem 4.53. Since Qn^{1/2} →s Q^{1/2} by part (a), we get Qn^{1/2}Q^{1/2} →u Q (Problems 5.62 and 4.57). Hence (Qn^{1/2} − Q^{1/2})² = Qn + Q − Qn^{1/2}Q^{1/2} − (Qn^{1/2}Q^{1/2})∗ →u O (Problem 5.26). But Qn^{1/2} − Q^{1/2} is self-adjoint, so that ‖Qn^{1/2} − Q^{1/2}‖² = ‖(Qn^{1/2} − Q^{1/2})²‖ (Problem 5.45).
Problem 5.64. Let {eγ}γ∈Γ and {fγ}γ∈Γ be orthonormal bases for a Hilbert space H. Take any operator T ∈ B[H]. Use the Parseval identity to show that

    Σγ∈Γ ‖Teγ‖² = Σγ∈Γ ‖T∗fγ‖² = Σα∈Γ Σβ∈Γ |⟨Teα ; fβ⟩|²

whenever the family of nonnegative numbers {‖Teγ‖²}γ∈Γ is summable; that is, whenever Σγ∈Γ ‖Teγ‖² < ∞ (cf. Proposition 5.31). Apply the above result to the operator |T|^{1/2} ∈ B⁺[H] (cf. Problem 5.61) and show that

    Σγ∈Γ ⟨|T|eγ ; eγ⟩ = Σγ∈Γ ⟨|T|fγ ; fγ⟩

whenever Σγ∈Γ ⟨|T|eγ ; eγ⟩ < ∞. Outcome: If the sum Σγ∈Γ ⟨|T|eγ ; eγ⟩ exists in R (i.e., if {⟨|T|eγ ; eγ⟩}γ∈Γ is summable), then it is independent of the choice
of the orthonormal basis {eγ}γ∈Γ for H. An operator T ∈ B[H] is trace-class (or nuclear) if Σγ∈Γ ⟨|T|eγ ; eγ⟩ < ∞ (equivalently, if Σγ∈Γ ‖|T|^{1/2}eγ‖² < ∞) for some orthonormal basis {eγ}γ∈Γ for H. Let B1[H] denote the subset of B[H] consisting of all trace-class operators on H. If T ∈ B1[H], then set

    ‖T‖₁ = Σγ∈Γ ⟨|T|eγ ; eγ⟩ = Σγ∈Γ ‖|T|^{1/2}eγ‖².
Problem 5.65. Let T ∈ B[H] be an operator on a Hilbert space H, and let {eγ}γ∈Γ be an orthonormal basis for H. If the operator |T|² is trace-class (as defined in Problem 5.64; that is, if Σγ∈Γ ‖|T|eγ‖² < ∞ or, equivalently, if Σγ∈Γ ‖Teγ‖² < ∞ — Problem 5.61(c)), then T is a Hilbert–Schmidt operator. Let B2[H] denote the subset of B[H] made up of all Hilbert–Schmidt operators on H. Take T ∈ B2[H]. According to Problems 5.61 and 5.64, set

    ‖T‖₂ = ‖T∗T‖₁^{1/2} = ‖|T|²‖₁^{1/2} = (Σγ∈Γ ‖|T|eγ‖²)^{1/2} = (Σγ∈Γ ‖Teγ‖²)^{1/2}

for any orthonormal basis {eγ}γ∈Γ for H. Prove the following results.
(a) T ∈ B2[H] ⇐⇒ |T| ∈ B2[H] ⇐⇒ |T|² ∈ B1[H]. In this case, ‖T‖₂² = ‖|T|‖₂² = ‖|T|²‖₁.
(b) T ∈ B1[H] ⇐⇒ |T| ∈ B1[H] ⇐⇒ |T|^{1/2} ∈ B2[H]. In this case, ‖T‖₁ = ‖|T|‖₁ = ‖|T|^{1/2}‖₂².
(c) If T ∈ B2[H], then T∗ ∈ B2[H] and ‖T∗‖₂ = ‖T‖₂. (Hint: Problem 5.64.)
(d) ‖T‖ ≤ ‖T‖₂ for every T ∈ B2[H]. (Hint: ‖Te‖ ≤ ‖T‖₂ if ‖e‖ = 1.)
(e) If T, S ∈ B2[H], then T + S ∈ B2[H] and ‖T + S‖₂ ≤ ‖T‖₂ + ‖S‖₂.
Hint: Since Σγ∈Γ ‖Teγ‖‖Seγ‖ ≤ (Σγ∈Γ ‖Teγ‖²)^{1/2}(Σγ∈Γ ‖Seγ‖²)^{1/2} = ‖T‖₂‖S‖₂ (Schwarz inequality in ℓ²(Γ)), we get ‖T + S‖₂² ≤ (‖T‖₂ + ‖S‖₂)².
(f) B2[H] is a linear space and ‖·‖₂ is a norm on B2[H].
(g) ST and TS lie in B2[H] and max{‖ST‖₂, ‖TS‖₂} ≤ ‖S‖‖T‖₂ for every S in B[H] and every T in B2[H].
Hint: ‖STeγ‖² ≤ ‖S‖²‖Teγ‖² and ‖(TS)∗eγ‖² ≤ ‖S‖²‖T∗eγ‖².
(h) B2[H] is a two-sided ideal of B[H].

Problem 5.66. Consider the setup of the previous problem and prove the following assertions.
(a) If T, S ∈ B1[H], then T + S ∈ B1[H] and ‖T + S‖₁ ≤ ‖T‖₁ + ‖S‖₁.
Hint: Polar decompositions: T + S = W|T + S|, T = W1|T|, and S = W2|S|. Thus |T + S| = W∗(T + S), |T| = W1∗T, and |S| = W2∗S. Verify:

    Σγ∈Γ ⟨|T + S|eγ ; eγ⟩ ≤ Σγ∈Γ |⟨Teγ ; Weγ⟩| + Σγ∈Γ |⟨Seγ ; Weγ⟩|
        = Σγ∈Γ |⟨|T|^{1/2}eγ ; |T|^{1/2}W1∗Weγ⟩| + Σγ∈Γ |⟨|S|^{1/2}eγ ; |S|^{1/2}W2∗Weγ⟩|
        ≤ (Σγ∈Γ ‖|T|^{1/2}eγ‖²)^{1/2}(Σγ∈Γ ‖|T|^{1/2}W1∗Weγ‖²)^{1/2}
            + (Σγ∈Γ ‖|S|^{1/2}eγ‖²)^{1/2}(Σγ∈Γ ‖|S|^{1/2}W2∗Weγ‖²)^{1/2}
        ≤ ‖|T|^{1/2}‖₂²‖W1∗W‖ + ‖|S|^{1/2}‖₂²‖W2∗W‖ ≤ ‖T‖₁ + ‖S‖₁.

(Problem 5.65(b,g); recall that ‖W‖ = ‖W1‖ = ‖W2‖ = 1.)
(b) B1[H] is a linear space and ‖·‖₁ is a norm on B1[H].
(c) B1[H] ⊆ B2[H] (i.e., every trace-class operator is Hilbert–Schmidt). If T ∈ B1[H], then ‖T‖₂ ≤ ‖T‖₁.
Hint: Problem 5.65(a,b,g) to prove the inclusion, and Problems 5.61(c) and 5.65(b) to prove the inequality.
(d) B2[H] ⊆ B∞[H] (i.e., every Hilbert–Schmidt operator is compact).
Hint: Take T ∈ B2[H], so that T∗ ∈ B2[H] (Problem 5.65(c)), and hence Σγ∈Γ ‖T∗eγ‖² < ∞. Take an arbitrary integer n ≥ 1. There exists a finite Nn ⊆ Γ such that Σ_{k∈N} ‖T∗ek‖² < 1/n for all finite N ⊆ Γ∖Nn (Theorem 5.27). Thus Σ_{γ∈Γ∖Nn} ‖T∗eγ‖² ≤ 1/n. Recall that Tx = Σγ∈Γ ⟨Tx ; eγ⟩eγ (Theorem 5.48) and define Tn: H → H by Tnx = Σ_{k∈Nn} ⟨Tx ; ek⟩ek. Show that ‖(T − Tn)x‖² = Σ_{γ∈Γ∖Nn} |⟨Tx ; eγ⟩|² ≤ Σ_{γ∈Γ∖Nn} ‖T∗eγ‖²‖x‖² and Tn ∈ B0[H]. Thus ‖Tn − T‖ → 0, and hence T ∈ B∞[H] (Problem 5.42).
(e) T ∈ B1[H] if and only if T = AB for some A, B ∈ B2[H].
Hint: Let T = W|T| = W|T|^{1/2}|T|^{1/2} be the polar decomposition of T. If T ∈ B1[H], then use Problem 5.65(b,g). Conversely, suppose T = AB with A, B ∈ B2[H]. Since |T| = W∗T, we get |T| = W∗AB with A∗W ∈ B2[H] (Problem 5.65(c,g)). Verify: Σγ∈Γ ⟨|T|eγ ; eγ⟩ ≤ Σγ∈Γ ‖Beγ‖‖A∗Weγ‖ ≤ (Σγ∈Γ ‖Beγ‖²)^{1/2}(Σγ∈Γ ‖A∗Weγ‖²)^{1/2}. Hence ‖T‖₁ ≤ ‖B‖₂‖A∗W‖₂.
(f) ST and TS lie in B1[H] for every T in B1[H] and every S in B[H].
Hint: Apply (e): T = AB for some A, B ∈ B2[H]. SA and BS lie in B2[H], and so ST = (SA)B and TS = A(BS) lie in B1[H].
(g) B1[H] is a two-sided ideal of B[H].
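In finite dimensions B0[H] = B1[H] = B2[H] = B∞[H] = B[H], so the content of the ‖·‖₂ machinery reduces to the identities themselves; ‖T‖₂ is then the Frobenius norm. A numpy sketch (illustrative data) of its basis independence (Problem 5.64) and of ‖T‖ ≤ ‖T‖₂ (Problem 5.65(d)):

```python
import numpy as np

# Hilbert-Schmidt norm as the Frobenius norm (finite-dimensional sketch).
rng = np.random.default_rng(6)
T = rng.standard_normal((5, 5))

hs = np.sqrt(sum(np.linalg.norm(T @ e) ** 2 for e in np.eye(5)))   # canonical basis
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))                   # another orthonormal basis
hs_other = np.sqrt(sum(np.linalg.norm(T @ Q[:, k]) ** 2 for k in range(5)))

assert np.isclose(hs, np.linalg.norm(T, 'fro'))
assert np.isclose(hs, hs_other)                  # independent of the basis (Problem 5.64)
assert np.linalg.norm(T, 2) <= hs + 1e-12        # ||T|| <= ||T||_2 (Problem 5.65(d))
```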
Problem 5.67. Let {eγ }γ∈Γ be an arbitrary orthonormal basis for a Hilbert space H. If T ∈ B1 [H] (i.e., if T is a trace-class operator), then show that |T eγ ; eγ | < ∞. γ∈Γ
Hint : 2|T eγ ; eγ | = 2|ABeγ ; eγ | ≤ 2#Beγ ##A∗ eγ # for A,B ∈ B2 [H] (Problem 5.66(e)). So 2|T eγ ; eγ | ≤ #Beγ #2 + #A∗ eγ #2 . Then γ∈Γ |T eγ ; eγ | ≤ 1 (#A#22 + #B#22 ) (Problem 5.65(c)). 2 Thus, by Corollary 5.29, {T eγ ; eγ }γ∈Γ is a summable family of scalars (since F is a Banach space). Let γ∈Γ T eγ ; eγ in F be its sum and show that γ∈Γ T eγ ; eγ does not depend on {eγ }γ∈Γ . Hint : α∈Γ T eα ; eα = α∈Γ β∈Γ T eα ; fβ fβ ; eα , where {eγ }γ∈Γ and {fγ }γ∈Γ Now observe are H (Theorem 5.48(c)). any orthonormal bases for that β∈Γ α∈Γ T eα ; fβ fβ ; eα = β∈Γ fβ ; T ∗ fβ = β∈Γ T fβ ; fβ . If T ∈ B1 [H] and {eγ }γΓ is any orthonormal basis for H, then set tr(T ) = T eγ ; eγ so that #T #1 = tr(|T |). γ∈Γ
Hence B1[H] = {T ∈ B[H] : tr(|T|) < ∞}. The number tr(T) is called the trace of T ∈ B1[H] (thus the terminology "trace-class"). Warning: If T lies in B[H] and Σ_{γ∈Γ} ⟨Te_γ ; e_γ⟩ < ∞ for some orthonormal basis {e_γ}_{γ∈Γ} for H, then it does not follow that T ∈ B1[H]. However, if Σ_{γ∈Γ} ⟨|T|e_γ ; e_γ⟩ < ∞ for some orthonormal basis {e_γ}_{γ∈Γ} for H, then T ∈ B1[H] (Problem 5.64).

Problem 5.68. Consider the setup of the previous problem and prove the following assertions.

(a) tr : B1[H] → F is a linear functional.

(b) |tr(T)| ≤ ‖T‖_1 for every T ∈ B1[H] (i.e., tr : (B1[H], ‖ ‖_1) → F is a contraction, and hence a bounded linear functional). Hint: Let T = W|T| be the polar decomposition of T. Recall that ‖W‖ = 1. If T is trace-class, then verify that |tr(T)| ≤ Σ_{γ∈Γ} |⟨|T|^{1/2}e_γ ; |T|^{1/2}W*e_γ⟩| ≤ (Σ_{γ∈Γ} ‖|T|^{1/2}e_γ‖^2)^{1/2}(Σ_{γ∈Γ} ‖|T|^{1/2}W*e_γ‖^2)^{1/2} ≤ ‖T‖_1 (Problem 5.65).

(c) tr(T*) is the complex conjugate of tr(T) for every T ∈ B1[H].

(d) tr(TS) = tr(ST) whenever T ∈ B1[H] and S ∈ B[H]. Hint: tr(TS) = Σ_{α∈Γ} ⟨Se_α ; T*e_α⟩ = Σ_{α∈Γ} Σ_{β∈Γ} ⟨Se_α ; f_β⟩⟨f_β ; T*e_α⟩ and tr(ST) = Σ_{β∈Γ} ⟨Tf_β ; S*f_β⟩ = Σ_{β∈Γ} Σ_{α∈Γ} ⟨Tf_β ; e_α⟩⟨e_α ; S*f_β⟩; since ⟨Se_α ; f_β⟩ = ⟨e_α ; S*f_β⟩ and ⟨f_β ; T*e_α⟩ = ⟨Tf_β ; e_α⟩, the two double sums coincide (cf. Problem 5.66(f), item (c), and Theorem 5.48(c)).

(e) |tr(S|T|)| = |tr(|T|S)| ≤ ‖S‖‖T‖_1 if T ∈ B1[H] and S ∈ B[H].
Problems
437
Hint: Use Problems 5.65(b,g) and 5.66(f), and verify that (see item (d)) |Σ_{γ∈Γ} ⟨S|T|e_γ ; e_γ⟩| ≤ Σ_{γ∈Γ} |⟨|T|^{1/2}e_γ ; |T|^{1/2}S*e_γ⟩| ≤ ‖|T|^{1/2}‖_2 ‖|T|^{1/2}S*‖_2.

(f) T* ∈ B1[H] and ‖T*‖_1 = ‖T‖_1 for every T ∈ B1[H]. Hint: Let T = W_1|T| and T* = W_2|T*| be the polar decompositions of T and T*. Since |T*| = W_2*T* = W_2*|T|W_1*, T* lies in B1[H] (by Problems 5.65(b) and 5.66(f)). Now show that ‖T*‖_1 = tr(|T*|) = tr(W_2*|T|W_1*) ≤ ‖W_1*W_2*‖‖T‖_1 (Problem 5.65(b) and items (d) and (e)). But ‖W_1*W_2*‖ ≤ ‖W_1‖‖W_2‖ = 1. Therefore, ‖T*‖_1 ≤ ‖T‖_1. Dually, ‖T‖_1 ≤ ‖T*‖_1.

(g) max{‖ST‖_1, ‖TS‖_1} ≤ ‖S‖‖T‖_1 whenever T ∈ B1[H] and S ∈ B[H]. Hint: Let T = W|T|, ST = W_1|ST|, and TS = W_2|TS| be the polar decompositions of T, ST, and TS, respectively, and verify that ‖ST‖_1 = tr(|ST|) = tr(W_1*SW|T|) and ‖TS‖_1 = tr(|TS|) = tr(W_2*W|T|S). Use items (d) and (e) and recall that ‖W‖ = ‖W_1‖ = ‖W_2‖ = 1.

(h) B0[H] ⊆ B1[H] (i.e., every finite-rank operator is trace-class). Hint: If dim R(T) is finite, then dim N(T*)⊥ is finite (Proposition 5.76). Let {f_α} be an orthonormal basis for N(T*) and let {g_k} be a finite orthonormal basis for N(T*)⊥. Since H = N(T*) + N(T*)⊥ (Theorem 5.20), {e_γ} = {f_α} ∪ {g_k} is an orthonormal basis for H (Problem 5.11). Now, either T*e_γ = 0 or T*e_γ = T*g_k. Show that Σ_γ ⟨|T*|e_γ ; e_γ⟩ = Σ_k ⟨|T*|g_k ; g_k⟩ < ∞ (e.g., see Problem 5.61(c)). Thus T* ∈ B1[H] (Problem 5.64), and hence T ∈ B1[H] by item (f).

Problem 5.69. Let (B1[H], ‖ ‖_1) and (B2[H], ‖ ‖_2) be the normed spaces of Problems 5.65(f) and 5.66(b). Show that

(a) (B1[H], ‖ ‖_1) is a Banach space.

Hint: Take a B1[H]-valued sequence {T_n}. If {T_n} is a Cauchy sequence in (B1[H], ‖ ‖_1), then it is a Cauchy sequence in the Banach space (B[H], ‖ ‖) (Problems 5.65(d) and 5.66(c)), and so T_n → T uniformly for some T ∈ B[H]. Use Problems 5.26 and 4.46 to verify that |T_n|^2 → |T|^2 uniformly, and so |T_n|^{1/2} → |T|^{1/2} uniformly (Problems 5.63(b) and 5.66(c,d)). Therefore, show that Σ_{γ∈Γ} ‖|T|^{1/2}e_γ‖^2 ≤ lim sup_n Σ_{γ∈Γ} ‖|T_n|^{1/2}e_γ‖^2 ≤ sup_n ‖T_n‖_1 < ∞ (recall that {T_n} is Cauchy in (B1[H], ‖ ‖_1)). Thus T ∈ B1[H]. Since T_n − T → O uniformly, |T_n − T|^{1/2} → O uniformly (Problem 5.61(a)). Observe that ‖T_n − T‖_1 = Σ_{γ∈Γ} ‖|T_n − T|^{1/2}e_γ‖^2 = Σ_{k∈N_ε} ‖|T_n − T|^{1/2}e_k‖^2 + sup_N Σ_{k∈N} ‖|T_n − T|^{1/2}e_k‖^2 < ∞ for every finite set N_ε ⊆ Γ, where the supremum is taken over all finite sets N ⊆ Γ\N_ε (Proposition 5.31). Use Theorem 5.27 to conclude that ‖T_n − T‖_1 → 0.

Consider the function ⟨ ; ⟩ : B2[H]×B2[H] → F given by ⟨T ; S⟩ = tr(S*T)
for every S, T ∈ B2[H]. Show that ⟨ ; ⟩ is an inner product on B2[H] that induces the norm ‖ ‖_2. (Hint: Problem 5.68(a,c).) Moreover,

(b) (B2[H], ⟨ ; ⟩) is a Hilbert space.

Recall that B0[H] ⊆ B1[H] ⊆ B2[H] ⊆ B∞[H] and that B0[H] is dense in the Banach space (B∞[H], ‖ ‖). Now show that

(c) B0[H] is dense in (B1[H], ‖ ‖_1) and in (B2[H], ‖ ‖_2).

Problem 5.70. Two normed spaces X and Y are topologically isomorphic if there exists a topological isomorphism between them (i.e., if there exists W in G[X, Y] — see Section 4.6). Two inner product spaces X and Y are unitarily equivalent if there exists a unitary transformation between them (i.e., if there exists a unitary U in G[X, Y] — see Section 5.6). Two Hilbert spaces are topologically isomorphic if and only if they are unitarily equivalent. That is, if H and K are Hilbert spaces, then G[H, K] ≠ ∅ if and only if {U ∈ G[H, K] : U is unitary} ≠ ∅.
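In finite dimensions every operator is trace-class and Hilbert–Schmidt, and ⟨T ; S⟩ = tr(S*T) is the familiar Frobenius inner product of matrices. The following numpy sketch (an illustration added here, not part of the text's formal development) checks that this inner product induces ‖ ‖_2 and that the trace is basis-independent, as in Problem 5.67:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
S = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# <T ; S> = tr(S* T): the Hilbert-Schmidt (Frobenius) inner product.
ip = np.trace(S.conj().T @ T)

# It induces the Hilbert-Schmidt norm ||T||_2 = (sum_g ||T e_g||^2)^(1/2).
hs_norm = np.sqrt(np.trace(T.conj().T @ T).real)
assert np.isclose(hs_norm, np.linalg.norm(T, 'fro'))

# tr(T) = sum_g <T e_g ; e_g> is the same in every orthonormal basis:
# pick a random unitary Q and recompute the trace in the basis {Q e_g}.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
trace_in_new_basis = sum(Q[:, k].conj() @ (T @ Q[:, k]) for k in range(n))
assert np.isclose(trace_in_new_basis, np.trace(T))
```

The second assertion is exactly the basis-independence argument of Problem 5.67, with the double-sum manipulation replaced by a direct computation.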
Hint: If W ∈ G[H, K], then |W| = (W*W)^{1/2} ∈ G^+[H] (Problems 5.50(b) and 5.58(d)). Show that U = W|W|^{-1} ∈ G[H, K] is unitary (Proposition 5.73) and that U|W| is the polar decomposition of W (Corollary 5.90).

Problem 5.71. Let {T_k} and {S_k} be (equally indexed) countable collections of operators acting on Hilbert spaces H_k (i.e., T_k, S_k ∈ B[H_k] for each k). Consider the direct sum operators ⊕_k T_k ∈ B[⊕_k H_k] and ⊕_k S_k ∈ B[⊕_k H_k] acting on the (orthogonal) direct sum space ⊕_k H_k, which is a Hilbert space (as in Examples 5.F and 5.G, and Problems 4.16 and 5.28). Verify that
(a) R(⊕_k T_k) = ⊕_k R(T_k) and N(⊕_k T_k) = ⊕_k N(T_k),

(b) (⊕_k T_k)* = ⊕_k T_k*,

(c) p(⊕_k T_k) = ⊕_k p(T_k) for every polynomial p,

(d) ⊕_k T_k + ⊕_k S_k = ⊕_k (T_k + S_k),

(e) (⊕_k T_k)(⊕_k S_k) = ⊕_k T_k S_k.

Problem 5.72. Consider the setup of the previous problem. Show that

(a) ‖⊕_k T_k‖ = sup_k ‖T_k‖.

Now suppose the countable collections are finite and show that

(b) ⊕_{k=1}^n T_k is compact, trace-class, or Hilbert–Schmidt if and only if every T_k is compact, trace-class, or Hilbert–Schmidt, respectively.

Hint: For the compact case use Theorem 4.52 and recall that the restriction of a compact operator to a linear manifold is again compact (Section 4.9). For the trace-class and Hilbert–Schmidt cases use item (a) and Problem 5.11.
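For finitely many finite-dimensional summands, the direct sum is just a block-diagonal matrix, so the algebraic identities of Problem 5.71 and the norm formula of Problem 5.72(a) can be checked numerically. A minimal sketch (the helper `direct_sum` is ours, not the book's):

```python
import numpy as np

def direct_sum(blocks):
    """Block-diagonal matrix T1 (+) T2 (+) ... (finite orthogonal direct sum)."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    i = 0
    for b in blocks:
        k = b.shape[0]
        out[i:i+k, i:i+k] = b
        i += k
    return out

rng = np.random.default_rng(1)
Ts = [rng.standard_normal((k, k)) for k in (2, 3, 4)]
Ss = [rng.standard_normal((k, k)) for k in (2, 3, 4)]
D = direct_sum(Ts)

# ||(+)_k T_k|| = sup_k ||T_k||  (operator norm = largest singular value).
assert np.isclose(np.linalg.norm(D, 2), max(np.linalg.norm(T, 2) for T in Ts))

# ((+)_k T_k)* = (+)_k T_k*  and  ((+)_k T_k)((+)_k S_k) = (+)_k T_k S_k.
assert np.allclose(D.T, direct_sum([T.T for T in Ts]))
assert np.allclose(D @ direct_sum(Ss), direct_sum([T @ S for T, S in zip(Ts, Ss)]))
```

The sup formula also makes Problem 5.72(c) plausible: an infinite string of identity summands keeps norm 1 without being compact.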
(c) The "if" part of (b) fails for infinite collections (but not the "only if" part). Hint: Set T_k = 1 on H_k = C (see Example 4.N).

Problem 5.73. Consider the setup of Problem 5.71.

(a) Show that a countable direct sum ⊕_k T_k is an involution, an orthogonal projection, nonnegative, positive, self-adjoint, an isometry, a unitary operator, or a contraction, if and only if each T_k is.

(b) Show that if each T_k is invertible (or strictly positive), then every finite direct sum ⊕_{k=1}^n T_k is invertible (or strictly positive), but not every infinite direct sum. However, the converse holds even for an infinite direct sum. Hint: Set T_k = 1/k on H_k = C (see Examples 4.J and 5.R).
Problem 5.74. Let Lat(H) be the lattice of all subspaces of a Hilbert space H, and let Lat(T) be the lattice of all invariant subspaces for an operator T on H (see Problems 4.19 and 4.23). Extend the concept of diagonal operator of Problem 5.17 to operators on an arbitrary (not necessarily separable) Hilbert space: an operator T ∈ B[H] is a diagonal with respect to an orthonormal basis {e_γ}_{γ∈Γ} for H if there exists a bounded family of scalars {λ_γ}_{γ∈Γ} such that

Tx = Σ_{γ∈Γ} λ_γ ⟨x ; e_γ⟩ e_γ

for every x ∈ H. Show that the following assertions are pairwise equivalent.

(a) Lat(T) = Lat(H).

(b) T is a diagonal operator with respect to every orthonormal basis for H.

(c) U*TU is a diagonal operator for every unitary operator U ∈ B[H].

(d) T is a scalar operator (i.e., T = λI for some λ ∈ C).

Hint: (a)⇒(b): Let {e_γ}_{γ∈Γ} be an arbitrary orthonormal basis for H. Take any x ∈ H. Theorem 5.48 says that x = Σ_{γ∈Γ} ⟨x ; e_γ⟩e_γ, and so (why?) Tx = Σ_{γ∈Γ} ⟨x ; e_γ⟩Te_γ. Fix γ ∈ Γ. If Lat(T) = Lat(H), then every one-dimensional subspace of H is T-invariant. Thus, if α ∈ C\{0}, then T(αe_γ) = λ(αe_γ)αe_γ for some function λ : (span{e_γ}\{0}) → C. Since αλ(αe_γ)e_γ = λ(αe_γ)αe_γ = T(αe_γ) = αTe_γ = αλ(e_γ)e_γ, we get λ(αe_γ) = λ(e_γ) so that λ is a constant function. That is, λ(αe_γ) = λ_γ for all α ∈ C\{0}, and hence Te_γ = λ_γ e_γ. This implies that T is a diagonal with respect to the basis {e_γ}_{γ∈Γ}.

(b)⇒(c): Take any unitary operator U ∈ B[H], let {e_γ}_{γ∈Γ} be an orthonormal basis for H, and set f_γ = Ue_γ for each γ ∈ Γ so that {f_γ}_{γ∈Γ} is an orthonormal basis for H (Proof of Theorem 5.49). Take any x ∈ H. If (b) holds, then there is a bounded family of scalars {μ_γ}_{γ∈Γ} such that Tx = Σ_{γ∈Γ} μ_γ⟨x ; f_γ⟩f_γ = Σ_{γ∈Γ} μ_γ⟨x ; Ue_γ⟩Ue_γ = U(Σ_{γ∈Γ} μ_γ⟨U*x ; e_γ⟩e_γ) = UDU*x, where D is the
diagonal operator with respect to {e_γ}_{γ∈Γ} given by Dx = Σ_{γ∈Γ} μ_γ⟨x ; e_γ⟩e_γ. Thus T = UDU* or, equivalently, D = U*TU.
(c)⇒(d): If (c) holds, then T is a diagonal operator with respect to an orthonormal basis {e_γ}_{γ∈Γ} for some bounded family of scalars {λ_γ}_{γ∈Γ}. Suppose dim H ≥ 2. Take any pair of (distinct) indices {γ_1, γ_2} from Γ, split {e_γ}_{γ∈Γ} into {e_γ}_{γ∈Γ} = {e_{γ_1}, e_{γ_2}} ∪ {e_γ}_{γ∈(Γ\{γ_1,γ_2})}, and decompose H = M ⊕ M⊥ (Theorem 5.25) with M = span{e_{γ_1}, e_{γ_2}} and M⊥ the closed span of {e_γ}_{γ∈(Γ\{γ_1,γ_2})}. As M reduces T, T = A ⊕ B with A = T|_M and B = T|_{M⊥} (Problem 5.28). Thus A is a diagonal operator on M with respect to the orthonormal basis {e_{γ_1}, e_{γ_2}} for M. Let {e_1, e_2} be the canonical basis for C^2. Since M ≅ C^2, there is a unitary W : C^2 → M such that We_1 = e_{γ_1} and We_2 = e_{γ_2} (Theorem 5.49), and hence W*AWy = W*(Σ_{i=1}^2 λ_{γ_i}⟨Wy ; e_{γ_i}⟩e_{γ_i}) = Σ_{i=1}^2 λ_{γ_i}⟨y ; W*e_{γ_i}⟩W*e_{γ_i} = Σ_{i=1}^2 λ_{γ_i}⟨y ; e_i⟩e_i for each y ∈ C^2. Therefore W*AW = diag(λ_{γ_1}, λ_{γ_2}), a diagonal operator in B[C^2] with respect to {e_1, e_2}. Consider the unitary operator

U = (√2/2) [ 1  −1 ]
            [ 1   1 ]   in B[C^2].

So

U*W*AWU = (1/2) [ λ_{γ_1}+λ_{γ_2}   λ_{γ_2}−λ_{γ_1} ]
                [ λ_{γ_2}−λ_{γ_1}   λ_{γ_1}+λ_{γ_2} ]   in B[C^2].

But if (c) holds, this must be a diagonal (why?), which implies that λ_{γ_1} = λ_{γ_2}. Since the pair of (distinct) indices {γ_1, γ_2} from Γ was arbitrarily taken, it follows that {λ_γ}_{γ∈Γ} is a constant family. Hence T = λI.
(d)⇒(a): Every subspace of H trivially is invariant for a scalar operator.

Problem 5.75. If T ∈ B[H] is a contraction on a Hilbert space H, then U = {x ∈ H : ‖T^n x‖ = ‖T^{*n} x‖ = ‖x‖ for every n ≥ 1} is a reducing subspace for T. Prove. Also show that the restriction of T to U, T|_U : U → U, is a unitary operator.

A contraction on a nonzero Hilbert space is called completely nonunitary if the restriction of it to every nonzero reducing subspace is not unitary. That is, T ∈ B[H] on H ≠ {0} with ‖T‖ ≤ 1 is completely nonunitary if T|_M ∈ B[M] is not unitary for every subspace M ≠ {0} of H that reduces T. Show that a contraction T is completely nonunitary if and only if U = {0}. Equivalently, T is not completely nonunitary if and only if there is a nonzero vector x ∈ H such that ‖T^n x‖ = ‖T^{*n} x‖ = ‖x‖ for every n ≥ 1. Also verify the following (almost tautological) assertions. Every completely nonunitary contraction on a nonzero Hilbert space is itself nonzero. A completely nonunitary contraction has a completely nonunitary adjoint.

Hints: U reduces T by Proposition 5.74. Use Proposition 5.73(j) to show that T|_U is unitary and that T is completely nonunitary if and only if U = {0}, and therefore T is completely nonunitary if and only if T* is.

Problem 5.76. Prove the following proposition.
A countable direct sum of contractions is completely nonunitary if and only if every direct summand is completely nonunitary.

Hint: Let each T_k be a nonzero contraction on a Hilbert space H_k. Recall that a countable direct sum ⊕_k T_k is a contraction if and only if every T_k is a contraction (Problem 5.73). Let M be a subspace of ⊕_k H_k that reduces ⊕_k T_k. Recall that (⊕_k T_k)^n = ⊕_k T_k^n (Problem 5.71). Verify that M reduces (⊕_k T_k)^n for every n ≥ 1, and that this implies, for every n ≥ 1, that

((⊕_k T_k)|_M)^n = ((⊕_k T_k)^n)|_M = (⊕_k T_k^n)|_M

and

((⊕_k T_k)|_M)^{*n} = ((⊕_k T_k)^{*n})|_M = (⊕_k T_k^{*n})|_M

(cf. Corollary 5.75 and Problems 5.24(d) and 5.28). If (⊕_k T_k)|_M is unitary, then so is ((⊕_k T_k)|_M)^n for every n ≥ 1, and hence, by the above identities,

(⊕_k T_k^{*n})|_M (⊕_k T_k^n)|_M = I = (⊕_k T_k^n)|_M (⊕_k T_k^{*n})|_M

for every n ≥ 1. Thus, for an arbitrary sequence u = {u_k} ∈ M ⊆ ⊕_k H_k,

(⊕_k T_k^n)(⊕_k T_k^{*n})u = u = (⊕_k T_k^{*n})(⊕_k T_k^n)u,

which means that (⊕_k T_k^n T_k^{*n}){u_k} = {u_k} = (⊕_k T_k^{*n} T_k^n){u_k}. Therefore,

T_k^n T_k^{*n} u_k = T_k^{*n} T_k^n u_k = u_k   and so   ‖T_k^n u_k‖ = ‖T_k^{*n} u_k‖ = ‖u_k‖

(why?) for each k and every n. If each T_k is completely nonunitary, then every u_k is zero (Problem 5.75). Thus u = 0 so that M = {0}, and ⊕_k T_k is completely nonunitary. Conversely, if one of the T_k is not completely nonunitary, then (Problem 5.75 again) there exists a nonzero vector x_k ∈ H_k such that

‖T_k^n x_k‖ = ‖T_k^{*n} x_k‖ = ‖x_k‖

for every n ≥ 1. Then there is a nonzero vector x = (0, ..., 0, x_k, 0, ...) in ⊕_k H_k such that

‖(⊕_k T_k)^n x‖ = ‖(⊕_k T_k^n)x‖ = ‖T_k^n x_k‖ = ‖x_k‖ = ‖x‖ = ‖T_k^{*n} x_k‖ = ‖(⊕_k T_k^{*n})x‖ = ‖(⊕_k T_k)^{*n} x‖

for every n ≥ 1, and hence the contraction ⊕_k T_k is not completely nonunitary.
Problem 5.77. Let T ∈ B[H] be a nonzero contraction on a Hilbert space H. Prove the following assertions.
(a) Every strongly stable contraction is completely nonunitary. Hint: If ‖T^n x‖ → 0 for every x, then U = {0} (Problem 5.75).

(b) There is a completely nonunitary contraction that is not strongly stable. Hint: A unilateral shift S_+ is an isometry (thus a contraction that is not strongly stable) such that S_+^{*n} → O strongly (Problem 5.29), and hence S_+ is completely nonunitary (cf. Problem 5.75 and item (a) above).

(c) There is a completely nonunitary contraction that is not strongly stable and whose adjoint is also not strongly stable. Hint: S_+ ⊕ S_+^* (see Problem 5.76).

Problem 5.78. Show that if a contraction is completely nonunitary, then so is every operator unitarily equivalent to it. That is, the property of being completely nonunitary is invariant under unitary equivalence.

Hint: An operator unitarily equivalent to a contraction is again a contraction (since unitary equivalence is norm-preserving; Problem 5.9). Let T ∈ B[H] and S ∈ B[K] be unitarily equivalent contractions, and let U ∈ B[H, K] be any unitary transformation intertwining T to S. Take any nonzero reducing subspace M for T, so that U(M) is a nonzero reducing subspace for S by Problem 5.9(d). Since UT|_M = SU|_M = S|_{U(M)}U|_M, if T|_M : M → M is unitary, then so is UT|_M : M → U(M) (a composition of invertible isometries is again an invertible isometry) and, conversely, if UT|_M : M → U(M) is unitary, then so is T|_M = U*(UT|_M) : M → M. Therefore, T|_M is unitary if and only if S|_{U(M)} is unitary. On the other hand, recall that U* is unitary and U*S = TU*. Thus if N is a nonzero reducing subspace for S, then U*(N) is a nonzero reducing subspace for T. Again, since U*S|_N = TU*|_N, conclude that S|_N is unitary if and only if T|_{U*(N)} is unitary. Therefore, T|_M is not unitary for every nonzero T-reducing subspace M if and only if S|_N is not unitary for every nonzero S-reducing subspace N.
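Problem 5.78's mechanism rests on the fact that unitary equivalence preserves norms and adjoints, and hence the self-adjoint defect D = T*T − TT*. A small numerical sanity check (illustrative only; finite matrices stand in for operators):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# A random unitary U (the QR factor Q of a random complex matrix is unitary).
U, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
S = U @ T @ U.conj().T   # S is unitarily equivalent to T

# Unitary equivalence is norm-preserving ...
assert np.isclose(np.linalg.norm(S, 2), np.linalg.norm(T, 2))

# ... and transports the defect D = T*T - TT* by the same unitary,
# so S is normal (D = O) exactly when T is.
DT = T.conj().T @ T - T @ T.conj().T
DS = S.conj().T @ S - S @ S.conj().T
assert np.allclose(DS, U @ DT @ U.conj().T)
```

The same transport argument carries reducing subspaces M to U(M), which is the core of the hint above.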
6 The Spectral Theorem
The Spectral Theorem is a landmark in the theory of operators on Hilbert space, providing a full statement about the nature and structure of normal operators. Normal operators play a central role in operator theory; they will be defined in Section 6.1 below. It is customary to say that the Spectral Theorem can be applied to answer essentially all questions on normal operators. This indeed is the case as far as “essentially all” means “almost all” or “all the principal”: there exist open questions on normal operators. First we consider the class of normal operators and its relatives (predecessors and successors). Next, the notion of spectrum of an operator acting on a complex Banach space is introduced. The Spectral Theorem for compact normal operators is fully investigated, yielding the concept of diagonalization. The Spectral Theorem for plain normal operators needs measure theory. We would not dare to relegate measure theory to an appendix just to support a proper proof of the Spectral Theorem for plain normal operators. Instead we assume just once, in the very last section of this book, that the reader has some familiarity with measure theory, just enough to grasp the statement of the Spectral Theorem for plain normal operators after having proved it for compact normal operators.
6.1 Normal Operators

Throughout this section H stands for a Hilbert space. An operator T ∈ B[H] is normal if it commutes with its adjoint (i.e., T is normal if T*T = TT*). Here is another characterization of normal operators.

Proposition 6.1. The following assertions are pairwise equivalent.

(a) T is normal (i.e., T*T = TT*).

(b) ‖T*x‖ = ‖Tx‖ for every x ∈ H.

(c) T^n is normal for every positive integer n.

(d) ‖T^{*n}x‖ = ‖T^n x‖ for every x ∈ H and every n ≥ 1.
C.S. Kubrusly, The Elements of Operator Theory, DOI 10.1007/978-0-8176-4998-2_6, © Springer Science+Business Media, LLC 2011
Proof. If T ∈ B[H], then ‖T*x‖^2 − ‖Tx‖^2 = ⟨(TT* − T*T)x ; x⟩ for every x in H. Since TT* − T*T is self-adjoint, it follows by Corollary 5.80 that TT* = T*T if and only if ‖T*x‖ = ‖Tx‖ for every x ∈ H. This shows that (a)⇔(b). Therefore, as T^{*n} = (T^n)* for every n ≥ 1 (cf. Problem 5.24), (c)⇔(d). If T* commutes with T, then it commutes with T^n and, dually, T^n commutes with T^{*n} = (T^n)*. So (a)⇒(c). Since (d)⇒(b) trivially, the proposition is proved.

Clearly, every self-adjoint operator is normal (i.e., T* = T implies T*T = TT* = T^2), and so are the nonnegative operators and, in particular, the orthogonal projections (cf. Proposition 5.81). It is also clear that every unitary operator is normal (recall from Proposition 5.73 that U ∈ B[H] is unitary if and only if U*U = UU* = I). In fact, normality distinguishes the orthogonal projections among the projections, and the unitaries among the isometries.

Proposition 6.2. P ∈ B[H] is an orthogonal projection if and only if it is a normal projection.

Proof. If P is an orthogonal projection, then it is a self-adjoint projection (Proposition 5.81), and hence a normal projection. On the other hand, if P is normal, then ‖P*x‖ = ‖Px‖ for every x ∈ H (by the previous proposition) so that N(P*) = N(P). If P is a projection, then R(P) = N(I − P) so that R(P) = R(P)⁻ by Proposition 4.13. Therefore, if P is a normal projection, then N(P)⊥ = N(P*)⊥ = R(P)⁻ = R(P) (cf. Proposition 5.76), and hence R(P) ⊥ N(P). Thus P is an orthogonal projection.

Proposition 6.3. U ∈ B[H] is unitary if and only if it is a normal isometry.
Proof. Proposition 5.73(a,j). Let T ∈ B[H] be an arbitrary operator on a Hilbert space H and set D = T ∗T − T T ∗ in B[H]. Observe that D = D∗ (i.e., D is always self-adjoint). Moreover, T is normal if and only if D = O.
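For matrices the self-adjoint operator D = T*T − TT* is directly computable, and D = O detects normality; a brief numpy illustration (an aside, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)

# A normal matrix (unitarily diagonalizable) and a nonnormal one.
U, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
N = U @ np.diag([1, 2j, -1, 3]) @ U.conj().T           # normal: N*N = NN*
J = np.diag([0.0, 0.0, 0.0]) + np.diag([1.0, 1.0], 1)  # Jordan block: not normal

def defect(T):
    return T.conj().T @ T - T @ T.conj().T  # D = T*T - TT*

assert np.allclose(defect(N), 0)        # D = O, so N is normal
assert not np.allclose(defect(J), 0)    # D != O, so J is not normal

# Proposition 6.1(b): for normal T, ||T*x|| = ||Tx|| for every x.
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
assert np.isclose(np.linalg.norm(N.conj().T @ x), np.linalg.norm(N @ x))
```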
An operator T ∈ B[H] is quasinormal if it commutes with T ∗ T ; that is, if T ∗ T T = T T ∗ T or, equivalently, if (T ∗ T − T T ∗ )T = O. Therefore, T is quasinormal if and only if D T = O. It is plain that every normal operator is quasinormal . Also note that every isometry is quasinormal . Indeed, if V ∈ B[H] is an isometry, then V ∗ V = I (Proposition 5.72) so that V ∗ V V − V V ∗ V = O. Proposition 6.4. If T = W Q is the polar decomposition of an operator T in B[H], then (a) W Q = Q W if and only if T is quasinormal .
In this case, WT = TW and QT = TQ. Moreover,

(b) if T is normal, then W|_{N(W)⊥} is unitary.

That is, the partial isometry of the polar decomposition of any normal operator is, in fact, a "partial unitary transformation" in the following sense: W = UP, where P is the orthogonal projection onto N⊥ with N = N(T) = N(W) = N(Q), and U : N⊥ → N⊥ ⊆ H is a unitary operator for which N⊥ is U-invariant.

Proof. (a) Let T = WQ be the polar decomposition of T so that Q^2 = T*T (Theorem 5.89). If WQ = QW, then Q^2 W = QWQ = WQ^2, and hence TT*T = WQQ^2 = Q^2 WQ = T*TT (i.e., T is quasinormal). Conversely, if TT*T = T*TT, then TQ^2 = Q^2 T. Thus TQ = QT by Theorem 5.85 (since Q = (Q^2)^{1/2}) so that WQQ = QWQ; that is, (WQ − QW)Q = O. Therefore, R(Q)⁻ ⊆ N(WQ − QW), and so N(Q)⊥ ⊆ N(WQ − QW) by Proposition 5.76 (since Q = Q*). Recall that N(Q) = N(W) (Theorem 5.89). If u ∈ N(Q), then u ∈ N(W) so that (WQ − QW)u = 0. Hence N(Q) ⊆ N(WQ − QW). The above displayed inclusions imply N(WQ − QW) = H (Problem 5.7(b)); that is, WQ = QW. Since T = WQ, it follows at once that W and Q commute with T whenever they commute with each other.

(b) Recall from Theorem 5.89 that the null spaces of T, W, and Q coincide. Thus (cf. Proposition 5.86) set N = N(T) = N(W) = N(Q) = N(Q^2). According to Proposition 5.87, W = VP where V : N⊥ → H is an isometry and P : H → H is the orthogonal projection onto N⊥. Since R(Q)⁻ = N(Q)⊥ = N⊥ = R(P), it follows that PQ = Q. Taking the adjoint and recalling that P = P* (Proposition 5.81), we get PQ = QP = Q. Moreover, since V ∈ B[N⊥, H], its adjoint V* lies in B[H, N⊥]. Then R(V*) ⊆ N⊥ = R(P), which implies that PV* = V*. Hence VPV* = VV*. These identities hold for the polar decomposition of every operator T ∈ B[H]. Now suppose T is normal so that T is quasinormal. By part (a) we get

Q^2 = T*T = TT* = WQQW* = Q^2 WW* = Q^2 VPV* = Q^2 VV*.

Therefore, Q^2(I − VV*) = O, and hence R(I − VV*) ⊆ N(Q^2) = N.
But if T is normal, then T commutes with T* and trivially with itself. Therefore, N = N(T) reduces T (Problem 5.34), and so N⊥ is T-invariant. Then R(T) ⊆ N⊥ by Theorem 5.20, which implies that R(V) ⊆ N⊥ (since R(V) = R(W) = R(T)⁻). In this case the isometry V : N⊥ → H maps into N⊥ ⊆ H so that both V and V* lie in B[N⊥]. Thus the above displayed inclusion now holds for I and VV* in B[N⊥]. Hence R(I − VV*) = {0} (as R(I − VV*) ⊆ N ∩ N⊥ = {0}), which means that I − VV* = O. That is, VV* = I so that the isometry V also is a coisometry. Thus V is unitary (Proposition 5.73).

A part of an operator is a restriction of it to an invariant subspace. For instance, every unilateral shift is a part of some bilateral shift (of the same multiplicity). This takes a little proving. In this sense, every unilateral shift has an extension that is a bilateral shift. Recall that unilateral shifts are isometries, and bilateral shifts are unitary operators (see Problems 5.29 and 5.30). The above italicized result can be extended as follows. Every isometry is a part of a unitary operator. This takes a little proving too. Since every isometry is quasinormal, and since every unitary operator is normal, we might expect that every quasinormal operator is a part of a normal operator. This actually is the case.

We shall call an operator subnormal if it is a part of a normal operator or, equivalently, if it has a normal extension. Precisely, an operator T on a Hilbert space H is subnormal if there exists a Hilbert space K including H and a normal operator N on K such that H is N-invariant (i.e., N(H) ⊆ H) and T is the restriction of N to H (i.e., T = N|_H). In other words, T ∈ B[H] is subnormal if H is a subspace of a larger Hilbert space K, so that K = H ⊕ H⊥ by Theorem 5.25, and

N = [ T  X ]
    [ O  Y ] : H ⊕ H⊥ → H ⊕ H⊥

is a normal operator in B[K] for some X ∈ B[H⊥, H] and some Y ∈ B[H⊥] (see Example 2.O).
Recall that, writing the orthogonal direct sum decomposition K = H ⊕ H⊥, we are identifying H ⊆ K with H ⊕ {0} (a subspace of H ⊕ H⊥) and H⊥ ⊆ K with {0} ⊕ H⊥ (also a subspace of H ⊕ H⊥).

Proposition 6.5. Every quasinormal operator is subnormal.

Proof. Suppose T ∈ B[H] is a quasinormal operator.

Claim. N(T) reduces T.

Proof. Since T is quasinormal, T*T commutes with both T and T*. So N(T*T) reduces T (Problem 5.34). But N(T*T) = N(T) (Proposition 5.76).

Thus T = O ⊕ S on H = N(T) ⊕ N(T)⊥, with O = T|_{N(T)} : N(T) → N(T) and S = T|_{N(T)⊥} : N(T)⊥ → N(T)⊥. Note that T*T = O ⊕ S*S, and so

(O ⊕ S*S)(O ⊕ S) = T*TT = TT*T = (O ⊕ S)(O ⊕ S*S).
Then O ⊕ S*SS = O ⊕ SS*S, and hence S*SS = SS*S. That is, S is quasinormal. Since N(S) = N(T|_{N(T)⊥}) = {0}, it follows by Corollary 5.90 that the partial isometry of the polar decomposition of S ∈ B[N(T)⊥] is an isometry. Therefore S = VQ, where V ∈ B[N(T)⊥] is an isometry (so that V*V = I) and Q ∈ B[N(T)⊥] is nonnegative. But S = VQ = QV by Proposition 6.4, and hence S* = QV* = V*Q. Set

U = [ V   I − VV* ]          R = [ Q  O ]
    [ O   V*      ]   and        [ O  Q ]

in B[N(T)⊥ ⊕ N(T)⊥]. Observe that U is unitary. In fact,

U*U = [ V*        O ] [ V   I − VV* ]   [ I  O ]
      [ I − VV*   V ] [ O   V*      ] = [ O  I ] = UU*.

Also note that the nonnegative operator R commutes with U:

UR = [ VQ   (I − VV*)Q ]   [ S   Q(I − VV*) ]
     [ O     V*Q       ] = [ O   S*         ]
   = [ QV   Q(I − VV*) ]
     [ O     QV*       ] = RU.

Now set N = UR in B[N(T)⊥ ⊕ N(T)⊥]. The middle operator matrix says that S is a part of N (i.e., N(T)⊥ is N-invariant and S = N|_{N(T)⊥}). Moreover, N*N = RU*UR = R^2 = R^2 UU* = UR^2 U* = NN*. Thus N is normal. Then S is subnormal, and so is T = O ⊕ S since T trivially is a part of the normal operator O ⊕ N on N(T) ⊕ N(T)⊥ ⊕ N(T)⊥.

An operator T ∈ B[H] is hyponormal if TT* ≤ T*T. In other words, T is hyponormal if and only if D ≥ O. Recall that T*T and TT* are nonnegative and D = (T*T − TT*) is self-adjoint, for every T ∈ B[H].

Proposition 6.6. T ∈ B[H] is hyponormal if and only if ‖T*x‖ ≤ ‖Tx‖ for every x ∈ H.

Proof. TT* ≤ T*T if and only if ⟨TT*x ; x⟩ ≤ ⟨T*Tx ; x⟩ or, equivalently, ‖T*x‖ ≤ ‖Tx‖ for every x ∈ H.

An operator T ∈ B[H] is cohyponormal if its adjoint T* ∈ B[H] is hyponormal (i.e., if T*T ≤ TT* or, equivalently, if D ≤ O, which means by the above
proposition that ‖Tx‖ ≤ ‖T*x‖ for every x ∈ H). Hence T is normal if and only if it is both hyponormal and cohyponormal (Propositions 6.1 and 6.6). If an operator is either hyponormal or cohyponormal, then it is called seminormal. Every normal operator is trivially hyponormal. The next proposition goes beyond that.

Proposition 6.7. Every subnormal operator is hyponormal.

Proof. If T ∈ B[H] is subnormal, then H is a subspace of a larger Hilbert space K so that K = H ⊕ H⊥, and the operator

N = [ T  X ]
    [ O  Y ] : H ⊕ H⊥ → H ⊕ H⊥

in B[K] is normal for some X ∈ B[H⊥, H] and Y ∈ B[H⊥]. Then

N*N = [ T*  O  ] [ T  X ]   [ T*T   T*X       ]
      [ X*  Y* ] [ O  Y ] = [ X*T   X*X + Y*Y ]

and

NN* = [ T  X ] [ T*  O  ]   [ TT* + XX*   XY* ]
      [ O  Y ] [ X*  Y* ] = [ YX*         YY* ].
Let X be a normed space, take any operator T ∈ B[X ], and consider the power sequence {T n}. A trivial induction (cf. Problem 4.47(a)) shows that #T n # ≤ #T #n for every n ≥ 0. Lemma 6.8. If X isa normed space and T is an operator in B[X ], then the 1 real-valued sequence #T n# n converges in R. Proof. Suppose T = O. The proof uses the following bit of elementary number theory. Take an arbitrary m ∈ N . Every n ∈ N can be written as n = mpn + qn for some pn , qn ∈ N 0 , where qn < m. Hence #T n # = #T mpn + qn# = #T mpn T qn# ≤ #T mpn##T qn# ≤ #T m#pn #T qn#. Set μ = max 0≤k≤m−1 {#T k #} = 0 and recall that qn ≤ m − 1. Then 1
#T n# n ≤ #T m # 1
1
pn n
1
1
qn
1
μ n = μ n #T m # m − mn .
qn
1
Since μ n → 1 and #T m # m − mn → #T m # m as n → ∞, it follows that 1
1
lim sup #T n # n ≤ #T m # m n
1
1
for m ∈ N . Thus lim supn #T n# n ≤ lim inf n #T n# n and so (Problem 3.13) every 1 n n #T # converges in R. 1 We shall denote the limit of #T n# n by r(T ): 1
r(T ) = lim #T n # n . n
According to the above proof we get r(T) ≤ ‖T^n‖^{1/n} for every n ≥ 1, and so r(T) ≤ ‖T‖. Also note that r(T^k)^{1/k} = (lim_n ‖(T^k)^n‖^{1/n})^{1/k} = lim_n ‖T^{kn}‖^{1/(kn)} = r(T) for each k ≥ 1, because {‖T^{kn}‖^{1/(kn)}} is a subsequence of the convergent sequence {‖T^n‖^{1/n}}. Thus r(T^k) = r(T)^k for every positive integer k. Therefore, if T ∈ B[X] is an operator on a normed space X, then

r(T)^n = r(T^n) ≤ ‖T^n‖ ≤ ‖T‖^n

for each integer n ≥ 0. Definition: If r(T) = ‖T‖, then we say that T ∈ B[X] is normaloid. The next proposition gives an equivalent definition.

Proposition 6.9. An operator T ∈ B[X] on a normed space X is normaloid if and only if ‖T^n‖ = ‖T‖^n for every integer n ≥ 0.

Proof. If r(T) = ‖T‖, then ‖T^n‖ = ‖T‖^n for every n ≥ 0 by the above inequalities. If ‖T^n‖ = ‖T‖^n for every n ≥ 0, then r(T) = lim_n ‖T^n‖^{1/n} = ‖T‖.

Proposition 6.10. Every hyponormal operator is normaloid.

Proof. Take an arbitrary operator T ∈ B[H] and let n be a nonnegative integer.

Claim 1. If T is hyponormal, then ‖T^n‖^2 ≤ ‖T^{n+1}‖ ‖T^{n−1}‖ for every n ≥ 1.

Proof. Note that, for every T ∈ B[H], ‖T^n x‖^2 = ⟨T^n x ; T^n x⟩ = ⟨T*T^n x ; T^{n−1} x⟩ ≤ ‖T*T^n x‖ ‖T^{n−1} x‖ for each integer n ≥ 1 and every x ∈ H. If T is hyponormal, then ‖T*T^n x‖ ‖T^{n−1} x‖ ≤ ‖T^{n+1} x‖ ‖T^{n−1} x‖ ≤ ‖T^{n+1}‖ ‖T^{n−1}‖ ‖x‖^2 by Proposition 6.6, and hence ‖T^n x‖^2 ≤ ‖T^{n+1}‖ ‖T^{n−1}‖ ‖x‖^2 for each n ≥ 1 and every x ∈ H, which ensures the claimed result.

Claim 2. If ‖T^n‖^2 ≤ ‖T^{n+1}‖ ‖T^{n−1}‖ for every n ≥ 1, then ‖T^n‖ = ‖T‖^n for every n ≥ 0.

Proof. ‖T^n‖ = ‖T‖^n holds trivially if T = O (for all n ≥ 0), and if n = 0, 1 (for all T in B[H]). Let T be a nonzero operator and suppose ‖T^n‖ = ‖T‖^n for some integer n ≥ 1. If ‖T^n‖^2 ≤ ‖T^{n+1}‖ ‖T^{n−1}‖, then

‖T‖^{2n} = (‖T‖^n)^2 = ‖T^n‖^2 ≤ ‖T^{n+1}‖ ‖T^{n−1}‖ ≤ ‖T^{n+1}‖ ‖T‖^{n−1}

since ‖T^m‖ ≤ ‖T‖^m for every m ≥ 0, and therefore (recall: T ≠ O),

‖T‖^{n+1} = ‖T‖^{2n}(‖T‖^{n−1})^{−1} ≤ ‖T^{n+1}‖ ≤ ‖T‖^{n+1}.

Hence ‖T^{n+1}‖ = ‖T‖^{n+1}, concluding the proof by induction.
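The limit defining r(T) can be watched numerically. For matrices this limit equals the largest eigenvalue modulus (the spectral radius, a notion anticipated in Section 6.2), a fact not yet established at this point in the text, so the check below is only a plausibility test with a loose tolerance:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 5))
T = A / np.linalg.norm(A, 2)   # normalize so that ||T|| = 1 and powers stay tame

def root_norm(T, n):
    """||T^n||^(1/n): the sequence whose limit defines r(T)."""
    return np.linalg.norm(np.linalg.matrix_power(T, n), 2) ** (1.0 / n)

# r(T)^n = r(T^n) <= ||T^n|| <= ||T||^n at every n; for matrices the limit
# r(T) is max|eigenvalue|.
rho = max(abs(np.linalg.eigvals(T)))
rn = root_norm(T, 50)
assert rho <= rn + 1e-9 and rn <= np.linalg.norm(T, 2) + 1e-9
assert abs(root_norm(T, 200) - rho) < 0.1 * rho   # slow but visible convergence
```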
Claims 1 and 2 say that a hyponormal T is normaloid by Proposition 6.9.

Since ‖T^{*n}‖ = ‖T^n‖ for each n ≥ 0 (cf. Problem 5.24(d)), it follows that r(T*) = r(T). Thus T is normaloid if and only if T* is normaloid, and so every seminormal operator is normaloid.

Summing up: An operator T is normal if it commutes with its adjoint, quasinormal if it commutes with T*T, subnormal if it is a restriction of a normal operator to an invariant subspace, hyponormal if TT* ≤ T*T, and normaloid if r(T) = ‖T‖. These classes are related by proper inclusion as follows.

Normal ⊂ Quasinormal ⊂ Subnormal ⊂ Hyponormal ⊂ Normaloid.

Example 6.A. We shall verify that the above inclusions are, in fact, proper. The unilateral shift will do the whole job. First recall that a unilateral shift S_+ is an isometry but not a coisometry, and hence S_+ is a nonnormal quasinormal operator. Since S_+ is subnormal, A = I + S_+ is subnormal (if N is a normal extension of S_+, then I + N is a normal extension of A). However, since S_+ is a nonnormal isometry,

A*AA − AA*A = A*AS_+ − S_+A*A = S_+^*S_+ − S_+S_+^* ≠ O,

and therefore A is not quasinormal. Check that B = S_+^* + 2S_+ is hyponormal, but B^2 is not hyponormal. Since the square of every subnormal operator is again a subnormal operator, it follows that B is not subnormal. Finally, S_+^* is normaloid (by Proposition 6.9) but not hyponormal.
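Finite matrices cannot reproduce the shift-based examples above (a finite-dimensional hyponormal operator is already normal, since D ≥ O with tr(D) = 0 forces D = O), but the normaloid/non-normaloid dichotomy is visible: a self-adjoint matrix satisfies ‖T^n‖ = ‖T‖^n, while a nilpotent Jordan block has r = 0 < ‖T‖. A sketch, illustrative only:

```python
import numpy as np

# A self-adjoint (hence normal, hence normaloid) matrix: ||H^n|| = ||H||^n.
H = np.array([[2.0, 1.0], [1.0, 0.0]])
for n in range(1, 6):
    assert np.isclose(np.linalg.norm(np.linalg.matrix_power(H, n), 2),
                      np.linalg.norm(H, 2) ** n)

# A nilpotent Jordan block: r(J) = 0 < 1 = ||J||, so J is not normaloid.
J = np.diag([1.0, 1.0, 1.0], 1)   # 4x4 block with ones on the superdiagonal
assert np.isclose(np.linalg.norm(J, 2), 1.0)
assert np.allclose(np.linalg.matrix_power(J, 4), 0)   # J^4 = O, so r(J) = 0
```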
6.2 The Spectrum of an Operator

Let T ∈ L[D(T), X] be a linear transformation, where X is a nonzero normed space and D(T), the domain of T, is a linear manifold of X. Let I be the identity on X. The resolvent set ρ(T) of T is the set of all scalars λ ∈ F for which (λI − T) ∈ L[D(T), X] has a densely defined continuous inverse:

ρ(T) = {λ ∈ F : (λI − T)^{−1} ∈ B[R(λI − T), D(T)] and R(λI − T)⁻ = X}.

Henceforward all linear transformations are operators on a complex Banach space. In other words, T ∈ B[X], where D(T) = X ≠ {0} is a complex Banach space; that is, T : X → X is a bounded linear transformation of a nonzero complex Banach space X into itself. In this case (i.e., in the unital complex Banach algebra B[X]), Corollary 4.24 ensures that the above-defined resolvent set ρ(T) is the set of all complex numbers λ for which (λI − T) ∈ B[X] is invertible (i.e., has a bounded inverse on X). Equivalently (Theorem 4.22),

ρ(T) = {λ ∈ C : (λI − T) ∈ G[X]}
     = {λ ∈ C : λI − T has an inverse in B[X]}
     = {λ ∈ C : N(λI − T) = {0} and R(λI − T) = X}.

The complement of ρ(T), denoted by σ(T), is the spectrum of T:

σ(T) = C\ρ(T) = {λ ∈ C : N(λI − T) ≠ {0} or R(λI − T) ≠ X}.
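For an operator on C^n the two defects coincide (λI − T is injective if and only if it is surjective), so σ(T) is exactly the set of eigenvalues. A numerical illustration, not part of the text:

```python
import numpy as np

# An upper triangular matrix: its eigenvalues are the diagonal entries.
T = np.array([[2.0, 1.0],
              [0.0, 3.0]])
spectrum = np.linalg.eigvals(T)
assert np.allclose(sorted(spectrum.real), [2.0, 3.0])

# lambda = 2 lies in sigma(T): 2I - T is singular (N(2I - T) != {0}).
assert abs(np.linalg.det(2.0 * np.eye(2) - T)) < 1e-9

# lambda = 5 lies in rho(T): 5I - T is invertible with a bounded inverse.
R5 = np.linalg.inv(5.0 * np.eye(2) - T)
assert np.allclose((5.0 * np.eye(2) - T) @ R5, np.eye(2))
```

In infinite dimensions injectivity and surjectivity of λI − T can fail independently, which is why the definition above keeps both conditions.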
Proposition 6.11. If λ ∈ ρ(T ), then δ = ‖(λI − T )⁻¹‖⁻¹ is a positive number. The open ball Bδ(λ) with center at λ and radius δ is included in ρ(T ), and hence δ ≤ d(λ, σ(T )).

Proof. Let T be a bounded linear operator acting on a complex Banach space. Take λ ∈ ρ(T ). Then (λI − T ) ∈ G[X ], and so (λI − T )⁻¹ ≠ O is bounded. Thus δ = ‖(λI − T )⁻¹‖⁻¹ > 0. Let Bδ(0) be the nonempty open ball of radius δ about the origin of the complex plane C , and take an arbitrary μ in Bδ(0). Since |μ| < ‖(λI − T )⁻¹‖⁻¹, ‖μ(λI − T )⁻¹‖ < 1. Thus, by Problem 4.48(a), [I − μ(λI − T )⁻¹] ∈ G[X ], and so (λ − μ)I − T = (λI − T )[I − μ(λI − T )⁻¹] also lies in G[X ] by Corollary 4.23. Outcome: λ − μ ∈ ρ(T ), so that Bδ(λ) = Bδ(0) + λ = {ν ∈ C : ν = μ + λ for some μ ∈ Bδ(0)} ⊆ ρ(T ), which implies that σ(T ) = C \ρ(T ) ⊆ C \Bδ(λ). Hence d(λ, ς) = |λ − ς| ≥ δ for every ς ∈ σ(T ), and so d(λ, σ(T )) = inf ς∈σ(T ) |λ − ς| ≥ δ.

Corollary 6.12. The resolvent set ρ(T ) is nonempty and open, and the spectrum σ(T ) is compact.

Proof. If T ∈ B[X ] is an operator on a Banach space X , then (since T is bounded) the von Neumann expansion (Problem 4.47) ensures that λ ∈ ρ(T ) whenever ‖T ‖ < |λ|. Since σ(T ) = C \ρ(T ), this is equivalent to

|λ| ≤ ‖T ‖ for every λ ∈ σ(T ).

Thus σ(T ) is bounded, and so ρ(T ) ≠ ∅. By Proposition 6.11, ρ(T ) includes a nonempty open ball centered at each point in it. Thus ρ(T ) is open, and so σ(T ) is closed. In C , closed and bounded means compact (Theorem 3.83).

The resolvent function of T ∈ B[X ] is the map R : ρ(T ) → G[X ] defined by

R(λ) = (λI − T )⁻¹ for every λ ∈ ρ(T ).

Since R(λ) − R(μ) = R(λ)[R(μ)⁻¹ − R(λ)⁻¹]R(μ), we get

R(λ) − R(μ) = (μ − λ)R(λ)R(μ) for every λ, μ ∈ ρ(T ).

This is the resolvent identity. Swapping λ and μ in the resolvent identity, it follows that R(λ)R(μ) = R(μ)R(λ) for every λ, μ ∈ ρ(T ). Also, T R(λ) = R(λ)T for every λ ∈ ρ(T ) (since R(λ)⁻¹R(λ) = R(λ)R(λ)⁻¹).

To prove the next proposition we need a piece of elementary complex analysis. Let Λ be a nonempty and open subset of the complex plane C . Take a function f : Λ → C and a point μ ∈ Λ. Suppose f′(μ) is a complex number with the following property: for every ε > 0 there exists δ > 0 such that |(f(λ) − f(μ))/(λ − μ) − f′(μ)| < ε for all λ in Λ for which 0 < |λ − μ| < δ. If there exists such an f′(μ) ∈ C , then it is called the derivative of f at μ. If f′(μ) exists for every μ in Λ, then f : Λ → C is analytic on Λ. A function f : C → C is entire if it is analytic on the whole complex plane C . The Liouville Theorem is the result we need. It says that every bounded entire function is constant.
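On a finite-dimensional space (a matrix acting on Cⁿ) the resolvent identity can be checked directly; a hedged numpy sketch, with the two test points chosen well outside the spectrum so they certainly lie in ρ(T ):

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
I = np.eye(4)

def resolvent(lam: complex) -> np.ndarray:
    """R(lam) = (lam*I - T)^{-1}, defined whenever lam lies in rho(T)."""
    return np.linalg.inv(lam * I - T)

lam, mu = 20.0 + 3.0j, -15.0 + 7.0j   # moduli exceed ||T||, hence both are in rho(T)
lhs = resolvent(lam) - resolvent(mu)
rhs = (mu - lam) * resolvent(lam) @ resolvent(mu)
err = float(np.max(np.abs(lhs - rhs)))

# resolvents at different points also commute, as noted above
comm = float(np.max(np.abs(resolvent(lam) @ resolvent(mu)
                           - resolvent(mu) @ resolvent(lam))))
print(err, comm)  # both ~0
```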
6. The Spectral Theorem
Proposition 6.13. The spectrum σ(T ) is nonempty.

Proof. Let T ∈ B[X ] be an operator on a complex Banach space X . Take an arbitrary nonzero element ϕ in the dual B[X ]∗ of B[X ] (i.e., an arbitrary nonzero bounded linear functional ϕ: B[X ] → C — note: B[X ] ≠ {O} because X ≠ {0}, and so B[X ]∗ ≠ {0} by Corollary 4.64). Recall that ρ(T ) = C \σ(T ) is nonempty and open in C .

Claim 1. If σ(T ) is empty, then ϕ ◦ R : ρ(T ) → C is bounded.

Proof. The resolvent function R : ρ(T ) → G[X ] is continuous (reason: scalar multiplication and addition are continuous mappings, and so is inversion by Problem 4.48(c)). Thus ‖R(·)‖ : ρ(T ) → R is continuous (composition of continuous functions). Then sup_{|λ|≤‖T‖} ‖R(λ)‖ < ∞ by Theorem 3.86 whenever σ(T ) is empty. On the other hand, if ‖T ‖ < |λ|, then Problem 4.47(h) ensures that ‖R(λ)‖ = ‖(λI − T )⁻¹‖ ≤ (|λ| − ‖T ‖)⁻¹, and therefore ‖R(λ)‖ → 0 as |λ| → ∞. Since the function ‖R(·)‖ : ρ(T ) → R is continuous, it then follows that sup_{‖T‖<|λ|} ‖R(λ)‖ < ∞. Hence sup_{λ∈ρ(T)} ‖R(λ)‖ < ∞. Thus

sup_{λ∈ρ(T)} |(ϕ ◦ R)(λ)| ≤ ‖ϕ‖ sup_{λ∈ρ(T)} ‖R(λ)‖ < ∞.

Claim 2. ϕ ◦ R : ρ(T ) → C is analytic.

Proof. If λ and μ are distinct points in ρ(T ), then

(R(λ) − R(μ))/(λ − μ) + R(μ)² = (R(μ) − R(λ))R(μ)

by the resolvent identity. Set f = ϕ ◦ R : ρ(T ) → C , and let f′ : ρ(T ) → C be defined by f′(λ) = −ϕ(R(λ)²) for each λ ∈ ρ(T ). Therefore,

|(f(λ) − f(μ))/(λ − μ) − f′(μ)| = |ϕ((R(μ) − R(λ))R(μ))| ≤ ‖ϕ‖‖R(μ)‖‖R(μ) − R(λ)‖,

so that f : ρ(T ) → C is analytic because R : ρ(T ) → G[X ] is continuous.

If σ(T ) = ∅ (i.e., if ρ(T ) = C ), then ϕ ◦ R : C → C is a bounded entire function, and so a constant function by the Liouville Theorem. But we have just seen (proof of Claim 1) that ‖R(λ)‖ → 0 as |λ| → ∞. Hence ϕ(R(λ)) → 0 as |λ| → ∞ (since ϕ is continuous). Then ϕ ◦ R = 0 for all ϕ ∈ B[X ]∗, so that R = O (Corollary 4.64). That is, (λI − T )⁻¹ = O for every λ ∈ C , which is a contradiction (O ∉ G[X ]). Thus σ(T ) ≠ ∅.

Remark: σ(T ) is compact and nonempty, and so is its boundary ∂σ(T ). Thus ∂σ(T ) = ∂ρ(T ) ≠ ∅ (see Problem 3.41).

The spectrum σ(T ) is the set of all λ in C such that λI − T fails to be invertible (i.e., fails to have a bounded inverse on R(λI − T ) = X ). According
to the origin of such a failure, σ(T ) can be split into many disjoint parts. A classical partition comprises three parts. The set of those λ such that λI − T has no inverse is the point spectrum:

σP(T ) = {λ ∈ C : N (λI − T ) ≠ {0}}.

A scalar λ ∈ C is called an eigenvalue of T if there exists a nonzero vector x in X such that T x = λx; equivalently, if N (λI − T ) ≠ {0}. If λ ∈ C is an eigenvalue of T , then the nonzero vectors in N (λI − T ) are the eigenvectors of T , and N (λI − T ) is the eigenspace (which, in fact, is a subspace of X — cf. Proposition 4.13) associated with λ. The multiplicity of an eigenvalue is the dimension of the respective eigenspace. After this quick digression on eigenvalues and eigenvectors, note that the point spectrum of T is precisely the set of all eigenvalues of T . The set of those λ for which λI − T has a densely defined but unbounded inverse on its range is the continuous spectrum:

σC(T ) = {λ ∈ C : N (λI − T ) = {0}, R(λI − T )⁻ = X and R(λI − T ) ≠ X }

(see Corollary 4.24 again). If λI − T has an inverse on its range that is not densely defined, then λ belongs to the residual spectrum:

σR(T ) = {λ ∈ C : N (λI − T ) = {0} and R(λI − T )⁻ ≠ X }.

The collection {σP(T ), σC(T ), σR(T )} of subsets of σ(T ) is in fact a partition (i.e., a disjoint covering) of the spectrum: they are pairwise disjoint and σ(T ) = σP(T ) ∪ σC(T ) ∪ σR(T ). The following diagram refines this spectrum partition. The residual spectrum is split into two disjoint parts, σR(T ) = σR1(T ) ∪ σR2(T ), and the point spectrum is split into four disjoint parts, σP(T ) = σP1(T ) ∪ σP2(T ) ∪ σP3(T ) ∪ σP4(T ). We adopt the following abbreviated notation: Tλ = λI − T , Nλ = N (Tλ), and Rλ = R(Tλ). Recall that if Tλ is injective (i.e., if Nλ = {0}), then its linear inverse Tλ⁻¹ on Rλ is continuous if and only if Rλ is closed (Corollary 4.24).

                                    Rλ⁻ = X                  Rλ⁻ ≠ X
                             Rλ = Rλ⁻    Rλ ≠ Rλ⁻     Rλ = Rλ⁻    Rλ ≠ Rλ⁻

 Nλ = {0}, Tλ⁻¹ ∈ B[Rλ , X ]    ρ(T )        ∅         σR1(T )       ∅
 Nλ = {0}, Tλ⁻¹ ∉ B[Rλ , X ]      ∅        σC(T )         ∅        σR2(T )
 Nλ ≠ {0}                      σP1(T )     σP2(T )     σP3(T )     σP4(T )

The two columns with Rλ⁻ ≠ X make up the compression spectrum σCP(T ), and all cells other than ρ(T ) and σR1(T ) make up the approximate point spectrum σAP(T ).
Recall that σ(T ) ≠ ∅, but any of the above disjoint parts of the spectrum may be empty (see Section 6.5). However, if σP(T ) is nonempty, then every set of eigenvectors associated with distinct eigenvalues is linearly independent.
Proposition 6.14. Let {xγ}γ∈Γ be built from any family {λγ}γ∈Γ of distinct eigenvalues of T : for each γ ∈ Γ let xγ be an eigenvector associated with λγ (i.e., xγ ≠ 0 in N (λγI − T ) ≠ {0}). The set {xγ}γ∈Γ is linearly independent.

Proof. Consider the set {xγ}γ∈Γ (whose existence is ensured by the Axiom of Choice). Let {xᵢ}ᵢ₌₁ⁿ be an arbitrary finite subset of {xγ}γ∈Γ.

Claim. {xᵢ}ᵢ₌₁ⁿ is linearly independent.

Proof. If n = 1, then linear independence is trivial. Suppose {xᵢ}ᵢ₌₁ⁿ is linearly independent for some integer n ≥ 1. If {xᵢ}ᵢ₌₁ⁿ⁺¹ is not linearly independent, then x_{n+1} = Σ_{i=1}^n αᵢxᵢ, where {αᵢ}ᵢ₌₁ⁿ is a family of complex numbers with at least one nonzero number among them. Therefore λ_{n+1}x_{n+1} = T x_{n+1} = Σ_{i=1}^n αᵢT xᵢ = Σ_{i=1}^n αᵢλᵢxᵢ. If λ_{n+1} = 0, then λᵢ ≠ 0 for every i ≠ n + 1 and Σ_{i=1}^n αᵢλᵢxᵢ = 0, so that {xᵢ}ᵢ₌₁ⁿ is not linearly independent, which is a contradiction. On the other hand, if λ_{n+1} ≠ 0, then x_{n+1} = Σ_{i=1}^n αᵢλ_{n+1}⁻¹λᵢxᵢ, and hence Σ_{i=1}^n αᵢ(1 − λ_{n+1}⁻¹λᵢ)xᵢ = 0, so that {xᵢ}ᵢ₌₁ⁿ is not linearly independent (because λᵢ ≠ λ_{n+1} for every i ≠ n + 1), which is again a contradiction. This completes the proof by induction.

Outcome: {xγ}γ∈Γ is linearly independent by Proposition 2.3.
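Proposition 6.14 is easy to probe numerically in finite dimensions: one eigenvector per distinct eigenvalue of a matrix gives a full-rank set of columns. A small sketch (the particular triangular matrix is an arbitrary illustrative choice):

```python
import numpy as np

# an illustrative upper triangular matrix with distinct eigenvalues 1, 2, 3, 4
A = np.array([[1.0, 5.0, 0.0, 0.0],
              [0.0, 2.0, 5.0, 0.0],
              [0.0, 0.0, 3.0, 5.0],
              [0.0, 0.0, 0.0, 4.0]])

eigvals, eigvecs = np.linalg.eig(A)

# one eigenvector per distinct eigenvalue: the 4x4 matrix whose columns are
# these eigenvectors has full rank, i.e. the set is linearly independent
rank = int(np.linalg.matrix_rank(eigvecs))
print(sorted(np.round(eigvals.real).astype(int).tolist()), rank)  # [1, 2, 3, 4] 4
```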
There are some overlapping parts of the spectrum which are commonly used too. For instance, the compression spectrum σCP(T ) and the approximate point spectrum (or approximation spectrum) σAP(T ). These are defined by

σCP(T ) = {λ ∈ C : R(λI − T ) is not dense in X } = σP3(T ) ∪ σP4(T ) ∪ σR(T )

and

σAP(T ) = {λ ∈ C : λI − T is not bounded below} = σP(T ) ∪ σC(T ) ∪ σR2(T ) = σ(T )\σR1(T ).

Next we give an alternative definition of σAP(T ) which may serve as a motivation for the term "approximate point spectrum". Its elements are sometimes referred to as the approximate eigenvalues of T .

Proposition 6.15. The following assertions are pairwise equivalent.

(a) λ ∈ σAP(T ).

(b) There exists an X -valued sequence {xn} of unit vectors such that ‖(λI − T )xn‖ → 0.

(c) For every ε > 0 there is a unit vector xε ∈ X such that ‖(λI − T )xε‖ < ε.

Proof. It is clear that (c) implies (b). If (b) holds true, then there is no constant α > 0 such that α = α‖xn‖ ≤ ‖(λI − T )xn‖ for all n, and so λI − T is not bounded below. Hence (b) implies (a). If λI − T is not bounded below, then
there is no constant α > 0 such that α‖x‖ ≤ ‖(λI − T )x‖ for all x ∈ X or, equivalently, for every ε > 0 there exists 0 ≠ yε ∈ X such that ‖(λI − T )yε‖ < ε‖yε‖. By setting xε = ‖yε‖⁻¹yε, it follows that (a) implies (c).

Proposition 6.16. The approximate point spectrum is nonempty, closed in C , and includes the boundary ∂σ(T ) of the spectrum.
Proof. Take an arbitrary λ ∈ ∂σ(T ). Recall that ρ(T ) ≠ ∅ (Corollary 6.12) and ∂σ(T ) = ∂ρ(T ) ⊂ ρ(T )⁻ (Problem 3.41). Hence there exists a sequence {λn} in ρ(T ) such that λn → λ (Proposition 3.27). Since (λnI − T ) − (λI − T ) = (λn − λ)I for every n, it follows that (λnI − T ) → (λI − T ) in B[X ]; that is, {(λnI − T )} in G[X ] converges in B[X ] to (λI − T ) ∈ B[X ]\G[X ] (each λn lies in ρ(T ) and λ ∈ ∂σ(T ) ⊆ σ(T ) because σ(T ) is closed). If supn ‖(λnI − T )⁻¹‖ < ∞, then (λI − T ) ∈ G[X ] (cf. hint to Problem 4.48(c)), which is a contradiction. Thus

supn ‖(λnI − T )⁻¹‖ = ∞.

For each n take yn in X with ‖yn‖ = 1 such that

‖(λnI − T )⁻¹‖ − 1/n ≤ ‖(λnI − T )⁻¹yn‖ ≤ ‖(λnI − T )⁻¹‖.

Then supn ‖(λnI − T )⁻¹yn‖ = ∞, and hence infn ‖(λnI − T )⁻¹yn‖⁻¹ = 0, so that there exist subsequences of {λn} and {yn}, say {λk} and {yk}, for which ‖(λkI − T )⁻¹yk‖⁻¹ → 0. Set xk = ‖(λkI − T )⁻¹yk‖⁻¹(λkI − T )⁻¹yk and get a sequence {xk} of unit vectors in X such that ‖(λkI − T )xk‖ = ‖(λkI − T )⁻¹yk‖⁻¹. Since

‖(λI − T )xk‖ = ‖(λkI − T )xk − (λk − λ)xk‖ ≤ ‖(λkI − T )⁻¹yk‖⁻¹ + |λk − λ|

and λk → λ, it follows that ‖(λI − T )xk‖ → 0. Hence λ ∈ σAP(T ) according to Proposition 6.15. Therefore, ∂σ(T ) ⊆ σAP(T ). This inclusion implies that σAP(T ) ≠ ∅ (since σ(T ) is closed and nonempty). Finally, take an arbitrary λ ∈ C \σAP(T ) so that λI − T is bounded below. Thus there exists an α > 0 for which α‖x‖ ≤ ‖(λI − T )x‖ ≤ ‖(μI − T )x‖ + ‖(λ − μ)x‖, and so (α − |λ − μ|)‖x‖ ≤ ‖(μI − T )x‖, for all x ∈ X and μ ∈ C . Then μI − T is bounded below (i.e., μ ∈ C \σAP(T )) for every μ sufficiently close to λ (such that 0 < α − |λ − μ|). Hence C \σAP(T ) is open, and so σAP(T ) is closed.

Remark: σR1(T ) is open in C . Indeed, since σAP(T ) is closed in C and includes ∂σ(T ), it follows that C \σR1(T ) = ρ(T ) ∪ σAP(T ) = ρ(T ) ∪ ∂σ(T ) ∪ σAP(T ) = ρ(T ) ∪ ∂ρ(T ) ∪ σAP(T ) = ρ(T )⁻ ∪ σAP(T ), which is closed in C .
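In finite dimensions the quantity inf over unit x of ‖(λI − T )x‖, which governs Propositions 6.15 and 6.16, is the smallest singular value of λI − T , and it vanishes exactly at eigenvalues. A hedged numpy sketch (the diagonal matrix is an arbitrary illustrative choice):

```python
import numpy as np

A = np.diag([1.0, 2.0, 5.0])   # an illustrative operator with spectrum {1, 2, 5}

def lower_bound(lam: complex) -> float:
    """inf over unit x of ||(lam*I - A)x||: the smallest singular value of lam*I - A."""
    return float(np.linalg.svd(lam * np.eye(3) - A, compute_uv=False)[-1])

print(lower_bound(2.0))  # 0.0: unit vectors x with ||(2I - A)x|| arbitrarily small exist
print(lower_bound(3.0))  # 1.0: 3I - A is bounded below, so 3 lies in rho(A)
```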
For the next proposition we assume that T lies in B[H], where H is a nonzero complex Hilbert space. If Λ is any subset of C , then set

Λ∗ = {λ̄ ∈ C : λ ∈ Λ},

so that Λ∗∗ = Λ, (C \Λ)∗ = C \Λ∗, and (Λ₁ ∪ Λ₂)∗ = Λ₁∗ ∪ Λ₂∗.

Proposition 6.17. If T ∗ ∈ B[H] is the adjoint of T ∈ B[H], then

ρ(T ) = ρ(T ∗)∗, σ(T ) = σ(T ∗)∗, σC(T ) = σC(T ∗)∗,

and the residual spectrum of T is given by the formula

σR(T ) = σP(T ∗)∗\σP(T ).

As for the subparts of the point and residual spectra,

σP1(T ) = σR1(T ∗)∗, σP2(T ) = σR2(T ∗)∗, σP3(T ) = σP3(T ∗)∗, σP4(T ) = σP4(T ∗)∗.

For the compression and approximate point spectra, we get

σCP(T ) = σP(T ∗)∗,

∂σ(T ) ⊆ σAP(T ) ∩ σAP(T ∗)∗ = σ(T )\(σP1(T ) ∪ σR1(T )).

Proof. Since S ∈ G[H] if and only if S∗ ∈ G[H], we get ρ(T ) = ρ(T ∗)∗. Hence σ(T )∗ = (C \ρ(T ))∗ = C \ρ(T ∗) = σ(T ∗). Recall that R(S)⁻ = R(S) if and only if R(S∗)⁻ = R(S∗), and N (S) = {0} if and only if R(S∗)⁻ = H (Proposition 5.77 and Problem 5.35). Thus σP1(T ) = σR1(T ∗)∗, σP2(T ) = σR2(T ∗)∗, σP3(T ) = σP3(T ∗)∗, and also σP4(T ) = σP4(T ∗)∗. Applying the same argument, σC(T ) = σC(T ∗)∗ and σCP(T ) = σP(T ∗)∗. Therefore,

σR(T ) = σCP(T )\σP(T ) implies σR(T ) = σP(T ∗)∗\σP(T ).

Moreover, by using the above properties, observe that

σAP(T ∗) = σP(T ∗) ∪ σC(T ∗) ∪ σR2(T ∗) = σCP(T )∗ ∪ σC(T )∗ ∪ σP2(T )∗,

and so

σAP(T ∗)∗ = σCP(T ) ∪ σC(T ) ∪ σP2(T ).

Hence σAP(T ∗)∗ ∩ σAP(T ) = σ(T )\(σP1(T ) ∪ σR1(T )). But σ(T ) is closed and σR1(T ) is open (and so σP1(T ) = σR1(T ∗)∗ is open) in C . This implies that (cf. Problem 3.41(b,d)) σP1(T ) ∪ σR1(T ) ⊆ σ(T )° and ∂σ(T ) ⊆ σ(T )\(σP1(T ) ∪ σR1(T )).

Remark: We have just seen that σP1(T ) is open in C .

Corollary 6.18. Let H ≠ {0} be a complex Hilbert space. Let D be the open unit disk about the origin in the complex plane C , and let T = ∂D denote the unit circle about the origin in C .
(a) If H ∈ B[H] is hyponormal, then σP(H)∗ ⊆ σP(H ∗) and σR(H ∗) = ∅.

(b) If N ∈ B[H] is normal, then σP(N ∗) = σP(N )∗ and σR(N ) = ∅.

(c) If U ∈ B[H] is unitary, then σ(U ) ⊆ T .

(d) If A ∈ B[H] is self-adjoint, then σ(A) ⊂ R.

(e) If Q ∈ B[H] is nonnegative, then σ(Q) ⊂ [0, ∞).

(f) If R ∈ B[H] is strictly positive, then σ(R) ⊂ [α, ∞) for some α > 0.

(g) If P ∈ B[H] is a nontrivial projection, then σ(P ) = σP(P ) = {0, 1}.

(h) If J ∈ B[H] is a nontrivial involution, then σ(J) = σP(J) = {−1, 1}.

Proof. Take any T ∈ B[H] and any λ ∈ C . It is readily verified that

(λI − T )∗(λI − T ) − (λI − T )(λI − T )∗ = T ∗T − T T ∗.

Hence λI − T is hyponormal if and only if T is hyponormal. If H is hyponormal, then λI − H is hyponormal, and so (cf. Proposition 6.6)

‖(λ̄I − H ∗)x‖ ≤ ‖(λI − H)x‖ for every x ∈ H and every λ ∈ C .

If λ ∈ σP(H), then N (λI − H) ≠ {0}, so that N (λ̄I − H ∗) ≠ {0} by the above inequality, and hence λ̄ ∈ σP(H ∗). Thus σP(H) ⊆ σP(H ∗)∗. Equivalently,

σP(H)∗ ⊆ σP(H ∗), so that σR(H ∗) = σP(H)∗\σP(H ∗) = ∅

(cf. Proposition 6.17). This proves (a). Since N is normal if and only if it is both hyponormal and cohyponormal, this also proves (b). That is,

σP(N )∗ = σP(N ∗), so that σR(N ) = ∅.

Let U be a unitary operator (i.e., a normal isometry). Since U is an isometry, ‖U x‖ = ‖x‖, so that

| |λ| − 1 | ‖x‖ = | ‖λx‖ − ‖U x‖ | ≤ ‖(λI − U )x‖

for every x in H. If |λ| ≠ 1, then λI − U is bounded below, so that λ ∈ ρ(U ) ∪ σR(U ) = ρ(U ) since σR(U ) = ∅ by (b). Thus λ ∉ ρ(U ) implies |λ| = 1, proving (c): σ(U ) ⊆ T . If A is self-adjoint, then ⟨x ; Ax⟩ ∈ R for every x ∈ H. Thus ⟨x ; (αI − A)x⟩ is real, and hence Re(iβ⟨x ; (αI − A)x⟩) = 0, for every α, β ∈ R and every x ∈ H. Therefore, with λ = α + iβ,

‖(λI − A)x‖² = ‖iβx + (αI − A)x‖² = |β|²‖x‖² + 2 Re⟨iβx ; (αI − A)x⟩ + ‖(αI − A)x‖² = |β|²‖x‖² + ‖(αI − A)x‖² ≥ |β|²‖x‖² = |Im λ|²‖x‖²

for every x ∈ H and every λ ∈ C . If λ ∉ R, then λI − A is bounded below, and so λ ∈ ρ(A) ∪ σR(A) = ρ(A) since σR(A) = ∅ by (b) once A is normal. Thus λ ∉ ρ(A) implies λ ∈ R. Since σ(A) is bounded, this shows that (d) holds:
σ(A) ⊂ R. If Q ≥ O and λ ∈ σ(Q), then λ ∈ R by (d) since Q is self-adjoint, and hence

‖(λI − Q)x‖² = |λ|²‖x‖² − 2λ⟨Qx ; x⟩ + ‖Qx‖²

for each x ∈ H. If λ < 0, then ‖(λI − Q)x‖² ≥ |λ|²‖x‖² for every x ∈ H (since Q ≥ O), and so λI − Q is bounded below. Applying the same argument of the previous item, we get (e): σ(Q) ⊂ [0, ∞). If R ≻ O, then O ≤ R ∈ G[H], and so 0 ∈ ρ(R) and σ(R) ⊂ [0, ∞) by (e), and hence σ(R) ⊂ (0, ∞). But σ(R) is closed. Thus (f) holds:

σ(R) ⊂ [α, ∞) for some α > 0.

If O ≠ P = P² ≠ I (i.e., if P is a nontrivial projection), then {0} ≠ R(P ) = N (I − P ) and {0} ≠ R(I − P ) = N (P ) (Section 2.9), and so {0, 1} ⊆ σP(P ). If λ is any complex number such that 0 ≠ λ ≠ 1, then

(λI − P )[ (1/λ)I + (1/(λ(λ−1)))P ] = I = [ (1/λ)I + (1/(λ(λ−1)))P ](λI − P ),

so that λI − P is invertible (i.e., (λI − P ) ∈ G[H] — Theorem 4.22), and hence λ ∈ ρ(P ). Thus σ(P ) ⊆ {0, 1}, which concludes the proof of (g):

σ(P ) = σP(P ) = {0, 1}.

If J² = I (i.e., an involution), then (I − J)(−I − J) = O = (−I − J)(I − J), so that R(−I − J) ⊆ N (I − J) and R(I − J) ⊆ N (−I − J). If 1 ∉ σP(J) or −1 ∉ σP(J), then N (I − J) = {0} or N (−I − J) = {0}, which implies that R(I + J) = {0} or R(I − J) = {0}, and hence J = −I or J = I. Thus, if the involution J is nontrivial (i.e., if J ≠ ±I), then {−1, 1} ⊆ σP(J). Moreover, if λ in C is such that λ² ≠ 1 (i.e., if λ ≠ ±1), then

(λI − J)[ (−λ/(1−λ²))I − (1/(1−λ²))J ] = I = [ (−λ/(1−λ²))I − (1/(1−λ²))J ](λI − J),

so that (λI − J) ∈ G[H], and hence λ ∈ ρ(J). Thus σ(J) ⊆ {−1, 1}, which concludes the proof of (h):

σ(J) = σP(J) = {−1, 1}.
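Parts (c), (d), and (g) of Corollary 6.18 can be observed directly on matrices; a hedged numpy sketch (the particular matrices are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# (c) unitary: every eigenvalue has modulus 1
Z = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(Z)                     # the QR factor U is unitary
unit_moduli = np.abs(np.linalg.eigvals(U))

# (d) self-adjoint: real spectrum
B = rng.standard_normal((4, 4))
A = B + B.T
herm_imag = float(np.max(np.abs(np.linalg.eigvals(A).imag)))

# (g) nontrivial projection: spectrum {0, 1}
v = np.array([[1.0], [0.0], [0.0], [0.0]])
P = v @ v.T                                # orthogonal projection onto span{v}
proj_eigs = sorted(np.round(np.linalg.eigvals(P).real, 8).tolist())

print(unit_moduli, herm_imag, proj_eigs)
```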
6.3 Spectral Radius

We open this section with the Spectral Mapping Theorem for polynomials. Let us just mention that there are versions of it that hold for functions other than polynomials. If Λ is any subset of C , and p : C → C is any polynomial (in one variable) with complex coefficients, then set

p(Λ) = {p(λ) ∈ C : λ ∈ Λ}.
Theorem 6.19. (The Spectral Mapping Theorem). If T ∈ B[X ], where X is a complex Banach space, then

σ(p(T )) = p(σ(T ))

for every polynomial p with complex coefficients.

Proof. If p is a constant polynomial (i.e., if p(T ) = αI for some α ∈ C ), then the result is trivially verified (and has nothing to do with T ; that is, σ(αI) = ασ(I) = {α} since ρ(αI) = C \{α} for every α ∈ C ). Thus let p : C → C be an arbitrary nonconstant polynomial with complex coefficients,

p(λ) = Σ_{i=0}^n αᵢλⁱ, with n ≥ 1 and αₙ ≠ 0,

for every λ ∈ C . Take an arbitrary μ ∈ C and consider the factorization

μ − p(λ) = βₙ ∏_{i=1}^n (λᵢ − λ),

with βₙ = (−1)ⁿ⁺¹αₙ, where {λᵢ}ᵢ₌₁ⁿ are the roots of μ − p(λ). Thus

μI − p(T ) = βₙ ∏_{i=1}^n (λᵢI − T ).

If μ ∈ σ(p(T )), then there exists λⱼ ∈ σ(T ) for some j = 1, . . . , n. Indeed, if λᵢ ∈ ρ(T ) for every i = 1, . . . , n, then βₙ ∏_{i=1}^n (λᵢI − T ) ∈ G[X ], and therefore μI − p(T ) ∈ G[X ], which means that μ ∈ ρ(p(T )). However,

μ − p(λⱼ) = βₙ ∏_{i=1}^n (λᵢ − λⱼ) = 0,

and so p(λⱼ) = μ. Then μ = p(λⱼ) ∈ {p(λ) ∈ C : λ ∈ σ(T )} = p(σ(T )) because λⱼ ∈ σ(T ). Hence σ(p(T )) ⊆ p(σ(T )).

Conversely, if μ ∈ p(σ(T )) = {p(λ) ∈ C : λ ∈ σ(T )}, then μ = p(λ) for some λ ∈ σ(T ). Thus μ − p(λ) = 0, so that λ = λⱼ for some j = 1, . . . , n, and so

μI − p(T ) = βₙ ∏_{i=1}^n (λᵢI − T ) = βₙ (λⱼI − T ) ∏_{j≠i=1}^n (λᵢI − T ) = βₙ ∏_{j≠i=1}^n (λᵢI − T ) (λⱼI − T ),

since λⱼI − T commutes with λᵢI − T for every integer i. If μ ∈ ρ(p(T )), then (μI − p(T )) ∈ G[X ], so that

(λⱼI − T ) [ βₙ ∏_{j≠i=1}^n (λᵢI − T ) (μI − p(T ))⁻¹ ] = (μI − p(T ))(μI − p(T ))⁻¹ = I = (μI − p(T ))⁻¹(μI − p(T )) = [ βₙ (μI − p(T ))⁻¹ ∏_{j≠i=1}^n (λᵢI − T ) ] (λⱼI − T ).

This means that λⱼI − T has a right and a left inverse, and so it is injective and surjective (Problems 1.5 and 1.6). The Inverse Mapping Theorem (Theorem 4.22) says that (λⱼI − T ) ∈ G[X ], and so λ = λⱼ ∈ ρ(T ). This contradicts the fact that λ ∈ σ(T ). Conclusion: μ ∉ ρ(p(T )); that is, μ ∈ σ(p(T )). Hence p(σ(T )) ⊆ σ(p(T )).
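On matrices the Spectral Mapping Theorem says that the eigenvalues of p(T ) are exactly the values of p at the eigenvalues of T , as a multiset; a hedged numpy sketch with an illustrative cubic:

```python
import numpy as np

rng = np.random.default_rng(3)
T = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

def p(z):
    """Illustrative cubic p(z) = 2z^3 - z + 4, for scalars and square matrices."""
    if isinstance(z, np.ndarray):
        return 2 * (z @ z @ z) - z + 4 * np.eye(len(z))
    return 2 * z**3 - z + 4

lhs = np.sort_complex(np.linalg.eigvals(p(T)))                          # sigma(p(T))
rhs = np.sort_complex(np.array([p(z) for z in np.linalg.eigvals(T)]))   # p(sigma(T))
err = float(np.max(np.abs(lhs - rhs)))
print(err)  # ~0: the two multisets coincide (Theorem 6.19)
```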
Remarks: Here are some useful properties of the spectrum. By the previous theorem, μ ∈ σ(T )ⁿ = {λⁿ ∈ C : λ ∈ σ(T )} if and only if μ ∈ σ(T ⁿ). Thus

σ(T ⁿ) = σ(T )ⁿ for every n ≥ 0.

Moreover, μ ∈ ασ(T ) = {αλ ∈ C : λ ∈ σ(T )} if and only if μ ∈ σ(αT ). So

σ(αT ) = ασ(T ) for every α ∈ C .

The next identity is not a particular case of the Spectral Mapping Theorem for polynomials (as was the case for the above two results). If T ∈ G[X ], then

σ(T ⁻¹) = σ(T )⁻¹.

That is, μ ∈ σ(T )⁻¹ = {λ⁻¹ ∈ C : 0 ≠ λ ∈ σ(T )} if and only if μ ∈ σ(T ⁻¹). Indeed, if T ∈ G[X ] (so that 0 ∈ ρ(T )) and μ ≠ 0, then −μT ⁻¹(μ⁻¹I − T ) = μI − T ⁻¹, and so μ⁻¹ ∈ ρ(T ) if and only if μ ∈ ρ(T ⁻¹); which means that μ ∈ σ(T ⁻¹) if and only if μ⁻¹ ∈ σ(T ). Also notice that, for every S, T ∈ B[X ],

σ(S T )\{0} = σ(T S)\{0}.

In fact, Problem 2.32 says that I − S T is invertible if and only if I − T S is or, equivalently, λI − S T is invertible if and only if λI − T S is whenever λ ≠ 0, and so ρ(S T )\{0} = ρ(T S)\{0}. Now let H be a complex Hilbert space. Recall from Proposition 6.17 that, if T ∈ B[H], then σ(T ∗) = σ(T )∗. If Q ∈ B[H] is a nonnegative operator, then it has a unique nonnegative square root Q^{1/2} ∈ B[H] by Theorem 5.85, and σ(Q) ⊆ [0, ∞) by Corollary 6.18. Thus Theorem 6.19 ensures that σ(Q^{1/2})² = σ((Q^{1/2})²) = σ(Q). Therefore,

σ(Q^{1/2}) = σ(Q)^{1/2}.
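The identity σ(S T )\{0} = σ(T S)\{0} can likewise be checked numerically; for square matrices S T and T S in fact share their whole spectrum (they have the same characteristic polynomial), so even the zero eigenvalues agree:

```python
import numpy as np

rng = np.random.default_rng(4)
S = rng.standard_normal((5, 5))
T = rng.standard_normal((5, 5))

st = np.sort_complex(np.linalg.eigvals(S @ T))
ts = np.sort_complex(np.linalg.eigvals(T @ S))
err = float(np.max(np.abs(st - ts)))
print(err)  # ~0: ST and TS have identical eigenvalue multisets
```

The excision of 0 in the Banach space statement matters when S and T act between different spaces, where the kernels of S T and T S can differ.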
The spectral radius of an operator T ∈ B[X ] is the number

rσ(T ) = sup_{λ∈σ(T)} |λ| = max_{λ∈σ(T)} |λ|.

The first identity defines the spectral radius rσ(T ), and the second follows by Theorem 3.86 (since σ(T ) ≠ ∅ is compact in C and | · | : C → R is continuous).

Corollary 6.20. rσ(T ⁿ) = rσ(T )ⁿ for every n ≥ 0.

Proof. Take an arbitrary integer n ≥ 0. Since σ(T ⁿ) = σ(T )ⁿ, it follows that μ ∈ σ(T ⁿ) if and only if μ = λⁿ for some λ ∈ σ(T ). Hence sup_{μ∈σ(Tⁿ)} |μ| = sup_{λ∈σ(T)} |λⁿ| = sup_{λ∈σ(T)} |λ|ⁿ = (sup_{λ∈σ(T)} |λ|)ⁿ.

Remarks: Recall that λ ∈ σ(T ) only if |λ| ≤ ‖T ‖ (cf. proof of Corollary 6.12), and so rσ(T ) ≤ ‖T ‖. Therefore, according to Corollary 6.20,

rσ(T )ⁿ = rσ(T ⁿ) ≤ ‖T ⁿ‖ ≤ ‖T ‖ⁿ for each integer n ≥ 0.

Thus rσ(T ) ≤ 1 whenever T is power bounded. Indeed, if supₙ ‖T ⁿ‖ < ∞, then rσ(T )ⁿ ≤ supₙ ‖T ⁿ‖ < ∞ for all n ≥ 0, so that rσ(T ) ≤ 1. That is,

supₙ ‖T ⁿ‖ < ∞ implies rσ(T ) ≤ 1.

Also note that the spectral radius of a nonzero operator may be null. Indeed, the above inequalities ensure that rσ(T ) = 0 for every nonzero nilpotent operator T (i.e., whenever T ⁿ = O for some integer n ≥ 2). An operator T ∈ B[X ] is quasinilpotent if rσ(T ) = 0. Thus every nilpotent operator is quasinilpotent. Observe that σ(T ) = σP(T ) = {0} if T is nilpotent. In fact, if T ⁿ⁻¹ ≠ O and T ⁿ = O, then T (T ⁿ⁻¹x) = 0 for every x ∈ X , so that {0} ≠ R(T ⁿ⁻¹) ⊆ N (T ), and hence λ = 0 is an eigenvalue of T . Since σP(T ) may be empty for a quasinilpotent operator (as we shall see in Examples 6.F and 6.G of Section 6.5), it follows that the inclusion below is proper:

Nilpotent ⊂ Quasinilpotent.
The next proposition is the Gelfand–Beurling formula for the spectral radius. The proof of it requires another piece of elementary complex analysis, namely, every analytic function has a power series representation. Precisely, if f : Λ → C is analytic and the annulus Bα,β(μ) = {λ ∈ C : 0 ≤ α < |λ − μ| < β} lies in the open set Λ ⊆ C , then f has a unique Laurent expansion about the point μ, viz., f(λ) = Σ_{k=−∞}^{∞} γₖ(λ − μ)ᵏ for every λ ∈ Bα,β(μ).

Proposition 6.21. rσ(T ) = limₙ ‖T ⁿ‖^{1/n}.

Proof. Since rσ(T )ⁿ ≤ ‖T ⁿ‖ for every positive integer n,

rσ(T ) ≤ limₙ ‖T ⁿ‖^{1/n}.
(Reason: The limit of the sequence {‖T ⁿ‖^{1/n}} exists for every T ∈ B[X ], according to Lemma 6.8.) Now recall the von Neumann expansion for the resolvent function R : ρ(T ) → G[X ]:

R(λ) = (λI − T )⁻¹ = λ⁻¹ Σ_{k=0}^{∞} T ᵏλ⁻ᵏ

for every λ ∈ ρ(T ) such that ‖T ‖ < |λ|, where the above series converges in the (uniform) topology of B[X ] (cf. Problem 4.47). Take an arbitrary bounded linear functional ϕ : B[X ] → C in B[X ]∗. Since ϕ is continuous,

ϕ(R(λ)) = λ⁻¹ Σ_{k=0}^{∞} ϕ(T ᵏ)λ⁻ᵏ

for every λ ∈ ρ(T ) such that ‖T ‖ < |λ|.

Claim. The displayed identity holds whenever rσ(T ) < |λ|.

Proof. λ⁻¹ Σ_{k=0}^{∞} ϕ(T ᵏ)λ⁻ᵏ is a Laurent expansion of ϕ(R(λ)) about the origin for every λ ∈ ρ(T ) such that ‖T ‖ < |λ|. But ϕ ◦ R is analytic on ρ(T ) (cf. Claim 2 in Proposition 6.13), so that ϕ(R(λ)) has a unique Laurent expansion about the origin for every λ ∈ ρ(T ), and therefore for every λ ∈ C such that rσ(T ) < |λ|. Then ϕ(R(λ)) = λ⁻¹ Σ_{k=0}^{∞} ϕ(T ᵏ)λ⁻ᵏ, which holds for every λ ∈ C such that rσ(T ) ≤ ‖T ‖ < |λ|, must be the Laurent expansion about the origin for every λ ∈ C such that rσ(T ) < |λ|.

Therefore, if rσ(T ) < |λ|, then ϕ((λ⁻¹T )ᵏ) = ϕ(T ᵏ)λ⁻ᵏ → 0 (since the above series converges — see Problem 4.7) for every ϕ ∈ B[X ]∗. But this implies that {(λ⁻¹T )ᵏ} is bounded in the (uniform) topology of B[X ] (Problem 4.67(d)). That is, λ⁻¹T is power bounded. Hence |λ|⁻ⁿ‖T ⁿ‖ ≤ supₖ ‖(λ⁻¹T )ᵏ‖ < ∞, so that, for every n ≥ 1,

|λ|⁻¹ ‖T ⁿ‖^{1/n} ≤ ( supₖ ‖(λ⁻¹T )ᵏ‖ )^{1/n}

if rσ(T ) < |λ|. Then |λ|⁻¹ limₙ ‖T ⁿ‖^{1/n} ≤ 1, and so limₙ ‖T ⁿ‖^{1/n} ≤ |λ|, for every λ ∈ C such that rσ(T ) < |λ|. Thus limₙ ‖T ⁿ‖^{1/n} ≤ rσ(T ) + ε for all ε > 0. Hence

limₙ ‖T ⁿ‖^{1/n} ≤ rσ(T ).
What Proposition 6.21 says is that rσ(T ) = r(T ), where r(T ) is the limit of the numerical sequence {‖T ⁿ‖^{1/n}} (whose existence was proved in Lemma 6.8). We shall then adopt one and the same notation (the simplest, of course) for both of them: the limit of {‖T ⁿ‖^{1/n}} and the spectral radius. Thus, from now on, the spectral radius of an operator T ∈ B[X ] on a complex Banach space X will be denoted by r(T ):

r(T ) = sup_{λ∈σ(T)} |λ| = max_{λ∈σ(T)} |λ| = limₙ ‖T ⁿ‖^{1/n}.
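The Gelfand–Beurling formula is visible numerically: for a nonnormal matrix, ‖T ⁿ‖^{1/n} starts near ‖T ‖ and decreases toward the largest eigenvalue modulus. A hedged sketch with an illustrative 2 × 2 Jordan-type matrix:

```python
import numpy as np

T = np.array([[0.5, 3.0],
              [0.0, 0.5]])                  # illustrative: r(T) = 0.5, but ||T|| > 3

spec_radius = float(np.max(np.abs(np.linalg.eigvals(T))))
norms = [np.linalg.norm(np.linalg.matrix_power(T, n), 2) ** (1.0 / n)
         for n in (1, 10, 100, 1000)]
print(spec_radius)   # 0.5
print(norms)         # decreasing toward 0.5, starting above 3
```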
Remarks: Thus a normaloid operator on a complex Banach space is precisely an operator whose norm coincides with the spectral radius. Recall that in a complex Hilbert space H every normal operator is normaloid, and so is every nonnegative operator. Since T ∗T is always nonnegative, it follows that

r(T ∗T ) = r(T T ∗) = ‖T ∗T ‖ = ‖T T ∗‖ = ‖T ‖² = ‖T ∗‖²

for every T ∈ B[H] (cf. Proposition 5.65), which is an especially useful formula for computing the norm of operators on a Hilbert space. Also note that an operator T on a Banach space is normaloid if and only if there exists λ ∈ σ(T ) such that |λ| = ‖T ‖. However, for operators on a Hilbert space such a λ can never be in the residual spectrum. Explicitly, for every operator T ∈ B[H],

σR(T ) ⊆ {λ ∈ C : |λ| < ‖T ‖}.

(Indeed, if λ ∈ σR(T ) = σP(T ∗)∗\σP(T ), then there exists 0 ≠ x ∈ H with T ∗x = λ̄x, and so 0 < ‖T x − λx‖² = ‖T x‖² − 2 Re λ̄⟨T x ; x⟩ + |λ|²‖x‖² = ‖T x‖² − |λ|²‖x‖² (since ⟨T x ; x⟩ = ⟨x ; T ∗x⟩ = λ‖x‖²), and so |λ| < ‖T ‖.) Moreover, as a consequence of the preceding results (see also the remarks that succeed Corollary 6.20), for all operators S, T ∈ B[X ],

r(αT ) = |α| r(T ) for every α ∈ C , and r(S T ) = r(T S).

If T ∈ B[H], where H is a complex Hilbert space, then r(T ∗) = r(T ) and, if Q ∈ B[H] is a nonnegative operator, then

r(Q^{1/2}) = r(Q)^{1/2}.

An important application of the Gelfand–Beurling formula ensures that an operator T is uniformly stable if and only if r(T ) < 1. In fact, there exists in the current literature a large collection of equivalent conditions for uniform stability. We shall consider below just a few of them.

Proposition 6.22. Let T ∈ B[X ] be an operator on a complex Banach space X . The following assertions are pairwise equivalent.

(a) T ⁿ →ᵘ O.

(b) r(T ) < 1.

(c) ‖T ⁿ‖ ≤ βαⁿ for every n ≥ 0, for some β ≥ 1 and some α ∈ (0, 1).

(d) Σ_{n=0}^{∞} ‖T ⁿ‖ᵖ < ∞ for an arbitrary p > 0.

(e) Σ_{n=0}^{∞} ‖T ⁿx‖ᵖ < ∞ for all x ∈ X , for an arbitrary p > 0.
Proof. Since r(T )ⁿ = r(T ⁿ) ≤ ‖T ⁿ‖ for every n ≥ 0, it follows that (a)⇒(b). Suppose r(T ) < 1 and take any α ∈ (r(T ), 1). The Gelfand–Beurling formula says that limₙ ‖T ⁿ‖^{1/n} = r(T ). Thus there is an integer n_α ≥ 1 such that ‖T ⁿ‖ ≤ αⁿ for every n ≥ n_α, and so (b)⇒(c) with β = max_{0≤n≤n_α} ‖T ⁿ‖ α^{−n_α}. It is trivially verified that (c)⇒(d)⇒(e). If (e) holds, then supₙ ‖T ⁿx‖ < ∞ for every x ∈ X , and hence supₙ ‖T ⁿ‖ < ∞ by the Banach–Steinhaus Theorem (Theorem 4.43). Moreover, for m ≥ 1 and p > 0 arbitrary,

‖m^{1/p} T ᵐx‖ᵖ = Σ_{n=0}^{m−1} ‖T ^{m−n}T ⁿx‖ᵖ ≤ (supₙ ‖T ⁿ‖)ᵖ Σ_{n=0}^{∞} ‖T ⁿx‖ᵖ.

Thus sup_m ‖m^{1/p} T ᵐx‖ < ∞ for every x ∈ X whenever (e) holds true. Since m^{1/p} T ᵐ ∈ B[X ] for each m ≥ 1, it follows that sup_m ‖m^{1/p} T ᵐ‖ < ∞ by using the Banach–Steinhaus Theorem again. Hence

0 ≤ ‖T ⁿ‖ ≤ n^{−1/p} sup_m ‖m^{1/p} T ᵐ‖

for every n ≥ 1, so that ‖T ⁿ‖ → 0 as n → ∞. Therefore, (e)⇒(a).
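Proposition 6.22 can be watched in action: a matrix with r(T ) < 1 but ‖T ‖ large shows a transient growth of ‖T ⁿ‖ before the inevitable decay to O. A hedged numpy sketch:

```python
import numpy as np

T = np.array([[0.9, 5.0],
              [0.0, 0.9]])                  # illustrative: r(T) = 0.9 < 1, ||T|| > 5

r = float(np.max(np.abs(np.linalg.eigvals(T))))
pow_norms = [np.linalg.norm(np.linalg.matrix_power(T, n), 2) for n in range(120)]

print(r)                                    # 0.9
print(max(pow_norms), pow_norms[-1])        # large transient, then decay toward 0
```

The transient peak (here well above ‖T ‖ itself) is consistent with item (c): the decay is geometric only up to a constant β that can be large.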
Remark: If an operator is similar to a strict contraction, then, by the above proposition, it is uniformly stable. Indeed, let X and Y be complex Banach spaces and take any M ∈ G[X , Y ]. Since similarity preserves the spectrum (and so the spectral radius — see Problem 6.10), it follows that r(T ) = r(M T M ⁻¹) ≤ ‖M T M ⁻¹‖. Hence, if T ∈ B[X ] is similar to a strict contraction (i.e., if ‖M T M ⁻¹‖ < 1 for some M ∈ G[X , Y ]), then r(T ) < 1 or, equivalently, T ⁿ →ᵘ O. There are several different ways to verify the converse. One of them uses a result from model theory for Hilbert space operators (see the references at the end of this chapter), yielding another formula for the spectral radius, which reads as follows. If H and K are complex Hilbert spaces, and if T ∈ B[H], then

r(T ) = inf_{M∈G[H,K]} ‖M T M ⁻¹‖.

Thus r(T ) < 1 if and only if ‖M T M ⁻¹‖ < 1 for some M ∈ G[H, K]. Equivalently (cf. Proposition 6.22), a Hilbert space operator is uniformly stable if and only if it is similar to a strict contraction.

The next result extends the von Neumann expansion of Problem 4.47. Recall that an infinite series Σ_{k=0}^{∞} (T/λ)ᵏ is said to converge uniformly or strongly if the sequence of partial sums {Σ_{k=0}^{n} (T/λ)ᵏ}_{n=0}^{∞} converges uniformly or strongly, and Σ_{k=0}^{∞} (T/λ)ᵏ ∈ B[X ] denotes its uniform or strong limit, respectively.

Corollary 6.23. Let X be a complex Banach space. Take any operator T in B[X ] and any nonzero complex number λ.
(a) r(T ) < |λ| if and only if Σ_{k=0}^{∞} (T/λ)ᵏ converges uniformly. In this case, λ lies in ρ(T ) and (λI − T )⁻¹ = (1/λ) Σ_{k=0}^{∞} (T/λ)ᵏ.

(b) If r(T ) = |λ| and Σ_{k=0}^{∞} (T/λ)ᵏ converges strongly, then λ lies in ρ(T ) and (λI − T )⁻¹ = (1/λ) Σ_{k=0}^{∞} (T/λ)ᵏ.

(c) If |λ| < r(T ), then Σ_{k=0}^{∞} (T/λ)ᵏ does not converge strongly.

Proof. If Σ_{k=0}^{∞} (T/λ)ᵏ converges uniformly, then (T/λ)ⁿ →ᵘ O (cf. Problem 4.7), and hence |λ|⁻¹r(T ) = r(T/λ) < 1 by Proposition 6.22. Conversely, if r(T ) < |λ|, then λ ∈ ρ(T ), so that (λI − T ) ∈ G[X ], and r(T/λ) = |λ|⁻¹r(T ) < 1. Therefore, {(T/λ)ⁿ} is an absolutely summable sequence in B[X ] by Proposition 6.22. Now follow the steps of Problem 4.47 to conclude all the properties of item (a). If Σ_{k=0}^{∞} (T/λ)ᵏ converges strongly, then (T/λ)ⁿx → 0 in X for every x ∈ X (Problem 4.7 again), so that supₙ ‖(T/λ)ⁿx‖ < ∞ for every x ∈ X . Then supₙ ‖(T/λ)ⁿ‖ < ∞ by the Banach–Steinhaus Theorem (i.e., the operator T/λ is power bounded), and hence |λ|⁻¹r(T ) = r(T/λ) ≤ 1. This proves assertion (c). Moreover,

(λI − T )(1/λ) Σ_{k=0}^{n} (T/λ)ᵏ = (1/λ) Σ_{k=0}^{n} (T/λ)ᵏ (λI − T ) = I − (T/λ)ⁿ⁺¹ →ˢ I.

Thus (λI − T )⁻¹ = (1/λ) Σ_{k=0}^{∞} (T/λ)ᵏ, where Σ_{k=0}^{∞} (T/λ)ᵏ ∈ B[X ] is the strong limit of the sequence {Σ_{k=0}^{n} (T/λ)ᵏ}_{n=0}^{∞}, which concludes the proof of (b).
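Corollary 6.23(a) in finite dimensions: when |λ| > r(T ), the partial sums of (1/λ) Σ (T/λ)ᵏ converge to (λI − T )⁻¹. A hedged numpy sketch (500 terms is an arbitrary, generous cutoff):

```python
import numpy as np

rng = np.random.default_rng(5)
T = rng.standard_normal((4, 4))
r = float(np.max(np.abs(np.linalg.eigvals(T))))
lam = 2.0 * r + 1.0                         # any lam with |lam| > r(T) works

I = np.eye(4)
partial = np.zeros((4, 4))
term = I.copy()
for _ in range(500):
    partial = partial + term
    term = term @ (T / lam)                 # next power of T/lam
approx = partial / lam                      # (1/lam) * sum_{k<500} (T/lam)^k

exact = np.linalg.inv(lam * I - T)
err = float(np.max(np.abs(approx - exact)))
print(err)  # ~0
```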
6.4 Numerical Radius

The numerical range of an operator T acting on a complex Hilbert space H ≠ {0} is the (nonempty) set

W (T ) = {λ ∈ C : λ = ⟨T x ; x⟩ for some ‖x‖ = 1}.

It can be shown that W (T ) is always convex in C and, clearly, W (T ∗) = W (T )∗.

Proposition 6.24. σP(T ) ∪ σR(T ) ⊆ W (T ) and σ(T ) ⊆ W (T )⁻.

Proof. Take T ∈ B[H], where H ≠ {0} is a complex Hilbert space.

(a) If λ ∈ σP(T ), then there exists a unit vector x in H (i.e., there exists x ∈ H with ‖x‖ = 1) such that T x = λx. Thus ⟨T x ; x⟩ = λ‖x‖² = λ, which means that λ ∈ W (T ). If λ ∈ σR(T ), then λ̄ ∈ σP(T ∗) by Proposition 6.17, and hence λ̄ ∈ W (T ∗). Therefore, λ ∈ W (T ).

(b) If λ ∈ σAP(T ), then there exists a sequence {xn} of unit vectors in H such that ‖(λI − T )xn‖ → 0 by Proposition 6.15. Thus

0 ≤ |λ − ⟨T xn ; xn⟩| = |⟨(λI − T )xn ; xn⟩| ≤ ‖(λI − T )xn‖ → 0,
466
6. The Spectral Theorem
so that $\langle Tx_n\,;x_n\rangle \to \lambda$. Since each $\langle Tx_n\,;x_n\rangle$ lies in $W(T)$, it follows by the Closed Set Theorem (Theorem 3.30) that $\lambda \in W(T)^-$. Hence $\sigma_{AP}(T) \subseteq W(T)^-$, and therefore $\sigma(T) = \sigma_R(T) \cup \sigma_{AP}(T) \subseteq W(T)^-$ according to item (a). □
The numerical radius of $T \in B[H]$ is the number

$$w(T) = \sup_{\lambda \in W(T)}|\lambda| = \sup_{\|x\|=1}|\langle Tx\,;x\rangle|.$$

Note that $w(T^*) = w(T)$ and $w(T^*T) = \|T\|^2$.
Unlike the spectral radius, the numerical radius is a norm on $B[H]$. That is, $0 \le w(T)$ for every $T \in B[H]$ and $0 < w(T)$ if $T \ne O$, $w(\alpha T) = |\alpha|\,w(T)$, and $w(T + S) \le w(T) + w(S)$ for every $\alpha \in \mathbb{C}$ and every $S, T \in B[H]$. Warning: the numerical radius is a norm on $B[H]$ that does not have the operator norm property, which means that the inequality $w(ST) \le w(S)\,w(T)$ is not true for all operators $S, T \in B[H]$. However, the power inequality holds: $w(T^n) \le w(T)^n$ for all $T \in B[H]$ and every positive integer $n$ — the proof is tricky. Nevertheless, the numerical radius is a norm equivalent to the (induced uniform) operator norm of $B[H]$ and dominates the spectral radius, as follows.

Proposition 6.25. $0 \le r(T) \le w(T) \le \|T\| \le 2\,w(T)$.

Proof. Since $\sigma(T) \subseteq W(T)^-$, we get $r(T) \le w(T)$. Moreover,

$$w(T) = \sup_{\|x\|=1}|\langle Tx\,;x\rangle| \le \sup_{\|x\|=1}\|Tx\| = \|T\|.$$

Now use Problem 5.3, and recall that

$$|\langle Tz\,;z\rangle| \le \sup_{\|u\|=1}|\langle Tu\,;u\rangle|\,\|z\|^2 = w(T)\,\|z\|^2$$

for every $z \in H$ (because $\langle Tz\,;z\rangle = \big\langle T\tfrac{z}{\|z\|}\,;\tfrac{z}{\|z\|}\big\rangle\|z\|^2$ for every nonzero $z \in H$), and apply the parallelogram law, to get

$$|\langle Tx\,;y\rangle| \le \tfrac{1}{4}\big(|\langle T(x+y)\,;(x+y)\rangle| + |\langle T(x-y)\,;(x-y)\rangle| + |\langle T(x+iy)\,;(x+iy)\rangle| + |\langle T(x-iy)\,;(x-iy)\rangle|\big)$$
$$\le \tfrac{1}{4}\,w(T)\big(\|x+y\|^2 + \|x-y\|^2 + \|x+iy\|^2 + \|x-iy\|^2\big) = w(T)\big(\|x\|^2 + \|y\|^2\big) \le 2\,w(T)$$

whenever $\|x\| = \|y\| = 1$. Therefore, according to Corollary 5.71,

$$\|T\| = \sup_{\|x\|=\|y\|=1}|\langle Tx\,;y\rangle| \le 2\,w(T). \qquad \square$$
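The chain of Proposition 6.25 can be seen concretely on a (hypothetical) nilpotent matrix, for which every quantity in the chain is computable by hand:

```python
import math

# Proposition 6.25 for the nilpotent matrix T = [[0,1],[0,0]] on C^2
# (a hypothetical illustration): r(T) = 0 and ||T|| = 1, while
# <Tx ; x> = x1 * conj(x0) for x = (x0, x1), so w(T) = sup |x0||x1| = 1/2.
r, norm_T = 0.0, 1.0

w = 0.0
for i in range(801):               # unit vectors (cos t, sin t) suffice here,
    t = i * math.pi / 1600         # since a phase on x1 does not change |x0||x1|
    w = max(w, abs(math.cos(t) * math.sin(t)))

# the chain 0 <= r(T) <= w(T) <= ||T|| <= 2 w(T), up to grid tolerance
tol = 1e-9
ok = (0.0 <= r) and (r <= w + tol) and (w <= norm_T + tol) and (norm_T <= 2 * w + tol)
```

Here $r(T) < w(T) < \|T\| = 2\,w(T)$, so the right-hand inequality of the proposition is attained.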
An operator $T \in B[H]$ is spectraloid if $r(T) = w(T)$. The next result is a straightforward application of the previous proposition.

Corollary 6.26. Every normaloid operator is spectraloid.

Indeed, $r(T) = \|T\|$ implies $r(T) = w(T)$ by Proposition 6.25. However, Proposition 6.25 also ensures that $r(T) = \|T\|$ implies $w(T) = \|T\|$, so that $w(T) = \|T\|$ is a property of every normaloid operator on $H$. What emerges as a nice surprise is that this property can be viewed as a third definition of a normaloid operator on a complex Hilbert space.

Proposition 6.27. $T \in B[H]$ is normaloid if and only if $w(T) = \|T\|$.

Proof. The easy half of the proof was presented above. Suppose $T \ne O$ (avoiding trivialities). $W(T)^-$ is compact in $\mathbb{C}$ (since it is clearly bounded). Thus

$$\max_{\lambda \in W(T)^-}|\lambda| = \sup_{\lambda \in W(T)^-}|\lambda| = \sup_{\lambda \in W(T)}|\lambda| = w(T),$$

and so there exists a $\lambda \in W(T)^-$ such that $|\lambda| = w(T)$. If $w(T) = \|T\|$, then $|\lambda| = \|T\|$. Since $W(T)$ is always nonempty, it follows by Proposition 3.32 that there exists a sequence $\{\lambda_n\}$ in $W(T)$ that converges to $\lambda$. In other words, there exists a sequence $\{x_n\}$ of unit vectors in $H$ ($\|x_n\| = 1$ for each $n$) such that $\lambda_n = \langle Tx_n\,;x_n\rangle \to \lambda$, where $|\lambda| = \|T\| \ne 0$. Set $S = \lambda^{-1}T \in B[H]$ so that $\langle Sx_n\,;x_n\rangle \to 1$.

Claim. $\|Sx_n\| \to 1$ and $\mathrm{Re}\langle Sx_n\,;x_n\rangle \to 1$.

Proof. $|\langle Sx_n\,;x_n\rangle| \le \|Sx_n\| \le \|S\| = 1$ for each $n$. But $\langle Sx_n\,;x_n\rangle \to 1$ implies that $|\langle Sx_n\,;x_n\rangle| \to 1$ (and hence $\|Sx_n\| \to 1$) and also that $\mathrm{Re}\langle Sx_n\,;x_n\rangle \to 1$. Both arguments follow by continuity.

Then

$$\|(I - S)x_n\|^2 = \|Sx_n - x_n\|^2 = \|Sx_n\|^2 - 2\,\mathrm{Re}\langle Sx_n\,;x_n\rangle + \|x_n\|^2 \to 0$$

so that $1 \in \sigma_{AP}(S) \subseteq \sigma(S)$ (cf. Proposition 6.15). Hence $r(S) \ge 1$ and $r(T) = r(\lambda S) = |\lambda|\,r(S) \ge |\lambda| = \|T\|$, which implies $r(T) = \|T\|$ (since $r(T) \le \|T\|$ for every operator $T$). □

Therefore, the class of normaloid operators on $H$ coincides with the class of all operators $T \in B[H]$ for which

$$\|T\| = \sup_{\|x\|=1}|\langle Tx\,;x\rangle|.$$

This includes the normal operators and, in particular, the self-adjoint operators (see Proposition 5.78). It includes the isometries too. In fact, every isometry is quasinormal, and hence normaloid. Thus

$$r(V) = w(V) = \|V\| = 1 \quad\text{whenever}\quad V \in B[H]\ \text{is an isometry}.$$

(The above identity can be directly verified by Propositions 6.21 and 6.25, once $\|V^n\| = 1$ for every positive integer $n$ — cf. Proposition 4.37.)
Remark: Since an operator $T$ is normaloid if (and only if) $r(T) = \|T\|$, it follows that the unique normaloid quasinilpotent is the null operator. In other words, if $T$ is normaloid and $r(T) = 0$ (i.e., $\sigma(T) = \{0\}$), then $T = O$. In particular, the unique normal (or hyponormal) quasinilpotent is the null operator. More is true. In fact, the unique spectraloid quasinilpotent is the null operator. Proof: If $w(T) = r(T) = 0$, then $T = O$ by Proposition 6.25.

Corollary 6.28. If there exists $\lambda \in W(T)$ such that $|\lambda| = \|T\|$, then $T$ is normaloid and $\lambda \in \sigma_P(T)$. In other words, if there exists a unit vector $x$ such that $\|T\| = |\langle Tx\,;x\rangle|$, then $r(T) = w(T) = \|T\|$ and $\langle Tx\,;x\rangle \in \sigma_P(T)$.

Proof. If $\lambda \in W(T)$ is such that $|\lambda| = \|T\|$, then $w(T) = \|T\|$ (see Proposition 6.25) so that $T$ is normaloid by Proposition 6.27. Moreover, since $\lambda = \langle Tx\,;x\rangle$ for some unit vector $x$, it follows that $\|T\| = |\lambda| = |\langle Tx\,;x\rangle| \le \|Tx\|\,\|x\| \le \|T\|$, and hence $|\langle Tx\,;x\rangle| = \|Tx\|\,\|x\|$. Then $Tx = \alpha x$ for some $\alpha \in \mathbb{C}$ (cf. Problem 5.2) so that $\alpha \in \sigma_P(T)$. But $\alpha = \alpha\|x\|^2 = \langle\alpha x\,;x\rangle = \langle Tx\,;x\rangle = \lambda$. □

Remark: Using the inequality $\|T^n\| \le \|T\|^n$, which holds for every operator $T$, we have shown in Proposition 6.9 that $T$ is normaloid if and only if $\|T^n\| = \|T\|^n$ for every $n \ge 0$. Using the inequality $w(T^n) \le w(T)^n$, which also holds for every operator $T$, we can show that $T$ is spectraloid if and only if $w(T^n) = w(T)^n$ for every $n \ge 0$. Indeed, by Corollary 6.20 and Proposition 6.25,

$$r(T)^n = r(T^n) \le w(T^n) \le w(T)^n \quad\text{for every}\quad n \ge 0.$$

Hence $r(T) = w(T)$ implies $w(T^n) = w(T)^n$. Conversely, since

$$w(T^n)^{\frac{1}{n}} \le \|T^n\|^{\frac{1}{n}} \to r(T) \le w(T),$$

it follows that $w(T^n) = w(T)^n$ implies $r(T) = w(T)$.
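The power inequality $w(T^n) \le w(T)^n$ can be probed numerically. The sketch below (hypothetical 2×2 matrix; grid estimates are lower bounds for the true numerical radii) uses the Jordan block $T = \big(\begin{smallmatrix}1&1\\0&1\end{smallmatrix}\big)$, whose numerical range is the disk of radius $\frac{1}{2}$ about $1$, so $w(T) = \frac{3}{2}$ and $w(T^2) = 2 \le w(T)^2 = \frac{9}{4}$:

```python
import math, cmath

def w_est(T):
    # grid estimate of the numerical radius of a 2x2 complex matrix:
    # sup |<Tx ; x>| over unit vectors x = (cos t, e^{ip} sin t)
    best = 0.0
    for i in range(201):
        t = i * math.pi / 400
        for j in range(120):
            p = j * math.pi / 60
            x0, x1 = math.cos(t), cmath.exp(1j * p) * math.sin(t)
            t0 = T[0][0] * x0 + T[0][1] * x1
            t1 = T[1][0] * x0 + T[1][1] * x1
            best = max(best, abs(t0 * x0 + t1 * x1.conjugate()))
    return best

# hypothetical test matrix: w(T) = 3/2 and w(T^2) = 2 <= w(T)^2 = 9/4
T  = [[1.0, 1.0], [0.0, 1.0]]
T2 = [[1.0, 2.0], [0.0, 1.0]]     # = T * T
w1, w2 = w_est(T), w_est(T2)
```

Note that $w(T^2) < w(T)^2$ here, so the power inequality can be strict even for $n = 2$.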
6.5 Examples of Spectra

Every closed and bounded subset of the complex plane (i.e., every compact subset of $\mathbb{C}$) is the spectrum of some operator.

Example 6.B. Take $T \in B[X]$ on a finite-dimensional complex normed space $X$. Thus $X$ and its linear manifolds are all Banach spaces (Corollaries 4.28 and 4.29). Moreover, $N(\lambda I - T) = \{0\}$ if and only if $(\lambda I - T) \in G[X]$ (cf. Problem 4.38(c)). That is, $N(\lambda I - T) = \{0\}$ if and only if $\lambda \in \rho(T)$, and hence $\sigma_C(T) = \sigma_R(T) = \varnothing$. Furthermore, since $R(\lambda I - T)$ is a subspace of $X$ for every $\lambda \in \mathbb{C}$, it also follows that $\sigma_{P_2}(T) = \sigma_{P_3}(T) = \varnothing$ (see the diagram of Section 6.2). Finally, if $N(\lambda I - T) \ne \{0\}$, then $R(\lambda I - T) \ne X$ whenever $X$ is finite dimensional (cf. Problems 2.6(a) and 2.17), and so $\sigma_{P_1}(T) = \varnothing$. Therefore, $\sigma(T) = \sigma_P(T) = \sigma_{P_4}(T)$.
Example 6.C. Let $T \in B[H]$ be a diagonalizable operator on a complex (separable infinite-dimensional) Hilbert space $H$. That is, according to Problem 5.17, there exist an orthonormal basis $\{e_k\}_{k=1}^{\infty}$ for $H$ and a bounded sequence $\{\lambda_k\}_{k=1}^{\infty}$ of scalars such that, for every $x \in H$,

$$Tx = \sum_{k=1}^{\infty}\lambda_k\langle x\,;e_k\rangle e_k.$$

Take an arbitrary $\lambda \in \mathbb{C}$ and note that $(\lambda I - T) \in B[H]$ is again a diagonalizable operator. Indeed, $(\lambda I - T)x = \sum_{k=1}^{\infty}(\lambda - \lambda_k)\langle x\,;e_k\rangle e_k$ for every $x \in H$. Since $N(\lambda I - T) = \{0\}$ if and only if $\lambda \ne \lambda_k$ for every $k \ge 1$ (i.e., there exists $(\lambda I - T)^{-1} \in L[R(\lambda I - T), H]$ if and only if $\lambda - \lambda_k \ne 0$ for every $k \ge 1$ — cf. Problem 5.17), it follows that

$$\sigma_P(T) = \big\{\lambda \in \mathbb{C}:\ \lambda = \lambda_k\ \text{for some}\ k \ge 1\big\}.$$

Similarly, since $T^* \in B[H]$ also is a diagonalizable operator, given by $T^*x = \sum_{k=1}^{\infty}\overline{\lambda_k}\langle x\,;e_k\rangle e_k$ for every $x \in H$ (e.g., see Problem 5.27(c)), we get

$$\sigma_P(T^*) = \big\{\lambda \in \mathbb{C}:\ \lambda = \overline{\lambda_k}\ \text{for some}\ k \ge 1\big\}.$$

Then

$$\sigma_R(T) = \sigma_P(T^*)^*\backslash\sigma_P(T) = \varnothing.$$

Moreover, $\lambda \in \rho(T)$ if and only if $\lambda I - T$ lies in $G[H]$; equivalently, if and only if $\inf_k|\lambda - \lambda_k| > 0$ (Problem 5.17). Thus

$$\sigma(T) = \sigma_P(T) \cup \sigma_C(T) = \big\{\lambda \in \mathbb{C}:\ \inf_k|\lambda - \lambda_k| = 0\big\},$$

and hence $\sigma(T)\backslash\sigma_P(T)$ consists of those cluster points of the sequence $\{\lambda_k\}_{k=1}^{\infty}$ (i.e., accumulation points of the set $\{\lambda_k\}_{k=1}^{\infty}$) that do not belong to it:

$$\sigma_C(T) = \big\{\lambda \in \mathbb{C}:\ \inf_k|\lambda - \lambda_k| = 0\ \text{and}\ \lambda \ne \lambda_k\ \text{for every}\ k \ge 1\big\}.$$

Note that $\sigma_{P_1}(T) = \sigma_{P_2}(T) = \varnothing$ (reason: $T^*$ is a diagonalizable operator so that $\sigma_R(T^*) = \varnothing$ — see Proposition 6.17). If $\lambda_j \in \sigma_P(T)$ also is an accumulation point of $\sigma_P(T)$, then it lies in $\sigma_{P_3}(T)$; otherwise (i.e., if it is an isolated point of $\sigma_P(T)$), it lies in $\sigma_{P_4}(T)$. Indeed, consider a new sequence $\{\widetilde{\lambda}_k\}$ without this point $\lambda_j$ and the associated diagonalizable operator $\widetilde{T}$ so that $\lambda_j \in \sigma_C(\widetilde{T})$, and hence $R(\lambda_j I - \widetilde{T})$ is not closed, which means that $R(\lambda_j I - T)$ is not closed. If $\{\lambda_k\}$ is a constant sequence, say $\lambda_k = \mu$ for all $k$, then $T = \mu I$ is a scalar operator and, in this case, $\sigma(\mu I) = \sigma_P(\mu I) = \sigma_{P_4}(\mu I) = \{\mu\}$.

Recall that $\mathbb{C}$ (with its usual metric) is a separable metric space (Example 3.P). Thus it includes a countable dense subset, and so does every compact subset $\Sigma$ of $\mathbb{C}$. Let $\Lambda$ be any countable dense subset of $\Sigma$, and let $\{\lambda_k\}_{k=1}^{\infty}$ be an enumeration of it (if $\Sigma$ is finite, then set $\lambda_k = 0$ for all $k > \#\Sigma$). Observe that $\sup_k|\lambda_k| < \infty$ as $\Sigma$ is bounded. Consider a diagonalizable operator $T$ in $B[H]$ such that $Tx = \sum_{k=1}^{\infty}\lambda_k\langle x\,;e_k\rangle e_k$ for every $x \in H$. As we have just seen, $\sigma(T) = \Lambda^- = \Sigma$. That is, $\sigma(T)$ is the set of all points of adherence of $\Lambda = \{\lambda_k\}_{k=1}^{\infty}$, which means the closure of $\Lambda$. This confirms the statement that introduced this section. Precisely, every closed and bounded subset of the complex plane is the spectrum of some diagonalizable operator on $H$.

Example 6.D. Let $\mathbb{D}$ and $\mathbb{T} = \partial\mathbb{D}$ denote the open unit disk and the unit circle in the complex plane centered at the origin, respectively. In this example we characterize each part of the spectrum of a unilateral shift of arbitrary multiplicity. Let $S_+$ be a unilateral shift acting on a (complex) Hilbert space $H$, and let $\{H_k\}_{k=0}^{\infty}$ be the underlying sequence of orthogonal subspaces of $H = \bigoplus_{k=0}^{\infty}H_k$ (Problem 5.29). Recall that

$$S_+x = 0 \oplus \bigoplus_{k=1}^{\infty}U_k x_{k-1} \quad\text{and}\quad S_+^*x = \bigoplus_{k=0}^{\infty}U_{k+1}^*x_{k+1}$$

for every $x = \bigoplus_{k=0}^{\infty}x_k$ in $H = \bigoplus_{k=0}^{\infty}H_k$, with $0$ denoting the origin of $H_0$, where $\{U_{k+1}\}_{k=0}^{\infty}$ is an arbitrary sequence of unitary transformations of $H_k$ onto $H_{k+1}$, $U_{k+1}\colon H_k \to H_{k+1}$. Since a unilateral shift is an isometry, we get $r(S_+) = 1$.

Take any $x = \bigoplus_{k=0}^{\infty}x_k \in H$ and an arbitrary $\lambda \in \mathbb{C}$. If $x \in N(\lambda I - S_+)$, then $\lambda x_0 \oplus \bigoplus_{k=1}^{\infty}\lambda x_k = 0 \oplus \bigoplus_{k=1}^{\infty}U_k x_{k-1}$. Hence $\lambda x_0 = 0$ and, for every $k \ge 0$, $\lambda x_{k+1} = U_{k+1}x_k$. If $\lambda = 0$, then $x = 0$. If $\lambda \ne 0$, then $x_0 = 0$ and $x_{k+1} = \lambda^{-1}U_{k+1}x_k$, so that $\|x_0\| = 0$ and $\|x_{k+1}\| = |\lambda|^{-1}\|x_k\|$, for each $k \ge 0$. Thus $\|x_k\| = |\lambda|^{-k}\|x_0\| = 0$ for every $k \ge 0$. Hence $x = 0$, and so $N(\lambda I - S_+) = \{0\}$ for all $\lambda \in \mathbb{C}$. That is, $\sigma_P(S_+) = \varnothing$.

Now take any $x_0 \ne 0$ in $H_0$ and any $\lambda \in \mathbb{D}$. Consider the sequence $\{x_k\}_{k=0}^{\infty}$, with each $x_k$ in $H_k$, recursively defined by $x_{k+1} = \lambda U_{k+1}x_k$, so that $\|x_{k+1}\| = |\lambda|\,\|x_k\|$ for every $k \ge 0$. Then $\|x_k\| = |\lambda|^k\|x_0\|$ for every $k \ge 1$, and hence $\sum_{k=0}^{\infty}\|x_k\|^2 = \|x_0\|^2\big(1 + \sum_{k=1}^{\infty}|\lambda|^{2k}\big) < \infty$, which implies that the nonzero vector $x = \bigoplus_{k=0}^{\infty}x_k$ lies in $\bigoplus_{k=0}^{\infty}H_k = H$. Moreover, since $\lambda x_k = U_{k+1}^*x_{k+1}$ for each $k \ge 0$, it follows that $\lambda x = S_+^*x$, and so $0 \ne x \in N(\lambda I - S_+^*)$. Therefore, $N(\lambda I - S_+^*) \ne \{0\}$ for all $\lambda \in \mathbb{D}$. Equivalently, $\mathbb{D} \subseteq \sigma_P(S_+^*)$. On the other hand, if $\lambda \in \sigma_P(S_+^*)$, then there exists $0 \ne x = \bigoplus_{k=0}^{\infty}x_k \in \bigoplus_{k=0}^{\infty}H_k = H$ such that $S_+^*x = \lambda x$. Hence $U_{k+1}^*x_{k+1} = \lambda x_k$ so that $\|x_{k+1}\| = |\lambda|\,\|x_k\|$ for each $k \ge 0$, and so $\|x_k\| = |\lambda|^k\|x_0\|$ for every $k \ge 1$. Thus $x_0 \ne 0$ (because $x \ne 0$) and $\big(1 + \sum_{k=1}^{\infty}|\lambda|^{2k}\big)\|x_0\|^2 = \sum_{k=0}^{\infty}\|x_k\|^2 = \|x\|^2 < \infty$, which implies that $|\lambda| < 1$ (i.e., $\lambda \in \mathbb{D}$). So we may conclude that $\sigma_P(S_+^*) \subseteq \mathbb{D}$. Then

$$\sigma_P(S_+^*) = \mathbb{D}.$$

But the spectrum of any operator $T$ on $H$ is a closed set included in the disk $\{\lambda \in \mathbb{C}: |\lambda| \le r(T)\}$, and it is the disjoint union of $\sigma_P(T)$, $\sigma_R(T)$, and $\sigma_C(T)$, where $\sigma_R(T) = \sigma_P(T^*)^*\backslash\sigma_P(T)$ (Proposition 6.17). Hence

$$\sigma_P(S_+) = \sigma_R(S_+^*) = \varnothing, \qquad \sigma_R(S_+) = \sigma_P(S_+^*) = \mathbb{D}, \qquad \sigma_C(S_+) = \sigma_C(S_+^*) = \mathbb{T}.$$
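For the scalar unilateral shift (multiplicity one, every $U_k$ the identity) the eigenvectors of $S_+^*$ found above are explicit: for $|\lambda| < 1$ the vector $x = (1, \lambda, \lambda^2, \dots)$ lies in $\ell_+^2$ and $S_+^*x = \lambda x$. A finite truncation (hypothetical parameter values) makes this checkable:

```python
import cmath

# Scalar backward shift: (S+* x)_k = x_{k+1}. For |lam| < 1 the geometric
# vector x_k = lam^k is (up to truncation) an eigenvector with eigenvalue lam.
lam = 0.4 + 0.3j                   # |lam| = 0.5, so lam lies in the open disk D
N = 200                            # truncation length; the tail |lam|^N is tiny
x = [lam ** k for k in range(N)]

# the truncation puts 0 in the last slot of S+* x
Sx = [x[k + 1] for k in range(N - 1)] + [0]

resid = max(abs(Sx[k] - lam * x[k]) for k in range(N))
```

The residual is dominated by the truncation error $|\lambda|^N$, which is astronomically small here; by contrast, no such vector exists for $S_+$ itself, matching $\sigma_P(S_+) = \varnothing$.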
Example 6.E. The spectrum of a bilateral shift is simpler than that of a unilateral shift, since bilateral shifts are unitary (i.e., besides being isometries they are normal too). Let $S$ be a bilateral shift of arbitrary multiplicity acting on a (complex) Hilbert space $H$, and let $\{H_k\}_{k=-\infty}^{\infty}$ be the underlying family of orthogonal subspaces of $H = \bigoplus_{k=-\infty}^{\infty}H_k$ (Problem 5.30) so that

$$Sx = \bigoplus_{k=-\infty}^{\infty}U_k x_{k-1} \quad\text{and}\quad S^*x = \bigoplus_{k=-\infty}^{\infty}U_{k+1}^*x_{k+1}$$

for every $x = \bigoplus_{k=-\infty}^{\infty}x_k$ in $H = \bigoplus_{k=-\infty}^{\infty}H_k$, where $\{U_k\}_{k=-\infty}^{\infty}$ is an arbitrary family of unitary transformations $U_{k+1}\colon H_k \to H_{k+1}$. Suppose there exists $\lambda \in \mathbb{T} \cap \rho(S)$ so that $R(\lambda I - S) = H$ and $|\lambda| = 1$. Take any $y_0 \ne 0$ in $H_0$ and set $y_k = 0 \in H_k$ for each $k \ne 0$. Now consider the vector $y = \bigoplus_{k=-\infty}^{\infty}y_k$ in $H = R(\lambda I - S)$ and let $x = \bigoplus_{k=-\infty}^{\infty}x_k \in H$ be any inverse image of $y$ under $\lambda I - S$; that is, $(\lambda I - S)x = y$. Since $y_0 \ne 0$ it follows that $y \ne 0$, and hence $x \ne 0$. On the other hand, since $y_k = 0$ for every $k \ne 0$, it also follows that $\lambda x_k = U_k x_{k-1} + y_k = U_k x_{k-1}$ for every $k \ne 0$. Hence $\|x_k\| = \|x_{k-1}\|$ for every $k \ne 0$. Thus $\|x_j\| = \|x_{-1}\|$ for every $j \le -1$ and $\|x_j\| = \|x_0\|$ for every $j \ge 0$, and so $x = 0$ (since $\|x\|^2 = \sum_{k=-\infty}^{\infty}\|x_k\|^2 = \sum_{j=-\infty}^{-1}\|x_j\|^2 + \sum_{j=0}^{\infty}\|x_j\|^2 < \infty$). Thus the existence of a complex number $\lambda$ in $\mathbb{T} \cap \rho(S)$ leads to a contradiction. Conclusion: $\mathbb{T} \cap \rho(S) = \varnothing$. That is, $\mathbb{T} \subseteq \sigma(S)$. Since $S$ is unitary, it follows that $\sigma(S) \subseteq \mathbb{T}$ (according to Corollary 6.18(c)). Outcome: $\sigma(S) = \mathbb{T}$.

Now take any pair $\{\lambda, x\}$ with $\lambda$ in $\sigma(S)$ and $x = \bigoplus_{k=-\infty}^{\infty}x_k$ in $H$. If $x$ is in $N(\lambda I - S)$, then $\bigoplus_{k=-\infty}^{\infty}\lambda x_k = \bigoplus_{k=-\infty}^{\infty}U_k x_{k-1}$ and so $\lambda x_k = U_k x_{k-1}$ for each $k$. Since $|\lambda| = 1$ (because $\sigma(S) = \mathbb{T}$), $\|x_k\| = \|x_{k-1}\|$ for each $k$. Hence $x = 0$ (since $\|x\|^2 = \sum_{k=-\infty}^{\infty}\|x_k\|^2$ is finite). Thus $N(\lambda I - S) = \{0\}$ for all $\lambda \in \sigma(S)$. That is, $\sigma_P(S) = \varnothing$. But $S$ is normal, so that $\sigma_R(S) = \varnothing$ (cf. Corollary 6.18(b)). Recalling that $\sigma(S^*) = \sigma(S)^*$ and $\sigma_C(S^*) = \sigma_C(S)^*$ (Proposition 6.17), we get

$$\sigma(S) = \sigma(S^*) = \sigma_C(S^*) = \sigma_C(S) = \mathbb{T}.$$

Consider a weighted sum of projections $D = \sum_k\alpha_k P_k$ on $\ell_+^2(H)$ or on $\ell^2(H)$, where $\{\alpha_k\}$ is a bounded family of scalars and $R(P_k) \cong H$ for all $k$. This is identified with an orthogonal direct sum of scalar operators $D = \bigoplus_k\alpha_k I$ (Problem 5.16), and is referred to as a diagonal operator on $\ell_+^2(H)$ or on $\ell^2(H)$, respectively. A weighted shift is the product of a shift and a diagonal operator. Such a definition implicitly assumes that the shift (unilateral or bilateral, of any multiplicity) acts on the direct sum of countably infinite copies of a single Hilbert space $H$. Explicitly, a unilateral weighted shift on $\ell_+^2(H)$ is the product of a unilateral shift on $\ell_+^2(H)$ and a diagonal operator on $\ell_+^2(H)$. Similarly, a bilateral weighted shift on $\ell^2(H)$ is the product of a bilateral shift on $\ell^2(H)$ and a diagonal operator on $\ell^2(H)$. Diagonal operators acting on $\ell_+^2(H)$ and on $\ell^2(H)$, $D_+ = \bigoplus_{k=0}^{\infty}\alpha_k I$ and $D = \bigoplus_{k=-\infty}^{\infty}\alpha_k I$, where $I$ is the identity on $H$, are denoted by $D_+ = \mathrm{diag}(\{\alpha_k\}_{k=0}^{\infty})$ and $D = \mathrm{diag}(\{\alpha_k\}_{k=-\infty}^{\infty})$, respectively. Likewise, weighted shifts acting on $\ell_+^2(H)$ and on $\ell^2(H)$, $T_+ = S_+D_+$ and $T = SD$, will be denoted by $T_+ = \mathrm{shift}(\{\alpha_k\}_{k=0}^{\infty})$ and $T = \mathrm{shift}(\{\alpha_k\}_{k=-\infty}^{\infty})$, respectively, whenever $S_+$ is the canonical unilateral shift on $\ell_+^2(H)$ and $S$ is the canonical bilateral shift on $\ell^2(H)$ (see Problems 5.29 and 5.30).
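The conclusion $\sigma(S) = \mathbb{T}$ of Example 6.E has a finite-dimensional echo: the circulant shift on $\mathbb{C}^N$ (a unitary matrix, used here as a hypothetical stand-in for the bilateral shift; it is not the bilateral shift itself) has every $N$-th root of unity as an eigenvalue, with eigenvector $v_k = \omega^k$:

```python
import cmath

# Circulant shift on C^N: (S v)_k = v_{k-1 mod N}. Each N-th root of unity
# omega yields S v = omega^{-1} v for v_k = omega^k, so all eigenvalues lie
# on the unit circle (N = 16 is a hypothetical choice).
N = 16

def shift(v):
    return [v[k - 1] for k in range(len(v))]   # v[-1] wraps around in Python

worst = 0.0
for m in range(N):
    omega = cmath.exp(2j * cmath.pi * m / N)
    v = [omega ** k for k in range(N)]
    lam = omega ** (N - 1)          # equals omega^{-1}, a point of the circle
    Sv = shift(v)
    worst = max(worst, max(abs(Sv[k] - lam * v[k]) for k in range(N)))
```

As $N$ grows these eigenvalues fill the circle densely, consistent with (though of course not proving) $\sigma(S) = \mathbb{T}$ in the infinite-dimensional case.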
Example 6.F. Let $\{\alpha_k\}_{k=0}^{\infty}$ be a bounded sequence in $\mathbb{C}$ such that

$$\alpha_k \ne 0\ \ \text{for every}\ k \ge 0 \quad\text{and}\quad \alpha_k \to 0\ \ \text{as}\ k \to \infty.$$

Consider the unilateral weighted shift $T_+ = \mathrm{shift}(\{\alpha_k\}_{k=0}^{\infty})$ on $\ell_+^2(H)$, where $H \ne \{0\}$ is a complex Hilbert space. The operators $T_+$ and $T_+^*$ are given by

$$T_+x = S_+D_+x = 0 \oplus \bigoplus_{k=1}^{\infty}\alpha_{k-1}x_{k-1} \quad\text{and}\quad T_+^*x = D_+^*S_+^*x = \bigoplus_{k=0}^{\infty}\overline{\alpha}_k x_{k+1}$$

for every $x = \bigoplus_{k=0}^{\infty}x_k$ in $\ell_+^2(H) = \bigoplus_{k=0}^{\infty}H$, with $0$ denoting the origin of $H$. Applying the same argument used in Example 6.D to show that $\sigma_P(S_+) = \varnothing$, we get $N(\lambda I - T_+) = \{0\}$ for all $\lambda \in \mathbb{C}$. Indeed, if $x = \bigoplus_{k=0}^{\infty}x_k$ lies in $N(\lambda I - T_+)$, then $\lambda x_0 \oplus \bigoplus_{k=1}^{\infty}\lambda x_k = 0 \oplus \bigoplus_{k=1}^{\infty}\alpha_{k-1}x_{k-1}$ so that $\lambda x_0 = 0$ and $\lambda x_{k+1} = \alpha_k x_k$ for every $k \ge 0$. Thus $x = 0$ if $\lambda = 0$ (since $\alpha_k \ne 0$) and, if $\lambda \ne 0$, then $x_0 = 0$ and $\|x_{k+1}\| \le |\lambda|^{-1}\sup_k|\alpha_k|\,\|x_k\|$ for every $k \ge 0$, which implies that $x = 0$. Thus $\sigma_P(T_+) = \varnothing$.

Note that the vector $x = \bigoplus_{k=0}^{\infty}x_k$, with $0 \ne x_0 \in H$ and $x_k = 0 \in H$ for every $k \ge 1$, is in $\ell_+^2(H)$ but not in $R(T_+)^- \subseteq \{0\} \oplus \bigoplus_{k=1}^{\infty}H$. So $R(T_+)^- \ne \ell_+^2(H)$, and hence $0 \in \sigma_P(T_+) \cup \sigma_R(T_+)$. Since $\sigma_P(T_+) = \varnothing$, it follows that $0 \in \sigma_R(T_+)$. Then

$$\{0\} \subseteq \sigma_R(T_+).$$

However, if $\lambda \ne 0$, then $R(\lambda I - T_+) = \ell_+^2(H)$. In fact, suppose $\lambda \ne 0$ and take any $y = \bigoplus_{k=0}^{\infty}y_k$ in $\ell_+^2(H)$. Set $x_0 = \lambda^{-1}y_0$ and, for each $k \ge 0$, $x_{k+1} = \lambda^{-1}(\alpha_k x_k + y_{k+1})$. Since $\alpha_k \to 0$, there exists a positive integer $k_\lambda$ such that $\alpha = |\lambda|^{-1}\sup_{k \ge k_\lambda}|\alpha_k| \le \frac{1}{2}$. Then $\|\alpha_{k+1}x_{k+1}\| \le \alpha(\|\alpha_k x_k\| + \|y_{k+1}\|)$, so that $\|\alpha_{k+1}x_{k+1}\|^2 \le \alpha^2(\|\alpha_k x_k\| + \|y_{k+1}\|)^2 \le 2\alpha^2(\|\alpha_k x_k\|^2 + \|y_{k+1}\|^2)$, for each $k \ge k_\lambda$. Thus $\sum_{k=k_\lambda}^{\infty}\|\alpha_{k+1}x_{k+1}\|^2 \le \frac{1}{2}\sum_{k=k_\lambda}^{\infty}\|\alpha_k x_k\|^2 + \frac{1}{2}\|y\|^2$, which implies that $\sum_{k=0}^{\infty}\|\alpha_k x_k\|^2 < \infty$, and hence $|\lambda|^2\sum_{k=0}^{\infty}\|x_{k+1}\|^2 \le \sum_{k=0}^{\infty}(\|\alpha_k x_k\| + \|y_{k+1}\|)^2 \le 2\big(\sum_{k=0}^{\infty}\|\alpha_k x_k\|^2 + \|y\|^2\big) < \infty$. Then $x = \bigoplus_{k=0}^{\infty}x_k$ lies in $\ell_+^2(H)$.

But $(\lambda I - T_+)x = \lambda x_0 \oplus \bigoplus_{k=1}^{\infty}(\lambda x_k - \alpha_{k-1}x_{k-1}) = y$, and so $y \in R(\lambda I - T_+)$. Outcome: $R(\lambda I - T_+) = \ell_+^2(H)$. Since $N(\lambda I - T_+) = \{0\}$ for all $\lambda \in \mathbb{C}$, it then follows that $\lambda \in \rho(T_+)$ for every nonzero $\lambda \in \mathbb{C}$, and so $\sigma(T_+) = \sigma_R(T_+) = \{0\}$. Moreover, as $\sigma_{R_1}(T_+)$ is always an open set,

$$\sigma(T_+) = \sigma_R(T_+) = \sigma_{R_2}(T_+) = \{0\},$$

and hence

$$\sigma(T_+^*) = \sigma_P(T_+^*) = \sigma_{P_2}(T_+^*) = \{0\}.$$

This is our first instance of a quasinilpotent operator ($r(T_+) = 0$) that is not nilpotent ($\sigma_P(T_+) = \varnothing$). The next example exhibits another one. It is worth noticing that $\sigma(\mu I - T_+) = \{\mu - \lambda \in \mathbb{C}: \lambda \in \sigma(T_+)\} = \{\mu\}$ by the Spectral Mapping Theorem, and so $\sigma(\mu I - T_+^*) = \{\overline{\mu}\}^* = \{\mu\}$ as well. Moreover, if $x$ is an eigenvector of $T_+^*$, then $T_+^*x = 0$ so that $(\mu I - T_+^*)x = \mu x$; that is, $\mu \in \sigma_P(\mu I - T_+^*)$. Thus

$$\sigma(\mu I - T_+) = \sigma_R(\mu I - T_+) = \{\mu\} \quad\text{and}\quad \sigma(\mu I - T_+^*) = \sigma_P(\mu I - T_+^*) = \{\mu\}.$$
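The quasinilpotence in Example 6.F can be watched numerically in the scalar case ($H = \mathbb{C}$): for a unilateral weighted shift, $\|T_+^n\|$ equals the supremum over $k$ of the products $|\alpha_k\alpha_{k+1}\cdots\alpha_{k+n-1}|$ of $n$ consecutive weights, and with the (hypothetical) weights $\alpha_k = \frac{1}{k+1}$ the Gelfand–Beurling roots $\|T_+^n\|^{1/n}$ tend to $0$:

```python
# Weights a_k = 1/(k+1): nonzero and tending to 0, as required in Example 6.F
# (a hypothetical concrete choice, scalar case H = C).
K = 400
a = [1.0 / (k + 1) for k in range(K)]

def norm_Tn(n):
    # ||T+^n|| = sup over k of the product a_k * a_{k+1} * ... * a_{k+n-1}
    # (largest product of n consecutive weights)
    best = 0.0
    for k in range(K - n):
        p = 1.0
        for j in range(k, k + n):
            p *= a[j]
        best = max(best, p)
    return best

# Gelfand-Beurling: r(T+) = lim_n ||T+^n||^{1/n}, which here tends to 0
roots = [norm_Tn(n) ** (1.0 / n) for n in (1, 5, 10, 25)]
```

Since the weights are decreasing, the largest product starts at $k = 0$, so $\|T_+^n\| = 1/n!$ and $\|T_+^n\|^{1/n} = (n!)^{-1/n} \to 0$: the spectral radius vanishes although no power of $T_+$ is the null operator.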
Example 6.G. Let $\{\alpha_k\}_{k=-\infty}^{\infty}$ be a bounded family in $\mathbb{C}$ such that

$$\alpha_k \ne 0\ \ \text{for every}\ k \in \mathbb{Z} \quad\text{and}\quad \alpha_k \to 0\ \ \text{as}\ |k| \to \infty,$$

and consider the bilateral weighted shift $T = \mathrm{shift}(\{\alpha_k\}_{k=-\infty}^{\infty})$ on $\ell^2(H)$, where $H \ne \{0\}$ is a complex Hilbert space. $T$ and $T^*$ are given by

$$Tx = SDx = \bigoplus_{k=-\infty}^{\infty}\alpha_{k-1}x_{k-1} \quad\text{and}\quad T^*x = D^*S^*x = \bigoplus_{k=-\infty}^{\infty}\overline{\alpha}_k x_{k+1}$$

for every $x = \bigoplus_{k=-\infty}^{\infty}x_k$ in $\ell^2(H) = \bigoplus_{k=-\infty}^{\infty}H$. Take an arbitrary $\lambda \in \mathbb{C}$. If $x = \bigoplus_{k=-\infty}^{\infty}x_k \in N(\lambda I - T)$, then $\bigoplus_{k=-\infty}^{\infty}(\lambda x_k - \alpha_{k-1}x_{k-1}) = 0$, and hence $\lambda x_{k+1} = \alpha_k x_k$ for every $k \in \mathbb{Z}$. If $\lambda = 0$, then $x = 0$. Otherwise, if $\lambda \ne 0$, note that $\|x_k\| = |\lambda|\,|\alpha_k|^{-1}\|x_{k+1}\| \ge 2\|x_{k+1}\|$ whenever $|\alpha_k| \le \frac{1}{2}|\lambda|$, which happens for every $k \le -k_\lambda$ for some positive integer $k_\lambda$ (since $\alpha_k \to 0$ as $|k| \to \infty$). But $\lim_{k\to-\infty}\|x_k\| = 0$ (since $\|x\|^2 = \sum_{k=-\infty}^{\infty}\|x_k\|^2 < \infty$), so that $x_k = 0$ for every $k \le -k_\lambda$, and the recursion $\lambda x_{k+1} = \alpha_k x_k$ then gives $x = 0$. Thus $N(\lambda I - T) = \{0\}$ for all $\lambda \in \mathbb{C}$. That is, $\sigma_P(T) = \varnothing$.

Take any vector $y = \bigoplus_{k=-\infty}^{\infty}y_k$ in $\ell^2(H)$ and any scalar $\lambda \ne 0$ in $\mathbb{C}$. Since $\alpha_k \to 0$ as $|k| \to \infty$, it follows that there exist a positive integer $k_\lambda$ and a finite set $K_\lambda = \{k \in \mathbb{Z}: -k_\lambda \le k \le k_\lambda\}$ such that $\alpha = \sup_{k \in \mathbb{Z}\backslash K_\lambda}\big|\frac{\alpha_k}{\lambda}\big| \le \frac{1}{2}$. Thus, for each $k \in \mathbb{Z}$, the series

$$x_k = \sum_{j=-\infty}^{k-1}\tfrac{\alpha_{k-1}}{\lambda}\cdots\tfrac{\alpha_j}{\lambda}\,\tfrac{y_j}{\lambda} + \tfrac{y_k}{\lambda}$$

is absolutely convergent (and so convergent) in $H$, because $\sup_j\|y_j\| < \infty$, only finitely many of the factors $\big|\frac{\alpha_j}{\lambda}\big|$ have index in $K_\lambda$, and the remaining factors are at most $\alpha \le \frac{1}{2}$. So set $x_k$ as above in $H$, so that $x_{k+1} = \frac{\alpha_k}{\lambda}x_k + \frac{y_{k+1}}{\lambda}$. If $k \in \mathbb{Z}\backslash K_\lambda$, then $\|\alpha_k x_k\| \le \alpha(\|\alpha_{k-1}x_{k-1}\| + \|y_k\|)$, so $\|\alpha_k x_k\|^2 \le 2\alpha^2(\|\alpha_{k-1}x_{k-1}\|^2 + \|y_k\|^2)$, and hence $\sum_{k\in\mathbb{Z}\backslash K_\lambda}\|\alpha_k x_k\|^2 \le \frac{1}{2}\sum_{k\in\mathbb{Z}\backslash K_\lambda}\|\alpha_{k-1}x_{k-1}\|^2 + \frac{1}{2}\|y\|^2$. Therefore, $\sum_{k=-\infty}^{\infty}\|\alpha_k x_k\|^2 < \infty$. Moreover, since $\lambda x_{k+1} = \alpha_k x_k + y_{k+1}$ for each $k \in \mathbb{Z}$, it then follows that

$$|\lambda|^2\sum_{k=-\infty}^{\infty}\|x_{k+1}\|^2 \le \sum_{k=-\infty}^{\infty}\big(\|\alpha_k x_k\| + \|y_{k+1}\|\big)^2 \le 2\Big(\sum_{k=-\infty}^{\infty}\|\alpha_k x_k\|^2 + \|y\|^2\Big) < \infty.$$

Then $x = \bigoplus_{k=-\infty}^{\infty}x_k$ lies in $\ell^2(H)$. But $(\lambda I - T)x = \bigoplus_{k=-\infty}^{\infty}(\lambda x_k - \alpha_{k-1}x_{k-1}) = y$, and so $y \in R(\lambda I - T)$. Outcome: $R(\lambda I - T) = \ell^2(H)$. Since $N(\lambda I - T) = \{0\}$ for all $\lambda \in \mathbb{C}$, every $0 \ne \lambda \in \mathbb{C}$ lies in $\rho(T)$. Conclusion: $\sigma(T) = \{0\}$. However, if $x \in N(T^*)$, then $\overline{\alpha}_k x_{k+1} = 0$ so that $x_{k+1} = 0$ (since $\alpha_k \ne 0$) for every $k \in \mathbb{Z}$, and hence $x = 0$. That is, $N(T^*) = \{0\}$ or, equivalently (Problem 5.35), $R(T)^- = \ell^2(H)$. This implies that $0 \notin \sigma_R(T)$. Since $\sigma_P(T) = \varnothing$, we get

$$\sigma(T) = \sigma_C(T) = \sigma_C(T^*) = \sigma(T^*) = \{0\}.$$

Note: As in the previous example, by the Spectral Mapping Theorem we get

$$\sigma(\mu I - T) = \sigma_C(\mu I - T) = \{\mu\} \quad\text{and}\quad \sigma(\mu I - T^*) = \sigma_C(\mu I - T^*) = \{\mu\}.$$
Example 6.H (Part 1). Let $F \in B[H]$ be an operator on a complex Hilbert space $H \ne \{0\}$. Consider the operator $T \in B[\ell_+^2(H)]$ defined by

$$Tx = 0 \oplus \bigoplus_{k=1}^{\infty}Fx_{k-1} \quad\text{so that}\quad T^*x = \bigoplus_{k=0}^{\infty}F^*x_{k+1}$$

for every $x = \bigoplus_{k=0}^{\infty}x_k$ in $\ell_+^2(H) = \bigoplus_{k=0}^{\infty}H$, where $0$ is the origin of $H$. These can be identified with infinite matrices of operators, namely,

$$T = \begin{pmatrix} O & & & \\ F & O & & \\ & F & O & \\ & & F & \ddots \end{pmatrix} \quad\text{and}\quad T^* = \begin{pmatrix} O & F^* & & \\ & O & F^* & \\ & & O & \ddots \\ & & & \ddots \end{pmatrix},$$

where the entries just below (above) the main block diagonal in the matrix of $T$ (of $T^*$) are copies of $F$ ($F^*$), and the remaining entries are all null operators. It is readily verified by induction that $T^n x = \bigoplus_{k=0}^{n-1}0 \oplus \bigoplus_{k=n}^{\infty}F^n x_{k-n}$, and hence $\|T^n x\|^2 = \sum_{k=0}^{\infty}\|F^n x_k\|^2$ so that $\|T^n x\| \le \|F^n\|\,\|x\|$ for all $x$ in $\ell_+^2(H)$, which implies that $\|T^n\| \le \|F^n\|$, for each $n \ge 1$. On the other hand, take any $y_0 \ne 0$ in $H$, set $y_k = 0 \in H$ for all $k \ge 1$, consider the vector $y = \bigoplus_{k=0}^{\infty}y_k$ in $\ell_+^2(H)$ so that $\|y\| = \|y_0\| \ne 0$, and observe that $\|F^n\| = \sup_{\|y_0\|=1}\|F^n y_0\| = \sup_{\|y\|=1}\|T^n y\| \le \sup_{\|x\|=1}\|T^n x\| = \|T^n\|$ for each $n \ge 1$. Thus

$$\|T^n\| = \|F^n\| \quad\text{for every}\quad n \ge 1,$$

and so (Gelfand–Beurling formula — Proposition 6.21), $r(T) = r(F)$. Moreover, since $y \ne 0$ and $T^*y = 0$, it follows that $0 \in \sigma_P(T^*)$. Thus $\{0\} \subseteq \sigma_P(T^*)$, and hence $\{0\} \subseteq \sigma(T)$. Now take an arbitrary $\lambda \in \rho(T)$ so that $\lambda \ne 0$ and $R(\lambda I - T) = \ell_+^2(H)$. Since $y = y_0 \oplus \bigoplus_{k=1}^{\infty}0$ lies in $\ell_+^2(H)$ for every $y_0 \in H$, it follows that $y \in R(\lambda I - T)$. That is, $y = (\lambda I - T)x$ for some $x = \bigoplus_{k=0}^{\infty}x_k$ in $\ell_+^2(H)$, and so $y_0 \oplus \bigoplus_{k=1}^{\infty}0 = \lambda x_0 \oplus \bigoplus_{k=1}^{\infty}(\lambda x_k - Fx_{k-1})$. Thus $x_0 = \lambda^{-1}y_0$ and $x_{k+1} = \lambda^{-1}Fx_k$ for every $k \ge 0$, and so $x_k = (\lambda^{-1}F)^k x_0 = \lambda^{-1}(\lambda^{-1}F)^k y_0$. Therefore, $\|x\|^2 = \sum_{k=0}^{\infty}\|x_k\|^2 = |\lambda|^{-2}\sum_{k=0}^{\infty}\|(\lambda^{-1}F)^k y_0\|^2 < \infty$ for every $y_0$ in $H$ (since $x$ lies in $\ell_+^2(H)$ for every $y_0 \in H$). Hence $r(\lambda^{-1}F) < 1$ by Proposition 6.22. Conclusion: if $\lambda \in \rho(T)$, then $r(F) < |\lambda|$. Equivalently, if $|\lambda| \le r(F)$, then $\lambda \in \sigma(T)$; that is, $\{\lambda \in \mathbb{C}: |\lambda| \le r(F)\} \subseteq \sigma(T)$. But the reverse inclusion, $\sigma(T) \subseteq \{\lambda \in \mathbb{C}: |\lambda| \le r(F)\}$, holds because $r(F) = r(T)$. Moreover, since $\sigma(T^*) = \sigma(T)^*$ for every operator $T$, and since $\mathbb{D}^-\!\cdot\sigma(F) = \{\lambda \in \mathbb{C}: |\lambda| \le r(F)\}$ (where the product of two numerical sets is the set consisting of all products with factors in each set, and where $\mathbb{D}^-$ denotes the closed unit disk about the origin), it follows that

$$\sigma(T^*) = \sigma(T) = \mathbb{D}^-\!\cdot\sigma(F) = \big\{\lambda \in \mathbb{C}:\ |\lambda| \le r(F)\big\}.$$

Now recall that $\lambda \in \sigma_P(T)$ if and only if $Tx = \lambda x$ (i.e., if and only if $\lambda x_0 = 0$ and $\lambda x_{k+1} = Fx_k$ for every $k \ge 0$) for some nonzero $x = \bigoplus_{k=0}^{\infty}x_k$ in $\ell_+^2(H)$. If $0 \in \sigma_P(T)$, then $Fx_k = 0$ for all $k \ge 0$ for some nonzero $x = \bigoplus_{k=0}^{\infty}x_k$ in $\ell_+^2(H)$ so that $0 \in \sigma_P(F)$. Conversely, if $0 \in \sigma_P(F)$, then there exists an $x_0 \ne 0$ in $H$ such that $Fx_0 = 0$. Thus set $x = \bigoplus_{k=0}^{\infty}(k+1)^{-1}x_0$, which is a nonzero vector in $\ell_+^2(H)$ such that $Tx = 0 \oplus \bigoplus_{k=1}^{\infty}k^{-1}Fx_0 = 0$, and so $0 \in \sigma_P(T)$. Outcome: $0 \in \sigma_P(T)$ if and only if $0 \in \sigma_P(F)$. Moreover, if $\lambda \ne 0$ lies in $\sigma_P(T)$, then $x_0 = 0$ and $x_{k+1} = \lambda^{-1}Fx_k$ for every $k \ge 0$ so that $x = 0$, which is a contradiction. Thus if $\lambda \ne 0$, then $\lambda \notin \sigma_P(T)$. Summing up:

$$\sigma_P(T) = \begin{cases}\{0\}, & 0 \in \sigma_P(F),\\[2pt] \varnothing, & 0 \notin \sigma_P(F).\end{cases}$$

Since $\sigma_R(T^*) = \sigma_P(T)^*\backslash\sigma_P(T^*)$, and since $\sigma_P(T)^* \subseteq \{0\} \subseteq \sigma_P(T^*)$, we get

$$\sigma_R(T^*) = \varnothing,$$

and hence

$$\sigma_C(T^*) = \big\{\lambda \in \mathbb{C}:\ |\lambda| \le r(F)\big\}\backslash\sigma_P(T^*).$$

If $\sigma_P(T^*) \ne \{0\}$, then there exists $0 \ne \lambda \in \sigma_P(T^*)$, which means that $T^*x = \lambda x$ for some nonzero $x = \bigoplus_{k=0}^{\infty}x_k$ in $\ell_+^2(H)$. Thus $F^*x_{k+1} = \lambda x_k$ for every $k \ge 0$ and there exists $0 \ne x_j \in H$, and so a trivial induction shows that $F^{*k}x_{j+k} = \lambda^k x_j$ for every $k \ge 0$. Hence $x_j \in \bigcap_{k=0}^{\infty}R(F^{*k})$ because $\lambda \ne 0$, and therefore $\bigcap_{k=0}^{\infty}R(F^{*k}) \ne \{0\}$. Conclusion:

$$\bigcap_{k=0}^{\infty}R(F^{*k}) = \{0\} \quad\text{implies}\quad \sigma_P(T^*) = \{0\},$$

and, in this case,

$$\sigma_C(T^*) = \big\{\lambda \in \mathbb{C}:\ |\lambda| \le r(F)\big\}\backslash\{0\}.$$
In particular, if $F = S_+^*$ on $H = \ell_+^2(K)$ for any nonzero complex Hilbert space $K$, then $r(F) = r(S_+^*) = 1$ (according to Example 6.D) and $R(F^{*k}) = R(S_+^k) = \bigoplus_{j=0}^{k-1}\{0\} \oplus \bigoplus_{j=k}^{\infty}K \subseteq \ell_+^2(K)$ so that $\bigcap_{k=0}^{\infty}R(F^{*k}) = \{0\}$. Thus,

$$\sigma_P(T^*) = \sigma_P(T) = \{0\},\qquad \sigma_R(T^*) = \sigma_R(T) = \varnothing,\qquad \sigma_C(T^*) = \sigma_C(T) = \mathbb{D}^-\backslash\{0\}.$$

Summing up: A backward unilateral shift of unilateral shifts (i.e., $T^*$ with $F^* = S_+$, which is usually denoted by $T^* = S_+^* \otimes S_+$) and a unilateral shift of backward unilateral shifts (i.e., $T$ with $F = S_+^*$, which is usually denoted by $T = S_+ \otimes S_+^*$) have a continuous spectrum equal to the punctured disk $\mathbb{D}^-\backslash\{0\}$. This was our first example of operators for which the continuous spectrum has nonempty interior.

Example 6.H (Part 2). This is a bilateral version of Part 1. Take $F \in B[H]$ on a complex Hilbert space $H \ne \{0\}$, and consider $T \in B[\ell^2(H)]$ defined by

$$Tx = \bigoplus_{k=-\infty}^{\infty}Fx_{k-1} \quad\text{so that}\quad T^*x = \bigoplus_{k=-\infty}^{\infty}F^*x_{k+1}$$

for every $x = \bigoplus_{k=-\infty}^{\infty}x_k$ in $\ell^2(H) = \bigoplus_{k=-\infty}^{\infty}H$. These can be identified with (doubly) infinite matrices of operators (the inner parentheses indicate the zero-zero entry), namely,

$$T = \begin{pmatrix} \ddots & & & & \\ F & O & & & \\ & F & (O) & & \\ & & F & O & \\ & & & & \ddots \end{pmatrix} \quad\text{and}\quad T^* = \begin{pmatrix} \ddots & & & & \\ & O & F^* & & \\ & & (O) & F^* & \\ & & & O & F^* \\ & & & & \ddots \end{pmatrix},$$

where the entries just below (above) the main block diagonal in the matrix of $T$ (of $T^*$) are copies of $F$ ($F^*$), and the remaining entries are all null operators. Using the same argument of Part 1, it is easy to show that $r(T) = r(F)$. If $N(F) \ne \{0\}$, then there exists $0 \ne x_0 \in H$ for which $Fx_0 = 0$. In this case, set $x = \bigoplus_{k=-\infty}^{\infty}x_k \ne 0$ in $\ell^2(H)$ with $x_k = 0$ for every $k \in \mathbb{Z}\backslash\{0\}$ so that $Tx = 0$. Thus $N(T) \ne \{0\}$. Conversely, if $N(T) \ne \{0\}$, then there exists an $x = \bigoplus_{k=-\infty}^{\infty}x_k \ne 0$ in $\ell^2(H)$ (so that $x_j \ne 0$ for some $j \in \mathbb{Z}$) for which $Tx = \bigoplus_{k=-\infty}^{\infty}Fx_{k-1} = 0$, and hence $Fx_k = 0$ for all $k \in \mathbb{Z}$; in particular, $Fx_j = 0$ so that $N(F) \ne \{0\}$. Therefore,

$$0 \in \sigma_P(T) \quad\text{if and only if}\quad 0 \in \sigma_P(F).$$

Similarly (same argument), $0 \in \sigma_P(F^*)$ if and only if $0 \in \sigma_P(T^*)$, and so, recalling that $\sigma_R(F) = \sigma_P(F^*)^*\backslash\sigma_P(F)$ and $\sigma_R(T) = \sigma_P(T^*)^*\backslash\sigma_P(T)$,

$$0 \in \sigma_R(T) \quad\text{if and only if}\quad 0 \in \sigma_R(F).$$
We have seen that $N(F) \ne \{0\}$ if and only if $N(T) \ne \{0\}$. Dually (and similarly), $N(F^*) = \{0\}$ if and only if $N(T^*) = \{0\}$, which means that $R(F)^- = H$ if and only if $R(T)^- = \ell^2(H)$ (Problem 5.35). Moreover, it is plain that $R(F) = H$ if and only if $R(T) = \ell^2(H)$. Thus (cf. diagram of Section 6.2),

$$0 \in \sigma_C(T) \quad\text{if and only if}\quad 0 \in \sigma_C(F).$$

Next we prove the following assertion.

If $0 \ne \lambda \in \sigma_P(T)$ and $N(F) = \{0\}$, then there exists $0 \ne x_0 \in R(F) \subseteq H$ such that $\sum_{k=0}^{\infty}\|(\lambda^{-1}F)^{\pm k}x_0\|^2 < \infty$.

Indeed, $\lambda \in \sigma_P(T)$ if and only if $Tx = \lambda x$ for some nonzero $x = \bigoplus_{k=-\infty}^{\infty}x_k$ in $\ell^2(H)$. Suppose $\lambda \ne 0$ and $N(F) = \{0\}$, so that $\lambda \in \sigma_P(T)$ if and only if $x_{k+1} = \lambda^{-1}Fx_k$, with $0 \ne x_k \in R(F)$, for every $k \in \mathbb{Z}$. Thus $x_{\pm k} = (\lambda^{-1}F)^{\pm k}x_0$ for every $k \ge 0$ (for $k \ge 1$, the vector $x_{-k}$ is the unique inverse image of $x_0$ under $(\lambda^{-1}F)^k$), and so $\|x\|^2 = \sum_{k=-\infty}^{\infty}\|x_k\|^2 = \sum_{k=0}^{\infty}\|(\lambda^{-1}F)^{\pm k}x_0\|^2 < \infty$.

Now set $H = \ell_+^2$ and let $F$ be a diagonal operator on $\ell_+^2$,

$$F = \mathrm{diag}(\{\lambda_j\}) \in B[\ell_+^2],$$

where the countably infinite set $\{\lambda_j\}$ consists of an enumeration of all rational numbers in $(0,1)$. Observe that $\sigma(F) = [0,1]$ (and so $r(F) = 1$), with $\sigma_P(F) = \{\lambda_j\}$ (in particular, $N(F) = \{0\}$), $\sigma_R(F) = \varnothing$, and $\sigma_C(F) = [0,1]\backslash\{\lambda_j\}$ (see Example 6.C). With this $F$ we proceed to show that $\sigma_P(T) = \varnothing$. Suppose there exists a nonzero $\lambda$ in $\sigma_P(T)$. If $|\lambda| \ne |\lambda_j|$ for every $j$, then $0 \ne |\lambda^{-1}\lambda_j| \ne 1$ (because $0 < |\lambda_j| < 1$) for every $j$. Since $x_0 \ne 0$, it follows by Problem 5.18(e) that $\lim_k\|(\lambda^{-1}F)^k x_0\|^2 = \infty$ or $\lim_k\|(\lambda^{-1}F)^{-k}x_0\|^2 = \infty$, because $\lambda^{-1}F = \mathrm{diag}(\{\lambda^{-1}\lambda_j\})$ is a diagonal operator with an inverse on its range. Thus $\sum_{k=0}^{\infty}\|(\lambda^{-1}F)^{\pm k}x_0\|^2 = \infty$, which is a contradiction. On the other hand, if $|\lambda| = |\lambda_j|$ for some $j$ then, with $x_0 = \{\xi_j\}$, we get $\|(\lambda^{-1}F)^k x_0\|^2 = \sum_{j=1}^{\infty}|\lambda^{-1}\lambda_j|^{2k}|\xi_j|^2 \to 0$ as $k \to \infty$ only if $\xi_j = 0$ for every index $j$ such that $|\lambda_j| \ge |\lambda|$ (i.e., such that $|\lambda^{-1}\lambda_j| \ge 1$). Also, $\|(\lambda^{-1}F)^{-k}x_0\|^2 = \sum_{j=1}^{\infty}|\lambda^{-1}\lambda_j|^{-2k}|\xi_j|^2 = \sum_{j=1}^{\infty}|\lambda\lambda_j^{-1}|^{2k}|\xi_j|^2 \to 0$ as $k \to \infty$ only if $\xi_j = 0$ for every $j$ such that $|\lambda_j| \le |\lambda|$ (i.e., such that $|\lambda\lambda_j^{-1}| \ge 1$). Thus $x_0 = 0$, which is again a contradiction. Hence, $\sigma_P(T)\backslash\{0\} = \varnothing$. However, $0 \in \sigma_C(T)$ because $0 \in \sigma_C(F)$, which concludes the proof of $\sigma_P(T) = \varnothing$. Moreover, as $F$ is a normal operator, $T$ also is a normal operator, and so $\sigma_R(T) = \varnothing$. Therefore, $\sigma_C(T) = \sigma(T)$. Actually, $\sigma(T)$ is the annulus about the origin that includes $\sigma(F)$ but no circle entirely included in $\rho(F)$. In other words, with $\mathbb{T} = \partial\mathbb{D}$ denoting the unit circle about the origin, it can be shown that

$$\sigma(T) = \mathbb{T}\cdot\sigma(F).$$
This follows from an important result of Brown and Pearcy (1966), which says that the spectrum of a tensor product is the product of the spectra. Since $\sigma(F) = [0,1]$, it follows that $\mathbb{T}\cdot\sigma(F) = \mathbb{D}^-$, and hence

$$\sigma(T) = \sigma_C(T) = \mathbb{D}^-.$$

Summing up: A bilateral shift of operators $F$ (which is usually denoted by $T = S \otimes F$) has only a continuous spectrum, which is equal to the (closed) disk $\mathbb{D}^-$, if $F$ is a diagonal operator whose diagonal consists of an enumeration of all rational numbers in $(0,1)$.
6.6 The Spectrum of a Compact Operator

The spectral theory of compact operators is an essential feature for the Spectral Theorem for compact normal operators of the next section. Normal operators were defined on a Hilbert space, and therefore we assume throughout this section that the compact operators act on a complex Hilbert space $H \ne \{0\}$, although the spectral theory of compact operators can also be developed on a complex Banach space. Recall that $B_\infty[X, Y]$ stands for the collection of all compact linear transformations of a normed space $X$ into a normed space $Y$, and so $B_\infty[H]$ denotes the class of all compact operators on $H$ (Section 4.9).

Proposition 6.29. If $T \in B_\infty[H]$ and $\lambda$ is a nonzero complex number, then $R(\lambda I - T)$ is a subspace of $H$.

Proof. Take an arbitrary compact transformation $K \in B_\infty[M, X]$ of a subspace $M$ of a complex Banach space $X \ne \{0\}$ into $X$. Let $I$ be the identity on $M$, let $\lambda$ be any nonzero complex number, and consider the transformation $(\lambda I - K) \in B[M, X]$.

Claim. If $N(\lambda I - K) = \{0\}$, then $R(\lambda I - K)$ is closed in $X$.

Proof. If $N(\lambda I - K) = \{0\}$ and $R(\lambda I - K)$ is not closed in $X \ne \{0\}$, then $\lambda I - K$ is not bounded below (see Corollary 4.24). This means that for every $\varepsilon > 0$ there exists $0 \ne x_\varepsilon \in M$ such that $\|(\lambda I - K)x_\varepsilon\| < \varepsilon\|x_\varepsilon\|$. Therefore, $\inf_{\|x\|=1}\|(\lambda I - K)x\| = 0$ and so there exists a sequence $\{x_n\}$ of unit vectors in $M$ for which $\|(\lambda I - K)x_n\| \to 0$. Since $K$ is compact and $\{x_n\}$ is bounded, it follows by Theorem 4.52 that $\{Kx_n\}$ has a convergent subsequence, say $\{Kx_k\}$, so that $Kx_k \to y \in X$. However,

$$\|\lambda x_k - y\| = \|\lambda x_k - Kx_k + Kx_k - y\| \le \|(\lambda I - K)x_k\| + \|Kx_k - y\| \to 0.$$

Then $\{\lambda x_k\}$ also converges in $X$ to $y$, and hence $y \in M$ (since $M$ is closed in $X$ — Theorem 3.30). Moreover, $y \ne 0$ (since $0 \ne |\lambda| = \|\lambda x_k\| \to \|y\|$) and, as $K$ is continuous, $Ky = K\lim_k\lambda x_k = \lambda\lim_k Kx_k = \lambda y$ so that $y \in N(\lambda I - K)$. Therefore $N(\lambda I - K) \ne \{0\}$, which is a contradiction.
Now take any T ∈ B[H]. Recall that (λI − T )|N (λI−T )⊥ in B[N (λI − T )⊥, H] is injective (i.e., N ((λI − T )|N (λI−T )⊥) = {0} — see the remark that follows Proposition 5.12) and coincides with λI − T |N (λI−T )⊥ on N (λI − T )⊥. If T is compact, then so is T |N (λI−T )⊥ ∈ B[N (λI − T )⊥, H] (reason: N (λI − T )⊥ is a subspace of H, and the restriction of a compact linear transformation to a linear manifold is a compact linear transformation — see Section 4.9). Since H = 0, we get by the above claim that (λI − T )|N (λI−T )⊥ = λI − T |N (λI−T )⊥ has a closed range for all λ = 0. But R((λI − T )|N (λI−T )⊥) = R(λI − T ), as is readily verified. Proposition 6.30. If T ∈ B∞[H] and λ is a nonzero complex number, then R(λI − T ) = H whenever N (λI − T ) = {0}. Proof. Take any λ = 0 in C and any T ∈ B∞[H]. Suppose N (λI − T ) = {0} and R(λI − T ) = H (recall: H = {0}), and consider the sequence {Mn }∞ n=0 of linear manifolds of H recursively defined by Mn+1 = (λI − T )(Mn )
for every
n≥0
with
M0 = H.
It can be verified by induction that Mn+1 ⊆ Mn
for every
n ≥ 0.
Indeed, M_1 = R(λI − T) ⊆ H = M_0 and, if the above inclusion holds for some n ≥ 0, then M_{n+2} = (λI − T)(M_{n+1}) ⊆ (λI − T)(M_n) = M_{n+1}, which concludes the induction. The previous proposition ensures that R(λI − T) is a subspace of H, and so (λI − T) ∈ G[H, R(λI − T)] by Corollary 4.24. Hence (another induction plus Theorem 3.24), {M_n}_{n=0}^∞ is a decreasing sequence of subspaces of H. Moreover, if M_{n+1} = M_n for some n, then there exists an integer k ≥ 1 such that M_{k+1} = M_k ≠ M_{k−1} (for M_0 = H ≠ R(λI − T) = M_1). But this leads to a contradiction: if M_{k+1} = M_k, then (λI − T)(M_k) = M_k so that M_k = (λI − T)^{-1}(M_k) = M_{k−1}. Outcome: M_{n+1} is properly included in M_n for each n; that is,

M_{n+1} ⊂ M_n   for every n ≥ 0.
Hence M_{n+1} is a proper subspace of M_n (Problem 3.38). By Lemma 4.33, for each n ≥ 0 there is an x_n ∈ M_n with ‖x_n‖ = 1 such that ½ < d(x_n, M_{n+1}). Recall that λ ≠ 0, take any pair of integers 0 ≤ m < n, and set

x = x_n + λ^{-1}((λI − T)x_m − (λI − T)x_n)

so that T x_n − T x_m = λ(x − x_m). Since x lies in M_{m+1},

‖T x_n − T x_m‖ = |λ| ‖x − x_m‖ > ½|λ|.
Thus the sequence {T x_n} has no convergent subsequence (no subsequence of {T x_n} is a Cauchy sequence). Since {x_n} is bounded, this ensures that T is not compact (Theorem 4.52), which is a contradiction. Conclusion: If T ∈ B∞[H] and N(λI − T) = {0} for λ ≠ 0, then R(λI − T) = H.

Corollary 6.31. If T ∈ B∞[H], then 0 ≠ λ ∈ ρ(T) ∪ σP4(T), so that σ(T)\{0} = σP(T)\{0} = σP4(T)\{0}.

Proof. Take 0 ≠ λ ∈ C. Since H ≠ {0}, Propositions 6.29 and 6.30 ensure that λ ∈ ρ(T) ∪ σP1(T) ∪ σP4(T) ∪ σR1(T) and also that λ ∈ ρ(T) ∪ σP(T) (see the diagram of Section 6.2). Then λ ∈ ρ(T) ∪ σP1(T) ∪ σP4(T), which implies that λ̄ ∈ ρ(T)* ∪ σP1(T)* ∪ σP4(T)* = ρ(T*) ∪ σR1(T*) ∪ σP4(T*) (by Proposition 6.17). But T* ∈ B∞[H] whenever T ∈ B∞[H] (according to Problem 5.42), so that λ̄ ∈ ρ(T*) ∪ σP1(T*) ∪ σP4(T*), and hence λ̄ ∈ ρ(T*) ∪ σP4(T*). That is, λ ∈ ρ(T) ∪ σP4(T) whenever λ ≠ 0.

Example 6.I. If T ∈ B0[H] (i.e., T is a finite-rank operator on H), then σ(T) = σP(T) = σP4(T) is finite. Indeed, if dim H < ∞, then σ(T) = σP(T) = σP4(T) (Example 6.B). Suppose dim H = ∞. Since B0[H] ⊆ B∞[H], it follows that 0 ≠ λ ∈ ρ(T) ∪ σP4(T) by Corollary 6.31. Moreover, since dim R(T) < ∞ and dim H = ∞, it also follows that R(T)⁻ = R(T) ≠ H and N(T) ≠ {0} (because dim N(T) + dim R(T) = dim H according to Problem 2.17). Then 0 ∈ σP4(T) (cf. diagram of Section 6.2). Hence σ(T) = σP(T) = σP4(T). If σP(T) were infinite, then there would exist an infinite set of linearly independent eigenvectors of T (Proposition 6.14). Since every eigenvector of T lies in R(T), this would imply dim R(T) = ∞ (Theorem 2.5), which is a contradiction. Conclusion: σP(T) must be finite. In particular, this shows again that the spectrum in Example 6.B is finite.

Example 6.J. Let us glance at the spectra of some compact operators.

(a) The operator A = (0 0 ; 0 1) on C² is obviously compact. Its spectrum is given by (cf. Examples 6.B and 6.I) σ(A) = σP(A) = σP4(A) = {0, 1}.

(b) The diagonal operator D = diag{λ_k}_{k=0}^∞ ∈ B[ℓ₊²] with λ_k → 0 is compact (Example 4.N). By Example 6.C, σP4(D) = {λ_k}_{k=0}^∞\{0} and

σ(D) = σP4(D) ∪ σC(D), if λ_k ≠ 0 for all k ≥ 0 (with σC(D) = {0}),
σ(D) = σP4(D) ∪ σP3(D), if λ_k = 0 for some k ≥ 0 (with σP3(D) = {0}).

(c) The unilateral weighted shift T₊ = shift({α_k}_{k=0}^∞) acting on ℓ₊² of Example 6.F is compact (T₊ = S₊D₊ and D₊ is compact) and (Example 6.F)
σ(T₊) = σR(T₊) = σR2(T₊) = {0}. Moreover, T₊* also is compact (Problem 5.42) and (Example 6.F) σ(T₊*) = σP(T₊*) = σP2(T₊*) = {0}.

(d) Finally, the bilateral weighted shift T = shift({α_k}_{k=−∞}^∞) acting on ℓ² of Example 6.G is compact (the same argument as above) and (Example 6.G) σ(T) = σC(T) = {0}.

Corollary 6.32. If an operator T on H is compact and normaloid, then σP(T) ≠ ∅ and there exists λ ∈ σP(T) such that |λ| = ‖T‖.

Proof. Recall that H ≠ {0}. If T is normaloid (i.e., r(T) = ‖T‖), then σ(T) = {0} only if T = O. If T = O and H ≠ {0}, then 0 ∈ σP(T) and ‖T‖ = 0. If T ≠ O, then σ(T) ≠ {0} and ‖T‖ = r(T) = max_{λ∈σ(T)} |λ|, so that there exists λ in σ(T) such that |λ| = ‖T‖. Moreover, if T is compact and σ(T) ≠ {0}, then ∅ ≠ σ(T)\{0} ⊆ σP(T) by Corollary 6.31, and hence r(T) = max_{λ∈σ(T)} |λ| = max_{λ∈σP(T)} |λ| = ‖T‖. Thus there exists λ ∈ σP(T) such that |λ| = ‖T‖.

Proposition 6.33. If T ∈ B∞[H] and {λ_n} is an infinite sequence of distinct elements in σ(T), then λ_n → 0.

Proof. Take any T ∈ B[H] and let {λ_n} be an infinite sequence of distinct elements in σ(T). If λ_n = 0 for some n, then the subsequence {λ_k} of {λ_n} consisting of all points of {λ_n} except λ_n is a sequence of distinct nonzero elements in σ(T). Since λ_k → 0 implies λ_n → 0, there is no loss of generality in assuming that {λ_n} is a sequence of distinct nonzero elements in σ(T) indexed by ℕ. Moreover, if T is compact and 0 ≠ λ_n ∈ σ(T), then Corollary 6.31 says that λ_n ∈ σP(T) for every n ≥ 1. Let {x_n}_{n=1}^∞ be a sequence of eigenvectors associated with {λ_n}_{n=1}^∞ (i.e., T x_n = λ_n x_n with x_n ≠ 0 for every n ≥ 1), which is a sequence of linearly independent vectors by Proposition 6.14. Set

M_n = span{x_i}_{i=1}^n   for each n ≥ 1,

so that each M_n is a subspace of H with dim M_n = n, and

M_n ⊂ M_{n+1}   for every n ≥ 1.
Actually, each M_n is properly included in M_{n+1} since {x_i}_{i=1}^{n+1} is linearly independent, and so x_{n+1} ∈ M_{n+1}\M_n. From now on the proof is similar to that of Proposition 6.30. Since each M_n is a proper subspace of M_{n+1}, for every n ≥ 1 there exists y_{n+1} ∈ M_{n+1} with ‖y_{n+1}‖ = 1 such that ½ < d(y_{n+1}, M_n) by Lemma 4.33. Write y_{n+1} = Σ_{i=1}^{n+1} α_i x_i in M_{n+1} so that

(λ_{n+1}I − T)y_{n+1} = Σ_{i=1}^{n+1} α_i(λ_{n+1} − λ_i)x_i = Σ_{i=1}^{n} α_i(λ_{n+1} − λ_i)x_i ∈ M_n.
Recall that λ_n ≠ 0 for all n, take any pair of integers 1 ≤ m < n, and set

y = y_m − λ_m^{-1}(λ_m I − T)y_m + λ_n^{-1}(λ_n I − T)y_n

so that T(λ_m^{-1}y_m) − T(λ_n^{-1}y_n) = y − y_n. Since y lies in M_{n−1},

‖T(λ_m^{-1}y_m) − T(λ_n^{-1}y_n)‖ = ‖y − y_n‖ > ½,

which implies that the sequence {T(λ_n^{-1}y_n)} has no convergent subsequence. If T is compact, then Theorem 4.52 ensures that {λ_n^{-1}y_n} has no bounded subsequence. That is, sup_k |λ_k|^{-1} = sup_k ‖λ_k^{-1}y_k‖ = ∞, and so inf_k |λ_k| = 0, for every subsequence {λ_k} of {λ_n}. Thus λ_n → 0.

Corollary 6.34. Take any compact operator T ∈ B∞[H].
(a) 0 is the only possible accumulation point of σ(T).
(b) If λ ∈ σ(T)\{0}, then λ is an isolated point of σ(T).
(c) σ(T)\{0} is a discrete subset of C.
(d) σ(T) is countable.

Proof. If λ ≠ 0, then the previous proposition says that there is no sequence of distinct points in σ(T) that converges to λ. Thus λ ≠ 0 is not an accumulation point of σ(T) by Proposition 3.28. Therefore, if λ ∈ σ(T)\{0}, then it is not an accumulation point of σ(T), which means (by definition) that it is an isolated point of σ(T). Hence σ(T)\{0} consists entirely of isolated points, which means (by definition again) that it is a discrete subset of C. But C is separable, and every discrete subset of a separable metric space is countable (this is a consequence of Theorem 3.35 and Corollary 3.36; see the observations that follow Proposition 3.37). Then σ(T)\{0} is countable and so is σ(T).

The point λ = 0 may be anywhere (i.e., zero may be in any part of the spectrum or in the resolvent set of a compact operator). Precisely, if T ∈ B∞[H], then λ = 0 may lie in σP(T), σR(T), σC(T), or ρ(T) (see Example 6.J). However, if 0 ∈ ρ(T), then H must be finite dimensional. Indeed, if 0 ∈ ρ(T), then T^{-1} ∈ B[H], and so I = T^{-1}T is compact by Proposition 4.54, which implies that H is finite dimensional (Corollary 4.34). We show next that the eigenspaces associated with nonzero eigenvalues of a compact operator are also finite dimensional.

Proposition 6.35. If T ∈ B∞[H] and λ is a nonzero complex number, then dim N(λI − T) = dim N(λ̄I − T*) < ∞.

Proof. Take any λ ≠ 0 in C and any T ∈ B∞[H].
If dim N(λI − T) = 0, then N(λI − T) = {0}, so that λ ∈ ρ(T) (Corollary 6.31) and hence λ̄ ∈ ρ(T*) by Proposition 6.17. Thus N(λ̄I − T*) = {0}; equivalently, dim N(λ̄I − T*) = 0. Dually, since T ∈ B∞[H] if and only if T* ∈ B∞[H] (Problem 5.42), it follows that dim N(λ̄I − T*) = 0 implies dim N(λI − T) = 0. That is,

dim N(λI − T) = 0   if and only if   dim N(λ̄I − T*) = 0.
Now suppose dim N(λI − T) ≠ 0, and so dim N(λ̄I − T*) ≠ 0. Observe that N(λI − T) ≠ {0} is an invariant subspace for T (if T x = λx, then T(T x) = λ(T x)), and also that T|_{N(λI−T)} = λI of N(λI − T) into itself. If T is compact, then T|_{N(λI−T)} is compact (Section 4.9), and so is λI ≠ O on N(λI − T) ≠ {0}. But λI ≠ O is not compact on an infinite-dimensional normed space (by Corollary 4.34), so that dim N(λI − T) < ∞. Dually, as T* is compact, dim N(λ̄I − T*) < ∞. Therefore, there exist positive integers m and n such that

dim N(λI − T) = m   and   dim N(λ̄I − T*) = n.
Let {e_i}_{i=1}^m and {f_i}_{i=1}^n be orthonormal bases for the Hilbert spaces N(λI − T) and N(λ̄I − T*), respectively. Set k = min{m, n} ≥ 1 and consider the mappings S: H → H and S*: H → H defined by

S x = Σ_{i=1}^{k} ⟨x ; e_i⟩f_i   and   S*x = Σ_{i=1}^{k} ⟨x ; f_i⟩e_i

for every x ∈ H. It is clear that S and S* lie in B[H], and also that S* is the adjoint of S; that is, ⟨Sx ; y⟩ = ⟨x ; S*y⟩ for every x, y ∈ H. Actually,

R(S) ⊆ span{f_i}_{i=1}^k ⊆ N(λ̄I − T*)   and   R(S*) ⊆ span{e_i}_{i=1}^k ⊆ N(λI − T),
so that S, S* ∈ B0[H], and hence T + S and T* + S* lie in B∞[H] by Theorem 4.53 (since B0[H] ⊆ B∞[H]). First suppose that m ≤ n (and so k = m). If x is a vector in N(λI − (T + S)), then (λI − T)x = Sx. But R(S) ⊆ N(λ̄I − T*) = R(λI − T)⊥ (Proposition 5.76), and hence (λI − T)x = Sx = 0. Therefore, x ∈ N(λI − T) = span{e_i}_{i=1}^m so that x = Σ_{i=1}^{m} α_i e_i (for some family of scalars {α_i}_{i=1}^m). Thus 0 = Sx = Σ_{j=1}^{m} α_j Se_j = Σ_{j=1}^{m} α_j Σ_{i=1}^{m} ⟨e_j ; e_i⟩f_i = Σ_{i=1}^{m} α_i f_i, which implies that α_i = 0 for every i = 1, ..., m (reason: {f_i}_{i=1}^m is an orthonormal set, thus linearly independent — Proposition 5.34). That is, x = 0. Outcome: N(λI − (T + S)) = {0}. Hence λ ∈ ρ(T + S) according to Corollary 6.31 (since T + S ∈ B∞[H] and λ ≠ 0). Conclusion:

m ≤ n   implies   R(λI − (T + S)) = H.

Dually, using exactly the same argument,

n ≤ m   implies   R(λ̄I − (T* + S*)) = H.
If m < n, then k = m < m + 1 ≤ n and f_{m+1} ∈ R(λI − (T + S)) = H, so that there exists v ∈ H for which (λI − (T + S))v = f_{m+1}. Hence

1 = ⟨f_{m+1} ; f_{m+1}⟩ = ⟨(λI − (T + S))v ; f_{m+1}⟩ = ⟨(λI − T)v ; f_{m+1}⟩ − ⟨Sv ; f_{m+1}⟩ = 0,
which is a contradiction. Indeed, ⟨(λI − T)v ; f_{m+1}⟩ = ⟨Sv ; f_{m+1}⟩ = 0, for f_{m+1} ∈ N(λ̄I − T*) = R(λI − T)⊥ and Sv ∈ R(S) ⊆ span{f_i}_{i=1}^m. If n < m, then k = n < n + 1 ≤ m, and e_{n+1} ∈ R(λ̄I − (T* + S*)) = H so that there exists u ∈ H for which (λ̄I − (T* + S*))u = e_{n+1}. Hence 1 = ⟨e_{n+1} ; e_{n+1}⟩ = ⟨(λ̄I − (T* + S*))u ; e_{n+1}⟩ = ⟨(λ̄I − T*)u ; e_{n+1}⟩ − ⟨S*u ; e_{n+1}⟩ = 0, which is a contradiction too (since e_{n+1} ∈ N(λI − T) = R(λ̄I − T*)⊥ and S*u ∈ R(S*) ⊆ span{e_i}_{i=1}^n). Therefore, m = n.

Together, the statements of Propositions 6.29 and 6.35 (or simply, the first identity in Corollary 6.31) are referred to as the Fredholm Alternative.
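The spectral picture of this section can be probed numerically in finite dimensions. The following sketch (illustrative, not part of the text) truncates the diagonal operator of Example 6.J(b) and checks the Fredholm alternative of Proposition 6.35 at one of its eigenvalues.

```python
import numpy as np

# Numerical sketch (illustrative, not from the text): truncations of the
# diagonal operator D = diag{1/(k+1)} of Example 6.J(b) show eigenvalues
# accumulating only at 0, and the Fredholm alternative of Proposition
# 6.35 at a nonzero eigenvalue.
n = 200
D = np.diag(1.0 / np.arange(1, n + 1))

eigs = np.sort(np.abs(np.linalg.eigvals(D)))
assert eigs[0] < 0.006                  # eigenvalues pile up only near 0
assert np.isclose(eigs[-1], 1.0)

lam = 0.1                               # an eigenvalue of D (k = 10)
K = lam * np.eye(n) - D
dim_ker = n - np.linalg.matrix_rank(K)
dim_ker_adj = n - np.linalg.matrix_rank(K.conj().T)
# dim N(lam*I - D) = dim N(conj(lam)*I - D*)  (Proposition 6.35)
assert dim_ker == dim_ker_adj == 1
```

The kernel dimensions are computed through the numerical rank, which is adequate here because the truncated operator is diagonal with a well-separated spectrum.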
6.7 The Compact Normal Case

Throughout this section H ≠ {0} is a complex Hilbert space. Let {λ_γ}_{γ∈Γ} be a bounded family of complex numbers, let {P_γ}_{γ∈Γ} be a resolution of the identity on H, and let T ∈ B[H] be a (bounded) weighted sum of projections (cf. Definition 5.60 and Proposition 5.61):

T x = Σ_{γ∈Γ} λ_γ P_γ x   for every x ∈ H.
Proposition 6.36. Every weighted sum of projections is normal.

Proof. Note that {λ̄_γ}_{γ∈Γ} is a bounded family of complex numbers, and consider the weighted sum of projections T* ∈ B[H] given by

T*x = Σ_{γ∈Γ} λ̄_γ P_γ x   for every x ∈ H.
This in fact is the adjoint of T ∈ B[H] since each P_γ is self-adjoint (Proposition 5.81). Indeed, take x = Σ_{γ∈Γ} P_γ x and y = Σ_{γ∈Γ} P_γ y in H (recall: {P_γ}_{γ∈Γ} is a resolution of the identity on H) so that, as R(P_α) ⊥ R(P_β) if α ≠ β,

⟨T x ; y⟩ = ⟨Σ_{α∈Γ} λ_α P_α x ; Σ_{β∈Γ} P_β y⟩ = Σ_{α∈Γ} Σ_{β∈Γ} λ_α ⟨P_α x ; P_β y⟩ = Σ_{γ∈Γ} λ_γ ⟨P_γ x ; P_γ y⟩ = Σ_{β∈Γ} Σ_{α∈Γ} ⟨P_β x ; λ̄_α P_α y⟩ = ⟨Σ_{β∈Γ} P_β x ; Σ_{α∈Γ} λ̄_α P_α y⟩ = ⟨x ; T*y⟩.

Moreover, since P_γ² = P_γ for all γ and P_α P_β = P_β P_α = O if α ≠ β,

T*T x = Σ_{α∈Γ} λ̄_α P_α (Σ_{β∈Γ} λ_β P_β x) = Σ_{α∈Γ} Σ_{β∈Γ} λ̄_α λ_β P_α P_β x = Σ_{γ∈Γ} |λ_γ|² P_γ x = Σ_{α∈Γ} Σ_{β∈Γ} λ_α λ̄_β P_α P_β x = Σ_{α∈Γ} λ_α P_α (Σ_{β∈Γ} λ̄_β P_β x) = T T*x

for every x ∈ H. That is, T is normal.
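In finite dimensions, Proposition 6.36 can be checked directly. The sketch below (hypothetical data, not from the text) builds a two-projection resolution of the identity on C⁴ from a random unitary and verifies that the weighted sum is normal, with the conjugate-weight sum as its adjoint.

```python
import numpy as np

# Finite-dimensional sketch of Proposition 6.36 (hypothetical data): a
# resolution of the identity {P1, P2} on C^4 built from a random unitary,
# and the weighted sum of projections T = lam1*P1 + lam2*P2.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))

P1 = Q[:, :2] @ Q[:, :2].conj().T     # projection onto the span of two columns
P2 = Q[:, 2:] @ Q[:, 2:].conj().T     # projection onto the orthogonal complement
assert np.allclose(P1 + P2, np.eye(4))          # resolution of the identity
assert np.allclose(P1 @ P2, np.zeros((4, 4)))   # mutually orthogonal ranges

lam1, lam2 = 2.0 + 1.0j, -0.5j
T = lam1 * P1 + lam2 * P2

# T* is the weighted sum with conjugated weights, and T is normal:
assert np.allclose(T.conj().T, np.conj(lam1) * P1 + np.conj(lam2) * P2)
assert np.allclose(T.conj().T @ T, T @ T.conj().T)
```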
Particular case: Diagonal operators and, more generally, diagonalizable operators on a separable Hilbert space (as defined in Problem 5.17), are normal operators. In fact, the concept of weighted sum of projections on any Hilbert space can be thought of as a generalization of the concept of diagonalizable operators on a separable Hilbert space. The next proposition shows that such a generalization preserves the spectral properties (compare with Example 6.C).

Proposition 6.37. If T ∈ B[H] is a weighted sum of projections, then

σP(T) = {λ ∈ C : λ = λ_γ for some γ ∈ Γ},   σR(T) = ∅,

and

σC(T) = {λ ∈ C : λ ≠ λ_γ for all γ ∈ Γ and inf_{γ∈Γ} |λ − λ_γ| = 0}.
Proof. Take any x = Σ_{γ∈Γ} P_γ x in H. Recall that {P_γ}_{γ∈Γ} is a resolution of the identity on H so that ‖x‖² = Σ_{γ∈Γ} ‖P_γ x‖² by Theorem 5.32. Moreover, ‖(λI − T)x‖² = Σ_{γ∈Γ} |λ − λ_γ|² ‖P_γ x‖², since (λI − T)x = Σ_{γ∈Γ} (λ − λ_γ)P_γ x for any λ ∈ C (cf. Theorem 5.32 again). If N(λI − T) ≠ {0}, then there exists an x ≠ 0 in H such that (λI − T)x = 0, and therefore Σ_{γ∈Γ} ‖P_γ x‖² ≠ 0 and Σ_{γ∈Γ} |λ − λ_γ|² ‖P_γ x‖² = 0, which implies that ‖P_α x‖ ≠ 0 for some α ∈ Γ and |λ − λ_α| ‖P_α x‖ = 0. Thus λ = λ_α. Conversely, take any α ∈ Γ and an arbitrary nonzero vector x in R(P_α) (recall: P_γ ≠ O and so R(P_γ) ≠ {0} for every γ ∈ Γ). But R(P_α) ⊥ R(P_γ) whenever α ≠ γ, so that R(P_α) ⊥ Σ_{α≠γ∈Γ} R(P_γ). Hence R(P_α) ⊆ (Σ_{α≠γ∈Γ} R(P_γ))⊥ = ⋂_{α≠γ∈Γ} R(P_γ)⊥ = ⋂_{α≠γ∈Γ} N(P_γ) (cf. Problem 5.8(a) and Propositions 5.76(a) and 5.81(b)). Thus x ∈ N(P_γ) for every α ≠ γ ∈ Γ, and so ‖(λ_α I − T)x‖² = Σ_{γ∈Γ} |λ_α − λ_γ|² ‖P_γ x‖² = 0, which ensures that N(λ_α I − T) ≠ {0}. Outcome: N(λI − T) ≠ {0} if and only if λ = λ_α for some α ∈ Γ. That is,

σP(T) = {λ ∈ C : λ = λ_γ for some γ ∈ Γ}.

We have just seen that N(λI − T) = {0} if and only if λ ≠ λ_γ for all γ ∈ Γ. In this case (i.e., if λI − T is injective) there exists an inverse (λI − T)^{-1} in L[R(λI − T), H], which is a weighted sum of projections on R(λI − T):

(λI − T)^{-1}x = Σ_{γ∈Γ} (λ − λ_γ)^{-1} P_γ x   for every x ∈ R(λI − T).

Indeed, if λ ≠ λ_γ for all γ ∈ Γ, then Σ_{α∈Γ} (λ − λ_α)^{-1} P_α (Σ_{β∈Γ} (λ − λ_β)P_β x) = Σ_{α∈Γ} Σ_{β∈Γ} (λ − λ_α)^{-1}(λ − λ_β) P_α P_β x = Σ_{γ∈Γ} P_γ x = x for every x in H. Now recall from Proposition 5.61 that (λI − T)^{-1} ∈ B[H] if and only if λ ≠ λ_γ for all γ ∈ Γ and sup_{γ∈Γ} |λ − λ_γ|^{-1} < ∞. Equivalently, (λI − T)^{-1} ∈ B[H] if and only if inf_{γ∈Γ} |λ − λ_γ| > 0. In other words,

ρ(T) = {λ ∈ C : inf_{γ∈Γ} |λ − λ_γ| > 0}.
But T is normal by Proposition 6.36, so that σR (T ) = ∅ (Corollary 6.18), and hence σC (T ) = (C \ρ(T ))\σP (T ).
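A numerical sketch of the resolvent description in Proposition 6.37 (the diagonal operator and the sample point λ below are illustrative choices, not from the text): for a diagonal weighted sum of projections, a point λ lies in ρ(T) exactly when inf_γ |λ − λ_γ| > 0, and there the resolvent norm equals the reciprocal of that infimum.

```python
import numpy as np

# Numerical sketch of Proposition 6.37 (illustrative diagonal example):
# for T = diag{lam_k} with lam_k = 1/k, a point lam belongs to rho(T)
# exactly when inf_k |lam - lam_k| > 0, and then the norm of the
# resolvent (lam*I - T)^{-1} is the reciprocal of that infimum.
n = 500
weights = 1.0 / np.arange(1, n + 1)
T = np.diag(weights)

lam = -0.25                              # stays away from every weight
gap = np.min(np.abs(lam - weights))      # inf_k |lam - lam_k| > 0
resolvent_norm = np.linalg.norm(np.linalg.inv(lam * np.eye(n) - T), 2)
assert np.isclose(resolvent_norm, 1.0 / gap)

# lam = 0 is not a weight, yet inf_k |0 - lam_k| = 1/n shrinks as the
# truncation grows: in the infinite-dimensional limit 0 lies in sigma_C(T).
assert np.min(np.abs(0.0 - weights)) == weights[-1]
```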
Proposition 6.38. A weighted sum of projections T ∈ B[H] is compact if and only if the following triple condition holds: σ(T) is countable, 0 is the only possible accumulation point of σ(T), and dim R(P_γ) < ∞ for every γ such that λ_γ ≠ 0.

Proof. Let T ∈ B[H] be a weighted sum of projections.

Claim. R(P_γ) ⊆ N(λ_γ I − T) for every γ.

Proof. Take an arbitrary index γ. If x ∈ R(P_γ), then x = P_γ x (Problem 1.4) so that T x = T P_γ x = Σ_α λ_α P_α P_γ x = λ_γ P_γ x = λ_γ x (since P_α ⊥ P_γ whenever γ ≠ α), and hence x ∈ N(λ_γ I − T).

If T is compact, then σ(T) is countable and 0 is the only possible accumulation point of σ(T) (Corollary 6.34), and dim N(λI − T) < ∞ whenever λ ≠ 0 (Proposition 6.35), so that dim R(P_γ) < ∞ for every γ such that λ_γ ≠ 0 by the above claim.

Conversely, if T = O, then T is trivially compact. Thus suppose T ≠ O. Since T is normal (Proposition 6.36), r(T) > 0 (reason: the unique normal operator with a null spectral radius is the null operator — see the remark that precedes Corollary 6.28), so that there exists λ ≠ 0 in σP(T) by Corollary 6.31. If σ(T) is countable, then let {λ_k} be any enumeration of the countable set σP(T)\{0} = σ(T)\{0}. Hence the weighted sum of projections T ∈ B[H] is given by (Proposition 6.37)

T x = Σ_k λ_k P_k x   for every x ∈ H,

where {P_k} is included in a resolution of the identity on H (which is itself a resolution of the identity on H whenever 0 ∉ σP(T)). If {λ_k} is a finite set, say {λ_k} = {λ_k}_{k=1}^n, then R(T) = Σ_{k=1}^n R(P_k). If dim R(P_k) < ∞ for every k, then dim (Σ_{k=1}^n R(P_k))⁻ < ∞ (according to Problem 5.11), and so T lies in B0[H] ⊆ B∞[H]. Now suppose {λ_k} is countably infinite. Since σ(T) is compact (Corollary 6.12), it follows by Theorem 3.80 and Proposition 3.77 that {λ_k} has an accumulation point in σ(T). If 0 is the only possible accumulation point of σ(T), then 0 is the unique accumulation point of {λ_k}. Thus, for each integer n ≥ 1, consider the partition {λ_k} = {λ′_k} ∪ {λ″_k}, where |λ′_k| ≥ 1/n and |λ″_k| < 1/n. Note that {λ′_k} is a finite subset of σ(T) (it has no accumulation point), and hence {λ″_k} is an infinite subset of σ(T). Set

T_n = Σ_k λ′_k P′_k ∈ B[H]   for each n ≥ 1,

where {P′_k} are the projections associated with the weights {λ′_k}. We have just seen that dim R(T_n) < ∞. That is, T_n ∈ B0[H] for every n ≥ 1. However, since P_j ⊥ P_k whenever j ≠ k, we get (cf. Corollary 5.9)

‖(T − T_n)x‖² = ‖Σ_k λ″_k P″_k x‖² = Σ_k |λ″_k|² ‖P″_k x‖² ≤ sup_k |λ″_k|² Σ_k ‖P″_k x‖² ≤ (1/n²)‖x‖²

for all x ∈ H, so that T_n → T uniformly. Hence T ∈ B∞[H] by Corollary 4.55.
Before considering the Spectral Theorem for compact normal operators, we need a few spectral properties of normal operators.

Proposition 6.39. If T ∈ B[H] is normal, then

N(λI − T) = N(λ̄I − T*)   for every λ ∈ C.

Proof. Take an arbitrary λ ∈ C. If T is normal, then λI − T is normal (cf. proof of Corollary 6.18), so that ‖(λ̄I − T*)x‖ = ‖(λI − T)x‖ for every x ∈ H by Proposition 6.1(b).

Proposition 6.40. Take λ, μ ∈ C. If T ∈ B[H] is normal, then

N(λI − T) ⊥ N(μI − T)   whenever λ ≠ μ.
Proof. Suppose x ∈ N(λI − T) and y ∈ N(μI − T), so that λx = T x and μy = T y. Since N(λI − T) = N(λ̄I − T*) by the previous proposition, λ̄x = T*x. Thus μ⟨y ; x⟩ = ⟨T y ; x⟩ = ⟨y ; T*x⟩ = ⟨y ; λ̄x⟩ = λ⟨y ; x⟩, and therefore (μ − λ)⟨y ; x⟩ = 0, which implies that ⟨y ; x⟩ = 0 whenever μ ≠ λ.

Proposition 6.41. If T ∈ B[H] is normal, then N(λI − T) reduces T for every λ ∈ C.

Proof. Take an arbitrary λ ∈ C and any T ∈ B[H]. Recall that N(λI − T) is a subspace of H (Proposition 4.13). Moreover, it is clear that N(λI − T) is T-invariant (if T x = λx, then T(T x) = λ(T x)). Similarly, N(λ̄I − T*) is T*-invariant. Now suppose T ∈ B[H] is a normal operator. Proposition 6.39 says that N(λI − T) = N(λ̄I − T*), and so N(λI − T) also is T*-invariant. Then N(λI − T) reduces T (cf. Corollary 5.75).

Corollary 6.42. Let {λ_γ}_{γ∈Γ} be a family of distinct scalars. If T ∈ B[H] is a normal operator, then the topological sum (Σ_{γ∈Γ} N(λ_γ I − T))⁻ reduces T.

Proof. For each γ ∈ Γ write N_γ = N(λ_γ I − T), which is a subspace of H (Proposition 4.13). According to Proposition 6.40, {N_γ}_{γ∈Γ} is a family of pairwise orthogonal subspaces of H. Take an arbitrary x ∈ (Σ_{γ∈Γ} N_γ)⁻. If Γ is finite, then (Σ_{γ∈Γ} N_γ)⁻ = Σ_{γ∈Γ} N_γ (Corollary 5.11); otherwise, apply the Orthogonal Structure Theorem (i.e., Theorem 5.16 if Γ is countably infinite or Problem 5.10 if Γ is uncountable). In any case (finite, countably infinite, or uncountable Γ), x = Σ_{γ∈Γ} u_γ with each u_γ in N_γ. Moreover, T u_γ and T*u_γ lie in N_γ for each γ ∈ Γ because each N_γ reduces T by Proposition 6.41 (cf. Corollary 5.75). Thus, since T and T* are linear and continuous, it follows that T x = Σ_{γ∈Γ} T u_γ ∈ (Σ_{γ∈Γ} N_γ)⁻ and T*x = Σ_{γ∈Γ} T*u_γ ∈ (Σ_{γ∈Γ} N_γ)⁻. Therefore, (Σ_{γ∈Γ} N_γ)⁻ reduces T (cf. Corollary 5.75 again).

Every (bounded) weighted sum of projections is normal (Proposition 6.36), and every compact weighted sum of projections has a countable set of distinct
eigenvalues (Propositions 6.37 and 6.38). The Spectral Theorem for compact normal operators ensures the converse.

Theorem 6.43. (The Spectral Theorem). If T ∈ B[H] is compact and normal, then there exists a countable resolution of the identity {P_k} on H and a (similarly indexed) bounded set of scalars {λ_k} such that

T = Σ_k λ_k P_k,

where {λ_k} = σP(T), the set of all (distinct) eigenvalues of T, and each P_k is the orthogonal projection onto the eigenspace N(λ_k I − T). Moreover, if the above countable weighted sum of projections is infinite, then it converges in the (uniform) topology of B[H].

Proof. If T is compact and normal, then it has a nonempty point spectrum (Corollary 6.32) and its eigenspaces span H. In other words,

Claim. (Σ_{λ∈σP(T)} N(λI − T))⁻ = H.

Proof. Set M = (Σ_{λ∈σP(T)} N(λI − T))⁻, which is a subspace of H. Suppose M ≠ H, so that M⊥ ≠ {0} (Proposition 5.15). Consider the restriction T|_{M⊥} of T to M⊥. If T is normal, then M reduces T (Corollary 6.42), so that M⊥ is T-invariant, and hence T|_{M⊥} ∈ B[M⊥] is normal (cf. Problem 6.17). If T is compact, then T|_{M⊥} is compact (see Section 4.9). Thus T|_{M⊥} is a compact normal operator on the Hilbert space M⊥ ≠ {0}, and so σP(T|_{M⊥}) ≠ ∅ by Corollary 6.32. That is, there exist λ ∈ C and 0 ≠ x ∈ M⊥ such that T|_{M⊥}x = λx, and hence T x = λx. Thus λ ∈ σP(T) and x ∈ N(λI − T) ⊆ M. But this leads to a contradiction, viz., 0 ≠ x ∈ M ∩ M⊥ = {0}. Outcome: M = H.

Since T is compact, the nonempty set σP(T) is countable (Corollaries 6.32 and 6.34) and bounded (because T ∈ B[H]). Then write σP(T) = {λ_k}_{k∈N}, where {λ_k}_{k∈N} is a finite or infinite sequence of distinct elements in C consisting of all eigenvalues of T. Here, either N = {1, ..., m} for some m ∈ ℕ if σP(T) is finite, or N = ℕ if σP(T) is (countably) infinite. Recall that each N(λ_k I − T) is a subspace of H (Proposition 4.13). Moreover, since T is normal, Proposition 6.40 says that N(λ_k I − T) ⊥ N(λ_j I − T) whenever k ≠ j. Thus {N(λ_k I − T)}_{k∈N} is a sequence of pairwise orthogonal subspaces of H such that H = (Σ_{k∈N} N(λ_k I − T))⁻ by the above claim. Then the sequence {P_k}_{k∈N} of the orthogonal projections onto each N(λ_k I − T) is a resolution of the identity on H (see Theorem 5.59). This implies that x = Σ_{k∈N} P_k x and, since T is linear and continuous, T x = Σ_{k∈N} T P_k x for every x ∈ H. But P_k x ∈ R(P_k) = N(λ_k I − T), and so T P_k x = λ_k P_k x, for each k ∈ N and every x ∈ H. Hence
T x = Σ_{k∈N} λ_k P_k x   for every x ∈ H.
Conclusion: T is a countable weighted sum of projections. If N is finite, then the theorem is proved. Thus suppose N is infinite (i.e., N = ℕ). In this case, the above identity says that Σ_{k=1}^{n} λ_k P_k → T strongly (see the observation that follows the proof of Proposition 5.61). We show next that the above convergence actually is uniform. Indeed, for any n ∈ ℕ,

‖(T − Σ_{k=1}^{n} λ_k P_k)x‖² = ‖Σ_{k=n+1}^{∞} λ_k P_k x‖² = Σ_{k=n+1}^{∞} |λ_k|² ‖P_k x‖² ≤ sup_{k≥n+1} |λ_k|² Σ_{k=n+1}^{∞} ‖P_k x‖² ≤ sup_{k≥n} |λ_k|² ‖x‖².

(Reason: R(P_j) ⊥ R(P_k) whenever j ≠ k, and x = Σ_{k=1}^{∞} P_k x so that ‖x‖² = Σ_{k=1}^{∞} ‖P_k x‖² — see Corollary 5.9.) Hence

0 ≤ ‖T − Σ_{k=1}^{n} λ_k P_k‖ = sup_{‖x‖=1} ‖(T − Σ_{k=1}^{n} λ_k P_k)x‖ ≤ sup_{k≥n} |λ_k|
for all n ∈ ℕ. Since T is compact and since {λ_n} is an infinite sequence of distinct elements in σ(T), it follows by Proposition 6.33 that λ_n → 0. Therefore lim_n sup_{k≥n} |λ_k| = lim sup_n |λ_n| = 0, and so Σ_{k=1}^{n} λ_k P_k → T uniformly.

In other words, if T is a compact and normal operator on a (nonzero) complex Hilbert space H, then the family {P_λ}_{λ∈σP(T)} of orthogonal projections onto each eigenspace N(λI − T) is a resolution of the identity on H, and T is a weighted sum of projections. Thus we write

T = Σ_{λ∈σP(T)} λ P_λ,
which is to be interpreted pointwise (i.e., T x = Σ_{λ∈σP(T)} λ P_λ x for every x in H) as in Definition 5.60. This was naturally identified in Problem 5.16 with the orthogonal direct sum of scalar operators ⊕_{λ∈σP(T)} λ I_λ, where I_λ = P_λ|_{R(P_λ)}. Here R(P_λ) = N(λI − T). Under such a natural identification we also write

T = ⊕_{λ∈σP(T)} λ P_λ.
These representations are referred to as the spectral decomposition of a compact normal operator T. The next result states the Spectral Theorem for compact normal operators in terms of an orthonormal basis for N(T)⊥ consisting of eigenvectors of T.

Corollary 6.44. Let T ∈ B[H] be compact and normal.
(a) For each λ ∈ σP(T)\{0} there is a finite orthonormal basis {e_k(λ)}_{k=1}^{n_λ} for N(λI − T) consisting entirely of eigenvectors of T.
(b) The set {e_k} = ⋃_{λ∈σP(T)\{0}} {e_k(λ)}_{k=1}^{n_λ} is a countable orthonormal basis for N(T)⊥ made up of eigenvectors of T.
(c) T x = Σ_{λ∈σP(T)\{0}} λ Σ_{k=1}^{n_λ} ⟨x ; e_k(λ)⟩e_k(λ) for every x ∈ H, so that
(d) T x = Σ_k μ_k ⟨x ; e_k⟩e_k for every x ∈ H, where {μ_k} is a sequence containing all nonzero eigenvalues of T finitely repeated according to the multiplicity of the respective eigenspace.
Proof. We have already seen that σP(T) is nonempty and countable (cf. proof of the previous theorem). Recall that σP(T) = {0} if and only if T = O (Corollary 6.32) or, equivalently, if and only if N(T)⊥ = {0} (i.e., N(T) = H). If T = O (i.e., T = 0I), then the above assertions hold trivially (σP(T)\{0} = ∅, {e_k} = ∅, N(T)⊥ = {0} and T x = 0x = 0 for every x ∈ H because the empty sum is null). Thus suppose T ≠ O (so that N(T)⊥ ≠ {0}), and take an arbitrary λ ≠ 0 in σP(T). According to Proposition 6.35, dim N(λI − T) is finite, say, dim N(λI − T) = n_λ for some positive integer n_λ. Then there exists a finite orthonormal basis {e_k(λ)}_{k=1}^{n_λ} for the Hilbert space N(λI − T) ≠ {0} (cf. Proposition 5.39). This proves (a). Observe that e_k(λ) is an eigenvector of T for each k = 1, ..., n_λ (because 0 ≠ e_k(λ) ∈ N(λI − T)).

Claim. ⋃_{λ∈σP(T)\{0}} {e_k(λ)}_{k=1}^{n_λ} is an orthonormal basis for N(T)⊥.

Proof. H = (Σ_{λ∈σP(T)} N(λI − T))⁻ (cf. Claim in the proof of Theorem 6.43). Thus, according to Problem 5.8(b,d,e), it follows that

N(T) = (Σ_{λ∈σP(T)\{0}} N(λI − T))⊥ = ⋂_{λ∈σP(T)\{0}} N(λI − T)⊥

(because {N(λI − T)}_{λ∈σP(T)} is a nonempty family of orthogonal subspaces of H — Proposition 6.40). Therefore N(T)⊥ = (Σ_{λ∈σP(T)\{0}} N(λI − T))⁻ (Proposition 5.15), and the claimed result follows by part (a), Proposition 6.40, and Problem 5.11.

Thus (b) holds since {e_k} = ⋃_{λ∈σP(T)\{0}} {e_k(λ)}_{k=1}^{n_λ} is countable by Corollary 1.11. Consider the decomposition H = N(T) + N(T)⊥ of Theorem 5.20. Take an arbitrary vector x ∈ H, so that x = u + v with u ∈ N(T) and v ∈ N(T)⊥. Let v = Σ_k ⟨v ; e_k⟩e_k = Σ_{λ∈σP(T)\{0}} Σ_{k=1}^{n_λ} ⟨v ; e_k(λ)⟩e_k(λ) be the Fourier series expansion of v (cf. Theorem 5.48) in terms of the orthonormal basis {e_k} = ⋃_{λ∈σP(T)\{0}} {e_k(λ)}_{k=1}^{n_λ} for the Hilbert space N(T)⊥ ≠ {0}. Since the operator T is linear and continuous, and since T e_k(λ) = λe_k(λ) for each integer k = 1, ..., n_λ and every λ ∈ σP(T)\{0}, it follows that T x = T u + T v = T v = Σ_{λ∈σP(T)\{0}} Σ_{k=1}^{n_λ} ⟨v ; e_k(λ)⟩T e_k(λ) = Σ_{λ∈σP(T)\{0}} λ Σ_{k=1}^{n_λ} ⟨v ; e_k(λ)⟩e_k(λ). However, ⟨x ; e_k(λ)⟩ = ⟨u ; e_k(λ)⟩ + ⟨v ; e_k(λ)⟩ = ⟨v ; e_k(λ)⟩ because u ∈ N(T) and e_k(λ) ∈ N(T)⊥, which proves (c). The preceding assertions lead to (d).
Remark: If T ∈ B[H] is compact and normal, and if H is nonseparable, then 0 ∈ σP(T) and N(T) is nonseparable. Indeed, for T = O the italicized result is trivial (T = O implies 0 ∈ σP(T) and N(T) = H). On the other hand, if T ≠ O, then N(T)⊥ ≠ {0} is separable, for it has a countable orthonormal basis {e_k} (Theorem 5.44 and Corollary 6.44). If N(T) is separable, then it also has a countable orthonormal basis, say {f_k}, and hence {e_k} ∪ {f_k} is a countable orthonormal basis for H = N(T) + N(T)⊥ (Problem 5.11), so that H is separable. Moreover, if 0 ∉ σP(T), then N(T) = {0}, and therefore H = N(T)⊥ is separable.

N(T) reduces T (Proposition 6.41), and hence T = T|_{N(T)⊥} ⊕ O. By Problem 5.17 and Corollary 6.44(d), if T ∈ B[H] is compact and normal, then T|_{N(T)⊥} ∈ B[N(T)⊥] is diagonalizable. Precisely, T|_{N(T)⊥} is a diagonal operator with respect to the orthonormal basis {e_k} for the separable Hilbert space N(T)⊥.

Generalizing: An operator T ∈ B[H] (not necessarily compact) acting on any Hilbert space H (not necessarily separable) is diagonalizable if there exist a resolution of the identity {P_γ}_{γ∈Γ} on H and a bounded family of scalars {λ_γ}_{γ∈Γ} such that T u = λ_γ u whenever u ∈ R(P_γ). Take an arbitrary x = Σ_{γ∈Γ} P_γ x in H. Since T is linear and continuous, T x = Σ_{γ∈Γ} T P_γ x = Σ_{γ∈Γ} λ_γ P_γ x, so that T is a weighted sum of projections (which is normal by Proposition 6.36). Thus we write (cf. Problem 5.16)

T = ⊕_{γ∈Γ} λ_γ P_γ   or   T = Σ_{γ∈Γ} λ_γ P_γ.

Conversely, if T is a weighted sum of projections (T x = Σ_{γ∈Γ} λ_γ P_γ x for every x ∈ H), then T u = Σ_{γ∈Γ} λ_γ P_γ u = Σ_{γ∈Γ} λ_γ P_γ P_α u = λ_α u for every u ∈ R(P_α) (since P_γ P_α = O whenever γ ≠ α and u = P_α u whenever u ∈ R(P_α)), and hence T is diagonalizable. Outcome: An operator T on H is diagonalizable if and only if it is a weighted sum of projections for some bounded family of scalars {λ_γ}_{γ∈Γ} and some resolution of the identity {P_γ}_{γ∈Γ} on H. In this case, {P_γ}_{γ∈Γ} is said to diagonalize T.

Corollary 6.45. If T ∈ B[H] is compact, then T is normal if and only if T is diagonalizable. Let {P_k} be a resolution of the identity on H that diagonalizes a compact and normal operator T ∈ B[H] into its spectral decomposition, and take any operator S ∈ B[H]. The following assertions are pairwise equivalent.
(a) S commutes with T and with T*.
(b) R(P_k) reduces S for every k.
(c) S commutes with every P_k.

Proof. Take a compact operator T on H. If T is normal, then the Spectral Theorem says that it is diagonalizable. The converse is trivial since a diagonalizable operator is normal. Now suppose T is compact and normal so that
T = Σ_k λ_k P_k,

where {P_k} is a resolution of the identity on H and {λ_k} = σP(T) is the set of all (distinct) eigenvalues of T (Theorem 6.43). Recall from the proof of Proposition 6.36 that

T* = Σ_k λ̄_k P_k.

Take any λ ∈ C. If S commutes with T and with T*, then (λI − T) commutes with S and with S*, so that N(λI − T) is an invariant subspace for both S and S* (Problem 4.20(c)). Hence N(λI − T) reduces S (Corollary 5.75), which means that S commutes with the orthogonal projection onto N(λI − T) (cf. observation that precedes Proposition 5.74). In particular, since R(P_k) = N(λ_k I − T) for each k (Theorem 6.43), R(P_k) reduces S for every k, which means that S commutes with every P_k. Then (a)⇒(b)⇔(c). It is readily verified that (c)⇒(a). Indeed, if S P_k = P_k S for every k, then S T = Σ_k λ_k S P_k = Σ_k λ_k P_k S = T S and S T* = Σ_k λ̄_k S P_k = Σ_k λ̄_k P_k S = T*S (recall that S is linear and continuous).
6.8 A Glimpse at the General Case

What is the role played by compact operators in the Spectral Theorem? First note that, if T is compact, then its spectrum (and so its point spectrum) is countable. But this is not crucial once we know how to deal with uncountable sums. In particular, we know how to deal with an uncountable weighted sum of projections T x = Σ_{γ∈Γ} λ_γ P_γ x (recall that, even in this case, the above sum has only a countable number of nonzero vectors for each x). What really brings a compact operator into play is that a compact normal operator has a nonempty point spectrum and, more than that, it has enough eigenspaces to span H (see the fundamental claim in the proof of Theorem 6.43). That makes the difference, for a normal (noncompact) operator may have an empty point spectrum (witness: a bilateral shift), or it may have eigenspaces but not enough to span the whole space H (sample: an orthogonal direct sum of a bilateral shift with an identity). The general case of the Spectral Theorem is the case that deals with plain normal (not necessarily compact) operators. In fact, the Spectral Theorem survives the lack of compactness if the point spectrum is replaced with the spectrum (which is never empty). But this has a price: a suitable statement of the Spectral Theorem for plain normal operators requires some knowledge of measure theory, and a proper proof requires a sound knowledge of it. We shall not prove the two fundamental theorems of this final section (e.g., see Conway [1] and Radjavi and Rosenthal [1]). Instead, we just state them, and verify some of their basic consequences. Thus we assume here (and only here) that the reader has, at least, some familiarity with measure theory in order
to grasp the definition of spectral measure and, therefore, the statement of the Spectral Theorem. Operators will be acting on complex Hilbert spaces H = {0} or K = {0}. Definition 6.46. Let Ω be a nonempty set in the complex plane C and let ΣΩ be the σ-algebra of Borel subsets of Ω. A (complex) spectral measure in a (complex) Hilbert space H is a mapping P : ΣΩ → B[H] such that (a) P (Λ) is an orthogonal projection for every Λ ∈ ΣΩ , (b) P (∅) = O and P (Ω) = I, (c) P (Λ1 ∩ Λ2 ) = P (Λ1 )P (Λ2 ) for every Λ1 , Λ2 ∈ ΣΩ ,
(d) P (⋃k Λk ) = Σk P (Λk ) whenever {Λk } is a countable collection of pairwise disjoint sets in ΣΩ (i.e., P is countably additive).

If {Λk }k∈N is a countably infinite collection of pairwise disjoint sets in ΣΩ , then the identity in (d) means convergence in the strong topology:

    P (Λ1 ) + · · · + P (Λn ) →s P (⋃k∈N Λk ).

In fact, since Λj ∩ Λk = ∅ whenever j ≠ k, it follows by properties (b) and (c) that P (Λj )P (Λk ) = P (Λj ∩ Λk ) = P (∅) = O for j ≠ k, so that {P (Λk )}k∈N is an orthogonal sequence of orthogonal projections in B[H]. Then, according to Proposition 5.58, {P (Λ1 ) + · · · + P (Λn )}n∈N converges strongly to the orthogonal projection in B[H] onto (Σk∈N R(P (Λk )))⁻. Therefore, what property (d) says (in the case of a countably infinite collection of pairwise disjoint Borel sets {Λk }k∈N ) is that P (⋃k∈N Λk ) coincides with the orthogonal projection in B[H] onto (Σk∈N R(P (Λk )))⁻. This generalizes the concept of a resolution of the identity on H. In fact, if {Λk }k∈N is a partition of Ω, then the orthogonal sequence of orthogonal projections {P (Λk )}k∈N is such that

    P (Λ1 ) + · · · + P (Λn ) →s P (⋃k∈N Λk ) = P (Ω) = I.
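On a finite-dimensional space Definition 6.46 becomes concrete: for a normal matrix, a spectral measure is obtained by summing the orthogonal eigenprojections whose eigenvalues fall in a given Borel set. A minimal numerical sketch (NumPy; the particular matrix and the Borel sets are arbitrary illustrations, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q, _ = np.linalg.qr(Z)                         # a random unitary
lam = np.array([1.0, 1.0, -2.0, 3.0j])         # eigenvalues (one repeated)
N = Q @ np.diag(lam) @ Q.conj().T              # a normal matrix

def P(Lam):
    """Spectral measure: projection onto the eigenvectors with eigenvalue in Lam."""
    cols = [k for k in range(4) if any(abs(lam[k] - z) < 1e-12 for z in Lam)]
    V = Q[:, cols]
    return V @ V.conj().T

L1, L2 = {1.0, -2.0}, {3.0j}
# (a) each P(Lam) is an orthogonal projection
assert np.allclose(P(L1) @ P(L1), P(L1)) and np.allclose(P(L1), P(L1).conj().T)
# (b) P(empty set) = O and P(sigma(N)) = I
assert np.allclose(P(set()), 0) and np.allclose(P(L1 | L2), np.eye(4))
# (c) P(L1 ∩ L2) = P(L1)P(L2)  (here the two sets are disjoint)
assert np.allclose(P(L1) @ P(L2), P(L1 & L2))
# (d) additivity over disjoint sets
assert np.allclose(P(L1 | L2), P(L1) + P(L2))
```

Since the spectrum here is finite, every Borel set reduces to a subset of the eigenvalues, which is why a plain Python set suffices for Λ.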
Now take x, y ∈ H and consider the mapping px,y : ΣΩ → C defined by px,y (Λ) = ⟨P (Λ)x ; y⟩ for every Λ ∈ ΣΩ . The mapping px,y is an ordinary complex-valued (countably additive) measure on ΣΩ . Let ϕ : Ω → C be any bounded ΣΩ-measurable function. The integral of ϕ with respect to the measure px,y , viz., ∫ ϕ dpx,y , will also be denoted by ∫ ϕ(λ) dpx,y , or by ∫ ϕ d⟨Pλ x ; y⟩, or by ∫ ϕ(λ) d⟨Pλ x ; y⟩. Moreover, there exists a unique F ∈ B[H] such that

    ⟨F x ; y⟩ = ∫ ϕ(λ) d⟨Pλ x ; y⟩ for every x, y ∈ H.
Indeed, let f : H×H → C be defined by f (x, y) = ∫ ϕ(λ) d⟨Pλ x ; y⟩ for every x, y in H, which is a sesquilinear form. Since the measure ⟨P (·)x ; x⟩ is positive, |f (x, x)| ≤ ∫ |ϕ(λ)| d⟨Pλ x ; x⟩ ≤ ‖ϕ‖∞ ∫ d⟨Pλ x ; x⟩ = ‖ϕ‖∞ ⟨P (Ω)x ; x⟩ = ‖ϕ‖∞ ⟨x ; x⟩ = ‖ϕ‖∞ ‖x‖² for every x in H. This implies that f is bounded (i.e., sup‖x‖=‖y‖=1 |f (x, y)| < ∞) by the polarization identity (see the remark that follows Proposition 5.4), and so is the linear functional f (·, y) : H → C for each y ∈ H. Then, by the Riesz Representation Theorem (Theorem 5.62), for each y ∈ H there is a unique zy ∈ H such that f (x, y) = ⟨x ; zy ⟩ for every x ∈ H. This establishes a mapping Φ : H → H assigning to each y ∈ H such a unique zy ∈ H, so that f (x, y) = ⟨x ; Φy⟩ for every x, y ∈ H. Φ is unique and lies in B[H] (cf. proof of Proposition 5.65(a,b)). Thus F = Φ∗ is the unique operator in B[H] for which ⟨F x ; y⟩ = f (x, y) for every x, y ∈ H. The notation

    F = ∫ ϕ(λ) dPλ

is just a shorthand for the identity ⟨F x ; y⟩ = ∫ ϕ(λ) d⟨Pλ x ; y⟩ for every x, y in H. Observe that ⟨F∗x ; y⟩ = ⟨Φx ; y⟩, which is the complex conjugate of ⟨y ; Φx⟩ = f (y, x) = ∫ ϕ(λ) d⟨Pλ y ; x⟩, so that ⟨F∗x ; y⟩ = ∫ ϕ̄(λ) d⟨Pλ x ; y⟩ for every x, y ∈ H, and hence

    F∗ = ∫ ϕ̄(λ) dPλ .

If ψ : Ω → C is a bounded ΣΩ-measurable function and G = ∫ ψ(λ) dPλ , then it can be shown that F G = ∫ ϕ(λ)ψ(λ) dPλ . In particular,

    F∗F = ∫ |ϕ(λ)|² dPλ = F F∗,

so that F is normal. The Spectral Theorem states the converse.

Theorem 6.47. (The Spectral Theorem). If N ∈ B[H] is normal, then there exists a unique spectral measure P on Σσ(N) such that

    N = ∫ λ dPλ .

If Λ is a nonempty relatively open subset of σ(N ), then P (Λ) ≠ O.

The representation N = ∫ λ dPλ is usually referred to as the spectral decomposition of N . Note that N∗N = ∫ |λ|² dPλ = N N∗.

Theorem 6.48. (Fuglede). Let N = ∫ λ dPλ be the spectral decomposition of a normal operator N ∈ B[H]. If S ∈ B[H] commutes with N , then S commutes with P (Λ) for every Λ ∈ Σσ(N) .
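When the spectral measure is supported on finitely many points, ∫ ϕ(λ) dPλ is just the finite sum Σ ϕ(λk )Pk , and the identities F∗ = ∫ ϕ̄(λ) dPλ and F∗F = ∫ |ϕ(λ)|² dPλ = F F∗ can be checked directly. A sketch (the eigenvalues and the function ϕ are arbitrary choices made for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Q, _ = np.linalg.qr(Z)                          # columns: an orthonormal eigenbasis
lam = np.array([2.0, -1.0 + 1j, 0.5j])          # distinct eigenvalues
Pk = [np.outer(Q[:, k], Q[:, k].conj()) for k in range(3)]
N = sum(l * P for l, P in zip(lam, Pk))         # N = "integral of lambda dP"

phi = lambda z: z**2 + 1j * z                   # a bounded Borel function on sigma(N)
F = sum(phi(l) * P for l, P in zip(lam, Pk))    # F = "integral of phi(lambda) dP"

# F* = integral of conj(phi(lambda)) dP
assert np.allclose(F.conj().T, sum(np.conj(phi(l)) * P for l, P in zip(lam, Pk)))
# F*F = integral of |phi(lambda)|^2 dP = FF*, so F is normal
G = sum(abs(phi(l))**2 * P for l, P in zip(lam, Pk))
assert np.allclose(F.conj().T @ F, G) and np.allclose(F @ F.conj().T, G)
```

The multiplicativity F G = ∫ ϕψ dPλ reduces here to the fact that the eigenprojections satisfy Pj Pk = O for j ≠ k.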
In other words, if SN = N S, then SP (Λ) = P (Λ)S, and so each subspace R(P (Λ)) reduces S, which means that {R(P (Λ))}Λ∈Σσ(N ) is a family of reducing subspaces for every operator that commutes with a normal operator
N = ∫ λ dPλ . If σ(N ) has a single point, say σ(N ) = {λ}, then N = λI (by uniqueness of the spectral measure); that is, N is a scalar operator, so that every subspace of H reduces N . Hence, if N is nonscalar, then σ(N ) has more than one point (and dim H > 1). If λ, μ ∈ σ(N ) and λ ≠ μ, then let Dλ be the open disk of radius ½|λ − μ| centered at λ. Set Λλ = σ(N ) ∩ Dλ and Λ′λ = σ(N )\Dλ in Σσ(N) , so that σ(N ) is the disjoint union of Λλ and Λ′λ . Note that P (Λλ ) ≠ O and P (Λ′λ ) ≠ O (since Λλ and σ(N )\Dλ⁻ are nonempty relatively open subsets of σ(N ), and σ(N )\Dλ⁻ ⊆ Λ′λ ). Then I = P (σ(N )) = P (Λλ ∪ Λ′λ ) = P (Λλ ) + P (Λ′λ ), and therefore P (Λλ ) = I − P (Λ′λ ) ≠ I. Thus {0} ≠ R(P (Λλ )) ≠ H. Conclusions: Suppose dim H > 1. Every normal operator has a nontrivial reducing subspace. Actually, every nonscalar normal operator has a nontrivial hyperinvariant subspace which reduces every operator that commutes with it. In fact, an operator is reducible if and only if it commutes with a nonscalar normal operator or, equivalently, if and only if it commutes with a nontrivial orthogonal projection (cf. observation preceding Proposition 5.74).

Corollary 6.49. (Fuglede–Putnam). If N1 ∈ B[H] and N2 ∈ B[K] are normal operators, and if X ∈ B[H, K] intertwines N1 to N2 , then X intertwines N1∗ to N2∗ (i.e., if XN1 = N2 X, then XN1∗ = N2∗ X).

Proof. Let N = ∫ λ dPλ ∈ B[H] be normal. Take any Λ ∈ Σσ(N) and S ∈ B[H].

Claim. SN = N S ⇐⇒ SP (Λ) = P (Λ)S ⇐⇒ SN∗ = N∗S.

Proof. If SN = N S, then SP (Λ) = P (Λ)S for every Λ ∈ Σσ(N) by Theorem 6.48. Therefore, for every x, y ∈ H,

    ⟨SN∗x ; y⟩ = ⟨N∗x ; S∗y⟩ = ∫ λ̄ d⟨Pλ x ; S∗y⟩ = ∫ λ̄ d⟨SPλ x ; y⟩ = ∫ λ̄ d⟨Pλ Sx ; y⟩ = ⟨N∗Sx ; y⟩.

Hence SN∗ = N∗S, so that N S∗ = S∗N . Conversely, if N S∗ = S∗N , then P (Λ)S∗ = S∗P (Λ), and so SP (Λ) = P (Λ)S, for every Λ ∈ Σσ(N) (cf. Theorem 6.48 again). Thus SN = N S since, for every x, y ∈ H,

    ⟨SN x ; y⟩ = ⟨N x ; S∗y⟩ = ∫ λ d⟨Pλ x ; S∗y⟩ = ∫ λ d⟨SPλ x ; y⟩ = ∫ λ d⟨Pλ Sx ; y⟩ = ⟨N Sx ; y⟩.
Finally, take N1 ∈ B[H], N2 ∈ B[K], X ∈ B[H, K], and consider the operators

    N = N1 ⊕ N2 = [ N1  O        and    S = [ O  O
                    O   N2 ]                 X  O ]   in B[H ⊕ K].

If N1 and N2 are normal, then N is normal. If XN1 = N2 X, then SN = N S, and so SN∗ = N∗S by the above claim. Hence XN1∗ = N2∗ X.
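The Fuglede–Putnam corollary and the 2×2 trick from its proof can be watched on small matrices. A sketch (the matrices below are ad hoc choices with a shared eigenvalue, so that a nonzero intertwiner exists):

```python
import numpy as np

# Normal operators N1 on H = C^2 and N2 on K = C^2, and an intertwiner X.
N1 = np.diag([1.0, 2.0j])
N2 = np.diag([1.0, 3.0])
X = np.array([[1.0, 0.0],
              [0.0, 0.0]])                  # maps the lam = 1 eigenvector to itself

assert np.allclose(X @ N1, N2 @ X)          # X intertwines N1 to N2
# Fuglede-Putnam: X then also intertwines the adjoints
assert np.allclose(X @ N1.conj().T, N2.conj().T @ X)

# The proof's device: N = N1 (+) N2 and S = [[O, O], [X, O]] on H (+) K
O = np.zeros((2, 2))
N = np.block([[N1, O], [O, N2]])
S = np.block([[O, O], [X, O]])
assert np.allclose(S @ N, N @ S)                      # SN = NS  <=>  XN1 = N2X
assert np.allclose(S @ N.conj().T, N.conj().T @ S)    # hence SN* = N*S
```

Multiplying out the block matrices shows why the device works: SN and NS have XN1 and N2X as their only nonzero blocks, and likewise for the adjoints.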
In particular, the claim in the above proof ensures that S ∈ B[H] commutes with N and with N∗ if and only if S commutes with P (Λ) or, equivalently, R(P (Λ)) reduces S, for every Λ ∈ Σσ(N) (compare with Corollary 6.45).

Corollary 6.50. Take N1 ∈ B[H], N2 ∈ B[K], and X ∈ B[H, K]. If N1 and N2 are normal operators and XN1 = N2 X, then
(a) N (X) reduces N1 and R(X)⁻ reduces N2 , so that N1 |N(X)⊥ ∈ B[N (X)⊥ ] and N2 |R(X)⁻ ∈ B[R(X)⁻ ]. Moreover,
(b) N1 |N(X)⊥ and N2 |R(X)⁻ are unitarily equivalent.

Proof. (a) Since XN1 = N2 X, it follows that N (X) is N1-invariant and R(X) is N2-invariant (and so R(X)⁻ is N2-invariant — cf. Problem 4.18). Indeed, if Xx = 0, then XN1 x = N2 Xx = 0; and N2 Xx = XN1 x ∈ R(X) for every x ∈ H. Corollary 6.49 ensures that XN1∗ = N2∗ X, and so N (X) is N1∗-invariant and R(X)⁻ is N2∗-invariant. Therefore (Corollary 5.75), N (X) reduces N1 and R(X)⁻ reduces N2 .

(b) Let X = W Q be the polar decomposition of X, where Q = (X∗X)^{1/2} (Theorem 5.89). Observe that XN1 = N2 X implies X∗N2∗ = N1∗X∗, which in turn implies X∗N2 = N1 X∗ by Corollary 6.49. Then Q²N1 = X∗XN1 = X∗N2 X = N1 X∗X = N1 Q², so that QN1 = N1 Q (Theorem 5.85). Therefore, W N1 Q = W QN1 = XN1 = N2 X = N2 W Q. That is, (W N1 − N2 W )Q = O. Thus R(Q)⁻ ⊆ N (W N1 − N2 W ), and so

    N (Q)⊥ ⊆ N (W N1 − N2 W )

(since Q = Q∗, so that R(Q)⁻ = N (Q)⊥ by Proposition 5.76). Recall that N (W ) = N (Q) = N (Q²) = N (X∗X) = N (X) (cf. Propositions 5.76 and 5.86, and Theorem 5.89). If u ∈ N (Q), then N2 W u = 0, and N1 u = N1 |N(X) u ∈ N (X) = N (W ) (because N (X) is N1-invariant), so that W N1 u = 0. Hence

    N (Q) ⊆ N (W N1 − N2 W ).

The above displayed inclusions imply that N (W N1 − N2 W ) = H (cf. Problem 5.7(b)), which means that W N1 − N2 W = O. Thus W N1 = N2 W.
Now W = V P , where V : N (W )⊥ → K is an isometry and P : H → H is the orthogonal projection onto N (W )⊥ (Proposition 5.87). Then

    V P N1 = N2 V P,  so that  V P N1 |N(X)⊥ = N2 V P |N(X)⊥ .

Since R(P ) = N (W )⊥ = N (X)⊥ is N1-invariant (recall: N (X) reduces N1 ), it follows that N1 (N (X)⊥ ) ⊆ N (X)⊥ = R(P ), and hence V P N1 |N(X)⊥ = V N1 |N(X)⊥ . Since R(V ) = R(W ) = R(X)⁻ (cf. Theorem 5.89 and the observation that precedes Proposition 5.88), it also follows that N2 V P |N(X)⊥ = N2 V P |R(P) = N2 V = N2 |R(X)⁻ V. But V : N (W )⊥ → R(V ) is a unitary transformation (i.e., a surjective isometry) of the Hilbert space N (X)⊥ = N (W )⊥ ⊆ H onto the Hilbert space R(X)⁻ = R(V ) ⊆ K. Conclusion:

    V N1 |N(X)⊥ = N2 |R(X)⁻ V,

so that the operators N1 |N(X)⊥ ∈ B[N (X)⊥ ] and N2 |R(X)⁻ ∈ B[R(X)⁻ ] are unitarily equivalent.

An immediate consequence of Corollary 6.50: If a quasiinvertible linear transformation intertwines two normal operators, then these normal operators are unitarily equivalent. That is, if N1 ∈ B[H] and N2 ∈ B[K] are normal operators, and if XN1 = N2 X, where X ∈ B[H, K] is such that N (X) = {0} (equivalently, N (X)⊥ = H) and R(X)⁻ = K, then U N1 = N2 U for a unitary U ∈ B[H, K]. This happens, in particular, when X is invertible (i.e., if X is in G[H, K]). Outcome: Two similar normal operators are unitarily equivalent.

Applying Theorems 6.47 and 6.48 we saw that normal operators (on a complex Hilbert space of dimension greater than 1) have a nontrivial invariant subspace. This also is the case for compact operators (on a complex Banach space of dimension greater than 1). The ultimate result along this line was presented by Lomonosov in 1973: An operator has a nontrivial invariant subspace if it commutes with a nonscalar operator that commutes with a nonzero compact operator. In fact, every nonscalar operator that commutes with a nonscalar compact operator (itself, in particular) has a nontrivial hyperinvariant subspace. Recall that, on an infinite-dimensional normed space, the only scalar compact operator is the null operator.
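The closing argument of Corollary 6.50 can be watched numerically: take two similar normal matrices, polar-decompose the intertwiner, and the unitary factor already intertwines them. A sketch (the particular unitaries and eigenvalues are arbitrary choices; the polar factor is computed from the SVD):

```python
import numpy as np

rng = np.random.default_rng(3)
def rand_unitary(n):
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    return Q

lam = np.diag([1.0, -1.0, 2.0j])
U1, U2 = rand_unitary(3), rand_unitary(3)
N1 = U1 @ lam @ U1.conj().T                       # normal
N2 = U2 @ lam @ U2.conj().T                       # normal and similar to N1
X = U2 @ np.diag([1.0, 2.0, 3.0]) @ U1.conj().T   # invertible, not unitary
assert np.allclose(X @ N1, N2 @ X)                # X intertwines N1 to N2

# Polar decomposition X = WQ via the SVD; W = u vh is the unitary factor.
u, s, vh = np.linalg.svd(X)
W = u @ vh
assert np.allclose(W @ W.conj().T, np.eye(3))     # W is unitary
assert np.allclose(W @ N1, N2 @ W)                # N1 and N2 are unitarily equivalent
```

Here X = u diag(s) vh gives W = u vh = X(X∗X)^{-1/2}, exactly the unitary part used in the proof.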
On a finite-dimensional normed space every operator is compact, and hence every operator on a complex finite-dimensional normed space of dimension greater than 1 has a nontrivial invariant subspace and, if it is nonscalar, a nontrivial hyperinvariant subspace as well. This prompts the most celebrated open question in operator theory, namely, the invariant subspace problem: Does every operator (on
an infinite-dimensional complex separable Hilbert space) have a nontrivial invariant subspace? All the qualifications are crucial here. Observe that the operator

    [  0  1
      −1  0 ]  on R²

has no nontrivial invariant subspace (when acting on the Euclidean real space but, of course, it has a nontrivial invariant subspace when acting on the unitary complex space C²). Thus the preceding question actually refers to complex spaces and, henceforward, we assume that all spaces are complex. The problem has a negative answer if we replace Hilbert space with Banach space. This (the invariant subspace problem in a Banach space) remained an open question for a long period up to the mid-1980s, when it was solved by Read (1984) and Enflo (1987), who constructed a Banach-space operator without a nontrivial invariant subspace. As we have just seen, the problem has an affirmative answer in a finite-dimensional space (of dimension greater than 1). It has an affirmative answer in a nonseparable Hilbert space too. Indeed, let T be any operator on a nonseparable Hilbert space H, and let x be any nonzero vector in H. Consider the orbit of x under T , {Tⁿx}n≥0 , so that (span{Tⁿx}n≥0)⁻ ≠ {0} is an invariant subspace for T (cf. Problem 4.23). Since {Tⁿx}n≥0 is a countable set, (span{Tⁿx}n≥0)⁻ ≠ H by Proposition 4.9(b). Hence (span{Tⁿx}n≥0)⁻ is a nontrivial invariant subspace for T . Completeness and boundedness are also crucial here. In fact, it can be shown that (1) there is an operator on an infinite-dimensional complex separable (incomplete) inner product space which has no nontrivial invariant subspace, and that (2) there is a (not necessarily bounded) linear transformation of a complex separable Hilbert space into itself without nontrivial invariant subspaces. However, for bounded linear operators on an infinite-dimensional complex separable Hilbert space, the invariant subspace problem remains a recalcitrant open question.
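The real-versus-complex distinction in the 2×2 example is easy to test: over R the rotation has no eigenvalues (hence no invariant line), while over C each eigenvector spans a nontrivial invariant subspace. A quick check:

```python
import numpy as np

T = np.array([[0.0, 1.0],
              [-1.0, 0.0]])       # the rotation operator above, on R^2 or C^2

# Over R: a nontrivial invariant subspace would be a line spanned by a real
# eigenvector, but the eigenvalues are +/- i, so no such line exists.
vals, vecs = np.linalg.eig(T)
assert not np.any(np.isclose(vals.imag, 0.0))     # no real eigenvalue

# Over C: each (complex) eigenvector spans a nontrivial invariant subspace.
v = vecs[:, 0]
assert np.allclose(T @ v, vals[0] * v)            # span{v} is T-invariant in C^2
```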
Suggested Reading Akhiezer and Glazman [1], [2] Arveson [2] Bachman and Narici [1] Beals [1] Beauzamy [1] Berberian [1], [3] Berezansky, Sheftel, and Us [1], [2] Clancey [1] Colojoarˇ a and Foia¸s [1] Conway [1], [2], [3] Douglas [1] Dowson [1] Dunford and Schwartz [3] Fillmore [1], [2] Furuta [1] Gustafson and Rao [1]
Halmos [1], [4] Helmberg [1] Herrero [1] Kubrusly [1], [2] Martin and Putinar [1] Naylor and Sell [1] Pearcy [1], [2] Putnam [1] Radjavi and Rosenthal [1], [2] Riesz and Sz.-Nagy [1] Sunder [1] Sz.-Nagy and Foia¸s [1] Taylor and Lay [1] Weidmann [1] Xia [1] Yoshino [1]
Problems

Problem 6.1. Let H be a Hilbert space. Show that the set of all normal operators from B[H] is closed in B[H].
Hint: (T∗ − S∗)(T − S) + (T∗ − S∗)S + S∗(T − S) = T∗T − S∗S, and hence

    ‖T∗T − S∗S‖ ≤ ‖T − S‖² + 2‖S‖‖T − S‖

for every T, S ∈ B[H]. Verify the above inequality. Now let {Nn}∞n=1 be a sequence of normal operators in B[H] that converges in B[H] to N ∈ B[H]. Check that

    ‖N∗N − N N∗‖ = ‖N∗N − Nn∗Nn + Nn Nn∗ − N N∗‖ ≤ ‖Nn∗Nn − N∗N‖ + ‖Nn Nn∗ − N N∗‖ ≤ 2(‖Nn − N‖² + 2‖N‖‖Nn − N‖).

Conclude: The (uniform) limit of a uniformly convergent sequence of normal operators is normal. Finally, apply the Closed Set Theorem.

Problem 6.2. Let S and T be normal operators acting on the same Hilbert space. Prove the following assertions.
(a) αT is normal for every scalar α.
(b) If S∗T = T S∗, then S + T , T S, and S T are normal operators.
(c) T∗ⁿTⁿ = TⁿT∗ⁿ = (T∗T )ⁿ = (T T∗)ⁿ for every integer n ≥ 0.
Hint: Problem 5.24 and Proposition 6.1.

Problem 6.3. Let T be a contraction on a Hilbert space H. Show that
(a) T∗ⁿTⁿ →s A,
(b) O ≤ A ≤ I (i.e., A ∈ B⁺[H] and ‖A‖ ≤ 1; a nonnegative contraction),
(c) T∗ⁿATⁿ = A for every integer n ≥ 0.
Hint: Take T ∈ B[H] with ‖T‖ ≤ 1. Use Proposition 5.84, Problem 5.49 and Proposition 5.68, Problems 4.45(a), 5.55, and 5.24(a).
According to Problem 5.54 a contraction T is strongly stable if and only if A = O. Since A ≥ O, it follows by Proposition 5.81 that A is an orthogonal projection if and only if it is idempotent (i.e., if and only if A = A²). In general, A may not be a projection.
(d) Consider the operator T = shift(α, 1, 1, 1, . . .) in B[ℓ₊²], a weighted shift on H = ℓ₊² with |α| ∈ (0, 1), and show that T is a contraction for which A = diag(|α|², 1, 1, 1, . . .) in B[ℓ₊²] is not a projection.
(e) Show that A = A² if and only if AT = T A.
Hint: Use part (c) to show that ⟨AT x ; T Ax⟩ = ‖Ax‖². Since ‖T‖ ≤ 1, check that ‖AT x − T Ax‖² ≤ ‖AT x‖² − ‖Ax‖². Recalling that ‖T‖ ≤ 1 and ‖A‖ ≤ 1, and using part (c), show that ‖Ax‖² ≤ ‖AT x‖² ≤ ‖A^{1/2} T x‖² = ⟨T∗AT x ; x⟩ = ⟨Ax ; x⟩ = ‖A^{1/2} x‖². Conclude: A = A² implies AT = T A. For the converse, use parts (a) and (c).
(f) Show that A = A² if T is a normal contraction.
Hint: Problems 6.2(c) and 5.24.
Remark: It can be shown that A = A² if T is a cohyponormal contraction.

Problem 6.4. Consider the Hilbert space L²(T ) of Example 5.L(c), where T denotes the unit circle about the origin of the complex plane. Recall that, in this context, the terms “bounded function”, “equality”, “inequality”, “belongs”, and “for all” are interpreted in the sense of equivalence classes. Let ϕ : T → C be a bounded function. Show that
(a) ϕf lies in L²(T ) for every f ∈ L²(T ).
Thus consider the mapping Mϕ : L²(T ) → L²(T ) defined by

    Mϕ f = ϕf for every f ∈ L²(T ).

That is, (Mϕ f )(z) = ϕ(z)f (z) for all z ∈ T . This mapping is called the multiplication operator on L²(T ). It is easy to show that Mϕ is linear and bounded (i.e., Mϕ ∈ B[L²(T )]). Prove the following propositions.
(b) ‖Mϕ‖ = ‖ϕ‖∞ .
Hint: Show that ‖Mϕ f‖ ≤ ‖ϕ‖∞‖f‖ for every f ∈ L²(T ). Take any ε > 0 and set Tε = {z ∈ T : ‖ϕ‖∞ − ε < |ϕ(z)|}. Let fε be the characteristic function of Tε . Show that fε ∈ L²(T ) and ‖Mϕ fε‖ ≥ (‖ϕ‖∞ − ε)‖fε‖.
(c) Mϕ∗ g = ϕ̄g for every g ∈ L²(T ).
(d) Mϕ is a normal operator.
(e) Mϕ is unitary if and only if ϕ(z) ∈ T for all z ∈ T .
(f) Mϕ is self-adjoint if and only if ϕ(z) ∈ R for all z ∈ T .
(g) Mϕ is nonnegative if and only if ϕ(z) ≥ 0 for all z ∈ T .
(h) Mϕ is positive if and only if ϕ(z) > 0 for all z ∈ T .
(i) Mϕ is strictly positive if and only if ϕ(z) ≥ α > 0 for all z ∈ T .

Problem 6.5. If T is a quasinormal operator, then
(a) (T∗T )ⁿT = T (T∗T )ⁿ for every n ≥ 0,
(b) |Tⁿ| = |T|ⁿ for every n ≥ 0, and
(c) Tⁿ →s O ⇐⇒ |T|ⁿ →s O ⇐⇒ |T|ⁿ →w O.
Hint: Prove (a) by induction. (b) holds trivially for n = 0, 1, for every operator T . If T is a quasinormal operator (so that (a) holds) and if |Tⁿ| = |T|ⁿ for some n ≥ 1, then verify that

    |Tⁿ⁺¹|² = T∗T∗ⁿTⁿT = T∗|Tⁿ|²T = T∗|T|²ⁿT = T∗(T∗T )ⁿT = T∗T (T∗T )ⁿ = (T∗T )ⁿ⁺¹ = |T|²⁽ⁿ⁺¹⁾ = (|T|ⁿ⁺¹)².

Now conclude the induction that proves (b) by recalling that the square root is unique. Use Problem 5.61(d) and part (b) to prove (c).

Problem 6.6. Every quasinormal operator is hyponormal. Give a direct proof.
Hint: Let T be an operator on a Hilbert space H. Take any x = u + v ∈ H = N (T∗) + N (T∗)⊥ = N (T∗) + R(T )⁻, with u ∈ N (T∗) and v ∈ R(T )⁻, so that v = limn vn , where {vn} is an R(T )-valued sequence (cf. Propositions 4.13, 5.20, 5.76 and 3.27). Set D = T∗T − T T∗. Verify that ⟨Du ; u⟩ = ‖T u‖². If T is quasinormal (i.e., DT = O), then ⟨Du ; v⟩ = limn ⟨u ; Dvn⟩ = 0, ⟨Dv ; u⟩ = limn ⟨Dvn ; u⟩ = 0, and ⟨Dv ; v⟩ = limn ⟨Dvn ; v⟩ = 0. Thus ⟨Dx ; x⟩ ≥ 0.

Problem 6.7. If T ∈ G[H] is hyponormal, then T⁻¹ is hyponormal.
Hint: O ≤ D = T∗T − T T∗. Then (Problem 5.51(a)) O ≤ T⁻¹DT⁻¹∗. Show that I ≤ T⁻¹T∗T T∗⁻¹, and so T∗T⁻¹T∗⁻¹T ≤ I (Problems 1.10 and 5.53(b)). Verify: O ≤ T⁻¹∗(I − T∗T⁻¹T∗⁻¹T )T⁻¹. Conclude: T⁻¹ is hyponormal.

Problem 6.8. If T ∈ G[H] is hyponormal and both T and T⁻¹ are contractions, then T is normal.
Hint: ‖T x‖ = ‖T (T∗)⁻¹T∗x‖ ≤ ‖T∗x‖, and so T is cohyponormal.
Remark: The above statement is just a particular case of the following general result. If T is invertible and both T and T⁻¹ are contractions, then T is unitary (and ‖T‖ = ‖T⁻¹‖ = 1) — see Proposition 5.73(a, d, and c).

Problem 6.9. Take any operator T ∈ B[H] on a Hilbert space H and take an arbitrary vector x ∈ H. Show that
(a) T∗T x = ‖T‖²x if and only if ‖T x‖ = ‖T‖‖x‖.
Hint: If ‖T x‖ = ‖T‖‖x‖, then ⟨T∗T x ; ‖T‖²x⟩ = ‖T‖⁴‖x‖². Therefore,

    ‖T∗T x − ‖T‖²x‖² = ‖T∗T x‖² − ‖T‖⁴‖x‖² ≤ (‖T∗T‖² − ‖T‖⁴)‖x‖² = 0.

Now consider the set M = {x ∈ H : ‖T x‖ = ‖T‖‖x‖} and prove the following assertions.
(b) M is a subspace of H.
Hint: M = N (‖T‖²I − T∗T ).
(c) If T is hyponormal, then M is T-invariant.
Hint: ‖T (T x)‖ ≤ ‖T‖‖T x‖ = ‖ ‖T‖²x ‖ = ‖T∗T x‖ ≤ ‖T (T x)‖ if x ∈ M.
(d) If T is normal, then M reduces T .
Hint: M is invariant for both T and T∗ whenever T is normal.
Note: M may be trivial (examples: T = I and T = diag({k/(k+1)}∞k=1 )).
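In finite dimensions the set M of Problem 6.9 is exactly the eigenspace of T∗T at its top eigenvalue ‖T‖², which is easy to verify numerically. A sketch (the random real matrix is only an illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
T = rng.standard_normal((3, 3))
norm = np.linalg.norm(T, 2)                 # operator norm (largest singular value)

# M = {x : ||Tx|| = ||T|| ||x||} = N(||T||^2 I - T*T): the eigenspace of T*T
# at its top eigenvalue ||T||^2.
w, V = np.linalg.eigh(T.T @ T)              # ascending eigenvalues, orthonormal V
x = V[:, -1]                                # unit eigenvector for the top eigenvalue
assert np.isclose(np.linalg.norm(T @ x), norm)          # x attains the norm
assert np.allclose(T.T @ T @ x, norm**2 * x)            # T*T x = ||T||^2 x

y = V[:, 0]                                 # eigenvector for the smallest eigenvalue
assert np.isclose(np.linalg.norm(T @ y), np.sqrt(abs(w[0])))  # ||Ty||^2 = <T*Ty ; y>
```

For a generic matrix the top eigenvalue of T∗T is simple, so M is a line; the diagonal example in the Note shows that M can also collapse to {0} in infinite dimensions.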
Problem 6.10. Let H ≠ {0} and K ≠ {0} be complex Hilbert spaces. Take T ∈ B[H] and W ∈ G[H, K] arbitrary. Recall that H and K are unitarily equivalent, according to Problem 5.70. Show that

    σP (T ) = σP (W T W⁻¹) and ρ(T ) = ρ(W T W⁻¹).

Thus conclude that (see Proposition 6.17)

    σR (T ) = σR (W T W⁻¹) and σ(T ) = σ(W T W⁻¹).

Finally, verify that

    σC (T ) = σC (W T W⁻¹).
Outcome: Similarity preserves each part of the spectrum, and so similarity preserves the spectral radius: r(T ) = r(W T W⁻¹). That is, if T ∈ B[H] and S ∈ B[K] are similar (i.e., if T ≈ S), then σP (T ) = σP (S), σR (T ) = σR (S), σC (T ) = σC (S), and so r(T ) = r(S). Use Problem 4.41 to show that unitary equivalence also preserves the norm (i.e., if T ≅ S, then ‖T‖ = ‖S‖). Note: Similarity preserves nontrivial invariant subspaces (Problem 4.29).

Problem 6.11. Let A ∈ B[H] be a self-adjoint operator on a complex Hilbert space H ≠ {0}. Use Corollary 6.18(d) to check that ±i ∈ ρ(A), so that A + iI and A − iI both lie in G[H]. Consider the operator

    U = (A − iI)(A + iI)⁻¹ = (A + iI)⁻¹(A − iI) in G[H],

where U⁻¹ = (A + iI)(A − iI)⁻¹ = (A − iI)⁻¹(A + iI) (cf. Corollary 4.23). Note that A commutes with (A + iI)⁻¹ and with (A − iI)⁻¹ because every operator commutes with its resolvent function. Show that
(a) U is unitary,
(b) U = I − 2i(A + iI)⁻¹ = 2A(A + iI)⁻¹ − I,
(c) 1 ∈ ρ(U ) and A = i(I + U )(I − U )⁻¹ = i(I − U )⁻¹(I + U ).
Hint: (a) ‖(A ± iI)x‖² = ‖Ax‖² + ‖x‖² (since A = A∗, and so 2 Re⟨Ax ; ix⟩ = 0) for every x ∈ H. Take any y ∈ H, so that y = (A + iI)x for some x ∈ H (recall: R(A + iI) = H). Then U y = (A − iI)x and ‖U y‖² = ‖(A − iI)x‖² = ‖(A + iI)x‖² = ‖y‖², so that U is an isometry. (b) A − iI = −2iI + (A + iI) = 2A − (A + iI). (c) (I − U )⁻¹ = (1/2i)(A + iI) and I + U = I + (A − iI)(A + iI)⁻¹.
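Parts (a)–(c) admit a direct numerical check for Hermitian matrices. A sketch (the random matrix is only an illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (B + B.conj().T) / 2                     # a self-adjoint (Hermitian) matrix
I = np.eye(4)
inv = np.linalg.inv

U = (A - 1j * I) @ inv(A + 1j * I)           # the Cayley transform of A

assert np.allclose(U @ U.conj().T, I)                     # (a) U is unitary
assert np.allclose(U, I - 2j * inv(A + 1j * I))           # (b)
assert not np.any(np.isclose(np.linalg.eigvals(U), 1.0))  # (c) 1 lies in rho(U)
A_back = 1j * (I + U) @ inv(I - U)           # inverse Cayley transform
assert np.allclose(A_back, A)
```

The eigenvalue check reflects the Möbius picture mentioned below: z → (z − i)/(z + i) sends the real eigenvalues of A to points of the unit circle other than 1.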
Conversely, let U ∈ G[H] be a unitary operator with 1 ∈ ρ(U ) (so that I − U lies in G[H]) and consider the operator

    A = i(I + U )(I − U )⁻¹ = i(I − U )⁻¹(I + U ) in B[H].

Recall again: U commutes with (I − U )⁻¹. Show that
(d) A = iI + 2iU (I − U )⁻¹ = −iI + 2i(I − U )⁻¹,
(e) A is self-adjoint,
(f) ±i ∈ ρ(A) and U = (A − iI)(A + iI)⁻¹ = (A + iI)⁻¹(A − iI).
Hint: (d) i(I + U ) = i(I − U ) + 2iU = −i(I − U ) + 2iI. (e) (I − U )⁻¹∗U∗ = (I − U∗)⁻¹U⁻¹ = (U (I − U∗))⁻¹ = −(I − U )⁻¹ (since U (I − U∗) = U − I), and therefore A∗ = −iI − 2i(I − U )⁻¹∗U∗ = −iI + 2i(I − U )⁻¹ = A. (f) Using (d) we get A − iI = 2iU (I − U )⁻¹ and A + iI = 2i(I − U )⁻¹, so that (A + iI)⁻¹ = (1/2i)(I − U ).

Summing up: Set U = (A − iI)(A + iI)⁻¹ for an arbitrary self-adjoint operator A. U is unitary with 1 ∈ ρ(U ) and i(I + U )(I − U )⁻¹ = A. Conversely, set A = i(I + U )(I − U )⁻¹ for any unitary operator U with 1 ∈ ρ(U ). A is self-adjoint and (A − iI)(A + iI)⁻¹ = U . Outcome: There is a one-to-one correspondence between the class of all self-adjoint operators and the class of all unitary operators for which 1 belongs to the resolvent set, namely, the mapping A → (A − iI)(A + iI)⁻¹ with inverse U → i(I + U )(I − U )⁻¹. If A is a self-adjoint operator, then the unitary operator U = (A − iI)(A + iI)⁻¹ is called the Cayley transform of A. What is behind such a one-to-one correspondence is the Möbius transformation z → (z − i)/(z + i), which maps the open upper half plane onto the open unit disk, and the extended real line onto the unit circle.

Remark: Take any A in L[D(A), H], where D(A), the domain of A, is a linear manifold of H. Suppose A − I is injective (i.e., 1 ∉ σP (A)), so that it has an inverse (not necessarily bounded) on its range; that is, A − I has an inverse (A − I)⁻¹ on R(A − I). Then set S = (A + I)(A − I)⁻¹ in L[R(A − I), H]. Observe that, on R(A − I),

    S − I = 2(A − I)⁻¹ and S + I = 2A(A − I)⁻¹.

Indeed, S = (A − I + 2I)(A − I)⁻¹ = (A − I)(A − I)⁻¹ + 2(A − I)⁻¹ and S = (2A − A + I)(A − I)⁻¹ = 2A(A − I)⁻¹ − (A − I)(A − I)⁻¹. Thus S − I is injective (i.e., 1 ∉ σP (S), because (A − I)⁻¹ in L[R(A − I), H] is injective), so that it has an inverse (S − I)⁻¹ = ½(A − I) on its range R(S − I), and hence (S + I)(S − I)⁻¹ = A. The domain of (S − I)⁻¹, which is the range of S − I, coincides with the domain of A, so that R(S − I) = D(A). Thus, similarly, on R(S − I) = D(A),
    A − I = 2(S − I)⁻¹ and A + I = 2S(S − I)⁻¹.

In fact, A = (S − I + 2I)(S − I)⁻¹ = (S − I)(S − I)⁻¹ + 2(S − I)⁻¹ and A = (2S − S + I)(S − I)⁻¹ = 2S(S − I)⁻¹ − (S − I)(S − I)⁻¹. It is also usual to refer to S as the Cayley transform of A. Even if D(A) = H, we may not apply Corollary 4.24 to infer boundedness for the inverse of S − I because the domain of S − I is R(A − I), which may not be a Banach space. It is worth noticing that if A ∈ B[H] is self-adjoint with 1 ∈ ρ(A), then S ∈ B[H] may not be unitary. Example: A = αI with 1 ≠ α ∈ R is self-adjoint and A − I has a bounded inverse, but S = ((α + 1)/(α − 1))I is not unitary if |(α + 1)/(α − 1)| ≠ 1 (i.e., if α ≠ 0).

Problem 6.12. Let T ∈ B[H] be any operator on a complex Hilbert space H of dimension greater than 1. Prove the following assertions.
(a) N (λI − T ) is a subspace of H, which is T-invariant for every λ ∈ C. Moreover, T |N(λI−T) = λI : N (λI − T ) → N (λI − T ). That is, if restricted to the invariant subspace N (λI − T ), then T acts as a scalar operator on N (λI − T ), and hence T |N(λI−T) is normal.
Remark: Obviously, if N (λI − T ) = {0} (in other words, if λ ∉ σP (T )), then T |N(λI−T) = λI : {0} → {0} coincides with the null operator: on the null space every operator is null or, equivalently, the only operator on the null space is the null operator.
(b) N (λI − T ) ⊆ N (λ̄I − T∗) if and only if N (λI − T ) reduces T .
Hint: Take x in N (λI − T ), so that T x = λx and T x ∈ N (λI − T ) because N (λI − T ) is T-invariant. If N (λI − T ) ⊆ N (λ̄I − T∗), then T∗x = λ̄x. Thus T (T∗x) = T (λ̄x) = λ̄ T x = λ̄λx = λ(λ̄x) = λ(T∗x), and therefore T∗x ∈ N (λI − T ). Now apply Corollary 5.75. Conversely, if N (λI − T ) reduces T , then N (λI − T ) reduces λI − T . Consider the decomposition H = N (λI − T ) ⊕ N (λI − T )⊥. Thus λI − T = O ⊕ S for some operator S on N (λI − T )⊥ (since (λI − T )|N(λI−T) = O). Hence λ̄I − T∗ = O ⊕ S∗. Then N (λ̄I − T∗) = N (λI − T ) ⊕ N (S∗). So N (λI − T ) ⊆ N (λ̄I − T∗).

(c) An operator T is reducible if and only if λI − T is reducible for every λ.
Hint: Recall that an operator is reducible if and only if it commutes with a nontrivial orthogonal projection; that is, if and only if it commutes with an orthogonal projection onto a nontrivial subspace (cf. observation that precedes Proposition 5.74).
(d) Every eigenspace of a nonscalar operator T is a nontrivial hyperinvariant subspace for T (i.e., if λ ∈ σP (T ), then {0} ≠ N (λI − T ) ≠ H, and N (λI − T ) is S-invariant for every S that commutes with T ).
(e) If T is nonscalar and σP (T ) ∪ σR (T ) ≠ ∅ (i.e., σP (T ) ∪ σP (T∗) ≠ ∅), then T has a nontrivial hyperinvariant subspace.
(f) If T has no nontrivial invariant subspace, then σ(T ) = σC (T ).
Hint: Problems 4.20(c) and 4.26, and Propositions 5.74 and 6.17.

Problem 6.13. We have already seen in Section 6.3 that σ(T⁻¹) = σ(T )⁻¹ = {λ⁻¹ ∈ C : λ ∈ σ(T )} for every T ∈ G[X ], where X ≠ {0} is a complex Banach space. Exhibit a diagonal operator T in G[C²] for which r(T⁻¹) ≠ r(T )⁻¹.

Problem 6.14. Let T be an arbitrary operator on a complex Banach space X ≠ {0}, take any λ ∈ ρ(T ) (so that (λI − T ) ∈ G[X ]), and set d = d(λ, σ(T )), the distance of λ to σ(T ). Since σ(T ) is nonempty and closed, it follows that d is a positive real number (cf. Problem 3.43(b)). Show that the spectral radius of the inverse of λI − T coincides with the inverse of the distance of λ ∈ ρ(T ) to the spectrum of T . That is,

(a) r((λI − T )⁻¹) = d⁻¹.

Hint: Since d = inf μ∈σ(T) |λ − μ|, it follows that d⁻¹ = supμ∈σ(T) |λ − μ|⁻¹. Why? Recall from the Spectral Mapping Theorem (Theorem 6.19) that σ(λI − T ) = {λ − μ ∈ C : μ ∈ σ(T )}. Since σ((λI − T )⁻¹) = σ(λI − T )⁻¹ = {μ⁻¹ ∈ C : μ ∈ σ(λI − T )}, then σ((λI − T )⁻¹) = {(λ − μ)⁻¹ ∈ C : μ ∈ σ(T )}.

Now let X be a Hilbert space and prove the following implication.

(b) If T is hyponormal, then ‖(λI − T )⁻¹‖ = d⁻¹.

Hint: If T is hyponormal, then λI − T is hyponormal (cf. proof of Corollary 6.18) and so is (λI − T )⁻¹ (Problem 6.7). Hence (λI − T )⁻¹ is normaloid by Proposition 6.10. Apply (a).

Problem 6.15. Let M be a subspace of a Hilbert space H and take T ∈ B[H]. If M is T-invariant, then (T |M )∗ = P T∗|M in B[M], where P : H → H is the orthogonal projection onto M.
Hint: Use Proposition 5.81 to verify that, for every u, v ∈ M,

    ⟨(T |M )∗u ; v⟩ = ⟨u ; T |M v⟩ = ⟨u ; T v⟩ = ⟨u ; T P v⟩ = ⟨P T∗u ; v⟩ = ⟨P T∗|M u ; v⟩.

In other words, if M is T-invariant, then T (M) ⊆ M (so that T |M lies in B[M]), but T∗(M) may not be included in M; it has to be projected there: P T∗(M) ⊆ M (so that P T∗|M lies in B[M] and coincides with (T |M )∗). If M reduces T (i.e., if M also is T∗-invariant), then T∗(M) does not need to be projected on M; it is already there (i.e., if M reduces T , then T∗(M) ⊆ M and (T |M )∗ = T∗|M — see Corollary 5.75).

Problem 6.16. Let M be an invariant subspace for T ∈ B[H].
(a) If T is hyponormal, then T |M is hyponormal.
(b) If T is hyponormal and T |M is normal, then M reduces T .
Hint: T |M ∈ B[M]. Use Problem 6.15 (and Propositions 5.81 and 6.6) to show that ‖(T |M )∗u‖ ≤ ‖T∗u‖ ≤ ‖T |M u‖ for every u ∈ M. If T |M is normal, say T |M = N , then

    T = [ N  X
          O  Y ]  in B[M ⊕ M⊥]

(cf. Example 2.O and Proposition 5.51). Since N is normal and T is hyponormal, verify that

    O ≤ D = T∗T − T T∗ = [ −XX∗          N∗X − XY∗
                           X∗N − Y X∗    X∗X + Y∗Y − Y Y∗ ].

Take u in M, set x = (u, 0) in M ⊕ M⊥, and show that ⟨Dx ; x⟩ = −‖X∗u‖². Conclude: X = O, so that T = N ⊕ Y , and hence M reduces T .

Problem 6.17. This is a rather important result. Let M be an invariant subspace for a normal operator T ∈ B[H]. Show that T |M is normal if and only if M reduces T .
Hint: If T is normal, then it is hyponormal. Apply Problem 6.16 to verify that M reduces T whenever T |M is normal. Conversely, if M reduces T , then T = N1 ⊕ N2 on M ⊕ M⊥, with N1 = T |M ∈ B[M] and N2 = T |M⊥ ∈ B[M⊥]. Now verify that both N1 and N2 are normal operators whenever T is normal.

Problem 6.18. Let T be a compact operator on a complex Hilbert space H and let D be the open unit disk about the origin of the complex plane.
(a) Show that σP (T ) ⊆ D implies Tⁿ →u O.
Hint: Corollary 6.31 and Proposition 6.22.
(b) Show that Tⁿ →w O implies σP (T ) ⊆ D.
Hint: If λ ∈ σP (T ), then verify that there exists a unit vector x in H such that Tⁿx = λⁿx for every positive integer n. Thus |λ|ⁿ → 0, and hence |λ| < 1, whenever Tⁿ →w O (cf. Proposition 5.67).
Conclude: The concepts of weak, strong, and uniform stabilities coincide for a compact operator on a complex Hilbert space.

Problem 6.19. If T ∈ B[H] is hyponormal, then

    N (λI − T ) ⊆ N (λ̄I − T∗) for every λ ∈ C.

Hint: Adapt the proof of Proposition 6.39.

Problem 6.20. Take λ, μ ∈ C. If T ∈ B[H] is hyponormal, then

    N (λI − T ) ⊥ N (μI − T ) whenever λ ≠ μ.

Hint: Adapt the proof of Proposition 6.40 by using Problem 6.19.
Problem 6.21. If T ∈ B[H] is hyponormal, then N (λI − T ) reduces T for every λ ∈ C.
Hint: Adapt the proof of Proposition 6.41. First observe that, if λx = T x, then T∗x = λ̄x (by Problem 6.19). Next verify that λT∗x = λλ̄x = λ̄λx = λ̄T x = T (λ̄x) = T T∗x. Then conclude: N (λI − T ) is T∗-invariant.
Note: T |N(λI−T) is a scalar operator on N (λI − T ), and so a normal operator (Problem 6.12(a)). An operator is called completely nonnormal if it has no normal direct summand (i.e., if it has no nonzero reducing subspace on which it acts as a normal operator; equivalently, if the restriction of it to every nonzero reducing subspace is not normal). A pure hyponormal is a completely nonnormal hyponormal operator. Use the above result to prove the following assertion. A pure hyponormal operator has an empty point spectrum.

Problem 6.22. Let T ∈ B[H] be a hyponormal operator. Show that

    M = (Σλ∈σP(T) N (λI − T ))⁻ reduces T and T |M is normal.

Hint: If σP (T ) = ∅, then the result is trivial (for the empty sum is null). Thus suppose σP (T ) ≠ ∅. First note that {N (λI − T )}λ∈σP(T) is an orthogonal family of nonzero subspaces of the Hilbert space M (Problem 6.20). Now choose one of the following methods.
(1) Adapt the proof of Corollary 6.42, with the help of Problems 6.20 and 6.21, to verify that M reduces T . Use Theorem 5.59 and Problem 5.10 to check that the family {Pλ}λ∈σP(T) consisting of the nonzero orthogonal projections Pλ ∈ B[M] onto each N (λI − T ) is a resolution of the identity on M. Take any u ∈ M. Verify that u = Σλ Pλ u, and so T |M u = T u = Σλ T Pλ u = Σλ λPλ u (reason: Pλ u ∈ N (λI − T ), where the sums run over σP (T )). Conclude that T |M ∈ B[M] is a weighted sum of projections. Apply Proposition 6.36.
(2) Use Example 5.J and Problem 5.10 to identify the topological sum M with the orthogonal direct sum ⊕λ∈σP(T) N (λI − T ). Since each N (λI − T ) reduces T (Problem 6.21), it follows that M reduces T , and also that each N (λI − T ) reduces T |M ∈ B[M]. Therefore, T |M = ⊕λ∈σP(T) T |N(λI−T) . But each T |N(λI−T) is normal (in fact, a scalar operator — Problem 6.12), which implies that T |M is normal (actually, a weighted sum of projections).

Problem 6.23. Every compact hyponormal operator is normal.
Hint: Let T ∈ B[H] be a compact hyponormal operator on a Hilbert space H. Consider the subspace M of Problem 6.22. If λ ∈ σP (T |M⊥ ), then there is a nonzero vector v ∈ M⊥ such that λv = T |M⊥ v = T v, and hence v ∈ N (λI − T ) ⊆ M, which is a contradiction. Thus σP (T |M⊥ ) = ∅. Recall that T |M⊥ is compact (Section 4.9) and hyponormal (Problem 6.16). Then M⊥ = {0} (use Corollary 6.32). Apply Problem 6.22 to show that T is normal.
6. The Spectral Theorem
Remark: According to the above result, on a finite-dimensional Hilbert space, quasinormality, subnormality, and hyponormality all collapse to normality (and so isometries become unitaries; see Problem 4.38(d)).
Problem 6.24. Let T ∈ B[H] be a weighted sum of projections on a complex Hilbert space H ≠ {0}. That is,
    Tx = ∑_{γ∈Γ} λγ Pγ x  for every  x ∈ H,
where {Pγ}γ∈Γ is a resolution of the identity on H with Pγ ≠ O for all γ ∈ Γ, and {λγ}γ∈Γ is a (similarly indexed) bounded family of scalars. Recall from Proposition 6.36 that T is normal. Now prove the following equivalences.
(a) T is unitary ⟺ λγ ∈ T for all γ ⟺ σ(T) ⊆ T.
(b) T is self-adjoint ⟺ λγ ∈ R for all γ ⟺ σ(T) ⊆ R.
(c) T is nonnegative ⟺ λγ ∈ [0, ∞) for all γ ⟺ σ(T) ⊆ [0, ∞).
(d) T is positive ⟺ λγ ∈ (0, ∞) for all γ.
(e) T is strictly positive ⟺ λγ ∈ [α, ∞) for all γ ⟺ σ(T) ⊆ [α, ∞).
(f) T is a projection ⟺ λγ ∈ {0, 1} for all γ ⟺ σ(T) = σP(T) ⊆ {0, 1}.
Note: In part (a), T denotes the unit circle about the origin of the complex plane. In part (e), α is some positive real number. In part (f), projection means orthogonal projection (Proposition 6.2).
Problem 6.25. Let T be an operator on a complex (nonzero) Hilbert space H. Show that
(a) T is diagonalizable if and only if H has an orthonormal basis made up of eigenvectors of T.
Hint: If {eγ} is an orthonormal basis for H, where each eγ is an eigenvector of T, then (by the Fourier Series Theorem) show that the resolution of the identity on H of Proposition 5.57 diagonalizes T. Conversely, if T is diagonalizable, every nonzero vector in R(Pγ) is an eigenvector of T. Let Bγ be an orthonormal basis for the Hilbert space R(Pγ). Since ⊕γ R(Pγ) = H (Theorem 5.59 and Problem 5.10), use Problem 5.11 to show that ⋃γ Bγ is an orthonormal basis for H consisting of eigenvectors of T.
If there exists an orthonormal basis {eγ}γ∈Γ for H and a bounded family of scalars {λγ}γ∈Γ such that Tx = ∑_{γ∈Γ} λγ⟨x ; eγ⟩eγ for every x ∈ H, then we say that T is (or acts as) a diagonal operator with respect to the basis {eγ}γ∈Γ (cf. Problem 5.17). Use part (a) to show that
(b) T is diagonalizable if and only if it is a diagonal operator with respect to some orthonormal basis for H.
Problems
Now let {eγ}γ∈Γ be an orthonormal basis for H and consider the Hilbert space ℓ²(Γ) of Example 5.K. Let {λγ}γ∈Γ be a bounded family of scalars and consider the mapping D: ℓ²(Γ) → ℓ²(Γ) defined by
    Dz = {λγ ζγ}γ∈Γ  for every  z = {ζγ}γ∈Γ ∈ ℓ²(Γ).
In fact, Dz ∈ ℓ²(Γ) for all z ∈ ℓ²(Γ), D ∈ B[ℓ²(Γ)], and ‖D‖ = sup_{γ∈Γ}|λγ| (hint: Example 4.H). This is called a diagonal operator on ℓ²(Γ). Show that
(c) T is diagonalizable if and only if it is unitarily equivalent to a diagonal operator.
Hint: Let {eγ}γ∈Γ be an orthonormal basis for H and consider the natural mapping (cf. Theorem 5.48) U: H → ℓ²(Γ) given by Ux = {⟨x ; eγ⟩}γ∈Γ for every x = ∑_{γ∈Γ}⟨x ; eγ⟩eγ. Verify that U is unitary (i.e., a linear surjective isometry; see the proof of Theorem 5.49), and use part (b) to show that the diagram

              T
       H ---------> H
       |            ^
     U |            | U*
       v            |
     ℓ²(Γ) ------> ℓ²(Γ)
              D

commutes if and only if T is diagonalizable, where D is a diagonal operator on the Hilbert space ℓ²(Γ).
Problem 6.26. If T is a normal operator on a complex Hilbert space H, then
(a) T is unitary if and only if σ(T) ⊆ T,
(b) T is self-adjoint if and only if σ(T) ⊆ R,
(c) T is nonnegative if and only if σ(T) ⊆ [0, ∞),
(d) T is strictly positive if and only if σ(T) ⊆ [α, ∞) for some α > 0,
(e) T is an orthogonal projection if and only if σ(T) ⊆ {0, 1}.
Hint: Recall that T stands for the unit circle about the origin of the complex plane. Half of this problem was solved in Corollary 6.18. To verify the other half, use the spectral decomposition (Theorem 6.47), T = ∫_{σ(T)} λ dPλ, which is an abbreviation of ⟨Tx ; x⟩ = ∫_{σ(T)} λ d⟨Pλx ; x⟩ for every x ∈ H; and recall that T* = ∫_{σ(T)} λ̄ dPλ and T*T = ∫_{σ(T)} |λ|^2 dPλ = TT*.
Problem 6.27. Let T ∈ B[H] be a hyponormal operator. Prove the following implications.
(a) If σ(T) ⊆ T, then T is unitary.
Hint: If σ(T) is included in the unit circle T, then 0 ∈ ρ(T), and so T is in G[H]. Since T is hyponormal, verify that ‖T‖ = r(T) = 1. Use Problem 6.7 to check that T^{-1} is hyponormal. Show that ‖T^{-1}‖ = 1 (recall: σ(T^{-1}) = σ(T)^{-1} = T) and conclude from Problem 6.8 that T is unitary.
(b) If σ(T) ⊆ R, then T is self-adjoint.
Hint: Take any 0 ≠ α ∈ R. Since αi ∈ ρ(T) and T is hyponormal, use Problem 6.14 to show that ‖(αiI − T)^{-1}‖ = d^{-1} ≤ |α|^{-1}. Now verify that
    α^2‖(αiI − T)^{-1}(αiI − T)x‖^2 ≤ α^2‖(αiI − T)^{-1}‖^2‖(αiI − T)x‖^2 ≤ α^2‖x‖^2 + ‖Tx‖^2 − 2 Re⟨αix ; Tx⟩,
thus
    −2α Im⟨Tx ; x⟩ = −2 Im⟨αx ; Tx⟩ = 2 Re⟨αix ; Tx⟩ ≤ ‖Tx‖^2,
and so Im⟨Tx ; x⟩ = 0 (with α = −1/2), for every x ∈ H. Use Proposition 5.79.
(c) If σ(T) ⊆ [0, ∞), then T is nonnegative.
(d) If σ(T) ⊆ [α, ∞) for some α > 0, then T is strictly positive.
(e) If σ(T) ⊆ {0, 1}, then T is an orthogonal projection.
Hint: Use part (b) and Problem 6.26 to prove (c), (d), and (e).
Problem 6.28. Prove the following assertion.
(a) An isolated point of the spectrum of a normal operator is an eigenvalue.
Hint: Let N = ∫ λ dPλ be a normal operator on a Hilbert space H and let λ0 be an isolated point of σ(N). Apply Theorems 6.47 and 6.48 to show that:
(1) P({λ0}) ≠ O,
(2) R(P({λ0})) ≠ {0} reduces N,
(3) N|R(P({λ0})) = N P({λ0}) = ∫ λ χ{λ0} dPλ = λ0 P({λ0}), where χ{λ0} is the characteristic function of {λ0}, and
(4) (λ0I − N)u = λ0 P({λ0})u − N|R(P({λ0}))u = 0 if u ∈ R(P({λ0})).
An important result in operator theory is the Riesz Decomposition Theorem, which reads as follows. If T is an operator on a complex Hilbert space, and if σ(T) = σ1 ∪ σ2, where σ1 and σ2 are disjoint nonempty closed sets in C, then T has a complementary (not necessarily orthogonal) pair of nontrivial invariant subspaces {M1, M2} such that σ(T|M1) = σ1 and σ(T|M2) = σ2. Use the Riesz Decomposition Theorem to prove the next assertion.
(b) An isolated point of the spectrum of a hyponormal operator is an eigenvalue. Hint : Let λ1 be an isolated point of the spectrum σ(T ) of a hyponormal operator T ∈ B[H]. Show that σ(T ) = {λ1 } ∪ σ2 for some nonempty closed set σ2
that does not contain λ1. Apply the Riesz Decomposition Theorem to ensure that T has a nontrivial invariant subspace M such that σ(T|M) = {λ1}. Set H = T|M on M ≠ {0}. Show that λ1I − H is a hyponormal (thus normaloid) operator and σ(λ1I − H) = {0}. Conclude that T|M = H = λ1I in B[M].
Remark: An operator is isoloid if isolated points of the spectrum are eigenvalues. Thus item (b) simply says that hyponormal operators are isoloid. Apply Problem 6.21 to prove the following assertion.
(c) A pure hyponormal operator has no isolated point in its spectrum.
Problem 6.29. Let S and T be normal operators acting on the same Hilbert space. Prove the following assertion. If ST = TS, then S + T, TS, and ST are normal operators.
Hint: Corollary 6.49 and Problem 6.2. (Compare with Problem 6.2(b).)
Problem 6.30. The operators in this problem act on a complex Hilbert space of dimension greater than 1. Recall from Problem 4.22:
(a) Every nilpotent operator has a nontrivial invariant subspace.
However, it is still an open question whether every quasinilpotent operator has a nontrivial invariant subspace. In other words, the invariant subspace problem remains unanswered for quasinilpotent operators on a Hilbert space (but not for quasinilpotent operators on a nonreflexive Banach space, where the answer is in the negative: the existence of quasinilpotent operators on ℓ¹ without nontrivial invariant subspaces was proved by Read in 1997). Now shift from nilpotent to normal operators, and recall from the Spectral Theorem:
(b) Every normal operator has a nontrivial invariant subspace.
Next prove the following propositions.
(c) Every quasinormal operator has a nontrivial invariant subspace.
Hint: (T*T − TT*)T = O. Use Problem 4.21.
(d) Every isometry has a nontrivial invariant subspace.
Every subnormal operator has a nontrivial invariant subspace. This is a deep result proved by S. Brown in 1978.
However, it is still unknown whether every hyponormal operator has a nontrivial invariant subspace.
Problem 6.31. Consider the direct sum ⊕k Tk ∈ B[⊕k Hk] with each Tk in B[Hk], where the Hilbert space ⊕k Hk is the (orthogonal) direct sum of a countable collection {Hk} of Hilbert spaces (as in Examples 5.F and 5.G and Problems 4.16 and 5.28). Also see Problems 5.71 and 5.73. Show that
(a) ⊕k Tk is normal, quasinormal, or hyponormal if and only if each Tk is.
(b) If every Tk is normaloid, then so is ⊕k Tk. However, the converse fails.
Hint: Set T = ⊕k Tk and use Problems 4.16(b) and 5.71(b) to verify that ‖T^n‖ = sup_k ‖Tk^n‖ = sup_k ‖Tk‖^n = (sup_k ‖Tk‖)^n = ‖T‖^n if every Tk is normaloid. Show that T = S ⊕ Q is normaloid if S is a normaloid nonstrict contraction (‖S^n‖ = ‖S‖^n = 1) and Q is a nilpotent nonzero contraction (‖Q^2‖ = 0 < ‖Q‖ ≤ 1). Example: ‖T^n‖ = ‖T‖^n = 1 for T = 1 ⊕ (0 1; 0 0).
Problem 6.32. Take any operator T acting on a normed space. Lemma 6.8 says that the sequence {‖T^n‖^{1/n}} converges and its limit was denoted by r(T). That is, r(T) = lim_n ‖T^n‖^{1/n}. Now consider the following assertions.
(a) r(T) = ‖T‖.
(b) ‖T^n‖ = ‖T‖^n for every integer n ≥ 1.
(c) ‖T^n‖ = ‖T‖^n for every integer n ≥ m for some integer m ≥ 1.
(d) ‖T^{1+jk}‖ = ‖T‖^{1+jk} for every integer j ≥ 0 for some integer k ≥ 1.
A normaloid operator was defined as one for which (a) holds true. Proposition 6.9 says that (a) and (b) are equivalent. Show that the above assertions are all equivalent so that T is normaloid if and only if any of them holds true.
Hint: (b) trivially implies (c) and (d), and (c) implies (a) as (b) implies (a). (d) implies (a) as follows. If (d) holds, then there is a subsequence {T^{n_j}} of {T^n}, viz., T^{n_j} = T^{1+jk}, such that lim_j ‖T^{n_j}‖^{1/n_j} = lim_j (‖T‖^{n_j})^{1/n_j} = ‖T‖. Since the convergent sequence {‖T^n‖^{1/n}} has a subsequence that converges to ‖T‖, then {‖T^n‖^{1/n}} converges itself to the same limit. Thus (a) holds true.
Remark: An independent proof of (c)⇒(b) without using (a): If ‖T^n‖ = ‖T‖^n for every n ≥ m for some m ≥ 1, then, for every 1 ≤ k ≤ m,
    ‖T‖^k = ‖T‖^m / ‖T‖^{m−k} = ‖T^m‖ / ‖T‖^{m−k} ≤ ‖T‖^{m−k}‖T^k‖ / ‖T‖^{m−k} = ‖T^k‖ ≤ ‖T‖^k
so that ‖T^k‖ = ‖T‖^k, and hence ‖T^n‖ = ‖T‖^n holds for every n ≥ 1.
Problem 6.33. An operator T ∈ B[X] on a normed space X is called k-paranormal if there exists an integer k ≥ 1 such that
    ‖Tx‖^{k+1} ≤ ‖T^{k+1}x‖ ‖x‖^k  for every  x ∈ X.
A paranormal operator is simply a 1-paranormal operator: T is paranormal if
    ‖Tx‖^2 ≤ ‖T^2 x‖ ‖x‖  for every  x ∈ X.
Prove the following assertions.
(a) Every hyponormal operator is paranormal (if X is a Hilbert space).
Hint: If T*T − TT* ≥ O, then T*(T*T − TT*)T ≥ O, which means that |T|^4 ≤ |T^2|^2. Thus |T|^2 ≤ |T^2| (cf. remark in Problem 5.59). So ‖Tx‖^2 = ⟨|T|^2 x ; x⟩ ≤ ⟨|T^2|x ; x⟩ ≤ ‖|T^2|x‖ ‖x‖ = ‖T^2 x‖ ‖x‖ by Problem 5.61(c).
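A quick numerical sanity check of the paranormal inequality in (a), not from the book and assuming NumPy is available; a diagonal matrix is normal, hence hyponormal, hence paranormal.

```python
import numpy as np

rng = np.random.default_rng(1)

# A diagonal matrix is normal, hence hyponormal, hence paranormal by (a).
T = np.diag(np.array([2.0, -1.0 + 1j, 0.5j, 3.0]))
T2 = T @ T

# Check ||Tx||^2 <= ||T^2 x|| ||x|| on a batch of random complex vectors.
ok = True
for _ in range(200):
    x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    ok &= np.linalg.norm(T @ x) ** 2 <= np.linalg.norm(T2 @ x) * np.linalg.norm(x) + 1e-9
assert ok
```

For a diagonal T the inequality is exactly the Cauchy–Schwarz inequality applied to the sequences (|t_i|^2|x_i|) and (|x_i|).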
(b) Every k-paranormal operator is normaloid.
Hint: Suppose T is a nonzero k-paranormal. Take any integer j ≥ 1. Thus ‖T^j x‖^{k+1} = ‖T(T^{j-1}x)‖^{k+1} ≤ ‖T^{k+1}(T^{j-1}x)‖ ‖T^{j-1}x‖^k = ‖T^{k+j}x‖ ‖T^{j-1}x‖^k ≤ ‖T^{k+j}‖ ‖T^{j-1}‖^k ‖x‖^{k+1} for every x ∈ X, which implies ‖T^j‖^{k+1} ≤ ‖T^{k+j}‖ ‖T^{j-1}‖^k. Therefore, if ‖T^j‖ = ‖T‖^j for some j ≥ 1 (which holds tautologically for j = 1), then ‖T‖^{(k+1)j} = (‖T‖^j)^{k+1} = ‖T^j‖^{k+1} ≤ ‖T^{k+j}‖ ‖T^{j-1}‖^k ≤ ‖T^{k+j}‖ ‖T‖^{(j-1)k} so that ‖T‖^{k+j} = ‖T‖^{(k+1)j} / ‖T‖^{(j-1)k} ≤ ‖T^{k+j}‖ ≤ ‖T‖^{k+j}, and hence ‖T^{k+j}‖ = ‖T‖^{k+j}. Thus it is proved by induction that ‖T^{1+jk}‖ = ‖T‖^{1+jk} for every j ≥ 1. This yields a subsequence {T^{n_j}} of {T^n}, say T^{n_j} = T^{1+jk}, such that lim_j ‖T^{n_j}‖^{1/n_j} = lim_j (‖T‖^{n_j})^{1/n_j} = ‖T‖. Since {‖T^n‖^{1/n}} is a convergent sequence that converges to r(T) (Lemma 6.8), and since it has a subsequence that converges to ‖T‖, then the sequence {‖T^n‖^{1/n}} must converge itself to the same limit (Proposition 3.5). Hence r(T) = ‖T‖.
Remark: These classes of operators are related by proper inclusion as follows. Hyponormal ⊂ Paranormal ⊂ k-Paranormal ⊂ Normaloid.
Problem 6.34. Consider the setup of the previous problem. Recall that a part of an operator is the restriction of it to an invariant subspace. Prove:
(a) Every part of a k-paranormal operator is again k-paranormal.
Hint: ‖(T|M)u‖^{k+1} = ‖Tu‖^{k+1} ≤ ‖T^{k+1}u‖ ‖u‖^k = ‖(T|M)^{k+1}u‖ ‖u‖^k.
(b) The inverse of an invertible paranormal operator is again paranormal.
Hint: If T ∈ B[X] is an invertible (with a bounded inverse: T^{-1} ∈ B[X]) k-paranormal operator on a normed space X for some k ≥ 1, then ‖x‖^{k+1} = ‖TT^{-1}x‖^{k+1} ≤ ‖T^{k+1}(T^{-1}x)‖ ‖T^{-1}x‖^k = ‖T^k x‖ ‖T^{-1}x‖^k for every x in X. Since T^k is invertible, take an arbitrary y in X = R(T^k). Thus y = T^k x, and hence x = (T^k)^{-1}y = T^{-k}y, which implies that T^{-1}x = T^{-(k+1)}y, for some x in X. Therefore, by the above inequality, ‖T^{-k}y‖^{k+1} ≤ ‖y‖ ‖T^{-(k+1)}y‖^k for every y in X.
Hint: If T is paranormal, ‖T^{n+1}x‖^2 ‖Tx‖ ≤ ‖T^{n+2}x‖ ‖T^n x‖ ‖Tx‖. If the claimed result holds for n, ‖T^n x‖ ‖Tx‖ ≤ ‖T^{n+1}x‖ ‖x‖. Conclude the induction.
Problem 6.35. Take any C ∈ B[H] on a Hilbert space H. Consider the subset MC = {(Cx, Cx) ∈ H ⊕ H : x ∈ H} of H ⊕ H. Prove the following assertions.
(a) If C commutes with T ∈ B[H], then MC is (T ⊕ T)-invariant.
(b) If C has a closed range, then MC is a subspace of H ⊕ H.
Take any operator T in B[H] and take an arbitrary operator C in the commutant {T}′ of T. Prove the following propositions.
(c) ‖((T ⊕ T)|MC)^n‖ = sup_{Cx≠0} ‖T^n Cx‖/‖Cx‖ = sup_{Cx≠0} ‖CT^n x‖/‖Cx‖.
(d) If T is an isometry, then ‖((T ⊕ T)|MC)^n‖ = 1.
(e) If C is an isometry, then ‖((T ⊕ T)|MC)^n‖ = ‖T^n‖.
(f) If T is normaloid and C is an isometry, then (T ⊕ T)|MC is normaloid.
(g) If T is unitary, then (T ⊕ T)|MT and (T ⊕ T)|M_{T^{-1}} are normaloid.
(h) Assertion (f) may fail if C is not an isometry. For instance, let Q ∈ B[C^3] be a nilpotent contraction of index 3 (‖Q^3‖ = 0 < ‖Q^2‖ ≤ ‖Q‖ ≤ 1). Take T = 1 ⊕ Q in B[C^4]. Now take C = 0 ⊕ Q in B[C^4], which clearly commutes with T (in fact, C = T − T^3) and has a closed range. (Why?) Show that T is a normaloid operator for which (T ⊕ T)|MC is not normaloid.
Hints: (a) (T ⊕ T)(Cx, Cx) = (TCx, TCx) = (CTx, CTx) if CT = TC.
(b) It is clear (isn't it?) that MC is a linear manifold of H ⊕ H. If R(C) is closed, then R(C) ⊕ R(C) is closed in H ⊕ H since the orthogonal direct sum of closed subspaces is closed (Theorem 5.10 and Proposition 5.24). Take any MC-valued sequence {zn}, say zn = (Cun, Cun) for each n, that converges to z = (Cx, Cy) in R(C) ⊕ R(C). Thus ‖zn − z‖^2 = ‖(Cun, Cun) − (Cx, Cy)‖^2 = ‖(C(un − x), C(un − y))‖^2 = ‖C(un − x)‖^2 + ‖C(un − y)‖^2 → 0, so that ‖C(un − x)‖ → 0 and ‖C(un − y)‖ → 0. Hence Cun → Cx and Cun → Cy, and so Cx = Cy. Then z = (Cx, Cx) lies in MC, and so MC is closed in R(C) ⊕ R(C) (Theorem 3.30). Thus MC is closed in H ⊕ H since R(C) ⊕ R(C) is closed in H ⊕ H (Problem 3.38(c)).
(c) By (a), ‖((T ⊕ T)|MC)^n‖ = sup_{(Cx,Cx)≠0} ‖(T ⊕ T)^n(Cx, Cx)‖/‖(Cx, Cx)‖ = sup_{Cx≠0} ‖T^n Cx‖/‖Cx‖.
(d) By (c) and Proposition 4.37(c), ‖((T ⊕ T)|MC)^n‖ = sup_{Cx≠0} ‖Cx‖/‖Cx‖ = 1.
(e) By (c) and Proposition 4.37(b), ‖((T ⊕ T)|MC)^n‖ = sup_{x≠0} ‖T^n x‖/‖x‖ = ‖T^n‖.
(f) By (e), ‖((T ⊕ T)|MC)^n‖ = ‖T^n‖ = ‖T‖^n = ‖(T ⊕ T)|MC‖^n.
(g) Particular case of (f).
(h) Since T^n = 1 ⊕ Q^n and since ‖Q^3‖ = 0 < ‖Q^2‖ ≤ ‖Q‖ ≤ 1, it follows that ‖T^n‖ = 1 = ‖T‖^n, and T is normaloid. Since TC = 0 ⊕ Q^2 ≠ O and T^2 C = 0 ⊕ Q^3 = O, it follows by (c) that ‖((T ⊕ T)|MC)^2‖ = 0, while (still by (c))
    ‖(T ⊕ T)|MC‖ = sup_{(0⊕Q)x≠0} ‖(0 ⊕ Q^2)x‖/‖(0 ⊕ Q)x‖ = sup_{Qy≠0} ‖Q^2 y‖/‖Qy‖ ≥ ‖Q^2 y0‖/‖Qy0‖ > 0
for every y0 ∈ H\N(Q^2). Thus ‖((T ⊕ T)|MC)^2‖ ≠ ‖(T ⊕ T)|MC‖^2.
Problem 6.36. If Λ1 and Λ2 are sets of complex numbers, then define their product Λ1·Λ2 as the set in C consisting of all products λ1λ2 with λ1 ∈ Λ1 and λ2 ∈ Λ2. It is plain that Λ1·Λ2 = Λ2·Λ1. Let |Λ|^2 denote the set of nonnegative numbers consisting of the squared absolute values |λ|^2 of all λ ∈ Λ (i.e., the set of all nonnegative numbers of the form λλ̄). Recall that σ(T*) = σ(T)* for every Hilbert space operator T (Proposition 6.17). Exhibit T such that
(a) T ≅ T* ≠ T (i.e., T is unitarily equivalent to its adjoint without being self-adjoint), and
(b) |σ(T)|^2 ≠ σ(T*)·σ(T) ⊄ R (i.e., σ(T*)·σ(T) is not only different from |σ(T)|^2, but it is not even a subset of the real line).
Hints: Take
    T = (−i 0 0; 0 1 0; 0 0 i)  and  U = (0 0 1; 0 1 0; 1 0 0)  in B[C^3].
(a) U is a symmetry such that UT = T*U.
(b) σ(T*)·σ(T) = {i, 1, −i}·{−i, 1, i} = {1, i, −1, −i} and |σ(T)|^2 = {1}.
Problem 6.37. Take T ∈ B[H] and S ∈ B[K] on Hilbert spaces H and K, and the (orthogonal) direct sum T ⊕ S ∈ B[H ⊕ K]. Prove the following assertions.
(a) σ(T ⊕ S) = σ(T) ∪ σ(S). Hint: ρ(T ⊕ S) = ρ(T) ∩ ρ(S).
(b) σP(T ⊕ S) = σP(T) ∪ σP(S).
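Assertions (a) and (b) of Problem 6.37 can be illustrated in finite dimensions (not part of the text; assumes NumPy), where the spectrum of a block-diagonal matrix is the union of the spectra of its blocks.

```python
import numpy as np

# sigma(T) = {1, 2} (triangular), sigma(S) = {i, -i} (a rotation generator).
T = np.array([[1.0, 1.0], [0.0, 2.0]], dtype=complex)
S = np.array([[0.0, 1.0], [-1.0, 0.0]], dtype=complex)

# Orthogonal direct sum T (+) S as a block-diagonal matrix on C^4.
Z = np.zeros((2, 2), dtype=complex)
D = np.block([[T, Z], [Z, S]])

# Rounding collapses numerical round-off so the spectra can be compared as sets.
spec = set(np.round(np.linalg.eigvals(D), 8))
union = set(np.round(np.linalg.eigvals(T), 8)) | set(np.round(np.linalg.eigvals(S), 8))
assert spec == union
```

In finite dimensions the whole spectrum is point spectrum, so the same computation illustrates both (a) and (b).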
Remark: Assertion (a) can be extended to a finite direct sum, σ(⊕_{i=1}^n Ti) = ⋃_{i=1}^n σ(Ti), but not to countably infinite direct sums. For instance, let {qk}_{k=1}^∞ be any enumeration of all rational numbers in the interval [0, 1]. Consider the direct sum T = diag({qk}_{k=1}^∞) in B[⊕_{k=1}^∞ C] = B[ℓ²₊] of the bounded family {Tk}_{k=1}^∞, each Tk = qk on C, so that ⋃_{k=1}^∞ σ(Tk) = Q ∩ [0, 1], which is not closed in C. But the spectrum is closed. Actually, for every bounded family {Tk}_{k=1}^∞ of operators in B[Hk] we have (⋃_{k=1}^∞ σ(Tk))⁻ ⊆ σ(⊕_{k=1}^∞ Tk) but, in general, equality may not hold. However, assertion (b) can be extended to countable direct sums: σP(⊕_{k=1}^∞ Tk) = ⋃_{k=1}^∞ σP(Tk).
(c) σP4(T ⊕ S) ⊆ σP4(T) ∪ σP4(S). Hint: σP1(T) ∩ σR1(S) ⊆ σP4(T ⊕ S); see Problem 5.7(e).
(d) σC(T ⊕ S) ⊆ σC(T) ∪ σC(S), but the reverse inclusion σC(T) ∪ σC(S) ⊆ σC(T ⊕ S) may fail.
Hint: Set T = 0 on H = C and S = diag({1/k}_{k=1}^∞) on K = ℓ²₊, so that T ⊕ S = diag(0, 1, 1/2, 1/3, ...) on H ⊕ K. Verify that 0 ∈ σC(S) ∩ σP(T ⊕ S).
Problem 6.38. Recall that a quasinilpotent T ∈ B[X] on a Banach space X is an operator such that σ(T) = {0} (equivalently, such that r(T) = 0). Prove the following assertion. If 0 ∉ σ(T) (i.e., if 0 ∈ ρ(T)), then every restriction of T to a nonzero invariant subspace is not quasinilpotent; that is,
    0 ∉ σ(T)  =⇒  σ(T|M) ≠ {0} for every T-invariant subspace M ≠ {0}.
Hint: If 0 ∈ ρ(T), then T is invertible, thus bounded below, and so is T|M. Hence 0 ∈ ρ(T|M) ∪ σR1(T|M) (cf. diagram of Section 6.2). If 0 ∈ σR1(T|M), then σ(T|M) ≠ {0} because σR1(T|M) is open.
Problem 6.39. Let T ∈ B[X] be an operator on a complex Banach space.
(a) Use the spectral radius condition for uniform stability of Proposition 6.22 to prove the following result.
    inf_n ‖T^n‖ < 1  implies  ‖T^n‖ → 0.
Hint: If inf_n ‖T^n‖ < 1, then there is an integer n0 ≥ 0 such that ‖T^{n0}‖ < 1. Thus r(T)^{n0} = r(T^{n0}) ≤ ‖T^{n0}‖ < 1, and hence r(T) < 1 or, equivalently, ‖T^n‖ → 0 (cf. Corollary 6.20, its remarks, and Proposition 6.22).
(b) This is the strong stability version of the above result. For every x ∈ X,
    sup_n ‖T^n‖ < ∞ and inf_n ‖T^n x‖ = 0  implies  ‖T^n x‖ → 0.
Hint: ‖T^{n+1}x‖ ≤ ‖T‖ ‖T^n x‖. If inf_n ‖T^n x‖ = 0, then lim inf_n ‖T^n x‖ = 0, and therefore there exists a subsequence {‖T^{n_k}x‖} of {‖T^n x‖} such that lim_k ‖T^{n_k}x‖ = 0. If sup_n ‖T^n‖ < ∞, then take any ε > 0 and let kε = kε(x) be any integer such that ‖T^{n_{kε}}x‖ ≤ ε / sup_n ‖T^n‖. Thus, for every n ≥ n_{kε}, ‖T^n x‖ ≤ ‖T^{n−n_{kε}}‖ ‖T^{n_{kε}}x‖ ≤ sup_n ‖T^n‖ ‖T^{n_{kε}}x‖ ≤ ε, and hence ‖T^n x‖ → 0.
Problem 6.40. Let T ∈ B[X] and S ∈ B[Y] be operators on complex Banach spaces. Apply the preceding problem to prove the following propositions.
(a) ‖T^n‖ ‖S^n‖ → 0  implies  ‖T^n‖ → 0 or ‖S^n‖ → 0.
Hint: According to Problem 6.39(a),
    inf_n ‖T^n‖ = 0  =⇒  ‖T^n‖ → 0.
Now, under the assumption that ‖T^n‖ ‖S^n‖ → 0,
    inf_n ‖T^n‖ > 0  =⇒  ‖S^n‖ → 0.
In fact, since ‖T^{n+1}‖ ≤ ‖T‖ ‖T^n‖, it follows that inf_n ‖T^n‖ = 0 if and only if lim inf_n ‖T^n‖ = 0. Thus inf_n ‖T^n‖ > 0 if and only if lim inf_n ‖T^n‖ > 0. In this case, since ‖T^n‖ ‖S^n‖ → 0, it follows that ‖S^n‖ → 0.
(b) sup_n ‖T^n‖ ‖S^n‖ < ∞  implies  sup_n ‖T^n‖ < ∞ or sup_n ‖S^n‖ < ∞.
Hint: Suppose sup_n ‖S^n‖ = ∞, so that inf_n ‖S^n‖ ≥ 1 by Problem 6.39(a). In this case, sup_n ‖T^n‖ < ∞ if and only if sup_n(‖T^n‖ ‖S^n‖) < ∞. But if sup_n(‖T^n‖ ‖S^n‖) < ∞, then sup_n ‖T^n‖ inf_n ‖S^n‖ < ∞. Thus, by Problem 3.10(d), sup_n ‖T^n‖ < ∞ because inf_n ‖S^n‖ ≥ 1.
(c) The above implications do not hold for general (not power) sequences. Exhibit two sequences of operators on X, {Tn} and {Sn}, such that
    ‖Tn‖ ‖Sn‖ → 0  but  lim sup_n ‖Tn‖ = lim sup_n ‖Sn‖ = ∞.
Hint: Tn = nI if n is odd and Tn = (1/n^2)I if n is even, and Sn = (1/n^2)I if n is odd and Sn = nI if n is even (or, more drastically, Tn = O if n is even and Sn = O if n is odd).
Problem 6.41. Consider the setup of the previous problem. Prove the following strong stability version of the uniform stability result of Problem 6.40(a). If sup_n ‖T^n‖ < ∞ or sup_n ‖S^n‖ < ∞, and if ‖T^n x‖ ‖S^n y‖ → 0 for every x in X and every y in Y, then ‖T^n x‖ → 0 for every x in X or ‖S^n y‖ → 0 for every y in Y.
Hint: If sup_n ‖T^n‖ < ∞ then, by Problem 6.39(b),
    inf_n ‖T^n x‖ = 0 for every x ∈ X  =⇒  ‖T^n x‖ → 0 for every x ∈ X.
Now, under the assumption that ‖T^n x‖ ‖S^n y‖ → 0 for every x ∈ X and y ∈ Y,
    inf_n ‖T^n x‖ > 0 for some x ∈ X  =⇒  ‖S^n y‖ → 0 for every y ∈ Y.
Indeed, if inf_n ‖T^n x‖ > 0 for some x ∈ X, then lim inf_n ‖T^n x‖ > 0. Take an arbitrary y in Y. Since ‖T^n x‖ ‖S^n y‖ → 0, it follows that ‖S^n y‖ → 0.
Problem 6.42. Let T and S be operators on a complex Hilbert space H.
(a) Show that ST and TS are proper contractions whenever S is a contraction and T is a proper contraction.
(b) Thus show that the point spectrum σP(T*T) lies in the open unit disk if T is a proper contraction.
Conclude: The concepts of proper and strict contraction coincide for compact operators on a complex Hilbert space.
Hint: Suppose T is compact. Verify that T*T is compact (Proposition 4.54). Suppose, in addition, that T is a proper contraction. Use (b) to infer that the spectrum σ(T*T), which is always closed, also lies in the open unit disk (since σ(K)\{0} = σP(K)\{0} whenever K is compact; see Corollary 6.31). Thus verify that ‖T‖^2 = r(T*T) < 1.
Problem 6.43. If Λ1 and Λ2 are arbitrary subsets of C and p: C×C → C is any polynomial in two variables (with complex coefficients), then set
    p(Λ1, Λ2) = {p(λ1, λ2) ∈ C : λ1 ∈ Λ1, λ2 ∈ Λ2};
in particular, with Λ* = {λ̄ ∈ C : λ ∈ Λ},
    p(Λ, Λ*) = {p(λ, λ̄) ∈ C : λ ∈ Λ}.
Let A be a maximal commutative subalgebra of a complex unital Banach algebra B (i.e., a commutative subalgebra of B that is not included in any other commutative subalgebra of B). Consider the algebra C (of all complex numbers). Let Â denote the collection of all algebra homomorphisms of A onto C (see Problem 2.31). An important result in spectral theory reads as follows. Let X be a complex Banach space. If A is a maximal commutative subalgebra of B[X], then, for every T ∈ A,
    σ(T) = {φ(T) ∈ C : φ ∈ Â}.
(See Problem 2.31.) Use this result to prove the following extension of the Spectral Mapping Theorem for polynomials, which is referred to as the Spectral Mapping Theorem for Normal Operators. Let H be a complex Hilbert space. If T ∈ B[H] is normal and p(·,·) is a polynomial in two variables, then
    σ(p(T, T*)) = p(σ(T), σ(T*)) = {p(λ, λ̄) ∈ C : λ ∈ σ(T)}.
Hint: Since T is normal, use Zorn's Lemma to show that there exists a maximal commutative subalgebra A_T of B[H] containing T and T*. Verify that φ(p(T, T*)) = p(φ(T), φ(T*)) for every homomorphism φ: A_T → C. Thus
    σ(p(T, T*)) = {p(φ(T), φ(T*)) ∈ C : φ ∈ Â_T}.
Take any surjective homomorphism φ: A_T → C (i.e., any φ ∈ Â_T).
Consider the Cartesian decomposition T = A + iB, where A, B ∈ B[H] are self-adjoint operators, and so T* = A − iB (Problem 5.46). Thus φ(T) = φ(A) + iφ(B)
and φ(T*) = φ(A) − iφ(B). Verify that {φ(A) ∈ C : φ ∈ Â_T} = σ(A) ⊂ R because A is self-adjoint (cf. Corollary 6.18). Hence φ(A) ∈ R and φ(B) ∈ R. Thus conclude that φ(T*) is the complex conjugate of φ(T). Hence, since σ(T*) = σ(T)* for every T ∈ B[H] by Proposition 6.17,
    σ(p(T, T*)) = {p(φ(T), φ(T*)) ∈ C : φ ∈ Â_T} = {p(λ, λ̄) ∈ C : λ ∈ {φ(T) ∈ C : φ ∈ Â_T}} = {p(λ, λ̄) ∈ C : λ ∈ σ(T)} = p(σ(T), σ(T)*) = p(σ(T), σ(T*)).
Problem 6.44. Let T be a normal operator on a complex Hilbert space, and consider its spectral decomposition T = ∫ λ dPλ (Theorem 6.47). Show that
    p(T, T*) = ∫ p(λ, λ̄) dPλ
for every polynomial p: C×C → C in two variables with complex coefficients.
Hint: Let P be the spectral measure on Σσ(T) for the spectral decomposition of T. Recall from Section 6.8 that if ϕ, ψ: σ(T) → C are bounded Σσ(T)-measurable functions, and if F = ∫ ϕ(λ) dPλ and G = ∫ ψ(λ) dPλ, then FG = ∫ ϕ(λ)ψ(λ) dPλ. This is enough to ensure that T^i T*^j = ∫ λ^i λ̄^j dPλ, and so
    ∑_{i,j=0}^m αij T^i T*^j = ∑_{i,j=0}^m αij ∫_{σ(T)} λ^i λ̄^j dPλ = ∫_{σ(T)} ∑_{i,j=0}^m αij λ^i λ̄^j dPλ.
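A finite-dimensional check of the two preceding problems (not in the book; assumes NumPy): for a normal matrix, the eigenvalues of p(T, T*) are exactly the values p(λ, λ̄) over λ ∈ σ(T). The polynomial below is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(2)

# Normal T = U diag(d) U*.
d = np.array([1j, 2.0 + 0j, -1.0 + 1j])
U, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
T = U @ np.diag(d) @ U.conj().T
Ts = T.conj().T

# p(z, w) = z^2 + 2 z w, evaluated at (T, T*) and pointwise on the spectrum.
P = T @ T + 2 * (T @ Ts)

key = lambda z: (round(z.real, 6), round(z.imag, 6))
spec_P = sorted(np.linalg.eigvals(P), key=key)
expected = sorted(d**2 + 2 * d * np.conj(d), key=key)
assert np.allclose(spec_P, expected)
```

Since T is normal, P = U diag(d^2 + 2 d d̄) U*, which is what the assertion verifies numerically.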
Remark: Since a polynomial in one variable can be thought of as a particular case of a polynomial in two variables, it follows that
    p(T) = ∫ p(λ) dPλ
for every polynomial p: C → C with complex coefficients.
Problem 6.45. Let T be a nonnegative operator on a complex Hilbert space. Consider its spectral decomposition T = ∫ λ dPλ. Use Problem 6.44 to show that
    T^{1/2} = ∫ λ^{1/2} dPλ.
Hint: Consider the nonnegative square root T^{1/2} of T (Theorem 5.85). Recall that σ(T^{1/2}) = σ(T)^{1/2} (Section 6.3). Let P be the spectral measure on Σσ(T) for the spectral decomposition of T, and let P′ be the spectral measure on Σσ(T^{1/2}) for the spectral decomposition of T^{1/2}. Thus
    T^{1/2} = ∫_{σ(T^{1/2})} λ dP′λ,  so that  T = (T^{1/2})^2 = ∫_{σ(T)^{1/2}} λ^2 dP′λ
by Problem 6.44, and hence
    ∫_{σ(T)^{1/2}} λ^2 dP′λ = ∫_{σ(T)} λ dPλ,  which implies  ∫_{σ(T)} λ dP′_{λ^{1/2}} = ∫_{σ(T)} λ dPλ.
Since the spectral decomposition is unique (Theorem 6.47), these spectral measures coincide, and therefore
    T^{1/2} = ∫_{σ(T)^{1/2}} λ dP′λ = ∫_{σ(T)} λ^{1/2} dPλ.
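The formula of Problem 6.45 can be tested on a positive semidefinite matrix (a numerical sketch, not from the book; assumes NumPy), where the spectral integral becomes a finite sum over eigenprojections.

```python
import numpy as np

rng = np.random.default_rng(3)

# A nonnegative operator T = A^T A on R^4.
A = rng.standard_normal((4, 4))
T = A.T @ A

# T^(1/2) via the spectral decomposition: replace each eigenvalue by its square root.
lams, V = np.linalg.eigh(T)              # T = V diag(lams) V^T with lams >= 0
lams = np.clip(lams, 0.0, None)          # guard against tiny negative round-off
R = V @ np.diag(np.sqrt(lams)) @ V.T     # the nonnegative square root of T

assert np.allclose(R @ R, T)                                        # R^2 = T
assert np.allclose(np.sort(np.linalg.eigvalsh(R)), np.sqrt(lams))   # sigma(R) = sigma(T)^(1/2)
```

Uniqueness of the nonnegative square root (Theorem 5.85) is what makes this construction well defined independently of the chosen eigenbasis.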
Remark: Since polynomials p(T, T*) and p(T) are compact and normal whenever T is (see Proposition 4.54 and Problem 4.29), and since the nonnegative square root T^{1/2} is compact whenever a nonnegative T is compact (cf. Problem 5.62), it follows by the preceding two problems and Theorem 6.43 that
    p(T, T*) = ∫_{σP(T)} p(λ, λ̄) dPλ,  p(T) = ∫_{σP(T)} p(λ) dPλ,  T^{1/2} = ∫_{σP(T)} λ^{1/2} dPλ
whenever T is compact and normal (and nonnegative for the case of T^{1/2}).
Problem 6.46. Consider the setup of Problem 6.4. Let Mϕ ∈ B[L²(T)] be the multiplication operator where ϕ: T → T ⊆ C is the identity function (ϕ(λ) = λ for all λ ∈ T). Use Problem 5.33 and Example 6.E to show that
(a) σ(Mϕ) = σC(Mϕ) = T.
Now consider Definition 6.46. For each Λ ∈ ΣT let χΛ: T → {0, 1} be its characteristic function, and take the multiplication operator MχΛ ∈ B[L²(T)]. Show that the mapping P: ΣT → B[H] defined by
(b) P(Λ) = MχΛ for every Λ ∈ ΣT
is a spectral measure in L²(T). This in fact is the unique (and natural) spectral measure in L²(T) that leads to the spectral decomposition of the unitary operator Mϕ, namely,
    Mϕ = ∫ λ dPλ,
in the sense that the complex-valued measure px,y: ΣT → C defined by
    px,y(Λ) = ⟨P(Λ)x ; y⟩ = ⟨MχΛ x ; y⟩ = ∫ χΛ(λ) x(λ) ȳ(λ) dλ = ∫_Λ x(λ) ȳ(λ) dλ
for every Λ ∈ ΣT is such that, for each x, y ∈ L²(T),
    ⟨Mϕ x ; y⟩ = ∫ ϕ(λ) dpx,y = ∫ λ x(λ) ȳ(λ) dλ.
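A discrete toy analogue of Problem 6.46, not in the book: on C^5, multiplication by the identity function at five points of the unit circle, with the spectral measure acting through characteristic functions, satisfies ⟨Mϕx ; y⟩ = Σ_λ λ·px,y({λ}), which is exactly the integral formula above with counting measure.

```python
import cmath

# Five points of the unit circle T, playing the role of the measure space.
pts = [cmath.exp(2j * cmath.pi * k / 5) for k in range(5)]
x = [1.0, 2.0, 1j, -1.0, 0.5]
y = [0.5j, 1.0, 1.0, 2.0, -1j]

def inner(u, v):
    # <u ; v> = sum u_k conj(v_k), the discrete analogue of the L^2 inner product.
    return sum(a * b.conjugate() for a, b in zip(u, v))

# Multiplication operator (M_phi x)(k) = phi(pts[k]) x(k) with phi the identity.
Mx = [p * a for p, a in zip(pts, x)]

# P({lam}) multiplies by the characteristic function of {lam}, so
# p_xy({lam}) = <P({lam})x ; y> = x(lam) conj(y(lam)).
lhs = inner(Mx, y)
rhs = sum(p * (a * b.conjugate()) for p, a, b in zip(pts, x, y))
assert abs(lhs - rhs) < 1e-12
```

Each P({lam}) here is an orthogonal projection, the family is orthogonal, and the projections sum to the identity, mirroring the defining properties of a spectral measure.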
References
N.I. Akhiezer and I.M. Glazman
[1] Theory of Linear Operators in Hilbert Space – Volume I, Pitman, London, 1981; reprinted: Dover, New York, 1993.
[2] Theory of Linear Operators in Hilbert Space – Volume II, Pitman, London, 1981; reprinted: Dover, New York, 1993.
W. Arveson
[1] An Invitation to C*-Algebras, Springer, New York, 1976.
[2] A Short Course on Spectral Theory, Springer, New York, 2002.
G. Bachman and L. Narici
[1] Functional Analysis, Academic Press, New York, 1966; reprinted: Dover, Mineola, 2000.
A.V. Balakrishnan
[1] Applied Functional Analysis, 2nd edn. Springer, New York, 1981.
S. Banach
[1] Theory of Linear Operations, North-Holland, Amsterdam, 1987.
R. Beals
[1] Topics in Operator Theory, The University of Chicago Press, Chicago, 1971.
B. Beauzamy
[1] Introduction to Operator Theory and Invariant Subspaces, North-Holland, Amsterdam, 1988.
S.K. Berberian
[1] Notes on Spectral Theory, Van Nostrand, New York, 1966.
[2] Lectures in Functional Analysis and Operator Theory, Springer, New York, 1974.
[3] Introduction to Hilbert Space, 2nd edn. Chelsea, New York, 1976.
Y.M. Berezansky, Z.G. Sheftel, and G.F. Us
[1] Functional Analysis – Volume I, Birkhäuser, Basel, 1996.
[2] Functional Analysis – Volume II, Birkhäuser, Basel, 1996.
K.G. Binmore
[1] The Foundations of Analysis – A Straightforward Introduction – Book I: Logic, Sets and Numbers, Cambridge University Press, Cambridge, 1980.
G. Birkhoff and S. MacLane
[1] A Survey of Modern Algebra, Macmillan, New York, 1941; 3rd edn. 1965.
A. Brown and C. Pearcy
[1] Spectra of tensor products of operators, Proc. Amer. Math. Soc. 17 (1966), 162–166.
[2] Introduction to Operator Theory I – Elements of Functional Analysis, Springer, New York, 1977.
[3] An Introduction to Analysis, Springer, New York, 1995.
S.W. Brown
[1] Some invariant subspaces for subnormal operators, Integral Equations Operator Theory 1 (1978), 310–333.
G. Cantor
[1] Ein Beitrag zur Mannigfaltigkeitslehre, J. für Math. 84 (1878), 242–258.
K. Clancey
[1] Seminormal Operators, Springer, Berlin, 1979.
P.J. Cohen
[1] The independence of the continuum hypothesis, Proc. Nat. Acad. Sci. 50 (1963), 1143–1148.
[2] Set Theory and the Continuum Hypothesis, W.A. Benjamin, New York, 1966.
I. Colojoară and C. Foiaş
[1] Theory of Generalized Spectral Operators, Gordon & Breach, New York, 1968.
J.B. Conway
[1] A Course in Functional Analysis, 2nd edn. Springer, New York, 1990.
[2] The Theory of Subnormal Operators, Mathematical Surveys and Monographs Vol. 36, Amer. Math. Soc., Providence, 1991.
[3] A Course in Operator Theory, Graduate Studies in Mathematics Vol. 21, Amer. Math. Soc., Providence, 2000.
J.N. Crossley, C.J. Ash, C.J. Brickhill, J.C. Stillwell, and N.H. Williams
[1] What is Mathematical Logic?, Oxford University Press, Oxford, 1972.
K.R. Davidson
[1] C*-Algebras by Example, Fields Institute Monographs Vol. 6, Amer. Math. Soc., Providence, 1996.
M.M. Day
[1] Normed Linear Spaces, 3rd edn. Springer, Berlin, 1973.
J. Dieudonné
[1] Foundations of Modern Analysis, Academic Press, New York, 1969.
R.G. Douglas
[1] Banach Algebra Techniques in Operator Theory, Academic Press, New York, 1972; 2nd edn. Springer, New York, 1998.
H.R. Dowson
[1] Spectral Theory of Linear Operators, Academic Press, London, 1978.
J. Dugundji
[1] Topology, Allyn & Bacon, Boston, 1966; reprinted: 1978.
N. Dunford and J.T. Schwartz
[1] Linear Operators – Part I: General Theory, Interscience, New York, 1958; reprinted: Wiley, New York, 1988.
[2] Linear Operators – Part II: Spectral Theory – Self Adjoint Operators in Hilbert Space, Interscience, New York, 1963; reprinted: Wiley, New York, 1988.
[3] Linear Operators – Part III: Spectral Operators, Interscience, New York, 1971; reprinted: Wiley, New York, 1988.
A. Dvoretzky and C.A. Rogers
[1] Absolute and unconditional convergence in normed linear spaces, Proc. Nat. Acad. Sci. 36 (1950), 192–197.
P. Enflo
[1] A counterexample to the approximation problem in Banach spaces, Acta Math. 130 (1973), 309–317.
[2] On the invariant subspace problem for Banach spaces, Acta Math. 158 (1987), 213–313.
S. Feferman
[1] Some applications of the notion of forcing and generic sets, Fund. Math. 56 (1965), 325–345.
P.A. Fillmore
[1] Notes on Operator Theory, Van Nostrand, New York, 1970.
[2] A User's Guide to Operator Algebras, Wiley, New York, 1996.
A.A. Fraenkel, Y. Bar-Hillel, and A. Levy
[1] Foundations of Set Theory, 2nd edn. North-Holland, Amsterdam, 1973.
T. Furuta
[1] Invitation to Linear Operators: From Matrices to Bounded Linear Operators on a Hilbert Space, Taylor & Francis, London, 2001.
B.R. Gelbaum and J.M.H. Olmsted
[1] Counterexamples in Analysis, Holden-Day, San Francisco, 1964; reprinted: Dover, Mineola, 2003.
K. Gödel
[1] Consistency-proof for the generalized continuum-hypothesis, Proc. Nat. Acad. Sci. 25 (1939), 220–224.
C. Goffman and G. Pedrick
[1] A First Course in Functional Analysis, Prentice-Hall, Englewood Cliffs, 1965; 2nd edn. Chelsea, New York, 1983.
I.C. Gohberg and M.G. Kreĭn
[1] Introduction to Nonselfadjoint Operators, Translations of Mathematical Monographs Vol. 18, Amer. Math. Soc., Providence, 1969.
S. Goldberg
[1] Unbounded Linear Operators, McGraw-Hill, New York, 1966; reprinted: Dover, Mineola, 1985, 2006.
K.E. Gustafson and D.K.M. Rao
[1] Numerical Range, Springer, New York, 1997.
P.R. Halmos
[1] Introduction to Hilbert Space and the Theory of Spectral Multiplicity, 2nd edn. Chelsea, New York, 1957; reprinted: AMS Chelsea, Providence, 1998.
[2] Finite-Dimensional Vector Spaces, Van Nostrand, New York, 1958; reprinted: Springer, New York, 1974.
[3] Naive Set Theory, Van Nostrand, New York, 1960; reprinted: Springer, New York, 1974.
[4] A Hilbert Space Problem Book, Van Nostrand, New York, 1967; 2nd edn. Springer, New York, 1982.
[5] Linear Algebra Problem Book, Dolciani Mathematical Expositions No. 16, Math. Assoc. America, Washington, D.C., 1995.
G. Helmberg
[1] Introduction to Spectral Theory in Hilbert Space, North-Holland, Amsterdam, 1969.
D. Herrero
[1] Approximation of Hilbert Space Operators – Volume 1, 2nd edn. Longman, Harlow, 1989.
I.N. Herstein
[1] Topics in Algebra, Blaisdell, New York, 1964; 2nd edn. Xerox, Lexington, 1975.
[2] Abstract Algebra, Macmillan, New York, 1986; 3rd edn. Prentice-Hall, Upper Saddle River, 1996.
E. Hille and R.S. Phillips
[1] Functional Analysis and Semi-Groups, Colloquium Publications Vol. 31, Amer. Math. Soc., Providence, 1957; reprinted: 1974.
K. Hoffman and R. Kunze
[1] Linear Algebra, 2nd edn. Prentice-Hall, Englewood Cliffs, 1971.
V.I. Istrăţescu
[1] Introduction to Linear Operator Theory, Marcel Dekker, New York, 1981.
L.V. Kantorovich and G.P. Akilov
[1] Functional Analysis, 2nd edn. Pergamon Press, Oxford, 1982.
I. Kaplansky
[1] Linear Algebra and Geometry: A Second Course, Allyn & Bacon, Boston, 1969; revised edition: Dover, Mineola, 2003.
[2] Set Theory and Metric Spaces, Allyn & Bacon, Boston, 1972; reprinted: Chelsea, New York, 1977.
T. Kato
[1] Perturbation Theory for Linear Operators, 2nd edn. Springer, Berlin, 1980; reprinted: 1995.
J.L. Kelley
[1] General Topology, Van Nostrand, New York, 1955; reprinted: Springer, New York, 1975.
A.N. Kolmogorov and S.V. Fomin
[1] Introductory Real Analysis, Prentice-Hall, Englewood Cliffs, 1970; reprinted: Dover, New York, 1975.
E. Kreyszig
[1] Introductory Functional Analysis with Applications, Wiley, New York, 1978; reprinted: 1989.
C.S. Kubrusly
[1] An Introduction to Models and Decompositions in Operator Theory, Birkhäuser, Boston, 1997.
[2] Hilbert Space Operators: A Problem Solving Approach, Birkhäuser, Boston, 2003.
P.D. Lax
[1] Functional Analysis, Wiley, New York, 2002.
[2] Linear Algebra and Its Applications, 2nd edn. Wiley, Hoboken, 2007.
V.I. Lomonosov
[1] Invariant subspaces for the family of operators which commute with a completely continuous operator, Functional Anal. Appl. 7 (1973), 213–214.
S. MacLane and G. Birkhoff
[1] Algebra, Macmillan, New York, 1967; 3rd edn. Chelsea, New York, 1988.
I.J. Maddox
[1] Elements of Functional Analysis, 2nd edn. Cambridge University Press, Cambridge, 1988.
M. Martin and M. Putinar
[1] Lectures on Hyponormal Operators, Birkhäuser, Basel, 1989.
J.N. McDonald and N.A. Weiss
[1] Real Analysis, Academic Press, San Diego, 1999.
G.H. Moore
[1] Zermelo's Axiom of Choice, Springer, New York, 1982.
J.R. Munkres
[1] Topology: A First Course, Prentice-Hall, Englewood Cliffs, 1975.
G. Murphy
[1] C*-Algebras and Operator Theory, Academic Press, San Diego, 1990.
A.W. Naylor and G.R. Sell
[1] Linear Operator Theory in Engineering and Science, Holt, Rinehart & Winston, New York, 1971; reprinted: Springer, New York, 1982.
C.M. Pearcy
[1] Some Recent Developments in Operator Theory, CBMS Regional Conference Series in Mathematics No. 36, Amer. Math. Soc., Providence, 1978.
[2] Topics in Operator Theory, Mathematical Surveys No. 13, Amer. Math. Soc., Providence, 2nd pr. 1979.
A. Pietsch
[1] History of Banach Spaces and Linear Operators, Birkhäuser, Boston, 2007.
C.R. Putnam
[1] Commutation Properties of Hilbert Space Operators and Related Topics, Springer, Berlin, 1967.
H. Radjavi and P. Rosenthal
[1] Invariant Subspaces, Springer, New York, 1973; 2nd edn. Dover, Mineola, 2003.
[2] Simultaneous Triangularization, Springer, New York, 2000.
C.J. Read
[1] A solution to the invariant subspace problem, Bull. London Math. Soc. 16 (1984), 337–401.
[2] Quasinilpotent operators and the invariant subspace problem, J. London Math. Soc. 56 (1997), 595–606.
M. Reed and B. Simon
[1] Methods of Modern Mathematical Physics I: Functional Analysis, 2nd edn. Academic Press, New York, 1980.
F. Riesz and B. Sz.-Nagy
[1] Functional Analysis, Frederick Ungar, New York, 1955; reprinted: Dover, New York, 1990.
A.P. Robertson and W.I. Robertson
[1] Topological Vector Spaces, 2nd edn. Cambridge University Press, Cambridge, 1973; reprinted: 1980.
S. Roman
[1] Advanced Linear Algebra, 3rd edn. Springer, New York, 2008.
H.L. Royden
[1] Real Analysis, 3rd edn. Macmillan, New York, 1988.
W. Rudin
[1] Functional Analysis, 2nd edn. McGraw-Hill, New York, 1991.
B.P. Rynne and M.A. Youngson
[1] Linear Functional Analysis, 2nd edn. Springer, London, 2008.
R. Schatten
[1] Norm Ideals of Completely Continuous Operators, Springer, Berlin, 1970.
L. Schwartz
[1] Analyse – Topologie Générale et Analyse Fonctionnelle, 2ème édn. Hermann, Paris, 1970.
W. Sierpiński
[1] L'hypothèse généralisée du continu et l'axiome du choix, Fund. Math. 34 (1947), 1–5.
G.F. Simmons
[1] Introduction to Topology and Modern Analysis, McGraw-Hill, New York, 1963; reprinted: Krieger, Melbourne, 1983.
D.R. Smart
[1] Fixed Point Theorems, Cambridge University Press, Cambridge, 1974.
M.H. Stone
[1] Linear Transformations in Hilbert Space, Colloquium Publications Vol. 15, Amer. Math. Soc., Providence, 1932; reprinted: 1990.
V.S. Sunder
[1] Functional Analysis – Spectral Theory, Birkhäuser, Basel, 1998.
P. Suppes
[1] Axiomatic Set Theory, Van Nostrand, Princeton, 1960; reprinted: Dover, New York, 1972.
W.A. Sutherland
[1] Introduction to Metric and Topological Spaces, Oxford University Press, Oxford, 1975.
B. Sz.-Nagy and C. Foiaş
[1] Harmonic Analysis of Operators on Hilbert Space, North-Holland, Amsterdam, 1970.
A.E. Taylor and D.C. Lay
[1] Introduction to Functional Analysis, 2nd edn. Wiley, New York, 1980; enlarged edn. of A.E. Taylor, 1958; reprinted: Krieger, Melbourne, 1986.
R.L. Vaught
[1] Set Theory – An Introduction, 2nd edn. Birkhäuser, Boston, 1995.
J. Weidmann
[1] Linear Operators in Hilbert Spaces, Springer, New York, 1980.
R.L. Wilder
[1] Introduction to the Foundations of Mathematics, 2nd edn. Wiley, New York, 1965; reprinted: Krieger, Malabar, 1983.
D. Xia
[1] Spectral Theory of Hyponormal Operators, Birkhäuser, Basel, 1983.
B.H. Yandell
[1] The Honors Class: Hilbert's Problems and Their Solvers, A K Peters, Natick, 2002.
T. Yoshino
[1] Introduction to Operator Theory, Longman, Harlow, 1993.
K. Yosida
[1] Functional Analysis, 6th edn. Springer, Berlin, 1981; reprinted: 1995.
Index
Abelian group, 38 absolute homogeneity, 200 absolutely convergent series, 203 absolutely convex set, 271 absolutely homogeneous functional, 200 absolutely homogeneous metric, 200 absolutely summable family, 341, 343 absolutely summable sequence, 203 absorbing set, 271 accumulation point, 117–119 additive Abelian group, 38 additive mapping, 55 additively invariant metric, 199 additivity, 310 adherent point, 117, 118 adjoint, 376, 384–388, 456 algebra, 82 algebra homomorphism, 84 algebra isomorphism, 84 algebra with identity, 82 algebraic complement, 68–72, 289 algebraic conjugate, 56 algebraic dual, 56 algebraic linear transformation, 80 algebraic operator, 282 algebraically disjoint, 67 Almost Orthogonality Lemma, 238 annihilator, 302 antisymmetric relation, 8 approximate eigenvalue, 454 approximate point spectrum, 454
approximation spectrum, 454 Arzelà–Ascoli Theorem, 164, 198 ascent, 85 associative binary operation, 37 Axiom of Choice, 15 Axiom of Empty Set, 23 Axiom of Extension, 23 Axiom of Foundation, 23 Axiom of Infinity, 23 Axiom of Pairing, 23 Axiom of Power Set, 23 Axiom of Regularity, 23 Axiom of Replacement, 23 Axiom of Restriction, 23 Axiom of Separation, 23 Axiom of Specification, 23 Axiom of Substitution, 23 Axiom of Union, 23 backward bilateral shift, 293, 421 backward unilateral shift, 249, 292, 420, 476 Baire Category Theorem, 146–148 Baire metric, 188 Baire space, 148 balanced set, 271 Banach algebra, 224 Banach Fixed Point Theorem, 133 Banach limit, 303 Banach space, 202, 211, 216, 221, 235, 272, 294
Banach–Steinhaus Theorem, 244, 294 Banach–Tarski Lemma, 11 barrel, 272 barreled space, 272, 294 base, 125, 272 basis, 48, 227, 351 Bessel inequality, 352 best linear approximation, 329 bidual, 267 bijective function, 5 bilateral ideal, 83 bilateral shift, 292, 421, 424, 471 bilateral weighted shift, 472, 473 bilinear form, 309 bilinear functional, 309 binary operation, 37 block diagonal operator, 280 Bolzano–Weierstrass Property, 157 Boolean sum, 4 boundary, 182 boundary point, 182 bounded above, 9, 10, 225 bounded away from zero, 227, 273 bounded below, 9, 10, 225, 230 bounded family, 209 bounded function, 10, 89, 273 bounded increments, 169 bounded inverse, 229, 230 Bounded Inverse Theorem, 230 bounded linear operator, 222 bounded linear transformation, 217 bounded sequence, 14, 90, 129, 204 bounded set, 9, 89, 154, 271, 273 bounded variation, 186 boundedly complete lattice, 10 Browder Fixed Point Theorem, 308 C*-algebra, 393 canonical basis for Fⁿ, 54 canonical bilateral shift, 422, 472 canonical orthonormal basis for Fⁿ, 359 canonical orthonormal basis for ℓ²₊, 360 canonical unilateral shift, 420, 472 Cantor set, 191 Cantor–Bernstein Theorem, 11, 17 cardinal number, 16, 22 cardinality, 14, 16, 18, 23, 51, 53 Cartesian decomposition, 428
Cartesian product, 4, 13, 66, 178, 190, 194, 208 Cauchy criterion, 128, 275, 341 Cauchy sequence, 128, 129, 135, 155, 185–188, 272 Cayley transform, 503, 504 chain, 12 characteristic function, 16, 17, 29 Chebyshev–Hermite functions, 361 Chebyshev–Hermite polynomials, 361 Chebyshev–Laguerre functions, 361 Chebyshev–Laguerre polynomials, 361 clopen set, 185 closed ball, 102 closed convex hull, 271 Closed Graph Theorem, 231 closed linear transformation, 287–289 closed map, 115 closed set, 114–116, 118, 120, 129, 130, 150, 151 Closed Set Theorem, 120 closed subspace, 181 closure, 115–117, 181, 185 cluster point, 117, 119 codimension, 70, 81 codomain, 5 coefficients, 277, 357 cohyponormal operator, 447 coisometry, 388, 420, 421 collinear vectors, 406 comeager set, 145 commensurable topologies, 108 commutant, 284 commutative algebra, 83 commutative binary operation, 38 commutative diagram, 6 commutative group, 38 commutative ring, 39 commuting operators, 282, 491, 494, 495, 511 compact extension, 258 compact linear transformation, 252–258, 300, 301, 306, 426, 427 compact mapping, 308 compact operator, 256, 433, 435, 438, 478–482, 486, 488–491, 497, 506, 507, 518, 520 compact restriction, 258
compact set, 149–151, 160, 161, 195, 196, 198, 238, 239 compact space, 150–152, 156, 158–164, 194, 196 compatible topology, 270 complementary linear manifolds, 338 complementary projection, 71, 290, 364 complementary subspaces, 289, 290, 336, 339, 365 complete lattice, 10, 45, 178, 214, 282 complete set, 272 complete space, 129–133, 146–149, 157–161, 186–190, 193 completely continuous, 252 completely nonnormal operator, 507 completely nonunitary contraction, 440 completion, 139, 142, 143, 242–244, 337 complex Banach space, 202 complex field, 40 complex Hilbert space, 315 complex inner product space, 311 complex linear space, 41 complex normed space, 201 complex-valued function, 5 complex-valued sequence, 13 composition of functions, 6 compression spectrum, 454 condensation point, 181 conditionally compact, 151 cone, 80 conjugate homogeneous, 310, 375 conjugate space, 266 connected set, 185 connected space, 185 connectedness, 185 consistent system of axioms, 3 constant function, 5 continuity, 98, 108 continuity of inner product, 407, 408 continuity of inversion, 297 continuity of metric, 179 continuity of norm, 202 continuity of scalar multiplication, 271 continuity of vector addition, 270, 271 continuous composition, 105, 177 continuous extension, 136 Continuous Extension Theorem, 264 continuous function, 98–105, 115, 124, 151, 152, 161, 183, 218
continuous inverse, 225, 230 Continuous Inverse Theorem, 230 continuous linear extension, 239–241 continuous linear transformation, 217 continuous projection, 223, 289, 290, 299, 300 continuous restriction, 177 continuous spectrum, 453 Continuum Hypothesis, 23 contraction, 99, 196, 220, 297, 396, 439–442, 499 Contraction Mapping Theorem, 133 contrapositive proof, 2 convergence, 95, 108 convergence-preserving map, 101 convergent nets, 98, 105 convergent sequence, 95–98, 105, 108, 128, 129 convergent series, 203, 274, 275, 345 convex functional, 200 convex hull, 75, 271 convex linear combination, 75 convex set, 75, 271 convex space, 272 coordinates, 49 coset, 43 countable set, 18 countably infinite set, 18 covering, 8, 149 cyclic subspace, 283 cyclic vector, 283 De Morgan laws, 4, 26 decomposition, 67, 71–74, 339, 365, 402, 428, 489, 494, 510 decreasing function, 10 decreasing increments, 186 decreasing sequence, 14, 31 dense in itself, 128 dense linear manifold, 213, 239–243, 259, 294, 329, 330, 415 dense set, 123, 124, 147, 148, 181, 326 dense subspace, 124, 136, 138, 139 densely embedded, 139 densely intertwined, 283 denumerable set, 18 derived set, 117, 120 descent, 85 diagonal mapping, 174, 184, 220, 289
diagonal operator, 220, 226, 248, 251, 256, 298, 397, 414, 439, 472, 508 diagonal procedure, 22, 156, 163 diagonalizable operator, 414, 469, 491, 508, 509 diameter, 89 difference equation, 79, 134 dimension, 53, 76, 78, 81, 356 direct proof, 2 direct sum, 66–68, 74, 208–210, 222–224, 279, 280, 289, 319–321, 323, 334, 335, 338–340, 365, 389, 413, 419, 438–441, 489, 511, 515 direct sum decomposition, 68, 71–74, 339, 365 direct summand, 74, 279 directed downward, 10 directed set, 10 directed upward, 10 disconnected set, 185 disconnected space, 185 disconnection, 185 discontinuous function, 99 discrete dynamical system, 79 discrete metric, 107 discrete set, 127, 185 discrete space, 107 discrete topology, 107 disjoint linear manifolds, 67 disjoint sets, 4 disjointification, 30 distance, 87, 91 distance function, 87 distributive laws, 39 division ring, 39 domain, 5 Dominated Extension Theorems, 264 doubleton, 4 dual space, 266 ε-net, 153 eigenspace, 453, 488, 489, 504 eigenvalue, 453, 454, 488–490, 510 eigenvector, 453, 454, 490, 508 embedding, 6 empty function, 29 empty sum, 346 equicontinuous, 162, 198, 294 equiconvergent sequences, 172, 187
equivalence, 232 equivalence class, 7 equivalence relation, 7 equivalent metrics, 108, 110, 178 equivalent norms, 233, 234 equivalent sets, 14 equivalent spaces, 232 Euclidean metric, 89 Euclidean norm, 204, 316 Euclidean space, 89, 204, 316 eventually constant, 109 eventually in, 105 expansion, 49, 277, 357 exponentially decreasing increments, 186 extended nonnegative integers, 85 extension by continuity, 260 extension of a function, 6 extension ordering, 29 extension over completion, 142, 143, 243, 244, 258, 338 exterior, 123 exterior point, 123 F-space, 272, 273 Fσ, 149 field, 40 final space, 401 finite sequence, 13 finite set, 15 finite-dimensional space, 53, 60, 65, 76, 79, 81, 234–239, 246, 253, 269, 291, 302, 343, 351, 379, 382, 425, 468, 508 finite-dimensional transformation, 79, 252, 255 finite-rank transformation, 79, 252, 253, 255, 291, 301, 427, 480 first category set, 145–149 fixed point, 6, 11, 26, 70, 133, 308 Fixed Point Theorems, 133, 308 Fourier coefficients, 357 Fourier series expansion, 357 Fourier Series Theorem, 356 Fréchet space, 272, 273 Fredholm Alternative, 484 Fubini Theorem, 387 Fuglede Theorem, 494 Fuglede–Putnam Theorem, 495
full direct sum, 208 function, 4 Gδ, 149 Gelfand–Beurling formula, 461 Gelfand–Naimark Theorem, 393 Generalized Continuum Hypothesis, 23 Gram–Schmidt process, 354 graph, 4 greatest lower bound, 9 group, 38, 231 Haar wavelet, 361 Hahn Interpolation Theorem, 180 Hahn–Banach Theorem, 259–264 Hamel basis, 48–51, 351, 355 Hausdorff Maximal Principle, 24 Hausdorff space, 180 Heine–Borel Theorem, 160 Hermitian operator, 393 Hermitian symmetric functional, 310 Hermitian symmetry, 310 Hilbert basis, 351 Hilbert cube, 196 Hilbert space, 315, 375 Hilbert–Schmidt operator, 434, 435 Hölder conjugates, 165 Hölder inequalities, 165, 166, 168 homeomorphic spaces, 110, 152 homeomorphism, 110–113, 115, 152 homogeneity, 310 homogeneous mapping, 55 homomorphism, 84 hyperinvariant linear manifold, 284 hyperinvariant subspace, 284, 285 hyperplane, 81 hyponormal operator, 447–450, 457, 501, 502, 505–507, 509–513 hyponormal restriction, 505 ideal, 83 idempotent, 7, 26, 70, 300 identity element, 38, 39, 82 identity map, 6 identity operator, 224 image of a point, 5 image of a set, 5 inclusion map, 6 incommensurable topologies, 108 increasing function, 10, 11 increasing sequence, 14, 31 independent system of axioms, 3 index set, 12 indexed family, 12 indexing, 13 indiscrete topology, 107 induced equivalence relation, 8 induced topology, 106, 201, 312, 313 induced uniform norm, 220 inductive set, 2 infimum, 9, 10, 14 infinite diagonal matrix, 220 infinite sequence, 13 infinite set, 15 infinite series, 203 infinite-dimensional space, 53, 353, 356 initial segment, 13 initial space, 401 injection, 6 injective function, 5, 26, 27 injective linear transformation, 56 injective mapping, 5, 11, 16 inner product, 310 inner product axioms, 310 inner product space, 311 inner product space ℓ²₊(X ), 321 integer-valued function, 5 integer-valued sequence, 13 interior, 121 interior point, 123 intertwined operators, 283, 495, 497 intertwining transformation, 283 invariant linear manifold, 73, 74, 281 invariant set, 6 invariant subspace, 281–284, 389, 390, 497, 498, 502, 505, 506, 511 invariant subspace problem, 497, 511 inverse element, 38, 83 inverse image, 5 Inverse Mapping Theorem, 230 inverse of a function, 7, 27, 225 inversely induced topology, 181 invertible contraction, 297 invertible element of B[X , Y ], 230 invertible function, 7, 27 invertible linear transformation, 58 invertible operator in B[X ], 231 involution, 27, 393, 427, 439, 457
irreflexive spaces, 269 isolated point, 127, 128, 144, 510, 511 isoloid operator, 511 isometric isomorphism, 241, 243, 244, 268, 269, 291, 293, 302, 335, 337 isometric spaces, 111 isometrically equivalent operators, 394 isometrically equivalent spaces, 111, 139, 142, 375 isometrically isomorphic spaces, 241, 243, 267, 268, 294, 303, 335, 375 isometry, 111, 196, 241, 292, 293, 298, 300, 336, 388, 439, 444, 467, 511 isomorphic algebras, 84 isomorphic equivalence, 63, 64, 66 isomorphic linear spaces, 59, 62–64 isomorphism, 58–60, 63–65 Jensen inequality, 167 k-paranormal operator, 512, 513 k-paranormal restriction, 513 kernel, 55 Kronecker delta, 53 Kronecker function, 53 lattice, 10, 29, 45, 214, 281, 282 Laurent expansion, 461 Law of the Excluded Middle, 2 least upper bound, 9 least-squares, 426 left ideal, 83 left inverse, 27 limit, 31, 95, 98, 171 limit inferior, 31, 171 limit superior, 31, 171 linear algebra, 82 linear basis, 48 linear combination, 45, 46 linear composition, 78 linear dimension, 53, 353, 356 linear equivalence relation, 42 linear extension, 56, 259–262 linear functional, 55 linear manifold, 43, 210, 329, 338 linear restriction, 56, 78 linear space, 40 linear space L[X , Y ], 56, 78 linear span, 45
linear topology, 270 linear transformation, 55, 56, 62, 64 linear variety, 81 linearly independent set, 46, 349 linearly ordered set, 12 Liouville Theorem, 451 Lipschitz condition, 99 Lipschitz constant, 99 Lipschitzian mapping, 99, 196, 218 locally compact space, 195 locally convex space, 272 Lomonosov Theorem, 497 lower bound, 9, 10 lower limit, 171 lower semicontinuity, 176 map, 5 mapping, 5 mathematical induction, 2, 13 matrix, 61, 62, 65 Matrix Inversion Lemma, 84 maximal commutative subalgebra, 518 maximal element, 9 maximal linear variety, 81 maximal orthonormal set, 350, 351, 353 maximum, 9 meager set, 145 metric, 87 metric axioms, 87 metric function, 87 metric generated by a norm, 201, 202 metric generated by a quasinorm, 272 metric space, 87 metrizable, 107 minimal element, 9 minimum, 9 Minkowski inequalities, 166, 167 Möbius transformation, 503 modus ponens, 2 monotone function, 10 monotone sequence, 14, 31 multiplication operator, 500, 520 multiplicity, 420, 421, 453 mutually orthogonal projections, 368 natural embedding, 268, 269 natural isomorphism, 65, 67, 339 natural mapping, 8, 44, 66, 290, 338 natural projection, 224, 290
neighborhood, 103, 180 neighborhood base, 272 net, 14 neutral element, 38 Newton's Second Law, 197 nilpotent linear transformation, 80 nilpotent operator, 282, 461, 473, 511 nondegenerate interval, 21 nondenumerable set, 18 nonmeager set, 145 nonnegative contraction, 431, 499 nonnegative functional, 200 nonnegative homogeneity, 200 nonnegative operator, 395–398, 429–433, 439, 457, 500, 508–510 nonnegative quadratic form, 310 nonnegativeness, 87, 200, 310 nonreflexive spaces, 270 nonstrict contraction, 220, 221 nontrivial hyperinvariant subspace, 285, 286, 495, 497, 504 nontrivial invariant subspace, 281–284, 286, 389, 497, 498, 504, 510, 511 nontrivial linear manifold, 43 nontrivial projection, 70, 457 nontrivial reducing subspace, 489, 495 nontrivial ring, 39 nontrivial subset, 3 nontrivial subspace, 212, 281 norm, 200, 219 norm axioms, 200 norm induced by an inner product, 312, 313, 315, 405 norm topology, 201, 313 normal extension, 446 normal operator, 443–446, 450, 457, 484, 487–491, 494–497, 499–501, 506–511, 518, 519 normal restriction, 506 normaloid operator, 449, 450, 463, 467, 468, 481, 512–514 normed algebra, 224 normed linear space, 201 normed space, 201 normed space B[X , Y ], 219, 220 normed spaces ℓᵖ₊(X ) and ℓ∞₊(X ), 210 normed vector space, 201 nowhere continuous, 100 nowhere dense, 144, 145, 148
nuclear operator, 434 null function, 42 null operator, 248, 381, 468 null space, 55, 218 null transformation, 56, 219 nullity, 78 numerical radius, 466–468 numerical range, 465 one-to-one correspondence, 5, 11, 14 one-to-one mapping, 5 open ball, 102, 103 open map, 110 Open Mapping Theorem, 227 open neighborhood, 103 open set, 102–106 open subspace, 181 operator, 222 operator algebra B[X ], 224, 231, 393 operator convergence, 246–250, 295, 305, 306, 378–382, 398, 415–418 operator matrix, 281 operator norm property, 222 orbit, 282 ordered n-tuples, 13 ordered pair, 4 ordering, 8 order-preserving correspondence, 25 ordinal number, 25 origin of a linear space, 41, 44 orthogonal complement, 326, 336, 339 orthogonal dimension, 353, 356, 362 orthogonal direct sum, 323, 335, 339, 340, 365, 389, 411, 413, 419–421, 438–441, 491, 511, 515, 516 orthogonal family, 348, 352, 353, 368 Orthogonal Normalization Lemma, 413 orthogonal projection, 364–373, 395, 439, 444, 457, 508, 509 orthogonal projection onto M, 366, 367, 370, 390, 488, 505 orthogonal sequence, 322, 323, 368 orthogonal set, 321, 322, 349, 353, 356 Orthogonal Structure Theorem, 332, 371, 410 orthogonal subspaces, 324, 325, 332, 334–336, 339, 340, 365, 487, 506 orthogonal wavelet, 361 orthogonality, 321
orthonormal basis, 351–354, 357, 359–362 orthonormal family, 352, 353, 367 orthonormal sequence, 322, 323 orthonormal set, 349–351, 354 p-integrable functions, 94 p-summable family, 209, 341, 345 p-summable sequence, 90 pair, 4 parallelogram law, 313 paranormal operator, 512, 513 Parseval identity, 357 part of an operator, 446 partial isometry, 401–403 partial ordering, 8 partially ordered set, 9 partition, 8 perfect set, 128, 149 point of accumulation, 117–120 point of adherence, 117, 118 point of continuity, 99 point of discontinuity, 99 point spectrum, 453 pointwise bounded, 162, 244 pointwise convergence, 97, 245 pointwise totally bounded, 162, 198 polar decomposition, 402–405, 444, 445 polarization identities, 313, 406, 494 Polish space, 136 polynomial, 62, 79, 80, 282, 459, 518–520 positive functional, 200 positive operator, 395, 397, 429, 430, 432, 439, 500, 508 positive quadratic form, 310 positiveness, 87, 200, 310 power bounded operator, 248, 306, 381 power inequality, 466 power of a function, 7 power sequence, 248, 282, 299, 381, 417, 448 power set, 4 precompactness, 156 pre-Hilbert space, 311 pre-image, 5 Principle of Contradiction, 2 Principle of Mathematical Induction, 2, 13
Principle of Recursive Definition, 13 Principle of Superposition, 77 Principle of Transfinite Induction, 36 product metric, 169 product of cardinal numbers, 36 product space, 169, 178, 184, 190, 194 product topology, 194 projection, 70–73, 82, 223, 299 projection on M, 71–73 projection operator, 223 Projection Theorem, 336, 339, 365 proof by contradiction, 2 proof by induction, 2 proper contraction, 220, 518 proper linear manifold, 43 proper subset, 3 proper subspace, 212, 238, 264 proportional vectors, 406 pseudometric, 93 pseudometric space, 93 pseudonorm, 200, 205 pure hyponormal operator, 507 Pythagorean Theorem, 322, 348 quadratic form, 310 quasiaffine transform, 285 quasiaffinity, 285 quasiinvertible transformation, 285 quasinilpotent operator, 461, 468, 511 quasinorm, 272 quasinormal operator, 444, 446, 450, 501, 508, 511 quasinormed space, 272 quasisimilar operators, 285 quasisimilarity, 285 quotient algebra, 83, 84 quotient norm, 216 quotient space, 7, 42–44, 69, 83, 93, 140, 205–207, 215, 216, 242, 318 range, 5 rank, 78 rare set, 144 real Banach space, 202 real field, 40 real Hilbert space, 315 real inner product space, 311 real linear space, 41 real normed space, 201
real-valued function, 5 real-valued sequence, 13 reducible operator, 389, 495, 504 reducing subspace, 389, 390, 487, 491, 494, 504, 506, 507 reflexive relation, 7 reflexive spaces, 268, 269, 307, 375 relation, 4 relative complement, 3 relative metric, 88 relative topology, 181 relatively closed, 181 relatively compact, 151, 160 relatively open, 181 residual set, 145, 147, 149 residual spectrum, 453 resolution of the identity, 368–371, 413, 484, 488, 489, 491, 493, 508 resolvent function, 451 resolvent identity, 451 resolvent set, 450, 451 restriction of a function, 5 Riemann–Lebesgue Lemma, 424 Riesz Decomposition Theorem, 510 Riesz Lemma, 238 Riesz Representation Theorem, 374 right ideal, 83 right inverse, 27 ring, 39 ring with identity, 39 Russell paradox, 24
537
seminorm, 200, 205 seminormal operator, 448, 450 separable space, 124–127, 154, 184, 213, 257, 267, 277, 291, 294, 353, 354, 359–363, 369, 383, 491, 498 sequence, 13 sequence of partial sums, 170, 203 sequentially compact set, 157 sequentially compact space, 157–159 sesquilinear form, 309, 310 sesquilinear functional, 309, 310 set, 3 shift, 249, 292, 293, 419–424, 442, 450, 470–478 similar linear transformations, 64, 80 similar operators, 286, 293, 418, 497 similarity, 64, 65, 80, 286, 293, 294, 409, 497, 502 simply ordered set, 12 singleton, 4 span, 45, 46, 213 spanned linear manifold, 46 spanned subspace, 213 spanning set, 213 spectral decomposition, 489, 494 Spectral Mapping Theorem, 459, 518 spectral measure, 493 spectral radius, 461–464, 466–468, 502, 505, 512 Spectral Theorem, 488, 494 spectraloid operator, 467, 468 spectrum, 450–457, 468–478, 480–482, 485, 502, 509–511, 515, 516, 518 spectrum diagram, 453 spectrum partition, 453 square root, 398, 431–433, 460, 519 square root algorithm, 175 square-summable family, 341, 347, 348 square-summable net, 320 square-summable sequence, 319, 322, 323, 334, 335 stability, 248–250, 296, 306, 381, 415, 420–422, 431, 432, 442, 463, 464, 500, 506, 516, 517 strict contraction, 99, 133, 220, 297, 396, 518 strictly decreasing function, 10 strictly decreasing sequence, 14 strictly increasing function, 10
538
Index
strictly increasing sequence, 14 strictly positive operator, 395–397, 429, 430, 432, 439, 457, 500, 508–510 strong convergence, 246–248, 295, 298, 300, 369, 370, 373, 378, 379, 398, 416–418, 433, 499 strong limit, 246 strong stability, 248, 306, 381, 431, 506, 516, 517 stronger topology, 108 strongly bounded, 244, 407 strongly closed, 250, 298, 381, 382 strongly stable coisometry, 420, 421 strongly stable contraction, 442, 499 strongly stable operator, 248–250 subadditive functional, 200 subadditivity, 200 subalgebra, 83 subcovering, 150 sublattice, 10 sublinear functional, 200 subnormal operator, 446, 448, 450, 511 subsequence, 14, 98, 129, 155, 157 subset, 3 subspace of a metric space, 88, 181 subspace of a normed space, 210–214, 216, 218, 236, 324–329, 332, 334–336, 339, 340, 365, 366, 401, 410–412, 440, 478 subspace of a topological space, 181 Successive Approximation Method, 133 sum of cardinal numbers, 36 sum of linear manifolds, 44, 45, 67, 68, 81, 214, 215 summable family, 340–348 summable sequence, 203, 274, 275, 279 sup-metric, 92, 93 sup-norm, 211, 212 supremum, 9, 10, 14 surjective function, 5, 26, 27 surjective isometry, 111, 139–143, 292, 337, 388, 404 symmetric difference, 4, 26 symmetric functional, 310 symmetric relation, 7 symmetry, 87, 427 Tietze Extension Theorem, 180 Tikhonov Theorem, 194
tensor product, 478 topological base, 125 topological embedding, 111 topological invariant, 111, 148, 152, 184, 185 topological isomorphism, 232, 241, 289, 291, 293, 438 topological linear space, 270 topological property, 111 topological space, 106 topological sum, 214, 332, 340 topological vector space, 270 topologically isomorphic spaces, 232, 237, 291, 293, 438 topology, 106, 107 total set, 213 totally bounded, 153–157, 159–164 totally cyclic linear manifold, 283 totally disconnected, 185, 188, 192 totally ordered set, 12 trace, 436 trace-class operator, 434–437 transfinite induction, 36 transformation, 5 transitive relation, 7 triangle inequality, 87, 200 trichotomy law, 12 two-sided ideal, 83, 255, 434, 435 ultrametric, 187 ultrametric inequality, 187 unbounded linear transformation, 237, 288, 289, 364 unbounded set, 89 unconditionally convergent, 345, 359 unconditionally summable, 345, 346 uncountable set, 18 uncountably infinite set, 18 undecidable statement, 24 underlying set, 40, 88 Uniform Boundedness Principle, 244 uniform convergence, 97, 246–248, 295, 299, 300, 379, 418, 433 uniform homeomorphism, 111, 135, 138 uniform limit, 246 uniform stability, 248, 381, 463, 506, 516, 517 uniformly bounded, 244, 407 uniformly closed, 250, 382
uniformly continuous composition, 177 uniformly continuous function, 98, 99, 135–138, 143, 152, 156, 218 uniformly equicontinuous, 162 uniformly equivalent metrics, 112, 178 uniformly homeomorphic spaces, 111, 112, 135, 156, 178 uniformly stable operator, 248, 296, 463, 464 unilateral shift, 292, 419, 423, 450, 470 unilateral weighted shift, 472 unit vector, 349 unital algebra, 82 unital algebra L[X ], 56, 83, 224 unital Banach algebra, 224, 393 unital homomorphism, 84 unital normed algebra, 224, 284 unitarily equivalent operators, 409, 418, 420, 422, 496, 497, 509 unitarily equivalent spaces, 337, 340, 362, 363, 438 unitary equivalence, 336, 363, 410, 442 unitary operator, 389, 422, 431, 439, 440, 444, 457, 500, 502, 508, 509 unitary space, 89, 204 unitary transformation, 337, 339, 340, 388, 404, 409, 419–421, 509 upper bound, 9, 10 upper limit, 171 upper semicontinuity, 176 usual metrics, 88, 90, 91, 95 usual norms, 204, 205, 207, 210, 220
value of a function, 5 vector, 40 vector addition, 40 vector space, 40 von Neumann expansion, 296, 464 wavelet, 361 wavelet expansion, 361 wavelet functions, 361 wavelet vectors, 361 weak convergence, 305–307, 378–383, 398, 415–418 weak limit, 305, 378 weak stability, 306, 381, 431, 506 weaker topology, 108 weakly bounded, 407 weakly closed, 381, 382 weakly closed convex cone, 397, 429 weakly stable operator, 306, 381, 420, 422, 506 weak* convergence, 307 Weierstrass Theorems, 125, 161 weighted shift, 472, 473 weighted sum of projections, 372, 413, 484–489, 491, 492, 508 well-ordered set, 12 Zermelo Well-Ordering Principle, 24 ZF axiom system, 23 ZFC axiom system, 23 zero linear manifold, 43 Zorn’s Lemma, 17