Matrix Theory From Generalized Inverses to Jordan Form
PURE AND APPLIED MATHEMATICS A Program of Monographs, Textbooks, and Lecture Notes
EXECUTIVE EDITORS Earl J. Taft Rutgers University Piscataway, New Jersey
Zuhair Nashed University of Central Florida Orlando, Florida
EDITORIAL BOARD M. S. Baouendi University of California, San Diego Jane Cronin Rutgers University Jack K. Hale Georgia Institute of Technology S. Kobayashi University of California, Berkeley
Marvin Marcus University of California,
Anil Nerode Cornell University Freddy van Oystaeyen University of Antwerp, Belgiaun
Donald Passman University of Wisconsin, Madison Fred S. Roberts Rutgers University
Santa Barbara
David L. Russell Virginia Polytechnic Institute and State University
W. S. Massey Yale University
Walter Schempp Universitat Siegen
MONOGRAPHS AND TEXTBOOKS IN PURE AND APPLIED MATHEMATICS
Recent Titles W. J. Wickless, A First Graduate Course in Abstract Algebra (2004)
R. P. Agarwal, M. Bohner, and W-T Li, Nonoscillation and Oscillation Theory for Functional Differential Equations (2004) J. Galambos and 1. Simonelli, Products of Random Variables: Applications to Problems of Physics and to Arithmetical Functions (2004) Walter Ferrer and Alvaro Rittatore, Actions and Invariants of Algebraic Groups (2005) Christof Eck, Jiri Jarusek, and Miroslav Krbec, Unilateral Contact Problems: Variational
Methods and Existence Theorems (2005) M. M. Rao, Conditional Measures and Applications, Second Edition (2005) A. B. Kharazishvili, Strange Functions in Real Analysis, Second Edition (2006) Vincenzo Ancona and Bernard Gaveau, Differential Forms on Singular Varieties: De Rham and Hodge Theory Simplified (2005) Santiago Alves Tavares, Generation of Multivariate Hemiite Interpolating Polynomials (2005)
Sergio Macias, Topics on Continua (2005)
Mircea Sofonea, Weimin Han, and Meir Shillor, Analysis and Approximation of Contact Problems with Adhesion or Damage (2006) Marwan Moubachir and Jean-Paul Zolesio, Moving Shape Analysis and Control: Applications to Fluid Structure Interactions (2006)
Alfred Geroldinger and Franz Halter-Koch, Non-Unique Factorizations: Algebraic, Combinatorial and Analytic Theory (2006)
Kevin J. Hastings, Introduction to the Mathematics of Operations Research with Mathematical, Second Edition (2006) Robert Carlson, A Concrete Introduction to Real Analysis (2006) John Dauns and Yiqiang Zhou, Classes of Modules (2006) N. K. Govil, H. N. Mhaskar, Ram N. Mohapatra, Zuhair Nashed, and J. Szabados, Frontiers in Interpolation and Approximation (2006)
Luca Lorenzi and Marcello Bertoldi, Analytical Methods for Markov Semigroups (2006) M. A. AI-Gwaiz and S. A. Elsanousi, Elements of Real Analysis (2006) Theodore G. Faticoni, Direct Sum Decompositions of Torsion-free Finite Rank Groups (2006) R. Sivaramakrishnan, Certain Number-Theoretic Episodes in Algebra (2006) Aderemi Kuku, Representation Theory and Higher Algebraic K-Theory (2006) Robert Piziak and P L. Ode!/, Matrix Theory: From Generalized Inverses to Jordan Form (2007)
Norman L. Johnson, Vikram Jha, and Mauro Biliotti, Handbook of Finite Translation Planes (2007)
Matrix Theory From Generalized Inverses
to Jordan Form
Robert Piziak Baylor University Texas, U.S.A.
P. L. NO Baylor University Texas, U.S.A.
Chapman & Hall/CRC Taylor & Francis Group Boca Raton London New York
Chapman At Hall/CRC is an imprint of the Taylor lit Francis Group, an intorma business
Chapman & Hall/CRC Taylor & Francis Group 6000 Broken Sound Parkway N%V, Suite 300 Boca Raton, FL 33487-2742
O 2007 by Taylor & Francis Group, LLC
Chapman & Hall/CRC is an imprint of Taylor & Francis Group, an Infoi ma business No claim to original U.S. Government works Pnnted in Canada on acid-free paper
1098765432 International Standard Book Number-10:1-58488-625-0 (Hardcover) International Standard Book Number-13: 978-1-58488-625-9 (Hardcover)
This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use. No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, please access www.copyright. com (http://www.copyright com/) or contact the Copyright Clearance Center, Inc (CCC) 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Library of Congress Cataloging-in-Publication Data Piziak, Robert. Matrix theory : from generalized inverses to Jordan form / Robert Piziak and P.L. Odell
p cm -- (Pure and applied mathematics) Includes bibliographical references and index. ISBN-13.978-1-5848 8-625-9 (acid-free paper) 1. Matrices--textbooks. 2. Algebras, Linerar--Textbooks. 3. Matrix inversion--Textbooks. I.Odell, Patrick L., 1930- II Title. Ill. Series. QA 188. P59 2006
512.9'434--dc22
Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com
2006025707
Dedication Dedicated to the love and support of our spouses
Preface
This text is designed for a second course in matrix theory and linear algebra accessible to advanced undergraduates and beginning graduate students. Many concepts from an introductory linear algebra class are revisited and pursued to a deeper level. Also, material designed to prepare the student to read more advanced treatises and journals in this area is developed. A key feature of the book is the idea of "generalized inverse" of a matrix, especially the Moore-Penrose inverse. The concept of "full rank factorization" is used repeatedly throughout the book. The approach is always "constructive" in the mathematician's sense.
The important ideas needed to prepare the reader to tackle the literature in matrix theory included in this book are the Henderson and Searle formulas, Schur complements, the Sherman-Morrison-Woodbury formula, the LU factorization, the adjugate, the characteristic and minimal polynomial, the Frame algorithm and the Cayley-Hamilton theorem, Sylvester's rank formula, the fundamental subspaces of a matrix, direct sums and idempotents, index and the Core-Nilpotent factorization, nilpotent matrices, Hermite echelon form, full rank factorization, the Moore-Penrose inverse and other generalized inverses, norms, inner products and the QR factorization, orthogonal projections, the spectral theorem, Schur's triangularization theorem, the singular value decomposition, Jordan canonical form, Smith normal form, and tensor products. This material has been class tested and has been successful with students of mathematics, undergraduate and graduate, as well as graduate students in statistics and physics. It can serve well as a "bridge course" to a more advanced study of abstract algebra and to reading more advanced texts in matrix theory.
ix
Introduction In 1990, a National Science Foundation (NSF) meeting on the undergraduate linear algebra curriculum recommended that at least one "second" course in linear algebra should be a high priority for every mathematics curriculum. This text is designed for a second course in linear algebra and matrix theory taught at the senior undergraduate and beginning graduate level. It has evolved from notes we have developed over the years teaching MTH 4316/5316 at Baylor University. This text and that course presuppose a semester of sophomore level introductory linear algebra. Even so, recognizing that certain basic ideas need review and reinforcement, we offer a number of appendixes that can be used in the classroom or assigned for independent reading. More and more schools are seeing the need for a second semester of linear algebra that goes more deeply into ideas introduced at that level and delving into topics not typically covered in a sophomore level class. One purpose we have for this course is to act as a bridge to our abstract algebra courses. Even so, the topics were chosen to appeal to a broad audience and we have attracted students in statistics and physics. There is more material in the text than can be covered in one semester. This gives the instructor some flexibility in the choice of topics. We require our students to write a paper on some topic in matrix theory as part of our course requirements, and the material we omit from this book is often a good starting point for such a project. Our course begins by setting the stage for the central problem in linearalgebra, solving systems of linear equations. In Chapter 1, we present three views of this problem: the geometric, the vector, and the matrix view. One of the main goals of this book is to develop the concept of "generalized inverses," especially the Moore-Penrose inverse. We therefore first present a careful treatment of ordinary invertible matrices, including a connection to the minimal polynomial. We develop the Henderson-Searle formulas for the inverse of a sum of matrices, the idea of a Schur complement and the Sherman-Morrison-Woodbury formula. In Chapter 2, we discuss the LU factorization after reviewing Gauss elimination. Next comes the adjugate of a matrix and the Frame algorithm for computing the coefficients of the characteristic polynomial and from which the CayleyHamilton theorem results. In Chapter 3, we recall the subspaces one associates with a matrix and review the concept of rank and nullity. We derive Sylvester's rank formula and reap the many consequences that follow from this result. Next come direct sum
xi
Introduction
xii
decompositions and the connection with idempotent matrices. Then the idea of index of a matrix is introduced and the Core-Nilpotent factorization is proved. Nilpotent matrices are then characterized in what is probably the most challenging material in the book. Left and right inverses are introduced in part to prepare the way for generalized inverses. In Chapter 4, after reviewing row reduced echelon form, matrix equivalence,
and the Hermite echelon form, we introduce the all-important technique of full rank factorization. The Moore-Penrose inverse is then introduced using the
concept of full rank factorization and is applied to the problem of solving a system of linear equations, giving a consistency condition and a description of all solutions when they exist. This naturally leads to the next chapter on other generalized inverses.
At this point, some choices need to be made. The instructor interested in pursuing more on generalized inverses can do so at the sacrifice of later material. Or, the material can be skipped and used as a source of projects. Some middle ground can also be chosen. Chapter 6 concerns norms, vector and matrix. We cover this material in one
day since it is really needed only to talk about minimum norm solutions and least squares solutions to systems of equations. The chapter on inner products comes next. The first new twist is that we are dealing with complex vector spaces so we need a Hermitian inner product. The concept of orthogonality is reviewed and the QR factorization is developed. Kung's approach to finding the QR factorization is presented. Now we are in a position to deal with minimum norm and least squares solutions and the beautiful connection with the Moore-Penrose inverse. The material on orthogonal projections in Chapter 8 is needed to relate the Moore-Penrose inverse to the orthogonal projections onto the fundamental subspaces of a matrix and to our treatment of the spectral theorem. Unfortunately, sections 2 and 5 are often skipped due to lack of time. In Chapter 9, we prove the all-important spectral theorem. The highlights of Chapter 10 are the primary decomposition theorem and Schur's triangularization theorem. Then, of course, we also discuss singular value decomposition, on which you could spend an inordinate amount of time. The big finish to our semester is the Jordan canonical form theorem. To be honest, we have never had time to cover the last chapter on multilinear algebra, Chapter 12. Now to summarize what we feel to be the most attractive features of the book:
1. The style is conversational and friendly but always mathematically correct. The book is meant to be read. Concrete examples are used to make abstract arguments more clear. 2. Routine proofs are left to the reader as exercises while we take the reader carefully through the more difficult arguments.
introduction
xiii
3. The book is flexible. The core of the course is complex matrix theory accessible to all students. Additional material is available at the discretion of the instructor, depending on the audience or the desire to assign individual projects.
4. The Moore-Penrose inverse is developed carefully and plays a central role, making this text excellent preparation for more advanced treatises in matrix theory, such as Horn and Johnson and Ben Israel and Greville. 5. The book contains an abundance of homework problems. They are not graded by level of difficulty since life does not present problems that way.
6. Appendixes are available for review of basic linear algebra.
7. Since MATLAB seems to be the language of choice for dealing with matrices, we present MATLAB examples and exercises at appropriate places in the text.
8. Most sections include suggested further readings at the end.
9. Our approach is "constructive" in the mathematician's sense, and we do not extensively treat numerical issues. However, some "Numerical Notes" are included to be sure the reader is aware of issues that arise when computers are used to do matrix calculations. There are many debts to acknowledge. This writing project was made possible and was launched when the first author was granted a sabbatical leave by Baylor
University in the year 2000. We are grateful to our colleague Ron Stanke for his help in getting the sabbatical application approved. We are indebted to many students who put up with these notes and with us for a number of years. They have taken our courses, have written master's theses under our direction, and have been of immeasurable help without even knowing it. A special thanks goes to Dr. Richard Greechie and his Math 405 linear algebra class at LA TECH, who worked through an earlier version of our notes in the spring of 2004 and who suggested many improvements. We must also mention Curt Kunkel, who took our matrix theory class from this book and who went way beyond the call of duty in reading and rereading our notes, finding many misprints, and offering many good ideas. Finally, we appreciate the useful comments and suggestions of our colleague Manfred Dugas, who taught a matrix theory course from a draft version of our book. Having taught out of a large number of textbooks over the years, we have collected a large number of examples and proofs for our classes. We have made some attempt to acknowledge these sources, but many have likely been lost in the fog of the past. We apologize ahead of time to any source we have failed to properly acknowledge. Certainly, several authors have had a noticeable impact
Introduction
xiv
on our work. G. Strang has developed a picture of "fundamental" subspaces that we have used many times. S. Axler has influenced us to be "determinant free" whenever possible, though we have no fervor against determinants and use them when they yield an easy proof. Of course, the wonderful book by C. D. Meyer, Jr., has changed our view on a number of sections and caused us to rewrite material to follow his insightful lead. The new edition of Ben-Israel and Greville clearly influences our treatment of generalized inverses. We also wish to thank Roxie Ray and Margaret Salinas for their help in the typing of the manuscript. To these people and many others, we are truly grateful.
Robert Piziak Patrick L. Odell
References D. Carlson, E. R. Johnson, D. C. Lay, and A. D. Porter, The linear algebra
curriculum study group recommendations for the first course in linear algebra, College Mathematics Journal, 24 (2003), 41-45.
J. G. Ianni, What's the best textbook?-Linear algebra, Focus, 24 (3) (March, 2004), 26. An excellent source of historical references is available online at http://wwwhistory.mcs.st-andrews.ac.uk/ thanks to the folks at St. Andrews, Scotland.
Contents
1
The Idea of Inverse ...............................................1
1.1
........................... 1 1.1.1.1 Floating Point Arithmetic ..................... 10 1.1.1.2 Arithmetic Operations ........................ 11 1.1.1.3 Loss of Significance ......................... 12 1.1.2 MATLAB Moment ................................... 13 1.1.2.1 Creating Matrices in MATLAB ............... 13 The Special Case of "Square" Systems ........................ 17 Solving Systems of Linear Equations 1.1.1
1.2
1.2.1
The Henderson Searle Formulas ....................... 21
1.2.2
Schur Complements and the Sherman-Morrison-Woodbury Formula ................. 24 MATLAB Moment 1.2.3.1 Computing Inverse Matrices .................. 37
1.2.3
1.2.4
2
Numerical Note ...................................... 10
................................... 37
Numerical Note ...................................... 39 1.2.4.1
Matrix Inversion ............................. 39
1.2.4.2
Operation Counts ............................ 39
Generating Invertible Matrices .................................. 41
2.1
2.2
A Brief Review of Gauss Elimination with Back Substitution .... 41 2.1.1 MATLAB Moment 2.1.1.1 Solving Systems of Linear Equations .......... 47
................................... 47
Elementary Matrices ........................................ 49
............................. 57 2.3.1 MATLAB Moment ................................... 75 2.3.1.1 The LU Factorization ........................ 75 The Adjugate of a Matrix .................................... 76 2.2.1
2.3
2.4 2.5
The Minimal Polynomial
The LU and LDU Factorization ...............................63
The Frame Algorithm and the Cayley-Hamilton Theorem .......
81
2.5.1
Digression on Newton's Identities ......................85
2.5.2
The Characteristic Polynomial and the Minimal Polynomial .......................................... 90
xv
xvi
Contents 2.5.3
Numerical Note ......................................95
The Frame Algorithm ........................ 95 MATLAB Moment ................................... 95
2.5.3.1 2.5.4
2.5.4.1
Polynomials in MATLAB .................... 95
3 Subspaces Associated to Matrices ................................ 3.1
99
Fundamental Subspaces ..................................... 99 3.1.1
MATLAB Moment ..................................109 3.1.1.1
The Fundamental Subspaces ................. 109
3.2
A Deeper Look at Rank .....................................111
3.3 3.4
Direct Sums and Idempotents ............................... 117 The Index of a Square Matrix ............................... 128 3.4.1 MATLAB Moment .................................. 147 3.4.1.1
3.5
The Standard Nilpotent Matrix ............... 147
Left and Right Inverses ..................................... 148
4 The Moore-Penrose Inverse ..................................... 155 4.1 Row Reduced Echelon Form and Matrix Equivalence .......... 155 4.1.1
Matrix Equivalence ................................. 160
4.1.2
MATLAB Moment .................................. 167
4.1.2.1 4.1.3
4.1.3.1
Pivoting Strategies .......................... 169
4.1.3.2
Operation Counts ........................... 170
4.2
The Hermite Echelon Form ................................. 171
4.3
Full Rank Factorization ..................................... 176 4.3.1
MATLAB Moment ..................................
179
Full Rank Factorization .....................
179
4.3.1.1 4.4
The Moore-Penrose Inverse ................................. 179 4.4.1
MATLAB Moment .................................. 190 4.4.1.1
5
Row Reduced Echelon Form ................. 167
Numerical Note ..................................... 169
The Moore-Penrose Inverse .................. 190
4.5
Solving Systems of Linear Equations ........................
4.6
Schur Complements Again (optional) ........................ 194
190
Generalized Inverses ........................................... 199 5.1
The {l}-Inverse ............................................ 199
5.2
{1,2}-Inverses ............................................. 208
5.3
Constructing Other Generalized Inverses ..................... 210
5.4
{2}-Inverses ...............................................217
5.5
The Drazin Inverse ......................................... 223
5.6
The Group Inverse ......................................... 230
Contents
xvii
6 Norms .........................................................
233
6.1
The Nonmed Linear Space C" ................................. 2
6.2
Matrix Norms ............................................. 244
6.2.1
MATLAB Moment .................................. 252 6.2.1.1
7
Inner Products ................................................. 257 7.1
The Inner Product Space C" .................................257
7.2
Orthogonal Sets of Vectors in C" ............................ 262 7.2.1
MATLAB Moment 7.2.1.1
7.3
7.3.1
Kung's Algorithm ................................... 274
7.3.2
MATLAB Moment ..................................276 The QR Factorization ....................... 276
7.4 7.5
A Fundamental Theorem of Linear Algebra ...................278
7.6
Least Squares ..............................................285
Minimum Norm Solutions .................................. 282
Projections .....................................................291 8.1
Orthogonal Projections ..................................... 291
8.2
The Geometry of Subspaces and the Algebra
8.3
of Projections ..............................................299 The Fundamental Projections of a Matrix ..................... 309 8.3.1
9
.................................. 269
The Gram-Schmidt Process ..................269
QR Factorization ...........................................269
7.3.2.1
8
Norms .....................................252
MATLAB Moment .................................. 313
8.4
8.3.1.1 The Fundamental Projections ................ 313 Full Rank Factorizations of Projections .......................313
8.5
Afline Projections ..........................................315
8.6
Quotient Spaces (optional) .................................. 324
Spectral Theory ................................................ 329 9.1
Eigenstuff ................................................. 329
9.1.1
MATLAB Moment .................................. 337 9.1.1.1
9.2 9.3
Eigenvalues and Eigenvectors
in MATLAB ............................... 337 The Spectral Theorem ...................................... 338 The Square Root and Polar Decomposition Theorems..........347
10 Matrix Diagonalization ....................................... 351 10.1 Diagonalization with Respect to Equivalence ............... 351 10.2
Diagonalization with Respect to Similarity
................. 357
xviii
Contents
10.3
Diagonahzation with Respect to a Unitary .................. 10.3.1
10.4
10.3.1.1 Schur Triangularization ................. 376 The Singular Value Decomposition ........................ 377 10.4.1
MATLAB Moment ............................... 385 10.4.1.1
11
The Singular Value Decomposition
....... 385
Jordan Canonical Form ....................................... 389 11.1
Jordan Form and Generalized Eigenvectors .................
Jordan Blocks ...................................389
11.1.2
Jordan Segments ................................. 392
11.1.3
Jordan Matrices ..................................396
11.1.2.1
11.1.3.1 11.1.4
MATLAB Moment ..................... 392 MATLAB Moment ..................... 395 MATLAB Moment ..................... 397
Jordan's Theorem ................................398 11.1.4.1
11.2
389
11.1.1
11.1.1.1
12
371
MATLAB Moment ............................... 376
Generalized Eigenvectors ............... 402
The Smith Normal Form (optional) ........................
422
Multilinear Matters ...........................................431 12.1
12.2 12.3 12.4 12.5 12.6 12.7
.......................................... 431 437 ........................................... 440 442 447 450 452
Bilinear Forms Matrices Associated to Bilinear Forms ..................... Orthogonality Symmetric Bilinear Forms ................................ Congruence and Symmetric Matrices ...................... Skew-Symmetric Bilinear Forms .......................... Tensor Products of Matrices .............................. 12.7.1
MATLAB Moment ............................... 456 12.7.1.1
Tensor Product of Matrices
.............. 456
Appendix A Complex Numbers .................................
459
A.l
What Is a Scalar? ........................................ 459
A.2 A.3
The System of Complex Numbers ......................... 464 The Rules of Arithmetic in C ..............................466 A.3.1 Basic Rules of Arithmetic in 0 .....................466 A.3.1.1
Associative Law of Addition ..............466
A.3.1.2 A.3.1.3 A.3.1.4 A.3.1.5 A.3.1.6 A.3.1.7
Existence of a Zero ...................... 466 Existence of Opposites
................... 466
Commutative Law of Addition ............466 Associative Law of Multiplication .........467 Distributive Laws ........................ 467 Commutative Law for Multiplication ...... 467
Contents
xix
Existence of Identity ..................... 467 Existence of Inverses .................... 467 A.4 Complex Conjugation, Modulus, and Distance .............. 468 A.4.1 Basic Facts about Complex Conjugation ............ 469 A.3.1.8 A.3.1.9
Basic Facts about Magnitude ...................... 469 Basic Properties of Distance ....................... 470 The Polar Form of Complex Numbers ...................... 473 A.4.2 A.4.3
A.5 A.6 A.7
Polynomials over C ...................................... 480 Postscript ............................................... 482
Appendix B Basic Matrix Operations ...........................
485
B.l
Introduction ............................................. 485
B.2
Matrix Addition ..........................................487
B.3
Scalar Multiplication ..................................... 489
B.4
Matrix Multiplication ..................................... 490
B.5
Transpose ............................................... 495
B.5.1
MATLAB Moment ............................... 502 B.5.1.1
B.6
Matrix Manipulations .................... 502
Submatrices ............................................. 503 B.6.1
MATLAB Moment ............................... 506 B.6.1.1
Getting at Pieces of Matrices ..............506
Appendix C Determinants ......................................
509
C.1
Motivation ...............................................509
C.2
Defining Determinants ....................................512
C.3
Some Theorems about Determinants ....................... 517
C.4
C.3.1
Minors .......................................... 517
C.3.2
The Cauchy-Binet Theorem ........................517
C.3.3
The Laplace Expansion Theorem ................... 520
The Trace of a Square Matrix .............................. 528
Appendix D A Review of Basics .................................531 D.1
Spanning ................................................ 531
D.2
Linear Independence ..................................... 533
D.3
Basis and Dimension ..................................... 534
D.4
Change of Basis ......................................... 538
Index ..............................................................543
Chapter 1 The Idea of Inverse
systems of linear equations, geometric view, vector view, matrix view
1.1
Solving Systems of Linear Equations
The central problem of linear algebra is the problem of solving a system of linear equations. References to solving simultaneous linear equations that derived from everyday practical problems can be traced back to Chiu Chang Suan Shu's book Nine Chapters of the Mathematical Art, about 200 B.C. [Smoller, 2005]. Such systems arise naturally in modern applications such as economics, engineering, genetics, physics, and statistics. For example, an electrical engineer using Kirchhoff's law might be faced with solving for unknown currents x, y, z in the system of equations: 1.95x + 2.03y + 4.75z = 10.02 3.45x + 6.43y - 5.02z = 12.13 2.53x +7.O1y+3.61z = 19.46
3.O1x+5.71y+4.02z = 10.52
Here we have four linear equations (no squares or higher powers on the unknowns) in three unknowns x, y, and z. Generally, we can consider a system of m linear equations in n unknowns: a11xj + a12x2 + ... + alnxn = b1 a21x1 + a22x2 + ... + a2nxn = b2
a3lxI + a32x2 + ....+ a3nXn = b3 am I xl + am2X2 + ... + arnnxn = bm
1
The Idea of Inverse
2
The coefficients a,j and constants bk are all complex numbers. We use the symbol C to denote the collection of complex numbers. Of course, real numbers, denoted R, are just special kinds of complex numbers, so that case is automatically included in our discussion. If you have forgotten about complex numbers (remember i2 = -1 ?) or never seen them, spend some time reviewing Appendix A, where we tell you all you need to know about complex numbers. Mathematicians allow the coefficients and constants in (1. I) to come from number domains more general than the complex numbers, but we need not be concerned about that at this point. The concepts and theory derived to discuss solutions of a system of linear equations depend on at least three different points of view. First, we could view each linear equation individually as defining a hyperplane in the vector space of complex n-tuples C". Recall that a hyperplane is just the translation of an (n - 1) dimensional subspace in C". This view is the geometric view (or row view). If n = 2 and we draw pictures in the familiar Cartesian coordinate plane R2, a row represents a line; a row represents a plane in R3, etc. From this point of view, solutions to (1.1) can he visualized as the intersection of lines or planes. For example, in
J 3x-y=7
x+2y=7
the first row represents the line (= hyperplane in R2) through the origin y = 3x translated to go through the point (0, -7) [see Figure 1.11, and the second row
represents the line y = -zx translated to go through (0, 2). The solution to the system is the ordered pair of numbers (3, 2) which, geometrically, is the intersection of the two lines, as is illustrated in Figure I.1 and as the reader may verify.
Another view is to look "vertically," so to speak, and develop the vector (or column) view; that is, we view (1.1) written as
all
a12
a1,,
a21
a22
a2n
+
+X1
XI
a,nl
amt
b1
_
+ x am,,
b2 (1.2)
bm
recalling the usual way of adding column vectors and multiplying them by scalars. Note that the xs really should be on the right of the columns to produce the system (I.1). However, complex numbers satisfy the commutative law of multiplication (ab = ba), so it does not matter on which side the x is written. The problem has not changed. We are still obliged to find the unknown xs given the aids and the bks. However, instead of many equations (the rows) we have just one vector equation. Regarding the columns as vectors (i.e., as n-tuples in C"), the problem becomes: find a linear combination of the columns on the left
/.1 Solving Systems of Linear Equations
Figure 1.1:
3
Geometric view.
to produce the vector on the right. That is, can you shrink or stretch the column vectors on the left in some way so they "resolve" to the vector on the right? And so the language of vector spaces quite naturally finds its way into the fundamental problem of linear algebra. We could phrase (1.2) by asking whether the vector (bl, b2, , bm) in C' is in the subspace spanned by (i.e., generated by) a,,,,). the vectors (all, a21, " ' , aml), (a12, a22, ' , a , , , 2 ) , . ' , (aln, a2n, Do you remember all those words? If not, see Appendix D. Our simple example in the vector view is *
*
7
x L
1
i
L
21
J- L 7 J'
Here the coefficients x = 3, y = 2 yield the solution. We can visualize the situation using arrows and adding vectors as scientists and engineers do, by the head-to-tail rule (see Figure 1.2). The third view is the matrix view. For this view, we gather the coefficients a,1 of the system into an m-by-n matrix A, and put the xs and the bs into columns
The Idea of Inverse
4 Y T
(7,7)
;]
-> x
Figure 1.2:
Vector view.
(n-by-1 and m-by-1, respectively). From the definition of matrix multiplication and equality, we can write (1.1) as
all
a12
a1n
xl
b,
a21
a22
a2n
X2
b2 (1.3)
a., I
amt
amn
bn,
X11
or, in the very convenient shorthand,
Ax=b
(1.4)
where
all
alt
a21
a22
b,
b2
} an,
a,n2
x=
}
and
b = bm
1.1 Solving Systems of Linear Equations
5
This point of view leads to the rules of matrix algebra since we can view (1.1) as the problem of solving the matrix equation (1.4). The emphasis here is on "symbol pushing" according to certain rules. If you have forgotten the basics of manipulating matrices, adding them, multiplying them, transposing them, and so on, review Appendix B to refresh your memory. Our simple example can be expressed in matrix form as
xy] I
7
2'
1
The matrix view can also be considered more abstractly as a mapping or function view. If we consider the function x H Ax, we get a linear transformation from C" to C'" if A is m-by-n. Then, asking if (1.1) has a solution is the same as asking if b lies in the range of this mapping. From (1.2), we see this is the same as asking if b lies in the column space of A, which we shall denote Col(A). Recall that the column space of A is the subspace of C'" generated by the columns of A considered as vectors in C'". The connection between
the vector view and the matrix view is a fundamental one. Concretely in the 3-by-3 case, a
b
d
e h
g
c
f
]
y
i
a]
x] =
d
b e h
x+
g
z
c
fi
Z.
More abstractly,
xi
Ax = [Coll (A)
C0120)
...
Col,, (A)]
X' X11
= x1Coll (A) + x2Col2(A) +
+
(A).
Now it is clear that Ax = b has a solution if b is expressible as a linear combination of the columns of A (i.e., b lies in the column space of A). A row version of this fundamental connection to matrix multiplication can also be useful on occasion
a b
c
g h
i
[x y z] d e f =x[a b c]+y[d e f]+z[g h i].
6
The Idea of Inverse
More abstractly,
Rowi(A) xT A
= [XI
X2
...
Row2(A) X,
Row,,,(A)
= x1 Rowi(A) + x2Row2(A) +
+ x,,, Row(A).
You may remember (and easy pictures in R2 will remind you) that three cases can occur when you try to solve (1.1).
CASE 1.1 A unique solution to (1.1) exists. That is to say, if A and b are given, there is
one and only one x such that Ax = b. In this case, we would like to have an efficient algorithm for finding this unique solution. CASE 1.2
More than one solution for (1.1) exists. In this case, infinite solutions are possible. Even so, we would like to have a meaningful way to describe all of them.
CASE 1.3 No solution to (1.1) exists. We are not content just to give up and turn our backs on this case. Indeed, "real life" situations demand "answers" to (I.1) even when a solution does not exist. So, we will seek a "best approximate solution" to (1.1), where we will make clear later what "best" means.
Thus, we shall now set some problems to solve. Our primary goal will be to give solutions to the following problems:
Problem 1.1 Let Ax = b where A and b are specified. Determine whether a solution for x exists (i.e., develop a "consistency condition") and, if so, describe the set of all solutions.
Problem 1.2 Let Ax = b where A and b are specified. Suppose it is determined that no solution for x exists. Then find the best approximate solution; that is, find x such that the vector Ax - b has minimal length.
1.1 Solving Systems of Linear Equations
7
Problem 1.3 Given a matrix A, determine the column space of A; that is, find all b such that Ax = b for some column vector x. To prepare for what is to come, we remind the reader that complex numbers can be conjugated. That is, if z = a + bi, then the complex conjugate of z is z = a -bi. This leads to some new possibilities for operations on matrices that we did not have with real matrices. If A = [aid ], then A = [ai j ] and A* is the transpose
2 + 3i
of A (i.e., the conjugate transpose) of A. Thus if A =
then A -
f 2-3i 4-5i I 7+5i 6+3i
L
r 2 - 3i
and A*
4+5i
7-5i 6-3i
7 + 5i 1. A matrix A
LL 4-5i 6+3i J is called self-adjoint or Hermitian if A* = A. Also, det(A) is our way of denoting the determinant of a square matrix A. (See Appendix C for a review on determinants.)
Exercise Set 1 1. Write and discuss the three views for the following systems of linear equations.
3x+2y+z=5 2x+3y+z=5. x+y+4z=6
5 4x+3y=17 .
2x- y=10'
2. Give the vector and matrix view of
5 (4 + 30z, + (7 - 'Tri )zz = 1
- '13i
(2 - i )z i + (5 + 13i )Z2 = 16 + 10i
3. Solve J (1 + 2i)z + (2 - 31)w = 5 + 3i
(1 - i)z+ (4i)w= 10-4i 4. Solve
3ix+4iy+5iz= 17i 2iy+7iz = 161. 3iz = 6i
5. Consider two systems of linear equations that are related as follows: f aiixi + a12x2 = ki
azi xi + azzxz = kz
and
biiyi + bi2Y2 = xi
f b21 Y1 + b22y2 = xz
The Idea of'Inverse
8
Let A be the coefficient matrix of the first system and B he the coefficient matrix of the second. Substitute for the x's in the system using the second system and produce a new system of linear equations in the y's. How does the coefficient matrix of the new system relate to A and B?
6. Argue that the n-tuple (s,, s2, ... J
,
satisfies the two equations:
arlxl+...+a1nxn=b1
ajlxl+...+ajnxn=bj
if and only if it satisfies the two equations:
a11x1 + {
+ a;,,x = b;
(ajl + carl)xl + ... + (ajn + ca;n)xn = bj + cb;
for any constant c.
7. Consider the system of linear equations (1.1). Which of the following modifications of (1.1) will not disturb the solution set? Explain! (i) Multiply one equation through by a scalar. (ii) Multiply one equation through by a nonzero scalar. (iii) Swap two equations. (iv) Multiply the ilh equation by a scalar and add the resulting equation to the j" equation, producing a new j`h equation. (v) Erase one of the equations. 8. Let S = {x I Ax = b} be the solution set of a system of linear equations where A is m-by-n, x is n-by- 1, and b is m-by-1. Argue that S could be empty (give a simple example); S could contain exactly one point but, if S contains two distinct points, it must contain infinitely many. In fact, the set S has a very special property. Prove that if xl and x2 are in S, then 7x1 + (1 - '\)X2 is also in S for any choice of it in C. This says the set of solutions of a system of linear equations is an affine subspace of Cn.
9. Argue that if x, solves Ax = b and x2 solves Ax = b, then x, - x2 solves Ax = 6. Conversely, if z is any solution of Ax = -6 and is a particular solution of Ax = b, then argue that xN + z solves Ax = b. 10. You can learn much about mathematics by making up your own examples instead of relying on textbooks to do it for you. Create a 4-by-3 matrix A
1.1 Solving Systems of Linear Equations
9
composed of zeros and ones where Ax = b has (i) exactly one solution (ii) has an infinite number of solutions (iii) has no solution.
If you remember the concept of rank (we will get into it later in Section 3.1 ), what is the rank of A in your three examples?
11. Can a system of three linear equations in three unknowns over the real numbers have a complex nonreal solution? If not, explain why; if so, give an example of such a system.
Further Reading [Atiyah, 2001 ] Michael Atiyah, Mathematics in the 20th Century, The American Mathematical Monthly, Vol. 108, No. 7, August-September, (2001), 654-666.
[D&H&H, 2005] Ian Doust, Michael D. Hirschhorn, and Jocelyn Ho, Trigonometric Identities, Linear Algebra, and Computer Algebra, The American Mathematical Monthly, Vol. 112, No. 2, February, (2005), 155-164. [F-S, 1979] Desmond Fearnley-Sander, Hermann Grassmann and the Creation of Linear Algebra, The American Mathematical Monthly, Vol. 86, No. 10, December, (1979), 809-817.
[Forsythe, 1953] George Forsythe, Solving Linear Equations Can Be Interesting, Bulletin of the American Mathematical Society, Vol. 59, (1953), 299-329.
[Kolodner, 1964] Ignace I. Kolodner, A Note on Matrix Notation, The American Mathematical Monthly, Vol. 71, No. 9, November, (1964), 1031-1032. [Rogers, 1997] Jack W. Rogers, Jr., Applications of Linear Algebra in Calculus, The American Mathematical Monthly, Vol. 104, No. 1, January, (1997), 20-26.
The Idea of inverse
10
[Smoller, 2005] Laura Smoller, The History of Matrices, http://www ualr. edu/ lasmoller/matrices. html.
[Wyzkoski, 1987] Joan Wyzkoski, An Application of Matrices to Space Shuttle Technology, The UMAP Journal Vol. 8, No. 3, (1987), 187-205.
1.1.1 1.1.1.1
Numerical Note Floating Point Arithmetic
Computers, and even some modern handheld calculators, are very useful tools in reducing the drudgery that is inherent in doing numerical computations with matrices. However, calculations on these devices have limited precision, and blind dependence on machine calculations can lead to accepting nonsense
answers. Of course, there are infinitely many real numbers but only a finite number of them can be represented on any given machine. The most common representation of a real number is floating point representation. If you have learned the scientific notation for representing a real number, the following description will not seem so strange. A (normalized) floating point number is a real number x of the form x = ±.d, d2d3 ... d, x be
where d, $ 0, (that's the normalized part), b is called the base, e is the exponen
and d, is an integer (digit) with 0 < d; < b. The number t is called the precision, and ±.d, d7d3 . d, is called the mantissa. Humans typically prefer base b = 10; for computers, b = 2, but many other choices are available (e.g.,
b = 8, b = 16). Note 1.6 x 102 is scientific notation but .16 x 103 is the normalized floating point way to represent 160. The exponent e has limits that usually depend on the machine; say -k < e < K. If x = ±.d,d2d3 . d, x be and e > K, we say x overflows; if e < -k, x is said to underflow. A number x that can he expressed as x = ±.d,d2d3 ... d, x be given b, t, k, and K is called a representable number. Let Rep(b, t, k, K) denote the set of all representable numbers given the four positive integers b, t, k, and K. The important thing to realize is that Rep(b, t, k, K) is a finite set. Perhaps it is a large finite set, but it is still finite. Note zero, 0, is considered a special case and is always in
Rep(b, t, k, K). Also note that there are bounds for the size of a number in Rep(b, t, k, K). If x # 0 in Rep(b, t, k, K), then bk-1 < jxl < bK(l - b-') Before a machine can compute with a number, the number must he converted into a floating point representable number for that machine. We use the notation f l(x) = x` to indicate that x' is the floating point representation of x. That is
z=fl(x)=x(1+8), where 8
A'
1.1 Solving Systems of Linear Equations
11
signifies that i is the floating point version of x. In particular, x is representable if fl(x) = x. The difference between x and z is called roundoff error. The difference x - x is called the absolute error, and the ratio
IX
Ix Ix I is
called the relative error. The maximum value for 161 is called the unit roundoff
error and is typically denoted by E. For example, take x = 51, 022. Say we wish to represent x in base 10, four-digit floating point representation. Then f l(51022) = .5102 x 105. The absolute error is -2 and the relative error is
about -3.9 x 10-5. There are two ways of converting a real number into a floating point number: rounding and truncating (or chopping). The rounded floating point version xr of x is the t-digit number closest to x. The largest error in rounding occurs when a number is exactly halfway between two representable numbers. In the truncated version x,, all digits of the mantissa beyond the last to be kept are thrown away. The reader may verify the following inequalities:
(i)Ix`r-X1 <2b"IXI, I
16rI < 26'-',
(ii)
(iii)I1,-x1 :5 b'-' Ix I, (iv)
18,1 < b".
The reader may also verify that the unit roundoff satisfies
_ E0 = { 1.1.1.2
' b'-'
b"
for rounding for truncating
Arithmetic Operations
Errors can occur when data are entered into a computer due to the floating point representation. Also, errors can occur when the usual arithmetic operations are applied to floating point numbers. For x and y floating point numbers, we can say
(i) fl(x + y) = (x + y)(I + s), (ii) fl(Xy) = xy(1 + s), (iii) fl(x - y) = (x - y)(I + 8), (iv) f l(x _ y) = (X - y)(1 + s).
Unfortunately, the usual laws of arithmetic fail. For example, there exists representable numbers x, y, and z, such that
(v) fl(fl(x+y)+z)0 fl(x+fl(y+z)),
The Idea of Inverse
12
that is, the associative law fails. It may also happen that
(vi) fl(x+y) # fl(x)+ fl(y), (vii)
fl(ay)
54
fl(x)fl(y).
For some good news, the commutative law still works. 1.1.1.3
Loss of Significance
Another phenomenon that can have a huge impact on the outcome of floating point operations is if small numbers are computed from big ones or if two nearly
equal numbers are subtracted. This is called cancellation error. It can happen that the relative error in a difference can be many orders of magnitude larger than the relative errors in the individual numbers.
Exercises 1. How many representable numbers are in R.ep(l0, 1, 2, 1)? Plot them on the real number line. Are they equally spaced? 2. Give an example of two representable numbers whose sum is not representable. 3. Write the four-digit floating point representation of 34248, and compute the absolute and relative error.
Further Reading [Dwyer, 1951 ] P. S. Dwyer, Linear Computations, John Wiley & Sons, New York, (1951). [F,M&M, 1977] G. E. Forsythe, M. A. Malcolm, and C. B. Moler, Computer Methods for Mathematical Computations, Prentice-Hall, Englewood Cliffs, NJ, (1977). [G&vanL, 1996] Gene H. Golub and Charles F. Van Loan, Matrix Computations, 3rd edition, Johns Hopkins Press, Baltimore, MD, (1996).
1.1 Solving Systems of Linear Equations
13
[Higham, 1996] Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, Philadelphia, (1996).
[Householder, 1964] Alston S. Householder, The Theory of Matrices in Numerical Analysis, Dover Publications, Inc., New York, (1964).
[Kahan, 2005] W. Kahan, How Futile are Mindless Assessments of Roundoff in Floating-Point Computation?, http://www.cs.berkeley.edu/ wkahan/ Mindless. pdf.
[Leon, 1998] Steven J. Leon, Linear Algebra with Applications, 5th edition, Prentice Hall, Upper Saddle River, NJ, (1998).
1.1.2 MATLAB Moment 1.1.2.1
Creating Matrices in MATLAB
MATLAB (short for MATrix LABoratory) is an interactive program first created by Cleve Moler in FORTRAN (1978) as a teaching too] for courses in linear algebra and matrix theory. Since then, the program has been written in C (1984) and improved over the years. It is now licensed by and is the registered trademark of The Math Works Inc., Cochituate Place, 24 Prime Park Way, Natick, MA. A student edition is available from Prentice Hall. As its name implies, MATLAB is designed for the easy manipulation of matrices. Indeed, there is basically just one object it recognizes and that is a rectangular matrix. Even variable names are considered to be matrices. We assume the reader will learn how to access MATLAB on his or her local computer system. You will know you are ready to begin when you see the command line prompt: >>. The first command to learn is "help". MATLAB has quite an extensive internal library to assist the user. You can be more specific and ask for help on a given topic. Experiment! Closely related is the "lookfor" command. Type "lookfor" and a keyword to find functions relating to that keyword. This first exercise helps you learn how to create matrices in MATLAB. There are a number of convenient ways to do this. 1.1.2.1.1
Explicit Entry Suppose you want to enter the matrix 1
2
3
2
3
4
4 1
3
4
1
2
There are a couple of ways to do this. First
> > A = [1 234;234 1;34 1 2]
The Idea of Inverse
14
will enter this matrix and give it the variable name A. Note we have used spaces
to separate elements in a row (commas could he used too) and semicolons to delimit the rows. Another way to enter A is one row at a time.
> > A=[1234 2341 34 121 A row vector is just a matrix with one row:
>> a=[4321] A column vector can he created with the help of semicolons.
>> b = [1;2;3;4] The colon notation is a really cool way to generate some vectors. m : n generates a row vector starting at m and ending at n. The implied step size is 1. Other step sizes can be used. For example, m : s : n generates a row vector starting at m, going as far as n, in steps of size s. For example
returns
and
>> y=2.0:-0.5:0 returns
y = 2.0000 1.5000 1.0000 0.5000 0
Now try to create some matrices and vectors of your own. 1.1.2.1.2 Built-in Matrices MATLAB comes equipped with many built-in matrices:
1. The Zero Matrix: the command zeros(m, ii) returns an in-by-n matrix filled with zeros. Also, zeros(n) returns an n-by-n matrix filled with zeros.
2. The Identity Matrix: the command eye(n) returns an n-by-n identity matrix (clever, aren't they'?)
1.1 Solving Systems of Linear Equations
15
3. The Ones Matrix: the command ones(m, n) returns an m-by-n matrix filled with ones. Of course, ones(n) returns an n-by-n matrix filled with ones.
4. Diagonal Matrices: the function diag creates diagonal matrices. For example, diag([2 4 61) creates the matrix 2
0
0
0
4
0
0
0 6
Now create some of the matrices described above. Try putting the matrix A above into diag. What happens? What is diag(diag(A))? 1.1.2.1.3 Randomly Generated Matrices MATLAB has two built-in random number generators. One is rand, which returns uniformly distributed num-
bers between zero and one, and the other is randn, which returns normally distributed numbers between zero and one. For example, >> A = rand(3)
might create the 3-by-3 matrix 0.9501 0.2311 0.6068
0.4860 0.8913 0.7621
0.4565 0.0185 0.8214
We can generate a random 5-by-5 matrix with integers uniformly distributed between 0 and 10 by using
floor(] 1 * rand(y)). A random 5-by-5 complex matrix with real and imaginary parts being integers normally distributed between 0 and 10 could be generated with
floor(11 * rand n(5)) + i *floor(] 1 * rand n(5)). Experiment creating random matrices of various kinds. 1.1.2.1.4 Blocks A really neat way to build matrices is to build them up by blocks of smaller matrices to make a big one. The sizes, of course, must fit together.
The Idea of Inverse
16
For example,
>> A = [B 5 * ones(2) eye(2) 3 * eye(2)] returns
2
4
5
6
8
5
5
1
0
3
0
0
1
0
3
5
where B = [2 4; 6 8] has been previously created. The matrix A could also have been created by > > A = [B 5 * ones(2); eye(2) 3 * eye(2)]
Very useful for later in the text is the ability to create block diagonal matrices. For example, > > A = blkdiag(7 * eye(3), 8 * eye(2)) returns the matrix 7
0
0 0
7
0 0
0 0 0
0 0
0 0
7
0 0 0
0 0
8
0 0
0
8
Now create some matrices using blocks.
Further Reading IH&H, 2000] Desmond J. Higham and Nicholas J. Higham, MATLAB Guide, SIAM, Philadelphia, (2000).
[L&H&F, 1996] Steven Leon, Eugene Herman, and Richard Faulkenberry, ATLAST Computer Exercises for Linear Algebra, Prentice Hall, Upper Saddle River, NJ, (1996). [Sigmon, 19921 Kermit Sigmon, MATLAB Primer, 2nd edition, http://math.ucsd.edu/"driver/2Id-s99/matlab-prinier.html, (1992).
I.2 The Special Case of "Square" Systems
17
matrix inverse, uniqueness, basic facts, reversal law, Henderson and Searle formulas, Schur complement, Sherman-Morrison-Woodbury formula
1.2
The Special Case of "Square" Systems
In this section, we consider systems of linear equations (l .1) where the number
of equations equals the number of unknowns; that is, m = n. In the form Ax = b, A becomes a square (i.e., n-by-n) matrix while x and b are n-by-1 column vectors. Thinking back to the good old high school days in Algebra I, we learned the complete story concerning the equation
ax = b
(1.5)
over the real numbers R.
If a = 0 and b = 0, any real number x provides a solution. If a = 0 but b A 0, then no solution x exists. If a 0, then we have the unique solution x = b/a; that is, we divide both sides of (2.1) by a. Wouldn't it be great if we could solve the matrix equation
Ax = b
(1.6)
the same way? However, as you well remember, dividing by a matrix was never
an allowed operation, but matrix multiplication surely was. Another way to look at (1.5) is, rather than dividing by a, multiply both sides of (1.5) by the reciprocal (inverse) of a, I la. Then x = (1 /a)b is the unique solution to (1.5). Now, the reciprocal of a in R is characterized as the unique number c such that
ac = I = ca. Of course, we write c as I /a or a-t so we can write x = a-' b above. We can translate these ideas into the world of square matrices.
DEFINITION 1.1 (matrix inverse) An n-by-n matrix A is said to have an inverse if there exists another n-by-n matrix C with AC = CA = I,,, where I is the n-by-n identity matrix. Any matrix A that has an inverse is called invertible or nonsingular. It is important to note that under this definition, only "square" (i.e., n-by-n) matrices can have inverses. It is easy to give examples of square matrices that are
not invertible. For instance [ 0
J
by any 2-by-2 matrix yields
[ 0
2
can not be invertible since multiplication
2
[
b
d ]
-[
2b
2d]
'
The Idea of Inverse
18
To get the identity matrix 12 = [ 2b
2d ] as an answer, b would have to
be I and 2b would have to be 0, a contradiction. On the other hand,
[
1
0]=[0
II [
0 1
-
0
o][°
1
We see some square matrices have inverses and some do not. However, when a matrix is invertible, it can have only one inverse.
THEOREM 1.1 (uniqueness of inverse) Suppose A is an n-by-n invertible matrix. Then only one matrix can act as an inverse matrix for A. As notation, we denote this unique matrix as A-'. Thus
AA-' = A -' A = !,,
.
PROOF
In typical mathematical fashion, to prove uniqueness, we assume there are two objects that satisfy the conditions and then show those two objects
are equal. So let B and C he inverse matrices for A. Then C = IC = (BA)C =
B(AC) = B! = B. You should, of course, be able to justify each equality in detail. We have used the neutral multiplication property of the identity, the definition of inverse, and the associative law of matrix multiplication. 0 The uniqueness of the inverse matrix gives us a handy procedure for proving facts about inverses. If you can guess what the inverse should be, all you need to verify are the two equations in the definition. If they work out, you must have the inverse. We next list some basic facts about inverses that, perhaps, you already know. It would be a good exercise to verify them.
THEOREM 1.2 (basic facts about inverses) Suppose the matrices below are square and of the same size. 1.
!, is invertible for any n and 1,-, ' = 1,,.
2. If k # 0, k! is invertible and (k!")-' = 11, 3. If A is invertible, so is A-'and (A-')-' = A. 4. If A is invertible, so is A" for any natural number n and (A")-'
5. If A and B are invertible, so is AB and (AB)-' = B'A'. (This is called the "reversal law for inverses" by some.)
6. !f k :0 and A is invertible, so is kA and (kA)-' = k A-'.
1.2 The Special Case of "Square" Systems 7.
19
If A is invertible, so is A AT, and A*; moreover, (AT)-i = (A-' )T'
(A-') _ (A)-', and (A*)-' = (A-')`
8. If A2=I", then A=A-'. 9.
If A is invertible, so is AAT, AT A, AA*, and A*A.
10. If A and B are square matrices of the same size and the product AB is invertible, then A and B are both invertible.
PROOF
This is an exercise for the reader.
There are many equivalent ways to say that a square matrix has an inverse. Again, this should be familiar material, so we omit the details. For the square matrix A, let the mapping from C" to C' given by x H Ax be denoted LA.
THEOREM 1.3 (equivalents to being invertible) For an n-by-n matrix A, the following statements are all equivalent (TA.E.): 1. A is invertible.
2. The equation Ax = ' has only the trivial solution x = . 3.
Ax = b has a solution for every n-by-1 b.
4.
Ax = b has a unique solution for each n-by- I b.
5. det(A) # 0. 6. The columns of A are linearly independent. 7. The rows of A are linearly independent.
8. The columns of A span C".
9. The rows of A span C'. 10. The columns of A form a basis of C'. 11. The rows of A form a basis of C".
12. A has rank n (A has full rank). 13. A has nullity 0. 14. The mapping LA is onto. 15. The mapping L A is one-to-one.
The Idea of Inverse
20
PROOF
Again the proofs are relegated to the exercises.
0
An important consequence of this theorem is the intimate connection between finding a basis of C" and creating an invertible matrix by putting this basis as columns to form the matrix. We shall use this connection repeatedly. Now, returning to the question of solving a "square" system of linear equations, we get a very nice answer.
THEOREM 1.4 (solving a square system of full rank) Suppose Ax = b has an invertible coefficient matrix A of size n-by-n. Then the system has a unique solution, namely x = A-' b.
PROOF
Suppose Ax = b has the invertible coefficient matrix A. We claim
xo = A-'b is a solution. Compute Axo = A(A-'b) = (AA-')b = lb = b. Next, if x, is some other solution of Ax = b, we must have Ax, = b. But then, A-' (Ax,) = A-' b, and so (A-' A)x, = A-' b. Of course, A-' A = I, so Ix, = A-' b, and hence, x, = A` b = x0, proving uniqueness. 0 As an illustration, suppose we wish to solve
2x, + 7x2 + x3 = 2
x,+4x2-x3 =2. =4
x, + 3x2 As a matrix equation, we have 2
I
The inverse of
-1
x2
3
0
x1
1
2
7
1
I
4
-1
1
3
0 X1
lution to the system is
x,
7 4
x2 X3
2
=
2 4
-3/2 -3/2 is
1 /2
1 /2
-1/2 -3/2 -3/2 1/2
1/2 1/2
1/2
-1/2
11/2
-3/2 -1/2
, so the so-
11/2
2
-3/2 -1/2
2
=
4
16
-4
, as you may verify.
-2
Theorem 1.4 is hardly the end of the story. What happens if A is not invertible?
We still have a long way to go. However, we note that the argument given in
21
1.2 The Special Case of "Square" Systems
Theorem 1.4 did not need x and b to be n-by-1. Indeed, consider the matrix equation
AX = B where A is n-by-n, X is n-by-p and, necessarily, B is n-by-p. By the same argument as in Theorem 1.4, we conclude that if A is invertible, then this matrix equation has the unique solution X = A-' B. So, if
[ 3
4) [
y u
V
1
we get 9
[u v]-[3 4]-'[6 _
12
-11
10.5
9.5
8
2
8
5
1.5
5
1
1
Our definition of matrix inversion requires two equations to be satisfied, AC = I and CA = I. Actually, one is enough (see exercise set 2, problem 21). This cuts down on our labor substantially and we will use this fact in the sequel. 1.2.1
The Henderson Searle Formulas
Dealing with the inverse of a product of invertible matrices is fairly easy in light of the "reversal law," (AB)-' = B-'A-1. However, dealing with the inverse of a sum is more problematic. You may consider this surprising since matrix addition seems so much more straightforward than matrix multiplication. There are some significant facts about sums we now pursue. We present the formulas found in Henderson and Searle [H&S, 1981 ]. This notable paper gives some historical perspective on why inverting sums of matrices is of interest. Applications come from the need to invert partitioned matrices, inversion of a slightly modified matrix (rank one update), and statistics. Henderson and Searle give some general formulas from which many interesting special cases result. We present these general formulas in this section. The following lemma is an important start to developing the HendersonSearle formulas. Note that the matrices U and V below are included to increase the generality of the statements. They are quite arbitrary, but of appropriate size. LEMMA 1.1 Suppose A is any n-by-n matrix with I, + A invertible. Then 1.
(I, + A)-' = I - A(1n + A)-' = I, - (I + A)-'A.
22
The Idea of Inverse
In particular 2.
A(/ + A)-' = (1 + A)-' A. Clearly 1, = (I + A) - A, so multiply by (I + A)-' from either
PROOF side.
0
Another important lemma is given next.
LEMMA 1.2 Suppose A, B, (B + VA-1U), (A + UB-' V) are all invertible with Vs-by-n,
A n-by-n, U n-by-s, and B s-by-s. Then (B + V A-' U)-' V A-' = B-' V (A +
UB-'V)-1. PROOF
Note VA-'(A+UB-'V)= V+VA-'UB-tV =(B+VA-'U)
B-' V so V A-I (A + UB-' V) = (B + VA-' U)B-' V . Multiply both sides by (A + UB-' V)-' from the right to get VA-' = (B + VA-' U)B-' V (A + UB-'V)-'. Multiply both sides by (B + VA-1U)-1 from the left to get
(B+VA-IU)-IVA-1 =B-IV(A+UB-'V)-1.
0
A quick corollary to Lemma 1.2 is given below. COROLLARY 1.1
Suppose (I,, + VU) and (I,, + U V) are invertible. Then (1, + VU)-I V =
V(I + UV)-'. PROOF
Take A = I and B = 1, in Lemma 1.2.
0
Finally, the Henderson-Searle formulas are given below.
THEOREM 1.5 (Henderson Searle formulas) Suppose A is n-by-n invertible, U is n-by-p, B is p-by-q, and V is q-by-n. Suppose (A + U B V) ' exists. Then
1. (A+UBV)-' =A-' 2. (A + UBV)-' =A-'
3. (A+UBV)-' =A-' -A-'U(I,,+BVA-'U)-'BVA-', 4. (A + UBV)-' = A-' - A-1 UB(Iq + VA-'UB)-'VA-1,
/.2 The Special Case of "Square" Systems 5.
23
(A + UBV)-' = A-' - A-'UBV(In + A-'UBV)-'A-',
6. (A + UBV)-' = A-' - A-'UBVA-'(In + UBVA-I)-I. PROOF First, note In + A-'UBV = A-'(A + UBV), which is a product of invertible matrices (by hypothesis) and hence In + A-' UBV is invertible.
For(I)then,
(A + UBV)-' _ [A(1,, + A-'UBV)]-'
((In - (I + A-'UBV)-'A -'UBV ))A
= A-' - (In + A-'UBV)-'A-UBVA-' using Lemma 1.1. Using (2) of that lemma, we see (1 + A UBV)-' A-' U B V = A-' UBV (In + A-' UBV)-' so formula (5) follows.
Now for formula (2), note (In + UBVA-1) = (A + UBV)A-1, which is a product of invertible matrices and so (1 + UBVA-1)-I exists. Using Lemma 1.1 again, wesee(A+UBV)-' = A-'(In+
UBVA-')-' = A-'[1 - (In + UBVA-')-'UBVA-'] = A-' - A-'(In +
UBVA-')-'UBVA-'. Again, using (2) of the lemma, formula (6) follows. Formula (3) follows from formula (2) and the corollary above if we knew (It, +BVA-' U)-' exists. But there is a determinant formula (see our appendix on determinants) that says det(1 + U(BVA-')) = det(I,, + (BVA-')U) so that follows. Now using the corollary, A-' [(In +(U)(B V A-' ))-' U] B VA-' = A-'[U(I,, +(BVA-')U)-']BVA-', hence (3) follows from (2). Finally, (4) follows from (3) by a similar argument.
0
The next corollary gives formulas for the inverse of a sum of two matrices. COROLLARY 1.2
Suppose A, B E C", A invertible, and A + B invertible. Then
(A+ B)-' = A-' -(1n +A-'B)-'A-'BA-' = A-' - A-'(In + BA-')-' BA-'
= A-' - A-' B(I,, + A'B'A' = A-' - A-'BA-'(In + BA-1)-1. PROOF
Take U = V = In in Theorem 1.5.
The next corollary is a form of the Sherman-Morrison-Woodbury formula we will develop soon using a different approach.
The Idea of Inverse
24
COROLLARY 1.3
Suppose A, B and A + U B V are invertible. Let D = -B-' so B = D
then (A - UD-'V)-l = A-' +A-'U(D - VA-'U)-'VA-'. PROOF
Using (3) from Theorem 1.5, (A - UD-' V)-' =
A-' - A-'U[(1 - D-'VA-'U)-'(-D-')]VA-' = A-' - A-'U[(D-'D - D-'VA-'U)-'(-D-')]VA-' = A-' + A-' U(D - VA-1 U)-' VA-1 using distrihutivity and the reversal law for inverses.
0
We have a special case involving column vectors. COROLLARY 1.4
Suppose A is n-by-n invertible, a is a scalar, A + auv* is invertible, 1 +
av*A-' u # 0 , and u, v are n-by-1. Then
(A+auv*)-' = A-' 1
PROOF
A 'uv*A+av*A-lu'.
By (4) from Theorem 1.5 with B = a/, (A + uBv*)-' = (A +
u(al)v*)-' = A-' - A-'u(al)(I +v*A-'u(al))-'v*A-' _ A_' - aA-'uv*A-'
0
1 +av*A-'u We finish with a result for self-adjoint matrices. COROLLARY 1.5
Suppose A = A*, B = B*, A invertible, and A + UBU* invertible. Then
(A + UBU*)-' = A-' - A-'UB(l + U*A-'UB)-U*A-'. PROOF 1.2.2
Use (4) from Theorem 1.5 and take V = U*.
Schur Complements and the Sherman-Morrison-Woodbury Formula
We end this section by introducing the idea of a Schur complement, named after the German mathematician Issai Schur (10 January 1875-10 January 1941) and using it to prove the Sherman-Morrison-Woodbury formula. The Schur complement has to do with invertible portions of a partitioned matrix. To motivate the idea of a Schur complement, we consider the 2-by-2 matrix
25
1.2 The Special Case of "Square" Systems
M
d
= [
E
C2,,2. Suppose a # 0. Then
c 1
I
-ca
0
a
I
c
)
b
a
-a-)b
1
d]
0
1
b
=
d - ca-)b
0
-a-(b
I
-ca-)a+c -ca-(b+d a a -as-'b +b
] [ 0
1
0
d - ca-(b
0
Since the matrices on either side of M are invertible, M is invertible iff
[ a0 d - ca0 b
is invertible iff d -ca-) b A 0.Of course, all of this assumes
)
J
that a j4 0. Note that det(M) = a(d - ca-)b). 01
Similarly, if we assumed # 0 instead of a 54 0, then I
[0
[a d
1
= [d
x J
J
c 1 is invertible if M is. With different letters, we apply b aJ 0 the above argument and conclude M is invertible iff a - bd-) c # 0. Again, there is the underlying assumption that d i4 0. Now we extend these 1
1
1
B"xt Dsxt
A'tx" Csxn
ideas to larger matrices. Let M
E (r(n+s)x(n+t) Assume A
is nonsingular. Can we mimic what we did above'? Let's try.
DEFINITION 1.2
(Schur complements) Bnxt
Consider a matrix M partitioned as M = [ Anxn
1
E C(n++)x(n+t)
D,xt J Define the Schur complement of A in M by MIA = D - CA-'B E C'Sxr Cv xn
assuming, of course, that A is invertible. If M is partitioned I
A, xt Cnxt
Bs xn
Dnxn
E J
C(n+s)x(n+t) where D is now invertible, define the Schur complement of D in
M by M//D = A - BD-)C.
To illustrate , suppose M =
left 2-by-2 block [
1.
2
1
5
9
2 6
3 7
5
-4
48 -3
57
[
g3
-7
2,- [
6 5
4,[2 7
-
,
where A is the up per
2
Then M/A =
3 8
13
1
5
3]-)[4
9
5,-[
13
0
-16
10
24
26
The Idea of Inverse
The next theorem is due to the astronomer/mathematician Tadeusz Banachiewicz
(1882-1954).
THEOREM 1.6 (Banacheiwicz inversion formula, 1937) Consider a matrix M of four blocks partitioned as
M=[ C 1.
D] where A is r-by-r, B is r-by-s, C is s-by-r, and D is s-by-s.
if A-' and (M/A)-'exist, then the matrix [
A
[
B
'
CD
_
J
A-' + A-' BS-' CA-'
= [
]
is invertible and
B
C
-S-'CA-'
-A-' BS-'
S'
I
where S = M/A.
if D-' and (M//D)-'exist, then the matrix
2.
A
[
B
CD
'
]
[
A D J is invertible and -T-'BD-
_ T-' = [ -D-'CT-'
D-' + D-'CT-'BD-' I
]
where T = M//D. PROOF We will prove part (I) and leave part (2) as an exercise. Suppose A is invertible and S = D - CA-' B is invertible also. We verify the claimed inverse by direct computation, appealing to the uniqueness of the inverse. Compute A
I
B
CD
_
A-' +A-'BS-'CA-' -S-'CA-'
-A-' BS-' 1 S-'
J
AA-' + AA-' BS-' CA-' - BS-'CA-1 -AA-' BS-' + BS-' C[A-' + A-' BS-'CA-'] + D[-S-'CA-'] -CA-' BS-' + DS-' [
I + IBS-'CA-' - BS-'CA-' CA-' + CA-' BS-'CA-' - DS-'CA-'
-IBS-' + BS-' -CA-'BS-1 + DS-'
A few things become clear. The first row is just what we hoped for. The first block is /, and the second is so we are on the way to producing the identity matrix. Moreover, the lower right block
-CA-' BS-' + DS-' = (-CA-' B + D)S-' = SS-' = I.
27
1.2 The Special Case of "Square" Systems We are almost there! The last block is
CA-' +CA-'BS-'CA-' - DS-'CA-' = CA-' + [CA-'BS-' - DS-']CA-' = CA-' + [[CA-'B - D]S-']CA-' = CA-' - SS-'CA-' = 0 and we are done.
0
It is of interest to deduce special cases of the above theorem where B = 0 or C = 0, or both B and C are 0. (See problem 30 of exercise set 2.) Note, in the latter case, we get D0 L
O
®'
D®'
Statisticians have known an interesting result about the inverse of a certain sum of matrices since the late 1940s. A formula they discovered has uses in many areas. We look at a somewhat generalized version next. THEOREM 1.7 (Sherman-Morrison- Woodbury formula) Suppose A is n-by-n nonsingular and G is s-by-s nonsingular, where s <
n. Suppose C and D are n-by-s and otherwise completely arbitrary. Then A + CGD* is invertible if G-' + D*A-'C is invertible, in which case
(A+CGD*)-' = A-' - A-'C(G-' + D*A-'C)-'D*A-'. PROOF We will use a Schur complement to prove this result. First, suppose
that A and G are invertible and S = G-' + D*A-'C is invertible. We must show A + CGD* is invertible. We claim -D* G ' is invertible and J
L
C
A
'
A-' - A-'CS-'D*A-'
-A-'CS-1
Note the
S-' D*A-' S-' Schur complement of A is G-' - (-D*)A-'C = G-' + D*A-'C, which is [
-D* G-'
precisely S. The theorem above applies; therefore A
A-' + A-'CS-'(-D*)A-'
C
-D* G-' ]
-[ - r A-' -AS-'CS-'D*A-1 D* A-'
-A-'CS- ' S-'
-S-'(-D*)A-i
-A-' CS-1 S-'
L
The claim has been established. But there is more to show. The next claim is
that 1 ®
-CG
I
1 is invertible. This is easy since L
C'
is its inverse, J
The Idea of Inverse
28
I
I
as you can easily check. Also,
GD*
1
is invertible since its inverse is
®J . Now consider /
-CG Ir
f
C
A
-D* G'
1
/ I
GD*
/
1
Lr
A + CGD*
0
0
G-'
Being the product of three invertible matrices, this matrix must be invertible. But then, the two nonzero diagonal block matrices must he invertible, so A +CGD* must he invertible. Moreover,
A + CGD*
0
(A +
0
CGD*)-'
®1 G
but also equals
GD*
_
0 ]-1 [-AD* I -G/D*
_
0
I][
A-' - A-' CS-' D*A-' S-1 D*A-'
A-' - A-' CS-' D*A-'
stuff
stuff
stuff
-A-'CS-1 S
/
CG
110
/
Now compare the upper left blocks.
We leave the converse result as an easy exercise.
0
The strength of a theorem is measured by the consequences it has. There are many useful corollaries to this theorem.
COROLLARY 1.6 For matrices of the appropriate size, we have 1.
If A, G, and A + G are invertible, (A + G)-' = A-' - A-'(G-1 +
A-')-'A-'.
2. If A is invertible and n-by-n, C and D are n-by-s, and (I + D*A-'C)-'
exists, then (A+ CD*)-' = A-' - A-'C(1 + D*A-'C)-' D*A-1.
29
1.2 The Special Case of "Square" Systems
3. (1 + CD*)-' = I - C(1 + D*C)-I D*, if (I + D*C)-I exists. 4. Let c and d be n-by-1. Then (A +cd*)-' = A
- I + cd
d*A-'c # -1 and A-' exists. 5. Let c and d be n-by-1. Then (I + cd*)-' = I 6.
-
If u and v are n-by-1, then (I - uv*)-1 = 1-
PROOF
I
c' provided
cd* + d*c ,
uv* v*u -
1'
if I + d*c 00.
if v*u 54 I.
The proofs are left as exercises.
a
We look at a special case of (2) above. Recall the "standard basis" vectors
fo1
ej =
where the
1
I
appears in the jth position. Then we change the
L 0 J
(i, j) entry of a matrix A by adding a specified amount a to this entry. If A is invertible and the perturbed matrix is still invertible, we have a formula for its inverse. Some people say we are inverting a "rank one update" of A.
(A+ae;e')-' = A-' -a j
A-'e,e, A-' _ A-' -acol;(A-')rowj(A-') I +ae*A-'e; I +aentji(A-1)
We illustrate in the exercises (see exercise 16 in exercise set 2).
Exercise Se t 2 1.
If A =
i
0
0
0
i
0
0
0
i
, what is A-'? What if A =
2. Find A* and (A*)-i , if A =
0
0
i
0
i
0
i
0
0
2+i
3+i
4+i
5 + 3i
6 + 3i
6 + 2i
8+2i 8+3i
9-3i
?
30
The Idea of Inverse
3. If A is an m-by-n matrix, what is Aej, e A, e, Ae1, where e, is the standard
basis vector? What is ej*e;? What is eke;*? 4. Suppose A and B are two m-by-n matrices and Ax = Bx for all n-by-I matrices x. Is it true that A = B? Suppose Ax = 0 for all x. What can you say about A? I
5. Argue that
0 0
a
b
1
c
0
1
is always invertible regardless of what a, b, and
c are and find a formula for its inverse. Do the same for
1
0
a
1
b
c
0 0 I
6. Suppose A is n-by-n and A3 = I,,. Must A be invertible? Prove it is or provide a counterexample. 7. Cancellation laws: Suppose B and C are m-by-n matrices. Prove (i) if A is m-by-m and invertible and AB = AC, then B = C. (ii) if A is n-by-n and invertible and BA = CA, then B = C.
8. Suppose A and B are n-by-n and invertible. Argue that A-' + B-' _ A-'(A + B)B-'. If A and B were scalars, how would you have been led to this formula? (Hint: consider ( + h)). If (A + B)-' exists, what is (A-' + B-')-' in view of your result above? 9. Suppose U is n-by-n and U2 = I. Argue that I + U is not invertible unless U = I. 10. Suppose A is n-by-n, A* = A, and A is invertible. Prove that (A-')* _
A-'. 11. Suppose A is n-by-n and invertible. Argue that AA* and A*A are invertible.
12. Prove or disprove the following claim: Any square matrix can be written as the sum of two invertible matrices.
13. Verify the claims made in Theorem 1.2.
14. Complete the proof of Theorem 1.5. 15. Fill in the details of the proof of Corollary 1.1 and Corollary 1.6.
31
1.2 The Speci al Case of "Square" Systems
16. Let A = 1
14
17
3
17
26
5
3
5
1
-2
. Argue that A is invertible and A- ' _
7
-2 5 -19 7 -19 75
.
Now consider the matrix Anew =
14
17
17 3
24 5 5
3
1
Use the rank one update formula above to find (Anew)-I
17. Suppose A is invertible and suppose column j of A is replaced by a new column c, so that the new matrix A, is still invertible. Argue that
(A,)-' = A-' -
(A-1c - ej)rowj(A-1) Continuing with the matrix in rowj(A- )c
exercise 16, compute the inverse of
1
17
3
1
26
5
1
5
1
18. Suppose we have a system of linear equations Ax = b, where A is invertible. Suppose A is perturbed slightly to A + cd*. Consider the system of linear equations (A + cd*)y = b, where (A + cd*) is still invertible. Argue that the solution of the perturbed system is y = A-l b
A-icd*A-lb
-
I +d*A-'c 19. Argue that for a small enough, A + ee;e* remains nonsingular if A is.
20. Suppose A E Cand B E C"x" are invertible. What can you say about the matrix converse?
® L L
B
l ? (Hint: Use Theorem 1.6.) How about the J
21. In our definition of inverse, we required that the inverse matrix work on both sides; that is, AC = I and CA = I. Actually, you can prove AC = I implies CA = I for square matrices. So, do it. 22. Prove the claims of Theorem 1.3. Feel free to consult your linear algebra book as a review.
23. If A is invertible and A commutes with B, then does A-' commute with B as well? Recall we say "A commutes with B" when AB = BA. 24. If I+ AB is invertible, is I+ BA also invertible? If so, is there a formula for the inverse of I + BA?
The Idea of'Inverse
32
25. Suppose P
P2. Argue that I - 2P is its own inverse. Generalize this
to (I - a P) ' for any a. 26. Suppose A is not square but AA* = 1. Does A*A = I also? 1
-r
27. Compute det
r
s
I
t
-s -t
. What, if anything, can you conclude
1
about the invertibility of this matrix? 28. Argue that A, a square matrix, can not be invertible if each row of A sums to 0.
29. Argue that
I
is always an invertible matrix regardless of what
® J
A is. Exhibit the inverse matrix.
30. Investigate the special cases of the Schur complements theorem where B = 0, C = 0, or both are zero. Be clear what the hypotheses are.
31. For which values of X is
1-X 0 0
1
1
2-A
3
-3
2-
an invertible
matrix'?
32. Is the matrix
a
b
c
d
e 0
0
f
invertible'? Do you need any conditions on
0
the entries'?
33. Suppose Ax = Xx, where k is a nonzero scalar and x is a nonzero vector.
Argue that if A is invertible, then A-'x = fx. 34. Can a skew symmetric matrix of odd order be invertible'? 1
I
1
35. Let A = n / +
= n 1 + B. Is A invertible? If I
1
1
so, what is A-''? What is the sum of all entries in A"? (Hint: What is B2?)
36. Suppose A and B are nonsingular. Argue that [(AB)-l ]T = [(AB)T ]-I.
1.2 The Special Case of "Square" Systems 1
0
0
0
0
a
1
0
0
0
33
0 0 . What is (L-5(a))-19 Can you 0 a 0 0 0 a 0 0 0 a 1 generalize this example to (L,, (a))-'?
37. Let L5(a) =
1
1
38. Let A be a square and invertible matrix and let X be such that Ak+, X = Ak
for some k in N. Argue that X = A-'. 1
39. FindA-' ifA =
1
0
-2
1
1
0
0
-2
1
0
-2 1
1 1
-2
0
0
0
1
0
0
-2
0
0
1
-2
1
1
0
etc. Do you see a pattern?
40. Find A-' if A =
2
-1
0
-1
2
-1
0
-1
2
2
-1
0
0
- 1
2
-1
0
0 0
0
1
2
-1
-
, etc.
1
2
Do you see a pattern?
41. Let A =
A,2 A22
A I
, where A is k-by-k and invertible. Argue that
I
A2, A
-L
A2,Aii
42. Consider
r I
All
®1
1
A T
1
A,2
L0
J
a l r
A22 - A21Ai,'A,2
l
x
I
J=
b
, where A is n-by-n and
Ld a J x,, bn+, invertible and a, b, d e C". Argue that if a -dTA-'a # 0, then b,,+, -dTA-'b and x=A-'b-xn+,A-'a. a - dTA-la l
43. Argue that det(A + cd T) = det(A)(1 + dTA-Ic). In particular, deduce that det(1 + cdT) = I + dre. 44. Suppose v is an n-by-1, nonzero column vector. Argue that an invertible matrix exists whose first column is v.
34
The Idea of Inverse
-
1
45. What is the inverse of 0
of
1
1
01
-1
-1
1
I
1
I
II
? How about the inverse
I
1
'? Do you notice anything interesting?
1
-1
0
a
46. Under what conditions is
b
-b a
invertible'? Determine the inverses J
in these cases.
47. Argue that
A-' -VA -
01 [A
U
I
D
i
V
A'U 0 D- V A- U I
=
i
where, of course, A is invertible. Deduce that det(I A
U
VD
LL
= det(A)det(D - V A-' U) = det(D)det(A - U D-' V). As a corollary, conclude that det(I + V U) = det(I + UV).
48. Suppose A, D, A - BD-' C, and A - BD-' C are invertible. Prove that A B ] _' [ (A - BD-'C)-' -(A - BD-'C)-'BD-1 D -(D (Hint: See Theorem 1.6.) C
CA-' B)-'CA
D - CA-'B -'
49. Suppose A, B, C, and D are all invertible, as are A - BD-'C, C -
DB-' A, B - AC-'D, and D - CA-' B. Argue that
(A - BD-'C)-'
(C - DB-'A)-'
(B-AC-'D)-'
(D - CA- B-' ]
A
B
CD
50. Can you derive the Sherman-Morrison-Woodhury formula from the Henderson-Searle formulas'?
51. Argue that (D - CA-'B)-'CA-' = D-'C(A - BD-'C)-'. (Hint: Show CA-'(A - BD-'C) = (D - CA-'B)D-'C.) From this, deduce as a corollary that (1 + AB)-' A = A(1 + BA)-'. 52. Argue that finding the inverse of an n-by-n matrix is tantamount to solving n systems of linear equations. Describe these systems explicitly.
53. Suppose A + B is nonsingular. Prove that A - A(A + B)-IA = B - B(A + B)-'B.
1.2 The Special Case of "Square" Systems
35
54. Suppose A and B are invertible matrices of the same size. Prove that
A-' + B-' = A-'(A+
B)B-1.
55. Suppose A and B are invertible matrices of the same size. Suppose further
that A-' + B-' = (A + B)-1. Argue that AB-'A = BA-'B. 56. Suppose det(1,,, + AA*) is not zero. Prove that (1,,, + AA*)-' = I A(I + A*A)-' A*. Here A is assumed to be m-by-n. 57. Suppose C = C* is n-by-n and invertible and A and B are arbitrary and
n-by-n. Prove that (A - BC-')C(A - BC-')* - BC-' B* = ACA* BA* - (BA*)*. 58. Refer to Theorem 1.6. Show that in case (1),
I-' L
CD
-'B
I
-[ 0
-A1
A J
L
®'
-CA-' I
0
59. Continuing our extension of Theorem 1.6, show that
_
(A - BD-'C)-' -D-'C (A - BD-'C)-'
A
B
1-1
CD
-A-' B(D - CA-' B)-' (D - CA-' B)-'
60. Use exercise 59 to derive the following identities:
(a) (A - BD-'C)-' BD-' = A-'B(D - CA-'B)-l (A-' - BD-'C)-' BD-' = AB(D - CA-' B)-' (c) (A + BD-'C)-' BD-' = A-' B(D + CA-' B)-' (d) (D - CA-' B)-' = D-' + D-C (A - BD-'C)-' BD-' (e) (D - CA-' B)-' = D-' - D-'C (BD-'C - A)-' BD-' (I) (D + CA-' B)-' = D-' - D-'C (BD-'C + A)-' BD-' (b)
(g) (D-' + CA-' B)-' = D - DC (BDC + A)-' BD
(h) (D - CAB)-' = D-' - D-'C (BD-'C - A-')-' BD-' (i) (D + CAB)-' = D-' - D-'C (BD-1C +A-')-' BD. 61. Apply Theorem 1.6 and rthe related 1problems above to the special case of the partitioned matrix
B*
D J . What formulas and identities can
I.
you deduce?
62. Prove the Schur determinant formulas. Let M =
A
B
CD
(i) If det(A) # 0, then det(M) = det(A)det(M/A). (ii) If det(D) 0 0, then det(M) = det(Ad)det(M//D).
The Idea of inverse
36
Further Reading [Agg&Lamo, 2002] RitaAggarwala and Michael P. Lamoureux, Inverting the Pascal Matrix Plus One, The American Mathematical Monthly, Vol. 109, No. 4, April, (2002), 371-377.
[Banachiewicz, 1937] T. Banachiewicz, Sur Berechnung der Determinanten, wie auch der Inversen, and zur darauf basierten aufiosung der Systeme Linearer Gleichungen, Acta Astronomica, Serie C, 3 (1937), 41-67.
[B&R, 1986(2)] T. S. Blyth and E. F. Robertson, Matrices and Vector Spaces, Vol. 2, Chapman & Hall, New York, (1986). [C&V 1993] G. S. Call and D. J. Velleman, Pascal's Matrices, The American Mathematical Monthly, Vol. 100, (1993), 372-376.
[Greenspan, 1955] Donald Greenspan, Methods of Matrix Inversion, The American Mathematical Monthly, Vol. 62, May, (1955), 303-318. [Hager, 1989] W. W. Hager, Updating the Inverse of a Matrix, SIAM Rev., Vol. 31, (1989), 221-239.
[H&S, 19811 H. V. Henderson and S. R. Searle, On Deriving the Inverse of a Sum of Matrices, SIAM Rev., Vol. 23, (1981), 53-60. [M&K, 2001 ] Tibor Mazuch and Jan Kozanek, New Recurrent Algorithm for a Matrix Inversion, Journal of Computational and Applied Mathematics, Vol. 136, No. 1-2, 1 November, (2001), 219-226.
[Meyer, 2000] Carl Meyer, Matrix Analysis and Applied Linear Algebra, SIAM, Philadelphia, (2000). [vonN&G, 1947] John von Neumann and H. H. Goldstine, Numerical Inverting of Matrices of High Order, Bulletin of the American Mathematical Society, Vol. 53, (1947), 1021-1099.
[Zhang, 2005] F. Zhang, The Schur Complement and Its Applications, Springer, New York, (2005).
1.2 The Special Case of "Square" Systems
37
1.2.3 MATLAB Moment 1.2.3.1
Computing Inverse Matrices
If A is a square matrix, the command for returning the inverse, if the matrix is nonsingular, is inv(A)
Of course, the answer is up to round off. Let's look at an example of a random 4-by-4 complex matrix.
> >format rat >>A = fix(l0 * rand(4)) + fix(l0 * rand(4)) * i
A= Columns 1 through 3
9+9i 8 8+ li 2+9i 7+3i 4+2i
6+4i 4+8i 6+ Ii 4+8i
0
7+6i
Column 4
9+2i
7+li 1
4+7i >>inv(A) ans =
Columns I through 3
1/2 - 1/6i
-959/2524+516/2381i
291/4762+520/4451i
-1/10 - 2/1 51
1072/6651 + 139/1767i
212/4359 - 235/26361
-7/10 + 7/3 01 1/ 10 - 1 113 0i
1033/2317 - 714/1969
71/23810 - 879/50981 31/11905 + 439/81811
129/1379+365/1057i
Column 4 -489/2381 + 407/2381 i 399/7241 + 331/338 1 i
683/1844 - 268/1043i
-6/11905+81/601i
38
The Idea of Inverse
We ask for the determinant of A which, of course, should not be zero
>>det(A) ans =
-396 + 1248i > >det(inv(A)) ans =
-11/47620 - 26/35715i > > det(A) * det(inv(A)) ans =
I- 1/1385722962267845i
Note that, theoretically, det(A) and det(inv(A)) should be inverses to one another. But after all, look how small the imaginary part of the answer is in MATLAB. We can also check our answer by multiplying A and inv(A). Again, we must interpret the answer presented.
>> A*inv(A) ans =
Columns I through 3
*-* 1+* *-* *-* *-* *-* Column 4
We need to recognize the identity matrix up to round off. To get a more satisfying result, try
>>round(A*inv(A)). There may be occasions where you actually want a singular square matrix. Here is a way to construct a matrix that is not only singular but has a prescribed dependency relation among the columns. To illustrate let's say we want the third row of a matrix to be two times the first row plus three times the second.
1.2 The Special Case of "Square" Systems
39
[III 1; 2222; 3333; 4444]
>>A
A= i
l
1
I
2
2
2
2
3
3
3
3
4
4
4
4
> > A(:, 3) = A(:, 1 : 2) * [2, 3]'
A= 1
1
2 3 4
2
5
1
10
2
3
15
3
4
20
4
>>det(A) ans = 0
1.2.4
Numerical Note Matrix Inversion
1.2.4.1
Inverting a matrix in floating point arithmetic has its pitfalls. Consider the _i l where n is a positive integer. Exact simple example, A= I I I I n J
n -nn I
However, when n is large, the arithmetic gives A` _ J. machine may not distinguish -n + I from -n. Thus, the inverse returned would be
n -n -n n
J,
which has zero determinant and hence is singular.
User beware!
1.2.4.2
Operation Counts
You probably learned in your linear algebra class how to compute the inverse of a small matrix using the algorithm [A I] -> [1 I A-'] by using GaussJordan elimination. If the matrix A is n-by-n, the number of multiplications and the number of additions can be counted. 1
1. The number of additions is n3 - 2n2 + n. 2. The number of multiplications is n3. For a very large matrix, n3 dominates and is used to approximate the number of additions and multiplications.
Chapter 2 Generating Invertible Matrices
Gauss elimination, back substitution, free parameters, Type 1, Type 1!, and Type 111 operations; pivot
2.1
A Brief Review of Gauss Elimination with Back Substitution
This is a good time to go back to your linear algebra book and review the procedure of Gauss elimination with back substitution to obtain a solution to a system of linear equations. We will give only a brief refresher here. First, we recall how nice "upper triangular" systems are. Suppose we need to solve
2x,+x2-x3=5 X2+x3=3. 3x3 = 6
There is hardly any challenge here. Obviously, the place to start is at the bottom. There is only one equation with one unknown; 3X3 = 6 so X3 = 2. Now knowing what x3 is, the second equation from the bottom has only one equation with one unknown. So, X2 + 2 = 3, whence x2 = 1. Finally, since we know x2 and x3, x, is easily deduced from the first equation; 2x, + 1 - 2 = 5, so x, = 3. Thus the unique solution to this system of linear equations is (3, 1, 2). Do you see why this process is called back substitution? When it works, it is great! However, consider 2X, + X2 - x3 = 5
X3=2. 3X3 = 6
We can start at the bottom as before, but there appears to be no way to obtain a value for X2. To do this, we would need the diagonal coefficients to be nonzero.
41
Generating /nvertible Matrices
42
Generally then, if we have a triangular system a11xl + a12X2 + ... + alnx, = bl
a22x2 + ... + a2nxn = b2 annx,, = bpi
and all the diagonal coefficients aii are nonzero, then back substitution will work recursively. That is, symbolically, xn = bnlann 1
X11-1 =
xi = for
an-In-I
(bn-1 - an-Inxn)
1 (bi - ai.i+l xi+l - ai.i+2xi+2 aii
- ainxn)
i =n - 1,n -2,... ,2, 1.
What happens if the system is not square? Consider
x1 + 2x2 + 3x3 + xa - x5 = 2 x3 + xa + x5 = -1 . We can make it triangular, sort of: xa - 2x5 = 4
{
x l + 3x3 + xa = 2 - 2x2 + x5 X3 + X4 = -1 - X5
xa=4+2x5
Now back substitution gives
xl = 13 - 2x2 + 8x5 X3 = -5 - 3x5
xa=4+2x5. X2 = X2 X5 = X5
Clearly, x2 and z5 can be any number at all. These variables are called 'free." So we introduce free parameters, say x2 = s, x5 = t. Then, all the solutions to this system can he described by the infinite set {(13 - 2s + 8t, s, -5 - 3t, 4 + 2t, t) I s, t are arbitrary}. Wouldn't it he great if all linear systems were (upper) triangular? Well, they are not! The question is, if a system is not triangular, can you transform it into a triangular system without disturbing the set of solutions? The method we review next has ancient roots but was popularized by the German mathematician
2.1 A Brief Review of Gauss Elimination with Back Substitution
43
Johann Carl Friedrich Gauss (30 April 1777 - 23 February 1855). It is known to us as Gauss elimination. We call two linear systems equivalent if they have the same set of solutions. The basic method for solving a system of linear equations is to replace the given system with an equivalent system that is easier to solve. Three elementary operations can help us achieve a triangular system: Type 1: Interchange two equations.
Type 11: Multiply an equation by a nonzero constant.
Type 111: Multiply an equation by any constant and add the result to another equation. The strategy is to focus on the diagonal (pivot) position and use Type III operations to zero out (i.e., eliminate) all the elements below the pivot. If we find a zero in a pivot position, we use a Type I operation to swap a nonzero number below into the pivot position. Type II operations ensure that a pivot can always be made to equal 1. For example,
I
x, + 2x2 + 3x3 = 6
x, + 2x2 + 3x3 = 6 pivot
2x, + 5x2 + x3 = 9 X1 + 4x2 - 6x3 = 1
all = 1
{
pivot
X2-5X3 = -3
(I)
a22
2X2 - 9X3 = -5
x, + 2x2 + 3x3 = 6 X2 - 5x3 = -3 X3 = 1
Back substitution yields the unique solution (-1, 2, 1). As another example, consider
x, + 2x2 + x3 + x4 = 4 2x, + 4X2 - x3 + 2x4 = I I X1 + X2 + 2x3 + 3x4 = I x, + 2x2 + X3 + X4 = 4
-x2+X3+2X4 = -3 -3X3 =3
-
x, + 2x2 + x3 + x4 = 4 -3x3 -x2 + x3 + 2x4 = -3
=3-
x, + 2x2 + X3 = 4 - X4
-X2+X3 = -3-2X4. -3X3 =
3
We see x4 is free, so the solution set is
{(1 - 5t, 2 + 2t, -1, t) I t arbitrary}. However, things do not always work out as we might hope. For example, if a zero pivot occurs, maybe we can fix the situation and maybe we cannot.
44
Generating Invertible Matrices Consider
I x+y+z= 4
x+y+z= 4
I
2x + 2Y + 5z = 3 ->
3z = -5
4x + 4y + 8z =
4z = -6
10
.
There is no way these last two equations can he solved simultaneously. This system is evidently inconsistent - that is, no solution exists.
Exercise Set 3 1. Solve the following by Gauss elimination:
+ +
2x, X1 (a)
-
{ 6x, 4x,
+ + + +
2x,
4x, (b)
{
9x1 10x,
2fx, I 3,/2-x,
(d)
{
1
(g)
8x3
18x2
+ -
6x3 10x3 6x3
10x2
+
10x3
+ + +
3x3
2x2
2x2 3x2
x2 3ix2
-
6ix3
-
2x,
+
4x3
X1
+
;X2
-
3x3
x2
4x2 5x2
+
-
x3
5x3 4x3
4x,
+
(2 - 2i)x2
(2 - 2i)x,
-
2x2
(x, + 2x2 + 3x3 = 4
1
12
20
-
= = = =
8x4
+ +
2x4 3x4 15x4
-
-2 10
24 10
2x3
+
+ + +
2
x3
4x,
3x, { 4x,
(0
3x3
Six,
X1 (e)
= = = =
x3 2x3
+
2x2
+ + +
,f2-x,
(c)
+ -
x2 3x2 3x2 4x2
+ +
+
+ +
9ix5
4x4 X4
-
3x5
3ix4
X4
-
6x4 6x4
= =
i i
= = =
3 1
5
12x5
= = =
3i
4 1
2.1 A Brief Review of'Gauss Elimination with Back Substitution XI (h) X1
xI
+
-
G)
(k)
X2
IZ
x2
{
3x2
5x1
-
X2
X1
-
12-X 2
{
2x1
I
X2
+ +
0
=
0
+ +
2x3 2
= = =
X3 X3
3x3
4X4
4x4
0
=0
3
6 7
=
X3
-
'rte 1/
- y 2_x3
4x2
+ +
-4ix1
jX3
+
X2
= - X4 = X4
2X3
+ + +
+ +
3x1
2X3
+ +
4X2
2x1
xI (i)
-
ZX2
45
6x2 (2 - 6i) X2
3ix2
+ + +
=
37r + 2,/2-
(16 + 12i) x3 (16 - 12i)x3
(-3 + 91)x3
= = =
26 8 - 26i.
3 + 12i
2. Consider the nonlinear problem x2
{
2x2 3x2
+ + +
2xy
xy 3xy
+ + +
3y2
= =
29.
y2
=
3
y2
1
Is there a way of making this a linear problem and finding x and y? 3. Suppose you need to solve a system of linear equations with three different
right-hand sides but the same left-hand side -that is Ax = b1, Ax = b2, and Ax = b3. What would be an efficient way to do this?
4. Suppose you do Gauss elimination and back substitution on a system of three equations in three unknowns. Count the number of multiplications/divisions in the elimination part (11), in the back substitution part (6), and in the total. Count the number of additions/subtractions in the elimination part (8), in the back substitution part (3), and the total. (Hint: do not count creating zeros. The numerical people wisely put zeros where they belong and do not risk introducing roundoff error.)
5. If you are brave, repeat problem 4 for n equations in n unknowns. At
the ith stage, you need (n - i) + (n - i)(n - i + 1) multiplications/ divisions, so the total number of multiplications/divisions for the elimination is (2n3 + 3n2 - 5n)/6. At the ith stage, you need (n - i) (n - i + 1) additions/subtractions, so you need a total of (n3 - n)/3 additions/subtractions. For the back substitution part, you need (n2+n)/2
Generating Invertible Matrices
46
multiplications/divisions and (n2-n)/2 additions/subtractions, so the total number of multiplications/divisions is it 3 /3 + n2 - n/3 and the total number of additions/subtractions is n3/3 + n222/2 - 5n/6. 6. Create some examples of four equations in three unknowns (x, y, z) such that a) one solution exists, b) an infinite number of solutions exist, and c) no solutions exist. Recall our (zero, one)-advice from before. 7. Pietro finds 16 U.S. coins worth 89 cents in the Trevi Fountain in Rome. The coins are dimes, nickels, and pennies. How many of each coin did he find? Is your answer unique? 8. Suppose (xi, YO, (x2, y2), and (x3, y3) are distinct points in R2 and lie on
the same parabola y = Ax2 + Bx + C. Solve for A, B, and C in terms of the given data. Make up a concrete example and find the parabola. 9. A linear system is called overdetermined iff there are more equations than unknowns. Argue that the following overdetermined system is inconsi-
2x+2y=1
stent:
-4x + 8y = -8. Draw a picture in 1R2 to see why.
3x-3y=9
10. A linear system of m linear equations inn unknowns is call underdetermined iff m < n (i.e., there are fewer equations than unknowns). Argue that it is not possible for an underdetermined linear system to have only one solution. 11. Green plants use sunlight to convert carbon dioxide and water to glucose and oxygen. We are the beneficiaries of the oxygen part. Chemists write xi CO2 + x2H2O
X302 + X4C6H12O6.
To balance, this equation must have the same number of each atom on each side. Set up a system of linear equations and balance the system. Is your solution unique?
Further Reading [B&R, 1986(2)] T. S. Blyth and E. F. Robertson, Matrices and Vector Spaces, Vol. 2, Chapman & Hall, New York, (1986).
2.1 A Brief Review of Gauss Elimination with Back Substitution
47
Gauss (Forward) Elimination with Back Substitution
Use back substitution to find the determined variables in terms of the free variables: there will be infinitely many solutions
Use back substitution to find the values of the variables: there will be one unique solutions
Figure 2.1:
Gauss elimination flow chart.
2.1.1 MATLAB Moment 2.1.1.1
Solving Systems of Linear Equations
MATLAB uses the slash (/) and backslash (\) to return a solution to a system of linear equations. X = A \ B returns a solution to the matrix equation AX = B,
while X = B/A returns a solution to XA = B. If A is nonsingular, then A\B returns the unique solution to AX = B. If A is m-by-n with m > n
Generating hivertible Matrices
48
(overdetermined), A\B returns a least squares solution. If A has rank ii, there is a unique least squares solution. If A has rank less than n, then A\B is a basic solution. If the system is underdetermined (in < n), then A \B is a basic solution. If the system has no solution, the A\B is a least squares solution. Let's look at some examples. Let's solve the first homework problem of Exercise Set 3, la. >>A=[2 I -1;1 3 2;6 -3 -3;4 -4 8]
A= 2
1
-1
1
3
2
-3 -3 4 -4 8 >> b=[2;1;12;20] 6
b= 2 1
12
20
>> x=A\b
x=
2.0000
-1.0000 1.0000
>> A*x ans = 2.0000 1.0000 12.0000 1.0000
Next let's solve I k since it has complex coefficients.
>> C=[4 6 16+12i;-4 2-6i 16-12i;0 3i -3+9i]
C= 4.0000
6.0000
-4.0000
2.0000 - 6.0000i 0 + 3.0000i 0 >> d=[26;8-26i;3+121]
d= 26.0000
8.0000 - 26.0000i 3.0000 + 12.0000i
>> y=C\d
y= -0.1 154 + 0.0769i 1.1538 +.02308i 0.7308 - 0.6538i
16.0000 + 12.0000i 16.0000 - 12.0000i
-3.0000 + 9.0000i
2.2 Elementary Matrices
49
» c*y ans =
26.0000
8.0000 - 26.0000i 3.0000 + 12.0000i
Now let's look at an overdetermined system. >> E=[2 2;-4 8;3 -31
E= 2
2
-4
8
-3 >> f=[1; -8; 91 3
1
-8 9
>> z=E\f
z=
1.6250
-0.3750 >> E*z
ans = 2.5000
-9.5000 6.0000
matrix units, transvections, dilations, permutation matrix, elementary row operations, elementary column operations, the minimal polynomial
2.2
Elementary Matrices
We have seen that when the coefficient matrix of a system of linear equations is invertible, we can immediately formulate the solution to the system in terms of the inverse of this matrix. Thus, in this section, we look at ways of generating invertible matrices. Several easy examples immediately come to mind. The nby-n identity matrix I,, is always invertible since = I. So
1
10
-I =[ 1
U 1
]
0
0 1
]' L
1
0
0 ]-'
0
1
0
0
0
1
1
=
0 0
0 1
0
0
0, 1
Generating Invertible Matrices
50
and so on. Next, if we multiply / by a nonzero scalar X, then W, is invertible by Theorem 1.2 of Chapter I page 18, with = a /,,. Thus, for example, 10
0 0
0 10 0
0
-'
1/10
0
0
1/ 10
0
0
0
1/10
=
0 10
0
More generally, when we
take a diagonal matrix with nonzero entries on the diagonal, we have an invertible matrix with an easily computed inverse matrix. Symbolically, if D =
diag(di,d2,
with all d; 1-' 0fori = I, ,n, then D' _
diag(1/dj, 1/d2, , 1
0
0
1/2
0 0
0
0
1/3
For example,
I
0
0
2
0 0
0
0
3
-'
Now these examples are not going to impress our smart friends, so we need
to be a bit more clever. Say we know an invertible matrix U. Then we can take a diagonal invertible matrix D and form U-1 DU, which has the inverse U-' D-' U. Thus, one nontrivial invertible matrix U gives us a way to form many nontrivial examples. For instance, take D = 1
0
2
1
1
1
-1
0
-1
-1
0
-2
-1 . Then U-1 =
2
0
0
0
2
0
0
0
1
and U =
-2
0
0
1
I
I
0
I
, as you ma y ve rify, and
2
0
0
1
0
2
0
1
1
0
2
0
1
1
1
1
0
1
0
0
1
-1
0
-1
=
-2
0
0
1
2
1
1
0
3
is
an invertible matrix. Unfortunately, not all invertible matrices are obtainable this way (i.e., from a diagonal invertible matrix). We now turn to an approach that will generate all possible invertible matrices. You probably recall that one of the first algorithms you learned in linear algebra class was Gauss elimination. We reviewed this algorithm in Section 2.1. This involved three kinds of "elementary" operations that had the nice property of not changing the solution set to a system of linear equations. We will now elaborate a matrix approach to these ideas. There are three kinds of elementary matrices, all invertible. All are obtained from the identity matrix by perturbing it in special ways. We begin with some very simple matrices (not invertible at all) that can be viewed as the building blocks for all matrices. These are the so-called matrix units E,j. Define E11 to be the n-by-n matrix (actually, this makes sense for m-by-n matrices as well), with all entries zero except that the (i, j) entry has a one. In the 2-by-2 case,
2.2 Elementary Matrices
51
we can easily exhibit all of them:
Ell =[
0
00 ]Ei2= [
0
1,E21
0
=[ 0
0
]E22=
0
0
J
The essential facts about matrix units are collected in the next theorem, whose proof is left as an exercise.
THEOREM 2.1
1. >r-i E,, = l t
2. Eij=Epyiffi=pand j=q. 0 ifr # j ifr = j
3. Ei j Erg
E,5
4. E4 = Eii for all i. 5. Given any matrix A = [aij] in C' ' we have A 6. T h e collection { Eij l i , j
=I,
-
- -
i aij Eij.
, n) is a basis for the vector space of
matrices C"x" which, therefore, has dimension n2. We will now use the matrix units to define the first two types of elementary matrices. The first type of elementary matrix (Type III) goes under a name derived from geometry. It is called a transvection and is defined as follows.
DEFINITION2.1
(transvection)
For i # j, define Tij (c) = 1" + cEij for any complex number c.
Do not let the formula intimidate you. The idea is very simple. Just take the identity matrix and put the complex number c in the (i, j) position. In the 2-by-2 case, we have
T12(0 _ [ 0
1
] and T21 (c) _ [
1
01
The essential facts about transvections are collected in the next theorem.
Generating Invertible Matrices
52
THEOREM 2.2 Suppose c and d are complex numbers. Then I. Tij(c)T,,j(d) = Tij (d + c).
2. Tj(0)=1,. is invertible and Tj(c)-' = Tij(-c).
3. Each 4.
Tij(c)Tr+(d)=Tr.,(d)Tj(c), ifj 0r 54 s 54 i 0j
5.
T,j(cd) = Tir(c)_'Trj(d)-'Tr(c)Trj(d), ifr 0 i 54 j 0 r.
6.
T,j(c)T = T ,(c)
7.
T,j(c)* = Ti,(C).
8. det(T,j(c)) = 1. The proofs are routine and left to the reader. The second type of elementary matrix (Type II) also has a name derived from geometry. It is called a dilation and is defined as follows.
DEFINITION 2.2 For any i = 1, 1)E,,.
(dilation) -
-
, n, and any nonzero complex number c, define D,(c) _
Once again, what we are doing is quite straightforward. To form D(c), simply write down the identity matrix and replace the 1 in the diagonal (i, i) position with c. In the 2-by-2 case, we have
Di(c) _ [ 0
10
1
and
c0
D2(c) = L
The salient facts about dilations are collected in the next theorem.
THEOREM 2.3 Suppose c and d are nonzero complex numbers. Then 1. j 96i
2. Di(c)D,(d) = Di(cd).
3. Di(l)=I,,.
2.2 Elementary Matrices
4. Di (c) is invertible and Di (c)-1 = Di (c
53
1 ).
5. det(Di(c)) = c.
6. Di(c)Dj(d) = Dj(d)Di(c), if i # j. 7.
Di (c)T = Di (c)
8. Di(c)* = Di(c). Once again, the easy proofs are left to the reader. Finally, we have the third type of elementary matrix (Type I), which is called a permutation matrix. Let S denote the set of all permutations of then element set [n] := (1, 2, , n}. Strictly speaking, a permutation is a one-to-one function from [n] onto [ii]. If a is a permutation, we have or change the identity matrix as follows.
DEFINITION 2.3 (permutation matrix) Let a be a permutation. Define the permutation matrix P((T) = [61,a(j)] where 1 if i = Q(1) That is, ent P 8 i.a(j) ( (v)) 0 if i 54 or(j) Once again, a very simple idea is being expressed here. All we are doing is swapping rows of the identity matrix according to the permutation U. Let's look at an example. Suppose u, in cycle notation, is the permutation (123) of [3] = 11, 2, 3). In other words, under o,, l goes to 2, 2 goes to 3, and 3 goes back to 1. Then,
P(123) =
812
813
81
822
823
821
832
833
831
0
=
1
0 0
0
0
1
0
1
Notice that this matrix is obtained from the identity matrix /3 by sending the first row to the second, the second row to the third, and the third row to the first, just as v indicated. Since every permutation is a product (i.e., composition) of transpositions (i.e., permutations that leave everything fixed except for swapping two elements), it suffices to deal with permutation matrices that are obtained by swapping only two rows of the identity matrix. For example, in C3x3 0
P(12) =
1
0
1
0 0
0
0 1
0
0 0
0
1
1
and
P(23) =
We collect the basic facts about permutation matrices next.
0 1
0
Generating Invertible Matrices
54
THEOREM 2.4 1.
2.
P(Tr) P(a) = P(7ru), where ar and a are any permutations in S,,. P(L) = I, where r. is the identity permutation that leaves every element of [n] fixed.
3. P(Q-1) = P(a)-' = P(Q)'' = p(U)*. 4.
P(ij) = I - Ei1 - Eli + E;j + Ei,, for any i, j with I < i, j < n.
5.
P(ij)-' = P(ij).
6.
Ejj = P(ij)-'EuP(ij)
In the 2-by-2 case, only two permutations exist: the identity L and the transposition (12). Thus,
P(L) =
01
and J
U
P(12) = L
1. 1
So there you are. We have developed the three kinds of elementary matrices. Note that each is expressible in terms of matrix units and each is invertible with an inverse of the same type. Moreover, the inverses are very easy to compute! So why have we carefully developed these matrices? The answer is they do real work for us. The way they do work is given by the next two theorems.
THEOREM 2.5 (theorem on elementary row operations) Let A be a matrix in cC^' X" . Then the matrix product 1.
T;j(c)A amounts to adding the left c multiple of the jth row of A to the ith row of A.
2. Di(c)A amounts to multiplying the ith row of A on the left by c. 3. P(v)A amounts to moving the ith row of A into the position of the a(i)th row for each i.
4. P(ij)A amounts to swapping the ith and jth rows of A and leaving the other rows alone. We have similar results for columns.
2.2 Elementary Matrices
55
THEOREM 2.6 (theorem on elementary column operations) Let A be a matrix in C"'x". Then the matrix product 1.
AT, (c) amounts to adding the right c multiple of the ith column of A to the jth column of A.
2. AD;(c) amounts to multiplying the ith column of A on the right by c.
3. AP(a-1) amounts to moving the ith column of A into the position of the a(i )th column.
4. AP(ij) amounts to swapping the ith and jth columns of A and leaving the other columns alone.
We illustrate in the 2-by-2 case. Let A = [ a I
a
0
1
J
a
b l_ r a+ac b+ad 1
c
d
1
0 ] [ c
c
d
r
P(1 2)A = [ I
a
_
a as+b
d ]
[ a
b
d
].
D1(a)A
Then T12(a)A =
_ r as ab c
d
b
a nd
d
]
[ 0 1 c ca+d Now that we have elementary matrices to work for us, we can establish some significant results. The first will be to determine all invertible matrices. However, we need a basic fact about multiplying an arbitrary matrix by an invertible matrix on the left. Dependency relationships among columns do not change.
THEOREM 2.7 Suppose A is an m-by-n matrix and R is an m-by-m invertible matrix. Select any columns C1, c2, , ck from A. Then (CI, c2, - , ek) is independent if and only if the corresponding columns of RA are independent. Moreover, (c, , c2, , ck } is dependent if and only if the corresponding columns of RA are dependent with the same scalars providing the dependency relations for both sets of vectors. -
PROOF First note that the product RA is the same as [Ra, I Rae I . I Ra" ], Suppose (ci, c2, , ck} if A is partitioned into columns, A = [a, 1a21 ... is an independent set of columns. We claim the corresponding vectors Re,, Rc2, - , Rck are also independent. For this we take the usual approach and assume we have a linear combination of these vectors that produces the zero vector, say
a, Rc, + a2 Rc2 + ... + ak Rck = -6.
56
Generating Invertible Matrices
But then, R(aici +a2C2+- -+a cA) = -6. Since R is invertible, we conclude a1Ci + a2C2 + + akck = But the c's are independent, so we may conclude all the as are zero, which is what we needed to show the vectors Rci, Rc2, , Rck to be independent. Conversely, suppose Rci, Rc2, , RcA are independent. We show Ci, c2, , ck are independent. Suppose Of ici +
7.
RV =
V. Then R(aici +a7c-)
0.
But then, ai Rc1 + a2Rc2 + - - + ak Rck = I and independence of the Rcs implies all the a's are zero, which completes this part of the proof. The rest of the proof is transparent and left to the reader. 0 -
COROLLARY 2.1 With A and R as above, the dimension of the column space of A is equal to the dimension of the column space of RA. (Note: We did not say the column spaces are the same.)
We now come to our main result.
THEOREM 2.8 Let A be an n-by-n square matrix. Then A is invertible if and only if A can be written as a product of elementary matrices.
PROOF If A can be written as a product of elementary matrices, then A is a product of invertible matrices and hence itself must be invertible by Theorem 1.2 of Chapter 1 page 18. Conversely, suppose A is invertible. Then the first column of A cannot consist entirely of zeros. Use a permutation matrix P, if necessary, to put a nonzero entry in the (1,1) position. Then, use a dilation D to make the (1,1) entry equal to 1. Now use transvections to "clean out" (i.e., zero) all the entries in the first column below the (1,1) position. Thus, a product of elementary matrices, some of which
could be the identity, produces TDPA =
...
*
I
*
*
0
*
*
*
0
*
*
*
L0 * *
. Now there
*j
must be a nonzero entry at or below the (2,2) position of this matrix. Otherwise, the first two columns of this matrix would be dependent by Theorem 2.7. But this would contradict that A is invertible, so its columns must be independent. Use a permutation matrix, if necessary, to swap a nonzero entry into the (2,2) position and use a dilation, if necessary, to make it 1. Now use transvections to
2.2 Elementary Matrices
57
"clean out" above and below the (2,2) entry. Therefore, we achieve
1
T, D, P, TDPA =
0 0
0
...
0
*
1
*
*
0
*
*
0
*
*
*
*
*
...
*
Again, if all the entries of the third column at and below the (3,3) entry were zero, the third column would be a linear combination of the first two, again contradicting the invertibility of A. So, continuing this process, which must terminate after a finite number of steps, we get E, E2E3 . EPA = 1", where all the E;s are elementary matrices. Thus, A = EP' E2' El ', which again is a product of elementary matrices. This completes the proof. 0 This theorem is the basis of an algorithm you may recall for computing by hand the inverse of, at least, small matrices. Begin with a square matrix A and augment it with the identity matrix 1, forming [All]. Then apply elementary operations on the left attempting to turn A into 1. If the process succeeds, you will have produced A', where I was originally; in other words, [I IA-1 ] will be the result. If you keep track of the elementary matrices used, you will also be able to express A as a product of elementary matrices.
2.2.1
The Minimal Polynomial
There is a natural and useful connection between matrices and polynomials.
We know the dimension of C""' as a vector space is n2 so, given a matrix eventually produce a dependent set. Thus there must exist scalars, not all zero, such that AP + at,-1 At"-' + + a, A + aol _ 0. This naturally associates to A the polynomial p(x) = xP +ap_,xP-1 + +alx + ao, and we can think of A as being a "root" of p since replacing x by A yields p(A) = 0. Recall that a polynomial with leading A, we must have its powers 1, A, A22, A3,
coefficient I is called monic.
THEOREM 2.9 Every matrix in C""" has a unique monic polynomial of least degree that it satisfies as a root. This unique polynomial is called the minimal (or minimum) polynomial of the matrix and is denoted p.,,.
Generating Invertible Matrices
58
PROOF Existence of such a polynomial is clear by the argument given above the theorem, so we address uniqueness. Suppose f (x) = xt'+a,,_,xt'-1 +- - +
a,x + ao and g(x) = xN + R,,_,x"-1 +
R,x + 13o are two polynomials of
least degree satisfied by A. Then A would also satisfy f (x) - g(x) = (a,,_, 13 p_, )x"- ' +. +(ao -13o). If any coefficient of this polynomial were nonzero, we could produce a monic polynomial of degree less than p satisfied by A, a contradiction. Hence aj = 13, for all j and so p(x) = q(x). This completes the
proof.
0
You may be wondering why we brought up the minimal polynomial at this point. There is a nice connection to matrix inverses. Our definition of matrix inverse requires two verifications. The inverse must work on both sides. The next theorem saves us much work. It says you only have to check one equation.
THEOREM 2.10
Suppose A is a square matrix in CV". If there exists a matrix B in C"'
such that AB = 1, then BA = I and so B = A-'. Moreover, a square matrix A in C""" has an inverse if and only if the constant term of its minimal polynomial is nonzero. If A` exists, it is expressible as a polynomial in A of degree deg(µA) - 1. In particular, if A commutes with a matrix C, then A-' commutes with C also.
PROOF
Suppose that AB = 1. We shall prove that BA = I as well.
Consider the minimal polynomial p.A(x) = 13o + PI X + 132x2 +.. + x"'. First
+x'° = x(p(x)), we claim 13o # 0. Otherwise, LA(x) = Rix + 132x2 + + x"'-1. But then, p(A) = 13, + 02A + + where p(x) = R, + R2x +
A'-'
=13,AB+132AAB+...+A""AB=
=RII+132A1+...+A"' I
+ A"')B = pA(A)B = 0. But the degree of p is less
(13,A + 02A2 + than the degree of µA, and this is a contradiction. Therefore, 13o # 0. This allows us to solve for the identity matrix in the minimal polynomial equation; 13o1=-RIA-132A2- -A so that!=
-
1
130
130
13z
130
Multiplying through by B on the right we get B01= --AB - -A2B -
-
-
Ro
-
13o
A"'- . This expresses B as a polynomial A"' B = - Ri 1 - 132 A 0 in A of degreeoone less than that of the minimal polynomial and hence B 00
commutes with A. Thus I = AB = BA. The remaining details are left to the reader.
0
There is an algorithm that allows us to compute the minimum polynomial of matrices that are not too large. Suppose we are given an n-by-n matrix A. Start
2.2 Elementary Matrices
59
multiplying A by itself to form A, A2, ... , A". Form a huge matrix B where the rows of B are 1, A, A2, ... strung out as rows so that each row of B has n2 elements and B is (n + 1)-by -n22. We must find a dependency among the first p rows of B where p is minimal. Append the identity matrix /n+1 to B and row reduce [B 1 /n+l ].Look for the first row of zeros in the transformed B matrix. The corresponding coefficients in the transformed 1,, matrix give the coefficients 0 3 of a dependency. Let's illustrate with an example. Let A = 2 1 1 -1 0 3 -2 0 12 -14 0 30 Then A2 = 3 1 10 and A3 = -5 1 40 . Then we form -4 0 6 -10 0 6 [B 114]= 1
1
1
-2 -14
0 0 0 0
0
0
1
0
3
2
1
1
0
12
3
1
10
-5
-4
30
1
40
-10
0 0 0 0
-1
1
1
3
0 0 0
6 6
0 1
0 0
0 0
0 0
1
0
0
1
Now do you see what we mean about stringing out the powers of A into rows of B? Next row reduce [B 1 /4]. 1
0
0
0
1
0
0
1
0
2 5
0
0
0
1
1
0
0
0
0
0
0 17
5
0 1
3
0
0
5
1
_5
3
6
6
0
2 3
0
26 45
_§
90 4
90 _ I
5 5
5 5
3
6
5 I 6
_3
6
5
5
0
0
0
0
0
0
0
1
_L
I
7
J
The first full row of zeros in the transformed B part is the last row, so we read that an annihilating polynomial of A is 16x2 - 6x3. This is not a monic polynomial, however, but that is easy to fix by an appropriate scalar multiple. In this case, multiply through by -6. Then we conclude p A(x) =x3-5x2+ lOx-6 is the minimal polynomial of A. The reader may verify this. A more efficient way to compute the minimal polynomial will be described later.
Exercise Set 4 1. Argue that D;(6)Tj,(`) = Tj1(-c)D;(6) for a # 0. 2. Fill in the details of the proofs of Theorems 2.1 through 2.3. 3. Fill in the details of Theorem 2.4. 4. Complete the proof of Theorem 2.5 and Theorem 2.6.
60
Generating hmertible Matrices 5. Argue that zero cannot be a root of µA if A is invertible. 1
6. Compute the inverse of A =
-l
2 3
2
1
2
5
2
-4
-1
2
and express it as
a polynomial in A and as a product of elementary matrices.
7. If A= [a,j ], describe P(a)-1 AP(Q) for or, a permutation in S. U11
8. Suppose U =
0 0
P2 such that P, U P2 =
U12 U22
0
U13
Find permutation matrices P, and
U23 1133
U33
0
u23
1422
0 0
U13
1112
U11
9. Show that permutation matrices are, in a sense, redundant, since all you really need are transvections and dilations. (Hint: Show P;j = D,(-1)T1(l) T,1j (- I )Tj; (1)). Explain in words how this sequence of dilations and transvections accomplishes a row swap.
10. Consider the n-by-n matrix units Eij and let A be any n-by-n matrix. Describe AE;j, E;jA, and Eij AEL,,,. What is (E;j Ejk)T? 11. Find the minimal polynomial of A =
I
I
I
0
1
1
0
0
1
I.
12. Make up a 4-by-4 matrix and compute its minimal polynomial by the algorithm described on pages 58-59.
13. Let A= I
b
where a 54 0. Argue that A = d J for suitable choices of x, u, v, and w. a
c
z
10
0
w]
14. What can you say about the determinant of a permutation matrix? 15. Argue that multiplying a matrix by an elementary matrix does not change the order of its largest nonzero minor. Argue that nonzero minors go to nonzero minors and zero minors go to zero minors. Why does this mean
that the rank of an m-by-n matrix is the order of its largest nonzero minor?
2.2 Elementary Matrices
61
16. Let p(x) = ao + a1x + + anx" be a polynomial with complex coefficients. We can create a matrix p(A) = aol + a, A + if + A is n-by-n. Argue that for any two polynomials p(x) and g(x), p(A) commutes with g(A).
17. Let p(x) = ao + aix +
+ anx" be a polynomial with complex coefficients. Let A be n-by-n and v be a nonzero column vector. There is a slick way to compute the vector p(A)v. Of course, you could just compute all the necessary powers of A, combine them to form p(A),
and multiply this matrix by v. But there is a better way. Form the Krylov matrix JCn+, (A, v) = [v I Av I A2v I . . . I A"v]. Note you never have to compute the powers of A since each column of ACn+1(A, v) is just A times ao
a, the previous column. Argue that p(A)v =
(A, v)
Make up an
a 3-by-3 example to illustrate this fact.
18. Suppose A^ is obtained from A by swapping two columns. Suppose the same sequence of elementary row operations is performed on both A and A^, yielding B and B^. Argue that B^ is obtained from B by swapping the same two columns. 19. Find nice formulas for the powers of the elementary matrices, that is, for any positive integer m, (T;j(k))"' =, (D1(c))'" _, and (P(Q))"' = ?
20. Later, we will be very interested in taking a matrix A and forming S-1 AS where S is, of course, invertible. Write a generic 3-by-3 matrix A 0 0 -, and form (T12(a))-'AT12(a), (D2(a))-1 AD2(a), and 0 0 1
1
0 A
0
0
1
0
0
1
1
0
1
0 J. Now make a general statement about what happens in 0
the n-by-n case.
21. Investigate how applying an elementary matrix affects the determinant of a matrix. For example, det(T;j(a)A) = det(A). 22. What is the minimal polynomial of A =
L0
a ]?
Generating Im'ertible Matrices
62
23. Suppose A is a square matrix and p is a polynomial with p(A) = 0. Argue that the minimal polynomial p-A(x) divides p(x). (Hint: Remember the division algorithm.)
24. Exhibit two 3-by-3 permutation matrices that do not commute.
Group Project Elementary matrices can he generalized to work on blocks of a partitioned matrix instead of individual elements. Define a Type I generalized elemen-
tary matrix to be of the form [
® I®
a Type II generalized elementary
matrix multiplies a block from the left by a nonsingular matrix of appropriate size, and a Type III generalized elementary matrix multiplies a block by a matrix from the left and then adds the result to another row. So, for ex-A B C D ample,) ® 10 C B C D A and
D]-[
I/O®] X
[
CD
I
[® ®][
A
[
XA+CXB+D
]
'
The project is to
develop a theory of generalized elementary matrices analogous to the theory developed in the text. For example, are the generalized elementary matrices all
invertible with inverses of the same type? Can you write [ ® A
'
] as a
product of generalized Type III matrices?
Further Reading [B&R, 1986(2)] T. S. Blyth and E. F. Robertson, Matrices and Vector Spaces, Vol. 2, Chapman & Hall, New York, (1986).
[Lord, 1987] N. J. Lord, Matrices as Sums of Invertible Matrices, Mathematics Magazine, Vol. 60, No. 1, February, (1987), 33-35.
2.3 The LU and LDU Factorization
63
upper triangular lower triangular, row echelon form, zero row, leading entry, LU factorization, elimination matrix, LDUfactorization, full rank factorization
2.3
The LU and LDU Factorization
Our goal in this section is to show how to factor matrices into simpler ones. What this section boils down to is a fancy way of describing Gauss elimination. You might want to consult Appendix B for notation regarding entries, rows, and columns of a matrix. The "simpler" matrices mentioned above are the triangular matrices. They
come in two flavors: upper and lower. Recall that a matrix L is called lower
triangular if ent;j(L) = 0 for i < j, that is, all the entries of L above the main diagonal are zero. Similarly, a matrix U is called upper triangular if entij(U) = 0 for i > j, that is, all the entries of U below the main diagonal are zero. Note that dilations, being diagonal matrices, are both upper and lower triangular, whereas transvections T;i(a) are lower triangular if i > j and upper
triangular if i < j. We leave as exercises the basic facts that the product of lower (upper) triangular matrices is lower (upper) triangular, the diagonal of their products is the product of the diagonal elements, and the inverse of a lower (upper) triangular matrices, if it exists, is again lower (upper) triangular. For example, 0 4
2 3
0 0 I
1
0
0
5
2
0 4
[ -1
3
r2 = I 23
0
0
8
0
L
is the product of two lower triangular matrices, while
5
0 2
0 0
-1
3
4
1
1
-,
-5/2
0 1/2
0 0
17/8
-3/8
1 /4
1
=
is the inverse of the second matrix.
When we reviewed Gauss elimination in Section 2.1, we noted how easily we could solve a "triangular" system of linear equations. Now that we have developed elementary matrices, we can make that discussion more complete and precise. It is clear that the unknowns in a system of linear equations are convenient placeholders and we can more efficiently work just with the matrix
Generating Invertible Matrices
64
of coefficients or with this matrix augmented by the right-hand side. Then elementary row operations (i.e., multiplying elementary matrices on the left) can be used to convert the augmented matrix to a very nice form called row echelon form. We make the precise definition of this form next. Consider an m-by-n matrix A. A row of A is called a zero row if all the entries in that row are zero. Rows that are not zero rows will be termed (what else?) nonzero rows. A nonzero row of A has a first nonzero entry as you come from the left. This entry is called the leading entry of the row. An m-by-n matrix A is in row echelon form if it has three things going for it. First, all zero rows are below all nonzero rows. In other words, all the zero rows are at the bottom of the matrix. Second, all entries below a leading entry must be zero. Third, the leading entries occur farther to the right as you go down the nonzero rows of the matrix. In other words, the leading entry in any nonzero row appears in a column to the right of the column containing the leading entry of the row above it. In particular, then, a matrix in row echelon form is upper triangular! Do you see that the conditions force the nonzero entries to lie in a stair-step arrangement in the northeast corner of the matrix? Do you see how the word "echelon" came to be used? For example, the following matrix is in row echelon form: 1
2
3
4
5
6
7
8
0 0 0 0
0 0 0 0
2
3
4
5
6
7
8
0 0 0
2
3
4
5
6
7
0 0
0 0
0 0
2
3
4
0
0
0
9
We can now formalize Gauss elimination.
THEOREM 2.11 Every matrix A E C"" can be converted to a row echelon matrix by a finite number of elementary row operations.
PROOF
First, use permutation matrices to swap all the zero rows below all the nonzero rows. Use more swaps to move row; (A) below rowj (A) if the leading element of row;(A) occurs to the right of the leading element of rowi(A). So far, all we have used are permutation matrices. If all these permutations have been performed and we have not achieved row echelon form, it has to be that
there is a row,(A) above a row,(A) whose leading entry a81 is in the same position as the leading entry a,j. Use a transvection to zero out aq. That is, /a,j) will put a zero where aq was. Continue in this manner until row echelon form is achieved.
0
Notice that we did not use any dilations to achieve a row echelon matrix. That is because, at this point, we do not really care what the values of the
2.3 The LU and LDU Factorization
65
leading entries are. This means we do not get a unique matrix in row echelon form starting with a matrix A and applying elementary row operations. We can multiply the rows by a variety of nonzero scalars and we still have a row echelon matrix associated with A. However, what is uniquely determined by A are the positions of leading entries. This fact makes a nice nontrivial exercise. Before we move on, we note a corollary to our theorem above. COROLLARY 2.2 Given any matrix A in C'
"
there exists an invertible matrix R such that RA G is a row echelon matrix, that is, RA = . . . and G has no zero rows.
Let's motivate our next theorem with an example. Let
A=
2 4
-2
1
-6
2
0
3
7
2
1
1
. Then, doing Gauss elimination,
T32(1)T31(1)T21(-2)A =
2 0
-8 -2 -1
0
0
1
2
1
= U,
2
1
an upper triangular matrix. Thus, A = T21(-2) -1 T31(1)-1 T32(1) 1 U = T21(2)TI, (- I)TI2(- DU 1
2 -1
0
0
2
1
0
0
1
0
-1
1
1
2
-8 -2 -1 0
1
= L U, the product of a
2
lower and an upper triangular matrix. The entries of L are notable. They are the opposites of the multipliers used in Gauss elimination and the diagonal elements are all ones. What we have illustrated works great as long as you do not run into the necessity of doing a row swap. THEOREM 2.12 Suppose A E c"" can be reduced to a row echelon matrix without needing any row exchanges. Then there is an in-by-in lower triangular matrix L with ones on the diagonal and an m-by-n upper triangular matrix U in row echelon form such that A = LU. Such a factorization is called an LU factorization of A.
PROOF We know there exist elementary matrices E1, ... , Ek such that Ek ... E1 A = U is in row echelon form. Since no row swaps and no dilations were required, all the Eis are transvections Tin(a) where i > j. These are lower triangular matrices with ones on their diagonals. The same is true for U their inverses, so L = El 1 . . . Ek 1 is as required for the theorem.
Generating Invertible Matrices
66
Instead of using elementary transvections one at a time, we can "speed up" the elimination process by clearing out a column in one blow. We now introduce some matrices we call elimination matrices. An elimination matrix E is a matrix
a,
a2
of the form E = I - uvT = In -
[U, u2 ... U]
.
It, We know by Corollary 1.6 of Chapter I page 28 that E is invertible as long as uv r . We note that the elementary vTu # 1, and, in this case, E-1 = I -
v u- I
matrices already introduced are special cases of elimination matrices. Indeed,
Tin(a) = I +ae;ej', DA(a) = / -(1 -a)ekeT and Pii = I -(e; -ei)(e, -ej )T. all a21
an,,
Let A =
a22
... ...
a2n
an,2
...
an,
a12
a,,,
0 a2i/aii
Let u, =
0
and v, = e,
all
ami/aj
2, 3, ...
, m and call µi, 0
1
0
a "multiplier." Then L, = I - uivi
...
.. .
-µ2i
1
0
.. .
-µs1
0
1
.. .
0
all
LiA =
Let µi, _ail- for i =
0
...
a12
.. .
...
0 0 0
is lo we r tri angul ar a nd
1
a,,
a22
a2(2)
(2)
(2)
(2)
= A(2),
... amn where a;j) = a,1 - µi,a,j for i = 2, 3, ... , n, j = 2, ... 0
a,n2
Let's look at an example. Suppose A = I
-2 -3
0
0
10
00
0
1
and L, A =
=
1
I
2
2 3
3
I
-l
, n.
4
-1 ].Then L, _ 2
1
4
I
I
2
0
I
-3 -9
0
-4 -5 -10
.
The idea is to repeat
2.3 The LU and LDU Factorization
67
100
0 1
I - u2e2 , where u2 =
the process. Let L2 =
µ32
1
and e2 =
µi2 = ai2 (2)
=3,...
for
. Here
0
µn2 (2)
0
n,
a22
again assuming a22) A 0. With (m - I) sweeps, we reduce our matrix A to row
echelon form. For each k, Lk = I - ukeT where enti(uk) =
ifi = 1,2,... ,k
0 (k)
µik = a(k)
ifi =k+ 1,... ,m'
akk
These multipliers are well defined as long as the elements akk) are nonzero, in
a()(k) t µ;kak , i = k+1, ... , m, j = i, ... , n. To continue
t = which case a+1) our example, L2 =
1
0
0
0
1
0
1
. Then L2L, A =
0 4 1 = U. More generally we would find A = LU where
L=
I
0
0
µ21
1
0
µ31
µ32
µn11
µm2
1
...
0 0
1
1
0
2
4
-3
-9
-17
-46
0
... 0
...
... ...
0 0
...
I
J
(k)
where entik(L) = µik =
a(k)
; i = k + 1,... , m and µki = akk), j =
akk
k,...,n.
Before you get too excited about the LU factorization, notice we have been assuming that nothing goes wrong along the way. However, the simple nonsingular matrix
0 L
does not have an LU factorization! (Why not?)
1
J
We do know a theorem that tells when a nonsingular matrix has an LU factorization. It involves the idea of leading principal submatrices of a matrix. They are the square submatrices you can form starting at the upper left-hand
corner of A and working down. More precisely, if A = (aid] E Cnxn the
Generating Invertible Matrices
68
leading principal submatrices of A are AI = [all ], A2 = all
a12
a2I
a22
Ak= ak2
akl
.
alk
...
a2k
...
akk
.
.
all
a12
a21
a22
A,,=A.
THEOREM 2.13 Let A be a nonsingular inatrix in C" xa. Then A has an L U factorization if and only if all the leading principal submatrices of A are nonsingular.
PROOF
Suppose first that A has an LU factorization
A = LU -
LII
0
L21
L22
] [
U11
U12
0
U22
I_[
LIIUII #
'k
*
,
where LII and UII are k-by-k. Being triangular with nonzero diagonal entries, L11 and UII must be nonsingular, hence so is their product LIIUI1. This product is the leading principal submatrix Ak. This argument works for each
k=1,... ,n.
Conversely, suppose the condition holds. We use induction to argue that each leading principal submatrix has an LU factorization and so A itself must
have one. If k = 1, AI = [all] = [I][aIl] is trivially an LU factorization and all cannot he zero since AI is invertible. Now, proceeding inductively, suppose Ak = LkUk is an LU factorization. We must prove Ak+I has an
L U factorization as well. Now Ak = U' Lk 1 so Ak+I =
vAT
L
Lk VTUL I
®
Uk
l f J
I
LO
Lk l u
ak+I - VTAL lu
and Uk+I -
L
The crucial fact is that ak+I -vT Ak -'u
VTU-I L
Ik+u-vTAklu L
]=
, where cTand b contain the first k
components of row,+, (Ak+I) and colk+I(Ak+I ) But Lk+I = r Uk
U
ak+I
I 11
]
k
] gives an LU factorization for Ak+I. This is because Uk+I = Lk+1 Ak+1
is nonsingular. By induction, A has an LU factorization. There is good news on the uniqueness front.
0
2.3 The LU and LDU Factorization
69
THEOREM 2.14 If A is nonsingular and can be reduced to row echelon form without row swaps, then there exists a unique lower triangular matrix L with one's on the diagonal and a unique upper triangular matrix U such that A = L U.
PROOF We have existence from before, so the issue is uniqueness. Note U = L-I A is the product of invertible matrices, hence is itself invertible. In typical fashion, suppose we have two such LU factorizations, A = L1 U, _ L2U2. Then U, U2 1 = L 1 L2. However, U, Uz 1 is upper triangular and L 1
1
1 L2
is lower triangular with one's down the diagonal. The only way that can be is if they equal the identity matrix. That is, U, U2 1 = 1 = L 1 1 L2. Therefore L, = L2 and U, = U2, as was to be proved. D
Suppose we have a matrix A that does not have an L U factorization. All is not lost; you just have to use row swaps to reduce the matrix. In fact, a rather remarkable thing happens. Suppose we need permutation matrices in L2P2L, P, A = U where reducing A. We would have Ln_, P"_, Li_2 Pn_2 Pk = I if no row swap is required at the kth step. Remarkably, all we really need is one permutation matrix to take care of all the swapping at once. Let's see why. Suppose we have L2 P2L, P, A = U. We know P,T Pk = I
so L2P2L,P, P2P,A = L2L,(P2P,)A where L, is just L, reordered (i.e., _
, = P2L, P2 T). To illustrate, suppose L, =
1
0
0
a
1
0
b
0
1
and P = P(23).
Then 1
L,=
0 0
0 0 1
0 1
0
1[
1
a b
[
0
0 1
1
0
0
1
0
0
0
1
0
1
0
1
0
[
1
=
1
0
0
b
1
0
a
0
1
.
M_ ore generally, L,,_1 Pn-, ... L, P,A = Ln_,Ln_2... L,(Pn-, ... P,)A = L P A = U where Lk = P,,_1 . Pk+, Lk P+i for k = 1, 2, ... , n - 2 and L = L11_1
and P = Pn_1...P1. Then PA = L-'U = LU. We
have argued the following theorem.
THEOREM 2.15
For any A E C""', there exists a permutation matrix P, an m-by-m lower triangular matrix L with ones down its diagonal, and an tipper triangular mby-n row echelon matrix U with PA = LU. If A is n-by-n invertible, L and U are unique.
Generating Invertible Matrices
70
For example , let A = J
0 2 4
2
8
16
2
-1
3
3
2
7
5
15
24
1
Clearly there is no way to get Gauss elimination started with a,, = 0. A row swap is required, say P12A
=
2
1
0 4
2 3 5
8
2
-I
3
16 2
2 7
15
24
PI 2 A =
-1
3
16
2
-4
0
0
11
1
0 2 T3,(-2)P,2A = 0 0 0 0 The reader may v erify
Then T43(4)T42(-Z)TT2(-Z)T41(-4)
1
0
0
1
0 0
2
0 0 0
1
4
1
2
1
-1
3
0
2
16
2
0 0
0
-4
0
0
0
11
How does this LU business help us to solve systems of linear equations? Suppose you need to solve Ax = b. Then, if PA = LU, we get PAx = Pb or LUx = Pb = b. Now we solve Ly = b by forward substitution and Ux = y by hack substitution since we have triangular systems. So, for example, 2x2 + 16x3 + 2x4 = 10 - x3 + 3x4 = 17
2x, + x2
can be wr i tten
4x, + 3x2 + 2x3 + 7x4 = 4 8x, + 5x2 + 15X3 + 24x4 = 5
P12A
X1
10
X2
17
= Pie
X3
X4
17
_
10
4
4
5
5
There tore,
Yi
Y2
l I
1
0
0
1
0 0
0 0 0
y3 y4
4
4
1
y,
17
Y2
10
y3
4
y4
5
2.3 The LU and LDU Factorization
71
which yields 17
y1
10
Y2
=
Y3
-35 521
Y4
4
and
2
1
-1
3
xl
0
2
16
2
0
0 0
-4
0
x2 x3 x4
0
0
11
17
_
10
-
-35 -121 4
It follows that 34337
XI
48678
X2
3544
X3
8 521
X4
44
It may seem a bit unfair that L gets to have ones down its diagonal but U does not. We can fix that (sometimes). Suppose A is nonsingular and A = LU. The trick is to pull the diagonal entries of U out and divide the rows of U by each diagonal entry. That is, 0 0
I
EU
EU
...
0
1
Ell
...
d
0
0
...
0
d2
d2
1
Then we have LDU1 = A where D is a diagonal matrix and UI is upper triangular with ones on the diagonal as well. For example,
A-
-2 2
4
2
16
2
1
-1
3
8
3 5
1
0
-1
1
7
15
24
0 0
1
-2 -4
-I
2
7/3
1
-14
13/3 0
0
0
1
0
0 0
-2 7/3
1
-4 13/3 -14 1 We summarize.
0
-2
2
16
0
0
3
15
5
0
0
-1
-2/3
1
0
0 0
0
-2 0 0 0 0 0 0 3 0 0 -1 0 0
0
0
2
I
1
1
0
0 0
-1 -8 -1 1
5
0
1
5/3 2/3
0
0
1
Generating Invertible Matrices
72
THEOREM 2.16 (LDU factorization theorem) If A is nonsingular, there exists a permutation matrix P such that P A = L D U where L is lower triangular with ones down the diagonal, D is a diagonal matrix, and U is upper triangular with ones down the diagonal. This factorization of PA is unique. One application of the LDU factorization is for real symmetric matrices (i.e.,
matrices over R such that A = AT). First, A = LDU so AT = UT DT LT = UT DLT . But A = AT so LDU = UT DLT . By uniqueness U = LT . Thus symmetric matrices A can be written LDLT. Suppose D = diag(di, d2,... , makes has all real positive entries. Then D2 = diag( dl, d2, ... , sense and (D2)2 = D. Let S = D1LT. Then A = LDLT = LD2D2LT = STS, where S is upper triangular with positive diagonal entries. Conversely, if A = RRT where R is lower triangular with positive diagonal elements, then R = LD where L is lower triangular with ones on the diagonal and D is a diagonal matrix with all positive elements on the diagonal. Then A = LD2LT is the LDU factorization of A. Let's take a closer look at an LU factorization as a "preview of coming r I 0 0 01 ra b c d 0 0 0 g It attractions.' Suppose A= L U= Ix 0 I 0 0 0 0 y r 0 0 0 0 Z s t b d a c xd + h xa xb + f xc + g ya yb +rf yc+rg yd +r h zb + sf zc + sg zd + s h za 1
f
1
1
I
1
1
0
x
r
f
L
Z
d
a0
s
c
1
J. The block of zeros in U makes part of L
irrelevant. We could reconstruct the matrix A without ever knowing the value
oft! More specifically A
I
=a
lb
Y
xy
Z
z
x
0
1
+f
r 1
x
1
Ic
0
x y
+g rI
z
x
1
Id
0
x y
+h rI
z
s
2.3 The LU and LDU Factorization
73
so we have another factorization of A different from LU where only the crucial columns of L are retained so that all columns of A can be reconstructed using the nonzero rows of U as coefficients. Later, we shall refer to this as a full rank factorization of A. Clearly, the first two columns of L are independent, so Gauss elimination has led to a basis of the column space of A.
Exercise Set 5 1. Prove that the product of upper (lower) triangular matrices is upper (lower) triangular. What about the sums, differences, and scalar multiples of triangular matrices? 2. What can you say about the transpose and conjugate transpose of an upper (lower) triangular matrix?
3. Argue that every square matrix can be uniquely written as the sum of a strictly lower triangular matrix, a diagonal matrix, and a strictly upper triangular matrix. 4. Prove that the inverse of an upper (lower) triangular matrix, when it exists, is again upper (lower) triangular.
5. Prove that an upper (lower) triangular matrix is invertible if the diagonal entries are all nonzero.
6. Argue that a symmetric upper (lower) triangular matrix is a diagonal matrix.
7. Prove that a matrix is diagonal iff it is both upper and lower triangular.
8. Prove the uniqueness of the LDU factorization. 9. Prove A is invertible if A can be reduced by elementary row operations to the identity matrix.
10. LetA=
2
1
4
2
-2 -1 I
8
4
-1 2 16 15
3 7 2 24
Generating Invertible Matrices
74
Multiply
2
-1 4
1
0
0
1
0
3
1
1
5
1
0
0
U
2
1
0
0
1
0
-1 L4
is
a. 5
0 0 0 1
[2 1
1
2
1
0 0 0
0 0 0
-1
-1
3
4
1
3
2
0
1
and
3
0
0
4
1
0
0
0
5/4
What do you notice?
0 0 J L 0 0 Does this contradict any of our theorems? Of course not, but the question is why not?
11. LetA -
1
-1
2
1
4
3
2
-2
2
16
8
5
15
3 7 2 24
(a) Find an L U factorization by multiplying A on the left by elementary transvections.
(b) Find an LU factorization of A by "brute force" Set A 1
_
-
0
x y
r
z
s
1
0 0
0 0 0
2
0
1
a 0
-1
3
b
c
0 d e 0 0 0 t Multiply out and solve. Did you get the same L and U ? Did you have to? 1
1
f
12. The leading elements in the nonzero rows of a matrix in row echelon form
are called pivots. A pivot, or basic column, is a column that contains a pivot position. Argue that, while a matrix A can have many different pivots, the positions in which they occur are uniquely determined by A. This gives us one way to define a notion of "rank" we call pivot rank. The pivot rank of a matrix A is the number of pivot positions in any row echelon matrix obtained from A. Evidently, this is the same as the number of basic columns. The variables in a system of linear equations corresponding to the pivot or basic columns are called the basic variables of the system. All the other variables are called free. Argue that an inby-it system of linear equations, Ax = b with variables x1, ... , x,,, is consistent for all b ill A has in pivots. 13. Argue that an LU factorization cannot be unique if U has a row of zeros.
14. If A is any m-by-n matrix, there exists an invertible P so that A =
P-'LU.
2.3 The LU and LDU Factorization
75
Further Reading [B&R, 1986(2)] T. S. Blyth and E. F. Robertson, Matrices and Vector Spaces, Vol. 2, Chapman & Hall, New York, (1986).
[E&S, 2004] Alan Edelman and Gilbert Strang, Pascal Matrices, The American Mathematical Monthly, Vol. I1 1 , No. 3, March, (2004), 189-197. [Johnson, 2003] Warren P. Johnson, An LDU Factorization in Elementary
Number Theory, Mathematics Magazine, Vol. 76, No. 5, December, (2003), 392-394.
[Szabo, 2000] Fred Szabo, Linear Algebra, An Introduction Using Mathematica, Harcourt, Academic Press, New York, (2000).
MATLAB Moment
2.3.1
The LU Factorization
2.3.1.1
MATLAB computes LU factorizations. The command is
[L, U, P] = lu(A) which returns a unit lower triangular matrix L, an upper triangular matrix U, and a permutation matrix P such that PA = LU. For example, >>A=[0 2 16 2;2 1 -13;4 3 2 7;8 5 15 241
A= 0
2 4 8
2
16
2
1
-1
3
2 15
24
3 5
7
> [L,U,P]=Iu(A)
L= 0
1
0 0
1/2
1/4
1
0 0
1/4
-1/8
11/38
1
1
0
0
Generating Invertible Matrices
76
U= 24
8
5
15
0
2
16
2
0
0
-19/2
-22/19
0 0
0 0
P= 0 1
0 0
0
1
1
0
1
0 0 0
adjugate, submatrix, principal submatrices, principal minors, leading principal submatrices, leading principal minors, cofactors
2.4
The Adjugate of a Matrix
There is a matrix that can be associated to a square matrix and is closely related to the invertibility of that matrix. This is called the adjugate matrix, or adjoint matrix. We prefer to use the word "adjoint" in another context, so we go with the British and use "adjugate." Luckily, the first three letters are the same for both terms so the abbreviations will look the same. Suppose A is an m-by-n matrix. If we erase m - r rows and n - c columns, what remains is called an r-by-c submatrix of A. For example, let
A =
all
a12
a13
a14
a15
a16
all
a21
a22
a23
a24
a25
a26
a27
a31
a32
a33
a34
a35
a36
a37
a41
a42
a43
a44
a45
a46
a47
a51
a52
a53
a54
a55
a56
a57
. Suppose we strike out
two rows, say the second and fifth, and three columns, say the second, fourth, and fifth, then the (5 - 2)-by-(7 - 3) submatrix of A we obtain is all
a13
a16
all
a31
a33
a36
a37
a41
a43
a46
a47
E C3x4
Them -r rows to strike out can be chosen in m -r ways and the n -c columns can be chosen in (n-c " ) ways, so there are (ni -v (n"c) - possible submatrices that
can be formed from an in-by-n matrix. For example, the number of possible submatrices of size 3-by-4 from A above is (;) (4) = 350. You probably would not want to write them all down.
We are most interested in submatrices of square matrices. In fact, we are very interested in determinants of square submatrices of a square matrix.
2.4 The Adjugate of a Matrix
77
The determinant of an r-by-r submatrix of A E C" x" is called an r-by-r minor
()2
of A. There are such minors possible in A. We take the O-by-O minor of any matrix A to be 1 for convenience. Note that there are just as many minors of order
r as of order n-r. For later, we note that the principal submatrices of A are obtained when the rows and columns deleted from A have the same indices. A submatrix so obtained is symmetrically located with respect to the main diagonal of A. The determinants of these principal submatrices are called principal minors. Even more special are the leading principal submatrices and their determinants,
all called the leading principal minors. If A =
a3l
principal submatrices are [all ],
all
a12
azI
azz
a13 a23 a33
a12 a22 a32
a21
l
] , and
,
the leading
all
a12
a13
a21
a22 a32
a23 a33
a31
Let M11(A) be defined to be the (n - l)-by-(n - 1) submatrix of A obtained by striking out the ith row and jth column from A. For example, if A =
1
2
3
4 7
5 8
6
,
4
then M12(A) =
The (i, j)-cofactor of A is
9
L7
9
L
defined by
cof,j(A) = (-1)`+idet(M;j(A)). For example, for A above, cof12(A) _ (-1)1+2det
4
L L
6.
(-1)(-6)
9
We now make a matrix of cofactors of A and take its transpose to create a new matrix called the adjugate matrix of A.
adj(A) :_ [cof;j (A)]" For example,
adj\ and if A =
all
aI2
a21
a22
a13 a23
a31
a32
a33
d et
a22 a32
a23
L
adj(A) =
d
Lc
-det L
a12 a32
-det [
a33
aI3 ]
det
a33
det f a12
a13
a22
a23
L
d J/- L
]
-det
a
aa21
a23
31
a33
L a31 a21
T
]
a23
det
[ a31
-det I
3]
a33
[
J
j
a32
]
all
det I a a
]
a32
21
a22
I
Generating Invertible Matrices
78
You may be wondering why we took the transpose. That has to do with a formula
we want that connects the adjugate matrix with the inverse of a matrix. Let's go after that connection. Let's look at the 2-by-2 situation, always a good place to start. Let A = a
-
d
J.
Let's compute A(adj(A)) =
_
ad + b(-c) a(-b) + ba cd + d(-c) c(-b) + da 0 r det(A) 0 det(A)
b
a d L ad - be
I I
-
ab
0
ad - be
0
0 ].That is a neat answer! Ah, but does it persist with larger
= det(A) [
matrices? In the 3-by-3 case, A =
A(adj(A)) all alt a21
a31
a22 a32
all
a12
a21
a22
a13 a23
a31
a32
a33
and so
a13 a23 a33
a22a33 - a23a32
-al2a33 + a13a32
-a21a33 +a3la23
aIIa33 -aI3a3I
a21a32 - a22a3l
-a11a32 + a12a31
det(A)
0
0 0
det(A)
0 0
0
det(A)
aI2a23 - al3a22 -a1Ia23 +a21a13 al Ia22 - al2a21 1
= det(A)
0 0
0 1
0 0
0
1
Indeed, we have a theorem.
THEOREM 2.17 For any n-by-n matrix A, A(adj(A)) = (det(A))I,,. Thus, A is invertible iff
det(A):0 and, in this case, A-1 = PROOF
1
det (A)
adj(A).
The proof requires the Laplace expansion theorem (see Appendix Q.
We compute the (i, j)-entry: ent, (A(adj(A)) = >ent;k(A)entkl(adj(A)) _
Eaik(-1)i+kdet(Mik (A)) = k=1
j det(A) ifi = j
l0
ifi0j
k=1
79
2.4 The Adjugate of a Matrix
While this theorem does not give an efficient way to compute the inverse of a matrix, it does have some nice theoretical consequences. If A were 10-by10, just finding adj(A) would require computing 100 determinants of 9-by-9 matrices! There must be a better way, even if you have a computer. We can, of course, illustrate with small examples. 6
1
4
3
0
2
-1
2
2
Let A =
6
1
3
0
-1
2
Aadj(A) =
. Then
4 2 2
adj(A) _
-4 -8
6
2
16
0
6
-13
so we see det(A) = -8 and A-' _
6
6
-8
16
0
6
-13
-3
-8 =
-3 -4
-8 -8
-4
2
0
0
-8
0
0
6 16
2
-13
-3
0
and
0
0
-8
, as the reader
may verify.
Exercise Set 6 1. Compute the adjugate and inverse of 3
5
-1 6
0 4
2
2 0 7
5
3
1
1
1
1
3
6
J,
2
4
3
0 0
2
4
0
0
,
, if they exist, using Theorem 2.17 above.
2. Write down the generic 4-by-4 matrix A. Compute M,3(A) and M24 (M,3(A)).
3. Establish the following properties of the adjugate where A and B are in Cnxn:
(a) adj(A-') = (adj(A))-', provided A is invertible (b) adj(cA) = cn-'adj(A) (c) if A E C"xn with n > 2, adj(adj(A)) _ (det(A))i-2A (d) adj(AT) = (adj(A))T so A is symmetric itf adj(A) is (e) adj(A*) _ (adj(A))* so A is Hermitian iffadj(A) is
(f) adj(AB) = adj(B)adj(A) (g) adj(adj(A)) = A provided det(A) = 1
(h) adj(A) = adj(A) (i) the adjugate of a scalar matrix is a scalar matrix
Generating lnvertible Matrices
80
(j) the adjugate of a diagonal matrix is a diagonal matrix (k) the adjugate of a triangular matrix is a triangular matrix (1)
adj(T;j(-a)) = T,j(a)
(m) adj(1) = I and adj((D) = G. 4. Find an example of a 3-by-3 nonzero matrix A with adj(A) = 0.
5. Argue that det(adj(A))det(A) = (det(A))" where det(A) 0, det(adj(A))
A is n-by-n.
So, if
(det(A))"-1.
6. Argue that det
= (-1)""' whenever n > 1 and in > I.
® ®
L
7. Prove that det I
V*
u 1 = (3det (A) - v*(adj(A))u = det(A)((3 0
v*A-lu) where (3 is a scalar and u and v are n-by-1. (Hint: Do a Laplace expansion by the last row and then more Laplace expansions by the last column.)
8. Argue that ad j(1 - uv*) = uv* + (I - v*u)/ where u and v are n-by-1.
9. Prove that det(adj(adj(A))) = (det(A))("-') . 10. If A is nonsingular, adj(A) = det(A)A-1.
Further Reading [Aitken, 1939] A. C. Aitken, Determinants and Matrices, 9th edition, Oliver and Boyd, Edinburgh and London, New York: Interscience Publishers, Inc., (1939). [B&R, 1986(2)] T. S. Blyth and E. F. Robertson, Matrices and Vector Spaces, Vol. 2, Chapman & Hall, New York, (1986). [Bress, 1999] David M. Bressoud, Proofs and Confirmations: The Story of the Alternating Sign Matrix Conjecture, Cambridge University Press, (1999). [Bress&Propp, 1999] David Bressoud and James Propp, How the Alternating Sign Conjecture was Solved, Notices of the American Mathematical Society, Vol. 46, No. 6, June/July (1999), 637-646.
2.5 The Frame Algorithm and the Cayley-Hamilton Theorem
81
Group Project Find out everything you can about the alternating sign conjecture and write a paper about it.
characteristic matrix, characteristic polynomial, Cayley-Hamilton theorem, Newton identities, Frame algorithm
2.5
The Frame Algorithm and the Cayley-Hamilton Theorem
In 1949, J. Sutherland Frame (24 December 1907 - 27 February 1997) published an abstract in the Bulletin of theAmerican Mathematical Society indicating a recursive algorithm for computing the inverse of a matrix and, as a byproduct, getting additional information, including the famous Cayley-Hamilton
theorem. (Hamilton is the Irish mathematician William Rowan Hamilton (4 August 1805 - 2 September 1865), and Cayley is Arthur Cayley (16 August 1821 - 26 January 1895.) We have not been able to find an actual paper with a detailed account of these claims. Perhaps the author thought the abstract sufficient and went on with his work in group representations. Perhaps he was told this algorithm had been rediscovered many times (see [House, 1964, p. 72]). Whatever the case, in this section, we will expand on and expose the details
of Frame's algorithm. Suppose A E C"'. The characteristic matrix of A is xl - A E C[x J""", the collection of n-by-n matrices with polynomial entries. We must open our minds to accepting matrices with polynomial entries. For example,
x+I
x-3
4x + 2
x3 - 7 ]
E
C[X]2,2. Determinants work just fine for
these kinds of matrices. The determinant ofxl - A, det(xl - A) E C[x], the polynomials in x, and is what we call the characteristic polynomial of A:
XA(x)=det(xl -A)=x"+clx"-1+--.+Cn-Ix+c". 1
For example, if A
I x-3 -6
1
-2
1
2
2
3
4
5
7
8
6
-2 x - 4 -5 -7 x -8
E
C3"3, then X13 - A =
I
E
C[x]3i3 . Thus XA(x) _
Generating Invertible Matrices
82
x-1
-3 -6
det
-2 x-4
-7
-2 -5
x3- 13x2-9x-3. This is computed
J=
x-8
using the usual familiar rules for expanding a determinant. You may recall that the roots of the characteristic polynomial are quite important, being the eigenvalues of the matrix. We will return to this topic later. For now, we focus on the coefficients of the characteristic polynomial.
First, we consider the constant term c,,. You may already know the an-
swer here, but let's make an argument. Now det(A) = (-1)"det(-A) _ (-1)"det(01 - A) = (-1)"XA(0) = (-I)"c,,. Therefore, det(A) =
(-1)"c,,
As a consequence, we see immediately that A is invertible iff c, A 0, in which case
A-' = (-I)"adj(A), c'1
where adj(A) is the adjugate matrix of A introduced previously. Also recall the
important relationship, Badj(B) = det(B)I. We conclude that
(xl - A)adj(xl - A) = XA(x)l. To illustrate with the example above, (xl - A)adj(xl - A) =
x-1 -3 -6
-2
-2 -5 -7 x- 8 x3-13x2-9x-3
x2- 12x-3
x-4 0 0
=x3-13x2-9x-3
2x-2
3x +6
x2-9x-4
6x -3
7x+5
2x+2 5x+1
x2-5x-2
0
0
x3- 13x2-9x-3
0
0 1
0
0
1
0 0
0
0
1
X3-13x2-9X-3
Next, let C(x) = adj(xl - A) E C[x]""". We note that the elements of adj(xl - A) are computed as (n-1)-by-(n-l) subdeterminants of xl - A, so the highest power that can occur in C(x) is x". Also, note that we can identify C[x]""", the n-by-n matrices with polynomial entries with C"[x], the polynomials with matrix coefficients, so we can view C(x) as a polynomial in
[0
]x3+[0 01
0]x2+[4
]+[
x-3
x'-+1
x with scalar matrices as coefficients. For example,
4x+2 x3-7 1
2
-3 -7
_ ]
All you do is 1.
2.5 The Frame Algorithm and the Cayley-Hamilton Theorem
83
gather the coefficients of each power of x and make a matrix of scalars as the coefficient of that power of x. Note that what we have thusly created is an element of Cnxn[x], the polynomials in x whose coefficients come from the n-by-n matrices over C. Also note, xB = Bx for all B E tC[x]"x", so it does not matter which side we put the x on. We now view C(x) as such an expression in Cnx"[x]:
C(x) = Box"-I +BI These coefficient matrices turn out to be of interest. For example, adj(A) _ (-1)n-Iadj(-A)
= (-l)"-IC(0) = (-1) 'B,,-,, so (-1)n-Ign-I
adj(A) =
Thus, if cn 54 0, A is invertible and we have
A-I =
cn
Bn-I
But now we compute
(xl - A)adj(xl - A) = (xl - A)C(x) = (xI - A)(Box"-I + B1xi-2 + ... + B.-2X + Bn-1) = x" Bo + x"_' (BI - ABo) + xn-2(B2 - ABI) + +x(Bi_1 - ABn_2) - AB.-1 = x"I + x"-I C11 + ... + xci_1I + cnl
and we compare coefficients using the following table: Compare Coefficients
Multiply by
on the Left
Bp=I
All
An Bo
on the Right A"
BI - ABo = c1l
An-'
A"-I BI - AnBo
CI
B2 - A BI = C21
A"-2
An-2B2 - An-'BI
An-1
C2An
-2
Bk - ABk_I = ckI Bn_2 - ABn-3 = Cn_2I Bn-1 - ABn_2 = cn_1l
ABn_1 - A2Bn_2
sum =
0=
Cn-2A2
-ABn-I
-ABn_I = column
A2Bn-2 - A3B,i-3
A2 A
XA(A)
Generating Invertible Matrices
84
So, the first consequence we get from these observations is that the CayleyHamilton theorem just falls out as an easy consequence. (Actually, Liebler [2003] reports that Cayley and Hamilton only established the result for matrices
up to size 4-by-4. He says it was Frohenius (Ferdinand Georg Frobenius [26 October 1849 - 3 August 19171 who gave the first complete proof in 1878.)
THEOREM 2.18 (Cayley-Hamilton theorem) For any n-by-n matrix A over C, XA(A) = 0. What we are doing in the Cayley-Hamilton theorem is plugging a matrix into a polynomial. Plugging numbers into a polynomial seems reasonable, almost inevitable, but matrices? Given a polynomial p(x) = ao+a, x+ - +akxk E C[x], + ak Ak E C1 '". For example, we can create a matrix p(A) = a0 l + a, A +
ifp(x) = 4 + 3x - 9x 3, then p(A)=41+3A-9A33=4
1
0
0
0
I
0
0
0
1
+
-3432 -2324 -2964 -8112 -5499 -7004 8 6 8 -9243 -11 778 -13634 6 The Cayley-Hamilton theorem says that any square matrix is a "root" of its characteristic polynomial. But there is much more information packed in those equations on the left of the table, so let's push a little harder. Notice we can rewrite these equations as 1
3
3
2 4 7
2
5
2
2
4 7
5
1
-9
3
;
=
Bo
=
B,
= ABo+c,l
B2
= AB, +C21
B,_1
=
I
A B" _,
+ c 1.
By setting B, := 0, we have the following recursive scheme clear from above: for k = 1, 2, ... , n, Bo Bk
=
I = A Bk _, +
1.
In other words, the matrix coefficients, the Bks are given recursively in terms of the B&_,s and cks. If we can get a formula for ck in terms of Bk_,, we will get a complete set of recurrence formulas for the Bk and ck. In particular, if we know B"_, and c, we have A-1, provided, of course, A-1 exists (i.e., provided
2.5 The Frame Algorithm and the Cayley-Hamilton Theorem
85
c A 0). For this, let's exploit the recursion given above:
Bo=I BI = ABo + c1I = AI + c11 = A + c11 B2 = ABI +C21 = A(A+c11)+C21 = A2+c1A+C21 = A3+CIA2+C2A+C31 B3 =
Inductively, we see for k = 1, 2, ... , n,
Bk =
Ak+CIAk_1
+...+Ck_IA+CkI.
Indeed, when k = n, this is just the Cayley-Hamilton theorem all over again.
Now we have fork =2,3,... , n +
1,
Bk_I = Ak-I +CIAk-2+...+Ck-2A+ck_11. If we multiply through by A, we get for k = 2, 3,
ABk_I =
...
, n + 1,
Ak+c1Ak_1 +...+Ck_2A2+Ck_IA.
Now we pull a trick out of the mathematician's hat. Take the trace of both sides of the equation using the linearity of the trace functional.
tr(ABk_1) = tr(Ak) +
c1tr(Ak-1)
+
+ ck_2tr(A2) + ck_Itr(A)
fork=2,3,...,n+1. Why would anybody think to do such a thing? Well, the appearance of the coefficients of the characteristic polynomial on the right is very suggestive. Those who know a little matrix theory realize that the trace of A' is the sum of the rth powers of the roots of the characteristic polynomial and so Newton's identities leap to mind. Let Sr denote the sum of the rth powers of the roots of the characteristic polynomial. Thus, for k = 2, 3, ... , n + 1, tr(A Bk_I) = Sk + cISk_I + ' + Ck_2S2 + Ck_ISI.
2.5.1
Digression on Newton's Identities
Newton's identities go back aways. They relate the sums of powers of the roots of a polynomial recursively to the coefficients of the polynomial. Many proofs are available. Some involve the algebra of symmetric functions, but we do not want to take the time to go there. Instead, we will use a calculus-based argument following the ideas of [Eidswick 1968]. First, we need to recall some facts about
polynomials. Let p(x) = ao + a I x +
+ a"x". Then the coefficients of p
86
Generating Invertible Matrices
can he expressed in terms of the derivatives of p evaluated at zero (remember Taylor polynomials?): 0
P(x) = P(0) + p'(0)x + p 2 )x2 + ... +
cn t
P n i 0)xn
Now here is something really slick. Let's illustrate a general fact. Suppose
p(x) = (x - I)(x - 2)(x - 3) = -6 + 1 Ix - 6x2 + x-t. Do a wild and crazy thing. Reverse the rolls of the coefficients and form the new reversed
polynomial q(x) _ -6x3 + I1x2 - 6x + I. Clearly q(l) = 0 but, more amazingly, q(;) _ - + a - ; + 1 = -6+2 24+8 = 0. You can also check s polynomial has as roots 8 the reciprocals of the roots of q(!) = 0. So the reversed the original polynomial. Of course, the roots are not zero for this to work. This fact is generally true. Suppose p(x) = ao + aix + + anx" and the reversed polynomial is q(x) = an + an_ix + + aox". Note
gc,u(0) n
Then r # 0 is a root of p iff I is a root of q. r Suppose p(x) = ao + aix + + anx" = an(x - ri)(x - r2) ... (x - rn). The ri s are, of course, the roots of p, which we assume to be nonzero but not necessarily distinct. Then the reversed polynomial q(x) = an +an-tx +
+
aox" = ao(x - -)(x - -) . . (x - -). For the sake of illustration, suppose ri r2 .
rn
it = 3. Then form f (x)
_ q'(x) _ (x - ri IH(x - rz ') + (x - r; ' )] + [(x - r2 + (x - r3 )] q(x)
I
x-r_i + x-r2
I
+
(x - ri)(x - r2)(x - r3)
x-r3_i . Generally then, f(x)
4_1(x-rk n,
Let's introduce more notation. Let s,,, = >r" for rn = 1, 2, 3, .... Thus, t=i
sn, is the sum of the mrh powers of the roots of p. The derivatives of f are
2.5 The Frame Algorithm and the Cayley-Hamilton Theorem
87
intimately related to the ss. Basic differentiation yields n
f'(x )
f (0)
= -S I
-I
I(rP
f'(0)
= -s2
(x-r2r
f "(O)
_ -2s3
(x- ') +'
f (*)(0)
= -k!Sk+l
n
n
f(k)(x)
The last piece of the puLzle is the rule of taking the derivative of a product; this is the so-called Leibnitz rule for differentiating a product:
n) D'1(F(x)G(x)) = j( F(i)(x)G(n-i)(x). All right, let's do the argument. We have f (x) =
q'(x), so q'(x) = f (x)q(x).
Therefore, using the Leibnitz rule m-I
q('n)(x) =
[f(x)q(x)](m-1) =
E(m k
k=0
11 f(k)(x)q(m-I-k)(x).
Plugging in zero, we get III - I
(m)
9
(0) _
/n - l \ k
(k) (0)9
(m-I-k) (0)
/
k=0
rrl
ff
(m - I )!
Ok!(m - I
-k)!(-k!)sk+lq(m-I-k)(0).
Therefore,
I "'-I q(In-I-k)(0)
q(In)(O)
m!
m
= an-m = -- k-o (m -
I - k)! Sk+l
One more substitution and we have the Newton identities n1_1
0 = man_m + Ean-m+k+I Sk+I if I < m < Il k=0
m-I
0=
an-m+k+ISk+I if m > n. k=m-n-l
Generating Invertible Matrices
88
For example, suppose n = 3, p(x) = ao + a1x + a2x22 + a3x3.
in=l a2+s1a3=0 in = 2
2a1 + s1a2 + s2a3 = 0
in = 3
Sao + s,a1 + a2s2 + a3s3 = 0
in = 4
slap + s,a1 + s3a2 + S4a3 = 0
in = 5
aoS2 + aIS3 + a2s4 + a3s5 = 0
in = 6
ansi + a1 S4 + a2 S5 + a3 S6 = 0
That ends our digression and now we go back to Frame's algorithm. We need to translate the notation a bit. Note ak in the notation above. So from tr(A Bk-I) = Sk + CISk_I + ... + Ck_2S2 + Ck_ISI
fork =2,3,... n+l we see tr(A Bk_i) + kck = 0
fork=2,3,...
n.
In fact, you can check that this formula works when k = I (see exercise 1), so we have succeeded in getting a formula for the coefficients of the characteristic polynomial: fork = 1, 2, ... , n, ck =
-I tr(ABk_i). k
Now we are in great shape. The recursion we want, taking Bo = /, is given by
the following: fork = 1, 2, ... , n, Ck =
-1 k
tr(ABk_i)
Bk = A Bk-I + Ckf.
Note that the diagonal elements of are all equal (why?). Let's illustrate this algorithm with a concrete example. Suppose 2
A=
40
3 7
2
3 5
5
;0 1
2
4
. The algorithm goes as follows: first, find cl.
-27 c1 = -tr(A) = -(-17) = 17.
Next find BI :
1 19
B1=ABo+171=
2
3
5
4 0
24
10
3
3
18
4
2
5
1
-10
2.5 The Frame Algorithm and the Cayley-Hamilton Theorem Then compute A Bi :
AB1 =
1
54
103
132
13
110
225
273
39
20 4
95
52
-6
-27
51
293
Now start the cycle again finding c2:
c2=-Itr(ABi)=-2624=-312. Next comes B2:
- 258
103
1 32
13
110
-87
2 73
39
20
95
-2 60
4
-6
51
-27 -19
-115
-3 0
366
408 735 -190
763
14
-54
28
-8
70 7
1
B2 = A B ,
+ (-312)1 =
Now form A B2:
-78 -50
F
AB2 =
-8
-2
Starting again, we find c3:
tr(AB2) _ -(2127) = -709.
C3
Then we form B3:
-787 B3
-50
=AB2-7091=
366
-54
408 26 -190 28
-115
-30
-8
-2
54
14
-8
-2
Now for the magic; form AB3 :
AB3 =
-2
0
0
0
-2
0
0
0
-2
0 0 0
0
0
0
-2
Next form c4: c4 =
-4tr(AB3) = -2(-8) = 2.
89
Generating Invertible Matrices
90
Now we can clean up. First 1)4C4
det(A) _
= 2.
Indeed the characteristic polynomial of A is
XA(x) = xa + 17x; - 312X2 - 709x + 2. Immediately we see A is invertible and
A- ' =
_
( c4 )
-787
a
adj(A) _
2
;(-1)3B =
7Z7
-204
ii
25
-13
4
1
-183
95
-27
-7
27
-14
4
1
-30
408 26
-1 15
-50 366
-190
54
14
-54
28
-8
-2
-8
-2
15
1
Moreover, we can express A-' as a polynomial in A with the help of the characteristic polynomial 17 A-' =709 -1 + 312 A - 2 A2-2A;. 2 I
2.5.2
The Characteristic Polynomial and the Minimal Polynomial
We end with some important results that connect the minimal and characteristic polynomials.
THEOREM 2.19 The minimal polynomial IAA (x) divides any annihilating polynomial of the n-by-
n matrix A. In particular, P-A(x) divides the characteristic polynomial XA(x). However, XA(x) divides (ILA(x))" .
PROOF
The first part of the proof involves the division algorithm and is left as an exercise (see exercise 21). The last claim is a bit more challenging, so we offer
aproof. Write p.A(x) = [3r+[3r_I x+ +p,x'-I+xr. Let B0 = 1,,, and let Bi Itor-I.Itiseasytoseethat B;-AB;_, =oi/ A'+01
for i = I to r - 1. Note that ABr-1 = Ar + [3, Ar-i + + 1r-I A = µA (A) - 0,,1,. = -[3rIn. Let C = Box' - + + ... + Br-2x + BrI
B,xr-2
1-Then C E C[x]""' and (xI - A)C = (xI, - A)(Boxr-i + B,xr-2 + ... + Br_2x+Br_,) = Boxr+(B, -ABO)xr-I + . +(Br-, -ABr_2)x-ABr_, =
2.5 The Frame Algorithm and the Cayley-Harnilton Theorem
91
Or In + (3, -ix!" +- + 3ixr-11 +x` I,, = µA (X)1". Now take the determinant of both sides and get XA(x)det(C) = (µA (X))" . The theorem now follows.
0
A tremendous amount of information about a matrix is locked up in its minimal and characteristic polynomials. We will develop this in due time. For the moment, we content ourselves with one very interesting fact: In view of the corollary above, every root of the minimal polynomial must also be a root of the characteristic polynomial. What is remarkable is that the converse is also true. We give a somewhat slick proof of this fact. THEOREM 2.20 The minimal polynomial and the characteristic polynomial have exactly the same set of roots.
PROOF
Suppose r is a root of the characteristic polynomial XA. Then
det(rl - A) = 0, so the matrix rl - A is not invertible. This means there must be a dependency relation among the columns of rl - A. That means there is a nonzero column vector v with (rl - A)v = 6 or what is the same, + akxk, we have Av = rv. Given any polynomial p(x) = ao + aix +
p(A)v = aov+a,Av+ +akAkv = aov+airv+ +akrkv = p(r)v. This says p(r)!-p(A) is not invertible, which in turn implies det(p(r)!-p(A)) = 0. Thus p(r) is a root of Xp(A). Now apply this when p(x) = liA(x). Then, for any
root r of XA, iJA(r) is a root of Xµ,(A)(X) = XO(x) = det(xI - 0) = x". The only zeros of x" are 0 so we are forced to conclude p.A(r) = 0, which says r is a root of µA.
0
As a consequence of this theorem, if we know the characteristic polynomial of a square matrix A factors as
XA(X) _ (x - r, )d' (X -
r2)d2
... (X - rk )d ,
then the minimal polynomial must factor as
11A(X) = (x - ri )'(x - r2)e2 ... (x - rk)r"
wheree,
Exercise Set 7 1. Explain why the formula tr(ABk-1) + kck = 0 for k = 2, 3, 4, .., n also works for k = 1. 2. Explain why the diagonal elements of
are all equal.
Generating Inertible Matrices
92
3. Using the characteristic polynomial, explain why A A when A is invertible.
is a polynomial in
'
4. Explain how to write a program on a handheld calculator to compute the coefficients of the characteristic polynomial.
5. Use the Newton identities and the fact that sk = tr(Ak) to find formulas for the coefficients of the characteristic polynomial in terms of the trace of powers and powers of traces of a matrix. (Hint: c2 = [tr(A)2 - tr(A2)],
c3 =
.... )
Ztr(A)tr(A2) - -3 tr(A3),
6. Consider the polynomial p(x) = ao + aix +
z
+ a,-Ix"-1 + x". Find
a matrix that has this polynomial as its characteristic polynomial. (Hint: 0
I
0
0
0
1
0
0
0
-ao
-a1
-a2
...
0
0
Consider the matrix
... ...
1
-a,,_,
7. Show how the Newton identities for k > n follow from the CayleyHamilton theorem.
8. (D. W. Robinson) Suppose A E C"I and B E C'"I", where m < n. Argue that the characteristic polynomial of A is x
"' times the character-
istic polynomial of B if and only if tr(Ak) = tr(B') fork = 1, 2, ... , n. 9. (H. Flanders, TAMM, Vol. 63, 1956) Suppose A E C""". Prove that A is nilpotent iff tr(Ak) = 0 fork = 1, 2, ... , n. Recall that nilpotent means
AP = 0 for some power p. 10. Suppose A E C""1" and B E C"'"", where m < n. Prove that the characteristic polynomial of AB is x"-"' times the characteristic polynomial of BA. 11. What can you say about the characteristic polynomial of a sum of two matrices? A product of two matrices?
12. Suppose tr(AL) = tr(Bk) for k = I, 2, 3, .... Argue that XA(x) Xe(x) 13. Suppose A = I B
0C
. What, if any, is the connection between the
characteristic polynomial of A and those of B and C?
2.5 The Frame Algorithm and the Cayley-Hamilton Theorem
14. Verify the Cayley-Hamilton theorem for A =
1
1
I
0
1
l
0
0
1
93
15. Let A E C0". Argue that the constant term of XA(x) is (-1)" (product
of the roots of XA(x)) and the coefficient of x" is -Tr(A) = -(sum of the roots of XA(x)) If you are brave, try to show that the coefficient of x"-i is (-1)"-i times the sum of the j-by-j principal minors of A. 16. Is every monic polynomial of degree n in C[x] the characteristic polynomial of some n-by-n matrix in C"""?
17. Find a matrix whose characteristic polynomial is p(x) = x4 + 2x3 -
3x2+4x-5. 18. Explain how the Cayley-Hamilton theorem can be used to provide a method to compute powers of a matrix. Make up a 3-by-3 example and illustrate your approach.
19. Explain how the Cayley-Hamilton theorem can be used to simplify the calculations of matrix polynomials so that the problem of evaluating a polynomial expression of an n-by-n matrix can be reduced to the problem of evaluating a polynomial expression of degree less than n. Make up an example to illustrate your claim. (Hint: Divide the characteristic polynomial into the large degree polynomial.)
20. Explain how the Cayley-Hamilton theorem can be used to express the inverse of an invertible matrix as a polynomial in that matrix. Argue that a matrix is invertible if the constant term of its characteristic polynomial is not zero.
21. Prove the first part of Theorem 2.19. (Hint: Recall the division algorithm for polynomials.)
22. Suppose U is invertible and B = U-'AU. Argue that A and B have the same characteristic and minimal polynomial. 23. What is wrong with the following "easy" proof of the Cayley-Hamilton theorem: XA(x) = det(xI - A), so replacing x with A one gets XA(A) _
det(AI - A) = det(O) = 0? 24. How can two polynomials be different and still have exactly the same set of roots'? How can it be that one of these polynomials divides the other.
Generating Invertible Matrices
94
25. What are the minimal and characteristic polynomials of the n-by-n idenHow about the n-by-n zero matrix? tity matrix
26. Is there any connection between the coefficients of the minimal polynomial for A and the coefficients of the minimal polynomial for A-' for an invertible matrix A?
27. Suppose A =
B
C
What, if any, is the connection between the
minimal polynomial of A and those of B and C?
28. Give a direct computational proof of the Cayley-Hamilton theorem for any 2-by-2 matrix.
Further Reading [B&R, 1986(2)] T. S. Blyth and E. F. Robertson, Matrices and Vector Spaces, Vol. 2, Chapman & Hall, New York, (1986). [B&R, 1986(4)] T. S. Blyth and E. F. Robertson, Linear Algebra, Vol. 4, Chapman & Hall, New York, (1986). [Eidswick, 1968] J. A. Eidswick, A Proof of Newton's Power Sum Formulas, The American Mathematical Monthly, Vol. 75, No. 4, April, (1968),
396-397. [Frame, 1949] J. S. Frame, A Simple Recursion Formula for Inverting a Matrix, Bulletin of the American Mathematical Society, Vol. 55, (1949), Abstracts. [Gant, 1959] F. R. Gantmacher, The Theory of Matrices, Vol. I, Chelsea Publishing Co., New York, (1959).
[H-W&V, 1993] Gilbert Helmherg, Peter Wagner, Gerhard Veltkamp, On Faddeev-Leverrier's Methods for the Computation of the Characteristic Polynomial of a Matrix and of Eigenvectors, Linear Algebra and Its Applications, (1993), 219-233. [House, 1964] Alston S. Householder, The Theory of Matrices in Numerical Analysis, Dover Publications Inc., New York, (1964).
2.5 The Frame Algorithm and the Cayley-Hamilton Theorem
95
[Kalman, 2000] Dan Kalman, A Matrix Proof of Newton's Identities, Mathematics Magazine, Vol. 73, No. 4, October, (2000), 313-315. [LeV, 18401 U. J. LeVerrier, Sur les Variations Seculaires des Elements Elliptiques des sept Planetes Principales, J. Math. Pures Appl. 5, (1840), 220-254. [Liebler, 2003] Robert A. Liebler, Basic Matrix Algebra with Algorithms and Applications, Chapman & Hall/CRC Press, Boca Raton, FL, (2003).
[Mead, 1992] D. G. Mead, Newton's Identities, The American Mathematical Monthly, Vol. 99, (1992), 749-751. [Pennisi, 1987] Louis L. Pennisi, Coefficients of the Characteristic Polynomial, Mathematics Magazine, Vol. 60, No. 1, February, (1987), 31-33.
[Robinson, 19611 D. W. Robinson, A Matrix Application of Newton's Identities, The American Mathematical Monthly, Vol. 68, (1961), 367369.
2.5.3 2.5.3.1
Numerical Note The Frame Algorithm
As beautiful as the Frame algorithm is, it does not provide a numerically stable means to find the coefficients of the characteristic polynomial or the inverse of a large matrix.
2.5.4 MATLAB Moment 2.5.4.1
Polynomials in MATLAB
Polynomials can be manipulated in MATLAB except that, without The Sym-
bolic Toolbox, all you see are coefficients presented as a row vector. So if f (x) = a, x" + a2x"- I + + a"x + a,+,, the polynomial is represented as a row vector
[a, a2 a3 ... an+I]
For example, the polynomial f (x) = 4x3 + 2x2 + 3x + 10 is entered as
f =[42310]. If g(x) is another polynomial of the same degree given by g, the sum of f (x) and g(x) is just f + g; the difference is f - g. The product of any two
Generating !m'ertible Matrices
96
polynomials f and g is conv(f,g). The roots of a polynomial can be estimated by the function roots( ). You can calculate the value of a polynomial f at a given number, say 5, by using polyval(f,5). MATLAB uses Horner's (nested) method to do the evaluation. You can even divide polynomials and get the quotient and remainder. That is, when a polynomial f is divided by a polynomial h there is a unique quotient q and remainder r where the degree of r is strictly less than the degree of h. The command is
[q, r] = deconv(f, h).
For example, suppose f (x) = x4 + 4x2 - 3x + 2 and g(x) = x2 + 2x - 5. Let's enter them, multiply them, and divide them.
>> f=[104-32]
f=
-3
4
0
1
2
>> g=[I2-5] 9= 1
2
-5
>> conv(f,g)
ans = 2
1
-1
5
-24
19
-10
>> [q r] =deconv(fg) q=
-2
1
13
r= 0
0
-39
0
67
>> roots(g)
ans =
-3.3395 1.4495
>> polyval(f,0)
ans = 2
Let's be sure we understand the output. We see
f (x)g(x) = x6 + 2x5 - x4 + 5x3 - 24x22 + l9x - 10. Next, we see
f (x) = g(x)(x2 - 2x + 13) + (-39x + 67). Finally, the roots of g are -3.3395 and 1.4495, while f (0) = 2.
2.5 The Frame Algorithm and the cayley-Hamilton Theorem
97
One of the polynomials of interest to us is the characteristic polynomial, XA(x) = det(xI - A). Of course, MATLAB has this built in as poly(A). All you see are the coefficients, so you have to remember the order in which they appear. For example, let's create a random 4-by-4 matrix and find its characteristic polynomial. >>format rat >>A=fix(11 *rand(4))+i*fix(i I *rand(4))
A= Columns 1 through 3 10+ 10i 9 2+ 101 8+3i
6+4i
5+8i
5+9i
0
9+ li
4+2i 6+2i 8+6i
Column 4 10 + 2i
8 + 2i 1
4 + 8i >>p= poly(A)
p= Columns l through 3
-28 - 231
1
-4 + 1961
Columns 4 through 5 382 - 12101 -176 + 766i So we see the characteristic polynomial of this complex 4-by-4 matrix is
XA(X) = x4 - (28 + 23i )x3 + (-4 + 196i )x2 + (382 - 1210i )x
+(-176 + 766i). We can form polynomials in matrices using the polyvalm command. You can even check the Cayley-Hamilton theorem using this command. For example, using the matrix A above,
>> fix(polyvalm(p,A))
ans = 0 0
0 0
0 0 0 0
0
0 0 0
0 0 0 0
Finally, note how we are using the fix command to get rid of some messy decimals. If we did not use the rat format above, the following can happen.
Generating hrvertible Matrices
98
>> A = fix( I I *rand(4)) + i *fix( I I *rand(4))
A=
2.0000 + 3.00001 1.0000 + 7.00001 9.0000 + 4.00001 9.0000 + 7.00001 7.0000 + 3.0000i 7.0000 + 3.00001 6.0000 + 7.00001 7.0000 + 6.00001 3.0000 + 3.0000i 4.0000 + 9.0000i 5.0000 + 6.0000i 8.0000 + 8.00001 5.0000 + 5.00001 9.0000 + 6.00001 9.0000 + 4.00001 7.0000 + 10.00001
>> p = poly(A)
p= I.0e+003* Columns I through 4 0.0010 -0.0210 - 0.02201 Co l umn 5 -2.087-1.3060i >> polyvalm(p,A) ans =
-0.0470 - 0.1180i
-0.273 - 1.04501
I.Oe-009* 0.1050 + 0.0273i 0.1398 + 0.05551 0.1525 + 0.03461 0.1892 + 0.0650i 0.1121 + 0.01321 0.1560 + 0.03801 0.1598 + 0.0102i 0.2055 + 0.0351 i
0.1073+0.0421i 0.1414+0.07891 0.1583+0.05391 0.1937+0.09161 0.1337+0.0255i 0.1837+0.0600i 0.1962+0.0277i 0.2447 + 0.06181 Notice the scalar factor of 1000 on the characteristic polynomial. That leading coefficient is after all supposed to be 1. Also, that last matrix is supposed to be 10_9 in front of the matrix. That the zero matrix but notice the scalar factor of makes the entries of the matrix teeny tiny, so we effectively are looking at the zero matrix.
Chapter 3 Subspaces Associated to Matrices
subspace, null space, nullity, column space, column rank, row space, row rank, rank-plus-nullity theorem, row rank=column rank
3.1
Fundamental Subspaces
In this section, we recall, in some detail, how to associate subspaces to a matrix A in C' " xn . First, we recall what a subspace is.
DEFINITION 3.1
(subspace of C")
A nonempty subset M C C" is called a subspace of C" if M is closed under the formation of sums and scalar multiples. That is, if u, v E M, then u + v E M, and if u E M and a is any scalar, then au E M.
Trivial examples of subspaces are ('), the set consisting only of the zero vector and C" itself. Three subspaces are naturally associated to a matrix. We define these next. DEFINITION 3.2 (null space and nullity) Let A E C"' ". We define the null space of A as Null (A) = {x E C' j Ax = -6). The dimension of this subspace of C" is called the nullity of A and is denoted nlty(A).
(column space and column rank) Let A E C'""". We define the column space of A as Col(A) = (Ax I X E C" }.
DEFINITION 3.3
The dimension of this subspace of Cis called the column rank of A and is denoted c-rank(A).
99
Subspaces Associated to Matrices
100
DEFINITION 3.4 (row space and row rank) Let A E C""". We define the row space of A as the span of the rows of A in C". In symbols, we write 7Zow(A) for the row space and the dimension of this subspace is the row rank of A and is denoted r-rank(A).
THEOREM 3.1 Let A E CIII ' ". Then
I. Null(A) is a subspace of C" and nlty(A) < n. 2. Col(A) is a subspace of C' and c-rank(A) < m. 3. Col(A) equals the span of'the columns of A in C"'.
4. 7Zow(A) is a subspace of C' and r-rank(A) < n. 5. 7Zow(A) = Col(AT) and Col(A) = 7Zow(AT ).
6. Row(A) = Col(A*) and Col(A) = 7Zow(A*).
7. Null(A*A) = Null(A), so nlty(A*A) = nlty(A). PROOF We leave (1) through (6) as exercises. We choose to prove (7) to illustrate some points. To prove two sets are equal, we prove each is included in the other as a subset. First pick x E Mull(A). Then Ax = _0 so A*Ax = A* 6 = -6 putting X E NUll(A*A). To get the other inclusion, we need to note
that ifx =
xi
xi
X2
x,
I, then x*x = [xi 12 ... Y,,]
L x,,, J
I
.
I = Ixi I2+1x212+
+I
L x,,, J
XI2, using zz = Iz12 for a complex number z. Thus, if x*x = V, then
Ij
jxi I2 = 0, implying all xis are zero, making x = . Therefore, if x E Null(A*A), then A*Ax = 0 , so x*A*Ax = x*' = l e. Then (Ax)*(Ax) _ I
e. But (Ax)*(Ax) = -6 implies Ax = -6 by the above discussion. This puts x E Null(A). Thus, we have proved Null(A) e NUll(A*A) above and now Null(A*A) S Null(A). These together prove Null(A) = JVull(A*A). 0 Note that we strongly used a special property of real and complex numbers to get the result above. There are some other useful connections that will be needed later. We collect these next.
3.1 Fundamental Subspaces
101
THEOREM 3.2 Let A E Cm ", B E Cy x M. Then
1. Null(A) C Null(BA) 2. If U E C",xm is invertible, then Null(UA) = AIull(A) so nlty(UA) _ nlty(A). PROOF
The proof is left as an exercise.
0
We are often interested in factoring a matrix, that is, writing a matrix as a product of two other "nice" matrices. The next theorem gives us some insights on matrix factors. THEOREM 3.3 LetAECmx",BECmxP,DECyxn.Then
1. Col(B) S; Col(A) iff there exists a matrix C in C",1 such that B = AC.
2. Col(AC) C Col(A)for any C in C"", so c-rank(AC) < c-rank(A). 3. Row(D) a Row(A) iff there exists a matrix K in Cg,, such that D = KA. 4.
Row(KA) C_ Row(A) for any K in Cg,,, so r-rank(KA) < rrank(A).
5. LetS E Cnxk T E C"*P.IfCol(S) c Col(T), thenCol(AS) C Col(AT). Moreover, ifCol(S) = Col(T), then Col(AS) = Col(AT).
6. Let S E Cgxm and T E C_'xm. Then, if Row(S) C_ Row(T), then
Row(SA) g Row(TA). Moreover, if Row(S) = Row(T), then Row(SA) = Row(T A). 7. If U E
is invertible, then Row(UA) = Row(A).
8. If V E Cnxn is invertible, then Col(AV) = Col(A).
PROOF
The proofs are left as exercises.
El
Next, we consider a very useful result whose proof is a bit abstract. It uses many good ideas from elementary linear algebra. It's a nice proof so we are going to give it.
102
Subspaces Associated to Matrices
THEOREM 3.4 (rank plus nullity theorem) Let A E Cl"". Then the column rank of A plus the nullity of A equals the number of columns of A. That is, c-rank(A) + nlty(A) = it.
PROOF
Take a basis (vi, V2,
of all of C', say f v,
, V2,
, Vq } of Null
, vq, w1 ,
(A). Extend this basis to a basis
, w, } is the full basis. Note n = q + r.
Now take y in Col(A). Then, y = Ax for some x in C". But then, x can be expressed in the basis (uniquely) as x = al vi +a2v2+ +ayv, +b, w, + + brWr.Thusy= +brAWr = +brAWr. This says the vectors Awl, Awe, + b, Awl + , AWr span Col(A). Now the question is, are these vectors independent? To check, we set c,Aw,+C2Aw2+ Then +Crwr) = -6. This puts the vector c,w, + C2w2 + + CrWr in Null(A). Therefore, this vector can be expressed in terms of the basis vectors v,, v2, , vy. Then,
C,W,+C2W2+
d,v,+d2v2+
+dyv,q-c,W, -C2W2- -c,.w, =
'. But now, we are looking at the
entire basis which is, of course, independent, so d, = d2 = .. = d, = c, _ = Cr = 0. Thus, the vectors Awl, Awe, , Awr are independent and consequently form a basis for Col(A). Moreover, r = c-rank(A) and C2 =
q = nlty(A). This completes the proof. Isn't this a nice argument?
0
COROLLARY 3.1 If A E CnIxn and U E C0""' is invertible, then c-rank(UA) = c-rank(A).
PROOF By Theorem 3.2(2)A(ull(UA) = Null(A), sonlty(UA) = nlty(A). By Theorem 3.4, n = c-rank(A)+nlty(A) = c-rank(U A)+nlty(UA). Cancel and conclude c-rank(A) = c-rank(UA). 0 Notice that this corollary does not say Col(A) = Col(UA). That is be-
cause this is not true! Consider A = ( 2 is invertible and UA = [ Col(B) _ ((3
0
1. J
8
-2
Now U = L
But Col(A) _ {a L
2
0 1
]Ia E C} and
1
0J
10 E C} which are, evidently, different subspaces.
THEOREM 3.5 If U and V are invertible, then UA V has the same row rank and the same column rank as A.
3.1 Fundamental Subspaces
103
PROOF We have c-rank(A) = c-rank(UA) by Corollary 3.1 and c-rank (UA) = c-rank(UA V) by Theorem 3.4(8). Thus c-rank(A) = c-rank(UA V).
Also, Row(UAV) = Col((UAV )T) = Col(V T ATUT), so r-rank(A) = c-rank(AT) = r-rank(UAV) = c-rank(V T AT UT ).
0
We next look at a remarkable and fundamental result about matrices. You probably know it already. The row rank of a matrix is always equal to its column rank, even though A need not be square and Null(A) and Co!(A) are contained in different vector spaces! Our next goal is to give an elementary proof of this fact.
THEOREM 3.6 (row rank equals column rank) Let A E C"'X". Then r-rank(A) = c-rank(A). PROOF Let A be an m-by-n matrix of row rank r over C. Let rl = rowl(A), ... , r", = row (A). Thus r1 = row;(A) = [ail ai2 . for i = 1, 2, ... , m. Choose a basis for the row space of A, Row(A), say bl, b2, ... , br. Suppose b1 = [b11b12 ... b;"] for i = 1, 2, ... , r. It follows that each row of A is uniquely expressible as a linear combination of the basis vectors:
r2
= cl1bl + c12b2 + ... + Clrbr = C21 bl + C22b2 + ... + C2rbr
rm
=
rl
+Cm2b2+...+C,nrbr
Cmibl
Now [a,Ial2 ... aln] = rl = cl1[bllbl2 ... bin]+C12[b21b22 ... b2n]+ .+Clr [brlbr2 ... brn]. A similar expression obtains for each row. By equating entries, we see for each j,
a21
= cl I bl j + c12b2j + . + Cl rbr j = C21 bl j + C22 b21 + ... + C2rbr1
a,, ,j
=
all
Cm I b 1 j + Cm 2 b2j + ... + Cn, r brj .
As a vector equation, we get Cl1
C21
Cml
Cir
+
+brj
C2r
Cmr
j = 1, 2, ...
104
Subspaces Associated to Matrices
This says that every column of A is a linear combination of the r vectors of cs. Hence the column space of A is generated by r vectors and so the dimension of the column space cannot exceed r. This says c-rank(A) < rrank(A). Applying the same argument to AT, we conclude c-rank(AT) < r-
rank(AT). But then, we have r-rank(A) < c-rank(A). Therefore, equality a
must hold.
From now on, we will use the word rank to refer to either the row rank or the column rank of a matrix, whichever is more convenient, and we use the notation rank(A) or r(A). For some matrices, the rank is easy to ascertain. For example,
then r(A) is the number of nonzero , n. If A = diag(di, d2, diagonal elements. For other matrices, especially for large matrices, the rank may not be so easily accessible. We have seen that, given a matrix A, we can associate three subspaces and two
dimensions to A. We have the null space Null(A), the column space Col(A), the row space Row(A), the dimension dim(Null(A)), which is the nullity of A, and dim(Co!(A)) = dim(Row(A)) = r(A), the rank of A. But this is not the end of the story. That is because we can naturally associate other matrices to A. Namely, given A, we have the conjugate of A, A; the transpose of A, AT; and the conjugate transpose, A* _ (A)T. This opens up a number of subspaces that can be associated with A:
Null(A)
Col(A)
Row(A)
Null(A)
Col(A)
Row(A)
Afull(AT)
Col(AT)
Row(AT)
NUll(A*)
Col(A*)
Row(A*).
Fortunately, not all 12 of these subspaces are distinct. We have Col(A) = Row(AT ), Col(A) = Row(A*), Col(AT) = Row(A), and Col(A*) = Row(A). Thus, there are actually eight subspaces to consider. If we wish, we can eliminate row spaces from consideration altogether and just deal with null spaces and column spaces. An important fact we use many times is that rank(A) = rank(A*) (see problem 15 of Exercise Set 8). This depends on the fact that if there is a dependency relation among vectors in C', there is an equivalent dependency relation among the vectors obtained from these vectors by taking complex conjugates of their entries. In fact, all you have to do is take the complex conjugates of the scalars that effected the original dependency relationship. We begin developing a heuristic picture of what is going on with the following diagram.
3.1 Fundamental Subspaces
105
Cm
Cn
Amxn, A
FAT, AT = A*
Figure 3.1:
Fundamental subspaces.
Exercise Set 8 1. Give the rank and nullity of the following matrices: 1
0 0
0 1
0
0 0 1
,
2
0
0
0
3
0 0
0
0
],
], [
1
1
i
1
a
b
0
O
c
d
l
2+2i 2-2i
4 Fin d th e ra nk of A. 6i 6 6 +61 JJ . Also, compute AA* and A*A and find their ranks.
2. Let A =
f
2
3+3i
3. If A E C","" argue that r(A) < m and r(A) < n. 4. Fill in the proofs of Theorems 3.1, 3.2, and 3.3. 5. Argue that if A is square, A is invertible iff A*A is invertible.
6. Prove that Col(AA*) = Col(A), so c-rank(AA*) = c-rank(A).
7. Prove that c-rank(A) = c-rank(A) and r-rank(A) = r-rank(A).
8. Argue that Null(A)f1Col(A*) = (') andNull(A*) f1Col(A) = (76). 9. Let B be an m-by-n matrix. Argue that the rows of B are dependent in C" if there is a nonzero vector w with wB = '. Also the columns are dependent in C"' iff there is a nonzero vector w with Bw = e.
10. Consider a system of linear equations Ax = b where A E C"'"", X E C", and b E C01"'. Argue that this system is consistent (i.e., has a
Subspaces Associated to Matrices
106
solution) iii rank(A) = rank([A I bO, where [A I b] is the augmented matrix (i.e., the matrix obtained from A by adding one more column, namely b).
11. Define a map J : C" - C" by J(xi, x2, ... ,
(xi, x2, ... , x").
That is, J just takes the complex conjugate of each entry of a vector. Argue that J2 = I. Is J a linear map? How close to a linear map is it? Is J one-to-one? Is J onto? If M is a subspace of C", argue that J(M) is a subspace and dim(J(M)) = dim(M). If A is a matrix in C"', what does J do to the fundamental subspaces of A? 12. Suppose A is an n-by-n matrix and v is a vector in C". We defined the Krylov matrix IC,(A, v) = [v I Av I A2v I I A`-'v]. Then the Krylov subspace K.c(A, v) = span{v, Av,... , A`v} = Co1(K,(A, v)). Suppose Ax = b is a linear system with A invertible. Suppose deg(µA) = m. Argue that the solution to Ax = b lies in K,,,(A, b). Note, every x in K(A, b) is of the form p(A)b, where p is a polynomial of degree m - I or less.
13. Make up a concrete example to illustrate problem 12 above.
14. Argue that Null(A)
iff the columns of A are independent.
15. Prove that r(A) = r(AT) = r(A*) = r(A).
16. Argue that AB = 0 iff Col(B) c_ Null(A), where A and B are conformable matrices. Recall "conformable" means the matrices are of a size that can be multiplied.
17. Prove that A2 = 0 iff Col(A) c Null(A). Conclude that if A2 = 0, then rank(A) < z if A is n-by-n.
18. Let Fix(A) = {x I Ax = x}. Argue that Fix(A) is a subspace of C" when A is n-by-n. Find Fix (
I
I
1
0 0
2 3
3 2
).
19. Is W = ((z, z, 0) 1 Z E C) a subspace of C3?
20. What is the rank of 1
3
6
7
8
10
11
12
15
16
13
41
2
3
5
6
7
8
9
? What is the rank of
4
2
5 9
1
4
'? Do y ou see a patte rn? Can you generalize?
3. / Fundamental Subspaces
Let S=
21.
a -b
b
c
a, b, c
107
C). Is S a subspace of
2x2
J
22.
Argue that rank(AB) < min(rank(A), rank(B)) and rank(B) _ rank(-B).
23.
Prove that if one factor of AB is nonsingular, then the rank of AB is the rank of the other factor.
24.
Argue that if S and T are invertible, then rank(SAT) = rank(A).
25.
26.
Prove that rank(A + B) < rank(A) + rank(B) and rank(A + B) > Irank(A) - rank(B)I. Argue that if A E C'"'" and B E C"xm where n < n, then AB cannot be invertible.
27. For a block diagonal matrix A = diag[A11, A22,
...
, Akk], argue that
k
rank(A) = >rank(A;;). 28. Suppose A is an n-by-n matrix and k is a positive integer. Argue that JVull(Ak-1) e_ Null(Ak) e_ Null(Ak+' ). Let {b,, ... , b,} be a basis ofl%full(Ak_l); extend this basis toAful/(A")and get {b,, ... , b,, Cl, C2, ... ,c, } . Now extend this basis to get a basis {b, , ... , br, CI, , . . , Cs,, d, , ... , d, } of Null (Ak+' ). Argue that {b, , ... , b Ad,, ... , Ad,) is a linearly independent subset of Null(Ak). .
29. Argue that Col(AB) = Col(A) iff rank(AB) = rank(A).
30. Prove that Null(AB) = Null(B) iff rank(AB) = rank(B). 31
.
(Peter Hoffman) Suppose A , , A2, ... , An are k-by-k matrices over C with A, + A2 + + A" invertible. Argue that the block matrix A, ®
A2
A,
A3 A2
... ...
An
0
An-,
A.
...
0
has full rank.
®
0
...
A,
A2
...
...
A,,
What is this rank? 32. (Yongge Tian) Suppose A is an m-by-n matrix with real entries. What is the minimum rank of A + iB, where B can be any m-by-n matrix with real entries?
33. Prove that if AB = 0, then rank(A) + rank(B) < i, where A and
BEC"
108
Subspaces Associated to Matrices
34. Suppose A is nonsingular. Argue that the inverse of A is the unique matrix
X such that rank [
X ] = rank(A).
35. If A is m-by-n of rank m, then an LU factorization of A is unique. In particular, if A is invertible, then LU is unique, if it exists. 36. Suppose A is n-by-n.
(a) If rank(A) < n - 1, then prove adj(A) = 0. (b) If rank(A) = it - I, then prove rank(adj(A)) = 1. (c) If rank(A) = it, then rank(adj(A)) = it. 37. Argue that Col(A + B) e Col(A) + Col(B) and Col(A + B) =Col(A)+ Col(B) iff Col(A) C Col(A + B). Also, Col(A + B) = Col(A)+Col(B)
iffCol(B) g Col(A + B). 38. Prove that Col([A I B]) = Col(A) + Col(B) so that if A is m-by-n and B is m-by-p, rank([A I B]) < rank(A) + rank(B).
39. Suppose A E C"", X E C""t', and AX = 0. Argue that the rank of X is less than or equal to n - r. 40. Suppose A E C"""' and m < n. Prove that there exists X # 0 such that
AX=0. A
41. Prove the Guttman rank additivity formula: suppose M = IC
B
DJ
with det(A) # 0. Then rank(M) = rank(A) + rank(M/A).
Further Reading [B&R, 1986(2)] T. S. Blyth and E. F. Robertson, Matrices and Vector Spaces, Vol. 2, Chapman & Hall, New York, (1986). [Lieheck, 19661 H. Liebeck, A Proof of the Equality of Column and Row Rank of a Matrix, The American Mathematical Monthly, Vol. 73, (1966), 1114.
[Mackiw, 1995] G. Mackiw, A Note on the Equality of Column and Row Rank of a Matrix, Mathematics Magazine, Vol. 68, (1995), 285-286.
3.1 Fundamental Subspaces
109
MATLAB Moment
3.1.1
The Fundamental Subspaces
3.1.1.1
MATLAB has a built-in function to compute the rank of a matrix. The command is
rank(A) Unfortunately, MATLAB does not have a built-in command for the nullity of A. This gives us a good opportunity to define our own function by creating an M-file. To do this, we take advantage of the rank plus nullity theorem. Here is how it works.
Assuming you are on a Windows platform, go to the "File" menu, choose "New" and "M-file." A window comes up in which you can create your function as follows: 1
function nlty= nullity(A)
2-
[m,n]= size(A)
3-
nlty = n - rank(A).
Note that the "size" function returns the number of rows and the number of columns of A. Then do a "Save as" nullity.m, which is the suggested name. Now check your program on 1
A=
-2 2
-6
1
14
1
-6
11
4
-9 5
>> A=[ 1 -6 14;-2 14 1 -9;2 -6 11 5] 1
A= -2 2
-6
1
4
14
1
-6
-9
11
5
>> rank(A)
ans = 2
>> nullity(A) ans =
2
Now try finding the rank and nullity of
B=
1 +i
2+2i 3+i
2+2i 4+4i 3+3i 6+6i
9i 8i
Subspaces Associated to Matrices
110
It is possible to get MATLAB to find a basis for the column space and a basis for the null space of a matrix. Again we write our own functions to do this. We use the rref command (row reduced echelon form), which we will review later. You probably remember it from linear algebra class. First, we find the column space. Create the M-file
function c = colspace(A) [m,n] = size(A) 3- C = [A' eye(n)] 4- B = rref(C)' 5- c = B([I:m],[1:rank(A)]); I
2-
which we save as colspace.m. Next we create a similar function to produce a basis for the nullspace of A, which we call nullspace.m.
function N = nullspace(A) [m,n] = size(A) C = [A' eye(n)]; 4- B = rref(C)' 5- N = B([m+l:m+n],[rank(A)1+1:n]); I
23-
Now try these out on matrix A above. >> A=[ 1 -6 1 4;-2 14 1 -9;2 -6 11 51 I
A = -2 2
-6
1
14
1
-6
11
4
-9 5
>> colspace(A) ans =
0
1
0
1
8
3
>> format rat >>nullspace(A) ans = I
0
-1/13 -3/13
0 1
-2/13 20/13
Now, determine the column space and nullspace of B above. Finally we note that MATLAB does have a built-in command null(A), which returns an orthonormal basis for the nullspace, and orth(A), which returns an
3.2 A Deeper Look at Rank
I
II
orthonormal basis for the column space. A discussion on this is for a later chapter.
By the way, the World Wide Web is a wonderful source of M-files. Just go out there and search.
Further Reading [L&H&F, 1996] Steven Leon, Eugene Herman, Richard Faulkenberry, ATLAST Computer Exercises for Linear Algebra, Prentice Hall, Upper Saddle River, NJ, (1996).
Sylvester's rank formula, Sylvester's law of nullity, the Frobenius inequality
3.2 A Deeper Look at Rank We have proved the fundamental result that row rank equals column rank. Thus, we can unambiguously use the word "rank" to signify either one of these
numbers. Let r(A) denote the rank of the matrix A. Then we know r(A) =
r(A*) = r(AT) = r(A). Also, the rank plus nullity theorem says r(A) + nlty(A) = the number of columns of A. To get more results about rank, we develop a really neat formula which goes back to James Joseph Sylvester (3 September 1814 - 15 March 1897). THEOREM 3.7 (Sylvester's rank formula) Let A E Cl " and B E C""P. Then
r(AB) = r(B) - dim(Null(A) fCol(B)). PROOF Choose a basis of Null (A) fl Col(B), say {bi , b2, ... , b, }, and extend this basis to a basis of Col (B). Say B = {b,, b2, ... , b c, , c2, ... , c, } is this basis for Col(B). We claim that {Ac, , Ace, ... Ac, } is a basis for Col(AB). As usual, there are two things to check. First, we check the linear indepen-
Subspaces Associated to Matrices
1 12
dence of this set of vectors. We do this in the usual way. Suppose a linear + a,Ac, = 6. Then, A(a,c, +a2c2 + combination a,Ac, + a2Ac2 +
+ a,c,) =
'. This puts a,c, + a2c2 +
+ a,c, in the null space
of A. But the cs are in Col(B), so this linear combination is in there also. + arc, E A(ull(A) fl Col(B). But we have a baThus, a,c, + a2c2 + sis for this intersection, so there must exist scalars Ri , (32, ... , (3, so that + atc, =Ribs +02b2 + + R,b,. But then, a i ci + a2c2 + a, c, + a2c2 + . + a,c, - R, b, - 02b2 - R,b,. = -6. Now the cs and the bs together .
make up an independent set so all the scalars, all the as, and all the Ps, must be zero. In particular, all the a's are zero, so this establishes the independence of
{Aci, Ace,... Ac,}. Is it clear that all these vectors are in Col(AB)? Yes, because each ci is in Col(B), so ci = Bxi for some xi; hence Ac, = ABxi E Col(AB). Finally, we prove that our claimed basis actually does span Col(A B). Let y be in Col(A B).
Then y = ABx for some x. But Bx lies in Col(B) so Bx = aibi + a2b2 +
then, y=ABx=a,Ab,+a2Ab2+ +a,,Ab, +a,.+,Ac, +...+a,+tAc, = _iT +a,+,Ac, +...+a,+tAc,. Thus, we have established our claim. Now notice t = dim(Col(AB)) = r(AB)
and r(B) = dim(Col(B)) = s + t = dim(Null(A) fl Col(B)) + r(AB). The formula now fellows.
0
The test of a good theorem is all the consequences you can squeeze out of it. Let's now reap the harvest of this wonderful formula.
COROLLARY 3.2
For AEC"'X"and BEC'
,
nlty(AB) - nlty(B)+dim(Null(A) flCol(B)). PROOF
This follows from the rank plus nullity theorem.
0
COROLLARY 3.3
For A E C'nxn and B E C"xn, r(AB) < min(r(A), r(B)).
PROOF First, r(AB) = r(B) - dim(Null(A) fl Col(B)) < r(B). Also 0 r(AB) = r((AB)T) = r(BT AT) < r(AT) = r(A). COROLLARY 3.4
For A E C"' x" and B E CnXP, r(A) + r(B) - n < r(AB) < min(r(A), r(B)).
3.2 A Deeper Look at Rank
PROOF
113
First, Null(A)fCol(B) c A(ull(A) so dim (Arull(A) fl Col(B)) <
dim(Null(A)) = nlty(A) = n - r(A) so r(AB) = r(B) - dim(Null(A) nl Col(B)) > r(B) - (n - r(A)) so r(AB) > r(B) + r(A) - it. COROLLARY 3.5
Let A E C"' x"; then r(A*A) = r(A). PROOF It takes a special property of complex numbers to get this one. Let x E Null(A*) fl Col(A). Then A*x = -6 and x = Ay for some y. But then x*x = y* A*x = -6 so E Ix; 1Z = 0. This implies all the components of x are zero, so x must be the zero vector. Therefore, JVull(A*) fl Col(A) _ (v), and so r(A*A) = r(A)- dim(Null(A*) fl Col(A)) = r(A). 0 COROLLARY 3.6 Let A E Cm x'1; then r(AA*) = r(A*).
PROOF
Replace A by A* above.
0
COROLLARY 3.7
Let A E C"; then Col(A*A) = Col(A*) and JVull(A*A) = JVull(A). PROOF
Clearly Col(A*A) C Col(A*) and Null(A) e Afull(A*A). But
dim(Col(A*A)) = r(A*A) = r(A) = r(A*) = dim(Col(A*)). Also, dim(Null(A)) = n - r(A) = n - r(A*A) = dim(JVull(A*A)) soArull(A*A) = Null(A). 0 COROLLARY 3.8
Let A E C'nxn; then Col(AA*) = Col(A) and JVull(AA*) = JVull(A*).
PROOF
Replace A by A* above.
Next, we get another classical result of Sylvester. COROLLARY 3.9 (Sylvester's law of nullity [1884])
For square matrices A and B in C"',
max(nlty(A), nlty(B)) < nlty(AB) < nlty(A) + nlty(B).
0
Subspaces Associated to Matrices
1 14
PROOF
FirstArull(B) c J%Iull(AB)sodint (Arull(B)) < dint (Null(AB))
and so nlty(B) < nlty(AB). Also, nlty(A) = it - r(A) < it - r(AB) nlty(AB).Therefore, max(nlty(A),nlty(B))
3.4, r(A)+r(B)-n < r(AB)= n-nlty(AB)son-nlty(A)+n-nlty(B)n < n - nlty(AB). Canceling the its gives the "minus" of the inequality we want. Thus, nlty(AB) < nlty(A) + nlty(B). 0 Another classical result goes back to F. G. Frobenius, whom we have previously mentioned. COROLLARY 3.10 (the Frobenius inequality 11911D
Assume the product ABC exists. Then r(AB) +r(BC) < r(B) +r(ABC).
PROOF Now Col(BC)nA(ull(A) C Col(B)tlNull(A)so dim(Col(BC)fl Null(A)) < dim(Col(B) fl A(ull(A)). But ditn(Col(BC) fl NUII(A)) = r(BC) - r(ABC) by Sylvester's formula. Also, dim(Col(B) fl JVull(A)) = r(B) - r(AB) so r(BC) - r(ABC) < r(B) - r(AB). Therefore, r(BC) + r(AB) < r(B) + r(ABC). 0 There is one more theorem we wish to present. For this we need some notation.
The idea of augmented matrix is familiar. For A, B in C"', define [A:B] as the nt-by-2n matrix formed by adjoining B to A on the right. A
to be the 2m-by-n matrix formed by putting B
Similarly, define B under A.
THEOREM 3.8 A
Let A and B be in 0'
)fl
Then r (A + B) = r (A) + r(B) - dint (Col( B
AIull([1",:1])) - dim(Col(A*) flCol(B*)). In particular r(A + B) < r(A) + r(B). A
PROOF We note the A + B = [I,,,:1,,,] B
115
3.2 A Deeper Look at Rank A
A
...
so r(A + B) = r(
) - dim(Col(
A
r(
...
)
= r(
) (1
But
B
B
A
...
)
= r([A*:B*]) = dim(Col([A*:B*]) _
B
B
dim(Col(A*) + Col(B*)) = dim(Col(A*)) + dim(Col(B*) - dim(Col(A*) fl
Col(B*)) = r(A*) + r(B*) - dim(Col(A*) fl Col(B*)) = r(A) + r(B) dim(Col(A*) fl Co!(B*)) using the familiar dimension formula.
0
Exercise Set 9
1. Can you discover any other consequences of Sylvester's wonderfu formula?
2. Consider the set of linear equations Ax = b. Then the set of equations A*Ax = A*b are called the normal equations. (a) Argue that the normal equations are always consistent.
(b) If Ax = b is consistent, prove that Ax = b and the associated normal equations A*Ax = A*b have the same set of solutions. the unique solution to both systems is (A*A)-I
(c) IfNull(A) A*b.
3. In this exercise, we develop another approach to rank. Suppose A E
emXn r
(a) Suppose A is m-by-p and B is m-by-q. Argue that if the rows of A are linearly independent, then the rows of [AB] are also linearly independent. (b) If the rows of [A:B] are dependent, argue that the rows of A are necessarily dependent.
(c) Suppose rank(A) = r. Argue that all submatrices of order r + I are singular.
(d) Suppose rank(A) = r. Argue that at least one nonsingular r-by-r submatrix of A exists. (e) If the order of the largest nonsingular submatrix of A is r-by-r, then argue A has rank r.
Subspaces Associated to Matrices
1 16
4. (Meyer, 20W] There are times when it is very handy to have a basis for Null(A) fl Col(B). Argue that the following steps will produce one. (a) Find a basis forCol(B), say {XI, x2, ... , xr}. (b) Construct the matrix X in C""r; X = [x, I X2 I . . . I x.] (c) Find a basis forAfull(AX), say { v i , v2, ... , v,.}.
(d) Argue that B = (Xv1, Xv2,... Xv,} is a basis for Null(A) fl Col(B). (Hint: Argue that Col(X) =Col(B) and Mull(X) then use Sylvester's formula.)
All 5. Prove the rank(
0
0
® I
the rank of A =
... ...
A12 A22
0 n
0 0
...
0
1
0 0
0
0
0
0
0
0
I
AIk A2k
Erank(Aii). Compute
r-i AAA
and com p are it to the ranks of
1
the 2-by-2 diagonal blocks. What does this tell you about the previous inequality? B
A
6. Suppose A is n-by-n and invertible and D is square in M = IC
D
Argue that rank(M) = n iff the Schur complement of A in M is zero. 7. If A is m-by-n and B is n-by-m and rn > n, argue that det(AB) = 0.
8. Argue that the linear system Ax = b is consistent iff rank[A rank(A).
I
b]
9. Suppose that rank(CA) = rank(A). Argue that Ax = b and CAx = Cb have the same solution set.
10. Suppose A is m-by-n. Argue that Ax = 0 implies x =
iff m > n
and A has full rank.
11. Prove that Ax = b has a unique solution iff m > n, the equation is consistent, and A has full rank.
12. Argue that the rank of a symmetric matrix (or skew-symmetric matrix) is equal to the order of its largest nonzero principal minor. In particular, deduce that the rank of a skew-symmetric matrix cannot be odd.
13. Let T be a linear map from C" to C0", and let M be a subspace of C". Argue thatdim(T(M)) = dim(M)- dim(Mfl Ker(T)) so, in particular,
dim(T(M)) < dim(M).
3.3 Direct Sums and Idempotents
117
14. Suppose Ti and T2 are linear maps from C" to C"'. Argue that
(a) Ker(Tj) fl Ker(T2) e Ker(T1 + T2). (b) Im(T1 + T2) c Im(T1)+Im(T2). (c) Irank(Ti) - rank(T2)I < rank(Ti + T2) < rank(Ti) + rank(T2). 15. Argue that Ax = b has no solution if rank(A) # rank([A I b]). Otherwise, the general solution has n - rank(A) free variables.
16. For A,Bin C""", argue that r(AB-I)
r([
CD
]) < r(A) + r(B) + r(C) + r(D).
19. If A and rB E C""",, argue that
rank([ A
B
J) > rank(A) + rank(B).
Further Reading [Meyer, 2000] Carl Meyer, Matrix Analysis and Applied Linear Algebra, SIAM, Philadelphia, (2000).
[Zhang, 1999] Fuzhen Zhang, Matrix Theory: Basic Results and Techniques, Springer, New York, (1999).
complementary subspaces, direct sum decomposition, idempotent matrix, projector, parallel projection
3.3
Direct Sums and Idempotents
There is an intimate connection between direct sum decompositions of C" and certain kinds of matrices. This correspondence plays an important role in our discussion of generalized inverses. First, let's recall what it means to have a
Subspaces Associated to Matrices
1 18
direct sum decomposition. Let M and N be subspaces of C". We say M and N are disjoint when their intersection is as small as it can be, namely, m fl N -
We can always form a new subspace M + N = (x I x = m + n where m c M and n E N}. Indeed, this is the smallest subspace of C" containing both M and N. When M and N are disjoint and M + N = C", we say M and N are complementary subspaces and that they give a direct sum decomposition
of C. The notation is C" = M (D N for a direct sum decomposition. What is nice about a direct sum decomposition is that each vector in C" has a unique representation as a vector from M plus a vector from N. For example, consider the subspace M, = {(z, 0) 1 z E C) and M2 = {(0, w) I W E C). Clearly, any vector v = (z, w) in C'- can be written as v = (z, 0)+(0, w), so C2 = M, +M2. Moreover, if V E M, fl M2, the second coordinate of v is 0 since v E M1, and the
first coordinate of v is 0 since v E M2, so v = (0, 0) = 6. Thus C2 = M, ®M2. This example is almost too easy. Let's try to be more imaginative. This time,
let M, = {(x, y, z)
I
x + 2y + 3z = 0} and M2 = {(r, s, s)
I
r, s E C).
These are indeed subspaces of C3 and (-5, 1, 1) EM, fl M2. Now any vector v = (x, y, z) E C3 can be written 3
(x, y, z) = x,- Sx+ sy1
3
Sx1
Sz,-
2Sy+ 2Sz
+ (0, s(x +2y+ 3z), 5(x +2y +3z) so C3 = M, + M2, but the sum is not direct.
Can we extend this idea of direct sum to more than two summands? What would we want to mean by C" = M, ® M2 ® M3? First, we would surely want
any vector v E C" to be expressible as v = v, + v2 + v3, where v, E M1, V2 E M2, and V3 E M3. Then we would want this representation to be unique. What would it take to make it unique? Suppose v = V1 +v2+v3 = wi +W2+W3,
where v w; E M, for i = 1, 2, 3. Then v, - w, = (w2 - v2) + (w3 - v3) E M2 + M3. Thus v, - w, EM, fl (M2 + M3). To get v, - w, we need M2 fl
to get v2 = w2 and M3 fl (M, + M2) = (6) to get v3 = W3. Now that we have the idea, let's go for the most general case. Suppose MI, M2, ... , Mk are subspaces of C". Then the sum of these subk
spaces, written M, + M2 +
+ Mk = r_M1, is defined to he the collection of i=i all vectors of the form v = v, +v2+ +vk, where v; E M; f o r i = 1, 2, ... , k. THEOREM 3.9 k
Suppose M,, M2, ... , Mk are subspaces of C". Then EM; is a subspace of C". Moreover, it is the smallest subspace of C" containing all the M;, i = 1, 2, k
k
k. htdeed, > M1 = span( U M,). i=1
i_1
3.3 Direct Sums and Idempotents
119
The proof is left as an exercise.
PROOF
a
Now for the idea of a direct sum. THEOREM 3.10 Suppose Mi, M2, ... , Mk are subspaces of C". Then TA.E.: k
1. Every vector v in E Mi can be written uniquely v = vt + v2 +
+ vk
i=1
where vi E Mi for i = 1, 2, ... , k. k
2. If
i=t
vi =
i = 1, 2, ...
with v; E M, for i = 1, 2, ... , k, then each vi _
for
k.
3. For every i = 1, 2, ... , k, Mi fl (>2 M1) i#i PROOF
The proof is left as an exercise. k
We write ® Mi for i=t
0
k
i=t
Mi, calling the sum a direct sum when any one (and
hence all) of the conditions of the above theorem are satisfied. Note that condition (3) above is often expressed by saying that the subspaces MI, M2, ... , Mk are independent. COROLLARY 3.11 k
Suppose M1, M2, ... , Mk are subspaces of C". Then C" = ® Mi iff C" _ i=
k
Mi and for each i = 1, 2, ... , k, Mi fl (EM1) i=1
1#i
Now suppose E is an n-by-n matrix in C" (or a linear map on C") with the property that E2 = E. Such a matrix is called idempotent. Clearly, !" and O are idempotent, but there are always many more. We claim that each idempotent matrix induces a direct sum decomposition of C". First we note that, if E is idempotent, so is ! - E. Next, we recall that Fix(E) = (v E C" I v = Ev}. For an idempotent E, we have Fix(E) = Col(E). Also Null(E)
= Fix(! - E) and Null(!- E) = Fix(E). Now the big claim here is that C" is the direct sum of JVull(E) and Col(E). Indeed, it is trivial that any x in
(C" can be written x = Ex + x - Ex = Ex + (! - E)x. Evidently, Ex is in Col(E) and E((! - E)x) = Ex -EEx = -6, so (I - E)x is in AIull(E).
Subspaces Associated to Matrices
120
We claim these two subspaces are disjoint, for ifz E Col(E) fl Afull(E), then
z = Ex for some x and Ez = 6. But then, -6 = Ez = EEx = Ex = z. This establishes the direct sum decomposition. Moreover, if B = {b, , b2, ... , bk } is a basis for Col(E) = Fix(E) and C = { c 1 , c2, ... , Cn-k } is a basis of Mull (E), then S UC is a basis for C", so the matrix S = [b, ... I bk I C1 I cn-k 1 is invertible. Thus, ES = [Eb, . . I Ebk I Ec, I Ecn-k] = [bl I bk I
I
I
I
I
I
V 1...1-61=[BI®], andsoE=[BI®]S-',where B=[b, I...Ibk
Moreover, if F is another idempotent and Null(E) = Null(F) and Col(E) _ Col(F), then FS = [B 101, so FS = ES, whence F = E. Thus, there is only one idempotent that can give this particular direct sum decomposition. Now the question is, suppose someone gives you a direct sum decompo-
sition C" = M ® N. Is there a (necessarily unique) idempotent out there
with Col(E) = M and Null(E) = N? Yes, there is! As above, select a
... ,
basis {b,, b2, ... , bk} of M and a basis {c,, c2,
of N, and form
the matrix S = [B C]. Note {b,,b2, ... bk, c,, c2, ... , is a basis of C", B E C" and C E C""("-k). Then define E = [B I ®nx(n-k)]S-1 = I
r
[B I Cl I
lk
L ®(n-k)xk
S-1 = SJS-'.ThenE2 = SJS-1SJS-1
®"x("-k)
®(n-k)x(n-k)
J
= SJJS-' = SJS-) = E, so E is idempotent. Also we claim Ev = v if v E M, and Ev = 6 iffy E N. First note m E M iff m = Bx for some x in1Ck.
Then Em = [B I C]J[B I C]-'Bx = [B I C]J[B I C]-)[B I Cl I r
[B I C]J I
]=[BIC][ -x 1
_6 X
= J
= Bx = m. Thus M C Fix(E). Next if
n E N, then n = Cy for some y in C"-k. Then En = [B I C]J[B I C1`[B I
] = [B I C]J I
C]
=
= [B I C]
0 . Since E is idem-
J
L
potent, C" = Fix(E) e Null(E) = Col(E) ® Null(E). Now M CFix(E) and k = dim(M) = rank(E) = dim(Fix(E)) so M = Fix(E). A dimension argument shows N = Null(E) and we are done. There is some good geometry here. The idempotent E is called the projector of C" onto M along N. What is going on is parallel projection. To see 1
this, let's look at an example. Let E =
i
1
T
z
J. Then E = E2, so E
z
must induce a direct sum decomposition of C2. We check that
v=I
X Y J
is fixed by E iff xrr = 1y and Ev = 0 iff y = -x. Thus, C2 = M N, where M = ( y = -x }. The x = y ) and N = ( I
I
y J LX
unique representation of an arbitrary vector is
y J LX
f L
xy
__ f 2x + L
y
zx + Zy 21
+
3.3 Direct Sums and Idempotents
[
121
We now draw a picture in R22 to show the geometry lurking in
'XX -+ Z
the background here.
Parallel projection.
Figure 3.2:
Of course, these ideas extend. Suppose C" = M, ® M2 ® M3. Select bases 13i = [bi, , b12, ... , bik, ] of Mi for i = 1, 2, 3. Then 13, U 132 U 133 is a babi,t, b2ib22 .. b2k2 sis for C". Form the invertible matrix S -
I
I
..
b3lb32.. b3k,].FormEl = [bii I biz
I bik,
®1S- = I
S
f /® ®1
®®® E2 = [® I
b21
E3 = [ ® I
b31
I
b22
I
b2U2
I
I ®] S- l = S r
I
b32
I
I
b3k,] = S I 1
E3=SI IL
0
/k'
0®JS-,+S ® J
0
J
®
01
S', and
0
®J S-1. Then E, + E2 + '
0
S-I+SL ®
®00 'k2
®
l®
Ik2
S,
L
® Ik'
JS-t _ J
Subspaces Associated to Matrices
122
SI S 1 = I,, . Also, E, E2 = S [
®®
OOO
11
S 'S
0
0
1k.
O
J
S
SOS-, = 0 = E2E1. Similarly, E,E3 = 0 = E3E, and E3E2 = 0 = E2E3.
Next, we see E, = E, t = E, (E, + E2 + E3) = E, E, +0+0 = E2
.
Thus, E, is idempotent. A similar argument shows E2 and E3 are also idempotent. The language here is that idempotents E and F are called orthogonal iff EF = 0 = FE. So the bottom line is that this direct sum decomposition has led to a set of pairwise orthogonal idempotents whose sum is the identity. Moreover, note that Col(E1) = Col([b11 b12 b,k, (D]S-1) _ Col([b b12 ... blk, I O]) = b17, , blk, }) = M, Similarly, one shows Col(E2) = M2 and Col(E3) = M3. We finish by summarizing and extending this discussion with a theorem. I
I
I
I
I
.
I
THEOREM 3.11 Suppose M1, M2, ... , Mk are subspaces of C" with C" = M,®M2®...®Mk. Then there exists a set o f idempotents (El, E2, ... , Ek ) such that 1.
E;Ej =0ifi $ j.
2.
3. Col(E,)=M,fori=1,2,...,k. Conversely, given idempotents { El, E2, above, then C" = M, ® M2 s ... ED Mk
PROOF
...
, Ek } satisfying (1), (2), and (3)
.
In view of the discussion above, the reader should be able to fill in
the details.
0
To understand the action of a matrix A E O"", we often associate a direct sum decomposition of C" consisting of subspaces of C", which are invariant under A. A subspace M of C" is invariant under A (i.e., A-invariant) iff A(M) c M. In other words, the vectors in M stay in M even after they are multiplied by A. What makes this useful is that if we take a basis of M, say m1,m2, ... , mk, and extend it to a basis of C", 5 = (m1, m2, .... mk, ... , b"), we can 9---, form the invertible matrix S. whose columns are these basis vectors. Then the matrix S'1 AS has a particularly nice form; it has a block of zeros in it. Let us illustrate. Suppose M = spun{ml, m2, m3), where m,, m2, and m3 are independent in C5. Now Am, is a vector in M since M is A-invariant. Thus, Am, is uniquely expressible as a linear combination of ml, m2, and m3. Say
Am, =a, , m, + a21 m2 + a31 m3.
3.3 Direct Sums and Idempotents
123
Similarly,
Am2 = a12m1 + a22m2 + a32m3
Am3 =
a13m1 + a23m2 + a33m3
Ab4
=
a14m1 + a24m2 + a34m3 + 044b4 + 1354b5
Ab5
=
a15m1 + a25m2 + a35m3 + 1345b4 + 1355b5
Since Ab4 and Ab5 could be anywhere in C5, we may need the entire basis to express them. Now form AS = [Am, I Am2 I Am3 I Ab4 I Ab51 =
[allm1 +a12m2+a13m3 I a21 m1 +a22m2+a23m3 I a31m1 +a32m2+a33m3
all
a12
a13
a14
a15
a21
a22
a23
a24
a25
a3I
a32
a33
a34
a35
0
0
0
044
R45
0
0
0
054
055
Al
A2 A3
IAb4IAb5]=[ml Im2Im3Ib4Ib5]
S
all
a12
a13
a14
a15
a21
a22
a23
a24
a25
a3I
a32
a33
a34
a35
o
0
0
1344
045
o
0
0
054
055
_
S
®
J
It gets even better if C is the direct sum of two subspaces, both of which are A-invariant. Say 0 = M ® N, where M and N are A-invariant. Take a basis of M, say 5 = {bl, ... , bk } and a basis of N, say C = {ck+l, ... , c}. Then 13 U C is a basis of C' and rS = [b1 bk ek+l ... cn] is an I
invertible matrix with S-l AS = I
Al
®
I
J.
I
I
I
Let's illustrate this important
idea in C5. Suppose C5 = Ml ® M2, where A(Ml) c Mi and A(M2) c M2. In order to prepare for the proof of the next theorem, let's establish some notation. Let Bl = {bill, bz ), b31} be a basis for MI and B2 = (b12j, b2 )} be a basis for M2. Form the matrices BI = [b(1I) I b2 I b3 )] E CSX3 and B2 = [bill 1621] E CSX2. Then AM; = ACol(B1) a Col(B;) for i = 1, 2. Now A acts on a basis vector from Z31 and produces another vector in MI.
Subspaces Associated to Matrices
124 Therefore,
Ab(i' = a bc" + Abu' + y,b(I' = [b(I1) I b" I bc''] 1
(3
1
2
2
3
a,
= B,
3
Y' J a')
Ab2' = 02bi" + [3262" + 1'zb3 = lbiI' I b2' I
b3
1
= B,
[32
Y2 a3
Ab3' = a3b') + 13b2' + Y3b3' - [b" 1 b2' I b3)]
= B,
R3
Y3
Form the matrix A, =
a,
a2
a3
R'
02
03
YI
Y2
Y3
We compute that
B,
a3
a2 02
a,
AB,=[Abi"IAb2'IAb2']=
I B, [031] = B, A,.
I B,
13'
Yz
Y '
Y3
A similar analysis applies to A acting on the basis vectors in B2; Ab12) = X b(2' + '
i
'
Ab2'`, = X2biz' +
Form the matrix Az =
µ
,
b(2) z
=
=
[b(2)
I
b(2']
[biz' I b2
µ2b2
1\'
I\ 2
µ'
µ2
XI
= B2
z
'
µ'
)]
X2
= B2
µ2
Again compute that
X2
X
ABz=A[bi2jIb2z']_[Abiz'IAb2']= 1B2
L
µ''
=B2A2.
J
I B2L µz
JJ
Finally, form the invertible matrix S = [B, I B2] and compute
AS = A[B, I B21 =l [AB, I ABz] A
®
0
[BI A, I B2A21 = [B, I Bz]
=S[ ® ® J
The more invariant direct summands you can find, the more blocks of zeros you can generate. We summarize with a theorem.
3.3 Direct Sums and Idempotents
125
THEOREM 3.12 Suppose A E C"xn and C" = Mi®M2® ®Mk, where each M; isA -invariant
for i = 1, 2, ... , k. Then there exists an invertible matrix S E Cnxn such that S-'AS = Block Diagonal [A,, A2, ... , Ak] =
AI
0 0
0
A2
0
00
...
0
0
...
At
PROOF In view of the discussion above, the details of the proof are safely left to the reader. We simple sketch it out. Suppose that M; are invariant subspaces
for A for i = 1, 2, ... , k. Form the matrices B; E Cnx";, whose columns come from a basis of M; . Then M; = Col (B; ). Note Abj = B; y(' ), where y, E C'1' . Form the matrices A; = [y;1 Y(i) E C' xd, Compute that AB; = B; A,, noting n = di + d2 + dk. Form the invertible matrix S = [Bi B2 ... Bk]. Then S-BAS = S-'[AB1 I AB2 I ABk] = I
I
I
S-'[B1A1
I Ai
0
0
A2
I
I
I
I
B2A2
I
0
0
..
01 I
:
0 0
0
I
...
1
BkAk] = S-'[B1 r Al 0 0 ... I
0
A2
L0
0
1
0
B2
01 I
I
Bk]
.
El
Exercise Set 10 1. If M ®N = C", argue that dim(M) + dim(N) = n. 2. If M ®N = C", B is a basis for M, and C is a basis for N, then prove Ci U C is a basis for C".
3. Argue the uniqueness claim when C" is the direct sum of M and N.
4. Prove that M + N is the smallest subspace of Cl that contains M and N.
5. Argue that if E is idempotent, so is I - E. Moreover, Col(I Null(E), and Nnll(1 - E) = Col(E).
- E) _
6. Prove that if E is idempotent, the Fix(E) = Co!(E). 7. Verify that the constructed E in the text has the properties claimed for it.
126
Subspaces Associated to Matrices
8. Suppose E is idempotent and invertible. What can you say about E'?
= E2 iffCol(E) f11Col(/ - E) _ (Vl).
9. Prove that 10. Consider
E0
10 J,
L
f
1
L
0
,
and
J
J.
L
S
Show that these
5
matrices are all idempotent. What direct sum decompositions do they induce on C2?
11. Show that if E is idempotent, then so is Q = E + AE - EAE and Q = E - A E + EA E for any matrix A of appropriate size.
12. Let E
_ J
z
=i
l
]. Argue that E is idempotent and
symmetric. What is the rank of E? What direct sum decomposition does it induce on a typical vector'? Let E
Discuss the same
issues for E. Do you see how to generalize to E n-by-n? Do it.
13. Recall from calculus that a function f : R -+ R is called even iff f (-x) = f (x). The graph of such an f is symmetric with respect to the y-axis. Recall f (x) = x2, f (x) = cos(x), and so on are even functions. Also, f : ]l8 -+ lib is called odd iff f (-x) = -f (x). The graph of an odd function is symmetric with respect to the origin. Recall f (x) = x', f (x) = sin(x), and so on are odd functions Let V = F(R) be the vector space of all functions on R. Argue that the even (resp., odd) functions form a subspace of V. Argue that V is the direct sum of the subspace of even and the subspace of odd functions. (Hint: If f : R -± R, fe(x)
'[f(x) + f (-x)] is even and f °(x) = Z [ f (x) - f(-x)] is odd.) 14. Let V = C""", the n-by-n matrices considered as a vector space. Recall
a matrix A is symmetric iff A = AT and skew-symmetric ill AT = -A. Argue that the symmetric (resp., skew-symmetric) matrices form a subspace of V and V is the direct sum of these two subspaces. 15. Suppose M1, M2, ... , Mk are nontrivial subspaces of C" that are independent. Suppose B is a basis of Mi for each i = 1 , 2, ... , k. Argue that k
k
U 5i is a basis of ®Mi. i_i
i=1
16. Suppose MI, M2, ... , Mk are subspaces of C" that are independent. Ar+ gue that dim(Mi ® M2 ... ED Mk) = dim(Mi) + dim(M2) + dim(Mk).
3.3 Direct Sums and Idempotents 17.
127
Suppose E2 = E and F = S-1 ES for some invertible matrix S. Argue that F2 = F.
18. Suppose M1, M2, ... , Mk are subspaces of C". Argue that these sub-
spaces are independent iff for each j, 2 < j < k, Mi n (M, +
+
Mi_1) = (76). Is independence equivalent to the condition "pairwise
disjoint," that is, Mi n Mi = (') whenever i # j? 19. Suppose E is idempotent. Is I + E invertible? How about I + aE? Are there conditions on a? Suppose E and F are both idempotent. Find invertible matrices S and T such that S(E + F)T = aE + 13F. Are there conditions on a and R? 20. Suppose A is a matrix and P is the projector on M along N. Argue that
PA = A iffCol(A) C M and AP = A iffMull(A) ? N. 21. Suppose P is an idempotent and A is a matrix. Argue that A(Col(P)) c
Col(P) iff PAP = AP. Also prove that Nul!(P) is A-invariant if PAP = PA. Now argue that A(Col(P)) C Col(P) and A(JVull(P)) c
NUII(P) if PA = AP. 22. Prove the dimension formula: for M,, M2 subspaces of C", dim(M, + M2) = dim(MI) + di m(M2) - di m(M, n M2). How would this formula read if you had three subspaces? 23. Suppose M, and M2 are subspaces of C" and d i m(M, +M2) = di m(M, n
M2) + 1. Argue that either M, C M2 or M2 c M,.
24. Suppose M1, M2, and M3 are subspaces of C". Argue that dim(M, n
M2 nM3)+2n > dim(M,)+dim(M2)+dim(M3). 25. Suppose T:C" -+ C" is a linear transformation. Argue that dim(T(M)) +dim(Ker(T) n M) = dim(M), where M is any subspace of C". 26. Prove Theorem 3.10 and Theorem 3.11. 27. Fill the details of the proofs of Theorem 3.13 and Theorem 3.14. 28. (Yongge Tian) Suppose P and Q are idempotents. Argue that (1) rank(P+
Q) > rank(P - Q) and (2) rank(PQ + QP) > rank(PQ - QP). 29. Argue that every subspace of C" has a complementary subspace.
30. Prove that if M; n E Mi = ( ), then M; n Mi = (') for i # j, where i#i
the Mi are subspaces of C".
Subspaces Associated to Matrices
128
31. Suppose p is a polynomial in C[x]. Prove thatCol(p(A)) andNull(p(A)) are both A-invariant for any square matrix A.
32. Argue that P is an idempotent iff P' is an idempotent.
33. Suppose P is an idempotent. Prove that (I + P)" = I +(2" - 1) P. What can you say about (I - P)? 34. Determine all n-by-n diagonal matrices that are idempotent. How many are there?
Further Reading [A&M, 2005] J. Arat%jo and J. D. Mitchell, An Elementary Proof that Every Singular Matrix is a Product of Idempotent Matrices, The American
Mathematical Monthly, Vol. 112, No. 7, August-September, (2005), 641-645. [Tian, 2005] Yongge Tian, Rank Equalities Related to Generalized Inverses of Matrices and Their Applications, ArXiv:math.RA/0003224.
index, core-nilpotent factorization, Weyr sequence, Segre sequence,
conjugate partition, Ferrer's diagram, standard nilpotent matrix
3.4
The Index of a Square Matrix
Suppose we have a matrix A in C""". The rank-plus-nullity theorem tells us that
dim(Col(A))+ditn(Null (A)) = n. It would be reasonable to suspect that
C" = Null(A) ® Col(A).
3.4 The Index of a Square Matrix
129
Unfortunately, this is not always the case. Of course, if A is invertible, then this 00
is trivially true (why?). However, if A =
20
J,
then
E Null(A) n
2 J
Col(A). We can say something important if we look at the powers of A. First,
it is clear that Mull(A) C Ifull(A2), for if Ax = 0, surely A2x = A(Ax) = AO = 0. More generally, Arull(A)") C NUll(A)'+'). Also it is clear that Col(A2) c Col(A) since, if y E Col(A2), then y = A2x = A(Ax) E Col(A). These observations set up two chains of subspaces in C" for a given matrix A:
(0) = Null(A°) c Null(A) C Null(A2) C ... C Mull(AP) C Null(AP+') C ... and
C" = Col(A°) D Col(A) D Col(A2) D
.
. 2 Col(AP) 2 Col(A)'+1) 2 ,
.
Since we are in finite dimensions, neither of these chains can continue to strictly ascend or descend forever. That is, equality must eventually occur in these chains of subspaces. Actually, a bit more is true: once equality occurs, it persists.
THEOREM 3.13
Let A E C. Then there exists an integer q with 0 < q < n such that C" = Mull(A") ®Col(Ag). PROOF
To ease the burden of writing, let Nk = Null(A"). Then we have
(0) = NO C N' C ... C cC". At some point by the dimension argument above, equality must occur. Otherwise dim(N°) < dim(N') < ... so it < dim(N"+'), which is contradictory. Let q be the least positive integer where Ng+1. We claim there is nothing equality first happens. That is, Ng-' C Ng = but equal signs in the chain from this point forward. In other words, we claim Ng = Ng+i for all j = 1, 2, .... We prove this by induction. Evidently the
claim is true for j = 1. So, suppose the claim is true for j = k - 1. We argue that the claim is still true for j = k. That is, we show Ng+(k-1) = Nq+k
Now Aq+kx = 0 implies Aq+1(Ak-'x) = 0. Thus, Ak-1x E Nq+1 = Nq so Aq+(k-1)x = 0, putting x in Nq+(k-1). Thus, Nq+k C AgAk-'x = 0, which says Nq+(k-1) e Nq+k. Therefore, Ng+(k-1) = Nq+k and our induction proof is complete. By the rank-plus-nullity theorem, n = dim(Ak) + dim(Col(Ak)) so
130
Subspaces Associated to Matrices
dim(Col(A")) = dim(Col(A"+1 )) = .... That is, the null space chain stops growing at exactly the same place the column space chain stops shrinking.
Our next claim is that Ny n Col(A") = (0). Let V E Ny n Col(A"). Then v = Ayw for some w. Then AZyw = Ay Ayw = Ayv = 0. Thus = .. = N". Thus, 0 = Ayw = v. This W E NZy = Ny+q = gives us the direct sum decomposition of C" since dim(Ny + Col(A")) = dim(N")+dim(Col(Ay))-dim(NynCol(A")) =dim(N")+dim(Col(A")) _ Ny+(q-1)
it, applying the rank plus nullity theorem to Ay.
Finally, it is evident that q < n since, if q > n + I, we would have a properly increasing chain of subspaces N' C N2 C ... C Nnt1 C ... C N", which gives at least it + I dimensions, an impossibility in C". Thus, C" = 0 Null(A") ® Col(A").
DEFINITION 3.5
(index) For an n-by-n matrix A, the integer q in the above theorem is called the index of A, which we will denote index(A) orlnd(A).
We now have many useful ways to describe the index of a matrix.
COROLLARY 3.12
Let A E C"". The index q of A is the least nonnegative integer k such that
1. Null(A4) = Null(Ak+l ) 2. Col(AL) = Col(Ak+I )
3. rank(AA) = rank(Ak+I)
4. Co!(Ak) nNull(Ak) = (0) 5. C" = Null(AA) ®Col(Ak). Note that even if you do not know the index of a matrix A, you can always take
C" = Null(A") ® Col(A"). Recall that a nilpotent matrix N is one where Nk = 0 for some positive integer k.
COROLLARY 3.13 The index q of a nilpotent matrix N is the least positive integer k with Nk = ®.
3.4 The Index of a Square Matrix
131
Suppose p is the least positive integer with N" = 0. Then NN-'
PROOF ® so
Null(N) C .AIull(N) C
C Null(Nt"-') C Null(NP) = A/ull(NP+i)
=C". Thus, p = q.
0
Actually, the direct sum decomposition of C" determined by the index of a matrix is quite nice since the subspaces are invariant under A; recall W is invariant under A if V E W implies Av E W. COROLLARY 3.14
Let A E C"x" have index q. (Actually, q could be any nonnegative integer.) Then
1. A(JVull(A4)) C A(ull(Ay) 2. A(Col(Ay)) c Col(Ay). The index of a matrix leads to a particularly nice factorization of the matrix. It is called the core-nilpotent factorization in Meyer, [2000]. You might call the next theorem a "poor man's Jordan canonical form" You will see why later.
THEOREM 3.14 (core-nilpotent factorization) Suppose A is in C0'", with index q. Suppose that rank(Ay) = r. Then there
C®` ® S-' where C is
exists an invertible matrix S such that A = S
J
L
invertible and N is nilpotent of index q. Moreover, RA(x) = X' t µC(x).
PROOF Recall from the definition of index that C'1 = Col(A') ®JVull(A"). Thus, we can select a basis of C" consisting of r independent vectors from Col(Ay) and n - r independent vectors from JVull(Ay). Say x1, x2, ... , x, is a basis f o r Col(Ay) and Y1, Yz, ... , Yn-r is a basis of JVull(Ay). Form a
matrix S by using these vectors as columns, S = [xi X2 Xr Yi Y,,-.] = [S1 S2]. Evidently, S is invertible. Now recall that I
I
I
I
I
I
I
Col(Ay) and AIull(Ay) are invariant subspaces for A, and therefore, S-I AS =
f C ®1.
0N
Raising both sides to the q power yields with appropriate par-
Subspaces Associated to Matrices
132
L
AgS j
I
Ti
AgSl = J
T,
Ti 1
®
=
J
:
S-1
T2
I
T
Ay 1S,
J
[AS ®l =
..
Z
0
q
titioning (S-'A S)q = S-' AqS =
L
Ti AgSI TZAgSi
J
0
1
®J
.
Com-
paring the lower right corners, we see Nq = 0, so we have established that N paring is nilpotent.
Next we note that Cq is r-by-r and rank(Aq) = r = rank(S-' AqS) rank
® ®j
= rank[C']. Thus Cy has full rank and hence C is in-
vertible. Finally, we need to determine the index of nilpotency of N. But if index(N) < q, then Ny-' would already be 0. This would imply rank(Aq-1) r
i
= rank(S-' Aq-' S) = rank(I C
0
r
N0 I) = rank(L C ,
0
rank (Cq-1) = r = rank(Ay), contradicting the definition of q. The rest is left to the reader.
0
Now we wish to take a deeper look at this concept of index. We are about to deal with the most challenging ideas yet presented. What we prove is the crucial part of the Jordan canonical form theorem, which we will meet later. We determine the structure of nilpotent matrices. Suppose A in C" <" has index q. Then
(0)=Null(A°)
Null(A)
Null(AZ)
Null(A')
We associate a sequence of numbers with these null spaces.
DEFINITION 3.6 (Weyr Sequence) Suppose A in C""" has index q. The Weyr sequence of A, Weyr(A) (w , w ... , wy), where w, = nlty(A), w, = nlty(AZ) - nlty(A), .... Generally, w; = nlty(A') - nlty(A' ' ) for i = l , 2, ... , q. Note w4 = 0 for
k2q.
In view of the rank-plus-nullity theorem, we can also express w; as
w; = rank(A' ') - rank(A').
+ w; = dim(Null(A')) = n - dim(Col(A')), Note that wl + w2 + +Wq and rank(A) > w2 +w3+ +wy. We may visualize
n > w, +w2+
what is going on here pictorially in terms of chains of subspaces.
3.4 The Index of a Square Matrix
133
N nilpotent, Ind(N) = q
(0)
L
' C Null(Nq)=Null(Nq+))=.... _C
C NUII(N) C Null(N) C WI
I
1
fJt
I
fJ1
I
+(Jt+fJi
The Weyr sequence.
Figure 3.3:
We may also view the situation graphically. T
nt
W) +(JZ + 4)3+ a3 4)) + Wz
nlty(N) = a) w)
(0)
Figure 3.4:
Null(N)
Null(N2) Null(N3)
Nul1(Nq) = Cn
Another view of the Weyr sequence.
Let's look at some examples. We will deal with nilpotent matrices since these are the most important for our applications. First, let's define the "standard" kby-k nilpotent matrix by
Nilp[k] =
0 0
1
0 1
0 0
...
0
0
0
0
0
...
0
0
0
0
0 0
0
1
E Ckxk
Subspaces Associated to Matrices
134
Thus we have zeros everywhere except the superdiagonal, which is all ones.
It is straightforward to see that rank(Nilp[k]) = k - 1, index(Nilp[k]) = k, and the minimal polynomial p.Niit,lkl(x) = xk. Note Nilp[I ] _ [0]. 0
Let N = Nilp[4] _
0
];N 2
0
0
0 0
0
0
0
0
1
0 0
0 0
0 0
0 0
0
0
0
0
1
0 0
0
Clearly rank(N) = 3, nullity(N)
0
0
0
0
0
1
0
0
0
0 ,
0
rank(N2) = 2, nullity(N2) = 2; N3 =
0
rank(N3) = 1, nullity(N3) = 3; N4 = 0, rank(N;)
0, nulliry(N4) = 4. Thus, we see Weyr(N) = (1, 1, 1, I) and index(N) = 4. Did you notice how the ones migrate up the superdiagonals as you raise N to higher powers? In general, we see that
Weyr(Nilp[k]) = (1, 1, 1, 1,...
,
1).
k times
The next question is, can we come up with a matrix, again nilpotent, where the
Weyr sequence is not all ones'? Watch this. Let N =
0
1
0
0
0
1
0
0
0
1
0
0
0
0
0
0
00
J 0 where the blank spaces are understood to he all zeros. Now N is rank 4, nullity 1
2. Now look at N2. N2 =
0 0
0 0
1
0
0
1
0
0
0
0
We see rank(N2) = 2, 0 0
nlty(N2) = 4. Next N3 = L
0 0
0
0
0 0
0
0
0
0
0
0 0
1
Clearly, rank(N3) = I , 0
0
0
0
nullity(N3) = 5 and finally N4 = 0. Thus, Weyr(N) = (2, 2, 1, 1).
3.4 The Index of a Square Matrix
135
Now the question is, can we exhibit a matrix with prescribed Weyr sequence? The secret was revealed above in putting standard nilpotent matrices in a block diagonal matrix. Note that we used Nil p[4] and Nilp[2J to create the Weyr sequence of (2, 2, 1, 1). We call the sequence (4,2) the Segre sequence of N above and write Segre(N) = (4, 2) under these conditions. It may not be obvious to you that there is a connection between these two sequences, but there is!
DEFINITION 3.7
(Segre sequence)
Let N = Block Diagonal[Nilp[p,I, Nilp[p2], ... , Nilp[pk]]. We write Segre(N) = (p,, p2, ... , pk). Let's agree to arrange the blocks so that p, k
p2 >
> Pk. Note N E C"' where t = >2 pi. i_1
First, we need a lesson in counting. Suppose n is a positive integer. How many ways can we write n as a sum of positive integers? A partition of n is an m-tuple of nonincreasing positive integers whose sum is n. So, for example, (4, 2, 2, 1, 1)
is a partition of 10 = 4 + 2 + 2 + I + 1. There are many interesting counting problems associated with the partitions of an integer but, at the moment, we shall concern ourselves with counting how many there are for a given n. It turns out, there is no easy formula for this, but there is a recursive way to get as many as you could want. Let p(n) count the number of partitions of n. Then
P(1) = 1 p(2) = 2
p(3)=3 p(4)=5 p(5)=7
I
2, 1 + 1
3,2+1,1+I+1 4,3+1,2+2,2+1+1,1+I+1+1 5,4+1,3+2,3+1+ 1,2+2+1,2+1+1+ 1,1+1 +1+1+1
p(6) = 11 p(7) = 15 p(8) = 22
There is a formula you can find in a book on combinatorics or discrete mathematics that says
p(n)=p(n-1)+p(n-2)-p(n-5)-p(n-7)+.... For example, p(8) = p(7)+ p(6)- p(3)- p(1) = 15+11-3-1 = 22. Now, in the theory of partitions, there is the notion of a conjugate partition of a given partition and a Ferrer's diagram to help find it. Suppose a = (m,, in,, ... , ink) is a partition of n. Define the conjugate partition of a to be a* = (ri , r2*, . r,'), where ri * counts the number of mjs larger than or equal to i for i = 1, 2, 3, ... .
So, for example, (5, 3, 3, 1)* = (4, 3, 3, 1, 1). Do not panic, there is an easy
136
Subspaces Associated to Matrices
way to construct conjugate partitions from a Ferrer's diagram. Given a partition
a = (m, , m_... ...
, mk) of n, its Ferrer's diagram is made up of dots (some
people use squares), where each summand is represented by a horizontal row of dots as follows. Consider the partition of 12, (5, 3, 3, 1). Its Ferrer's diagram is
Figure 3.5:
Ferrer's diagram.
To read off the conjugate partition, just read down instead of across. Do you see where (4, 3, 3, 1, 1) came from? Do you see that the number of dots in row i of a* is the number of rows of a with i or more dots? Note that a** = of, so the mapping on partitions of n given by a H a* is one-to-one and onto. THEOREM 3.15 Suppose N = Block Diagonal[ Nilp[p1 ], NilpIP2]. , Nilp[pk]]. Then Weyr(N) is the conjugate partition o f Segre(N) = (P,, P2, ... , Pk)
PROOF To make the notation a little easier, let N = Block Diagonal [N,, N2, ... , Nk], where Ni = Nilp[p,], which is size p,-by-p, for i = 1,2, 3, ... , k. Now N2 =Block Diagonal[ N1 , N21 ... , Nk ], and generally Ni = Block Diagonal [N, j, N 2 , ... , NA ] for j = 1, 2, 3, .... Next note that,
for each i, rank(N,) = pi - 1, rank(N?) = pi - 2, ... , rank(N/) = pi - j as long as j < pi; however, rank(N;) = 0 if j > pi. Thus, we conclude
rank(N; -I) - rank(N!)
I if i pi l0ifj>pi
.
The one acts like a counter sig-
nifying the presence of a block of at least size p,. Next, we have rank(Ni) _
>rank(N, ),sowi =rank(Ni')-rank(Ni)=1:rank(N;
-,)->rank(Ni
)
_ >(rank(N/-') - rank(N; )). First, w, = rank(N°) - rank(N) = nrank(N) = nullity(N) =the number of blocks in N since each N; has nullity I (note the single row of all zeros in each N;). Next, w, = rank(N)-rank(N2) _ k
>(rank(N,)-rank(N'-)) = (rank(N,)-rank(N? ))+(rank(N2)-rank (N2))
3.4 The Index of a Square Matrix
137
+- + (rank(Nk) - rank(Nk )). Each of these summands is either 0 or I. A I occurs when the block Ni is large enough so its square does not exceed the size of Ni. Thus, w2 counts the number of Ni of size at least 2-by-2. Next, A
w; = rank(N2) - rank(N3) = >(rank(N2) - rank(N, )) = (rank(N2) i=1
rank(N3 )) + (rank(N?) - rank(NZ )) + + (rank(NN) - rank(NN )). Again, a I occurs in a summand that has not yet reached its index of nilpotency, so a I records the fact that Ni is of size at least 3-by-3. Thus w3 counts the number of blocks Ni of size at least 3-by-3. Generally then, wj counts the number of blocks Ni of size at least j-by-j. And so, the theorem follows. 0 COROLLARY 3.1 S With N as above, the number of size j-by-j blocks in N is exactly wj - wj +,
rank(Ni-1) - 2rank(Ni) + rank(Ni+') = = [nlty(NJ) - nlty(N)-t )] - [nlty(NJ+') - nity(Ni)].
PROOF
With the notation of the theorem above, (rank (N; -')-(rank(N/ ))-
(rank(NJ) - rank(N; +i )) = { I if I = Pi . Thus, a 1 occurs exactly when 0 otherwise a block of size pi-by-pi occurs. Thus the sum of these terms count the exact number of blocks of size j-by- j. 0 Let's look at an example to be sure the ideas above are crystal clear. Consider 0 0 0 1
N =
0 0
0 0
0
0
1
0
0 0
0
1
0
1
0
0
.
0
1
0
0
Clearly, w, = 9 - rank(N) =
L 0J 9 - 5 = 4, so the theorem predicts 4 blocks in N
1. That is, it predicts (quite accurately) exactly 4 blocks in N. Next w2 = rank(N) -rank(N2) = (rank(N,) - rank(N? )) + (rank(N2) - rank(N2 )) (rank(N3) - rank(N2))+ (rank(N4) - rank(NN )) = I + 1 + 1 +0 = 3 blocks of size at least 2-by-2, 03 = rank(N2) - rank(N3) = I + 0 + 0 + 0 = 1 block of size at least 3-by-3, w4 = rank(N3) - rank(N4) = I + 0 + 0 + 0 = I block of size at least 4-by-4, and w5 = rank(N4) - rank(N5) = 0,
138
Subspaces Associated to Matrices
which says there are no blocks of size 5 or more. Moreover, our formulas
w, - w, = 4 - 3 = I block of size I-by-I w, - w = 3 - 1 = 2 blocks of size 2-b y -2 w; - w4 = I - I = U blocks of size 3-by-3 w4 - ws = J - 0 = I block of size 4-by-4 Next let's illustrate our theorem. Suppose we wish to exhibit a nilpotent matrix N with Weyr(N) _ (4, 3, 2, 1, l). We form the Ferrer's diagram
Figure 3.6:
Ferrer's diagram for (4, 3, 2, 1, 1).
and read off the conjugate partition (5, 3, 2, 1). Next we form the matrix N = Block Diagonal[Nilp[5], Nilp[3], Nilp[2], Nilp[l]] _ 0 0 0 0 0
1
0
0
1
0 0 0
1
0 0 0
0 0
0
0 0
0 0 0
1
0 0 0
1
0 0
0 1
0 0 0
1
0 0
Then Weyr(N) = (4, 3, 2, 1, I). The block diagonal matrices we have been manipulating above look rather special but in a sense they are not. If we choose the right basis, we can make any nilpotent matrix look like these block diagonal nilpotent matrices. This is the essence of a rather deep theorem about the structure of nilpotent matrices. Let's look at an example to motivate what we need to achieve. Suppose we have a nilpotent matrix N and we want N to "look like" 0 0
0
1
0
0
0
0 0 0 0
0
0
0 0 0
0 0 0
1
0
0 0
0 0 0
1
0 0 0
0 0 0
0 0 0
0 0
0
0
01
0
0
0 0 0 0
0 0 0
0 0
0 0
0
0
0 0 0 0
1
0
1
0
That is, we seek an invertible matrix S
3.4 The Index of a Square Matrix
139
such that S-1 NS will be this 8-by-8 matrix. This is the same as seeking a special
basis of C8. If this is to work, we must have NS = S 0 0 0 0 0 0 0
1
0 0 0 0 0 0 0
0 [
0
0
1
0
0 0
0
0 0 0 0
0 0 0 0
0 0 0 0
1
0 0
0 0
0 0 0 0 1
0 0 0
0 0 0 0 0 0 0 0
0 0
0 0 0 0 1
0
ISell=
ISet ISe2ISe31-6 ISesI
IS71, ISi IS2IS3I IS5I where si is the ith column of the matrix S. Equating columns we see
Ns,=-6 Nss=-6 Ns7=-6 Ns6 = s5
Nsg = S1
Ns8 = s7
Ns3 = s2
Nsa = $3 In particular, we see that s1, s5, and s7 must form a basis for the null space of N. Note N is rank 5 so has nullity 3. Also note
Ns,=-6
Ns5 =-6
N2s2 = ' N2s6 =
Ns7
N zsg=
N3s3 = N a sa= and so
Null(N) = Null(N2) = Null(N3) = 1%full(N4) =
sp(s1, s5, s7} sp(s1, s5, s7, s2, s6, SO sp{SI, s5, s7, S2, $6, S8, s3} sp{sl, S5, S7, S2, S6, S8, S3, S4} = C8
Finally, note that S4
S6
S8
Nsa = s3
Ns6 = s5
Ns8 = S7
N2sa = S2 N3Sa = Si
The situation is a bit complicated but some clues are beginning to emerge. There seem to be certain "Krylov strings" of vectors that are playing an important role. We begin with a helper lemma.
Subspaces Associated to Matrices
140
LEMMA 3.1
For any square matrix A and positive integer k, Null(Ak-') e Nu1l(At) C NUll(A"+' ) . Let 13k_i = { b i , b2, , ... , br} be a basis forArull(Ak-). Extend L34-1 to a basis o f Null(A"); say 5k = { b , , b2, , ... , br, c , , c), ... , c, } . Finally, extend B k to a basis o f Null (A"+' ); say Lik+, = {bi , b2, , ... , b,, c, ,
c2,...,c,, di,d2,...,d,}.Then T=(b,,b2...... br,Ad,,Ad,,...,Ad,)
is an independent subset of Null(Ak) though not necessarily a basis thereof.
PROOF First we argue that T _e 11ull(Ak). Evidently all the bis are in JVull(Ak-') so Akbi = AAk-'bi = A ' = -6, putting the bis inlull(A"). Next, the dis are in Null(Ak+') so = Ak+'di = Ak(Adi), which puts the Adis in Null(Al) as well. Next, we argue the independence in the usual way. Suppose (3,b, '.Then A(a,d, +
+a,d,) = - ((31 b, + + (3,b,.) E Null(Ak-' ). Thus, Ak-' A(a, d,+ + a,d,) _ ' putting aid, + + aid, E Null(A"). Therefore, a,d, + +
a,d, =y,b, y, b, -
But then aid,
- y, br - S, c, -
- Bsc, = 6. By independence, all the as, ys, and bs must be zero. Thus 01b, + + (3rbr = so all the (3s must be zero as well by independence. We have proved T is an independent subset of JVull(Ak). 0
Now let's go back to our example and see if we could construct a basis of that special form we need for S.
You could start from the bottom of the chain and work your way up, or you could start at the top and work your way down. We choose to build from the bottom (i.e., from the null space of N). First take a basis B, for Null(N); say B, = {b,, b2, b3}. Then extend to a basis of Null(N2); say Lie = {b,, b2, b3, b4, b5, b6}. Extend again to a basis of Null(N3); say 133 = {b,, b2, b3, b4, b5, b6, b7). Finally, extend to a basis 84 of Null(N4) = C8, say E4 = {b,, b2, b3, b4, b5, b6, b7, b8}. The chances this basis has the special properties we seek are pretty slim. We must make it happen. Here is where things get a hit tricky. As we said, we will start from the full basis and work our way down. Let T4 = {b8}; form L32 U (Nb8) = (b,, b2, b3, b4, b5, b6, Nb8). This is an independent subset of Null(N3) by our lemma. In this case, it must be a basis of A(u!l(N3). If it was not, we would have had to extend it to a basis at this point. Next, let T; = (Nb8); form B, U {N2b8} = {b,, b2, b3, N2b8} c Null(N2). Again this is an independent set, but here we must extend to get
a basis of Null(N2); say {b,, b2, b3, N2b8, z,, z2}. Next, let T = {N3b8} U{Nz,, Nz2} _ (N3b8, Nz,, Nz2} c_ Null(N). This is abasis forMull(N). So here is the modified basis we have constructed from our original one: {N3b8, Nz,, Nz2, N2b8, z,, z2, Nb8, b8). Next, we stack it
3.4 The Index of a Square Matrix
141
carefully:
in A(ull(N4)
b8
Nb8 N2b8 N3b8
in Null(N3) z,
z2
in .Afull(N2)
Nz,
Nz2
in Null(N)
Now simply label from the bottom up starting at the left. b8 = S4 Nb8 = S3 N2b8 = S2
Z, = S6
Z2 = S8
N3b8 = s,
Nz, = s5
Nz2 = s7
The matrix S = [N3b8 I NZb8 I Nb8 I b8 I Nz, zi I Nz I z21 has the desired pro erty; namely, NS = [76 I N3b8 I N2b8 I Nb8 I I Nz, I Nz2] = 17 1 s, I S2 I s3 16 I s5 16 I s7], which is exactly what we wanted. Now let's see if we can make the argument above general. I
THEOREM 3.16 Suppose N is an n-by-n nilpotent matrix of index q. Then there exists an invertible
matrix S with S`NS =Block Diagonal[Nilp[pi], Nilp[p2], ... , Nilp[pk]]. The largest block is q-by-q and we may assume p, > P2 > > Pk . Of course, + Ilk = n. The total number of blocks is the nullity of N, and the p, + P2 +
number of j-by-j blocks is rank(NJ-1) - 2rank(Ni) + rank(NJ+, ). PROOF
Suppose N is nilpotent of index q so we have the proper chain of
subspaces
(0) (; Null(N) C, Null(N2) C ... C Null(Nq-') C, Null(N?) = C". As above, we construct a basis of C" starting at the bottom of the chain. Then we adjust it working from the top down. Choose a basis Ci, of Null(N);
say Iii = {b,, ... ,bk,}. Extend to a basis of Null(N2); say B2 = {b,,
... bk, , bk,+l , ... , bk2 }. Continue this process until we successively produce a basis of C", 1 3 q = {b, , ... bk, , bk,+,, ... , b2..... bky -2+, .... , bky , bky_,+, , ... , bkq J. As we did in our motivating example, we begin at the top and work our way down the chain. Let T. = {bky_,+,, ... , bky}; form 13q_2 U (Nbky_,+,, ... , Nbky}. This is an independent subset of Null(Nq-1) by our helper lemma. Extend this independent subset to a basis of Null(Nq-1); say 8q_2 U {Nbky ,+, , ... , Nbky) U {c , ... , c,,, }. Of course, we may not need ,
142
Subspaces Associated to Matrices
any cs at all as our example showed. Next take 7t-I = (Nbk,, ,+I . . , Nbkq ) U {c11, ... , clt, } and form 8,,-3 U {N2bk ,+I , , N2bk,, } U {Nc11, .... Ncl,, ). This is an independent subset ofNu!!(Ny-22). Extend this independent subset to basis + 1 ,_... . , N2bky}U{Ncll, ... , Ncl,, I U(c21, ... , c2t2 J. Continue this process until we produce a basis for Null(N). Now stack the basis vectors we have thusly constructed.
N (bj,_,+l) N2 (bkq +I)
... ...
Ny-I(bk,-,+I)
...
bk,
Nbk N2 (bky)
Ny-1 (bk,,)
...
C21
...
Ny-3 (c,1)
... ...
c1,
Nc11
Ny-2(CII)
...
Ny-2 (elti)
C11
Ncl,
C2r2
Cy-I I ... Ctt-Iry
Ny-3 (e212)
Note that every column except the last is obtained by repeated application of N to the top vector in each stack. Also note that each row of vectors belong to the same null space. As in our example, label this array of vectors starting at the bottom of the first row and continue up each column. Then S =
[Nt (bk, i+I)
I
Ny
,_(bk,,
...
I Cy-II
i+I) I I
...
I bk,t_,+I I Ny
... I Cy-lt ,}
(bky)
I
...
This matrix is invertible and brings N into the block diagonal form as advertised in the theorem. The remaining details are left to the reader. 0
This theorem is a challenge so we present another way of looking at the proof due to Manfred Dugas (I I February, 1952). We illustrate with a nilpotent matrix of index 3. Let N E C""" with N3 = ®i4 N22 . Now (V0 )
Null(N) (; Nul!(N2) (; Null(N3) = C. Since Null(N2) is a proper subspace of'C", it has a complementary subspace M3. Thus, C" = Null(N2)®M3.
Clearly, N(M3) c Null(N2) and N2(M3) c Null(N). Moreover, we claim
that N(M3) fl Mull(N) = (t), for if v E N(M3) fl Null(N), then v = Nm3 for some m3 in M3 and 6 = Nv = N2m3. But this puts m3 in so v = Nm3 = -6. Now Null(N2) fl M3 = (v). Therefore, m3 = Null(N) ® N(M3) e Null(N2) and could actually equal Nul!(N2). In any case, we can find a supplement M2 so thatNull(N2) = J%/u11(N)®N(M3)M2.
Note N(M2) S; Arul!(N). Even more, we claim N(M2) fl N2(M3) _ (v),
for if v E N(M2) fl N2(M3), then v = Nm2 = N2m3 for some m2 in
3.4 The Index of a Square Matrix
143
putting m2 - Nm3 in M2 and M3 in M3. But then, N(m2 - Nm3) = (-d'). Thus, M2 = Nm3 E M2 fl N(M3) = (e ),
J\/ull(N) fl (N(M3) ( E )
making v = Nm2 ='. Now N(M2) ® N2(M3) c_ AIull(N), so there is a supplementary subspace M, with Null(N) = N(M2) ® N2(M3) ® M1. Thus, collecting all the pieces, we have C" = M3 ®N M3 ®M2 ®N M2 ® N2 M3 ®M 1. We can reorganize this sum as
C" =[M3eNM3ED N2M3]®[M2®NM2]®MI = L3ED L2®LI, where the three blocks are N-invariant. To create the similarity transformation S, we begin with a basis B3 = (bi3j, b23), ... , b(3)} of M3. We claim (3) (3) IN bI , N 2b2 2
, ... N
2
d;
= Eaj
(3)
bd3 } is an independent set. As usual, set
j=1 d
N2b(. ), which implies
ajb(" E Jfull(N2) fl M3 =
This means
j=1 d3
all the ajs are 0, hence the independence. Now if 0 =
then
j=I
d,
d
0 so, as above, all the ajs are 0. Thus we have
j=1
L3 = M3 ® NM3 ® N2M3 = Nb(3), N2b(3) N2b(3)
L2 = M2 ® NM2 =
span{b(,3), b2 ), ... , b3), NbI3 Nb2), ...
,
... , N2b(3)} spun{b'2), b(2) , ...
,
bd2), N2b12j,
Nb2),
... ,
Nbd2)}
L3 = M1span{b(,g), b2 (l), ... , b(l)),
again noting these spans are all N-invariant. Next we begin to construct S. Let S3) _ [N2bj3j Nbj3) bi3)] E Ci 3 for j = I, ... , d3. We compute that 10 I 0 NSj3 = IV I = 0 0 1 [N2bj3) Nb(j3) 0 0 0 = S(3)Nilp[3]. Similarly, set Sj2j =1 [Nb;2) b(2)] and find 0 [Nb(z) S(.2'Nilp[2]. Finally, set S(') = [b(il)]. b(2)] Nb(2)] _ J= L0 0 [b(-1)][0]. We are ready to take S = [S,(3) Compute that NS(1) I
I
I
I
I
NS(.2)
I
I
0 ISd)IS2)I...IS2)IS')I...IS(d')].Then
[I
Subspaces Associated to Matrices
144
Nilp[31
Nilp[3] Nilp[21
NS=S Nilp[2] 0
0J
L
where there are d3 copies of Nilp[3], d2 copies of Nilp[2], and d, copies of Nilp[ 1 ]. So we have another view of the proof of this rather deep result about nilpotent matrices.
This theorem can be considered as giving us a "canonical form" for any nilpotent matrix under the equivalence relation of "similarity" Now it makes sense to assign a Segre sequence and a Weyr sequence to any nilpotent matrix. We shall return to this characterization of nilpotent matrices when we discuss the Jordan canonical form.
Exercise Set 11 1. Compute the index of
1
3
1
0
0
1
0
1
1
1
2
1
1
0
s-t 2. Argue that
N=
-t
-s - t
4 0
-2s
s+t
0
t
2s
-s + t
is nilpotent. Can you say
anything about its index?
3. Fill in the details of the proof of Corollary 3.12. 4. Fill in the details of the proof of Corollary 3.13.
5. What is the index of 0? What is the index of invertible matrix?
What is the index of an
6. Suppose p = P2 but P # I or 0. What is the index of P?
3.4 The Index of a Square Matrix
7. Let M = R
4 0
0 0 0
0 0
® ®J
8. Let L =
0 0 0 0 0
2
1
3
,
0
0 1
0 0 0
0
1
0
0
145
R`. What is the index of M?
where M is from above. What is the index of L?
L
9. Find the conjugate partition of (7, 6, 5, 3, 2). 10. Construct a matrix N with Weyr(N) = (5, 3, 2, 2, 1).
11. Argue that the number of partitions of n into k summands equals the number of partitions of n into summands the largest of which is k. 12. Argue that if N is nilpotent of index q, then so is S- 1 N S for any invertible square matrix S.
13. Argue that if B = S- 'AS, then Weyr(B) = Weyr(A).
14. Consider all possible chains of subspaces between (') and C4. How many types are there and what is the sequence of dimensions of each type? Produce a concrete 4-by-4 matrix whose powers generate each type of chain via their null spaces.
15. Consider the differentiation operator Don the vector space V = C[x]`" of polynomials of degree less or equal to n. Argue that D is nilpotent. What is the index of nilpotency? What is the matrix of D relative to the standard basis 11, x, x2, ... , x")? 16. An upper(lower) triangular matrix is called strictly upper(lower) triangular iff the diagonal elements consist entirely of zeros. Prove that strictly upper(lower) triangular matrices are all nilpotent. Let M be the matrix A
m12
0
A
0
0
... MD, ... mz
...
Argue that M - hl is nilpotent.
A
17. Can you have a nonsingular nilpotent matrix? Suppose N is n-by-n nilpotent of index q. Prove that I + N is invertible. Exhibit the inverse. 18. Compute p(9), p(10), p(l 1), and p(12).
19. Is the product of nilpotent matrices always nilpotent?
Subspaces Associated to Matrices
146
20. Suppose N is nilpotent of index q and cx is a nonzero scalar. Argue that aN
is nilpotent of index q. Indeed, if p(x) is any polynomial with constant term zero, argue that p(N) is nilpotent. Why is it important that p(x) have a zero constant term?
21. Suppose Ni and N2 are nilpotent matrices of the same size. Is the sum nilpotent? What can you say about the index of the sum'?
22. Suppose N is nilpotent of index q. Let V E C" with Ny-Iv # 0. Then q < n and the vectors v, Nv, N2v, ... , Ny-1 v are independent.
23. Suppose N is nilpotent of index q. Prove that ColM-1) e Null(N). 24. (M. A. Khan, CMJ, May 2003). Exponential functions such as f (x) = 2'
have the functional property that f (x + y) = f (x) + f (y) for all x and
y. Are there functions M:C[x]" " -* C[x]" that satisfy the same functional equation (i.e., M(x + y) = M(x) + M(y))? If there is such an M, argue that I = M(0) = M(x)M(-x), so M(x) must be invert-
ible with M(-x) = (M(x))-1. Also argue that (M(x))' = M(rx), so that M(X)' = M(x). Thus, the rth root of M(x) is easily found by r replacing x by X. Suppose N is a nilpotent matrix. Then argue that r
M(x) = I + Nx +
N2x2
+
NY
+
is a matrix with polynomial entries that satisfies the functional equation M(x + y) = M(x) + M(y). 3!
Verity Khan's example. Let N =
7
-10
4
-4
4
-13
7
-4 5
18
16
Argue
-7
that N is nilpotent and find M(x) explicitly and verify the functional equation.
25. Suppose N is nilpotent of index q. What is the minimal polynomial of N? What is the characteristic polynomial of Nilp[k]'? 26. Find a matrix of index 3 and rank 4. Can you generalize this to any rank and any index?
27. Suppose A is a matrix of index q. What can you say about the minimal polynomial of A? (Hint: Look at the core-nilpotent factorization of A.) 28. Draw a graphical representation of the Weyr sequence of a matrix using column spaces instead of null spaces, as we did in the text in Figure 3.4.
29. What is the trace of any nilpotent matrix?
3.4 The Index of a Square Matrix 30. Show that every 2-by-2 nilpotent matrix looks like
147 a[32
L -a L
R2
a[3
31. Suppose N = Nilp[n] and A is n-by-n. Describe NA, AN, NT A, ANT , NT AN, NANT, and N' AN`.
32. Suppose A is a matrix with invariant subspaces (') and C. Prove that either A is nilpotent or A is invertible.
Further Reading [Andr, 1976] George E. Andrews, The Theory of Partitions, AddisonWesley, Reading, MA, (1976). Reprinted by Cambridge University Press, Cambridge, (1998). [B&R, 1986(4)] T. S. Blyth and E. F. Robertson, Linear Algebra, Vol. 4, Chapman & Hall, New York, (1986).
[G&N, 2004] Kenneth Glass and Chi-Keung Ng, A Simple Proof of the Hook Length Formula, The American Mathematical Monthly, Vol. 111, No. 8, October, (2004), 700-704.
[Hohn, 19641 E. Hohn, Elementary Matrix Theory, 2nd Edition, The Macmillan Company, New York, (1958, 1964). [Shapiro, 19991 Helene Shapiro, The Weyr Characteristic, The American Mathematical Monthly, Vol. 106, No. 10, December, (1999), 919-929.
3.4.1 3.4.1.1
MATLAB Moment The Standard Nilpotent Matrix
We can easily create a function in MATLAB to construct the standard nilpotent matrix nilp[n]. Create the following M-file: I
23-
function N = nilp(n) if n == O.N = [].else, N = diag(ones(n - 1), 1); end
This is an easy use of the logical format "if ... then ... else". Note, if n = 0, the empty matrix is returned. Try out your new function with a few examples.
Subspaces Associated to Matrices
148
There is a function to test if a matrix is empty. It is
isempty(A). How could you disguise the standard nilpotent matrix to still he nilpotent but not standard? (Hint: If N is nilpotent, so is SNS-t for any invertible S.)
3.5
Left and Right Inverses
As we said at the very beginning, the central problem of linear algebra is the
problem of solving a system of linear equations. If Ax = b and A is square and invertible, we have a complete answer: x = A-'b is the unique solution. However, if A is not square or does not have full rank, inverses make no sense. This is a motivation for the need for "generalized inverses." Now we face up to the fact that it is very unlikely that in real-life problems our systems of linear equations will have square coefficient matrices. So consider Ax = b, where A
is m-by-n, x is n-by-1, and b is m-by-I. If we could find a matrix C n-by-m with CA = I", then we would have a solution to our system, namely x = Cb. This leads us to consider one-sided inverses of a rectangular matrix, which is a first step in understanding generalized inverses.
DEFINITION 3.8 (left, right inverses) Suppose A is an m-by-n matrix. We say B in Cnx"' is a left inverse for A if] BA = I". Similarly we call C in C' xm a right inverse for A if AC = I",.
The first thing we notice is a loss of uniqueness. For example, let A = 1
0
0
1
0
0
. Then any matrix B =
0
0
x
is a left inverse for any choice
y J
of x and y. Next we consider existence. Having a one-sided inverse makes a matrix rather special. THEOREM 3.17 Suppose A is in C"' x" . Then
I. A has a right inverse iff A has full row rank m. 2. A has a left inverse iff A has full column rank n.
PROOF (I) Suppose A has a right inverse C. Then AC = /",. But partitioning Ac",] = I", _ [et e2 e,,,] so C into columns, AC = [Act IAc2I I
I
I
I
3.5 Left and Right Inverses
149
each Ac; is the standard basis vector ei. Thus, (Acl, Ace, , Ac) is a basis of C0". In particular, the column space of A must equal C". Hence the column rank of A is in. Therefore the row rank of A is m, as was to be proved. Conversely, suppose A has full row rank m. Then its column rank is also in. so among the columns of A, there must be a basis of the column space, which is C". Call these columns dl, d2, , d",. Now the standard basis vectors e: , e2, , e,,, belong to C"' so are uniquely expressible in terms of the ds: say
el =aiidi +a12d2+...+almdn, e2 = a21d1 + a22d2 + ... + a2mdm
+... +ammdn,
en, = a,,,1dI +am2d2
Now we will describe how to construct a right inverse C with the help of these
as. Put a11, a21,
, a,,,I in the row corresponding to the column of dl in
A. Put a12, a22, , amt in the row corresponding to the column d2 in A. Keep going in this manner and then fill in all the other rows of C with zeros. Then AC = [e1 1e21 ... le",] = I,,,. Let's illustrate in a concrete example what
just happened. Suppose A = L
b
d
e
h
has rank 2. Then there must 1
be two columns that form a basis of the column space C2, say d1 = [ d
].menel=L
d2=1
01=a11
d ]+a12[
[
0
].musAC=[:
[d I +a221
«21
h 1
0l
0
1
e
f
01=
]ande2= h h
0
all
a21
0
0
a12
a22
j '2
(2) A similar argument can be used as above or we can be more clever. A has full column rank n iff A' has full row rank n iff A' has a right inverse iff A has a left inverse. 0
Now we have necessary and sufficient conditions that show how special you have to be to have a left inverse. The nonuniqueness is not totally out of control in view of the next theorem.
THEOREM 3.18 If A in Cm- has a left inverse B, then all the left inverses of A can be written as B + K, where K A = ®. A similar statement applies to right inverses.
Subspaces Associated to Matrices
150
Suppose BA = I and B, A = 1. Set K, = B, - B. Then K, A =
PROOF
(B, -B)A=B,A-BA=I - I = 0. Moreover, B, =B+K,.
0
Okay, that was a pretty trivial argument, so the theorem may not be that helpful. But we can do better.
THEOREM 3.19
Let A be in C"'. 1. Suppose A has full column rank n. Then A* A is invertible and (A* A )- ' A* is a left inverse of A. Thus all left inverses of A are of the form (A* A)- I A*+
K, where KA = 0. Indeed, we can write K = W[1,,, - A(A*A)-' A*], where W is arbitrary of appropriate size. Hence all left inverses of A look like
(A*A)-'A* + W[I,,,
- A(A*A)-IA*]
2. Suppose A has full row rank m. Then AA* is invertible and A*(AA*)-i
is a right inverse of A. Thus all right inverses of A are of the form A*(AA*)-' + K, where AK = 0. Indeed K = [I - A*(AA*)-'A]V, where V is arbitrary of appropriate size. Hence all right inverses of A look like
A*(AA*)-' + [1 - A*(AA*)-' A]V. PROOF Suppose A has full column rank n. Then A*A is n-by-n and r(A*A) = r(A) = n. Thus A*A has full rank and is thus invertible. Then [(A*A)-'A*]A = (A*A)-'(A*A) = I,,. The remaining details are left as exercises. A similar proof applies.
0
As an example, suppose we wish to find all the left inverses of the matrix A = 1
2
2
1
3
(
].WeflndA*A =
[ 17
6, and (A*A)-' = is
1
_6 L
-7 14
Now we can construct one left inverse of A, namely (A*A)-l A* _ ,
8 21
5
11
] The reader may verity by direct computation that this 0 -7 matrix is indeed a left inverse for A. To get all left inverses we use T,
C = (A*A) ' A* + W[I,11 - A (A*A)-i A*].
3.5 Left and Right Inverses
a d
Let W _
C
] he a parameter matrix. Now A (A*A)-' A* =
fc
e
-3
34
5
5
10
15
-3
15
26
I
-8
5
11
35
21
0
-7
' I
b
. Then
I
15 +a
]+[
--L+ t
35 - St
+S
-5s
2? 35
a d
g
b e
5
fC
]
-5b+3c 35 -5a+25b- 15c
s =d -5e+3f.
5
3
39
35 I5S
35
3255
T
IS
39
35
35
35
=
'S +3a - 15b+9c
-5d+25e-15f -5+3d-15e+9f]
35+d-5e+3f
[
151
35
+ 3t +3s
]
, where t = a - 5b + 3c and
The reader may again verify we have produced a left inverse of A. So where do we stand now in terms of solving systems of linear equations? We have the following theorem. THEOREM 3.20
Suppose we have Ax = b, where A is m-by-n of rank n, x is n-by-I, and b, necessarily m-by-I. This system has a solution if and only if A(A*A)-' A*b = b (consistency condition). If the system has a solution, it is (uniquely) x = (A*A)-' A*b.
PROOF Note n = rank(A) = rank(A*A), which guarantees the existence of (A*A)-', which is n-by-n and hence of full rank. First suppose the condition A (A*A)-' A*b = b. Then evidently x0 = (A*A)-' A*b is a solution for Ax =
b. Conversely, suppose Ax = b has a solution x,. Then A*Ax, = A*b so (A*A)-'(A*A)x, = (A*A)-'A*b sox, = (A*A)-'A*b whence b = Ax, _ A(A*A)-' A*b.
Now suppose Ax = b has a solution. A has a left inverse C so x = Cb is a solution. But C can be written as C = (A*A)-' A* + W[I,,, - A (A*A)-' A*]. So, Cb = (A*A)-' A*b+W[I,,, - A (A*A)-' A*]b = (A*A)-' A*b+O usin the consistency condition. Thus, x = (A*A)-' A*b. For example, consider the system
x+2y=I 2x+y = 1
3x+y=1
.
Subspaces Associated to Matrices
152
1
x J
A(A*A)-'A*
I
J
We check consistency with
1
l
I.
1
= s 35
1
34
5
-3
I
5
10
15
1
-3
15
26
1
j
1
i4
1
SO we
I
conclude this system is inconsistent (i.e., has no solution). However, for
x+2y=3 2x+y=3, 3x+y=4 35
34
5
-3
3
5
10
15
3
-3
15
26
4
3
=
I
3
I
3 3 4
5
the unique solution i5
1
L
so the system is consistent and has
4
-7 J
21
0
=
J.
Exercise Set 12 1.
(a) Suppose A has a left inverse and AB = AC. Prove that B = C. (b) Suppose A has a right inverse and suppose BA = CA. Prove that B = C.
2. Prove that rank(SAT) = rank(A) if S has full column rank and T has full row rank.
3. Argue that KA = 0 iffthere exists W such that K = W[1-A(A*A)-lA*]. What can you say about the situation when AK = 0?
4. Argue that A has a left inverse iffNull(A) = (0). 5. Suppose A = LK where L has a left inverse and K has a right inverse. Argue that r(A) = r(L) = r(K). 6. Construct all left inverses of A =
1
0
0
1
1
I
7. Argue that A has a left inverse if AT has a right inverse. 8. Argue that a square singular matrix has neither a left nor a right inverse.
3.5 Left and Right Inverses
153
9. Let A be an m-by-n matrix of rank m. Let B = A*(AA*)-I. Show B is a right inverse for A and A = (B*B)-I B*. Moreover, B is the only right inverse for A such that B* has the same row space as A. 10. If A is m-by-n and the columns of A span C"', then A has a right inverse and conversely. 1 I. Find all right inverses of A
1
0
0
2
1
0
1
1
0
1
1
0
1
2
1
12. Give an example of a matrix that has neither a left inverse nor a right inverse.
13. Suppose A is rn-by-n, B is n-by-r, and C = AB. Argue that if A and B both have linearly independent columns, then C has linearly independent columns. Next argue that if the columns of B are linearly dependent, then the columns of C must be linearly dependent.
14. Let T : C" - C"' be a linear map. Argue that the following statements are all equivalent: (a) T is left invertible.
(b) Ker(T) = (6). (c) T :C1 - Im(T) is one-to-one and onto.
(d) n < m and rank(A) = n. (e) The matrix of T, Mat(T;C3,C) in Cmxn has n < m and has full rank.
15. Let T : C" - Cbe a linear map. Argue that the following statements are all equivalent:
(a) T is right invertible.
(b) rank(T) = m. C"' is one-to-one and onto where M ® Ker(T) = C". (c) T : M (d) n > m and nlt y(T) = n - m. (e) The matrix of T, Mat(T;t3,C) in Cm"" has n > m and has full rank.
16. Suppose A is m-by-n and A = FG. Suppose F is invertible. Argue that A has a right inverse iff G has a right inverse.
17. Suppose A has a left inverse C and the linear system Ax = b has a solution. Argue that this solution is unique and must equal Cb.
18. Suppose A has a right inverse B. Argue that the linear system Ax = b has at least one solution.
154
Subspaces Associated to Matrices
19. Suppose A is an m-by-n matrix of' rank r. Discuss the existence and uniqueness of left and right inverses of A in the following cases: r =
m
20. Argue that A has a left inverse iff A* has a right inverse.
Further Reading [Noble, 1969] Ben Noble, Applied Linear Algebra, Prentice Hall, Inc., Englewood Cliffs, NJ, (1969). [Perlis, 1952] Sam Perlis, Theory of Matrices, Dover Publications Inc., New York, (1952).
Chapter 4 The Moore-Penrose Inverse
RREF, leading coefficient, pivot column, matrix equivalence, modified RREF, rank normal form, row equivalence, column equivalence
4.1
Row Reduced Echelon Form and Matrix Equivalence
Though we have avoided it so far, one of the most useful reductions of a matrix
is to bring it into row reduced echelon form (RREF). This is the fundamental result used in elementary linear algebra to accomplish so many tasks. Even so, it often goes unproved. We have seen that a matrix A can be reduced to many matrices in row echelon form. To get uniqueness, we need to add some requirements. First, a little language. Given a matrix A, the leading coefficient of a row of A is the first nonzero entry in that row (if there is one). Evidently, every row not consisting entirely of zeros has a unique leading coefficient. A column of A that contains the leading coefficient of at least one row is called a pivot column.
DEFINITION 4.1 (row reduced echelon form) A matrix A in C111-1 is said to be in row reduced echelon form iff
1. For some integer r > 0, the first r rows are nontrivial (not totally filled with zeros) and all the remaining rows (if there are any) are totally filled with zeros.
2. Row 1, row 2,. up to and including row r has its first nonzero entry a I (called a leading one).
3. Suppose the leading ones occur in columns c1, c2,
C2 < ... < cr.
, Cr.
Then ci <
155
The Moore-Pen rose Inverse
156
4. In an)' column with a leading one, all the other entries in that column are zero.
For example, 0 0
0 0
1
4
0
6 0
0
0
0
0
0
0
0
0
1
0 0
0 0
0
0 0
0 0
0 0
0 0
0
1
0 0
7
5
6
3
4 0 0
0 0
8
is in row reduced echelon form.
In other words, we have RREF if each leading coefficient is one, any zero row occurs at the bottom, in each pair of successive rows that are not totally zero, the leading coefficient of the first row occurs in an earlier column than the leading coefficient of the later row and each pivot column has only one nonzero entry, namely a leading one. In particular, all entries below and to the left of a leading one are zero. Do you see better now why the word "echelon" is used? Notice, if we do not demand condition (4), we simply say the matrix is in row echelon form (REF) so that it may not be "reduced" In the next theorem, we shall prove that each matrix A in C"I can he reduced by elementary row operations to a unique matrix in RREF. This is not so if condition (4) is not required.
THEOREM 4.1 Let A be in C"'". Then there exists a finite sequence of elementary matrices E, , E2, , Ek such that Ek ER-, . . . E2 E, A is in RREF. Moreover this matrix is unique and we denote it RREF(A), though the sequence of elementary matrices Grxn that produce it is not. Moreover, if r is the rank of A, then RA = ®(m-r)xn
where R = Ek Ek_I ... E( is in Cx"' and is invertible. In fact, the rank of G is G
r where G is r-by-n. In particular A = R-'
. Moreover, Row(A)
Row(G).
PROOF If A = 0, then A is in RREF and RREF(A) = 0. So suppose A # 0. Then A must have a column with nonzero entries. Let be the number of the first such column. If the (1, c,) entry is zero, use a permutation matrix P to swap a nonzero entry into the (I, c,) position. Use a dilation, if necessary, to make this element 1. If any element below this I is nonzero, say Of in the (j, c,) position, use the transvection T i (-a) to make it zero. In this way, all the entries of column c, except the (1, c,) entry can be made zero.Thus far we have
4. 1 Row Reduced Echelon
and Matrix Equivalence
157
*
* *
A -> TI D, P, A = L
0
*
. Now, if all the rows below the
*j
first row consist entirely of zeros, we have achieved RREF and we are done. If not, there will be a first column in the matrix above that has a nonzero entry, 0 say, below row 1. Suppose the number of this column is c2. Evidently c, < c2. By a permutation matrix (if necessary), we can swap rows and move R into row 2, keeping 0 in column c2. By a dilation, we can make R be 1 and we can use transvections to "zero out"the entries above and below the (2, c2) position.
Notice column c, is unaffected due to all the zeros produced earlier in that column. Continuing in this manner, we must eventually terminate when we get to a column c which will be made to have a I in the (r, Cr) position with zeros above and below it. Either there are no more rows below row r or, if there are, they consist entirely of zeros. Now we argue the uniqueness of the row reduced echelon form of a matrix.
As above, let A be an m-by-n matrix. We follow the ideas of Yuster [1984] and make an inductive argument. For other arguments see Hoffman and Kunze [1971, p. 561 or Meyer [2000, p. 134]. Fix m. We proceed by induction on the number of columns of A. For n = 1, the result is clear (isn't it always?), so assume n > 1. Now suppose uniqueness of RREF holds for matrices of size m-by-(n-l). We shall show uniqueness also holds for matrices of size m-by-n and our result will follow by induction. Suppose we produce from A a matrix in row reduced echelon form in two different ways by using elementary row operations. Then there exist R, and
R2 invertible with R,A = Ech, and R,A = Ech2, where Ech, and Ech2 are matrices in RREF. We would like to conclude Ech, = Ech2. Note that if then R, A = we partition A by isolating its last column, A = [A' Ech, and R2A = [R2A' I Ech2. Here is [R,A' I a key point of the argument: any sequence of elementary row operations that yields RREF for A also puts A' into row reduced echelon form. Hence, by the induction hypothesis, we have R, A' = R2A' = A" since A' is m-by-(n-1). We I
distinguish two cases. CASE 4.1 Every row of A" has a leading 1. This means there are no totally zero rows in this matrix. Then, by Theorem 2.7 of Chapter 2, the columns of A corresponding to
the columns with the leading one's in A" form an independent set and the last column of A is a linear combination of corresponding columns in A with the coefficients coming from the last column of Ech 1. But the same is true about the last column of Ech2. By independence, these scalars are uniquely determined
The Moore-Penrose Inverse
158
so the last column of Ech, must equal the last column of Ech2. But the last column is the only place where Ech, could differ from Ech2, so we conclude in this case, Ech, = Ech2. CASE 4.2
A" has at least one row of zeros. Let's assume Ech, A Ech2 and seek a contradiction. Now again, the only place Ech, can differ from Ech2 is in the last column, so there must exist a j with bj # cj,,, where bj,, = entjn(Ech,) and cj = entjn(Ech2). Recall from Theorem 3.2 of Chapter 3 that Null(Ech,) = Null(A) = Null(Ech2). Let's compare some null x,
E Nllll(A). Then Ech,x = Ech2x =
spaces. Suppose x = xn
so (Ech, - Ech2)x = bin - cin so [0 I
]
bmn - Cn"i
0
. But the first n - 1 columns of Ech, -Ech2 are zero 0 xl , which implies (bin - cin)xn = 0
x
=
0
for all i- in particular, when i = j, (bj -cjn )x = 0. By assumption, bjn -Cj U,
# 0, so this forces xn = 0, a rather specific value. Now if u Un
Null (Ech, ), then
' = Ech, u =
b(k+i ),,Un b(k+2)n U n
L
bmnun j
where k + I is the first full row of zeros in A". If b(k+,),,, ... , bn,,, all equal zero, then u can be any number and we can construct vectors u in Nu ll (Ech, ) without a zero last entry. This contradicts what we deduced above. Thus, some b in that list must be nonzero. If b(k+, ) were zero, this would contradict that Ech, is in RREF. So, b(k+,) must he nonzero. Again quoting row reduced echelon form, b(k+,) must be a leading one hence all the other bs other than b(k+,)n must be zero. But exactly the same argument applies to Ech2 so C(k+I)n must be one and all the other cs zero. But then the last column of Ech, is identical to the last column of Ech2, so once again we conclude, Ech, = Ech2. This is 0 our ultimate contradiction that establishes the theorem.
4.1 Row Reduced Echelon Form and Matrix Equivalence
159
For example,
2 - 4i
i
3
2-7i 3+2i 6- 15i 3
RREF
0
1
7
0
0
4i 2i
0 0
10i 0
0
0
= T12(4 + 2i)D2114 - i T32(-1)DI (-i)T31(-2 + 3i ) 2 -4i 3 4i 0 i 2i 0 3 2-7i T21(31) 3 +2i 6 - 151 7 10i 0 1
0
l 9+ i
-L i
196
197 + 9 t
0 0 0 0
0
-234
7gq3i
177 197 + 1971 157
1
0 0
0 0
197 + 1971
-2
1
0
0
0
0
i
2-4 i
3
4i
0
0
3
2 - 7l
I
21
0
10i 0
0 0
-1
1
0
0
0
1
3+21 6- 151 0
0
7
0
2761
7yy677
196
170
197 + 197 i
0
0
0
0
0
0
0
Now, do you think we used a computer on this example or what'? The reader will no doubt recall that the process of Gauss elimination gives an algorithm for producing the row reduced echelon form of a matrix, and this gave a way of easily solving a system of linear equations. However, we have a different use of RREF in mind. But let's quickly sketch that algorithm for using RREF to solve a system of linear equations:
1. Write the augmented matrix of the system Ax = b, namely [A I b]. 2. Calculate RREF([A I b]). 3. Write the general solution, introducing free variables for each nonpivot column.
Note that if every column of the RREF coefficient matrix has a leading one, the system can have at most one solution. If the final column also has a leading one, the system is inconsistent (i.e., has no solution). Recall that the number of leading ones in RREF(A) is called the pivot rank (p-rank(A)) of A. Finally, recall that a system Ax = b with n unknowns has no solution if p-rank(A) # p-rank([A b]). Otherwise the general solution has n - (prank(A)) free variables. I
The Moore-Penrose Inverse
160
Matrix Equivalence
4.1.1
Next, we consider matrix equivalence. Recall that we say that matrix A E Cis equivalent to matrix B E C",- if B can be obtained from A by applying both elementary row and elementary column operations to A. In other words, A is equivalent to B if there exist invertible matrices S E C"" and T in C0`1 such that SAT = B. Recall that invertible matrices are the same as products of elementary matrices so we have not said anything different.
DEFINITION 4.2 (matrix equivalence) Let A and B be m-by-n matrices. We say A is equivalent to B and write in symbols A B if there exist invertible matrices S in C`111 and T in C"" such that B = SAT. This is our first example of what mathematicians call an equivalence relation
on matrices. There are in fact many such relations on matrices. But they all share the following crucial properties: Reflexive Law: Every matrix is related to itself.
Symmetric Law: If matrix A is related to matrix B, then matrix B must also be related to matrix A. Transitive Law: If matrix A is related to matrix B and matrix B is related to matrix C, then matrix A is related to matrix C.
Clearly, equality is such a relation (i.e., A = B). Do the names of the Laws above make any sense to you? Let's make a theorem. THEOREM 4.2 Matrix equivalence is an equivalence relation; that is, 1.
A
A where A and B are in C0"".
3. If A ti B and B-- C then, A-- C where A, B and C are in C"'X" PROOF
The proof is left as an exercise.
0
The nice thing about equivalence relations is that they partition the set of matrices Cnixn into disjoint classes such that any two matrices that share a class are equivalent and any two matrices in different classes are not equivalent.
Mathematicians like to ask, is there an easy way to check if two matrices
4.1 Row Reduced Echelon Form and Matrix Equivalence
161
are in the same class'? They also want to know if each class is represented by a particularly nice matrix. In the meantime, we want to extend the notion of RREF. You may have noticed that the leading ones of RREF do not always line up
nicely to form a block identity matrix in the RREF. However, all it takes is some column swaps to get an identity matrix to show up. The problem is that this introduces column operations, whereas RREF was accomplished solely mth row operations. However, the new matrix is still equivalent to the one we started with. Thus, we will speak of a modified RREF of a matrix when we use a permutation matrix on the right to get the leading ones of RREF to have an identity matrix block.
THEOREM 4.3 Suppose A E
there exists an11invertible matrix R and a permutation
matrix P such that RAP = L
®1. Moreover, the first r columns of AP
®
-C
form a basis for the column space of A, the columns of P
form a
. . .
In-r
'r basis for the null space of A, and the columns of P
. .
form a basis of the
C*
column space of A*.
PROOF
We know there is an invertible matrix R such that RREF(A) _ G
RA =
. Suppose the pivot columns occur at ct, c2, ... ,cr in RA.
Consider the permutation or that takes ct to 1, c2 to 2, and so on, and leaves every thing else fixed. Form the permutation matrix P(cr). In other words, column j of P(a) is e,, and the other columns agree with the columns of the identity matrix. Then RAP =
10r 0
and the first r columns of A P form 1
a basis for the column space of A. The remaining details are left to the reader. U 0 0
4
6
0
0
0
0
0
1
0 0
0
0
0
0
0
0
0 0
0
0
0
0
0 0
0 0
0 0
0
For example, consider A
1
7
5
6
3
1
4
0 0
0 0
8 0 0
, which
is already in RREF. The swaps we need to make are clear: 1 H 3, 2 H 6,
and 3 Fa 7. Thus, the permutation a = (37)(26)(13) = (173)(26) and the
rank.
same
the
achieved
now
have
You
row.
by
row
B
zero-out
to
1,
®], in
ones
the
of
help
the
have
4.1
COROLLARY
=[®
SAT
B
and
A
iff
B
to
equivalent
is
A
Then
x".
C"'
to
belong
B
A,
Let
with
transvections
use
Finally,
.
®]
r
L
produce
to
right)
the
on
matrices
permutation
(i.e.,
swaps
column
use
Then
Cr.
,
c2,
ci,
are
ones
leading
the
of
numbers
column
the
Say
F(A).
RRE
produce
ti
A
that
argue
we
zeros,
(i.e.,
operations
row
apply
First
®].
®
to L
left)
the
on
matrices
elementary
of
blocks
empty
be
can
there
that
understand
we
If
Cmxn.
E
A
Let
PROOF
called
n,
itself
is
matrix
zero
The
RNF(A).
denoted
and
A
of
form
normal
rank
by the
class
in a
Jr
ifm>r=
01
ifm>r,n>r,E1r:01ifm=r<nor
1
0
11
I 0
®
r
n
=
m
if
1,
form
the
of
matrix
unique
a
to
equivalent
is
""
C,
E
A
matrix
Any
form)
normal
(rank
4.4
THEOREM
form.
normal
and
result
important
very
a
to
leads
and
case
the
fact,
in
is,
This
C.
matrix
the
out"
"zero
and
ones)
the
off
pivoting
operations
column
(i.e.,
transvections
with
continue
can
you
block,
matrix
identity
that
have
you
Once
A.
of
right
the
at
matrix
permutation
stop
why
wondering,
0 0 0 0 0 0 at 0 a 0 0 be
may
You
0 0 0 0 0 0 0 0 0
0 0 1 0 1 0 1 0 0 0 04 0 0 6 0 0 0 00 0 4 6 7 8 3 5
=
AP(o)
0 0 0 0 0 0 0 0 1
0 0
0 0 0
Finally,
1 0 0 0 0 0 0 0 10 0 0 0 001 0 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 1
=
P(Q)
is
matrix
permutation
0 0
0 0 0 0 0 0 1
Inverse
rose
Moore-Pen
The
162
4.1 Row Reduced Echelon Form and Matrix Equivalence PROOF
163
The proof is left as an exercise.
U
These results give a very nice way to determine the classes under matrix equivalence. For example, C2x4 has three classes; (02x4}, {all matrices of rank
1), {all matrices of rank 2}. Actually, we can do a bit better. We can pick a canonical representative of each class that is very nice. So the three classes r 0 0 O are {02x4}, {all matrices equivalent to L 0 J } and {all matrices 1
0
equivalent to
l
0
0
0
0
I
0
0
0
0
There is an algorithm for producing matrices S and T that put A into its rank normal form. First, adjoin the identity matrix to A, [A 11. Next, row reduce this augmented matrix: [A 1] -* [RREF(A) I S]. Then, S is an invertible matrix with SA = RREF(A). Now, form the augmented matrix RR EF(A)I and column reduce it: [RREF(A)] , [RNF(A)] Then T is invertible and SAT = RNF(A). 1
1
We can refine the notion of matrix equivalence to row equivalence and column equivalence by acting on just one side of the matrix with elementary operations.
DEFINITION 4.3
(row equivalence, column equivalence)
1. We say the matrices A and B in (C"' are row equivalent and write A `'R B iff there exists an invertible matrix S with SA = B.
2. We say A and B are column equivalent and write A -,c B iff there is a nonsingular matrix T with AT = B.
In other words, A is row equivalent to B if we can obtain B from A by performing a finite sequence of elementary row operations, and A is column equivalent to B iff B can be obtained from A by performing a finite sequence of elementary column operations on B. THEOREM 4.S Let A and B be in C'nxn. Then the following statements are all equivalent:
1. A -R B.
2. RREF(A) = RREF(B). 3. Col(AT) = Col(BT)
4. Null(A)
Null(B).
The Moore-Penrose Inverse
164
PROOF
The proof is left as an exercise.
0
A similar theorem holds for column equivalence.
THEOREM 4.6 Let A and B be in C"' x". Then the following statements are all equivalent:
I. A -C B. 2. RREF(BT) = RREF(BT). 3. Col(A) = Col(B). 4. JVull(AT) = Arull(BT ).
PROOF
As usual, the proof is left to the reader.
0
Exercise Set 13 1. Prove that matrix equivalence is indeed an equivalence relation. (This is Theorem 4.2.)
2. Prove that matrices A and B are equivalent if and only if they have the same rank. (This is Corollary 4.1.) 3. Describe the equivalence classes in C3x3 under ti
4.
Reduce A
under
_
1240
862
1593
2300 2404 488
2130 2200 438
1245
1386
309
2278 2620 2818 598
.
to its canonical form
.
5. You may have noticed that your calculator has two operations that do row reductions, ref and rref. Note that rref is what we talked about above,
RREF. This is unique. There is a weaker version of row reduction that is not unique, ref-row echelon form. Here you demand (1) all totally zero rows are at the bottom and (2) if the first nonzero entry in row i is at position k, then all the entries below the ith position in all previous columns are zero. Argue that the positions of the pivots are uniquely determined even though the row echelon form need not be unique. Argue that the number of pivots is the rank of A, which is the same as the number of nonzero rows in any row echelon form. If you call a column of A basic
4.1 Row Reduced Echelon Form and Matrix Equivalence
165
if it contains a pivot position, argue the rank of A is the same as the number of basic columns.
6. Is there a notion of column reduced echelon form, CREF(A)? If so, formulate it.
7. Suppose A is square and nonsingular. Is A -- A-'? Is A `'R A-1? Is
A -c A-'? 8. Argue that A
B iff AT Pt BT .
9. Prove that A '^R B iff AT -c BT
10. Argue that A -c B or A -R B implies A N B. 1 I. We say A and B are simultaneously diagonable with respect to equivalence if there exist nonsingular matrices S and T such that SAT = D, and SBT = D2 where D, and D2 are diagonal. Create a pair of 2-by-3 matrices that are simultaneously diagonable with respect to equivalence. 12. Prove Theorem 4.5.
13. Prove Theorem 4.6.
14. Argue that the linear relationships that exist among the columns of RREF(A), which are easy to see, are exactly the same as the linear relationships that exist among the columns of A. (Hint: Recall Theorem 2.7 of Chapter 2.)
15. Many people call the columns of A corresponding to the leading one columns of RREF(A) the basic columns of A. Of course, the other columns of A are called nonbasic columns. Argue that the basic columns of A form a basis of the column space of A. Indeed, only the basic columns occurring to the left of a given nonbasic column are needed to express this nonbasic column as a linear combination of basic ones. 16. Argue that if A is row equivalent to B, then any linear relationship among the columns of A must also exist among the same columns of B with the same coefficients.
17. In view of exercise 16, what can you say if A is column equivalent to B?
18. Explain why the algorithm for producing the RNF of a matrix given in the text above works. 19. Here is another proof that row rank equals column rank: We use RREF
in this argument. First let R = RREF(A) = SA, where S is m-by-m
The Moore-Penrose Inverse
166
invertible and A is m-by-n. Conclude that Row(R) = Row(A). Write a,,] so that R = SA = [Sa, I Saz l . I Sa 1. Let A = [a, l a, I I. B = {Sad,, Saj ... , Saj, } he the columns of R with the leading ones in them. Argue that 5 is a basis for Col(R). Since S is invertible, argue that {aj,, aj,, ... , aj, } is an independent set. If cj is any column of A, Sc, . is a linear combination of the columns of B. Therefore, conclude c, is a linear combination of cj,, cj,, ... , cj,. Conclude that dim(Row(A)) = r = dim(Col(A)). I
20. If A is nonsingularn-by-n and B is n-by-r, argue that RREF([A I B]) _ [I I A-i B].
21. If two square matrices are equivalent, argue that they are either both invertible or both singular.
22. Suppose T is a linear map from C" to C"'. Suppose B and B1 are bases of C" and C and C, are bases of C"'. Argue that Mat(T;B,,C,) = PMat(T;B,C)Q-l, where P is the transition matrix for C to C, and Q is the transition matrix from B to Cit. Deduce that any two matrix representations of T are matrix equivalent. 23. Tell whether the following matrices are in REF, RREF, or neither:
12 0 0 1
[
1
6
8
5
5
0
0
3
5
0
0
7
3
4 1
0
7
2
3
0
1
9
4
0
0 1
0 0
0 0
0
0
0
0
0
0
7
0
1
0
5
2
3
I
0
2
1
0
0
0
1
2
24. Make up an example to find two different REFs for one matrix.
25. If you are brave, do you think you can find the RREF of a matrix with polynomial entries? Try
x-4
3
2
x-1
-3
-3
3 1
x-9
26. Suppose two people try to solve Ax = b but choose different orders for listing the unknowns. Will they still necessarily get the same free variables?
27. Use rank normal form of a matrix A of rank r to prove that the largest number of columns (or rows) of A that are linearly independent is r. Argue that this is equivalent to saying A contains an r-by-r nonsingular submatrix and every r + I-by-r + I submatrix is singular. 28. Fill in the details of the proof of Theorem 4.3.
4.1 Row Reduced Echelon Form and Matrix Equivalence
167
Further Reading I H&K, 19711 Kenneth Hoffman and Ray Kunze, Linear Algebra, 2nd Edition, Prentice Hall Inc., Englewood Cliffs, NJ, (1971).
[L&S, 2000] Steven L. Lee and Gilbert Strang, Row Reduction of a Matrix and A = CaB, The American Mathematical Monthly, Vol. 107, No. 8, October, (2000), 681-688. [Yuster, 1984] Thomas Yuster, The Reduced Row Echelon Form of a Matrix Is Unique: A Simple Proof, The American Mathematical Monthly, Vol. 57, No. 2, March, (1984), 93-94.
4.1.2 MATLAB Moment 4.1.2.1
Row Reduced Echelon Form
MATLAB has a built in command to produce the RREF of a matrix A. The command is rref(A) Let's look at some examples.
>> B=round(] 0*rand(3,4))+round(10(3,4))*i
B= Columns I through 3 1.0000 + 8.0000i 2.0000 + 5.0001 2.0000 + 2.00001
6.0000 + 7.0000i 3.0000 + 8.0000i 2.0000
Column 4 9.0000 + 5.0000i 5.0000 + 7.0000i 4.0000 + 4.00001
> >rref(B) Columns I through 3 1.0000
0
0 0
1.0000 0
0 0 1.0000
0 + 7.00001 7.0000 + 4.0000i 4.0000 + 8.0000i
The Moore-Penrose Inverse
168
Column 4
-2.2615 - I.2923i 1.5538 + 1.0308i 1.0462 + 0.1692i
There is actually more you can do here. The command
[R, jb] = rref(A) returns the RREF R and a vector jb so that jb lists the basic variables in the linear system Ax=b, r=length(jb) estimates the rank of A, A(:,jb) gives a basis for the column space of A. Continuing our example above, > > [R, jb]=rref(B)
R= Columns I through 3 1.0000 0
0
0 1.0000 0
0 0 1.0000
Column 4
-2.2615 - 1.2923i 1.5538 + I.0308i 1.0462 + 0.16921 jb = 1
2
3
Let's get a basis for the column space of B.
>>B(:,jb) ans =
1.0000 + 8.0000i 2.0000 + 5.000i 2.0000 + 2.0000i
6.0000 + 7.0000i 3.0000 + 8.0000i 2.0000
0 + 7.0000i 7.0000 + 4.0000i 4.0000 + 8.0000i
Of course, this answer is not surprising. You might try experimenting with the matrix C=[ 1 2 3;2 4 5;3 6 9]. There is a really cool command called rrefniovie(A )
This steps you through the process element by element as RREF is achieved for the given matrix.
4.1 Row Reduced Echelon Form and Matrix Equivalence 4.1.3 4.1.3.1
169
Numerical Note Pivoting Strategies
In theory, Gauss elimination proceeds just fine as long as you do not run into a zero diagonal entry at any step. However, pivots that are very small (i.e., near zero) can cause trouble in finite-precision arithmetic. If the pivot is small, the multipliers derived from it will be large. A smaller multiplier means that earlier errors are multiplied by a smaller number and so have less effect being carried forward. Equations (rows) can be scaled (multiplied by a nonzero constant), so we should choose as pivot an element that is relatively larger in absolute value than the other elements in its row. This is called partial pivoting. This will make the multipliers less than I in absolute value. One approach is to standardize each row by dividing row i by >j Jail 1. Or we can just choose the largest magnitude coefficient aki to eliminate the other xk coefficients. An easy example will illustrate what is going on. Consider the system
-x+y=a n x+y=b 1
where n is very large compared to a and b. Using elementary operations, we get
-x+y=a n (I-n)y=b-na. 1
Thus
- na y- b1-n x = (a - y)n.
When n is very large, the computer will see I - n as -n and b - na as -na so the answer for y will be a, and hence x will be zero. In effect, b and I are overwhelmed by the size of n so as to disappear. On the other hand, if we simply swap the two equations,
x+y=b
-x+y=a n and eliminate as usual, we get
x+y=b C
1--nI)y=a --bn
170
The Moore-Penrose Inverse
so
Y- a - !'
I-!
x =b-y. In summary then, the idea on partial pivoting is to look below the current pivot and locate the element in that column with the largest absolute value. Then
do a row swap to get that element into the diagonal position. This ensures the multipliers will be less than or equal to I in absolute value. There is another strategy called complete pivoting. Here one searches not just below the current pivot, but in all remaining rows and columns. Then row and column swaps are necessary to get the element with largest absolute value into the pivot position. The problem is you have to keep track of row and column swaps. The column swaps do disturb the solution space so you have to keep up with changing the variables. Also, all this searching can use up lots of computer time.
4.1.3.2
Operation Counts
Operation counts give us a rough idea as to the efficiency of an algorithm. We count the number of additions/subtractions and multiplications/divisions. Suppose A is an n-by-n matrix and we wish to solve Ax = b (n equations in If unknowns).
Algorithm Gauss Elimination with back substitution
2.
Gauss-Jordan elimination (RREF)
Additions/Subtractions n; + 1 n2 - 11, Multiplications/Divisions 113 + n2 - Ii Additions/Subtractions n3
Cramer's rule
+ ;n2 - 6n
Multiplications/Divisions 1113+n2- 1fin Additions/Subtractions n4
-
bn. - n2 + bn M ulti plications/Divisions 311 3 T in2 +
x = A-1b if A is invertible
;n -
Additions/Subtractions
n;-I!2 Multiplications/Divisions If 3+If 2
4.2 The Hermite Echelon Form
171
Interestingly, I and 2 have the same number of counts. To understand why, note that both methods reduce the augmented matrix to a REF. We leave it as an exercise to see that the number of operations to do back substitutions is the same as continuing to RREF.
Further Reading [Anion, 19941 Howard Anton, Elementary Linear Algebra, 7th Edition, John Wiley & Sons, New York, (1994). [C&deB
19801
Samuel D. Conte and Carl de Boor, Elementary
Numerical Analysis, 3rd Edition, McGraw-Hill Book Company, New York, (1980).
[Foster, 1994] L. V. Foster, Gaussian Elimination with Partial Pivoting Can Fail in Practice, SIAM J. Matrix Anal. Appl., 15, (1994), 1354-1362.
4.2
The Hermite Echelon Form
There is another useful way to reduce a matrix, named in honor of the French mathematician Charles Hermite (24 December 1822 - 14 January 1901), that is very close to the RREF. However, it is only defined for square matrices. Statisticians have known about this for some time.
DEFINITION 4.4 (Hermite echelon form) A matrix H in C" " is in (upper) Hermite echelon form if
1. H is upper triangular (hid = entij (H) = 0 if i > j). 2. The diagonal of H consists only of zeros and ones.
3. If a row has a zero on the diagonal, then every element of that row is zero; if hii = 0, then hik = O for all k = 1, 2, ... , n. 4. If a row has a I on the diagonal, then every other element in the column containing that I is zero; i f hii = 1 , then hji = O for all j = 1 , 2, ... , n
except j = i.
The Moore-Penrose Inverse
172
The first interesting fact to note is that a matrix in Hermite echelon form must be idempotent.
THEOREM 4.7 Let H E C""" be in Hermite echelon form. Then H2 = H.
PROOF
Let bik he the (i,k) entry of Hz. Then the definition of matrix H i-I H
multiplication gives bik = Ehiihjk = >hijhjk + hiihik + j hiihjk.
If
j=i+l i > k, then bik = 0 since this is just a sum of zeros. Thus H2 is upper triangular. j=1
j=1
Ifi < k, then bik = >hijhjk. Weconsidercases. Ifhii = 0, then by (3), hij = 0 j=i
... , n so bik = 0 = hik. If hii 0 0, then hii must equal 1, so bik = hik + r_ hijhjk . Now whenever hij 0 0 for i + I < j < tt, we have f o r all j = 1,2,
11
j=i+I
by (4) that hjj = 0 so from (3), h j,,, = 0 for all m. This is so, in particular, for m = k. Thus, in any case, bik = hik so H2 = H. 0
THEOREM 4.8 Every matrix A in C"can be brought into Hermite echelon form by using elementary row operations.
PROOF
First we use elementary row operations to produce RREF(A). Then permute the rows of RREF(A) until each first nonzero element of each nonzero row is a diagonal element. The resulting matrix is in Hermite echelon form. 0
I
6 2
2
4
3
For example, RREF( 6s
60
I
0
3
6 2
9
5 10 J 9
1
)=
=
[
1
2
0
0 0
0 0
0
11
2
1. Indeed 0
0 0 J. To get the Hermite 4 0 0 0 2 21 4 echelon form, which we shall denote HEF(A), simply permute the second and z
third rows. Then
0
1
s
-;
0
?I
6
z
1
5 10
0 0
1
3
6
9
1
2
5 10
2
4
=
1
2
0 0
0 0
0 0
= H.
1
The reader may verify Hz = H. Thus, our algorithm f'or finding HEF(A) is described as follows: use elementary row operations to produce [A [HEF(A) I S]. Then SA = HEF(A). 1
I] -
4.2 The Hermite Echelon Form
173
COROLLARY 4.2
For any A E C", there exists a nonsingular matrix S such that SA is in Hermite echelon form. Moreover, ASA = A.
For example, for A =
1
2
2
3
1
1
1
0
1
-1
1
2
1
1
0
2
3
1
0
1
1
1
1
0
0
0
01[1
2
1
0
-1
0
2
3
1
1
1
1
0
D2(-I)T32(-1)T31(-1)T21(-2)
0 1
H
or
1-3
12 11
2
-1 -1
1
=
0 0
1
1
0
0
THEOREM 4.9 The Hermite echelon form of a matrix is unique, justifying the notation HEF(A) for A E C"xn
PROOF (Hint: Use the uniqueness of RREF(A).) The proof is left as exercise.
U
Note that the sequence of elementary operations used to produce HEF(A) is far from unique. For the example above:
0 0 1
-1 1
-1
3
I
2
1
-2
2
3
1
1
1
1
0
1
=
0 0
0
-1
1
1
0
0
The fact that H = HEF(A) is idempotent means that there is a direct sum decomposition lurking in the background. The next result helps to indicate what that is.
COROLLARY 4.3
For any A E C"', Mull(A) = Null(HEF(A)) =Col(/ - HEF(A)). Moreover, the nonzero columns of 1 - HE F(A) yield a basis for the null space of
A. Also rank(A) = rank(HEF(A)) = trace(HEF(A)).
The Moore-Penrose Inverse
174
THEOREM 4.10 Let A and B E C""". Then HEF(A) = HEF(B) iffCol(A*) =Col(B*)
PROOF
First, suppose Col(A*) = Col(B*). Then there exists an invertible
matrix S with A*S* = B*. This says that SA = B. Now there exists T nonsingular with TB = HEF(B). Then HEF(B) = TB = TSA = (TS)A is a matrix in Hermite echelon form. By uniqueness, HEF(B) = HEF(A). Conversely, suppose H = HEF(A) = HEF(B). Then, there are nonsingular matrices Sand T with SA = H = TB. Then A = S-I T B so A* = B*(S-J T)*. But (S-1 T) is nonsingular so Col(A*) = Col(B*).
0
COROLLARY 4.4
1. For any A E C""", HEF(A*A) = HEF(A).
2. For A E C""" and S E C""" nonsingular, HEF(SA) = HEF(A). In other words, row equivalent matrices have the same Hermite echelon form.
THEOREM 4.11 Let H = HEF(A) for some A in C""". Suppose that the diagonal ones of H occurs in columns numbered c1, c2, ... , ck. Then the corresponding columns of A are linearly independent.
PROOF
The proof is left as an exercise.
0
COROLLARY 4.5
Consider the i"' column of A, a,. This column is a linear combination of the set of linearly independent columns of A as described in the theorem above. The coefficients of the linear combinations are the nonzero elements of the i`h column of HEF(A).
Exercise Set 14 1.
Fill in the arguments for those theorems and corollaries given above.
2. Prove that H = HEF(A) is nonsingular iff H = 1.
4.2 The Hermite Echelon Form
175
3. Argue that if A is nonsingular, then HEF(A) = I.
4. Let S be a nonsingular matrix with SA = HEF(A) = H. Argue that
AH=A. 5. Let A E C""", S E C"" nonsingular with SA = HEF(A) = H. Then prove that AH = A.
6. Let H = HEF(A). Argue that A is idempotent iff HA = H. 7. Define A -- B iff Col(A*) = Col(B*). Is an equivalence relation? Is HEF considered as a function on matrices constant on the equivalence classes?
8. Create an example of a matrix A with nonzero entries such that H = 0 1 [ 01 [ I I I HEF(A) = 0 , 0 0 0 0 1
0
1
1
1
1
0
0
0
1
0 0,
0
1
0
0
0
9. Check to see that AH = H in the examples you created above. 10. Spell out the direct sum decomposition induced by H = HEF(A). 11. Fill in a proof for Corollary 4.2. 12. Make a proof for Theorem 4.9. 13. Fill in a proof for Corollary 4.10. 14. Fill in a proof for Corollary 4.4. 15. Make a proof for Theorem 4.11.
16. Fill in a proof for Corollary 4.5.
Further Reading [Graybill, 19691 Franklin A. Graybill, Introduction to Matrices with Applications in Statistics, Wadsworth Publishing Co., Inc., Belmont, CA, (1969).
The Moore-Penrose hiverse
176
Full Rank Factorization
4.3
There are many ways to write a matrix as the product of others. You will recall the LU factorization we discussed in Chapter 3. There are others. In this section, we consider a factorization based on rank. It will be a major theme of our approach to matrix theory.
DEFINITION 4.5
(full rank factorization)
Let A be a matrix in Cr` with r > 0. If there exists F in C""', and G in Cr x" such that A = FG, then we say we have a full rank factorization of A.
There are the usual questions of existence and uniqueness. Existence can be argued in several ways. One approach is to take F to be any matrix whose columns form a basis for the column space of A. These could be chosen from among the columns of A or not. Then, since each column of A is uniquely expressible as a linear combination of the columns of F, the coefficients in the linear combinations determine a unique G in Cr-,,,, with A = FG. Moreover,
r=r(A)=r(FG)
.
Another approach is to apply elementary matrices on the left of A to produce the unique RREFof A. That is, we produce an invertible matrix R in Cmx"' with Grxn
RA =
,
where r = r(A) = r(G) and O(n,-r)x is the (m-r)-
0(,n-r)xn
by-n zero matrix. Then, A = R-)
With a suitable partitioning
I
®Vn-r)xn
I.
of R-), say R-' = [R) R2J, where R, is m-by-r and R2 is m-by-(m - r), G
A=
(R, : RZJ
= RIG + R20 = RIG. Take F to be RI. Since
R-' is invertible, its columns are linearly independent so F has r independent columns and hence has full column rank. We summarize our discussion with a theorem. THEOREM 4.12 Every matrix A in C;' x" with r > 0 has a full rank factorization.
4.3 Full Rank Factorization
177
Even better, we will now describe a procedure, that is, an algorithm, for computing a full rank factorization of a given matrix A that works reasonably well for hand calculations on small matrices. It appears in [C&M, 1979]. Let A
be in C; " Step 1. Use elementary row operations to reduce A to RREF(A). Step 2. Construct a matrix F by choosing the columns of A that correspond to the columns with the leading ones in RREF(A) placing them in F in the same order they appear in A. Step 3. Construct a matrix G by taking the nonzero rows of RREF(A) and placing them as the rows of G in the same order they appear in RREF(A).
Then, A = FG is a full rank factorization of A. Now for the bad news. As you may have guessed by now, not only do full rank factorizations exist, they abound. After all, in our first construction described above, there are many choices for bases of the column space of A, hence many choices for F. Indeed, if A = FG is one full rank factorization of A in (Cr"" with r > 0, choose any invertible matrix R in C; `. Let FR = FR and GR = R-' G. Then clearly A = FRGR is also a full rank factorization of A. Actually,
this will turn out to be good news later since we will be able to select an R to produce very nice full rank factorizations. Again, we summarize with a theorem.
THEOREM 4.13 Every matrix A in C;"`" with r > 0 has infinitely many full rank factorizations.
Example 4.1
Let A =
3
6
2
4 2
1
1
2
0
0
01 1
J
13
9
.
Then RREF(A) =
3
and F =
3
13
2
9
1
3
indeed a full rank factorization of A.
1
2
0 0
0 0
0 1
so we take G =
0
. The reader may verify that A = FG is
The Moore-Penrose hiver.se
178
Exercise Set 15 1. Compute full rank factorizations for the following matrices: 1
1
0
1
0
1
1
1
0
1
0
1
2
0
1
1
1
2
1
0
1
1
'
l
1
I
1
0
1
0
0
1
0
1
I
1
1
1
,
1
[
1
1
1
0
1
2
1
2
2
3
1
3
1
2
3
1
2
1
1
2
3
1
3
0
1
2
3
1
2
1
1
'
'
0 '
1
0
.
1
2. Argue that any A in C"' "' can be written A = L K, where L has a left inverse and K has a right inverse. 3. Suppose A = FG is a full rank factorization of A and C" = Null(A) Co!(A). Argue that (G F)-' exists and E = F(GF)-'G is the projector of C" onto Co!(A) along Mull(A).
4. Use full rank factorizations to compute the index of the following matrices:
11 0 1
0
1
1
0
1
11
I
0'
1
0
0
1
1
I
1
0
0, I
1
0
1
1
2
2
3
3
1
0
1
2
3
2
1
1
1
2
2
3
3
0
1
0
1
2
3
1
I
I
}
2
}
1
I
1
0
1
2
1
2
5. Suppose A has index q. Then C" = Col(Ay) ®.AIull(A`"). We can use a full rank factorization of A" to compute the projector of C" onto Col(A") alongArull(A"). Indeed, let Ay = FG be a full rank factorization. Argue that F(GF)-I G is the projector of C" onto Col(Ay) along Afull(A").
6. Suppose A = FG is a full rank factorization. Argue that A = AZ iff
GF=l.
7. Argue that a full rank factorization of a matrix A can he obtained by first selecting a matrix G whose rows form a basis for 'Row(A). Then F must be uniquely determined. 8. Explain how to produce a full rank factorization from the modified RREF of a matrix.
4.4 The Moore-Penrose Inverse
179
9. Suppose A = A2. Prove the the rank of A is the trace of A.
10. (G. Trenkler) Suppose A and B are n-by-n idempotent matrices with A + B + AB + BA = 0. What can you conclude about A and B?
Further Reading [C&M, 1979] S. L. Campbell and C. D. Meyer, Jr., Generalized Inverses of Linear Transformations, Dover Publications, Inc., New York, (1979). 4.3.1 4.3.1.1
MATLAB Moment Full Rank Factorization
We can create an M-file to compute a full rank factorization of a matrix. By now, you should have this down.
3
function FRF=frf(A) [R,Jp] = rref(A) r = rank(A)
4
fori= 1:r
5
G(I,:) = R(i,:)
6
end F = A(:,jp)
I
2
7
8G Experiment with this routine on some matrices of your own creation.
4.4
The Moore-Penrose Inverse
In this section, we develop a key concept, the Moore-Penrose inverse (MPinverse), also known as the pseudoinverse. What is so great about this inverse is that every matrix has one, square or not, full rank or not. Our approach to the pseudoinverse is to use the idea of full rank factorization; we build up from the factors of a full rank factorization. The idea of a generalized inverse of a singular matrix goes back to E. H. Moore [26 January, 1862 to 30 December, 19321 in a paper published in 1920. He investigated the idea of a "general reciprocal" of
The Moore-Penrose Inverse
180
a matrix again in a paper in 1935. Independently, R. Penrose 18 August, 1931 ] rediscovered Moore's idea in 1955. We present the Penrose approach.
DEFINITION 4.6 (Moore-Penrose inverse) Let A be any matrix in C011". We say A has a Moore-Penrose inverse (or just pseudoinverse for short) iff there is a matrix X in C" ""' such that
(MPI) AXA = A (MP2) X AX = X (MP3) (AX)' = AX (MP4) (X A)* = X A. These four equations are called the Moore-Penrose equations and the order in which they are written is crucial for our subsequent development. Indeed, later we will distinguish matrices that solve only a subset of the four Moore-Penrose equations. For example, a 1-inverse of A would be a matrix X that is required to solve only MPI. A {I,2}-inverse of A would be required to solve only MPI and MP2. Now, we settle the issue of uniqueness.
THEOREM 4.14 (the uniqueness theorem) If A in C", x" has a pseudoinverse at all, it must be unique. That is, there can be only one simultaneous solution to the four MP-equations.
PROOF
Suppose X and Y in C"""' both satisfy the four Moore-Penrose
equations. Then X = X(AX) = X(AX)* = XX*A* = XX*(AYA)* _ XX'A*(AY)* = X(AX)*(AY)* = XAXAY = XAY = XAYAY = (XA)*(YA)"Y = A'X'A'Y'Y = (AXA)'Y'Y = A*Y*Y = (YA)'Y = YAY = Y. The reader should be sure to justify each of the equalities above and note that 0 all four Moore-Penrose equations were actually used. In view of the uniqueness theorem for pseudoinverses, we use the notation A+ for the unique solution of the four MP-equations (when the solution exists, of course, which is yet to be established). Since this idea of a pseudoinverse may be quite new to you, the idea that all matrices have inverses may be surprising. We now spend some time on a few concrete examples. Example 4.2
1. Clearly /,+ = I,, for any n and ® to divide by zero?)
x=®
(Does this give us a chance
4.4 The Moore-Penrose Inverse 2.
181
Suppose A is square and invertible. Then A+ = A-'. This is, of course, how it should be if the pseudoinverse is to generalize the idea of ordinary inverse. Let's just quickly verify the four MP-equations. AA-1 A = Al =
A, A-'AA-' = A-'I = A-', (AA-')* = /* = / = AA-1, and (A-' A)* = 1* = I = A-' A. Yes, they all check. 3.
Suppose P is a matrix such that P = P* = P2. Later we shall call such a matrix a projection (also known as a Herrnitian idempotent). We claim for
such a matrix, P = P+. Again a quick check reveals, PP+P = PPP =
PP = P, P+PP+ = PPP = P = P+, (PP+)* _ (PP)* = P* =
P=PP=PP+,and(P+P)*=(PP)*=P*=P=PP=P+P. 0
1
Once again, we are golden. So, for example,
i
s0
00sand
0
5
0
0
0
0 0
4
-2
5
5
+
=
0 0
0
0
s
s
0
0
0
0
Z
=
0
1
4. Let's agree that for a scalar k, k+ _ if k # 0 and k+ = 0 if )\ = 0. Let D be a diagonal matrix, say D = diag(di, d2, ... , We claim D+ = diag(di , dz . ... , d,+,). We leave the details as an exercise. In particular then,
1
0
0
0
2
0
0
0
0
+
0
0
0
z
0
0
0
0
1
=
5. What can we say about an n-by- I matrix? In other words we are just
Ib, 1 b2
looking at one column that could be viewed as a vector. Say b = L b,. J
If b = ', we know what b+ is so suppose b # -6. A little trial and error leads us to b+ = -b*b b*. Remember, b*b is a scalar. In fact, it
is the Euclidean length squared of b considered as a vector. We prefer to illustrate with an example and leave the formal proof as an exercise. We claim
2
3i b* b
=
14 . Then
_
2
I
3n
- [ 14 14 14 ] 1
2 [ 3i
2
,
. First note that
2
3i
4
1414 1 4
where b =
2 ,
3i
=
2 3i
[
1
]=
2 3i
182
The Moore-Pen rose Inverse
b+b= 1414 1
1
14 4
-3' 14 J
2
-3i
124
1
1g1
It4
T4
194
14
14
14
I
_
][---
2
14i]
114 14
I
I
I
2
2 -3i
= [I) [14 14 14 I
I
_ [ I) and bb+ = I 2 L 3i
Next, we come to crucial cases in which we can identify the pseudoinverse of a matrix.
THEOREM 4.15
1. Suppose F E C" "' - that is, F has frill column rank. Then F+ _ (F*F)-I
F*.
2. Suppose G E Cr,,, -that is, G hasfull row rank. Then G+ = G*(GG*)-I. ;
PROOF
(1) We verify the four MP-equations. First, FF+F = F((F*F)-I F*)F =
F(F*F)-'(F*F) = F1 = F. Next, F+FF+ = ((F*F)-I F*) F((F*F)-I F*) _ (F*F)-I F* = F*) = F+. Now F+F = ((F*F)-I F*)F = (F*F)-'(F*F) = I, so surely (F+F)* = F+F. Finally, (FF+)* = (F(F*F)-I F*)* = F**(F*F)-I F* F(F*F)*-'F* = F(F*F)-I F* = FF+. (F*F)(F*F)-1
I((F*F)-I
(2) This proof is similar to the one above and is left as an exercise.
0
So we see that for matrices of full row or column rank, the pseudoinverse picks out a specific left(right) inverse of the matrix. From above, F+F = I, and GG+ = 1,. Now, for an arbitrary matrix A in C"I" with r > 0, we shall show how to construct the pseudoinverse.
DEFINITION 4.7
(pseudoinverse) Let A be a matrix in C"' "". Take any full rank factorization of A = FG. Then
F+ and G+ exist by the theorem above. Define A+ in C"' by A+ := G+F+. In other words, A+ = G*(GG*)-I (F*F)-I F*. THEOREM 4.16 For an arbitrary matrix A in C" with r > 0, A+ defined above satisfies the four MP-equations and, hence, must be the unique pseudoinverse of A.
4.4 The Moore-Penrose Inverse
183
Moreover, AA+ = FF+ and A+A = G+G where A = FG is any full rank factorization of A.
PROOF
Suppose the notation of (4.7). Then AA+A = AG+F+FG =
AG+G = FGG+G = FG = A. Next, A+AA+ = G+F+AA+ = G+F+ FGA = G+GA+ = G+GG+F+ = G+F+ = At Also, AA+ = FGG+F+ = FF+ and we know (FF+)* = FF+. Finally, A+A = G+F+FG = G+G and we know (G+G)* = G+G.
0
We now have established the uniqueness and existence of A+ for any matrix A in C111". The approach we used here goes back to Greville [19601, who credits A. S. Householder with suggesting the idea. There are some properties of pseudoinverses that are easy to establish. We collect some of these in the next theorem.
THEOREM 4.17 Let A E Cmxn Then
1. (AA+)2 = AA+ = (AA+)*. 2.
(1", - AA+)Z = Um - AA+)
AA+)*.
3. (A+A)2 = A+A = (A+A)*.
4. (1n-A+A)2=(1n-A+A)=(1"-A+A)*. 5.
(/," - AA+)A = ®,nxn
6. (1 - A+A)A+ = Onxm
7. A++=A. 8. (A*)+ = (A+)*. 9. (A*A)+ = A+A*+ 10.
A* = A*AA+ = A+AA*.
11. A+ _ (A*A)+A* = A*(AA*)+. 12. (XA)+ = k+A+.
ROOF
The proofs are left as exercises.
Let's look at an example.
El
The Moore-Penrose Inverse
184
Example 4.3 6 4
13
2 1
2
3
0
0
3
We continue with the example from above: A = 3
13
2
9
I
3
=
9
3
13
2
9
1
3
11
J, where F
and G= L
, gives a full 01
rank factorization of A. Then direct computation from the formulas in (4.15.1) 0 i -22 79 2P 2 and (4.15.2) yields G+ _ 0 and F+ = ? and so A+ 26 26 26 J 0 1
3
-II 6
13
13
79
6
=
23
-3
-22
If0
ITo
I36
130
79
There is something of interest 130
to note here. I t you recall the formula for the inverse of a matrix in terms of the ad-
jugate, we see
G+(GG*)-I
=
so A+ =
and F+ = (F*F)-I F* In
particular, if the entries of A consist only of integers, the entries of A+ will be rational numbers with the common denominator d et (F* F)det (GG* ). In our example, det(GG*) = 26 while det(F*F) = 5, hence the common denominator of 130. Before we finish this section, we need to tie up a loose end. We have already noted that a matrix A in C; ' with r > 0 has infinitely many full rank factorizations. We even showed how to produce an infinite collection using invertible matrices. We show next that this is the only way to get full rank factorizations.
THEOREM 4.18 Every matrix A E C°" with r > 0 has infinitely many full rank factorizations. However, if A = FG = FIG 1 are two full rank factorizations of A, then there
exists an invertible matrix R in C"`r such that F, = FR and GI = R-' G. Moreover, (R-'G)+ = G+R and (FR)+ = R-I F+. PROOF
The first claim has already been established so suppose A = FG = F, G I are two full rank factorizations of A. Then FI G I = FG so FI FIG, = F,+FG so GI = (FI+F)G since FI+F, = Jr. Note that Fi F
is r-by-r and r = r(GI) = r((F, F)G) < r(FI F) < r, so r(FI F) = r and so Fi F is invertible. Call FI+F = S. Similarly, FIG, = FG implies FIGIGi = FGGi = FI since GIGS _ fr. Again note GGi is r-by-r of rank r so GG+ is invertible. Name R = GGi. Then SR = F,+FGG+ = FI+AGi = FI+FIGIGt = /,. Thus S = R-I. Now we can see GI = SG =
4.4 The Moore-Penrose Inverse
185
R-'G and F, = FGG+ = FR. To complete the proof, compute (FR)+ =
((FR)*(FR))-'(FR)* = (R*F*FR)-IR*F* = R-l(F*F)-'R*-IR*F* = R-I(F*F)-I F* = R-1 F+ and (R-'G)+ = (R-iG)*((R-IG)(R-iG)*)-I = G*(R-1)*(R-lGG*R-1*)-1 =G*(R-l)*(R-I)*-I(GG*)-lR=G*(GG*)-lR
G+R.
0
We end this section with a table summarizing our work on pseudoinverses so far.
TABLE 4.4.1:
Summary Table
Dimension
Rank
n =m
n
A+=A-'
m-by-n m-by-n
n m
A+=(A*A)-IA*
m-by-n
r
Pseudoinverse
A+ = A*(AA*)-I A+ = G+ F+ where A = FG is any full rank factorization of A
Exercise Set 16 1. Let A = FG be a full rank factorization of A. Argue that F+A = G, FF+A = A, AG+ = F, and AG+G = A.
2. Suppose AL is a left inverse of A - that is, ALA = I. Is AL = A+ necessarily true? Suppose A*A = I. What can you say about A+?
3. Suppose AZ = A in C""". Use a full rank factorization of A to prove that rank(A) = trace(A) (i.e., the rank of A is just the trace of A when A is an idempotent matrix).
4. Justify each of the steps in the proof of the uniqueness theorem. 5. Determine the pseudoinverse of any diagonal matrix.
6. Go through the computations in detail of the numerical example in the text.
7. Prove (2) of Theorem 4.15.
186
The Moore-Penrose Inverse
8. Prove the 12 claims of Theorem 4.17.
9. Prove the following: AA+A+* = A+*, A+*A+A = A+*, A*+A*A = A AA*A*+ = A, A*A+*A+ = A+, A+A+*A* = At 10. Argue that the row space of A+ is equal to the row space of A*.
11. Argue that the column space of A+ is equal to the column space of A* and the column space of A+A. 12. Argue that A, A*, A+, and A+* all have the same rank.
13. Prove (AA*)+ = A+*A+, (A*A)+ = A+A+* = A+A*+ and (AA*)+ (AA*) = AA+.
14. Prove A = AA*(A+)* = (A+)*A*A = AA*(AA*)+A and A* A*AA+ = A+AA*.
15. Prove A+ = A+(A+)*A* = A*(A+)*A+.
16. Prove A+ = (A*A)+A* = A*(AA*)+ so that AA+ = A(A*A)+A*.
17. Show that if A = > A, where A; Aj = 0 whenever i # j, then A+ =
r_ A;. 18. Argue that all the following matrices have the same rank: A, A+, AA+, A+A, AA+A, and A+AA+. The rank is Tr(AA+).
19. Argue that (-A)+ = -A+. 20. Suppose A is n-by-m and S is m-by-ni invertible. Argue that (AS)(AS)+
= AA+.
21. Suppose A*A = AA*. Prove that A+A = AA+ and for any natural number n, (A")+ = (A+)". What can you say if A = A*? 22. Prove that A+ = A* if and only if A*A is idempotent.
23. If A =
®®
,
find a formula for At
24. Suppose A is a matrix and X is a matrix such that AX A = A, X AX = X and AX = X A. Argue that if X exists, it must be unique. 25. Why does (F*F)-1 exist in Theorem 4.15?
4.4 The Moore-Penrose Inverse
187
26. (MacDuffee) Let A = FG be a full rank factorization of A in Qr"
Argue that F*AG* is invertible and A+ = G*(F*AG*)-I F*. (Hint: First
argue that F*AG* is in fact invertible.) Note F*AG* = (F*F)(GG*) and these two matrices are r-by-r or rank r hence invertible. Then
(F*AG*)-I = (GG*)-I(F*F)-I. y,
X,
27. Let x =
and y =
. Show (xy*)+ = (x*x)+(y*y)+yx*. yn
xn
28. Find the MP inverse of a 2-by-2 matrix
a
b
c
d
29. Find examples of matrices A and B with (AB)+ = B+A+ and A and B with (A B)+ # B+A+. Then argue Greville's [1966] result that (AB)+ _ B+A+ iff A+A and BB+ commute. 30. Find
0
1
0
0
1
0
0
0
+
0
31. Remember the matrix units E, ? What is Et? 32. Find necessary and sufficient conditions for A+ = A. 33. (Y. Tian) Prove that the following statements are equivalent for m-by-n matrices A and B: r
(i) Col I AAA
1
= Col I BBB 1
(ii) Col f AAA 1 = Col I BBB
(iii) A =
B.
L
34. In this exercise, we introduce the idea of a circulant matrix. An n-hyn matrix A is called a circulant matrix if its first row is arbitrary but its subsequent rows are cyclical permutations of the previous row. So, if the first row is the second row is (anaia2...an-1 ), and the last row is (a2a3a4 anal). There are entire books written on these kinds of matrices (see page 169). Evidently, if you know the first row, you know the matrix. Write a typical 3-by-3 circulant matrix. Is the identity matrix a circulant matrix? Let C be the circulant matrix whose first row is (0100 . . . 0). Argue that all powers of C are also circulant matrices. Moreover, argue that if A is any circulant matrix with first row then A =a,I+a2C+a3C2+...+aC"-'.
The Moore-Penrose inverse
188
35. Continuing the problem above, prove that A is a circulant matrix iff AC = CA. 36. Suppose A is a circulant matrix. Argue that A+ is also circulant and A+ commutes with A.
37. (Cline 1964) If AB is defined, argue that (A B)+ = Bi A+ where AB A,B1, B, = A + A B and AI = ABiBi .
38. If rank(A) = 1, prove that A+ = (tr(AA*)-')A*.
39. Prove that AB = 0 implies B+A+ = 0. 40. Prove that A*B = 0 iff A+B = 0. 41. Suppose A*AB = A*C. Prove that AB = AA+C. 42. Suppose BB* is invertible. Prove that (AB)(AB)+ = AA+. 43. Suppose that AB* = 0. Prove that (A+B)+ = A+ +(I,, - A+B)[C++
(I - C+C)MB*(A+)*A+(I - BC+) where C = (1,,, - AA+)B
and
M = [1,1 + (1,, - C+C)B*(A+)A+B(I,1 - C+C)]-' A
+
= [A+ - T BA+ I T] where T = E+ + (1 -
44. Prove that B
E+B)A+(A+)*B*K(Ip - EE+) with E = B(I - A+A) and K = [1,, + (I,, - EE+)BA+(A+)*B*(1 - EE+)]-'. 45. Prove that [A:B]+
A+-A+B(C++D) l C,+ + D
where C = (I,,, -
J
AA+)B and D = (I,, - C+C)[I,, + (1,, - C+C)B*(A+)*A+B(1,, - C+C)]-' B*(A+)* A+(I,,, - BC+).
46. Argue Greville's [1966] results: (AB)+ = B+A+ iff any one of the following hold true:
(a) A+ABB*A* = BB*A* and BB+A*AB = A*AB. (b) A+ABB* and A*ABB+ are self adjoint. (c) A+ABB*A*ABB+ = BB*A*A. (d) A+AB = B(AB)+AB and BB+A* = A*AB(AB)+. 47. Suppose A is m-by-n and B is n-by-p and rank(A) = rank(B) = n. Prove that (A BY =B+ A+.
4.4 The Moore-Penrose Inverse
189
Further Reading [Cline, 1964] R. E. Cline, Note on the Generalized Inverse of a Product of Matrices, SIAM Review, Vol. 6, January, (1964), 57-58.
[Davis, 1979] Philip J. Davis, Circulant Matrices, John Wiley & Sons, New York, (1979). [Greville, 1966] T. N. E. Greville, Note on the Generalized Inverse of a Matrix Product, SIAM Review, Vol. 8, (1966), 518-524. [Greville, 1960] T. N. E. Greville, Some Applications of the Pseudo Inverse of a Matrix, SIAM Review, Vol. 2, (1960), 15-22. [H&M, 1977] Ching-Hsiand Hung and Thomas L. Markham, The Moore-Penrose Inverse of a Sum of Matrices, J. Australian Mathematical Society, Vol. 24, (Series A), (1977), 385-392.
[L&O, 19711 T. O. Lewis and P. L. Odell, Estimation in Linear Models, Prentice Hall, Englewood Cliffs, NJ, (1971).
[Liitkepohl, 1996] H. Liitkepohl, Handbook of Matrices, John Wiley & Sons, New York, (1996). [Mitra, 1968] S. K. Mitra, On a Generalized Inverse of a Matrix and Applications, Sankhya, Series A, XXX:1, (1968), 107-114.
[M&O, 1968] G. L. Morris and P. L. Odell, A Characterization for Generalized Inverses of Matrices, SIAM Review, Vol. 10, (1968), 208-211.
[Penrose, 1955] R. Penrose, A Generalized Inverse for Matrices, Proc. Camb. Phil. Soc., Vol. 51, (1955), 406-413. [R&M, 19711 C. R. Rao and S. K. Mitra, Generalized Inverses of Matrices and its Applications, John Wiley & Sons, New York, (1971).
[Rohde 1966] C. A. Rohde, Some Results on Generalized Inverses, SIAM Review, VIII:2, (1966), 201-205. [Wong,
19811
Edward T. Wong, Polygons, Circulant Matrices, and
Moore-Penrose Inverses, The American Mathematical Monthly, Vol. 88, No. 7, August/September, (1981), 509-515.
190
The Moore-Pen rose Inverse
MATLAB Moment
4.4.1 4.4.1.1
The Moore-Penrose Inverse
MATLAB has a built-in command to compute the pseudoinverse of an mby-n matrix A. The command is pinv(A). For example,
> > A=[123;456;789 ans = -0.6389 -0.0556
-0.1667 0.0000 0.1667
0.5278
0.3056 0.0556
-0.1944
> > format rat >> pinv(A) ans =
-23/36
-1/6
-1/18
*
1/18
19/36
1/6
-7/36.
1 1 /36
For fun, find pinv(B) where B = ones(3), B = ones(4). Do you see a pattern?
4.5
Solving Systems of Linear Equations
Now, with the MP-inverse in hand, we consider an arbitrary system of linear equations Ax = b where A is m-by-n, x is n-by-1, and b is in-by-l.
THEOREM 4.19
Ax = b has a solution if and only if AA+b = b. If a solution exists at all, every solution is of the form x = A+b + (I - A+A)w, where w is an arbitrary parameter matrix. Indeed, a consistent system always has A+b as a particular solution.
PROOF First, we verify the consistency condition. If AA+b = b, then evidently x = A+b is a solution to the system. Conversely suppose a solution
y exists. Then Ay = b, so A+Ay = A+b whence AA+Ay = AA+b. But AA+A = A, so b = Ay = AA+b, and we have the condition. Now, suppose the system has a solution x and let x = A+b as above. Then, if w = x - xo,
Aw = A(x -
Ax - Ax,, = b - AA+b = b - b = V. Now clearly,
4.5 Solving Systems of'Linear Equations
191
ButAw=-6 implies A+Aw = -6, sow=(I-A+A)w.Thus we see x = x +w = A+b + (I - A+A)w, and the proof is complete.
D
So, while our previous concepts of inverse were highly restrictive, the MPinverse handles arbitrary systems of linear equations, giving us a way to judge Hhether a solution exists and giving us a way to write down all solutions when they exist. We illustrate with an example. Example 4.4 1
Let A =
1
1
0
0 0
1
in C4x3 and consider the system of linear equations
1
1
2
Ax = b, where b =
2
First we compute the pseudoinverse of A. We
.
2 1
write A=FG=
]
0
r
1 1
1
0
0
1
1
1
a full rank factorization of A.
0 1
Then A+
I
I
6
= G+F+ = G*(GG*)-I (F*F)-I F* =
61
61
6
6
6
6
6
2
and AA+ =
0 1
Since AA+ is idempotent, its rank is its trace,
2
0 -1
1
which is 2. Also I - A+A -
-1
3 73,
We compute A A+b to
I
2
see if we have any solutions at all: AA+b = 0 2 2 2
Yes, the system is consistent. To give all possible solutions we form
2
x = A+b + [I - A+AIw, where w is a column of free parameters: x =
The Moore-Penrose Inverse
192
i
i
2
61
bi
2 2
6
2
61
61 6
2
i
i
6
6
-i
i
+
w
i
=
W2
31 3
3
(wi - w2 - w3) $
-i
3
+
w3
3
4
+t
(-WI +w2+w;)
where t
3 31
3(-3(+ W2 + w3) 3i W1 - w2 - W3. This also tells us that the nullity of A is I and the null space is i
1
3
3-i
Null(A) = span
I
Next consider Ax = b =
T
1)
Again we
0
-
check for consistency by computing AA+
1
2
0
2
0
2
2
0
=
1
0
2
0
I
2
2
2
0
1
0; 0
Z
0
Evidently, there is no solution to this system.
The astute reader has no doubt noticed that the full force of the pseudoinverse
was not needed to settle the problem of solving systems of linear equations. Indeed, only equation (MPI) was used. This observation leads us into the next chapter.
Exercise Set 17 1. Use the pseudoinverse to determine whether the following systems of linear equations have a solution. If they do, determine the most general solution. (a)
(b)
2x
-
10y
+
16z
3x
+
y
-
5z
+ + + + +
z z
2x 2x 2x 6x 4x
-
y 2y 4y 2y 2y
3z 4z z
-
= =
10
_
4
=
1
3
5w = -3 5w = 1
=
4
193
4.5 Solving Systems of Linear Equations 2.
Create examples of three equations in two unknowns that (a) have unique solutions (b) have an infinite number of solutions (c) have no solutions.
Are your examples compatible with the theory worked out above? 3.
This exercise refers back to the Hermite echelon form. Suppose we desire the solutions of Ax = b where A is square but not necessarily invertible. We have showed how to use the pseudoinverse to describe all solutions to Ax = b if any exist. In this exercise, we consider a different approach. First form the augmented matrix [A b]. There is an invertible matrix S such that SA = H = HEF(A) so form [A b] -+ [SA Sb] I
I
I
= [H I Sb].
(a) Argue that rank(A) = rank(H) = the number of ones on the diagonal of H. (b) Prove that Ax = b is consistent iff Sb has nonzero components only in the rows where H has ones.
(c) If Ax = b is consistent, argue Sb is a particular solution to the system.
(d) Argue that if H has r ones down its diagonal, then I - SA has exactly n - r nonzero columns and these nonzero columns span A(ull(A) and hence form a basis forMull(A).
(e) Argue that all solutions of Ax = b are described by x = Sb + (I - SA)D, where D is a diagonal matrix containing n - r free parameters.
Further Reading [Greville, 1959] T. N. E. Greville, The Pseudoinverse of a Rectangular or Singular Matrix and its Application to the Solution of Systems of Linear Equations, SIAM Review, Vol. 1, (1959), 38-43.
[K&X, 1995] Robert Kalaba and Rong Xu, On the Generalized Inverse Form of the Equations of Constrained Motion, The American Mathematical Monthly, Vol. 102, No. 9, November, (1995), 821-825.
The Moore-Penrose Inverse
194
MP-Schur complement, the rank theorem, Sylvester's determinant formula, the quotient formula, Reidel's formula, parallel sum
Schur Complements Again (optional)
4.6
Now, we generalize the notion of a Schur complement to matrices that may B,,,xt
not be invertible or even square. Let M = [ A"'xn
1
Dsxt J
Crxn
We define the
MP-Schur complement of A in M to be
M/A = D - CA+B. Obviously, if A is square and nonsingular, A-' = A+ and we recapture the previous notion of the Schur complement. Similarly, we define
M//D = A - BD+C. Next, we investigate how far we can generalize the results of Chapter I. Things get a bit more complicated when dealing with generalized inverses. We follow the treatment in Carlson, Haynsworth, and Markham [C&H&M, 1974].
THEOREM 4.20 (the rank theorem)
Let M =
I
C
Then BD
].
rank(M) > rank(A) + rank(M/A).
Moreover, equality holds iff
Null(MIA) c Null((I - A+A)B)
1.
2. Null((M/A)*) F= Null((/ - A+A)C*) 3.
(I - AA+)B(M/A)+C(I - A+A) = G.
PROOF
Let P =
[ -CA+
and Q are invertible. Then P M Q_ A
= [
B
®] and Q = I
-CA A+C -CA+B+D ] A
®
1
1
-CA+ I [
® -AA+B+B
®
1+B
J.
Note P
AB I -A+B [+C D) [® I
I
B1
-CA+A + C CA+AA+B - CA+B + -CA+B + D ]
4.6 Schur Complements Again (optional)
-AA+B+B
A
I
195 1
-CA+A + C CA+B - CA+B + -CA+B + D J _
-AA+B + B D - CA+B
A
= [ -CA+A + C
Let P =
[ ® -(I - AA;)B(M/A)+ 1 M = [ Cn
DBj?jxj ].
x,n
sxt
Then (M/A)5 , = Dsxt - C,nJApxmBmxt E C(nt+')x(n+t)
0
The following theorem is Sylvester's determinant formula for noninvertible matrices.
THEOREM 4.21 (Sylvester's determinant formula) BB
Let M = [ AC k
Let either Null(A) C Null(C) or Null(A*) e 1
Null(B*). Let P = [pii] and
Col B) 1
Then P = det(A)(M/A), and if M is (C) dii n-by-n, det(P) = (det(A))n-k-'det (M). ptj
Row;A
det I
PROOF Since either Null(A) c_ Null(C) or NUll(A*) c_ Arull(B*) holds, the determinant formula det (M) = det (A)det(M/A) can be applied to the elements pig p;j = det(A)(d,1 - Row;(C)A+Coli(B)) = (det(A))(M/A);;. If M is n-by-n, then by equation above det (P) = (det(A))n-kdet(M/A) = (det(A))n-I -'det(M). 0 :
The quotient formula for the noninvertible case is stated and proved below.
THEOREM 4.22 (the quotient formula)
Let A = [
HK
l , where B = [
D
E
1 ].
Let Arull(A) e Null(C),
Null(A*) C 1(ull(BJ*). Then (A/B) _ (A/C)/(B/C).
Since Null(A) e Null(C) and Null(A*) e_ Null(B*), C So, we we may write A = QP BKP FR ]. B [ SC PROOF
I
have A/B = K - QPBB+B P = K - QBP, B/C = F - SCR. Partition P, Q as P = [p], Q = [ Q) Q2 ]. Then A/C =
The Moore-Penrose Inverse
196
(B/C) Q2(B/C)
(B/C)P2 Hence A/C satisfies K - (Q, + Q2)C(P, + Null(A) e Mull(C) and NUll(A*) c Null(B*). Since B/C = F - SCR, (A/C)/(B/C) = K - QiCPi - Q2SCP1 - Q2FP2 = A/B. RP"-)
The following theorem is an example using the MP-inverse to get a ShermanMorrison-Woodbury type theorem. The theorem stated below is due to Reidel [19921.
THEOREM 4.23 (Reidel's formula) Let A,,,, have rank l i , where 1, < 1, V1, V2, W1, W2 be l -by-k and G be a k-by-k
nonsingular matrix. Let the columns of V, belong to Col(A) and the columns of W, be orthogonal to Col(A). Let the columns of V2 belong to Col(A*) and the columns of W2 be orthogonal to Col(A*). Let B = W;*W; have rank k. Suppose Col(W1) = Col(W2). Then the matrix ar = A + (V1 + W,)G(V2 + W2)* has the MP-inverse ,n+
= A+ -
W2(W, W2)-'Vz A+
W2(W2
W2)_,(G++
- A+ Vj(Wj(W1 Wi)-l )* +
VA +V1)(W1(W, Wi)-I)*
The proof is computational but lengthy, so we leave it as a challenge to the reader. The corollary given below addresses some consequences of this theorem. COROLLARY 4.6 The following are true under the assumptions of the previous theorem: 1.
If G = /, then (A + (V, + W i)(V2 + W2 )* )+ W2)-,VA
= A+ - W2(W2 ++W2(W2W2)-'V2 A+VI(Wi(Wr W,)-')*
A+V1(W1(W,*W1)-i)*
2. If G = 1, and A = I, then (I + (Vi + W,)(V2 + W2)*)+ _ / - W2(W? W2)-1 V2 - VJ(W1(W, Wi) T + W2(W2 *W2)-' V2 V1(Wj (W1 Wi)-I)*
3. If V, =V2=O, then ((A + W, W2)*)+ = A+ + W2(WZ W2)-J G+(Wi (W, Wj )-I )*
Fill and Fishkind [F&F, 19991 noticed that the assumption Col(W1) = Col(W2) is not used in the proof of Reidel's theorem, and they used this theorem to get a somewhat clean formula for the MP-inverse of a sum. They also noted that a rank additivity assumption cannot be avoided in Reidel's theorem.
4.6 Schur Complements Again (optional)
197
THEOREM 4.24
Suppose A, B E Cnxn with rank(A + B) = rank(A) + rank(B). Then
(A + B)+ = (I - S)A+(1 - T) + SB+T where S = (B+B(1 - A+A))+ and T = ((I - AA+)BB+)+
PROOF
The proof uses Reidel's theorem and is again rather lengthy so we 0
omit it.
The following simple example is offered on page 630 of Fill and Fishkind [ 1999].
Example 4.S
Let A = B = [ 1]. Then rank(A+B) = rank(l+1) = rank(2) = 10 2 = rank[ 11+
rank[1] = rank(A) + rank(B). Now S = [[1][0]]+ = [0] and T = [[0][1]]+ = [0]. Therefore, (I-S)A+(I-T)+SB+T = [ 1 ] [ l ]+[ 1 ] + [0][11+[0] _ [11 and (A+B)+ = [2] and the theorem fails. There is a nice application of the previous theorem to the parallel sum of two matrices. Given A, B E Cnxn we define the parallel sum and A and B to be
A II B = (A+ + B+)+. COROLLARY 4.7
Suppose A, B E Cnxn with rank(A II B) = rank(A) + rank(B). Then A II
B = (I - R)A(I - W) + RBW, where R = (BB+(I - AA+)+ and W = ((1 - A+A)B+B)+.
Exercise Set 18 1. Let M = L
C D ].
Argue that det(M) = det(A)det(M/A) if A is
invertible. Moreover, if AC = CA, then det(M) = AD - BC.
The Moore-Pet, rose Inverse
198
Further Reading [C&H&M, 1974] David Carlson, Emilie Haynsworth, and Thomas Markham, A Generalization of the Schur Complement by Means of the Moore Penrose Inverse, SIAM J. Appl. Math., Vol. 26, No. 1, (1974), 169-175. [C&H, 1969] D. E. Crabtree and E. V. Haynsworth, An Identity for the Schur Complement of a Matrix, Proceedings of the American Mathematical Society, Vol. 22, (1969), 364-366.
[F&F, 1999] James Allen Fill and Donnielle E. Fishkind, The MoorePenrose Generalized Inverse For Sums of Matrices, SIAM J. Matrix Anal., Vol. 21, No. 2, (1999), 629-635.
[Gant, 1959] F. R. Gantmacher, The Theory of Matrices, Vol. 1, Chelsea Publishing Company, New York, (1959).
IRiedel, 1992] K. S. Riedel, A Shcrman-Morrison-Woodbury Identity f'or Rank Augmenting Matrices with Application to Centering, SIAM J. Matrix Anal., Vol. 13, No. 2, (1992), 659-662.
Chapter 5 Generalized Inverses
5.1
The (I)-Inverse
The Moore-Penrose inverse (MP-inverse) of a matrix A in C'"" can be considered as the unique solution X of a system of simultaneous matrix equations; namely,
(MPI) AXA = A (MP2) X AX = X (MP3) (AX)` = AX (MP4) (XA)` = XA. We have already noted that only MPI was needed when seeking solutions of a system of linear equations. This leads us to the idea of looking for "inverses" of A that satisfy only some of the MP-equations. We introduce some notation. Let A{1} = {G I AGA = A}, A{2} = (H I HAH = H}, and so forth. For example, All, 2} = A{ I) fl A{2}. That is, a 11,21-inverse of A is a matrix that satisfies MPI and MP2. We have established previously that A(1, 2, 3, 4} has just one element in it, namely A+. Evidently we have the inclusions A{1,2,3,4} e_ A{1,2,3) E- A{1,2} E_ A{l}. Of course, many other chains are also possible. You might try to discover them all. In this section, we devote our attention to A( 1}, the set of all 1-inverses of A. The idea of a 1-inverse can be found in the book by Baer [1952]. Baer's idea was later developed by Sheffield [ 1958] in a paper. Let's make it official.
DEFINITIONS.] (1-inverse) Let A E C'' ". A 1-inverse of A is a matrix G in C"' such that AGA = A; that is, a matrix that satisfies MPI is called a 1-inverse of A. Let A(1) denote the collection of all possible 1-inverses of A. Our goal in this section is to describe A { 11. Of course, A+ E A ( I ) so this collection is never empty and there is always a ready example of an element
199
200
Generalized Inverses
in A(l). Another goal is to prove a fundamental result about I-inverses. We begin by looking at a general matrix equation AX B = C where A E C"'"" X E C""1', B E CP'"q and, necessarily, C E C"y. Here A, B, C are given and we are to solve for X. THEOREM 5.1 Let AX B = C be as above. Then this equation has a solution if and only if'there exists an AA in A(1) and a Bs in B{1 } such that AAACBAB = C. If solutions exist, they are all of the form X = AACBR + W- AAAWBBA, where W is
arbitrary in C"P. PROOF
Suppose the consistency condition AAACBAB = C holds for some AA in A{ 1) and some BA in B{ 1). Then clearly, X = AACBA solves the matrix
equation. Conversely, suppose AX B = C has a solution X1. Then AX I B =
C so A+AXIBB+ = A+CB+. Thus, AA+AXIBB+B = AA+CB+B and also equals AX I B, which is C. Therefore, AA+CB+B = C and note that A+EA(1) and B+ E B(I). Now suppose AX B = C has a solution, say Y. Let K = Y - X,,, where k = AACBA as above. Then AKB = A(Y AYB C -AAACBAB =G. Now, AKB = O implies AAAKBBR = 0, so K = KAAAK BBR, so Y = X + K = AACBA + K - AA A K B BR. On the other hand, if X = AACBA + W -AAAWBBA for some W, then AX B = A(ARCBR)B +
A(W -ARAWBBA)B= AAACBAB+AWB-AARAWBBAB = C usingthe consistency condition and the fact that All and BA are 1-inverses. This completes the proof. 0
The test of a good theorem is all the results that follow from it. We now reap the harvest of this theorem.
COROLLARY 5.1 Consider the special case AX A = A. This equation always has solutions since the consistency condition A All AAR A = A holds true for any AA in A { 11.
including A+. Moreover, A111 = {X I X = ARAAR + W- AAA WAAA}, where W is arbitrary and All is any I -inverse of A. In particular, All) = {X X = A+ + W - A+A WAA+), where W is arbitrary.
COROLLARY 5.2
Consider the matrix equation AX = C. A in C"'"", X in C"P, and C in Cm 'P. This is solvable iff AAAC = C for some I-inverse AA of A and the general solution is X = ARC + (I - AAA)W, where W is arbitrary.
5.1 The (I }-Inverse
201
COROLLARY 5.3
Consider the matrix equation XB = C, where X is n-by-p, B E Cex9, and C E C"xy. This equation is solvable iff CBSB = C for some I-inverse BS of B and then the general solution is X = CBS + W(1 - BBS), where W is arbitrary.
COROLLARY 5.4
Consider the matrix equation AX = 0, where A E C'"x" and X is n-by-p. Then this equation always has solutions since the consistency condition AAS®1S1 = 0 evidently holds for any 1-inverse of A. The solutions are of the form X = (I - ASA)W, where W is arbitrary. COROLLARY 5.5
Consider the matrix equation X B = 0. This equation always has solutions since the consistency condition evidently holds. The solutions are of the form X = W (l - B BS ), where W is arbitrary and BS is some 1-inverse of B. COROLLARY 5.6
Consider a system of linear equations Ax = c, where A E C"'x" x is n-by-I and c is m-by-1. Then this system is solvable iff AASc = c for any AS in At I )
and the solutions are all of the form x = ASe + (1 - ASA)w, where w is arbitrary of size n-by-1.
We recall that for a system of linear equations Ax = b, A', if it exists, has the property that A-' b is a solution for every choice of b. It turns out that the 1-inverses of A generalize this property for arbitrary A. THEOREM 5.2
Consider the system of linear equations Ax = b, where A E C'"x". Then Gb is a solution of this system for every b E Col(A) iff G E A(1). PROOF Suppose first Gb is a solution for every b in the column space of A. Then AGb = bin Col(A). Now b = Ay for some y so AGAy = Ay. But y could be anything, so A G A = A. Conversely, suppose G E A( 11. Take any b in Col(A). Then b = Ay for some y. Then AGb = AGAy = Ay = b. 0 As Campbell and Meyer [C&M, 1979] put it, the "equation solving" generalized inverses of A are exactly the ]-inverses of A. Next, we give a way of generating 1-inverses of a given matrix. First, we need the following.
202
Generalized Inverses
THEOREM 5.3
If S and T are invertible matrices and G is a I -inverse of A, then T GS-' is a 1-inverse of B = SAT. Moreover, every 1-inverse of B is of this form.
PROOF First B(T-'GS-')B = SAT(T-'GS-')(SAT) = SAIGIAT = SAG AT =SAT= B, proving T -' G S-' is a I -inverse of B. Next, let K be any 1-inverse of B, that is B K B = B. Note also that S-' BT -' (T K S)S-' BT -'
S-'BKBT-' = S-'BT-'. But S-'BT-' = A so if we take G = SKT, then AGA = A(SKT)A = A. That is, G is a I-inverse of A and K = T-'GS-'as claimed.
0
THEOREM 5.4 (Bose)
If A E C,' "' there exist invertible matrices S and T with SAT = A matrix G is a I -inverse of A iff G = T N'S, where N"
Ir
I#
Y
X
W ] where
X, Y, and W are arbitrary matrices of appropriate size.
Suppose SAT = [
PROOF
G=T
Y
X
r ®] , where S and T are invertible. Let
] S, where X, Y, and W are arbitrary matrices of appropriate
size. Then, AGA = AT [ X W ] SA =
0jT'T1 Jr
S-'1 0 S
[® ®] [X
W jSS-I1
] [®
W
®
®]T-
®1
T'
'r Y S[0 ®] [®
0T-'
= S-' [ r0 ®] T-' = A. Conversely, ifGisal-inverseofA,then T-'GS-' is a I -inverse of S-' AT -'' by the previous theorem. That is, N = T' GS-' is a I -inverse of [
®
®]
.
Partition N =
[M W I
, where M is r-by-r.
and also equals ® ®] [ X W ] [ ® ®] = [ 0 ] [ ® ®] Therefore M = Ir. Thus G = T NS = T 0[ X W ] S. 0
But [
11
.
Next, we consider an example.
203
5.1 The (11-Inverse
Example 5.l 0
1
Let A =
0
0 0 . Then A = A+. To find all the 1-inverses of A, we 0 0 0 compute A+ + W - A+A W A A+ for an arbitrary W : 1
0
1
00
xy
1
u
0+
000 0
z
0
I
W
r
s
t
1
r s t Thus A{ 1)
.
l
0 0
-r -s
0 0 I
0 0
O
z
1
w
0
0
xy
00
1
0
0
1
r s t
0
z
0
1
w
r
s
t
0 0 0
1
0
0
0 0
0
0
1 0=
z, w, t, r, s are arbitrary}.
, ABA =
,and!-AAB=
z
uw
u
000 1
We find that AA8 = 0 0
z
vw-
0 1
0
1
0
0
1
r
s
0
0 0
, 1 - ABA =
-z -w 1
We introduced you to the Hermite echelon form for a reason. It helps to generate 1-inverses, indeed, nonsingular ones.
THEOREM S.S
Suppose A is n-by-n and S is nonsingular with SA = HEF(A) := H. Then S is a 1-inverse of A.
PROOF
The proof is left as an exercise.
U
THEOREM S.6 [Bose, 1959] Let H E (A*A){ I). Then AHA* = A(A*A)+ = AA+. In other words, AHA* is the unique Hermitian idempotent matrix with the same rank as A.
PROOF WeknowA*AHA*A = A*AsoAHA*A =A and A*AHA* = A*. Let G1 and GZ be in A*A{l }. Then AGZA*A = AGZA*A so that AGiA* = 0 AG2A*. This proves the uniqueness. The rest is left to the reader. In 1975, Robert E. Hartwig published a paper where he proposed a formula for the I-inverse of a product of two matrices. We present this next.
204
Generalized Inverses
THEOREM 5.7 [Hartwig, 1975]
For conformable matrices A and B, (AB)A = B,'AA - B4(1 - AAA)j(1 BB9)(1 - AAA)]9(1 - BBA)AA. It would he reasonable to guess the same formula works for the MP-inverse; that is,
(AB)+ = B+A+ - B+(1 - A+A)[(1 - BB+)(1 - A+A)]+(I - BB+)A+. 1
If we check it with our previous example A = I
0
and B =
1 J
to
I
0
0
1
1
1
[
, we find (AB)+ # B+A+. However, the right-hand side computes
-1
-1 9 2
9
9
9
I I
3
3
3
3
2 3
2
-I
=
3
3
3
3 1
3
-
3
-1
1
[
3
1
0
1
0
columns, so A+A = 1, A*A = [
2
A+ _ (A*A)-IA* = 1
[
1
0
0
1
1
_ 0
I[
1
2
1
l
-I
3 -I
3
3
-I
I
3
3
3
Now A has full
J.
*4_1 -;
_ 21
21
11and
-1
1
2
1
].AlsoB*B=E1JB+=
.
(AB)*(AB) = [21, and (AB)+ _ [(AB)*(AB)]-'(AB)* = ever, B+A+ = [1011 [
3
I
°
J
1] =3 [ -1
(B*B)-1 = [l0] and AB =
3
I
3
I
and B = [ 2I
1
2
-l
-I -1
-I
I
I
1
1 I
1
I
= (AB)+. So the formula works! Before
2
1
you get too excited, let A =
2
I
I
3
-I
I
I
3
-I 3 -I
Z[101].
How-
21
21
1
= '[2 -111 # (A B)+. It would he nice to have necessary and sufficient conditions for the formula to hold, or better yet, to find a formula that always works.
S. / The (I }-Inverse
205
Exercise Set 19
4 -100
1
is a 1-inverse of A = 0 000 0 0 00 Can you find some others? Can you find them all?
1. Verify that G
A
2. Suppose A E C"" and A=Az A 11 [
r
r-by-r invertible. Show that G = I
i
AZ1A11 A12 J
l
1
0
4
2
1
0 1-1-44 7
3
5
J, where A 1 1 is
®1
®J
is a 1-inverse of A.
3. Suppose Gland G2 are in A( 1) for some matrix A. Show that (G 1 +G2) 2 is also in A (1).
4. Suppose G 1 and G2 are in A (I } for some matrix A. Show that XG 1 + (1 - X)G2 is also in A (1). In other words, prove that A (1) is an affine set.
5. Suppose G 1, G2, ... , Gk are in A 11) and X 1, X2, ... , Xk are scalars that sum to 1. Argue that X1 G 1 + X2G2 + + I\k Gk is in All).
6. Is it true that any linear combination of 1-inverses of a matrix A is again a 1-inverse of A?
7. Let S and T be invertible matrices. Then show T-1 MS-1 is a 1-inverse of SAT for any Ax in A{ 1). 8. Argue that B is contained in the column space of A if and only if AA8 B =
B for any A8 in All). 9. Argue that A{l} = (A++ [1 - A+A}W21 + W12(1 - AA+) I W21, W12 arbitrary). 10. If G E A{1), then prove that GA and AG are both idempotent matrices.
11. If G E A{1), then prove rank(A) = rank(AG) = rank(GA) = trace(AG) < rank(G). Show rank(I -AG) = m - rank(A) and rank(I -
GA) = n - r(A). 12. Suppose A is n-by-n and H = HEF(A). Prove A is idempotent if and only if H is a 1-inverse of A.
Generalized Inver.e.s
206
13. If G is a I-inverse of A, argue that GAG is also a 1-inverse of A and has the same rank as A. 14. Show that it is always possible to construct a 1-inverse of A E C,'"" that has rank = min{m, n}. In particular, argue that every square matrix has an invertible 1-inverse.
15. Make clear how Corollary 5.1 to Corollary 5.6 follow from the main theorem.
16. Prove Theorem 5.5. 17. Prove Theorem 5.6.
18. Suppose AGA = A. Argue that AG and GA are idempotents. What direct sum decompositions do they generate?
(1)? What is / (I }? Remember the matrix units E11 E 19. What is (C"'"? What is Eij 11)? 20. Is A{2, 3, 41 ever empty for some weird matrix A?
21. Find a 1-inverse for
1
0
2
3
0
1
4
5
0
0
0
0
0
0
0
0
22. Show that A E C, "" can have a {I }-inverse of any rank between r and
min fm, n}. (Hint: rank ®
D
r + rank(D).)
23. Argue that any square nonsingular matrix has a unique 1-inverse. 24. Suppose A E C""" and G E A { l ). Argue that G* E A* (1).
25. Suppose A E Cand G E A{ I} and h E C. Argue that X+G E (XA)(1) where, recall, k+ =
0
x-'
if X= 0
if x# 0
26. Suppose G E A (1). Argue that GA and AG are idempotent matrices that have the same rank as A.
27. Suppose G E A{ 11. Argue that rank(G) > rank(A). 28. Suppose G E All). Argue that if A is invertible, G = A'. 29. Suppose G E A{1} with S and T invertible. Argue that, T-'GS-' E (SAT) { 1).
5.1 The { I }-Inverse
207
30. Suppose G E A( I). Argue that Col(AG) = Col(A), Null(GA) _ J/ull(A), and Col((GA)*) = Col(A*). 31.
Suppose G E A(1 ), where A E C"'. Argue that GA = I iff r = n iff G is a left inverse of A.
32.
Suppose G E A{ 1), where A E C"""". Argue that AG = I iff r = m iff G is a right inverse of A.
33. Suppose G E A(l) and v E Null(A). Argue that G, _ [g, rr
34. Let A E C"I x" . Argue that G E A (1) iff G = S L® invertible S and T.
0
L
I
I
11
T for some J
35. LetGEA(l). Argue that H = G+(W -GAWAG)isalsoa I-inverse and all 1-inverses of A look like H.
36. (Penrose) Prove that AX = B and X C = D have a common solution X iff each separately has a solution and AD = BC.
37. Suppose G E Al I). Prove that G + B - GABAG E A{1} and G + AG) = (1" - GA)C E Al I). 38. Suppose G E A(1). Prove that rank(A) = rank(GA) = rank(AG) and rank(G) = rank(A) iff GAG = G.
39. Suppose G E A*A{1). Prove that AGA*A = A and A*AGA* _ A*.
40. Suppose BAA* = CAA*. Prove that BA = CA. 41.
Suppose G E AA*{1} and H E A*A{1}. Prove that A = AA*GA, A = AHA*A, A = AA*G*A, and A = AH*A*A.
42. Suppose A is n-by-m, B is p-by-m, and C is n-by-q with Col(B*) c Col(A*) and Col(C) e Col(A). Argue that for all G in A( 1), BGC = BA+C. 43.
Suppose A is n-by-n, c is n-by-I, and c E Col(A) fl Col(A*). Argue that for all G in A{1}, c*Gc = c*A+c. How does this result read if A = A*?
44. Find matrices G and A such that G E A 11) but A V G { I).
Generalized Inverses
208
Further Reading [Baer, 1952] Reinhold Baer, Linear Algebra and Projective Geometry, Academic Press, Inc., New York, (1952). [B-I&G, 20031 Adi Ben-Israel and Thomas N. E. Greville, Generalized Inverses: Theory and Applications, 2nd Edition, Springer, New York, (2003).
[B&O, 1971] Thomas L. Boullion and Patrick L. Odell, Generalized Inverse Matrices, Wiley-Interscience, New York, (1971). [C&M, 19791 S. L. Campbell and C. D. Meyer, Jr., Generalized Inverses of Linear Transformations, Dover Publications, Inc., New York, (1979). [Sheffield, 1958] R. D. Sheffield, A General Theory For Linear Systems, The American Mathematical Monthly, February, (1958), 109-111.
[Wong, 1979] Edward T. Wong, Generalized Inverses as Linear Transformations, Mathematics Gazette, Vol. 63, No. 425, October, (1979), 176-181.
5.2
{1,2}-Inverses
C. R. Rao in 1955 made use of a generalized inverse that satisfied MPI and MP2. This type of inverse is sometimes called a reflexive generalized inverse. We can describe the general form of 11,2)-inverses as we did with 1-inverses. It is interesting to see the extra ingredient that is needed. We take the constructive approach as usual.
THEOREM 5.8 Let A E C"' -.There are matrices S and T with SAT = L
1,
G is a {1,2}-inverse of A if and only if G = T NMS, where N" =
where X and Y are arbitrary of appropriate size.
g
]. A matrix 11
Y X XY
5.2 {1,2}-beverses
209
PROOF First suppose G = TN"S as given in the theorem. Then G is a 1-inverse by Theorem 5.3. To show it is also a 2-inverse we compute GAG =
GS-' I T [
®g
] T-'G = TNT
XXY
] [
1,
Y
00
f ® ®]
Xr X Y
]
N'S =
S-T [
'r X
'r Y Y ®X XX
S = T I X XY ] S = G. Conversely, suppose G is a {1,2}-inverse of A. Then, being a 1-inverse, we know G = TN"S, where N" _ [ X W ] . But to be a 2-inverse, G = GAG =
XW XW]
T [
] SS-1 [ S
=
11
0 T
0
O ] T-IT [
[Jr
XW
IS
-
T[
X
0]
XY ] S. Comparing matrices, we see
W=XY.
0
Exercise Set 20 1. Suppose S and T are invertible matrices and G is a {1,2}-inverse of A. Then T-'GS-' is a {1,2}-inverse of B = SAT. Moreover, every { 1, 2}-inverse of B is of this form.
2. (Bjerhammar) Suppose G is a 1-inverse of A. Argue that G is a 11, 2}-inverse of A iff rank(G) = rank(A). 3. Argue that G E Ail, 2} iff G = G 1 A G2, where G1, G2 E Ail).
4. Argue that G = E(HAE)-' H belongs to A {2}, where H and E are selected judiciously so that HAE is nonsingular.
5. Argue that G = E(HAE)+H belongs to A {2}, where H and E are chosen of appropriate size.
6. Suppose A and B are {1,2}-inverses of each other. Argue that AB is the projector onto Col(A) along Null(B) and BA is the projector of Col(B) along Null (A).
7. Argue that G E A{1, 2} iff there exist S and T invertible with G =
Sr
00
JTandSAT=
B
L00
].
210
Generalized Inverses
Constructing Other Generalized Inverses
5.3
In this section, we take a constructive approach to building a variety of examples of generalized inverses of a given matrix. The approach we adopt goes back to the fundamental idea of reducing a matrix to row echelon form. Suppose G
A E C;' x". Then there exists an invertible matrix R with RA =
. .
,
where
G has r = rank(A) = rank(G). Now G has full row rank and, therefore, we know G+ = G*(GG*)-I. Indeed, it is clear that GG+ = Jr. Now let's define As, = [G+ + (I - G+G)X I V]R, where X and V are arbitrary matrices of appropriate size. We compute G
G
...
AAR A = R-'
...
[G+ + (I - G+G)X:V JRR-'
_
0
G
...
= R-1
[G+G + (I - G+G)XG + 01 _
10 1
R
G+G+(!-G+G)XG+0 ...
'
G
= R-'
...
= A. We see A"'
is a (I)-inverse of A. For example,
let A= G
113
1
6
'
1 /5
AA -
Then I
13
I
] [
2 ],GG*=151,G+=L 2/5 4/5
1 /5
2/5
J.
+
r ]
[
u
'
Su:X
5
5
1
[ -3
0 1
]
-
y 1
[ -3
0 1
]
5
5+ 5-2u-3x x _ s- Tr +5 3Y Y 2
x
]
2/5
+5 2-L+y
]so
) Then
1 /5
-2/5 1 /5
0
J,G+G= -2/5
-2/5
4/5 [ -2/5
] = [ 0
2/5
4/5
! - G +G =
6
3
],where r, u, x, and y can be chosen arbitrarily.
The skeptical reader can compute AAA' A directly and watch the magic of just the right cancellations in just the right place at just the right time. Suppose in our 0 example it = r = I and x = y = 5 . Then A91 = . However, 1
2/5
51
5.3 Constructing Other Generalized Inverses 1
A AXE _ [ 3
Mi A=
2
0
1/5
6
-2/5
1 /5
0
1 /5
-2/5
1/5
2 6
1
_
-45
211
3/5
and
-12/5 3/5
9/5 6/5
] _ [
2/5 1 /5 1 neither of which is symmetric. Also Ari AA91 _ 9 0 1 /5 -4/5 3/5 _ -12 3/25 1 # A91 . 25 [ -2/5 1/5 ] [ -12/5 9/5 ] [ 4/25 So this choice of u, r, x, and y produces a (I }-inverse of A that satisfies none of the other MP-equations. In order to gain additional equations, we make some special choices. As a next step, we choose X = 0 in our formula for AR' to produce
11
3
-
AR'14 =
[G+: V] R. As you may have already guessed from the notation,
we claim now to have added some symmetry, namely equation MP-4. Surely MP-1 is still satisfied, and now
G
1
ARM A =
[G+ : V] RR-'
= G+G + V O = G+G, which is evi-
dently a projection. From our example above, AR4 1/5 - 3x x
2/5 - 3y
y
]'
Continuing with ourchoice ofx = y = 1/5 we have Axw = , 1
is in A (1 , 4) . However AAX 4 =
3
-2/5
is not symmetric and A914 A A914
_
-4/25 8/25
3/25 6/25
-2/5 -1/5
2 6
1 /5 1 /5
1 /5 1
1/5
[
_
-
1/5 1/5
4/5
3/5
-12/5 9/5
[ -4/5 -12/5
1 /5
-2/5
-1/5 3/5 9/5
Axj.
So AK14 satisfies MP-1 and MP-4 but not MP-2 or MP-3.
All right, suppose we want to get MP-2 to be satisfied. What more can we demand? Let's let mathematics be our guide. We want Akw
=AX4AA84 _
[G+V RR-'
G
1
[G+: V]
R
J
G.
= [G+ : V]
.
[G+ : V ] R = [G+G + 0] [G+ :
VR
= [G+GG+ G+G V ] R = [G+ : G+G V ] R. We are close but we badly need G+GV = V. So we select V = G+W, where W is arbitrary of appropriate size. Then G+GV = G+GG+W = G+W = V as we desired.
212
Generalized Inverses
Now A9124 _ [G+G+ W ] R will he in A { 1, 2, 4} as the reader may verify.
Continuing with our example,
/5 ] [a] = [ 2a/5
[
AR
24
a15
21/5
G+W [
1/5
a15
2/5
2a/5
1 SO
_
0
1
] [ -3
1
] 1/5
115[
a/5
2/5-
2a/5 j' Choosing
65
2/5 ] E A { 1, 2, 41. However, AA11a = 1, we get A9124 = [ 4/5 2/5 -10/5 5/5 _ -2 -6 3 j is not symmetric, so AA124 ¢ A 13). -30/5 15/5 ] 1
We are now very close to A+ = AAI2;+. We simply need the AAsI24 to be sym-
metric. But AA8124 =
1
1/5
2
[ 3 6
-3a/5 a/5
_
] [ 2/5 -6a/5 2a/5 j
I
-
3
-3a
a
-9a 3a
We need 3 - 9a = a or I0a = 3, so a = 3/ 10. The reader may verify that 21/50 3/50 It is not at all clear how to get A+ by making another A+ = [ /50 6/50 ] C special choice for W, say. We have AMI24 = R-1 R-1
[®®]
[G+ G+W] R =
.
R. The simple minded choice W = ® does not work, as the
reader may verify in the example we have been running above. However, we sort of know what to do. Our development of the pseudoinverse through full G
G+ _
rank factorization says look at R = AG+. Then F = R-' R
GG+ ®
R,
:
R,
= R1, which is just the first r
]
columns of an invertible matrix R-'. Hence, F has full column rank and so F+ = (F*F)-l F*. Get A+ from G+F+. In our example,
A+G+ = [
3
[ I U], so F+ _ 10
6
j[
21/5
/5
j = [
3
], so F* F = [
13 ] . Now G+ F+ _ [ 2/5 ] o [1 3 ]
Let us summarize with a table.
TABLE 5.1 G
1. Given A in
form RA
Then Am,
10
_
3][
1
_
i
] _
21150
[
/50 6/50 ]
213
5.3 Constructing Other Generalized Inverses
[G+ + (I - G+G)X :
V1
R E A { I }, where X and V are arbitrary
JJ
and G+ =
G*(GG*)-'.
2. AA14 = [G+: V] R E A {1, 4), where X was chosen to be 0 and V is
still arbitrary. 3.
[G G+W]
A9'24 =
where W is arbitrary.
R E A 11, 2, 4}, where V is chosen as G+ W,
J
4. Ab123' = G+F+, where F = AG+ and F+ = (F*F)-I F*. Now let's play the same game with A* instead of A. Let's now reduce A* and be a little clever with notation F*
F*
S*A* _
...
, so
A* = (S*)-i
r
and A = I F : 0] S-1. Now F
..
0
0
J
L
has full column rank, so F+ = (F*F)-1 F* and clearly F+F = I. Now we take
F+ + X(I - FF+)
AK1 = S
,
where X and V are arbitrary of appro-
V
F++X(I - FF+) [F : ®]S-'
priate size. Then AAx' A = [F ®]S-'S V
= [FF+ + FX(I - FF+)] [F : ®] S-' = = [FF+F+0: ®]S-' = [F ®]S-1 = A so Asi E A(l). Continuing with our example,
A = f 3 6 ]soA
Taking*,L1 11
J
L
r
1
0
r
-2 1
0
3
0
r
1
Now F
0
OI
1
121=
6J [0
L
A
*2 6, and r
1
3
]
[
2
6
]
-[0
,so
F*= [
1
3
L
F*F=[ 1 0],F* and I
- FF+ =
1/10
3/10],FF*
9/10
-3/10 1].ThusA1
-3/10
1/10
1/10
3/10
3/10
9/10
0,
214
Generalized Inverses
-2
f1 10
...................................................
]
1
x
L
J
10
x
Choosing r = r5/10 3/10 1 L
y
- io -2x 3/10-3r/IO+ L' -241
1/10+ io L
9/l0 -3/10 -3/10 I/10 J
1r u
i0 1+
io
1
1 / 10 - 1/ I 0 J
u=
and
I
y
J
x=
1 / 10, y = -1/10 we get A^ 1
is a { I)-inverse of A that satisfies none of the other MP.
tspecial choice X = 0, we find
equations. Making F+
[F : ®]S-'S
AAkii
= FF+, which is a projection. So, in our example, I
AR 1 = r
A9 + = [
10
1
2x
10
- 2y 1 , a (1, 3)-inverse of A. With x and y as above,
5/1/1010
/11/010
1
is in A (1, 3) but not A (2, 41 . Reasoning as be-
F+ fore, we see the special choice V = WF+ yields AM121 = S
.
.
as a
.
WF+ r
2a
I
a = ],we have L
1/
(w
3
11, 2, 3)-inverse of A. In our example, AA 124 = L 91010 10
3u/ 1
pn
1
so, with
/I O 3-3/010 1 as a specific { 1, 2, 3)-inverse ofA JA. To get Si
1
11
At we look at G = F+A = F+ F 01 S-1 = [F+F : ®J L
J
= S1. S2
which has full row rank so G+ = G* (GG*)-l The reader may verify our example reproduces the same A+ as above. We summarize with a table.
TABLE 5.2 J
1. Given A in C;""', form A* = (S*)-i Then
...
to get A = [F ®]S '.
F+ + X(/ - FF+)
A91 = S
E A (1), where X and V are arbiV
trary of appropriate size and F+ = (F*F)-1 F*. F+
A 11, 3), where X was chosen to be 0 and V is
2.
V
still arbitrary.
5.3 Constructing Other Generalized Inverses
215
F+ 3.
Ag123 = S
.
E A (1, 2, 3}, where V is chosen as W F+, where
W F+
W is arbitrary. 4.
A91234 = A+ = G+F+, where G = F+A and G+ = G*(GG*)-I.
We indicate next how to get generalized inverses of a specified rank. We use the notation A {i, j, k, 11s for the set of all {i, j, k, 1) inverses of ranks. We begin with {2}-inverses.
THEOREM S.9 (G. W. Stewart, R.E. Funderlie)
Let A E Cn"' and 0 < s < r. Then A{2}, = {X I X = YZ, where YEC"' ZEC!"ZAY=I,). Let X be in the right-hand set. We must show X is a {2}-inverse of A. But XAX = YZAYZ = YIZ = YZ = X. Conversely, let X E A{2},. Write X = FG in full rank factorization. Then F E C;" and G E C;"'". Then PROOF
X = XAX so FG = FGAFG. But then, F+FGG+ = F+FGAFGG+ so 1,. = CAF.
0
COROLLARY 5.7
LetAE(C',!
PROOF
Then All, 2}={FGI FEC"xr,GECr"GAF=1r}. A{1, 2} = A{2}r.
0
COROLLARY S.8
If GAF = 1, then G E (A F)(1, 2, 4).
THEOREM 5.10
LetAEC"
ando<s
Then AX = AY(AY)+, so PROOF Let X = Y(AY)+, where AYE (AX) = (AX)* Also, XAX = Y(AY)+AY(AY)+ = Y(AY)+ = X. Moreover, = rank(AY) = rank(AX) = rank(X). Conversely, suppose X E A{2, 3},.
Then AX is a projection of rank s. Thus (A))+ = AX and so X (AX)+ = XAX = X, and X plays the role of Y.
0
216
Generalized Inverses
THEOREM 5.11 Let A E C11'-" and 0 < s < r. Then A{2, 4), _ {(YA)+Y I YA E C's: "}.
PROOF
The proof goes along the lines of the previous theorem and is left as an exercise. 0 Researchers have found uses for generalized inverses of type { 1, 2, 3} and 11, 2,41 (see Goldman and Zelen [G&Z, 1964)).
Exercise Set 21 1. Argue that A(], 2, 3,41 e AI1, 2, 3) g A{ 1, 2) e A(1), with equality holding throughout if and only if A is invertible.
2. Suppose G is a 11, 2,31-inverse of A. Argue rank(G) = rank(Al) _ rank(A).
3. Argue that the following statements are all equivalent:
(i) A*B = 0. (ii) GB=0,where G e A{I,2,3}. (iii) HA = 0, where H E B(l, 2, 3}. 4. Argue that a matrix G is in A(1, 2, 3) if and only if G = HA*, where H is in A*A{l}.
5. Argue that a matrix G is in A{ 1, 2, 4} if and only if G = A*H, where H is in AA*{ 1). 6. Construct various generalized inverses of A
1
L -1 L
2
1
i
0
1
2i
I.
7. Let B = A*(AA*)912. Argue that B is a (I, 2, 4)-inverse of A. 8. Let C = (A*A)yl2 A*. Argue that C is a 11, 2, 3)-inverse of A.
9. Let B E A { 1, 2, 4} and C E A 11, 2, 3). Argue that BAC = A+. Is it good enough to assume B E A 11, 4) and C E A (1, 3)?
B ®J , where B E C"" Is ZX1 invertible. Then let G = TNS, where N
10. Suppose A E C""" and SAT =
I
Y
W
5.4 12)-inverses
217
Then argue
(i) GEA{I}iffZ=B-'. (ii) G E A {1, 2} iffZ = B-' and W = YBX. (iii) G E A { 1, 2, 3} iffZ = B-1, X = -B-1 S, S2, and W = -YS, S2, S,
where S = S2
(iv) G E A 11, 2,41 iffZ = B-' ,Y= -T2 + T, B-1, and W = -T2 + T, X, where T = {T,:T2J.
(v) G = A+ iff Z = B-', X = -B-' S, S2 , Y = -T2 T, B-1, and W =T2+T,B-'S,S2. (vi) LetG E A*A{I}.Argue that GA* E A{1,2,3}.LetH E AA*(1). Argue that A* H E A { 1, 2, 4} .
11. (Urguhart) Let G E A{ 1, 4} and H E A ( I, 3). Prove that G A H = A +.
5.4
{2}-Inverses
In this section, we discuss the 2-inverses of a matrix. The problem of finding 2-inverses is a bit more challenging than that of describing 1-inverses because it is a "quadratic" problem in the unknowns. To understand what this means, let's look at the 2-by-2 case. Given A
a c
d
J,
find X
u
v J such that
XAX = X. That is, solve
u v I = I u v I for Lu v ] I c d x, y, u, and v. The reader may check that this amounts to solving the following equations for x, y, u and v
x = x2a + ycx + xbu + ydu y = xay + y2c + xbv + yd v
u =uax+vcx+u2b+vdu v = uay + vcy + ubv + v2d, which are quadratic in the unknowns. The reader may also verify (just interchange letters) that solving for 1-inverses is linear in the unknowns. We will approach 2-inverses using the rank normal form. First, we need a theorem.
THEOREM 5.12
Let A E C" and suppose S and T are invertible matrices of appropriate size. Also suppose X is a 2-inverse of A. Then T -I X S-' is a 2-inverse of SAT. Moreover, every 2-inverse of A is of this form.
218
Generalized Inverses
PROOF Let S and T be invertible and let X he a 2-inverse of A. We claim T-'XS-1 isa2-inverse of SAT. To prove this, we compute T-'XS-I(SAT)T-1 XS-' = T -' X AX S-' = T -' X S-' , since XAX = X by assumption. Conversely, let K be any 2-inverse of SAT. Then K(SAT)K = K. Note that
TKSS-'SATT-'TKS-t = TKSATKS = TKS, so if L = TKS, then LAL = (TKS)(S-'SATT-')(TKS) = TKSATKS = TKS = L. In other words, L is a 2-inverse of A and K = T-' LS-1, as claimed.
0
The next theorem shows the structure that every 2-inverse takes relative to the rank normal form of its matrix.
THEOREM 5.13 [Bailey, 2002] 1
Let A E C;""'. Suppose RNF(A) = SAT = [ ® ®] for appropriate invertible matrices S and T. A matrix X is a 2-inverse of A if and only if
W WY J
X = T RS, where R
where M is idempotent of size r-by-r
and M is a left inverse of Y and right inverse of W, where all these matrices are of appropriate size.
PROOF
Suppose SAT = [ ® ®] , where S and T are invertible, and let ]S,whereM=M2,My=Y,WM=W,and
X=TRS=T W WY
M
M, W, Y are all of appropriate size. Then XAX = T I
SAT[W WY]S
T[W WY]SS'[0 ".
T[W
][0M2®][W WY]S
S
[W
WY
IS
WY
-
T[WM
M
WY
IS
-
Y
1
W WY J 0']T-'T[
T[W
W WY
T[W M WY
0] CD
]S = X.
Conversely, suppose X is a 2-inverse of A. Then R = T -' X S-' is a 2-inverse of SAT = [ ® 11 j by the previous theorem. Partition R = [ W 1
Z
WZ
], ]
where M
-[
W0
] [
® is
r-by-r.
WZ
]
Then
-[
[
M
ZJ
WMWY ]
-[
[®®, W
Z]
219
5.4 {2}-Inverses
The conclusions follow by comparing blocks (i.e., M = M2, MY = Y,
WM=W,andZ=WY).
0 1
Let's look at an example. Let A =
2 1
2
3
4 2
6
4 7
3
6
.
By performing ele0 0 0
1
mentary row and column operations on A, we find RN F(A) =
0
1
=
0 0
0 0 0 0 1
-7
0 0
4
-I
2
-5
2
0
A
0 0
1
0
-2
-3
0 0
I
0
0 0
1
According to th e previou s
1
0 1
1
2
2
theorem, we need a 2-by-2 idempotent matrix M. So choose M =
Next we need Y such that MY = Y. That is,
I 2
r
1
L
Y2 1 Y,
=I
i 2
L
Y2 Y,
The reader may verify that this equation implies y, = y2, so we might just as well take them to equal 1. We also need WM = W; that is, we need [wl w2]. Once again the reader may verify this en-
Iwl w2] 2
2
tails that w1 = w2; again why not use I? Now we can put R together; R = I
I
2
2
2
! 2
1
1
2
1
2
Finally we c ompute TRS, which will be a 2-inverse of A. 135 2
TRS =
-15 _ 2 15
_L _9 2
7 7
2
.
The final verification left to the reader is that
1
2
this matrix is indeed a 2-inverse of A as promised. Next, we look at the connection of the 2-inverse to other generalized inverses of a matrix. Let's fix an m-by-n matrix A of rank r over C. Note that if we have a 2-inverse X for A, then rank(X) = rank(X AX) < rank(A) = r. Now choose arbitrary matrices E in Cnxk and H in Ckx"' and form the k-by-k matrix HAE. Suppose we can find a2-inverse of HA E, say (HAE)x2. For example, (HAE)+
would certainly be one such. Then, if we form X = E(HAE )112H, we find XAX H) = E((HAE)A2)(HAE)((HAE)A2) H = E ((HAE)K2) H = X. In other words, X is a 2-inverse of A. But do all 2-inverses look like this? Once again we appeal to full rank factorizations of A. Suppose A = FG is a full rank factorization of A. Then our X above looks like X = E ((H FG E)R2) H. Note H F is k-by-r and G E is r-by-k.
220
Generalized Inverses
THEOREM 5.14 Suppose k = r =rank(A). Then HAE is invertible iff HF and GE are invertible.
PROOF Note that HF, GE, and HAE are all r-hy-r, and HAE = (HF)(GE). Thus det(HAE) = det(HF)det(GE). The theorem follows from this formula and the familiar fact that a matrix is invertible if it has a nonzero determinant. D THEOREM S.1 S
Let k = r = rank(A) and choose H and E so that HAE is invertible. Then X = E(HAE)-' H is a {/, 2}-inverse of A. PROOF
We already know X is a 2-inverse of A. We compute, AX A =
AE(HAE)-'HA = AE(HFGE)-'HA = AE(GE)-'(HF)-'HA = FGE(G E)-' (H F)-'HFG = FG = A, making X a 1-inverse as well.
D
THEOREM S.16 With the hypotheses of the previous theorem, add that we choose H = F*. Then X is a (1, 2, 3}-inverse of A.
We need only that (AX)* = AX. But AX = AE(HAE)-'H = AE(F*FGE)-' F* = (FGE)(GE)-'(F*F)-' F* = FF+, which we know PROOF
to be self-adjoint.
D
THEOREM S.17
In the previous theorem, choose E = G* instead of H = F*. Then X is a (1, 2, 4)-inverse of A.
PROOF Weonlyncedthat(XA)* = XA.ButXA = E(GE)-'(HF)-l HFG = E(GE) -' (H F)-'H FG = E(G E)-' G = G*(GG*)-' G = G+G, which we know to be self-adjoint.
0
THEOREM 5.18 In the previous theorem, choose both H = F* and E = G*. Then X = A+.
PROOF
In this case, X = G*(GG*)-'(F*F)-' = G+F+ =A+.
D
Can we find a way to control the rank of a 2-inverse of a given matrix? The next theorem gives some insight into this question.
5.4 {2}-Inverses
221
THEOREM 5.19
Let A E C;'"". Suppose 0 < s < r. Then 1. The rank r 2-inverses of A are exactly the (1, 2)-inverses of A. 2. If 0 < s< r, the ranks 2-inverses of A form the set { Y Z I Y E C""' , Z E
CS", and ZAY = 1, }. PROOF The proof of Theorem 5.19 (1) is left as an exercise. For (2), let X he a rank s 2-inverse of A. Let X = YZ be a full rank factorization of X so
that then Y E C"" and Z E C;"". Now YZ = X = XAX = YZAYZ. But Y+Y = 1, = ZZ+, so 1, = ZAY. Conversely, let X = YZ, where ZAY = /,. 0 Then XAX = YZAYZ = YIZ = X. We have avoided a serious issue until now. Above we wrote 2-inverses as X = E(HAE)-' H. But how did we know we could ever find any matrices E and H so that HAE is invertible? We have the following theorem. THEOREM 5.20
Let A E C'"". Then X is a 2-inverse of A if and only if there exists E and H where E has full column rank, H has full row rank, Col(X) = Col(E), Col(X*) = Col(H*), HAE is invertible, and X = E(HAE)-' H. Before we leave this section, we can actually determine all 1-inverses of a given rank. Suppose G E Ail }, where A E Cn'". We know r < rank(G) < minim, n). We claim all the rank s I-inverses of A are in the set (YZ
Y E C""', Z E C;""', where ZAY =
I
®
(D
}. Indeed, suppose G
belongs to this set. Then G = YZ and rank(G) = s. Partition Y with a block of its first r columns and Z with its first r rows. Say Y = [Y,
I
Y2]
Z,
and Z =
.
It follows that Z, A Y, = Ir and Z, A Y2 = 0. Let
ZZ
G, = Y,Z,. Then GIAG, = Y,Z,AY,Z, = Y,1ZI = Y,Z, = G so G E A (2). But rank(G I) = r = rank(A) so by (5.8), G, is also a 1-inverse of A. Z,
Thus, AG,AGA = AYIZ,AYZA = AYI[1,
A = AY,ZIA =
1 ®] ZZ
AG, A = A. Conversely, let G be a 1-inverse of A of ranks. Let G = FH be a
full rank factorization of G so F E C' " and H E C;"'". Then HAFHAF = HAGAF = HAF so HAF is an idempotent of rank r. Thus there exists an invertible matrix S with SHAFS-1 _
Ir
®J. Let Y = FS-1, and
222
Generalized Inverses
Z=SH.Then YEC;",ZEC,""'andZAY=SHAFS--' _
I Jr
0 1
YZ = FH = G.
Exercise Set 22 0
-1
0
0 0
0
0
0
1
1. Find a 2-inverse for D =
,
following the examp le
worked above.
2. Find a 2-inverse for N-=
0
1
0
0
0- 0
0 1
, following the exa mple worke d
0
above.
3. Verify the equations for the 2-by-2 case listed at the beginning of this section.
4. Let A = FG he a full rank factorization of A E Cr"". Argue that Gs' Ft' E A{1}, Gx=FR' E A{2), GX4Fb' E A{4}, G91 F92 E A{2}, and GA' F1' E All). Finally, argue that A+ = G+Fx'+ = G914 F+.
5. Suppose A = FG is a full rank factorization of A. Argue that F(G F)-I G and F(GF)+G are 2-inverses of A.
Further Reading [Bailey, 20031 Chelsey Elaine Bailey, The (21-Inverses of a Matrix, Masters Thesis, Baylor University, May, (2003). [Greville, 19741 T. N. E. Greville, Solutions of the Matrix Equation X AX = X and Relations Between Oblique and Orthogonal Projectors, SIAM J. Appl. Math., Vol. 26, No. 4, June, (1974), 828-831.
[Schott, 19971 James R. Schott, Matrix Analysis for Statistics, John Wiley & Sons, New York, 1997.
5.5 The Drazin Inverse
5.5
223
The Drazin Inverse
We have looked at various kinds of generalized inverses, most dealing with the problem of solving systems of linear equations. However, other inverses have also been found to be useful. The one we consider next was introduced by M. P. Drazin [Drazin, 19581 in a more abstract setting. This inverse is intimately connected with the index of a matrix. It is defined only for square matrices and like the MP-inverse, is unique.
DEFINITION 5.2
(Drazin inverse)
Let A E Cnxn with index(A) = q. Then X E C"x" is called a Drazin inverse (D-inverse for short) of A ifX satisfies the following equations:
(D1)XAX = X (D2) AX = XA (D3) Aq+' X = Aq We see that a D-inverse of A is, in particular, a 2-inverse of A that commutes
with A. Well, the zero matrix does that! So it must be (D3) that gives the D-inverse its punch. Let's settle the issues of existence and uniqueness right away.
THEOREM 5.21
Let A E Cnx", with index(A) = q. Then there exists one and only one matrix X E C"11 that satisfies (DI), (D2), and (D3). We shall denote this unique matrix by A° and call it the Drazin inverse of A. PROOF
For uniqueness, suppose X and Y both satisfy (Dl) through (D3).
Then Aq+'X = Aq = Aq+'Y so MAX = MAY. Thus AgXA = AgYA. Then AgXAX = AgYAX so AqX = AgYAX. Now Aq-'AX = Aq-'AYAX
so Aq-'XA = Aq-'AMAX so Aq-'XAX = Aq-'AYAX2 so Aq-'X = Aq-' AYX AX = Aq-'AYX = Aq-' YAX. Thus we have peeled off one factor of A on the left to get Aq'' X = Aq-' YAX . Continue this process until all As are peeled away and conclude X = YAX. A symmetric argument gives
Y = XAY.Next, (D3)implies Aq(AY-1) = 0, soXAq(AY- 1) =Owhence Aq(XAY - X) = 0. Now using (D1), Aq(XAY - XAX) = 0 so Aq(XA) (Y-X) = O, hence X Aq X A(Y - X) =0. Using (D2), Aq-' (X AX)A(Y-X) = ® so, by (DI), Aq-'(XA)(Y - X) = 0, so Aq-'(XAY - XAX) = Aq-' (X AY - X) = 0. Again, this shows we can peel off factors of A on the left and conclude X AY - X = O so X = X A Y = Y by the above.
224
Generalized Inverses
There are various approaches for existence, but since we used the full rank factorization to get the MP-inverse, we do the same for the D-inverse. If A E Cnxn has full rank n, then take A° = A-1 and easily check the three equations
for the D-inverse. It is a fact that the index q of a matrix is characterized by a sequence of full rank factorizations: A = FIG,, G, F, = F2G2, ... .
Then index(A) = ' l F, F2
q if
for the first time. Define A° _
l if GqFq 4F9
. Fq(Gq Fq)-(q+I )GgGq_1
0
GI, when (Gq Fq)-l exists
ifGgFq =
Note, if Gq Fq = 0, then A is nilpotent of index q + 1. It is straightforward to verify that AD so defined actually satisfies (D1) through (D3). 0 A number of facts are easily deduced about the D-inverse.
THEOREM 5.22
Let A E C0", with index(A) = q. Then
I. Col(A°) =Col(Aq). 2. JVull(A°) =AIull(Aq). 3. (AA°)22 = AA° is the projector of C" onto Col(Aq) along A(ull(Aq).
4. (I -AA°)2 = (I -AA°) is the projector onto Afull(Aq)along Col(Aq). 5. Aq+P(A°)1' = Aq. 6.
A''+1 A ° = A" iff p > q, where p and q are positive integers.
PROOF
The proofs are easy and left as exercises.
0
We could have used the core-nilpotent factorization to get the Drazin inverse.
THEOREM 5.23
Let A E C01 --', with index(A) = q > 0. If A = S [ C C-1
nilpotent factorization of A, then A° = S L
(CD
0 11 S-I is a coreJ
0 01 S-'.
PROOF We leave it as an exercise to verify (D I) through (D3).
0
Campbell and Meyer [C&M, 1979 talk about the core of a matrix A using the D-inverse. It goes like this.
5.5 The Drazin Inverse
225
DEFINITION 5.3 Let A E C""". Then AA°A is defined to be the core of the matrix A and is denoted CA; we also define NA = A - CA. THEOREM 5.24
Let A E C""" . Then NA = A - CA is a nilpotent matrix of index q = index(A).
PROOF
If the index(A) = 0, then A = CA, so suppose index(A) > 1. Then
(NA)9 = (A - AA°A)` = [A(1- AA °)]y = Ay(I - AA°)y = Ay - Ay = 0.
Ifp
0
DEFINITION 5.4 (core-nilpotent decomposition) Let A E C"I". The matrix NA =A - CA = (I -A A D)A is called the nilpotent part of A, and A = CA + NA is called the core-nilpotent decomposition of A.
THEOREM 5.25
r
11
Let A E C""" and let A =
S[1 C ®J
of A. Then CA=SL C
0
PROOF
S-' be a core-nilpotent factorization
S-1 and NA=SI ® ®JS-'.
The proof is left as an exercise.
There is some uniqueness here. THEOREM 5.26
Let A E C""". Then A has a unique decomposition A = X + Y, where XY = YX = 0. The index(X) < 1 and Y is nilpotent of index q =index(A). Moreover, the unique decomposition is A = CA +NA.
PROOF
If index(A) =10, then Y = 0 and Ar is invertible. If index(X) = 1,
write X = S I C 0 ®
S-'. Then Y = S L® ®J S-1, since XY =
YX = 0 and C is invertible. Thus Y2 is nilpotent with index(Y2) = index(A). NowA=X+Y=Sf C ® ]s'.sox=cAandY=NA. 0 Y2 L COROLLARY 5.9
Let A E C"". Then CA = CAP, NA = NAP, and AP = CAP + NAP. If p index(A), then AP = C.
226
Generalized Inverses
Next we list additional properties of the D-inverse, leaving the proofs as exercises.
THEOREM 5.27
Let A E C"". Then I if index(A) > I 0 if index(A) = 0
1. index(A) = index(CA) =
2. NACA=CANA=O
3. NAA°=A°NA=O 4. CAAA° = AA°CA = CA
5. (A°)° = CA 6. A = CA iff index(A) < I
7. ((A°)°)° = A° 8. AD = CD 9. (AD)r = (A*)D
10. A° = A+ iff AA+ = A+A. There is a way to do hand computations of the D-inverse. Begin with A E C"". Chances are slim that you will know the index q of A, so compute A. If A" = 0, then A° = 0, so suppose A" # O. Find the Hermite echelon form of A, HEF(A). The basic columns v1, v2, ... , v,. form a basis for Col(Ay). Form I - HEF(A) and call the nonzero columns Vr+i, ... , v". They form a basis for Null(Ay). Form S = [vi v,. I Vr+J v"]. Then S-1 AS I
I
_
CO ON
A D =S
I
I
is a core-nilpotent factorization of A. Compute C-'and form
C
O
O
O1
S-1.
5
4
52
15
5
6
10
1
3
6
For example, suppose A = L
apparent, so we compute that A 4
=
The index of A is not
-6 -6
5974
19570
8446
5562
14214 3708 -16480
31930 26780
-1854
-4326
27192
21012
-24720
20600
19776 J
5.5 The Drazin Inverse
and note HEF(A) =
1
0
0
1
227
-2 -3 3
3
2
0 0
5974
U
U
0
0
3708
19570 31930 26780
-16480
-24720
14214
]. Next, we form I - HEF(A)
0 7
15
-3 -6 2
0
0
5
,
where we have eliminated
0 0some fractions in the last two columns. Now P- I A P = i 2 0 0 0 00 0 00 0 0 69
_ 155
69 2
We see C =
206
2 69
127 10
2
206
103 7
I
0
103
2060
0 0
2
5
1
412 127
206
5
15
5
206
206 5
206
3
103
103
_ 69
_ L55-
2 127
69
_ 55 412 69 412
0 0
2
0
0
0
0
0
0
0
0
P-1 =
3 206 3
103 3
3
103
206 3 103
We have shown earlier that A-' can always be expressed as a polynomial in A. The MP-inverse of a square matrix may not be expressible as a polynomial in A. The D-inverse, on the other hand, is always expressible as a polynomial in the matrix. THEOREM 5.28 Let A E C""". Then there is a polynomial p(x) in C[x] such that p(A) = A°.
PROOF
Write A = S I ® ®J S-1 in a core-nilpotent factorization.
Now, C is invertible, so there exists a polynomial q(x) with q(C) = C-1. Let p(x) = x9[q(x)]y+I , where q = index(A). Then p(A) = Ay[q(A)]y+I r[ Cq ®] r q(C) ® ]v+l S-I S[ C'[q(C)]y+l 0 1 S-I = = S ®® ® q(N) 0 01 II`
S
[
C-1
0
0
01
S_I
- A°.
COROLLARY 5.[p Let A, B be in CnX" and AB = BA. Then
1. (A B)° = B°A° = A°B°
2. A°B=BA°
0
228
Generalized Inverses
3. AB° = B°A 4. index(AB) < rnax{index(A), index(B)}. As long as we are talking about polynomials, let's take another look at the minimum polynomial. Recall that for each A in C""" there is a monic polynomial of least degree IIA(X) E C[x] such that p.A(A) _ ®. Say pA(x) = xd+
ad-IX d-i+ +aix+a". We have seen that A is invertible iffthe constant term + a2A + al I]. [Ad-i + ad_, A`t-2 + all Now suppose that i is the least positive integer with 0 = a = ai = = a;-i but a; i4 0. Then i is the index of A.
a $ 0 and, in this case, A-' =
THEOREM 5.29
Let A E C" " and µA (x) = x`t + ad_Ixd-I + i = index(A). PROOF
Write A = S I ®
+ a,x`, with a, # 0. Then
S-1 in a core-nilpotent factorization j where q = index(A) = index(N). Since AA(A) = 0, we see 0 = p.A(N) _ N
I
(Nd-i+ad_;Nd-'-1
Since (N"-'+
ad_iNd-'-I +
+ ail) is invertible, we get N' = 0. Thus i > q. Suppose q < i.Then A°A' = A'-'. Write iLA(x) = x'q(x)soO= IIA(A) = A'-'q(A). Multiply by AD. Then 0 = A'-t q(A). Thus r(x) = x'-'q(x) is a polynomial that annihilates A and deg(r(x)) < deg(µA(x)), a contradiction. Therefore
i =q.
0
There is a nice connection between the D-inverse and Krylov subspaces. Given a square matrix A and a column vector b, we defined in a previous home-
work exercise the Krylov subspace, K,(A, b) = span{b, Ab,... , A'-'b}. We have the following results that related a system Ax = b to solutions in a Krylov subspace where A is n-by-n. The proofs are left as exercises.
THEOREM 5.30 The following statements are equivalent:
1. A°b is a solution of Ax = b. 2. b E Col(A`J), where q is the index of A. 3.
Ax = b has a solution in the Krylov subspace K,,(A, b).
5.5 The Drazin Inverse
229
THEOREM 5.31
Suppose m is the degree of the minimal polynomial of A and q is the index of A. If b E Col(Ay), then the linear system Ax = b has a unique solution x = ADb E K,,,_q(A, b). If b V Col(A<), then Ax = b does not have a solution in K, (A, b).
Exercise Set 23 0
0 0
0
0
1
1. Find the D-inverse of
1
1
0
1
and
0 2
2
1
5 -l
4 -2
1
2. Fill in the proof of Theorem 5.22. 3. Fill in the-proof of Theorem 5.23.
4. Fill in the proof of Theorem 5.25. 5. Fill in the proof of Theorem 5.27.
Further Reading [B-I&G, 2003] Adi Ben-Israel and Thomas N. E. Greville, Generalized Inverses: Theory and Applications, Springer-Verlag, New York, (2003).
[C&G, 1980] R. E. Cline and Thomas N. E. Greville, A Drazin Inverse for Rectangular Matrices, Linear Algebra and Appl., Vol. 29, (1980), 53-62. [C&M, 19911 S. L. Campbell and C. D. Meyer, Jr., Generalized Inverses of Linear Transformations, Dover Publications, New York, (1991). [C&M&R, 1976] S. L. Campbell, C. D. Meyer, Jr., and N.J. Rose, Applications of the Drazin Inverse to Linear Systems of Differential Equations with Singular Constant Coefficients, SIAM J. Appl. Math., Vol. 31, No. 3, (1976), 411-425.
230
Generalized Inverses
[Drazin, 1958] M. P. Drazin, Pseudo-Inverses in Associative Rings and Semi-Groups, The American Mathematical Monthly, Vol. 65, (1958), 506-514.
IH&M&S, 20041 Olga Holtz, Volker Mehrmann, and Hans Schneider, Potter, Wielandt, and Drazin on the Matrix Equation AB = wBA: New Answers to Old Questions, The American Mathematical Monthly, Vol. 111, No. 8, October, (2004), 655-667. 1998] Ilse C. F. Ipsen and Carl D. Meyer, The Idea Behind Krylov Methods, The American Mathematical Monthly, Vol. 105, No. 10, December, (1998), 889-899.
[I&M,
[M&P, 1974] C. D. Meyer, Jr. and R. J. Plemmons, Limits and the Index of a Square Matrix, SIAM J. Appl. Math., Vol. 26, (1974), 469-478.
[M&P, 1977] C. D. Meyer, Jr. and R. J. Plemmons, Convergent Powers of a Matrix with Applications to Iterative Methods for Singular Linear Systems, SIAM J. Numer. Anal., Vol. 36, (1977). [M&R, 1977] C. D. Meyer, Jr. and Nicholas J. Rose, The Index and the Drazin Inverse of Block Triangular Matrices, SIAM J. Applied Math., Vol. 33, (1977). [Zhang, 20011 Liping Zhang, A Characterization of the Drazin Inverse, Linear Algebra and its Applications, Vol. 335, (2001), 183-188.
5.6
The Group Inverse
There is yet another kind of inverse that has been found to be useful.
DEFINITION 5.5
(group inverse) We say a matrix X is a group inverse of A iff it satisfies
(MPI)AXA=A (MP2) XAX = X
(3) AX = XA.
5.6 The Group Inverse
231
Thus, a group inverse of A is a 11, 2}-inverse of A that commutes with A. First, we establish the uniqueness. THEOREM 5.32 (uniqueness of the group inverse) If a matrix A has a group inverse, it is unique.
Suppose X and Y satisfy the three equations given above. Then X = XAX = AXX = AYAXX = YAX = YYA = YAY = Y. O
PROOF
It turns out that not every matrix has a group inverse. We use our favorite, the full rank factorization, to see when we do. When a group inverse does exist, we denote it be A".
THEOREM 5.33 [Cline, 1964] Suppose the square matrix A = FG is in full rank factorization. Then A has a group inverse if and only if G F is nonsingular. In this case, A" = F(G F) -2 G. PROOF
Suppose r = rank(A) and A = FG is a full rank factorization of A. GF) , sorank(A2) = rank(GF), since F has full column rank and G has full row rank. Thus, rank(A2) = rank(A) if and only if GF is nonsingular. Let X = F(GF)-2G. We compute AXA =
Then GF is in C',' and A2 = FGFG =
FGF(GF)-2GFG = FG = A, XAX = F(GF)-2GFGF(GF)-2G = F(G F)-2G = X and X A = FG F(G F)-2G = F(G F)-2G = F(G F)(G F)-I G FG = F(G F)-2G FG = X A.
0
COROLLARY 5.11
A square matrix A has a group inverse if and only if index(A) = I and if and only if rank(A) = rank(A2).
Exercise Set 24 1. If A is nonsingular, argue that A* =
2. Prove that A** = A, if A" exists. 3. Prove that A*# = A"*, if A' exists.
4. Prove that And = A"" for any positive integer n.
232
Generalized Inverses
5. Show that A" = AGA for any G in A3(1 ).
6. Create a group inverse of rank 2 for a 4-by-4 matrix and a D-inverse of rank 2 for the same matrix. What are the differences'?
Further Reading [C&M, 1991 ] S. L. Campbell and C. D. Meyer, Jr., Generalized Inverses
of Linear Transformations, Dover Publications, New York, (1991). [Hartwig, 1987] Robert E. Hartwig, The Group Inverse of a Block Triangular Matrix, in Current Trends in Matrix Theory, Elsevier Science Publishing Co., Inc., F. Uhlig and R. Grone, Editors, (1987).
1975] C. D. Meyer, Jr., The Role of the Group Generalized Inverse in the Theory of Finite Markov Chains, SIAM Review, Vol. 17, (1975), 443-464. [Meyer,
Chapter 6 Norms
length, norm, unit ball, unit sphere, norm equivalence theorem, distance fienction, convergent sequence of vectors, Holder's inequality, Cauchy-Schwarz inequality, Minkowski inequality
6.1
The Normed Linear Space C"
We have seen how to view C" as a space of vectors where you can study the algebra of addition and scalar multiplication. However, there is more you can do with vectors than just add them and multiply them by scalars. Vectors also have properties of a geometric nature. For example, vectors have length (some
people say magnitude). This is the concept we study in this chapter. Wee motivated by our experience in R", specifically R2.
By the famous theorem about right triangles (remember which one?), the length square of x = (XI, x2) is Ilxll2 = xi + xZ. It is natural to define the x1 + x2. However, when dealing with complex vectors, we have a problem. Consider x = (1, i) in C2. Then, following the previous formula, Ilxll2 = 12+i2 = I-1 = 0. Unless you are doing relativity in physics, this is troublesome. It says a nonzero vector in C2 can have zero length! That does not seem right, so we need to remedy this problem. We can use the idea of the magnitude of a complex number to make things work out. Recall that if the complex number z = a + bi, then Iz12 = a2 + b2 = zz = zz. Let's define Ilxl12 = 1112 + li I2 = I + I = 2; then llxll = f, which seems much more length of x as Ilxll =
reasonable. Therefore, we define Ilxl12 = Ixi I2 + Ix2I2 on C2.
With this in mind, we define what we mean by the "length" or "norm" of a vector and give a number of examples. Common sense tells us that lengths of nonzero vectors should be positive real numbers. The other properties we adopt also make good common sense. We can also use our knowledge of the "absolute value" concept as a guide.
233
Norms
234
x2
----------------
(x1, x2)
xl
Figure 6.1:
Norm of a vector.
DEFINITION 6.1
(norm)
A real valued function 11 11 : C" -* R is called a norm on C" iff 1.
llxll >OforallxinC"andllxll =0ifandonly
2.
Ilaxll = lal llxllforallxinC" andallainC
3. llx + yll < llxll + Il ll for all x, y in C" (triangle inequality).
It turns out that there are many examples of norms on C". Here are just a few. Each gives a way of talking about the "size" of a vector. Is it "big" or is it "small"? Our intuition has been formed by Euclidean notions of length, but there are many other ways to measure length that may seem strange at first. Some norms may actually he more convenient to use than others in a given situation. Example 6.1 Let x = (x1,x2,
...
,
be a vector in C".
6. 1 The Norrned Linear Space C"
235
'I
1. Define 11x111 = > lx; 1. This is a norm called the e, norm, sum norm, or r=i
even the taxicab norm (can you see why?).
2. Define lIX112 = (
Ixt12) . This is a norm called the f2 norm or Eu-
clidean norm. I "
3. More generally, define IIx11P = I > Ix; IP
1/P
, where I < p < oo. This
is a norm called the eP norm or just the p-norm for short and includes the previous two examples as special cases. 4. Define IIxLIoo = max {1x;1}. This norm is called the l... norm or the max , <, <
norm.
5. Suppose B is a matrix where B = S2 with S = S* # 0. Then define 11x118 = (x*Bx)2. This turns out to be a norm also.
There are two subsets that are interesting to look at for any norm, their unit ball and unit sphere. DEFINITION 6.2 (unit ball, unit sphere) Let 11 11 be a norm on C Then, the set B, = {x E C" I lIxll < 1) is called the unit ball for the norm, and S, = (x E C I Ilxll = 1) is called the unit sphere for the norm. It may be helpful to visualize some of these sets for various p-norms in R2 (see Figure 6.2).
THEOREM 6.1 (basic facts about all norms) Let 11 11 be a norm on C" with x and y in C. Then 1.
2.
Ix11 = II-x11 for all x in C". 111 x 11 - I l y 1 1
I < 11x - yll < IIxII + Ilyll.
3. For any invertible matrix S in C" x", the function IIx 11 s = II All defines a norm 1111 s on C.
4. The unit ball is always a closed, bounded, convex set that contains the origin of C".
Nouns
236
y
P=°° p=7 p=2 p = 1.5
P=1
Figure 6.2:
Unit spheres in various norms.
5. Every norm on C" is uniformly continuous. In other words, given any E > 0 there exists S > 0 such that Ix; - y; I < S for 1 < i < it implies III x11 - IIYIII <e, where x=(x,,x2,...
PROOF
The proofs are left as exercises.
0
Recall that a subset K of C" is called convex if x and y in K implies that tx + (I - t )y also belongs to K for any real t with 0 < t < 1. In other words, if x and y are in K, the line segment from x toy lies entirely in K. We remark that given any closed, bounded, convex set K of C" that contains the origin, there is a norm on C" for which K is the unit ball. This result is proved in Householder [ 1964]. This and Theorem 6.1(3) suggest that there is an enormous supply of norms to choose from on C". However, in a sense, it does not matter which one you choose. The next theorem pursues this idea. THEOREM 6.2 (the norm equivalence theorem) Let 1111 and 11 11' be any two norms on C". Then, there are real constants C2 C, > 0 such that C, 11x11 < Ilxil' < C2 IIx11 for all x in C".
237
6. 1 The Normed Linear Space C"
PROOF
Define the function f (x) = Ilxll' from C" to R. This is a continuous
function on C". Let S = {y E C" IIxII = 1). This is a closed and bounded subset of C". By a famous theorem of Weierstrass, a continuous function on a closed and bounded subset of C" assumes its minimum and its maximum value on that set. Thus, there exists a Ymin in S and a Ymax in S such that I
f (Ym,n) < f (Y) < f (Ymax) for ally in S. Note ymin $ -6 since IlYmin II = 1 Note SO f (Ynun) > 0. Now take any x in C" other than ', and form IIxII .
that this vector is in S, so Ilxll' = Ilxll
II II
> IIxII f(Yn,,,,). In fact, the
IIxII
inequality IIxII' > IIxII f(Ymin) holds even if x = V since all it says then is 0 > 0. Similarly, IIxII' = IIxII II
x
' < IIxII f(Ymax) Putting this together,
IIxII
Ilxll' < IIxII f(Ymax) Clearly, f(ymin) -< f(Ymax) Take CI = f (Yrmn) and C2 = f (ymax) 0 we have IIxII f(Ymin)
Since we do not really need this result, you may ignore its proof altogether because it uses ideas from advanced calculus. However, it is important to know that whenever you have a norm - that is, a notion of length - you automatically have a notion of "distance." DEFINITION 6.3 (distance function) Let 11 11 be a norm on C". For x and y in C", define the distance from x to y
byd(x,y)= IIY - xll. THEOREM 6.3 (basic facts about any distance function) Let d be the distance function derived from the norm 11 11. Then
I. d(x, y) > O for all x, y in C" and d(x, y) = 0 iff x = y. 2. d(x, y) = d(y, x) for all x, y in C" (symmetry).
3. d(x, z) < d(x, y) + d(y, z) for all x, y, z in C" (triangle inequality for distance).
4. d(x, y) = d(x + z, y + z) for all x, y, z in C" (translation invariance). 5. d(ax, ay) = foal d (x, y) for all x, y in C" and all a E C .
PROOF The proofs here all refer back to basic properties of a norm and so we leave them as exercises. 0
Norms
238
A huge door has just been opened to us. Once we have a notion of distance, all the concepts of calculus are available. That is because you now have a way of saying that vectors are "close" to each other. However, this is too big a story to tell here. Besides, our main focus is on the algebraic properties of matrices. Even so, just to get the flavor of where we could go with this, we look at the concept of a convergent sequence of vectors.
DEFINITION 6.4
(convergent sequence of vectors) Let (XI, x2, ... ) be a sequence of vectors in C". This sequence is said to converge to the vector x in C" with respect to the norm 11 11 if for each e > 0
there is a natural number N > 0 such that if n > N, then lix,, - xii < e. One final remark along these lines. In view of the norm equivalence theorem, the convergence of a sequence of vectors with respect to one norm implies the convergence with respect to any norm. So, for convergence ideas, it does not matter which norm is used. Once we have convergence, it is not difficult to formulate the concept of "continuity." How would you do it? The astute reader will have noticed we did not offer any proofs that the norms of Example 6.1 really are norms. You probably expected that to show up in the exercises. Well, we next give a little help establishing that the EP norms really
are norms. When you see what is required, you should be thankful. To get this proved, we need a fundamental result credited to Ludwig Otto Holder (22 December, 1859-29 August, 1937). THEOREM 6.4 (Holder's inequality) Let a,, a2, ... , a and b, , b2, ... , b be any complex numbers and let p and
q be real numbers with 1 < p, q < oc. Suppose 'lv laabiI
Jail''
-p + lq = 1. Then
(ibIi1
tlq
PROOF If either all the as are zero or all the bis are zero, the inequality above holds trivially. So let's assume that not all the as are zero and not all the bis are zero. Recall from calculus that the function f(x) = ln(x) is always concave down on the positive real numbers (look at the second derivative
f"). Thus, if a and 0 are positive real numbers and k is a real number with 0 < k < 1, then k In(a) + (1 - k) In(p) < ln(k(x + (1 - k)(3) (see Figure 6.3). To see this, note that the equation of the chord from (a, en(«)) to (a, £n(R)) is
en(p) = en(p) - en(a)(x - 0), so ii'x = ka+(1 - k 0 -«
239
6.1 The Normed Linear Space C"
+h
1.111
((3, In((3))
ax + (1-X)(3
Chord: y - in(p) =
In(p)
- lav(a) (x -
It x = aA + (1-A)li, y = Aln(a) + (1-A)In(13)
Figure 6.3:
Proof of Holder's Theorem.
M(p) - en(a) (Xa + (1 - X)13- R) = en(13) + _ - R)) _ en(pj)+X(en((3)-en(a)). From the picture it is clear that k ln(a)+(1-X) ln((3) < £n((3) - en(a)
_
ln(ka + (1 - X)13).
Now, by laws of logarithm, ln(a'`(3'-A) < In(Xa + (I - X)p). Then, since ln(x) is strictly increasing on positive reals (just look at the first derivative f'), we get < lea + (I - X)(3. Now choose aAP1-A
Iv
a; =
lay
EIaiI;,
1
Ib. I"
k=
and
13. =
-. P
FIb1I"
l=1
J=1 11
11
Let A = > Iaj I P
and
B = 1: I bi I y
1=1
J=1
Note that to even form a; and (3;, we need the fact that not all the aj and not all the bj are zero. 1\ _ pI , we get
Then using a; (3;1 ->, < ka1 + (1 -
CJb81y/1-1
P
CJaAP/1
or
Jai I
1b11
(A)l/P (B)hI'
1
P
la; I
A
l+
1
9
Ibl ly
B
Norms
240
This is true for each i, I < i < n, so we may sum on i and get of
Jai IJbiI
I
(A)1 /v (B) 1/q = (A)1/" (B) 1/q Elaibil ,-1 I
n
If
< -E
Iai1"
p i=1 (A) I
I
p (A)
A+
Ibilq +91"-j= --E IailP + --E Ibilq p (A) i=1 9 (B) i=1 1=1 (B) I
I
I
I
q (B)
I+ I
B=
p
I
q
I
I
= 1.
n
Therefore, (A)1/ (B)1/q
laibi I < 1. i=1 ,
By clearing denominators, we get
laibiI -< (A)1 /v (B)
1/q
i=1
'/p
n
11y n
E Ibil
IaJIP)
j=1
Wow! Do you get the feeling we have really proved something here? Actually, though the symbol pushing was intense, all we really used was the fact that the natural logarithm function is concave down.
As a special case, we get a famous inequality that apparently dates back to 1821 when Augustin Louis Cauchy (21 August 1789-23 May 1857) first proved it. This inequality has been generalized by a number of mathematicians, notably Hermann Amandus Schwarz (25 January 1843-30 November 1921).
COROLLARY 6.1 (Cauchy -Schwarz inequality in C") Take p = q = 2 above. Then n
i=1
II
a.b <
/
1/2
Ia, Iz !=1
1/2
n
Ib, . I2 i-I
COROLLARY 6.2 T h e function I I I I p : C" -+ R d e f i n e d by Ilxll,, =
wherex = (x1,x2,... x,,) and I < p < oo.
(E °) I xi l
1/P
is a norm on C"
6. 1 The Normed Linear Space C"
241
PROOF We must establish that this function satisfies the three defining properties of a norm. The first two are easy and left to the reader. The challenge is the triangle inequality. We shall prove llx + ylli, < IIxIIP + Ilylli, for all x, y in C". Choose any x, y in C". If x + y = -6, the inequality is trivially true, so
suppose x + y # 6. Then IIx + yll, 4 0. Thus
IIx+ylIP=?Ixi+yil'' i=1 n
n
Ixi+yillxi+yiiP-'
(Ixil+lyiI)lxi+yiIP-'
i=1
i=1 n
n
i=1
i=I
UP
n
IxilP
(lxi+yii) n
(P-1)q
Ixi + yi
I
1/q
n
1/q
I(P-l)y
lyiIP i=1
n
x
+
i=1
i=l
UP
n
I/q
lxi + yi
= (IIxIIP + IIyIIP)
I(P-I)4
i=1
Ci=1
P/Pq
(EIxi+yilP I
(IIxIIP+IIyIIP) i-
= (IIxIIP + liyIIP)
(IIx+YIIP)P/q
Dividing both sides byllx+yliP/q , we getllx+yli, = IIx+YIIP -P/q -< IIxIIP+ IIyIIP which is what we want. Notice we needed lI x + y II P i4 0 to divide above. The reader should be able to
explain all the steps above. It may be useful to note that pq - q = (p - 1)q = follows from v + 1 = 1.
This last inequality is sometimes called the Minkowski inequality after Hermann Minkowski (22 June 1864-12 January 1909).
Exercise Set 25 1. Give direct arguments that 21 and e«, are norms on C".
2. For x in C", define Ilxll =
I if x54 -jT 0 if x= -6 .IsthisanormonC"?
242
Norn,.s
3. Let d he a function satisfying (1), (2), and (3) of Theorem 6.3. Does llxll = d(x, 6) define a norm on C"? Supposed satisfies (1), (3), and (5). Do you get a norm then? 4. A vector x is called a unit vector for norm II II if J x 11 = 1. Suppose IIx ll
1
and x # V. Can you (easily) find a unit vector in the same direction as x? 5. If you have had some analysis, how would you define a Cauchy sequence
of vectors in C" ? Can you prove that every Cauchy sequence in C" converges? How would you define a convergent series of vectors in C" ?
: C" -* R by Ilxll = Ixi 14- Ix21 + is a norm on C" and sketch the unit ball in R2.
6. Define 11 11
+ "'- Ix,, I. Show that 11
11
7. Let x be any vector in C". Show that Ilxlla, = viim Ilxllt, (Hint: n'1't' -* 00
I as p -o oc and Ilxll,, < n'/t' IIXII. )
8. Compute the p-norm for the vectors (1, 1, 4, 3) and (i, i, 4 - 2i, 5) for p = 1, 2, and oo. 9. Argue that a sequence (x,,) converges to x in the oo-norm if each component in x converges to the corresponding entry in x. What is the limit in the oo-norm? vector of the sequence (1,n 4 + 71 !'2M is
n Ilxll,,, for all x in C". Why does this say 11x112 < that if a sequence converges in the oc-norm, then the sequence must also converge to the same result in the 2-norm? Also argue that llxlli
10. Prove llxllc,,
lIX112 and Ilxlli < n Ilxll... for all x in C".
11. Discuss how the 2-norm can he realized as a matrix multiplication. (Hint: If x is in C"' 1, IN 112 =
"'X-'X.)
12. Argue that x -). x iff the sequence of scalars llx,, - xll -+ 0 for any norm. 2
13. Argue that
C>xi ,_i
p,
< nEXi , where the x; are all real. i=i
14. Determine when equality holds in Holder's inequality. 15. Can IN - Y112 = IN + YII2 for two vectors x and y in C"? 16. Argue that I Ilxll - IIYII I
min(IIx - YII , IN + YII) for all x and yin C".
17. How does the triangle inequality generalize ton vectors?
6.1 The Normed Linear Space C"
243
18. Argue Ilxll,0 < lIX112 -< 11x111 for all x in C".
19. Prove all the parts of Theorem 6.1. 20. Prove all the parts of Theorem 6.3.
21. Prove if p < q, then n-v IIxII,, < Ilxlly < Ilxlli, for any x E C". 22. Let (0, 0) and (1, 2) be points in the (x, y)-plane and let Li be the line that
is equidistant from these two points in the el, e2, and e,, norms. Draw these lines.
23. Compute the 1, 2, and oo distances between (-10, 11, 12i) and (4, 1 +
i, -35i). 24. Prove that if a sequence converges in one norm, it also converges with respect to any norm equivalent to that norm. 25. Can you find a nonzero vector x in C2 such that Ilxll,,. = lIX112 = IIxII1?
26. Suppose Ilxll,,, IlXlln are norms on C". Argue that max{llxll,,, IlXlln} is also an norm on C". Suppose X1 and X2 are positive real numbers. Argue that K1 IIxII" + K2 IlXlln is also a norm. 27. Could the p-norm 11x11 , be a norm if 0 < p < 1?
28. Consider the standard basis vectors e; in C". Argue that for any norm II II
IxiI Ileill,where x=(x1,X2,... ,xn).
on C', IIxII < f=1
29. Suppose U is a unitary matrix(U* = U-1). Find all norms from the examples given above that have the property IlUxll = IIxII for all x = (XI, x2, ... , Xn) In 01. 30. A matrix V is called an isometry for a norm 11 11 iff II Vxll = IIxII for all x = (X1, x2, ... , x,,) in C". Argue that the isometrics for a norm on C" form a subgroup of GL(n, Q. Determine the isometry group of the Euclidean norm, the sum norm, and the max norm.
31. Statisticians sometimes like to put "weights" on their data. Suppose w1, w2, ... , W. is a set of positive real numbers (i.e., weights). Does IIxII = (Ewi Ix;l")1 tN define a norm on C" where I < p < oo? How i=1
about I I x I I = max{w, Ixi l , W2 Ix2l, ... , wn Ixnl}?
32. Verify that Example (6.1)(5) actually defines a norm.
Norms
244
33. Argue that the intersection of two convex sets is either empty or again a convex set.
34. Prove that for the 2-norm and any three vectors zi, z2, z; I1z1 - z21122 + Ilz2 -23112+11?, -2,112+IjZ, +z2 +2,112 = 3(IIzjII'+IIz2112+112.,1121.
35. There is another common proof of the key inequality in Holder's inequality. We used a logarithm, but you could define a function f (t) =
>\(t - 1) - tn` + 1, 0 < X < 1. Argue that f(1) = 0 and f '(t) > 0. Conclude that t?` < Xt + (I - X). Assume a and b are nonnegative real
numbers with a > b. Put t = a , A = b
where
1+ p
1 . If a < b, put t = b p
a
-q = 1. In each case, argue that a T b
<
and k =
b
q
a + -. p
q
36. As a fun problem, draw the lines in the plane 1R2 that are equidistant from (0, 0) and (1, 2) if distance is defined in the 1, 2 and oo norms. Try (0, 0) and (2, 1) and (0, 0) and (2, 2). 37. Prove that 2 11x - y11 ? (11x11 + Ilyll)
x and y.
x
y
Ilxll
Ilyll
for nonzero vectors
Further Reading [Hille, 1972] Einar Hille, Methods in Classical and Functional Analysis, Addison-Wesley Publishing Co., Reading, MA, (1972). [House, 1964] Alston S. Householder, The Theory of Matrices in NumericalAnalysis, Dover Publications, Inc., New York, (1964).
[Schattsch, 1984] D. J. Schattschneider, The Taxicab Group, The American Mathematical Monthly, Vol. 91, No. 7, 423-428, (1984). matrix norm, multiplicative norm, Banach norm, induced norm, condition number
6.2
Matrix Norms
We recall that the set of all in-hy-n matrices C111" with the usual addition and scalar multiplication is a complex vector space, the "vectors" being matrices. It
6.2 Matrix Norms
245
is natural to inquire about norms for matrices. In fact, a matrix can be "strung out" to make a "long" vector in C"' in various ways, say by putting one row after the other. You could also stack the columns to make one very "tall" column vector out of the matrix. This will give us some clues on how to define norms for matrices since we have so many norms on Cl'. In particular, we will see that all the vector norms of the previous section will lead to matrix norms. There ara, however, other norms that have also been found useful.
DEFINITION 6.5
(matrix norm) C'"" -+ R is called a matrix norm iff it satisfies
A real valued function 11 11
1. 11AII?Oforall AEC''"and 11A11=0iffA=O 2.
llaAll =1allIAll for all aECandA EC
3.
II A + B II
II A II + II B II for all A, B E Cn". (Triangle inequality) We say we have a strong matrix norm or a Banach norm iff we have in
addition n = m and 4.
IIABII <_ IIAII IIBIIforallA,B,CinC""" The terms multiplicative and submultiplicative are also used to describe (4).
As with vector norms, there are many examples of matrix norms.
Example 6.2 n
1.
II A II i = > > l ai j 1. This is a norm called the sum norm. i=1 j=1 M
2. II A II, = max Y_ lai j 1. This is a norm called the maximum column sum
I<j
norm.
3.
IIAIIF =
(n_ laijl i=i j=i
2
= tr(A*A)1/2. This is a norm called the
Frobenius norm. i/p 4.
II A II
v=
l aij
v
/
This is a norm called the Minkowski p-norm
(i=1j=I or Holder norm. It generalizes (1) and (3).
Norms
246
5. HA lloo = max{ jai, I }. This is a norm called the max norm.
ij
6. 11A II,_, = max E jail This is a norm called the maximum row sum j=1 1:51!51n norm.
The reader is now invited to make up more norms motivated by the examples above. We remark that unit balls and unit spheres are defined in the same way as for
vector norms: B, =(A E C'"xn I IIAII -< l} and S, =(A E C"' III All=]). Also, most of the facts we derived for vector norms hold true for matrix norms. We collect a few below.
THEOREM 6.5 (basic facts about matrix norms) Let 11 11 be a norm on C"'x" and A, B E C'nxn Then 1.
111 A 11 - IIB111:5 II A - B II< IIAII + IIBII for all A, B E C'nxn
2. The unit ball in Cm xn is a closed, bounded, convex subset of C" x" containing the zero matrix.
3. Every matrix norm is uniformly continuous. 4. If 11 11 and 11 11' are matrix norms, there exist constants C2 > C, > 0 such
that C, IIAII < IIAII < C2 II A II for all A in C"' xn (norm equivalence theorem).
5. Let S be invertible in Cmx"'. Then IlAlls = IISAS-' II defines another matrix norm on C"' In'.
PROOF
The proofs are left as exercises.
El
Now we have a way to discuss the convergence of matrices and the distance between matrices. We have a way to address the issue if two matrices are "close" or not. In other words, we could develop analysis in this context. However, we will not pursue these ideas here but rather close our current discussion looking at connections between matrix norms and vector norms. It turns out vector norms
induce norms on matrices, as the following theorem shows. These induced norms are also called operator norms.
247
6.2 Matrix Norms
THEOREM 6.6 Let II Ila and II Ilb be vector norms on C' and C", respectively. Then the function II Ila.b on C ,,n defined by II A Ila,b = max
IIAXIIa
= max IIAxIIa is a matrix
x16-e IIxIIb
IlXIIh=I
norm on C"" Moreover, IIAxIIa < IIA Ila,bI xllb for all x in C" and all A in C""n. Finally, if n = m, this norm is multiplicative. PROOF We sketch the proof. The fact that IIA IIa. b exists depends on a fact from advanced calculus, which we shall omit. For each nonzero vector x, IIAxIIa
> 0 so the max of all these numbers must also be greater than or equal
IlxlIb
to zero. Moreover, if II A lla.b = 0, then Ax = ' for every vector x, which we know implies A = 0. Of course, if A = 0, it is obvious that IIA lla.b = 0 For the second property of norm, IlaAlla.b = max IIaAxlla = lal max IIAxIIa = IlxlIb
x
-0-d' IlxlIb
II(A + B)xlla
Ial IIAIIa.b Finally, for x nonzero, IIA + Blla,b = max x06 IlxlIb IIAx+Bxlla _ IIAXIIa + IIBxIIa < IIBxIIa < max max maxIIAXIIa + max = x# '
x#T
IlxlIb
x0-d IlxlIb
IlxlIb
IlAlla.b+IlBlla,b.Moreover, foranynonzerovector x, IIAIIa.b So IIAXIIa
_<
1IA
-0-6
IIAIlxXlIb IIa
Ila,bll x11b. Finally, IIABxIIa
< _<
m ax
IlxlIb IIAXIIa
IlxlIb
IIAIIa.b IIBxIIa
IIAila.b IIBIIa.b IIxIIb Thus, IIABIIa b = IIABxIIa < IlAlla.b IIBIla.b IIxIIb
<_
0
DEFINITION 6.6 (induced norm) The matrix norm II Ila.b in the above theorem is called the norm induced by the vector norms II Ila and II IIb If n = m and II Ila = II IIb , we simply say the matrix norm ll
II
is induced by the vector norm II Ila.
We remark that the geometric interpretation for an induced matrix norm 11 11 is that II A II is the maximum length of a unit vector (i.e., a vector in S1) after it was
transformed by A. This is clear from the formula IIAII = max IIAxIIa Here IlxII,=I
the terms "length" and "unit" must be understood in terms of the underlying vector norms. Next, we look at some examples of induced norms. Example 6.3 We will only consider the case where m = n and where the two vector norms are the same. We present the examples in Table 6.2.
Norms
248
TABLE 6.2.1: Induced Matrix Norm
Vector Norm el
:
II A II I.1 = max F, I aji l = IIAII,,,,
11X111 = > Ixil r=I
Is/sni_I
1l1/2
n
e2 : lIX112 =
IIAII2.2 = max IIAxII2
Ix; 12/
11x112=I
n
Ilxll00 = max Ixil
= maxi > I aii I =11 A II,=
II A
I«
You might have thought that the e2 vector norm would induce the Frobenius norm, but that is not so. Generally, II A II F 0 II A 112,2 Next we investigate what
is so nice about induced norms.
THEOREM 6.7 (basic properties of induced norms) 1. Let II Ila.b be the matrix norm on C" "' induced by the vector norms II Ila on cm and 1111 b on C". Let N be any other matrix norm on C' x" such that N(A) N(A) Il Xll bfor all x in C", all A in Cmxn. Then IIA Ila.b IIXIIa
for all A E C""". 11 Ilb , 11111 be vector norms on C", C', and GN respectively and let II Ila,h be the matrix norm on (Cmxn induced by II IIa and 1111b , 11I1b,I be the matrix norm on CPx"' induced by II IIb and I1II1. Finally let ll Ila,, be the matrix norm on C'' x" induced by II Ila and 11111. Then II AB II,,., <11 A IIa,b 11 B 11b., for all A E C'. xn B E Cl""". The particular case where
2. Let II Ila ,
in = n = p and all vector norms are the same is particularly nice. In this case, the result reads, after dropping all the subscripts. IIABII
IIAII IIAII
3. Let 11 11 be a Banach norm on C" x". Then
(i) IIABII < IIAII II Bll forall A, B in C"x" (ii) 11111 = 1 (iii) IIA"II _< IIAII"forA E Cnxn (iv) IIA-III >
PROOF
IIAII
for all invertible A in C"".
The proofs are left as exercises.
U
Of course, we have only scratched the surface of the theory of norms. They play an important role in matrix theory. One application is in talking about
6.2 Matrix Norms
249
the convergence of series of matrices. A very useful matrix associated to a square matrix A is eA. We may wish to use our experience with Taylor series 00
I
in calculus to write eA = E -A". For this to make sense, we must be able
-on!
to deal with the question of convergence, which entails a norm. In applied numerical linear algebra, norms are important in analyzing convergence issues and how well behaved a matrix or system of linear equations might be. There is a number called the condition number of a matrix, which is defined as c(A) =
be if A is singular
and measures in some sense "how close" a if A is nonsingular nonsingular matrix is to being singular. The folks in numerical linear algebra are interested in condition numbers. Suppose we are interested in solving the system of linear equations Ax = b where A is square and invertible. Theoretically, this A-' 1111 A 11
system has the unique solution x = A-'b. Suppose some computer reports the solution as z. Then the error vector is e = x - z. If we choose a vector norm, we can measure the absolute error Ilell = IIx - ill and the relative error 11ell II
x II
But this supposes we know x, making it pointless to talk about the error .
anyway. Realistically, we do not know x and would like to know how good an approximation i is. This leads us to consider something we can compute, namely the residual vector r = b - Az. We will consider this vector again later and ask how to minimize its length. Anyway, we can now consider the relative IIb - AnII which we can compute with some accuracy. Now choose residual , IIbII
a matrix norm that is compatible with the vector norm we are already using.
Thenr=b-Ax=Ax-Ax=Ae so IIrII = IlAell and so Hell
IIAII Ilellande=A-'r
IIA-' 11 110. Putting these together we get 11r1l
IIAII <
IIA-' II IIrII
hell
Similarly,
Il-lll From these it follows that
` IIrII
I
Ilxll < IIA
<
hell
IIAII IIA-' II Ilbll
Ilxll
IIrII
hell
'
II Ilbll.
< IIAII IIA-' II
IIrII
Ilbll
or
I
c(A) Ilbll - IIxII
- c(A) IIrII
Ilbll
Now we have the relative error trapped between numbers we can compute (if we make a good choice of norms). Notice that if the condition number of a
Norms
250
matrix is very close to 1, there is not much difference between the relative error and the relative residual. To learn more about norms and their uses, we point the reader to the references on page 252.
Exercise Set 26 1.
Let 11 11 be a matrix norm on C" x" such that 11 A B 11 < I I A I I 11
1 1B1Show that
there exists a vector norm N on 0" such that N(Ax) < IIA All N(x) for all
A EC"'x"and all x in C. 2. Show IIAII2 < IIA11I,1 IIAII,,.. for all A E Cm '
3. Show that the matrix norm 1111 on Cnxn induced by the vector norm II Ilp ,
I < p < oo satisfies F
IIAll
for all A E C"".
E(E I aij 1`')y i=1
j=1
i
i
4. Let 1111 P and II Ily be matrix norms on C"xn of Holder type with + = I Show that for p > 2, we have 11ABII,, < min(IIAII,,11Blly , II1IIy I1BII,,)
5. Prove IIA + B112 + IIA - BIIF = 2 IIAIIF + 2 IIBIIF .
842
6. Find the I -norm and oo-norm of
- 13 7 191
7. Does IIAII = max { Ia;j I } define a Banach norm for n-by-n matrices? 1:50
8. Argue directly that I I A
II2 < IIAIIF II
II2and IIABIIF
9. Prove that when A is nonsingular, min IlAxll = IIXII=l
I
IA-111
IIAIIF IIBIIF
measures how
much A can shrink vectors on the unit sphere. 10. Argue that IIA112.2 = IIA*112.2and IIAII, = IIA*IIF. Also, IIA*AII2.2 = IIAIl2,2
251
6.2 Matrix Norms 11.
Prove that if SS* = / and T*T = I , then IIS*AT112,2 =
12.
Argue that I I A I I F = I I A
IIA112,2
TI
IF.
13. Prove that if A = AT, then 11Alloo = IIAl11.
14. Argue that 11Axll . < fn- IIA112 IIxII for all x in C".
15. Prove that IlAxII2 <- ,/ IIAlI,,, lIX112 for all x in C".
16. Argue that f IIAIl2 < IIAII. <-,/n IIAIl2 17. Suppose A is m-by-n and Q is m-by-m with Q-1 IIQAIIF = IIAIIF
QT. Argue that
18. Argue that there cannot exist a norm 1111 on C" such that IIAxFI < IIAIIF IIXII
for all x in C".
19. Suppose U and V are unitary matrices, U* = U-' and V* = V-'. For which norms that we have discussed does II U A V II = IIA II . For all A in
f,nxn7
E = E2.
20. Suppose 11 is a Banach norm. Suppose 0 11
(a) Prove that 11 Ell > 1.
(b) Prove that if A is singular, II! - All > 1. (c) Prove that if A is nonsingular, IIA' I >
IIIII
I
11 A11
(d) Prove that if 111 - A 11 < 1, then A is nonsingular.
(e) Prove that if A is nonsingular, A + B nonsingular the IIA-'(A+B)-'II < IIA 'II II(A+B)-'II IIBII. 21. Argue that II ll on (Cntxm
is not a Banach norm but II II' where 11 A II' = m II A II. is
22. Suppose S2 is a positive definite m-by-m matrix . Argue that the Mahalanobis norm IlAlln = tr(A*QA) is in fact a norm on C,nxm 23. Prove that for an m-by-n matrix A, IIA 112 < II A II i < mn 11 A ll 2
24. Prove that for an m-by-n matrix A, IIA Iloo < II A II i < mn IIA Il
25. Prove that for an m-by-n matrix A, II All... < II A Ill <_
.
mn II A II,.
26. Prove Holder's inequality for m-by-n matrices. Suppose I < p, q < 00 and
-p + 1 = 1. Then IIABII1 <- IlAllt, IIBIIq. q
Norms
252
Further Reading [B&L, 1988] G. R. Belitskii and 1. Yu. Lyuhich, Matrix Norms and Their Applications, Oper. Theory Adv. Appl., Vol. 36, Birkh5user, (1988).
[H&J, 1985] Roger A. Horn and Charles R. Johnson, Matrix Analysis, Cambridge University Press, (1985). [Ld, 1996] HelmutLutkepohl, Handbook of Matrices, John Wiley & Sons, New York, (1996).
[Noble, 1969] Ben Noble, Applied Linear Algebra, Prentice Hall, Englewood Cliffs, NJ, (1969).
6.2.1 6.2.1.1
MATLAB Moment Norms
MATLAB has built in commands to compute certain vector norms. If v = (v1, v2, ... , vn) is a vector in Cn, recall that the p-norm is 1/n
I
Ilvlloo = max 1<j<11
{Ivjl}.
In MATLAB, the function is norm(v, p).
Note that p = -inf is acceptable and norm(v,-int) returns min {Iv1I}. I<j
For example, let's generate a random vector in V.
253
6.2 Matrix Norms v= I0*rand(5,1)+i* 10*rand(5, 1) V =
6.1543 + 4.0571 i
7.9194 + 9.3547i 9.2181 + 9.16901 7.3821 + 4.10271
1.7627 + 8.9365i
We can make a list of various norms by
> > [norm(v, 1), norm(v,2), norm(v,3), norm(v,-int),norm(v,inf)] ans =
22.9761
50.1839
17.9648
7.3713
13.0017
For fun, let's try the vector w = (i, i, i, i, i).
> > w=[iiiii] w=
Column I through Column 4 0 + 1.0000i
0 + 1.0000i
0 + 1.0000i
0 + 1.0000i
Column 5 0 + 1.0000i
[norm(w,1), norm(w,2), norm(w,3), norm(w,-int),norm(w,int)]
ans = 5.0000
2.2361
1.0000
1.7100
1.0000
MATLAB can also compute certain matrix norms. Recall that the p-norm of a matrix A is IIAvIIv,
IIAIIt, = max
where I < p < oo.
IIVIIp
Also recall that n
maxE laijI
1
j=1
which is the maximum row sum norm. Of course, we cannot forget the Frobenius norm
II All F- =
(1:1: 1aij12 l
/
= tr(A*A) 1/2.
Norms
254
Let's look at an example with a randomly generated 4-by-5 complex matrix. The only matrix norms available in MATLAB are the p-norms where p is 1, 2, inf, or "fro" >> A=10*rand(4,5)+i*10*rand(4,5)
A= Column I through Column 4 9.5013 + 0.5789i 8.9130 + 1038891 8.2141 + 2.72191 9.2181 + 4.451(1
2.3114+3.5287i 7.6210 + 2.20277i 4.4470 + 1.9881 i 7.3821 + 9.3181 i 6.0684 + 8.1317i 4.5647 + 1.98721 6.1543 + 0.1527i 1.7627 + 4.65991 4.8598 + 0.09861 0.1850 + 6.0379i
7.9194 + 7.4679i 4.0571 + 4.18651
Column 5 9.3547 + 8.4622i
9.1690 + 5.2515i
4.1027 + 2.0265i 8.9365 + 6.72141
> > [norm(A,I), norm(A,2), norm(A,-inf),norm(A,int),norm(A,'fro')]
ans = 38.9386 35.0261 50.0435 50.0435 37.6361
The 2-norm of a very large matrix may be difficult to compute, so it can be estimated with the command normest(A, tol),
where toll is a given error tolerance. The default tolerance is 10-6. Recall that the condition number of a nonsingular matrix A relative to some p-norm is defined as c,,(A) = IIAII, IIA-' 11P .
This number measures the sensitivity to small changes in A as they affect solutions to Ax = b. A matrix is called "well conditioned" if c,,(A) is "small" and "ill conditioned" if cp(A) is "large" The MATLAB command is cond(A, p),
6.2 Matrix Norms
255
where p can be 1, 2, inf, or `fro.' Rectangular matrices are only accepted for p = 2, which is the default. For example, for the matrix A above > > cond(A, 2)
ans = 8.4687
Again, for large matrices, computing the condition number can he difficult so MATLAB offers two ways to estimate the 1-norm of a square matrix. The commands are rcond and condest. To illustrate, we shall call on a well-known ill conditioned matrix called the Hilbert matrix. This is the square matrix whose (i, j) entry is
1
.+.+I
.
It is built into MATLAB with the call hilb(n). We will
use the rat format to view the entries as fractions and not decimals. We will also compute the determinant to show just how small it is making this matrix very "close" to being singular.
> > format rat > > H = hilb(5)
H= 11111 2
3
4
'
1
1
1
1
2
3
4
5
6
1
1
1
1
1
3
4
5
5
6 7
1
I
I
I
I
4
5
6
7
8
I
I
I
I
I
5
6
7
8
9
Chapter 7 Inner Products
Hermitian inner product, parallelogram law, polarization identity, Appolonius identity, Pythagorean theorem
7.1
The Inner Product Space C"
In this section, we extend the idea of dot product from W to (r". Let's go back to R2, where we recall how the notion of dot product was motivated. The idea was to try to get a hold of the angle between two nonzero vectors. We have the notion of norm, Ilxii = ',/x - + x- , and thus of distance, d(x, y) = IIy - xii . Let's picture the situation:
(XI, Xz)
Figure 7.1:
The angle between two vectors. 257
Inner Products
258
By the famous law of cosines, we have IIY - x112 = 11x112 + IIY112 - 2 Ilxll IIYII
cos 0. That is, (y, - x, )2 + (y2 - X2)2 = x2 + x2 + y; + y2 - 2 IN 11 11Y11 cos 0. Using the good old "foil" method from high school, we get y; - 2x, yi +x; + y2 - 2x2y2 + x2 = x; + x2 + y; + y2 - 2 Ilxll IIYII cos 0. Now cancel and get x, y, + x2y2 = I I x 1 1 IIYII cos 0 or
cos0 =
X1 Y1 + x2)'2
Ilxll IIYII
Now, the length of the vectors x and y does not affect 0, the angle between them, so the numerator must be the key quantity determining the angle. This leads us to define the dot product of x and y as x y = x, y, + x2y2. This easily extends to R". Suppose we copy this definition over to C". In R2, we have (x,, x2) (x,, x2) = xi +x2; if this dot product is zero, x, = x2 = 0, so (x,, x2) is the zero vector. However, in C2, (1, i) (1, i) = 12 + i 2 = 0; but we dotted a nonzero vector with itself! Do you see the problem'? We have seen it before. A French mathematician by the name of Charles Hermite (24 December 182214 January 1901) came up with an idea on how to solve this problem. Use complex conjugates! In other words, he would define(I, i) (1, i) = I 1 T+ T =
I + i(-i) = I - i2 = 2. That feels a lot better. Now you can understand
1
why the next definition is the way it is.
DEFINITION 7.1
(Hermitian inner product)
LetxandybeinC",where x=(X,,X2,... ,x")andy=(y,,Y2,...y").Then the Hermitian inner product of x and y is (x I y) = x, y, +x2y2 +
+x,, y" =
01
:x1)'1.
1='
We are using the notation of our physicist friends that P. A. M. Dirac (8 August 1902-22 October 1984 ) pioneered. There is a notable difference however. We put the complex conjugation on the ys, whereas they put it on the xs. That is not an essential difference. We get the same theories but they are a kind of "mirror image" to each other. In matrix notation, we have x, X2
(x I Y) = [7 172...Yn] Xn
Note that we are viewing the vectors as columns instead of n-tuples. The next theorem collects some of the basic computational facts about inner products. The proofs are routine and thus left to the reader.
7.1 The Inner Product Space C"
259
THEOREM 7.1 (basic facts about inner product)
Letx,y,zbeinC",a, (3 E C. Then 1.
(x+ylz)=(xIZ)+(Ylz)
2.
(ax I z) = a (x l z)
3.
(xIY)=(YIx)
4. (xlx)0and (xIx)=0iffx 5.
(xII3Y)=R(xIY)
Notice that for the physicists, scalars come out of the first slot decorated with a complex conjugate, whereas they come out of the second slot unscathed. Next, using the properties established in Theorem 7.1, we can derive additional computational rules. See if you can prove them without mentioning the components of the vectors involved.
THEOREM 7.2 Let x, y, z be in C", a, 0 E C. Then 1.
(XIY+Z)=(xIY)+(xlz)
2.
(-6 IY)=\xl-6 >=Oforanyx,yinC"
3.
(x I Y) = O for all y in C" implies x =
4.
(X I z)
(y I z) for all z in C" implies x = y ,"
7n
5.
F-ajxj IY)
)
j=1 6.
= j=1
x I E13kyk/ _ >Rk(X I Yk) k=1
gym`
7.
Haj(Xj IY)
k=1
p
') = `ajXj I EQky j=1 k=1
m
P
EajQk (Xj
j=lk=1
8. (x-ylz)=(XIZ)-(ylz) 9. (xly-Z)(XIY)-(Xlz).
I Yk)
260
Inner Products
Next, we note that this inner product is intimately connected to the h norm xX x 1x,12 + Ixz12 + + Ix" 12 = 11X112. Thus 11X112 = (x I x). Note this makes perfect sense, since (x I x) is a positive real number.
on C". Namely, for x = (x1, x2, ...
I
We end with four facts that have a geometric flavor. We can characterize the perpendicularity of two vectors through the inner product. Namely, we define
x orthogonal to y, in symbols x I y, iff < x I y >= 0. THEOREM 7.3 Let x, y, z be in C". Then 1.
IIx + y112 + IIX - y112 = 211x112 + 2 IIY112 (parallelogram law)
2.
(x I y) = I/4(Ilx+y112 (polarization identity)
3.
IIZ -x112 + IIZ - YII'` = z IIX - y112 + 2 IIz identity)
-
IIX-Y112 + i IIX+iyI12 - i IIX-iY1121
Z
(x + Y)112 (Appolonius
4. If'x 1 y, then IIx+Y112 = IIXII2 + 11Y112 (Pythagorean theorem).
PROOF
The norm here is of course the 12 norm. The proofs are computational and left to the reader. 0
Exercise Set 27 1. Explain why (x I y) = y*x. 2. Establish the claims of Theorem 7.1, Theorem 7.2, and Theorem 7.3.
3. Argue that I< x I y >I < < x I x > < y I y > . This is the CauchySchwarz inequality. Actually, you can do a little better. Prove that
I<xIY>1<
Ix,yrl<<xlx>
.Make upanexample
where both inequalities are strict.
4. Let x = (1, 1, 1, 1). Find as many independent vectors that you can such
that <xIy>=0. 5. Argue that < Ax I y >=< x I A*y > for all x, y in C".
7. I The Inner Product Space C"
261
6. Prove that <xlay +(3z>=ax <xIy>+(3<xlz>for all x,y,z in C" and all a, R E C. 7.
Argue that (x I -6 ) = 0 = (-6 l y) for all x, y in C".
8. Prove that if (x I y) = 0, then IIX ± YII2 = IIXI122 + 1ly112
9. Argue that if (Ax I y) = (x I By) for all x, y in C", then B = A*.
10. Prove that Ax = b is solvable iff b is orthogonal to every vector in Null(A*). 11. Let z = (Z I, Z2) E C2 and w = (w1, w2) E C2. Which of the following defines an inner product on C2? (a) (z I w) = zIW2
(h) (zlw)=ziw2-z2wi (c) (z l w) = zi wi + z2w2
(d) (z I w) =2zIWi +i(z2wi -ziw2)+2z2w2. 12. Suppose llx + yll = 9, Ilx - YII = 7, and IIXII = 8. Can you determine Ilyll?
13. Suppose (z I w) = w*Az defines an inner product on C". What can you say about the matrix A? (Hint: What can you say about A* regarding the diagonal elements of A?) 14. Suppose f is a linear map from C" to C. Argue that that there is a unique vector y such that f (x) = (x I y) .
15. Prove 4 (Ax I y) _ (A(x + y) I x + y) - (A(x - y) I x - y) + i (A(x+ iy) I x + iy) - (A(x - iy) I x - iy) . 16. Prove that for all vectors x and y, l1x+yl12-i Ilix+y1l2 =11X112+IIy112i(IIXII2 + IIYIIZ) + 2 (x I y)
Further Reading [Rassias, 1997] T. M. Rassias, Inner Product Spaces and Applications, Chapman & Hall, Boca Raton, FL, (1997).
Inner Products
262
[Steele, 2004] J. Michael Steele, The Cuuchy-Schwarz Muster Class: An Introduction to the Art of Mathematical Inequalities, Cambridge University Press, Cambridge, and the Mathematical Association of America, Providence, RI, (2004).
orthogonal set of vectors, M-perp, unit vector, normalized, orthonormal set, Fourier expansion, Fourier coefficients, Bessel's inequality, Gram-Schmidt process
7.2
Orthogonal Sets of Vectors in Cn
To ask that a set of'vectors be an independent set is to ask much. However, in applied mathematics, we often need a stronger condition, namely an orthogonal set of vectors. Remember, we motivated the idea of dot product in an attempt to get a hold of the idea of the angle between two vectors. What we are saying is that xy the most important angle is a right angle. Thus, if you believe cos 0 = IIxII IIYII
in R" then, for 0 = 90°, we must have x y = 0. This leads us to the next definition.
DEFINITION 7.2 (orthogonal vectors and 1) Let x, y E C". We say x and y are orthogonal and write x 1 y iff (x I y) = 0. That isy*x = 0.ForanysubsetM C C", define M1 ={x E C" I x 1 mforall
m in M). M1 is read "M perp." A set of vectors {xj } in C" is called an orthogonal set iff < xj I xk > = 0 if j # k. As usual, there are some easy consequences of the definitions and, as usual. we leave the proofs to the reader.
THEOREM 7.4
Let x,yEC", aEC,M,NCC".Then
1. xIyiffy.1x 2.
x 1 0 for all x in C"
3. xlyiffxlayforall ain C 4. M1 is always a subspace of C"
7.2 Orthogonal Sets of Vectors in C"
5. Ml
263
= (span(M))1
6. McM" 7. M111 = M1 8.
9. (_)1 = c" 10.
M
C N implies Nl a Ml.
Sometimes, it is convenient to have an orthogonal set of vectors to be of unit length. This part is usually easy to achieve. We introduce additional language next.
DEFINITION 7.3 (unit vector, orthonormal set) Let X E C". We call x a unit vector if (x I x) = 1. This is the same as saying IIxII = 1. Note that any nonzero vector x in C" can be normalized to a unit vector u = X A set of vectors {xi } in C" is called an orthonormal set iff IIxII
(xi I xk)
.
0 if j 54 k. In other words, an orthonormal set is just a set of
I fj = = pairwise orthogonal unit vectors. Note that any orthogonal set can be made into an orthonormal set just by normalizing each vector. As we said earlier, orthogonality is a strong demand on a set of vectors.
THEOREM 7.5 Let D be an orthogonal set of nonzero vectors in C". Then D is an independent set.
PROOF
This is left as a nice exercise.
The previous theorem puts a significant limit to the number of mutually orthogonal vectors in a set; you cannot have more than n in C". For example, (i, 2i, 2i), (2i, i, -2i), and (2i, -2i, i) form an orthogonal set in C3 and being, necessarily, independent, form a basis of C3. The easiest orthonormal basis for C" is the standard basis T e i = ( 1 , 0, ... , 0), - e 2 = (0, 1, 0, ... , 0), ... , -'e " = (0, 0.... , 0, 1). One reason orthonormal sets are so nice is that they associate a particularly nice set of scalars to any vector in their span. If you have had some analysis, you may have heard of Fourier expansions and Fourier coefficients. If you have not, do not worry about it.
Inner Products
264
THEOREM 7.6 (Fourier expansion) Let {u 1 , u2, ... , U,,,) be an orthonormal set of vectors. Suppose x = u I u I +
a2uz + ... + an,um. Then aj = (x I uj) for all j = 1, 2, ... , m.
PROOF
Again, this is a good exercise.
a
DEFINITION 7.4 (Fourier coefficients) Let {ej } be an orthonormal set of vectors in C" and x a vector in C". The set of scalars { (x I ej) } is called the set of Fourier coefficients of x with respect to this orthonormal set. In view of Theorem 7.3, if you have an orthonormal basis of C" and if you know the Fourier coefficients of a vector, you can reconstruct the vector.
THEOREM 7.7 (Bessel's inequality) Let ( e 1 , e2, ... , e,,,) be an orthonormal set in C". If x is any vector in C", then 2
in
I.
"I
x-1: (xIek)ek
=11x112-EI (xIek)12
k=1
k=I
2. (x_>1 (x I ek) ek k=1
3.
1 ej for each j = 1, ... , m
/
I(x l ek)IZ < IIx112. k=I
PROOF We will sketch the proof of (1) and leave the other two statements as exercises. We will use the computational rules of inner product quite intensively: in
m
2
x-(xIek)ek
x - E(xIek)ek k=1
k=1
k=1
(xl =( xx)-ek)ek )- ((x / \\
k= 1
Ik
m
m
+
Iek)ek
j /
((xIeL)eklE (xlek)ek\ k=1
k=1 M
m
=(xlx)-E(xIek)(xIek)-E(xIek)(ek Ix) k=1
k=1
7.2 Orthogonal Sets of Vectors in C" i"
265
m
+J:J: (xIek)(ek Iej)(xIej) k=1 j=1
=IIXII2-21: 1(XIek)I2+EI(XIek)12 k=1
k=1
m
= IIXI12 -
I
(X I ek) 12
.
k=I
a
The reader should be sure to understand each of the steps above. Again there is some nice geometry here. Bessel's inequality may be interpreted as saying that the sum of the squared magnitudes of the Fourier coefficients (i.e., of the components of the vector in various perpendicular directions) never exceeds the square of the length of the vector itself.
COROLLARY 7.1 Suppose {uI , u2, ... , u" } is an orthonormal basis of C". Then for any x in C", we have I. IIXI12 = F_ I(X I uk)12
(Parseval's identity)
k=1
2. x = j (x I uk) Uk
(Fourier expansion)
k=I
3. (xIY)=E (XIuk)(uk IY) forallx,yinC". k=I
Our approach all along has been constructive. Now we address the issue of how to generate orthogonal sets of vectors using the pseudoinverse. Begin with an arbitrary nonzero vector in C", call it x1. Form x*. Then we seek to solve the matrix equation x, x2 = 0. But we know all solutions of this equation are of the
form x2 = [I - xlxi ] v1, where vI is arbitrary. We do not want x2 to be 0, so we must choose v, specifically to be outside span(xi). Let's suppose we have done so. Then we have x I, x2, orthogonal vectors, both nonzero. We wish a third nonzero vector orthogonal to x, and x2. The form of x2 leads us to make a guess
for x3, namely x3 = [I - x1 xI - x2x? ] v2. Then x3 = V2* [/ - x,xt - x2X ]
.
Evidently, x3x, and xiX2 are zero, so x, and x2 are orthogonal to x3. Again we need to take our arbitrary vector v2 outside span {x,,x2} to insure we have not taken x3 = '. The pattern is now clear to produce additional vectors that are pairwise orthogonal. This process must terminate since orthogonal vectors are independent and you cannot have more than n independent vectors in C".
Inner Products
266
To get an orthonormal set of vectors, simply normalize the vectors obtained by the process described above. Let's illustrate with an example.
Example 7.1 i i
In C3, take xi =
. Then
xi =
i
xi
-i
X *J x,
3
[-i/3 -i/3 -i/3]. Then ! - xixt = 13 = 13 -
1/3
1/3
1/3
1/3
1/3
1/3
1/3
1/3
1/3
=
[-i/3 -i/3 -i/3]
Ii]i
2/3
-1/3
-1/3
-1/3
2/3
-1/3
-1/3
-1/3
2/3
1
Evidently, v, =
is not in span
0 0
J), So -1/3 -1/3 -1/3 2/3 -1/3 -1/3 -1/3 2/3 2/3
X2 = [
0
I
0]
=
You may verify x, I x2 if you are skeptical. Next, x2x2 = 2/3 2/3
[ 2/3
-1/3 -1/3
-1/3
-1 /3
-1/3 -1/3
2/3
-1/3 -1/3
2/3
-1/3 -1/3
1/6
1/6
1/6
1/6
,
so13-x,xi -x2x2 =
0
Now v2 =
is not in span
i
1
X3 =
0 1/2 1/2
-
Therefore
i
0
0
0
0 0
1/2
-1/2
-1/2
1/2
,
-1/3 -1/3
(why not?), so 0
. (Note, we could have just as well chosen v2 =
1
)
0
+2/3
i i
-1/2 -1/2]
2/3
i
0
[I
,
-1/3 -1/3
0 ,
1/2 1/2
is an orthogonal set of vectors in
7.2 Orthogonal Sets of Vectors in C"
267
C3 and necessarily a basis for C3. Note, the process must stop at this point since
x,x+1 + x2xz + x3x3
1/3 1/3
1/ 3
1/3 1/3
1/ 3
-1/3 - 1/3
2/3
+
1/3 1/3 1/ 3
-1/3
1/6
1/6
-1/3
1/6
1/6
+
0
0
0
1/2
0 -1/2
0 -1/2
1/2
0 0
1
0
1
0
0 0
= I3.
1
We note a matrix connection here. Form the matrix Q using our
-i
-i
orthogonal vectors as columns. Then Q* Q =
2/3 0
0
2/3
i
-1/3 -1/2 -1/3 1/2
=
i
-3z
0
0
3
-i
-1/3 -1/3 -1/2 1/2
0 0 2/3 is a diagonal matrix with posi1/2 0 0 Live entries on the diagonal. What happens when you compute QQ*? Finally, let's create an orthonormal basis by normalizing the vectors obtained i i
i/
i / 3-
above:
2/f
i/
,
-1 /.
,
. Now form U =
Ilk
3f 0
-1 /.
and compute U*U and UU*.
Next, we tackle a closely related problem. Suppose we are given yi, Y29 Y3, . , independent vectors in C". We wish to generate an orthonormal sequence q, , q2, q3 ... so that span { q, , q2, , qj } = span {Y,, y2, ... , Y j } Yi Yi. Being independent vectors, none of for all j. Begin by setting q, = Y the ys are zero. Clearly span(q,) = span(y1) and q1 q, =
Y,
Yi
y- Y,
YfYi
= 1,
so q, is a unit vector.
Following the reasoning above, set qz = vi [I - q,qi ] = v*, [I - q,q*1]. A good choice of vi is span(y1),
*
Y2
+
(y2 [I - gtgt ] Y2)
2.
For one thing, we know Y2
so any multiple of Y2 will also not be in there. Clearly
q2q, = 0, so we get our orthogonality. Moreover, g2g2
[I - q. q, ] Y2 (Y2* [I
- gigi ] Y2)2
_
=
y2* [I - giq*,]
(yz [I - q'qi ] Y2)T
Yz [I - q i q, ] Y2 = 1. Now the pattern should be clear;
Yi [I - gigi] Y2
Inner Products
268
choose q3 =
Yi [I - gigi - q2q? Note y, (yi [I - q qi - q:qi ] Y3)2
span {y1, yz). General) y
k-1
I I - Egigi rr
Yk
L
* then gk=
1_1
,.
k-1
I
- j=-1 gigj Yk)
(y[k
This is known as the Gram-Schmidt orthogonalization process, named after Erhardt Schmidt (13 January 1876-6 December 1959), a German mathematician who described the process in 1907, and Jorgen Peterson Gram (27 June 1850-29 April 1916), a Danish actuary. It produces orthogonal vectors starting with a list of independent vectors, without disturbing the spans along the way.
Exercise Set 28 1. Let x =
1l
. Find a basis for {x}1.
-I 2. Let x1, x,, ..., x, be any vectors in C". Form the r-by-r matrix G by
G = [(xj I xk )] for j, k = 1, 2, ..., r. This is called the Gram matrix after J. P. Gram, mentioned above. Argue that G is Hermitian. The determinant of G is called the Grammian. Prove that x1 , x2, ..., xr form an independent
set if the Grammian is positive and a dependent set if the Grammian is zero.
3. For complex matrices A and B, argue that AB* = 0 iffCol(A) 1 Col(B).
4. Argue that C" = M ® M1 for any subspace M of C". In particular, prove that dim(M1) = n - dim(M). Also, argue that M = M11 for any subspace M of C. 5. Prove that every orthonormal subset of vectors in C" can be extended to an orthonormal basis of C".
6. Apply the Gram-Schmidt processes to {(i, 0, -1, 2i), (2i, 2i, 0, 2i), (i, -2i, 0, 0)} and extend the orthonormal set you get to a basis of C4. 7. Prove the claims of Theorem 7.3.
7.3 QR Factorization
269
8. Suppose M and N are subspaces of C". Prove that (M nN)1 = M1+N'
and (M + N)1 = M1 n N. 7.2.1
MATLAB Moment
7.2.1.1
The Gram-Schmidt Process
The following M-file produces the usual Gram-Schmidt process on a set of linearly independent vectors. These vectors should be input as the columns of a matrix.
2
function GS = grmsch(A) [m n] = size(A);
I
3
Q(:, 1)=A(:, 1)/norm(A(:, I ));
4
fori=2:n
5
Q(:,i) = A( i) Q(:,1:i I)*Q(:,I:i-1)'*A(:,i);
6
Q(:,i) = Q(:,i)/norm(Q(:,i));
7
end
8
Q
You can see that this is the usual algorithm taught in elementary linear algebra. There is a better algorithm for numerical reasons called the modified GramSchmidt process. Look this up and make an M-file for it. In the meantime, try out the code above on some nice matrices, such as
A
_
I
1
1
1
1
0
1
0
0
1
0
0
QR factorization, unitary matrix, Kung's algorithm, orthogonal full rank factorization
7.3 QR Factorization The orthogonalization procedures described in the previous section have some interesting consequences for matrix theory. First, if A E C"'x", then A* E cnxm, where A* = AT. If we look at A*A, which is n-by-n, and partition
270
Inner Products
A by columns, we see
A*A =
aT iT 2
a2
a"
-T a,,
I
= [a,Tal] = [(ai
I ae)]
In other words, the matrix entries of A*A are just the inner products of the columns of A. Similarly, the matrix entries of AA* are just the inner products of the rows of A. It should now be easy for the reader to prove the following theorem. THEOREM 7.8 Let A E C"' 1". Then I. The columns of A are orthogonal iff A*A is a diagonal matrix.
2. A* is a left inverse of A (i.e., A*A = 1) iff the columns of A are orthonormal. 3. The rows of A are orthogonal iff AA* is a diagonal matrix.
4. A* is a right inverse of A (i.e., AA* = 1) iff the rows of A are orthonormal.
From this, we get the following. COROLLARY 7.2
Let U E C"x" Then the following are equivalent. 1. The columns of U are orthonormal.
2. U*U = 1,,.
3. U*=U-1. 4. UU* = I. 5. The rows of U are orthonormal. This leads us to a definition.
DEFINITION 7.5
(unitary matrix) The square matrix U E C"111 is called unitary if'U satisfies any one (and hence all) of the conditions in Corollary 7.2 above.
7.3 QR Factorization
271
Unitary matrices have many useful features that we will pursue later. But first we want to have another look at the Gram-Schmidt procedure. It leads to a useful factorization of matrices. Suppose, for the purpose of illustration, we have three independent vectors a,, a2, a3 that can be considered the columns of a matrix A 1
(i.e., A = [a, 1212 1 a3]). Say, to be even more concrete, A
=
1 1
a,
We begin the process by taking q, =
=
'Vra
=
*
a,
a,
aa
qTq,
V asa, l
a,
1
1
0 0
0 0 0
Then q, is a unit vector,
Ila, 11
= I. But the new wrinkle here is we can solve back for
i
a , . Namely, a, =
a,a,q, = Ila, II qi But note, a, q, _
a*a ,
a ,a ,
=
= Ila, 11, so a, = (a*gl)glIn our numerical example, q, _
aa, and
2
a, = 2q, sincea,q,
I
1
1
1
=2=
]
a, a, = Ila, 11. Note
2
a,q, is real and positive. By the way, we can now write A = [2q, 1212
1 a3].
Next, we solve back for az. Let C2 = a2 - (a2q,)q, = a2 - (az 1 qi) qi and q2 =
we see c*2 q
C2
cz
C2
Since cz = a2 - (a29i)9, = a2 - (a29i)9,
IIc2Il
= aZgl - (a*gi)q,gi. But q, q, = I so c*qi = 0 = (q, 2
I cz)
making cz I. q,. Therefore, q2 1 q1, since qz is just a scalar multiple of cz. In our numerical example, c2 =
1
0 0
-1
=
-21
so gz =
2
i
z,
I
. Now we do the new thing and solve for az. We see a2 = cz +
(azgi)gi =
c2c92 + (aigi)gi = Ilczll qz + (a2 I qi) qi and so a2g2 =
c2c2g2g2 + (a2q, )q,qz = c2cz = Ilcz II . Again, a2g2 is real and positive, and a2 = (a2q, )q, + (a2g2)g2 = (a2 19i) qi + Ilcz 11 qz In our numerical example, az = Lq, + L q2 so A = [2qi I qi + qz I a31. Finally we take c3 = a3-(aigi)gi -(a2g2)g2 = 1313 -(a3 I qi) qi -(a3 19z) 92. Wecomputec3q, =
272
/cuter Products
'andc;q2 =a3gz-
(a3-(a*ql)q*-(a*g)g2)q1 and conclude c31gl , q2. Take q3 = aigz = =a*sqi-a*qi-'
C3
c3
=
and get q3 1
11C3 11 1
11
q1, q2 In our example, c3 =
i
0 0
- 1 /2
1
-1/2
T
0
_
2
-2 0 0
i
,f2-
f
2
so q3 -
Solving for a3, we see a3 = C3 + (a*gl)g1 + (aigz)gz =
-O2
0 c3c3g3 + (a3*g1)g1 + (a*g2)g2 = 11C3 11 q3 + (a3 I qi) qi + (a3 I q2) q2 and a;q3 = cjc3 = 11c311. So finally, a3 = (a'391)91 + (a392)g2 + (a3g3)g3 = (a3 I q1) 91 + (a3 I q2) q2 + 11C3 11 q3. In our example 1
1
0
a3 =
=(z)
0
1
_
f2
1
2
1
1
2
2
0
1
+(Z )
2
02
so
Z
Z
A = [2q1 I q1 +qz I 2
-0
+(2)
L0
f
1
Zq1+Zq2+ z q3] _ [q1 I q2 I q31 2
1
0
1
0
0
= QR, whe re Q* Q = 1, since t he 2
0 columns of Q are orthonormal and R is upper triangular with positive real entries on the main diagonal. This is the famous QRfactorization. Notice that the columns of A were independent for this to work. We could generalize the procedure a little bit by agreeing to allow dependencies. If we run into one, agree to replace that column with a column of zeros in Q. For example,
A
_
11
3
1
3
1
1
3
0 0 0
1
3
1
0
4
1
4
0
0
0
2 2 2 2
0
0 0 0
i
0
2 I
2 2
2
f 2
0
2
0
0
0
0
2
6
1
4
0 0 0 0
2
0
0 0
1
4
0
0
0
0
0
0
0
0
0
,f2-
2
2
We lose a bit since now Q* Q is diagonal with zeros and ones but R is still upper triangular square with nonnegative real entries on the main diagonal. We summarize our findings in a theorem.
7.3 QR Factorization
273
THEOREM 7.9 (QRfactorization)
Let A E C'' ", with n < m. Then there is a matrix Q E C"' "" with orthonormal columns and an upper triangular matrix R in C", such that A = QR. Moreover, if n = ni, Q is square and unitary. Even more, if A is square and nonsingular R may be selected so as to have positive real numbers on the diagonal. In this case, the factorization is unique. PROOF Suppose A has full column rank n. Then the columns form an independent set in C"'. Apply the Gram-Schmidt procedure as previously illustrated and write A = QR, with Q* Q = I and R upper triangular with positive real entries on the main diagonal. If the columns of A are dependent, we proceed with the generalization indicated immediately above. Let A = [a1 I a2 I ... I a"j. If al at a1 = 6, take q, = "; otherwise take q1 = V-atal Tal II
Next compute C2 = a2 - (a2 gt )ql = a2 - (a2 191) 91 If c2 = ', which c2 happens if a2 depends on a1, set q2 = 1. If c2 0 take q2 = = c C2
C2
k-1
Generally, for k = 2, 3, ... , n, compute Ck = ak - E (akgi)gi = j=1
IIC2II
ak - E (ak I qj ) qi. If ck = '(iff ak depends on the previous a's), set j=1
qk = '. Else, take qk =
Ck
Ce C,
=
Ck
This constructs a list of vec-
II Ck 11
tors q1, q2, ... , q, that is an orthogonal set consisting of unit vectors and the zero vector wherever a dependency was detected. Now, by construction, each qj is a linear combination of a,, ... , aj, but also, each aj is a linear combination of q I, ... , qj. Thus, there exist scalars ak% f o r j = 1 , 2, ... , n such that i
ai = jakigk. k=1
Indeed akj = (akqj) = (ak I qj). To fill out the matrix R we take ak1 = 0 if k > j. W e also take a;j = 0 for all j = 1, 2, ... , n for each i where q1 In this way, we have A = Q R where the columns of Q are orthogonal to each other and R is upper triangular. But we promised more! Namely that Q is supposed to have orthonormal columns. All right, suppose some zero columns occurred.
Take the columns that are not zero and extend to a basis of C'. Say we get qi, q2, ... , q' as additional orthonormal vectors. Replace each zero column in turn by q1, q2, ... , q'. Now we have a new Q matrix with all orthonormal columns. Moreover, QR is still A since each new q' matches a zero row of R. This gives the promised factorization. If A is nonsingular, m = n = rank(A), so Q had no zero columns. Moreover, Q*A = R and Q necessarily unitary says R cannot have any zero entries on its
Inner Products
274
main diagonal since R is necessarily invertible. Since the columns of Q form an orthonormal basis for C"', the upper triangular part of 'R is uniquely determined and the process puts lengths of nonzero vectors on the main diagonal of R. The Q uniqueness proof is left as an exercise. This will complete the proof.
One final version of QR says that if we have a matrix in C"'"" of rank r, we can always permute the columns of A to find a basis of the column space. Then we can apply QR to those r columns.
COROLLARY 7.3
Suppose A E C;"". Then there exists a permutation matrix P such that A P = [QR I M], where Q* Q = I,, R is r-by-r upper triangular with positive elements on its main diagonal. We illustrate with our matrix from above.
AP=
I
I
1
3
I
1
0
3
0 0
0 0
3
1
1
3
f z
z
Ti
2
z 2
1
z1
_I 2
0 0
4 4 0 0 0 0 0
0
0 0 0 0
2
1
0
1
0 0 0
0 0 0
6
2 2
0 0
0 0 0 0
4 4
0 0 0
So, what good is this QR factorization? Well, our friends in applied mathematics really like it. So do the folks in statistics. For example, suppose you have a system of linear equations Ax = b. You form the so-called "normal equations" A*Ax = A*b. Suppose you are lucky enough to get A = QR. Then A*Ax = R*Q*Qx = A*b so R*Rx = R*Q*b. But R* is invertible, so we are reduced to solving Rx = Q*b. But this is an upper triangular system that can be solved by back substitution. There is an interesting algorithm due to S. H. Kung [2002] that gives a way of finding the QR factorization by simply using elementary row and column operations. We demonstrate this next.
7.3.1
Kung's Algorithm
Suppose A III. Then A* E C" and A*A E C""". We see A*A is square and Hermitian and its diagonal elements are real nonnegative numbers.
7.3 QR Factorization
275
Suppose A has n linearly independent columns. Then A*A is invertible. Thus we can use ' "2 1) pairs of transvections (T,j (c)*, T;j (c))to "clean out" the off diagoD, nal elements and obtain (AE)*(AE) = E*A*AE = diag(di, d2,...,
a diagonal matrix with strictly positive elements. This says the columns of
AE are orthogonal. Let C = diag[ d , ...,
Then Q = AEC-' has or-
thonormal columns. Also, E is upper triangular, so E-' is also as is R = CE-'. Finally note, A = QR. i 0 3 2 0 0 Then Let's work through an example. Suppose A = 0 2 i 0 4i 0 0 21 0 0 121 0 -3i 0 5 0 = and Ti 21 )A*ATi3(3) 0 5 0 A*A = 21
-
I
3i
0
0
9
21
0
0
60
0 21
0
0
Thus, C =
0 V
0
z1 AEC-' _
4_1
0 0
L
a, 21
055 0
0
_
0
1
0
0
0
1
zi
2
0
and E =
2 105
21
andR_CE` =
Now Q
0 -iV
0f 0
0
0 2
The reader may verify that A = QR. We finish this section with an important application to full rank factorizations.
THEOREM 7.10 Every matrix A E C111111 has a full rank factorization A = FG, where F* = F+.
PROOF
Indeed take any full rank factorization of A, say A = FG. Then the
columns of F form a basis for Col(A). Write F = QR. Then A = (QR)G = Q(RG) is also a full rank factorization of A. Take F1 = Q, G, = RG. Then Fi* F, = Q* Q = 1, but then Fi = (Fl* F1) ' Fi = Fl*. O DEFINITION 7.6 (orthogonal full rank factorizations) If A = FG is a full rank factorization and F* = F+, call the factorization an orthogonal full rank factorization.
!niter Products
276
Exercise Set 29 1. Find an orthogonal full rank factorization for
0
1
1
1
0
1
1
1
0
1
0
1
Further Reading [Kung, 2002] Sidney H. Kung, Obtaining the QR Decomposition by Pairs of Row and Column Operations, The College Mathematics Journal, Vol. 33, No. 4, September, (2002), 320-321.
7.3.2 MATLAB Moment 7.3.2.1
The QR Factorization
If A is an m-by-n complex matrix, then A can be written A = QR, where Q is unitary and R is upper triangular the same size as A. Sometimes people use a permutation matrix P to permute columns of A so that the magnitudes of the diagonal of R appear in decreasing order. Then AP = QR. MATLAB offers four versions of the QR-factorization, full size or economy size, with or without column permutations. The command for the full size is
[Q, R] = qr(A). If a portion of R is all zeros, part of Q is not necessary so the economy QR is obtained as
[Q, R] = qr(A, 0). To get the diagonal of R lined up in decreasing magnitudes, we use
[Q. R, P] = qr(A).
277
7.3 QR Factorization
Let's look at some examples.
>> A=pascal(4)
A= l
1
I
1
2
3
4
I
3
6
10
1
4
10
20
I
>> [Q,R]=9r(A)
Q=
-0.5000 -0.5000 -0.5000 -0.5000
R=
0.2236
-0.6708
0.5000
-0.2236
-10.000 -6.7082
-17.5000 -14.0872
-0.2236 -0.6708
-5.000
-2.000 0
0.5000
-0.5000 -0.5000
0.6708 0.2236
-2.2361
0 0 >> format rat
0 0
0.6708
1.000
3.5000
0
-0.2236
>>[Q,R]=qr(A)
Q=
R=
-1/2 -1/2 -1/2 -1/2
646/963
1/2
646/2889
646/2889
-1/2 -1/2
-646/963
1 /2
-646/2889
- 646/2889 - 646/963
646/963
-2
-5
-10
-35/2
0 0 0
-2889/1292
-2207/329
0 0
0
-4522/321 7/2 -646/2889
1
To continue the example, >>[Q,R,Pl=9r(A)
Q= -202 /4593
-1414/2967
1125/1339
-583/2286 1787/2548
-1190/1781
-577/3291
-26 3/598 -26 3/299
-217/502
-501/1085
-621/974
699/1871
975/4354
320/1673
-469 3/202 0
-1837/351
-1273/827
-1837/153
-2192/1357
-846/703
0
357/836
0
0
-1656/1237 -281/1295 -301/4721
-263 /1495
R= 0 0
Inner Products
278
P0
0 0 1
0
1
1
0
0 0
0 0
0 0
0
1
Note that as a freebee, we get an orthonormal basis for the column space of A, namely, the columns of Q.
7.4 A Fundamental Theorem of Linear Algebra We have seen already how to associate subspaces to a matrix. The rank-plus-
nullity theorem gives us an important formula relating the dimension of the null space and the dimension of the column space of a matrix. In this section, we develop further the connections between the fundamental subspaces of a matrix. We continue the picture of a matrix transforming vectors to vectors. More specifically, let A be an m-by-n matrix of rank r. Then A transforms vectors from Cn to vectors in (C' by the act of multiplication. A similar view applies to A, A*, AT, and At We then have the following visual representation of what is going on.
Cn
(Cm
Amxn, A
FA-,AT,A'=AT
Figure 7.2:
The fundamental subspaces of a matrix.
The question now is, how do the other subspaces fit into this picture'? To answer this question we will use the inner product and orthogonality ideas.
Recall (x I y) = y*x = >x; y; and M1 denote the set of all vectors that
7.4 A Fundamental Theorem of Linear Algebra
279
are orthogonal to each and every vector of M. Recall that M = M11 and
bf ® M1 = C" for all subspaces M of C". Also, if N CM, then iy1 C N1. We will make repeated use of a formula that is remarkably simple to prove. THEOREM 7.11 Let A be in Cmxn X E C", Y E Cm. Then
(Ax I y) = (x I A*y). PROOF
We compute (x I A*y) = (A*y)*x = (y*A**)x
= (y*A)x = y*(Ax) = (Ax I y). Next, we develop another simple fact - one with important consequences. First, we need some notation. We know what it means for a matrix to multiply a single vector. We can extend this idea to a matrix, multiplying a whole collection of vectors.
DEFINITION 7.7
Let A E C"" and let M be a subset of C"
LetA(M)=(AxI xEM}. Naturally, A(M)CC' This is not such a wild idea since A(C") = Col(A) and A(Null(A)) _
{-6} . Now for that simple but useful fact.
THEOREM 7.12
LetAEC'"x",M CC",and N CC"'.Then A(M) c N iU A*(N1) C M1. PROOF We prove the implication from left to right. Suppose A(M) _g N. We need to show A*(N1) c M1. Take a vector y in A*(N1). We must argue that y E M1. That is, we must showy 1 m for all vectors m in M. Fix a vector
m E M. We will compute (y I m) and hope to get 0. But y = A*x for some x E N1 so (y I m) = (A*x I m) = (x I Am). But X E N1, and Am E N, so this inner product is zero, as we hoped. To prove the converse, just apply the result just proved to A*. Conclude that A**(M11) c N11 (i.e., A(M) C N). This completes the proof.
Now we get to the really interesting results.
0
Inner Products
280
THEOREM 7.13
Let A E C"". Then 1. Null(A) = Col(A*)1
2. Null(A)1 = Col(A*) 3. Null(A*) = Co!(A)1
4. Null(A*)1 = Col(A). PROOF It is pretty clear that if we can prove any one of the four statements above by replacing A by A* and taking "perps," we get all the others. We will focus on (1). Clearly A*(Null(A*)) c (0 ). So, by 7.12, A((_6)1) c_ A(ull(A*)1. But (-iT)J = C"; so this says A(C") C lVull(A*)1, However, A(C") = Col(A), as we have already noted, so we conclude Col(A) _c Null(A*)1. We would like the other inclusion as well. It is a triviality that A(C") C A( C"). But look at this the right way: A(C") S; A(C") = Col(A). Then again, appealing to 7.12, A*(Col(A)1) C Ci1 = (-(I). Thus, Col(A)1 C_ Null(A*), which gets us NUII(A*)1 C Col(A). 0
COROLLARY 7.4 Let A E C' x 1. Then
I. Null(A) = Col(AT)1 = Row(A)1 2. Col(A) = Null(AT)1
3. Null(A)1 = Row(A) 4.
Null(AT) = Col(A)1.
We now summarize with a theorem and a picture.
THEOREM 7.14 (fundamental theorem of linear algebra) Let A E C; X" Then 1. dim(Col(A)) = r.
2. dim(Null(A)) = n - r. 3. dim(Co!(A*)) = r.
4. dim(Null(A*)) = in - r.
7.4 A Fundamental Theorem of Linear Algebra
281
Cm
Cn
Amxn, A
4A+, AT, A'=AT
Figure 7.3:
Fundamental theorem of linear algebra.
5. Mull(A) is the orthogonal complement ofCol(A*). 6. NUll(A*) is the orthogonal complement ofCol(A).
7 iVull(A) is the orthogonal complement of Row(A). There is a connection to the problem of solving systems of linear equations. COROLLARY 7.5
Consider the system of linear equation Ax = b. Then the following are equivalent:
1. Ax = b has a solution. 2. b E Col(A) (i.e., b is a linear combination of the columns of A).
3. A*y = -6 implies b*y = 0. 4. b is orthogonal to every vector that is orthogonal to all the columns of A.
Exercise Set 30 1.
If A E C"1x", B E C^xP, and C E Cnxt', then A*AB = A*AC if and only if AB = AC.
2. Prove thatA(ull(A*A) = Null(A).
Inner Products
282
3. Prove that Col(A*A) = Col(A*). 4. Fill in the details of the proof of Corollary 7.4. 5. Fill in the details of the proof of Theorem 7.14.
Further Reading [Strang, 1988] Gilbert Strang, Linear Algebra and its Applications, 3rd Edition, Harcourt Brace Jovanovich Publishers, San Diego, (1988).
[Strang, 1993] Gilbert Strang, The Fundamental Theorem of Linear Algebra, The American Mathematical Monthly, Vol. 100, No. 9, (1993), 848-855. [Strang, 20031 Gilbert Strang, Introduction to LinearAlgebra, 3rd Edition, Wellesley-Cambridge Press, Wellesley, MA, (2003).
7.5
Minimum Norm Solutions
We have seen that a consistent system of linear equations Ax = b can have many solutions; indeed, there can be infinitely many solutions and they form an affine subspace. Now we are in a position to ask, among all of these solutions, is there a shortest one'? That is, is there a solution of minimum norm? The first question is, which norm? For this section, we choose our familiar Euclidean norm, II x 112 =
tr(x`x)=. DEFINITION 7.8 (minimum norm) We say xo is a minimum norm solution of Ax = b if xo is a solution and 11xoII _< IlxiI for all solutions x of Ax = b.
Recall that 1-inverses are the "equation solvers." We have established the consistency condition: Ax = b is consistent iff AGb = b for some G E A{ 1). In this case, all solutions can he described by x = Gb + (I - GA)z, where z is
7.5 Minimum Norm Solutions
Figure 7.4:
283
Vector solution of various length (norm).
arbitrary in C. Of course, we could use A+ for G. The first thing we establish is that there is, in fact, a minimum norm solution to any consistent system of linear equations and it is unique. THEOREM 7.1 S
Suppose Ax = b is a consistent system of linear equations (i.e., b E Col(A)). Then there exists a unique solution of Ax = b of minimum norm. In fact, it lies in Col(A*).
PROOF
For existence, choose xo = A+b. Take any solution x of Ax =
b. Then x = A+b + (1 - A+A)z for some z. Thus, IIx112 = (x I x) = (A+b + (I - A+A)z I A+b + (I - A+A)z) = (A+b I A+b) + (A+b I (IA+A)z) + ((I - A+A)z I A+b) + ((I - A+A)z I (/ - A+A)z) IIA+bII2 + ((1 - A+A)A+b I z)+(z I (/ - A+A)A+b) +II(/ - A+A)zII2 = IIA+bII2+ 0+0+ 11(1 - A+A)zll2 > IIA+bII2, with equality holding iff II(/ - A+A)zjI = iff x = A+b. Thus, A+b is the unique minimum norm 0 iff (I - A+A)z = 0 solution. Since A+ = A*(AA*)+, we have A+b E Col(A*). Once again, we see the prominence of the Moore-Penrose inverse. It turns out that the minimum norm issue is actually intimately connected with 11,41inverses. Recall, G E A{ 1, 4} iff AGA = A and (GA)* = GA.
Inner Products
284
THEOREM 7.16
LetGEA{1,4}.Then
1. GA=A+A 2.
(1 - GA)* = (I - GA) = (I - GA)2
3. A(I - GA)* = 4.
(I -GA)A* =O.
PROOF Compute that GA = GAA+A = (GA)*(A+A)* = A*G*A*A+* _ (AGA)*A+* = A*A+* = (A+A)* = A+A. The other claims are now easy and left to the reader.
0
So, { 1,4}-inverses G have the property that no matter which one you choose, GA is always the same, namely A+A. In fact, more can be said. COROLLARY 7.6
G E A{ I, 4} iff GA = A+ A. In particular, if G E A{ I, 4}, then Gb = A+b for any b E Col(A). Thus { 1,4 }-inverses are characterized by giving the minimum norm solutions.
THEOREM 7.17 Suppose Ax = b is consistent and G E A 11, 4). Then Gb is the unique solution of minimum norm. Conversely, suppose H E C"" and, whenever Ax = b is consistent, A Hb = b and II Hb II < llz ll for all solutions z other than Hb; then
HEA( I,4). PROOF
The details are left to the reader.
0
Exercise Set 31 1. Suppose G E A(l, 4). Argue that A{1, 4) _ {H I HA = GA). 2. Argue that A{1, 4} = {H I HA = A' A). 3. Suppose G E A{ 1, 4). Argue that A(1, 4) _ {G + W(I - AG) I W arbitrary l.
7.6 Least Squares
285
4. Suppose G E A{1, 3} and HE A{1,4}. Argue that HAG = A+. 5. Let u and v be in Col(A*) with Au = Av. Prove that u = v.
7.6
Least Squares
Finally, the time has come to face up to a reality raised in the very first chapter.
A system of linear equations Ax = b may not have any solutions at all. The realities of life sometime require us to come up with a "solution" even in this case. Again, we face a minimization problem. Once again, we use the Euclidean
norm in this section. Suppose Ax = b is inconsistent. It seems reasonable to seek out a vector in the column space of A that is closest to b. In other words, if b ¢ Col(A), the vector r(x) = Ax - b, which we call the residual vector, is never zero. We shall try to minimize the length of this vector in the Euclidean norm. Statisticians do this all the time under the name "least squares." DEFINITION 7.9 (least squares solutions) A vector x0 is called a least squares solution of the system of linear equations Ax = b i./f IlAxo - bII IIAx - bII for all vectors x. Remarkably, the connection here is with { 1, 3)-inverses. THEOREM 7.18
Let AEC > andGEA{1,3}.Then
1. AG=AA+ 2.
(I - AG)* = I - AG = (I - AG)2
3.
(I - AG)*A =
4. A'(I - AG) = 0. PROOF For (1), we compute AG = AA+AG = AA+*(AG)* = A+* x A*G*A* = A+*(AGA)* = A+*A* = (AA+)* = AA+. The other claims are 0
now clear.
COROLLARY 7.7 GEA{1,3}iffAG=AA+
286
Inner Products
THEOREM 7.19
Suppose G E A( 1, 3). Then x0 = Gb is a least squares solution of the linear system Ax = b.
PROOF Suppose G E A (1, 3). We use the old add-and-subtract trick. IAx - bl12 = IIAx - AGb - b + AGb112 = IIA(x - A+b)) + (AA+b - b)II2
= (A(x - A+b)) + (AA+b - b) I A(x - A+b)) + (AA+b - b)) = (A(xA+b)) I A(x - A+b)))+ (A(x - A+b) I (AA+b - b)) + ((A A+b - b) I A(xA+b))) + ((AA+b - b) I (AA+b - b)) = IIA(x - A+b))II2 + II(AA+bb)112 > II(AA+b - b)II2 = II(AGb - b)112, and equality holds iff IIA(x - A+b))II2 = 0 iff x - A+b E Null(A). 0 THEOREM 7.20 Let G be any element of A{ 1, 3}. Then xi is a least squares solution of the linear
system Ax =biff 1lAx, -bll = llb - AGbll PROOF Suppose IIAx, - bll = llb - Axoll, where xo = Gb. By our theorem above, Ilb - Axoll < llb - AxII for all x, so IIAxi - bll = Ilb - Axoll lb - Ax II for all x, making x, a least squares solution. Conversely, suppose x, is a least squares solution of Ax = b. Then, by definition, II Ax, - bil < ll Ax - bll for all choices of x. Choose x = Gb. Then Il Ax, - bil < IIAGb - bll. But Gb is a least squares solution, so IIAGb - bll < IIAx - bll for all x, so if we take
x, for x, IIAGb - bll < IlAx, - bll. Hence, equality must hold.
0
THEOREM 7.21 Let G be any element of A{ I , 3). Then, xo is a least squares solution of Ax = b
ff Axo = AGb = AA+b.
PROOF
Note AG(AGb) = (AGA)(AGb) = AGb, so the system on the
right is consistent. Moreover, IIAxo - bll = II AGb - bll, so xo is a least squares
solution of the left-hand system. Conversely, suppose xo is a least squares solution. Then IIAxo - bll = Ilb - AGbII. But IIAxo - b112 = IIAxo - AGb+ AGb-b112 = IIAxo - AGbll2+llb AGb112, which says IIAxo - AGbll = 0. Thus, Axo = AGb. 0
-
As we have noted, a linear system may have many least squares solutions. However, we can describe them all.
7.6 Least Squares
287
THEOREM 7.22 Let G be any element of A{ I , 3). Then all least squares solutions of Ax = b
are of the form Gb + (I - G A)z for z arbitrary. PROOF Let y = Gb + (1 - GA)z and compute that A(I - GA)y = -6 so Ay = AGb. Hence, y is a least squares solution. Conversely, suppose x is = -GA(x - Gb) whence a least squares solution. Then Ax = AGb so
6
x = x-GA(x-Gb) = Gb+x-Gb-GA(x-Gb) = Gb+(1-GA)(x-Gb). Take z = x - Gb to get the desired form.
0
It is nice that we can characterize when a least squares solution is unique. This often happens in statistical examples. THEOREM 7.23 Suppose A E C'""". The system of linear equations Ax = b has a unique least squares solution iff rank(A) = n.
Evidently, the least squares solution is unique iff I - GA = 0 if GA = 1. This says A has a left inverse, which we know to be true if PROOF
rank(A) = n (i.e., A has full column rank).
0
Finally, we can put two ideas together. There can be many least squares solutions to an inconsistent system of linear equations. We may ask, among all of these, is there one of minimum norm? The answer is very nice indeed. THEOREM 7.24 (Penrose)
Among the least squares solutions of Ax = b, A+b is the one of minimum norm. Moreover, if G has the the property that Gb is the minimum norm least squares
solution for all b, then G = A+. PROOF
The proof is left to the reader.
0
Isn't the Moore-Penrose inverse great? When Ax = b has a solution, A+b is the solution of minimum norm. When Ax = b does not have a solution, A+b gives the least squares solution of minimum norm. You cannot lose computing A+b!
We end this section with an example. Consider the linear system
I xt+x2= 1
xi-x2=0. 3
x2= 4
Inner Products
288
In matrix notation, this system is expressed as I
1
Ax
r XI
-1
0
[
1
X2
l
I
=b.
0
1=
3
Evidently, A has rank 2, so a full rank factorization trivially is A = FG r
-1
1
0
[
1
I
-
[
I
4
I
6
20
so A+ = (F*F)-IF*
0 3J
I
[
-1
1
0 ]
[
0
].WecomPuteAA+b=
I
3
3
"J
1
1
I
-1
0
3
2
b
0
6
4
0 0 3
3
1
II
1
I
I
[ t _
=
4
I
#b=
=
0
Thus, the system is
3
4
12
inconsistent. The best approximate solution is xo = A+b =
I
1
12
J
[
.
that one can compute a measure of error: JlAxa - bil = IIAA+b -
Note
bll =
II
0 12
12
12
12
TU
T2-
4
Note that the error vector is given by E = [I - A+A]b. We have chosen this example so we can draw a picture to try to gain some intuition as to what is going on here.
x2 = 3/4
1/2
Figure 7.5:
A least squares solution.
The best approximate solution is (1, 7
xl
7.6 Least Squares
289
Exercise Set 32 1. Consider the system
1
0
2
1
2 1
2
2
-l
2
-2
0 -2
X2 X3
=
2
Verify that
I
this system is inconsistent. Find all least squares solutions. (Hint: There will be infinitely many.)
2. We have previously seen that G = A+ + (! - A+A)W + V(1 - AA+) is a 1-inverse of A for any choice of V and W of appropriate size. If you did not complete exercise 9 on page 205, do it now. Argue that choosing V = 0 makes G a 3-inverse as well. If instead we choose W = ®, argue that we get a 4-inverse as well.
3. Argue that if G E AA*{ 1, 2), then A*G is a (1, 3)-inverse of A.
4. Argue that if H E A*A(1, 2), then HA* is a 11, 4)-inverse of A.
5. LetG E A{1, 3, 4). Argue that A{1, 3,4) = (G+(!-GA)W(1-AG) I W arbitrary of appropriate size}.
6. LetH E A( 1, 2,4). Argue that A( 1, 2,4) _ {H+HV(I -AH) I V arbitrary of appropriate size}.
7. LetKEA(l,2,3).Argue that A{1,2,3)=(K+(I-KA)WK I W arbitrary of appropriate size).
8. Argue that xp is the minimum norm least squares solution of Ax = b if a) IIAxo - b11 < 11 Ax - b11 and b)IIxoII < llxll for any x 0 xo.
9. Suppose A = FG is a full rank factorization of A. Then Ax = b if
FGx = b iff a) Fy = b and b) Gx = y. Argue that y = F+b = (F*F)-i F*b and IIFy - bll is minimal. Also argue that Gx = F+b is always consistent and has minimum norm.
10. Argue that xo is a least squares solution of Ax = b iff xo is a solution to the always consistent (prove this) system A*Ax = A*b. These latter are often called the normal equations. Prove this latter is equivalent to Ax - b E Null(A*). 11. Suppose A = FG is a full rank factorization. Then the normal equations are equivalent to F*Ax = F*b.
290
Inner Products
Further Reading [Albert, 19721 A. Albert, Regression and the Moore-Penrose Pseudoinverse, Academic Press, New York, NY, (1972).
[Bjorck, 1967] A. Bjorck, Solving Linear Least Squares Problems by Gram-Schmidt Orthogonalization, Nordisk Tidskr. InformationsBehandling, Vol. 7, (1967), 1-21.
Chapter 8 Projections
idempotent, self-adjoint, projection, the approximation problem
8.1
Orthogonal Projections
We begin with some geometric motivation. Suppose we have two nonzero vectors x and y and we hold a flashlight directly over the tip of y. We want to determine the shadow y casts on x. The first thing we note is
,-.
z Figure 8.1:
Orthogonal projection.
the shadow vector is proportional to x, so must be of the form ax for some scalar a. If we can discover the scalar a, we have the shadow vector, more formally the orthogonal projection of y onto x. The word "orthogonal" comes from the fact that the light was held directly over the tip of y, so the vector z forms a right angle with x. Note, ax + z = y, so z = y - ax and z 1 x. Thus,
0=== -a <xIx>soa= .This is <xIx> great! It gives a formula to compute the shadow vector.
DEFINITION 8.1
Let x, y be vectors in C" with x # -6. We define the
orthogonal projection of y onto x by
yIx > X. < xIx >
291
292
Projections
First, we note that the formula can he written as P,,(y)x=ykx = x#y x
xx
xx
(xx+)y. Here comes the Moore-Penrose (MP) inverse again! This suggests that Px can be viewed as the matrix xx+, and orthogonal projection of y onto x can be achieved by the appropriate matrix multiplication. The next thing we note is that Px is unchanged if we multiply x by a nonzero scalar. Thus Px depends on the "line" (i.e., one dimensional subspace) span(x) and not just x.
LEMMA 8.1 For any nonzero scalar 0, Ppx(Y) = P,,(Y)
PROOF < x> Ppx(Y) = x l
=
< Ix>
<xlx> x
= Px(Y)
0
So, from now on, we write Pp(,,) instead of P, indicating the connection of P,y,(x) with the one dimensional subspace span(x). Next, to solidify this connection with sp(x), we show the vectors in sp(x) are exactly those vectors left fixed by
LEMMA 8.2 {Y1 Pcp(x)(Y) = Y} = sp(x)
If y = P,y,(x)(y), then y = ` Ylx > x, a multiple of x so y E sp(x). < xIx > Conversely, if y E sp(x), then y = (3x for some scalar 0, so P,y,(x)(y) _ ` (3xJx >x R < xIx > x 0 Psn(x)(Rx) _ < xIx > = < xIx > = (3x = Y. PROOF
Next, we establish the geometrically obvious fact that if we project twice, we do not get anything new the second time.
LEMMA 8.3 For all vectors y, P,y,(x)(P,y,(x)(Y)) = P,p(x)(Y)-
PROOF <x z>
Pvt,(x)(P,t)(x)(Y)) = P,./,(.)
< xIx >
<xix>
C
< Yix > x < xIx >
< yix > x = <xIx> x = P1,(x)(Y)
-
/ < lx> xlx1 \ <xIx>
<xix>
/
x
= 0
293
8. 1 Orthogonal Projections
Taking the matrix point of view, P,p(x) = xx+, we have (P,("))2 = xx+xx+
= xx+ = P,p(x). This says the matrix Pp(,,) is idempotent. Let's make it official.
DEFINITION 8.2
A matrix Pin C' x" is called idempotent iff P2 = P.
Next, we note an important relationship with the inner product. LEMMA 8.4 For ally, z in C", < P,,,( )(z)ly > = < zlP+p(x)(Y) >.
PROOF
We compute both sides:
< Pp(x)(Z)lY >=
< zlx > < zlx > < xly >, but also, <XIX> xiy > = <xlx>
=x>= _ < xlx >
< xlx >
< xlx >
< xly > .
0
Okay let's make another definition.
DEFINITION 8.3 (self-adjoin[) A matrix P is self adjoint iff < Pxly >=< xl Py > for all x, y.
In view of our fundamental formula < Axly >=< xlA*y > for all x, y, we see self-adjoint is the same as Hermitian symmetric (i.e., P = P*). This property of Pp(x) is obvious because of the MP-equation: (P,.p(x))* = (xx+)* _ Next, we establish the "linearity" properties of Pp(x). LEMMA 8.5 1.
P,,,(x)(Xy) = XPp(x)(y) for X any scalar and y any vector.
2.
P,p(x)(Yi + Y2) = P11p(x)(Yi) + P,p(x)(Y2) for any vectors yi and Y2.
PROOF By now, it should be reasonable to leave these computations to the reader. 0 Next, we note a connection to the orthogonal complement of sp(x), sp(x)l.
Projections
294
LEMMA 8.6 1. y - P,y,(x)(y) 1 P,,(,,)(y) jor all vectors y. 2. Y - P,p(x) (Y) E sp(x)' for all vectors y.
PROOF
Again we leave the details to the reader.
Finally, we end with a remarkable geometric property of orthogonal projections. They solve a minimization problem without using any calculus. Given a vector y not in sp(x), can we find a vector in sp(x) that has the least distance to x?
That is, we want to minimize the distance d(y, z) = llY - zil as we let z vary through sp(x). But lit - z112 = 11(Y - P,y,(X)(Y)) + (P 1,(%)(Y) - z)ll2 =
lly -Pt ,,tx,(y)ll` + ll P,t,(x)(y) -
zll2
by the Pythagorean theorem. Note
(PP,,(x)(Y) - y) 1 (y - PNtx, (z)) since y - P,p(x)(z) E sp(x)1. Now conclude II P,p(x)(y) - yll < lly - zll. So, among all vectors z in sp(x), P,N(x)(y) is closest to y. We have proved the following.
LEMMA 8.7 d (Pcp(%)(Y), y)
d (z, y) for all z E sp(x).
We can generalize this idea.
DEFINITION 8.4 (the approximation problem) Let M be a subspace of C" and let x be a vector in C. Then by the approximation problem for x and M we mean the problem of finding a vector mo in M such that IIx - moll < IIx - mll for all m E M. We note that if x is in M. then clearly x itself solves the approximation problem for x and M. The next theorem gives some significant insight into the approximation problem.
THEOREM 8.1
1. A vector mo E M solves the approximation problem for x and M if and only if (x - mo) E M1. Moreover,
2. If m1 and m2 both solve the approximation problem for x and M, then m1 = m2. That is, if the approximation problem has a solution, the solution is unique.
8.1 Orthogonal Projections
295
PROOF 1.
First, suppose that mo c M solves the approximation problem for x and M. We show z = x - mo is orthogonal to every m in M. Without loss of generality, we may assume Ilml = 1. For any K E C, we have
IIz-km112== - i-K <mlz >+k<mlm>),=IIz112-I 12+I 12- IIzIlZ- 112+( -K)( -k)= IIZ112-
12 + I
_K12.
We may now choose K to be what we wish. We wish k = < zlm >. Then, our computation above reduces to IIz ]tm112 = IIZ112 - I <
-
zlm> 12. Nowz-km=(x-mo)-hm=x-(mo+Km).Since mo
and m lie in M, so does mo + km, and so, by assumption, Ilx - moll IIx - (mo + km)ll. We can translate back to z and get llzll < Ilz - kmll , so IlzllZ < Ilx - km112 Therefore, we can conclude IIZI12 < IIZI12 - 1 < zlm > 12. By cancellation, we get 0 < -1 < zlm > 12. But the only way this can happen is that I < zlm > I = 0, hence < zim >= 0. Thus, z 1 m (i.e., (x - mo) 1 m), as we claim. Conversely, suppose mo is a vector in M such that x - mo E Ml. We claim mo solves the approximation problem for x and M. By the Pythagorean theorem, II(x - mo) + m112 = Ilx - moll2 + IIm112 IIx - mo112 for any m E M. This says IIx - moll' IIx - mo + m112
Let n E M and take m = mo - n which, since M is a subspace, still belongs to M. Then IIx - moll < II(x - mo) + (mo - n)II = IIx - n1l. Since n was arbitrary from M, we see mo solves the approximation problem for x and M. 2. Now let's argue the uniqueness by taking two solutions of the approximation problem and showing they must have been equal all along. Suppose m1 and m2 both solve the approximation problem for x and M. Then both
x - mi E Ml and x - m2 E M1. Let m3 = m1 - m2. We hope of course that m3 = '. For this, we measure the length of m3. Then 11m3112 =
< m31m3 >=< m31(x-m2)-x+mi >= < m31(x-m2)-(x-MI) >= < m3l(x-m2) > - < m31(x-mi) >= 0-0 = 0. Great! Now conclude 11m311 = 0 so m3 = -6; hence, mi = m2 and we are done.
a
In particular, this theorem says Pp(x)(y) is the unique vector in sp(x) of minimum distance from y. We remark that the approximation problem is of great interest to statisticians. It may not be recognized in the form we stated it above, but be patient. We will get there. Also, you may have forgotten the problem set
out in Chapter I of finding "best" approximate solutions to systems of linear
Projections
296
equations that have no solutions. We have not and we have been working toward this problem since we introduced the notion of inner product. Finally, note that the approximation problem is always solvable in C". That is, let x E C" and M he any subspace. Now, we showed awhile back that C" = M®M1, sox is uniquely
expressible x = m + n, where m E M, n E M. But then, x - m = n E Ml, so by our theorem, m solves the approximation problem for x and M. We have seen how to project onto a line. The question naturally arises, can we project onto a plane (i.e., a two-dimensional subspace). So now take a two-dimensional subspace M of C". Take any orthonormal basis (ul, U2) for M. Guided by the matrix formula above, we guess PM = [u11u2][ullu2]+. From the MP equations, we have p2 = PM = PM. For any matrix A, recall
Fix(A) = (xIAx = x). Clearly, 0 is in Fix(A) for any matrix A. We claim M = Fix(PM) = Col(PM). Let Y E Fix(PM). Then y = PMy E Col(P,1t). Conversely, if y E Col(PM), then y = PM(x) for some x. But then PM(y) = PMPM(x) = PM(x) = y. This establishes Fix(PM) = Col(PM). To get that these are M, we need a clever trick. For any matrix A, A+ = A*(AA*)+. Then PM = [111 112][[111 1U21+ = [U11112]([ui 1u21`([ul1u2][u} U ]')+ = [u11u21
([] ([1111112] [° ])+) _ [ullu2]([ ] (ului +u2u)+).However, (ului+ U2U*2 )2
(11111*1 +11211,*)(UtU*I +U2U*2) = (U1U*1U1U*1 +UII U*U2U2*+U2U2*U1U1 +
U2LJ U2u,* = 111111 + u2u2*. Also, (ulu* + u2u',)* = utui + u2u2*. Therefore, (utuT +u2uZ)+ = 111111 +u2uZ. Thus, P M = ([1111112] [! ]) (111111 +u2uZ) _
(UI u7 + 11211;)2 = uI u + u2u;. This rather nice formula implies PM(uI) = uI
and PM(u2) = u2, so PMx = x for all x in M since PM fixes a basis. This evidently says M C_ Fix(PM). But, if x E Fix(PM), then x = PM(x) = (111111 + u2uZ)x = uIUix + U2UZx = (111x)111 + (u2*x)u2 E sp{u1,U2) = M.
This completes the argument that M = Fix(PM) = Col(PM). Finally, we show that PM(x) solves the approximation problem for x and M. It suffices to show x - PM(x) E M1. So take m E M and compute the inner product
< x-PM(x)Im >_ < xIm > - < PM(X)IM >_ < xIm > - < xI PM(m) >= < xIm > - < xIm >= 0. Note that we used PM = PM and M = Fix(PM) in this calculation. Now it should be clear to the reader how to proceed to projecting on a threedimensional space. We state a general theorem.
THEOREM 8.2 Let M be an m-dimensional subspace of C. Then there exists a matrix P5, such that
1. PM=PM=PM 2. M = Fix(PM) = Col(PM) 3. For any x E C", PM(x) solves the approximation problem for x and M.
297
8. 1 Orthogonal Projections
Indeed, select an orthonormal basis {u1, u2, ... , u,n} for M and form the matrix PM = [Ul IU2I ... IUn,][UI IU2I ... I U,n]+ = UIUI +U2U*2 +
+UnnUM* .
Then for any x E C", PM(x) = < xlul > ul+ < xlu2 > u2 + ... +
< xlu > Urn. PROOF
The details are an extension of our discussion above and are left to the
reader.
LI
Actually a bit more can be said here. If Q2 = Q = Q* and Fix(Q) = Col(Q) = M, then Q = PM. In particular, this says it does not matter which orthonormal basis of M you choose to construct PM. For simplicity say PM = [Ul lu2][uI lu2]+
[ullu2][ullu2l+. Then QPM = Q[ullu2][ullu2]+ = [QuIIQu2][ullu2]+ =
= PM. Thus, QPM = PM since Q leaves vectors in M fixed. Also PMQ = [Ul Iu2][ul lu21+Q = [ului +u2u2*]Q = [utui +u2u][gl Iq2] _ [ulu*gl + u2u*2gllUluig2 + u2u*g2] = [< gllul > UI+ < q, 11112 > U21 <
g2lul > UI+ < g21u2 > u2] = [gllg2] = Q. Thus PMQ = Q. Therefore, Q = Q* = (PMQ)* = Q*P,y = QPM = PM. Now it makes sense to call PM the orthogonal projection onto the subspace M.
Exercise Set 33 I
1. Let v =
2
-2
. Compute the projection onto sp(v) and sp(v)1.
2. Compute the projection onto the span of { (1, 1, 1), (- 1, 0, 1)).
3. Prove that if UU* _ /, then U*U is a projection.
4. Let PM.N denote the projector of C" onto M along N. Argue that (PM,N)* = PN1,M' .
5. Argue that P is a projection iff P = P P*. 6. Suppose C" = M ® N. Argue that N = Ml if PM.N = (PM,N)*. 7. Argue that for all x in C", 11 PMx11 < IIxi1, with equality holding iff x E M.
8. (Greville) Argue that G E A(2) if G = (EAF)+ for some projections E and F. 9. (Penrose) Argue that E is idempotent iff E = (FG)+ for some projections F and G.
Projections
298
10. (Greville) Argue that the projector PM.N is expressible with projections through the use of the MP-inverse, namely, PM.N = (PNI PM)+ = ((I PN)PM)+.
1 1. A typical computation of the projection PM onto the subspace M is to obtain an orthonormal basis for M. Here is a neat way to avoid that: Take any basis at all of M, say (b1, b2, ... , b,,, ), and form the matrix
F = [b,
I b2 I
I b",]. Argue that PM = FF+ = F(F*F)-I F*.
12. Suppose E is a projection. Argue that G E E{2, 3, 4} iff G is a projection
and Col(G) C Col(E). 13. Let M and N be subspaces of C". Prove that the following statements are all equivalent:
(i) PM - PN is invertible
(ii) C"=M®N (iii) there is a projector Q with Col(Q) = M and Xull(Q) = N.
In fact, when one of the previous holds, prove that (PM - PN)-' _
Q+Q*-I. 14. Suppose P = P2. Argue that P = P* iffArull(P) = (Fix(P))-. 15. Argue that Q =
Cos2(0)
Sin(x)Cos(0)
sin(x)Cos(0)
sin2(0)
is a projection of the
plane R2. What is the geometry behind this projection?
Further Reading [Banerjee, 20041 Sudipto Banerjee, Revisiting Spherical Trigonometry with Orthogonal Projections, The College Mathematics Journal, Vol. 35, No. 5, November, (2004), 375-381. [Gross, 1999] Jdrgen Gross, On Oblique Projection, Rank Additivity and the Moore-Penrose Inverse of the Sum of Two Matrices, Linear and Multilinear Algebra, Vol. 46, (1999), 265-275.
8.2 The Geometry of Subspaces and the Algebra of Projections
8.2
299
The Geometry of Subspaces and the Algebra of Projections
In the previous section, we showed that, starting with a subspace M of C", we can construct a matrix PM such that P,y = PM = P,, and M = Col(PM) = Fix(PM). Now we want to go farther and establish a one-to-one correspondence
between subspaces of C" and idempotent self-adjoint matrices. Then we can ask how the relationships between subspaces is reflected in the algebra of these special matrices. We begin with a definition. DEFINITION 8.S (orthogonal projection matrix) A matrix P is called an orthogonal projection matrix (or just projection, for short) if it is self-adjoint and idempotent (i.e., P2 = P = P'). Let P(C"x") denote the collection of all n-by-n projections.
We use the notation Lat(C") to denote the collection of all subspaces of C".
THEOREM 8.3 There is a one-to-one correspondence between ?(C",") and Lat(C") given
as follows: to P in P(C"x"), assign 9(P) = Fix(P) = Col(P) and to M in Lat((C"), assign Js(M) = PM.
PROOF It suffices to prove tpp(P) = P and p i(M) = M. We note that i is well defined by the discussion at the end of the previous section. First,
*p(P) = 4)(Col(P)) = Pc1,r(p) = PM, where M = Col(P). But then, P = PM since PM is the only projection with M = Fix(PM) = Col(PM). Next, 0 0(ti[) =
THEOREM 8.4 For any subspace M of C", PMl = I - PM.
300
Projections
Co!(I - PM) = Fix(I - PM) = {xI(I - PM)x = x} = {xIPMx 61 = Null(PM) = Col(PM)1 = Col(PM)1 = M1. Note I - PM is a pro-
PROOF
jection since(I-P5.r)2=I-P",-Pir,+P,;=I- P,,, and (I-P",)*= /*-Pm=I -P,,,.Thus, PMi =PcoIU-P.)=/-PM. As a shorthand, write P1- = I - P when P E 1P(C""11). THEOREM 8.5 M C N if PM = PN PM iff PM = PM PN
PROOF SupposeM C N.ThenPMX E M C N = Fix(PN),soPN(PMx)= PMx. Since this is true for all x in C", we get PNPM = PM. Conversely, if PNPM = PM, then Co!(PM) C Col(PN). That is, M C N. The second 'iff' follows by taking *.
DEFINITION 8.6
0
(partial order)
For P, Q in 1P(Cl ' ") define P < Q iff P = P Q.
This gives us a way of saying when one projection is "smaller" than another one.
THEOREM 8.6 Let P, Q, and R be projections.
1. For all P, P < P.
2. IfP
6. P
9. P+QE1P(C0")iffPQ=QP=®. 10. P < Q iff Q1 < P1.
8.2 The Geometry of Subspaces and the Algebra of Projections PROOF
The proofs are routine and left to the reader.
301
0
THEOREM 8.7 Let M and N be subspaces of C".
I. PM+N is the projection uniquely determined by (i) PM < PM+N and PN < PM+N and (ii) if PM < Q and PN < Q, then PM+N < Q, where Q is a projection.
2. PMnN is the projection uniquely determined by (i) PMnN < PM and PMnN < PN and if Q < PM and Q < PN, then Q _S PMnN. PROOF I.
First M, N CM+N, so PM, PN < PM+N. Now suppose Q is a projection
with PM < Q and PN < Q. Then, M C Col(Q) and N C Col(Q), so M + N C Col(Q), making PM+N < Q. For the uniqueness, suppose H is a projection satisfying the same two properties PM+N does.
That is, PM, PN < H and if PM _< Q and PN < Q, then H < Q. Then, with H playing the role of Q, we get PM+N < H. But with PM+N playing the role of Q, we get H < PM+N. Therefore, H = PM+N.
2. The proof is analogous to the one above.
a
Our next goal is to derive formulas for PM+N and PMnN for any subspaces M and N of C. First, we must prepare the way. The MP-equations are intimately connected with projections. We will explore this more fully in the next section
but, for now, recall (MPI) AA+A = A and (MP3) (AA+)* = AA+. The first equation says AA+AA+ = AA+. In other words MPI and MP3 imply P = AA+ is a projection. But what does P project onto? We need Fix(P) = Fix(AA+) = {xIAA+x = x}. But if y E Col(A), then y = Ax for some x, so AA+y = AA+Ax = Ax = y. Thus we see Col(A) c_ Fix(AA+). On the other hand, if y E Fix(AA+), then A(A+y) = y so y E Col(A). We conclude Fix(AA+) = Col(A), so that AA+ is the projection onto the column space of A. Let's record our findings. LEMMA 8.8 For any matrix A E C"I'll, AA+ E C"'> "' is the projection onto the column space of A.
Next, we need the following Lemma.
Projections
302
LEMMA 8.9 For any matrix A, Col(A) = Col(AA*)
PROOF
If X E Co!(AA*) then x = AA*y for some y, so x = A(A*y) is
evidently in Col(A). Conversely, if x E Col(A), then x = Ay for some y. Now x = AA+x = A(A*(AA*)+)x.Inotherwords, AA*z = x, wherez = (AA*)+x. 0 This puts x in Col(AA*). Another fact we will need is in the following Lemma.
LEMMA 8.10
Let A E C"' x", B E C"'" and M = [A: B], the augmented matrix in Cl""'+A). Then Col(M) = Col(A) + Col(B).
PROOF
The proof is left as an exercise.
0
Now we come to a nice result that is proved in a bit more generality than we need.
THEOREM 8.8
Let AEC"""' and B E C". Then 1. Col(AA* + BB*) = Col(A) + Col(B)
2. Null(AA* + BB*) = Arull(AA*) n Null(BB*).
PROOF 1. Let M = [A:B] be the m-by-(n +k) augmented matrix. Then Col(M) = Col(A) + Col(B) by (2.10). But also, Col(M) = Col(MM*) by (8.9). A*
However, MM* = [A:B]
..
= AA* + BB*. Hence, Col(AA* +
B*
BB*) = Col(MM*) = Col(M) = Col(A) + Col(B).
2. Let X E Null(AA*) nNull(BB*). Then AA*x = -6 and BB*x so (AA* + BB*)x = 6, putting x in NUl1(AA* + BB*). Now let x E NUIl(AA*+BB*). Then [AA*+BB*](x) = 6 so AA*x+BB*x = -.
8.2 The Geometry of Subspaces and the Algebra of Projections
303
Thus x*AA*x + x*BB*x = '. That is, IIA*x112 + IIB*x112 = 0 so IIA*xll = 0 = IIB*xll. We conclude x E NUll(A*) = Null(AA*) and x E NUll(B*) = NUll(BB*).Therefore,x E NUll(AA*)nArull(BB*). 0
Now we apply these results to projections. Note that if P and Q are projections, then P + Q = PP* + QQ*. Also note that while P + Q is self-adjoint, it need not be a projection. COROLLARY 8.1 Let P and Q be in P(C' "). Then
1. Col(P + Q) = Col(P) + Col(Q)
2. Null(P + Q) = Null(P) nArull(Q). Let M = Col(A) for some matrix A. We have noted above that AA+ is the projection onto M. Thus, if M and N are any subspaces, [ PM + PN ] [ PM + PN ]+ is
the projection on Col (PM + PN) = Col(PM)+Col(PN) = M +N using (8.8)(1). This is part of the next theorem. Recall our shorthand notation P1 = I - P for a projection P. THEOREM 8.9
Let M and N be subspaces of C". Let P = Pit and Q = PN. Then all the following expressions are equal to the orthogonal projection onto M + N. 1. PM+N
2. [P + Q][P + Q]+ 3. [P + Q]+[P + Q] 4. Q + [(PQ1)+(PQ1)]
5. P + [(QP1)+(QP1)]
6. P+P1[P1Q]+ 7. Q + Q1[Q'P]+ PROOF There is clearly lots of symmetry in these formulas, since M + N = N + M. That (2) equals (3) follows from the fact that P + Q is self-adjoint. Take the * of (2) and you get (3), and vice-versa. That (1) and (2) are equal follows
Projections
304
from the discussion just ahead of the theorem. We will give an order theoretic argument that (1) equals (4). The equality of (1) and (5) will then follow
by symmetry. Let H = Q + (PQ1)+(PQ') First note (PQ1)+(PQ1)Q =
0, so that H is in fact a projection. Also, H Q = Q + 0 = Q so Q <
H. Next, PH = P(Q + (PQ')+(PQ')] = PQ + P(PQ')+(PQ1) PQ + P[I - (1 - (PQ J)+(PQI)] = PQ + P - P(1 - (PQ1)+(PQ1)) P+[PQ-P(I -(PQ1)+(PQ'))]. But (D = (PQ1)[I -(PQ1)+(PQ1)] P(1 - Q)[I - (PQ1)+(PQl)] _ [P - PQ][I - (PQ1)+(PQ1)] = P(I (PQ1)+(PQ1)) - PQ, since Q < I - (PQ1)+(PQ1). Thus, PQ = P(I (PQ1)+(PQ')) and, consequently, PH = PQ + P - PQ = P. Thus, P < H. Now let K be any projection with K > P and K > Q. Then KH =
K[Q+(PQ -)+(PQ1)] = KQ+K(PQ1)+(PQ')) = Q+K(PQ1)+(PQ1).
But Q < K so QK1 = 0 whence PQ1K' = PK1 = 0, so K1 = [I - (PQ1)+(PQl)]K1. This says K1 < I - (PQ1)+(PQ1) or, equivalently, (PQ')+(PQ1) < K. Thus, K(PQ1)+(PQ1) = (PQ1)+(PQ1) and so KH = H putting H < K. Therefore, H = PM+N by (8.7)(1). Now we have (1) through (5) all equal. So let's look at (6). Let U = P1 Q.
Then UQ = Q so U+UQ = U+U. This says U+UQP-- = U+UP1. By taking the * of both sides, we get P1QU+U = P'U+U so U = UU+U = P1U+U. Thus, UU+ = P1U+UU+ = P--U+. Therefore, UU+ = P'U+;
that is, (P'Q)(P1Q) = P1(P1Q)+. But (P1Q)(P'Q)+ = [(P1Q)(P1 = (P1Q)+`(P1Q)` = (QP1)(QP')so(QP')+(QP1) = Pl(P--Q)+. Q)+]*
Now (6) follows from (5). Of course (7) follows by symmetry.
0
Thus our first goal is accomplished; namely, in finite dimensions, the projection onto the linear sum of two subspaces is computable in many ways in terms of the individual projections. We now turn to our second goal of representing the projection on the intersection of two subspaces in terms of the individual projections.
THEOREM 8.10 Let M and N be subspaces of C'. Let P = PM and Q = PN. Then all the following expressions are equal to the orthogonal projection onto m fl N. 1 I. PMn N
2. 2Q(Q + P)+P 3. 2P(Q + P)+Q
4. 2[P - P(Q + P)+P] 5. 2[Q - Q(Q + P)+Q]
8.2 The Geometry of Subspaces and the Algebra of Projections
305
6. P - (P - QP)+(P - QP) = P - (Q1P)+(Q1P) 7. Q - (Q - PQ)+(Q - PQ) = Q - (P'Q)+(P1Q)
8. P - P(PQ')+ 9. Q - Q(QP1)+ PROOF As before. we have much symmetry since m fl N = N fl M. We begin by showing (2) equals (3). To do this, we first show Q(Q + P)+P -
P(Q + P)+Q =0. But Q(Q + P)+P = P(Q + P)+Q = Q(Q + P)+P + Q(Q + P)+Q - Q(Q + P)+Q + P(Q + P)+Q = Q(Q + P)+(Q + P) (Q + P)(Q + P)+ Q. By (2.12), Col(Q) S Col(Q) + Col(P) = Col(Q + P), so Q(Q + P)+ (Q + P) = Q = (Q + P)(Q + P)+ Q. Thus, Q(Q + P)+P -
P(Q+P)+Q=Q-Q=0,and so Q(Q+P)+P=P(Q+P)+Q, and it
follows that (2) equals (3). Next, we argue that (2), and hence (3), also equals (1). We use the uniqueness
characterization of PMnN. Let H = Q(Q + P)+P + P(Q + P)+Q = 2Q(Q +
P)+P =2P(Q+P)+Q.NowHP = [2Q(Q+P)+P]P =2Q(Q+P)+P2 = 2Q(Q + P)+P = H and, similarly, HQ = H. Thus, Col(H) C- M fl N. But also H = HPMnN = [Q(Q + P)+P + P(Q + P)+Q]PMnN = Q(Q + P)+PPMnN+P(Q+P)+QPMnN = Q(Q+P)+PMnN+P(Q+P)+PMnN = [Q(Q + P)+ + P(Q + P)+]PMnN = (Q + P)(Q + P)+PMnN = PMnN. This last equality follows because M fl N C_ Col(Q + P) and so we conclude H = PMnN
Next, we show (4) and (5) equal (1) by showing P(Q + P)+Q = P P(Q + P)+ P. The argument that Q(Q + P)+P = Q - (Q + P)+Q is similar and will be left as an exercise. Now P(Q + P)+Q - (P - P(Q + P)+ P) _
P(Q + P)+ Q - P + P(Q + P)+ P = P(Q + P)+ Q + P(Q + P)+ P - P =
P(Q+P)+(Q+P)-P=P-P=0.
To see that (6) and (7) equal (1), note first that M fl N C M and m fl N C N,
so PPMnN = PMnN and QPMnN = PMnN and Q P PMnN = PMnN Now
let S = P - (P - QP)+(P - QP). We claim S is a projection. Clearly S' = S and SP = S. It follows PS = S. In particular, Col(S) C Col(P) =
M. Next, Sz = P2 - P(P - QP)+(P - QP) - (P - QP)+(P - QP)P+ ((P - QP)+(P - QP))2 = PEP - (P - QP)+(P - QP)] = PS = S.
Now P - QP = (P - QP)(P - QP)+(P - QP) = P(P - QP)+(P QP) - QP(P - QP)+(P - QP) so S = PS = P[P - (P - QP)+(P -
QP)]=P-P(P-QP)+(P-QP)=QP-QP(P-QP)+(P-QP)=
Q[P - P(P - QP)+(P - QP)] = QPS = QS. Thus, QS = S and SQ = S, so Col(S) C Col(Q) = N. Therefore, Col(S) c M fl N and so PMnNS = S.
ButSPMnN =[P-(P-QP)+(P-QP)]PMnN = PPMnN-(P-QP)+(PQ P)PMnN = PMnN - 0 = PMnN Thus, S = PMnN. The argument for (8) and
Projections
306
(9) are similar to the ones given for (6). They will be left as an exercise. This completes our theorem. 0 Next, we look at some special cases whose proofs are more or less immediate.
COROLLARY 8.2 Let A E Cmxn and B E Cmxk. Then 1. [A:B][A:B]+ = PC(,!(A)+cot(B)
2. 2AA+[AA+ + BB+]+BB+ = Pco1taux;ouB)COROLLARY 8.3
Let M and N be subspaces of C" with P = PM and Q = PN. Then if PQ = Q P,
I. PM+N = P -I- Q- P Q
2. PMnN=PQ=QP. In particular, if PQ = 0 (i.e., if P±Q), 1.
Pet+N = P + Q
2. PMnN = 0.
We end this section with an example. Suppose M = span{a,, a,,... , a,] and N = span{b,, b2, ... , b, ] are subspaces of C". We form the matrices A = [a, a2 ... a,.] and [b, b2 ... b,], which are n-by-r PM and BB+ _ and n-by-s, respectively. Of course, AA+ = Pc,,uB) = PN. Now form the augmented matrix M = [A:B], which is n-byI
I
I
I
I
I
(r + s) and, with a left multiplication by a suitable invertible matrix R, we pro-
duce the row reduced echelon form of M; say RREF(M) = RM = R[A:B] _ E11
[RA:RB] _
0
E22
and
(D
. Now
®
= R-
Ell
E12
0
E12 E22
B = R-'
_
®
[E 0 0]
(D
(D
E12
E12
®
E22
E+
®
R-'
E12
. Thus, R-1
0
E12
0
+
E12
®
o E12
has columns in m fl N. If we let R-'
then W W+ is the projection
on M fl N. For example,
suppose
8.2 The Geometry of Subspaces and the Algebra of Projections
307
M = span {(1, 1, 1, 1, 0), (1,2,3,4,0), (3, 5, 7, 9, 0), (0, 1, 2, 2,0)) and N = span J(2, 3, 4, 7,0), (1, 0, 1,0,0), (3, 3, 5, 7, 0), (0, 1, 0, 3, 0)}. Then RM =
0
-1
0
2
-1
-z
-1z
1
1
-1
Z
1
0
0
z
0 0
0 0 0
1
0
1
0
0
0
1
0
0
0
0
0
E12
2 3
0
3
1
4
4
1
7
0 0
LO
0
0
7
0 0
3 0
and W+ =
1
0
7
3
0
0
0
0
0
0
0
-2 -2 1= RREF(M). -1
1
0
0
1
1
0
1
1
2
1
0
1
3
1
4
0
0
r
1
3
4
0
10 0
0
0
-2
1
0
5
2
0
1
0
3
:
3
1
3
0
2
1
1
1
7 2: 4 9 2: 7
0
0
0
3
3
2 0:
Now W= R-1 =
1
5
-1
1
L0
2
-1
0
:
1
0
0
0
1
-1
1
1
3 0: 2
1
0
2 2 0
1
0 0
-1
0 0 0 0
-1
-1
3
2
-2
0 0 0
0 0
0 0
0 0
3
1
-2 -2
0-
-1/12
1/12
1/4
-1/12
0
0
0 1/12
0 1/4
0
0
-1/3
-7/6
-1/12
projection onto m fl N is W W+ _
1/2
-1/12 0 2/3
The
0
1/6
0
-1/6
0
1/6
1/3
-1/6
1 /3
5/6
0
0 0 0
1/3
1/6
5/6
0
0
0
0 0
0
0
1/3 1/6
=
0
. The
trace of this projection is 2, which gives the dimension of m fl N. A basis for
MflNis{(2,3,4,7,0),(1, 1, 1,3,0)}.
Exercise Set 34 1. Prove that tr((PM + PN)(PM + PN)+) = tr(PM)+tr(PM) - tr(PMnN). 2. (G. Trenkler) Argue that A is a projection iff 3tr(A*A)+tr(AAA*A*) _ 2Re(tr(AA + AA*A)).
Projections
308
3. Suppose PM and PN are projections. Argue that PM + PN is a projection
iIT MIN. 4. Suppose P and Q are projections. Argue that PQ is a projection if PQ = QP. Indeed, argue that the following statements are equivalent for projections P and Q: (a) PQ is idempotent
(b) tr(PQPQ) = tr(PQ) (c) PQ is self-adjoint (d) PQ = QP. 5. Suppose PM and PN are projections. Argue that PM PN = PN PM iff
M = (MnN)®(MnNl). 6. Suppose PM and PN are projections. Argue that the following statements are all equivalent:
(a) PM - PN is a projection (b) PN < PM (c) IIPNxII < IIPMxII for all x
(d) N C M (e) PM PN = PN
(f) PNPM=PN 7. Suppose P is an idempotent. Argue that P is a projection if II Pxll for all x.
Ilxll
Further Reading [P&O&H, 1999] R. Piziak, P. L. Odell, and R. Hahn, Constructing Projections on Sums and Intersections, Computers and Mathematics with Applications, Vol. 37, (1999), 67-74.
[S&Y, 1998] Henry Stark and Yongyi Yang, Vector Space Projections, John Wiley & Sons, New York, (1998).
8.3 The Fundamental Projections of a Matrix
309
The Fundamental Projections of a Matrix
8.3
We have seen how to take a matrix A and associate with it four subspaces we called "fundamental": Col(A), Mull(A), Col(A*) and Null(A*). All these subspaces generate projections. Remarkably, all these subspaces are related to the MP-inverse of A. THEOREM 8.11 Let A E C"' x" Then 1. AA+ = PCOI(A) = PA(.//(A')'
2. A+A = 3.
1 - AA+ =
PNuu(A)1
PC,,I(A)1 =
4. 1 - A+A =
PNuiuA).
PROOF We have already argued (1) in the previous section. A similar argument shows A+A is a projection. Then I - AA+ and I - A + A are projections. The only real question is what does A+A project onto? Well A+A = A+ A++
so, by (1), A+A projects onto Col(A+). So the only thing left to show is that Col(A+) = Col(A*). This follows from two identities. First, suppose x E Col(A*). Then, x = A*z for some z. But A* = A+AA*, so x = A*z = A+AA*z = A+(AA*z), putting x E Col(A+). Next, suppose x E Col(A+). Then x = A+z for some z. But A+ = A*A+*A+, so x = A+z = A*(A+*A+z), putting x in Col(A*). 0 COROLLARY 8.4 For any matrix A E 011
, Col(A*) = Col(A+) and Null(A) =
Col(A+)'.
Next, we relate the fundamental projections to a full rank factorization of A. THEOREM 8.12 Let A = FG be a full rank factorization of A E C;' x". Then
1. AA+ = FF+
2. A+A=G+G
Projections
310
3.I,,,-AA+FF+ 4. I - A+A = I - G+G. PROOF
The proof is easy and left to the reader.
Let's look at an example.
Example 8.1
Let A=
13
6 4 2
2 1
13
= FG =
9 3
+we compute G
1 /5
0
2/5
0
0
Then G+G =
I - G+G =
FF+ =
F+
and
3
13
2
9
1
3
-
L0
1
0
. 1
-3/26 -11/13 1/13
3/13
As before,
1
79/26 -9/13
,.
1
1/5
2/5
0
2/5
4/5
0
0
0
is a rank 2 projection onto Col(A*);
1
4/5
-2/5
0
-2/5
1 /5
0
0
0 0
is a rank I projection onto Null (A). Next
17/26
6/13
3/26
6/13 3/26
5/13
-2/13
-2/13
25/26
9/26
-6/13
-3/26
6/13
8/13 2/13
2/13 1/26
and I - FF+ _
r
-3/26
is a rank 2 projection onto Col(A),
is a rank I projection onto
Mull(A*). Next, we develop an assignment of a projection to a matrix that will prove useful later. The most crucial property is (3), which we use heavily later. This property uniquely characterizes A:
DEFINITION 8.7 (the prime mapping) For any matrix A E CmX", we assign the projection A' = I - A+A E P(C' "') That is, A' is the projection onto the null space of A. We next collect some formulas involving A'.
8.3 The Fundamental Projections of a Matrix
311
THEOREM 8.13 Let A E C,nxn Then
1. AA' _ 0; (A*A)' = A'; (A *)'A = 0. 2. If P E p(Cnxn), then P'= P1 = I - P.
3. AB = 0 iff B = A'B. In fact, A' is the unique projection with this property.
4. If B A'B
B* and AB = BA, then AB' = B'A and A'B = BA' and BA'.
5. IfP,QElin(Cnxn),then PAQ=(Q'P)'PandPVQ=(P'AQ')'. 6. If P c Q, thenQ= PV(QAP'). PROOF
1. AA'= A(I - A+A) = A - AA+A = 0. 2. Suppose P is a projection. Then P+ = P so P' = I - P+P = I - P2 =
I-P=P1.
3. Suppose AB=O. Then A'B=(I-A+A)B=B-A+AB=B-0= B. Conversely, if B = A'B, then AB = AA'B = OB = 0. Now let P be any projection with the property AB = 0 iff B = PB. Then AA' = 0,
so A' = PA', making A' < P. But also, P = PP so AP = 0 whence
A+AP=0, soA'P=(I-A+A)P=P-A+AP=P, soP
4. Suppose B = B* and AB = BA. Now BY = 0, so ABB' = 0, so BAB' = 0, so AB' = B'(AB'). Similarly, A*B = BA*, so A*B' = B'A*B'. Taking *, we get B'A = B'AB'. Thus, AB' = B'AB' = B'A. Also, AA'= O, so B A A' = O, so A B A' = O, so B A' = A'BA'. Taking * we get A'B = A'BA'. Therefore, A'B = BA'.
5. (Q'P)'P = (Q1P)'P = (I -(Q1P)+(Q'P))P = P-(Q1P)+(Q1P) = P A Q by (8.10)(6). Now (P'A Q')' = ((Q"P')'P')' = ((QP1)'P1)' =
((I - (QP1)+(QP1))P1)' = (P1 - (QP1)+(QP1))' = I - (P1 (QP1)+(QP1)) = P + (QP1)+(QP1) = P V Q by (8.9)(5). Note we have used that P1 - (QP1)+(QP1) is a projection since P1 (QP1)+(QP1) < P1
Projections
312
6. Suppose P < Q. Then P V (Q A P) = P V (Q'P')'P' = P V I(1 -
Q)(I-P)]'P1=Pv[I-Q-P+PQJ'(I-P)=Pv(Q(I-P))= Pv(Q-QP)=Pv(Q-P).ButP(Q-P)=OsoPV(Q-P)= P+(Q-P)=Q. 0
Well if one prime is so good, what happens if you prime twice? It must be twice as good don't you think? Again A" is a projection we assign to A.
THEOREM 8.14
1. A" = A+A = P E 1P(C""') then P" = P.
3. A=AA"=(A*)"A. 4. If AB = A, then A"
6. (A B)" < B".
7 (AB)" _ (A"B) 8. (A*A)" - A"; (AA*)" - A*". 9. ((AB)'B*)" < A'. 10. AB* = 0 iff A"-LB" iff A"B" = 0.
/ /. If AB = AC, then A"B = A"C.
Exercise Set 35 1.
Fill in the proofs of Theorem 8.12.
2. Let a =
I
L
i J.
Compute a+ and P = as . Verify that P is an orthogonal
projection. Onto what does P project?
3. Fill in the proofs of Theorem 8.14.
8.4 Full Rank Factorizations of Projections
313
Further Reading [Greville, 1974] T. N. E. Greville, Solutions of the Matrix Equation XAX = X and Relations Between Oblique and Orthogonal Projectors, SIAM J. Appl. Math., Vol. 26, No. 4, June, (1974), 828-831. [B-I & D, 1966] A. Ben-Israel and D. Dohen, On Iterative Computation of Generalized Inverses and Associated Projections, J. SIAM Numer. Anal., III, (1966), 410-419.
MATLAB Moment
8.3.1 8.3.1.1
The Fundamental Projections
It hardly seems necessary to define M-files to compute the fundamental projections. After we have input a matrix A, we can easily compute the four projections onto the fundamental subspaces:
the projection onto the column space of A the projection onto the column space of A* the projection onto the null space of A* the projection onto the null space of A.
A*pinv(A) pinv(A)*A eye(m)-A*pinv(A) eye(n)-pinv(A)*A
Since "prime mapping" plays a crucial role later, we could create a file as follows: I
2 3
function P=prime(A) [m n] = size(A) P = eye(n)-pinv(A)*A.
Experiment with a few matrices. Compute their fundamental projections.
8.4
Full Rank Factorizations of Projections
We have seen that every matrix A in C;'"" with r > 0 has infinitely many full rank factorizations A = FG, where F E and G E (Cr"". The columns of F form a basis for column space of A. Applying the Gram-Schmidt process,
we can make these columns of F orthonormal. Then F*F = J. But then, it is easy to check that F* satisfies the four MP-equations. Thus, F* = F+,
Projections
314
which leads us to what we called an orthogonal full rank factorization, as we delined in (7.6) of Chapter 7. Indeed, if U is unitary, A = (FU)(U*G) is again an orthogonal full rank factorization if A = FG is. We summarize with a theorem.
THEOREM 8.15 Every matrix A in C;'"" with r > 0 has infinitely many orthogonal full rank factorizations. Next, we consider the special case of a projection P E C"r "". We have already noted that P+ = P. But now take Pin a full rank factorization that is orthogonal.
P = FG, where F+ = F*. Then P = P* = (FG)* = G*F* = G*F+. But P = P+ = (FG)+ = G+ F+ = G+F* and GP = GG+F* = F*. But then P = PP = FGP = FF*. This really is not a surprise since FF* = FF+ is the projection onto Col(P). But that is P! Again we summarize.
THEOREM 8.16
Every projection P in C"" has a full rank factorization P = FG where
G=F*=F+.
For example, consider the rank I projection
_1
9/26
-6/13
-3/26
6/13
8/ 13
2/ 13
-3/26
2/ 13
1/26
9/26
[
1
-4/3 - 1/3 ]. This is a full rank factorization but it
-3/26 is not orthogonal. But Gram-Schmidt is easy to apply here. Just normalize 9/26 -6/13 -3/26 3/,/2-6 the column vector and then 2/ 13 6/13 8/13 = 4/ f2-6-3/26
2/13
-1/26
1/26
[3/ 26 -4/ 26 -1/,/-2-6] . We can use this factorization of a projection to create invertible matrices. This will he useful later. Say F is m-by-r of rank r, that is, F has full column rank. Then F+ _ (F*F)-1 F*, as we have seen. This says F+ F = Ir. Now write I - F F+ = F, Fj in orthogonal full rank factorizaF+ tion. Form them -by-m matrix S = . . . . We claim S-t = [ F: F, ]. We cornF1+
F+ puce SS
=
F.+
[FF,] =
F+F
+ F,
F1+F ]
-
Ir
[ F1+F
+
Im-r
8.5 Affine Projections
315
hIr
since Fi = F+(FI F+) = F+(I - FF+) so F, +F = -r F+(I - FF+)F = 0 and F, = FI Fi F, = (I - FF+)FI soF+F, = 0 ®
/,,,
as well.
To illustrate, let F =
3
13
2
9
1
3
, which has rank 2. We
saw above / -
3
26 3
FF+ =
[
26
-1
F+
-4 26
26
]
= FI F+ so S = I
F+
26
1
3
- 11
26
13
26
3
-9
13
13
I
13
26
26
79
3
13
3
26
i s invertible with inverse S-1 =
2 6
2
9
1
3
6 -1
Similarly, if G has full row rank, we write 1 = G+G = F2 F2 in orthogonal G
full rank factorization and form T
G+
F2 I. Then T-1 = F2
Exercise Set 36 1
1
1. Find an orthogonal full rank projection of z
8.5
Affine Projections
We have seen how to project (orthogonally) onto a linear subspace. In this section, we shall see how to project onto a subspace that has been moved away from the origin. These are called affine subspaces of C".
DEFINITION 8.8 (affine subspace) By an affine subspace of C", we mean any set of vectors of the form M(a, U) {a + ul u E U), where U is a subspace of C" and a is a fixed vector from C". The notation M(a, U) = a + U is very convenient.
Projections
316
We draw a little picture to try to give some meaning to this idea.
Figure 8.2:
Affine subspace.
The following facts are readily established and are left to the reader.
1. a E M(a, U).
2. M(U)=U. 3. M(a,U)=UiffaEU. 4. M(a, U) c M(b, W) iff U C W and a - b E W.
5. M(a, U) = M(a, W) iff U = W.
6. M(a, U) = M(b, U) if a - b E U. 7. M(a, U) is a convex subset of C".
8. M(a,U)nM(b, W)00iffa-bE U+W. 9. If Z E M(a, U) n M(b, W) then M (a, U) n M(b, w) = M(z, u n W). In view of (5), we see that the linear subspace associated with an ahne subspace is uniquely determined by the affine subspace. Indeed, given the affine
subspace, M, U = (y - aly E M) is the uniquely determined linear subspace called the direction space of M. We call the affine subspaces M(a, U) and
M(b, W) parallel if U C W. If M(a, U) and M(b, W) have a point c in
8.5 Affine Projections
317
common, then M(a, U) = M(c, U) c M(c, W) = M(b, W). Thus, parallel affine subspaces are either totally disjoint or one of them is contained in the other. Note that through any point x in C", there is one and only one affine subspace with given direction U parallel to W, namely x + U. Does all this sound familiar from geometry'? Finally, we note by (8) that if U ® W = C", then M(a, U) fl M(b, W) is a singleton set. Next, we recall the correspondence we have made between linear subspaces
and orthogonal projections; U H Pu = Pu = Pu, where U = Col(Pu) = Fix(Pu ). We should like to have the same correspondence for affine subspaces. The idea is to translate, project, and then translate back.
DEFINITION 8.9
(affine projection)
For x in C", define fM(a,u)x = a + Pu(x - a) = a + Pu(x) - Pu(a) _ Pu(x) + Pul (a) as the projection of x onto the affine subspace M(a, U). The following are easy to compute.
1. nM( U)(x) = fM(a,u)(x) for all x in C".
2. M(a, u) = {yly = nM(a.u)(x)} for all x in C".
3. M(a, U) = {xlnu(a,u)(x) = x}. 4. nL(a,ul)(x) = X
nM(a.u)(x)
5. fM(a.u)(') = a - Pu(a) = Pul(a). 6. If a E U, f M(a.U)(x) = Pu(x) 7. 11 fs(a.u)(y) - nM(a.u)(Z)II = II Pu(z) - Pu(y)11. 8. 11nM(a.U)(x)MM2 = IIPu(x)112 + IIPui(a)112.
9. X = Il m(a.u)(x) = (x - a) - Pu(x - a) = Pul(x - a). As a concrete illustration, we take the case of projecting on a line. Let b 0 '
be in C" and let U = sp(b) the one dimensional subspace spanned by b. We shall compute the affine projection onto M(a, sp(b)). From the definition,
< x - alb >
1IM(a.,v(b)(x) = a +
< bib >
_
b=a+
b*(x - a)b b*b
= a + bb+(x - a) < a - x xlb >
(I - bb+)(a) + bb+(x) and nM(a.,p(b),() W = (x - a) + < b1b > b = (x - a) + bb+(a - x) = (I - bb+)x - (I - bb+)a = (I - bb+)(x - a). We see that, geometrically, there are several ways to resolve the vector nM(a,u)(x) as the algebraic formulas have indicated.
318
Projections
Now, if A is a matrix in C""', then we write M(a, A) for the affine subspace M(a, Co!(A)). Consider a line L of slope in in R2. Then L = {(x, mx + yo)Ix E
1
118} e R2. We associate the matrix
(1
L = Sl
m
0 I of rank I with this line. Then
I
01
x
0 /I
y
l
0
+
fix, y E ]E81. We can then use the pscu-
yo
doinverse to compute the orthogonal projection onto the direction space of this line, which is just the line parallel passing through the origin. The affine projec0 0 0 x tion can then be computed as 171L
y
yo
=,
I
I}m
1
+
1
0
in
+
x
I
M
n12
I TM2
-IT n-17
0
in
-
0
So, for example, say y = -3x +4. Then 11L
x
_
y x-3v+12 10
I .8
SO 11L
-3x+9y+4
10+30+12
1
10
J
-30+90+4
10
10
E L.
6.4
Next, consider a plane in R3 given by ax + by + cz = d with c # 0. We 0
1
associate the rank 2 matrix
0
I
I
_u r
0
0
1
-hh
I
with this plane. Then the plane
0
c
is described by the set of vectors
0
0
1
0
1I;]+[]XYZER}. z x y
We compute the affine projection 11
z
0 I
0 o
-)'
0
c
r
I
0
U
0
1
0
x y
h
0
z
c
+
8.5 Affine Projections
319
11
1
0 0
a
(3
0
0
1
0
0
1
0
of
03
0
1+a
+
1
0
0
1
0 0
0
0 a
The pseudoinverse above is easily computed and the product a p2
1+az+p2
1 tae
=
I+
1+p2
I+al+l32
For example, to the plane 3x+2y+6z = 6, we associate so the plane consists of the vectors in 0 0 0 x 0 0 y + 0 0 z 31 1 x 0 0
-I
-I
2
3
0
1
Ix, y, z E R
1
z
o
Y Z
-
L1
40 49 X
-6
T9- X
-18X 49 X
2
0
3!
49
-6
L5
ZL2
49
49
49
-12
49
49
0
0
1
0
y
0 o
0
Z
1
Y
18
-12 49 Y
1
. So, to project [
1
onto this plane, we
1
11
3
49(Z-1)+I
simply compute II
=
0
T9-(Z
_
49Y
+
o
+
Z-1
2(Z 492(z - 1)
45
+
49
-
49 y
31
X
13
x
1
21
-18
49
-18
-6
0
1
-6
40 49
0
The affine projection
0
1
n
.
1
-2
=
As one last illustration, suppose we
IS
3
49
I
wish to project
1
onto the line in R3 given by L(t) = (1, 3, 0)+t(1, -1, 2).
5 1
We associate the matrix
-1 2
0
0 0
of rank I to this line so the line be-
-1
0 0
0 0
x y
2
0
0
z
l
comes the set of vectors
0
0 0
1
+ I
6
X
The affine projection onto this line is II
y Z
=
Ix,y,ZER
3 0
6
-I
2
1
-2
6
6
6
2
-2
4
6
6
6
.
Projections
320 I
Y Z X
'-Y (x - I)
1
([1_rn+1]
=
3
6(x-I>
0
0
3
1
So, for example, n
6(x - I)
=
1
- 3)
.:6'(y
z+I
'(y-3) 6 z+3 6(y-3) 6z
1
4
5
Since orthogonal projections solve a minimization problem so do affine projections. If we wish to solve the approximation problem for x and M(a, U) that is, we wish to minimize Ilx - yll for y E M(a, U) - we see that this is the same as minimizing ll(x - a) - (y - a)Ii, as y - a runs through the linear subspace U. But this problem is solved by PU(x-a). Thus Ilx - fM(a.U)(x)II =
ll(x - a) - PU(x - a)ll solves the minimization problem for M(a, U). To 1
illustrate, suppose we want to minimize the distance between
and the
I
3
plane 3x + 2y + 6z = 6. What vector should we use? n
of course!
I
3
Then
-n
1
=
I
3
3
I
I
L1-9-
49
= E . Similarly,
I
the minimum distance from
I
to the line L(t) _ (1, 3, 0) + t(I , -1, 2) is
5 1
I
n
I'
1
1
=
I
5
5
5
-
-2
3 1
=
5.
0
4
1
Our last task is to project onto the intersection of two affine subspaces when this intersection is not empty. You may recall it was not that easy to project orthogonally onto linear subspaces. It takes some work for affine subspaces as well. We offer a more computational approach this time.
Suppose matrices A and B determine affine subspaces that intersect (i.e., M(a, A) fl M(b, B) # 0). By (8), we have b - a E Col(A) + Col(B) xi
Col([AB]). Thus, b - a = Ax1 + Bx2 = [AB]
. One useful fact X2
about pseudoinverses isY+ _ (y*y)+y*, so we compute that [A:B]+ _ \ [ B* B* = I A*(AA* + BB*)+ 1 I
]
J
B*(AA* + BB*)+ J
Let
D = AA*+
BB*. Then the projection onto Col(A) + Col(B) is [A:B][A:B]+
_
8.5 Affine Projections
r A* BA D+
321
= AA*D++BB*D+ = DD+. So M(a, A) fl M(b, B) # 0
[A:B] I
iff b - a = D D+ (b - a). Next, b - a = D D+(b - a) iff b - a = A A * D+(b -
a) + BB*D+(b - a) if AA*D+(b - a) + a = -BB*D+(b - a) + b. Also Col(A), Col(B) c Col(A) + Col(B) so DD+A = A and DD+B = B. But DD+A = A implies (AA* + BB*)D+A = A, so BB*D+A = A - AA*D+A whence BB*D+AA* = AA* - AA*D+AA*. But this matrix is self-adjoint, so BB*D+AA* = (BB*D+AA*)* = AA*D+BB*.
THEOREM 8.17
M(a, A) fl M(b, B) 0 0 i.,fb - a = DD*(b - a), where D = AA* + BB*. In that case, M(a, A) fl M(b, B) = M(c, C), where c = AA*D+(b - a) + a and C = [BB*D+A:AA*D+B]. Let Y E M(a, A) fl M(b, B). Then y = Ax, + a and y = Bx2 + b for suitable x, and x2. Then Ax, + a = Bx2 + b so b - a = Ax, - Bx2. In
PROOF
x,
matrix notation we write this as [A - B]
. . .
= b - a. The solution set
X2
to this equation is
r L
A*D+(b - a)
-B+D+(b - a)
1
XZ
= [A: - B]+(b - a) + [I - [A: - B][A: - B]+]z =
J
I
+
[
I- A*D+A
A*D+B
B*D+A
I - B*D+B
z, l J
[
z2 J
,
where z =
x ].ThenY=Axi+a=B_BB*D+B]z=B[AA*D(b_a)+a]+
L X2
[A - DD*D+A:A*D+B]z, = c + [BB*D+A:AA*D+B]z,. This puts y in M(c, C).
Conversely, suppose y E M(c, C). Then y = c+[BB*D+AAA*D+B]z for
some z E Col(C). But then y = (AA*D+(b - a) + a) + [A - AA*D+A:AA*
D+B]z = A[A*D+(b - a)] + A[I - A*D+A:A*D+B]z + a = Awl + a E
M(a, A).Similarly,y =(-BB*D+(b-a)+b)+[BB*D+AB-BB*D+B]z = B[-B*D+(b - a) + B[B*D+A:1 - B*D+B]z + b = Bw2 + b E M(b, B). This puts y E M(a, A) fl M(b, B) and completes the proof.
Before we write down the projection onto the intersection of two affine subspaces, we obtain a simplification.
Projections
322
THEOREM 8.18
= [BB*D+AA*][BB*D+
[BB*D+A.AA*D+B][BB*D+A:AA*D+B]+ AA*]+.
PROOF ComputeCC+ = [BB*D+A.AA*D+B][BB*D+A:AA*D+B]+ CC*(CC*)+ = CC*[BB*D+A:AA*D+B][BB*D+A:AA*D+B]+ _ CC*[BB*D+AA*D+BB* + AA*D+BB*D+AA*]+ = CC*[(BB* + AA*)D+BB*D+AA*]+ = CC*[DD+BB*D+AA*]+ _ CC*[BB*D+AA*]+ = [BB*D+AA*][BB*D+AA*]+, which is what we want.
0
We can now exhibit a concrete formula for the affine projection onto the intersection of affine subspaces: fI M(a.A)nM(b.B)(x) = [B B* D+AA*] [ B B* D+AA* ]+
(x - c) + c, where c = AA*D+(b - a) + a. We note with interest that BB*D+AA* = BB*[AA* + BB*]+AA* is just the parallel sum of BB* and AA*. In particular the orthogonal projection onto Col(A) flCol(B) can be com[BB*D+AA*][BB*D+AA*]+. This adds yet another puted formula to our list. We illustrate with an example. Consider the plane in R3, 3x - 6y - 2z = 15,
00
1
0
to which we associate the matrix A =
1
0
-3 0
3
0 0
and vector a =
-;5
Also consider the plane 2x + y - 2z = 5, to which we associate the matrix
B=
1
0
0
1
0 0 0
t
1
0
and vector b =
0
. We shall compute the projec-
ZS
tion onto the intersection of these planes. First, AA* =
1
0
0
1
3
-3
z 1
BB* =
0 I
0 1
3
1
z
, and D+ = (AA* + BB*)+ _
z
-3 45
2
-11
-1
4
5
-1
3
1
4
4
5
1
s
-I
I
2
4
5
5
25
Now DD+ = 1 ensuring that these planes do intersect without appealing to di1
I
5
1
mension considerations. Next, c = I 57 J and C = L
3
I
49
7
21
100
100
40
1L
-i 40
21
3
9
40
40
16
100
I
J and
8.5 Affine Projections
323
L96-s)+ 2(Y+5)+85(z+3)+s 425 =
425 (x
z
+ 85 (z + 3) + 5
- s) + 4425
5)
-3
85(x - 5) + 85(y + 5) + 17(z + 3) - 3
14t
2t
, where t = is (x s)
15t
+ 4i5 (y +
s)
+ 4z5 (z + 3).
The formula we have demonstrated can be extended to any number of affine subspaces that intersect nontrivially.
Exercise Set 37
1. Consider A =
1
0
1
1
0
1
1
0
1
1
1
1
0
1
1
0
1
0
1
0 0
1
1
1
1
1
1
1
1
1
1
1
1
1
1
0 0
LO 0
Let a =
0 0 0 0 1
0
1
0
0 n
n
0
and B =
0
1
0
1
1
0
1
0
1
1
1
0
1
0
1
0
1
0 0 1
1
0 0 0 0 1 1
1
and b =
0
0
0
1
1
.
Compute AA*, BB *, and D + _
(AA* + BB*), and c. Finally, compute fl(x, y, z), the projection onto the intersection of M(a, A) and M(b, B).
Further Reading [P&O, 2004] R. Piziak and P. L. Odell, Affine Projections, Computers and Mathematics with Applications, 48, (2004), 177-190. [Rock, 1970] R. Tyrrell Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, (1970).
324
8.6
Projections
Quotient Spaces (optional)
In this section, we take a more sophisticated look at affine subspaces. This section may be omitted without any problems. Let V be a vector space and M a subspace of V. Let VIM (read "VmodM") denote the collection of all affine subspaces of V generated by M
VIM= IV +M IV E V). Our goal is to make this set VIM into a vector space in its own right using the same scalars that V has. In other words, affine subspaces can be viewed as vectors. First, we review some basics. THEOREM 8.19 Let M be a subspace of V Then TA.E.:
1. u+M=v+M 2. vEU+M 3. UEV+M
4. U-VEM. PROOF
The proof is left as an exercise.
0
Let's talk about the vector operations first. There are some subtle issues here.
Suppose u + M and v + M are affine subspaces. (By the way, some people call these sets cosets.) How would you add them? The most natural approach would be
(u+M)®(v+M):= (u+v)+M. The problem is that the vector u does not uniquely determine the affine space
u + M. Indeed, u + M could equal u, + M and v + M could equal v, + M. How do we know (u + v) + M = (u, + v,) + M? The fancy language is, how do we know ® is "well defined"? Suppose u + M = u, + M and v + M = v, + M. Then u - u, E M and v - v, E M. But M is a subspace, so the sum (u-u,)+(v-v,) E M. A little algebra then says (u+v)-(u, +v,) E M whence (u+v)+M = (u, +v,)+M. This says (u+M)®(v+M) _ (u, +M)®(v, +M). In other words, it does not matter what vectors we use to determine the affine space; the sum is the same.
8.6 Quotient Spaces (optional)
325
The next issue is scalar multiplication. The same issue of "well definedness" must be confronted. Again, the most natural definition seems to be
a(u + M) = au + M for a any scalar.
Suppose u + M = v + M. Then u - v E M, so a(u -V) E M since M is a subspace. But then au - ON E M, so au + M = av + M, and again this definition of scalar multiplication does not depend on the choice of vector to represent the affine subspace. Note, the zero vector in VIM is V + M = M itself! We can visualize VIM as being created by a great collapsing occurring in V where whole sets of vectors are condensed and become single vectors in a new space.
V=
M
U1 +M u2+M 1 -+ V/M =
u3+M
THEOREM 8.20 Let V be a vector space and M a subspace. Then, with the operations defined above, VIM becomes a vector space in its own right.
PROOF The hard part, proving the well definedness of the operations, was done above. The vector space axioms can now be checked by the reader. 0
There is a natural mapping T1 from V to VIM for any subspace M of V defined by -q(u) = u + M. THEOREM 8.21 With the notation above, -9 : V -+ VIM is a linear transformation that is onto.
Moreover, ker(i) = M. PROOF We supply a quick proof with details left to the reader. First, compute
''(au+0v) = (au+(3v)+M = (au+M)®({3v+M) = a(u+M)®{3(v+M) = cog(u) + (3q(v) and v E ker(q) iff 9(v) _ ker('q) = M.
if v + M = M if v E M so 0
In finite dimensions, we can get a nice relationship between the dimension of V and that of VIM. THEOREM 8.22 Suppose V is a finite dimensional vector space and M is a subspace of V If {m1, m2, ... , mk} is a basis for M and {vl+ M, ... , v,, +M) is a basis for VIM,
thef.3={mj,m2,...,mk,vi,...,v}isabasisforV
Projections
326
PROOF
Suppose first that {vi +M,...
is a linearly independent set
,
of vectors. Then, if r- a,v, = 0 , we have -'0 v/,t: = Tl(-6) = q(ra,v,) _ E il(a,v,) = r a,(v, + M). Thus, by independence, all the as are zero. This says n < dim(V) so that VIM must he finite dimensional. Moreover, consider a a,v, + (3j mj. Then q(> a, v, + linear combination of the ms and vs, say
(3imj) = E a,(v, + M), which implies all the as are zero. But then, we are left with E 13jmi = -6. Again, by the assumption of independence, all the (3s are zero as well. Hence we have the independence we need. For spanning, let v he in V. Then q(v) is in VIM, so there must exist as such that
v+ M = E a,(v, + M) _ (E a,v,)+ M, and so v - (1: a,v,) E M. But since we have a basis for M, there must exist (3's with v - (E a,v,) = (3imi. Therefore, v = E a,v, + E 13jmj, which is in the span of 5. This completes the argument.
0
COROLLARY 8.5 Infinite dimensions, dim(V)=dim(M) + dim(VIM) for any subspace M of V.
PROOF
The proof is left as an exercise.
0
We end with a theorem about linear transformations and their relation to quotient spaces.
THEOREM 8.23 Suppose T : V -a V is a linear transformation and M is an invariant subspace n
n
for T. Then T : VIM -* VIM, defined by T(v + M) = T(v) + M, is a welldefined linear map whose minimal polynomial divides that of T.
PROOF
The proof is left as an exercise.
0
Exercise Set 38
1. Prove Theorem 8.19.
2. Finish the proof of Theorem 8.20. 3. Let M be a subspace of V. For vectors v, u in V, define u " v iff u - v E M. Argue that ' is an equivalence relation on V and the equivalence classes are exactly the affine subspaces generated by M. 4. Prove Corollary 8.5. 5. Fill in the details of Theorem 8.21 and Theorem 8.23.
8.6 Quotient Spaces (optional)
327
6. Suppose T : V -+ W is a linear map and K is a subspace of V contained in ker(T). Prove that there exists a unique linear map T so that T: V/K -
W, with the property, that T o Tj = T. Moreover, ker(T) = (v + K v E ker(T)) and i m(T) = im(T). (Hint: Begin by proving that T is well defined.) Then compute im(T) and ker(T). 7.
(first isomorphism theorem) Let T : V -- W be a linear map. Argue W defined by T(v + ker(T)) = T(v) is not only that T:V/ker(T) linear but is actually one-to-one. Conclude that V/ker(T) is isomorphic to im(T).
8. Suppose V = M ® N. Argue that N is isomorphic to V/M.
V -* W is a linear map. Prove that dim(ker(T)) + 9. Suppose T ditn(im(T)) = dim(V), where V is finite dimensional. :
10. (second isomorphism theorem) Let V be a vector space with subspaces M and N. Prove that (M + N)/N is isomorphic to M/(M fl N). (Hint:
Define T : M+N -+ M/(Mf1N) by T(m+n) = m+(MnN). Prove T is well defined and determine the kernel. Then use the first isomorphism theorem.)
11. (third isomorphism theorem) Let V be a vector space with subspaces M and N, and suppose M C_ N. Prove that (V/M)/(N/M) is isomorphic to
V/N. (Hint: Define T : V/M -> V/N by T(v + M) = v + N. Prove T is well defined and determine the kernel. Then use the first isomorphism theorem.)
12. Suppose you have a basis b1, ... bk of a subspace M of V. How could you use it to construct a basis of V/M?
Chapter 9 Spectral Theory
eigenvalue, eigenvector, eigenspace, spectrum, geometric multiplicity, eigenprojection, algebraic multiplicity
9.1
Eigenstuff
In this chapter, we deal only with square matrices. Suppose we have a matrix
A in C11" and a subspace M e C""" Recall, M is an invariant subspace for A if A(M) c M. That is, when A multiplies a vector v in M, the result Av remains in M. It is easy to see, for example, that the null space of A is an invariant subspace for A. For the moment, we restrict to one-dimensional subspaces. Suppose v is a nonzero vector in C" and sp(v) is the one-dimensional subspace spanned by v. Then, to have an invariant subspace of A, we would need to have A(sp(v)) g sp(v). But v is in sp(v), so Av must be in sp(v) also. Therefore, Av would have to be a scalar multiple of v. Let's say Av = Xv. Conversely, if we can find a nonzero vector v and a scalar K with Av = Xv, then sp(v) is an invariant subspace for A. So the search for one-dimensional invariant subspaces for a matrix A boils down to solving Av = Xv for a nonzero vector v. This leads to some language. DEFINITION 9.1 (eigenvalue, eigenvector, eigenspace) A complex number K is called an eigenvalue of A in C""" if and only if
there is a nonzero vector v in C" with Av = Xv. The vector v is called an eigenvector. The set of all possible eigenvalues of A is called the spectrum of A and is denoted X(A). Fork, an eigenvalue of A, we associate the subspace
Eig(A, K) = Mt, = Null(Xl - A) and call Mx the eigenspace associated with X. The dimension of M,, is called the geometric multiplicity of X. Finally, Mx, being a subspace, has a projection PMA associated with it called the Xth eigenprojection. In fact, PM. = (XI - A)'.
329
Spectral Theory
330
Eigenvalues were used in 1772 by Pierre Simon Laplace (23 March 17495 March 1827) to discuss variations in planetary motions. Let's look at some examples to help understand what is going on. Example 9.1
v for all v, so I is the only
1. Consider the identity matrix I,,. Then eigenvalue o f I,,. T h e r e f o r e , X
1} .
Also, Eig(I, 1) = M, =
Null (l I, -1") =Null(O) = C", so the eometric multiplicity of 1 is n. = Ov for all v so X(0) = {0} Similarly, the zero matrix 0 has 0v =
F
and Mo = Null(01 - 1) = Null(I) of0is0.
The geometric multiplicity
2. For a slightly more interesting example, consider A =
Then I is an eigenvalue of A since
0
1
0
0
0
.
1
0
0
1
0 0
=
0
=1
1
0
0
=
0
0
0
0
0 0
. But also,
0 0 0
I
0
0
,so
I
0
0 0 I
1
0 0
We see an eigenvectorfor I is
0 I
0
1
1 1
0 0 0
0
1
1
0
is an eigenvector for 1, inde-
I
0
0 l
pendent of the other eigenvector
0
.
Also
0
1
0
0
1
0
0
0 0
0 0 0
=
1
so 0 is an eigenvalue for A as well. Indeed, Mo =
xl -x, xl
0
X32
=
-1
0
0
x,
0
-1
x2
0
0
0 0
0
=
0
X3
0
=j[0
0
I
I
and dim (MO) =
Of E C
1. Meanwhile, M, = Null(1 - A) = Null xI
0 0
0 0
0 0
0
0
1
dim(M,)=2.
0
x, X2
x3
=
0 0 0
0
0
0
0 0
0 0
0
=
1
(3
=
y 0
IR,yEC so
9.1 Eigenst uff 1
3. Let A =
0 0
2
0 0
0
3
0
xi
A
Xx'
=
2x2
iff
AX3
0
0 0
0 0
x2
2
0
3
x3
=
x2
I
. Thus
1,x2
3X3
X3
0
1
. Then Av = Xv iff
XI
x2
331
0 0
has eigenvalue A _
0
hash = 2, and
0 has A = 3. Thus X (A) = 11, 2, 31 and 0 each eigenvalue has geometric multiplicity equal to 1. Can you generalize this example to any diagonal matrix? 1,
1
4. Let A
=
1
2
3
0
4
5
0
0
6
x,
X2
.
x1
=
if
x2
1\
X3
Then Av = Xv iff
1
2
3
0
4
5
0
0
6
xi + 2x2 + 3x3
A xI
=
4X2 + 5X3 6X3
X3
iff
A x2 A X3
(1 - K)xl + 2X2 + 3X3 = 0 (4 - )Ox2 + 5x3 = 0 . Solving this system we find that
is
z
(6-X)x3 =0
1
2
1
0 has 0 0 eigenvalue A = 1. Thus, A (A) = 11, 4, 6} . Do you see how to generalize this example? an eigenvector for A = 6,
3
has eigenvalue A = 4, and
5. (Jordan blocks) Fork E C and m any integer, we can form the m-by-m matrix formed, by taking the m-by-m diagonal matrix diag(A, A, ... , A) and adding to it the matrix with all zeros except ones on the superdiagonal. So, for example, J, (k) = [>\], JZ()\) =
J3(A) =
A
1
0
0
A
1
0
0
A
, JA) =
1
0
0
0
X
0
0
0
,
2. Null( Al - A) is not the trivial subspace.
1
L0 XJ,
and so on. It is
A
relatively easy to see that x is the only eigenvalue of Jm(A).
1. A E \ (A).
1
0
0
THEOREM 9.1 Let A E Cnx". Then these are all equivalent statements:
r x
332
Spectral Theory
3. A/ - A is not an invertible matrix. 4. The system of equations ( Al
- A)v =
has a nonzero solution.
5. The projection ( XI - A)' is not 0.
6. det(XI -A)=0. PROOF
The proof is left as an exercise.
II
We note that it really does not matter whether we use Al - A or A - AI above. It is strictly a matter of convenience which we use. For all we know, the spectrum of a matrix A could be empty. At this point, we need a very powerful fact about complex numbers to prove that this never happens for complex matrices. We use the polynomial connection.
THEOREM 9.2 Every matrix A in C" x" has at least one eigenvalue.
PROOF Take any nonzero vector v in C" and consider the vectors v, Av, A2v, ... , A"v. This is a set of n + I vectors in a vector space of dimension n, hence they must he dependent. This means there exist scalars a0, I ,.-- . , an, + a,A"v = 6. Choose the largest not all zero, so that aov + aoAv +
subscript j such that aj j4 0. Call it in. Consider the polynomial p(z) =
ao + aiZ + ... + a,nz"'. Note a # 0 but a,,,+1 = am+2 = ... = an = 0 by our choice of m. This polynomial factors completely in C[z]; say p(z) _ = (aol + a1 A +-..+ y (r, - z) (r2 - z) . (r,,, - z) where y # 0. Then
6
(v) = [y (r1l -A)(r2/ -A)...(rn,I - A)] v. IfNull(r,,,I - A) (W), then r,,, is an eigenvalue and we are done. If not, (r,,,I - A)v # If r,,,_1 / - A has a nontrivial nullspace, then r,,,_1 is an eigenvalue and we are done. If not, (r,,,_11 - A)(rn,I - A)v # 6. Since we ultimately get the zero vector, some nullspace must be nontrivial. Say (rj1 - A) has nontrivial nullspace, then ri is an eigenvalue and we are done. a
Now we know the spectrum of a complex matrix is never empty. At the other extreme, we could wonder if it is infinite. That is also not the case, as the following will show.
THEOREM 9.3 The eigenvectors corresponding to distinct eigenvalues of A E C" xn are linearly independent.
9.1 Eigenstuff
333
PROOF Suppose X1, 1\2. ... .Xn, are a set of distinct eigenvalues of A with + a,,,v,n = corresponding eigenvectors V1, V2, ... , v,,,. Set aivi + a2v2 + +re zero. Let A 6. Ourhope isthat at a]AI a v n ) eq A vi + A) + amA1vm = Then A, + ai()\2-1\1)(1\3-XI)...(X,,,-XI)vi+W+ +-6, so a I must equal zero. 0 In a similar fashion we show all the a's are zero.
-i
-
COROLLARY 9.1
Every matrix in C""" has at most n distinct eigenvalues. In particular, the spectrum of an n-by-n complex matrix is a finite set of complex numbers.
PROOF You cannot have more than n distinct eigenvalues, else you would haven + I independent vectors in C". This is an impossibility. 0 COROLLARY 9.2
If A E C"11' has it distinct eigenvalues, then the corresponding eigenvectors form a basis of C. We end this section with a remark about another polynomial that is associated
with a matrix. We have noted that K is an eigenvalue of A iff the system of
equations (XI - A)v = 6 has a nontrivial solution. This is so iff det(XI A) = 0. But if we write out det(XI - A), we see a polynomial in X. This polynomial is just the characteristic polynomial. The roots of this polynomial are the eigenvalues of A. For small textbook examples, this polynomial is a neat way to get the eigenvalues. For matrices up to five-by-five, we can in theory always get the eigenvalues from the characteristic polynomial. But who would try this with an eight-by-eight matrix'? No one in their right mind! Now the characteristic polynomial does have interesting theoretical properties. For example, the number of times an eigenvalue appears as a root of the characteristic polynomial is called its algebraic multiplicity, and this can be different from the
geometric multiplicity we defined above. The coefficients of the characteristic polynomial are quite interesting as well.
Exercise Set 39 1. Prove that k E K(A) iff K E \(A*). 2. Argue that A is invertible iff 0
X(A).
3. Prove that if A is an upper (lower) triangular matrix, the eigenvalues of A are exactly the diagonal elements of A.
334
Spectral Theory
4. Argue that if S is invertible, then X(S-'AS) = k(A).
5. Prove that if A = A*, then k(A) c R. 6. Argue that if U* = U-', then X E X(U) implies IXI = 1. 7. Prove that if X E X(A), then k" E X(A").
8. Argue that il' A is invertible and X E X(A), then X-' E k(A-'). 9. Suppose X and µ are two distinct eigenvalues of A. Let v be an eigenvector for A. Argue that v belongs to Col (X l - A).
10. As above, if v is an eigenvector for p., then v is an eigenvector for M - A, with eigenvalue X - p.. 11.
Suppose A is an n-by-n matrix and {v, , v2, ... , v" } is a basis of C" consisting of eigenvectors of A. Argue that the eigenvectors belonging to k, one of A's eigenvalues, form a basis of NUll(XI - A), while those not belonging to X form a basis of Col (X l - A).
12. Prove Theorem 9.1.
13. Prove that if A is nilpotent, then 0 is its only eigenvalue. What does that say about the minimum polynomial of A?
14. Solve example (9.1.4) in detail. 15. Find explicit formulas for the eigenvalues of
a
b
c
d
16. If A is a 2-by-2 matrix and you know its trace and determinant, do you know its eigenvalues?
17. Prove that if U E C"X" is unitary, the eigenvectors of U belonging to distinct eigenvalues are orthogonal.
18. Argue that if the characteristic polynomial XA(x) = det(xl - A) has real coefficients, then the complex eigenvalues of A occur in complex conjugate pairs. Do the eigenvectors also occur this way if A has real entries?
19. Suppose A and B have a common eigenvector v. Prove that v is also an eigenvector of any matrix of the form aA + {3B, a, {3 E C. 20. Suppose v is an eigenvector belonging to h j4 0. Argue that v E Col(A). Conclude that Eig(A, X) g Col(A).
9.1 Eigenstuff
335
21. If X $ 0 is an eigenvalue of AB, argue that X is also an eigenvalue of BA.
22. Suppose P is an idempotent. Argue that X(P) c {0, 1). 23. Suppose p is a polynomial and k E X(A). Argue that p(X) E X(p(A)).
24. How would you define all the eigenstuff for a linear transformation T:
C" -+ C"? All A iz A22 ® that X(A) = X(A i,) U X(Azz)
25. Suppose A =
Argue that XA = XA XA Conclude
[
26. Find an explicit matrix with eigenvalues 2, 4, 6 and eigenvectors (l, 0, 0), (1, 1, 0), (1, 1, 1). 27. Argue that the product of all the eigenvalues of A E Cn " is the determinant of A and the sum is the trace of A.
28. Suppose A E C""" has characteristic polynomial XA(x) _ 2c;x;. If you i=o
are brave, argue that c, is the sum of all principal minors of order n-r of A times (-1)"-'. At least find the trace and determinant of A among the cs.
29. Do A and AT have the same eigenvalues? How about the same eigenvectors? How about A and A*?
30. Suppose B = S-'AS. Argue that XA = XB Do A and B necessarily share the same eigenvectors?
31. Suppose v is an eigenvector of A. Argue that S-lv is an eigenvector of
S-'A. 32. Here is another matrix/polynomial connection. Suppose p(x) = x" + + alx + ao. We associate a matrix to this polynomial a,,_,x"-i + called the companion matrix as follows: C(p(x)) = 0
l
0
0
0
0
1
0
-a,
-612
.Whatisthe 1
...
...
...
-an_2 -a._I characteristic polynomial of C(p(x))? What are the eigenvalues?
-ao
33. Suppose A is a matrix whose rows sum up to m. Argue that m is an eigenvalue of A. Can you determine the corresponding eigenvector?
336
Spectral Theory
34. Suppose C = C(x" - I). Suppose A = a,I + a2C +
+ a,,C"-I
Determine the eigenvalues of A. 35. Argue that A B and BA have the same characteristic polynomial and hence
the same eigenvalues. More generally, if A is m-by-n and B is ti-by-m, then AB and BA have the same nonzero eigenvalues counting multiplic-
ity.(Hint:
[0 !
]
'[
B
®] [
®
!
]
-
[ B
so
BA
36. Argue that if A B = BA, then A and B have at least one common eigenvector. Must they have a common eigenvalue? 37. Suppose M is a nontrivial subspace of C", which is invariant for A. Argue that M contains at least one eigenvector of A.
38. Argue that Eig(A, A) is always an invariant subspace for A.
39. (J. Gross and G. Trenkler) Suppose P and Q are orthogonal projections of the same size. Prove that P Q is an orthogonal projection iff all nonzero
eigenvalues of P + Q are greater or equal to one. 40. Argue that A E X(A) implies ale E X(aA). 41. Argue that if X E X(A) with eigenvector v, then X E X(A) with eigenvector v.
42. Prove that if X E X(A ), then k + T E X(A + TI).
43. Prove that h E X(A) if A E A(S-'A S), where S is invertible. I
N-I
N
N
0
2
N-2
N
N
0
0
0
0
0
k
N-k
N
N
0
...
0 0
44. Consider the matrix PN =
0
0
0 0
0
0
0
0
0
...
0
N-I
I
N
N
0
1
Find all the eigenvalues of PN and corresponding eigenvectors.
9.1 Eigenstuff
337
Further Reading [A-S&A, 2005] Rhaghib Abu-Saris and Wajdi Ahmad, Avoiding Eigen-
values in Computing Matrix Powers, The American Mathematical Monthly, Vol. 112, No. 5., May, (2005), 450-454.
[Axler, 1996] Sheldon Axler, Linear Algebra Done Right, Springer, New York, (1996).
[B&R, 1986(2)] T. S. Blyth and E. F. Robertson, Matrices and Vector Spaces, Vol. 2, Chapman & Hall, New York, (1986).
[Holland, 1997] Samuel S. Holland, Jr., The Eigenvalues of the Sum of Two Projections, in Inner Product Spaces and Applications, T. M. Rassias, Editor, Longman, (1997), 54-64. [J&K, 1998] Charles R. Johnson and Brenda K. Kroschel, Clock Hands
Pictures for 2 x 2 Real Matrices, The College Mathematics Journal, Vol. 29, No. 12, March, (1998), 148-150. [Ol9, 2003] Gregor Olgavsky, The Number of 2 by 2 Matrices over Z/pZ
with Eigenvalues in the Same Field, Mathematics Magazine, Vol. 76, No. 4, October, (2003), 314-317. [Scho, 1995] Steven Schonefeld, Eigenpictures: Picturing the Eigenvector Problem, The College Mathematics Journal, Vol. 26, No. 4, September, (1995), 316-319.
[Tr&Tr,2003] Dietrich Trenkler and Gotz Trenkler, On the Square Root of aaT + bbT, The College Mathematics Journal, Vol. 34, No. 1, January, (2003), 39-41. [Zizler, 1997] Peter Zizler, Eigenpictures and Singular Values of a Matrix, The College Mathematics Journal, Vol. 28, No. 1, January, (1997), 59-62.
9.1.1 9.1.1.1
MATLAB Moment Eigenvalues and Eigenvectors in MATLAB
The eigenvalues of a square matrix A are computed from the function
eig(A).
Spectral Theory
338
More generally, the command
[V, D] = eig(A) returns a diagonal matrix D and a matrix V whose columns are the corresponding eigenvectors such that A V = V D. For example,
> > B = [1 +i2+2i3+i; 2+214+4i91; 3+3i6+6i8i] B= 1.0000 + 1.0000i 2.0000 + 2.0000i 3.0000 + 3.0000i
2.0000 + 2.0000i 4.0000 + 4.00001 6.0000 + 6.00001
3.0000 + 1.0000i 0 + 9.0000i 0 + 8.00001
> > eig(B)
ans = 6.3210+ 14.1551i 0.0000 - 0.00001
-1.3210- 1.1551i > > [V, D] = eig(B)
V= 0.2257 - 0.1585i
0.8944
0.6839
0.6344 + 0.06931 0.7188
-0.4472 - 0.00001
-,5210 + 0.35641
0.0000 + 0.00001
-0.0643 - 0.3602i
6.3210+ 14.1551i
0
0
0 0
0.0000 + 0.0000i
0
0
-1.3210- 1.15511
D=
A really cool thing to do with eigenvalues is to plot them using the "plot" command. Do a "help plot" to get an idea of what this command can do. Or, just try
plot(eig(A),' o'), grid on and experiment with a variety of matrices.
9.2
The Spectral Theorem
In this section, we derive a very nice theorem about the structure of certain square matrices that are completely determined by their eigenvalues. Our approach is motivated by the theory of rings of operators pioneered by Irving Kaplansky (22 March 1917-25 June 2006). First, we present some definitions.
9.2 The Spectral Theorem
DEFINITION 9.2
339
(nilpotent, normal)
A matrix A E C"'n is called nilpotent iff there is some power of A that produces the zero matrix. That is, An = O for some n E N. The least such n is called the index of nilpotency.
A matrix A is called normal iff AA* = A*A. That is, A is normal iff it commutes with its conjugate transpose.
Example 9.2 1.
If A = A* (i.e., A is self-adjoint or Hermitian), then A is normal.
2. If U* = U-l (i.e., U is unitary), then U is normal. 3. If A* = -A (i.e., A is skew Hermitian), then A is normal. 0
4. N =
0 0
1
0
0
1
0
0
is nilpotent. What is its index?
Recall that we have proved that AA* = 0 if A = 0 for A E Cnxn The first fact we need is that there are no nonzero normal nilpotent matrices.
THEOREM 9.4 If A is normal and nilpotent, then A = 0.
PROOF
First, we show the theorem is true for A self-adjoint. The argument is by induction. Suppose A = A*. Suppose A' = 0 implies A = 0 as long as i < n. We want the same implication to hold for it. We look at two cases: n is even, say n = 2m. Then 0 =A" = A2in = A'A' = A'"A*m = Am(Am)*. But then, Am = 0, so applying induction (m < n), we conclude A = 0. On the other hand, if n is odd other than 1, say n = 2m + 1, then 0 =A" = An+' = A2m+2 = Am+' Am+l = Am+l (A*)m+l = (Am+i)(A`"+l )* so again An'+l = 0.
Applying induction, A = 0 since in + 1 < n. Now, if A is normal and A" = 0, then (AA*)" = An(A*)" = 0 so, since AA* is self-adjoint, AA* = 0. But then A= 0 and our proof is complete. 0 Next, recall that for any matrix A, we defined A' = I - A+A; that is, A' is the projection onto the null space of A. This projection was characterized as the unique projection P that satisfies AX = 0 iff X = PX. Finally, recall the minimal polynomial p A(x), which is the monic polynomial of least degree such that PA (A) = 0.
340
Spectral Theory
THEOREM 9.5
Let A be normal.
Then A'
0=AA'=A'A.
is
a polynomial
in
A and hence
PROOF We first dispose of some trivial cases. If A = 0, 0' - 1, so A' is trivially a polynomial in A. Let's assume A # 0. Suppose AA(x) _ ao + a,x + a2x2 + + x"'. If ao $ 0, then by a result from Chapter 1, we have A-' exists so A' _ 0, again a trivial case. Now let's assume ao = 0. Thus PA(x) = aix+aZx2+ +x"'. Of course, at could be zero oroo could be zero along with at. But this could not go on because if a2 = a3 = a,,,-i were all zero, we would have A"' = 0 hence A"' = 0, whence A = 0 by (9.4) above. So some al in this list cannot be zero. Take the least j with at # 0. Call this
j,i.Then ai 0but ai-i =0.Note i <m-1 soi <m. Thus, AA(x) = aix'+ai+ix'+i+ +x'. Form E = l+Ei=, A = 1+ ai
ai+i
ai
A+L+-22 A2+
ai
.+ ai
A"'. Now E isa polynomial in A, so clearly AE _
EA. Also E is normal, EE* = E*E. Next, suppose AX = 0 for some matrix X. Then EX = X since X kills off any term with an A in it. Conversely, suppose
X = EX for some matrix X. Now 0 =AA(A) = A'E, so (AE)i = A'E' = (A' E)Ei-' = 0, so by (9.4) above AE = 0. Note AE is a polynomial in the normal matrix A, so is itself normal. But then AX = AEX = 0. Hence we have the right property: AX = 0 if X = EX. The only question left is if E is a projection. But AE = 0 so EE = E, making E idempotent. Moreover, (A E*)(A E*)* _
AE*EA* = E*AEA* = 0, using that E* is a polynomial in A* and A commutes with A*. Thus AE* = 0, so E* = EE*. But now we have it since E* = EE* = (EE*)* = (E*)* = E. Thus E is a projection and by uniqueness must equal A'. 0
Let us establish some notation for our big theorem. Let A E C"". Let (X1 - A)' denote the Xth eigenprojection of A, where X is any complex number. Recall X is in the spectrum of A, X(A), iff P, # 0.
THEOREM 9.6 Suppose A is normal. Then 1. A Px = Px A = k PP.
2. (Al - A)P5 = (p. - X)Px.
9.2 The Spectral Theorem
341
3. {PXJX E X(A)} is a finite set of pairwise orthogonal projections.
4. If k(A) = {0}, then A = 0. PROOF
1. Since (XI - A)P, = (XI - A)(X! - A)' = 0, we have APB, = \P,. Also, A! - A is normal so by (9.5), PP(XI - A) = (Al - A)PP so Px - PA = - AP. Canceling XP,, we get PA = AP.
µPx-APt\=VP),-APx=(p.-k)PP.
2.
(µl -A)Px
3.
We know by Theorem 9.3 that k(A) is a finite set. Moreover, if k, # k2 in X(A), then A,, =
(P A - XI P.
(A - ]\i 1) Pk2
X2-1\1
PX2. Thus, Pa, Pat = Pa, (A-X 11)
Pat 1\1 - X2
=
= 0, making P -..Pa, whenever Ai and A2 are
distinct eigenvalues.
4. Suppose 0 is the only eigenvalue of A. Look at the minimum polynomial of A, p A (x) = ao + a l x + + x'. Since we are working over complex
numbers, we can factor µA completely as linear factors µA(x) = (x Ri)(x - 132) . (x - (3). If all the 13s are zero, IJA(A) = 0 =A', so
A' = 0. By Theorem 9.4, A = 0 and we are done. Suppose not all the 13s are zero. Relabeling if necessary, suppose the first k, 13s are not
zero. Then p.A(x) = (x - R1)(x - R2)...(x - Rk)xm-k. Then 0 = ILA(A)=(A-Ril)...(A-Rkl)A'"-k.Now13, #OsoQi X(A)so
Pp, _ 0. But (A - 13i l)[(A - 021) ... (A - (3k/)A'"-k] = ® implies [(A-R21)...(4-Rk1)Am-k]
=(A-R1!)'[(A-R2l)..Am-k] = 0.
The same is true for 02 so (A -1331) .. (A - Rk l )A"'-k = ®. Proceeding inductively, we peal off each A -131 term until we have only A'"-k = But then using (9.4) again we have A = 0. 0 We need one more piece of evidence before we go after the spectral theorem.
THEOREM 9.7
Suppose A is normal and F is a projection that commutes with A (i.e., AF = FA). Then 1. A F is normal.
2. (Al-AF)'=(Al-A)'F=F(A!-A)'ifA54 0.
342
Spectral Theory
PROOF 1.
First (AF)(AF)* = AFF*A* = AF2A*=FAA*F = F*A*AF (A F)*(AF).
2. Suppose k # 0 and X is any matrix. Then (Al - FA)X = 0 iff AX =
FAX iff X = F(,\-' AX) iff X = FX and X = \-I AX iff X = FX
and \X = AX iff X = FX and (A/ - A)X = 0 if X = FX and X = (Al - A)'X iff X = F(AI - A)'X. By uniqueness, F(Al - A)' = (Al - FA)'. Also, (Al - A)' is a polynomial in Al - A and hence F commutes with (,\/ - A). 0 Now we are ready for the theorem. Let's establish the notation again. Let A E C"'" and suppose the distinct eigenvalues of A are A 1 , A2, ... , A,,,. Suppose the eigenspaces are Mt10*12, ... , Mi,,,, and the eigenprojections are Px, , Pat, ... , Ph.
THEOREM 9.8 (the spectral theorem) With the notations as above, the following statements are equivalent: 1. A is normal.
2. A= A i Pa, +1\2812 + ... +'\"I pX for all \i j41\j.
811 + ....+ Ph,, = /,and Pa, 1 Px,
3. The eigenspaces M),, are pairwise orthogonal and M,,, ® *12 ® . . . ® Al. = CI I.
PROOF
Suppose A is normal. Let F = I - E"', P. Then F* = (I -
E" P\,)* = 1* - >" P* = I F2 = F(l
P,, = F, so F is self-adjoint. Next,
PP,) = F - >"'I FPr, . But FPa, = P,\, - m P,,
pa,
p2 = 0 for each j, so F2 = F, using that the eigenprojections are orthogonal. Thus, F is a projection. Moreover, A F = FA by (9.7) (1) so
(X/ - AF)' _ (Al - A)'F = F(A/ - A)' by (9.6) for all \ # 0. Thus, for all A A 0, (A/ - A)'(/ - F) = J :(XI - A)'(PP) = ((A/ - A)')2 = (A - Al)'. Then (Al - FA)' = (A/ - A)'F = 0 for all k # 0. This says that the only eigenvalue possible for FA is zero so FA = AF = 0 by (9.6)(4). Hence F = A'F = PoF = 0 for, if 0 E MA), then PoF = FPo = 0 and if 0 ¢ A(A), then Po = 0. Thus, in any case, I
9.2 The Spectral Theorem
Next, A = Al = A(j
I Px,)
AP.,,
343 X P), by (9.7)(1).
Thus, we have the hard part proved (i.e., (1) implies (2)). Next, we show (2) implies (1). Suppose (2) is true. Then AA*
Ein
I
1,\,12 P.\, and A*A
=
X1P )(r" X Px,) _ rm IX1I2Px,. Thus I
AA* = A*A.
0
Again suppose (2) is true. The eigenspaces are orthogonal pairwise because their projections are. Moreover if V E C", then v = Ix = (Px, + + P,.)(v) = Px,v+Px2v+...+PX,,V E Mx,®1Mx2®1...®1Mx,,,,SO Cn = M, \ , M xm , \ . . Conversely, if we assume (3), then since Mx,-LM?,, we have Px, I Px, j. Moreover if v E C", then v = v1 +v2+ +v,n, where v1 E M. when i Then ='I VI+X2v2+
+k,nvn,. But P,\, vi = v; since Fix(Px,) = Mx,. Px,v; = 6 fori # j. Thus Px,v; = v; soAv= X1Px,v+1\2Px2v+...+k11,P.,,,V=(XIPx,+...+X,nP),.)y. Px, +
+ PPM)v and so (2) follows. This completes the proof of the spectral
theorem.
There is uniqueness in this spectral representation of a normal matrix, as the next theorem shows. THEOREM 9.9 Suppose A is a nonempty finite set of complex numbers and ( Qx IX E A) is XQx, an orthogonal set of projections such that EXEA Qx = 1. If A =
then A is normal, Qx = (X! - A)' for all K E A, and X(A) c A. Moreover, A=X(A)iffQx AO for eachnonzeroX E A.
[
PROOF We compute A*Qx = (IIEA XQx)*Qx,, = (>xeA XQx)Qx = XQx)Qx = L.XEA XQxQ1, = `xEA XQ.Qx,, = AflQx and AQx0 =
XoQx,,. Thus, A*A = A*(EXQx) = EXA*Qx = EXXQx = EXAQx = A is normal. We adopt the convention that Qx = 0
if K E A. Since for any µ E C, µl - A = µl - EXQx = E(µ - X)Qx, (µ! - A)Qµ = ExEA(µ - X)QxQ . = O. Then, for all µ E C with X A Px < Q. Then, for any µ E C, Q' = 1 - Q R = ExENlµl Qx P. Thus, P,, < Q . for each µ E C so that P. _ (Al - A)' = Q . for all µ E C. Now, if µ 0 A, then P,,, = Q1, = 0, which says µ X(A). Thus, X(A) C A. If µ E A such that Q. A 0, then Pµ = Q1, 54 0, which means 0 for every µ E A, which completes µ E X(A). Thus, A = X(A) if Q , our proof.
0
Spectral Theory
344
One of the nice consequences of the spectral theorem is that certain functions of a matrix are easily computed.
THEOREM 9.10 Let A be normal with spectral representation A = a EX(A) X, Ph,. Then
1. AZ =
L.A,EX(A) ki Ph,
2. A"= 3.
FX,EX(A)kP),,.
If q(x) is any polynomial in C[x], then q(A) _ E),,Ea(A)9(,\r)P.,
Moreover, if X E X(A), there exists a polynomial Px(x) in C[x] such that
& = Pk(A). PROOF 1. Compute A2 = (EXPx)(EPµ)=Ea r \µPaPµ = EX2Pi' = EX2P),. 2. Use induction.
ovP\+ 3. Compute µA"+vA"' so clearly, q(A) _ EX q(k)PP.
µa",Pa =
E"(fLk"+UA,")P), O
To get the polynomial P),(x), we recall the Lagrange interpolation polynomial: (X k )(.l-A2). (A A,_I)(A' /+I) (a PA (x) - (a/-ki)(A -k2). (k,-k _i)(X,-A,+i).. / (A,-k,,) Then Pa,(Xi) = bij, the
ICronecker S. Then P,\, (A) _ ; Pa,(k,)P), = P), _ (XjI - A)'. '\ For Recall that if k is a complex number, X+ 0 if X=0* normal matrices, we can capture the pseudoinverse from its eigenvalues and eigenprojections.
THEOREM 9.11 Suppose A is normal with spectral representation A = EXEX(A) XP),; then A+ = >XEX(A) i+P),. In particular A+ is normal also.
PROOF
The proof is left as an exercise.
U
345
9.2 The Spectral Theorem
Exercise Set 40 1. Give an example of a matrix that is not self-adjoint, not unitary, and not skew adjoint and yet is normal. 2. The spectral theorem is sometimes formulated as follows: A is normal if A = >2 X u, u*, where the u; are an orthonormal basis formed from the eigenspaces of A. Prove this form.
3. Suppose A is normal. Argue that Ak is normal for all k E N; moreover, if p(x) E C[x], argue that p(A) is normal. 4. Argue that A is normal iff the real part of A commutes with the imaginary part of A. (Recall the Cartesian decomposition of a matrix.)
5. Suppose A E C'
.
Define Ra = (Al - A)-' fork 0 k(A). Prove that
R,, - RN, = (FL - X) R,\ RI,.
6. Argue that an idempotent P is normal iff P = P*. 7. Argue that A is normal iff A* is normal and the eigenspaces associated with the mutually conjugate eigenvalues of A and A* are the same.
8. Suppose A is normal and k is an eigenvalue of A. Write A = a + bi, where a and b are real. Argue that a is an eigenvalue of the real part of A and b is an eigenvalue for the imaginary part of A.
9. Argue that a normal matrix is skew-Hermitian iff all it eigenvalues are pure imaginary complex numbers. 10. Write a matrix A in its Cartesian decomposition, A = B + Ci, where B and C have only real entries. Argue that it A is normal, so is
B -C
[C
B Can you discover any other connections between A and this 2-by-2 block matrix?
11. Prove that A is normal iff every eigenvector of A is also an eigenvector of A*. 12. Argue that A is normal iff A* is expressible as a polynomial in A. 13. Prove that A is normal iff A commutes with A*A.
14. Argue that A is normal if A commutes with A + A* iff A commutes with A - A* if A commutes with AA*-A*A.
Spectral Theory
346
15. Prove the following are all equivalent statements. (a) A is normal (b) A commutes with A + A* (c) A commutes with A - A* (d) A commutes with AA* - A*A
16. Prove that a matrix B in C'""" has full row rank iff the mapping A H AB* + B*A takes C0"" onto the set of m-by-m Hermitian matrices. 17. Let A E C0"' be normal and suppose the distinct eigenvalues of A are X 1 , X2, ... , X,,. Suppose XA(x) = Hk" i(x - Xk)'. Argue that PA (X) fl'T_i(x - Xk) and dim(Eig(A, Xk)) = dk. 18. Give an example of 2-by-2 normal matrices A and B such that neither A + B nor AB is normal.
Group Project A long-standing problem on how the eigenvalues of Hermitian matrices A and B relate to the eigenvalues of A + B has been solved. Read about this and write a paper. See [Bhatia, 20011.
Further Reading [Bhatia, 20011 Rajendra Bhatia, Linear Algebra to Quantum Cohomology: The Story of Alfred Horn's Inequalities, The American Mathematical Monthly, Vol. 108, No. 4, (2001), 289-318. [Brown, 1988] William C. Brown, A Second Course in Linear Algebra, John Wiley & Sons, New York, (1988).
[Fisk, 2005] Steve Fisk, A Very Short Proof of Cauchy's Interlace Theorem, The American Mathematical Monthly, Vol. 112, No. 2, February, (2005), 118.
9.3 The Square Root and Polar Decomposition Theorems
347
[Hwang, 2004] Suk-Geun Hwang, Cauchy's Interlace Theorem for Eigenvalues of Hermitian Matrices, The American Mathematical Monthly, Vol. 111, No. 2, February, (2004), 157-159. [J&S, 2004] Charles R. Johnson and Brian D. Sutton, Hermitian Matrices, Eigenvalue Multiplicities, and Eigenvector Components, SIAM J. Matrix Anal. Appl., Vol. 26, No. 2, (2004), 390-399.
[Zhang 1999] Fuzhen Zhang, Matrix Theory: Basic Results and Techniques, Springer, New York, (1999).
9.3
The Square Root and Polar Decomposition Theorems
The spectral theorem has many important consequences. In this section, we develop two of them. We also further our analogies with complex numbers and square matrices. THEOREM 9.12
Let A E C""". If A A*= X P where P is a projection and A 0 0, then k is a positive real number.
PROOF
Suppose AA* = XP, where Pisa projection and A # 0. Then AA*
is not 0, so there must exist v E C" with AA*v # 6. Then Pv # -6 either. But IIA*v112 =< A*vIA*v >= < AA*vlv >=< XPvly >= X < Pvly >= X < Pvl Pv >= X II Pv112. Now A*v # 0 so k =
IIA*y112
11 Pv112 is
positive and real. 0
COROLLARY 9.3 The eigenvalues of AA* are nonnegative real numbers.
PROOF
Suppose A # 0. Since A A * is normal, it has a spectral representation AA*)' for any p. in X(AA*). E
AA* _ Es
X; E(XiI - AA*)'E = p.E. Now if 0 because µ E K(AA*). Finally, if EA 54 0, then p. > 0 by Theorem 9.12 above. 0
Then (EA)(EA)* = EAA*E = EX
Ea(AA*)
EA = 0 then µE _ 0, so p = 0 since E
Spectral Theory
348
THEOREM 9.13 (the square root theorem) For each A E C" x", there exists B = B* such that B2 = AA*. By the spectral theorem, AA* = I:A Ea(AA*) ki(Xil -- AA*)'. By Corollary 9.3,wehaveX >_ OforallX E h(AA*).PutB = ,X,EO(AA*) 0 AA*)'. Then clearly B = B* and B22 = AA*.
PROOF
One of the useful ways of representing a complex number z = a + bi is
in its polar form z = reie. Here r > 0 is real, ere has magnitude 1, and (ere)* = e-ie = (eie)-'. The analog for matrices is the next theorem. THEOREM 9.14 (polar decomposition theorem)
For any A E Cnx", there exists B = B* in C"'" and U with U = U+ such that A = BU. Moreover, U" = A", U*" = A*", and B22 = AA*. PROOF By the square root theorem, there exists B = B* in C"," such that B2 = AA*. Let U = B+A, Then BU = BB+A = (BB+)*A = B+*B*A = (B*+B*)A = B*"A = B"A = (B*B)"A = (B2)"A = (AA*)"A = A*"A = A. (B+"A)" = Also, since B*" = (BB*)" _ (AA*)" = A*", U" = (B+A)" (B*"A)" _ (A*"A)" = A" and U* _ (A*B+*) = (A*B*+)" (A*B+)', _ (A*"B+)l"
[(B+*)"B+] = B+ = B` = A*", Finally, we
_ (B"B+)" =
claim U* = U+. But UU* = (B+A)(A*B+) = B+(AA*)B+ = B+B2B+ =
(B+B)(BB+) = B"B" = B*"B*" = B*" = A*" = U*". Also, U*U = (A*B+)(B+A) = A*(B+)2A = A*[(B*B)+B*B+]A = A*[(B2)+BB+]A = A*[(B2)+B ]A = A*(B2)+B l]A = A*[(B+)2(B"A)] _ = A*(B+)2A = A*(AA*)+A = A*A*+ = A" = U". Finally, UU*U = UU" _ U and U*UU* = U"U* _ (U+*)"U* = U*. Thus, U* satisfies the four 0 Moore-Penrose equations, and so U* = U+. A*[(B2)+B*,']A
We end this section with an interesting result about Hermitian matrices.
THEOREM 9.15 Suppose A = A *, B = B *, and AB = BA. Then there exists C = C* and polynomials
p and q with real coefficients such that A = p(C) and B = q(C).
PROOF Use the spectral theorem to write A = X, E, + + X, E, and B = µ, F, +.. + µs F,,. Now AB = BA implies BEj = Ej B for all j, so Ej Fk = Fk Ej for all j, k. Let C = >c jk Ej Fk. Note C = C*. Choose polynomials j,k
such that p(cjk) = Xj and q(cjk) = µk for all j, k. Then C" _ > cilkEj Fk and
9.3 The Square Root and Polar Decomposition Theorems
349
P(C)=EXJEjFk=F_ (F_XiEj) Fk=>AFk=A1: Fk=Aandq(C)= E µk EJ Fk
=
E (>2 EJ (µk Fk) = B.
0
Exercise Set 41 1. Write A in its polar form A = BU. Argue that A is normal if B and U commute.
2. Write A in its polar form A = BU. Argue that h is an eigenvalue of iff \2 is an eigenvalue of B.
3. Write A in its polar form A = BU. Suppose X is an eigenvalue of A written in its polar form X = reie. Argue that r is an eigenvalue of B and ei0 is an eigenvalue of U.
4. Prove that A is normal if in a polar decomposition A = BU and AU =
UAiffAB=BA.
Further Reading [Higham, 1986a] N. J. Higham, Computing the Polar DecompositionWith Applications, SIAM J. Sci. Statist. Comput., Vol. 7, (1986), 11601174.
[Higham, 1986b] N. J. Higham, Newton's Method for the Matrix Square Root, Math. Comp., Vol. 46, (1986), 537-549.
[Higham, 1987] N. J. Higham, Computing Real Square Roots of a Real Matrix, Linear Algebra and Applications, 88/89, (1987), 405-430.
[Higham, 1994] N. J. Higham, The Matrix Sign Decomposition and Its Relation to the Polar Decomposition, Linear Algebra and Applications, 212/213, (1994), 3-20.
Chapter 10 Matrix Diagonalization
10.1
Diagonalization with Respect to Equivalence
Diagonal matrices are about the nicest matrices around. It is easy to add, subtract, and multiply them; find their inverses (if they happen to be invertible);
and find their pseudoinverses. The question naturally arises, is it possible to express an arbitrary matrix in terms of diagonal ones? Let's illustrate just how 1
1
easy this is to do. Consider A = L
1
1
2
2
2
2
2
. Note A is 4-by-3 with rank 2.
First, write A in a full rank factorization; A
1
1
1
2
1
0
2
1
1
0
1
0
1
2
_
FG. Now the columns of F and the rows of G are independent, so there are not be any zero columns in F or zero rows in G. We could use any nonzero scalars really, but just to he definite, take the lengths of the columns of F, normalize the columns, and form the diagonal matrix of column lengths. So A = Z Z
z 1
io
2
io
2
0
0
10
1
0
2
0
1
0
Now do the same with the rows of
2 IO
G. Then A =
ri 2
io 2
2
io
[
0
l0
][
0
1][
-3-
0
1
0
io
351
Matrix Diagonalization
352 1
2
10
2
1
to
z
to
2f
0
0 0][
[
0
L
0
f 2
0
1
It is clear that you can do
1
2
this with any m-by-n matrix of rank r: A = F, DG1, where F, has independent columns, D is r-by-r diagonal with nonzero entries down the main diagonal, and G 1 has independent rows. All this is fine, but so what? The numbers along the diagonal of D seem to have little to do with the original matrix A. Only the size of D seems to be relevant because it reveals the rank of A. All right, suppose someone insists on writing A = F, DG1, where F, and G, are invertible. Well, this imposes some size restrictions; namely, F, and G, must be square. If A is in C!' , then F, would have to be m-by-m of rank m, G 1 would have to be n-by-n of rank n, and D would have to be m-by-n of rank r, meaning exactly r elements on the diagonal of D are nonzero. Now theory tells us we can always extend an independent set to a basis. Let's do this for the columns of F, and the rows of G1 . This will get us invertible matrices. Let's do this for our example above: 2
2
,o
io
a,
b,
a2
b2
a3
2
a4
b3
245-
0
0 0
]0
0
0
0
0 0 0 0
f
0
0f 1
0
...
...
...
Cl
C2
C3
b4
= A .Notice it does not matter how we completed those bases. 1
2
2
Note the rank of A is revealed by the number of nonzero elements on the diagonal of D. In essence, we have found invertible matrices S and T such that
SAT = D, where D is a diagonal matrix. The entries on the diagonal of D as yet do not seem to be meaningfully related to A. Anyway, this leads us to a definition.
DEFINITION 10.1 (diagonalizable with respect to equivalence) Let A E Cr"". We say A is diagonalizable with respect to equivalence iff there exist invertible matrices S in C"'x"' and T in Cnxn such that SAT =
D'
0
®
(D
,
where D, E c;xr is diagonal of rank r. Note that if r = m, we D,
write SAT = [D,®], and if r = n, we set SAT =
10.1 Diagonalization with Respect to Equivalence
353
THEOREM 10.1
Let A E C"'. Then A is diagonalizable with respect to equivalence if and only if there exists nonsingular matrices S E Cmxm and T E Cnxn such that SAT =
Ir
Suppose first such an S and T exist. Choose an arbitrary diago-
PROOF
®J
nal matrix D, in C;xr. Then D = I D'
L0
is nonsingular and so is
Im-r
r
Ir ® _
DS. Let SI = DS and T, = T. Then SIATI = DSAT = D I Dr
0
L
0
Conversely, suppose A is diagonalizable with respect to equiva-
1.
0 0 0
D,
lence. Then there exist S1, TI invertible with SI AT,
r0
/
®1 f ® ®1
m-r J
J
11
®®
=D
r
J
=
Choose S = DI SI and T =
LL
TI.Then SAT = D-I SI ATI = D-1 D
11
11
Jr
®J =
r ® .
L
1
0
This com-
pletes the proof.
U
This theorem says there is no loss in generality in using Ir in the definition of diagonalizability with respect to equivalence. In fact, there is nothing new here.
We have been here before. This is just matrix equivalence and we proved in Chapter 4 (4.4) that every matrix A E C01 " is equivalent to a matrix of the form
I,
® O®
namely rank normal form, and hence, in our new terminology, that
],
every matrix A E C;'x" is diagonalizable with respect to equivalence. What we did not show was how to actually produce invertible matrices S and T that bring A to this canonical form where only rank matters from a full rank factorization of A.
Let A E Cr x" and let A = FG be a full rank factorization of A. Then Fix m
A[G+ : (In - G +G) W2 WI [Im (m-r)xm
Fixm
- FF+]mxm
nxr
nxn
nx(n-r)
FG[G+:(I - G+G)W2] _
WI [IM - FF+] F+AG+
W,(Im - FF+)AG+
F+A(I - G+G)W2 WI(I - FF+)A(I - G+G)W2
Note that the arbitrary matrices W1 E
C(m-r)xm
Ir
0
00
l .
and W2 in Q`nx(n-r) were needed
Matrix Diagonulization
354
to achieve the appropriate sizes of matrices in the product. At the moment, they are quite arbitrary. Even more generally, take an arbitrary matrix M E Cr"' F+ r 1 J A[G+M : (I - G+G)W21 = ® ®J. We see Then
W,(1 - FF+)
L
that every full rank factorization of A leads to a "diagonal reduction"of A. Of course, there is no reason to believe that the matrices flanking A above
are nonsingular. Indeed, if we choose W, = 0 and W2 = 0, they definitely will not he invertible. We must choose our W's much more carefully. We again appeal to full rank factorizations. Begin with any full rank factorization of A,
say A = FG. Then take positive full rank factorizations of the projections I,,, - FF+ = FIG, and I, - G+G = F2G2. That is, F1* = G, and F2 = G2. Then we have the following:
1. F,=Gt 2. G1
F,+
3. F2
G+
4. G2
F2
5. G,F=G,Gi G,F=G,(I - FF+)F=G 6. F+Gi = F+F1 = F+F, F,+F, = F+F,G, F, = F+(I - FF+) F,
=0
7. G2G+ = G2G+ G2G+ = G2F2G2G+ = G2(I - G+G)G+ =
8. GGZ =GF2=GF2F2 F2=GF2G2F2=G(I-G+G)F2=®. F+ Now we shall make
invertible by a judicious choice
W1(I - FF+) F+ of W,. Indeed, we compute that
[F : (I - FF+)Wi*1 =
. . .
W1(I - FF+)
F+F
F+(I - FF+)Wj
-
Ir
[0
W1(I - FF+)F W1 (I - FF+)Wi ] W,(I - FF+)W1 We badly need W1(I - FF+)Wl* All right, choose W, = F1+ _ G,. Then W1 (I - FF+)Wl = W, FIG,Wi = F 1+F, G1FI+* = IG,FI+* F+ GIG*, = I. Thus, if we take S = , then S is invertible and F1+ (I - FF+) J
355
10.1 Diagonalization with Respect to Equivalence
S-' = [F : (1 - FF+)Fi *]. But F,+(l - FF+) = Fi F1G1 = G1 = Fl +, F+
..
so we can simplify further. If S =
, then
S-' = [F :
F1]. Sim-
F
ilarly, if T = [G+ : (1 - G+G)W2], we choose W2 = F2 and find that G
T-1 _
. I
These invertible matrices S and T are not unique. Take
F2
F++W3(1 -FF+) S=
I.
F'
L
Then S-1 = [F : (1 - FW3)F1], as the
J
reader may verify. Here W3 is completely arbitrary other than it has to be
the right size. Similarly for T = [G+ + (I - G+G)W4 : F2 ], we have G
T-1 _
, where again W4 is arbitrary of appropriate size.
. . .
F2(1 - W4G) Let us summarize our findings in a theorem.
THEOREM 10.2
0 . If A = FG is any ®® ] 1r
Every matrix A in Cr` is equivalent to 1
full rank factorization of A and I - F F+ = F1 F1 and 1 - G + G = F2 F2* r F+ + W3(1 - FF+) 1 are positive full rank factorizations, then S = I
I
F1
L
and
J
T = [G+ + (1 - G+G)W4 : FF ] are invertible with S-' = [F : (I - FW3)F1 ] C
and T-' _
and SAT
=
r
®
1` L
F2(1 - W4G)
J
Let's look at an example. Example 10.1
LetA=
1
2
2
1
2
7
6
10
7
6
4
4
6
4
4
1
0
1
1
0
=FG=
(
1
0
1
0
1
i
2
1
in C4.3. First, 2
Matrix Diagonalization
356
-10
16
38
T8
T8
9 38
-14
-8
38
38
38
-10
-14
26
38
38
38
5 38
-8
4
24
38
38
38
r
17
I
38 1
1-FF+=
16 38
1
. Next, we compute a positive full J
rank factorization of this projection as described previously: I - FF+ -1
-1
f f 2
4
s7 -2
-1
57
a
I
0 -32
6 57 16
-8
76
76
76
37
-9
14
76
76
76
2
16-
57
57
f
-I
57
57
28 76
30 76
Now
2
16
-8
76
76
76
37 76
-9
14
-30
76
76
76
6 57
we
= FI FI . Then we find FI+ _
put
can
r
28 76
and S-1 =
S = -1
-1
2
f f -2
4
I
57
57
0
v 57
6
57
1
2
7
6
4
4
f
1
0
0
5
-2
9
9
i2
:.-_
9
4 9
0
0
SAT =
0
0
0
0
0 0
2
I
57
6 57
J
T is computed in a similar manner. We find T = 1
together;
S
-6 9
8
-3
9
9
2
9
and
6 9
as the reader can verify.
Actually, a little more is true. The matrix S above can always be chosen to be unitary, not just invertible. To gain this, we begin with an orthogonal full
rank factorization of A, A = FG, where F* = F+. That is, the columns of F form an orthonormal basis of the column space of A. Then SS* _
I
I(F+)*.(1-FF+)Wi l = r F+F WI (1 - FF+) Ir
0
0 W1(1 - FF+)WI
As before, select WI = FI+ and get SS* = /.
10.2 Diagonalization with Respect to Similarity
357
Exercise Set 42 1. Verify the formulas labeled (1) through (8) on page 354.
Further Reading [Enegren, 1995] Disa Enegren, On Simultaneous Diagonalization of Matrices, Masters Thesis, Baylor University, May, (1995).
[Wielenga, 1992] Douglas G. Wielenga, Taxonomy of Necessary and Sufficient Conditions for Simultaneously Diagonalizing a Pair of Rectangular Matrices, Masters Thesis, Baylor University, August, (1992). similar matrices, principal idempotents, minimal polynomial relative to a vector, primary decomposition theorem
10.2
Diagonalization with Respect to Similarity
In this section, we demand even more of a matrix. In the previous section, every matrix was diagonalizable in the weak sense of equivalence. That will not be the case in this section. Here we deal only with square matrices.
DEFINITION 10.2
Two matrices A, B E C"' are similar (in symbols A - B) if there exists an invertible matrix S E C""" with S -' A S= B. A matrix A is diagonalizable with respect to similarity if A is similar to a diagonal matrix. Let's try to get a feeling for what is going on here. The first thing we note is that this notion of diagonalizability works only for square matrices. Let's just
look at a simple 3-by-3 case. Say S-' AS = D or, what is the same, AS = Xi
SD, where D =
0 0
0
0
k2
0
0
k3
and S = [sIIs21s3. Then A[s11s21s3] =
Matrix Diagonalization
358
0
0
0
1\2
0
0
0
1\3
X, [si Is21s31
or [As, IAs2IAs31 = [Xisi 1X2s21X3ss1. This says
As, = #\,s,, As2 = X2s2 and As3 = X3s3. Thus, the diagonal elements of D must be the eigenvalues of A and the columns of S must be eigenvectors corresponding to those eigenvalues. Moreover, S has full rank, being invertible, so these eigenvectors are independent. But then they form a basis of C3. The next theorem should not now surprise you.
THEOREM 10.3 Let A E Cnx". Then A is diagonalizable with respect to similarity iff A has n linearly independent eigenvectors (i.e., C" has a basis consisting entirely of eigenvectors of A).
PROOF
The proof is left as an exercise. Just generalize the example
above.
0
Thus, if you can come up with n linearly independent eigenvectors of A E C""" you can easily construct S to bring A into diagonal form.
THEOREM 10.4
Let A E C""" and let Xi , X2, ... X, be distinct eigenvalues o f A. Suppose v,, v2, ... , vs. are eigenvectors corresponding to each of the Xs. Then {v,, V2.... , v,} is a linearly independent set. PROOF Suppose to the contrary that the set is dependent. None of these eigenvectors is zero. So let t be the largest index so that vi, V2, ... V, is an
independent set. It could be t = 1, but it cannot be that t = s. So I
_<
t < s. Now vi, ... v,, v,+, is a dependent set by definition of how we chose t so there exists scalars a, , a , ... , a, , a,+, , not all zero with a, v, + a2 V2 + . Multiply by A and get a, X, v, + a2X2v2 + + a,v, + a,+i v,+i = + a,+IXf+,vt+, = Multiply the original dependency by X,+, and get ai X,+i v, + ... + a,+, Xt+, v,+i = -6. Subtracting the two equations yields
oil (XI-X,+i)vi+...+az(X,-X,+,)v, ='.Butv,,... ,v,isanindependent set, so all the coefficients must be zero. But the Xs are distinct so the only way out is for all the as to be zero. But this contradicts the fact that not all the as are zero. This completes our indirect proof. 0 COROLLARY 10.1 If A E C" x" has n distinct eigenvalues then A is diagonalizable with respect to similarity.
/0.2 Diagonalization with Respect to Similarity
359
Matrices that are diagonalizable with respect to a similarity have a particularly nice form. The following is a more general version of the spectral theorem.
THEOREM 10.5 Suppose A E C01 "1 and h 1, A2, ... , hk are its distinct eigenvalues. Then A is diagonalizable with respect to a similarity iff there exist unique idempotents E1, E2, ... , Ek, such that 1.
E, E j= 0 whenever i 96 j k
2. EEi=1,. k
3. A= > X1 E1. 1=1
Moreover, the E are expressible as polynomials in A.
PROOF Suppose first such idempotents E; exist satisfying (1), (2), and (3). Then we know from earlier work that these idempotents effect a direct sum decomposition of C", namely, C" = Col(EI) ® Col(E2) ® . . . ® Col(Ek).
Let ri = dim(Col(Ei)). Choose a basis of each summand and union these to get a basis of the whole space. Then create an invertible matrix S by k
k
arranging the basis vectors as columns. Then AS = (>XiEi)S = > X1E;S = i=1
i=1
. Thus A is diagonalizable with respect to
S
Xk 1ri
X21rz
a similarity. Conversely, suppose AS = S L
Xkl,.
J
Define Ej = [®... ®/3i®... ®]S-1, where Bi is a collection of column vectors from a basis of Col(E,). Then the Ej satisfy (I), (2), and (3), as we ask the reader to verify. For uniqueness, suppose idempotents Fi satisfy (1),
(2), and (3). Then E;A = AE, = X E; and FjA = AF/ = XjFj. Thus E(AF) = kj E1 Fj and (Ei A) Fj = hi Ej Fj, which imply Ej Fj = 0 when i # j. Now E, = E; E Fj = E,Fi = (E Ej)F1 = F1. Finally, consider the
Matrix Diagonalization
360
polynomial p;(x) = n(x - Xi). Note pi(,\i) 54 0. Let F;
Pi(A Compute
i#'
that F; Ei = E; if i = j and 0 if i 0 j. Then F; = F; (E Ei) - E,.
0
The E, s above are called the principal idempotents of A. For example, they allow us to define functions of a matrix that is diagonalizable with respect to a similarity; namely, given f and A, define f (A) = E f (k; )E; . The minimum polynomial of a matrix has something to say about diagonal-
izability. Let's recall some details. Let A E C""". Then 1, A, A2.... , A"2 is a list of n2 + 1 matrices (vectors) in the n2 dimensional vector space C""", These then must be linearly dependent. Thus, there is a uniquely determined
integer s < n2 such that 1, A, A2, ... , A''-' are linearly independent but 1, A, A2.. , A'-', A' are linearly dependent. Therefore, A' is a linear combination of the "vectors" that precede A' in this list, say A' _ (3o1 + [3, A +... + (3o R,._, A". The minimum polynomial of A is p A W = x' - R, _, x , in C[x].
THEOREM 10.6 Let A E Cnx". Then I.
pA(x)00inC[x]yetpA(A)=0inC`2.
µA is the unique monic polynomial (i.e., leading coefficient I) of least degree that annihilates A.
3. If p(x) E C[x] and p(A) = 0, then p.A(x) divides p(x). PROOF I. The proof fellows from the remarks preceding the theorem.
2. The proof is pretty clear since if p(x) were another monic polynomial with deg(p(x)) = s and p(A) = 0, then p.A(x) - p(x) is a polynomial of degree less than s, which still annihilates A, a contradiction.
3. Suppose p(x) E C[xJ and p(A) = 0. Apply the division algorithm in C[x] and write p(x) = p. (x)q(x) + r(x) where r(x) = 0 or degr(x) < s = deg p A(x). Then r(A) = p(A) - ILA(A)q(A) = 0. This contradicts the definition of µA unless r(x) = 0. Thus, p(x) = p A(x)q(x), as was to he proved.
Let's look at some small examples.
0
10.2 Diagonulization with Respect to Similarity
361
Example 10.2 1.
Let A = [ a
d
J.
In theory, we should compute 11, A, A2, A3, A') and
f
a2+be b(a+d2
a2 + be ca + cd
=
look for a dependency. But A'-
ab + bd cb + d
so A2-(a+d)A= f be - ad
]
0
c(a + d) cb + d J L 0 be - ad so A2 - tr(A)A + det(A)l = 0. Thus, the minimum polynomial of A, RA(x)divides p(x) = x2-tr(A)x+det(A). Hence either pA(x) = x-k or JA(x) = x2 - tr(A)x + det(A). In the first instance, A must be a scalar multiple of the identity matrix. So, for 2-by-2 matrices, the situ-
ation is pretty well nailed down: A = L 0
mial I A(x) = x - 5, while A =
5 L
5 J
5
1
has minimal polyno-
has minimal polynomial
RA(x) = x2 - lOx + 25 = (x - 5)2. Meanwhile, A = minimum polynomial VA (x) = x2 - 5x - 2.
2. The 3-by-3 case is a bit more intense. Let A =
has
2
. 3
4
J
L
a d g
A2=[a2+bd+cg
f
b e h
c
f
.
Then
j
a(a2 + bd + cg)+
A3=
d(a2+bd+cg)+
soA3-tr(A)A2+(ea+ja+je-
g(a2 + bd + cg)+ db - cq - f h)A - det(A)I = 0. Thus, the minimum polynomial µA is of degree 3, 2, or I and divides (or equals)
x3-tr(A)x2-(ea+ja+je-db-cg- fh)x
- det(A) = 0.
Well, that is about as far as brute force can take us. Now you see why we have theory. Let's look at another approach to computing the minimum polynomial of a matrix.
Let A E C". For any v E C", we can construct the list of vectors v, Av, A2v, .... As before, there is a least nonnegative integer d with v, Av,... , Adv linearly dependent. Evidently, d < n and d = 0 iff v = -6 and d = 1 iff v is an eigenvector of A. Say Adv = Rov + 13, Av +
+ 13d_,
Ad-IV.
-
Then define the minimal polynomial of A relative to v as µA.,,(x) = xd Rd_,xd-I - (3,x - 130. Clearly, V E JVull(µA,(A)) so VA.-, is the Unique monic polynomial p of least degree so that v E Null(p(A)). Now,
-
362
Matrix Diagonalization
if PA.v,(x) and µA.V,(x) both divide a polynomial p(x), then v, and v2 belong to Null(p(A)). So if you are lucky enough to have a basis VI, v2, , v and
a polynomial p(x) divided by each µA,,,(x), then the basis (v,, ... Null(p(A)) soAIull(p(A)) = C" whence p(A) = 0.
, v,,)
THEOREM 10.7
be a basis of C". Then µA(x) Let A E C""" and let (b,, b2, ... , LCM(µA.b,(x), ... , That is, the minimum polynomial of A is the least common multiple of the minimum polynomials of A relative to a basis. Then p(A) PROOF Let p(x) = LCM(µA.b,(x), µA.b,(x), , 0 as we noted above. Thus AA(x) divides p(A). Conversely, for each j, 1 < j < n, apply the division algorithm and get µA(x) = yi(x)µA.b,(x) + ri(x),
= µA(A)bi =
where ri(x) = 0 or degri(x) < deg(µA.bjx)). Then
9i(A)(.A.b,(A)bi) + ri(A)bi = rj (A)bj. By minimality of the degree of AA.b, we must have rj(x) = 0. This says every µA.b, divides p.A(x). Thus the LCM of the µA.b,(x) divides AA(x) but that is p(x). Therefore, p(x) = AA(x) and our proof is done. 0 -I
For example, let A =
-l
2
-I
0
0
-1
1
A22e, _
-I
0
-1
0
-I
=
I
0
1
-1 I = Ae,, so (A3
,
-1
l
=
0 0
I 1
2
-l
I
2
0
0
-I
2
-1
-1 -1
the easiest one to think of, so Ae,
-1 -1
. The standard basis is probably
1
A3e1 =
I
-I 0
-1 -1
-1
0
-1
0
2 1
1
- A)e, = -0 . Thus l.A.ei (x) = x3 -X =
0
I
x(x - I)(x + 1). Ae2 = -1
-1
-1 0
0
-1
2 1
1
-1 0
-I
-1
-I
-1
0
0
-1 -1 0
=
-I -1
x(x - 1). Finally, Ae3 =
-I 0
-1 0
-I
2
0
1
I
1
0
=
-1 0
,
A2e2 =
-1
= Ae2, so p.A,,(x) = x2 - x = 2
0
1
0
I
I
2
=
I I
,
A2e3 =
10.2 Diagonalization with Respect to Similarity
-1
-1 0
0
-1
-I
2
2
1
1
1
1
-1
-I
=
-1
,
A3e3 =
-1
-1 0
0
-1
0
363 2 1
1
-I
-I
=
0
2 1
Ae3, SO µA,,,,(x) = x3 - x = x(x - 1)(x + 1) and pA(x)
1
LCM(x3 - x, x2 - x) = x(x - l)(x + I) = x3 - x. Did you notice we never actually computed any powers of A in this example? The next theorem makes an important connection with eigenvalues of a matrix and its minimum polynomial.
THEOREM 10.8 Let A E CI ". Then the eigenvalues of A are exactly the roots of the minimum polynomial VA(X)-
PROOF
Suppose k is a root of p. (x). Then ILA(x) = (x - k)q(x) where
degq < degp.A But 0 = (A - k/)q(A) and q(A) 0. Thus there exists a vector v, necessarily nonzero, such that q(A)v # 6. Let w = q(A)v. Then
6 = (A - k/)w so k is an eigenvalue of A.
Conversely, suppose k is an eigenvalue of A with eigenvector v. Then A2v =
AKv = KAv = k2v. Generally, Aiv = kiv. So if µA(x) = x` - a,_,x'-' -
...-aix+aothen
= µA(A)v= (A' -a,-,A'-'
>\'v - a,-i k`-'v - ... akv - aov = ()\' - 0t,-,1\'- I Buty# V soILA(X)=0.
ao)v = WA(k)V. 0
THEOREM 10.9
Let A E Cn"n with distinct eigenvalues X1, k2, ....,. Then the following are equivalent:
1. A is diagonalizable with respect to similarity. 2. There is a basis of C"consisting of eigenvectors of A. 3.
ILA(x)=(x-ki)(z-K2)...(x-K,).
4. GCD(µA, A) = I 5. µA and µA do not have a common root.
PROOF We already have (I) is equivalent to (2) by (10.3). (4) and (5) are equivalent by general polynomial theory. Suppose (2). Then there is a basis b,, b2, ... , b of cigenvectors of A. Then RA(x) = LCM(RA,b, (x), ... , But ILA.b,(x) = x - Ro since the b; are eigenvectors and (3o is some eigenvalue. Thus (3) follows. Conversely, if we have (3), consider b, , b2, , b,
Matrix Diagonalization
364
eigenvectors that belong to X1, X2, ... , X respectively. Then (b1, b2, , b,} is an independent set. If it spans C", we have a basis of eigenvectors and (2) follows. If not, extend this set to a basis, {b1, , b w,+,, , W11). If, by luck, all the ws are eigenvectors of A, (2) follows again. So suppose some w is not an eigenvector. Then µA,W, (x) has degree at least 2, a contradiction. Q Now we come to a general result about all square matrices over C.
THEOREM 10.10 (primary decomposition theorem) Let A E C"". Suppose X, , X2, ... , X, are the distinct eigenvalues of A. Let PA(x) = (x - X,)e,(x - X2 )e2 ... (x - X,)ey. Then
C" = JVull((A - X,1)ej) ®Null((A - X2 1)e-) ® . ®JVull((A - X, 1)e') Moreover, each Nu ll ((A _X, I)e,) is A-invariant. Also if XA (x) = (x - X 1)d' (x X2)d2 .
. (x-X,.)d,, then dim(JVull((A-X11p)) = d;. Moreover, an invertible A2
matrix S exists with S-1 AS =
where each A; is
d1-by-di.
PROOF Let qi(x) = (x - X1)ei ...(x - Xi_,)e,-'(x - X,+,)e,+ ...(x X,)e'. In other words, delete the ith factor in the minimum polynomial. Then deg(gi(x)) < deg(p.A(x)) and the polynomials q,(x), q2(x), ... , qs(x) are relatively prime (i.e., they have no common prime factors). Thus, there exist
polynomials p1(x), p2(x), ... , p,(x) with I = gl(x)p,(x) + g2(x)p2(x) + ...+q,(x)ps(x). But then 1 = Let hi(x) = gi(x)pi(x) and let Ei = hi(A) i = 1,... , s. Already we see qi(A)pi(A)+g2(A)p2(A)+...+q,(A)p,(A).
I = E,+E2+ +E,.Moreover,fori # j,EiEj =q,(A)pi(A)gi(A)pj(A)_ AA(A)times (something)=O. Next, Ei = Ei(1) = Ei(E,+ +E,) = E,Ei, so each E, is idempotent. Therefore, already we have C" = Col(E,)+Col (E2 + +Col(E,). Could any of these column spaces be zero? Say Col(E1) = ( ).
Then C" = Col(Ej) +
+ Col(E,). Now + Col(Ef_1) + Col(E1t1) + gi(A)EJ = g1(A)g1(A)pj(A) = 0 for j # i so, since every vector in C" is a sum of vectors from the column spaces Col (E, ), ... , Col (E,) without Col (Ei ), it follows qi(A) = 0. But this contradicts the fact that µA(x) is the minimum polynomial of A so Col(Ei) # (0) for i = 1, , s. Thus, we have a direct sum C" = Col(E1)® . . ®Col(E,). To finish we need to identify these column
spaces. First, (A - X11)', Ei = (A - X;1)e,g1(A)p,(A) = IiA(A)p1(A) = 0 so Col(E,) C Null(A - X/ 1)e,. Next, take v E Mull(A - Xi1)ei and write
10.2 Diagonalization with Respect to Similarity
v = E,v, + E2v2 +
+ Esv, where Ejvj E Col(Ej), j = 1, ...
365 , s.
Then -6 = (A - X,I)ejV = (A - X,1)e'E1vI + ... + (A - XiI)e'E,V. = E,(A -Xi1)e'v, +...+E,(A-X,1)e'v,. Note the Es are polynomials in A, so we get to commute them with (A X,I)ej, which is also a polynomial in A. Since the sum is direct we have each
piece must be zero - in other words, (A - Xi 1)e' E jvj = . Thus for j # i,
(A-X,I)ejEjvj =(A-XjI)eJEjvj =
'. Now GCD((x-X,)e',(x-Xj)ei)=
I so there exist polynomials a(x) and b(x) with l = a(x)(x - y, )e, + b(x)(x -
Xj)e'.Therefore, Ejvj = IEjvj =(a(A)(A-X,I)ej+b(A)(A-Xj)eJ)Ejvj = a(A)(A - X,1)ej Ejvj + b(A)(A - X j)e%Ejvj = 6 + -6 = -6. Thus v = Eivi +... + E,v, = Eivi, putting v in the column space of E,. Now, by taking a basis of each Null ((A - X, I )ej ), we construct an invertible matrix S so that
A2
S-BAS =
Let AA; (x) be the minimum polynomial of A1. We know that PA(x) = lps-' AS(x)
LCM(p.A,(x),... , ILAs(x)) and XA(x) = Xs-'AS(x) = XA,(x)XA2(x) XA, (x). Now (A - X,1)ej = 0 on Null [(A - X, I )e' ] so µA; (x) divides (x - X, )ej and so the PA, (x)'s are relatively prime. This means that ILA (x) =
(x - X,)e,. Thus, for each i, LA, (x) . Pri(x) = (x - X,)e'(x - X2)e2 IiA,(x) = (x - X,)ei. Now XA,(x) must equal (x - X,)'', where ri > e,. But
(x-X,)'1(x-X2)'2.. .(x-X,)' =XA(x)=(x-XI)" (x-X2)d2...(x-X,.)J. By the uniqueness of the prime decomposition of polynomials, ri = di for all i. That wraps up the theorem. 0 This theorem has some rather nice consequences. With notation as above, let D = X, E, + X2 E2 + ... + X, E, . We claim D is diagonalizable with respect to similarity. Now, since I = E, + E2 + ... + E any v E C" can be written
as v = v, + v2 + ... + vs, V; E Col(E1) = Fix(Ei). But Dv, = DE,v1 = (X, El + + X, Es) E; v; = X1 E?vi = Xi v; . Thus every nonzero vector in the column space of E; is an eigenvector and every vector is a sum of these. So the eigenvectors span and hence a basis of eigenvectors of D can be selected.
Moreover, A = AE, + A E2 +
+ A Es, so if we take N = A - D = we see
X,I)2E N3 = (A - X,1)3E, +
+ (A - X51)3Es, and so on. Eventually,
since (A - X, 1)1, E1 = 0, we get N' = 0, where r > all e1 s. That is, N is nilpotent. Note that both D and N are polynomials in A, so they commute (i.e., DN = N D). Thus we have the next theorem that holds for any matrix over C.
Matrix Diagonalization
366
COROLLARY 10.2 (Jordan decomposition) Let A E C""". Then A= D+ N, where D is diagonalizable and N is nilpotent. Then there exists S invertible, with S-1 AS = D + N, where b is a diagonal matrix and N is nilpotent.
PROOF
The details are left to the reader.
0
COROLLARY 10.3
Let A E Cn'". The D and the N of (10.2) commute and are unique. Indeed, each of them is a polynomial in A.
PROOF Now D =XI El +,\.)E,)+. .+,\,E, and all the Ei s are polynomials
in A and so N = A - D is a polynomial in A. Suppose A = D, + N,, where D, is diagonalizable with respect to similarity, and N, is nilpotent and DIN, = N, D, . We shall argue D = DI, N = N, .Now D+ N = A = D, + N,
so D - D, = N, - N. Now D and D, commute and so are simultaneously diagonalizable; hence D- D, is diagonalizable. Also N and /N, commute and so
N, - N is nilpotent. To see this, look at (N, - N)'
s ) N, -1 Ni by
the binomial expansion, which you remember works for commuting matrices.
By taking an s large enough - say, s = 2n or more - we annihilate the right-hand side. Now this means D - D, is a diagonalizable nilpotent matrix. Then S-'(D - D,)S is a diagonal nilpotent matrix. But this must he the zero matrix.
0
We could diagonalize all matrices in C""I if it were not for that nasty nilpotent matrix that gets in the way. We shall have more to say about this representation later when we discuss the Jordan canonical tbrm. It is of interest to know if two matrices can be brought to diagonal form by the same invertible matrix. The next theorem gives necessary and sufficient conditions.
THEOREM 10.11 Let A, B E C" "" be diagonalizable with respect to similarity. Then there exists S invertible that diagonalizes both A and B if and only if A B = BA.
PROOF
Suppose S exists with S-' AS = D, and S-' BS = D2. Then A =
SD, S-' and B = SD2S-'. Then AB = SD, S-'SD2S-' = SD, D2S-' = SD2D,S-I = SD2S-'SD,S-1 = BA using the fact that diagonal matrices commute. Conversely, suppose AB = BA. Since A is diagonalizable, its minimal polynomial is of the form p (x) = (x - X,)(x - A2) ... (x - X,), where the
10.2 Diagonalization with Respect to Similarity
367
X, s are the distinct eigenvalues of A. By the primary decomposition theorem,
C" = Null(A-X1I)® .®NUII(A-X,1). Now for any vi inArull(A-X11), ABv1 = BAv1 = B(X1v1) = X,Bv1. This says each NUIl(A - X11) is Binvariant. We can therefore find a basis ofNull(A - 111), say B,, consisting of eigenvectors of B. Then U 131 is a basis of C" consisting of eigenvectors of 1_i
both A and of B. Forming this basis into a matrix S creates an invertible matrix that diagonalizes both A and B. 0 It follows from the primary decomposition theorem that any square complex matrix is similar to a block diagonal matrix. Actually, we can do even better. For this, we first refine our notion of a nilpotent matrix, relativising this idea to a subspace.
DEFINITION 10.3 (Nilpotent on V) Let V be a subspace of C" and let A E C"". We say that A is nilpotent on V iff there exists a positive integer k such that Ak v = for all v in V. The least such k is called the index of A on V, Indv(A).
Of course, Akv = -6 for all v in V is equivalent to saying V c_ Null(A"). Thus, Indv(A) = q iff V C_ Null(A9) but V 9 Null(Ay-1). For example, consider C4 = V ® W = span lei, e2) El) span Je3, e4)
0 1
0
00
1
00
0
0
1
0
1
00
0
and let A =
Note that
Aei Ae7
Ae3 Ae4
and A2 =
0 0
0
0
0 0
0
0
0 0
= = = =
e1
e3
e4
0
0
1
0
0
1
s
oAisnil p otent ofindex 2onV =sp an le i, e 2 }
.
Now we come to the crucial theorem to take the next step. THEOREM 10.12
Suppose C' = V ® W, where V and W are A-invariant for A E C""" and suppose dim(V) = r. If A is nilpotent of index q on V, then there exists a basis
368
Matrix Diagonalization
of V, {vi,v2,
... ,v,} such that =
V
Av2
E
span{vi )
Av3
E
spun{vi, v2}
Av,_1
E
span{vi, V2,
AVr
E
Av1
... span{vi, V2, ...
, Vr_2}
, Vr_1).
PROOF
The case when A is the zero matrix is trivial and left to the reader. By definition of q, V Null(Ay-' ), so there exists v 0- 16 such that A"-'v ¢ V.
Let vi = Ay-lv -A-6 but Avi = -6. Now if span(vj) = V, we are done. If not, span(vi} V. Look at Col(A) fl V. If Col(A) fl V _c span{vi), choose any v2 in V\span{vi}. Then {v1,v2} is independent and Av, = 6, Av2 E Col(A)fl V e span{vi ). IfCol(A)fl V span{vi }, then we first notice that Col(Ay) fl V = (-'0 ). The proof goes like this. Suppose X E Col(Ay) fl V. Then X E V and x = Ayy for some y. But y = v + w for some v E V, W E W. Thus x = Ay(v)+ Aq(w) = Aq(w) E V fl W = (t). Therefore x = 6. Now we can consider a chain
Col(A) fl V D Col(A2) fl V D ... D Col(Ay) fl V = (d). Therefore, there must be a first positive integer j such that Col(AJ) fl v
span(vi) but Col(Aill) fl V e span(vi). Choose V2 E (Col(A') fl V) \span(vi ). Then in this case, we have achieved Avi = ' and Av2 E spun{vi ). Continue in this manner until a basis of V of the desired type is obtained. 0 This theorem has some rather nice consequences. COROLLARY 10.4 Suppose N is an n-by-n nilpotent matrix. There is an invertible matrix S such
that S-' NS is strictly upper triangular. That is, 0
a12
a13
a i,,
0
0
a23
a2
0 0
0
0
...
an-In
0
0
...
0
S-'NS =
We can now improve on the primary decompostion theorem. Look at the matrix A, - ki/d,. Clearly this matrix is nilpotent on Null(A - k, 1)r., so we can get a basis of Null(A - lei1)ei and make an invertible matrix Si with S,
/0.2 Diagonalization with Respect to Similarity
0 a12 10 0
a13 a23
...
aid,
...
a2d,
.But then Si-i (Ai - hi ld, )Si =
(Ai - ki /d, )Si = 0 0
Si 1AiSi - kild, ki
a12
0
ki
0
0 0
0
...
0 0
a12
a13
...
0
a23
...
0 0
0 0
0 0
aid,
a2d
_
so S, I A; S;
=
ad,-Id, 0
aid,
a13 a23
0 0
ad,-Id, 0
0 0
a2d, .
0 0
369
. .
...
By piecing together a basis for each
ad,-td, k;
primary component in this way, we get the following theorem.
THEOREM 10.13 (triangularization theorem) Let A E C"". Suppose X 1 , k2, ... , Xs are the distinct eigenvalues of A. Let lLA(x) _ (x - X 0" (x - k2)02 ... (x - XS)e . Also let XA(x) _ (x - kl )dl (x X2)d2 ... (x - k,)dr. Then there exists an invertible matrix S such that S-I AS is block diagonal with upper triangular blocks. Moreover, the ith block has the eigenvalue ki on its diagonal. In particular this says that A is similar to an upper triangular matrix whose diagonal elements consist of the eigenvalues of A repeated according to their algebraic multiplicity. It turns out, we can make an even stronger conclusion, and that is the subject of the next section.
Exercise Set 43 1. Argue that similarity is an equivalence relation on n-by-n matrices.
2. Prove if A is similar to 0, then A = 0. 3. Prove that if A is similar to P and P is idempotent, then A is idempotent. Does the same result hold if "idempotent" is replaced by "nilpotent"? 4. Suppose A has principal idempotents Ei. Argue that a matrix B commutes with A iff it commutes with every one of the Ei.
370
Matrix Diagonalization
5. Suppose A has principal idempotents E,. Argue that Col(E;) Eig(A, X,) and Null(E,) is the direct sum of the eigenspaces of A not associated with X j.
6. Suppose A
B and A is invertible. Argue that B must also be invertible
and A-' -B-'. 7. Suppose A
B. Argue that AT
BT.
B. Argue that Ak - Bk for all positive integers k. Also 8. Suppose A argue then that p(A) ^- p(B) for any polynomial p(x). 9. Suppose A 10. Suppose A
B. Argue that det(A) = det(B). B and X E C. Argue that (A - XI) - (B - It 1) and so
det(A - Al) = det(B - Al). 11. Find matrices A and B with tr(A) = tr(B), det(A) = det(B), and rank(A) = rank(B), but A is not similar to B. B. Suppose A is nilpotent of index q. Argue that B is 12. Suppose A nilpotent with index q. 13. Suppose A - B and A is idempotent. Argue that B is idempotent. 14. Suppose A is diagonalizable with respect to similarity and B that B is also diagonalizable with respect to a similarity.
A. Argue
15. Give an example of two 2-by-2 matrices that have identical eigenvalues but are not similar.
16. Argue that the intersection of any family of A-invariant subspaces of a matrix A is again A-invariant. Why is there a smallest A-invariant subspace containing any given set of vectors?
17. Let A E C'
and v # 0 in C. Let µA.v(x) = xJ - [3,t-ixd-i
Rix - Ro
(a) Prove that {v,. AV,
A`t-Iv} is a linearly independent set.
(b) Let Kd(A, v) = span{v, AV, ... , Ad-'v}. Argue that Kd(A, v) is the smallest A-invariant subspace of C' that contains v. Moreover,
dim(Kj(A, v)) = deg(p.A,,.(x)) (c) Prove that Kd(A, v) = {p(A)v I P E C[x]}.
(d) Let A =
0
-1
I
0 0
0
1
I
2
.
Compute K,I(A, v), where v = el.
10.3 Diugonalization with Respect to a Unitary
371
(e) Extend the independent set (v, AV, ... Ad-1v) to a basis of C",say {v, AV, ... , Ad-I v, wI , .. wd_ }. Form the invertible matrix S = wd_"]. Argue t hat S-'AS = [v I Av I ... Ad-Iv I W, I
I
r C
L0
?
?
I
0
0
0
1
0
0
1
0 0
0
0
...
[30
...
PI
...
02
where C = L0
...
Rd-I
J
(t) Suppose µA.V,(x) = (x - \)d. Argue that v, (A - XI)v,... JA kI)d-Iv is a basis for lCd(A, v). 18. Suppose A E C" "". Prove that the following statements are all equivalent:
(a) There is an invertible matrix S such that S-1 AS is upper triangular. (b) There is a basis {vi, ... , v"} of C" such that Avk E span{vi, ... , vk}forallk = 1,2,... n.
(c) There is a basis (vi, ... , v"} of C" such that span{vI, ... , vk} is A-invariant for all k = 1, 2, ... , n.
Further Reading [Abate, 1997] Marco Abate, When is a Linear Operator Diagonal izable?, The American Mathematical Monthly, Vol. 104, No. 9, November, (1997),
824-830.
[B&R, 2002] T. S. Blyth and E. F. Robertson, Further Linear Algebra, Springer, New York, (2002).
unitarily equivalent, Schur's lemma, Schur decomposition
10.3
Diagonalization with Respect to a Unitary
In this section, we demand even more. We want to diagonalize a matrix using
a unitary matrix, not just an invertible one. For one thing, that relieves us of having to invert a matrix.
Matrix Diagonalization
372
DEFINITION 10.4 Two matrices A, BE C""" are called unitarily similar (or unitarily equivalent) iff there exists a unitary matrix U, (U* = U-') such that U-I AU = B. A matrix A is diagonalizable with respect to a unitary iff A is unitarily similar to a diagonal matrix. We begin with a rather fundamental result about complex matrices that goes back to the Russian mathematician Issai Schur (10 January 1875-10 January 1941).
THEOREM 10.14 (Schur's lemma, Math. Annalen, 66,(1909),408-510) Let A E C""". Then there is a unitary matrix U such that U*AU is upper triangular. That is, A is unitarily equivalent to an upper triangular matrix T. Moreover, the eigenvalues of A (possibly repeated) comprise the diagonal elements of T.
PROOF
The proof is by induction on the size of the matrix n. The result is clearly true for n = 1. (Do you notice that all books seem to say that when doing induction?) Assume the theorem is true for matrices of size k-by-k. Our job is to show the theorem is true for matrices of size (k + I)-by-(k + 1). So suppose
A is a matrix of this size. Since we are working over the complex numbers, we know A has an eigenvalue, say X, and an eigenvector of unit length w, belonging to it. Extend w, to an orthonormal basis of Ck+i (using Gram-Schmidt, for example). Say (WI, w2, ... , wk+, } is an orthonormal basis of Ck+i . Form the matrix W with columns w, , w2, ... , wk+,.Then W is unitary and W* A W = wi W* [Awl, I Awe, I
-
... l Awk+,] =
wZ
-
[X,w, I Awe 1
...
I
wk+, A,
*
*
...
0
Awk+, ] =
0
C
,
L0
where C is k-by-k. Now the induction
J
hypothesis provides a k-by-k unitary matrix V, such that V1*CV, = T, is upper 0 ... 0 0 triangular. Let V = and note that V is 1
V,
0
10.3 Diagonalization with Respect to a Unitary
unitary. Let U = WV. Then V*W*AWV =
f
x1
*
*
...
373
0 0 0
VI*CVI
*
0
0
TI
which is an upper triangular matrix as we
L0 hoped.
0
DEFINITION 10.5 (Schur decomposition) The theorem above shows that every A E C"X" can be written as A = UT U* where T is upper triangular. This is called a Schur decomposition of A.
THEOREM 10.15 Let A E C""" Then the following are equivalent:
1. A is diagonalizable with respect to a unitary. 2. There is an orthonormal basis of C" consisting of eigenvectors of A.
3. A is normal.
PROOF Suppose (1). Then there exists a unitary matrix U with U-' AU = U*AU = D, adiagonal matrix. But then AA* = UDU*UD*U* = UDD*U* = UD*DU* = (UD*U*)(UDU*) = A*A so A has to be normal. Thus (1) implies (3).
Next suppose A is normal. Write A in a Schur decomposition A = UT U*, where T is upper triangular. Then A* = UT*U* and so TT* = U*AUU*A*U
= U*AA*U = U*A*AU = U*A*UU*AU = T*T. This says T is normal. But if T is normal and upper triangular it must be diagonal. To see this compute
t22
...
tin
ill
...
t2n
712
0
...
0
0 0
0
. .
. tnn
tin
tnn
Matrix Diugonalizution
. Now compare the diagonal entries:
ItiII' + Ith212 + It1312 +... + Ihn12 = (l, 1) entry =
It, 112
It22122 + It2312 + ... I ten 12 = (2, 2) entry = Itt212 + 12212 + It2212
Itnnl2 = (n, n) entry = ItinI2 +t2/, + ... + I tnnl2
W e see that I tj j I2 = 0 whenever i # j. Thus (3) implies (1). If (1) holds
there exists a unitary U such that U-' AU = D, a diagonal matrix. Then AU = UD and the usual argument gives the columns of U as eigenvectors of A. Thus the columns of U yield an orthonormal basis of C" consisting entirely of eigenvectors of A. Conversely, if such a basis exists, form a matrix U with these vectors as columns. Then U is unitary and diagonalizes A. 0 COROLLARY 10.5 Let A E C""n 1.
If A is Hermitian, A is diagonalizable with respect to a unitary.
2. If A is skew Hermitian (A* _ -A), then A is diagonalizable with respect to a unitary. 3. If A is unitary, then A is diagonalizable with respect to a unitary.
It is sometimes of interest to know when a family of matrices over C can be simultaneously triangularized by a single invertible matrix. We state the following theorem without proof. THEOREM 10.16 Suppose .T' is a family of n-by-n matrices over C that commute pairwise. Then there exists an invertible matrix S in C1 " such that S-' A S is upper triangular for every A in F.
10.3 Diagonalization with Respect to a Unitary
375
Exercise Set 44 i -i
f zi
2
2
2
a unitary matrix?
2. Argue that U is a unitary matrix iff it transforms an orthonormal basis to an orthonormal basis.
3. Find a unitary matrix U that diagonalizes A =
1
1
1
1
1
1
1
1
1
1
1
1
1
1
l
1
}
4. Argue that unitary similarity is an equivalence relation.
5. Prove that the unitary matrices form a subgroup of GL(n,C), the group of invertible n-by-n matrices. 6. Argue that the eigenvalues of a unitary matrix must be complex numbers of magnitude 1 - that is, numbers of the form eie.
7. Prove that a normal matrix is unitary iff all its eigenvalues are of magnitude 1. 8. Argue that the diagonal entries of an Hermitian matrix must be real. 9. Let A = A* and suppose the eigenvalues of A are lined up as X1 > X2 > > X,,, with corresponding orthonormal eigenvectors ui,u2, ... ,u,,. (Av V)
For v A 0. Define p(v) =
. Argue that X < p(v) < X1. Even
(v I v)
more, argue that maxp(v) = X, and minp(v) _ X . vOO
V96o
10. Prove the claims of Corollary 10.5.
11. Argue that the eigenvalues of a skew-Hermitian matrix must be pure imaginary.
12. Suppose A is 2-by-2 and U*AU = T = r
t2 J
be a Schur decom-
position. Must ti and t2 be eigenvalues of A?
13. Use Schur triangularization to prove that the determinant of a matrix is the product of its eigenvalues and the trace is the sum.
14. Use Schur triangularization to prove the Cayley-Hamilton theorem for complex matrices.
376
Matrix Diagonalization
Further Reading [B&H 1983] A. Bjorck and S. Hammarling, A Schur Method for the Square Root of a Matrix, Linear Algebra and Applications, 52/53, (1983), 127-140. [Schur, 1909] I. Schur, Uber die characteristischen Wurzeln einer linearen Substitution mit einer Anwendung auf die Theorie der Integralgleichungen, Math. Ann., 66, (1909), 488-5 10.
10.3.1 MATLAB Moment 10.3.1.1
Schur Triangularization
The Schur decomposition of a matrix A is produced in MATLAB as follows:
[Q, T] = schur(A) Here Q is unitary and T is upper triangular with A = QTQ*. For example,
>> B = [1 +i2+2i3+i;2+2i4+4i9i;3+3i6+6181]
B= 1.0000 + 1.0000i 2.0000 + 2.0000i 3.0000 + 3.0000i
2.0000 + 2.0000i 4.0000 + 4.0000i 6.0000 + 6.0000i
3.0000 + 1.0000i 0 + 9.0000i 0 + 8.0000i
> > [U,T] = schur(B)
U= 0.2447 - .01273i
0.7874 + 0.4435i
0.6197 + 0.15251 0.7125 + 0.09501
-0.2712 - 0.3003i 0.1188 - 0.07401
-0.0787 + 0.31791 -0.1573 + 0.6358i 0.1723 - 0.65881
2.7038 + 1.3437i 0.0000 - 0.00001 0
-3.0701 - 0.0604i -1.3210- 1.1551i
T= 6.3210 + 14.1551 i 0 0
1.6875 + 5.4961i
> > format rat
> > B = [1+i2+2i3+i;2+2i4+4i9i;3+316+6i8i]
10.4 The Singular Value Decomposition
B= 1+1i
377
2+2i 3+1i
2+2i 4+4i 0+9i 3+3i 6+6i 0+8i U= 185/756 - 1147/9010i
663/842 + 149/3361
1328/2143 + 329/21571
-739/2725 - 2071/6896i
280/393 + 130/1369i
113/951 - 677/91471
-227/2886 + 659/20731 -227/1443 + 1318/2073i 339/1967 - 421/639i
T= 6833/1081 + 4473/3161
2090/773 + 1079/803i
0 0
0 20866/12365 + 709/1291
-5477/1784 - 115/1903i -1428/1081 - 1311/11351
10.4
The Singular Value Decomposition
In the previous sections, we increased our demands on diagonalizing a matrix. In this section we will relax our demands and in some sense get a better result.
The theorem we are after applies to all matrices, square or not. So suppose A E Cr X". We have already seen how to use a full rank factorization of A to
get SAT Ir= 11 0 , where S is unitary and T is invertible. The natural question arises as to whether we can get T unitary as well. So let's try! Let's begin with an orthogonal full rank factorization A = FG
with F+ = F*. We will also need the factorizations I - FF+ = F1 F1 = Fi Fj F+
Fi D
®®
A(I-G+G)W21= G+D 11
and I-G+G=F2F2 =F2F2.Then
111JJJ
F+ ]. The matrix U*
is unitary. At the moment, W2 is
F
arbitrary. Next, consider V = [G+D : (I - G+G)W21. We would like to make V unitary and we can fiddle with D and W2 to make it happen.
Matrix Diugonalizution
378
But V*V =
(G D)* [(1
D*G+*G+D
- G+G)W2]* lJ L D*G+*(I - G+G)W2
G+D : (I - G+G)W2
I
W'(1 - G+G)G+D W, (I - G+G)W,
D*G+*G+D 0 ®
Wz (I - G+G)W2 I'
We have already seen that by choosing W2 = F2 we can achieve I, _r in the lower right. So the problem is, how can you get 1, in the upper left by
choice of D? By the way, in case you missed it, G+* _ (G+GG+)* _ G+*(G+G)* = G+*G+G so that D*G+*(I - G+G)W2 = 0. Anyway our problem is to get D*G+*G+D = I,. But G = F+A = F*A so D*G+*G+D = D*(GG*)-'D = D*(F*AA*F)-1D since G+ = G*(GG*)-' and so G+* (GG*)-1G. So to achieve the identity, we need F*AA*F = DD*, where D is invertible. Equivalently, we need AA* F = FDD*. Let's take stock so far. THEOREM 10.17 Let A E C111", A = FG be an orthogonal full rank factorization of A. If there exists D E C;,,, with GG* = DD*, then there exist unitary matrices S and T
with SAT =
D 10 0
One way to exhibit such a matrix D for a given A is to choose for the columns of F an orthonormal basis consisting of eigenvectors of AA* corresponding to nonzero eigenvalues. We know the eigenvalues of AA* are nonnegative. Then AA* F = FE where E is the r-by-r diagonal matrix of real positive eigenvalues
of AA*. Let D be the diagonal matrix of the positive square roots of these eigenvalues Then D* [ F* AA* F] D = 1. Thus we have the following theorem.
THEOREM 10.18 (singular value decomposition)
Let A E C" A = FG where the columns of F are an orthonormal basis of Col(A) = Col(AA*) consisting of eigenvectors of AA* corresponding to
nonzero eigenvalues. Suppose I - FF+ = F1 Fi = F1 Fi and I - G+G = F2 F. = F2 F2+. Then there exist unitary matrices U and V with U * A V = Dr
10
01, where Dr is an r-by-r diagonal matrix whose diagonal entries are
the positive square roots of the nonzero eigenvalues of AA*. The matrices Uand F* 1 r Vcan be constructed explicitly from U* = and V = I G+D : F2 l1
F+ J
10.4 The Singular Value Decomposition
379
It does not really matter if we use the eigenvalues of AA* or A*A, as the next theorem shows.
THEOREM 10.19 Let A E Cmx". The eigenvalues of A*A and AA* differ only by the geometric multiplicity of the zero eigenvalue, which is n - r for A* A and in - r for AA*,
where r = r(A*A) = r(AA*). Since A*A is self-adjoint, there is an orthonormal basis of eigen-
PROOF
vectors v1,v2,... ,v" corresponding to eigenvalues X1,12,... ,k (not necessarily distinct). Then (A*Av; I v1) = A; (v; I v1) =)\ibi1 for l < i, j <
n. So (Av; I Avj) = (A*Av; I vj) = A;Sii. Thus (Av; I Av;) = X;, for i = 1, 2, ... , n. Thus Av; = ' iff X, = 0. Also AA*(Avi) = A(A*Av;) = h (Av;) so for hi 54 0, Av; is an eigenvector of AA*. Thus, if 1t is an eigenvalue of A* A, then it is an eigenvalue of AA*. A symmetric argument gives the other implication and we are done. 0
Let's look at an example next. Example 10.3
Consider A =
1
0
0
1
1
1
1
I
-1
0
0
1
. Then AA* _
3
-1
-1
2
1
2
1
3
I
-I -1
.
With the help of Gram-Schmidt, we get F =
F*AA*F
0
0
= [0 l
r
O ]soD= L
01/5
1/5
0
'
,
2
and
1
sS f f
0
_L
1
with
-1
1
0
eigenvalues 5, 3, and 0 and corre sponding eigenvectors
2
. Then
_L
]andD*FF*AA*F1_1D= 1
J- L
0 I
The singular value decomposition really is quite remarkable. Not only is it interesting mathematically, but it has many important applications as well. It deserves to be more widely known. So let's take a closer look at it. Our theorem says that any matrix A can he written as A = U E V *, where U and V are unitary and E is a diagonal matrix with nonnegative diagonal entries. Such a decomposition of A is called a singular value decomposition (SVD) of A. Our proof of the existence rested heavily on the diagonalizability of
Matrix Diagonalization
380
AA* and the fact that the eigenvalues of AA* are nonnegative. In a sense, we
did not have much choice in the matter since, if A = UEV*, then AA* = UEV*V E*U* = UE2U* so AA*U = UE2, which implies that the columns of U are eigenvectors of AA* and the cigenvalues of AA* are the squares of the while A * A = VE*U*UEV* = diagonal entries of E. Note here U V E2V * says the columns of V, which is n-by-n, form an orthonormal basis for C". Now since permutation matrices are unitary, there is no loss of generality is assuming the diagonal entries of E can be arranged to he in decreasing order
and since A and E have the same rank we have a1 >_ a2 ... > a, > 0 = ar+I = ar+2 = ar+3 = .. = an Some more language, the columns of U are called left singular vectors and the columns of V are called right singular vectors of A.
So what is the geometry behind the SVD? We look at A E C"' III as taking vectors from C" and changing them into vectors in C". We have A = aI
a,
vi
ar
0
[uI I U2 I ...1
V2
0
I V2
...
0
L
Vnl = [a1UI
or A [vi
1 a2 U2
arur ... } so Av1 = (7101, ... , AVr = UiUr,
4.
So, when we express a vector v E C" in terms of AVr+1 = . .... , Av,,, _ we see A contracts some components the orthonormal basis {v1, V2, ... , and dilates others depending on the magnitudes of the singular values. Then the change in dimension causes A to discard components or append zeros. Note the vectors {vr+I , ... , v" } provide an orthonormal bases for the null space of A. So what a great pair of bases the SVD provides! Cm
Cn
A, A Au; = a;u1
A+E
AT, A=AT
Figure 10.1: SVD and the fundamental subspaces.
10.4 The Singular Value Decomposition
381
Let's try to gain some insight as to the choice of these bases. If all you want
is a diagonal representation of A, select an orthonormal basis of Null(A), {vr+], ... , vn } and extend it to an orthonormal basis of Arull(A)1 = Col(A*) v } becomes an orthonormal basis of C'1. so that {vi, v2, ... , Avi
Define Jul, ... , u,}
Av,
IIAvi II'
,
IlAvrll
}
'
Then extend with a basis of
Arull(A*) = Col(A)l. Then AV = A [v, I ... v, ... ] = [Avi I
I
I
... I Av,
IIAvl II I
l = [U2
I ur
I
I
]
. But the problem is
IIAv211
the us do not have to be orthonormal! However, this is where the eigenvectors of A*A come in to save the day. Take the eigenvalues of A*A written k, > X2 > X, > kr+i = ... = kn = 0. Take a corresponding orthonormal basis of eigenvectors v,, V2, ... , v" of A*A. Define u, _
Av,
, ...
, Ur
=77.
A
The
key point is that the us are orthonormal. To see this, compute (ui I uj) = 1 kT, < vi I vj >= b,, . Now just extend the us to an orthonormal basis of C'. Then A V = U E where the ki and zeros appear on the diagonal of E zeros appear everywhere else.
We note in passing that if A = A*, then A*A = A2 = AA*, so if k is
an eigenvalue of A, then \2 is an eigenvalue of A*A = AA*. Thus the left and right singular vectors for A are just the eigenvectors of A and the singular values of A are the magnitudes of its eigenvalues. Thus, for positive semidefinite matrices A, the SVD coincides with the spectral diagonalization. Thus for any matrix A, AA* has SVD the same as the spectral (unitary) diagonalization. There is a connection between the SVD and various norms of a matrix.
THEOREM 10.20
Let A E c"' with singular values a > Q2 > ... >_ Q > 0. Then IIA112 = Qt II Av 112 and 11 A 112 = (Qi +QZ+' '+Q2012 If A is invertible
where II A
vn > 0 and IIA -I I min
din,(M)=n-k+1
max
1/an. Moreover, Qk = IIAv112
vEM\{Tf} 11v112
There is yet more information packed into the SVD. Write A = U E V* =
382
Matrix Diagonalization
Q,
0 Vi*
[U, I U2]
a,.
,
[ V*
0
2
0-
(i) U, UJ = AA' = the projection onto Col(A). (ii) U2UZ = 1 - AA+ = the projection onto Col(A)1 = Jvull(A*). Then (iii) V, V1* = A+A = the pro-
jectiononto Aiull(A)1 =Col(A*). (iv) V2V2 = 1 - A+A = the projection onto Mull(A) -
Indeed, we can write A = U,
V,*. Setting U, =
0
Q,
F, G =
V,* , we get a full rank factorization of A. In Orr
L0
1/Q,
Ui . Even more, write A = [u,
fact, A+ = V,
I
. .
I url
I/a, V,.
= v, u, V +... +Qrur V* which is the outer prodVr uct form of matrix multiplication, yielding A as a sum of rank I matrices. The applications of the S VD seem to be endless, especially when the matrices involved are large. We certainly cannot mention them all here. Let's just mention a few.
One example is in computing the eigenvalues of A*A. This is important to statisticians in analyzing covariance matrices and doing something called principal component analysis. Trying to compute those eigenvalues directly can be a numerical catastrophe. If A has singular values in the range .001 to 100 of magnitude, then A * A has eigenvalues that range from .000001 to 10000. So computing directly with A can have a significant advantage. With large data matrices, measurement error can hide dependencies. An effective measure of rank can be made by counting the number of singular values larger than the size of the measurement error. The SVD can help solve the approximation problem. Consider Ax - b =
UEV*x - b = U(EU*x) - U(U*b) = U(Ey - c), where y = V*x and c = U*b.
10.4 The Singular Value Decomposition
383
Also IIAx - bite = IIU(Ey - c)II = IIEy - cll since U is unitary. This sec-
I cl/91 and minimization is easy. Indeed y = E+c +
cr/9r
so z = VE+U*b =
0
0 0
A+b gives the minimum length solution as we already knew. There is a much longer story to tell, but we have exhausted our allotted space for the SVD.
Exercise Set 45 1. Argue that the rank of a matrix is equal to the number of nonzero singular values.
2. Prove that matrices A and B are unitarily equivalent iff they have the same singular values.
3. Argue that U is unitary iff all its singular values are equal to one. 4. For matrices A and B, argue that A and Bare unitarily equivalent iff A*A and B*B are similar.
5. Suppose A = U E V* is a singular value decomposition of A. Argue that IIAIIF = IIEV*IIF = IIE*IIF = (19112 + 19212 +... + 19ni2)Z. 6. Compare the SVDs of A with the SVDs of A*. 7. Suppose A = UEV* is an SVD of A. Argue that IIA112 = 1911
8. Suppose A = QR is a QR factorization of A. Argue that A and R have the same singular values. 9. Argue that the magnitude of the determinant of a square matrix A is the product of its singular values. 10. Argue that if a matrix is normal, its singular values are just the magnitudes of its eigenvalues.
11. Argue that the singular values of a square matrix are invariant under unitary transformations. That is, the singular values of A, AU, and UA are the same if U is unitary.
Matrix Diagonalization
384
12. Suppose A = U E V * is a singular value decomposition of A. Prove that x = V E+U*b minimizes Ilb - Ax112 . 13. Suppose A = U E V * is a singular value decomposition of A. Argue that Q 11Vl12 <_ IlAvll2 <- o
IIVI12
14. Suppose A is an m-by-n matrix of rank r and 0 < k < r. Find a matrix of rank k that is closest to A in the Frobenius norm. (Hint: Suppose A = U E V * is a singular value decomposition of A. Take UV*, 0 vi where Ek =
)
0
vk
0
0
15. Suppose A is an m-by-n matrix. Argue that the 2-norm of A is its largest singular value. 16. Suppose A is a matrix. Argue that the condition number of A relative to the 2-norm is the ratio of its largest singular value to its smallest singular value.
17. Argue that BB* = CC* iff there exists U with UU* = I such that C = BU. 18. Suppose A = U
®' ®J V* is the
of A. Argue that B = V[
D'
singular value decomposition
1
]U* is a 1-inverse of A and all
G J
I-inverses look like this where E, F, and G are arbitrary.
Further Reading [B&D, 1993] Z. Bai and J. Demmel, Computing the Generalized Singular
Value Decomposition, SIAM J. Sce. Comput., Vol. 14, (1993), 14641486.
[B&K&S, 1989] S. J. Blank, Nishan Krikorian, and David Spring, A Geometrically Inspired Proof of the Singular Value Decomposition, The American Mathematical Monthly, Vol. 96, March, (1989), 238-239.
/0.4 The Singular Value Decomposition
385
[C&J&L&R, 20051 John Clifford, David James, Michael Lachance, and Joan Remski, A Constructive Approach to Singular Value Decomposition and Symmetric Schur Factorization, The American Mathematical Monthly, Vol. 112, April, (2005), 358-363.
[M&R, 1998] Colm Mulcahy and John Rossi, A Fresh Approach to the Singular Value Decomposition, The College Mathematics Journal, Vol. 29, No. 3, (1998), 199-206. [Good, 1969] I. J. Good, Some Applications of the Singular Decomposition of a Matrix, Technometrics, Vol. 11, No. 4, Nov., (1969), 823-831. [Long, 1983] Cliff Long, Visualization of Matrix Singular Value Decomposition, Mathematics Magazine, Vol. 56, No. 3, May, (1983), 161-167.
[M&M, 1983] Cleve Moler and Donald Morrison, Singular Value Analysis of Cryptograms, The American Mathematical Monthly, February, (1983), 78-86. [Strang, 1980] G. Strang, Linear Algebra and Its Applications, Academic Press, New York, (1980).
[T&B, 1997] L. N. Trefethen and D. Bau III, Numerical Linear Algebra, Society of Industrial and Applied Mathematicians, Philadelphia, (1997). [G&L, 1983] G. H. Golub and C. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, (1983).
[H&O, 1996] Roger A. Horn and Ingram Olkin, When Does A*A=B*B
and Why Does One Want to Know?, The American Mathematical Monthly, Vol. 103, No 6., June-July, (1996), 470-481.
[Kalman, 19961 Dan Kalman, A Singularly Valuable Decomposition: The SVD of a Matrix, The College Mathematics Journal, Vol. 27, No. 1, January, (1996), 2-23. [Koranyi, 20011 S. Koranyi, Around the Finite-Dimensional Spectral Theorem, The American Mathematical Monthly, Vol. 108, (2001) 120-125.
10.4.1 10.4.1.1
MATLAB Moment The Singular Value Decomposition
Of course, MATLAB has a built-in command to produce the singular value decomposition of a matrix. Indeed, many computations in MATLAB are based
Matrix Diagonalization
386
on the SVD of a matrix. The command is
IU,S,V)=svd(A). This returns a unitary matrix U, a unitary matrix V, and a diagonal matrix S with the singular values of A down its diagonal such that
A = USV' up to roundoff. There is an economy size decomposition. The command is
[U, S, V] = svd(A, 0). If all you want are the singular values of A, just type
svd(A). Let's look at some examples.
> > A = vander(1:5)
A= I
I
I
I
16
8
4
2
81
27
9
3
I
256 625
64
16
4
1
125
25
5
1
1
I
> > [U,S,V] = svd(A)
U= -0.0018 -0.0251 -0.1224 -0.3796 -0.9167
-0.0739 -0.3193 -0.6258 -0.6165
0.6212 0.6273 0.1129
-0.7467
-0.2258
0.3705 0.3594
-0.6720
-0.4320
-0.4046
0.3542
0.3477
0.1455
0.1109
-0.0732
695.8418 0
0 18.2499
0
0 0 0
0 0 0
2.0018 0
-0.9778 -0.2046 -0.0434 -0.0094 -0.0021
0.1996
0.0553
0.0294
-0.8500 -0.4468 -0.1818 -0.0706
-0.3898
-0.2675
0.1101
0.4348 0.6063 0.5369
0.6292 0.0197
-0.4622
-0.7289
-0.4187
0.6055
S= 0
0
0 0 0 0.4261 0
0 0
0 0 0.0266
V= -0.0091
0.7739
10.4 The Singular Value Decomposition
387
>> format rat > > [U,S,V] = svd(A)
U= -18/10123 -476/6437 579/932 -457/612 -298/11865 -692/2167 685/1092 379/1023 -535/4372 -2249/3594 297/2630 1044/2905 -976/2571 -3141/5095 -499/1155 -547/1352 -39678/43285 491/1412 657/4517 207/1867
-240/1063 637/1052 -465/692 559/1578 -100/1367
S= 13221/19
0 0 0 0
0 33306/1825
0 0
0 0 0
1097/548
0 0
0 0 0 268/629 0
0 0
0 0 23/865
V= 395/1979 157/2838 367/12477 -218/24085 82/745 -593/2898 -2148/2527 -313/803 -811/3032 -403/872 -439/10117 -915/2048 1091/2509 246/391 -79/8430 644/32701 2091/2702 -313/1722 596/983 -87/41879 -286/4053 792/1475 -277/380 -675/1612 - 2029/2075
> > svd(A)
ans = 13221/19 33306/1825 1097/548 268/629 23/865
Now try the complex matrix B = [ I+i 2+2i 3+i;2+2i 4+4i 9i;3+3i 6+6i 8i J. The SVD is very important for numerical linear algebra. Unitary matrices preserve lengths, so they tend not to magnify errors. The MATLAB functions of rank, null, and orth are based on the SVD.
Chapter 11 Jordan Canonical Form
11.1
Jordan Form and Generalized Eigenvectors
We know that there are matrices that cannot be diagonalized. However, there
is a way to almost diagonalize all complex matrices. We now develop this somewhat long story.
11.1.1
Jordan Blocks
There are matrices that can be considered the basic building blocks of all square complex matrices. They are like the prime numbers in N. Any n E N is uniquely a product of primes except for the order of the prime factors that can be permuted. An analogous fact holds for complex matrices. The basic building blocks are the Jordan blocks. DEFINITION 11.1
(k-Jordan block)
For X E C, k E N, def ine Jk(k) _
k
1
0
0
k
1
0 0
O
O
k
1
0
0
0
0
. .. . .. . ..
0 0 0
0
. .
0
k
.
0 0
E
(Ckxk
This is a k-Jordan block of order k. 1
r
For example, J1(k) = lk], JZ(k) = r 0
and so on. r 5
More concretely, J2(5) = r 0 L
1
5
.13(4) =
k
J,
J3(k) =
4
1
0
0 0
4 0
4
k
1
0
k
0
0
0 I k
,
1
389
390
Jordan Canonical Form
Now we collect some basic facts about Jordan blocks. First we illustrate some important results. X
Note that
0
1
0 0
X
1
0
X
X
1
0 0
=
0 0
I
=X
0 0
, so X is an eigen-
xi
value ofJ3(X)with eigenvectorthestandard basis vector e1.Moreover,
xz
E
x3
x,
Eig(J3(X); X) iff
e Null(J3(X) - X/) iff
X2
x3
0 0 0
I
0 0
0
xi
1
xz
0
x3
=
0 0 iff xz = 0 and x3 = 0. Thus Eig(J3(X); X) = span(ei) = Cei, so 0 the geometric multiplicity of X is 1. Moreover, the characteristic polynomial of X 0
fx 0 0
J3(X), Xj,tXt(x) = det(x1- J3(X)) = det
x-X det
0 0
-1 x-X 0
1
0x0 -
0X
O O x
0 0 X
0
-I x-X
= (x-X)3. Also, J3(X)-XI =
0 0 0
=
I
1
0
0 0
0
1
which is nilpotent of index 3, so the minimal polynomial µz,(x)(x) = (x - X)3 as well. In particular, the spectrum of J3(X) is {X). We extend and summarize our findings with a general theorem. THEOREM 11.1 Consider a general Jordan block Jk(X) : 1.
Jk(X) is an upper triangular matrix.
2. det(Jk(X)) =X4 and Tr(Jk(X)) = kX. 3.
Jk(X) is invertible ifX # 0.
4. The spectrum X (JL(X)) = {X); that is, X is the only eigenvalue of Jk(X) and the standard basis vector ei spans the eigenspace Eig(Jk(X); X).
5. dim (Eig(JJ(X); X)) = 1; that is, the geometric multiplicity of X is 1. 6. The characteristic polynomial of Jk(X) is X.t,(a)(x) _ (x - X)k. 7. The minimal polynomial of Jk(X) is µ j, (,\)(x) = (x - X)k.
11.1 Jordan Form and Generalized Eigenvectors 8.
391
Jk(k) = Jk(0) + k/k and Jk(0) is nilpotent of index k.
9. rank( Jk(0)) = k - I. \n
(11xn-1
0 0
\n
0
0
1
10. Jk(h)n =
0
forn E N. 0
11. If k # 0, Jk(X)-I = I I _I 0
i,
-l
0
0
a
0
0
0
.
kn
...
(- 1)I+k
(-1)i+i -L
I
k
1
-1\7
0
...
Jk(O)3
Jk(0) - 1 Jk(0)2 +
P(,\)
' I,(k LL ... 2!
0
p(X)
0
0
p(X)
0
0
0
'
(r-1)!
...
12. Suppose p(x) E C [x] ; then p(Jk(X)) zz:
13. Jk(0)Jk(0)r
14. Jk(O)T Jk(0)
to I
.
...
p(X)
0
0
0
0
0
]k-I
15. Jk (O)+ = Jk (0)T . 16.
Jk(0)ei+1 = ei f o r i = 1, 2, ... , k-1.
17. If k # 0, rank(Jk(k))n' = rank(Jk(X))n,+I -- k for m = 1,2,3, ... .
18. rank(Jk(0)"') - rank(Jk(0)ii+1) = 0 if nt > k. 19. rank(Jk(0)"') - rank(Jk(0)'n+l) = 1 for m -- 1, 2, 3, ... , k-l.
PROOF
The proofs are left as exercises.
392
Jordan Canonical Form
Exercise Set 46
1.
Consider J = J6(4) =
0 0
4
1
0
0
4
1
0 0
0 0
0 0
4
0
4
1
0
0
0
0
0
4
0
0
0
0
10
0
.
What are the deter-
minant and trace of J? Is J invertible? If so, what is J-1. What is the spectrum of J and what is the eigenspace? Compute the minimal and characteristic polynomials. What is J7? Suppose p(x) = x2 + 3x - 8. What is p(J)? 2. Prove all the parts of Theorem 11.1.
3. Show that Jk(h) _)\l + E e;e+i for n > 2. 4. Argue that Jk(X) is not diagonalizable for n > 1.
MATLAB Moment
11.1.1.1 11.1.1.1.1
Jordan Blocks
It is easy to write an M-file to create a matrix that
is a Jordan block. The code is 2
function J = jordan(lambda, n) ifn == 0, J = [], else
3
J = nilp(n) + lambda * eye(n); end
I
Experiment making some Jordan blocks. Compute their ranks and characteristic polynomials with MATLAB. 11.1.2
Jordan Segments
Now we need to complicate our structure by allowing matrices that are block diagonal with Jordan blocks of varying sizes down the main diagonal. For the time being, we hold X fixed.
DEFINITION 11.2 (X-Jordan segment) A X-Jordan segment of length k is a block diagonal matrix consisting of k X-Jordan blocks of various sizes. In symbols, we write, J( k; pi, p2, ... , pk) Block Diagonal[Jv,(X), JP2(k), ... , J,,,(X)]. Note J(X; Pi. P2. . Pk) E k
C' ' where t = Epi. Let's agree to write the pis in descending order i=1
The sequence Segre(X) = (pi, P2,
...
, Pk) is called the Segre sequence of
11.1 Jordan Form and Generalized Eigenvectors
393
J(X; pi, p2, , pk) Clearly, given X and the Segre sequence of X, the XJordan Segment is completely determined. 5
1
0 5
0 0 0 0 0 1
00500 0005
Let's look at some examples for clarification. J(5;3, 2) =
E
1
0 0 0 0 5 0 1 0 0 0 0 0
0 0 0 0 7 0 0 0 7
Clx5,J(7;2,1,1,1)=
1
0 0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 7 0 0 ,J(0;3,2,2)= 0 0 0 0 0 0 00070 0000000 00007 000000 1
1
0 0 0 0 0 0 0
Note rank(J(0; 3, 2, 2)) = 4 and nullity = 3. To understand the basics about Jordan segments, we need some generalities about block diagonal matrices.
THEOREM 11.2 11
Let A = L
B
®J, where B and C are square matrices.
1. A2 =
®
positive integer.
g];moregenerally.Ak= [Bk 0 fo, any ®J
2. If p(x) E C[x] and p(A) _ 0, then p(B) =Oand p(C) _ 0. Conversely, if p(B) = ® and p(C) = 0, then p(A) = 0.
3. The minimal polynomial pA(x) = LCM(pB(x), pc(x)); moreover, if (x - X) occurs kB times in µB(x) and kc times in p.c(x), then (x- X) occurs max{ke, kc} times in µA(x).
4. The characteristic polynomial XA(x) = XB(x)Xc(x); so if (x- X) occurs kB times in XB(x) and kc times in Xc(x), then (x- X) occurs kB + kc times in XA(x).
5. X(A) = X(B) U X(C); that is, X is an eigenvalue of A if X is an eigenvalue
of either B or C. 6. The geometric multiplicity of X in X(A) equals the geometric multiplicity of X in B plus the geometric multiplicity of X in C (of course, one of these could be zero).
394
Jordan Canonical Form
PROOF
Once again, the details are left to the reader.
0
These properties extend to matrices with a finite number of square blocks down the diagonal. COROLLARY 11.1
Suppose A = Block Diagonal[A I, A2, ... , Ak]. Then 1. µA(x) = LCM[WA,(x), PAZ(x), ... , µA,(x)} k
2. XA(x) = fXA,(x) j=1 k
3. A(A) = U A(Aj). J=1
Now, we can apply these ideas to A-Jordan segments. COROLLARY 11.2
Consider a X-Jordan segment of length k, J = J(lt; PI, P2, ... 1. X is the only eigenvalue of J(X; pl, P2,
, PA)-
, Pk)-
2. The geometric multiplicity of A is k with eigenvectors e1, eP,+1, ... , eP,+P:+ P,-,+I3.
p.j(x) = (x - X)IIIP,.P2.....P41
4.
XJ (x) = (x- X)', where t = E Pj
k
j=1 k
-k.
5. J=1
6. index(J(0; P1, P2, P3,
PROOF
...
, PA)) = max{PI, P2, P3,
...
,
Pk}.
Once again, the proofs are left as exercises. 5
0 0
To illustrate, consider J(5; 3, 2, 2) =
1
0
0
5
1
0
5
5
1
0
5
.
5
1
0
5
Clearly,
the only eigenvalue is 5; its geometric multiplicity is 3 with eigenvectors el,e4,e6;
the minimal polynomial is µ(x) _ (x - 5)3, and the characteristic polynomial
395
11.1 Jordan Form and Generalized Eigenvectors 0
1
0
0 0
0 0
0
is X(x) = (x - 5)7. Consider J(0; 3, 2, 2) =
1
0 0
1
0
0 0
1
0
Zero is the only eigenvalue with geometric multiplicity 3, egenvectors e1 ,e4,e6,
minimal polynomial µ(x) = x3, and characteristic polynomial X(x) = x7. The rank is 4 and index is 3.
Exercise Set 47 1. Consider J = J(4; 3, 3, 2, 1, 1). Write out J explicitly. Compute the minimal and characteristic polynomials of J. What is the geometric multiplicity of 4 and what is a basis for the eigenspace? What is the trace and
determinant of J? Is J invertible? If so, find J-1. 2. Prove all the parts of Theorem 11.2. 3. Prove all the parts of Corollary 1 1.1 and Corollary 11.2.
11.1.2.1 MATLAB Moment 11.1.2.1.1
Jordan Segments We have to be a little bit more clever to write
a code to generate Jordan segments, but the following should work: I
2 3
4
function) = jordseg(lambda, p) k = length(p) J = jordan(lambda, p(k))
fori=k-l:-l:l
5
J = blkdiag(jordan(lambda, p(i)), J)
6
end
We need to be sure to enter p as a vector.For example,
> > jordseg(3, [2, 2, 4]) ans = 0 0
0
0
0
0 0 0
3
1
0 0 0
3
0 0
3
1
0
3
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0 0
0
0 0 0 0 0
3
1
0
3
1
0 0
0 0
3
1
0
3
Jordan Canonical Form
396
11.1.3 Jordan Matrices Now we take the next and final step in manufacturing complex matrices up to similarity.
DEFINITION 11.3
(Jordan matrix)
A J o r d a n m a t r i x is a block diagonal matrix whose blocks are Jordan segments. The notation will be J = J((,\ , ; p,, , P, 2, ... , PIk(l)), (1\2; P21, P22, , , p,k(s))) = BlockDiagonal[J(t i; Pi i, P12, ... , P2k(2)), ... , (k,; P,i, Ps2, P(k(l)), ... , J(X,.; PS(, Ps2, ... , p,k(,))]. The data (hi, Segre(X1)) for i = 1,2,
... ,s is called the Segre characteristic of J and clearly completely determines the structure of J. For example, J = J((3;2, 1, 1), (-2; 3), (4;3, 2), (i; 1, 1)) _ 3
1
0
3 3 3
-2 0 0
1
0
-2
I
0
-2 4
1
0
0 0
4 0
4
1
4
1
0
4 i i
The Segre characteristic of J is {(3; 2, 1, 1), (-2; 3), (4; 3, 2), (i ; 1, 1)}. Now, using the general theory of block diagonal matrices developed above, we can deduce the following.
THEOREM 11.3 Suppose J = J((A(; Pi 1, P12, ... , Pike)), (k2; P21, P22, ... , P242)), .... (ks; P, i P,2, .... P,k(s)))Then 1.
the eigenvalues of J are,\, , k2, ... , Ik,
2. the geometric multiplicity of k; is k(i)
IL l Jordan Form and Generalized Eigenvectors
397
3. the minimum polynomial of J is µ(x) = rI(x - Ki)"'ax(N,',Ni2. i=I
4. the characteristic polynomial of J is X(x) = fl(x -
where ti =
i=1 k(i)
i pii j=1
PROOF
The proofs are left as exercises.
0
Now the serious question is whether any given matrix A is similar to a Jordan matrix and is this matrix unique in some sense. The remarkable fact is that this is so. Proving that is a nontrivial task. We work on that next.
Exercise Set 48 1. Consider J = J((4; 5), (11; 2, 2, 1), (2+2i; 3)). What are the eigenvalues of J and what are their geometric multiplicities? Compute the minimal and characteristic polynomials of J. 2. Prove all the parts of Theorem 11.3.
11.1.3.1 MATLAB Moment
Jordan Matrices Programming Jordan matrices is a bit of a challenge. We are grateful to our colleague Frank Mathis for the following code: 11.1.3.1.1
2
functionJmat = jordform(varargin) n = nargin
I
3
lp = varargin(n)
4
lp = lp{:}
5 6 7
k = length(lp) Jmat = jordseg(lp(1),lp(2 : k))
8
lp = narargin(i)
9 II
lp = lp{:) k = length(lp) Jmat = blkdiag(jordseg(lp(l), lp(2 : k)), Jmat
12
end
10
fori=n-1:-I:1
Jordan Canonical Form
398
Try
> >ford f orm([8, [4, 2, 211, [9, [3, 1 ]1). Note the format of the entry.
11.1.4 Jordan's Theorem There are many proofs of Jordan's theorem in the literature. We will try to present an argument that is as elementary and constructive as possible. Induction is always a key part of the argument. Let's take an arbitrary A in C11 ". We shall argue that we can get A into the form of a Jordan matrix by using similarities. That is, we shall find an invertible matrix S so that S-1 AS = J is a Jordan matrix. This matrix will he unique up to the order in which the blocks are arranged on the diagonal of J. This will he our canonical form for the equivalence relation
of similarity. We can then learn the properties of A by looking at its Jordan canonical form (JCF). The proof we present is broken down into three steps. The first big step is to get the matrix into triangular form with eigenvalues in a prescribed order down the main diagonal.This is established by Schur's theorem, which we have already proved. As a reminder, Schur's theorem states: Let A E C0" with eigenvalues A, , .2, ... , K" (not necessarily distinct) be written in any prescribed order. Then
there exists a unitary matrix U E C"with U-1 AU = T, where T is upper triangular and t,, = K; for i = 1, 2, ... , n. Hence, we may assume U-1 AU is upper triangular with equal diagonal entries occurring consecutively down the diagonal. Suppose now, k, , K2, ... ,X,,, is a list of the distinct eigenvalues of A. For the second stage, we show that transvections can be used to "zero out" portions of the upper triangular matrix T when a change in eigenvalue on the diagonal occurs. This can be done without disturbing the diagonal elements or the upper triangular nature of T. where a Suppose r < s and consider T,.,(a)-1 is any complex scalar. Notice that this similarity transformation changes entries of T only in the rth row to the right of the sth column and in the sth column
above the rth row and replaces tr,. by tr, + a(trr - t ). By way of illustration, consider
I -a 0
t t12 t13
Ia0
0
0 t72 t23
010
0 0 t33
001
0
1
00
1
=
t22
t23
0
t33
By choosing ar, _
trr - t,V.
t t11a + t12 t13 0 0
1
00
0 I
t22
1`23
0
t33
tit ti 2+a(tt1-t22)t13-at2.1
hl ti 1a+t12-t22t1.3-(1123 0 0
I -a 0 0
=
0 0
t22
t23
0
t33
, we can zero out the (r, s) entry as long as tr, 96
t,,. Thus, working with the sequence of transvections corresponding to positions
ll.! Jordan Form and Generalized Eigenvectors
399
(n-1,n),(n-2,it - 1),(n-2,n),(n-3,it -2),(n-3,n-2),(n -3, n), (n - 4, it - 3), (n - 4, it - 2), ...etc., we form the invertible matrix and note that Q-I AQ =
Q= T2
a block diagonal matrix where T, = X, I + M;, where M; is strictly upper triangular (i.e., upper triangular with zeros on the main diagonal). Before going on to the final stage, let us illustrate the argument above to be sure it is clear. Consider A =
2
2
3
4
0
2 0 0
5
6
3
9
0
3
0 0
Being already upper triangular, we do not
need Schur. We see aI I = a22 and a33 = aqq so we begin by targeting a23: T23(-5)AT23(5)
=
2
2
13
4
0
2
0
9 -39
0
0 0
3
9
0
3
0
on with positions (3 , 4) ,
1
,
The reader should n ow conti nue
3 and 1 , 4 to obtain
2
2
0
0
0
2
0
0
3
9
0
0 0
0
3
Now we go on to the final stage to produce JCF. In view of the previous stage, it suffices to work with matrices of the form XI+ 0
x
*
0
0
x
L0
0
0
M where M is strictly upper triangular. Say T =
,,.
* *
...
We
X]
induct on n, the size of the matrix. For n = 1, T = [k] = JI [k], which is already k b1 in JCF. That seemed a little too easy, so assume t = 2. Then T
If b = 0, again we have JCF, so assume b # 0. Then, using a dilation, we can make b to be 1; DI (b-' )T DI (b) =
b-I
0
kb 0
b
_
>a
=
b-' \b
601
Ol
] k
1
k]
0
L
0
1
1
, which is JCR Now k = L 0 0 for the induction, suppose n > 2. Consider the leading principal submatrix of T, which is size (it - 1)-by-(n - 1). By the induction hypothesis, this matrix
0
11 L
can be brought to JCF by a similarity Q. Then, L L
?]
® Q-I
[
I
1
T
0Q
®] 1
_
Jordan Canonical Form
400
F and F is a block diagonal matrix of X-Jordan
T, =
XI L
0
...
0
0
blocks. The problem is that there may be nonzero entries in the last column. But any entry in that last column opposite a row of F that contains a one can be zeroed out. We illustrate.
Say T, =
It
l
0
0
0
0
a
0
X
0
0
0
0
b
0
0
X
1
0
0
0
0 0
0 0
0 0
it
1
0
0
It
1
d e
0
0
0
0
0 0
0
0
0
k
.
Now, for example, d lies in the
0 0] f
(4,7) position across from a one in the fourth row of the leading principal submatrix of T, of size 6-by-6. We can use a transvection similarity to zero it out. Indeed,
57(d)T1T57(-d) =
it
1
0
0
0
0
a
0
X
0
0
0
0
b
0 0
0 0
1t
1
0
C
it
1
0
0 0 0
0 0
0 0
0
0
it 0
0
0
0
0 0 it
0 0
0
1t
0
0
0
0
0
0
0 0
0 0
0 0
it
0 0
1
0
0 0 0
0 1
k 0
1
A
0
0
0
a
0
b
0 0
-d+d=0
d
I
e+Ad
]t 0
it
57(-d) =
f
C
-itd+e+itd=e
I
0 0
0
it
f
0
it
In like manner, we eliminate all nonzero elements that live in a row with a superdiagonal one in it. Thus, we have achieved something tantalizingly close to JCF:
T2=
It O
1
0
0
0
0
0
k
0
0
0
0
b
O
O
A
1
0
0
0
0
0 0
0
it
1
0
0
0
it
1
0 0
0
0
0
0
it
f
0
0
0
0
0
)\
0 0 0
Now comes the tricky part. First, if b and f are zero, we are done. We have
achieved JCR Suppose then that b = 0 but f # 0. Then f can be made one
11. 1 Jordan Form and Generalized Eigenvectors
401
using a dilation similarity: X 0 0
D7(f)T2D7(f -1) =
0 0
0 0
1
X 0
0 0 0
0 0
0 0
k
1
0
0
0 0
k 0
1
0
0
0 0
k
1
0 0
0 0
0 0
0
0 0
0 0
k 0
k
0
0
and we have JCF.
1
Now suppose f = 0 but b
0. Then b can be made one by a dilation similark 00000 0 X 0 0 0 0 0 0 k 0 0 0 ity as above; D7(b)T2D7(b-1) = 0 0 0 k 1 0 0 . Now use a permuta0000X 1 0 0 0 0 0 0 X 0 0 0 0 0 0 0 k 1
1
1
tion similarity to create JCF; swap columns 3 and 7 and rows 3 and 7 to get P(37)D7(b)T2D7(b-1)P(37) = X
1
0
= P(37)
0
X 0
0
0
0 0
0 0
0
0
0
0
0
1
0 1
0 0
0 0 0
0 0 k
X
1
0
0
0
X
1
0 0
0 0
0 0 0 0 X
X
=
0 0 0
X
0
0 0 0 0 0 0
1
0
0 0
0 0 0
X 0
k
1
0
0
X
1
0 0 0
0
0 0 0
X 0 0
1
0 0
0 0 0 0 1
X
0
0 0 0
0 0 0 k
Finally, suppose both b and f are nonzero. We will show that the element opposite the smaller block can be made zero. Consider D7(f)T15(- L)T26(- L') )
T2T26(f)Tl5(f)D7(f-l) _ k
T15(-f)
D7(f)
0 0 0
j
0 0 0
k 0
k
1
0
0
k
1
0 0 0
0 0
0 0 0
0 0 0
0
k
1
0 0
0 0
0 k
0 0
0 0
1
1
X 0
0
0 0
0 0
0 0
0 0 X 0 0 0 0
0 0
0 0
0
0 0 0
0
1
0
k
1
0 0
0
k 0
T15(k) =
f k
0
k
0
0 0 0 0
k
0 0 0
1
0
0 0 0
0 0
f k
D7(f-1) _
402
Jordan Canonical Form X
1
O 0
k
0
0 0
0
0
0
0
0
0
0 0 0
0 0 0
X
1
0
0
X
1
0
0 0
0 0 A
0 0
0 0 0 0
1
0
0 0 0 0 0
k
I
0
x
,
which is JCF.
This completes our argument for getting a matrix into JCF. Remarkably, after the triangularization given by Schur's theorem, all we used were elementary row and column operations. By the way, the Jordan we have been talking about is Marie Ennemond Camille Jordan (5 January 1838-22 January 1922), who presented this canonical form in 1870. Apparently, he won a prize for this work.
An alternate approach to stage three above is to recall a deep theorem we worked out in Chapter 3 concerning nilpotent matrices. Note that N = T - XI is nilpotent and Theorem 81 of Chapter 3 says that this nilpotent matrix is similar to BlockDiagonal[Nilp[pi], Nilp[p2], ... , Nilp[pk]] where the total
number of blocks in the nullity of N and the number of j-by-j blocks is rank(Ni-1) - 2rank(Ni) + rank(N'+'). If S is a similarity transformation
that affects this transformation then S-'NS = S-'(T - XI)S = S-'TSkl = Block Diagonal[Nilp[pi], Nilp[P2], ... , Nilp[Pk]] so S-1 TS = Al + Block Diagonal[Nilp[pi], Nilp[P2], ... , Nilp[pk]] is in JCF. Finally, we remark that the one on the superdiagonal of the JCF are not essential. Using dilation similarities, we can put any scalar on the superdiagonal for each block. What is essential is the number and the sizes of the Jordan blocks.
Generalized Eigenvectors
11.1.4.1
In this section we look at another approach to Jordan's theorem where we show how to construct a special kind of basis. Of course, constructing bases is the same as constructing invertible matrices. We have seen that not all square matrices are diagonalizable. This is because they may not have enough 0 1
egenvectors to span the whole space. For example, take A =
0
L
Being lower triangular makes it easy to see that zero is the only eigenvalue. If
we compute the eigenspace, Eig(A, 0) =
X2
I I A f Xx 2
1
= f 0 l 1=
L
XZ
sp (I
[ J
0
I
0
0 0
] [ X'
J
=[
X,
]=
0
[ 0 ] } = l [ 02
]1
X2 E
C} _
1), which is one dimensional. There is no way to generate C2 with
II. I Jordan Forin and Generalized Eigenvectors
403
eigenvectors of A. Is there a way to get more "eigenvectors" of A? This leads us to the next definition.
DEFINITION 11.4 (generalized eigenvectors) Let A be an eigenvalue of A E C"X". A nonzero vector x is a generalized eigenvector of level q belonging to A (or just A-q-eigenvector for short) if and only if (A - A! )g x =' but (A - A/ )g- I x # ". That is, X E NU ll ((A - A! )g ) but
xVNull((A-A/)g-1).
Note that a I -eigenvector is just an ordinary eigenvector as defined previously. Also, if x is a A-q-eigenvector, then (A - A/)t'x = for any p > q. We could have defined a generalized eigenvector of A to be any nonzero vector x such that (A - A!)kx = _iT for some k E N. Then the least such k would be the level of x. You might wonder if A could be any scalar, not necessarily an eigenvalue.
But if (A - AI)kx ='for x 54 -6, then (A - AI)k is not an invertible matrix whence (A - A/) is not invertible, making A an eigenvalue. Thus, there are no "generalized eigenvalues."
DEFINITION 11.5 (Jordan string) Let x be a q-eigenvector of A belonging to the eigenvalue A. We define the Jordan string ending at x by Xq = X
Xq
Xq_, = (A - k!)X Xq-2 = (A - A!)2X = (A - A/)Xq_1
Axq = Axq + xq-1
X2 = (A - AI )q-2X = (A - A!)x3
AX3 = AX3 + X2
x1 = (A - A!)q-IX = (A - A!)x2
Axe = Axe + x, Ax, = Ax1
Axq_1 = AXq-1 +Xq-2
Note what is going on here. If x is an eigenvector of A belonging to A, then (A - A!)x = V (i.e., (A - A!) annihilates x). Now if we do not have enough eigenvectors for A- that is, enough vectors annihilated by A - A! - then it is reasonable to consider vectors annihilated by (A - A1)2, (A - A/)3, and so on. If we have a Jordan string x j, x2, ... , xq , then (A - A! )x 1 = (A - A I )g x = _iT,
(A - A!)2 x2 = -6, ... , (A - A!)gxq = _. Note that x, is an ordinary eigenvector and (A -XI) is nilpotent on span 1x,, x2, ... , xq }, which is A-invariant. The first clue to the usefulness of Jordan strings is given by the next theorem. THEOREM 11.4 The vectors in any Jordan string {x1, x2, ... , xq) are linearly independent. Moreover, each xi is a generalized eigenvector of level ifor i = 1, 2, ... , q.
404
PROOF
Jordan Canonical Form
With the notation as above, set a, x i + a2x2 + ... + ay_ 1 Xq _ 1 +
agXq = (. Apply (A - \1)y-l to this equation and get a,(A - \1)y-Ixi + a2(A - \I )y-' x2 + ... + ay(A - \I )q-1 xq = 6. But all terms vanish except aq(A - \I )q-1 Xq. Thus aq(A - AI )q-1 Xq = aq(A - 1\1)q-IX, SO aq = 0 since
#Inasimilar manner, each acanbeforced tohezero. Clearly, (A - AI)x, = -6, so x, is an eigenvector of A. Next, (A - AI)x2 = x, # (1 and yet (A - \1)2x2 = (A -,\/)x, Thus, x2 is a generalized eigenvector of level 2. Now apply induction.
0
We are off to a good start in generating more independent vectors corresponding to an eigenvalue. Next, we generalize the notion of an eigenspace.
DEFINITION 11.6
If \ is an eigenvalue of A E C""", define
G5(A)={xEC" 1(A-AI)t"x=-6 for some pEN}. This is the generalized eigenspace belonging to A.
THEOREM 11.5
For each eigenvalue A, G,\(A) is an A-invariant subspace of C. Indeed, G,\(A) = Null((A - \1)"). PROOF
The proof is left as an exercise.
El
Clearly, the eigenspace forA, Eig(A, A) = Null(A-AI) e G,,(A). Now the idea is something like this. Take a generalized eigenspace G,,(A) and look for a
generalized eigenvector x, say with (A - \I)'-'x # -6 but (A - \I)kx = -6
and such that there is no vector in G,\(A) that is not annihilated at least by (A - AI )k . This gives a generalized eigenvector of level k where k is the length of a longest Jordan string associated with A. Then x = xk, x,,_,, ... , xi is a list of independent vectors in GX(A). If they span Gk(A), we are done. If not we look for another generalized eigenvector y and create another string independent of the first. We continue this process until (hopefully) we have a basis of G,,(A). We hope to show we can get a basis of C" consisting of eigenvectors and generalized eigenvectors of A. There are many details to verify. Before we get into the nittygritty, let's look at a little example. Suppose A is 3-by-3 and A is an eigenvalue
of A. Say we have a Jordan string, x3 = x, x2 = (A - XI)x = Ax3 - Ax3 and x, = (A - \1)2x = (A - AI )x2 = Axe - Axe and necessarily (A - XJ)3x = 6 and x, is an eigenvector for A. Then solving we see Ax3 = x2 + Ax3, Axe = x, + Axe and Ax, = Ax,, Then A[x, I X2 I x3] = [Ax, I Axe I Ax3] = [Ax,
405
I 1.1 Jordan Form and Generalized Eigenvectors K
X1 + #\X2
X2 + Xx31 = [xi
I
I
X2 I X31
K
0 I
0
K
1
0 0
= [x1
I X2
x3]J3(1\).
I
Theorem 11.4 says (x1, x2, x3) is an independent set, so if S = [x1
X2 I x3],
I
then AS = SJ3(X) or S-' AS = J3()\). Thus, A is similar to J3(X). We call (x1, x2, x3} a Jordan basis. The general story is much more complicated than this, but at least you now have a hint of where we are going.
First, a little bad news. Suppose we have a Jordan string (A - Xl)xi =
-6,(A-XI)x2 =xi,... ,(A-XI)xq =xq_I.Theny, =xt,y2 =x2+aixi, Y3 =x3+a1x2+a2x1,... ,yq =Xq+aixq_1+a2Xq_2+...+aq_Ixi isalso a Jordan string and the subspace spanned by the ys is the same as the subspace spanned by the xs. Thus, Jordan bases are far from unique. Note that yy above contains all the arbitrary constants. So once yy is nailed down, the rest of the chain is determined. This is why we want generalized eigenvectors of a high level. If we start building up from an eigenvector, we have to make arbitrary choices, and this is not so easy. Let's look at a couple of concrete examples before we develop more theory. 5
eigenvalue of A. The eigenspace Eig(A, 5) = 0
0 0
x
=
y
0
z x
0
1 0
x
00 000
y
1
1
0
1
0 0
Consider a Jordan block A = J3(5) =
. Evidently, 5 is the only
5
1
0
5
!
I
x
x
y
I
I (A - SI) I y
z
I
I
0
=
z
y
=
0 0
=
1
z
=
z
I
0
I
0 IE = sp Thus Eig(A, 5) is one di0 0 0 mensional, so we are short on eigenvectors if we hope to span C3. Note that 0 0 0 0 1 0 0
=
]).
1
(A-5!)=
0 0
0
1
0
0
,(A-51)2=
0
0 0
,and(A-51)3=®,
0 0
0
a so we would like a 3-eigenvector x3 =
b
. Evidently (A -51 )3x3 =
c
Now (A - 5!)2x3 =
0 0 0
0
0 0
1
0
0
a b c
0
c96 0.Sayc=1.Next, x2=(A-51)x3=
c 0 0
= 0 0
1
0 0
0 0 0
, so we must require 0
a
1
b
0
1
b
=
I
0
Jordan Canonical Forin
406 0 0
and x, =
1
0 0
0
0
b
I
1
0
0 I
So, choosing a = b = 1, x3 =
I
I
, x2 =
I
, and x, =
1
0
1
0
is a Jordan string, which is a Jordan basis of C3. Note Ax, = A 5 0 0
1
=
0 0
5
L
+5
I
0
1=
1
5
1
1
5
1
1
I
0
5
I
0
0
1
0
0
5
0
,
0
1
I
I
1
0
1
6
=
5
l
0
1
I
0
0
1
=
an d
0 5
6
0
5
6
0
0
5
6
=
0
1
0 I
{
soA
6
I
0
11
6 I
+5
0 0
0
I
Ax3=
=
I
0 0
I
1
1
Ax e =A
0
. Here G5(A) = C3 and a Jordan basis is
}. Actually, we could have just chosen the standard
basis {e,, e2, e3}, which is easily checked also to be a Jordan basis. Indeed, Ae, = 5e,, Ae2 = 5e2 + e,, and Ae3 = 5e3 + e2. Moreover, it is not difficult to see that the standard basis {e,,e2, ... is a Jordan basis for the Jordan block Furthermore, the standard basis is a Jordan basis for any Jordan matrix where each block comes from a Jordan string. To see why more than one Jordan block might be needed for a given eigenvalue, we look at another example.
Consider A =
0
3
20
0
0
3i 2i
3i
0
33
.
We find the characteristic poly-
2 2i 0 0 nomial of A is XA(x) = x4 and the minimum polynomial of A is lJA() = x'` since A2 = 0. Thus, 0 is the only eigenvalue of A. The eigenspace of 0 is Eig(A, 0)-= 0
0
+d 1
[']
0 I
0
IcdEC
=
[i]3 1
i 0 I
}
0
The geometric multiplicity of X = 0 is 2. Since A22 _ 0, the highest a level can be for a Jordan string is 2. Thus, the possibilities are two Jordan strings of
407
I 1. I Jordan Form and Generalized Eigenvectors
length two or one of length two and two of length 1. To find a 2-eigenvector we 0
solve the necessary system of equations and find one choice is x2 =
-i
1 /2
is a Jordan string of length 2. Similarly,
Then x2
0
we find 14
Now (x,, x2, x3, x4} turns out to be an
1,x;
1
L°J
L 1/3J
independent set, hence is a Jordan basis of C4. This must happen as the next theorem will show. In other words, G0(A) is four dimensional with a basis consisting of two Jordan strings of length two. i 0 i 0 0 0 i I -l Now
-i 0
- i
0
0
-i
-i
0
1/ 2
1
1
0
1/2
1
1
1
1
0
1/3
1
1
0
1/3
0
1
0
n
0 n
0 n
0
0
0
1
0
0
0
0
A
0
Note the right-hand matrix is not a Jordan block matri x but
is composed of two blocks in a block diagonal matrix,
J2(0)
[ ®
0 J2 (0)
THEOREM 11.6 Let S, = {x, , x2, ... , xy } and S2 = {Y1, y2, ... , y p } be two Jordan strings belonging to the same eigenvalue X . If the eigenvectorx, and y, are independent, then S, U S2 is a linearly independent set of vectors.
PROOF
The proof is left as an exercise.
O
We recall that eigenvectors belonging to different eigenvalues are independent. A similar result holds for generalized eigenvectors.
THEOREM 11.7 Let S, = {X1, x2, ... , xy } , S2 = {Y1, y2, ... , yt, } be Jordan strings belonging to distinct eigenvalues k, and 1\2. Then S, U S2 is a linearly independent, set.
408
Jordan Canonical Form
PROOF To show independence, we do the usual thing. Set a, x, +...+ayxq+ Apply (A - k, I)'. Remember (A - k, If x; _ 3iy[ + ... + (3,,y,,
fori= Next, apply (A - k2I)"-1. Note (A - 1\2I)"-I(A - k, I)y = (A - k, I)N(A k21)"-' being polynomials in A. Now (A - k2I)P-'y; for j = I_. , p - 1(A - k21)y, so (31,(X2 - k, )"y, Thus, 0,, = 0 since k, # k2 and y, # V. Continuing in this manner, we kill off all the (3s, leaving a,x, + .. . + ayxy = it. But the xs are independent by Theorem 1 1.4, so all the as are zero as well and we are done. 0 The next step is to give a characterization of a generalized eigenspace more useful than that of Theorem 11.5.
THEOREM 11.8 Let A E C" xn with minimum polynomial µA (x) = (x - X, )e, (x - 1\2 )e2 ... (x (x -,\,)d,. ,\,)e, and characteristic polynomial XA (x) = (x - k, )d' (x _,\,)d, Then d i m(Gx, (A)) = d; so G,,, (A) = Null ((A - k; I )e, ).
PROOF Let X E Null(A - k;/)`,. Then (A - XjI)",x = -0 , so x is in G;,,(A).This says dim(Gh,(A)) > d; sinceNull(A - XI)e e Gh,(A),andwe determined the dimension of Null(A - k, I )e, in the primary decomposition theorem. Getting equality is surprisingly difficult. We begin with a Schur triangularization of A using a unitary matrix that puts the eigenvalue X; on the main diagonal first in our list of eigenvalues down the main diagonal. More precisely,
we know there exists U unitary such that U-I AU = T where T is upper tri-
angular and T =
0
k;
*
0
0
k;
...
*
W
. Since the characteristic
* 0
0
..
0
k;
0
R
polynomials of A and T are identical, X; must appear exactly d; times down the main diagonal. Note R is upper triangular with values different from X; on its * 0
*
...
*
10
*
...
*
0
0
0
10
diagonal. Now consider T - k; I=
I
I, which *
0
0
0
...
W
0
0
II.! Jordan Form and Generalized Eigenvectors
409
has exactly di zeros down the diagonal, while k has no zeros on its main diagonal. Now we look at the standard basis vectors. Clearly, e, E Null(T - Xi!)
and possibly other standard basis vectors as well, but ed,+,, ... , e" do not. Any of these basis vectors when multiplied by T - lei! produce a column of T - Xi! that is not the zero column since, at the very least, k has a nonzero entry on its diagonal. Now (T - lei 1)2 adds a superdiagonal of zeros in the di -by-di upper left submatrix (remember how zeros migrate in an upper triangular matrix with zeros on the main diagonal when you start raising the matrix to powers?), so e,,e2 E Null(T - \, 1)2 for sure and possibly other standard basis vectors as well, but ed,+,, ... , e" do not. Continuing to raise T - Xi 1 to powers, we eventually find e,,e2, ... , ej, E Null(T - it !)di but ea,+,, ... , e,, do not. At this point, the di-by-di upper left submatrix is completely filled with zeros, so for any power k higher than di we see dim (Null (T - Xi 1)1) = di.
In particular then, dim(Null(T - lei!)") = di. However, Null((T - Xi!)") = G,,, (T). But G,,, (T) = G,\, (U-'AU) = U-'G,,, (A). Since U-' is invertible, dim(G,,,(A)) = di as well. 0 Now, the primary decomposition theorem gives the following. COROLLARY 11.3
Let A E C"" with minimum polynomial µA (x) _ (x - k, )r' (x - iz)r2 ... (x ,\'p and characteristic polynomial XA (x) = (x - X, )d' (x - X2)d2 ... (x - X,. Then C" = G,\, (A) ® G,,2(A) ® . . . ® G,,, (A). In particular C" has a basis consisting entirely of generalized eigenvectors of A. Unfortunately, our work is not yet done. We must still argue that we can line a basis of generalized eigenvectors up into Jordan strings that will make a basis for each generalized eigenspace. This we do next. Evidently, it suffices to show we can produce a Jordan basis for each generalized eigenspace, for then we can make a basis for the entire space by pasting bases together.
THEOREM 11.9 Consider the generalized eigenspace Gk(A) belonging to the eigenvalue it of
A. Then G,,(A) = Z, ® Z2 ® ® Z., where each Zi has a basis that is a Jordan string. Now comes the challenge. To get this theorem proved is no small task. We shall do it in bite-sized pieces, revisiting some ideas that have been introduced earlier. We are focusing on this single eigenvalue It of A. Suppose XA(x) = (x - X)dg(x) and p.A(x) = (x - k)"h(x), where X is not a root of either g(x) or h(x). The construction of Jordan strings sitting over an eigenvalue is complicated
by the fact that you cannot just choose an arbitrary basis of Eig(A, 1t) =
410
Jordan Canonical Form
NUll(A -XI) c GI,(A) and build a string over each basis cigenvector. Indeed, we need a basis of a very special form; namely,
((A -
W/)"-'v1,
(A -
\1)m2-1
v,),
v2, ... , (A -
where g = nlty((A - K1)) = dim(Eig(A, K)) is the geometric multiplicity of l =d =dim(Ga(A)). > m,, Kandm1 > m2 > Then the Jordan strings VI
V2
(A - \I)vl
(A - K!)v2
(A - \!)2v1
(A - \1)2v2
(A - lk!)v,
(A -
(A - X1 )1112-1 V2
(A -
h!)I-1v1
...
vA
Xlynk-IVk
will form a basis of G,\(A) and each column corresponds to a Jordan block for K. The ith column will he a basis for Zi. Now that we know where we are headed, let's begin the journey. There is a clue in the basis vectors we seek; namely,
(A - Xl) -lvi E JVull(A - k!) f1Col((A -
I)III, -I).
This suggests we look at subspaces
Nk = NUll(A - X1) fl Col((A -
K!)k-1).
This we will do, but first we look at the matrix (A - K!). We claim the index of this matrix is e, the multiplicity of K as a root of the minimal polynomial. Clearly, (
') e Null(A - K!) e ... e JVull(A - lk!)" e ,A(ull(A - \1)e+l
c ....
Our first claim is that
Mull(A - \1)e = JVull(A - \/)e+I \1)e+I. Then (A - \/)e+Iw = -6. But p (x) _ Suppose W E Mull(A (x - A)eh(x) where GCD((x - K)e, h(x)) = I. Thus, there exist a(x), b(x) in C[x] with 1
(x - lk)e
= = =
a(x)(x - K)e + b(x)h(x) and so
a(x)(x - lk)2e + b(x)h(x)(x - K)e a(x)(X - K)2e + b(x)µA(x)
I /. / Jordan Form and Generalized Eigenvectors
411
But then
(A - k)e = a(A)(A - \)2e + b(A)!LA(A) = a(A)(A - \)2e and thus
(A - A)ew = a(A)(A -
k)e_i(A
- \)e+,W _
putting w in Mull(A - XI )e. We have proved
Null(A - \I)e = Null(A - \I)e+i An induction argument establishes that equality persists from this point forward in the chain (exercise). But could there be an earlier equality? Note that µA(x) =
(x - X)eh(x) = (x - X)[(x - \)r-l h(x)] and (x - \)e-'h(x) has degree less than that of the minimal polynomial so that (A - X)e-' h(A) cannot be the zero matrix. Hence there exists some nonzero vector v with (A - k)e-'h(A)v # 0
whence h(A)v 54 r .Yet (A - \I)[(A - \)e-'h(A)v] =IIA(A)v
so
JVull(A - \I)e-I C JVull(A - \I)e. Thus the first time equality occurs is at the power e. This says the index of (A - XI) is e and we have a proper chain of subspaces
(')CNull(A-k1)C...CNull(A-XI)e=Xull(A-\I)e+i = Though we have worked with null spaces, our real interest is with the column spaces of powers of (A - XJ). We see
C' D Col(A - X1)
Col((A - X1)2)
... ? Col((A-\I)e-1) DCol((A-k1)e)=... We can now intersect each of these column spaces with NUll(A - XI) to get
Arull(A - kl) 2 NUll(A - \1) flCol(A - X1) 2
2Null(A-Xl)f1Col((A-XI)e)= In other words,
Ni 2N22...2Ne=Ne+1 =..., where we can no longer guarantee the inclusions are all proper. There could be some equalities sprinkled about in this chain of subspaces. If we let ni = dim(N;) we have
n, >n2>...>ne
Jordan Canonical Form
412
Next, we recall Corollary 3.2, which says
n, = dim(N1) = nlty((A - \I)') - nlty((A -
XI)'-1
so that
nlty(A - XI) nlty(A - XI )2 - nlty(A -XI) nlty(A - \1)3 - nlty(A - \1)22
nI 112
n3
nr n,+I
nlty(A - \I)' - nlty(A >I nlty(A - k/)`+I - nlty(A - iI)r = 0.
= =
kl)r-I
From this, it is clear what the sum of the n,s is; namely, n1 + n2 +
+ n, = nlty(A - k/)r = dim(Null(A - kI )` ) = dim(GI,(A)) = d.
Thus (n 1, n2, ... , ne) forms a partition of the integer d. This reminds us of the notion of a conjugate partition we talked about earlier. Recall the Ferrer's diagram: ni
.
.
n2
... ...
n3
Figure 11.1:
Ferrer's diagram.
This diagram has e rows, and the first row has exactly g dots since n I is the dimension of NUll(A - k/). It is time to introduce the conjugate partition (m I , m2, ... , in.). This means in I > n 2 > > in, > l and m 1 + +mx = d and in1 = e. It is also clear that in1 = m2 = = in,. The next step is to build a basis ofNull (A-XI) with the help of the conjugate partition. Indeed, we claim there is a basis of Null(A - XI) that looks like
((A - k/)m'-IVI, (A - k/)m'-1V2, ... , (A -
kI)rnk-IVI)
To understand how this is, we start with a basis of N,, and successively extend it to a basis of N1. Let ( b 1 , b2, ... , be a basis of N. By definition, the bs
are eigenvectors of the special form;
b1 = (A - \I)e-IVI, b2 = (A -
iI)''-I V2,
... b,,, = (A - k/)r-1V,i,. ,
I /. I Jordan Form and Generalized Eigenvectors
413
But remember, e=ml
bl = (A -
XI)mi-'VI,
b2 = (A - 1\1)n"-IV2, ... , bn, = (A - U)""',-IVne.
Now if ne = ne_I, no additional basis vectors are required at this stage. However, if ne_I > ne, an additional ne_I - ne basis vectors will be needed. The corresponding "m" values will be e - I for tie-] - ne additional basis vectors so that
bn,+I = (A - \I)r-2Vn,+l = (A - XI)"' +'-I Vn,+I,
and so on.
Continuing in this manner, we produce a basis ofNull(A-kl) of the prescribed form. Next, we focus on the vectors VI, V2, ... , Vg.
The first thing to note is that
AA.,,(X)=(X-X)"'' fori = 1,2,... ,g. B construction, (A - XI)"','vi belongs to Null(A - X1) so (A - XI)',v; _ . This means P A.v, (x) divides (x - X)"" . However, being a basis vector,
(A - X/)'"-Iv; # 6 so we conclude p.A,v,(x) = (x - *\)". Next, we characterize the subspaces Z of G,\(A) that correspond to Jordan strings, which in turn correspond to Jordan blocks, giving the Jordan segment belonging to k in the JCF. Consider
Z, = {p(A)v I p(x) E C[x}} for any vector v. This is the set of all polynomial expressions in A acting on
the vector v. This is a subspace that is A-invariant. Moreover, if v # V, In particular, (v, Av,... , Ak-IV) is a basis for Z, where k dim(Z,,) = degp.A.,,(x). This can be established with the help of the division algorithm. Even more is true. If p.A.,,(X) = (x -X)k, then (v, (A -Ik1)v, ... , (A X/)k-Iv} is a basis for Z,. W e want to consider these subspaces relative to V1, v2, ... , vx constructed above. First, we note Z,, C G>,(A)
fori = 1,2,... ,g.
From this we conclude
In view of pA,,,(x) = (x - it)"'' for i = 1, 2, ... , g, each Z,, has as its basis (vi, (A - X I )v; , ... , (A - X I)"', -I vi). There are two major obstacles to
414
Jordan Canonical Form
overcome. We must establish that the sum (1) is direct and (2) equals C5(A). The latter we will do by a dimension argument. The former is done by induction. Let's do the induction argument first. Assume we have a typical element of the sum set equal to zero: p1(A)v1 + p2(A)v2 + ... + pg(A)vg
where each p;(A)v; E Z. We would like to conclude pi(A)v; = ( for i = 1, 2, ... , g. This will say the sum is direct. We induct on rn I = e. Suppose m1 = 1. Then, since m1 > m2 >
> mg > I, we must be in the case where
MI =m2=...=mg= 1. This says ((A - XI)mi-Ivl, (A -
k!)nt:-I
V2, ... , (A - k!)nml-IV, = V
V
, Vg
is an independent set of vectors. By the corollary of the division algorithm called the remainder theorem, we see
pi(x)=(x-k)9i(x)+ri(k)for j = 1,2,...
g
where ri(k) is a scalar, possibly zero. Then,
pi(A)vj _ (A - k!)9i(A)vi +ri(k)vi = µA. v,, (A)c!i (A)vi + ri (X )vi
= rj (k)vj. Thus,
pI(A)v1 + p2(A)v2 + ... + pg(A)vg = -6 reduces to
rl(X)v1 + r2(X)v2 + ... + rg(k)vg = -6.
which is just a scalar linear combination of the vs. By linear independence, all the rj(k) must be zero. Thus,
pi (x) = (x - \)9i(x) for j = 1, 2, ... , g. Therefore,
pi(A)=(A-k!)9i(A)for j = 1,2,... ,g and thus
pl(A)vj = (A - k!)gi(A)vj = 0 for j = 1, 2, ... as we had hoped.
, g,
415
I/. 1 Jordan Form and Generalized Eigenvectors
Now let's assume the independence follows for m i - 1. We prove independence for m i. Rather than write out the formal induction, we will look at a concrete example to illustrate the idea. Again, it boils down to showing µA,v, (x) divides pj(x). The idea is to push it back to the case m i = 1. Assume the result
holds for mi - I = k. It will be helpful to note µA,(A-a/)v;(x) = (x - xyn,-). For the sake of concreteness, suppose n i = 2, m2 = 2, m3 = 1, m4 = 1. Then we have ((A - AI)vl, (A - A!)v2, v3, v4) is an independent set. Suppose Pi(A)vi + P2(A)v2 + p3(A)v3 + P4(A)v4 = -d-
Multiply this equation by (A - A/) and note we can commute this with any polynomial expression in A:
pI(A)(A - AI)vi + p2(A)(A - A!)v2 + p.,(A)(A - XI)v3 + P4(A)(A - AI)v4 = . Since V3 and v4 are eigenvectors, this sum reduces to
PI(A)(A - X1)vt + p2(A)(A - 1\ /)v2 = . But µA,(A_a/)v,(x) = (x - A) and µA,(A-t,/)v2(x) = (x - A), so by the case above, we may conclude that (x - A) divides pi(x) and p2(x). Thus, we have
pi(x) = (x - A)gI(x), p2(x) = (x - A)g2(x). Look again at our zero sum: gt(A)(A - X1)vi + q2(A)(A - XI)v2 + P3(A)v3 + p4(A)v4 = -6 and note (x - A), 1-A.(A-a1)12(x) = (x - A), AA.v,(x)
= (x - A), PA.v,(x) = (x - A). This is exactly the case we started with, so we can conclude the (x - A) divides
q, (x), q2(x), P3(x), and p4(x). Let's say q,(x) _ (x - A)h1(x), q2(x) = (x X)h2(x) p3(x) = (x - A)h3(x), P4(x) = (x - A)h4(x). Then
pi(A)vt = gi(A)(A - AI)v, = ht(A)(A - AI)2vi P2(A)v2 = q2(A)(A - A1)v2 = h2(A)(A - Al)2v2 =
-
P3(A)v3 = h3(A)(A - AI)v3 = P4(A) = h4(A)(A - AI)v4 = The general induction argument is left to the reader. Hopefully, the idea is now clear from the example why it works, and it is a notational challenge to write the general argument.
Jordan Canonical Form
416
The last step is toget the direct sum to equal the whole generalised eigenspace. This is just a matter of computing the dimension. ®... ® ZvX) _
Zv,
dim(Z,, )
_
deg(µA.,, (x)) i=1
=d = dim(G5(A)). Now the general case should be rather clear, though the notation is a bit messy. The idea is to piece together Jordan bases of each generalized eigenspace to get a Jordan basis of the whole space. More specifically, suppose A E C" x" and A has distinct eigenvalues , . .... ,X,. , X . Suppose the minimal polynomial of A is µA(x) = (x - X)" (x ->\)r= .. (x - X)r and gi is the geometric multiplicity of X,
for i = I, 2, ... , s. That is, gi = dim(Null(A -XI)). The consequence of the discussion above is that there exist positive integers mi j fori = 1, 2, ... , s, j =
1, 2, ... , gi and vectors vij such that ei = mil > miz > µA,,,, (x) = (x - Xi)"'' such that
> mig, >_ 1 and
K,
C',
=
G)EDZV,/.
i=1 j=1
Indeed, if we choose the bases
Bij = for
((A-Xi1)",-1vij,... ,(A-XXil)vij,vij)
and union these up, we get a Jordan basis V
9,
13=UU13,j i=1 j=I
for C". Moreover, if we list these vectors as columns in a matrix S, we have A-1 AS = J, where J is a Jordan matrix we call a JCF or Jordan normal form of A(JNF). We will consider the uniqueness of this form in a moment but let's consider an example first.
417
11.1 Jordan Form and Generalized Eigenvectors 0
-3i
1
0
0
Consider A =
One way or another, we find the
i 0 0 0 3i 0 characteristic polynomial and factor it:
0
1
+3)2.
XA(X) = (x - 3)2(x
Hence, the eigenvalues of A are -3 and 3. Next, we determine the eigenspaces and the geometric multiplicities in the usual way we do null space calculation:
a Eig(A, 3)
= Null(A - 31) =
a
I (A - 31)
b d
0
=
b d
0 0
-i
0
= SP
I
1-0.1 Thus, the geometric multiplicity of 3 is I. Similarly,
a
Eig(A, -3)
= Null(A + 31) =
b
a I (A + 31)
d
b
d
0
=
0 0
0
=SP
I
0
so the geometric multiplicity of -3 is also 1. Next, we seek a generalized eigenvector for 3. This means solving
a A
b
3
a
-i
b
0
c
c
1
d
d
0
We find a one parameter family of choices, one being
}
418
Jordan Canonical Form
Thus 0
G3(A) = sp
0
-i
1
0
0
1
Similarly, we find
10
1
i
0
G-AA) = sp
i
I
0
,
0
1
Thus
-i 0
0
-i
i
0
0
i
1
0
1
0
0
1
0
1
-i
-I
0 A
0
-i
i
0
0
i
1
0
1
0
0
1
0
1
=
3
0 0 0
1
3
0 0
0 0
-3 0
0 0 1
-3
Finally, we discuss the uniqueness of the JCR Looking at the example above, we could permute the block for 3 and the block for -3. Hence, there is some wiggle room. Typically, there is no preferred order for the eigenvalues of a matrix. Also, the ones appearing on the superdiagonal could be replaced by any set of nonzero scalars (see Meyers [2000]). Tradition has it to have ones. By the way, some treatments put the ones on the subdiagonal, but again this is only a matter of taste. It just depends on how you order the Jordan bases. Let's make some agreements. Assume all blocks belonging to a given eigenvalue stay together and form a segment, and the blocks are placed in order of decreasing size. The essential uniqueness then is that the number of Jordan segments and the number and sizes of the Jordan blocks is uniquely determined by the matrix. Let's talk through this a bit more. Suppose J, and J2 are Jordan matrices similar to A. Then they are similar to each other and hence have the same characteristic polynomials. This means the eigenvalues are the same and have the same algebraic multiplicity, so the number of times a given eigenvalue appears is the
same in J, and J2. Now let's focus on a given eigenvalue X. The geometric multiplicity g = dim(JVull(A - Al)) determines the number of blocks in the Jordan segment belonging to k. The largest block in the segment is e, where e is the multiplicity of X as a root of the minimal polynomial. It seems tantalizingly close to be able to say J, and J2 are the same, except maybe where the segments are placed. However, here is the rub. Suppose the algebraic multiplicity is 4 and
the geometric multiplicity is 2. Then A appears four times and there are two blocks. It could be a 3-by-3 and a 1-by-I or it could be two 2-by-2s. Luckily, there is a formula for the number of k-by-k blocks in a given segment. This
I /. I Jordan Forin and Generalized Eigenvectors
419
formula only depends on the rank (or nullity) of powers of (A - XI). We have seen this argument before when we characterized nilpotent matrices. It has been a while, so let's sketch it out again. Let's concentrate on the d-by-d k-segment.
k
?
X
0 it
This segment is a block diagonal matrix with the Jordan blocks J., (k), where
mi > m2 > ... > mg > 1, say Seg(k) = Block Diagonal [J,,,, (k), J,1,2(k),
...
, J,nx(k)].
But then
Seg(k) - kid = Block Diagonal[Nilp[mi], Nilp[m2], ... , Nilp[mg]] - N. Recall
nlty(Nk)-{ k
ld
if I d
and so nltY(NA ) - nlty(N k-1)
1
=
0
if I
ifk>d
Now g
nlty(Seg(k) - kid)k = Enlty(Nilp[m;]k) and so
nity(Seg(k) - kId)k - nlty(Seg(k) - kid)' g
_ >nlty(Nilp[m;]k) - nlty(Nilp[m;]k-I) r-i g
=
1
i=1 k < m; .
Jordan Canonical Form
420
This difference of nullities thus counts how many blocks have size at least k-hy-k
since the power k has not killed them off yet. Consequently, the difference
[nlty(Seg(X) - X1a)k - nlty(Seg(X) - X1d)k-'] - [nlty(Seg(X) - X/j+i -nlty(Seg(X) - X1d)k] counts exactly the number of blocks that are of size k-by-k. This can he restated using ranks:
rank(Seg(X) - X Id)k-I) - 2rank(Seg(X) - X Id)k) + rank(Seg(X) - XId)'+i ). Note that these computations did not depend whether we are in J, or J2, so the number and the sizes of the Jordan blocks in every segment must be the same. Up to ordering the segments, J, and J2 are therefore essentially the same.
Further Reading [F&I&S, 1979] S. H. Friedberg, A. J. Insel, and L. E. Spence, Linear Algebra, Prentice Hall, Englewood Cliffs, NJ, (1979). [H&K, 1971 ] K. Hoffman and R. Kunze, Linear Algebra, 2nd Edition, Prentice Hall, Englewood Cliffs, NJ, (1979). [MacDuffee, 1946] C. C. MacDuffee, The Theory of Matrices, Chelsea Publishing Company, New York, (1946). [Perlis, 1958] S. Perlis, Theory of Matrices, 2nd Edition, Addison-Wesley, Reading, MA, (1958).
[Valiaho, 1986] H. Valiaho, An Elementary Approach to the Jordan Canonical Form of a Matrix, The American Mathematical Monthly, Vol. 93, (1986), 711-714.
Exercise Set 49 1. How does XA -, relate to XA? (Hint: XA (X) _
2. Consider A = r
0 ] and B =
A and B are not similar.
[' L
1. J
(-xLXA(X).)
det(A)
Argue that XA = Xe but
I l.I Jordan Form and Generalized Eigenvectors
421
3. Prove all the parts of Theorem 11.5. 4. Prove all the parts of Theorem 11.6. 5. Prove all the parts of Corollary 11.3. 6. Prove alI the parts of Theorem 11.9. 7. Fill in the details in all the computational examples in the text.
8. Prove the Cayley-Hamilton theorem using Jordan's theorem. 9. Say everything you can about a matrix A whose JCF is 5 0 1
0
5
1
5
5
I
5
u
I
a
10. Suppose A is a 4-by-4 matrix with eigenvalue 3 of all possible JCFs A might have. How many JCFs can an arbitrary 4-by-4 matrix have? Exhibit them. 11. Prove the claim made about the lack of uniqueness of Jordan strings at the top of page 405. 12. Argue that the standard basis {e1, e2, ... , e } is a Jordan basis for the Jordan block Furthermore, the standard basis is a Jordan basis for any Jordan matrix where each block comes from a Jordan string. 13. Argue that a generalized eigenspace for A is A-invariant.
14. If S is invertible, prove that
GI,(S-1 AS)
= S-I(Gk(A)).
15. For a vector v, argue that the following two statements are equivalent: (1) there exists a positive integer k with (A - KI)1v = 0, and (2) there is a sequence v1, v2, ... , vk = v such that (A - XI)Vk = vk_I, (A -
XI)vk-I = Vk-2,...,(A - X!)v1 = 16. Suppose M = sp(xi, Axl, ... , Ad-1x1 ), where Adxl =
for the first time. Suppose xI is a generalized eigenvector of level d for the eigenvalue X of A. Argue that wl = (A - X I )d-I x1, w2 = (A - kl )J-2x1, ... , wd x1 is a Jordan string that is a basis for M.
Jordan Canonical Form
422
17. Do the induction argument indicated in the proof of Theorem 11.9.
18. Argue that Z,, = (p(A)v I p(x) E C[x]} is a subspace that is A-invariant. 19. Prove that Z,,, C Gx(A) f o r i = 1, 2, ... , g.
11.2
The Smith Normal Form (optional)
There is a more general approach to JCF that extends to more general scalars than complex numbers. For this, we need to be able to work with matrices that have polynomial entries. In symbols, this is C[x]"'". The concepts of dealing with scalar matrices extend naturally to matrices with polynomial entries. We define matrix equivalence in the usual way. Two m-by-n matrices A(x) and B(x) in C[x]"''I are equivalent iff there exist matrices P(x) and Q(x) such that B(x) = P(x)A(x)Q(x), where P(x) and Q(x) are invertible and of appropriate size. Another way to say invertible is P(x) and Q(x) have nonzero scalar determinants. It is easy to see equivalence is indeed an equivalence relation. When dealing with complex matrices, the fundamental result was rank normal form.
This said every complex matrix was equivalent to a matrix with ones down the diagonal, as many ones as rank, and zeros elsewhere. When dealing with matrices in C[x]"'"", there is an analog called Smith normal form. This is named for Henry John Stephen Smith (2 November 1826-9 February 1883). Actually, Smith was a number theorist and obtained a canonical form for matrices having only integer entries. (Philos. Trans. Roy. Soc. London, 151, (1861), 293-326).It was Frobenius who proved the analogous result for matrices with polynomial entries. (Jour. Reine Angew. Math. (Crelle), 86, (1878), 146-208.) Here we have monic polynomials down the diagonal, as many as rank, each polynomial divides the next and zeros elsewhere.
THEOREM 11.10 (Smith normal form) Suppose A(x) is in C[x ]'m"' of rank r Then there is a unique matrix SNF(A(x)) in C[x]'"" equivalent toA(x) where SNF(A(x)) is a diagonal, matrix with monic polynomials s, (x), s2(x), ... s, (x) on the diagonal, where si (x) is divisible by
s,_,(x)fori =2, ...
,
r.
PROOF We argue the existence first. Note that the transvections, dilations, and permutation matrices do the same work that they did for scalar matrices, even though we are now allowing polynomial entries in our matrices. Indeed, only the transvections will need polynomial entries in our proof. We note the
423
11.2 The Smith Normal Form (optional)
determinant of all these matrices is a nonzero scalar so, just as before, they are all invertible. The proof goes by induction on m and n. The case m = it = I is clear, so we consider the case m = 1, n > 1. In this case, A(x) = If all the ai(x)s are zero, we are done, so let's assume [al(x) a2(x) . otherwise. Then there must be an a;(x) with minimal degree. Use an elementary column operation if necessary and put that polynomial in the first position. In other words, we may assume a, (x) has minimal degree. Now, using elementary
matrices, we can replace all other aj(x)s by zero. The key is the division algorithm for polynomials. Take any nonzero aj(x) in A(x) other than ai(x). Divide aj(x) by al(x) and get aj(x) = qj(x)a,(x) + rj(x), where rj(x) is zero or its degree is strictly less than deg(a, (x)). Multiply the first column by -q1 (x)
and add the result to the jth column. That produces rj(x) in the jth position. Then, if r. (x) = 0, we are happy. If not, swap rj (x) to the first position. If there still remain nonzero entries, go through the same procedure again (i.e., divide the nonzero entry by rj(x) and multiply that column by minus the quotient and add to the column of the nonzero entry producing another remainder). Again, if the remainder is zero we are done; if not, go again. Since the degrees of the
remainders are strictly decreasing, this process cannot go on forever. It must terminate in a finite number of steps. In fact, this process has no more than deg(a,(x)) steps. This completes the induction since we have all the entries zero except the first. A dilation may be needed to produce a monic polynomial. The case m > I and n = I is similar and is left as an exercise. Now assume m and n are greater than I. Suppose the theorem is true for matrices of size (m-1)-by-(n-1). We may assume that the (1, 1) entry of A(x) is nonzero with minimal degree among the nonzero entries of A(x). After all, if A (x) is zero we are done. Then if not, row and column swaps can be used if necessary. Now, using the method described above, we can reduce A(x) using a finite 0 0 ... a(,,)(x) 0 azn)(x) number of elementary matrices to A, (x) _ O
(X)
42)(x)
...
[i") mn(x)
We would like to get all the entries divisible by a(1)(x). Iffor some i,
then add the ith row to the first that is not zero is not divisible by row and apply the procedure above again. Then we get a matrix A2(x) _ a(i)(x) 0 ... 0 0
0
a(2)(x) 22 (2)
an12(x)
... where the degree of al I)(x) is strictly less
...
a( z,(x)
than the degree of a(I 1)(x). If there is still an entry
not divisible by a(, i)(x),
repeat the process again. In a finite number of steps, we must produce a matrix
424
Jordan Canonical Form
O
A3(x) =
0 a22)(x)
... ...
0 a;'3 )(x )
where every entry is divisible by 0
a,,,2(x)
...
)711(3) (X)
a i)(x). We can use a dilation if necessary to make a, (x) monic. Now the mduca22)(x)
a,,1 (x)
tion hypothesis applies to the lower-right corner (3)
am2(x)
(;) a"In
and we are essentially done with the existence argument. Before we prove the uniqueness of the Smith normal form, we recall the idea of a minor. Take any m-by-n matrix A. A minor of order k is obtained by choosing k rows and k columns and forming the determinant of the resulting square matrix. Now the minors of a matrix with polynomial entries are polynomials so we can deal with their greatest common divisors (GCDs).
THEOREM 11.11 Let gk(x) denote the GCD of the order k minors of A(x) and hA(x) denote the GCD of the order k minors of B(x). Suppose A(x) is equivalent to B(x). Then gk(x) = hk(x) for all k.
PROOF Suppose A(x) is equivalent to B(x). Then there exist invertible P(x) and Q(x) such that B(x) = P(x)A(x)Q(x). P(x) and Q(x) are just products of elementary matrices, so we argue by cases. Suppose B(x) = E(x)A(x), where E(x) is an elementary matrix. We consider the three cases. Let R(x) be an i-by-i minor of A(x) and S(x) the i-by-i minor
of E(x)A(x) in the same position. Suppose E(x) = Pig, a swap of rows. The effect on A(x) is (1) to leave R(x) unchanged or (2) to interchange two rows of R(x), or (3) to interchange a row of R(x) with a row not in R(x). In case (1) S(x) = R(x); in case (2), S(x) = -R(x); in case(3), S(x) is except possibly for a sign, another i-by-i minor of A(x). Next, suppose E(x) is a dilation Di(x) where a is a nonzero scalar. Then either S(x) = R(x) or S(x) = aR(x). Lastly, consider a transvection E(x) = T,j(f(x)). The effect on A(x) is (I) to leave R(x) unchanged, (2) to increase one of the rows of R(x) by f (x) times another of row of R(x), or (3) to increase one of the rows of R(x) by f (x) times a row not of R(x). In cases (1) and (2),
S(x) = R(x); in case (3), S(x) = R(x) ± f(x)C(x), where C(x) is an i-by-i minor of A(x). Thus any i-by-i minor of E(x) is a linear combination of i-by-i minors of A(x). If g(x) is the GCD of all i-by-i minors of A(x) and h(x) is the GCD of all
11.2 The Smith Normal Form (optional)
425
i-by-i minors of E(x)A(x), then g(x) divides h(x). Now A(x) = E(x)-1 B(x) and E-1 (x) is a product of elementary matrices, so by a symmetric argument, h(x) divides g(x). Since these are inonic polynomials, g(x) = h(x). Next, suppose B(x) = E(x)A(x)F(x), where E(x) and F(x) are products of elementary matrices. Let C(x) = E(x)A(x) and D(x) = C(x)F(x). Since D(x)T = F(x)T C(x)T and F(x)T is a product of elementary matrices, the GCD of all i-by-i minors of D(x)T is the GCD of all i-by-i minors of C(x)T. But the GCD of all i-by-i minors of D(x).The same is true for C(x)T and C(x) so the GCD of all i-by-i minors of B(x) = E(x)A(x)F(x) is the GCD of all i-by-i minors of A(x).
0
We are now in a position to argue the uniqueness of the Smith normal form.
THEOREM 11.12 Suppose A(x) is in C[x]'nxn of rank r Let gk(x) denote the GCD of the order k minors of A(x). Let go(x) = I and diag[si(x), s2(x), ... , s,(x), 0, ... ,0] be a Smith normal form of A(x). Then r is the maximal integer with g,(x) nonzero gi(x)
and si(x)= PROOF
gi-i(x)
for i = 1,2,...,r.
We begin by arguing that A(x) is equivalent to diag[si(x),
...
, Sr(X), 0, ... ,0]. These two matrices have the same GCD of minors of order k by the theorem above. Except for the diagonal matrix, these minors are easy to compute. Namely, g,, (x) = s, (x)s2(x) . . . se (x) fork = 1,2, 52(X),
...
r. Thus, the si(x)'s are uniquely determined by A(x).
The polynomials SO), s2(x),
o
... , s,(x) are called the invariant factors of
A(x).
THEOREM 11.13 A(x) and B(x) in C[x I xn are equivalent if they have the same invariant factors.
PROOF
The proof is left as an exercise.
Before we go any further, let's look at an example. Example 11. 1
Let A(x) =
x
x-l
x+2
x2 + x
X2
x2 + 2x
x2-2x x2-3x+2 x2+x-3
El
426
Jordan Canonical Form
A(x)T12(-1) =
x
x-I
x +2
x22
x2
x-2 x22-3x+2 x2+x-3 I
T31(-x + 2)T21(-x)A(x)T17(-1) =
x-1 x-12
0
x
0
0
0
x+I
T31(-x + 2)T21(-x)A(x)T12(-l)T21(-x + 1)T31(-x - 2) _ 1
0
0
0
x
0
0 x+l
0
T23(-I)T31(-x + 2)T21(-x)A(X)T12(-1)T21(-x + I)T31(-x - 2) 0
1
0 0
0
-x-l
x
0 x+l
T23(-1)T31 (-x + 2)T21(-x)A(x)T12(-1)T21(-x +I) T31(-x - 2)T23(1) 1
0
0
0
-1
-x - 1
0 x+1 x+1
1
T32(x + 1)T23(- I)T31(-x + 2)T21(-x)A(x)T12(- l)T21(-x + 1)T31(-x 2)T23(1) = 1
0
0
1
0
0
0
x+1 -x22-x
D2(-I)T32(x+ I)T23(-1)T31(-x+2)T71(-x)A(X)T12(-I )T21(-x+ I)T31 (-x - 2)T23 T32(-x - 1)D3(-1) 1
0 0
0
0
1
0
0
x(x + I)
= SNF(A(x)).
There are other polynomials that can be associated with A(x).
DEFINITION 11.7 (elementary divisors) Let A(x) E C[x]"X". Write each invariant factor in its prime factorization
over C, say si (x) = (x - X,1 )e 1(X - k2)e,2 ... (X - \,,, )e, l = 1, 2, ... , r. However, some of the eij may be zero since si(x) divides si+1(x), ei+1J > eii
i = 1, 2, ... , r - 1, j = 1, 2, ...
, ki. The nontrivial factors (x - Aid )e J are called the elementary divisors of A(X) over C.
11.2 The Smith Normal Form (optional)
427
Example 11.2 Suppose
SNF(A(x)) _ 10 01
0 0
0
0
0 0
0
0 0 (x - 1)(x2 + 1) 0 00 (x - I)(x2 + I )2x
00
0
(x - 1)2(x2 + 1)2x2(x2 -5)
0
0
0
The invariant factors are s1(x) = I, s7(x) = 1, s3(x) = (x - 1)(x2 + 1), sa(x) = (x - 1)(x2 + 1)2x, and s5(x) = (x - 1)2(x2 + 1)2x2(x2 - 5). Now the elementary divisors over the complex field C are (x - 1)22, x - 1,x - 1, (x +
i)2,(x+i)2,x+i,(x-i)2,(x-i)2,x-i,x2,x,x- 15,x+ 15.
Note that the invariant factors determine the rank and the elementary divisors. Conversely, the rank and elementary divisors determine the invariant factors and, hence, the Smith normal form. T o see how this goes, suppose X1, X2, ... , Av
are the distinct complex numbers appearing in the elementary divisors. Let be the elementary divisors containing Xi. Agree > e;k, > 0. The number r of invariant factors must be greater or equal to max(kI, ... , k1)). The invariant factors can then be (x - X; )r,' , (x to order the degrees e, I
reconstructed by the following formula:
si(x) _
P fl(x -
A1)e.r+I-i
for j = 1, 2, ... , r
=1
where we agree (x - \;)ei, = 1 when j > ki. We can learn things about scalar matrices by using the following device. Given
a scalar matrix A E C"", we can associate a matrix with polynomial entries in C[x]"""; this is the characteristic matrix xl - A. So if A = [a;j ] E C"" then x - all
-a12
...
-a21
x - a27
...
xl-A
inC[x]""". Of course,
x - an,, the determinant of x l - A is just the characteristic polynomial of A. The main result here is a characterization of the similarity of scalar matrices.
THEOREM 11.14 For A and B in C"x", the following statements are equivalent:
1. A and B are similar.
428
Jordan Canonical Form
2. xl - A and xI - B are equivalent. 3. xI - A and xI - B have the same invariant. factors. PROOF Suppose A and B are similar. Then there exists an invertible matrix S with B = S-' A S. Then it is easy to see S-' (x! - A )S = x l - B. Conversely,
suppose P(x) and Q(x) are invertible with P(x)(x! - A) = (xl - B)Q(x).
By dividing (carefully), write P(x) = (xi - B)P,(x) + R, and Q(x) _ Q,(x)(xl - A) + R2, where R, and R2 are scalar matrices. Then, by considering degree, we conclude P, (x) - Q, (x) = 0. Therefore, R, = R2 and so R,A = BR,. It remains to prove R, is invertible. Suppose S(x) is the inverse to P(x). Write S(x) = (xl - A)Q2(x) + C, where C is a scalar matrix. Now ! = (xl - B)Q3(x)+ R,C since R,A = BR, and P(x)S(x) = 1. Note Q3(x) = P,(x)(x! - A)Q2(x) + P,(x)C + R,Q,(x). Now, by considering degrees, conclude Q3(x) is zero. Thus R,C = I and we are done. We leave it as an exercise that Jordan's theorem follows from the existence and uniqueness of the Smith normal form.
THEOREM 11.15 (Jordan's theorem) If A is a square complex matrix, then A is similar to a unique Jordan matrix (up to permutation of the blocks).
PROOF The proof is left as an exercise. The uniqueness comes from the uniqueness of the Smith normal form of xI - A. 0
Exercise Set 50 1. With the notation from above, argue that the number of elementary divir
sors of A(x) is Eki. i=
2. Suppose A(x) is invertible in C[x]" "", Argue that det(A(x)) is a nonzero constant and the converse. (Hint: Look at A(x)B(x) = 1 and take determinants of both sides.) 3. Argue that A(x) is invertible in C[x]n"" iff A(x) is a product of elementary
matrices in C[x]"" 4. Prove that the characteristic polynomial of A E C""" is the product of the invariant factors of xl - A.
11.2 The Smith Normal Form (optional)
429
5. Prove that the minimum polynomial of A E C"x" is the invariant factor of xI - A of highest degree. 6. Prove that A E Cn xn is similar to a diagonal matrix iff x I - A has linear elementary divisors in C[x].
7. Prove that if D is a diagonal matrix, the elementary divisors of x] - D are its diagonal elements. 8. Argue that matrix equivalence in C[x]"mxn is an equivalence relation.
9. Prove Theorem 11.13. 10. Prove Theorem 11.15.
Further Reading [Brualdi, 1987] Richard A. Brualdi, The Jordan Canonical Form: An Old Proof, The American Mathematical Monthly, Vol. 94, No. 3, 257-267, (1987).
[Filippov, 19711 A. F. Filippov, A Short Proof of the Theorem on Reduction of a Matrix to Jordan Form, Vestnik, Moscow University, No. 2, 18-19,(197 1). (Also Moscow University Math. Bull., 26,70-71,(197 1).) [F&S, 1983] R. Fletcher and D. Sorensen, An Algorithmic Derivation of the Jordan Canonical Form, The American Mathematical Monthly, Vol. 90, No. 1, 12-16, (1983).
[Gantmacher,1959] F. R. Gantmacher, The Theory of Matrices, Vol. 1, Chelsea Publishing Company, New York, (1959). [G&W, 1981 ] A. Galperin and Z. Waksman, An Elementary Approach to Jordan Theory, The American Mathematical Monthly, Vol. 87, 728-732, (1981).
[G&L&R, 1986] I. Gohberg, P. Lancaster, and L. Rodman, Invariant Subspaces of Matrices with Applications, John Wiley & Sons, New York, (1986).
430
Jordan Canonical Form
[H&J, 1986] R. Horn and C. R. Johnson, Introduction to Matrix Analysis, Cambridge University Press, Cambridge, (1986).
[Jordan, 1870] C. Jordan, Traite des Substitutions et des Equations Algebriques, Paris, (1870), 125.
[L&T, 1985] Peter Lancaster and Miron Tismenetsky, The Theory of Matrices: With Applications, 2nd Edition, Academic Press, Orlando, (1985). [Noble, 1969] Ben Noble, Applied Linear Algebra, Prentice Hall, Englewood Cliffs, NJ (1969).
[Sobczk, 1997] Garret Sobczyk, The Generalized Spectral Decomposition of a Linear Operator, The College Mathematics Journal, Vol. 28, No. 1, January, (1997), 27-38. [Strang, 1980] Gilbert Strang, Linear Algebra and Its Applications, 2nd Edition Academic Press, New York, (1980).
[T&A 1932] H. W. Turnbull and A. C. Aitken, An Introduction to the Theory of Canonical Matrices, Blackie & Son, London, (1932).
Chapter 12 Multilinear Matters
bilinear map, bilinear form, symmetric, skew-symmetric, nondegenerate, quadratic map, alternating
12.1
Bilinear Forms
In this section, we look at a generalization of the idea of an inner product. Let V1, V2, and W be vector spaces over I8 or C, which we denote by IF when it does not matter which scalars are being used.
DEFINITION 12.1 (bilinear map) A bilinear map cp is a function cp : V, x V2 -- W such that 1.
cp(x + y, z) = cp(x, z) + p(y, z) for all x, y E V, all z E V2
2.
p(ax, z) = ap(x, z) for all a E IF for all x E V, all z E V2
3.
cp(x, y + z) = p(x, y) + p(x, z) for all x E V1, all y, z E V2
4.
cp(x, (3y) = p(x, y)P for all R E F, all x E V1, ally E V2.
In particular, if V, = V2 and W = IF, we traditionally call p a bilinear form. We denote L2(V, , V2; W) to be the set of all bilinear maps on V, and V2 with values in W. We write L2(V; W) for L2(V, V; W).
We note that the name "bilinear" makes sense, since a bilinear map is linear in each of its variables. More precisely, if we fix yin V2, the map dP(y) : V, -* W defined by d,p(y)(x) = p(x, y) is linear and the map s,p(x) defined by s,p(x)(y) _ p(x, y) is linear for each fixed x in V1. Here s (x) : V2 - ) W. We also note that the zero map O(x, y) = is a bilinear map and any linear combination of bilinear maps is again bilinear. This says that L2(V,, V2; W) is a vector space over F in its own right. Now let's look at some examples.
431
432
Multilinear Mutters
Example 12.1
1. Let VI and V2 be any vector spaces over IF and let f : VI - -> F and g : V2 - F he linear maps (i.e., linear functionals). Then p(x, y) f (x)g(y) is a bilinear form on VI and V2. 2. Fix a matrix A in 1F"' `n. Define YA(X, Y) = XT AY, where X E IF"'Iy and Y E P't'. Then SPA E L2(IF'n"y, IF' ; IFy"t'). We can make PA into a bilinear form by modifying the definition to YA(X, Y) = tr(XT A Y). 3.
If we take q = p = I in (2) above, we get PA
_
all
a12
a21
a22
ami
anus
al,,, ...
a2m
[xl...xml
n,
an,n
if
= XT AY = E > ail xi yi E
i=li=l In gory detail, xl
yl
xm
yn
F.
YA allxlyl + al2xlx2y2+
+a2lx2yl +a22x2y2+
+a,nlxn,yl +am2xmy2+
...
+aI,xlyn
...
+a2,x2yn
+amn x,,,yn
To be even more concrete, SPA
yl
r xl l ,
y2
L X2 J
Y3 Y1
_
[xIx21l all a21
a12
a22
a13
a23
Y2 1
y3
1 Yl
= [xlaII + x2a21
x1a12 + x2a22
x1a13 + x2a23]
Y2 Y3
= (xlalI + x2a21)y1 + (x1a12 + x2a22)Y2 + (xja13 + x2a23)y3
= allxlyl +a2lx2yl + a12x1y2 + a22x2y2 + a13xly3 +a23x2y3 = (allxlyl + al2xIy2 + a13xly3) + (a21x2y1 + a22x2y2 + a23x2y3).
12.1 Bilinear Forms
433
We spent extra time on this example because it arises often in practice. As an extreme case, cpz : F x F -* F by (pz(x, y) = xzy is a bilinear form for each
fixed scalar z.
Next we collect some elementary computational facts about bilinear forms. THEOREM 12.1 Let cp be a bilinear form on V. Then
/. cp(', y) = 0 = cp(x, -6) = cp(', 1) for all x E V 2. cp(ax + (3y, z) = acp(x, z) + (3cp(y, z) for all a, (3 E F, all x, y, z E V n
3. cp(.Eca;x,, y) = 4.
iE1a1cp(xt, y) for all ai E F. all x;, y E V
cp(x, ay + (3z) = p(x, y)a + cp(x, z)(3 = acp(x, y) + (3cp(x, z) for all
a,(3EFallx,y,zE V m 5.
m
p(x, E (3i yi) = E p(x, Yi )Ri for all 13j E F, all y1, x E V j=1
j=1
6. cp(ax+ 3y, yw+8z) = ay(x, w)y+ay(x, z)8+ 3cp(y, w)y+(3cp(y, z)8
for all a, 3,y, in lFandallx,y,z, win V n
7.
m
if
ni
E RjYj) = E E a;cp(x1,Yi)13j
J=J
1=1 1=I
8.
cp(-x, y) = -cp(x, y) = cp(x, -y) for all x, y in V
9.
p(-x, -y) = cp(x, y) for all x, y in V
10. cp(x + y, x + y) = cp(x, x) + cp(x, y) + cp(y, x) + cp(y, y) for all x, y E V 11. cp(x - Y, X - Y) = cp(x, x) - cp(x, y) - cp(Y, x) + p(Y, y) for all x, y E V
12. cp(x + y, x + y) + cp(x - y, x - y) = 2cp(x, x) + 29(y, y) for all x, y E
V
13. cp(x + Y, x + Y) - cp(x - Y, x - y) = 2pp(x, y) + 2cp(y, x) for all x, y E V
14. cp(x + y, x - Y) = cp(x, x) - p(x, y) + cp(y, x) - cp(Y, y) 15. cp(x - y, x + Y) = cp(x, x) - cp(y, x) + cp(x, y) - cp(y, y)
16. cp(x + Y, x - Y) + 9(x - Y, x + y) = 2p(x, x) - 2pp(Y, y) 17. cp(x - y, x + Y) - pp(x + y, x - y) = 2pp(x, y) - 2p(Y, x) 18.
cp(x, y) + cp(Y, x) = c0(x + y, x + Y) - p(x, x) - p(Y, y)
Multilinear Matters
434
PROOF
The proofs are routine computations and best left to the reader.
0
It turns out arbitrary bilinear forms can be studied in terms of two special kinds of forms. We have the following trivial decomposition of an arbitrary (P
(P(x, y) = 2 [,P(x, y) + pP(y, x)] + 2
y) - P(y, x)]
= IPVVHI(x, y) + pskew(x, Y).
The reader will be asked to verify that y,,,,(x, y) = yvv(y, x) for all x, y and y ,is bilinear and epskew(y, x) _ -(P,'keu,(X, y) where again y.t., is biis called symmetry and the second is skewlinear. The first property of symmetry.
DEFINITION 12.2
(symmetric, skew-symmetric) Let ip be a bilinear form on V. Then we say
1. y is symmetric iff ep(x, y) = y(y, x) for all x, y E V. 2. y is skew-symmetric iff ep(x, y) = -cp(y, x) for all x, y E V. The fancy talk says that L2(V ; F) is the direct sum of the subspace consisting of all symmetric bilinear forms and the subspace of all skew-symmetric bilinear forms. So, in a sense, if we know everything about symmetric forms and everything about skew-symmetric forms, we should know everything about all bilinear forms. (Right!)
DEFINITION 12.3
(nondegenerate)
A bilinear form y : V x W --+ F is called nondegenerate on the left if y(v, w) = 0 for all w E W means v = 6. Also, y is nondegenerate on the right iff y(v, w) = O for all v E V means w = 6. We call y nondegenerate iff y is nondegenerate on the left and on the right.
We have seen that dp(y) : V -* IF and s,(x) : W -* F are linear. That is, for
each y E W, d,,(y) E L(V;F) = V* and for each x E V, sp(x) E L(W;F) _ W*. We can lift the level of abstraction by considering dp : W -* V* and sY : V -> W* by y r--> d,p(y) and x F-- s,p(x). These maps are themselves linear.
A bilinear map cp is a function of two variables. However, there is a natural way to associate a function of one variable by "restricting ep to the diagonal." That is, look at cp(x, x) only.
12.1 Bilinear Forms
435
DEFINITION 12.4 (quadratic map) Let V and W be vector spaces over IF. A quadratic map from V to W is a map Q : V ----+ W such that 1.
Q(ax) = a2 Q(x) for all a E IF, all x E V.
2.
Q(x + y) + Q(x - y) = 2Q(x) + 2Q(y), all x, y E V.
if W = F, we, again traditionally, speak of Q as a quadratic form. THEOREM 12.2 Let Q be a quadratic map. Then 1.
Q(()=0.
2.
Q(-x) = Q(x) for all x E V.
3.
Q(x - y) = Q(y - x) for all x, y E V.
4.
Q(ax + (3y) = Q(ax) + 2a[3Q(x) + Q((3x).
5.
Q(ax + 3y) + Q(ax - 3y) = 2a2 Q(x) + 2R2 Q(y)
6.
Q(Z (x + y)) + Q(Z (x - y)) = 12 Q(x) + 12 Q(y)
7.
Q(x + y) - Q(x - y) = z [Q(x + 2y) + Q(x - 2y)]
8.
2Q(x + z + y) + 2Q(y) = Q(x+z+2y)+ Q(x+z).
9.
2Q(x+y)+2Q(z+y)= Q(x+z+2y)+Q(x-z).
PROOF
Again the proofs are computations and referred to the exercises. 0
Now beginning with p : V x V ----) W bilinear, we can associate a map
0 : V --> W by 6(x) = p(x, x). Then cD is a quadratic map called the quadratic map associated with p.
Exercise Set 51 1. LetA =
f
1
L4 by this matrix.
2
3
5
6
I . Write out explicitly the bilinear form determined
2. Work out alI the computational formulas of Theorem 12.1.
Multilinear Matters
436
3. Verify all the claims made in the previous examples.
4. Let cp be a bilinear form on the IF vector space V. Then p is called alternating iff p(x, x) = 0 for all x E V. Argue that p is alternating if and only if p is skew-symmetric.
5. Let cp be a bilinear form on V, W and S : V ---> V and T : V ---> V be linear maps. Define ps.T(v, w) = p(Sv, Tw) for V E V, W E W. Argue that Ps.T is also a bilinear form on V, W.
6. Let pp be a bilinear form on V. Then p,,,,,(x, y) = z [p(x, y) + p(y, x)] is a symmetric bilinear form on V. Show this. Also show p,tew(x, y) = z [p(x, y) - p(y, x)] is a skew symmetric bilinear form on V. Argue that cp(x, y) = epv,,,,(x, y) + pPv4eu,(x, y) and this way of representing p as a symmetric plus a skew-symmetric bilinear form is unique.
7. Argue that L2(V, W; F) is a vector space over F. Also argue that the symmetric forms are a subspace of L2(V; F), as are the skew-symmetric forms, and that L2(V; F) is the direct sum of these to subspaces.
8. Let V be the vector space C ([-'rr, 'rr]) of F valued functions defined on [-zr, w], which are continuous. Define (p(f, g) = f ,, f (x)g(x)dx. Argue that p is a symmetric bilinear form on V. 9. You have seen how to define a bilinear map. How would you define a trilinear map p : Vi x V2 x V3 -* W. How about a p-linear map?
10. Let p be bilinear on V. Show that p,,.,,,(x, y) = a (cp (x + y, x + y)
-cp(x-y,x-y)}. 11. Let y be a bilinear form on V (i.e., cp E L2(V; IF)). Then s,p(x) : V -
F
and d,p(y) : V - IF are linear functionals. Show s,p : V - L(V; F) = V -) L(V;IF) = V* are linear maps in their own V* and d, :
right and p is nondegenerate on the right (left) if d,p(sp) is injective iff ker(d,o)(ker(s,,)) is trivial. Thus p is nondegenerate iff both s,p and d" are injective. In other words, p is degenerate iff at least one of ker(s,p) or ker(d,p) is not trivial.
12. Suppose a bilinear form is both symmetric and skew-symmetric. What can you say about it. 13. Verify the computational rules for quadratic maps given in Theorem 12.2.
14. Let p : V x V -* W be bilinear. Verify that the associated quadratic map 0: V -> W given by (D(x) = ep(x, x) is indeed a quadratic map.
12.2 Matrices Associated to Bilinear Forms
437
15. A quadratic equation in two variables x and y is an equation of the form
axe+2bxy+cy2+dx+ey+ f = 0.1 Write this equation in matrix form where x =
y 1 and A = I b
.Note xr Ax is the quadratic form
b J
associated to this equation. Generalize this to three variables; generalize it to n variables.
16. Suppose A E R""" and A= AT. Call A(or Q) definite iff Q(x) = xTAx takes only one sign as x varies over all nonzero vectors x in R", Call A positive definite iff Q(x) > 0 for all nonzero x in R", negative definite
if Q(x) < 0 for all nonzero x in R", and indefinite iff Q takes on both positive and negative values. Q is called positive semidefinite if
Q(x) > 0 and negative semidefinite iff Q(x) < 0 for all x. These words apply to either A or Q. Argue that A is positive definite iff all its eigenvalues are positive.
17. Prove that if A is positive definite real symmetric, then so is A-'.
18. Argue that if A is positive definite real symmetric, then det(A) > 0. Is the converse true? 19. Prove that if A is singular, ATA is positive semidefinite but not positive definite.
Further Reading [Lam, 1973] T. Y. Lam, The Algebraic Theory of Quadratic Forms, The Benjamin/Cummings Publishing Company, Reading, MA, (1973).
congruence, discriminant
12.2
Matrices Associated to Bilinear Forms
Let cp be a bilinear form on V and 5 = {b1, b2, ..., be an ordered basis of V. We now describe how to associate a matrix to cp relative to this basis. Let
xand ybe in V. Then x=xibi
andy= yibi +..+y"b,,,so
Multilinear Mutters
438 n
n
n
n
n
n
pp(x, y) = pp(r_xibi, y) = Exip(bi, y) = Exiy(bi, >yjbi) = r_ r_xiyiY i=I
i=I
i=I
j=1
i=lj=I
(bi, bj). Let aid = cp(bi, bj). These n2 scalars completely determine the action of y on any pair of vectors. Thus, we naturally associate the n-by-n matrix A = [aid] = [,.p(bi, b;)] := Mat (,p; B). Moreover, y(x, y) = Mat(x; C3)T Mat(,p; B) Mat(y; B). Conversely, given any n-by-n matrix A = laij] and an ordered basis 8 of V, then p(x, y) = Mat(x; C3)T AMat(y; B) defines a bilinear form on V whose matrix is A relative to 8. It is straightforward to verify that, given an ordered basis B of V, there is a one-to-one correspondence between bilinear forms on V and n-by-n matrices given by p r---* Mat(y,13). Moreover, Mat(acpi + b(p2; B) = aMar((pi; 13) + bMat(cp2; BY The crucial question now is what happens if we change the basis. How does the matrix representing p change and how is this new matrix related to the old one. Let C be another ordered basis of V. How is Mat(y; B) related to Mat(cp, C)? Naturally, the key is the change of basis matrix. Let P be the (invertible) change of basis matrix P = PB+c so that Mat(x; B) = PMat(x; C). Then for any x, y E V, p(x, y) = Mat(x; B)T Mat(cp; C3)Mat(y; B) _ (PMat(x;C))T Mat((p; B)PMat(y;C) = Mat(x;C)T(PT Mat(cp;B)P)Mat(y;C). But also p(x, y) _ Mat(x; C)T (Mat((p;C)Mat(y;C), so we conclude
Mat((p;C) = PT Mat(,p;B)P. This leads us to yet another equivalence relation on n-by-n matrices.
DEFINITION 12.5 (congruence) A matrix A E F" "" is congruent to a matrix B E IF"" iff there exists an B to symbolize invertible matrix P such that B = PT AP. We write A congruence.
THEOREM 12.3 is an equivalence relation on F" x 11, That is,
1. A^- Aforall A.
2.IfA^-B,then B--A. 3. IfA - BandB^- C, thenA - C. PROOF
The proof is left as an exercise.
12.2 Matrices Associated to Bilinear Forms
439
We note that congruence is a special case of matrix equivalence so, for example, congruent matrices must have the same rank. However, we did not demand that PT equal P-' , so congruence is not as strong as similarity. Congruent matrices need not have the same eigenvalues or even the same determinant. Any property that is invariant for congruence, such as rank, can be ascribed to the associated bilinear form. Thus, we define the rank of cp to he the rank of any matrix that represents cp. Moreover, we have the following theorem.
THEOREM 12.4 Let cp be a bilinear form on V. Then cp is nondegenerate iff rank(cp) = dim V.
PROOF
The proof is left as an exercise.
0
Suppose A and B are congruent. Then B = P7 A P for some nonsingular ma-
trix P. Then det(B) = det(PT AP) = det(P)2det(A). Thus, the determinants of A and B may differ, but in a precise way. Namely, one is a nonzero square scalar times the other. We define the discriminant of a bilinear form cp to be {a2det(A) I a i4 0, A represents cp in some ordered basis} This set of scalars is an invariant under congruence. We summarize this section with a theorem. THEOREM 12.5 Let cp be a bilinear form on V and B = {b1, b2, , an ordered basis of V. Then cp(x, y) = Mat(x,13)T Mat(cp; 8)Mat(y; B) where Mat(p, B) = [p(bt, b;)]. Moreover, if C is another ordered basis of V, then Mat(p;C) = (Pc_a)T Mat(cp; l3)Pc,a. Moreover, if two matrices are congruent, then they represent the same bilinear form on V. Also, p is symmetric if Mat(cp; B) is a symmetric matrix and p is skew-symmetric iff Mat(p;13) is a skew-symmetric matrix.
Exercise Set 52 1. Prove Theorem 12.3. 2. Find two congruent matrices A and B that have different determinants.
3. Prove Theorem 12.4. 4. Prove Theorem 12.5.
Multilinear Mutters
440
orthogonal, isotropic, orthosymmetric, radical, orthogonal direct sum
12.3
Orthogonality
One of the most useful geometric concepts that we associate with inner products is to identify when two vectors are orthogonal (i.e., perpendicular). We can do this with bilinear forms as well. Given a bilinear form cp, we can define the related notion of orthogonality in the natural way. Namely, we say x is p-orthogonal to y iff p(x, y) = 0. We can symbolize this by x ..L y. If p is understood, we will simplify things by just using the word "orthogonal" and the symbol 1. Unlike with inner products, some strange things can happen. It is possible for a nonzero vector to be orthogonal to itself! Such a vector is called isotropic, and these vectors actually occur meaningfully in relativity theory in physics. It is also possible that a vector x is orthogonal to a vector y but y is not orthogonal to x. However, we have the following nice result. THEOREM 12.6 Let cp be a bilinear form on V. Then the orthogonality relation is symmetric (i.e., x 1, y implies y 1, x) iff cp is either symmetric or skew-symmetric.
PROOF If y is symmetric or skew-symmetric, then clearly the orthogonality relation is symmetric. So, assume the relation is symmetric. Let x, y, z c V. Let w = p(x, y)z - p(x, z)y. Then we compute that x 1 w. By symmetry w 1 x, which is equivalent to p(x, y)p(z, x) - p(x, z)ep(y, x) = 0. Set x = y and conclude cp(x, x) [p(z, x) - p(x, z)] = 0 for all x, z E V. Swapping x and z, we can also conclude p(z, z) [p(z, x) - p(x, z)] = 0. This seems to be good news since it seems to say p(z, z) = 0 (i.e., z is isotropic or p(z, x) = cp(x, z)). The problem is we might have a mixture of these cases. The rest of the proof says its all one (all isotropic) or all the other (i.e., p is symmetric). So suppose cp is not symmetric (if it is, we are finished). Then there must exist vectors u and v such that p(u, v) i4 p(v, u). Then, by what we showed
above, p(u, u) = p(v, v) = 0. We claim p(w, w) = 0 for all w in V. Since u is isotropic cp(u + w, u + w) = cp(u, w) + cp(w, u) + cp(w, w). If w is not isotropic, then cp(w, x) = p(x, w) for all x E V. In particular, p(w, u) = p(u, w) and cp(w, v) = p(v, w). By above, p(u, w)p(v, u) - p(u, v)cp(w, u) = 0, so cp(u, w) [cp(v, u) - p(u, v)] = 0; but p(u, v) 54 p(v, u), so we must conclude
12.3 Orthogonality
441
Ip(w, u) = p(u, w) = 0. Similarly, p(w, v) = p(v, w) = 0. Thus y(u + w, u + w) = cp(w, w). But y(u + w, v) = p(u, v) + p(u, v) = cp(u, v) # p(v, u) = p(v, u + w). Thus u + w is also isotropic. Therefore, p(w, w) = 0. Therefore, all vectors in V are isotropic and cp is alternating, hence skew-symmetric.
0
To cover both of these cases, we simply call p orthosymmetric whenever the associated orthogonality relation is symmetric.
Now let S be any subset of a vector space V with p an orthosymmetric bilinear form on V. We define Sl = (v E V I (p(v, s) = 0 for all s E S). We define the radical of S by Rad(S) = S fl Sl so that Rad(V) = V'. We see that p is nondegenerate iff Rad(V) THEOREM 12.7 Let S be any subset of the vector space V. 1.
S1 is always a subspace of V.
2.
SC(S1)'.
3. If S, a S2, then Sz a Si .
4. If S is a finite dimensional subspace of V, then S = S. PROOF
The proof is relegated to the exercises.
0
If V is the direct sum of two subspaces Wn and W2 (i.e., V = W, ®W2), we call the direct sum an orthogonal direct sum iff W, c WZ and write V = W, ®1 W2. Of course, this idea extends to any finite number of subspaces. If cp is a bilinear form on V then, by restriction, p is a bilinear form on any subspace of V. The next theorem says we can restrict our study to nondegenerate orthosymmetric forms.
THEOREM 12.8 Let S be the complement of Rad (V ). Then V = Rad (V) ®1 S and p restricted to S is nondegenerate.
PROOF Since Rad(V) is a subspace of V, it has a complement. Choose one, say M; then V = Rad(V) ® M. But all vectors are orthogonal to Rad(V), so
V = Rad(V)®1M.LetvE Mf1Ml.Thenv E Mlsov E Rad(V).Buty E M also, so v E Rad(V) fl M = ( ). Hence v =
. Thus
M fl Ml = (-6). 0
Multilinear Matters
442
Exercise Set 53 1. Prove Theorem 12.7. 1
2. Consider the bilinear form associated with the matrix A = L
0
0
-1
Identify all the isotropic vectors.
3. A quadratic function f of n variables looks like f (x1, x2, ..., xn) _ n
n
>gijxixj + Ecixi + d. What does this formula reduce to if n = 1? i=I
i=I j=1
Argue that f can be written in matrix form f (v) = v7 Qv + cT v + d, where Q is a real n-by-n symmetric nonzero matrix. If you have had some calculus, compute the gradient V f (v) in matrix form.
orthogonal basis, orthonormal basis, Sym(n;C), Sylvester's law of inertia, signature, inertia
12.4 Symmetric Bilinear Forms In this section, we focus on bilinear forms that are symmetric. They have a beautiful characterization.
THEOREM 12.9 Suppose cp is a symmetric bilinear form on the finite dimensional space V. Then
there exists an ordered basis B of V such that Mat(p; 5) is diagonal. Such a basis is called an orthogonal basis.
PROOF
The proof is by induction We seek a basis B = {bI, b2, ...
,
such that p(bi, bj) = 0 if i # j. If 9 = 0 or n = 1, the theorem holds trivially, so suppose 9 # 0 and n > I . We claim p(x, x) 0 for some x E V. If cp(x, x) = 0 for all x E V, then p(x, y) = 1 {ep(x+y, x+y) - cp(x -y, x - y)} _
0 for all x, y, making cp = 0 against our assumption. Let W = sp(x). We claim V = W ®1 W1. First, let z E W n W -L. Then z = ax for some a and z E W1 so z I. z. Thus, cp(z, z) = 0 = cp(ax, ax) = a2p(x, x). Since p(x, x) 54 0, a2 = 0, so a = 0 so z = '. We conclude W n W1 = (V). cp(v, x)
Now let v E V and set b = v - p(x x) x. Then V =
(cp(v, x)
±P(x x)
x + b and
443
/2.4 Symmetric Bilinear Forms
cp(x, b) = p(x, v) - p(v' x) cp(x, x) = 0 since p is symmetric. Thus, b E W1 cp(x, x)
and so V = W ® W1. Now restrict p to W1, which is n - l dimensional. By induction there is a basis (b,, b2, ... , b"_, } with p(x;, bj) = 0 if i # j. Add x to this basis.
0
This theorem has significant consequences for symmetric matrices under congruence. COROLLARY 12.1 Any symmetric matrix over F is congruent to a diagonal matrix.
PROOF Let A be an n-by-n symmetric matrix over F. Let PA be the symmetric bilinear form determined by A on F", (i.e., cpA(x, y) = XT Ay). Then Mat(p;std) = A, where std denotes the standard basis of F". Now, by the previous theorem, there is a basis 5 of 1F" in which Mat((pA; B) is diagonal. But Mat(epA; B) and Mat(ePA; std) = A are congruent and so A is congruent to a diagonal matrix.
0
Unfortunately, two distinct diagonal matrices can be congruent, so the diagonal matrices do not form a set of canonical forms for the equivalence relation 0
d,
d2
and we
of congruence. For example, if Mat(p, CB)
0 d" select nonzero scalars a,, ... , of,,, we can "scale" each basis vector to- get a new
faid,
0
basis C = {a,b,,... , a"b"). Then Mat(cp, C) = 0
a?
and these two matrices are congruent. Now we restrict our attention to the complex field C. Here we see congruence reduces to ordinary matrix equivalence.
THEOREM 12.10 Suppose cp is a symmetric bilinear form of rank r over the n-dimensional space V. Then there exists an ordered basis B = (b,, b2, ... , b" } of V such that 1. Mat(p; B) is diagonal
2. 9(bj, bi) =
I
forj=l,2,...,r
0
for j > r
Multilinear Matters
444
he an orthogonal basis as provided by Theorem PROOF Let (a,, ... , 12.9. Then there exist r values of j such that cp(a,, a1) # 0, else the rank of cp would not be r. Reorder this basis if necessary so these basis vectors become the
first r. Then define bj =
I I
I
cp(a ,, aj )
ai for j =
1,
aiforj>r
r Then we have
our basis so that Mat(p; Ci) -10 ®] .
0
COROLLARY 12.2 If p is a nondegenerate symmetric bilinear form over the n-dimensional space V, then V has an orthonormal basis for p.
Let Sym(n; C) denote the set of n-by-n symmetric matrices over C. Let Ik.,,, be the matrix with k ones and in zeros on the main diagonal and zeros elsewhere.
Then the rank of a matrix is a complete invariant for congruence, and the set of all matrices of the form Ik.,,, for k + m = n is a set of canonical forms for congruence. That is, every matrix in Sym(n; C) is congruent to a unique matrix for some k = 0, 1, ... , n and m = n - k. of the form The story over the real field is more interesting. The main result is named for James Joseph Sylvester (3 September 1814-15 March 1897). THEOREM 12.11 (Sylvester's law of inertia) Suppose ep is a symmetric bilinear form of rank r on the real n-dimensional such that vector space V . Then there is an ordered basis 1 3 = (b, , b2, ... , 1k
Mat(p, B) _
I.
where k is the number of ones,
k + m = r, and k is an invariant for p under congruence.
PROOF Begin with the basis (a,, a,,,... , a } given by Theorem 12.9. Reorder the basis if necessary so that p(a1, at) = 0 for j > r and cp(aj, aj) # 0 for
I < j < r. Then the basis B = {b,, b2, ... ,
where bj =
cp(aj, aj)
aj,
1 < j < r, and bj = ai for j > r yields a matrix as above. The hard part is to prove that k, m, l do not depend on the basis chosen. Let V+ be the sub-
space of V spanned by the basis vectors for which cp(bj, bj) = I and Vthe subspace spanned by the basis vectors for which p(bi, bj) _ -l. Now
k = dim V+. If -6 # x E V+, p(x, x) > 0 on V+ and if
(
x E V-,
then p(x, x) < 0. Let Vl be the subspace spanned by the remaining basis
/2.4 Symmetric Bilinear Forms
445
vectors. Note if x E V1, cp(x, y) = 0 for all y E V. Since B is a basis, we have V = V+ e V-- ® V 1-. Let W be any subspace of V such that cp(w, w) > 0 for all nonzero w E W. Then W fl span {V -, VI) = (6 )
for suppose w E W, b E V -, c E V 1 and w + b + c=-6. Then 0 = cp(w, w + b + c) = cp(w, w) + p(w, b) + cp(w, c) = cp(w, w) + p(w, b) and 0 = p(b, w + b + c) = cp(b, b) + cp(b, w) + -6 = cp(b, b) + cp(w, b). But then p(w, w) = cp(b, b), but cp(w, w) > 0 and cp(b, b) < 0, so the only way out
is cp(w, w) = cp(b, b) = 0. Therefore w = b = 6 so c = V as well. Now V = V+ ® V- ® V -L and W, V-, V' are independent so dim W < dim V+. If W is any subspace of V on which cp takes positive values on nonzero vectors, dim W < dim V+. So if 5' is another ordered basis that gives a matrix /k'.,,,i.ti, then V+ has dimension k1 < dim V+ = k. But our argument is symmetric
in these bases so k < k'. Therefore, k = dim V+ = dim Vi+ for any such basis. Since k + I = r and 0 + 11 = r, it follows l = 11 as well, and since k + l + in = n, we must have in I = in also.
0
Note that V1 = Rad(V) above and dim(V1) = dim V - rank(p). The number dim V+ - dim V- is sometimes called the signature of cp. THEOREM 12.12 Congruence is an equivalence relation on Sym(n;R), the set of n-by-n symmetric matrices over the reals. The set of matrices Ik,t,,,, where k + I + in = n is a set of canonical forms for congruence and the pair of numbers (k, in) or (k + in, k - in) is a complete invariant for congruence. In other words, two real symmetric matrices are congruent iff they have the same rank and the same signature.
Exercise Set 54 1. Suppose A is real symmetric positive definite. Argue that A must be nonsingular.
2. Suppose A is real symmetric positive definite. Argue that the leading principal submatrices of A are all positive definite. 3. Suppose A is real symmetric positive definite. Argue that A can be reduced to upper triangular form using only transvections and all the pivots will be positive.
Multilineur Mutters
446
4. Suppose A is real symmetric positive definite. Argue that A can he factored as A = LDLT, where L is lower triangular with ones along its diagonal, and D is a diagonal matrix with all diagonal entries positive. 5. Suppose A is real symmetric positive definite. Argue that A can he factored as A = LLT, where L is lower triangular with positive diagonal elements.
6. Suppose A is real and symmetric. Argue that the following statements are all equivalent: (a) A is positive definite. (h) The leading principal submatrices of A are all positive definite. (c) A can be reduced to upper triangular form using only transvections and all the pivots will be positive. (d) A can he factored as A = LLT, where L is lower triangular with positive diagonal elements. (e) A can be factored as A = BT B for some nonsingular matrix B. (f) All the eigenvalues of A are positive.
7. Suppose A is a real symmetric definite matrix. Do similar results to exercise 6 hold? 8. Suppose A is real symmetric nonsingular. Argue that A2 is positive definite.
9. Argue that a Hermitian matrix is positive definite iff it is congruent to the identity matrix.
10. Suppose H is a Hermitian matrix of rank r. Argue that H is positive semidefinite iff it is congruent to
® ®] . Conclude H > 0 iff
H = P*P for some matrix P. 11. Argue that two Hermitian matrices A and B are congruent iff they have the same rank and the same number of positive eigenvalues (counting multiplicities).
12. Suppose A and B are Hermitian. Suppose A is positive definite. Argue that there exists an invertible matrix P with P*AP = I and P*BP = D, where D is a diagonal matrix with the eigenvalues of A` B down the diagonal.
13. Suppose A is a square n-by-n complex matrix. Define the inertia of A to he ('rr(A), v(A), B(A)), where rr(A) is the number of eigenvalues of A
12.5 Congruence and Symmetric Matrices
447
(counting algebraic multiplicities) in the open right-half plane, v(A) is the number of eigenvalues in the open left-half plane, and S(A) is the number
of eigenvalues on the imaginary axis. Note that a(A)+v(A)+8(A) = n. Argue that A is nonsingular iff S(A) = 0. If H is Hermitian, argue that ,rr(H) is the number of positive eigenvalues of H, v(H) is the number of negative eigenvalues of H, and b(H) is the number of times zero occurs as
an eigenvalue. Argue that rank(H) = 7r(H)+ v(H) and signature(H) _ 1T(H) - v(H).
14. Suppose A and B are n-by-n Hermitian matrices of rank r and suppos A = MBM* for some M. Argue that A and B have the same inertia.
Further Reading [Roman, 1992] Steven Roman, Advanced Linear Algebra, SpringerVerlag, New York, (1992).
[C&R, 19641 J. S. Chipman and M. M. Rao, Projections, Generalized Inverses and Quadratic Forms, J. Math. Anal. Appl., IX, (1964), 1-11.
12.5
Congruence and Symmetric Matrices
In the previous sections, we have motivated the equivalence relation of con-
gruence on n-by-n matrices over F. Namely, A, B E F"', A -c B iff there exists an invertible matrix P such that B = PT A P. Evidently congruence is a special case of matrix equivalence, so any conclusions that obtain for equivalent matrices hold for congruent matrices as well. For example, congruent matrices must have the same rank. Now P being invertible means that P can he expressed as a product ofelementary matrices. The same is so for Pr. Indeed if PT = En, E,,,_1 ... El, then P = E Ez ... ET . Therefore, B = PT A P = E,,, En, _, ... E I A E i EZ En . Thus, B1is obtained from A by performing pairs of elementary operations, each pair consisting of an elementary row operation and the same elementary column
448
Multilinear Matters
operation. Indeed 0
1
it 0
1
0 0
0
1
k
a
b
c
e
f
1
d g
h
k
0 0
a
aK+b
c
g
kg+h
k
1
0 0
0
1
=
Xa+d k2a+2Xb+e Xc+f Thus, we see if A is symmetric, (i.e., A = AT), then E7 'A E is still symmetric
and, if ET zeros out the (i, j) entry of A where i # j, then E zeros out the (j, i) entry simultaneously. We are not surprised now by the next theorem.
THEOREM 12.13 Let A be a symmetric matrix in Fr""'. Then A is congruent to a diagonal matrix whose first r diagonal entries are nonzero while the remaining n - r diagonal entries are zero.
The proof indicates an algorithm that will effectively reduce a symmetric E, A = PT A matrix to diagonal form. It is just Gauss elimination again: E,,, produces an upper triangular matrix while Ei E; . . . ET produces a diagonal matrix. These elementary operations have no effect on the rank at each stage.
ALGORITHM 12.1
[AII
[EiAEfIEf] -+ ... _
[EII...EIAET...EET...E]
= I DP]
For example, f
A=
I
L
23 5, then
1
2
2 2
5
1
I
2 2:
2
3
5
2
5
5
.
1 T21(-2)T
5
T, (-2)A T21(-2)T
:
1
0
2
I
-2
0
0
-1
1
0
I
0
2
1
5
0
0
1
I
-'
12.5 Congruence and Symmetric Matrices
T31(- 2) A" )T31( -2)T
T21(-2)T T31 (-2)T
1
0
0
1
0
-1
1
0
0
1
5
0
T23(-1)A(2) T23(-1)T
=
449
-2 -2 1
0
0
1
-*
T21(-2)T T31(-2)T
1
0
0
1
0
-2
0
-2
0
0
1
0
0
0
1
0
-1
T23(-1)T
1
Then 1
0
-2
0
1
0
0
-1
1
T
1
A
0 0
0
-2
1
0
-l
=
1
1
0
0
0
-2
0
0
0
1
A couple of important facts need to be noted here. Since P need not be an
orthogonal matrix (P-1 = pT), the diagonal entries of PT AP need not be eigenvalues of A. Indeed, for an elementary matrix E, EAET need not be the same as EAE-I. Also, the diagonal matrix to which A has been reduced above is not unique. Over the complex numbers, congruence reduces to equivalence. THEOREM 12.14 Let A E C"". Then A is congruent to
1
® ® ] . Therefore, two matrices A, L
B E C""" are congruent iff they have the same rank.
450
Multilinear Matters
Exercise Set 55 2
2 4
3 5
3
5
6
1
I. Find a diagonal matrix congruent to
2. Argue that if A is congruent to B and A is symmetric, then B must he
symmetric.
3. Prove directly that the matrix [
0 over R, but they are congruent over C.
12.6
is not congruent to I J
0
0 ,
L
Skew-Symmetric Bilinear Forms
Recall that a bilinear form p is called skew-symmetric if cp(x, y) = -ep(y, x) for all x, y E V. Any matrix representation of p gives a skew-symmetric matrix A7.
= -A and skew-symmetric matrices must have zero diagonals. Suppose u, v are vectors in V with cp(u, v) = 1. Then p(v, u) 1= -1. If we restrict ep to
H = span{u, v}, its matrix has the form
01 L
J.
Such a pair of vectors
is called a hyperbolic pair and H is called a hyperbolic plane.
THEOREM 12.15 Let p be a skew symmetric bilinearform on the F vector space V of n dimensions. Then there is a basis B = { a , , b , , a2, b,, ... , ak, bk, where [ -1
Mat(cp, B) =
0
[-11
0
0 where rank((p) = 2k.
PROOF
Suppose cp is nonzero and skew-symmetric on V. Then there exists
a pair of vectors a, b with cp(a, b) 4 0 say ep(a, b) = a. Replacing a by (I /a)a, we may assume cp(a, b) = 1. Let c = as + Pb. Then ep(c, a) =
12.6 Skew-Symmetric Bilinear Forms
451
ip(aa+(3b, a) = (3p(b, a) = -(3 and cp(c, b) = p(aa+(3b, b) = acp(a, b) = a so c = cp(c, b)a - ep(c, a)b. Note a and b are independent. Let W = span {a, b} We claim V = W ® W j-. Let x be any vector in V and e = p(x, b)a - cp(x, a)b
and d = x - c. Then C E W and d E W1 since cp(d, a) = p(x - ep(x, b)a + p(x, a)b, a) = p(x, a) + p(x, a)p(b, a) = 0 and, similarly, cp(d, b) = 0. Thus, V = W+ W -L. Moreover, W fl W -L = (-0 ), so V = W ®W 1. Now (p restricted to W -L is a skew-symmetric bilinear form. If this restriction is zero, we are done.
If not, there exist a2, b2 in W' with cp(a2, b2) = 1. Let W2 = span {a,, b2}. Then V = W ® W2 ® Wo. This process must eventually cease since we only have finitely many dimensions. So we get p(aj, bj) = 1 for j = 1, ... , k and
p(aj, aj) = cp(b;, bj) = cp(a;, bi) = 0 if i i4 j and if Wj is the hyperbolic plane spanned by {ai, bj }, V = W, ® ... (D Wk ® Wo, where every vector in WO is orthogonal to all aj and bj and cp restricted to WO is zero. It is clear the matrix of y relative to jai, b, , a2, b2, ... , at , bk, c, , ... , ct, } has the advertised form.
0
COROLLARY 12.3
If cp is a nondegenerate skew symmetric bilinear form on V, then dim(V) must be even and V is a direct sum of hyperbolic planes, and Mat ((p; B) is
[
-0 1] for some basis B.
Exercise Set 56 1. Prove that every matrix congruent to a skew-symmetric matrix is also skew-symmetric. 2. Argue that skew-symmetric matrices over C are congruent iff they have the same rank. 3. Prove that a skew-symmetric matrix must have zero trace.
4. Suppose A is an invertible symmetric matrix and K is skew-symmetric
with (A + K)(A - K) invertible. Prove that ST AS = A, where S =
(A + K)-'(A - K).
452
Multilinear Matters
5. Suppose A is an invertible symmetric matrix and K is such that (A + K)(A - K) is invertible. Suppose ST AS = A, where S = (A + K)-1(A K) and I + S is invertible. Argue that K is skew-symmetric.
12.7
Tensor Products of Matrices
In this section, we introduce another way to multiply two matrices together to get another matrix.
DEFINITION 12.6 (tensor or Kronecker product) Let A E C" and B E Cp,q. Then A ® B is defined to be the mp-by-nq
...
C11
C1n
matrix A ® B =
, where Ci1 is a block matrix of'size Con I
.
. .
Cwn
pq-by-pq defined by C,1 _ (ent,1(A))B = aid B. In other words,
A®B=
a11B a21 B
a1 2B
...
a2 2 B
...
an,1 B
a,, 2B
r bit
b12
bl3
b21
b22
b73
a2n B
amnB
For example, all
a12
a21
a22
a31
a32
0
L
1
ali bi,
al lbl2
all b2l a2l bll all b2l
al2bll
al2bl2
a12b13
al lb22
allb13 allb23
a12b21
a12b22
a12b23
a2 1b12
a2lb13
a22bll
a22b12
a22bl3
al l b22
all b23
a22b21
a22b22
a22b23
a31 bll
a3 1b12
a31b13
a32b11
a32b12
a32b13
a31 b21
a3 1 b22
a31 b23
a32b21
a32b22
a32b23
We can view A ® B as being formed by replacing each element all of A by the p-by-q matrix all B. So the tensor product of A with B is a partitioned matrix consisting of m rows and l columns of p-by-q blocks. The ij'h block is a;1 B. The element of A ® B that appears in the [p(i - 1) + r]'h row and [q(j - 1) + s]'t' column is the rs'h element ae1b of a;j B.
12.7 Tensor Products of Matrices
453
Note that A ® B can always be formed regardless of the size of the matrices, unlike ordinary matrix multiplication. Also note that A ® B and B ® A have the same size but rarely are equal. In other words, we do not have a commutative multiplication here. Let's now collect some of the basic results about computing with tensor products.
THEOREM 12.16 (basic facts about tensor products)
1. For any matrices A, B, C, (A ®B) ®C = A ®(B ®C). 2. For A, B m-by-n and C p-by-q,
(A+B)®C=A®C+B®Cand C®(A+B)=C®A+C®B.
3. If a is a scalar (considered as a 1-by-1 matrix),
then a ® A = aA = A ® a for any matrix A.
4. O®A = O = A ®O for any matrix A.
5. If a is a scalar, (aA) ® B = a(A ®B) = A ®(aB) for any matrices A and B.
6. IfD=diag(di,d2,... ,d),then D®A= diag(d, A, d2A,... , d A). 7.
8. For any matrices A, B, (A ® B)T = AT ® BT
.
9. For any matrices A, B, (A ® B)* = A* ® B*.
10. If A is m-by-n, B p-by-q, C n-by-s, D q-by-t, then (A ® B)(C ® D) _ (AC) ® (BD). 11. If A and B are square, not necessarily the same size, then tr(A ® B) = tr(A)tr(B). 12. If A and B are invertible, not necessarily the same size, then (A 0 B)-' _
A-' 0 B-1. 13. If A is m-by-m and B n-by-n, then det(A (& B) = (det(A))"(det(B))".
14. If A is m-by-n and B p-by-q and if A91, BA' are one-inverses of A and B, respectively, then As' 0 is a one-inverse of A 0 B.
Multilinear Matters
454
15. For any, matrices A and B, rank(A ® B) = rank(A)rank(B). Consequently A ® B has full row rank iff A and B have full row rank. A similar statement holds for column rank. In particular A ® B is invertible iff both A and B are.
16. A®B=A®B. PROOF
The proofs will he left as exercises.
All right, so what are tensor products good for? As usual, everything in linear algebra boils down to solving systems of linear equations. Suppose we are trying
to solve AX = B where A is n-by-n, B is m-by-p, and X is n-by-p; say I
-
I [ X3
[
=[
XX4
b2,:
b
If we are clever, we can write
b;;
this as an ordinary system of linear equations: uzz all
ail
0
bpi
a22
0
0 0
xi
a21
X3
b21
0
0
all
a12
X2
bi2
.This is just (/2(& A)x=b
o o a21 a22 x4 J [ b22 where x and b are big tall vectors made by stacking the columns of X and B on top of each other. This leads us to introduce the vec operator. We can take an m-by-n matrix A and form an mn-by- I column vector by stacking the columns of A over each other.
DEFINITION 12.7
(vec) Coll (A) Co12(A)
Let A be an to-by-n matrix. Then vec(A) =
E
C""'+i. Now if A
Col (A) is m-by-n, B n-by-p, and X n-by-p, we can reformulate the matrix equation
AX = B as (1t, ® A)x = b, where x = vec(X) and b = vec(B). If we can solve one system, we can solve the other. Let's look at the basic properties of vec. Note the (i, j) entry of A is the [(j - 1)m + i J`t' element of vec(A). THEOREM 12.17 (basic properties of vec)
1. If x and y are columns vectors, not necessarily the same size, then vec(xyT) = y ® X.
2. For any scalar c and matrix A, vec(cA) = c vec(A). 3. For matrices A and B of the same size, vec(A + B) = vec(A) + vec(B).
455
12.7 Tensor Products of Matrices
4. Let A and B have the same size, then tr(AT B) = [vec(A)]T vec(B). 5.
If A is m-by-n and B n-by-p, then Acoli B Acol2(B)
diag(A, A_. , A)vec(B) = (IN 0
vec(AB) Acol,,(B) A)vec(B) = (BT 0 I,,,)vec(A).
6. Let A be m-by-n, B n-by-p, C p-by-q; then vec(ABC) = (CT ® A)vec(B).
PROOF
The proofs are left as exercises.
We have seen that AX = B can be rewritten as (Ip (& A)vec(X) = vec(B). More generally, if A is m-by-n, B m-by-q, C p-by-q, and X n-by-p, then AXC = B can be reformulated as (CT ® A)vec (X) = vec(B). Another interesting matrix equation is AX + X B = C, where A is m-by-m, B is n-by-n, C is m-by-n, and X, m-by-n. This equation can be reformulated as (A 0 1 + 1,,, (9 BT)vec(X) = vec(C). Thus, the matrix equation admits a unique solution iff A ®1 + Im ® BT is invertible of size mn-by-mn.
Exercise Set 57 1. Prove the various claims of Theorem 12.16. 2. Prove the various claims of Theorem 12.17.
3. Is it true that (A 0 B)+ = A+ ® B+? 4. Work out vec(
a
b
d
e
f
g
h
i
c
) explicitly.
5. Define rvec(A) = [vec(AT )]T . Work out rvec(
a
b
c
d
e h
i
g
f
) explic-
itly. Explain in words what rvec does to a matrix.
6. Prove that vec(AB) = (1t, (& A)vec(B) = (BT(& I)vec(A) = (BT ® A)uec(1 ).
456
Multilinear Matters
7. Show that vec(ABC) = (CT ® A)vec(B).
8. Argue that tr(AB) = vec(A )T vec(B) = vec(BT)T vec(A).
9. Prove that tr(ABCD) = vec(DT )T (CT ® A)vec(B) = vec(D)t (A ® CT)vec(BT ).
10. Show that aA ®(3B = a(3(A (9 B) = (a(3A) ® B = A ®(a(3B).
11. What is (A®B®C)(D®E(9 F)='? 12.
If X E X(A) and µ E X(B) with eigenvectors v and w, respectively, argue that Aµ is an eigenvalue of A ® B with cigenvector v ® w.
13. Suppose v and w are m-by- I . Show that vT ®w = wv7 = w ®vT and vec(vwT) = w ® v.
14. Prove that vec(A) = vec(A).
15. Prove that vec(A*) =
vec(AT).
16. Suppose G E A( l} and HE B(I). Prove that G(9 HE A® B(1}. 17. Prove that the tensor product of diagonal matrices is a diagonal matrix.
18. Prove that the tensor product of idempotent matrices is an idempotent matrix. Is the same true for projections'? 19.
Is the tensor product of nilpotent matrices also nilpotent?
12.7.1 12.7.1.1
MATLAB Moment Tensor Product of Matrices
MATLAB has a built-in command to compute the tensor (or Kronecker) product of two matrices. If you tensor an n-by-n matrix with a p-hy-q matrix, you get an mp-by-nq matrix. You can appreciate the computer doing this kind of operation for you. The command is kron(A,B).
Just for fun, let's illustrate with matrices filled with prime numbers.
>>A = zeros(2); A(:) = primes(2)
A=
2
5
3
7
12.7 Tensor Products of Matrices
457
>>B = zeros(4); B(:) = primes(53)
B=
2
II 13
23 29
41
3
5
17
31
47
7
19
37
53
43
>>kron(A, B) ans = 4
22
46
82
10
55
115
205
6
26
86
15
65
145
215
10
34
94
25
85
155
235
14
38
106
35
95
185
265
6
33
123
14
77
161
287
9
39
58 62 74 69 87
129
21
91
301
15
51
93
141
35
119
203 217
21
57
III
159
49
133
259
371
329
Try kron(eye(2),A) and kron(A,eye(2)) and explain what is going on. This is a good opportunity to create a function of our own that MATLAB does not have built in. We do it with an M-file. To create an.m file in a Windows environment, go to File, choose New, and then choose M-File. In simplest form, all you need is to type
B = A(:) f unction B = vec(A) Then do a Save. Try out your new function on the matrix B created above.
>>vec(B) You should get a long column consisting of the first 16 primes.
Appendix A Complex Numbers
A.1
What is a Scalar?
A scalar is just a number. In fact, when you ask the famous "person on the street" what mathematics is all about, he or she will probably say that it is about
numbers. All right then, but what is a number? If we write "5" on a piece of paper, it might be tempting to say that it is the number five, but that is not quite right. In fact, this scratching of ink "5" is a symbol or name for the number and is not the number five itself. We got this symbol from the Hindus and Arabs. In ancient times, the Romans would have written "V" for this number. In fact, there are many numeral systems that represent numbers. Okay, so where does the number five, independent of how you symbolize it, exist? Well, this is getting pretty heavy, so before you think you are in a philosophy course, let's proceed as if we know what we are talking about. Do you begin to see why mathematics is called "abstract"? We learned to count at our mother's knee: 1, 2, 3, 4, .... (Remember that ` ... " means "keep on going in this manner," or "etc" to a mathematician.)] These numbers are the counting numbers, or whole numbers, and we officially name them the "natural numbers." They are symbolized by I`l = 11 9 2, 3 ... } . After learning to count, we learned that there are some other things you can do with natural numbers. We learned the operations of addition, multiplication, subtraction, and division. (Remember "gozinta" ? 3 gozinta 45 fifteen times.) It turns out that the first two operations can always be done, but that is not true of the last two. (What natural number is 3 minus 5 or 3 divided by 5?) We also learned that addition and multiplication satisfy certain rules of computation. For example, m+n = n+m for all natural numbers m and n. (Do you know the name of this rule?) Can you name some other rules? Even though the natural numbers
seem to have some failures, they turn out to be quite fundamental to all of mathematics. Indeed, a famous German mathematician, Leopold Kronecker (7 December 1823-29 December 1891), is supposed to have said that God
Do you know the official name of the symbol " . . "?
459
460
Complex Numbers
created the natural numbers and that all the rest is the work of humans. It is not clear how serious he was when he said that. Anyway, we can identify a clear weakness in the system of natural numbers by looking at the equation:
m + x = n.
(Al) .
This equation cannot always be solved for x in N (sometimes yes, but not always). Therefore, mathematicians invented more numbers. They constructed the system of integers, Z, consisting of all the natural numbers together with their opposites (the opposite of 3 is -3) and the wonderful and often underrated number zero. Thus, we get the enlarged number system:
Z = {. .. ,-2,-1,0, 1,2,...I. We have made progress. In the system, Z, equation (1) can always be solved for x. Note that N is contained in Z, in symbols, N e Z. The construction of Z is made so that all the rules of addition and multiplication of N still hold in Z. We are still not out of the woods! What integer is 2 divided by 3? In terms of an equation,
ax = b
(A.2)
cannot always he solved in Z, so Z also is lacking in some sense. Well all right, let's build some more numbers, the ratios of integers. We call these numbers fractions, or rational numbers, and they are symbolized by:
lb (Remember, we cannot divide by zero. Can you think of a good reason why not'?) Since a fraction is as good as a itself, we may view the integers as contained in Q, so that our ever expanding universe of numbers can be viewed as an inclusion chain:
NcZcQ. Once again, the familiar operations are defined, so nothing will he lost from the earlier number systems, making Q a rather impressive number system.
In some sense, all scientific measurements give an answer in Q. You can always add, multiply, subtract, and divide (not by zero), and solve equations
A. I What is a Scalar?
461
(A. 1) and (A.2) in Q. Remember the rules you learned about operating with fractions? a
c
ad + be
b
d
bd
\b/ ..dl a
c
b
d
(A . 2)'
(A . 4)
bd
_ ad - bc
(A . 5 )
bd
ad
\b/
(A.6)
\d/
( "Invert and multiply rule" ) It seems we have reached the end of the road. What more can we do that Q does not provide? Oh, if life were only so simple! Several thousand years ago, Greek mathematicians knew there was trouble. I think they tried to hush it up at the time, but truth has a way of eventually working its way out. Consider a square with unit side lengths (1 yard, l meter-it does not matter).
1
1
Figure Al.!: Existence of nonrational numbers. Would you believe we have seen some college students who thought b + they thinking? It makes you shiver all over!
_ °+e !? What were
Complex Numbers
462
The diagonal of that square has a length whose square is two. (What famous theorem of geometry led to that conclusion?) The trouble is there are no ratios of integers whose square is two; that is, x2 = 2 has no solution in Q. Even b is lacking in some sense, namely the equation:
x`-=2
(A.7)
cannot he solved forx in Q, although we can get really close to a solution in Q. It took mathematicians quite awhile to get the next system built. This is the system R of real numbers that is so crucial to calculus. This system consists of all ratio-
nal numbers as well as all irrational numbers, such as f, /.3, V(5, ii, e,
...
Our chain of number systems continues to grow:
NcZcQcR. Once again, we extend our operations to R so we do not lose all the wonderful
features of Q. But now, we can solve x2 = 2 in R. There are exactly two solutions, which we can write as f and -.. Indeed we can solve x2 = p for any nonnegative real number p. Aye, there's the rub. You can prove squares
cannot he negative in R. Did you think we were done constructing number systems? Wrong, square root breath! How about the equation
x = -3?
(A.8)
2
No, the real numbers, a really big number system, is still lacking. You guessed
it, though, we can build an even larger number system called the system of complex numbers C in which (8) can be solved. Around 2000 B.C., the Babylonians were aware of how to solve certain cases of the quadratic equation (famous from our high school days):
axe+bx+c=0. In the sixteenth century, Girolamo Cardano (24 September 1501-21 September 1576) knew that there were quadratic equations that could not be solved over the reals. If we simply "push symbols" using rules we trust, we get the roots of ax 2 + bx + c = 0 to be
r-bf r b2 2a
b22-4ac 2a
2a
--4
But what if b2 - 4ac is negative'? The square root does not make any sense
then in R. For example, consider x2 + x + 1 = 0. Then r =
If
1
2
-2
±
2
.
=
If you plug these "imaginary" numbers into the equation, you do
.
A.1 Whut is a Scular?
463
= -1. What can I say? It works! You can get zero if you agree that tell by the name "imaginary" that these "numbers" were held in some mistrust.
It took time but, by the eighteenth century, the remarkable work of Leonard Euler (15 April 1707-18 September 1783) brought the complex number system into good repute. Indeed, it was Euler who suggested using the letter i for. (Yes, I can hear all you electrical engineers saying "i is current; we use j for
." I know, but this is a mathematics book so we will use i for .) Our roots above can be written as
r=-2f 2/3-,e 1
We are now in a position to give a complete classification of the roots of any quadratic equation ax 2 + bx + c = 0 with real number coefficients. Case 1: b2 - 4ac is positive.
-b
b -4 a c
gives two distinct real roots of the equa2a 2a ± tion, one for the + and one for the -.
Then r =
Case 2 : b2 - 4ac is zero. b is a repeated real root of the equation. Then r = 2a Case 3 : b2 - 4ac is negative.
-b
Then r = 2a
4ac --b 2
C
2a
f
l)(4ac - b2) = -6 2a 2a
4ac - b2 W- if) 2a
- (2a)
i = a ± (3i gives the so-called "complex conjugate pair" of
roots of the quadratic. This is how complex numbers came to be viewed as numbers of the form a + (3i or a + i 13. It took much more work after Euler to make it plain that complex numbers were just as meaningful as any other numbers. We mention the work of Casper
Wessel (8 June 1745-25 March 1818), Jean Robert Argand (18 July 176813 August 1822), Karl Friedrich Gauss (30 April 1777-23 February 1855) (his picture was on the German 10 Mark bill) and Sir William Rowen Hamilton (4 August 1805-2 September 1865) among others. We are about to have a good look at complex numbers, since they are the crucial scalars used in this book. One of our main themes is how the system of complex numbers and the system of (square) complex matrices share many features in common. We will use the approach of Hamilton to work out the main properties of the complex number system.
Complex Numbers
464
Further Reading [Dehacne, 1997] S. Dehaene, The Number Sense, Oxford University Press, New York, (1997).
[Keedy, 1965] M. L. Keedy, Number Systems: A Modern Introduction, Addison-Wesley, Reading, MA, (1965).
[Nahin, 1998] P. J. Nahin, An Imaginary Tale: The Story of ,f
1,
Princeton University Press, Princeton, NJ, (1998). [Roberts, 1962] J. B. Roberts, The Real Number System in an Algebraic Setting, W. H. Freeman and Co., San Francisco, (1962).
A.2
The System of Complex Numbers
We will follow Hamilton's idea of defining a complex number z to be an
ordered pair of real numbers. So z = (a, b), where a and b are real, is a complex number; the first coordinate of z is called the real part of z, Re(z), and the second is called the imaginary part, lm(z). (There is really nothing imaginary about it, but that is what it is called.) Thus, complex numbers look suspiciously like points in the Cartesian coordinate plane R2. In fact, they can be viewed as vectors in the plane. This representation will help us draw pictures of complex numbers. It was Argand who suggested this and so this is sometimes called an Argand diagram. The operations on complex numbers are defined as follows: (Let z i = (xi, yi ) and z2 = (x2, y2) be complex numbers):
1.1 addition: zi e z2 = (xi + x2, y, + y2) 1.2 subtraction: zi e Z2 = (x1 - X2, y, - y2) 1.3 multiplication: zi G Z2 = (x,x2 - yIy2, xi y2 + yI x2) There are two special complex numbers 0 (zero) and 1 (one). We define:
1.4 the zero 0 = (0, 0) 1.5 the multiplicative identity 1 = (I , 0) (x2xi + y2yi xi y2 - x2yi 1 zi 1.6 division: - = J provided z2 # 02 2 2 Z2
xZ + y2
x2 + y2
A.2 The System of Complex Numbers
465
Some remarks are in order. First, there are two kinds of operations going on here: those in the complex numbers ®, O, etc. and those in R, +, etc. We are going to play as if we know all about the operations in R. Then, we will derive all the properties of the operations for complex numbers. Next, the definitions of multiplication and division may look a little weird, but they are exactly what they have to be to make things work. Let's agree to write C = ]R2 for the set of complex numbers equipped with the operations defined above. We can take a hint from viewing C as R 2 and identify the real number a with the complex number (a, 0). This allows us to view R as contained within C. So our inclusion chain now reads
NcZcQcRcc. Also, we can recapture the usual way of writing complex numbers by
(a, b) = (a,0)e(0, b) = a ®(b,0)0(0, 1) = a ®bai Since the operations in R and C agree for real numbers, there is no longer a need to be so formal with the circles around plus and times, so we now drop that. Let's just agree (a, b) = a + bi.
APA Exercise Set 1 1. Use the definitions of addition, subtraction, multiplication, and division to compute zi + zz, zi - zz. z, z2, zi /zz, and zZ, z, , where zi = 3 + 4i, and Z2 = 2 - i. Express you answers in standard form a + bi.
2. If i = (0, 1), use the definition of multiplication to compute i2. 3. Find the real and imaginary parts of:
(a) 3 - 4i (b) 6i (c) (2 + 3i)2 (d) ( _/ 3 --i ) 3
(e) (I + i)10. 4. Find all complex numbers with z2 = i. 5. Consider complex numbers of the form (a, 0), (b, 0). Form their sum and product. What conclusion can you draw'?
Complex Numbers
466
6. Solve z + (4 - 3i) = 6 + 5i for z. 7. Solve z(4 - 3i) = 6 + 5i for z.
8. Let z 0 0, z = a +bi. Compute
1
z
in terms of a and b. What is I , i
I
2+i
9. Prove Re(iz) = -Im(z) for any complex number z. 10. Prove Im(iz) = Re(z) for any complex number z. 11. Solve (1 + i )z2 + (3i )z + (5 - 4i) = 0. 12. What is 09991
13. Draw a vector picture of z = 3 + 4i in 1It22. Also draw 3 - 4i and -4i.
A.3
The Rules of Arithmetic in C
Next, we will list the basic rules of computing with complex numbers. Once you have these, all other rules can be derived.
A.3.1 A.3.1.1
Basic Rules of Arithmetic in C Associative Law of Addition
zi + (Z2 + Z3) = (zi + Z2) + Z3 for alI zi, Z2, Z3 in C. (This is the famous "move the parentheses" law.) A.3.1.2
Existence of a Zero
The element 0 = (0, 0) is neutral for addition; that is, 0 + z = z + 0 = z for all complex numbers z. A.3.1.3
Existence of Opposites
Each complex number z has an opposite -z such that z+(-z)
More specifically, if z = (x, y), then -z = (-x, -y). A.3.1.4
Commutative Law of Addition
zI+ Z2 =Z2+zi for all zi, Z2 in C.
= 0 =(-z)+z.
'?
467
A.3 The Rules of Arithmetic in C A.3.1.5
Associative Law of Multiplication
ZI (Z2Z3) = (ZI Z2)Z3 for all Z I, Z2, Z3 in C.
A.3.1.6
Distributive Laws
Multiplication distributes over addition; that is, ZI(Z2 + Z3) = ZIZ2 + ZIZ3 for all zl, Z2, Z3 in C and also (zl + Z2)Z3 = ZIZ3 + Z2Z3
A.3.1.7
Commutative Law for Multiplication
ZIZ2 = Z2ZI for all zl, Z2 in C.
A.3.1.8
Existence of Identity
One is neutral for multiplication; that is, 1 numbers z in C. A.3.1.9
z = z 1 = z for all complex
Existence of Inverses
For each nonzero complex number z, there is a complex number z-I so that zz-I
x
-y
= z-Iz = 1. In fact, ifz = (x, y) # 0, then z-I = Cxz + y2' x2 + y2)
We will illustrate how these rules are established and `leave most of them to you. Let us set a style and approach to proving these rules. It amounts to "steps and reasons"
PROOF of 2.1 Let ZI = (al, bl), Z2 = (a2, b2), and Z3 = (a3.b3)
Then zl + (z2 + Z3) = (al , bl) + ((a2, b2) + (a3,b3) by substitution, by definition of addition in C, = (albs) + (a2 + a3, b2 + b3) = (al + (a2 + a3), bl + (b2 + b3)) by definition of addition in C, = ((al + a2) + a3, (bl + b2) + b3) by associative law of addition in R, by definition of addition in C, = (al + a2, bl + b2) + (a3b3) = ((al , bl) + (a2, b2)) + (a3, b3) by definition of addition in C, by substitution. = (zI + Z2) + Z3
a
You see how this game is played'? The idea is to push everything (using definitions and basic logic) back to the crucial step where you use the given properties of the reals R. Now see if you can do the rest. We can summarize our discussion so far. Mathematicians say that we have established that C is afield.
468
Complex Numbers
APA Exercise Set 2 1. Prove all the other basic rules of arithmetic in C in a manner similar to the one illustrated above.
2. Provezi z2=0iffzi =0orz2=0. 3. Prove the cancellation law: z,z = z2z and z $ 0 implies z, = Z24. Note subtraction was not defined in the basic rules. That is because we can define subtraction by z, - Z2 = z, +(-z2). Does the associative law work for subtraction? 5. Note division was not defined in the basic rules. We can define z, - Z2 = z, by z, - Z2 = z, z; , if z2 0 0. Can you discover some rules that work Z2
for division?
6. Suppose z = a + bi
a
_
a
I az + b2
0. Argue that f _ V
2
(
1+
az + bz
1Va2+b2(a 2)ifb>_OandF7 + z
+i
1-
a
a2 + b2
) if b < 0. Use this information to find
7. Argue that axz+bx+c = O has solutions z = -
and z =
b
- (2a)
(2ab
-V
)L -
c
a
bz
I + i.
(_)+J(_)2
aC
even if a, b, and c are complex
numbers.
A.4
Complex Conjugation, Modulus, and Distance
There is something new in C that really is not present in R. We can form the complex conjugate of a complex number. The official definition of complex
conjugate is: if z = (a, b) = a + bi, then z := (a, -b) = a - bi. Next we collect what is important to know about complex conjugation.
A.4 Complex Conjugation, Modulus, and Distance
A.4.1
469
Basic Facts About Complex Conjugation
(3.1)Z=zforallzinC. (3.2)zi+z2=Zi+Z2 forallz1,z2inC. (3.3) zi -z2 =Zi - Z2 for all Zi, Z2 in C. (3.4) TI-z2 = Z,Z2 for all zi, z2 in C.
(3.5)
(-ZI
_? forallzi,z2inCwith Z2
Z2
(3.6)
- i for all nonzero z in C. /I
C1\ z
1
As illustration again, we will prove one of these and leave the other proofs where they belong, with the reader.
PROOF of 3.4 Let z, = (a I, b,) and z2 = (a2, b2). Then, by substitution, ziz2 = (a,, b,) (a2, b2) = (a,a2 - b1b2, a,b2 + b,a2) by definition of multiplication in C,
= (a,a2-b,b2, -(a,b2+b,a2)) = (a,a2 - b1b2, -a,b2 - b,a2)
by definition of complex conjugate, by properties of additive inverses
in R,
= (a, a2 - (-b,) (-b2), a, (-b2) + (-b, )a2) by properties of additive inverses in R,
_ (a,, -b,)(a2, -b2)
bydefinitionofmultiplication
in C, by substitution.
= Z, Z2
0
Again, see how everything hinges on definitions and pushing back to where you can use properties of the reals. Next, we extend the idea of absolute value from R. Given a complex number
z = (a, b) = a + bi, we define the magnitude or modulus of z by IzI := a2 +b2. We note that if z is real, z = (a, 0); then IzI = a2 = Ial the absolute value of a. Did you notice we are using I I in two ways here? Anyway, taking the modulus of a complex number produces a real number. We collect the basics on magnitude.
A.4.2
Basic Facts About Magnitude
(3.7) zz = Iz12 for all complex numbers z, (3.8) IzI ? 0 for all complex numbers z,
(3.9) IzI =0iffz=0, (3.10) IZIZ2l = Izi I IZ2l for all ZI, Z2 in C,
(3.11) I z' I z2
=
Izj I IZ2I
for all ZI, Z2 in C with z2 A 0,
Complex Numbers
470
(3.12) Izi + z21 _< Izi I + Iz21 for all zI, z2 in C (the triangle inequality),
(3.13)
z
= izil- for all z # 0 in C,
(3.14)11z,I-Iz211 _< Izi -Zz1forallZl,Z2inC, (3.15) IzI = I -zI for all z in C, (3.16) IzI = IzI. Again, we leave the verifications to the reader. We point out that you do not always have to go hack to definitions. We have developed a body of theory (facts) that can now be used to derive new facts. Take the following proof for example.
PROOF of 3.10 IZIZ212 = (z1z2)(z,z2)
= (z,z2)(z,z2) = ((zIZ2)zI)z2 = (z, (z2z, W Z2
= ((zizi)z2)z2 = (z,z,)(z2z2)
=
1Z1 j2 Iz212
= (IzI I Iz21)2
by 3.7, by 3.4, by 2.5, by 2.5, by 2.5 and 2.7, by 2.5, by 3.7,
by the commutative law of multiplication in R.
Now, by taking square roots, 3.10 is proved.
0
One beautiful fact is that when you have a notion of absolute value (magnitude), you automatically have a way to measure distance, in this case, the distance between two complex numbers. If zI and z2 are complex numbers, we define the distance between them by d(z1, z2) IzI - Z21 . Note that if zI = (a,, b,) and z2 = (a2, b2), then d(zl, z2) = IzI - z21 = I(ai - a2, b, - b2)I = (a, - a2)22 + (b, - b2)2. This is just the usual distance formula between points in R2! Let's now collect the basics on distance. These follow easily from the properties of magnitude.
A.4.3
Basic Properties of Distance
(3.17)d(zI,Z2)>0 for all Zi, z2 in C.
(3.18) d(z,,Z2)=0iffz, =Z2forz,, Z2 in C. (3.19) d(z1, z2) = d(z2, z,) for all z,. Z2 in C. (3.20) d(z,, z3) -< d(z,, z2) + d(z2, z3) for all z,, z2, z3 in C (triangle inequality for distance).
A.4 Complex Conjugation, Modulus, and Distance
471
3.20d(zi+w,z2+w)=d(z1,Z2)for all zi,Z2,win C (translation invariance of distance). We leave all these verifications as exercises. Note that once we have distance
we can introduce calculus since we can say how close two complex numbers are. We can talk of convergence of complex sequences and define continuous functions. But don't panic, we won't. We end this section with a little geometry to help out our intuition.
T
Imaginary Axis
i 1R /,'
z = (a,b) = a + bi
Real Axis
z = (a,-b) = a - bi
Figure A1.2:
Magnitude and complex conjugate of z = a + bi.
Note that Izl is just the distance from z to the origin and z is just the reflection of z about the real axis. The addition of complex numbers has the interesting interpretation of addition as vectors according to the parallelogram law. (Remember from physics?)
472
Complex Numbers
(a+c, b+d)
Figure A1.3:
Vector view of addition of complex numbers.
(You can check the slopes of the sides to verify we have a parallelogram.) The multiplication of complex numbers also has some interesting geometry behind it. But, for this, we must develop another representation.
APA Exercise Set 3 1. Prove the basic facts about complex conjugation left unproved in the text.
2. Prove the basic facts about modulus left unproved in the text. 3. Prove the basic facts about distance left unproved in the text.
4. Find the modulus of 7 - 3i, 4i, 10 - i.
A. 5 The Polar Form of Complex Numbers
473
5. Find the distance between 3 - 2i and 4 + 7i. if z2 A 0.
= i 2, 6 - 3i 7. Calculate 2 - i using the formula in problem 6 above.
6. Prove z2
8. Find z if (5 + 7i )Z = 2 - i.
i+
9. Compute .
i-z
10. Prove Re(z) =
Z(z + z),
/m(z) = Z; (z - z) for any z in C.
11. Prove IRe(z) ICI zl for any z in C.
12. Prove lRe(z)l + l!m(z)l
vf2 Izl
.
13. Prove z is real iff z = z; z is pure imaginary iff
14. Prove that forallz,winC,
-z. lz+w12+Iz-w12=2Iz12+21w12.
15. Prove that forallz,winC, Izwl <
(Iz12+Iw12). z
16. For complex numbers a,b,c,d, prove that la - bI I c - d l + la - d I lb - cl > Ia - cl lb - dl . Is there any geometry going on here? Can you characterize when equality occurs?
A.5
The Polar Form of Complex Numbers
You may recall from calculus that points in the plane have polar coordinates as well as rectangular ones. Thus, we can represent complex numbers in two ways. First, let's have a quick review. An angle in the plane is in standard position if it is measured counterclockwise from the positive real axis. Do not forget, these angles are measured in radians. Any angle 0 (in effect, any real number) determines a unique point on the unit circle, so we take cos 0 to be the first coordinate of that point and sin 0 to be the second. Now, any complex number z = (a, b) = a + bi # 0 determines a magnitude r = IzI = a2 -+b2 and an angle arg(z) in standard position. This angle 0 is not unique since, if 0 = arg(z), then 0 + 2Trk, k E 7G works just as well. We take the principal argument Arg(z) to be the 0 such that -Tr < 0 < Tr.
474
Complex Numbers
z = (a,b)
Figure A1.4:
Polar form of a complex number.
From basic trigonometry we see cos 0 =
a
=
IzI
z = a + bi = r(cos 0 + i sin 0).
a
r
b
and sin 0 =
Izi
= b. Thus r
There is an interesting connection here with the exponential function base e. Recall from calculus that x4 ex = l +x+-+-+-+...+-+... 2! 3! 4! n! x2
x3
xn
.
Let's be bold and plug in a pure imaginary number i 0 for x.
e'0 = l +(i0)+
=1+
i0
02
03 4
Z +
4
=cos0+isin0.
6
-
6i +
04
8
3
i(0
-
ei
S
+ si
f+ 7
9
9i...
A.5 The Polar Form of Complex Numbers
475
This is the famous Euler formula eme = cos 0 + i sin 0.
We can now write
z = a + bi =
re'0,
where r = Izi and 0 = arg(z) for any nonzero complex number z. This leads to a very nice way to multiply complex numbers. THEOREM A.1 Let z i = r, (cos 01 + i sin 0i) = r, eie' and z2 = r2(cos 02 + i sin 02) = r2eie2. Then zrz2 = r1 r2 [cos(01 + 02) + i sin(0i + 02)] = rIr2e'(e'+02)
PROOF
We compute z i z2 = r, (cos 01 + i sin 01)r2(cos 02 + i sin 02) =
r1 r2[(cos 01 cos 02-sin 01 sin 02)+i(sin 0r cos 02+cos 01 sin 02)] = ri r2[cos(Or +02) + i sin(0r + 02)] using some very famous trigonometric identities. 0
In words, our theorem says that, to multiply two complex numbers, you just multiply their magnitudes and add their arguments (any arguments will do). Many nice results fall out of this theorem. Notice that we can square a complex number z = reie and get z2 = zz = r2e2'e. Who can stop us now? z3 = r3es;e z4 = r4e410 ... Thus, we see that the theorem above leads to the famous theorem of Abraham DeMoivre (26 May 1667-27 November 1754). THEOREM A.2 (DeMoivre's theorem) If 0 is any angle, n is any integer, and z = re' e, then z" = r"erne = r" (cos(n0)+ i sin(nO)).
PROOF An inductive argument can be given for positive integers. The case n = 0 is easy to see. The case for negative integers has a little sticky detail.
First, if z = re'e # 0, then (re'e)(I a-'e) = eo = 1. This says z-1 = I e-1e. r r Now suppose n = -m, where m is a positive integer. Then z" = (re'°)" _
(reierm =
((reie)-i)m =
(I a-i0)mn = r-mei(-m)o = rnetne
0
r
One clever use of DeMoivre's theorem is to turn the equation around and find the nth roots of complex numbers. Consider z" = c. If c = 0, this equation
has only z = 0 as a solution. Suppose c # 0 and c = p(cos g + i sin g) = z" = r"(cos(nO) + i sin(nO)). Then we see p = r" and 9 = nO or, the other way around, r = p'/" and g = go + 2k'rr, where go is the smallest nonnegative
Complex Numbers
476
Po + 2kir where k is any integer. To get n distinct angle for c. That is, 0 = n roots, we simply need k = 0, 1, 2, ... THEOREM A.3 (on Nth roots)
For c = p(cos yo + i sin yo), p # 0, the equation z" = c, it a positive In
integer has exactly n distinct roots, namely z = p cpo + 2k tr
isin
)),
n
yo + 2k-tr 1
cos (
)+
n
where k=0,1,2...
For example, let's find the fifth roots of c = -2 + 2i. First, r = 1-2 + 2i I _ 37r
so yo = 135° = 3irr/4. Thus, c = 23/2(cos 4 + i sin
=
4) _
23/2ei3n'4. So the fifth roots are 2-1/10e 3,r/20
= 23/10 (cos
37r
20
+ i sin
;
20 23/ loe' 19w/20 = 23/ 10(cos 19Tr
20
23/ 10e i357r/20
= 23110(cos 27° + i sin 27°),
20
117r 3/10 (COs nn° +t stn 99° ), +t stn-)=2 20
23t toe IOrr/2° = 23/ 10(eos 11 Tr
23/ 10e127Tr'20
37r)
19ar
+ i sin
23/ 10(cos 171° +i sin 171`),
20 27Tr
= 23t10(cos 20 + i sin
23110(cos 243° + i sin 243°),
20 35,7r
= 23"10(cos 20 + i sin
20
=
23/ 1o(cos 315° + i sin 315°).
A special case occurs when c = 1. We then get what are called the roots of unity. In other words, they are the solutions to z" = I. THEOREM A.4 (on roots of unity) The it distinct nth roots of unity are 2,rrk
en
= cos
(211k) n
(2,rrk
+ i sin
n
l
,
where k = 0, 1, 2... , n - 1.
/)
Moreover, if we unite w = cos 12n I + i sin 12-n n
I
\\\n/
, then w is the nth root
of unity having the smallest positive angle and the nth roots of unity are w, w2, w3, ... , wn-1 . Note that all roots of unity are magnitude one complex numbers, so they all lie on the unit circle in R 2. There is some nice geometry here. The roots of unity lie at the vertices of a regular polygon of n sides
A.5 The Polar Form of cornplex Numbers
477
inscribed in the unit circle with one vertex at the real number 1. In other words, the unit circle is cut into n equal sectors.
.Find Example Al the four fourth roots of I and plot them
Figure A1.5:
I
7r
'Tr
cos 7r + i sin 7r = -1, 37r
W3 = COST + i sin
37r
2
= -l.
Complex Numbers
478
APA Exercise Set 4 1.
Express 10(cos 225` + i sin 225") in rectangular form. Do the same for b)
6(cos 60° + i sin 60°) 17(cos(270° + i sin 270")
c) d)
2f(cos 150° + i sin 150`) 6f(cos45° + i sin 45°)
e)
12(cos 330° + i sin 330°) 40(cos 70° + i sin 70°) 6(cos 345° + i sin 345°)
a)
f) g) 2.
h)
18(cos 135° + i sin 135
i)
41(cos90°+isin90")
j)
5 /(cos 225° + i sin 225
10(cos300° + i sin 300") 22(cos 240° + i sin 240') m) 8(cos 140° + i sin 140") n) 15(cos 235° + i sin 235') k) 1)
Write the following complex numbers in polar form. j) 14 a) 8 s) 18i
b) - 19i
k) 3 . - i 3 4-
1) -4-4i d) -13-i13. m) +i 15 n) 12 + 5i e)4-3i o)-13-10i t)6+8i c)
15 - i f
g)5/ +i5' h) 31i
i) -4f -4i
P) - 7 - 7i
q) -8'+8i
r) 15f -i15f
t)- 11+ lli u) 10-il0.
v)2f -i2/ w) - 6 + 12i
x)7- lli y) - 23
z)3-i3f3aa) 10 + 20i
3. Perform the indicated operations and express the results in rectangular form. a) [2(cos 18° + i sin 18°)] [4(cos 12° + i sin 12°)] b) [ 10(cos 34° + 34°)] [3(cos 26° + i sin 26°)] c) [7(cos 112° + i sin 112°)] [2(cos 68° + i sin 68°)] d) [6(cos 223° + i sin 223°)] [cos 227° + i sin 227°)] e)
12(cos 72° + sin 72°)
3(cos 42° + sin 42°) 24(cos 154° + sin 154°) 6(cos 64° + i sin 64°) 42(cos 8° + i sin 8°)
g) h) 4.
7(cos 68° + i sin 68°)
6f(cos 171° + i sin 171°) 2(cos 216° + i sin 216')
Express in rectangular form. a) [2(cos 30° + i sin 30°)14 b) [4(cos 10° + i sin 10")]6 c) [3(cos 144° + i sin 144°)]5
d) [2(cos 210° + i sin 210°)]'
A.5 The Polar Form of Complex Numbers
e) [2(cos 18° + i sin 18°)
f) [ 3 (cos 30° + i sin
ff
479
5
30°)] -6
g) [(cos 15° + i sin 15°)]100 h) (cos 60° + i sin 60°)50
i)(2 - 2
. 030
/3-4O
j)(-2+ 2:) 1
k) (
+ i)5
1) (v-i,5)9 5. Find the fourth roots of -8 + i 8
6. Find in polar form the roots of the following. Draw the roots graphically. a)
b) c)
d) e) f)
g) h) i)
j)
k)
1)
m)
x2 = 36(cos 80° + i sin 80°) x2 = 4(cos 140° + i sin x3 = 27(cos 72° + i sin 72°) x3 = 8(cos 105° + i sin 105°) x4 = 81(cos 64° + i sin 64°) x4 = 16(cos 200° + i sin 200°) X5 = (cos 150° + i sin 150°) x6 = 27(cos 120° + i sin 1400)
x2=l+if
1200)
x2=8-i8
x3+4+if3- =0
x3-8f-i8f = 0 X4 =2-iJ
n)
X4 = - 8 + i 8
o)
X5-32=0
P)
x5 + 16
f
+ IN = 0
7. Find all roots and express the roots in both polar and rectangular form.
a) x2 + 36i = 0
b) x2=32+i32f c) x3-27=0
d) x3 - 8i = 0
e) x3+216=0 f) X3+27i = 0 g) x3 - 64i = 0
h) x4+4=0 i) x4+81 =0 j) x4 = -32 - i32,/.3
k) x6-64=0
480
Complex Numbers
1) x6+8=0
m) x5-I=0
n) x5 - 2431 = 0
8. Find necessary and sufficient conditions on the complex numbers z and w so that zz = ww.
A.6
Polynomials over C
It turns out that properties of polynomials are closely linked to properties of matrices. We will take a few moments to look at some basics about polynomials. But the polynomials we have in mind here have complex numbers as coefficients, such as (4+ 2i )xz + 3i x +(,,f7-r +ezi ). We will use C [x] to denote the collection of all polynomials in the indeterminate x with coefficients from C. You can add
and multiply polynomials in C [x] just like you learned in high school. First, we focus on quadratic equations, but with complex coefficients. This, of course, includes the real coefficient case. Let ax 2 + bx + c = 0, where a, b, and c are in C. We will show that the famous quadratic formula for the roots of the quadratic still works. The proof uses a very old idea, completing the square. This goes back at least to the Arab mathematician Al-Khowarizmi (circa 780-850) in the ninth century A.D. Now a is assumed nonzero, so we can write c x + ba-x = --. a
of the first degree term squared) and get b
b
a
2a
X+ x+
z
z
2b
Do you remember what to do'? Sure, add
to each side (half the coefficient
(
c
= -a +
(b\2 2a
The left-hand side is a perfect square! z Cx+2a) -xz+ax+(2a)--a+I/ 2a z
Now form a common denominator on the right Cx
b
+ 2a)
z
b2 - 4ac 4a2
)2.
A.6 Polynomials over C
481
We have seen how to take square roots in C, so we can do it here. x +
b2 - 4ac
b 2a =
±
b
2a
'
b2 _-4 a c
X
2a 2a In 1797, Gauss, when he was only 20 years old, proved probably the most crucial result about C [x] I. You can tell it must be good by what we call the theorem today; it is called the Fundamental Theorem of Algebra.
THEOREM A.5 (the fundamental theorem of algebra) Every polynomial of degree n > I in C[x] has a root in C. We are in no position to prove this theorem right here. But that will not stop us from using it. The first consequence is the factor theorem, every polynomial in C [x] factors completely as a product of linear factors.
THEOREM A.6 (the factor theorem)
Let p(x) be in C [x] and have degree n > 1. Then p(x) = a(x - r,)(x where a, r, , r_.. , r are complex numbers with a # 0. The r2) ... (x -
numbers r, , r, ... r (possibly not distinct) are the roots of p(x) and a is the coefficient of the highest power of x in p(x).
PROOF By the fundamental theorem, p(x) has a root, r1. But then we can factor p(x) = (x - ri)q(x), where q(x) E C [x] and the degree of q(x) is one less than the degree of p(x). An inductive argument then gets us down to a last
factor of the form a(x -
0
In the case of real polynomials (i.e., R [x]), the story is not quite as clean, but we still know the answer. THEOREM A.7
Every polynomial p(x) in R [x] of degree at least one can be factored as a product of linear factors and irreducible quadratic factors. PROOF
Let p(x) E R [x] with p(x) =
a,x + ao.
Here the coefficients ai are all real numbers. If r is a root of p(x), possibly com-
plex, then so is 7 (Verify p(r) = p(T)). Thus, nonreal roots occur in complex conjugate pairs. Therefore, if s i , ... sk are real roots and r1, T , r2, T2 ... are the complex ones we get inC [x], p(x) =a(x-s1)...(x-sk)(x-rl)(x-Ti)....
482
Complex Numbers
But (x - rj)(x - I' j) = x2 - (rj + rj )x + r? j is an irreducible quadratic in II8[x], so the theorem is proved.
0
APA Exercise Set 4 1. Wouldn't life be simpler if fractions were easier to add? Wouldn't it he great if + ti = a+n? Are there any real numbers a and b that actually work in uthis formula? Are there any complex numbers that actually work in this formula?
Further Reading [B&L, 1981 J J. L. Brenner and R. C. Lyndon, Proof of the Fundamental Theorem of Algebra, The American Mathematical Monthly, Vol. 88, (1981), 253-256. [Derksen, 20031 Harm Derksen, The Fundamental Theorem of Algebra and Linear Algebra, The American Mathematical Monthly, Vol. 110, No. 7, Aug-Sept (2003), 620-623. [Fine, 1997] B. Fine and G. Rosenberger, The Fundamental Theorem of Algebra, Springer-Verlag, New York, (1997).
[H&H, 2004] Anthony A. Harkin and Joseph B. Harkin, Geometry of Generalized Complex Numbers, Mathematics Magazine, Vol. 77, No. 2, April, (2004), 118-129. [Ngo, 1998] Viet Ngo, Who Cares if x2 + 1 = 0 Has a Solution?, The College Mathematics Journal, Vol. 29, No. 2., March, (1998), 141-144.
A.7
Postscript
Surely by now we have come to the end of our construction of number systems. Surely C is the ultimate number system and you can do just about anything you want in C. Well, mathematicians are never satisfied. Actually, the story goes on. If you can make ordered pairs in R2 into a nice number system,
A. 7 Postscript
483
why not ordered triples in IR3. Well, it turns out, as William Rowan Hamilton discovered about 150 years ago, you cannot. But this Irish mathematician discovered a number system you can make with four-tuples in R4. Today we call this system IHI, the quaternions. If q, = (a, b, c, d) and q2 = (x, y, u, v),
then q, ® q2 = (a + x, b + y, c + u, d + r) defines an addition that you probably would have guessed. But take a look at the multiplication! q, O q2 =
(ax-by-cu-dv,ay+bx+cu-du,au+cx+dy-bv, av +dx +bu -cy). How many of the properties of C also work for 1HI? There is a big one that does not. Can you find it? Now C can he viewed as being contained in IHI by taking four-tuples with the last two entries zero. Our chain now reads
NcZcQcRcCCH. Mathematicians have invented even weirder number systems beyond the quaternions (see Baez [20021). To close this section, let me tempt you to do some arithmetic in the finite system 10, 1, 2, 3) where the operations are defined by the following tables:
+
0
1
2
0
3
0
0
0
1
2
3
1
1
0
3
2
1
2
2
3
0
1
2
1
0
3
3
3
2
0 0 0 0
1
0
2
3
0
1
0 2
2
3
1
3
1
2
3
Can you solve 2x = 1 in this system'? Can you figure out which of the rules of C work in this system? Have fun!
Further Reading [Baez, 20021 J. C. Baez, The Octonions, Bulletin of the American Math-
ematical Society, Vol. 39, No. 2, April (2002), 145-205. Available at . [Kleiner, 1988] I. Kleiner, Thinking the Unthinkable: The Story of Com-
plex Numbers (With a Moral), Mathematics Teacher, Vol. 81, (1988), 583-592. [Niev, 1997] Yves Nievergelt, History and Uses of Complex Numbers, UMAP Module 743, UMAP Modules 1997,1-66.
Appendix B Basic Matrix Operations
Introduction
B.1
In Appendix A, we looked at scalars and the operations you can apply to them. In this appendix, we take the same approach to matrices. A matrix is just a rectangular array of scalars. Apparantly, it was James Joseph Sylvester (3 September 1814-15 March 1897) who first coined the term "matrix" in 1850. The scalars in the array are called the entries of the matrix. We speak of a matrix having rows and columns (like double-entry bookkeeping). Remember,
columns hold up buildings, so they go up and down. Rows run from left to right. So, an m-by-n matrix A (we usually use uppercase Roman letters to denote matrices so as not to confuse them with scalars) is a rectangular array of scalars arranged in m horizontal rows and n vertical columns. In general then, an m-by-n matrix A looks like all
a12
...
aln
a21
a22
...
a2n
A
la1
an,l
amt
...
mxn
_= rra11 m , n
_ [fa,
anu,
where each aid is in C for I < i < m and I < j < n. In a more formal treatment, we would define a matrix as a function A : [m] x [n] --p C where [m] = [ 1 , 2,... , m} and [n] = ] 1, 2,... , n} and A(i, j) = aid, but we see no advantage in doing that here. Note that we can locate any entry in the matrix by two subscripts; that is, the entry aid lives at the intersection of the i th row and j th column. For example, a34
denotes the scalar in the third row and fourth column of A. When convenient, we use the notation entij (A) = aid; this is read "the (i, j) entry of A" The index i is called the row index and j the column index. By the ith row of A we mean the 1-by-n matrix rowi(A) = [a11 ail ... air]
485
Basic Matrix Operations
486
and by the jth column of A we mean the m-by-I matrix
1
We say two matrices are the same size if they have the same number of rows and the same number of columns. A matrix is called square if the num-
ber of rows equals the number of columns. Let C"' denote the collection of all possible m-by-n matrices with entries from C. Then, of course, C" "" would represent the collection of all square matrices of size n-by-n with entries from C. Let's have a little more language before we finish this section. Any entry in a matrix A with equal row and column indices is called a diagonal element of
A; diag(A) = [a,, a22 a33 ... ]. The entry ai j is called off-diagonal if i # j, subdiagonal if i > j, and superdiagonal if i < j. Look at an example and see if these names make any sense. We now finish this section with the somewhat straightforward idea of matrix equality. We say two matrices are equal iff they have the same size and equal entries; that is, if A, B E C"' "", we say A = B iff enti j (A) = ent,j (B) for all i, j
with I < i < m and I < j < n. This definition may seem pretty obvious, but it is crucial in arguments where we want to "prove" that two matrices that may not look the same on the surface really are. Notice we only speak of matrices being equal when they are the same size.
APB Exercise Set 1 1. Let A =
1. What is ent,3(A)? How about ent32(A)? What L4 5 6 are the diagonal elements of A? The subdiagonal elements? The superdiagonal elements? How many columns does A have? How many rows?
2. Suppose
L X- Y 2x - 3w
Y+ z 3w + 2z
What are x, y, z, and w?
]= f 9 L6
2 5
B.2 Matrix Addition
B.2
487
Matrix Addition
We now begin the process of introducing the basic operations on matrices. We can add any two matrices of the same size. We do this in what seems a perfectly
reasonable way, namely entrywise. Let A and B be in C" with A = laid I , B = [b, ]. The matrix sum A ® B is the m-by-n matrix whose (i, j) entry is aid + b, . Note that we have two notions of addition running around here, ® used for matrix addition and + used for adding scalars in C. We have defined one addition in terms of the other; that is A ® B = la;j + b,]. In other words, ent;i(A ® B) = ent11(A) + ent;j(B). For example, in C2x3, 1
[4
2 5
3
6,®[
2
4
1
3
6
3
5,-[5
6 8
11
Now the question is: what rules of addition does this definition enjoy? Are they the same as those for the addition of scalars in C? Let's see! THEOREM B.1 (basic rules of addition in C"' x") Suppose A, B, and C are in C"' "' Then 1. Associative law of matrix addition
A®(B(D C)=(A®B)®C. 2. Existence of a zero
Let 0 = [0]",x" (i.e., ent11(0) = 0). Then for any matrix A at all,
®®A=A=A®®. 3. Existence of opposites
Given any matrix A in C'"we can find another matrix B with A ®
B = 0 = B ® A. In fact take B with ent;j(B) = -entqj(A). Denote
B=-A Then A®(-A)-®=(-A)®A.
4. Commutative law of matrix addition For any two matrices A, B in Cm x", A ® B = B ® A.
PROOF We will illustrate the proofs here and leave most as exercises. We appeal to the definition of matrix equality and show that the (i, j) entry of the
Basic Matrix Operations
488
matrices on both sides of the equation are the same. The trick is to push the argument back to something you know is true about C. Let's compute: LA ® (B (D C)l = ent,j (A) + entij (B (D C) = ent;j (A) + (entgi(B) + ent,j (C))
definition of matrix addition, definition of matrix addition,
= (entij(A) + ent,j(B)) + ent,j(C)
associative law of addition in C,
= ent,i(A (D B) + ent,j (C) = ent,j ((A ® B) ® C)
definition of matrix addition, definition of matrix addition.
ent,
a
That was easy, wasn't it'? All we did was use the definition of matrix addition
and the fact that addition in C is associative to justify each of our steps. The big blow of course was the associative law of addition in C. Now you prove the rest!
So far, things are going great. The algebraic systems (C, +) and (C"`" ®) have the same additive arithmetic. The basic laws are exactly the same. We can even introduce matrix subtraction, just like we did in C. Namely, for A, B in
Cmxn define A J B = A ® (-B). In other words,
ent,j(A 0 B) = ent,1(A) - ent,j(B) = a,j - b,j. Now that the basic points have been made, we will drop the special notation of circles around addition and subtraction of matrices. The context should make clear whether we are manipulating scalars or matrices.
APB Exercise Set 2 1.
Let A= [
2 0 1 ]'B- [ 61 4 Find A + B. Now find B + A. What do you notice'? Find A - B. Find B - A. What do you notice now? 1
2. Suppose A = [
3
1
4
j' X =
x u
Y j, and B =
5
10
14
20
Solve for X in A + X = B. 3. What matrix would you have to add to the zero matrix?
Si
3+
] in
C2x2 to get
2i
4. Prove the remaining basis laws of addition in C"I'l using the format previously illustrated.
B.3 Scalar Multiplication
B.3
489
Scalar Multiplication
The first kind of multiplication we consider is multiplying a matrix by a scalar. We can do this on the left or on the right. So let A be a matrix in C""", and let a E C be a scalar. We define a new matrix aA as the matrix obtained by multiplying all the entries of A on the left by a. In other words, ent;l(aA) = a ent;l(A); that is, aA = a [aid] = [aa,j]. For example, in C2x2,
we have 3
1
4
2 6
, = [
3
6
12
18
Let's look at basic properties.
THEOREM B.2 (basic properties of scalar multiplication)
Let A, B be in C"<" anda, b be inC. Then
1. a(A + B) = aA + aB. 2. (a + b)A = aA + bA. 3. (ab)A = a(bA).
4. IA=A. 5. a(Ab) = (aA)b.
PROOF
As usual, we illustrate only one of the arguments and leave the rest as exercises. The procedure is to compare (i, j) entries. Let's prove (1) together:
entij(a(A + B)) = a enti1(A + B) = a [entij(A) + ent11(B)] = a(ent;j (A)) + a(entij (B)) = entij(aA) + entij(aB) = entij(aA + aB)
definition of scalar multiplication, definition of matrix addition, left distributive law in C, definition of scalar multiplication, definition of matrix addition,
Once again, our attack on this proof has been to appeal to definitions and then use a crucial property in C. 0 Our game plan is still going well, although it is a little unsettling that we are mixing apples and oranges, scalars and matrices. Here we multiply a matrix by a scalar and get another matrix. Can we multiply a matrix by a matrix and get another matrix? Yes, we can, and that is what we address next.
Basic Matrix Operations
490
APB Exercise Set 3 1. Let
A=[
2
1
1
1
B=[
J,
2
5
4
J. Compute 5A - 3B,
-4(A - 8). 2. Prove the remaining basic properties in the manner illustrated. Formulate and prove similar properties for right scalar multiplication.
3. Let C E 0"'. Argue C = A + i B where the entries of A and B are real. Argue A and B are unique.
B.4
Matrix Multiplication
The issue of matrix multiplication is not quite so straightforward. Suppose we have two matrices A and B of the same size. Since we defined addition entrywise, it might seem natural to define multiplication of matrices the same way. Indeed you can do that, and we will assign exercises to investigate properties of this definition, of multiplication. However, it turns out this is not the "right" definition. There is a sophisticated way to motivate the upcoming definition, but we will attempt one using systems of linear equations. Consider the problem of substituting one system of linear equations into another. Begin I a x, + a12x2 = C1 a,1 a12 with . The coefficient matrix is A = a2, x, + a22x2 = c2
a21
Now consider a linear substitution for the xs coefficient matrix here is B =
61, 621
b12 b22
I x, = b y, + b12 y2 :
I x2 = b21Y1 + b22y2
. The
Now, putting th e xs back int o
the original system, we get a, Ix, +a12x2
a22
b12Y2) + a12(b21 Y1 +b22Y2)
= a,1b11Y1 + a11b12Y2 + a,2b21Y1 + a12b22Y2 _ (a,1b,I + a22b21)y, + (a,1b,2 + a,2b22)Y2
and
a21x1 + a22x2 = a21(b11)'1 + b12Y2) + a22(b21Y1 + b22Y2)
= a2,b11y, + a2,b12y2 + a22b21Y1 + a22b22y2
= (a21 b + a22b21)Y, + (a21 b12 + a22b22)Y2
B.4 Matrix Multiplication
491
If we use matrix notation for the original system, we get AX = C, where
X= L
J and C =
y1
Y= L
ish
X1 2
Y2
J.
cz
1.
We write the substitution as X = BY, where
Then the new system is AX = A(BY) = (AB)Y = C ifyou cher-
the associative law. Thus, allbi1 +a12b21 a11bit +a12b22
coefficient
the
=
AB
matrix
So, if we make this our definition of a21b11 +az1b21
azib12+azzbaz
J
matrix multiplication, we see the connection between the rows of A and the columns of B. Using this row by column multiplication produces the resulting matrix AB. This is what leads us to make the general definition of how to multiply two matrices together. Suppose A is in C''" and B is in C"P. Then the product matrix AB is the matrix whose (i, j) entry is obtained by multiplying each entry from the ith row of A by the corresponding entry from the jth column of B and adding the results. Notice that the product matrix has size m-by-p. Let's spell this out a bit more. The product matrix AB is the m-by-p matrix whose (i, j) entry is
ent;j(AB) = k=1
Okay, still clear as mud? Let A = [a,1] and B = [b11]. Then ent,j(AB) = an b1j + a12b2J + ... + ai"b,,j. 4
Now, let's look at an example. Let A = [
26
1
43
0 -1 3
0] and B =
27
1
52
Let's compute the (2,3) entry of AB. We use the second row of A and third column of B: r 2.4+6.3+0. 5 = 26. So 12 2 4 60 L 1
] 14
1
43
0 -1 3 27
1
52
-
[
26
*,'
Does that help? See if you can fill out the rest of AB. As you can see, multiplying matrices requires quite a bit of work. But that is why we have calculators and computers. Notice that to multiply matrix A by matrix B in the order AB, the matrices do not have to be the same size, but it is absolutely necessary that the number of columns of A is the same as the number
Basic Matrix Operations
492
of rows of B. If you remember the concept of dot product from your earlier studies, it might he helpful to view matrix multiplication in the following way:
ent;j(AB) = row,(A). colj(B). Before we move to basic properties, let's have a quick review of the useful sigma notation we have already used above. You may recall this notation from calculus when you were studying Riemann sums and integrals. Recall that the capital Greek letter E stands for summation. It tells us to add something up. It
is a great shorthand for lengthy sums. So al + a2 + ... + an = Eat. Now the i=1
b1j b2j
(i, j) entry of A B can be expressed as [a; I ae2 ... a;,,
aikbkj :
k=I
bnj
using E-notation. Note the "outside" indices i and j are fixed and the "running index" is k, which matches on the "inside." There are three basic rules when we do computations using the E-notation.
THEOREM B.3 (basic rules of sigma) n
n
n
j=1
j=1
j=1
1. > (aj +bj) = >aj +
bj.
n
2. Ea bj =a Ebj. j=1
l=1
3. E > ajk = E E ajk.
j=lk=1
4=1 j=I
The proofs here are unenlightening computations and we will not even ask you to do the arguments as exercises. However, we would like to consider rule (3) about double summations. This may remind you of the Fubinni theorem in several variable calculus about interchanging the order of integration (then again, maybe not). Anyway, we can understand (3) by using matrices. Consider all a,2 the matrix a21 a22 . Let T stand for the summation total of all the matrix
all
a32
entries. Well, there are (at least) two different ways we could go about computing
T in steps. First, we could compute row sums and add these or we could first compute column sums and then add these. Evidently, we would get the same
B.4 Matrix Multiplication
493
answer either way, namely T.
all
a12
a21
a22
a31
a32
k=1 3
3
> aj2
>aj1 j=1 2
2
j=1
2
3
2
3
3
Thus, T = >alk + >a2k + >a3k = E > aik or T = >aji + >aj2 = k=I 2
k=1 3
3
j=lk=I
k=1 3
2
j=l
j=1
3
> ajk or T = >ajl + >aj2 = E Eajk.
j=l j=I k=lk=l So you see, matrices are good for something already! Now let's look at the basic properties of matrix multiplication. k=l j=1
THEOREM B.4 (basic properties of matrix multiplication) Let A, B, C be matrices and a E C. Then
1. If A is m-by-n, B n-by-p, and C p-by-q, then A(BC) = (AB)C in C",y
2. If A is m-by-n, B n-by-p, and C n-by-p, then A(B + C) = AB + AC in (Cmxp
3. If A is in-by-n, B is m-by-n, and C is n-by-p, then (A + B)C = AC + BC inCmxp
4. If A is m-by-n, B n-by-p, and a E C, then a(AB) _ (aA)B = A(aB). 5. row1(AB) = (row;(A))B.
6. colj(AB) = A(colj(B)). 7. Let I denote the n-by-n matrix whose diagonal elements all equal one and all off-diagonal entries are zero. Let A be m-by-n. Then IA = A = AI,,.
The proofs are left as exercises in entry verification in the manner 0 already illustrated.
PROOF
Basic Matrix Operations
494
What are some of the consequences of these basic properties'? That is, 1 1 1 , +, , share many common algebraic features. Thus, (C, +, , 0, the basic arithmetic of both structures is the same. However, now the cookies
start to crumble. Let's look at some examples. Let A = [ 6 4
[,
2
]inCz z. Note AB= [
9
6 ] and B =
23
18 0 -6 -9 ] =BA. Thus, the commutative law of multiplication breaks down for matrices. There1
fore, great care must be exercised when doing computations with matrices. Changing the order of a product could really mess things up. Even more can
go wrong; A # 02 but A2 =
®z, so C2x2 has "zero divisors" 0 0 ] = even though C does not. These two facts bring in some big differences in the arithmetic of scalars versus matrices. Understanding the nature of matrix multiplication is crucial to understanding matrix theory, so we will dwell on the concept a bit more. There are four ways to look at multiplying matrices and each gives insights in the appropriate context. We will illustrate with small matrices to bring out ideas. [
View 1 (the row column view [or dot product view]) a
Let A =
bd ] and B =
c
rb
Recall the dot product notation
a
[ax + bu] = [(a, b) (x, u)]. Then AB =
d
c
_
(a, b) (x, u)
(a, b) (y, v)
This is the row-column
I
-
View 2 (the column view) Here we leave A alone but think of B as a collection of columns. Then AB =
[c
d
]
U
ax + bu
ex+du
v
ay + by .
-[[c
d
U
]
[c
d
v
We find it very handy in matrix theory to do
cu+du
things like AX = A [xi I xz I x'j = [Ax, I Axz I Axi], which moves the matrix on the left in to operate on the columns of the matrix on the right one by one.
495
B.5 Transpose
View 3 (the row view)
a
b
c
d
Here we leave B alone and partition A into rows. Then A B =
a
b]
c
d]
1
ax+bu ay+bv cx+du cy+dv
View 4 (the column-row view [or outer product view]) a
Here we partition A into columns and B into rows. Then AB =
x..
y..
_
by ] _ r ax + bu dv L cx+du
du x
[u
b
][a][ ]+[u [cx+du]
d
.
aaCyy
V][ax
J
L
I bu
c
ay + by ]
cy+du
Also notice that
ra b] L
d
c
(
x
]+[ du] -[ca ]x+[d ]u.We
[acx see that the matrix product AB is a right linear combination of the columns ]
of A, with the coefficients coming from B. Indeed, [ a
b]
X
Y
U
v
a [x y] + b [u v], so the product can also he viewed as a left linear combination of the rows of B, with coefficients coming from the matrix on the left.
B.5
Transpose
Given a matrix A in C"I'll we can always form another matrix AT, Atranspose, in C"'<"'. In other words,
ent;j(AT) = entj;(A).
Thus, the transpose of A is the matrix obtained by simply interchanging th rows and columns of A. If A = r
1
2
3
4
5
6
1
], then AT = [ 2 3
4 5
6
Basic Matrix Operations
496
THEOREM B.5 (basic facts about transpose)
1. If AEC"",then ATT =A. 2. IfA,BEC'nxn
then(A+B)T=AT+BT.
3. If A E Cm"" and B E C"',', then (AB)T = BT AT T.
4. If A is invertible in C"x" then (A T)-1 = (A')T.
5. (aA)T = aAT. PROOF We illustrate with (3). Compute ent,j((AB)T) = entj,(AB)
definition of transpose,
n
definition of matrix multiplication,
= >2ajkbki k=I n
= >bkiajk
commutativity of C,
k=1 n
= Eent;k(BT )entk,(AT)
definition of transpose,
k=1
= ent,j((BT )(AT ))
definition of matrix multiplication.
Since the (i, j) entries agree, the matrices are equal.
0
We can use the notion of transpose to distinguish important families of matrices. For example, a matrix A is called symmetric iff A = AT. A matrix A is called skew-symmetric if1 AT = -A. The reader may notice that there seems to be a pretty strong analogy between complex conjugation z 1 -> z and matrix transposition A 1> AT. Indeed, the analogy is quite strong. However, there is a problem. For complex numbers, zz = Iz12 = 0 implies z = 0. If we let
A=
i
it
ECzxz then AA T=
1
iI
tl
1
1
0 0
0. Note that if we had restricted A to real entries, then AAT = implies A = 0 for A E R"x". It boils down to the fact that, in R, the sum but A
of squares equaling zero implies each entry in the sum is zero. We can fix this problem for complex matrices. If A E Cmx", define the conjugate transpose A* by
ent,j(A*) = entj,(A) = entj,(A); that is,
A*=(AT)=At.
B.5 Transpose
497
But first we define the conjugate of a matrix and list the basic properties. Define A by ent;j(A) = ent11(A).
THEOREM B.6 (basic fact about the conjugate matrix) Let A, B E C"xn and a E (C. Then
1. A=A. 2. (A -+B ) = A + B.
3. AB = AB. 4. A = A iff all entries of A are real.
5. aA = aA. 6. (X)T = AT. THEOREM B.7 (basic facts about conjugate transpose) Let A, B E C" x", and a E C. Then
1. (A*)* = A. 2. (A + B)* = A* + B*. 3. (AB)* = B*A*.
4. (aA)* = &A*. 5. (AA*)* = AA* and (A*A)* = A*A. 6.
AA* _ ® implies A = 0.
7. A* = A"
(A*)*).
8. (A ®B)* = A* ®B*. Now the real numbers sit inside of the complex numbers, IR e C. We can identify complex numbers as being real exactly when they are equal to their complex conjugates. This suggests that matrices of the form A = A* should play the role of real numbers in C"x". Such matrices are called selfadjoint, or Hermitian, in honor of the French mathematician Charles Hermite (24 December 1822-14 January 1901). Constructing Hermitian matrices is easy. Start with any matrix A and form AA* or A*A. Of course, Hermitian matrices
Basic Matrix Operations
498
are always square. So, in C"x" we can view the Hermitian matrices as playing
the role of real numbers in C. This suggests an important representation of matrices in C11". Recall that if z is a complex number, then z = a + bi, where a and b are real numbers. In fact, we defined Re(z) = a and Im(z) = b. We saw z
z
z
z
so the real and imaginary part of a complex Re(z) = and Im(z) = 2i , 2 number are uniquely determined. We can reason by analogy with matrices in * A_ A Cn"" Let A E C""". Define Re(A) = Aand Im(A) = . This 2i A*
leads to the so-called Cartesian decomposition. Let A E C11 "I be arbitrary. Then Re(A) and Im(A) are self-adjoint and A = Re(A) + Im(A)i. This has important philosophical impact. This representation suggests that if you know everything
2
there is to know about Hermitian matrices, then you know everything there is to know about all (square) matrices. Let's push this analogy one step forward. A positive real number is always a square; that is, a > 0 implies a = b2 for some real number b. Thus, if we view a as a complex number, a = bb = bb. This suggests defining a Hermitian matrix H to be positive if H = SS* for some matrix S 0. We shall have more to say about this elsewhere.
APB Exercise Set 4 1. Establish all the unproved claims of this section. 2. Say everything you can that is interesting about the matrix i 3 2
3
2
2
3
3
I
2
3
2
_2
3 I
3
3
3
3. Let A = Find A2, A3,
J. Find
...
A2, A3,... ,A". What if A
I
I
l
1
1
1
1
1
1
?
, A". Can you generalize?
4. Argue that (A + B)2 = A 2 + 2A B + B 2 iff AB = BA.
5. Argue that (A + B)2 = A22+ B2 iff AB = -BA. 6. Suppose AB = aBA. Can you find necessary and sufficient conditions so that (A + B)3 = A3 + B3'? 7. Suppose AB = C. Argue that the columns of C are linear combinations of the columns of A.
B.5 Transpose
499
8. Leta E C. Argue that CC
0 0 ABC a
9. Argue that AB is a symmetric matrix iff AB = BA. 10. Suppose A and B are 1-by-n matrices. Is it true that ABT = BAT? 11. Give several nontrivial examples of 3-by-3 symmetric matrices. Do the same for skew-symmetric matrices.
12. Find several nontrivial pairs of matrices A and B such that AB 54 BA. Do the same with A B = 0, where neither A nor B is zero. Also find examples of AC = BC, C 96 0 but A 0 B. 0
13. Define [A, B]
= AB - BA. Let A =
2 0
B
0
i2
0
-i 2
0
i2
,
and C
-1
0
0
0 0
0 0
0
2
0
0
42
2
0
Prove that
1 0 i 2 0 [A, B] = X, [B, C] = M, [C, A] = i B, A2 + B2 + C2 = 213.
0 J . Do the same for
14. Determine all matrices that commute with L 0 0
1
0
0 0
0 0
0
1
. Can you generalize?
X
0 0
15. Find all matrices that commute with 0 0
trices that commute with
1
16. Suppose A =
0
0
1
0
1
0
1
0
1
0
1
1
0
0
0
1
0
X
I
0
x
. Now find all ma-
and p(x) = x2 - 2x + 3. What is p(A)?
17. Suppose A is a square matrix and p(x) is a polynomial. Argue that A commutes with p(A).
Basic Matrix Operations
500
18. Suppose A is a square matrix over the complex numbers that commutes with all matrices of the same size. Argue that there must exist a scalar X
such that A = V. 19. Argue that if A commutes with a diagonal matrix with distinct diagonal elements, then A itself must be diagonal.
20. Suppose P is an idempotent matrix (P = P2). Argue that P = Pk for all k in N.
21. Suppose U is an upper triangular matrix with zeros down the main diagonal. Argue that U is nilpotent.
22. Determine all 2-by-2 matrices A such that A2 = I. 2
23. Compute [
.
1
Determine all 2-by-2 nilpotent matrices.
J
24. Argue that any matrix can be written uniquely as the sum of a symmetric matrix plus a skew-symmetric matrix.
25. Argue that for any matrix A, AAT, AT A, and A + AT are always symmetric while A - AT is skew-symmetric. 26. Suppose A2 = I. Argue that
P=
;(I
+ A) is idempotent.
27. Argue that for any complex matrix A, AA*, A*A, and i(A - A*) are always Hermitian.
28. Argue that every complex matrix A can be written as A = B +iC, where B and C are Hermitian. A-I
29. Suppose A and B are square matrices. Argue that Ak - Bk = E Bj (A j=0
B)Ak-I-j. 30. Suppose P is idempotent. Argue that tr(PA) = tr(PAP). 31. Suppose A
® C ].whatisA2?A3?CanyouguessA?1?
32. For complex matrices A and B, argue that AB = A B, assuming A and B can be multiplied.
33. Argue that (I - A)(I + A + integer m.
A2+.
+ A"') = I - A"'+I for any positive
B.5 Transpose
501
34. Argue that 2AA* - 2BB* = (A + B)*(A - B) + (A - B)*(A + B) for complex matrices A and B. 35. This exercise involves the commutator, or "Lie bracket," of exercise 13. Recall [A, B] = A B - BA. First, argue that the trace of any commutator is the zero matrix. Next, work out the 2-by-2 case explicitly. That is, e a b calculate I ] explicitly as a 2-by-2 matrix. g h 1 [ c d
f
1[
36. There is an interesting mapping associated with the commutator. This is for a fixed matrix A, AdA by AdA(X) = [A, X] = AX - XA. Verify the following:
(a) AdA(A)=O. (b) AdA(BC) = (AdA(B))C + B(AdA(C)). (Does this remind you of anything from calculus?) (c) AdAd,,(X) = ([AdA, AdB])(X) (d) AdA(aX) = aAdA(X), where a is a scalar. (e) AdA(X + Y) = AdA(X) + MAW)37. Since we add matrices componentwise, it is tempting to multiply them componentwise as well. Of course, this is not the way we are taught to do it. However, there is no reason we cannot define a slotwise multiplication of matrices and investigate its properties. Take two rn-by-n matrices, A = [a;j] and B = [b, ], and define A 0 B = [a,1b;j], which will also be m-by-n. Investigate what rules hold for this kind of multiplication. For example, is this product commutative? Does it satisfy the associative law? Does it distribute over addition? What can you say about the rank of A p B? How about vec(A (D B)? 38. We could define the absolute value or magnitude of a matrix A = [a;j] E Cmxn by taking the magnitude of each entry: JAI = [la;jl]. (Do not confuse this notation with the determinant of A.) What can you say about
IA + BI? How about JaAI where a is any scalar? What about IABI,
IA*J,and IA0BI? 39. This exercise is about the conjugate of a matrix. Define ent;i(A) _ ent;j .(A). That is, A =
(a) A+B=A+B (b) AB = AB (c) aA = &A (d) (A)k = (Ak).
Prove the following:
Basic Matrix Operations
502
40. Prove that BAA* = CAA* implies BA = CA.
41. Suppose A2 = 1. Prove that (I + A)" = 2"-'(1 + A). What can you say
about(I - A)"? 42. Prove that if H = H*, then B*HB is Hermitian for any conformable B.
43. Suppose A and B are n-by-n. Argue that A and B commute if A - X1 and B - XI commute for every scalar X. 44. Suppose A and B commute. Prove that AI'By = B`t AN for all positive integers p and q. 45. Consider the Pauli spin matrices Q, = [ 0
and v2 = [ 0
0
0 J , a'.
0 0`
J. Argue that these matrices anticommute (AB =
-BA) pairwise. Also, compute all possible commutators.
B.5.1 B.5.1.1
MATLAB Moment Matrix Manipulations
The usual matrix operations are built in to MATLAB. If A and B are compatible matrices,
A+B
matrix addition
A*B
matrix multiplication
A-B An
matrix subtraction
ctranspose(A)
conjugate transpose
transpose(A)
transpose of A
A'
conjugate transpose
A.'
transpose without conjugation
A."n
raise individual entries in A to the rnth power
matrix to the nth power
Note that the dot returns entrywise operations. So, for example, A * B is ordinary matrix multiplication but A. * B returns entrywise multiplication.
B. 6 Submatrices
B.6
503
Submatrices
In this section, we describe what a submatrix of a matrix is and establish some notation that will be useful in discussing determinants of certain square submatrices. If A is an m-by-n matrix, then a submatrix of A is obtained by deleting rows and/or columns from this matrix. For example, if
A=
all all
a12
a13
a14
a22
a23
a24
a31
a32
a33
a34
we could delete the first row and second column to obtain the submatrix a12
a13
a14
a32
a33
a34
From the three rows of A, there are (3) choices to delete, where r = 0, 1, 2, 3, and from the four columns, there are (4) columns to delete, where c = 0, 1, 2, 3. Thus, there are (3)(4) submatrices of A of size (3-r)-by-(4-c).
A submatrix of an n-by-n matrix is called a principal submatrix if it is obtained by deleting the same rows and columns. That is, if the ith row is deleted, the ith column is deleted as well. The result is, of course, a square matrix. For example, if
A=
all
a12
a13
a21
a22
a23
a31
a32
a33
and we delete the second row and second column, we get the principal submatrix
all
a13
a31
a33
The r-by-r principal submatrix of an n-by-n matrix obtained by striking out its last n-r rows and columns is called a leading principal submatrix. For the matrix A above, the leading principal submatrices are
[all],
[
a2 all
all
a12
a13
a21
a22
a23
a31
a32
a33
a12
a22
Basic Matrix Operations
504
It is handy to have a notation to be able to specify submatrices more precisely.
Suppose k and in are positive integers with k < in. Define Qk,n, to he the set of all sequences of integers ( i 1 , i2, ... , ik) of length k chosen from the first n positive integers [m ] = { 1, 2, ... , n }: QA..' = 1 0 1
,'---, 6iA) I 1 < it < i2 < ... < ik < m}.
Note that Qk.,,, can be linearly ordered lexicographically. That is, if a and 0 are in Qk,,,,, we define a< 0 iff a = ( i 1 , i2, ... , ik), [ 3 = ( J l , j2, ... , Jk) and
i, < j, or i, = j, and i2 < k2, or i, = j, and i2 = j2 and i3 < J3,
, it = jl
and, ... , iA - I = fA-I but ik < jk. This is just the way words are ordered in the dictionary. For example, in lexicographic order: Q2.3
Q2.4 3.4
_ {(1, 2), (1, 3), (2, 3)}, _ {(l, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)}, _ {(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)},
/Q
Q 1.4 _ {(1),(2),(3),(4)}.
It is an exercise that the cardinality of the set Qk.n, is the binomial coefficient in choose k:
QA.n,I=( k)
.
There is a simple function on Qk,,,, that will prove useful. It is the sum function S : Qk.n, - N, defined by s(i,, i2, ... , ik) = it + i2 + - - + ik. Now if A is
in-by-n, a E Qk,n [3 E Qj,,, and the notation A[a 101 stands for the k-by-j submatrix of A consisting of elements whose row index comes from a and whose column index comes from [3. For example, if
A
all
a12
a13
a14
a15
a21
a22
a23
a24
a25
a31
a32
a33
a34
a35
a41
a42
a43
a44
a45
1
and a E Q2.4 , a = (1, 3), R E Q3.5 R = (2, 4, 5), then
A[a
f a12
a14
a32
a34
a] = f alt
a13
a31
a33
]=
A[a I
and so on. We adopt the shorthand, A [ a I a] = A [a] for square matrices A. Note
that A[a] is a principal submatrix of A. Also, we abbreviate A[(i,, i2, ... , ik)]
505
B.6 Submatrices by A[i 1, i2, ... , ik]. Thus, A[ 1,
...
, r] is the r-by-r leading principal submatrix
all
of A. Using A above, A[1, 2] =
a12
a2, Each a E Qk,n, has a complementary sequence & in Qk-m.m consisting of the integers in { l , ... , m } not included in a, but listed in increasing order. This allows us to define several more submatrices:
A[a A(a
a21
A
I
I
R]A[&IR],
R) = A[« I Rl For square matrices, A(a) - A(a I a). Note if A[a] is a nonsingular principal submatrix of A, we can define its Schur complement as A(a
I
A/A[a] = A(a](A[a])-IA[a). We sometimes abuse notation and think of a sequence as a set with a preferred ordering. For example, we write A[a U {i,,} I a U { jp}], where i,, and j,, are not in a but they are put in the appropriate order. This is called "bordering" Using A and a = (1, 3) from above, we see A[a U {4} 1 a U {5}] =
all
a13
a15
a31
a33
a35
a41
a43
a45
In this notation, we can generalize the notion of a Schur complement even further:
A/A[a, R] = A[& I R] - A[a I R] (A [a I R])-I A[a I R].
APB Exercise Set 5 1. What can you say about the principal submatrices of a symmetric matrix? A diagonal matrix? An upper triangular matrix? A lower triangular matrix? An Hermitian matrix?
2. How do the submatrices of A compare with the submatrices of A'9. Specifically, if A is the r-by-s submatrix of the m-by-n matrix A obtained by deleting rows i l , i2, ... , in,_r and columns jl, j2, ... , and B is the s-by-r submatrix obtained from AT by deleting rows jl , j2, ... and columns, i 1 , i2, ... , ihow does B compare to A? 3. Prove that Qk.n, I =
().
Basic Matrix Operations
506
Further Reading [B&R, 1986(2)] T. S. Blyth and E. F. Robertson, Matrices and Vector Spaces, Vol. 2, Chapman & Hall, New York, (1986). [Butson, 1962] A. T. Butson, Generalized Hadamard Matrices, Proceedings of the American Mathematical Society, Vol. 13, (1962), 894-898. [Davis, 19651 Philip J. Davis, The Mathematics of Matrices: A First Book
of Matrix Theory and Linear Algebra, Blaisdell Publishing Company, New York, (1965). [G&B, 19631 S. W. Golomb and L. D. Baumert, The Search for Hadamard Matrices, The American Mathematical Monthly, Vol. 70, (1963), 12-17.
[J&S, 1996] C. R. Johnson and E. Schreiner, The Relationship Between AB and BA, The American Mathematical Monthly, Vol. 103, (1996), 578-582. [Kolo, 1964] Ignace 1. Kolodner, A Note on Matrix Notation, The Amer-
ican Mathematical Monthly, Vol. 71, No. 9, November, (1964), 10311032.
[Leslie, 1945] P. H. Leslie, On the Use of Matrices in Certain Population Mathematics, Biometrika, Vol. 33, (1945). ILiitkepohl, 1996] Helmut Ldtkepohl, Handbook of Matrices, John Wiley & Sons, New York, (1996).
B.6.1 MATLAB Moment B.6.1.1
Getting at Pieces of Matrices
Suppose A is a matrix. It is easy to extract portions of A using MATLAB.
A(i, j) A(
A(i, A(end,
.
returns the i, j-entry of A. , j) returns the jth column of A.
.
) returns the ith row of A. )
returns the last row of A.
B.6 Submutrices A(
A(
.
507
end) returns the last column of A. ) returns a long column obtained by stacking the columns of A. [) is the empty 0-by-0 matrix. ,
Submatrices can be specified in various ways. For example,
A([i j], [p q r]) returns the submatrix of A consisting of the intersection of rows i and j and columns p, q, and r. More generally,
A(i : j, k : m) returns the submatrix that is the intersection of rows i to j and columns k tom. The size command can be useful; size(A), for an m-by-n matrix A, returns the two-element row vector [m, n] containing the number of rows and columns in the matrix. [m, n] = size(A) for matrix A returns the number of rows and columns in A as separate output variables.
Appendix C Determinants
C.1
Motivation
Today, determinants are out of favor with some people (see Axler [ 1995]). Even so, they can he quite useful in certain theoretical situations. The concept goes back to Seki Kowa (March 1642-24 October 1708), a famous Japanese
mathematician. In the west, Gottfried Leibniz (I July 1646-14 November 1716) developed the idea of what we call Cramer's rule today. However, the idea lay dormant until 1750, when determinants became a major tool in the theory of systems of linear equations. According to Aitken [ 1962], Vandermonde may be regarded the founder of a notation and of rules for computing with determinants. These days, matrix theory and linear algebra play the dominant role, with some texts barely mentioning determinants. We gather the salient facts about determinants in this appendix; but first we consider some geometric motivation. Consider the problem of finding the area of a parallelogram formed by two independent vectors -'A = (a, b) and -9 = (c, d) in R2. The area of the parallelogram is base times height, that is II A II sin(O). _g 112 Thus, the squared area is I 1 112 (1-cost (0)) (sin (9))2 = I I
II2
I
111112 11 _g
2-
11111 2 11
11 _g
11 _X
2 Cost
12
(0) =
II2 II
2-_X' _g
)2 = (a2+
b2)(c2+d2)-(ac+bd)2 = a2c2+a2d2+b2c2+b2d2-a2c2-2acbd -b2d2 = a2d2 -2adbc+b2c2 = (ad bc)2. So the area of the parallelogram determined by independent vectors A =r (a, b) and -9 = (c, d) is the absolute value of ad - be. If we make a matrix L
number as det More generally, det r
b
d
] , then we can designate this important
=nd -he all a21
a12 a22
al a22 -a12a21. Notice the "crisscross"
way we compute the determinant of a 2-by-2 matrix? Notice the minus sign? In two dimensions, then, the determinant is very closely related to the geometric idea of area.
509
Determinants
510
Figure Cl.!: Area of a parallelogram. Consider a parallelepiped in R3 determined by three independent vectors, _ (a,, a2, a3), _ (bi, b2, b3), and (ci, c2, c3). The volume of this "box" is the area of its base times its altitude; that is,
1
volume = (area of base)(altitude)
_
(1111111 Z'I sin (0))(11111 cos (0)) _
= 1((b2c3 - c2b3), (b3ci - bic3), (bicz - c,b2)) (a,, a2, a3)I
= Iai(b2c3 - c2b3) + a2(b3ci - bic3) + a3(bic2 - cib2)I = Ia,b2c3 - aic2b3 + a2b3c, - a2b,c3 + a3b,c2 - aic,b,I.
This is getting more complicated. Again we create a matrix whose columns are the vectors and define a 3-by-3 determinant: det a,c2b3 + a2b3c, - a2b,c3 + a3b,c2 - a3c,b2.
a,
b,
c,
a2 a3
b2 b3
c2 c3
= alb,c3 -
C. I Motivation
511
z
Figure C1.2:
Volume of a parallelepiped.
Actually, there is a little trick to help us compute a 3-by-3 determinant by hand.
Write the first two columns over again. Then go down diagonals with plus products and up diagonals with minus products:
det
all
a12
a13
all
a12
a21
a22
a23
a2l
a22
a31
a32
a33
a31
all
a22
a33
-a31
a22
a13
+
-
a32 a12
a23
a31
a32
a23
all
+
-
a13
a21
a32
a33
a21
a12
Thus, in three dimensions, the idea of determinant is closely associated with the geometric idea of volume.
Determinants
5 12
Though we cannot draw the pictures, we can continue this idea into R". A "solid" in W whose adjacent sides are defined by a linearly independent set of vectors a .1, 71 2, ... , ,, is called an n-dimensional parallelepiped. Can we get hold of its volume? We can motivate our generalization by looking at the two-dimensional example again. Form the matrix A whose columns are the vectors (a, b) and (c, d):
A-
a b
c
d
We note that the height of the parallelogram is the length of the ortho onal
projection of t' onto 1, which is (! - P-)(1), where we recall P-(
A
1 . Thus, the area of the parallelogram is
(/ - P-t)(1)
A
Now here is something you might not think of: Write the matrix A in its QR factorization A = Q R =
911
912
r12
921
q22
r22
I
.
Then, by direct computation using the orthonormality of the columns of Q, we find r1 I = til and r22 = 11 (1 - P-t)(') 11 and so the area of the parallogram is just the determinant of R. This idea generalizes. Suppose we make a matrix A whose columns are these independent vectors, A = [x1 I x, I ... I x"]. Then the volume is IIx1 II II(I - P2)x2 II. II U - P,,)xn II, where Pk is the orthogonal projection onto span [x1, x2, , xi1_1 } . Write A
in its QR factorization. Then the volume is the product of the diagonal elements of R. Now (det(A))2 = det(AT A) = det(RT QT QR) = det(RT R) = (det(R))2 = (volume)2. But then, volume equals the absolute value of det(A). Even in higher dimensions, the determinant is closely associated with the geometric idea of volume. Well, all this presumes we know some facts about determinants. It is time to go back to the drawing board and start from scratch.
C.2
Defining Determinants
We could just give a formula to define a determinant, but we prefer to begin more abstractly. This approach goes back to Kronecker. We begin by defining a function of the columns of a square matrix. (Rows could be used just as well.) The first step is to dismantle the matrix into its columns. Define col : Cnxn
Cnx1 x C"x1
x .. X Cnx1 (n copies)
C.2 Defining Determinants
513
by
col(A) = (col1(A), col2(A), ... , For example,
col([3 1
2
4 ]=4
2
1 3J,
EC 2X1 xC 2X1
4
.
Clearly, col is a one-to-one and onto function. Then we introduce a "determinant function" on the n-tuple space with certain crucial properties. Define D : C"X 1 x CnX1 x . . . X CnXI -> C to be n-linear if I.
D(...
,a[xi],...)=a D(...
,[x1],...)fori = 1,2,... n
2. D(... , [xi] + [yi], ...) = D(... , [xi], ...) + D(... , [yi], ...) i = 1,2,... ,n. We say D is alternating iff D = 0 whenever two adjacent columns are equal. Finally, D : C" X 1 x C" X l x . . . x C" X I -+ C is a determinant function iff it is n-linear and alternating. We are saying that, as a function of each column, D is linear, hence the name n-linear. Then we have the associated determinant det(A) = D(col(A)).
Note that (2) above does not say det(A + B) = det(A) + det(B), which is false as soon as n > 1. Also note det : C'" ---> C. To see some examples, fix a number b and consider Dh([
all
b(a11a22 - ai2a21) Dh is a determinant function. Define D(
a1
]) _ all all ant
alt
I
I
aln
a2n
a22
aI1a22 I
ant
I
ann
This is an n-linear function. Is it
Determinants
514
all a determinant function? Let or E S and define Do(
a21
and
1[
aln a2n
a determinant function'?
= ala(I)a2,(2) ann
all a21
Let f : Sn -* C be any function. Define Dt by D1( and
a1,, a2n
_ Ef((T)ala(I)a2a(2) ano(n)
Dt is an n-linear function. Is it a
aES
ann
determinant function'? Our goal now is to show that there is a determinant function for every n E N and this determinant function is uniquely determined by its value on the identity
matrix /n. Ifn = I this is easy: D([a]) = D(a[1]) = aD([1]) = aD(11). First, we assemble some facts. THEOREM C.1
Let D : C"' x C"' x
x C' > -+ C be a determinant function. Then
1. D=0ifthere is azero column. 2. If two adjacent columns are interchanged, D changes sign. 3.
If any two columns are interchanged, then D changes sign.
4. If two columns are equal, D = 0. 5. If the jth column is multiplied by the scalar a and the result is added to the ith column, the value of D does not change.
6.
all
a12
a21
a22
D(
aln
, ...
a2n
sgn(Q)ala(I) ...ano(n)
)_
,
aE S and
ant
ann
D(1,,)
C.2 Defining Determinants
515
PROOF In view of (1) in the definition of n-linear, (1) of the theorem is clear. To prove (2), let [x1 and [y] be the ith and (i + I )st columns. Then
0 = D( D( ,[x],[x]
,[y],[x] +[y],
D(... , [x], [x], ... ) + D(... , [x], [y], ...) + D(... , [y], [x], ... )
+D(... [yl, [y], ... ) ,
O + D(... , [xl, [yl, ...) + D(... , [yl, [x], ...) + 0. Therefore,
D(... , [x], [y], ... ) = _D(... , [y], [x], ... ). The proof of (3) uses (2). Suppose the ith and jth columns are swapped. Say
i < j. Then by j - i swaps of adjacent columns, the jth column can be put into the ith position. Next, the ith column is put in the jth place by j - i - 1 swaps of adjacent columns. The sign change is (_I = -1. Now (4) is )2i-2i-1
clear. For (5), we use linearity in the ith column. We compute D([xi 1, ...
,
[x,] + a[xil ...., [xj1,... , lx,,]) = D([xil,... , [x1]...... [xi], ... , [x.1)
+aD([x1],... , [xi]...., [xi],...
, [xn1)
= D([x1 ], ... , [x1]...... 1xJ, ... , [x,]) + 0 = D([x11, ... , [x1]...... [Xi], ... , [x]). The proof of (6) is somewhat lengthy and the reader is referred to A&W [ 19921 for the details. 0
In view of (6), we see that a determinant function is completely determined by its action on the identity matrix. Let D be such that D,,(I,,) = a. Indeed, D,, = aD1 so we take det as the associated determinant to D1, our uniquely defined determinant function that takes the value I on the identity matrix. Note, we could have just started with this formula
det(A) = ESgn(Q)alo(1) .
. ano(n)
OES,
and proved properties of this function but we like this abstract approach. You may already be familiar with the following results, which follow readily from this formula for det(A).
516
Determinants
COROLLARY CI 1. If A E C'1 <'1 is upper or lower triangular, then det(A) =a I Ia22 ... ally.
2. det(A) = det(AT). 3. det(AB) = det(A)det(B).
4. If S is invertible, det(S'I AS) = det(A).
5. If i 0 j, then det(P11A) _ -det(A). 6. If i 0 j, then det(TI j(a)A) = det(A). 7.
I for E S,,, then det(P(v)) = sgn(a).
8. det(cA) = c"det(A) if A is n-by-n. PROOF We will offer only two proofs as illustration and leave the rest to the reader. First, we illustrate the use of the formula inproving (2).
det(AT)
Y:sgn(a)a,(I)1 ... aa(,), oEs
sg
p n (T- I )aT -' (1)1 ... C1T-1(/l )'1
TESL
Esppn(T)aT 1(I)1 ...aT I(n)n TE S
Esgn(T)aTT-1(I)T(I) ... aTT-i(n)T(n) TES,,
Esgn(T)aIT(1) .
anT(n)
TE S.
det(A).
Next, we illustrate a proof based on the abstract characterization of determinant. Look at (3). The proof from the formula is rather messy, but consider a function
DB(A) = det(AB). One easily checks that DB is n-linear and alternating on the columns of A. Thus, DB(A) = bdet(A), but b = DB(I) = det(B), so the theorem follows.
0
C. 3 Some Theorems about Determinants
517
C.3 Some Theorems about Determinants In this section, we present some of the more important facts about determinants. We need some additional concepts. Suppose A = [a;j] E C"x". We shall use the notation developed for submatrices in the previous appendix. The main theorems we develop are the Laplace expansion theorem and the Cauchy-Binet theorem. These are very useful theorems.
C.3.1
Minors
If A E C"', the determinant of any submatrix A[a I [3] where a E Qk,m and [3 E Qk,,, is called a k-by-k minor of A or the (a, [3)-minor of A. Here, 0 < k < min{m, n}. The complementary minor is the determinant of A[a 101 if this makes sense. If k = m = n, then Q,,,,", has only one element so there is only one minor of order m, namely det(A). If k = I, there are m2 minors of order one, which we identify with the elements of the matrix A. There are (111) 2
elements in Qk,," and so there are (k) "' minors of order k. The determinant of a principal submatrix A [a] is called a principal minor and the determinant of a leading principal submatrix is called a leading principal minor. For example,
if A = [a, ] is 5-by-5, a = (2, 3, 5), and [i = (1, 2, 4), then the (a, [3)-minor is
det
a21
a22
a24
a31
a32
a34
a51
a52
a54
and the complementary minor is a13
a15
a43
a45
det 1
There are (3) = 10 minors of A of order three altogether.
C.3.2
The Cauchy-Binet Theorem
You probably recall the well-known theorem that "the determinant of a product of two square matrices is the product of their determinants." The generalization of this result has a history that can be read in Muir, [ 1906 pages 123-130]. We present a vast generalization of this theorem next. We follow the treatment in A&W [ 19921.
Determinants
518
THEOREM C.2 (Cauchy-Binet, 1812)
Suppose A E C""", B E C""' and C = AB E C""" . Suppose 1< t min fin, n, r} and suppose a E Q,.,,,, 0 E Q,.r. Then
det(C[a R]) = E det(A[a I y])det(BIy I PD. yEQ,n
PROOF Suppose a = (a,_. . (X,) E Q,,,,, and [3 = ([31, ... , {3,) E Q, r We compute the (i, j)-entry of C[a I [3] : 11
ent;j(AB[a I R]) = rowa,(A) colp,(B) = Eaa,kbkp, k=1 so k=n
k=n
Eaaikbkp, r
...
Eaa,kbkp, 1
1
AB[aI13]= 1,
1,
Eaa,kbkp,
i:aa,kbkp,
k=1
k=1
Now, using the n-linearity of the determinant function, we get n
n
I
bk,p,
det(AB[a I [3]) _ E ... Eaa,k, ...aa,k, det k,=1
k,=1
L
b k, P.
I-
Ifkj = kj for i # j, then the ith and jth rows of the matrix on the right are equal. Thus, the determinant is zero in this case. The only nonzero determinants that appear on the right occur when the sequence (k 1 , k2, ... , k,) is a permutation of a sequence -y = (y1, ... , E Q, ,,. Let cr he a permutation in the symmetric
group S, such that y; = kf(;) for I < i < t. Then bk,p,
det
= sgn(o) det B[y 10]. bk, p,
Given y E Q,.n, all possible permutations of y are included in the summation above. Therefore,
det (AB[a 10]) = E (
aa,,,,,,,) det B[y I [3]
yEQ,,, UEs,
det(A[a I yl)det(B[y I (3]) yEQ,,,
0
C.3 Some Theorems about Determinants
519
Numerous corollaries follow from this result. First, as notation, let I : n =
(1,2,3,... ,n). COROLLARY C.2
Suppose A is nt-by-n and B is n-by-m where m < n. Then
det(AB) = E det(A[1 : in I y])det(B[y I
1
: m]).
In other words, the determinant of the product A B is the sum of the products
of all possible minors of the maximal order m of A with the corresponding minors of B of the same order. Also note that if m > n, det(AB) = 0. COROLLARY C.3 For conformable square matrices A, B, det(AB) = det(A)det(B).
PROOF If y E Q,,, then y = I : n, so A[y AB[y y] = AB. Thus det(AB) = det(AB[y I
y] = A, B[y I y] and
I
y]) _ E det(A[y
I
I
yE Qnn
y])det(B[y I y]) = det(A)det(B).
U
COROLLARY C.4
If A is k-by-n where k < n, then det(AA*) _
IdetA[l:kIy]I2>0. yEQ4
COROLLARY C.S (the Cauchy identity) n `
n
F-aici
>aidi
j=1
i=1
n
it
>bici
>bidi
i=1
i=1
det
=
aj bj
I<j
ak bk
det
cj dj
ck dk
In other words,
(Eaici) (b1d1) i i=" I
-(
aidi) (rubic) =1
x (cjdk - ckdj) . As a special case, we get the following corollary.
(ab- akbj) I<'
Determinants
520
COROLLARY C.6 (Cauchy inequality) Over R, 2
n
('a?) Ebi
I det
2
(>aibi)
I
i=1
a;
ak
b;
bk
so over R, 2 n
n
a; b;
<
n
(a)? (b).; i-I
!=1
!=1
The Laplace Expansion Theorem
C.3.3
Another classical theorem deals with expressing the determinant of a square matrix in terms of rows or columns and smaller order determinants. Recall the sum map on Q,.,,; s : Qr.n -* Ndefinedbys(ii,i2,... ,ir)=it+i2+. { i,.
THEOREM C.3 (Laplace expansion theorem) Suppose A E C"' ' and a e Q,,,, for I < t < n. Then 1.
(fix a)det(A)=
(-1)'I°'+`(a)det(A[a I R])det(A[a 10])
(expansion of det(A) by the rows in a) 2.
(fix [3) det(A) = > (-1)'(a)+'(f3)det(A[a I R]) det(A[a I R]) aEQ,n
(expansion of det(A) by the columns in [3)
PROOF
Fix a in Q,,,, and define Da(A) =
(-I) `la>+'t1 det(A[a
I
PE Q, n
[3])det(A[a [3]). Then D. : tC" n -+ C is n-linear as a function of the columns of A. We need Da to be alternating and Da(l) = I to prove the result. I
Then, by uniqueness, Da = det. Suppose two columns of A are equal; say col,,(A) = colq(A) with p < q. If p and q are both in [3 E Q,,,,, then A[a 10] will have two columns equal so det(A[a I [3]) = 0. Similarly, if both p and q are in [3 E then det(A[a I [3]) = 0. Thus, in evaluating Da(A), it is only necessary to consider those [3 E Qr.,, such that p E [3 and q E [3 or vice versa. Suppose then that p E [3 and q E [3. Define a new sequence in Qr.n by replacing p in [3 by q. Then [3' agrees with [3 except that q has been replaced by p. Thus, s([3') - s([3) = q - p. Now consider (-I)''(a)det(A[a I [3]) det(A[a I [3]) + (- 1)'(P') det(A [a
I
[3']) det(A [a I [3']). We claim this sum
C.3 Some Theorems about Determinants
521
is zero. If we can show this, then D0(A) = 0 since (3 and (3' occur in pairs in Q,.,,. It will follow that DQ(A) = 0 whenever two columns of A agree, making D alternating. Suppose p = (3k and q = (31. Then (3 and (3' agree except in the range from p to q, as do (3 and (3'. This includes a total of q - p + 1 entries. We have (31 < ... < Rk = P < Rk+1 < ... < Rk+r-1 < q < [3k+r < ... < (3r and Ala (3] = A[a (3]P(w-1) where w is the r-cycle (k + r - 1, k + r - 2, .., k). Similarly, A[& I (i'])P(w) where w' is a (q - p + I - r)-cycle. (-1)'(P)+(r-1)+(q-v)-rdet(A[a Thus(-1)''(P')det(A[a I (3'])det(A[& I R']) = (3]) det(A [& I R]). Since s(13') + (r - 1) + (q - p) - r - s((3) = 2(q - p) is odd, we conclude the sum above is zero. We leave as an exercise the fact I
I
I
1
that Da(t) = 1. So, Da(A) = det(A) by the uniqueness characterization of determinants, and we are done.
0
(2) Apply the result above to AT. We get the classical Laplace expansion theorem as a special case.
THEOREM C.4 (Laplace cofactor expansion theorem) Let A E C""" Then
I. (-1)k+iakidet(A[k I j]) = biidet(A). k=1 n
2. j(-1)'+iaiidet(A[i I j]) = det(A), (expansion by the jth column). i=) n
3. y(-I)k+iaikdet(A[j I k]) = biidet(A). k=1 n
4.
F_(-1)'+iaiidet(A[i I J]) = det(A), (expansion by the ith row). i=1
So, in theory, the determinant of an n-by-n matrix is reduced by the theorem
above to the computation of it, (n - 1)-by-(n - 1) determinants. This can be continued down to 3-by-3 or even 2-by-2 matrices but is not practical for large _ n. There is an interesting connection with inverses.
Let A E Cnxn We note that A[i j] = aii = entii(A) and A[i J] is the matrix obtained from A by deleting the ith row and_ j th column. The (ij)-cofactor of A is defined by cofii(A) = (-1)'+idet(A[i j]). Define the cofactor matrix cof(A) by entii(cof (A)) = cofii(A) = (-1)'+idet(A[i I J]). The adjugate of A is then adj(A) = cof(A)T (i.e., the transpose of the matrix obtained from A by replacing each element of A by its cofactor). For example, I
I
I
let A =
3
-2
5
6
2
0
-3
-18
1
. Then adj(A) =
17
-6
-6 -10 -10 -1 -2 28
Determinants
522
In this notation, the Laplace expansions become 11
1.
>ak,cofki(A) = Fiijdet(A). k=I n
2. >aijcofij(A) = det(A). i=
3. >aikcofjk(A) = bijdet(A). k=1 n
4. >aijcofij(A) = det(A). j=1
The main reason for taking the transpose of the cofactor matrix above to define the adjugate matrix is to get the following theorem to be true.
THEOREM CS
LetAEC'
.
1. Aadj(A) = adj(A)A = det(A)1,,. 2. A is invertible iffdet(A) # 0. In this case, A-' = det(A)-1 adj(A).
COROLLARY C.7 A E Cn"n is invertible iff there exists B E C0)"' with AB = 1,,.
There is a connection with "square" systems of linear equations. Let Ax = b,
where A is n-by-n. Then adj(A)Ax = adj(A)b whence det(A)x = adj(A)b. n
n
_
Thus, at the element level, det(A)xi = J:(adj(A))jibi =j:(-1)i+jbidet(A[i I _ i=I i=I j]). This last expression is just the determinant of the matrix A with the jth column replaced by b. Call this matrix B. What we have is the familiar Cramer's rule.
THEOREM C.6 (Cramer's rule) Let Ax = b, where A is n-by-n. Then this system has a solution iff det(A) # 0, in which case xi =
det(Bi )
det(A)
for i = l , ...
, n.
C.3 Some Theorems about Determinants
523
In the body of the text, we have used some results about the determinants of partitioned matrices. We fill in some details here.
THEOREM C.7 C ] = det(A) det(C) for matrices of appropriate size.
det I A
PROOF
Define a function D(A, B, C) = det I
A
C ], where A and B
are fixed. Suppose C is n-by-n. Then D is n-linear as a function of the columns of C. Thus, by the uniqueness of determinants, D(A, B, C) = (det(C))D(A, B,
Using transvections, we can zero out B and not change the value of D. Thus, D(A, B, D(A, O, Now suppose A is m-by-m. Then D(A, O, is m-linear and alternating as a function of the columns of A. Thus D(A, 0, (det(A))D(Im, 0, 1,). But note that D(1,,,, 0, 1. D(A, B, C) = (det(C)) D(A, B, (det(C))(det(A)). D(A, B, C) = (det(C))D(A, 0, 0 This theorem has many nice consequences. COROLLARY C.8
1. det
B
] = det(A) det(C) for matrices of appropriate size. ®
L
C ]
2. det L A
3. Let M = L C
= det(A) det(C) for matrices of appropriate size.
D
] where A is nonsingular. Then
det(M) = det(A)det(D - CA -I B). B
DJ = det(D B
I PROOF CA-1
= det(l - CB) fur matrices of appropriate size.
A
For the most part, these are easy. For three, note IC [
L
- CB) for matrices of appropriate size.
1
]
0 D- CA-' B]
B ]] = D U
Determinants
524
APC Exercise Set 1 1. Prove that if A E
BE C" ", then det( A® B) =det(A)"det(B)"'.
2. Fill in the proofs for Theorem C. I and Corollary C.I I.
3. Fill in the proofs for Corollary C.2, Theorem C.4, and Corollary C.7.
4. Argue that A isinvertibleiffadj(A)isinvertible,inwhich case adj(A)-' _ det(A)-I A. 5. Prove that det(adj(A)) = det(A)"-'.
6. Prove that if det(A) = 1, then adj(adj(A)) = A.
7. Argue that adj(AB) = adj(B)adj(A). 8. Prove that B invertible implies adj(B-'AB) = B-'adj(A)B. 9. Prove that the determinant of a triangular matrix is the product of the diagonal elements. 10. Prove that the inverse of a lower (upper) triangular matrix is lower (upper) triangular when there are no zero elements on the diagonal.
11. Argue that ad j (AT) = (ad j (A))T .
12. Argue that det(adj(A)) = (det(A))"-1 for it > 2 and A n-by-n. 13. Give an example to show det(A+ B) = det(A)+det(B) does not always hold.
14. If A is n-by-n, argue that det(aA) = a"det(A). In particular, this shows that det(det(B)A) = det(B)" det(A). 15.
It is important to know how the elementary matrices affect determinants. Argue the following:
(a) det(D;(a)A) = adet(A). (b) det(T;j(a)A) = det(A).
(c) det(P;jA) = -det(A). (d) det(AD;(a)) = adet(A). (e) det(AT;j(a)) = det(A).
(t) det(AP;j) = -det(A).
C.3 Some Theorems about Determinants
525
16. Argue that over C, det(AA*) > 0.
17. Argue that over C, det(A) = det(A). Conclude that if A = A*, det(A) is real.
18. If one row of a matrix is a multiple of another row of a matrix, what can you say about its determinant?
10 0
1
0
19. What is det [
? How about det J
b
0
b
c
0
a
0 0
'? Can you gener-
alize?
20. Suppose A is nonsingular. Argue that (adj(A))-' = det(A-')A = ad j (A-' ).
21. Find a method for constructing integer matrices that are invertible and the inverse matrices also have integer entries. 22. If A is skew-symmetric, what can you say about det(A)? 23. What is det(C(h)), where C(h) is the companion matrix of the polynomial h(x).
24. Prove det I
I B J = det
I
Al
1
-1
25. Suppose A = [ 11 1
0
® 1 1
1
0
1
Ol
1 1
0 1
0 0
1
-1 l
0
0
-1
1
Calculate det(A) for each. Do you recognize these famous numbers'? 1
X1
26. Prove that d et
2
x1
1
...
1
x2
...
Xn
2
x2
2
xn
,
= 11
(xi - xj)
.
This
1<j
x2n-I
n-1 xn
is a famous determinant known as the Vandermonde determinant.
27. There is a famous sequence of numbers called the Fibonacci sequence. There is an enormous literature out there on this sequence. It starts out { 1, 1, 2, 3, 5, ... }. Do you see the pattern'?
526
Determinants
i 0
i
l
i
0 0
O
i
l
i
0 0 i
I
I
Anyway, let F he the n-by-n matrix Fn =
... ..
0 0
...
0
0 0 0 Compute the determinants of F1, F2, F3, F4, and F5, and decide if there is a connection with the Fibonacci sequence. i
b+,
28. Find det
a
a
b+,
b+c
+a
b
a+b
a+b
c+a
LI
1
Notice anything interesting'? u+h c
29. (F. Zhang) For any n-by-n matrices A and B, argue that det I
A
L -B
>0.
B A
30. (R. Bacher) What is the determinant of a matrix of size n-by-n if its (i, j)
entry is a°'->>2 for I < i, j < n?
l+a 31. Find det
1
1
1
I +b
1
1
. Can you generalize your findings?
I
l+c
32. Here is a slick proof of Cramer's rule. Consider the linear system Ax = b, XI
X2
where A is n-by-n, where x =
X3
. Replace the first column of the
L -Cn
identity matrix by b and consider A[x I
e2
I
e3
I
I
en] = [Ax
I
[b I col2(A) I col3(A) I Taking determinants and using that the determinant of a product is the product of the determinants, we find Ae2 I Ae3 I
I
I
det(A) det([x I e2
I e3
I
...
I
det([b I col2(A) I col3(A) I
I coln(A)])
But det([x I e2 I e3 so
xi =
I
I e]) = xi, as we see by a Laplace expansion,
det([b I col2(A) I COMA) I det(A)
I
colt, (A)])
C.3 Some Theorems about Determinants
527
The same argument applies if x is placed in any column of the identity matrix. Make the general argument. Illustrate the argument in the 2-by-2 and 3-by-3 case. 33. Suppose all r-by-r submatrices of a matrix have determinant zero. Argue that all submatrices (r + I)-by-(r + 1) have determinant zero.
34. Prove that any minor of order r in the product matrix AB is a sum of products of minors of order r in A with minors of order r in B.
Further Reading [A&W, 1992] William A. Adkins and Steven H. Weintraub, Algebra: An Approach via Module Theory, Springer-Verlag, New York, (1992).
[Aitken, 1962] A. C. Aitken, Determinants and Matrices, Oliver and Boyd, New York: Interscience Publishers, Inc., (1962).
[Axler, 1995] Sheldon Axler, Down with Determinants, The American Mathematical Monthly, Vol. 102, No. 2, February, (1995), 139-154.
[Axler, 19961 Sheldon Axler, Linear Algebra Done Right, Springer, New York, (1996). [B&R, 1986] T. S. Blyth and E. F. Robertson, Matrices and Vector Spaces, Vol. 2, Chapman & Hall, New York, (1986).
[Bress, 1999] David M. Bressoud, Proofs and Confirmations: The Story of the Alternating Sign Matrix Conjecture, Cambridge University Press, Cambridge, (1999).
[B&S, 1983/84] R. A. Brualdi and H. Schneider, Determinantal Identities: Gauss, Schur, Cauchy, Sylvester, Krone, Linear Algebra and Its Applications, 52/53, (1983), 769-791, and 59, (1984), 203-207.
[C,D'E et al., 2002] Nathan D. Cahill, John R. D'Errico, Darren A. Narayan and Jack Y. Harayan, Fibonacci Determinants, The College Mathematics Journal, Vol. 33, No. 3, May, (2002), 221-225.
Determinants
528
[Des, 18191 P. Desnanot, Complement de la theorie des equations du premier degrd, Paris, (1819). [Dodg, 1866] Charles L. Dodgson, Condensation of Determinants, Proceedings of the Royal Society, London, 15, (1866), 150-155. [Garibaldi, 20041 Skip Garibaldi, The Characteristic Polynomial and Determinant are Not Ad Hoc Constructions, The American Mathematical Monthly, Vol. I1 1 , No. 9, November, (2004), 761-778.
[Muir, 1882] Thomas Muir, A Treatise on the Theory of Determinants, Macmillan and Co., London, (1882). [Muir, 1906-1923] Thomas Muir, The Theory of Determinants in the Historical Order of Development, 4 volumes, Macmillan and Co., London, (1906-1923). [Muir, 1930] Thomas Muir, Contributions to the History of Determinants, 1900-1920, Blackie & Sons, London, (1930). [Rob&Rum, 1986] David P. Robbins and Howard Rumsey, Determinants and Alternating Sign Matrices, Advances in Mathematics, Vol. 62, (1986), 169-184. [Skala, 1971], Helen Skala, An Application of Determinants, The Americal Mathematical Monthly, Vol. 78, (1971), 889-990.
C.4
The Trace of a Square Matrix
There is another scalar that can be assigned to a square matrix that is very useful. The trace of a square matrix is just the sum of the diagonal elements.
DEFINITION C.1
(trace)
Let A be in en xn. We define the trace of A as the sum of the diagonal elements
of A. In symbols, tr(A) _ >ente1(A) = ai i + a22 +
+ ann. We view tr as a
function from Cnxn to C.
Next, we develop the important properties of the trace of a matrix. The first is that it is a linear map.
C.4 The Trace of a Square Matrix
529
THEOREM C.8 Let A, B be matrices in C', In. Then
1. tr(A + B) = tr(A) + tr(B).
2. tr(AA) = atr(A).
3. tr(AB) = tr(BA). 4. If S is invertible, then tr(S-' AS) = tr(A).
5. tr(ain) = na.
6. tr(ABC) = tr(BCA) = tr(CAB). 7. tr(AT B) =
tr(ABT).
8. tr(AT) = tr(A).
9. tr(A) = tr(A). 10. tr(A*) = tr(A). The trace can be used to define an inner product on the space of matrices
Cnxn
THEOREM C.9
The function (A | B) = tr(B^*A) defines an inner product on C^{n×n}. In particular,
1. tr(A^*A) = tr(AA^*) ≥ 0, and tr(A^*A) = 0 iff A = 0.
2. tr(AX) = 0 for all X implies A = 0.
3. |tr(AB)| ≤ √(tr(A^*A) tr(B^*B)) ≤ (1/2)(tr(A^*A) + tr(B^*B)).
4. tr(A^2) + tr(B^2) = tr((A + B)^2) − 2 tr(AB).
Can you see how to generalize (3) above? There is an interesting connection between the trace and orthonormal bases of C^n.
THEOREM C.10
Suppose e_1, e_2, ..., e_n is an orthonormal basis of C^n with respect to the usual inner product. Then
tr(A) = Σ_{i=1}^{n} (Ae_i | e_i).
There is also an interesting connection to the eigenvalues of a complex matrix.
THEOREM C.11
The trace of A ∈ C^{n×n} is the sum of the eigenvalues of A, counted with algebraic multiplicity.
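This, too, is easy to check numerically in MATLAB (an illustration with an arbitrary complex matrix, not a proof):

A = randn(5) + 1i*randn(5);
abs(trace(A) - sum(eig(A)))   % essentially zero: the trace equals the sum of the eigenvalues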
APC Exercise Set 2
1. Fill in the proofs of the theorems in this section.
2. (G. Trenkler) Argue that A^2 = −A iff rank(A) = −tr(A) and rank(A + I) = n + tr(A), where A is n-by-n.
Appendix D A Review of Basics
D.1 Spanning
Suppose v_1, v_2, ..., v_p are vectors in C^n. Suppose a_1, a_2, ..., a_p are scalars in C. Then the vector v = a_1 v_1 + a_2 v_2 + ⋯ + a_p v_p is called a linear combination of v_1, v_2, ..., v_p. For example, in C^2 the vector (2 + 3i, 2) is a linear combination of (1, 1) and (1, 0), since 2(1, 1) + 3i(1, 0) = (2 + 3i, 2). However, there is no way (0, 1) could be a linear combination of (1, 1) and (2, 2) (why not?).
Recall that the system of linear equations Ax = b has a solution iff b can be expressed as a linear combination of the columns of A. Indeed, if b = c_1 col_1(A) + c_2 col_2(A) + ⋯ + c_n col_n(A), then x = (c_1, c_2, ..., c_n)^T solves the system.
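A small MATLAB illustration of this point (the matrix and right-hand side are arbitrary choices): the entries of the computed solution are exactly the coefficients that express b in terms of the columns of A.

A = [2 1; 0 3];
b = [5; 6];
x = A \ b;                        % solve Ax = b, giving x = [1.5; 2]
A(:,1)*x(1) + A(:,2)*x(2) - b     % the zero vector: b is this combination of the columns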
Consider a subset S of vectors from C^n. Define the span of S (in symbols, sp(S)) as the set of all possible (finite) linear combinations that can be formed using the vectors in S. Let's agree the span of the empty set is the set having only the zero vector in it (i.e., sp(∅) = {0}). For example, if S = {(1, 0, 0)}, then sp(S) = {(a, 0, 0) | a ∈ C}. Note how spanning tends to make sets bigger. In this example, we went from one vector to an infinite number. As another example, note sp({(1, 0), (0, 1)}) = C^2. We now summarize the basic facts about spanning.
THEOREM D.1 (basic facts about span)
Let S, S_1, S_2 be subsets of vectors in C^n.
1. For any subset S, S ⊆ sp(S). (increasing)
2. For any subset S, sp(S) is a subspace of C^n. In fact, sp(S) is the smallest subspace of C^n containing S.
3. If S_1 ⊆ S_2, then sp(S_1) ⊆ sp(S_2). (monotone)
4. For any subset S, sp(sp(S)) = sp(S). (idempotent)
5. M is a subspace of C^n iff M = sp(M).
6. sp(S_1 ∩ S_2) ⊆ sp(S_1) ∩ sp(S_2).
7. sp(S_1 ∪ S_2) = sp(S_1) + sp(S_2).
8. sp(S_1) = sp(S_2) iff each vector in S_1 is a linear combination of vectors in S_2 and conversely.
PROOF The proofs are left as exercises. □
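As a practical aside, membership in a span can be tested numerically by a rank comparison: a vector lies in sp(S) exactly when appending it as an extra column to the matrix whose columns are the vectors of S does not raise the rank. A MATLAB sketch with arbitrarily chosen vectors:

S = [1 0; 0 1; 0 0];          % columns span the "xy-plane" inside C^3
v = [2; 3; 0];
w = [0; 0; 1];
rank([S v]) == rank(S)        % true:  v is in sp(S)
rank([S w]) == rank(S)        % false: w is not in sp(S)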
We can view sp as a function with certain properties from the set of all subsets of C^n to the set of all subspaces of C^n, sp : P(C^n) → Lat(C^n). The fixed points of sp are exactly the subspaces of C^n.
If a subspace M of C^n is such that M = sp(S), we call the vectors in S generators of M. If S is a finite set, we say M is finitely generated. It would be nice to have a very efficient spanning set in the sense that none of the vectors in the set are redundant (i.e., can be generated as a linear combination of other vectors in the set).
THEOREM D.2
Let S = {v_1, v_2, ..., v_p} where p ≥ 2. Then the following are equivalent.
1. sp({v_1, v_2, ..., v_p}) = sp({v_1, v_2, ..., v_{k−1}, v_{k+1}, ..., v_p}).
2. v_k is a linear combination of v_1, v_2, ..., v_{k−1}, v_{k+1}, ..., v_p.
3. There exist scalars a_1, a_2, ..., a_p, not all zero, with a_k ≠ 0, such that a_1 v_1 + a_2 v_2 + ⋯ + a_p v_p = 0.

PROOF The proof is left as an exercise. □
The last condition of the theorem above leads us to the concept developed in the next section.
D.2 Linear Independence
A set of vectors {v_1, v_2, ..., v_p} is called linearly dependent iff there exist scalars a_1, a_2, ..., a_p, not all zero, such that a_1 v_1 + a_2 v_2 + ⋯ + a_p v_p = 0. Such an equation is referred to as a dependency relation. These are, evidently, not unique when they exist. If a set of vectors {v_1, v_2, ..., v_p} is not linearly dependent, it is called linearly independent. Thus, the set {v_1, v_2, ..., v_p} is linearly independent iff the equation a_1 v_1 + a_2 v_2 + ⋯ + a_p v_p = 0 implies a_1 = a_2 = ⋯ = a_p = 0. For example, any set of vectors that has the zero vector in it must be linearly dependent (why?). A set with just two distinct vectors is dependent iff one of the vectors is a scalar multiple of the other. The set {(1, 0), (0, 1)} is independent in C^2.

THEOREM D.3
Let S be a set of two or more vectors.
1. S is linearly dependent iff at least one vector in S is a linear combination of other vectors in S.
2. S is linearly independent iff no vector in S is expressible as a linear combination of the other vectors in S.
3. Any subset of a linearly independent set is linearly independent.
4. Any set containing a linearly dependent set is linearly dependent.
5. (Extension theorem) Let S = {v_1, v_2, ..., v_p} be a linearly independent set and v ∉ sp(S). Then S′ = {v_1, v_2, ..., v_p, v} is also an independent set.
6. Let S = {v_1, v_2, ..., v_p} be a set of two or more nonzero vectors. Then S is dependent iff at least one vector in S is a linear combination of the vectors preceding it in S.
7. If S is a linearly independent set and v ∈ sp(S), then v is uniquely expressible as a linear combination of vectors from S.
PROOF As usual, the proofs are left as exercises. □
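A practical numerical test: vectors placed as the columns of a matrix are linearly independent exactly when the rank of that matrix equals the number of columns. A MATLAB sketch with arbitrary vectors:

V = [1 0 1; 0 1 1; 1 1 1];    % columns to test
rank(V) == size(V, 2)         % true: the three columns are independent
W = [1 0 1; 0 1 1; 0 0 0];
rank(W) == size(W, 2)         % false: column 3 = column 1 + column 2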
D.3 Basis and Dimension
There is a very important result about the size of linearly independent sets in finitely generated subspaces that allows us to introduce the idea of dimension. We begin by developing some language. Let B be a set of vectors in a subspace M of C^n. We say B is a basis of M iff (1) B is an independent set and (2) sp(B) = M. For example, let B = {(1, 0), (0, 1)} in C^2. Then clearly, B is a basis of C^2. A subspace M of C^n is called finitely generated iff it contains a finite subset S with sp(S) = M. We see from above that C^2 is finitely generated. Next is a fundamental result about finitely generated subspaces. It is such a crucial fact, we offer a proof.
THEOREM D.4 (Steinitz exchange theorem)
Let M be a finitely generated subspace of C^n. Specifically, let M = sp({v_1, v_2, ..., v_p}). Let T be an independent set of vectors in M, say T = {w_1, w_2, ..., w_m}. Then m ≤ p. In other words, in a finitely generated subspace, you cannot have more independent vectors than you have generators.
PROOF First, note w_1 ∈ M, so w_1 is a linear combination of the v's. Consider the set T_1 = {w_1, v_1, ..., v_p}. Clearly sp(T_1) = M. Now T_1 is a dependent set since at least one vector in it, namely w_1, is a linear combination of the others. By Theorem D.3, some vector in T_1 is a linear combination of vectors preceding it in the list, say v_j. Throw it out and consider S_1 = {w_1, v_1, ..., v_{j−1}, v_{j+1}, ..., v_p}. Note sp(S_1) = M since v_j was redundant. Now we go again. Note w_2 ∈ M, so w_2 ∈ sp(S_1) = M. Consider T_2 = {w_2, w_1, v_1, ..., v_{j−1}, v_{j+1}, ..., v_p}. Clearly T_2 is a linearly dependent set since w_2 is a linear combination of the elements in S_1. Again by Theorem D.3, some vector in T_2 is a linear combination of vectors previous to it. Could this vector be w_2? No, since w_1 and w_2 are independent, so it must be one of the v's. Throw it out and call the resulting set S_2. Note sp(S_2) = M since the vector we threw out is redundant. Continue in this manner, exchanging v's for w's. If we eliminate all the v's and still have some w's left over, then some w will be in the span of the other w's preceding it, contradicting the independence of the w's. So there must be more v's than w's, or perhaps the same number. That is, m ≤ p. □
COROLLARY D.2
Any n + 1 vectors in C^n are necessarily dependent.
COROLLARY D.3 Any two bases of a finitely generated subspace have the same number of vectors in them.
PROOF (Hint: View one basis as an independent set and the other as a set of generators. Then exchange these roles.) □

This last result allows us to define the concept of dimension. A subspace M of C^n is m-dimensional if M has a basis of m vectors. In view of the previous corollary, this is a uniquely defined number. The notation is dim(M) = m. Note dim(C^n) = n.

COROLLARY D.4
If M is generated by n vectors and S = {v_1, v_2, ..., v_n} is an independent set of vectors in M, then S must be a basis for M.

COROLLARY D.5
If M has dimension n and S = {v_1, v_2, ..., v_n} spans M, then S must be a basis for M.

COROLLARY D.6
Suppose M ≠ {0} is a finitely generated subspace of C^n. Then
1. M has a finite basis.
2. Any set of generators of M contains a basis.
3. Any independent subset of M can be extended to a basis of M.

COROLLARY D.7
Suppose M and N are subspaces of C^n with dim(N) = n and M ⊆ N. Then dim(M) ≤ dim(N). Moreover, if in addition dim(M) = dim(N), then M = N.

These are wonderful and useful corollaries to the Steinitz theorem. Some results we want to review depend on making new subspaces from old ones. Recall that when M_1 and M_2 are subspaces of C^n, we can form their intersection
M_1 ∩ M_2 and their sum M_1 + M_2 = {u + v | u ∈ M_1 and v ∈ M_2}. It is easy to show that these constructions lead to subspaces. A sum is called a direct sum when M_1 ∩ M_2 = {0}. The notation is M_1 ⊕ M_2 for a direct sum. If C^n = M_1 ⊕ M_2, we say M_2 is a complement of M_1, or that M_1 and M_2 are complementary subspaces. Typically, a given subspace has many complements.
THEOREM D.5
Suppose M_1 and M_2 are subspaces of C^n with bases B_1 and B_2, respectively. Then the following are equivalent (TFAE):
1. C^n = M_1 ⊕ M_2.
2. For each vector w in C^n, there exist unique vectors v in M_1 and u in M_2 with w = v + u.
3. B_1 ∩ B_2 = ∅ and B_1 ∪ B_2 is a basis for C^n.
PROOF The proof is left as an exercise. □
If M ⊆ N and there exists a subspace K with M ⊕ K = N, then K is called a relative complement of M in N.

THEOREM D.6
Suppose N is finitely generated and M ⊆ N. Then M has a relative complement in N.
PROOF As usual, the proof is left as an exercise. □
COROLLARY D.8
Any subspace of C^n has a complement.
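One complement that is always available is the orthogonal complement, and MATLAB's null function produces a basis for it. In the sketch below (the subspace M is an arbitrary choice), the columns of B span M, the columns of N span a complement, and together they span all of C^3:

B = [1 0; 0 1; 1 1];          % columns span a 2-dimensional subspace M of C^3
N = null(B');                 % columns span the orthogonal complement of M
rank([B N])                   % returns 3: M plus this complement is all of C^3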
We end with a famous formula relating dimensions of two subspaces.
THEOREM D.7 (the dimension formula)
Suppose M_1 and M_2 are subspaces of a finitely generated subspace M. Then M_1, M_2, M_1 + M_2, and M_1 ∩ M_2 are all finite dimensional and
dim(M_1 + M_2) = dim(M_1) + dim(M_2) − dim(M_1 ∩ M_2).

PROOF Start with a basis of M_1 ∩ M_2, say u_1, u_2, ..., u_a. Then extend this basis to one of M_1 and one of M_2, say u_1, ..., u_a, v_1, ..., v_b and u_1, ..., u_a, w_1, ..., w_c, respectively. Now argue that u_1, ..., u_a, v_1, ..., v_b, w_1, ..., w_c is a basis for M_1 + M_2. □
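The formula is easy to test numerically. If the columns of B1 and B2 are independent and span M_1 and M_2, then dim(M_1 + M_2) = rank([B1 B2]), while dim(M_1 ∩ M_2) equals the dimension of the null space of [B1, −B2] (a vector (x; y) in that null space records a common element B1x = B2y). A MATLAB sketch with arbitrarily chosen subspaces of C^4:

B1 = [1 0; 0 1; 0 0; 0 0];                 % basis of M1
B2 = [0 0; 1 0; 0 1; 0 0];                 % basis of M2
dim_sum = rank([B1 B2]);                   % dim(M1 + M2) = 3
dim_int = size(null([B1 -B2]), 2);         % dim(M1 ∩ M2) = 1
dim_sum + dim_int == rank(B1) + rank(B2)   % true: the dimension formula holds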
Does the dimension formula remind you of anything from probability theory?
APD Exercise Set 1
1. How would the dimension formula read if three subspaces were involved? Can you generalize to a finite number of subspaces?
2. Fill in the arguments that have been omitted in the discussion above.
3. Suppose {u_1, u_2, ..., u_p} and {v_1, v_2, ..., v_q} are two sets of vectors with p > q. Suppose each u_i lies in the span of {v_1, v_2, ..., v_q}. Argue that {u_1, u_2, ..., u_p} is necessarily a linearly dependent set.
4. Suppose M_1 and M_2 are subspaces of C^n with dim(M_1 + M_2) = dim(M_1 ∩ M_2) + 1. Prove that either M_1 ⊆ M_2 or M_2 ⊆ M_1.
5. Suppose M_1, M_2, and M_3 are subspaces of C^n. Argue that dim(M_1 ∩ M_2 ∩ M_3) + 2n ≥ dim(M_1) + dim(M_2) + dim(M_3).
6. Suppose {u_1, u_2, ..., u_n} and {v_1, v_2, ..., v_n} are two bases of V. Form the matrix U whose columns are the u_i's and the matrix V whose columns are the v_i's. Argue that there is an invertible matrix S such that SU = V.
7. Suppose {u_1, u_2, ..., u_n} spans a subspace M. Argue that {Au_1, Au_2, ..., Au_n} spans A(M).
8. Suppose M_1, ..., M_k are subspaces of C^n. Argue that
dim(M_1 ∩ ⋯ ∩ M_k) = n − Σ_{i=1}^{k} (n − dim(M_i)) + Σ_{j=1}^{k−1} {n − dim((M_1 ∩ ⋯ ∩ M_j) + M_{j+1})}.
Deduce that dim(M_1 ∩ ⋯ ∩ M_k) ≥ n − Σ_{i=1}^{k} (n − dim(M_i)), and that dim(M_1 ∩ ⋯ ∩ M_k) = n − Σ_{i=1}^{k} (n − dim(M_i)) iff for all i = 1, ..., k, M_i + (∩_{j≠i} M_j) = C^n.
9. Select some columns from a matrix B and suppose they are dependent. Prove that the same columns in AB are also dependent for any matrix A that can multiply B.
D.4 Change of Basis
You have no doubt seen in your linear algebra class how to associate a matrix to a linear transformation between vector spaces. In this section, we review how that went. Of course, we will need bases to effect this connection. First, we consider the change of basis problem. Let V be a complex vector space and let B = {b_1, b_2, ..., b_n} be a basis for V. Let x be a vector in V. Then x can be written uniquely as x = b_1 β_1 + b_2 β_2 + ⋯ + b_n β_n. That is, the scalars β_1, β_2, ..., β_n are uniquely determined by x and the basis B. We call these scalars in C the "coordinates" of x relative to the basis B. Thus, we have the correspondence
x ↦ (β_1, β_2, ..., β_n)^T = Mat(x; B) ≡ [x]_B.
Now suppose C = {c_1, c_2, ..., c_n} is another basis of V. The same vector x can be expressed (again uniquely) relative to this basis as x = c_1 γ_1 + c_2 γ_2 + ⋯ + c_n γ_n. Thus, we have the correspondence
x ↦ (γ_1, γ_2, ..., γ_n)^T = Mat(x; C) ≡ [x]_C.
The question we seek to resolve is, what is the connection between these sets of coordinates determined by x? First we set an exercise:
1. Show Mat(x + y; B) = Mat(x; B) + Mat(y; B).
2. Show Mat(xa; B) = Mat(x; B)a.
For simplicity, to fix ideas, suppose B = {b_1, b_2} and C = {c_1, c_2}. Let x be a vector in V. Now c_1 and c_2 are vectors in V and so are uniquely expressible in the basis B. Say c_1 = b_1 α + b_2 β, so that [c_1]_B = (α, β)^T, and [c_2]_B = (γ, δ)^T, reflecting that c_2 = b_1 γ + b_2 δ. Let [x]_C = (μ, σ)^T. What is [x]_B? Well, x = c_1 μ + c_2 σ = (b_1 α + b_2 β)μ + (b_1 γ + b_2 δ)σ = b_1(αμ + γσ) + b_2(βμ + δσ). Thus,
[x]_B = (αμ + γσ, βμ + δσ)^T = [α γ; β δ] [x]_C.
The matrix [α γ; β δ] gives us a computational way to go from the coordinates of x in the C-basis to the coordinates of x in the B-basis. Thus, we call the matrix [α γ; β δ] the transition matrix or change of basis matrix from the C to the B basis. We write
R_{B←C} = [α γ; β δ] = [ [c_1]_B | [c_2]_B ]
and note
[x]_B = R_{B←C} [x]_C.
The general case is the same. Only the notation becomes more obscure. Let B = {b_1, b_2, ..., b_n} and C = {c_1, c_2, ..., c_n} be bases of V. Define R_{B←C} = [ [c_1]_B | [c_2]_B | ⋯ | [c_n]_B ]. Then for any x in V, [x]_B = R_{B←C}[x]_C. There is
a clever trick that can be used to compute transition matrices given two bases. We illustrate with an example. Let B = {(2, 0, 1), (1, 2, 0), (1, 1, 1)} and C = {(6, 3, 3), (4, −1, 3), (5, 5, 2)}. Form the augmented matrix [B | C] and use elementary matrices on the left to produce the identity matrix:

[ 2 1 1 | 6  4 5 ]        [ 1 0 0 | 2  2 1 ]
[ 0 2 1 | 3 -1 5 ]  -->   [ 0 1 0 | 1 -1 2 ]
[ 1 0 1 | 3  3 2 ]        [ 0 0 1 | 1  1 1 ]

Then R_{B←C} = [2 2 1; 1 -1 2; 1 1 1]. We finally make the connection with invertible matrices.
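Before doing so, note that the same row reduction can be carried out in MATLAB with rref (this is only a numerical check of the example above, with the basis vectors entered as columns):

B = [2 1 1; 0 2 1; 1 0 1];     % columns are the B-basis vectors
C = [6 4 5; 3 -1 5; 3 3 2];    % columns are the C-basis vectors
R = rref([B C]);               % row reduce the augmented matrix [B | C]
R(:, 4:6)                      % the transition matrix R_{B<-C}; the same as B\C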
THEOREM D.8
With notation as above,
1. R_{A←B} R_{B←C} = R_{A←C}.
2. R_{B←B} = I.
3. (R_{B←C})^{-1} exists and equals R_{C←B}. Moreover, [x]_B = R_{B←C}[x]_C iff (R_{B←C})^{-1}[x]_B = [x]_C.

PROOF The proof is left to the reader. □
Next, we tackle the problem of attaching a matrix to any linear transformation between two vector spaces that have finite bases. First, recall that a linear transformation is a function that preserves all the vector space structure, namely,
addition and scalar multiplication. So let T : V → W. T is a linear transformation iff (1) T(x + y) = T(x) + T(y) for all x, y in V and (2) T(xr) = T(x)r for all x in V and r in C. We shall now see how to assign a matrix to T relative to a pair of bases.
As above, we start with a simple illustration and then generalize. Let B = {b_1, b_2} be a basis of V and C = {c_1, c_2, c_3} be a basis for W. Now T(b_1) is a vector in W and so must be uniquely expressible in the C basis, say T(b_1) = c_1 α + c_2 β + c_3 γ. The same is true for T(b_2), say T(b_2) = c_1 δ + c_2 ε + c_3 ρ. Define the matrix of T relative to the bases B and C by
Mat(T; B, C) = [α δ; β ε; γ ρ] = [ [T(b_1)]_C | [T(b_2)]_C ].
Then a remarkable thing happens. Let x be a vector in V, say x = b_1 r_1 + b_2 r_2, so [x]_B = (r_1, r_2)^T. Also, T(x) = T(b_1 r_1 + b_2 r_2) = T(b_1)r_1 + T(b_2)r_2 = (c_1 α + c_2 β + c_3 γ)r_1 + (c_1 δ + c_2 ε + c_3 ρ)r_2 = c_1(α r_1 + δ r_2) + c_2(β r_1 + ε r_2) + c_3(γ r_1 + ρ r_2). Thus
Mat(T(x); C) = (α r_1 + δ r_2, β r_1 + ε r_2, γ r_1 + ρ r_2)^T = [α δ; β ε; γ ρ](r_1, r_2)^T = Mat(T; B, C) Mat(x; B).
In other words,
[T(x)]_C = Mat(T; B, C) [x]_B.
The reader may verify that this same formula persists no matter what the finite cardinalities of B and C are. We end with a result that justifies the weird way we multiply matrices. The composition of linear transformations is again a linear transformation. It turns out that the matrix of a composition is the product of the individual matrices. More precisely, we have the following theorem.
THEOREM D.9
Suppose V, W, and P are complex vector spaces with bases B, C, A, respectively.
Suppose T : V → W and S : W → P are linear transformations. Then Mat(S ∘ T; B, A) = Mat(S; C, A) Mat(T; B, C).

PROOF Take any x in V. Then, using our formulas above, Mat(S ∘ T; B, A)[x]_B = [(S ∘ T)(x)]_A = [S(T(x))]_A = Mat(S; C, A)[T(x)]_C = Mat(S; C, A)(Mat(T; B, C)[x]_B). Now this holds for all vectors x, so it holds when we choose basis vectors for x. But [b_1]_B = e_1 and, generally, [b_i]_B = e_i, the standard basis vector. By arguments we have seen before, Mat(S ∘ T; B, A) must equal the product Mat(S; C, A) Mat(T; B, C) column by column. Thus, the theorem follows. □
If we had not yet defined matrix multiplication, this theorem would be our guide, since we surely want this wonderful formula relating composition to matrix multiplication to be true.
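A small numerical illustration of the formula [T(x)]_C = Mat(T; B, C)[x]_B, with everything chosen arbitrarily: the linear map T is multiplication by a 3-by-2 matrix M, the columns of Bmat and Cmat are bases of C^2 and C^3, and coordinate vectors are computed with backslash.

M    = [1 2; 0 1; 3 -1];          % T(x) = M*x maps C^2 into C^3
Bmat = [1 1; 0 2];                % columns form the basis B of C^2
Cmat = [1 0 1; 0 1 1; 0 0 1];     % columns form the basis C of C^3
MatT = Cmat \ (M*Bmat);           % Mat(T; B, C) = [[T(b1)]_C | [T(b2)]_C]
x    = [4; -2];
norm(Cmat\(M*x) - MatT*(Bmat\x))  % essentially zero: [T(x)]_C = Mat(T;B,C)[x]_B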
APD Exercise Set 2
1. Find the coordinate matrix Mat(x; B) of x = (3, −1, 4) relative to the basis B = {(1, 0, 0), (1, 1, 0), (1, 1, 1)}. Now find the coordinate matrix of x relative to C = {(2, 0, 0), (3, 3, 0), (4, 4, 4)}. Suppose Mat(v; B) = (4, 7, 10)^T. Find v. Suppose Mat(v; C) = (4, 7, 10)^T. Find v.
2. Let B = {(1, 0, 1), (6, −4, 2), (−1, −6, 1)} and C = {(−1, −1, 0), (−1, −3, 2), (2, 3, −7)}. Verify that these are bases and find the transition matrices R_{C←B} and R_{B←C}. If Mat(v; B) = (2, 4, 6)^T, use the transition matrix to find Mat(v; C).
3. Argue that (R_{C←B})^{-1} = R_{B←C}, R_{B←B} = I, and R_{A←B} R_{B←C} = R_{A←C}.
4. Argue that if C is any basis of V and S is any invertible matrix over C, then there is a basis B of V so that S = R_{C←B}.
5. Investigate the map Mat(·; B, C) : Hom(V, W) → C^{m×n} that assigns to each C-linear map between V and W its matrix in C^{m×n} relative to the bases B of V and C of W.
6. Suppose T : V → V is invertible. Is Mat(T; B, B) an invertible matrix for any basis B of V?
7. How would you formulate the idea of the kernel and image of a C-linear map?
8. Is T_r : V → V given by T_r(x) = xr a C-linear map on V? If so, what is its image and what is its kernel?
9. Let T be any C-linear map between two vector spaces. Argue that
(a) T(0) = 0.
(b) T(−x) = −T(x) for all x.
(c) T(x − y) = T(x) − T(y) for all x and y.
10. Argue that a linear map is completely determined by its action on a basis.
Further Reading
[B&R, 1986(2)] T. S. Blyth and E. F. Robertson, Matrices and Vector Spaces, Vol. 2, Chapman & Hall, New York, (1986).
[B&B, 2003] Karim Boulabiar and Gerard Buskes, After the Determinants Are Down: A Criterion for Invertibility, The American Mathematical Monthly, Vol. 110, No. 8, October, (2003).
[Brown, 1988] William C. Brown, A Second Course in Linear Algebra, John Wiley & Sons, New York, (1988).
[Max&Mey, 2001] C. J. Maxson and J. H. Meyer, How Many Subspaces Force Linearity?, The American Mathematical Monthly, Vol. 108, No. 6, June-July, (2001), 531-536.
Index
1-inverse, 199-209 2-inverses, 208-209, 217-222
A
Absolute error, 11
Adjoint matrix, 76 Adjugate, 76-81 Adjugate matrix, 76-81 Affine projection, 315-323 Affine subspace, 8, 315-317 Algebra of projections, 299-308 Algebraic multiplicity, 333 Alternating, 431, 436 Angle between vectors, 257 Apollonius identity, 260 Approximation, 291, 294-297
Argand diagram, 464
B Back substitution, 41-49 Bailey theorem, 218-219 Banach norm, 245, 248 Banachiewicz inversion formula, 26 Base, 10-11 Basic column, 74, 165
Basic facts about all norms, 235-236 Basic facts about any distance function, 237-238 Basic facts about inner product, 259 Basic facts about inverses, 18-19 Basic facts about matrix norms, 246 Basic facts about span, 531-532 Basic facts about tensor products, 453-454 Basic facts about the conjugate matrix, 497 Basic facts about the inner product, 259 Basic facts about transpose, 496 Basic properties of induced norms, 248-250 Basic properties of vec, 454-455 Basic rules of matrix addition, 487-488
Basic rules of matrix multiplication, 490-495 Basic rules of scalar multiplication, 489-490 Basic rules of sigma, 492 Basic variables, 74 Basis, 534-542 Bessel's inequality, 264
Bilinear form, 431, 437-440,450-452 Bilinear map, 431 Blocks, 15-17 Bose theorem, 202-203 Built-in matrices, 14-15
C
Cancellation error, 12 Canonical form, 340 Canonical representative, 163 Cartesian decomposition, 448, 498 Cauchy-Binet Theorem, 517-520 Cauchy inequality, 233, 240 Cauchy-Schwarz inequality, 233, 240 Cauchy sequence, 242 Cayley-Hamilton theorem, 81-98 Change of basis, 538-542 Characteristic matrix, 81 Characteristic polynomial, 81, 90-95 Chopping, 11 Circulant matrix, 187-188 Cline theorem, 231 Column equivalence, 163 Column index, 485 Column rank, 99, 103 Column space, 56, 99 Column view, 2 Columns, 165 Commutative law of multiplication, 2 Companion matrix, 335 Complementary minor, 517 Complementary subspace, 118 Complete pivoting, 170
Complex conjugate, 468-473 Complex numbers, 459-483 Condition number, 249, 254-255 Congruence, 438, 447-450 Conjugate partition, 135-136 Conjugate transpose, 7, 497 Consistency condition, 190 Convergent sequence of vectors, 233, 238 Convex, 235-236 Core, 131-132, 225 Core-nilpotent decomposition, 225 Core-nilpotent factorization, 131-132 Cosets, 324
D
De Moivre's theorem, 475 Definite, 437 Determinant function, 513 Determinants, 509-530 Diagonal element, 486 Diagonal matrices, 15 Diagonalizable with respect to a unitary, 372
Diagonalizable with respect to equivalence, 351-357
Diagonalizable with respect to similarity, 357-371 Dilation, 52 Dimension formula, 534-537 Direct sum, 117-128 Direct sum decomposition, 118 Disjoint. 118, 120, 127 Distance, 468-473
Distance function, 237-238 Drazin inverse, 223-229
E Eigenprojection, 329 Eigenspace, 329, 404 Eigenstuff, 329-338 Eigenvalue, 82, 329, 337-338, 347 Eigenvector, 329, 332, 337-338, 389-422 Elementary column operations, 55 Elementary divisors, 426 Elementary matrices, 49-63 Elementary row operations, 54 Elimination matrices, 66 Entries, 13-14, 64 Equivalence, 351-357 Equivalence relation, 144
Index Equivalent, 43. 372 Equivalents to being invertible, 19 Error vector, 249 Euclidean norm, 235 Even function, 126
Explicit entry, 13-14 Exponent, 10
F Ferrer's diagram, 136, 138, 412 Finitely generated, 532 Floating point arithmetic, 10-I I Floating point number, 10 Floating point representation, 10 Fourier coefficients, 263-264 Fourier expansion, 263-265 Frame algorithm, 81-98 Free variables, 42 Frohenius inequality, 114 Frobenius norm, 114 Full rank factorization, 176-179, 275, 313-315 Function view. 5
Fundamental projections, 309-310, 313 Fundamental subspaces, 99-111, 278, 380 Fundamental theorem of algebra, 278-282
G Gauss elimination, 41-49 Generalized eigenspace, 404 Generalized eigenvector, 389-422 Generalized elementary matrices, 62 Generalized inverses, 199-228 Generators, 532 Geometric law. 3
Geometric multiplicity, 329 Geometric view, 2
Grain matrix, 268 Grain-Schmidt orthogonalization process. 268 Grant-Schmidt process, 269 Grammian. 268 Group inverse, 230-231 Gutttnan rank additivity formula, 108
H Hartwig theorem, 204 Henderson-Searle formulas, 21-24
H
Hermite echelon form, 171-175 Hermitian, 257-258 Hermitian inner product, 257-258 Holder's inequality, 238-240 Hyperbolic pair, 450 Hyperbolic plane, 450 Hyperplane, 2
I
Idempotent, 117-128, 291, 360 Identity matrix, 14 Inconsistent system, 44 Indefinite, 437 Independent, 533
Index,128-148 Induced matrix norm, 248 Induced norm, 247-250 Inertia, 444 Inner product space, 257-262 Inner products, 257-290 Intersection, 535 Invariant factors, 425 Invariant subspace, 125, 131, 147, 326 Inverse, 1-39
Invertible, 17,41-98 Isometry, 243 Isotropic, 440
L Laplace cofactor expansion theorem,
520-527 LDU factorization, 63-76 Leading coefficient, 155 Leading entry, 64 Leading principal minor, 77 Leading principal submatrices, 67, 77 Least squares solutions, 285-288 Left inverse, 148-154 Leibnitz rule, 87 Length, 233, 283, 392 Line segment, 236 Linear combination, 531 Linear independence, 533 Linear transformation, 540 Linearly dependent, 533 Linearly independent, 533 Loss of significance, 12-13 Lower triangular, 63, 69 LU-factorization, 63-76
M M-dimensional, 535 M-perp, 262 Magnitude, 233 Mahalanobis norm, 251 Mantissa, 10
MATLAB, 13-17,37-39,47-49,75-76.
J JCF, 413 JNF, 416 Jordan basis, 405
Jordan block, 389-392 Jordan block of order k, 389 Jordan canonical form, 389-430 Jordan decomposition, 366 Jordan form, 389-422 Jordan matrix, 396-398 Jordan segment, 392-395 Jordan string, 403 Jordan's theorem, 398-422, 428
K Kronecker product, 452 Krylov matrix, 61 Krylov subspace, 106 Kung's algorithm, 274-276
95-98, 109-111, 147-148, 167-168,
179,190,252-255,269,276-278,313, 337-338,376-377,385-387,392,395, 397-398,456-457,502,506-507 Matrix addition, 487-488 Matrix algebra, 5 Matrix diagonalization, 351-387 Matrix equality, 486 Matrix equivalence, 155-171 Matrix inversion. 39 Matrix multiplication, 490-495 Matrix norm, 244-255 Matrix of a linear transformation, 538 Matrix operations, 485-507 Matrix units, 50-51, 54, 60 Matrix view, 3 Minimal polynomial, 57-63, 90-95 Minimal polynomial of A relative to v, 361 Minimum norm, 282-285 Minkowski inequality, 241 Minor, 77, 517
Modified Gram-Schmidt process, 269 Modified RREF, 155 Modulus, 468-473 Moore-Penrose inverse, 155-198, 292 Motivation, 509-512 MP-Schur complement, 194
N
Overdetermined, 46, 48-49 Overflow, 10
P P-norm, 235, 245 Parallel projection, 121 Parallel sum, 197
Parallelogram law, 257 Natural map, 325 Negative definite, 437
Negative semidefinite, 437 Newton's identities, 85-90 Nilpotent, 131-132, 147-148, 225, 339
Nilpotent on V, 367 Nonbasic columns, 165 Nondegenerate, 434 Nondegenerate on the left, 434 Nondegenerate on the right, 434 Nonnegative real numbers, 347 Nonsingular, 17 Norm, 233-256
Norm equivalence theorem, 236-237 Norm of vector, 234
Normal, 339 Normal equations, 274 Normalized, 263 Normalized floating point number, 10
Normalized orthonormal set, 263 Normed linear space, 233-244 Null space, 99 Nullity, 99, 102, 104-105, 109, 111
0 Odd function, 126 Off-diagonal, 486 Operation counts, 39, 170-171 Orthogonal, 262-263, 275, 299-300 Orthogonal basis, 442
Orthogonal full rank factorization, 275 Orthogonal projection, 291-298 Orthogonal projection matrix, 299-300 Orthogonal set, 262-269 Orthogonal vectors, 262-263 Orthogonality, 440-442 Orthonormal set, 263 Orthosymmetric, 441 Orthogonal sets of vectors, 262-269
Parseval's identity, 265 Partial order, 300-307 Partial pivoting, 169 Particular solution, 8
Partition of n, 135-136 Pauli spin matrices, 502 Penrose theorem, 287-288 Permutation matrix, 53 Pivot, 41, 74 Pivot column, 155 Pivot rank, 74 Pivoting strategies, 169-170 Polar decomposition theorem, 347-350 Polar form of complex numbers,
473-480 Polynomial, 90-95 Polynomials, 90-95, 480-482 Positive definite, 437 Positive semidefinite, 437 Precision, 10 Primary decomposition theorem, 364-365 Prime mapping, 310-312 Principal idempotents, 360 Principal minor, 77 Principal submatrix, 67-68, 77, 503 Pseudoinverse, 182 Pythagorean theorem, 257
Q
Q-eigenvector, 403 QR factorization, 269-278 Quadratic form, 435 Quadratic map, 431, 435 Quotient formula, 195 Quotient space, 324-327
R
Radical, 441 Randomly generated matrices, 15
Rank, 111-117, 194 Rank normal form, 162 Rank one update, 21, 31 Rank plus nullity theorem, 102 Real valued function, 245 Reflexive generalized inverses, 208 Reidel's formula, 196-197 Relative complement, 536 Relative error, 11 Relative residual, 249-250 Representable number, 10 Residual vector, 249
Skew-symmetric form, 450-452 Smith normal form, 422 Solving a square system of full rank, 20-21 Span,531-532 Spectral theorem, 338-347 Spectral theory, 329-350 Spectrum, 329
Square matrix, 128-148, 528-530 Square root theorem, 347-350 Square systems, 17-39 Square theorem, 348
Row reduction echelon form, 155 Row space, 100
Standard nilpotent matrix, 147-148 Strictly upper triangular, 368 Submatrices, 67, 77, 503-507 Subspace, 99-154, 299-308 Sum, 117-128 Sum function, 504 Sylvester's determinant formula, 195 Sylvester's law of inertia, 444 Sylvester's law of nullity, 113 Sylvester's rank formula, 111 Symmetric bilinear forms, 442-447 Symmetric form, 442-447, 450-452 Symmetric matrices, 447-450 System of complex numbers, 464-466
Row view, 2 Rows, 485
T
Reversal law, 21
Right inverse, 148-154
Rounding, 11 Roundoff error, 11 Row echelon form, 64 Row echelon matrix, 64 Row equivalence, 163 Row index, 504
Row rank, 100, 103 Row rank equals column rank theorem, 103 Row reduced echelon form, 155-171
S Scalar multiplication, 489-490 Schur complement, 24-36, 194-198 Schur decomposition, 373
Schur determinant formulas, 35 Schur triangularization, 376-377 Schur's lemma, 372 Segre characteristic, 396 Segre sequence, 135-136
Self-adjoint, 293-294 Sherman-Morrison-Woodbury formula, 24-36 Signature, 445 Similar, 372 Simultaneously diagonable, 165 Singular value decomposition,
377-387 Skew-symmetric, 434 Skew-symmetric bilinear forms, 450-452
Tensor product, 452-457 Theorem on elementary column operations, 55 Theorem on elementary row operations, 54
Trace, 528-530 Transition matrix, 541 Transpose, 495-502 Transvection, 51 Triangular matrices, 63, 69 Triangularization theorem, 369 Trilinear, 436 Truncating, 11 Type I, 41, 62 Type II, 41, 62 Type III, 41, 62
U Underdetermined, 46, 48 Underflow, 10
Unique polynomial, 57 Uniqueness of inverse, 18 Uniqueness of the group inverse, 231
Uniqueness theorem, 180-182 Unit ball, 235
Unit roundoff error, 11 Unit sphere, 235-236 Unit vector, 263 Unitarily equivalent, 372 Unitarily similar, 372 Unitary, 270-272, 371-377 Upper triangular, 63, 69
V Vandermonde determinant, 525 Vector view, 2, 4
VmodM, 324
W Weyr sequence, 132-133
Z Zero matrix, 14 Zero row, 64