Linear Algebra
Linear Algebra

C. Y. Hsiung, Wuhan University
G. Y. Mao, Wuhan University of Technology

World Scientific
Singapore · New Jersey · London · Hong Kong
Published by World Scientific Publishing Co. Pte. Ltd.
P O Box 128, Farrer Road, Singapore 912805
USA office: Suite 1B, 1060 Main Street, River Edge, NJ 07661
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
LINEAR ALGEBRA
Copyright © 1998 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 981-02-3092-3
This book is printed on acid-free paper.
Printed in Singapore by Uto-Print
PREFACE

This book introduces the basic properties and operations of linear algebra as well as its basic theory and concepts. It is based on the first author's lecture notes on linear algebra during his teaching in the Mathematics Department of Wuhan University. Some amendments were made and many new ideas were added to the notes before the book was finalized.

The book consists of eight chapters. The first six chapters introduce linear equations and matrices, and the last two chapters introduce linear spaces and linear transformations. They are arranged progressively from easy to difficult, simple to complex, and specific to general, which makes it easier for readers to study on their own. A summary at the beginning of each chapter and each section gives the reader some idea of the topics and aims to be dealt with in the text. Both basic concepts and manipulation skills are equally emphasized, and enough examples are given for their illustration. There are also many exercises at the end of each section, with the answers attached to the end of the book for reference. Basic concepts are specially stressed and great pains have been taken to explain the underlying thoughts and the approach. Furthermore, between chapters and between sections, a brief lead-in is given to preserve the coherence and continuity of the text.

Chapter 1 was written by Professor Jian-Ke Lu, who gave the definition of determinants in a very special way, quite different from those in other textbooks. This definition is easier for the reader to understand and master, and facilitates the proofs of some of their properties. The author would like to thank Bang-Teng Xu for his very helpful and meticulous work on the proofs of the other seven chapters.
CONTENTS
Preface  v

1 Determinants  1
  1.1. Concept of Determinants  1
  1.2. Basic Properties of Determinants  12
  1.3. Development of a Determinant  20
  1.4. Cramer's Theorem  35

2 Systems of Linear Equations  41
  2.1. Linear Relations between Vectors  42
  2.2. Systems of Homogeneous Linear Equations  56
  2.3. Systems of Fundamental Solutions  62
  2.4. Systems of Nonhomogeneous Linear Equations  70
  2.5. Elementary Operations  80

3 Matrix Operations  92
  3.1. Matrix Addition and Matrix Multiplication  92
  3.2. Diagonal, Symmetric, and Orthogonal Matrices  112
  3.3. Invertible Matrices  124

4 Quadratic Forms  141
  4.1. Standard Forms of General Quadratic Forms  142
  4.2. Classification of Real Quadratic Forms  155
  *4.3. Bilinear Forms  170

5 Matrices Similar to Diagonal Matrices  173
  5.1. Eigenvalues and Eigenvectors  174
  5.2. Diagonalization of Matrices  184
  5.3. Diagonalization of Real Symmetric Matrices  198
  *5.4. Canonical Form of Orthogonal Matrices  212
  *5.5. Cayley-Hamilton Theorem and Minimum Polynomials  217

6 Jordan Canonical Form of Matrices  230
  6.1. Necessary and Sufficient Condition for Two Matrices to be Similar  230
  6.2. Canonical Form of λ-Matrices  236
  6.3. Necessary and Sufficient Condition for Two λ-Matrices to be Equivalent  242
  6.4. Jordan Canonical Forms  252

7 Linear Spaces and Linear Transformations  273
  7.1. Concept of Linear Spaces  273
  7.2. Bases and Coordinates  286
  7.3. Linear Transformations  304
  7.4. Matrix Representation of Linear Transformations  321
  *7.5. Linear Transformations from One Linear Space into Another  337
  *7.6. Dual Spaces and Dualistic Transformations  341

8 Inner Product Spaces  349
  8.1. Concept of Inner Product Spaces  349
  8.2. Orthonormal Bases  362
  8.3. Orthogonal Linear Transformations  373
  8.4. Linear Spaces over Complex Numbers with Inner Products  381
  *8.5. Normal Operators  392

Answers to Selected Exercises  395

Index  440
CHAPTER 1 DETERMINANTS
In many practical problems, relations among variables may be simply expressed directly or approximately in terms of linear functions, so that it is necessary to investigate such functions. Linear algebra is a branch of mathematics dealing mainly with linear functions, in which systems of linear equations constitute its basic and also important part. In linear algebra, the notion of determinants is fundamental. The theory of determinants is established to satisfy the need for solving systems of linear equations. It has wide applications in mathematics, as well as in other scientific branches (for instance, physics, dynamics, etc.). The present chapter will mainly deal with the following three problems:

1. Formulation of the concept of determinants;
2. Derivation of their basic properties and study of the related calculations;
3. Solving systems of linear equations by using them as a tool.
1.1. Concept of Determinants

In high school, we already learned how to solve a system of linear equations in 2 or 3 unknowns by using determinants of order 2 or 3 respectively. Is it possible, in general, to solve a system of linear equations in n unknowns in an analogous way? Determinants of an arbitrary order were introduced for such needs. The aim of the present section is to establish the concept of determinants of order n so as to answer the above-mentioned problem 1. We shall first recall results familiar to us in high school.
First of all, let us solve the following system of linear equations in two unknowns $x_1, x_2$:

$$a_{11}x_1 + a_{12}x_2 = b_1, \qquad a_{21}x_1 + a_{22}x_2 = b_2, \tag{1}$$

where $b_1, b_2$ are constant terms and the $a_{ij}$'s $(i, j = 1, 2)$ are called the coefficients of the $x_j$. There are two subscripts in $a_{ij}$: the first one, $i$, and the second one, $j$, which signify that it is the coefficient of $x_j$ in the $i$-th equation. For example, $a_{12}$ is the coefficient of $x_2$ in the first equation. Eliminating $x_2$, we obtain

$$(a_{11}a_{22} - a_{12}a_{21})\,x_1 = b_1 a_{22} - a_{12} b_2.$$

Similarly, eliminating $x_1$, we get

$$(a_{11}a_{22} - a_{12}a_{21})\,x_2 = a_{11} b_2 - b_1 a_{21}.$$

Therefore, when $D = a_{11}a_{22} - a_{12}a_{21} \neq 0$, we have

$$x_1 = \frac{b_1 a_{22} - a_{12} b_2}{a_{11}a_{22} - a_{12}a_{21}}, \qquad x_2 = \frac{a_{11} b_2 - b_1 a_{21}}{a_{11}a_{22} - a_{12}a_{21}}. \tag{2}$$

That is to say, if system (1) is solvable under the proviso $D \neq 0$, then its solution must be (2). By direct verification, we assure that (2) is actually the solution of (1). Thus, (2) is the unique solution of (1) provided that $D \neq 0$. To facilitate memorization we introduce the notation

$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21},$$

and call it a determinant of order 2. It consists of two (horizontal) rows and two (vertical) columns. The numbers appearing in a determinant are called its elements. From the above expression, we see that a determinant of order 2 is the algebraic sum of two terms: one is the product of the two elements situated on the principal diagonal of the determinant, i.e., the diagonal from the left upper corner to the right lower corner, and the other is that of the two elements situated on its sub-diagonal, the other diagonal, with negative sign. For example,

$$\begin{vmatrix} 1 & -2 \\ 3 & 5 \end{vmatrix} = 1 \cdot 5 - (-2) \cdot 3 = 11.$$
It is easily seen that the two numerators in (2) may be respectively written as, by definition of determinants of order 2,

$$b_1 a_{22} - a_{12} b_2 = \begin{vmatrix} b_1 & a_{12} \\ b_2 & a_{22} \end{vmatrix}, \qquad a_{11} b_2 - b_1 a_{21} = \begin{vmatrix} a_{11} & b_1 \\ a_{21} & b_2 \end{vmatrix}.$$

If we denote

$$D = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \qquad D_1 = \begin{vmatrix} b_1 & a_{12} \\ b_2 & a_{22} \end{vmatrix}, \qquad D_2 = \begin{vmatrix} a_{11} & b_1 \\ a_{21} & b_2 \end{vmatrix},$$
then the unique solution (2) of system (1) may be written as
$$x_1 = \frac{D_1}{D} = \frac{\begin{vmatrix} b_1 & a_{12} \\ b_2 & a_{22} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}}, \qquad x_2 = \frac{D_2}{D} = \frac{\begin{vmatrix} a_{11} & b_1 \\ a_{21} & b_2 \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}},$$

which may be easily memorized.

Example 1. Solve the system of linear equations

$$\begin{cases} 2x + y = 7, \\ x - 3y = -2. \end{cases}$$

Solution: In this case,

$$D = \begin{vmatrix} 2 & 1 \\ 1 & -3 \end{vmatrix} = -7, \qquad D_1 = \begin{vmatrix} 7 & 1 \\ -2 & -3 \end{vmatrix} = -19, \qquad D_2 = \begin{vmatrix} 2 & 7 \\ 1 & -2 \end{vmatrix} = -11,$$

and hence, the unique solution of the given system is

$$x = \frac{D_1}{D} = \frac{19}{7}, \qquad y = \frac{D_2}{D} = \frac{11}{7}.$$
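For readers who like to check such hand computations by machine, the order-2 case is a two-line affair. The following is a small illustrative sketch in Python; the helper name det2 is ours, not the book's, and it simply encodes the definition of a determinant of order 2 together with Example 1:

```python
# Determinant of order 2, as defined above: a11*a22 - a12*a21.
def det2(a11, a12, a21, a22):
    return a11 * a22 - a12 * a21

# Example 1: 2x + y = 7, x - 3y = -2.
D  = det2(2, 1, 1, -3)    # coefficient determinant: -7
D1 = det2(7, 1, -2, -3)   # first column replaced by the constants: -19
D2 = det2(2, 7, 1, -2)    # second column replaced by the constants: -11

x, y = D1 / D, D2 / D
print(x, y)               # 19/7 and 11/7, about 2.714 and 1.571
```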
Let us then solve the following system of linear equations in 3 unknowns:

$$\begin{cases} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 = b_1, \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 = b_2, \\ a_{31}x_1 + a_{32}x_2 + a_{33}x_3 = b_3. \end{cases} \tag{3}$$

Analogous to the above, eliminating $x_3$ both from the first two equations and from the last two equations gives rise to a new system of two linear equations in $x_1, x_2$, the solution of which may be obtained as before, and $x_3$ is
then obtained by substitution in any of the given equations. By some rather lengthy calculations, we finally get

$$\begin{aligned}
x_1 &= \frac{1}{D}\,(b_1 a_{22} a_{33} + a_{12} a_{23} b_3 + a_{13} b_2 a_{32} - b_1 a_{23} a_{32} - a_{12} b_2 a_{33} - a_{13} a_{22} b_3), \\
x_2 &= \frac{1}{D}\,(a_{11} b_2 a_{33} + b_1 a_{23} a_{31} + a_{13} a_{21} b_3 - a_{11} a_{23} b_3 - b_1 a_{21} a_{33} - a_{13} b_2 a_{31}), \\
x_3 &= \frac{1}{D}\,(a_{11} a_{22} b_3 + a_{12} b_2 a_{31} + b_1 a_{21} a_{32} - a_{11} b_2 a_{32} - a_{12} a_{21} b_3 - b_1 a_{22} a_{31}),
\end{aligned} \tag{4}$$

provided that

$$D = a_{11} a_{22} a_{33} + a_{12} a_{23} a_{31} + a_{13} a_{21} a_{32} - a_{11} a_{23} a_{32} - a_{12} a_{21} a_{33} - a_{13} a_{22} a_{31} \neq 0.$$
(4) is actually a solution of system (3), which may be checked by direct substitution. Therefore, (4) is the unique solution of (3) if $D \neq 0$.

Analogous to the case $n = 2$, we define a determinant of order 3:

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11} a_{22} a_{33} + a_{12} a_{23} a_{31} + a_{13} a_{21} a_{32} - a_{11} a_{23} a_{32} - a_{12} a_{21} a_{33} - a_{13} a_{22} a_{31}, \tag{5}$$

which consists of three rows and three columns and is the algebraic sum of 6 terms. The summation may be memorized as follows: As in the following figure, we add up the products of the three elements situated on each solid line with a positive sign and those on each dotted line with a negative sign.
[Figure: the array of elements $a_{11}, a_{12}, a_{13}; a_{21}, a_{22}, a_{23}; a_{31}, a_{32}, a_{33}$, with solid lines joining the three triples whose products enter (5) with a positive sign and dotted lines joining the three triples that enter with a negative sign.]
For example,

$$A = \begin{vmatrix} 2 & 1 & 2 \\ -4 & 3 & 1 \\ 2 & 3 & 5 \end{vmatrix} = 2 \cdot 3 \cdot 5 + 1 \cdot 1 \cdot 2 + 2 \cdot (-4) \cdot 3 - 2 \cdot 1 \cdot 3 - 1 \cdot (-4) \cdot 5 - 2 \cdot 3 \cdot 2 = 30 + 2 - 24 - 6 + 20 - 12 = 10.$$
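The six-term rule just pictured (often called the rule of Sarrus) translates directly into code. A minimal sketch in Python, with det3 as our own helper name, applied to the determinant $A$ just computed:

```python
# Determinant of order 3 by the six-term rule (5).
def det3(m):
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = m
    return (a11 * a22 * a33 + a12 * a23 * a31 + a13 * a21 * a32
            - a11 * a23 * a32 - a12 * a21 * a33 - a13 * a22 * a31)

A = [[2, 1, 2],
     [-4, 3, 1],
     [2, 3, 5]]
print(det3(A))   # prints 10, as computed above
```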
In the expression (4), the common denominator of $x_1, x_2, x_3$ is

$$D = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix},$$

while the numerators are respectively the determinants

$$D_1 = \begin{vmatrix} b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33} \end{vmatrix}, \qquad D_2 = \begin{vmatrix} a_{11} & b_1 & a_{13} \\ a_{21} & b_2 & a_{23} \\ a_{31} & b_3 & a_{33} \end{vmatrix}, \qquad D_3 = \begin{vmatrix} a_{11} & a_{12} & b_1 \\ a_{21} & a_{22} & b_2 \\ a_{31} & a_{32} & b_3 \end{vmatrix},$$

which may be obtained by replacing the first, the second and the third columns in $D$ by the column formed by $b_1, b_2, b_3$ respectively. Thus, (4) may be simply written as

$$x_1 = \frac{D_1}{D}, \qquad x_2 = \frac{D_2}{D}, \qquad x_3 = \frac{D_3}{D},$$
the structure of which is quite similar to the case of $n = 2$.

Example 2. Solve the system of linear equations

$$\begin{cases} 2x - y + z = 0, \\ 3x + 2y - 5z = 1, \\ x + 3y - 2z = 4. \end{cases}$$

Solution: Here

$$D = \begin{vmatrix} 2 & -1 & 1 \\ 3 & 2 & -5 \\ 1 & 3 & -2 \end{vmatrix} = 28, \qquad D_1 = \begin{vmatrix} 0 & -1 & 1 \\ 1 & 2 & -5 \\ 4 & 3 & -2 \end{vmatrix} = 13,$$

$$D_2 = \begin{vmatrix} 2 & 0 & 1 \\ 3 & 1 & -5 \\ 1 & 4 & -2 \end{vmatrix} = 47, \qquad D_3 = \begin{vmatrix} 2 & -1 & 0 \\ 3 & 2 & 1 \\ 1 & 3 & 4 \end{vmatrix} = 21,$$
and hence

$$x = \frac{13}{28}, \qquad y = \frac{47}{28}, \qquad z = \frac{3}{4}.$$

We have rather simple expressions for the solutions of system (1) and system (3) in terms of determinants of orders 2 and 3 respectively under the assumption $D \neq 0$. It is natural to ask whether we could express the solution of a system of linear equations in $n$ unknowns in terms of determinants of order $n$ using a certain appropriate definition for such determinants. We realize that it is very easy to eliminate one unknown in a system of 2 equations in 2 unknowns, but it is rather lengthy to eliminate 2 unknowns in a system of 3 equations in 3 unknowns. If we attempt to eliminate $n - 1$ unknowns in a system of $n$ equations in $n$ unknowns and then to define determinants of order $n$ for any $n$, then the procedure would be very complicated and tedious, and even impossible for large $n$. Therefore, in order to give such a definition, we are forced to find an alternative way. However, we may get some inspiration from the structures of determinants of orders 2 and 3, which would be useful in the formulation of a definition of determinants of an arbitrary order. Moreover, such a definition should lead us to our objective of expressing the solution of a system of an arbitrary number of linear equations simply and in terms of determinants as in the cases $n = 2$ and 3.

From the structure of (5), we observe that:

1. Each term in (5) is the product of three elements taken from different rows and different columns. Thus, each term, disregarding its sign, may be written as $a_{1p_1} a_{2p_2} a_{3p_3}$, where $(p_1, p_2, p_3)$ is a permutation of $(1, 2, 3)$.

2. The sign in front of each term is determined by a certain rule which we are about to discover. We see that the term $a_{11} a_{22} a_{33}$, where all the occurring elements are from the principal diagonal, is with a positive sign. The elements appearing in each of the other five terms are not all from the principal diagonal. Of these, two terms $a_{12} a_{23} a_{31}$ and $a_{13} a_{21} a_{32}$ are with positive sign too. By interchanging two rows or two columns 2 (an even number) times successively, all the elements appearing in each of them may be transferred to the principal diagonal. For instance, for the term $a_{12} a_{23} a_{31}$, it needs an interchange of row 1 and row 2, and then that of column 1 and column 3. For the remaining three terms $a_{11} a_{23} a_{32}$, $a_{12} a_{21} a_{33}$ and $a_{13} a_{22} a_{31}$, we may arrive at the same situation by only one (an odd number) such interchange. For example, for $a_{11} a_{23} a_{32}$, we need only interchange row 2 and row 3.

3. Since $(1, 2, 3)$ has $3! = 6$ different permutations, there are 6 terms in (5).
Similar observations may be made for determinants of order 2. Inspired by these observations, we define a determinant of order $n$ as follows.

Given $n^2$ numbers $a_{ij}$, $i, j = 1, \ldots, n$. Arrange them as an array with $n$ rows and $n$ columns as

$$\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix},$$

calling it a determinant of order $n$ ($a_{ij}$ is called its element or the number situated at the $i$-th row and $j$-th column), and define as its value the algebraic sum of all the terms described below.

1. Any term, apart from its sign, is the product of $n$ elements, one from each row and each column, and may be written in the general form

$$a_{1p_1} a_{2p_2} \cdots a_{np_n},$$

where the first subscripts are arranged in the order of $(1, 2, \ldots, n)$ while the second are in the order of a permutation $(p_1, p_2, \ldots, p_n)$ of $(1, 2, \ldots, n)$.

2. The sign in front of each product is determined as follows. Interchange two rows or two columns successively so that the elements occurring in this product are transferred to the principal diagonal. The sign of this product is positive or negative depending on whether the number of such interchanges is even or odd.

By the theorem at the end of this section, the sign of each term thus determined is unique, so that the rule described above is meaningful.* Since there are $n!$ permutations of $n$ numbers, the total number of terms in a determinant of order $n$ is $n!$. Thus, a determinant of order $n$ may be expressed as

$$\begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} = \sum \pm\, a_{1p_1} a_{2p_2} \cdots a_{np_n},$$
"In many textbooks, the sign is determined by the evenness or oddness of the numbers of "reversed order" in p i , P2, •••. Pn, which is identical to ours. However, the rule used here seems simpler for proving certain properties of determinants described below.
where $\sum$ denotes summation over all the $n!$ permutations of $(1, \ldots, n)$ and the sign $+$ or $-$ should be taken as described above, or, what is the same as the reader may easily prove, the sign is $+$ or $-$ according as $(p_1, p_2, \ldots, p_n)$ is an even or odd permutation of $(1, 2, \ldots, n)$, i.e., $(p_1, p_2, \ldots, p_n)$ becomes $(1, 2, \ldots, n)$ after an even or odd number of successive interchanges between two of its numbers. Moreover, a determinant of order 1 consisting of a single element $a$ is defined as $a$ itself.
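The signed sum over all $n!$ permutations can be programmed literally, which is a convenient way to experiment with small determinants even though it is far too slow for large $n$. A sketch in Python is given below; the helpers perm_sign and det_by_definition are our own names, and the sign is obtained by counting interchanges exactly as in the rule above:

```python
from itertools import permutations
from math import prod

def perm_sign(p):
    # +1 if p is an even permutation of (0, 1, ..., n-1), -1 if it is odd;
    # we count the interchanges needed to restore the natural order.
    p = list(p)
    sign = 1
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]   # one interchange of two entries
            sign = -sign
    return sign

def det_by_definition(a):
    n = len(a)
    return sum(perm_sign(p) * prod(a[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

print(det_by_definition([[2, 1, 2], [-4, 3, 1], [2, 3, 5]]))   # prints 10
```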
Let us calculate two of the simplest, but most fundamental, determinants by definition.

Example 3. Prove that the determinants of diagonal type (the unwritten elements being all zeros) have values

$$\begin{vmatrix} a_1 & & \\ & \ddots & \\ & & a_n \end{vmatrix} = a_1 a_2 \cdots a_n, \qquad \begin{vmatrix} & & a_1 \\ & \iddots & \\ a_n & & \end{vmatrix} = (-1)^{\frac{n(n-1)}{2}}\, a_1 a_2 \cdots a_n.$$
Proof. The first equality is evident by definition. The only non-zero term of the second determinant is $a_1 a_2 \cdots a_n$, with the sign determined below. Interchange its row 1 and row $n$, then row 2 and row $n-1$, and so on until the elements $a_1, \ldots, a_n$ are transferred to the principal diagonal. If $n$ is even, we require a total of $\frac{n}{2}$ interchanges and so the sign should be $(-1)^{n/2}$. If $n$ is odd, then we need $\frac{n-1}{2}$ interchanges (the middle row remaining fixed) and so the sign should be $(-1)^{(n-1)/2}$. Note that $(-1)^{n-1} = -1$ when $n$ is even and $(-1)^{n} = -1$ when $n$ is odd. Therefore, in both cases the sign is $(-1)^{n(n-1)/2}$, from which follows the second equality.

The second equality may be proved in another way. First, interchange the $n$-th row successively with its adjacent upper row. It is transferred to the first row after $n-1$ such interchanges. Then interchange the $n$-th row of the new determinant (the $(n-1)$-th row of the original one) with its adjacent upper rows successively until it is transferred to the second row. This needs $n-2$ interchanges. And so on. After a total of

$$(n-1) + (n-2) + \cdots + 2 + 1 = \frac{n(n-1)}{2}$$

such interchanges, all of $a_1, \ldots, a_n$ are transferred to the principal diagonal, which proves the second equality.
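The sign $(-1)^{n(n-1)/2}$ of Example 3 is easy to confirm numerically for small $n$. A quick check follows (Python with NumPy, which is our assumption here; the rounding only removes floating-point noise from the library's determinant):

```python
import numpy as np

for n in range(1, 8):
    a = np.arange(1, n + 1)              # a_1, ..., a_n
    m = np.zeros((n, n))
    for i in range(n):
        m[i, n - 1 - i] = a[i]           # place a_{i+1} on the sub-diagonal (anti-diagonal)
    lhs = round(np.linalg.det(m))
    rhs = (-1) ** (n * (n - 1) // 2) * a.prod()
    print(n, lhs, rhs, lhs == rhs)       # the two values agree for every n
```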
Note that, after interchanging any two adjacent rows in a determinant, the order of any two rows except these themselves remains unchanged, a fact which will be used later.

The triangular determinants in the following example are more general than the diagonal type.

Example 4. Prove that the triangular determinants have values

$$\begin{vmatrix} a_{11} & \cdots & a_{1,n-1} & a_{1n} \\ & \ddots & \vdots & \vdots \\ & & a_{n-1,n-1} & a_{n-1,n} \\ & & & a_{nn} \end{vmatrix} = a_{11} \cdots a_{n-1,n-1}\, a_{nn},$$

$$\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1,n-1} & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2,n-1} & \\ \vdots & \vdots & \iddots & & \\ a_{n-1,1} & a_{n-1,2} & & & \\ a_{n1} & & & & \end{vmatrix} = (-1)^{\frac{n(n-1)}{2}}\, a_{1n}\, a_{2,n-1} \cdots a_{n1}.$$

Proof. The first equality is obvious since the only nonzero term in it is $a_{11} a_{22} \cdots a_{nn}$, with positive sign. The same is true for the second equality, since the only nonzero term of the determinant is $a_{1n} a_{2,n-1} \cdots a_{n1}$, with sign $(-1)^{n(n-1)/2}$ as in Example 3.

Finally, to end the present section we shall prove the following theorem, without which the sign of each term in a determinant would not be definitely determined and the definition of determinants given would be ambiguous.

Theorem. If we have two different ways of transferring the $n$ elements occurring in a term of a determinant of order $n$ to the principal diagonal by interchanging two rows or columns successively, then the numbers of such interchanges in these two different ways must be both even or both odd.

Proof. To facilitate the proof, we shall consider some equivalent reductions of the theorem.

Firstly, it is easily seen that, if a determinant $A$ is transformed to $B$ by several interchanges of two rows or columns successively, then $B$ may be transformed to $A$ by the same number of interchanges successively in reversed order. Therefore proving the theorem is equivalent to proving that if all the elements of the diagonal term $a_{11} a_{22} \cdots a_{nn}$ of a determinant are transferred to
themselves, i.e., still remaining on the diagonal, by successive interchanges of rows or columns, then the number of such interchanges must be even.

Secondly, in the proof of this mentioned fact, without loss of generality, all the involved interchanges of rows or columns may always be assumed to be interchanges between adjacent rows or columns, for any interchange of two rows or columns may be regarded as a succession of an odd number of adjacent interchanges, and so this substitution does not affect the evenness or oddness of the total number of the involved interchanges. For instance, by interchanging row 1 and row 5 in a determinant of order 5, the original order of rows 1, 2, 3, 4, 5 is changed into 5, 2, 3, 4, 1, which may also be accomplished by successive adjacent interchanges: 1,2,3,4,5 → 1,2,3,5,4 → 1,2,5,3,4 → 1,5,2,3,4 → 5,1,2,3,4 → 5,2,1,3,4 → 5,2,3,1,4 → 5,2,3,4,1, seven interchanges in total.

Thus, the theorem is proved if the following proposition is established: if all the $n$ elements on the principal diagonal of a determinant of order $n$ are transferred again to those of a new determinant by successive interchanges of adjacent rows or columns in any way, the total number of such adjacent interchanges is always even.

We shall prove this by induction. It is evidently true for $n = 2$. Suppose it is true for determinants of order $n - 1$. Assume all the elements $a_{11}, \ldots, a_{nn}$ on the principal diagonal of a determinant $A$ of order $n$ are transferred to those of a new determinant by $m$ adjacent interchanges of rows or columns, of which some keep the (temporary) position of $a_{11}$ fixed, while others change its position. Let the position at the $i$-th row and $j$-th column in the determinant be denoted by $(i, j)$. The initial position of $a_{11}$ is $(1, 1)$. Assume its final position is $(k, k)$. Both sums of the ordinal numbers of row and column are even. When we make an adjacent interchange between rows or columns at one of which $a_{11}$ is located, the ordinal number of this row or column will be increased or decreased by 1. Hence the total number of adjacent interchanges for $a_{11}$ transferring to $a_{kk}$ must be even, say $2l$. Therefore, there are $m - 2l$ adjacent interchanges keeping $a_{11}$ fixed. If we can prove that $m - 2l$ is an even number, and so is $m$, then our proposition is established.

To prove this, let us consider the determinant $B$ of order $n - 1$, which results after the first row and first column of $A$ have been cancelled from $A$ with the orders of the remaining rows and columns unchanged. An adjacent interchange of $A$ involving the row or column on which $a_{11}$ is located causes an adjacent interchange of $B$, while that of $A$ independent of the mentioned row
and column does not cause any change of $B$. Hence, among the $m$ adjacent interchanges of rows and columns in $A$, the above-mentioned $m - 2l$ adjacent interchanges of $A$ are also in $B$, which makes all the elements $a_{22}, \ldots, a_{nn}$ remain on its principal diagonal. By induction, $m - 2l$ must be even. The theorem is completely proved.

Exercises

1. Solve the following systems of equations:

(1) $\begin{cases} x \tan\alpha + y = \sin(\alpha + \beta), \\ x - y \tan\alpha = \cos(\alpha + \beta); \end{cases}$

(2) $\begin{cases} x + y + z = 1, \\ x + \omega y + \omega^2 z = \omega, \\ x + \omega^2 y + \omega z = \omega^2, \end{cases}$

where $\omega$ is one of the complex cubic roots of unity.

2. Write out all the terms containing $a_{11} a_{22}$ as a factor in a determinant of order 4.

3. Calculate from definition

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\ a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\ a_{31} & a_{32} & 0 & 0 & 0 \\ a_{41} & a_{42} & 0 & 0 & 0 \\ a_{51} & a_{52} & 0 & 0 & 0 \end{vmatrix}.$$
4. Prove that

$$\begin{vmatrix} a & b & c \\ c & a & b \\ b & c & a \end{vmatrix} = a^3 + b^3 + c^3 - 3abc.$$

5. If the number of zero elements in a determinant of order $n$ is greater than $n^2 - n$, then the determinant is zero. Why?

6. Can all the elements appearing in any term of a determinant be transferred to lie on its sub-diagonal by successive interchanges of rows or columns? If so, can we obtain a rule for determination of the sign of the term on the basis of such interchanges?

7. Prove that multiplying each element $a_{ij}$ of a determinant $A$ by $b^{i-j}$ ($b \neq 0$) results in a determinant that is equal to $A$.
1.2. Basic Properties of Determinants

We could calculate the value of a determinant of an arbitrary order from its definition. However, this would be very tedious in general and so would not be used in most cases. In the present section, some basic properties of determinants will be illustrated, making use of which calculations of determinants may be considerably simplified.

Property 1. When all the elements in a row of a determinant are multiplied by a number $k$, the resultant determinant equals $k$ times the original one. Or, what is the same, if $k$ is a common factor of all the elements in a row of a determinant, then $k$ may be factored out of the determinant, i.e.,

$$\begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ k a_{i1} & \cdots & k a_{in} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} = k \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{i1} & \cdots & a_{in} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix}.$$

As the general term of the determinant on the left-hand side is

$$\pm\, a_{1p_1} \cdots (k a_{ip_i}) \cdots a_{np_n} = k\,(\pm\, a_{1p_1} \cdots a_{ip_i} \cdots a_{np_n}),$$

this property follows immediately. For example, multiplying the first row of the determinant of order 3 in Section 1.1 by 2, we get

$$\begin{vmatrix} 4 & 2 & 4 \\ -4 & 3 & 1 \\ 2 & 3 & 5 \end{vmatrix} = 2 \begin{vmatrix} 2 & 1 & 2 \\ -4 & 3 & 1 \\ 2 & 3 & 5 \end{vmatrix} = 2A = 20.$$
In particular, a determinant equals zero when all the elements in a certain row vanishes (i.e., k = 0). Property 2. If every element in the i-th row of a determinant is written as the sum of two terms: 0>ij — 0j "r Cj ,
j = l,-
then the determinant is equal to the sum of two determinants in which the i-th rows are respectively b\,..., bn and c\,..., c„, the other rows remaining unchanged:
13
Determinants an
a12
ain
61 +Cl
b2 + c2
On 1 C n
O-nl
On2
ann
=
a n
012
•■■
ain
61
62
••■
On
Onl
0,n2
■ ■
an
«12
Cl
C2
ani
a«2
+
^nn
■ ■ •
ain Cn dnn
This follows from the fact that the general term of the determinant on the left-hand side is ± ai P l •■ • (bPi + cPi)
*npn
= ± ai P l • • ■ 6Pi - •nj>r> • L
±
aipi
• • ■ cPi
^npn 1
where either the upper or the lower signs apply throughout since the order of the columns of all these determinants remain the same. For example, 2 -4 1+1
1 3 1+2
2 1 2+3
2 4 1
1 3 1
2 1 2
+
2 1 2 -4 3 1 = 5 + 5 = 10 = 4 . 1 2 3
Property 2 may be extended to the case where every element in its i-th row is the sum of 3, 4, or m terms. Property 3. If any two rows of a determinant are interchanged, then only its sign is changed. Any product ai P l ■ ■ • anPn appearing in the original determinant also appears in the resultant one and vice versa. Their signs are however opposite each other for these two determinants according to the rule for determination of the sign of any term of a determinant, since the numbers of interchanges of the rows for a i P l , . . . , a„Prv to lie on the principal diagonals of the two determinants differ by 1. It is not difficult to obtain the following two conclusions as a consequence of the above discussion:
14
Linear
Algebra
1. If two rows of a determinant are identical then it is equal to zero. For, when we interchange these two rows the determinant remains the same, but, on the other hand, by Property 3 it should change sign. Therefore it must be zero. 2. If the corresponding elements in two rows of a determinant are proportional, then it equals zero. For, if each element in its i-th row is k times the corresponding element in the j'-th row (j ^ i), then, by Property 1, k may be factored out, resulting in a determinant with two identical rows, which is zero by Property 3. Example 1. Prove that in a determinant D of order n: On
■ • ■
O-ln
D =
(n > 2), O-nl
the number of terms with positive sign and that with negative sign are equal to each other. Proof. Put a.ij = 1 in D, then
£> = ]T ±1 = m - n where m and n are numbers of terms with positive and negative signs respec tively. On the other hand, D = 0 by the above-mentioned consequence 1. Hence m = n. Property 4. In a determinant, if we add k times of each element in a row to the corresponding element in another row, then the value of the determinant remains unchanged, i.e., an a,i + kdji «nl
("in T fvO-jn
Oi„
Oil
O-ln
=
Oil
■
O'in
Onl
•
onu
U * i) ■
This is readily proved by combining Property 2 with the previous conse quence 2. Note that in the above equality the j - t h row of the determinant on the left-hand side remains invariant.
15
Determinants For example, 2 1 2 -4 3 1 2 3 5
2 -4 + 2-2 2
2 1 2 0 5 5 2 3 5
1 3 + 1-2 3
2 1 + 2-2 5
2 1 2 O i l = 10. 2 3 5
Example 2. a d c d
c—a a— a c a c c—a a—
c 0 0 d b b d c 0 0
Example 3.
a+b a b
0 -26 -2a = -2 a b+c a 6 b c+ a
c c b+c a b c+ a
b a c 0 = 4a6c. 0 c
By the symmetry of row and column in the definition of a determinant, the above Properties 1-4 and consequences 1, 2 remain valid for "columns" in place of "rows". The following concept is fundamental. Given a determinant
D =
an
dl2
a^
0-21
«22
a2n
O-nl
0,n2
if we change all its rows (columns) into columns (rows) with the orders pre served, then we get a new determinant an D' = au O-ln
A21
CLnl
a22
an2
CL2n
called the transposed determinant of D. Obviously, the transpose of D' is D. So we may say that D and D' are transposes of each other. It is evident that
16
Linear Algebra
the element of D' at its i-th row and j - t h column is a,ji, which is the element of D at its j - t h row and i-th column (the elements of D and D' on their principal diagonals remain unchanged). Property 5. The value of a determinant D is the same as its transpose D' The same product ai P l • • • anPn will occur as a term of D and as a term of D'. An interchange between two rows (columns) of D is equivalent to one between two columns (rows) in £>'. Therefore, Property 5 follows immediately by the rule for determination of the sign of a term in a determinant. For example, 2 A' = 1 2
-4 3 1
2 2 0 0 3 = 1 5 2 = 10 = A. 5 2 5 3
The above properties of determinants are important in both calculations and theoretical deliberations. We illustrate by some examples. Example 4. Prove that
b+c c+a q + r r +p y + z z+x
a+b a b c p+q = 2 p q r x+y x y z
Proof. By Properties 2 and 3, we see that b c+ a b+ c c+ a a+b q+r r+p p+q = q r+p y+z z +x x +y y z+x b c+a = q r+p y z+x b c = q r y Z
a p X
a+b p+q x+y a P
+
X
+
+
c r z
C-T
a
r+p z+x
a+b p+q x+y
c a a +b r P p+q z X x+y
c x
a p
Z
X
b a b c q =2 p q r x y z y
The calculation of a determinant can be greatly simplified if it is first trans formed to a triangular one using the above properties.
17
Determinants
E x a m p l e 5. Calculate 1 D =
1
1
1
a,\
a
0,2
0,2
a
03
03
&3
0,3
a
a,2 a.1
Solution: Adding — oi, —02 and —03 times of the first row to the second, third and fourth rows respectively, we obtain 1 D
1
1
1
0 0
a — CLI 0
d2 — Qi a — a-2
02 — 0,1 03—02
0
0
0
a-a3
which is triangular. Then by Example 4 of the previous section D = (a — ai) (a - a 2 ) (a - a 3 ) . E x a m p l e 6. Calculate
D
3 1 1 1 1 3 1 1 1 1 3 1 1 1 1 3
Solution: Note that the sum of the elements of each row is 6. For each row, add to the element of the first column the elements of all the other columns. Then after taking out the factor 6, subtract, for each row, the element of the first column from the elements of all the other columns in turn. This results in a triangular determinant as follows:
6 1 1 1 6 3 11 =6 6 13 1 6 1 1 3
1 1 13 1 1 1 1 10 12 10 10
1 1 11 3 1 1 3 0 0 0 0 d 2 0 = 6-2 0 2
48.
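The reduction to a triangular determinant used in Examples 5 and 6 is also the idea behind the standard numerical method. Below is a small illustrative sketch (plain Python with exact fractions; det_by_elimination is our own helper, not from the book) that applies Properties 3 and 4 until the determinant is triangular and then multiplies the diagonal entries, checked on the determinant of Example 6:

```python
from fractions import Fraction

def det_by_elimination(a):
    a = [[Fraction(x) for x in row] for row in a]
    n, sign = len(a), 1
    for k in range(n):
        # find a row with a nonzero entry in column k and swap it up
        p = next((i for i in range(k, n) if a[i][k] != 0), None)
        if p is None:
            return Fraction(0)            # a whole column of zeros: determinant is 0
        if p != k:
            a[k], a[p] = a[p], a[k]
            sign = -sign                   # Property 3: an interchange flips the sign
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]     # Property 4: the value is unchanged
    result = Fraction(sign)
    for k in range(n):
        result *= a[k][k]                  # triangular determinant (Example 4, Sec. 1.1)
    return result

m = [[3, 1, 1, 1], [1, 3, 1, 1], [1, 1, 3, 1], [1, 1, 1, 3]]
print(det_by_elimination(m))   # 48, agreeing with Example 6
```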
18
Linear Algebra
Example 7. Calculate C a
b a a b 0 a a a 0
° - tc S i
Solution: -6 0 D = 6 0
0 -b 0 b
6 a 0 a
a b a = 0
-6 0 0 0
0 -6 0 0
6+ a a+ 6 6 + 2a 2a+ 6
-6 0 0 0
ii I) 2
0 -6 0 0
b a b 2a
-6 0 0 0
0 -6 0 0
a 6 2a 6 6+ a a+6 6+2a 0
a b 2a 6-2a
2 2 = 9{b - 4 a )
Example 8. Calculate a2 + 1 D = a6 ac
a& <1C be 62 + 1 be c2 + 1
Solution: First assume a6c ^ 0. Then a+i £> = a6c
6
c 1
a a
0+1 b
c c+\
a 2 + 62 + c2 + 1 62 2 2 2 2 = a + 6 + c + l 6 + l a 2 + 62 + c2 + 1 62 = ( a 2 + b2 + c 2 +
1}
1 1 1
1 2 2 = > 2 + 6 + c 4-1) 0 0
a2 + 1 a2 = a2 c2 c2 2 c + 1
62 6 + l 62 2
62 1 0
62 6 +1 62 c2 + l 2
c2 c2 2 c +1
c2 0 = a2-i -b2 + c 1
19
Determinants
It is easy to verify that the equality holds in the case one or more of a, 6, c equal to zero. To end this section we introduce two important classes of determinants. If a,ij = a,ji (i, j = 1,2,..., n) in a determinant
D =
an
ai2
■ ■
0,1
A21
a-22
■
■
a.2
ani
&n2
■
an
then D is called symmetric; if a.y = —a,ji (i,j = l,..., n, and so an = 0), then D is called skew-symmetric. For instance, the determinants in Examples 5 and 6 are symmetric while that in Section 1.3 below is skew-symmetric if a = 0. Example 9. Prove that any skew-symmetric determinant of odd order is equal to zero. Proof. Since a^- = — a,j, we have, by Properties 1 and 5,
D =
0
0,12
0-ln
0
-a-21
-O-nl
a2l
0
0-2n
-0-12
0
-d„2
Onl
Q-n2
0
-O-ln
= ("I)"
0
02l
O-nl
ai2
0
an2
O-ln
0,2n
' ■ •
^0,2n
= (-!)"£>.
0
In the case n is odd, obviously D — 0. Exercises 1. Calculate the following determinants:
(1)
1 2 0 1 1 3 5 0 0 1 5 6 1 2 3 4
(3)
a — b— c 2a 26 b-c-a 2c 2c
(2)
1 1 1 1 1 1 1 -1 1 1 1 -1 1 1 1 -1
2a 26 c— a — b
0
20
Linear Algebra
(4)
a b b b a b a b a a b a b b b a
1 3 3 3 2 3 3 3 3
(5)
3
3
3
n
2. Prove that
(1)
(2)
b d a+b a+b+c a+b+c+d 2 a + 6 3a + 26 + c 4a + 36 + 2c + d 3a + b 6a + 36 + c 10a + 66 + 3c + d by + az bx + ay bz + ax
bz + ax by + az bx + ay
bx + ay bz + ax by + az
x (a3 + 63) z y
y X
z
z y x
3. Prove that 2 3 4
n
3 4 5
n 1 2
1 2
*("-i) n " + rin - l = (-1)
n-1
4. Prove Exercise 7, Sec. 1.1 using the properties of determinants. 5. Prove that an{t) an(t) d_ a2i(t) a 2(t) 2 dt a.3i{t) a32(t)
a13(i) a2z{t) a 33 (i)
au(t) «2lW
a
ai 2 (t) 22(0
031 (*)
032(*)
ai 3 (t) 23(0
a
a33(t)
a'uW a2i(t) a3i(t)
+
a
'i2(t) a22{t) a32(t)
an(t) a2\{t) a'3i(t)
a'13(t) a23(t) a33(t)
ai2(t) a22(t) a'32(t)
a13(t) a23(i) a'33(t)
and generalize it to the case of a determinant of an arbitrary order. 6. What is the relation between a determinant and one that is obtained by transposing around its sub-diagonal?
1.3. Development of a Determinant

We have given some basic properties of determinants, by which the calculations of many determinants may be simplified. We would now see a general
21
Determinants
method for such calculations. We know that the lower the order of a determi nant is, the simpler is its calculation. We therefore look for a general method to reduce the order of a determinant. To this aim, we first introduce some definitions. Taking certain k definite rows and k definite columns of a determinant D of order n(k < n), we construct a determinant of order k, called a minor of D of order k. The elements of the minor are those of D situated in the crossed area of the rows and columns taken with their relative orders preserved. In particular, by omitting all the elements of the i-th row and j'-th column at which the element a^ of D locates, the resultant determinant Mij, of order n — 1, is called the complement minor of a^- in D. For instance, for the determinant of order 4 an
0.12
ai3
an
«21
a22
a.23
a24
0-31 a4i
0-32
0-33
a34
a42
a.43
«44
a n and a23 are minors of order 1 and
Mn =
a-22
0.23
«24
0-32
0-33
«34
042
a.43
»44
a
,
M23 =
n
031 an
ai2
ai4
«32
034 a44
Q42
are minors of order 3. It is easily seen that the complement of a^- in D is the transpose of its complement in the transpose D'. We expect to express a determinant D in terms of its complements. Let us consider a n M n first. Since any term of M n is of the form ± a 2 p 2 • ■ • anpn, any term ±ana2P2 ■ ■ ■ anPn in a n Mn is a term of D. Therefore, all the terms of a n M n form a part of the terms of D. Then, consider the general case a\k Mik, which we shall reduce to the above special case. Interchange the fc-th column of D with its preceding adjacent column successively k — 1 times so that it is transferred to the first column; a\k is now situated at the left upper corner of the new determinant B. By Property 3, Sec. 1.2, we have
B=
(-l)k~1D.
22
Linear Algebra
Note that each of the above interchanges does not change the relative order of the columns of Mm, in D and so the complement minor of a\k in B is still Mifc. Thus, all the terms in a\k Mifc are terms of B. Hence, all the terms of aife(-l) fc_1 Mifc are terms of ( - l ) * - 1 ^ or D. Thus, we know that all the terms in the following products a n Mu, -an M12,...
,ain(-l)n_1Mn_i
are also terms of D, different from each other. But Mifc is a determinant of order n — 1 and so there are (n — 1)! terms in Mifc as well as in aifc(—1) Mifc. Therefore, we have got a total of n\ different terms of D, which means we have exhausted all the terms of D. Thus we can conclude D = a n M u - aw M12 + • ■ • + a i „ ( - l ) n _ 1 M i „ , or, what is the same, £> = 5>ifc(-l) 1 + f c Mifc. fc=l
As an example, we have 2 -4 2
1 2 3 1 3 1 = 2 3 5 3 5
-4 2
1 -4 + 2 5 2
3 3
= 24 + 22 - 36 = 10. The above result is easy to extend to the i-th row, instead of the first row. By interchanging the i-th row with its preceding adjacent row successively, it becomes the first row of the new determinant B, with the relative orders of the remaining rows and columns unchanged, and so the complement minor of a,ij in B is exactly My in D. By the previous conclusion, we have B = an Mn - an Mi2 H
1- a i n ( - l ) n _ 1 M i n ,
and consequently
D = {-iy-lB
=
ani-iy-'Mn+aai-iyMa + --- + ain(-l)i-1+n-1Min,
or, what is the same, for fixed i,
3= 1
23
Determinants
For convenience, (-l)* + J Mjj is called the algebraic complement or the (algebraic) cofactor of a^ in D, denoted by A^: Ail =
(-l)i+JMij.
As an example, in the previous determinant of order 4 the algebraic com plements of a n , o-23 are respectively An = ( - 1 ) 1 + 1 M „ = M a ,
A23 = ( - l ) 2 + d M 2 3
"M 2 3-
Thus, we have the following fundamental theorem: Theorem 1. The determinant of order n an
•••
air
D = Onl
a„
is equal to the sum of the products of all the elements of D in the i-th row and their algebraic complements: D = an An + at2 -Ai2 H
+ a, n Ai„ .
The theorem is frequently used and is usually formulated so as to develop a determinant about a row, say the i-th row. It may also be developed about a column instead of a row: say the j'-th column: D = aij Aij + a2j A2j -\
+ anj Anj .
For instance, developing A about its second column and third column we have, respectively, 2 -4 2
= 4
1 2 2 2 +3 3 5 2 5
2 2
1 3
= - 4 + 1 8 - 4 = 10 and 2 -4 2
1 2 -4 3 2 1 2 3 1 = 2 2 3 — 2 3 + 5 -4 3 5 = - 3 6 - 4 + 50 = 10.
1 3
24
Linear Algebra
By developing a determinant about a chosen row or column, its calculation can be simplified by the use of complement minors. Which row or column should we take then? Usually, we choose one which contains most zero el ements since we can then avoid calculating the complement minors of such elements. Therefore, in practice, we often first simplify the determinant by us ing Property 4, Sec. 1.2 so that there occur as many zero elements as possible in a row or column, and then develop it. Let us illustrate by some examples. Example 1. Calculate the determinant of order 4
D =
3 1 5 1 2 0 1 -5
-1 2 3 -4 1 -1 3 -3
Solution: There is already a zero element in row 3. By Property 4, Sec. 1.2, adding 2 times of column 4 to column 1 and then column 3 to column 4, we get a determinant with 3 zero elements in its row 3. Developing it about this column, we get
D
7 1 - 1 1 -13 1 3 -1 0 0 1 0 - 5 - 5 3 0
(-1)
3+3
7 1 13 1 -5 - 5
1 -1 0
Then, adding row 1 of the determinant of order 3 on the right-hand side to row 2, we obtain at length
D
7 -6 -5
1 2 -5
1 -6 0 = (-1) 1+3 -5 0
2 = 30 + 10 = 40 . -5
Example 2. Calculate
1+x D =
1 1 1
1 1-x 1 1
1 1 1 1 1 1 + 2/ 1 1-2/
Solution: Assume xy ^ 0. By Theorem 1, the determinant may be modified by adding in a row and a column as follows:
25
Determinants
1
1+x D
1 1 1
1 1 1-x 1 1
1 1 1 1 1 1 1 + 2/ 1 1 1-2/
1 -1 -1 -1 -1
0 0 0
1 0 -x 0
1 0 0 y
0
0
By Property 4, Sec. 1.2, by adding - times of column 2, —- times of column 3, i times of column 4, and — j times of column 5 altogether to column 1, it becomes a triangular determinant, and hence 1 0 D = 0 0 0
1 X
0 0 0
1 0 —x 0 0
1 0 0
l 0 0 0
y 0
2 2
= x'y*
-y
Obviously, the above holds for x = 0 or y = 0 too, since D = 0 in such cases. It may also be proved using Property 2 of determinants. Example 3. Prove that a —b D = —c —d-c
b a d
c —d a b
d c = (a 2 + b2 + c2 + d2)2 . —b a
P r o o f . Assume a ^ 0 for the time being. Then,
a
a2
a2 —ab —ac —ad
b c d a —d c d a —b — c b a
+ b2 + c2 + d2 a
a2
+ b2 + c2 + d2
a d -c
a2
+
b2
+
a -d a b
c
2
+
d2
1 0 0 0
b a d —c
c -b a
(a 3 + bed - bed + ab2 + ac2 + ad2)
a <2N2 { + K2}? , +„2(? + )
c -d a b
d c -b a
26
Linear Algebra
Thus, the equality is proved when a ^ 0. Letting a tend to zero in the equality, we see that it is also valid for a = 0 since its both sides are continuous functions of a. Or, as its both sides are evidently polynomials in a and it is true for a ^ 0, it is true for a = 0 too. The determinant in the following example is of considerable importance. Example 4. Prove that the Vandemond determinant 1 D =
1
x\
1
1
X2
X3
Xn
x
x
=
3
2
JJ
(Xi - Xj) .
„n-l
„»t-l
Proof. We prove it by induction. It is true for n = 2 since 1
1
X\
X2
= X2~Xi=
[\
(Xi-
Xj) .
Suppose now it is true for order n — 1; we can prove that it is true also for order n. To do this we transform D into a determinant of order n — 1. First subtract Xi times of row n — 1 from row n, x2 times of row n — 2 from row n — 1, and continue this process until we subtract x\ times of row 1 from row 2. As a result we get
D =
1 0 0 0
1 x2 X2(X2 -
ij
1 -xi Xi)
(x2 - x{)
X3 X3(X3
xj
1 X\ -Xi)
Xji
X-n\Xn „n-2
(x 3 — xi)
X\
X\)
(xn - x\)
Developing it about column 1 and taking out the common factor of each row, we have 1 D = (x2-
zi)(x 3 - xi) ■ ■ ■ (xn - xi)
X2 „n-2
1
1
X3 _n-2
_n-2
The determinant on the right-hand side is the Vandemonde determinant of order n — 1. By induction, it is equal to the product of all (XJ — Xj)'s where n ^ i > j' ^ 2. Therefore,
27
Determinants D = (X2-Xl)(xz-Xl)---(xn-Xl)
=
\\
(Xi-Xj)
{*i- XJ) ,
n n^i>j^l
which proves the equality. As a consequence, we know that Vandemonde determinant D = 0 if and only if at least two of x%,..., xn are equal to each other. Sometimes the above expression is written as
D=(-I)*^
n (*-»*)•
Example 5. Prove that 1 a i2
1 b b2 b4
1 c c2
1 1 d a = (a + b + c + d) d2 i d4 2
1 b b2 b3
1 c c2
1 d d2 d3
Proof. Let the quartic equation with a, b,c,d as its roots be x4 — k\x3 + ft2X2 — k$x + ki = 0. Then, k\x3 = x4 + kix2 — kzx + £4. In the left-hand determinant, by adding k^ times of row 1, — k^ times of row 2 and &2 times of row 3 altogether to row 4, the 4-th row becomes fcia3, k\b3, kic3, kid3. Taking out the common factor k\ =a + b + c + d,we get the Vandemonde determinant shown on the right-hand side of the equality. Theorem 1 assures that the sum of the products of all the elements in the i-th row of a determinant D and their algebraic complements is equal to D. We may ask what is the sum of the products of all the elements in the i-th row with the algebraic complements of the corresponding elements in another row, say the j - t h row (j ^ i), an Aji + ai2 Aj2 H
h ain Ajn
(i ^
j).
28
Linear Algebra
It is easy to answer this question. By Theorem 1, the above expression is equal to the determinant obtained after the j'-th row an,... ,djn of D is replaced by its i-th row an,... ,a,{n (all the other rows including the i-th remaining unchanged), which does not influence Aj\,... ,Ajn: an ■
ain
an
ain
<— the i-th row
an
ain
the j-th. row
ani
■
ann
This new determinant is zero since its z-th and j - t h rows are identical. Thus, we have proved: Theorem 2. The sum of the products of all the elements in a certain row of a determinant of order n (> 2) and the algebraic complements of the corresponding elements in another row is zero: an Aji + ai2 Aj2 H
V ain Ajn = 0
(i ^
j).
(i ^
j).
This is certainly true also for columns: au A\j + aii A2j H
(- a„i Anj = 0
Combining the above two theorems and introducing the Kroneker symbol 6i
i
=
when i = j , 0, when i ^ j , \0,
we have two equalities n / i=\
n a%s Aj3 = Oij U,
y
asi Asj = Oij U .
i=\
In Theorem 1, the determinant is developed about one row or column. It may be extended to the case where the development is realized about several rows or columns.
29
Determinants
We generalize the notion of complement minors first. In a determinant D of order n, by omitting all the rows and columns on which the elements of its certain minor TV of order k are situated, the remaining minor M of order n — k, with the relative orders of its rows and columns preserved, is called the complement minors of TV in D. Let the ordinal numbers of the rows of TV in D be ii, t2, • • ■, ik and those of the columns be j'1,.7'2, ■■■,jk, then
is called the algebraic complement or the {algebraic) cofactor of TV in D. For instance, in the determinant of order 4 previously considered, the complements of the minors TVi, TV2 of order 2 are 0*22
«23
0-32
033
M 2 = au
a.42
au d44
respectively and their algebraic complements are (_l)l+4+l+4Ml
=
Mu
(_l)2+3+l+3M2
=
_M2
respectively. If the sets of ordinal numbers of the rows and columns of TV are identical, then TV is called a principal minor. For example, the previously considered a>n,Mn,Ni,Mi are principal minors while 023, M23, N2, M2 are not. Obvi ously, the complement of a principal minor is just its algebraic complement. The minor of order n of a determinant of order n is itself, the complement minor of which does not exist. However, for convenience, we define 1 as its complement minor, as well as its algebraic complement. Example 6. Assume that M is the complement minor of TV in a determi nant. Prove that either M and TV are algebraic complements of each other or —M and TV, as well as M and —TV, are algebraic complements of each other. Proof. Assume that the ordinal numbers of the rows of TV are i\,... ,ik, and those of its columns are ji,- ■ ■ ,jk- Then the algebraic complement of TV is (—iyi+—+ik+h+-+jkM
30
Linear Algebra
and that of M is (_l)(l+"-Hi)-(«i+-Mfc)+(l+-+n)-(ii+-+ifc)jy
If iiH Hfc+jiH hjfc is even, then M and TV are algebraic complements of each other; if it is odd, then —M and — N are algebraic complements of N and M respectively. The statement in the example is established. Making use of the above generalized concepts, it is easy to extend Theo rem 1 as follows: T h e o r e m 3 (Laplace's t h e o r e m ) . Take certain k rows of a determinant D of order n (1 < k < n). Then the sum of the products of all the minors of order k for these rows and their corresponding algebraic complements in D is equal to D. This theorem is trivial for k = n; it is just Theorem 1 for k = 1 (or
k=
n-l).
This theorem is usually said to be the development of a determinant ac cording to certain k rows. It is also true for columns instead of rows. The reasoning for its proof is quite analogous to that for Theorem 1, but involves rather complicated statements, and will be omitted here. Thus, we have thoroughly answered the second problem proposed at the beginning of the present chapter. The discussions have been fundamental and theoretical. In engineering and technology, it is often required to calculate determinants of an extremely high order, the elements of which are arbitrary, without any rule. In such cases, the method of calculations described in the present chapter is not effective in general and computers must be used for their numerical and approximate calculations, which is beyond the scope of this book. Example 7. Calculate D in Example 1 of the present section by develop ment according to its first two rows. Solution: There are 6 minors of order 2 in its first two rows: 3 -5 1 1
3 1 = 8, N2 = 1 -5 -1 1 = 4, 7V5 = 1 3
-1 3 = 4, JV3 = 3 -5 2 -4
-6,
7V6 =
-1 3
2 - 4 —- 2 , 2 -4
=s
-2,
31
Determinants
the algebraic complements of which are respectively A, = (-1)1+2+1+2
1 3
A3 = (-1)1+2+1+4
0 -5
-1 -3
0, A2 = (-1)1+2+1+3
1 = 5, A4 = 3
(-l) 1+2+2+3
A5 = (-1)1+2+2+4 2 1 = - 5 , A6 = 1 3
(-l) 1+2+3+4
0 -5 2 1
-1 = 5, -3 -1 -3
-5, -10.
Therefore, by Laplace's theorem, 6
i=i
= 8 • 0 + 4 ■ 5 + (-2) • 5 + 4 • (-5) + (-6) ■ (-5) + (-2) ■ (-10) = 20 - 10 - 20 + 30 + 20 = 40. If there are many zero elements in certain rows or columns in a determinant, then it would be simpler to calculate its value by development according to these rows or columns. Example 8. Calculate the determinant 5 1 D = 0 0 0
6 0 6 1 5 0 1 0 0
5
0 0 6 5
0 0 0 6 1 5
Solution: Develop it according to the first two rows: D =
5 6 1 5
5 6 0 1 5 6 0 1 5
1 6 0 0 5 6 = 19 • 65 - 30 • 19 = 665 . 0 1 5
5 0 1 6
The situation is quite similar to developing it about the first two columns. Example 9. Calculate the determinant of order 2n Cn
D
Cnl
••■
Cin
On
C-nn
Gnl
&ln
bnl
■••
Ol,
32
Linear Algebra
where the elements in the last n rows and the last n columns are simultaneously zero. Solution: Develop D about the last n columns. Since all the minors of order n in these columns except the one situated at the right upper corner are zero, the value of D is the product of this exceptional minor and its algebraic complement: .H
\-n)+n-n
an &nl
= (-l)
Oil
' ■ ■
O-nl
■ ■ ■ 0>nn
Gin
O-ln
bu
■ ■ ■
■■ ■
" * *
^nn
bnl
bn
•••
bin
bnl
•■■
bin
bnn
n
»
"Tin
since n2 is even or odd depends on whether n is even or odd. Example 10. Calculate the determinant of order 2n a
b n rows a b b a n rows
Solution: Developing the determinant about the n-th and (n + l)-th rows, we have n — 1 rows D
a b b a
a b b a
n — 1 rows
n — 1 rows 2
2
(a - 6 ) •
a b b a n — 1 rows
33
Determinants
Hence, it is evident by induction that D = (a2 - b2)n. We see that it is really very convenient to calculate such special determi nants using Laplace's theorem. Exercises 1. Calculate the following determinants:
(1)
1 3 2 1
(3)
7 9 7 5 0 0
1 -1 3 2 6 7 4 3 0 0
\ + x\ (5)
Xi
X\
%n 2*1
2 -1 -1 3 5 8 9 6 5 6
3 2 1 . 0
4 9 7 1 6 8
3 4 0 0 0 0
Xi X2
1+
x
x a b 0 c 0 y 0 0 d 0 c z 0 / y h k u I 0 0 0 0 v
(2)
2 3 0 0 . (4) 0 0
a 0 0
0 a 0
0 0 a
0 0
0 0
0 0
• • • •
X\ Xi
•• •
X2 X,
2
(6)
*w *^2
0 0 0
1 0 0
a 0
0 a
(order n),
n
Cl
cl
Cn+1
Cn
Cn
Cln
2. Prove the following identity: X
0
-1 x
0 -1
■• •••
0 0
0 0 = xn+
0 an
0 a„_i
0 an-2
■ •• ■ ■■
x a,2
a i z 7 1 - 1 + ■ ■ + an-ix - + an
-1 x + ai
rov s t h a t the determinant of order n 2
-1
-1
2
-1
0 = n+l.
o
•,_1-'
-1 2
34
Linear Algebra
4. Calculate by Laplace's theorem 1 1 Ol
h
02
b2
a3
1
0 0 1 1 1 0
UJ
h2 ;
0 0 1
UJ2
1
0 0 1
UJ2
C2
UJ
C3
UJ2
Cl
u> 0
0
where UJ is one of the complex cubic roots of unity. 5. By using Vandemonde determinants, calculate an 71-1
(a - 1)" (a-1)"-1
(a - n ) n (a - n ) n _l
• ■■•
(1)
?
a 1
a-1 1
a?
a1
a—n 1
Oi
*ti— i t
a2
(2) n n+1
a
02
alb?-1
hi
„
bn2
in-1
<22°2
n+l°n+l
bn
an+iOn+1
••
where aj ^ 0, i = 1 , . . . ,n. 6. Prove by induction that, a + b ab 0 1 a + b ab 0 1 a+b (1) Dn = 0
0
0
0 0 0
0 0 0
1
a+b
2 n+i
(2) Dn
1 2cos0 1
0 1 2cos0
0 0 0
0 0
0 0
0 0
2cos0 1
sin (n + 1)0 (6 ± kir) sin# What would happen if 8 = fen-?
0n+i
{a * b),
What would happen if a = 6? 2cos0 1 0
_
0 0 0 1 2co
35
Determinants
7. Prove that the determinant of order n cos 6
1
1
2cos0
0
1 2cos0
1
1
2cos0
cos n 6.
8. Prove that, if the sums of elements in each row and in each column of a determinant are all zero, then the algebraic complements of its elements are equal to one another. 9. Prove Laplace's theorem.
1.4. Cramer's Theorem The concept of determinants of any order n is the generalization of de terminants of orders 2 and 3. We have seen at the beginning of the present chapter that the solutions of systems of linear equations in 2 and 3 unknowns may be expressed in term of determinants of orders 2 and 3 respectively. We expect that the solution of a system of linear equations in n unknowns may be similarly expressed by determinants of order n, which would confirm the reasonableness of the definition. A system of n linear equations in n unknowns may be written as ' a n %1 + ^12 x2 + 0-21 %1 + 0-22 %2 +
1" O-ln %n = &li 1" 0,2n %n = ^2,
> a « l Xl + 0-n2 X2 + ••• + CLnn %n
(1)
bn-
The following determinant of order n constituted by its coefficients, ai2 a22
■
a\
a2i
• ■
0,2
O-nl
0>n2
■
■
an
an D =
is called its coefficient determinant, sometimes abbreviated as D = \a,ij\ or det|ajj|.
Linear Algebra
36
If (1) has a solution, i.e., there exists a set of ordered numbers satisfying (1), by Property 4, Sec. 1.2, we have anxi
012
o-ni x\
an2
aiT
Dxx anxi
■■■
ann
+ CL12X2 + ■ ■ ■+ainxn
o.ni xi + an2 x2-\ b\
a\2
bn
o-ni
+ annxn
a12
&IT,
an2
■ • ■ o.\n (=Di).
In general, we have Dxi = Di,
i = l,...,n,
where Di is the determinant obtained when the i-th column an, a2i, ■ ■ ■, ani in D is replaced by the constant column b\,b2,..., bn. If D ^ 0, then we get _ Di _ Dn X\ — D ', . "..,X n — D ' ' ~" ® Thus, we have proved t h a t if D / 0 then the system (1), if solvable, has the unique solution (2). Now, we verify t h a t (2) is actually a solution of (1) provided D ^ 0, t h a t is, Di , Dn l,...,n, 0-il -JT H r Oin ~=r — Oi, or an Di H
+ ainDn
= biD,
To this aim, develop the following determinant of order n + 1 according to its first row, bi 0^1 &in h an 0-\n (i = l , . . . , n ) ,
which is zero since its (i + l ) - t h row is identical to the first one. Because t h e algebraic complement of a y in the first row of this determinant is 61
aii---a\j-i
Oi,j+i • • • a i n
&n
&nl - • ' Q-n,j-l
On,j+l ' • • 0,r,
(_l)i+i+i
= (-i) 2 +'' • (-iy-1Dj
=
-Dj,
37
Determinants
we have biD -anDi
ain Dn = 0
(i = 1 , . . . , n ) ,
which means that (2) is actually a solution of (1). Thus, we obtain the following important theorem. Theorem (Cramer's theorem). If the coefficient determinant D of sys tem (1) 7^ 0, then it has the unique solution Ih
Dn
where Di(i = 1,2,... ,n) is the determinant obtained when the i-th column Oil) Q-ii-, ■ • • i o-in of D is replaced by the constants column bi, 6 2 , . . . , bn. This theorem meets our expectation. In particular, when all the constant terms b\,&2>• • • >bn in (1) vanish, the system becomes auXi-\ \-ainxn = 0, (3)
called a system of homogeneous linear equations ((1) being in general nonhomogeneous). In this case, all the elements of the i-th column vanish and so Di = 0, i = 1 , . . . ,n. Therefore, if D ^ 0, then (3) has the unique solution x1=0,...,xn
= 0,
called the zero solution or trivial solution. Thus, the homogeneous system (3) has only the zero solution provided D ^ 0. It is easily seen that, if all the coefficients as well as the constant terms in system (1) are real numbers, then by the definition of determinants its solution, if D ^ 0, also consists of real numbers. An advantage of Cramer's theorem lies in that the solution of the system of linear equations (1) is expressed explicitly by means of determinants, the elements of which are its coefficients and constant terms. This is very conve nient and useful in a theoretical analysis of such systems. However, in practical calculations it requires one to compute n + 1 determinants of order n, which is very complicated especially for large n. In practice, for solving systems of linear equations of high orders, there are many numerical methods for comput ers. These will not be dealt with here as there are a large number of textbooks on the subject.
38
Linear Algebra
As for the case D = 0 in system (1), the question is whether it has solutions and, if so, how to get them, even if the numbers of equations and of unknowns are not necessarily the same. This forms main part of the next chapter. Example 1. Solve the system of homogeneous linear equations
1
x + 3y + 2z == 0, 2x-y + 3z == 0, 3x + 2y - z == 0.
ion: Since in this case 1 3 2 D = 2 - 1 3 = 42, 3 2 - 1 the system has only the trivial solution x = y = z = 0. Example 2. Solve the system of linear equations 2xi + X2— 5x3 + xi = 8, — x\ —3x2 6x4 = 9, 2x2— X3 + 2x4 = —5, Xi +4X2 — 7X3 + 6X4 = 0 . Solution: In this case, 2 1 - 5 1 1-3 0 - 6 D = 0 2 - 1 2 1 4 - 7 6 8 Dx =
D3
-5 0 -1 -7
9 -5 0
2 1 0 1
1 8 -3 9 2 -5 4 0
5 1 7 1 -6 = 81, 2 6
1 -6 2 6
-27,
0 2 1 13 2 = - 2 1 0 = 27, 12 7 0 5 2 8 9 1 0 -5 1 0
-5 0 -1 -7
1 -6 2 6
108,
2 1 1 -3 r»4 = 2 0 1 4
-5 0 -1 -7
8 9 -5 0
27,
D2
and so the solution is xi = 3 ,
i2 = - 4 ,
x 3 = —1,
x 4 = 1.
39
Determinants
Example 3. Solve the system of linear equations axi ax\ axi ax\ . bx\ where a^b
+ ax2 + (1x3 + ax\ + bx& = as, + ax2 + C1X3 + bin + ax*, = a±, + ax2 + bxz + ax^ + ax$ = 03, + bx2 + ax3 + ax± + ax& = 02, + ax2 + axz + ax^ + axs = 01,
and 4a + b ^ 0.
Solution: Here, a a a a b
D
a a a b a
(4a +
a a b a a
a b a a a
b a a a a
1 1 1 1 1 0 0 b—a 0 0 0 0 0 (4a + b) 0 b—a 0 0 0 b—a 0 0 0 b—a 0 0
b)(b-a)4
and
Dx
05 a^ a% a2
a a a b
a a b a
a b a a
b a a a
ai
a
a
a
a
2_\as
4a + 6 4a + b 4a + b 4a+ b
s=l
a\ a-z a2
a a b a
01
a b a a
b a a a
a a a a
= ai(4a + 6 ) ( 6 - a ) 3 - a ( b - a ) 3 y ^ a 3 s=l
Similarly, Di = ai{4a + b)(b- a) 3 - a(b ~a)3^2as,
i = 2,3,4,5.
s=l
Hence the solution of the system is 1 aj(4a 4- b) — a YJ a s (4a + 6 ) ( b - a ) s=l
t'= 1,2,3,4,5.
40
Linear Algebra
Exercises

1. Solve the following systems of linear equations using Cramer's theorem:

(1) x_1 + x_2 + 5x_3 + 7x_4 = 14,
    3x_1 + 5x_2 + 7x_3 + x_4 = 0,
    5x_1 + 7x_2 + x_3 + 3x_4 = 4,
    7x_1 + x_2 + 3x_3 + 5x_4 = 10.

(2) x + y + z = a + b + c,
    ax + by + cz = a^2 + b^2 + c^2,
    bcx + cay + abz = 3abc   (a, b, c being different from each other).

2. Construct a quadratic polynomial f(x) such that f(1) = -1, f(-1) = 9, f(2) = -3.

3. Assume the relation between the density h and temperature t of mercury is h = a_0 + a_1 t + a_2 t^2 + a_3 t^3, and we have the following data:

t:   0°       10°      20°      30°
h:   13.60    13.57    13.55    13.52

Find the densities of mercury at t = 15°, 40° (up to two decimals).
CHAPTER 2 SYSTEMS OF LINEAR EQUATIONS
In Chapter 1 we introduced Cramer's theorem for solving systems of linear equations. But to use the theorem two conditions must be satisfied: the numbers of equations and of unknowns must be equal, and the determinant of the coefficients of the system must not be zero. Systems of linear equations encountered in many problems, however, frequently do not satisfy these two conditions simultaneously. Sometimes, although the numbers of equations and unknowns are equal, the determinant of the coefficients vanishes; sometimes the numbers of equations and unknowns are not even equal, to say nothing of the determinant. This urges us to discuss general systems of linear equations further. The determinant alone is not a sufficient tool; we need to introduce some new concepts, such as vectors, matrices, and so on. In this chapter we discuss systems of linear equations, mainly the following three problems:

1. How to determine whether or not a solution exists; that is, what is the necessary and sufficient condition for a system of linear equations to have solutions?

2. If a system of linear equations has no solution, there is of course nothing further to discuss. If it has solutions, how many solutions does it have, and how do we find them?

3. If a system of linear equations has a unique solution, the matter is simple. If it has many solutions, what is the relation between the solutions?

This chapter is divided into 5 sections. In Section 2.1, we shall start with a few basic concepts and their properties for use later on. They are all the
most basic concepts in linear algebra. We will solve the three problems raised above in Sections 2.2 and 2.3 for a special case, namely that of systems of homogeneous linear equations, and in Section 2.4 for the general case, namely nonhomogeneous systems. Finally, in Section 2.5, we give a method for simplifying the computation. It will be used often, and is therefore very important.

2.1. Linear Relations between Vectors

General systems of linear equations cannot be solved by Cramer's theorem. But we hope to be able to modify a system of linear equations so that Cramer's theorem becomes applicable. How, then, do we modify a system of linear equations? Let us first consider the following example.

Example 1. Solve the following homogeneous system of linear equations:

x + 2y - z = 0,
2x - 3y + z = 0,        (1)
4x + y - z = 0.

Solution: Since the determinant of coefficients

$$\begin{vmatrix} 1 & 2 & -1 \\ 2 & -3 & 1 \\ 4 & 1 & -1 \end{vmatrix} = 0,$$

we cannot solve the above system by using Cramer's theorem. It is easy to see that if we multiply the first equation by 2 and add the resulting equation to the second equation, we get the third equation. Thus the common solutions of the system of the first two equations are the solutions of the third equation. Hence the third equation is redundant and can be deleted from the system of linear equations. Consequently the solution of the system of the first two equations is simply the solution of the system (1). Of course, the reverse is also true; namely, the system of the first two equations is a same-solution system, or equivalent system, of the system (1). Hence to solve the system (1), it suffices to solve the system of the first two equations.

We are now going to solve the system of the first two equations. We first note that

$$\begin{vmatrix} 1 & 2 \\ 2 & -3 \end{vmatrix} = -7.$$
Regarding z as some given number and transporting the terms containing z to the right-hand side, we get

x + 2y = z,
2x - 3y = -z.        (2)

Using Cramer's theorem, we solve this system and obtain

$$x = \frac{1}{-7}\begin{vmatrix} z & 2 \\ -z & -3 \end{vmatrix} = \frac{z}{7}, \qquad
y = \frac{1}{-7}\begin{vmatrix} 1 & z \\ 2 & -z \end{vmatrix} = \frac{3z}{7}.$$

These are all the solutions of (2), with z being any real number. Thus the solutions required are

x = z/7,  y = 3z/7,  z = z,

where z is an arbitrary constant.

The process of solving a general system of homogeneous linear equations is the same as that of solving the above special case. First we delete any redundant equations from the system of linear equations, find the same-solution system of the linear equations, and then solve the same-solution system using Cramer's theorem. The solutions obtained are those required. The problem is: which equations are redundant in a system of linear equations? In other words, what is the same-solution system of the linear equations? Before solving a system of linear equations, this problem must be addressed first. Of course this is by no means easy. In this section we shall mainly explore ways and means of solving such a problem. In short, we first present some basic concepts such as linear dependence, linear independence, matrices, ranks and the basic properties of matrices. These concepts are themselves basic in linear algebra and we cannot neglect them.

We know that a linear equation is uniquely defined by its coefficients and constant terms, independently of the notation used for the unknowns. If we omit the unknowns from a linear equation, what remains is a set of ordered numbers. Thus a linear equation corresponds to a set of ordered numbers. For convenience, we sometimes write a linear equation as a set of ordered numbers. Then discussing problems of linear equations is equivalent to discussing problems of sets of ordered numbers, which is much simpler. Thus we have the following important concepts.

Definition 1. A set of ordered numbers composed of n numbers a_1, a_2, ..., a_n, denoted by

α = (a_1, a_2, ..., a_n),
is called an n-dimensional vector, where a_1, a_2, ..., a_n are called the components of α, a_i being called the i-th component. For example, corresponding to the three equations in Example 1, we have 3 vectors of dimension 4:

α = (1, 2, -1, 0),  β = (2, -3, 1, 0),  γ = (4, 1, -1, 0).

It is well known that in plane analytic geometry the vector OP from the origin (0, 0) to a point P(a, b) is denoted by a 2-dimensional vector (a, b); it is a set of two ordered numbers. In space analytic geometry a vector OP from the origin O = (0, 0, 0) to a point P = (a, b, c) is denoted by a 3-dimensional vector (a, b, c); it is a set of three ordered numbers. These are familiar to us. The above definition can be considered as a generalization of the 2- and 3-dimensional vectors of analytic geometry. It should be noted that an n-dimensional vector differs from a 2- or 3-dimensional vector in that it has no visual geometric meaning; we simply continue to use the geometric terms.

A vector whose components are all zero is called a zero vector, usually denoted by o, i.e.,

o = (0, ..., 0),

as in analytic geometry. Note that the symbols for the number 0 and the zero vector look alike; in general it is not difficult to distinguish them from the context.

Two linear equations are identical if their corresponding coefficients and constant terms are equal. Therefore two n-dimensional vectors α = (a_1, ..., a_n), β = (b_1, ..., b_n) are said to be equal if their corresponding components are equal. Thus α = β, or

(a_1, ..., a_n) = (b_1, ..., b_n),

if

a_i = b_i,  i = 1, 2, ..., n.

Again, the coefficients and constant terms of the sum of two linear equations, or of the difference between two linear equations, are the sums or differences of the corresponding coefficients and constant terms; and the coefficients and constant terms of the product of a linear equation with a scalar are the products of the coefficients and constant terms of that equation with the scalar. In accordance with these properties, we define the operations on n-dimensional vectors as follows.

Definition 2. Let α = (a_1, ..., a_n) and β = (b_1, ..., b_n). The vector (a_1 + b_1, ..., a_n + b_n) is called the sum of the vectors α and β; the vector (a_1 - b_1, ..., a_n - b_n) is called the difference between α and β, written as
α + β = (a_1 + b_1, ..., a_n + b_n),  α - β = (a_1 - b_1, ..., a_n - b_n).

Let k be a number; then the vector (ka_1, ..., ka_n) is called the scalar multiplication of α by k, denoted by kα or αk, i.e.,

kα = αk = (ka_1, ..., ka_n).

Addition, subtraction and scalar multiplication of vectors are called linear operations. Clearly, Definition 2 coincides with the addition, subtraction and scalar multiplication of two-dimensional or three-dimensional vectors in analytic geometry. It is for this reason that we refer to a set of n ordered numbers as a vector. Note that only when the dimensions of two vectors are equal can we say that the two vectors are equal or not equal and find their sum or difference. When the dimensions of two vectors are different, it is meaningless to compare them or to attempt to find their sum or difference. This is analogous to the case of two equations whose numbers of unknowns are different.

In Example 1 the third equation is obtained by adding 2 times the first equation to the second equation. This relation is expressed in terms of vectors as

(4, 1, -1, 0) = 2(1, 2, -1, 0) + (2, -3, 1, 0),

or γ = 2α + β. We also say that 2α + β is a linear combination of the vectors α and β, or that γ is a linear expression of α and β. In general we have:

Definition 3. Let α_1, ..., α_m be m vectors of dimension n, and let k_1, ..., k_m be m constants. Then

k_1 α_1 + ⋯ + k_m α_m

is called a linear combination of the vectors α_1, ..., α_m. If α = k_1 α_1 + ⋯ + k_m α_m, then α is called a linear expression of the vectors α_1, ..., α_m.
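As a small illustration of Definition 3 (not in the original), the relation γ = 2α + β between the vectors of Example 1 can be verified directly:

```python
import numpy as np

# The three equations of Example 1, written as 4-dimensional vectors
# (coefficients followed by the constant term).
alpha = np.array([1, 2, -1, 0])
beta  = np.array([2, -3, 1, 0])
gamma = np.array([4, 1, -1, 0])

# The third equation is the linear combination 2*alpha + beta.
print(np.array_equal(gamma, 2 * alpha + beta))   # True
```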
We have seen that, for m vectors α_1, ..., α_m, if one of them is a linear combination of the m - 1 remaining vectors, then we have

k_1 α_1 + ⋯ + k_m α_m = o,    (3)

where k_1, ..., k_m are not all zero. Conversely, if the coefficients in (3) are not all zero, then among the m vectors α_1, ..., α_m there exists at least one vector that is a linear combination of the m - 1 remaining vectors. For example, if k_m ≠ 0, then we obtain

$$\alpha_m = -\frac{k_1}{k_m}\alpha_1 - \cdots - \frac{k_{m-1}}{k_m}\alpha_{m-1},$$

i.e., α_m is a linear combination of α_1, ..., α_{m-1}. Hence there exists some vector among the m vectors α_1, ..., α_m which is a linear combination of the m - 1 remaining vectors if and only if the coefficients k_1, ..., k_m which satisfy (3) are not all zero. Or we may say that among the m vectors α_1, ..., α_m there does not exist one vector that is a linear combination of the m - 1 remaining vectors if and only if the coefficients k_1, ..., k_m which satisfy (3) are all equal to zero. These properties are often used, and are of great importance. For simplicity, the following definition is given.

Definition 4. Let α_1, ..., α_m be m vectors of dimension n. α_1, ..., α_m are said to be linearly dependent if there exist constants k_1, ..., k_m, not all zero, such that

k_1 α_1 + ⋯ + k_m α_m = o.    (3)

If no such constants k_1, ..., k_m exist, then α_1, ..., α_m are said to be linearly independent. That is to say, α_1, ..., α_m are linearly independent if (3) holds only when the constants k_1, ..., k_m are all zero.

For example, in Example 1 the vectors α, β, and γ are linearly dependent. However, the vectors α and β are linearly independent, since from

k_1 α + k_2 β = k_1(1, 2, -1, 0) + k_2(2, -3, 1, 0) = (k_1 + 2k_2, 2k_1 - 3k_2, -k_1 + k_2, 0) = o,

we obtain

k_1 + 2k_2 = 0,  2k_1 - 3k_2 = 0,  -k_1 + k_2 = 0.

Clearly this system has only the zero solution, i.e., k_1 = 0, k_2 = 0. That is to say, k_1 and k_2 which satisfy the above linear relation must each be equal to zero,
hence α and β are linearly independent.

If m vectors are linearly dependent, we may also say that there exists a linear relation between them. Conversely, if they are linearly independent, there is no linear relation between them. As the only operations on vectors are addition and scalar multiplication, the only relations between vectors are linear relations such as the above. A linear relation is also a relation between linear functions. Linear dependence is an extremely important concept, and is closely linked to linear combination. We know from the above that m vectors are linearly dependent if and only if one of them is a linear combination of the others.

The following are some basic properties of the linear relation, obtained directly from the definition.

In (3) let m = 1. We obtain a set of vectors consisting of only one vector. It is obvious that if this vector is a zero vector, then the set is linearly dependent. If this vector is a nonzero vector, then the set is linearly independent.

Let m = 2. Then (3) becomes k_1 α_1 + k_2 α_2 = o. If k_1 ≠ 0, we have

$$\alpha_1 = -\frac{k_2}{k_1}\alpha_2,$$

hence the components of α_1 are directly proportional to the corresponding components of α_2. The converse is also true. Thus we obtain that two vectors are linearly dependent if and only if their corresponding components are proportional.

Again, suppose that among the m vectors α_1, ..., α_m of dimension n some, for example α_1, ..., α_t (t < m), are linearly dependent. Then there exist t constants k_1, ..., k_t, not all zero, such that k_1 α_1 + ⋯ + k_t α_t = o. Hence

k_1 α_1 + ⋯ + k_t α_t + 0·α_{t+1} + ⋯ + 0·α_m = o,

i.e., α_1, ..., α_m are linearly dependent. That is to say, a set of vectors which contains a linearly dependent subset is itself linearly dependent. Therefore, by reduction to absurdity, we can also prove that if an entire set of vectors is linearly independent, then any part of the vectors is linearly independent.

Hence the problem raised earlier, as to how to remove redundant linear equations from a system of linear equations and find a same-solution system of linear equations, corresponds to how to remove vectors which may be expressed as linear combinations of the other vectors, so that the vectors that remain are linearly independent. In order to solve this problem, we must first solve the problem of how to judge whether a set of vectors is linearly
dependent or linearly independent. For this purpose we introduce another important concept, the matrix.

Definition 5. A rectangular array of m × n scalars a_{ij} (i = 1, 2, ..., m; j = 1, 2, ..., n) arranged in m rows and n columns and denoted by

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}$$

is called an (m, n)-matrix, or simply a matrix. We refer to a_{ij} as an element (or an entry) in the i-th row and j-th column of A. If m = n, we say that A is a square matrix of order n, or simply a matrix of order n. For example,

$$\begin{pmatrix} 1 & 2 & -1 \\ 2 & -3 & 1 \end{pmatrix}$$

is a (2, 3)-matrix. In the above example the matrix of the coefficients of the linear equations is

$$\begin{pmatrix} 1 & 2 & -1 \\ 2 & -3 & 1 \\ 4 & 1 & -1 \end{pmatrix},$$

which is a matrix, in fact a square matrix, of order 3.

Note that although in form a determinant resembles a matrix, they are entirely different concepts. In fact the former is a number, while the latter is an array composed of n² ordered numbers; the former is not to be confused with the latter.

Let A be an (m, n)-matrix. An n-dimensional vector composed of the n entries of some row of A is called a row vector of A, and an m-dimensional vector composed of the m entries of some column of A is called a column vector of A. Hence the matrix A has m row vectors and n column vectors. In general, a row vector is written horizontally, while a column vector is written vertically. Thus the i-th row of A is written horizontally as

(a_{i1}  ⋯  a_{in}),

while the j-th column of A is written vertically as

$$\begin{pmatrix} a_{1j} \\ \vdots \\ a_{mj} \end{pmatrix}.$$

Sometimes, for convenience, column vectors are also written horizontally.
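For illustration only (this sketch is not part of the book), the order-3 coefficient matrix above can be handled as a NumPy array; note how the matrix, its row and column vectors, and its determinant are different kinds of objects:

```python
import numpy as np

# The square matrix of order 3 from the text: an array of 9 ordered numbers.
A = np.array([[1, 2, -1],
              [2, -3, 1],
              [4, 1, -1]])

print(A.shape)            # (3, 3)
print(A[1, :])            # 2nd row vector:    [ 2 -3  1]
print(A[:, 2])            # 3rd column vector: [-1  1 -1]
print(np.linalg.det(A))   # approximately 0.0: a single number, not a matrix
```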
A matrix composed of only one row or one column is often regarded simply as a vector.

As for determinants: if we take any k (1 ≤ k ≤ min(m, n)) rows and k columns of an (m, n)-matrix A, then the determinant formed by the elements at the intersections of these rows and columns, kept in their original relative positions in A, is called a minor of order k of A. In particular, an (n, n)-matrix A has only one minor of order n, which is called the determinant of the matrix A, denoted by |A| or det A.

For example, in the above (2, 3)-matrix the minors of order 1 each consist of one element of the matrix; hence it has six minors of order 1. Its three minors of order 2 are

$$\begin{vmatrix} 1 & 2 \\ 2 & -3 \end{vmatrix}, \qquad \begin{vmatrix} 1 & -1 \\ 2 & 1 \end{vmatrix}, \qquad \begin{vmatrix} 2 & -1 \\ -3 & 1 \end{vmatrix}.$$

In this example, the determinant of the matrix of order 3 is

$$\begin{vmatrix} 1 & 2 & -1 \\ 2 & -3 & 1 \\ 4 & 1 & -1 \end{vmatrix}.$$

The following is another important concept attached to a matrix.

Definition 6. A matrix A is said to have rank r if r is the largest order of the nonzero minors of A. The rank of a matrix A is denoted by rank of A. If the rank of a matrix of order n is n, then the matrix is called a nonsingular matrix; otherwise it is called a singular matrix.

For example, the ranks of the (2, 3)-matrix and of the matrix of order 3 above are both 2. Hence the second matrix is a singular matrix. A matrix of rank zero consists of zeros only.

If there is at least one nonzero minor of order r and all minors of order r + 1 of A are equal to zero, then the rank of A is r. This is because, when all minors of order r + 1 are zero, according to Laplace's expansion theorem in Chapter 1 all minors whose order is more than r + 1 are also zero.

Example 2. Let the ranks of the matrices
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nk} \end{pmatrix}, \qquad
B = \begin{pmatrix} b_{11} & \cdots & b_{1l} \\ \vdots & & \vdots \\ b_{n1} & \cdots & b_{nl} \end{pmatrix}$$

be r and s respectively, and let the rank of the matrix

$$C = (A, B) = \begin{pmatrix} a_{11} & \cdots & a_{1k} & b_{11} & \cdots & b_{1l} \\ \vdots & & \vdots & \vdots & & \vdots \\ a_{n1} & \cdots & a_{nk} & b_{n1} & \cdots & b_{nl} \end{pmatrix}$$
be t. Prove that max(r, s) ≤ t ≤ r + s, i.e.,

max(rank of A, rank of B) ≤ rank of (A, B) ≤ rank of A + rank of B.

Proof. Evidently t ≥ max(r, s). To prove t ≤ r + s, it suffices to show that every minor M (if any) of order r + s + 1 of C is zero. We compute the minor M using Laplace's theorem, expanding it along the columns contained in A. If the number of columns of M contained in A is greater than r, then all the minors of A appearing in the expansion are zero, and hence M = 0. If the number of columns of M contained in A is not greater than r, then the number of columns of M contained in B is greater than s; therefore all the minors of B appearing in the expansion are zero, and again M = 0. The proof is complete.

We judge whether a set of vectors is linearly dependent or linearly independent mainly by the following important theorem.

Theorem 1. The m row vectors of an (m, n)-matrix A are linearly dependent if and only if the rank of A is less than m.

Proof. Assume that the m row vectors

α_1 = (a_{11}, ..., a_{1n}), ..., α_m = (a_{m1}, ..., a_{mn})

of the (m, n)-matrix

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}$$

are linearly dependent. Then there exists a vector among the m vectors which is a linear combination of the remaining m - 1 vectors. For convenience let α_m be a linear combination of α_1, ..., α_{m-1}, i.e., α_m = k_1 α_1 + k_2 α_2 + ⋯ + k_{m-1} α_{m-1}. Then, on multiplying the first row, ..., (m - 1)-th row by -k_1, ...,
-k_{m-1} respectively and adding them to the m-th row, all elements of the m-th row of A become zero. Hence if m ≤ n, i.e., A contains minors of order m, then every minor of order m of A is zero, and so the rank of A is less than m. If m > n, A obviously contains no minor of order m, and of course the rank of A is again less than m. Thus the necessity holds.

Conversely, suppose the rank of A is r, where r < m. For convenience we assume r > 0 and that the minor D of order r in the upper left corner of A is not zero. If we can prove that the r + 1 vectors α_1, ..., α_{r+1} are linearly dependent, then the m vectors α_1, ..., α_m are linearly dependent, since if some of the vectors are linearly dependent, then the whole set of vectors must also be linearly dependent. To prove that the vectors α_1, ..., α_{r+1} are linearly dependent, by definition it suffices to find r + 1 numbers k_1, k_2, ..., k_{r+1}, not all zero, such that

k_1 α_1 + ⋯ + k_{r+1} α_{r+1} = (k_1 a_{11} + ⋯ + k_{r+1} a_{r+1,1}, ..., k_1 a_{1n} + ⋯ + k_{r+1} a_{r+1,n}) = o,

or

k_1 a_{1t} + ⋯ + k_{r+1} a_{r+1,t} = 0,  t = 1, 2, ..., n.

In order to find these r + 1 numbers, we consider the following determinant of order r + 1:

$$D_t = \begin{vmatrix} a_{11} & \cdots & a_{1r} & a_{1t} \\ \vdots & & \vdots & \vdots \\ a_{r1} & \cdots & a_{rr} & a_{rt} \\ a_{r+1,1} & \cdots & a_{r+1,r} & a_{r+1,t} \end{vmatrix}.$$

When t ≤ r, D_t contains two identical columns and so D_t = 0. When t > r (if any), D_t is a minor of order r + 1 of A and again D_t = 0. That is to say, for t = 1, 2, ..., n we always have D_t = 0. Now we develop a general expression for D_t by expanding it about its last column:

A_1 a_{1t} + ⋯ + A_r a_{rt} + D a_{r+1,t} = 0,

where A_1, ..., A_r, D (≠ 0) are respectively the cofactors of the elements a_{1t}, ..., a_{rt}, a_{r+1,t} in D_t. Evidently A_1, ..., A_r, D are independent of t. Hence A_1, ..., A_r, D are the r + 1 numbers required, and they are not all zero, so α_1, α_2, ..., α_{r+1} are linearly dependent. Thus the sufficiency holds. The theorem is now proved completely.
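Theorem 1 is easy to check numerically; the sketch below (not from the book) uses NumPy's matrix_rank on the coefficient matrix of Example 1.

```python
import numpy as np

A = np.array([[1, 2, -1],
              [2, -3, 1],
              [4, 1, -1]])

# rank(A) = 2 < 3, so by Theorem 1 the three row vectors are linearly dependent.
print(np.linalg.matrix_rank(A))        # 2

# The matrix of the first two rows has rank 2, equal to its number of rows,
# so those two row vectors are linearly independent.
print(np.linalg.matrix_rank(A[:2]))    # 2
```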
If in the above theorem we replace rows by columns, it remains true; i.e., the n column vectors of an (m, n)-matrix A are linearly dependent if and only if the rank of A is less than n.

For example, the rank of the matrix

$$\begin{pmatrix} 1 & 2 & -1 \\ 2 & -3 & 1 \\ 4 & 1 & -1 \end{pmatrix}$$

is less than 3, and so its three row vectors are linearly dependent; so are its three column vectors. Again, the rank of the matrix

$$\begin{pmatrix} 1 & 2 & -1 \\ 2 & -3 & 1 \end{pmatrix}$$

is 2; so its two row vectors are linearly independent, while its three column vectors are linearly dependent.

It is easy to see from Theorem 1 that any m vectors of dimension n (m > n) are linearly dependent, and that m vectors of dimension n are linearly independent if and only if the rank of the matrix composed of them is equal to m. Again, n vectors of dimension n, (a_{11}, a_{12}, ..., a_{1n}), ..., (a_{n1}, a_{n2}, ..., a_{nn}), are linearly independent if and only if the determinant

$$\begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} \ne 0.$$
Example 3. Suppose that a t i , . . . , ctk are k linearly independent vectors of dimension n (> k), then n — k vectors /3fc+i,.. • ,(3n of dimension n can be chosen appropriately such that n vectors a i , . . . , otk, Pk+i,..., f3n are linearly independent. Proof. We consider a i , . . . , <x& as k row vectors and replenish n — k row vectors such that the determinant formed by them is not equal to zero. There are many ways to satisfy the condition. The simplest is that we construct a determinant in which the first k rows are a t i , . . . ,<** and the remaining n — k rows include a nonzero cofactor of a nonzero minor of order k in the first k rows, while the remaining elements of the determinant can be taken as zeros. For example, if the non-zero minor of order k in the first k rows is in the first k columns, then we can choose pi = ( 0 , . . . , 0 , 1 , 0 , . . . , 0) (i = k + 1 , . . . , n, with 1 situated in i-th column), that is, 0, is a row vector with 1 in the i-th column and zero elsewhere (i = k + 1 , . . , ,n).
Systems of Linear Equations
53
By Laplace's theorem, such a determinant is not equal to zero. Hence the statement in the example is proved. Theorem 2. The rank of a matrix A is equal to r if and only if there are r row (column) vectors in A which are linearly independent, and any r +1 row (column) vectors (if any) are all linearly dependent. Proof. The rank of A is equal to r if and only if there exists some nonzero minor of order r, and no nonzero minor of order r+1. It follows from Theorem 1 that the r rows (columns) where a nonzero minor of order r lie are linearly independent, while any r + 1 row (column) vectors are linearly dependent. This proves the theorem. In a set of vectors, if there are r linearly independent vectors and any group of vectors containing more than r vectors is linearly dependent, the rank of this set of vectors is said to be r. Hence the rank of matrix A is the rank of the set of its row vectors, and the rank of the set of its column vectors. Theorem 3. Let c*i, ■ ■ ■, a „ b e n vectors, /3i,... , / 3 m are linear combina tions of a i , . . . , a n . If m > n, then /3i,...,/3m are linearly dependent. Proof. Let
Then Then
r
{
a n « i + ■ • ■ + ai /3i = anoci H h ainan , ai H /3m = ami<xi -\
ha 1- amnan
.
fcijSi. + •■• + fcm/3m = fci(auai + •■■ + a i n Q ! n ) H fci/3i H 1- fem/3m = + fci(auQi k (a Hiai \- a i „ a „ ) H m
m
H
1" 1 m r . a n )
km{amH ia.i -H H A; ah i)ai amnH an) = +(fcian m m = (fcian H h kmami)a.i -\ (fciain + H ■ ih ki^m^mn)&n •. + (fclQln mamn)an .. ,kkmm,, not all zero, such that If we can prove that there are m numbers fci,. fci,...,
{
r fciai L ~r ' kian H
then then
1 fciOi,a ~r kiain H
" * i rCm^ml
(- kmami
* " ' ~r
= 0, = 0,
i^m^mn =
\- kmamn
0, = 0,
:= 0.o. fci/3i + • • ••++ Kfc *iA T mpmm/3m =
54
Linear Algebra
This proves the theorem. Since m> n, the rank of the matrix / a n ■•• a m l \ Qj\n • • • iljnn I
is less than m. We know from Theorem 1 that its m column vectors are linearly dependent. Hence it is now apparent that k\,... ,km exist. This proves the theorem. Suppose that A is an (m, n) matrix, ai,..., a m are its m row vectors. If 0*1 €Xr are linearly independent and ay, c*2, • ■ •, a n Qt» (i = 1,2,... m) are linearly dependent, then any vector a , is a linear combination of a i , c*2,..., ar. Then it follows from Theorem 3 that any r + 1 row vectors are linearly dependent. Thus we know from Theorem 2 that the rank of A is r. In compu ting the rank of a concrete matrix, it is simpler using the above method than Theorem 2. Example 4. Suppose that the rank of a matrix A of order n i s r , and s row (column) vectors taken from A form a matrix B. Let the rank of B be R. Prove that R ^ r + s — n. Proof. Assume that s ^n — r, then r + s — n ^ 0. Clearly R^ r + s — n. If s > n — r, then s — (n — r) = t > 0. Hence no matter how the s row are taken, at least t rows will be taken from the r linearly independent. Thus the rank of B is not less than t. In the above we have shown how to determine whether a set of vectors is linearly dependent or linearly independent. Based on this our later discussion of systems of linear equations will be much easier. Exercises 1. Find the vector a from the following expression: 3(ai - a ) + 2(ai + a ) = 5 ( a 3 + a ) , where a j = (2,5,1,3), a 2 = (10,1,5,10), a 3 = (4,1, - 1 , 1 ) . 2. Suppose Q i , . . . , a m are m vectors of the same dimension. If there are m constants ki,...,km> all zero, such that k\Oti + ■ ■ ■ + kmam = o, are oti,..,, am linearly independent? If there are m numbers k\,..., km, not all zero, such that fciai + .. .+kmam ^ o, are o t i , . . . , am linearly independent?
Systems of Linear
55
Equations
3. Suppose oti, ■ ■ • ,Q!m,/3i, • • ■ , / 3 m are vectors of the same dimension. If it follows from kioii + • • • + kmoim + fci/3i + • ■ • + fcm/3m = o that only fci = . . . = km = 0, are oti,..., am, (3i,..., f3m linearly independent? 4. If ot can be expressed as the linear combination oi = kioii + ■ ■ ■ + fcmo:m, is the expression unique? 5. Assume a i , a 2 , . . . , a m are linearly dependent, /3\,/32,. ■ ■ ,/3m are linearly dependent. Then there are m numbers fci,fc2,... , fcm, not all zero, such that fci an H
h fcmam = o, fci/3i H
h km(3m = o.
Hence fci(tti+j9i)H hfc m (a m + /3 m ) = o, so that a i + / 3 i , . . . ,am+(3m are also linearly dependent. Is the above true? 6. Let (1) oil,..., oir, (2) a n , . . . , a r , a r + i , . . . , a m be two sets of vectors of the same dimension, is (2) linearly independent when (1) is linearly inde pendent? Is (1) linearly dependent when (2) is linearly dependent? 7. Suppose we have two sets of vectors: oil = ( o n , . • ■ ,aiT),...
,otm = (ami,...
Pi = (an,- • ■ ,o-ir,o,iir+i,... fJm
=
,amr),
,aln),...,
v^ml j • • • ) Gmr> &m,r+l j ■ ■ - ) ^ m n j •
Prove that if cti,..., am are linearly independent, then / 3 i , . . . , j3m are also linearly independent. Is the converse true? Illustrate it with an example. 8. Assume Qi, a 2 , and 03 are linearly independent. Prove that 011 + 012,012 + 013, and 0:3 + 0:1 are also linearly independent. 9. Prove that any vector in a linearly dependent set of vectors must be a linear combination of some linearly independent set of vectors. 10. Suppose n vectors 0:1,0:2, ■ ■ ■ ,an a r e linearly dependent while any n — 1 vectors among them are linearly independent. Prove that (1) If the identity kioii + ■ ■ ■ + knan = o holds, then either k\,..., kn are all zeros or none of fci,..., kn are zero. (2) If the two identities kiQi -\ \- knan = o, lioii H \- lnctn = o holds, then Ki
Kn
h
In
11. Which of the following sets of vectors are linearly dependent? (1) (1, 1, 1), (2, 2, 2, 2); (2) (1, 1, 1), (1, 2, 3), (1, 3, 6).
56
Linear Algebra
12. Find the ranks of the following matrices: /l 1 0 0 \0
(2)
(1)
0 1 0 1 0 0 1 1 0 0 1 1 1 0 1
0\ 0 0 0 1/
13. For a matrix of rank r, is there any zero minor of order r — 1? Is there any zero minor of order r? Is there any nonzero minor of order r + 1? 14. Suppose the matrix B is obtained by deleting a row of a matrix A. What is the relation between the ranks of A and B1 15. Assume that A is an (m, n)-matrix of rank r. Let B be a (s, i)-matrix obtained by taking s rows and t columns from A. Prove that the rank of B^r + s + t — m — n. 16. Show that the rank of an (m, n)-matrix an
Olr;
is either 0 or 1 if and only if there exist m + n numbers Oi,... ,am,bi,... such that a,ij = a,ibj (i = 1,2,...,m;
j =
,bn
l,2,...,ri).
2.2. Systems of Homogeneous Linear Equations We are about to solve the general system of homogeneous linear equations anii H
h ainxn = 0, (1)
^-ml^'l T" * * * T" a m 7 l X n — U .
The matrix an
•••
a\n
I composed of its coefficients is called the coefficient matrix. According to Theorem 1 in the preceding section we start from the rank of its coefficient
Systems
of Linear
57
Equations
matrix A, delete any redundant linear equations, and find the same-solution system of linear equations. Without loss of generality and for convenience, suppose that the rank r of A is greater than zero, and the minor D of order r in the upper left corner is not equal to zero. Then the first r row vectors « i = (an, • ■ •, ain),...
,ar = (ari,...,
arn)
are linearly independent. Thus none of the first r linear equations is redundant. Again since any vector on is a linear combination of a\, ct2, ■ ■ ■, ar, the last m — r linear equations can be expressed with the first r linear equations. Hence the last m — r linear equations in (1) are redundant equations, i.e., the system of linear equations formed from the first r linear equations is a same-solution system of (1). Thus in order to solve (1), it suffices to solve a system consisting of the first r linear equations only. Since the determinant D ^ 0, we transport the terms containing a v + i , . . . , xn to the right-hand side: Gn^i + • • • + airxT — — ai i r +i£ r +i
0
(2) CLmXfi
Using Cramer's theorem, we find its solutions Xi
Di _ Dr — , . . . , xr—
(3)
where Di is a determinant of order r obtained from D by taking the right-hand side of (2) as a new column to replace the i-th column of D. The result can be written conveniently if we simplify (3). Using property 2 of a determinant, we expand D\ and obtain —ai,r+ix
r+1
Oi» %xn
r+1
iXn
ai2 ■
• air
Di =
—Ol.r+l
ai2
•
■
air
—ar
ar2
■
•
arr
ai2
■
■
air
aT2
■
•
arr
Xr+\
—a\n
+
xn . Q-rn
flr2
• ■arr
+ ■ ■
58
Linear Algebra
Thus £>i can be written as a linear combination of xr+i,... can be done with Di generally. Let
,xn.
And so this
Di _ ——■ — Cjr_)_ia;r+i + ■ • • + Cinxn,
i —l,..., r,
i.e., Xi = Citr-\-iXr+i
~t
• • + CinXn ,
% — 1, ' ' ' , T .
Then all solutions of (1) can be written as ' x\ = c i , r + i x r + i H
h ci„a;n ,
Xf — Cj>^-^-\Xr~\-l ~j~ ' ' ' -p CmXn , 3-r+l
k
==
Xn —
^r+1 i Xn ,
where xr+i,..., xn are arbitrary constants. In particular when r = n, i.e., the rank of .A is n, (1) has only the zero solution. What is mentioned above is the general rule for solving systems of ho mogeneous linear equations. The preceding procedure of solving (1) may be summarized as follows: First find the rank r of A and take r linear equations which contain some nonzero minors of order r of A, and then move the remaining n — r terms of the r equations to the right-hand side. Finally using Cramer's theorem solve the system of the r equations, and obtain all solutions to (1). According to the above rule, we again can state: T h e o r e m 1. The homogeneous system (1) of linear equations has only the zero solution if the rank r of its coefficient matrix is equal to n, and has infinitely many solutions if r < n. Assume that m < n in (1), then the rank r < n. Hence (1) has nonzero solutions. In other words, the number of equations in (1) is less than that of unknowns in (1), (1) has nonzero solutions. The above-mentioned solved the first and second problems raised in the introduction for homogeneous systems. The third problem will be discussed in the next section.
Systems
of Linear
59
Equations
Example 1. Solve the following system of linear equations: x + y — 3z — w = 0, 3x — y — 3z + 4w = 0, x + by - 9z - 8w = 0. Solution: First find out the rank of the coefficient matrix A. In general, we can find the rank, by definition, computing minors of A. But here we shall adopt a simpler method. Since the difference between 4 times the first equation and the second equation is the third equation, the third equation is redundant 1 1 and we shall delete it. As D / O w e solve the first two equations 3-1 in which D lies. Transporting the terms contained z and w to the right-hand side, we obtain
x + y = 3z + w, 3a; — y = Zz — Aw. Using Cramer's theorem, as Di =
3z + w 3z - 4w
D2 =
1 3
1 -1
-6z + 3w,
3z + w = -6z - lw , 3z-4w
we obtain 3 x=-z--w,
3
3 7 y = -z+-w,
z = z,
w = w.
These are the solutions required, where z and w are arbitrary constants. E x a m p l e 2. Discuss the positional relations among the following three planes: Px : aix + biy + C\z = 0, p 2 : a2x + b2y + c2z = 0, pz : a3x + b3y + c3z = 0.
60
Linear Algebra
Solution: According to the rank r of the coefficient matrix
(
ai
h
0.1
b2
ci \ c2 I ,
«3
b3
c3 /
we consider the following: 1. When r = 3 The above three planes intersect only at a point, namely the origin, because the three linear equations have only the zero solution. 2. When r = 2 Two of the three planes intersect at a line and the other plane either passes through this line, or coincides with one of the other two planes. This is because if CKj., a2, and ct3 are three row vectors of A, from the rank 2 of A, we know ai,a2, and a3 are linearly dependent and two vectors of them, say a i and a2, are linearly independent. Hence ai,&i,Ci, and 0,2, b2, C2 are out of proportion. Thus two planes pi and P2 intersect at a line. We know from linearly dependent vectors ai,ct2, and 0:3 that 03 = kia\ + k20t2When k\k,2 ^ 0, the plane p 3 passes through the intersection line. When kik2 = 0, say k\ = 0, then 03 = k2a2 and the two planes p2 and p3 are parallel. Again since the three planes pass through the origin, p2 and p 3 coincide. 3. When r = 1 The three planes coincide. From Theorem 2 in Section 2.1, we know that any two of the three row vectors are linearly dependent. Hence any two of the three planes are mutually parallel. Again they pass through the origin. Therefore they coincide. Three planes passing through the origin have above four different configurations as follows:
(1) The solution is complete.
12)
(3)
Systems of Linear
Equations
61
Suppose that we have two homogeneous systems, (1) and 611x1 H
h hnxn
= 0, B=
fefcl^i H
(5)
h bknXn = 0 ,
If the equations of (5) are linear combinations of the equations of (1), or in other words, the row vectors of B are linear combinations of row vectors of A, then the solutions of (1) are the solutions of (5). Conversely if the equations of (1) are linear combinations of the equation of (5), then the solutions of (5) are the solutions of (1). Hence (1) and (5) have exactly the same solutions, i.e., (1) and (5) are same-solution systems. The converse is also true. This is stated as the following Theorem: Theorem 2. Two systems of homogeneous linear equations (1) and (5) are same-solution systems if and only if the equations of (1) are linear combinations of the equations of (5), while the equations of (5) are linear combinations of the equations of (1). Proof. The proof of the sufficient condition is as mentioned above. We need only prove the necessary condition. Assume that (1) and (5) are samesolution systems. Clearly the system ' an^i +
(- ainxn
= 0,
. bnXi + ■ ■ ■ + binXn = 0 ,
and (1) have exactly the same solutions. Hence the ranks of matrices are equal. So for each i, vector (bu,bi2, ■ ■ ■ ,bin) nation of the first m vectors (aji,Oj2, • ■ • ,a,jn)- Similarly, (oa>Oi2>- •• >
their coefficient is a linear combi for each i, vector (bji,bj2, • ■ ■ ,bjn).
62
Linear Algebra Xi
Xi
—
%2
2- < Xl
+ x5
- ^3
+ x5 + x5 + x5
- x2 X2
- x3 —
X4
+ x6 — xe
= o, = o, = o, = o, = 0.
3. Let the matrix A be the coefficient matrix of the system of homogeneous linear equations an^i +
r o,inxn = 0,
flni^i +
r annxn
= 0.
Given that the determinant of A equals zero and Aij is the cofactor of element a^ of A, prove that (1) x\ = Au,X2 = A&,... ,xn = Ain is one of its solutions; (2) If the rank of A is equal to n — 1 and some A^ ^ 0, then the whole solutions of the homogeneous system are X\
=
KA.H
j ■ ■ ■ » Xn
=
KJiin ,
where k is any constant. 2.3. Systems of Fundamental Solutions We know that a system of homogeneous linear equations auxi H
1- ainxn
= 0, (1)
^ml^l
~r * ' ' i 0*mnXn = = " )
has infinitely many solutions if the rank r of its coefficient matrix a n ■ • • air:
is less than n. What is the relation among the infinitely many solutions? This is the third problem raised in the introduction to systems of homogeneous linear equations. The task of this section is to solve the problem. We see from (4) in the preceding section only the relation among the variables xi,X2,-. ■ ,xn of one solution. But it is not easy to see the relation between two solutions. In order to discuss the relation among solutions conveniently, we consider solutions of (1) as n-dimensional vectors, and these solutions are called solution vectors. The following are the basic properties of solution vectors of (1).
Systems of Linear
63
Equations
If a = (oi, ,an) is a solution vector of (1), it is easy to show by sub stitution that ka = (kai,..., kan) is also a solution vector of (1), where k is any constant. Again let (3 = {b\,..., bn) be also a solution vector of (1). Since system (1) can be written as n n
i == 1l .,.... . , m ,
y/ j ctijXj &ij%j = — 0, U, i3=1 =i
from n
n n
22aiJbj= 7= 1
/
we have
dijdj = 0 ,
°>
t = l... .,m,
y1 = 1, dijbj = 0,
i = 1,..., m,
n n
n n
n n
3= 1
3=1
3= 1
a + 2_^ i/= i ay(aj +bj) = j2__,aijflj =i j = i ijbj = 0 ,
1 = 1,.. . ,m.
i = l,...,m.
That is to say a + /3 = (a\ + bi,. ■ ■, an + bn) is also a solution vector of (1). Hence any linear combination of solution vectors of (1) is also a solution vector of(l). Since a set of vectors of dimension n containing more than n vectors is linearly dependent, there are at most k linearly independent solution vectors in the entire solution vectors of (1). Let them be a i , . . . , otk- Then all lin ear combinations of a i , . . . , a* form the whole solution vectors of (1). Thus infinitely many solution vectors can be expressed by finitely many linearly in dependent solution vectors, which is convenient to discuss. In order to show the basic role of the finitely many solution vectors, such k solution vectors are called a system of fundamental solutions of (1), i.e., Definition 1. Let a t i , . . . , ctk be k solution vectors of (1). If (1) a i , . . . , otk are linearly independent, (2) any solution vector of (1) is a linear combination of Q i , . . . , Qfc, then a n , . . . ,a/fe are called a system of fundamental solutions of (1). In general, in a set of vectors there is always a subset of vectors having the above properties. This is of great importance and we shall often encounter them later. Hence we specially give a formal definition: Definition 2. Let a x , . . . , am be m vectors in a set of vectors. If
64
Linear Algebra
(1) a i , . . . , a m are linearly independent, (2) any vector in the set of vectors is a linear combination of a i , . . . , otm, then a t i , . . . , am are called a largest independent set of the set of vectors. For example, in Example 1 in Section 2.1, (1, 2, - 1 ) , (2, —3, 1) is a largest independent set of the set of row vectors in the coefficient matrix. Furthermore (2, —3, 1), (4, 1, —1) is also its largest independent set. Therefore, in general, a largest independent set in a set of vectors is not unique. It should be noted that in the above definition the reason why a largest in dependent set is so called is that the number of vectors contained in a largest independent set is the largest number of linearly independent vectors in all linearly independent sets of vectors. It is not easy to see this property by the above definition. But using Theorem 3 in Section 2.1, it is to see it clearly. Since the largest number is determined uniquely, although the largest indepen dent sets of a set of vectors are not unique in general, they contain the same number of vectors. Thus we know that a system of fundamental solutions of (1) is a largest independent set of the whole of solution vectors of (1). Hence any homogeneous system of linear equations has a system of fundamental solu tions, and a system of fundamental solutions is not unique, while every system of fundamental solutions contains the same number of solution vectors. How many solution vectors does a system of fundamental solutions contain? The problem can be solved immediately following the procedures below for finding a system of fundamental solutions. This is because as soon as we find one system of fundamental solutions, we know the number of solution vectors contained in any one system of fundamental solutions. The following is for finding a system of fundamental solutions of (1) from its solution vectors. First we find a system of fundamental solutions from the general solution of Example 1 in the preceding section. Since any solution vector can be written as . // 3 3 3 77 \ (x ,y,z, w) =
(x,y,z,w) = I -z-
-w,
-z+-w,z,wj
=z(|,|,i,o) +» -Z{ll1'°)+W{-1
(4J.M)
= za + wfi, where
a = -(!■
3 2'
».°). *-(-!• I-
0, l ) ,
Systems of Linear
65
Equations
they are two solution vectors obtained from the general solution by specifying two groups of numbers 1, 0 and 0, 1 for z and w respectively. Clearly, a and f3 are linearly independent, that is to say, any solution vector is a linear combination of the two linearly independent solution vectors a and (3, so a and /3 are a system of fundamental solutions. This is true for the general case also. Suppose the rank of the coefficient matrix of (1) is r ( < n), from the general solution (4) of (1) in the preceding section, any solution of (1) can be written as solution vectors. Thus we have xT+x,.• • •■ ji 2-n) (xi,.. [Xi, . . ., .,x Xrr,, 3V+li %n) =
~TC-ln^Wi CITJXTM' ■• ••)■ ■> Cr,r+l%r+l '- \^lyr+l^r+l (Cl,r+lXr+l T T" "' ** '* "T Cr,r-+-l*r +i + - ■"T" ' t '
T" C r,n iXCnr +, l£jr _|_i, . , Xn) Orn^n ■ • • j. . %n)
...,o) + .-
=- 3aV ; r++ il ( c i ,,rr++ li ,;i •• •• •• j,CCT>r 1,0, r , r+i, + 1 j A, U, . . . , 0 ) + • ■ • -\~Xnn\C\ \C\n,n, -\~X
. . .. j, C r n , U, U,, 1l y) U , ., . .. , U
== IxrT+ + i n a „' _ r . + lioci C K l ++ •■' ■i ■**-*n^Ti—T where
{
' Gt\ =
J 1
,o,... ,0),
( C i , r + 1 , . ■ • j Cr,r+li *
« 1 = (Cl,r+1,- . . , C r i r + i , l , 0 , . . . , 0 ) ,
* Ot-n—r ~
&n—r
==
\C\m
• • ' j C r n j **i "1 . . . , 0 ,
i)
\Clm • • • > C r n , U, U, . . . , U, i)
are n — r solution vectors of (1). They can be obtained by specifying the are n — rn solution vectors of (1). for They xcan be obtained by specifying the following — r groups of numbers r+i,. .. ,xn in (4) in the preceding following n — r groups of numbers for x +\,... ,xn in (4) in the preceding r section: section: 1, 0;- ■ •; ; 0 o, .,....,., 0, 1. 1, 00 ,, .. .. .. ,, ();■•• Hence ;any my solution vector of (1) is a linear combination of solution vectors Q
OCn—r.
The rank of the matrix
(
Cl,r+1
■ •■
Cr,r+i
1
0
••■
0
\
cin ••• crn 0 0 ••• 1 / is equal to n — r; this is because the minor of order n — r situated in right-hand side of the above matrix is not equal to zero. It follows from Theorem 1 in Section 1 that a i , . . . , a „ _ r are linearly independent. Hence the n — r solution vectors a\,..., a n _ r is a system of fundamental solutions of (1). Thus we see that a system of fundamental solutions of (1) contains n - r solution vectors. Furthermore, any n — r linearly independent solution vec tors / 3 i , . . . ,/3 n -r are a system of fundamental solutions of (1). The reason
66
Linear Algebra
is that if a is any solution vector of (1), then / 3 i , . . . , /3 n -r, & can be written as linear combinations of some system of fundamental solutions. It follows from Theorem 3 in Section 2.1 that /3i,... , / 3 n _ r , a are linearly dependent, while 01,02, ■■ -,l3n-r are linearly independent, and so a is a linear combination of / 3 i , . . . , /3n-r- Hence /3i, 02, ■ ■ ■, Pn-r is a system of fundamental solutions. Thus we have: T h e o r e m 1. If the rank of the coefficient matrix of (1) is equal to n, then (1) has only the zero solution. Hence it has no system of fundamental solutions. If r < n, then (1) has systems of fundamental solutions, and every one such system contains n — r solution vectors. Any n—r linearly independent solution vectors form a system of fundamental solutions of (1). We can take two steps to find a system of fundamental solutions: First find the solutions of (1), writing them in the form of (4) in the preceding section, and then specify n — r groups of number 1, 0 , . . . , 0 ; . . . ; 0 , . . . , 0 , 1, for xT+\,..., xn respectively. Such n — r solution vectors form a system of fundamental solutions of (1). For example, in Example 1 in Section 2.1, letting z = l w e obtain a system of fundamental solutions which contains only n — r = 3 — 2 = 1 solution vector (j, 1,1). Again in Example 1 in the preceding section, as n — r = 4 — 2 = 2 a system of fundamental solutions contains two solution vectors: a
-(!■ !■>••)• " ■ ( - ! • J ^ 1 ) -
They can be obtained by specifying two groups of numbers 0, 1 and 1, 0 for z and w respectively. The above methods can be generalized as follows. We specify n — r groups of numbers ■"r+l' • * •»*"n >* = li • ■ - ,n ~ r , for xr+i,... ,xn in (4) in the preceding section, respectively. As long as the determinant of order n — r formed by them r(1)
(n~r)
(n—r) Cr+i
xW
••■ ■ ••
Xn
^0,
Systems
of Linear
67
Equations
then the n — r solution vectors determined by them
oti = ( 4 ° , - ■ •, 4 ° . 4 + n ■ • •, <$),* =
l,...,n-r,
are linearly independent. Hence oti,...,an_r form a system of fundamental solutions. According to (4) in preceding section, Xj = CjiT+ixr^_i + ■ ■ • + Cjnxn ,
j = l, z , . . . , r.
For example, in Example 1 in Section 2.1, putting z = 7 we obtain a system of fundamental solutions (1, 3, 7). Again in Example 1 in the preceding section we specify two groups of numbers 2, 0 and 0, 4 for z and w respectively; we then obtain a system of fundamental solutions (3, 3, 2, 0), (—3, 7, 0, 4). Note that any system of fundamental solutions of (1) can be found by the above methods. The reason is that if r
r(
L
«
...
n r
»(«- r )
- )
r-f 1
-
''
^n
according to properties 1 and 2 of determinants, it is readily shown that any minors of order n — r of matrix (i)
C\r+\^r^.\
,
(i)
(i)
(n—r) j ^ j^ (n—r) "T * * ' T C\nXn
• • • Crr+lxr+i
,
(n — r) ^^ T
,
(i)
(i)
^^ ™ ( n — r ) (n—r) x r Crnln r+l
(i)
\
{n—r) ' ' ' xn
. J
are equal to zero and hence the rank of the above matrix is less than n — r. Thus it follows from Theorem 1 in Section 2.1 that oti, • • •, ctn-r are linearly dependent, so that oti,..., a n _ r is not a system of fundamental solutions of But a system of fundamental solutions can be found by other ways. Of course, they are not necessarily obtained from the general solutions of a system of linear equations. Sometimes these methods are much simpler. First let us consider the following theorem: T h e o r e m 2. Assume that the coefficient matrix of a homogeneous system of linear equations a i i ^ i + • • • 4- ainxn
= 0, (2)
68
Linear Algebra
is A, and AM is the cofactor of the element aki of A. If the rank of A is n — 1 and Aki ¥= 0, then
(Akl,...,Akn) is a system of fundamental solutions of (2). Proof. A system of fundamental solutions contains only solution vector for n — (n — 1) = 1. Moreover, since \A\ = 0 , from the formula in Section 1.3, we have djiAn H
(- UjnAin = 0, i, j = 1 , . . . , n.
Hence the n vectors (An j • ■ ■ > Ain),
i = 1,..., n
are solution vectors and any one nonzero vector among them is a system of fundamental solutions of (2). Since Aki ^ 0>the solution vector (Aki, • ■ • > Atn) is a system of fundamental solutions. Therefore the theorem holds. For example, in Example 1 of Section 2.1, the rank of the coefficient matrix is 2 and A\\ = 2,A\2 = 6,^13 = 14, so that (2, 6, 14) or (1, 3, 7) is a system of fundamental solutions. In general we have: Theorem 3. Suppose that the rank of the coefficient matrix of the system of linear equations
{
auxi H
r-ainxn = 0, (3)
is r. If n — r groups of numbers Qr+1,1 j ■ ■ ■ j Gr+l,nJ * " ' j &nl) ■ • • j ^"Tin
are taken so that the rank of the matrix /
an aT\ Or+1,1
\ a„i
din
\
dTn
a r -(-l ) n 0>nn /
Systems of Linear
69
Equations
is n, then a
l
=
(.-Ar+l,l! ■ ■ • i -Ar+l.n)) • • • i O-n—r — \Anli
•••j
Ann)
is a system of fundamental solutions of (3), where Arj is the cofactor of element arj of A. Proof. It follows from Theorem 2 in Section 1.3 that a-nAji H
\-ainAjn
= 0, i = l,...,r,j
=r +
l,...,n.
Hence a i , . . . , a „ _ r are solution vectors of (3). Again, since \A\ ^ 0 and using the result of Example 7 of Section 3.1, we have An
■■■ Aln
= A ,
• • •
\A\n~l^Q.
A
We know from Theorem 1 in Section 2.1 that a i , . . . , an^r are linearly inde pendent. Therefore, riti,... ,cxn-r is a system of fundamental solutions. The theorem holds. Particularly when r = n—1, let Di be a minor of order n—1 of the coefficient matrix obtained by deleting the i-th column of the coefficient matrix, then (Di, — £>2, ■ • ■, (—l) n ~ 1 D n ) is a system of fundamental solutions. For example, the rank of the coefficient matrix in Example 1 of Section 2.2 is 2. Taking the first two row vectors of the coefficient matrix as the first two row vectors of A and the other two vectors (0, 0, 1, 0) and (0, 0, 0, 1) as the third and fourth row vectors of A, we get
A-
/I
1
3
1
0
Vo
-
0
o
- 3 -IX -
3
4
1
0
o
i /
where ASi = - 6 , A32 = - 6 , A33 = - 4 , A 34 = 0, Mi = 3, A12 = - 7 , A43 = 0, Aj 4 = - 4 , Hence ( - 6 , - 6 , - 4 , 0), (3, - 7 , 0, - 4 ) , or (3, 3, 2, 0), ( - 3 , 7, 0, 4) form the system of fundamental solutions required. The third problem concerning homogeneous systems of linear equations as mentioned above is now solved.
70
Linear Algebra
Exercises 1. Find systems of fundamental solutions of the homogeneous systems of linear equations given in Exercises 1 and 2 of Section 2.2. 2. Let a i , a 2 , and 03 be a system of fundamental solutions of a system of homogeneous linear equations. Are Qi + 02, a.2 + <*3> and 0C3 + c*i also a system of fundamental solutions of the system? 3. If the four row vectors of matrix 2 /I 1 -2 0 0 Vi - 2
1 0 0 1 1 -1 3 -2
°\ 0 0 0/
are all solution vectors of the system of homogeneous linear equations Xi + X2 + X3 + X4 + Xs = 0 ,
3x\ + 2x2 + X3 + X4 — 3x5 = 0 , X2 + 2x3 + 2x4 + 6x5 = 0 , 5xi + 4x2 + 3x3 + 3x4 — X5 = 0 Are the four row vectors a system of fundamental solutions? If not, are the four row vectors more or less than sufficient to form a system of fundamental solutions? If more, how are the redundant row vectors to be deleted? If less, how are they to be supplemented?
2.4. Systems of Nonhomogeneous Linear Equations We shall now consider the general system of nonhomogeneous linear equations anXi H
(- ai„x n = b\, (i)
(
^m\X\
+ ■ • • + &mnXn — 0m ,
The matrices of the coefficients and constant terms, an
■•■
ain \
/ 011
, ^ml
*
O'Tnn /
and
B= I ••• \ &ml
are called the coefficient matrix and augmented matrix of (1) respectively.
Systems of Linear
Equations
71
We know that systems of homogeneous linear equations always have so lutions; in any case they have a zero solution at least. But as systems of nonhomogeneous linear equations have no zero solution, and sometimes no nonzero solutions, they may not always have a solution. We shall next discuss on how to determine whether (1) have a solution or not. If (1) have solutions and i i = fei,...,in = fcnisa solution, that is, ( ai\k\ 4V am\kl
h ainkn = bi,
+ " " " + Q-mnkn — 0m ,
then the ( n + l)-th column vector of B is a linear combination of the preceding n columns. If the rank of A is r, then any minor of order r + 1 is zero. According to the basic properties of determinants, it is easy to see that any minor of order r + 1 of B is zero. Thus the rank of B dose not exceed r. But as the rank of B cannot be less than the rank of A, the rank of B is also r. In other words, if (1) has solutions, then the ranks of A and B are equal. Conversely, if the ranks of A and B are equal to r (> 0), assume, for convenience, that a minor of order r in the upper left corner is not zero. As in the discussion on homogeneous systems of linear equations in Section 2.2, we know from Theorem 1 in Section 2.2 that /3 r +i, ■ ■ ■ ,Pm among the m vectors A = ( a ii, • • •, ain, bi), i = l, 2 , . . . , m are linear combinations of r linearly independent vectors / 3 i , . . . , /3 r . Hence the last m — r equations can be expressed by the first r equations. Any solution obtained by solving the first r equations by Cramer's rule is a solution of (1). That is, if A and B have the same rank, then (1) has solutions. Thus we have: T h e o r e m 1. The system (1) of linear equations has solutions if and only if the ranks of its coefficient matrix and augmented matrix are equal. For a homogeneous system of linear equations, since its augmented matrix contains one more column, whose elements are all zero, than its coefficient matrix, their ranks are equal. Hence according to the above theorem, homo geneous systems of linear equations always have solutions. The above proof of sufficient condition gives a general way to solve systems of linear equations. That is, if the ranks of A and B are r, then by Cramer's rule we can solve r equations which contain some nonzero minors of order r and obtain the solutions of (1). Thus we again have
72
Linear Algebra
T h e o r e m 2. Suppose that the system (1) of linear equations has solutions and the rank of the coefficient matrix A is equal to r. If r = n, then (1) has only a zero solution; If r < n, then (1) has infinitely many solutions. Clearly Theorem 1 in Section 2.2 is the special case of this above theorem. We have now given the sufficient and necessary condition for (1) to have solutions and a general method for finding them, and so we have solved the first and second problems raised in the introduction. Example 1. Solve 7x + Sy = 2 , x - 2y = - 3 , 4x + 9y = 11. Solution: The rank of the coefficient matrix is clearly 2. Since the deter minant of augmented matrix B is equal to zero, the rank of B is also 2. Hence the nonhomogeneous system has solutions. Solving the first two equations by Cramer's rule we immediately obtain
- -— x -
17,
-?5 y-17-
This is the solution required. Example 2. Solve X\ + X2 — 3X3 — X4 = 1 ,
i 3zi — X2 — 3^3 + 4i4 = 4, V
Xi + 5X2 — 9X3 — 8X4 = 0 .
Solution: As in Example 1 of Section 2.2, the sum of the second and third equations is equal to 4 times the first equation. Hence the ranks of A and B are both 2. Thus the above system has solutions. Solving the first and third equations Xl + X2 = 1 + 3X3 + £4 ,
{ Xi + 5X2 = 9X3 + 8X4 by Cramer's rule, we obtain general solutions 5 3 3 xi = - + -x3 - -X4 , 4 2 4
1 3 7 x% = - - + 77X3 + - x 4 , 4 2 4
X3 = X3,
Xi = X4 .
73
Systems of Linear Equations
Example 3. In the following system of linear equations Xx\ + X2 + x3 = 1, Xi + \X2 + £3 = A , X\ + X2 + XX3 = A2 ,
determine all values of A for which the resulting linear system has (1) no solution, (2) a unique solution, (3) infinitely many solutions. Solution: Here B = X
1
1
\A\ = 1 A 1 1
( A - l ) 2 ( A + 2).
1 A
When A 7^ 1, —2, the ranks of A and B are both 3 and the system has only one solution. When A = 1, ranks of A and B are both 1, and the system has infinitely many solutions. When A = —2, the rank of A is 2, while the rank of B is 3. As the rank of A is not equal to the rank of B, the system has no solution. E x a m p l e 4. Discuss the relative positions of the following two planes: a\x + biy + c\z = d\, a2x + b2y + C2Z = d2 ■ Solution: Let A =
ai
bi
c\
0,2
62
C2
B=
a\
b\
c\
di
0,2 62 C2 ^2
It is easy to see from analytical geometry that when the rank of A is 2, ai,bi,ci and 02,^2^2 are out of proportion, so the two planes intersect on a line. When the rank of A is 1 and the rank of B is 2, a\,bi,ci is directly proportional to 02,^2^2 with a ratio 7^ jp-, so two planes are parallel. When the ranks of A and B are 1, a\, 61, c\, d\ is directly proportional to 02,62, C2, cfo, so two planes coincide. The solution is complete.
We next discuss the relations among the solutions of (1). A system of fundamental solutions of a homogeneous system of linear equations can be used to express the solutions of (1); that is, infinitely many solutions can be expressed by finitely many solutions. This solves the third problem raised in the introduction.

Let alpha = (a1, ..., an) be any solution vector of (1) and v = (c1, ..., cn) be some given solution vector. Then alpha - v = (a1 - c1, ..., an - cn) is a solution vector of the system of homogeneous linear equations
    a11 x1 + ... + a1n xn = 0,
    ...........................                    (3)
    am1 x1 + ... + amn xn = 0.
This is because from
    sum_{j=1}^{n} a_{ij} a_j = b_i,   sum_{j=1}^{n} a_{ij} c_j = b_i,   i = 1, ..., m,
we have
    sum_{j=1}^{n} a_{ij} (a_j - c_j) = sum_{j=1}^{n} a_{ij} a_j - sum_{j=1}^{n} a_{ij} c_j = 0.
Hence any solution vector of (1) can be written as the sum of some given solution vector of (1) and a solution vector of (3). Conversely, it is easy to see that the sum of a given solution vector of (1) and any solution vector of (3) is again a solution vector of (1). The system (3) is often called the associated homogeneous system of (1). To summarize, we have:

Theorem 3. The sum of a given solution vector of (1) and every solution vector of the associated homogeneous system (3) gives all the solution vectors of (1).

Thus the solution vectors of (1) can be expressed by means of a system of fundamental solutions of (3). Suppose that v is a given solution vector of (1) and alpha_1, ..., alpha_{n-r} is a system of fundamental solutions of (3); then all the solution vectors of (1) can be expressed as
    v + k1 alpha_1 + ... + k_{n-r} alpha_{n-r},
where k1, k2, ..., k_{n-r} are arbitrary constants.
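As an illustrative check of Theorem 3 (NumPy assumed, not part of the original text), the sketch below takes a particular solution of the system of Example 2 together with the fundamental solutions of its associated homogeneous system and verifies that every combination v + k1*alpha + k2*beta again solves the system:

    import numpy as np

    A = np.array([[1.0, 1.0, -3.0, -1.0],
                  [3.0, -1.0, -3.0, 4.0],
                  [1.0, 5.0, -9.0, -8.0]])
    b = np.array([1.0, 4.0, 0.0])

    v = np.array([5/4, -1/4, 0.0, 0.0])       # particular solution of A x = b
    alpha = np.array([3.0, 3.0, 2.0, 0.0])    # fundamental solutions of A x = 0
    beta = np.array([-3.0, 7.0, 0.0, 4.0])

    rng = np.random.default_rng(0)
    for _ in range(5):
        k1, k2 = rng.standard_normal(2)
        assert np.allclose(A @ (v + k1 * alpha + k2 * beta), b)
    print("v + k1*alpha + k2*beta solves the system for every k1, k2 tested")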
For example, in Example 2, v = (5/4, -1/4, 0, 0) is one solution vector of the system. We know from the preceding section that a system of fundamental solutions of the associated homogeneous system is
    alpha = (3, 3, 2, 0),   beta = (-3, 7, 0, 4).
Hence all of its solution vectors are v + k1*alpha + k2*beta, where k1 and k2 are arbitrary constants. It should be noted that the general solution obtained in Example 2 above also gives all the solutions of this system, and the method of finding it is simpler than the present one; but the relation among the solutions is not as transparent there. This solves the third problem raised in the introduction.

Example 5. Let nu_0 be a solution vector of (1) and alpha_1, ..., alpha_{n-r} a system of fundamental solutions of (3). Then
    nu_0, nu_1 = nu_0 + alpha_1, ..., nu_{n-r} = nu_0 + alpha_{n-r}
are n - r + 1 solution vectors of (1) which are linearly independent, and any solution of (1) can be expressed as
    nu = k0 nu_0 + k1 nu_1 + ... + k_{n-r} nu_{n-r},   where k0 + k1 + ... + k_{n-r} = 1.

Proof. Clearly nu_0, nu_1, ..., nu_{n-r} are n - r + 1 solution vectors of (1). First we show that they are linearly independent. Let
    c0 nu_0 + c1 nu_1 + ... + c_{n-r} nu_{n-r} = 0.
Then (c0 + c1 + ... + c_{n-r}) nu_0 + c1 alpha_1 + ... + c_{n-r} alpha_{n-r} = 0. If c0 + c1 + ... + c_{n-r} were not zero, then nu_0 would be a linear combination of alpha_1, ..., alpha_{n-r}, hence a solution vector of (3) and not a solution vector of (1). This contradicts the assumption, and so c0 + c1 + ... + c_{n-r} = 0. Since alpha_1, ..., alpha_{n-r} is a system of fundamental solutions of (3), it follows that c1 = ... = c_{n-r} = 0, and therefore c0 = 0. This proves that nu_0, nu_1, ..., nu_{n-r} are linearly independent.
Again, since nu - nu_0 is a solution vector of (3) and nu_1 - nu_0, ..., nu_{n-r} - nu_0 is a system of fundamental solutions of (3), we have
    nu - nu_0 = k1 (nu_1 - nu_0) + ... + k_{n-r} (nu_{n-r} - nu_0),
or
    nu = k0 nu_0 + k1 nu_1 + ... + k_{n-r} nu_{n-r},
where k0 = 1 - k1 - ... - k_{n-r}, i.e., k0 + k1 + ... + k_{n-r} = 1. Therefore the statement of this example holds.

The solutions nu_0, nu_1, ..., nu_{n-r} are sometimes said to be a system of fundamental solutions of (1). It is worth noting that the system of fundamental solutions given here differs from that of a homogeneous system in the preceding section: the coefficients k0, k1, ..., k_{n-r} of the former must satisfy k0 + k1 + ... + k_{n-r} = 1. Thus we also see that any n - r + 2 solution vectors of (1) are linearly dependent, where r is the rank of the coefficient matrix.

Example 6. Suppose AX = b represents a system of nonhomogeneous linear equations in four unknowns, the rank of its coefficient matrix A is 3, and eta_1, eta_2, and eta_3 are three solution vectors of the nonhomogeneous system, where
    eta_1 = (2, 3, 4, 5)',   eta_2 + eta_3 = (1, 2, 3, 4)'.
Find the general solution of the nonhomogeneous system.

Solution: The associated homogeneous system of the above nonhomogeneous system is AX = 0. Since the rank of A is 3 and the number of unknowns is 4, a system of fundamental solutions of AX = 0 contains exactly one solution. It follows from the assumption that
    alpha_1 = eta_1 - eta_2   and   alpha_2 = eta_1 - eta_3
are two solution vectors of AX = 0. Therefore
    alpha_1 + alpha_2 = 2 eta_1 - (eta_2 + eta_3) = (3, 4, 5, 6)'
is also a solution vector of AX = 0 and by itself forms a system of fundamental solutions of AX = 0. Hence the general solution of AX = b is
    eta_1 + k (alpha_1 + alpha_2) = (2, 3, 4, 5)' + k (3, 4, 5, 6)',
where k is an arbitrary constant.

We conclude this section with the following example.

Example 7. Discuss the relative positions of the three planes
    a1 x + b1 y + c1 z = d1,
    a2 x + b2 y + c2 z = d2,
    a3 x + b3 y + c3 z = d3.

Solution: Let the ranks of the matrices
    A = ( a1  b1  c1          B = ( a1  b1  c1  d1
          a2  b2  c2                a2  b2  c2  d2
          a3  b3  c3 ),             a3  b3  c3  d3 )
be r and s respectively. Since s >= r, s <= 3, r >= 1, and r cannot be 1 when s = 3, there are five different cases for the values of r and s. The following conclusions are easily obtained from analytical geometry.

1. When s = 3 and r = 3,
(1) The three planes intersect at a single point, as the above system has exactly one solution.
2. When s = 3 and r = 2, the three planes have no common point, as the above system has no solution. Since r = 2, the three row vectors alpha_1, alpha_2, and alpha_3 of A are linearly dependent; let k1 alpha_1 + k2 alpha_2 + k3 alpha_3 = 0. If none of k1, k2, and k3 is zero, the line of intersection of any two of the planes is parallel to the third plane. If one of k1, k2, and k3 is zero, then two of the three planes are parallel, while the third plane intersects them. Thus we may have the following:
(2) The intersection line of any two planes among the three planes is parallel to the remaining plane.
(3) Two of the three planes are parallel, and the remaining plane intersects these two parallel planes in two parallel lines.

3. When s = 2 and r = 2, the three planes intersect in a common line, as the above system has infinitely many solutions. Again, since s = 2, the three row vectors beta_1, beta_2, and beta_3 of B are linearly dependent; let k1 beta_1 + k2 beta_2 + k3 beta_3 = 0. When none of k1, k2, and k3 is zero, the three planes are distinct. When one of them is zero, two of the three planes coincide. That is,
(4) The three planes are distinct and intersect in a line.
(5) The three planes intersect in a line, with two of them coinciding.

4. When s = 2 and r = 1, the three planes have no common point, as the system has no solution. Since r = 1, the three planes are parallel; since s = 2, at least two of them are distinct. That is,
(6) The three planes are parallel and distinct.
(7) The three planes are parallel, with two of them coinciding.

5. When s = 1 and r = 1,
(8) The three planes coincide.

The positions of the three planes can thus be grouped into eight cases.
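The pair (r, s) in Example 7 can also be computed numerically; the following sketch (illustrative only, NumPy assumed) reads off which group of cases applies for three given planes:

    import numpy as np

    planes = np.array([[1.0, 0.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0, 0.0],
                       [1.0, 1.0, 0.0, 1.0]])   # rows (a, b, c, d)
    r = np.linalg.matrix_rank(planes[:, :3])
    s = np.linalg.matrix_rank(planes)

    cases = {(3, 3): "one common point",
             (2, 3): "no common point, cases (2)-(3)",
             (2, 2): "a common line, cases (4)-(5)",
             (1, 2): "parallel planes, cases (6)-(7)",
             (1, 1): "all three planes coincide"}
    print(r, s, cases[(r, s)])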
Exercises

1. Solve the following systems of linear equations:
(1) 2x1 + x2 - x3 = 1,
    x1 - 3x2 + 4x3 = 2,
    11x1 - 12x2 + 17x3 = 3;
(2) x1 + 2x2 + x3 - x4 = 4,
    3x1 + 6x2 - x3 - 3x4 = 8,
    5x1 + 10x2 + x3 - 5x4 = 16.
2. Solve the following system of linear equations:
    x1 - x2 - x3 - ... - xn = 2a,
    -x1 + 3x2 - x3 - ... - xn = 4a,
    .................................
    -x1 - x2 - x3 - ... + (2n - 1)xn = 2na.
3. For what values of lambda does the following linear system have solutions, no solution, or exactly one solution?
    (lambda + 3)x1 + x2 + 2x3 = lambda,
    lambda x1 + (lambda - 1)x2 + x3 = lambda,
    3(lambda + 1)x1 + lambda x2 + (lambda + 3)x3 = 3.

4. Suppose the rank of the coefficient matrix of the linear system
    a11 x1 + ... + a1n xn = b1,
    ...........................
    an1 x1 + ... + ann xn = bn
equals the rank of the matrix
    ( a11  ...  a1n  b1
      ...  ...  ...  ...
      an1  ...  ann  bn
      b1   ...  bn   0 ).
Prove that this system has solutions.

5. Prove that the system
    x1 - x2 = a1, x2 - x3 = a2, x3 - x4 = a3, x4 - x5 = a4, x5 - x1 = a5
has solutions if and only if sum_{i=1}^{5} a_i = 0, and find its general solution, if any.

6. Let nu_1, ..., nu_t be solution vectors of (1). Is every linear combination of nu_1, ..., nu_t a solution vector of (1)? If not, which of these combinations are solution vectors of (1)?

7. p1(x1, y1) and p2(x2, y2) are two different points on a line L: ax + by + c = 0. Prove that the coordinates of any point p(x, y) on the line L can be written as
    x = lambda x1 + mu x2,   y = lambda y1 + mu y2,
where lambda and mu are constants satisfying lambda + mu = 1.
8. Suppose the rank of the coefficient matrix of a nonhomogeneous system of linear equations in three unknowns is 1, and alpha_1, alpha_2, and alpha_3 are three solution vectors of the nonhomogeneous system, with alpha_1 and alpha_2 + alpha_3 given. Find the general solution of the nonhomogeneous system.
2.5. Elementary Operations

In the previous sections we solved the three problems raised in the introduction: we gave methods for determining whether a system of linear equations has solutions, rules for finding the solutions, and the relations among the solutions. We may say that all the problems raised in the introduction are solved. But when the theory and methods are applied in practice, many determinants need to be computed, which can involve a great deal of tedious work. In this section we introduce elementary operations, which simplify the procedure of solving systems of linear equations and which work especially efficiently on systems with numerical coefficients. We know that a system of linear equations
    a11 x1 + ... + a1n xn = b1,
    ...........................                    (1)
    am1 x1 + ... + amn xn = bm
can be reduced to
    b11 x1 + ... + b1n xn = b'1,
    ...........................                    (2)
    bm1 x1 + ... + bmn xn = b'm
by a finite sequence of operations of the following kinds: changing the order of the equations, multiplying an equation by a nonzero constant, and adding a scalar multiple of one equation to another equation. Obviously none of these operations affects the solutions, so all the solutions of (1) are solutions of (2). Conversely, (2) can be reduced back to (1), and so all the solutions of (2) are solutions of (1). That is to say, (1) and (2) have exactly the same solutions; they are equivalent systems. Thus in order to solve (1) it suffices to solve (2).
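The three operations above are exactly the steps of Gaussian elimination. A minimal sketch of such a reduction (Python with NumPy, an illustration only; the helper function row_reduce is introduced here for the purpose) is:

    import numpy as np

    def row_reduce(M):
        M = M.astype(float).copy()
        rows, cols = M.shape
        pivot_row = 0
        for col in range(cols - 1):
            # find an equation with a nonzero coefficient in this column (reordering)
            nz = np.nonzero(np.abs(M[pivot_row:, col]) > 1e-12)[0]
            if nz.size == 0:
                continue
            M[[pivot_row, pivot_row + nz[0]]] = M[[pivot_row + nz[0], pivot_row]]
            M[pivot_row] /= M[pivot_row, col]            # multiply by a nonzero constant
            for r in range(rows):
                if r != pivot_row:
                    M[r] -= M[r, col] * M[pivot_row]     # add a multiple of one row to another
            pivot_row += 1
            if pivot_row == rows:
                break
        return M

    aug = np.array([[2.0, 1.0, 5.0],
                    [1.0, -1.0, 1.0]])
    print(row_reduce(aug))    # [[1, 0, 2], [0, 1, 1]]  ->  x = 2, y = 1

Every step used in row_reduce is one of the three operations listed above, so the solution set is never changed.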
If we replace a system of linear equations by its matrix A, then the above operations become: interchanging two rows of A, multiplying a row of A by a nonzero constant, and adding a scalar multiple of one row of A to another row. These operations are called elementary row operations on the matrix A. Thus reducing (1) to (2) is equivalent to reducing the augmented matrix A of (1) to the augmented matrix B of (2) by a finite sequence of elementary row operations. The following is an important property of elementary row operations on a matrix.

Theorem 1. Assume that the matrix A can be reduced to the matrix B by a finite sequence of elementary row operations. Then the ranks of A and B are equal.

Proof. To prove the theorem it suffices to show that no single elementary row operation alters the rank of a matrix. Clearly the first two types of elementary row operations do not alter the rank, so we need only consider the third type. Suppose the matrix B is obtained by adding k times the i-th row of A to the j-th row of A. It suffices to prove that the rank s of B is not greater than the rank r of A: for if we subtract k times the i-th row of B from the j-th row of B we obviously recover A, so that r is likewise not greater than s, and hence r = s.

Consider any minor B1 of order r + 1 of B. If B1 does not contain the j-th row of B, it is simply a minor of order r + 1 of A, so B1 = 0. If B1 contains both the j-th and the i-th rows of B, we know from Property 4 of determinants that B1 equals a minor of order r + 1 of A, so B1 = 0. If B1 contains the j-th row of B but not the i-th row, then by Property 2 of determinants we have B1 = A1 + kA2, where A1 and A2 are minors of order r + 1 of A, so again B1 = 0. Since these three possibilities exhaust the positions of B1 in B, every minor of order r + 1 of B is zero. Hence s <= r, and the theorem holds.

Clearly, the simpler the matrix B is, the more easily (2) is solved. For example, when B is in the simplest form, say a diagonal matrix, the solutions of (2) can be read off immediately without computation. But what simple forms can the matrix B take? We illustrate this with Example 1 below.
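Before turning to that example, Theorem 1 can be checked numerically on random matrices; in the following sketch (NumPy assumed, illustrative only) a type-3 row operation leaves the computed rank unchanged:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.integers(-3, 4, size=(4, 5)).astype(float)

    B = A.copy()
    B[2] += 7 * B[0]        # add a k-times multiple of one row to another (type 3)

    print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B))   # True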
Example 1. Simplify the following matrix by elementary row operations:
    A = ( 1  -2  -1  -2    2
          4   1   2   1    3
          2   5   4  -1    0
          1   1   1   1   1/3 ).

Solution: Multiplying the first row by -4, -2, and -1 and adding the results to the second, third, and fourth rows respectively, we obtain
    ( 1  -2  -1  -2    2
      0   9   6   9   -5
      0   9   6   3   -4
      0   3   2   3  -5/3 ).
In a similar way, we obtain
    ( 1  -2  -1  -2    2
      0   9   6   9   -5
      0   0   0  -6    1
      0   0   0   0    0 ),
and finally we obtain
    ( 1   0  1/3   0    8/9
      0   1  2/3   0  -7/18
      0   0   0    1   -1/6
      0   0   0    0     0 ).                    (3)
This is of a much simpler form than the given matrix. Clearly its rank is r = 3. The solution is complete.

In the above matrix, every element of each column below the third row is zero. This is one characteristic of the matrix. Besides, there are three columns,
    (1, 0, 0, 0)',  (0, 1, 0, 0)',  (0, 0, 1, 0)',
each of which has one element equal to 1 and all other elements equal to zero. The same applies to the general case. If the rank of A is r, then by elementary
row operations some r columns of A can be reduced to r columns each of which has a single element equal to 1 and all other elements, in particular all elements beyond the r-th row, equal to zero. If A is nonsingular, then by the above method it can be reduced to the identity matrix (see Section 3.2). Therefore we have

Theorem 2. Every nonsingular matrix can be reduced to the identity matrix by elementary row operations.

Thus, after reducing B to the above form, by merely transposing terms in an appropriate manner we can find the solutions of (2) without computing any determinants.

Example 2. Solve the following system of linear equations:
    x1 - 2x2 - x3 - 2x4 = 2,
    4x1 + x2 + 2x3 + x4 = 3,
    2x1 + 5x2 + 4x3 - x4 = 0,
    x1 + x2 + x3 + x4 = 1/3.
Solution: The augmented matrix of the above system is exactly the matrix of Example 1, and it can be reduced to (3) by elementary row operations. Thus the given system can be reduced to
    x1 + (1/3)x3 = 8/9,
    x2 + (2/3)x3 = -7/18,
    x4 = -1/6.
By transposing the terms containing x3 to the right-hand side, we obtain
    x1 = 8/9 - (1/3)x3,
    x2 = -7/18 - (2/3)x3,
    x3 = x3,
    x4 = -1/6,
where x3 is an arbitrary constant. These are the required solutions.

Note that when we solve systems of linear equations by elementary row operations, the ranks of the coefficient and augmented matrices need not be computed first. It suffices to observe that if the two ranks turn out to be unequal, then the system has no solution.

Example 3. Solve the following system of linear equations:
    x + 2y - z = 1,
    2x - 3y + z = 0,
    4x + y - z = -1.
Solution: Here
    B = ( 1   2  -1   1
          2  -3   1   0
          4   1  -1  -1 ).
Adding -2 times and -4 times the first row to the second and third rows respectively, we obtain
    ( 1   2  -1   1
      0  -7   3  -2
      0  -7   3  -5 ).
Then adding -1 times the second row to the third row, we obtain
    ( 1   2  -1   1
      0  -7   3  -2
      0   0   0  -3 ).
Evidently the ranks of the coefficient and augmented matrices are not equal; hence the original system of linear equations has no solution. The solution is complete.

In general, besides elementary row operations there are also elementary column operations on a matrix. Elementary row operations and elementary column operations are together called elementary operations.

Definition. The following operations on a matrix A are called elementary operations:
1. Interchanging two rows or two columns of A;
2. Multiplying a row or a column of A by a nonzero constant;
3. Adding a scalar multiple of one row of A to another row, or adding a scalar multiple of one column of A to another column.

It should be noted that an elementary operation of type 1 can be accomplished by a combination of elementary operations of types 2 and 3. For example, in order to interchange the first two rows, we may proceed as follows: first add the second row to the first row, then add -1 times the (new) first row to the second row, then add the (new) second row to the first row, and finally multiply the second row by -1. The outcome is that the first two rows are interchanged. We shall nevertheless keep operation 1 as a separate type of elementary operation for ease of reference.
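The four steps just described can be traced in a small sketch (illustrative, NumPy assumed); the net effect is exactly an interchange of the first two rows:

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

    B = A.copy()
    B[0] += B[1]      # add the second row to the first
    B[1] -= B[0]      # add -1 times the (new) first row to the second
    B[0] += B[1]      # add the (new) second row to the first
    B[1] *= -1        # multiply the second row by -1

    print(B)          # rows of A interchanged: [[3, 4], [1, 2]]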
As mentioned before, we can simplify a matrix by elementary row operations. If we also use elementary column operations, the matrix can be simplified further. For instance, the matrix in Example 1 can be simplified further by elementary column operations as follows: add -1/3 times the first column and -2/3 times the second column to the third column; next add -8/9 times the first column, 7/18 times the second column, and 1/6 times the fourth column to the fifth column; finally interchange the third and fourth columns. We obtain
    ( 1  0  0  0  0
      0  1  0  0  0
      0  0  1  0  0
      0  0  0  0  0 ).
Thus we obtain the matrix in its simplest form. As in this example, a matrix whose element in the i-th row and i-th column is 1 for i = 1, 2, ..., r, where r is the rank of the matrix, and whose elements elsewhere are all zero, is called the standard form of the matrix. From the above example we thus have:

Theorem 3. Any matrix can be reduced to standard form by elementary operations.

The method of finding the standard form of a general matrix is the same as that used in the above example, only stated in general terms, and need not be repeated here. The standard form of a matrix is an important concept, needed in many branches of mathematics. We illustrate it again with an example.

Example 4. Find the standard form of the matrix
    A = ( 0   2   -4
          1   4   -5
          3   1    7
          0   5  -10
          2   3    0 ).

Solution: Interchanging the first two rows, then clearing the first column, and continuing with row and column operations, we have
    A ~ ( 1   4   -5          ~ ( 1   4   -5          ~ ( 1   4  -5          ~ ( 1  0  0
          0   2   -4              0   2   -4              0   1  -2              0  1  0
          3   1    7              0  -11  22              0   0   0              0  0  0
          0   5  -10              0   5  -10              0   0   0              0  0  0
          2   3    0 )            0  -5   10 )            0   0   0 )            0  0  0 ).

This is the standard form of A; the symbol ~ denotes equivalence, which will be explained below. The solution is complete.

We know from Theorem 1 that the rank of a matrix is not altered by elementary row operations. Similarly, it is not altered by elementary column operations. Therefore the rank of a matrix is not altered by elementary operations, and so the rank of a matrix equals that of its standard form. For instance, the rank of the matrix in Example 4 is 2. Of course, to find the rank of a matrix we need not reduce it all the way to standard form; it suffices to reduce it by suitable elementary operations to a form from which the rank can be seen directly. Thus to find the rank of a matrix we need not compute any determinants.

It is worth special mention that to find the rank of a matrix we may use both elementary row operations and elementary column operations, but to solve a system of linear equations we may use only elementary row operations, because column operations would disturb the order of the unknowns, which is inconvenient when reading off solutions. The basic properties of elementary operations on a matrix are similar to those of the corresponding operations on a determinant. Elementary operations are fundamental operations on matrices; they are of great importance and are used widely. In two later chapters we shall use them to simplify computations.

In the following we introduce another basic concept. Two matrices A and B are said to be equivalent if A can be reduced to B by a finite sequence of elementary operations, denoted
by A ~ B. It is readily seen from the definition that the following laws hold for this relation:
1. reflexive law: A ~ A;
2. symmetric law: if A ~ B, then B ~ A;
3. transitive law: if A ~ B and B ~ C, then A ~ C.
In general, a relation satisfying these laws is called an equivalence relation. For example, parallelism of lines is an equivalence relation; similarity of triangles is an equivalence relation; equality of numbers in arithmetic is an equivalence relation. On the other hand, inequality relations such as "greater than" and "less than" are not equivalence relations.

Two matrices with the same number of rows and the same number of columns are called matrices of the same size. For example, among the following three matrices,
\2-3
2-l\
l) '
/0
f3-2\
2-4\
\0 - 4
5/ '
\1
4) '
the first two matrices are of the same size but the last two are not. We know from Theorem 1 that if two matrices are equivalent, then their ranks are equal. Conversely if the ranks of two matrices of the same size are equal, then they have the same standard form, i.e., they are equivalent to the same matrix. According to the symmetric law and the transitive law, they are equivalent. Thus we obtain: T h e o r e m 4. Two matrices of the same size are equivalent if and only if their ranks are equal. We conclude this chapter with the following example. E x a m p l e 5. In the following system of linear equations ax + y + z = p, x + ay + z — q, x + y + az = r , determine all values of a, p, q, and r for which the resulting linear system has (1) a unique solution; (2) infinitely many solutions; (3) no solution.
88
Linear Algebra
Solution: The linear equation above has the augmented matrix /
p q
\
,
1 1 1 a 1 1 a
fa
where A =
V
r
\
Transforming this matrix to row echelon form by using elementary row operations, we obtain B
l />; - | 0 a - 1 ^0 0
a
r 1-a q-r - ( a - l ) ( a + 2) p + o - r ( a + l )
Let 1 1 0 a-1 0 0
a = - ( a - l ) 2 ( a + 2). 1-a - ( a - l ) ( a + 2)
1. When a is not equal to 1 or -2, the rank of A = rank of B = 3, and the given linear system has a unique solution. Since
    B1 ~ ( 1  0  0   (ap + p - q - r) / ((a + 2)(a - 1))
           0  1  0   (aq + q - p - r) / ((a + 2)(a - 1))
           0  0  1   (ar + r - p - q) / ((a + 2)(a - 1)) ),
the solution is
    x = (ap + p - q - r) / ((a + 2)(a - 1)),
    y = (aq + q - p - r) / ((a + 2)(a - 1)),
    z = (ar + r - p - q) / ((a + 2)(a - 1)).
2. When a = 1, we have
    B1 = ( 1  1  1   r
           0  0  0   q - r
           0  0  0   p + q - 2r ).
(i) When q - r = 0 and p + q - 2r = 0, i.e., p = q = r, we have rank of A = rank of B = 1 < n = 3, and the given linear system has infinitely many solutions. From
    B1 = ( 1  1  1  p
           0  0  0  0
           0  0  0  0 )
we obtain x + y + z = p, and it follows that a system of fundamental solutions of the associated homogeneous system is
    alpha_1 = (-1, 1, 0)',   alpha_2 = (-1, 0, 1)'.
A particular solution, obtained by taking y = 0 and z = 0, is
    eta_0 = (p, 0, 0)'.
We have thus expressed the general solution of the original system as a particular solution plus a combination of a system of fundamental solutions of the associated homogeneous system:
    eta_0 + k1 alpha_1 + k2 alpha_2,
where k1 and k2 are arbitrary constants.
(ii) When q - r is not zero or p + q - 2r is not zero, the original system is inconsistent; that is, it has no solutions.
-2
Bi= I 0 - 3
/l
3
q-r
\0
0
p + q + ry
0
r
(i) When p + q + r = 0, applying elementary row operations to the matrix Bi we get
Bx
/i o -l ap o i - i = ? \0
0
0
0
It follows that all the solutions to the associated homogeneous system of linear equations are ka = k
90
Linear Algebra
where k can be assigned any real value. A particular solution to the original system is / 2r+q \ 3
rio =
r —q
3
V o / Hence it follows that Tjo + ka gives all the solutions of the original system. (ii) When p+q+r ^ 0, we see easily that the original system is inconsistent, that is, it has no solutions. Exercises 1. Compute the ranks of the following matrices by using elementary opera tions: '25 75 (1) 75 ,25
31 94 94 32
17 43 \ 53 132 54 134 20 48 /
'47 (2) | 26 16
-67 98 -428
35 23 1
201 -294 1284
155' 86 52
2. Can you find ranks of matrices and solve a system of linear equations by only using elementary row operations, or only using elementary column operations? 3. Solve the following systems of linear equations using elementary row operations:
(1) \ x i -
2x 2 + 3x 3 -I- 4x 4 = 5 . X2 + X3 + X4 = 1 ;
' xi + 2x2 + 3x 3 + x4 = 5, 2xi + 4x 2 — X4 = —3 , (2K -Xi - 2x + 3x + 2x = 8, 2 3 4 xi - 2x2 - 9x3 - 5x 4 = - 2 1 .
4. Assume that an (m, n)-matrix A is reduced to a matrix B by elementary row operations. Let oti,...,a„ be the column vectors of A and f3lt..., f3n the column vectors of B. Prove that if fciai H
hfc„a n = o,
(*)
Systems of Linear
Equations
91
then
k1p1 + --- + kn0n = o,
(**)
and conversely, if (**) is true, then (*) is also true. 5. Find a largest linearly independent set from the following vector set a!=(4,-5,2,6), a3 = (4,-l,5,6),
02 = ( 2 , - 2 , 1 , 3 ) , a 4 = (6,-3,3,9)
and express the rest of the vectors in terms of the largest linearly indepen dent set.
CHAPTER 3 MATRIX OPERATIONS
The concept of matrices was discussed in the previous chapter. It is not only an important tool for treating a system of linear equations, but also an indispensable tool for studying linear functions. In the following chapters we shall often make use of it. The advantage of matrix operations is that when a matrix is regarded as a quantity, it makes operations on arrays of ordinary numbers extremely simple. In this chapter we shall discuss matrix operations, particularly as regards the following three aspects. 1. Matrix addition, matrix subtraction, scalar multiplication, matrix multipli cation, etc., as well as the basic properties of these matrix operations. 2. Some specially important matrices. 3. The necessary and sufficient condition for a matrix to be invertible and methods of finding an inverse matrix. This chapter is divided into three parts in which we in turn consider the above three aspects. 3.1. Matrix Addition and Matrix Multiplication In Sec. 2.1 the definition of a matrix was given. The matrix A of order n is called a nonsingular matrix if its determinant \A\ ^ 0. Otherwise, i.e., \A\ = 0, A is called a singular matrix. When all elements are real numbers, A is called a real matrix. As in the case of equality, addition, scalar multiplication of vectors, we have the following definitions. 92
93
Matrix Operations
Definition. Two (m, n)-matrices
(
an
••■
ain \ a
mn }
Q-ml
f hi
■'■
\ "ml
'''
. B=[
hn "ran
are equal, if they agree element by element. That is, if ay = fry, for i = 1,2,..., m and j = 1,2,..., n. The equality is denoted by A = B. The sum of A and B, denoted by A + B, is an (TO, n)-matrix whose elements are the sums of the corresponding elements of A and B, i.e.,
(
an+6n O-ml + &ml
■ • •
It should be noted that the sum of matrices A and B can be defined only when A and B have the same number of rows and the same number of columns. That is, when A and B are of the same size. If A is an (TO, n)-matrix and A; is a scalar number, then the scalar multi plication of A by k, denoted by kA or Ak, is an (m, n)-matrix whose every element is obtained by multiplying each element of A by k, namely,
(
ka\\
■■■
KOml
' '
ka\n KQ>mn
That is to say, suppose that A = (aij), B = (by) are two matrices of same size, then A + B = (ay + bij), kA = Ak — (/cay), For example, 3
-2\
(I
2\
-\)
1
OJ ' \0
3
- 2 \ _ /3
1
0J~Vl
-2\
9
(A
0
\\
-1
_/6
Q)l~\2
-4
0
Let E^ be a matrix of order n with 1 in the i-th row, j - t h column and zeros elsewhere. According to the rules of matrix addition and scalar multiplication, we have
(
1
0\
„
/an
•••
0
1/
i=1
\ a„i
•■•
ain\
ann I
n
n
i=i j=i
94
Linear Algebra
In the above chapter we introduced the concept of matrix to satisfy the need of solving a system of linear equations. However it has much wider applications. Even in everyday life we often make use of it. For example, we can compile the following table showing the amounts of four kinds of goods Bi,B2,Bs, and B4 which are to be shipped to three areas A\,A2, and A3. Table showing amounts of four kinds of goods needed in three areas. ^ ^
goods
tonnages\. places
-Bi
B2
B3
B4
J^^ Ai
1
A2 A3
2
2 3
5
4
1
4 1
5
0 2
For simplicity we can write it as a (3, 4)-matrix / l 2 5 4\ 2 3 1 0 ) . \4 1 5 2/ From the meaning of such matrices, we readily see that their equality, addition, and scalar multiplication satisfy the above rules. Furthermore we see that the definitions of equality, addition, and scalar multiplication have real meanings which conform to scientific abstraction. Historically the concept of matrix was first introduced by J. J. Sylvester (1814-1897) in 1850. The rules of matrix operations were later established by Cauchy (1921-1895). From then on the theory of matrices gradually be came an important branch of mathematics. The following properties of matrix operations readily follow from the definitions: A + B = B + A, k{A + B) = kA + kB,
(A + B) + C = A + (B + C), k(lA) = {kl)A, (k + l)A = kA + lA.
Therefore the sum of A, B, and C can be written as A + B + C, without the need of putting the matrices in parentheses. Scalar multiplication of IA by a scalar k can be written as klA, again without the need to put the matrices in parentheses. Addition and multiplication of numbers, addition and scalar multiplication of vectors have similar properties. In particular, the matrix whose elements are all zero is the zero matrix, denoted by O. Similar to zero
Matrix
95
Operations
number and zero vector, the sum of a zero matrix and any matrix A of the same size is still the matrix A itself, i.e., A + 0 = A. A matrix obtained from the matrix A by reversing the sign of its every element is called the negative matrix of A, denoted by — A, that is,
(
-an O-ml
•••
-air,
' ' '
O'mi
i.e., —A = (—aij). The sum of matrix A and its negative matrix — A equals the zero matrix, i.e., -A + A = A+(-A) = 0. Moreover The matrix A + (-B)
-A = (-1)A, -(-A) = A. can be written as A — B, i.e.,
(
a n -fen O-rnl
■••
"ml
ain - hn 0,mn
~ Dron ,
The matrix A — B = (a^ — bij) is called the difference between A and B. We have A — B + B = A. Hence matrix subtraction is the inverse operation of matrix addition. For example,
G -°K D-(i -J) Note that the sum of two matrices of order n and difference between them differ from those of two determinants of order n, scalar multiplication of A by a scalar k differs also from that of a determinant of order n by the scalar k. The former are operations on matrices, while the latter are operations on numbers. Let A be a matrix of order n. Clearly, by definition we have \kA\ =
kn\A\.
Example 1. Assume that the ranks of matrices A and B of the same size are r, s respectively and the rank of A + B is t, prove the t ^ r + s. Proof. By using a method similar to that in Example 2 of Sec. 2.1, t ^ r+s can be proved. But the following proof is given by using row vectors, which method appears simpler.
96
Linear Algebra
Let oti,...,arber linearly independent row vectors of A — (ay), / 3 i , . . . , (3S be s linearly independent row vectors of B = (by). Since any row vector (an + bn, ...,ain
+ bin) = {an,...,
is a linear combination of a i , a r and f3\,..., greater than r + s, i.e., t ^ r + s. The proof is complete.
ain) + (bn,...,
bin)
(33, the rank of A + B is not
Example 2 of Sec. 2.1 can also be proved by the above method. If we quote the result of this example, the proof of the above Example 1 can be simplified. Since elementary operations do not alter the rank of a matrix, we have rank of (A + B) < rank of (A + B,B)=
rank of (A, B)
^ rank of A + rank of B. rank of (A ± B) ^ rank of A + rank of B . Again since A = A — B + B, we have rank of (A — B) > rank of A — rank of B. Next we shall specify the product of two matrices. We know that the elements of the sum or difference of two matrices equal the sum or difference of the corresponding elements of the two matrices. However the elements of the product AB of two matrices A and B are not equal to product of the corresponding elements of the two matrices. An element at the i-th row and j - t h column of AB is the sum of the products of the corresponding elements in the i-th row of A and the j - t h column of B. Historically the definition of AB was first introduced by A. Cayley arising from the necessity of defining the product of linear transformations. This definition of the product of matrices was not subjectively conjectured, but was well thought out taking the reality as starting point. So it does reflect objective reality, as has been shown by scientific practice. The above definition of the product is indispensable for treating the operations of linear transformations in Chapter 7. The rule of matrix multiplication appears to be much more complicated than those of matrix addition and scalar multiplication. It is difficult to
Matrix Operations
97
state clearly. Rather, we shall illustrate the definition using mathematical expressions in detail, starting with some simple examples:
(
a\ : | = (6iai + • • • + bnan),
/aiN\
(a\bi
\{h-
:
• • & » )
=
aibn \
■ ■ ■
\a„>1
anbn j
\anbi /an
O-ln \
•
(&1 • • • frm)
/ h a m i 6 m , . . . , ai n 6i + • • • +
"ml
(anfei + on
■■■
ain \ / " J \
flml
■••
O-rnn ''mn /I
"")(
\h u„
x/ 1 - 21 -;.;; ;)V °i?-;3
0"mn W
'''
/ an&i I
H
\0,mlbi
h
H
h
amnbm),
ai n 6„ ttmnin
-si
In general the definition of the product of two matrices can be expressed as a formula: Definition. The product of two matrices an >
■■■ an \
I
, aml
■••
/fen
■■■
\ 6a
•••
B=
ami I
is
/ Cn
■
Ci„ \
AS \cmi
•
cmn /
where Cjj = cbnbij H
h ai(6jj = 22 aubtj , t=i
Linear Algebra
98 i.e.,
nj i (an ■■■ an) bij
/ V Note that we can say two matrices are equal or not equal only if they are of the same size. Similarly, two matrices can be added or substracted if they are of the same size. Their sum or difference is also of the same size. This is not surprising because vectors of different dimensions cannot be compared, added, or subtracted. In general in multiplication two matrices need not necessarily have the same size. Two matrices can be multiplied if the number of columns of the first matrix is the same as the number of rows of the second matrix. The relation between the numbers of rows and the numbers of columns of A, B, and C = AB is (TO,
I) • (l,n) = (TO, n).
We illustrate the above situation pictorially as follows:
I m\
n I
1
a.
Example 2. 77,
IT.
EirEkl
/
=
EH,
^
while j = k, while j ^ k.
This can be seen from the rule of the matrix multiplication. Example 3. Given the matrix of order n 0
T
1
0
A = Prove that ifTOis an odd positive integer, then A™ = A; ifTOis an even positive integer, then Am = E, where Am denotes the result of multiplying A with itselfTOtimes (see Sec. 3.3). Proof. It suffices to prove EA = AE = A, A2 = E, which are evident from the rule of matrix multiplication.
99
Matrix Operations
Example 4. Let a matrix of order n be /0
1 0
A =
1 0/
V n
n 1
Then A = 0,A ~ For example,
^O.
ri ( -ir 2
1
0 1 0 1 0
\
3
/O
i
V
\
0 0
/0 0 1
\
0
0 1 0 0 0/ \ 0 1\ /0 1 0 0 j 0 1 0 0 0 '
o/
V
\
o.
1 0/
Thus when n = 4, the statement of the example is true. It is also true for the general case. A matrix A of order n is called a nilpotent matrix, if Am is equal to the zero matrix for some positive integer m. Thus the above matrix A is a nilpotent matrix. We know that addition, subtraction, and scalar multiplication of matrices differ entirely from those of determinants, but the product of two determinants can be computed as in the matrix multiplication. First let us consider the following particular example. By Laplace's theo rem, we have
an a2i
ai2
bn
&12
a22
&21
622
an
aJ2
a,2i
a-22
-1 0
O
0
bu
h2
-1
&21
&22
For the determinant of order of 4 on the right-hand side, add a 6n-multiple of the first column and a ^-multiple of the second column to the third column; a 612-multiple of the first column and a 622-multiple of the second column to the fourth column; then there is a minor of order 2 at the lower right corner of the determinant of order 4 whose elements are all zero. Hence by using Laplace's theorem once more, we have
100
Linear Algebra
an a2i
ai2
fell
fel2
<*22
&21
&22
an =
Ol2 «22
«2i
-1 0
= (-1)
O n b i 2 + Ol2fe22 G2lfel2 + &22fe22
Ollfell +Ol2fe21 a2lfell + &22fe21
0 -1
1+2+3+4
0 -1
anfeii + ai2&2i
anfei2 + ai2fe22
o
fl2lfell + A22fe21
l2lfel2 + &22fe22
anfen+ai 2 fe2i anfei2+ ai2fe22 fl2lfell + 022&21 0,2lbi2 + a22&22 The same is true for the general case. As long as we use Laplace's theorem, we can reduce the product of two determinants of order n to one determinant of order 2n: an G'nl
' ' *
a\n
fei
■■■
Q"nn
Vnl
' ' '
an
...
a\n
dn\
.. .
Q"nn
hn
Unit
an
o = -1 *
bn
...
ani bin = - 1
"nl
...
Onfi
•••
ain
...
ann
Cll
•
Cnl
•
•
Cln
Cnn
o
-1
-1
O
Cll
• ■•
O
-1
Cnl
'
cln
__ / _ 1 ) l + - + n + ( n + l ) + - - - + ( n + n )
cn /
-i\2nn+n/
•••
ci„
-i \n
• • ■ cin
=
Cnl
where Cy = )j£ auhj-
cn
•■•
cnn
cnn )
Cnl
^7171
That is to say, just as for the product of two matri-
t=\
ces, the product of two determinants can be computed. The rule is that the elements of a row of the determinant on the left are multiplied by the corre sponding elements of a column of the right determinant, and their products are then added. As a determinant and its transposed determinant are equal, some times, for convenience, in computing the product of two determinants, rows
Matrix
101
Operations
and columns can be interchanged. However we cannot compute the products of two matrices in this way. Moreover, note that the above-mentioned is the rule of multiplication of two determinants of the same order. If the orders of two determinants are different, we can still use the rule; it suffices to reduce their orders to the same order. We know from Sec. 1.3 that we can increase or decrease the orders of determinants. For example, we can rewrite a determinant of order 2 an
a 12
«21
Q-22
as a determinant of order 3 or 4: 1 0 0 an 0 a2i
0 ai2
an
ai2
0-21
0-22
i
O
a22
o 1 0 0 1
On the other hand, the order of matrices cannot be increased or decreased. This is one important difference between matrices and determinants. Thus the determinant \AB\ of the product of two matrices A and B of order n equals the product of determinants \A\ and \B\: \AB\ = \A\ - \B\. Hence the product of nonsingular matrices is still a nonsingular matrix. The preceding process of computing is essentially a process of elementary operations. We briefly write it as follows:
°\
lA -1
V
-1
AB\
(A -1
B)
-1
o)
Besides, we have (AB,A)~(0,A), which are often used.
(?)
1°
AB\
O )
102
Linear Algebra
The rule of the determinant multiplication mentioned above is applied widely. In particular, when it is applied to determinants whose elements are letters, not numbers, and the result may be rather interesting. Some examples below will serve to illustrate this point. E x a m p l e 5. Given a -b D = —c -d
b a -d c
c d a -b
d -c b ) a
compute D2, and whence find D. Solution: 2
D = DD
=
DD' a b c d
a b e d —b a d —c —c—da b —d c —b a
-b a d -c
-c —d a b
-d c -b a
a2 + b2 + c2 + d2 0 0 0 0 a2 + b2 + c2 + d2 0 0 a2+b2 + c2 +d 2 0 0 0 0 0 0 a2 + b2 + c2 + d2
= (a2 + b2 + c2 + d2f . Hence D = ±(a2 + b2 + c2 + d2)2. Since the elements on the main diagonal of D are a, by definition of the deter minant, the sign of this term a 4 is taken as +. Hence D = (a2 + b2 + c2 + d2)2. The above example is Example 3 of Sec. 1.3, but the proof here is much simpler. E x a m p l e 6. Prove that the cyclic determinant
A=
ai
a2
an
ai
a2
a3
■
an ■
■
an-i
ai
= /(ei)---/(en),
Matrix
103
Operations
where /(x) = ai + a2x H
h a n a; n _ 1 , and ei, e 2 , . . . , en
are the total n-th roots of unity, that is, e™ = l(i = 1,2,..., n). Proof. Let e be the n-th primitive root of unity, i.e., e ^ e 1 , . . . , e n _ 1 are the total n-th roots of unity. Construct the Vandermonde determinant of order n 1 1 ••■ 1 -n-1 1 e e" D = I
„n-l
e^-1)2
en-i
Computing the product of two determinants A and D, we have AD aiH axH
+an \-an
aiH e(aiH
hane™ hanen_1)
a\-\
a i + - - - + a n en_1(ai + ---+anen-1)
+ane ( « - i )
2
^ - ^ ( a x + .-.+o^"-1)2)
f{l)f{e)---f{en-l)D. Since D ^ 0, we have A = / ( l ) / ( e ) • ■ • / ( e " - 1 ) = / ( e i ) / ( e 2 ) • • • f(en). The proof is complete. E x a m p l e 7. Let Aij be the cofactor of element a^ in the determinant an
•••
ai„
D
Prove that Au
Aln
Anl
■Ann
= Dn-\
104
Linear
Algebra
Proof. It is easy to see from the equality in Sec. 1.3 and the rule of matrix multiplication that An
■
An!
■
■
Aln
an
O-ln
An
O-nl
®nn
AlT
■
An\
D A
D = Dn.
= D
When D ^ 0, the statement clearly holds. When D = 0, if a n , . . . , an\ are zeros and obviously the example holds. If a n , . . . , a n i are not all zero, using o-nAu + 021^21 +
1- cLn\Ani = 0, i = 1 , . . . , n,
and Property 4 of determinant, elements of some row of the determinant An
•■•
Air,
A-nl ' ' '
A-nr
can be entirely reduced to zeros. Hence this determinant equals zero. Thus the statement still holds. Thus the proof is complete. The basic properties of matrix multiplication are presented below. First we have A(BC) = (AB)C. (1) Suppose that A = (aij) is an (m, Z)-matrix, B = (bij) is an (I, /s)-matrix, C = (cij) is an (fc, n)-matrix. Next let AB = U = (Uij), BC = V = (vij), (AB)C = S= (Sij), A(BC) =T = ( % ) . Then l Uij =
k
y x=l
Q-ixOxj)
Vij
=
/ OiyCyj i y=1
and so k 5
12 = 22 y=\
k U
x=l
k a b
c
1V V2 = z2 { z2 ^ xy) y2 y=l x=l I
I t\2 = 2^
I
C
a Vx2
^
= 22 1=1
k a i a
= ( z J bxyCv2) V=l
l
= 2 J 5 Z a^bxyCy2 ;/=l x=l I
,
k
= z 2 z 2 alxbXyCy2 x = l y=l
■
Matrix
105
Operations
Hence s\2 = t\i- More generally, we have k 2_J y=i
u
c tyiy yj
/ , UixVxj x=l
/
I
N
I 2 J aix"xy
ZJ
f Cyj
— 2_i &ix 2-i ®xyCyj 1=1 2/=l
=
k
I
2-i
/ •
— £_i Z-i x=l y=l
a
ixOx
a
ixO xy^yj
j
and so Sij = tij. Thus S = T, and so (1) holds. That is to say, matrix multiplication satisfies the associative law. Therefore the product of A, B, and C can be written as ABC. We need not put two of the matrices in parentheses. For example,
" <;
-1
: -?'■•* Io 3
1 -2 0
4
\ 1 2/
2
, c=
1
\o
then BC =
/-M 4
V°/ AB =
-1 1
and so It is easy to prove that
i 0
,, M A(BC) = 10 \
0
(AB)C =
A(BC) = (AB)C.
(A + B)C = AC + BC,
A{B + C) = AB + AC.
Thus matrix multiplication satisfies the distributive law with respect to matrix addition. The multiplication of real numbers also have the above properties. However some properties are vastly different between multiplication of matrices and multiplication of real numbers. First, we know that AB is defined only when the number of columns of A is the same as the number of rows of B. Thus if A is an (m, n)-matrix and B is an (n, p)-matrix, then AB is an (m, p)-matrbc. What about BA? Three different situations may occur: 1. BA may not be defined. This will take place if p =£ m. 2. If BA is defined, BA will be an (n, n)-matrix and AB will be an (m, m)matrix. Then if m ^ n, AB and BA will be of different sizes. 3. If BA and AB are of the same size, they may be unequal.
106
Linear Algebra
The above mean that the commutative law does not hold for matrix mul tiplication. In general,
AB^BA. Therefore, when multiplying A by B, the order of the factors cannot be reversed in general, though in some special cases reversal of orders may be allowed, e.g.,
\3 4/\0
V3 7) '
lj
1
) \3
\°
4
V3 V '
J
while (
7
\-4
- 1 2 \ /26
45 \ _ /26
45 \ /
7J\15
26j~\15
26JV-4
7
-12 \ _ (2
3\
7 ^ — ^ 1 2J '
Again 0 0
\o
1 0\ 0 1 0 0/
1 0
Vo
°\ = / I =° 1/
-1 1 0
-l
1
I
°\ /°
-1
0
1/
0
Vo
1 0 0 1 0 0
1 0 0
■6 1) As the order of factors cannot be reversed, when multiplying B by A, we must specify if B is multiplied by A on the left or on the right. Furthermore, there are striking differences between multiplication of matri ces and multiplication of numbers: Even when A and B are nonzero matrices, AB may be the zero matrix, i.e., when AB = 0, we may still have
A^O,
B^O.
For example, in Example 3, Am = O but A^O.
Another example is
ai2
M l)( 1 -l) (o oj' -ij 0 0 0 0 0
1 021
^22
0
Va3i
032
( /an
1
U
0
0
0
/°
&31
&32
&33 ,
Vo
X
\ /
o,/ V
0
0 0
°\ 0
0/
107
Matrix Operations
Besides, if AB = AC and A ^ O, we cannot always deduce B = C, i.e., if A(B — C) = O and A ^ 0, we cannot always have B = C. That is to say, matrix multiplication does not always satisfy the cancellation law. We know that the product of two nonsingular matrices is still a nonsingular matrix. The following is an important relation between the ranks of the product of general matrices. Theorem. The rank of the product of two matrices A and B is not greater than the rank of A and the rank of B, that is rank of (AB) < min(rank of A, rank of B). Proof. Since elementary operations do not alter the rank of matrices, we easily see rank of (AB) < rank of (AB, A) = rank of (O, A) = rank of A , rank of (AB) ^ rank of I _ I = rank of I
R
1 = rank of B .
This proves the theorem. Hence the rank of A2 is not greater than the rank of A, i.e., rank of A2 ^ rank of A.
E x a m p l e 8. Let A = (a;j) be an (m,n)-matrix, B = (bij) be an (n,k)matrix. If AB = O, prove that rank of A + rank of B ^ n . Proof. Since AB = O, any column vector (bu,..., vector of the system of linear equation: onxi +
h ainxn
^7711*^1 T " *" "T Qmn-En
bni) of B is a solution
= 0, ==
" .
But the number of linearly independent solution vectors is not greater than n — r, where r is the rank of A. Let the rank of B be s, then n — r ^ s, or r + s < n. Hence the statement holds.
108
Linear Algebra
Example 9. Let A and B be two matrices of order n, prove that rank of A + rank of B - n < rank of (AB) < min (rank of A, rank of B). Proof. The conclusion on the right side, i.e., Theorem above has been proved. The conclusion on left side is proved as below. It is easy to see from elementary operations that (A rank of A +Tank of B = rank of
\
^ rank of -1
Bj
AB\ rank of \
-1
O )
= rank of (AB) + n, i.e., rank of A + rank of B — n ^ rank of (AB). The proof is complete. We can reach the conclusion of Example 8 immediately from the conclusion of the above example. Example 10. Let A = (a,ij) be a matrix of order n. The sum of the n elements on its main diagonal is called the trace of A, denoted by trA, n
i.e., trA = £} an. Prove that i=i
tr(AB) = tr(BA), i.e., AB — BA is a matrix whose trace is zero. Proof. Let A = (dij), B = (bij), AB = (Uij),BA
= (u 4 i ).
Since n Uij = y aixOxj, x=l
n Vij = y DiyClyj , V=l
Matrix Operations
109
hence n
n
tr(AB) = ^2
u = 5 3 ( ^2
i=l n
i=l
= x5Z= l (5Z \i=l
a
ixbxi I
\x=l \
f n
\
/ n
u
6xiaix = /
/ n
I £=1 13 ^ = tr(Bj4) ■
So the example holds. It is clear that tr(A ± B) = tr(A) ± tr(B). Using matrices we can simplify a lot of problems. For example, the complex number a + bi can sometimes be expressed in terms of matrices:
A
b
a)=aE
={-b
+ bJ
'
where 1
E
0\
=\o
,
f
0
i M = -i
1\
T2
o»'J--£-
Again for instance, the system of linear equations fliiii
H
+ a-in^n = bi,
can be abbreviated to form of the matrix equation AX = B, where /flu
■•■
a
l n
/ * i \
\
H
• *= \ Ami
'" '
a
mn /
/ h B
■■ )> = \Xn
/
Thus the problem of solving the system of linear equations reduces to that of finding the matrix X. The method of finding X is by matrix division, which will be discussed in Sec. 3. Similar to Theorem 2 in Sec. 2.2, we have E x a m p l e 1 1 . Assume that the systems of linear equations AX = M and BX = N have solutions, then they have exactly the same solutions if and only
110
Linear Algebra
if any equation in AX = M is a linear combination of equations in BX = N, and any equation in BX = N is a linear combination of equations in AX = M. Proof. It is clear that the sufficient condition is true. We shall prove that the necessary condition is also true. Let Y be a common solution of AX = M and BX = N, i.e., AY = M, BY = N. Then A(X -Y) = 0 and B{X -Y) = 0 have exactly the same solutions. Hence row vectors of A are a linear combina tion of row vectors of B. Substituting any solution of AY = M into BY = N, we obtain BY = N'. Hence N' = N. That is, BY = N. Therefore any equation in BX = N is a linear combination of equations in AX = M. Similarly, any equation of AX = M is a linear combination of equations in BX = N. Hence the necessary condition holds. The proof is complete. In the preceding discussion we introduced matrix addition, matrix subtrac tion, scalar multiplication, matrix multiplication, and their basic properties. Among the four operations matrix multiplication is the most complex, hence much more comments will be made on it. Up to now, we have dealt with and solved the first problem raised in the introduction. Exercises 1. Compute
«»(; s o-( 'an
(2) ( i
y z) \ an
h /cos a y sin a
6 0
10 20 9 3
ai2
bi \
IX
a22 b2 I I y
b2
c J \ zt
-sinaV cos a J
( S i IV. \o o \)
2. Using the equalities 17 35
2 0 \ (-1 -6 \ / 2 3 -12 j ~ \ 5 7 0 3) \ 5 -7 3 \ (2 3\ 5 -2)
I5 7i ■ ( ! ! ) ■
3\ -2)
Matrix
111
Operations
compute 17 - 6 \ 5 35 - 1 2 / ' 3. Suppose A and B are two matrices of order n, prove that \A B
B A
\A +
B\-\A-B\.
4. Let A be an (m, n)-matrix, B an (n, m)-matrix. Prove that when m > n,C = AB is a singular matrix. 5. Let A be a matrix of order n such that A2 = E. Prove that the sum of the ranks of A + E and A — E equals n, i.e., rank of (A + E) + rank of (A - E) = n . 6. Let A be an idempotent matrix of order n, i.e., A2 = A. Prove that rank of A + rank of (A - E) = n. 7. Prove that n si
si s2
«n-l
^71—1
S71
S2n-2
LI
(Xi -
Xj)'
where
xi + 4 + •■• + < ,
i = l,2,--
8. Suppose an
0-\n
D
a-ii
, M = O-nl
'''
O-kj
Qnn
O-kl
Prove that Aij Akj
An Akl
= DN
where N is the algebraic cofactor of M in D, A^ is the algebraic cofactor of aij in D. 9. Prove that a skew-symmetric determinant of even order is a perfect square of the function of its elements. 10. Let A i , . . . , Am be matrices of order n, and A\ ■ ■ ■ Am = O. Prove that the sum of the ranks of the m matrices is not greater than (m — l)n, i.e., rank of A +
h rank of Am < (m — l)n.
11. Find a real number a and two matrices A and B of order 2 such that
112
Linear Algebra
GO
A + aB = ( :; "d ) , A + B = E, AB = O and prove that
A + anB = (A + aB)n , where n is any positive integer. 3.2. Diagonal, Symmetric, and Orthogonal Matrices In this section we shall solve the second problem raised in the introduction, that is, we shall introduce some special matrices which are used often. Let us begin with the simplest case. A matrix of order n with ones on the main diagonal and zeros elsewhere:
is multiplied by any matrix A of order n on its right side or on its left side, and their product is still A itself, i.e., EA = AE = A. That is to say, E plays the same role in multiplication of matrices as the number 1 does in the multiplication of real numbers. Hence E is called the identity matrix of order n. Obviously the identity matrix of order n is unique in the sense that if E\ is also an identity matrix of order n, then EE\ = E = E\. It should be noted that when a matrix is multiplied by individual matrix on its right side or its left side and their product is still equal to the individual matrix itself, the original matrix is not necessarily the identity matrix. For example, even though /3 \2
1\( 1 - 1 \ 2 ) \ -2 2)
/ 1 - l \ / 3 1-2 2J \2 1
1 2
-1
1 \ I is not an identity matrix of order 2. Example 1. Let A be a matrix of order n. If for any matrix B of order n, we have AB = B (or BA = B), then A is the identity matrix, i.e., A = E.
Matrix
113
Operations
Proof. Taking B = E, we have AE = E; but E is the identity matrix, and so AE = A. Hence A = E. The proof is complete. If A is not a square matrix, the case is quite different. For example, 1 0
0 \ fa 1 J \ a\
/
b c by c\
a
b
\ai
h
(a
b
\a,i
bi
\ /l
°) Cl)
0
U
°\
0 1 0 0 l)
C
)
C1 J
Thus for a general matrix, there isn't any so-called identity matrix. When a matrix of order n with fc's on main diagonal and O's elsewhere: '*
X ■.
= kE = Ek k)
is multiplied by any matrix A of order n, their product is obviously kE ■ A = A ■ kE = kA. That is to say, the product of matrix kE and any matrix A is the same as that of a number k and the matrix A. Therefore kE is called a scalar matrix. In particular, when k = l,kE becomes the identity matrix. Clearly the multiplication of a scalar matrix and any matrix of the same order is commutative. Its inverse is also true. Let us see the following example. Example 2. Let A = (a»j) be a matrix of order 3. If A and any nonsingular matrix of order 3 commute, prove that A is a scalar matrix. Proof. From 2 0 0 1 0 0
°\ 1a121n
ai2 a i 3 \ / a n «i2 013 0,22 023 I — I ^21 «22 «23
i / V ^31
^32
0
0-33 )
\ ^31
0-32
\ I20 ^33/ \o
we have 2an «21 G31
2ai2 2 a i 3 \ / 2 a n au 013 0,22 023 1 = 1 2a21 G22 023 0,32 a33 J \ 2 a 3 i a 32 a 33
hence 012 = ai3 = 0,021 = 031 = 0.
0 0 1 0 0 1
114
Linear Algebra
Again from
>i
1 2
i;
M
i
n
\
2
V
J
we obtain 023 = 032 = 0. Prom 0 1 1
°J
a22 (
/an I = I fl33 / \
\
\
\ /an
a22
A
0 1 1 0 1
°33
we obtain a n = 022- Similarly, we have a n = 033. Therefore a n = 022 = 033, i.e., A is a scalar matrix. If we take some special matrix as above once more, can we continue to simplify Al Obviously it is impossible, for a scalar matrix commutes with any matrix. If we repeat this process, as above, then the equality obtained is only an identical relation and we cannot infer any new property. Therefore A is a scalar matrix and the statement holds. This property can be generalized to the general case. Thus if a matrix of order n commutes with any nonsingular matrix of order n, then this matrix must be a scalar matrix. This is Exercise 9 in this section. A matrix of order n whose elements are zeros except for elements on the main diagonal: '01
(1) is called a diagonal matrix. When a\ = ■ ■ ■ = a„, it is a scalar matrix. It is easy to see from the definition that I/ a i
/fcai
\
H
| =
<
an/
(
"
• kan/
/ a i + 6i
/Oi
&J1 =
+ On/
tax
1
\
("
v J{
( \
O-n.
an + bn/
/ai&i
..
)1 = b„) 1
) .
I "■ a bn/ n
115
Matrix Operations
That is to say, the product of a scalar k and a diagonal matrix is still a diagonal matrix. Its elements on main diagonal are fc-times multiples of the corresponding elements on the main diagonal of the original diagonal matrix. Again the sum and product of two diagonal matrices are also diagonal matrices. The elements on the main diagonal are the sums of the corresponding elements on the main diagonals of the two diagonal matrices in the former, and are the products of the corresponding elements in the latter. It is clear that the factors of the product of two diagonal matrices can commute. Thus operations on diagonal matrices are much simpler than for general matrices. Example 3. Suppose A and P are two matrices of order n. The n column vectors of P are written as P i , . . . , Pn and let P = (Pi,. - -, Pn)- Then 'ai
AP = (AP1--APn),
P'
| = (axPi ■' ' Q>n*n)
•
an/ We can examine them from the rule of matrix multiplication. The results will be used in Chapter 5. For example, /i
- 1 2 1 0 - 1 2
0
Vi
1\ -
2
3; 1
then
l
<«**>-((-;.; :)(;)(u \M 3 \I )
"2
11)-". /ai
ai
( -)-(i i)(* / = Uil Ui
V
r\0 U/ 0
/ ai a2
a.2
M\
-2 -2 . \ 3
o \Ol
02
\
-2a2 3a2/
= (oiPi a 2 P 2 )
116
Linear Algebra
Like determinants, a matrix of order n in the form /All
0-12
••
a.22
■■■
Oln\ a2r
J
\
is called a triangular matrix. We see readily from the definition that the sum and product of two triangular matrices of order n are again triangular matrices. Generally square matrices whose elements above the main or minor diagonal, or below the main or minor diagonal are all zeros are called triangular matrices. Although some matrices are not diagonal, they can be partitioned into block diagonal matrices: 'A1 (2)
where the orders of matrices Ai,..., For example, 2 0 0 3 0 0 /2 1 0 Vo
0 2 0 0
°\ = / 1
3/
V
0 0' 0 0 3 0 1 3,
2 0 0
0 3 0
/ 2 1 0
Vo
Am-i,Am
0 ' 1 3 0 2 0 0
0 0 3 1
may be the same or different.
\ 3 0
\ / 0 \ 0 = 0 3 )
1 3
2 0 1 2
V
are block diagonal matrices. It is easy to see that rank
<
A
* >
rank of A + rank of B.
If matrices A and B of order n are block diagonal matrices 'Ax A =
\
/Bi B =
3 0 1 3
117
Matrix Operations
where $A_i$ and $B_i$, $i=1,2,\ldots,m$, are matrices of the same order. It is not difficult to show from the definitions that
$$kA=\begin{pmatrix}kA_1&&\\&\ddots&\\&&kA_m\end{pmatrix},\quad A+B=\begin{pmatrix}A_1+B_1&&\\&\ddots&\\&&A_m+B_m\end{pmatrix},\quad AB=\begin{pmatrix}A_1B_1&&\\&\ddots&\\&&A_mB_m\end{pmatrix}.$$
That is to say, the sum, scalar multiple and product of block diagonal matrices can be computed just as for diagonal matrices. For example,
$$\begin{pmatrix}2&0&0\\0&3&1\\0&0&3\end{pmatrix}+\begin{pmatrix}1&0&0\\0&2&0\\0&3&1\end{pmatrix}=\begin{pmatrix}3&0&0\\0&5&1\\0&3&4\end{pmatrix},\qquad \begin{pmatrix}2&1&0&0\\0&2&0&0\\0&0&3&0\\0&0&1&3\end{pmatrix}^{2}=\begin{pmatrix}4&4&0&0\\0&4&0&0\\0&0&9&0\\0&0&6&9\end{pmatrix},$$
the blocks being added and multiplied separately.
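A small sketch of the same fact in NumPy (the blocks below are chosen arbitrarily, and the helper `block_diag` is ours, not from the text): sums and products of block diagonal matrices act block by block.

```python
import numpy as np

def block_diag(*blocks):
    """Assemble square blocks into one block diagonal matrix."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    i = 0
    for b in blocks:
        k = b.shape[0]
        out[i:i + k, i:i + k] = b
        i += k
    return out

A1, A2 = np.array([[2., 1.], [0., 2.]]), np.array([[3., 0.], [1., 3.]])
B1, B2 = np.array([[1., 0.], [2., 1.]]), np.array([[0., 1.], [1., 0.]])
A, B = block_diag(A1, A2), block_diag(B1, B2)

assert np.allclose(A + B, block_diag(A1 + B1, A2 + B2))
assert np.allclose(A @ B, block_diag(A1 @ B1, A2 @ B2))
```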
For brevity the diagonal matrix (1) and the block diagonal matrix (2) are sometimes written as $\operatorname{diag}(a_1,\ldots,a_n)$ and $\operatorname{diag}(A_1,\ldots,A_m)$ respectively.
In general, block matrices need not be block diagonal. For example,
$$A=\left(\begin{array}{cc|c}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\ \hline a_{31}&a_{32}&a_{33}\\a_{41}&a_{42}&a_{43}\\a_{51}&a_{52}&a_{53}\end{array}\right)=\begin{pmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{pmatrix},\qquad B=\left(\begin{array}{cc|cc|cc}b_{11}&b_{12}&b_{13}&b_{14}&b_{15}&b_{16}\\b_{21}&b_{22}&b_{23}&b_{24}&b_{25}&b_{26}\\ \hline b_{31}&b_{32}&b_{33}&b_{34}&b_{35}&b_{36}\end{array}\right)=\begin{pmatrix}B_{11}&B_{12}&B_{13}\\B_{21}&B_{22}&B_{23}\end{pmatrix}.$$
It is clear that
$$AB=\begin{pmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{pmatrix}\begin{pmatrix}B_{11}&B_{12}&B_{13}\\B_{21}&B_{22}&B_{23}\end{pmatrix}=\begin{pmatrix}C_{11}&C_{12}&C_{13}\\C_{21}&C_{22}&C_{23}\end{pmatrix},$$
where
$$C_{ij}=A_{i1}B_{1j}+A_{i2}B_{2j},\qquad i=1,2;\ j=1,2,3.$$
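The blockwise rule can be verified numerically; the sketch below uses random matrices and cut points of our own choosing (they are not the entries of the text), multiplying and adding blocks as if they were scalars and comparing with the ordinary product.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(5, 3)).astype(float)
B = rng.integers(-3, 4, size=(3, 6)).astype(float)

# Row/column cut points: A split 2+3 rows and 2+1 columns,
# B split 2+1 rows and 2+2+2 columns (column split of A = row split of B).
r, c = [0, 2, 5], [0, 2, 3]
rb, cb = [0, 2, 3], [0, 2, 4, 6]

C = np.zeros((5, 6))
for i in range(2):
    for j in range(3):
        for k in range(2):
            Aik = A[r[i]:r[i + 1], c[k]:c[k + 1]]
            Bkj = B[rb[k]:rb[k + 1], cb[j]:cb[j + 1]]
            C[r[i]:r[i + 1], cb[j]:cb[j + 1]] += Aik @ Bkj

assert np.allclose(C, A @ B)   # blockwise product equals the usual product
```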
By partitioning matrices into blocks, we can sometimes compute the product of matrices more easily. As mentioned above, when the partitions are drawn appropriately, the multiplication of partitioned matrices can be performed in a manner analogous to the usual matrix multiplication: one multiplies and adds blocks as if they were scalars.
Now we shall present some other important special matrices. Interchanging the rows and columns of the matrix
$$A=\begin{pmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&&\vdots\\a_{m1}&a_{m2}&\cdots&a_{mn}\end{pmatrix},$$
we obtain
$$A'=\begin{pmatrix}a_{11}&a_{21}&\cdots&a_{m1}\\a_{12}&a_{22}&\cdots&a_{m2}\\ \vdots&\vdots&&\vdots\\a_{1n}&a_{2n}&\cdots&a_{mn}\end{pmatrix}.$$
As in determinants, $A'$ is called the transposed matrix of $A$. For example, the transposed matrices of
$$\begin{pmatrix}-1&2&0\\1&-1&2\end{pmatrix}\qquad\text{and}\qquad (b_1\ \cdots\ b_n)$$
are respectively
$$\begin{pmatrix}-1&1\\2&-1\\0&2\end{pmatrix}\qquad\text{and}\qquad \begin{pmatrix}b_1\\ \vdots\\ b_n\end{pmatrix}.$$
If $A$ is a block diagonal matrix $A=\operatorname{diag}(A_1,\ldots,A_m)$, then
$$A'=\begin{pmatrix}A_1'&&\\&\ddots&\\&&A_m'\end{pmatrix}.$$
Since the determinant of a matrix is equal to the determinant of its transposed matrix, the rank of a matrix is equal to that of its transposed matrix. From the rules of matrix addition and scalar multiplication it is easy to prove that
$$(A\pm B)'=A'\pm B',\qquad (kA)'=kA'.$$
For the product of matrices we have:
Theorem. The transposed matrix $(AB)'$ of the product $AB$ of matrices $A$ and $B$ equals the product $B'A'$ of the transposed matrix $B'$ of $B$ and the transposed matrix $A'$ of $A$, i.e., $(AB)'=B'A'$.
Proof. Let $A=(a_{ij})$, $B=(b_{ij})$, and $AB=C=(c_{ij})$. Then
$$A'=(a'_{ij}),\qquad B'=(b'_{ij}),\qquad C'=(c'_{ij}).$$
Since $a'_{ij}=a_{ji}$, $b'_{ij}=b_{ji}$, $c'_{ij}=c_{ji}$, we have
$$c'_{12}=c_{21}=\sum_k a_{2k}b_{k1}=\sum_k b'_{1k}a'_{k2}.$$
In general, we have
$$c'_{ij}=c_{ji}=\sum_k a_{jk}b_{ki}=\sum_k b'_{ik}a'_{kj}.$$
That is to say, $c'_{ij}$ is the element in the $i$-th row and $j$-th column of $B'A'$, thus $(AB)'=B'A'$. This proves the theorem.
In general it is easy to show by induction that
$$(A_1A_2\cdots A_n)'=A_n'\cdots A_2'A_1'.$$
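A quick numerical check of the theorem (an illustrative sketch with randomly chosen matrices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
C = rng.standard_normal((2, 5))

# (AB)' = B'A', and the factors reverse for longer products as well.
assert np.allclose((A @ B).T, B.T @ A.T)
assert np.allclose((A @ B @ C).T, C.T @ B.T @ A.T)
```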
Example 4. Let $A$ be a real square matrix. Prove that
$$\operatorname{rank}A=\operatorname{rank}(A'A)=\operatorname{rank}(AA').$$
Proof. It suffices to prove that $AX=0$ and $A'AX=0$ have exactly the same solutions. If this is so, then since a system of fundamental solutions of $AX=0$ contains $n-r$ solution vectors, while a system of fundamental solutions of $A'AX=0$ contains $n-s$ solution vectors, we get $r=s$, where $r$ is the rank of $A$ and $s$ is the rank of $A'A$. Similarly we have $\operatorname{rank}A=\operatorname{rank}(AA')$. Then since $\operatorname{rank}A=\operatorname{rank}A'$, the statement in the example holds.
We next prove that $AX=0$ and $A'AX=0$ have exactly the same solutions. From $AX=0$ we have $A'AX=0$, so all the solutions of $AX=0$ are solutions of $A'AX=0$. Conversely, let $AX=Y$. From $A'AX=0$ we have $Y'Y=X'A'AX=0$. Since $A$ is a real matrix, $Y$ is a real vector; hence $Y=0$, i.e., $AX=0$. That is to say, all the solutions of $A'AX=0$ are also solutions of $AX=0$. Hence $AX=0$ and $A'AX=0$ have exactly the same solutions. The proof is complete.
Note that if $A$ is not a real matrix, then the above conclusion may not be true, that is, the rank of $A$ is not necessarily equal to the rank of $A'A$ or of $AA'$. For example, for
$$A=\begin{pmatrix}0&i&1\\-i&0&0\\1&0&0\end{pmatrix},\qquad A'A=\begin{pmatrix}0&0&0\\0&-1&i\\0&i&1\end{pmatrix},$$
their ranks are not equal. In general, since $A$ and $\bar A$ have the same rank, we likewise have
$$\operatorname{rank}A=\operatorname{rank}(\bar A'A)=\operatorname{rank}(A\bar A'),$$
where $\bar A=(\bar a_{ij})$ and $\bar a_{ij}$ is the complex conjugate of $a_{ij}$; for example, if $a=1-i$ then $\bar a=1+i$.
Like determinants, we have symmetric and skew-symmetric matrices. A square matrix $A=(a_{ij})$ is called a symmetric matrix if $A'=A$, or $a_{ij}=a_{ji}$. A square matrix $A=(a_{ij})$ is called a skew-symmetric matrix if $A'=-A$, or $a_{ij}=-a_{ji}$. Note that the elements on the diagonal of a skew-symmetric matrix are all zeros: from $A=-A'$ we obtain $a_{ii}=-a_{ii}$, hence $2a_{ii}=0$, i.e., $a_{ii}=0$. For example,
$$\begin{pmatrix}1&2&-1\\2&0&3\\-1&3&1\end{pmatrix}\qquad\text{and}\qquad \begin{pmatrix}0&1&-2\\-1&0&3\\2&-3&0\end{pmatrix}$$
are symmetric and skew-symmetric matrices respectively. We know from Example 9 in Sec. 1.2 that a skew-symmetric matrix of odd order is a singular matrix.
Example 5. Show that any square matrix can be written uniquely as the sum of a symmetric matrix and a skew-symmetric matrix.
Proof. Let a square matrix $A$ be written as $A=B+C$, where $B$ is symmetric and $C$ is skew-symmetric. We have to determine $B$ and $C$. Since
$$A'=(B+C)'=B'+C'=B-C\qquad\text{and}\qquad A=B+C,$$
adding these two expressions we obtain $A+A'=2B$, or $B=(A+A')/2$. Subtracting instead of adding leads to $C=(A-A')/2$. This proves both the existence and the uniqueness of $B$ and $C$. Hence the proof is complete.
For any matrix $A$, obviously $A'A$ and $AA'$ are symmetric matrices. If $A$ is a square matrix, then $A+A'$ is a symmetric matrix and $A-A'$ is skew-symmetric. If $A$ and $B$ are symmetric, obviously $A+B$ is symmetric. However, $AB$ is not necessarily symmetric.
Example 6. If $A$ and $B$ are symmetric matrices of order $n$, prove that $AB$ is a symmetric matrix if and only if $AB=BA$.
Proof. If $AB$ is symmetric, then $(AB)'=AB$. But as $(AB)'=B'A'=BA$, we have $AB=BA$. Conversely, if $AB=BA$, then since $(AB)'=B'A'=BA$, we get $(AB)'=AB$. Hence $AB$ is a symmetric matrix. The proof is complete.
A real square matrix $A$ is called an orthogonal matrix if it satisfies $AA'=A'A=E$. For example,
$$\begin{pmatrix}1&0\\0&-1\end{pmatrix},\qquad \begin{pmatrix}0&-1\\-1&0\end{pmatrix},\qquad \begin{pmatrix}\cos\theta&\sin\theta\\ \sin\theta&-\cos\theta\end{pmatrix}$$
are orthogonal matrices. It is easy to see from the definition that if A is orthogonal, then \A\ = ± 1 and A' is also orthogonal. Again, if B is also orthogonal, then since (AB)'(AB) = B'A'AB = E, AB is also orthogonal. That is to say, an orthogonal matrix is a nonsingular matrix, the transposed matrix of an orthogonal matrix is also an orthogonal matrix. The product of orthogonal matrices is also an orthogonal matrix.
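As a quick numerical illustration (a sketch with an arbitrarily chosen matrix, not one from the text), the following checks the symmetric/skew-symmetric decomposition of Example 5 and the defining property of an orthogonal matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))

# Example 5: A = B + C with B symmetric and C skew-symmetric.
B = (A + A.T) / 2
C = (A - A.T) / 2
assert np.allclose(B, B.T) and np.allclose(C, -C.T)
assert np.allclose(A, B + C)

# An orthogonal matrix: QQ' = Q'Q = E, and |det Q| = 1.
theta = 0.7
Q = np.array([[np.cos(theta),  np.sin(theta)],
              [np.sin(theta), -np.cos(theta)]])
assert np.allclose(Q @ Q.T, np.eye(2))
assert np.isclose(abs(np.linalg.det(Q)), 1.0)
```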
Assume that
$$A=\begin{pmatrix}a_{11}&\cdots&a_{1n}\\ \vdots&&\vdots\\a_{n1}&\cdots&a_{nn}\end{pmatrix}$$
is an orthogonal matrix. From the definition $A'A=AA'=E$ and the rule of matrix multiplication, we have
$$a_{i1}^2+\cdots+a_{in}^2=1,\qquad a_{i1}a_{j1}+\cdots+a_{in}a_{jn}=0,\quad i\neq j,$$
i.e.,
$$\sum_{t=1}^n a_{it}a_{jt}=\begin{cases}1,&i=j,\\0,&i\neq j,\end{cases}\qquad\text{and similarly}\qquad \sum_{t=1}^n a_{ti}a_{tj}=\begin{cases}1,&i=j,\\0,&i\neq j.\end{cases}$$
This is an important relation among the elements of an orthogonal matrix. It is sometimes called the orthogonality condition. Clearly, multiplying any row or any column of an orthogonal matrix by $-1$, we again obtain an orthogonal matrix. Likewise, interchanging two rows or two columns of an orthogonal matrix, we again get an orthogonal matrix.
Example 7. Assume that the block matrix $A=\begin{pmatrix}P&R\\O&Q\end{pmatrix}$ is an orthogonal matrix, where $P$, $Q$ are matrices of orders $m$, $n$ respectively. Prove that $P$, $Q$ are orthogonal matrices and $R=O$.
Proof. From
$$AA'=\begin{pmatrix}P&R\\O&Q\end{pmatrix}\begin{pmatrix}P'&O\\R'&Q'\end{pmatrix}=\begin{pmatrix}PP'+RR'&RQ'\\QR'&QQ'\end{pmatrix}=\begin{pmatrix}E_m&O\\O&E_n\end{pmatrix},$$
it follows that
$$PP'+RR'=E_m,\qquad QQ'=E_n,\qquad QR'=O,\qquad RQ'=O.$$
Then since $QQ'=E_n$, $Q$ is invertible with $Q^{-1}=Q'$, so $Q$ is orthogonal; multiplying $RQ'=O$ on the right by $Q$ gives $R=O$. Thus $PP'=E_m$, and $P$ is orthogonal as well. The proof is complete.
Symmetric matrices and orthogonal matrices are two important classes of square matrices. We shall use them later. It should be noted that an orthogonal matrix must be a real matrix, while a symmetric matrix need not be real; if it is also real, it is called a real symmetric matrix. Symmetric matrices and orthogonal matrices must be square matrices.
Exercises
1. Suppose $A$ and $P$ are matrices of order $n$ and the $n$ row vectors of $P$ are written as $P_1,P_2,\ldots,P_n$, i.e., $P=\begin{pmatrix}P_1\\ \vdots\\P_n\end{pmatrix}$. Prove
$$PA=\begin{pmatrix}P_1A\\ \vdots\\P_nA\end{pmatrix},\qquad \begin{pmatrix}a_1&&\\&\ddots&\\&&a_n\end{pmatrix}P=\begin{pmatrix}a_1P_1\\ \vdots\\a_nP_n\end{pmatrix}.$$
2. Prove that
$$\begin{pmatrix}\tfrac{1}{\sqrt2}&-\tfrac{1}{\sqrt2}&0\\ \tfrac{1}{\sqrt6}&\tfrac{1}{\sqrt6}&-\tfrac{2}{\sqrt6}\\ \tfrac{1}{\sqrt3}&\tfrac{1}{\sqrt3}&\tfrac{1}{\sqrt3}\end{pmatrix}$$
is an orthogonal matrix.
3. Assume that $A$ is a matrix of order 2 whose trace equals 0. Prove that $A^2$ is a scalar matrix.
4. Let $A$ be a real matrix. Show that if $A'A=O$, then $A=O$.
5. Suppose $A$ and $B$ are two matrices of order $n$. Prove that rank of $(AB)$ = rank of $B$ if and only if $ABX=0$ and $BX=0$ have exactly the same solutions.
6. Suppose $A$ is a singular matrix of order $n$. Prove that there are nonzero matrices $B$ and $C$ of order $n$ such that $AB=O$ and $CA=O$.
7. Suppose $A$ is a diagonal matrix whose elements on the main diagonal are distinct and $B$ is any matrix which satisfies $AB=BA$. Prove that $B$ is a diagonal matrix.
*8. Assume that $A=\operatorname{diag}(\lambda_1E_1,\lambda_2E_2,\ldots,\lambda_tE_t)$ is a block diagonal matrix, where $E_i$ is an identity matrix of order $n_i$ and $\lambda_i\neq\lambda_j$ when $i\neq j$. If $AB=BA$, show that $B=\operatorname{diag}(C_1,C_2,\ldots,C_t)$, where $C_i$ is a matrix of order $n_i$, i.e., $B$ is also a block diagonal matrix.
9. Show that if a matrix of order $n$ commutes with every nonsingular matrix of order $n$, then it is a scalar matrix.
10. Suppose $A$ is a skew-symmetric matrix and $B$ is a symmetric matrix. Prove that (1) $A^2$ is a symmetric matrix; (2) $AB-BA$ is a symmetric matrix; (3) $AB$ is skew-symmetric if and only if $AB=BA$.
11. No matter what the order $n$ of the matrices $A$ and $B$ is, prove that $AB-BA\neq E$.
12. Suppose $X_1,X_2,\ldots,X_n$ are $n$ linearly independent vectors of dimension $n$. Prove that $PX_1,PX_2,\ldots,PX_n$ are linearly independent if and only if $P$ is a nonsingular matrix.
13. Suppose $A$ is an orthogonal matrix, of which $A_{ij}$ is the algebraic cofactor of the element $a_{ij}$. Prove that when $|A|=1$, $a_{ij}=A_{ij}$; when $|A|=-1$, $a_{ij}=-A_{ij}$.
14. Assume that $A$ and $B$ are orthogonal matrices and $|A|=-|B|$. Prove that $|A+B|=0$.
3.3. Invertible Matrices
In Sec. 3.1 we introduced matrix addition and matrix multiplication in detail. Using matrix addition, we also defined matrix subtraction. Now we want to investigate how to define matrix division from matrix multiplication. By matrix $A$ dividing matrix $B$ we mean finding matrices $X$ or $Y$ such that $AX=B$ or $YA=B$. Matrix division is more troublesome: in general such an $X$ or $Y$ need not exist. For example, let $AX=B$. If
$$A=\begin{pmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33}\end{pmatrix},\qquad B=\begin{pmatrix}b_{11}&b_{12}&b_{13}\\b_{21}&b_{22}&b_{23}\\b_{31}&b_{32}&b_{33}\end{pmatrix},$$
then we have the systems of linear equations
$$A\begin{pmatrix}x_{11}\\x_{21}\\x_{31}\end{pmatrix}=\begin{pmatrix}b_{11}\\b_{21}\\b_{31}\end{pmatrix},\qquad A\begin{pmatrix}x_{12}\\x_{22}\\x_{32}\end{pmatrix}=\begin{pmatrix}b_{12}\\b_{22}\\b_{32}\end{pmatrix},\qquad A\begin{pmatrix}x_{13}\\x_{23}\\x_{33}\end{pmatrix}=\begin{pmatrix}b_{13}\\b_{23}\\b_{33}\end{pmatrix}.$$
Matrix
125
Operations
then X is called the inverse matrix of A. If A has inverse matrix, then the inverse matrix of A is unique. This is because if there is another matrix Y such that YA = AY = E, then EX = (YA)X
= Y(AX)
= YE
and hence X = Y. The inverse matrix of A is denoted by A-1. has an inverse, we say A is an invertible matrix. Clearly the inverses of the matrices
E=i
•-.
If a matrix A
I and i =
are just themselves, i.e., E~x = E, A~l = A. Moreover, orthogonal matrices are nonsingular. The inverse of an orthogonal matrix B is the transposed matrix of B, i.e., B~x = B'. Thus if A has inverse A-1, from AX = B or YA = B we have X = A~XB or Y = BA~l. Hence the problem of A divides B means that the inverse of A is to be found. It should be noted that when AB = BA, we have A~lB = BA~l. Hence X - Y. When AB ^ BA, obviously A~XB ^ BA~X and hence X ^ Y. This is to say, by distinguishing postmultiplying A by B and premultiplying A by B, we make clear distinction between left division of A by B and right division of A by B. We next discuss the existence of the inverse matrix and the methods of finding it. If A has inverse matrix A - 1 , since \A-l\\A\
= \A-lA\
= \E\ = l,
A is a nonsingular matrix. That is to say, the necessary condition that A is an invertible matrix is that A is nonsingular. Conversely, any nonsingular matrix A = (ay) of order n is invertible. This is because we have a method of solving the linear systems in Sec. 2.4, by which we can get the inverse matrix of A. Hence we have the following theorem. T h e o r e m 1. The matrix A is invertible if and only if A is nonsingular. Example 1. Let
*-(U)-
126
Linear Algebra
Prove that A is invertible and find A"1. Solution: Let A-1 = ( AA~1 =
, 1. We have
0 :)(::)-(!!)■
so that / a + 2c b + 2d\ _ (\ \3a + 4c 3b + id)~\0
0\ l)
Equating the corresponding entries of the above two matrices, we obtain the linear systems J a + 2c=l . f" b + 2d = 0 \ 3 a + 4c = 0 a n d \ :36 + 4d = 1 The solution is a = - 2 , c = 3/2, b = 1, d= - 1 / 2 . Moreover, since the matrix
CMS!) also satisfies the property that
(:i)(ii)-(l!W14)G :)-(!?)■ we conclude that A is nonsingular and that
A-*
Example 2. Let ,4 = ( * 4 ) ■
Solution: Let A -
~2
4)
F i n d j4_1
-
u
, .As ■C !)•
"-*-(! 2) (: i)-(i !)■ / a + 2c b + 2d\ _ (\ \^2a + 4c 2b + 4d)~\0
0\ l)'
Matrix
127
Operations
Equating the corresponding entries of the two matrices, we obtain the linear systems J a + 2c = 1 ( b + 2d = 0 \ 2 a + 4c = 0 \26 + 4 d = l ' These two sets of linear equations have no solution, so A has no inverse. In fact A is singular, for |J4| = 0. How to find the inverse matrix is an important problem. We shall give a method which is simpler than by using the above theorem. From Example 7 in Sec. 3.1, we have
(An
Ani\
(an
\Aln
A 1
\ani
'\A\
air
\ \A\)
Let
A
*=[
[An
■■■
AnX
and it is called the adjoint matrix of A. Thus the above expression becomes A*A = \A\E, or A*
Similarly, we have
so that
J4*/|.A|
wA'EA E m= '
is the inverse matrix of A. Thus we have:
T h e o r e m 2. If A is a nonsingular matrix, then A has the inverse matrix 1
AA'
- —A* ~ \A\A '
where A* is the adjoint matrix of A. Example 3. Find the inverse matrix of the matrix /I A =
2
\3
2 3 \ 2 1 4 3/
128
Linear Algebra
Solution: As An = 2,
A12
-3,
Al3 = 2,
An
A22
-6,
A23 = 2,
= 6,
A31 = - 4 ,
A32 = 5,
^33 - - 2 ,
the adjoint matrix off A is 2
(
A* =
-
3
V 2
6 -6 2
-4\ 5
-2/
"
Since \A\ = 2,
3 -3 1
1 ^-
J
3 2
= -
1
-2\ 5 2
-1/
The solution is complete. From multiplication rules of diagonal matrices and block diagonal matrices, we can find the inverses of the following matrices: -2
and
3 0
1 3
It is clear that 1
a \ In general, let A be a block diagonal matrix i'Al
A= '
\ 1 3
0
1 9
u
Matrix
129
Operations
where Ai(i = 1,2,..., m) are invertible and the inverse of Ai is A{ . Then
A~l = A~'
Mr1
>
A^j
\
This can be deduced from the definition directly. Example 4. Suppose the rank of a matrix A of order n is r, and the rank of its adjoint matrix A* is s. Prove that n when r = n, s = ^ 1 when r = n — 1, 0 when r < n — 1. Proof. When r = n, i.e., A is nonsingular; according to Theorem 2, A* is also nonsingular. Hence s = n. When r < n — 1, all the algebraic cofactors Aij = 0. Then obviously s = 0. It suffices to prove that when r = n — 1 we have s = 1. Since \A\ = 0, we know from Theorem 2 of Sec. 1.3 that \An,
Ai2, ■. ■, Aj,n),
2 = 1,2, . . . , n,
is a solution vector of the system of linear equations AX = 0. Since the rank of A is n — 1, its system of fundamental solutions consists of only one nonzero solution vector. Therefore, among the above n solution vectors there is at least one nonzero solution vector, and any two solution vectors must be linearly dependent. Hence the rank of A* is 1 and so the statement holds. The following are the basic properties of inverse matrices. From A~XA = E, we have \A_1\ \A\ = 1 and hence \A~l\ = \A\~l. In other words, the determinant of the inverse matrix A-1 is the reciprocal of the determinant of matrix A. Again from AA~X = E and the uniqueness of inverse matrix, we have
04-1)"1 =A. That is to say, the inverse of A-1 is A itself. Again from AA~r = E, we have (A'1)'A'
= E,
and hence
(AT^iA-1)'.
130
Linear Algebra
Thus the inverse of the transposed matrix of A is the transposed matrix of the inverse of A. Since (—A~x)(—A) = A_1A = E, we have
{-A)-'
=
-A-\
i.e., the inverse of the negative matrix of A is the negative matrix of the inverse of A. We know that if matrices A and B are nonsingular, then their product AB is also nonsingular and therefore the product also has inverse matrix. Prom B-XA~XAB = ABB-1 A'1 = E, we get (AB)'1
=
B~XA-1.
Thus the inverse of the product AB is the product of the inverses of A and B in reverse order. In general, if A\, A%,..., An are all nonsingular, then (A1A2 ■ ■ ■ A^-1
= A-1 A£t
■ ■ ■ A^1 A;1.
(1
Note that the above result is similar to that established for transposed matrices with the factors in reverse order. If A is a matrix of order n, then Am denotes the product of A with itself m times: A" = E, Am = AA--A. m times m
If A is a nonsingular matrix of order n, A~ denotes the product of A-1 itself m times: A-m = A-1A~1---A-1. * * ' Clearly AkAl
= Ak+l.
with
m times
Since the inverse of Am is A~m, A-m
_ (A"*)- 1 .
Example 5. Let A*,B*, and {AB)* be adjoint matrices of A, B, and AB respectively. Prove that (AB)* = B*A*. Proof. If \A\ ^ 0 and \B\ ^ 0, or \A\ ■ \B\ / 0, the conclusion follows immediately from (1). Hence the statement holds.
MatTix
131
Operations
When \A\ \B\ = 0, the conclusion also follows from the definition of adjoint matrix. But proof in this way is very tedious. It is easier to prove by using the following properties of polynomials. We know that if f(xi,...,x„), g(x\,... ,xn) are two polynomials in x\, X2, ■■■, xn
and
f(xi,
..., xn) -g(xi, ...,xn)
= Q,
then f(x\,..., xn) = 0 or g(xi,..., xn) = 0 (see Exercise 14). It follows from the property that if f(xi,... ,xn) ^ Oand at a point ( * i , . . . ,xn),f(xi,... ,xn) ^ 0 and g(xi,... ,xn) = 0, then f(x,...,x) ■ g(x, ...,x) = 0 and hence g(x\,.. .,xn) = 0. Let A = (ajj), B = (bij). \A\, \B\ can be considered as polynomials in a^and b^. It is clear that \A\ \B\ # 0. When \A\\B\ ± 0, (AB)* = B*A*, or {AB)* — B*A* = (fij) = O. Hence / ^ = 0, where faj is a polynomial in ai;7 and bij. Thus when \A\ \B\ ^ 0, | ^ | \B\fcj = 0; when \A\ \B\ = 0, obviously \A\ \B\fij = 0 also, so we have \A\ \B\fcj = 0. But | 4 | \B\ ^ 0, using the above properties of polynomials we have fij = 0, i.e., (AB)* — B*A* = O. Therefore whether A and B are nonsingular matrices or singular matrices, we always have (AB)* = B*A*. Hence the statement holds. In general, we have (A1A2---An)*
= A*n---A*2A*1.
(2)
Suppose A is a symmetric matrix, then its adjoint matrix A* is also a symmetric matrix. If in addition A is an invertible matrix, then the inverse A-1 of A is also a symmetric matrix. If A is an orthogonal matrix, then since A~x = A', its inverse matrix A~l is an orthogonal matrix and its adjoint matrix A* is also an orthogonal matrix. That is to say, the inverse and the adjoint of a symmetric matrix are symmetric matrices; the inverse and the adjoint of an orthogonal matrix are also orthogonal matrices. Example 6. Suppose A is a skew-symmetric matrix of order n. If n is an odd number, then A* is a symmetric matrix; if n is an even number, then A* is a skew-symmetric matrix. Proof. Let A = ((Hj). Since the algebraic cofactor of a^- in A is a trans posed determinant of the algebraic cofactor of a,j in A', we have (A*)' = (A1)*.
132
Linear Algebra
Again since elements of A* are minors of order n — 1 in A, for any k we have (kA)* = kn~lA*. Prom the hypothesis A' = -A, we have (A*)' = (A')* = {-A)* = ( - l ) " - 1 ^ * . Therefore, when n is odd, (A*)' = A*, i.e., A* is a symmetric matrix; when n is even, (A*)' = -A*, i.e., A* is a skew-symmetric matrix. The proof is complete. Example 7. Let A be a matrix of order n. If there exists a matrix B of order n such that i M = E (or A S = £ ) , then B is the inverse of A, i.e.,
Proof. We know from BA = E that A is nonsingular. Hence A-1 and AB = ABAA-1 = A{BA)A~l = AA~l = E.
exists
Therefore B is the inverse of A, i.e., B = A-1. This proves the statement. To check whether a matrix B is the inverse of a matrix A, a procedure based on the result of the above example will only be half as long as that based on the definition. Suppose that B is nonsingular. Then AB = CB implies A = C, and AB = O implies that A = O. Furthermore we have: Theorem 3. Let A and B be two matrices with B nonsingular, then the rank of AB or BA equals that of A. That is, premultiplying or postmultiplying any matrix by a nonsingular matrix, the rank of the product equals that of the original matrix. Proof. Let AB = C. We know from theorem of Sec. 3.1 that the rank of C is not greater than that of A. Again since CB = A, the rank of A is not greater than that of C. Hence the ranks of A and C are equal. Thus the theorem holds. Finding the inverse is an extremely important problem. The preceding methods of using Theorem 2 are very tedious, as many determinants have to be computed. Recall that computing the rank of matrices or solving systems of linear equations we often use elementary operations to simplify the computing procedure. The same is true for finding the inverse matrix. We next introduce a method for finding the inverse matrix using elementary operations.
133
Matrix Operations
For this purpose let us first introduce the following important elementary matrices. From the rule of matrix multiplication, it is easy to see that an ai2 fca2i ka22 asi a 32
a 13 \ /l ka.23 1 = I 0 a 33 / \0
0 0 \ /an k 0 I I a2i 0 1 / \a3i
'an a2i
ka-12 a n N /ail fca22 a23 I = I a2i
«12 a22
, 031
&132
CL32
033 /
\ 0,31
ai2 a22 a32
ai3\ »23 I = a33/
E2(k)-A,
1 0 0 ai3 \ a23 | ( 0 & k 0 | = A ■ E2(k), 1 0 3 3 / Vo 0
where £ 2 (*) In general, for any (m, n)-matrix A, the matrix resulting from multiplying the i-th row vector of A by k can be obtained by premultiplying A by a cor responding elementary matrix Ei(k) of order m. The matrix resulting from multiplying the i-th column vector of A by k can be achieved by postmultiplying A by a corresponding elementary matrix Ei(k) of order n, where Ei(k) can be obtained by multiplying the i-th row of the identity matrix by k(=/= 0). Again we have an a2i .a 3 i + A;aii
a 32
012 a22 + ka12
ai3 \ /l 023 J = I 0 a 3 3 -|-/cai3/ \k
U U\ / a n 1 0 I I a2i 0 1/ \a3i
a i 2 aX3 022 023 a 32 a 33
= Bi3(fc) • A, an a2i a3i
ai2 a22 a 32
a i 3 + fcan \ /an a 23 + fca2i I = I a 2 i a 33 + fca3i/ \ a 3 i
ai2 a22 a 32
\r
ai3 023 a 33 /
0
Vo
0 k 1 0 0 1
= ^-^3l(fc), where / I 0 k\ / I 0 0\ 1 0 0 E13(fe) = 0 1 0 , E31(k) = 0 1/ \k 0 1/ Vo In general, let A be any (m, n)-matrix. The matrix resulting from adding a &-times multiple of the i-th row of A to the j - t h row of A can be obtained by premultiplying A by a corresponding elementary matrix Eij (k) of order m;
134
Linear Algebra
the matrix arising from adding a fc-times multiple of the i-th column of A to the j-th column of A can be achieved by postmultiplying A by a corresponding elementary matrix Eji[k) of order n, where Eij(k) can be obtained by adding a fc-times multiple of the z-th row of the identity matrix E of order n to the j - t h row of E. It should be noted that for elementary row operations we premultiply A by Eij(k) and for elementary column operations we postmultiply A by Eji(k). The former and the latter are different and are not to be confused with each other. It is easy to see from definition that Ei{k) = Ei(k), E'ij(k) = Eji(k),
=
Eiik-1),
E-j1(k) =
Eij(-k).
E-\k)
Elementary matrices such as Ei(k), Eij(k) are nonsingular matrices. They can be formed by elementary operations of type 2 and type 3 on the identity matrix. They play the role of elementary operations and are therefore called elementary matrices. Since elementary operations of type 1 follows from elementary operations of type 2 and type 3, we have: T h e o r e m 4. The matrix obtained by performing row (column) elementary operations on a matrix A is equal to the matrix obtained by premultiplying (postmultiplying) A by a corresponding elementary matrix. In Sec. 2.5, we have given the necessary and sufficient condition for two matrices to be equivalent. We shall next give another necessary and sufficient condition by using elementary matrices. Assume that A ~ B. That is to say, we obtain B from A by a finite sequence of elementary row or column operations. Theorem 4 states that there are certain elementary matrices Pi,Pz,...,Pr and Qi,Q2,■ ■ ■,Qs such that B = Pr-PlAQ1-Qs, or B = PAQ, where P = Pr- ■■ P% and Q = Q\ ■ • ■ Qs are nonsingular. Conversely, if B = PAQ, where P, Q are nonsingular matrices, it follows from Theorem 3 that the ranks of A and B are equal. Then from Theorem 4 in Sec. 2.5 we obtain A ~ B.
135
Matrix Operations
Thus we have: T h e o r e m 5. Two matrices A and B of the same size are equivalent if and only if there are two nonsingular matrices P and Q such that B = PAQ. It should be remarked that since A and B are not necessarily square matrices, the orders of P and Q may not be equal. According to the theorem, one of two equivalent matrices can be expressed in terms of the other matrix, while Theorem 4 in Sec. 2.5 does not have such an advantage. In particular, if the rank of matrix A is r, then there are nonsingular matrices P and Q such that \
/\l PAQ =
V
0/
Assume that a matrix A of order n is nonsingular. It follows from The orem 4 in Sec. 2.5 that A is equivalent to the identity matrix E of order n. Since the inverse of an elementary matrix is still an elementary matrix, we know from Theorem 4 that A can be expressed as the product of elementary matrices. Obviously the inverse is true also. Therefore we have: T h e o r e m 6. A matrix is nonsingular if and only if it can be expressed as a product of elementary matrices. In the above we have introduced the elementary matrix and its properties. We next give a much simpler technique for computing the inverse of a matrix. We know from Theorem 2 in Sec. 2.5 that any nonsingular matrix can be reduced to the identity matrix by means of elementary row matrices. Hence for a nonsingular matrix A, we have elementary matrices E\,..., Em such that ■EiA = E. En Prom the uniqueness of inverse matrix, we get E„ E-n
■ ExE = A-
■E1=A~
i.e.,
Linear Algebra
136
We know from the above two expressions that if a sequence of elemen tary row operations reduces A to E, then the same sequence of elementary row operations applied to the identity matrix will yield A-1 simultaneously. Thus we can find the inverse by elementary row operations. A convenient way of organizing the computing process is to write down the partitioned matrix (A : E), setting E to the right side of A. When A is reduced to E by ele mentary row operations, E on the right side is simultaneously reduced to A~x. Thus we find the inverse of matrix without the tedious computation of deter minants. E x a m p l e 8. Using elementary row operations find the inverse matrix of
A =
Solution: We set the identity matrix E of order 3 to the right of A and reduce A to E by elementary row operations, proceeding as follows. 2
1
3
0
2 2
1
0
1 0
4
3
0
0
\3
/1
0* 1/
2
3
0
-2
-5
\0
-2
-6
\
/
0
-2
-1
1
0
0
-2
5
-2
1
0
\0
0
-1
-1
-1
1/
f 1
0
0
:
1
3
-2
0
-2
0
:
3
6
-5
Vo
0
-1
:
-1
-1
(l
thus A'1
1/
1
-2\
0
0
1/
Vo I\ o o 0
3
1 0
-3
Vo o i
1
-
2\
- 1 /
Matrix
137
Operations
This result agrees with that of Example 3. Similar to solving a system of linear equations in Sec. 2.5, before finding the inverse of a matrix by elementary row operations, we need not determine whether the inverse matrix exists. If at any point in the computation we find some singular matrix that is row equivalent to the original matrix, then the original matrix has no inverse. E x a m p l e 9. Find the inverse of the matrix 1 -2 4 1 2 5 1 1 Solution: To find A /
1
-1 ~2\ 2 1 4 -1 1 1/
, we proceed as follows:
1 0 0 0
1 4 2
\1
0 0 0 10 0 0 10 0 0 1/ 10
-2 -1 -2
0 0
0
9
6
9
- 4 1 0 0
0
9
6
3
- 2 0 1 0
\o
3
2
3
-10
0 1/
At this point A is row equivalent to
B
1 -2 0 9 0 9 0 3
-1 6 6 2
~2\ 9 3 3/
Since B is clearly a singular matrix, we stop here and conclude that A is a singular matrix. Hence it has no inverse matrix. The method above is very convenient for matrices whose elements are num bers. Sometimes it is also suitable for matrices whose elements are characters.
138
Linear Algebra
E x a m p l e 10. Find the inverse of the following lower triangular matrix /
1
\
a „2
1 a
n-1
,n-2
A =
w
,n-3
1/
Solution: We set E to the right side of A, write down the partitioned matrix {A : E), and perform the following elementary row operations on the matrix (A : E): Subtract a times the (n — l)-th row from the n-th row, then subtract a times the (n — 2)-th row from the (n — l)-th row, and continue in this way until we subtract a times the first row from the second row. We then have /
1 1 a
a „2
\a
\
1
„n-2
n-1
,n-3
0
1
0
0
1
0
0
0
1/
(l 0
1
0
0 1
1 —a
Vo o o
1/
and hence / 1 i-i
V
-a
1/
Similarly, we can also find the inverse matrix by elementary column oper ations only. In such a case we must set the identity matrix E under A. As in solving a system of linear equations, the above method of finding the inverse by elementary operations is not suitable for finding the inverse of matrices arising from practical applications, which requires the speed and efficiency of an electronic computer.
139
Matrix Operations
Exercises 1. Find the inverse of the following matrices, if they exist, f\ (1)
(3)
2
\ *;• i /I 1 i 1 -i Vi - i
(2)
V3
i -i i -i
!\ -1 -1 1/
(4)
cos a -sin a sin a cos a / l 1 0 0\ 1 2 0 0 3 7 2 3
V2 5 1 2 /
2. Given / 0
0
oi 0
0
a2
\
0
A= 0
0
0
an. 0 /
where a* ^ 0,i = 1,2,... ,n, find J 4 - 1 . 3. Let A and B be nonsingular matrices. Find the inverses of the following matrices. (1)
O B
4. Let ^4
A O /I 1
(2)
1
GS)-
be a matrix of order n. Find ^4*.
\1 ■• 15. Assume that A is a nilpotent matrix and Am — O. Prove that E — A is invertible and find its inverse. 6. Given A
- 1 0 1 —1 1 1
0 0 | , compute -1
(A + 2E)~1(A2-4E)
and (A +
2E)-\A-2E).
7. Let A and B be two matrices of order n. If AB = A + B, prove that AB = BA.
140
Linear Algebra
8. Write the system of linear equations in Example 2 of Sec. 1.4 in matrix equation form. Then using the inverse of the coefficient matrix, solve this system of linear equations. 9. Prove that E'i(a) = Ei(a), (Eijia))'1
=
E'^a) = Eji{a),
(^(a))"1 =
E^a'1),
Ei:j(-a).
What the above means is that the transposed matrix of an elementary matrix is an elementary matrix; the inverse of an elementary matrix is also an elementary matrix. 10. A matrix obtained by interchanging the i-th and j - t h rows of the identity matrix of order n is denoted by Eq. Show that E[j = Ejj, E~-1 = Eij, and the matrix obtained by multiplying A on the left (right) side by E^ is equal to the matrix resulted from interchanging the i-th row (column) and the j-th row (column) of A. 11. Let A be a matrix of order n. Show that (1) {-A)* = ( - l ) n - x A * ,
(2) {A~lY =
(A*)~\
(3) (A*Y = (A')*,
(4) (A*)* =
\A\n~2A.
12. Verify that any square matrix can be expressed as the product of a nonsingular matrix and a nilpotent matrix. *13. Given that the rank of an (m, Z)-matrix A is r, the rank of an (I, n)-matrix B is s, the rank of AB is t, prove r + s — I ^ t ^ min (r, s). 14. Prove that if / ( x i , . . ■ ,xn) ■ g(x\,... / ( x i , . . . , x „ ) = 0 or
,xn) = 0, then g(xi,...,xn)=0.
15. Can we find the inverse by using only elementary column operations? Can we find the inverse by using elementary row operations as well as elementary column operations?
CHAPTER 4 QUADRATIC FORMS
In linear algebra we consider mainly linear functions, normally not includ ing equations and polynomials of degrees equal or greater than 2. However, quadratic forms are a special case of bilinear forms, of course, and problems of the quadratic form are linear problems. Besides, in this chapter we discuss only the standard forms of quadratic forms, and do not solve equations of degree 2. This can be done by linear transformations or matrices. Hence the content of this chapter involves basi cally an aspect of matrices, in fact, applications of matrices. We know from plane analytic geometry that a curve of the second degree ax2 + bxy + cy2 = d whose center is at the origin can be reduced to the standard form a'x'2 + c'y'2 = d by rotating counterclockwise through an appropriate angle 9 about the origin of a coordinate system ( x = x1 cos 6 — y' sin 6, \ y = x' sin 6 + y' cos 6. From the standard form we can easily separate the curves into classes and study their properties. It is also an important problem to reduce the equations of quadratic sur faces to the standard forms in solid analytic geometry. Many theoretical and practical problems often require that we generalize the above case, and discuss 141
142
Linear Algebra
quadratic forms in n variables. For example, the problem of effective networks is simply one of quadratic forms. We recall that it is the second degree terms of the Taylor expansion of a function that determine such local behavior as maxima, minima, and curvature. For a real-valued function, the sum of the second degree terms in a Taylor expansion is a quadratic polynomial, which gives a quadratic form. Quadratic forms are also needed in statistics and other branches of applied mathematics. In this chapter we shall discuss the following three problems: 1. Methods of reducing the general quadratic forms to sums of squares by nonsingular linear transformations. 2. Real quadratic forms, the law of inertia for quadratic forms. 3. Bilinear forms. This chapter is divided into 3 sections, each dealing with one of the problems. 4.1. Standard Forms of General Quadratic Forms We have known that a polynomial whose terms are all of the second degree is called a quadratic form. For example, f(x, y, z)=x2
+ 2y2 + 5z 2 + 2xy + 6yz + 2zx
is a quadratic form in three variables. In general, a quadratic form in n variables can be written as f(xi,.
. . , Xn) = anX1
+ 012X1X2 + • • • + OlnXlX„ 2
+ 021X2X1 + <222X2 +
h fl2nX2Xn 2
+ anixnxi
+ an2XnX2 + ■■ ■ + annxn
In this section we give a method of reducing quadratic forms to sums of squares by nonsingular linear transformations. xWe know that the transformation
{
xi =PnJ/i + ---+Pinl/n, Xn = Pnl2/1 H
r- VnnVn ,
w
Quadratic
Forms
143
which transforms the variables xi,...,xn to variables yi,---,yn is called a linear transformation as the relation between the variables is linear. The matrix I Pu
■■■ Pin
\Pnl
■ ■ ■ Pnn
is called the matrix of linear transformation (1). When P is nonsingular, (1) is called a nonsingular linear transformation; When P is singular, (1) is called a singular linear transformation. When the coefficients pij are real numbers, (1) is called a real linear transformation. For example, in analytic geometry the rotation transformation is a real nonsingular linear transformation. If (1) is nonsingular, (j/i,...,y n ) can uniquely be obtained from any {x%,..., xn) by transformation (1). If (1) is singular, then (3/1,... ,yn) may be not exist, or even if it does, it is not unique. In analytic geometry we have seen that many properties of the curve f(xi,... ,xn) = 0 cannot be preserved through this kind of transformation. Hence the transformations used below are assumed to be nonsingular. Example 1. Find the equation of a straight line in a plane which is not 2 3 altered by the linear transformation whose matrix is 0 1 Solution: Let ax + by + c = 0 be the linear equation required. Making the nonsingular linear transformation x = 2x' + 3y' and y = y' and substituting them into the equation required, we obtain 2ax' + (3a + b)y' + c = 0 . Since the straight line is not altered, a : b : c = 2a : (3a + b) : c. If a ^ 0, then 3a + b = 2b, c = 0, i.e., b = 3a, c = 0. If a = 0, then by + c = 0 is not altered. Hence the straight line required is x + 3y = 0 or y = d, where d is a constant. The solution is complete. We next give concrete methods for reducing a quadratic form to a sum of squares. The method is that of completing the square learned in middle
school. As in Sec. 2.5, for simplicity, we shall illustrate the method with the following example, which also serves as the pattern of the proof, since there is no difference in principle between the example cited and the general case. Giving a proof by means of a typical example is convenient to state and simple to understand, and we shall frequently use this procedure; but we must make sure that there is indeed no difference in principle between the proof given for the special example and the general case, otherwise it is not valid as a proof.
Example 2. Reduce the quadratic form
$$f(x,y,z)=x^2+2y^2+5z^2+2xy+6yz+2zx$$
to a sum of squares, and find the nonsingular linear transformation used.
Solution: Collecting the terms containing $x$ and then completing the square, we obtain
$$\begin{aligned}f(x,y,z)&=x^2+2(y+z)x+2y^2+6yz+5z^2\\&=x^2+2(y+z)x+(y+z)^2-(y+z)^2+2y^2+6yz+5z^2\\&=(x+y+z)^2+y^2+4yz+4z^2\\&=(x+y+z)^2+(y+2z)^2.\end{aligned}$$
Hence the nonsingular linear transformation
k z'
( x = x' — y' + z',
y + 2z,
or
z,
=
yz =
y' - 2z',
z\
would reduce / to a sum of squares
f(x',y',z') = x'2 + y'2. As the coefficients of each term in the sum are all 1, the sum of squares is in the simplest form. The matrix of linear transformation used is thus P=
IX - 1 0 1
\0 The solution is complete.
0
1\ -2 .
l)
Quadratic
145
Forms
The simplest form of the sum of squares in which the coefficients are all 1, as in the above example, is called the standard form of / . E x a m p l e 3. Reduce the quadratic form f(xx,x2,x3)
= 2x\x2 + 2xix3 - 6x2X3
to a sum of squares and find the nonsingular linear transformation used. Solution: Since the quadratic form given has no square terms, we first make the nonsingular linear transformation %i = J/i + 3/2, X2 = y i - 3/2, x3 =
(2)
3/3
and obtain / = 2(2/i - 22/13/3 - 3/2 + 43/2y3) • We then collect all the terms which contain j/i and complete the squares. Similarly, collecting all the terms which contain y2 and completing the squares, we obtain / = 2[{y\ - 2Vly3 + j/f) - y\ + 4y2y3 - 3/f] = 2[(3/i - 2/3)2 - (yl - 42/22/3 + 4j/2) + 3y|] = 2[(3/i - 2/3)2 - (3/2 - 22/3)2 + 33/32] = 2zj - 2z\ + %z\ , where ' z\ = 3/1 - 3/3 = \xi + \x2 - x3, < z2 = 3/2 - 2y3 = \xi - \x2 - 223, .•23=
2/3 =
Z3
Hence the nonsingular linear transformation x\ = z\ + z2 + 3z 3 , x2 = Z1-Z2-
z3,
x3 =
z3
reduces / to the sum of squares / = 2z\ - 2z\ + 6 z | .
146
Linear Algebra
If we now use the nonsingular linear transformation 1 z
i = ~7f *!>
1 Z2 =
1
"7zf <2'
Z3 =
~7^ 3 '
' '
then / is reduced to the standard form
/ = t\ + t\ + t\ . The solution is complete. In this section we shall consider problems in the complex field so that the coefficients of quadratic forms and linear transformations can take on both real and complex numbers. Thus in Example 3 the coefficients of the last linear transformation (3) can be taken as complex numbers. In general, as in the above examples, quadratic forms in n variables can be reduced to a sum of squares as follows: If a quadratic form f(xi,,.. ,Xn) has square terms, say ax\, we collect all the terms containing x\ and then complete squares. If the quadratic form has no square terms, we first reduce it to a new quadratic form with square terms by a nonsingular linear transformation (for example transformation (2)), and collect all the terms containing some variable which has square terms (for example all terms containing y\ in Example 3), completing the square for the variable. We continue in this way for the remaining n— 1 variables. Using linear transformations at most n times we can reduce a quadratic form in n variables to a sum of squares. By using a simple nonsingular linear transformation (for example transformation (3)), the sum of squares is reduced to the standard form. Hence we have the following important theorem: Theorem 1. A quadratic form can be reduced to a sum of squares, or the standard form, by a nonsingular linear transformation. Sometimes the standard form can also be found conveniently by directly selecting an appropriate nonsingular linear transformation. In this case it is not necessary for us to reduce / to a sum of squares by the above method. Example 4. Reduce the quadratic form / { » ! , - ■ • , X 2 n ) = XiX2n
+ I2Z2n-l H
\-XnXn+i
to a sum of squares and find the nonsingular linear transformation used.
147
Quadratic Forms
Solution: By inspection we can obtain the following linear transformation: ' Xi=yi+ (
y2n,
X2n = Vl-
X2 = J/2 + 2/2n-l,
2/2n ,
Z 2 n - 1 = 2/2 ~ 2/2n-l ,
. Xn = 2/n + 2/n+l,
3=71+1 = 2/n ~ 2/n+l •
Using this the given quadratic form becomes r
2
,
2
i
i
2
2
2
/ = 2/1 + 2/2 + ■ • • + 2/n - 2/n+l
2/2n-l ~ 2/2n ■
Obviously the matrix of the linear transformation is
/I 0 / l0 1 0 0
2
0 01
1
1 \ 01 \
1 0
1 1 1 -1 0 1 \1 0
- 1- 1 0 0 -- 11 //
As its determinant is equal to (—2)", it is a nonsingular matrix. The solution is complete. After reducing a quadratic form to a sum of squares, we can readily reduce the sum of squares to the standard form. Hence the problem of reducing the quadratic form to the standard form is simply one of reducing it to a sum of squares. We next discuss the properties of quadratic forms by using matrices, and simplify the procedure of reducing them to sums of squares. This is our main purpose for this section. Firstly we express a quadratic form in terms of matrices. For example, in Example 2 we rewrite each term in the quadratic form which is not a square as sum of two terms whose coefficients are equal. Thus we rewrite the quadratic form in the symmetric form f(x,y,z)
= x2 + xy + xz + yx + 2y2 + 3yz + zx + 3zy + 5z 2 ,
148
Linear Algebra
or, in matrix form, f(x,y,z)
/I 1 1 1 2 3\ \1 3 5
= {xyz)
x
y
\
\ 1)[w
This expression is unique, for there is only one way we can divide the coefficients of nonsquare terms into two equal parts. In general, quadratic forms in n variables can be written as n J
=
/
. dijXiXj ,
i,J=l
where the coefficients Oy are complex numbers generally and Ojj = o^. Hence / can be written in terms of matrices as f = X'AX,
(4)
where X=
/M :
fan ■ ,X' = (xi-
■ ain \
■ ■ Xn), A —
\ani
\Xn/
■
Q>nn /
Since ajj = aji, we have A' = A, i.e., A is a symmetric matrix and is called the coefficient matrix of / . Hence expression (4) is unique. Thus / is uniquely determined by the matrix A, and the rank of A is called the rank of / . In particular, when the coefficients real, / is called a real quadratic form. We shall discuss how (4) is to change after it has been changed by the nonsingular linear transformation (1). Writing (1) as X = PY, where
x=
fXl\ \xnJ
(Vi\ ,Y =
i \VnJ
and substituting it into (4), we obtain
/ = (PY)'A(PY) = Y'(P'AP)Y = Y'BY,
149
Quadratic Forms
where B = P'AP is a symmetric matrix for B' = (P'AP)' = P'A(P')'
= P'AP = B .
We know from Theorem 3 in Sec. 3.3 and the nonsingularity of matrix P that the ranks of A and B are equal. Two matrices A and B are said to be congruent if there exists a nonsingular matrix P such that P'AP = B. Clearly the congruence relation also satisfies the three equivalent laws. Hence it is an equivalent relation. It follows directly from definition that the ranks of congruent matrices are equal. Thus we have: T h e o r e m 2. The quadratic form / = X'AX can be reduced to the quadratic form / = Y'BY by means of a nonsingular linear transformation X = PY, and its coefficient matrix B = P'AP, i.e., B is congruent to A. We know from this theorem that the ranks of quadratic forms are not altered by a nonsingular linear transformation. For instance, in Example 2 above, since the rank of / is 2, its sum of squares consists of two terms. In Example 3, since the rank of / is 3, its sum of squares consists of three terms. In general, if the rank of / is r, then its sum of squares will consists of r terms. Thus from Theorem 1, we have the following fundamental theorem for quadratic forms: T h e o r e m 3. By means of an appropriate nonsingular linear transforma tion X = PY, a quadratic form / = X'AX whose rank is r can be reduced to a sum of squares f = aiyj +
Varyl,
a, ^ 0,
i=l,2,...,r.
(5)
i.e.,
(a i xai
\ 2/1 \
f = Y' Y'
Y,
w where Y = Jn/
V
o)
150
Linear Algebra
We can simplify further the coefficients of (5) by using the nonsingular linear transformation 1 2/i
■-Zi,---
ZTI 2/r+l — %r+\t ' ' ' ; Un — zn :
,Vr
(6)
to obtain the standard form of / : / = z\ + ■ ■ ■ + z2r .
Since a nonsingular linear transformation does not alter the rank of a ma trix, the standard form of / is determined uniquely by its rank. In other words, the standard form of / is unique. To find the standard form, if we do not need the linear transformations involved we can simply compute the rank of the quadratic form, based on which we can immediately write out the standard form. This may be less tedious than the earlier method. If two quadratic forms can be transformed into each other, i.e., one of them can be reduced to the other, then their ranks are equal. Conversely, if the two quadratic forms have the same rank, then they have the same standard form and can be transformed into each other by a nonsingular linear transformation. Thus we have: Theorem 4. Two quadratic forms can be transformed into each other by a nonsingular linear transformation if and only if they have the same rank. If we omit the variables in the matrix expression of quadratic forms, what remains is a symmetric matrix, so that the problem of quadratic forms is in essence one of symmetric matrices. Restating Theorem 3 in terms of matrices we have: Theorem 5. A symmetric matrix is congruent to a diagonal matrix, i.e., if a matrix A is a symmetric matrix of rank r, then there is a nonsingular matrix P such that (a\
\
P'AP
0/
where
Quadratic
151
Forms
For instance, in Example 3, 0 1 1 0 1 -3
1\ -3
, p =
o
I 1
\o
1 -1 0
3 -1 1
By computing directly, we also obtain
P'AP =
1 0\ / 1 -1 0\ /I /2 -1 0 1 1 1 0 = -2 \3 -1 1/ V-2 4 6/ V
\
V
It should be noted that if A is not a symmetric matrix, P'AP may not be a diagonal matrix. For instance, in Example 3 if we take
/°° \o
2 2\ 0 -3 -3 0/
then we have P'AP =
°\
1 0 \ /' 2 -2 I -1 0 0 0 1 "3 -1 1/^v-3 3 3^ -2 Z /2 ~\ -2 3 ) 2 -3 •/
V» V>
which is not diagonal, while Y'(P'AP)Y is still the same sum of squares. Hence if A is not a symmetric matrix, it is not proper to describe in terms of matrices that / = X'AX is reduced to a sum of squares. This is another reason why the matrix A of a quadratic form is assumed symmetric. We now turn our attention to the question of how to find a matrix P such that P'AP becomes a diagonal matrix, where A is the given symmetric matrix. Of course we can use the method of completing squares as in the above example, but the procedure is rather tedious. We next introduce a method using elementary operations. We know from Theorem 6 in Sec. 3.3 that any nonsingular matrix can be written as a product of some elementary matrices. If we let P = E\Ei • ■ ■ Em, then P' = E'm ■■ • E[. Hence from Theorem 5 we have
152
Linear Algebra
fax Em " ' ' EXAE\
aT
■ ■E■m E= = m
(7)
0 0/
\
Since E[{k) = Ei(k),Elj(k) = Eji(k), elementary column operations result ing from multiplying A on the right by Ei{k) or Eij(k) and elementary row operations resulting from multiplying A on the left by Ej(k) or Eji(k) are elementary operations of the same type (see Sec. 3.3). We know from (7) that using row operations and simultaneously using column operations of the same type, we can also reduce a symmetric matrix to a diagonal matrix. Thus we have: Theorem 6. Any symmetric matrix can be reduced to a diagonal matrix by means of elementary row operations and elementary column operations of the same type simultaneously. Thus the diagonal matrix can be found by using elementary operations. Example 5. Reduce the symmetric matrix 0 1 1 0 1 -3
1 -3 0
to a diagonal matrix by using elementary row operations and elementary col umn operations of the same type. Solution: Since the element in the first row and the first column is zero, and the element in the first row and the second column is not equal to zero, adding the second row to the first row and simultaneously the second column to the first column yields
121 V-2
1 0 -3
~2\ -3 0/
It is still a symmetrix matrix. Adding - \ times the first row to the second row, the first row to third row, — \ times the first column to the second column, the first column to the third column yields
Quadratic
(2 0 \0
Forms
0 -J -2
153
0\ -2 . -2/
The submatrix at the lower right corner is also symmetric; we can simplify it in the same way. Adding —4 times the second row to the third row, and —4 times the second column to the third column yields (2 0 \0
0 -\ 0
0\ 0 . 6/
This is the diagonal matrix required. If we require the diagonal matrix desired to be the same as in the preceding Example 3, then it suffices to multiply the second row and the second column by 2. The solution is complete. It should be noted that Example 5 is here taken as an illustration of The orem 6. In fact Theorem 6 can be proved as in Example 6. So can the general case. The proof does not differ from the above example in principle. Here Theorem 5 can be directly proved using Theorem 6, where P is the product of elementary matrices corresponding to the elementary column operations used. The above method for finding a diagonal matrix finds the matrix P as well. Since P = E\Ei ■ ■ ■ Em , rewritting P as P =
EE1---Em
and comparing it with the preceding (7), we see immediately that by using elementary row and elementary column operations of the same type, A is reduced to a diagonal matrix. At the same time, using only elementary column operations, among the elementary operations used above, the identity matrix E is reduced to the matrix P. Thus P is simply and readily found. In practical computing, we adopt a writing method similar to that for finding inverse matrices by elementary row operations in Sec. 3.3. Finding the inverse matrix A"1, we use only elementary row operations on A, so we put E on the right side of A. Here we put E under A, for P results from elementary column operations on A. Thus reducing A to a diagonal matrix is completed simultaneously with finding P, which is really killing two birds with one stone. This explains why it is necessary to study quadratic forms in terms of matrices. Problems of quadratic forms are essentially problems on matrices.
154
Linear Algebra
Example 6. Reduce A in the above example to a diagonal matrix by using elementary row operations and elementary column operations of the same type, and find the matrix P used. Solution: According to the above example, we immediately obtain
f° 11 1 0
V0 /
2
0 0
1 0 -3 0 1 0 0
1 -2 1 1
0 0
1 2
i /
V o
0
/ 02
^
0 6
0
0
2
2
0
0
0 0
1 2
0 1 0 0 -2 0
0 0
1 1
F
\ 0
02
-1 1 0
3 -1 l)
1 -1 1 1 \o 0
3 \ -1 1/
I 0 -3
2
(
-3 0
~A
(
1J
-2
l
0 -2 -2 1 1 1
0 6 1
1
1 1
I
V0
02
1 1
3 -1 l)
Vo
so that P'AP=
2 0 | 0 -2 0 0
0 0 6
The solution is complete. The diagonal matrix in the above example coincides with that in the preceding Example 3, but the matrices $P$ used are apparently different. However, multiplying the second row and the second column of the last equivalent form by $-1$, the diagonal matrix on top does not change, while the matrix under it becomes the matrix $P$ used in Example 3. This also illustrates the fact that the matrix $P$ in Theorem 5 is not unique.
In the above we considered the reduction of a symmetric matrix to a diagonal matrix by means of a nonsingular matrix. In Sec. 5.3 we shall discuss the reduction of real symmetric matrices to diagonal matrices, that is, the reduction of real quadratic forms to sums of squares by using orthogonal transformations. This problem is known as the principal axes problem of quadratic forms, or briefly the principal axes problem. It is one of the important problems in linear algebra.
Quadratic
155
Forms
Exercises 1. Determine whether the following two expressions are quadratic forms or ot: not: (1) x2 + 2y2 + 2xy + 6yz + 3,
(2) 2xiz 2xix 2 + 2x1X3 2xix 3 - 6x 2 x 3 = 0.
2. Does )oes the conclusion of Theorem 2 in Sec. 4.1 hold if X = 1PY is a singular linear near transformation? 3. Can 'an P be found by elementary row operations? 2 2 4. Write Vrite f(xi,X2,£3) = (aix\ {a\Xi + 0,2X2 + a^xz) CL3X3) ininmatrix matrixform. form leduce the following quadratic forms to the standard forms fon by using the 5. Reduce lethod of completing squares, and find the nonsingular linear lir method transforma tions ions used. (1) f(xi,x / ( x i ,2x,x23,)x 3 ) = = X?+ xl + 5 x i5xiX2-Sx x 2 - 3 x 2 x2X3, 3 , 2 a; = 2x\ 2x\ + + 5x 5X2 + + 55x1 3 ++ 4XiX (2) f(xi,X f(xi,x2,X 4£l%2 4xix3 2 -~ 42:1X3 2, 3) x3) =
(3)
-
8X2X3,
/ ( x i , X 2 , X 3 , X 4 ) = X1X2 + X2X3 + X 3 X 4 .
6. Reduce the quadratic forms given in above problem to the standard forms by elementary operations and find the matrices of the linear transformations used. n
7. If the rank of \ J aijXiXj is n — 1, and a,ij = o,i, Ann ^ 0, where Ann is nonsinguh linear the algebraic aic cofactor of element ann in A = (ay), find a nonsingular transformation ition such that nn / . ClijXiXj i,3=l o. .Let J\pt>Xi''' ' ' ' j) X -^nj nJ
=
=
=
n—1 n—1 / . QijUiUj i)J = l
^ j ^ iQijXiXj, j X i X j , &ij 2—/ Qij ==
CLjii ffx&l)'' ' i, X Qjii 51*^11 xn)n j
=
=
/ , 2-i
DgXiXj,
btj = bji. If It for any (xi,...,x ( x i , . . n. ),, x n ), f{x / ( xx,...,x i , . .n.), xn) = g(x g{xx,..., xn), pre prove that x, ...,xn), &ij
=
0{j.
9. Let A = I J:
.
I , B = ( „x
_ J. If A\ and A2 are congruent to B\
and B2 respectively, prove that A is congruent to B. 4.2. Classification of Real Quadratic Forms In the preceding section we have considered the general quadratic forms, i.e., quadratic forms whose coefficients are complex numbers.
156
Linear Algebra
In this section we shall consider quadratic forms whose coefficients are all real numbers, i.e., in the realm of real numbers. We shall discuss the standard forms, classification of quadratic forms and their criterions. As for complex quadratic forms, we shall discuss some special cases in Chapter 8. The results to be obtained are actually the same as in this chapter. First we discuss the standard forms of real quadratic forms. We shall con fine our discussion to the realm of real numbers and require the coefficients of the linear transformations involved to be real. It is noted that the conclusions reached for general quadratic forms in the preceding section need not be true for the case-of real numbers. It is easy to see from the method of reducing to sums of squares that the coefficients o j , . . . , ar in Theorem 3 in the preceding section result from four arithmetic operations. Hence, if the coefficients of / and the elements of nonsingular matrix P are real numbers, obviously 01,02,... ,ar are all real numbers. Therefore, Theorem 3 in the preceding section is also true in realm of real numbers. Note that although ai, 0,2, ■ ■ ■, ar are all real numbers, they need not be positive numbers, and so the coefficients of the linear transformation (6) in the preceding section are not necessarily real numbers, therefore here we do not adopt (6). For convenience we shall assume that 0 1 , . . . ,Ofe are positive numbers and the remaining ak+i, ■ ■ ■ ,ar are negative numbers. We can then take the nonsingular real linear transformation as '
2/1 =
< Vk+l
=
1
I
w< = ^
-
^ST*l' y-ait+i
Zfc+li
Z k
>
• • -• , ■> yr - 7 = Zr= Z r ,' Ur = — y/—a^ v O-T
Vr+1 = = ZT+1, >. 2/r+l -2r+l)
• • • ,)
= ^nZn-. Vn ~
(5) in the preceding section now becomes / = zl H
1" zk ~ zk+l
""
z
r ■
Such a simple form is called the standard form of a real quadratic form / . For instance, in Example 3 in the preceding section, / is reduced to / = 1z\ - 2z\ + 6zf by a nonsingular linear transformation. Furthermore by the nonsingular real linear transformation
21 =
1
Wu
Z2 =
1
1
7^tz'Z3 = 7e 3'
Quadratic
Forms
157
/ is reduced to / — £j — t2 + 1 3 , which is the standard form of / . Thus from the above discussion we have: T h e o r e m 1. By a nonsingular real linear transformation X = PY,
(1)
a real quadratic form / =
X'AX
of rank r can be reduced to the standard form
/ = y\ + . . . + y\ - y2k+1
(2)
In analytic geometry the transformation x' = a\x + biy + c\z + di, < y' = a2x + b2y + c2z + d2 , z' = a3x + b3y + c3z + d3 , which transforms a point (x, y, z) in one affine coordinate system (oblique coordinate system) into a point (x',y',z') in another affine coordinate sys tem, is called a spatial affine coordinate transformation, if the determinant of its matrix ai 6i c\ a-i &2 c2 ■£ 0 a3
b3
c3
does not vanish. Example 1. Reduce the equation of a quadratic surface 2xy + 2xz-6yz
+ 2x + 2y + 2 = 0
to the simplest form by affine coordinate transformation. Solution: From Example 3 in the preceding section we know that by using the affine coordinate transformation x = x' + y' + 3z', y = x' -y' - z', z= z>,
the equation of the surface can be reduced to

2x'^2 - 2y'^2 + 6z'^2 + 4x' + 4z' + 2 = 0 ,

or

3(x' + 1)^2 - 3y'^2 + 9(z' + 1/3)^2 - 1 = 0 .

Then by using the affine coordinate transformation

x'' = sqrt(3) x' + sqrt(3) ,   y'' = sqrt(3) y' ,   z'' = 3z' + 1 ,

the equation given can be reduced to the standard form

x''^2 - y''^2 + z''^2 - 1 = 0 .
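The substitutions above are easy to check by machine. The following sketch is an added check, not part of the original text; it assumes the sympy library, and the symbol names xp, yp, zp stand for x', y', z'.

    import sympy as sp

    xp, yp, zp = sp.symbols('xp yp zp')        # x', y', z'
    x = xp + yp + 3*zp
    y = xp - yp - zp
    z = zp

    surface = 2*x*y + 2*x*z - 6*y*z + 2*x + 2*y + 2
    print(sp.expand(surface))                  # 2*xp**2 - 2*yp**2 + 6*zp**2 + 4*xp + 4*zp + 2

    xpp = sp.sqrt(3)*(xp + 1)                  # x'' = sqrt(3) x' + sqrt(3)
    ypp = sp.sqrt(3)*yp                        # y'' = sqrt(3) y'
    zpp = 3*zp + 1                             # z'' = 3 z' + 1
    # 3/2 times the reduced equation equals x''^2 - y''^2 + z''^2 - 1, so the difference is 0:
    print(sp.expand(sp.Rational(3, 2)*surface - (xpp**2 - ypp**2 + zpp**2 - 1)))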
We have seen that the standard form of a general quadratic form is unique. We know from the following law of inertia that the standard form of a real quadratic form is also unique.

Theorem 2. In the standard form (2) of a real quadratic form of rank r, the number k of positive terms is determined uniquely. Hence the number r - k of negative terms is also determined uniquely.

Proof. Suppose that by means of the two nonsingular real linear transformations

(x_1, ..., x_n)' = P (y_1, ..., y_n)' ,    (x_1, ..., x_n)' = Q (z_1, ..., z_n)' ,

f is reduced to two standard forms respectively:

f = y_1^2 + ... + y_k^2 - y_{k+1}^2 - ... - y_r^2 ,    (3)

f = z_1^2 + ... + z_l^2 - z_{l+1}^2 - ... - z_r^2 ,    (4)

where

(z_1, ..., z_n)' = Q^{-1} P (y_1, ..., y_n)' ,

i.e.,

z_1 = b_{11} y_1 + ... + b_{1n} y_n ,
...................................
z_n = b_{n1} y_1 + ... + b_{nn} y_n .
We next verify k = l by an absurdity argument. Assume first k > l. In the system of linear equations obtained from z_1 = 0, ..., z_l = 0, y_{k+1} = 0, ..., y_n = 0, i.e.,

b_{11} y_1 + ... + b_{1n} y_n = 0 ,
...................................
b_{l1} y_1 + ... + b_{ln} y_n = 0 ,
y_{k+1} = 0 ,
...................................
y_n = 0 ,    (5)

the number of equations, l + (n - k), is less than n (the number of unknowns). According to Theorem 1 in Sec. 2.2, this linear system has nonzero solutions. Substituting such a solution into (3), since y_1, y_2, ..., y_k are then not all zero, we have

f = y_1^2 + ... + y_k^2 > 0 .

Again substituting this solution into (4), since z_1, z_2, ..., z_l are zeros, we have

f = -z_{l+1}^2 - ... - z_r^2 ≤ 0 .

Clearly this is a contradiction, so the assumption k > l is wrong. In exactly the same way the assumption k < l also leads to a contradiction. Hence k = l. The theorem is proved.

In the standard form (2), the number of positive terms and the number of negative terms are determined by f itself, independently of the method used for reducing f to the standard form. The number k of positive terms of f is called the positive index of inertia, and the number r - k of negative terms of f is called the negative index of inertia. The sum of the positive index of inertia and the negative index of inertia is clearly equal to the rank r. Their difference is called the signature of the real quadratic form. For instance, in Example 2 in the preceding section, the rank of the real quadratic form is 2, the positive index of inertia is 2, the negative index of inertia is 0, and hence the signature is 2. In Example 3, the rank is 3, the positive index of inertia is 2, the negative index of inertia is 3 - 2 = 1, and the signature is 2 - 1 = 1. Thus a nonsingular real linear transformation does not alter the standard form of a real quadratic form, and so its positive and negative indices of inertia are both unaltered.

Note that two complex quadratic forms can be transformed into each other if and only if their ranks are equal. But this is not the case for two real
quadratic forms. The positive indices of inertia of two real quadratic forms of equal ranks are not necessarily equal; hence these two quadratic forms need not have the same standard form and thus may not be transformable into each other. Two real quadratic forms can be transformed into each other if and only if their ranks are equal and their positive indices of inertia are also equal; or, equivalently, if their positive indices of inertia are equal and their negative indices of inertia are also equal.

In the above we have discussed the standard forms of real quadratic forms; we shall next classify real quadratic forms according to their standard forms.

If k = r = n in (2), that is, the positive index of inertia of f(x_1, ..., x_n) is n, then (2) becomes

f = y_1^2 + ... + y_n^2 .

Substituting any (x_1, ..., x_n) ≠ 0 into (1), we obtain (y_1, ..., y_n) ≠ 0 because P is a nonsingular matrix. Thus

f = y_1^2 + ... + y_n^2 > 0 .

If k = 0, r = n in (2), that is, the negative index of inertia of f(x_1, ..., x_n) is n, then (2) becomes

f = -y_1^2 - ... - y_n^2 ,

and for any (x_1, ..., x_n) ≠ 0 we have

f = -y_1^2 - ... - y_n^2 < 0 .
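By the law of inertia, the positive and negative indices can also be read off from the eigenvalues of the symmetric coefficient matrix. The sketch below is an added illustration, not part of the original text; it assumes numpy, and the test matrix is the one belonging to 2xy + 2xz - 6yz from Example 3 of the preceding section, for which the text above states rank 3, positive index 2 and signature 1.

    import numpy as np

    def inertia(A, tol=1e-10):
        """Positive index, negative index and signature of a real symmetric A."""
        eigvals = np.linalg.eigvalsh(A)
        p = int(np.sum(eigvals > tol))       # positive index of inertia
        q = int(np.sum(eigvals < -tol))      # negative index of inertia
        return p, q, p - q                   # signature s = p - q

    # Symmetric matrix of 2xy + 2xz - 6yz.
    A = np.array([[0.0, 1.0, 1.0],
                  [1.0, 0.0, -3.0],
                  [1.0, -3.0, 0.0]])
    print(inertia(A))    # (2, 1, 1)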
A real quadratic form f(x_1, ..., x_n) is said to be positive definite if for any (x_1, ..., x_n) ≠ 0 we always have f(x_1, ..., x_n) > 0. A real quadratic form is said to be negative definite if for any (x_1, ..., x_n) ≠ 0 we always have f(x_1, ..., x_n) < 0. f(x_1, ..., x_n) is said to be indefinite if for some (x_1, ..., x_n) ≠ 0 we have f(x_1, ..., x_n) > 0, and for other (x_1, ..., x_n) ≠ 0 we have f(x_1, ..., x_n) < 0. That is to say, if f is neither positive definite nor negative definite, f is an indefinite quadratic form.

We know from the definition that k = n and r = n are sufficient conditions for f to be positive definite, and that k = 0 and r = n are sufficient conditions for f to be negative definite. We shall next verify that these conditions are also necessary.

Theorem 3. A real quadratic form f(x_1, ..., x_n) is positive definite if and only if its positive index of inertia is n.
Proof. The sufficient condition has been proved in the above.
The necessary condition will be proved by the method of contradiction as follows. Suppose that f(x_1, ..., x_n) is positive definite and its standard form is

f = y_1^2 + ... + y_k^2 - y_{k+1}^2 - ... - y_r^2 ,

where y_i = p_{i1} x_1 + ... + p_{in} x_n, i = 1, ..., n. If its positive index of inertia k < n, then the system of linear equations y_1 = 0, ..., y_k = 0, i.e.,

p_{11} x_1 + ... + p_{1n} x_n = 0 ,
...................................
p_{k1} x_1 + ... + p_{kn} x_n = 0 ,

has a nonzero solution x_1 = a_1, ..., x_n = a_n. Substituting this solution in y_i = p_{i1} x_1 + ... + p_{in} x_n, we obtain values y_i = b_i, i = k + 1, ..., r. Thus

f(a_1, ..., a_n) = -b_{k+1}^2 - ... - b_r^2 ≤ 0 .

This conflicts with the assumption, and so k = n. The theorem is proved.

If f(x_1, ..., x_n) is negative definite, then -f(x_1, ..., x_n) is positive definite. Hence we have -f(x_1, ..., x_n) = y_1^2 + ... + y_n^2, or f(x_1, ..., x_n) = -y_1^2 - ... - y_n^2, so the negative index of inertia of f is n. Thus we obtain:

Theorem 4. A real quadratic form f(x_1, ..., x_n) is negative definite if and only if its negative index of inertia is n.

For instance, the quadratic forms in Examples 2 and 3 in the preceding section are neither positive definite nor negative definite, hence they are indefinite. Since a nonsingular real linear transformation does not alter the indices of inertia, a real quadratic form transformed by a real nonsingular linear transformation remains positive definite, negative definite, or indefinite accordingly.

Example 2. Is the quadratic form f = -5x^2 - 6y^2 - 4z^2 + 4xy + 4xz positive definite or negative definite?

Solution: By using the method of completing squares, f can be rewritten as
f = -(2z - x)^2 - (2x - y)^2 - 5y^2 .
For any (x, y, z) ≠ 0, the quantities 2z - x, 2x - y, y are obviously not all zero, so we have f < 0. Hence f is negative definite.

Example 3. Prove that the real quadratic form

f = 2 Σ_{i=1}^{n} x_i^2 + 2 Σ_{i=1}^{n-1} x_i x_{i+1}

is positive definite.

Proof. Using the method of completing squares, we easily see that

f = x_1^2 + (x_1 + x_2)^2 + (x_2 + x_3)^2 + ... + (x_{n-1} + x_n)^2 + x_n^2 .

Obviously f ≥ 0. If f = 0, then x_1 = 0, x_1 + x_2 = 0, x_2 + x_3 = 0, ..., x_{n-1} + x_n = 0, x_n = 0, and so every x_i = 0. That is, when (x_1, ..., x_n) ≠ 0 we have f > 0. Hence f is positive definite. The proof is complete.

Example 4. Prove that a real quadratic form f(x_1, ..., x_n) can be written as the product of two real linear forms if and only if the rank of f is 2 and the signature is 0, or the rank of f is 1.

Proof. Assume that f = (a_1 x_1 + ... + a_n x_n)(b_1 x_1 + ... + b_n x_n) and that (a_1, ..., a_n), (b_1, ..., b_n) are linearly independent. Without loss of generality we may assume that a_1, a_2 are not proportional to b_1, b_2. Thus by the nonsingular linear transformation

y_1 = a_1 x_1 + a_2 x_2 + ... + a_n x_n ,
y_2 = b_1 x_1 + b_2 x_2 + ... + b_n x_n ,
y_3 = x_3 ,
............
y_n = x_n ,

f can be reduced to f = y_1 y_2 .
Hence the rank of f is 2 and the signature is 0. If (a_1, ..., a_n), (b_1, ..., b_n) are linearly dependent, suppose that (b_1, ..., b_n) = k(a_1, ..., a_n) and a_1 ≠ 0; then f can be reduced to f = k y_1^2 by the nonsingular linear transformation

y_1 = a_1 x_1 + ... + a_n x_n ,
y_2 = x_2 ,
............
y_n = x_n .

Thus the rank of f is 1. Hence the necessary condition holds.

We shall next prove the sufficient condition. Assume that the rank of f is 2 and the signature is 0; then

f = y_1^2 - y_2^2 = (y_1 + y_2)(y_1 - y_2) .

Thus f is the product of two linear forms in x_1, ..., x_n. If the rank of f is 1, then f = ±y_1^2. Hence f is also the product of two linear forms, and the sufficient condition holds. The statement of the example holds.

Using a quadratic form to find the maxima and minima of functions of several variables is one of its important applications. From Taylor's (B. Taylor, 1685-1731) formula we can easily see that if f_x(x_0, y_0) = 0 and f_y(x_0, y_0) = 0, while h and k are very small, the difference between f(x_0 + h, y_0 + k) - f(x_0, y_0) and the quadratic form

(1/2) ( h^2 ∂^2/∂x^2 + 2hk ∂^2/∂x∂y + k^2 ∂^2/∂y^2 ) f(x_0, y_0)    (6)
is an infinitesimal of higher order. Therefore f(x, y) has a minimum at (x_0, y_0) if (6) is positive definite; f(x, y) has a maximum at (x_0, y_0) if (6) is negative definite; and f(x, y) has neither a maximum nor a minimum at (x_0, y_0) if (6) is indefinite.

Example 5. Find the maxima and minima of the following function:

f(x, y) = 3axy - x^3 - y^3   (a > 0).
Solution: Solving the system of equations
∂f/∂x = 3ay - 3x^2 = 0 ,
∂f/∂y = 3ax - 3y^2 = 0 ,

we readily obtain the critical points x = 0, y = 0 and x = a, y = a. Consider the derivatives of order 2:

∂^2f/∂x^2 = -6x ,   ∂^2f/∂x∂y = 3a ,   ∂^2f/∂y^2 = -6y .
If x_0 = 0, y_0 = 0, then the quadratic form (6) becomes 3ahk. It is indefinite, hence f(x, y) has no maximum or minimum at (0, 0). If x_0 = a, y_0 = a, then (6) becomes -3a(h^2 - hk + k^2), which is negative definite. Therefore f(x, y) has a maximum at (a, a). The solution is complete.

As mentioned above, from the definition of real quadratic forms, or by reducing them to the standard form, we can determine whether they are positive definite, negative definite, or indefinite. We next introduce a method for determining whether a real quadratic form is positive definite, negative definite, or indefinite with the help of determinants, which is much simpler and is embodied in the following theorem:

Theorem 5. A real quadratic form f(x_1, ..., x_n) = X'AX is positive definite if and only if the principal minors of the matrix A = (a_{ij}) in the upper left corner are all positive, that is,

a_{11} > 0 ,   |a_{11} a_{12}; a_{21} a_{22}| > 0 ,   |a_{11} a_{12} a_{13}; a_{21} a_{22} a_{23}; a_{31} a_{32} a_{33}| > 0 ,   ... ,   |a_{11} ... a_{1n}; ... ; a_{n1} ... a_{nn}| > 0 .    (7)
Proof. Let us first prove the necessary condition. Suppose that f(x_1, ..., x_n) is positive definite; then there is a nonsingular linear transformation X = PY such that

f(x_1, ..., x_n) = X'AX = Y'(P'AP)Y = y_1^2 + ... + y_n^2 .
Therefore P'AP = E. Hence |P'| |A| |P| = 1, i.e., |P|^2 |A| = 1, and as |P|^2 > 0, we get |A| > 0. That is to say, the determinant of the coefficient matrix of a positive definite quadratic form f satisfies

|a_{11} ... a_{1n}; ... ; a_{n1} ... a_{nn}| > 0 .

Using the above inequality, we next prove that the principal minors in the upper left corner of every other order,

A_k = |a_{11} ... a_{1k}; ... ; a_{k1} ... a_{kk}| ,   k = 1, 2, ..., n - 1 ,

are also positive. Substituting (x_1, ..., x_k, 0, ..., 0) in f(x_1, ..., x_n) = X'AX, we obtain

(x_1 ... x_k 0 ... 0) A (x_1 ... x_k 0 ... 0)' = (x_1 ... x_k) A_k (x_1 ... x_k)' .
It is a quadratic form in the variables x_1, x_2, ..., x_k. Since f is positive definite, it is also positive definite. We know from the result obtained above that A_k > 0. Hence (7) holds. This proves the necessary condition of the theorem.

We next prove the sufficient condition by induction on n. We consider first the case n = 1. Obviously we have f = a_{11} x_1^2. When a_{11} > 0, f is positive definite and so the sufficient condition holds. Now assume by induction that the sufficient condition holds for n - 1. We next prove that it holds for n. Suppose (7) holds. We rewrite the quadratic form

f(x_1, ..., x_n) = Σ_{i,j=1}^{n} a_{ij} x_i x_j

as

f(x_1, ..., x_n) = (1/a_{11}) (a_{11} x_1 + ... + a_{1n} x_n)^2 + Σ_{i,j=2}^{n} b_{ij} x_i x_j ,
where b_{ij} = a_{ij} - a_{1i} a_{1j} / a_{11}. Since a_{ij} = a_{ji}, we have b_{ij} = b_{ji}. If we can prove that the quadratic form Σ_{i,j=2}^{n} b_{ij} x_i x_j is positive definite, then f(x_1, ..., x_n) is also positive definite and therefore the theorem holds. From Property 2 of determinants we obtain
ai2
■■■ an
on
012
•••
an
&21
«22
• ■ ■ 0,2%
0
622
•••
&2»
O-il
Oj2
■ • ■
0
6j2
• • ■
bu
On
622
■••
b2i
bi2
...
>0,
i = 2, 3,.
i = 2, 3 , . . . , n .
= an bu
Prom (7) we obtain 622
■ •
I >2i
bi2
••
I Hi
,n.
Hence by the induction hypothesis, we know that the quadratic form in the n - 1 variables, Σ_{i,j=2}^{n} b_{ij} x_i x_j, is positive definite. So the sufficient condition holds, and thus the theorem is proved.

Since f(x_1, ..., x_n) is negative definite if and only if -f(x_1, ..., x_n) is positive definite, from Theorem 5 above we have:

Theorem 6. A real quadratic form f(x_1, ..., x_n) = X'AX is negative definite if and only if the signs of the principal minors in the upper left corner of the matrix A = (a_{ij}) alternate, beginning with a negative sign, as the orders of the minors increase, i.e.,

a_{11} < 0 ,   |a_{11} a_{12}; a_{21} a_{22}| > 0 ,   |a_{11} a_{12} a_{13}; a_{21} a_{22} a_{23}; a_{31} a_{32} a_{33}| < 0 ,   ... ,   (-1)^n |a_{11} ... a_{1n}; ... ; a_{n1} ... a_{nn}| > 0 .
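The alternating-sign test of Theorem 6 is equally mechanical. The fragment below is an added illustration, not part of the original text; it assumes numpy and uses the matrix of the form from Example 2 above (the same matrix reappears in Example 7 below), whose leading principal minors are -5, 26, -80.

    import numpy as np

    def leading_minors(A):
        """Determinants of the upper-left k-by-k corners, k = 1..n."""
        return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

    # Symmetric matrix of f = -5x^2 - 6y^2 - 4z^2 + 4xy + 4xz.
    A = np.array([[-5.0,  2.0,  2.0],
                  [ 2.0, -6.0,  0.0],
                  [ 2.0,  0.0, -4.0]])

    minors = leading_minors(A)
    print(np.round(minors, 6))                                   # [-5., 26., -80.]
    # Theorem 6: the k-th minor must have sign (-1)^k.
    print(all((-1) ** (k + 1) * m < 0
              for k, m in enumerate(minors, start=1)))           # True, so A is negative definite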
After the order of the variables is changed, the positive definiteness and the negative definiteness of a quadratic form are not altered, so Theorems 5 and 6 can be written in a more general form:

Theorem 7. A real quadratic form f(x_1, ..., x_n) = X'AX is positive definite if and only if all the principal minors of its matrix A are positive; and f is negative definite if and only if all the principal minors of A whose orders are odd numbers are less than zero and all the principal minors of A whose orders are even numbers are greater than zero.

For simplicity, the above classification and properties of real quadratic forms are also frequently stated in terms of matrices. When a real quadratic form f = X'AX is positive definite, negative definite, or indefinite, the real symmetric matrix A is called a positive definite matrix, a negative definite matrix, or an indefinite matrix respectively. Hence if A is a positive definite matrix, then A is congruent to E, i.e., P'AP = E; if A is a negative definite matrix, then A is congruent to -E, i.e., P'AP = -E. The converse is also true. Moreover we easily see that if P is a real nonsingular matrix, then the properties that a real symmetric matrix A is a positive definite matrix, a negative definite matrix, or an indefinite matrix coincide with those of P'AP.

It follows from Theorem 5 and Theorem 6 that a real symmetric matrix A is positive definite if and only if the principal minors of A in the upper left corner are greater than zero; A is negative definite if and only if the principal minors in the upper left corner of A whose orders are odd numbers are less than zero and those whose orders are even numbers are greater than zero. Likewise we have general results analogous to Theorem 7 too.

Example 6. Prove that the matrix

A = ( 1  1  1
      1  2  3
      1  3  6 )

is a positive definite matrix.

Proof. Since the principal minors in the upper left corner of A,

1 ,   |1 1; 1 2| = 1 ,   |A| = 1 ,

are all positive, A is a positive definite matrix.
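Theorem 5 is easy to apply mechanically. The fragment below is an added check, not part of the original text; it assumes numpy and computes the leading principal minors of the matrix of Example 6, reproducing the values 1, 1, 1 used in the proof.

    import numpy as np

    def leading_minors(A):
        return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

    A = np.array([[1.0, 1.0, 1.0],
                  [1.0, 2.0, 3.0],
                  [1.0, 3.0, 6.0]])          # the matrix of Example 6

    minors = leading_minors(A)
    print(np.round(minors, 6))                # approximately [1., 1., 1.]
    print(all(m > 0 for m in minors))         # True, so A is positive definite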
Example 7. Prove that the matrix

A = ( -5   2   2
       2  -6   0
       2   0  -4 )

is a negative definite matrix.

Proof. Since the principal minors in the upper left corner of A are -5, 26, -80, A is a negative definite matrix. This coincides with the result in Example 2.

Moreover, in Example 3 of the previous section, the matrix

(  0   1   1
   1   0  -3
   1  -3   0 )

is an indefinite matrix, as a_{11} = 0.

Example 8. Prove that a real symmetric matrix A is a positive definite matrix if and only if A = B'B, where B is a real nonsingular matrix.

Proof. If A is a positive definite matrix, then there is a nonsingular matrix P such that P'AP = E. Hence A = (P')^{-1} P^{-1} = (P^{-1})' P^{-1}, so the necessary condition holds. Conversely, if A = B'B, then (B')^{-1} A B^{-1} = E, or (B^{-1})' A B^{-1} = E, as B is a nonsingular matrix. Hence A is a positive definite matrix. Thus the sufficient condition holds. The proof is complete.

Example 9. If A = (a_{ij}) and B = (b_{ij}) are positive definite matrices of order n, prove that C = (a_{ij} b_{ij}) is also a positive definite matrix.

Proof. Since B is positive definite, we have B = P'P, where P = (p_{ij}) is a nonsingular matrix. Hence b_{ij} = Σ_k p_{ki} p_{kj}. Let X = (x_1, ..., x_n)' ≠ 0. Then
X'CX = Σ_{i,j} a_{ij} b_{ij} x_i x_j = Σ_{i,j} a_{ij} ( Σ_k p_{ki} p_{kj} ) x_i x_j
     = Σ_k ( Σ_{i,j} a_{ij} y_{ki} y_{kj} ) = Σ_k Y_k' A Y_k ,

where

Y_k = (y_{k1}, ..., y_{kn})' ,   y_{ki} = p_{ki} x_i .
Since PX = (Σ_i p_{1i} x_i, ..., Σ_i p_{ni} x_i)' ≠ 0, the vectors Y_1, ..., Y_n are not all zero. Therefore Σ_k Y_k' A Y_k > 0, i.e., X'CX > 0. Thus C is a
positive definite matrix. The proof is complete.

Exercises

1. Assume that substituting any x_1 ≠ 0, ..., x_n ≠ 0 in a quadratic form f(x_1, ..., x_n) gives f > 0. Is f positive definite?
2. Judge whether or not the following quadratic forms are positive definite or negative definite:
   (1) -5x_1^2 - 6x_2^2 - 4x_3^2 + 4x_1x_2 + 4x_1x_3 ,
   (2) x_1^2 + x_2^2 + 14x_3^2 + 7x_4^2 + 6x_1x_3 + 4x_1x_4 - 4x_2x_3 + 2x_2x_4 + 4x_3x_4 .
3. Find the values of λ for which the quadratic form f(x, y, z, w) = λ(x^2 + y^2 + z^2) + 2xy - 2yz + 2zx + w^2 is positive definite, and discuss the case λ = 2.
4. If the positive index of inertia and the negative index of inertia of a real quadratic form are k and l respectively, a_1, ..., a_k are any k positive numbers, and b_1, ..., b_l are any l negative numbers, show that this quadratic form can be reduced to a_1 y_1^2 + ... + a_k y_k^2 + b_1 y_{k+1}^2 + ... + b_l y_{k+l}^2 .
5. Prove that the rank r and the signature s of a real quadratic form are both even numbers or both odd numbers, and that -r ≤ s ≤ r.
6. Assume that a real symmetric matrix A = (a_{ij}) is positive definite and that b_1, b_2, ..., b_n are any n nonzero real numbers; show that B = (a_{ij} b_i b_j) is also a positive definite matrix.
7. Show that if the matrix (a_{ij}) is positive definite, then a_{ii} > 0, and if (a_{ij}) is negative definite, then a_{ii} < 0, where i = 1, 2, ..., n.
8. Suppose A is a positive definite matrix. Prove that A', A^{-1}, A* are also positive definite matrices.
9. If A is a nonsingular real matrix, show that A'A is a positive definite matrix.
10. If M is a nonsingular real matrix and A is a real symmetric matrix, prove that the positive definiteness, negative definiteness, and indefiniteness of the matrix M'AM coincide with those of the matrix A.
11. Are A + B, A - B, and AB positive definite matrices if A and B are positive definite?
12. Prove that the principal minors of any order of a positive definite matrix are greater than zero.
13. Let a quadratic form be

f(x_1, ..., x_n) = g_1^2 + ... + g_p^2 - g_{p+1}^2 - ... - g_{p+q}^2 ,

where g_i = a_{i1} x_1 + ... + a_{in} x_n, i = 1, 2, ..., p + q. Show that the positive index of inertia of f does not exceed p, and that the negative index of inertia of f does not exceed q.
14. If α lies in the given range, determine the classification of the real quadratic form

f(x_1, ..., x_n) = cos α · x_1^2 + 2 cos α · Σ_{i=2}^{n} x_i^2 + 2 Σ_{i=1}^{n-1} x_i x_{i+1} .
*15. Suppose Σ_{i,j=1}^{n} a_{ij} x_i x_j is a positive definite quadratic form. Prove that

Σ_{i,j=2}^{n} |a_{11} a_{1j}; a_{i1} a_{ij}| x_i x_j

is also a positive definite quadratic form.

*4.3. Bilinear Forms

Let f(x_1, ..., x_n; y_1, ..., y_n) be a polynomial in two sets of variables x_1, ..., x_n and y_1, ..., y_n. If for each set of variables f is a linear form, then f is called a bilinear form. For example, if f(x_1, ..., x_n) is a linear form in x_1, ..., x_n and g(y_1, ..., y_n) is a linear form in y_1, ..., y_n, then f(x_1, ..., x_n) g(y_1, ..., y_n) is a bilinear form in x_1, ..., x_n; y_1, ..., y_n. In general a bilinear form f can be written as

f = Σ_{i,j=1}^{n} a_{ij} x_i y_j = X'AY ,

where

X = (x_1, ..., x_n)' ,   Y = (y_1, ..., y_n)' ,   A = (a_{ij}) .
Since f is determined uniquely by A, A is called the matrix of f. The rank of A is called the rank of f. When y_i = x_i, f becomes a quadratic form. Thus the quadratic form is a special case of the bilinear form. The following fundamental properties are completely analogous to those of quadratic forms.

When X, Y in a bilinear form f(X, Y) = X'AY are transformed by nonsingular linear transformations with matrices P and Q respectively, the bilinear form f(X, Y) becomes a bilinear form whose matrix is P'AQ. Its proof is completely analogous to that in the preceding section.

If the matrix A of f(X, Y) = X'AY is a symmetric matrix, then f is called a symmetric bilinear form, and evidently f(X, Y) = f(Y, X). If A is a skew-symmetric matrix, then f is called a skew-symmetric bilinear form and f(X, Y) = -f(Y, X). The converse is also true. If X, Y in a bilinear form f(X, Y) are transformed by the same nonsingular linear transformation, then a symmetric bilinear form remains a symmetric bilinear form, and a skew-symmetric bilinear form remains a skew-symmetric bilinear form.

If the rank of A is r, Theorem 5 in Sec. 3.3 gives nonsingular matrices P and Q such that PAQ = diag(1, ..., 1, 0, ..., 0). Hence if the rank of f(X, Y) = X'AY is r, then there exist nonsingular linear transformations such that f becomes the standard form x_1' y_1' + ... + x_r' y_r', i.e.,

f = x_1' y_1' + ... + x_r' y_r' .
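A small numerical check of the transformation rule just stated, added here and not part of the original text (numpy assumed; all matrices below are made-up examples): if X = PX' and Y = QY' with nonsingular P and Q, then X'AY equals the bilinear form of the new variables whose matrix is P'AQ.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 3
    A = rng.integers(-3, 4, size=(n, n)).astype(float)           # matrix of the bilinear form
    P = np.array([[1., 0., 0.], [2., 1., 0.], [0., 0., 1.]])     # nonsingular
    Q = np.array([[1., 1., 0.], [0., 1., 0.], [0., 0., 2.]])     # nonsingular

    X1 = rng.standard_normal(n)
    Y1 = rng.standard_normal(n)
    X, Y = P @ X1, Q @ Y1        # X = P X',  Y = Q Y'

    lhs = X @ A @ Y              # f(X, Y) = X'AY
    rhs = X1 @ (P.T @ A @ Q) @ Y1
    print(np.isclose(lhs, rhs))  # True: the new matrix is P'AQ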
If the matrix of f is a real matrix, then f is called a real bilinear form. Like real quadratic forms, real symmetric bilinear forms also satisfy the law of inertia. That is, in the standard form of f,

f = x_1' y_1' + ... + x_k' y_k' - x_{k+1}' y_{k+1}' - ... - x_r' y_r' ,

the number k of positive terms is determined uniquely, and the number r - k of negative terms is also determined uniquely. Hence, as for real quadratic forms, for real bilinear forms we also have the concepts of positive index of inertia and negative index of inertia.

Exercises

1. Reduce the bilinear form f(X, Y) = x_1 y_1 + 2x_1 y_2 - x_2 y_1 - x_2 y_2 + 6x_1 y_3 to the standard form.
2. Assume that f(X, Y) is a bilinear form. Prove that

f(k_1 X_1 + k_2 X_2, Y) = k_1 f(X_1, Y) + k_2 f(X_2, Y) ,
f(X, k_1 Y_1 + k_2 Y_2) = k_1 f(X, Y_1) + k_2 f(X, Y_2) .
CHAPTER 5 MATRICES SIMILAR TO DIAGONAL MATRICES
In the previous chapter we demonstrated that any symmetric matrix is congruent to a diagonal matrix, i.e., for any symmetric matrix A there is an invertible matrix P such that P'AP is a diagonal matrix; the background of this result is the theory of quadratic forms. In this chapter we shall consider matrices that are similar to diagonal matrices, where the matrices under discussion are square but not necessarily symmetric. This type of problem arises from the treatment of linear transformations, which will be discussed later. Normally, the content of this chapter should come after linear transformations. However, as the content of this chapter deals mainly with calculations, not new concepts, we put it ahead of linear transformations for the sake of continuity with the previous chapters.

First we shall give the necessary and sufficient conditions for a matrix to be similar to a diagonal matrix, and then discuss how to reduce real symmetric matrices and orthogonal matrices, two kinds of very important matrices, to diagonal matrices. In the process of this discussion we need the concepts of eigenvalues, eigenvectors, λ-matrices, etc., as well as their basic properties. Besides, the concept of the minimal polynomial is also necessary. These concepts are themselves very important and useful.

In this chapter we shall discuss the following four problems in detail:

1. Concepts of eigenvalues and eigenvectors, and their basic properties.
2. The necessary and sufficient condition for a matrix to be similar to a diagonal matrix, and the method of reducing matrices to diagonal matrices.
3. Methods for reducing real symmetric matrices and orthogonal matrices to diagonal matrices.
4. Minimal polynomials of matrices.

The chapter consists of five sections. In the first four we deal with the first three problems, and in the last section we discuss the fourth problem.

5.1. Eigenvalues and Eigenvectors

What kind of matrices can be similar to diagonal matrices? This section and the next section will discuss this question.

We say that B is similar to A, denoted by A ~ B, if there is an invertible matrix P such that

B = P^{-1} A P .
From Example 2 in Sec. 3.2 we know that the only matrices similar only to themselves are the scalar matrices, for example the zero matrix and the identity matrix. The ranks of similar matrices are equal. From the definition we easily see that the similarity relation also satisfies the three laws of equivalence relations; therefore similarity is an equivalence relation.

From B = P^{-1}AP we have P^{-1}A = BP^{-1} = Q, and hence

A = PQ ,   B = QP .

That is to say, if A is similar to B, then there are a matrix Q and a nonsingular matrix P such that the above relations hold.

Example 1. Prove that similar matrices have the same trace. That is, when a matrix is transformed into a similar matrix, its trace remains unchanged.

Proof. Let B = P^{-1}AP. From Example 10 of Sec. 3.1 we immediately obtain

tr B = tr(P^{-1}AP) = tr(AP · P^{-1}) = tr A .
Suppose A = (a_{ij}) is a matrix of order n. If it is similar to a diagonal matrix, we can write

P^{-1} A P = diag(λ_1, ..., λ_n) ,

or

A P = P diag(λ_1, ..., λ_n) .

Writing the n column vectors of P as X_1, ..., X_n in their proper order, i.e., P = (X_1, X_2, ..., X_n), from Example 3 of Sec. 3.2 we get

A P = (A X_1 ... A X_n) ,   P diag(λ_1, ..., λ_n) = (λ_1 X_1 ... λ_n X_n) .

Hence we have

(A X_1 ... A X_n) = (λ_1 X_1 ... λ_n X_n) ,

and so

A X_i = λ_i X_i ,   or   (λ_i E - A) X_i = 0 ,   i = 1, ..., n .    (1)
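Relation (1) is exactly what numerical libraries compute. As an added illustration, not part of the original text (numpy assumed; the 2 x 2 matrix is an arbitrary example), the following fragment checks A X_i = λ_i X_i for every eigenpair returned by numpy.linalg.eig.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])            # any square matrix will do here

    eigvals, eigvecs = np.linalg.eig(A)   # the columns of eigvecs play the role of the X_i

    for i in range(A.shape[0]):
        X_i = eigvecs[:, i]
        # relation (1): A X_i = lambda_i X_i, i.e. (lambda_i E - A) X_i = 0
        print(np.allclose(A @ X_i, eigvals[i] * X_i))   # True for every i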
The above relation is very important. It will be explained in detail later. For convenience of quoting it we first give the following definitions.

A matrix whose elements are polynomials in a parameter λ is known as a λ-matrix. A matrix with scalar elements, as discussed before, is called a constant matrix. Of course, as a number can be considered as a polynomial of degree zero, a constant matrix can also be regarded as a λ-matrix.

Suppose A = (a_{ij}) is a constant matrix. The λ-matrix

λE - A = ( λ - a_{11}   ...    -a_{1n}
               ...      ...       ...
             -a_{n1}    ...   λ - a_{nn} )

is called the characteristic matrix of A. We know from Property 2 of a determinant that
|λE - A| = |λ - a_{11} ... -a_{1n}; ... ; -a_{n1} ... λ - a_{nn}| .

Writing each column as the sum of the corresponding column of λE and of -A, and expanding the determinant accordingly, we obtain

|λE - A| = λ^n - (a_{11} + ... + a_{nn}) λ^{n-1} + ... + (-1)^n |A| ,

i.e.,

f(λ) = |λE - A| = λ^n + a_1 λ^{n-1} + ... + a_i λ^{n-i} + ... + a_n ,

where

a_1 = -(a_{11} + ... + a_{nn}) = -tr(A) ,   a_n = (-1)^n |A| ,

and, in general, a_i is equal to the product of (-1)^i and the sum of all principal minors of order i of A. f(λ) is a monic polynomial of degree n in λ; it is said to be the characteristic polynomial of the matrix A. The equation f(λ) = 0 is called the characteristic equation of A. The solutions (roots) of the characteristic equation are known as the eigenvalues (characteristic roots) of A, and the k-multiple roots of f(λ) = 0 are called the eigenvalues (characteristic roots) of multiplicity k of A.

For example, given
A = ( 2   1   2
     -4   3   3
      2   1   5 ) ,

we have

a_1 = -(2 + 3 + 5) = -10 ,
a_2 = |3 3; 1 5| + |2 2; 2 5| + |2 1; -4 3| = 12 + 6 + 10 = 28 ,
a_3 = -|A| = -30 ,

so the characteristic polynomial of A is

f(λ) = λ^3 - 10λ^2 + 28λ - 30 .
177
Matrices
Obviously a matrix of order n has n characteristic roots, if the multiple roots are counted. Their sum is the trace of the matrix A, their product is exactly \A\. Therefore a necessary and sufficient condition for the zero to be a characteristic root is that A is singular. Assuming that the rank of A is r(< n), then ar+i = a r +2 = • • • = a„ = 0. Hence the multiplicity of a zero characteristic root of A is at least n — r. Let A be the zero matrix of order n. Then its characteristic polynomial is /(A) = A™. Hence the characteristic roots of A are all zero, i.e., the characteristic roots of the zero matrix are all zeros. Note that when A is a real matrix, /(A) is a polynomial with real co efficients. Therefore the characteristic roots of a real matrix are not all real numbers. For example I ) has no real characteristic roots. The complex characteristic roots of A appear in complex conjugate pairs. Assume Ao is a characteristic root of A. Then /(A o ) = | A 0 £ - A | = 0 . Hence the homogeneous system of linear equations {X0E - A)X = 0,
X
i.e.,
(Ao - an)xi
ainxn
= 0, (2)
-flni^i - ■ • • + (Ao - ann)xn
=0,
has nonzero solution vectors. Any one of its nonzero solution vectors is called an eigenvector of A associated with the eigenvalue Ao, or for short an eigen vector of A. If Xo is an eigenvector of A associated with the eigenvalue Ao, then (\0E-A)X0
= 0,
or AXQ = Ao-Xo. This is simply the previous expression (1). That is to say, Aj in expression (1) is an eigenvalue of A, Xi is the eigenvector of A associated with Aj. Obviously a vector cannot be an eigenvector associated with two different eigenvalues.
178
Linear Algebra
Eigenvalues and eigenvectors are two important basic concepts in linear algebra and are widely used. We need them not only here but also in other branches of mathematics, for example in differential equations. Now we put the above-mentioned aside temporarily and first study eigenvalues and eigenvectors so as to be able to solve the first problem raised in the introductuion. Example 2. Find eigenvalues and eigenvectors of the scalar matrix
A
-[ • . ) ■
Solution: Because X—a \XE -A\
=
(A-a)3,
X—a
A = o is an eigenvalue of multiplicity 3. When X = a, (2), the homogeneous system of linear equations, becomes Oxi = 0,
0x 2 = 0,
0x 3 = 0.
Since any vector is a solution vector, any nonzero vector is an eigenvector associated with the eigenvalue a. The solution is complete. In general, the eigenvalues of any scalar matrix of order n are eigenvalues of multiplicity n. Any nonzero n-dimensional vector is its eigenvector. E x a m p l e 3. Find the eigenvalues and eigenvectors of the following matrix, A
=\
(-1 -4 \V 1
1 0\ 3 0 0 2
Solution: Since the characteristic polynomial of A is \XE -A\
=
A+ l -1 4 A-3 -1 0
0 = (A-2)(A-1)2, 0 A-2
the eigenvalues of A are Ai = 2, A2 = A3 = 1.
Matrices Similar to Diagonal
Matrices
179
When Ai = 2, the homogeneous system of linear equations (2) becomes 3 x i — X2 = 0 , Ax\ - X2 = 0 , -xi = 0.
Its system of fundamental solutions is (0,0,1). All the eigenvectors of A associated with eigenvalue Ai = 2 are /c(0,0,1), k =£ 0. For the eigenvalue of multiplicity 2, the system of linear equations (2) becomes 2x\ - X2 = 0, 4x\ — 1x2 = 0, —x\ — £3 = 0 . Its system of fundamental solutions is (1,2, —1), and sofc(l,2,—l),fc^0, are all the eigenvectors associated with A2 = A3 = 1. Example 4. Suppose A is an idempotent matrix, i.e., A = A2. Prove that the eigenvalues of A equals either zero or one. Proof. Suppose A is an eigenvalue of A, X is an eigenvector of A associated with A, i.e., AX = XX. Then AX = A2X = XAX = X2X. Hence AX = A2X or (A2 - A)X = 0. As X ^ 0, A2 - A = A(A - 1) = 0. Hence A = 1 or A = 0. Thus the statement holds. Example 5. Suppose Ai and A2 are eigenvalues of A, X\ and X2 are eigenvectors of A associated with Ai and A2 respectively. If Ai 7^ A2, then Xi + X2 is not an eigenvector of A. Proof. We verify the above statement by using the method of contra diction. If Xi + X2 is an eigenvector of A associated with eigenvalue A, then A{Xi + X2) = A(Xi + X 2 ). As AXX = XXi, AX2 = XX2, we have A(Xi + X 2 ) = A1X1 + A 2 X 2 , or (A - Ax)Xi + (A - A 2 )X 2 = 0 . If A ^ Xu then Xi = ^"^ A X2 is also an eigenvector associated with A2. This is impos sible. Thus A = Ai = A2. But this contradicts the assumption that Ai ^ A2. Therefore Xi + X 2 is not an eigenvector of A. The proof is complete.
180
Linear Algebra
Prom Example 2, we know that the eigenvalues of a diagonal matrix are simply the elements on the main diagonal of the diagonal matrix. Similarly, the eigenvalues of a triangular matrix are also the elements on the main diagonal of the triangular matrix. For a block diagonal matrix we have: Theorem 1. If A is a block diagonal matrix o
(M
A=l
V\
.,
AJ
then the characteristic polynomial of A is equal < to the product of the char acteristic polynomials of A\,A2,..., Amm. . Therefore all the eigenvalues of Ai,A2,...,A A\, A2,.. ■, Am are all eigenvalues eigenvalues of of A. A. Proof. We write the le unit matrix E as c _ 1
V
Em)
where Ei is a unit matrix whose order is the same as that of Ai. From the operational rules for block matrices, we have
(XE^-Ai XE-A
\
= \
XEm ~ Am '
therefore /(A) = \XE -A\
= \XEX - Ax\ ■ ■ ■ \XEm -
Am\.
The theorem holds. A matrix has the same characteristic polynomial as its transposed matrix. Therefore their eigenvalues coincide. This is because from (XE — A)' = XE — A' we have \XE -A\
= \{XE - A)'\ = \XE - A'\.
Matrices Similar to Diagonal
181
Matrices
Consequently two distinct matrices may have the same eigenvalues. Besides we have: Theorem 2. Similar matrices have the same characteristic polynomials and consequently the same characteristic roots. Proof. Suppose A ~ B, i.e., B = P~1AP. Taking into account that the matrix XE commutes with the matrix P and that | P _ 1 | = \P\~l we have \XE -B\
= \XE - P~lAP\ 1
= | P " | \\E-A\
= \P-\XE \P\ =
- A)P\ \XE-A\.
Therefore the theorem holds. Matrices AB and BA have the same characteristic polynomial. This is because if one of A and B is nonsingular, say A, we have A~l(AB)A = BA, i.e., AB is similar to BA. Therefore they have the same characteristic polynomial. If A and B are singular matrices, the above result also holds. The following example is a more general case. Example 6. Suppose that A is an (m, n)-matrix, B is an (n, m)-matrix, and m ^ n. The characteristic polynomial of AB and BA are /AB(X), /BA(X) respectively. Prove that fAB(A) = Xm~n ■ fBA(X)
.
This is called Sylvester's theorem. Proof. When m = n, if \B\ / 0 we have proved /Us(A) = / B A ( A ) , if \B\ = 0, as in Example 5 of Sec. 3.3, the above expression also holds, i.e., whether \B\ = 0 or \B\ ^ 0, we always have }AB{X) = / B A ( A ) . When m > n, we replenish the matrix of order m with zeros, i.e., (A O) = AU(
S)-
Then
iBo) = AB,
AXBX = (A O] B^A,
(A O)
(BA
o Om-
182
Linear Algebra
where Om-n is a zero matrix of order m — n. Hence /AiBx(A) = \XE - AiBi\ = \XE fBlAl(\)
AB\,
XE
™~BA
= \XE-B1A1\= \m-n\XE-BA\.
= Prom the above conclusion
/AIBIW
=
/BIAI(A),
JAB (A) = Xm-nfBAW
we obtain ■
Hence the statement holds. Thus when m = n, eigenvalues of AB and BA coincide. When m^n, the nonzero eigenvalues of AB and BA coincide. It must be pointed out that the above methods of searching for the eigen values and eigenvectors of a matrix are not recommended for matrices of large orders because of the need to evaluate determinants. Efficient methods of find ing eigenvalues of large-order matrices are studied under numerical analysis and will not be gone into here. In the above we have considered the concepts of eigenvalues and eigenvec tors, and explained some methods for finding them. As for using eigenvalues and eigenvectors to treat problems similar to diagonal matrices, it will be dis cussed in the next section. Exercises 1. Are all solution vectors of (XQE — A)X = 0 eigenvectors of A associated with Ao? If Xi, X2,. ■ ■, Xm are eigenvectors of A associated with Ao, are any linear combination of X\, X2, ■ ■ ■, Xm eigenvectors of A? 2. When distinct matrices have distinct eigenvalues, can they have common eigenvectors? When distinct matrices have common eigenvalues, can they have common eigenvectors? 3. Find eigenvalues and eigenvectors for each of the following matrices:
(')(-„;)■
(2)
Vi o oj
(3)
1 3 0 -1 0 0 0 0
1 2 \ 1 3 2 5 0 2/
Matrices Similar to Diagonal
Matrices
183
4. Prove that the following matrices have the same characteristic poly nomials and hence the same eigenvalues: /l 3 \6
-3 -5 -6
3\ /-3 3 , -7 4/ \-6
1 5 6
-l\ -1 . -2/
5. Assume that any nonzero n-dimensional vector is an eigenvector of matrix A of order n. Prove that A is a scalar matrix. 6. Show that if A2 = E, then the eigenvalues of A are equal to either 1 or -1. 7. Prove that if A is an eigenvalue of the matrix A, then Am is an eigenvalue of Am, where m is a positive integer. If X is the eigenvector of the matrix A associated with A, is Xm the only eigenvector of Am associated with eigenvalue A m ? 8. Assume X is an eigenvector of a matrix A. Prove that X is also an eigen vector of f(A), where f(x) is a polynomial in x. What is the relationship between the eigenvalues associated with X of A and of f{A)l 9. Given the eigenvalues and eigenvectors of a nonsingular matrix A, find the eigenvalues and eigenvectors of A~1 and A*. 10. Suppose A = (a,ij) is a singular matrix. Prove that the eigenvalues of A* are a zero root of multiplicity n, or a zero root of multiplicity of n— 1 and another root An + A22 H h Ann, where An is the algebraic cofactor of an in A. 11. Prove that the eigenvalues of a positive definite matrix are not negative real numbers. 12. If X is an eigenvector of the matrix A associated with Ao, find the eigen vectors of P~XAP associated with Ao13. Suppose the sum of the n elements in any row of a matrix A of order n is equal to a. Prove that A = a is the eigenvalue of A and ( 1 , . . . , 1) is an eigenvector of A associated with A = a. *14. Suppose A is an orthogonal matrix of an odd order. If \A\ = 1, prove that 1 is an eigenvalue of A. *15. Suppose A is a positive definite matrix and B is a nonnegative definite matrix, i.e., for any X ^ 0 the quadratic form / = X'BX ^ 0. Prove that the eigenvalues of AB are nonnegative numbers. 16. If two matrices are equivalent, are they similar? If they are similar, are they equivalent? What matrices are equivalent to the identity matrix? What matrices are congruent to the identity matrix? What matrices are similar to the identity matrix?
5.2. Diagonalization of Matrices We discussed characteristic polynomial in the preceding section. In this section we shall return to the diagonalization of matrices, which was raised at outset of Sec. 5.1. If a matrix A is similar to a diagonal matrix
/Ax
\
P~lAP = An/
where P = (X\,... ,Xn), then we know from (1) in Sec. 5.1 that AXi = A^AT;. Therefore the column vectors Xi of matrix P are all eigenvectors of A. Since P is nonsingular, X\, X% ..., Xn are n linearly independent eigenvectors of A. That is to say, if the matrix A of order n is similar to a diagonal matrix, then it has n linearly independent eigenvectors. Conversely, if A has n linearly independent eigenvectors Xi,..., Xn, then AXi = \Xi (i = 1,...,n). Let P = (Xi,...,Xn). Obviously P is a nonsin gular matrix. As AP = (AXi ■ ■ ■ AXn) = (A1.Y1 • • ■ XnXn) = (*l-*n)(
) = P (
A is similar to the diagonal matrix P~1AP. theorem:
Thus we have the following basic
Theorem 1. A matrix A of order n is similar to a diagonal matrix if and only if A has n linearly independent eigenvectors. If A has n linearly independent eigenvectors X\tX%t...,Xn, AXi — \X%, P = (ATi,... ,Xn), then P~lAP
= \
An,
It is worth noting that here the order of A i , . . . , An corresponds to the order of Xi,..., Xn. If the order of A i , . . . , An is changed, then the order of X\,..., Xn is changed accordingly. Therefore at this time P is not the original matrix.
Matrices Similar to Diagonal
Matrices
185
Prom the discussion above, we also know that if a matrix A of order n is similar to a diagonal matrix, then the n elements on the main diagonal of the diagonal matrix are the n eigenvalues of A. This property directly follows from Example 2 and Theorem 2 in the previous section. Therefore the diagonal matrix similar to A is unique up to the order of elements on the main diagonal. Since a matrix of order n has at most n, but not necessarily exactly n linearly independent eigenvectors, it is not necessarily similar to a diagonal matrix. For example, in Example 2 in Sec. 5.1 the matrix A is of order 3. It has 3 linearly independent eigenvectors and hence similar to a diagonal matrix. This is obvious for the matrix itself is a diagonal matrix. Again in Example 3 of Sec. 5.1, the matrix A is of order 3. However, it has at most 2 linearly independent eigenvectors and consequently is not similar to a diagonal matrix. We shall see further examples in the following. E x a m p l e 1. Find a diagonal matrix similar to A =
4
I -
3
V-3
6 0\ -5 0 -6 1
and the matrix P used. Solution: Since \XE -A\
=
A-4 3 3
-6 0 A+5 0 6 A-l
= ( A - l ) 2 ( A + 2),
A has two eigenvalues. One is the simple root Ai = —2, the other eigenvalue A2 = A3 = 1 is of multiplicity 2. For Ai = —2, the system of equations (X\E - A) X = 0 becomes -6x1 —6x2 = 0, 3xi +3x2 = 0, 3xi +6x2 — 3x3 = 0. Its system of fundamental solutions is (—1,1,1). Thus for Aj = —2, the eigen vector of A is (-1,1,1). In a similar way, for A2 = A3 = 1 we find that the eigenvectors of A are (—2,1,0) and (0,0,1). We can readily see from Theorem 1 in Sec. 2.1 that the 3 eigenvectors above are linearly independent. Therefore A is similar to a diagonal matrix.
186
Linear Algebra
Taking eigenvectors above as column vectors and constructing /-' =-
- 1 -2 l i i 1 0
(T o 1
we obtain -2 1
r AP= or
(
1
r
i
To check the correctness of the result, we first find /
1 -1
P~1=
2 0\ - 1 0
V-l and then compute If we take
l)
-2
_1
P AP. -1 0 1 0 1 1
-2' 1 0
then P'lAP
~2
=
1
or .A-
~2
\ i
1/ I From this we again see that P is not unique. The solution is complete. Example 2. Reduce the following matrix A= to its similar diagonal matrix.
/l 1 \4
2 -1 -12
2' 1 1
Matrices Similar to Diagonal
Matrices
187
Solution: \XE -A\
=
A-1 -1 -4
-2 -2 A + 1 -1 A-1 12
A-1 -1 -4
A2- 3 0 -4(A-- 2 )
A2-3 -4(A - 2)
-A-1 0 A+ 3
-A-1 A+ 3
= A3 - A2 + A - 1 = (A - 1)(A2 + 1). Thus A has 3 simple eigenvalues X% = 1 , A2 = i , A3 = —%. A simple computation gives that for Ai = 1, the eigenvector of A is (3,1, —1), for A2 = i , A3 = —i, the eigenvectors of A are (4 + 2i, 1 + i,—4), (4 — 2i, 1 — i, —4) respectively. As they are linearly independent, A is similar to a diagonal matrix. Taking
/*
i
3 4 + 2z 1 l+i - 1 - 4
4-2i' 1-i -4
we obtain
P~lAP = or
The solution is complete. It's worth noting that if we require that a real matrix be similar to a real diagonal matrix, then we cannot reduce the matrix A above to a diagonal matrix. However, we can reduce it to a simple real block diagonal matrix as given below.
188
Linear Algebra
Let the eigenvectors of A associated with eigenvalues l,t, — i be respectively and
Y1=X1
+
X2>
y2 =
X,X\,X2
^izll. i
Obviously Y\ = (8,2,-8), Y2 = (4,2,0) are two real vectors which are linearly independent. Since AX\ = iX\, AXi = —iXi, we have AYX = AX! + AX2 = iXi - iX2 = -Y2 , AY2 = \{AXx - AX2) = \(iXi + iX2) = l i . Therefore A(X Yx Y2) = (AX AYi AY2) = (X
(\ = (X Y2 Yi)
-Y2YX)
0 0\
0 -1 0
\0
/l = {X Yx Y2)
0 1/
0
0\
0
1
.
\0 -1 0/
Thus P-XAP =
0
o
0 -1
o
where 3
F = (X Yt Y2)
1
l-l
8 4\ 2 2 -8 0/
The same procedure applies to the general case. This is because complex eigenvalues of a real matrix appear in complex conjugate pairs. In the same way, we reduce each pair of complex eigenvectors to real vectors (in general they are no longer eigenvectors), while the real eigenvectors remain unchanged. From AX = XX, we obtain AX = XX (in the proof of Theorem 2 in Sec. 5.3 there will be more explicit explanation). Therefore, when the eigenvalues are complex conjugate quantities, the components of the eigenvectors associated with them are also complex conjugate quantities. We thus obtain real block diagonal matrices. Example 3. Suppose / 4 6 0\ A - |1 -3-5 0 1-3 -6 I find A 100 .
Matrices Similar to Diagonal
189
Matrices
Solution: From Example 1, we obtain /(-2
P~lAP
\
=
1
\V
1
and hence >1
-2
A = P
1 1 ) " - ' •
J
where P=
1 I 1 1
-2 1 0
0
2 -1 -2
/ 1 p~l
= \ -1
U
!
0 0 1
Thus
A2 = P I
>i
~2
I P- J P
1
k
1
!y = P\
1 P-i,
i
and so
r
100
1100 _ r, |
i
|
p-1
V
/olOO
;
i I
S p-1
i i
-2
1 1
1 0
0 ] I 1/ \
-2100 2100
-2 1
2 ioo
0
-2100 + 2 2 ioo
_ ioo _ 2
1 l
/2100
0\
\
0\ / 1 0 - 1 Yj 2ioi
2 0' - 1 0
y_x
_ j ioi _ 2 2
_2 i 0N 0 x
J
2
0'
- 1 - 1 0 1 / V —1 —2 1
1
-2101+2
/
190
Linear Algebra
This process of computing matrix powers is quite useful in the study of a system of linear differential equations. Example 4. Suppose a matrix A of order n is an idempotent matrix, i.e., A2 = A, and the rank of A is r. Prove \ !
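For instance, the computation of A^100 above can be reproduced and checked in a few lines. The fragment below is an added illustration, not part of the original text; it assumes numpy, and P together with the diagonal of eigenvalues are the ones found in Example 1 of this section.

    import numpy as np

    A = np.array([[ 4.0,  6.0, 0.0],
                  [-3.0, -5.0, 0.0],
                  [-3.0, -6.0, 1.0]])

    # Eigenvectors from Example 1: (-1,1,1) for -2, and (-2,1,0), (0,0,1) for 1.
    P = np.array([[-1.0, -2.0, 0.0],
                  [ 1.0,  1.0, 0.0],
                  [ 1.0,  0.0, 1.0]])
    d = np.array([-2.0, 1.0, 1.0])

    A100 = P @ np.diag(d ** 100) @ np.linalg.inv(P)
    print(np.allclose(A100, np.linalg.matrix_power(A, 100)))   # True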
<
■
A ~ ( , ^\
v1,
r \ l
A~
V
)
Q
' 0/ r, ,
Proof. From Example 4 in the previous section, we know that the eigen values of A are equal to either 1 or 0. Suppose (E-A)X
= 0,AY = 0
and rank of A = r, rank of {E — A) = s. Since the sum of the ranks of A and (E— A) is n, we have r+s = n (see Exercise 6 in Sec. 3.1). In the solution vectors of (E — A)X = 0 there are n — s = r linearly independent solution vectors. Let them be X\,X2, ■ ■ ■ ,XT. In the solution of AY = 0 there are n — r = s linearly independent solution vectors. Let them be Y\, Y%,..., Ys. From definition we can easily check that the vectors Xi,X%,...,Xr, Yi, Y%,... ,YB are linearly independent. This is because if fciXi + ■ ■ ■ + krXr + hYi + ■ • • + lsYs = 0, premultiplying by A we obtain k1AX1 + ■■■ + krAXr + hAYi + ■■■+ lsAYs = 0, or kiXi -\
h kTXr = 0.
Hence k\ = ■ ■ ■ = kT = 0, and consequently l\ = • • ■ = ls = 0. Therefore A has n linearly independent eigenvectors and so we have p-1AP = diag(l,...,l,0,...,0),
Matrices Similar to Diagonal Matrices
191
where
P=
(X1-XrY1--Yt).
The proof is complete. Example 5. Two diagonal matrices
-{' - > C ..) are similar if and only if the numbers 61,62,- ■ ■ ,bn are a permutation of a.\,a>2,...
,an.
Proof. If A ~ B, then as similar matrices have the same eigenvalues, 61,62,..., bn are a permutation of 0 1 , 0 3 , . . . , an. Conversely suppose &i, 62, • • •, &n are some permutation of ai, 02, • ■ ■, a n . As A has n linearly independent eigenvectors, we first find n eigenvectors and write them according to the order of 61,62, ■ ■ ■, bn. Then using these eigenvectors as column vectors, we construct a matrix P and obtain P _ 1 A P = B. Thus we have A ~ B. The proof is complete. In the above we have given the necessary and sufficient conditions for a matrix to be similar to a diagonal matrix, and explained the method of reducing a matrix to a diagonal matrix, thus solved the second problem raised in the introduction. When we diagonalize a matrix, the linear relation between the eigenvectors is an important relation, which will be explicitly discussed below to illustrate Theorem 1. For the linear relation between eigenvectors associated with distinct eigen values, we have: Theorem 2. Suppose A i , . . . , Am are distinct eigenvalues of the matrix A, Xi,Xz, • • •, Xm are eigenvectors associated with Ax, A2,..., Am respectively. Then Xi, X2, ■ ■ ■, Xm are linearly independent. Proof. We shall prove this assertion by induction with respect to m. When m = 1, the theorem obviously holds for any single nonzero eigen vector that is always linearly independent.
192
Linear Algebra
Suppose that for m — 1 distinct eigenvalues, the theorem holds. We have to prove that for m distinct eigenvalues the theorem also holds. Assume kiXi + • • ■ + fcm_iXm_i + kmXm = 0 . Since AXi = XiXi, premultiplying the above equality by A we obtain kiXiXi + ■ ■ ■ + fcm_iAm_iXm_i + Km^rn-A-Tn — " •
Eliminating Xm from the above two equations we get fci(Ai - A m )Xi H
\- km-i(\m-i
- Xm)Xm-i
= 0.
According to the induction hypothesis, X\, X2, ■ ■ ■, Xm-i are linearly indepen dent, and furthermore A; — Am / 0(z = 1,2,..., m — 1). They require that fci = 0, ...,fc m _i = 0, hence km = 0. That is to say, X\,X2,--,Xm are linearly independent. Consequently the theorem holds. Thus we obtain the following important sufficient condition for a matrix to be similar to a diagonal matrix: Theorem 3. Any matrix whose eigenvalues are all simple roots is similar to a diagonal matrix. For example, in Example 2, since the eigenvalues of A are all single roots, it is similar to a diagonal matrix. However, it is worth noting that the converse of Theorem 3 does not hold. That is to say, the eigenvalues of a matrix similar to a diagonal matrix are not necessarily single roots. Example 1 is such an example. The matrix is similar to a diagonal matrix but has multiple eigenvalues. From Example 5, we easily see that if two matrices of order n have distinct eigenvalues and the two sets of eigenvalues coincide, then the two matrices are similar. For the linear relationship between some eigenvectors associated with dis tinct eigenvalues, we have: Theorem 4. Suppose Ai and A2 are two unequal eigenvalues of the matrix A, X\, X2, ■ ■ ■, Xm and Y\, Y2,..., Yn are linearly independent eigenvectors of A associated with Ai and A2 respectively. Then X\,..., are linearly independent.
Xm, Y\,... ,Yn
Matrices Similar to Diagonal
Matrices
193
Proof. Suppose fciXi + • • • + kmXm + hYi + ■ ■ ■ + lnYn = 0. If kxXi + ... + kmXm =/= 0, then hYi + ... + lnYn ^ 0. Since the vector faX\ + . . . + kmXm ^ 0 is an eigenvector associated with Ai, the vector 1{Y\ + ■ ■ ■ + lnYn ^ 0 is an eigenvector associated with A2, the two eigenvectors associated with Ai and A2 respectively are linearly dependent. This contradicts Theorem 2. Therefore we must have hXi
+ ■■■ + kmXm = 0, hYi + • • • + lnYn = 0.
Furthermore since X\, X2, ■ ■ ■, Xm are linearly independent, Y\, Y2,..., Yn are also linearly independent, we have fa = 0, • • • , km = 0, h = 0, • • • , ln = 0 . That is to say, X\,...,Xm ,Yi,...,Yn are linearly independent. Therefore the theorem holds. For instance, in Example 1, (—1,1,1) is an eigenvector associated with Ai = —2 ; (—2,1,0), (0,0,1) are linearly independent eigenvectors associated with A2 = 1. Since Ai ^ A2, the 3 eigenvectors (-1,1,1), ( - 2 , 1 , 0), (0,0,1) are linearly independent, which needs no proof. Theorem 4 can obviously be generalized to any finite distinct eigenvalues. Its proof is exactly the same as for Theorem 4, which is omitted to avoid repetition. As to the linear relation among eigenvectors associated with the same eigen value, we have: Theorem 5. Suppose A is a matrix of order n, Ao is its eigenvalue of multiplicity k. Then among eigenvectors of A associated with Ao, the number that forms the largest linearly independent set cannot be greater than k. Proof. Since the eigenvectors of A associated with Ao are nonzero solution vectors of the homogeneous system of linear equations (X0E - A)X = 0, the number of the above-mentioned eigenvectors contained in the largest lin early independent set is equal to the number of solution vectors in a system of fundamental solutions. Suppose that its system of fundamental solutions
194
Linear Algebra
(written as column vectors of dimension n) are X\, X2, ■ ■ ■, Xi. If we are able to prove that (A - A0)( can divide /(A) = \XE - A\, i.e., /(A) = (A \0)lg(\), then I is not greater than the multiplicity k, i.e., I < k, and the theorem will hold true. Prom Theorem 2 in Sec. 5.1, we know that
\XE-A\
\\E-P-lAP\.
=
Hence if we can find some P such that /
P^AP
L
=
A0
* \
\
S. A0
V o
N^.
M)
then /vA-Ao 1
XE-P~ AP
=
A-A„
XE1 - A1 J
0
V
According to Laplace's theorem, expanding \XE — P lAP\ about the first I columns we arrive at /(A) = \XE - P~lAP\
= (A - A0)'|AEi - A x | .
Thus (A - A0)' divides /(A). Using the above eigenvectors X\, X2, ■ ■ ■, Xi, we can find P in the following way. According to Example 3 in Sec. 2.1, we can find n — I appropriate ndimensional vectors X1+1, Xi+2, ■ ■., Xn such that X\,...,
Xi, Xi+\,...,
Xn
are linearly independent, and construct a matrix of order n P = (Xi-Xt
Xl+1---Xn),
Obviously it is a nonsingular matrix. Prom the multiplication rule, we have AP = (AXi ---AXi
AXl+1 ■ ■ ■ AXn).
Matrices Similar to Diagonal
195
Matrices
Since AXi = X0Xi (i = 1,2,.../) and AXj(j = 1 + 1,1+2,. . n) can be written as linear combinations of Xi, X2, ■ ■ ■, Xn, we have AP = (AoXi ■ • • \0Xt
AXl+1 ■ ■ ■ AXn) (/ , Ao^
= {X\--Xi
Xi+i---Xn)
*1 v Ao
V o This is the form in matrices required. The theorem is completely proved. Clearly, when the eigenvalues are simple roots, a largest linearly indepen dent set consists of only one eigenvector in the eigenvectors associated with each simple eigenvalue. Assume that a matrix A of order n has ki linearly independent eigenvectors associated with each eigenvalue Aj of multiplicity ki. Then A has n linearly independent eigenvectors. This is because the sum of all ki is equal to n and according to Theorem 4, the whole of linearly independent eigenvectors associated with distinct eigenvalues are still linearly independent. Therefore from Theorem 1 we know that A is similar to a diagonal matrix. Further, according to Theorems 1 and 5, the inverse also holds. Now again suppose the rank of the matrix (XE — A) is r. Then the system of fundamental solutions of the homogeneous system (XE — A)X = 0 consists of n — r solution vectors. When A is an eigenvalue of multiplicity k, the above theorem requires n — r ^ k, or r Js n — k. If the rank of (XE — A) is n — k, the A has k linearly independent eigenvectors associated with the eigenvalue A. Thus we have the following important theorem: Theorem 6. A matrix A of order n is similar to a diagonal matrix if and only if for each eigenvalue Aj of multiplicity ki, the rank of the characteristic matrix (XiE — A) is n — A;,. Thus by computing the rank of the characteristic matrix we can determine if a matrix is similar to a diagonal matrix or not. At last we shall discuss the problem similar to a triangular matrix. Any matrix of order n need not always be similar to a diagonal matrix. Nevertheless, it is always similar to a triangular matrix. Thus we have: Theorem 7. Any matrix of order n is similar to a triangular matrix.
Linear Algebra
196
Proof. Suppose A is matrix of order n with characteristic polynomial
f(\) = \\E-A\
=
(\-\l).--(\-\n).
We shall prove this theorem by induction with respect to n. For n = 1 the theorem obviously holds. Suppose the theorem holds for a matrix of order n — 1. Let X\, X2, ■ ■ ■, Xn be n linearly independent column vectors (not necessarily eigenvectors), where X\ is an eigenvector of A associated with Ai, i.e., AX\ = \\Xi. As in the proof of Theorem 5, let Pl =
(X1X2--Xn),
and thus APi = {AXX
AX2--- AXn) = (A1X1 AX2--
AXn).
Hence P, _ 1 AP, =
/Ai 0
612
b\n\
••
Ax
J
Vo Again the characteristic polynomial of A\ is
/i(A) = \\E - Ax\ = (A - A2) • ■ ■ (A - A„) From the induction hypothesis we have /A2 '
1
Q~ A1Q=
An
Let /l 0
Vo Then P - X A P = (PiP2)-lA(PiP2) p-1
/Ai 0
612
°\
0
Q
= •■
P = PiPi
J P2-\Pr1AP1)P2 6ln\
Vo
A2
P2 =
A, and so the theorem holds.
/^l
/
V
An/
Matrices Similar to Diagonal
197
Matrices
Example 6. Find the triangular matrix similar to
t - 22 l5 \-3
2
1\ 1 5/
Solution: The eigenvalues of A are A = 4, which is an eigenvalue of multi plicity 3. The vector (1,1,1) is an eigenvector associated with A = 4. Taking P = (X1X2X3), where l
Xx = we have AP = {AXX = (Xi
\
w 1
x2 =
f'\ W 1
x3 =
W
AX2 X2
AX3) = (4X1 -XX+4X2 /4 -1 - 3 ' X3) 0 4 1
\0
0
/M 0 - 3Xt + X2 + 4X 3 )
4
or /4 -1 P~lAP = 0 4 \0
0
-3' 1 4
This is the triangular matrix required. Example 7. Suppose A is a complex matrix and A is an eigenvalue of A2. Prove that VA or — vA is also an eigenvalue of A. Proof. Suppose the matrix A is similar to a triangular matrix B: 'bn
*
A~B Then / u: n
B2=
'
bl, Since similar matrices have the same eigenvalues, A must be equal to some b\k, i.e., A = b\k. Hence bkk = \^A or bkk = - VA, i.e., V\ or —y/X is an eigenvalue of A. The proof is complete.
198
Linear Algebra
Exercises 1. Suppose the eigenvalues of a matrix A are A i , . . . , An, and the eigenvectors of A associated with A i , . . . , An are X\,X2,- ■ ■, Xn respectively. Let P = (Xi,..., Xn)- Does the following expression hold? P~lAP
= A„/
2. Which of the matrices in Exercise 3 in the previous section are similar to diagonal matrices? If they are diagonalizable, what are the transformation matrices required? 3. Can we reduce a matrix A to a diagonal matrix by elementary operations? 4. Prove that a nonzero nilpotent matrix cannot be similar to a diagonal matrix. 5. Prove the sufficient condition in Example 5 using elementary operations. *6. Reduce the following matrix A to a triangular matrix: A=
/-I 1 0\ - 4 3 0 1. V 1 0 2)
5.3. Diagonalization of Real S y m m e t r i c Matrices In the previous section we have discussed matrices similar to diagonal ma trices. In this section we shall discuss the problem of reducing real symmetric matrices to diagonal matrices by orthogonal matrices. This is known as prob lem of the principal axis of a quadratic form. It has wide applications. For example, it is needed in mathematical statistics. As we shall confine overselves to the realm of real numbers, elements of vectors and matrices are required to be real numbers. The inverse matrix of an orthogonal matrix is its transposed matrix and so this section is a continued and deep discussion of the previous section. Is a real symmetric matrix A of order n similar to a diagonal matrix? If the eigenvalues of A are all simple roots, obviously A has n linearly independent eigenvectors and is thus similar to a diagonal matrix. When A has multiple eigenvalues, it still has n linearly independent eigenvectors. That is to say, any real symmetric matrix of order n has n linearly independent eigenvectors. Its proof is more complicated and will be left to the end of the section so as not to interrupt the continuity with the previous section. Now we shall first take for granted the truth of the following.
Theorem 1. Any real symmetric matrix is similar to a diagonal matrix.

Since the elements on the main diagonal of the diagonal matrix are the eigenvalues of A, as long as the following theorem holds, a real symmetric matrix is similar to a real diagonal matrix.

Theorem 2. The eigenvalues of a real symmetric matrix are real numbers.

Proof. Suppose A is a real symmetric matrix, λ is an eigenvalue of A, and X = (x₁, …, xₙ)' is an eigenvector associated with λ, i.e., AX = λX. Taking complex conjugates on both sides of the equality, we obtain $\overline{AX}=\overline{\lambda X}$. According to the properties of conjugate complex numbers, $\overline{AX}=\bar A\bar X$ and $\overline{\lambda X}=\bar\lambda\bar X$; since A is real, $\bar A=A$, and thus $A\bar X=\bar\lambda\bar X$. Transposing and using A′ = A, we obtain
$$\bar X'A=\bar\lambda\bar X'.$$
Postmultiplying both sides by X, we obtain
$$\bar X'AX=\bar\lambda\bar X'X,\qquad\text{while also}\qquad \bar X'AX=\bar X'(\lambda X)=\lambda\bar X'X.$$
Hence
$$(\lambda-\bar\lambda)\,\bar X'X=0.$$
Since X ≠ 0, we have
$$\bar X'X=(\bar x_1\ \cdots\ \bar x_n)\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}=\bar x_1x_1+\cdots+\bar x_nx_n\neq 0,$$
and thus $\lambda=\bar\lambda$, which means that λ is a real number. Therefore the theorem holds.

It is worth noting that the eigenvalues of a complex symmetric matrix need not all be real. For example, the symmetric matrix $\begin{pmatrix}i&0\\ 0&0\end{pmatrix}$ has eigenvalues 0 and i, which are not all real numbers.

We know that if the coefficients of a system of linear equations are real numbers, then its solutions can be taken to be real. Therefore the eigenvectors of a real symmetric matrix A can be taken to be real vectors. As in the previous section, taking n real linearly independent eigenvectors of A
as column vectors, we obtain a real matrix P such that P⁻¹AP is a real diagonal matrix. However, this P is not orthogonal in general. We shall explain below how to find an orthogonal matrix Q from P such that Q⁻¹AQ is a real diagonal matrix, and so solve the problem of the principal axes.

First we introduce some basic concepts. From analytic geometry we know that the length of a vector (a, b, c) is $\sqrt{a^2+b^2+c^2}$. A vector of unit length is called a unit vector. Two real vectors (a₁, b₁, c₁) and (a₂, b₂, c₂) are orthogonal if a₁a₂ + b₁b₂ + c₁c₂ = 0. These notions carry over to arbitrary dimensions in the realm of real numbers. A real vector α = (a₁, …, aₙ) is called a unit vector if a₁² + ⋯ + aₙ² = 1. Two real vectors α = (a₁, …, aₙ) and β = (b₁, …, bₙ) are said to be orthogonal vectors if a₁b₁ + ⋯ + aₙbₙ = 0. Clearly the zero vector is orthogonal to every real vector. Since the vector α = (a₁, …, aₙ) can be considered as a (1, n)-matrix and α′ as a column, α is a unit vector if αα′ = 1, and α and β are orthogonal vectors if αβ′ = 0. These concepts will be explained further in Chapter 8. The row vectors of an orthogonal matrix are pairwise orthogonal unit vectors, and so are its column vectors; these properties were given in Sec. 3.2.

Theorem 3. Suppose α₁, …, αₙ are n nonzero pairwise orthogonal vectors. Then α₁, …, αₙ are linearly independent.

Proof. Assume
$$k_1\alpha_1+\cdots+k_n\alpha_n=0.$$
Multiplying on the right by α₁′ gives
$$k_1\alpha_1\alpha_1'+\cdots+k_n\alpha_n\alpha_1'=0.$$
Since α₁ and αᵢ (i ≠ 1) are mutually orthogonal, αᵢα₁′ = 0. Thus k₁α₁α₁′ = 0. But α₁ is a nonzero vector, so α₁α₁′ ≠ 0 and hence k₁ = 0. In the same way we verify kᵢ = 0, i = 2, …, n. So α₁, …, αₙ are linearly independent. Thus the theorem is valid.

Suppose now that the vectors are column vectors; for example, the eigenvectors of a matrix are column vectors. If α′α = 1, then α is a unit vector. We shall mainly discuss eigenvectors and write the vectors as columns. Thus if P = (X₁, …, Xₙ) is an orthogonal matrix, then Xᵢ′Xⱼ = δᵢⱼ.

How do we find a matrix Q satisfying the above conditions? According to the discussion in Sec. 5.1, Q⁻¹AQ is a diagonal matrix, so that the column vectors
of Q must also be eigenvectors of A. Since the column vectors X₁, …, Xₙ of P are n linearly independent vectors of dimension n, the column vectors of Q must be linear combinations of X₁, …, Xₙ. Thus the problem of finding Q becomes the problem of finding n pairwise orthogonal unit vectors among the linear combinations of the eigenvectors X₁, …, Xₙ of A. The operations involved are usually called orthogonalization and normalization. Which eigenvectors are always orthogonal and consequently need not be orthogonalized?

Theorem 4. The eigenvectors of a real symmetric matrix associated with distinct eigenvalues are orthogonal.

Proof. Assume AX₁ = λ₁X₁, AX₂ = λ₂X₂, λ₁ ≠ λ₂. Since λ₁X₁′ = X₁′A, we have
$$\lambda_1X_1'X_2=X_1'AX_2=\lambda_2X_1'X_2,$$
hence
$$(\lambda_1-\lambda_2)X_1'X_2=0.$$
But as λ₁ ≠ λ₂, X₁′X₂ = 0. Hence X₁ and X₂ are orthogonal to each other. The proof is complete.

Eigenvectors associated with the same eigenvalue which are linearly independent are not necessarily orthogonal. However, they can be orthogonalized by the following method.

For example, let α₁, α₂, α₃ be three linearly independent vectors. We can find three pairwise orthogonal vectors β₁, β₂, β₃ by the following procedure. First we take β₁ = α₁ and look for a vector β₂ orthogonal to β₁ among the linear combinations of β₁ and α₂. According to the orthogonality condition, we can determine only one undetermined coefficient. For simplicity, let
$$\beta_2=k\beta_1+\alpha_2,\tag{1}$$
then
$$\beta_1'\beta_2=k\beta_1'\beta_1+\beta_1'\alpha_2=0.$$
Since β₁ ≠ 0, we get
$$k=-\frac{\beta_1'\alpha_2}{\beta_1'\beta_1}.$$
Substituting it into the right-hand side of (1), we obtain a β₂ orthogonal to β₁. Clearly β₂ ≠ 0. We then look for β₃ among the linear combinations of
β₁, β₂, α₃. We now have two orthogonality conditions and so can determine two undetermined coefficients. Let
$$\beta_3=k_1\beta_1+k_2\beta_2+\alpha_3.\tag{2}$$
From
$$\beta_1'\beta_3=k_1\beta_1'\beta_1+k_2\beta_1'\beta_2+\beta_1'\alpha_3=0,\qquad \beta_2'\beta_3=k_1\beta_2'\beta_1+k_2\beta_2'\beta_2+\beta_2'\alpha_3=0,$$
we get
$$k_1=-\frac{\beta_1'\alpha_3}{\beta_1'\beta_1},\qquad k_2=-\frac{\beta_2'\alpha_3}{\beta_2'\beta_2}.$$
Substituting into the right-hand side of (2), we obtain a β₃ orthogonal to both β₁ and β₂. Since α₁, α₂, α₃ are linearly independent, β₃ ≠ 0; thus β₁, β₂, β₃ are nonzero pairwise orthogonal vectors. Such a process of finding the nonzero pairwise orthogonal vectors β₁, β₂, β₃ from the linear combinations of the linearly independent vectors α₁, α₂, α₃ is known as orthogonalizing the vectors α₁, α₂, α₃.

The above also applies to the general case: any n linearly independent vectors can be orthogonalized. The process is exactly the same as above, there being no difference in principle.

Clearly a nonzero vector α = (a₁, a₂, …, aₙ) multiplied by $1/\sqrt{a_1^2+\cdots+a_n^2}$ becomes a unit vector. Such a method of reducing a vector to a unit vector is called normalizing the vector. The problem of normalization is simpler than that of orthogonalization.

It is worth noting that here the components aᵢ of the vector α = (a₁, …, aₙ) are all real numbers. If they are complex numbers, the definition of the length of a vector and the concept of orthogonal vectors are given in a different way. For example, α = (1, 1, √−2, 0) is a nonzero vector; if we used the above definition, we would have 1² + 1² + (√−2)² + 0² = 0, so its length would be 0. Obviously this is not reasonable.

Example 1. Orthogonalize and normalize
$$\alpha_1=\begin{pmatrix}1\\1\\0\\0\end{pmatrix},\qquad \alpha_2=\begin{pmatrix}1\\0\\1\\0\end{pmatrix},\qquad \alpha_3=\begin{pmatrix}-1\\0\\0\\1\end{pmatrix}.$$
In other words, from the linear combinations of the vectors α₁, α₂, α₃ find 3 pairwise orthogonal unit vectors.

Solution: We first orthogonalize. Take β₁ = α₁ and let β₂ = kβ₁ + α₂. Since
$$\beta_1'\beta_1=(1,1,0,0)\begin{pmatrix}1\\1\\0\\0\end{pmatrix}=2,\qquad \beta_1'\alpha_2=(1,1,0,0)\begin{pmatrix}1\\0\\1\\0\end{pmatrix}=1,$$
we have k = −1/2, and hence
$$\beta_2=-\tfrac12\beta_1+\alpha_2=\begin{pmatrix}\tfrac12\\[2pt] -\tfrac12\\[2pt] 1\\ 0\end{pmatrix}.$$
Then let β₃ = k₁β₁ + k₂β₂ + α₃. As
$$k_1=-\frac{\beta_1'\alpha_3}{\beta_1'\beta_1}=\frac12,\qquad k_2=-\frac{\beta_2'\alpha_3}{\beta_2'\beta_2}=\frac13,$$
we have
$$\beta_3=\tfrac12\beta_1+\tfrac13\beta_2+\alpha_3=\begin{pmatrix}-\tfrac13\\[2pt] \tfrac13\\[2pt] \tfrac13\\ 1\end{pmatrix},$$
obtaining the pairwise orthogonal vectors β₁, β₂, β₃. Next we normalize β₁, β₂, β₃, i.e., form β₁/|β₁|, β₂/|β₂|, β₃/|β₃|:
$$\begin{pmatrix}\tfrac1{\sqrt2}\\[2pt] \tfrac1{\sqrt2}\\ 0\\ 0\end{pmatrix},\qquad \begin{pmatrix}\tfrac1{\sqrt6}\\[2pt] -\tfrac1{\sqrt6}\\[2pt] \tfrac2{\sqrt6}\\ 0\end{pmatrix},\qquad \begin{pmatrix}-\tfrac1{2\sqrt3}\\[2pt] \tfrac1{2\sqrt3}\\[2pt] \tfrac1{2\sqrt3}\\[2pt] \tfrac{\sqrt3}{2}\end{pmatrix}.$$
These are the pairwise orthogonal unit vectors required. The solution is complete.
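The orthogonalization-then-normalization procedure used here (the Gram–Schmidt process) is easy to carry out numerically. The following sketch assumes NumPy and reproduces the vectors of Example 1; the helper name `gram_schmidt` is ours, not the book's.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthogonalize the given linearly independent vectors, then normalize,
    in exactly the order described above: beta_i = alpha_i minus its
    projections on the earlier betas."""
    betas = []
    for a in vectors:
        b = a.astype(float)
        for p in betas:
            b = b - (p @ a) / (p @ p) * p
        betas.append(b)
    return [b / np.linalg.norm(b) for b in betas]   # normalize last

a1 = np.array([1, 1, 0, 0])
a2 = np.array([1, 0, 1, 0])
a3 = np.array([-1, 0, 0, 1])

for q in gram_schmidt([a1, a2, a3]):
    print(np.round(q, 4))
# [0.7071 0.7071 0. 0.]  [0.4082 -0.4082 0.8165 0.]  [-0.2887 0.2887 0.2887 0.866]
```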
Note that from the above procedure it is easy to see that vectors obtained by first normalizing and then orthogonalizing need not be unit vectors, whereas vectors obtained by first orthogonalizing and then normalizing are still pairwise orthogonal. Therefore, when we need to orthogonalize and normalize vectors, we should always orthogonalize first and normalize afterwards; if we normalized first and then orthogonalized, we would have to normalize once more.

From the definition of eigenvectors and Theorem 3 we readily verify: vectors obtained by orthogonalizing linearly independent eigenvectors associated with the same eigenvalue are still linearly independent eigenvectors associated with that eigenvalue, and an eigenvector obtained by normalizing is still an eigenvector associated with the same eigenvalue.

Having found a method of orthogonalizing and normalizing, have we completely solved the problem of finding the matrix Q? We have. The matrix Q can be found as follows: first, as in the previous section, find n linearly independent eigenvectors of A; then orthogonalize the eigenvectors associated with the same eigenvalue; finally normalize the n eigenvectors obtained. Thus we find n eigenvectors which are pairwise orthogonal unit vectors. Taking them as column vectors, we construct a matrix which is the orthogonal matrix required. The problem of the principal axes is now completely solved. We then have the next main theorem.

Theorem 5. Suppose A is a real symmetric matrix. Then there exists an orthogonal matrix Q such that Q⁻¹AQ is a diagonal matrix.

The calculation of Q is basically the same as in the previous section; we need only add the procedures of orthogonalization and normalization.

Example 2. Using an orthogonal matrix, reduce
$$A=\begin{pmatrix}0&1&1&-1\\ 1&0&-1&1\\ 1&-1&0&1\\ -1&1&1&0\end{pmatrix}$$
into a diagonal matrix.

Solution: Since |λE − A| = (λ − 1)³(λ + 3), A has eigenvalues λ₁ = −3 and λ₂ = 1 of multiplicity 3. First we find eigenvectors corresponding to
λ₁ = −3. Solving (−3E − A)X = 0, we obtain the eigenvector (1, −1, −1, 1)′. After normalizing we get
$$Y_1=\left(\tfrac12,\,-\tfrac12,\,-\tfrac12,\,\tfrac12\right)'.$$
We then find the eigenvectors associated with λ₂ = 1:
$$(1,1,0,0)',\qquad (1,0,1,0)',\qquad (-1,0,0,1)'.$$
Orthogonalizing and normalizing, from Example 1 we at once obtain
$$Y_2=\left(\tfrac1{\sqrt2},\,\tfrac1{\sqrt2},\,0,\,0\right)',\qquad Y_3=\left(\tfrac1{\sqrt6},\,-\tfrac1{\sqrt6},\,\tfrac2{\sqrt6},\,0\right)',\qquad Y_4=\left(-\tfrac1{2\sqrt3},\,\tfrac1{2\sqrt3},\,\tfrac1{2\sqrt3},\,\tfrac{\sqrt3}2\right)'.$$
Thus
$$Q^{-1}AQ=\begin{pmatrix}-3& & &\\ &1& &\\ & &1&\\ & & &1\end{pmatrix},$$
where
$$Q=(Y_1\ Y_2\ Y_3\ Y_4)=\begin{pmatrix}\tfrac12&\tfrac1{\sqrt2}&\tfrac1{\sqrt6}&-\tfrac1{2\sqrt3}\\[2pt] -\tfrac12&\tfrac1{\sqrt2}&-\tfrac1{\sqrt6}&\tfrac1{2\sqrt3}\\[2pt] -\tfrac12&0&\tfrac2{\sqrt6}&\tfrac1{2\sqrt3}\\[2pt] \tfrac12&0&0&\tfrac{\sqrt3}2\end{pmatrix}.$$
Obviously Q is not unique, just as P was not unique in the previous section. The solution is complete.

It should be noted that a real symmetric matrix can be congruent to a diagonal matrix as well as similar to a diagonal matrix. However, the diagonal form in the former case is not unique, its elements on the main diagonal being 1, −1, or 0 in the simplest form, whereas the diagonal form in the latter case is unique up to the order of the diagonal elements, those elements being the eigenvalues of A. Therefore the two diagonal matrices above are in general not identical. Only when the matrices are diagonalized by orthogonal matrices do the two kinds of diagonal matrices coincide.
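Numerically, the content of Theorem 5 is packaged in the symmetric eigensolver `numpy.linalg.eigh`, which returns the eigenvalues of a real symmetric matrix together with an orthogonal matrix of eigenvectors. A minimal check of Example 2, assuming NumPy:

```python
import numpy as np

A = np.array([[ 0,  1,  1, -1],
              [ 1,  0, -1,  1],
              [ 1, -1,  0,  1],
              [-1,  1,  1,  0]], dtype=float)

w, Q = np.linalg.eigh(A)            # eigenvalues in ascending order, Q orthogonal
print(w)                            # [-3.  1.  1.  1.]
print(np.allclose(Q.T @ Q, np.eye(4)))         # True: Q is orthogonal
print(np.allclose(Q.T @ A @ Q, np.diag(w)))    # True: Q^{-1} A Q is diagonal
```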
Example 3. Using a transformation of rectangular coordinates, simplify the equation of the quadratic surface
$$x^2+y^2+z^2-2xz+4x+2y-4z-5=0.$$
Solution: Let A be the matrix of the quadratic form x² + y² + z² − 2xz. Since |λE − A| = λ(λ − 1)(λ − 2), the eigenvalues of A are 1, 2, 0. Eigenvectors associated with these eigenvalues are respectively
$$(0,1,0),\qquad (1,0,-1),\qquad (1,0,1).$$
These happen to be pairwise orthogonal. Normalizing them, we obtain
$$Q=\begin{pmatrix}0&\tfrac1{\sqrt2}&\tfrac1{\sqrt2}\\ 1&0&0\\ 0&-\tfrac1{\sqrt2}&\tfrac1{\sqrt2}\end{pmatrix},$$
and hence, under the orthogonal coordinate transformation (x, y, z)′ = Q(x′, y′, z′)′, the given equation reduces to
$$x'^2+2y'^2+2x'+4\sqrt2\,y'-5=0,\qquad\text{or}\qquad (x'+1)^2+2(y'+\sqrt2)^2=10.$$
We see that the canonical form desired is
$$x''^2+2y''^2=10.$$
It is an elliptic cylindrical surface. The rectangular coordinate transformation used is
$$x''=x'+1=y+1,\qquad y''=y'+\sqrt2=\tfrac1{\sqrt2}(x-z)+\sqrt2,\qquad z''=z'=\tfrac1{\sqrt2}(x+z),$$
or
$$x=\tfrac1{\sqrt2}(y''+z'')-1,\qquad y=x''-1,\qquad z=\tfrac1{\sqrt2}(z''-y'')+1.$$
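The reduction in Example 3 can be mechanized: diagonalize the matrix of the quadratic part with an orthogonal matrix, then rewrite the linear part in the new coordinates and complete squares. A rough sketch assuming NumPy (variable names are ours):

```python
import numpy as np

# x^2 + y^2 + z^2 - 2xz + 4x + 2y - 4z - 5 = 0
A = np.array([[ 1, 0, -1],
              [ 0, 1,  0],
              [-1, 0,  1]], dtype=float)   # quadratic part
b = np.array([4, 2, -4], dtype=float)      # linear part
c = -5.0

w, Q = np.linalg.eigh(A)      # principal axes: X = Q X'
b_new = Q.T @ b               # linear coefficients in the primed coordinates
print(np.round(w, 6))         # [0. 1. 2.] -- the squared terms' coefficients
print(np.round(b_new, 6))     # completing squares then yields x''^2 + 2y''^2 = 10
```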
We shall consider another example.

Example 4. A real symmetric matrix A is positive definite if and only if all the eigenvalues of A are positive numbers.

Proof. Suppose the matrix A is positive definite, λ is an eigenvalue of A, and X is an eigenvector of A associated with λ. From AX = λX we immediately obtain X′AX = λX′X. As A is positive definite, X′AX > 0. Furthermore, as λ is a real number, X can be taken to be a real vector. Therefore X′X > 0 and λ > 0.

Conversely, suppose the eigenvalues λ₁, λ₂, …, λₙ of A are positive numbers. From Theorem 5 we see that there is an orthogonal matrix Q such that
$$Q'AQ=\begin{pmatrix}\lambda_1& &\\ &\ddots&\\ & &\lambda_n\end{pmatrix}.$$
Suppose X is any nonzero vector. Taking Y = (y₁, …, yₙ)′ such that X = QY, we have
$$X'AX=Y'Q'AQY=Y'\begin{pmatrix}\lambda_1& &\\ &\ddots&\\ & &\lambda_n\end{pmatrix}Y=\sum_{i=1}^n\lambda_iy_i^2.$$
Therefore X′AX > 0 and so A is positive definite. The proof is complete.

Similarly, a real symmetric matrix A is negative definite if and only if the eigenvalues of A are all negative numbers.

Now we prove Theorem 1 above. Following the proof of Theorem 5 in the previous section, we easily see that the following theorem (Theorem 6) holds. Using Theorem 6 of Sec. 5.2 together with the following theorem, it is evident that Theorem 1 holds. Indeed, from Theorem 6 in Sec. 5.2 we know that a matrix A is similar to a diagonal matrix if and only if for each eigenvalue λⱼ of multiplicity kⱼ there are kⱼ
linearly independent eigenvectors. If A is a real symmetric matrix, we know from the following theorem that this necessary and sufficient condition holds, and thus Theorem 1 is proved.

Theorem 6. Suppose λ₀ is an eigenvalue of multiplicity k of a real symmetric matrix A. Then a largest linearly independent set of eigenvectors associated with λ₀ consists of exactly k eigenvectors.

Proof. In the proof of Theorem 5 in the previous section we may further suppose that the column vectors X₁, …, Xₙ of P have been orthogonalized and normalized; that is, we can take P to be an orthogonal matrix. Since A is symmetric, so is P⁻¹AP. From the proof of Theorem 5 in the previous section we easily see that
$$P^{-1}AP=\begin{pmatrix}\lambda_0E_1&0\\ 0&A_1\end{pmatrix},$$
where E₁ is the unit matrix of order l and l is the number of linearly independent eigenvectors associated with λ₀ used in that proof. Hence A₁ is also a real symmetric matrix. In Theorem 5 of the previous section we proved l ≤ k. Using the method of contradiction, we shall prove l = k. Suppose l < k. From Theorem 1 in Sec. 5.1 we see that λ₀ is an eigenvalue of A₁. As above, there exists a matrix P₁ of order n − l such that
$$P_1^{-1}A_1P_1=\begin{pmatrix}\lambda_0&0\\ 0&A_2\end{pmatrix}.$$
Construct a matrix of order n:
$$Q=P\begin{pmatrix}E_1&0\\ 0&P_1\end{pmatrix}.$$
Since
$$Q^{-1}=\begin{pmatrix}E_1&0\\ 0&P_1^{-1}\end{pmatrix}P^{-1},$$
we have
$$Q^{-1}AQ=\begin{pmatrix}\lambda_0E_1&0&0\\ 0&\lambda_0&0\\ 0&0&A_2\end{pmatrix}.$$
Thus
$$\lambda_0E-Q^{-1}AQ=\begin{pmatrix}0&0&0\\ 0&0&0\\ 0&0&\lambda_0E_2-A_2\end{pmatrix},$$
where E₂ is the unit matrix of order n − l − 1,
whose rank is at most n − (l + 1), so that the rank of λ₀E − A is less than n − l. This contradicts the assumption that a system of fundamental solutions of (λ₀E − A)X = 0 consists of l solution vectors. This contradiction establishes l = k and completes the proof of the theorem.

Can we find a nonsingular linear transformation such that two real quadratic forms are simultaneously reduced to sums of squares? In general this is impossible. It is possible, however, when one of the two quadratic forms is positive definite. First let us consider the following example.

Example 5. Reduce the following two real quadratic forms
$$f=2x_1^2+3x_2^2+2x_1x_3+2x_3^2,\qquad g=2x_1^2-2x_2^2+2x_1x_3-10x_2x_3-3x_3^2$$
to sums of squares simultaneously by means of a nonsingular linear transformation.

Solution: Let A and B be the matrices of the quadratic forms f and g respectively. Since f is positive definite, there exists a nonsingular linear transformation X = P₁Z such that
$$f=X'AX=Z'P_1'AP_1Z=Z'Z,\qquad g=X'BX=Z'P_1'BP_1Z=Z'B_1Z,$$
where
$$P_1=\begin{pmatrix}\tfrac1{\sqrt2}&0&-\tfrac1{\sqrt6}\\[2pt] 0&\tfrac1{\sqrt3}&0\\[2pt] 0&0&\tfrac{\sqrt6}3\end{pmatrix},\qquad B_1=P_1'BP_1=\begin{pmatrix}1&0&0\\[2pt] 0&-\tfrac23&-\tfrac{5\sqrt2}3\\[2pt] 0&-\tfrac{5\sqrt2}3&-\tfrac73\end{pmatrix}.$$
There also exists an orthogonal transformation Z = P₂Y such that
$$f=Z'Z=Y'P_2'P_2Y=Y'Y,\qquad g=Z'B_1Z=Y'P_2'B_1P_2Y,$$
where
$$P_2=\begin{pmatrix}1&0&0\\[2pt] 0&-\tfrac{\sqrt2}{\sqrt3}&\tfrac1{\sqrt3}\\[2pt] 0&\tfrac1{\sqrt3}&\tfrac{\sqrt2}{\sqrt3}\end{pmatrix},\qquad P_2'B_1P_2=\begin{pmatrix}1&0&0\\ 0&1&0\\ 0&0&-4\end{pmatrix}.$$
Thus we have a nonsingular linear transformation X = PY such that
$$f=y_1^2+y_2^2+y_3^2,\qquad g=y_1^2+y_2^2-4y_3^2,$$
where
$$P=P_1P_2=\begin{pmatrix}\tfrac1{\sqrt2}&-\tfrac{\sqrt2}6&-\tfrac13\\[2pt] 0&-\tfrac{\sqrt2}3&\tfrac13\\[2pt] 0&\tfrac{\sqrt2}3&\tfrac23\end{pmatrix}.$$
In general we have:
Theorem 7. Let f = X′AX, g = X′BX be two real quadratic forms with f positive definite. Then there exists a nonsingular linear transformation X = PY which reduces them simultaneously to
$$f=y_1^2+\cdots+y_n^2,\qquad g=k_1y_1^2+\cdots+k_ny_n^2,$$
where k₁, …, kₙ are the roots of |λA − B| = 0.
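In numerical terms, Theorem 7 is the symmetric-definite generalized eigenvalue problem: the kᵢ are the eigenvalues of Bv = kAv. A minimal sketch assuming SciPy, applied to the matrices of Example 5:

```python
import numpy as np
from scipy.linalg import eigh

A = np.array([[2, 0, 1], [0, 3, 0], [1, 0, 2]], dtype=float)      # f, positive definite
B = np.array([[2, 0, 1], [0, -2, -5], [1, -5, -3]], dtype=float)  # g

k, V = eigh(B, A)          # solves B v = k A v with A positive definite
print(np.round(k, 6))      # [-4.  1.  1.] -- the roots of |kA - B| = 0
print(np.allclose(V.T @ A @ V, np.eye(3)))      # True: f becomes y1^2 + y2^2 + y3^2
print(np.allclose(V.T @ B @ V, np.diag(k)))     # True: g becomes sum of k_i y_i^2
```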
Proof. We shall prove only the last part of the theorem, as follows. Because
$$P'AP=E,\qquad P'BP=\operatorname{diag}(k_1,\ldots,k_n),$$
we have
$$P'(\lambda A-B)P=\lambda E-\operatorname{diag}(k_1,\ldots,k_n)=\operatorname{diag}(\lambda-k_1,\ldots,\lambda-k_n),$$
and hence
$$|P'(\lambda A-B)P|=(\lambda-k_1)\cdots(\lambda-k_n),\qquad\text{or}\qquad |\lambda A-B|=c(\lambda-k_1)\cdots(\lambda-k_n)$$
with $c=|P|^{-2}\neq 0$.
Thus k₁, …, kₙ are the roots of |λA − B| = 0. The proof is complete.

Exercises

1. By using orthogonal matrices, reduce the following real symmetric matrices to diagonal matrices:
$$(1)\ \begin{pmatrix}2&2&-2\\ 2&5&-4\\ -2&-4&5\end{pmatrix},\qquad (2)\ \begin{pmatrix}1&1&0&-1\\ 1&1&-1&0\\ 0&-1&1&1\\ -1&0&1&1\end{pmatrix}.$$
2. By using orthogonal transformations, reduce the following quadratic forms to sums of squares:
(1) 2x₁² + x₂² − 4x₁x₂ − 4x₂x₃,
(2) 8x₁x₃ + 2x₁x₄ + 2x₂x₃ + 8x₂x₄.
3. By a transformation of rectangular coordinates, simplify the equation of the quadratic surface
$$3x^2+2y^2+2z^2+2xy+2zx-8x-6y-2z+3=0.$$
4. Can n nonzero vectors which are linearly dependent be orthogonalized and normalized?
5. Prove that the eigenvalues of a real skew-symmetric matrix are either zero or pure imaginary.
6. Prove that a real skew-symmetric matrix is similar to a diagonal matrix.
7. Suppose A and B are two real symmetric matrices. Prove that the eigenvalues of A and B are the same if and only if A and B are similar.
8. Show that a real symmetric matrix A is positive definite if and only if there exists a positive definite matrix B such that A = B².
*9. Suppose A and B are real symmetric matrices and A is positive definite. Prove that the eigenvalues of the matrix AB are real numbers.
*10. Suppose A and B are real matrices of order n, AB = BA, and A has n distinct eigenvalues. Prove that they can be reduced to diagonal matrices simultaneously by the same matrix.

*5.4. Canonical Form of Orthogonal Matrices

In the previous section we discussed the canonical form of real symmetric matrices. Real symmetric matrices and orthogonal matrices are two important kinds of real matrices, and in Chapter 6 we shall encounter them once again. The manner of discussion here is basically the same as in the previous section, but the results are not all the same.

First we define certain terms necessary for the discussion. For real matrices, we had the notions of unit vectors and orthogonal vectors in the previous section; we also require these in the complex case. Let α = (a₁, …, aₙ) be a complex vector. If $\alpha\bar\alpha'=1$, i.e., $a_1\bar a_1+\cdots+a_n\bar a_n=1$, then α is called a unit vector. Let β = (b₁, …, bₙ) be another complex vector; if $\alpha\bar\beta'=0$, i.e., $a_1\bar b_1+\cdots+a_n\bar b_n=0$, then α and β are called orthogonal vectors. Let A be a square matrix. If $A\bar A'=E$, then A is called a unitary matrix. These terms are all generalizations of those for real matrices, and in Chapter 8 we shall explain these concepts further.

Complex vectors can be orthogonalized and normalized as in the previous section. For example, following equality (1), β₂ = kβ₁ + α₂, of Sec. 5.3, letting
$$\bar\beta_1'\beta_2=k\bar\beta_1'\beta_1+\bar\beta_1'\alpha_2=0,$$
one obtains $k=-\dfrac{\bar\beta_1'\alpha_2}{\bar\beta_1'\beta_1}$. Substituting it into (1), we obtain a vector β₂ orthogonal to β₁. By orthogonalizing and normalizing n linearly independent complex vectors, we can obtain n pairwise orthogonal unit vectors.

In the proof of Theorem 7 in Sec. 5.2, if we take X₁, …, Xₙ to be pairwise orthogonal unit vectors, then P₁ is a unitary matrix. Thus we have:

Theorem 1. For any matrix A there exists a unitary matrix P such that P⁻¹AP is a triangular matrix.

The eigenvalues of an orthogonal matrix are not all real numbers, which is different from the case of a real symmetric matrix.

Theorem 2. The eigenvalues λ of an orthogonal matrix are all of unit modulus (λλ̄ = 1).

Proof. Suppose A is an orthogonal matrix, λ is an eigenvalue of A, and X is an eigenvector of A associated with λ, i.e., AX = λX. Since A is real, conjugating gives $A\bar X=\bar\lambda\bar X$, and transposing gives
$$\bar X'A'=\bar\lambda\bar X',$$
and thus
$$\bar X'A'AX=\bar\lambda\lambda\,\bar X'X.$$
Since A′A = E, we have
$$\bar X'X=\lambda\bar\lambda\,\bar X'X,\qquad\text{or}\qquad (\lambda\bar\lambda-1)\,\bar X'X=0.$$
As X ≠ 0, λλ̄ = 1; that is to say, the modulus of λ is 1. Therefore the theorem holds.

Consequently, if the eigenvalues of an orthogonal matrix are real numbers, then they are either 1 or −1; if they are complex, then they can be written as cos θ + i sin θ or cos θ − i sin θ.
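A quick numerical illustration of Theorem 2, assuming NumPy (the rotation angle and the example matrix are arbitrary choices of ours):

```python
import numpy as np

t = 0.7
A = np.array([[np.cos(t), -np.sin(t), 0],
              [np.sin(t),  np.cos(t), 0],
              [0,          0,        -1]])   # an orthogonal matrix

print(np.allclose(A.T @ A, np.eye(3)))       # True: A is orthogonal
print(np.abs(np.linalg.eigvals(A)))          # [1. 1. 1.]: every eigenvalue has modulus 1
```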
From Theorem 8 in Sec. 8.4 it is easily seen that an orthogonal matrix is similar to a diagonal matrix. Thus we have:

Theorem 3. If A is an orthogonal matrix, then there exists a unitary matrix P such that P⁻¹AP is a diagonal matrix.

It is clear that if the eigenvalues of A are all real, i.e., either 1 or −1, then P is a real matrix and hence an orthogonal matrix. That is to say, if the eigenvalues of an orthogonal matrix are real numbers, there is an orthogonal matrix P such that P⁻¹AP is a real diagonal matrix. If the eigenvalues are not all real, then P is not a real matrix. When we work within the realm of real numbers, in order to avoid this situation we construct an orthogonal matrix Q from P; however, we can then reduce A only to a simple real block diagonal matrix Q⁻¹AQ, not to a diagonal matrix. This is shown as follows.

As in the proof of Theorem 4 in the previous section, we can easily prove that the eigenvectors of an orthogonal matrix associated with distinct eigenvalues are mutually orthogonal. Again, since the coefficients of the characteristic polynomial of A are real, if λ₁ is a complex eigenvalue of A, then its conjugate λ̄₁ is also an eigenvalue of A, and λ₁, λ̄₁ have the same multiplicity. Suppose X₁ is an eigenvector of A corresponding to λ₁. From AX₁ = λ₁X₁ and Ā = A we have $A\bar X_1=\bar\lambda_1\bar X_1$, i.e., the conjugate vector X̄₁ of X₁ is an eigenvector of A associated with λ̄₁. Thus we choose the n linearly independent eigenvectors of A as follows: for λ₁ take a unit vector X₁, for λ̄₁ take X̄₁, and similarly for the other conjugate complex eigenvalues. Suppose P = (X₁ X̄₁ X₃ … Xₙ). Then
$$P^{-1}AP=\begin{pmatrix}\lambda_1& & & &\\ &\bar\lambda_1& & &\\ & &\lambda_3& &\\ & & &\ddots&\\ & & & &\lambda_n\end{pmatrix}.$$
As in Example 2 of Sec. 5.2, exchanging mutually conjugate column vectors for pairs of real vectors, we construct an orthogonal matrix Q. In view of $AX_1=\lambda_1X_1$ and $A\bar X_1=\bar\lambda_1\bar X_1$, let
$$Y_1=\frac{X_1+\bar X_1}{\sqrt2},\qquad Y_2=\frac{X_1-\bar X_1}{\sqrt2\,i}.$$
Obviously, Y₁ and Y₂ are real vectors. If λ₁ = cos θ₁ − i sin θ₁, then λ̄₁ = cos θ₁ + i sin θ₁. Thus
$$AY_1=\frac1{\sqrt2}(AX_1+A\bar X_1)=\frac1{\sqrt2}(\lambda_1X_1+\bar\lambda_1\bar X_1)
=\frac1{\sqrt2}\bigl\{(\cos\theta_1-i\sin\theta_1)X_1+(\cos\theta_1+i\sin\theta_1)\bar X_1\bigr\}$$
$$=\cos\theta_1\cdot\frac{X_1+\bar X_1}{\sqrt2}+\sin\theta_1\cdot\frac{X_1-\bar X_1}{\sqrt2\,i}
=\cos\theta_1\,Y_1+\sin\theta_1\,Y_2.$$
Similarly
$$AY_2=-\sin\theta_1\,Y_1+\cos\theta_1\,Y_2.$$
And so
$$A(Y_1\ Y_2\ X_3\cdots X_n)=(\cos\theta_1Y_1+\sin\theta_1Y_2\ \ {-\sin\theta_1Y_1+\cos\theta_1Y_2}\ \ \lambda_3X_3\ \cdots\ \lambda_nX_n)$$
$$=(Y_1\ Y_2\ X_3\cdots X_n)\begin{pmatrix}\cos\theta_1&-\sin\theta_1& & &\\ \sin\theta_1&\cos\theta_1& & &\\ & &\lambda_3& &\\ & & &\ddots&\\ & & & &\lambda_n\end{pmatrix}.$$
Since X₁ is a unit vector and X₁, X̄₁ are eigenvectors corresponding to two distinct eigenvalues, we have $\bar X_1'X_1=1$ and $X_1'X_1=0$. Using these two properties, we can easily check that $Y_1'Y_1=1$, $Y_2'Y_2=1$, $Y_1'Y_2=0$, and that Y₁, Y₂ remain orthogonal to X₃, …, Xₙ. Thus (Y₁ Y₂ X₃ … Xₙ) is still a unitary matrix, in which Y₁ and Y₂ are real vectors. If the real column vectors of P remain unchanged and the conjugate complex vectors are exchanged for such pairs of real vectors, then the matrix obtained is the orthogonal matrix required. Thus we have the following important theorem:

Theorem 4. Suppose A is an orthogonal matrix. If its eigenvalues are 1 with multiplicity r₁, −1 with multiplicity r₂, and its other 2r eigenvalues are λ₁, λ̄₁, …, λᵣ, λ̄ᵣ, then there exists an orthogonal matrix P such that
$$P^{-1}AP=\operatorname{diag}\Bigl(\underbrace{1,\ldots,1}_{r_1},\ \underbrace{-1,\ldots,-1}_{r_2},\ \begin{pmatrix}\cos\theta_1&-\sin\theta_1\\ \sin\theta_1&\cos\theta_1\end{pmatrix},\ \ldots,\ \begin{pmatrix}\cos\theta_r&-\sin\theta_r\\ \sin\theta_r&\cos\theta_r\end{pmatrix}\Bigr).$$
The matrix on the right-hand side of the above expression is known as the canonical form of the orthogonal matrix A; the real eigenvalues appear on the main diagonal, and for each pair of conjugate complex eigenvalues there is a little block
$$\begin{pmatrix}\cos\theta&-\sin\theta\\ \sin\theta&\cos\theta\end{pmatrix}.$$
So long as we confine θ to 0 ≤ θ < 2π, the little block in P⁻¹AP is determined uniquely by the pair of complex eigenvalues. Consequently, if we do not consider the order of the real eigenvalues and of these little blocks, the canonical form of A is unique. For example, among the canonical forms of orthogonal matrices of order 3 there are six kinds of matrices altogether:
$$\begin{pmatrix}1&&\\&1&\\&&1\end{pmatrix},\quad
\begin{pmatrix}-1&&\\&1&\\&&1\end{pmatrix},\quad
\begin{pmatrix}1&&\\&-1&\\&&-1\end{pmatrix},\quad
\begin{pmatrix}-1&&\\&-1&\\&&-1\end{pmatrix},\quad
\begin{pmatrix}1&&\\&\cos\theta&-\sin\theta\\&\sin\theta&\cos\theta\end{pmatrix},\quad
\begin{pmatrix}-1&&\\&\cos\theta&-\sin\theta\\&\sin\theta&\cos\theta\end{pmatrix}.$$
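For a concrete orthogonal matrix, the angle θ of each little block can be read off from a complex eigenvalue λ = cos θ + i sin θ. A small sketch assuming NumPy (the example matrix, a rotation combined with a reflection, is our own choice):

```python
import numpy as np

t = 1.1                                        # arbitrary rotation angle
A = np.array([[np.cos(t), -np.sin(t), 0],
              [np.sin(t),  np.cos(t), 0],
              [0,          0,        -1]])

lams = np.linalg.eigvals(A)
print(np.round(lams, 4))                       # -1, cos t + i sin t, cos t - i sin t

lam = lams[np.argmax(np.abs(lams.imag))]       # pick one complex eigenvalue
print(round(abs(np.angle(lam)), 4))            # 1.1 -- the angle of the 2x2 block
```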
Exercises 1. Find the canonical forms of the following orthogonal matrices:
/ (1)
2 6
3
6 2 - 3
(2)
V*3
-v^
1
3
—v/3
\/3
3
1
3
1
\/3
-V^
1
3
-\/3
V
\
\/3 /
2. Show that any real skew-symmetric matrix A can be reduced to the canonical form
$$Q^{-1}AQ=\operatorname{diag}\Bigl(\underbrace{0,\ldots,0}_{r_1},\ \begin{pmatrix}0&b_1\\ -b_1&0\end{pmatrix},\ \ldots,\ \begin{pmatrix}0&b_{r_2}\\ -b_{r_2}&0\end{pmatrix}\Bigr),$$
where Q is an orthogonal matrix, r₁ is the multiplicity of the eigenvalue 0 of A, and 2r₂ is the number of imaginary eigenvalues of A.
3. Verify that the eigenvalues of a unitary matrix are of modulus (absolute value) 1.
4. Suppose A is an orthogonal matrix of order n. Show that if |A| = −1, then −1 is an eigenvalue of A; if |A| = 1 and n is an odd number, then 1 is an eigenvalue of A.
5. Suppose A is a unitary matrix. Prove that there exists a unitary matrix P such that P⁻¹AP is a diagonal matrix.
*5.5. Cayley–Hamilton Theorem and Minimal Polynomial

We discussed characteristic polynomials in the second section. In this section we shall discuss further properties of characteristic polynomials, introduce a relation that every matrix satisfies with its own characteristic polynomial, and give another necessary and sufficient condition for a matrix to be similar to a diagonal matrix. These results are of great importance.

First we shall introduce the famous Cayley–Hamilton (W. R. Hamilton, 1805–1865) theorem, an important relation satisfied by every matrix.
Theorem 1. A matrix A of order n is a root of its own characteristic polynomial
$$f(x)=|xE-A|=x^n+a_1x^{n-1}+\cdots+a_{n-1}x+a_n,$$
i.e.,
$$f(A)=A^n+a_1A^{n-1}+\cdots+a_{n-1}A+a_nE=O.$$
Proof. Suppose the matrix B(x) is the adjoint matrix of xE − A. From Sec. 3.3 we have
$$B(x)(xE-A)=|xE-A|E=f(x)E.\tag{1}$$
Since the elements of B(x) are algebraic cofactors of elements of xE − A, they are polynomials in x of degree not exceeding n − 1. Therefore B(x) can be written as a polynomial in x with matrix coefficients:
$$B(x)=B_0x^{n-1}+B_1x^{n-2}+\cdots+B_{n-2}x+B_{n-1},$$
where B₀, B₁, …, B_{n−2}, B_{n−1} are all matrices with scalar elements. Thus
$$B(x)(xE-A)=(B_0x^{n-1}+B_1x^{n-2}+\cdots+B_{n-2}x+B_{n-1})(xE-A)$$
$$=B_0x^n+(B_1-B_0A)x^{n-1}+\cdots+(B_{n-1}-B_{n-2}A)x-B_{n-1}A,$$
while
$$f(x)E=Ex^n+a_1Ex^{n-1}+\cdots+a_{n-1}Ex+a_nE.$$
Comparing the coefficients on the two sides of equality (1), we obtain
$$B_0=E,\quad B_1-B_0A=a_1E,\quad\ldots,\quad B_{n-1}-B_{n-2}A=a_{n-1}E,\quad -B_{n-1}A=a_nE.$$
Multiply the two sides of these equations by Aⁿ, Aⁿ⁻¹, …, A, E respectively and add them up. The terms on the left-hand side cancel out, leaving the zero matrix, while the terms on the right-hand side add up to
$$A^n+a_1A^{n-1}+\cdots+a_{n-1}A+a_nE=O.$$
The theorem is established.

For example, let
$$A=\begin{pmatrix}-1&1&0\\ -4&3&0\\ 1&0&2\end{pmatrix}.\tag{2}$$
Its characteristic polynomial is f(x) = x³ − 4x² + 5x − 2. Therefore
$$A^3-4A^2+5A-2E=O,$$
which can be checked by direct computation.

Note that by saying that A is a root of f(x) we mean that when we substitute A into
$$f(x)=x^n+a_1x^{n-1}+\cdots+a_{n-1}x+a_n$$
we obtain a matrix polynomial equal to the zero matrix, i.e., f(A) = Aⁿ + a₁Aⁿ⁻¹ + ⋯ + a_{n−1}A + a_nE = O. Note also that x in the polynomial f(x) obtained from |xE − A| cannot itself be regarded as a matrix, for if we replaced x in |xE − A| by A we would obtain |A − A| = 0, not f(A) = O.

Thus if A is a matrix of order n, any power of A with exponent larger than n − 1 can be expressed in terms of the powers of A with exponents less than n. In this way the computation of matrix polynomials can be simplified.

Example 1. Compute
$$g(A)=A^7-A^5-19A^4+28A^3+6A-4E,$$
where A is the matrix of order 3 in (2).

Solution: The characteristic polynomial of A is f(x) = x³ − 4x² + 5x − 2. Dividing g(x) by f(x), we obtain the remainder −3x² + 22x − 8. Thus
$$g(A)=-3A^2+22A-8E=-3\begin{pmatrix}-3&2&0\\ -8&5&0\\ 1&1&4\end{pmatrix}+22\begin{pmatrix}-1&1&0\\ -4&3&0\\ 1&0&2\end{pmatrix}-8\begin{pmatrix}1&0&0\\ 0&1&0\\ 0&0&1\end{pmatrix}=\begin{pmatrix}-21&16&0\\ -64&43&0\\ 19&-3&24\end{pmatrix}.$$
The solution is complete.

Example 2. Find √A, where
$$A=\begin{pmatrix}-91&28&9\\ 47&-14&-4\\ -1113&341&108\end{pmatrix}.$$
Solution: Since the characteristic polynomial of A is f(λ) = λ³ − 3λ² − λ − 1, we have
$$A^3-3A^2-A-E=O,$$
or
$$A(E-A)^2=(E+A)^2.$$
Hence
$$\sqrt A=\pm(E+A)(E-A)^{-1}=\pm\tfrac12(E+2A-A^2),$$
or
$$\sqrt A=\pm\tfrac12\begin{pmatrix}239&-73&-23\\ 577&-175&-55\\ 668&-208&-66\end{pmatrix}.$$
(x-
A i ) ( x - -A 2 )-- ■ (x - A n ) ,
Matrices Similar to Diagonal Matrices
221
and suppose that /A,
A2
P~lAP
■V A„/
Then f{P~lAP)
= ( P " 1 A P - XiE^p-'AP
- \2E) ■ ■ ■ {P~lAP
-
\nE)
/A1-A2
({
A n — A2 /
A„ - Ai /
\
/ A i — Xn A2 — A n
V /o 0 0
0
\
0/
/
An _ A3 / A2 — An
V /o 0 0 \ 0 0 0 * 0 0
\
A2 — A3
/ Ai — An
\0
\
/Aj-Aa
*
Vo 0
\
*
A2 — Ai
\
0/
/A1-A4
'
/
A n — A4 /
\
I \\ — Xn A2 — A„ 0/ 0 0
222
Linear Algebra
i.e.,
$$f(P^{-1}AP)=P^{-1}f(A)P=O.$$
Therefore f(A) = Aⁿ + a₁Aⁿ⁻¹ + ⋯ + a_{n−1}A + a_nE = O. This is the Cayley–Hamilton theorem. The proof is complete.

The following introduces an important concept. Suppose m(x) is a polynomial satisfied by the matrix A (when A is a real matrix, the coefficients of m(x) are real numbers). If m(x) is the nonzero monic polynomial of lowest degree satisfied by A, we call m(x) the minimal polynomial of A. Since A satisfies its characteristic polynomial, f(A) = O, the degree of m(x) does not exceed n. Moreover m(x) is not necessarily an irreducible polynomial, for the product of two nonzero matrices may be the zero matrix.

Suppose g(x) is any polynomial satisfied by A, i.e., g(A) = O. Upon dividing g(x) by m(x), we can write g(x) as
$$g(x)=q(x)m(x)+r(x),$$
where q(x) is the quotient and r(x) is the remainder, which is either identically zero or a polynomial of degree less than that of m(x). Thus
$$g(A)=q(A)m(A)+r(A),$$
and so r(A) = O; by the minimality of m(x), r(x) = 0, i.e., g(x) = q(x)m(x). That is to say, the minimal polynomial of A divides every polynomial g(x) satisfied by A. Hence the minimal polynomial of A is a factor (divisor) of the characteristic polynomial of A. Moreover, the minimal polynomial of A is unique except for a possible nonzero scalar factor. For example, when a matrix A satisfies A² = E, we have g(x) = x² − 1; if A ≠ ±E, then g(x) is the minimal polynomial of A, i.e., m(x) = x² − 1.

We know that the elements of the adjoint matrix B(x) of the characteristic matrix (xE − A) are polynomials in x of degree at most n − 1. Let d(x) be the greatest common divisor of the elements of B(x), taken with leading coefficient 1. Since B(x)(xE − A) = f(x)E, d(x) divides f(x).

Theorem 2. The minimal polynomial of A is
$$m(x)=\frac{f(x)}{d(x)}.$$
Proof. Suppose B(x) = d(x)C(x), where the elements of the matrix C(x) are relatively prime, and let h(x) = f(x)/d(x). From B(x)(xE − A) = f(x)E one obtains C(x)(xE − A) = h(x)E. We can repeat the argument used in the proof of Theorem 1 to deduce that h(A) = O. Hence h(x) is divisible by m(x), i.e., m(x)|h(x).

On the other hand, consider the polynomial m(x) − m(y). Since it is a sum of terms of the form cᵢ(xⁱ − yⁱ), each of which is divisible by x − y, m(x) − m(y) is divisible by x − y:
$$m(x)-m(y)=(x-y)\,k(x,y).$$
Replacing x by xE and y by A, and using m(xE) = m(x)E and m(A) = O, we have
$$m(x)E=(xE-A)\,k(xE,A).\tag{3}$$
Premultiplying the two sides of the above by the adjoint matrix B(x) of (xE − A), we obtain
$$m(x)B(x)=B(x)(xE-A)\,k(xE,A)=f(x)\,k(xE,A),$$
or
$$m(x)d(x)C(x)=f(x)\,k(xE,A),$$
and hence
$$m(x)C(x)=h(x)\,k(xE,A).$$
Since h(x) divides every element of m(x)C(x) and the elements of C(x) have no nonscalar common factor, h(x) divides m(x), i.e., h(x)|m(x). Therefore h(x) and m(x) differ at most by a scalar factor. This proves the theorem.

If d(x) = 1, then m(x) = f(x). That is to say, when the elements of the adjoint matrix of the characteristic matrix of A are relatively prime, the characteristic polynomial of A is its minimal polynomial. For example, for the matrix A in (2), the element b₃₂(x) = 1 in the adjoint matrix B(x) of (xE − A), so d(x) = 1. Thus its characteristic polynomial is its minimal polynomial, i.e., m(x) = x³ − 4x² + 5x − 2.
Example 4. Find the minimal polynomial of the matrix
$$A=\begin{pmatrix}2&1&0\\ -4&-2&0\\ 2&1&0\end{pmatrix}.$$
Solution:
$$f(x)=\begin{vmatrix}x-2&-1&0\\ 4&x+2&0\\ -2&-1&x\end{vmatrix}=x^3.$$
The greatest common divisor of the elements of the adjoint matrix
$$B(x)=\begin{pmatrix}x^2+2x&x&0\\ -4x&x^2-2x&0\\ 2x&x&x^2\end{pmatrix}$$
of (xE − A) is x, i.e., d(x) = x. Hence
$$m(x)=\frac{f(x)}{d(x)}=x^2.$$
By direct computation we obtain
$$A^2=\begin{pmatrix}2&1&0\\ -4&-2&0\\ 2&1&0\end{pmatrix}\begin{pmatrix}2&1&0\\ -4&-2&0\\ 2&1&0\end{pmatrix}=O.$$
The solution is complete.

Theorem 3. Suppose $M=\begin{pmatrix}A&O\\ O&B\end{pmatrix}$, where A and B are square matrices.
Then the minimal polynomial m(x) of the matrix M is the least common multiple of f(x) and g(x), where f(x) and g(x) are the minimal polynomials of A and B respectively.

Proof. Since m(x) is the minimal polynomial of M,
$$m(M)=\begin{pmatrix}m(A)&O\\ O&m(B)\end{pmatrix}=O.$$
Hence m(A) = O and m(B) = O, i.e., f(x)|m(x) and g(x)|m(x). Thus m(x) is a common multiple of f(x) and g(x).
Moreover, suppose h(x) is any common multiple of f(x) and g(x). Since
m(x)\h(x), that is to say, any common multiple of f(x) and g{x) is a multiple of m(x). Therefore m{x) is the least common multiple of f(x) and g(x). This proves the theorem. There is an important relation between the characteristic polynomial and the minimal polynomial, which is embodied in the following theorem. Theorem 4. Each irreducible factor of the characteristic polynomial f(x) of A is also an irreducible factor of the minimal polynomial m(x). Therefore the roots of the characteristic polynomial are also the roots of the minimal polynomial. Proof. Taking the determinant on the two sides of (3), one obtains det (m(x)E) = det {xE - A) det (k(xE, A)), or (m(x))n = f(x) det (k(xE, A)). Thus every irreducible factor of f(x) divides [m(a;)]n and hence m{x) itself. The proof is complete. Thus the roots of the characteristic polynomial coincide with those of the minimal polynomial. The difference between them is only the multiplicities of the roots. Therefore a characteristic polynomial without repeated factors is also the minimal polynomial. Moreover, suppose A is a matrix of order n. The characteristic polynomial f(x) of A is a factor of the n-th power of its minimal polynomial m(x), i.e., f(x)\[m(x)]n. Example 5. Find the characteristic polynomial and the minimal polyno mial for the matrix / 7 A= I 4 \ -4
4 - 1 7 - 1 -4 4
226
Linear Algebra
Solution: First we find the characteristic polynomial: i-7 -4 4
-4 x-7 4
= (*-3)2
1 8
1 1 a; — 4 1 i-4
x-3 0 4
-i+3 x-3 4
0 x-3 i —4
(x-3)2(x-12).
Since the minimal polynomial m(x) divides f(x), by a straightforward compu tation we obtain (i4 - 3£?)(i4 - 12E) = (
4 4 4 4 -4-4
- l \ /-5 4 - 1 \ - 1 I I 4 - 5 - 1 I = O, 1 / V -4 -4 -8 /
Therefore m(x) = (x - 3)(x - 12). Example 6. x 2 + 1 = 0.
Show that no nonzero real matrix of order 3 satisfies
Proof. We prove the above result by the method of contradiction. Suppose A is such a real matrix of order 3. Since its characteristic polynomial /(x) is a polynomial of degree 3 with real coefficients, if A satisfies x 2 + 1 = 0, then the minimal polynomial m(x) of A is m(x) = x 2 + 1 and f(x) must have a real factor of degree 1 apart from the factor x 2 + 1. Obviously, m(x) could not have the above factor. This conflicts with Theorem 4, and so the statement is valid. E x a m p l e 7. Similar matrices have the same minimal polynomial. Solution: Suppose that the minimal polynomial of the matrix A is m(x) = xk + 6ix fc_1 + • • • + bk-ix + bk . Then m(A) = Ak + biA*-1 + ■■■ + bk-iA + bkE = 0. Let B = P~lAP.
Since B* = (p-1AP)i
=P'1AlP,
Matrices Similar to Diagonal
227
Matrices
we have m(B) = Bk + b1Bk~1 + ■■■ + bk-iB + bkE = p-\Ak l
= p- m{A)P
+ bxAk-1
+ ■■■ + bk^A
+ bkE)P
= 0.
Let the minimal polynomial of matrix B be mi(x). Then mi(x)\m(x). Simi larly we have m{x)\mi{x), and so m1(a;) = m(x). Thus the statement holds. It should be remarked that having the same minimal polynomial is only a necessary condition for two matrices to be similar. It is not a sufficient condition. For example, the two matrices
( ' - , ) ■ ( • • . )
are not similar. Their characteristic polynomials are /i(x) = (x - 2){x - 3) 2 ,
f2(x) = (x-
2)2(x - 3)
respectively, while their minimal polynomials are both (x — 2)(x — 3), i.e., mi(x) = 7712(1) = (x — 2)(x — 3). We already know if the eigenvalues of the characteristic polynomial of a matrix A are all different, then A is similar to a diagonal matrix. Clearly at this point the eigenvalues of the minimal polynomial of A are also all different. But when m(x) has no multiple roots, while f(x) may have repeated roots, A can also be similar to a diagonal matrix. That is to say, whether A is similar to a diagonal matrix or not is completely determined by whether m(x) has multiple roots. This is the following theorem. Theorem 5. A matrix A of order n is similar to a diagonal matrix if and only if its minimal polynomial m(x) has no multiple roots. Proof. Suppose the matrix A is similar to a diagonal matrix D = diag ( A i , . . . , A n ), i.e., P~lAP = D, and A i , . . . , Am are eigenvalues of A which are all different. Let g{x) — {x — A i ) . . . (x — A m ). Since P~1A'P = Di, we have P-19(A)P
= g{D) = (D - Ai£) • ■ ■ {D = diag(s(Ai), . . . ,
g(Xn))=0.
\mE)
228
Linear Algebra
Hence g(A) = O and m(x)\g(x). Thus m(x) has no repeated roots. Conversely suppose m(x) = (x — Ai) ■ • • (x — A&) has no multiple roots, and the rank of A — XiE is T-J. Since m(A) = {A- \!E)(A
- X2E) ---(A-
XkE) = O,
applying the results of Exercise 10 of Sec. 3.1 repeatedly, we obtain 0 ^ r\ + r 2 H
h rfc — (k — l)n,
i.e., r\ + r2 H
h Tfc < (fe - l ) n .
Hence n - ri + n - r2 H
h n - rk = kn - (ri + r 2 H
h rfe) ^ n.
Therefore there exist n linearly independent eigenvectors of A, and so A is similar to a diagonal matrix. This proves the theorem. This is an important theorem and may sometimes be conveniently applied to determine whether a matrix is similar to a diagonal matrix. Example 8. Suppose Ak = E, where k is a positive integer and E a unit matrix. Prove that A is similar to a diagonal matrix. Proof. Since A satisfies the polynomial g(x) = xk — 1, i.e., g(A) = O. Let the minimal polynomial of A be m(x), then m(x)\g(x). But g(x) has no multiple roots and neither does m(x). Therefore A is similar to a diagonal matrix. Exercises 1. Using the Cay ley-Hamilton Theorem, evaluate 2AS - 3A5 + A4 + A2 - 4£, where A=
/I 0
\0
0 -1
2\ 1 I.
1 Oj
Matrices Similar to Diagonal
2. Let A =
1
-1
Matrices
229
. Express (2A4 - 12A3 + 19A2 - 29A + 37E) - l
as a polynomial of A. 3. Show that the inverse matrix A~l of any matrix A can be expressed as a polynomial of A. /I 0 4. Let A = 1 0 1 Prove that when n ^ 3, we have 1
°\ °/
V°
An
=
An-2
+
A
2 _
E
Use the above relation to evaluate A100. 5. Find the characteristic and minimal polynomials for the following matrices:
(1)
(3)
/
4 -8 -1
7 4
U fZ 0 0 0
1 3 0 0
"l ~*
0.Q
Ol
-Oi
a0
-a2 \-a3
a-3
UQ
-oi
-o2
a\
a0
/
-4\ '
(2)
a-2
-a3
«3
a2
0 0 0\ 0 0 0 3 1 0 0 3 1
Vo o o o 3 / 6. Verify that any matrix and its transposed matrix have the same character istic polynomial and the same minimal polynomial. 7. Prove that any nonzero nilpotent matrix cannot be similar to a diagonal matrix. 8. Suppose A is a matrix of order 3 and A2 = E, A ^ E. Prove that the trace of A is either 1 or —1, 9. Suppose two matrices A and B are similar to diagonal matrices and AB = BA. Show that there exists a nonsingular matrix P such that P~1AP, P~lBP are both diagonal matrices.
CHAPTER 6 JORDAN CANONICAL FORM OF MATRICES
In the previous chapter we have discussed the problem of reducing matrices to diagonal matrices. In general, matrices that cannot be reduced to diagonal matrices can be reduced to block diagonal matrices, or the Jordan canonical form. In this chapter we shall consider this problem, specifically the following: 1. The necessary and sufficient condition for A-matrices to be equivalent. 2. Methods of finding Jordan canonical forms. This chapter is divided into four sections. The second and third sections deal with the first problem, the other sections with the second problem. 6.1. Necessary and Sufficient Condition for Two Matrices to be Similar For a given matrix A, how does one find a simple matrix B such that A~B1 We proceed as follows. First find the necessary and sufficient condition for two matrices to be similar, and then using the sufficient condition find the matrix B required. Suppose two matrices A and B are similar, i.e., A ~ B. Then there exists a matrix P such that B = P~lAP. Thus
\E-B
=
P~1(\E-A)P.
We easily prove XE — A is equivalent to XE — B, i.e., XE — A ~ XE — B. 230
Jordan Canonical Form of Matrices
231
Conversely, if XE — A ~ XE — B, we will prove that there exist constant matrices R and S such that XE-B
= S(XE-A)R.
(1)
Consequently, E = SR, B = SAR, and hence A ~ B. Thus we have: Theorem 1. Two matrices A and B are similar if and only if their char acteristic matrices are equivalent: XE - A ~ XE -
B.
This theorem is very important. It not only gives the necessary and sufficient condition for two matrices to be similar, but, more importantly, it changes the similarity relation into an equivalence relation. It is difficult for us to deal with a similarity relation, but, using elementary operations, we can make the equivalence relation more specific. However we have given only the conclusion of the theorem, no its proof. Here the character istic matrices are A-matrices. However, the equivalence concept of A-matrices has not been given either. Therefore before a proof of the above theorem we first have to define some concepts, such as the concept of elementary operations on A-matrices, the concept of equivalence of two A-matrices. The concept of the rank of a A-matrix A(X) is the same as that of a cons tant matrix. Thus, the highest order of the minors of ^4(A) not being identically zero is called the rank of a A-matrix A(X). It is to be noted that minors of A(X) are polynomials in A. By a polynomial in A being identically zero we mean that any number can be its root. Hence its coefficients are all zero, and we often say that the polynomial is identically equal to zero. Because any polynomial of degree n only can have n roots, it is not a null polynomial, i.e., not a polynomial vanishing identically. Suppose the matrix A is of order n. Since \XE — A\ is a polynomial of degree n in A, and not identically vanishing, the characteristic matrix XE — A of A is of rank n. In other words, whether A is nonsingular or not the characteristic matrix of A is always nonsingular. We know that the elements in a A-matrix are polynomials in A, and the power exponents of A in a A-matrix can only be positive integers or zero. Owing to this constraint, elementary operations of A-matrices are not quite the same as those of matrices with scalar elements as in Sec. 2.5. The concrete definition is as follows.
232
Linear Algebra
Definition. The following operations are called elementary operations on a A-matrix A(X): (i) Interchanging two rows or two columns of A(X). (ii) Multiplying a row or a column by a nonzero number. (iii) Adding a row (column) multiplied by an arbitrary polynomial in A to another row (column). Note that the operation of type (ii) here is the same as in the case of constant matrices in Sec. 2.5. They are multiplied by a nonzero constant. If we should be allowed to multiply A(X) by a polynomial in A, then since the elements in A(X) are polynomials in A, the symmetric law of the equivalence relation might be destroyed. For instance, if we multiply I _ _ I by A we obtain I _ But in order to get I
I from I
j , we must divide the latter by A.
Divided by a polynomial in A, some elements of the A-matrices may become fractional functions of A, which is obviously not allowed. The definition such as given above is formulated so that the equivalence relation of A-matrices can satisfy the three equivalence laws as for constant matrices. The elementary operations of type (i) follows from that of types (ii) and (iii), as in Sec. 2.5. We list them separately only for convenience. As in Sec. 2.5, we can also define elementary A-matrix. It corresponds to elementary operations. As in matrices with scalar elements, two A-matrices A(X) and B(X) are said to be equivalent, denoted by A(X) ~ B(X), if there exists a finite sequence of elementary operations that can carry A(X) into B(X). The equivalent re lation of A-matrices satisfies also three laws of equivalence. The ranks of two equivalent A-matrices are equal. By analogy with Theorem 5 in Sec. 3.3, we can at once give the following theorem from definition: Theorem 2. Two A-matrices A(X) and B(X) of the same size are equivalent if and only if there are two invertible A-matrices -P(A) and Q(X) such that B(X) =
P(X)A(X)Q(X).
The reason is that every elementary operation corresponds to a multipli cation on the left side or on the right side of J4(A) by an elementary A-matrix and any nonsingular A-matrix can be expressed as a product of elementary A-matrices.
Jordan Canonical Form of Matrices
233
If a matrix A has an inverse matrix, we say that A is invertible (or nonsingular). For matrices with scalar elements, a nonsingular matrix has an inverse matrix and is thus an invertible matrix. But a nonsingular A-matrix is not necessarily an invertible matrix, which follows directly from the following theorem. Theorem 3. A A-matrix A(X) is invertible if and only if |>1(A)| is a nonzero constant. Proof. If \A(X)\ = c ^ 0 , then the elements of the matrix A (A)* are minors lors of order n — 1 of A(X). After dividing by c /^ 0 we obtain AW \A(X)\ \AjXj\A{Xr ''
whose Dse elements are polynomials in A. Thus from Theorem 2 in Sec. 3.3
A A
< >~'-|.4(V W '-
Again lin ^ ( A ) - 1 is a A-matrix. Therefore A(X) is an invertible A-matrix. Conversely, if A(X) is invertible, then its inverse matrix A(X)-1 is a Amatrix. trix. Prom From the equality A(X)~1XA(X) A(X) = E, we obtain \A(X)\ \A(X)\ = \E\ = 1 and1so so|-<4(A)| |J4(A)| isisaanonzero nonzeroconstant. constant. The Theproof proof isiscomplete. complete. Now we verify Theorem 1. From Theorem 2, we at once obtain the necessary condition for Theorem 1. The3 sufficient condition for the theorem is proved as follows. Before going on to prove the sufficient condition, we first present a basic property perty of A-matrix which plays an important role in the proof. In high school we learnt division algorithm: Dividing a polynomial p(x) by i - a awwe e obtain a quotient q{x) and a remainder r: p(x) = (x - a)q{x) + r, where r is a constant, and both q(x) and r are unique. For any A-matrix A(X), by matrix addition and scalar multiplication, it can be written as a polynomial in A with matric coefficients: A(X) = A0Xm + ■■■+ Am-iX + A,
234
Linear Algebra
where AQ,A\,...,Am are constant matrices. When AQ ^ O, we say that the matrix A(X) is of degree m. Thus a A-matrix of degree zero is simply a constant matrix. The above result just given about division carries over for matric polyno mials, and the proof is analogous. T h e o r e m 4. Suppose P(A) is a A-matrix, A is a constant matrix. Then there exist a A-matrix Q(A) and a constant matrix R such that P(X) = (XE-A)Q{X)
+ R,
and Q(X) and R are unique. Proof. Supposing P(X) = P0Xm + PxX™-1 + ■ - ■ + Pm, where Po(^ 0), P\,...,
Pm are all constant matrices, we have
P(A) - (XE - A)P0Xm'1
= (Pi + AP0)Xm-1 + P2Xm-2 + ■ ■ ■ + Pm .
The degree of A-matrix on the righ-hand side does not exceed m — 1. If we can prove that its degree is zero, then the theorem is established. This can be done by an absurdity argument. If its degree is more than zero, we continue in the same way to decrease the degree of A-matrix on the right-hand side, until it becomes zero. Combining the terms which contain XE — A, we obtain P(X) =
(XE-A)Q(X)+R,
where Q{X) is a A-matrix and R a constant matrix. This proves the existence of <5(A) and R. As for uniqueness, from (XE - A)Q(X) + R = (XE-
A)Q0(k) + Po
we obtain (AP-J4)[Q(A)-Qo(A)]=P0-P. Since the right-hand side of the above expression is a constant matrix, we have (5(A) = <3o(A). Therefore P = Po. Thus the theorem is established. Similarly we have P(X) =
Q1{X)(\E-A)+Ri,
where Qi(A) is a A-matrix, P i is a constant matrix and Qi(A) and P i are both unique.
Jordan Canonical Form of Matrices
235
Note that since matrix multiplication is not generally commutative, the above Q(X) and Q\(X) are not necessarily identical. Similarly R and R\ are also not necessarily identical. In the case of polynomials with scalar coefficients this does not arise. Finally we verify the following theorem, which can be used to prove readily the above formula (1). T h e o r e m 5. Suppose matrices A and C are nonsingular. If AX + B is equivalent to CX + D, i.e., CX + D = P(X)(AX + B)Q(X), where P(X) and Q(X) are invertible, then there exist nonsingular constant matrices S and R such that CX + D = S(AX D--= S(AX+ B)R. B)R. Proof. Suppose P(X) == (CX + D)M(X) D)M(X) + PW--
S,
Q(X) == N(X)(CX N(X)(CX + D) + R, Q(A) where S and R are constant matrices. From the assumption we have P-\X)(CX P-\X)(CX
+ D) = {AX + B)Q(X). B)Q(X).
;he expression of Q(X) gives Substitution of the \X)(CX + D) = (AX + B){N(X)(CX B){N(X)(CX + D) + R} , P-^X^CX or _1 B)N(X)}(CX + D) = (AX + B)R. B)R. {P-\X)(A) - (AX + B)N(X)}(CX
(2) (2)
Comparing the degrees of the two sides of the above identity we we obtain a constant matrix 1 T = P~ P-\X)-(AX (X)-(AX + B)N(X), B)N(X), or P(X)T = E- P(X)(AX P(X)T P(X)(AX + B)N(X) B)N(X)
= F, - (c\ +-i- mo-itx^Ntx} = E-(CX D)Q-HX)N(X).
236
Linear Algebra
Thus E = P{X)T + (CX + D)Q~1(X)N(X) = {(CX + D)M(X) + S}T + (CX +
D)Q~1(X)N(X)
= (CX + D){M(X)T + Q~1(X)N(X)} + ST. Again comparing the degrees of the two sides of the above identity we get E = ST,
or
T~l = S.
Therefore, from (2) we obtain CX + D = S(AX + B)R, where S is nonsingular. Similarly, R is also nonsingular. Thus the above theorem is completely proved. 6.2. Canonical Form of A-Matrices Theorem 1 in the preceding section only enables us to change the similar problem into an equivalent problem. But for a given matrix A, how do we find a simpler matrix B such that the characteristic matrices of A and B are equi valent? For this purpose we first discuss the conditions for two A-matrices to be equivalent. In the following two sections we shall discuss these problems. In this section we discuss the canonical forms of A-matrices. In Sec. 2.5, according to their canonical forms we have discussed the equivalent conditions for con stant matrices. Now we shall discuss the equivalent conditions for A-matrices as in Sec. 2.5. We shall next discuss the problem of reducing A-matrices to the diagonal matrices by elementary operations. We cannot directly apply the method in Chapter 2 to find the diagonal matrix of A-matrix. The reason is as follows. In Chapter 2, using elementary operation 3 we divide a row or a column by a number, and then add the result to another row or another column so that the other nonzero numbers in the same row and in the same column become zero. But here we can multiply a row or a column only with a polynomial in A. Then in order to reduce the other elements in the same row and in the same column to zero we must make it divide other elements in the same row and in the same column. Using the method for decreasing the degrees of elements, we can then arrive at our goal. We know that if a A-polynomial g(X) cannot divide /(A) we would have /(A)=p(A)5(A)+r(A),
(1)
Jordan Canonical Form of Matrices
237
where r(A) ^ 0, and has a degree less than that of g(X). Making use of this property, we perform elementary operations to decrease the degrees of the elements and can finally reduce a A-matrix to a diagonal A-matrix. The procedure is illustrated below with a simple example. Example 1. Reduce the A-matrix /0 A(A) = A \0
A(A-l) 0 0
0 \ A +1 - \ + 2j
to a diagonal matrix. Solution: Note that the element at the upper left corner is zero. Exchanging the first and second rows we obtain A 0 0
0 A(A-l) 0
A+ l \ 0 - A + 2y
As A + 1 is not divisible by A, with help of (1) we substract the first column from the third column and exchange the first column and the third column in the resulting matrix to obtain 1 0 -A+2
0 A(A-l) 0
A\ 0 . Oy
The element at the upper left corner is now 1 which can divide any element in above matrix. Using the same methods as in Sec. 2.5, we obtain /l 0 ^0
0 A(A-l) 0
0 \ 0 A(A-2)/
This is the diagonal matrix required. The solution is complete. In general, the method for finding the diagonal matrix of a A-matrix is the same as in Example 1. First we apply elementary operations on the A-matrix so that the element at the upper left corner of the new A-matrix can divide any element in the first row and in the first column. This is possible. In fact if there are some nonzero elements which are not divisible by the element
238
Linear Algebra
at the upper left corner, using formula (1) and elementary operations we can decrease the degree of the element at the upper left corner, so that the element at the upper left corner may divide any element in the first row and in the first column. If they still are not divisible by the element at the upper left corner, we continue to decrease the degree of the element at the upper left corner until its degree becomes zero, that is, the element at the upper left corner is simply a constant, at which time clearly any polynomial in A is divisible by it. Then following the method of reducing a constant matrix to a diagonal matrix as in Sec. 2.5, we can make the other elements in first row and in the first column vanish. This process is continued for the remaining A-matrices in the lower right corner. Ultimately we arrive at a diagonal A-matrix. Compared to the method in Sec. 2.5, we have added a procedure by which we make the element at the upper left corner divide any element in the same row and in the same column. Clearly this procedure is not necessary for constant matrices. We know the canonical form of a constant matrix is unique because we cannot use elementary operations to simplify it further. Whereas the above diagonal matrix of a A-matrix can be simplified further by elementary opera tions and hence is not unique. For instance, in Example 1, since the element A(A — 2) is not divisible by A(A — 1), adding the third row to the second row gives 1 0 v0
0 A(A-l) 0
0 \ A(A-2) A(A-2)y
Using formula (1), from A(A - 2) = A(A — 1) - A we obtain 1 0 0
0 A(A-l) 0
0 \ /l -A ~ 0 A(A-2)/ \0
0 -A A(A - 2)
0 A(A - 1) 0
where any element in the same row and in the same column is divisible by the element at the upper left corner of the matrix of order 2 in the lower right corner. Hence we obtain 1 0 0
0 -A A(A - 2)
0 ^ 0 A(A --1)(A- - 2 ) j
/I
Jo
Vo
0 0 A 0 0 A(A - 1)(A - 2)
239
Jordan Canonical Form of Matrices
where each element on the principal diagonal is a monic polynomial in $\lambda$ and each is divisible by the one preceding it. We cannot simplify this diagonal matrix further by elementary operations. The same holds in the general case. Since the ranks of equivalent matrices are the same, we have:

Theorem 1. Any $\lambda$-matrix $A(\lambda)$ of rank $r$ can be reduced by elementary operations to a diagonal matrix
$$\begin{pmatrix} E_1(\lambda) & & & \\ & \ddots & & \\ & & E_r(\lambda) & \\ & & & O \end{pmatrix} \qquad (2)$$
where $E_i(\lambda)$ is a monic polynomial in $\lambda$ such that $E_{i-1}(\lambda)$ divides $E_i(\lambda)$, i.e., $E_{i-1}(\lambda)\,|\,E_i(\lambda)$, $i = 2,\ldots,r$.

Such a diagonal $\lambda$-matrix as (2) is called the canonical form of $A(\lambda)$. Clearly the canonical form of a constant matrix is a special case of that of $A(\lambda)$. It should be noted that we require $E_{i-1}(\lambda)$ to divide $E_i(\lambda)$ in order that the degrees of the monic $\lambda$-polynomials $E_1(\lambda),\ldots,E_r(\lambda)$ in (2) may be decreased as far as possible. Thus the form (2) is the simplest form. Evidently canonical forms of constant matrices all have these properties.

The canonical form of a $\lambda$-matrix is unique, just as that of a constant matrix. Its uniqueness is proved as follows. First we study the properties of the $E_i(\lambda)$ in (2). In (2) the nonzero minors of each order are all principal minors, so we can easily find all the nonzero minors of each order, and hence the greatest common divisor of all minors of the same order. Denote the greatest common divisor (taken monic) of all minors of order $m$ in (2) by $D_m(\lambda)$. The only nonzero minors of order 1 in (2) are the $E_i(\lambda)$. As $E_1(\lambda)$ divides every $E_i(\lambda)$, we have $D_1(\lambda) = E_1(\lambda)$. Similarly, since the nonzero minors of order 2 in (2) are exactly the products $E_i(\lambda)E_j(\lambda)$ $(i \neq j)$, while $E_1(\lambda)E_2(\lambda)$ divides every $E_i(\lambda)E_j(\lambda)$, we have $D_2(\lambda) = E_1(\lambda)E_2(\lambda)$. In general, $D_i(\lambda) = E_1(\lambda)E_2(\lambda)\cdots E_i(\lambda)$, i.e.,
$$D_1(\lambda) = E_1(\lambda),\quad D_2(\lambda) = E_1(\lambda)E_2(\lambda),\quad \ldots,\quad D_r(\lambda) = E_1(\lambda)E_2(\lambda)\cdots E_r(\lambda). \qquad (3)$$
Thus
$$\frac{D_i(\lambda)}{D_{i-1}(\lambda)} = E_i(\lambda), \qquad i = 1,\ldots,r, \quad D_0(\lambda) = 1.$$
This is an important relation among the $E_i(\lambda)$ in (2). For example, in a constant matrix of rank $r$,
$$E_1 = \cdots = E_r = 1, \qquad D_1 = \cdots = D_r = 1.$$
Again, in Example 1 above,
$$E_1(\lambda) = 1,\quad E_2(\lambda) = \lambda,\quad E_3(\lambda) = \lambda(\lambda-1)(\lambda-2),$$
therefore $D_1(\lambda) = 1$, $D_2(\lambda) = \lambda$, $D_3(\lambda) = \lambda^2(\lambda-1)(\lambda-2)$.

We next verify that $D_m(\lambda)$ computed from the canonical form (2) of $A(\lambda)$ is simply $D_m(\lambda)$ of $A(\lambda)$ itself; that is to say, elementary operations do not change the greatest common divisor $D_m(\lambda)$ of all minors of order $m$ in $A(\lambda)$. Hence $E_m(\lambda)$ is also unchanged, i.e., $E_m(\lambda)$ is uniquely determined by $A(\lambda)$, independently of the elementary operations used, and so the canonical form (2) of $A(\lambda)$ is unique.

If we apply elementary operations and carry $A(\lambda)$ into $B(\lambda)$, then, as in the proof of Theorem 1 in Sec. 2.5, we can easily verify that any minor $B_m(\lambda)$ of order $m$ in $B(\lambda)$ can be written as $kA_m(\lambda)$ or $A_{m1}(\lambda) + f(\lambda)A_{m2}(\lambda)$, where $A_m(\lambda)$, $A_{m1}(\lambda)$, and $A_{m2}(\lambda)$ are minors of order $m$ in $A(\lambda)$. Assume that $D_{m1}(\lambda)$ is the greatest common divisor (a monic polynomial in $\lambda$) of all minors of order $m$ in $A(\lambda)$. Since $D_{m1}(\lambda)\,|\,A_m(\lambda)$ and $D_{m1}(\lambda)\,|\,A_{mi}(\lambda)$ $(i = 1,2)$, we have $D_{m1}(\lambda)\,|\,B_m(\lambda)$, i.e., $D_{m1}(\lambda)$ divides all the minors of order $m$ of $B(\lambda)$. If the greatest common divisor of all the minors of order $m$ in $B(\lambda)$ is $D_{m2}(\lambda)$, then $D_{m1}(\lambda)\,|\,D_{m2}(\lambda)$. From $A(\lambda) \sim B(\lambda)$ we have $B(\lambda) \sim A(\lambda)$, and therefore $D_{m2}(\lambda)\,|\,D_{m1}(\lambda)$. Since $D_{m1}(\lambda)$ and $D_{m2}(\lambda)$ are monic polynomials in $\lambda$, $D_{m1}(\lambda) = D_{m2}(\lambda)$. That is to say, elementary operations leave the greatest common divisor $D_m(\lambda)$ of all the minors of order $m$ in $A(\lambda)$ unchanged. This completely proves the uniqueness of the canonical form.

$E_m(\lambda)$ is known as the $m$-th invariant factor of $A(\lambda)$. In general, following the method described above and using elementary operations, we can find the canonical form of a $\lambda$-matrix. But for some special $\lambda$-matrices, if $D_i(\lambda)$ is easily found, then $E_i(\lambda)$ is immediately obtained. Such a method is sometimes more convenient for finding the canonical form.
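The relation $E_i(\lambda) = D_i(\lambda)/D_{i-1}(\lambda)$ can also be checked by machine. The following minimal sketch, which assumes the Python library sympy and uses the intermediate diagonal form $\mathrm{diag}(1,\ \lambda(\lambda-1),\ \lambda(\lambda-2))$ of Example 1 as input, computes each $D_m(\lambda)$ as a greatest common divisor of minors and then recovers the invariant factors; it should reproduce $D_1 = 1$, $D_2 = \lambda$, $D_3 = \lambda^2(\lambda-1)(\lambda-2)$.

```python
from itertools import combinations
from sympy import Matrix, symbols, gcd, factor

lam = symbols('lambda')

# Intermediate diagonal form of Example 1 (before full reduction)
A = Matrix([[1, 0, 0],
            [0, lam*(lam - 1), 0],
            [0, 0, lam*(lam - 2)]])

def D(m, M):
    """Greatest common divisor of all minors of order m (up to a constant factor)."""
    n = M.rows
    g = 0
    for rows in combinations(range(n), m):
        for cols in combinations(range(n), m):
            g = gcd(g, M.extract(list(rows), list(cols)).det())
    return factor(g)

Ds = [D(m, A) for m in (1, 2, 3)]
Es = [Ds[0]] + [factor(Ds[i] / Ds[i - 1]) for i in (1, 2)]
print(Ds)   # expected: [1, lambda, lambda**2*(lambda - 2)*(lambda - 1)]
print(Es)   # expected: [1, lambda, lambda*(lambda - 2)*(lambda - 1)]
```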
Example 2. Find the canonical form of the $\lambda$-matrix
$$A(\lambda) = \begin{pmatrix} \lambda+1 & -2 & 0 & 0 \\ 2 & \lambda+1 & 0 & 0 \\ 1 & 0 & \lambda+1 & -2 \\ 0 & 1 & 2 & \lambda+1 \end{pmatrix}.$$
Solution: Obviously
$$D_4(\lambda) = \begin{vmatrix} \lambda+1 & -2 \\ 2 & \lambda+1 \end{vmatrix}\cdot\begin{vmatrix} \lambda+1 & -2 \\ 2 & \lambda+1 \end{vmatrix} = [(\lambda+1)^2 + 4]^2.$$
Since
$$\begin{vmatrix} 2 & \lambda+1 & 0 \\ 1 & 0 & \lambda+1 \\ 0 & 1 & 2 \end{vmatrix} = -4(\lambda+1),$$
we have $D_3(\lambda)\,|\,(\lambda+1)$, and thus $D_3(\lambda) = 1$ or $D_3(\lambda) = \lambda+1$. As $D_3(\lambda)\,|\,D_4(\lambda)$ and $\lambda+1$ does not divide $[(\lambda+1)^2+4]^2$, we have $D_3(\lambda) \neq \lambda+1$, hence $D_3(\lambda) = 1$. Thus $D_2(\lambda) = D_1(\lambda) = 1$, i.e.,
$$D_1 = D_2 = D_3 = 1, \qquad D_4 = [(\lambda+1)^2+4]^2,$$
and so
$$E_1 = E_2 = E_3 = 1, \qquad E_4 = [(\lambda+1)^2+4]^2 = (\lambda^2+2\lambda+5)^2.$$
Therefore the canonical form required is
$$\begin{pmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ & & & (\lambda^2+2\lambda+5)^2 \end{pmatrix}.$$
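One possible cross-check of this example, assuming the sympy library and the matrix $A(\lambda)$ as displayed above, is to compute $D_4(\lambda)$ directly and to verify that the greatest common divisor of all minors of order 3 is a constant:

```python
from itertools import combinations
from sympy import Matrix, symbols, gcd, factor

lam = symbols('lambda')

A = Matrix([[lam + 1, -2, 0, 0],
            [2, lam + 1, 0, 0],
            [1, 0, lam + 1, -2],
            [0, 1, 2, lam + 1]])

print(factor(A.det()))              # (lambda**2 + 2*lambda + 5)**2, i.e. [(lambda+1)**2 + 4]**2

g = 0
for rows in combinations(range(4), 3):
    for cols in combinations(range(4), 3):
        g = gcd(g, A.extract(list(rows), list(cols)).det())
print(g)                            # a nonzero constant, hence D_3 = D_2 = D_1 = 1
```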
Example 3. Prove that the minimal polynomial of a matrix $A$ is the $n$-th invariant factor $E_n(\lambda)$ of the characteristic matrix $\lambda E - A$.

Proof. It is clear that $d(\lambda)$ in Theorem 2, Sec. 5.5, is the greatest common divisor of all the minors of order $n-1$ in $\lambda E - A$, i.e., $d(\lambda) = D_{n-1}(\lambda)$. But $|\lambda E - A| = f(\lambda)$ is simply $D_n(\lambda)$, i.e., $f(\lambda) = D_n(\lambda)$, so
$$m(\lambda) = \frac{f(\lambda)}{d(\lambda)} = \frac{D_n(\lambda)}{D_{n-1}(\lambda)} = E_n(\lambda),$$
and the statement holds.
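As an illustration of this relation, the following sketch (the $2\times 2$ test matrix is chosen arbitrarily, and sympy is assumed) computes $f(\lambda) = D_n(\lambda)$ and $d(\lambda) = D_{n-1}(\lambda)$ for $\lambda E - A$, forms their quotient, and verifies that the quotient annihilates $A$:

```python
from sympy import Matrix, symbols, eye, gcd, factor, simplify

lam = symbols('lambda')

# An arbitrary small test matrix, chosen only for illustration
A = Matrix([[2, 1], [0, 2]])
C = lam*eye(2) - A                      # characteristic matrix

f = factor(C.det())                     # f(lambda) = D_2(lambda)
d = 0
for entry in C:                         # order-1 minors of C are its entries
    d = gcd(d, entry)
m = factor(f / d)                       # minimal polynomial = D_n / D_{n-1} = E_n
print(f, d, m)                          # (lambda - 2)**2, 1, (lambda - 2)**2

# cross-check: m(A) should be the zero matrix
print(simplify((A - 2*eye(2))**2))      # Matrix([[0, 0], [0, 0]])
```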
In the above we have given the definition of the canonical form of a $\lambda$-matrix and proved its uniqueness. As for using canonical forms to establish the conditions for equivalence of two $\lambda$-matrices, this will be discussed in detail in the next section.

Exercises

1. Find the canonical forms of the following $\lambda$-matrices:
$$(1)\ \begin{pmatrix} \lambda-a & 0 \\ 0 & \lambda-b \end{pmatrix}, \qquad (2)\ \begin{pmatrix} \lambda & 0 & 0 \\ 0 & \lambda(\lambda+1) & 0 \\ 0 & 0 & 0 \end{pmatrix},$$
$$(3)\ \begin{pmatrix} (\lambda+1)^2 & 0 & 0 \\ 0 & \lambda^2+2\lambda+1 & 0 \\ 0 & 0 & \lambda+1 \end{pmatrix}, \qquad (4)\ \begin{pmatrix} \lambda-3 & -1 & 0 & 0 \\ 4 & \lambda+1 & 0 & 0 \\ -6 & -1 & \lambda-2 & -1 \\ 14 & 5 & 1 & \lambda \end{pmatrix}.$$
2. Show that the $\lambda$-matrix
$$\begin{pmatrix}
\lambda & 0 & 0 & \cdots & 0 & a_n \\
-1 & \lambda & 0 & \cdots & 0 & a_{n-1} \\
0 & -1 & \lambda & \cdots & 0 & a_{n-2} \\
\vdots & & \ddots & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & \lambda & a_2 \\
0 & 0 & 0 & \cdots & -1 & \lambda+a_1
\end{pmatrix}$$
has only one invariant factor, $E_n(\lambda) = \lambda^n + a_1\lambda^{n-1} + \cdots + a_{n-1}\lambda + a_n$, which is not a constant.

3. Prove that the determinants of two equivalent $\lambda$-matrices of order $n$ differ only by a nonzero scalar factor.

6.3. Necessary and Sufficient Condition for Two $\lambda$-Matrices to be Equivalent

Now let us discuss the equivalence of two $\lambda$-matrices, i.e., solve the first problem raised in the introduction. From the preceding section we know that two equivalent $\lambda$-matrices have the same canonical form. Conversely, if two $\lambda$-matrices of the same size have the same canonical form, then they are equivalent to the same $\lambda$-matrix
and hence are equivalent. Therefore two $\lambda$-matrices of the same size are equivalent if and only if they have the same canonical form. If the canonical form is expressed by invariant factors, the condition is embodied in the following theorem:

Theorem 1. Two $\lambda$-matrices of the same size are equivalent if and only if they have the same invariant factors.

Two equivalent $\lambda$-matrices have the same $E_i(\lambda)$ and hence the same $D_i(\lambda)$; the converse also holds. Thus we further obtain:

Theorem 2. Two $\lambda$-matrices of the same size are equivalent if and only if they have the same greatest common divisor of all the minors of each order.

We may understand the above two theorems in the following way. After defining the greatest common divisors $D_i(\lambda)$, we obtain a necessary and sufficient condition for two $\lambda$-matrices to be equivalent. After decomposing the $D_i(\lambda)$ into products of factors, we obtain the $E_i(\lambda)$, and with the $E_i(\lambda)$ we obtain another necessary and sufficient condition for two $\lambda$-matrices to be equivalent. If we further decompose the $E_i(\lambda)$ into products of the simplest factors, do we obtain yet another necessary and sufficient condition for two $\lambda$-matrices to be equivalent? The question can be discussed as follows.

Let the rank of $A(\lambda)$ be $r$ and let $E_1(\lambda),\ldots,E_r(\lambda)$ be its invariant factors. If the coefficients of the elements $a_{ij}(\lambda)$ of $A(\lambda)$ are real numbers, then the coefficients of $E_1(\lambda),\ldots,E_r(\lambda)$ are all real numbers, whereas the coefficients of their linear factors may in general be complex rather than real. Hence, within the realm of real numbers we cannot in general decompose $E_1(\lambda),\ldots,E_r(\lambda)$ into products of linear factors of $\lambda$. Suppose we can decompose each $E_i(\lambda)$ into a product of linear factors of $\lambda$. Let
$$E_i(\lambda) = (\lambda-\lambda_1)^{e_{i1}}\cdots(\lambda-\lambda_t)^{e_{it}}, \qquad i = 1,\ldots,r. \qquad (1)$$
We then have the array
$$\begin{matrix} (\lambda-\lambda_1)^{e_{11}}, & \ldots, & (\lambda-\lambda_1)^{e_{r1}}, \\ \vdots & & \vdots \\ (\lambda-\lambda_t)^{e_{1t}}, & \ldots, & (\lambda-\lambda_t)^{e_{rt}}, \end{matrix} \qquad (2)$$
where the $\lambda$'s are such that $\lambda_i \neq \lambda_j$ when $i \neq j$, that is to say, $\lambda_1, \lambda_2, \ldots, \lambda_t$ are distinct. Since $E_{i-1}(\lambda)\,|\,E_i(\lambda)$, we have
$$e_{i-1,j} \leqslant e_{ij}, \qquad j = 1,\ldots,t.$$
We may also assume $e_{rj} \neq 0$ $(j = 1,2,\ldots,t)$, for otherwise the factor $\lambda-\lambda_j$ would not occur at all. Thus in each row of (2) the powers are not all constants. The powers of linear factors in (2) which are not constants, i.e., all $(\lambda-\lambda_j)^{e_{ij}}$ with $e_{ij} \neq 0$, are called elementary divisors of $A(\lambda)$.

For example, in Example 1 of the previous section,
$$E_1(\lambda) = 1,\quad E_2(\lambda) = \lambda,\quad E_3(\lambda) = \lambda(\lambda-1)(\lambda-2),$$
giving its elementary divisors as $\lambda$, $\lambda$, $\lambda-1$, $\lambda-2$.

Example 1. Find the elementary divisors of the characteristic matrix of the matrix
$$A = \begin{pmatrix} -1 & 1 & 0 \\ -4 & 3 & 0 \\ 1 & 0 & 2 \end{pmatrix}.$$
Solution: As
$$\lambda E - A = \begin{pmatrix} \lambda+1 & -1 & 0 \\ 4 & \lambda-3 & 0 \\ -1 & 0 & \lambda-2 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 & 0 \\ 0 & (\lambda+1)(\lambda-3)+4 & 0 \\ 0 & 0 & \lambda-2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & (\lambda-1)^2 & 0 \\ 0 & 0 & \lambda-2 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & (\lambda-2)(\lambda-1)^2 \end{pmatrix},$$
the elementary divisors required are $\lambda-2$, $(\lambda-1)^2$. The solution is complete.

We know that the rank and the invariant factors $E_i(\lambda)$ of $A(\lambda)$ are invariant under elementary operations, so its elementary divisors in (2) are also invariant. Hence if two $\lambda$-matrices are equivalent, then their ranks are equal and they have the same elementary divisors. The converse is also true: if the ranks of two $\lambda$-matrices are equal and their elementary divisors are identical, then they are equivalent. For example, consider two $\lambda$-matrices of order 6 with the same elementary divisors
$$\lambda,\ \lambda^3,\ \lambda-1,\ \lambda-1,\ \lambda+1,\ (\lambda+1)^2.$$
If their ranks are 4, then each has four invariant factors, and since $E_{i-1}(\lambda)\,|\,E_i(\lambda)$, these must be
$$E_4(\lambda) = \lambda^3(\lambda-1)(\lambda+1)^2,\quad E_3(\lambda) = \lambda(\lambda-1)(\lambda+1),\quad E_2(\lambda) = E_1(\lambda) = 1.$$
Hence the two $\lambda$-matrices have the same invariant factors and are thus equivalent. In general, if two $\lambda$-matrices of rank $r$ have the same elementary divisors (2), then their invariant factors are the same and are given by (1), because $E_{i-1}(\lambda)\,|\,E_i(\lambda)$; therefore the two $\lambda$-matrices are equivalent. Thus we obtain another necessary and sufficient condition for two $\lambda$-matrices to be equivalent.

Theorem 3. Two $\lambda$-matrices of the same size are equivalent if and only if their ranks are equal and they have the same elementary divisors.

It should be noted that two $\lambda$-matrices of the same size which merely have the same elementary divisors are not necessarily equivalent. For example, the six elementary divisors given above may be the elementary divisors of a $\lambda$-matrix of rank 3, or of a $\lambda$-matrix of rank 5. Only when the rank is specified can the invariant factors be determined uniquely. Therefore equality of ranks is an indispensable condition for two $\lambda$-matrices to be equivalent.

We shall consider the reduction of a matrix to its Jordan canonical form later. A key step there is finding the elementary divisors of the characteristic
matrix. Thus finding elementary divisors is an extremely important problem. This section closes with a discussion of methods of finding elementary divisors.

Elementary divisors can also be found from diagonal matrices, not necessarily only from canonical forms. This is the content of the following theorem:

Theorem 4. Assume that a $\lambda$-matrix $A(\lambda)$ is equivalent to a diagonal $\lambda$-matrix
$$A(\lambda) \sim \begin{pmatrix} f_1(\lambda) & & & \\ & \ddots & & \\ & & f_r(\lambda) & \\ & & & O \end{pmatrix}.$$
Then all powers of linear factors of $f_1(\lambda),\ldots,f_r(\lambda)$ are exactly the elementary divisors of $A(\lambda)$.

Proof. Let
$$f_i(\lambda) = (\lambda-\lambda_1)^{n_{i1}}\,g_i(\lambda), \qquad g_i(\lambda_1) \neq 0, \quad n_{i1} \geqslant 0, \quad i = 1,\ldots,r.$$
Among the factors
$$(\lambda-\lambda_1)^{n_{11}},\ \ldots,\ (\lambda-\lambda_1)^{n_{r1}}, \qquad n_{i1} \geqslant 0,$$
consider a factor $(\lambda-\lambda_1)^{n_{m1}}$ with exponent $n_{m1} > 0$. We may suppose $n_{11} \leqslant n_{21} \leqslant \cdots \leqslant n_{r1}$. As mentioned earlier, the nonzero minors of order $m$ of the diagonal matrix are the products $f_{i_1}(\lambda)\cdots f_{i_m}(\lambda)$. The smallest exponent with which $(\lambda-\lambda_1)$ divides such a product is $n_{11}+\cdots+n_{m1} = \sum_{i=1}^{m} n_{i1}$, so the exponent of $(\lambda-\lambda_1)$ in $D_m(\lambda)$ is $\sum_{i=1}^{m} n_{i1}$. Thus the exponent of $(\lambda-\lambda_1)$ in $E_m(\lambda) = D_m(\lambda)/D_{m-1}(\lambda)$ is
$$\sum_{i=1}^{m} n_{i1} - \sum_{i=1}^{m-1} n_{i1} = n_{m1},$$
and so $(\lambda-\lambda_1)^{n_{m1}}$ is an elementary divisor of $A(\lambda)$. By an analogous argument we can prove that all powers of linear factors of the $f_i(\lambda)$ are elementary divisors of $A(\lambda)$, and by definition the elementary divisors of $A(\lambda)$ consist of these only. Thus the theorem is established.
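Theorem 4 translates into a simple computation: factor each diagonal entry and collect the prime-power factors. The sketch below assumes sympy; over the rational numbers factor_list returns the irreducible-power factors, which coincide with the powers of linear factors for the examples of this section. Applied to the diagonal form $\mathrm{diag}(1,\ \lambda(\lambda-1),\ \lambda(\lambda-2))$ of Example 1 of Sec. 6.2, it recovers the elementary divisors $\lambda$, $\lambda$, $\lambda-1$, $\lambda-2$.

```python
from sympy import symbols, factor_list

lam = symbols('lambda')

# Diagonal entries of an equivalent diagonal lambda-matrix (Example 1 of Sec. 6.2)
diagonal = [1, lam*(lam - 1), lam*(lam - 2)]

elementary_divisors = []
for f in diagonal:
    _, factors = factor_list(f)          # ignore the constant content
    for base, exp in factors:
        elementary_divisors.append(base**exp)

print(elementary_divisors)               # [lambda, lambda - 1, lambda, lambda - 2] (up to ordering)
```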
Example 2. Find the elementary divisors of the characteristic matrix of the matrix
$$A = \begin{pmatrix} 4 & 6 & 0 \\ -3 & -5 & 0 \\ -3 & -6 & 1 \end{pmatrix}.$$
Solution: As
$$\lambda E - A = \begin{pmatrix} \lambda-4 & -6 & 0 \\ 3 & \lambda+5 & 0 \\ 3 & 6 & \lambda-1 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 & 0 \\ 0 & (\lambda-1)(\lambda+2) & 0 \\ 0 & 0 & \lambda-1 \end{pmatrix},$$
the elementary divisors required are $\lambda-1$, $\lambda-1$, $\lambda+2$. The solution is complete.

For some special $\lambda$-matrices the elementary divisors can be found through $D_i(\lambda)$ and $E_i(\lambda)$, which is sometimes much simpler than reducing them to diagonal $\lambda$-matrices.

Example 3. Find the elementary divisors of the $\lambda$-matrix of order $n$
$$A(\lambda) = \begin{pmatrix} \lambda-a & b_1 & & \\ & \lambda-a & \ddots & \\ & & \ddots & b_{n-1} \\ & & & \lambda-a \end{pmatrix},$$
where $b_1, b_2, \ldots, b_{n-1}$ are nonzero constants.

Solution: By direct computation we easily obtain $D_n(\lambda) = (\lambda-a)^n$. Deleting the first column and the $n$-th row of $|A(\lambda)|$, we get a determinant of order $n-1$ which equals the product of the nonzero constants $b_1 b_2 \cdots b_{n-1}$, and so $D_{n-1}(\lambda) = 1$. As $D_{i-1}(\lambda)\,|\,D_i(\lambda)$, we have $D_{n-2}(\lambda) = \cdots = D_1(\lambda) = 1$, and so
$$E_1(\lambda) = \cdots = E_{n-1}(\lambda) = 1, \qquad E_n(\lambda) = (\lambda-a)^n.$$
Thus the elementary divisors of $A(\lambda)$ consist only of $(\lambda-a)^n$. The solution is complete.

In a similar way we can show that the $\lambda$-matrix
$$\begin{pmatrix} \lambda-a & & & \\ b_1 & \lambda-a & & \\ & \ddots & \ddots & \\ & & b_{n-1} & \lambda-a \end{pmatrix}, \qquad b_i \neq 0,$$
has only one elementary divisor, $(\lambda-a)^n$. The example is of great importance: only after solving it can we find, in the next section, a matrix $\lambda E - B$ that is equivalent to $\lambda E - A$ by using Theorem 3. Thus the problem of finding the Jordan canonical form is completely solved.

In general, the minimal polynomial is not equal to the characteristic polynomial. But for the following special matrix the minimal polynomial is the same as the characteristic polynomial.

Example 4. Let a matrix $A$ of order $n$ be
$$A = \begin{pmatrix} a & & & \\ 1 & a & & \\ & \ddots & \ddots & \\ & & 1 & a \end{pmatrix}.$$
Prove that the characteristic polynomial $f(\lambda)$ of $A$ is necessarily the same as its minimal polynomial $m(\lambda)$, and both are $(\lambda-a)^n$, i.e., $f(\lambda) = m(\lambda) = (\lambda-a)^n$.
Proof. $f(\lambda) = |\lambda E - A| = (\lambda-a)^n$. The example above gives the $n$-th invariant factor of $\lambda E - A$ as $(\lambda-a)^n$. From Example 3 in Sec. 6.2, $m(\lambda) = (\lambda-a)^n$. Hence $f(\lambda) = m(\lambda)$. The proof is complete.
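The key computational fact used here, that the nilpotent part of such a matrix has index exactly $n$, can be checked quickly; the sketch below assumes sympy and takes $n = 4$ as an arbitrary illustration.

```python
from sympy import Matrix, symbols, eye, zeros

a = symbols('a')
n = 4
# Matrix of Example 4: a on the diagonal, 1 on the subdiagonal (n = 4 chosen for illustration)
A = a*eye(n) + Matrix(n, n, lambda i, j: 1 if i == j + 1 else 0)

N = A - a*eye(n)                  # nilpotent part
print((N**(n - 1)) != zeros(n, n))  # True: (lambda - a)**(n-1) does not annihilate A
print(N**n == zeros(n, n))          # True: (lambda - a)**n is the minimal polynomial
```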
Example 5. Find the elementary divisors of
$$A(\lambda) = \begin{pmatrix} \lambda-1 & -2 & -3 & -4 \\ & \lambda-1 & -2 & -3 \\ & & \lambda-1 & -2 \\ & & & \lambda-1 \end{pmatrix}.$$
Solution: As $|A(\lambda)| = (\lambda-1)^4$, we have $D_4(\lambda) = (\lambda-1)^4$. As
$$\begin{vmatrix} -2 & -3 & -4 \\ \lambda-1 & -2 & -3 \\ 0 & \lambda-1 & -2 \end{vmatrix} = -4\lambda(\lambda+1),$$
we have $D_3(\lambda)\,|\,4\lambda(\lambda+1)$. Since $D_3(\lambda)\,|\,D_4(\lambda) = (\lambda-1)^4$, it follows that $D_3(\lambda) = 1$. Thus $D_2(\lambda) = D_1(\lambda) = 1$,
$$E_1(\lambda) = E_2(\lambda) = E_3(\lambda) = 1, \qquad E_4(\lambda) = (\lambda-1)^4.$$
Thus $A(\lambda)$ has only one elementary divisor, $(\lambda-1)^4$.

Example 6. Prove that a $\lambda$-matrix $A(\lambda)$ is an invertible matrix if and only if it is equivalent to the unit matrix $E$.

Proof. Note that the greatest common divisor of all minors of each order in the unit matrix $E$ of order $n$ is 1. If $A(\lambda) \sim E$, then $D_n(\lambda) = 1$. Therefore $|A(\lambda)|$ is a nonzero constant, and so $A(\lambda)$ is an invertible $\lambda$-matrix. Conversely, if $A(\lambda)$ is an invertible $\lambda$-matrix, then $|A(\lambda)|$ is a nonzero constant. Hence $D_n(\lambda) = 1$, so every $E_i(\lambda) = 1$ and thus $A(\lambda) \sim E$. The statement holds.

Generally, we have the following method of finding elementary divisors of block diagonal $\lambda$-matrices.

Theorem 5. If a $\lambda$-matrix $A(\lambda)$ is a block diagonal $\lambda$-matrix $A(\lambda) = \mathrm{diag}\bigl(A_1(\lambda), A_2(\lambda), \ldots, A_m(\lambda)\bigr)$, then the elementary divisors of $A_1(\lambda), A_2(\lambda), \ldots, A_m(\lambda)$ taken together are exactly the elementary divisors of $A(\lambda)$.
Proof. Since elementary operations applied to some $A_i(\lambda)$ in $A(\lambda)$ do not affect the other $A_j(\lambda)$, $j \neq i$, the elementary operations carrying $A_1(\lambda)$,
$A_2(\lambda), \ldots, A_m(\lambda)$ into diagonal $\lambda$-matrices reduce $A(\lambda)$ to a diagonal $\lambda$-matrix. Therefore the elementary divisors of $A_1(\lambda), A_2(\lambda), \ldots, A_m(\lambda)$ are all the elementary divisors of $A(\lambda)$. The theorem is established.

Example 7. Find the elementary divisors of the $\lambda$-matrix
$$A(\lambda) = \begin{pmatrix} \lambda & 1 & 0 & 0 \\ 0 & \lambda & 0 & 0 \\ 0 & 0 & \lambda & 0 \\ 0 & 0 & 0 & \lambda-1 \end{pmatrix}.$$
Solution: As
$$A(\lambda) = \mathrm{diag}\left(\begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix},\ \lambda,\ \lambda-1\right)$$
is a block diagonal $\lambda$-matrix, and the elementary divisors of the blocks $\begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix}$, $(\lambda)$, and $(\lambda-1)$ are $\lambda^2$; $\lambda$; and $\lambda-1$ respectively, the
elementary divisors required are $\lambda^2$, $\lambda$, and $\lambda-1$. The solution is complete.

The above is the answer to the first question raised in the introduction.

Exercises

1. Find the invariant factors and elementary divisors of the following $\lambda$-matrices: (1)
0 A(A - 2) 2 0
A 3 ( A - 2 )\33 0 0
0 0 0
0 0
0 0
A(A + 1) 0
/A-3 4 (2) -6 \ 14
-1 A+l -1 5
0 0 A-2 1
0 \ 0 -1 XJ
0 0 0 0 A-2
0
\ 0 A-2 0 0 /
(X + a -0 (3) 0 V 0
0 1 X+ a 0 0 X+a 0 - 0
0 \ 1 0 X + a/
Using the greatest common divisors of all the minors of each order, find all the invariant factors and elementary divisors of the following A-matrices:
(1)
(X
1
0
0 0 \0
A A 0 A 0 3
0
\
/A + 2 0
1 A A + 2/
0 1
(2)
1 A+2
A+2 0
V
0 0
\
1 A + 2/
Given a A-matrix of order 7 and rank 5 whose elementary divisors are A,A,A3,A-2,(A-2)4,(A-2)4, find the greatest common divisors of all the minors of each order in A(X). Prove that the following two A-matrices 0 X-a 0 1
(X-a 0 0
A(X) =
02
0 0 X-a 0
02 0
0
V o
1 02
-1 0 0 X-a
0 -1 0 0
0 0
X-a 0
o \ 0 -1 0 0 X-a)
and 0 -1 0
B(X) =
0 0 -1
\
(A - a)2 + 02
2
(X-a) +0
0 0
\
1 2
0
0 1 (X-a)'- l+02)
are equivalent, and then find the elementary divisors of A{X). 5. Show that the matrix A of order n is similar to its transposed matrix A'. 6. Suppose 5(A) and h(X) are relatively prime. Show that (9(X)
O \
V O
h(X)J
1 O
O g(X)h(X)
7. Show that the characteristic polynomial of the matrix / 0
0
1
0
\ 0
0
...
...
0
-an
\
0
-an_i
1
-ax
)
is identical with its minimal polynomial, and that they are xn + aixn~l
H
h an-ix
+ an .
6.4. Jordan Canonical Forms In this section we describe methods of reducing a matrix of order n to its Jordan canonical form. This is to answer the second question raised in the introduction. According to Theorem 3 and Example 3 in the previous section, we can find a matrix J such that \E — A ~ \E — J. Suppose the elementary divisors of the characteristic matrix XE — A of a matrix A of order n are (A-A1r\(A-A2H)...,(A-Ai)mi. Prom Example 3 in the previous section we know the elementary divisors of the A-matrix of order m* /A-Ai
-1 A-Ai
\ '•• -1
A-Ai/
V are only (A — \i)mi.
If we let
then the above A-matrix of order wij can be rewritten as XE — Ji, i.e., /A-Ai XEi — Ji — I
V
-1
\ .
_i
A-AJ
'
where Ei is a unit matrix of order m,. In particular, when rrii = 1, we have Let J = Jt
Since the degree of \XE — A\ is n, the sum of the degrees of the invariant factors in XE — A is also n. Therefore the sum of the degrees of the elementary divisors in XE — A is also n, i.e., ra\ + m,2 + ■ ■ ■ +mt
=n.
Thus the matrix J is of order n. Prom Theorem 5 in the previous section, the characteristic matrix of J is ' XEi - Jx XE-J
=
V
XEt-JtJ
whose elementary divisors are (A-AO^,
i = i,...,t.
Therefore the elementary divisors of (XE—J) are the same as those of (XE—A). Since XE — J and XE — A are nonsingular matrices of order n, according to Theorem 3 in the previous section they are equivalent, i.e., XE — J ~ XE — A. Hence J is the constant matrix required. Theorem 1 in Sec. 6.1 then gives A ~ J and we obtain the following main theorem: Theorem 1. Suppose the elementary divisors of the characteristic matrix XE — A of a matrix A of order n are (A-A1ri,(A-A2)m2,
, (A - xty
then A~
J=\ jt
where
/Xi
1 Xi
V are matrices of order m^.
i Xi)
J is known as the Jordan (C. Jordan, 1838-1922) canonical form of a matrix A, Ji is called a Jordan Block associated with (A — \i)mi or Aj. When A; ^ 0, the rank of Ji is rm; when Ai = 0, the rank of Ji is rrn — 1 and J t m; = O, i.e., Ji is a nilpotent matrix. From Example 4 of the previous section we see that (A—Aj)mi is the minimal polynomial of J,. Since similar matrices have the same characteristic polynomial, we have \XE -A\
= (X- Ai)7711 (A - A 2 ) m2 • • • (A - A t ) m ' .
Therefore Ai, Ag,..., At are eigenvalues of A. This is to say, A, in Theorem 1 are characteristic roots of A. It is worth noting that when i ^ j , we may have A, = Aj, so Aj need not be an eigenvalue of multiplicity mj of A, generally its multiplicity ^ m*. In other words, the elementary divisors of XE — A cannot be determined from the multiplicity of the eigenvalues of A. For example, suppose Ai is an eigenvalue of A. When Ai is a simple root, A — Ai is of course an elementary divisor of XE — A. But when Ai is an eigenvalue of multiplicity 2, the elementary divisors of XE — A may be (A — Ai) 2 , or A — Ai and A — Ai. Many such examples may be given. Since the elementary divisors of a A-matrix are unchanged under elementary operations and are uniquely determined by itself, the Jordan block Ji of A is unique. The Jordan canonical form J is also unique up to the order of the Jordan block Ji. Since A is similar to its transposed matrix A', they have the same Jordan canonical form. E x a m p l e 1. Find the Jordan canonical form of the matrix /-l A=
-4
1
0\
3 0
.
V 1 0 2) Solution: From Example 1 of the previous section we know that elementary divisors of XE - A are A - 2, (A - l) 2 . Thus
I2 A~J
= \
V
\ 11
,
i/
therefore the matrix A is not similar to a diagonal matrix. The result coincides with that in Example 3 of Sec. 5.1.
Jordan Canonical Form of Matrices
255
Example 2. Find the Jordan canonical form of the matrix (\
2 3 4\ 1 2 3 1 2
\
1/
Solution: From Example 5 of the previous section, we see that elementary divisors of XE — A are only (A - l ) 4 , and so (\ A~
1 1
J
1 1 1 1/
Example 3. Find the Jordan canonical form of the following matrix, / 4 4 1 A = -3 \-3
6 -5 -6
0' 0\ 0 1
Solution: From Example 2 of the previous section, we see that the elemen tary divisors of XE — A are the linear factors A — 1, A — 1, A + 2, and so /I
\
V
V
i.e., A is similar to a diagonal matrix. The result is the same as in Example 1 of Sec. 5.2. Theorem 1 above is a very important theorem as it has many important applications. Example 4. Suppose the eigenvalues of A are Ai, A2,..., A„. Prove that the eigenvalues of Am are Xf, Xf,..., A™. Proof. Assume A ~ J, i.e., J = P~lAP. P~ A P, or A2 ~ J2. Generally, we have 1 2
Am ~
Jm.
Then J2 = P~lAPP~lAP
=
Prom the multiplication rule of matrices we can easily check t h a t ' Ji J" jm
and (K
1
1 \ Hence the eigenvalues of Jm are the m-th power of the eigenvalues of J , i.e., A ™ , . . . , A™ are the eigenvalues of Am, and the statement holds. It should be noted t h a t there are differences between verifying Af1 being a n eigenvalue of Am in Exercise 7 of Sec. 5.1 and verifying here t h a t the eigenvalues of Am consist of only A™. Since the eigenvalues of a zero matrix are zeros, the eigenvalues of a nilpotent matrix are zeros. Thus we have E x a m p l e 5. T h e trace of a nilpotent matrix is zero. T h e result in Example 4 can be generalized to the general case. E x a m p l e 6. Assume that the eigenvalues of a matrix A of order n are A i , . . . , A n and g(x) is any polynomial in x. Prove t h a t the eigenvalues of g{A)
axeg(\i),g(\2),...,g(\n). This is the famous Frobenius's P r o o f . Let P~XAP
(G. Probenius 1848-1917)
= J , i.e. A = PJP~\
where 'Ai
1
Jl J =
■A
'9(Ji)
As 'fl(Ai)
\ g(Ji)
9{J) =
9(Jt)J '
\ 1 A,/
Jt Then we have g(A) = Pg{J)P~l.
Theorem.
-
Jordan Canonical Form of Matrices
257
the eigenvalues of g{Ji) are g(\i),g(Xi),... ,g(Xi). Thus the eigenvalues of g(A) are g{\{),... ,g(Xn) and the statement holds. Note that in Exercise 8 of Sec. 5.1 we also encountered this kind of problem, but both of them are actually not quite the same. Example 7. Suppose A is a matrix of order n. If there is a positive integral k such that rank of Ak = rank of Ak+1, prove that rank of Ak+i
= rank of Ak
(i = 2,3,...)
Proof. Suppose the Jordan canonical form of A is J, i.e., A~ J = Ji
Then
'
r
jm
jrr.
Suppose Ji is the Jordan block associated with eigenvalues Aj. When Ai ^ 0, rank of J, = rank of J™. When A; = 0, if the order of J, is more than 1, then rank of J, = rank of J? + 1. Therefore if rank of Ak+1 = rank of Ak, then for Ji, whose eigenvalues A, = 0, the k-th. power of J» equals zero matrix. Hence the eigenvalues Aj of nonzero Jk in Ak is not equal to zero. So rank of Ak+l = rank of Ak. The proof is complete. The diagonal matrices below can be considered as special cases of the Jor dan canonical form and from Theorem 1 we can draw two important results. If the eigenvalues of A are n simple roots Ai, A2,..., A„, then the elementary divisors of XE — A are A — Ai, A — A2,..., A — An. From Theorem 1 above we have
~ V '" 0 ' This is to say that if the eigenvalues of A are simple roots, then it is similar to a diagonal matrix. This once more proves Theorem 3 in Sec. 5.2. Assume that the elementary divisors of XE — A are linear factors in A. Then A is similar to a diagonal matrix. Conversely, if A is similar to a diagonal matrix, i.e.,
then ' X — a\
ai
XE - A ~ XE X — an Therefore the elementary divisors of the characteristic matrix of A are linear factors in A. Thus we have: Theorem 2. A matrix A is similar to a diagonal matrix if and only if the elementary divisors of its characteristic matrix XE — A are linear factors in A. In Sec. 5.2 we gave the necessary and sufficient condition for a matrix A to be similar to a diagonal matrix. Here we again provide another necessary and sufficient condition. It is well known that the eigenvalues of a real matrix need not be real numbers. Therefore it is generally impossible to find the Jordan canonical form of a real matrix in the realm of real numbers. However, we can use the following theorem which is analogous to Theorem 1. Theorem 3. Assume that A is a matrix of order n and the nonconstant invariant factors of its characteristic matrix XE — A are fi(X) =Xni + a i i A n i - 1 + ---+a i ,„ i _iA + a i , n i ,
i=
Then 'Ni A~N Nt where 0
Ni =
0
0 1
Vo
0
■
•
0 0 0 1
is called the the rational canonical form of A.
-dim — a
\
i,7li_2
-an 1
l,...,t.
Proof. As 'XE1-N1 XE-N XEt -
Nt.
we obtain the nonconstant invariant factors of (XE — N) : /i(A),/2(A) , . . , , ft(X) (Exercise 2 in Sec. 6.2). Thus XE - A ~ XE -
N.
Therefore A~ N and the theorem is established. Since invariant factors are not changed under elementary operations and they are uniquely determined by the A-matrices themselves, iV, is unique. If the order of N is not considered, N is also unique. From Exercise 7 of Sec. 6.3, we know that fi is the minimal polynomial ofJVi. Example 8. Find the rational canonical form of the matrix
n A= i V
2 -1 -12
2\ 1 1 i
Solution: As
XE-A-
X3-X2 + X-1J we have /1 = A3 - A2 + A - 1. Thus Ni
and soA~N
= Ni.
/0 0 1 0 \0 1
1\ -1 1 )
This example is the very second example of Sec. 5.2, but the result here is not simpler than the previous one. Assume that A is a matrix of order n. If the invariant factors of (XE — A) are all nonconstant, then the n invariant factors are linear factors in A, i.e., En(X) =
Ei(X)
X-a,
and hence A ~ diag (a, a,... ,a). The converse is also true. That is to say, A is similar to a scalar matrix if and only if the invariant factors of (XE — A) are all linear factors in A. In the above we gave methods of finding a Jordan canonical form of A, but did not give a way to calculate the required matrix P as in Sec. 5.3. Of course, if we only require the Jordan canonical form, we need not find the matrix P. For simplicity, the methods for finding P will be illustrated below through some examples. Assume that \ 1 A2/
P~1AP =
P = (Xt X2
X3).
Then Ai
AP = P
\ A2
1 A2/
i.e.,
\
/Ai
A(Xi X2 X3) = (Xi X2
X3)
A2
V
1 X2)
Using the multiplication rule we find [AXX AX2 AX3) = (AiXi X2X2 X2 +
X2X3),
and thus AX\ = X\X\,
AX2 = X2X2,
AX3 = X2 + A2-X^ .
Hence (A-X1E)X1=0,
(A-X2E)X2
= 0,
[A - X2E)X3 = X2 .
261
Jordan Canonical Form of Matrices
Therefore Xi and X2 are the eigenvectors of A associated with Ai and A2 respectively, X3 is a solution vector of the last system of linear equations and is called the generalized eigenvector of A associated with A2. If we can find these vectors we can at once obtain the matrix P required. Suppose /Ai P-XAP
1
\
=
, "
P=(X1
;.)
Then
'Aj A{Xl
X2 X3) ■
1 Ai
X2 X3) = (Xl X2 X3)[
1 I, A1 .
and thus {AXX AX2 AX3) = (A1X1 Xi + XXX2 X2 + XXX3) or
(A-XiE)Xi=0,
(A - XXE)X2 = X i ,
(A - X1E)X3 = X2.
Hence X\ is an eigenvector associated with Ai,A"2 and X3 are generalized eigenvectors associated with Ai, and in this way we obtain P required. As P exists, the above system of linear equations must have solution vec tors. For example, X\,X2,X3 are simply their solution vectors. It should be noted that if we take any solution vectors Y\,Y2,Y3 of the above system of linear equations, they are of course not necessarily exactly the same as X\,X2,X3. However this does not really matter, for as long as 5^,12,1^ are linearly independent, we can replace X\, X2,X3 by them. The above solution vectors Y\,Y2, Y3 are actually linearly independent. For example, the vectors Yi, Y2, Y3 satisfying the system of linear equations (A - A!£)Yi = 0,
(A - \iE)Y2 =YU
(A- Ai£)Y 3 = Y2
are linear independent. The reason is as follows. Let fciFi + k2Y2 + k3Y3 = 0. As (A-\1E)2Y2
= 0,
(^-Ai£)2Y3^0,
(1)
premultiplying the two sides of (1) by (A—\\E)2, we obtain k3(A—X\E)Y3 = 0, and so £3 = 0. Likewise, we have k2 = 0, k\ = 0. Therefore Y\,Y2,Y3 are linearly independent. Consider another example. The solution vectors Yi,Y%, and Y3 satisfying the equations (A - XiE)Yi = 0,
(A - X2E)Y2 = 0,
{A - X2E)Y3 = Y2
are also linearly independent. In fact, if k\Y\ + k2Y2 + k3Y3 = 0, then as f(x) = x — Ai, g(x) = (x — A2)2 are relatively prime, there, exist polynomials r{x) and s(x) such that r(x)f(x)
+ s(x)g(x) = 1.
According to Theorem 3, Sec. 6.1, we can use the division algorithm for poly nomials in matrix A as the polynomials in x. Thus we have r(A)(A - Ai£) + s(A){A - \2E)2 = E, and hence r(A)(A - \!E)Yi
+ s(A)(A - A 2 ^) 2 Fi = Y1.
But as (A — XiE)Y\ = 0 (Yi is an eigenvector of A associated with Ai), k1Y1=
klS(A){A
= -s(A)(A
- X2E)2YX
= s(A)(A - Aa^ffciYi
- X2E)2{k2Y2 + k3Y3) = 0.
As Yi ^ 0, we have k\ = 0. Again since Y2 and Y3 are linearly independent, we have k2 =Q,k3— 0. Therefore Yi,Y2, and Y3 are linearly independent. Since solution vectors of the above system of linear equations are not unique, this exactly shows that the matrix P required is not unique. The same is true for the general case: If Aj is a simple root, we find an eigenvector X\ associated with the eigenvalue Ai. If Ai is a root of multiplicity k, we find the eigenvectors and generalized eigenvectors associated with Ai. X\, X2, . . . , Xk obtained in this way are linearly independent (it is rather troublesome to prove and the proof is omitted). Thus we obtain P. Essentially finding P is nothing but finding the eigenvectors and generalized eigenvectors. In Example 1, A has two eigenvalues: a single Ai = 2 and A2 = 1 of multiplicity 2. From Sec. 5.1 we know that one of the eigenvectors is (0, 0, 1)
which is associated with Ai = 2, an another one is (1, 2, -1) which is associated with A2 = 1. Taking
*i =
0
x2
W
(l
\ 2
V-i/
and solving (A — E)X$ = X2, we obtain a generalized eigenvector X3 1
. Then
-1/ /0 0 \1
1 2 -1
0 \ 1 - 1 /
is the matrix required. Again in Example 2, Ai = 1 is an eigenvalue of multiplicity 4. Taking eigenvector X\
0 0
associated with Ai, we find the generalized eigenvec
Vo, tors below. First, solving (.A — E)X2 = Xi we obtain X2 = solving (A - E)X3 = X2 and (A - E)X4 = X$ we obtain / 0 \ -1 *3
=
2
1 ,
X4 -2
V 1 / Hence the matrix required is /8 4 0 0 \ 0 4 - 1 1 0 0 2 - 2 \0 0 0 1 /
Then
264
Linear Algebra
E x a m p l e 9. Find the Jordan canonical form of the matrix / 2
-1
- 1 \
2
-1
-2
V-l
1
A =
2/
as well as the matrix P used. Solution: As
(
A-2
1
1
\
/l A-l
(A-l)V -2
A+l
1
-1
2
~
A-2/
V
its elementary divisors are A — 1 and (A — l) 2 . Hence A~ J /I 1 1 \ Let P = (Xi X2X3).
We have
{A - E)Xi = 0,
(A - E)X2 = 0,
(A - E)X3 = X2 .
Solving (.4 - E)X = 0, we obtain a = (1,1,0) and /3 = (1,0,1). We take (*\ Xx = I 1 I. To make (A — E)X — X2 solvable we must choose the vector X2
w
appropriately. For this purpose, let ki<x + k2(3 = (fci + k2, ki,k2) such that rank of
/-l -2
1 2
1 \ 2
Vi
-i
-i/
rank of
/-l -2 V 1
l 2 -1
l 2 -1
The obvious choice is fci = 2, k2 = —1, i.e., we choose X2 = | (A - E)X = X2, we obtain /1\ X3
0
h + k2\ fci k2 ) 2 | . Solving
Hence the required matrix is
ni Vo
i A 2 0 - 1 Oj
It is worth noting that we first ought to have found X2 and then selected X\. This is because if the X2 found and X\ are linearly dependent, then it is no use selecting X\. We shall consider two more examples below. Example 10. Prove that any real matrix A=(an
ai2
)
\ 0.21
0-22 )
is similar to one of the four canonical forms Xi
\
XiJ'
/Ai
\
1\
Xi) '
fXt
\
\
A2J
f a b ;
\-b
a
where a and b are both real numbers. Proof. Suppose /(A) = \XE - A\ = A2 - (an + a22)A + (ana22 - 012^21) = (A-A!)(A-A2), AXi = A1X1,
AX2 = A2X2 ,
i.e., X\ and X2 are eigenvectors of A associated with Ai and A2 respectively. 1. When Ai = A2, i.e., the two eigenvalues are identical, we consider the following two possibilities: (1) If X\ and X2 are linearly independent, then A is similar to a diag onal matrix:
(2) If X\ and X2 are linearly dependent, then A is similar to the Jordan canonical form:
Y2
X\ — X2
we have AYi = AXi + AX2 = (a + bi)Xi + (a - bi)X2 = a(Xi + X2) + U{XX - X2) = aFi - bY2 and AY2 = \{AX1 - AX2) = -[(a + bi)Xx - (a - bi)X2] 1
1
= l[a(X1-X2)
+ bi(Xi+X2)] = bY1 + aY2,
1
so that A(Yi Y2) = {AYX AY2) = {aYx - bY2 bYx + aY2) = (Yi Y2) or
a bx -b ■ a
a b , —b a,
P = (Yi Y2),
where o and b are real numbers as noted above. The solution is complete. Example 11. Solve the system of linear ordinary differential equations ' ^ = -2/1+2/2 . <^ = - 4 y 1 + 3j/2 ^
= Vl + 2y 3 ,
where yi,y2, and 1/3 are unknown functions of x.
Solution:
Let /-I \
1
0\
-4
3
0
1
0
2/
Then the system of equations can be rewritten as 2/1 2/2
V2
dx
.3/3.
From Example 1 previously, we know there exists a matrix P such that /2 l
P~ AP
\
=
1
/0
1
0 \
0
2
1
1
V
1/
\1
-
1
-
1
/
T h e nonsingular linear transformation
(yi\ V2
=
W
Z2
W
reduces t h e system of differential equations to 'j/i\
Tx\y>
=
Z2
dx
w
,1/3/
p
zi\
Zl\
d dx
= AP
Z2
* \zz)
Us/
Therefore we have d_ dx
22
w
/
/*l\
/*i\ l
P~ AP
-Z2
\^3>
/*l\
1 1
=
V
1/
^2
w
i.e.,
dz\
dz-i
dz3
-j— = " I ,
- j — = *2 + Z3 ,
-J— = Z3 .
do: da; ax Clearly t h e solutions of the first and third differential equations are respectively z\ = kie2x
,
z3 = k3ex ,
a n d t h e solution of t h e second differential equation dz2 = z2 + k3ex dx
268
Linear Algebra
is found by using a standard formula to be
Hence the solution required is ' V\ =Z2 = ex(k3x + k2)1 < 2/2 = 2z2 + z3 = ex(2k3x + 2k2 + fe3), j/3 = zi - z 2 - z 3 = A;ie2a: - ex(k3x + k2+
k3),
where ki,k2, and £3 are arbitrary constants. We next present a recipe for finding the Jordan canonical form of a matrix of order n without using elementary divisors. T h e o r e m 4. Suppose that a matrix A is of order n and all the eigenvalues of A are Ai, A2,..., An. For each distinct eigenvalue Aj and each j'(l ^ j ^ n) let rank of (A - Aj.E)J = rj(Xi). Again let 61(Ai)=n-2n(Ai)+r2(Ai) and bm(Xi) = rm+1(Xi) - 2 r m ( A i ) + r m _ i ( A i ) ,
for 2 ^ m ^ n.
Then the Jordan canonical form of A is composed of exactly bm(Xi) Jordan blocks of order m for each m and each distinct eigenvalue A, of A and 61 (Aj) Jordan blocks of order 1 for each distinct eigenvalue Aj of A. The reader is asked to accept the conclusion given above without proof. Those who wish to learn more about the above recipe can read "Matrix Theory and its Applications" by N. J. Pullman. To compute the ranks of matrices (A — XiE)J (1 < j ^ n), we can make use of the following result: If rjo(^i) = rjo+i(^i)» rank of (AXiE)j.
then r
j(A;) = rjo(Xi) for all j ^ j 0 , where r.,(Aj) =
We illustrate the use of this recipe in the following examples.
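The recipe can also be transcribed into a short program. The sketch below assumes sympy for exact rank computations; as input it uses the $4\times 4$ matrix of all ones treated in Example 13 below, for which the recipe should give three Jordan blocks of order 1 for the eigenvalue 0 and one Jordan block of order 1 for the eigenvalue 4.

```python
from sympy import Matrix, eye

def jordan_block_counts(A, eigenvalue):
    """Number b_m of Jordan blocks of each order m for the given eigenvalue,
    following the recipe: b_m = r_{m+1} - 2*r_m + r_{m-1}, with r_0 = n."""
    n = A.rows
    B = A - eigenvalue*eye(n)
    r = [n] + [(B**j).rank() for j in range(1, n + 2)]   # r[j] = rank of (A - lambda*E)**j
    return {m: r[m + 1] - 2*r[m] + r[m - 1] for m in range(1, n + 1)}

# The 4x4 matrix of all ones (Example 13 below)
A = Matrix(4, 4, lambda i, j: 1)
print(jordan_block_counts(A, 0))   # {1: 3, 2: 0, 3: 0, 4: 0}
print(jordan_block_counts(A, 4))   # {1: 1, 2: 0, 3: 0, 4: 0}
```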
Example 12. Find the Jordan canonical form of
(°2 0 0 0
A =
V-i
0 0 1 -1 0 2 0 0 0 0 0 0
0 -1 1 2 0 0
0 0 0 0 2
1 \ -1
0
2 /
0 0 0
Solution: The characteristic polynomial of A is 33 d3 //(A) ( A ) -=. ((AA--:lL) ) ((AA--22 ) .
the; matrices (.A (A -— XiE) \Eyj: We can easily calculate the ranks of the r i ( l ) === rank rank of of {A (A -- EE)) 11 === 55,,
2 r 2 (l) = rank of (A --E) E)--2 = 4,
3z r 3 (l) === rank rank ofof (A (A ---E)E) === 33,, rs(l)
4 r 4 (l) = rank of (A --E) E)4 --= 3 .
rj(l) ■== rank of (A - E)j --== 3 for j ^^ 4. 4. Using Using Theorem Theorem 44 we we obtain (A -Ey Hence rj(l) 6i(l) = 6 - 2(5) + 4 == 0, foi(l)
( 4 ) + 5 = 0, 6fe(l) 2(1) = 3 -- 2 2(4)
( 3 ) ++ 4 = 1, 633(1) = 3 - 2 2(3)
6^(1) := 3 - 2 ( 3 ) + 3 = 0, 6,(1)
for all j Ss 4.
Again (A -■2E) = A, 4, rn(2) i (2) = rank of (A-2E)
r 2 (2) = rank of (A --■2Ef 2£)2 = 3 ,
(A --■22Ef £ ) 3 -== 3 .. r 3 (2) = rank of (yl rj(2) ■■ = = rank of (A (A - 2Ey = 3, for j.7'^ ^ 3. Using Theorem 4 we \ obtain 2E)j = Hence rj{2) iii (21 = 66- -2 (2(4) 6i(2) 4 1 + 3 = 1, 1.
( 3 ) ++ 4 = 1, 66,(2) 2(2) = 33--2 2(3)
6j(2) = 3 - 2(3) + 3 = 0, for all j ^ 3 . Therefore the Jordan canonical form has one Jordan block of order 3 corre sponding to the eigenvalue 1, one Jordan block of order 1 corresponding to the eigenvalue 2, and one Jordan block of order 2 corresponding to the eigenvalue 2, hence
/I 1 0 0 0 1 1 0 0 0 1 0 J = 0 0 0 2 0 0 0 0 0 0 0
u
°^
0 0 0 0 0 1 0 2 0 0 2/
E x a m p l e 13. Find the Jordan canonical form of (I A =
1 1 1 1 1 1
u
1 1 1 1
1\ 1 1 l)
Solution: Notice that the rank of A is 1, so the Jordan canonical form of A must have rank 1 too. Therefore the required J has the form /A 0 0 \0
0 0 0 0
0 0\ 0 0 0 0 0 0/
Now A (1,1,1,1)' = 4(1,1,1,1)', so 4 is an eigenvalue of A, and hence / 4 0 0 0\ 0 0 0 0 J = 0 0 0 0 \0 0 0 0/ Of course, we could use the recipe in the following manner. We first find /(A) = A3(A - 4). Then r\ (0) = rank of A = 1, r 2 (0) = rank of A2 = rank of 4A = rank of A = 1, rj(0) = rank of 4 J ' - 1 A = rank of A = 1, for all j ^ 2 . Therefore 6i(0) = 4 - 2 + 1 = 3 ,
M0) = 1 - 2 + 1 = 0,
forallj>l.
Jordan Canonical Form of Matrices
271
Since 3 has three Jordan blocks of order 1 corresponding to 0, it must have one Jordan block of order 1 corresponding to 4. Therefore /4 0 0
J =
\0
0 0
0\
0 0 0 0 0 0 0 0
0/
If we had started trying to calculate rj(4), we would have found ri(4) = 3, r 2 (4) = 3, r,(4) = 3 for all j so that 6i(4) = 4 - 6 + 3 = 1,
6j(4) = 3 - 6 + 3 = 0
forallj>l,
and hence J has exactly one Jordan block of order 1 for the eigenvalue 4. Exercises 1. Find the Jordan canonical forms of the following matrices / 1 (1) 0 (1) ^ 2 V-2 /
3
2 2 --1 1
1 00 - 4 - 1 0 (3) (3) 7 1 2 w 7 1 2 V-l 7 - 6 - 1
0 \\ 0 , - -1) 1 / 00^\ 0 , 1 1 0^
/^ 3 7 (2) - 22 - -55 \V-4 - 4 -10
~- 33 \\ 2 33 J/
/0 • 11 .
• 0
„. (4)
w
V V
\—17 -6 - 1 0 /
2. Find the rational canonical form of the matrix /0 (o .4= A = 3. Suppose
1 1\ 1 00 1 .
\1 \i
1
(/ l A = I 0 A = \ \\ 00
4 -3 4 4
00 //
2\ 2X 44 | . 3/
Using a Jordan canonical form of A, find J4 5 .
-..
1
0\ 0° :
1 oy 0/
1
Linear Algebra
272
4. Show that if there exists a positive integer m such that Am = E, then the matrix A is similar to a diagonal matrix. 5. Show that a matrix is a nilpotent matrix if and only if its eigenvalues are all zero. 6. If the matrix A is a nilpotent matrix whose rank is r, prove that Ar+1 = O. 7. Suppose a matrix A is a nilpotent matrix. Show that the determinant of (E + A) is 1, where E is a unit matrix. 8. Using Theorem 1 in Sec. 6.4, prove that an idempotent matrix is similar to a diagonal matrix d i a g ( l , . . . , 1,0,..., 0). 9. Prove that the trace of a matrix is equal to the sum of its eigenvalues (if an eigenvalue A is of multiplicity k, it is counted k times). 10. Suppose the elementary divisors of A{\) are
{\-x1)m\...,[x-xty Is A{\) similar to XEi - Ji \Et - Jt where Ji is the Jordan block associated with (A — Aj) m '? *11. Prove that an arbitrary square matrix can be expressed as the product of two symmetric matrices one of which is a nonsingular matrix. 12. Find the Jordan canonical form of each of the following matrices by using Theorem 4 in Sec. 6.4:
(i)A =
2
2
M
1
3
i
V
2
2/
/1
(ii)B
2 -1
V-i
-4 0 1 4
-1 5 -2 -1
-4\ -4 3
6 )
CHAPTER 7 LINEAR SPACES AND LINEAR TRANSFORMATIONS
The concept of linear spaces may be said to have arisen from a further scientific abstraction of vectors which were considered in Chapter 2. Linear functions can be discussed from the viewpoint of linear spaces, while linear transformations serve as the basic links that reflect linear relations between elements in linear spaces and provide a means for studying linear functions. Therefore, linear spaces, linear transformations, and the matrix theory which provides a powerful tool of studying them are the main contents of linear algebra. Relatively speaking, linear problems are simpler and many a practical problem can ultimately be reduced to linear problems. This makes linear algebra especially important. In this chapter we shall discuss two problems: 1. The formation, basic properties, and bases of linear spaces. 2. The concept, operations, and the matrix representation of linear transfor mations. This chapter is divided into four sections. In the first two sections we shall solve the first problem; in the remaining two sections we shall solve the second problem. 7.1. Concept of Linear Spaces In mathematics we consider a vector as an ordered pair of real numbers or a (2,l)-matrix, whereas in physics we often treat a vector as a directed line segment. These are three very different interpretations of vectors. One might ask why the three can all be legitimately designated "vector." 273
Linear Algebra
274
To answer this question, we first observe that, mathematically speaking, the only thing that concerns us is the behavior of the object we call 'vector.' It turns out that all the three objects behave, from an algebraic point of view, in exactly the same manner. Moreover, many other objects that arise naturally in applied problems behave, algebraically speaking, as do the above-mentioned objects. To a mathematician, this is a perfect situation, for we can now abstract those features that all such objects have in common. What are the properties that vectors have in common? The notion of an abstract linear space has its origin in the familiar threedimensional space with its well-known geometrical properties. Consequently, we start the discussing of linear spaces by keeping in mind the properties of this particular and important space. We first recall some important properties of two- and three-dimensional vectors and consider them from a more general point of view. Suppose R2 is the set of all two-dimensional vectors starting from the origin and R3 is a set of all three-dimensional vectors also starting from the origin. Then in either R2 or R3, vector addition of any two vectors still gives a vector with initial point at the origin, scalar multiplication of a vector by a real num ber still gives a vector with initial point at the origin. Besides vector addition, there is also vector subtraction. Vector addition satisfies the commutative law and the associative law. Multiplication of a vector by a scalar satisfies the distributive law and the associative law. Generally, suppose Rn is a set of all n-dimensional vectors, i.e., Rn =
{(x1,...,xn)\xiGR},
where R is the set of all real numbers. For any vectors a = (01,02,... ,an) and /3 — (61, b%, ■ ■ ■, bn), we have a + p=(ai
+ bi,...,an
+ bn) £ Rn,
and ka = {ka\, kai, ■ ■ ■, kan) G Rn, k is any real number. Therefore the sum of any two vectors in Rn and the product of a vector mul tiplied by any number are also vectors in Rn. From Sec. 2.1, we know that in Rn there is also subtraction, which is the inverse operation of addition. Vector addition satisfies the commutative and associative laws, and scalar multiplica tion satisfies the distributive and associative laws. That is to say, operations in Rn have the same properties as operations in R2 or R3. Let S be a set of solution vectors of a system of homogeneous linear equa tions. We have proved in Sec. 2.3 that the sum of any two solution vectors
Linear Spaces and Linear
Transformations
275
in S is still a solution vector in S. Multiplication of a vector by a scalar also results in a solution vector in S. Besides, addition and scalar multiplication follow the same operation laws as in R2,R3, and Rn. That is to say, S have the same properties as R2,RZ, and Rn. Let Rn be the set of all matrices of order n. If A, B 6 Rn, then A + B £ Rn,kA €E Rn, which indicate that in Rn there are addition and scalar multiplication. Prom Chapter 3 we know that there is subtraction in Rn besides addition. Operations of addition and scalar multiplication of matrices also satisfy the above operation rules. Let T be a set of the zero vector and all the eigenvectors of A corresponding to the eigenvalue Ao- It is easy to see that the sum of any two eigenvectors in T and the result of the multiplication of an eigenvector by a scalar are still eigenvectors of A corresponding to Ao in T. Its addition and scalar multipli cation follow the same operation laws as the above. Namely, operations of T have the same properties as in R2,R3,Rn, S, and R^. Let M be a set of all polynomials in x whose coefficients are real numbers and whose degrees do not exceed n. If f(x) and g(x) € M, then f(x) + g(x) £ M and kf(x) £ M, which indicates that there are two operations, namely addition and scalar multiplication. Obviously, they also satisfy the above operation rules. Besides, in mathematics and mechanics there exist many such sets which have the above properties. Therefore it is necessary for us to abstract those properties that are common to all such sets (i.e., those properties that make them behave similarly) and define a new structure. A great advantage of doing this is that we can now talk about properties of all such sets at the same time without having to refer to any one set in particular. This, of course, is much more efficient than studying the properties of each set separately. For this purpose we introduce a new concept, the concept of the linear space. Consider a set of numbers (including some nonzero numbers). If every sum, product, and inverse of the numbers (except zero) in the set are also numbers in the set, then the set is called a number field. For instance, the set of all rational numbers forms a field and is called a rational number field. A set of all real numbers forms a real number field. A set of all complex numbers forms a complex number field. However, a set of all odd numbers does not form a number field. Similarly a set of all even numbers does not form a number field because a number field must contain 1 and 0. Hence any number field must contain a rational number field-
276
Linear Algebra
Definition 1. Let V be a (nonempty) set with its elements denoted by a,(3, i/,.. .,K be a number field with elements denoted by k,l,— If the following conditions hold, then V is called a linear space over K, or for short, a linear space. 1. To every pair of elements a and (3 in V, there is associated a unique element in V called their sum which we denote by a + /3. Element addition satisfies the following conditions for all a,f3, u £ V: (i) Addition is associative: ( a + /3) + u = a + (/3 + u). (ii) There exists an element, which we denote by o, such that a + o = a. (iii) For each a G.V there exists an element which we denote by —a such that a + (—a) = o. (iv) Addition is commutative: a+ (3 = 0 + a. 2. To every scalar k £ K and an element a £ V, there is associated a unique element, called the scalar multiplication of a by k, which we denote by ka. A scalar multiplication satisfies the following conditions for all a, (3 £ V and k,l £ K: (i) Scalar multiplications are associative, e.g. k(la) = (kl)a. (ii) Scalar multiplications are distributive with respect to element addi tion: k(a + (3) = ka + k(3. (iii) A scalar multiplication is distributive with respect to scalar addition (k + l)a = ka + la. (iv) l a = a (where 1 £ K). If K is by definition a rational number field or a real number field, then V is called a linear space over real numbers; If if is a complex number field then V is called a linear space over complex numbers. Thus the above-mentioned sets R2,R3,Rn,S,Rn,T, and M are all linear spaces and R2,R3, and M are linear spaces over real numbers. As 5 is a linear space composed of solution vectors, it is often called a solution space. As T is a linear space formed by eigenvectors, it is often called a characteristic space. A set need not form a linear space. For example, a set of all solution vectors of a system of nonhomogeneous linear equations is not a linear space. This is because the sum of any two of the solution vectors and the scalar multiplication of a solution vector by a scalar are no longer solution vectors. It should be noted that a linear space is an abstract concept; its elements are generally abstract in nature, not even necessarily numbers. By a space
Linear Spaces and Linear
Transformations
277
over real numbers we mean that the numbers k and Z in K are taken to be real numbers. Generally, a linear space over real numbers does not actually contain any real number, nor does a linear space over complex numbers contain any complex number. It should be remarked that the two operations as defined above do not give any concrete method of computations. The rules cited in the definition are nothing more than the conditions that must be satisfied by the two operations. For example, let V be a set of all positive numbers and K a rational field. Then it is easy to check by the definition that V is a linear space over K under addition © and scalar multiplication o defined by a(Bb = ab, k o a = a . From l a = a, we have 2 a = (l + l ) a = ct + a, i.e., 2 a can be regarded as a product of a by 2. It can also be regarded as the sum of two a . The same goes for any positive integer m, i.e., ma can be regarded as both the product of a by m and the sum of m a ' s . According to the associative law, the sum of a, /3, and v has nothing to do with the order of association. Thus the associative law for element addition allows us to drop parentheses, and we can write the sum of a , / 3 , and u as a + (3 + v. In general, the sum of any n elements a i , . . . , a „ can be written as a i + a.2 + • • • + a „ without any parentheses. In the above we have introduced the concept of a linear space. We shall next comprehend some basic properties of a linear space from its definition. From Sec. 2.1 we know that in the linear space Rn there exists a zero vector. To an arbitrary vector in Rn there exists a negative vector. The same applies to any linear space V. First suppose that a is an element in V. By definition, there is in V an element x such that a + x = a. (1) Then for an arbitrary element /3 in V, we also have (3 + x = {3. This is because by definition there is an element y in V such that a + y = f3. Hence (3 + x = (a + y) + x = (a + x) + y = a + y = f3. That is to say, the sum of x and an arbitrary element (3 is still equal to (3. The element x is called the zero element in V. The zero element is one such that when it is added to any element, their sum is still equal to this element; but if only there is some element in V, when it is added to another element,
278
Linear Algebra
their sum is equal to this element, then this element is the zero element. The zero element is unique. In fact, if x' is also a zero element, then from X ~T~ X
=
X,
X
] X
= :
X
we have x = x'. Let the zero element be denoted by o, i.e., a + o = a. To any element a in V, by definition there is an element y in V such that a +y= o
(2)
The element y is called the negative element of a . The negative element of a is also unique. As a matter of fact, if a + y' = o, we have y = ( a + y') + y = ( a + y) + y' = y'. Therefore y' — y. The negative element of a is represented by —a, i.e., a + (—a) = o or
a — a = o.
Thus we have T h e o r e m 1. There is a unique zero element in a linear space. To any element in a linear space, there is also a unique negative element. As ko = k(o + o)=ko
+ ko, Oa = (0 + 0)a = 0 a + 0a,
(—fc)a + fca = (—k + k)a = 0 a , the uniqueness of the zero element and the negative element immediately en ables us to obtain 0 a = o, ko = o, (—k)a = — (fca). Suppose that fca = o, then fc = 0 or a = o. This is because if fc ^ 0 , from fc_1fca = fc_1o we obtain a = o. Thus we obtain the following theorem: T h e o r e m 2. If V is a linear space, then
Linear Spaces and Linear
1. 2. 3. 4.
Transformations
279
Oa = o for any element a in V; ko = o for any scalar k\ if ka. = o, then either k = 0 or a = o; (—fc)a = -(A;a) for any element a in V and any scalar A;; in particular, ( - l ) a = - a .
The above properties which are common to all linear spaces are very useful. Prom Sec. 2.1, we learned certain basic concepts in connection with the linear space Rn such as linear independence, linear dependence, linear combi nation, and their basic properties. These concepts and properties are carried over word for word to any linear space V. This is because when we gave these definitions and properties in Chapter 2, we did not use expressions specific to vectors. Also we have used only certain operation rules which were listed in the definitions. These must also be valid as the definitions hold. The reader may like to check them as exercise. Henceforth we shall directly apply them without further explanation. Suppose V is a linear space over K, U is a subset consisting of some elements of V. If U is a linear space over K under addition and scalar multiplication of V, then U is called a subspace of V. As certain properties of a linear space have to be expressed by properties of some subspaces, and, in addition, in studying a linear space we often get involved with subspaces. The concept of subspace is also of great importance. It is easy to see that a set consisting of o alone is a subspace, called the zero space. The whole space V itself is always a subspace of V. From definition we can easily see that if U is a subspace of V, then the zero element of U is also the zero element of V; the negative element of the element a in U is also the negative element of the element a which is regarded as an element in V. The operations of addition and scalar multiplication are defined for all elements in V, in particular for those elements belonging to U. The results of these operations on the elements of U are elements of V". It may happen, however, that when these operations are applied to arbitrary elements of U, the resulting elements also belong to U. In this case we say that U is a subspace of V. Thus we obtain : T h e o r e m 3. A nonempty subset U of a linear space V over K is a subspace of V if and only if 1. whenever a. and /3 £ U, a + (3 € U; 2. whenever k 6 K and a eU,ka EU.
280
Linear Algebra
Proof. The necessary condition of the above theorem clearly holds. We need only prove the sufficient condition. If U satisfies the above two conditions, then the two operations on V are simply those on U. Moreover since (—l)a = —a € U, we have (3 — QEU, i.e., in U there is an element x such that a + x = /3. Again since elements in U are also elements in V and their operations on U and on V are also identical, elements in U satisfy conditions in the above definition of linear space. Thus U is a subspace of V and the theorem holds. For example, all vectors on a straight line or on a plane passing through the origin form a subspace of R3. Note that any subspace itself is a linear space and therefore must contain the zero element. For this reason, any line (or plane) in R3 that does not pass through the origin fails to be a subspace of R3. As a further example, all vectors of the form (0, a 2 , . . . , an) form a subspace of Rn, where a; is any real number. Suppose V is a linear space formed by all matrices of order 2. Then all matrices of order 2 of the form I
I form a
subspace of V. Similarly, all matrices of order 2 of the form I ,
1 form a
subspace of V. We next discuss on how to generate a subspace. We start with the following problem: Let a i , a.2, ■ ■ ■ ,an be elements of a linear space V over K. We want to find the minimal subspace U of V containing these elements. The minimality of U is understood in the sense that if W is a subspace of V containing the elements a.\, a.2, ■ ■ ■, am, then W D U. To solve the problem, we first note that according to the definition of a subspace, U must contain, along with oti(i = 1,2, . . . , m ) , all the elements of the form ai<Xi(ai G K; i = 1,2,..., m). Furthermore, the sum of elements belonging to U must also be an element of U. Thus any subspace containing the elements ati, 0 2 , . . . , a m must also con tain all elements of the form aidi + 0,20.2 H
h am&m
(3)
for any
Linear Spaces and Linear
Transformations
281
It is easily verified that the set of all linear combinations of the elements α₁, α₂, ..., α_m belonging to a linear space V generates a subspace U of V. This is because the sum of any two linear combinations, and the product of any linear combination by a scalar, are still linear combinations of α₁, α₂, ..., α_m. Therefore, according to Theorem 3, they form a subspace. Obviously, this subspace solves the proposed problem: it contains the given elements themselves and is contained in any other subspace W of V such that α₁, α₂, ..., α_m ∈ W.

The minimal subspace U generated by the elements α₁, α₂, ..., α_m ∈ V is referred to as the span of the αᵢ (i = 1, 2, ..., m) over K and is written <α₁, α₂, ..., α_m> or L(α₁, α₂, ..., α_m). We also say that α₁, α₂, ..., α_m span <α₁, α₂, ..., α_m>, since every element in <α₁, α₂, ..., α_m> can be written as a finite linear combination of α₁, α₂, ..., α_m. Thus by definition a linear space spanned by α₁, α₂, ..., α_m is determined uniquely by the spanning elements αᵢ (i = 1, 2, ..., m), and we have proved the following theorem.

Theorem 4. If V is a linear space and α₁, α₂, ..., α_m ∈ V, then <α₁, α₂, ..., α_m> is the minimal (in the sense of inclusion) subspace containing α₁, α₂, ..., α_m.

The zero space is obviously generated by the zero element; the set of multiples kα of α forms the subspace <α> generated by α.

Example 1. Show that in the linear space R⁴ the subspace U generated by
    α₁ = (1, 2, -1, 3),  α₂ = (2, 4, 1, -2),  α₃ = (3, 6, 3, -7)
coincides with the subspace W generated by
    β₁ = (1, 2, -4, 11),  β₂ = (2, 4, -5, 14),
i.e., U = W.

Proof. If we can prove that each αᵢ is a linear combination of the elements β₁ and β₂, while each βⱼ is a linear combination of α₁, α₂, and α₃, then U ⊆ W and W ⊆ U, so U = W. But we shall give a proof below using elementary row operations. Let
        ( α₁ )          ( β₁ )
    A = ( α₂ ) ,    B = ( β₂ ) .
        ( α₃ )
Applying elementary row operations we obtain
        ( 1  2  0   1/3 )         ( 1  2  0   1/3 )
    A → ( 0  0  1  -8/3 ) ,   B → ( 0  0  1  -8/3 ) .
        ( 0  0  0    0  )
Thus U and W are both generated by (1, 2, 0, 1/3) and (0, 0, 1, -8/3). Therefore U = W. The proof is complete.

A subspace can be formed in two ways: on the one hand it can be generated by some elements as described above; on the other hand it can be formed from subspaces themselves.

Definition 2. Let V₁ and V₂ be two subspaces of V, where V is a linear space over K. The set of all common elements of V₁ and V₂ is called the intersection of V₁ and V₂, denoted V₁ ∩ V₂, i.e.,
    V₁ ∩ V₂ = {α | α ∈ V₁, α ∈ V₂}.

Theorem 5. Let V₁ and V₂ be two subspaces of a linear space V over K. Then the set of all common elements of V₁ and V₂ forms a subspace of V.

Proof. Since o ∈ V₁ and o ∈ V₂, we have o ∈ V₁ ∩ V₂, and so V₁ ∩ V₂ is not empty. Moreover, V₁ ∩ V₂ again satisfies the conditions of a subspace. Indeed, if α, β ∈ V₁ ∩ V₂, then α, β ∈ V₁ and α, β ∈ V₂. Therefore α + β ∈ V₁ and α + β ∈ V₂, and thus α + β ∈ V₁ ∩ V₂. Likewise, since kα ∈ V₁ and kα ∈ V₂, we have kα ∈ V₁ ∩ V₂. According to Theorem 3, V₁ ∩ V₂ is a subspace of V, and the theorem is proved.

Clearly V₁ ∩ V₂ = V₂ ∩ V₁ and (V₁ ∩ V₂) ∩ V₃ = V₁ ∩ (V₂ ∩ V₃).

Definition 3. The sum of two subspaces V₁ and V₂ is defined as the set of all vectors of the form α + β, where α ∈ V₁, β ∈ V₂, and is denoted by V₁ + V₂, i.e.,
    V₁ + V₂ = {α + β | α ∈ V₁, β ∈ V₂}.

Theorem 6. With V₁ and V₂ as defined above, V₁ + V₂ is a subspace of V.
Proof. Take any two elements α₁ + β₁ and α₂ + β₂ of V₁ + V₂, where α₁, α₂ ∈ V₁ and β₁, β₂ ∈ V₂. Clearly
    (α₁ + β₁) + (α₂ + β₂) = (α₁ + α₂) + (β₁ + β₂) ∈ V₁ + V₂ ,
    k(α₁ + β₁) = kα₁ + kβ₁ ∈ V₁ + V₂ ,
and so V₁ + V₂ is a subspace of V.

Obviously V₁ + V₂ = V₂ + V₁ and (V₁ + V₂) + V₃ = V₁ + (V₂ + V₃).

For example, consider R³. Let V₁ be the subspace formed by all vectors through the origin lying on a line l₁ and V₂ the subspace formed by all vectors through the origin lying on another line l₂. Then V₁ ∩ V₂ is the zero space formed by the intersection point (the zero vector) of l₁ and l₂, and V₁ + V₂ is the subspace formed by all vectors through the origin lying in the plane determined by l₁ and l₂.

Now let V be the linear space over K formed by all matrices of order 2, and let
    U = { ( a  b ; 0  0 ) | a, b ∈ K } ,   W = { ( a  0 ; c  0 ) | a, c ∈ K }
be two subspaces of V (rows separated by semicolons). Then
    U + W = { ( a  b ; c  0 ) | a, b, c ∈ K } ,   U ∩ W = { ( a  0 ; 0  0 ) | a ∈ K } .

Suppose U is a subspace of both V₁ and V₂, i.e., U ⊆ V₁ and U ⊆ V₂. Then U ⊆ V₁ ∩ V₂; that is to say, a subspace of both V₁ and V₂ is also a subspace of V₁ ∩ V₂. In other words, V₁ ∩ V₂ is the biggest subspace of V which is contained in both V₁ and V₂. Similarly, suppose V₁ and V₂ are subspaces of V, i.e., V₁ ⊆ V and V₂ ⊆ V. Then V₁ + V₂ ⊆ V; that is to say, V contains V₁, V₂, and V₁ + V₂ as well. In other words, V₁ + V₂ is the smallest subspace which contains both V₁ and V₂.

We know that any element α of V = V₁ + V₂ can generally be written in several ways in the form α = α₁ + α₂, αᵢ ∈ Vᵢ. Hence the decomposition is generally not unique. If the decomposition is unique, then V is called the (internal) direct sum of V₁ and V₂ and is denoted by V = V₁ ⊕ V₂. For example, let α and β be elements of a linear space V. If they are linearly independent, then <α> + <β> is their direct sum. If they are linearly dependent, then <α> + <β> is not a direct sum.
Suppose V = V₁ ⊕ V₂ and α, β ∈ V. Then α = α₁ + α₂ and β = β₁ + β₂ with αᵢ, βᵢ ∈ Vᵢ, and
    α + β = (α₁ + β₁) + (α₂ + β₂) ,   kα = kα₁ + kα₂ .
Hence operations on V are carried out entirely within its subspaces V₁ and V₂. In other words, the structure of V is completely determined by its subspaces V₁ and V₂. This is one reason why the direct sum is of great importance. The following is a necessary and sufficient condition for a sum to be a direct sum.

Theorem 7. A linear space V is the direct sum of subspaces V₁ and V₂, i.e., V = V₁ ⊕ V₂, if and only if
    V = V₁ + V₂ ,   V₁ ∩ V₂ = o .

Proof. Suppose
    α = α₁ + β₁ = α₂ + β₂ ,   αᵢ ∈ V₁ ,   βᵢ ∈ V₂ .
Then α₁ - α₂ = β₂ - β₁ ∈ V₁ ∩ V₂ = o, and hence α₁ = α₂, β₁ = β₂, i.e., the decomposition of α is unique, and so V = V₁ ⊕ V₂. Conversely, suppose α ∈ V₁ ∩ V₂, i.e., α ∈ V₁ and α ∈ V₂. As α + o = o + α, from the uniqueness of the decomposition we obtain α = o, i.e., V₁ ∩ V₂ = o. This proves the theorem.

Example 2. Suppose V is the xy-plane and U is the z-axis in R³, i.e.,
    V = {(a, b, 0) | a, b ∈ R} ,   U = {(0, 0, c) | c ∈ R} .
Then R³ = V ⊕ U.

Proof. Any element of R³ can be written as (a, b, c) = (a, b, 0) + (0, 0, c) with (a, b, 0) ∈ V and (0, 0, c) ∈ U, and clearly V ∩ U = {(0, 0, 0)}. Hence R³ is the direct sum of V and U, i.e., R³ = V ⊕ U.
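The decomposition in Example 2 is easy to illustrate numerically (a sketch in Python with NumPy; the variable names are ours, not the book's): stacking a basis of V with a basis of U gives an invertible 3×3 matrix, so every vector of R³ splits uniquely into a V-part and a U-part.

    import numpy as np

    # Columns: a basis of V (the xy-plane) followed by a basis of U (the z-axis).
    basis = np.column_stack([(1, 0, 0), (0, 1, 0), (0, 0, 1)])

    alpha = np.array([3.0, -2.0, 5.0])
    coeffs = np.linalg.solve(basis, alpha)   # unique, since the stacked basis is invertible

    v_part = coeffs[0] * basis[:, 0] + coeffs[1] * basis[:, 1]   # component in V
    u_part = coeffs[2] * basis[:, 2]                             # component in U
    print(v_part, u_part)                        # [ 3. -2.  0.] [0. 0. 5.]
    print(np.allclose(v_part + u_part, alpha))   # True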
Exercises

1. Determine whether each of the following sets of numbers forms a number field.
(1) Zero alone.
(2) All positive integers.
(3) All numbers a + b√2, where a and b are arbitrary integers.
(4) All numbers a + b√-3, where a and b are arbitrary real numbers.

2. Determine whether each of the following sets forms a linear space under the indicated operations. If it does, over which number field does it form a linear space?
(1) The set of all polynomials whose coefficients are integers and whose degrees do not exceed n (≥ 1), under the usual addition of polynomials and multiplication of a polynomial by a rational number.
(2) The set of all real symmetric (skew-symmetric) matrices, under the usual operations of addition of matrices and multiplication of a matrix by a real number.
(3) The set of all vectors in a plane not parallel to a fixed vector, under the usual operations of vector addition and scalar multiplication.
(4) The set of all matrices of order n having zero trace, under the usual operations of matrix addition and multiplication of a matrix by a complex number.

3. Determine whether the set of all vectors in Rⁿ whose components satisfy condition (1) or (2) below forms a subspace.
(1) x₁ + x₂ + ... + xₙ = 0,
(2) x₁ + x₂ + ... + xₙ = 1.

4. Suppose α and β are elements of a linear space over K and k ∈ K. Prove that k(α - β) = kα - kβ.

5. Prove that the subspace generated by (1, 1, 0, 0) and (1, 0, 1, 1) coincides with the subspace generated by (2, -1, 3, 3) and (0, 1, -1, -1).

6. Suppose α = (a₁, a₂) and β = (b₁, b₂) are two nonzero elements in R². If the subspace <α> generated by α coincides with the subspace <β> generated by β, prove that
    | a₁  a₂ |
    | b₁  b₂ | = 0 .

7. Prove that the real linear space formed by all matrices of order n is the direct sum of the subspace formed by all real symmetric matrices of order n and the subspace formed by all real skew-symmetric matrices of order n.

8. Suppose U, V, and W are three subspaces of a linear space. Prove that
    (U ∩ V) + (U ∩ W) ⊆ U ∩ (V + W) .
7.2. Bases and Coordinates

With the exception of the zero linear space, which is formed by the zero element alone, a linear space generally has infinitely many elements. How can we express them? What is their structure? These are very important problems. Moreover, elements in a linear space are abstract. How can we associate them with numbers, and express them explicitly by formulas so that we can perform operations on them? These too are important problems, and in this section we shall consider them.

From analytic geometry we know that the vectors i = (1, 0) and j = (0, 1) in R² are linearly independent, and any vector (x, y) can be represented as a linear combination of i and j, i.e., (x, y) = xi + yj. The vectors i = (1, 0, 0), j = (0, 1, 0), and k = (0, 0, 1) in R³ are three linearly independent vectors, and an arbitrary vector can be represented as a linear combination of i, j, and k: (x, y, z) = xi + yj + zk. Again, from the theory of systems of fundamental solutions in Sec. 2.3, we know that in the solution space S an arbitrary solution vector is a linear combination of a system of fundamental solutions, which consists of n - r solution vectors.

According to the definition of a largest linearly independent set of vectors in Sec. 2.3, the vectors i and j are a largest linearly independent set of vectors in R², and i, j, and k are a largest linearly independent set of vectors in R³. A system of fundamental solutions is a largest linearly independent set of vectors in the solution space. As in the case of a system of fundamental solutions, for a largest linearly independent set of vectors in a linear space we give the following definition.

Definition 1. Let V be a linear space. If there are n linearly independent elements α₁, α₂, ..., αₙ and an arbitrary element of V is a linear combination of α₁, α₂, ..., αₙ, then {α₁, α₂, ..., αₙ} is called an ordered basis for V, or, for short, a basis for V. The dimension of V is the number of vectors in an ordered basis for V. Thus n is the dimension of V; we often write dim V for the dimension of V, and V is known as an n-dimensional linear space. If there do not exist such n elements, i.e., if for an arbitrary positive integer N
there may always be N linearly independent elements in V, then V is known as an infinite-dimensional linear space. If V is not an infinite-dimensional linear space, it is called a finite-dimensional linear space. Of course, if {α₁, α₂, ..., αₙ} is an ordered basis for V, then the elements α₁, α₂, ..., αₙ must span V.

Thus R² is a two-dimensional linear space; the set of vectors i and j forms an ordered basis for R². R³ is a three-dimensional linear space, the set of vectors i, j, and k forming an ordered basis for R³. Obviously Rⁿ is an n-dimensional linear space; the set of vectors (1, 0, ..., 0), ..., (0, 0, ..., 1) is an ordered basis for Rⁿ. Let Eᵢⱼ be the matrix of order n with 1 in the i-th row and j-th column and zeros elsewhere (Sec. 3.1). According to the definition of linear independence, the n² matrices Eᵢⱼ (i, j = 1, 2, ..., n) are linearly independent, and any matrix (aᵢⱼ) of order n can be represented in the form (aᵢⱼ) = Σ aᵢⱼEᵢⱼ. That is to say, any matrix (aᵢⱼ) is a linear combination of the n² matrices Eᵢⱼ, so the set of matrices Eᵢⱼ (i, j = 1, 2, ..., n) is an ordered basis for the linear space of all matrices of order n; that linear space is therefore n²-dimensional. M, the set of polynomials of degree not exceeding n (see Sec. 7.1), is an (n + 1)-dimensional linear space. The zero linear space contains no linearly independent elements; it is a zero-dimensional linear space, and so there is no ordered basis for it. Its dimension is zero.

From Theorem 3 in Sec. 2.1 we know that in a linear space of dimension n any n + 1 vectors are linearly dependent. Therefore the dimension of a linear space is unique, but its ordered basis is not. For example, the set of vectors i and j is an ordered basis for R²; likewise, the set of vectors j and -i is also an ordered basis for R². Again, for example,
    {(1, 0, ..., 0), (0, 1, ..., 0), ..., (0, 0, ..., 1)}  and  {(1, 1, ..., 1), (0, 1, ..., 1), ..., (0, 0, ..., 0, 1)}
are each an ordered basis for Rⁿ.

Example 1. Give an example of an infinite basis.

Solution: Let C be the complex number field and V the linear space consisting of polynomials with complex coefficients of the form
    f(x) = a₀ + a₁x + ... + aₙxⁿ .
Let f_k(x) = x^k (k = 0, 1, 2, ...). Then the polynomials f₀, f₁, ... form a basis for V. Clearly these polynomials span V. Why are they linearly independent? To show that the set f₀, f₁, ... is linearly independent means to show that each finite subset of it is linearly independent. It will suffice to show that for each n the set f₀, f₁, ..., fₙ is linearly independent. Suppose
    a₀f₀ + a₁f₁ + ... + aₙfₙ = 0 .
This means a₀ + a₁x + ... + aₙxⁿ = 0 for every x in C. In other words, every x in C is a root of the polynomial f(x) = a₀ + a₁x + ... + aₙxⁿ. We assume that the reader knows that a polynomial of degree n with complex coefficients cannot have more than n distinct roots. It follows that
    a₀ = a₁ = ... = aₙ = 0 .
Thus we have exhibited an infinite basis for V. Does this mean that V is not finite-dimensional? As a matter of fact, it does. Suppose we have a finite number of polynomials g₁(x), g₂(x), ..., g_r(x). There will be a largest power of x which (with nonzero coefficient) appears in g₁(x), g₂(x), ..., g_r(x). If that power is k, clearly f_{k+1}(x) = x^{k+1} is not a linear combination of g₁(x), g₂(x), ..., g_r(x). So V is not finite-dimensional. Thus there are infinitely many linearly independent elements in an infinite-dimensional space.

An infinite-dimensional space differs greatly from a finite-dimensional space. The former is usually not an object of study in linear algebra, and we shall from now on restrict our attention to finite-dimensional linear spaces.

Example 2. Let V be a linear space spanned by finitely many elements β₁, β₂, ..., β_m. Then any linearly independent set of elements in V is finite and contains no more than m elements.

Proof. To prove the statement it suffices to show that every subset S of V which contains more than m vectors is linearly dependent. From Theorem 3 in Sec. 2.1 we directly draw this conclusion.

Example 3. If V is a finite-dimensional linear space, then any two bases of V have the same (finite) number of elements.
Proof. As V is finite-dimensional, it has a finite basis β₁, β₂, ..., β_m. From Example 2 we know that every basis of V is finite and contains no more than m elements. If α₁, α₂, ..., αₙ is a basis, then n ≤ m. By the same argument, m ≤ n. Therefore m = n.

Note that Example 3 allows us to define the dimension of a finite-dimensional linear space V as the number of elements in a basis for V.

Example 4. Let α₁, α₂, ..., α_m be a linearly independent subset of a linear space V. Suppose β is an element of V which is not in the subspace spanned by α₁, α₂, ..., α_m. Then α₁, α₂, ..., α_m, β are linearly independent.

Proof. Suppose
    k₁α₁ + ... + k_mα_m + bβ = o .
Then b = 0, for otherwise
    β = (-k₁/b)α₁ + ... + (-k_m/b)α_m
and β would be in the subspace spanned by α₁, α₂, ..., α_m, which conflicts with the assumption. Thus b = 0, and hence k₁α₁ + k₂α₂ + ... + k_mα_m = o. As α₁, α₂, ..., α_m are linearly independent, we have kᵢ = 0 (i = 1, 2, ..., m).

Example 5. Let W be a subspace of an n-dimensional linear space V. Then a basis for W can be extended to a basis for V.

Solution: Suppose that W is an m-dimensional subspace and {α₁, α₂, ..., α_m} is a basis for W. We extend α₁, α₂, ..., α_m to a basis for V as follows. If every element of V can be written as a linear combination of α₁, α₂, ..., α_m, i.e., α₁, α₂, ..., α_m span V, then {α₁, α₂, ..., α_m} is a basis for V and we are done. If {α₁, α₂, ..., α_m} does not span V, we use Example 4 to find an element β₁ in V such that α₁, α₂, ..., α_m, β₁ are linearly independent. If they span V, fine. If not, we again apply Example 4 to obtain an element β₂ in V such that α₁, α₂, ..., α_m, β₁, β₂ are linearly independent. Continuing in this way, we reach (in not more than dim V steps) a set of elements
    α₁, α₂, ..., α_m, β₁, β₂, ..., β_{n-m}
which is a basis for V.
It should be remarked that if we did not have the property 1α = α, then we would have no way of writing αᵢ as a linear combination of α₁, α₂, ..., αₙ, and consequently we would not be able to establish certain basic concepts, such as the concepts of basis and dimension.

As mentioned above, the proofs of the above examples tell us nothing about how to find a basis; they just show that bases for V exist. Generally, we have no simple method for finding a basis of a linear space. But once a basis is given, we can find other bases by using the conclusion of the following example.

Example 6. If {α₁, α₂, ..., αₙ} is a basis for a linear space V, then the n vectors
    β₁ = a₁₁α₁ + ... + a₁ₙαₙ ,   ... ,   βₙ = aₙ₁α₁ + ... + aₙₙαₙ
are a basis for V if and only if the determinant
    | a₁₁ ... a₁ₙ |
    | ........... | ≠ 0 .
    | aₙ₁ ... aₙₙ |

Proof. It suffices to prove that β₁, β₂, ..., βₙ are linearly independent. Let k₁β₁ + k₂β₂ + ... + kₙβₙ = o. Since α₁, α₂, ..., αₙ are linearly independent, we have
    k₁a₁₁ + ... + kₙaₙ₁ = 0 ,
    ...
    k₁a₁ₙ + ... + kₙaₙₙ = 0 .
Since the kᵢ are all zero if and only if the above determinant is not equal to zero, the statement holds.

The above example shows once more that a basis for a linear space is not unique; generally, there are infinitely many bases.

Example 7. Suppose f(α, β) is a bilinear form on a linear space V and {α₁, α₂, ..., αₙ} is a basis for V. Let α = Σᵢ aᵢαᵢ and β = Σᵢ bᵢαᵢ. Then
    f(α, β) = Σᵢ Σⱼ aᵢ bⱼ aᵢⱼ ,
where aᵢⱼ = f(αᵢ, αⱼ).

If α₁, α₂, ..., α_m are linearly independent, then the dimension of the subspace <α₁, α₂, ..., α_m> spanned by α₁, α₂, ..., α_m is m. If the greatest number of linearly independent vectors among α₁, α₂, ..., α_m is r, then the dimension of <α₁, α₂, ..., α_m> is r.

The following is an important theorem concerning dimensions.

Theorem 1. Suppose V is an n-dimensional linear space over K and V₁ and V₂ are two of its subspaces. Then
    dim V₁ + dim V₂ = dim(V₁ + V₂) + dim(V₁ ∩ V₂) .

Proof. Suppose {α₁, α₂, ..., α_l} is a basis for V₁ ∩ V₂. Then we can choose β_{l+1}, ..., β_r ∈ V₁ such that {α₁, ..., α_l, β_{l+1}, ..., β_r} is a basis for V₁, and ν_{l+1}, ..., ν_s ∈ V₂ such that {α₁, ..., α_l, ν_{l+1}, ..., ν_s} is a basis for V₂. If we can show that α₁, ..., α_l, β_{l+1}, ..., β_r, ν_{l+1}, ..., ν_s is a basis for V₁ + V₂, then dim(V₁ + V₂) = r + s - l, and the theorem is established, since dim V₁ + dim V₂ = r + s = (r + s - l) + l = dim(V₁ + V₂) + dim(V₁ ∩ V₂).

First, as any element α in V₁ can be written as
    α = a₁α₁ + ... + a_lα_l + b_{l+1}β_{l+1} + ... + b_rβ_r
and any element β in V₂ can be written as
    β = a′₁α₁ + ... + a′_lα_l + c_{l+1}ν_{l+1} + ... + c_sν_s ,
we have
    α + β = (a₁ + a′₁)α₁ + ... + (a_l + a′_l)α_l + b_{l+1}β_{l+1} + ... + b_rβ_r + c_{l+1}ν_{l+1} + ... + c_sν_s ,
i.e., α + β is a linear combination of α₁, ..., α_l, β_{l+1}, ..., β_r, ν_{l+1}, ..., ν_s.

Again, if
    a₁α₁ + ... + a_lα_l + b_{l+1}β_{l+1} + ... + b_rβ_r + c_{l+1}ν_{l+1} + ... + c_sν_s = o ,
or
    Σᵢ aᵢαᵢ + Σⱼ bⱼβⱼ = - Σ_k c_kν_k ,                     (1)
which shows that the element on the left side belongs to V₁ while the element on the right side belongs to V₂. It follows that the element on the right side belongs to V₁ ∩ V₂, and consequently
    - Σ_k c_kν_k = Σᵢ dᵢαᵢ
for some scalars dᵢ. Substituting this into the above expression, we obtain
    Σᵢ (aᵢ + dᵢ)αᵢ + Σⱼ bⱼβⱼ = o .
Since α₁, ..., α_l, β_{l+1}, ..., β_r are linearly independent, we have bⱼ = 0. Thus (1) becomes
    Σᵢ aᵢαᵢ + Σ_k c_kν_k = o .
Again, since α₁, ..., α_l, ν_{l+1}, ..., ν_s are also linearly independent, we have aᵢ = 0 and c_k = 0. That is to say,
    α₁, ..., α_l, β_{l+1}, ..., β_r, ν_{l+1}, ..., ν_s
are linearly independent. Hence they are a basis for V₁ + V₂, and thus the theorem holds.

If dim V₁ + dim V₂ > n, then dim(V₁ ∩ V₂) > 0. This is because V₁ + V₂ is a subspace of V, so that dim(V₁ + V₂) ≤ n.

Example 8. Suppose V is the subspace of R³ formed by elements of the form (a₁, 0, a₃) and U is the subspace spanned by the elements (1, 2, 1) and (3, 1, 2). Find the dimensions of V ∩ U and V + U, and bases for V ∩ U and V + U.

Solution: Let α₁ = (1, 2, 1), α₂ = (3, 1, 2). If (a₁, 0, a₃) ∈ U, then
    (a₁, 0, a₃) = b₁α₁ + b₂α₂ ,
or a₁ = b₁ + 3b₂, 0 = 2b₁ + b₂, a₃ = b₁ + 2b₂; therefore a₁ = -5b₁, a₃ = -3b₁, and so (a₁, 0, a₃) = -b₁(5, 0, 3). Hence V ∩ U is a subspace of dimension 1 and {(5, 0, 3)} is a basis for V ∩ U. Again, since V and U are two-dimensional subspaces while V ∩ U is a one-dimensional subspace, the dimension of V + U is 2 + 2 - 1 = 3. Hence V + U = R³. It is clear that {(1, 0, 1), α₁, α₂} is a basis for V + U.

Example 9. Let V be the subspace of R³ formed by vectors of the form (x, x, x) and U the subspace of R³ formed by vectors of the form (x, 0, z). Show that V ⊕ U = R³.

Proof. Let α = (a, b, c) be any element in R³. Since α = (b, b, b) + (a - b, 0, c - b), we have α ∈ V + U, so R³ ⊆ V + U. But as V + U ⊆ R³, we have R³ = V + U. Obviously V ∩ U = o, so that R³ = V ⊕ U.

We can also show V ⊕ U = R³ by a different method. Since the dimension of V is 1 and the dimension of U is 2, while the dimension of V ∩ U is zero, the dimension of V + U is 3. Hence V + U = R³. Obviously V ∩ U = o, so V ⊕ U = R³. The proof is complete.

Example 10. Suppose that V₁ and V₂ are two subspaces of a linear space V. Prove that V = V₁ + V₂ is a direct sum if and only if
    dim V = dim V₁ + dim V₂ .

Proof. By Theorem 7 in Sec. 7.1 we know that V = V₁ + V₂ is the direct sum V = V₁ ⊕ V₂ if and only if V₁ ∩ V₂ = o. Again, by Theorem 1 we immediately see that the statement holds. The proof is complete.

That is to say, the dimension of V₁ + V₂ is generally not equal to the sum of the dimensions of V₁ and V₂. Only when V₁ ∩ V₂ = o do we have dim(V₁ + V₂) = dim V₁ + dim V₂.

As mentioned above, by using a largest linearly independent set as a basis we have answered an important question raised in the introduction. By defining the coordinates of a vector with respect to a basis, we will answer another important question below.
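The dimensions found in Example 8 can be double-checked numerically (a sketch in Python with NumPy; the variable names are ours):

    import numpy as np

    # Spanning sets of V = {(a1, 0, a3)} and U = <(1,2,1), (3,1,2)> in R^3.
    V_span = np.array([[1, 0, 0],
                       [0, 0, 1]])
    U_span = np.array([[1, 2, 1],
                       [3, 1, 2]])

    dim_V = np.linalg.matrix_rank(V_span)                          # 2
    dim_U = np.linalg.matrix_rank(U_span)                          # 2
    dim_sum = np.linalg.matrix_rank(np.vstack([V_span, U_span]))   # 3
    dim_int = dim_V + dim_U - dim_sum                              # 1, by Theorem 1

    print(dim_V, dim_U, dim_sum, dim_int)    # 2 2 3 1
    # (5, 0, 3) indeed lies in U as well as in V:
    print(np.linalg.matrix_rank(np.vstack([U_span, [5, 0, 3]])) == dim_U)   # True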
Suppose {α₁, α₂, ..., αₙ} is a basis for a linear space V over K. Then any element α in V is a linear combination of α₁, α₂, ..., αₙ, i.e., α can be written as
    α = a₁α₁ + ... + aₙαₙ .
This expression is unique. The reason is that if α = b₁α₁ + ... + bₙαₙ, then
    (a₁ - b₁)α₁ + ... + (aₙ - bₙ)αₙ = o .
As α₁, α₂, ..., αₙ are linearly independent, we have aᵢ = bᵢ, i = 1, 2, ..., n. That is to say, for a chosen basis the coefficients a₁, a₂, ..., aₙ are determined uniquely by α. The scalars a₁, a₂, ..., aₙ so determined are called the coordinates of the vector α with respect to the basis {α₁, α₂, ..., αₙ}. Therefore we can express α by the n-dimensional vector (a₁, ..., aₙ) and write α = (a₁, ..., aₙ). Thus an abstract element is associated with a set of specific numbers a₁, ..., aₙ.

Again suppose
    α = a₁α₁ + ... + aₙαₙ ,   β = b₁α₁ + ... + bₙαₙ .
Since
    α + β = (a₁ + b₁)α₁ + ... + (aₙ + bₙ)αₙ ,   kα = (ka₁)α₁ + ... + (kaₙ)αₙ ,
we have
    α + β = (a₁ + b₁, ..., aₙ + bₙ) ,   kα = (ka₁, ..., kaₙ) .
That is to say, the coordinates of the sum of α and β are the sums of the coordinates of α and the coordinates of β, and the coordinates of kα are the coordinates of α each multiplied by k. Thus when elements of a linear space V are written as vectors by means of coordinates, operations on the elements are turned into operations on vectors. Consequently we also call the elements of V vectors, the numbers in K scalars, and a linear space a vector space. By means of coordinates many problems about vectors can be handled by working with numbers, which makes them very concrete.

The above discussion suggests that, from an algebraic point of view, V and Rⁿ behave "rather similarly". We shall now make this notion precise.

Definition 2. Let V be a real vector space with operations ⊕ and ⊙, and let W be a real vector space with operations ⊞ and ⊡. A one-one function L mapping V onto W is called an isomorphism of V onto W if:
(a) L(α ⊕ β) = L(α) ⊞ L(β) for α and β in V,
(b) L(k ⊙ α) = k ⊡ L(α) for α in V, k being a real number.
In this case we say that V is isomorphic to W (see Sec. 7.5).

Recall that L is one-one if L(α₁) = L(α₂) for α₁, α₂ in V implies that α₁ = α₂, and that L is onto if for each β in W there is at least one α in V for which L(α) = β.

For example, the mapping L : R³ → R², defined by
    L(a₁, a₂, a₃)′ = (a₁ + a₂, a₁)′ ,
is onto. To see this, suppose β = (b₁, b₂)′ and seek an α = (a₁, a₂, a₃)′ such that L(α) = β. We obtain the solution a₁ = b₂, a₂ = b₁ - b₂, and a₃ arbitrary. However, L is not one-one: any two vectors that agree in their first two components but differ in the third have the same image, so there are α₁ ≠ α₂ with
    L(α₁) = L(α₂) = β₁ .

Isomorphic vector spaces differ only in the nature of their elements; their algebraic properties are identical. That is, if the vector spaces V and W are isomorphic under the isomorphism L, then for each α in V there is a unique β in W such that L(α) = β and, conversely, for each β in W there is a unique α in V such that L(α) = β. If we now replace each element of V by its image under L and replace the operations ⊕ and ⊙ of V by ⊞ and ⊡ respectively, we get precisely W. The most important example of isomorphic vector spaces is given in the following theorem.

Theorem 2. If V is an n-dimensional real vector space, then V is isomorphic to Rⁿ (see Theorem 2 in Sec. 7.5).
,an}
be a basis for V and let L : V -> Rn be
296
Linear Algebra
a2
L(a)
V an I where a = a,\a.\+aiOii-\ \-anan. First we see that L is one-one. Let / CL\
We shall show that L is an isomorphism.
\
0,1
L{cc)
\ a-n. an
b2
W) )I
and suppose L(a) = L(/3). Then I ai 0.2
\ an I
\bn
It follows that a = aiai + a2a2 + • • • + anan = bioci + b2a2 + ■ ■ ■ + bnan = 3 . b2
Next we see that L is onto. Suppose 3 =
is a given vector in Rn and
\bnJ
construct a vector
a = biai + b2Q.2 H
V bnan .
Then a € V, and L(a) = 3. Finally we show that L satisfies the conditions (a) and (b) in Definition 2. Let a = a i a i + • • ■ + anan and 3 = biai + ■ ■ ■ + bnan be vectors in V such that a\ \ / h \ b2 a-i L(a) L(8) \bnJ Then a + 3 = (ai + 6i)ai H a\ +bi L(a + 3) an + bn
\-(an + bn)an ,
ai \ / 6X ■■■ + ••• 1 =L(a) + L(P), an I \bn
Linear Spaces and Linear
Transformations
297
and
(
kai \
...
I ai \
=fcI •. •
kan )
= kL(a).
\ an )
Hence V and Rn are isomorphic. It is simple for us to find the coordinates of a vector with respect to some basis. It suffices to express the vector as a linear combination of the basis and the coefficients of the combination are the required coordinates. E x a m p l e 11. Find the coordinates of a vector a = (1,2,1) in R3 with respect to the following basis: {ai
= (1,1,1), a 2 = (1,1, - 1 ) , a 3 = (1, - 1 , - 1 ) } .
Solution: Let the required coordinates be (01,02,03), and consider a = a i a i + 0202 + 0303 .
As (1,2,1) = (ai + d2 + 03, ai + a 2 - a 3 , ai - a 2 - 03), we have 01 + 03 + 03 = 1, ai + 02 — 03 = 2, ai — 02 — 03 = 1, and thus the solution 1 1 ai = 1, a2 = - , 03 = — - . Hence the required coordinates are (1,1/2,-1/2). We next discuss the relationship between bases and coordinates. We know that the coordinates are closely related to the basis chosen. The same vector generally has different coordinates with respect to different bases. For example, if one basis for Rn is taken as {a\, 012, ■ ■ ■, otn} and we have a = aicti H h anan, then (a\, 0 2 , . . . , an) are the coordinates with respect to this basis. If another basis is taken as {a[, a'2,... ,a'n} where a'l = cti + a2-\
1- a n , a'2 = a2-\
1- <xn, ■ • ■ , a'n = a „ ,
then as on = a^ — ot'i+1{i = 1,2,... ,n — 1), and an = a'n, we have a = oicr'j + (a 2 - ai)a' 2 H
h (a„ - a n _ i ) a ^ .
So the coordinates of a are now (01,02 — o i , . . . , a n — a n _ i ) .
Linear Algebra
298
We next discuss the relationship between two sets of coordinates of the same vector with respect to two different bases. For this we have the following theorem: Theorem 3. Suppose that {ai,a.2, ■ • •, ocn} and {a[, a'2,..., bases for a linear space V', and
(a'1,...,a'n)
= (ai,...,an)A,
A =
a'n} are two
air
an
(2) a„i
■ ■ ■
av
If the coordinates of an element a in V with respect to the two bases are (xi, X2, • • ■ i xn) and (x[,x'2,..., x'n) respectively, then (x[,...,x'n)
= (x1,...,xn)A'-1,
(3)
where A is called the transition matrix from the one basis { a 1,0:2,..., a n } to another basis {a^oc^, • • •, &'„}■ Proof. Using the operational rules of matrices (here some elements in the matrices are vectors), we have «1
a = xiai + ■ ■ ■ + xnan
a = x^a'i +
= (sci,..., xn)
h x'na'n = {x[;
...,x'n)
u
and so (xi,...,x„,)
But as
on I j I=
{x\,...,x'n)
(4)
Linear Spaces and Linear
299
Transformations
we have ai
{(x1,...,xn)-{x'1,...,x'n)A'}
0.
>a« As a i , Q 2 , . . . ,ctn are linearly independent, we have (xi,...,xn)-
{x[,...,x'n)A'
=0,
or (xi,...
,xn)
= [Xi,...
,xn)A
.
Again since a[, a'2,..., a'n are linearly independent, we know that according to Example 6 and (2) A is nonsingular. Therefore (3) holds, which proves the theorem. The converse of the above theorem is also true. The reason is that from (4) and (3) we have
V-^l) • • • j Xn)
p IU
a -A
\
i-i
= 0.
\<J I
As we can take any value whatever for Xj, the above equation is an identity in Xj. Therefore aA / a[ \
: '«»'
-A'"1
:
=0,
\<x'nj
which is formula (2). Thus we see that if the relationship between two bases is given by formula (2), then the relationship between two sets of coordinates determined by the two bases, is given by formula (3). Conversely, if the relationship between two sets of coordinates is given by formula (3), the relationship between two bases which determine the coordinates is simply given by formula (2). In other words, if the change of bases is given by formula (2), then the transformation of the two sets of coordinates is simply given by formula (3). Conversely, if the transformation of the coordinates is given by formula (3), then the change of the bases is simply given by formula (2). Formula (1) in Sec. 4.1 may be considered as a transformation of coordi nates. E x a m p l e 12. Suppose a transformation of coordinates in a linear space V of dimension n is
300
Linear Algebra Xx = Xi,X2
= X2-
Xi,X3
= Xz - X2, . ■ . ,Xn = Xn - X „ _ i ,
find the transformation of the two bases. Solution: Rewriting the transformation of coordinates in the form
( i j , x2, ■ ■ ■, xn)
— \Xi, X2, ■ ■ ■,
0
0
0
-1 1/
xn)
\ 0 we have 1 0
-1 1
0
4/-1
-1 1)
0
Vo Then as {A')'1
= {A~1)', we have
/ 1 0
04"1)' = (AT1 =
0
°o^
-1
\l
■1
1/
We can transpose the above matrix and obtain
1
°
-1
1
( A'1
0
0
V 0
0
0 0
0 \ 0
1 -1
0 1 /
Again finding the inverse of the matrices on the two sides, we have
f\ A = {A~1)-1
0 1
0 0
1 1
1 1
=
0
0
1
V1
0
1J
\
301
Linear Spaces and Linear Transformations
As finding the change of bases for V means finding the matrix A, the change of bases is 0 1
(\ (a'1,a'2,...,a'n)
= {a1,a2,...
• •
0 0
0
1 1
0 1 /
,«„) 1
Vi
1 1
\
0
i.e.,
a[= ai +a2-\
h a „ , a'2 = a2 + a3-\
h a „ , ...,a'n
=
an.
E x a m p l e 13. Suppose V is a 4-dimensional linear space and a x1 ===( (1,2,1 , 2 , --1,0), 1 , 0 ) , a 2 == ((!," 1 , --1,1,1), 1,1,1), ( - 1 , 2,1,1), a 4 == ((--11,,-- i1,,o0,, i1)) «a 3 = = =(-1,2,1,1), and ai ai a« 33
= (2,1, (2,1,0,1), (0,1,2,2), 1,2,2), 0,1), c4 «' 2 = (o, = (1, (1,3,1,2) 3,1,2) = (-2,1,1,2), ( - 2 , 1,1,2) , a« i4 =
are two bases for V. If the change of the bases is from the former basis to the latter basis, find the transformation of coordinates in V. Solution: We can first find the coordinates of ot'j, a'2, a basis { Q I , a2, a3, a^}, and then using formula (2) find But we shall here use a more effective method to find coordinates in V as follows. Suppose that the base determined coordinates for Then (ai,a2,a3,a4) = (oc'1,a'2,a'3,a'4)
=
a'3, a 4 with respect to the matrix A required. the transformation of V is {/3i,/32,/33,/34}.
(J3i,p2,P3,P4)A, (p1,p2,p3,p4)B,
where
( A =
1 2 -1
V 0
1 -1 1 1
-1 2 1 1
-1 , 0 1/
/2 0 -2 1 1 1 B = 0 2 1 2 Vl 2
lx 3 1 2/
302
Linear Algebra
As (a[,a'2,oc'3,a'4) (A-lB)'-x
{ai,a2,a3,a4)A~1B,
=
= (B'A'-1)-1
= (A'-1)-^'-1
=
A'B''1.
we have [x'^x^x'^x1^ By direct computations we find / A'B
1
1 1-1 \ - l
/ °1 -1
V i
2 -1 2 -1 -1 1 0 0
(x1,x2,x3,x4)A'B'~1.
=
-1 1 1 0 0 0 0 1
0^ 1 1 1/
/
V
4 13 6 13 8 13 11 13
2 13 3 13 9 13 1 13
3 13 2 13 7 13 8 13
1 13 8 13 2 13 6 13
1\ -1
1 -1/
and hence
(°
-1 0 1 0 -1 1 [Xi, X2, X3, X4) — \X\, X2, X3, X4) 1 0 0 -1 0 1 1 / 1 Thus the required transformation of coordinates is Xi = X2 — X3 + Xi,
X2 = —X\ + X2,
X3 =
X4,
x'4 = X\ - X2 -r X3 - X4 .
Exercises 1. Find a basis for the solution space of the homogeneous system in Example 1 of Sec. 2.2 and its dimension. 2. Find a basis for the solution space V of the homogeneous system in Exercise 2(2) in Sec. 7.1 and the dimension of V. 3. Show that vectors (1, —1, 0) and (0, 1, —1) are a basis for the linear space in Exercise 3(1) of Sec. 7.1 when n = 3. 4. Find the coordinates of vector a = (3,7,1) with respect to the basis {01,02,03}, where o i = (1,3,5), o 2 = (6,3,2), o 3 = (3,1,0).
Linear Spaces and Linear Transformations
303
5. Find the coordinates of vector 1 + x + x2 in M (see Sec. 7.1) with respect to the basis {1, x — 1, (x — 2){x — 1)}. 6. Suppose V is a linear space formed from all matrices of order 2. Find the (2 3\ coordinates of I . _ I with respect to the basis
\ ( i ij \ \
o j ' ( o o)} •
o) '(o
7. Find the relationship between the two different sets of coordinates of the same vector with respect to the two bases {01,02,02} and {0\,02,0z\, where o j = (1,2,1), o 2 = (2,3,3), o 3 = (3, 7,1); 01 = (3,1,4), /3 2 = (5,2,1), 03 = (1,1, - 6 ) . 8. Let { 0 1 , 0 2 , 0 3 , 0 4 } and {Pi, fa,03,(34} be two bases for i? 4 , where fai 02 03 04
= = = =
(1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1);
(Pi 02 p3 l£4
= = = =
(2,1,-1,1). (0,3,1,0), (5,3,2,1), (6,6,1,3).
(1) Find the matrix for transformation from the former basis to the latter basis; (2) Find the coordinates of vector a = (2:1,0:2, £3, £4) with respect to the latter basis; (3) Find a nonzero vector which has the same coordinates with respect to above two bases. 9. Suppose {01,02,03} is a basis for R3. spanned by 0\,0%,0z, where
Find a basis for a subspace U
/3i = o i - 2 o 2 + 3 o 3 , 02 = 2 a i + 3 o 2 + 2o 3 , 03 = 4 o : + 13o 2 . 10. Find a basis for a subspace V fl U in R4, where V = {(01,02,03,^4) I ai - a2 + 03 — 0-4 = 0} , U = {(01,02,03,04) I oi + a 2 + a 3 + 04 = 0} . 11. Suppose linear space V is n-dimensional, Vi and V2 are its subspaces, and dim(Vi + V2) = dim(Vi nV2) + l. Prove that Vx C V2 or V2 C Vi.
304
Linear Algebra
7.3. Linear Transformations Linear transformations are the most important link between vectors in a linear space. They play important roles in many areas of mathematics, physics, social science, and economics. In particular, linear transformations are central to linear algebra. In this section we shall present some basic concepts of linear transformations and discuss their basic properties. In the plane R2, which is a two-dimensional linear space formed by all two-dimensional vectors from the origin, a rotation about the origin is a trans formation. Suppose a is a vector in R2 and after rotating we get a vector a'. Then after the rotation, the sum of vectors a and /3 becomes the vector ( a + (3)' which equals the sum of the vectors a' and /3', where a' and /3' are vectors obtained by rotating a and /3 respectively. Similarly after rotating the multiplication of the vector a by a real number k we get a vector (fca)' which is the multiplication ka' of vector a' by k. That is to say, (a + 13)' = a' + /3', (ka)' = ka'. Now, suppose we project vector (a, b, c) in the linear space R3 on to the plane xoy and obtain vector a' = (a, b, 0). Such a projection is also a transformation, it is clear that (a + 0Y = a'+p',(ka)' = ka'. Likewise, a reflection with respect to the Cartesian coordinate plane xoy is also a transformation having the above properties. Obviously, the vector obtained by reflecting a = (a, b, c) with respect to plane xoy is a' = (a, b, —c). We can see from the above-mentioned transformations that there are some properties common to all transformations. The vector obtained from transforming the resultant of two vectors is the same as the resultant of two vectors obtained from transforming the two vectors separately. The vector obtained from transforming the product of multiplica tion of a vector by a scalar k is equal to the product of multiplication of a vector obtained from transformation of the vector by k. Such transformations which do not affect vector addition and multiplication of a vector by a scalar corre spond to operations in the linear space and reflect the nature of the essential link. Therefore they are important transformations for a linear space. In the following two sections we discuss such transformations, beginning with a definition. Definition 1. Suppose V is a linear space over K. If in V there is a rule T that assigns some unique element a' in V to any element a in V, that is
Linear Spaces and Linear Transformations
305
to say T transforms a into a', or a —> a ' = Ta, then the rule T is called a transformation on V, and a' is said to be the image of a under T and is denoted a ' = Ta. a is called the inverse image of a'. The above transformation T is said to be a linear transformation on V if it satisfies the following conditions: for any two elements a and (3 in V and any scalar k xa K, T(a + [3) = Ta + T/3, T(ka) =
k(Ta),
or ( a + 0)' = a' + 0', (ka)' = ka'. In the above-mentioned rotation is a linear transformation of R2, projection and the reflection are linear transformations of R3. It is easy to check that a transformation changing any element into the zero element is a linear transformation; it is called the zero linear transformation. A transformation changing any element in V into itself, i.e., a transformation that leaves any element unchanged is the identity linear transformation, or the unit linear transformation. A transformation which changes any element a in V into element ka, where k is a fixed scalar, is also a linear transformation and is called scalar multiplication. Since a transformation is simply a function, a linear transformation is a linear function. We next present some simple properties of a linear transformation. Prom the above definition we see that ka —> ka', where k is any number. Taking k = 0, we obtain o —» o; taking k = —1, we obtain —a —> —a'. That is to say, a linear transformation changes the zero element into itself, the negative element —a of a into the negative element —a' of the image a' of a. Then from k\ai + • ■ • + knan = o we have kia[ H
h kna'n = o.
That is to say, a linear transformation carries a linearly dependent set into a linear dependent set. However it should be noted that the inverse is not true, because a linear transformation may also change linearly independent elements into linearly dependent elements. For example, the zero transformation and the projection transformation are such transformations. Therefore a linear transformation need not change a basis into another basis. Definition 2. Let V be a linear space over K and T a linear transformation on V. The image of V under T, or the range of T, denoted by ImT or T(V),
306
Linear Algebra
consists of all those elements in V that are images under T of all the elements in V. T h e o r e m 1. If T is a linear transformation on a linear space V, then the images of all the elements in V form a subspace of V, called the image subspace and denoted by ImT or T(V). Proof. Suppose that U is composed of all the elements in V that are images of elements in V under T. Let a ' and /3' be any two elements of U and a and f3 are respectively their inverse images. Prom a! = Tot and /3' = T/3, we have T(a + /3) = a'+ /3', T(ka) = ka', which implies that a ' + / 3 ' and ka' are in U. That is to say, the sum of any two elements in U and the product of any element multiplied by a scalar k are elements in U. Hence U is a subspace of V and the theorem holds. The dimension of the subspace ImT is called the rank of the transformation T and is written as rank of T, i.e., rank of T = dim ImT. E x a m p l e 1. Let T be a linear transformation on R4 defined by /*l\ X2
' x2 + x3 \ X\
+X2
Xi , Xi — X2
Describe the image of T and determine its rank. Solution: Representing T as
/ x2 + x3 \ X\ +X2 X!
\Xi - X2I
Xi
1 + x2 Vi/
1 0 -l)
+ x3
0 0
w
we easily see that ImT coincides with L(oti, ot2,0:3), where a i = (0,1,1,1)',
a 2 = (1,1,0,-1)',
a 3 = (1,0,0,0)'.
The vectors on(i = 1,2,3) are linearly independent and, therefore rank of T = 3 .
Linear Spaces and Linear Transformations
307
If T is a linear transformation on n-dimensional linear space V, since ImT C V, rank of T < dim V = n. When rank of T = n, i.e. ImT = V, T is called a nonsingular linear transformation. When rank of T < n, i.e. ImT C V,T is called a singular linear transformation. The unit linear transformation is a nonsingular linear transformation and the zero transformation is a singular transformation. It is easily seen that a nonsingular linear transformation still changes a basis into some basis. Its inverse is obviously true also. Therefore a linear transformation is a nonsingular linear transformation if and only if it changes a basis into some basis. In general, we have the following result. Theorem 2. A linear transformation T on a linear space V is a nonsingular transformation if and only if it transforms linearly independent elements into linearly independent elements. Proof. The sufficient condition part of the theorem is apparently true. We verify the necessary condition by the method of reduction to absurdity as follows. Suppose T is a nonsingular linear transformation and oc\, a.2, ■ ■ ■, ocm are linearly independent elements. If Tcti, . . . , Tam are linearly dependent, we can choose elements Pm+uPm+2, ■ ■ ■ ,Pn in V such that . { a i , a 2 , . . . , a m , / 3 m + i , . . . ,f}n} a r e a basis for V. Then Ten, ■ ■ ■ , T a m , T / 3 m + i , . . . ,T{3n are linearly dependent, which contradicts the property of T that changes a basis into some basis. Hence Tot\,T0.2, ■ ■ ■,Tam are linearly independent and the necessary condition is established. This completes the proof. We know from definition that under a linear transformation any element in V can have only one image. But an element in V need not have an inverse image. Clearly under a nonsingular transformation any element in V always has an inverse image. Under singular transformation any element in V need not always have an inverse image. For example, for the rotation and reflection in R3 mentioned above, each element in R3 always has an inverse image. But for projection each element in R3 need not always have an inverse image. A point on the Cartesian coordinate plane xoy always has an infinite number of inverse images. For example, the inverse images of the point (0, 0, 0) are (0, 0, c), where c is an arbitrary number, while any other point in R3 always has no inverse images. Definition 3. Let V be a linear space and T a linear transformation on V. The kernel of T, denoted by KerT, is the subset of all elements a in V such
308
Linear Algebra
that T(a) = o. Clearly KerT is not an empty set, because if T is a linear transformation, then o is in KerT. Theorem 3. Let T be a linear transformation on a linear space V. Then KerT is a subspace of V, called the kernel subspace of the linear transformation T, or for short the kernel of T. The dimension of KerT is called the nullity ofT. Proof. We show that if a and j3 are in KerT, then so are a + f3 and ka for any number k G K. In fact, if a and (3 are in KerT, then Ta = o and T/3 = o. Consequently
T(a + P) = Ta + TP = o, thus a + (3 is in KerT. Also, T(fca) = kTa = o, so ka is in KerT. That is to say, the sum of two inverse images of the zero element is still an inverse image of the zero element; the multiplication of the image of the zero element by a scalar k also gives an inverse image of the zero element. Therefore KerT forms a subspace of V and so the theorem holds. The dimension of the subspace KerT is also referred to as the defect of the linear transformation T and is denoted by defT. Example 2. Show that the transformation denned by T(Xi, X2, Xz, Xi) = (Xi +X2 — 3^3 - Xi, ZXi — X2 — 3^3 + 4l4, 0, 0) is a linear transformation on 4-dimensional linear space R4, and find rank of T and defT. Solution: It is easily seen from definition that T is a linear transformation on i? 4 . The subspace T(V) consists of all such elements in R4 of the form (xi, X2, 0, 0). To find the dimension of T(V). We note that T(xu
x2, x3, x4) =xi(1,3,0,0)
+ 12(1,-1,0,0)
+ 3 z 3 ( l , l , 0 , 0 ) + a;4(-l,4,0,0) = x\a.\ + X2012 + 3x30:3 + X4CK4 .
Linear Spaces and Linear Transformations
309
Thus a i , a 2 , a<3, a$ spans T(V), i.e., T(V) = L(a.\, a 2 , «3, 0L4). We form a matrix whose rows are the above row vectors: /ai\ a2 a3
Va 4 /
/
1 3 0 0\ [ l - l O O ~ I 1 1 0 0 '
V-l
4 0 0/
Transforming this matrix to the reduced row echelon form, we obtain 1 0 0 0\ 0 1 0 01 0 0 0 0 I' 0 0 0 0/ Hence (1,0,0,0) and (0,1,0,0) are a basis for T(V). So the dimension of T(V) is 2. KerT is a solution space of the homogeneous system f xi + X2 - 3:E3 - Z4 =0, 1 3xi — X2 — 3X3 + 4X4 = 0 .
We know from Example 1 of Sec. 2.2 that KerT is composed of all vectors of the form (6fe - 31, 6k + 11, 4k, AI) = k (6,6,4,0) + I (-3,7,0,4) = */3i +102 . Thus /3i = (6,6,4,0) and /3 2 = ( - 3 , 7,0,4) span KerT, i.e., KerT = L (/3i,/32)Obviously, /3i and /3 2 are linearly independent. Hence /3i and /3 2 are a basis for KerT and so the dimension of KerT is 2, i.e., defT = 2. This solution is complete. T h e o r e m 4. Let T be a linear transformation on a linear space V with kernel subspace being KerT and a be an element in V. Then all the inverse images of T(a) are a + KerT. Proof. It is clear that an image of any element in a + KerT is Ta. Therefore all the inverse images of Ta contain a + KerT. Suppose /3 —> Ta. Since (3 - a -> Ta - Ta = 0, we have /3 - a E KerT, i.e., /3 6 a + KerT. That is to say, all the inverse images of T(a) are a + KerT. This completes the proof. Thus we know that for a linear transformation all the inverse images of each image have the same number (thickness") of elements. Therefore if there
310
Linear Algebra
is an image whose inverse image is unique, then the inverse image of any image is also unique. Definition 4. A linear transformation under which the image of any el ement has only one inverse image is called a one-one linear transformation. That is to say, if c*i 7^ 0:2 and it necessarily follows that T(ai) ^ T(ot2), the linear transformation T is a one—one linear transformation. Another equiva lent statement is that T is one-one if T ( a i ) = T(ct2) implies that c*i = 0.2A one-one linear transformation on V is also called an automorphism. We shall now develop some more efficient ways of determining whether a linear transformation is one-one or not. An examination of the elements in KerT allows us to decide whether T is a one-one linear transformation. T h e o r e m 5. Let T be a linear transformation on linear space V. Then T is one-one if and only if the kernel of T is the zero subspace. Proof. Let T be one-one and we shall show KerT = o. Let a be any element in KerT. Then T(a) = o. Also, we already know that To = o, thus T(a) = To. Since T is one-one, we conclude that a = o. Hence KerT = o. Conversely, suppose KerT = o and we wish to show that T is one-one. Let T ( a x ) = T(a2) for c*i and a 2 in V. Then T(t*i) - T{a2) = o, so that T(c*i — 0L2) = o. This means that c*i — 0*2 is in KerT and so a.\ — a.2 = o. Hence a j = a 2 and T is one-one. Note that Theorem 5 can also be proved directly using Theorem 4 above. Example 3. Show that a linear transformation T is one-one if and only if T transforms a linearly independent set into a linearly independent set. Proof. If T transforms linearly independent elements into linearly indenpendent elements, then from o ^ a E V we obtain T(a) jt o, i.e., an image of a nonzero element is not a zero element, and so KerT = o. Therefore T is one-one. Suppose T is one-one and at, 0*2,..., am are linearly independent elements in V. Let kiTcx.1 H h kmTam = 0. Then T(ki<xi + ■ • ■ + kmam)
= 0.
Linear Spaces and Linear Transformations
311
Since T is one-one, the inverse image have only the zero element. There fore kioti + ... + kmam = 0. Thus ki = 0 , . . . , km = 0. That is to say, T ( a i ) , T ( a 2 ) , . . . , T ( a m ) are linearly independent. The example is estab lished. We know from Theorem 2 that a linear transformation is one-one if and only if it is a nonsingular transformation. In other words, a linear transformation under which any image has only one inverse image is a linear transformation under which any element always has an inverse image; a linear transformation under which any element always has an inverse image is a linear transformation under which any image always has only one inverse image. We know that the rank of the zero transformation is zero. Its defect is its dimension n. Contrary to the zero transformation on V, the rank of the identity trans formation equals the dimension of V and its defect is the zero. In Example 1, the rank and defect of T are both two. The sum of the rank and the defect equals the dimension of the linear space. Generally, there is the important relationship among the rank and defect of a linear transformation, the dimension of a linear space. This is stated as the following theorem: Theorem 6. Suppose T is a linear transformation on a linear space V. Then dim (KerT) + dim (ImT) = dim V . Proof. Let k = dim KerT and dimV = n. If k = n, then KerT = V, which implies that T(a) = o for every a in V. Hence T(V) = o, dimT(V) = dim (ImT) = 0 and the conclusion holds. Next, suppose 1 ^ k < n and we prove dimT(V) = n — k. Let { a i , a 2 , . ■ ■ ,OLk) be a basis for KerT. By Example 5 in Sec. 7.2, we extend this basis to a basis { a i , a 2 , . . . , ak, ak+i, •••, a „ }
(1)
for V. We next prove that T(ak+1), is a basis for T(V).
T(a f e + 2 ), . . . , % )
(2)
312
Linear Algebra
Firstly we show that (2) spans T(V). Let /3 be any element in T(V). Then (3 = T(a) for some a in V. Since (1) is a basis for V, we can find a unique set of real numbers t\,ti,.. ,,tn such that a = tiot\ + t2oc2 H
h tnan .
Then p = T(a) = T(hai
+ t2a2 + ■■■+ tnan)
= t i T ( a i ) + t 2 r ( a 2 ) + • • • + t f c r(a f c ) + tk+lT(ak+1) = tk+iT{ak+i)
H
+■■• +
tnT(an)
h t„T(a„),
as 0 1 , 0 2 , . . . , afc are in KerT. Hence (2) spans T(V). Now we show that (2) is linearly independent. Suppose tk+iT(ak+i)
+ tk+2T(ak+2)
+ ■■■ + tnT(an)
= o.
Then T(tk+iak+i
+ tk+2ak+2
H
[■ tnan)
and hence tk+ia.k+\ + tk+2ak+2 + ■ ■ ■ + tnan tk+iock+i + tk+2ak+2
H
= o,
is in KerT, and we can write
1- t„an = 6 ^ 1 + 62^2 H
h bkak ,
where the real numbers b\, b2,..., bk are uniquely determined. We then have hai
+ b2a2 -\
h bkak - i f c + 1 a f c + 1
tnan
= o.
Since (1) is linearly independent, we have 61 = b2 = • • • = bk = tfe+i = ■■■ =
tn=0.
Hence (2) is linearly independent and forms a basis for T(V). If k = 0, then KerT has no bases: we let { a i , . . . , a n } be a basis for V. The proof now proceeds as above. It should be noted that although dim (ImT) + dim (KerT) = dim V, there may be ImT + KerT C V as it is not necessary that ImT n KerT = o. If V = ImT + KerT, from Example 10 of Sec. 7.2 we know that the sum is a direct sum, i.e., V = ImT ® KerT. Therefore ImT n KerT = o.
Linear Spaces and Linear
Transformations
313
The following concerns the general case: Theorem 7. Suppose that an n-dimensional linear space V is the di rect sum of subspaces V\ and V2, i.e., V = V L © V2, then there exists a linear transformation T on V such that ImT = Vi, KerT = V2. Proof. Suppose {a\, 012, ■ ■ ■, otn} is a basis for V, where {ap+i,..., an} is a basis for V2 and {Pi,..., Pp} is a basis for Vi and T is a linear transformation satisfying the conditions TQI
=
/3I,
. . . , Tap = PP, Ta-i = 0, i > p.
According to Theorem 1 in Sec. 7.4, T exists and is unique. It is then apparent that ImT = Vi, KerT = V2. This completes the proof. We have seen that a linear transformation is a corresponding rule. For a linear transformation T on V, if the image Ta. is given for any element a in V, then T is given. In other words, T is uniquely determined by the image Ta of any element a under T. Therefore, two transformations T\ and T2 on V is said to be equal if under them the images of any element a in V are the same, i.e., T\a = T2OC for each a G V. This is denoted by 7\ = T2. Hence the linear transformation T is the zero linear transformation if T(ot) = o for all a S V; this is denoted by T = o. We next discuss the operations of linear transformations. First we consider the addition of two linear transformations. Suppose T\ and T2 are two linear transformations on V, then a —> Tia + T2OL is a trans formation on V, and is called the sum of 7\ and T 2 , denoted by Ti -f- Ta, i.e., (Ti + T2)a = T i a + T 2 a . T\ + T2 is also a linear transformation on V as (Ti + T2)(a + p) = Tl(a + P) + T2(oc + P) = TiOt + Tip + T2a + T2P = (Tia + T 2 a) + (Tip + T2p) = (T1+T2)a
+
(T1+T2)p,
(Ti + T2)(ka) = Ti(fca) + T2(ka) = fc((Ti + T 3 ) a ) . If o is the zero linear transformation, then T+o = o + T = T. Therefore the zero linear transformation plays the same role in operations on linear
314
Linear Algebra
transformations as the zero number in number addition and the zero matrix in matrix addition. From the definition, we easily prove that addition of linear transformations satisfies the commutative law and the associative law, i.e., T1+T2 = T2+ Ti, (Ti + T2) +T3 = T1 + (T2 + T3) ■ By an analogous argument, it is easy to show that if T is a linear transfor mation on V, then —T, denned by (-T)a = — (Ta) for every element a in V, is also a linear transformation, and is called the negative linear transformation of T. Obviously, T + (-T) = -T + T = o. We can write T\ + (—la) as T\ — T2 and call it the difference between T\ and T2. Thus the difference between T\ and T2 is also a linear transformation. Next we discuss the multiplication of a linear transformation by a scalar. Suppose a is a scalar and T a linear transformation on V. We define a trans formation aT : (aT)(a) —► aT(a), for every element a in V. We call aT the scalar multiplication of T by a, denoted by aT, i.e., aT(a) =
a(Ta).
aT is also a linear transformation on V because aT(a + /3) = a[T(a + /3)] = a(Ta + T/3) = a(Ta) + a(T/3) = aTa + a.T/3, aT{ka) = a[T(ka)}
=k{aTa).
It is apparent that scalar multiplication of a linear transformation also satisfies the associative law and the distributive law: a{bT) = (ab)T, (a + b)T = aT + bT, a(T1+T2)
=
aT1+aT2.
Thus the set of all linear transformations on an n-dimensional linear space V over K is also a linear space over K, often denoted by Hom(V, V) (here Horn is short for homomorphism). E x a m p l e 4. Suppose V is an n-dimensional linear space over K. Hom(V, V) is an n2-dimensional linear space over K.
Then
Proof. Let {a\, OL2, . ■., a n } be a basis for V. The transformations l y G Hom(V, V) are defined by the following rule
Linear Spaces and Linear
Transformations
315
Tijdi = dj, Tijdk = 0, fe ^ i, i, j = 1 , . . . , n , Obviously, they are linear transformations on V. Next, we prove the n 2 linear transformations TV,- is a basis for Hom(V^ V). We first prove that any transformation T on V is a linear combination of Ty and then that TVj are linearly independent. Let a be any element in V, expressing it as a = aiai H
ha„a„ ,
and let Ta, = ouai H
h a,inan .
As T(Xi = 2 J °ij a J = 22 aij Tij OLi , we arrive at the result n 1 =
^ j aij
■* ij ■
i,j=l
We show below that TV, are linearly independent. Let n
Y2 hiiTii
=0
-
i,j=l
Then
_ b
T
a
bk a ^2 ij ij k=^2 i j = °• Hence bkj = 0, j = 1,2,..., n, and b^ = 0, i, j = 1,2,..., n, and so 7Vj are linearly independent. Thus T^ is a basis for Hom(Vr, V) and Hom(F, V) is an n 2 -dimensional linear space. The proof is complete. We now consider the multiplication of two linear transformations. Sup pose T\ and T2 are two linear transformations on V, then a —> Ti(T2a) is a transformation on V called the product of T\ and T2 and denoted by T1T2, i.e.,
T\T 2 (a) = T\(T 2 a). In other words, T\T2 is a transformation with T2 operating. It is also a linear transformation since
nrs
t operating and then T\
TiT2{a + (3) = T\(T 2 (a + 0)) = Tr(T2a + T2(3)
= r1T2(a) + r1r2(/3), TxT2{ka)
= Tipiika))
= Ti(k(T2a))
=
kTiT2(a).
316
Linear Algebra
I is the identity transformation if TI = IT = T. It plays the same role in the operations of linear transformations as the number 1 in number multiplication and the identity matrix in matrix multiplication.
For example, in the linear space R² let T₁ and T₂ be rotation transformations which rotate every point in the plane xOy counterclockwise through angles θ₁ and θ₂ respectively about the origin of a Cartesian coordinate system. Then T₁T₂ and T₂T₁ are both rotations which rotate every point in the plane xOy counterclockwise through θ₁ + θ₂. Again, in R³ let T₁ be the projection of a vector onto the coordinate plane yOz and T₂ the reflection with respect to the plane zOx; then T₁T₂ and T₂T₁ are two linear transformations which transform the vector (a, b, c) into the vector (0, −b, c).
One should note that in general the multiplication of linear transformations does not satisfy the commutative law, just as in matrix multiplication. For example, in R² let T₁ be the rotation which rotates counterclockwise through π/2 about the origin and T₂ the projection onto the abscissa axis. Taking a rectangular coordinate system, if i and j are unit vectors on the abscissa axis and the ordinate axis respectively, then clearly we have
(T₁T₂)i = T₁(T₂i) = T₁i = j,   (T₁T₂)j = T₁(T₂j) = T₁o = o,
(T₂T₁)i = T₂(T₁i) = T₂j = o,   (T₂T₁)j = T₂(T₁j) = T₂(−i) = −i,
and hence T₁T₂ ≠ T₂T₁.
The multiplication of linear transformations also satisfies the associative and distributive laws, as well as a mixed rule with scalar multiplication:
a(T₁T₂) = (aT₁)T₂,   (T₁T₂)T₃ = T₁(T₂T₃),
T₁(T₂ + T₃) = T₁T₂ + T₁T₃,   (T₁ + T₂)T₃ = T₁T₃ + T₂T₃.
Example 5. Suppose that a linear space V is the direct sum of its subspaces V₁ and V₂, i.e., V = V₁ ⊕ V₂, α = α₁ + α₂, α ∈ V, α_i ∈ V_i. Prove that the transformations defined by
T₁α = α₁,   T₂α = α₂
are linear transformations and that T₁² = T₁, T₂² = T₂, T₁T₂ = T₂T₁ = o.
Proof. From the definition it is easy to check that T₁ and T₂ are two linear transformations on V. Since
T₁²α = T₁(T₁α) = T₁α₁ = α₁ = T₁α,   (T₁T₂)α = T₁(T₂α) = T₁(α₂) = o,
we have T₁² = T₁, T₁T₂ = o, and likewise T₂² = T₂, T₂T₁ = o. Thus the statement holds.
Generally, if T is a linear transformation on V and T² = T, then T is called an idempotent transformation, or a projection.
Example 6. Assume that T is a linear transformation on an n-dimensional linear space V. Then there exists some positive integer k such that
Im T^k = Im T^m,   Ker T^k = Ker T^m,   m ≥ k.
Proof. Obviously T(V) ⊆ V. If T(V) = V, then T(V) = T²(V) = ⋯, i.e., Im T = Im T² = ⋯, and the statement holds. If T(V) is a proper subspace of V, then T²(V) ⊆ T(V). When T²(V) = T(V), we have T(V) = T²(V) = T³(V) = ⋯ and the statement also holds. When T²(V) is properly contained in T(V), we continue to argue in this way. Since the dimension of V is n, while the dimensions of T(V), T²(V), T³(V), ... strictly decrease as long as the inclusions are proper, there exists some positive integer k such that T^k(V) = T^{k+1}(V), and hence T^k(V) = T^m(V) for all m ≥ k. Hence the first expression holds. Again, since dim(T^k(V)) = dim(T^m(V)), from Theorem 6 we arrive at dim(Ker T^k) = dim(Ker T^m). But Ker T^k ⊆ Ker T^m, and so Ker T^k = Ker T^m. Thus the second expression is established. The proof is complete.
Example 7. Suppose T₁ and T₂ are two linear transformations on V. Show that
rank of (T₁T₂) ≤ min(rank of T₁, rank of T₂).
Proof.
rank of (T₁T₂) = dim{(T₁T₂)V} = dim{T₁(T₂V)} ≤ dim(T₁V) = rank of T₁,
rank of (T₁T₂) = dim{T₁(T₂V)} ≤ dim(T₂V) = rank of T₂.
The proof is complete. The result of the above example coincides with the theorem in Sec. 3.1.
Finally we present the invertible linear transformation, the analogue of the inverse matrix. Suppose that T is a linear transformation on V. If there exists a linear transformation S such that
ST(α) = S(T(α)) = α
for every element α ∈ V, then ST = I, where I is the identity transformation. That is to say, if T transforms α onto T(α) and S transforms T(α) back onto α, then S is called the inverse transformation of T. When T is a nonsingular linear transformation, the inverse image of T(α) is unique, and it is easy to see that the transformation S carrying T(α) back onto α is a linear transformation on V; therefore T is an invertible linear transformation. Conversely, suppose T is an invertible transformation. If T were not one-one, we could take Tx₁ = Tx₂ with x₁ ≠ x₂, so that T(x₁ − x₂) = o; premultiplication by S would then give x₁ = x₂, a contradiction. Therefore T is one-one. Thus T has an inverse transformation if and only if T is nonsingular, or T is one-one, i.e., Im T = V or Ker T = 0. If T is invertible, its inverse linear transformation is clearly unique and is denoted by T⁻¹.
For example, the inverse linear transformation of the identity linear transformation is itself. The zero linear transformation is not invertible. Again, the inverse of the rotation which rotates counterclockwise through θ about the origin in R² is the rotation which rotates clockwise through θ about the origin. In R³, if T is the projection of vectors onto the coordinate plane xOy, then a given vector has either no inverse image or more than one inverse image, so there is no way of transforming T(α) back onto its inverse image; therefore T is not invertible.
Example 8. Find the inverse linear transformation of the linear transformation T on R², defined by
T(x, y) = (y, 2x − y).
Solution: T is nonsingular, because if T(x, y) = (0, 0), we would have
y = 0,   2x − y = 0,
whose only solution is x = y = 0. This computation gives Ker T = {(0, 0)}. Therefore T is invertible.
Let T(x, y) = (s, t) and T⁻¹(s, t) = (x, y). Since T(x, y) = (y, 2x − y) = (s, t), we have
y = s,   2x − y = t,
with solution
y = s,   x = (1/2)s + (1/2)t.
Hence we arrive at the explicit formula for T⁻¹:
T⁻¹(s, t) = ((1/2)s + (1/2)t, s).
The solution is complete.
If T is invertible, obviously T⁻¹ is also nonsingular, and therefore T⁻¹ is also invertible. It is easy to see that the inverse linear transformation of T⁻¹ is simply T, (T⁻¹)⁻¹ = T, and hence we have
T⁻¹T = TT⁻¹ = I.
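The computation in Example 8 can also be checked numerically once T is written as a matrix (matrix representations are treated systematically in the next section). The following sketch is not part of the original text and assumes the NumPy library:

```python
import numpy as np

# Matrix of T(x, y) = (y, 2x - y) with respect to the standard basis:
# the columns are the coordinates of T(1, 0) = (0, 2) and T(0, 1) = (1, -1).
A = np.array([[0.0,  1.0],
              [2.0, -1.0]])

A_inv = np.linalg.inv(A)     # matrix of the inverse transformation T^{-1}
print(A_inv)                 # [[0.5 0.5]
                             #  [1.  0. ]]  i.e. T^{-1}(s, t) = (s/2 + t/2, s)

# Check T^{-1}T = TT^{-1} = I
assert np.allclose(A_inv @ A, np.eye(2))
assert np.allclose(A @ A_inv, np.eye(2))
```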
It should be noted that an inverse linear transformation is different from a negative linear transformation. A linear transformation need not be invertible, but it always has a negative transformation. A linear transformation which has an inverse is called an invertible linear transformation.
The above-mentioned rules satisfied by linear transformations are similar to those satisfied by matrices, as stated in Chapter 3. The interrelation between linear transformations and matrices will be discussed in detail in the next section.
Exercises
1. Which of the following transformations are linear transformations?
(1) T(a₁, a₂, a₃) = (a₁², a₁ + a₂, a₃) in R³.
(2) Tα = ᾱ in the complex linear space composed of all complex numbers.
(3) T(X) = BXC, where B and C are fixed matrices, in the linear space consisting of all matrices of order n.
(4) T(f(x)) = f′(x) in a space composed of polynomials with real coefficients; T is called a differentiation transformation.
(5) T(f(x)) = ∫ₐˣ f(t) dt in a real linear space composed of all continuous functions on [a, b]; T is known as an integration transformation.
2. Let T₁ and T₂ be two transformations on R² defined by T₁(x₁, x₂) = (x₂, −x₁), T₂(x₁, x₂) = (x₁, −x₂). Prove that T₁ and T₂ are two linear transformations on R² and find T₁ + T₂, T₁T₂, T₂T₁.
3. Let {α₁, α₂} be a basis for a two-dimensional linear space V, T and S be two linear transformations on V, and Tα₁ = β₁, Tα₂ = β₂. If S(α₁ + α₂) = β₁ + β₂ and S(α₁ − α₂) = β₁ − β₂, prove that T = S.
4. Given that T is a rotation which rotates counterclockwise through θ about the origin, what are the rotations represented by −T and T⁻¹? Do these two rotations coincide with each other?
5. Can a linear transformation transform the zero element into the zero element? A nonzero element into the zero element? Can all inverse images of a nonzero element form a subspace?
6. Let T be a linear transformation on V and U a subspace of V. The subspace U is called an invariant subspace under T, or T-invariant, if T(U) is contained in U, i.e., T(x) ∈ U for all x ∈ U. In other words, if U is T-invariant, then every element x ∈ U is transformed under T into an element of the same subspace U. Prove that the following subspaces are invariant under any linear transformation T:
(1) 0,  (2) V,  (3) Ker T,  (4) Im T.
7. Let T be a linear transformation on V and U an invariant subspace under T. Show that if T is invertible, then U is also invariant under T⁻¹.
8. Let T be a linear transformation on R⁴ defined by
T(x₁, x₂, x₃, x₄) = (x₁, x₁ + x₂, x₁ + x₂ + x₃, x₁ + x₂ + x₃ + x₄).
Prove that T is an automorphism on R⁴.
9. Let T and S be two linear transformations on R³, defined by T(x, y, z) = (x + y + z, 0, 0), S(x, y, z) = (y, z, x). Prove that Im(T + S) = R³.
10. Let T be a linear transformation on R³, defined by
T(x, y, z) = (0, x, y).
Find Im T² and Ker T².
11. Suppose V is the linear space formed by all matrices of order 2 and T a linear transformation on V defined by T(A) = AM − MA, A ∈ V, where M = (…). Find a basis for Ker T.
12. Let T be a linear transformation on V. Prove that Ker Tⁱ ⊆ Ker Tⁱ⁺¹.
13. Let T be a linear transformation on V with T² = T, Im T = U, and Ker T = W. Prove the following:
(1) If u ∈ U, then T(u) = u, i.e., T acts as the identity linear transformation on U.
(2) If T ≠ I, where I is the identity linear transformation, then T is a singular linear transformation.
(3) V = U ⊕ W.
14. Let T be a linear transformation on an n-dimensional linear space, whose rank is r and whose defect is s. Prove that rs ≤ n²/4.
*15. Let T be a linear transformation on a linear space V. Show that there exists some positive integer k such that
TᵐV ∩ Ker Tᵐ = 0,   m ≥ k.
*16. Let T be a linear transformation on a linear space V. Prove that the following four conditions are equivalent:
(1) V = Im T + Ker T,  (2) V = Im T ⊕ Ker T,  (3) Im T² = Im T,  (4) Ker T² = Ker T.
7.4. Matrix Representation of Linear Transformations
We have seen that elements in a linear space can be expressed by coordinates. In this section we shall establish matrix representations for linear transformations using coordinates, so that linear transformations are linked to numbers. The purpose of doing this is to link abstract linear transformations with concrete matrices.
Let T be a linear transformation on a linear space V over K, and α any element in V. If {α₁, α₂, ..., αₙ} is a basis for V, then from α = a₁α₁ + ⋯ + aₙαₙ we get
Tα = T(a₁α₁ + ⋯ + aₙαₙ) = a₁(Tα₁) + ⋯ + aₙ(Tαₙ).
That is to say, the image Tα of any element α in V is uniquely determined by the images Tα₁, Tα₂, ..., Tαₙ of a basis, i.e., T is uniquely determined by Tα₁, Tα₂, ..., Tαₙ. Suppose
Tα₁ = a₁₁α₁ + ⋯ + aₙ₁αₙ,
⋯⋯
Tαₙ = a₁ₙα₁ + ⋯ + aₙₙαₙ,      (1)
or
Tα_i = Σ_{j=1}^n a_{ji}α_j,   i = 1, ..., n.
Formally using the operational rules of matrices, expression (1) can be simplified to a matrix form:
T(α₁, ..., αₙ) = (Tα₁, ..., Tαₙ) = (Σ_j a_{j1}α_j, ..., Σ_j a_{jn}α_j) = (α₁, ..., αₙ)A,      (2)
where
A = ( a₁₁ ⋯ a₁ₙ
      ⋮        ⋮
      aₙ₁ ⋯ aₙₙ ),
whose i-th column vector consists of the coordinates of Tα_i. Thus for a given linear transformation T, we get a matrix determined by (1) or (2). Conversely, if we have a matrix A of order n, then we can obtain n elements Tα₁, Tα₂, ..., Tαₙ from (1) or (2). The following theorem states that there is precisely one linear transformation determined by the images of the n elements α₁, α₂, ..., αₙ. Therefore, given any matrix A we obtain a unique linear transformation. Consequently, a linear transformation can be represented by a matrix.
Theorem 1. Let {α₁, α₂, ..., αₙ} be a basis for a linear space V and let β₁, β₂, ..., βₙ be any elements in V. Then there is precisely one linear transformation transforming α_i into β_i, i = 1, 2, ..., n.
Proof. As {α₁, α₂, ..., αₙ} is a basis, for any element α in V there are n unique scalars a₁, a₂, ..., aₙ such that α = a₁α₁ + a₂α₂ + ⋯ + aₙαₙ.
A linear transformation is uniquely determined by the images of a basis, and so the required linear transformation, determined by Tα_i = β_i, i = 1, 2, ..., n, ought to be given by
T(Σ_{i=1}^n a_iα_i) = Σ_{i=1}^n a_iβ_i.
It suffices to show that T so defined is really a linear transformation; then T is simply the unique linear transformation required by Tα_i = β_i, and the theorem is established.
As a matter of fact, let β = Σ_{i=1}^n b_iα_i be an element in V and let k be any scalar. Then
T(α + β) = T(Σ_{i=1}^n (a_i + b_i)α_i) = Σ_{i=1}^n (a_i + b_i)β_i = Σ_{i=1}^n a_iβ_i + Σ_{i=1}^n b_iβ_i = Tα + Tβ,
T(kα) = T(Σ_{i=1}^n (ka_i)α_i) = Σ_{i=1}^n (ka_i)β_i = k Σ_{i=1}^n a_iβ_i = k(Tα),
and so T is a linear transformation.
As for the uniqueness of T, we give the following proof. If S is a linear transformation on V with Sα_i = β_i, i = 1, 2, ..., n, then for the element α = Σ_{i=1}^n a_iα_i we have
Sα = S(Σ_{i=1}^n a_iα_i) = Σ_{i=1}^n a_iS(α_i) = Σ_{i=1}^n a_iβ_i,
so that S is exactly the rule T defined above. This shows that the linear transformation T with Tα_i = β_i is unique. Therefore the theorem is proved.
Example 1. The elements α₁ = (1, 2) and α₂ = (3, 4) are linearly independent and form a basis for R². Given β₁ = (3, 2) and β₂ = (6, 5) in R², find a linear transformation T such that Tα_i = β_i, i = 1, 2.
Solution: According to the above theorem, there is a unique linear transformation T on R² such that Tα₁ = (3, 2) and Tα₂ = (6, 5). Taking any element α in R² and supposing α = a₁α₁ + a₂α₂, we have
Tα = a₁Tα₁ + a₂Tα₂ = a₁(3, 2) + a₂(6, 5) = (3a₁ + 6a₂, 2a₁ + 5a₂).
For example, if α = (1, 0), then α = (−2)α₁ + α₂, and thus a₁ = −2 and a₂ = 1. Hence
T(1, 0) = (−2)Tα₁ + 1·Tα₂ = −2(3, 2) + (6, 5) = (0, 1).
The matrix A determined by (1) or (2) is called the matrix of T with respect to the basis {α₁, α₂, ..., αₙ}. For instance, the matrix of the zero linear transformation with respect to any basis is the zero matrix; the matrix of the identity linear transformation with respect to any basis is the identity matrix; the matrix of a scalar multiplication transformation with respect to any basis is the corresponding scalar matrix. If T transforms a basis for V into a basis for V, then the matrix A is a nonsingular matrix. If V is a one-dimensional linear space over K, then T can be expressed by an element in K; that is to say, a linear transformation of a one-dimensional linear space over K can be represented by an element of K.
From (1), we know that if the rank of A is r, then a largest linearly independent subset of Tα₁, Tα₂, ..., Tαₙ consists of r vectors. Therefore the dimension of the subspace spanned by Tα₁, Tα₂, ..., Tαₙ is r, i.e., the dimension of T(V) is r, and so the rank of T is r. Conversely, if the rank of T is r, clearly the rank of A is also r. That is to say, the rank of A is the same as the rank of T, and so A can really represent T.
Using the above, we can simplify the proof of Theorem 6 in Sec. 7.3 as follows. As Ker T is the subspace formed by all vectors satisfying Tα = o, letting
α = x₁α₁ + ⋯ + xₙαₙ = (α₁, ..., αₙ)X,   X = (x₁, ..., xₙ)′,
we obtain
Tα = T(α₁, ..., αₙ)X = (α₁, ..., αₙ)AX = o,
and so AX = 0. That is to say, the problem of finding Ker T reduces to the problem of finding the solution space of AX = 0. Thus the defect of T is s = n − r, i.e., r + s = n, which is the content of Theorem 6.
Example 2. Let T be a rotation which rotates counterclockwise through θ about the origin and i and j be two unit vectors on the abscissa and ordinate
axes respectively in rectangular coordinates. Find the matrix of T with respect to the basis {i, j}.
Solution: From analytic geometry we know
Ti = i cos θ + j sin θ,   Tj = i(−sin θ) + j cos θ.
Therefore the matrix of T with respect to the basis {i, j} is
( cos θ   −sin θ
  sin θ    cos θ ).
Example 3. Let T be the projection linear transformation defined by
T(a, b, c) = (a, b, 0),
and let i, j, k be unit vectors along the rectangular axes. Find the matrix of T with respect to the basis {i, j, k}.
Solution: It is clear that
Ti = i,   Tj = j,   Tk = o.
Therefore the matrix of T with respect to the basis {i, j, k} is
( 1 0 0
  0 1 0
  0 0 0 ).
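As a quick numerical illustration of formula (2) — the columns of A are the coordinate vectors of Tα₁, ..., Tαₙ — the following sketch builds the matrix of the projection in Example 3 and of the transformation in Example 1. It is not part of the original text; the helper matrix_of and the use of NumPy are assumptions made for the illustration.

```python
import numpy as np

def matrix_of(T, basis):
    """Columns are the coordinates of T(alpha_i); here the basis is the standard one."""
    return np.column_stack([T(v) for v in basis])

# Example 3: projection onto the x-y coordinate plane.
proj = lambda v: np.array([v[0], v[1], 0.0])
print(matrix_of(proj, np.eye(3)))        # diag(1, 1, 0)

# Example 1: T(alpha_1) = (3, 2), T(alpha_2) = (6, 5) for alpha_1 = (1, 2), alpha_2 = (3, 4).
# With respect to the standard basis, T has matrix B @ inv(P), where the columns of P are
# alpha_1, alpha_2 and the columns of B are their images.
P = np.array([[1.0, 3.0], [2.0, 4.0]])
B = np.array([[3.0, 6.0], [2.0, 5.0]])
A = B @ np.linalg.inv(P)
print(A @ np.array([1.0, 0.0]))          # (0, 1), as computed in Example 1
```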
Example 4. Suppose the linear transformation T transforms one basis {α = (1, 0, 1), β = (0, 1, 0), ν = (0, 0, 1)} into another basis {(1, 0, 2), (−1, 2, −1), (1, 0, 0)}. Find the matrices of T with respect to the two bases {α, β, ν} and {α′, β′, ν′}, the latter being given below: α′ = (1, 0, 0), β′ = (0, 1, 0), ν′ = (0, 0, 1).
Solution: As
Tα = (1, 0, 2) = α + ν,
Tβ = (−1, 2, −1) = −α + 2β,
Tν = (1, 0, 0) = α − ν,
the matrix of T with respect to the basis {α, β, ν} is
A = ( 1 −1  1
      0  2  0
      1  0 −1 ).
Again, as
α′ = (1, 0, 0) = α − ν,   β′ = β,   ν′ = ν,
we have
Tα′ = Tα − Tν = (0, 0, 2) = 2ν′,
Tβ′ = Tβ = (−1, 2, −1) = −α′ + 2β′ − ν′,
Tν′ = Tν = (1, 0, 0) = α′.
Hence the matrix of T with respect to the second basis {α′, β′, ν′} is
B = ( 0 −1  1
      0  2  0
      2 −1  0 ).
From this example we again see that the matrices of the same linear transformation with respect to different bases are different. Consequently the matrix of a linear transformation is closely related to the basis employed.
Example 5. Suppose T is an idempotent linear transformation on a linear space V, i.e., T² = T. Prove that an appropriate basis can be chosen such that the matrix of T with respect to the basis is
( E_r  0
   0   0 ),      (3)
where r is the rank of T.
i = 1, • • • , r .
Linear Spaces and Linear
327
Transformations
It is clear that T(atj) = 0, j > r. Again c t i , . . . ,ar, otr+i, ■■■,o>-n are linearlyindependent. The reason is that from kiot\ H
1- kTaT + kr+iar+\
+ ■ ■ ■ + knan
= o,
we have k1T(a1)
+ --- + krT(ar)
= o,
i.e. kioci + • • • + krar = o ,
and so k\ — ki = ■ ■ ■ = kT = 0. Therefore fcr+i = • • ■ = kn = 0 . Thus the matrix of T with respect to a basis { a i , . . . , a r , a r + i , . . . , an} is simply (3). And so the statement holds. Example 6. Suppose T is a linear transformation on an n-dimensional linear-space V and A is the matrix of T with respect to a basis. If A is positive definite and orthogonal, then T is the identity linear transformation. Proof. Since A is positive definite, there is an orthogonal matrix Q such that Q~lAQ = diag(Ai,...,An), A , > 0 , thus Q~XA'Q = diag(Ai,..., A n ), and hence Q~lAA'Q
= Q~lAQ ■ Q^A'Q
= diag (A?,..., A£),
i.e., Q~XQ = diag (A?,...,A£), and so Xf = 1. Therefore A* = 1,» = 1, 2, • • • ,n. Thus Q~XAQ = E, or A = QQ~l = E. That is to say, T is the identity linear transformation. This completes the proof. It should be noted that since ImT n KerT ^ 0 generally, a basis for ImT plus a basis for KerT need not be a basis for V. For example, T2 = o. When T ^ o, T(TV) = T2V = 0, ImT C KerT. Obviously a basis for V cannot be chosen from ImT and KerT. The above example is merely the special case. We introduce above a method for finding the matrices of linear transfor mations. Next, we present methods for finding the matrices of the sum, scalar multiplication, product, and inverse of linear transformations. Suppose the matrices of linear transformations S and T with respect to a basis {ai,Q2, • • • , a „ } are A = (a,ij) and B — (6y) respectively. We shall consider the matrices of linear transformations T + S, aT, TS with respect to the basis { a i , ct2, • • •, a n } . As
328
Linear Algebra
Ton = s^ajiajjSai
= 22 bjiOtj ,
3=1
i=i
we have n n
.,n. (T + S)oLi S)ai = TfXi Ta.i + Soti Son = ^{aji 22(aJi + bji)atj, bji)aij, i — 1,..l,...,n. i=i
Thus the matrix of (T + S) with respect to the basis {a\, c*2,..., ocn} is A + B = (dij + bij). In other words, the matrix of the sum of two linear transformations with respect to the same basis is equal to the sum of the matrices of the two linear transformations. Furthermore as n n
a a a(Toti) = a 2. ^.o-jiOLji aT(a.i) = a(Tati) ji j, i == 1, • ■. ,■n ,,ti,
jj=i =i
the matrix of aT with respect to the basis { a i , OL2, ■■ ■ ,ocn} is a A = (ao,j). Hence the matrix of the multiplication of a linear transformation by a scalar a is equal to the multiplication of the matrix of this linear transformation by the same scalar a. We can also say that the sum and scalar multiplication of matrices in Chapter 3 are defined according to these needs. Thus the matrix of the negative linear transformation (—T) of a linear transformation T is the negative matrix —A of the matrix A of T. Again as T 5 ( a O =-T(Sa •4) = nn
nn
= £*; j=i
tt=i =i
(
n
n
\
*r(«i)
T\Y,t>ji*i i=i
nn
/ II n
j=i \
2 ^atjbjj tt=i = i \j=i \J-1
"t,
Jj
, a n} the matrix ix of TS with respect to the basis {ai,012,■■ { a i , a 2 , . . .■,oi n} is
4 5 == I(Yl y >a«itb6tjy )I ■ AB J
Linear Spaces and Linear
329
Transformations
That is to say, the matrix of the product of two linear transformations is the product of two matrices of the two linear transformations. The multiplication rule of two matrices in Chapter 3 is formulated in accordance with the result. Thus problems of linear transformations are essentially problems on matrices. Example 7. Suppose T\ and T2 are two rotation linear transformations which rotate vectors counterclockwise through angles #i and 02 about the origin respectively, find the matrices of TiT 2 and T2T\. Solution: Prom Example 2, we know that matrices of Ti and T2 with respect to a basis {i, j} are respectively 0\ . _ (/ COS cos #i Ai1 = I sin 0\ 6\
s i n 02 COS 02 A - /(cos ^2 -—sin 82 N \ \ sin sin 62 02 cos COS 82 02 J)
—sin —sin ##ii \\ COS cos 0\ #1 I) ,'
^ 22 =
Then the matrices of TiT 2 and T2Ti are / c o 1s+8 ( 02i)+ 0 2 ) A A _ A A _fcos{8 A AiA XA22 -= AA 2A 1 - ^ g i n ( 0 j + H) 2Ai ^ sin (0i + 02)
- ss iinn ((0i 0 i ++ 002)) \\ fc) j . cos (0i + 0 2 ) ) "
cQs ( f t +
m i J_I_ t * _ t __1 Thus TiT2 and_i m T2Ti both are a rotation which rotates vectors counterclockwise through an angle 8\ + 82 about the origin. _
-
J_-J__*
!_•
1_ .
E x a m p l e 8. Suppose T\ in R2 is a rotation which rotates vectors counter clockwise through an angle 7r/2 about the origin. T2 is the projection on the abscissa axis. Find the matrices of TiT 2 and T2T1. Solution: Let i and j be unit vectors on the abscissa and ordinate axes respectively. It is apparent that Tii=j,
T!J = -i,
T2i = i,
T2j = o.
(4)
Therefore the matrices of T\ and T2 are
* = ( ; "j), A2=(I
j)
respectively and the matrix of TiT 2 with respect to the basis {i,j}
A1A2 = (°1 ° ) ,
is
330
Linear Algebra
and the matrix of T2T1 is
The solution is complete. It is also easy to see from (4) that T\Ti first transforms i into i, and then i into j , while T2T1 first transforms i into j , and then j into o. It is clear that The results in the above example coincide with those cited in Sec. 7.3. Next we introduce the matrix of T~l. Let T be a linear transformation on V, A be matrix of T with respect to some basis for V. If T has an inverse linear tranformation T _ 1 , we can suppose that the matrix of T _ 1 with respect to the same basis is B. From T~lT = T T~l = I, we obtain BA = AB = E, Therefore B = A-1. That is to say, if T has an inverse linear transformation T~l, then the matrix of T _ 1 is A-1, the inverse matrix of A. For example, let T be a rotation in B? which rotates vectors counterclock wise through an angle 6 about the origin. From Example 2, the matrix of T with respect to a basis {i,j} is A, and so the matrix of the inverse transfor mation T - 1 of T with respect to a basis {i,j} is A-1 _ ( cos 6 ~\-sin8
sin 8 \ __ / c o s (—6) —sin (—8) \ cos 8 J ~ \sin (-8) cos (-0) J "
Thus T _ 1 is a rotation which rotates vectors counterclockwise through an angle —8 about the origin. We know from Sec. 7.2 that the coordinates of a vector depends upon a basis for V. When the basis changes, the coordinates of the vector may also change. Theorem 3 in Sec. 7.2 states the relationship between them. The matrices of the linear transformation may also change as they are closely related to the bases for V. If the bases for V are different, the matrices of the same linear transformation are also defferent. The following theorem describes the relationship between them. Theorem 2. If the matrices of a linear transformation T with respect to two bases { a i , a j , . . . , a n } and {j3i,/32i • • ■ ,/3n} are A and B respectively and (01,..., fin) = ( a i , . . . , a „ ) P ,
Linear Spaces and Linear
Transformations
331
then B =
P~1AP.
Proof. According to the hypothesis, we have T(c*i,..., a „ ) = ( 0 1 , . . . ,
an)A,
T(/3i,-.-,/3n)-(/3i,..-,/3n)S. It is clear that T{PU ...,fin)
= (fli,...t
Pn)B = ( « ! , . . . , On)PB,
T(3U...,/3„)
= T ( ( a i , . . . , a„)P) = (tti,...,
an)AP,
thus (ai,..., a„)PP = (ai,..., an)AP, and hence (ai,...,an)(PS-AP)=0. As Q i , a 2 , . . . , a n are linearly independent, P S — AP = 0, or PB = A P Then as /3i,/32,■ •• ,/3n are linearly independent, the matrix P is nonsingular. Therefore B = P~1AP. The theorem is proved completely. The above shows that two matrices of the same linear transformation with respect to two different bases are similar, or in other words, similar matrices represent the same linear transformation. For example, in Examples 4 it is easy to see that
P=
I
00 \V-i -i
0 0\ 1 00 I , P _ 1 o0 :1
°\
0 1 0 0 1/
I 0
V
By direct computation we obtain /I
P~lAP =
0
Vi /I 0
V
°\
0 /I 1 0 0 0 1 / Vi
-1 2 0
°\ /°
-1 2 0
0 1 0 0 1/
0
V2
1\ / 1 0
°
"1/ 1\
°
-1/
0 0\ 10
V-i o i ; =
/0 -1 0 2 \2 -1
1\ 0 O/
From (1), or (2), we know that linear transformations whose matrices have nothing to do with bases for a linear space are precisely of three types, namely,
332
Linear Algebra
the zero linear transformation, the identity transformation, and the multipli cation of a linear transformation by a scalar. According to the previous two chapters, we know any matrix is similar to a diagonal matrix or a Jordan canonical matrix. Therefore, for a linear transformation on V, we can choose an appropriate basis for V such that the matrix representing it takes as simple a form as possible, such as a diagonal matrix or a Jordan canonical matrix. Such matrices can directly reflect the properties of the linear transformations. This is one reason why we discuss the canonical form of matrices under similar conditions in Chapter 5. However the eigenvalues of matrices are in general complex numbers so that the above requirement may not necessarily be satisfied in the realm of real numbers. In Chapter 5 we considered the eigenvalues and eigenvectors of matrices. What relationship is there between them and the linear transformations rep resented by matrices? We next discuss these problems. Let T be a linear transformation on V, A be the matrix of T with respect to a basis { a i , Q 2 , . . . , a n } , i.e., T(ai, a 2 , . . . , a „ ) = (e*i, a 2 , . • ■, an)A. If a = a i a i + a 2 a 2 + ■ ■ ■+ anan, namely, (oi, a 2 , ■ ■ ■, cn) is an eigenvector of A associated with eigenvalue A, then as
fai\
i
MW /
|-A|
f ai \
, a = ( a i , . . •,««) | \an/ \an/
we have Ta = T j ( a i , . . . , a n ) : ) I = ( a l r .. \ \ an / ) = A(ai,...,a„) I j
,an)A \an
= Aa.
\anJ Thus if a is an eigenvector of A associated with eigenvalue A, then Ta = Aa, i.e., T transforms a into Aa. Conversely, if Ta = Aa, or
f
i
r|
i
("!,-•
/ai\ \
-."«) 1
\an/
/Oi\
= A(a 1; . •,On) j \an/
Linear Spaces and Linear
then
333
Transformations
"1f'\ - » l r'\ i
(ai,...,an)
: \an/
;
0.
\an/
As a i , . . . , a n are linearly independent, we have ai
<Xl
Hence (ai, a 2 , . . . , o n ) is an eigenvector of A associated with eigenvalue A. The number A satisfying Ta = Xa is called an eigenvalue of T; the vector a ( ^ 0) is called an eigenvector of T associated with A. Thus we see that A is an eigenvalue of T if and only if it is an eigenvalue of A. a is an eigenvector of T associated with A if and only if it is an eigenvalue of A associated with A. That is to say, eigenvalues and eigenvectors of T coincide with eigenvalues and eigenvectors of A respectively. Therefore we refer to the characteristic poly nomial of A as the characteristic polynomial of T. Any linear transformation always has its eigenvalues and eigenvectors. For example, from Example 2 of Sec. 5.1 we know that the eigenvectors of a scalar matrix are any nonzero vectors. Therefore any nonzero vector is an eigenvector of a scalar multiplication transformation. Again in Example 3 of Sec. 5.1, the vector a = (1, 2, —1) is an eigenvector of the matrix A associated with eigenvalue A = 1. Let T be a linear transfor mation whose matrix with respect to a basis {01,02,03} for R3 is A. Then a = ati + 2 a 2 — 0 3 . As Toti = - Q i - 4 a 2 + a 3 , Ta2 = an + 3 a 2 , Ta3 = 2 a 3 , have Ta = Tax + 2Ta2 - Ta3 = ax + 2 a 2 - a 3
a.
Thus a is an eigenvector of T associated with the eigenvalue 1. Again let T be a linear transformation on R2 denned by T(x,y) = (x + 2y,3x + 2y). If Ta = Xa, a = (x, y), then T(x,y) =
(Xx,Xy),
334
Linear Algebra
and hence x + 2y = Xx, 3x + 2y = Xy, or f (A - l)x - 2y = 0, \ -3a; + (A - 2)y = 0. Hence (XE - A)a = Q,A =
1 2 3 2
where clearly A is the matrix of T with respect to the basis {(1,0), (0,1)}. Therefore a is also an eignvector of A associated with A. Thus from the above we see that the more eigenvectors of T contained in a basis for V, the more zero elements are contained in a matrix of T and, consequently, the simpler the form of the matrix. If the vectors in a basis are all eigenvectors of T, then the matrix of T has the simplest form of a diagonal matrix. From this we can also see that Theorem 1 in Sec. 5.2 is true. In a word, linear transformations are in agreement with the matrices rep resenting them; they differ only in the form of expression. As a linear trans formation does not place restriction on a basis, it is more convenient to use. Exercises 1. Find the matrices of the following linear transformations with respect to the assigned bases: (1) T is a linear transformation on R3 projecting vectors on the coordinate plane x o y with respect to the basis {i,j,k} on rectangular coordinate axes. (2) T is the above linear transformation with respect to the basis {i, j,i+ 3+k}. (3) T is a linear transformation on R3 defined by r(ai,02,03) = (2ai - 0,2,0.2 + 03,01) with respect to the basis {i,j, k}. (4) T is a differentiation transformation on a 6-dimensional real linear space which consists of linear combinations of the following six vectors with real coefficients:
Linear Spaces and Linear
335
Transformations
o
a 2 = eoa;sin bx,
03 = xe ax cos bx,
an = xe a i sin bx,
c*5 = |a; 2 e ax cos bx,
ate = -x 2 e o x sin bx,
with respect to the basis formed by the six vectors. (5) T is a linear transformation on R3 defined by ( T a x = ( - 5 , 0,3), { T a 2 = ( 0, - 1 , 6), where [ T a 3 = ( - 5 -1,9),
(oti=(-l, \ a 2 = ( 0, [ a 3 = (3,
0,2), 1, 1), - 1 , 0)
with respect to the basis {0:1,0:2,0:3} for R3. (6) T is as described above, with respect to the basis {i, j , k}, where i = (1,0,0), j = (0,1,0), k = (0,0,1). 2. Let i and j be unit vectors on the abscissa axis and ordinate axis respec tively in a rectangular coordinate system, T\ a reflection with respect to the abscissa axis and T2 a linear transformation which rotates vectors in R2 counterclockwise through an angle 45° about the origin (0,0). Find the matrix of T2 with respect to the basis {i, j}. 3. Let (^
"l\ be the matrix of a linear transformation Ti on R2 with
respect to the basis {on = ( l , 2 ) , a 2 = (2,3)} and I
J the matrix
of a linear transformation T2 on R2 with respect to the basis {/3i = (3,1), /3 2 = (4,2)}. Find the matrix of the linear transformation Ti + T 2 with respect to the basis {/3i,/32} and matrix of the linear transformation TiT 2 with respect to the basis { a i , a 2 } . 4. Suppose T is a linear transformation on an n-dimensional linear space V and the matrix of T with respect to the basis {a\, a 2 , . . . , a n } is
r 1
Find the matrix of T with respect to the basis {/3i,/3 2 ,... ,0n}, /3i = a 2 , f32 = a 3 , . . . , /3„_i = a „ , /3 n = a i .
where
336
Linear Algebra
5. Let T be a differentiation transformation on the real linear space of polynomials in A whose degrees do not exceed n. Find the matrices of T,T2,... ,Tn with respect to a basis {eo,e%,e%,...,e„}, where eo = 1, e i = A, e2
An 2!
:
n! 6. If the matrices of a linear transformation on a linear space are the same matrix with respect to any basis, show that T is a scalar multiplication transformation. 7. Let T be a linear transformation on a linear space V. If T ^ o, Tk = o, then T is called a nilpotent transformation with power exponent k. Suppose a e V,T f e _ 1 (a) / 0. Show that a,Ta,.. .,Tk~la. are lin early independent. Then if the power exponent k is equal to the dimension n of V, show that the matrix of T with respect to {a,Ta,... ,Tn~1a\ is 0
\
V
/ 2
8. Let T be a linear transformation on R and 1 2 2 2
ll
2
°\
-2
<W
be the matrix of T with respect to some basis. Find the kernel of T. 9. Prove that all linear transformations on an n-dimensional space V under the usual vector addition and scalar multiplication of linear transforma tions form a linear space of dimension n 2 . 10. Let V be a linear space consisting of functions whose basis is {sin#, cos#} and D a differentiation transformation. Prove that D is the zero point of the characteristic polynomial /(A) = A2 + 1. *11. If A and B are matrices of order n and AB = BA, prove that A and B contain common eigenvectors. *12. Let T be a linear transformation on a linear space V. Prove that T is invertible if and only if the constant term of the minimal polynomial of T is not equal to zero.
337
Linear Spaces and Linear Transformations
*7.5. Linear Transformations from One Linear Space into Another The linear transformations considered above are linear transformations from a linear space into itself. In this section we will generalize them. We shall consider linear transformations from one linear space into another. Definition 1. Let U and V be linear spaces over K. According to a rule T that assigns to any element a in V, a unique element a' in U, i.e., a -4 a' = Ta, then T is called a transformation from V into U. If, moreover, T(a +/3) = T(a) + T((3),T(ka)
=
kT(a),
then T is called a linear transformation from V into U. Example 1. Let T be a transformation from R3 into R2 defined by
T(x,y,z) = (x,y). Prove that T is a linear transformation. Proof. Obviously T is a transformation from R3 into R?. (ai,&i,ci),/3 = (a 2 ,6 2 ,c 2 ). Then
Let a =
T ( a + 0) = T(ai + a 2 , &i + &2, Ci + c2) = (ax + a 2 ,6i + 62) = (a 1 ,6 1 ) + ( a 2 , 6 2 ) = T a + T/3, r(Jfca) = T(fcai, fcbi, fcci) = (fcai,fc&i)= ffa , and so T is a linear transformation from R3 into i? 2 . established.
The statement is
Example 2. Let a E i i " and M a real (n, m)-matrix. Then Ta = Ma is a linear transformation from Rn into Rm. We can check directly from definition that the above statement holds. All concepts and properties established before for linear transformations from V into itself can be carried over immediately for linear transformations from V into U. This can be proved by analogous arguments. We shall leave these proofs as exercise for the reader. Suppose T is a linear transformation from V into U, then the image subspace ImT of T is a subspace of U. The Kernel subspace KerT of T is a subspace of V. It is obvious that if the dimensions of V and U are n and m respectively, then the rank r of T cannot exceed n and m, i.e., r ^n,m, and the
338
Linear Algebra
defect of T cannot exceed n, i.e., s ^ n. When r = m, T is called a nonsingular linear transformation. When r < m, T is called a singular linear transforma tion. Like Theorem 6 in Sec. 7.3, we have the sum of the rank and the defect of T is equal to n, i.e., r + s = n. When s = 0, we obtain r = n, and n ^ m. Therefore a one-one linear transformation is not always a nonsingular trans formation. Only when n = m is a one-one linear transformation nonsingular. Moreover, the set of linear transformations from V into U, denoted by Hom(V, U), forms a linear space over K. If the dimensions of V and U are n and m respectively, then the dimension of Hom(V, U) is equal to ran. If T is a linear transformation from V into U and S a linear transformation from U into V, then ST is a linear transformation on V and T S is a linear transformation on £/. If ST = I and TS = I, where I is the identity linear transformation, then S is called the inverse linear transformation of T, denoted by S = T _ 1 . Of course, as we have TS = I,T is also the inverse linear transformation of S, i.e., T = 5 _ 1 . A linear transformation T having inverse linear transformation is called an invertible transformation. Clearly a one-one and nonsingular linear transfor mation is invertible. Conversely, an invertible linear transformation is a oneone and nonsingular transformation. If T and S are invertible, then T~1S~1 is the inverse linear transformation of ST. S ^ 1 ! 1 - 1 is the inverse transformation of TS. These can be checked by definition. If T is an invertible linear transformation from V into U, then it is called an isomorphism from V onto U. Here, we can also say that V and U are isomorphic, denoted by V ~ U. When V = U, T is an automorphism on V, that is, an isomorphism T of V onto itself is called an automorphism. The two sets are isomorphic, i.e., they have the same form. For many purposes we often regard isomorphic linear spaces as being essentially "the same", although the elements and operations in the spaces may be quite different. Thus if one of two isomorphic linear spaces has certain properties, the other space also has the same properties. Or we may say that the relation between elements of two isomorphic spaces is identical. Therefore we identify two isomorphic linear spaces with two same linear spaces. Example 3. Prove that Rn+1 is isomorphic to M (see Sec. 7.1). Proof. Let a = ( a o , a i , . . . ,a n ) be any elements in Rn+1. Ta = T(ao, Oi, • • ■ , a n ) = OQ + a\x -I
Then
1- anxn G M.
Linear Spaces and Linear
Clearly, T is a transformation from Rn+1 into M, and T(ka) (3 e Rn+1. By a computation it is easy to see T{a + (3)=Ta
339
Transformations
= kTa.
Let
+ T(3,
and so T is a linear transformation from Rn+1 into M. Obviously T is also a nonsingular linear transformation. When ao + a\x + ■ ■ ■ + anxn = 0, we have ao = a\ = ... = an = 0, i.e., a = o. Therefore T is a one-one linear transformation and hence an invertible linear transformation from Rn+l into M. Thus T is an isomorphism: Rn+1 ~ M. The proof is complete. The following are important properties of isomorphism. Theorem 1. (1) Every linear space V is isomorphic to itself. (2) If V is isomorphic to U, then U is isomorphic to V. (3) If V is isomorphic to U and U is isomorphic to W, then V is isomorphic toW. (1), (2), and (3) are not difficult to show. Their proofs are left to the reader as an exercise. The following theorem shows that all linear spaces of the same dimension are, algebraically speaking, alike, and conversely, isomorphic linear spaces have the same dimensions. Theorem 2. Let V and U be two linear spaces over K. Then V ~ U if and only if their dimensions are equal. Proof. Suppose V ~ U, then there exists an invertible transformation T from V into U, and so T is a one-one nonsingular linear transformation. Prom Example 3 of Sec. 7.3, T carries each linearly independent subset of V into a linearly independent subset of U, and so the dimensions of V and U are equal. Suppose the dimensions of V and U are equal to n. Let {cxi,ot2, ■ ■ ■, an} be a basis for V, {jh.tfh> ■ ■ ■ > Ai} a basis for U. As in Theorem 1 in Sec. 7.4, it is easy to show that 5 j kiCXi -> ^^ kiPi
340
Linear Algebra
is a linear transformation from V into U. It is obviously a nonsingular and one-one linear transformation. Therefore it is an isomorphism from V onto U and so V ~ U. The proof is complete. Thus any n-dimensional linear space over K is isomorphic to Rn (see Theorem 2 in Sec. 7.2). That is to say, any n-dimensional linear space can be regarded as Rn. Let T be a linear transformation from V into U, { a i , a 2 , . . . , a n } a basis for V, {/3i, / 3 2 , . . . , f3m} a basis for U. As in Theorem 2 in the previous section, we have T(ai,-,an) = (/31,--- ,fim)A, where A, an (m, n)-matrix, is a matrix of T with respect to the basis { o t i , . . . ,
otn;Pi,...,0m}. Let T be a linear transformation from R3 into R2.
E x a m p l e 4. transforms
It
a i = (1,1,1), a 2 = (1,2,1), a 3 = (1,1,2) into A = (1,1), A - ( 1 , 2 ) ,
A-(2,2)
respectively. Find the matrix of T with respect to the basis /32}-
{aci,a2,cX3',Pi,
Solution: Since Ten = l/3i + 0/32, T a 2 = 0/3i + l/3 2 , T a 3 = 2j3i + 0/32 , we have r ( a i , a 2 , a 3 ) = GSi.jSa) ( J
J
j) -
Thus the matrix of T with respect to the basis { a i , a i , a 3 ; / 3 i , / 3 2 } is
A-(l
° 2^l
Suppose V is a linear space over K and V a linear space over K'. If IT is an isomorphism from X onto if', then a transformation T satisfying the conditions T ( Q + 0 ) = Toe + T/3, T(ka) = a{k)Ta is called a semi-linear transformation.
Linear Spaces and Linear Transformations
341
It is apparent that a linear transformation is a semi-linear transformation. If T is invertible linear transformation, then its inverse linear transformation is a semi-linear transformation. Exercises 1. Suppose T is a linear transformation from R4 into R3 defined by T(x,y,z,t)
= (x + y,z + t,0).
Find ImT and KerT. 2. Let T be a linear transformation from a linear space V over K into a linear space U over K, A the matrix of T with respect to the basis {c*i, c*2, • ■ •, a n } for V and {/3i, fo, • • ■ ,Pm} for U. If {a'j, c*2, • • • , a'n} is also a basis for V and ( " i . - - - ,«!») = (<*!,•• ■
,an)P,
prove that the matrix of T with respect to a basis {a'j, a'2, ■ ■ ■, ot'n; (3\, /32, ...,/9m}isAP. 3. Suppose U, V, and W are linear spaces over K,S is a. linear transformation from U into V,T is a linear transformation from V into W, then TS is a linear transformation from U into W\ Prove that rank of (TS) ^ min (rank of T, rank of S)
*7.6. Dual Spaces and Dualistic Transformations In this section we shall consider a kind of linear space used very often. It is formed from a given linear space. Suppose V is a linear space over K. If we regard if as a one-dimensional linear space over K, then a transformation from V into K is called a linear functional on V. For example, if (01,03,... ,an) € Rn,Ti{ai,... ,an) = Oj, it is clear that Ti is a linear transformation from Rn into R. Hence Ti is a linear functional on Rn. If V is a linear space over K, it is formed by all matrices of order n with elements in K, and A = (a,ij) G V, T(A) = o n + • • • + o n n = tr(A), Then T is a linear functional on V.
342
Linear Algebra
It is easy to see that the sum a + r of any two linear functionals a and T on V and the multiplication ko of a by a number k in K: {a + r ) ( a ) = a(a) + r(a),
(ka)(ac) =
ka(a),
are linear functionals on V. Therefore all linear functionals on V form a linear space on K, called a dual space on V and denoted by V. T h e o r e m 1. Let V be an n-dimensional linear space over K, then a dual space V on V is also an n-dimensional linear space over K. Proof. Suppose { Q I , Q 2 , . . . , a n } is a basis for V, a = k\<x\ + ■ • • + is any element in V, a is a linear functional on V, then a(a) = fci
knan
1- fcna(an), a(a.i) = a* & K.
Take the linear functional on V ai(a.j) = 5ij, i = l , - - - , n . As <Ji(a) = ki, we have a(a) = a(ai)ai(a)
H
h a(an)an(oc)
= {a(oci)cri H
h cr(a„)<7„)a!.
Therefore (T = OxCTi +
han<7„,
(1)
i.e.,CTis a linear combination of ffl,<72,... , c n . It is easy to prove that (Ti,
i = l,--
,n,
(2)
is called a dual basis of {c*i, ■ • •, otn}. Formula (2) is an important expression. Taking a dual basis, we can simplify enormously many computations according to (2). For example, finding
1- anocn, a = bidi H
h 6ndn
Linear Spaces and Linear Transformations
343
and obtain CT
(°O
= ( Y l b i 6 i i ) I ^2ajaj n
v
2 j biaj6Li{a.j) i,j=l
=
or
(
ybjCLj i=l
ai
an
which is an extremely simple expression. Example 1. Find a dual basis {01,02,03} for the dual space R3 with respect to a basis { Q I , 02,03} for R3, where 01
(1,0,0), 0 2 = (1,1,0), 0 3 = (1,1,1).
=
Solution: Let a = (x, y, z) £ R3, d i (d) = d i (x, y, z) = ax + by + cz. As d i ( a i ) = 1, d i ( a 2 ) = 0 ,
di(a3)=0,
i.e., a = 1, a + b = 0, a + b + c = 0, we find a = 1, b=— 1, c = 0,
d i ( x , y , z ) = x — y.
By an analogous argument we obtain ot2(x,y,z)=y-
z,
a3(x,y,z)=z.
The solution is complete. Example 2. Suppose {oti, 0 2 , . . . , an} is a basis for V, { d i , 0 2 , . . . , d n } is a dual basis for V, {Pi,..., f3n} is another basis for V, and {0i, fa, ■ ■ ■,Pn} is a dual basis for V. If Q3i,--- ,/3„) = ( o i , - - (Pir
prove that Q'P = E.
■■ ,$n) =(&!,-■•
,an)P, ,a„)Q,
344
Linear Algebra
P r o o f . Let P = (Plj),Q
= ( 9 i j ) . Then
and hence PiPj = ^QkiOLk k
■ ^pijOti
=
I
^2,qkiPljakai k,l
= 2 J IkiPkj = Sij ■ k
Thus Q'P = E. The statement holds. Since V is an n-dimensional linear space over K, similarly we again have a dual space V on V. Theorem 2. Show that the dual space on V is V and a dual basis of a basis {di.,d2,. . . , d „ } for V is { Q i , Q 2 , . . . , Q n } , i.e., V = V, dj = cti. Proof. Suppose a eV,x eV, then a(x) £ K. Hence we can regard a; as a transformation from V into K, written as x, i.e., i(a) = a(x). It is clear that x(a + T) =
(<J
+
T)(X)
= a(x) +
T{X)
= x{a) +
X(T),
x(kcr) = (ka)(x) = fccr(x) = kx{cr). Thus x is a linear functional on V, i.e., x can be regarded as a linear functional on V, and s o ^ C V . Now suppose r £ V. We shall show that r is in agreement with some x, i.e., r(cr) = x(a), a £ V, and consequently V C\V. As r ( d i ) £ K, let x = r(di)ai H h r(dn)an , we obviously have x £ V. Thus from (2), we obtain x(&i) = d i ( r ( d i ) a i + • • • + r ( d „ ) a „ ) = r(di)di(a1) + • • ■ + r(dn)dj(an) = T(di).
As { d i , d 2 , . . . , d „ } is a basis for V, let a = fcidi H
1- fc„d„ ,
Linear Spaces and Linear
Transformations
345
we obtain T(CT) = fciT(di) H = fcii(di) H = x(kidci +
(- fc„T(dn) h
knx(an)
i- knan)
= x[a). That is to say, r = x. i.e., r can be regarded as the element x in V. Hence V" C V and so V" = V. From d i ( « j ) = Sij, we obtain dj(dj) = 6q. Then as d j ( d i ) = 6y, we have d j = &j, i.e., {ai,Q!2, . . . , a n } can be regarded a dual basis of {di,d2,...,dn}. Therefore the theorem is established. Thus V is a dual space on V. Conversely, V can be regarded as a dual space on V. { d i , d 2 , . . . , d n } is a dual basis of {01,02, ■ ■ ■ , a „ } . Conversely { a i , 0 2 , . . . , o n } can be regarded as a dual basis of {di, 02, ■ •., d „ } . We considered dual spaces above. Next, we shall introduce a dualistic transformation. They are closely associated with each other. Theorem 3. Suppose V and U are linear spaces over K, T is a linear transformation from V into U,
f(0)=0T,$£U. Then T is a linear transformation from U into V and is called a dualistic trans formation of T. Proof. Since 0 € U, (3T is a linear functional on V. Therefore J3T G V. i.e., T is a transformation from U into V. Then as
t(ki$! + k202) = fop! + k202)T = kiffiiT) + k2{02T) = hf^) + k2T02), T is a linear transformation. The proof is complete. Example 3. Assume that T(x, y) = (y, x + y) is a linear transformation on R2 and
346
Linear Algebra
Solution: By definition Tip = tpT. Then for any vector (x, y) we have (fip)(x, y) = ip(T(x, y)) = ip(y, x + y) = y-2(x
+ y) =
-2x-y.
The solution is complete. Definition. Let V be a linear space over K and U a subset of V. Suppose a linear functional T belongs to V. If Tu = 0 for every u in U, then T is called an annihilator of U. The set of all such linear functionals in V forms a subspace of V and is called annihilator subspace oiU, or for short, annihilator of U and denoted by U°. This is because letting T\,T2 € U°,a,b£ K, we have (aTi + bT2)u = aTiu + bT2u = o, i.e., aTi + bT2 G U°. T h e o r e m 4. Assume that T is a linear transformation from V into U and T is its dualistic transformation. Then the kernel of T, KerT, is an annihilator of ImT of T, i.e., Kerf = (ImT)°. Proof. Assume that ip G KerT, i.e., Tip = 0, then we have ip(Ta) - (ipT)a = (Tip)a = o for every element a EV. Hence ip £ (ImT)° and so Kerf C (ImT)°. Again assume that ip € (ImT)°. Then we have (fip)a = tp(Ta) = o for every a£V. Thus Tip = 0, i.e.,
Linear Spaces and Linear
Transformations
347
Theorem 6. Suppose T is a linear transformation from V into U, the ma trix of T with respect to the bases {01,0*2,... , a n } for V and {/3i,/32,... ,/3 m } for U is A = (a,ij), and the matrix of T with respect to the dual bases {$1,02, ■ ■ ■, An} and { d i , d 2 , . . . , d n } is B = (bij). Prove that B = A'. Proof. From (2), we have n
f(0j)=pjT = '£/$j(Tai)&i. As m
n
Ten = J H OfcijSfe,
f (ft) = J2 bU&i >
k=l
1=1
we find m
6y = $jT(cti)
m
= /§,■ ^ a r i / 3 r = 2 ^ a.riPjPr = a-ji ■ r=\
r=l
i.e., (by) = (dij)'. The proof is complete. Assume that T is a dualistic transformation of T, the matrix of T with respect to the dual basis {di, 0 2 , . . . , an;(31,...
,/3 m } is C = (cij), then
C = B' = (A')' = A. As d j = Oi,/3j = Pi, we have T = T. In other words, the dualistic transfor mation of a dualistic transformation of T is simply T itself. Exercises 1. Suppose V is a 3-dimensional linear space with a basis { 0 1 , 0 2 , 0 3 } and a = ziOi + £202 + £303 is any element in V. Which of the following transformations are linear functionals on V? (1)
(2) a(x) = (xi + x2)2 .
2. Find the dual basis of R3 with respect to the following basis for R3. (1) {(1,0,0), (0,1,0), ( 0 , 0 , 1 ) } , (2) {(1,0,-1), (-1,1,0), ( 0 , 1 , 1 ) } . 3. Suppose ip is a linear functional on R2 defined by ip(x, y) — x — 2y and T is a linear transformation on R? defined by T(x, y) = (2x — 3y,5x + 2y). Find
Tip).
348
Linear Algebra
4. Let T be a linear space and x EV. Show that if x(a) = 0 for every element o £ V, then x = 0. 5. Let {ati,ai2, ■ ■ ■, an} be a basis for V, {ax, &2, -. ■, d „ } a dual basis for V". Prove that any element a in V can be written as n
ot = ^ d i ( a ) a j . i=i
6. Assume that {/ is a proper subspace (different from the zero subspace and V) of an n-dimensional linear space U, a is an element in V, but not in U, i.e., a G V and aEU. Prove that there exists a linear functional a € V such that <j(a) = 1,
CHAPTER 8 INNER P R O D U C T SPACES
When we considered linear spaces above, we have so far not introduced any metric concepts such as lengths of vectors and angles between two vectors as in analytical geometry. These metric concepts, however, are needed in many problems, and must be dealt with. So in this chapter we shall introduce these concepts and explore the related properties. While there are many ways to do this, in our approach we shall derive both length and angle from a more fundamental concept called inner product of two vectors. Along the way we shall introduce also the inner product spaces. We shall be dealing with linear transformations which leave these metric properties unchanged. These are known as orthogonal transformations or unitary transformations. They are the basic transformations in an inner product space. In this chapter we consider mainly the following two problems: 1. Basic concepts and basic properties of an inner product space. 2. Orthogonal transformations and unitary transformations. 8.1. Concept of Inner Product Spaces In this section we turn to the task of discussing lengths and orthogonality. In analytical geometry the length of a vector and the angle between two vectors are two basic metric concepts. They can be expressed in terms of dot products: \a\ = y/[a,a),
cos (9 = ,.,Q,
,
where ( a , (3) is called the dot product or scalar product of a and /3, \a\ and \/3\ are the lengths of a and /3 respectively, 6 is the angle between a and f3. 349
350
Linear Algebra
The dot product has the following fundamental properties: (o,j9) = 09, a ) , ( a i + 09./9) = ( a X l 0 ) + ( a 2 , / 3 ) , (ka,(3) = k(a,p),
(a,a)>0,
(a, a ) = 0 if and only if a = o . We are all familiar with the dot product and the above properties. In a linear space we generally cannot visually define these metric concepts as in analytic geometry. This problem can be solved like this: first we use the above properties as conditions for defining the inner product, and then with the help of the inner product we define the length of a vector and the angle between two vectors. Thus we first have: Definition 1. Let V be a real linear space. For any two vectors a and /3, if there exists a real number, denoted as (a,/3), corresponding to them and satisfying the following conditions, then the real number (a,/3) is known as the real inner product of vectors a and (3: 1. (a,/3) = (/3,a), 2. (fca,/3) = fc(a,/3), where k is any real number, 3. ( a 1 + a 2 ,/3) = (ai,/9) + ( a 2 , 0 ) , 4. (a, a ) ^ 0 and (a, a ) = 0 if and only if a = o. Definition 2. A linear space over real numbers with an inner product is called a (real) inner product space. A finite-dimensional real inner product space is often called a Euclidean space. An inner product space over complex numbers is often referred to as a unitary space. Since the operation of inner product of vectors has nothing to do with the operation of sum of vectors and multiplication of a vector by a scalar, no matter how the inner product is defined, it has no influence on the dimension of the linear space over real numbers. A subspace of an inner product space is clearly also an inner product space over real numbers. Thus the linear space R3 over real numbers is an inner product space over real numbers. We know from analytical geometry that the inner product of vectors a = (01,02,03) and /3 = (61,62,^3) can be written as (a,/3) =
351
Inner Product Spaces
a\bi + a2&2 + 0363- In the linear space Rn over real numbers, for vectors a = (01,02,...,a n ) and /3 = (61,62, • - •, bn) in .R", we define similarly (a,/3) = aibi-\
\-anbn.
Clearly, (a, (3) satisfies the conditions denning the inner product, and so (a, (3) is an inner product. Therefore Rn is also an inner product space. Moreover we know that all the matrices of order n with real elements under the usual operations of addition and scalar multiplication of matrices form a n 2 -dimensional linear space Rn over real numbers. Let A = (aij) and B = (bij). Similar to above we define a b
(A,B) = Y^
ij ij,
i,3=l
i.e., (A,B)=tr(AB'). It is to easy to prove that (A, B) defines an inner product on R^. Therefore Rn is an inner product space. We assume that C is an infinite-dimensional linear space over real numbers formed by all continuous real-valued functions in x on the interval [a, b]. For any f{x) and g(x) in C, we define (f(x),g(x))=
/
f(x)g(x)dx.
Ja
It is easy to check that the conditions defining an inner product are satisfied. We freely use the following facts in integration from calculus: rb
(1) (f(x),g(x))=
/
pb
f(x)g(x)dx=
Ja
/ g(x)f(x)dx
fb
(2) (kf(x),g(x))=
(g(x),f(x)).
rb
/ kf{x)g(x)dx
= k
Ja
(3) (f(x)+g(x),h(x))=
=
Ja
f(x)g(x)dx
=
k(f(x),g(x)).
J o,
f (f(x) + g{x))h{x)dx Ja pb
rb
= I f(x)h(x)dx Ja
= (f(x),h(x))
+ /
g(x)h(x)dx
Ja
+
(g(x),h(x)). fb
(4) For a nonnegative continuous function f(x), we have / f(x)dx Ja
> 0.
352
Linear Algebra
We can have / f(x)dx = 0 if and only if f(x) = 0 for all x € [a, b]. Ja
Also, as (/(*),/(x))= /
/ a (x)«fa
and / 2 (x) is nonnegative, f2(x)dx^0.
(/(*),/(*))= / Ja
Finally suppose that (f(x),f(x)) / f2(x)dx
= 0. Then
= 0 implies f(x) = 0, a; € [a, 6].
Ja
Thus the conditions for an inner product (f(x),g(x)) the formula (f(x),g(x))
= /
on C are satisfied by
f(x)g(x)dx
Ja
Therefore C is an inner product space. It is not difficult to see from definition that the inner product has the following basic properties: (1) (a,Z/3) = Z(a,/3); (2) ( a , A + A ) = (a,j9i) + ( a , A ) ; (3) ( a , o ) = ( o , / 3 ) = 0 ;
(
71
71
\
71
Yshcti,
T,l}Pj)=
T,
i=l
j-1
i,j=l
'
kiij{oLU(3j).
To explain in detail we see that from Definition 1 we have (a,l(3) = (l{3,a) = l(f3,a) =
l(a,l3),
and so (1) holds. Again as (a, ft + A ) = (ft + 02, a ) = (0i, a ) + (0 2 , a ) = (a,0i) + (a,02),
353
Inner Product Spaces
(2) holds. Moreover, from (1) it follows that ( a , o ) = ( a , 0 - / 3 ) = 0 ( a , / 3 ) = 0, and so (3) holds. Finally, using the above properties of the inner product, (4) is easily derived by induction on n. Let V be an n-dimensional inner space, {c*i, a.2, ■ ■ ■, ctn} be a basis for V, and a = xicxi H
h xnan
, f3 = j / i a i -I
h ynan
be any two vectors in V. From properties (4) of inner product we have (a, /3) = ( x i a i H
h xnan,
n
y1oti H
h y„a„)
n
- XIi.ai'ai)xiyj - 5Z aaXiyi> where Oy = ( a i , a j ) . Expressed in matrices, (a,/3) is simply (a,/3) = X A y ' , where ( a j , a i ) •■• ( a i , a „ ) ^ = (ay)
( a n , a i ) •• • ( a n , a n ) / Jt = (xi,. ..,xn),Y=
( y i , . . . , 3/n).
Thus as long as we know the inner product of any two vectors in a basis, i.e., the matrix A, we can find the inner product of any two vectors in the inner product space. The matrix A is referred to as a metric matrix with respect to a basis {c*i, 0 2 , . . . , a n } . It is clear that A is symmetric. Since for any nonzero vector a , we have (a, a ) = XAX' > 0, A is a positive definite matrix. Thus, a metric matrix is a positive definite matrix. Hence, using any positive definite matrix as the metric matrix, we can define the inner product of any two vectors, and so on the same linear space we can define many different inner products. Thus the same linear space becomes different inner spaces. It should be noted that from now on by inner product on inner space Rn and Rn we mean those products as defined above. Then the metric matrix of Rn is a unit matrix and after some rows are appropriately interchanged, the metric matrix of Rn is also a unit matrix.
354
Linear Algebra
By direct computation we easily see that

(AX, X) = X'AX,

where A is a symmetric matrix and X is a column vector. Therefore A is a positive definite matrix if and only if (AX, X) > 0 for every nonzero vector X, and A is a negative definite matrix if and only if (AX, X) < 0 for every nonzero vector X.
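As a quick illustration (a sketch with made-up matrices, not from the text), one can sample the quadratic form X'AX on random nonzero vectors and compare with the eigenvalue criterion for definiteness.

```python
import numpy as np

rng = np.random.default_rng(0)

def quadratic_form_range(A, trials=1000):
    """Sample (AX, X) = X'AX for random nonzero X (illustrative check only)."""
    n = A.shape[0]
    vals = [x @ A @ x for x in rng.standard_normal((trials, n))]
    return min(vals), max(vals)

A_pos = np.array([[2.0, -1.0], [-1.0, 2.0]])   # eigenvalues 1 and 3: positive definite
A_neg = -A_pos                                  # negative definite

print(quadratic_form_range(A_pos))   # both sampled values are positive
print(quadratic_form_range(A_neg))   # both sampled values are negative
print(np.linalg.eigvalsh(A_pos))     # agrees with the eigenvalue criterion
```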
Example 1. Let α and β be any two vectors in the inner product space R^n, A any real matrix of order n, and A' the transpose of A. Prove that

(αA, β) = (α, βA').

Proof. Let α = (a_1, a_2, ..., a_n), β = (b_1, b_2, ..., b_n), and A = (a_{ij}). According to the above definition of the inner product on R^n, we have

(αA, β) = (Σ_i a_i a_{i1}) b_1 + ... + (Σ_i a_i a_{in}) b_n = a_1 (Σ_j b_j a_{1j}) + ... + a_n (Σ_j b_j a_{nj}) = (α, βA').

The proof is complete.

If A is a symmetric matrix, then (αA, β) = (α, βA). If A is an orthogonal matrix, then (αA, αA) = (α, α).

Example 2. Suppose the metric matrices of an inner product space V with respect to bases {α_1, α_2, ..., α_n} and {β_1, β_2, ..., β_n} are A = (a_{ij}) and B = (b_{ij}) respectively. Prove that A and B are congruent, i.e., metric matrices with respect to different bases are congruent.

Proof. Assume that

(β_1, ..., β_n) = (α_1, ..., α_n)P,   P = (p_{ij}).
Then

β_i = p_{1i} α_1 + ... + p_{ni} α_n,

and thus

(β_i, β_j) = p_{1i} Σ_{t=1}^n p_{tj}(α_1, α_t) + ... + p_{ni} Σ_{t=1}^n p_{tj}(α_n, α_t).

Hence

b_{ij} = p_{1i} Σ_{t=1}^n p_{tj} a_{1t} + ... + p_{ni} Σ_{t=1}^n p_{tj} a_{nt}.

Using the rules of matrix multiplication it is easy to check that

B = P'AP,

i.e., B is congruent to A. Hence the statement holds.

Example 3. Let T be a linear transformation of an inner product space V. If (Tα, β) = 0 for every α and β ∈ V, then T = o.

Proof. Let β = Tα. Then (Tα, Tα) = 0 and so Tα = o. Also, as α is any vector, we have T = o. The proof is complete.

Once we have the concept of an inner product, by analogy with the length of a vector and the angle between two vectors in R^3, the lengths of vectors and angles between vectors in any inner product space can be defined. Of course, here the so-called lengths and angles have no visual geometric sense; they are only geometric terminology used for convenience of understanding.

Definition 3. The length of a vector α, denoted by |α|, is defined as

|α| = √(α, α).

The cosine of the angle between two nonzero vectors α and β is defined as

cos θ = (α, β) / (|α| |β|).

Vectors α and β are said to be orthogonal, often denoted by α ⊥ β, if (α, β) = 0, for which cos θ = 0. The zero vector can be considered to be orthogonal to
any vector. A vector α is said to be a unit vector if |α| = 1; that is, a vector whose length is equal to 1 is a unit vector.

Example 4. If r is a real number, then |rα| = |r| |α|.

Proof. We have

|rα| = √(rα, rα) = √(r^2(α, α)) = |r| √(α, α) = |r| |α|.

The proof is complete.

If U is a subspace of an inner product space V, then the set of all vectors in V that are orthogonal to every vector in U forms a subspace, called the orthogonal space of U or the orthogonal complement of U, and denoted by U^⊥. We first prove that the orthogonal complement of U is a subspace of V. Indeed, from (α, β_1) = 0 and (α, β_2) = 0 we obtain

(α, β_1 + β_2) = (α, β_1) + (α, β_2) = 0,   (α, rβ_1) = r(α, β_1) = 0.

Thus any subspace U of V has an orthogonal complement U^⊥. We next prove that U ∩ U^⊥ = 0. If α ∈ U and α ∈ U^⊥, then (α, α) = 0 and hence α = o. Thus we have U ∩ U^⊥ = 0.

Example 5. Let U_1 and U_2 be two subspaces of an inner product space V. Prove that (U_1 + U_2)^⊥ = U_1^⊥ ∩ U_2^⊥.

Proof. Since U_i ⊂ U_1 + U_2, we have (U_1 + U_2)^⊥ ⊂ U_i^⊥, i = 1, 2. Hence

(U_1 + U_2)^⊥ ⊂ U_1^⊥ ∩ U_2^⊥.

If α ∈ U_1^⊥ ∩ U_2^⊥ and α_i ∈ U_i, then

(α_1 + α_2, α) = (α_1, α) + (α_2, α) = 0,

and so α ∈ (U_1 + U_2)^⊥, i.e.,

U_1^⊥ ∩ U_2^⊥ ⊂ (U_1 + U_2)^⊥.

Thus the statement holds.

From the definition it is easy to see that a vector α in V lies in the orthogonal complement of U if and only if α is orthogonal to each vector in a set which spans U.

Example 6. Let U be the subspace of the inner product space R^3 spanned by the vectors α_1 = (1, 1, 2) and α_2 = (2, 2, 3). Find U^⊥.

Solution: The two vectors α_1 and α_2 are linearly independent, so U is two-dimensional. A vector α lies in the orthogonal complement of U if and only if α is orthogonal to each of the two vectors which span U. Thus U^⊥ is the subspace

U^⊥ = {α ∈ R^3 | (α, α_1) = (α, α_2) = 0}.

Carrying out the inner product calculation for α = (x_1, x_2, x_3) gives the system of equations

x_1 + x_2 + 2x_3 = 0,
2x_1 + 2x_2 + 3x_3 = 0.

The solution set of this system is x_1 = −x_2, x_3 = 0. Hence

(x_1, x_2, x_3) = (x_1, −x_1, 0) = x_1(1, −1, 0),

and so U^⊥ is the subspace formed by vectors of the form k(1, −1, 0), i.e., it is spanned by the vector (1, −1, 0). Its dimension is 1. The solution is complete.

From Definition 1 we know that (α, α) ≥ 0, so the length of a vector is a nonnegative number; only the length of the zero vector equals zero. The length of a vector α = (a_1, ..., a_n) in R^n is

|α| = √(a_1^2 + ... + a_n^2),

which is identical in form to the length of a vector in R^3.

We shall now prove a result which will enable us to give a worthwhile definition for the cosine of an angle between two nonzero vectors α and β in
an inner product space V. This result, called the Cauchy-Schwarz inequality, has many important applications in mathematics. As the cosine of any angle is a real number between −1 and 1, to see whether this definition is reasonable for any inner product space, we must show that

−1 ≤ (α, β) / (|α| |β|) ≤ 1.

This is formulated as the following theorem. Its proof, though not difficult, is not too obvious and does call for some clever maneuvering.

Theorem. If α and β are any two vectors in an inner product space V, then

|(α, β)| ≤ |α| |β|,   i.e.,   (α, β)^2 ≤ (α, α)(β, β).   (1)

This inequality is known as the Cauchy (A. L. Cauchy, 1789-1857)-Schwarz (H. A. Schwarz, 1843-1921) inequality.

Proof. If β = o, then |β| = 0 and (α, β) = 0, so the inequality (1) is clearly valid. Now suppose that β is nonzero. Let x be any real number and consider the vector α − xβ. Since the inner product of a vector with itself is always nonnegative, we have

0 ≤ (α − xβ, α − xβ) = (β, β)x^2 − 2(α, β)x + (α, α) = ax^2 − 2bx + c,

where a = (β, β), b = (α, β), and c = (α, α). If we fix α and β, then ax^2 − 2bx + c = p(x) is a quadratic polynomial in x which is nonnegative for all values of x. This means that p(x) has at most one real root, for if it had two distinct real roots x_1 and x_2, it would be negative between x_1 and x_2 (Fig. 1).
From the quadratic formula, the roots of p(x) are

(b + √(b^2 − ac)) / a   and   (b − √(b^2 − ac)) / a

(a ≠ 0, since β ≠ o). Thus we must have b^2 − ac ≤ 0, which means that b^2 ≤ ac, which is the desired inequality.

If α and β are any two nonzero vectors in an inner product space V, the Cauchy-Schwarz inequality can be written as

(α, β)^2 / (|α|^2 |β|^2) ≤ 1,   i.e.,   −1 ≤ (α, β) / (|α| |β|) ≤ 1.

It then follows that there is one and only one angle θ, 0 ≤ θ ≤ π, such that

cos θ = (α, β) / (|α| |β|).

We define this angle to be the angle between α and β. In the inner product space R^3, when (α, β)/(|α| |β|) = ±1, i.e., cos θ = ±1, α and β are collinear vectors, i.e., α = kβ. The same is true in the general case of any inner product space. As a matter of fact, the above proof of the Cauchy-Schwarz inequality shows that the equality holds if and only if α = xβ for some real number x, i.e., if and only if α is a multiple of β.

If the equality in (1) holds, from the above proof it follows, in the opposite direction, that (α − xβ, α − xβ) = 0, i.e., α = xβ. However, this is not difficult to prove directly. Let β ≠ o. Substituting (α, β) = k(β, β) into the equality in (1) we obtain (α, α) = k(α, β). Hence (α − kβ, β) = 0, (α, α − kβ) = 0, and therefore

(α, α − kβ) − k(α − kβ, β) = (α − kβ, α − kβ) = 0,

so that α = kβ. Conversely, in the case α = kβ equality clearly occurs in (1). Thus equality occurs in (1) if and only if α and β are linearly dependent.
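A short numerical illustration of inequality (1) and of the equality case (a sketch with randomly generated vectors, not drawn from the text):

```python
import numpy as np

rng = np.random.default_rng(1)

# Check (alpha, beta)^2 <= (alpha, alpha)(beta, beta) for the standard inner product on R^4.
for _ in range(5):
    a, b = rng.standard_normal(4), rng.standard_normal(4)
    assert (a @ b) ** 2 <= (a @ a) * (b @ b) + 1e-12

# Equality holds exactly when the vectors are linearly dependent (beta = k * alpha).
a = rng.standard_normal(4)
b = 2.5 * a
print(np.isclose((a @ b) ** 2, (a @ a) * (b @ b)))   # True
```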
We now state the Cauchy-Schwarz inequality for the inner product spaces introduced in several of our examples. If α = (a_1, ..., a_n)' and β = (b_1, ..., b_n)' are in the inner product space R^n, then formula (1) is simply the inequality

(α, β)^2 = (Σ_{i=1}^n a_i b_i)^2 ≤ (Σ_{i=1}^n a_i^2)(Σ_{i=1}^n b_i^2) = |α|^2 |β|^2.

In the inner product space C, formula (1) is simply the inequality

(∫_a^b f(x)g(x) dx)^2 ≤ ∫_a^b f^2(x) dx · ∫_a^b g^2(x) dx.
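A numerical illustration of the integral form (a sketch: the functions on [0, 1] and the trapezoidal quadrature are our own choices, used only to make the inequality visible):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 2001)
f = np.sin(3 * x)        # made-up continuous functions on [0, 1]
g = x**2 + 1

lhs = np.trapz(f * g, x) ** 2
rhs = np.trapz(f * f, x) * np.trapz(g * g, x)
print(lhs <= rhs)        # True: (integral of f g)^2 <= integral of f^2 times integral of g^2
```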
In order to see how useful the techniques of linear algebra are, we invite the reader to try to prove this result using only the tools of calculus.

Furthermore, according to (1) it is not difficult to prove that the relationship among the three sides of a triangle in plane geometry also holds in an inner product space: if α and β are any vectors in an inner product space V, then the triangle inequality

|α + β| ≤ |α| + |β|   (2)

holds. This is because of the following. As

|α + β|^2 = (α + β, α + β) = (α, α) + 2(α, β) + (β, β) = |α|^2 + 2(α, β) + |β|^2

and the Cauchy-Schwarz inequality states that (α, β) ≤ |(α, β)| ≤ |α| |β|, we get

|α + β|^2 ≤ |α|^2 + 2|α| |β| + |β|^2 = (|α| + |β|)^2.

Taking square roots of both sides of the above inequality, we obtain |α + β| ≤ |α| + |β|.

If α and β are orthogonal, from the above proof we obtain the equality

(α + β, α + β) = (α, α) + (β, β),

i.e.,

|α + β|^2 = |α|^2 + |β|^2,

which is Shang Gao's theorem (the Pythagorean theorem) in plane geometry.

Exercises

1. In the 3-dimensional inner product space R^3, suppose the real number (α, β) corresponding to any two vectors α and β is defined respectively as

(1) |α| · |β|,
(2) |α| · |β| · cos^3 θ,
(3) 2|α| · |β| · cos θ,
where θ is the angle between α and β. State whether each defines an inner product.

2. Suppose a basis for a linear space V over the real numbers of dimension n is {α_1, α_2, ..., α_n}, and two vectors

α = a_1 α_1 + ... + a_n α_n,   β = b_1 α_1 + ... + b_n α_n

give rise to the real number (α, β) = Σ_{i=1}^n i a_i b_i. Prove or disprove that V is an inner product space.

3. Suppose α and β are any two nonzero vectors of an inner product space, α = kβ, and the angle between α and β is θ. Prove that (1) k > 0 if and only if θ = 0; (2) k < 0 if and only if θ = π.

4. Let α = (a_1, a_2, ..., a_n), β = (b_1, b_2, ..., b_n) be any two vectors in an n-dimensional linear space R^n over the real numbers and A = (a_{ij}) be a positive definite matrix. If (α, β) = αAβ', (1) prove that R^n is an inner product space; (2) write out the Cauchy-Schwarz inequality.

5. Let α_1, α_2, ..., α_m be a set of vectors in an inner product space. Prove that

| (α_1, α_1) ... (α_1, α_m) |
|     ...          ...      |  ≠ 0
| (α_m, α_1) ... (α_m, α_m) |

if and only if α_1, α_2, ..., α_m are linearly independent.

6. Let α ≠ 0 be an element in an inner product space V. If for any other element β we have (α, β) = 0, show that the metric matrix with respect to any basis is singular.
7. Suppose A = (a_{ij}) is a positive definite matrix of order n and x_1, ..., x_n, y_1, ..., y_n are any real numbers. Prove that

(Σ_{i,j} a_{ij} x_i y_j)^2 ≤ (Σ_{i,j} a_{ij} x_i x_j)(Σ_{i,j} a_{ij} y_i y_j).
8.2. Orthonormal Bases

In analytical geometry we often take the unit vectors i, j, and k as coordinate basis vectors in R^3. When considering metric problems, we often use these vectors for convenience. We do the same in the case of inner product spaces.

Definition 1. Suppose {α_1, α_2, ..., α_n} is a basis for an n-dimensional inner product space V. If the basis vectors are pairwise orthogonal unit vectors, i.e.,

(α_i, α_j) = δ_{ij},

then the basis {α_1, α_2, ..., α_n} is known as an orthonormal basis of V.

Thus {i, j} is an orthonormal basis for R^2, and {i, j, k} is an orthonormal basis for R^3. It is clear that the metric matrix of an inner product space with respect to an orthonormal basis is a unit matrix.

Theorem 1. The transition matrix between two orthonormal bases is an orthogonal matrix. Conversely, if the transition matrix between two bases is orthogonal and one of the two bases is an orthonormal basis, then the other basis is also orthonormal.

Proof. Suppose that {α_1, α_2, ..., α_n} and {β_1, β_2, ..., β_n} are two bases for V and (β_1, β_2, ..., β_n) = (α_1, α_2, ..., α_n)A. If these two bases are both orthonormal bases, from Example 2 of Sec. 8.1 we obtain E = A'EA = A'A, and so A is orthogonal. Conversely, if {α_1, ..., α_n} is orthonormal and A is orthogonal, then the metric matrix with respect to {β_1, ..., β_n} is A'EA = E, and hence {β_1, ..., β_n} is an orthonormal basis. The proof is complete.

There are orthonormal bases in any inner product space of dimension n. This was proved in Chapter 5 by direct orthogonalization and normalization. As this problem is of extreme importance, we restate its general form as follows.

We know that a set of linearly independent vectors need not be pairwise orthogonal, but a set of pairwise orthogonal nonzero vectors must be linearly
independent. This is because if α_1, ..., α_m are pairwise orthogonal nonzero vectors in V, then whenever

k_1 α_1 + ... + k_m α_m = o,

for any vector α_i we have

(k_1 α_1 + ... + k_m α_m, α_i) = 0,

i.e.,

k_1(α_1, α_i) + ... + k_i(α_i, α_i) + ... + k_m(α_m, α_i) = 0.

However, (α_i, α_i) ≠ 0, while (α_j, α_i) = 0 for j ≠ i. So k_i = 0. Therefore α_1, ..., α_m are linearly independent. Thus the number of pairwise orthogonal nonzero vectors cannot exceed n in an n-dimensional inner product space. This fact is also apparent in R^2 and R^3: in a plane we cannot find three pairwise orthogonal nonzero vectors, and in space we cannot find four pairwise orthogonal nonzero vectors.

Suppose α_1, ..., α_n are pairwise orthogonal nonzero vectors in V, not all of which are unit vectors. We normalize each vector α_i to obtain the unit vector α_i / |α_i|. From (kα_i, lα_j) = kl(α_i, α_j) = 0, i ≠ j, we see that the normalized vectors are still pairwise orthogonal. Therefore it suffices to find n pairwise orthogonal nonzero vectors to obtain an orthonormal basis for V. The following theorem gives us a method of finding an orthonormal basis.

Theorem 2. (Gram-Schmidt process) Let V be an inner product space and W ≠ o a subspace of V with a basis {α_1, ..., α_n}. Then there exists an orthonormal basis {η_1, ..., η_n} for W.

Proof. We prove this by constructing the desired basis {η_1, ..., η_n}. We first find an orthogonal basis {β_1, ..., β_n} for W as follows. We pick any one of the vectors α_1, ..., α_n, say α_1, and call it β_1. Thus β_1 = α_1. We next look for a vector β_2 in the subspace W_1 of W spanned by α_1 and α_2, which is orthogonal to β_1. Since β_1 = α_1, W_1 is also the subspace spanned by β_1 and α_2. Thus β_2 = k_1 β_1 + k_2 α_2. We then determine k_1 and k_2 so that (β_1, β_2) = 0. Now

0 = (β_2, β_1) = (k_1 β_1 + k_2 α_2, β_1) = k_1(β_1, β_1) + k_2(α_2, β_1).
Note that β_1 ≠ o, so (β_1, β_1) ≠ 0. Then

k_1 = −k_2 (α_2, β_1) / (β_1, β_1).

We may assign an arbitrary nonzero value to k_2. For convenience let k_2 = 1, which gives

k_1 = −(α_2, β_1) / (β_1, β_1),

and hence

β_2 = k_1 β_1 + α_2 = α_2 − ((α_2, β_1) / (β_1, β_1)) β_1.

At this point we have an orthogonal subset {β_1, β_2} of W (Figure 2).
Next, we look for a vector β_3 in the subspace W_2 of W spanned by {α_1, α_2, α_3}, which is orthogonal to both β_1 and β_2. Of course W_2 is also the subspace spanned by {β_1, β_2, α_3}. Thus β_3 = t_1 β_1 + t_2 β_2 + t_3 α_3. We let t_3 = 1 and find t_1 and t_2 so that (β_3, β_1) = 0 and (β_3, β_2) = 0. Thus

0 = (β_3, β_1) = (t_1 β_1 + t_2 β_2 + α_3, β_1) = t_1(β_1, β_1) + (α_3, β_1),
0 = (β_3, β_2) = (t_1 β_1 + t_2 β_2 + α_3, β_2) = t_2(β_2, β_2) + (α_3, β_2).

Observe that β_2 ≠ o. Solving for t_1 and t_2 we obtain

t_1 = −(α_3, β_1) / (β_1, β_1),   t_2 = −(α_3, β_2) / (β_2, β_2).

Thus

β_3 = α_3 − ((α_3, β_1) / (β_1, β_1)) β_1 − ((α_3, β_2) / (β_2, β_2)) β_2.

At this point we have an orthogonal subset {β_1, β_2, β_3} of W (Figure 3). We next seek a vector β_4 in the subspace W_3 spanned by {α_1, α_2, α_3, α_4}, and also by {β_1, β_2, β_3, α_4}, which is orthogonal to β_1, β_2, β_3. We can then write

β_4 = α_4 − ((α_4, β_1) / (β_1, β_1)) β_1 − ((α_4, β_2) / (β_2, β_2)) β_2 − ((α_4, β_3) / (β_3, β_3)) β_3.
Applying the method of induction: suppose we have constructed n − 1 pairwise orthogonal nonzero vectors β_1, β_2, ..., β_{n−1}, i.e., (β_i, β_i) ≠ 0 and (β_j, β_i) = 0 (j ≠ i). We now look for a vector β_n in the subspace W_{n−1} of W spanned by {α_1, α_2, ..., α_n}, which is orthogonal to β_1, β_2, ..., β_{n−1}. Of course W_{n−1} is also the subspace spanned by {β_1, β_2, ..., β_{n−1}, α_n}. Thus

β_n = s_1 β_1 + s_2 β_2 + ... + s_{n−1} β_{n−1} + s_n α_n.

We let s_n = 1 and try to find s_1, s_2, ..., s_{n−1} so that

(β_n, β_i) = s_1(β_1, β_i) + s_2(β_2, β_i) + ... + s_{n−1}(β_{n−1}, β_i) + (α_n, β_i) = s_i(β_i, β_i) + (α_n, β_i) = 0   (i = 1, 2, ..., n − 1).

Observe that β_i ≠ o. Solving for s_i we find

s_i = −(α_n, β_i) / (β_i, β_i),

and hence

β_n = α_n − ((α_n, β_1) / (β_1, β_1)) β_1 − ... − ((α_n, β_{n−1}) / (β_{n−1}, β_{n−1})) β_{n−1}.

Thus β_1, β_2, ..., β_n are n pairwise orthogonal nonzero vectors. If we now let

η_i = β_i / |β_i|   for i = 1, 2, ..., n,
then {η_1, η_2, ..., η_n} is an orthonormal basis for W. Hence we have:

Theorem 3. Any nonzero inner product space has an orthonormal basis.

Example 1. Suppose {α_1, α_2, α_3} is a basis for a subspace U of the inner product space R^4, where α_1 = (1, 1, 0, 0), α_2 = (1, 0, 1, 0), α_3 = (−1, 0, 0, 1). Find an orthonormal basis of U.

Solution: We first find an orthogonal basis of U. Taking β_1 = α_1, by a computation similar to that in Example 1 of Sec. 5.3, we obtain

β_2 = (1/2, −1/2, 1, 0),   β_3 = (−1/3, 1/3, 1/3, 1).

Normalizing, we obtain the desired orthonormal basis

η_1 = (1/√2, 1/√2, 0, 0),   η_2 = (1/√6, −1/√6, 2/√6, 0),   η_3 = (−1/(2√3), 1/(2√3), 1/(2√3), √3/2).
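As a cross-check, here is a minimal Gram-Schmidt sketch (Python/NumPy; the function name is ours, not the book's). Run on the three vectors of this example, it reproduces the orthonormal basis above, up to sign.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors (minimal sketch)."""
    basis = []
    for a in vectors:
        b = a.astype(float)
        for q in basis:
            b = b - (a @ q) * q      # subtract the projection onto each earlier vector
        basis.append(b / np.linalg.norm(b))
    return np.array(basis)

alphas = [np.array([1, 1, 0, 0]),
          np.array([1, 0, 1, 0]),
          np.array([-1, 0, 0, 1])]
etas = gram_schmidt(alphas)
print(np.round(etas, 4))                        # the orthonormal basis found above
print(np.allclose(etas @ etas.T, np.eye(3)))    # pairwise orthonormal
```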
The solution is complete.

From analytical geometry it is easy to see that

(cos θ, sin θ), (−sin θ, cos θ)   and   (cos θ, sin θ), (sin θ, −cos θ)

are two orthonormal bases for R^2. The following is its converse.

Example 2. Prove that any orthonormal basis for R^2 can be written as

(cos θ, sin θ), (−sin θ, cos θ)   or   (cos θ, sin θ), (sin θ, −cos θ).

Proof. Let α and β be an orthonormal basis for R^2, α = (a, b). Then |α|^2 = a^2 + b^2 = 1 and so |a| ≤ 1, and there exists θ such that a = cos θ. Since b^2 = 1 − cos^2 θ, b = ±sin θ. Taking θ appropriately, we have α = (cos θ, sin θ). Likewise we have β = (cos φ, sin φ). From

0 = (α, β) = cos θ cos φ + sin θ sin φ = cos(φ − θ),

we obtain

φ = θ + π/2   or   φ = θ − π/2.
Therefore β = ±(−sin θ, cos θ), and the statement is proved.

The following are two basic properties of an orthonormal basis for V. Suppose {α_1, ..., α_n} is an orthonormal basis for V. Then any element α in V can be expressed in terms of inner products as

α = (α, α_1)α_1 + ... + (α, α_n)α_n.   (1)

For if α = a_1 α_1 + ... + a_n α_n, we have

(α, α_i) = Σ_j (a_j α_j, α_i) = a_i(α_i, α_i) = a_i.

Conversely, if (1) holds, then as

α_i = 0α_1 + ... + 1·α_i + ... + 0α_n   and   α_i = (α_i, α_1)α_1 + ... + (α_i, α_i)α_i + ... + (α_i, α_n)α_n,

from the uniqueness of the expression for α_i we obtain (α_i, α_i) = 1, (α_i, α_j) = 0, i ≠ j. That is to say, (1) is a necessary and sufficient condition for {α_1, α_2, ..., α_n} to be an orthonormal basis.

Moreover, suppose {α_1, α_2, ..., α_n} is an orthonormal basis and

α = a_1 α_1 + ... + a_n α_n,   β = b_1 α_1 + ... + b_n α_n.

From the definition, or from the fact that the metric matrix is the unit matrix, it follows that

(α, β) = Σ_{i=1}^n a_i b_i.   (2)

Conversely, if (2) holds, then as α_i = 0α_1 + ... + 1·α_i + ... + 0α_n, clearly (α_i, α_i) = 1, (α_i, α_j) = 0, i ≠ j,
Linear Algebra
i.e. (2) is the sufficient and necessary condition for { Q I , C*2, ■ ■ ■ > " n } to be an orthonormal basis. Thus if we take an orthonormal basis for V, then the expressions for length of the vector a and the angle between a and 3 can be simplified respectively to
(3)
\a\ = sja\ + --- + al, aibi H COS 6 —
. ,
h —.
=
.
^a{ + ... + aly/b\ + --- + bl Here we obtain the same results as specified in Sec. 5.3. They are a general ization of the corresponding concepts in analytical geometry. Theorem 4. Suppose U is a subspace of an n-dimensional inner product space V. Then V is the direct sum of U and U1: V = [/ e t / x .
Proof. Suppose {CKI, • ■ ■, a m } is an orthonormal basis for U. We choose such that {oti,...,am,am+i,...,an} is an orthonormal basis for V. Then the subspace generated from a m + i , . . . , a n is simply Ux. This is because (a*, fcm+iam+ji H
h fcnan) = 0 , i = 1 , . . . , m .
Moreover if (a*, a ) = 0, i = 1 , . . . , m, then a = (a, a m + i ) a m + i H
h (a, an)an
.
Thus F is the sum of U and U*-. Since C n C/-1 = 0, V is the direct sum of U and U-1, i.e., V = C/ © C/-1-. The proof is complete. Therefore we have dim U = n — dim U . From the above theorem, we know that every vector a in V can be uniquely expressed in the form 3 + is, where 3 is in U and u is in t / x , in the orthogonal complement of U, i.e., a = 3 + v. (4)
369
Inner Product Spaces
The vector /3 in U is called the orthogonal projection of a on U and is denoted by projua. In Figure 4 below we illustrate Theorem 4, where U is a twodimensional subspace of R? (a plane through the origin).
Fig. 4.
E x a m p l e 3. Let U be a two-dimensional subspace of R? with orthonormal basis {oti,an}, where (2 «i
1 _2\'
~ V3' 3'
""''
3/
,K
{
^°'72
Find the orthogonal projection of a = (2,1,3)' on t/ and the vector v that is orthogonal to every vector in U. Solution: From (1), we have /3 = projua = ( a a i ) a i + =
1
-3
5
ai +
7!
a2=
/41
(a,a2)a2 1 49
U ' " 9 ' 18
From (4), we have . "
= a
-
/ 3 =
, 5 10 5 ' " I S ' T ' 18
The solution is complete. It is clear from Figure 4 above that the distance from a to the plane U is given by the length of the vector v = a — (3, that is, by \a - projua|
(5)
E x a m p l e 4. Let U be the subspace of R3 defined in the above example and a = (1,1,0)'. Find the distance from a to U.
370
Linear Algebra
Solution: We compute first projya = ( a , a i ) a i + ( a , a 2 ) a 2
=ai+
1
7i
a2 =
(1
1
IV'
U' vi)'
then a - p r o j u a = (1,1,0) - I - , - ,
--1
=VP6' 3'- 6j-)'' and finally from (5) we have , 1
4
1
V2
|a-prOJua| = y - + - + - = —. Thus the distance from a to U is \f2j2. The solution is complete. E x a m p l e 5. Let V be an n-dimensional inner product space, T and T" be two linear transformations on V. The matrices of T and T" with respect to an orthonormal basis {ai,ot2, ■ ■ ■ ,&n} for V are respectively A = (<Jjj) and ^4' (transposed matrix of A) = (a,ji), then (Ta,/3) = (a,T'/3) for any vectors a and /3 in V. Proof. Let a = xiai +
h xnan
,
@ = yiai H
h i/„a n .
As n
T a i = 2 J ajiOij , 3=1
we have n i=l
n
/ n
\
j'=l \ i = l
/
and so n
/ n
\ a iXi
n
/ n Xi
(Ta,(3) = Y^ [ Y, i ) ' yj = Yl \
\ a iy
Hi i
= (<x,T'P). The proof is complete. If we replace the linear transformation T with the matrix A, the above example is just Example 1 of Sec. 8.1. Likewise we have (a,T0) = (T'a,0). If A is a symmetric matrix, i.e., A = A', then T = T'. Thus we have (Ta, 0) = (a, T/3), (a, T(3) = (Ta,
0).
A linear transformation having the above property is known as a symmetric transformation. From the proof in Example 5, it is easy to see that a matrix of T with respect to an orthonormal basis is a symmetric matrix. Therefore eigenvalues of T are real numbers. We conclude this section with a discussion about the relationship between an inner product and a linear functional. It is an important property of the inner product. Let V be an inner product space and v an element in V: T(a) = (v,a),
aeV.
Using the properties of an inner product we find T(ka) = {u, ka) = k{y, a) =
kT(a),
T(a + {3) = (v,oc + (3) = {u, a) + (i/, 0) = T{oc) + T(J3). Thus T is a linear functional on V, i.e., T G V. That is to say, an inner product on V is a linear functional on V. Its converse is true also. Thus we have: T h e o r e m 5. Let V be an inner product space and T a linear functional on V, then there is a unique element v in V such that T(a) = (L>,a),
aeV.
In other words, any element in V can be expressed as an inner product on V. Proof. Let { a i , a 2 , . . . , a n } be an orthonormal basis for V and {on, &2, ■ ■ ■) ocn\ be a dual basis on V, then T is expressible as T = &i&i H
h knan .
Take v = ki<X\ -\ h knan. Then for any element a = a i a i H V, according to (2) in Sec. 7.6 we obtain
TT aa
fcidi
aiCXi =
\- anan
in
ki
( ) = (5Z f c i d i ) ((mm a i a v ) = 1Z ^2ki( a)>■) ■
Moreover, ver, suppose T(a) T(a) = (ui,a), (ui,a), or (v,a) (u,a) — = (ui,tx), (i/i,a), then [y (v — - v\,a) f\,a) = 0. Since a(is is any element in V, when a = v1/ — V\, t»i, we have (1/ (i/ — fi^i, i, vis — V\) u{) = 0 . Therefore v\=v. Thus f is uniquely determined by T and the theorem holds. Theorem 6. Suppose that / ( a , 0) is a bilinear form on an inner product space V, then there is a unique linear transformation T such that /(a,/3) = (a,T/3). Proof. For any element /9 in V, f(a, /3) is a linear functional on V. Then the above theorem gives that there exists an element v such that / ( a , / 3 ) = (ex, v>), a € V. Thus for any element /3 in V, there is a unique element v corresponding to it. Let the correspondence be T. Then T(3 = v. Thus / ( a , / 3 ) = (a,T/3). As / ( a , / 3 ) is a bilinear form, T is a linear transfor mation. We next show that T is unique. In fact, if (a,Ti/3) = (a,T20), then (a,Ti/3-T 2 /3) = 0, or (a, (T1-T2)fi) = 0. Hence ((T1-T2)(3,(T1-T2)fi) = 0, so (Ti — T2)/3 = 0. Since /3 is any element, we have T\ = T2. Therefore T is unique. The proof is complete. Exercises 1. Let V be a linear space consisting of all polynomials whose degrees do not exceed 3 and the zero polynomial, and define an inner product as (f(x),
g(x)) = /
f{x)
g(x)dx.
Check that V is an inner space and find an orthonormal basis for it. 2. Suppose {oci, c*2, ■ ■ • , a n } is a basis for an inner product space of dimen sion n, a = XiOci + •'• + xnotn is any element whose coordinates are X{ = | a | cos6i, where 0j is the angle between a and on. Show that {QI, a.2, ■ ■ ■, otn} is an orthonormal basis.
3. Suppose { a i , a 2 ) . . . , a n } is an orthonormal basis for an inner product space V. Prove that vectors >9i, /32»■ • • , An in V are pairwise orthogonal if and only if n
53G9i l a t )C9 i) a t )=0,
i^j.
4. Suppose U\ and C/2 are two subspaces of an inner product space V. Prove that (1) {Ut)L
(2) (Ut n U2)L = Ut + Ui .
= Ui,
8.3. Orthogonal Linear Transformations Rotation and reflection transformations in analytical geometry are known to have one specific property that they preserve the length of a vector and the angle between two vectors. Hence they preserve inner products. In this section we shall consider linear transformations on an inner product space that preserve metric properties. Definition 1. Assume that T is a linear transformation on an inner prod uct space V. If it preserves the length of any vector in V, i.e., (a,a) = (Tat, Tex), then T is known as an orthogonal linear transformation on V. Thus the rotation transformation and the reflection transformation in R2 and R3 are orthogonal transformations. Example 1. Suppose T is a linear transformation on the inner product space R3 defined by T(x,y,z)
=
(y,z,x).
Prove that T is an orthogonal transformation, i.e., (a, a) = Proof. Let a = (x,y,z). (Ta,Ta)
As =
({y,z,x),(y,z,x)) 2
= y + z2 + x2 =
{a,a),
T is an orthogonal transformation. The proof is complete.
(Ta,Ta).
We considered reflection transformation in Sec. 7.3. At that time it was expressed in terms of its coordinates. We shall now express it in terms of inner products. Reflection is sometimes called a mirroring reflection. Let a be a unit vector in the inner product space R3. We see from Figure 5 that the projection vector of vector £ on a is |£| cos to ■ a = (a, £ ) a .
Fig. 6
Fig. 5.
It is easy to see from Figure 6 that the mirroring reflection or mirroring trans formation with respect to the plane 7r to which the vector a is a normal is
Example 2. Show that the mirroring transformation T a (£) is an orthog onal transformation. Proof. By definition Ta is a linear transformation. As (Ta£,Tat)
= « - 2(o,«)a,€ - 2(a,€)a)
Ta is an orthogonal transformation. The proof is complete. Assume that T is a linear transformation on an inner product space V, a and /3 are any two vectors in V. If (a,j8) = (Ta,T(3), clearly ( a , a ) = (Ta,Ta). Conversely, if (a,a) = (Ta,Ta), then (a,/3) = (Ta,T/3). This is because
(a + /3, a + 0) = (T(a + 0), T(a + (3)) = (Ta + T0, Ta + T0). Multiplying out the two sides of above equality, we obtain ( a , a ) + 2(a,,3) + (/3,/3)
= (Ta, Ta) + 2(Ta, T0) + (T/3, T0).
However (a,a)
= (Ta,Ta),
(/3,/3) = (Tf3,T/3). So
(a,p) = (Ta,T0). Thus we have: Theorem 1. A linear transformation on an inner product space V is an orthogonal transformation if and only if it preserves the inner product of any two vectors, i.e., (a,/3) = (Ta,Tf3). Thus as long as linear transformation on an inner product space preserves lengths of vectors, it preserves the inner product and consequently the angles between any two vectors. Therefore it transforms unit vectors onto unit vectors and a set of pairwise orthogonal vectors onto a set of pairwise orthogonal vectors, and consequently an orthonormal basis onto an orthonormal basis. The converse is also true. Thus we have: Theorem 2. A linear transformation T on an inner product space V is orthogonal if and only if T carries each orthonormal basis for V onto an orthonormal basis for V. Proof. As stated above, the necessary condition holds. Next we have to show that the sufficient condition also holds. n
Suppose { a i , Q 2 , . . . , a „ } is an orthonormal basis for V and a = Yl
a a
ii
n
is any vector in V. Then ( a , a ) = a\ + a% + ...+a\.
Since Ta = ^2
aiTat,
t=i
and {Toti,Toc2, ■ ■■, Tan}
is also an orthonormal basis for V, we have
(Ta,Ta)
= a\ + ■ ■ ■ + a\ .
Thus (a, a) — (Ta,Ta). That is to say, T is an orthogonal transformation. Therefore the theorem holds. An orthogonal transformation is a one-one transformation. Hence it is also a nonsingular transformation, as it transforms a basis onto a basis. We already know that a linear transformation on V can be represented by a matrix with respect to a basis given for V. We now ask what is the matrix of an orthogonal transformation with respect to an orthonormal basis. Assume that {a\, ot2, ■ ■ ■, c*n} is an orthonormal basis for an inner product space V and T is an orthogonal transformation on V. If Ton = a i i « i + h ^ n i ^ n j l*e.,
T(cti,...,an)
= (ai,...,an)A,A
= (a*,),
then A is simply the matrix of T with respect to {a\,a2, { a i , 0L2, • ■ ■, « n } is an orthonormal basis, we have
■ ■ ■ ,otn}.
As
n (TOL^TOLJ)
= ^atidtj
■
t=i
As {Tai,Toc2, ■. ■,Tan} is also an orthonormal basis, n
22atiatj
= 8ij ■
(1)
t=l
This is the orthogonal condition (see Sec. 3.2) and so A'A = E. Hence A is an orthogonal matrix. Conversely, if A is an orthogonal matrix, i.e., A'A = E, then (1) holds and hence (Tai,ToLj) =Sij . Thus {Tai,T(X2, ■ ■ ■ ,Tan} is an orthonormal basis for V and so T is an orthogonal transformation. Thus we obtain: T h e o r e m 3. A linear transformation on an inner product space V is an orthogonal transformation if and only if a matrix of T is an orthogonal matrix with respect to an orthonormal basis. It should be remarked that the matrix of an orthogonal transformation with respect to an orthonormal basis is an orthogonal matrix. However with respect to other bases, its matrices may or may not be orthogonal. For example, suppose {i,j, k} is an orthonormal basis for the inner product space R3 and T is a reflection transformation on the plane in which lie j and k. It is clear that T preserves lengths of vectors, and so is an orthogonal transformation. Since
Ti=-i,
Tj=j,
the matrix of T with respect to {i,j,k}
Tk = k, is
( ■ ' ■ . ) ■
Obviously, it is an orthogonal matrix. On the other hand, if we take {i, j , I = z + j + f e } a s a basis for R?, it is not an orthonormal basis and the matrix of T with respect to {i,j,l} is -1 0
V o
0 -2\ 1 0 0 1/
which is not an orthogonal matrix. Moreover, if we take {l\ = \i + j,h = —\i +i> fc} as a basis for R3, it is not an orthonormal basis, while the matrix of T with respect to {li,l2,k} is
°\
0 1 1 0 0 \0 0 1/
which is an orthogonal matrix. We know that the product of two orthogonal matrices is an orthogonal ma trix, the inverse of an orthogonal matrix is an orthogonal matrix, so the product T1T2 of two orthogonal transformations Ti and T2 on an inner product space V is still an orthogonal transformation on V. An orthogonal transformation T is an invertible transformation and T _ 1 is also an orthogonal transformation. The results readily follow from definition, too. This is because from
(r1r2a,r1:r2a) = (ri(r a o),ri(r a a)) = (T2a,T2a)
=
{a,a),
we see that T\T2 is an orthogonal transformation. Moreover, T is a nonsingular transformation. Let T~1a = a'. Prom {T^a,
r _ 1 a ) = ( a ' , a ' ) = (Tat', Tot') = (a, a ) ,
it follows that T - 1 is also an orthogonal transformation. We discuss below the geometric meaning of an orthogonal transformation. First we consider the case of a one-dimensional inner product space Rr. Let a be a basis for i? 1 , T an orthogonal transformation, and Ta — act. Since (Ta, Ta) = (aa, aa) = a2(a, a) = (a, a ) , a 2 = 1, or a = ± 1 . Thus there are two orthogonal transformations Tic* = a
and
T2a — —a.
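Anticipating the two-dimensional discussion that follows, here is a small sketch (the angle is an arbitrary made-up value) showing that the rotation matrix and the rotation-plus-reflection matrix of R^2 are orthogonal, have determinant +1 and −1 respectively, and preserve inner products:

```python
import numpy as np

theta = 0.7   # arbitrary angle, for illustration only

R = np.array([[np.cos(theta), -np.sin(theta)],    # rotation
              [np.sin(theta),  np.cos(theta)]])
S = np.array([[np.cos(theta),  np.sin(theta)],    # rotation followed by a reflection
              [np.sin(theta), -np.cos(theta)]])

for A in (R, S):
    print(np.allclose(A.T @ A, np.eye(2)), round(float(np.linalg.det(A))))  # orthogonal; det = +1 or -1

x, y = np.array([1.0, 2.0]), np.array([-3.0, 0.5])
print(np.isclose((R @ x) @ (R @ y), x @ y))   # inner products are preserved
```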
Fig. 7.
Furthermore, let {a = (1,0), 3 = (0,1)} be an orthonormal basis for R2 and T an orthogonal linear transformation on R2. As a' =Ta
= (au,o2i),
/3' = T / 9 = (ai2,a 2 2 ), the orthogonal matrix of T with respect to a basis {a, 3} is the matrix A = I. From AA' = 25, we obtain a n a 2 i + 012*222 = 0 and so ai 2 = 121
^22/
fcan, a2i = —A;a22. Then since |A| = aua.22 — 012^21 = ± 1 , we have (1 + k2)ana,22 = ±1- From ^4'J4 = E, we obtain afx + a\2 = a\x + a22- Therefore (1 + A;2)ai1 = (1 + k2)a22, and so a n = ±a 2 2- Thus when \A\ = 1, we have an = ^22; when |J4| = —1, we have a n = —a22- From analytical geometry we know ( a , a ' ) = | a | ■ |a'|cos0 = cos0, where a and a ' are unit vectors, while
(a,a') = ((l,0),(an,a 2 i)) = an, so (a,a') =cos6 = a,u. Similarly, {3,3') = cosy = a22- Since a ' and 3' are mutually orthogonal vectors, when \A\ = 1, we have cos^ = cosy, i.e., 6 =
a' = Tot = ( a n , a 2 i , a 3 i ) , /3' = Tf3 = (ai2, a 2 2, a 3 2 ) , v' —Tv—
(ai3,023,033)-
Then the orthogonal matrix of T with respect to basis {a, /3, u} is an
ai2
ai3
^21 ^31
«22 «32
«23 033 ,
Prom analytical geometry we know that the mixed product of unit vectors a', (3', and u' equals ( a ' x /3') • 1/ = |A| = ± 1 and the absolute value of the cross product of the mutually orthogonal vectors a' and /3' is \a' x / 3 ' | = |a'||/3'||sin(9| = 1. Hence ( a ' x /3') • 1/ = | a ' x /3'| • |i/'|cosip = cos? = ± 1 ,
Fig. 8.
Therefore when |.A| = ± 1 , we have >p = 0, and so the directions of a , / 3 , and v coincide with those of a', j3', and v' (e.g. if a, (3, u form a right-handed system, then a ' , / 3 ' , i / also form a right-handed system). Consequently T is a rotation transforming a,/3, and v into a',/3', and 1/ in turn. When |.A| = —1, we have tp = 7r, and the directions of a , /3, and 1/ do not coincide with those of a ' , / 3 ' , and v' (e.g. if a , / 3 , and 1/ form a right-handed system, then a ' , / 3 ' , and v' form a left-handed system). So T is a linear transformation performing firstly a rotation transforming a,f3,i> into a',/3',— 1/ in turn, and then a reflection with respect to the plane determined by a ' and /3' (see Figure 8).
To summarize, when the determinant of the orthogonal matrix A of T equals one, i.e., |J4| = 1, an orthogonal transformation T on R2 or R3 is a rotation. When |^4| = — 1, T is a linear transformation performing firstly a rotation and then a reflection. As before, suppose T is a linear transformation from an inner product space V onto an inner product space U. If it preserves length of any vector, i.e., (a, a ) = (Tot, Tat), then T is called an orthogonal transformation. If T is an invertible orthogonal transformation, then T is called an isomorphism from V onto U. In this case we say that V is isomorphic to U, denoted by V m U. Isomorphic inner product spaces can be regarded as the same inner product space. Example 3. Let T be a linear transformation from the inner product space R2 into the inner product space RA, defined by T(a,b) =
-^=(a,b,b,a).
Prove that T is an orthogonal transformation. Proof. Let a = (a,b). As (Ta,Ta)
=
(-j=(a,b,b,a),-j={a,b,b,a)
= \(a2 + b2 + b2+a2)
= a2 + b2
= (a, a), the statement is true. The following theorem is similar to Theorem 2 in Sec. 7.5. T h e o r e m 4. Two finite-dimensional inner product spaces V and U over K are isomorphic if and only if their dimensions are equal. Proof. If V ~ U, then V and U, as two linear spaces, are isomorphic. Then we know from Theorem 2 in Sec. 7.5 that the dimensions of V and U are equal. Conversely, if the dimensions of V and U are each equal to n and { a i , . . . , a n } and {f$\tfoi • • • > Ai} a r e t w 0 orthonormal bases for V and U respectively,
then T(J2 ki&i) = ]£ fctA- It is apparent that T is an invertible linear trans formation from V into U. Further, as
(T (j2 hoi), T (j2 hoi)) = (J2 **ft. E**&) = E *? T is an orthogonal transformation from V into t/, and hence T is an isomor phism from V onto £/, V ~ C/. The proof is complete. Thus an n-dimensional inner product space is isomorphic to the inner prod uct space Rn. Exercises 1. Assume that { a i , 0.2,0.3} is an orthonormal basis for an inner product space V. Find an orthogonal transformation T on V such that 2 2 1 Tax = - a i + - a 2 - - a 3 ,
2 1 2 Ta2 = - a i - - a 2 + - a 3 .
2. Suppose that {i, j , k} is an orthonormal basis for R3. Find an orthogonal transformation T on R3 such that Tk is a unit vector normal to the plane x + y - 1 = 0. 3. Assume that the lengths of vectors a and /3 in an inner product space V are equal. Show that there exists an orthogonal transformation T such that Ta = @. 4. Assume that ati,ct2,Pi,(32 a r e vectors in an inner product space V and |c*i| = \0i\, \ct2\ = |/321» and (01,02) = (y3i,/?2)• Show that there exists an orthogonal transformation T on V such that
ra1=/31,Ta2=/32. *5. Assume that T is an orthogonal transformation on an inner product space V and U is an invariant subspace of V under T. Show that the orthogonal complement U1- of U is also an invariant subspace of V under T. 8.4. Linear Spaces over Complex Numbers with Inner Products In the first three sections of this chapter we considered linear spaces over real numbers with an inner product. In this section we shall consider linear
spaces over complex numbers with an inner product, i.e., a unitary space. The latter is a generalization of the former. Many concepts, definitions, and proper ties that have been established for the former can be carried over immediately to the latter. Since we have explicitly stated them before, in this section we shall list only the main results. As for their proofs, we shall not repeat them. If the reader should consider this too concise, they can certainly fill it in as an exercise. For a linear space over complex numbers, can we use Definition 1 in Sec. 8.1 as the definition of a linear space over complex numbers with an inner product? Obviously, this is not possible, because Condition 4 of Definition 1 need not be true for complex numbers. That is, according to the definition in Sec. 8.1, the length of a vector is no longer always a positive number. For example, in a 3-dimensional linear space C 3 over complex numbers, a = (3,4,5i) ^ 0, while (a, a ) = 9 + 16 — 25 = 0. To avoid the occurrence of such a situation and guarantee the validity of Condition 4, the definition of an inner product as given before is to be appropriately modified. Thus we have: Definition 1. Let V be a linear space over complex numbers. For any two vectors a and /3 in V, there exists a complex number corresponding to them, denoted (a,/3). If this complex number (a,/3) satisfies the following conditions, then it is called an inner product of a and (3:
1. (u,0) = (ftZ)] (where the bar denotes complex conjugate) 2. (ka,(3)=k(a,(3); 3. ( a 1 + a 2 , / 3 ) = (a 1 ,/3) + (a 2 ,/3); 4. (a, a ) > 0, and moreover, (a, a ) = 0 only if a = o. Here V is called a linear space over complex numbers with inner products, or sometimes a unitary space. Note that the inner product defined here differs from that in Sec. 8.1 only in Condition 1. Although (a, /3) here is a complex number generally, Condition 1 requires (a, a ) to be a real number. If there were no Condition 1, Condition 4 would be meaningless. It is clear that a linear space over real numbers with inner products is a special case of a linear space over complex numbers with inner products. A subspace of a linear space over complex numbers with inner products is still a linear space over complex numbers with inner products.
If in a linear space over complex numbers C" for any two vectors a = ( a i , . . . , a n ) and (3 = (bi,..., bn), where bi and a* are complex numbers gener ally, we define (a, 13) = ojfei H
\-anbn,
then Cn is a linear space over complex numbers with inner products. In a linear space Cn formed by all matrices of order n with complex elements, if A = (a,ij) and B = (hj), we define (A, B) = Y^ aijbij, i.e.,
(A,B) =
tt(AW),
then Cn is simply a linear space over complex numbers with inner products. The following are some basic properties of inner products which can be readily drawn from definition: ( l ) ( a , J / 3 ) = Z(a,/3);
(2)(a,0i+ft) = (a l 0i) + (a,A); (3)(a,o) = (o,/3)=0;
(
n
fl
\
n
_
^fciQi, E ^ J 1 = E kilj((Xi,Pj). Let V be an n-dimensional linear space over complex numbers with inner products, {ai,tX2, ■■ ■, a n } be a basis for V, and a = xicxi H \- xnan, (3 = yicti -\ 1- ynan be any two vectors in V. Prom the properties of an inner product, we find n
Let (ai,ctj)
= aij(i,j = 1,2,... ,n). Then (a,(3) =
XAY',
where A = (a,ij),X = (JCJ, . . . , x n ) , y = ( ^ j , . . . ,yn). Hence the inner product of any two vectors in V is determined uniauelv by A. As before, the matrix
A is called the metric matrix of V with respect to the basis (ai,... ,an). Obviously A = A , where A = (<%). Analogous to a matrix with real elements, the matrix A with complex ele ment is called Hermitian (C. Hermit, 1822-1901) matrix if A = A . A\s called skew-Hermitian matrix if A = —A . A Hermitian matrix with real elements is simply a symmetric matrix. A metric matrix is a Hermitian matrix. Assume that A is a Hermitian matrix. As \A\ = \A'\ = \A\,\A\ is a real number. Here, although A is a matrix with complex elements, \A\ is a real number. A skew-Hermitian matrix with real elements is simply a skew-symmetric matrix. A is a skew-Hermitian matrix if and only if iA is a Hermitian matrix. E x a m p l e 1. Let T be a linear transformation on an inner product space V over complex numbers. If {Tex, a) = 0 for any a € V, then T = o. Proof. On account of (T(a + (3), a + /3) = 0, expanding the left-hand side of the equality and using (Ta, a) = 0 and {T(3, /3) = 0, we obtain (Ta,(3) +
(T(3,a)=0.
However, as (Ta, i/3) = i(Ta, (3) = -i(T(3, a) = i{T(3, a), (T(ip),<x) = i{T/3,a) + i(Tl3,a)=0,
(iT0,a)=i(T(3,a), i.e.,
(T/3,a)=0.
From Example 3 in Sec. 8.1, it now follows that T = o. The proof is complete. As in Definition 3 in Sec. 8.1, the length of a vector is defined as \a\ = \ / ( a , « ) ; the cosine of the angle 6 between two nonzero vectors is defined as 2
(a,/3)(Aa) (a,a)03,/3)
If (a,/3) = 0, then cos# = 0 and a and /3 are called mutually orthogonal vectors, denoted a ± / 3 . The zero vector is orthogonal to any vector in V. If \a\ = 1, a is known as a unit vector. As in Sec. 8.1, we have the CauchySchwarz inequality (a,W,a)<(a,a)(ft/3), where equality holds if and only if a and (3 are linearly dependent.
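A brief numerical sketch of this complex inner product on C^n (random vectors; the helper follows the convention used here, linear in the first argument and conjugated in the second):

```python
import numpy as np

rng = np.random.default_rng(2)

def inner(a, b):
    """(alpha, beta) = sum of a_i * conj(b_i)."""
    return np.sum(a * np.conj(b))

a = rng.standard_normal(3) + 1j * rng.standard_normal(3)
b = rng.standard_normal(3) + 1j * rng.standard_normal(3)

print(np.isclose(inner(a, b), np.conj(inner(b, a))))        # condition 1
print(inner(a, a).real > 0, abs(inner(a, a).imag) < 1e-12)  # (alpha, alpha) is real and positive
print(abs(inner(a, b))**2 <= inner(a, a).real * inner(b, b).real)  # Cauchy-Schwarz
```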
In addition, we have
|a + /3|^|a| + |/3|. The proofs for the above could be copied verbatim from those for a linear space over real numbers with inner products. As before, if { a i , . . . , a n } is a basis for a linear space V over complex numbers with inner products and Q i , Q 2 , . . . , a n are pairwise orthogonal unit vectors, then { a i , . . . , an} are known as an orthonormal basis. Since a set of pairwise orthogonal vectors is a set of linearly independent vectors, pairwise orthogonal unit vectors in an n-dimensional linear space over complex num bers with inner products are simply an orthonormal basis. As before, we can find an orthonormal basis from a basis for V through orthogonalization and normalization. Thus we also have: T h e o r e m 1. There exists an orthonormal basis for any nonzero linear space over complex numbers with inner products. In a linear space over complex numbers with inner products, using an or thonormal basis can simplify many problems. For example, if {ot\,..., an} is an orthonormal basis for a linear space V over complex numbers with inner products, then a can be written as a = (a, ai)ai
H
h (a, an)an
.
for any vector a in V. Again if a = aioti + ■ ■ ■ + anan,
(3 = bioci H
+ bnan ,
then an inner product (a, (3) of a and (3 can be simplified to n
(a,/3) =
^2aibi. i=l
Hence n \a\2 =
}aia,i. i=l
Analogous to Sec. 8.3, if a linear transformation T on a linear space V over complex numbers with inner products preserves the length of any vector a in V, i.e.,
(a,a) =
{Ta,Ta),
then T is called a unitary transformation. Thus here we also have:
Theorem 2. A linear transformation T on a linear space V over complex numbers with inner products is a unitary transformation if and only if (a,P) =
(Ta,T(3).
Proof. Sufficient part of the theorem holds obviously. It suffices to prove the necessary part of the theorem. If T is a unitary transformation on V, then (a+p,a
+ p) = (T(a + (3),T(a + p)) = (Ta + T/3,Ta + TP).
Multiplying out the left-hand and right-hand sides and simplifying, we obtain (a, 0) + (ft a ) = (To, T/3) + (T/3, To). Replacing a. in the above formula with ia, according to (ka,P) = k(a,p),
(a,ip) =
l(a,p)
we immediately obtain (a, p) - 03, a ) = (Ta, T/3) - (T/3, T a ) . Adding the above expressions, we arrive at the necessary condition. Therefore the theorem is established. As before, a unitary transformation carries a set of pairwise orthogonal unit vectors onto a set of pairwise orthogonal unit vectors. Therefore a linear transformation on a linear space V over complex numbers with inner products is a unitary transformation if and only if T transforms an orthonormal basis into an orthonormal basis. Now suppose T is a linear transformation on a linear space V over complex numbers with inner products and let A = (ay) be a matrix of T with respect to an orthonormal basis. Then T is a unitary transformation if and only if A is a unitary matrix, i.e., A satisfies A! A = AA' = E, or n
E t=\
n t=l
Thus a unitary transformation is a nonsingular and one-one linear transforma tion.
It is clear that a unitary matrix with real elements is simply an orthogonal matrix. As in orthogonal matrices, if A is a unitary matrix, then the inverse matrix of A is A^1 = A'. Moreover, we know from |J4| \A\ = 1 that |J4| is a complex number whose modulus is equal to one. Furthermore A', A*, and A~l are all unitary matrices. In addition, the product matrix of two unitary matrices is still a unitary matrix. As in Example 5 of Sec. 8.2, we have: T h e o r e m 3. Let T and T* be linear transformations on a linear space V over complex numbers with inner products, matrices A and A1 the matrices of T and T* with respect to an orthonormal basis respectively. If a and /3 € V, then (Ta,/3) = ( a , r * / 3 ) . A linear transformation having such a property is known as a conjugate transformation of T. From the example below, we can easily see that the conjugate transformation of T is unique. Example 2. Let T\ and T2 be two linear transformations on V. If for any a and /3 in V (Tia,0)
=
(T2a:f3),
then Tx = T2. Proof. Since (Tia,/3) - (T 2 a,/3) = ((Ti - T 2 )a,/3) = 0 and /3 is any vector in V, we have ((T1-T2)a,(T1-T2)a)=0. Hence (Tx — T ^ a = o or Xia = T2CX, and so Tx = T 2 . The proof is complete. Thus if A is the matrix of T with respect to an orthonormal basis, then the matrix of the conjugate transformation T* is simply A . It is easy to see from this result that (T*)* =T**
=T,
which follows readily from definition. This is because
(T**a,/3) = (a,T*/3) = ( T ^ ) = (^T5) = (Ta,/3),
we have T**a = Ta and hence T** = T. Thus we again have (T'a,f3)
=
(a,Tf3).
E x a m p l e 3. Prove that KerT* = ( ImT)1-. Proof. Since (T*a,P) = (a,T(3), if T*a = o, i.e., a € KerT*, then (a,T(3) = 0 and hence a € (ImT)- 1 , so KerT* C (ImT)- 1 . If a 6 (ImT)-1-, i.e., (a,T/3) = 0, then (T*a,/3) = 0 and so T*a = o as /3 is any vector. Hence a € KerT* and thus (ImT)- 1 C KerT*. The proof is complete. If T* = T, we have (Ta,p)
= (a,TI3),
a,peV.
Hence T is simply a symmetric transformation, sometimes known as a selfconjugate transformation. Hence the matrix of T with respect to an orthonormal basis is a Hermitian matrix. T h e o r e m 4. A linear transformation T on V is self-conjugate trans formation if and only if (Ta, a) is a real number for any vector a in V. Proof. Since (Ta, a) = (a, Ta) = (Ta, a), (Ta, a) is a real number. Conversely, since (Ta,(3) = ±{(T(a + (3),a + +^{(T(a
(3)-(T(a-(3),a-l3)}
+ i(3),a + ip) - (T(a -ip),a-
i/3)},
and (TX, X) is a real number, we have (TX, X) = (TX,X)
= (X, TX).
Substituting these into the above formula, we obtain (Ta,f3) = (a,Tfi). Hence T is a self-conjugate transformation. This proves the theorem. If we replace the transformation above with the matrix of a transformation, we would obtain that a matrix A is a Hermitian matrix if and only if (Aa, a) is a real number, where a is any vector. E x a m p l e 4. The eigenvalues of a self-conjugate transformation are all real numbers.
Proof. Assume that T is a self-conjugate transformation, Ta = \a. We obtain from (Ta,a) = (a,Ta) that ( A a , a ) = ( a , A a ) and so A ( a , a ) = A(a, a ) . Therefore A — A, i.e., A is a real number. The proof is complete. In terms of matrices, the above result states that the eigenvalues of a Hermitian matrix are real numbers. This is actually a generalization of Theorem 2 in Sec. 5.3. On a linear space over complex numbers, instead of considering general quadratic forms, we consider Hermitian quadratic forms f=
X'AX,
where A is a Hermitian matrix, i.e., A = A'. The rank of A also known as the rank of / . It is easy to see that / = X'AX = (AX,X). Again since
/ = (AX)'X = (X,AX), we have {AX,X) = (X,AX).
Then as (AX,X) =
(X, AX) = (AX, X), (AX,X) is a real number. That is to say, although X is a complex vector, / = X'AX is a real number. In addition, / = X'AX can be reduced to a sum of squares / = All/!?/! + ■ • • + XnVnVn by a unitary transformation X = UY (U being a unitary matrix). This is actually a principal axes problem for a linear space over complex numbers with inner products, where Ai, A2,..., An, as in Sec. 5.3, are eigenvalues of A and are real numbers. The proofs are entirely analogous to those in Sec. 5.3. We know that the eigenvalues of a Hermitian matrix are all real numbers. As U~1AU is a Hermitian matrix, i.e., (U~lAU)' = (U~1AU), we have the following theorem, which is analogous to Theorem 6 in Sec. 5.3. Theorem 5. Assume that A is an eigenvalue of a Hermitian matrix A of multiplicity k. Then there exist k linearly independent eigenvectors of A associated with the eigenvalue A. As the procedure for finding the matrix U is exactly the same as that for finding the matrix P in Sec. 5.3, we shall not repeat it here. Expressed in terms of matrices, the above result can be simply expressed as a theorem: Theorem 6. Assume that A is a Hermitian matrix. Then there exists a unitary matrix U such that U~1AU becomes a diagonal matrix.
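A minimal sketch of Theorem 6 with a small made-up Hermitian matrix, using NumPy's eigh routine (which returns real eigenvalues and a unitary matrix of eigenvectors):

```python
import numpy as np

A = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])
print(np.allclose(A, A.conj().T))                      # A is Hermitian

w, U = np.linalg.eigh(A)
print(w)                                               # eigenvalues are real numbers
print(np.allclose(U.conj().T @ U, np.eye(2)))          # U is unitary
print(np.allclose(U.conj().T @ A @ U, np.diag(w)))     # U^{-1} A U is diagonal
```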
As in real quadratic forms in Sec. 4.2, in Hermitian quadratic forms or a Hermitian matrices there are the inertial theorem and positive definite, nega tive definite, or indefinite quadratic forms. The related theorems hold analo gously. We know from Sec. 8.1 that a Hermitian matrix A is positive definite if and only if (AX, X) > 0, and is negative definite if and only if (AX, X) < 0, where X is any nonzero vector. In addition, if one of two Hermitian quadratic forms / = X AX and g = X BX, s a y / , is positive definite (i.e., for any (x\,.. .,xn) ^ 0, f(x\,X2, ■ ■ ■ ,xn) > 0), then they can be reduced by applying the same nonsingular transforma tion X = PY to the following sums of squares simultaneously, / = 2/i2/i H
1" VuVn,
9 = hyxyi
+ • • • + knynyn
,
where k\,..., kn are the roots of \\A — B\ = 0 and are real numbers. Its proof is given as follows. Assume that (kA - B)X = 0, where X / 0. Then kAX = BX, and hence kJC'A' = X'B'. Thus kX'AX = X~'BX = kX'AX, or (k - k)X'AX = 0. Since A is positive definite, X1 AX > 0. Consequently k = k, and thus A; is a real number. We conclude this section by introducing an important type of matrices. A matrix A is called a normal matrix if A'A = AA'. Clearly diagonal ma trices, unitary matrices (orthogonal matrices), Hermitian matrices (real sym metric matrices) and skew-Hermitian matrices (real skew-symmetric matrices) are all normal matrices. Again assuming that A is a normal matrix and U is a unitary matrix, it is easy to show by definition that U~1AU is also a normal matrix. T h e o r e m 7. Suppose A is a triangular matrix. If A is also normal, then A is a diagonal matrix. Proof. Assume that A = (dij),a,ij = 0 when i > j . From A'A = AA' we have n n 2_^O,ik0.ik = 2_^0-kj0.kj ■ i=l
j=l
As dij = 0 for i > j , the above becomes k
n
t=l
j=k
or
J2\o-ik\2 = ^2\akj\2t=l
j=k
If A is not a diagonal matrix, we choose aki ^ 0,k < I, where k is the smallest number. Then n
^2\akj\
jafcfcl
i=k
Clearly this is a contradiction, and so A is a diagonal matrix. The theorem is completely proved. T h e o r e m 8. Suppose that A is a normal matrix, then there exists a unitary matrix U such that U~1AU is a diagonal matrix. Proof. For the matrix A, there exists a unitary matrix U such that U~1AU becomes a triangular matrix. Then since A is normal, U~1AU is also normal. That is to say, the triangular matrix U~1AU is normal and is therefore a diagonal matrix. The proof is complete. The above theorem is an important theorem which encompasses Theorem 5 in Sec. 5.3, Theorem 3 in Sec. 5.4, Theorem 6 in this section, and the respective exercises in Sec. 5.3 and Sec. 5.4. Therefore it sums up the entire problem of diagonalization of matrices. E x a m p l e 5. Suppose A is a normal matrix of order n with eigenvalues A i , . . . , An and HI, ..., /j,n are eigenvalues of A'A. Prove that Mi + --- + Mn = |A1|2 + --- + |A 7l | 2 . Proof. Let U~1AU = diag(Ai,..., A„), where U is a unitary matrix. Then
r\
Ai
U^AUiU^AU)' An
i
/|Ai| 2
\
An /
(
V
or |Ai|2
\
U-\A'A)U =
7
ii
and so the statement holds.
12 ,
^
|AnlV
Exercises 1. For each of the following Hermitian matrices A, find a unitary matrix U such that U~1AU is a diagonal matrix: 0 i 1\ ' 3* 1 (1) A = | - t 0 0 , (2) il = " 3v^ 1 0 0/ _L V V6
3\/2 ' s I 6
1_ 2V3
\/6 " i 2i/3
*
I , 2 J
2. Find a matrix which is both Hermitian and unitary. Then find a nor mal matrix which is not diagonal, nor unitary, not Hermitian, nor skewHermitian.
3. Prove that (o,j9) = i { ( a + /9,a + / 9 ) - ( o - i 9 , a - / 3 ) } + ^{(a + i/3, a + i/3) + (a - i/3, a - i0)} . 4. Show that if A is an eigenvalues of T, then A is an eigenvalue of T*. 5. Suppose that U is an invariant subspace in V under T. Show that U1- is an invariant subspace in V under T*. 6. Show that (ka)* = ko*,(ar)* = r*a*, where a and T are self-conjugate transformations. 7. Suppose a is a self-conjugate transformation and A is an eigenvalue of a. Then there exists a such that A = (act, a),
(a, a ) = 1.
8. Assume that A is normal and A i , . . . , An are eigenvalues of A. Prove that (1) if A; are real numbers, then A is a Hermitian matrix; (2) if Ai is zero or a pure imaginary number, then A is a skew-Hermitian matrix; (3) if |Aj| = 1, then A is a unitary matrix. *9. Prove that T is a unitary transformation if and only if T* = T - 1 . *8.5. Normal Operators We know that a linear transformation from a linear space into itself is sometimes called a linear operator. So a symmetric transformation may be
called a symmetric operator, a conjugate transformation a conjugate operator. In this section we shall present another important operator on a linear space over complex numbers. Suppose T is a linear operator on V and T* a conjugate operator of T. If T*T = TT*, i.e., T is commutative with its conjugate operator T*, then T is called a normal operator on V. Obviously a linear operator T is a normal operator if and only if a matrix of T with respect to an orthonormal basis is a normal matrix. In addition, we have: Theorem 1. T is a normal operator on V if and only if (Ta,T(3) =
{T*a,T*P)
for any a and /3 in V. Proof. If T is normal, then (Ta,T(3)
= (T*Ta,/3)
= (TT*a,/3) =
(T*a,T*(3).
Conversely, for any a,/3 € V, by hypothesis we have (a,TT*/3)
= {T*a,T*f3) = (Ta,T0)
=
(a,T*T0).
Hence TT* = T*T, i.e., T is normal. This proves the theorem. The following is some properties of normal operators. Theorem 2. Suppose T is a normal operator on V. Then ImT = ImT*, KerT = KerT* . Proof. From (Ta,Ta) = (T*a,T*a) we know when Ta = o, we have T*a = o. Conversely, when T*a = o, we have Ta = o. Therefore KerT = KerT*. Also since KerT* = (ImT)- 1 , we have KerT = (ImT)- 1 . Hence KerT* = (ImT*)- 1 and thus ImT = ImT*. The proof is complete. Theorem 3. Suppose T is a normal operator on V and a an eigenvector of T associated with the eigenvalue A. Then a is an eigenvector of T* associated with the eigenvalue A.
Proof. Since T is a normal operator, we have (Tα, Tα) = (T*α, T*α). Since α is an eigenvector of T associated with λ, we have Tα = λα. Hence

0 = (Tα - λα, Tα - λα)
  = (Tα, Tα) - λ̄(Tα, α) - λ(α, Tα) + λλ̄(α, α)
  = (T*α, T*α) - λ̄(α, T*α) - λ(T*α, α) + λλ̄(α, α)
  = (T*α - λ̄α, T*α - λ̄α),

and so T*α - λ̄α = o, i.e., T*α = λ̄α. That is to say, α is an eigenvector of T* associated with λ̄. The proof is complete.

Theorem 4. Let T be a normal operator and U the subspace spanned by the eigenvectors of T associated with an eigenvalue λ. Then U⊥ is an invariant subspace under T.

Proof. β ∈ U⊥ if and only if (α, β) = 0 for any α in U. Since

(α, Tβ) = (T*α, β) = (λ̄α, β) = λ̄(α, β) = 0,

we have Tβ ∈ U⊥, i.e., U⊥ is invariant under T. The proof is complete.

Theorem 5. Let T be a normal operator and U invariant under both T and T*. Then T is also a normal operator on U.

Proof. Let S be the linear operator on U induced by T and S* the conjugate operator of S. Then for any α and β ∈ U, we have

(S*α, β) = (α, Sβ) = (α, Tβ) = (T*α, β).

Thus for any α and β ∈ U we have ((S* - T*)α, β) = 0. Hence S* = T*, i.e., the effect of S* on U is identical to that of T* on U. And so on U we have

S*S = T*T = TT* = SS* .

Hence S is a normal operator on U. The proof is complete.

Exercises

1. Let T be a normal operator. Show that Tα = o if and only if T*α = o.
2. Let T be a normal operator. Show that T - λI is also a normal operator.
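Theorems 1 and 3 are easy to illustrate numerically. A minimal sketch of ours (NumPy; a normal matrix A stands in for T and its conjugate transpose for T*; it is not one of the book's examples):

    import numpy as np

    rng = np.random.default_rng(1)
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
    lam = np.array([2.0 + 1.0j, -1.0j, 0.5])
    A = Q @ np.diag(lam) @ Q.conj().T          # a normal matrix

    x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    print(np.isclose(np.linalg.norm(A @ x), np.linalg.norm(A.conj().T @ x)))  # (Tx,Tx) = (T*x,T*x)

    w, V = np.linalg.eig(A)
    v = V[:, 0]                                 # an eigenvector with eigenvalue w[0]
    print(np.allclose(A.conj().T @ v, np.conj(w[0]) * v))   # T*v = conj(lambda) v, as in Theorem 3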
ANSWERS TO SELECTED EXERCISES
CHAPTER 1

Section 1.1.

1. (1) x = cos α cos β, y = cos α sin β.
(2) Since ω^3 = 1, we have D_1 = 0, D_2 = D, D_3 = 0. Thus x = 0, y = 1, z = 0.
2. The general term containing a_{11}a_{23} is ±a_{11}a_{23}a_{3p}a_{4q}, where p, q run over the permutations of 2 and 4: 2, 4 and 4, 2. Hence there are two terms containing a_{11}a_{23}; the term with negative sign is -a_{11}a_{23}a_{32}a_{44}.
3. Among p_3, p_4, and p_5 in any term ±a_{1p_1}a_{2p_2}a_{3p_3}a_{4p_4}a_{5p_5}, at least one takes a value among 3, 4, and 5. That is, every term contains a zero factor. Hence the determinant equals zero.
4. Compute by the algorithm for a determinant of order 3.
5. The number of nonzero elements in the determinant of order n is less than n^2 - (n^2 - n) = n, so each term contains a zero element, and consequently equals zero.
6. Since any term contains n factors lying in different rows and different columns, we exchange rows and columns so that these n factors lie on the main diagonal of the new determinant.
7. By the definition of a determinant, any term is unchanged.
8. In determinants of order 2 or 3 the sign associated with the term formed by the elements on the minor diagonal is negative, while in the determinant of order 4 the sign associated with that term is positive; hence in a determinant of order n, when the n elements of a term are taken from the minor diagonal, we cannot give a general rule for the sign.

Section 1.2.
1. (1) -21. (2) Subtract the first row from the second, third, and fourth rows respectively; the determinant equals -8. (3) Imitate Example 5 and reduce to a triangular determinant. Its value equals (a + b + c)^3. (4) a(b - a)^3. (5) Subtract the third row from the other rows and reduce to a triangular determinant. Its value equals (n - 3)!6.
2. (1) Use Property 4 and reduce to a triangular determinant. (2) Use Property 2 and decompose the determinant into the sum of six determinants, and then use Corollary 2 of Property 3.
3. Add the other columns to the first column. Factor out n(n + 1)/2 from the first column. Subtract the (n - 1)th row from the nth row, the (n - 2)th row from the (n - 1)th row, ..., the first row from the second row, and obtain:
D = (n(n + 1)/2) ·
| 1    2    3   ...    n   |
| 0    1    1   ...   1-n  |
| 0    1   ...   1-n   1   |
| .                        |
| 0   1-n   1   ...    1   |

= (n(n + 1)/2) ·
|  1    1   ...   1   1-n |
|  1    1   ...  1-n   1  |
|  .                      |
| 1-n   1   ...   1    1  |     (order n - 1)

= (-1)^{n(n-1)/2} (n + 1) n^{n-1} / 2 .
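The closed form can be confirmed numerically. A short check of ours (NumPy; the loop rebuilds, for each n, the determinant with rows 1, 2, ..., n and their cyclic shifts):

    import numpy as np

    for n in range(2, 8):
        D = np.array([[(i + j) % n + 1 for j in range(n)] for i in range(n)], dtype=float)
        closed_form = (-1) ** (n * (n - 1) // 2) * (n + 1) * n ** (n - 1) / 2
        print(n, np.isclose(np.linalg.det(D), closed_form))   # True for every n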
4. Factor b^i out of the ith row for i = 1, ..., n, and then b^{-j} out of the jth column for j = 1, ..., n. The total factor is b^{(1+...+n) - (1+...+n)} = 1, and hence the two determinants are equal.
5. Use the differential property (abc)' = a'bc + ab'c + abc'.
6. Repeatedly use Property 3. Prove that the two determinants are equal.

Section
1.3.
-102. (2) xyzuv. 4. (4) an-2{a2 - 1). I + xf + ■ ■ ■ + x„. Imitate Example 2. Use the formula C™ = C " - 1 + C^Zl, and subtract the ith row from the (i + l)th row, the ith column from the (i + l)th column. Continuing in this way, we decrease the order of the determinant. Its value equals 1.
2. Add x times the nth column to the (n — l)th column, x times the (n — l)th column to the (n — 2)th column, . . . , until x times the second column to the first column. Finally we expand about the first column. 3. Adding the other columns to the first column, we obtain
A_n = A_{n-1} + (-1)^{n+1}(-1)^{n-1} = A_{n-1} + 1 ,

so A_n = A_1 + n - 1 = n + 1.

4.
1 (_]\l+2+6+l+2+5 1 1 2
1
1
LJ
LO2
LJ2
LJ
1 1 1
1 U2 LU
1 U) 2
=
u
1 1 1
1 1 u> w2 u>2 UJ
2
= 9o> (w - l ) . 5. (1) By exchanging rows, it can be reduced to the Vandermonde determi nant. Its value equals n(n+l)
(-1)
n o'-on+l^i>jj:l
(2) We factor out common factors a" from the ith row, i = 1,2,..., n, and the original determinant becomes
UJ
1 1 1
6a
UJ =
j ] (bidj l=Sj
6~±i <»»+l
6. (1) Dn = (a + 6)£>n_i - a&£>„_2 . .„. _ „ 2sin0cos0 (2) Di = 2cos0 = sin#
sin2# s'mO 2cos6smn6 — sin(n — 1)0 Dn = 2cosdDn-i - Dn-2 sin# 2cos#[sin(n - l)0cos0 + cos(n - l)0sin#] — sin(n — 1)0 sin# sin(ra - l)0(2cos 2 0 - 1) + cos(n - 1)6 ■ 2sin0cos0 sin0 sin(n+l)6> sin#
dibj).
7. Expanding about the first row, we obtain two determinants. One is of order n — 1, the other is of order n — 2. They take the form (2) in Exercise (2) above, hence the original determinant is „sinn# cost/ . „ sintz
sin(n-l)0 1 , n . „ . , „.„, . ' = —-icosOsmnO — sin(n - 1)6 \ sm9 sm# = cosnd.
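The trigonometric determinants in 6(2) and 7 can be checked numerically. A small sketch of ours (NumPy; n and t are arbitrary test values, not values from the exercises):

    import numpy as np

    t, n = 0.7, 6
    D = np.diag(np.full(n, 2 * np.cos(t))) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    print(np.isclose(np.linalg.det(D), np.sin((n + 1) * t) / np.sin(t)))   # True, as in 6(2)

    C = D.copy()
    C[0, 0] = np.cos(t)           # replace the (1,1) entry by cos(t)
    print(np.isclose(np.linalg.det(C), np.cos(n * t)))                     # True, as in 7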
^21
•■ •
&n\
'' '
a
2,j-l
02,j"+l
'''
a
1n
8. A,j = ( - l ) i + J &n,j—l 0-71,j+1
'''
®n%
Adding the other columns to the first column, we obtain -aij
■•■ a2,j-ia2,j+i
••■
~0"nj
'''
'''
a^n
Ay = (-1) i + j ^n,j — l^n,j+l
= (-l)1+j(-l)(-l)j-2An
^nn
« ( - 1 ) 2 M U = Au .
This is because 041 + • —h din = 0. Similarly An = An. Therefore Aij are all equal. Section
1.4.
1. (1) Xi = 1, X2 = — 1, 2:3 = 0, £4 = 2. (2) a; = a, 7/ = 6, z = c. 2. /(x) = Z2 - 5x + 3. 3. When t = 15°C, we have /i = 13.46; When t = 40°C, we have h = 13.46. CHAPTER 2 Section 2 . 1 . 1. a = (1,2,3,4). 2. When fci, fca,... ,fcm a r e a n zero, ctj., 0 2 , . . . , a m may be linearly indepen dent or may also be linearly dependent; When fei, &2,..., fcm are not all zero, Qi, 012, ■ ■ ■, ocm are linearly independent. 3. They need not be linearly independent, for example,
<*! = (1,0), a 2 = (0,1), ft = (0,0), fa = (0,0). 4. If a i , a2, •. ■, otm are linearly independent, then the expression is unique, otherwise the expression is not unique. 5. The two sets of the corresponding combination coefficients for the two sets of linearly dependent vectors need not be equal. If they are equal, then the proof is right. 6. Need not be. 7. Assume that ki(3\ + ■ ■ ■ + km(3m = o. Then kiau + 1- kmami = 0, i = 1,2,... n, hence fciai + ■ ■ ■ + fcm/9m = o. So / 3 i , . . . , (3m are linearly in dependent. The converse need not be true. For example, /3i = (0,0,1), f32 = (0,1,0) are linearly independent, but a i = (0,0), a2 = (0,1) are linearly dependent. 8. Let fci(ai -I- a 2 ) + k2(a2 + a 3 ) + k3(a3 + a i ) = o, i.e., (fei + fc3)ai + (fci + k2)a2 + (k2 + k3)a3 = o. Since c*i, Q2, a3 are linearly independent, the coefficients in the above expression are zero, hence we obtain ki = k2 = k3 = 0. 9. For any set of linearly dependent vectors a i , ot2,..., am, examine one by one from left to right: c*i ^ 0 preserve, in general, vectors which cannot be linearly expressed in terms of the preceding vectors are to be preserved. Let a set of vectors which are preserved be a i , . . . , aj.,
r<m.
(*)
hence (*) is linearly independent; any vector among c*i, 0 2 , . . . , a m , no matter whether it is preserved in (*), can be linearly expressed in terms of the set of vectors in (*). 11. (1) The dimension differs.
(2) linearly independent.
12. (1) 2;
(2) 5.
13. There may be zero minors of orders r — 1 and r, but any minor of order r + 1 is equal to zero. 14. The rank of B may be 1 less than that of A, or may be equal to that of A
15. Suppose that the matrix C is simply a (s,n)-matrix obtained from A by taking s rows in A. Prom Example 4 we obtain the rank of C ^ r + s — m. Assume that the matrix B is simply a (s,£)-matrix obtained from C by taking t columns in C, hence the rank of B ^ the rank of C + t — n. Thus the rank of B^r + s + t — m — n. 16. It is apparent that when the rank is equal to 0, the conclusion holds. The rank equals to 1 if and only if any two rows are linearly dependent. We might as well let ( a n , . . . , a\n) be a nonzero vector, then {an,.. .,ain)
— ai(an,...,ain),
i = l,...,m,
i.e., a,ij = didij. Let bj = a\j, j = l,2,...,n.
The conclusion holds.
Section 2.2. 1. X\ = 2X2 + 7^4) %2 = %2, £3 = — f£4, £4 = X4. 2. Xi = X3 - X5, X2 = Xz — X5, X3 = X3, X4 = I3, £5 = X5, X6 = X5.
Note that adding the second expression to the third expression equals the fifth expression.
{
auxi H
h ain-ixn-i
= -alna;n,
Q>n —1,1^1 T * * " T On—l,n— l^n—1 —
An-l.n^n •
where D = Ann, Oil
■■•
On—1,1
' ''
Ol,i-l
— 0,\nXn
fll,i+l
■•
ai,n-l
A= a
n— l,t— 1
0-n—l,n%n
so X
i—
i.e.,
rj -
~A
*»>
On —l,i+l
"n,n-l
Section 2.3.

1. (1) For example, (2, 1, 0, 0), (2, 0, -5, 7). (2) For example, (1, 1, 1, 1, 0, 0), (-1, -1, 0, 0, 1, 1).
2. From Exercise 8 in Sec. 2.1 we obtain that α_1 + α_2, α_2 + α_3, α_3 + α_1 are linearly independent.
3. There exist the following relations between the equations of the given system: (4) - (2) = 2(1), 3(1) - (2) = (3). Hence to solve the given system of linear equations it suffices to solve the following system:

x_1 + x_2 = -x_3 - x_4 - x_5 ,
x_2 = -2x_3 - 2x_4 - 6x_5 .

Its solutions are x_1 = x_3 + x_4 + 5x_5, x_2 = -2x_3 - 2x_4 - 6x_5, x_3 = x_3, x_4 = x_4, x_5 = x_5.
The row vectors of the given matrix satisfy α_1 - α_2 = α_3, 3α_1 - 2α_2 = α_4. Taking β = (5, -6, 0, 0, 1), we find that

α_1 = (1, -2, 1, 0, 0),   α_2 = (1, -2, 0, 1, 0),   β
are a system of fundamental solutions required.
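The fundamental system in 3. can also be reproduced mechanically. A short SymPy check of ours, starting from the reduced system x_1 + x_2 + x_3 + x_4 + x_5 = 0, x_2 + 2x_3 + 2x_4 + 6x_5 = 0:

    from sympy import Matrix

    A = Matrix([[1, 1, 1, 1, 1],
                [0, 1, 2, 2, 6]])
    for v in A.nullspace():        # a basis of the solution space
        print(list(v))
    # [1, -2, 1, 0, 0], [1, -2, 0, 1, 0], [5, -6, 0, 0, 1]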
Section 2.4. 1. (1) No solutions.
(2) x\ = —2x2 + X4 + 3, X3 = 1.
2. Let y = xi + X2 + • • • + x n , then the original system of linear equations becomes
{
-y + 2xi = 2a, -y + 4x 2 = 4a, -y + 2"x n = 2 n a.
Hence we obtain y xi=a+-,
y x2 =a+-,...,xn
y = a+ — .
Answers to Selected Exercises
But
y = Ytei
na + y(i
=
i + ...
+
+
403
£.\
= na+ ( l - i j j / , i.e., y = 2 n na. Thus the solutions required is Xi = a(l + n2n~%). 3. When A ^ 0, A ^ l , the system has a unique solution; When A = 0, it has no solutions; When A = 1, it has infinitely many solutions. 4. Let A = {oij), 5 = 1 , 4
: I, C = ( , ybi
v
J
B
■■
, . ). Obviously the bn 0J
rank of A ^ the rank of B ^ the rank of C. If the rank of A = the rank of C, then the rank of A = the rank of i?. 5
5. If the system has solutions, adding five equations, we obtain \]ai
=
0-
i=l 5
Conversely, if >Jaj = 0, then five row vectors of the coefficient matrix of i=l
the system are linearly dependent. T h e rank of the coefficient matrix a n d augmented matrix is 4, hence the system has solutions. Solving the first four equations, we obtain X4 = X$ + 04, X3 = Xs + CI3 + fl4, ^2 = ^5 + 0,2 + 0,3 + O4, xi = x$ + a\ + a-2 + a,3 + 04.
6. Let Vi = (cii, Ci2,...,
kivi H
cin),
then
Yktvt = I ^ A ; i C ( i , . . - , ^ A ; i C j n J , V=i
1=1
J
from S " = i a«jc«j = 6t, we have n
/ t
\
t
i
3=1
\l=l
J
1=1
n
\
t
\j=l
/
1=1
i.e., only when ki + ■ ■ ■ + kt = 1, any linear combinations of v%, 1/2, • ■ ■, ft are solution vectors of (1).
7. Use the result of Example 5. 8. TJO + fciai + fc2a2, where on = (1, 3, 2)', a 2 = (0, 2, 4)', rjo = (0, - 3 , 3)'. Section 2.5. 1. (1) 3.
(2) 2.
2. The rank of matrix can be found by using only elementary row operations or only elementary column operations, but a system of linear equations can't be solved by using only elementary column operations, or by using both elementary column operations and elementary row operations. 3. (1) xi = | ( 7 - 5x3 - 6x 4 ),
x2 = | ( 4 - 2x 3 - 3x 4 ),
x3 = x3,
x4 = x4.
(2) xi = - § + |x 4 , £2 = 0, X3 = Ap - 5X4, x 4 = x 4 . For two types of elementary row operations prove that corresponding com ponents of (*) and (**) have the same linear relation or prove that AX = 0 and BX = 0 are the same-solution systems.
/ 5.
4 5 2 6
2 4 -2 -1 1 5 3 6
0 0 --6 0 -1 0 9 3 2 1 5 3 9/ V 0 0 -9 0 /0 0 1 1 0 0 -3 — 0 1 0 9 Vo 0 0 0/ 6
-3 3
\
/
and so e*i, 0:2, and 0:3 are a largest linearly independent set. 0L4 9a2. CHAPTER 3 Section 3.1. • ■ « ( • • ! ) ■
(2) a n x 2 -1- a-n.y2 + cz2 + 2a\2xy + 2b\xz + 2b
V
#)
-3osi +
-1266\ -2922j'
3197 7385
3. Adding the (n + 1)th, ..., the 2nth row to the first, ..., the nth row respectively, and then subtracting the first, ..., the nth column from the (n + 1)th, ..., the 2nth column, we obtain

| A  B |     | A+B  B+A |     | A+B   o   |
|      |  =  |          |  =  |           |  =  |A + B| |A - B| .
| B  A |     |  B    A  |     |  B   A-B  |
4. Since the rank of C = the rank of (AB) ^ min (the rank of A, the rank of B) ^ min(m, n). Now m > n, so min(m, n) = n. i.e., the rank of C < m, so C is a singular matrix. 5. The rank of (A+E) + the rank of (A-E) ^ the rank of (A+E+A-E) = the rank of A = n. Again (A + E)(A - E) = A2 - E = o, so the rank of (A + E) + the rank of (A-E) ^ n. 6. The rank of A + the rank of (A — E) = the rank of A + the rank of (E - A) ^ n, moreover A(A - E) = o.
7.
1
1
1
Xi
X2
X3
rrt^
ri**'
ni*
j,j
x2
A3
1 Xi 1 x2 1 x3
A\\
A12
an
A?i
Aii
O-ln
3
s\
S2
«i
s2
•S3
S2
S3
S4
O-nl
8. Alx A2i
Au A22
A13 A23 1
£>
0
0
0
D
0
Ol3
<*23
^33
Aln A2n
an
021
O-nl
ai2
022
«n2
1
«2n
0 0 an 3
^33 2
D
«3 7
Oln
^ n
G3n
0-n3
9. We draw the conclusion by induction on the order n. W h e n n = 2, D2
0
ai2
-ai2
0
= a 12 ■
Again from the above exercise, we have
0
A12
A21
0
ai2
air
a21
0
a-2,
O-nl
0,n2
0
a34
fl3n
a-n 3
On 4
0
0
but j4i2 = — A2\,
then DnDn-2
= A 12
10. From Exercise 9, we obtain 0 = the rank of (Ai ■ ■ ■ Am) A\ -\ h the rank of Am - (m - l ) n .
^ the rank of
11. From the preceding two expressions we obtain (a - l)i4
a-3
-3
-2
a-47
.
2 (a-1)5= ; ^ V 2
V
3 3
thus (a - 1 ) 2 A B =
2a - 12
3a - 1 8 \
2a - 12
3a
-18J=°'
so
— ■ - K - 2 I ) ' H ( 2 3> Again since B = E - A, AB = A - A2 = o, we have A = A2. Similarly B — B2. Let A + aB — M, then M2 = A + a2B. Using induction on n, we obtain M fc+i
Section
=
MkM
3.2.
1. Compute by definition. 2. Examine by definition.
=
(A + akB)(A
+ aB)
= A +
ak+1B.
3. Compute directly. b . «n (a \ (a c \ (a2 + b2 ac + bd\ , „ n 4. When = „ = o, we have a2 + b2 = 0, \c dj \b dj \ac + bd c2 + d2 J 2 2 c + d = 0. Since a, 6, c, and d are all real numbers, we obtain a = b = c = d=0. Or we directly use the conclusion in Example 4 to prove it. 5. Let the rank of(AB) — the rank of B = r, then fundamental solutions of two systems ABX = 0 and BX = 0 contain n — r solution vectors. Again since the solution vectors of BX = 0 are all the solution vectors of ABX = 0, the system of fundamental solutions of BX — 0 is also the system of fundamental solutions of ABX = 0. Therefore two systems have the same solution vectors. Conversely, if two systems have the same solution vectors, then the systems of fundamental solutions are the same, so the rank of (AB) = the rank of B.
6. Since A is a singular matrix, AX = 0 have nonzero solutions. We form the matrix B by taking any n vectors in linear combinations of a system of fundamental solutions as its columns. Obviously AB = o, B ^ o. Similarly, since A' is a singular matrix, there is a matrix C" such that A'C = o. Therefore CA = o. 7. Let A = diag(ai, a 2 , . . . , an), B = ipij). From aibij = ajbij, we have bij = 0, if i =£ j . /Bn
•■
Bu\
8. We write B as a block matrix B =
. Analogous to the \Ba
■■■
B
a
)
above exercise, from AB = BA, we obtain XiBij = XjBij. When i ^ j , we have Bij = o. 9. As in Example 2, from ^(2)^1 = AEi{2), we obtain a,, = 0, ai:? = 0, i ^ j , where Ei{2) is obtained from E by multiplying the ith row of identity E by 2. Again from Iij ■ d i a g ( a n , . . . , a n n ) = d i a g ( a n , . . . ,ann) ■ Iij, we obtain a„ = ajj, where Jy is obtained from E by interchanging the ith row and the j t h row of E. 10. Use the definitions of symmetric and skew-symmetric matrices. 11. Since tr(AB) = tv(BA), ti{AB - BA) = 0, but trE = n. 12. We can use the definition of linearly independent vectors.
13. (aii, • •• i am) and (An,..., Ain) are solutions of system of linear equations ajix\-\ \-ajnxn = 0, j ' ^ i. Moreover, the rank is n— 1, and consequently, the component of two solutions are proportional, i.e., An _
_ Ain _ >
ail
using ±1 = aii^lii H
&in
1- a.inAin = A, we obtain the result proved.
14. \A + B\ = \AB'B + AA'B\ = \A(B' + A')B\ = \A\\B> + A'\\B\ = -\A\2\A + B\ = -\A + B\. Section 3.3. 1(1)
-i)
("a V
(3)1
2
1 1 \1
21 / 1 1 -1 -1 1 - 1 ■- 1
1 -1 -1 1
( (4)
2
-1 -1
V i
sin a cos a
°\
-1 1 1 -2
0 0 2 -1
Xz
Xn I'
""2 /
a ±\
/
2.
cos a —sin a
(2)
a2 1
I
3. Suppose that the inverse matrix required is
(l)Ftam(°
j ) ( *
Xi=4_\
* * ) = £ , we obtain
X 2 = o,
X3 = -B-lCA~l,
Jf4 = B _ 1 .
Answers to Selected Exercises
/ 4. A* =
1
\
-1 1 \-l
5. (E-A)(E
409
1/
+A +
+ A"1'1) (E - A)'1
6. {A + 2E)~\A2
= E - Am = E, therefore
- 4£) = (A + 2E)-1(A
0 -3 1
V i
l
=
+ 2E)(A - 2E) ~3 1
= A-2E A-2E-=
(A + 2E)~\A-2E)
1 ° °\ / - 31 -
1
1
0
1
\ o -i i / V /-3 4 \ 0
A"1'1.
=E + A+ ■•■+
°\ ) °
-sy 0 -3 1
°\ °
-3J
0 -3 0 4 "3/
°\
7. Since (A-E){B-E) = AB-A-B+E=E, we have E = (B-E)(A-E) BA-B-A + E, i.e., BA = A + B. Hence AB = BA.
=
9. It is easy to prove by definition or elementary operations. 10. Examine by definition. 11. (1) ( - A ) * = (-<%■)* = ( ( - l ) " - % 0 = ( - l ) » " M * . (2) From A A'1 = E, we obtain (A- X )M* = E, and so (vl*)" 1 =
(A'1)*.
(3) From A'(A*)' = \A\E, A'(A')* = \A\E, we obtain A'{(A*)' - {A')*} = o. When \A'\ ^ 0, we have (A*)' = (A1)*; when \A'\ = 0, as in Example 5 we have {A*)' = (A')*. (4) When A is nonsingular, A* is also nonsingular matrix, so we have (A*)* = (A*)-1^*]
=
= {\A\E)~l A\A\n~l =
A\A\n~2.
(A-^AlEy^A]71-1 =
lA^EAlAl"'1
When A is singular, the rank of A* ^ 1. If n > 2, then (A*)* = o, so the above expression still holds. If n = 2, it is easy to know that (A*)* = A. The above expression also holds. At this point, l^l™-2 is written as 1. 12. Let A be the matrix of order n and its rank be r, from PAQ =
\ o and P~lQ-x is
= B, we obtain A = P^BQ-1 = P^Q^QBQ'1 invertible. Since B2 = B, QBQ^1 is idempotent. / 1 13. Let Ai be the standard form of A, i.e., A\
/ Er
\ 1
Since the
0/ first r rows of A\B coincide with that of B, while the latter m — r rows are all zero rows. From Example 4 in Sec. 2.1 we obtain the rank of (AiB) ^ r — m + the rank of B. Let A = PA\Q, we have the rank of AB = the rank of (PAiQB) = the rank of {AXQB) > the rank of (QB) + r m = the rank of B + the rank of B — m. 14. Assume that f{x\,..., xn) ^ 0, and g(xi,..., xn) ^ 0. We next prove by induction on number n that f(x\,..., xn)g(xi,..., xn) ^ 0 . Let f(x) = a0xn + a i x n _ 1 H m
g(x) = b0x then f(x)g(x) assume that
\-an,
a 0 ^ 0,
+ --- + bm,
b0^0,
= aoboxn+m + • ■ •, since aofco ¥" 0> f(x)9(x)
f{x, y) = h0(x)yh + g(x,y) =kQ{x)yk
fti^)'/"1
+ k1{x)yk-1
+ ■" ■ .
M*) £0,
+ ■■■ ,
k0(x) ^ 0 ,
^ 0. Again
then f(x,y)
■ g(x,y) = h0(x)k0{x)yh+h
since ho(x)ko(x) ^ 0 , we have f(x,y)g(x,y)
+ ■■• ,
^0.
15. We set E under A and reduce A to E by elementary column operations, simultaneously E to A-1. We can't find the inverse matrix by using both elementary row operations and elementary column operations.
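For reference, the row-operation analogue of this procedure (work on (A | E) with elementary row operations until the left half becomes E) is easy to code. A sketch of ours in Python, assuming A is invertible:

    import numpy as np

    def inverse_by_row_ops(A):
        n = len(A)
        M = np.hstack([np.array(A, dtype=float), np.eye(n)])   # the block matrix (A | E)
        for i in range(n):
            p = i + np.argmax(np.abs(M[i:, i]))                # partial pivoting
            M[[i, p]] = M[[p, i]]
            M[i] /= M[i, i]
            for r in range(n):
                if r != i:
                    M[r] -= M[r, i] * M[i]
        return M[:, n:]                                        # the right half is now A^{-1}

    print(inverse_by_row_ops([[1, 2], [3, 4]]))                # [[-2, 1], [1.5, -0.5]]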
CHAPTER 4 Section 4 . 1 . 1. None of them are quadratic forms. 2. The rank of B need not be equal to that of A. 3. Yes. Firstly find P' and then find its transposed matrix. 4. /(zi,X2,£3) = (xi,x2,x3)
I a2 a3
(ai,a 2 ,a 3 )
/ a\ a\a2 = X' 1 &2a\ a2 a3a2 \a3ai
a^N a2a3 I -X' a\ )
5. (1) f(xux2,x3) = y\ - y\ + ^y3 where yi = xi + \x2 , y2 \x2 + (2) f(Xl,
x2 ^3
\x3,
2/3 = X3.
x 2 , 1 3 ) = 22/? + 32/I + § 2 / | ,
where y\ = x\ + x2 - x3 ,
y2 = x2
1*3
V3 = X3.
(3) Find the standard form by using elementary operations. Or directly choosing nonsingular linear transformation, we reduce / to the follo wing standard form: / = XiX2 + X2X3 + X3X4
= (2/1 + J/2)(l/i - 2/2) + (2/1 - 2/2X2/3 + 2/4) + (2/3 + 2/4X1/3 - 2/4) = [2/1 + 2/i(2/3 + 2/4)] - [2/2 + 2/2(2/3 + 2/4)] + 2/3-2/4 = [2/1 + 2(2/3 + 2/4)]2 - [2/2 + ^(2/3 + 2/4)]2 + 2/3 - 2/4 _ _2
_2 , _2 _
— %i — z2 -\- z3 5 2
ON
5 2
0
3 2
0
3 2
0
^ 1
2
z4 .
/I 0
0
0
25 4 3 2
0
3 2
0
~
6. (1) 1
0
0
1
0
1 0
0
0
1 0
1 )
0
0
0 0
25 4
0
0
9 25
1 0
5 2
^
~ 5 2
V0
Z1
\
u
0 0 1 )
u
1 0
3 5 6 25
1 I
(
2
2 -2
2 5 -4
/
~2\ -4 5
0 3 -2
2
0 0 ~
(2)
1 0
I 0
0 1 0
n
0\ -2 3
0 0
0 3
0\ 0
o 1
~
0 0
1 -1 0 1 0
1 0 1 )
u
1 )
1 -1 0 1 1 Vo 0
\ ^ 3
1/
7. Since the rank of (a,j) is n — 1, Ann ^ 0, we have O-in = kidn + ■ ■ ■ + fcn_iOi,„_i . Computing directly, we have HaijXiXj = (anxi ( a n ^ i -\ H £—i 1 1 ) [ ■ i , L ■)> • l j ■ 1
+1-aaiimm- -i i x2B„_i )xi + + ■ •■- ■ ai„; n-\ ++ a\ nxnZn)zi
~r \&n—1)1 X\ ~r • • • ~r
tt— fln_i,l nm XnX)Xn)X - 1 -\~r Q>n— n_i, n_i n—\ n—\ lm — 1 X*rtn—i
+ (( ff ll nn ll ^^ ll H + • • • 4" Onn in— + KO , n - l1 ^^n—1 n - l +T
= [au(xi [an(a:i + fcia; fci^n) n) H
a
n&TinX n^n/^n n)Xn
h oi (xn— 1 T+ "*n ai,n-i kn-\x )\x\ +-\ • — 1nz„)]zi )71 _i (^n-i
+ [On-1,1 + [an-i,i (^1 (^l + k\X kixnn) H
1-h a„_- l m — 1
V^n —i
^n—1 in—1
T K n _ia; n jja; — l*Ti )\^n— 1
& n m —1 l ^ n - 1 ~r »*n— l ^ n J J ^ - n
+ [ani(a:i + kiXn)-\ h = (anz'i H r- a i m - i x'n-i)(x'i ~ fciO + = (an^i H h a i m - i z n - i ) ( z i - fciz'J H
-
1" A n - l m - l ^ n -- ! ) « - ! + (a„_i,i x[-\H 1a„_i, n _i x'n_1)(x'n_1 X + (an ix[-{ 1- t*njn—1 n—l)n + ( f l n - 1 , 1 x\
= (anz'i H
1- a it*njn—1 m - i Z n*^n—1/ - i M +n •
= (au^'i H
1- "ai,n-i 'n-i) 'i 1 H ~r * * "T &n—lm— ^ n-- W ^ n - l
1 +n -(an-i,i a^i H
x
fcn_lXn)
kn-ix'n)
x
1- On-iin-i a 4 - i ) a 4 - i
n-1
= y ^ aijX^Xj , where Xi + fcjxn,..., ^ n _ i — xn—i + Kn_iXn . 8. Taking Xj = 1, Xj: = 0, i 7^ j , we obtain a*, = 6*,. Again taking Xj = 1, Xj = l,Xk=0,k^i,j, we obtain a^ = by. 9. Examine by definition.
Answers to Selected Exercises
Section
413
4.2.
1. Need not be. 2. (1) Negative definite.
(2) Indefinite.
3. When A > 2, / is positive definite; when A ^ 2, / is indefinite. 4. The quadratic form given can be reduced to the standard form
f = yf + --- + vl- vl+i
vl+i ,
by using nonsingular linear transformation: Vi = \fa~iZi,
i=l,...,k,
yk+j
= y^biZk+j,
j =
l,...,l.
5. Since r + s = (p + I) + (p — I) = 2p, r and s are odd numbers or even numbers. Again \s\ = \p — l\ ^ p +1 = r. 6. When Ai > 0, we have Bi = b\...bfAi
> 0.
7. A is positive (negative) definite, i.e., X'AX is positive (negative) definite. Taking the components Xi = 1, Xj = 0 (j' ^ j) of X' = ( x i , . . . ,xn), we have X'AX = au > 0(< 0). 8. Prom P'AP = E, we obtain P'A'P = E,
P-1A~1P'-1=E,
P*A*P'*=E.
9. By using the conclusion in Example 8. 10. Since ( M - 1 P ) ' ( M ' J 4 M ) ( M - 1 P ) =
P'AP.
11. Since X'AX, X'BX are positive definite, we have X'(A + B)X = X'AX + X'BX > 0, i.e., A + B is positive definite. But A — B and AB need not be positive definite, for there may be X'AX — X'BX ^ 0, and AB need not be a symmetric matrix. 12. The proof for Ak > 0 is the same as in Theorem 5. 13. Imitate the proof of Theorem 2. 14. The corresponding matrix of f{xi,..., / A =
xn) is
cosa
1
1
2cosa
\ 1
\
1
2cosa 2c
By induction we can prove the principal minor of order k in the upper left corner is A^ = cos (ka). From & < a < J, we obtain ^ < na < z ^ : , so among a, 2a,... ,na there must be angle in the second quadrant, i.e., there must be some Ak < 0. But Ai > 0, hence / is indefinite. n
15. Assume that y\ = V J anXi,
i/j = —auXj ,
j = 2,..., n, then
i=1
yi
aiiViVj = o-n I ^2 anXi 1 + 2 ^
i,j=l
\i=2 i
/
/
i=1
aa
I^
anXj I
\ j=2
(-anXi)
I
OLijCL-y-^XiXj
i,j=2
YJ dijdliXiXj
— 2_j
i,j=2
au(anaijXiXj)
i,j=2
an Yl i,j=2
an
aij
di\
G>ij
since a n and the left side are greater than zero, we have an
ay
■X- j J^ ^ ^-* U .
Section 4.3.
/
1
\
- 1 - 1 0 is 2. 0 0 0y
1. The rank of
2
6\
2. Examine by definition. CHAPTER 5 Section 5.1. 1. It is only nonzero linear combinations of X\, X2, ■ ■ ■, Xn that are eigenvec tors associated with AQ. 2. Possible.
3. (1) Eigenvalues A = ±ai. Eigenvectors associated with A = ai are ki(—i, 1), and eigenvectors associated with A = — ai are k2{i, 1), where fci and &2 are nonzero constants. (2) Eigenvalues A = 1 , 1 , - 1 . Eigenvectors associated with A = 1 are fci(0,1,0) + £2(1,0,1), where fci and ki are constants and not all zero. Eigenvectors associated with A = — 1 are k(—1,0,1), where k is a nonzero constant. (3) Eigenvalues A = 1,-1,2,2. Eigenvectors associated with them are fci(1,0,0,0), fc2(—3,2,0,0), fc3(6,1,3,0) respectively, where fci,fc2, and k$ are nonzero constants. 4. Their characteristic polynomials are /(A) = (A + 2)2(A — 4).
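Eigenvalues and eigenvectors such as those quoted above are easy to confirm numerically. A sketch of ours (NumPy; the exercise matrices are not reproduced in this answer section, so an arbitrary stand-in with eigenvalues ±i is used):

    import numpy as np

    A = np.array([[0.0, -1.0],
                  [1.0,  0.0]])
    w, V = np.linalg.eig(A)
    print(w)                                              # i and -i, in some order
    for k in range(len(w)):
        print(np.allclose(A @ V[:, k], w[k] * V[:, k]))   # A x = lambda x for each column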
(
an
••■
oin \
0>n\
'''
Qnn /
. Let X = (1,0, . . . , 0 ) ' , eigenvalue
corresponding to it is Aj. Thus we obtain a n = \i,an = 0,i = 2 , . . . ,n. Similarly, we obtain an = Aj, Oy = 0, i ^ j . Again let X = ( 1 , . . . , 1)', we obtain a n = 022 = ■ • ■ = ann, so A is a scalar matrix. 6. AX = XX,A2X
= XAX, i.e., EX = X2X, (A2 - 1)X = 0, so A2 = 1.
7. AX = XX,A2X = XAX = X2X. eigenvalue of A™.
In general, AmX
= XmX, so Am is an
8. Since X is an eigenvector of A, AX = XX. Thus A2X = XAX = X2X. Hence f(A)X = f{X)X, i.e., [f(X)E - f(A)]X = 0. That is to say, /(A) is an eigenvalue of f(A), X is an eigenvector of f(A) associated with /(A). 9. From AX = XX, we obtain A~lX
= X~XX. Hence A*X =
^X^X.
10. When A is singular, the rank of A* < 1. Therefore the characteristic polynomial of A* is f(X) = Xn-(An
+ --- +
Ann)Xn-1.
When An + A22 + r Ann = 0, the eigenvalues of A* are n zeros. When An + ■ ■ ■ + Ann 7^ 0, one eigenvalue of A* is An + • • • + Ann. The others are zeros. 11. Since principal minors of each order of A are greater than zero, from the characteristic polynmial of A, we immediately know eigenvalues of A can't be negative real numbers.
12. Let {XE - A)X = 0. Since {XE - P~1AP)X = p-x{XE - A)PX, we have {XE - P-1AP)P~1X = P~\XE - A)X = 0 and consequently P~lX is an eigenvector of P~lAP. 13. It follows immediately from definitions of eigenvalue and eigenvector. 14. Let A be a matrix of order n and n an odd number. Prom /(A) = \XE — A\, we obtain / ( l ) = \E - A\ — \{E - A)'\ = \E - A'\. Again since AA' = E, \A\ = 1, we have / ( l ) - \AA' -A\
= \A\ \A' -E\
= \A' - E\ = {-l)n\E
-
A'\,
but n is an odd number, so \E - A'\ = —\E - A'\. Hence \E - A'\ = 0, i.e., /(1) = 0. 15. A is positive definite, so there is a nonsingular matrix Q such that A = QQ' ■ Thus AB = QQ'B ~ Q~l{QQ'B)Q = Q'BQ. Since B is nonnegative definite and Q is nonsingular, Q'BQ is nonnegative definite, hence eigenvalues are nonnegative. Since eigenvalues of similar matrices are the same, eigenvalues of AB are nonnegative. 16. If two matrices are similar, they must be equivalent. But if they are equiv alent, then they need not be similar. A unit matrix is similar to only a unit matrix; a real symmetric and positive definite matrix is congruent to a unit matrix. A nonsingular matrix is equivalent to a unit matrix. Section
5.2.
1. Only when Xi,...,Xn
are linearly independent, the expression holds.
2. (1) There are two eigenvectors (—i, 1) and (i, 1) that are linearly indepen dent, so it is similar to a diagonal matrix I * matrix P
= (T 0-
/I (J)A~\
I. Transformation
1
|,P =
0 1 1 0 0 1
"M 0 i
/I
orP =
°
Vi
0 I 0
-1 0 1
(3) It does not have four linearly independent eigenvectors, so it is not similar to a diagonal matrix.
3. When P = EXE2 ■ ■ ■ Em, P~l = E'1 ■ ■ ■ E{\ then P~1AP = EtfE£_ l 1 ■ • ■ E{ AEi ■ • ■ Em, but E} AEi is also not convenient to use elementary operations. 4. Use the proof by contradiction. If P-1AP diag(a™,..., am). But this is impossible.
= diag(ai,... , a n ) , then o =
5. If b i , . . . ,6n are a permutation of o i , . . . , a n , since we interchange the ith and j t h row, simultaneously the ith. and the j t h column so that a* and a.j are interchanged. From Exercise 9 in Sec. 3.3 the resulting matrix can be written as EijAEij. Thus we have matrices Ei,E2,- ■ ■ ,Em which are of the same type as Eij such that Em • • • E2E\AE\E2
■ ■ ■ Em = B .
Letting P = EXE2 ■ ■ ■ Em. From fir1 = Eu we obtain P~l so P - 1 A P = B. 0 0 0 2 1 0
6. P
Section
1\
o , o
0 1\ 1 -2 0 1/
2
P~lAP
0
Vo
5.3. \
2 3
5
-*a 5
(
10 1. ( 1 ) A .
Q' = x
15 '
15 2
-1/ \ 2/1
2. ( l ) / = 4y2
+
y2-2y3 ,«
'
0 \
2
^2 2 _ l 2
2
I 2 1 ' 22
2
2 /
| x i - §x 2 + 3X3,
2/2 2/3
(2)/
3
0 2
0'
2
0 ^5 I
4y/5
/I
(2M-
3 \
i x i + §x + Ix . 3 z2 ' 3 ' 3
5yf-5y5 + 3y|-3yJ,<
2/1
5X1 + 5X2 + 5X3 + 5X4,
2/2
^ X l + \X2
~ 5X3 -
ix4,
2/3 - 5X1 2 1 * - ! - \x 2 12** +r 5X3 S-"3 - ^ X 4 , . ?/4 = ^ X i - ±X 2 - 5X3 + ± X 4 .
■ EoE 2-ca,
41 i
Linear Algebra
'X = 73X' + 7EZ' 3.
4
"I"
2
=
-t-Z
1 , < y = -73x'
+
72y'
z
+1
>
+
7Ez'
+
1
>
z
[ = -^'-^y'
+ 7E '-
4. It can't be orthogonalized, for pairwise orthogonal nonzero vectors must be linearly independent. Any nonzero vectors can be normalized. 5. Since AX = XX, we have XX7 = JCA1 = -X'A, then A X ' X = -X~'AX = -XX'X. Hence (A + XJX'X = 0, so A + A = 0, i.e., A = 0 or A is pure imaginary number. 6. As in the proof in Theorem 6, we can prove that any real skew-symmetric matrix A of order n has n linearly independent eigenvectors, for (P~lAP)' = -P~lAP. 7. Assume that A and B have the same eigenvalues. Prom /A, Ql1AQ1
/A,
\
=
,
Qi1BQ2
V
An/ 1
=
l
we obtain Q AQ = B, where Q = Q\Q2 - This proves the necessary condition. The sufficient condition obviously holds. 8. Since A is positive definite, there is an orthogonal matrix Q such that Q~lAQ = diag (A l f ...,A„). Let B = Q diag ( A J , . . . , A J | ) Q - \ then B is positive definite and by computing we know B2 = A. 9. Since A is positive definite, from the above exercise, we know there is the positive definite matrix A *. Hence we have A~ 2. Again since AB is similar to A~i{AB)At = A2BA2, while A2BA2 is a real symmetric matrix, so its eigenvalues are all real numbers. 10. Since A has n different eigenvalues, ^4 can be reduced to a diagonal matrix, i.e., p - 1 A P = D = diag(A 1 ,...,A n ), letting P~lBP = C = (c y ), from AB = BA we obtain CD = DC. When i 7^ j , we have A; ^ Aj, so c^ = 0, i ^ j . Let cu = fi\,..., c„ n = fin. Thus p - 1 B P = C = diag(/ii,...,Mn)-
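The construction in 8. can be carried out numerically. A sketch of ours (NumPy; A below is an arbitrary positive definite stand-in, not one of the book's matrices):

    import numpy as np

    A = np.array([[4.0, 1.0],
                  [1.0, 3.0]])                 # symmetric positive definite
    lam, Q = np.linalg.eigh(A)                 # orthogonal Q, real eigenvalues lam
    B = Q @ np.diag(np.sqrt(lam)) @ Q.T        # B = Q diag(sqrt(lambda_i)) Q^{-1}
    print(np.allclose(B @ B, A))               # True: B^2 = A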
Answers to Selected
Section
5.4.
1. (1)
Q~lAQ
13 14
3\/3 14
3-^/3 14
13 14
/ ^-
-i-
s/3
\/2
J_
\
V2
75
419
7e\ 2
Q i _ 1
Exercises
"75/
-1
(2) Q - U Q
cos30° - sin 30° sin30° cos 30° 1
(
1 -1
1 2
V i
1 1 1 -1 1 -1 -1 -1
1\ -1 1 1/
2. The proof is the same as that of Theorem 4. 3. Suppose that A 'A_= E, AX = XX. Thus X 'A' =JX ', i.e., X 'A'1 A X ', so X 'X = XXX 'X, and consequently AA = 1, i.e., |A| = 1.
=
4. Prom Theorem 4, \A\ = \P~lAP\ = (-1)7"2. If \A\ = - 1 , then r 2 is an odd number, i.e., r 2 ^ 1, so — 1 is an eigenvalue of A. If \A\ = 1, then r 2 is an even number. Hence when n is an odd number, r\ is nonzero, i.e., ri ^ 0, so at this point 1 is an eigenvalue of A. 5. The proof is the same as that of Theorem 2. Section '
\
'-3 0 0
5.5. 48 95 -61
2. (v4 + 2E)
-1 _ -
-26' -61 34 /
23
l__?_ \
23
23
|
_3_r 23 /
3. From the characteristic polynomial f(x) = xn + a i x " - 1 H we obtain
A{An~l + aiAn~2
+ ■■■ + an^E)
V an-\X + an,
.{-±-)=E.
4. Since the characteristic polynomial of A is f(x) = x3 — x2 — x + 1, we have A3 = A + A2 - E, i.e., when n = 3, An = An~2 + A2 - E holds. Again by induction, it is easy to prove that for any positive integer n, we have An
_
An-2
+
A
2 _
E
/I ^l 100 = 50A2 - 49E - I 50 \50
1 0 i
5. =:{X(x- -9)(z 9)(x + + 9) 2 , m(x) = xx22 - 881. 1. >• (1) (1) f(x) /(*) = 2 2 [x - 2a 2aoX (OQ, + a\+a\+ a\ + a\ + «i)l a|)]52 , 0x + (a (2) /(x) /(*) = :[X* 2 m(x) -== xx22 - 2a00xx + (OQ + + aa\ + 0% a 2 + aa223).). m(x) 5 3 3)3. (3) = (x (X- - - 3) , m(x) m(x) =■(x{x- 3) . (3) /(x) fix) = 6. Suppose that the minimal polynomial of A and A' are mi(x) and respectively. Let mi(x) = xm + 6 i x m _ 1 +
m^x)
h 6TO_!X + 6 m ,
thus Am + ^A™-1 + ■■■ + bm-iA + bmE = o hence A'm + biA""- 1 + • • • + 6 m _iA' + bmE = o, i.e., mi(i4') = o, so m2(x)|mi(x). Similarly mi(x)|m2(x). Hence mi(x) = m 2 (x). 7. It follows from Theorem 5. 8. Since the minimal polynomial of A is m(x) = x 2 — 1, the characteristic polynomial of A is /(x) = (x + l) 2 (x — 1), or f(x) = (x + l)(x— l ) 2 . Hence the trace of A is either 1 or —1. 9. Suppose that Q lAQ =
'•
and A i,A2,... ,At different.
From Exercise 7 in Sec. 3.2 we obtain Q lBQ = 1
"•
V
' Bt
. Since
B is similar to a diagonal matrix, the minimal polynomial of B has no repeated root. Since the minimal polynomial of B is the lowest common multiple of the minimal polynomials of B\, B2,..., Bt, the minimal polyno mials of J5i, B2, ■ ■ ■, Bt have no multiple root. Hence B\,...,Bt are similar 1 to diagonal matrices. Let T~ BiTi = A , thus Tf1
'BX
Dx
'Ti
" IT1.
Bt
Dt
Tt,
i.e., •.-I
Bx
'Dx T Bt
Dt
so
T~lQ-lBQT
=
'Dx
(QT)-lB{QT)
Dt T~1Q-1AQT
= T~1
(XxEx
XxEx XtEt
XtEt,
CHAPTER 6 Section L (1)
6.2.
(o (A-o)(A-6)J-
{ i]
'
(A + l ) ( A 2 - 2 ) /
■
(3)ItiseasytofindthatD2(A) = A(A + l), £>3(A) = A 2 (A+1) 3 , Di(A) = 1, so £73(A) = A(A + l ) 2 , E2(X) = A(A + 1), Ei(X) = 1. The canonical form required is 1 0 0 A(A + 1) 0 0
(4)
£>4
=
(A -
(A-l)2,
I) 4 -
Let
Af(A) =
0 0 A(A + 1) 2
A-3 4 -6 A+ l 0 0 -1 A-2 -1 5 1 A
A£ 1 } (A) =
-1 A+l -1
0 0 A-2
: (A - 2)
(A + 1)(A - l) 2 . Since
D3(X) A, (^i () A ) , we have D3{\) = (A - l ) 2 . Moreover -6 -1 - 1 6 ^ 0, so £>2(A) = -Di(A) = 1, E4(\) 14 5
D3(\)\A?\\), A«(A) =
E*W = (A - I) 2 , E2(X) = E1(X) = 1.
=
= Wi
The canonical form required is T (A-1)2
(A-I)2
2. Obviously Di = D2 = ■ ■ = A i - i = 1- Adding A times the nth row to the (n - l)th row, A times the (n - l)th row to the (n - 2)th row, continuing in this way, we obtain 0
/(A)
\A{\)\ =
/(A), -1 X + ai
so Dn(\)
= /(A). Hence = l,En = /(A) = An + aiA™"1 + • • • + a n _iA + an.
£ i =...=En-i
3. We know from properties of determinant that when we perform type I of elementary operations on a A-matrix of order n, its determinant changes only its sign; when we perform type III of elementary operations, it is not altered; when we perform type II of elementary operations, it increases by a nonzero constant factor. Section 6.3.
1. (1) .4(A)
/A(A-2)2 0 0 0
V
o
0 A3(A-2)3 0 0 0
0 0 A-2 0 0
0 0 0 A(A + 1) 0
°0 \ 0 0 A-2/
so elementary divisors are A, A, A3, A + 1. A - 2. A - 2. (A - 2) 2 , (A - 2) 3 ,
invariant factors are E5 = A3(A + 1)(A - 2) 3 , E4 = A(A - 2) 2 , E3=X(X-2),E2 (
A-3 (A-l)2 (2) A(X) ~ 0 V-(A-l)2
-1 0 0 0
= X-2,E1 0 0 0 A(A-2) + l
= 1. 0\ 0 1 0/
1 (A-l)2
(A-l)2,
so elementary divisors of A(X) are (A-l)2, (A-l)2, invariant factors are EA.
/
= (A - l ) 2 , E3 = (A - l ) 2 , E2 = E1 = 1.
1 0 (3) A(X) * A+ a \ -P
/? A+ a A+ a -0 0 0 / 0 0 A
/l 0 0
0 0 1 0 0 1
V0
0 0
0 0 0
0 \ 1 3 + c/ \
[(A + a ) 2 + / 3 2 ] 2 /
so elementary divisors of A(X) are (X + a + [3i)2, (X +
a-pi)2,
invariant factors are E4 = [(A + a ) 2 + p2}2, E3 = E2 = E1 = l.
2. (1) D4 = A3(A - 1), D3 = D2 = Di = 1, so Ei = E2 = E3 = 1, £ 4 = A3(A-1). Elementary divisors are A3, A — 1. (2) £>4 = (A + 2) 4 , D3 = D2 = Di = 1, so Ex = E2 = E3 = 1, £ 4 = (A + 2) 4 . Elementary divisors are (A + 2) 4 . 3. £ 5 = A3(A - 2) 4 , EA = A(A - 2) 4 , E3 = A(A - 2), E2 = EX = 1, £>! = l, £>2 = l, D3 = A(A - 2), D4 = A2(A - 2) 5 , Z>5 = A5(A - 2) 9 . 4. A(A)
(
°0
/
o
0 0 0 0 /?2 + (A - a ) 2 1 2 0 /3 + (A - a ) 2 0 I 0 0 0 (A - a ) 2 + P2 0 (A
rs-/
I
0 0 0 0 1 2 + (A - a ) 2 P 0 0 0 0 1
0 0 0 1
-a)2+p2
0
2
(A -a)
0
+p2
-1 0 0 A-a 0 0 -1 0 0 0 0 0
0 -1 0 0 A-a 0
0 \ 0 -1 0 0 A-a/
o\ 0 -1 0 0 -1 ~B(A) 0 0 0 0 0
oj
Let Bi(A) = ("' \
' a2
— ) +P B2(A) =
2
-,)' / (>i
V
1 —
2
\
a ) + (3
2
1 2 a ) +/3 (A
V
Thus S(A) = I *' . . I. Since invariant factors of Bi(A) are 1, 1 V B2[\) J 1, it has no elementary divisors. The invariant factors of B2{\) are 1, 1, [(A - a ) 2 + p2}3. Elementary divisors are (A - a + pi)3, (A - a - Pi)3, so the invariant factors of A(\) are E\ = E2 = E3 = E\ = E$ = 1, EQ = [(A - a)2 + P2}3. Elementary divisors are (A - a + pi)3, (A - a - Pi)3.
5. Since XE — A and XE — A' are mutual transposed matrices, corresponding minors of order A; are mutual transposed determinants, hence they are equal. Therefore they have the same greatest common divisors of each order, so they are equivalent. Hence A is similar to A'. 6. Since g(X) and h(X) are relatively prime, we have g(X)p(X) + h(X)q(X) = 1 thus 3(A) 0
0 h(X)
0 -g(X)h(X)
g(X)
ff(A)p(A)\
(g(X)
0
h(X) )
V °
1 h(X)
1 0
1 h
W
0 g(X)h(X)
7. As in Example 4, we use the conclusion of Example 3 in Sec. 6.2 to prove it. Section
6.4.
1. (1)
(2)
1 (3)
2
- 11
-1
I
/0
1 1 1
0
1
(4)
1 1,
\ i
V
0/
\ 2
1/ /'
3. J
0
\o
-V p-1
1 0
/5 0
0
U -2
-5\ 1 2
l)
2 1 2
i\ "2
J
A5 = PJbp-1
(1 = 0 v0
4 • 54 3 • 54 - 1 \ - 3 ■ 54 4 ■ 54 4-54 3-54 j
Jl
4. Assume that A = PJP'1,
then Am =
J
PJmP-
E
Jt
and consequently J"
m *E, i.e. J is a unit matrix, but
P-*P
JT)
(K j
m i
XT
=
1 \ Hence Ji = A*, so A is similar to a diagonal matrix. 5. Let Am = o. From XmX = AmX = oX = o, we obtain A = 0, that is, the eigenvalues of A are all zeros. Conversely, if the eigenvalues of A are all zeros, then we choose appropriately m such that J™ = o, where Ji is a Jordan block whose eigenvalues are all zeros, this is because the Jordan block associated with zero is a nilpotent matrix. Hence we have Am = PJmP~l = o, i.e., A is a nilpotent matrix. Or we directly use the conclusion of Example 4 to prove it.
M 6. Since A ~ J = I
\ "•
I , the rank of A = the rank of Ji +
h
the rank of Jt, J™' = o, where Jj is a matrix of order m*. 7. Suppose that P~lAP is a Jordan canonical form. Since A is a nilpotent matrix, P~XAP is a triangular matrix whose elements on the main diagonal are all zeros. Thus P~1(A+E)P = P~1AP+E is a triangular matrix whose elements on the main diagonal are all 1, so \A + E\ = 1. 8. The eigenvalues of A are either 0 or 1. Suppose that A = PJP~l, A2 = PJ2P~l, so J 2 = J. Thus J is a diagonal matrix.
then
9. Suppose A ~ J. The trace of J is the sum of eigenvalues of A, while tiA = tr J, so the trace of A is also the sum of its eigenvalues. 10. -A(A) need not be similar to a block diagonal matrix, this is because mi + (- mt need not be the order of A(X).
Answers to Selected
Jx 11. Suppose that A = PJP~\
427
Exercises
\
J = |
KX ,K =
jj where Ki =
\
Kt
and Ji are of the same order. By comput-
I ing it is easy to check that
)
Ji=KiJ'iKrl,J
=
KJ'K-1.
Thus A = PiKJ'K-^p-1
= PK^P'A'P-^K^P-1
=
BA'B'1,
where B = PKP'. Hence A = BC, where C = A'B'1. Clearly B is a nonsingular and symmetric matrix. It follows from B~XA — A'B-1 that C is also a symmetric matrix. 12. (1) \\E - A\ = (A - 1)2(A - 5);
(2) \XE - B\ = (A - 1)3(A - 2).
CHAPTER 7 Section 7.1. 1. None of (1), (2), and (3) are number fields. (4) is a number field. 2. Neither of (1) and (3) is a linear space; (2) is a linear space over real number field; (4) is a linear space over the complex numbers. 3. (1) forms linear space. When n = 3, all vectors lie in the plane x + y + z = 0 passing through the origin. (2) can't form a subspace. 4. k(a -f3) = k(a + (-/3)) = ka + fc(-/3) = ka - k/3. 5. This is because ( - 1 , - 1 , 0 , 0 ) + (3,0,3,3) = (2, - 1 , 3 , 3 ) , (1,1,0,0) + (-1,0, - 1 , - 1 ) = (0,1, - 1 , - 1 ) . 6. a and /3 are linearly dependent. 7. When skew-symmetric matrix and symmetric matrix are equal, they must be the zero matrix. Moreover from Example 5 in Sec. 3.2 we obtain the result required. 8. By definition.
428
Linear Algebra
Section
7.2.
1. {(3, 3, 2, 0), ( - 3 , 7, 0, 4)} is a possible basis. The dimension is 2. 2. All the real symmetric matrices of order 3 form a linear space, its dimension is 6. Its possible basis
I/ 11
f°
\
0
•
o)
H
\
f°
/0 1 0\ 1 0 0 , 0 \0 0 0/ \1
\ 1
■
o)
/°
oo,
0
0 0/
1/
V
/°
0 1\
\
o 0
Q\)
0 1
\o 1 0 ; j
4. Suppose that (3,7,1) =
ai(l,
3,5) + o 2 (6,3,2) + a 3 (3,1,0)
= (ai + 6a 2 + 3a 3 , 3ai + 3a 2 + a 3 , 5ai + 2a 2 ), then ai + 6a 2 + 3a 3 = 3, 3ai + 3a 2 + a 3 = 7, 5ai + 2a 2 = 1, i.e., ai = 33, a 2 = —82, a 3 = 154, and consequently coordinates required are (33, - 8 2 , 154). 5. (3, 4, 1). 6. Suppose that 2 4
3 -7
/l
0\
/ic + z + u; x —y — z \
Solving x + z + w = 2, x + y = 4, a : - 2 / - z = 3, a; = - 7 , we obtain a; = - 7 , y = 11, z = - 2 1 , IO = 30. Coordinates required are ( - 7 , 1 1 , -21,30). 7. Assume that the old basis fosr R3 is {i/j., i/2, i/ 3 }, then /l
(a1,a2,a3)
= (I/I,I/ 2 ,I/ 3 )
2
3'
I2 3 7 \ 1 3 1, /3
5
08i,A,A) = (*i,^,»*)( i 2 \4
l
1 -6
Answers to Selected
Exercises
Thus (1
(A.ft.ft) = (oi,a 2 ,a 3 )
1
2 3
2
V 3 /-IB = (oi, a 2 , a 3 )
V 3
/-27
-27 9 4
9
V 4
(
-71 20 12
\
7
1/ 7 -2 -1
5
( a i , 0:2,03)
3
7 3i
W
5
-1
\
/
" 1 / -41 9 8
13
19
7
\
3
1 \4
-71 20 12
-9
5 2 1
1
-6
181
-13
63
10
99
so /13 (2/1,2/2,2/3) = (Xl,
X2,X3) \
-9
7
19
-13
10
181 4
63 2
99 4
/ 2 0 5 [ 1 3 3 8. (1) (/3 1 ,/3 2 ,/93,/34) = ( a 1 , 0 3 , 0 3 , 0a 44 )) - 1 1 2 V 1 0 1 / (2) Since
6
2
1 -1 1 /
0 4 9 1 3
-1 _n
\
6 1 3/
1
27 4 9
I -3
9
23 27
/
1 3 3 6
2
0 5 \6
-1
IV 1 0 2 1 1 3/
6
\
3' / -l
27 \ _1 9
i
3 26 27
/ /
we have /
(2/1,2/2,2/3,2/4) =
(XI,X2,X3,X4)
9 3
-1 _ H g
i.
27 27 4 9 1 3 23 27
1 3
0 0 2 "3
-J-\ 27 \
1 9 1 3 26 27 ■
Linear Algebra
430
(3) Letting x\ = yu
x2 = y2, x 3 = y3, x 4 = y4, we obtain
{
-§£l+5SE2
-X
3
-^X
= 0,
4
&X1 - lx2 ~ | X 3 - §X4 = 0, \xX -§fXi
-X3 - \x2 + \x3
-§X4=0, - ^x4
= 0,
i.e., ' —5xi + 3x2 — 9^3 - 11^4 = 0, xi — 15x2 — 9x3 — 23x4 = 0, Xi — 3X3 — 2X4 = 0, . —7xi - 3x2 + 9x 3 — x 4 = 0. Solving the above set of linear equations, we obtain x\ = —X4, X2 = —X4, X3 = —X4. For example, x i = 1, x2 = 1, X3 = 1, X4 = — 1 , i.e., (1, 1, 1, —1) is a solution vector. 9. U is a subspace of dimension 2. {/3i, (32} is a possible basis. 10. Let a = (01,02,03,04) £ V n U, then Oi — 02 + 03 — 04 = 0,. Oi + 02 + 03 + 04 = 0 , thus oi + 03 = 0, 02 + 04 = 0,
i.e.,
oi = —03, 02 = —04 ,
so a = ( 0 1 , 0 2 , - 0 1 , - 0 2 ) = a i ( l , 0 , - 1 , 0 ) + a 2 ( 0 , 1 , 0 , —1), while (1, 0, — 1 , 0) and (0, 1, 0, —1) are linearly independent and consequently, they are a possible basis required. 11. Suppose t h a t dim(Vi n V2) = m. Since dimVi + dimV2 = 2m + 1, m ^ dimVi ^ m + 1. Hence there must be one of V\ and V2 whose dimension is m + 1. If dimVi = m + 1, then dimV"2 = m . Hence V2 = V\ fl V2, i.e., V2 C Vi; If dimy 2 = m + 1, then Vi = Vi n V2, i.e., Vx C V2. Section
7.3.
1. Neither of (1) and (2) is a linear transformation. transformations. 2. (Ti + T 2 ) ( x i , x 2 ) = (xi + x 2 , - x i - x 2 ) , T2Ti(xi,x2) = (x2,xi).
T h e others are linear
rir2(xi,x2)
=
(-x2,-xi),
Answers to Selected
431
Exercises
3. Since S(oti + a2+a1a2) = Pi+P2+P1-P2, i-e., 5(2a x ) = 2/3i, that is, S(oti) = fa. Similarly, S(a2) = /3 2 , so T = S. 4. — T is a rotation counterclockwise through an angle (180° + 6) about the origin of a Cartesian coordinate system. T _ 1 is a rotation clockwise through angle 6 about the origin. The former greatly differs from the latter. 5. A linear transformation transforms the zero element into the zero element. Some transformations transform nonzero elements into the zero element. All inverse images of one nonzero element can't form a linear space. 6. By definition. 7. Since T is an invertible transformation, T is a nonsingular transformation. Hence TU = U. T~XV = U, i.e., U is an invariant subspace under T _ 1 . 8. If T(xi, x2, X3, £4) = 0, then x\ = x2 = £3 = £4 = 0, and so T is a one-one linear transformation. 9. Since (T + 5)(l,0,0) = (1,0,1), (T + 5)(0,1,0) = (2,0,0), (T + S)(0,0,1) = (1,1,0), we have dim [Im(T + S)} = 3. 10. Since T2(x,y,z) = T(0,x,y) = (0,0,a;), ImT 2 is a subspace of dimen sion 1, {(0,0,1)} is a possible basis. KerT 2 is a subspace of dimension 2, {(0,1,0), (0,0,1)} is a possible basis. 11. Assume that T ( °
b
a c
) = ( °Q ? J . Thus b\ (I d) \0 -2c -2c
2\ _ / l 3j \0
2 \ (a 3J \c
2a + 2b-2d\ _ /0 2c / \0
Hence a + b = d, c = 0, so <(
)>(n
o)f
b d 0 0 i s a
P o s s u ^ e ^ ) a s i s ^ or
KerT. 12. Suppose that a € KerT*, i.e., ^a = o. Hence Ti+1a a € KerT i + 1 . Therefore KerT* C KerT i + 1 .
= T(T*a) = o, so
13. (1) Let Tv = u, then u = Tv = T2v = Tu. (2) Let Tv = u,v^u, then T(v - u) = Tv -Tu = u-u = o. (3) Since v = Tv + v — Tv = u + w, w eW. Moreover letting v €U (1W, since v € U, we have Tv = v. Owing to v € W, Tv = o, i.e., u = o, so Uf\W = 0.
432
Linear Algebra
14. Since r and s are real roots of equation x2 —nx + rs = 0, we have n 2 — 4rs >0. 15. Suppose that {01,02,. - - , 0 s } is a basis for TkV, {f3i,02,- ■ -,Pt} is a ba sis for KerTfc, where a G TkVC\ KerTfc, then a = aiOi + • • ■ + as a3 = &i/9i H h &t/3t, i-e., a i a i H h asas - bifii bt(3t = 0. Hence k + ■■■ + a3Tka3 = 0. But Tk{TkV) = T2kV = TkV, i.e., Tk is a aiT ai linear transformation on TkV. Hence the images of a basis { o i , 02, ■ • •, o s } for TkV are T f c o i , . . . ,Tkas. It is still a basis for TkV. They are linear independent, so ai = a2 = . . . = a3 = 0, i.e., a = o. 16. We prove the equivalent condition by using Theorem 7 in Sec. 7.1, Example 10 in Sec. 7.2, and Theorem 6. Section lection 7.4. /I 0 1.. (1) 0 1 \0 0 /2
°\
0\ 0 , 0/
V
-1
0\
,3, p) (I (? - 0j j0)/ . U
V
(5)
( (
„
U
„
(I / l 0 i1\\ 0 1 11 , • \ 0 0 00// f/ aa b b 1 —b a 0 -6 a 0 0 0 a (4) 0 0 -b 00 00 00 00 \V 00 00 (2)
'
5\ -1 ,
i \■
c« ? s 4 : 20 7 5 7 18 7
(/ - 55
- \
3 -1 0 2
00 0 0\ 1 1 00 00 6 1 0 a 0 1 00 aa 6b 00 \- -b 6 a) a/
(6)
'
47 7
1
_ 220 0 \ 2
\
• ) V ~7¥ T - T \ 2 3 5\ - 1 that 0 the - 1 matrix , - | respect -f - | a Dasis . 2. Suppose of(6)T2 with to {i J } i s ( ^ c - 1 1 0 / \ ?i m 2±) - s i n 4to 5 °a\ basis {i, j} is I 0 \ of T2/cos45° 2. Suppose that the matrix with respect Hence ( a d)(o Hence ( » V j 0 W c o s 4 5 ° -sin45J>X -1/ \sin45° cos45° / 5 0 /(ac - 6 X ' Vc rf/ \0 -\) Vsin45° cos45° / \c -d) = j , I' therefore the matrix required is I I. 11 J, _ (V72 * V \ 7V2 f 7 \v/2 / 722 / ~7 2sft)
V-i 1
o)
24
fV
-(? '
3. Since (0:1,0:2) (01,02) == (/3i,/32) (/3i,/3 2 )fI
55
v a
s { a i , a 2 } i is /-3
If (1 V
*) ,1.
2
-4N"1/4
l) l)
7,
the matrix matrix of of TT22 with with respect respect to to )),, the
2'
6W-3
- 44 \\ _ / /- -5577
U 9 A |I | I7 / " VW422 U 9JI _
- 9955 \
70j70 J'
Answers to Selected Exercises
433
Hence the matrix of T\T^ with respect to {0.1,0.2} is 3 5 4 3
-57 42
-95 70
39 -102
65 -170
-7 5
Similarly, since (/3i,/32) = ( a i , a 2 )
the matrix of Ti with
respect to {Pitfri} is -1
3 5
-7
4 3
5
-8
40
6
71
38
¥ "34
Hence the matrix of T\ + T% with respect to {/3i,/32} is /
40
[n
/° 4.
1
1
0 0 0
0 V0
38\
_34) + [6 9)-\-f 0
0
/4 6\ _ /
44
44 \
-25)
0 \
0 0 1 1 0 /
5. The matrices of T, T2,..., Tn with respect to {eo, 8i, 6 2 , . . . , e n } respec tively are
/o
/o 1
0 1 0
0
1 0 0/
. 1
V
0)
\
0
V
(° \
0/
6. We know from Exercise 9 in Sec. 3.2 that A of order n is a scalar matrix. Using definition, we obtain the proof. 7. Letfcoct+ k\Ta + • • • + fcm_iTm_1a = o. Suppose that the first nonzero scalar is kt among fco, fci,..., fcm-i, then in fact the above expression be comes kt^a + kt+iTt+1a + ■ ■ • + fc„l_iTm_1a = o. We act linear trans formation Tm~t~l on the two sides and obtain hT^a
+ kt+iTma
+ ■■■+
fcm_1TTO+m-(1+1)-1a
=
434
Linear Algebra
i.e., ktT^-^a = o. This conflicts with kt ^ o and r m _ 1 a ^ o, so m 1 a , T a , . . . , T ~ a are linearly independent. 8. The Kernel is a subspace of dimension 1, its possible basis is {(2, —1,1)}. 9. After the basis is determined, for a given matrix there is a unique linear transformation. For any linear combination of matrices, we have linear combinations of corresponding linear transformations. Since all the matrices of order n form a linear space of dimension n 2 , all the linear transformations on V also form a linear space of dimension n 2 . 10. Since £>(sin#) = 0 • sin# + 1 • cos#, D(cos8) = — 1 • sin# + 0 ■ cos#, the matrix A -1 = A2 + 1, i.e., the A of D is A = ( ). Hence \XE - A\ = 1 A characteristic polynomial of D is /(A) = A2 + 1. Again since f(D)sm6 = (D2 + l)sin0 = 0, f(D)cos0 = (D2 + l)cos0 = 0, i.e., f(D) transforms any element into the zero element. Hence f(D) = o. 11. Suppose that A and B are two matrices of linear transformations T and S with respect to some basis respectively. From AB = BA, we know TS = ST. Taking an eigenvalue Ao of T, letting VQ be a characteristic subspace corresponding to Ao, that is, for any a. € Vo, Ta. = XQOL. We have 5 a € Vo, for T(Sa) = S(Ta) = XoSa. Therefore VQ is an invariant subspace under S, i.e., S can be regarded as a linear transformation on VoThus there exists o ^ (3 s Vb such that 5(/3) = /u/3, where \i is an eigenvalue of S. Obviously T(3 — A0/3, so /3 is a common eigenvector required. Section
7.5.
1. T(x,0,y,0) = (x,y,Q), so ImT is a subspace of dimension 2. Its possible basis is {(1, 0, 0), (0, 1, 0)}. When (x + y,z + w,0) = (0,0,0), we have x = —y, z = —w, so KerT is a subspace of dimension 2, its possible basis i s { ( l , - 1 , 0 , 0 ) , (0,0, 1 , - 1 ) } . 2. Similar to Theorem 1, we have T ( a i , . . . , a , , ) = ( f t , . . . , 0 m ) A , T ( a i , . . . , < ) = QSi,..., f3m)B , hence T ( a i , . . . ,an)P Therefore 5 = AP.
= (pu...,(3m)B,
so ( f t , . . . ,/3 m )(B - AP) = 0.
Answers to Selected
435
Exercises
3. Since TS(U) = T(SU) C TV, we consider TS as a linear transformation from SU into TV. Hence the rank of TS < min (the rank of T, the rank of 5). Section
7.6.
1. (1) Yes;
(2) No.
2. (1) {(1,0,0),(0,1,0),(0,0,1)}; \^J \(.2'
2' ~2'
' (~ 2' 2'
_
2^ ' v 2 ' 2 ' J-'J '
3. f(
n
5. Let a = ^2 aiCti, then otj(a) = ^2 aidj(aj) = a,-. i=l
'
i=l
6. Suppose that {cti,...,ar} is a basis for U. Since a€~U,ati,...,ar,<* are linearly independent. Hence we can take a i , . . . , ctr, a r + i = a , a r + 2 , . . . , a n as a basis for V. So { d i , d 2 , . • • , d n } is a basis for V. Thus d r + i ( a ) = 1,d r + i(ai) = 0,i = 1,2,...,r, i.e., d r + i is the linear func tional required. 7. Let (p £ £7°. Then for any a e f/, we have TV(a) = >T(a) = 0 for T ( a ) € £/. Thus TV e U°, i.e., Z7° is an invariant subspace under T. 8. A is an eigenvalue of T, so T(£) = A£, where o / ^ F , i.e., (T - A)£ = o. Hence T - A is singular, but T — A = T — A, so T - A is also singular, thus there is 0 ^ n such that (T — A)T7 = o, i.e., Trj = Xn, so A is an eigenvalue
off. 9. fa((Tff)(4) = ¥ ^ ( 0 ) = *(¥>)(*(€))
CHAPTER 8 Section
8.1.
1. Neither of (1) and (2) is an inner product; 2. Yes.
(3) Yes.
436
Linear Algebra
3. (1) Suppose that k > 0, from a = fc/3, we obtain \a\ = k\{3\. Thus (a,/3)
fc(/3,/3)
i.e., 0 = 0. Conversely, if 6 = 0, i.e., cos0 = 1, then | a | = k\/3\, so k > 0. (2) In a similar manner we can prove the result in (2). 4. Rn becomes an inner product space. Cauchy-Schwarz inequality is w2aijaibj)
< (y^aijaiaj)
5. fciaiH hfcmCKm = 0, i-e., ki(eti,ai)-\ 6. Let a = i i a i + • • • + a;n«n- Then (a, a,) =xi(a1,ai)
(y^Oijkbj) \-km{am,cxi)
H
■ = 0, i = l , . . . , m .
|-x„(a„,aj).
Since X\,...,Xn are not all zero, the column vectors of the metric matrix A are linearly dependent. Hence \A\ = 0. 7. Since A can be considered as the metric matrix of an inner product space, letting X = {xu...,xn), Y = (j/i,...,y n ), from (X,Y)2 ^ (X,X)(Y,Y), a x we immediately obtain (£, ij iVj)2 ^ (%2aijxixj)(J2aijyiyj)Section
8.2.
1. Orthogonalizing and normalizing the basis {l,x,a; 2 ,a; 3 }, we obtain
I-*-
2 -"
^V_l), 44 V - S.)' *" v
4
2. When a = a*, its coordinates Xj = <$y, i.e., | a j | c o s ^ = <5y, so ( a i , a j ) =
Sij.
3. Firstly express /3j and f3j in terms of an,...,
an and then compute
n
£(Av«t)GSi.a*)-C8i,A).
Answers to Selected
Exercises
437
4. (1) U C (t/- 1 )- 1 , moreover dim(C/-L)-L = n - dim(C/-L) = n - (n — dim U) = dim U.
{2){ut + uj-)± = {uj-)±n{uf)± :Uinu2.
Section
8.3.
1. The matrix of a linear transformation T with respect to {a\, 0:2,03} is / 2 2 ' 3 3
Xx
i
x2
V
x3
It is an orthogonal matrix. Prom AA' . 1
X2 =
Xi
E, we obtain x3=±3-
±-,
3' The orthogonal matrices required are /
A =
2 3
2 3
i\ 3 \
I ~\ -f
1 _I \
3
or
2 _2 /
3
.4 =
3 2 3
14
3 /
2 3 1 3 2 3
■IN 2 3 2 3
2. For example, the orthogonal transformation whose matrix is /0 /o 0 1
\i
1 v/2 1 v/2
0
Vl\ 72 0/
with respect to a basis {*, J, &} is the orthogonal transformation required. 3. When \a\ = |/3| j= 0, there are two sets of orthonormal bases { Q I = |fj, a 2 , . . . , a „ } and {/3i = 1 | r , (32,...,/3n} for V. The linear trans formation T satisfying Tati = j3i is the linear transformation required. 4. When cti and a.2 are linearly dependent. Letting 02 = ka\, from (01,02) = (/3i,/32), we obtain (/32 - fej8i, /3 2 - A:/3i) = 0, i.e., /3 2 = kfa. Again from the above exercise we immediately can find T. When o.\ and a.2 are linearly independent, from Exercise 5 in Sec. 8.1 we know j3i and &2 are also linearly independent. Thus there are two bases {cx\,a.2, ■ ■ ■ ,an} and {/3i, P2, ■ ■ ■ 1 Pn} for V. Orthogonalizing and normalizing them respectively, we obtain two sets of orthonormal bases. The orthogonal transformation determined from these orthonormal bases is simply that required.
5. Since U is an invariant subspace under T, i.e., TU ⊆ U, T is also a linear transformation on U. Moreover an orthogonal transformation is one-one, while the dimension of U is finite, so TU = U. Thus for any element x ∈ U there is an element y ∈ U such that Ty = x. Let α ∈ U⊥. Then (α, y) = 0. Hence

(Tα, x) = (Tα, Ty) = (α, y) = 0,

so Tα ∈ U⊥, i.e., TU⊥ ⊆ U⊥.
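The defining property of an orthogonal transformation used in this section is easy to illustrate numerically. A sketch of ours (NumPy; Q is a random orthogonal matrix obtained from a QR factorization):

    import numpy as np

    rng = np.random.default_rng(8)
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # Q'Q = E
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    print(np.allclose(Q.T @ Q, np.eye(3)))             # True
    print(np.isclose((Q @ x) @ (Q @ y), x @ y))        # (Qx, Qy) = (x, y)
    print(np.isclose(abs(np.linalg.det(Q)), 1.0))      # True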
8.4.
/ 0 1. (1)
^/2 2
_\/2 \ 2 *
V2
i /
\ \/2
/
(2)
v^3 i
\/3 \/2
V2
V y/2 1 V2 i %/2
V3\ i
v/2
73.
1 i
i 1
4. The proof is the same as in Exercise 7 in Sec. 7.6, for {XE — A)' = XE — A '. 5. Let a 6 t / x . Then for any element (3 6 [/, we have (T*a,/3) = ( a , 7 7 3 ) = 0 , i.e., (T*a, £/) = 0, so T*a 6 Ux. Hence Ux is an invariant subspace under T*. 6. (kTa,/3) - k(Ta,0)
= k{a,T*(3) = (a, fcT*/3),
((ra)*a,/3) = (a,r<x(/3)) = (r*a,cr/3) = (a*r*(a),/3), so (JfeT)* = fcT*,(Ta)* =a*T*. 7. Since A is an eigenvalue, era = Aa. Let JC = ax = Aa;. Therefore (era;, a;) — A(x,a;) = A.
v / («^)
. Obviously (a;,x) = 1,
Answers to Selected
439
Exercises
8. Since U^AU = diag(Ai, ...,Xn)=D, i.e., A = UDIJ-1. When Xt are real numbers, we have D ' = D. So A ' = (JJ~l)'T> 'U ' = UDU~1 = A, i.e., A is a Hermitian matrix. When Aj is zero or imaginary number, we have D ' = —D. So A' = -(UDU~1) = -A, i.e., A is a skew-Hermitian matrix. When |Ajj = 1, we have D 'D = E, so A 'A = UD 'U^UDU'1 = UU'1 = E, i.e., A is a unitary matrix. 9. Suppose that T is a unitary transformation, then (a,/3) = (Ta,T(3)
= (T*(Ta),/3) =
(T*Ta,(3).
Since a and /3 are any element, we have T*T = I, i.e., T* =T~1. suppose that T* = T - 1 , then ( T a , r / 3 ) = (r*T(a),/3) = (a,/3), so T is a unitary transformation. Section
8.5.
1. Since {Ta,Ta)
=
(T*a,T*a).
2. (T - AI)(T - XI)* = (T - AJ)(T* - XI) = TT* -XT=
XT* + XXI = (T* - XI)(T - XI)
{T-XI)*(T-XI).
Moreover
INDEX

A
Addition: of linear transformations, 313; of matrices, 93; of vectors, 45
Algebraic complement (cofactor), 23, 29
Annihilator, 346
Automorphism, 310, 338

B
Bases, 286; dual, 342; for linear space, 286
Bilinear form(s), 170; matrix of, 171; rank of, 171; real, 171; skew-symmetric, 171; symmetric, 171
Block: diagonal matrix, 116; matrix, 117

C
Canonical form: Jordan, 254; of orthogonal matrices, 216; of λ-matrix, 239; rational, 258
Cauchy-Schwarz inequality, 358
Change: of bases, 298; of coordinates, 298
Characteristic: equation, 176; matrix, 175; polynomial of a matrix, 176; polynomial of a linear transformation, 333; root, 176
Coefficient: determinant, 35; matrix, 56, 70
Cofactor (algebraic complement), 23, 29, 69, 70
Column vector, 48
Complement minor, 21, 29
Components, 44
Congruent matrices, 149
Constant term, 2
Coordinate, 294; of vector, 294
Cosine of angle between two vectors, 355

D
Defect of a linear transformation, 308
Determinant: coefficient, 35; cyclic, 102; of matrix, 49; of order n, 7; skew-symmetric, 19; symmetric, 19; transposed, 15; Vandermonde, 26
Diagonal: matrix, 114; determinant, 8
Diagonalization: of matrices, 184; of real symmetric matrices, 198
Difference: between matrices, 95; between vectors, 44
Dimension of linear space, 286
Direct sum of subspaces, 283
Dual: basis, 342; space, 342
Dualistic transformation, 345

E
Eigenvalue, 176; of linear transformations, 333; of Hermitian matrix, 389; of matrices, 176; of self-conjugate transformation, 388; multiplicity of, 176
Eigenvector, 177; of linear transformations, 333; of matrices, 177; generalized, 261
Element of a linear space, 276
Elementary: column operation, 84; divisor, 244; matrices, 134; operations, 84; row operations, 81
Equality: of linear transformations, 313; of matrices, 93; of vectors, 44
Equivalent: linear systems, 80; matrices, 86; relation, 87, 149
Euclidean space, 350

F
Field: complex number, 275; number, 275; rational number, 275; real number, 275

G
General solution, 75, 77
Greatest common divisor, 239

H
Hermitian: matrix, 384; quadratic form, 389

I
Idempotent: matrix, 111; transformation, 317, 326
Image subspace, 306
Infinite-dimensional linear space, 287, 288
Indefinite quadratic form, 160
Indefinite matrix, 167
Inner product, 350; inner product space, 353
Intersection of subspaces, 282
Invariant factor, 240
Invariant subspace, 320
Inverse linear transformation, 318, 319; of a linear transformation from V into U, 338
Inverse matrix, 125
Inverse image, 305
Invertible: linear transformation, 319, 338; matrix, 125
Isomorphism, 294, 338, 339, 380

J
Jordan block matrix, 254

K
Kernel: of a linear transformation, 308; subspace, 308
Kronecker symbol, 28

L
Largest linearly independent set of vectors, 64
Law of inertia, 158, 171
Length of a vector, 200, 355, 369, 384
Linear: combination, 45, 279; expression, 45; functional, 341; operation, 45; operator, 392; space, 276
Linearly: dependent, 46; independent, 46
Linear transformation, 143, 305; from one linear space into another, 337

M
Matrix, 48; adjoint, 127; augmented, 70; characteristic, 175; constant, 175; idempotent, 111; identity, 112; metric, 353, 384; nilpotent, 99; λ-matrix, 175; of a linear transformation, 143, 324, 340; of order n, 48; of the same size, 87; real, 92; real symmetric, 122; scalar, 113; singular, 49, 92; square, 48; transition, 298; zero, 94
Matrix representation, 321
Minimal polynomial, 222
Minor, 21; principal, 29; of order k, 21
Mirroring: reflection, 374; transformation, 374

N
n-dimensional linear space, 286
Negative: definite, 160; definite matrix, 167; element, 278; index of inertia, 159, 171; linear transformation, 307, 314; matrix, 95
Nonsingular: linear transformation, 143, 307, 338; matrix, 49, 92, 127
Normal: matrix, 390; operator, 393
Normalize, 201
Nullity of linear transformation, 308

O
One-one linear transformation, 310
Orthogonal: basis, 362, 385; complement, 356; condition, 122; linear transformations, 373, 375, 380; matrices, 121; projection, 368; space, 356; vectors, 200, 355, 384
Orthogonalization, 201
Orthonormal basis, 362

P
Positive definite, 160; matrix, 167; quadratic form, 160
Positive index of inertia, 159, 171
Principal axes problem: of quadratic form, 154; for a linear space over complex numbers with inner products, 389
Principal minor, 29, 166
Product: of linear transformations, 315; of two matrices, 97
Projection, 304, 317

Q
Quadratic form, 142; coefficient matrix of, 148; negative definite, 160; rank of, 148; real, 148

R
Rank, 49; of column, 53; of a linear transformation, 306; of a set of vectors, 53; of matrix, 49, 53; of row, 53
Reflection, 304, 316

S
Same-solution system of linear equations, 57, 80
Scalar matrix, 113
Scalar multiplication, 45; of linear transformation, 314; of matrix, 93; of vector, 45
Signature, 159
Similar matrices, 174, 331
Singular linear transformation, 143, 307, 338
Skew-symmetric matrix, 120; skew-Hermitian matrix, 384
Solution vector, 62, 74
Space: characteristic, 276; Euclidean, 350; finite-dimensional linear, 287, 383; infinite-dimensional linear, 287, 351; over complex numbers, 276; over complex numbers with an inner product, 382; over real numbers, 276; over real numbers with an inner product, 350, 382; solution, 276; unitary, 350, 382
Standard form: of matrix, 85; of quadratic form, 145; of real quadratic form, 156; of bilinear form, 171
Subspace, 279; kernel, 308; spanned, 281; annihilator, 346
Sum: of linear transformations, 313; of matrices, 93; of vectors, 44
Symmetric: bilinear form, 171; matrix, 120; transformation, 371
System of fundamental solutions, 63, 66, 74
System of homogeneous linear equations, 56, 74
System of nonhomogeneous linear equations, 70

T
Theorem: Cayley-Hamilton's, 217; Cramer's, 37; Frobenius's, 256; Laplace's, 33; Shang Guo's, 361; Sylvester's, 181
Trace of matrices, 108
Transformation: affine coordinate, 157, 305; conjugate, 387; differentiation, 320; identity, 305; integration, 320; nilpotent, 336; scalar, 305; self-conjugate, 388; semi-linear, 340; unit, 305; unitary, 385; zero, 305
Transposed matrix, 118
Triangular: determinant, 9; inequality, 360; matrix, 116
Trivial solution, 37

U
Unitary matrix, 212

V
Vector: of dimension n, 44; row, 48; unit, 200, 356, 384
Vector space, 294

W
Whole solution, 74

Z
Zero: element, 277; linear space, 279; linear transformation, 305; solution, 37; vector, 44