Introduction to Matrix Analysis
Second Edition
SIAM's Classics in Applied Mathematics series consists of books that were previously allowed to go out of print. These books are republished by SIAM as a professional service because they continue to be important resources for mathematical scientists.

Editor-in-Chief: Robert E. O'Malley, Jr., University of Washington

Editorial Board: Richard A. Brualdi, University of Wisconsin-Madison; Herbert B. Keller, California Institute of Technology; Andrzej Z. Manitius, George Mason University; Ingram Olkin, Stanford University; Stanley Richardson, University of Edinburgh; Ferdinand Verhulst, Mathematisch Instituut, University of Utrecht

Classics in Applied Mathematics

C. C. Lin and L. A. Segel, Mathematics Applied to Deterministic Problems in the Natural Sciences
Johan G. F. Belinfante and Bernard Kolman, A Survey of Lie Groups and Lie Algebras with Applications and Computational Methods
James M. Ortega, Numerical Analysis: A Second Course
Anthony V. Fiacco and Garth P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques
F. H. Clarke, Optimization and Nonsmooth Analysis
George F. Carrier and Carl E. Pearson, Ordinary Differential Equations
Leo Breiman, Probability
R. Bellman and G. M. Wing, An Introduction to Invariant Imbedding
Abraham Berman and Robert J. Plemmons, Nonnegative Matrices in the Mathematical Sciences
Olvi L. Mangasarian, Nonlinear Programming
*Carl Friedrich Gauss, Theory of the Combination of Observations Least Subject to Errors: Part One, Part Two, Supplement. Translated by G. W. Stewart
Richard Bellman, Introduction to Matrix Analysis
U. M. Ascher, R. M. M. Mattheij, and R. D. Russell, Numerical Solution of Boundary Value Problems for Ordinary Differential Equations
K. E. Brenan, S. L. Campbell, and L. R. Petzold, Numerical Solution of Initial-Value Problems in Differential-Algebraic Equations
Charles L. Lawson and Richard J. Hanson, Solving Least Squares Problems
J. E. Dennis, Jr. and Robert B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations
Richard E. Barlow and Frank Proschan, Mathematical Theory of Reliability

Classics in Applied Mathematics (continued)

Cornelius Lanczos, Linear Differential Operators
Richard Bellman, Introduction to Matrix Analysis, Second Edition
Beresford N. Parlett, The Symmetric Eigenvalue Problem
Richard Haberman, Mathematical Models: Mechanical Vibrations, Population Dynamics, and Traffic Flow
Peter W. M. John, Statistical Design and Analysis of Experiments
Tamer Basar and Geert Jan Olsder, Dynamic Noncooperative Game Theory, Second Edition
Emanuel Parzen, Stochastic Processes
Petar Kokotovic, Hassan K. Khalil, and John O'Reilly, Singular Perturbation Methods in Control: Analysis and Design
Jean Dickinson Gibbons, Ingram Olkin, and Milton Sobel, Selecting and Ordering Populations: A New Statistical Methodology
James A. Murdock, Perturbations: Theory and Methods
Ivar Ekeland and Roger Temam, Convex Analysis and Variational Problems
Ivar Stakgold, Boundary Value Problems of Mathematical Physics, Volumes I and II
J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables
David Kinderlehrer and Guido Stampacchia, An Introduction to Variational Inequalities and Their Applications
F. Natterer, The Mathematics of Computerized Tomography
Avinash C. Kak and Malcolm Slaney, Principles of Computerized Tomographic Imaging
R. Wong, Asymptotic Approximations of Integrals
O. Axelsson and V. A. Barker, Finite Element Solution of Boundary Value Problems: Theory and Computation
David R. Brillinger, Time Series: Data Analysis and Theory
Introduction to Matrix Analysis
Second Edition

Richard Bellman

Society for Industrial and Applied Mathematics
Philadelphia
Society for Industrial and Applied Mathematics
3600 University City Science Center, Philadelphia, PA 19104-2688

© 1997 by The RAND Corporation.

10 9 8 7 6 5 4 3

This SIAM edition is an unabridged republication of the work first published by McGraw-Hill Book Company, New York, New York, 1970.

All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the RAND Corporation. For information, write RAND Book Program, 1700 Main Street, PO Box 2138, Santa Monica, CA 90407-2138.

Library of Congress Catalog Card Number: 97-67651
ISBN 0-89871-399-4

siam is a registered trademark.
To Nina
Contents

Foreword, xvii
Preface to Second Edition, xix
Preface, xxi

Chapter 1. Maximization, Minimization, and Motivation, 1
    Introduction; Maximization of Functions of One Variable; Maximization of Functions of Two Variables; Algebraic Approach; Analytic Approach; Analytic Approach-II; A Simplifying Transformation; Another Necessary and Sufficient Condition; Definite and Indefinite Forms; Geometric Approach; Discussion

Chapter 2. Vectors and Matrices, 12
    Introduction; Vectors; Vector Addition; Scalar Multiplication; The Inner Product of Two Vectors; Orthogonality; Matrices; Matrix Multiplication-Vector by Matrix; Matrix Multiplication-Matrix by Matrix; Noncommutativity; Associativity; Invariant Vectors; Quadratic Forms as Inner Products; The Transpose Matrix; Symmetric Matrices; Hermitian Matrices; Invariance of Distance-Orthogonal Matrices; Unitary Matrices

Chapter 3. Diagonalization and Canonical Forms for Symmetric Matrices, 32
    Recapitulation; The Solution of Linear Homogeneous Equations; Characteristic Roots and Vectors; Two Fundamental Properties of Symmetric Matrices; Reduction to Diagonal Form-Distinct Characteristic Roots; Reduction of Quadratic Forms to Canonical Form; Positive Definite Quadratic Forms and Matrices

Chapter 4. Reduction of General Symmetric Matrices to Diagonal Form, 44
    Introduction; Linear Dependence; Gram-Schmidt Orthogonalization; On the Positivity of the D_k; An Identity; The Diagonalization of General Symmetric Matrices-Two-dimensional; N-dimensional Case; A Necessary and Sufficient Condition for Positive Definiteness; Characteristic Vectors Associated with Multiple Characteristic Roots; The Cayley-Hamilton Theorem for Symmetric Matrices; Simultaneous Reduction to Diagonal Form; Simultaneous Reduction to Sum of Squares; Hermitian Matrices; The Original Maximization Problem; Perturbation Theory-I; Perturbation Theory-II

Chapter 5. Constrained Maxima, 73
    Introduction; Determinantal Criteria for Positive Definiteness; Representation as Sum of Squares; Constrained Variation and Finsler's Theorem; The Case k = 1; A Minimization Problem; General Values of k; Rectangular Arrays; Composite Matrices; The Result for General k; A Result of I. Schur

Chapter 6. Functions of Matrices, 90
    Introduction; Functions of Symmetric Matrices; The Inverse Matrix; Uniqueness of Inverse; Square Roots; Parametric Representation; The Fundamental Scalar Functions; The Infinite Integral of exp[-(x,Ax)]; An Analogue for Hermitian Matrices; Relation between J(H) and |H|

Chapter 7. Variational Description of Characteristic Roots, 112
    Introduction; The Rayleigh Quotient; Variational Description of Characteristic Roots; Discussion; Geometric Preliminary; The Courant-Fischer min-max Theorem; Monotone Behavior of lambda_k(A); A Sturmian Separation Theorem; A Necessary and Sufficient Condition that A Be Positive Definite; The Poincare Separation Theorem; A Representation Theorem; Approximate Techniques

Chapter 8. Inequalities, 126
    Introduction; The Cauchy-Schwarz Inequality; Integral Version; Holder Inequality; Concavity of |A|; A Useful Inequality; Hadamard's Inequality; Concavity of lambda_N lambda_(N-1) ... lambda_k; Additive Inequalities from Multiplicative; An Alternate Route; A Simpler Expression for lambda_N lambda_(N-1) ... lambda_k; Arithmetic-Geometric Mean Inequality; Multiplicative Inequalities from Additive

Chapter 9. Dynamic Programming, 144
    Introduction; A Problem of Minimum Deviation; Functional Equations; Recurrence Relations; A More Complicated Example; Sturm-Liouville Problems; Functional Equations; Jacobi Matrices; Analytic Continuation; Nonsymmetric Matrices; Complex A; Slightly Intertwined Systems; Simplifications-I; Simplifications-II; The Equation Ax = y; Quadratic Deviation; A Result of Stieltjes

Chapter 10. Matrices and Differential Equations, 163
    Motivation; Vector-matrix Notation; Norms of Vectors and Matrices; Infinite Series of Vectors and Matrices; Existence and Uniqueness of Solutions of Linear Systems; The Matrix Exponential; Functional Equations-I; Functional Equations-II; Functional Equations-III; Nonsingularity of Solution; Solution of Inhomogeneous Equation-Constant Coefficients; Inhomogeneous Equation-Variable Coefficients; Inhomogeneous Equation-Adjoint Equation; Perturbation Theory; Non-negativity of Solution; Polya's Functional Equation; The Equation dX/dt = AX + XB

Chapter 11. Explicit Solutions and Canonical Forms, 190
    Introduction; Euler's Method; Construction of a Solution; Nonsingularity of C; Second Method; The Vandermonde Determinant; Explicit Solution of Linear Differential Equations-Diagonal Matrices; Diagonalization of a Matrix; Connection between Approaches; Multiple Characteristic Roots; Jordan Canonical Form; Multiple Characteristic Roots; Semidiagonal or Triangular Form-Schur's Theorem; Normal Matrices; An Approximation Theorem; Another Approximation Theorem; The Cayley-Hamilton Theorem; Alternate Proof of Cayley-Hamilton Theorem; Linear Equations with Periodic Coefficients; A Nonsingular Matrix Is an Exponential; An Alternate Proof; Some Interesting Transformations; Biorthogonality; The Laplace Transform; An Example; Discussion; Matrix Case

Chapter 12. Symmetric Functions, Kronecker Products and Circulants, 231
    Introduction; Powers of Characteristic Roots; Polynomials and Characteristic Equations; Symmetric Functions; Kronecker Products; Algebra of Kronecker Products; Kronecker Powers-I; Kronecker Powers-II; Kronecker Powers-III; Kronecker Logarithm; Kronecker Sum-I; Kronecker Sum-II; The Equation AX + XB = C; The Lyapunov Equation; An Alternate Route; Circulants

Chapter 13. Stability Theory, 249
    Introduction; A Necessary and Sufficient Condition for Stability; Stability Matrices; A Method of ...

Chapter 14. Markoff Matrices and Probability Theory, 263
    Introduction; A Simple Stochastic Process; Markoff Matrices and Probability Vectors; Analytic Formulation of Discrete Markoff Processes; Asymptotic Behavior; First Proof; Second Proof of Independence of Initial State; Some Properties of Positive Markoff Matrices; Second Proof of Limiting Behavior; General Markoff Matrices; A Continuous Stochastic Process; Proof of Probabilistic Behavior; Generalized Probabilities-Unitary Transformations; Generalized Probabilities-Matrix Transformations

Chapter 15. Stochastic Matrices, 281
    Introduction; Limiting Behavior of Physical Systems; Expected Values; Expected Values of Squares

Chapter 16. Positive Matrices, Perron's Theorem, and Mathematical Economics, 286
    Introduction; Some Simple Growth Processes; Notation; The Theorem of Perron; Proof of Theorem 1; First Proof That lambda(A) Is Simple; Second Proof of the Simplicity of lambda(A); Proof of the Minimum Property of lambda(A); An Equivalent Definition of lambda(A); A Limit Theorem; Steady-state Growth; Continuous Growth Processes; Analogue of Perron Theorem; Nuclear Fission; Mathematical Economics; Minkowski-Leontieff Matrices; Positivity of |I - A|; Strengthening of Theorem 6; Linear Programming; The Theory of Games; A Markovian Decision Process; An Economic Model

Chapter 17. Control Processes, 316
    Introduction; Maintenance of Unstable Equilibrium; Discussion; The Euler Equation for lambda = 0; Discussion; Two-point Boundary-value Problem; Nonsingularity of ...; Analytic Form of Solution; Alternate Forms and Asymptotic Behavior; Representation of the Square Root of B; Riccati Differential Equation for R; Instability of Euler Equation; Proof that J(x) Attains an Absolute Minimum; Dynamic Programming; ...; Potential Equation; Discretized Criterion Function

Chapter 18. Invariant Imbedding, 338
    Introduction; Terminal Condition as a Function of a ...; Discussion; Linear Transport Process; Classical Imbedding; Invariant Imbedding; Interpretation; Conservation Relation; Non-negativity of R and T; Discussion; Internal Fluxes; Absorption Processes

Chapter 19. Numerical Inversion of the Laplace Transform and Tychonov Regularization, 353
    Introduction; The Heat Equation; The Renewal Equation; Differential-difference Equations; Discussion; Instability; Quadrature; Gaussian Quadrature; Approximating System of Linear Equations; A Device of Jacobi; Discussion; Tychonov Regularization; The Equation phi(x) = lambda(x - c, x - c); Obtaining an Initial Approximation; Self-consistent ...

Appendix A. Linear Equations and Rank, 371
    Introduction; Determinants; A Property of Cofactors; Cramer's Rule; Homogeneous Systems; Rank; Rank of a Quadratic Form; Law of Inertia; Signature

Appendix B. The Quadratic Form of Selberg, 379

Appendix C. A Method of Hermite, 383

Appendix D. Moments and Quadratic Forms, 385
    Introduction; A Device of Stieltjes; A Technique of E. Fischer; Representation as Moments; A Result of Herglotz

Indexes, 391
Foreword

This book by Richard Bellman brought together a large number of results which were scattered throughout the literature. The book had enormous impact in such fields as numerical analysis, control theory, and statistics. It provided the vision and spirit for the SIAM Journal on Matrix Analysis and Applications. The reader should note that the problem sections contain many useful results that are not easily found elsewhere.

We believe this new SIAM edition will be of continual benefit to researchers in the applied sciences where matrix analysis plays a vital role.

Gene Golub
Editor-in-Chief
Classics in Applied Mathematics
xix
Preface to Second Edition

Since the publication of the First Edition of this book in 1960, the field of matrix theory has expanded at a furious rate. In preparing a Second Edition, the question arises of how to take into account this vast proliferation of topics, methods, and results. Clearly, it would be impossible to retain any reasonable size and also to cover the developments in new and old areas in some meaningful fashion. A compromise is, therefore, essential.

We decided after much soul-searching to add some new results in areas already covered by including exercises at the ends of chapters and by updating references. New areas, on the other hand, are represented by three additional chapters devoted to control theory, invariant imbedding, and numerical inversion of the Laplace transform.

As in the First Edition, we wish to emphasize the manner in which mathematical investigations of new scientific problems generate novel and interesting questions in matrix theory. Secondly, we hope to illustrate how the scientific background provides valuable clues as to the results to expect and even as to the methods of proof to employ.

Many new areas have not been touched upon at all, or have been just barely mentioned: graph theory, scheduling theory, network theory, linear inequalities, and numerical analysis. We have preferred, for obvious reasons, to concentrate on topics to which a considerable amount of personal effort has been devoted.

Matrix theory is replete with fascinating results and elegant techniques. It is a domain constantly stimulated by interactions with the outside world, and it contributes to all areas of the physical and social sciences. It represents a healthy trend in modern mathematics.

Richard Bellman
University of Southern California
Preface

Our aim in this volume is to introduce the reader to the study of matrix theory, a field which with a great deal of justice may be called the arithmetic of higher mathematics. Although this is a rather sweeping claim, let us see if we can justify it.

Surveying any of the classical domains of mathematics, we observe that the more interesting and significant parts are characterized by an interplay of factors. This interaction between individual elements manifests itself in the appearance of functions of several variables and correspondingly in the shape of variables which depend upon several functions. The analysis of these functions leads to transformations of multidimensional type. It soon becomes clear that the very problem of describing the problems that arise is itself of formidable nature. One has only to refer to various texts of one hundred years ago to be convinced that at the outset of any investigation there is a very real danger of being swamped by a sea of arithmetical and algebraical detail. And this is without regard to the many conceptual and analytic difficulties that multidimensional analysis inevitably conjures up.

It follows that at the very beginning a determined effort must be made to devise a useful, sensitive, and perceptive notation. Although it would certainly be rash to attempt to assign a numerical value to the dependence of successful research upon well-conceived notation, it is not difficult to cite numerous examples where the solutions become apparent when the questions are appropriately formulated. Conversely, a major effort and great ingenuity would be required were a clumsy and unrevealing notation employed. Think, for instance, of how it would be to do arithmetic or algebra in terms of Roman numerals. A well-designed notation attempts to express the essence of the underlying mathematics without obscuring or distracting.

With this as our introduction, we can now furnish a very simple syllogism. Matrices represent the most important of transformations, the linear transformations; transformations lie at the heart of mathematics; consequently, our first statement.

This volume, the first of a series of volumes devoted to an exposition of the results and methods of modern matrix theory, is intended to acquaint the reader with the fundamental concepts of matrix theory. Subsequent volumes will expand the domain in various directions. Here we shall pay particular attention to the field of analysis, both from the standpoint of motivation and application. In consequence, the contents are specifically slanted toward the needs and aspirations of analysts, mathematical physicists, engineers of all shadings, and mathematical economists.

It turns out that the analytical theory of matrices, at the level at which we shall treat it, falls rather neatly into three main categories: the theory of symmetric matrices, which invades all fields; matrices and differential equations, of particular concern to the engineer and physicist; and positive matrices, crucial in the areas of probability theory and mathematical economics. Although we have made no attempt to tie our exposition in with any actual applications, we have consistently tried to show the origin of the principal problems we consider.

We begin with the question of determining the maximum or minimum of a function of several variables. Using the methods of calculus, we see that the determination of a local maximum or minimum leads to the corresponding question for functions of much simpler form, namely functions which are quadratic in all the variables, under reasonable assumptions concerning the existence of a sufficient number of partial derivatives. In this fashion, we are led to consider quadratic forms, and thus symmetric matrices. We first treat the case of functions of two variables where the usual notation suffices to derive all results of interest. Turning to the higher dimensional case, it becomes clear that a better notation will prove of value. Nevertheless, a thorough understanding of the two-dimensional case is quite worthwhile, since all the methods used in the multidimensional case are contained therein.

We turn aside, then, from the multidimensional maximization problem to develop matrix notation. However, we systematically try to introduce at each stage only those new symbols and ideas which are necessary for the problem at hand. It may surprise the reader, for example, to see how far into the theory of symmetric matrices one can penetrate without the concept of an inverse matrix. Consistent with these ideas, we have not followed the usual approach of deluging the novice with a flood of results concerning the solutions of linear systems of equations. Without for one moment attempting to minimize the importance of this study, it is still true that a significant number of interesting and important results can be presented without slogging down this long and somewhat wearisome road. The concept of linear independence is introduced in connection with the orthogonalization process, where it plays a vital role. In the appendix we present a proof of the fundamental result concerning the solutions of linear systems, and a discussion of some of the principal results concerning the rank of a matrix. This concept, of much significance in many areas of matrix theory, is not as important as might be thought in the regions we explore here.

Too often, in many parts of mathematics, the reader is required to swallow on faith a large quantity of predigested material before being allowed to chew over any meaty questions. We have tried to avoid this. Once it has been seen that a real problem exists, then there is motivation for introducing more sophisticated concepts. This is the situation that the mathematician faces in actual research and in many applications.

Although we have tried throughout to make the presentation logical, we have not belabored the point. Logic, after all, is a trick devised by the human mind to solve certain types of problems. But mathematics is more than logic; it is logic plus the creative process. How the logical devices that constitute the tools of mathematics are to be combined to yield the desired results is not necessarily logical, no more than the writing of a symphony is a logical exercise, or the painting of a picture an exercise in syllogisms.

Having introduced square matrices, the class of greatest importance for our work here, we turn to the problem of the canonical representation of real quadratic forms, or alternatively of a real symmetric matrix. The most important result of this analysis, and one that is basic for all subsequent development of the theory of symmetric matrices, is the equivalence, in a sense that will be made precise below, of every real symmetric matrix with a diagonal matrix. In other words, multidimensional transformations of this type to a very great extent can be regarded as a number of one-dimensional transformations performed simultaneously.

The results of these preliminary chapters are instructive for several reasons. In the first place, they show what a great simplification in proof can be obtained by making an initial assumption concerning the simplicity of the characteristic roots. Secondly, they show that two methods can frequently be employed to circumvent the difficulties attendant upon multiplicity of roots. Both are potent tools of the analyst. The first is induction, the second is continuity. Of the two, continuity is the more delicate method, and requires, for its rigorous use, quite a bit more sophistication than is required elsewhere in the book. Consequently, although we indicate the applicability of this technique wherever possible, we leave it to the ambitious reader to fill in the details.

Once having obtained the diagonal representation, we are ready to derive the min-max properties of the characteristic roots discovered by Courant and Fischer. The extension of these results to the more general operators arising from partial differential equations by Courant is a fundamental result in the domain of analysis.

Having reached this point, it is now appropriate to introduce some other properties of matrices. We turn then to a brief study of some of the important matrix functions. The question of defining a general function of a matrix is quite a bit more complicated than might be imagined, and we discuss this only briefly. A number of references to the extensive literature on the subject are given.

We now return to our original stimulus, the question of the range of values of a quadratic form. However, we complicate the problem to the extent of adding certain linear constraints. Not only is the problem of interest in itself, but it also supplies a good reason for introducing rectangular matrices. Having gone this far, it also turns out to be expedient to discuss matrices whose elements are themselves matrices. This further refinement of the matrix notation is often exceedingly useful.

Following this, we consider a number of interesting inequalities relating to characteristic roots and various functions of the characteristic roots. This chapter is rather more specialized than any of the others in the volume and reflects perhaps the personal taste of the author more than the needs of the reader.

The last chapter in the part devoted to symmetric matrices deals with the functional equation technique of dynamic programming. A number of problems related to maximization and minimization of quadratic forms and the solution of linear systems are treated in this fashion. The analytic results are interesting in their dependence upon parameters which are usually taken to be constant, while the recurrence relations obtained in this way yield algorithms which are sometimes of computational value.

In the second third of the volume, we turn our attention to the application of matrix theory to the study of linear systems of differential equations. No previous knowledge of differential equations is assumed or required. The requisite existence and uniqueness theorems for linear systems will be demonstrated in the course of the discussion. The first important concept that enters in the study of linear systems with constant coefficients is that of the matrix exponential. In terms of this matrix function, we have an explicit solution of the differential equation. The case of variable coefficients does not permit resolution in this easy fashion. To obtain an analogous expression, it is necessary to introduce the product integral, a concept we shall not enter into here. The product integral plays an important role in modern quantum mechanics.

Although the construction of the exponential matrix solves in a very elegant fashion the problem of constructing an explicit solution of the linear equation with constant coefficients, it does not yield a useful representation for the individual components of the vector solution. For this purpose, we employ a method due to Euler for finding particular solutions of exponential type. In this way we are once again led to the problem of determining characteristic roots and vectors of a matrix. Since the matrix is in general no longer symmetric, the problem is very much more complicated than before. Although there are again a number of canonical forms, none of these are as convenient as that obtained for the case of the symmetric or hermitian matrix.

The representation of the solution as a sum of exponentials, and limiting cases, permits us to state a necessary and sufficient condition that all solutions of a homogeneous system tend to the zero vector as the time becomes arbitrarily large. This leads to a discussion of stability and the problem of determining in a simple fashion when a given system is stable. The general problem is quite complicated.

Having obtained a variety of results for general, not necessarily symmetric, matrices, we turn to what appears to be a problem of rather specialized interest. Given a matrix A, how do we determine a matrix whose characteristic roots are specified functions of the characteristic roots of A? If we ask that these functions be certain symmetric functions of the characteristic roots of A, then in a very natural fashion we are led to one of the important concepts of the algebraic side of matrix theory, the Kronecker product of two matrices. As we shall see, however, in the concluding part of the book, this function of two matrices arises also in the study of stochastic matrices.

The final part of the volume is devoted to the study of matrices all of whose elements are non-negative. Matrices of this apparently specialized type arise in two important ways: first in the study of Markoff processes, and secondly in the study of various economic processes. A consideration of the physical origin of these matrices makes intuitively clear a number of quite important and interesting limit theorems associated with the names of Markoff, Perron, and Frobenius. In particular, a variational representation due to Wielandt plays a fundamental role in much of the theory of positive matrices.

A brief chapter is devoted to the study of stochastic matrices. This area, dealing with the multiplicative, and hence noncommutative, aspect of stochastic processes, rather than the additive, and hence commutative, aspect, is almost completely unexplored. The results we present are quite elementary and introductory.

Finally, in a series of appendices, we have added some results which were either tangential to the main exposition, or else of quite specialized interest. The applications of Selberg, Hermite, and Fischer of the theory of symmetric matrices are, however, of such singular elegance that we felt that it was absolutely imperative that they be included.

Now a few words to the reader first entering this fascinating field. As stated above, this volume is designed to be an introduction to the theory of matrix analysis. Although all the chapters are thus introductory in this sense, to paraphrase Orwell, some are more introductory than others. Consequently, it is not intended that the chapters be read consecutively. The beginner is urged to read Chaps. 1 through 5 for a general introduction to operations with matrices, and for the rudiments of the theory of symmetric matrices. At this point, it would be appropriate to jump to the study of general square matrices, using Chaps. 10 and 11 for this purpose. Finally, Chaps. 14 and 16 should be studied in order to understand the origins of Markoff matrices and non-negative matrices in general. Together with the working out of a number of the exercises, this should cover a semester course at an undergraduate senior, or first-year graduate, level. The reader who is studying matrix theory on his own can follow the same program.

Having attained this level, the next important topic is that of the minimum-maximum characterization of the characteristic roots. Following this, we would suggest Kronecker products. From here, Chap. 6 seems appropriate. At this point, the remaining chapters can be taken in any order.

Now a few comments concerning the exercises. Since the purpose of mathematics is to solve problems, it is impossible to judge one's progress without breaking a lance on a few problems from stage to stage. What we have attempted to do is to provide problems of all levels of difficulty, starting from those which merely illustrate the text and ending with those which are of considerable difficulty. These latter have usually been lifted from current research papers. Generally, the problems immediately following the sections are routine and included for drill purposes. A few are on a higher level, containing results which we shall use at a subsequent time. Since they can be established without too much effort, we felt that it would be better to use them as exercises than to expand the text unduly by furnishing the proofs. In any case, the repetition of the same technique would make for tedious reading.

On the other hand, the problems at the end of the chapter in the Miscellaneous Exercises are usually of a higher level, with some of greater complexity than others. Although the temptation is to star those problems which we consider more difficult, a little thought shows that this is bad pedagogy. After all, the purpose of a text such as this is to prepare the student to solve problems on his own, as they appear in research in the fields of analysis, mathematical physics, engineering, mathematical economics, and so forth. It is very seldom the case that a problem arising in this fashion is neatly stated with a star attached. Furthermore, it is important for the student to observe that the complication of a problem can practically never be gauged by its formulation. One very quickly learns that some very simple statements can conceal major difficulties. Taking this all into account, we have mixed problems of all levels together in a fairly random fashion and with no warning as to their complexity. It follows that in attacking these problems, one should do those that can be done, struggle for a while with the more obdurate ones, and then return to these from time to time, as maturity and analytic ability increase.

Next, a word about the general plan of the volume. Not only do we wish to present a number of the fundamental results, but far more importantly, we wish to present a variety of fundamental methods. In order to do this, we have at several places presented alternative proofs of theorems, or indicated alternative proofs in the exercises. It is essential to realize that there are many different approaches to matrix theory, as indeed to all parts of mathematics. The importance of having all approaches available lies in the fact that different extensions may require different approaches. Indeed, some extensions may be routine with one approach and formidable, or impossible, following other routes. In this direction, although it is quite elegant and useful to derive many of the results of matrix theory from results valid for general operators, it is also important to point out various special techniques and devices which are particularly valuable in dealing with finite-dimensional transformations. It is for this reason that we have tried to rescue from oblivion a number of simple and powerful approaches which were used fifty to seventy-five years ago in the days of halcyon innocence. The human mind being what it is, repetition and cross-sections from different angles are powerful pedagogical devices. In this connection, it is appropriate to quote Lewis Carroll in The Hunting of the Snark, Fit the First: "I have said it thrice: What I tell you three times is true." - The Bellman.

Let us now briefly indicate some of the many fundamental aspects of matrix theory which have reluctantly not been discussed in this volume. In the first place, we have omitted any discussion of the theory of computational treatment of matrices. This theory has vastly expanded in recent years, stimulated by the extraordinary abilities of modern computers and those that are planned for the immediate future, and by the extraordinary demands of the current physical and economic scenes. The necessity for the development of new techniques in the numerical treatment of matrices lies in the fact that the problem of solving systems of linear equations involving a large number of variables, or that of finding the characteristic roots and vectors of a matrix of high dimension, cannot be treated in any routine fashion. Any successful treatment of these problems requires the use of new and subtle techniques. So much work has been done in this field that we thought it wise to devote a separate volume of this series to its recital. This volume will be written by George Forsythe.

In a different direction, the combinatorial theory of matrices has mushroomed in recent years. The mathematical theory of games of Borel and von Neumann, the theory of linear programming, and the mathematical study of scheduling theory have all combined to create virtually a new field of matrix theory. Not only do novel and significant analytic and algebraic problems arise directly in this way, but the pressure of obtaining computational results also has resulted in the development of still further techniques and concepts. A volume devoted to this field will be written by Alan Hoffman.

Classically, topological aspects of matrix theory entered by way of the study of electrical networks. Here we encounter a beautiful blend of analytic, algebraic, and geometric theory. A volume on this important field is to be written by Louis Weinberg.

We have, of course, barely penetrated the domain of matrix theory in the foregoing enumeration of topics. On the distant horizon, we foresee a volume on the advanced theory of matrix analysis. This would contain, among other results, various aspects of the theory of functions of matrices, the Loewner theory, the Siegel theory of modular functions of matrices, and the R matrices of Wigner. In the more general theory of functionals of matrices, the Baker-Campbell-Hausdorff theory leads to the study of product integrals. These theories have assumed dominant roles in many parts of modern mathematical physics.

Another vast field of matrix analysis has developed in connection with the study of multivariate analysis in mathematical statistics. Again we have felt that results in this domain are best presented with the background and motivation of statistics. In a common ground between analysis and algebra, we meet the theory of group representation with its many beautiful applications to algebra, analysis, and mathematical physics. This subject also requires a separate volume.

In the purely algebraic domain, there is the treatment of ideal theory by way of matrices due to Poincare. We have not mentioned this because of the necessity for the introduction of rather sophisticated concepts. In the exercises, however, there are constant reminders of the interconnection between complex numbers, quaternions, and matrices, and hints of more general connections. Closely related is the study of matrices with integer elements, a number theory of matrices. Despite the attractiveness of these topics, we felt that it would be best to have them discussed in detail in a separate volume. It is important to note that not even the foregoing enumeration exhausts the many roles played by matrices in modern mathematics and its applications.

We have included in the discussions at the ends of the chapters, and occasionally in the exercises, a large number of references to original papers, current research papers, and various books on matrix theory. The reader who becomes particularly interested in a topic may thus pursue it thoroughly. Nonetheless, we make no pretension to an exhaustive bibliography, and many significant papers are not mentioned.

Finally, I wish to express my heartfelt appreciation of the efforts of a number of friends who devotedly read through several drafts of the manuscript. Through their comments and criticisms, a number of significant improvements resulted in content, in style, and in the form of many interesting exercises. To Paul Brock, Ky Fan, and Olga Taussky, thanks. My sincere thanks are also due to Albert Madansky and Ingram Olkin, who read several chapters and furnished a number of interesting exercises.

I would like especially to express my gratitude to The RAND Corporation for its research policies which have permitted generous support of my work in the basic field of matrix theory. This has been merely one aspect of the freedom offered by RAND to pursue those endeavors which simultaneously advance science and serve the national interest.

At the last, a vote of thanks to my secretary, Jeanette Hiebert, who unflinchingly typed hundreds of pages of equations, uncomplainingly made revision after revision, and devotedly helped with the proofreading.
Richard Bellman
1
Maximization, Minimization, and Motivation

1. Introduction. The purpose of this opening chapter is to show how the question of ascertaining the range of values of a homogeneous quadratic function of two variables enters in a very natural way in connection with the problem of determining the maximum or minimum of a general function of two variables.

We shall treat the problem of determining the extreme values of a quadratic function of two variables, or as we shall say, a quadratic form in two variables, in great detail. There are several important reasons for doing so. In the first place, the three different techniques we employ, algebraic, analytic, and geometric, can all, suitably interpreted, be generalized to apply to the multidimensional cases we consider subsequently. Even more important from the pedagogical point of view is the fact that the algebraic and analytic detail, to some extent threatening in the two-dimensional case but truly formidable in the N-dimensional case, rather pointedly impels us to devise a new notation. A detailed examination of this case thus furnishes excellent motivation for the introduction of new concepts.

2. Maximization of Functions of One Variable. Let f(x) be a real function of the real variable x for x in the closed interval [a,b], and let us suppose that it possesses a convergent Taylor series of the form

    f(x) = f(c) + (x - c)f'(c) + ((x - c)²/2!)f''(c) + ...     (1)

around each point in the open interval (a,b). Let c be a stationary point of f(x), which is to say a point where f'(x) = 0, and let it be required to determine whether c is a point at which f(x) is a relative maximum, a relative minimum, or a stationary point of more complicated nature.

If c is a stationary point, the expansion appearing in (1) takes the simpler form

    f(x) = f(c) + ((x - c)²/2!)f''(c) + ...     (2)

If f''(c) = 0, we must consider further terms in the expansion. If, however, f''(c) ≠ 0, its sign tells the story. When f''(c) > 0, f(x) has a relative minimum at x = c; when f''(c) < 0, f(x) has a relative maximum at x = c.

EXERCISE

1. If f''(c) = 0, what are sufficient conditions that c furnish a relative maximum?
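As a present-day aside, the test is easy to try numerically. The following is a minimal sketch in Python; the example function f(x) = x³ - 3x, the finite-difference step, and the tolerance are illustrative choices, not part of the original text.

```python
# Classify a stationary point c of f by the sign of f''(c),
# estimating f'' with a central difference.
def second_derivative(f, c, h=1e-4):
    return (f(c + h) - 2.0 * f(c) + f(c - h)) / h**2

def classify(f, c, tol=1e-6):
    f2 = second_derivative(f, c)
    if f2 > tol:
        return "relative minimum"
    if f2 < -tol:
        return "relative maximum"
    return "higher-order terms needed"

f = lambda x: x**3 - 3.0 * x       # illustrative function; f'(x) = 0 at x = -1, 1
print(classify(f, 1.0))            # relative minimum  (f''(1) = 6 > 0)
print(classify(f, -1.0))           # relative maximum  (f''(-1) = -6 < 0)
```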
3. Maximization of Functions of Two Variables. Let us now pursue the same question for a function of two variables, f(x,y), defined over the rectangle a₁ ≤ x ≤ b₁, a₂ ≤ y ≤ b₂, and possessing a convergent Taylor series around each point (c₁,c₂) inside this region. Thus, for |x - c₁| and |y - c₂| sufficiently small, we have

    f(x,y) = f(c₁,c₂) + (x - c₁) ∂f/∂c₁ + (y - c₂) ∂f/∂c₂
             + (1/2)[(x - c₁)² ∂²f/∂c₁² + 2(x - c₁)(y - c₂) ∂²f/∂c₁∂c₂ + (y - c₂)² ∂²f/∂c₂²] + ...     (1)

Here

    ∂f/∂c₁ = ∂f/∂x at x = c₁, y = c₂
    ∂f/∂c₂ = ∂f/∂y at x = c₁, y = c₂     (2)

and so on. Let (c₁,c₂) be a stationary point of f(x,y), which means that we have the equations

    ∂f/∂c₁ = 0,   ∂f/∂c₂ = 0     (3)

Then, as in the foregoing section, the nature of f(x,y) in the immediate neighborhood of (c₁,c₂) depends upon the behavior of the quadratic terms appearing in the expansion in (1), namely,

    a(x - c₁)² + 2b(x - c₁)(y - c₂) + c(y - c₂)²     (4)

where to simplify the notation we have set

    a = (1/2) ∂²f/∂c₁²,   b = (1/2) ∂²f/∂c₁∂c₂,   c = (1/2) ∂²f/∂c₂²     (5)

To simplify the algebra still further, let us set

    x - c₁ = u,   y - c₂ = v     (6)

and consider the homogeneous quadratic expression

    Q(u,v) = au² + 2buv + cv²     (7)

An expression of this type will be called a quadratic form, specifically a quadratic form in the two variables u and v. Although we are interested only in the behavior of Q(u,v) in the vicinity of u = v = 0, the fact that Q(u,v) is homogeneous permits us to examine, if we wish, the range of values of Q(u,v) as u and v take all real values, or, if it is more convenient, the set of values assumed on u² + v² = 1. The fact that Q(ku,kv) = k²Q(u,v) for any value of k shows that the set of values assumed on the circle u² + v² = k² is related in a very simple way to the values assumed on u² + v² = 1.

If Q(u,v) > 0 for all u and v distinct from u = v = 0, f(x,y) will have a relative minimum at x = c₁, y = c₂; if Q(u,v) < 0, there will be a relative maximum. If Q(u,v) can assume both negative and positive values, we face a stationary point of more complicated type, a saddle point. Although a number of quite interesting algebraic and geometric questions arise in connection with saddle points, we shall not be concerned with these matters in this volume. If Q(u,v) is identically zero, the problem is, of course, even more complicated, but one, fortunately, of no particular importance.

EXERCISE

1. Can the study of the positivity or negativity of a homogeneous form of the fourth degree, Q(u,v) = a₀u⁴ + a₁u³v + a₂u²v² + a₃uv³ + a₄v⁴, be reduced to the study of the corresponding problem for quadratic forms?
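The classification just described can also be carried out numerically: estimate the second partial derivatives at the stationary point, form Q(u,v), and inspect its sign on the unit circle. A minimal sketch follows; the example function, step size, and tolerance are illustrative choices, not from the text.

```python
import numpy as np

# Build Q(u, v) = a*u**2 + 2*b*u*v + c*v**2 with a, b, c proportional to the
# second partials of f at (c1, c2); a positive common factor does not affect
# the sign analysis.
def quadratic_coefficients(f, c1, c2, h=1e-4):
    a = (f(c1 + h, c2) - 2 * f(c1, c2) + f(c1 - h, c2)) / (2 * h**2)
    c = (f(c1, c2 + h) - 2 * f(c1, c2) + f(c1, c2 - h)) / (2 * h**2)
    b = (f(c1 + h, c2 + h) - f(c1 + h, c2 - h)
         - f(c1 - h, c2 + h) + f(c1 - h, c2 - h)) / (8 * h**2)
    return a, b, c

def classify(f, c1, c2, tol=1e-6):
    a, b, c = quadratic_coefficients(f, c1, c2)
    t = np.linspace(0.0, 2 * np.pi, 721)              # points on u**2 + v**2 = 1
    q = a * np.cos(t)**2 + 2 * b * np.cos(t) * np.sin(t) + c * np.sin(t)**2
    if q.min() > tol:
        return "relative minimum"
    if q.max() < -tol:
        return "relative maximum"
    if q.min() < -tol and q.max() > tol:
        return "saddle point"
    return "degenerate case"

f = lambda x, y: x**2 - y**2                          # illustrative saddle at (0, 0)
print(classify(f, 0.0, 0.0))                          # saddle point
```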
4. Algebraic Approach. Let us now see if we can obtain some simple relations connecting the coefficients a, b, and c which will tell us which of the three situations described above actually occurs for any given quadratic form, au² + 2buv + cv², with real coefficients.

In order to determine the sign of Q(u,v), we complete the square in the expression au² + 2buv and write Q(u,v) in the following form:

    Q(u,v) = a(u + bv/a)² + (c - b²/a)v²     (1)

provided that a ≠ 0. If a = 0, but c ≠ 0, we carry out the same type of transformation, reversing the roles of u and v. If a = c = 0, then Q(u,v) reduces to 2buv. If b ≠ 0, it is clear that Q(u,v) can assume both negative and positive values. If b = 0, the quadratic form disappears. Let us then henceforth assume that a ≠ 0, since otherwise the problem is readily resolved.

From (1) it follows that Q(u,v) > 0 for all nontrivial u and v (i.e., for all u and v distinct from the pair (0,0); we shall employ this expression frequently below), provided that

    a > 0,   c - b²/a > 0     (2)

Similarly, Q(u,v) < 0 for all nontrivial u and v, provided that we have the inequalities

    a < 0,   c - b²/a < 0     (3)

Conversely, if Q is positive for all nontrivial u and v, then the two inequalities in (2) must hold, with a similar result holding for the case where Q is negative for all nontrivial u and v. We have thus proved the following theorem.

Theorem 1. A set of necessary and sufficient conditions that Q(u,v) be positive for all nontrivial u and v is that

    a > 0,   | a  b |
             | b  c | > 0     (4)

Observe that we say a set of necessary and sufficient conditions, since there may be, and actually are, a number of alternative, but, of course, equivalent, sets of necessary and sufficient conditions. We usually try to obtain as many alternative sets as possible, since some are more convenient to apply than others in various situations.

EXERCISE

1. Show that a set of necessary and sufficient conditions that Q(u,v) be positive is that c > 0, ac - b² > 0.
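Theorem 1 translates into a two-line computational test. A minimal sketch follows; the sample coefficient triples are illustrative, not taken from the text.

```python
# Theorem 1: Q(u, v) = a*u**2 + 2*b*u*v + c*v**2 is positive for all nontrivial
# (u, v) exactly when a > 0 and the determinant a*c - b**2 is positive.
def is_positive_definite(a, b, c):
    return a > 0 and a * c - b * b > 0

# The equivalent set from the exercise: c > 0 and a*c - b**2 > 0.
def is_positive_definite_alt(a, b, c):
    return c > 0 and a * c - b * b > 0

for coeffs in [(2.0, 1.0, 3.0), (1.0, 2.0, 1.0), (-1.0, 0.0, -2.0)]:
    print(coeffs, is_positive_definite(*coeffs), is_positive_definite_alt(*coeffs))
# (2, 1, 3)   -> True  True   (2*3 - 1 = 5 > 0)
# (1, 2, 1)   -> False False  (1*1 - 4 = -3 < 0, indefinite)
# (-1, 0, -2) -> False False  (negative definite)
```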
5. Analytic Approach. As noted above, to determine the range of values of Q(u,v) it is sufficient to examine the set of values assumed by Q(u,v) on the circle u² + v² = 1. If Q is to be positive for all nontrivial values of u and v, we must have

    min {Q(u,v) : u² + v² = 1} > 0     (1)

while the condition

    max {Q(u,v) : u² + v² = 1} < 0     (2)

is the required condition that Q(u,v) be negative for all u and v on the unit circumference.

To treat these variational problems in a symmetric fashion, we employ a Lagrange multiplier. Consider the problem of determining the stationary points of the new quadratic expression

    R(u,v) = au² + 2buv + cv² - λ(u² + v²)     (3)

The conditions ∂R/∂u = ∂R/∂v = 0 yield the two linear equations

    au + bv - λu = 0
    bu + cv - λv = 0     (4)

Eliminating u and v from these two equations, we see that λ satisfies the determinantal equation

    | a - λ    b   |
    |   b    c - λ | = 0     (5)

or

    λ² - (a + c)λ + ac - b² = 0     (6)

Since the discriminant

    (a + c)² - 4(ac - b²) = (a - c)² + 4b²     (7)

is clearly non-negative, we see that the roots of (6), which we shall call λ₁ and λ₂, are always real. Unless a = c and b = 0, these roots are distinct. Let us consider the case of distinct roots in detail.

If b = 0, the roots of the quadratic in (6) are λ₁ = a, λ₂ = c. In the first case, λ₁ = a, the equations in (4) are

    (a - λ₁)u = 0
    (c - λ₁)v = 0     (8)

which leaves u arbitrary and v = 0, if a ≠ c. Since we are considering only the case of distinct roots, this must be so.

If b ≠ 0, we obtain the nontrivial solutions of (4) by using one of the equations and discarding the other. Thus u and v are connected by the relation

    (a - λ₁)u = -bv     (9)

In order to talk about a particular solution, let us add the requirement that u² + v² = 1. This is called a normalization. The values of u and v determined in this way are

    u₁ = -b/(b² + (a - λ₁)²)^(1/2),   v₁ = (a - λ₁)/(b² + (a - λ₁)²)^(1/2)     (10)

with another set (u₂,v₂) determined in a similar fashion when λ₂ is used in place of λ₁.

6. Analytic Approach-II. Once λ₁ and λ₂ have been determined by way of (5.6),¹ the uᵢ and vᵢ are determined by the formulas of (5.10). These values, when substituted, yield the required minimum and maximum values of au² + 2buv + cv² on u² + v² = 1. It turns out, however, that we can proceed in a very much more adroit fashion.

¹ Double numbers in parentheses refer to equations in another section of the chapter.

Returning to the linear equations in (5.4), and multiplying the first by u and the second by v, we obtain

    au² + 2buv + cv² = λ(u² + v²)     (1)

This result is not unexpected; it is a special case of Euler's theorem concerning homogeneous functions, i.e.,

    u ∂Q/∂u + v ∂Q/∂v = 2Q     (2)

if Q(u,v) is homogeneous of degree 2. Since uᵢ² + vᵢ² = 1 for i = 1, 2, we see that

    λ₁ = au₁² + 2bu₁v₁ + cv₁²
    λ₂ = au₂² + 2bu₂v₂ + cv₂²     (3)

Hence one solution of the quadratic equation in (5.6) is the required maximum, with the other the required minimum. We observe then the remarkable fact that the maximum and minimum values of Q(u,v) can be obtained without any explicit calculation of the points at which they are attained. Nonetheless, as we shall see, these points have important features of their own.

Let us now derive an important property of the points (uᵢ,vᵢ), still without using their explicit values. As we have seen in the foregoing sections, these points are determined by the sets of equations

    au₁ + bv₁ - λ₁u₁ = 0        au₂ + bv₂ - λ₂u₂ = 0
    bu₁ + cv₁ - λ₁v₁ = 0        bu₂ + cv₂ - λ₂v₂ = 0     (4)
    u₁² + v₁² = 1               u₂² + v₂² = 1

Considering the first set, we have, upon multiplying the first equation by u₂, the second by v₂, and adding,

    u₂(au₁ + bv₁ - λ₁u₁) + v₂(bu₁ + cv₁ - λ₁v₁)
        = au₁u₂ + b(u₁v₂ + u₂v₁) + cv₁v₂ - λ₁(u₁u₂ + v₁v₂) = 0     (5)

Similarly, the second set yields

    au₁u₂ + b(u₁v₂ + u₂v₁) + cv₁v₂ - λ₂(u₁u₂ + v₁v₂) = 0     (6)

Subtracting, we have

    (λ₁ - λ₂)(u₁u₂ + v₁v₂) = 0     (7)

Since λ₁ ≠ λ₂ (by assumption), we obtain the result that

    u₁u₂ + v₁v₂ = 0     (8)

The geometric significance of this relation will be discussed below.

Let us also note that the quantity u₁v₂ - u₂v₁ is nonzero. For assume that it were zero. Then, together with (8), we would have the two linear equations in u₁ and v₁,

    u₁u₂ + v₁v₂ = 0
    u₁v₂ - v₁u₂ = 0     (9)

Since u₁ and v₁ are not both zero, a consequence of the normalization conditions in (4), we must have the determinantal relation

    | u₂   v₂ |
    | v₂  -u₂ | = 0     (10)

or u₂² + v₂² = 0, contradicting the last relation in (4).

EXERCISE

1. Show that for any two sets of values (u₁,v₁) and (u₂,v₂) we have the relation

    (u₁² + v₁²)(u₂² + v₂²) = (u₁u₂ + v₁v₂)² + (u₁v₂ - u₂v₁)²

and thus again that u₁v₂ - u₂v₁ ≠ 0 if the uᵢ and vᵢ are as above.
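Both conclusions of Secs. 5 and 6 are easy to confirm numerically: the roots of (5.6) are the extreme values of Q on the unit circle, and the corresponding points (u₁,v₁), (u₂,v₂) satisfy (6.8). A minimal sketch in Python with NumPy follows; the coefficients are illustrative, and the symmetric array [[a, b], [b, c]] merely anticipates the matrix notation introduced in Chapter 2.

```python
import numpy as np

a, b, c = 2.0, 1.0, 3.0                      # illustrative coefficients
A = np.array([[a, b], [b, c]])

# Roots of lambda**2 - (a + c)*lambda + (a*c - b**2) = 0, Eq. (5.6).
lam = np.roots([1.0, -(a + c), a * c - b * b])

# The same numbers appear as the characteristic roots of the symmetric array A.
vals, vecs = np.linalg.eigh(A)
print(np.sort(lam), vals)                    # identical up to rounding

# They are the minimum and maximum of Q(u, v) on u**2 + v**2 = 1 ...
t = np.linspace(0.0, 2 * np.pi, 100001)
q = a * np.cos(t)**2 + 2 * b * np.cos(t) * np.sin(t) + c * np.sin(t)**2
print(q.min(), q.max())                      # approximately vals[0], vals[1]

# ... and the extremal points (u1, v1), (u2, v2) satisfy u1*u2 + v1*v2 = 0.
print(np.dot(vecs[:, 0], vecs[:, 1]))        # zero to rounding error
```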
7. A Simplifying Transformation. Armed with a knowledge of the properties of the (uᵢ,vᵢ) contained in (6.4) and (6.8), let us see what happens if we make the change of variable

    u = u₁u' + u₂v'
    v = v₁u' + v₂v'     (1)

This is a one-to-one transformation since the determinant u₁v₂ - u₂v₁ is nonzero, as noted at the end of Sec. 6. In the first place, we see that

    u² + v² = (u₁² + v₁²)u'² + (u₂² + v₂²)v'² + 2(u₁u₂ + v₁v₂)u'v' = u'² + v'²     (2)

It follows that the set of values assumed by Q(u,v) on the circle u² + v² = 1 is the same as the set of values assumed by Q(u₁u' + u₂v', v₁u' + v₂v') on the circle u'² + v'² = 1.

Let us now see what the expression for Q looks like in terms of the new variables. We have, upon collecting terms,

    Q(u₁u' + u₂v', v₁u' + v₂v') = (au₁² + 2bu₁v₁ + cv₁²)u'²
        + (au₂² + 2bu₂v₂ + cv₂²)v'²
        + 2(au₁u₂ + b(u₁v₂ + u₂v₁) + cv₁v₂)u'v'     (3)

Referring to (6.3), (6.6), and (6.8), we see that this reduces to

    Q = λ₁u'² + λ₂v'²     (4)

The effect of the change of variable has been to eliminate the cross-product term 2buv. This is quite convenient for various algebraic, analytic, and geometric purposes, as we shall have occasion to observe in subsequent chapters where a similar transformation will be applied in the multidimensional case. As a matter of fact, the principal part of the theory of quadratic forms rests upon the fact that an analogous result holds for quadratic forms in any number of variables.

EXERCISE

1. Referring to (4), determine the conic section described by the equation Q(u,v) = 1 in the following cases:

    (a) λ₁ > 0, λ₂ > 0
    (b) λ₁ > 0, λ₂ < 0
    (c) λ₁ = λ₂ > 0
    (d) λ₁ = 0, λ₂ > 0
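The reduction (4) can be checked directly: substituting the change of variable (1) into Q leaves no cross-product term. A minimal numerical sketch follows; the coefficients are illustrative, and the eigenvector routine stands in for the normalized solutions of (6.4).

```python
import numpy as np

a, b, c = 2.0, 1.0, 3.0                       # illustrative coefficients
lam, U = np.linalg.eigh(np.array([[a, b], [b, c]]))
(u1, v1), (u2, v2) = U[:, 0], U[:, 1]         # normalized, orthogonal solutions

def Q(u, v):
    return a * u**2 + 2 * b * u * v + c * v**2

# For arbitrary (u', v'): Q(u1*u' + u2*v', v1*u' + v2*v') = lam1*u'**2 + lam2*v'**2.
rng = np.random.default_rng(0)
for up, vp in rng.standard_normal((5, 2)):
    left = Q(u1 * up + u2 * vp, v1 * up + v2 * vp)
    right = lam[0] * up**2 + lam[1] * vp**2
    print(abs(left - right) < 1e-12)          # True: no cross term survives
```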
8. Another Necessary and Sufficient Condition. Using the preceding representation, we see that we can make the following statement.

Theorem 2. A necessary and sufficient condition that Q(u,v) be positive for all nontrivial u and v is that the roots of the determinantal equation

    | a - λ    b   |
    |   b    c - λ | = 0     (1)

be positive.

EXERCISE

1. Show directly that the condition stated above is equivalent to that given in Theorem 1.
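Theorem 2 is easy to test against Theorem 1 by brute force. The following is a minimal sketch over random coefficients; the test harness is my own construction, not part of the text.

```python
import numpy as np

# Theorem 2: Q is positive definite exactly when both roots of
# lambda**2 - (a + c)*lambda + (a*c - b**2) = 0 are positive.
# Theorem 1: exactly when a > 0 and a*c - b**2 > 0.  The tests should agree.
rng = np.random.default_rng(1)
agree = 0
for a, b, c in rng.uniform(-2.0, 2.0, size=(1000, 3)):
    disc = (a - c) ** 2 + 4 * b * b            # Eq. (5.7), never negative
    lam1 = ((a + c) - disc ** 0.5) / 2
    lam2 = ((a + c) + disc ** 0.5) / 2
    by_roots = lam1 > 0 and lam2 > 0           # Theorem 2
    by_minors = a > 0 and a * c - b * b > 0    # Theorem 1
    agree += (by_roots == by_minors)
print(agree, "of 1000 random forms classified identically")
```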
9. Definite and Indefinite Forms. Let us now introduce some terminology. If Q(u,v) = au² + 2buv + cv² > 0 for all nontrivial u and v, we shall say that Q(u,v) is positive definite. If Q(u,v) < 0 for these values of u and v, we shall call Q(u,v) negative definite. If Q(u,v) can be of either sign, we shall say that it is indefinite. If we merely have the inequality Q(u,v) ≥ 0 for all nontrivial u and v, we say that Q is non-negative definite, with non-positive definite defined analogously. Occasionally, the term positive indefinite is used in place of non-negative definite.

EXERCISES

1. Show that if a₁u² + 2b₁uv + c₁v² and a₂u² + 2b₂uv + c₂v² are both positive definite, then a₁a₂u² + 2b₁b₂uv + c₁c₂v² is positive definite.
2. Under what conditions is (a₁u₁ + a₂u₂)² + (b₁u₁ + b₂u₂)² positive definite?
3. In terms of the foregoing notation, how can one tell whether au² + 2buv + cv² = 1 represents an ellipse, a hyperbola, or a parabola?
10. Geometric Approach. Let us now consider a variant of the foregoing method which will yield a valuable insight into the geometrical significance of the roots, λ1 and λ2, and the values (u_i, v_i). Assume that the equation

    au^2 + 2buv + cv^2 = 1        (1)

represents an ellipse, as pictured.

[Figure: an ellipse in the (u,v) plane, with the radius vector drawn from the origin to a point (u,v).]

The quantity r = (u^2 + v^2)^{1/2} denotes the length of the radius vector from the origin to a point (u,v) on the ellipse. Let us use the fact that the problem of determining the maximum of Q(u,v) on u^2 + v^2 = 1 is equivalent to that of determining the minimum of u^2 + v^2 on the curve Q(u,v) = 1. The Lagrange multiplier formalism as before leads to the equations

    u − λ(au + bv) = 0
    v − λ(bu + cv) = 0        (2)

These yield the equation

    | 1 − aλ    −bλ   |
    |  −bλ     1 − cλ |  =  0        (3)

or

    | a − 1/λ      b     |
    |    b      c − 1/λ  |  =  0        (4)

If λ_i is a root of this equation and (u_i, v_i) the corresponding extremum point, we see as above that

    u_i^2 + v_i^2 = λ_i        (5)

From this we conclude that one root of (4) is the square of the minimum distance from the origin to the ellipse and that the other is the square of the maximum distance. We observe then that the variational problem we have posed yields in the course of its solution the lengths of the major and minor axes of the ellipse. The condition of (6.8) we now recognize as the well-known perpendicularity or orthogonality of the principal axes of an ellipse.

The linear transformation of (7.1) is clearly a rotation, since it preserves both the origin and distance from the origin. We see that it is precisely the rotation which aligns the coordinate axes and the axes of the ellipse.

EXERCISES

1. From the foregoing facts, conclude that the area of the ellipse is given by π/(ac − b^2)^{1/2}.
2. Following the algebraic approach, show that necessary and sufficient conditions that the form Q = au^2 + 2buv + cv^2 + 2duw + ew^2 + 2fvw be positive definite are that

    a > 0        | a  b |             | a  b  d |
                 | b  c |  >  0       | b  c  f |  >  0
                                      | d  f  e |

3. Similarly, following the analytic approach, show that a necessary and sufficient condition that Q be positive definite is that all the roots of the determinantal equation

    | a − λ     b       d   |
    |   b     c − λ     f   |  =  0
    |   d       f     e − λ |

be positive.
4. If Q is positive definite, show that the equation Q(u,v,w) = 1 represents an ellipsoid and determine its volume.
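A brief numerical sketch (not part of the original text; the coefficients are illustrative) connects the roots of the determinantal equation with the axis lengths and the area formula of Exercise 1.

    import numpy as np

    # Illustrative coefficients of a positive definite form (an ellipse).
    a, b, c = 3.0, 1.0, 2.0
    A = np.array([[a, b], [b, c]])

    # Characteristic values of A; the semi-axis lengths of au^2+2buv+cv^2 = 1
    # are 1/sqrt(lambda), so the squared extreme distances are 1/lambda.
    lam = np.linalg.eigvalsh(A)
    semi_axes = 1.0 / np.sqrt(lam)

    area = np.pi * semi_axes[0] * semi_axes[1]
    print(area, np.pi / np.sqrt(a * c - b**2))   # the two agree (Exercise 1)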
11. Discussion. We have pursued the details of the two-dimensional case, details which are elementary but whose origins are perhaps obscure, in order that the reader will more readily understand the need for a better notation and appreciate its advantages. The results which seem so providential here, and which have essentially been verified by direct calculation, will be derived quite naturally in the general case. The basic ideas, however, and the basic devices are all contained in the preceding discussions.

MISCELLANEOUS EXERCISES

1. For what values of x1 and x2 is the quadratic form (a11x1 + a12x2 − b1)^2 + (a21x1 + a22x2 − b2)^2 a minimum, and what is the minimum value?
2. Show that (x1^2 + y1^2)(x2^2 + y2^2) can be written in the form

    (a1x1x2 + a2x1y2 + a3x2y1 + a4y1y2)^2 + (b1x1x2 + b2x1y2 + b3x2y1 + b4y1y2)^2

for values of a_i and b_i which are independent of x_i and y_i, and determine all such values.
3. Show that there exists no corresponding result for (x1^2 + y1^2 + z1^2)(x2^2 + y2^2 + z2^2).
4. If x1u^2 + 2x2uv + x3v^2 and y1u^2 + 2y2uv + y3v^2 are positive definite, then

    | x1y1   x2y2 |
    | x2y2   x3y3 |  >  0

5. Utilize this result to treat Exercise 1 of Sec. 9.
6. Establish the validity of the Lagrange multiplier formalism by considering the maximum and minimum values of (au^2 + 2buv + cv^2)/(u^2 + v^2). (A numerical sketch of this quotient appears after these exercises.)
7. What linear transformations leave the quadratic form Q(x1,x2) = λ(x1^2 + x2^2) + (1 − λ)(x1 + x2)^2 invariant? Here 0 ≤ λ ≤ 1.
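The sketch referred to in Exercise 6 above is not part of the original text; it samples the quotient over many directions with illustrative coefficients and observes that its extreme values are the two roots λ1, λ2.

    import numpy as np

    # Illustrative coefficients.
    a, b, c = 3.0, 1.0, 2.0
    A = np.array([[a, b], [b, c]])

    # Sample Q(u,v)/(u^2+v^2) over many directions; on the unit circle the
    # denominator is 1, so the quotient is simply Q evaluated there.
    theta = np.linspace(0.0, np.pi, 2001)
    u, v = np.cos(theta), np.sin(theta)
    quotient = a*u**2 + 2*b*u*v + c*v**2

    print(quotient.min(), quotient.max())
    print(np.linalg.eigvalsh(A))          # approximately the same two numbers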
Bibliography §1. A general discussion of the maxima and minima of functions of several variables can be found in any of a number of books on advanced calculus. A particularly thorough discussion may be found in
H. Hancock,
The Theory of Maxima and Minima, Ginn & Company,
Boston, 1907. In a later chapter we shall study the more difficult problem of constrained maxima and minima. The problem of determining the relations between the number of relative maxima, relative minima, and stationary points of other types is one which belongs to the domain of topology and will not be discussed here; see M. Morse, The Calculus of Variations in the Large, Amer. Math. Soc., 1934, vol. 18. §9. Other locutions also appear frequently. Schwerdtfeger and Mirsky use positive semidefinite; Halmos uses non-negative semidefinite.
2

Vectors and Matrices

1. Introduction. In Chap. 1 we studied the question of determining the local maxima and minima of a function of two variables. If we consider the corresponding problem for functions of N variables, and proceed as before, we immediately encounter the problem of determining simple necessary and sufficient conditions which ensure the positivity of a quadratic form in N variables,

    Q(x1, x2, ..., xN) = Σ_{i,j=1}^N a_ij x_i x_j        (1)

As we shall see later, in Chap. 5, the algebraic method presented in the previous chapter yields a simple and elegant solution of this problem. However, since we really want a much deeper understanding of quadratic forms than merely that required for this particular problem, we shall pretend here that this solution does not exist.

Our objective in this chapter then is to develop a notation which will enable us to pursue the analytic approach with a very minimum of arithmetic or analytic calculation. Pursuant to this aim, we want a notation as independent of dimension as possible.

Oddly enough, the study of quadratic functions of the form appearing above is enormously simplified by a notation introduced initially to study linear transformations of the form

    y_i = Σ_{j=1}^N a_ij x_j,        i = 1, 2, ..., N        (2)
2. Vectors. Let us begin by defining a vector, a set of N complex numbers which we shall write in the form

        | x1 |
    x = | x2 |        (1)
        | ·  |
        | xN |

A vector of this type is called a column vector. If the N numbers are arranged in a horizontal array,

    x = (x1, x2, ..., xN)        (2)

x is called a row vector. Since we can do all that we wish to do here working with column vectors, we shall henceforth reserve the term "vector" to denote a quantity having the form in (1). Lower-case letters such as x, y, z or a, b, c will be employed throughout to designate vectors. When discussing a particular set of vectors, we shall use superscripts, thus x^1, x^2, etc.

The quantities x_i are called the components of x, while N is called the dimension of the vector x. One-dimensional vectors are called scalars. These are the usual quantities of analysis. By x̄ we shall denote the vector whose components are the complex conjugates of the components of x. If the components of x are all real, we shall say that x is real.

3. Vector Addition. Let us now proceed to develop an algebra of vectors, which is to say a set of rules for manipulating these quantities. Since arbitrarily many sets of rules can be devised, the justification for those we present will and must lie in the demonstration that they permit us to treat some important problems in a straightforward and elegant fashion.

Two vectors x and y are said to be equal if and only if their components are equal, x_i = y_i for i = 1, 2, ..., N.

The simplest operation acting on two vectors is addition. The sum of two vectors, x and y, is written x + y and defined to be the vector whose components are

    x_i + y_i,        i = 1, 2, ..., N        (1)

It should be noted that the plus sign connecting x and y is not the same as the sign connecting x_i and y_i. However, since, as we readily see, it enjoys the same analytic properties, there is no harm in using the same symbol in both cases.

EXERCISES

1. Show that we have commutativity, x + y = y + x, and associativity, x + (y + z) = (x + y) + z.
2. Hence, show that x^1 + x^2 + ··· + x^N is an unambiguous vector.
3. Define subtraction of two vectors directly, and in terms of addition; that is, x − y is a vector z such that y + z = x.

4. Scalar Multiplication. Multiplication of a vector x by a scalar c1 is defined by means of the relation

    c1x = (c1x_i)        (1)

the vector whose ith component is c1x_i.

EXERCISES

1. Show that (c1 + c2)(x + y) = c1x + c1y + c2x + c2y.
2. Define the null vector, written 0, to be the vector all of whose components are zero. Show that it is uniquely determined as the vector 0 for which x + 0 = x for all x.
3. Show that it is uniquely determined by the condition that c1·0 = 0 for all scalars c1.
5. The Inner Product of Two Vectors. We now introduce a most important scalar function of two vectors x and y, the inner product. This function will be written (x,y) and defined by the relation

    (x,y) = Σ_{i=1}^N x_i y_i        (1)

The following properties of the inner product are derived directly from the definition:

    (x,y) = (y,x)        (2a)
    (x + y, z + w) = (x,z) + (x,w) + (y,z) + (y,w)        (2b)
    (c1x, y) = c1(x,y)        (2c)

This is one way of "multiplying" two vectors. However, there are other ways, which we shall not use here. The importance of the inner product lies in the fact that (x,x) can be considered to represent the square of the "length" of the real vector x. We thus possess a method for evaluating these non-numerical quantities.

EXERCISES

1. If x is real, show that (x,x) > 0 unless x = 0.
2. Show that (ux + vy, ux + vy) = u^2(x,x) + 2uv(x,y) + v^2(y,y) is a non-negative definite quadratic form in the scalar variables u and v if x and y are real, and hence that

    (x,y)^2 ≤ (x,x)(y,y)

for any two real vectors x and y (Cauchy's inequality).
3. What is the geometric interpretation of this result?
4. Using this result, show that

    (x + y, x + y)^{1/2} ≤ (x,x)^{1/2} + (y,y)^{1/2}

for any two real vectors.
5. Why is this inequality called the "triangle inequality"?
6. Show that for any two complex vectors x and y we have |(x,ȳ)|^2 ≤ (x,x̄)(y,ȳ).
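A short numerical check (not part of the original text; the two vectors are arbitrary illustrative values) of the inner product, Cauchy's inequality, and the triangle inequality.

    import numpy as np

    # Two illustrative real vectors.
    x = np.array([1.0, 2.0, -1.0, 3.0])
    y = np.array([2.0, 0.5, 4.0, -1.0])

    inner = np.dot(x, y)            # (x, y) = sum of x_i * y_i
    print(inner, np.dot(x, x))      # the inner product and the squared "length"

    # Cauchy's inequality (Exercise 2): (x,y)^2 <= (x,x)(y,y).
    assert inner**2 <= np.dot(x, x) * np.dot(y, y)

    # Triangle inequality (Exercise 4).
    assert np.sqrt(np.dot(x + y, x + y)) <= np.sqrt(np.dot(x, x)) + np.sqrt(np.dot(y, y))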
6. Orthogonality. If two real vectors are connected by the relation

    (x,y) = 0        (1)

they are said to be orthogonal. The importance of this concept derives to a great extent from the following easily established result.

Theorem 1. Let {x^i} be a set of real vectors which are mutually orthogonal, that is, (x^i, x^j) = 0, i ≠ j, and normalized by the condition that

    (x^i, x^i) = 1

Then if a particular vector x has the representation

    x = Σ_{i=1}^M c_i x^i        (2)

we have the following values for the coefficients:

    c_i = (x, x^i)        (3)

and, in addition,

    (x,x) = Σ_{i=1}^M c_i^2        (4)

EXERCISES

1. Why is the proper concept of orthogonality for complex vectors the relation (x,ȳ) = 0, and the proper normalization the condition (x,x̄) = 1?
2. Show that (x,ȳ) and (x̄,y) are complex conjugates of each other.
3. In N-dimensional Euclidean space, consider the coordinate vectors y^1, y^2, ..., y^N, where y^i has a one in the ith place and zeros elsewhere. Show that the y^i are mutually orthogonal and are, in addition, normalized. A set of this type is called orthonormal. If x = Σ_{i=1}^N c_i y^i, determine the values of the c_i and discuss the geometric meaning of the result.
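The following small illustration is not part of the original text; the orthonormal pair below is an arbitrary choice. It verifies the coefficient formula (3) and the norm identity (4) of Theorem 1.

    import numpy as np

    # An illustrative orthonormal pair in 3 dimensions (here M = 2 < N).
    x1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
    x2 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2.0)
    basis = [x1, x2]

    # A vector lying in the span of the basis.
    x = 2.0 * x1 - 3.0 * x2

    # Theorem 1: the coefficients are c_i = (x, x^i),
    # and (x, x) equals the sum of the squared coefficients.
    c = [np.dot(x, xi) for xi in basis]
    print(c)                                      # [2.0, -3.0]
    print(np.dot(x, x), sum(ci**2 for ci in c))   # both 13.0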
7. Matrices. Let us now introduce the concept of a matrix. An array of complex numbers written in the form

        | a11  a12  ···  a1N |
    A = | a21  a22  ···  a2N |        (1)
        |  ·    ·          · |
        | aN1  aN2  ···  aNN |

will be called a square matrix. Since these are the only types of matrices we shall consider to any extent, we shall reserve the term "matrix" for these entities. When other types of matrices are introduced subsequently, we shall use an appropriate modifying adjective.

The quantities a_ij are called the elements of A, and the integer N is the dimension. The quantities a_i1, a_i2, ..., a_iN are said to constitute the ith row of A, and the quantities a_1j, a_2j, ..., a_Nj are said to constitute the jth column. Throughout, matrices will be denoted by upper-case letters, X, Y, Z and A, B, C. From time to time we shall use the shorthand notation

    A = (a_ij)        (2)

The determinant associated with the array in (1) will be written |A| or |a_ij|.

The simplest relation between matrices is that of equality: two matrices are called equal if and only if their elements are equal. Following the same path as for vectors, we next define the operation of addition. The sum of two matrices A and B is written A + B and defined by the notation

    A + B = (a_ij + b_ij)        (3)

Multiplication of a matrix A by a scalar c1 is defined by the relation

    c1A = Ac1 = (c1a_ij)        (4)

Finally, by Ā we shall mean the matrix whose elements are the complex conjugates of the elements of A. When the elements of A are real, we shall call A a real matrix.

EXERCISES

1. Show that 0, the null matrix defined by the condition that all its elements are zero, is uniquely determined by the condition that A + 0 = A for all A.
2. Show that matrix addition is commutative and associative.
3. Show that A1 + A2 + ··· + AN is unambiguously defined.
8. Matrix Multiplication—Vector by Matrix. In order to make our algebra more interesting, we shall define some multiplicative operations. In order to render these concepts reasonable, we return to linear transformations of the form

    y_i = Σ_{j=1}^N a_ij x_j,        i = 1, 2, ..., N        (1)

where the coefficients a_ij are complex quantities. Since, after all, the whole purpose of our introduction of vector-matrix notation is to facilitate the study of these transformations, it is only fair that in deriving the fundamental properties of vectors and matrices we turn occasionally to the defining equations for guidance. The point we wish to make is that the definitions of addition and multiplication of vectors and matrices are not arbitrary, but, on the contrary, are essentially forced upon us by the analytic and geometric properties of the entities we are studying.

Given two vectors x and y related as in (1), we write

    y = Ax        (2)

This relation defines multiplication of a vector x by a matrix A. Observe carefully the order in which the product is written.

EXERCISES

1. Show that (A + B)(x + y) = Ax + Ay + Bx + By.
2. Consider the identity matrix I defined by the tableau

        | 1  0  ···  0 |
    I = | 0  1  ···  0 |
        | ·          · |
        | 0  0  ···  1 |

Explicitly, I = (δ_ij), where δ_ij is the Kronecker delta symbol defined by the relation

    δ_ij = 0,   i ≠ j
         = 1,   i = j

Show that

    a_ij = Σ_{k=1}^N δ_ik a_kj

3. Show that Ix = x for all x, and that this relation uniquely determines I.
9. Matrix Multiplication—Matrix by Matrix. Now let us see how to define the product of a matrix by a matrix. Consider a second linear transformation

    z = By        (1)

which converts the components of y into the components of z. In order to express the components of z in terms of the components of x, where, as above, y = Ax, we write

    z_i = Σ_{k=1}^N b_ik y_k = Σ_{k=1}^N b_ik (Σ_{j=1}^N a_kj x_j)
        = Σ_{j=1}^N (Σ_{k=1}^N b_ik a_kj) x_j        (2)

If we now introduce a new matrix C = (c_ij) defined by the relations

    c_ij = Σ_{k=1}^N b_ik a_kj,        i, j = 1, 2, ..., N        (3)

we may write

    z = Cx        (4)

Since, formally,

    z = By = B(Ax) = (BA)x        (5)

we are led to define the product of A by B,

    C = BA        (6)

where C is determined by (3). Once again, note carefully the order in which the product is written.
EXERCISES
1. Show that (A + B)(C + D) = AC + AD + BC + BD.
2. If

    T(θ) = | cos θ   −sin θ |
           | sin θ    cos θ |

show that

    T(θ1)T(θ2) = T(θ2)T(θ1) = T(θ1 + θ2)
3. Let A be a matrix with the property that a_ij = 0 if j ≠ i, a diagonal matrix, and let B be a matrix of similar type. Show that AB is again a diagonal matrix, and that AB = BA.
,. Let A be a matrix with the property that ali =- 0, j > i, a triangular or Bemidiagonal matrix, and let B be a matrix of similar type. Show that AB is again a triangular matrix, but that AB ~ BA, in general. II. Let
Show that AB is again a matrix of the same type, and that AB ... BA. (As we shall subsequently see, these matrices are equivalent to complex numbers of the form al + ial if al and al are real.) 8. Use the relation IABI = IAIIBI to show that
7. Let
and B be a matrix of similar type. Show that AB is a matrix of similar type, but that AB ~ BA in general. 8. Use the relation IABI = IAIIBI to express (all + all + all + a.l)(b ll + bl l + bsl + b.l ) as a sum of four squares. 9. Let al al
a. al
as
-a.
al -al
and B be a matrix of similar type. Show that AB is a matrix of similar type, but that AB ~ BA in general. Evaluate IAI. (These matrices are equivalent to quaternions.) 10. Consider the linear fractional transformation
If Ts(z) is a similar transformation, with coefficients ai, bit CI, dl replaced by ai, bl , el, ds, show that TI(TI(z» and TI(TI(z» are again transformations of the same type. 11. If the expression ald l - b,cI ~ 0, show that TI-I(Z) is a transformation of the same type. If ald l - bici ... 0, what type of a transformation is TI(s)? 12. Consider a correspondence between TI(z) and the matrix
written Al '" TI(z). Show that if AI'" TI(z) and AI'" TI(z), then AlAI'" TI(TI(s». What then is the condition that TI(TI(z» ... TI(TI(z» for all z7 18. How can the foregoing results be used to obtain a representation for the iterates of TI(z)? U. Show that
-1]S o =1
111. If X = (Xi;), where Xi; = (-I)N-;
Xs .. I.
(~ ~ t)
(the binomial coefficient), then
1.. From the fact that we can establilh a correspondence deduce that arithmetic can be carried out without the aid of negative number..
10. Noncommutativity. What makes the study of matrices so fascinating, albeit occasionally thorny, is the fact that multiplication is not commutative. In other words, in general,

    AB ≠ BA        (1)
A simple example is furnished by the 2 X 2 matrices (2)
and more interesting examples by the matrices appearing. in Exercises 7 and 9 of Sec. 9. Then, AB
=
[2210 157]
(3)
If AB = BA, we shall say that A and B commute. The theory of matrices yields a very natural transition from the tame domain of scalars and their amenable algebra to the more interesting and realistic world where many different species of algebras abound, each with its own singular and yet compensating properties.

EXERCISES

1. Show that AI = IA = A for all A, and that this relation uniquely determines the identity matrix.
2. Show that AO = OA = 0 for all A, and that this relation uniquely determines the null matrix.
3. Let the rows of A be considered to consist, respectively, of the components of the vectors a^1, a^2, ..., a^N, and the columns of B to consist of the components of b^1, b^2, ..., b^N. Then we may write AB = ((a^i, b^j)).
4. If AX = XA for all X, then A is a scalar multiple of I.
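The 2 × 2 example printed in the text is not legible in this transcription, so the pair below is an illustrative substitute (almost any pair will do); the sketch simply exhibits AB ≠ BA.

    import numpy as np

    A = np.array([[1, 1],
                  [0, 1]])
    B = np.array([[1, 0],
                  [1, 1]])

    print(A @ B)                          # [[2, 1], [1, 1]]
    print(B @ A)                          # [[1, 1], [1, 2]]
    print(np.array_equal(A @ B, B @ A))   # False: A and B do not commute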
11. Associativity. Fortunately, although commutativity does not hold, associativity of multiplication is preserved in this new algebra. In other words, for all A, B, and C we have (AB)C
=
A(BC)
(1)
This means that the product ABC is unambiguously defined without the
aid of parentheses. To establish this result most simply, let us employ the "dummy index" convention, which asserts that any index which is repeated is to be summed over all of its admissible values. The ijth element of AB may then be written

    (AB)_ij = a_ik b_kj        (2)

Employing this convention and the definition of multiplication given above, we see that

    (AB)C = [(a_ik b_kl) c_lj]
    A(BC) = [a_ik (b_kl c_lj)]        (3)

which establishes the equality of (AB)C and A(BC).

EXERCISES
1. Show that ABCD = A(BCD) = (AB)(CD) = (ABC)D, and ~enerally that AlAI' .. AN has a unique significance. 2. Show that A" = A . A . (n times) A is unambiguously defined, and that A"+o =AmA", m,n =- 1,2, . . . . 8. Show that Am and B" commute if A and B commute. ,. Write where X
=
[Xl
x.
XI] X.
a given matrix. Using the relation X"+l =- XX", derive recurrence relations for the x,(n) and thus derive the analytic form of the x,(n). I. Use these relations for the case where
or 8. Use these relations to find explicit representations for the elements of X" where
X =
[~ ~]
X =
A1 0] [o0 0 1 }o,
}o,
7. Find all 2 X 2 matrix solutions of XI = X. 8. Find all 2 X 2 matrix solutions of X" = X, where n is a positive integer.
12. Invariant Vectors. Proceeding as in Sec. 5 of Chap. 1, we see that the problem of determining the maximum and minimum values of

    Q = Σ_{i,j=1}^N a_ij x_i x_j

for x_i satisfying the relation Σ_{i=1}^N x_i^2 = 1 can be reduced to the problem of determining the values of λ for which the linear homogeneous equations

    Σ_{j=1}^N a_ij x_j = λx_i,        i = 1, 2, ..., N        (1)

possess nontrivial solutions. In vector-matrix terms, we can write these equations in the form of a single equation

    Ax = λx        (2)

Written in this way, the equation has a very simple significance. We are looking for those vectors x which are transformed into scalar multiples of themselves by the matrix A. Thinking of x as representing a direction indicated by the N direction numbers x1, x2, ..., xN, we are searching for invariant directions.

We shall pursue this investigation vigorously in the following chapters. In the meantime, we wish to introduce a small amount of further notation which will be useful to us in what follows.

13. Quadratic Forms as Inner Products. Let us now present another justification of the notation we have introduced. The quadratic form Q(u,v) = au^2 + 2buv + cv^2 can be written in the form

    u(au + bv) + v(bu + cv)        (1)

Hence, if the vector x and the matrix A are defined by

    x = | u |        A = | a  b |        (2)
        | v |            | b  c |

we see that

    Q(u,v) = (x, Ax)        (3)

Similarly, given the N-dimensional quadratic form

    Q(x) = Σ_{i,j=1}^N a_ij x_i x_j        (4)

where without loss of generality we may take a_ij = a_ji, we can write

    Q(x) = x1 [Σ_{j=1}^N a_1j x_j] + x2 [Σ_{j=1}^N a_2j x_j] + ··· + xN [Σ_{j=1}^N a_Nj x_j]
         = (x, Ax)        (5)

where x has the components x_i and A = (a_ij).

EXERCISES

1. Does (x,Ax) = (x,Bx) for all x imply that A = B?
2. Under what conditions does (x,Ax) = 0 for all x?
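A brief check (not from the original text; the symmetric matrix and vector are illustrative) that the double sum in (4) and the inner product (x, Ax) in (5) give the same number.

    import numpy as np

    # Illustrative symmetric matrix and vector.
    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, -1.0],
                  [0.0, -1.0, 1.0]])
    x = np.array([1.0, -2.0, 0.5])

    # The double sum of a_ij x_i x_j ...
    double_sum = sum(A[i, j] * x[i] * x[j] for i in range(3) for j in range(3))

    # ... equals the inner product (x, Ax).
    print(double_sum, x @ (A @ x))   # identical values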
14. The Transpose Matrix. Let us now define a most important matrix function of A, the transpose matrix, by means of the relation

    A' = (a_ji)        (1)

The rows of A' are the columns of A, and the rows of A are the columns of A'.

We are led to consider this new matrix in the following fashion. Consider a set of vectors {x} and another set {y}, and form all inner products (x,y) composed of one vector from one set and one from the other. Suppose now that we transform the set {x} by means of matrix multiplication by A, obtaining the new set {Ax}. Forming inner products as before, we obtain the set of values (Ax,y). Observe that

    (Ax,y) = y1 [Σ_{j=1}^N a_1j x_j] + y2 [Σ_{j=1}^N a_2j x_j] + ··· + yN [Σ_{j=1}^N a_Nj x_j]        (2)

or, rearranging,

    (Ax,y) = x1 [Σ_{i=1}^N a_i1 y_i] + x2 [Σ_{i=1}^N a_i2 y_i] + ··· + xN [Σ_{i=1}^N a_iN y_i]
           = (x, A'y)        (3)

In other words, as far as the inner product is concerned, the effect of the transformation A on the set of x's is equivalent to the transformation A' on the set of y's. We can then regard A' as an induced transformation or adjoint transformation. This simple, but powerful, idea pervades much of classical and modern analysis.

15. Symmetric Matrices. From the foregoing discussion, it is plausible that matrices satisfying the condition

    A = A'        (1)

should enjoy special properties and play an important role in the study of quadratic forms. This is indeed the case. These matrices are called symmetric, and are characterized by the condition that

    a_ij = a_ji        (2)

The first part of this volume will be devoted to a study of the basic properties of this class of matrices for the case where the a_ij are real. Henceforth, we shall use the term "symmetric" to denote real symmetric. When there is any danger of confusion, we shall say real symmetric or complex symmetric, depending upon the type of matrix we are considering.

EXERCISES

1. Show that (A')' = A.
2. Show that (A + B)' = A' + B', (AB)' = B'A', (A1A2 ··· An)' = An' ··· A2'A1', (A^n)' = (A')^n.
3. Show that AB is not necessarily symmetric if A and B are.
4. Show that A'BA is symmetric if B is symmetric.
5. Show that (Ax,By) = (x,A'By).
6. Show that |A| = |A'|.
7. Show that when we write Q(x) = Σ_{i,j=1}^N a_ij x_i x_j, there is no loss of generality in assuming that a_ij = a_ji.
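A short numerical check (not from the original text; matrices and vectors are random illustrative values) of the adjoint relation (3) and of Exercises 2 and 6 above.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4))
    B = rng.standard_normal((4, 4))
    x = rng.standard_normal(4)
    y = rng.standard_normal(4)

    # The adjoint relation (3): (Ax, y) = (x, A'y).
    assert np.isclose((A @ x) @ y, x @ (A.T @ y))

    # Exercise 2: (AB)' = B'A'; Exercise 6: |A| = |A'|.
    assert np.allclose((A @ B).T, B.T @ A.T)
    assert np.isclose(np.linalg.det(A), np.linalg.det(A.T))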
16. Hermitian Matrices. As we have mentioned above, the important scalar function for complex vectors turns out not to be the usual inner product, but the expression (x,ȳ). If we note that

    (Ax,ȳ) = (x,z̄)        (1)

where z = Ā'y, we see that the induced transformation is now Ā', the complex conjugate of the transpose of A. Matrices for which

    A = Ā'        (2)

are called Hermitian, after the great French mathematician Charles Hermite. We shall write A* in place of Ā' for simplicity of notation. As we shall see, all the vital properties of symmetric matrices have immediate analogues for Hermitian matrices. Furthermore, if we had wished, we could have introduced a notation

    [x,y] = (x,ȳ)        (3)

in terms of which the properties of both types of matrices can be derived simultaneously. There are advantages to both procedures, and the reader can take his choice, once he has absorbed the basic techniques.

EXERCISES

1. A real Hermitian matrix is symmetric.
2. (A*)* = A, (AB)* = B*A*, (A1A2 ··· An)* = An* ··· A2*A1*.
3. If A + iB is Hermitian, with A and B real, then A' = A, B' = −B.
17. Invariance of Distance—Orthogonal Matrices. Taking as our guide the Euclidean measure of distance, we introduced the quantity (x,x) as a measure of the magnitude of the real vector x. It is a matter of some curiosity, and importance too as we shall see, to determine the linear transformations y = Tx which leave (x,x) unchanged. In other words, we wish to determine T so that the equation
    (x,x) = (Tx,Tx)        (1)

is satisfied for all x. Since

    (Tx,Tx) = (x,T'Tx)        (2)

and T'T is symmetric, we see that (1) yields the relation

    T'T = I        (3)
A real matrix T possessing this property is called orthogonal.

EXERCISES

1. Show that T' is orthogonal whenever T is.
2. Show that every 2 × 2 orthogonal matrix with determinant +1 can be written in the form

    | cos θ   −sin θ |
    | sin θ    cos θ |

What is the geometrical significance of this result?
3. Show that the columns of T are orthogonal vectors.
4. Show that the product of two orthogonal matrices is again an orthogonal matrix.
5. Show that the determinant of an orthogonal matrix is ±1.
6. Let T_N be an orthogonal matrix of dimension N, and form the (N + 1)-dimensional matrix

              | 1   0 ··· 0 |
    T_{N+1} = | 0           |
              | ·    T_N    |
              | 0           |

Show that T_{N+1} is orthogonal.
7. Show that if T is orthogonal, x = Ty implies that y = T'x.
8. If AB = BA, then TAT' and TBT' commute if T is orthogonal.
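A brief sketch (not part of the original text; the angle is an arbitrary illustrative value) verifying T'T = I for a rotation and the invariance of (x,x).

    import numpy as np

    theta = 0.7          # an arbitrary angle, for illustration
    T = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    # T'T = I, so T is orthogonal ...
    assert np.allclose(T.T @ T, np.eye(2))

    # ... and the transformation y = Tx preserves (x, x), the squared length.
    x = np.array([3.0, -4.0])
    assert np.isclose(x @ x, (T @ x) @ (T @ x))

    # Its determinant is +1 or -1 (here +1, a rotation).
    print(np.linalg.det(T))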
18. Unitary Matrices. Since the appropriate measure for a complex vector is (x,x), we see that the analogue of the invariance condition of (17.3) is (1) T*T = I Matrices possessing this property are called unitary, and play the same role in the treatment of Hermitian matrices that orthogonal matrices enjoy in the theory of symmetric matrices.
EXERCISES
1. Show that T· is unitary if Tis. I. Show that the product of two unitary matrices is again unitary. a. Show that the determinant of a unitary matrix has absolute value 1. ,. Show that if T is unitary, s ... Til implies that II ... T·s. 15. Obtain a result corresponding to that given in Exercise 2 of Bec. 17 for the elements of a 2 X 2 unitary matrix. (Analogous, but more complicated, results hold for the representation of the elements of 3 X 3 orthogonal matrices in terms of elliptic functions. See F. Caspary, Zur Theorie der Thetafunktionen mit zwei Argu~ menten, Kronecker J., XCIV, pp. 74-86; and F. Caspary, Sur les syst~mes orthogonaux, form~s par les fonctions tMta, Compte. Rendua fU Pam, CIV, pp. 490-493.) 8. Is a real unitary matrix orthogonal? T. Is a complex orthogonal matrix unitary? 8. Every 3 X 3 orthogonal matrix can be represented as the product of
C?S a a [ o -Sin
0] [COS
sin a cos a 0 0 1
~n
b 0 b
o 1
o
-sin
~
cos b
]
[~ 0
C?S oc -Sin
0]
sin c c cos C
What is the geometric origin of this result? MISCELLANEOUS EXERCISES
1. Prove that
aNI
aNI
•••
aNN
XN
XI
XI
.,.
XN
0
where a ii is the cofactor of aif in lai;l. I. Let Ii; denote the matrix obtained by interchanging the ith andjth rows of the unit matrix I. Prove that Iifl ... I
a. Show that IifA is a matrix identical to A except that the ith and jth rows have been interchanged, while Ali; yields the interchange of the ith and jth columns. ,. Let Hii denote the matrix whose ijth element is h, and whose other elements are zero. Show that (I + Hif)A yields a matrix identical with A except that the ith row has been replaced by the ith row plus h times the jth row, while A(I + H if ) has a similar effect upon columns. 15. Let H. denote the matrix equal to I except for the fact that the element one in the" position is equal to k. What are H.A and AH. equal to? 6. If A is real and AA' = 0, then A = O. 7. If AA· ... 0, then A = O. 8. Show that if Tis orthopnal, its elements are uniformly bounded. Similarly, if U is unitary, its elements are uniformly bounded in absolute value. 9. Let dl denote the determinant of the system of linear homogeneous equations derived from the rell\tions
Vectors and Matrices N
27
N
[L
ai;x;]
i-I
[L
i, k == 1, 2, • • • , N
ak;x;] == 0
i-I
+
1)(2 quantities XiX!, i, j == 1,2, ••. ,N, as unknowns. Then (Schiifti). 10. Show lim An, lim Bn may exist as n ...... 00, without lim (AB)n existing. It
regarding the N(N dl ==
laiiIN(N-Il/1
n-+ 10
n-+ 10
ft-+ 10
is sufficient to take A and B two-dimensional. N
L
11. Suppose that We are given the system of linear equations
i == 1, 2, ..• ,N. If it is possible to obtain the equations
XI
= CI,
aijXj
= bi ,
i-I S.
= CI,
••• ,
by forming linear combinations of the given equations, then these equations yield a solution of the given equations, and the only solution (R. M. Robinson). 12. Introduce the Jacobi bracket symbol [A,B) = AB - BA, the commutator of A and B. Show, by direct calculation, that SN ... CN,
[A,[B,Cll
+ [B,[C,All + [C,[A,Bll
= 0
18. Let r, == ell'''n be an irreducible root of unity, and let • . , n - 1. Consider the matrix
rk
==
n-I
Show that T /n~s is unitary.
Hence, if Xk
=
l
n-I
ehikilnllio then Ilk =
i-O n-I
and
L IXll
10-0
k ... 1, 2,
ell'ikln,
L
~
e-llrikilnXj,
i-O
n-I 1
= n
L IYkl
l•
This transformation is called a finite Fourier transform.
10-0
14. SuppOSe that
N
with bu = dkk = 1 for all k.
Then
lai, I =
n
Ck.
See J. L. Burchnall. 1
10-1
115. Consider the Gauss transform B = (6i;) of the matrix A = (ali),
k>l Let All ... (aii), i, j ... 2, ••
,N.
Show that
1>.1 - BI = all-I[>.I>.l - Alii - 1>.1 - All (D. M. Kole1llanski.l I
J. L. Burchnall, Proc. Edinburgh Math. Soc., (2), vol. 9, pp. 100-104, 1954.
18. Show that the matrices
al
1
A - [
~
g
b~l]
sati8fy the relation AB - 1. 1T. Show that 111
SI SI
Xl
II III
1/1
SI
Sl
I
-
(SI
-
SII
+ III + II)(SI + "'III + ",I'I)(SI + ",1111 + wzt) + 111 1 + SII -
32:llIlsl
where '" is a cube root of unity. 18. Hence, show that if Q(s) - SII + SII + Sal where S is a bilinear form in the SI and IIi, that is, SI aii. are independent of the Si and IIi. 19. Show that
-
then Q(s)Q(II) - Q(s), %a.ii,,:ei7/k, where the coefficient8
3SIStSl,
S.
SI
Xa
SI
-Sf
S.
SI
Sa SI
J:a
-SI
SI
_
(SII
+ SII + SII + S.I)I
and thus that (SII
+ SII + Sal + S.I) (111 1 + 1111 + lIa l + 1/.1) , - SII + SII + ZII + S.I,
where the II are bilinear forms in the Si and III. It was shown by Hurwitz that a product of N squares multiplied by a product of N squares is a product of N squares in the above sense, only when N - 1,2,4,8; see A. Hurwitz. 1 For an exposition of the theory of reproducing forms, see C. C. MacDufJee. I •1 10. Prove that a matrix A whose elements are given by the relations
1)
(-1) i-I (~ 1-1 - (_1)1-1
aif -
- 0
satisfies the relation AI - 1.
Here
(~)
i <j i - i i >i
is the binomial coefficient, nl/kl(n - k)t.
11. Let IIi - IIi (SISI, • • • ,SN) be a let of N functionl of the Si, i ... 1, 2, .•• , N. The matrix J - J(II,s) - (alii/aSi) is called the Jacobian matrix and its determinant the Jacobian of the transformation. Show that J(S,II)J(II~S) - J(I,S)
Ill. Consider the relation between the NI variables lIif and the NI variables Sit given by Y == AXB, where A and B are constant matrices. Show that IJ(Y,X)I IAINIBIN. I A. Hurwitz, Vber die Kompo.ition der quadratischen Formen von beliebig vielen Variablen, Math. Werke, bd II, Basel, 1933, pp. 565-571. I C. C. MacDulIee, On the Composition of Algebraic Forms of Higher Degree, Bull. Anw. Math. 8oc., vol. 51 (1945), pp. 198-211. I J. Radon, Lineare Scharen orthogonaler Matrizen, Abh. Math. 8em. Hamb., vol. 1 (1921), pp. 1-14.
Vectors and Matrices
29 N
23. If Y ... XX', where X is triangular, then IJ(Y,X)I = 2N
n
X"N-i+l.
24. Show that the problem of determining the maximum of the function (s,As) 2(s,b) leads to the vector equation Ax = b.
+
+ +
21S. Similarly, show that the problem of minimizing (s,Bs) 2(x,Ay) (y,By) 2(a,s) - 2(b,y) over ali x and yleads to the simultaneous equations Bx Ay ... a, A's By'" b. 28. Let f(s) be a function of the variable s which assumes only two values, s ... 0, z - 1. Show that f(s) may be written in the form a bx, where a ... f(O), b ... f(l) - f(O). 27. Let g(s) be a function of the same tyPe, which itself assumes only two values or 1. Then f(g(s» ... al bls. Show that al and bl are linear combinations of a and b, and thus that the effect of replacing s by g(x) is equivalent to a matrix trans--
+
+
+
o
formation of the vector whose components are a and b. 28. Let f(XI,SI) be a function of the two variables XI and Sl each of which assumes only the values 0 and 1. Show that we may write f(XI,XI) ... al + alZl + aaZl + a.xIXI. 29. Let gl(XI,XI), gl(XhSI) be functions of the same type, each of which itself assumes only two values, 0 or 1. Then if we write f(gl,gl) ... a~
+ a~1 + a~sl + a~lsl
we have a matrix transformation
mJ - [~~J M
For the case where gl(SI,SI) ... SIXI, gs(xl,xs) ... xl(1 - Sl), evaluate M. 80. Generalize the foregoing results to the case where We have a function of N variables, f(SI,XI, ••• ,XN), and to the case where the Xi can assume any of a given finite set of values. (The foregoing results are useful in the study of various types of logica.l nets; see, for example, R. Bellman, J. Holland, and R. Kalaba, Dynamic Programming and the Synthesis of Logical Nets, J. A88oc. Compo Mach., 1959.) 81. If A is a given 2 X 2 matrix and X an unknown 2 X 2 matrix, show that the equation AX - XA ... I has no solution. 82. Show that the same result holds for the case where A and X are N X N matrices. 83. Consider the relation between the N(N + 1)/2 variablesYij and the N(N + 1)/2 variables S,j given by Y ... AXA', where X and Yare symmetric. Show that IJ(Y,X)I ... lAIN+!. Two semiexpository papers discussing matters of this nature have been written by W. L. Deemer and I. Olkin. 1 Construct a 4 X 4 symmetric orthogonal matrix whose elements are ±1. This question, and extensions, is of interellt in connection with Hadamard's inequality, see Sec. 7 of Chap. 8, and (amazingly!), in connection with the design of experiments. See R. E. A. C. Paley I and R. L. Plackett and J. P. Burman. 1
8'.
I W. L. Deemer and I. Olkin, Jacobians of Matrix Transformations Useful in Multivariate Analysis, Biometrika, vol. 38, pp. 345-367, 1951. I. Olkin, Note on "Jacobians of Matrix Transformations Useful in Multivariate Analysis," Biometrika, vol. 40, pp. 43-46, 1953. I R. E. A. C. Paley, On Orthogonal Matrices, J. Math. and Physics, vol. 12, pp. 311-320, 1933. • R. L. Plackett and J. P. Burman, The Design of Optimum Multifactorial Experiments, Biometrika, vol. 33, pp. 305-325, 1946.
Bibliography The reader who is interested in the origins of matrix theory is urged to read the monograph by MacDuffee C. C. MacDuffee, The Theory of M alrices, Chelsea Publishing Co.. New York, 1946. Although we shall occasionally refer to various theorems by the names of their discoverers, we have made no serious attempt to restore every result to its rightful owner. As in many other parts of mathematics, there is a certain amount of erroneous attribution. When aware of it, we have attempted to rectify matters. §S. It should be pointed out that other physical situations may very well introduce other types of algebras with quite different /I addition" and /I multiplication." As I. Olkin points out, in the study of sociometries, a multiplication of the form A' B = (a;pij) is quite useful. Oddly enough, as will be seen subsequently, this Schur product crops up in the theory of partial differential equations. 110. For an extensive discussion of commutativity, see O. Taussky, Commutativity in Finite Matrices, A mer. Math. Monthly, vol. 64, pp. 229-235, 1957. For an interesting extension of commutativity, see B. Friedman, n-commutative Matrices, Math. Annalen, vol. 136, 1958, pp. 343-347. For a discussion of the solution of equations in X of the form = AX, XA = A'X, see H. O. Foukes, J. London Math. Soc., vol. 17, pp. 70-80, 1942.
XA
§11. Other algebras can be defined in which commutativity and associativity of multiplication both fail. A particularly interesting example is the Cayley algebra. For a discussion of these matters, see A. A. Albert, Modern Higher Algebra, University of Chicago Press, Chicago, 1937. G. A. Birkhoff and S. Maclane, Survey of Modern Algebra, The Macmillan Company, New York, 1958.
§14. In the case of more general transformations or operators, A'· is often called the adjoint transformation or operator. The reason for its importance resides in the fact that occasionally the induced transformation may be simpler to study than the original transformation. Furthermore, in many cases, certain properties of A are only simply expressed when stated in terms of A'. In our discussion of Markoff matrices in Chap. 14, we shall see an example of this. §16. The notation H* for 1P appears to be due to Ostrowski: A. Ostrowski, Dber die Existenz einer endlichen Basis bei gewissen Funktionensystemen, Math. Ann., vol. 78, pp. 94-119, 1917. For an interesting geometric interpretation of the Jacobi bracket identity of Exercise 12 in the Miscellaneous Exercises, see W. A. Hoffman, Q. Appl. Math., 1968. Some remarkable results have been obtained recently concerning the number of multiplications required for matrix multiplication. See S. Winograd, The Number of Multiplications Involved in Computing Certain Functions, Proc. IFIP Congress, 1968. For some extensions of scalar number theory, see F. A. Ficken, Rosser's Generalization of the Euclidean Algorithm, Duke Math. J., vol. 10, pp. 355-379, 1943. A paper of importance is M. R. Hestenes, A Ternary Algebra with Applications to Matrices and Linear Transformations, Archive Rat. Mech. Anal., vol. 11, pp. 138-194, 1962.
3

Diagonalization and Canonical Forms for Symmetric Matrices

1. Recapitulation. Our discussion of the problem of determining the stationary values of the quadratic form Q(x) = Σ_{i,j=1}^N a_ij x_i x_j on the sphere x1^2 + x2^2 + ··· + xN^2 = 1 led us to the problem of finding nontrivial solutions of the linear homogeneous equations

    Σ_{j=1}^N a_ij x_j = λx_i,        i = 1, 2, ..., N        (1)
We interrupted our story at this point to introduce vector-matrix notation, stating in extenuation of our excursion that this tool would permit us to treat this and related questions in a simple and elegant fashion.

Observe that the condition of symmetry a_ij = a_ji is automatically satisfied in (1) in view of the origins of these equations. This simple but significant property will permit us to deduce a great deal of information concerning the nature of the solutions. On the basis of this knowledge, we shall transform Q(x) into a simpler form which plays a paramount role in the higher theory of matrices and quadratic forms. Now to resume our story!

2. The Solution of Linear Homogeneous Equations. We require the following fundamental result.

Lemma. A necessary and sufficient condition that the linear system

    Σ_{j=1}^N b_ij x_j = 0,        i = 1, 2, ..., N        (1)

possess a nontrivial solution is that we have the determinantal relation

    |b_ij| = 0        (2)
As usual, by /I nontrivial" we mean that at least one Xi is nonzero. This result is actually a special case of a more general result concerning linear systems in which the number of equations is not necessarily equal to the number of variables, a matter which is discussed in Appendix A. Here, however, we shall give a simple inductive proof which establishes all that we need. Proof of Lemma. The necessity is clear. If Ibill F 0, we can solve by Cramer's rule, obtaining thereby the unique solution Xl = X2 = ... = o. Let us concentrate then on the sufficiency. It is clear that the result is true for N = 1. Let us then see if we can establish its truth for N, assuming its validity for N - 1. Since at least one of the biJ is nonzero, or else the result is trivially true, assume that one of the elements in the first row is nonzero, and then, without loss of generality, that this element is bll • Turning to the linear system in (1), let us eliminate Xl between the first and second equations, the first and third, and so on. The resulting system has the form (b 22
-
b21b12) X2 bl l
--
+ ... + (b 2N -
b21bIN) - - XN bl l
=0 (3)
bNlb12) X2 (b N2 - -bll
+ . . . + (b NN -
lN ) bNlbXN bl l
=0
Let us obtain a relation between the determinant of this system and the original N X N determinant, Ib;il, in the following way. Subtracting bulb ll times the first row of Ib;il from the second, b31 lb ll times the first row from the third row and so on, we obtain the relation
(
o
b22
-
b21 b12) bl l
b _ bNl b12) ( N2 bu
..,
(b 2N _ b21 bIN ) bu
•••
Hence, the determinant of (N - 1)-dimensional system in (3) is zero, since by assumption bll F 0 and Ib;il = O. From our inductive hypothe-
84
Introduction to Matrix Analysis
sis it follows that there exists a nontrivial solution of (3), Setting
X2,
Xa, •••
,XN.
N
Xl =
-
L
bIJXI/b l1
(5)
1-2
we obtain, thereby, the desired nontrivial solution of (1),
Xl, X2, • • • ,XN.
EXERCISE
1. Show that if A is real, the equation Ax = 0 always possesses a real nontrivial solution if it possesses any nontrivial solution.
3. Characteristic Roots and Vectors. Setting x equal to the vector whose components are x_i and A = (a_ij), we may write (1.1) in the form

    Ax = λx        (1)

Referring to the lemma of Sec. 2, we know that a necessary and sufficient condition that there exist a nontrivial vector x satisfying this equation is that λ be a root of the determinantal equation

    |a_ij − λδ_ij| = 0        (2)

or, as we shall usually write,

    |A − λI| = 0        (3)

This equation is called the characteristic equation of A. As a polynomial equation in λ, it possesses N roots, distinct or not, which are called the characteristic roots or characteristic values. If the roots are distinct, we shall occasionally use the term simple, as opposed to multiple. The hybrid word eigenvalue appears with great frequency in the literature, a bilingual compromise between the German word "Eigenwerte" and the English expression given above. Despite its ugliness, it seems to be too firmly entrenched to dislodge.

Associated with each distinct characteristic value λ, there is a characteristic vector, determined up to a scalar multiple. This characteristic vector may be found via the inductive route sketched in Sec. 2, or following the path traced in Appendix A. Neither of these is particularly attractive for large values of N, since they involve a large number of arithmetic operations. In actuality, there are no easy methods for obtaining the characteristic roots and characteristic vectors of matrices of large dimension. As stated in the Preface, we have deliberately avoided in this volume any references to computational techniques which can be employed to
determine numerical values for characteristic roots and vectors. If λ is a multiple root, there may or may not be an equal number of associated characteristic vectors if A is an arbitrary square matrix. These matters will be discussed in the second part of the book, devoted to the study of general, not necessarily symmetric, matrices. For the case of symmetric matrices, multiple roots cause a certain amount of inconvenience, but nothing of any moment. We will show that a real symmetric matrix of order N has N distinct characteristic vectors.

EXERCISES

1. A and A' have the same characteristic values.
2. T'AT − λI = T'(A − λI)T if T is orthogonal. Hence, A and T'AT have the same characteristic values if T is orthogonal.
3. A and T*AT have the same characteristic values if T is unitary.
4. SAT and A have the same characteristic values if ST = I.
5. Show by direct calculation for A and B, 2 × 2 matrices, that AB and BA have the same characteristic equation.
6. Does the result hold generally?
7. Show that any scalar multiple apart from zero of a characteristic vector is also a characteristic vector. Hence, show that we can always choose a characteristic vector x so that (x,x̄) = 1.
8. Show, by considering 2 × 2 matrices, that the characteristic roots of A + B cannot be obtained in general as sums of characteristic roots of A and of B.
9. Show that a similar comment is true for the characteristic roots of AB.
10. For the 2 × 2 case, obtain the relation between the characteristic roots of A and those of A^2.
11. Does a corresponding relation hold for the characteristic roots of A and A^n for n = 3, 4, ...?
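Although the text deliberately avoids computational techniques, a short illustration (not part of the original; the 2 × 2 matrix is an arbitrary symmetric example) shows the roots of the characteristic equation and the associated characteristic vectors obtained by a standard library routine.

    import numpy as np

    # An illustrative real symmetric matrix.
    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])

    # Roots of the characteristic equation |A - lambda I| = 0 ...
    char_poly = np.poly(A)        # coefficients of the characteristic polynomial
    roots = np.roots(char_poly)

    # ... agree with the characteristic values returned by an eigen-solver,
    # and each characteristic vector x satisfies Ax = lambda x.
    lam, vecs = np.linalg.eig(A)
    print(np.sort(roots), np.sort(lam))
    for k in range(2):
        assert np.allclose(A @ vecs[:, k], lam[k] * vecs[:, k])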
4. Two Fundamental Properties of Symmetric Matrices. Let us now give the simple proofs of the two fundamental results upon which the entire analysis of real symmetric matrices hinges. Although we are interested only in symmetric matrices whose elements are real, we shall insert the word "real" here and there in order to emphasize this fact and prevent any possible confusion.

Theorem 1. The characteristic roots of a real symmetric matrix are real.

Proof. Assume the contrary. Since A is a real matrix, it follows from the characteristic equation |A − λI| = 0 that the conjugate of any complex characteristic root λ is also a characteristic root. We obtain this result and further information from the fact that if the equation

    Ax = λx        (1)

holds, then the relation

    Ax̄ = λ̄x̄        (2)

is also valid. From these equations, we obtain the further relations

    (x̄,Ax) = λ(x̄,x)
    (x,Ax̄) = λ̄(x,x̄)        (3)

Since A is symmetric, which implies that (x̄,Ax) = (Ax̄,x) = (x,Ax̄), the foregoing relations yield

    0 = (λ − λ̄)(x,x̄)        (4)

whence λ = λ̄, a contradiction.

This means that the characteristic vectors of a real symmetric matrix A can always be taken to be real, and we shall consistently do this. The second result is:

Theorem 2. Characteristic vectors associated with distinct characteristic roots of a real symmetric matrix A are orthogonal.

Proof. From

    Ax = λx
    Ay = μy        (5)

λ ≠ μ, we obtain

    (y,Ax) = λ(y,x)
    (x,Ay) = μ(x,y)        (6)

Since (x,Ay) = (Ax,y) = (y,Ax), subtraction yields

    0 = (λ − μ)(x,y)        (7)

whence (x,y) = 0. This result is of basic importance. Its generalization to more general operators is one of the cornerstones of classical analysis.

EXERCISES
1. A characteristic vector cannot be associated with two distinct characteristic values.
2. Show by means of a 2 × 2 matrix, however, that two distinct vectors can be associated with the same characteristic root.
3. Show by means of an example that there exist 2 × 2 symmetric matrices A and B with the property that |A − λB| is identically zero. Hence, under what conditions on B can we assert that all roots of |A − λB| = 0 are real?
4. Show that if A and B are real symmetric matrices, and if B is positive definite, the roots of |A − λB| = 0 are all real.
5. Show that the characteristic roots of a Hermitian matrix are real and that the characteristic vectors corresponding to distinct characteristic roots are orthogonal, using the generalized inner product (x,ȳ).
6. Let the elements a_ij of A depend upon a parameter t. Show that the derivative of |A| with respect to t can be written as the sum of N determinants, where the kth determinant is obtained by differentiating the elements of the kth row and leaving the others unaltered.
7. Show that the derivative of |A − λI| with respect to λ is equal to −Σ_{k=1}^N |A_k − λI|, where A_k is the (N − 1) × (N − 1) matrix obtained from A by striking out the kth row and column.
8. From this, conclude that if λ is a simple root of A, then at least one of the determinants |A_k − λI| is nonzero.
9. Use this result to show that if λ is a simple root of A, a characteristic vector x associated with λ can always be taken to be a vector whose components are polynomials in λ and the elements of A.
5. Reduction to Diagonal Form—Distinct Characteristic Roots. We can obtain an important result quite painlessly at this juncture, if we agree to make the simplifying assumption that A has distinct characteristic roots λ1, λ2, ..., λN. Let x^1, x^2, ..., x^N be an associated set of characteristic vectors, normalized by the condition that

    (x^i, x^i) = 1,        i = 1, 2, ..., N        (1)

Consider the matrix T formed upon using the vectors x^i as columns. Schematically,

    T = (x^1, x^2, ..., x^N)        (2)

Then T' is the matrix obtained using the x^i as rows,

         | x^1 |
    T' = | x^2 |        (3)
         |  ·  |
         | x^N |

Since

    T'T = I        (4)

(in view of the orthogonality of the x^i as characteristic vectors associated with distinct characteristic roots of the symmetric matrix A), we see that T is an orthogonal matrix.

We now assert that the product AT has the simple form

    AT = (λ1x^1, λ2x^2, ..., λNx^N)        (5)

by which we mean that the matrix AT has as its ith column the vector λ_i x^i. It follows that

    T'AT = T'(λ1x^1, λ2x^2, ..., λNx^N)        (6)

(recalling once again the orthogonality of the x^i),

           | λ1   0   ···   0  |
    T'AT = | 0    λ2  ···   0  |
           | ·              ·  |
           | 0    0   ···   λN |

The matrix on the right-hand side has as its main diagonal the characteristic values λ1, λ2, ..., λN, and zeros every place else. A matrix of this type is, as earlier noted, called a diagonal matrix. Multiplying on the right by T' and on the left by T, and using the fact that TT' = I, we obtain the important result

          | λ1   0   ···   0  |
    A = T | 0    λ2  ···   0  | T'        (7)
          | ·              ·  |
          | 0    0   ···   λN |

This process is called reduction to diagonal form. As we shall see, this representation plays a fundamental role in the theory of symmetric matrices. Let us use the notation

        | λ1   0   ···   0  |
    Λ = | 0    λ2  ···   0  |        (8)
        | ·              ·  |
        | 0    0   ···   λN |

EXERCISES

1. Show that A^k x^i = λ_i^k x^i, and that A^k = TΛ^kT', for k = 1, 2, ....
2. Show that if A has distinct characteristic roots, then A satisfies its own characteristic equation. This is a particular case of a more general result we shall establish later on.
3. If A has distinct characteristic roots, obtain the set of characteristic vectors associated with the characteristic roots of A^k, k = 2, 3, ....
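A brief numerical sketch (not part of the original text; the symmetric matrix is an arbitrary example with distinct roots) of the reduction to diagonal form and of the relation in Exercise 1.

    import numpy as np

    # An illustrative real symmetric matrix with distinct characteristic roots.
    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

    lam, T = np.linalg.eigh(A)   # columns of T: normalized characteristic vectors

    # T is orthogonal, T'AT is the diagonal matrix Lambda, and A = T Lambda T'.
    Lambda = np.diag(lam)
    assert np.allclose(T.T @ T, np.eye(3))
    assert np.allclose(T.T @ A @ T, Lambda)
    assert np.allclose(A, T @ Lambda @ T.T)

    # Exercise 1: A^k = T Lambda^k T' (shown here for k = 3).
    assert np.allclose(np.linalg.matrix_power(A, 3), T @ np.diag(lam**3) @ T.T)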
6. Reduction of Quadratic Forms to Canonical Form. Let us now show that this matrix transformation leads to an important transformation of Q(x). Setting x = Ty, where T is as defined in (2) of Sec. 5, we have

    (x,Ax) = (Ty,ATy) = (y,T'ATy) = (y,Λy)        (1)

or the fundamental relation

    Σ_{i,j=1}^N a_ij x_i x_j = Σ_{i=1}^N λ_i y_i^2        (2)

Since T is orthogonal, we see that x = Ty implies that

    T'x = T'Ty = y        (3)
Hence to each value of x corresponds precisely one value of y and conversely. We thus obtain the exceedingly useful result that the set of values assumed by Q(x) on the sphere (x,x) = 1 is identical with the set of values assumed by (y,Ay) on the sphere (y,y) = 1. So far we have only established this for the case where the Xi are all distinct. As we shall see in Chap. 4, it is true in general, constituting the foundation stone of the theory of quadratic forms. EXERCISES 1. Let A have distinct. characteristic roots which are all positive. Use the preceding result to compute the volume of the N-dimensional ellipsoid (x,Ax) = 1. I. Prove along the preceding lines that the characterist.ic roots of Hermitian matrices are real and that characteristic vectors associated with distinct characteristic roots are orthogonal in terms of the notation [x,y) of Sec. 16 of Chap. 2. S. Show that if the characteristic roots of a Hermitian matrix A are distinct, we can find a unitary matrix T such that A - TAT·. This again is a particular case of a more general result we shall prove in the following chapter. 4. Let A be a real matrix with the property that A' = -A, Ii, skew-symmetric matrix. Show that the characteristic roots are either zero or pure imaginary. II. Let T be an orthogonal matrix. Show that all characteristic roots have absolute value one. 6. Let T he a unitary matrix. Show that all characteristic roots have absolute value one. 7. Suppose that we attempt to obtain the representation of (5.2) under no restriction on the characteristic roots of the symmetric matrix A in the following way. To begin with, we assert that we can always find a symmetric matrix B, with elements arbitrarily small, possessing the property that A + B has simple characteristic roots. We do not dwell upon this point since the proof is a bit more complicated than might be suspected; see Sec. 16 of Chap. 11. Let l.ud be the characteristic roots of A + B Then, as we know, there exists an orthogonal matrix S such that
Introduction to Matrix Analysis III
o S'
o liN
Since 8 is an orthogonal matrix, ite elements are uniformly bounded. Let IB"I be a sequence of matrices approaching 0 luch that the corresponding sequence of orthogonal matrices IS"I converges. The limit matrix must then be an orthogonal matrix, day 7'. Since lim II' ... N, we have
_.
_00 lim
lim ........ liN
~
]
Since this proof relies upon a number of analytic concepts which we do not wish to introduce until much later, we have not put it into the text. It is an illustration of a quite useful metamathematical principle that results valid for general real symmetric matriceB can always be established by first considering matrices with distinct characteristic roots and then palling to the limit•
7. Positive Definite Quadratic Forms and Matrices. In Sec. 9 of Chap. 1, we introduced the concept of a positive definite quadratic form in two variables, and the allied concepts of positive indefinite or non-negative definite. Let us now extend this to N-dimensional quadratic forms.

If A = (a_ij) is a real symmetric matrix, and

    Q_N(x) = Σ_{i,j=1}^N a_ij x_i x_j > 0

for all real nontrivial x_i, we shall say that Q_N(x) is positive definite and that A is positive definite. If Q_N(x) ≥ 0, we shall say that Q_N(x) and A are positive indefinite.

Similarly, if H is a Hermitian matrix and P_N(x) = Σ_{i,j=1}^N h_ij x_i x̄_j > 0 for all complex nontrivial x_i, we shall say that P_N(x) and H are positive definite.

EXERCISES

1. If A is a symmetric matrix with distinct characteristic roots, obtain a set of necessary and sufficient conditions that A be positive definite.
2. There exists a scalar c1 such that A + c1I is positive definite, given any symmetric matrix A.
Diagonalization and Canonical Forms
a.
41
Show that we can write A in the form N
I
A =
X,E,
i-I
where the EI are non-negative definite matrices.
Then
N
I
Ai =
XliE,
i-1
for k ... 1, 2, . . . .
~
m
If p(X) is a polynomial in X with scalar coefficients, p(X)
ciAi.
Show that peA) ... I
01:-0
CiXi, let p(A)
k-O
N
m
denote the matrix peA) ... I
I
p(XI)E,.
i-I
MISCELLANEOUS EXERCISES 1. Let A and B be two symmetric matrices. Then the roots of IA - XBI ... 0 are all real if B is positive definite. What can be said of the vectors satisfying the relations Ax ... X,BxT
I. Every matrix is uniquely expressible in the form A = Ii + S, where H is Hermitian and S is skew-Hermitian, that is, S· '"' -So a. As an extension of Theorem 1, show that if X is a characteristic root of a real matrix A, then 11m (X)I ~ d(N(N - 1)/2)~, where d =-
max
(1. Bendix8on)
laij - a;,1/2
l~i,;~N
,. Generally, let A be complex; then if dl
...
max lal;1
d, ... max lai'
i.j
i.J
+ ;;;:1/2
d a ... max la,; - aj,I/2 we have IRe (X)I :s; Nd,
11m (X)I :s; Ndat
I. Show that for any complex matrix A, we have the inequalities N
IIX, I'
~
i-I
N
IRe (X;)I':s; II(ali
i=1
i-I
+ afi)/21'
i.J-l
N
I
la"I'
i,;=1
N
I
N
I
11m (X,)I'
~
N
I
I(al, - a,,)/21'
(/. Schur)
i.j= 1
t Far more sophisticated results are available. See the papers by A. Brauer in Duke /If ath. J., 1946, 1947, 1948, where further references are given. See also W. V. Parker, Characteristic Roots and Fields of Value of a Matrix, Bull. A m. Math. Soc., vol. 57, pp. 103-108, 1951.
42
8. So far, we have not dismissed the posaibility that a characteristic root may have several a8I!Ociated characteristic vectors, not all multiples of a particular characteristic vector. As we shall see, this can happen if A has multiple characteristic roots. For the case of distinct characteristic roots, this cannot occur. Although the simplest proof uses concepts of the succeeding chapter, consider a proof along the following lines: (0) Let Xl and 1/ he two characteristic vectors 88Sociated with XI and suppose that 11 " CIX I for any scalar CI. Then Xl and z = 11 - XI(XI,I/)/(XI,X I ) are characteristic 'vectors and Xl and z are orthogonal. (6) Let Zl be the normalized multiple of z. Then
8{' ·""""l is an orthogonal transformation. (c) A ... BDB', where
D...
X,
o (d) It follows that the transformation X = BI/ changes (x,Ax) into (I/,DI/). (8) Assume that A is positive definite (if not, consider A cll), then, on one hand, the volume of the ellipsoid (x,Ax) ... 1 is equal to the volume of the ellipsoid
+
XII/II
+ X.YI I + ... + XN!lN I =
1,
and, on the other hand, from what has just preceded, is equal to the volume of XII/II
This is a contradiction if XI
+ XII/ 12 + ... + XN1/N I =
1
~ XI.
Bibliography 12. This proof is taken from the book by L. Mirsky,
L. Mirsky, Introduction to Linear Algebra, Oxford University Press, New York, 1955;
Diagonalization and Canonical Forms
43
§3. The term "latent root" for characteristic value is due to Sylvester. For the reason, and full quotation, see N. Dunford and J. T. Schwartz, Linear Operators, part I, Interscience Publishers, New York, 1958, pp. 606-607. The term "spectrum" for set of characteristic values is due to Hilbert. §6. Throughout the volume we shall use this device of examining the case of distinct characteristic roots before treating the general case. In many cases, we can employ continuity techniques to deduce the general case from the special case, as in Exercise 7, Sec. 6. Apart from the fact that the method must be used with care, since occasionally there is a vast difference between the behaviors of the' two types of matrices, we have not emphasized the method because of its occasional dependence upon quite sophisticated analysis. It is, however, a most powerful technique, and one that is well worth acquiring.
4 Reduction of General Symmetric Matrices to Diagonal Form 1. Introduction. In this chapter, we wish to demonstrate that the results obtained in Chap. 3 under the assumption of simple characteristic roots are actually valid for general symmetric matrices. The proof we will present will afford us excellent motivation for discussing the useful concept of linear dependence and for demonstrating the Gram-8chmidt orthogonalization technique. Along the way we will have opportunities to discuss some other interesting techniques, and finally, to illustrate the inductive method for dealing with matrices of arbitrary order. 2. Linear Dependence. Let Xl, x t , • • • , x~ be a set of k N-dimensional vectors. If a set of scalars, Cl, Ct, • • • ,C~, at least one of which is nonzero, exists with the property that (1)
where 0 represents the null vector, we say that the vectors are linearly dependent. If no such set of scalars exist, we say that the vectors are linearly independent. Referring to the results concerning linear systems established in Appendix A, we see that this concept is only of interest if k ~ N, since any k vectors, where k > N, are related by a relation of the type given in (1). EXERCISES
1. Show that any set of mutually orthogonal nontrivial vectors is linearly independent. I. Given any nontrivial vector in N-dimetWonalspace, we can always find N - 1 vectors which together with the given N-dimensional vector constitute a linearly independent set.
3. Gram-Schmidt Orthogonalization. Let Xl, x t , . . • ,xN be a set of N real linearly independent N-dimensional vectors. We wish to show
'"
Reduction of General Symmetric Matrices
4S
that we can form suitable linear combinations of these base vectors which will constitute a set of mutually orthogonal vectors. The procedure we follow is inductive. We begin by defining two new vectors as follows. yt = Xl (1) y2 = x 2 + allxl where all is a scalar to be determined by the condition that yl and y2 are orthogonal. The relation (2)
yields the value (3)
Since the set of x' is by assumption linearly independent, we cannot have equal to the null vector, and thus (XI,X I ) is not equal to zero. Let us next set (4)
Xl
where now the two scalar coefficients au and a22 are to be determined by the conditions of orthogonality (5)
These conditions are easier to employ if we note that (1) shows that the equation in (5) is equivalent to the relations (y8,X 2)
=
0
(6)
These equations reduce to the simultaneous equations (X 8,X I ) (X 8,X 2)
+ a21(x l,x l) + a22(x 2,x l) = + a21(x l,x 2) + a22(x 2,x 2) =
0 0
(7)
which we hope determine the unknown coefficients au and a22. We can solve for these quantities, using Cramer's rule, provided that the determinant (8)
is not equal to zero. To show that D 2 is not equal to zero, we can proceed as follows. If D 2 = 0, we know from the lemma established in Sec. 2 of Chap. 3 there are two scalars rt and 8t, not both equal to zero, such that the linear equations rt(xl,x l) + 8t(X I,X 2) = 0 (9) rt(x l ,x 2) + 8t(X 2,X 2) = 0
46
Introduction to Matrix Analysis
are satisfied.
These equations may be written in the form
+ SlX 2) + 81X 2)
(Xl, rlX l (X 2, rlX l
Multiplying the first equation by obtain the resultant equation
rl
=
0
(10)
= 0
and the second by
Sl
and adding, we (11)
This equation, however, in view of our assumption of reality, can hold only if rlx l + SlX 2 = 0, contradicting the assumed linear independence of the x'. Hence D 2 ¢ 0, and there is a unique solution for the quantities Ou and au. At the next step, we introduc~ the vector (12) As above, the conditions of mutual orthogonality yield the equations
(y.,x l) = (y·,x 2) = (Y·,x 8) = 0
(13)
which lead to the simultaneous equations
(x.,x')
+ a31(x l,x + a82(x 2,x') + a88(x8,x~ i)
=
0
i = 1, 2, 3
(14)
We can solve for the coefficients au, a32, and aaa, provided that the determinant i, j = 1,2, 3 (15) D 8 = I(x',x i ) I is nonzero. The proof that D 8 is nonzero is precisely analogous to that given above in the two-dimensional case. Hence, we may continue this procedure, step by step, until we have obtained a complete set of vectors (y'l which are mutually orthogonal. These vectors can then be normalized by the condition (y',y') = 1. We then say they form an orthonormal set. The determinants D k are called Gramians. EXERCISES
1. Consider the interval [ -I,ll and define the inner product of two real functions Jet) and g(t) in the following way:
(f,g) =
f~
Let
1 f(t)g(t)
dt
P.(t) - ~, and define the other elements of the sequence of real polynomials IPft(t) I by the condition t~at Pft(t) is of degree nand (Pft,Pm ) == 0, m ~ n, (Pft,P ft ) - 1. Prove that we have (Pft,t"') ... 0, for :S m :S n - 1, and construct the first few
members of the sequence in this way.
°
Reduction of General Symmetric Matrices
47
I. With the same definition of an inner product as above, prove that
o .$ m
.$ n - 1, and thus express Pft in terms of the expression (£)ft (1 - tl)".
These polynomials are, apartfrom constant factors, the classical Legendre polynomials.
a.
Consider the interval (0, 00 ) and define the inner product (f,g) ...
fo" e-'f(t)g(t) dl
for any two real polynomials 1(1) and get). Let the sequence of polynomials ILft(t) I be determined by the conditions Lo(t) ... 1, Lft(t) is a polynomial of degree n, and (Lft,L",) ... 0, n pill m, (Lft,L n ) ... 1. Prove that these conditions imply that (Lft,t"') = 0, :s; m :s; n - 1, and construct the first few terms of the sequence using these relations. ,. Prove that (e'(d/dt)ft(e~tft), too) ... 0, .$ m .$ n - 1, and thus express Lft in terms of e'(d/dt)ft(e-lt ft ). These polynomials are apart from constant factors the classical Laguerre polynomials. 15. Consider the interval (- OIl, OIl) and define the inner product
°
°
(f,g) =
f_..
e-/l/(t)g(t) dl
for any two real polynomials I(t) and get). Let the sequence of polynomials IHft(t) I be determined as follows. Ho(t) = I, H .. (I) is a polynomial of degree n, and (Hm,H n ) ... 0, m pill n, (Hft,H ft ) ... 1. Prove that (Hft,t"') = 0, 0 .$ m .$ n - 1, and construct the first few terms of the sequence in this fashion. 6. Prove that (ell(d/dt)ft(r/Il ft ), t"') = 0, 0 :s; m :s; n - 1 and hence express H" in terms of e'1(d/dl)ft(r/1t ft ). These polynomials are apart from constant factors the classical Hermite polynomials. T. Show that given a real N-dimensional vector xl, normalized by the conditioD that (XI,X I ) = 1, we can find (N - 1) additional vectors Xl, Xl, • • • , x N with the property that the matrix T ... (XI,X I , • • • ,x N ) is orthogonal. 8. Obtain the analogue of the Gram-Schmidt method for complex vectors.
4. On the Positivity of the D k • All that was required in the foregoing section on orthogonalization was the nonvanishing of the determinants D k • Let us now show that a slight extension of the previous argument will enable us to conclude that actually the D k are positive. The result is not particularly important at this stage, but the method is an important and occasionally useful one. Consider the quadratic form Q(Ul,U2, • • • ,Uk)
=
(U1X 1
+ U2X 2 +
re
,.~ I
(xi,X;)UiUj
(1)
where the Ui are real quantities, and the xi as above constitute a system of real linearly independent vectors. In view of the linear independence of the vectors Xi, we see that Q > 0 for all nontrivial sets of values of the Ui. Consequently, the result we
48
Introduction to Matrix Analysis
wish to establish can be made a corollary of the more general result that the determinant (2) D = 14iJ1 of any positive definite quadratic form N
Q
L
=
(3)
a;ju"uJ
i,j-I
is positive. We shall show that this positivity is a simple consequence of the fact that it is nonzerO. All this, as we shall see, can easily be derived from results we shall obtain further on. It is, however, interesting and profitable to see the depth of various results, and to note how far one can go by means of fairly simple reasoning. Let us begin by observing that D is never zero. For, if D = 0, there is a nontrivial solution of the system of linear equations N
L
i = 1,2, . . . ,N
lZ;,1tJ = 0
(4)
i-I
From these equations, we conclude that Q =
N
N
i- 1
i-I
Lu..(L 4tJUJ) = 0
(5)
a contradiction. This is essentially the same proof we used in Sec. 3. The novelty arises in the proof that D is positive. Consider the family of quadratic forms defined by the relation N
P(>.) = >'Q
+ (1
- >.)
Lu..
2
(6)
i-I
where >. is a scalar parameter ranging over the interval [0,1]. For all >. in this interval, it is clear that P(>.) is positive for all nontrivial u... Consequently, the determinant of the quadratic form P(>.) is never zero. For>. = 0, the determinant has the simple form
1
o -I
1
o 1
(7)
Reduction of General Symmetric Matrices
49
clearly positive. Since the determinant of P(X) is continuous in X for 1, and never zerO in this range, it follows that positivity at X = 0 implies positivity at X = 1, the desired result. Ii. An Identity. An alternative method of establishing the positivity of the D" when the vectors Xi are linearly independent depends upon the following identity, which we shall call upon again in a later chapter devoted to inequalities. Theorem 1. Let Xi, i = 1, 2, . . . ,k, be a set of N-dimensional vectors, N ~ k. Then
o 5 X5
I. I.
X· I
X. I 2
" X·2 "
X· 2
(1)
where the sum is over all sets of integers {i,,1 with 15i15i 2 5 ... 5i,,5N. Proof. Before presenting the proof, let us obBerve that the case k = 2 is the identity of Lagrange N
(L
N
Xi 2)
i-I
(L
N
2 Yi )
N
(L X,y,)2 = 72 L (x,y; -
-
i-I
i-I
i l;
-
x,y,)2
(2)
1
It is sufficient to consider the case k = 3 in order to indioate the method of proof, and it is helpful to employ a lees barbarous notation. We wish to demonstrate that N
N
i=1
,=1
N
N
N
L X,2 .L L I I Y~i I li~1 ,=)L y,Z, L XyZi
XiYi
XiYi
i=1
XiZi
i-I N
Yi
2
1=1
i=1
N
N
Zi
(3)
2
i= I
Let us begin with the result X" XI x'" y" YI y", Z"
ZI
Z'"
2
X,,2 +X1 2 +X",2 X"y"+XIYI+X",y,,, X"z"+XIZI+X",Z,,, X"y"+XIYI+X",y,,, y,,2 +Y1 2 +y... 2 y"Z"+YIZI+Y",Z", X"Z" +XIZI +x...z", y"z" +YIZI+Y...z", Z,,2 +Z1 2 +Z",2
(4)
which we obtain from the rule for multiplying determinants. The last determinant may be written as the sum of three determinants, of which the first is typical,
Introduction to Matrix Analysis
50 X~2 X~Y~ x~z~
X~Y~ y~2 Y~Z~
+ XIYI + X",y", + YI 2 + y",2 + YIZI + Y",z",
XkZ~ Y~Z~ Zk
2
+ XIZI + x...z... + YIZI + y...z... + ZI 2 + Z",2
(5)
Subtracting Yk/Xk times the first column from the second column, and times the first column from the third column, we are left with the determinant
',:lX~
Xk
2
XIYI 2 YI
XkYk XkZk
YIZI
+ X",Y", + y",2 + y".z",
+ X",Z", YIZI + Y",z", ZI + Z",2 XIZI
(6)
2
Writing this determinant as a sum of two determinants, of which the following is typical, we are left with Xk 2 XkYk
XIYI 2 YI
XkZk
YIZI
XIZI YiZl ZI 2
+ x".z", + y".z", + Z",2
(7)
and subtracting ZI/YI times the second column from the third column, we obtain the determinant Xk 2
XIYI 2
X",Z",
XkYk
YI
Y",z",
XkZk
YIZI
Z...I
(8)
Summing over all k, l, and m, we obtain the left side of (3). Observing that the above procedures reduce us to a determinant of the form given in (8) in 3 X 2 ways, we see that a factor of 1/31 is required. 6. The Diagonalization of General Symmetric Matrices-Two Dimensional. In order to show that a general real symmetric matrix can be reduced to diagonal form by means of an orthogonal transformation, we shall proceed inductively, considering the 2 X 2 case first. Let (1)
be a symmetric matrix, and let Xl and Xl be an associated characteristic root and characteristic vector. This statement is equivalent to the relations or
(2)
where Xu and Xu are the components of Xl, which we take to be normalized by the condition (XI,X I ) = l. As we know, we can form a 2 X 2 orthogonal matrix T 2 one of whose columns is Xl. Let the other column be designated by x 2• We wish to show that (3)
Reduction of General Symmetric Matrices
51
where }.l and X2 are the two characteristic values of A, which need not be distinct. This, of course, constitutes the difficulty of the general case. We already have a very simple proof for the case where the roots are distinct. Let us show first that l2 T'2 AT 2 = [XI (4) 0 bb22
J
where bl2 and b22 are as yet unknown parameters. The significant fact is that the element below the main diagonal is zero. We have l 2 T~AT2 = T~ [XIXll (a 2 ,X 2 )] (5) XIXl2 (a ,x ) upon referring to the equations in (2). carrying through the multiplication,
T~ [X I X l1
XIXl2
(a
l
2 ,X )]
(a 2,x 2 )
Since T~T2 = I, we obtain, upon =
[Xl0
bl2 ] b22
(6)
where bl2 and b22 are parameters whose value we shall determine below. Let us begin with bl2 , and show that it ~as the value zero. This follows from the fact that T~A T 2 is a symmetric matrix, for (T~AT2)'
=
T~A'(T~)'
=
T~A'T2 = T~AT2
(7)
Finally, let us show that b22 must be the other characteristic root of A. This follows from the fact already noted that the characteristic roots of A are identical with the characteristic roots of T~A T2. Hence b22 = X2. This completes the proof of the two-dimensional case. Not only is this case essential for our induction, but it is valuable because it contains in its treatment precisely the same ingredients we shall use for the general case. 7. N-dimensional Case. Let us proceed inductively. Assume that for each k, k = 1, 2, . . . , N, we can determine an orthogonal matrix T k which reduces a real symmetric matrix A = (aij), i, j = 1,2, . . . ,k, to diagonal form,
o (1)
o Xk
The elements of the main diagonal Xi must then be characteristic roots of A. Under this assumption, which we know to be valid for N = 2, let us show that the same reduction can be carried out for N + 1.
Introduction to Matrix Analysis
S2 Let
(2)
and let }.l and Zl be, respectively, an associated characteristic value and normalized characteristic vector of AN+l. Proceeding as in the two-dimensional case, we form an orthogonal matrix T 1 whose first column is Zl. Let the other columns be designated as Zl, z', . . . ,ZN+1, so that T l has the form (3) Then as before, the first step consists of showing that the matrix T~AN+lTl has the representation ).1
o
bu
b18
..,
biN
(4)
o where a. The elements of the first column are all zero except for the first which is ).1 b. The quantities bu, bu , . . . , bIN will be determined below c. The matrix AN is an N X N matrix
(5)
Carrying out the multiplication, we have (al,x l )
(al,zl)
(al,x') (a 2,z2)
(a l ,zN+1) (a 2,zN+1)
A N+1 T l = (a N+1,zl) ).lZU ).lZU
(a N+1,zl) (a l ,z2) (a 2,z2)
(a N+1,zN+I) (a 1, ZN+1) (a 2,zN+l)
(6)
Reduction of General Symmetric Matrices
53
Since T I is an orthogonal matrix, we see that XI
o
b12
•••
bl,N+I
(7)
o To determine the values of the quantities b li , we use the fact that T~AN+lT, is symmetric. This shows that these quantities are zero and simultaneously that the matrix AN must be symmetric. The result of all this is to establish the fact that there is an orthogonal matrix T I with the property that XI
o
0
'"
0 (8)
o
with AN a symmetric matrix. Finally, let us note that the characteristic roots of the matrix AN must be X2, X8 , • • • ,XN+I, the remaining N characteristic roots of the matrix AN+I. This follows from what we have already observed concerning the identity of the characteristic roots of AN+I and T~AN+,T" and the fact that the characteristic equation of the matrix appearing on the righthand side of (7) has the form IX , - xl IAN - Xli = O. Let us now employ our inductive hypothesis. Let TN be an orthogonal matrix which reduces AN to diagonal form. Form the (N + 1)-dimensional matrix 1 0 ... 0
o (9)
o which, as we know, is also orthogonal. It is readily verified that
(10)
o
Introduction to Matrix Analysis Since we may write S:V+I(T~AN+ITI)SN+l = (TISN+I)'AN+I(TISN+I)
(11)
we see that T 1SN+l is the required diagonalizing orthogonal matrix for AN+l'
We have thus proved the following basic result. Theorem 2. Let A be a real symmetric matrix. Then it may be transformed into diagonal form by means of an orthogonal transformation, which is to say, there is an orthogonal matrix T such that
o T'AT =
(12)
o where ).i are the characteristic roots of A. Eq'j,ivalently, N
(x,Ax) =
,-IL
).,Yi 2
(13)
where y = T'x. EXERCISE 1. What can be said if A is a complex symmetric matrix?
8. A Necessary and Sufficient Condition for Positive Definiteness. The previous result immediately yields Theorem 3. Theorem 3. A necessary and sufficient condition that A be positive definite is that all the characteristic roots of A be positive.
Similarly, we see that a necessary and sufficient condition that A be positive indefinite, or non-negative definite, is that all characteristic roots of A be non-negative. EXERCISES
BB' is positive definite if B is real and IBI ~ o. CC· is positive definite if ICI ~ o. If A is symmetric, then I + fA is positive definite if « is sufficiently small.
1. A I. H
a.
9. Characteristic Vectors Associated with Multiple Characteristic Roots. If the characteristic roots of A are distinct, it follows from the orthogonality of the associated characteristic vectors that these vectors are linearly independent. Let us now examine the general case. Suppose that ).1 is a root of
Reduction of General Symmetric Matrices
55
multiplicity k. Is it true that there exist k linearly independent characteristic vectors with Xl as the associated characteristic root? If so, is it true that every characteristic vector associated with Xl is a linear combination of these k vectors? To answer the question, let us refer to the representation in (7.12), and suppose that Xl = X2 = . . . = Xl, but that Xi ~ Xl for
i
=
+ 1,
k
... , N
Retracing our steps and writing
o AT = T
(1)
o it follows that the jth column of T is a characteristic vector of A with the characteristic value Xi' Since T is orthogonal, its columns are linearly independent. Hence, if Xl is a root of multiplicity k, it possesses k linearly independent characteristic vectors. It remains to show that any other characteristic vector y associated with Xl is a linear combination of these k vectors. Let y be written as a linear combination of the N columns of T, N
Y=LCi X'
(2)
;=1
The coefficients c, can be determined by Cramer's rule since the determinant is nonzero as a consequence of the linear independence of the X'. Since characteristic vectors associated with distinct characteristic roots are orthogonal, we see that c, = 0 unless Xi is a characteristic vector associated with Xl. This shows that y is a linear combination of the characteristic vectors associated with Xl, obtained from the columns of T. 10. The Cayley-Hamilton Theorem for Symmetric Matrices. From (7.12), we see that for any polynomial p(X) we have the relation p(X l )
p(A) = T
o
o T'
(1)
56
Introduction to Matrix Analysis
If, in particular, we choose p(X) = IA - Xli, the characteristic polynomial of A, we see that p(A) = O. This furnishes a proof of the following special case of a famous result of Cayley and Hamilton. Theorem 4. Every symmetric matrix satiBjies its characteristic equation. As we shall see subsequently, this result can be extended to arbitrary square matrices. EXERCISE 1. Use the method of continuity to derive the Cayley-Hamilton theorem for general symmetric matrices from the result for symmetric matrices for simple roots.
11. Simultaneous Reduction to Diagonal Form. Having seen that we can reduce a real symmetric matrix to diagonal form by means of an orthogonal transformation, it is natural to ask whether or not we can simultaneously reduce two real symmetric matrices to diagonal form. The answer is given by the following result. Theorem Ii. A necessary and sujficient condition that there exist an orthogonal matrix T with the property that #£1
o T'AT =
T'BT
o
o ( 1)
=
o #£N
is that A and B cflmmute. Proof. The proof of the sufficiency is quite simple if either A or B has distinct characteristic roots. Assume that A has distinct characteristic roots. Then from (2) Ax = Xx we obtain (3) A(Bx) = B(Ax) = B(Xx) = X(Bx) From this we see that Bx is a characteristic vector assoniated with X whenever x is. Since the characteristic roots are assumed distinct, any two characteristic vectors associated with the same characteristic root must be proportional. Hence we have Bx' = 1JiZ'
i
== 1,2, . . . , N
(4)
where the IJ.I are scalars--which must then be the characteristic roots of the matrix B. W~ see then that A and B have the same characteristic vectors, Xl, x., ... , xN •
Reduction of General Symmetric Matrices
57
T can be taken to be T =
(X I ,X 2, • • •
,x N )
(5)
Consider now the general case where XI is a characteristic root of multiplicity k with associated characteristic vectors Xl, x 2 , • • • , xk • Then from (3) we can only conclude that t
Bx' =
LCi;x/
i = 1,2, . . . , k
(6)
i-I
Let us, however, see whether we can form suitable linear combinations of the Xi which will be characteristic vectors of B. We note, to begin with, that the matrix C = (Cij) is symmetric for the orthonormality of the Xi yields (xi,Bx') =
= (Bxi,x') =
Cij
(7)
C;,
t
L
Consider the linear combination
tliX i •
We have
i-I
(8)
Hence if the
tli
are chosen so that t
L
j = 1,2, . . . , k
Ci;a, = ria;
(9)
i-I
we will have (10) k
which means that
rl
is a characteristic root of Band
l
a;x' is an associ-
i-I
ated characteristic vector. The relation in (9) shows that rl is a characteristic root of C and the tli components of an associated characteristic vector. Thus, if T k is a k-dimensional orthogonal transformation which reduces C to diagonal form, the set of vectors z' furnished by the relation Zl
Xl
Z2
x2
= Zk
T~
(11)
xk
Introduction to Matrix Analysis
58
will be an orthonormal set which are simultaneously characteristic vectors of A and B. Performing similar transformations for the characteristic vectors associated with each multiple characteristic root, we obtain the desired matrix T. The necessity of the condition follows from the fact that two matrices of the form #£1
o A=T
T'
o
B=T
o T'
o
(12)
always commute if T is orthogonal. 12. Simultaneous Reduction to Sum of Squares. As was pointed out in the previous section, the simultaneous reduction of two symmetric matrices A and B to diagonal form by means of an orthogonal transformation is possible if and only if A and B commute. For many purposes, however, it is sufficient to reduce A and B simultaneously to diagonal form by means of a nonsingular matrix. We wish to demonstrate Theorem 6. Theorem 6. Given two real symmetric matrices, A and B, with A positive definite, there exists a nonBingular matrix T BUch that
T'AT = 1
(1)
o T'BT =
o #£N
Proof.
Let S be an orthogonal matrix such that XI
o S'
A = S
o
(2)
Reduction of General Symmetric Matrices
59
Then, if x = Sy, we have N
(x,Ax)
l
=
X,y,2
(3)
i=l
(x,Bx) = (y,S'BSy)
Now perform a /I stretching" transformation
that is Then
Z,
Yi
= Xi~i
Y
= S2Z
i = 1,2, . . . , N
(4)
N
l
A;Yi 2
=
(z,z)
(5)
i-I
(y,S'BSy) = (z,S~S'BSS2Z)
Denote by C the matrix S~S'BSS2, and let S3 be an orthogonal matrix which reduces C to diagonal form. Then, if we set z = S3W, we have N
(z,Cz) = (W,S'3CSaW) =
l
~iWi2
(6)
i-I
while (z,z)
(w,w).
It follows that
(7)
T = SS2S8
is the desired nonsingular matrix. EXERCISES 1. What is the corresponding result in case A is non-negative definite? I. Let the notation A > B for two symmetric matrices denote the fact that A - B is positive definite. Use the foregoing result to show that A > B > 0 implies that B-1> A-I,
13. Hermitian Matrices. It is clear that precisely the same techniques as used in establishing Theorem 2 enable one to establish Theorem 7, Theorem 7. If H is a Hermitian matrix, there exists a unitary matrix U such that
o H=U
U*
o
(1)
60
Introduction to Matrix Analysis
14. The Original Maximization Problem. Weare now in a position to resolve the problem we used to motivate this study of the positivity and negativity of quadratic forms, namely, that of deciding when a stationary point of a function !(XI,Xt, • • • ,XN) is actually a local maximum or a local minimum. Let C = (CI,C2, • • • ,CN) be a stationary point, and suppose that J possesses continuous mixed derivatives of the second order. It is consequently sufficient to consider the nature of the quadratic form Q =
a2! ) ( aCiaCl
=
(f.,./)
(1)
where, as before, (2)
evaluated at the point Xl = Cl, XI = C2, • • • ,XN = CN. We see then that a sufficient condition that C be a local minimum is that Q be positive definite, and a necessary condition that c be a local minimum is that Q be positive indefinite. The criterion given in Sec. 8 furnishes a method for determining whether Q is positive or negative definite. If Q vanishes identically, higher order terms must be examined. If N is large, it is, however, not a useful method. Subsequently, in Chap. 6, we shall derive the most useful criterion for the positive definiteness of a matrix. EXERCISE 1. Show that a set of sufficient conditions for /(z"z,) to have a local maximum is
/e.e. < 0 16. Perturbation Theory-I. We can now discuss a problem of great theoretical and practical interest. Let A be a symmetric matrix possessing the known characteristic values Xl, XI, ..• , XN and corresponding characteristic vectors Xl, Xl, • • • ,xN • What can we say about the characteristic roots and vectors of A + JJ, where B is a symmetric matrix and E is a II small" quantity? How small E has to be in order to be called so will not be discussed here, since we are interested only in the formal theory. If A and B commute, both may be reduced to diagonal form by the same orthogonal transformation. Consequently, with a suitable reordering of the characteristic roots of B, the characteristic roots of A + EB will be Xi + E#£i, i = 1, 2, ••. , N, while the characteristic vectors will be as before. Let us then consider the interesting case where AB ~ BA. For the sake of simplicity, we shall consider only the case where the characteristic roots are distinct.
Reduction of General Symmetric Matrices
61
It is to be expected that the characteristic roots of A + JJ will be ':1istinct, for e sufficiently small, and that they will be close to the characteristic roots of A. It will follow from this, that the characteristic vectors of A + JJ will be close to those of A. One way to proceed is the following. In place of the characteristic equation IA - Xli = 0, we now have the characteristic equation IA + JJ - Xli = O. The problem thus reduces to finding approximate expressions for the roots of this equation, given the roots of the original equation and the fact that e is small. To do this, let T be an orthogonal matrix reducing A to diagonal form. Then the determinantal equation may be written Xl
+ eCll
-
X
ECu
X2
ecu
+ eC22 -
X
=0 ECNt
XN
+
eCNN -
X
where T'BT = C = (c.j). We now look for a set of solutions of this equation of the form
We leave it as a set of exercises for the reader versed in determinantal expansions to obtain the coefficients d li, d2i, and so on, in terms of the elements Cij. We do not wish to pursue it in any detail since the method we present in the following section is more powerful, and, in addition, can be used to treat perturbation problems arising from more general operators. 16. Perturbation Theory-II. Let us now suppose not only that the characteristic roots and vectors of A + eB are close to those of A, but that we actually have power series in e for these new quantities. Write ~i, yi as an associated pair of characteristic root and characteristic vector of A + eB and Xi, Xi as a similar set for A. Then we set
+ eXlil + e 2X2i + yi = Xi + exl + e2xi2 +
~i = Xi
(1)
To determine the unknown coefficients Xli, Xu, .. and the unknown vectors Xii, Xii, • • • , we substitute these expressions in the equation (A
and equate coefficients.
+ JJ)y' == Ply'
(2)
Introduction to Matrix Analysis
62 We thus obtain (A
+ JJ)(x' + ex" + e' xl2 + ...) = (>.0 + eX" + e2Xi2 + . ')(x' + ex" + e2x'2 + ...)
(3)
and the series of relations Axl Ax" Ax'2
+ Bx + Bx" l
= XiXi
= XiXii
= X,X i2
+ XlIX'
+ X"x" + X'2X'
(4)
Examining these equations, we see that the first is identically satisfied, but that the second equation introduces two unknown quantities, the scalar >.01 and the vector x". Similarly, the third equation introduces two further unknowns, the scalar X'2 and the vector Xi2 • At first sight this appears to invalidate the method. However, examining the second equation, we see that it has the form (A - Xd)x" = (X,d - B)x l
(5)
where the coefficient matrix A - XiI is singular. Consequently, there will be a solution of this equation only if the right-hand-side vector (X,d - B)x' possesses special properties. When we express the fact that (Xid - B)x' possesses these properties, we will determine the unknown scalar Xii. Let us simplify the notation, writing Xii =
y
(X,d - B)x' =
Z
(6)
Then we wish to determine under what conditions upon z the equation (A - XiI)y = z
(7)
has a solution, when X, is a characteristic root of A, and what this solution is. Let Xl, x 2, • • • , x N be the normalized characteristic vectors of A, and consider the expressions for y and z as linear combinations of these vectors, N
Y=
l
bJxi
j-I N
z=lc;zt j-I
(8)
Reduction of General Symmetric Matrices
63
where Cj = (Z,X i )
i = 1,2, . . . , N
(9)
Substituting in (7), we obtain the equation N
(A - XiI)
L
N
L
bjx" =
j_1
cj'xi
(10)
j_1
or, since Axi = X,.x", i = 1, 2, . . . ,N, N
l
N
l
bj(Aj - X,)x" =
,-1
CiX'
(ll)
j_l
Equating the coefficients of
x", we obtain the relations i=
1,2, . . . ,N
(12)
We see that these equations are consistent if and only if
c.=o
(13)
Then b, is arbitrary, but the remaining bj are given by
i
=
1, 2, . . . , N, i ;e i
(14)
What the condition in (13) asserts is that (7) has a solution if and only if z is orthogonal to Xi, the characteristic vector associated with Xi' If so, there is a one-parameter family of solutions of (7), having the form (15)
where b, is arbitrary. Returning to (6), we see that we can take b, = 0, since a different choice of b, merely affects the normalization of the new characteristic vector yi. The orthogonality condition yields the relation (x', (Xill - B)x') = 0
(16)
Xii = (x',Bx i )
(17)
or, the simple result,
EXERCISES 1. Find the value of Xii. I. Consider the case of multiple roots, first for the 2 X 2 case, and then, in general. a. Let Xi denote the characteristic roots of A and Xj(z) those of A + zB. Show
Introduction to Matrix Analysis
M
..
that
~(A
+ ,B)...
I I+.I:.•
>./m),"'
m-O
A,(·) - (-l)"'m-Itr (
where
111+'"
~~o
B8;k'B8;·· ... B81") -m-l
8; ... -E; The E; are defined by the relation A
-I
>.;Ei , and 8; ...
;
L
Ek/('N. - >'i)
Ii,.;
(T. KGID).
,. Let A and B be 10 X 10 matrices of the form
A_[O~; .. ;] B==[O~; .. ;] o
1~"
0
0
A and B are identical except for the element in the (10,1) place, which is 10- 10 in the cue of B. Show that 1>'[ - AI ... >.10 and that I>.] - BI ... >.10 - 10- 1°. Hence, the characteristic rootl of B are 10- 1, 10- 1101, ••• , 10- 1101°, where 101 is an irreducible tenth root of unity. What is the explanation of this phenomenon? (Forsythe.)
MISCELLANEOUS EXERCISES 1. Show that a real symmetric matrix may be written in the form N
A =
I ,-1
>.,E,
where the E, are non-negative definite matrices satisfying the conditions
and the >., are the characteristic roots of A. This is called the spectral decompositi01l of A. I. Let A be a real skew-symmetric matrix; ali ... -aii. Then the characteristic roots are pure imaginary or zero. Let x + iy be a characteristic vector associated with i"., where". is a real nonzero quantity and where x and yare real. Show that :J: and yare orthogonal. 8. Referring to the above problem, let T, be an orthogonal matrix whose first two columns are x and y, T, =- (x,y,xl,x t , • • • ,x N) Show that
T~AT _ [(_~~) o
where
AN-I
is again skew-symmetric.
0]
AN-I
Reduction or General Symmetric Matrices
65
,. Prove inductively that if A is a real skew-eymmetric matrix of even dimension we can find an orthogonal matrix T such that
o (
0 -/AI
/AI)
0
T'AT ...
o
. (
0
-/AN
/AN) 0
where some of the /AI may be zero. 15. If A is of odd dimension, show that the canonical form is
(
0
-/.11
/.11)
0
o
T'AT ...
o
o
where again some of the /AI may be zero. 6. Show that the determinant of a skew-eymmetric matrix of odd dimension is zero. '1. The determinant of a skew-symmetric matrix of even dimension is the square of a polynomial in the elements of the matrix. S. Let A be an orthogonal matrix, and let ). be a characteristic root of absolute value 1 but not equal to ± I, with :r; + iy an associated characteristic vector, :r; and y real. Show that :r; and yare orthogonal. 9. Proceeding inductively as before, show that every orthogonal matrix A can be reduced to the form
o
- sin cos
A=T
).~)
T'
).~
±I
o
±I
10. Prove that TAT' is a positive definite matrix whenever T is an orthogonal matrix and A is a diagonal matrix with positive elements down the main diagonal.
66
Introduction to Matrix Analysis
11. Prove Theorem 5 by means of an inductive argument, along the lines of the proof given in Sec. 6. It. Establish the analogue of the result in Exercise 9 for unitary matrices. 18. Write down a sufficient condition that the function l(x"xl, ••• ,XN) possess a local maximum at Xi - C4, i ... 1, 2, ••• , N. U. Given that A is a positive definite matrix, find all solutions of XX' ... A. 115. Define the Gramian of N real functions h, ft, ... ,IN over (a,b) by means of the expression Prove that if Q(X"XI, • • • ,XN)
==
fa
N
b
[g(1) -
l
x,/l(1)
i-I
r
dt
then
• QG ...:.-=.-=.:.....,~IN~) mm == ..~~(g~Jc..:;"7'ft'!-, lie G(/I,ft, ••• ,IN) For Bome further results, see J. Geronimus.' For many further results, see G. Szego. I 18. If we consider the (N - I)-dimensional matrix 2
-1
-1
2-1
H== -1
2-1 -1
1
where all undesignated elements are zero, show that
IXl -
HI =
N-I
n (x -
2 - 2 cos
2;k: 1)
.1:-1
and determine the characteristic vectors. 17. Hence, show that if X, ... 0 and the N-I
l
,-1
(Xi -
Xi+,)1
~4
Determine the case of equality. 18. Show that if the Xi are all real and
Xi
are real, then N
sin l
2(
2N'Ir _ 1)
Xo ... XN+,
l
X,I
(-2
= 0 that
1 J. Geronimus, On Some Persymmetric Determinants Formed by the Polynomials of M. Appell, J. London Math. Soc., vol. 6, pp. 55-58, 1931. &0. Szega, Orthogonal Polynomials, Am. Math. Soc. Colloq. Publ., Vtll. 23, HI3D.
Reduction of General Symmetric Matrices
67
N
and that if x,
... XN+h
l
Xi ..
0, then
i-I N
l
(Xi -
Xi+')'
~
N
4 sin l
i-I
NlXi' i-I
(Ky Fan, O. TaWJsky, and J. Todd) 19. Find the characteristic roots and vectors of the matrix associated with the N
quadratic form
l
(x, -
X/)I.
i,j-l
10. Let A and B be real symmetric matrices such that A is non-negative definite. Then if IA + iBI ... 0, there exists a nontrivial real vector such that (A + iB)x .. 0 (Peremans-Duparo-Lekkerkerker). 11. If A and B are symmetric, the characteristic roots of AB are real, provided that at least one is non-negative definite. 22. If A and B are symmetric, the characteristic roots of AB - BA are pure complex. IS. Reduce QN(X) = X,XI + XIX. + ... + XN_,XN to diagonal form, determining the characteristic vectors and characteristic values. Similarly, reduce PN(X) = X,XI
+ XIXI + ... + XN_IXN + XNXI.
2'. If A is a complex square matrix, then A is HU, where H is non-negative definite Hermitian and U is unitary. Further results and references to earlier results of Autonne, Wintner, and Murnaghan may be found in J. Williamson.' 215. If A, B, and C are symmetric and positive definite, the roots of
Ix' A + XB + CI = 0 have negative real parts (Parodi). 28. Let A be a square matrix. Set IAI = IAI+ - IAI_, where IAI+ denotes the sum of the terms which are given a positive sign in the determinantal expansion. Show that if A is a positive definite Hermitian matrix, then IA 1_ ~ 0 (Schur-Leng). 27. Let A be a real matrix with the property that there exists a positive definite matrix M such that A'M ... M A. A matrix A with this property is called sllmmelruable. Show that this matrix A possesses the following properties: (a) All characteristic roots are real. (b) In terms of a generalized inner product [x,y) = (x,My), characteristic vectors associated with distinct characteristic roots are orthogonal. (c) Let T be called "orthogonal" if it is real and its columns are orthogonal in the extended sense. Then there exists an "orthogonal" matrix T such that TAT' is diagonal. I 28. Extend in a similar fashion the concept of Hermitian matrix. 29. Call a matrix A idempotent if A I = A. Show that a necessary and sufficient condition that a symmetric matrix A of rank k be idempotent is that k of the characteristic roots are equal to one and the remaining N - k are equal to zero. (The notion of rank is defined in Appendix A.) SO. If A is a symmetric idempotent matrix, then the rank of A is equal tp the II'
trace of A.
( The trace of A is equal to
l
a tl .)
i-I 1 J. Williamson, A Generalization of the Polar Representation of Nonsingular Matrices, Bull. Am. Math. Soc., vol. 48, pp. 856-863, 1942. 'See A. Kolmoioroff, Math. Ann., vol. 112, pp. 155-160, 1936.
Introduction to Matrix Analysis
68
81. The only nonsingular symmetric idempotent matrix is the identity matrix. If A is an N XN symmetric idempotent matrix of rank k, then A is positive definite if k ... N, positive indefinite if k < N. aa. If A is a symmetric idempotent matrix with ali ... 0, for a particular value of t l then every element in the ith row and ith column is equal to zero.
all.
m
a,.
If .each A, is symmetric, and I
A, ... 1, then the three following conditions
i-I
are equivalent: (a) Each Ai is idempotent. (b) AlA; ... 0 i ~ j. (c)
I
ni - N, where n, is the rank of Ai and N the dimension of the
A,.
i
alS. Let A, be a collection of N X N symmetric matrices where the rank of A, is 'PI. Let A ... I
A, have rank 'P.
Consider the four conditions
i
Cl. Each A, is idempotent. C2. AiA; ... 0 i ~ j. C3. A is idempotent. m
C4. 'P ...
I
'P"
i-I
Then 1. Any two of the three conditions CI, C2, C3 imply all four of the conditions CI, C2, C3, C4. 2. C3 and C4 imply CI and C2. For a proof of the foregoing and some applications, see F. A. Graybill and G. Marsaglia,1 D. J. Djokovic. 1 ae. Using the fact that 2x,xl < X,I + XII for any two real quantities x, and XI, show that the quadratic form i N ' "
>.I bix,1
+ '"
i-I
is positive definite if b,
b,x,1
+
I
N
'"
'"
a,/x,x;
i-hi j - I i-~+I > 0 and>. is sufficiently large.
+
I i,j-I
N
aT.
Show that I
a'ix,x; is positive definite if I ai/x,1 i
(,j-1
+
Ila;;lx,x/ is positive ir'j
definite. as. A matrix A ... (ail), i, j ... I, 2, 3, 4, is called a Lorentz matrix if the transformation :r; ... All leaves the quadratic form Q(z) ... XII - XII - X,I - Xt l unchanged; that is, Q(:r;) ... Q(II). Show that the product of two Lorentz matrices is again a Lorentz matrix. a9. Show that II' ... (x, + fJxl)/(1 - fJl)~j III = (fJx, + xl)/(1 - fJl)Jt II' ... x, lit -
where 0
Xt
< Il' < 1 is a Lorentz transformation.
F. A. Graybill and G. Marsaglia, Idempotent Matrices and Quadratic Forms ill the General Linear Hypotheaia, Am&. Math. Stat., vol. 28, pp. 678-686, 1957. I D. J. Djokovic, Note on Two Problems on Matrices, Amer. MalA. MonUllll, to appear. 1
Reduction of General Symmetric Matrices
69
to. Show that any Lorentz transformation is a combination of an orthogonal transformation of the variables XI, XI, X. which leaves x, fixed, a transformation of the type appearing above, and a possible change of sign of one of the variables (a reflection).' Physical implications of ·Lorentz transformations may be found in J. L. Synge.1 U. Let H be a non-negative Hermitian matrix. Show that given any t interval la,b) we can find a sequence of complex functions 1!i(I)}, i = 1, 2, . . . ,such that i, j ... 1, 2, •.• t
'2. Let A ... (ail) he an N X N symmetric matrix, and A N _, the symmetric matrix obtained by taking the (N - 1) X (N - 1) matrix whose elements are ao;, i, j 1, 2, . . . , N - 1. By means of an orthogonal transformation, reduce A to the form /A,
A-
o where /A' are the characteristic roots of AN_'. Using this representation, determine the relations, if any, between multiple characteristic roots of A and the characteristic roots of A N _ h U. Using these results, show that the rank of a symmetric matrix can be defined as the order, N, minus the number of zero characteristic roots. ". If H is a positive indefinite Hermitian matrix, then H ... TT*, where T is triangular and has real and non-negative diagonal elements. See H. C. Lee. 1 U. Necessary and sufficient conditions for the numbers d" dl , • • • , dN to be the diagonal elements of a proper orthogonal matrix are Idil ~ 1, j ... 1, 2, . . . , N, and N
'\' IdAI ~ n -
tf
2 - 2X min
l~j~N
l
Id;l,
where X = 1 if the number of negative d, is even
and 0 otherwise (L. Mirsky, A mer. Math. Monthly, vol. 66, pp. 19-22, 1959). fe. If A is a symmetric matrix 'with no characteristic root in the interval [a,b], then (A - aI)(A - bI) is positive definite (Kato's lemma). Can we use this result to obtain estimates for the location of intervals which are free of characteristic roots of A? Show that
'7.
min (Ii
r" e- (1 + a,t + ... + antn)1 dt =
10
I
min (I (1 a.
10
+ a,t + ... + antn)l dt
1/(n
= 1/(n
+ 1)
+ I)'
(L. J. Mordell, Equationes Mathematicae, to appear) I Cf. I. G. Petrovsky, Lectures on Partial Differential Equations, Interscience Publishers, Inc., New York, 1954. • J. L. Synge, Relativity, the Special Theory, Interscience Publishers, Inc., New York, 1956. t I. Schur, Math. Z., vol. 1, p. 206, 1918. B H. C. Lee, Canonical Factorization of Non-negative Hermitian Matrices, J. London Math. Soc., vol. 23, pp. 100-110, 1948.
Introduction to Matrix Analysis
70
48. Let s be an N-dimensional vector such that (s,l) ... 1. Determine the minimum of (I - 6, A(. - b» over this z..region 888uming that A is Hermitian. See G. E. Forsythe and G. H. Golub, On the Stationary Values of a Second Degree Polynomial on the Unit Sphere, J. Soc. Ind'lUJ. Appl. Math., vol. 13, pp. 1050-1068, 1965. '9. Let >'(, 14', i ... I, 2, . . . , N, be respectively the characteristic roota of A and B.
Set M = max (la'il,lb(jl), d = :
tJ pil :S (N + 2)MdJ./
Ib'i - eliil/M. To each root >'i there is a
N • Furthermore, the >., and p( can be put root /Ai such that I>., into one-to-one correspondence in such a way that 1>'( - /Ail :S 2(N 1)IMdl /N (A. Ostronki, Mathemalical Miscellanea X X VI: On the Continuity of Characteristic Roota in Their Dependence on the Matriz Elementa, Stanford University, 1959). 10. Let fA,,1 be a bounded, monotone-increasing sequence of positive definite matrices in the sense that there exists a positive definite matrix B such that B - A" is non-negative definite for any n and such that A" - A,,_I is non-negative definite for any n. Show that A" converges to a matrix A as n -+ GO (Riess).
+
Bibliography and Discussion §2. We suppose that the reader has been exposed to the rudiments of the theory of linear systems of equations. For the occasional few who may have missed this or wish a review of some of the basic results, we have collected in Appendix A a statement and proof of the results required for the discussion in this chapter. The reader who wishes may accept on faith the few results needed and at his leisure, at some subsequent time, fill in the proofs. §tS. The result in this section is a particular example of a fundamental principle of analysis which states that whenever a quantity is positive, there exists a formula for this quantity which makes this positivity apparent. Many times, it is not a 'trivial matter to find formulas of this type, nor to prove that they exist. See the discussion in
G. H. Hardy, J. E. Littlewood, and G. Polya, Inequalities, Cambridge University Press, New York, pp. 57-60, 1934. and a related comment at the end of Chap. 5. §8. The concept of a positive definite quadratic form, as a natural extension of the positivity of a scalar, is one of the most powerful and fruitful in all of mathematics. The paper by Ky Fan indicates a few of the many ways this concept can be used in analysis. Ky Fan, On Positive Definite Sequences, Ann. Math., vol. 47, pp. 593-607, 1946.
Reduction of General Symmetric Matrices
71
For an elegant and detailed discussion of other applications, see the monograph Ky Fan, Les Fonetions dcjinies-positives et les Fonctions compzetement monotones, fascicule CXIV, M em. sci. math., 1950. In the appendices at the end of the volume, we also indicate some of the ingenious ways in which quadratic forms may be used in various parts of analysis. A discussion of the diagonalization of complex non-Hermitian symmetric matrices may be found in C. L. Dolph, J. E. :McLaughlin, and I. Marx, Symmetric Linear Transformations and Complex Quadratic Forms, Comm. Pure and Appl. Math., vol. 7, 621-632, 1954. Questions of this nature arise as special cases of more general problems dealing with the theory of characteristic values and functions of SturmLiouville equations with complex coefficients.
§1Ii. For a further discussion of these problems, and additional references, see
R. D. Brown and I. M. Bassett, A Method for Calculating the First Order Perturbation of an Eigenvalue of a Finite Matrix, Proc. Phys. Soc., vol. 71, pp. 724-732, 1958. Y. W. Chen, On Series Expansiop of a Determinant and Solution of the Secular Equation, J. Math. Phys., vol. 7, pp. 27-34, 1966. H. S. Green, Matrix Mechanics, Erven P. Noordhoff, Ltd., Groningen, Netherlands, 1965. The book by Green contains a discussion of the matrix version of the factorization techniques of Infeld-Hull. For some interesting extensions of the concept of positive definiteness, see M. G. Krein, On an Application of the Fixed Point Principle in the Theory of Linear Transformations of Spaces with an Indefinite Metric, Am. Math. Soc. Transl., (2), vol. 1, pp. 27-35, 1955, where further references may be found. §16. In a paper devoted to physical applications,
M. Lax, Localized Perturbations, Phys. Rev., vol. 94, p. 1391, 1954, the problem of obtaining the solution of (A few elements of B are nonzero is discussed.
+
B)x
=
Xx when only a
72
Introduction to Matrix Analysis
For some interesting results concerning skew-symmetric matrices, see M. P. Drazin, A Note on Skew-symmetric Matrices, Math. Gat., vol. XXXVI, pp. 253-255, 1952. N. Jacobson, Bull. Amer. Math. Soc., vol. 45, pp. 745-748, 1939. With regard to Exercise 24, see also H. Stenzel, ttber die Darstellbarkeit einer Matrix als Product . Math. Zeit., vol. 15, pp. 1-25.
..,
Finally, for some interesting connections between orthogonality and quadratic forms, see W. Groebner, tiber die Konstruktion von Systemen orthogonaler Polynome in ein- und zwci-dimensionaler Bereich, Monat8h. Math., vol. 52, pp. 38-54, 1948. H. Larcher, Proc. Amer. Math. Soc., vol. 10, pp. 417-423, 1959. For an extensive generalization of the results of the two foregoing chapters, see F. V. Atkinson, Multiparametric Spectral Theory, Bull. Am. Math. Soc., vol. 74, pp. 1-27, 1968.
5 Constrained Maxima 1. Introduction. In the previous chapters, we have discussed the problem of determining the set of values assumed by (x,Ax) as x ranged over the region (x,x) = 1. In particular, we were interested in determining whether (x,Ax) could assume both positive and negative values. In this chapter we shall investigate this question under the condition that in addition to the relation (x,x) = 1, x satisfies certain additional linear constraints of the form (x,b i )
= c,
i
=
1,2, . . . , k
(1)
In geometric terms, we were examining the set of values assumed by We now add the condition that x simultaneously lie on a set of planes, or alternatively, is a point on a given N - k dimensional plane. Constraints of this nature arise very naturally in various algebraic, analytic, and geometric investigations. As a first step in this direction, we shall carry out the generalization of the algebraic method used in Chap. 1, obtaining in this way a more useful set of necessary and sufficient conditions for positive definiteness. These results, in turn, will be extended in the course of the chapter. 2. Determinantal Criteria for Positive Definiteness. In Chap. 4, it was demonstrated that a necessary and sufficient condition that a real symmetric matrix be positive definite is that all of its characteristic roots be positive. Although this is a result of theoretical value, it is relatively difficult to verify. For analytic and computational purposes, it is important to derive more usable criteria. The reason why a criterion in terms of characteristic roots is not useful in applications is that the numerical determina~ion of the characteristic roots of a matrix of large dimem\ion is a very difficult matter. Any direct attempt based upon a straightforward expansion of the determinant IA - Xli is surely destined for failure because of the extraordinarily large number of terms appearing in the expansion of a determinant. A determinant of order N has N! terms in its complete expansion. Since (x,Ax) as x roamed over the unit sphere.
73
74
Introduction to Matrix Analysis
101 = 3,628,000 and 20! ~ 2,433 X IOu, it is clear that direct methods cannot be applied, even with the most powerful computers at one's disposal. I The numerical determination of the characteristic roots and vectors constitutes one of the most significant domains of matrix theory. As we have previously indicated, we will not discuss any aspects of the problem here. For the form in two variables, (1)
we obtained the representation (2)
under the assumption that all ¢ O. From this, we concluded that the relations all>
0
all al21 > 0 Iau au
(3)
were necessary and sufficient for A to be positive definite. Let us now continue inductively from this point. Consider the form in three variables, Q(XI,X2,Xa) = allxl 2
+ 2al2XIX2 + aUx22 + 2a23x2Xa + 2a13XIXa + auxa2 (4)
Since all > 0 is clearly a necessary condition for positive definiteness, as we see upon setting X2 = Xa = 0, we can write
(5)
If Q(XI,XI,Xa) is to be positive definite, we see upon taking Xl
+ a13 + (al2X2 all
X 8) -
0
that the quadratic form in X2 and Xa must be positive definite. It follows, upon applying the result for 2 X 2 matrices given in (3), that a set of 1 At the rate or One operation per microsecond, 201 uperations would require Over 50,000 yearsl
Constrained Maxima
75
necessary and sufficient conditions that Q(XI,X2,Xa) be positive definite are al2
2
aU a 13 a23 - - -
au - all
ali
all
>0
a12a U
a13
an - - -
>0 (6)
2
a33 - -
all
all
The first two conditions we recognize. Let us now see if we can persuade the third t.o assume a more tractable appearance. Consider the determinant all
D3 =
au
al2
au
a2l
au a32
a23 a33
a3l
0
0
al2
al3
au -
a12 2
-
all al3 a l2
a23 - - all
al2 a l3
a23 - - -
(7)
all 2
al3
a33 - -all
The last equation follows upon subtracting the first row multiplied by aU/all from the second, and the first row multiplied by aU/all from the third (a method previously applied in Sec. 2 of Chap. 3). It follows then that the conditions in (6) may be written in the suggestive form all> 0
all
Ia2l
au! a22
>0
all
al2
al3
a2l
a22 a32
an a33
a3l
>
0
(8)
Consequently we have all the ingredients of an inductive proof of Theorem 1. A necessary and sufficient set of conditions that A be positive definite is that the following relations hold: k = 1,2, . . . , N
(9)
where Dk
=
i, j
laijl
=
1, 2, . . . , k
(10)
We leave the details of the complete proof as an exercise for the reader. 3. Representation as Sum of Squares. Pursuing the analysis a bit further, we see that we can state the following result. Theorem 2. Provided that nll D k is equal to zero, we may write N
Q(XI,X2, ••• ,XN) =
l
(D k/ Dk_ I)Yk 2
Do = 1
k= I
where N
Yk
=
Xk
+
l
Ck,Xj
k = 1,2, . . . ,N - 1
j=k+1
YN
=
XN
The Cij are rational functions of the al;' This is the general form of the representation given in (2.5).
(1)
76
Introduction to Matrix Analysis EXERCISE
1. What happens if one or mora of the DA are zero?
4. Constrained Variation and Finsler's Theorem. Let us now consider the problem of determining the positivity or negativity of the quadratic form Q(x I,X2, • • • ,XN) when the variables are allowed to vary over all Xi satisfying the constraints N
L
b,jXi =
i-I
°
i = 1,2, . . . ,k, k
(1)
As usual, all quantities occurring are taken to be real. Without loss of generality, we can suppose that these k equations are independent. Hence, using these equations, we can solve for k of the Xi as linear functions of the remaining N - k, substitute these relations in Q and use the criteria obtained in the preceding sections to treat the resulting (N - k)-dimensional problem. Although this method can be carried through successfully, a good deal of determinantal manipulation is involved. We shall consequently pursue a different tack. However, we urge the reader to attempt an investigation along the lines just sketched at least for the case of one constraint, in order to appreciate what follows. Let us begin by demonstrating the following result of Finsler. Theorem 3. If (x,Ax) > whenever (x,Bx) = 0, where B is a positive indefinite matrix, then there exists a scalar constant X such that (x,Ax) + X(x,Bx) is positive definite. Proof. Write x = Ty, where T is an orthogonal matrix chosen so that
°
k
(x,Bx)
=
L
IJ.'Yi 2
IJ.i
> 0,
1 ::; k
(2)
i= 1
Since B is by assumption positive indefinite, we know that (x,Bx) can be put in this form. Under this transformation, we haye N
(x,Ax) =
1
(3)
Ci;YiYi
i,l= I
k
If (x,Ax) is to be positive whenever (x,Bx) =
1
IJ.iy.2 =
0, we must have
i-I
the relation N
L
C,';YiYi
i,l=k+1
for all nontrivial YHI, YH2, ••• , yN.
>0
(4)
Constrained Maxima
77
In the (N - k)-dimensional space (YHI,YHI, . . . ,YN), make an orthogonal transformation which converts the quadratic form in (4) into a sum of squares. Let the variables Y" YI, . . . ,Yk remain unchanged. In N-dimensional Y space, write y = Sw to represent this transformation. In the w, variables, we have N
(x,Ax) = Q(w) =
N
L C,jYiYJ L + L L =
IJ.,W,2
'.1-1
'-k+l
N
k
2
k
di,w,wj
i-k+lj-l
+
L
di,w,wj
(5)
~j-l
Our problem then reduces to demonstrating that a quadratic form of the type k
X
I ,-1
IJ.iW i
2
+ Q(w)
(6)
is positive definite whenever IJ., > 0, i = 1, 2, . . . , N, and X is sufficiently large. Our proof is complete upon using the result indicated in Exercise 36 at the end of Chap. 4. Returning to the original variational problem, we now make the observation that the k equations in (1) can be combined into one equation, k
N
i= I
j-I
L(L bijXir = 0
Regarding "
k
N
i-I
;= I
2: (2:
(7)
bijX;r as a positive indefinite form, we see that
Theorem 3 yields the following result: A necessary and sufficient condition that Q(Xl,X2, . . . ,XN) be positive for all nontrivial values satisfying the linear equations in (1) is that the quadratic form P(Xl,X2, ••. ,XN)
=
Q(Xl,XI, . . . ,XN)
k
N
i= I
i-I
+ X L(L
b'iXi)2
(8)
be positive definite for all sufficiently large positive X. EXERCISE
1. Establish the foregoing result by considering the maximum over all x, of the function I(x) =
(1. Her8tein)
78
Introduction to Matrix Analysis G. The Case k = 1. Let us begin the task of obtaining a usable criterion
from the foregoing result with a case of frequent occurrence, k quadratic form P is then
=
1.
The
N
P =
L
(GoJ
+ Xbh-blj)~1
(1)
i,i == 1
For this to be positive definite for all large positive X, we must have, in accord with Theorem 1, the determinantal inequalities i, j = 1, 2, . . . ,k
(2)
for k = 1, 2, . . . , N. In order to obtain simpler equivalents of these inequalities, we employ the following matrix relation. au all
1
au au
o
1
all Xb k o bi bl • • . bk -1 au + M I2 au + XbIb2 an + Xb I b2 au + Xb 2 2
o 0
•
au au
+ XbIb k + Xb bk 2
Xb l Xb l (3)
an
+ Xbkb o
i
au
+ Xbkb
2
0
Taking determinants of both sides, we see that positivity of the determinant l
au au (4)
for large positive Xand k = 1,2, . . . , N.
Constrained Maxima
79
Hence, we must have the relation all a2l
an
alk a2k
au
bl b2
all au
au
au
a22
a2k
<0
X akl
bl
au b2
au bk
bk
akl
...
ak2
(5)
au
0
for all large positive Xand k = I, 2, . . . , N. Consequently, we see that a sufficient condition that Q be positive for all nontrivial Xi satisfying the linear relation in b1xl
+ b2X2 + ... + bNxN
=
0
is that the bordered determinants satisfy the relations
all
an
a21
au
alk a2k
bl b2
<0 akl bl
for k = 1,2, . . . ,N.
all au
ak2 b2
au bk
bk
0
It is clear that the conditions a12 au
alk
an
bl b2 ~O
au b1
(6)
au b2
au bk
(7)
bk 0
for k = 1, 2, \ldots, N are necessary, but that some of these determinants can be zero without disturbing the positivity property of Q. The simplest way in which these can be zero is for some of the b_i to be zero. Referring to (5), we see that if some element of the sequence of bordered determinants is zero, say the kth, then a necessary condition for positivity is that the relation

\begin{vmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & & \vdots \\ a_{k1} & \cdots & a_{kk} \end{vmatrix} > 0          (8)

be valid. A further discussion of special cases is contained in the exercises following.
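For k = 1 the criterion can be checked numerically. In the Python sketch below, the matrix A and the vector b are arbitrary illustrative choices; the bordered determinants of (6) are compared with a direct test of the positivity of (x,Ax) on the plane (b,x) = 0.

    import numpy as np

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 1.0]])
    b = np.array([1.0, -1.0, 2.0])
    N = len(b)

    # bordered determinants for k = 1, ..., N
    bordered = []
    for k in range(1, N + 1):
        M = np.zeros((k + 1, k + 1))
        M[:k, :k] = A[:k, :k]
        M[:k, k] = b[:k]
        M[k, :k] = b[:k]
        bordered.append(np.linalg.det(M))

    # direct check: is (x, Ax) > 0 on the hyperplane (b, x) = 0?
    basis = np.linalg.svd(b.reshape(1, -1))[2][1:].T      # orthonormal basis of the plane
    eigs = np.linalg.eigvalsh(basis.T @ A @ basis)
    print(np.all(eigs > 0), all(d < 0 for d in bordered))  # the two tests agree here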
EXERCISES
1. Assuming that b_N \neq 0, obtain the foregoing results by solving for x_N, x_N = -(b_1 x_1 + b_2 x_2 + \cdots + b_{N-1} x_{N-1})/b_N, and considering Q(x) as a quadratic form in x_1, x_2, \ldots, x_{N-1}. Does this yield an inductive approach?
2. If b_1 = 0, the condition in (7) yields -a_{11} b_2^2 \leq 0. Hence, if b_2 \neq 0, we must have a_{11} \geq 0. What form do the relations in (6) and (7) take if b_1 = b_2 = \cdots = b_r = 0, 1 \leq r < N?
3. Consider, in homogeneous coordinates, the quadric surface \sum_{i,j=1}^{4} a_{ij} x_i x_j = 0 and the plane \sum_{i=1}^{4} u_i x_i = 0. Prove that a necessary and sufficient condition that the plane be tangent to the surface is that

\begin{vmatrix} a_{11} & a_{12} & a_{13} & a_{14} & u_1 \\ a_{21} & a_{22} & a_{23} & a_{24} & u_2 \\ a_{31} & a_{32} & a_{33} & a_{34} & u_3 \\ a_{41} & a_{42} & a_{43} & a_{44} & u_4 \\ u_1 & u_2 & u_3 & u_4 & 0 \end{vmatrix} = 0

4. Determine the minimum distance from the quadric surface (x,Ax) = 0 to the plane (u,x) = 0.
5. Prove that a necessary and sufficient condition that the line determined by the planes (u,x) = (v,x) = 0 be either tangent to (x,Ax) = 0 or a generator, is that the bordered determinant

\begin{vmatrix} & & & & u_1 & v_1 \\ & & A & & u_2 & v_2 \\ & & & & u_3 & v_3 \\ & & & & u_4 & v_4 \\ u_1 & u_2 & u_3 & u_4 & 0 & 0 \\ v_1 & v_2 & v_3 & v_4 & 0 & 0 \end{vmatrix} = 0

Cf. M. Bôcher, Introduction to Higher Algebra, Chap. 12, The Macmillan Company, New York, 1947.
6. Determine the minimum distance from the quadric surface (x,Ax) = 0 to the line determined by the planes (u,x) = (v,x) = 0.
6. A Minimization Problem. Closely related to the foregoing is the problem of determining the minimum of (x,x) over all x satisfying the constraints

(x, a^i) = b_i, \qquad i = 1, 2, \ldots, k          (1)

As usual, the vectors and scalars appearing are real. Without loss of generality, we can assume that the vectors a^i are linearly independent. For any x we know that the inequality

\begin{vmatrix} (x,x) & (x,a^1) & \cdots & (x,a^k) \\ (a^1,x) & (a^1,a^1) & \cdots & (a^1,a^k) \\ \vdots & & & \vdots \\ (a^k,x) & (a^k,a^1) & \cdots & (a^k,a^k) \end{vmatrix} \geq 0          (2)

is valid. This is equivalent to the relation

(x,x) \geq - \frac{\begin{vmatrix} 0 & b_1 & \cdots & b_k \\ b_1 & (a^1,a^1) & \cdots & (a^1,a^k) \\ \vdots & & & \vdots \\ b_k & (a^k,a^1) & \cdots & (a^k,a^k) \end{vmatrix}}{|(a^i,a^j)|}          (3)

since the determinant |(a^i,a^j)| is positive in view of the linear independence of the a^i. The right side in (3) is the actual minimum value, since equality is attained in (2) if x is chosen to be a linear combination of the a^i,

x = \sum_{j=1}^{k} c_j a^j          (4)

In order to satisfy (1), the c_j are taken to be the solutions of the equations

\sum_{j=1}^{k} c_j (a^j, a^i) = b_i, \qquad i = 1, 2, \ldots, k          (5)

Since |(a^i,a^j)| \neq 0, there is a unique solution, which yields the unique minimizing vector x.
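A short Python computation illustrates the solution: the c_j are obtained from the Gram system (5) and the minimizing x from (4). The vectors a^i and the values b_i are arbitrary illustrative data.

    import numpy as np

    a = np.array([[1.0, 0.0, 2.0, -1.0],
                  [0.0, 1.0, 1.0,  3.0]])        # rows are a^1, a^2 (k = 2, N = 4)
    b = np.array([1.0, 2.0])

    G = a @ a.T                                  # Gram matrix ((a^i, a^j))
    c = np.linalg.solve(G, b)                    # equations (5)
    x = a.T @ c                                  # minimizing vector (4)

    print(np.allclose(a @ x, b))                 # constraints satisfied
    print(x @ x, b @ np.linalg.solve(G, b))      # minimum of (x,x) equals (b, G^{-1} b)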
EXERCISES
1. Deduce from this result that the quadratic form

Q(x) = \begin{vmatrix} 0 & x_1 & x_2 & \cdots & x_N \\ x_1 & & & & \\ x_2 & & & A & \\ \vdots & & & & \\ x_N & & & & \end{vmatrix}

is negative definite whenever A is positive definite.
2. Give an independent proof of this result by showing that the associated quadratic form cannot be positive definite, or even positive indefinite, if A is positive definite.
3. Assuming that (x,Ax) possesses a positive minimum on the intersection of the planes (x,a^i) = b_i, i = 1, 2, \ldots, k, determine this minimum and thus derive the results of the preceding section in an alternate fashion.
7. General Values of k. Let us now return to the problem posed in Sec. 4. It is not difficult to utilize the same techniques in the treatment of the general problem. Since, however, the notation becomes quite cumbersome, it is worthwhile at this point to introduce the concept of a rectangular matrix and to show how this new notation greatly facilitates the handling of the general problem. Once having introduced rectangular matrices, which it would be well to distinguish by a name such as "array," as Cayley himself wished to do, we are led to discuss matrices whose elements are themselves matrices. We have hinted at this in some of our previous notation, and we shall meet it again in the study of Kronecker products.
8. Rectangular Arrays. A set of complex quantities a_{ij} arranged in the form

A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & & & \vdots \\ a_{M1} & a_{M2} & \cdots & a_{MN} \end{pmatrix}          (1)

will be called a rectangular matrix.
There is now no restriction that M = N. We will call this array an M × N matrix. Observe the order of M and N. There is no difficulty in adding two M × N matrices, but the concept of multiplication is perhaps not so clear. We shall allow multiplication of an M × N matrix B by a K × M matrix A. This arises as before from the iteration of linear transformations. Thus

AB = C = (c_{ij})          (2)

a K × N matrix, where

c_{ij} = \sum_{k=1}^{M} a_{ik} b_{kj}          (3)

Schematically, the dimensions combine as (K × M)(M × N) = (K × N).
The transpose A' of an M × N matrix is an N × M matrix obtained by interchanging rows and columns. In many cases, these rectangular matrices can be used to unify a presentation in the sense that the distinction between vectors and matrices disappears. However, since there is a great conceptual difference between vectors and square matrices, we feel that particularly in analysis this blending is not always desirable. Consequently, rectangular matrices, although of great significance in many domains of mathematics, will play a small role in the parts of matrix theory of interest to us here. Hence, although in the following section we wish to give an example of the simplification that occasionally ensues when this notation is used, we urge the reader to proceed with caution. Since the underlying mathematical ideas are the important quantities, no notation should be adhered to slavishly. It is all a question of who is master.

EXERCISES
1. Let x and y be N-dimensional column vectors.
Show that

xy' = (x_i y_j), \qquad i, j = 1, 2, \ldots, N

2. If x is an N-dimensional column vector, then

|\lambda I - xx'| = \lambda^N - (x,x)\lambda^{N-1}
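Exercise 2 is easy to confirm numerically; in the Python sketch below the vector x is an arbitrary choice.

    import numpy as np

    # the characteristic polynomial of the rank-one matrix xx' is
    # lambda^N - (x,x) lambda^(N-1)
    x = np.array([1.0, -2.0, 0.5, 3.0])
    N = len(x)

    coeffs = np.poly(np.outer(x, x))        # coefficients of |lambda I - xx'|
    expected = np.zeros(N + 1)
    expected[0] = 1.0                       # lambda^N
    expected[1] = -(x @ x)                  # -(x,x) lambda^(N-1)
    print(np.allclose(coeffs, expected))    # True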
9. Composite Matrices. We have previously defined matrices whose elements were complex quantities. Let us now define matrices whose elements are complex matrices,

A = (A_{ij})          (1)
This notation will be particularly useful if it turns out that, for example, a 4 × 4 matrix may be written in the form

\begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}          (2)

with the A_{ij} defined by

A_{11} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad A_{12} = \begin{pmatrix} a_{13} & a_{14} \\ a_{23} & a_{24} \end{pmatrix}, \quad A_{21} = \begin{pmatrix} a_{31} & a_{32} \\ a_{41} & a_{42} \end{pmatrix}, \quad A_{22} = \begin{pmatrix} a_{33} & a_{34} \\ a_{43} & a_{44} \end{pmatrix}

or, alternatively, with the definitions

A_{11} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}, \quad A_{12} = \begin{pmatrix} a_{14} \\ a_{24} \\ a_{34} \end{pmatrix}, \quad A_{21} = (a_{41} \; a_{42} \; a_{43}), \quad A_{22} = (a_{44})          (4)

For various purposes, as we shall see, this new notation possesses certain advantages. Naturally, we wish to preserve the customary rules for addition and multiplication. It is easy to verify that addition is carried out in the expected fashion, and we leave it as an exercise for the reader to verify that the product of (A_{ij}) and (B_{ij}) may safely be defined as follows:

(A_{ij})(B_{ij}) = (C_{ij})          (5)

where

C_{ij} = \sum_{k=1}^{N} A_{ik} B_{kj}          (6)

provided that all the products A_{ik} B_{kj} are defined. Furthermore, the associativity of multiplication is also preserved.
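The multiplication rule (6) for composite matrices can be checked directly; in the following Python sketch two 4 × 4 matrices (arbitrary choices) are partitioned into 2 × 2 blocks as in (2) and multiplied blockwise.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4))
    B = rng.standard_normal((4, 4))

    def blocks(M):
        # partition a 4 x 4 matrix into four 2 x 2 blocks, as in (2)
        return [[M[:2, :2], M[:2, 2:]], [M[2:, :2], M[2:, 2:]]]

    Ab, Bb = blocks(A), blocks(B)
    C = np.block([[sum(Ab[i][k] @ Bb[k][j] for k in range(2)) for j in range(2)]
                  for i in range(2)])
    print(np.allclose(C, A @ B))   # blockwise product equals the ordinary product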
EXERCISES
1. From the relation

\begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} D & -B \\ -C & A \end{pmatrix} = \begin{pmatrix} AD - BC & BA - AB \\ CD - DC & DA - CB \end{pmatrix}

deduce that

\begin{vmatrix} A & B \\ C & D \end{vmatrix} = |AD - BC|

if BA = AB or CD = DC, and the required products exist.
2. Let M = A + iB, A and B real, and let

M_1 = \begin{pmatrix} A & -B \\ B & A \end{pmatrix}

Show that M_1 is similar to \begin{pmatrix} M & 0 \\ 0 & \bar{M} \end{pmatrix} (Szarski and Wazewski).
3. Hence, show that |M_1| = |M| |\bar{M}|, and that the characteristic roots of M_1 are those of M and \bar{M}.
4. Show that if M is Hermitian, then M_1 is symmetric.
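The determinant relation of Exercise 3 is easily verified numerically; the matrices A and B in the Python sketch below are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((3, 3))
    B = rng.standard_normal((3, 3))
    M = A + 1j * B
    M1 = np.block([[A, -B], [B, A]])        # the real representation of M

    lhs = np.linalg.det(M1)
    rhs = np.linalg.det(M) * np.linalg.det(M.conj())
    print(np.allclose(lhs, rhs.real), np.allclose(rhs.imag, 0.0))   # True True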
10. The Result for General k. Let us now discuss the general problem posed in Sec. 4. As Theorem 3 asserts, the quadratic form

P = \sum_{i,j=1}^{N} \left( a_{ij} + \lambda \sum_{r=1}^{k} b_{ri} b_{rj} \right) x_i x_j          (1)

must be positive definite for large \lambda. Hence we must have the determinantal relations

\left| a_{ij} + \lambda \sum_{r=1}^{k} b_{ri} b_{rj} \right| > 0, \qquad i, j = 1, 2, \ldots, n          (2)

for n = 1, 2, \ldots, N and all sufficiently large positive \lambda. In order to simplify this relation, we use the same device as employed in Sec. 5, together with our new notation. Let B_N denote the rectangular array

B_N = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1N} \\ b_{21} & b_{22} & \cdots & b_{2N} \\ \vdots & & & \vdots \\ b_{k1} & b_{k2} & \cdots & b_{kN} \end{pmatrix}          (3)

Consider the following product of (N + k)-dimensional matrices,

\begin{pmatrix} A_N & B_N' \\ \lambda B_N & -I_k \end{pmatrix} \begin{pmatrix} I_N & 0 \\ \lambda B_N & I_k \end{pmatrix} = \begin{pmatrix} A_N + \lambda B_N' B_N & B_N' \\ 0 & -I_k \end{pmatrix}          (4)

where I_k is the k-dimensional unit matrix and A_N = (a_{ij}), i, j = 1, 2, \ldots, N. Taking determinants of both sides, we have

(-1)^k \begin{vmatrix} A_N & B_N' \\ \lambda B_N & -I_k \end{vmatrix} = |A_N + \lambda B_N' B_N|          (5)

Hence, the conditions of (2) are replaced by

(-1)^k \begin{vmatrix} A_n & B_n' \\ \lambda B_n & -I_k \end{vmatrix} > 0, \qquad n = 1, 2, \ldots, N          (6)

for all large \lambda. It follows as before that sufficient conditions for positivity are readily obtained. The necessary conditions require a more detailed discussion because of the possible occurrence of zero values, both on the part of the coefficients b_{ij} and the bordered determinants.

MISCELLANEOUS EXERCISES
1. If \{A_i\} is an arbitrary set of r × r matrices, there are unitary matrices U and V of order r × r, such that U A_i V = D_i, D_i diagonal and real, if and only if A_i A_j' = A_j A_i' and A_i' A_j = A_j' A_i for all i and j (N. A. Wiegmann).
L
2. Let
ailxiXI be a positive definite form.
Show that
',j-l
is a positive definite form in the N - 1 variables XI, XI, • • • , X~_I, X~+h • • • , XN What are the characteristic roots and vectors of the matrix of this quadratic form? Generalize this result. a. Introduce the following correspondence between complex numbers and matrices: Z
= X
+ iy '" Z
... [
X
-y
y]
X
Show that (a) ZW == WZ (b) i '" Z' (c) Z ' " Z, w '" W implies that zw '" ZW fl. Usc this correspondence and de Moivre's theorem to determine the form of
Z" for n = 1, 2, .•..
Ii. From Exercise 3, we see that we have the correspondence
.'" [-10 01]
• Using this in the matrix Q = [
we are led to the supermatrix
XI -Xa
+ ~XI + U.
+ ~x.]
Xa XI -
UI
Constrained Maxima
87
or, dropping parentheses, Q. =
XI
XI
-x,
XI
[ -Xa
X.
XI
Xa
-XI
-x.
XI
-X.
Consider the equivalence Q '" Q.. Show that Q '" Q., W '" W. results in QW '" Hence, determine IQ.I, and the characteristic values of Q•• 8. How does one determine the maximum value of (x,Ax) subject to the constraints (x,x) ... 1, (x,e C) ... 0, i ... 1, 2, •.• , k? 7. Write, for two symmetric matrices, A and B, A ~ B to indicate that A - B is non-negative definite. Show that A ~ B does not necessarily imply that A I ~ BI. 8. Let g(l) ~ 0 and consider the quadratic form Q.W..
N
QN(X) ...
fo 1 g(l) (L
Xkl k )' dl
k-O
Show that QN(X) is positive indefinite, and, hence, derive various determinantal inequalities for the quantities m" 9. Let g(1)
~
1
= f0 g(I)I" dl (Slielljes).
0 and consider the Hermitian form N
1
H N(X) = f0 g(1)
IL Xktl1rikt II dt k-O
Show that HN(x) is positive definite, and, hence, derive determinantal inequalities for the quantities
mk
=
10
1
g(l)e'rikt dl (0. Toeplilz).
10. Show that N
L
m,n=O
N XmXn.
m
+n + 1
<11'
L
(Hilbert's inequality)
n=O
11. Establish Theorem 1 inductively. See C. J. Seelye. 1 12. Determine the minimum value over all Xi of the expression N
QN(X) = for, e- i8
L II Xke iU
-
dB
k=O
IS. What is the limit of the minimum value as N -+ "'? (This result plays an important role in prediction theory. See U. Grenander and G. Szego, Toeplitz Formll and their Applicalions, University of California Press, 1958.) 14. Determine the minimum value over all x, of the expression QN(.r;) =
fo l II" -
N
L x,I~; I
dt
i-O
where I
Ix,1
is a sequence of real numbers, Xo
< XI < ....
C. J. Seelye, Am. Malh. Monlhly, vol. 65, pp. 355-356, 1958.
115. Consider the problem of determining the minimum and maximum values of + 2(b,x) on the sphere (x,x) ... 1. How many stationary values are there, and are they all real? 18. How many normals, real and complex, can one draw from a given point in the plane (Xhl/I) to the ellipse xl/al + yl/b l ... I? (x,Ax)
IT. If A ...
(ali),
C ...
(Clj),
and
L
aii -
0,
L
ali -
0,
Cit
=-
Ci
+ Cj, then AB and
I
j
A(B + C) have the same characteristic equation (A. Brauer). 18. If A, C.. and CI are such that CIA ... ACI ... 0, C - C1 + CI , then AB and A(B + C) have the same characteristic equation (Parker). I'. If ACA - 0, then AB and A(8 + C) have the same characteristic equation (Parw). 10. Let H be a Hermitian matrix with the characteristic roots XI ~ XI ~ ••• ~ XN, and let the characteristic roots of S'HS be /AI ~ /AI ~ • • • ~ /AN, where S is an arbitrary real nonsingular matrix. Then the number of positive, negative, and zero terms in the two lists of characteristic roots is the same. A quantitative sharpening of this result (due to Sylvester and Jacobi), often called the Law of Inertia, may be found in A. M. Ostrowski, A Quantitative Formulation of Sylvester's Law of Inertia, Proc. Nat. Acad. Sci. U.S., vol. 45, pp. 740-743, 1959. 11. Let A and B be real symmetric matrices. A necessary and sufficient condition that the pencil of matrices xA + /AB, X, /A real scalars, contains a positive definite matrix is that (x,Ax) = and (x,8x) = imply that x = 0, provided that the dimension N is greater than 2 (Calabs). II. Any pencil XA + /AB for which the foregoing holds can be transformed into diagonal form; see O. Taussky, Positive Definite Matrices, lnequalilie., Academic Preu, Inc., New York, 1967.
°
°
Bibliography §2. These determinantal criteria can also be obtained by use of a Sturm's sequence derived from the characteristic polynomial IA - }oJ/ and this technique can be used to prove the reality of the characteristic roots. See W. S. Bumside and A. W. Panton, Theory of Equationa, vol. II, p. 65, Exercise 39, etc., Longmans, Green & Co., Inc., New York, 1928.
A remarkable extension of these inequalities is contained in I. Schur, ttber endliche Gruppen und Hermitesche Formen, Math. Z., vol. 1, pp. 184-207, 1918. Some interesting results conceming positivity may also be found in I. J. Good, A Note on Positive Determinants, J. London Math. Soc., vol. 22, pp. 92-95, 1947.
§4. The question treated here is part of a group of investigations, reference to which may be found in L. L. Dines, On the Mapping of Quadratic Forms, Bull. A m. Math. Soc., vol. 47, pp. 494-498, 1941.
L. L. Dines, On Linear Combinations of Quadratic Forms, Bull. Am. Math. Soc., vol. 49, 1943. The particular result we employ is due to Finsler; cf.
P. Finsler, V'ber das Vorkommen definiter und semidefiniter Formen in Scharen quadratischen Formen, Commentarii Mathematicii Helveticii, vol. 9, pp. 188-192, 1937.
The result was derived independently by Herstein, Ilsing the argument given in the exercise at the end of the section. We follow his derivation of the conditions here. A number of other derivations appear in later literature, e.g., S. N. Afriat, The Quadratic Form Positive Definite on a Linear Manifold, Proc. Cambridge Phil. Soc., vol. 47, pp. 1-6, 1951. §6. The exercises at the end of this section, 3 to 6, illustrate an important heuristic concept which can be made precise by mealls of the theory of invariants: Whenever we obtain a positivity condition in the form of an inequality, there is a more precise result expressing distances, areas, volumes, etc., from which the inequality becomes obvious. §6. The problem of determining when (x,Ax)
~
0 for x
~
0 is treated
in
J. W. Gaddum, Linear Inequalities and Quadratic Forms, Pacific J. Math., vol. 8, pp. 411-414, 1958.
6
Functions of Matrices

1. Introduction. In this chapter, we wish to concentrate on the concept of a function of a matrix. We shall first discuss two important matrix functions, the inverse function, defined generally, and the square root, of particular interest in connection with positive definite matrices. Following this, we shall consider the most important scalar functions of a matrix, the coefficients in the characteristic polynomial. The problem of defining matrix functions of general matrices is a rather more difficult problem than it might seem at first glance, and we shall in consequence postpone any detailed discussion of various methods that have been proposed until a later chapter, Chap. 11. For the case of symmetric matrices, the existence of the diagonal canonical form removes most of the difficulty. This diagonal representation, established in Chap. 4, will also be used to obtain a parametric representation for the elements of symmetric matrices which is often useful in demonstrating various results. As an example of this we shall prove an interesting result due to Schur concerning the composition of two positive definite matrices. Finally, we shall derive an important relation between the determinant of a positive definite matrix and the associated quadratic form which will be made a cornerstone of a subsequent chapter devoted to inequalities, and obtain also an analogous result for Hermitian matrices.

2. Functions of Symmetric Matrices. As we have seen, every real symmetric matrix can be represented in the form

A = T \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_N \end{pmatrix} T'          (1)

where T is an orthogonal matrix. From this it follows that

A^k = T \begin{pmatrix} \lambda_1^k & & \\ & \ddots & \\ & & \lambda_N^k \end{pmatrix} T'          (2)

for any integer k. Hence, for any analytic function f(z) we can define f(A) to be

f(A) = T \begin{pmatrix} f(\lambda_1) & & \\ & \ddots & \\ & & f(\lambda_N) \end{pmatrix} T'          (3)

whenever the scalar functions f(\lambda_i) are well defined.

3. The Inverse Matrix. Following the path pursued above, we are led to define an inverse matrix as follows:

A^{-1} = T \begin{pmatrix} \lambda_1^{-1} & & \\ & \ddots & \\ & & \lambda_N^{-1} \end{pmatrix} T'          (1)

whenever the \lambda_i are all nonzero. It is clear that A^{-1} has the appropriate properties of an inverse, namely,

A A^{-1} = A^{-1} A = I          (2)
Introduction to Matrix Analysis The equation A B = I is equivalent to the N2 equations
92
N
L
aitbtj
= ~ij
i, j = 1, 2, . . . , N
(1)
II-I
Fixing the value of j, we obtain N linear equations for the quantities btil k := 1,2, ... ,N. The determinant of this set of equations is IAI = Ia;;/, regardless of the value of j. It follows then that if A is nonsingular, there is a unique solution of the equations in (1). Furthermore, we see that the solution is given by (2)
where
Of.ij
is the cofactor of
aji
in the expansion of the determinant of A. EXERCISES
1. Show that A -I, if it exists, is symmetric if A is symmetric in three ways: (a) Directly from (3.1) (b) Using the expression in (4.2) (c) Using the relations AA - I = A -IA ... I and the uniqueness of the inverse I. If A is singular, show that we can find a matrix B whose elements are arbitrarily small and such that A + B is nonsingular. S. Let IXI - AI =- XN + CIXN-I + + CN, the characteristic polynomial of A. Show that A-I ... _(AN-I + cIAN-2 + + CN_I)/CN, whenever CN pill O. '" If A I, A 2, B I, B 2 are nonsingular square matrices of the same order, then AI BI]-I =- [(AI - B IB 2-IA 2)-1 [ A 2 B2 (B, - A,A 2-IB 2)-1
(A 2 - B2BI-IAI)-I] (B 2 - A~I-IBI)-I
IS. Let A be an M X N matrix and consider the set of inconsistent linear equations x = Ay, where x is an N-dimensional vector and y an M-dimensional vector. To obtain an approximate solution to these equations, we determine y as the minimum ofthe quadratic function (x - Ay, x - Ay). Show that, under the above assumption of inconsistency, there is a unique solution given by y ... (A/A)-IA'x, and determine the minimum value of (x - Ay, x - Ay). 8. Let A ~ B, for two symmetric matrices A and B, denote as previously the fact that A - B is non-negative definite. Show that A ~ B implies that B-1 provided that A - I and B-1 exist. 7. If 8 is a realskew-symmetric matrix, then I + 8 is nonsingular. 8. If 8 is a real skew symmetric, then
7' =- (I - 8)(I is orthogonal (lhe Cayley transform). 9. If A is an orthogonal matrix such that A A = (I - 8)(I
~
A -I,
+ 8)-1 + I is nonsingular, then we can write
+ 8)-1
where 8 is a real skew-symmetric matrix. 10. Given a matrix A, we can find a matrix J, having ± Is along the main diagonal and ZeroeS elsewhere, such that J A + I is nonsingular. 11. Using this result, show that every orthogonal matrix A can be written in the
Functions of Matrices torrtl.
A = J(1 - 8)(1 where J is as above. 12. Show that
93
+ 8)-1
+
+
(a) AI ~ BI, AI ~ B I, implies AI AI ~ B, B2 (b) AI ~ BI, BI ~ CI , implies AI ~ C I (c) AI ~ BI, AI ~ B t , does not necessarily imply AlAI ~ BIB I, even if AlAI and BIB I are both symmetric. Thus, A ~ B does not necessarily imply that AI ~ BI. (d) AI ~ B I implies that T'AIT ~ T'BIT
18. Show that A -I can be defined by the relation (x,A-Ix) =- max [2(x,y) - (y,Ay») 1/
if A is positive definite.
14. From this, show that B-1 ~ A - I if A and B are symmetric and A ~ B > O. 115. If A is a matrix whose elements are rational integers, and if IAI = ± I, then A-I is a matrix of the same type. 18. If A is a matrix whose elements are complex integers, i.e., numbers of the form x, + iYI where XI and YI are rational integers, and if IAI = ± 1 or ±i, then A - I is a matrix of the same type. 17. What is a necessary and sufficient condition that the solutions of Ax = y be integers whenever the components of yare integers, given that thll elements of A are integers? 18. Show that IA + iBII = IAIIll + A -IBA -IBI if H = A + iB is Hermitian, and A is nonsingular. 19. Show that (z,Hz) = (x,Ax) + (y,Ay) + 2(Bx,y) if H = A + iB is Hermitian and z = X + iy. 20. Show that (x,Ax) = (x,A.x) where A. = (A + A')/2. 21. If A and B are alike except for one column, show how to deduce the elements of A -I from those of B-1. 22. If A has the form o x, 0 y, 0 XI 0 o YI 0 x, A = 0 y,
then, provided that y.
¢
0, in A - I the element blk is given by
blk
...
0
k odd
2: --,
kj2-1
= (_I)k/2-1
X2'
i-O
k even
Y2Hl
where Xo ... 1. (Clerruml) 2S. From this deduce the form of A -I.
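The spectral definitions of Secs. 2 and 3 translate directly into a short computation. In the Python sketch below, A is an arbitrary symmetric example, and f(A) is formed from definition (3) of Sec. 2 for f(z) = 1/z and f(z) = z^2; the first reproduces the inverse of Sec. 3.

    import numpy as np

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 4.0]])
    lam, T = np.linalg.eigh(A)                   # A = T diag(lam) T'

    f_inv = T @ np.diag(1.0 / lam) @ T.T         # f(z) = 1/z
    f_sq  = T @ np.diag(lam ** 2) @ T.T          # f(z) = z^2
    print(np.allclose(f_inv, np.linalg.inv(A)), np.allclose(f_sq, A @ A))   # True True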
5. Square Roots. Since a positive definite matrix represents a natural generalization of a positive number, it is interesting to inquire whether or not a positive definite matrix possesses a positive definite square root.
Proceeding as in Sec. 2, we can define A ~ by means of the relation
T'
(1)
o obtaining in this way a matrix satisfying the relation B2 = A, which is positive definite if A is positive definite. There still remains the question of uniqueness. To settle it, we can proceed as follows. Since B is symmetric, it possesses a representation
P2
0
s'
B=B
(2)
o IJN
where B is orthogonal. It follows that Band B2 = A commute. Hence, both can be reduced to diagonal form by the same orthogonal matrix T. The relation B2 = A shows that B has the form given in (1) where the positive square roots are taken. EXERCISES
1. How many symmetric square roots does an N X N positive definite matrix po_ss? 2. Can a symmetric matrix possess nonsymmetric square roots? a. If B is Hermitian and B ~ 0, and if A I - B and (A + A *) ~ 0, then A is the unique non-negative Hermitian square root of B (Putnam, On Square Root8 and Logarithm, of Operator" Purdue University, PRF-1421, 1958). 4. If sA - B, B positive definite, and A ~ (2 log 2)1, then A is the unique positive definite logarithm of B (Putnam).
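The construction of the positive definite square root just described can be carried out numerically as follows; A is an arbitrary positive definite example.

    import numpy as np

    A = np.array([[4.0, 1.0],
                  [1.0, 3.0]])
    lam, T = np.linalg.eigh(A)
    B = T @ np.diag(np.sqrt(lam)) @ T.T          # the positive definite square root
    print(np.allclose(B @ B, A), np.all(np.linalg.eigvalsh(B) > 0))   # True True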
6. Parametric Representation.
The representation
).1
A=T
T'
o
(1)
Functions of Matrices
95
furnishes us a parametric representation for the elements of a symmetrio matrix in terms of the elements tij of an orthogonal matrix T, namely, N
Q,;j
L
=
(2)
X"ti"t j "
k-I
which can be used as we shall see in a moment to derive certain properties of the aij. EXERCISE 1. Obtain a parametric representation in terms of cos 6 and sin 6 for the elements of a general 2 X 2 symmetric matrix.
7. A Result of 1. Schur. prove Theorem 1. Theorem 1. If A
Using this representation, we can readily
= (a,), B = (b ij ) are positive definite, then
C =
(a,ibii)
is positive definite. Proof. We have N
l'
N
Q,;JQijXiXj
N
1 (1 ~kt'ktjk) 1 ~k 1
=
bijx,Xj
i,i - I
,,j - I
k= 1
N
=
N
(
k-I
(1)
b,jXitikXjtj,,)
/,i-I N
As a quadratic form in the variables Xit ik the expression
L
i,i =
positive unless the quantities
1
Xi 2tik
2
ilk
Xitik
=
are all zero.
1
Xi2l tik
i
k
2
=
bijXitikXjtjk
is
1
Since
1
X,2
(2)
i
we see that this cannot be true for all i, unless all the Xi are zero. This establishes the required positive definite character. EXERCISE 1. Use Exercise 2 of the miscellaneous exercises of Chap. 5 to obtain an inductive proof of this result, starting with the easily established result for N ... 2.
8. The Fundamental Scalar Functions. Let us now consider some properties of the scalar functions of A determined by the characteristic polynomial,
IXl -
AI = XN - 4>1(A)XN-I
+ 4>2(A)XN-2 + ... + (- I)N4>N(A)
(1)
Introduction to Matrix Analysis
96
In this section the matrices which occur are general square matrices, not necessarily symmetric. It follows from the relations between the coefficients and the roots of a polynomial equation that I/>,(A)
=
1/>2(A) =
).1
+ ).2 + ... + ).N
L
).').j
, ..j
(2) On the other hand, writing out the term in ).N-I of the characteristic polynomial as it arises from the expansion of the determinant IA - )./1, we see that (3) I/>,(A) = all + an + ... + aNN while setting). = 0 yields (4) The linear function I/>,(A) is of paramount importance in matrix theory. It is called the trace of A, and usually written tr (A). It follows from (3) that tr (A
+ B)
= tr (A)
+ tr (B)
(5)
and a little calculation shows that tr (AB) = tr (BA)
(6)
for any two matrices A and B. These relations hold despite the fact that there is no simple relation connecting the characteristic roots of A, B and A + B or AB. This last result is a special case of Theorem 2. Theorem 2. For any two matrices A and B, we have k
= 1,2,
. . . ,N
(7)
The proof will depend upon a fact that we already know, namely, that the result is valid for k = N, that is, IABI = IBAI for any two matrices. If A is nonsingular, we have the relations
1).[ -
ABI
=
IA().A-' - B)I
=
I().A-' - B)AI
= 1)./ -
BAI (8)
which yield the desired result. If A is singular, we can obtain (8) by way of a limiting relation, starting with A + El, where E is small and A + El is nonsingular.
In the exercises immediately below, we indicate another approach which does not use any nonalgebraic results. EXERCISES 1. Show that II + tAl - 1 + t tr (A) + . 2. Show that t/>~(TAT-l) ... t/>~(A); t/>~(TAT') t/>~(A) if Tisorthogonal. S. Show that any polynomial p(all,aU, ••• ,aNN) in the elements of A which has the property that p(AB) ... p(BA) for all A and B must be a polynomial in the functions t/>~(A). 4. Show that tr «AB)~) ... tr «BA)~)fork = 1,2, •.. ,andhencethatt/>~(AB) ... t/>~(BA) for k ... 1, 2, . . . , N. I. Show that tr (AX) = 0 for all X implies that A ... O. 8. Exercise 2 and Theorem 2 above are both special cases of the result that for any m matrices AI, AI, ' .. , Am, of order N, the quantity t/>~(AIAI • '.' Am) remains unchanged after a cyclic permutation of the factors AI, AI, . . . , Am.
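The identities of this section lend themselves to a quick numerical check; the matrices A and B in the Python sketch are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((4, 4))
    B = rng.standard_normal((4, 4))

    print(np.isclose(np.trace(A + B), np.trace(A) + np.trace(B)))
    print(np.isclose(np.trace(A @ B), np.trace(B @ A)))
    # Theorem 2: AB and BA have the same characteristic polynomial
    print(np.allclose(np.poly(A @ B), np.poly(B @ A)))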
f_..
9. The Infinite Integral e-(""A",) dx. integrals in analysis is the following:
...f"- .
One of the most important
e-(s,As)
dx
(1)
where dx = dXI dX2 . . . dXN. Although the indefinite integral cannot be evaluated in finite terms, it turns out that the definite integral has a quite simple value. Theorem 3. If A is positive definite, we have
We shall give two proofs. First Proof, Let T be an orthogonal matrix reducing A to diagonal form, and make the change of variable x = Ty. Then (x,Ax) = (y,Ay)
Furthermore,
N
N
i= 1
i-I
n dXi = n dYi, since the Jacobian of the trs.nsformation
Ty is ITI, which can be taken to be one transformation, it follows that
X =
(3)
+ 1.
Since x
= Ty
is a one-to-
(4)
Introduction to Matrix Analysis
98
N
From the fact that
n
IA I, and the known evaluation
Xi ==
i-I
f-... e-
s'
we obtain the result stated in (2). Let US now give a second proof. that we may write
dx = lI"~S
'
In Sec. 3 of Chap. 5, it was shown
Do == 1
(x,Ax)
(5)
where N
y" == x"
+
l
k = 1,2, •.. , N
b"iXJ
(6)
i-Ic+l
provided that A is positive definite. Let us then make a change of variable from the x" to the y". The Jacobian is again equal to 1, and the transformation is one-to-one. It follows that
IN
= (/-". e-DIIII'dYI) (/-....
e- DIII .·/D, dY1)
(/-.... e-DNWN'/DN-I dYN) =
~:~ = ~I/:S
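Theorem 3 can also be confirmed by crude numerical quadrature for N = 2; the positive definite matrix A below is an arbitrary illustration, and the grid and cutoff are rough but adequate.

    import numpy as np

    A = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
    t = np.linspace(-8, 8, 801)
    h = t[1] - t[0]
    X, Y = np.meshgrid(t, t)
    Q = A[0, 0] * X**2 + 2 * A[0, 1] * X * Y + A[1, 1] * Y**2
    numeric = np.exp(-Q).sum() * h * h
    exact = np.pi / np.sqrt(np.linalg.det(A))    # pi^(N/2) / |A|^(1/2) with N = 2
    print(numeric, exact)                        # agree to several digits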
EXERCISES 1. Show that if A is positive definite and B symmetric, we have
f . ... f" -..
e-(s,A.. )-i(.. ,B..)
dz ...
-..
.".N12
IA
+ iBll~
where the principal value of IA + iBI~i is taken, in the following steps: (a) Set z ... Til, where T'AT - A, a diagonal matrix (b) Make the further change of variable, Ilk ... Zk/'Ak~~ (c) The integrand is now
Reduce C to diagonal form by means of an orthogonal transformation z ... 8,0. (d) Evaluate the remaining integral. 2. Evaluate integrals of the form
m, n positive integers or zero, in the following fashion.
100 .
...
.".
(allall -
alll)~i
Write
(7)
Functions of Matrices
99
and thul!l
How does one treat the general case where either m or n may be odd? a. If A and B are positive definite, evaluate
J_.. J"- .
IN =
4. Evaluate IN =
e-(··AB.)
dx
e-(S, A')+Il(.,v)
10. An Analogue for Hermitian Matrices. us consider the integral J(H) =
J_..
e-(i,H.)
dz
By analogy with (9.1), let (1)
dx dy
where dx = dXI dX2 ••• dXN, dy = dYI dY2 ••• dYN and H is a positive definite Hermitian matrix. Write H = A + iB, where A is real and positive definite and B is real and skew-symmetric. Then J(H) =
1_.. .
e-(·,As)-2(Bs,l/l-(I/,AI/)
dx dy
(2)
Since the integral is absolutely convergent, it may be evaluated by integration first with respect to x and then with respect to y. Using the relation (x,Ax)
+ 2(Bx,y)
= (x,Ax) - 2(x,By) = (A(x - A-'By),x - A-'By) - (By,A-'By)
(3)
we see that (4)
Hence, J(ll) = .".N/2 IAI~
f"_..
e-[(I/,AI/)+
.".N/2 =
dy
.".N12
IAlls 'IA + BA-'Blls .".N
- IAIII + A-IBA 'BIl' 11. Relation between J(H) and IHI, It remains to express terms of IHI. We have IHI IH'I
= =
IA + iBI IA - iBI
= =
IAIII + iA-'BI IAIII - iA-'BI
(5)
J(H) in
(1)
100 Since
Introduction to Matrix Analysis
IHI
IH'I, we have IHl2 = IA1 21(I + iA-IBHI = IAI 21l
- iA-IB)1
(2)
+ A-IBA-IBI
Combining the foregoing results, we see that J(H) =
r
N
jlHI
(3)
MISCELLANEOUS EXERCISES 1. Show that the characteristic roots of p(A), where and thUI!l that Ip(A)1 ...
n
p(~)
is a polynomial, are
p(~d,
p(}.;).
i
2. Under what conditions does the corresponding result hold for rational functions? a. Determine the inverses of the following matrices:
c
-I:
Q ...
[
-~I XI XI
"'
-XI -XI
X.
XI
-X.
Xa -X.
XI -XI
XI XI XI
"J
'- Let Ao - 1, and the sequence of scalarl!l and matrices be determined by Ck ... tr (AAk_I)/Ie, Ie - 1,2, ,Ak = AAk_1 - ckl. Then A-I - An_dc.. (Frame). I. Show that IA + tBI IAI(l + t tr (A-IB) + ...). 8. Using the representation IA
+ eB + ~ll
+
+ ~tf>N-I(A + eB) + + e tr «A + ~l)-IB) +
= tf>N(A eB) ... IA ~/1(1
+
.)
we obtain the relation N-I(A
+ eB)
- tf>N-I(A) - eN(A) tr (A-IB)
+
and corresponding results for each tf>k(A). T. Show that a necessary and sufficient condition that A be positive definite is that tr (AB) > 0 for all positive definite B. 8. Show that
•
~
II + ~AI
(- ~·-'>.ktr (Ak)
... ell - 1
and hence, obtain relations connecting tf>k(A) and tr (A k). 9. Show that AB + Band BA + B have the same determinant by showing that tr «AB + B)") - tr «BA + B)") for n ... 1,2, . . . . 10. Let Bo - 1, B, - AB,_I + k.l, where k • ... tr (AB;_Mi; then
(LelJerrier)
Functions of Matrices
101
11. Let A, B, ... , be commutative matrices, and let /(ZI,ZI, ••• ) be any rational function. The characteristic roots ai, ai, . . . , aft of A, bl, bl, . . . ,bft of B, . . . , can be ordered in such a way that the characteristic roots of /(A,B, . . . ) are !(a"bi, •.•), !(al,bl, .•.), and so on. 12. Show ihat every nonsingular matrix may be represented as a product of two not necessarily real symmetric matrices in an infinite number of ways (V088). 18. If H is non-negative definite Hermitian, there exists a triangular matrix T such that H ... TT· (Toeplilz). 14. For every A there is a triangular matrix such that TA is unitary (E. Schmidl). 115. Let P, Q, R, X be matrices of the second order. Then every characteristic root of a solution X of PXI + QX + R ... 0 is a root of IPh l + Qh + RI ... 0 (Sylvesler). 18. If A, B, C are positive definite, then the roots of IAX 2 + Bx + CI = have negative real parts. 17. Consider matrices A and B such that AB = rIBA, where rl is a qth root of unity. Show that the characteristic roots of A and B, h' and lAi, may be arranged so that those of A + Bare (h,« + lA,q)I/Q, and those of AB are r(q-l)ll hilAi • Different branches of the qth root may have to be chosen for different values of i (paller). 18. Using the representation
°
,....,..._"".,...N_'.I....., ... =
IA
+ eBlli
J"
e-(o,Ao)-'(o.Bo)
-00
n,
dZ i
and expanding hath sides in powers of e, obtain representations for the coefficients in the expansion of IA + eB) in powers of e, and thus of the fundamental functions ",~(A).
19. Let A and B be Hermitian matrices with the property that CIA + clB has the characteristic roots C,h, + CIIA' for all scalars c, and CI, where h" lA' are, respectively, the characteristic roots of A and B. Then AB "" BA (Molzkin). 20. Show that A is nonsingular if laill
>
lla'kl, i = 1, 2, . . . , N. k"'"
21. The characteristic vectors of A are characteristic vectors of p(A), for any polynomial p, but not necessarily conversely. 22. If A and B are symmetric matrices such that II -
Mill -
lAB I
""
II - hA - IABI
for all h and lA, then AB "" 0 (Craig-Holelling). 2S. Let AB = 0, and let p(A,B) be a polynomial having no constant term. Then I>.I - p(A,B)1 "" X-NlhI - p(A,O)II>.I - p(B,O)1 (H. Schneider). 24. Generalize this result to p(A1,A I , • • • ,An), where AiA i = 0, i < j (H. Schncider) .
21. Let A and B be positive definite.
28. Show t.hat
J-.. .
e-(o.Aol/H(v. o )
By writing IABI
n
= IAl~IIBIIAl~l,
dXi = (2,,")NIIIA l-l~e-(v.A-I~l/1
show that
IT. Let Y be a positive definite 2 X 2 matrix and X a 2 X 2 symmetric matriL Consider the integral J(Y) -
(
}x>o
e-" (XYlIXI'-% dx ll
dz ll dz ll
where the region of integration is determined by the inequalities SII > 0, SIISII > O. In other words, this is the region where X is positive definite. Then
-
Itl,'
for Re (8) >~ (lngha.m-8iegel). (See the discussion of Sec. 9 for further information concerning results of this nature.) 28. Let A be a complex matrix of order N, A ... (a..) - (b.. + ic..). Let B be the real matrix of order 2N obtained by replacing b.. + ic,. by i~ matrix equivalent
b +.~C" [b..e" "
I""to.I
_
cn ]
b"
Show that A -I can be obtained from B-1 by reversing this process after B-1 has been obtained. 29. If A is symmetric and IAI ... 0, then the bordered determinant all
alN alN
all
XI XI
B(s) ... aNI XI
'" ...
aNN XN
XN
0
multiplied by the leading second minor of A is the square of a linear function of the Xi (Burnside-Panton).
80. Consequently, if A is symmetric and the leading first minor vanishes, the determinant and its leading second minor have opposite signs. al. Let X and A be 2 X 2 matrices. Find all solutions of the equation XI - A which have the form X ... clI + ctA, where CI and CI are scalars. a2. Similarly, treat the case where X and A are 3 X 3 matrices, with X ... cil + ctA + c,Al. aa. Let f(t) be a polynomial of degree less than or equal to N - 1, and let ~b XI, ••• , XN be the N distinct characteristic roots of A. Then
(Sylve8ter interpolation formula)
What is the right-hand side equal to when f(X) is a polynomial of degree greater than N? U. Is the formula valid for f(A) - A-I? all. Show that
Functions of Matrices
103
8e. If the clements aij of the N X N matrix A are real and satisfy the conditions a..
>
Llai;l, i ... 1, 2, . . . , N, then the absolute term in the Laurent expansion of
J'" f(81,82, ••• ,8N)
on 1811 = 1821 Alternatively, =0
N
N
;=1
k=1
n (L al"z~/ Z;)-I
=
•
where the contour of integration is
Iz;1
=
1-
This result remains valid if A is complex and laii!
>
L
!aiil. A number of further
i'"
results are given in P. Whittle. 1 The original result was first given by Jacobi and applications of it may be found in H. Weyl2 and in A. R. Forsyth. s
87. Show that if A and B are non-negative definite, then ~ ai;b i ; 2 0, and thus 'oJ
establish Schur's theorem, given in Sec. 7 (Fejer). 8S. Let (Ad be a set of N X N symmetric matrices, and introduce the inner product (Ai,A;) = tr (AiA;) Given a set of M = N(N + 1)/2 linearly independent symmetric matriccs lAd, construct an orthonormal linearly independent set (Yi I using the foregoing inner product and show that every N X N symmetric matrix X can be written in the form
L M
X
=0
CiYi, where Ci
= (X, Vi).
i-I
89. For the case of 2 X 2 matrices, take
What is the corresponding orthonormal set? Similarly, in the general case, take for the Ai the matrices obtained in the analogous fashion, and construct the Yi. U. The Cayley-Hamilton theorem asscrts that A satisfies a scalar polynomial of degree N. Call the minimum polynomial associated with A the scalar polynomial of minimum degree, q(>.), with thc propcrty that q(A) = O. Show that q(>.) divides
'0.
IA -
>.11.
I P. Whittle, Some Combinatorial Results for Matrix Powers, Quart. J. Math., vol. 7, pp. 316-320, 1956. 2H. Weyl, The Classical Groups . .. ,Princeton University Press, Princeton, N.J., 1946. sA. R. Forsyth, Lectures Introductory to the Theory of FlIncticms of Two Complex Variables, Cambridge University Press, New York, 1914.
U. Find the Jacobian of the transformation Y ... X-t.• U. Evaluate the matrix function
..l, { l(z)(zl 2,..t }c
- A)-I dz
under various assumptions concerning A and the contour C. 1 The first use of a contour integral -2. (>.1 - T)-Ij(>.) d>. to represent I(T) is In
r
}c
found in Poincare (H. Poincar~, Sur les groupes continus, Trans. Cambridge Phil. Soc., vol. 18, pp. 220-225, 1899, and Oeuvres, vol. 3, pp. 173-212). 44. If Ax - b has a solution for all b, then A -I exists. U. For any two positive constants kl and k 2, there exist matrices A and B such that every element of AB - 1 is less than kl in absolute value and such that there exist elements of BA - 1 greater than kl (Houaeholder). '8. If A I is nonsingular, then
~:
IA' As
I..
IAIIIA. - AsAI-tAII
,.,. If BC ... CB, and
1 -B C A-
0
1 -B C
0
0
1 -B
0
1
0
then
... o 1
'"
0
] ...
Determine recurrence relations for 'the Di. '8. If A is an N X M matrix of rank r, r < H, C an M X N matrix such that ACA =- kA, where k is a scalar, Ban M X H matrix, then the characteristic equation of AB is >.N-'t/>(>') - 0 and that of A (B + C) is >.N-'t/>(>' - k) =- 0 (Parker). '9. If Y =- (AX + B)(CX + D)-I, express X in terms of Y. 150. We have previously encountered the correspondence
a+bi"'l
bl
-ba a
Prove that a necessary and sufficient condition for I(a
+ bi) '" 1 I -ba
b a
I
where I(z) is a power series in z, is that 1(1) = I(z). For a generalization, see D. W. Robinson, Mathematics Magazine, vol. 32, pp. 213-215, 1959. • I. Olkin, Note on the Jacohians of Certain Matrix Transformations Useful in Multivariate Analysis, Biometrika, vol. 40, pp. 43~, 1953.
Functions of Matrices
.
111. If F(s) =
l
c,.zft with
Cn
~
105
0, then F(A) is positive definite whenever A is
n-O
positive definite.
112. If F(A) is positive definite for every A of the form (IIi-I), then F(s) with Cn ~ 0 (W. Rudin). An earlier result is due to Schoenberg, Duke Math. J., 1942. 68. A matrix A = (a,/) is called a Jacobi matrix if ali = 0, for Ii - ;1 ~ 2. If A is a symmetric Jacobi matrix, to what extent are the elements atl determined, if the characteristic roots are given? This is a finite version of a problem of great importance in mathematical physics, namely that of determining the coefficient function ..p(x), given the spectrum of u" + ht/>(X)U = 0, subject to various boundary conditions. I 64. If A has the form
A
=
a1 1 1 az [ o 1
0 1 as
]
. o. . .,. 1
0
...
a,
how does one determine the given the characteristic roots? 1111. Show that for any A, there is a generalized inverse X such that AXA = A, XAX = X, XA = (XA)', AX = (AX)' (Penrose l ). 118. X is called a generalized inverse for A if and only ifAXA = A. Show that X is a generalized inverse if and only if there exist matrices V, V, and W such that
where
and IRis the identity matrix of dimension n. 117. X is called a reflexive generalizcd inverse for A if and only ifAXA XAX = X. Show that
X=Q[I,V
U]p
VU
=
A and
(C. A. Rhode)
118. Show that if Ax = y is cOllsistent, the general solution is given by x
= Aly + (l
- A'A)z
where Al is the Penrose-Moore inverse and z is arbitrary (penrose). tational aspects see Urquehart. '
For compu-
I G. Borg, Eine Umkehrung der Sturm-Liouvilleschen Eigenwerte Aufgabe, Acta Math., vol. 78, pp. 1-96, 1946. I. M. Gelfand and B. M. Leviton, On the Determination of a Differential Equation from Its Spectral Function, Trans. Am. Math. &c., vol. 2, pp. 253-304, 1955. I See the reference in the Bibliography and Discussion, Sec. 3. • N. S. Urquehart, Computation of Generalized Inverse Matrices which Satisfy Specified Conditions, SIAM Review, vol. 10, pp. 216-218, 1968.
See also Langenhop,1 Radhakrishna Rao,1 and Schwerdtfeger. 3 159. If A-I exists and if more than N of the elements of A are positive with the others non-negative, then A -I has at least one negative element. A has the form PD with P a permutation matrix and D a diagonal matrix with positive elements if and only if A and A -I hoth have non-negative entries (Spira). 80. Consider the matrix A + eB where A is positive definite, B is real and symmetric and e is small. Show that we can write (A + eB)-1 = A -~~(l + eS)-IA -~, where S is symmetric, and thus obtain a symmetric perturbation formula, (A
+ eB)-1
= A-I -
+ .... = ao + als + .. '.
eA-~SA-~
81. Let /(s) be analytic for lsi < 1, /(s) Write /(s).... ao. + aus + aatz l + ... , k = 1, 2, . . •. Can one evaluate lalil, i, i ... 0, 1, ... , N, in simple terms? 82. Write /(s) = bls + bls l + , and p·)(s) = buz + buzl + ... , where p.) is the kth iterate, that is, Pi) = /(j), , p.) = /(j(.-I», .. " Can one evaluate Iblll, i, i - 0, 1, . . . , N, in simple terms?
Bibliography and Discussion §1. There is an extensive discussion of functions of matrices in the book by MacDuffee, C. C. MacDuffee, The Theory of Matrices, Ergebnisse der Mathematik, Reprint, Chelsea Publishing Co., New York, 1946. More recent papers are R. F. Rinehart, The Derivative of a Matrix Function, Proc. Am. Math. Soc., vol. 7, pp. 2-5, 1956. H. Richter, tiber Matrixfunktionen, Math. Ann., vol. 122, pp. 16-34, 1950. S. N. Afriat, Analytic Functions of Finite Dimensional Linear Transformations, Proc. Cambridge Phil. Soc., vol. 55, pp. 51-56,1959. and an excellent survey of the subject is given in R. F. Rinehart, The Equivalence of Definitions of a Matric Function, Am. Math. Monthly, vol. 62, pp. 395-414, 1955. I C. E. Langenhop, On Generalized Inverses of Matrices, SlAM J. Appl. Math., vol. 15, pp. 1239-1246, 1967. I C. Radhakrishna Rao, Calculus of Generalized Inverses of Matrices, I: General Theory, Sankhya, ser. A, vol. 29, pp. 317-342, 1967. I H. Schwerdtfeger, Remarks 011 the Generalized Inverse of a Matrix, Lin. Algebra Appl., vol. 1, pp. 325-328, 1968.
Functions of Matrices
107
See also P. S. Dwyer and M. S. Macphail, Symbolic Matrix Derivatives, Ann. Math. Stat., vol. 19, pp. 517-534, 1948. The subject is part of the broader field of functions of hypercomplex quantities. For the case of quaternions, which is to say matrices having the special form
there is an extensive theory quite analogous to the usual theory of functions of a complex variable (which may, after all, be considered to be the theory of matrices of the special form
with x and y real) ; see
R. Fueter, Functions of a Hyper Complex Variable, University of Zurich, 1948-1949, reprinted by Argonne National Laboratory, 1959. R. Fueter, Commentarii Math. Helveticii, vol. 20, pp. 419-20, where many other references can be found. R. F. Rinehart, Elements of a Theory of Intrinsic Functions on Algebras, Duke Math. J., vol. 27, pp. 1-20, 1960. Finally, let us mention a paper by W. E. Roth, W. E. Roth, A Solution of the Matrix Equation P(X) Am. Math. Soc., vol. 30, pp. 597-599, 1928.
=
A, Trans.
which contains a thorough discussion of the early results of Sylvester, Cayley, and Frobenius on the problem of solving polynomial equations of this form. Further references to works dealing with functions of matrices will be given at the end of Chap. 10. §3. The concept of a generalized inverse for singular matrices was discussed first by E. H. Moore, General Analysis, Part I, Mem. Am. Phil. Soc., vol. 1, p. 197, 1935. and then independently discovered by R. Penrose, A Generalized Inverse for l\1atrices, Proc. Cambridge Phil. Soc., vol. 51, pp. 406-413, 1955.
Introduction to Matrix Analysis
108
These and other matters are discussed in M. R. Hestenes, Inversion of Matrices by Biorthogonalization, J. Soc. [ndu8t. Appl. Math., vol. 6, pp. 51-90, 1958. An exposition of the computational aspects of matrix inversion which contains a detailed account of many of the associated analytic questions may be found in
J. von Neumann and H. Goldstine, Numerical Inverting of Matrices of High Order, Bull. Am. Math. Soc., vol. 53, pp. 1021-1099, 1947.
§4. An excellent account of various methods of matrix inversion, together with many references, is given in D. Greenspan, Methods of Matrix Inversion, Am. Math. Monthly, vol. 62, pp. 303-319, 1955. The problem of deducing feasible, as distinguished from theoretical, methods of matrix inversion is one of the fundamental problems of numerical analysis. It will be extensively discussed in a succeeding volume by G. Forsythe. The problem of determining when a matrix is not singular is one of great importance, and usually one that cannot be answered by any direct calculation. It is consequently quite convenient to have various simple tests that can be applied readily to guarantee the nonvanishing of the determinant of a matrix. Perhaps the most useful is the following (Exercise 20 of Miscellaneous Exercises) : If A is a real matrix and laul > llai/l i = 1, 2, . . . , N, then
IAI
~ O.
I""
For the history of this result and a very elegant discussion of extensions, see O. Taussky, A Recurring Theorem on Determinants, Am. Math. Monthly, vol. 56, pp. 672-676, 1949. This topic is intimately related to the problem of determining simple estimates for the location of the characteristic values of A in terms of the elements of A, a topic we shall encounter in part in Chap. 16. Generally, any extensive discussion will be postponed until a later volume. A useful bibliography of works in this field is O. Taussky, Bibliography on Bound8 for Characteri8tic Root8 of Finite Matrice8, National Bureau of Standards Report, September, 1951. It should be noted, however, that the foregoing result immediately yields the useful result of Gersgorin that the characteristic roots of A must
Functions of Matrices lie inside the circles of center aH and radius
L IQ.;JI, for i
'pl.
109 = 1,2,
,N.
The original result is contained in S. Gersgorin, tiber die Abgrenzung der Eigenwerte einer Matrix, lzv. Akad. Nauk SSSR, vol. 7, pp. 749-754, 1931. §6. As mentioned previously, a parametric representation of different type can be obtained for 3 X 3 matrices in terms of elliptic functions; see F. Caspary, Zur Theorie der Thetafunktionen mit zwei Argumenten, Kronecker J., pp. 74-86, vol. 94.
F. Caspary, Sur les sys~mes orthogonaux form6s par les fonctions tMta, Comptes Rendus de Paris, pp. 490-493, vol. 104. as well as a number of other papers by the same author over the same period. §7. See I. Schur, Bemerkungen zur Theorie der beschrli.nkten Bilinearformen mit unendlich vielen Verli.nderlichen, J. Math., vol. 140, pp. 1-28, 1911. See also L. Fejer, tiber die Eindeutigkeit der L
j
§9. This infinite integral plays a fundamental role in many areas of analysis. We shall base our discussion in the chapter on inequalities upon it, and an extension. An extensive generalization of this representation may be obtained from integrals first introduced by Ingham and Siegel. See, for the original results,
C. L. Siegel, tlber die analytische Theorie der quadratischen Formen, Ann. Math., vol. 36, pp. 527-606, 1935. For extensions and related results, see R. Bellman, A Generalization of Some Integral Identities Due to Ingham and Siegel, Duke Math. J., vol. 24, pp. 571-578, 1956.
C. S. Herz, Bessel Functions of Matrix Argument, Ann. Math., vol. 61, pp. 474-523, 1955. I. Olkin, A Class of Integral Identities with Matrix Argument, Duke Math. J., vol. 26, pp. 207-213, 1959.
S. Bochner, "Group Invariance of Cauchy's Formula in Several Variables," Ann. Math., vol. 45, pp. 686-707, 1944. R. Bellman, Generalized Eisenstein Series and Nonanalytic Automorphic Functions, Proc. Natl. Acad. Sci. U.S., vol. 36, pp. 356-359, 1950. H. Maass, Zur Theorie der Kugelfunktionen einer Matrixvariablen, Math. Z., vol. 135, pp. 391-416, 1958. H. S. A. Potter, The Volume of a Certain Matrix Domain, Duke Math. J., vol. 18, PP. 391-397, 1951. Integrals of this type, analogues and extensions, arise naturally in the field of statistics in the domain of multivariate analysis. See, for example, T. W. Anderson and M. A. Girshick, Some Extensions of the Wishart Distribution, Ann. Math. Stat., vol. 15, pp. 345-357, 1944. G. Rasch, A Functional Equation for Wishart's Distribution, Ann. Math. Stat., vol. 19, pp. 262-266, 1948. R. Sitgreaves, On the Distribution of Two Random Matrices Used in Classification Procedures, A nn. Math. Stat., vol. 23, pp. 263-270, 1952. In connection with the concepts of functions of matrices, it is appropriate to mention the researches of Loewner, which have remarkable connections with various parts of mathematical physics and applied mathematics. As we have noted in various exercises, A ~ B, in the sense that A, B are symmetric and A - B is non-negative definite, does not necessarily imply that A 2 ~ B2, although it is true that B-1 2: A-I if A 2: B > O. The general question was first discussed in C. Loewner, Math. Z., vol. 38, pp. 177-216, 1934.
Functions of Matrices
111
See also R. Dobsch, Math. Z., vol. 43, pp. 353-388, 1937. In his paper Loewner studies the problem of determining functionsf(x) for which A ;;:: B implies that f(A) ~ f(B). This leads to a class of functions called positive real, possessing the property that Re [f(z)) ~ 0 whenever Re (z) ;;:: O. Not only are these functions of great importance in modern network theory, cf.
L. Weinberg and P. Slepian, Positive Real Matrices, Hughes Research Laboratories, Culver City, Calif., 1958. F. H. Effertz, On the Synthesis of Networks Containing Two Kinds of Elements, Proc. Symposium on Modern Network Synthesis, Polytechnic Institute of Brooklyn, 1955, where an excellent survey of the field is given, and R. J. Duffin, Elementary Operations Which Generate Network Matrices, Proc. Am. Math. Soc., vol. 6, pp. 335-339, 1955; but they also play a paramount role in certain parts of modern physics; see the expository papers A. M. Lane and R. G. Thomas, R-matrix Theory of Nuclear Reactions, Revs. Mod. Physics, vol. 30, pp. 257-352, 1958.
E. Wigner, On a Class of Analytic Functions from the Quantum Theory of Collisions, Ann. Math., vol. 53, pp. 36-67, 1951. Further discussion may be found in a more recent paper, C. Loewner, Some Classes of Functions Defined by Difference or Differential Inequalities, Bull. Am. Math. Soc., vol. 56, pp. 308-319, 1950.
and in J. Bendat and S. Sherman, Monotone and Convex Operator Functions, Trans. Am. Math. Soc., vol. 79, pp. 58-71, 1955.
The concept of convex matrix functions is also discussed here, treated first by F. Kraus, Math. Z., vol. 41, pp. 18-42, 1936. §10. This result is contained in R. Bellman, Representation Theorems and Inequalities for Hermitian Matrices, Duke Math. J., 1959.
7 Variational Description of Characteristic Roots 1. Introduction. In earlier chapters, we saw that both the largest and smallest characteristic roots had a quite simple geometric significance which led to variational problems for their determination. As we shall see, this same geometric setting leads to corresponding variational problems for the other characteristic values. However, this variational formulation is far surpassed in usefulness and elegance by another approach due to R. Courant and E. Fischer. Both depend upon the canonical form derived in Chap. 4. We shall then obtain some interesting consequences of this second representation, which is, in many ways, the link between the elementary and advanced analytic theory of symmetric matrices. 9. The Rayleigh Quotient. As we know, the set of values assumed by the quadratic form (x,Ax) on the sphere (x,x) = 1 is precisely the same set taken by the quadratic form (y,Ay) = ~IY12 + ~2Y22 + ... + ~NYN2 on (y ,y) = I, A = T' AT, y = Tx, with T orthogonal. Let us henceforth suppose that ~I ;;:: ~2;;::
•••
;;:: ~N
(1)
Since, with this convention, (y,Ay) ;;:: ~N(YI2 (y,Ay) ~ ~1(YI2
+ Yi 2 + ... + YN 2) + y22 + .. + YN 2)
(2)
we readily obtain the representations ~I = max (y,Ay) = max (x,Ax) u
~
"N
=
•
(y,y) (y,Ay)
(x,x) • (x,Ax)
%
min - u (y,y)
=
(3)
mm - (x,x) %
The quotient (x)
=
(x,Ax)
q (x,x) is often called the Rayleigh quotient. 112
(4)
Variational Description of Characteristic Roots
113
From the relation in (3), we see that for all x we have X
>
I -
(x,Ax) > X (x,x) - N
(5)
Relations of this type are of great importance in applications where one is interested in obtaining quick estimates of the characteristic values. Remarkably, rudimentary choices of the Xi often yield quite accurate approximations for Xl and XN. Much of the success of theoretical physics depends upon this phenomenon. The reason for this interest in the characteristic roots, and not so much the characteristic vectors, lies in the fact that in many physical applications the Xi represent characteristic frequencies. Generally speaking, in applications the Xi are more often the observables which allow a comparison of theory with experiment. Matters of this type will be discussed in some slight detail in Sec. 12 below. 3. Variational Description of Characteristic Roots. Let us begin our investigation of variational properties by proving Theorem 1. Theorem 1. Let Xi be a set of N characteristic vectors associated with the Xi. Then for k = 1 2 N, j
j
j
Xk
•••
=
j
max (x,Ax)/(x,x)
(1)
R~
where R k is the region of X space determined by the orthogonality relations (x ,Xi) = 0
i = 1, 2, ...
j
Ie - 1, x ~ 0
(2)
Geometrically, the result is clear. To determine, for example, the second largest semiaxis of an ellipsoid, we determine the maximum distance from the origin in a plane perpendicular to the largest semiaxis. To demonstrate the result analytically, write (3)
Then N
(x,Ax)
l l
XkUk
l
(4)
k-I N
(x,x) =
Uk
2
k=l
The result for XI is precisely that given in Sec. 2. Consider the eJl:pression for X2. The condition (x,x l ) = 0 is the same as the assertion that
Introduction to Matrix Analysis
114
N
UI
= O. It is clear then that the maximum of
L
XtU,1
subject to
11-2
N
Lu,t
= 1 is equal to XI.
"'-2In exactly the same fashion,
we see that X", has the stated value for the remaining values of k. 4. Discussion. It is known from physical considerations that stiffening of a bar or plate results in an increase in all the naturaIfrequencies. The analytic equivalent of this fact is the statement that the characteristic roots of A + B are uniformly larger than those of A if B is positive definite. To demonstrate the validity of this statement for Xl or XN is easy. We have, using an obvious notation, , (A
"1
+ B)
_
max
[x,(A
+
B)x] x,x = ml\x [(x,AX) (x,BX)] '" (X,X) (X,X) (x,Ax) (x,Bx) max-- mIn-> '" (x,x) '" (x,x) ~ X1 (A) XN(B) -
()
'"
+
+ .
+
(1)
Since by assumption XN(B) > 0, we see that Xl(A + B) > Xl(A) if B is positive definite. The proof that XN(A + B) > XN(A) proceeds similarly. If, however, we attempt to carry through a proof of this type for the roots XI, Xa, • • • ,XN-I, we are baffled by the fact that the variational description of Theorem 1 is inductive. The formula for X2 depends upon the characteristic vector xl, and so on. Furthermore, as noted above, in general, in numerous investigations of engineering and physical origin, we are primarily concerned with the characteristic values which represent resonant frequencies. The characteristic vectors are of only secondary interest. We wish then to derive a representation of Xk which is independent of the other characteristic value's and their associated characteristic vectors. EXERCISES 1. Show that XN(A + B) ~ XN(A) + XN(B) if A and B are positive definite. 2. Suppose that B is merely positive indefinite. Must XN(A + B) actually he greater than X!f(A)?
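The monotonicity discussed in this section is easy to observe numerically; in the Python sketch below A is an arbitrary symmetric matrix and B an arbitrary non-negative definite one.

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((6, 6)); A = (A + A.T) / 2
    Y = rng.standard_normal((6, 3)); B = Y @ Y.T             # non-negative definite
    # every characteristic root of A + B is at least the corresponding root of A
    print(np.all(np.linalg.eigvalsh(A + B) >= np.linalg.eigvalsh(A) - 1e-12))   # True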
5. Geometric Preliminary. In order to see how to do this, let us take an ellipsoid of the form x²/a² + y²/b² + z²/c² = 1, and pass a plane through the origin of coordinates. The cross section will be an ellipse which will have a major semiaxis and a minor semiaxis. If we now rotate this cross-sectional plane until the major semiaxis assumes its smallest value, we will have determined the semiaxis of the ellipsoid of second greatest length. The analytic transliteration of this observation will furnish us the desired characterization. Like many results in analysis which can be demonstrated quite easily, the great merit of this contribution of Courant and Fischer lies in its discovery.
6. The Courant-Fischer min-max Theorem. Let us now demonstrate Theorem 2.
Theorem 2. The characteristic roots λ_i, i = 1, 2, . . . , N, may be defined as follows:

    λ_1 = max_x (x,Ax)/(x,x)
    λ_2 = min_{(y,y)=1} max_{(x,y)=0} (x,Ax)/(x,x)
    . . .
    λ_k = min_{(y^i,y^i)=1} max_{(x,y^i)=0, i=1,2,...,k-1} (x,Ax)/(x,x)    (1)

Equivalently,

    λ_N = min_x (x,Ax)/(x,x)
    λ_{N-1} = max_{(y,y)=1} min_{(x,y)=0} (x,Ax)/(x,x)    (2)

and so on.
Proof. Let us consider the characteristic root λ_2. Define the quantity

    μ_2 = min_{(y,y)=1} max_{(x,y)=0} (x,Ax)/(x,x)    (3)
We shall begin by making an orthogonal transformation which converts A into diagonal form, x = Tz. Then

    μ_2 = min_{(y,y)=1} max_{(Tz,y)=0} { Σ_{k=1}^{N} λ_k z_k² / Σ_{k=1}^{N} z_k² }
        = min_{(y,y)=1} max_{(z,T'y)=0} { Σ_{k=1}^{N} λ_k z_k² / Σ_{k=1}^{N} z_k² }    (4)

Setting T'y = y¹, this becomes

    μ_2 = min_{(Ty¹,Ty¹)=1} max_{(z,y¹)=0} { Σ_{k=1}^{N} λ_k z_k² / Σ_{k=1}^{N} z_k² }    (5)

Since (Ty¹,Ty¹) = (y¹,y¹), we may write

    μ_2 = min_{(y¹,y¹)=1} max_{(z,y¹)=0} { Σ_{k=1}^{N} λ_k z_k² / Σ_{k=1}^{N} z_k² }    (6)

In other words, it suffices to assume that A is a diagonal matrix. In place of maximizing over all z, let us maximize over the subregion defined by

    S:  z_3 = z_4 = ⋯ = z_N = 0   and   (z,y) = 0    (7)

Since this is a subregion of the region defined by (z,y) = 0, we have

    μ_2 ≥ min_{(y,y)=1} max_S { (λ_1z_1² + λ_2z_2²) / (z_1² + z_2²) }    (8)

Since (λ_1z_1² + λ_2z_2²)/(z_1² + z_2²) ≥ λ_2 for all z_1 and z_2, it follows that we have demonstrated that μ_2 ≥ λ_2. Now let us show that μ_2 ≤ λ_2. To do this, in the region defined by (y,y) = 1, consider the set consisting of the single vector y with y_1 = 1. This choice of the vector y forces z to satisfy the condition that its first component z_1 must be zero. Since the minimum over a subset must always be greater than or equal to the minimum over the larger set, we see that

    μ_2 ≤ max_{z_1=0} { (λ_1z_1² + λ_2z_2² + ⋯ + λ_Nz_N²) / (z_1² + z_2² + ⋯ + z_N²) }
        = max_z { (λ_2z_2² + ⋯ + λ_Nz_N²) / (z_2² + ⋯ + z_N²) } = λ_2    (9)
It follows that μ_2 = λ_2. Now let us pursue the same techniques to establish the result for general k. Consider the quantity

    μ_k = min_{(y^i,y^i)=1} max_{(x,y^i)=0, i=1,2,...,k-1} (x,Ax)/(x,x)    (10)

Consider maximization over the subregion defined by

    S:  z_{k+1} = z_{k+2} = ⋯ = z_N = 0    (11)

It follows that

    μ_k ≥ min_{(y^i,y^i)=1} max_S { Σ_{i=1}^{k} λ_i z_i² / Σ_{i=1}^{k} z_i² } ≥ λ_k    (12)

since Σ_{i=1}^{k} λ_i z_i² / Σ_{i=1}^{k} z_i² ≥ λ_k for all z in S. Similarly, to show that λ_k ≥ μ_k, we consider minimization over the subregion of (y^i,y^i) = 1, i = 1, 2, . . . , k - 1, consisting of the (k - 1) vectors

    y¹ = (1, 0, 0, . . . , 0)',  y² = (0, 1, 0, . . . , 0)',  . . . ,  y^{k-1} = (0, . . . , 0, 1, 0, . . . , 0)'    (13)

The orthogonality conditions (y^i,z) = 0 are then equivalent to

    z_1 = z_2 = ⋯ = z_{k-1} = 0    (14)
As before, we see that μ_k ≤ λ_k, and thus that μ_k = λ_k.
7. Monotone Behavior of λ_k(A). The representation for λ_k given in Theorem 2 permits us to conclude the following result.
Theorem 3. Let A and B be symmetric matrices, with B non-negative definite. Then

    λ_k(A + B) ≥ λ_k(A),   k = 1, 2, . . . , N    (1)

If B is positive definite, then

    λ_k(A + B) > λ_k(A),   k = 1, 2, . . . , N    (2)

EXERCISE
1. Obtain a lower bound for the difference λ_k(A + B) - λ_k(A).
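The monotone behavior described in Theorem 3 is easy to check numerically, along with one natural candidate for the lower bound asked for in the exercise, namely λ_N(B). The following sketch is an added illustration (numpy only; the random matrices are arbitrary), not a proof.

```python
import numpy as np

# Check: the roots of A + B dominate those of A when B is non-negative definite,
# and on this example the gap is at least lambda_N(B).
rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6)); A = (M + M.T) / 2
C = rng.standard_normal((6, 6)); B = C @ C.T          # B is non-negative definite
lamA = np.sort(np.linalg.eigvalsh(A))[::-1]           # lambda_1 >= ... >= lambda_N
lamAB = np.sort(np.linalg.eigvalsh(A + B))[::-1]
lamB_min = np.linalg.eigvalsh(B).min()
print(np.all(lamAB >= lamA))                          # Theorem 3
print(np.all(lamAB - lamA >= lamB_min - 1e-12))       # candidate lower bound of the exercise
```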
8. A Sturmian Separation Theorem. Theorem 2 also permits us to demonstrate Theorem 4.
Theorem 4. Consider the sequence of symmetric matrices

    A_r = (a_ij),   i, j = 1, 2, . . . , r    (1)

for r = 1, 2, . . . , N. Let λ_k(A_r), k = 1, 2, . . . , r, denote the kth characteristic root of A_r, where, consistent with the previous notation,

    λ_1(A_r) ≥ λ_2(A_r) ≥ ⋯ ≥ λ_r(A_r)    (2)

Then

    λ_{k+1}(A_{i+1}) ≤ λ_k(A_i) ≤ λ_k(A_{i+1})    (3)

Proof. Let us merely sketch the proof, leaving it as a useful exercise for the reader to fill in the details. Let us prove

    λ_1(A_i) ≤ λ_1(A_{i+1})    (4)

We have

    λ_1(A_i) = max_x (x,A_i x)/(x,x),    λ_1(A_{i+1}) = max_x (x,A_{i+1}x)/(x,x)    (5)

where the x appearing in the first expression is i-dimensional, and the x appearing in the second expression is (i + 1)-dimensional. Considering variation over the set of (i + 1)-dimensional vectors x with (i + 1)st component zero, we see that λ_1(A_i) ≤ λ_1(A_{i+1}). To obtain the inequality λ_2(A_{i+1}) ≤ λ_1(A_i), we use the definition of the λ_k in terms of min-max. Thus

    λ_1(A_i) = max_x (x,A_i x)/(x,x),
    λ_2(A_{i+1}) = min_{(y,y)=1} max_{(x,y)=0} (x,A_{i+1}x)/(x,x)    (6)

The vectors appearing in the expression for λ_1(A_i) are all i-dimensional, while those appearing in the expression for λ_2(A_{i+1}) are all (i + 1)-dimensional. In this second expression, consider only vectors x for which the (i + 1)st component is zero. Then it is clear that for vectors of this type it suffices to consider vectors y whose (i + 1)st components are zero. It follows that λ_2(A_{i+1}) ≤ λ_1(A_i).
9. A Necessary and Sufficient Condition that A be Positive Definite. Using the foregoing result, we can obtain another proof of the fact that a necessary and sufficient condition that A be positive definite is that |A_k| > 0, k = 1, 2, . . . , N. As we know, a necessary and sufficient condition that A be positive definite is that λ_k(A) > 0, k = 1, 2, . . . , N. If A is positive definite, and thus λ_k(A) > 0, we must have, by virtue of the above separation theorem, λ_k(A_i) > 0, k = 1, 2, . . . , i, and thus

    |A_i| = Π_{k=1}^{i} λ_k(A_i) > 0

On the other hand, if the determinants |A_k| are positive for all k, we must have, as the particular case k = 1, λ_1(A_1) > 0. Since

    λ_1(A_2) ≥ λ_1(A_1) > 0    (1)

the condition |A_2| > 0 ensures that λ_2(A_2) > 0. Proceeding inductively, we establish that λ_k(A_i) > 0, k = 1, 2, . . . , i, for i = 1, 2, . . . , N, and thus obtain the desired result. Again the details of the complete proof are left as an exercise for the reader.
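The criterion of Sec. 9 lends itself to a direct numerical check. The following sketch (an added illustration; numpy only, with an arbitrary test matrix) tests positive definiteness by the signs of the leading principal minors |A_k| and compares the verdict with the sign of the smallest characteristic root.

```python
import numpy as np

# A is positive definite exactly when every leading principal minor is positive.
def positive_definite_by_minors(A):
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, A.shape[0] + 1))

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
A = M @ M.T + 0.1 * np.eye(5)                      # positive definite by construction
print(positive_definite_by_minors(A), np.linalg.eigvalsh(A).min() > 0)   # both True
print(positive_definite_by_minors(A - 3.0 * np.eye(5)))                  # typically False
```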
10. The Poincaré Separation Theorem. Let us now establish the following result, which is useful for analytic and computational purposes.
Theorem 5. Let {y^k}, k = 1, 2, . . . , K, be a set of K orthonormal vectors, and set x = Σ_{k=1}^{K} u_k y^k, so that

    (x,Ax) = Σ_{k,l=1}^{K} u_k u_l (y^k,Ay^l)    (1)

Set

    B_K = ((y^k,Ay^l)),   k, l = 1, 2, . . . , K    (2)

Then

    λ_i(B_K) ≤ λ_i(A),   i = 1, 2, . . . , K
    λ_{K-j}(B_K) ≥ λ_{N-j}(A),   j = 0, 1, 2, . . . , K - 1    (3)

The results follow immediately from Theorem 4.
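A numerical illustration of Theorem 5 (an addition to the text; the matrix and the orthonormal set below are arbitrary): compress a symmetric A by K orthonormal vectors and compare the ordered roots of B_K with those of A.

```python
import numpy as np

# Poincare separation: indices here are 0-based, roots sorted in decreasing order.
rng = np.random.default_rng(3)
M = rng.standard_normal((7, 7)); A = (M + M.T) / 2
K = 3
Q, _ = np.linalg.qr(rng.standard_normal((7, K)))    # K orthonormal columns y^1, ..., y^K
B = Q.T @ A @ Q                                     # B_K = ((y^k, A y^l))
lamA = np.sort(np.linalg.eigvalsh(A))[::-1]
lamB = np.sort(np.linalg.eigvalsh(B))[::-1]
ok_upper = all(lamB[i] <= lamA[i] for i in range(K))                    # lambda_i(B_K) <= lambda_i(A)
ok_lower = all(lamB[K - 1 - j] >= lamA[7 - 1 - j] for j in range(K))    # lambda_{K-j}(B_K) >= lambda_{N-j}(A)
print(ok_upper, ok_lower)
```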
11. A Representation Theorem. Let us introduce the notation

    |A|_k = λ_N λ_{N-1} ⋯ λ_{N-k+1}    (1)

We wish to demonstrate Theorem 6.
Theorem 6. If A is positive definite,

    π^{k/2} / |A|_k^{1/2} = max_R ∫_R e^{-(x,Ax)} dV_k    (2)

where the integration is over a k-dimensional linear subspace R of N-dimensional space, whose volume element is dV_k, and the maximization is over all R.
Proof. It is easy to see that it is sufficient to take (x,Ax) in the form λ_1x_1² + λ_2x_2² + ⋯ + λ_Nx_N². Hence, we must show that

    π^{k/2} / (λ_N λ_{N-1} ⋯ λ_{N-k+1})^{1/2} = max_R ∫_R e^{-(λ_1x_1² + λ_2x_2² + ⋯ + λ_Nx_N²)} dV_k    (3)

Denote by V_a(ρ) the volume of the region defined by

    λ_1x_1² + λ_2x_2² + ⋯ + λ_Nx_N² ≤ ρ,   (x,a^i) = 0,   i = 1, 2, . . . , N - k    (4)

where the a^i are N - k linearly independent vectors. It is clear that

    ∫_R e^{-(x,Ax)} dV_k = ∫_0^∞ e^{-ρ} dV_a(ρ)    (5)

To complete the proof, we must show that the maximum of V_a(ρ) is attained when the relations (x,a^i) = 0 reduce to x_1 = 0, x_2 = 0, . . . , x_{N-k} = 0. This, however, is a consequence of the formula for the volume of the ellipsoid Σ_{j=N-k+1}^{N} λ_j x_j² ≤ ρ and the min-max characterization of the characteristic roots given in Theorem 2.
12. Approximate Techniques. The problem of determining the minimum of ∫_0^1 u'² dt over all functions satisfying the constraints

    ∫_0^1 q(t)u² dt = 1    (1a)
    u(0) = u(1) = 0    (1b)

is one that can be treated by means of the calculus of variations. Using standard variational techniques, we are led to consider the Sturm-Liouville problem of determining values of λ which yield nontrivial solutions of the equation

    u'' + λq(t)u = 0,   u(0) = u(1) = 0    (2)

Since, in general, the differential equation cannot be solved in terms of the elementary transcendents, various approximate techniques must be employed to resolve this problem. In place of obtaining the exact variational equation of (2) and using an approximate method to solve it, we can always replace the original variational problem by an approximate variational problem, and then use an exact method to solve this new problem. One way of doing this is the following. Let {u_i(t)} be a sequence of linearly independent functions over [0,1] satisfying the conditions of (1b), that is, u_i(0) = u_i(1) = 0. We attempt to find an approximate solution to the original variational problem having the form

    u = Σ_{i=1}^{N} x_i u_i(t)    (3)

The problem that confronts us now is finite dimensional, involving only the unknowns x_1, x_2, . . . , x_N. We wish to minimize the quadratic form

    Σ_{i,j=1}^{N} x_i x_j ∫_0^1 u_i'(t)u_j'(t) dt    (4)

subject to the constraint

    Σ_{i,j=1}^{N} x_i x_j ∫_0^1 q(t)u_i(t)u_j(t) dt = 1    (5)
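Before discussing the choice of the sequence {u_i(t)}, it may help to see the finite-dimensional problem (4)-(5) carried out numerically. The following sketch is an added illustration: the choice q(t) = 1 + t is arbitrary, the basis is the sine sequence mentioned in (7) below, and the integrals are approximated by a crude quadrature.

```python
import numpy as np

# Rayleigh-Ritz sketch for u'' + lambda*q(t)*u = 0, u(0) = u(1) = 0.
q = lambda t: 1.0 + t
K = 6
t = np.linspace(0.0, 1.0, 4001)
dt = t[1] - t[0]
U = np.array([np.sin(np.pi * k * t) for k in range(1, K + 1)])
dU = np.array([np.pi * k * np.cos(np.pi * k * t) for k in range(1, K + 1)])
F = np.array([[np.sum(dU[i] * dU[j]) * dt for j in range(K)] for i in range(K)])       # form (4)
G = np.array([[np.sum(q(t) * U[i] * U[j]) * dt for j in range(K)] for i in range(K)])  # constraint (5)
# minimizing x'Fx subject to x'Gx = 1 leads to the characteristic values of G^(-1/2) F G^(-1/2)
w, V = np.linalg.eigh(G)
Gih = V @ np.diag(w ** -0.5) @ V.T
print(np.sort(np.linalg.eigvalsh(Gih @ F @ Gih))[:3])   # approximations to the smallest lambda's
```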
It is clear that important simplifications result if we choose the sequence {u_i(t)} so that either

    ∫_0^1 u_i'(t)u_j'(t) dt = δ_ij
or
    h_ij = ∫_0^1 q(t)u_i(t)u_j(t) dt = δ_ij    (6)

The first condition is usually easier to arrange if q(t) is not a function of particularly simple type. Thus, for example, we may take

    u_k(t) = sin πkt    (7)

properly normalized, or

    u_k(t) = P_k(t)    (8)

the kth Legendre polynomial, again suitably normalized. This last would be an appropriate choice if q(t) were a polynomial in t, since integrals of the form ∫_0^1 t^k u_i(t)u_j(t) dt are readily evaluated if the u_i(t) are given by (8). This procedure leads to a large number of interesting and significant problems. In the first place, we are concerned with the question of convergence of the solution of the finite-dimensional problem to the solution of the original problem. Second, we wish to know something about the rate of convergence, a matter of great practical importance, and the mode of convergence, monotonic, oscillatory, and so forth. We shall, however, not pursue this path any further here.
EXERCISE
1. Let λ_1^(N), λ_2^(N), . . . , λ_N^(N) denote characteristic values associated with the problem posed in (4) and (5), for N = 2, 3, . . . . What inequalities exist connecting λ_i^(N) and λ_i^(N-1)?
MISCELLANEOUS EXERCISES
1. Let A and B be Hermitian matrices with respective characteristic values λ_1 ≥ λ_2 ≥ ⋯ , μ_1 ≥ μ_2 ≥ ⋯ , and let the characteristic values of A + B be ν_1 ≥ ν_2 ≥ ⋯ ; then λ_i + μ_j ≥ ν_{i+j-1} for i + j ≤ N + 1 (Weyl).
2. By considering A as already reduced to diagonal form, construct an inductive proof of the Poincaré separation theorem.
3. What are necessary and sufficient conditions that a_11x_1² + 2a_12x_1x_2 + a_22x_2² ≥ 0 for all x_1, x_2 ≥ 0?
4. What are necessary and sufficient conditions that Σ_{i,j=1}^{3} a_ij x_i x_j ≥ 0 for all x_i ≥ 0?
5. Can one obtain corresponding results for Σ_{i,j=1}^{N} a_ij x_i x_j?
6. Prove that (a_ij^r), i, j = 1, 2, is positive definite for r > 0 if A = (a_ij) is positive definite and a_ij ≥ 0 for i, j = 1, 2.
7. Does a corresponding result hold for 3 × 3 matrices?
8. Prove that (a_ij) is positive definite if a_ii > 0 and (|a_ij|) is positive definite.
9. Show that the characteristic roots of A^{1/2}BA^{1/2} are less than those of A if B is a symmetric matrix all of whose roots are between 0 and 1 and A is positive definite.
10. If T is orthogonal and x real, is (Tx,Ax) ≤ (x,Ax) for all x if A is positive definite?
11. Define rank for a symmetric matrix as the order of A minus the number of zero characteristic roots. Show, using the results of this chapter, that this definition is equivalent to the definition given in Appendix A.
12. Suppose that we have a set of real symmetric matrices depending upon a parameter, scalar or vector, q, {A(q)}. We wish to determine the maximum characteristic root of each matrix, a quantity we shall call f(q), and then to determine the maximum over q of this function. Let us proceed in the following way. Choose an initial value of q, say q_0, and let x^0 be a characteristic vector associated with the characteristic root f(q_0). To determine our next choice q_1, we maximize the expression (x^0,A(q)x^0)/(x^0,x^0) over all q values. Call one of the values yielding a maximum q_1, and let x^1 be a characteristic vector associated with the characteristic root f(q_1). We continue in this way, determining a sequence of values f(q_0), f(q_1), . . . . Show that

    f(q_0) ≤ f(q_1) ≤ f(q_2) ≤ ⋯

13. Show, however, by means of an example, that we may not reach the value max_q f(q) in this way.
14. Under what conditions on A(q) can we guarantee reaching the absolute maximum in this way?
15. Let the spectral radius r(A) of a square matrix A be defined to be the greatest of the absolute values of its characteristic roots. Let H be a positive definite Hermitian matrix and let

    g(A,H) = max_x [(Ax,HAx)/(x,Hx)]^{1/2}

Show that r(A) = min_H g(A,H).
16. If A is a real matrix, show that r(A) = min_S g(A,S), where now the minimum is taken over all real symmetric matrices (H. Osborn, The Existence of Conservation Laws, Annals of Math., vol. 69, pp. 105-118, 1959).
Bibliography and Discussion
§1. The extension by R. Courant of the min-max characterization from the finite case to the operators defined by partial differential equations is one of the foundations of the modern theory of the characteristic values of operators. For a discussion of these matters, see R. Courant and D. Hilbert, Methoden der mathematischen Physik, Berlin, 1931, reprinted by Interscience Publishers, Inc., New York. §2. The Rayleigh quotient affords a simple means for obtaining upper bounds for the smallest characteristic value. The problem of obtaining lower bounds is much more difficult. For the case of more general oper-
Variational Description of Characteristic Roots
123
ators a method developed by A. Weinstein and systematically extended by N. Aronszajn and others is quite powerful. For an exposition and further references, see
J. B. Diaz, Upper and Lower Bounds for Eigenvalues, Calculu8 of Variations and Its Application8, L. M. Graves (ed.), McGraw-Hill Book Company, Inc., New York, 1958. See also G. Temple and W. G. Bickley, Rayleigh'8 Principle and Its Application to Engineering, Oxford, 1933. An important problem is that of determining intervals within which characteristic values can or cannot lie. See T. Kato, On the Upper and Lower Bounds of Eigenvalues, J. PhY8. Soc. Japan, vol. 4, pp. 334-339, 1949. H. Bateman, Tran8. Cambridge Phil. Soc., vol. 20, pp. 371-382, 1908. G. Temple, An Elementary Proof of Kato's Lemma, Matematika, vol. 2, pp. 39-41, 1955. §6. See the book by Courant and Hilbert cited above and E. Fischer, tJber quadratische Formen mit reellen Koeffizienten, Monat8h. Math. u. Physik, vol. 16, pp. 234-249, 1905. For a discussion, and some applications, see G. Polya, Estimates for Eigenvalues, Studies in Mathematic8 and Mechanics, presented to R. Von Mises, Academic Press, Inc., New York, 1954. There are extensive generalizations of this result. Some are given in Chap. 7, Sees. 10, 11, and 13, results due to Ky Fan. See Ky Fan, On a Theorem of Weyl Concerning Eigenvalues of Linear Transformations, I, Proc. N atl. Acad. Sci. U.S., vol. 35, pp. 652-655, 1949. An extension of this is due to Wielandt, H. Wielandt, An Extremum Property of Sums of Eigenvalues, Proc. Am. Math. Soc., vol. 6, pp. 106-110, 1955. See also Ali R. Amir-Moez, Extreme Properties of Eigenvalues of a Hermitian Transformation and Singular Values of the Sum and Product of a Linear Transformation, Duke Math. J., vol. 23, pp. 463-477, 1956.
124
Introduction to Matrix Analysis
A further extension is due to M. Marcus and R. Thompson, A Note on Symmetric Functions of Eigenvalues, Duke Math. J., vol. 24, pp. 43-46, 1957. M. Marcus, Convex Functions of Quadratic Forms, Duke Math. J., vol. 24, pp. 321-325, 1957.
§S. Results of this type may be established by means of the classical techniques of Sturm. See W. S. Burnside and A. W. Panton, Theory of Equations, vol. II, Longmans, Green & Co., Inc., New York, 1928. §10. H. Poincar~, Sur les ~quations aux d~riv~es partielles de la physique matMmatique, Am. J. Math., vol. 12, pp. 211-294, 1890.
§11. R. Bellman, I. Glicksberg, and O. Gross, Notes on Matrix Theory -VI, Am. Math. Monthly, vol. 62, pp. 571-572,1955. §12. An extensive discussion of computational techniques of this type is contained in
L. Collatz, Eigenwertproblemen und ihre numerische Behandlung, New York, 1948. Finally, let us mention the monograph M. Parodi, Sur quelques propri~Us des valeurs characUristiques des matrices carrees, fascicule CXVII, M em. sci. mathe., 1952. The problem of obtaining corresponding variational expressions for the characteristic values of complex matrices, which arise in the study of linear dissipative systems, is one of great difficulty. For some steps in this direction, see R. J. Duffin, A Minimax Theory for Overdamped Networks, J. Rat. Mech. Analysis, vol. 4, pp. 221-233, 1955, and the results of Dolph et al., previously referred to. R. J. Duffin, The Rayleigh-Ritz Method for Dissipative or Gyroscopic Systems, Q. Appl. Math., vol. 18, pp. 215-222, 1960. R. J. Duffin and A. Schild, On the Change of Natural Frequencies Induced by Small Constraints, J. Rat. Mech. Anal., vol. 6, pp. 731758, 1957. A problem of great interest in many parts of physics is that of determining q(t) given the characteristic values corresponding to various
Variational Description of Characteristic Roots boundary conditions.
125
See
G. Borg, An Inversion of the Sturm-Liouville Eigenvalue Problem. Determination of the Differential Equation from the Eigenvalues, Acta Math., vol. 78, pp. 1-96, 1946. I. M. Gelfand and B. M. Levitan, On the Determination of a Differential Equation from Its Spectral Function, Izv. Akad. Nauk SSSR, Ser. Math., vol. 15, pp. 309-360, 1951.
B. M. Levitan and M. G. Gasymov, Determination of a Differential Equation from Two Spectra, U8pekhi Mat. N auk, vol. 19, pp. 3-64, 1964.
R. Bellman, A Note on an Inverse Problem in Mathematical Physics, Q. Appl. Math., vol. 19, PP. 269-271, 1961.
L. E. Anderson, Summary of Some Results Concerning the Determination of the Operator from Given Spectral Data in the Case of a Difference Equation Corresponding to a Sturm-Liouville Differential Equation, to appear. See also R. Yarlagadda, An Application of Tridiagonal Matrices to Network Synthesis, SIAM J. Appl. Math., vol. 16, pp. 1146-1162, 1968.
8
Inequalities
1. Introduction. In this chapter, we shall establish a number of interesting inequalities concerning characteristic values and determinants of symmetric matrices. Our fundamental tools will be the integral identity

    π^{N/2} / |A|^{1/2} = ∫_{-∞}^{∞} ⋯ ∫_{-∞}^{∞} e^{-(x,Ax)} dx    (1)

valid if A is positive definite, its extension given in Theorem 6 of Chap. 7, and some extensions of the min-max characterizations of Courant-Fischer, due to Ky Fan. Before deriving some inequalities pertaining to matrix theory, we shall establish the standard inequalities of Cauchy-Schwarz and Hölder. Subsequently, we shall also prove the arithmetic-geometric mean inequality, since we shall require it for one of the results of the chapter.
2. The Cauchy-Schwarz Inequality. The first result we obtain has already been noted in the exercises. However, it is well worth restating.
Theorem 1. For any two real vectors x and y we have

    (x,y)² ≤ (x,x)(y,y)    (1)

Proof. Consider the quadratic form in u and v,

    Q(u,v) = (ux + vy, ux + vy) = u²(x,x) + 2uv(x,y) + v²(y,y)    (2)

Since Q(u,v) is clearly non-negative definite, we must have the relation in (1). We see that (1) is a special, but most important, case of the non-negativity of the Gramian determinant established in Sec. 4 of Chap. 4.
3. Integral Version. In exactly the same way, we can establish the integral version of the preceding inequality.
Theorem 2. Let f(x), g(x) be functions of x defined over some region R. Then
    (∫_R fg dV)² ≤ (∫_R f² dV)(∫_R g² dV)    (1)

Proof. Since (f - g)² ≥ 0, we have

    2fg ≤ f² + g²    (2)

Hence, fg is integrable if f² and g² are. Now consider the integral

    ∫_R (uf + vg)² dV    (3)
which by virtue of the above inequality exists. As a quadratic form in u and v, the expression in (3) is non-negative definite. Consequently, as before, we see that (1) must be satisfied.
EXERCISES
1. We may also proceed as follows:

    (x,x)(y,y) - (x,y)² = (y,y)[(x,x) - 2(x,y)²/(y,y) + (x,y)²/(y,y)]
                        = (y,y)(x - y(x,y)/(y,y), x - y(x,y)/(y,y)) ≥ 0

2. Prove that (Σ_{k=1}^{N} x_k y_k)² ≤ (Σ_{k=1}^{N} x_k² z_k)(Σ_{k=1}^{N} y_k²/z_k) if the z_k are positive.
3. Hence, show that (x,Ax)(y,A^{-1}y) ≥ (x,y)² if A is positive definite. Establish the result without employing the diagonal form.
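A quick numerical check of Exercise 3 (an added illustration, not a proof; the matrix and the sampling are arbitrary choices): verify on random vectors that (x,Ax)(y,A^{-1}y) - (x,y)² is never negative.

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((5, 5)); A = M @ M.T + np.eye(5)   # positive definite
Ainv = np.linalg.inv(A)
worst = min((x @ A @ x) * (y @ Ainv @ y) - (x @ y) ** 2
            for x, y in (rng.standard_normal((2, 5)) for _ in range(10000)))
print(worst >= -1e-9)          # True: the difference is never (numerically) negative
```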
4. Hölder Inequality. As an extension of (1), let us prove Theorem 3.
Theorem 3. Let p > 1, and q = p/(p - 1); then

    Σ_{k=1}^{N} x_k y_k ≤ (Σ_{k=1}^{N} x_k^p)^{1/p} (Σ_{k=1}^{N} y_k^q)^{1/q}    (1)

if x_k, y_k ≥ 0.
Proof. Consider the curve

    v = u^{p-1}    (2)

where p > 1.
[Figure: the curve v = u^{p-1} in the (u,v) plane, with the rectangle OvRu and the two curvilinear areas OPu and OQv.]
It is clear that the area of the rectangle OvRu is less than or equal to the sum of the areas OPu and OQv,

    uv ≤ ∫_0^u s^{p-1} ds + ∫_0^v t^{q-1} dt    (3)

with equality only if v = u^{p-1}. Hence, if u, v ≥ 0, p > 1, we have

    uv ≤ u^p/p + v^q/q    (4)

Now set successively

    u = x_k / (Σ_{k=1}^{N} x_k^p)^{1/p},   v = y_k / (Σ_{k=1}^{N} y_k^q)^{1/q}    (5)

k = 1, 2, . . . , N, and sum over k. The result is

    Σ_{k=1}^{N} x_k y_k / [(Σ_{k=1}^{N} x_k^p)^{1/p} (Σ_{k=1}^{N} y_k^q)^{1/q}] ≤ 1/p + 1/q    (6)

Since 1/p + 1/q = 1, the result is as stated in (1). Setting

    u = f(x) / (∫_R f(x)^p dV)^{1/p},   v = g(x) / (∫_R g(x)^q dV)^{1/q}    (7)

the inequality in (4) yields

    f(x)g(x) / [(∫_R f(x)^p dV)^{1/p} (∫_R g(x)^q dV)^{1/q}] ≤ (1/p) f(x)^p / ∫_R f(x)^p dV + (1/q) g(x)^q / ∫_R g(x)^q dV    (8)

Integrating over R, the result is

    ∫_R f(x)g(x) dV ≤ (∫_R f(x)^p dV)^{1/p} (∫_R g(x)^q dV)^{1/q}
(9)
valid when f(x) , g(x) ~ 0 and the integrals on the right exist. G. Concavity of IA I. Let us now derive some consequences of the integral identity of (1.1). The first is Theorem 4. If A and B are positive definite, then
IXA + (1 for 0 ::;; X ::;; 1.
- X)BI ~
IAIAIBI1-A
(1)
129
Inequalities Proof.
We have
f"- .. ... f-.. .
lI'N/2
IXA
+ (1
- X)BI~ =
e-A(s,As)-(I-A)(s,Bsl
(2)
Let us now use the integral form of Holder's inequality given in (4.9) with p = l/X, q = 1/(1 - X). Then IXA
(f-.... ... f-.. .
+ (;/~ X)BI~ ~
e-(z,As)
y
.(f-.. . ... f .
_ .. e-(z.Bs)
lI'NA/2 lI'N(I-A)/2
dx
)O-Al
(3)
::;; IA IA/2 IBI(l-A1I2
The result when simplified is precisely (1), which is a special case of a more general result we shall derive below. EXERCISES 1. Prove that IA 2. Show that >.x
+ iBI
+ (1
~ IA I if A is positive definite and B is real symmetric. - >.)y ~ XAyl-A for x, y ~ 0, 0 ~ X ~ l.
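Theorem 4 can also be checked numerically. The sketch below is an added illustration (the two random positive definite matrices are arbitrary); it verifies |λA + (1 - λ)B| ≥ |A|^λ |B|^{1-λ} over a grid of values of λ.

```python
import numpy as np

rng = np.random.default_rng(5)
M1 = rng.standard_normal((4, 4)); A = M1 @ M1.T + np.eye(4)
M2 = rng.standard_normal((4, 4)); B = M2 @ M2.T + np.eye(4)
for t in np.linspace(0.0, 1.0, 11):
    lhs = np.linalg.det(t * A + (1 - t) * B)
    rhs = np.linalg.det(A) ** t * np.linalg.det(B) ** (1 - t)
    assert lhs >= rhs - 1e-9
print("determinant concavity verified on this example")
```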
6. A Useful Inequality. A result we shall use below is given in Theorem 5. Theorem 6. If A is positive definite, then
IAI
~
(1)
allau • • • aNN
Let us give two proofs. First Proof. We have
IAI
=
au
a2N
an
a3N
0 au
au
alN
au
a2N
+
au aN2
...
aNN
(2) aNI
aN2
aNN
Since A is positive definite, the matrix (ai;), i, j = 2, . . . ,N, is positive definite; hence the quadratic form in a12, au, • • • ,alN appearing as the second term on the right in (2) is negative definite. Thus
IAI ~ all
whence the result follows inductively.
(3)
Introduction to Matrix Analysis
ISO &eorul Proof.
Replacing
Xl
by
Consider the integral in (1.1) for the case N = 3. and adding, we have
-XI
(4)
where Since z + r
1
I~i~ ~
~
2 for all z ~ 0, we have
(1-.. . ... ~S
e-(..•..·t+2.......t+......·) dx 2
dxa) (/-.. . . e-
l)
...
~ lalll~ IA21~
(6)
where A 2 = (Clij), i, j = 2, 3.
Hence,
(7)
lalll~IA21~i ~ IAI~
whence the result for N = 3 follows inductively. general N is derived similarly.
The inequality for
EXERCISE 1. Let D I ... 11Ii/1, i, j ... I, 2, . . . , nl, D. Dr - IlIiil, i, j - nr_1 + I, ••. ,N. Then
=
lai/I, i,
j ...
nl
+ I,
. . . , n.,
IAI :s; DID• .•. Dr 1. Hadamard's Inequality. The most famous determinantal inequality is due to Hadamard. Theorem 6. Let B be an arbitrary nonsingular real 8quare matrix. Then (1)
A
Proof. Apply Theorem 5 to the positive defitlite square matrix = BB'. 8. Concavity of XNX N- 1 • • • Xl. Now define the matrix function
IAll =
XNXN-1 .••
Xl
(1)
Then we may state Theorem 7. Theorem 7. If A and B are positive definite, we have
IXA + (1
-
X)Bll
~
IAll"IBlll-A
(2)
for 0 ::; X ::; 1, k = I, 2, . . . , N. The proof follows from Theorem 6 of Chap. 7 in exactly the same way that Theorem 4 followed from (1.1).
Inequalities
131
EXERCISES
1. Show that
where x is constrained by the condition that x( ... 1, and A i is the matrU' obtained by deleting the ith row and ith column of A. I. Hence, show that
+ (1
tf>[M
- >..)B] ~ tf>(A)A
~ >.. ~ 1 (BergBtrrnn'B inequality). Let A and B be two positive definite matrices of order N and let M + (1 - >..)B, 0 ~ >.. ~ 1. For each j = 1, 2, . . . ,N, let Ai denote the submatrix of A obtained by deleting the first (j - 1) rows and columns. If k I, k2, • • . ,kN are N
°-
for 0
a.
j
real numbers such that
l
ki
~ 0, then
i-I
N
N
n1011·; nIA ~
j-I
IIA·;IBil(l-A)./
(Ky Fan)
j=1
4. Establish Hadamard's inequality for Hermitian matrices. IS. If A is a positive definite N X N matrix and P. denotes the product of all principal k-rowed minors of A, then (8zaaz)
See L. Mirsky.1 8. Let A p denote the principal submatrix of A formed by the first p rows and p columns, and let Bp and Op have similar meanings. Then
> (ill)II
(Ky Fan)
9. Additive Inequalities from Multiplicative. Let us now give an example of a metamathematical principle which asserts that every multiplicative inequality has an associated additive analogue. Consider Theorem 7 applied to the matrices
A=I+eX
B
=
1+ eY
e
>0
(1)
where X and Yare now merely restricted to be symmetric. If e is sufficiently small, A and B will be positive definite. Let Xl ~ X2 ~ • • • ~ XN, Yl ~ Y2 ~ • • • ~ YN be the characteristic roots of X and Y, respectively, and ZI ~ Z2 ~ • • • ~ ZN the character1 L. Mirsky, On a Generalization of Hadamard's Determinantal Inequality Due to SZIWl, Arch. Math., vol. VIII, pp. 274-275, 1957.
182
Introduction to Matrix Analysis
istic roots of XX (1
+ (1
+ eZN)(1 + EZN-I) ~
[(1
- X) Y.
Then Theorem 7 yields the result
. . . (1 + eZt) + eXN)(1 + eXN-1) . . . (1 + eXt)]>' . [(1 + eYN)(1 + eYN-I) . . . (1 + eYt))'-),
(2)
To first-order terms in e, this relation yields 1
+ e(zN + ZN-1 + . . . + Zt) ~ 1 + Xe(xN + XN_1 + . . . + Xt) + (1 - X)e(YN + YN-l + ... + Yt) + O(e!)
(3)
Letting e --t 0, we obtain ZN
+ ZN-1 + ... + Zt
~
(XN
+ ... + Xt) + (YN + YN-1 + ... + Yt)
(4)
Let us state this result, due to Ky Fan, in the following form. Theorem 8. Let U8 define the matrix function St(A) = XN
for a 8ymmetric matrix A.
+ XN-1 + . . . + Xt
(5)
Then
for 0 ::;; X ::;; 1, k = 1, 2, . . . , N. From this follows, upon replacing A by - A, Theorem 9. If (7)
where A i8 8ymmetric, then Tt[XA
+ (1
-
X)B] ::;; XTt(A)
+ (1
- X) Tt(B)
(8)
for 0 ::;; }. ::;; 1, k = 1, 2, . . . , N. 10. An Alternate Route. Let us obtain these results in a different fashion, one that will allow us to exhibit some results of independent interest. We begin by demonstrating Theorem 10. Theorem 10. We have, if A i8 positive definite X1X2
•••
XN-t+1
=
max I(Zi,Azi) I R
XNXN-1 .•. Xt
= min I(Zi, Azi) I
(1)
R
where R i8 the z region defined by i, j
= 1, 2, . . .
,N - k
In other word8, the miftimization i8 over aU 8et8 of N Vlctor8.
(2) +1 k + 1 orthonormal
Inequalities Proof.
133
Consider the determinant N
_I (x,Ax) (x,Ay)
D 2(x,y)
I
upon setting x =
Uk Xt ,
(x,Ay) (y,Ay)
Y
=
i-I
I
I L L "=1 =
N
2 Xt U t
t-I N
k VtX ,
L L
"-1 N
XkUtVk
XtUkVt
(3) XtVt
2
i-I
where the x k are the character-
"-1
iatic vectors of A. The identity of Lagrange yields N
D 2(x,y)
=
LX
i '1l;(Ui V; -
(4)
U;Vi)2
U= I
No terms with i = j enter, since then It follows that N
L
XI X2
(u,Vj -
UjVi)2
U,Vj -
~
D 2(x,y)
~
XNXN-I
UjVi
= O.
i,j-l
N
L (u,v; -
(5)
UjViP
i3-1
Hence N
,L
,-I N
N
Ui
2
L
u,v,
U,V,
i-I
i= I
N
Lu,v, L
i= 1
N
L
(6)
N
Vi
L
2
i=1
Vi
2
i-I
or
(7)
This proves the result for k = 1. To derive the general result, we use the identity for the Gramian given in Sec. 5 of Chap. 3, rather than the Lagrangian identity used above. EXERCISE
1. Use Theorem 9, together with Theorem 4, to establish Theorem 7.
11. A Simpler Expression for '11 N'1I N-l • • • Xk. Since determinants are relatively complicated functions, let us obtain a simpler representation than that given in Theorem 10. We wish to prove Theorem 11.
Introduction to Matrix Analysis
IS4 Theorem 11.
If A is positive definite,
'ANXN-l •••
'A,
= min
(z',Az')(ZI,AzI) . . . (ZN-HI,AzN-HI)
(1)
R
where R is
a8 defined. in (10.2). 1, is posiProof. We know that (ai,Az'), i, j = 1, 2, ... , N - k t.ive definite if A is positive definite. Hence, Theorem 5 yields
+
N-"'+I
I(Zi, A zi) I 5
IT
{z',A z')
(2)
Thus N-k+1
min I(zi,Azj)1 5 min R
R
n
(Zi, Az')
(3)
i-I
Choosing Zi = Xi, i = N, N - 1, . . . ,N - k
+ 1, we see that
N-k+1
min R
n
(zl,Azi) 5 XNXN-l ••• X,
(4)
i-I
This result combined with Theorem 9 yields Theorem 10. EXERCISES 1. Prove Theorem 7, using the foregoing result. Zl so as to show that
I. Consider the case k ... N and specialize
a.
Considering the general case, show that
6. Show that min tr (AB)/N ... lApiN, where the minimum is taken over all B B
satisfying the conditions B ~ 0, IBI ... I, and A is positive definite. IS. Hence, show that IA + B/"N ~ lApiN + IBI'IN if A andB are positive definite (A. Minkow8ki).
1i. Arithmetic-Geometric Mean Inequality. For our subsequent purposes, we require the fundamental arithmetic-geometric mean inequality. Theorem 12. If Xi ~ 0, then N
L xii N ~
,-1
(XIX2 • • • XN) liN
(1)
Proof. Although the result can easily be established via calculus, the following proof is more along the lines we have been pursuing, t Starting
t This proof is due to Cauchy.
Inequalities with the relation
(al~i
-
a2~)2 ~
al
135
0, or
+2 a2 >- (ala2)~
(2)
the result for N = 2, we set The result is b1
+ b2
t
~
ba + b.
~
e e
(3)
(b l b2bab4) ~
(4)
1
~ b2)~
a
~ b4)~
Continuing in this fashion, we see that (1) is valid for N a power of 2. To complete the proof, we use a backward induction, namely, we show that the result is valid for N - 1 if it holds for N. To do this, set Yl
=
X2, • • • , YN-l
=
XN-l
YN
=
Xl
+ X2 +N ._ .1. + XN-l
(5)
and substitute in (1). The result is the same inequality for N - 1. Examining the steps of the proof, we see that there is strict inequality unless Xl = X2 = . . . = XN. 13. Multiplicative Inequalities from Additive. In a preceding section, we showed how to deduce additive inequalities from multiplicative results. Let us now indicate how the converse deduction can be made. Theorem 13. The inequality N
I
Xi
'=k
k = 1,2, . inequality
~
N
I
, N, valid for any set of orthonormal vectors lx'\, yields the N
N
n Xi S n
'=k
Proof.
CkXk
(1)
(x',Ax')
'=k
(x',Ax i )
(2)
'=k
Consider the sum
+ Ck+lXk+l + . . . + CNXN = Ck(Xk + Xk+l + . . . + XN) + (Ck+l - Ck)(Xk+l + ... + XN) + ... + (CN - CN_l)X"",
(3)
If we impose the restriction that
o ~ Ck
~
Ck+l =:;; • • •
~
CN
(4)
we have, upon using (I), the inequality CkXk
+ Ck+lXk+l + ... + CN"'N S Ck(xk,Axk) + Ck+l(xk+1,Axk+I) + ... + CN(xN,Ax N)
for any set of orthonormal vectors
(5)
IxiI and scalar constants satisfying (2).
136
Introduction to Matrix Analysis
We deduce from (5) that N
(L
min
N
L
CiX,) =:;; min [
i-k
R
R
c,(x',Ax i ) ]
(6)
i-k
where R is the region in C space defined by (4). Now it follows from Theorem 12 that N
N _
~ + 1 .'\' c,X, ~ (CkCk+l
il:!
•.• CN)'/(N-l+l)(XkXk+l .•• XN) l/(N-Hl)
(7)
with equality only if all X,C, are equal, whence Co = (XkXk+l • • • XN)
l/(N-Hl) IXi
(8)
Since Xk ~ Xk+l ~ .•• ~ XN, it follows that 0 < Ck =:;; Ck+l =:;;. • =:;; CN. This means that the minimum over all c, is equal to the minimum over R for the left side of (6). Let us temporarily restrict ourselves to orthonormal vectors Ix'} satisfying the restriction (xk,Ax k) =:;; (xHl,Ax H1) ~ ••. ~ (xN,Ax N) (9) Then (6) yields the inequality X,Xk+l ••• XN =:;; (x", Axk)(X H1 , AX H1 )
•••
(x N,Ax N)
(10)
for these Ix'}. However, since it is clear that any set of Xi can be reordered to satisfy (9), the inequality in (10) holds for all orthonormal
{xiI· MISCELLANEOUS EXERCISES 1. Let n(A) =-
(~laijI2Y~.
Show that
'.1 N
(L IXilur~ ~
n(Ak)
~ coNm-,
N
(L IXil
i-I
lk ) ' "
j=!
where m is the maximum multiplicity of any characteristic root of A (GauI8chi). I. If Xi, Pi, are the characteristic values arranged in decreasing order of A ·A, B·B, and (A + B)*(A + B), respectively, then
II,
k
L
i-I
lIi loi
~
k
L
k
X.loi
+
i-I
LPi~~
k ... 1, 2, ••• , N
i-I
Under the same hypothesis, one has (1I'+f-1)loi ~ X,~~
+ Piloi
(Ky Fan)
(KII Fan)
Inequalities
137
8. If H - A + iB is a Hermitian matrix, then it is positive definite if and only if the characteristic values of iA -'B are real and less than or equal to 1 (Rober18onO. TauB8kll). ,. If H ... A + iB is positive definite, where A and B are real, then IA I ~ IHI, with equality if and only if B = 0 (0. Taussky). IS. If HI is a positive definite Hermitian matrix and H, is a Hermitian matrix, then HI + H 2 is positive definite. if and only if the characteristic values of HI-IH I are all greater than -1 (Ky Fan-O. Taussky). 6. Let K I be a positive definite matrix and K I such that K ,K I is Hermitian. Then K I K 2 is positive definite if and only if all characteristic values of K I are real and posi. tive (KII Fan-O. Taussky). 7. If A and B are symmetric matrices, the characteristic roots of A B - BA are puce complex. Hence, show that tr «AB)I) ~ tr (A IB2). 8. If A, B are two matrices, the square of the absolute value of any characteristic root of AB is greater than or equal to the product of the minimum characteristic root of AA' by the minimum characteristic root of BR'. 9. Let H ... A + iB be positive definite Hermitian; then IAI > IBI (Robertson). 10. If A = B + C, where B is positive definite and C is skew-symmetric, then IAI ~ IHI (0. Taussky). 11. If A is symmetric, A and I - A are non-negative definite, and 0 is orthogonal, then II - AOI ~ II - AI (0. Taussky). 11. Establish Schur's inequality I N
I
I~.II
Ilaiil
S;
i= 1
l
i,j
18. Let A be symmetric and k he the number of zero characteristic roots.
f ~i
=
i-I
f ~il
Ntk~i i=1
(I
~i)2 S;
Ntk~i2
i=l
i-l
where the summation is now over nonzero roots. N-k
=
From the Cauchy inequality N-k
(N - k)
i= 1
Then
(L
~i')
i= 1
deduce that (tr A)I S
(N -
k)
tr
(A2)
whence k
<
-
N tr J..:t~~ -(.t:'~ tr (AI)
14,. From the foregoing, conclude that the rank of a symmetric matrix A is greater than or equal to (tr A)I/tr (A2). lIS. Obtain similar relations in terms of tr (A k) for k = I, 2, 3, .•.. (The foregoing trick was introduced by Schnirelman in connection with the problem of representing every integer as a sum of at most a fixed number of primes.) 16. Let ~N be the smallest characteristic value 1I.nd ~, the largest characteristic value of the positive definite matrix A. Then
<
( X,X ) 1 _ 1
(A x,x ) (A -I x,x ) _ < -(~l4~I~N + ~.v) I
I. Schur, Malh. Ann., vol. 66, pp. 488--510, 1909.
(
X,X
)I
138
Introduction to Matrix Analysis
1'his is a special case of more general results given by W. Greub and W. Rheinboldt. 1 17. If X is positive definite, then X + X-I ~ 21. 18. If X is positive definite, there is a unique Y which minimizes tr (XY-I) subject to the conditions that Y be positive definite and have prescribed diagonal elements. 19. This matrix Y satisfies an equation of the form X ... YAY, where A is a diagonal matrix of positive elements Xi. N
10. The minimum value of tr (XY-I) is
L
Xjbii (P. WhiUle).
j=1
11. If Band C are both positive definite, is BC + CB ~ 2miCBJ.i? II. If A, B, and C are positive definite, is (A + B)J.iC(A + B)J.i ~ AJ.iCA~i + miCmi? 18. Let A and B be N X N complex matrices with the property that I - A·A and 1- B·B are both positive semidefinite. Then 11- A·Bls ~ II - A·AI II - B·BI. (Hua, Acta Math. Sinica, 1J01. 5, pp. 463-470, 1955; Bee Math. Rev., 1J01. 17, p. 703, 1956.) For extensions, using the representation of Sec. 10 of Chap. 6, see R. Bellman, Representation Theorems and Inequalities for Hermitian Matrices, Duke Math. J., 1959, and, for others using a different method, M. Marcua, On a Determinantal Inequality, Amer. Math. Monthly, vol. 65, pp. 266-268, 1958. I'. Let A, B ~ O. Then tr «AB)S"+I) ~ tr «A SBS)S"), n == 0, 1, ... (Golden). lIS. Let X+ denote the Penrose-Moore generalized inverse. Define the parallel Bum of two non-negative definite matrices A and B by the expresaion A:B = A(A
+ B)+B
TlJ.en A:B = B:A, (A:B)C = A:(B:C). 18. If Ax ... Xx, Bx = /AX, then (A: B)x = (X:/A)x.
1'1. If ai, hi
> 0, then
(L (L ai):
hi)
~
L
(ai:hi) and if A, B
~ 0, then tr (A: B)
" i (tr A): (tr B), with equality if and only if "A = cIB, CI a scalar. 18. If A, B ~ 0, then IA;BI ~ IAI:IBI. 19. Can one derive the result of Exercise 27 from the result of Exercise 28 and conversely? For the results in Exercises 25-28, and many further results, see W. N. Anderson, Jr., and R. J. Duffin, Series and Parallel Addition of Matrices, J. Math. Anal. Appl., to appear. 80. Let AI, As > 0 and M(AI,A s) denote the convex hull of the set of matrices C such that C ~ AI and C ~ As. Similarly define M(AI,As,A s) with As > O. Is it true that M(A1,As,A s) = M(M(AI,As),A s), with the obvious definition of the righthand side? 81. A matrix is said to be doubly stocha8tic if its elements are nonnegative and the row and column sums are equal to one. A conjecture of Van der Waerden then states that per(A) ~ NI/NN with equality only if aii = liN, N the dimension of A. Here, per(A) i~ the expre88ion obtained from IAI by changing all minua signs into pluuigns. Establish.the conjecture for N = 2,3. See Marcus and Minc. s For an application of this result to some interesting problems in statistical mechanics, see Hammersley.~
J W. Greub and W. Rheinboldt, On a Generalization of an Inequality of L. V. Kanwrovich, Proc. Am. Math. Soc., 1959. a M. Marcus and H. Minc, Am. Malh. Monlhly, vol. 72, pp. 577-591, 1965. - J. M. Hammersley, An Improved Lower Bound for the Multidimensional Dimer Problem, Proc. Cambridge Phil. Soc., vol. 64, pp. 455-463, 1968.
Inequalities
139
Bibliography and Discussion §1. The modern theory of inequalities, and the great continuing interest in this field, stems from the classic volume G. H. Hardy, J. E. Littlewood, and G. Polya, Inequalities, Cambridge University Press, New York, 1934. An account of more recent results, together with work in different directions, will be found in E. F. Beckenbach and R. Bellman, Inequalities, Springer-Verlag, 1961. A much more extensive account of inequalities pertaining to matrices and characteristic root will be found therein. The principal results of this chapter have been motivated by papers by Ky Fan, Ky Fan, Some Inequalities Concerning Positive-definite Matrices, Proc. Cambridge Phil. Soc., vol. 51, pp. 414-421, 1955. Ky Fan, On a Theorem of Weyl Concerning the Eigenvalues of Linear Transformations, I, Proc. Natl. Acad. Sci. U.S., vol. 35, pp. 652-655, 1949.
Ky Fan, On a Theorem of Weyl Concerning the Eigenvalues of Linear Transformations, II, Proc. Natl. Acad. Sci. U.S., vol. 36, pp. 31-35, 1950.
Ky Fan, Maximum Properties and Inequalities for the Eigenvalues of Completely Continuous Operators, Proc. Natl. Acad. Sci. U.S., vol. 37, pp. 760-766, 1951. Ky Fan, Problems 4429 and 4430, Am. Math. Monthly, vol. 58, p. 194, 1951; Solutions in vol. 60, pp. 48-50, 1953. (Theorems 7 and 11 are to be found in these pages.) The methods employed are, in the main, quite different. Our aim has been to show the utility of representation theorems in the derivation of inequalities. §2 to §4. An intensive discussion of these classical inequalities will be found in the book by Hardy, Littlewood, and Polya cited above. §5. The result appears to be due to Ky Fan; cf. the third and fifth references cited above; cf. also the papers by Oppenheim.
Introduction to Matrix Analysis
140
A. Oppenheim, Inequalities Connected with Definite Hermitian Forms, J. London Math. Soc., vol. 5, pp. 114-119, 1930. A. Oppenheim, Inequalities Connected with Definite Hermitian Forms, II, Am. Math. Monthly, vol. 61, pp. 463-466, 1954. See also further references given there. Applications of a similar nature, using the representation of Sec. 10 of Chap. 6, may be found in R. Bellman, Hermitian Matrices and Representation Theorems, Duke Math. J., 1959. where some results due to Hua are generalized; see L. K. Hua, Inequalities Involving Determinants, Acta Math. Sinica, vol. 5, pp. 463-470, 1955; Math. Rev8., vol. 17, p. 703, 1956. Hua uses a different type of integral representation, derived from the general theory of group representations. §6. See E. F. Beckenbach, An Inequality for Definite Hermitian Determinants, Bull. Am. Math. Soc., pp. 325-329, 1929. §7. Hadamard's inequality is one of the most proved results in analysis with well over one hundred proofs in the literature. The proof here follows: R. Bellman, Notes on Matrix Theory-II, Am. Math. Monthly, vol. LX, pp. 174-175, 1953. Hadamard has a quite interesting remark concerning his inequality in his fascinating monograph,
J. Hadamard, The Psychology of Invention in the Mathematical Field, Princeton University Press, Princeton, N.J., 1949. For the original result, see J. Hadamard, Resolution d'une question relative aux determinants, Bull. 8ci. math., vol. 2, pp. 240--248, 1893. For one extension, see E. Fischer, tiber den Hadamardschen Determinantensatz, Arch. Math. u. PhY8., (3), vol. 13, pp. 32-40, 1908. and for a far-reaching extension, see
Inequalities
141
I. Schur, tiber endliche Gruppen und Hermitesche Formen, Math. Z., vol. 1, pp. 184-207, 1918.
See also
J. Williamson, Note on Hadamard's Determinant Theorem, Bull. Am. Math. Soc., vol. 53, pp. 608-613, 1947. H. S. Wilf, Hadamard Determinants, Mobius Functions and the Chromatic Number of a Graph, Bull. Am. Math. Soc., vol. 74, pp. 960--964, 1968. §8. For this result and the related Theorem 11, see the fifth reference of Ky Fan given above. The proof here follows
R. Bellman, I. Glicksberg, and O. Gross, Notes on Matrix TheoryVI, Am. Math. Monthly, vol. 62, pp. 571-572, 1955. The result of Exercise 2 is due to
Bergstr~m,
H. Bergstr~m, A Triangle Inequality for Matrices, Den 11te Skandinaviske M aternatiker-kongressen, Trondheim, 1949, Oslo, pp. 264-267, 1952. while the method of proof follows
R. Bellman, Notes on Matrix Theory-IV: An Inequality Due to Bergstr~m,
Am. Math. Monthly, vol. 62, pp. 172-173, 1955.
The result of Exercise 3 is given in R. Bellman, Notes on Matrix Theory-IX, Am. Math. Monthly, vol. 64, pp. 189-191, 1957. where two methods are used, one depending upon the generalization of the Ingham-Siegel representation referred to above and the other upon Bergstr~m 's inequality.
§9. This section illustrates an important technique for obtaining two results from one. Occasionally, the multiplicative results are easier to obtain ab initio. The result follows
R. Bellman, Notes on Matrix Theory-VII, Am'. Math. Monthly, vol. 62, pp. 647-648, 1955. §12. The book by Hardy, Littlewood, and Polya contains a history of the arithmetic-geometric mean inequality together with a large number of proofs. The one given here is our favorite because of its dependence . upon backward induction.
142
Introduction to Matrix Analysis
§13. We have omitted the proof of inequality (1) which is given in Ky Fan's second paper. The present derivation of (2) from (1) is given in R. Bellman, Notes on Matrix Theory-XV: Multiplicative Properties from Additive Properties, A m. Math. Monthly, vol. 65, pp. 693694,1958. The topic of inequalities for various functions of characteristic values has been thoroughly explored by a number of authors; cf. the following articles where a large number of additional references will be found. A. Ostrowski, Sur quelques applications des fonctions convexes et concaves au sens de I. Schur, J. math. pure8 et appl., vol. 31, no. 9, pp. 253-292, 1952. M. D. Marcus and L. Lopes, Inequalities for Symmetric Functions and Hermitian Matrices, Can. J. Math., vol. 8, pp. 524-531, 1956. M. D. Marcus and J. L. McGregor, Extremal Properties of Hermitian Matrices, Can. J. Math., vol. 8, 1956. M. Marcus and B. N. Moyls, Extreme Value Propertie8 of Hermitian Matrice8, Department of Mathematics, University of British Columbia, Vancouver, Canada, 1956. M. Marcus, B. N. Moyls, and R. Westwick, Some Extreme Value Results for Indefinite Hermitian Metrics II, Illinoi8 J. Math., vol. 2, pp.408-414, 1958. We have omitted all discussion of the interesting and important question of inequalities for the characteristic roots in terms of the elements of A. Thorough discussions of this field may be found in the papers by A. Brauer, Limits for the Characteristic Roots of a Matrix, IV, Duke Math. J., vol. 19, pp. 75-91, 1952. E. T. Browne, Limits to the Characteristic Roots of Matrices, Am. Math. Monthly, vol. 46, pp. 252-265, 1939. W. V. Parker, The Characteristic Roots of Matrices, Duke Math. J., vol. 12, pp. 519-526, 1945. Next, tat us mention the paper N. G. DeBruijn and D. Van Dantzig, Inequalities Concerning Minors and Eigenvalues, Nieuw. Arch. Wi8k., vol. 4, no. 3, pp. 18-35, 1956. where a systematic derivation of a large number of determinantal inequalities will be found.
Inequalities
143
Next, a number of interesting matrix inequalities are given in the papers by Masani and Wiener and by Helson and Lowdeuslager referred to at the end of Chap. 11, aud in Ky Fan and A. J. Hoffman, Some Metric Inequalities in the Space of Matrices, Proc. Amer. Math. Soc., vol. 6, pp. 111-116, 1955. Finally, see O. Taussky, Bibliography on Bounds for Characteristic Roots of Finite Matrices, National Bureau of Standards Rept. 1162, September, 1951. O. Taussky, Positive-Definite Matrices and Their Role in the Study of the Characteristic Roots of General Matrices, Advances in Mathematics, vol. 2, pp. 175-186, Academic Press Inc., New York, 1968. A recent paper containing inequalities quite different from those described above is L. Mirsky, Inequalities for Normal and Hermitian Matrices, Duke Math. J., vol. 24, pp. 591-599, 1957. See also E. V. Haynsworth, Note on Bounds for Certain Determinants, Duke Math. J., vol. 24, pp. 313-320, 1957. M. Marcus and H. Mine, Extensions of Classical Matrix Inequalities, Lin. Algebra Appl., vol. 1, pp. 421-444, 1968. H. J. Ryser, Inequalities of Compound and Induced Matrices with Applications to Combinatorial Analysis, Illinois J. 1J1 alh., vol. 2, pp. 240-253, 1958. H. J. Ryser, Compound and Induced Matrices in Combinatorial Analysis, Proc. Symp. Appl. Math., Combin. Anal., pp.149-167, 1960. R. C. Thompson, Principal Sul5matrices of Normal and Hermitian Matrices, Illinois J. Math., vol. 10, pp. 296-308, 1966. R. C. Thompson and P. Me En teggert, Principal Submatrices, II: The Upper and Lower Quadratic Inequalities, Lin. Algebra Appl., vol. 1, pp. 211-243, 1968.
9
Dynamic Programming
1. Introduction. In Chap. 1, we encountered quadratic forms in connection with maximization and minimization problems, and observed that systems of linear equations arose in connection with the determination of the extrema of quadratic functions. Although simple conceptually, the solution of linear equations by means of determinants is not feasible computationally for reasons we have discussed in previous pages. Consequently, we must devise other types of algorithms to compute the solution. This suggests that it might be well to develop algorithms connected directly with the original maximization problem without considering the intermediate linear equations at all. In this chapter, we shall study a number of problems in which this can be done. Our basic tool will be the functional equation technique of dynamic programming.
2. A Problem of Minimum Deviation. Given a sequence of real numbers {a_k}, k = 1, 2, . . . , N, we wish to determine another sequence {x_k}, k = 1, 2, . . . , N, "close" to the a_k and "close" to each other. Specifically, we wish to minimize the quadratic expression

    Q(x) = Σ_{k=1}^{N} c_k(x_k - x_{k-1})² + Σ_{k=1}^{N} d_k(x_k - a_k)²    (1)

where c_k and d_k are prescribed positive quantities, and x_0 is a given constant. Proceeding in the usual fashion, taking partial derivatives, we obtain the system of linear equations

    c_1(x_1 - x_0) - c_2(x_2 - x_1) + d_1(x_1 - a_1) = 0
    . . . . . . . . . . . . . . . . . . . . . . . . . .
    c_N(x_N - x_{N-1}) + d_N(x_N - a_N) = 0
If N is large, the solution of this system requires some care. In place of using any of a number of existing techniques to resolve this system of equations, we shall pursue an entirely different course.
3. Functional Equations. Let us consider the sequence of minimization problems: Minimize over all x_k

    Q_r(x) = Σ_{k=r}^{N} c_k(x_k - x_{k-1})² + Σ_{k=r}^{N} d_k(x_k - a_k)²    (1)

with x_{r-1} = u, a given quantity. It is clear that the minimum value depends upon u, and, of course, upon r. Let us then introduce the sequence of functions {f_r(u)}, where -∞ < u < ∞ and r = 1, 2, . . . , N, defined by the relation

    f_r(u) = min_x Q_r(x)    (2)

We see that

    f_N(u) = min_{x_N} [c_N(x_N - u)² + d_N(x_N - a_N)²]    (3)

a function which can be readily determined. To derive a relation connecting f_r(u) with f_{r+1}(u), for r = 1, 2, . . . , N - 1, we proceed as follows:

    f_r(u) = min_{x_r} min_{x_{r+1}} ⋯ min_{x_N} [ Σ_{k=r}^{N} c_k(x_k - x_{k-1})² + Σ_{k=r}^{N} d_k(x_k - a_k)² ]
           = min_{x_r} { c_r(x_r - u)² + d_r(x_r - a_r)²
                          + min_{x_{r+1}} ⋯ min_{x_N} [ Σ_{k=r+1}^{N} c_k(x_k - x_{k-1})² + Σ_{k=r+1}^{N} d_k(x_k - a_k)² ] }
           = min_{x_r} [ c_r(x_r - u)² + d_r(x_r - a_r)² + f_{r+1}(x_r) ]    (4)

Since f_N(u) is determined by (3), this recurrence relation in principle determines f_{N-1}(u), f_{N-2}(u), and so on, back to f_1(u), the function we originally wanted. Using a digital computer, this series of equations can be used to compute the members of the sequence {f_r(u)}. In place of solving a particular problem for a single value of N, our aim has been to imbed this variational problem within a family of problems of the same general type. Even though an individual problem may be complex, it may be quite simply related to other members of the family. This turns out to be the case in a large class of variational questions of which the one treated above is a quite special example.
'" Recurrence Relations. Making use of the analytic structure of each member of the sequence f,(u) , we can do much better. Let us begin by proving inductively that each member of the sequence is a quadratic function of u, having the form f,(u)
= u, + v,u
+ w,u·
(I)
where Ur, V" and w, are constants, dependent on r, but not on u. The result;s easily seen to be true for r = N, since the minimum value in (3.1) is furnished by the value (2)
Using this value of XN in the right-hand side of (3.3), we see that fN(U) is indeed quadratic in u. The same analysis shows that fN-l(U), and so inductively that each function, is quadratic. This means that the functions f,(u) are determined once the sequence of coefficients fU"v"w,1 has been found. Since we know UN, VN, and WN, it suffices to obtain recurrence relations connecting U" V" and w, with Ur+l, V'+l, Wr+l' To accomplish this, we turn to (3.4) and write Ur
+ v,U + w,u 2 =
min [cr(x, - u)2 s,
+ d,(x r - a,)2 + Ur+l + V,+IXr + W'+lX,2]
(3)
The minimizing value of x, is seen to be given by (c,.
+ d, + W,+l)X,
= c,.u
+ d,a,
_
V';l
(4)
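Equations (1), (3), and (4) already determine the entire computational scheme. The following sketch is an added illustration (the data a_k, c_k, d_k are arbitrary, and numpy's polynomial arithmetic stands in for the coefficient matching described in the next sentence); it carries out the backward pass for u_r, v_r, w_r and then recovers the minimizing sequence by a forward pass.

```python
import numpy as np

def smooth(a, c, d, x0):
    N = len(a)
    # Taking f_{N+1} identically zero is equivalent to starting from (3.3).
    U, V, W = 0.0, 0.0, 0.0
    alpha = np.zeros(N); beta = np.zeros(N)
    for r in range(N - 1, -1, -1):            # backward pass, r = N-1, ..., 0 (0-based)
        D = c[r] + d[r] + W
        beta[r] = c[r] / D                    # minimizing x_r = alpha_r + beta_r * u, from (4)
        alpha[r] = (d[r] * a[r] - V / 2.0) / D
        xr = np.poly1d([beta[r], alpha[r]])   # x_r as a polynomial in u
        u_poly = np.poly1d([1.0, 0.0])
        fr = c[r]*(xr - u_poly)**2 + d[r]*(xr - a[r])**2 + U + V*xr + W*xr**2
        W, V, U = fr.coeffs[0], fr.coeffs[1], fr.coeffs[2]   # new w_r, v_r, u_r
    x = np.zeros(N); prev = x0
    for r in range(N):                        # forward pass recovers the minimizing sequence
        x[r] = alpha[r] + beta[r] * prev
        prev = x[r]
    return x

a = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
c = np.full(5, 2.0); d = np.full(5, 1.0)
print(smooth(a, c, d, x0=0.0))
```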
Substituting this value in (3) and equating coefficients of powers of u, we obtain the desired recurrence relations, which are nonlinear. We leave the further details as an exercise for the reader.
5. A More Complicated Example. In Sec. 2, we posed the problem of approximating to a varying sequence {a_k} by a sequence {x_k} with smaller variation. Let us now pursue this a step further, and consider the problem of minimizing the quadratic function
    Q(x) = Σ_{k=1}^{N} c_k(x_k - x_{k-1})² + Σ_{k=1}^{N} d_k(x_k - a_k)² + Σ_{k=1}^{N} e_k(x_k - 2x_{k-1} + x_{k-2})²    (1)

where x_0 = u, x_{-1} = v. As above, we introduce the function of two variables f_r(u,v) defined by the relation

    f_r(u,v) = min_x [ Σ_{k=r}^{N} c_k(x_k - x_{k-1})² + Σ_{k=r}^{N} d_k(x_k - a_k)² + Σ_{k=r}^{N} e_k(x_k - 2x_{k-1} + x_{k-2})² ]    (2)

r = 1, 2, . . . , N, -∞ < u, v < ∞, with x_{r-1} = u, x_{r-2} = v. It is easy to see, as above, that

    f_r(u,v) = min_{x_r} [ c_r(x_r - u)² + d_r(x_r - a_r)² + e_r(x_r - 2u + v)² + f_{r+1}(x_r,u) ]    (3)

r = 1, 2, . . . , N, and that

    f_r(u,v) = u_r + v_r u + w_r u² + v_r' v + w_r' v² + z_r uv

where the coefficients depend only upon r. Combining these relations, we readily derive recurrence relations connecting successive members of the sequence {u_r, v_r, w_r, v_r', w_r', z_r}. Since we easily obtain the values for r = N, we have a simple method for determining the sequence.
6. Sturm-Liouville Problems. A problem of great importance in theoretical physics and applied mathematics is that of determining the values of λ which permit the homogeneous equation
+ "'q,(t)u =
u(O) = u(l)
0
=0
(1)
to possess a nontrivial solution. The function q,(t) is assumed to be real, continuous, and uniformly positive over (O,ll. In Sec. 12 of Chap. 7, we sketched one way of obtaining approximate solutions to this problem. Let us now present another, once again leaving aside the rigorous aspects. Observe, however, that the questions of degree of approximation as N -- 00 are of great importance, since the efficacy of the method rests upon the answers to these questions. In place of seeking to determine a function u(t) for 0 ~ t.~ 1, we attempt to determine a sequence IUk = u(k~) I, k = 0, I, . . . ,N - 1, where N ~ = I. The second derivative u" is replaced by the second difference (2)
with the result that the differential equation in (1) is replaced by a difference equation which is equivalent to the system of linear homo-
148
Introduction to Matrix Analysis
geneous equations UI U8 -
+ )..1 l cPIUI = 0 2U2 + Ul + )..1 l cPIUI =
2Ul
0
(3) where cPk == cP(k.1). We see then that the original problem has been approximated to by a problem of quite familiar type, the problem of determining the characteristic values of the symmetric matrix
1
AJ
We shall discuss this type of matrix, a particular example of what is called a Jacobi matrix, again below. It is essential to note that this is merely one technique for reducing the differential equation problem to a matrix problem. There are many others. EXERCISES
1. Show that the equation in (6.1), together with the boundary condition, is equivalent to the integral equation

    u(t) = λ ∫_0^1 K(t,s)φ(s)u(s) ds

where

    K(t,s) = t(1 - s),   0 ≤ t ≤ s ≤ 1
           = s(1 - t),   0 ≤ s ≤ t ≤ 1

2. Use this representation to obtain another approximating matrix problem of the form

    u_i = λ Σ_{j=1}^{N} k_ij φ_j u_j,   i = 1, 2, . . . , N
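The finite-difference reduction of Sec. 6 is easy to try numerically. The following sketch is an added illustration (numpy only; φ ≡ 1 is chosen so that the exact characteristic values (πk)² are known for comparison).

```python
import numpy as np

N = 200
delta = 1.0 / N
phi = np.ones(N - 1)                       # phi_k = phi(k*delta); any positive function will do
T = 2.0 * np.eye(N - 1) - np.eye(N - 1, k=1) - np.eye(N - 1, k=-1)
D = np.diag(phi ** -0.5)
A = D @ T @ D / delta**2                   # the symmetric Jacobi matrix (6.4)
lam = np.sort(np.linalg.eigvalsh(A))
print(lam[:3], [(np.pi * k) ** 2 for k in (1, 2, 3)])   # smallest roots vs. exact values
```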
7. Functional Equations. Since A, as given by (6.4), is real and symmetric, we know that the characteristic values can be determined from the problem: Minimize

    Σ_{k=1}^{N} (u_k - u_{k-1})²    (1)

subject to the conditions

    u_0 = u_N = 0    (2a)
    Σ_{k=1}^{N-1} φ_k u_k² = 1    (2b)

In place of this problem, consider the problem of determining the minimum of

    Σ_{k=r}^{N} (u_k - u_{k-1})²    (3)

subject to the conditions

    u_{r-1} = v    (4a)
    u_N = 0    (4b)
    Σ_{k=r}^{N-1} φ_k u_k² = 1    (4c)

for r = 1, 2, . . . , N - 1, where -∞ < v < ∞. The minimum is then for each r a function of v, f_r(v). We see that

    f_{N-1}(v) = (1/φ_{N-1}^{1/2} - v)² + 1/φ_{N-1}    (5)

and that, arguing as before, we have the relation

    f_r(v) = min_{u_r} [ (u_r - v)² + (1 - φ_r u_r²) f_{r+1}(u_r / (1 - φ_r u_r²)^{1/2}) ]    (6)
for r = I, 2, . . . , N -" 1. The sequence of functions I!r(v) I apparently possesses no simple analytic structure. However, the recurrence relation in (6) can be used to compute the sequence numerically. EXERCISE 1. Treat in the same fashion the problem of determining the maximum and minimum of
+ (XI + ax2)' + ... (XI + x, + ... + XN_I + axN)' subjeet to + X,, + . . . + XN' = 1 XI' + (Xl + ax,)' + ... (Xl + ax, + a'XI + ... + aN-lxN)' subjeet to XI' + %2' + ... + XN' = I XI' + (Xl + ax,)' + [XI + ax, + (a + b)xa)' + ... + IXI + ax, + (a + b)xa + ... + [a + (N - 2)b)XN I' subjeet to XI' + X,, + ... +
(a) (axl)' Xl'
(b) (c)
XN' =
1
For a discussion of these questions by other means, see A. M. Ostrowski, On the Bounds for a One-parameter Family of Matrices, J. für Math., vol. 200, pp. 190-200, 1958.
8. Jacobi Matrices. By a Jacobi matrix, we shall mean a matrix with the property that the only nonzero elements appear along the main diagonal and the two contiguous diagonals. Thus

    a_ij = 0,   |i - j| ≥ 2    (1)

Let us now consider the question of solving the system of equations

    Ax = c    (2)

where A is a positive definite Jacobi matrix. As we know, this problem is equivalent to that of minimizing the inhomogeneous form

    Q(x) = (x,Ax) - 2(c,x)    (3)

Writing this out, we see that we wish to minimize

    Q(x) = a_11x_1² + 2a_12x_1x_2 + a_22x_2² + 2a_23x_2x_3 + ⋯ + 2a_{N-1,N}x_{N-1}x_N + a_NNx_N²
           - 2c_1x_1 - 2c_2x_2 - ⋯ - 2c_Nx_N    (4)

Consider then the problem of minimizing

    Q_k(x,z) = a_11x_1² + 2a_12x_1x_2 + a_22x_2² + 2a_23x_2x_3 + ⋯ + 2a_{k-1,k}x_{k-1}x_k + a_kkx_k²
               - 2c_1x_1 - 2c_2x_2 - ⋯ - 2c_{k-1}x_{k-1} - 2zx_k    (5)

for k = 1, 2, . . . , with

    Q_N(x,c_N) = Q(x)    (6)

Define

    f_k(z) = min_x Q_k(x,z)    (7)

Then it is easy to see that

    f_k(z) = min_{x_k} [ a_kkx_k² - 2zx_k + f_{k-1}(c_{k-1} - a_{k-1,k}x_k) ]    (8)

for k = 2, 3, . . . . Once again, we observe that each function f_k(z) is a quadratic in z,

    f_k(z) = u_k + v_k z + w_k z²,   k = 1, 2, . . .    (9)

with f_1(z) = min_{x_1} [a_11x_1² - 2zx_1] = -z²/a_11. Substituting in (8), we obtain recurrence relations connecting u_k, v_k, and w_k with u_{k-1}, v_{k-1}, and w_{k-1}. We leave the derivation of these as exercises for the reader.
Dynamic Programming
151
EXERCISES 1. Generalize the procedure above in the case where A is a matrix with the property that ai; = 0, Ii - il ~ 3. 2. Obtain relations corresponding to those given in Sec. 7 for the largest and smallest values of a symmetric matrix A with ali = 0, Ii - il ~ 2; and aii = 0, Ii - il ~ 3.
9. Analytic Continuation. We were able to apply variational techniques to the problem of solving the equation Ax = c at the expense of requiring that A be symmetric and positive definite. It is tempting to assert that the explicit solutions we have found are valid first of aU for symmetric matrices which are not necessarily positive definite, provided only that none of the denominators which occur is zero, and then suitably interpreted for not necessarily symmetric matrices. The argument would proceed along the following lines. The expressions for the Xi are linear functions of the Ci and rational functions of the Q,;j. An equality of the form (1)
valid in the domain of afj space where A is positive definite should certainly be valid for all afj which satisfy the symmetry condition. It is clear, however, that the rigorous presentation of an argument of this type would require a formidable background of the theory of functions of several complex variables. Consequently, we shall abandon this path, promising as it is, and pursue another which requires only the rudimentary facts concerning analytic continuation for functions of one complex variable. Consider in place of the symmetric matrix A the matrix zI + A, where z is a scalar. If z is chosen to be a sufficiently large positive number, this matrix will be positive definite, and will be a Jacobi matrix if A is a Jacobi matrix. We can now apply the principle of analytic continuation. The Xi, as given by the formulas derived from the variational technique, are rational functions of z, analytic for the real part of z sufficiently large. Hence, identities valid in this domain must hold for z = 0, provided none of the functions appearing has singularities at z = O. These singularities can arise only from zeroes of the denominators. In other words, the formulas are valid whenever they make sense. 10. Nonsymmetrlc Matrices. In order to apply variational techniques to nonsymmetric matrices, we must use a different device. Consider the expression f(x,y) = (x,Bx)
+ 2(x,Ay) + (y,By)
- 2(a,x) - 2(b,y)
(1)
152
Introduction to Matrix Analysis
where we shall assume that B is symmetric and positive definite and that A is merely real. Minimizing over x and y, we obtain the variational equations Bx A'x
+ Ay = a, + By = b
(2)
In order to ensure that !(x,y) is positive definite in x and y, we must impose some further conditions upon B. Since we also wish to use analytic continuation, perhaps the simplest way to attain our ends is to choose B = zI, where z is a sufficiently large positive quantity. The functional equation technique can now be invoked as before. We leave it to the reader to work through the details and to carry out the analytic continuation. 11. Complex A. Let us now see what can be done in case A is symmetric, but complex, A = B + iO, Band 0 real. The equation Ax = c takes the form (B
+ iC)(x + iy)
= a
+ ib
(1)
or, equating real and complex parts, Bx - Oy = a Ox + By = b
(2)
Observe that these relations establish a natural correspondence between the N-dimensional complex matrix B + iO and the 2N-dimensional matrix (3)
a relationship previously noted in the exercises. In order to relate this matrix, and the system of equations in (2) to a varia\ional problem, we consider the quadratic form !(x,y) = (x,Ox)
+ 2(x,By)
- (y,Oy) - 2(b,x) - 2(a,y)
(4)
If we impose the condition that 0 is positive definite, it follows that !(x,y) is convex in x and concave in y and hence, as a consequence of
general theorems, which we shall mention again in Chap. 16, that min max !(x,y) '"
1/
=
max min !(x,y) 1/
(5)
'"
Since!(x,y) is a quadratic form, there is no need to invoke these general results since both sides can be calculated explicitly and seen to be equal. We leave this as an exercise for the reader. Before applying the functional equation technique, we need the further result that the minimum over the XI and the maximum over the y. can be
153
Dynamic Programming taken in any order.
In particular,
min
max
= min max
(:la,:I::t•••• ,aw) (IlLoUt, ••. ,liN)
Zl
min
max
(6)
(St, ••• ISH) (lit•••• ,liN)
III
This we also leave to the enterprising reader. Problems of this nature are fundamental in the theory of games, which we shall discuss briefly in Chap. 16. Combining the foregoing results, we can treat the case of complex A. If necessary, we consider A + izI, where z is a sufficiently large positive quantity, once again introduced for the purposes of analytic continuation. 12. Slightly Intertwined Systems. Let us now consider the problem of resolving a set of linear equations of the form
+ a12X2 + a13Xa = Cl + a22X2 + a2aXa ~ C2 a31 X l + aa2 X 2 + aaaXa + b 1x4 = btxa + a44X4 + auXa + a4eXe = C4 aa4X4 + aa6Xa + a58X6 = C6 a64X4 + ae6Xa + a6&X6 + b 2x7 = aUXl a21Xl
bN-lXaN-a
Ca
C,
+ aaN-2,aN-2 X aN-2 + aaN-2,3N-IXaN-l + aaN-2.aNXaN = CaN-2 aaN-l.aN-2 X aN-2 + aaN-l,aN-IXaN-l + aaN-l.aNXaN = CaN-t aaN,aN-2XaN-2 + aaN.aN-lXaN-l + aaN,aNXaN = CaN
(1)
If the coefficients b; were all zero, this problem would separate into N simple three-dimensional problems. It is reasonable to assume that there is some way of utilizing the near-diagonal structure. We shall call matrices of the tyPe formed by the coefficients above slightly intertwined. They arise in a variety of physical, engineering, and economic analyses of multicomponent systems in which there is "weak coupling" between different parts of the system. To simplify the notation, let us introduce the matrices
Ak =
Xl
i, j
(
and the vectors
=
[...-,] Xak-l
Xak
c
k
=
=
I, 2, 3
[,~,] Cak-t
(2)
(3)
Cak
k = 1,2, . . . . We shall assume initially that the matrix of coefficients in (I) is a positive definite. Hence, the solution of the linear system is equivalent
154
Introduction to Matrix Analysis
to determining the minimum of the inhomogeneous quadratic form (zl,Ax 1)
+ (x 2,Ax l ) + ... + (xN,Ax N)
- 2(e 1,x 1) - 2(e 2,x 2) - .•• - 2(e N,xN) 2b1XaX4 2biX&X7 2bN-1X3N_aXaN_2
+
+
+ ... +
(4)
This we shall attack by means of functional equation techniques. Introduce the sequence of functions IfN(z) l, - 00 < Z < 00, N = 1, 2, . . . ,defined by the relation N
fN(Z) = min
[L
N
(x',A,x') - 2
i-I
Si
L
N-l
(e',x')
+2
i-I
L
b,xl+3,xai
+ 2ZXaN]
(5)
,-1
Proceeding in the usual fashion, we obtain the recurrence relation
where R N is three-dimensional region - 00 < X8N, X8N-l, X8N-I 13. Simplifications-I. We can write (12.6) in the form !N(Z) = min [ min [(xN,ANx N) %aN_1
+ 2ZX8N -
2(e N,xN)J
<
00.
+ fN-l(bN-1X8N-I)}
SaN,ZaN_l
(1)
Introduce the sequence of functions gN(Z,Y) =
min [(xN,ANx N)
+ 2ZX8N -
2(e N,x N)J
(2)
ZIN,%aN_l
where X8N-2
= y.
Then fN(Z) = min (gN(Z,y)
+ fN-l(bN-1y)]
(3)
1/
a simple one-dimensional recurrence relation. 14. Simplifications-II. We can, as in the section on Jacobi matrices, go even further, if we observe that fN(Z) is for each N a quadratic in z, (1)
where UN, VN, WN are independent of z. Using this relation in (3), we readily obtain recurrence relations for IUN,vN,wN}. EXERCISES 1. Obtain recurrence relations which enable us to compute the largest and smallest characteristic roots of slightly intertwined matrices. 2. Extend the foregoing technique to the case where the A~ are not all of the same dimension.
Dynamic Programming
155
16. The Equation Ax = y. Let us now pursue a simple theme. is a positive definite matrix, we know that the solution of
=y
Ax
If A (1)
given by x = A-I y can also be obtained as the solution of the problem of minimizing the quadratic form (2)
Q(X) = (x,Ax) - 2(x,Y)
The minimum has the value - (y ,A -Iy). Comparing the two approaches to the problem of solving (1), we can obtain some interesting identities. Introduce the function of N variables N
1:
= min [
!N(YI,Y2, . • • ,YN)
N
a,jXiXj - 2
iii -. 1
XI
1:
XiY,]
(3)
i == 1
Write N
N
1:
aijXiXj - 2
i,j~l
1:
XiY,
i=1
N-l
= aNNXN 2
+
1
N-l a,jXiXj - 2
ioj = I
1
,= 1
Xi(Y, - aiNXN) - 2XNYN
(4)
From this expression, we obtain the functional equation fN(Yl'Y2, •.• ,YN)
= min
[aNNxN 2 - 2XNYN
ZN
+ !N-l(YI
- aINXN,Y2 - a2NXN, . . . ,YN-I - aN-l NXN)J
(5)
In order to use this relation in some constructive form, we recall that ,YN) = _(y,A-l y). Hence, write
fN(YI,Y2, •
0
•
N
!N(YhY2, •.. ,YN)
=
1:
(6)
C,j(N)YiYi
i,j= I
Returning to (5), we obtain the relation N
1: 'ii
C,j(N)YiYi
== 1
= min [ aNNxN 2
-
2XNYN
XN
N-I
+ or, collecting terms,
1:
'I'
Cii(N -
aiNxN)]
(7)
l)y,Yj]
(8)
N-l
N
1
1)(y, - a.NXN)(Yi -
== 1
c;i(N)YiYj = min [ XN 2 {aNN
l,j= 1
+ XN
ZN
{ -2YN - 2
N-l
l
i,i -1
+
l
a,NaiNC'i(N - I)}
',i= I N-I
[YiaiN]Cij(N -
I)}
+
1:
',;-1
cij(N -
156
Introduction to Matrix Analysis
The minimization can now be readily performed, N-I
L + L +
YN
x -
a,Nyjc,,(N - 1)
ioi-I
N-
(9)
N-l
aNN
a;NajNCij(N - 1)
ili-l
while the minimum value itself is given by N-l
(L
N-l Cij(N - l)YiYj) (aNN
I
+
1»)
aiNaJNc.;(N -
'oi -
iJ -1
I
N -I
- (YN
+
L
y,-aiNCij(N -
i.;-t
1)
r
N I
aNN
L
+
aiNajNCij(N - 1)
ioi-I N
Equating coefficients of y,Yj in (10) and
L
Cij(N)Y,Yi, we obtain recur-
',i-l
renee relations connecting the sequences ICij( N)} and ICiJ(N - 1)}. 16. Quadratic Deviation. The problem of minimizing over all Xk the quadratic form (1)
is, in principle, quite easily resolvable. Let lhk(t)} denote the orthonormal sequence formed from 19k(t)} by means of the Gram-Schmidt orthogonalization procedure, where, without loss of generality, the gk(t) are taken to be linearly independent. Then N
min QN(X)
=
'"
min fo T (j(t) II
=
fo T
L
Ykhk(t»)2 dt
"'-I N
1: (fo
r dt -
T
jhA;(t) dt)2
(2)
II-I
This result may be written
fo r dt - fo T
T
fo T j(s)j(t)kN(s,t) ds dt
(3)
where N
kN(s,t) =
L
"'-I
hk(s)hk(t)
(4)
157
Dynamic Programming
Let us now obtain a recurrence relation connecting the members of the sequence IkN(s,t) J. Introduce the quadratic functional, a function of the function f(t), (5)
Employing the functional equation technique, we have for N 3, ... , N(f) = min N-l(f - XNgN)
=
2, (6)
:eN
and
(7) In order to use (6), we employ the technique used above repeated1ywe take advantage of the quadratic character of tPN(f). Hence, N(f) = min CloT (f - XNgN)2 dt "'N
- fo fo Pdt -
(f(s) - xNgN(s»(f(t) - XNgN(t»kN(s,t) ds dt] T T kN-1(s,t)f(s)f(t) dsdt
2
2
T
=
min "'N
-
{fo
T
T
fo fo 2XN [fo TfgN dt - fo T fo T gN(s)f(t)kN-1(s,t) ds dt] + XN [fo T gN dt - fo T fo T kN-1(S,t)gN(S)gN(t) ds dt]}
(8)
Obtaining the minimum value of XN, and determining the explicit value of the minimum, we obtain the recurrenGe relation _ kN(S,t) - kN_l(S,t)
+
Lhh T
+ where
gN(S)gN(t) 2gN(s) d N
h
T
N
T
T
dN =
h gN(s,)kN_1(t,S,) ds, d
gN 2(S) ds -
gN(sdgN(tI)kN-1(St,s)kN-1(tI,t) dS l dtl
hh T
(9)
T
kN_1(s,t)gN(S)gN(t) ds dt
(10)
17. A Result of Stieltjes. Since we have been emphasizing in the preceding sections the ~onnection between the solution of Ax = band the minimization of (x,Ax) - 2(b,x) when A is positive definite, let us use the same idea to establish an interesting result of Stieltjes. We shall obtain a generalization in Chap. 16. Theorem 1. If A is a positive definite matrix with the property that Q,;j < 0 for i ¢ j, then A - I has all positive element8.
158
Introduction to Matrix Analysis
Proof. Consider the problem of minimizing Q(x) = (x,Ax) - 2(b,x), where the components of b are all positive. Assume that Xl, X2, •.• , X~ < 0, X~+l, X~+2, ••• , XN ;;:: 0, at the minimum point. Writing out (x,Ax) - 2(b,x) in the form A:
A:
aUx l
2
+
aUx2 2
+
aNNxN 2
+
L
a;iXiXj
ioi-I N
+
N
l 1
+
l
LL
a;jXiXj - 2
iJ-A:+I
i-k+lj-I
aijX,xi
i-lJ-A:+l A:
N
aijXiXi
+
N
1
N
b,xi - 2
j-I
1
bixi
(1)
i-A:+l
we see, in view of the negativity of aii for i ¢ j, that we can obtain a smaller value of Q(x) by replacing Xi by -Xi for i = 1, 2, . . . , k, and leaving the other values unchanged, provided at least one of the Xi, i = k + 1, . . . ,N, is positive. In any case, we see that all the Xi can be taken to be non-negative. To show that they are actually all positive, if the bi are positive, we observe that one at least must be positive. For if Xl =
X2 = . . . = XN-l =
°
at the minimum point, then XN determined as the value which minimizes XN 2 - 2bNxN is equal to bN and thus positive. Since Ax = b at the minimum point, we have aiiXi = bi -
l
i = 1, 2, . . . , N - 1
a;jXi
(2)
ir'j
which shows that Xi > O. We see then that A-1b is a vector with positive components whenever b is likewise. This establishes the non-negativity of the elements of A-I. To show that A - I actually has all positive elements, we must show that A-Ib has positive components whenever b has non-negative components with at least one positive. Turning to (2), we see that the condition a;j < 0, i ¢ j, establishes this. MISCELLANEOUS EXERCISES 1. Given two sequences determine a finite sequence in the form
b~...
la~l and lb~l, [x~l, k ... 0, 1,
k = 0, 1, 2, . . . , N, we often wish to
. . . , M which expresses
M
1 Xla~-1. 1-0
To estimate the closeness of fit, we use the sum
N
QNoM(X) ...
M
1: (b~ I xla~_IY -
A:-O
1-0
N >M
~l
b~
most closely
159
Dyna mic Programming Consider the quadratic form in Yo, YI, . • . ,YN defined by ,YN) = min [(yo - xoao)!
+ (YI
- XOat - xlao)!
+
%
Show that
and that, generally, ,YN)
= min [(yo - xoao)! %.
2. Write N
fN.M(Yo,YI, . . . ,YN) =
l
c,;(N,M)YiYi
i.i= 0
and use the preceding recurrence relation to obtain the cl;(N,M) in terms of the cij(N - 1, M - 1). S. Write fN.M(Yo,YI, ••• ,YN)
= (Y,AMNY) and obtain the recurrence relation in terms of matrices. For an alternate discussion of this problem which is paramount in the KolmogorovWiener theory of prediction, see N. Wiener, The Extrapolation, Interpolation and Smoothing of Stationary Time Series and Engineering Applications, John Wiley & Sons, Inc., 1942, and particularly the Appendix by N. Levinson. ,. Let A and B be two positive definite matrices of order N with AB ~ BA and c a given N-dimensional vector. Consider the vector XN = ZNZN_I ••• Z!ZIC, where each matrix Zi is either an A or a B. Suppose that the Zi, i = 1, 2, . . . , N, are to be chosen so as to maximize the inner product (xN,b) where b is a fixed N-dimensional vector. Define the function fN(C) = max (xN,b) lz;!
for N = 1, 2, . . . , and all c. f.(c) fN(C)
=
Then
max ((Ac,b),(Bc,b»,
= max (fN-l(Ac),!N_l(Bc»
N = 2,3, . . . .
IS. Does there exist a scalar X such that fN(C) rv XN(J(C) as N -> OCI? 6. What is the answer to this question if AB = BA? 7. Suppose that C is a given positive definite matrix, and the Zi are to be chosen so that ZNZN_I . . . Z 2Z I C has the maximum maximum characteristic root. Let gN(C) denote this maximum maximorum. Then gl(C) = max [>(AC),>(BC)], (IN(C) = max [gN_I(AC),(JN_I(BC)]
N = 2,3, . . . ,
where >(X) is used to denote the maximum characteristic root of X. 8. Does there exist a scalar X such that gN(C) rv XNh(c) as N ..... OCI? See C. Bohm, Sulla minimizzazione di una funzione del prodotto di enti non commutati, LinceiRend. Sci. fiB. Mat. e Nat., vol. 23, pp. 386-388,1957.
160
Introduction to Matrix Analysis Bibliography
§1. The theory of dynamic programming in actuality is the study of multistage decision processes. It turns out that many variational problems can be interpreted in these terms. This interpretation yields a new approach which is useful for both analytic and computational purposes. In this chapter, pursuant to our general program, we are interested only in analytic aspects. For a detailed discussion of the general theory, the reader is referred to
R. Bellman, Dynamic Programming, Princeton University Press, Princeton, N.J., 1957.
1/
§2. The problem treated here is a simple example of what is called a smoothing" problem; cf.
R. Bellman, Dynamic Programming and Its Application to Variational Problems in Mathematical Economics, Calculus of Variations; Proceedings of Symposia in Applied Mathematics, vol. VIII, pp. 115-139, McGraw-Hill Book Company, Inc., New York, 1958. For a discussion of smoothing problems of a different type, see the following: I. J. Schoenberg, On Smoothing Operations and Their Generating Functions, Bull. Am. Math. Soc., vol. 59, pp. 199-230, 1953.
N. Wiener, Cybernetics, John Wiley & Sons, Inc., New York, 1951. §3. The derivation of this recurrence relation is a particular application of the principle of optimality; see the book referred to above in §t. The results discussed in §3 to §5 were first presented in
R. Bellman, On a Class of Variational Problems, Quart. Appl. Math., vol. XIV, pp. 353-359, 1957. §7. These results were presented in
R. Bellman, Eigenvalues and Functional Equations, Proc. Am. Math. Soc., vol. 8, pp. 68-72, 1957. §S. This discussion follows
R. Bellman, On a Class of Variational Problems, Quart. Appl. Math. vol. XIV, pp. 353-359, 1957. For a number of results dependent upon the fact that Jacobi matrices arise in the treatment of random walk processes, see
Dynamic Programming
161
S. Karlin and J. McGregor, Coincident Properties of Birth and Death Processes, Technical Rept. 9, Stanford University, 1958. P. Dean, The Spectral Distribution of a Jacobian Matrix, Proc. Cambridge Phil. Soc., vol. 52, pp. 752-755, 1956. §9. This discussion, given for the case of finite matrices, can be readily extended to cover more general operators. An application of this technique is contained in R. Bellman and S. Lehman, Functional Equations in the Theory of Dynamic Programming-X: Resolvents, Characteristic Functions and Values, Proc. Natl. Acad. Sci. U.S., vol. 44, pp. 905-907, 1958. §1G-§11. The contents of these sections are taken from a more general setting in
R. Bellman and S. Lehman, Functional Equations in the Theory of Dynamic Programming-IX: Variational Analysis, Analytic Continuation and Imbedding of Operators, Duke Math. J., 1960. For an extensive discussion of matters of this type, see C. L. Dolph, A Saddle Point Characterization of the Schwinger Stationary Points in Exterior Scattering Problems, J. Soc. Ind. Appl. Math., vol. 5, pp. 89-104, 1957. C. L. Dolph, J. E. McLaughlin, and I. Marx, Symmetric Linear Transformations and Complex Quadratic Forms, Comm. Pure and Appl. Math., vol. 7, PP. 621-632, 1954. §12 to §14. The results are taken from R. Bellman, On Some Applications of Dynamic Programming to Matrix Theory, Illinois J. Math., vol. I, pp. 297-301, 1957.
§16 to §16. The results follow
R. Bellman, Dynamic Programming and Mean Square Deviation, The RAND Corporation, Paper P-1147, September 13, 1957. §17. This result was given by
T. J. Stieltjes, Sur les racines de XII 1887.
=
0, Acta Math., vol. 9, 1886-
A generalization of this result will be given in Chap. 16. For another application of this device, see R. Bellman, On the Non-negativity of Green's Functions, Boll. Unione Matematica, vol. 12, pp. 411-413, 1957.
162
Introduction to Matrix Analysis
For some further applications of functional equation techniques to quadratic forms, see R. E. Kalman and R. W. Koepcke, Optimal Synthesis of Linear Sampling Control Systems Using Generalized Performance Indexes, Am. Soc. Mech. Eng., Instrument and Regulators Conferences, Newark, April 2-4, 1958. R. Bellman and J. M. Richardson, On the Application of Dynamic Programming to a Class of Implicit Variational Problems, Quart. Appl. Math., 1959.
M. Freimer, Dynamic Programming and Adaptive Control Processes, Lincoln Laboratory Report, 1959. H. A. Simon, Dynamic Programming under Uncertainty with a Quadratic Criterion Function, Econometrica, vol. 24, pp. 74-81,1956. See Chap. 17 for many further results and references. The study of routing problems introduces a new type of matrix composition. Write C = M(A,B) if Cii = min (a.", + b"'i). See k
R. Bellman, K. L. Cooke, and Jo Ann Lockett, Algorithms, Graphs, and Computers, Academic Press Inc., New York, 1970. C. Berge, The Theory of Graphs and Ita Application8, John Wiley & Sons, New York, 1962. J. F. Shapiro, Shortest Route Methods for Finite State Deterministic Dynamic Programming Problems, SIAM J. Appl. Math., vol. 16, pp. 1232-1250, 1968.
10 Matrices and Differential Equations 1. Motivation. In this chapter which begins the second main portion of the book, we shall discuss the application of matrix theory to the solution of linear systems of differential equations of the form
:i
r N
=
aiJXJ
i = 1,2, . . . , N
Xi(O) = c.
(1)
;=1
where the aiJ are constants. In order to understand the pivotal positions that equations of this apparently special type occupy, let us explain a bit of the scientific background. Consider a physical system S whose state at any time t is assumed to be completely described by means of the N functions Xl(t), X2(t), ••• , XN(t). Now make the further assumption that the rate of change of all these functions at any time t depends only upon the values of these functions at this time. This is always an approximation to the actual state of affairs, but a very convenient and useful one. The analytic transliteration of this statement is a set of differential equations of the form dx.
di = J.( Xl,X2,
••• ,XN)
i = 1, 2, . . . , N
(2)
with an associated set of initial conditions Xi(O) = c.
i = 1, 2, . . . , N
(3)
The vector c = (Cl,C2, ••• ,CN) represents the initial state of the system. Sets of constants, Ic.1, for which
i = 1,2, . . . , N (4) playa particularly important role. They are obviously equilibrium states since S cannot depart from them without the intervention of external forces. !i(Cl,C2, ••• ,CN)
= 0
163
Introduction to Matrix Analysis
1M
Whenever such states exist, it is of great interest to study the behavior of the system in the neighborhood of these states. In other words, we are examining the 8tability of the system under small disturbances. If the perturbed system eventually returns to the equilibrium state, we say that it is 8table; otherwise, we say that it is un8table. These considerations are of great practical significance. In order to carry out this study, we set Xi = Ci
+ Yi
(5)
where the Yi are taken to be small quantities. Substituting in (2), we obtain the equations dYi di
=
= fi(CI
+ YI, CI + Ylt
••• ,
2:
CN
+ YN)
i == 1,2, . . . , N
N
= fi(Cl,CI, • • • ,CN)
+
a;jYj
+
(6)
i-I
where aq = afi aXj
at Xl ==
CI, XI
==
CI, • • • , XN
= CN
(7)
and the three dots signify terms involving higher powers of the Yi. The behavior of S in the neighborhood of the equilibrium state, lcd, is thus determined, to an approximation whose accuracy must be carefully examined, by the linear system with constant coefficients given in (1).
We have then a powerful motivation for the study of linear systems of this type. Our aim is to determine analytic representations of the solution which will permit us to ascertain its limiting behavior as t _ 00. These questions of stability will be taken up again in Chap. 13. 2. Vector-matrix Notation. To study (1.1), we introduce the vectors Y and c, possessing the components Y. and Ci, respectively, and the matrix A = (a;j). It is clear from the way that the difference of two vectors is defined that the appropriate way to define the derivative of a vector is the following: dYI
dt dy dt
dYI (it (1)
= dYN
de
Matrices and Differential Equations
165
Similarly, the integral of yet) is defined to be
f
r
r
Yl(S) ds Y2(S) ds
yes) ds =
(2)
r
YN(S) ds
The derivatives and integrals of matrices are defined analogously. It follows that (1.1) can be written dy
-dt = Ay
y(O) = c
(3)
The matrix A will, in general, not be symmetric. Consequently, the techniques and results of the first part of this volume can be expected to playa small role. We shall have to develop some new methods for treating general square matrices. A vector whose components are functions of t will be called a vector function, or briefly, a function of t. It will be called conhnuous if its components are continuous functions of t in the interval of interest. We shall use similar terms in describing matrix functions. EXERCISES 1. Show that
(a) i (x,y) = (~,y) + (x,~~) (b) (c)
(d) (e)
~ (Ax)
i
(AB)
.!!. (X-I)
dt
!! de (Xn)
ed~) x + A ~ = (~~) B + A (~~)
=
(dX) de X-I =- (dX) Xn-l + X (dX) Xn-I + ... de de = _ X-I
J. Obtain an equation for the derivative of
X~S.
3. Norms of Vectors and Matrices. We could, if we so desired, use the scalar function (x,x) as a measure of the magnitude of x. However, it is more convenient to use not these Euclidean norms, but the simpler function N
Ilxll =
LIXil
i-I
(1)
166
Introduction to Matrix Analysis
and, for matrices, N
IIAII
=
l
iJ- 1
l
(2)
It is readily verified that
IIx + yll ::; IIxil + IIyll IIA + BII ::; IIAII + IIBII IIAxil ::; HAil IIxil IIABII ::; IIAII IIBII IIclxll = ICII IIxil IIcl Ail = Icd IIAII
(3)
The reason why we have chosen the foregoing norms for vectors and matrices is that the verification of the results in (3) is particularly simple. As we shall see in the exercises below, there are a large number of choices of norms which are equally useful when dealing with finite dimensional vectors and matrices. It is only when we turn to infinite dimensional vectors and matrices that the choice of a norm becomes critical. EXERCiSES 1. Show that we could define as norms satisfying (3) the functions N
IIxll - ( llx,I2yi - (x,f)~ i-I N
IIAII ... (
l
lajjll)l-i ... tr
(AA .)~
'.i-l
I. If we set IIxll = max Ix,l, what definition should we take for IIA II in order that i
ali of the inequalities of (3) he valid? 8. Let Ilxll be a vector norm satisfying the vector relations in (3) and the condition that IIx II ~ 0 and Ilx II = 0 if and only if x ... O. Show that IIA II ... max IIA x II is a -I matrix norm which satisfies the remaining relations in (3). (This is the""'''standard way of inducing a norm for transformations, given a norm for vectors.) fl. If we use the norm for x appearing in Exercise 2, which norm do we obtain for A using the technique of Exercise 3? IS. Show that convergence of a sequence of vectors Ix"l to a vector x implies and is implied by convergence of the kth components of the members of the sequence Ix"l to the kth component of x. 8. Show that convergenCe of a sequence of vectors Ix"l in one norm satisfying the conditions of Exercise 3 implies convergence in any other norm satisfying these conditions. 7. Show that IIfx(l) dIll 5 fllx(l) II dI II fA (I) dIll 5 filA (I) II dl
8. Show that IIAnll :S IIAII" for any norm satisfying the condition in (3). t. Is there any norm satisfying the conditions in (3) for which IIABII ... IIA II IIBII?
Matrices and Differential Equations
167
4. Infinite Series of Vectors and Matrices. In the course of establishing the existence of solutions of the linear vector equation appearing above, we shall have need of infinite series of vectors and matrices. By
..
the vector
l
X",
.. -0
sum of the series
we shall mean the vector whose ith component is the
..
1
Xi".
Thus, the convergence of the vector series is
..
n-O
equivalent to the simultaneous convergence of the N series,
1
Xi".
It
,,-0
..
.
n~O
n=O
follows that a sufficient condition for convergence of the vector series
1 x" is that the scalar series L Ilx"lI converge. .
Similarly, a matrix series of the form
LAn represents N2 infinite n=O
series, and a sufficient condition for convergence is that
..
I11Anll n-O
converge. 6. Existence and Uniqueness of Solutions of Linear Systems. With these preliminaries, we are ready to demonstrate the following basic result. Theorem 1. If A (t) is continuous for t ~ 0, there is a unique solution to the vector differential equation dx dt
- = A(t)x This solution exists for t
~
x(O)
= c
(1)
0, and may be written in the form
x = X(t)c
(2)
where X(t) is the unique matrix satisfying the matrix differential equation dX
dt
=
A(t)X
X(O) = I
(3)
Proof. We shall employ the method of successive approximations to establish the existence of a solution of (3). In place (,f (3), we consider the integral equation (4) x = I + Jot A (s)X ds
Introduction to Matrix Analysis
168
Define the sequence of matrices
= I X,,+I = I
IX"I as follows: (5)
Xu
+ JO'
A(s)X" ds
n
= 0, 1, . . .
Then we have
X"+l - X"
=
Let
JO'
A(8)(X" -
X,,_I) d8
n
= 1,2, .
m = max IIA(s)11
(6) (7)
0:9:9.
Here and in what follows, we are employing the norms defined in (3.1) and (3.2). Using (6), we obtain
h'
IIX"+I - X"II = II A(s)(X" - X,,_I) dsll 5 JO' IIA(s)IIIIX" - X"_III d8 ::; m IIX" - X"_III d8
h'
for 0 ::; t ::; h.
(8)
Since, in this same interval,
IIXI - Xoll ::; JO' IIA(s)11 ds
::; mt
(9)
we have inductively from (8), (10)
..
Hence, the series
l
(X"+l - X,,) converges uniformly for 0 ::; t ::; t l •
,,-0
Consequently, X" converges uniformly to a matrix X(t) which satisfies (4), and thus (3). Since, by assumption, A (t) is continuous for t ~ 0, we may take t l arbitrarily large. We thus obtain a solution valid for t ~ O. It is easily verified that x = X(t)c is a solution of (1), satisfying the required initial condition. Let us now establish uniqueness of this solution. Let Y be another solution of (3). Then Y satisfies (4), and thus we have the relation X - Y =
JO' A (s)(X(s)
- Y(s» ds
(11)
Hence
IIX - YII ::; JO' IIA(s)IIIIX(s) -
Y(s)11 d8
(12)
Since Y is differentiable, hence continuous, define
ml = max O:$ISI1
IIX - YII
(13)
Matrices and Differential Equations
169
From (12), we obtain
IIX - YII ~
m, fol IIA(s)11 ds
0 =:;; t
~ t,
(I4)
Using this bound in (12), we obtain
IIX - YII ~ m, fol IIA(s)11 (!o'IIA(s,)11 ds,) d8
< m, (fol IIA (s) II ds -
2
r
(15)
Iterating, we obtain
IIX - YII ~
m,
(fol IIA(s)1I d8)"+! (n
+ 1)!
(16)
Letting n ~ 00, we see that IIX - YI\ ~ O. Hence X == Y. Having obtained the matrix X, it is easy to see that X(t)~ is a solution of (1). Since the uniqueness of solutions of (1) is readily established by means of the same argument as above, it is easy to see X(t)e is the solution. EXERCISE 1. Establish the existence of a solution of (1) under the condition that A (t) is a Riemann-integrable function over any finite interval. In this case, need the differential equation be satisfied everywhere? Examine, in particular, the case where A (I) == A, 0 ::; I ::; 10, A (I) = B, I > 10.
6. The Matrix Exponential. Consider now the particular case where A(t) is a constant matrix. In the scalar case, the equation du dt
=
au
u(O) = e
(1)
has the solution u = elite. It would be very convenient to find an analogous solution of the matrix equation dX = AX
dt
X(O)
= C
(2)
having the form X = eAtC. By analogy with the scalar case, or from the method of successive approximations used in Sec. 5, we are led to define the matrix exponential function by means of the infinite series eAI = I
Let us now demonstrate
+ At +
A"t"
+nr+
. ..
(3)
170
Introduction to Matrix Analysis
Theorem 2. The matrix series defined above exists for all A for any fixed value oft, and/or all tfor anyjixed A. [tconverges uniformly in any finite r6gion of the complex t plane. Proof. We have
_"A_"t_"I
nl
< ",--IIA-,-,-;I ",-,-Itl" - nl
(4)
Since IIAII"ltl"/nl is a term in the series expansion of ell All 1'1, we see that the series in (3) is dominated by a uniformly convergent series, and hence is itself uniformly convergent in any finit.e region of the t plane. EXERCISE
1. Using the infinite series representation, show that d/dl(e AI )
= Ae A '
...
eAtA.
7. Functional Equations-I. The scalar exponential function satisfies the fundamental functional equation (1)
Unless there is an analogue of this for the matrix exponential, we have no right to use the notation of (3). Let us now demonstrate that (2)
Using the series expansions for the three exponentials and t.he fact that absolutely convergent series may be rearranged in arbitrary fashion, we have
=
~ A: (8 + t)" =
~
,,-0
nl
eA(H')
(3)
From (2) we obtain, upon setting s = -t, the important result that (4)
Hence, eA , is never singular and its inverse is e- A ,. This is a matrix analogue of the fact that the scalar exponential never vanishes.
Matrices and Differential Equations
171
8. Functional Equations-II. The proof of the functional equation in Sec. 7 was a verification rather than a derivation. In order to understand the result, let us turn to the differential equation
dX dt
=
AX
(1)
Observe that e AI is a solution with the boundary condition X(O) = I, and that eA(.+t) is a solution with the boundary condition X(O) = eA•• Hence, from the uniqueness theorem, we may conclude that (2)
9. Functional Equations-III. Having derived the functional equation discussed above, the question naturally arises as to the relation between e(A+B)1 and eAle BI • Since e(A+B)1
eAle
BI
+ (A + B)t + (A +2 B)2 t2 + ...
= I
2
=
22
+ At + 2A 2t + . ..) ( I + Bt + 2B t + 22 + (A + B)t + A22t + ABt + B2t +
(I
= I
. -)
(1)
2
2
we see that e(A+B)1 -
eAle B1
2 = (BA - AB) -t
2
+
(2)
Consequently, e(A+B) I = eAle BI for all t only if A B = B A, which is to say if A and B commute. It is easy to see that this is a sufficient condition. 10. Nonsingularity of Solution. We observed in Sec. 7 that eAI is never singular. Let us now point out that this is a special case of the general result that the solution of the equation
dX = A(t)X dt
X(O)
=I
(1)
is nonsingular in any interval 0 ::; t ::; t 1 in which foil II A (t) II dt exists. There are several ways of establishing this result. The first is the most interesting, while two other methods will be given in the exercises below. The first method is based upon the following identity of Jacobi: IX(t)1
= ef~ tr (A(.»d.
(2)
To derive this result, let us consider the derivative of the scalar function IX(t)l. To simplify the notational problem, consider the two-
Introduction to Matrix Analysis
172
dimensional case.
We have IX(t)1
=
IX2Xl
YI Y2
I
(3)
where dYI
df
dY2
di
= allYl
+ anY2
= anYI
+ allY2
(4)
and XI(O) = 1
X2(0) = 0
YI(O) = 0 Y2(0) = 1
(5)
Then
(6)
Thus IX(t)1 =
since
Ix (0) I =
eJ~ tr A(.)d.
(7)
1. EDRCISES
1. Consider the equations de: - A(t)X, X(O) - I, and dY /dt - - YA('), Y(O) - 1. Show that Y - X-I, and conclude that X(I) is nonsingular in any interval where IIA (I) II is integrable. I. Consider the second-order differential equation u" + p(l)u' + q(l)u - O. Write u' - 11 and obtain a first-order system corresponding to this aecond-order equation, namely,
du
dt - 11 dl1
di S. Consider the integral J we obtain
/0'
-p(')11 - q(l)u
w(u"
+ p(')u' + q(l)u) dt.
Integrating by parte
Matrices and Differential Equations
173
What is the connection between the vector-matrix system for w" + Pi (I)W' + + p(l)u' + q(l)u = O? ,. Consider the following proof of the nonsingular nature of X(t). If IX(t)/ - 0 at a point h, 0 ::; I• .::;; I., there exists a nontrivial relation between the vectors constituting the columns of X(I), CIZ 1 + CtZ l + ... + CNZ N ... O. Since CIZ 1 + CtZ l + ... + CNZN is a solution of the equation dz/dl ... A (I)z, if it is zero at one point, it is zero for all I in [0,11). The relation IX(I)I - 0, however, clearly does not hold at , - 0; a contradiction. ql(l)w ... 0 and the vector-matrix system for u"
11. Solution of Inhomogeneous Equation-Constant Coefficients. Let us now consider the problem of solving the inhomogeneous system dx = Ax dt
-
+ f(t)
(1)
x(O) = e
The utility of the matrix exponential notation shows to advantage here. We have e- AI ( dX - - Ax) = -d (e-Alx) = e-Alf (2) dt dt Hence, CAlX = e + 10' e-A'j(s) ds (3) x = eAte
or
+ 10' eA(/-o)f(s) ds
(4)
Observe how the use of matrix exponential function permits us to obtain the solution of (1) in exactly the same fashion as if we were dealing with a scalar equation. 12. Inhomogeneous Equation-Variable Coefficients. Consider the case where A is time-dependent. We wish to solve dx (1) x(O) = c dt = A (t)x + f(t) Let us use the Lagrange variation of parameters technique and attempt to find a solution of the form x = Xy where X = X(t) is the solution of the equation dX/dt = A(t)X, X(O) = I. Substituting in (1), we obtain the equation X'y
+ Xy' =
A(t)Xy
+ Xy'
=
A(t)Xy
+ f(t)
(2)
Hence Xy' = f(t)
(3)
whence y' = X-'(t)f(t)
or Consequently,
y = e
+ 10' X-l(s)f(s) ds
x = X(t)e
+ 10' X(t)X-l(s)f(s) ds
a generalization of the result of (11.4).
(4) (5) (6)
Introduction to Matrix Analysis
174
13. Inhomogeneous Equation-Adjoint Equation. Let us now l>ursu~ a different approach, one of great importance in the general theory of linear functional equations. Take Y(t) to be a variable matrix, as yet unspecified, and integrate between 0 and t the equation Y(t)
~=
Y(t)A(t)x
+ Y(t)f(t)
(1)
The result is, upon integrating by parts, Y(t)x(t) -
Y(O)c -
h' ~~
x(s) ds =
h'
Y(s)A(s)x(s) ds
+
h'
Y(s)f(s) ds
(2)
Without loss of generality, we can take c = 0, since we can obtain the solution of (12.1) by adding to the solution of this special case the vectol' X(t)c. Since our aim is to solve for x(t), suppose that we make the most convenient assumptions that we can, namely, that dY ds = - Y(s)A(s) Y(t)
=
(3a)
I
(3b)
If we can satisfy both of these equations, we can write x in the simple form (4) x = 10' Y(s)f(s) ds The equation in (3) is called the adjoint equation. We know from the general existence and uniqueness theorem established previously that a unique solution to (3) exists. The matrix Y will now be a function of sand t. EXERCISE
1. Show that Y(8) ... X(I}-lX(8).
14. Perturbation Theory. An interesting application of the formula for the solution of the inhomogeneous equation is in the direction of perturbation theory. Given the matrix exponential eM •B , we wish to evaluate it as a power series in t, eM • B
=
A
e
+
.
l
,,-1
(1)
t"Q,,(A,B)
The problem is readily resolved if A and B commute, since then Let us then consider the interesting case where AB
¢ BA.
Matrices and Differential Equations If we write
.
e·(+·B
= I
+
L
(A
+ EB)" nl
175 (2)
n-l
and attempt to collect the terms in E, we soon see that it is quite difficult to do this in a systematic fashion. In place of this direct procedure, we pursue the following route. The matrix e,H.B is the solution of the differential equation dX (3) X(O) = I = (A + EB)X dt evaluated at the point t = 1. Let us write this equation in the form
+ EBX
dX = AX dt
X(O) = I
(4)
It follows from (11.4) that X satisfies the linear integral equation
+
X = eAI
E
fo' eA(H)BX(s) ds
(5)
Solving this Volterra integral equation by iteration, we obtain an infinite series of the form X = eAI
Hence, eM
•B
+
t
fo' eA(I-.)Be A• ds
+ ...
(6)
has as the first two terms of its series expansion eA+. B
1. Set X(I) = eA.
+
= eA +
.
L
E
10
1
eA(I-.>Be A• ds
+
(7)
EXERCISES ,nP.(t) and use (5) to determine recurrence relations
n=1
connecting Pn(l) and P n- 1 (t). 2. Assuming for the moment that eA'e B' can be written in the form ee (a result we shall establish below), where C = Cd + Clt l + Call + ... , determine the coefficient matrices Cl , CI , Ca. a. Assuming that eM • B can be written in the form determine the matrices C Cz, C3• t " (The perturbation expansion in (7) possesses the great defect that it takes a matrix eA+'B, which will be unitary if A and B are skew-Hermitian, and replaces it by an approximation which is nonunitary. The perturbation expansion given above does not suffer from this.)
t See also F. Fer, A cad. Roy. Belg. Cl. Sci., vol. 44, no. 5, pp. 818-829, 1958.
176
Introduction to Matrix Analysis
16. Non-negativity of Solution. The following question arises in mathematical economics. Consider the equation dx = Ax dt
+ f(t)
-
x(O)
=c
(I)
where A is a constant matrix. What are necessary and sufficient conditions upon A in order that all the components of x be non-negative for t ~ 0 whenever the components of c are non-negative and the components of f(t) are non-negative for t ~ O? Referring to (11.4), we see that a sufficient condition is that all elements of eAI be" non-negative for t ~ 0, and it is easily seen that this condition is necessary as well. It is rather surprising that there is a very simple criterion for this condition. Theorem 3. A necessary and sufficient condition that all elements of eAI be non-negative for t ~ 0 is that i¢j
Proof. Since eAI = I
+ At + ...
(2)
(3)
it is clear that the condition in (2) is necessary for the result to be true for small t. To establish the sufficiency, let us show that CJ;j > 0, i ¢ j, implies that the elements of eAI are positive for all t. It is clear that CJ;j > 0 implies that eAI has positive elements for small t. Since (4)
for any integer n, the fact that the product of two positive matrices is positive yields the requisite positivity. Sinle the elements of eA' are continuous functions of the CJ;j, we see that the positivity of the elements of eAI for lJoJ > 0, i ¢ j, implies the nonnegativity of the elements of eAI for lJoJ ~ 0, i ¢ j. Another more direct proof proceeds as follows. Let CI be a scalar so that all the elements of A + ctl are non-negative. Then, clearly, all the elements of e(A+o,1)1 are non-negative. Also, the elements of e- o,11 are non-negative since exponentials are always non-negative. Since eAI _ _
observing that A non-negativity.
+ clI
e(A+c,l)l-C111 e(A+o,1)1e-"111
(5)
and -ciI commute, we have the desired
177
Matrices and Differential Equations EoXERCISES 1. Prove the foregoing result by using the system of differential equations
~'
N
=
L
i == 1, 2, . . . , N
x,(O) ... c,
a'jxi
i-I
2. Show that the result B,j(1) ~ 0, i ~ j, is sufficient to ensure that the solution of the corresponding equation with variable coefficients is non-negative if the initial conditions are non-negative.
16. Polya's Functional Equation. satisfies the functional equation Y(s
+ t)
= Y(s)Y(t)
-
00
We have seen that Y(t) = eAI
< s,t <
00
Y(O)
=I
(1)
An interesting and important question is whether or not there are any other types of solutions of this fundamental matrix equation. If Y(t) has a derivative for all finite t, the answer is simple. Differentiating first with respect to s and then with respect to t, we obtain the two equations Y'(s Y'(s
+ t) + t)
= Y(s) Y'(t) = Y'(s)Y(t)
(2)
Y(s)Y'(t) = Y'(s)Y(t)
(3)
Hence From (1) we see that Y(O) = Y( -t)Y(t), which shows that Y(t) cannot be singular for any value of t. Thus (3) yields Y-l(S) Y'(s) = Y'(t) y-l(t)
for all sand t. The equation
(4)
Thus we must have Y'(t) Y-l(t) = A, a constant matrix. Y'(t)
= A Y(t)
(5)
then yields Y(t) = eAIY(O) = e AI• Let us now prove a stronger result. Theorem 4. Let Y(t) be a continuous matrix function of t satisfying the functional equation in (1) for 0 ::;; s,t, s + t ::;; to. Then in [O,toI, Y(t) is of the form eAt for some constant matrix A. Proof. Consider the Riemann integral
fo' Y(s) ds =
N
lim
L
Y(k5)5
(6)
41--+0 k-O
where N is determined by the condition that (N (1) for 8 > 0, Y(k8)
=
+ 1)5
Y«k - 1)8) Y(8) = Y(8)t
=
t. Since, from (7)
118
Introduction to Matrix Analysis
we see that
(8) We also have
2: N
[Y(8)8 - I]
Y(k8)8 = Y«N
+ 1)8)
- I
(9)
"'-0 which leads to the result
We have assumed that Y(O) = I.
Hence, for small t,
fo' Y(s) ds = tI + o(t) and thus
fo' Y(s) ds is nonsingular for small t.
(11)
It follows that for fixed
N
small t and sufficiently small 8, the matrix N
lim ~o
(l
"'-0
Y(k8)8)-1 =
l
Y(k8)8 is nonsingular and
"'-0 (fo' Y(s) ds)-I
(12)
Hence (10) yields the fact that [Y(8) - IlleS has a limit as 8 - O. this limit A. Then A =
for small t.
~ (Y(8)8 - I)
= [Y(t) - II
[h'
r
Call
l
Y(s) ds
(13)
From this we obtain A
10' Y(s) ds =
Y(t) - I
(14)
Y(O) = I
(15)
and, finally, Y'(t) = AY eAI
for small t. Since Y(t) = Y(t) = eAI for all t in [O,to].
for small t, the functional equation yields EXERCISES
1. What is wrong with the following continuation of (10): From (10) we have N
lim [Y(8)8 - 1] lim ['\' Y(k8)8] ... Y(t) - I
'-0 ",fo
~o
lim [Y(8) - 1] (' Y(8) dB ... Y(t) - 1
~o
••7
8
jo
179
Matrices and Differential Equations I. Let Y and Z denote the solutions of dY = A(I)Y dl
~
Y(O) = I
Z(O) ... I
= ZB(I)
Then the solution of
~~ =
A(I)X
is given by X = YCZ. 8. Write the linear equation u"
+ XB(I)
=
X(O)
+ p(I)U' + q(l)u
C
== 0 in the form
Use the preceding result to show that the linear equation whose general solution is u ... alul l + aiU1UI + a,ul l , where Ul and UI are a set of linearly independent solutions of the second-order equation, is U'"
+ 3p(l)u" + [2p '(l) + p'(I) + 4q(I»)u' + [4p(l)q(l) + 2q'(I)ju
17. The Equation dX/dt = AX Theorem 5. Theorem Ii. The solution of
dX dt
=
AX
+ XB.
+ XB
... 0
Let us now demonstrate
X(O)
=
C
is given by (2)
The proof follows immediately by direct verification. The result, although simple (and a special case of Exercise 2 of Sec. 16), plays an important role in various parts of mathematical physics. EXERCiSES 1. Obtain the solution given above by looking for a solution of the form X ... eAI Y. 2. Let Y(I) be a square root of X(I). Show that
(~~) y + y ed~) 18. The Equation AX + XB = C. establish Theorem 6. Theorem 6. If the expression X
= -
/0"
=
d;
Using the foregoing result, we can
eAICe BI
dt
(1)
exists for all C, it represents the unique solution of
AX
+ XB
=
C
(2)
180
Introduction to Matrix Analysis
Proof.
Consider the equation dZ = AZ
dt
+ ZB
Let us integrate both sides between 0 and lim Z(t) = O. The result is
Z(O) = 00,
(J
(3)
under the assumption that
,.....
- C = A
(fo" Z dS) + (fo" Z dS) B
(4)
We see then that satisfies (2). In Chap. 12, we shall discuss in some detail the existence of the integral in (1) and the solutions of (2). The uniqueness of solution follows from the linearity of the equation in (2). Considering this matrix equation as N2 linear equations for the elements Xoi of X, we see that the existence of a solution for all C implies that the determinant of the coefficients of the Z'I is nonzero. Hence. there is a unique solution. MISCELLANEOUS EXERCISES
1. Consider the sequence of matrices IX.. l defined by
X.. +l
...
X ..(2I - AX..)
XI - B
Under what conditions on A and 1J does this sequence converge, and. if 80. to what does it converge? I. If AB - BA ... 1, and Cl and CI are scalars, then
See R. A. Sack,l where the general problem of determining the expansion of f(A
+ . . . , for the case where AB
~
+ B)
Bee also W. O. Kermack and W. H. McCrea l and R. Kubo,' where related expansions in terms of commutators appear. 8. What is the analogue of the result in Exercise 2 if AB - BA ... cIA? " If A is positive definite, B(t) is positive definite for t ~ 0, and IAI ~ IB(t)1 for all - f(A)
t
~ 0,
then IA I
~
Ifo"
B(t)g(t) dt
I
BA, is discussed.
for any non-negative function g(t) such that
fo" g(t) dt ... 1. I R. A. Sack, Taylor's Theorem for Shift Operators, Phil. Mag., vol. 3, pp. 497-503, 1958. I W. O. Kermack and W. H. McCrea, Proo. Edinburgh Math. 800., vol. 2 (220)• • R. Kubo, J. Chern. PhI/B., vol. 20, p. 770, 1952.
Matrices and Differential Equations ~
15. If A is symmetric positive definite and A defined by the recurrence relation X,,+I = X"
+ ~(A
181
1, then the sequence of matrices
- X"I)
X o '" 0
converges to the positive definite square root of A. t 8. The problem of determining the values of Xsatisfying the determinantal equation 11 - 1I.F1 - X'F1 - • • • - X~F~I - 0, where the F 1 are square matrices of order N, is equivalent to determining the characteristic values of F1
o
••
1
o
F~] 0 0 1 0 ,
See K. G. Guderley, On Nonlinear Eigenvalue Problems for Matrices, J. Ind. and Apul. Math., vol. 6, pp. 335-353, 1958. T. Let H be a function of a parameter I, H = H(I). Then
8. Show that
9. If (A,H('») = oHio', then (A,f(H») == of(H)/ol. 10. Show that eABe- A - B + [A,B) + [A,[A,Bll/21
,.
11. Write e' ... elAelB, Z =
l
,.
Z"I",
Z' =
n-l
l
+ ".
nZ,.t,,-I.
Show that
't-l
+
12. Obtain recurrence relations for the Z" in this fashion, for example ZI ... A B, ZI = (A,B)/2, Za - ([A,[A,Bll ([A,B),Bll/12, . . . (Baker-Campbell-Hausdorff formula). For the results of Exercises 7-12 and many further results, See Wilcox l and
+
Eichler. 1 18. Show that
et(A+B)
== lim
(eAtl"eBtl")".
The result is given in 'Trotter, I and has
n.... ,.
interesting applications to the study of the SchrOdinger equation and Feynman integrals. See Faris.'
t C. Visser, Notes on Linear Operators, Proc. A cad. 8ci. Am8terdam, vol. 40, pp. 270-272, 1937. I R. M. Wilcox, Exponential Operators and Parameter Differentiation in Quantum Physics, J. Math. PhY8., vol. 8, pp. 962-982, 1967. I M. Eichler, A New Proof of the Baker-Campbell-Hausdorff Formula, J. Malh. Boc. Japan, vol. 20, pp. 23-25, 1968. I H. F. Trotter, On the Products of Semigroups of Operators, Proc. Am. Math. Boc., pp. 545-551, 1959. 'W. G. Faris, The Trotter Product Formula for Perturbations of Semibounded Operators, Bull. Am. Math. Boo., vol. 73, pp. 211-215, 1967.
182
Introduction to Matrix Analysis
14. Let x = y + Ax where A depends on a parameter I. Write x = y + Ry, defining the resolvent matrix. Show that R' = (I + R)A'(I + R). 111. Assume that A (t) has distinct characteristic values for 0 .:5 t .:5 h, XI, XI' . • . , XN and let XII XI, , XN, the N associated characteristic vectors, be orthonormal. Show that
,.. = L x.. -
(A'x..,x..) n = 1, 2, .•. , N (A'x..,x')Xj X X,
X:
ipl!"
Discuss the uee of these results for computational purposes (Kalaba-SchmaedekeVeruke). 18. Consider the matrix operation df(T) dT
= ( .. iJf(T» at'i
'1"
where f(T) is a complex-valued function of the NI elements of T and = ~2, i pi! j. Show that
.!!. (f(T)g(T»
dT
d~ 'I'(f(T» d~ ITI
= df(T) g(T)
dT
=
'I"(J(T»
'I.; ...
1, i
=j
+ f(T) dg(T) dT
d~<::)
= ITIT-I
The operator d/dT was introduced by Garding in the study of hyperbolic partial differential equations and has been ueed by Koecher and Maass in the study of Siegel modular functions. See Bellman and LehmanI for another application and references to the foregoing; also See Vetter. 1 17. Introduce the norms N
1lA1i.
=
IIAII I
=
IIA II..
=
max
Lla'il
l$i$N i-I
max
(AZ,Ax)~
(z,f) - I
N
max
Lla'il
l$i:=;;N i - I
c:r(A) =
~ lalil ii'
N(A) =
(~ la'lll) H '.1
M(A) = N max ;.1
la'il
I R. Bellman and R. S. Lehman, The Reciprocity Formula for Multidimensional Theta Functions, Proc. Am. Malh. Soc., vol. 12, pp. 954-961, 1961. I W. J. Vetter, An Operational Calculu8 for Matrice8, to appear.
Matrices and Differential Equations
183
For what A do the ratios of these norms achieve their upper and lower bounds. See B. J. Stone, Best Possible Ratios Of Certain Matrix Norms, Technical Report 19, Stanford University, 1962. 18. Let u(l) be a function of t n times differentiable over [0, T) and consider the quadratic form Q,,(a) =
loT
(u(,,)
+ alu(,,-I) + ... + a,.U)1 dt
Does min Q,,(a) necessarily approach zero as n -+ 00. Q
19. Consider the case where the interval is [- 00,00). Use Fourier integrals and the Parseval-Plancherel theorem to obtain an expreBSion for min Q,,(a) in terms of Q
orthogonal polynomials. (The foregoing is connected with differential approximation see Bellman and Kalaba l and Lew. ' ) 20. Consider the matrix equation X - UXV = W. Show that a formal solution
..
is X =
l
U·WVk.
When does the series converge and actually represent a solution?
.1:-0
See R. A. Smith, Matrix Equation XA + BX = C, 8IAM J. Appl. Math., vol. 16, pp. 198-201, 1968. 21. Consider the equation X = C + e(AX + XB), e a scalar parameter. Write X = C
+
..
l E"~,,(A,B). ,,-I
Show that
~"
=
A~"_I + ~"_IB, n ~
I, with
~o
= C, and
that
22. Introduce a position operator P with the property that when it operates on a monomial consisting of powers of A and B in any order with C somewhere, it shifts all powers of A in front of C and all powers of B after C. Thus
Further, define P to be additive, P(ml(A,B)
+ ml(A,B»
= P(ml(A,B»
+ P(ml(A,B»
where ml and ml are monomials of the foregoing types. Show that ~,,(A,B) = P«A + B)"C). 2S. Hence, show that X, as defined by Exercise 21, may be written X = P([I e(A + BW'C). 2'- Similarly show that if X = E + e(AX + XB + CXD), t~en X = P([I _lA + B + CF»)-IE), with P suitably defined. I R. Bellman and R. Kalaba, Quasilinearization and Nonlinear Boundary-value Problems, American Elsevier Publishing Company, Inc., New York, 1965. I A. Lew, Some Resull8 in Differential Approximation, University of Southern Cali· fornia Press, Los Angeles, USCEE-314, 1968.
Introduction to Matrix Analysis
184
215. Show that if all the characteristic roots of A are less than one in absolute value, the solution of A *XA - X ... -Q is given by
X where
f
=
(2lr.)-1
f
(A *
r 11)-IQ(A - ,1)-1,-1 dz
denotes integration round the circle
X = (211")-1 where (J
-
-
Izl
= 1.
Alternately,
f~r (0- 1) *Qo-I dB
A - Ie". See R. A. Smith.•
Bibliography and Discussion §1. The equilibrium or stability theory of differential systems was started independently, and almost simultaneously, by Poincar~ and Lyapunov. Further references and discussion will be found in Chap. 13 and in the book R. Bellman, Stability Theory of Differential Equatiom, Dover Publications, New York, 1969.
Some interesting transformations of matrix differential equations may be found in the following papers:
J. H. Barrett, Matrix Systems of Second Order Differential Systems, Portugalicae Math., vol. 14, pp. 7s}-89, 1955. J. H. Barrett, A Prufer Transformation for Matrix Differential Equations, Proc. Am. Math. Soc., vol. 8, pp. 510-518, 1957. 2 §2. The matrix exponential lies at the very heart of all advanced work in the field of linear functional equations. Suitably generalized to arbitrary operators, it is the foundation stone of the theory of semigroups, see E. Hille, Functional Analysis and Semi-groups, Am. Math. Soc. Pub., 1942. and its revised form • R. A. Smith, Matrix Calculations for Lyapunov Quadratic Forms, J. DiJf. Eq., vol. 2, pp. 208-217, 1966. R. A. Smith, Bounds for Quadratic Lyapunov Functions, J. MotA. Anal. Appl., vol. 12, pp. 42&-435, 1966. I J. J. Levin, On the Matrix Riccati Equation, Proc. Am. Math. Soc., vol. 10, pp. 519-524, 1959.
Matrices and Differential Equations
185
E. Hille and R. Phillips, Functional Analysis and Semi-groups, Am. Math. Soc. Colloq. Pub., vol. 31, 1958. In a second direction, the problem of expressing eAleBI in the form eet, where A and B do not commute, has important ramifications not only in the theory of Lie groups and algebras, but also in modem quantum field theory. The interested reader may consult W. Magnus, Algebraic Aspects of the Theory of Systems of Linear Differential Equations, Research Rept. BR-3, New York University, Institute of Mathematical Sciences, June, 1953; also Comm. Pure Appl. Math., vol. 7, no. 4, 1954. H, F. Baker, On the Integration of Linear Differential Equations, Proc. London Math. Soc., vol. 34, pp. 347-360, 1902; vol. 35, pp. 333-374, 1903; second series, vol. 2, pp. 293-296, 1904. H. F. Baker, Alternants and Continuous Groups, Proc. London Math. Soc., second series, vol. 3, pp. 24-47, 1904.
H. B. Keller and J. B. Keller, On Systems of Linear Ordinary Differential Equations, Research Rept. EM-33, New York University, Washington Square College, Mathematics Research Group, 1957. F. Hausdorff, Die symboIische exponential Formel in der Gruppentheorie, Saech8ischen Akademie der Wissenschaften zu Leipzig, Math.-phys. Klasse, vol. 58, pp. 19-48, 1906. K. Goldberg, The Formal Power Series for log (eSe ll ), Duke Math. J., vol. 23, pp. 13-21, 1956. t In another, and related direction, we meet the problem of expressing the solution of a linear system of the form dX/dt = A(t)X, X(O) = I, in the form of an II exponential." This question leads to the study of II product integrals"; see L. Schlesinger, Neue Grundlagen fur einen Infinitesimalkalkul der Matrizen, Math. Z., vol. 33, pp. 33-61, 1931. L. Schlesinger, Weitere Beitrage zum Infinitesimalkalkul de'r Matrizen, Math. Z., vol. 35, pp. 485-501, 1932. G. Rasch, Zur Theorie und Anwendung des Produktintegrals, J. rune angew. Math., vol. 171, pp. 65-119, 1934.
B. W. Helton, Integral Equations and Product Integrals, Pacific J. Math., vol. 16, pp. 297-322, 1968.
t See also Kuo-Tsai Chen, Integration of Paths, Geometric Invariants and a Generalized Baker-Hausdorff Formula, Ann. Math., vol. 65, pp. 163-178, 1957.
Introduction to Matrix Analysis
186
For more recent developments, see W. Magnus, On the Exponential Solution of Differential Equations for a Linear Operator, Comm. Pure Appl. Math., vol. 7, pp. 64s}-673, 1954. For an understanding of the physical interest in this problem, see R. P. Feynman, An Operator Calculus Having Applications in Quantum Electrodynamics, Phys. Rev., vol. 84, pp. 108-128, 1951. Finally, let us mention that a great deal of effort has been devoted to the generalization of the foregoing results and methods to infinite systems of linear differential equations with constant coefficients. Systems of this type arise in a natural fashion in probability theory, particularly in its applications to physics and biology in the study of /I birth-anddeath" processes. These matters will be discussed again in Chap. 16. See N. Arley and A. Borchsenius, On the Theory of Infinite Systems of Differential Equations and Their Applications to the Theory of Stochastic Processes and the Perturbation Theory of Quantum Mechanics, Acta Math., vol. 76, pp. 261-322, 1945. R. Bellman, The Boundedness of Solutions of Infinite Systems of Linear Differential Equations, Duke Math J., vol. 14, pp. 695-706, 1947.
§3. Further results concerning norms of matrices, and further references, may be found in
A. S. Householder, The Approximate Solution of Matrix Problems, J. Assoc. Comp. Mach., vol. 5, pp. 205-243, 1958.
J. von Neumann and H. Goldstine, Numerical Inverting of Matrices of High Order, Bull. Am. Math. Soc., vol. 53, pp. 1021-1099, 1947.
A. Ostrowski, Über Normen von Matrizen, Math. Z., Bd. 63, pp. 2-18, 1955.
K. Fan and A. J. Hoffman, Some Metric Inequalities in the Space of Matrices, Proc. Am. Math. Soc., vol. 6, pp. 111-116, 1955.
T. E. Easterfield, Matrix Norms and Vector Measures, Duke Math. J., vol. 24, pp. 663-671, 1957.
§5. The reader who is familiar with the theory of Lebesgue integration will see that the result of Theorem 1 can be obtained under much weaker conditions on A(t). However, since we have no need for the stronger result, we have not stated it.
§14. Variation of the characteristic values and characteristic vectors of A as the dimension changes can be discussed using the techniques presented in
R. Bellman and S. Lehman, Functional Equations in the Theory of Dynamic Programming-X: Resolvents, Characteristic Values and Functions, Duke Math. J., 1960.
§15. Non-negativity results of this type play an important role in probability theory and mathematical economics, where the physical model makes the result intuitively clear. This point will be discussed again in Chap. 16. They also play a role in the study of various classes of nonlinear equations; see
R. Bellman, Functional Equations in the Theory of Dynamic Programming-V: Positivity and Quasi-linearity, Proc. Natl. Acad. Sci. U.S., vol. 41, pp. 743-746, 1955.
R. Kalaba, On Nonlinear Differential Equations, the Maximum Operation, and Monotone Convergence, Ph.D. Thesis, New York University, February, 1958.
For an extensive discussion of the positivity of operators, see Chap. 5 of
E. F. Beckenbach and R. Bellman, Inequalities, Springer, 1961.
The first proof is due to S. Karlin, and the second proof to O. Taussky. The result was first presented in
R. Bellman, I. Glicksberg, and O. Gross, On Some Variational Problems Occurring in the Theory of Dynamic Programming, Rend. Circ. Mat. Palermo, Serie II, pp. 1-35, 1954.
§16. The result and proof follow
G. Polya, Über die Funktionalgleichung der Exponentialfunktion im Matrizenkalkül, Sitzber. Akad. Berlin, pp. 96-99, 1928.
The nature of the solutions of Y(s + t) = Y(s)Y(t) without the normalizing condition Y(0) = I is also of interest. See Shaffer.¹
§17. We are not aware of the origin of this result, which may be found in the papers of many authors. For an extensive discussion of this and related equations, see
J. A. Lappo-Danilevsky, Mémoires sur la théorie des systèmes des équations différentielles linéaires, Chelsea Publishing Co., New York, 1953.
¹ C. V. Shaffer, On Singular Solutions to Polya's Functional Equation, IEEE Trans. Automatic Control, vol. AC-13, pp. 135-136, 1968.
The fact that the solution of the equation dX/dt = AX + XB has the indicated form becomes quite natural when one examines the way in which it arises in quantum mechanics; cf.
D. ter Haar, Elements of Statistical Mechanics, Rinehart & Company, Inc., New York, p. 149, 1954.
Y. Nambu, Progr. Theoret. Phys. (Kyoto), vol. 4, p. 331, 1949.
H. Primas and H. Günthard, Eine Methode zur direkten Berechnung . . . , Helv. Phys. Acta, vol. 31, pp. 413-434, 1958.
§18. For a discussion of the operator T defined by TX = AX − XB, see
M. Rosenbloom, On the Operator Equation BX − XA = Q, Duke Math. J., vol. 23, pp. 263-269, 1956.
For a generalization to the operator S defined by SX = Σᵢ AᵢXBᵢ, see
G. Lumer and M. Rosenbloom, Linear Operator Equations, Proc. Am. Math. Soc., vol. 10, pp. 32-41, 1959.
Exercises 2 and 3 are taken from
R. Bellman, On the Linear Differential Equation Whose Solutions Are the Products of Solutions of Two Given Linear Differential Equations, Bol. Unione Mat. Italiana, ser. III, anno XII, pp. 12-15, 1957.
The intimate connection between matrix theory and the theory of the behavior of the solutions of linear differential equations, and thus with the study of linear oscillatory systems, leads to the beautiful theory of Gantmacher and Krein,
V. Gantmacher and M. Krein, Sur les matrices complètement non négatives et oscillatoires, Comp. Math., vol. 4, pp. 445-476, 1937.
As pointed out at the end of the remarks on Chap. 16, these results have important applications in probability theory. For a treatment of the equation
(Aλ² + Bλ + C)x = 0

where A, B, and C are symmetric, by variational techniques under suitable hypotheses concerning A, B, and C, see
R. J. Duffin, A Minimax Theory for Overdamped Networks, J. Rat. Mech. and Analysis, vol. 4, pp. 221-233, 1955.
See also
A. M. Ostrowski, On Lancaster's Decomposition of a Matrix Differential Operator, Arch. Rat. Mech. Anal., vol. 8, pp. 238-241, 1961.
H. Langer, Über Lancasters Zerlegung von Matrizenscharen, Arch. Rat. Mech. Anal., vol. 29, pp. 75-80, 1968.
The study of linear circuits leads in another way to an interesting domain of matrix theory, namely, the study of when a given matrix of rational functions can be considered to be the open-circuit impedance matrix of an n-port network. The concept of a positive real matrix enters here in a fundamental fashion. For work on this subject, and references, see
L. Weinberg and P. Slepian, Positive Real Matrices, Hughes Research Laboratories, Culver City, Calif., 1958.
11 Explicit Solutions and Canonical Forms

1. Introduction. Although the results obtained in Chap. 10 are of great elegance, it must be confessed that they do not satisfactorily resolve the problem of obtaining explicit scalar solutions. In this chapter, we shall pursue this question and show that it leads us to the question of determining the characteristic roots and vectors of matrices which are not necessarily symmetric. As in the case of symmetric matrices, this leads us to the study of various canonical forms for matrices.

2. Euler's Method. In Chap. 10, we demonstrated the fact that the vector-matrix equation

dx/dt = Ax,    x(0) = c    (1)

possessed a unique solution which could be represented in the form

x = e^{At} c    (2)

Here, we traverse another road. Following the method of Euler, let us begin by looking for special solutions of the form x = e^{λt}c^1, where λ is a scalar and c^1 a vector, ignoring for the moment the initial condition x(0) = c. Substituting, we see that λ and c^1 are bound by the relation

Ac^1 = λc^1    (3)

Since c^1 is to be nontrivial, λ must be a root of the characteristic equation

|A − λI| = 0    (4)

and c^1 must be an associated characteristic vector. From this quite different direction then, we are led to the basic problem treated in the first part of the book-the determination of characteristic roots and vectors. Here, however, the analysis is both more complicated and less complete due to the fact that the matrices we encounter are not necessarily symmetric.
EXERCISE

1. Find the characteristic roots and vectors of the matrix

A = | λ  1  0 |
    | 0  λ  1 |
    | 0  0  λ |
3. Construction of a Solution. Let us now see if we can construct a solution of the initial value problem along the preceding lines. As before, denote the characteristic roots by λ_1, λ_2, . . . , λ_N. Since these will, in general, be complex, there is now no natural ordering. To simplify our initial discussion, let us assume initially that the characteristic roots are distinct. Let c^1, c^2, . . . , c^N be a set of associated characteristic vectors, also, in general, complex. To put together the required solution, we use the principle of superposition. Since e^{λ_k t} c^k is a solution for k = 1, 2, . . . , N, the linear combination

x = Σ_{k=1}^N a_k e^{λ_k t} c^k    (1)

is a solution of dx/dt = Ax for any set of scalars a_k. The question is now whether or not these scalars can be chosen so that x(0) = c. This condition leads to the vector equation

Σ_{k=1}^N a_k c^k = c    (2)

This system has a unique solution provided that C is nonsingular, where

C = (c^1, c^2, . . . , c^N)    (3)

and, as before, this represents the matrix whose columns are the vectors c^k.

4. Nonsingularity of C. There are several ways of establishing the fact that C is nonsingular. The first method proceeds as follows. Suppose that |C| = 0. Then there exists a nontrivial set of scalars b_1, b_2, . . . , b_N such that

Σ_{k=1}^N b_k c^k = 0    (1)
It follows that

z = Σ_{k=1}^N b_k e^{λ_k t} c^k    (2)

is a solution of dx/dt = Ax satisfying the initial condition z(0) = 0. The uniqueness theorem asserts that actually z = 0 for all t. We must then show that the relation

Σ_{k=1}^N b_k e^{λ_k t} c^k = 0    (3)

cannot hold for all t if the {b_k} are a nontrivial set of scalars, none of the c^k is trivial, and the λ_i are all distinct. Without loss of generality, assume that b_N ≠ 0. Divide through by e^{λ_1 t} in (3) and differentiate the resultant expression with respect to t. The new expression has the form

Σ_{k=2}^N b_k c^k (λ_k − λ_1) e^{(λ_k − λ_1)t} = 0    (4)

Since the λ_k have been assumed distinct, the quantities λ_k − λ_1, k = 2, 3, . . . , N, are all distinct. We may now appeal to an inductive argument, or merely repeat the above procedure. It is clear that in this way we eventually reach a relation of the form

b_N c^N (λ_N − λ_1)(λ_N − λ_2) ··· (λ_N − λ_{N−1}) e^{(λ_N − λ_{N−1})t} = 0    (5)
Since c^N is nontrivial, we must have b_N = 0, a contradiction.

5. Second Method. Beginning with the relation in (4.1), multiply both sides by A. The result is

0 = Σ_{k=1}^N b_k λ_k c^k    (1)

Repeating this operation (N − 1) times, the result is the system of linear equations

0 = Σ_{k=1}^N b_k λ_k^r c^k,    r = 0, 1, 2, . . . , N − 1    (2)

Considering only the ith components of the c^k, the quantities c_{ik}, we obtain the scalar equations

0 = Σ_{k=1}^N b_k λ_k^r c_{ik},    r = 0, 1, 2, . . . , N − 1    (3)
Since not all of the b_k are zero, and since all of the c_{ik} cannot be zero as i and k take the values 1, 2, . . . , N, we see that (3) possesses a nontrivial solution for some i. It follows then that the determinant of the coefficients

|λ_k^r|,    k = 1, 2, . . . , N,  r = 0, 1, 2, . . . , N − 1    (4)
must be zero. This, however, is not true, as we shall see in a moment.

6. The Vandermonde Determinant. The determinant

            | 1          1          ···  1          |
            | λ_1        λ_2        ···  λ_N        |
|λ_k^r|  =  | λ_1^2      λ_2^2      ···  λ_N^2      |    (1)
            | ···                                   |
            | λ_1^{N−1}  λ_2^{N−1}  ···  λ_N^{N−1}  |

is a very famous one, bearing the name Vandermonde determinant. We wish to show that |λ_k^r| ≠ 0 if λ_i ≠ λ_j for i ≠ j. The simplest way to do this is to evaluate the determinant. Regard |λ_k^r| as a polynomial of degree N − 1 in λ_1. As a polynomial of degree N − 1, |λ_k^r| has the roots λ_1 = λ_2, λ_1 = λ_3, . . . , λ_1 = λ_N, since the determinant is zero whenever two columns are equal. It follows that

|λ_k^r| = (λ_1 − λ_2)(λ_1 − λ_3) ··· (λ_1 − λ_N) q(λ_2, λ_3, . . . , λ_N)    (2)

where q is a function depending only upon λ_2, λ_3, . . . , λ_N. Similarly, viewed as a polynomial of degree N − 1 in λ_2, |λ_k^r| must possess the factor (λ_2 − λ_1)(λ_2 − λ_3) ··· (λ_2 − λ_N). Continuing in this way, we see that

|λ_k^r| = ∏_{1 ≤ i < j ≤ N} (λ_j − λ_i) · φ(λ_1, λ_2, . . . , λ_N)    (3)

where φ is a polynomial in the λ_i. Comparing, however, the degrees in the λ_i of the two sides, we see that φ must be a constant. Examining the coefficients of λ_2 λ_3^2 ··· λ_N^{N−1} on both sides, we see that φ = 1. From this explicit representation, we see that |λ_k^r| ≠ 0 if λ_i ≠ λ_j for i ≠ j.

EXERCISE

1. Using similar reasoning, evaluate the Cauchy determinant

| 1/(λ_i + μ_j) |,    i, j = 1, 2, . . . , N
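A minimal numerical check of the product formula (3), using the NumPy library with arbitrarily chosen distinct roots:

```python
import numpy as np

# Arbitrary distinct characteristic roots (hypothetical example values).
lam = np.array([1.0, 2.5, -0.5, 4.0])
N = len(lam)

# Vandermonde matrix: entry (r, k) = lam_k ** r, r = 0, ..., N-1.
V = np.vstack([lam**r for r in range(N)])

det_direct = np.linalg.det(V)

# Product formula (3): prod_{i < j} (lam_j - lam_i), with phi = 1.
det_formula = np.prod([lam[j] - lam[i]
                       for i in range(N) for j in range(i + 1, N)])

print(det_direct, det_formula)   # the two values agree and are nonzero
```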
7. Explicit Solution of Linear Differential Equations-Diagonal Matrices. Let us now consider another approach, quite different from that given in the preceding sections. Given the equation

dx/dt = Ax,    x(0) = c    (1)

make a change of variable, x = Ty, where T is a constant nonsingular matrix to be chosen in a moment. The equation for y is then

dy/dt = T^{-1}ATy,    y(0) = T^{-1}c    (2)

What choice of T will simplify this equation to the point where the solution can be immediately obtained? Suppose that we can find a matrix T such that

T^{-1}AT = diag(μ_1, μ_2, . . . , μ_N)    (3)

a diagonal matrix. If this can be done, the equation in (2) decomposes into N independent equations of the form

dy_i/dt = μ_i y_i,    y_i(0) = c_i',    i = 1, 2, . . . , N    (4)

having the elementary solutions y_i = e^{μ_i t} c_i'. Once y has been determined, x is readily determined.
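A minimal computational sketch of this decoupling, using the NumPy library with arbitrarily chosen values for A and c:

```python
import numpy as np

# Hypothetical data: A has the distinct characteristic roots -1 and -2.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
c = np.array([1.0, 0.0])          # initial condition x(0) = c

mu, T = np.linalg.eig(A)          # columns of T are characteristic vectors
c_prime = np.linalg.solve(T, c)   # y(0) = T^{-1} c

def x(t):
    # y_i(t) = exp(mu_i t) * c'_i, then x = T y.
    return (T @ (np.exp(mu * t) * c_prime)).real

print(x(0.0))   # recovers c
print(x(1.0))
```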
8. Diagonalization of a Matrix. Let us now discuss the possibility of the diagonalization of A, a problem of great interest and difficulty. As we know from the first part of the book, if A is symmetric, real or complex, a matrix T possessing the required property can be found, with T^{-1} = T'. As we shall see, there are a number of other important classes of matrices which can be diagonalized, and, what is more important, there are other types of canonical representations which are equally useful. Consider, however, the general case. To begin with, we know that the {μ_i} must be the same set as the {λ_i}, since the characteristic roots of T^{-1}AT are the same as those of A. It follows then, as in previous chapters, that the columns of T must be characteristic vectors of A.
Conversely, if the λ_i are distinct and T is the matrix formed using the associated characteristic vectors as columns, it follows that

AT = T diag(λ_1, λ_2, . . . , λ_N)    (1)

We have thus established the following important result.
Theorem 1. If the characteristic roots of A are distinct, there exists a matrix T such that

T^{-1}AT = diag(λ_1, λ_2, . . . , λ_N)    (2)

As we shall see, the assumption that the roots are distinct is quite an important one. In the general case, no such simple representation holds, as we shall see in Sec. 10.

EXERCISES

1. Show that the Cayley-Hamilton theorem is valid for matrices with distinct characteristic roots.
2. Show that the assumption concerning N distinct characteristic roots may be replaced by one requiring N linearly independent characteristic vectors.
9. Connection between Approaches. As we have seen, one method of solution of the linear equation

dx/dt = Ax,    x(0) = c    (1)

leads to the expression x = e^{At}c, while on the other hand, a method based upon characteristic roots and vectors produces scalar exponentials. To obtain the connection between these approaches (still under the
assumption that λ_i ≠ λ_j), which must exist because of uniqueness of solution, in the equation

dX/dt = AX,    X(0) = I    (2)

make the change of variable X = TY, where T is as in Sec. 8. Then Y satisfies the equation

dY/dt = T^{-1}ATY,    Y(0) = T^{-1}    (3)

or

dY/dt = diag(λ_1, λ_2, . . . , λ_N) Y,    Y(0) = T^{-1}    (4)

It follows that

Y = diag(e^{λ_1 t}, e^{λ_2 t}, . . . , e^{λ_N t}) T^{-1}    (5)

whence

X = e^{At} = T diag(e^{λ_1 t}, e^{λ_2 t}, . . . , e^{λ_N t}) T^{-1}    (6)

Let us again note that this representation has been established under the assumption that the characteristic roots of A are distinct.

EXERCISES

1. Establish the representation in (6) directly from the exponential series and the representation for A.
2. Use (6) to show that |e^A| = e^{tr(A)}.
3. Use the method of continuity to show that this result is valid for any square matrix A.
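A short numerical verification of the representation in (6), and of the determinantal identity of Exercise 2, using the NumPy and SciPy libraries with an arbitrarily chosen matrix:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical example: a nonsymmetric matrix with distinct real roots.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

lam, T = np.linalg.eig(A)
t = 0.7
X1 = T @ np.diag(np.exp(lam * t)) @ np.linalg.inv(T)   # right-hand side of (6)
X2 = expm(A * t)                                       # e^{At}

print(np.allclose(X1, X2))                                        # True
print(np.isclose(np.linalg.det(expm(A)), np.exp(np.trace(A))))    # |e^A| = e^{tr A}
```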
10. Multiple Characteristic Roots. Let us now turn to a discussion of the case where A has multiple roots. In order to appreciate some of the difficulties that we may expect to encounter, let us begin by showing that it may not always be possible to obtain a canonical representation such as that given in (8.2).
Theorem 2. There exist matrices which cannot be reduced to a diagonal form by means of a nonsingular matrix, as in (8.2). Equivalently, there exist matrices of dimension N which do not possess N linearly independent characteristic vectors.
Proof. Consider the particular matrix

A = | 1  1 |
    | 0  1 |    (1)

If there exists a T such that

T^{-1}AT = | λ_1  0  |
           | 0   λ_2 |    (2)

we know that the columns of T must be characteristic vectors of A. Let us then determine the characteristic roots and vectors. The characteristic equation is

| 1 − λ    1    |
| 0      1 − λ  |  =  (1 − λ)^2 = 0    (3)

Hence 1 is a double root. The characteristic vectors are determined by the equations

x_2 = 0
0 = 0    (4)

It follows that x_2 = 0, with x_1 arbitrary. Consequently, all characteristic vectors are scalar multiples of

| 1 |
| 0 |    (5)

This means that T must be a singular matrix. Observe then the surprising fact that an arbitrary matrix need not possess a full complement of characteristic vectors. This makes us more appreciative of what a strong restriction on a matrix symmetry is. The fact that diagonalization may not always be available greatly complicates the study of general matrices-and, in return, adds equally to their interest. As we shall see below, there are several methods we
can use, based upon the use of alternate canonical forms and approximation theorems, to bypass some of these hurdles. EXERCISES
1. Why can't we use a continuity argument to deduce a diagonal representation for general matrices from the result for matrices with distinct characteristic roots?
2. Derive the solution of the differential equation

dx_1/dt = x_1 + x_2,    x_1(0) = c_1
dx_2/dt = x_2,          x_2(0) = c_2

and thus show that the matrix

A = | 1  1 |
    | 0  1 |

cannot be reduced to diagonal form.
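A brief computational check of the proof of Theorem 2, using the NumPy library: the matrix of (1) has a double root but only a one-dimensional space of characteristic vectors.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

lam = 1.0
# The null space of A - lam*I has dimension 2 - rank(A - lam*I).
num_independent_vectors = 2 - np.linalg.matrix_rank(A - lam * np.eye(2))
print(num_independent_vectors)   # 1: only one independent characteristic vector
```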
11. Jordan Canonical Form. Let us now discuss a canonical form for an arbitrary square matrix which is useful in the treatment of a number of problems. Since its proof is quite detailed and is readily available in a number of texts, and furthermore since its use can always be avoided by one means or another (at least in our encounters), we shall merely state the result without a proof.
Theorem 3. Let us denote by L_k(λ) a k × k matrix of the form

         | λ  1  0  ···  0 |
         | 0  λ  1  ···  0 |
L_k(λ) = | ···             |    (1)
         | 0  0  ···  λ  1 |
         | 0  0  ···  0  λ |

where L_1(λ) = λ. There exists a matrix T such that

            | L_{k_1}(λ_1)                              0 |
T^{-1}AT =  |              L_{k_2}(λ_2)                   |    (2)
            |                            ···              |
            | 0                               L_{k_r}(λ_r) |
with k_1 + k_2 + ··· + k_r = N. The λ_i are the characteristic roots of A, not necessarily distinct. This representation is called the Jordan canonical form.
For example, if A is of the third degree and has λ_1 as a root of multiplicity three, it may be reduced to one of the three forms

      | λ_1  0    0   |        | λ_1  1    0   |        | λ_1  1    0   |
A_1 = | 0    λ_1  0   |  A_2 = | 0    λ_1  0   |  A_3 = | 0    λ_1  1   |    (3)
      | 0    0    λ_1 |        | 0    0    λ_1 |        | 0    0    λ_1 |

EXERCISES

1. Using the Jordan canonical form, determine the form of e^{At} for general A.
2. Use this result to obtain the necessary and sufficient condition that e^{At} → 0 as t → ∞.
3. Prove that the A_i, i = 1, 2, 3, are distinct in the sense that there exists no T for which T^{-1}A_iT = A_j for i ≠ j. Give both an algebraic proof and one depending upon the solutions of the associated linear differential equations.
4. Show that L_k(λ) − λI, when raised to the kth power, yields the null matrix.
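A short numerical sketch of Exercises 1 and 4, using the NumPy and SciPy libraries with arbitrarily chosen k, λ, and t: the block L_k(λ) − λI is nilpotent of index k, so e^{L_k(λ)t} is e^{λt} times a matrix polynomial in t of degree k − 1.

```python
import numpy as np
from math import factorial
from scipy.linalg import expm

# Hypothetical values for the block size, the root, and the time.
k, lam, t = 4, -0.5, 1.3
L = lam * np.eye(k) + np.diag(np.ones(k - 1), 1)   # the block L_k(lam)

Nmat = L - lam * np.eye(k)
print(np.allclose(np.linalg.matrix_power(Nmat, k), 0))   # True: nilpotent of index k

# Closed form for exp(L t): e^{lam t} * sum_{m < k} t^m N^m / m!
poly = sum(np.linalg.matrix_power(Nmat, m) * t**m / factorial(m) for m in range(k))
print(np.allclose(expm(L * t), np.exp(lam * t) * poly))  # True
```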
12. Multiple Characteristic Roots. Using the Jordan canonical form, we possess a systematic method for obtaining the explicit analytic structure of the solutions of linear differential equations with constant coefficients. Thus, if the equation is reduced to

dy_1/dt = λy_1 + y_2,    dy_2/dt = λy_2    (1)

we have y_2 = e^{λt}c_2, and y_1 determined as the solution of

dy_1/dt = λy_1 + e^{λt}c_2    (2)

Using the integrating factor e^{−λt}, this yields

d/dt (e^{−λt} y_1) = c_2    (3)

whence

y_1 = e^{λt}c_1 + t e^{λt}c_2    (4)
This is one way of seeing why terms of the form te^{λt}, and generally terms of the form t^k e^{λt}, enter into the solution in the case of multiple roots.

Let us now discuss an alternative approach. If A has distinct characteristic roots, λ_1, λ_2, . . . , λ_N, we may write N particular solutions of dy/dt = Ay in the form y = c(λ)e^{λt}, λ = λ_1, λ_2, . . . , λ_N, where c(λ) is a polynomial function of λ. This we can see in the following way. As we know, to obtain a solution of the equation, we set y = e^{λt}c, where c is a constant vector, and determine λ and c from the equation

Ac = λc,  or  (A − λI)c = 0    (5)

Once λ has been determined from the characteristic equation |A − λI| = 0, we must determine a solution of the linear system in (5). Let

b_{ij} = the cofactor of the element a_{ij} − λδ_{ij} in the determinantal expansion of |A − λI|    (6)

Then

c_i = b_{1i},    i = 1, 2, . . . , N    (7)

is a solution of (5), with the property that the c_i are polynomials in λ. It may happen that this is a trivial solution in the sense that all the c_i are zero. In that case, we may try the alternative sets of solutions

c_i = b_{ki},    i = 1, 2, . . . , N    (8)

for k = 2, 3, . . . , N. Let us now show that one of these sets of solutions must be nontrivial if the characteristic roots of A are all distinct. In particular, let us show that for any particular characteristic root, λ = λ_1, not all the expressions b_{ii} can be zero if λ_1 is not a multiple root. Consider the characteristic equation

                      | a_{11} − λ   a_{12}      ···   a_{1N}     |
f(λ) = |A − λI| =     | a_{21}       a_{22} − λ  ···   a_{2N}     |  =  0    (9)
                      | ···                                       |
                      | a_{N1}       a_{N2}      ···   a_{NN} − λ |
Then, using the rule for differentiating a determinant, we have

f′(λ) = Σ_{k=1}^N (the determinant obtained from |A − λI| by replacing the kth row by the row whose kth entry is −1 and whose remaining entries are 0)

      = −b_{11} − b_{22} − ··· − b_{NN}    (10)

If, for λ = λ_1, we have b_{ii}(λ_1) = 0, i = 1, 2, . . . , N, then f′(λ_1) = 0, which means that λ_1 is a multiple root, a contradiction. It follows that we have a solution of the desired form, which we use in the following fashion. If λ_1 and λ_2 are two distinct characteristic roots, then

y = [c(λ_1)e^{λ_1 t} − c(λ_2)e^{λ_2 t}] / (λ_1 − λ_2)    (11)

is also a solution of the differential equation. The case where λ_1 is a multiple root can be considered to be a limiting case of the situation where λ_2 approaches λ_1. Taking the limit as λ_2 → λ_1, we find, as a candidate for a solution, the expression

y = d/dλ [c(λ)e^{λt}]_{λ=λ_1} = [c′(λ_1) + t c(λ_1)] e^{λ_1 t}    (12)
We leave as an exercise for the reader the task of putting this on a rigorous foundation. Let us merely point out that there are two ways of establishing this result, direct verification, or as a consequence of general theorems concerning the continuous dependence of solutions upon the matrix A.
EXERCISE
1. How do we obtain solutions if λ_1 is a double root, using a direct algebraic approach?
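A minimal numerical check of (1)-(4), using the SciPy library with arbitrarily chosen values of λ and the initial constants: the solution of the Jordan-block system exhibits the te^{λt} term.

```python
import numpy as np
from scipy.linalg import expm

lam = -1.0
J = np.array([[lam, 1.0],
              [0.0, lam]])
c = np.array([2.0, 3.0])   # hypothetical initial values c_1, c_2

for t in (0.5, 1.0, 2.0):
    y_numeric = expm(J * t) @ c
    # Formulas (12.4) and y_2 = e^{lam t} c_2:
    y_formula = np.array([np.exp(lam*t)*c[0] + t*np.exp(lam*t)*c[1],
                          np.exp(lam*t)*c[1]])
    print(np.allclose(y_numeric, y_formula))   # True each time
```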
13. Semidiagonal or Triangular Form-Schur's Theorem. Let us now prove one of the most useful reduction theorems in matrix theory.
Theorem 4. Given any matrix A, there exists a unitary matrix T such that

        | λ_1  b_{12}  ···  b_{1N} |
T*AT =  | 0    λ_2     ···  b_{2N} |    (1)
        | ···                      |
        | 0    0       ···  λ_N    |

where, as the notation indicates, the elements below the main diagonal are zero.
Proof. Let us proceed by induction, beginning with the 2 × 2 case. Let λ_1 be a characteristic root of A and c^1 an associated characteristic vector, normalized by the condition (c^1, c̄^1) = 1. Let T be a matrix whose first column is c^1 and whose second column is chosen so that T is unitary. Then, evaluating the expression T^{-1}AT as the product T^{-1}(AT), we see that

T^{-1}AT = | λ_1  b_{12} |
           | 0    b_{22} |    (2)

The quantity b_{22} must equal λ_2, since T^{-1}AT has the same characteristic roots as A.
Let us now show that we can use the reduction for Nth-order matrices to demonstrate the result for (N + 1)-dimensional matrices. As before, let c^1 be a normalized characteristic vector associated with λ_1, and let N other vectors a^1, a^2, . . . , a^N be chosen so that the matrix T_1, whose columns are c^1, a^1, a^2, . . . , a^N, is unitary. Then, as for N = 2, we have

              | λ_1  b_{12}  ···  b_{1,N+1} |
T_1^{-1}AT_1 =| 0                           |    (3)
              | ···          B_N            |
              | 0                           |

where B_N is an N × N matrix. Since the characteristic equation of the right-hand side is

(λ_1 − λ)|B_N − λI| = 0    (4)
it follows that the characteristic roots of B_N are λ_2, λ_3, . . . , λ_{N+1}, the remaining N characteristic roots of A. The inductive hypothesis asserts that there exists a unitary matrix T_N such that

                 | λ_2  c_{12}  ···  c_{1N}   |
T_N^{-1}B_N T_N =| 0    λ_3     ···  c_{2N}   |    (5)
                 | ···                        |
                 | 0    0       ···  λ_{N+1}  |

Let T_{N+1} be the unitary (N + 1) × (N + 1) matrix formed as follows:

          | 1  0  ···  0 |
T_{N+1} = | 0            |    (6)
          | ···   T_N    |
          | 0            |

Consider the expression

                                                          | λ_1  b_{12}  ···  b_{1,N+1} |
(T_1T_{N+1})^{-1}A(T_1T_{N+1}) = T_{N+1}^{-1}(T_1^{-1}AT_1)T_{N+1} = | 0    λ_2              |    (7)
                                                          | ···                         |
                                                          | 0    0       ···  λ_{N+1}   |
The matrix T_1T_{N+1} is thus the required unitary matrix which reduces A to semidiagonal form.

EXERCISES

1. Show that if A has k simple roots, there exists a matrix T such that

           | λ_1  0    ···  0    b_{1,k+1}  ···  b_{1,N}   |
           | 0    λ_2  ···  0    b_{2,k+1}  ···  b_{2,N}   |
T^{-1}AT = | ···                                           |
           | 0    0    ···  λ_k  b_{k,k+1}  ···  b_{k,N}   |
           | 0    0    ···  0    λ_{k+1}    ···            |
           | ···                                           |
           | 0    0    ···  0    0          ···  λ_N       |

2. Using the semidiagonal form derived above, obtain the general solution of dx/dt = Ax, x(0) = c.
3. Determine a necessary and sufficient condition that no terms of the form te^{λt} occur.
4. Determine the general solution of x″ + Ax = 0, where A is positive definite.
5. Use the triangular form to obtain necessary and sufficient conditions that e^{At} → 0 as t → ∞.
6. Consider the difference equation x(n + 1) = Ax(n), n = 0, 1, . . . , where x(0) = c. Show that the solution is given by x(n) = A^n c.
7. Find particular solutions by setting x(n) = μ^n c, and determine the set of possible μ and c.
8. Show that the general solution has the form

x(n) = Σ_{i=1}^N p_i(n) μ_i^n

where the p_i(n) are vectors whose components are polynomials in n of degree at most N − 1.
9. What is the necessary and sufficient condition that every solution of x(n + 1) = Ax(n) approach zero as n → ∞?
10. Consider the difference equation x((n + 1)Δ) = x(nΔ) + Δ Ax(nΔ), n = 0, 1, . . . , x(0) = c, where Δ is a positive scalar. As Δ → 0, show that x(nΔ) → x(t), the solution of the differential equation dx/dt = Ax, x(0) = c, provided that nΔ → t. Give two proofs, one using the explicit form of the solution, and the other without this aid.
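A short computational sketch of Theorem 4, using the Schur routine of the SciPy library on an arbitrarily chosen matrix; the routine returns a unitary T and the semidiagonal form.

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))

# Complex Schur form: A = T S T*, with T unitary and S upper triangular.
S, T = schur(A, output='complex')

print(np.allclose(T @ S @ T.conj().T, A))   # True: the decomposition reproduces A
print(np.allclose(np.tril(S, -1), 0))       # True: elements below the diagonal vanish
print(np.diag(S))                           # the characteristic roots of A on the diagonal
```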
14. Normal Matrices. Real matrices which commute with their transposes, or, more generally, complex matrices which commute with their conjugate transposes, are called normal. Explicitly, we must have

AA* = A*A    (1)

The utility of this concept lies in the following theorem.
Theorem 5. If A is normal, it can be reduced to diagonal form by a unitary matrix.
Proof. As we know, we can find a unitary transformation T which reduces A to semidiagonal form,

        | λ_1  b_{12}  ···  b_{1N} |
A = T   | 0    λ_2     ···         |  T^{-1}  =  TST^{-1}    (2)
        | ···                      |
        | 0    0       ···  λ_N    |

Then, since T is unitary,

A* = (TST^{-1})* = TS*T^{-1}    (3)
Hence,

AA* = TSS*T^{-1} = A*A = TS*ST^{-1}    (4)

Thus, we must have

SS* = S*S    (5)

Equating corresponding elements in the products, we see that

b_{ij} = 0,    1 ≤ i < j ≤ N    (6)

which means that the semidiagonal form in (2) is actually diagonal.

EXERCISES

1. If A is a normal matrix with all characteristic roots real, it is Hermitian, and it can be reduced to diagonal form by an orthogonal transformation.
2. Show that (Ax, Ax) = (A*x, A*x) if A is normal.
3. Show that A − λI is normal if A is normal.
4. Using the foregoing exercises, or otherwise, show that if A is normal, then x is a characteristic vector of A if and only if it is a characteristic vector of A*.
5. If A is normal, characteristic vectors x, y belonging to distinct characteristic values are orthogonal in the sense that (x, ȳ) = 0.
6. Establish Theorem 5 using this fact.
7. Prove that the converse of Theorem 5 is also true.
8. If B is normal and there exists an angle θ such that Ae^{iθ} + A*e^{−iθ} ≥ 0, where A² = B, then A is normal (Putnam, Proc. Am. Math. Soc., vol. 8, pp. 768-769, 1957).
9. If A and B commute, then A* and B commute if A is normal.
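A minimal numerical sketch of Theorem 5, using the NumPy library with an arbitrarily chosen normal matrix: the computed characteristic vectors come out orthonormal (up to rounding), and they diagonalize A.

```python
import numpy as np

# Hypothetical example: 2I plus a real skew-symmetric matrix is normal.
A = np.array([[2.0, -1.0, 0.0],
              [1.0,  2.0, -1.0],
              [0.0,  1.0,  2.0]])
print(np.allclose(A @ A.T, A.T @ A))        # True: A is normal

lam, U = np.linalg.eig(A)
print(np.allclose(U.conj().T @ U, np.eye(3)))          # characteristic vectors orthonormal
print(np.allclose(U @ np.diag(lam) @ U.conj().T, A))   # unitary diagonalization of A
```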
15. An Approximation Theorem. Let us now state a very useful approximation theorem which can often be used to treat questions involving general square matrices.
Theorem 6. We can find a matrix T such that

           | λ_1  b_{12}  ···  b_{1N} |
T^{-1}AT = | 0    λ_2     ···  b_{2N} |    (1)
           | ···                      |
           | 0    0       ···  λ_N    |

with Σ_{i,j} |b_{ij}| ≤ ε, where ε is any preassigned positive constant.
Proof. Let T_1 be a matrix which reduces A to semidiagonal form. Then the change of variable y = T_1z converts dy/dt = Ay into
dz_1/dt = λ_1 z_1 + b_{12} z_2 + ··· + b_{1N} z_N
dz_2/dt = λ_2 z_2 + ··· + b_{2N} z_N
···    (2)

It is now easy to see how to choose T so as to obtain the stated result. In (2) make the change of variable

z_i = r^{i−1} z_i^1,    i = 1, 2, . . . , N    (3)

Then the new variables z_i^1 satisfy the system of equations

dz_1^1/dt = λ_1 z_1^1 + r b_{12} z_2^1 + ··· + r^{N−1} b_{1N} z_N^1
dz_2^1/dt = λ_2 z_2^1 + ··· + r^{N−2} b_{2N} z_N^1
···    (4)

With a suitable choice of r, the sum of the absolute values of the off-diagonal terms can be made as small as we wish. The last change of variable is equivalent to a transformation z = Ez^1, where E is nonsingular. Hence we can take T to be T_1E.
At first sight, the above result may seem to contradict the result that a general matrix possessing multiple characteristic roots may not be reducible to diagonal form. The point is that T depends upon ε. If we attempt to let ε → 0, we find that either T approaches a singular matrix, or else possesses no limit.

16. Another Approximation Theorem. A further result which can similarly be used to reduce the proof of results involving general matrices to a proof involving diagonal matrices is given in Theorem 7.
Theorem 7. Given any matrix A, we can find a matrix B with distinct characteristic roots such that ||A − B|| ≤ ε, where ε is any preassigned positive quantity.
Proof. Consider the matrix A + E, where E = (e_{ij}) with the e_{ij} independent complex variables. If A + E has a multiple characteristic root, then f(λ) = |A + E − λI| and f′(λ) have a root in common. If f(λ)
and f′(λ) have a root in common, the resultant of the two polynomials, R(E), a polynomial in the e_{ij}, must vanish. We wish to show that we can find a set of values of the e_{ij} for which Σ_{i,j} |e_{ij}| is arbitrarily small, and for which R(E) ≠ 0. The negation of this is the statement that R(E) vanishes identically in the neighborhood of the origin in e_{ij} space. If this is true, the polynomial R(E) vanishes identically, which means that f(λ) always has a multiple root. Consider, however, the following choice of the e_{ij}:

e_{ij} = −a_{ij},  i ≠ j;    e_{ii} = i − a_{ii},  i = 1, 2, . . . , N    (1)

The matrix A + E clearly does not have multiple roots. Hence, R(E) does not vanish identically, and we can find a matrix B = A + E with the desired properties.

EXERCISE

1. Construct a proof of this result which depends only upon the fact that A can be reduced to triangular form. (The point of this is that the foregoing proof, although short and rigorous, makes use of the concept and properties of the resultant of two polynomials, which are not as easy to establish rigorously as might be thought.)
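A brief numerical illustration of Theorem 7, using the NumPy library: an arbitrarily small perturbation of a matrix with a multiple characteristic root yields distinct roots.

```python
import numpy as np

# Hypothetical values: A has the double root 1 and is not diagonalizable.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
eps = 1e-8
B = A + np.diag([0.0, eps])           # ||A - B|| <= eps

print(np.linalg.eigvals(A))           # [1., 1.]
print(np.linalg.eigvals(B))           # two distinct roots, 1 and 1 + eps
```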
17. The Cayley-Hamilton Theorem. Using the approximation theorem established in Sec. 16, we can finally establish in full generality the famous Cayley-Hamilton theorem.
Theorem 8. Every matrix satisfies its characteristic equation.
Proof. Let A + E, with ||E|| ≤ ε, be a matrix with distinct characteristic roots. Then, as we know, A + E satisfies its characteristic equation. The characteristic polynomial of A + E is f(λ,E) = |A + E − λI|, a polynomial in λ whose coefficients are polynomials in the elements of E, and thus continuous in the elements of E. Hence,

lim_{E→0} f(A + E, E) = f(A)    (1)

Since f(A + E, E) = 0, we see that f(A) = 0.

EXERCISES

1. Establish the Cayley-Hamilton theorem using the Jordan canonical form.
2. Establish the Cayley-Hamilton theorem under the assumption that A possesses a full set of characteristic vectors which are linearly independent.
3. Use the Cayley-Hamilton theorem to derive a representation for A^{-1} as a polynomial in A, provided that A is nonsingular.
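A short numerical check of Theorem 8, and of Exercise 3, using the NumPy library with an arbitrarily chosen nonsingular matrix:

```python
import numpy as np

# Hypothetical example values; det(A) = -3, so A is nonsingular.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])
N = A.shape[0]

coeffs = np.poly(A)   # coefficients of det(lam*I - A), leading coefficient 1

# Theorem 8: f(A) = A^N + c_1 A^{N-1} + ... + c_N I = 0.
fA = sum(c * np.linalg.matrix_power(A, N - m) for m, c in enumerate(coeffs))
print(np.allclose(fA, 0))     # True

# Exercise 3: A^{-1} = -(A^{N-1} + c_1 A^{N-2} + ... + c_{N-1} I) / c_N.
Ainv = -sum(coeffs[m] * np.linalg.matrix_power(A, N - 1 - m) for m in range(N)) / coeffs[N]
print(np.allclose(Ainv, np.linalg.inv(A)))    # True
```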
18. Alternate Proof of Cayley-Hamilton Theorem. Once we know what we wish to prove, it is much easier to devise a variety of short and
elegant proofs. Let us present one of a purely algebraic nature, making no appeal to continuity. Consider the matrix inverse to A − λI, (A − λI)^{-1}, for λ not a characteristic value. We see that

(A − λI)^{-1} = B(λ)/f(λ)    (1)

where the elements of B(λ) are polynomials in λ of degree N − 1 and f(λ) is the characteristic polynomial of A − λI, the determinant |A − λI|. Hence,

(A − λI)B(λ) = f(λ)I    (2)

Write

B(λ) = λ^{N−1}B_{N−1} + λ^{N−2}B_{N−2} + ··· + B_0
f(λ) = (−1)^N λ^N + c_1 λ^{N−1} + ··· + c_N    (3)

where the B_i are matrices independent of λ. Equating coefficients, we obtain a sequence of relations

−B_{N−1} = (−1)^N I
AB_{N−1} − B_{N−2} = c_1 I
AB_{N−2} − B_{N−3} = c_2 I    (4)
C
(1)
where P(t) is a continuous function of t satisfying the condition P(t
+ 1)
= P(t)
(2)
for all t. Surprisingly, the problem is one of extreme difficulty. the relatively simple scalar equation d2u dt 2
+ (a + b cos t)u
=
0
Even
(3)
the Mathieu equation, poses major difficulties, and has essentially a theory of its own. In obtaining a canonical representation of the solution of (1), we are
Explicit Solutions and Canonical Forms
209
led to discuss a problem of independent interest, namely the representation of a nonsingular matrix as an exponential. As usual, it is convenient to discuss the matrix equation first. We shall prove Theorem 9. Theorem 9. The solution of
dX = P(t)X dt
X(O) = I
(4)
where P(t) is periodic of period I, may be written in the form X (t) = Q(t)e C1
(5)
where Q(t) is periodic of period 1. Proof. It is clear from the periodicity of P(t), that X(t + I) is a solution of the differential equation in (4) whenever X(t) is. Since X(I) is not singular, we see that X (t + I)X (I )-1 is a solution of (4) with the initial value I. It follows from the uniqueness theorem that X(t
+ l)X(l)-1 =
X(t)
(6)
Suppose that it were possible to write X(I)
=
(7)
eC
Consider, then, the matrix Q(t) = X(t)e- c1 .
Q(t
+ I)
= X(t =
X(t
+ I)e-C(l+l) +
We have
= X(t
I)X(l)-le-CI
+ l)e-Ce- C1
= X(t)e- Ct = Q(t)
(8)
This establisheH (5). It remains to prove that (7) is valid. 20. A Nonsingular Matrix Is an Exponential. Let us now establish Theorem 10. Theorem 10. A nonsingular matrix is an exponential. Let A be nonsingular with distinct characteristic roots, so that A may be written
o T-l
A=T
o
(I)
Introduction to Matrix Analysis
210 Then A
=e
C
where
o C=T
7'"-1
(2)
o which proves the result for this case. If A has multiple characteristic roots, we can use the Jordan canonical form with equal effect. Let
o .'
A=T
T-I
(3)
o It is sufficient to show that each of the matrices L k,(}..;) has a logarithm. For if B; is a logarithm of L k ,(}..;), then
o T-I
B=T
(4)
o Br is a logarithm of A. We have noted above in Exercise 4 of Sec. 11 that (5)
Explicit Solutions and Canonical Forms
211
It follows then that the formal logarithm B = log Lk(A) = log [H
..
=
llog A +
+ L';(A)
l: (~~,,"+I
- HJ
[Lk(A) - HJ"
n-l
'-1
=
I log A +
~ (-1)"+1 '-' nA.. [Lk(X)
- HJ"
n-I
exists and is actually a logarithm in the sense that eB = A. EXERCISES 1. A nonsingular matrix has a kth root for k = 2, 3, . . • . II. The solution of dX/dt = Q(t)X, X(O) = I, can be put in the form ePeP' ••.
fo' Q(8) dB, and P" = fo' Q" d8, with Q" = e-Po-IQ,,_lePo-1 + fo eoPO-IQ,,_le-·PO-I dB
ePo ••• , where P =
-1
The infinite product converges if t is sufficiently small (F. Fer, Bull. claslJe 8ci., Acad. roy. Belg., 1101. 44, no. 5, pp. 818-829, 1958).
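A minimal computational sketch of Theorem 10 and of the logarithmic series above, using the SciPy library with an arbitrarily chosen block L_k(λ):

```python
import numpy as np
from scipy.linalg import expm, logm

# Hypothetical values: a 3 x 3 block L_3(lam) with lam = 2.
lam = 2.0
L = np.array([[lam, 1.0, 0.0],
              [0.0, lam, 1.0],
              [0.0, 0.0, lam]])

B = logm(L)                      # a (principal) logarithm of L
print(np.allclose(expm(B), L))   # True: e^B = L

# The series log(lam) I + sum_n (-1)^{n+1}/(n lam^n) (L - lam I)^n terminates,
# since (L - lam I)^3 = 0 here.
Nmat = L - lam * np.eye(3)
series = np.log(lam) * np.eye(3) + Nmat / lam - (Nmat @ Nmat) / (2 * lam**2)
print(np.allclose(series, B))    # True
```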
21. An Alternate Proof. Since we have not proved the Jordan representation, let us present an independent proof of an inductive nature. Assume that the result holds for N X N matrices and that we have already converted the nonsingular (N + I)-dimensional matrix under consideration to triangular form, A N+1=
[~N ~:+J
(1)
where AN is an N-dimensional matrix in triangular form and aN denotes an N-dimensional column vector. Let BN be a logarithm of AN, which is nonsingular if A N+1is, and write (2)
where l = log AN+! and x is an unknown N-dimensional vector. It remains to show that x may be determined so that eBN +1 = AN+!' It is easy to see inductively that BNk BN+1k = [ O
for k
=
1,2, . . .
(BNk-1
+ lBN k- + 2
lk
•••
+ lk-II)X]
(3)
212
Introduction to Matrix Analysis
(4)
where the first two terms are taken to be 0 and I. If l is not a characteristic root of B N , we ha.ve C(l)
• L • = L
(BN k- 1
=
+ B Nk- 2l + ... + lk-II)lkl
A:-O
(BNk - lkI)(BN - lI)-llkl = (e BN
-
el I)(BN - lI)-1
(5)
A:-O
Hence (6)
where rl, r2, ••• , rN are the characteristic roots of B N• Since IC(l)1 is a continuous function of l for alll, as we see from the series, it follows that (6) derived under the assumption that I yf: r" holds for allZ. H I yf: r" or any of the other values of the logarithms of the ~k, it is clear that IC(l) I yf: O. If l = rk, then the factor (e'· - el)/(rk - l) reduces to e'· yf: O. Restricting ourselves to principal values of the logarithms, we see that C(l) is never singular. Hence, we see that x may be determined so that C(l)x = aN. This completes the demonstration. 22. Some Interesting Transformations. The reduction of differential equations with variable matrix, dx = A(t)x dt
-
x(O) =
C
(1)
to canonical form is a problem of some difficulty. Since it lies more within the province of the theory of differential equations than of matrix theory, we shall not pursue it further here. Some interesting transformations, however, arise at the very beginning of the investigation. Set x = Ty, where T is a function of t. Then y satisfies the equation dy dt
=
T-I(AT - dTIdt)y
(2)
Explicit Solutions and Canonical Forms
213
f(A,T) = T-l(AT - dTjdt)
(3)
Write
Then we see from the derivation that f(A, T) satisfies the functional equation f(A,ST) = f(T-l(AT - dTjdt), S)
(4)
EXERCISE
1. Use the scalar equation
and the transformations u equations.
= 8V, t =
"'(8), to derive similar classes of functional
23. Biorthogonality. In the first part of the book, dealing with symmetric matrices, we observed that every N-dimensional vector could be written as a linear combination of N orthonormal characteristic vectors associated with the N characteristic roots of an N X N matrix A. Thus, if (1)
the coefficients are determined very simply by means of the formula (2)
If A is not symmetric, we face two difficulties in obtaining an analogue of this result. In the first place, A may not have a full quota of characteristic vectors, and in the second place, these need not be mutually orthogonal. To overcome the first difficulty, we need merely make an assumption that we consider only matrices that do possess N linearly independent characteristic vectors. To overcome the second difficulty, we shall use the adjoint matrix and the concept of biorthogonality, rather than orthogonality. Let yl, y2, yl, . . . , yN be a set of linear independent characteristic , AN of A. vectors associated with the characteristic roots Al , A2, Then (3)
is nonsingular and possesses the property of transforming A to diagonal form,
Introduction to Matrix Analysis
214
Al
o (4)
o li'rom this, it follows that
o T* A *(T*)-I ...
(5)
o Hence, A * also possesses the property of possessing N linearly independent characteristic vectors, furnished by the columns of (T*)-I and associated with the characteristic roots XI, XI, ... ,XN. Call these characteristic vectors Zl, Zl, • • • , ZN, respectively. To determine the coefficients in the representation (0)
we proceed as follows.
From Ayi = Aiyi A *zi = X,oz;
(7)
we obtain the further relations (8)
and thus Hence, (9)
Hence, if Ai
yf:
Aj, we have
(10)
It is clear that (y',i') to all of the yi.
yf:
0, since ii, being nontrivial, cannot be orthogonal
Explicit Solutions and Canonical Forms
215
Thus, in the special case where the Ai are all distinct, we can write, for the coefficients in (6), (11)
Two sets of vectors f yi I. Izll, satisfying the condition (yi,Zt) = 0
i~j
(12)
are said to be biorthogonal. It is clear that we can normalize the vectors so as to obtain the additional condition (13)
This technique for expanding x as a linear combination of the yl bears out a comment made earlier that often the properties of a matrix are most easily understood in terms of the properties of its transpose. EXERCISES 1. How does one treat the case of multiple characteristic roots? II. What simplifications ensue if A is normal?
24. The Laplace Transform. Once we know that the solution of a linear differential equation with constant coefficients has the form N
X
=
I
eX'1pk(t)
(1)
k~1
where Pk(t) is a vector whose components are polynomials of degree at most N - 1, we can use the Laplace transform to determine the solution without paying any attention to a number of details of rigor which usually necessitate an involved preliminary discussion. The value of the Laplace transform lies in the fact that it can be used to transform a transcendental function into an algebraic function. Specifically, it transforms the simplest exponential function into the simplest rational function. The integral (2)
is called the Laplace tran8form of f(t), provided that it exists. Since we shall be interested only in functionsf(t) having the form given in (1), it is clear that g(8) will always exist if Re (8) is sufficiently large.
Introduction to Matrix Analysis
216
The basic formula is
'i " o
for Re (s)
>
e-"eo' cit
1
(3)
= -8 - a
It follows that if we know that f(t) has the form
Re (a).
(4)
then the equality (5)
means that CI = C2 and AI = a. Similarly, from the additivity of the integral, it follows that if f(t) is of the form given in (1), and if g(s) has the form (6)
then we must have (7)
26. An Example. tial equation
Consider the problem of solving the linear differen-
u" - 3u'
+ 2u =
u(O) = 1
0
(1)
u'(O) = 0
Since the characteristic equation is A2 - 3A
+2 = 0
(2)
with roots A = 1 and A = 2, we see that the solution of (1) will have the form (3)
where CI and C2 will be determined by the initial conditions in (1). we obtain the linear system CI
2cI
+ C2 + C2
= 1 =
0
Thus (4)
yielding Cl
= -1
(5)
so that the solution is (6)
In place of this procedure, let us use the Laplace transform. for Re (s) > 2, we obtain the relation
fo· u"e- II dt - 3
h· u'e-" dt + 2 fo· ue-" dt = 0
Ftom (1).
(7)
Explicit Solutions and Canonical Forms
217
In order to evaluate the first two terms, we integrate by parts, obtaining
+ 8/0'" ue-'! dt = - 1 + 8 10'" ur'! dt 10'" u"e-'! dt = U'r'!]; + 8 10'" u'r'! dt = 0 + 8 10'" u'r'! dt = -8 + 8210'" ue-,I dt upon referring to the relation for 10'" u'e--'! dt.
10'" u'e-'! dt =
ur'!];
(8)
Thus, (7) yields the equation (8 2
38
-
r'"
or
+ 2) 10'" ue-,I dt t
)0 ue-· dt =
8 -
82 _ 39
= 8 - 3
3
+2
(9) (10)
The rational function on the right side of (10) has a partial fraction decomposition (11)
with al = lim .~1
a2 = lim ~2
~8 2- 1L~- 3) 8
-
38
+2
=
8 -
3]
s - 2
~__2-=_22~-=_~ = 8_=-_~J 8
Hence,
fa
o
-
'" ue-
.'38
sl
+2
8 -
2 8-1
1
= 2 .=1
(12) =-1
.=2
1 8-2
dt = - - - - -
(13)
whence u(t) has the form stated in (6). 26. Discussion. If, in place of the second-order system appearing in (5.1), we had studied a tenth-order system, the straightforward approach, based upon undetermined coefficients, would have required the solution of ten simultaneous linear equations in ten unknowns-a formidable problem. The use of the Laplace transform technique avoids this. Actually, this is only a small part of the reason why the Laplace transform plays a fundamental role in analysis. EXERCISES 1. Show that
Introduction to Matrix Analysis
218 using
(a) Integration by parts (b) Differentiation with respect to
8 or a i. Using this formula, show how to solve linear differential equations whose characteristic equations have multiple roots. In particular, solve
u" - 2u'
+u
= 0
= 1
u(O)
u'(O) = 0
8. Use the Laplace transform solution of an associated differential equation to obtain a representation for the solution of the system of linear equations b. =
L" e~i·x"
i = 1, 2, . . . , N, where the x, are the unknowns.
i-I
4. How does one treat the corresponding problem when the
2'1. Matrix Case.
x, and N are unknowns?
Consider the equation dx
-dt = Ax where A is a constant matrix.
r" dxdt
}o
x(O) = e
(1)
Taking transforms, we have
r" xc-· dt
c-.I dt = A
}o
I
(2)
whence, integrating by parts, (A - sI)
10"
XC-·
,
dt
= -e
(3)
Thus, the Laplace transform of the solution is given by
fo"
XC-·
,
dt
=
(sI - A)-Ie
(4)
In order to solve this, we can use the same type of partial fraction decomposition referred to above, namely N
(sI - A)-I =
L
Ak(s - Ak)-I
(5)
k=1
We shall assume, for the sake of simplicity, that the characteristic roots of A are distinct. Then A k = lim (s - Ak)(sI - A)-I
(6)
-~.
EXERCISES
A.,
1. Find an explicit representation for using the Sylvester interpolation formula of Exercise 34, Miscellaneous Exercises, Chap. 6. i. What happens if A has multiple characteristic roots?
Explicit Solutions and Canonical Forms
219 U(N) +
8. Use the Laplace transform of a linear differential equation of the form + ... + aNU = 0, u(O) = ao, u'(O) = aI, . • • , U(N-Il(O) = aN_t, obtain a solution of the linear system
alu(N-I)
to
N
1
Xira, = ar
r ... 0, 1, . . . , N - 1
i~l
where Xi
;t! Xi
for i
;t!
j.
MISCELLANEOUS EXERCISES
+ B, under what conditions does A-I ... I - B + B2 considering the equation x" + Ax = 0,
1. If A ... I
... ?
i. By where A is positive definite, show that the characteristic roots of a positive definite matrix are positive. 8. Let A ... HU be a representation of the complex matrix A where H is a positive definite Hermitian matrix and U is unitary. Show that A is normal if and only if Hand U commute. 4. Prove that if A, Band AB are normal, then BA is normal (N. A. Wiegmann). 6. A necessary and sufficient condition that the product of two normal matrices be normal is that each commute with the H matrices of the other. By the H matrix of A, we mean the non-negative definite square root of AA * (N. A. Wiegmann). 8. An ... I for some n if and only if the characteristic roots of A are roots of unity. 7. If Bk = 0 for some k, then IA + BI = IAI. 8. If IA + xli = 0 has reciprocal roots, A is either orthogonal, or A I = I (Burgatti). 9. Let P, Q, R, X be matrices of the second order. Then every characteristic root of a solution X of PX2 + QX + R = 0 is a root of IPX2 + QX + RI = 0 (Sylve8Ier). 10. Prove that the result holds for N-dimensional matrices and the equation
+ AIXm-1 + ... + Am_IX + Am = 0 11. The scalar equation u" + p(l)u' + q(l)1t ... 0 is converted into the Riccati equation v' + v 2 + p(t)v + q(t) = 0 by means of thc change of variable u ... exp (fv dt) AoXm
or v = lt' ju. Show that there is a simihu connection between the matrix Riccatian equation X' = A(l) + B(t)X + XC(l)X ami a second-oruer linear differential equation. li. Every matrix A with IAI = I can he written in the form A ... BCB-IC-I (Skoda).
18. If IXI - I YI
~
fl, then two llIatl'ices C and D can be found such that X = C-tD-tYCD
(0. Tau88ky).
14. Let
o -iX I
iX I 0 0 iX 2 -iX 2
220
Introduction to Matrix Analysis
Show that
I lim -N log IH N
N.-..
N
+ Xli
I lim -N \" log [X
-
N.....
'-'
+,.. (A»)
i-I
where eR(X) represents the continued fraction e,,(X) = Hint:
x"l/x
+ XR+ll/x +
Start with the recurrence relation
H N(X;X I,X2, •.• ,XN_I) = xHN_I(X;XI,XI, ••. ,XN_I)
+ XI2HN_1(XiXI,XC,
•.. ,XN_I)
for N ~ 3, where HN(x) = IHN + Xli. 16. if A and B are both normal, then if cIA + c2B has the characteristic values cIX. + Ct"'" for all CI and C2, we must have AB = BA (N. A. Wiegmann-H. Wielandl). 18. Write Show thatJ(A,B) - [A,BI/2 + g(A,B) + h(A,B), [A,B) = AB - BA, where g(A,B) is a homogeneous polynomial of degree 3 in A and B satisfying the relation g(A,B) g(B,A), and MA,B) is a sum of homogeneous polynomials beginning with one of degree 4. 17. Show that J(B,C)
+ J(A, B + C + J(B,C»
=- J(A,B)
+ J(A + B + J(A,B), C)
18. Using this result, show that [A,[B,Cll
+ [B,[C,All + [C,[A,Bll
= 0
where [A,B] is the Jacobi bracket symbol. 19. A necessary and sufficient condition that lim B" = 0, for B real or complex,
,._
..
is that there exist a positive definite Hermitian matrix H such that H - B·HB is positive definite (P. Stein). iO. A is diagonalizable if and only if H AH-I is normal for some positive definite Hermitian matrix H (Mitchell). i1. Let r be a root of the quadratic equation r 2 + air + a2 "" O. Write y = Xo + Xlr, yr = xor + XI[ -aIr - a2) = -a2XI + (xo - xlal)r. Introduce the matrix
and write X '" Xo + xlr. If X'" Xo + xlr and Y '" yo + Ylr, to what does XY correspond? ii. Determine the characteristic roots of X. i8. Generalize the foregoing results to the case where r is a root of the Nth-order polynomial equation r N + alrN- 1 + . . . + aN = 0
i4. There exists an orthogonal matrix B(I) such that y = B(l)e transforms dyldt = A (I)y into deldt = A I (I)e where A I (I) is semidiagonal (DiUberto). i6. B(I) can be chosen to be bounded and nonsingular and A I (t) diagonal (Diliberto).
i8. Given one characteristic vector of an N X N matrix A, can this information be used to reduce the problem of determining all characteristic vectors of A to that of determining all characteristic vectors of an (N - 1) X (N - 1) matrix? i7. The matrix A is called circular if A· = A -I. Show that eoR is circular if R is
real.
Explicit Solutions and Canonical Forms
221
i8. The matrix A has the form where S is real skcw-symmetric if and only if A is a real orthogonal matrix with IAI = 1 (Taber). i9. Every nonsingular matrix can be expressed as the product of eS
(0)
A symmetric matrix and an orthogonal matrix, if real
(b) A Hermitian matrix and a unitary matrix, if complex
If the first factor is chosen to be positive definite, the factors are uniquely determined. These and many other results are derived by DeBruijn and Szekeres in the paper cited below,l using the logarithm of a matrix. 80. Derive the result of Exercise 28 from the canonical representation given above for orthogonal matrices. 81. Let U" U2, •.• ,UN be a set of linearly independent solutions of the Nth-order linear differential equation The determinant
is called the Wronskian of the function U" U2, . . . ,UN. W(l) = W(UI,U2, . . . ,1tN) = W(lo)e
Show that
-ftt.
1',(8)<1.
For an extension of t.he concept of the Wronskian, see A. Ostrowski. 2 The corresponding determinant associated with the solution of a linear difference equation of the form
+ N) + p,(r)u(x + N ...• U(x
x = 0, 1,2,
1)
- 1)
+
u,(x
+N
+ pN(X)U(X)
= 0
1)
is called a Casorati determinant.· Many further results concerning Wron::$kians can be found in G. Polya and G. Szego. 4 , N. G. DeBruijn and G. Szekercs, Nieuw. Arch. Wisk., (3), vol. 111, pp. 20-32, 1955. I A. Ostrowski, ttber ein Analogon der Wronskischen Determinante bei Funktionen mehrerer Veranderlicher, Malh. Z., vol. 4, pp. 223-230, 1919. a For some of its properties, see P. Montel, Le~ons sur les recurrences et leurs applications, Gauthier-Villars, Paris, 1957. See also D. M. Krabill, On Extension of Wronskian Matril'es, Bull. Am. Malh. Soc., vol. 49, pp. 593-601, 1943. 4 G. Polya and G. Szego, AuJgaben und Lehrsalze aus der Analysis, Zweiter Band, p. 113, Dover Publications, New York, 1945.
Introduction to Matrix Analysis
222
81. A unitary matrix U can be expressed in the form U - V-IW-IVW with unitary V, W, if and only if det U - 1. II. The unitary matrices U, V can be expressed in the form U - WIWIW" V - W,WIW I with unitary WI, WI, W, if and only if det U - det V. 84. If A A'" - A'"A has all non-negative elements, then actually A is normal. 86. If HI and HI are Hermitian and one, at least, is positive definite, then HIHI can be transformed into diagonal form (0. Tau88ky). 88. If "A + lAB can be diagonalized for all scalars" and lA, then AB - BA (T. S. Motzkin and O. Taus8ky). 87. Show that a matrix A, which is similar to a real diagonal matrix D in the sense that A - TDT-I, is the product of two Hermitian matrices (0. Tau88ky). 8S. The characteristic roots of
are given by'" = a - 2 VbC cos k8, k = 1, 2, . , N, where B - -r/(N Show that AN(") - IA - HI satisfies the difference equation
+ 1).
89. The characteristic roots of II -
2a b
II
b 2a
b
2a
,
2a
b
b
2a b
b 2a
B= , 2a b
2a II
2a
b 2a II -
b
are given by
>-. -
II - 2b -
b-1Ia' - (a - 2b cos k8)1)
k - 1,2, . . . , N
(Ruther/ord, Todd)
Obtain the result by relating B to A 2. (A detailed discussion of the behavior of the characteristic roots and vectors of the matrix A (ex) 'A (ex) may be found in A. Ostrowski. l ) The matrix A (ex) is given by the expression ex 1 1
0 ex 1
ex
0 0 0
A(a) =
1 . . . ex I A. Ostrowski, On the Spectrum of a One-parametric Family of Matrices, J. rnne angew. Math., Bd. 193, pp. 143-160, 1954.
Explicit Solutions and Canonical Forms
223
(Another detailed discussion of the properties of a particularly interesting matrix may be found in T. Kato. l ) 40. Denote by s(A) the function max Ix, - xii, and by IIA II the quantity
Show that
and thus that s(A) ~ 2~illA" (L. Mirsky). 41. Let [A,B) = AB - BA, as above. If [A,[A,A·1l = 0, then A is normal (Putnam). See T. Kato and O. Taussky.2 41. If XI, X2, . . . ,XN are the characteristic values of A, CI and C2 are two nonzero complex numbers, and the characteristic values of cIA + c 2A· are CIXI + CIXf(I" where j(i) is a permutation of the integers I, . . . , N, then A is normal. 48. Let A and B commute. Then the characteristic values of /(A,B) are /(XI,1I1), where XI and lAi are the characteristic values of A and B arranged in a fixed order, independent of / (G. Frobenius). (Pairs of matrices kand B for which cIA c 2B has the characteristic roots CIX, C2IA' are said to possess property L.I) 44. Prove that A is normal if and only if
+
+
N
tr(AOA) = 11X,I2
(I. Schur)
.-1 46. Use this result to establish Wiegmann's result that BA is normal if A, Band AB are normal (L. Mirsky). 48. If A is a Hermitian matrix, show that with s(A) defined as in Exercise 40, s(A)
= 2 sup l(u,Av)1
U.'
where the upper bound is taken with respect to all pairs of orthogonal vectors u and v (L. Mirsky). 47. If A is Hermitian, then s(A) ~ 2 max larsl (Parker, Mirsky). r;'.
48. If A is normal, then s(A)
~ sup
1·1-1
s
(zA ~ ZA .)
(L. Mirsky) 4
IT. Kato, On the Hilbert Matrix, Proc. Am. Math. Soc., vol. 8, pp. 73-81, 1957. 2T. Kato and O. Taussky, Commutators of A and A·, J. Washington A cad. Sci., vol. 46, pp. 38-40, 1956. a For a discussion and further references, see T. S. Motzkin and O. Taussky, Pairs of Matrices with Property L, Trans. Am. Math. Soc., vol. 73, pp. 108-114, 1952. 4 For the preceding results, and further ones, see L. Mirsky, Inequalities for Normal and Hermitian Matrices, Duke Math. J., vol. 24, pp. 592-600, 1957.
224
Introduction to Matrix Analysis
49. If A and B are normal matrices with characteristic values Xi and ""i, respectively, then there exists a suitable rearrangement of the characteristic values so that
llXi - lIil l 5
,
IIA - Bill, where IIAII2 =
~ laill l , as above ,,'
(Hoffman-Wieland/).
GO. As we know, every matrix A can be written in the form B + iC, where Band C are Hermitian. Let the characteristic roots of B be contained between bl and bl and those of C be between CI and CI. Then if we consider the rectangle in the complex plane with the vertices bl + iCI, bl + iC2, b2 + iCI, b2 + UI, all the characteristic roots of A are contained within this rectangle (Bendixson-Hirsch). 61. By the domain D of a matrix A, let us mean the set of complex values assumed by the quadratic form (x,A:!!) for values of x satisfying the constraint (x,:!!) = 1. Show that the characteristic values of A belong to the domain of A.I 61. Let K be the smallest convex domain which includes all characteristic values of A, and let D be a domain of A. If A is normal, K and D are the same. 63. Show, by considering 2 X 2 matrices, that the result need not be valid if A is not normal. M. The boundary of D is a convex curve, whether or not A is normal. 66. In addition, D itself is a convex set. 1 For a bibliography of recent results concerning this and related problems, see O. Taussky.3 66. Associated with the equation x' - Ax, where x is an N-dimensional vector and A an N X N matrix, is an Nth-order linear differential equation satisfied by each component of x. Suppose that every solution of the corresponding equation associated with y' = By is a solution of the equation associated with x' = Ax. What can be said about the relation between the matrices .4 and B1 67. Consider the equation dx/dt = Bx, x(O) = c, where B is a constant matrix to be chosen so that the function (x,x) remains constant over time. Show that B will have the required property if and only if it is skew-symmetric. From this, conclude that eB is orthogonal. 68. Using the fact that the vector differential equation A d 2x/dt l + B dx/dt + Cx = 0 can be considered to be equivalent to the system dx/dt = y, A dy/dt + By + Cx = 0, obtain a determinantal relation equivalent to IAx2 + Bx + CI = 0 involving only linear terms in X. 69. Let XI, XI, . . . , x.. be scalars, and F(X.,XI, • • • ,x..) denote the matrix Alxl + Alxl + ... + A ..x.., where Ai are pairwise commutative. Then if /(X.,X2' .•• ,x..) = IF(x.,xl, . . . x..) I (the determinant of F), we have f(X.,X 2, ••. ,X.. ) = 0 if the Xi are matrices such that A.X I + A 2X. + ... + A .. X .. = O. (This generalization of the Hamilton-Cayley theorem is due to H. B. Phillips.) 60. Show that the result remains valid if the linear form
L
A/Xi is replaced by a
i
polynomial in the Xi with matrix coefficients. See Chao Ko and H. C. Lee, 4 where references to earlier work by Ostrowski and Phillips will be found. I For this and the following results see O. Tocplitz, Das algebraische Analogon zu einem Satze von Fejer, Math. Z., vol. 2, pp. 187-197, 1919. I F. Hausdorff, Der Wertvorrat einer Bilinearform, Math. Z., vol. 3, pp. 814-316, 1919. 3O. Taussky, Bt"bliography on Bounds for Characteristic Roots of Finite Matrices, National Bureau of Standards, September, 1951. 4 Chao Ko and H. C. Lee, A Further Generalization of the Hamilton-Cayley Theorem, J. London Math. Soc., vol. 15, pp. 153-158,1940.
61. If E_i, i = 1, 2, . . . , n, are 4 × 4 matrices satisfying the relations E_i^2 = −I, E_iE_j = −E_jE_i, i ≠ j, then n ≤ 5 (Eddington).
62. If the E_i are restricted to be either real or pure imaginary, then 2 are real and 3 are imaginary (Eddington). For a generalization, see M. H. A. Newman.¹
63. If B commutes with every matrix that commutes with A, then B is a scalar polynomial in A. See J. M. Wedderburn.²
64. Let A be a complex matrix with characteristic roots λ_1, λ_2, . . . , λ_N. Then,
if we set ‖X‖^2 = Σ_{i,j} |x_{ij}|^2, we have

inf_S ‖S^{-1}AS‖^2 = Σ_{i=1}^{N} |λ_i|^2

where the lower bound is taken with respect to all nonsingular S. The lower bound is attained if and only if A is diagonalizable (L. Mirsky).
65. Let A, B, and X denote N × N matrices. Show that a sufficient condition for the existence of at least one solution X of the matrix equation X^2 − 2AX + B = 0 is that the characteristic values of
be distinct (Treuenfels).
66. Let {X} be a set of N × N matrices with the property that every real linear combination has only real characteristic values. Then λ_max(X), the largest characteristic value, is a convex matrix function, and λ_min(X), the smallest characteristic value, is a concave matrix function (P. D. Lax).
67. Let X and Y be two matrices all of whose linear combinations have real characteristic values, and suppose that those of X are negative. Then any characteristic root of X + iY has negative real part (P. D. Lax).
68. Let {X} be a set of N × N real matrices with the property that every real linear combination has only real characteristic values. Then if X_1 and X_2 are two matrices in the set with the property that X_1 − X_2 has non-negative characteristic roots, we have λ_i(X_1) ≥ λ_i(X_2), i = 1, 2, . . . , N, where λ_i(X) is the ith characteristic root arranged in order of magnitude (P. D. Lax). For proofs of these results based upon the theory of hyperbolic partial differential equations, see P. D. Lax.³ For proofs along more elementary lines, see A. F. Weinberger.⁴
69. Let {α_i}, {β_i} (1 ≤ i ≤ n) be complex numbers, each of absolute value 1. Prove: There exist two unitary matrices A, B of order n with the preassigned characteristic roots {α_i}, {β_i}, respectively, and such that 1 is a characteristic root of AB, if and only if, in the complex plane, the convex hull of the points {α_i} and the convex hull of the points {β_i} have a point in common (Ky Fan).⁵
¹ M. H. A. Newman, J. London Math. Soc., vol. 7, pp. 93-99, 1932. ² J. M. Wedderburn, Lectures on Matrices, Am. Math. Soc. Colloq. Publ., vol. 17, p. 106, 1934. ³ P. D. Lax, Differential Equations, Difference Equations and Matrix Theory, Comm. Pure Appl. Math., vol. XI, pp. 175-194, 1958. ⁴ A. F. Weinberger, Remarks on the Preceding Paper of Lax, Comm. Pure Appl. Math., vol. XI, pp. 195-196, 1958. ⁵ This result is analogous to a theorem of H. Wielandt, Pacific J. Math., vol. 5, pp. 633-638, 1955, concerning the characteristic values of the sum of two normal matrices.
70. We know that e^{At} may be written as a polynomial in A, e^{At} = u_0(t)I + u_1(t)A + · · · + u_{N-1}(t)A^{N-1}. Determine differential equations for the u_i(t) using the fact that d/dt (e^{At}) = Ae^{At}.
71. Let A be an N × N matrix and A^{(+)} be formed from A by replacing with zeros all elements of A which are either on or below the principal diagonal. Let A^{(−)} = A − A^{(+)} and suppose that A^{(+)} and A^{(−)} commute. Write e^A = PQ, where P − I has nonzero terms only above the diagonal and Q − I has nonzero terms only on or below the diagonal. Then

P = e^{A^{(+)}},  Q = e^{A^{(−)}}

and the factorization is unique (G. Baxter, An Operator Identity, Pacific J. Math., vol. 8, pp. 649-664, 1958). The analogy between this factorization and the Wiener-Hopf factorization is more than superficial.
72. Consider the linear functional equation f = e + λT(fg), where e is the identity element, λ is a scalar, and T is an operator satisfying a relation of the form (Tu)(Tv) = T(uT(v)) + T(T(u)v) − θT(uv), for any two functions u and v, with θ a fixed scalar. Show that for small λ
f = exp[ T( Σ_{n=1}^{∞} (1/n) θ^{n−1} (λg)^n ) ]

The result is due to Baxter. For proofs dependent on differential equations, see Atkinson¹ and Wendel.² Operators of the foregoing nature are connected with the Reynolds operator of importance in turbulence theory.
73. Let |λI − A| = λ^n + a_1λ^{n−1} + · · · + a_n. Show that

a_1 = −tr (A),  a_2 = −(1/2)[a_1 tr (A) + tr (A^2)],

and obtain a general recurrence relation connecting a_k with a_1, a_2, . . . , a_{k−1} (Bôcher).
74. A matrix A whose main diagonal elements are 0s, and whose other elements are 0s and 1s satisfying the relation a_{ij} + a_{ji} = 1, is called a tournament matrix. Let λ_1, λ_2, . . . , λ_N be the N characteristic roots of A, with |λ_1| ≥ |λ_2| ≥ · · · ≥ |λ_N|. Then −1/2 ≤ Re (λ_1) ≤ (n − 1)/2, |λ_1| ≤ (n − 1)/2, and |λ_k| ≤ (n(n − 1)/2k)^{1/2} for k ≥ 2. (A. Brauer and I. C. Gentry, On the Characteristic Roots of Tournament Matrices, Bull. Am. Math. Soc., vol. 74, pp. 1133-1135, 1968.)
75. If B = lim_{n→∞} A^n, what can be said about the characteristic roots of B? (O. Taussky, Matrices C with C^n → 0, J. Algebra, vol. 1, pp. 5-10, 1964.)
76. Let A be a matrix with complex elements. Show that A is normal (that is, AA* = A*A) if and only if one of the following holds:
(a) A = B + iC, where B and C are Hermitian and commute.
(b) A has a complete set of orthonormal characteristic vectors.
(c) A = U*DU, where U is unitary and D is diagonal.
(d) A = UH, where U is unitary, H is Hermitian, and U and H commute.
(e) The characteristic roots of AA* are |λ_1|^2, . . . , |λ_N|^2, where λ_1, . . . , λ_N are the characteristic roots of A.
(f) The characteristic roots of A + A* are λ_1 + λ̄_1, . . . , λ_N + λ̄_N.
The foregoing properties indicate the value of approximating to a given complex matrix B by a normal matrix A in the sense of minimizing a suitable matrix norm
¹ F. V. Atkinson, "Some Aspects of Baxter's Functional Equation," J. Math. Anal. Appl., vol. 6, pp. 1-29, 1963. ² J. G. Wendel, "Brief Proof of a Theorem of Baxter," Math. Scand., vol. 11, pp. 107-108, 1962.
‖B − A‖. This question was first investigated by Mirsky; see Causey¹ and Hoffman and O. Taussky.²
77. Consider the matrix differential equation X′ = (P_1(t) + P_2(t + s))X, X(0) = I, where P_1(t + 1) = P_1(t), P_2(t + x) = P_2(t), x is irrational, and s is a real parameter. Write X(t,s) for the solution. Show that X(t, s + x) = X(t,s), X(t + 1, s) = X(t, s + 1)X(1,s), and hence that

X(t + n, s) = X(t, s + n)X(1, s + n − 1)X(1, s + n − 2) · · · X(1,s).

78. Does Π_{k=0}^{n} X(1, s + k) possess a limiting behavior as n → ∞? See R. Bellman, A Note on Linear Differential Equations with Quasiperiodic Coefficients, J. Math. Anal. Appl., to appear.
Bibliography and Discussion §1 to §10. The results and techniques follow
R. Bellman, Stability Theory of Differential Equations, Dover Publications, New York, 1969. S. Lefschetz, Lectures on Differential Equations, Ann. Math. Studies, no. 14, 1946. §3. For a discussion of the problem of determining the characteristic polynomial, see A. S. Householder and F. L. Bauer, On Certain Methods for Expanding the Characteristic Polynomial, Numerische Mathematik, vol. 1, pp. 29-40, 1959. §6. For some further results, and additional references, see
K. A. Hirsch, A Note on Vandermonde's Determinant, J. London Math. Soc., vol. 24, pp. 144-145, 1949. §11. For a proof of the Jordan canonical form, see
S. Lefschetz, Differential Equations, Geometric Theory, Interscience Publishers, Inc., New York, Appendix I, 1957. J. H. M. Wedderburn, Lectures on Matrices, Am. Math. Soc. Colloq. Publs., vol. 17, 1934. §13. The result is given by I. Schur in
I. Schur, Über die charakteristischen Wurzeln einer linearen Substitution mit einer Anwendung auf die Theorie der Integralgleichungen, Math. Ann., vol. 66, pp. 488-510, 1909.
¹ R. L. Causey, On Closest Normal Matrices, Computer Sciences Division, Stanford University, Technical Report CS10, June, 1964.
² A. J. Hoffman and O. Taussky, On Characterizations of Normal Matrices, J. Res. Natl. Bur. Standards, vol. 52, pp. 17-19, 1954.
§14. The concept of a normal matrix appears to have been introduced by Toeplitz, O. Toeplitz, Math. Z., vol. 2, p. 190, 1919.
§16. This approximate diagonalization was introduced by Perron and used to establish stability theorems. See O. Perron, Math. Z., vol. 29, pp. 129-160, 1929.
§16. For a discussion of the requisite theorems concerning resultants, see B. L. van der Waerden, Moderne Algebra, Springer-Verlag, Berlin, 1931.
§17. This result was given by Hamilton for quaternions and by Cayley for matrices. On page 18 of MacDuffee's book, there is an account of a generalization of the Hamilton-Cayley theorem due to Phillips. For further generalizations, see the paper by Ko and Lee referred to in Exercise 60 of the Miscellaneous Exercises.
§19. This derivation follows the monograph by Lefschetz, cited above. The determination of C is a problem of great difficulty. An excellent survey of results in this field is contained in V. M. Starzinski, Survey of Works on Conditions of Stability of the Trivial Solution of a System of Linear Differential Equations with Periodic Coefficients, Trans. Am. Math. Soc., series 2, vol. 1, pp. 189-239.
§20. Observe what this result means. Every nonsingular point transformation y = Ax can be considered to arise from a family of continuous transformations furnished by the differential equation dZ/dt = CZ, Z(0) = I, where A = e^C.
§22. Some problems connected with these transformations may be found in the set of Research Problems of the Bull. Am. Math. Soc. for 1958. For the second-order linear differential equation (pu′)′ + qu = 0 there exist a number of interesting representations for the solution. One such, called the Prufer transformation, was generalized by J. H. Barrett to cover the matrix equation (P(t)X′)′ + Q(t)X = 0 (Proc. Am. Math. Soc.,
vol. 8, pp. 510-518, 1957). Further extensions and refinements were given by
W. T. Reid, A Prufer Transformation for Differential Systems, Pacific J. Math., vol. 8, pp. 575-584, 1958. For an application to stability theory, see
R. Conti, Sulla t∞-similitudine tra matrici e l'equivalenza asintotica dei sistemi differenziali lineari, Riv. Mat. Univ. Parma, vol. 8, pp. 43-47, 1957. §23. For a detailed account of the use of biorthogonalization in connection with the basic problems of inversion of matrices and the determination of characteristic roots and vectors, see M. R. Hestenes, Inversion of Matrices by Biorthogonalization and Related Results, J. Soc. Ind. Appl. Math., vol. 6, pp. 51-90, 1958. One of the most important problems in the theory of linear differential equations is that of determining the structure of the solutions of linear systems of equations having the form

Σ_{j=1}^{N} p_{ij}(D) x_j = 0,  x_i(0) = c_i,  i = 1, 2, . . . , N

where D is the operator d/dt. An equivalent problem in matrix analysis is that of determining the characteristic values and vectors of matrices A(λ) whose elements are polynomials in λ. Since any thorough discussion of this problem, which we have merely touched upon in treating A − λI, would take us too far afield in this volume, we would like to refer the reader to
R. A. Frazer, W. J. Duncan, and A. R. Collar, Elementary Matrices, The Macmillan Company, New York, 1946, where many applications are given to the flutter problem, etc. A consideration of the analytic properties of these matrices brings us in contact with the general theory of matrices whose elements are functions of a complex variable. These questions are not only of importance in themselves, but have important ramifications in modern physics and prediction theory. See P. Masani, The Laurent Factorization of Operator-valued Functions, Proc. London Math. Soc., vol. 6, pp. 59-69, 1956. P. Masani and N. Wiener, The Prediction Theory of Multivariate
Stochastic Processes, Acta Math., I, vol. 98, pp. 111-150, 1957; II, vol. 99, pp. 96-137, 1958. H. Helson and D. Lowdenslager, Prediction Theory and Fourier Series in Several Variables, Acta Math., vol. 99, 1958.
Results on the structure of matrices with quaternion elements may be found in N. A. Wiegmann, The Structure of Unitary and Orthogonal Quaternion Matrices, Illinois J. Math., vol. 2, pp. 402-407, 1958, where further references may be found. Some recent papers of interest concerning the location of characteristic roots are
J. L. Brenner, New Root-location Theorems for Partitioned Matrices, SIAM J. Appl. Math., vol. 16, pp. 889-896, 1968. A. S. Householder, Norms and the Location of Roots of Matrices, Bull. Am. Math. Soc., vol. 74, pp. 816-830, 1968.
J. L. Brenner, Gersgorin Theorems by Householder's Proof, Bull. Am. Math. Soc., vol. 74, pp. 625-627, 1968. D. G. Feingold and R. S. Varga, Block Diagonally Dominant Matrices and Generalizations of the Gersgorin Circle Theorem, Pacific J. Math., vol. 12, pp. 1241-1250, 1962.
R. S. Varga, Minimal Gerschgorin Sets, Pacific J. Math., vol. 15, pp. 719-729, 1965. B. W. Levinger and R. S. Varga, Minimal Gerschgorin Sets, II, Pacific J. Math., vol. 17, pp. 199-210, 1966. §24. For discussions of analytic and computational aspects of the Laplace transform, see '
R. Bellman and K. L. Cooke, Differential Difference Equations, Academic Press Inc., New York, 1963. R. Bellman, R. E. Kalaba and J. Lockett, Numerical Inversion of the Laplace Transform, American Elsevier, New York, 1966.
12
Symmetric Functions, Kronecker Products and Circulants
1. Introduction. In this chapter, we shall show that some problems of apparently minor import, connected with various symmetric functions of the characteristic roots, lead to a quite important concept in matrix theory, the Kronecker product of two matrices. As a limiting case, we obtain the Kronecker sum, a matrix function which plays a basic role in the theory of the matrix equation

AX + XB = C   (1)
We have already met this equation in an earlier chapter, and we shall encounter it again in connection with stability theory. Furthermore, the Kronecker product will crop up in a subsequent chapter devoted to stochastic matrices. Following this, we shall construct another class of compound matrices which arise from the consideration of certain skew-symmetric functions. Although these are closely tied in with the geometry of N-dimensional Euclidean space, we shall not enter into these matters here. Finally, again motivated by conditions of symmetry, we shall discuss circulants. Throughout the chapter, we shall employ the same basic device to determine the characteristic roots and vectors of the matrices under discussion, namely, that of viewing a matrix as an equivalent of a transformation of one set of quantities into another. A number of problems in the exercises will further emphasize this point of view.
2. Powers of Characteristic Roots. A very simple set of symmetric functions of the characteristic roots are the power sums

p_k = Σ_{i=1}^{N} λ_i^k,  k = 1, 2, . . .   (1)
Suppose that we wish to determine Σ_{i=1}^{N} λ_i^2. Since

Σ_{i=1}^{N} λ_i^2 = (Σ_{i=1}^{N} λ_i)^2 − 2 Σ_{i<j} λ_iλ_j   (2)

we can determine the sum of the squares of the characteristic roots in terms of two of the coefficients of the characteristic polynomial. Moreover, since Σ_{i=1}^{N} λ_i^k, k = 1, 2, . . . , is a symmetric function of the characteristic roots, we know, in advance, that it can be written as a polynomial in the elementary symmetric functions. For many purposes, however, this is not the most convenient representation. We wish to demonstrate
Theorem 1. For k = 1, 2, . . . , we have

Σ_{i=1}^{N} λ_i^k = tr (A^k)   (3)

Proof. If A has distinct characteristic roots, the diagonal representation

A = T diag (λ_1, λ_2, . . . , λ_N) T^{-1}   (4)

which yields the representation for A^k,

A^k = T diag (λ_1^k, λ_2^k, . . . , λ_N^k) T^{-1}   (5)

makes the result evident. It is easily seen that an appeal to continuity along by now familiar lines yields (3) for general matrices.
If we do not like this route, we can employ the triangular representation

A = T B T^{-1}   (6)

where B is upper triangular with the λ_i along its main diagonal; the elements above the main diagonal are not necessarily zero and the λ_i are not necessarily distinct. It is easily seen that

A^k = T B^k T^{-1}   (7)

where B^k is again upper triangular with λ_1^k, . . . , λ_N^k along its main diagonal. This representation yields an alternative proof of (3).
3. Polynomials and Characteristic Equations. Although we know that every characteristic equation produces a polynomial, it is not clear that every polynomial can be written as the characteristic polynomial of a matrix.
Theorem 2. The matrix

A = ( −a_1  −a_2  · · ·  −a_{n−1}  −a_n
        1     0    · · ·     0        0
        0     1    · · ·     0        0
        ·     ·              ·        ·
        0     0    · · ·     1        0 )   (1)

has the characteristic equation

|λI − A| = λ^n + a_1λ^{n−1} + · · · + a_n = 0   (2)
We leave the proof as an exercise.
EXERCISE
1. Using the result given above, determine the sum of the cubes of the roots of the equation λ^n + a_1λ^{n−1} + · · · + a_n = 0.
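Both theorems are easy to try out numerically. The sketch below is mine rather than the text's; it assumes NumPy, builds the companion matrix (1) for an arbitrary example polynomial, and checks that tr (A^k) reproduces the kth power sum of the roots, as Theorem 1 asserts.

```python
import numpy as np

def companion(a):
    """Companion matrix of lambda^n + a1*lambda^(n-1) + ... + an, with a = [a1, ..., an]."""
    n = len(a)
    A = np.zeros((n, n))
    A[0, :] = -np.asarray(a, dtype=float)   # first row: -a1, -a2, ..., -an
    A[1:, :-1] = np.eye(n - 1)              # subdiagonal of ones
    return A

a = [2.0, -5.0, 1.0, 3.0]                   # lambda^4 + 2 lambda^3 - 5 lambda^2 + lambda + 3
A = companion(a)
roots = np.roots([1.0] + a)                 # roots of the same polynomial

for k in range(1, 5):
    power_sum = np.sum(roots ** k)
    trace = np.trace(np.linalg.matrix_power(A, k))
    print(k, np.allclose(power_sum, trace))  # Theorem 1: tr(A^k) = sum of lambda_i^k
```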
4. Symmetric Functions. Although we have solved the problem of determining matrices which have as characteristic roots the quantities λ_1^k, λ_2^k, . . . , λ_N^k, we do not as yet possess matrices whose characteristic roots are prescribed functions of the characteristic roots, such as λ_iλ_j, i, j = 1, 2, . . . , N. In order to solve this problem, we begin with the apparently more difficult problem of determining a matrix whose characteristic roots are λ_iμ_j, where λ_i are the characteristic roots of A and μ_j those of B. Specializing λ_i = μ_i, we obtain the solution of the original problem. Let us discuss the 2 × 2 case where the algebra is more transparent. Start with the two sets of equations

λ_1x_1 = a_{11}x_1 + a_{12}x_2    μ_1y_1 = b_{11}y_1 + b_{12}y_2
λ_1x_2 = a_{21}x_1 + a_{22}x_2    μ_1y_2 = b_{21}y_1 + b_{22}y_2   (1)

and perform the following multiplications:

λ_1μ_1x_1y_1 = a_{11}b_{11}x_1y_1 + a_{11}b_{12}x_1y_2 + a_{12}b_{11}x_2y_1 + a_{12}b_{12}x_2y_2
λ_1μ_1x_1y_2 = a_{11}b_{21}x_1y_1 + a_{11}b_{22}x_1y_2 + a_{12}b_{21}x_2y_1 + a_{12}b_{22}x_2y_2
λ_1μ_1x_2y_1 = a_{21}b_{11}x_1y_1 + a_{21}b_{12}x_1y_2 + a_{22}b_{11}x_2y_1 + a_{22}b_{12}x_2y_2
λ_1μ_1x_2y_2 = a_{21}b_{21}x_1y_1 + a_{21}b_{22}x_1y_2 + a_{22}b_{21}x_2y_1 + a_{22}b_{22}x_2y_2   (2)
We note that the four quantities x_1y_1, x_1y_2, x_2y_1, x_2y_2 occur on the right and on the left in (2). Hence if we introduce the four-dimensional vector

z = (x_1y_1, x_1y_2, x_2y_1, x_2y_2)′   (3)

and the 4 × 4 matrix

C = ( a_{11}b_{11}  a_{11}b_{12}  a_{12}b_{11}  a_{12}b_{12}
      a_{11}b_{21}  a_{11}b_{22}  a_{12}b_{21}  a_{12}b_{22}
      a_{21}b_{11}  a_{21}b_{12}  a_{22}b_{11}  a_{22}b_{12}
      a_{21}b_{21}  a_{21}b_{22}  a_{22}b_{21}  a_{22}b_{22} )   (4)

we may write the equations of (2) in the form

Cz = λ_1μ_1z   (5)

It follows that z is a characteristic vector of C with the associated characteristic value λ_1μ_1. In precisely the same way, we see that C has λ_1μ_2, λ_2μ_1, and λ_2μ_2 as characteristic roots with associated characteristic vectors formed in the same fashion as in (3). We thus have solved the problem of determining a matrix whose characteristic roots are λ_iμ_j, i, j = 1, 2.
5. Kronecker Products. Since even the 2 × 2 case appears to introduce an unpleasant amount of calculation, let us see whether or not we can introduce a better notation. Referring to (4.4), we note that there is a certain regularity to the structure of C. Closer observation shows that C may be written as a compound matrix

C = ( a_{11}B  a_{12}B
      a_{21}B  a_{22}B )   (1)

or, even more simply,

C = (a_{ij}B)   (2)

Once this representation has been obtained, it is easy to see that it is independent of the dimensions of A and B. Hence
Definition. Let A be an M-dimensional matrix and B an N-dimensional matrix. The MN-dimensional matrix defined by (2) is called the Kronecker product of A and B and written

A × B = (a_{ij}B)   (3)
The above argumentation readily yields Theorem 3.
Theorem 3. The characteristic roots of A × B are λ_iμ_j, where λ_i are the characteristic roots of A and μ_j the characteristic roots of B. The characteristic vectors have the form

z^{ij} = (x_1^i y^j, x_2^i y^j, . . . , x_M^i y^j)′   (4)

Here by x_k^i, k = 1, 2, . . . , M, we mean the components of the characteristic vector x^i of A, while y^j is a characteristic vector of B.
EXERCISES
1. Show that tr (A × B) = (tr A)(tr B).
2. Determine |A × B|.
3. Show that I × B = diag (B, B, . . . , B).
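Theorem 3 can be checked numerically. The short sketch below is mine (it assumes NumPy, whose np.kron plays the role of A × B); the random matrices are only an example.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((2, 2))

lam = np.linalg.eigvals(A)
mu = np.linalg.eigvals(B)

# The characteristic roots of A x B are all products lambda_i * mu_j.
products = np.sort_complex(np.outer(lam, mu).ravel())
kron_eigs = np.sort_complex(np.linalg.eigvals(np.kron(A, B)))
print(np.allclose(products, kron_eigs))   # True
```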
6. Algebra of Kronecker Products. In order to justify the name "Kronecker product" and the notation we have used, we must show that A × B possesses a number of the properties of a product. We shall leave as exercises for the reader proofs of the following results:

A × B × C = (A × B) × C = A × (B × C)   (1a)
(A + B) × (C + D) = A × C + A × D + B × C + B × D   (1b)
(A × B)(C × D) = (AC) × (BD)   (1c)
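These identities are also easy to confirm numerically. The following sketch is mine, assumes NumPy, and simply tests (1a)-(1c) on random matrices of a common size.

```python
import numpy as np

rng = np.random.default_rng(2)
A, B, C, D = (rng.standard_normal((2, 2)) for _ in range(4))

# (1a) associativity of the Kronecker product
print(np.allclose(np.kron(np.kron(A, B), C), np.kron(A, np.kron(B, C))))
# (1b) (A + B) x (C + D) = A x C + A x D + B x C + B x D
print(np.allclose(np.kron(A + B, C + D),
                  np.kron(A, C) + np.kron(A, D) + np.kron(B, C) + np.kron(B, D)))
# (1c) the mixed-product rule (A x B)(C x D) = (AC) x (BD)
print(np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D)))
```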
7. Kronecker Powers—I. We shall also consider the Kronecker power and write

A^{[2]} = A × A,  A^{[k+1]} = A × A^{[k]}   (1)

If A and B do not commute, (AB)^k ≠ A^kB^k in general, and never if k = 2. It is, however, true that

(AB)^{[k]} = A^{[k]}B^{[k]}   (2)

for all A and B. This important property removes many of the difficulties due to noncommutativity, and will be used for this purpose in a later chapter devoted to stochastic matrices.
EXERCISE
1. Prove that A^{[k+l]} = A^{[k]} × A^{[l]}.
8. Kronecker Powers—II. If we are interested only in Kronecker powers of a particular matrix, rather than general products, we can define matrices with these properties which are of much smaller dimension. Starting with the equations

λ_1x_1 = a_{11}x_1 + a_{12}x_2
λ_1x_2 = a_{21}x_1 + a_{22}x_2   (1)

we form the products

λ_1^2x_1^2 = a_{11}^2x_1^2 + 2a_{11}a_{12}x_1x_2 + a_{12}^2x_2^2
λ_1^2x_1x_2 = a_{11}a_{21}x_1^2 + (a_{11}a_{22} + a_{12}a_{21})x_1x_2 + a_{12}a_{22}x_2^2
λ_1^2x_2^2 = a_{21}^2x_1^2 + 2a_{21}a_{22}x_1x_2 + a_{22}^2x_2^2   (2)

It is clear then that the matrix

( a_{11}^2     2a_{11}a_{12}              a_{12}^2
  a_{11}a_{21} (a_{11}a_{22} + a_{12}a_{21}) a_{12}a_{22}
  a_{21}^2     2a_{21}a_{22}              a_{22}^2 )   (3)

possesses the characteristic roots λ_1^2, λ_1λ_2, and λ_2^2. The corresponding characteristic vectors are readily obtained.
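As a numerical check of the 3 × 3 matrix in (3) — the sketch is mine and assumes NumPy; the particular 2 × 2 matrix is arbitrary:

```python
import numpy as np

a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
(a11, a12), (a21, a22) = a

# The matrix of (3), acting on the coefficients of x1^2, x1*x2, x2^2.
A2 = np.array([
    [a11**2,   2*a11*a12,          a12**2],
    [a11*a21,  a11*a22 + a12*a21,  a12*a22],
    [a21**2,   2*a21*a22,          a22**2],
])

lam = np.linalg.eigvals(a)                          # lambda_1, lambda_2
expected = [lam[0]**2, lam[0]*lam[1], lam[1]**2]
print(np.allclose(np.sort(np.linalg.eigvals(A2)), np.sort(expected)))   # True
```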
To obtain the expression for A^{[k]}, we proceed in a more systematic fashion.
9. Kronecker Powers—III. Consider the equations, for a fixed integer k,

(a_{11}x_1 + a_{12}x_2)^k = a_{11}^k x_1^k + k a_{11}^{k−1}a_{12} x_1^{k−1}x_2 + · · ·
(a_{11}x_1 + a_{12}x_2)^{k−1}(a_{21}x_1 + a_{22}x_2) = a_{11}^{k−1}a_{21} x_1^k + ((k − 1)a_{11}^{k−2}a_{12}a_{21} + a_{11}^{k−1}a_{22}) x_1^{k−1}x_2 + · · ·
. . . . . . . . . .   (1)

that is, the expansions of (a_{11}x_1 + a_{12}x_2)^{k−i}(a_{21}x_1 + a_{22}x_2)^i, where i = 0, 1, 2, . . . , k. The matrix A^{[k]} is then defined by the tableau of coefficients of the terms x_1^{k−i}x_2^i, i = 0, 1, 2, . . . , k,

A^{[k]} = ( a_{11}^k              k a_{11}^{k−1}a_{12}                                  · · ·
            a_{11}^{k−1}a_{21}    ((k − 1)a_{11}^{k−2}a_{12}a_{21} + a_{11}^{k−1}a_{22})   · · ·
            · · ·                 · · ·                                                 · · · )   (2)

This representation holds only for 2 × 2 matrices. The representation A^{[k]} for matrices of dimension N can be obtained similarly.
EXERCISES
1. Show that (AB)^{[k]} = A^{[k]}B^{[k]}.
2. Show that A^{[k+l]} = A^{[k]}A^{[l]}.
10. Kronecker Logarithm. Let us now derive an infinite matrix which we can think of as a Kronecker logarithm. We begin by forming a generalized kth power for nonintegral k. To do this, we consider the infinite sequence of expressions (a_{11}x_1 + a_{12}x_2)^{k−i}(a_{21}x_1 + a_{22}x_2)^i, i = 0, 1, 2, . . . , for k nonintegral. This yields, in place of the finite tableau of (8.2), an infinite array, since the binomial series for (a_{11}x_1 + a_{12}x_2)^{k−i} is now not finite. The change of variable (1) shows that

(AB)^{[k]} = A^{[k]}B^{[k]}   (2)

holds for arbitrary values of k. Let us now expand each element in A^{[k]} about k = 0. Write

A^{[k]} = A^{[0]} + kL(A) + · · ·   (3)

This defines an infinite matrix L(A), which we can think of as a Kronecker logarithm. Substituting in (2), we see that

L(AB) = A^{[0]}L(B) + L(A)B^{[0]}   (4)

an analogue of the functional equation for the scalar logarithm. Observe that A^{[0]} is not an identity matrix.
EXERCISE
1. Determine the ijth element in L(A), for i = 0, 1, j = 0, 1.
11. Kronecker Sum—I. Let us now turn to the problem of determining matrices which possess the characteristic values λ_i + μ_j. Following a method used before, we shall derive the additive result from the multiplicative case. Consider, for a parameter ε, the relation

(I_M + εA) × (I_N + εB) = I_M × I_N + ε(I_M × B + A × I_N) + ε^2 A × B   (1)

Since the characteristic roots of (I_M + εA) × (I_N + εB) are (1 + ελ_i)(1 + εμ_j) = 1 + ε(λ_i + μ_j) + ε^2λ_iμ_j, we see that the characteristic roots of I_M × B + A × I_N must be λ_i + μ_j.
EXERCISE
1. Determine the characteristic vectors of I_M × B + A × I_N.
12. Kronecker Sum—II. The matrix I_M × B + A × I_N can also be obtained by use of differential equations. Consider the equations

dx_1/dt = a_{11}x_1 + a_{12}x_2    dy_1/dt = b_{11}y_1 + b_{12}y_2
dx_2/dt = a_{21}x_1 + a_{22}x_2    dy_2/dt = b_{21}y_1 + b_{22}y_2   (1)

Let us now compute the derivatives of the four products x_1y_1, x_1y_2, x_2y_1, x_2y_2. We have

d/dt (x_1y_1) = (a_{11}x_1 + a_{12}x_2)y_1 + x_1(b_{11}y_1 + b_{12}y_2) = (a_{11} + b_{11})x_1y_1 + b_{12}x_1y_2 + a_{12}x_2y_1   (2)
and so on. It is easy to see that the matrix we obtain is precisely A × I_N + I_M × B.
13. The Equation AX + XB = C, the Lyapunov Equation. In Chap. 11, we showed that the equation

AX + XB = C   (1)

possessed the unique solution

X = − ∫_0^∞ e^{At} C e^{Bt} dt   (2)

provided that the integral on the right exists for all C. Let us now complete this result. Consider the case where A, B, C, and X are 2 × 2 matrices. The equations to determine the unknown components x_{ij} are

a_{11}x_{11} + a_{12}x_{21} + x_{11}b_{11} + x_{12}b_{21} = c_{11}
a_{11}x_{12} + a_{12}x_{22} + x_{11}b_{12} + x_{12}b_{22} = c_{12}
a_{21}x_{11} + a_{22}x_{21} + x_{21}b_{11} + x_{22}b_{21} = c_{21}
a_{21}x_{12} + a_{22}x_{22} + x_{21}b_{12} + x_{22}b_{22} = c_{22}   (3)

The matrix of coefficients is

( a_{11} + b_{11}   b_{21}            a_{12}            0
  b_{12}            a_{11} + b_{22}   0                 a_{12}
  a_{21}            0                 a_{22} + b_{11}   b_{21}
  0                 a_{21}            b_{12}            a_{22} + b_{22} )   (4)
which we recognize as the matrix A × I + I × B′. The characteristic roots are thus λ_i + μ_j, since B and B′ have the same characteristic roots. It is easy to see that the same results generalize to matrices of arbitrary dimension, so that we have established Theorem 4.
Theorem 4. A necessary and sufficient condition that (1) have a solution
for all C is that λ_i + μ_j ≠ 0, where λ_i are the characteristic roots of A and μ_j the characteristic roots of B.
EXERCISES
1. Prove that a necessary and sufficient condition that AX
+
XA' = C have a
Introduction to Matrix Analysis
240
(b) Then AX + XA' = C becomes B'(T'XT) + (T'XT)B = T'eT. (c) Let Y - T' XT and consider the linear equations for the elements of Y. N
Show that the determinant of the coefficients is
n
(bi'
+ bli) and thus
i,j-l
derive the stated results (Hahn).
14. An Alternate Route. We observed in the foregoing sections how Kronecker products arose from the consideration of various symmetric functions of the roots of two distinct matrices, A and B, and how a particular Kronecker power could be formed from a single matrix A. Let us now consider a different type of matrix" power" which is formed if we consider certain skew-symmetric functions, namely, determinants. There is strong geometric motivation for what at first sight may seem to be quite formal manipulation. The reader who is interested in the geometric background will find references to books by Bocher and Klein listed in the Bibliography at the end of the chapter. To emphasize the basic idea with the arithmetic and algebraic level kept at a minimum, we shall consider a 3 X 3 matrix and a set of 2 X 2 determinants formed from the characteristic vectors. In principle, it is clear how the procedure can be generalized to treat R X R determinants associated with the characteristic vectors of N X N matrices. In practice, to carry out the program would require too painstaking a digression into the field of determinants. Consequently, we shall leave the details, by no means trivial, to the interested reader. Consider the matrix (1)
where we have momentarily departed from our usual notation to simplify keeping track of various terms, whose characteristic roots are At, A2, Aa, with associated characteristic vectors xt, x 2, x a. For any two characteristic vectors, x' and xi, i ¢ j, we wish to obtain a set of linear equations for the 2 X 2 determinants yt, Y2, ya given by the relations
Ajxt~ I
Aixi
(2)
AjXa~ I
AiXt'
Since A,Xt' = atxt;
= blXt' A,.La i = CtXt' A,X2'
+ a2x2' + aaXa' + b2X21 + bJxa + C2X2· + caXa; i
(3)
Symmetric Functions, Kronecker Products and Circulants 241 with a similar equation for the components of xi, we see that i _I atxt + a2x 2' + aaxai btxti + b2X2i + baxai
AA· i ,Yt -
atxl btXti
+ a2x i + aaXai I + b2xi + baxi
(4)
If a is the vector whose components are the ai, and b the vector whose components are the bi , we see that (4) is equivalent to the equation (a, Xl) (b,X')
I
A;AjYt =
(a,x') (b,xi)
I
(5)
What we want at this point is a generalization of the expansion for the Gramian given in Sec. 5 of Chap. 4. Fortunately, it exists. It is easy to verify that
I
(a, Xi) (b,x i)
(a,xi) (b,xi)
I= I II at a2
I + I:: :: II::: ::~ I+ I:: :: II::: ::;I
bt b2
Xti
X2 i
Xti xi
(6)
From this it, follows immediately fr0f!l consideration of symmetry that
["']
A,Aj Y2 Ya
[.,,(a'b)
= gt2(a,c) g12(b,c)
gn(a,b) g2a(a,c) g23(b,c)
.,,(a,b)] ["'] g31(a,c) g31(b,c)
Y2 Ya
(7)
where Uii(X,Y) = IXi Xi
Yi I Yi
(8)
Our first observation is that AiA; is a characteristic root and y an associated characteristic vector of the matrix occurring in (7). Again, since the matrix is independent of i and j, we see that the three characteristic roots must be AtA2, AtA a, and A2Aa. We now have a means of obtaining a matrix whose characteristic roots are NAi , i ¢ j. EXERCISES
1. Reverting to the usual notation for the elements of A, write out the representation for the ijth element of the matrix appearing in (7). i. Let A be an N X N matrix. Write out the formula for the elements of the N(N - 1)/2-dimensional matrix, obtained in the foregoing fashion, whose roots are XiX;, i ~ j. 8. For the case where A is symmetric, show that we can obtain the inequality
Iall au'
)..).. I 2 -> au au
from the preceding results, and many similar inequalities. 4. Let A be au N X N matrix and let r be an integer between I and N. Denote by S, the ensemble of all sets of r distinct integers chosen from the integers I, 2, . . . , N.
Introduction to Matrix Analysis
242
Let Bl!ond t be two elements of S, and A" denote the matrix formed from A by deltutng all rows whose indices do not belong to B and all columns whose elements do not b~long
to t. 81,
Let the elemente of S, be enumerated in some fixed order, say in numerical value, BI, ••• ,8M. The M X M matrix, where M = N!/rf(N - r) f, C,(A) =
(IA"'iD
is called the rth compound or rth adjugate of A. Establish the following results. (a) C,(AB) = C,(A)C,(B) (b) C,(A') - C,(A)' (c) C,(A -I) - C,(A)-I (d) IC,(A)I = IAI· k = (N - 1) f/(r = 1) I(N - r) f (e) The characteristic roots of C,(A) are the expre88ions ",.., ""1, • • • , ",..V, where ",.1 + ",.1 + ... +",.M is the rth elementary symmetric function of "., "I, • • • • "N, e.g., ",.1 = "1"1 • • • "8, ",.1 = ",.1/11 • • • "8_1"11+1, • • • • • For a discu88ion of when a given matrix is a compound of another, see D. E. Rutherford, Compound Matrices, Koninkl. Ned. Akad. WetenBchap. Amsterdam, Proc. Sect. Sci., 8er. A, vol. 54, pp. 16-22, 1951. Finally, for the connection with invariant theory, see R. Weitzenbuck, Invariantentheorie, Groningen, 1923.
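The rth compound lends itself to direct computation from the r × r minors. The sketch below is mine rather than the text's; it assumes NumPy and Python's itertools, and spot-checks property (a), C_r(AB) = C_r(A)C_r(B), on one random example.

```python
import numpy as np
from itertools import combinations

def compound(A, r):
    """rth compound of A: the matrix of r-by-r minors, indexed by r-subsets of rows and columns."""
    n = A.shape[0]
    subsets = list(combinations(range(n), r))   # the elements of S_r in a fixed order
    M = len(subsets)
    C = np.empty((M, M))
    for i, s in enumerate(subsets):
        for j, t in enumerate(subsets):
            C[i, j] = np.linalg.det(A[np.ix_(s, t)])
    return C

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
r = 2

print(np.allclose(compound(A @ B, r), compound(A, r) @ compound(B, r)))   # property (a)
```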
16. Circulants. Matrices of the form
c=
Co
Cl
CN-l
Co
CN-2
CN_l
CN-l CN-2
Co
CN-3
(1)
C2
CI
Co
occur in a variety of investigations. Let us determine the characteristic roots and vectors. Let TI be a root of the scalar equation TN = I, and set
=
YI
Then we see that
YI
Yl
+ CITI + . . . + CN_ITI N- I
(2)
satisfies the following system of equations:
=
TIYI
Co
+ CITI + . . . + CN_ITI N- I = CN_I + Corl + . . . + cN_2TI N - 2 Co
(3) TIN-IYI
=
CI
+ C2TI + . . . + CoT IN-I
I See, for further results and references, H. J. Ryser, Inequalities of Compound and Induced Matrices with Applications to Combinatorial Analysis, Illinoi8 J. Math., vol. 2, pp. 240-253, 1958.
Symmetric Functions, Kronecker Products and Circulants
243
It follows that YI is a characteristic root of C with associated characteristic vector 1 rl
(4)
Since the equation r N = 1 has N distinct roots, we see that we obtain N distinct characteristic vectors. Consequently, we have the complete set of characteristic roots and vectors in this way. EXERCISES
1. Use the scalar equation r 6 = air istic roots and vectors of the matrix CI
CO
[
CI Cs Cs
Cs Cs C.
c.
Co
+ 1 in a similar fashion to obtain the characterCs
+ cia
+ c~
+ caa + c.a
Ca
c. Co
C,
+ c~
+ caa
C.
Co
+ c.a
+ caa + c.a
Co
Cs
i. Generalize, using the defining equation
rN
=
Ct
Cs Ca
CI
CI
+C. c.a]
blr N - I
+ ... + b N •
MISCELLANEOUS EXERCISES
1. Let J(X) be a function of the NI variables Xi; possessing a power series development in these variables about zero. Show that we may write J(X) =
I k
i. Is it true that
tr (X/kICk)
~O
...
etrX
=
~ ~
tr (X/kJ) ? k! .
k=O
3. 4. 1,2, I.
If A a.nd B are positive definite, then A X B is positive definite. If A and B are symmetric matrices with A ~ B, then AI'] ~ Bln/, for n ""
....
If r satisfies the equation in Exercise 23 of Chap. 11 and 8 the equation 8 M + blsM- 1 + ... + by = 0, how does one form an equation whose roots are r,8i? 8. Let 1, 011, • • • , OIN_I be the ell'menta of a finite group. Write x = Xo + XIOII + ... + XN_10lN_l where the Xi are scnlars. Consider the products aiX = a,(O)Xo + a,(I)xl + ... + OIi(N - l)XN-l, where the elements OI,(J) are the 01, in some order, or OIiX = xo(t) + Xt(i)OII + XN_l(i)OIN_l, where the XI(J) are the Xi
+ ...
244
Introduction to Matrix Analysis
in some order. Introduce the matrix X - (:1:;<3) and write X ""':1:. If X""':I: Y ,..., 1/, does XY ....., x1/? 7. Let 1, ai, ••• , aN_I be the elements of a finite group G, and 1, fJl, ••• ,fJM-1 the elements of a finite group H. Consider the direct product of G and H defined as the group of order MN whose elements are CJlifJ,.. Using the procedure outlined in the preceding exercise, form a matrix corresponding to N-l
(l
M-l
XtCJIi) (
i~O
l
YifJi)
i~O
What is the connection between this matrix and the matrices N-l
X.....,
1
M-l
y.....,
XiCJI;
i-O
1
YifJi?
i-O
For an interesting relation between the concept of group matrices and normality, see O. Tau88ky.1 8. The equation AX - XA - "X possesses nontrivial solutions for X if and only if " = N where "I, "I, .•. , "N are the characteristic values of A (LappaDanilev8k1/). 9. Let F be a matrix of order NI, portioned into an array of N submatrices J;i, each of order N, such that eachJ;,. is a rational function, J;i(A), of a fixed matrix A of order N. If the characteristic values of A are "I, "I, ..• , "N, then those of F are given by the characteristic values of the N matrices, (j;j("k», k = 1, 2, .•• , N, each of order N. See J. Williamso.l l and S. N. Afriat. 1 For an application to the solution of partial differential equations by numerical techniques, see J. Todd 4 and A. N. Lowan.' 10. Let /(9) be a real function of 9 for -11" S;; 9 S;; 11", and form the Fourier coefficients of /(9),
"I
Cn
= -1
211"
Ir
-r
/(9)e- in6 d9
n = 0, ± 1, ±2, ...
The finite matrices k, I "" 0, 1, 2, . . . , N
are called Toeplitz matrices of order N. 11. Show that if we denote by "I(N), values of TN, then •
hm
N-+-
"I(N)
Show that TN is Hermitian. the N + 1 characteristic
"I(N), • • • , "N+I(N),
+ "2(N) + ... + "N+1(N) N +1
1 = -2
11"
Jr -r
/(9) d9
I O. Tau88ky, A Note on Group Matrices, Proc. A m. Math. Soc., vol. 6, pp. 984-986, 1955. I J. Williamson, The Latent Roots of a Matrix of Special Type, Bull. Am. Math. Soc., vol. 87, pp. 585-590, 1931. 'S. N. Afriat, Composite Matrices, Quart. J. Math., Oxford 2d, ser. 5, pp. 81-98, 1954. 4 J. Todd, The Condition of Certain Matrices, III, J. Research Nail. Bur. Standards, vol. 50, pp. 1-7, 1958. • A. N. Lowan, The Operator Approach to Problems of Stability and Convergence of Solutions of Difference Equations, Scripta Math., no. 8, 1957.
Symmetric Functions, Kronecker Products and Circulants 245 li. Show that
(These last two results are particular results of a general result established by Szego in 1917. A discussion of the most recent results, plus a large number of references to the ways in which these matrices enter in various parts of analysis, will be found in Kac, Murdock, and Szego. l ) Generalizations of the Toeplitz matrix have been considered in U. Grenander. ' See also H. Widom. 2 18. Having defined what we mean by the Kronecker sum of two matrices, A ED B, define a Kronecker derivative of a variable matrix X(t) as follows:
8X = lim X(t 8t
+ h)
ED (- X(t» h
h-->O
14. Consider the differential equation 8Y = A X Y
Y(O) = I
8t
Does it have a solution, and if so, what is it? 11. Establish the Schur result that C = (a,;b i ;) is positive definite if A and Bare by considering C as a suitable submatrix of the Kronecker product A X B (Marcus). 18. Let ali) be the ith column of A. Introduce the "stacking operator," S(A), which transforms a matrix into a vector
aW]
all)
S(A)
=
l
:
a(N)
Prove that S(PAQ) = (Q'l8lP)S(A). See D. Nissen, A Note on the Variance of a Matrix, Econometrica, vol. 36, pp. 603-604, 1968. N
17. Consider the Selberg quadratic form (see Appendix B), Q(x) = where at, k x. are real.
= 1, 2,
xv)',
_/ao
..• , N, is a set of positive integers, XI = I, and the remaining N
Consider the extended form Q,(x,z) =
j,(z) = min Q,(x,z),
I (I k-I
l
'-I where the minimization is over
(Zk
+
l
x.) I and write
v/ao Xk, Xk+I, . • . , XN.
Obtain a
:t:
recurrence relation connecting j,(z) with j,+I(Z). 1M. Kac, W. L. Murdock, and G. Szego, On the Eigenvalues of Certain Hermitian Forms, J. Rat. Mech. Analysis, vol. 2, pp. 767-800, 1953. I U. Grenander, Trans. Am. Math. Soc., 195~. • H. Widom, On the Eigenvalues of Certain Hermitian Operators, Trans. Am. Math. Soc., vol. 88, 491-522, 1958.
Introduction to Matrix Analysis
246
18. Use the fact that J,(,) is quadratic in , to simplify this recurrence relation; see Chap. 9. 19. Discuss the use of the recurrence relation for analytic and computational purposes. See R. Bellman, Dynamic Programming and the Quadratic Form of Selberg, J. Math.. Anal. Appl., vol. 15, pp.30-32, 1966. iO. What conditions must band c satisfy so that Xb = c where X is positive definite? (Wimmer) il. Establish the Schur result concerning (ai;bi;) using Kronecker products and a suitable submatrix (MaTera).
Bibliography and Discussion §1. The direct product of matrices arises from the algebraic concept of the direct product of groups. For a discussion and treatment from this point of view, see C. C. MacDuffee, The Theory of Matrices, Chelsea Publishing Co., chapter VII, New York, 1946. F. D. Murnaghan, The Theory of Group Representation, Johns Hopkins Press, Baltimore, 1938. We are interested here in showing how this concept arises from two different sets of problem areas, one discussed here and one in Chap. 15. See also E. Cartan, Oeuvres Completes, part 1, vol. 2, pp. 1045--1080, where the concept of a zonal harmonic of a positive definite matrix is introduced. §4. This procedure is taken from R. Bellman, Limit Theorems for Non-commutative Operations-I, Duke Math. J., vol. 21, pp. 491-500, 1954. §6. Some papers concerning the application and utilization of the Kronecker product are F. Stenger, Kronecker Product Extensions of Linear Operators, SIAM J. Numer. Anal., vol. 5, pp. 422-435, 1968. J. H. Pollard, On the Use of the Direct Matrix Product in Analyzing Certain Stochastic Population Models, Biometrika, vol. 53, pp. 397415, 1966.
§10. The Kronecker logarithm is introduced in the paper cited above. §11. The result of Theorem 4 is p;iven in the book by MacDuffee
Symmetric Functions, Kronecker Products and Circulants 247 referred to above. It was independently derived by W. Hahn, using the procedure outlined in the exercises; see W. Hahn, Eine Bemerkung zur zweiten Methode von Lyapunov, Math. Nachr., 14 Band., Heft 4/6, pp. 349-354, 1956. The result was obtained by R. Bellman, as outlined in the text, unaware of the discussion in the book by MacDuffee, cited above,
R. Bellman, Kronecker Products and the Second Method of Lyapunov, Math. Nachr., 1959. §13. A great deal of work has been done on the corresponding operator equations, partly for their own sake and partly because of their central position in quantum mechanics. See
M. Rosenbloom, On the Operator Equation BX - XA Math. J., vol. 23, 1956.
= C, Duke
E. Heinz, Math. Ann., vol. 123, pp. 415--438, 1951. D. E. Rutherford, Koninkl. Ned. Akad. Wetenschap., Proc., ser. A, vol. 35, pp. 54-59, 1932. §14. The motivation for this section lies within the theory of invariants of linear transformations and the geometric interconnections. See M. Bocher, Introduction to Higher Algebra, The Macmillan Company, New York, reprinted, 1947. F. Klein, Elementary Mathematics from an Advanced Standpoint, Geometry, Dover Publications, New York. H. Weyl, The Classical Groups, Princeton University Press, Princeton, N.J., 1946.
The 2 X 2 determinants are the Plucker coordinates. The basic idea is that every linear transformation possesses a large set of associated induced transformations that permit us to derive certain properties of the original transformation in a quite simple fashion. As indicated in the text, we have not penetrated into this area in any depth because of a desire to avoid a certain amount of determinantal manipulation. See also H. Schwerdtfeger, Skew-symmetric Matrices and Projective Geometry, Am. Math. Monthly, vol. LI, pp. 137-148, 1944. It will be clear from the exercises that once this determinantal groundwork has been laid, we have a new way of obtaining a number of the inequalities of Chap. 8.
248
Introduction to Matrix Analysis
For an elegant probabilistic interpretation of Plucker coordinates, and the analogous multidimensional sets, see Exercises 15 and 16 of Chap. 15, where some results of Karlin and MacGregor are given. §16. Circulants play an important role in many mathematical-physical theories. See the paper T. H. Berlin and lVI. Kac, The Spherical Model of a Ferromagnet, Phys. Rev., vol. 86, pp. 821-835, 1952. for an evaluation of some interesting circulants and further references to work by Onsager, et aI., connected with the direct product of matrices and groups. Treatments of the theory of compound matrices, and further references may be found in
H. J. Ryser, Inequalities of Compound and Induced l\1n.trices with Applications to Combinatorial Analysis, Illinois J. Math., vol. 2, pp. 240-253, 1958. N. G. deBruijn, Inequalities Concerning Minors and Eigenvalues, Nieuw. Arch. Wisk. (3), vol. 4, pp. 18-35, 1956. D. E. Littlewood, The Theory of Group Characters and Matrix Representations of Groups, Oxford, New York, 1950.
I. Schur, Vber eine Klasse von M atrizen die sich einer gegebenen Matrix zuordnen lassen, Dissertation, Berlin, 1901.
J. H. M. Wedderburn, Lectures on Matrices, Am. Math. Soc. Colloq. Publ., vol. 17, 1934. C. C. MacDuffee, The Theory of Matrices, Chelsea Publishing Company, New York, 1946. O. Toeplitz, Das algebraische Analogon zu einem Satze von Fejer, Math. Z., vol. 2, pp. 187-197, 1918. M. Marcus, B. N. Moyls, and R. Westwick, Extremal Properties of Hermitian Matrices II, Canad. J. Math., 1959.
13
Stability Theory
1. Introduction. A problem of great importance is that of determining the behavior of a physical system in the neighborhood of an equilibrium state. If the system returns to this state after being subjected to small disturbances, it is called stable; if not, it is called unstable. Although physical systems can often be tested for this property, in many cases this experimental procedure is both too expensive and too time-consuming. Consequently, when designing a system we would like to have mathematical criteria for stability available. It was pointed out in Chap. 11 that a linear equation of the form

dx/dt = Ax,  x(0) = c   (1)

can often be used to study the behavior of a system in the vicinity of an equilibrium position, which in this case is x = 0. Consequently, we shall begin by determining a necessary and sufficient condition that the solution of (1) approach zero as t → ∞. The actual economic, engineering, or physical problem is, however, more complicated, since the equation describing the process is not (1), but nonlinear of the form

dy/dt = Ay + g(y),  y(0) = c   (2)

The question then is whether or not criteria derived for linear systems are of any help in deciding the stability of nonlinear systems. It turns out that under quite reasonable conditions, the two are equivalent. This is the substance of the classical work of Poincaré and Lyapunov. However, we shall not delve into these more recondite matters here, restricting ourselves solely to the consideration of the more tractable linear equations.
2. A Necessary and Sufficient Condition for Stability. Let us begin by demonstrating the fundamental result in these investigations.
Theorem 1. A necessary and sufficient condition that the solution of

dx/dt = Ax,  x(0) = c   (1)

regardless of the value of c, approach zero as t → ∞, is that all the characteristic roots of A have negative real parts.
Proof. If A has distinct characteristic roots, then the representation

A = T diag (λ_1, λ_2, . . . , λ_N) T^{-1}   (2)

establishes the result. We cannot make an immediate appeal to continuity to obtain the result for general matrices, but we can proceed in the following way. In place of reducing A to diagonal form, let us transform it into triangular form by means of a similarity transformation, T^{-1}AT = B. The system of equations in (1) takes the form

dz/dt = Bz,  z(0) = c′   (3)

where B is a triangular matrix, upon the substitution x = Tz. Written out in terms of the components, we have
dz_1/dt = b_{11}z_1 + b_{12}z_2 + · · · + b_{1N}z_N
dz_2/dt = b_{22}z_2 + · · · + b_{2N}z_N
. . . . . . . . . .
dz_N/dt = b_{NN}z_N   (4)

Since the b_{ii} are the characteristic roots of A, we have, by assumption, Re (b_{ii}) < 0 for i = 1, 2, . . . , N. Solving for z_N,

z_N = c_N′ e^{b_{NN}t}   (5)

we see that z_N → 0 as t → ∞.
In order to show that all z_i → 0 as t → ∞, we proceed inductively, based upon the following result. If v(t) → 0 as t → ∞, then u(t) as determined by

du/dt = b_1u + v(t),  u(0) = a_1   (6)

approaches zero as t → ∞, provided that Re (b_1) < 0. Since

u(t) = a_1e^{b_1t} + e^{b_1t} ∫_0^t e^{−b_1s}v(s) ds   (7)

it is easy to see that the stated result is valid. Starting with the result for z_N based upon (5), we obtain successively the corresponding results for z_{N−1}, . . . , z_1.
EXERCISE
1. Prove Theorem 1 using the Jordan canonical form.
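Theorem 1 is easy to illustrate numerically. The following sketch is mine rather than the text's; it assumes NumPy and SciPy, and the matrix A below is only one arbitrary example.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0,  2.0],
              [ 0.0, -3.0]])
c = np.array([1.0, 1.0])

# All characteristic roots have negative real parts ...
print(np.all(np.linalg.eigvals(A).real < 0))          # True
# ... so the solution x(t) = exp(At) c of dx/dt = Ax, x(0) = c, decays.
for t in (1.0, 5.0, 10.0):
    print(t, np.linalg.norm(expm(A * t) @ c))          # norms tend to zero
```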
3. Stability Matrices. To avoid wearisome repetition, let us introduce a new term.
Definition. A matrix A will be called a stability matrix if all of its characteristic roots have negative real parts.
EXERCISES
1. Derive a necessary and sufficient condition that a real matrix be a stability matrix in terms of the matrix (tr (A^{i+j})) (Bass).
2. What is the corresponding condition if A is complex?
4. A Method of Lyapunov. Let us now see how we can use quadratic forms to discuss questions of asymptotic behavior of the solutions of linear differential equations. This method was devised by Lyapunov and is of great importance in the modern study of the stability of solutions of nonlinear functional equations of all types. Consider the equation

dx/dt = Ax,  x(0) = c   (1)

where c and A are taken to be real, and the quadratic form

u = (x,Yx)   (2)

where Y is a symmetric constant matrix as yet undetermined. We have

du/dt = (x′,Yx) + (x,Yx′) = (Ax,Yx) + (x,YAx) = (x,(A′Y + YA)x)   (3)
Suppose that we can determine Y so that

A′Y + YA = −I   (4)

with the further condition that Y be positive definite. Then the relation in (3) becomes

du/dt = −(x,x)   (5)

which yields

du/dt ≤ −u/λ_N   (6)

where λ_N is the largest characteristic root of Y. From (6) we have u ≤ u(0)e^{−t/λ_N}. Hence u → 0 as t → ∞. It follows from the positive definite property of Y that each component of x must approach zero as t → ∞.
We know, however, from the result of Sec. 13 of Chap. 12 that if A is a stability matrix, we can determine the symmetric matrix Y uniquely from (4). Since Y has the representation

Y = ∫_0^∞ e^{A′t} e^{At} dt   (7)

we have

(x,Yx) = ∫_0^∞ (x, e^{A′t}e^{At}x) dt = ∫_0^∞ (e^{At}x, e^{At}x) dt   (8)
It follows that Y is positive definite, since e^{At} is never singular.
EXERCISES
1. Consider the equation dx/dt = Ax + g(x), x(0) = c, where (a) A is a stability matrix, (b) ‖g(x)‖/‖x‖ → 0 as ‖x‖ → 0, (c) ‖c‖ is sufficiently small. Let Y be the matrix determined above. Prove that if x satisfies the foregoing nonlinear equation and the preceding conditions are satisfied, then
d/dt (x,Yx) ≤ −r_1(x,Yx)

where r_1 is a positive constant. Hence, show that x → 0 as t → ∞.
2. Extend the foregoing argument to treat the case of complex A.
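The construction of Sec. 4 can be carried out numerically. Here is a sketch of mine (it assumes SciPy; scipy.linalg.solve_continuous_lyapunov solves equations of the form MX + XM* = Q, so it is called with A transposed). The specific matrix is only an example.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[-2.0,  1.0],
              [ 0.5, -1.0]])        # a stability matrix (eigenvalues in the left half plane)
n = A.shape[0]

# Solve A'Y + YA = -I, equation (4) of Sec. 4.
Y = solve_continuous_lyapunov(A.T, -np.eye(n))

print(np.allclose(A.T @ Y + Y @ A, -np.eye(n)))   # the equation is satisfied
print(np.all(np.linalg.eigvalsh(Y) > 0))          # Y is positive definite, as Theorem 2 of Sec. 7 asserts
# Y also equals the integral of exp(A't) exp(At) dt, the representation (7).
```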
5. Mean-square Deviation. Suppose that A is a stability matrix and that we wish to calculate

J = ∫_0^∞ (x,Bx) dt   (1)

where x is a solution of (4.1).
It is interesting to note that J can be calculated as a rational function of the elements of A without the necessity of solving the linear differential equation for x. In particular, it is not necessary to calculate the characteristic roots of A. Let us determine a constant matrix Y such that

(x,Bx) = d/dt (x,Yx)   (2)

We see that

B = A′Y + YA   (3)

With this determination of Y, the value of J is given by

J = −(c,Yc)   (4)
Since A is a stability matrix, (3) has a unique solution which can be found using determinants. 6. Effective Tests for Stability. The problem of determining when a given matrix is a stability matrix is a formidable one, and at the present time there is no simple solution. What complicates the problem is that we are not so much interested in resolving the problem for a particular matrix A as we are in deriving conditions which enable us to state when various members of a class of matrices, A(~), are stability matrices. Questions of this type arise constantly in the design of control mechanisms, in the field of mathematical economics, and in the study of computational algorithms. Once the characteristic polynomial of A has been calculated, there are a variety of criteria which can be applied to determine whether or not all the roots have negative real parts. Perhaps the most useful of these are the criteria of Hurwitz. Consider the equation
1>.1 - AI =
'AN
+ al'A N-- + ... + a'v_l'A + aN J
=0
(1)
and the associated infinite array al 0 0 aa a2 al 1 ab a, aa a2 a7 a6 a6 at
where
ak
is taken to be zero for k
0
0
0 0 al 0 aa a2
> N.
(2)
Introduction to Matrix Analysis
254
A necessary and sufficient condition that all the roots of (1) have negative real parts is that the sequence of determinants
at ha =
aa ali
1 0 a2 al a. aa
(3)
formed from the preceding array, be positive. There are no simple direct proofs of this result, although there are a number of elegant proofs. We shall indicate in the following section one line that can be followed, and in Appendix C discuss briefly the chain of ideas, originating in Hermite, giving rise to Hurwitz's proof. Both of these depend upon quadratic forms. References to other types of proof will be found at the end of the chapter. The reason why this result is not particularly useful in dealing with stability matrices is that it requires the evaluation of 1>.1 - A I, something we wish strenuously to avoid if the dimension of A is high. EXERCISES 1. Using the foregoing criteria, show that a nece88ary and sufficient condition that + al>" + al be a stability polynomial is that al > 0, as > 0. By this we mean that the roots of the polynomial have negative real parts. i. For >..' al>..l al>" a. show that corresponding conditions are
>..1
+
8. For >..4
alai>
+
+
+ al>'" + as).1 + a.>.. + at
a., a.(alal -
show that the conditions are ai, a4 > 0,
a,l > al'a t.
'1. A Necessary and Sufficient Condition for Stability Matrices. us now show that the results of Sec. 4 yield Theorem 2. Let Y be determined by the relation A'Y
+ YA
=
-I
Let
(1)
Then a necessary and sufficient condition that the real matrix A be a stability matrix is that Y be positive definite. Proof. Referring to Sec. 5, we see that
or
loT (x,x) dt =
(x(O),Yx(O» - (x(T),Yx(T»
(2)
(x(T),Yx(T»
+ JoT (x,x)
(3)
dt = (x(O),Yx(O»
Here x is a solution of the equation dxjdt = Ax. If Y is positive definite, (x,x) dt is uniformly bounded, which means that x(t) -+ 0 as t -+ 00, whence A is a stability matrix. We have already established the fact that Y is positive definite if A is a stability matrix.
loT
Stability Theory
255
8. Differential Equations and Characteristic Values. differential equation
Ax"
+ 2Bx' + Cx
= 0
x(O) = C1
Consider the
X'(O) = c2
(1)
which, if considered to arise from the study of electric circuits possessing capacitances, inductances, and resistances, is such that A, B, and Care non-negative definite. On these grounds, it is intuitively clear that the following result holds. Theorem 3. If A, B, and'C are non-negative definite, and either A or C positive definite, then (2)
has no roots with positive real parts. If A and C are non-negative definite and B is positive definite, then the only root with zero real part is >. = O. Proof. Let us give a proof which makes use of the physical background of the statement. In this case, we shall use energy considerations. Starting with (1), let us write (x',Ax") Thus, for any s
or
+ 2(x',Bx') + (x',Cx)
=0
(3)
> 0,
h' [(x',Ax") + 2(x',Bx') + (x',Cx)] dt = 0 (x"Ax')]~ + 4 h' (x',Bx') dt + (x,Cx) ]~ = 0
(4) (5)
This is equivalent to the equation
(x'(s),Ax'(s»
+ 4 10' (x',Bx') dt + (x(s),Cx(s»
= c~
(6)
where CI = (c 2,Ac 2) + (CI,CC I). If >. is a root of (2), then (1) has a solution of the form eXlc. If >. is real, c is real. If >. is complex, >. = rl + ir2' then the real part of eXlc, which has the form er,l(a l cos r2t + a 2 sin T2t), is also a solution. Substituting in (6), we see that
e2r ,'(b l,Ab l ) + 4
t:
e2r,l(b 2(t),Bb 2(t» dt
+ e2r,'(b ,Cb l
l
)
=
CI
(7)
where b l and b3 are constant vectors and b2 is a variable vector given by (alrl + a2r2) cos r2t + (a 2rl - alr2) sin r2t. If A or C is positive definite, with B ~ 0, we see that rl > 0 leads to a contradiction as s ~ 00. If A, C ~ 0, then B positive definite requires that rl ~ O. Furthermore, since a' cos r2t + a 2 sin r2t is periodic, we see that the integral (b 2(t),Bb 2(t» dt diverges to plus infinity as s -+ 00, unless r2 = 0, if
10'
rl = O.
256
Introduction to Matrix Analysis
We have presented this argument in some detail since it can be extended to treat similar questions for equations in which the coefficients A, B, and C are variable. EXERCISES
1. Following Anke,llet us use the foregoing techniqucs to evaluate
h"
u l dt, given
that u is a solution of u'" + a2tt" + a,u' + aou = 0, all of whose solutions tend to zero. Let u(O) = Co, u'(O) = C" u"(O) = C2. Establish the results
10'" u'''u dt = u"u I; - fo'" u"u' dt = -COC2 + c,2/2
fo'" u"u dt = u'u \; - fo'" U'2 dt = -coc, - fo'" 10'" u'u dt =
U'I
d'
-co l /2
i. Derive from the equations
10" U(i)(U'" + a2u" + alu' + aou) dt = 0
i = 0, 1, 2
and the results of Exercise 1, a set of linear equations for the quantities
10" (u')2 dt, 10'" (u")2 dt. 8. Using these Iincar equations, express
fo'"
ttl
dt as a quadrat.ic form in
10
to
u l dt,
Co, C"
C2.
4. Using this quadratic form, oMain a set of neccssary and sufficicnt conditions that r 3 + a2r2 + a,r + ao be a stability polynomial. 6. Show that these conditions are equivalent to the Hurwitz conditions given in Exercise 2 of Sec. 6.
9. Effective Tests for Stability Matrices. As indicated in the foregoing sections, there exist reasonably effective techniques for determining when the roots of a given polynomial have negative real parts. Since, however, the task of determining the characteristic polynomial of a matrix of large dimension is a formidable one, we cannot feel that we have a satisfactory solution to the problem of determ.ining when a given matrix is a stability matrix. In some special cases, nonetheless, very elegant criteria exist. Thus: Theorem 4. If A has the form
-1
bN _ 1 -1
1
Z. angew. Math. u. Phys., vol. VI, pp. 327-332, 1955.
Stability Theory
257
where the dots indicate that all other terms are zero, the ai are real, and the b. are zero or pure imaginary, then the number of positive terms in the sequence of products ai, ala2, ... , ala2 ... aN-IaN is the number of characteristic roots of A with positive real part. For the proof, which is carried out by means of the theory of Sturmian sequences, we refer to the paper by Schwarz given in the Bibliography at the end of the chapter. In this paper it is also shown that any matrix B with complex elements can be transformed into the form appearing in (1) by means of a transformation of the type A = T-IBT. As we shall see in Chap. 16, the theory of positive matrices gives us a foothold on the stability question when A is a matrix all of whose off-diagonal elements are non-negative. MISCELLANEOUS EXERCISES 1. Obtain the solution of the scalar equation u" + u = !J(I) by writing u' "" v, v' .. -u + /,(1) and determining the elements of eAI where A = [
0 1]
-1 0
2. Let the characteristic roots of A be distinct and B(t) → 0 as t → ∞. Then the characteristic roots of A + B(t), which we shall call λ_i(t), are distinct for t ≥ t_0.
3. If, in addition, ∫^∞ ||B(t)|| dt < ∞, there exists a matrix T(t) having the property that the change of variable y = Tz converts y' = (A + B(t))y into z' = (L(t) + C(t))z, where L(t) is the diagonal matrix with diagonal elements λ_1(t), λ_2(t), . . . , λ_N(t), and ∫^∞ ||C(t)|| dt < ∞.
4. If Σ_{i,j=1}^N ∫_0^∞ |a_ij(t) + a_ji(t)| dt < ∞, all solutions of y' = A(t)y are bounded as t → ∞.
5. There exists an orthogonal matrix B(t) such that the transformation y = B(t)z converts y' = A(t)y into z' = C(t)z, where C(t) is semidiagonal (Diliberto).
6. There exists a bounded nonsingular matrix B(t) with the property that C(t) is diagonal.
7. The usual rule for differentiation of a product of two functions u and v has the form d(uv)/dt = u dv/dt + (du/dt)v. Consider the linear matrix function d(X) defined by the relation

d(X) = AX - XA
where A is a fixed matrix.
Show that

d(XY) = X d(Y) + d(X)Y
8. Obtain a representation for d(X_1 X_2 · · · X_N) and thus for d(X^N).
9. Let d_A(X) = AX - XA, d_B(X) = BX - XB. Show that d_A(d_B(X)) = d_B(d_A(X)).
10. When does the equation d(X) = λX, λ a scalar, have a solution, and what is it? For an extensive and intensive discussion of these and related questions, see J. A. Lappo-Danilevsky.¹
11. When do the equations

d(X_1) = a_11 X_1 + a_12 X_2
d(X_2) = a_21 X_1 + a_22 X_2

have solutions, and what are they?
12. Since d(X) is an analogue of a derivative, are there analogues of a Taylor series expansion?
13. Given a real matrix A, can we always find a diagonal matrix B with elements ±1 such that all solutions of B dx/dt = Ax approach zero as t → ∞ (Brock)?
14. Let A be nonsingular. Then a permutation matrix P and a diagonal matrix D can be chosen so that DPA is a stability matrix with distinct characteristic roots (Folkman). (Unfortunately, the result of Exercise 14 is an existence proof. No constructive way of choosing D and P is known.)
15. What is the connection between the foregoing results and the idea of solving Ax = b by use of the limit of the solution of y' = Ay - b as t → ∞?
16. If C has all characteristic roots less than one in absolute value, there exists a positive definite G such that G - CGC* is positive definite (Stein).
17. Establish Stein's result using difference equations. For extensions of the results in Exercises 14-17 and additional references, see O. Taussky, On Stable Matrices, Programmation en mathématiques numériques, Besançon, pp. 75-88, 1966.
18. Let A be an N × N complex matrix with characteristic roots λ_i, where λ_i + λ̄_j ≠ 0. Then the N × N matrix G = G*, the solution of AG + GA* = I, is nonsingular and has as many positive characteristic roots as there are λ_i with positive real parts. (O. Taussky, A Generalization of a Theorem of Lyapunov, J. Soc. Ind. Appl. Math., vol. 9, pp. 640-643, 1961.)
19. Given the equation λ^n + a_1 λ^{n-1} + · · · + a_n = 0, with the roots λ_1, λ_2, . . . , λ_n, find the equation of degree n² whose roots are λ_i + λ_j, i, j = 1, 2, . . . , n.
20. Find the equation of degree n(n + 1)/2 whose roots are λ_i + λ_j, i = 1, 2, . . . , n, j = 1, 2, . . . , i, and the equation of degree n(n - 1)/2 whose roots are λ_i + λ_j, i = 2, 3, . . . , n; j = 1, 2, . . . , i - 1.
21. Use these equations to determine necessary and sufficient conditions that all of the roots of the original equation have negative real parts (Clifford-Routh). See Fuller,² where many additional results will be found, and Barnett and Storey,³ and the survey paper by O. Taussky cited above.
¹ J. A. Lappo-Danilevsky, Mémoires sur la théorie des systèmes des équations différentielles linéaires, vol. 1, Chelsea Publishing Co., New York, 1953.
² A. T. Fuller, Conditions for a Matrix to Have Only Characteristic Roots with Negative Real Parts, J. Math. Anal. Appl., vol. 23, pp. 71-98, 1968.
³ S. Barnett and C. Storey, Analysis and Synthesis of Stability Matrices, J. Diff. Eq., vol. 3, pp. 414-422, 1967.
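Both Exercise 1 and Exercises 7-9 above are easy to explore numerically. The sketch below is only an illustration: the time value and the random matrices are our own choices, not taken from the text.

```python
# Exercise 1: for J = [[0, 1], [-1, 0]] the exponential e^{Jt} is
# [[cos t, sin t], [-sin t, cos t]], the solution operator of u'' + u = 0.
import numpy as np
from scipy.linalg import expm

J = np.array([[0.0, 1.0], [-1.0, 0.0]])
t = 0.7
print(np.allclose(expm(J * t),
                  np.array([[np.cos(t), np.sin(t)], [-np.sin(t), np.cos(t)]])))  # True

# Exercises 7-9: for d(X) = AX - XA the product rule d(XY) = X d(Y) + d(X)Y
# holds, and d_A(d_B(X)) - d_B(d_A(X)) = d_{[A,B]}(X), so the iterated
# operators agree whenever A and B commute.  Random matrices suffice as a check.
rng = np.random.default_rng(0)
A, B, X, Y = (rng.standard_normal((4, 4)) for _ in range(4))

def d(M, Z):                          # d_M(Z) = MZ - ZM
    return M @ Z - Z @ M

print(np.allclose(d(A, X @ Y), X @ d(A, Y) + d(A, X) @ Y))               # True
print(np.allclose(d(A, d(B, X)) - d(B, d(A, X)), d(A @ B - B @ A, X)))   # True
```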
Bibliography and Discussion §1. We are using the word "stability" here in the narrow sense that all solutions of dx/dt = Ax tend to the null solution as t _ 00. For a wider definition and further results, see
R. Bellman, Stability Theory of Differential Equations, Dover Publications, New York, 1969. The problem of determining the asymptotic behavior of the solutions of dx/dt = A (t)x in terms of the nature of the characteristic roots of A (t) as functions of t is much more complex. See the book cited above, and E. A. Coddington and N. Levinson, Theory of Ordinary Differential Equatwns, McGraw-Hill Book Company, Inc., New York, 1955. §2. An extensive generalization of this result is the theorem of Poincare-Lyapunov which asserts that all solutions of dx/dt = Ax + g(x) for which IIx(O) II is sufficiently small tend to zero as t - 00, provided that A is a stability matrix and g(x) is nonlinear, that is, IIg(x)II/lIxll- 0 as IIxll- O. See the books cited above for several proofs, one of which is sketched in Exercise 1 of Sec. 4. See also
R. Bellman, Methods of Nonlinear Analysis, Academic Press, Inc., New York, 1970. §4. This method plays a fundamental role in the modern theory of the stability of functional equations. It was first presented by Lyapunov in his memoir published in 1897 and reprinted in 1947,
A. Lyapunov, Probl.eme general de la stabilite du mouvement, Ann. Math. Studies, no 17, 1947. For a report on its current applications, see H. Antosiewicz, A Survey of Lyapunov's Second Method, Applied Mathematics Division, National Bureau of Standards, 1956. H. Antosiewicz, Some Implications of Lyapunov's Conditions for Stability, J. Rat. Mech. Analysis, vol. 3, pp. 447-457, 1954. §6. We follow the results contained in R. Bellman, Notes on Matrix Theory-X: A Problem in Control, Quart. Appl. Math., vol. 14, pp. 417-419, 1957.
K. Anke, Eine neue Berechnungsmethode der quadratischen Regelfläche, Z. angew. Math. u. Phys., vol. 6, pp. 327-331, 1955.
H. Buckner, A Formula for an Integral Occurring in the Theory of Linear Servomechanisms and Control Systems, Quart. Appl. Math., vol. 10, 1952.
P. Hazenbroek and B. I. Van der Waerden, Theoretical Considerations in the Optimum Adjustment of Regulators, Trans. Am. Soc. Mech. Engrs., vol. 72, 1950. §6. The results of Hurwitz were obtained in response to the problem of determining when a given polynomial was a stability polynomial; see A. Hurwitz, tiber die Bedingungen unter welchen eine Gleichung nur Wurzeln mit negativen reellen Teilen besitzt, Math. Ann., vol. 46, 1895 (Werke, vol. 2, pp. 533-545). A large number of different derivations have been given since. A critical review of stability criteria, new results, and an extensive bibliography may be found in H. Cremer and F. H. Effertz, tiber die algebraische Kriterien fUr die Stabilitat von Regelungssystemen, Math. Annalen, vol. 137, pp. 328-350, 1959. A. T. Fuller, Conditions for a Matrix to Have Only Characteristic Roots with Negative Real Parts, J. Math. Analysis Appl., vol. 23, pp. 71-98, 1968.
§'1. A great deal of work using this type of argument has been done by R. W. Bass in unpublished papers. For a discussion of stability problems as they arise in mathematical economics, see A. C. Enthoven and K J. Arrow, A Theorem on Expectations and the Stability of Equilibrium, Econometrica, vol. 24, pp. 288-293, 1956.
K. J. Arrow and M. Nerlove, A Note on Expectations and Stability, Econometrica, vol. 26, pp. 297-305, 1958. References to earlier work by Metzler and others will be found there. §9 See the paper H. R. Schwarz, Ein Verfahren zur Stabilitii.tsfr~e bei. l\latrizenEigenwert-Problemen, Z. a1l{/ew. Math.. u. Pllys., vol. 7, pp. 473-500, 1956.
For the problem of the conversion of a given matrix into a Jacobi matrix, see the above paper by Schwarz and W. Givens, Numerical Computation of the Characteristic Values of a Real Symmetric Matrix, Oak Ridge National Laboratory, Report ORNL-1574, 1954. For closely related results, see E. Frank, On the Zeros of Polynomials with Complex Coefficients, Bull. Am. Ma1h. Soc., vol. 52, pp. 144-157, 1946.
E. Frank, The Location of the Zeros of Polynomials with Complex Coefficients, Bull. Am. Math. Soc., vol. 52, pp. 89(}-898, 1946. Two papers dealing with matIices all of whose characteristic values lie inside the unit circle are O. Taussky, Analytical Methods in Hypereomplex Systems, Compo Math., 1937. O. Taussky and J. Todd, Infinite Powers of Matrices, J. London Math. Soc., 1943.
An interesting and important topic which we have not at all touched upon is that of determining the asymptotic behavior of the solutions of the vector-matrix equation x(n + 1) = (A + B(n»x(n) as n~ 00, where the elements of B(n) tend to zero as n ~ 00. The study of this problem was begun by Poincare; see H. Poincare, Sur les equations lineaires aux differentielles ordinaires et aux differences fillies, Am. J. Math., vol. 7, pp. 203-258, 1885. and continued by Perron, O. Perron, tJber die Poincaresche lineare Differenzgleichung, J. reine angew. Math., vol. 137, pp. 6-64, 1910. and his pupil Ta Li, Ta Li, Die Stabilit1i.tsfrage bei Differenzengleichungen, Acta Math., vol. 63, pp. 99-141, 1934. The most recent result in this field is contained in G. A. Freiman, On the Theorems of Poincare and Perron, Uspekhi (Russian.)
Mat. Nauk (N.S.), vol. 12, no. 3(75), pp. 241-246, 1957.
Other discussion may be found in
R. Bellman, A Survey of the Theory of the Boundedness, Stability, and Asymptotic Behavior of Solutions of Linear and Non-linear Differential and Difference Equations, Office of Naval Research, Department of the Navy, January, 1949.
R. Bellman, Methods of Nonlinear Analysis, Academic Press, Inc., New York, 1970.
These questions, as well as the corresponding questions for the differential equation dx/dt = (A + B(t))x, belong more to the theory of linear differential and difference equations than to the theory of matrices per se. For a discussion of the stability of linear differential systems with random coefficients, see
O. Sefl, On Stability of a Randomized Linear System, Sci. Sinica, vol. 7, pp. 1027-1034, 1958.
S. Srinivasan and R. Vasudevan, Linear Differential Equations with Random Coefficients, American Elsevier Publishing Company, Inc., New York, 1970 (to appear).
J. C. Samuels, On the Mean Square Stability of Random Linear Systems, IRE Trans. on Circuit Theory, May, 1959, Special Supplement (Transactions of the 1959 International Symposium on Circuit and Information Theory), pp. 248-259.
where many additional references to earlier work by Rosenbloom and others will be found.
14
Markoff Matrices and Probability Theory
1. Introduction. In this and the following chapters, which constitute the last third of the book, we wish to study a class of matrices which are generated by some fundamental questions of probability theory and mathematical economics. The techniques we shall employ here are quite different from those utilized in the study of quadratic forms, or in connection with differential equations. The basic concept is now that of non-negativity, non-negative matrices and non-negative vectors.
A matrix M = (m_ij) is said to be non-negative if m_ij ≥ 0 for all i and j. Similarly, a vector x is said to be non-negative if all its components are non-negative, x_i ≥ 0. The subclass of non-negative matrices for which the stronger condition m_ij > 0 holds is called positive. These matrices possess particularly interesting and elegant properties.
The terms "positive" and "non-negative" are frequently used to describe what we have previously called "positive definite" and "non-negative definite." Since the two types of matrices will not appear together in what follows, we feel that there is no danger of confusion. The adjective "positive" is such a useful and descriptive term that it is understandable that it should be a bit overworked.
We have restricted ourselves in this volume to a discussion of the more elementary properties of non-negative matrices. Detailed discussions, which would require separate volumes, either in connection with probability theory or mathematical economics, will be found in references at the end of the chapter. This chapter, and the one following, will be devoted to stochastic processes, while the next chapter will cover various aspects of matrix theory and mathematical economics.
2. A Simple Stochastic Process. In order to set the stage for the entrance of Markoff matrices, let us review the concept of a deterministic process. Consider a system S which is changing over time in such a way that its state at any instant t can be described in terms of a finite
dimensional vector x. Assume further that the state at any time s + t, s > 0, can be expressed as a predetermined function of the state at time t and the elapsed time s, namely,

x(s + t) = g(x(t), s)    (1)

Under reasonable conditions of continuity on the function g, we can derive, by expansion of both sides in powers of s, a differential equation for x(t) of the form

dx/dt = h(x(t))    (2)
Finite-dimensional deterministic systems of the foregoing type are thus equivalent to systems governed by ordinary differential equations. If we introduce functions of more complicated type which depend upon the past history of the process, then more complicated functional equations than (2) result.
The assumption that the present state of the system completely determines the future states is a very strong one. It is clear that it must always fail to some degree in any realistic process. If the deviation is slight, we keep the deterministic model because of its conceptual simplicity. It is, however, essential that other types of mathematical models be constructed, since many phenomena cannot, at the present time, be explained in the foregoing terms.
Let us then consider the following stochastic process. To simplify the formulation, and to avoid a number of thorny questions which arise otherwise, we shall suppose that we are investigating a physical system S which can exist only in one of a finite number of states, and which can change its state only at discrete points in time. Let the states be designated by the integers 1, 2, . . . , N, and the times by t = 0, 1, 2, . . . . To introduce the random element, we assume that there is a fixed probability that a system in state j at time t will transform into state i at time t + 1. This is, of course, again a very strong assumption of regularity. A "random process" in mathematical parlance is not at all what we ordinarily think of as a "random process" in ordinary verbal terms.
Pursuant to the above, let us then introduce the transition matrix M = (m_ij), where

m_ij = the probability that a system in state j at time t will be in state i at time t + 1

Observe that we take M to be independent of time. This is the most important and interesting case. In view of the way in which M has been introduced, it is clear that we wish to impose the following conditions:
m_ij ≥ 0,    Σ_{i=1}^N m_ij = 1,    j = 1, 2, . . . , N    (4)
That m_ij is non-negative is an obvious requirement if it is to be considered a probability. The second condition expresses the fact that a particle in state j at time t must be somewhere, which is to say in one of the allowable states, at time t + 1.
3. Markoff Matrices and Probability Vectors. Let us now introduce some notation. A matrix M whose elements satisfy the restrictions of (2.4) will be called a Markoff matrix. A vector x whose components x_i satisfy the conditions

x_i ≥ 0    (1a)
Σ_{i=1}^N x_i = 1    (1b)

will be called a probability vector. Generally, a vector all of whose components are positive will be called a positive vector.
EXERCISES
1. Prove that θP + (1 - θ)Q is a Markoff matrix for 0 ≤ θ ≤ 1 whenever P and Q are Markoff matrices.
2. Prove that PQ is a Markoff matrix under similar assumptions.
3. Prove that Mx is a probability vector whenever x is a probability vector and M is a Markoff matrix.
4. Prove that Markoff matrices can be characterized in the following way: A matrix M is a Markoff matrix if and only if Mx is a probability vector whenever x is a probability vector.
5. Prove that if M' is the transpose of a Markoff matrix M and M' = (a_ij), then Σ_{j=1}^N a_ij = 1 for i = 1, 2, . . . , N.
6. Prove that a positive Markoff matrix transforms non-trivial vectors with non-negative components into vectors with positive components.
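The first few of these exercises can be checked numerically. In the sketch below the Markoff matrices are generated at random, column-stochastic to match the convention Σ_i m_ij = 1 used in the text; the helper names are ours, not the book's.

```python
# A numerical sketch of Exercises 1-3 for column-stochastic Markoff matrices.
import numpy as np

rng = np.random.default_rng(1)

def random_markoff(n):
    M = rng.random((n, n))
    return M / M.sum(axis=0)          # normalize each column to sum to one

def is_markoff(M, tol=1e-12):
    return (M >= -tol).all() and np.allclose(M.sum(axis=0), 1.0)

P, Q = random_markoff(4), random_markoff(4)
theta = 0.3
print(is_markoff(theta * P + (1 - theta) * Q))   # Exercise 1: True
print(is_markoff(P @ Q))                         # Exercise 2: True

x = rng.random(4); x /= x.sum()                  # a probability vector
print(np.isclose((P @ x).sum(), 1.0), (P @ x >= 0).all())   # Exercise 3: True True
```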
4. Analytic Formulation of Discrete Markoff Processes. A stochastic process of the type described in Sec. 2 is usually called a discrete Markoff process. Let us now see how this process can be described in analytic terms. Since the state of the system S at any time t is a random variable, assuming any one of the values i = 1, 2, . . . , N, we introduce the N functions of t defined as follows:

x_i(t) = the probability that the system is in state i at time t    (1)
At t = 0 we have the relation

x_i(0) = δ_ik    (2)

where k is the initial state of S. The following relations must then be valid:

x_i(t + 1) = Σ_{j=1}^N m_ij x_j(t),    i = 1, 2, . . . , N    (3)

One aspect of the problem of predicting regular behavior of S is that of studying the behavior of the solutions of the system of equations in (3), which we may write more compactly in the form

x(t + 1) = Mx(t),    t = 0, 1, 2, . . .    (4)
The full problem is one of great complexity and interest, with analytic, algebraic, topological, and physical overtones. A number of monographs have been written on this topic, and there seems to be an unlimited area for research. It turns out that particularly simple and elegant results can be obtained in the case in which the elements of M are all positive. Consequently, we shall restrict ourselves principally to a discussion of this case by several methods, and only lightly touch upon the general case.
EXERCISE
1. Prove that x(t) = M^t x(0), x(t + s) = M^t x(s).
5. Asymptotic Behavior. The next few sections will be devoted to two proofs of the following remarkable result:
Theorem 1. If M is a positive Markoff matrix and if x(t) satisfies (4) of Sec. 4, then

lim_{t→∞} x(t) = y, where y is a probability vector,    (1a)
y is independent of x(0),    (1b)
y is a characteristic vector of M with associated characteristic root 1.    (1c)

That x(t) should settle down to a fixed probability vector y is interesting and perhaps not unexpected in view of the mixing property implied in the assumption m_ij > 0. That this limit should be independent of the initial state is certainly surprising.
6. First Proof. We shall present two proofs in this chapter. A third proof can be derived as a consequence of results obtained for general positive matrices in Chap. 16. The first proof, which we present first because of its simplicity, illustrates a point we have stressed before,
namely, the usefulness of considering transformations together with their adjoints.
Since t assumes only the discrete sequence of values 0, 1, 2, . . . , let us replace it by n and speak of x(n) in place of x(t). To establish the fact that x(n) has a limit as n → ∞, we consider the inner product of x(n) with an arbitrary vector b. Since, by virtue of (4.4), x(n) = M^n x(0), we see, upon setting x(0) = c, that

(x(n), b) = (M^n c, b) = (c, (M^n)'b) = (c, (M')^n b)    (1)
where M' is the transpose of M. If we show that (M')^n b converges as n → ∞ for any given b, it will follow that x(n) converges as n → ∞, since we can choose for b first the vector all of whose components are zero except for the first, then the vector all of whose components are zero except for the second, and so on. Let us then introduce the vector

z(n) = (M')^n b    (2)

which satisfies the difference equation

z(n + 1) = M'z(n),    z(0) = b    (3)
Let, for each n, u(n) be the component of z(n) of largest value and v(n) the component of z(n) of smallest value. We shall show that the properties of M' imply that u(n) - v(n) → 0 as n → ∞. Since

z_i(n + 1) = Σ_{j=1}^N m_ji z_j(n)    (4)

and since Σ_{j=1}^N m_ji = 1, m_ji ≥ 0, we see that

u(n + 1) ≤ u(n),    v(n + 1) ≥ v(n)    (5)

Since {u(n)} is a monotone decreasing sequence, bounded from below by zero, and {v(n)} is a monotone increasing sequence, bounded from above by one, we see that both sequences converge. Let

u(n) → u,    v(n) → v    (6)
To show that u = v, which will yield the desired convergence, we proceed as follows. Using the component form of (3), we see that

u(n + 1) ≤ (1 - d)u(n) + dv(n)
v(n + 1) ≥ (1 - d)v(n) + du(n)    (7)

where d is the assumed positive lower bound for m_ij, i, j = 1, 2, . . . , N.
From (7) we derive

u(n + 1) - v(n + 1) ≤ [(1 - d)u(n) + dv(n)] - [(1 - d)v(n) + du(n)] = (1 - 2d)(u(n) - v(n))    (8)

We see that (8) yields

u(n) - v(n) ≤ (1 - 2d)^n (u(0) - v(0))    (9)

and hence, since d ≤ 1/2 if N ≥ 2, that u(n) - v(n) → 0 as n → ∞. Thus, not only does z(n) converge as n → ∞, but it converges to a vector all of whose components are equal. From the convergence of z(n) we deduce immediately the convergence of x(n), while the equality of the components will yield, as we shall see, the independence of initial value.
Let lim_{n→∞} z(n) = z and lim_{n→∞} x(n) = y. As we know, all of the components of z are equal. Let us call each of these components a_1. Then

(y, b) = lim_{n→∞} (x(n), b) = (c, z) = a_1[c_1 + c_2 + · · · + c_N] = a_1    (10)

where a_1 is a quantity dependent only upon b. It follows that y is independent of c. That y is a characteristic vector of M with characteristic root 1 follows easily. We have

y = lim_{n→∞} M^{n+1} c = M lim_{n→∞} M^n c = My    (11)
In other words, y is a "fixed point" of the transformation represented by M. EXERCISE 1. Use the fact that I is a non-negative Markoff matrix to show that conditions of positivity cannot be completely relaxed.
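A short computation illustrates both the theorem and the geometric rate (1 - 2d)^n obtained in the proof. The particular matrix is only an example.

```python
# The spread of z(n) = (M')^n b contracts at least as fast as (1 - 2d)^n, and
# M^n c approaches the same probability vector y for every initial probability
# vector c.  The column-stochastic matrix below is illustrative only.
import numpy as np

M = np.array([[0.6, 0.2, 0.3],
              [0.1, 0.5, 0.3],
              [0.3, 0.3, 0.4]])        # positive entries, columns sum to one
d = M.min()

z = np.array([1.0, 0.0, 0.0])          # b = first coordinate vector, so u(0)-v(0)=1
for n in range(1, 11):
    z = M.T @ z
    print(n, z.max() - z.min(), (1 - 2 * d) ** n)   # spread stays below the bound

c1 = np.array([1.0, 0.0, 0.0])
c2 = np.array([0.2, 0.3, 0.5])
print(np.linalg.matrix_power(M, 50) @ c1)
print(np.linalg.matrix_power(M, 50) @ c2)           # both converge to the same y
```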
7. Second Proof of Independence of Initial State. Another method for establishing independence of the initial state is the following. We already know that lim_{n→∞} M^n c exists for any initial probability vector c. Let y and z be two limits corresponding to different initial probability vectors c and d, respectively. Choose the scalar t_1 so that y - t_1 z has at least one zero component, with all other components positive. As we know, M(y - t_1 z) must then be a positive vector if y - t_1 z is nontrivial. However,

M(y - t_1 z) = My - t_1 Mz = y - t_1 z    (1)

a contradiction unless y - t_1 z = 0. This means that t_1 = 1, since y and z are both probability vectors.
8. Some Properties of Positive Markoff Matrices. Combining the results and techniques of the foregoing sections, we can derive some interesting results concerning positive Markoff matrices. To begin with, what we have shown by means of the preceding argument is that a positive Markoff matrix cannot possess two linearly independent positive characteristic vectors associated with the characteristic root one.
Furthermore, the same argument shows that a positive Markoff matrix cannot possess any characteristic vector associated with the characteristic root one which is linearly independent of the probability vector y found above. For, let z be such a vector.¹ Then for t_1 sufficiently large, z + t_1 y is a positive vector. Dividing each component by the scalar t_2 = (z + t_1 y, c), where c is a vector all of whose components are ones, we obtain a probability vector. Hence (z + t_1 y)/t_2 = y, which means that z and y are actually linearly dependent.
It is easy to see that no characteristic root of M can exceed one in absolute value. For, let x be a characteristic vector of M', λ be a characteristic root, and m be the absolute value of a component of x of greatest magnitude. Then the relation λx = M'x shows that

|λ| m ≤ m Σ_{j=1}^N m_ji = m    (1)
whence |λ| ≤ 1.
We can, however, readily show that λ = 1 is the only characteristic root of absolute value one, provided that M is a positive Markoff matrix. To do this, let μ be another characteristic root with |μ| = 1, and w + iz an associated characteristic vector with w and z real. Then, choosing c_1 large enough, we can make w + c_1 y and z + c_1 y both positive vectors. It follows that

M(w + iz + c_1(1 + i)y) = μ(w + iz) + c_1(1 + i)y    (2)

Hence, on one hand, we have

M^n(w + iz + c_1(1 + i)y) = μ^n(w + iz) + c_1(1 + i)y    (3)

On the other hand, as n → ∞, we see that

M^n(w + iz + c_1(1 + i)y) = M^n(w + c_1 y + i(z + c_1 y))    (4)

converges, because of the positivity of w + c_1 y and z + c_1 y, to a certain scalar multiple of y. The vector μ^n(w + iz), however, converges as n → ∞ only if μ = 1, if μ is constrained to have absolute value one. This completes the proof. We see then that we have established Theorem 2.
¹ It is readily seen that it suffices to take z real.
Theorem 2. If M is a positive Markoff matrix, the characteristic root of largest absolute value is 1. Any characteristic vector associated with this characteristic root is a scalar multiple of a probability vector.
EXERCISE
1. If λ is a characteristic root with an associated characteristic vector which is positive, then λ = 1.
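Theorem 2 is easy to observe numerically; the matrix below is an invented example used only for the check.

```python
# For a positive (column-stochastic) Markoff matrix the root of largest modulus
# is 1, and the associated characteristic vector scales to a probability vector.
import numpy as np

M = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.4, 0.3],
              [0.3, 0.3, 0.5]])        # positive, columns sum to one
vals, vecs = np.linalg.eig(M)
k = np.argmax(np.abs(vals))
print(vals[k])                          # approximately 1
y = np.real(vecs[:, k])
print(y / y.sum())                      # a probability vector y with My = y
```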
9. Second Proof of Limiting Behavior. Let us now turn to a second direct proof of the fact that x(n) has a limit as n → ∞. Writing out the equations connecting the components of x(n) and x(n + 1), we have

x_i(n + 1) = Σ_{j=1}^N m_ij x_j(n)    (1)

This yields

x_i(n + 1) - x_i(n) = Σ_{j=1}^N m_ij[x_j(n) - x_j(n - 1)]    (2)

Since Σ_{j=1}^N x_j(n) = 1 for all n, it is impossible to have x_j(n) ≥ x_j(n - 1) for all j, unless we have equality. In this case x(n) = x(n - 1), a relation which yields x(m) = x(n - 1) for m ≥ n, and thus the desired convergence. In any case, for each n let S(n) denote the set of j for which x_j(n) ≥ x_j(n - 1) and T(n) the set of j for which x_j(n) < x_j(n - 1). The comment in the preceding paragraph shows that there is no loss of generality in assuming that S(n) and T(n) possess at least one element for each n. Referring to (2), we see that

Σ_{j∈T(n)} m_ij[x_j(n) - x_j(n - 1)] ≤ x_i(n + 1) - x_i(n) ≤ Σ_{j∈S(n)} m_ij[x_j(n) - x_j(n - 1)]    (3)
L
[x,(n
+ 1)
- x,(n)]
l.S{,.+I)
~
L{ L
j.S{,.)
which yields, since mtj ~ d
L+
,.S{,.
l)
[x,(n
+ 1) -
> 0 for all i
x,(n»
m,j} [xj(n) - xj(n - 1)]
(4)
,.S{,.+I)
~ (1 -
and i, d)
L j.S(,.)
[xj(n) - xj(n - 1)]
(5)
Markoff Matrices and Probability Theory Similarly, summing over iET(n (1 - d) )
[xj(n) -
xj(n -
271
+ 1), we have 1)]
j.t1n)
I
~
[x.(n
+ 1)
- x.(n)]
(6)
•• T(n+ I)
Combining (5) and (6), we see that N
llxj(n
+ 1)
- xj(n)1
j~1
~
N
(1 - d)
...
and hence that the series
I
I
IXj(n) - Xj(n - 1)1
(7)
j-I
IIx(n) - x(n -
1)11 converges.
n=I
This completes the proof of convergence of x(n).
EXERCISES
1. If M is a Markoff matrix and M² is a positive Markoff matrix, show that the sequences {M^{2n} c} and {M^{2n+1} c} converge. Does {M^n c} converge?
2. What is the limit of M^n if M is a positive Markoff matrix?
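Exercise 2 can be answered experimentally: the powers of a positive Markoff matrix approach the matrix whose columns all equal the limiting probability vector y. The matrix below is an arbitrary example.

```python
# The limit of M^n for a positive Markoff matrix: a rank-one matrix with every
# column equal to the characteristic vector y for the root 1.
import numpy as np

M = np.array([[0.5, 0.25, 0.25],
              [0.25, 0.5, 0.25],
              [0.25, 0.25, 0.5]])
print(np.linalg.matrix_power(M, 60))   # each column is (approximately) y

vals, vecs = np.linalg.eig(M)
y = np.real(vecs[:, np.argmax(np.real(vals))])
print(y / y.sum())                     # y normalized to a probability vector
```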
10. General Markoff Matrices. As mentioned above, the study of the general theory of Markoff matrices is quite complex, and best carried on from a background of probability theory. In order to see how complex roots of absolute value 1 can arise, consider a system with only two states, 1 and 2, in which state 1 is always transformed into state 2 and state 2 into state 1. The corresponding transition matrix is then

M = [ 0  1 ]
    [ 1  0 ]    (1)

which has the characteristic roots λ_1 = 1, λ_2 = -1. By considering cyclic situations of higher order, we can obtain characteristic roots which are roots of unity of arbitrary order.
Let us, however, discuss briefly an important result which asserts that even in the case where there is no unique limiting behavior, there is an average limiting behavior. Let us suppose for the moment that M has only simple characteristic roots, so that we may write

M = T diag(λ_1, λ_2, . . . , λ_N) T^{-1}    (2)
Then

M^n = T diag(λ_1^n, λ_2^n, . . . , λ_N^n) T^{-1}    (3)

If |λ_i| < 1, we have Σ_{k=1}^n λ_i^k / n → 0 as n → ∞. If |λ_j| = 1, the limit

lim_{n→∞} Σ_{k=1}^n λ_j^k / n = a_j    (4)

exists. If λ_j = e^{iθ}, where θ/2π is an integer, then a_j = 1; if θ/2π is not an integer, then a_j = 0. In any case, under the assumption of distinct characteristic roots, we can assert the existence of a quasi-limiting state

y = lim_{n→∞} (x + Mx + · · · + M^n x)/(n + 1)    (5)
which is a characteristic vector of M, that is, My = y. The limit vector y is a probability vector if x is a probability vector. The general case can be discussed along similar, but more complicated, lines; see the exercises at the end of the chapter.
EXERCISE
1. If λ is a characteristic root of a Markoff matrix M, with the property that |λ| = 1, must λ be a root of unity?
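The averaging in (5) is easy to see numerically for the cyclic matrix of (1): the powers M^n x oscillate, but their averages converge to a characteristic vector for the root 1. A small sketch (the starting vector is arbitrary):

```python
# Cesaro averaging for the cyclic transition matrix M = [[0, 1], [1, 0]].
import numpy as np

M = np.array([[0.0, 1.0], [1.0, 0.0]])
x = np.array([0.9, 0.1])

acc = x.copy()
term = x.copy()
for n in range(1, 2001):
    term = M @ term                      # M^n x, which merely oscillates
    acc += term
y = acc / 2001                           # (x + Mx + ... + M^n x)/(n + 1)
print(y)                                 # close to (0.5, 0.5)
print(np.allclose(M @ y, y, atol=1e-3))  # My = y, approximately
```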
11. A Continuous Stochastic Process. Let us now consider the stochastic process described in Sec. 2 under the assumption that the system is observed continuously. We begin with the discrete process in which observations are made at the times t = 0, Δ, 2Δ, . . . . Since we want the continuous process to be meaningful, we introduce a con-
tinuity hypothesis in the form of a statement that the probability of the system remaining in the same state over any time interval of length Δ is 1 - O(Δ). To make this statement precise, we define the following quantities:

a_ij Δ = the probability that S is in state i at t + Δ, given that it is in state j at time t, for i ≠ j
1 - a_ii Δ = the probability that S is in state i at t + Δ, given that it is in state i at time t    (1)

The quantities a_ij are assumed to satisfy the following conditions:

a_ij ≥ 0    (2a)
a_ii = Σ_{j≠i} a_ji    (2b)

Then the equations governing the quantities x_i(t) are

x_i(t + Δ) = (1 - a_ii Δ)x_i(t) + Δ Σ_{j≠i} a_ij x_j(t),    i = 1, 2, . . . , N    (3)
for t = 0, Δ, 2Δ, . . . . Writing, in a purely formal way,

x_i(t + Δ) = x_i(t) + Δ x_i'(t) + O(Δ²)    (4)

and letting Δ → 0 in (3), we obtain the system of differential equations

dx_i/dt = -a_ii x_i + Σ_{j≠i} a_ij x_j,    x_i(0) = c_i    (5)

i = 1, 2, . . . , N, where c_i, i = 1, 2, . . . , N, are the initial probabilities.
We shall bypass here all the thorny conceptual problems connected with continuous stochastic processes by defining our process in terms of this system of differential equations. In order for this to be an operationally useful technique, we must show that the functions generated by (5) act like probabilities. This we shall do below.
There are a number of interesting questions arising in this way which we shall not treat here. Some of these are:
1. How does one define a continuous stochastic process directly and derive the differential equations of (5) directly?
2. In what sense can the continuous stochastic process defined by (5) be considered the limit of the discrete stochastic process defined by (3)?
Since these are problems within the sphere of probability theory, we shall content ourselves with mentioning them, and restrain ourselves here to the matrix aspects of (5). As usual, however, we shall not scruple to use these ideas to guide our analysis.
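In matrix form (5) reads dx/dt = Qx, where Q carries the a_ij off the diagonal and -a_ii on the diagonal, so that each column of Q sums to zero; the solution x(t) = e^{Qt}c then stays a probability vector. The rate matrix in the sketch below is hypothetical.

```python
# Integrating the continuous Markoff process (5) via the matrix exponential.
import numpy as np
from scipy.linalg import expm

A_off = np.array([[0.0, 0.4, 0.1],
                  [0.3, 0.0, 0.5],
                  [0.2, 0.6, 0.0]])      # a_ij for i != j (invented rates)
Q = A_off - np.diag(A_off.sum(axis=0))   # diagonal entries -a_ii, a_ii = column sums
c = np.array([1.0, 0.0, 0.0])            # initial probabilities

for t in (0.5, 2.0, 10.0):
    x = expm(Q * t) @ c
    print(t, x, x.sum())                 # components stay >= 0 and sum to 1
```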
12. Proof of Probabilistic Behavior. Let us now demonstrate that the system of differential equations

dx_i/dt = -a_ii x_i + Σ_{j≠i} a_ij x_j,    x_i(0) = c_i,    i = 1, 2, . . . , N    (1)

where

a_ij ≥ 0,    a_ii ≥ 0    (2a), (2b)
Σ_{j≠i} a_ji = a_ii    (2c)

produces a set of functions satisfying the relations

x_i(t) ≥ 0,    t ≥ 0    (3a)
Σ_{i=1}^N x_i(t) = 1    (3b)

Let us demonstrate (3b) first. From (1) we have

d/dt Σ_i x_i(t) = 0    (4)

by virtue of (2c). Hence, for all t,

Σ_i x_i(t) = Σ_i x_i(0) = 1    (5)

To show that x_i ≥ 0, write (1) in the form

d/dt (x_i e^{a_ii t}) = e^{a_ii t} Σ_{j≠i} a_ij x_j

Since x_i ≥ 0 at t = 0, this shows that the quantities x_i e^{a_ii t} are monotone increasing. In place of following this procedure, we could use the easily established fact that the solutions of (1) are the limits of the solutions of (11.3) as Δ → 0. Hence, properties of the solutions of (11.3) valid for all Δ must carry over to the solutions of (1). We recommend that the reader supply a rigorous proof.
13. Generalized Probabilities-Unitary Transformations. Any set of non-negative quantities {x_i} satisfying the conditions

x_i ≥ 0    (1a)
Σ_{i=1}^N x_i = 1    (1b)
can be considered to represent a set of probabilities, where x, is the probability that a system S is in state i. A set of transformations i = 1,2, . . . ,N
(2)
which preserves the relations in (1) can then be considered to represent the effects of certain physical transformations upon S. The simplest such transformations are the linear, homogeneous transformations N
x~
=
I
(3)
ai;Xj
j-I
As we know, a necessary and sufficient condition that the relations in (1) be preserved is that A = (aij) be a Markoff matrix. Let us now consider a quadratic transformation. Let x be a complex vector with components x, and let
x'= Tx
(4)
where T is a unitary transformation. Then (x',X') = (x,x). Hence, if we consider IXtI2, IX212, , IXNI2 to represent a set of proba, Ix~12 can also be considered to be bilities, we see that Ix~12, Ix~12, a set of probabilities. We leave it to the reader to establish the relevant limit theorems for the sequence Ix(n)} defined recurrently by x(n
+ 1)
(5)
= Tx(n)
where T is unitary and (x(O),x(O» = 1 14. Generalized Probabilities-Matrix Transformations. Let us now generalize the concept of probabilities in the following fashion. If {X,} is a finite set of symmetric matrices satisfying the conditions X" ~ 0
i = 1,2, . . . , N
(la)
tr (Xi) = 1
(lb)
N
I
i=1
we shall call them a set of probabilities. signifies that X, is non-negative definite. Consider the transformation
The condition X,
~
0 here
N
Yi =
I .i=
A;,XjAij 1
i = 1,2, . . . , N
(2)
Introduction to Matrix Analysis
276
It is easy to establish the following analogue of the result concerning
Markoff matrices. Theorem 3. A necessary and sufficient condition that the transformation in (2) preserve the relations in (1) is that N
l
j = 1,2, . . . ,N
A:jA,j = I
(3)
i-I
Althour h analogues of the limit theorems established for the usual Markoff transformations can be established for the transformations of (2), the results are more difficult to establish, and we shall therefore not discuss these matters any further here. EXERCISES 1. Let M be a Markoff matrix. exists along the following lines.
..
l ....... "-0
Consider a proof of the fact that lim
M'/n
Using the Schur transformation, we can write
T'
o where the 8, are real and 1>...1 < 1, i = k + 1, . . . ,N. It suffices to consider the iterates of the triangular matrix. . To do this, consider the system of equations x.(n xl(n
+ 1) + 1)
=
e"'x.(n)
~ = e"kxk(n)
n = 0, 1, 2, • • •. Show that IXi(n)1 Using the equation xk(n
..
lim
l
xk(m)!n
exists.
+ 1)
+ b'Nx.v(n) + b2NxN(n)
+ h12X2(n) + ei "x2(n) +
+ 1, ••• , N. where Iv,(n)1 ~ elm, show that
c.rn , 0 < r < 1, i = k
+ Vk(n),
Then continu!! inductively.
"'........ -0
i. Prove the stated result using the Jordan canonical form.
MISCELLANEOUS EXERCISES 1. Consider the mat.rix -~o
""I A(n)
~o
-(~I
+ ""I)
"".
=
~-I -(~ #!to)
+
where the ~I and ""I are positive quantities. Denote by .pn(~) the quantity IA (n) + HI. ""'(~)
-
Show that
+ ~n + ""n).pn-I(~) + ~n-I"".. .pn_l(~) = 0 .po(~) = ~ + ~o (W. Ledermann and G. E. Reuter). (~
for n ~ 1 if .p_I(~) = 1, i. Thus, or otherwise, show that the characteristic roots are A (n) are distinct and negative or zero, and separate those of A ("-I) (W. Ledermann and G. E. Reuter). 8. In the notation of Exercise 27 of the Miscellaneous Exercises of Chap. 4, is A (,,) symmetrizable? 4.. By a permutation matrix P of order N, we mean a matrix possessing exactly one nonzero element of value one in each row and column. Show that there are NI permutation matrices of order N. Ii. Prove that the product of two permutation matrices is again a permutation matrix, and likewise the inverse. What are the possible values for the determinant of a permutation matrix? 6. By a doubly stochastic matrix of order N we mean a matrix A whose elements satisfy the following conditions: (a) al; ~ 0
(b)
I I,
a;; ,. 1
i = 1,2, •
ali = 1
j = 1,2, .•• ,N
,N
j
(c)
Is the product of two doubly stochastic matrices doubly stochastic? 7. Prove that any doubly stochastic matrix A may be written in the form r = 1,2, . . . , NI
w,
~
0
where I P, I is the set of permutation matrices of order N. The result is due to Birkhoff. 1 Another proof, together with a number of applications to scheduling theory, may be found in T. C. Koopmans and M. J. Beckman. l 1 G. Birkhoff, Tres obaervaciones sobre el algebra lineal, Rell. unill. naco Tucumdn, aer. A, vol. 5, pp. 147-151, 1946. • T. C. Koopmans and M. J. Beckman, Assignment Problems and the Location of Economic Activities, Econometrica, vol. 25, pp. 53-76, 1957.
For some further results concerning doubly stochastio matrices, see S. Schreiber' and L. Mirsky. Mirsky's paper contains a partil!ularly simple proof of the Birkhoff results cited above and of ar. interesting connection between doubly stochastic matrices and the theory of ineq9alities. 8. Let A be a 2 X 2 Markoff matrix. Show that A is a square of a Markoff matrix if and only if all ~ au. 9. Hence, show that if A is a square of a Markoff matrix, it can be written as the 2" power of a Markoff matrix for n = 1,2, . • . . 10. From this, conclude that if A has a square root which is a Markoff matrix, it has a root of every order which is a Markoff matrix. 11. From this, conclude that we can construct a family of Markoff matrices A (I) with the property that A(s + t) = A(s)A(I), for s, t ~ 0, A(O) = I, and A(I) = A. U. Using this result, or otherwise, show that A = eB , where B has the form -al
B -
[
bl
18. If B is a matrix of the type described in Sec. 12, we know that eB is a Markoff matrix. Let B, and B I be two matrices of this type and let A, = eB ., A 2 = eB•• Is AlAI a matrix of this type? 14.. Let A be a Markoff matrix of dimension N with the property that it has a root of any order which is again a Markoff matrix. Let A" denote a particular 2"-th root which is a Markoff matrix, and write A(I) = A",A"• ••• ,when t has the form t = 2-", + 2-'" + ... , where the sum is finite. Can we define A (I) for 0 < t < 1 as a limit of values of A (I) for t in the original set of values? If so, show that A (s + I) = A (s) A (I) for 0 < s, t ~ 1. Define A (0) to be I. If the foregoing relation holds, docs it follow that A (I) is the solution of a differential equation of the form dX/dt = BX, X(O) = I, where B is a matrix of the type occurring in Sec. 121 lli. Let pl(n) = the probability of going from state a to state i in n steps, and ql(n) be similarly defined starting in state b. Consider the 2 X 2 determinant d,.(n) = '
Prove that d.j(n) =
Iq.(n) pl(n)
I
pj(n) qj(n)
a/rtlj.d..(n -
I 1)
r,.
(Cf. the matrices formed in Sec. 14 of Chap. 12.) 16. Let .p;j(n) = the probability of two particles going from initial distinct states a and b to states i and j at time n without ever being in the same state. Show that I!>ii(n) =
l
al,ai.",,,(n - 1)
r ••
In these two exercises, any particular move.
alj
denotes the probability of going from state j to state i in
, S. Schreiber, On a Result of S. Sherman Concerning Doubly Stochastic Matrices, Proc. Am. Math. Soc., vol. 9, pp. 350-353, 1953. L. Mirsky, Proofs of Two Theorems on Doubly-stochastic Matrices, Proc. Am. Math. Soc., vol. 9, pp. 371-374, 1958.
What is the connection between dij(n) and t/lij(n)? See S. Karlin and J. MacGregor, Coincidence Probabilities, Stanford University Department of Statistica, Tech. Rept. 8, 1958, for a detailed discussion. 17. Let M be a positive Markoff matrix and let the unique probability vector x satisfying M x = x be determined by the relation x = lim Mnb where b is a proba-
.........
bility vector. If we wish to use this result for numerical purposes, should we arrange the calculation so as to obtain Mn first by use of M. = M, M I = MM., ••• , and then Mnb, or should we calculate Mnb by way of XI = b, X2 = M Xh • • .? Which procedure is more conducive to round-off error? 18. Suppose that n = 2k and we calculate M. = M, M 2 = M.I, M s = M 2 2, • • • • Which of the two procedures sketched above is more desirable? 19. What are necessary and sufficient conditions on a complex matrix A that AH be a stability matrix whenever H is positive definite Hermitian? See Carlson. and the interesting survey paper by O. Taussky.1
Bibliography and Discussion §1-§12. The results given in this chapter follow the usual lines of the theory of finite Markoff chains, and the proofs are the classical proofs. For detailed discussion and many additional references, see W. Feller, An Introduction to Probability Theory and Applications, John Wiley & Sons, Inc., New York, 1950. M. S. Bartlett, A n Introduction to Stochastic Processes, Cambridge University Press, New York, 1955. N. Arley, On the Theory of Stochastic Processes and Their Applications to the Theory of Cosmic Radiation, Copenhagen, 1943. W. Ledermann and G. E. Reuter, Spectral Theory for the Differential Equations of Simple Birth and Death Processes, Phil. Trans. Royal Soc. London, ser. A, vol. 246, pp. 321-369, 1953-1954. W. Ledermann, On the Asymptotic Probability Distribution for Certain Markoff ProcesRes, Proc. Cambridge Phil. Soc., vol. 46, pp. 581-594, 1950. For an interesting discussion of matrix theory and random walk problems, see M. Kae, Random Walk and the Theory of Brownian Motion, Am. Math. Monthly, vol. 54, pp. 369-391, 1947. • D. Carlson, A New Criterion for H-stability of Complex Matrices, Lin. Algebra Appl., vol. 1, pp. 59-64, 1968. 10. Taussky, Positive Definite l\1atriees and Their Hole in the Study of the Characteristic Roots of General Matriees, Advances in Mathematic8, vol. 2, pp. 17.')-IX6, Al~f\demic Prl'SB Inc., New York, IflIiX.
§T. The shuffling of cards is designed to be a l\Iarkoff process which makes each new deal independent of the previous hand. In reality, most people shuffle quite carelessly, so that a great deal of information concerning the distribution of honors in the various hands can be deduced by remembering the tricks of the previous deal. Over several hours of play, this percentage advantage can be quite important. The concept of a Markoff chain is due to Poincare. See
M. Fr~chet, TraUe du ealeul des probabilites, tome 1, fasc. III, 2" livre, Gauthier-Villars & Cie, Paris, 1936. for a detailed discussion of the application of matrix theory to Markoff chains.
§13. For a discussion of these generalized probabilities and their connection with the Feynman formulation of quantum mechanics, see E. Montroll, Markoff Chains, Wiener Integrals, and Quantum Theory, Comm. Pure Appl. Math., vol. 5, pp. 415-453, 1952.
§1'-t This matrix generalization of probabilities was introduced in R. Bellman, On a Generalization of Classical Probability Theory-I: Markoff Chains, Proc. Natl. Aead. Sci. U.S., vol. 39, pp. 1075-1077, 1953.
See also R. Bellmo,n, On a Generalization of the Stieltjes Integral to Matrix Functions, Rend. eire. mat. Palel'11lO, ser. II, vol. 5, pp. 1-6, 1956.
R. Bellman, On Positive Definite Matrices and Stieltjes Integrals, Rend. eire. mat. Palermo, ser. II, vol. 6, pp. 2.54-2.58, 19.57. For another generalization of Markoff matrices, see E. V. Haynsworth, QUluoli-stochaHtic matrices, Duke Math. J., vol. 22, pp. 15-24, 1955. For some quite interesting results pertaining to Markoff chains and totally positive matrices in the sense of Gantmacher and Krein, see S. Karlin and .r. McGregor, Coincidence Prohabilities, Stanford University Tech. Rept. 8, 1958.
15
Stochastic Matrices
1. Introduction. In this chapter we wish to discuss in very brief fashion stochastic matrices and a way in which they enter into the study of differential and difference equations. As we shall see, Kronecker products occur in a very natural fashion when we seek to determine the moments of solutions of linear functional equations of the foregoing type with random coefficients.
2. Limiting Behavior of Physical Systems. Consider, as usual, a physical system S, specified at any time t by a vector x(t). In previous chapters, we considered the case in which x(t + Δ) was determined from a knowledge of x(t) by means of a linear transformation
x(t + Δ) = (I + ZΔ)x(t)    (1)

and the limiting form of (1) in which Δ is taken to be an infinitesimal,

dx/dt = Zx    (2)

Suppose that we now assume that Z is a stochastic matrix. By this we mean that the elements of Z are stochastic variables whose distributions depend upon t. Since the concept of a continuous stochastic process, and particularly that of the solution of a stochastic differential equation, is one of great subtlety, with many pitfalls, we shall consider only discrete processes. There is no difficulty, however, in applying the formalism developed here, and the reader versed in these matters is urged to do so.
Returning to (1), let us write x(kΔ) = x_k to simplify the notation. Then (1) may be written

x_{k+1} = (I + Z_k Δ)x_k    (3)

Since Z is a stochastic matrix, its value at any time becomes a function of time, and we denote this fact by writing Z_k.
We see then that

x_n = [ Π_{k=0}^{n-1} (I + Z_k Δ) ] x_0    (4)

If Δ is small, we can write this

x_n = [ I + Δ Σ_{k=0}^{n-1} Z_k + O(Δ²) ] x_0    (5)
Hence, to terms in Δ², we can regard the effect of repeated transformations as an additive effect. Additive stochastic processes have been extensively treated in the theory of probability, and we shall in consequence devote no further attention to them here. Instead, we shall examine the problem of determining the limiting behavior of x_n when Z_k Δ cannot be regarded as a small quantity. Since this problem is one of great difficulty, we shall consider only the question of the asymptotic behavior of the expected value of the kth powers of the components of x_n as n → ∞ for k = 1, 2, . . . .
3. Expected Values. To illustrate the ideas and techniques, it is sufficient to consider the case where the x_k are two-dimensional vectors and the Z_k are 2 × 2 matrices. Let us write out the relations of (3) in terms of the components of x_k, which we call u_k and v_k for simplicity of notation. Then

u_{k+1} = z_11 u_k + z_12 v_k
v_{k+1} = z_21 u_k + z_22 v_k    (1)

We shall assume that the Z_i are independent random matrices with a common distribution. More difficult problems involving situations in which the distribution of Z_k depends upon the values assumed by Z_{k-1} will be discussed in a subsequent volume. The assumption that the Z_k are independent means that we can write

E(u_{k+1}) = e_11 E(u_k) + e_12 E(v_k)
E(v_{k+1}) = e_21 E(u_k) + e_22 E(v_k)    (2)

where E(u_k) and E(v_k) represent the expected values of u_k and v_k, respectively, and e_ij is the expected value of z_ij. We see then that

E(x_{k+1}) = E(Z)E(x_k)    (3)
This mea,ns that the asymptotic behavior of E(x,,) as n -+ upon the characteristic roots of E(Z).
00
depends
EXERCISE 1. Assume that with probability ~'2 Z is a positive Markoff matrix A, and with probability H a positive Markoff matrix B. Prove that E(xn) -> v, a probability vector independent of Xo, assumed also to be a probability vector.
4. Expected Values of Squares. Suppose that we wish to determine the expected values of u_k² and v_k². From (3.1) we have

u_{k+1}² = z_11² u_k² + 2 z_11 z_12 u_k v_k + z_12² v_k²
v_{k+1}² = z_21² u_k² + 2 z_21 z_22 u_k v_k + z_22² v_k²    (1)

We see then that in order to determine E(u_k²) and E(v_k²) for all k we must determine the values of E(u_k v_k). From (3.1), we have

u_{k+1} v_{k+1} = z_11 z_21 u_k² + (z_11 z_22 + z_12 z_21) u_k v_k + z_12 z_22 v_k²    (2)

Hence

[E(u_{k+1}²), E(u_{k+1} v_{k+1}), E(v_{k+1}²)]' = E(Z^[2]) [E(u_k²), E(u_k v_k), E(v_k²)]'    (3)
where Z^[2] is the Kronecker square of Z = (z_ij), as defined in Sec. 8 of Chap. 12. Observe how stochastic matrices introduce Kronecker products in a very natural fashion.
MISCELLANEOUS EXERCISES
1. Show that E(Z^[3]) arises from the problem of determining E(u_k³).
2. Show that the Kronecker square arises in the following fashion. Let Y be a matrix, to be determined, possessing the property that E((x(n), Yx(n))) is readily obtained. Determine Y by the condition that E(Z'YZ) = μY. What are the values of μ?
3. If Z has the distribution of Exercise 1 of Sec. 3, do the sequences {E(u_n²)}, {E(v_n²)} converge? If so, to what?
I E( Un I) I,
Bibliography and Discussion The problem of studying the behavior of various functions of stochastic matrices, such as the determinants, or the characteristic roots, is one of great interest and importance in the field of statistics and various parts of mathematical physics. Here we wish merely to indicate one aspect of the problem and the natural way in which Kronecker products enter.
For a discussion of the way stochastic matrices enter into statistics, see S. S. Wilks, Mathematical Statistics, Princeton University Press, Princeton, N.J., 1943. T. W. Anderson, The Asymptotic Distribution of Certain Characteristic Roots and Vectors, Proc. Second Berkeley Symposium on M athemat~'cal Stat£8tics and Probability, 1950.
T. W. Anderson, All Introduction to Multivaliate Statistt'cal Analysis, John Wiley & Sons, Inc., New York, 19.';8. and the references in A. E. Ingham, An Integral Which Occurs in Statistics,
PI·OC.
Cam-
bridge Phil. Soc., vol. 29, pp. 271-276, 1933.
See also H. Nyquist, S. O. Rice, llnd J. Riordan, The Distribution of Random Determinants, Quart. Appl. Math., vol. 12, pp. 97-104, 1954. R. Bellman, A Note on the Moan Value of Random Determinants, Quart. Appl. Math., vol. 13, pp. 322-324, 1955.
For some problems involving stochastic matrices arising from mathematical physics, see
E. P. Wigner, Characteristic Vectors of Bordered Matrices with Infinite Dimensions, A 1/1/. Math., vol. 62, pp. 548-564, 1955. F. J. Dyson, Phys. Rev., (2), vol. 92, pp. 1331-1338, ]953. R. Bellman, Dynamics of a Disordered Linear Chain, Phys. Rev., (2), vol. 101, p. 19, 1958. Two books discussing these matters in detail are U. Grenander, Probabilities on Algebra£c Structures, .fohn Wiley & Sons, Inc., New York, 1963. M. L. Mehta, Random Matrices, Academic Press Inc., New York, 1967.
For the use of Kronecker products to obtain moments, see R. Bellman, Limit Theorems for Noncommutative Operations, I, Duke Math. J., vol. 21, pp. 491-500, 1954.
E. H. Kerner, The Band Structure of Mixed Linear Lattices, Proc. Ph-ys. Soc. London, vol. 69, pp. 234-244, 1956.
Jun-ichi Hori, On the Vibration of Disordered Linear Lattices, II, Progr. Theoret. Phys. Kyoto, vol. 18, pp. 367-374, 1957.
The far more difficult problem of determining asymptotic probability distributions is considered in H. Furstenberg and H. Kesten, Products of Random Matrices, Ann. Math. Stat., vol. 3, pp. 457-469, 1960. An expository paper showing the way in which matrices enter into the study of Ilnearest neighbor" problems is A. Maradudin and G. H. Weiss, The Disordered Lattice Problem, a Review, J. Soc. Ind. Appl. Math., vol. 6, pp. 302-319, 1958. See also D. N. Nanda, Distribution of a Root of a Determinantal Equation, Ann. Math. Stat., vol. 19, pp. 47-57, 1948; ibid., pp. 340-350.
P. L. Hsu, On the Distribution of Roots of Certain Determinantal Equations, A nn. Eugenics, vol. 3, pp. 250-258, 1939. S. N. Roy, The Individual Sampling . . . ,Sankhya, 1943. A. M. Mood, On the Distribution of the Characteristic Roots of Normal Second-moment Matrices, Ann. Math. Stat., vol. 22, pp. 266--273, 1951. M. G. Kendall, The Advanced Theory of Statistics, vol. II, London, 1946. R. Englman, The Eigenvalues of a Randomly Distributed Matrix, Nuovo cimento, vol. to, pp. 615-621, 1958. Finally, the problem of the iteration of random linear transformations plays an important role in the study of wave propagation through random media, see R. Bellman and R. Kalaba, Invariant Imbedding, Wave Propagation, and the WKB Approximation, Proc. Nat[. A cad. Sci. U.S., vol. 44, pp. 317-319, 1958. R. Bellman and R. Kalaba, Invariant Imbedding and Wave Propagation in Stochastic Media, J. Math. and Mech., 1959.
16 Positive Matrices, Perron's Theorem, and Mathematical Economics 1. Introduction. In this chapter, we propose to discuss a variety of problems arising in the domain of mathematical economics which center about the themes of linearity and positivity. To begin our study, however, we shall study some simple branching processes which arise both in the growth of biological entities and in the generation of elementary particles, as in nuclear fission and cosmic-ray cascades. Oddly enough, the fundamental result concerning positive matrices was established by Perron in connection with his investigation of the multidimensional continued fraction expansions of Jacobi. His result was then considerably extended by Frobenius in a series of papers. Rather remarkably, a result which arose in number-theoretic research now occupies a central position in mathematical economics, particularly in connection with the II input-output" analysis of Leontieff. The matrices arising in this study were first noted by Minkowski. This result also plays an important role in the theory of branching processes. What we present here is a very small and specialized part of the modern theory of positive operators, just as the results for symmetric matrices were principally special cases of results valid for Hermitian operators. Finally, at the end of the chapter, we shall touch quickly upon the basic problem of linear programming and the connection with the theory of games of Borel and von Neumann. In conclusion, we shall mention some Markovian decision processes arising in the theory of dynamic programming. 2. Some Simple Growth Processes. Let us consider the following simple model of the growth of a set of biological objects. Suppose that there are N different types, which we designate by the numbers 1, 2, . . . , N, and that at the times t = 0, 1, 2, . . . , each of these different types gives birth to a certain number of each of the otner types. 286
One case of some importance is that where there are only two types, the normal species and the mutant species. Another case of interest is that where we wish to determine the number of females in different age groups. As each year goes by, females of age i go over into females of age i + 1, from time to time giving birth to a female of age zero. It is rather essential to attempt to predict the age distribution by means of a mathematical model since the actual data are usually difficult to obtain.
Let us introduce the quantities

a_ij = the number of type i derived at any stage from a single item of type j,    i, j = 1, 2, . . . , N.    (1)

As usual, we are primarily interested in growth processes whose mechanism does not change over time. The state of the system at time n is determined by the N quantities

x_i(n) = the number of type i at time n,    i = 1, 2, . . . , N.    (2)

We then have the relations

x_i(n + 1) = Σ_{j=1}^N a_ij x_j(n),    i = 1, 2, . . . , N    (3)

where the relations x_i(0) = c_i, i = 1, 2, . . . , N, determine the initial state of the system. The problem we would like to solve is that of determining the behavior of the components x_i(n) as n → ∞. This depends upon the nature of the characteristic roots of A = (a_ij), and, in particular, upon those of greatest absolute value. As we know from our discussion of Markoff matrices, this problem is one of great complexity. We suspect, however, that the problems may be quite simple in the special case where all the a_ij are positive, and this is indeed the case.
3. Notation. As in Chap. 14, we shall call a matrix positive if all of its elements are positive. If A is positive, we shall write A > 0. The notation A > B is equivalent to the statement that A - B > 0. Similarly, we may conceive of non-negative matrices, denoted by A ≥ 0. This notation definitely conflicts with previous notation used in our discussion of positive definite matrices. At this point, we have the option of proceeding with caution, making sure that we are never discussing both types of matrices at the same time, or of introducing a new notation, such as A ≳ 0, or something equally barbarous. Of the alternatives, we prefer the one that uses the simpler notation, but which requires that the reader keep in mind what is being discussed. This is, after all, the more desirable situation.
A vector x will be called positive if all of its components are positive, and non-negative if the components are merely non-negative. We shall write x > 0 in the first case, and x ≥ 0 in the second. The relation x ≥ y is equivalent to x - y ≥ 0.
EXERCISES
1. Show that x ≥ y, A ≥ 0, imply that Ax ≥ Ay.
2. Prove that Ax ≥ 0 for all x ≥ 0 implies A ≥ 0.
4. The Theorem of Perron. Let us now state the fundamental result of Perron.
Theorem 1. If A is a positive matrix, there is a unique characteristic root of A, λ(A), which has greatest absolute value. This root is positive and simple, and its associated characteristic vector may be taken to be positive.
There are many proofs of this result, of quite diverse origin, structure, and analytical level. The proof we shall give is in some ways the most important since it provides a variational formulation for λ(A) which makes many of its properties immediate, and, in addition, can be extended to handle more general operators.
5. Proof of Theorem 1. Our proof of Theorem 1 will be given in the course of demonstrating Theorem 2.
Theorem 2. Let A be a positive matrix and let λ(A) be defined as above. Let S(A) be the set of non-negative λ for which there exist non-negative vectors x such that Ax ≥ λx. Let T(A) be the set of positive λ for which there exist positive vectors x such that Ax ≤ λx. Then

λ(A) = max_{λ∈S(A)} λ = min_{λ∈T(A)} λ
ProoJ oj Theorem 2. shall consider, so that
(1)
Let us begin by normalizing all the vectors we N
Ilxll =
I
x, = 1
(2)
i~l
This automatically excludes the null vector.
Let us once more set
N
IIAII
=
l
4\1.
If Ax
~
Ax, we have
i,i-l
or
Allxll ~ IIAllllxl1 o ~ A~ IIAII
(3)
This shows that S(A) is a bounded set, which is clearly not empty if A is positive.
Let λ_0 = sup λ for λ ∈ S(A); let {λ_i} be a sequence of λ's in S(A) converging to λ_0; and let {x^{(i)}} be an associated set of vectors, which is to say

A x^{(i)} ≥ λ_i x^{(i)},   i = 1, 2, ...   (4)

Since ||x^{(i)}|| = 1, we may choose a subsequence of the x^{(i)}, say {x^{(i_k)}}, which converges to x^{(0)}, a non-negative, nontrivial vector. Since λ_0 x^{(0)} ≤ A x^{(0)}, it follows that λ_0 ∈ S(A), which means that the supremum is actually a maximum. Let us now demonstrate that the inequality is actually an equality, that is, λ_0 x^{(0)} = A x^{(0)}. The proof is by contradiction. Let us suppose, without loss of generality, that

\sum_{j=1}^{N} a_{1j} x_j − λ_0 x_1 = d_1 > 0
\sum_{j=1}^{N} a_{kj} x_j − λ_0 x_k ≥ 0,   k = 2, ..., N   (5)

where x_i, i = 1, 2, ..., N, are the components of x^{(0)}. Consider now the vector

y = x^{(0)} + δ e^{(1)},   0 < δ < d_1/λ_0   (6)

where e^{(1)} denotes the first coordinate vector. It follows from (5) that Ay > λ_0 y. This, however, contradicts the maximum property of λ_0. Hence d_1 = 0, and there must be equality in all the terms in (5). Consequently λ_0 is a characteristic root of A and x^{(0)} a characteristic vector which is necessarily positive.
Let us now show that λ_0 = λ(A). Assume that there exists a characteristic root λ of A for which |λ| ≥ λ_0, with z an associated characteristic vector. Then from Az = λz we have

|λ| |z| ≤ A |z|   (7)

where |z| denotes the vector whose components are the absolute values of the components of z. It follows from this last inequality and the definition of λ_0 that |λ| ≤ λ_0. Since |λ| = λ_0, the argument above shows that the inequality |λ| |z| ≤ A |z| must be an equality. Hence |Az| = A|z|; and thus z = c_1 w, with w > 0 and c_1 a complex number. Consequently, Az = λz is equivalent to Aw = λw; whence λ is real and positive and hence equal to λ_0.
To show that w is equivalent to x^{(0)}, let us show that apart from scalar multiples there is only one characteristic vector associated with λ_0. Let z be another characteristic vector of A, not necessarily positive, associated with λ(A). Then x^{(0)} + εz, for all scalars ε, is a characteristic vector of A. Varying ε about ε = 0, we reach a first value of ε for which one or several components of x^{(0)} + εz are zero, with the remaining components positive, provided that x^{(0)} and z are actually linearly independent. However, the relation

A(x^{(0)} + εz) = λ(A)(x^{(0)} + εz)   (8)

shows that x^{(0)} + εz ≥ 0 implies x^{(0)} + εz > 0. Thus a contradiction.
EXERCISES
1. Show that A ≥ B ≥ 0 implies that λ(A) ≥ λ(B).
2. Show by means of 2 × 2 matrices that λ(AB) ≤ λ(A)λ(B) is not universally valid.
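Before turning to the simplicity proofs, Theorem 1 is easy to check numerically. A minimal sketch, assuming a randomly generated positive matrix (an illustrative choice, not an example from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
A = rng.uniform(0.1, 1.0, size=(N, N))    # a random positive matrix

vals, vecs = np.linalg.eig(A)
k = np.argmax(np.abs(vals))                # index of the root of greatest absolute value
lam = vals[k]
v = np.real(vecs[:, k])
v = v / v.sum()                            # normalize; the components share one sign

print("dominant root:", lam)
print("is real:", abs(lam.imag) < 1e-12)
print("strictly dominant:",
      all(abs(vals[j]) < abs(lam) - 1e-12 for j in range(N) if j != k))
print("positive eigenvector:", v, (v > 0).all())
```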
6. First Proof that λ(A) Is Simple. Let us now turn to the proof that λ(A) is a simple characteristic root. We shall give two demonstrations, one based upon the Jordan canonical form and the other upon more elementary considerations. Assume that λ(A) is not a simple root. The Jordan canonical form shows that there exists a vector y with the property that

(A − λ(A)I)^k y = 0,   (A − λ(A)I)^{k−1} y ≠ 0   (1)

for some k ≥ 2. This means that (A − λ(A)I)^{k−1} y is a characteristic vector of A and hence a scalar multiple of x^{(0)}. By suitable choice of y we may take this multiple to be one. Hence

x^{(0)} = (A − λ(A)I)^{k−1} y   (2)

Now let

z = (A − λ(A)I)^{k−2} y   (3)

Then

Az = λ(A)z + x^{(0)}   (4)

This yields A|z| ≥ λ(A)|z| + x^{(0)} > λ(A)|z|, which contradicts the maximum definition of λ(A).
7. Second Proof of the Simplicity of λ(A). Let us now present an alternate proof of the simplicity of λ(A) based upon the minimum property. We begin by proving the following lemma.
Lemma. Let A_N be a positive N × N matrix (a_{ij}), and A_{N−1} any (N − 1) × (N − 1) matrix obtained by striking out an ith row and jth column. Then

λ(A_N) > λ(A_{N−1})   (1)

Proof. Let us proceed by contradiction. Write λ_N = λ(A_N), λ_{N−1} = λ(A_{N−1}). Assume for the moment that λ_N ≤ λ_{N−1}, and take, without loss of generality, A_{N−1} = (a_{ij}), i, j = 1, 2, ..., N − 1. We have then the equations

\sum_{j=1}^{N−1} a_{ij} y_j = λ_{N−1} y_i,   i = 1, 2, ..., N − 1,   y_i > 0   (2)

and

\sum_{j=1}^{N} a_{ij} x_j = λ_N x_i,   i = 1, 2, ..., N,   x_i > 0   (3)

Using the first N − 1 equations in (3), we obtain

\sum_{j=1}^{N−1} a_{ij} x_j = λ_N x_i − a_{iN} x_N,   i = 1, 2, ..., N − 1   (4)

which contradicts the minimum property of λ_{N−1}.
Let us now apply this result to show that λ(A) is a simple root of f(λ) = |A − λI| = 0. Using the rule for differentiating a determinant, we obtain readily

f'(λ) = −|A_1 − λI| − |A_2 − λI| − ... − |A_N − λI|   (5)
where by A_k we denote the matrix obtained from A by striking out the kth row and column. Since λ(A) > λ(A_k) for each k, and each expression |A_k − λI| is a polynomial in λ with the same leading term, each polynomial |A_k − λI| has the same sign at λ = λ(A). Hence, f'(λ(A)) ≠ 0.

8. Proof of the Minimum Property of λ(A). We have given above a proof of the maximum property of λ(A), and used, in what has preceded, the minimum property. Let us now present a proof of the minimum property which uses the maximum property rather than just repeating the steps already given. The technique we shall employ, that of using the adjoint operator, is one of the most important and useful in analysis, and one we have already met in the discussion of Markoff matrices.
Let A' be, as usual, the transpose of A. Since the characteristic roots of A and A' are the same, we have λ(A) = λ(A'). As we know, (Ax,y) = (x,A'y). If Ay ≤ λy for some y > 0, we have for any z ≥ 0,

λ(z,y) ≥ (z,Ay) = (A'z,y)   (1)

Let z be a characteristic vector of A' associated with λ(A). Then

λ(z,y) ≥ (A'z,y) = λ(A)(z,y)   (2)

Since (z,y) > 0, we obtain λ ≥ λ(A). This completes the proof of the minimum property.

9. An Equivalent Definition of λ(A). In place of the definition of λ(A) given in Theorem 1 above, we may also define λ(A) as follows.
Theorem 3.

λ(A) = max_x min_i { \sum_{j=1}^{N} a_{ij} x_j / x_i } = min_x max_i { \sum_{j=1}^{N} a_{ij} x_j / x_i }   (1)

Here x varies over all non-negative vectors different from zero. The proof we leave as an exercise.

10. A Limit Theorem. From Theorem 1, we also derive the following important result.
Theorem 4. Let c be any non-negative vector. Then

v = lim_{n→∞} A^n c / λ(A)^n

exists and is a characteristic vector of A associated with λ(A), unique up to a scalar multiple determined by the choice of c, but otherwise independent of the initial state c. This is an extension of the corresponding result proved in Chap. 14 for the special case of Markoff matrices.

EXERCISE
1. Obtain the foregoing result from the special case established for Markoff matrices.
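The limit in Theorem 4 can be computed directly. The following Python sketch is only illustrative (the matrix and starting vector are made-up values): it iterates A^n c with renormalization and tracks the quotients of Theorem 3, which bracket λ(A).

```python
import numpy as np

def perron_by_iteration(A, c, iters=200):
    """Approximate lambda(A) and the limit v of Theorem 4 by repeated multiplication.

    A is assumed positive and c positive (illustrative sketch only)."""
    x = np.asarray(c, dtype=float)
    lam = 1.0
    for _ in range(iters):
        y = A @ x
        ratios = y / x        # quotients (A x)_i / x_i of Theorem 3; min and max bracket lambda(A)
        lam = ratios.max()
        x = y / y.sum()       # renormalize so the iterates stay bounded
    return lam, x

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, v = perron_by_iteration(A, [1.0, 1.0])
print(lam, v)                  # compare with the largest eigenvalue of A
```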
11. Steady-state Growth. Let us now see what the Perron theorem predicts for the asymptotic behavior of the growth process discussed in Sec. 2. We shall suppose that A = (a_{ij}) is a positive matrix. Then, as n → ∞, the asymptotic behavior of the vector x(n), defined by the equation in (3), is given by the relation

x(n) ~ λ^n γ   (1)

where λ is the Perron root and γ is a characteristic vector associated with λ. We know that γ is positive, and that it is a positive multiple of one particular normalized characteristic vector, say δ. This normalization can be taken to be the condition that the sum of the components totals one. The constant of proportionality connecting γ and δ is then determined by the values of the initial components of c, the c_i, the initial population. What this means is that regardless of the initial population, as long as we have the complete mixing afforded by a positive matrix, asymptotically as time increases we approach a steady-state situation in which the total population grows exponentially, but where the proportions of the various species remain constant. As we shall see, the same phenomena are predicted by the linear models of economic systems we shall discuss subsequently.

12. Continuous Growth Processes. Returning to the mathematical model formulated in Sec. 2, let us suppose that the time intervals at which the system is observed get closer and closer together. The limiting case will be a continuous growth process. Let

a_{ij}Δ = the number of type i produced by a single item of type j in a time interval Δ, j ≠ i
1 + a_{ii}Δ = the number of type i produced by a single item of type i in a time interval Δ   (1)

i = 1, 2, ..., N. Then we have, as above, the equations

x_i(t + Δ) = (1 + a_{ii}Δ)x_i(t) + Δ \sum_{j≠i} a_{ij} x_j(t),   i = 1, 2, ..., N   (2)

Observe that the a_{ij} are now rates of production. Proceeding formally to the limit as Δ → 0, we obtain the differential equations

dx_i/dt = a_{ii} x_i + \sum_{j≠i} a_{ij} x_j,   i = 1, 2, ..., N   (3)

We can now define the continuous growth process by means of these equations, with the proviso that this will be meaningful if and only if we can show that this process is in some sense the limit of the discrete process. The asymptotic behavior of the x_i(t) as t → ∞ is now determined by the characteristic roots of A of largest real part.

13. Analogue of Perron Theorem. Let us now demonstrate
Theorem 5. If

a_{ij} > 0   (1)

then the root of A with largest real part, which we shall call p(A), is real and simple. There is an associated positive characteristic vector which is unique up to a multiplicative constant.
Furthermore,

p(A) = max_x min_i { \sum_{j=1}^{N} a_{ij} x_j / x_i } = min_x max_i { \sum_{j=1}^{N} a_{ij} x_j / x_i }   (2)

Proof. The easiest way to establish this result is to derive it as a limiting case of Perron's theorem. Let p(A) denote a characteristic root of largest real part. It is clear that a root of e^{δA} of largest absolute value is given by

λ(e^{δA}) = e^{δ p(A)}   (3)

for δ > 0. Since e^{δA} is a positive matrix under our assumptions, as follows from Sec. 15 of Chap. 10, it follows that p(A) is real and simple. Using the variational characterization for λ(e^{δA}), we have

e^{δ p(A)} = max_x min_i (1 + δ \sum_{j=1}^{N} a_{ij} x_j / x_i) + O(δ^2)   (4)

Letting δ → 0, we obtain the desired representation.

EXERCISES
1. If B ≥ 0, and A is as above, show that p(A + B) ≥ p(A).
2. Derive Theorem 5 directly from Perron's theorem for sI + A, where s has been chosen so that sI + A is a positive matrix.
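As a quick numerical illustration of Theorem 5 and relation (3) (not from the text; the matrix below is an illustrative assumption with positive off-diagonal elements), one can check that the eigenvalue of A with largest real part agrees with δ^{-1} log λ(e^{δA}):

```python
import numpy as np
from scipy.linalg import expm

# Illustrative matrix with positive off-diagonal entries (values are made up).
A = np.array([[-1.0, 0.5, 0.3],
              [ 0.4, -0.2, 0.6],
              [ 0.7, 0.1, -0.5]])

vals = np.linalg.eigvals(A)
p = vals[np.argmax(vals.real)]             # root of largest real part, p(A)

delta = 0.01
M = expm(delta * A)                        # e^{delta A} is a positive matrix here
lam = max(np.linalg.eigvals(M), key=abs)   # its Perron root

print("p(A)             :", p)
print("log(lambda)/delta:", np.log(lam.real) / delta)   # should approximate p(A)
```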
14. Nuclear Fission. The same type of matrix arises in connection with some simple models of nuclear fission. Here the N types of objects may be different types of elementary particles, or the same particle, say a neutron, in N different energy states. References to extensive work in this field will be found at the end of the chapter.

15. Mathematical Economics. As a simple model of an economic system, let us suppose that we have N different industries, the operation of which is dependent upon the state of the other industries. Assuming for the moment that the state of the system at each of the discrete times n = 0, 1, ..., can be specified by a vector x(n), where the ith component, x_i(n), in some fashion describes the state of the ith industry, the statement concerning interrelation of the industries translates into a system of recurrence relations, or difference equations, of the following type:

x_i(n + 1) = g_i(x_1(n), x_2(n), ..., x_N(n)),   i = 1, 2, ..., N   (1)

with x_i(0) = c_i.
This is, however, much too vague a formulation. In order to see how relations of this type actually arise, let us consider a simple model of three interdependent industries, which we shall call for identification purposes the "auto" industry, the "steel" industry, and the "tool" industry. Using the same type of "lumped-constant" approach that has been so successful in the study of electric circuits, we shall suppose that each of these industries can be specified at any time by its stockpile of raw material and by its capacity to produce new quantities using the raw materials available. Let us introduce the following state variables:

x_1(n) = number of autos produced up to time n
x_2(n) = capacity of auto factories at time n
x_3(n) = stockpile of steel at time n
x_4(n) = capacity of steel mills at time n
x_5(n) = stockpile of tools at time n
x_6(n) = capacity of tool factories at time n   (2)

In order to obtain relations connecting the x_i(n + 1) with the x_i(n), we must make some assumptions concerning the economic interdependence of the three industries:

To increase auto, steel, or tool capacity, we require only steel and tools.   (3a)
Production of autos requires only auto capacity and steel.   (3b)
Production of steel requires only steel capacity.   (3c)
Production of tools requires only tool capacity and steel.   (3d)

The dynamics of the production process are the following: At the beginning of a time period, n to n + 1, quantities of steel and tools, taken from their respective stockpiles, are allocated to the production of additional steel, tools, and autos, and to increasing existing capacities for production. Let

z_i(n) = the amount of steel allocated at time n for the purpose of increasing x_i(n),   i = 1, 2, ..., 6
w_i(n) = the amount of tools allocated at time n for the purpose of increasing x_i(n),   i = 1, 2, ..., 6   (4)

Referring to the assumptions in (3), we see that

z_3(n) = 0   (5a)
w_1(n) = w_4(n) = w_5(n) = 0   (5b)

In order to obtain relations connecting x_i(n + 1) with the x_j(n) and z_j(n) and w_j(n), we must make some further assumptions concerning the relation between input and output. The simplest assumption is that we have a linear production process where the output of a product is directly proportional to the input in shortest supply. Thus production is proportional to capacity whenever there is no constraint on raw materials, and proportional to the minimum raw material required whenever there is no limit to capacity. Processes of this type are called "bottleneck processes."
To obtain the equations describing the process, we use the principle of conservation. The quantity of an item at time n + 1 is the quantity at time n, less what has been used over [n, n + 1], plus what has been produced over [n, n + 1]. The constraints on the choice of the z_i and w_i are that we cannot allocate at any stage more than is available in the stockpiles, and further, that there is no point to allocating more raw materials than the productive capacity can accommodate. The conservation equations are then, taking account of the bottleneck aspects,

x_1(n + 1) = x_1(n) + min(γ_1 x_2(n), α_1 z_1(n))
x_2(n + 1) = x_2(n) + min(α_2 z_2(n), β_2 w_2(n))
x_3(n + 1) = x_3(n) − z_1(n) − z_2(n) − z_4(n) − z_5(n) − z_6(n) + γ_2 x_4(n)
x_4(n + 1) = x_4(n) + min(α_4 z_4(n), β_4 w_4(n))
x_5(n + 1) = x_5(n) − w_2(n) − w_4(n) − w_6(n) + min(γ_5 x_6(n), α_5 z_5(n))
x_6(n + 1) = x_6(n) + min(α_6 z_6(n), β_6 w_6(n))   (6)

where the α_i, β_i, γ_i are constants. The stockpile constraints on the choice of the z_i and w_i are

z_i, w_i ≥ 0   (7a)
z_1 + z_2 + z_4 + z_5 + z_6 ≤ x_3   (7b)
w_2 + w_4 + w_6 ≤ x_5   (7c)

Applying the capacity constraints, we can reduce the equations in (6) to linear form. We see that

α_2 z_2 = β_2 w_2   (8a)
α_4 z_4 = β_4 w_4   (8b)
α_6 z_6 = β_6 w_6   (8c)

Using these relations, we can eliminate the w_i from (7) and obtain the linear equations
x_1(n + 1) = x_1(n) + α_1 z_1(n),   x_1(0) = c_1
x_2(n + 1) = x_2(n) + α_2 z_2(n),   x_2(0) = c_2
x_3(n + 1) = x_3(n) − z_1(n) − z_2(n) − z_4(n) − z_5(n) − z_6(n) + γ_2 x_4(n),   x_3(0) = c_3
x_4(n + 1) = x_4(n) + α_4 z_4(n),   x_4(0) = c_4
x_5(n + 1) = x_5(n) − ε_2 z_2(n) − ε_4 z_4(n) − ε_6 z_6(n) + α_5 z_5(n),   ε_i = α_i/β_i,   x_5(0) = c_5
x_6(n + 1) = x_6(n) + α_6 z_6(n),   x_6(0) = c_6   (9)

The constraints on the choice of the z_i are now

z_i ≥ 0   (10a)
z_1 + z_2 + z_4 + z_5 + z_6 ≤ x_3   (10b)
ε_2 z_2 + ε_4 z_4 + ε_6 z_6 ≤ x_5   (10c)
z_1 ≤ f_2 x_2   (10d)
z_5 ≤ f_6 x_6   (10e)

Suppose that we satisfy these conditions by means of a choice

z_2 = e_2 x_3,   z_4 = e_4 x_3,   z_6 = e_6 x_3   (11)

where the scalars e_2, e_4, and e_6 are chosen to satisfy (10c), together with

z_1 = f_2 x_2,   z_5 = f_6 x_6   (12)

and suppose that (10b) is satisfied. Then the foregoing equations have the form

x_i(n + 1) = (1 − a_{ii}) x_i(n) + \sum_{j≠i} a_{ij} x_j(n),   i = 1, 2, ..., 6   (13)

where

a_{ij} ≥ 0,   i, j = 1, ..., 6   (14)

The continuous version of these equations is

dx_i/dt = −a_{ii} x_i + \sum_{j≠i} a_{ij} x_j,   i = 1, 2, ..., 6   (15)

What we have wished to emphasize is that a detailed discussion of the economic model shows that a special type of matrix, sometimes called an "input-output" matrix, plays a paramount role. Unfortunately, in the most interesting and important cases we cannot replace (14) by the stronger condition of positivity. The result is that the study of the asymptotic behavior of the x_i(n), as given by a linear recurrence relation of the form appearing in (13), is quite complex. References to a great deal of research will be found at the end of the chapter.
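A one-period update in the spirit of the bottleneck equations (6) is easy to write down. The following Python sketch is only illustrative: the constants α_i, β_i, γ_i, the allocations, and the initial state are made-up values, not data from the text.

```python
import numpy as np

def bottleneck_step(x, z, w, alpha, beta, gamma):
    """One period of a bottleneck process in the spirit of (6).

    x = [autos, auto capacity, steel stock, steel capacity, tool stock, tool capacity];
    z, w are the steel and tool allocations; all constants are illustrative assumptions."""
    x1, x2, x3, x4, x5, x6 = x
    new = np.empty(6)
    new[0] = x1 + min(gamma[0] * x2, alpha[0] * z[0])                   # auto production
    new[1] = x2 + min(alpha[1] * z[1], beta[1] * w[1])                  # auto-capacity growth
    new[2] = x3 - (z[0] + z[1] + z[3] + z[4] + z[5]) + gamma[1] * x4    # steel stockpile
    new[3] = x4 + min(alpha[3] * z[3], beta[3] * w[3])                  # steel-capacity growth
    new[4] = x5 - (w[1] + w[3] + w[5]) + min(gamma[2] * x6, alpha[4] * z[4])  # tool stockpile
    new[5] = x6 + min(alpha[5] * z[5], beta[5] * w[5])                  # tool-capacity growth
    return new

x = np.array([0.0, 1.0, 10.0, 1.0, 5.0, 1.0])
z = np.array([1.0, 1.0, 0.0, 1.0, 1.0, 1.0])   # z_3 = 0, per (5a)
w = np.array([0.0, 1.0, 0.0, 0.0, 1.0, 1.0])   # w_1 = w_4 = w_5 = 0, per (5b)
print(bottleneck_step(x, z, w,
                      alpha=[0.5] * 6, beta=[0.5] * 6, gamma=[0.4, 0.3, 0.4]))
```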
16. Minkowski-Leontieff Matrices. Let us now discuss a particular class of non-negative matrices specified by the conditions

a_{ij} ≥ 0   (1a)
\sum_{i=1}^{N} a_{ij} ≤ 1   (1b)

They arise in connection with the solution of linear equations of the form

x_i = \sum_{j=1}^{N} a_{ij} x_j + y_i,   i = 1, 2, ..., N   (2)

which occur in the treatment of models of interindustry production processes quite similar to that formulated above. Let us demonstrate
Theorem 6. If 0 ≤ a_{ij} and

\sum_{i=1}^{N} a_{ij} < 1,   j = 1, 2, ..., N   (3)

then the equations in (2) have a unique solution which is positive if the y_i are positive. If a_{ij} > 0, then (I − A)^{−1} is positive.
Proof. To show that (I − A)^{−1} is a positive matrix under the assumption that a_{ij} > 0, it is sufficient to show that its transpose is positive, which is to say that (I − A')^{−1} is positive. Consider then the adjoint system of equations

z_i = \sum_{j=1}^{N} a_{ji} z_j + w_i,   i = 1, 2, ..., N   (4)

It is easy to see that the solution obtained by direct iteration converges, in view of the condition in (3), and that this solution is positive if w_i > 0 for i = 1, 2, ..., N. The assumption that a_{ij} > 0 shows that z_i is positive whenever w is a nontrivial non-negative vector. The fact that (I − A)^{−1} exists and is positive establishes the first part of the theorem.

17. Positivity of |I − A|. Since the linear system in (16.2) has a unique solution for all y, under the stated conditions, it follows that |I − A| ≠ 0. To show that |I − A| > 0, let us use the method of continuity as in Sec. 4 of Chap. 4. If λ ≥ 0, the matrix λA satisfies the same conditions as A. Thus |I − λA| is nonzero for 0 ≤ λ ≤ 1. Since the determinant is positive at λ = 0 and continuous in λ, it is positive at λ = 1.
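The direct iteration used in the proof of Theorem 6 can be carried out numerically. A minimal sketch, assuming a small non-negative matrix with column sums less than one (the numbers are illustrative):

```python
import numpy as np

def leontieff_solve(A, y, iters=500):
    """Solve x = A x + y by direct iteration (a sketch; A is assumed non-negative
    with column sums < 1, as in Theorem 6, so the iteration converges)."""
    x = np.zeros_like(y, dtype=float)
    for _ in range(iters):
        x = A @ x + y          # x converges to (I - A)^{-1} y
    return x

A = np.array([[0.2, 0.3],
              [0.4, 0.1]])      # column sums 0.6 and 0.4, both < 1 (illustrative)
y = np.array([1.0, 2.0])
print(leontieff_solve(A, y))
print(np.linalg.solve(np.eye(2) - A, y))   # check against a direct solve
```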
18. Strengthening of Theorem 6. It is clear that if \sum_{i=1}^{N} a_{ij} = 1 for all j, then 1 is a characteristic root of A, whence |I − A| = 0. On the other hand, it is reasonable to suppose that the condition \sum_i a_{ij} < 1 can be relaxed. It suffices to assume that

a_{ij} > 0   (1a)
\sum_i a_{ij} < 1 for at least one j   (1b)
\sum_i a_{ij} ≤ 1 for all j   (1c)

To prove this, we show that A^2 is a matrix for which (1b) holds for all j.

19. Linear Programming. In the first part of this book, we considered the problem of maximizing quadratic forms subject to quadratic constraints, and the problem of maximizing quadratic forms subject to linear constraints. We have, however, carefully avoided any questions involving the maximization of linear forms subject to linear constraints. A typical problem of this type would require the maximization of

L(x) = \sum_{i=1}^{N} c_i x_i   (1)

subject to constraints of the form

\sum_{j=1}^{N} a_{ij} x_j ≤ b_i,   i = 1, 2, ..., M   (2a)
x_i ≥ 0   (2b)
The reader will speedily find that none of the techniques discussed in the first part of the book, devoted to quadratic forms, are of any avail here. Questions of this type are indeed part of a new domain, the theory of linear inequalities, which plays an important role in many investigations. The theorem of Perron, established above, is closely related, as far as result and method of proof are concerned, to various parts of this theory. In addition to establishing results concerning the existence and nature of solutions of the problem posed above, it is important to develop various algorithms for numerical solution of this problem. This part of the general theory of linear inequalities is called linear programming.
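Such problems are routinely solved numerically today. A minimal sketch using the linprog routine from SciPy (the data below are made-up illustrative values, not an example from the text):

```python
import numpy as np
from scipy.optimize import linprog

# Maximize L(x) = c.x subject to A x <= b, x >= 0 (all numbers are illustrative).
c = np.array([3.0, 5.0])            # coefficients c_i of the linear form (1)
A = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [3.0, 2.0]])           # constraint matrix of (2a)
b = np.array([4.0, 12.0, 18.0])

# linprog minimizes, so pass -c to maximize; the default bounds already enforce x >= 0.
res = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
print("maximizer x:", res.x)
print("maximum of L:", -res.fun)
```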
Let us now present a very simple example of the way problems of this type arise in mathematical economics. Suppose that we possess quantities x_1, x_2, ..., x_M of M different resources, e.g., men, machines, and money, and that we can utilize various amounts of these resources in connection with N different activities, e.g., drilling for oil, building automobiles, and so on. Let

x_{ij} = the quantity of the ith resource allocated to the jth activity   (3)

so that

\sum_{j=1}^{N} x_{ij} = x_i,   i = 1, 2, ..., M   (4a)
x_{ij} ≥ 0   (4b)

Suppose, and this is usually a crude approximation, that the utility of allocating x_{ij} is determined by simple proportionality, which is to say that the utility is a_{ij} x_{ij}. Making the further assumption that these utilities are additive, we arrive at the problem of maximizing the linear form

L(x) = \sum_{i,j} a_{ij} x_{ij}   (5)

subject to the constraints of (4). As another example of the way in which these linear variational problems arise in mathematical economics, consider the model presented in Sec. 15. In place of assigning the quantities z_i and w_i as we did there, we can ask for the values of those quantities which maximize the total output of steel over N stages of the process.

20. The Theory of Games. Suppose that two players, A and B, are engaged in a game of the following simple type. The first player can make any of M different moves and the second player can make any of N different moves. If A makes the ith move and B the jth move, A receives the quantity a_{ij} and B receives −a_{ij}. The matrix

A = (a_{ij})   (1)

is called the payoff matrix. We now require that both players make their choices of moves simultaneously, without knowledge of the other's move. An umpire then examines both moves and determines the amounts received by the players according to the rule stated above.
Suppose that this situation is repeated over and over. In general, it will not be advantageous for either player to make the same choice at each play of the game. To guard himself against his opponent taking
advantage of any advance information, each player will randomize over choices. Let

x_i = probability that A makes the ith choice, i = 1, 2, ..., M
y_j = probability that B makes the jth choice, j = 1, 2, ..., N   (2)

Then the expected quantity received by A after any particular play is

f(x,y) = \sum_{i,j} a_{ij} x_i y_j   (3)

The problem is to determine how the x_i and y_j should be chosen. A can reason in the following way: "Suppose that B knew my choice. Then he would choose the y_j so as to minimize f(x,y). Consequently, I will choose the x_i so as to maximize." Proceeding in this fashion, the expected return to A will be

v_A = max_x min_y f(x,y)   (4)

where the x and y regions are defined by

x_i, y_j ≥ 0   (5a)
\sum_i x_i = \sum_j y_j = 1   (5b)
Similarly, B can guarantee that he cannot lose more than

v_B = min_y max_x f(x,y)   (6)

The fundamental result of the theory of games is that v_A = v_B. This is the min-max theorem of von Neumann. What is quite remarkable is that it can be shown that these problems and the problems arising in the theory of linear inequalities, as described in Sec. 19, are mathematically equivalent. This equivalence turns out to be an analytic translation of the duality inherent in N-dimensional Euclidean geometry.

21. A Markovian Decision Process. Let us now discuss some problems that arise in the theory of dynamic programming. Consider a physical system S which at any of the times t = 0, 1, ..., must be in one of the states S_1, S_2, ..., S_N. Let

x_i(n) = the probability that S is in state S_i at time n, i = 1, 2, ..., N   (1)
For each q, which represents a vector variable, let

M(q) = (m_{ij}(q))   (2)

represent a Markoff matrix. Instead of supposing, as in an ordinary Markoff process, that the probability distribution at time n is determined by M(q)^n x(0), for some fixed q, we shall suppose that q can change from stage to stage. Specifically, we shall suppose that this process is being supervised by someone who wants to maximize the probability that the system is in state 1 at any particular time. In place of the usual equations, we obtain the relations

x_1(n + 1) = max_q \sum_{j=1}^{N} m_{1j}(q) x_j(n)
x_i(n + 1) = \sum_{j=1}^{N} m_{ij}(q^*) x_j(n),   i = 2, ..., N   (3)

where q^* is a value of q which maximizes the expression on the first line. We leave to the reader the quite interesting problem of determining simple conditions under which a "steady-state" or equilibrium solution exists. In Sec. 22, we shall discuss a more general system.

22. An Economic Model. Suppose that we have N different types of resources. Let

x_i(n) = the quantity of the ith resource at time n   (1)
Suppose that, as in the foregoing pages, the following linear relations exist:

x_i(n + 1) = \sum_{j=1}^{N} a_{ij}(q) x_j(n)   (2)

where q is a vector parameter, as before. If A(q) is a positive matrix, the Perron theorem determines the asymptotic behavior of the system. Assume, however, as in Sec. 21, that the process is supervised by someone who wishes to maximize each quantity of resource at each time. The new relations are then

x_i(n + 1) = max_q \sum_{j=1}^{N} a_{ij}(q) x_j(n),   i = 1, 2, ..., N   (3)

The following generalization of the Perron theorem can then be obtained.
Theorem 7. If

q runs over a set of values (q_1, q_2, ..., q_M) which allow the maximum in (3) to be obtained,   (4a)
0 < m_1 ≤ a_{ij}(q) ≤ m_2 < ∞,   (4b)
max_q λ(A(q)) exists and is attained for a q,   (4c)

then there exists a unique positive λ such that the homogeneous system

λ y_i = max_q \sum_{j=1}^{N} a_{ij}(q) y_j   (5)

has a positive solution y_i > 0. This solution is unique up to a multiplicative constant and

λ = max_q λ(A(q))   (6)

Furthermore, as n → ∞,

x_i(n) ~ a_1 λ^n y_i   (7)

where a_1 depends on the initial values.
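As a rough numerical illustration of the homogeneous system (5) (not from the text), one can iterate the componentwise maximization for a small, made-up family of positive matrices; by (6) the growth factor should agree with max over q of λ(A(q)).

```python
import numpy as np

# A made-up family {A(q)} of positive matrices (illustrative only).
family = [np.array([[1.0, 2.0], [1.5, 0.5]]),
          np.array([[0.8, 2.5], [2.0, 0.3]])]

y = np.ones(2)
for _ in range(200):
    # Componentwise maximum over q of (A(q) y)_i, the right-hand side of (5).
    z = np.max([A @ y for A in family], axis=0)
    lam = z.sum() / y.sum()     # estimate of the positive lambda in (5)
    y = z / z.sum()             # renormalize
print("lambda ~", lam, "  y ~", y)
print("max_q lambda(A(q)) =", max(max(abs(np.linalg.eigvals(A))) for A in family))
```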
The proof may be found in references cited at the end of the chapter.

MISCELLANEOUS EXERCISES

1. Let A = (a_{ij}) be a non-negative matrix. A necessary and sufficient condition that all characteristic roots of A be less than 1 in absolute value is that all the principal minors of I − A be positive (Metzler).
2. If A has all negative diagonal elements, and no negative off-diagonal elements, if D is a diagonal matrix, and if the real parts of the characteristic roots of both A and DA are negative, then the diagonal elements of D are positive (K. J. Arrow and A. C. Enthoven).
3. Consider a matrix Z(t) = (z_{ij}(t)), where the z_{ij} possess the following properties:

(a) z_{ij} > 0
(b) \int_0^∞ z_{ii} dt > 1 for some i
(c) \int_0^∞ z_{ij} e^{−at} dt < ∞ for some a > 0

Then there is a positive vector x and a positive number s_0 for which

(\int_0^∞ Z e^{−s_0 t} dt) x = x

Furthermore, s_0 is the root of |I − \int_0^∞ Z e^{−st} dt| = 0 with greatest real part, and it is a simple root (Bohnenblust); see R. Bellman.¹
4. Let A = (a_{ij}), and s_i = |a_{ii}| − \sum_{j≠i} |a_{ij}|. If s_i > 0, 1 ≤ i ≤ N, then, as we already know, A^{−1} = (b_{ij}) exists, and, furthermore, |b_{ij}| ≤ 1/s_i (Ky Fan).

¹ R. Bellman, A Survey of the Mathematical Theory of Time-lag, Retarded Control, and Hereditary Processes, with the assistance of J. M. Danskin, Jr., The RAND Corporation, Rept. R-256, March 1, 1954.
5. A real matrix A = (a_{ij}) is said to be maximum increasing if the relation

max_{1≤i≤N} x_i ≤ max_{1≤i≤N} \sum_{j=1}^{N} a_{ij} x_j

holds for any N real numbers x_i. Prove that A is maximum increasing if and only if A^{−1} = (b_{ij}) with
(a) b_{ij} ≥ 0
(b) \sum_{j=1}^{N} b_{ij} = 1, 1 ≤ i ≤ N   (Ky Fan)
6. Let the x_i be positive, and let f(x_1, x_2, ..., x_N), g(x_1, x_2, ..., x_N) denote, respectively, the least and greatest of the N + 1 quantities x_1, x_2 + 1/x_1, x_3 + 1/x_2, ..., x_N + 1/x_{N−1}, 1/x_N. Prove that

max_{x_i>0} f(x_1, x_2, ..., x_N) = min_{x_i>0} g(x_1, x_2, ..., x_N) = 2 cos(π/(N + 2))   (Ky Fan)

7. Let a > 0. For any arbitrary partition 0 = t_0 < t_1 < ... < t_N = 1 of (0,1) into N subintervals, the approximate Riemann sum

\sum_{i=1}^{N} (t_i − t_{i−1})/(a + t_i)

to \int_0^1 dt/(a + t) always contains a term ≥ 1 − (a/(a + 1))^{1/N}, as well as a term less than this quantity (Ky Fan).
8. Let A be a matrix of the form a_{ij} ≥ 0, i ≠ j, with a_{jj} + \sum_{i≠j} a_{ij} = 0, j = 1, 2, ..., N. If λ is a characteristic root, then either λ = 0 or Re(λ) < 0 (A. Brauer-O. Taussky).
9. If u_i ≥ 0, a_{ij} ≥ 0 for i ≠ j, and \sum_{j≠i} a_{ij} = a_{ii}, then the determinant whose diagonal elements are a_{ii} + u_i and whose off-diagonal elements are −a_{ij} is non-negative (Minkowski).
10. If a_{ij} ≤ 0 for i ≠ j, and a_{ii} ≥ \sum_{j≠i} |a_{ij}|, then all cofactors of A are non-negative (W. Ledermann).
11. A real matrix X for which x_{ii} > \sum_{j≠i} |x_{ij}| is called an Hadamard matrix. Let A and B both be Hadamard matrices. Then |A + B| ≥ |A| + |B| (E. V. Haynsworth).¹ See also A. Ostrowski.² Hadamard matrices play an important role in computational analysis. Matrices of this nature, and of a more general type, the dominant matrices, enter in the study of n-port networks. See
P. Slepian and L. Weinberg, Synthesis Applications of Paramount and Dominant Matrices, Hughes Research Laboratories, 1958.
R. S. Burington, On the Equivalence of Quadrics in m-affine n-space and Its Relation to the Equivalence of 2m-pole Networks, Trans. Am. Math. Soc., vol. 38, pp. 163-176, 1935.
R. S. Burington, R-matrices and Equivalent Networks, J. Math. and Phys., vol. 16, pp. 85-103, 1937.
12. If |a_{ij}| ≤ m|a_{ii}|, j < i, i = 1, 2, ..., N, and |a_{ij}| ≤ M|a_{ii}|, j > i, i = 1, 2, ..., N − 1, then A is nonsingular if m/(1 + m)^N < M/(1 + M)^N and m < M. If m = M, then m < 1/(N − 1) is sufficient (A. Ostrowski).³
13. An M matrix is defined to be a real matrix A such that a_{ij} ≤ 0, i ≠ j, possessing one of the following three equivalent properties:
(a) There exist N positive numbers x_j such that \sum_{j=1}^{N} a_{ij} x_j > 0, i = 1, 2, ..., N
(b) A is nonsingular and all elements of A^{−1} are non-negative
(c) All principal minors of A are positive.
M matrices were first introduced by Ostrowski.⁴ See also E. V. Haynsworth, Note on Bounds for Certain Determinants, Duke Math. J., vol. 24, pp. 313-320, 1957, and by the same author, Bounds for Determinants with Dominant Main Diagonal, Duke Math. J., vol. 20, pp. 199-209, 1953, where other references may be found.
14. Let A and B be two positive matrices of order N, with AB ≠ BA, and c a given positive N-dimensional vector. Consider the vector x_N = Z_N Z_{N−1} ... Z_2 Z_1 c, where each Z_i is either an A or a B. Suppose that the Z_i are to be chosen to maximize the inner product (x_N, b) where b is a fixed vector. Define the function f_N(c) = max_{\{Z_i\}} (x_N, b). Then

f_1(c) = max ((Ac,b), (Bc,b))
f_N(c) = max (f_{N−1}(Ac), f_{N−1}(Bc)),   N = 2, 3, ...

¹ E. V. Haynsworth, Bounds for Determinants with Dominant Main Diagonal, Duke Math. J., vol. 20, pp. 199-209, 1953.
² A. Ostrowski, Note on Bounds for Some Determinants, Duke Math. J., vol. 22, pp. 95-102, 1955; A. Ostrowski, Über die Determinanten mit überwiegender Haupt-Diagonale, Comment. Math. Helvetici, vol. 10, pp. 69-96, 1937-1938.
³ A. Ostrowski, On Nearly Triangular Matrices, J. Research Natl. Bur. Standards, vol. 52, pp. 319-345, 1954.
⁴ A. Ostrowski, Comment. Math. Helvetici, vol. 30, pp. 175-210, 1956; ibid., vol. 10, pp. 69-96, 1937.
15. Does there exist a scalar λ such that f_N(c) ~ λ^N g(c)?
16. Consider the corresponding problem for the case where we wish to maximize the Perron root of Z_N Z_{N−1} ... Z_2 Z_1.
17. Let A be a positive matrix. There exist two positive diagonal matrices, D_1 and D_2, such that D_1 A D_2 is doubly stochastic (Sinkhorn; see Menon¹). A small numerical sketch of this scaling appears after these exercises.
18. Let A be a non-negative matrix. When does there exist a positive diagonal matrix D such that DAD^{−1} is positive and symmetric? (The question arose in some work by Pimbley in neutron transport theory. See Parter,² Parter and Youngs,³ and Hearon.⁴)
19. Let A be a non-negative matrix, and let P(A) be the finite set of non-negative matrices obtained from A by permuting its elements arbitrarily. Let B ∈ P(A).
(a) What are the maxima and minima of tr(B²)?
(b) What are the maximum and minimum values of the Perron root of B?
(B. Schwarz, Rearrangements of Square Matrices with Nonnegative Elements, University of Wisconsin, MRC Report #282, 1961.)
20. Let A be a positive matrix. Prove that the Perron root satisfies the inequalities

max { max_i a_{ii}, (N − 1) min_{i≠j} a_{ij} + min_i a_{ii} } ≤ λ(A) ≤ (N − 1) max_{i≠j} a_{ij} + max_i a_{ii}   (Sunder)

21. Consider the differential equation x' = Q(t)x, 0 < t < t_0, where q_{ij}(t) ≥ 0, i ≠ j. If \int_s^{t_0} φ(t_1) dt_1 → ∞ as s → 0, where φ(t) = inf_{i≠j} {q_{ij} q_{ji}}^{1/2}, then the equation has a positive solution, unique up to a constant factor. (G. Birkhoff and L. Kotin, Linear Second-order Differential Equations of Positive Type, J. Analyse Math., vol. 18, pp. 43-52, 1967.)
22. Consider the matrix product C = A · B where c_{ij} = −a_{ij} b_{ij}, i ≠ j, c_{ii} = a_{ii} b_{ii}. Show that if A and B are M matrices (see Exercise 13), then C is an M matrix. For many further deeper results concerning this matrix product introduced by Ky Fan, see Ky Fan, Inequalities for M-matrices, Indag. Math., vol. 26, pp. 602-610, 1964. M matrices play an important role in the study of iterative methods for solving systems of linear equations, and in the numerical solution of elliptic partial differential equations using difference methods; see Ostrowski⁵ and Varga.⁶ The Fan product is analogous to the Schur product (c_{ij} = a_{ij} b_{ij}), previously introduced for positive definite matrices. In general, there is an amazing parallel between results for positive definite and M matrices.

¹ M. V. Menon, Reduction of a Matrix with Positive Elements to a Doubly Stochastic Matrix, Proc. Am. Math. Soc., vol. 18, pp. 244-247, 1967.
² S. V. Parter, On the Eigenvalues and Eigenvectors of a Class of Matrices, SIAM Journal, to appear.
³ S. V. Parter and J. W. T. Youngs, The Symmetrization by Diagonal Matrices, J. Math. Anal. Appl., vol. 4, pp. 102-110, 1962.
⁴ J. Z. Hearon, The Kinetics of Linear Systems with Special Reference to Periodic Reactions, Bull. Math. Biophys., vol. 15, pp. 121-141, 1953.
⁵ A. M. Ostrowski, Iterative Solution of Linear Systems of Functional Equations, J. Math. Anal. Appl., vol. 2, pp. 351-369, 1961.
⁶ R. S. Varga, Matrix Iterative Analysis, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1962.
This phenomenon, noted some time ago by O. Taussky (and also by Gantmacher-Krein, and by D. M. Kotelyanskii, Math. Rev., vol. 12, p. 793), is discussed and analyzed in the following papers:
Ky Fan, An Inequality for Subadditive Functions on a Distributive Lattice, with Applications to Determinantal Inequalities, Lin. Algebra Appl., vol. 1, pp. 33-38, 1966.
Ky Fan, Subadditive Functions on a Distributive Lattice, and an Extension of Szasz's Inequality, J. Math. Anal. Appl., vol. 18, pp. 262-268, 1967.
Ky Fan, Some Matrix Inequalities, Abh. Math. Sem. Hamburg Univ., vol. 29, pp. 185-196, 1966.
23. If A and B are M matrices such that B − A ≥ 0, then |A + B|^{1/n} ≥ |A|^{1/n} + |B|^{1/n}, and (tA + (1 − t)B)^{−1} ≤ tA^{−1} + (1 − t)B^{−1} for any t in (0,1). Many further results are given in Ky Fan.¹
24. Call a non-negative matrix A irreducible if A^N > 0 for some N. Show that every irreducible A has a positive characteristic root λ(A) which is at least as large as the absolute value of any other characteristic root. This characteristic root is simple and the characteristic vector can be taken to be positive (Frobenius). See Blackwell² for a proof using the min-max theorem of game theory.
25. A matrix A is said to be of monotone kind if Ax ≥ 0 implies that x ≥ 0. Show that A^{−1} exists if A is of monotone kind, and A^{−1} ≥ 0. See Collatz,³ and for an extension to rectangular matrices see Mangasarian.⁴
26. Let A = (a_{ij}) be a Markov matrix, and let A^{−1} = (b_{ij}). Under what conditions do we have b_{ii} > 1, b_{ij} ≤ 0? See Y. Uekawa, P-Matrices and Three Forms of the Stolper-Samuelson Criterion, Institute of Social and Economic Research, Osaka University, 1968.
27. Let X ≥ X² ≥ 0. Then the real characteristic values of X must lie between 1 and (1 − √2)/2 (R. E. DeMarr).
28. Let B be a real square matrix such that b_{ij} ≤ 0 for i ≠ j and the leading minors of B are positive. Then B^{−1} is a non-negative matrix. Hint: Use the representation \int_0^∞ e^{−Bt} dt = B^{−1}. See J. Z. Hearon, The Washout Curve in Tracer Kinetics, Math. Biosci., vol. 3, pp. 31-39, 1968.

¹ Ky Fan, Inequalities for the Sum of Two M-matrices, Inequalities, Academic Press Inc., New York, pp. 105-117, 1967.
² D. Blackwell, Minimax and Irreducible Matrices, J. Math. Anal. Appl., vol. 3, pp. 37-39, 1961.
³ L. Collatz, Functional Analysis and Numerical Mathematics, Academic Press, Inc., New York, 1966.
⁴ O. L. Mangasarian, Characterization of Real Matrices of Monotone Kind, SIAM Review, vol. 10, pp. 439-441, 1968.
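As promised in Exercise 17, here is a minimal numerical sketch of the Sinkhorn scaling; the matrix is an arbitrary positive example and the routine is only illustrative.

```python
import numpy as np

def sinkhorn(A, iters=1000):
    """Alternately rescale rows and columns of a positive matrix A so that
    D1 A D2 becomes doubly stochastic (a sketch of the scaling in Exercise 17)."""
    A = np.asarray(A, dtype=float)
    r = np.ones(A.shape[0])
    c = np.ones(A.shape[1])
    for _ in range(iters):
        r = 1.0 / (A @ c)          # make the row sums of diag(r) A diag(c) equal to 1
        c = 1.0 / (A.T @ r)        # make the column sums equal to 1
    return np.diag(r), np.diag(c)

A = np.array([[1.0, 2.0, 1.0],
              [2.0, 1.0, 3.0],
              [1.0, 1.0, 1.0]])
D1, D2 = sinkhorn(A)
S = D1 @ A @ D2
print(S.sum(axis=0))   # column sums, approximately 1
print(S.sum(axis=1))   # row sums, approximately 1
```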
Bibliography and Discussion

§1. A very large number of interesting and significant analytic problems arise naturally from the consideration of various mathematical models of economic interaction. For an interesting survey of mathematical questions in this field, see
I. N. Herstein, Some Mathematical Methods and Techniques in Economics, Quart. Appl. Math., vol. 11, pp. 249-261, 1953.
Two books that the reader interested in these matters may wish to examine are
T. Koopmans, Activity Analysis of Production and Allocation, John Wiley & Sons, Inc., New York, 1951.
O. Morgenstern (ed.), Economic Activity Analysis, John Wiley & Sons, Inc., New York, 1954.
For an account of the application of the theory of positive matrices to the study of the dynamic stability of a multiple market system, see
K. J. Arrow and M. Nerlove, A Note on Expectations and Stability, TR No. 41, Department of Economics, Stanford University, 1957.
K. J. Arrow and A. Enthoven, A Theorem on Expectations and the Stability of Expectations, Econometrica, vol. 24, pp. 288-293, 1956.
where further references to these questions may be found. To add to the confusion of terminology, let us note that in addition to the positive definite matrices of the first part of the book, and the positive matrices of Perron, we also have the important positive real matrices mentioned on page 111.
§2. For a discussion of mathematical models of growth processes of this type, see the monograph by Harris,
T. E. Harris, The Mathematical Theory of Branching Processes, Ergeb. Math., 1960,
where many further references will be found.
§3. The concept of a positive operator is a basic one in analysis. For an extensive discussion of the theory and application of these operators, see
M. G. Krein and M. A. Rutman, Lineinye operatory ostavliaiushie invariantnym konus v prostranstve Banakha (Linear Operators Leaving Invariant a Cone in a Banach Space), Uspekhi Matem. Nauk (n.s.), vol. 3, no. 1 (23), pp. 3-95, 1948. See Math. Revs., vol. 10, pp. 256-257, 1949.
§4. Perron's result is contained in
O. Perron, Zur Theorie der Matrizen, Math. Ann., vol. 64, pp. 248-263, 1907.
The method of proof is quite different from that given here and extremely interesting in itself. We have emphasized the method in the text since
it is one that generalizes with ease to handle infinite dimensional operators. This method was developed by H. Bohnenblust in response to a problem encountered by R. Bellman and T. E. Harris in the study of multidimensional branching processes. See the above cited monograph by T. E. Harris. Perron's result was considerably extended by Frobenius,
G. Frobenius, Über Matrizen aus nicht-negativen Elementen, Sitzsber. der kgl. preuss. Akad. Wiss., pp. 456-477, 1912,
who studied the much more involved case where only the condition a_{ij} ≥ 0 is imposed. For many years, the pioneer work of Perron was forgotten and the theorem was wrongly attributed to Frobenius. For an approach to this theorem by way of the theory of linear differential equations, and some extensions, see
P. Hartman and A. Wintner, Linear Differential Equations and Difference Equations with Monotone Solutions, Am. J. Math., vol. 75, pp. 731-743, 1953.
For a discussion of more recent results, see
G. Birkhoff and R. S. Varga, Reactor Criticality and Nonnegative Matrices, J. Ind. and Appl. Math., vol. 6, pp. 354-377, 1958. As in the case of Markoff matrices, the problem of studying the behavior of the characteristic roots under merely a non-negativity assumption requires a very careful enumeration of cases. The conditions that there exist a unique characteristic root of largest absolute value are now most readily expressible in probabilistic, economic, or topological terms. For a thorough discussion of these matters with detailed references to other work, see Y. K. Wong, Some Mathematical Concepts for Linear Economic Models, Economic Activity Analysis, O. Morgenstern (ed.), John Wiley & Sons, Inc., New York, 1954. Y. K. Wong, An Elementary Treatment of an Input-output System, Naval Research Logistics Quarterly, vol. I, pp. 321-326, 1954. M. A. Woodbury, Properties of Leontieff-type Input-output Matrices, Economic Activity Analysis, O. Morgenstern (ed.), John Wiley & Sons, Inc., New York, 1954. M. A. Woodbury, Characteristic Roots of Input-output Matrices, Economic Activity Analysis, O. Morgenstern (ed.), John Wiley & Sons, Inc., New York, 1954.
S. B. Noble, Measures of the Structure of Some Static Linear Economic Models, George Washington University, 1958.
The first proof of the Perron theorem using fixed point theorems is contained in P. Alexandroff and H. Hopf, Topologie I, Berlin, 1935. For a detailed discussion of this technique, with many extensions and applications, see Ky Fan, Topological Proofs for Certain Theorems on Matrices with Non-negative Elements, Monats. Math., Bd. 62, pp. 219-237, 1958.
§6. For other proofs and discussion, see A. Brauer, A New Proof of Theorems of Perron and Frobenius on Non-negative Matrices-I: Positive Matrices, Duke Math. J., vol. 24, pp. 367-378, 1957. G. Debreu and I. N. Herstein, Non-negative Square Matrices, Econometrica, vol. 21, pp. 597-607, 1953. H. Samelson, On the Perron-Frobenius Theorem, Michigan Math. J., vol. 4, pp. 57-59, 1957. J. L. Ullman, On a Theorem of Frobenius, Michigan Math. J., vol. I, pp. 189-193, 1952. A powerful method for studying problems in this area is the "projective metric" of Birkhoff. See G. Birkhoff, Extension of Jentzsch's Theorem, Trans. Am. Math. Soc., vol. 85, pp. 219-227, 1957. R. Bellman and T. A. Brown, Projective Metrics in Dynamic Programming, Bull. Am. Math. Soc., vol. 71, pp. 773-775, 1965. An extensive generalization of the Perron-Frobenius theory is the theory of monotone processes. See
R. T. Rockafellar, Monotone Processes of Convex and Concave Type, Mem. Am. Math. Soc., no. 77, 1967. where many further references may be found. Some papers containing applications of the theory of positive matrices are S. Kaplan and B. Miloud, On Eigenvalue and Identification Problems in the Theory of Drug Distribution, University of Southern California Press, Los Angeles, USCEE-31O, 1968.
Positive Matrices and Mathematical Economics
311
J. Z. Hearon, Theorems on Linear Systems, Ann. N.Y. Acad. Sci., vol. 108, pp. 36-68, 1963.
P. Mandl and E. Seneta, The Theory of Nonnegative Matrices in a Dynamic Programming Problem, to appear.
P. Novosad, Isoperimetric Eigenvalue Problems in Algebras, Comm. Pure Appl. Math., vol. 21, pp. 401-466, 1968.
Another generalization of the concept of a positive matrix is that of totally positive matrices. See the book
S. Karlin, Total Positivity, Stanford University Press, 1968,
for an extensive account and an important bibliography, and
H. S. Price, Monotone and Oscillation Matrices Applied to Finite Difference Approximations, Math. Comp., vol. 22, pp. 489-517, 1968. See also P. G. Ciarlet, Some Results in the Theory of Nonnegative Matrices, Lin. Algebra Appl., vol. I, pp. 139-152, 1968.
R. B. Kellogg, Matrices Similar to a Positive Matrix, Lin. Algebra Appl., to appear. For the use of graph-theoretic methods, see B. R. Heap and M. S. Lynn, The Structure of Powers of Nonnegative Matrices, 1. The Index of Convergence, SIAM J. Appl. Math., vol. 14, pp. 610-639, 1966; II, ibid., pp. 762-777. R. B. Marimont, System Connectivity and Matrix Properties, Bull. Math. Biophysics, vol. 31, pp. 255-273, 1969. A. Rescigno and G. Segre, On Some Metric Properties of the Systems of Compartments, Bull. Math. Bt'ophysics, vol. 27, pp. 315-323, 1965.
J. Z. Hearon, Theorems on Linear Systems, Ann. N.Y. Acad. Sci., vol. 108, pp. 36-68, 1963.
J. Maybee and J. Quirk, Qualitative Problems in Matrix Theory, SIAM Review, vol. 11, pp. 30-51, 1969.
W. R. Spillane and N. Hickerson, Optimal Elimination for Sparse Symmetric Systems as a Graph Problem, Q. Appl. Math., vol. 26, pp. 425-432, 1968.
For further references, see
C. R. Putnam, Commutators on a Hilbert Space: On Bounded Matrices with Non-negative Elements, Purdue University, PRF-1421, May, 1958,
where it is indicated how Perron's theorem is a consequence of the Vivanti-Pringsheim theorem for power series. This result has been independently discovered by Wintner, Bohnenblust-Karlin, Mullikin-Snow, and Bellman. Various iterative schemes have been given for obtaining the largest characteristic root. See the above reference to A. Brauer, and also
A. Brauer, A Method for the Computation of the Greatest Root of a Positive Matrix, University of North Carolina, December, 1957.
R. Bellman, An Iterative Procedure for Obtaining the Perron Root of a Positive Matrix, Proc. Am. Math. Soc., vol. 6, pp. 719-724, 1955.
§7. This technique is used in
R. Bellman and J. M. Danskin, A Survey of the Mathematical Theory of Time-lag, Retarded Control, and Hereditary Processes, The RAND Corporation, Rept. R-256, March 1, 1954,
to handle the corresponding question posed in Exercise 3 of the Miscellaneous Exercises.
§9. This variational characterization of λ(A) appears first to have been implied in
L. Collatz, Einschliessungssatz für die charakteristischen Zahlen von Matrizen, Math. Z., vol. 48, pp. 221-226, 1946.
The precise result was first given in
H. Wielandt, Unzerlegbare, nicht negative Matrizen, Math. Z., vol. 52, pp. 642-648, 1950.
It has been rediscovered several times, notably by Bohnenblust as mentioned above, and by von Neumann.
§14. See the above cited monograph by T. E. Harris, and the papers by
R. Bellman, R. E. Kalaba, and G. M. Wing, On the Principle of Invariant Imbedding and One-dimensional Neutron Multiplication, Proc. Nat. Acad. Sci., vol. 43, pp. 517-520, 1957.
R. Bellman, R. E. Kalaba, and G. M. Wing, On the Principle of Invariant Imbedding and Neutron Transport Theory: I, One-dimensional Case, J. Math. Mech., vol. 7, pp. 149-162, 1958.
§15. This model is discussed in detail by the methods of dynamic programming in
R. Bellman, Dynamic Programming, Princeton University Press, Princeton, N.J., 1957.
§16. See the papers by Wong and Woodbury referred to above in §4.
§19. See the paper by T. Koopmans referred to above and the various studies in the Annals of Mathematics series devoted to linear programming. A definitive work is the book
G. B. Dantzig, Linear Programming and Its Extensions, Princeton University Press, Princeton, N.J., 1963.
An excellent exposition of the theory of linear inequalities is contained in
R. A. Good, Systems of Linear Relations, SIAM Review, vol. 1, pp. 1-31, 1959,
where many further references may be found. See also
M. H. Pearl, A Matrix Proof of Farkas' Theorem, Q. Jour. Math., vol. 18, pp. 193-197, 1967.
R. Bellman and K. Fan, On Systems of Linear Inequalities in Hermitian Matrix Variables, Convexity, Proc. Symp. Pure Math., vol. 7, pp. 1-11, 1963.
§20. The classic work in the theory of games is
J. von Neumann and O. Morgenstern, The Theory of Games and Economic Behavior, Princeton University Press, Princeton, N.J., 1948.
For a very entertaining introduction to the subject, see
J. D. Williams, The Compleat Strategyst, McGraw-Hill Book Company, Inc., New York, 1955.
For a proof of the min-max theorem via an induction over dimension, see
I. Kaplansky, A Contribution to the Von Neumann Theory of Games, Ann. Math., 1945.
The proof that the min-max of \sum_{i,j} a_{ij} x_i y_j over \sum_i x_i = 1, \sum_j y_j = 1, x_i, y_j ≥ 0 is equal to the max-min is complicated by the fact that the extrema with respect to the x_i and y_j separately are at vertices. One can force the extrema to lie inside the region by adding terms such as ε log(1/x_i(1 − x_i)) − ε log(1/y_j(1 − y_j)), ε > 0. It is not difficult to establish that min-max = max-min for all ε > 0 and to justify that equality holds as ε → 0. This unpublished proof by M. Shiffman (1948) is closely related to the techniques of "regularization" discussed in the books of Lattes-Lions and Lavrentiev referred to in Chap. 19.
§21. See Chap. 11 of the book by R. Bellman referred to above, and
R. Bellman, A Markovian Decision Process, J. Math. Mech., vol. 6, pp. 679-684, 1957.
P. Mandl and E. Seneta, The Theory of Non-negative Matrices in a Dynamic Programming Problem, to appear.
§22. For a different motivation, see
R. Bellman, On a Quasi-linear Equation, Canad. J. Math., vol. 8, pp. 198-202, 1956.
An interesting extension of positivity, with surprising ramifications, is that of positive regions associated with a given matrix A. We say that R is a positive region associated with A if x, y ∈ R implies that (x, Ay) ≥ 0. See
M. Koecher, Die Geodätischen von Positivitätsbereichen, Math. Ann., vol. 135, pp. 192-202, 1958.
M. Koecher, Positivitätsbereiche im R^n, Am. J. Math., vol. 79, pp. 575-596, 1957.
S. Bochner, Bessel Functions and Modular Relations of Higher Type and Hyperbolic Differential Equations, Comm. Sem. Math. Univ. Lund, Suppl., pp. 12-20, 1952.
S. Bochner, Gamma Factors in Functional Equations, Proc. Natl. Acad. Sci. U.S., vol. 42, pp. 86-89, 1956.
Another interesting and significant extension of the concept of a positive transformation is that of a variation-diminishing transformation, a concept which arises naturally in a surprising number of regions of analysis. For various aspects, see
I. J. Schoenberg, On Smoothing Functions and Their Generating Functions, Bull. Am. Math. Soc., vol. 59, pp. 199-230, 1953.
V. Gantmacher and M. Krein, Sur les matrices complètement non négatives et oscillatoires, Comp. math., vol. 4, pp. 445-476, 1937.
G. Polya and I. J. Schoenberg, Remarks on de la Vallée Poussin Means and Convex Conformal Maps of the Circle, Pacific J. Math., vol. 8, pp. 201-212, 1958.
S. Karlin, Polya-type Distributions, IV, Ann. Math. Stat., vol. 29, pp. 1-21, 1958 (where references to the previous papers of the series may be found).
S. Karlin and J. L. McGregor, The Differential Equations of Birth-and-death Processes and the Stieltjes Moment Problem, Trans. Am. Math. Soc., vol. 85, pp. 489-546, 1957.
See also
P. Slepian and L. Weinberg, Synthesis Applications of Paramount and Dominant Matrices, Hughes Research Laboratories, Culver City, California.
Finally, let us mention a paper devoted to the study of the domain of the characteristic roots of non-negative matrices.
F. I. Karpelevic, Über die charakteristischen Wurzeln von Matrizen mit nicht-negativen Elementen, Izvestia Akad. Nauk SSSR, ser. mat. 15, pp. 361-383, 1951. (Russian.)
17
Control Processes

1. Introduction. One of the major mathematical and scientific developments of the last twenty years has been the creation of a number of sophisticated analytic theories devoted to the feasible operation and control of systems. As might be expected, a number of results concerning matrices play basic roles in these theories. In this chapter we wish to indicate both the origin and nature of some of these auxiliary results and describe some of the methods that are used to obtain them. Our objective is to provide an interface which will stimulate the reader to consult texts and original sources for many further developments in a field of continuing interest and analytic difficulty.
Although we will briefly invade the province of the calculus of variations in connection with the minimization of quadratic functionals, we will not invoke any of the consequences of this venerable theory. The exposition here will be simple and self-contained, with the emphasis upon matrix aspects. The theory of dynamic programming will be used in connection with deterministic control processes of both discrete and continuous type, and very briefly in a discussion of stochastic control processes of discrete type.

2. Maintenance of Unstable Equilibrium. Consider a system S described by the differential equation

dx/dt = Ax,   x(0) = 0   (1)

where A is not a stability matrix. The solution is clearly x = 0 for t ≥ 0. This solution, however, is unstable as far as disturbances of the initial state are concerned. If y is determined by

dy/dt = Ay,   y(0) = c   (2)

and if A possesses some characteristic roots with positive real parts, then unless the vector c is of particular form, y will be unbounded as t → ∞. This can lead to undesirable behavior of the system S, and
indeed the linear model in (2) becomes meaningless when ||y|| becomes too large. One way of avoiding this situation is to redesign the system so as to produce a stability matrix. This approach motivates the great and continuing interest in obtaining usable necessary and sufficient conditions that A be a stability matrix. For many purposes, sufficient conditions are all that are required. This explains the importance of the second method of Lyapunov.
Unfortunately, redesign of a system, such as one of biological, engineering, or social type, is not always feasible. Instead, additional influences are exerted on the system to counteract the influences which produce instability. Many interesting and difficult questions of analysis arise in this fashion. This is the principal theme of modern control theory.
One question is the following. Choose a control vector y, a "forcing term," so that the solution of

dx/dt = Ax + y,   x(0) = c   (3)

follows a prescribed course over time. For example, we may wish to keep ||x|| small. What rules out a trivial solution, for example y = −Ax (a simple "feedback law"), is the fact that we have not taken into account the cost of control. Control requires effort, time, and resources.
Let us restrict ourselves in an investigation of control processes to a simple case where there are two costs involved, one the cost of deviation of the system from equilibrium and the other the cost involved in a choice of y. Let us measure the first cost by the functional

J_1(x) = \int_0^T (x, Bx) dt,   B > 0   (4)

and the second cost by the functional

J_2(y) = \int_0^T (y, y) dt   (5)

We further assume that these costs are commensurable so that we can form a total cost by addition of the individual costs,

J(x,y) = \int_0^T [(x, Bx) + (y, y)] dt   (6)

The problem we pose ourselves is that of choosing the control function y so as to minimize the functional J(x,y), where x and y are related by (2.3). The variable x is called the state variable, y is called the control variable, and J(x,y) is called the criterion function.

3. Discussion. This problem may be regarded as lying in the domain of the calculus of variations, which suggests a number of immediate
questions:
(a) Over what class of functions do we search for a minimum?
(b) Does a minimizing function exist in this class?
(c) If so, is it unique?
(d) Can we develop an analytic technique for the determination of a minimizing function? How many different analytic techniques can we develop?   (1)
(e) Can we develop feasible computational techniques which will provide numerical answers to numerical questions? How many such methods can we discover?
(f) Why is there a need for a variety of analytic and computational approaches?
As mentioned previously, we will discuss these questions without the formidable apparatus of the general theory of the calculus of variations. Furthermore, we will refer the reader to other sources for detailed discussions of these questions. Let us introduce the term "control process" to mean a combination of, first, an equation determining the time history of a system involving state and control variables, and second, a criterion function which evaluates any particular history. In more general situations it is necessary to specify constraints of various types, on the behavior of the system, on the control permissible, on the information available, and so on. Here we will consider only a simple situation where these constraints are absent.
4. The Euler Equation for A = 0. Let us consider first the case A = 0 where the analysis is particularly simple. Since x' = y, the quadratic functional (the criterion function) takes the form

    J(x) = ∫₀ᵀ [(x',x') + (x,Bx)] dt        (1)
As candidates for a minimizing function we take functions x whose derivatives x' are in L²(0,T) - we write x' ∈ L²(0,T) - and which satisfy the initial condition x(0) = c. This is certainly a minimum requirement for an admissible function, and it turns out, fortunately, to be sufficient in the sense that there exists a function in this class yielding the absolute minimum of J(x). Let x be a function in this class minimizing J(x), assumed to exist for the moment. Let z be a function with the property that z(0) = 0, z' ∈ L²(0,T), and consider the second admissible function x + εz, where ε is a scalar parameter. Then
    J(x + εz) = ∫₀ᵀ [(x' + εz', x' + εz') + (x + εz, B(x + εz))] dt
              = J(x) + ε²J(z) + 2ε ∫₀ᵀ [(x',z') + (Bx,z)] dt        (2)
Since ε can assume positive and negative values, for J(x) to be a minimum value we must have the variational condition

    ∫₀ᵀ [(x',z') + (Bx,z)] dt = 0        (3)
We want to use the fact that this holds for all z of the type specified above to derive a condition on x itself. This condition will be the Euler equation. We permit ourselves the luxury of proceeding formally from this point on, since it is the equation which is important for our purposes, not a rigorous derivation. If we knew that x' possessed a derivative, we would obtain from (3), via integration by parts,

    (x',z)]₀ᵀ + ∫₀ᵀ [-(x'',z) + (Bx,z)] dt = 0        (4)
Since z(0) = 0, the integrated term reduces to (x'(T),z(T)). Since (4) holds for all z as described above, we suspect that x must satisfy the differential equation

    x'' - Bx = 0        (5)
subject to the original initial condition x(0) = c and a new boundary condition x'(T) = 0. This, the Euler equation for the minimizing function, specifies x as a solution of a two-point boundary-value problem.

EXERCISES

1. What is wrong with the following argument: If ∫₀ᵀ (x'' - Bx, z) dt = 0 for all z such that z' ∈ L²(0,T), take z = x'' - Bx and thus conclude that x'' - Bx = 0 (Boor).
2. Integrate by parts in (3) to obtain, in the case where B is constant,

    ∫₀ᵀ (x' + B ∫ₜᵀ x dt₁, z') dt = 0,

and take z' = x' + B ∫ₜᵀ x dt₁, to conclude that x' + B ∫ₜᵀ x dt₁ = 0. Why is this procedure rigorous, as opposed to that proposed in the preceding exercise?
3. Carry through the proof for B variable.
5. Discussion. We shall now pursue the following route:

(a) Demonstrate that the Euler equation possesses a unique solution.        (1)
(b) Demonstrate that the function thus obtained yields the absolute minimum of J(x).
If we can carry out this program, we are prepared to overlook the dubious origin of the Euler equation. The procedure is easily carried out as a consequence of the positive character of the "cost function" J(x).
6. Two-point Boundary-value Problem. Let X₁, X₂ be the principal matrix solutions of

    X'' - BX = 0        (1)

which is to say

    X₁(0) = I,    X₁'(0) = 0
    X₂(0) = 0,    X₂'(0) = I        (2)

Then every solution of

    x'' - Bx = 0,    x(0) = c        (3)

has the representation

    x = X₁(t)c + X₂(t)d        (4)
where d is an arbitrary vector. To satisfy the terminal condition x'(T) = 0, we set

    x'(T) = 0 = X₁'(T)c + X₂'(T)d        (5)

Hence, the vector d is uniquely determined if X₂'(T) is nonsingular.
7. Nonsingularity of X₂'(T). Let us now show that the nonsingularity of X₂'(T) follows readily from the positive definite character of J(x). By this term "positive definite" we mean an obvious extension of the concept for the finite-dimensional case, namely that J(x) > 0 for all nontrivial x. This is a consequence of the corresponding property for (x,Bx) under the assumption B > 0. As we shall see, we actually need J(x) > 0 only for nontrivial x satisfying x(0) = x'(T) = 0.
Suppose that X₂'(T) is singular. Then there exists a nontrivial vector b such that

    X₂'(T)b = 0        (1)

Consider the function

    y = X₂(t)b        (2)

It is a solution of

    y'' - By = 0        (3)

satisfying the conditions y(0) = 0, y'(T) = 0. Hence, from (3),

    ∫₀ᵀ (y, y'' - By) dt = 0        (4)

Integrating by parts, we have

    0 = (y,y')]₀ᵀ - ∫₀ᵀ [(y',y') + (y,By)] dt = -J(y)        (5)

This is a contradiction since y is nontrivial. Hence, X₂'(T) is nonsingular.
EXERCISES

1. Extend the foregoing argument to cover the case where

    J(x) = ∫₀ᵀ [(x',Q(t)x') + (x,B(t)x)] dt

with Q(t), B(t) ≥ 0, 0 ≤ t ≤ T.
2. Extend the foregoing argument to cover the case where there is a terminal condition x(T) = d.
3. Consider the terminal control problem of minimizing

    J(x,λ) = ∫₀ᵀ [(x',x') + (x,Bx)] dt + λ(x(T) - d, x(T) - d)

λ ≥ 0, where x is subject solely to x(0) = c. Discuss the limiting behavior of x = x(t,λ) as λ → ∞. (λ is called a Courant parameter.)
4. Consider the problem of determining the minimum over c and x of

    J(x,c) = ∫₀ᵀ [(x',x') + (x,Bx)] dt + λ(c - b, c - b)

where x(0) = c.
5. Consider the problem of determining the minimum over x and y of

    J(x,y) = ∫₀ᵀ [(y,y) + (x,Bx)] dt + λ ∫₀ᵀ (x' - y, x' - y) dt

where λ ≥ 0 and x(0) = c. What happens as λ → ∞?
8. Analytic Form of Solution. Now that we know that (6.5) has a unique solution

    d = -X₂'(T)⁻¹X₁'(T)c        (1)

let us examine the analytic structure of the minimizing function and the corresponding minimum value of J(x). We have

    x = X₁(t)c - X₂(t)X₂'(T)⁻¹X₁'(T)c        (2)

whence the missing value at t = 0 is

    x'(0) = X₁'(0)c - X₂'(0)X₂'(T)⁻¹X₁'(T)c = -X₂'(T)⁻¹X₁'(T)c        (3)

Let us set

    R(T) = X₂'(T)⁻¹X₁'(T)        (4)

for T ≥ 0. As we shall see, this matrix plays a fundamental role in control theory. Since X₁' and X₂' commute, we see that R is symmetric. We can easily calculate the minimum value of J(x) in terms of R. We have, using the Euler equation,

    0 = ∫₀ᵀ (x, x'' - Bx) dt = (x,x')]₀ᵀ - ∫₀ᵀ [(x',x') + (x,Bx)] dt        (5)
Hence, since x'(T) = 0,

    J(x) = -(x(0),x'(0)) = (c,R(T)c)        (6)
Thus the minimum value is a positive definite quadratic form in c whose matrix is R(T).
9. Alternate Forms and Asymptotic Behavior. Let B^½ = D denote the positive definite square root of B, where B is assumed to be positive definite. Then we have

    X₁(t) = (e^{Dt} + e^{-Dt})/2 = cosh Dt
    X₂(t) = D⁻¹(e^{Dt} - e^{-Dt})/2 = D⁻¹ sinh Dt        (1)

Hence,

    X₁'(T) = D sinh DT,    X₂'(T) = cosh DT,    R(T) = D tanh DT        (2)
If we had wished, we could have used this specific representation to show that X₂'(T) is nonsingular for T ≥ 0. A disadvantage, however, of this direct approach is that it fails for variable B, and it cannot be readily used to treat the system mentioned in the exercises following Sec. 18. As T → ∞, we see from (1) that tanh DT approaches I, the identity matrix. Hence, we conclude that

    R(T) ~ D        (3)

as T → ∞. Returning to the equation

    x'' - Bx = 0,    x(0) = c,    x'(T) = 0        (4)

we see that we can write the solution in the form

    x = (cosh DT)⁻¹[cosh D(t - T)]c        (5)
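The closed forms in (1) and (2) of this section are easy to check numerically. In the following Python sketch the matrix B, the vector c, and the horizon T are arbitrary illustrative choices, not data from the text; it evaluates X₁'(T), X₂'(T), and R(T) = D tanh DT, and also shows R(T) approaching D = B^½ as T grows, anticipating the representation of Sec. 10.

```python
import numpy as np
from scipy.linalg import coshm, sinhm, tanhm

B = np.array([[2.0, 0.5],
              [0.5, 1.0]])          # illustrative positive definite B
c = np.array([1.0, -1.0])           # illustrative initial state
T = 3.0

w, V = np.linalg.eigh(B)
D = (V * np.sqrt(w)) @ V.T           # D = B^(1/2), positive definite

X1p = D @ sinhm(D * T)               # X1'(T) = D sinh DT
X2p = coshm(D * T)                   # X2'(T) = cosh DT
R = np.linalg.solve(X2p, X1p)        # R(T) = X2'(T)^(-1) X1'(T)

print("||R(T) - D tanh DT|| =", np.linalg.norm(R - D @ tanhm(D * T)))  # ~ 0
print("min J = (c, R(T)c)   =", c @ R @ c)                             # eq. (8.6)
print("||R(T) - B^(1/2)||   =", np.linalg.norm(R - D))                 # small for large T
```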
10. Representation of the Square Root of B. From the foregoing we obtain an interesting representation of the square root of a positive definite matrix B, namely

    (c,B^½c) = lim_{T→∞} min_x { ∫₀ᵀ [(x',x') + (x,Bx)] dt }        (1)

where x(0) = c.

EXERCISES

1. Using the foregoing representation, show that B₁ > B₂ > 0 implies that B₁^½ > B₂^½.
2. Show that B^½ is a concave function of B, that is, ((B₁ + B₂)/2)^½ ≥ (B₁^½ + B₂^½)/2.
3. Obtain the asymptotic form of x(t) and x'(t) as T → ∞.
4. Asymptotically as T → ∞, what is the relation between x'(t) and x(t)?
11. Riccati Differential Equation for R. Using the expression for R(T) given in (9.2), we see that R(T) satisfies the matrix Riccati equation

    R'(T) = B - R(T)²,    R(0) = 0        (1)

This means that we can solve the linear two-point boundary-value problem in terms of an initial-value problem. Given a specific value of T, say T₀, we integrate (1) numerically to determine R(T₀). Having obtained R(T₀), we solve the initial-value problem

    x'' - Bx = 0,    x(0) = c,    x'(0) = -R(T₀)c        (2)

again computationally, to determine x(t) for 0 ≤ t ≤ T₀. An advantage of this approach is that we avoid the numerical inversion of the N × N matrix X₂'(T₀). In return, we must solve a system of N² nonlinear differential equations subject to initial conditions. As an illustration, see the sketch following the exercise below.

EXERCISE

1. Why can one anticipate difficulty in the numerical inversion of X₂'(T₀) if T₀ is large or if B possesses characteristic roots differing greatly in magnitude?
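A minimal numerical sketch of the procedure just described, with an illustrative positive definite B, vector c, and horizon T₀ (none taken from the text): integrate the Riccati equation (1), then the initial-value problem (2), and check that the terminal condition x'(T₀) ≈ 0 is recovered. For large T₀ the check degrades, in line with the instability discussed in the next section.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative data (not from the text).
B = np.array([[2.0, 0.5],
              [0.5, 1.0]])
c = np.array([1.0, -1.0])
N, T0 = len(c), 3.0

# Step 1: integrate the matrix Riccati equation R' = B - R^2, R(0) = 0  (eq. 11.1).
def riccati(_, r):
    R = r.reshape(N, N)
    return (B - R @ R).ravel()

R_T0 = solve_ivp(riccati, (0.0, T0), np.zeros(N * N), rtol=1e-10).y[:, -1].reshape(N, N)

# Step 2: solve the initial-value problem x'' = Bx, x(0) = c, x'(0) = -R(T0)c  (eq. 11.2).
def euler_eq(_, z):
    x, xp = z[:N], z[N:]
    return np.concatenate([xp, B @ x])

sol = solve_ivp(euler_eq, (0.0, T0), np.concatenate([c, -R_T0 @ c]), rtol=1e-10)
print("x'(T0) =", sol.y[N:, -1])   # should be near zero: the terminal condition is met
```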
12. Instability of Euler Equation. A drawback to the use of (11.2) for computational purposes is the fact that the solution of (11.2) is unstable as far as numerical errors incurred in the determination of R(T₀)c and numerical integration are concerned. To see what we mean by this, it is sufficient to consider the scalar version,

    u'' - b²u = 0,    u(0) = c,    u'(0) = -b(tanh bT₀)c        (1)

with the solution

    u = [cosh b(t - T₀)]c / cosh bT₀        (2)

It is easy to see that for fixed t, as T₀ → ∞, we have

    u ~ ce^{-bt}        (3)
Consider the function

    v = u + εe^{bt}        (4)

satisfying the same differential equation as u, with the initial conditions

    v(0) = c + ε,    v'(0) = u'(0) + bε        (5)
If the total time interval is small, the term εe^{bt} will cause no trouble. If, however, bT ≫ 1, routine numerical integration of (11.2) will produce seriously erroneous results. The reason for this is that numerical integration using a digital computer involves the use of an approximate difference equation, errors in evaluation of functions, and round-off errors. Hence, we can consider numerical integration as roughly equivalent to a solution of the differential equation subject to slightly different initial conditions. There are a number of ways of overcoming or circumventing this instability. One way is to use an entirely different approach to the study of control processes, an approach which extends readily into many areas where the calculus of variations is an inappropriate tool. This we will do as soon as we put the finishing touches on the preceding treatment. It is interesting to regard numerical solution as a control process in which the aim is to minimize some measure of the final error. A number of complex problems arise in this fashion.
13. Proof that J(x) Attains an Absolute Minimum. Having established the fact that the Euler equation possesses a unique solution x, it is a simple matter to show that this function furnishes the absolute minimum of the quadratic functional J. Let y be another admissible function. Then

    J(y) = J(x + (y - x)) = J(x) + J(y - x) + 2 ∫₀ᵀ [(x', y' - x') + (Bx, y - x)] dt        (1)

as in Sec. 4, where now ε = 1, z = y - x. Integrating by parts,

    ∫₀ᵀ (x', y' - x') dt = (x', y - x)]₀ᵀ - ∫₀ᵀ (x'', y - x) dt = -∫₀ᵀ (x'', y - x) dt        (2)

since x'(T) = 0, x(0) = y(0) = c. Hence, the third term on the right-hand side of (1) has the form

    2 ∫₀ᵀ [-(x'', y - x) + (Bx, y - x)] dt = 0        (3)

since x satisfies the Euler equation. Thus,

    J(y) = J(x) + J(y - x) > J(x)        (4)

unless y ≡ x.
14. Dynamic Programming. In Chap. 9 we pointed out some uses of the theory of dynamic programming in connection with some finite-dimensional minimization and maximization problems. The basic idea was that these could be interpreted as multistage decision processes and thus be treated by means of functional equation techniques. Let us now show that we can equally regard the variational problem posed above as a multistage decision process, but now one of continuous type. To do this we abstract and extend the engineering idea of "feedback control." We consider the control of a system as the task of determining what to do at any time in terms of the current state of the system. Thus, returning to the problem of minimizing

    J(x) = ∫₀ᵀ [(x',x') + (x,Bx)] dt        (1)
we ask ourselves how to choose x' in terms of the current state x and the time remaining. What will guide us in this choice is the necessity of balancing the effects of an immediate decision against the effects of the continuation of the process.
15. Dynamic Programming Formalism. Let us parallel the procedure we followed in using the calculus of variations. First we shall derive some important results using an intuitive formal procedure; then we shall indicate how to obtain a valid proof of these results. See Fig. 1.

[Fig. 1: the interval [0, T], with the initial subinterval [0, Δ] marked.]

To begin with, we imbed the original variational process within a family of processes of similar nature. This is the essential idea of the "feedback control" concept, since we must develop a control policy applicable to any admissible state. Write

    f(c,T) = min_x J(x)        (1)

where x is subject to the constraint x(0) = c. This function is defined for T ≥ 0 and all c. The essential point here is that c and T are the fundamental variables rather than auxiliary parameters. Referring to Fig. 1, we wish to determine f(c,T) by using the fact that the control process can be considered to consist of a decision over
[0,Δ] together with a control process of similar type over the remaining interval [Δ,T]. This is a consequence of the additivity of the integral,

    ∫₀ᵀ = ∫₀^Δ + ∫_Δ^T        (2)

If Δ is small, and if we assume sufficient smoothness in the minimizing x (which we already know it possesses from what has preceded), we can regard a choice of x(t) over [0,Δ] as equivalent to a choice of x'(0) = y, an initial slope. Similarly, we may write

    ∫₀^Δ [(x',x') + (x,Bx)] dt = [(y,y) + (c,Bc)]Δ + O(Δ²)        (3)

Now let us consider the effect of a choice of y over [0,Δ]. The initial state c is changed into c + yΔ (to terms in O(Δ²)), and the time remaining in the process is diminished to T - Δ. Hence, for a minimizing choice of x over [Δ,T], we must have

    ∫_Δ^T [(x',x') + (x,Bx)] dt = f(c + yΔ, T - Δ) + O(Δ²)        (4)
Combining these results, we see that we have

    f(c,T) = [(y,y) + (c,Bc)]Δ + f(c + yΔ, T - Δ) + O(Δ²)        (5)

It remains to choose y. If f(c,T) is to be the minimum value of J(x), y must be chosen to minimize the right side of (5). Thus, we obtain the relation

    f(c,T) = min_y {[(y,y) + (c,Bc)]Δ + f(c + yΔ, T - Δ)} + O(Δ²)        (6)

keeping in mind that we are unreservedly using our formal license. Finally, we write

    f(c + yΔ, T - Δ) = f(c,T) + ((y, grad f) - f_T)Δ + O(Δ²)        (7)

where grad f is the vector whose components are ∂f/∂c₁, . . . , ∂f/∂c_N.        (8)
Letting Δ → 0, we are left with the nonlinear partial differential equation

    f_T = min_y [(y,y) + (c,Bc) + (y, grad f)]        (9)
This is the dynamic programming analogue of the Euler equation as far as the determination of the minimizing function is concerned. 16. Riccati Differential Equation. The minimum with respect to y in (15.9) is readily determined,
    y = -(grad f)/2        (1)
Substituting this into (15.9), we obtain a nonlinear partial differential equation of conventional form,
    f_T = (c,Bc) - (grad f, grad f)/4        (2)
Referring to the definition of f(c,T), we see that the appropriate initial condition is

    f(c,0) = 0        (3)

We can readily dispose of the partial differential equation by observing that the quadratic nature of the functional (or equivalently the linear nature of the Euler equation) implies that

    f(c,T) = (c,R(T)c)        (4)

for some matrix R.
Using (4), we have

    grad f = 2R(T)c        (5)
Hence, (2) leads to the relation
    (c,R'(T)c) = (c,Bc) - (R(T)c,R(T)c)        (6)

Since this holds for all c, we have

    R'(T) = B - R²(T),    R(0) = 0        (7)
precisely the result previously obtained in Sec. 11. The "control law" is given by (1), namely

    y = -R(T)c        (8)

a linear relation. We have deliberately used the same notation as in (8.4), since, as (8.6) shows, we are talking about the same matrix. A short numerical illustration of this feedback law is sketched after the exercise below.

EXERCISE

1. Show that R⁻¹ also satisfies a Riccati equation, and generally that (AR + B)(CR + D)⁻¹ satisfies a Riccati equation if R does.
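The following sketch (again with illustrative B, c, T, not taken from the text) simulates the feedback law: with A = 0 we have x' = y = -R(T - t)x, where T - t is the time remaining, and the cost along the closed-loop trajectory should reproduce (c,R(T)c). This is only an illustration of the formalism, not part of the text's development.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative data (not from the text).
B = np.array([[2.0, 0.5],
              [0.5, 1.0]])
c = np.array([1.0, -1.0])
N, T = len(c), 3.0

# R(s) from the Riccati equation (16.7): R' = B - R^2, R(0) = 0.
riccati = solve_ivp(lambda _, r: (B - r.reshape(N, N) @ r.reshape(N, N)).ravel(),
                    (0.0, T), np.zeros(N * N), dense_output=True, rtol=1e-10)
R = lambda s: riccati.sol(s).reshape(N, N)

# Closed loop: x' = y = -R(T - t) x, the feedback law (16.8) with T - t the time remaining.
xs = solve_ivp(lambda t, x: -R(T - t) @ x, (0.0, T), c, dense_output=True, rtol=1e-10)

# Cost along the trajectory, J = integral of (x',x') + (x,Bx), by a midpoint quadrature.
ts = np.linspace(0.0, T, 2001)
cost = 0.0
for t0, t1 in zip(ts[:-1], ts[1:]):
    tm = 0.5 * (t0 + t1)
    x = xs.sol(tm)
    xp = -R(T - tm) @ x
    cost += (xp @ xp + x @ B @ x) * (t1 - t0)

print("closed-loop cost:", cost)
print("(c, R(T)c)      :", c @ R(T) @ c)   # should agree closely, cf. (16.4) and (8.6)
```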
17. Discussion. The path we followed was parallel to that pursued in our treatment involving the calculus of variations. We first obtained a fundamental relation using a plausible intuitive procedure. Once the result was obtained, we provided a valid basis employing entirely different ideas. In this case, we can fall back upon the calculus of variations where the desired results were easily derived. This procedure is typical of mathematical analysis related to underlying physical processes. Scientific insight and experience contribute greatly to mathematical intuition. It is also not difficult to follow a direct path based upon functional analysis. 18. More General Control Processes. Let us proceed to the consideration of a slightly more general control process where we desire to minimize J(x,y)
= ∫₀ᵀ [(x,Bx) + (y,y)] dt        (1)

subject to x and y related by the differential equation

    x' = Ax + y,    x(0) = c        (2)
Once again we suppose that B > 0; A, however, is merely restricted to being real. Writing

    f(c,T) = min_y J(x,y)        (3)

the formal procedure of Sec. 15 leads to the relation

    f_T = min_z [(c,Bc) + (z,z) + (Ac + z, grad f)]        (4)

where we have set z = y(0).
The minimum is attained at

    z = -(grad f)/2        (5)

and using this value (4) becomes

    f_T = (c,Bc) + (Ac, grad f) - (grad f, grad f)/4        (6)

Setting once again

    f(c,T) = (c,R(T)c)        (7)
we obtain the equation

    (c,R'(T)c) = (c,Bc) + (Ac,2R(T)c) - (R(T)c,R(T)c)        (8)

Symmetrizing, we have

    (Ac,2R(T)c) = ([A*R(T) + R(T)A]c,c)        (9)
where A* denotes the transpose of A. Hence, (8) yields

    R' = B + A*R + RA - R²,    R(0) = 0        (10)

The optimal policy is once again given by

    z = -R(T)c        (11)
We use the notation A* here in place of our usual A' in order not to confuse with the derivative notation.

EXERCISES

1. Obtain the corresponding relations for the control process defined by

    J(x,y) = ∫₀ᵀ [(x,Bx) + (y,Cy)] dt,    x' = Ax + Dy,    x(0) = c
2. Consider the control process

    J(x,y) = ∫₀ᵀ [(x,x) + (y,y)] dt,    x' = Ax + y,    x(0) = c

Show that the Euler equation satisfied by a desired minimizing y is

    y' = -A*y + x,    y(T) = 0

3. Following the procedures given in Secs. 6-8, show that the system of differential equations possesses a unique solution.
4. Show that this solution provides the absolute minimum of J(x,y).
5. Determine the form of R(T) in the analytic representation of min J(x,y) = (c,R(T)c) and show that it satisfies the Riccati equation corresponding to (10).
6. Carry through the corresponding analysis for the case where

    J(x,y) = ∫₀ᵀ [(x,Bx) + (y,y)] dt,    B > 0

7. Consider the case where

    J(x,y) = ∫₀ᵀ [(x,Bx) + (y,Cy)] dt,    x' = Ax + Dy

first for the case where D is nonsingular and then where it is singular, under the assumption that B > 0 and a suitable assumption concerning C.
8. Show how the nonlinear differential equation in (10) may be solved in terms of the solutions of linear differential equations.
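Exercise 8 asks how (10) may be solved in terms of linear differential equations. One standard device, sketched below under the assumption that (10) has the form given above, is to write R = YX⁻¹, where the pair (X,Y) satisfies a linear system; the data A, B, T used here are illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative data (not from the text).
A = np.array([[0.0, 1.0],
              [-1.0, 0.3]])
B = np.array([[2.0, 0.5],
              [0.5, 1.0]])
N, T = 2, 2.0

# Direct integration of the matrix Riccati equation (18.10).
def riccati(_, r):
    R = r.reshape(N, N)
    return (B + A.T @ R + R @ A - R @ R).ravel()

R_direct = solve_ivp(riccati, (0, T), np.zeros(N * N), rtol=1e-10).y[:, -1].reshape(N, N)

# Linearization: with R = Y X^{-1}, the pair (X, Y) satisfies the linear system
#   X' = -A X + Y,   Y' = B X + A* Y,   X(0) = I, Y(0) = 0,
# since then R' = Y'X^{-1} - Y X^{-1} X' X^{-1} = B + A*R + RA - R^2 and R(0) = 0.
def linear(_, z):
    X, Y = z[:N * N].reshape(N, N), z[N * N:].reshape(N, N)
    return np.concatenate([(-A @ X + Y).ravel(), (B @ X + A.T @ Y).ravel()])

z0 = np.concatenate([np.eye(N).ravel(), np.zeros(N * N)])
zT = solve_ivp(linear, (0, T), z0, rtol=1e-10).y[:, -1]
XT, YT = zT[:N * N].reshape(N, N), zT[N * N:].reshape(N, N)
R_linear = YT @ np.linalg.inv(XT)

print("max difference:", np.max(np.abs(R_direct - R_linear)))   # should be very small
```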
19. Discrete Deterministic Control Processes. A corresponding theory exists for discrete control processes. Let

    x_{n+1} = Ax_n + y_n,    x₀ = c

    J({x_n,y_n}) = Σ_{n=0}^{N} [(x_n,x_n) + (y_n,y_n)]        (1)
and suppose that we wish to minimize J with respect to the y_n. Write

    f_N(c) = min_{{y_n}} J({x_n,y_n})        (2)

Then, arguing as before, we have

    f_N(c) = min_y [(c,c) + (y,y) + f_{N-1}(Ac + y)]        (3)

N ≥ 1, with f₀(c) = (c,c). It is clear once again, or readily demonstrated inductively, that

    f_N(c) = (c,R_N c)        (4)

where R_N is independent of c. Using the representation of (4) in (3), we have

    (c,R_N c) = min_y [(c,c) + (y,y) + (Ac + y, R_{N-1}(Ac + y))]        (5)

The minimum with respect to y can be obtained readily. The variational equation is

    y + R_{N-1}(Ac + y) = 0        (6)

whence

    y = -(I + R_{N-1})⁻¹R_{N-1}Ac        (7)
Substituting this result in (3), we obtain, after some simplification, the recurrence relation

    R_N = (I + A*A) - A*(I + R_{N-1})⁻¹A        (8)

N ≥ 1, with R₀ = I.

EXERCISES
1. Show that I ≤ R_{N-1} ≤ R_N ≤ (I + A*A), and thus that as N → ∞, R_N converges to a matrix R_∞ satisfying

    R_∞ = (I + A*A) - A*(I + R_∞)⁻¹A

2. Obtain the analogue of the Euler equation for the foregoing finite-dimensional variational problem. Prove that the two-point boundary problem for these linear difference equations possesses a unique solution and that this solution yields the absolute minimum of J.
3. Carry out the same program for the case where

    J({x_n,y_n}) = Σ_{n=0}^{N} [(x_n,Bx_n) + (y_n,y_n)]

4. Consider the discrete process x_{n+1} = (I + AΔ)x_n + y_nΔ, x₀ = c,

    J({x_n,y_n}) = Σ_{n=0}^{N} [(x_n,x_n) + (y_n,y_n)]Δ
Consider the limiting behavior of min J and the minimizing vectors as Δ → 0 with NΔ = T.
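A short numerical sketch of the recurrence (8) and of the limiting behavior asserted in exercise 1; the matrix A and the vector c are illustrative choices, and the feedback expression uses the minimizer recorded in (7).

```python
import numpy as np

# Illustrative A (not from the text).
A = np.array([[0.8, 0.4],
              [-0.3, 0.9]])
I = np.eye(2)

# Recurrence (19.8): R_N = (I + A*A) - A*(I + R_{N-1})^(-1) A, with R_0 = I.
R = I.copy()
for n in range(200):
    R_next = I + A.T @ A - A.T @ np.linalg.solve(I + R, A)
    if np.max(np.abs(R_next - R)) < 1e-12:
        break
    R = R_next
print("R_inf =\n", R)

# Check the fixed-point equation of exercise 1.
residual = R - (I + A.T @ A - A.T @ np.linalg.solve(I + R, A))
print("fixed-point residual:", np.max(np.abs(residual)))

# Optimal policy from (19.7) in the steady-state limit: y = -(I + R)^(-1) R A x.
c = np.array([1.0, -1.0])
print("steady-state control for x0 = c:", -np.linalg.solve(I + R, R @ A @ c))
```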
20. Discrete Stochastic Control Processes. Let us now consider the case where we have a stochastic control process of discrete type,

    x_{n+1} = Ax_n + y_n + r_n,    x₀ = c

    J({x_n,y_n}) = Σ_{n=0}^{N} [(x_n,x_n) + (y_n,y_n)]        (1)

where the r_n are random variables. The objective is to choose the y_n so as to minimize the expected value of J({x_n,y_n}) subject to the following provisions:

(a) The r_n are independent random variables with specified probability distribution dG(r).
(b) The vector y_n is chosen at each stage after observation of the state x_n and before a determination of r_n.        (2)

Writing

    f_N(c) = min_{{y_n}} E_{{r_n}} J({x_n,y_n})        (3)

and arguing as before, we obtain the functional equation

    f_N(c) = min_y [ (c,c) + (y,y) + ∫_{-∞}^{∞} f_{N-1}(Ac + y + r) dG(r) ]        (4)
N ≥ 1, with f₀(c) = (c,c). We can then proceed in the same fashion as above to use the quadratic nature of f_N(c) to obtain more specific results.

EXERCISES

1. Show that f_N(c) = (c,R_N c) + (b_N,c) + a_N and determine recurrence relations for the R_N, b_N, and a_N.
2. In control processes of this particular nature, does it make any difference whether we actually observe the system at each stage or use the expected state without observation?
3. How would one handle the case where the r_n are dependent in the sense that dG(r_n) = dG(r_n,r_{n-1}), that is, the probability distribution for r_n depends upon the value of r_{n-1}?
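A small Monte Carlo sketch of the stochastic process, under the assumption that dG(r) is a zero-mean Gaussian distribution (A, c, N, and the noise scale are illustrative choices, not from the text). Because the cost is quadratic and the noise is additive with zero mean, the feedback computed from the deterministic recurrence of Sec. 19 remains optimal, and the expected cost exceeds the deterministic value (c,R_N c) only by the constant term a_N of exercise 1 (with b_N = 0).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data (not from the text).
A = np.array([[0.8, 0.4],
              [-0.3, 0.9]])
I = np.eye(2)
c = np.array([1.0, -1.0])
N, sigma = 20, 0.1                   # horizon and noise scale for dG(r)

# Backward recurrence (19.8); the same R_n appear in the stochastic case (exercise 1).
Rs = [I]
for _ in range(N):
    R = Rs[-1]
    Rs.append(I + A.T @ A - A.T @ np.linalg.solve(I + R, A))

def expected_cost(trials=2000):
    costs = []
    for _ in range(trials):
        x, cost = c.copy(), 0.0
        for n in range(N):
            R = Rs[N - 1 - n]                        # stages remaining after this decision
            y = -np.linalg.solve(I + R, R @ A @ x)   # feedback from (19.7)
            cost += x @ x + y @ y
            x = A @ x + y + sigma * rng.standard_normal(2)
        cost += x @ x                                 # terminal stage contribution
        costs.append(cost)
    return np.mean(costs)

print("Monte Carlo expected cost:  ", expected_cost())
print("deterministic part (c,R_N c):", c @ Rs[N] @ c)   # difference is the noise term a_N
```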
21. Potential Equation. The potential equation

    u_xx + u_yy = 0,    (x,y) ∈ R        (1)

with u specified on the boundary of a region R, leads to many interesting investigations involving matrices when we attempt to obtain numerical results using a discretized version of (1). Let x and y run through some
discrete grid and replace (1) by

    [u(x + Δ, y) + u(x - Δ, y) - 2u(x,y)]/Δ² + [u(x, y + Δ) + u(x, y - Δ) - 2u(x,y)]/Δ² = 0        (2)

This may be written

    u(x,y) = [u(x + Δ, y) + u(x - Δ, y) + u(x, y + Δ) + u(x, y - Δ)]/4        (3)

expressing the fact that the value at (x,y) is the mean of the values at the four "nearest neighbors" of (x,y). Use of (3) inside the region together with the specified values of u(x,y) on the boundary leads to a system of linear algebraic equations for the quantities u(x,y). If an accurate determination is desired, which means small Δ, the system becomes one of large dimension. A number of ingenious methods have been developed to solve systems of this nature, taking advantage of the special structure. We wish to sketch a different approach, using dynamic programming and taking advantage of the fact that (1) is the Euler equation of

    J(u) = ∫∫_R (u_x² + u_y²) dx dy        (4)
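Relation (3) is the basis of the classical relaxation methods alluded to above; a minimal sketch of such a method (the grid size and boundary data are arbitrary illustrative choices, not from the text) is the following.

```python
import numpy as np

# Discretized potential equation on a rectangular grid, solved by repeated local
# averaging (Gauss-Seidel sweeps of the mean-value relation (21.3)).
M, N = 30, 40                      # interior grid is (M-1) x (N-1)
u = np.zeros((M + 1, N + 1))

# Boundary values: u = 1 on the edge x = 0, u = 0 on the other three edges.
u[:, 0] = 1.0

for sweep in range(5000):
    delta = 0.0
    for i in range(1, M):
        for j in range(1, N):
            new = 0.25 * (u[i + 1, j] + u[i - 1, j] + u[i, j + 1] + u[i, j - 1])
            delta = max(delta, abs(new - u[i, j]))
            u[i, j] = new
    if delta < 1e-8:
        break

print("sweeps used:", sweep + 1, "  u at grid center:", u[M // 2, N // 2])
```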
22. Discretized Criterion Function. As before, we replace derivatives by differences,

    u_x ≈ [u(x + Δ, y) - u(x,y)]/Δ
    u_y ≈ [u(x, y + Δ) - u(x,y)]/Δ        (1)

For simplicity, suppose that R is rectangular, with a = NΔ, b = MΔ. See Fig. 2. Let c₁, c₂, . . . , c_{M-1} be the assigned values along x = 0
and let x₁, x₂, . . . , x_{M-1} be the values of u at the points (Δ,Δ), (Δ,2Δ), . . . , (Δ,(M - 1)Δ). In place of writing a single variational equation for the unknown values at the interior grid points, as in (21.3), we think of this as a multistage decision process in which we must determine first the M - 1 values along x = Δ, then the M - 1 values along x = 2Δ, and so on. Write

    f_M(c) = min ΣΣ [|u(x + Δ, y) - u(x,y)|² + |u(x, y + Δ) - u(x,y)|²]        (2)

where c is the vector whose components are the c_i, and

    f₁(c) = (c₀ - b₀)² + (c₁ - b₁)² + · · · + (c_M - b_M)² + (c₁ - c₀)² + (c₂ - c₁)² + · · · + (c_M - c_{M-1})²        (3)

Then

    f_M(c) = min_x [(c₀ - x₀)² + (c₁ - x₁)² + · · · + (c_M - x_M)²
                    + (c₀ - c₁)² + (c₁ - c₂)² + · · · + (c_{M-1} - c_M)² + f_{M-1}(x)]        (4)
EXERCISES

1. Use the quadratic nature of f_M(c) to obtain appropriate recurrence relations.
2. What happens if the rectangle is replaced by a region of irregular shape? Consider, for example, a triangular wedge.
3. How would one go about obtaining a solution for a region having the shape in Fig. 3? See Angel.¹
[Fig. 3]

¹ E. S. Angel, Dynamic Programming and Partial Differential Equations, doctoral dissertation, Electrical Engineering, University of Southern California, Los Angeles, 1968. E. S. Angel, Dynamic Programming and Linear Partial Differential Equations, J. Math. Anal. Appl., vol. 23, pp. 639ff, 1968. E. S. Angel, Discrete Invariant Imbedding and Elliptic Boundary-value Problems over Irregular Regions, J. Math. Anal. Appl., vol. 23, pp. 471ff, 1968. E. S. Angel, A Building Block Technique for Elliptic Boundary-value Problems over Irregular Regions, J. Math. Anal. Appl., to appear. E. S. Angel, Noniterative Solutions of Nonlinear Elliptic Equations, Comm. Assoc. Comput. Machinery, to appear. E. S. Angel, Invariant Imbedding and Three-dimensional Potential Problems, J. Comp. Phys., to appear.
MISCELLANEOUS EXERCISES

1. Let A be a positive definite matrix and write A^½ = D + B, where D is a diagonal matrix whose elements are the square roots of the diagonal elements of A. Consider the recurrence relation DB_k + B_kD = A - D² - B²_{k-1}, B₀ = 0. Does B_k converge as k → ∞? (P. Pulay, An Iterative Method for the Determination of the Square Root of a Positive Definite Matrix, Z. angew. Math. Mech., vol. 46, pp. 151ff, 1966.)
2. Let A, B > 0. Then

    (A + B)/2 ≥ ((A^½ + B^½)/2)² ≥ · · · ≥ ((A^{1/N} + B^{1/N})/2)^N

Hence the limit

    M(A,B) = lim_{N→∞} ((A^{1/N} + B^{1/N})/2)^N

exists as N → ∞.
3. Does lim_{N→∞} ((A^N + B^N)/2)^{1/N} exist if A, B > 0?
4. Every real skew-symmetric matrix is a real, normal square root of a nonpositive definite real symmetric matrix whose nonzero characteristic values have even multiplicities. (R. F. Rinehart, Skew Matrices as Square Roots, Am. Math. Monthly, vol. 67, pp. 157-161, 1960.)
5. Consider the solution of R' = A - R², R(0) = 0, and the approximation scheme R'_{k+1} = A - R_k(∞)R_{k+1}, R_{k+1}(0) = 0, with R₀(∞) = B, B² < A. Does R_{k+1}(∞) converge to A^½ with suitable choice of B?
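Exercise 2 is easy to explore numerically. The sketch below (A and B are illustrative positive definite matrices, not from the text) computes ((A^{1/N} + B^{1/N})/2)^N for increasing N and checks that consecutive differences are positive semidefinite, so that the limit M(A,B) exists.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

# Illustrative positive definite A and B (not from the text).
A = np.array([[2.0, 0.5],
              [0.5, 1.5]])
B = np.array([[1.0, -0.2],
              [-0.2, 3.0]])

prev = None
for N in [1, 2, 4, 8, 16, 32, 64]:
    mean_root = 0.5 * (fractional_matrix_power(A, 1.0 / N) + fractional_matrix_power(B, 1.0 / N))
    M_N = fractional_matrix_power(mean_root, N).real
    if prev is not None:
        # smallest eigenvalue of prev - M_N should be >= 0 (up to roundoff)
        print(N, np.linalg.eigvalsh(prev - M_N).min())
    prev = M_N

print("M(A,B) approximately\n", prev)
```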
Bibliography and Discussion §1. For discussions of the modern theory of control processes, see R. Bellman, Introduction to the Mathematical Theory of Control Processes, Academic Press Inc., New York, 1967. R. Bellman, Adaptive Control Processes: A Guided Tour, Princeton University Press, Princeton, New Jersey, 1961.
R. Bellman and R. Kalaba, Dynamic Programming and Modern Control Theory, Academic Press Inc., New York, 1965. M. R. Hestenes, Calculus of Variations and Optimal Control Theory, John Wiley & Sons, Inc., New York, 1966. L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mishchenko, The Mathematical Theory of Optimal Processes, Interscience Publishers, New York, 1962.
§2. For discussions of stability theory and further references, see R. Bellman, Stability Theory of Differential Equations, Dover Publications, New York, 1969.
R. Bellman, Introduction to Methods of Nonlinear Analysis, Academic Press, Inc., New York, 1970. §§3 to 9. We follow the route outlined in the first book cited above. See also
R. Bellman, On Analogues of Poincare-Lyapunov Theory for Multipoint Boundary-value Problems, I, J. Math. Anal. Appl., vol. 14, pp. 522-526, 1966. R. Bellman, On Analogues of Poincare-Lyapunov Theory for Multipoint Boundary-value Problems: Correction, J. Math. Anal. Appl., to appear. §10. See'
R. Bellman, Some Inequalities for the Square Root of a Positive Definite Matrix, Lin. Algebra Appl., vol. I, pp. 321-324, 1968. For an interesting way in which A l-i enters, see E. P. Wigner and M. Y. Yanase, Information Content of Distributions, Proc. Nat. Acad. Sci. U.S., vol. 49, pp. 910-918, 1963. §12. The equation for R(T), however (11.1), is stable. See R. Bellman, Functional Equations in the Theory of Dynamic Programming, XIII: Stability Considerations, J. Math. A nal. A ppl., vol. 12, pp. 537-450, 1965. As G. Dahlquist has pointed out, it is essential to distinguish between inherent instability and instability associated with a specific method. Thus, for example, the equation x" - Ax = 0, x(O) = c, x' (T) = 0 is 8table as far as small changes in c are concerned, but the solution calculated according to some particular technique may be very sensitive to computational error. §14. In addition to the first three books cited above in Sec. 1, see the references at the end of Chap. 9. §16. Geometrically, we can think of dynamic programming as dual to the calculus of variations. A curve is visualized in this theory as an envelope of tangents rather than a locus of points as it is in the calculus
of variations. In the derivation of (15.5) we are using the principle of optimality; see any of the three books cited above. §16. For detailed discussions of the ways in which matrix Riccati equations enter into the solution of two-point boundary-value problems, see
W. T. Reid, Solution of a Riccati Matrix Differential Equation as Functions of Initial Values, J. Math. Mech., vol. 8, pp. 221-230, 1959. W. T. Reid, Properties of Solutions of a Riccati Matrix Differential Equation, J. Math. Mech., vol. 9, pp. 74~770, 1960. W. T. Reid, Oscillation Criteria for Self-adjoint Differential Systems, Trans. Am. Math. Soc., vol. 101, pp. 91-106, 1961. W. T. Reid, Principal Solutions of Nonoscillatory Linear Differential Systems, J. Math. Anal. Appl., vol. 9, pp. 397-423, 1964. W. T. Reid, Riccati Matrix Differential Equations and Nonoscillation Criteria for Associated Linear Differential Systems, Pacific J. Math., vol. 13, pp. 665-685, 1963. W. T. Reid, A Class of Two-point Boundary Problems, Illinois J. Math., vol. 2, pp. 434-453, 1958. W. T. Reid, Principal Solutions of Nonoscillatory Self-adjoint Linear Differential Systems, Pacific J. Math., vol. 8, pp. 147-169, 1958. W. T. Reid, A Class of Monotone Riccati Matrix Differential Operators, Duke Math. J., vol. 32, pp. 68~696, 1965. See also R. Bellman, Upper and Lower Bounds for the Solution of the Matrix Riccati Equation, J. Math. Anal. Appl., vol. 17, pp. 373-379, 1967. M. Aoki, Note on Aggregation and Bounds for the Solution of the Matrix Riccati Equation, J. Math. Anal. Appl., vol. 21, pp. 377-383, 1968. The first appearance of the Riccati equation in control theory and dynatnic programming is in
R. Bellman, On a Class of Variational Problems, Q. Appl. Math., vol. 14, pp. 353-359, 1957. §§19 to 20. See the first book cited above. For a detailed discussion of stochastic control processes of discrete type and dynamic programming, see
S. Dreyfus, Dynamic Programming and the Calculus of Variations, Academic Press Inc., New York, 1965. §21. See, for example, R. S. Varga, Matrix Iterative Analysis, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1962. C. G. Broyden and F. Ford, An Algorithm for the Solution of Certain Kinds of Linear Equations, Numer. Math., vol. 8, pp. 307-323, 1966.
18
Invariant Imbedding

1. Introduction. In the previous chapter we considered some two-point boundary-value problems for linear differential systems arising in the theory of control processes. Further, we indicated how the theory of dynamic programming could be used to obtain an alternate analytic formulation in terms of a matrix Riccati differential equation. In this chapter, we wish to employ the theory of invariant imbedding, an outgrowth of dynamic programming, to obtain corresponding results for general two-point boundary-value problems of the type

    -x' = Ax + Dy,    x(a) = c
     y' = Bx + Cy,    y(0) = d        (1)
Why we employ this particular form will become clear as we proceed. We suppose that the matrices A, B, C, D depend on the independent variable t, 0 ≤ t ≤ a. As is to be expected, more can be said when these matrices are constant. There is little difficulty in establishing existence and uniqueness of the solution of (1) when a is a small quantity. Furthermore, the question of the determination of the critical value of a, for which existence and uniqueness cease to hold, can be reduced to a Sturm-Liouville problem. A basic challenge, however, is that of delineating significant classes of equations of this nature for which a unique solution exists for all a > 0. We shall approach this question by associating the foregoing system of equations with an idealized physical process, in this case a transport process. Simple physical principles will then lead us to important classes of matrices [A,B,C,D]. Furthermore, the same principles will furnish the clue for a powerful analytic approach.
2. Terminal Condition as a Function of a. Let us examine the system in (1.1) with a view toward determining the missing initial condition y(a). Let (X₁,Y₁) and (X₂,Y₂) be the principal solutions of
    -X' = AX + DY
     Y' = BX + CY        (1)
that is to say

    X₁(0) = I,    Y₁(0) = 0
    X₂(0) = 0,    Y₂(0) = I        (2)

The solution of (1.1) may then be written

    x = X₁b + X₂d
    y = Y₁b + Y₂d        (3)

where the vector b is to be determined by the relation

    X₁(a)b + X₂(a)d = c        (4)

Since X₁(0) = I, it is clear that X₁(a) is nonsingular for a small. We shall operate under this assumption for the moment. Since the equation in (1.1) is linear, it is permissible to consider the cases c = 0 and d = 0 separately. Let us then take d = 0. The case where d ≠ 0, c = 0 can be treated by replacing t by a - t. From (4), we have

    b = X₁(a)⁻¹c        (5)
whence we have the representations

    x = X₁X₁(a)⁻¹c
    y = Y₁X₁(a)⁻¹c        (6)

The missing value of y at t = a is then

    y(a) = Y₁(a)X₁(a)⁻¹c        (7)

and at t = 0 we have

    x(0) = X₁(a)⁻¹c        (8)

Let us call

    R(a) = Y₁(a)X₁(a)⁻¹        (9)

the reflection matrix and

    T(a) = X₁(a)⁻¹        (10)
the transmission matrix. We are anticipating the identification of (1.1) with certain equations describing a transport process.
Consider R(a) as a function of a. We have

    R'(a) = Y₁'(a)X₁(a)⁻¹ - Y₁(a)X₁(a)⁻¹X₁'(a)X₁(a)⁻¹
          = [B(a)X₁(a) + C(a)Y₁(a)]X₁(a)⁻¹ + Y₁(a)X₁(a)⁻¹[A(a)X₁(a) + D(a)Y₁(a)]X₁(a)⁻¹
          = B(a) + C(a)R(a) + R(a)A(a) + R(a)D(a)R(a)        (11)

The initial condition is R(0) = 0.
Similarly, we have

    T'(a) = X₁(a)⁻¹[A(a)X₁(a) + D(a)Y₁(a)]X₁(a)⁻¹
          = T(a)A(a) + T(a)D(a)R(a)        (12)
with T(0) = I.
3. Discussion. We observe that we obtain the same type of Riccati differential equation in this general case as in the special case where the system of equations is derived from a control process. Once again an initial-value problem for a nonlinear system of ordinary differential equations replaces a linear system subject to two-point conditions leading to the solution of a linear system of algebraic equations. Sometimes this is desirable, sometimes not. What we want is the flexibility to use whatever approach is most expedient at the moment.
4. Linear Transport Process. Let us consider an idealized transport process where N different types of particles move in either direction along a line of finite length a. They interact with the medium of which the line is composed in ways which will be specified below, but not with each other. The effect of this interaction is to change a particle from one type to another, traveling in the same or opposite direction. We call this a "pure scattering" process. Let us consider the passage of a particle of type or state i through the interval [t, t + Δ] where Δ > 0 is a small quantity and i may assume any of the values i = 1, 2, . . . , N; see Fig. 1 and note the reversal of the usual direction. We have done this to preserve consistency of notation with a number of references.
[Fig. 1: the rod, drawn with t = a at the left and t = 0 at the right; a flux of intensity c is incident at t = a and a flux of intensity d at t = 0, with x_i(t) and y_i(t) the fluxes in the two directions.]

FIG. 1.
Let

    a_ij(t)Δ + o(Δ) = the probability that a particle in state j will be transformed into a particle in state i traveling in the same direction, j ≠ i, upon traversing the interval [t + Δ, t] going to the right
    1 - a_ii(t)Δ + o(Δ) = the probability that a particle in state i will remain in state i in the same direction while traversing the interval [t + Δ, t] going to the right        (1)
    b_ij(t)Δ + o(Δ) = the probability that a particle going to the right in state j will be transformed into a particle in state i traveling in the opposite direction upon traversing the interval [t + Δ, t]
Similarly, we introduce the functions c_ij(t) and d_ij(t), associated with forward scattering and back scattering for a particle going to the left through [t, t + Δ]. We suppose that all of these functions are non-negative for t ≥ 0, as they should be, of course, to be associated with probabilities. The most important case is that where they are piecewise continuous.
Suppose that there are streams of particles of all N types incident at both ends of the line, as indicated in Fig. 1, of constant intensities per unit time. We call this stream of particles a "flux."
5. Classical Imbedding. Let us now introduce the steady-state functions

    x_i(t) = the expected intensity of flux of particles of type i to the right at the point t, 0 ≤ t ≤ a
    y_i(t) = the expected intensity of flux of particles of type i to the left at the point t, 0 ≤ t ≤ a, i = 1, 2, . . . , N        (1)

See Fig. 1. We shall henceforth omit the term "expected" in what follows and proceed formally to obtain some equations for x_i and y_i. There is no need to concern ourselves with rigorous details here, since we are using the stochastic process solely as a guide to our intuition in ascertaining some of the properties of the solution of (1.1). Those interested in a rigorous derivation of the following equations, starting from the continuous stochastic process, may refer to sources cited at the end of the chapter.
An input-output analysis (which is to say, a local application of conservation relations) of the intervals [t - Δ, t] and [t, t + Δ] yields the following relations, valid to o(Δ):
    x_i(t) = x_i(t + Δ)(1 - a_iiΔ) + Σ_{j≠i} a_ijΔ x_j(t + Δ) + Σ_{j=1}^{N} d_ijΔ y_j(t)
    y_i(t) = y_i(t - Δ)(1 - c_iiΔ) + Σ_{j≠i} c_ijΔ y_j(t - Δ) + Σ_{j=1}^{N} b_ijΔ x_j(t)        (2)
Passing to the limit as Δ → 0, we obtain the differential equations

    -x_i'(t) = -a_ii x_i(t) + Σ_{j≠i} a_ij x_j(t) + Σ_{j=1}^{N} d_ij y_j(t)
     y_i'(t) = -c_ii y_i(t) + Σ_{j≠i} c_ij y_j(t) + Σ_{j=1}^{N} b_ij x_j(t)        (3)

i = 1, 2, . . . , N. Referring to Fig. 1, we see that the boundary conditions are

    x_i(a) = c_i,    y_i(0) = d_i,    i = 1, 2, . . . , N        (4)
We introduce the matrices

    B ≡ (b_ij),    D ≡ (d_ij)

    A = ( -a₁₁   a₁₂  · · ·  a₁N )        C = ( -c₁₁   c₁₂  · · ·  c₁N )
        (  a₂₁  -a₂₂  · · ·  a₂N )            (  c₂₁  -c₂₂  · · ·  c₂N )
        (   ·     ·            ·  )            (   ·     ·            ·  )
        (  a_N1  a_N2 · · · -a_NN )            (  c_N1  c_N2 · · · -c_NN )        (5)
== the intensity of flux in state i from a rod of length a emergent at a due to nn incident flux of unit intensity in state j incident at a
(1)
Invariant Imbedding
343
This function, the intensity of reflected flux, is defined, hopefully, for a ~ 0, i, .i = 1, 2, . . . ,N. It remains to be demonstrated that this is meaningful. See Fig. 2. Reflected flux Incident 1-1---11----------11 flux- 0+6 0 0
FIG. 2.
To obtain equations for the rij(a) as functions of a, we keep account of all the possible ways in which interactions in [a + A, a] can contribute to a reflected flux in state i. At first the bookkeeping seems tedious. Subsequently, we shall Show that with the aid of matrix notation the multidimensional case can be treated in a simple direct fashion when the physical meaning of the matrices is appreciated. This is typical of mathematical physics when suitable notation is employed. In more complex situations, infinite-dimensional operators replace the finite-dimensional matrices. Starting with an incident flux in state j, we can obtain a reflected flux in state i in the following fashions:
+ A, a] and immediate reflection resulting in a change of direction. (b) Interaction in [a + A, a] together with reflection in state i from (a) Interaction in [a
[a,O]. (c) Passage through [a A, a] without interaction, reflection from [a,O] in state i, and passage through [a, a A] without inter-
+
+
(2)
action. (d) Passage through [a + A, aJ without inter[\,ction, reflection from [a,O] in state j = i, and interaction in [a, a + A] resulting in emergence in state i. (e) Passage through [a + A, a] without interaction, reflection from [a,O] in state k, interaction in [a, a + A) resulting in a state l toward [a,O], and then reflection in state i from [a,O]. All further interactions produce terms of order .1 2 and can therefore be neglected at this time, since ultimately we let A tend to zero. Note that we write both [a + A, a] and [a, a + A] to indicate the direction of the particle. Taking account of the possibilities enumerated in (2), we may write rij(a
+ A) = b;jA +
l
akjrik(a)A
k,.i
+
+ (1
I k ... ;
- aijA) [ rjj(a)(1 - c;;A)
rkj(a)c;k A
+I It,1
ri/(a)d,krkj(a) A]
(3)
Introduction to Matrix Analysis
344
to term in o(A). Passing to the limit as A - 0, we obtain the nonlinear differential equation
r~(a) = bij +
k,..l all,1'ill(a) -
(ajj
+ cii)rila ) +
I
rllj(a)eu:
+I
k,.i
for i, j
rjl(a)d1clrllj(a)
(4)
k,l
= I, 2, .•. ,N, with the initial condition r;j(O)
=0
(5)
This last equation states that the reflection from a rod of zero length is zero. Weare not concerned with the rigorous derivation of (4) in the foregoing fashion at the present time. We do, however, want the difference approximation for an important purpose below j see Sec. 9. A rigorous derivation of (4) from the linear equations has already been given in Sec. 2. Writing (6)
and letting A, B, C, D have the previous significances, we see that (4) takes the compact form
    R' = B + RA + CR + RDR,    R(0) = 0        (7)
This is a matrix differential equation of Riccati type. Similar equations arise in control theory, as we noted in Sec. 3 and in the previous chapter. '1. Interpretation. Once we have obtained the differential equation, we can readily identify each of the terms and then use this identification as a means of writing down the corresponding equation for the transmission matrix. We see that B corresponds to immediate backscattering, that CR corresponds to reflection followed by forward scattering, that RA corresponds to forward scattering followed by reflection, and that RDR corresponds to reflection, backscattering, and then reflection again. The physical interpretation is also useful in obtaining convenient approximations in certain situations. We expect then the equation for the transmission matrix T(a) to have the form (1) T' = TA + TDR corresponding to forward scattering followed by transmission, plus reflection, backscattering, and then transmission again. The initial condition is T(O) = I. We leave this for the reader to verify in various ways.
Invariant Imbedding
345
The importance of the physical approach lies in the fact that·it provides an easy way of deriving many important results, and, what is more, of anticipating them. EXERCISE 1. What differential equations do we get if we use the figure
o
(J
instead of Fig. 2?
8. Conservation Relation. Since we have assumed a pure scattering process in Sec. 4, we know that no particles are "lost." A particle of type j must emerge as a particle of some type i, i = 1, 2, . . . , N, for any j. The analytic translations of this observation are the relations

    M(B + A) = 0,    M(C + D) = 0        (1)

where M is the matrix

    M = ( 1 1 · · · 1 )
        ( 0 0 · · · 0 )
        ( ·           )
        ( 0 0 · · · 0 )        (2)

The effect of multiplication by M is to produce column sums. We expect that the conservation relation

    M(R + T) = M        (3)

which states that total input is equal to total output, will hold for a ≥ 0. Let us begin by showing that it holds for small a. From (6.7) and (7.1), we have

    [M(R + T)]' = M(B + CR + RA + RDR) + M(TA + TDR)
                = MB + MCR + MRA + MTA + M(R + T)DR
                = MB + MCR + M(R + T - I)A + MA + M(R + T - I)DR + MDR
                = M(B + A) + M(C + D)R + M(R + T - I)A + M(R + T - I)DR
                = M(R + T - I)A + M(R + T - I)DR        (4)

Regarding R as a known function, this is a differential equation of the form

    Z' = ZA + ZDR,    Z(0) = 0        (5)
for the function Z = M(R + T - I). One solution is clearly Z = O. Uniqueness assures uS that it is the only solution within the interval for which R(a) and T(a) exist. Hence, within the domain of existence of Rand T, we have (3). If we can establish what is clearly to be expected, namely that all the elements of Rand T are non-negative, (3) will imply that R and Tare uniformly bounded for 0 ~ a ~ ao, which means that we can continue the solution indefinitely, using the same argument repeatedly. EXERCISES 1. Show that the relation M(R
+ T)
= M is equivalent to M(Y 1 (a)
+ I)
=
MX 1 (a), provided that X 1 (a) is nonsingular. II. Obtain the foregoing relation in some interval [D,an} by showing that MY;(a) = MX;(a)
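The conservation relation, and the non-negativity discussed in the next section, are easy to verify numerically for a particular medium. In the sketch below the scattering coefficients are randomly generated illustrative values (not from the text), constrained only so that M(A + B) = 0 and M(C + D) = 0; the reflection and transmission matrices are obtained by integrating (6.7) and (7.1).

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(1)
N = 3

# Illustrative pure-scattering medium with constant coefficients: B, D >= 0,
# A, C with nonnegative off-diagonal entries, and zero column sums of A + B and C + D.
B = rng.uniform(0.0, 0.5, (N, N))
D = rng.uniform(0.0, 0.5, (N, N))
A = rng.uniform(0.0, 0.5, (N, N)); np.fill_diagonal(A, 0.0)
C = rng.uniform(0.0, 0.5, (N, N)); np.fill_diagonal(C, 0.0)
np.fill_diagonal(A, -(A.sum(axis=0) + B.sum(axis=0)))   # forces M(A + B) = 0
np.fill_diagonal(C, -(C.sum(axis=0) + D.sum(axis=0)))   # forces M(C + D) = 0

# Imbedding equations (6.7) and (7.1): R' = B + RA + CR + RDR, T' = TA + TDR.
def rhs(_, z):
    R, T = z[:N * N].reshape(N, N), z[N * N:].reshape(N, N)
    dR = B + R @ A + C @ R + R @ D @ R
    dT = T @ A + T @ D @ R
    return np.concatenate([dR.ravel(), dT.ravel()])

a = 5.0
z0 = np.concatenate([np.zeros(N * N), np.eye(N).ravel()])   # R(0) = 0, T(0) = I
z = solve_ivp(rhs, (0.0, a), z0, rtol=1e-10, atol=1e-12).y[:, -1]
R, T = z[:N * N].reshape(N, N), z[N * N:].reshape(N, N)

print("column sums of R + T (should all be 1):", (R + T).sum(axis=0))
print("entries nonnegative:", (R >= -1e-10).all() and (T >= -1e-10).all())
```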
9. Non-negativity of Rand T. Let us now establish the non-negativity of T.,(a) and tlj(a) for a ~ O. We cannot conclude this directly from the differential equation of (6.7) since A and C have negative elements along the main diagonal. Let us present one proof here depending upon the recurrence relation of (6.3). Another, depending upon the properties of A and C as scattering matrices, is sketched in the exercises. Referring to (6.3), we see that all the terms are 'non-negative on the right-hand side if .1 is sufficiently small. We can avoid even this restriction by means of the useful device of replacing (1 - ajj.1) and (1 - c;;.1) by the positive quantities C Gli4 and e-c"4, a substitution valid to terms in 0(.1). It follows that the sequence ITij(k.1) I obtained by use of (6.3) is non-negative, and actually positive if all of thc bij are positive. Exact conditions that guarantee positivity rather than merely non-negativity are not easy to specify analytically, although easy to stnte in physical terms. We know from standard theory in differential equations that thc functions determined by the difference equation plus linear interpolation converge to the solutions of the differential equation in some interval [O,ao]. Hence, Tij(a) ~ 0 in this interval. A similar result holds for the ~j(a). Thus, using the conservation relation (8.3), we can conclude
o ~ rij(a) ~ 1 o ~ i.j(o) ~ 1
i, i = 1, 2, . . . ,N
(1)
It follows that the solution of the differcntial equntion can be continued for another interval lao, ao + all. Using (6.3) Rtarting at a = ao, wc
establish non-negativity, boundednesR, and so on. Thus, the Rolution of (6.7) can be continued in {tn intervul lao + aI, ao + 2ad, and so on.
Invariant Imbedding Hence, it exists and is unique for a ~ O. in the theory of differential equations.
347
This is a standard argument
EXERCISES 1. Consider first the case where A and C are constant. Convert the equation + CR + RA + RDR, R(O) = 0, into the integral equation
R' = B
10' eC(I-I.)[B + RDR}e
R =
A (I-I,)
dll
i. Use successive approximations, plus the properties of C and A, to deduce that
R has non-negative elements in some interval [O,aol. 8. If A and C are variable, obtain the corresponding integral equation R
=
10' X(I)X(II)-I[B + RDRIY(I) Y(h)-I de.
where X, Yare respectively solutions of X' = CX, Y' = YA, X(O) = Y(O) = I. 4. Show that X(I)X(II)-l, Y(I) Y(II)-l both have non-negative clements for 0 ~ II ~ I, and proceed as above. Ii. Show that the elements of R(a) are monotone-increasing in a if A, B, C, D are constant. 6. Is this necessarily true if A, B, C, D are functions of a? .,. Show that R( 00) = lim R(a) exists and is determined by the quadratic equalI_
OD
tion B + CR(oo) + R(oo)A + R(oo)DR(oo) = O. 8. Show that R(a) has the form R( 00) + Rle~'· + ... as a -+ 00 where Re ("1) < 0 in the case where A, B, C, D are constant. How does one determine "I? 9. Show that the monotonicity of the elements of R(a) implies that < O. 10. How would one use nonlinear interpolation techniques to calculate R( 00) from values of R(a) for small a? See R. Bellman and R. Kalabn, A Note on Nonlinear Summability Techniques in Invariant Imbedding, J. Malh. Anal. Appl., vol. 6, pp. 465-472, 1963.
"1
10. Discussion. Our objective is the solution of (1.1) in the interval Pursuing this objective in a direct fashion we encountered the problem of demonstrating that X l(a) was nonsingular for a ~ O. To establish this result, we identified X l(a)-l with the transmission matrix of a linear transport process and thus with the aid of a conservation relation we were able to deduce that X l(a)-l existed for a ~ O. This means that the internal fluxes x(t) and y(t) are uniquely determined for 0 ~ t ~ a, for all a > o. A basic idea is that of identifying the original equations with a certain physical process and then using a dijfere11t analytic formulation of this process to derive certain important results. 11. Internal Fluxes. Let us now consider a representation for the internal cases in the case where the medium is homogeneous, that is, A, B, C, D are constant, and the scattering is isotropic, that is, lO,a].
A=C
B=D
(1)
Introduction to Matrix Analysis
348
Consider Fig. 3. Concentrating on the interval [a,tj, we regard c and y(t) as the incident fluxes. Hence, we have x(t)
=
T(a - t)c
+ R(a -
t)y(t)
(2)
Similarly, considering the interval [t,Oj, we have y(t)
=
R(t)x(t)
y(f)-+-
+ T(t)d
(3)
-X(f)
c-I-I------+I-----II-+-d q t 0 FlO. 3.
Combining (2) and (3), we obtain x(t)
=
[I - R(a - t)R(t)j-l[T(a - t)c
+ R(a -
t)T(t)dj
(4)
provided that [I - R(a - t)R(t)j is nonsingular for 0 ~ t ~ a. This is certainly true for small a. We shall leave as exercises the proof of the result for general a and the demonstration of the non-negativity of x and y for non-negative c and d. EXERCISES 1. Show that R(a) has largest characteristic root. (the Perron root) less than one. Hint: Use the conservation relation. i. Hence, show that I - R(aj2)1 has an inverse and that this inverse is nonnegative. 8. Hence, show that z(a/2), y(a/2) are non-negative. 4. Repeating the same type of argument for [a/2,O) and [a,a/2), show that z(t) and y(t) are non-negative for a/4, 3a/4. Proceeding in this fashion and invoking continuity, show that z(t), y(t) are non-negative for 0 ~ t ~ a. Ii. Establish the relations in (2) and (3) purely analytically. 6. From the fact that [I - R(a - t)R(t)]x ~ T(a - t)c
+ R(a
- t)T(t)d = P
possesses one solution z for every c and d, conclude that I - R(a - t)R(t) possesses an inverse, and thus that the largest characteristir root of R(a - t)R(t) cannot be one. .,. To show that this largest characteristic root must be less than one in absolute value, iterate the relation
z
= p = P
+ R(a + R(a
- t)R(t)z - t)R(t)p
+ ... + [R(a
- t)R(t»)n-lp
+ R(a
- t)R(t)nz
and use the fact that p, z, R(a - t)R(t) are all non-negative. 8. Extend the foregoing considerations to the homogeneous, anisotropic ease. 9. Extend the foregoing considerations to the inhomogeneous c. se.
12. Absorption Processes. Consider the more general case where particles can disappear without producing other particles as a result of
Invariant Imbedding
349
an interaction. Call this phenomenon absorption. quantities i = I, 2, . . . , N, ei,(t).1
Introduce the
= the probability that a particle of type i is absorbed in
traversing the interval [t + .1, t] traveling in a right-hand direction lii(t).1 = the probability that a particle of type i is absorbed traversing the interval [t, t +.1] traveling in a left-hand direction We suppose that
e"~,
Ii,
~
(1)
0 and that M(A M(C
+ B + E) = M + D + F) = M
(2)
for t > O. This means that a particle is either forward-scattered, backscattered, or absorbed. Introduce the functions 'iJ{a)
= the probability that a particle of type j incident upon [a,O] from the left results in the absorption of a particle of type i in [a,O], i, j = I, 2, . . . ,N
(3)
Setting L(a) = (l'i(a» , it is easy to see from the type of argument given in Sec. 6 that the matrix L satisfies the differential equation L'
= E + LA +
FR
+ LDR
L(O) = 0
(4)
where E and F are the diagonal matrices composed of the e"~ and Iii respectively. We call L a dissipation matrix. We expect to have the conservation relation M(R
+ T + L)
=
(5)
M
for a ~ O. To demonstrate this we proceed as before. suitably grouping terms, [M(R
+
T
+ L)]' =
M(B + + M(C = M(B + + M(R + M(R = M(A + + M(R + M(R = M(R +
We have,
+ M(R + T + L)A + F)R + M(R + T + L)DR E) + MA + M(C + F)R + MDR E)
+ T +L + T +L
- I)A - I)DR B + E) + M(C + D + T + L - I)A + T + L - I)DR T + L - I)(A + DR)
This equation clearly has the solution M(R
+ F)R
+ T +L
- I) = O.
(6)
Introduction to Matrix Analysis
350
As before, we have Rand T non-negative and the same is readily demonstrated for L. From here on, the results and methods are as before, and we leave the details as exercises. MISCELLANEOUS EXERCISES 1. Consider the three-dimensional linear system 'Z' = A'Z for which 'Z is specified by the conditions 'Z1(O) = Cl, ('Z(h),b(l» = C2, ('Z(a),b
and obtain a differential equation satisfied by F(a). See R. Bellman, Invariant Imbedding and Multipoint Boundary-value Problems, J. Math. Anal. Appl., vol. 24, 1968.
4. Consider the case where there is only one state and scattering produces a change of direction. Let r(a) denote the intensity of reflected flux. Show by following successive reflections and transmissions that

    r(a + b) = r(a) + t(a)r(b)t(a) + t(a)r(b)r(a)r(b)t(a) + · · ·
             = r(a) + t(a)r(b)t(a)/(1 - r(a)r(b))

5. Obtain the corresponding relation for the transmission function.
6. Obtain the analogous relations in the multitype case where R(a) and T(a) are matrices.
7. Do these functional equations determine R(a) and T(a)? See the discussion of Polya's functional equation in Sec. 16 of Chap. 10.
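Exercise 4 can be checked numerically against the imbedding equations. The sketch below specializes (6.7) and (7.1) to a single state with constant isotropic coefficients, all equal to one rate k; this reduction, and the value of k, are illustrative assumptions of the sketch, not taken from the text.

```python
import numpy as np
from scipy.integrate import solve_ivp

# One-state pure scattering with a single constant rate k per unit length.
# The imbedding equations then reduce to r' = k(1 - r)^2, t' = -k t (1 - r),
# with r(0) = 0, t(0) = 1.
k = 0.7

def rhs(_, z):
    r, t = z
    return [k * (1.0 - r) ** 2, -k * t * (1.0 - r)]

def rt(length):
    z = solve_ivp(rhs, (0.0, length), [0.0, 1.0], rtol=1e-12, atol=1e-12).y[:, -1]
    return z[0], z[1]

a, b = 1.3, 0.8
ra, ta = rt(a)
rb, _ = rt(b)
rab, _ = rt(a + b)

# Addition formula of exercise 4: r(a + b) = r(a) + t(a) r(b) t(a) / (1 - r(a) r(b)).
print(rab, ra + ta * rb * ta / (1.0 - ra * rb))   # the two numbers should agree
```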
Bibliography and Discussion §1. For a discussion of the background of invariant imbedding, see
R. Bellman, Some Vistas of Modern Mathematics, University of Kentucky Press, Lexington, Kentucky, 1968. A survey of applications of invariant imbedding to mathematical physics is given in
R. Bellman, R. Kalaba, and G. M. Wing, Invariant Imbedding and Mathematical Physics, I: Particle Processes, J. Math. Phys., vol. 1, pp. 280-308, 1960.
P. B. Bailey and G. M. Wing, Some Recent Developments III Invariant Imbedding with Applications, J. Math. Phlls., vol. 6, pp. 453-462, 1965.
Invariant Imbedding
351
See also
R. Bellman, R. Kalaba, and M. Prestrud, Invariant Imbedding and Radiative Transfer in Slabs of Finite Thickness, American Elsevier Publishing Company, Inc., New York, 1963. R. Bellman, H. Kagiwada, R. Kalaba, and M. Prestrud, Invariant Imbedding and Time-dependent Processes, American Elsevier Publishing Company, Inc., New York, 1964. §4. See the foregoing references and G. M. Wing, An Introduction to Transport Theory, John Wiley & SOIlS, New York, 1962. R. Bellman and G. Birkhoff (eds.), Transport Theory, Proc. Symp. Appl. Math., vol. XIX, Amer. Math. Soc., Providence, R.I., 1968. R. Preisendorfer, Radiative Transfer on Discrete Spaces, Pergamon Press, New York, 1965. §8. See R. Bellman, Scattering Processes and Invariant Imbedding, J. Math. Anal. Appl., vol. 23, pp. 254-268, 1968. R. Bellman, K. L. Cooke, R. Kalaba, and G. M. Wing, Existence and Uniqueness Theorems in Invariant Imbedding, I: Conservation Principles, J. Math. Anal. Appl., vol. 10, pp. 234-244, 1965. ~12. See the references of Sec. 8. discussed in
The classical theory of semi-groups,
E. Hille, Functional Analysis and Semi-groups, Amer. Math. Soc. Colloq. Pub!., vol. XXXI, 1948. may be thought of as a study of eAI where A is an operator, Le., the study of linear functional equations of the form x' = Ax + y, x(O) = c. Dynamic programming introduces the study of nonlinear functional equations of the form x' = max [A (q)x
+ y(q))
x(O) = c
q
in a natural fashionj see
R. Bellman, Adaptive Control Processes: A Guided Tour, Princeton University Press, Princeton, New Jersey, 1961.
Invariant imbedding may be considered as a theory of semi-groups in space and structure, and, particularly, of nonlinear semi-groups of a type different from those just described. See
352
Introduction to Matrix Analysis R. Bellman and T. Brown, A Note on Invariant Imbedding and Generalized Semi-groups, J. Math. Anal. Appl., vol. 9, pp. 394-396, 1964.
A number of interesting matrix results connected with circuits arise from similar applications of invariant imbedding. See, for example,
R. M. Redheffer, On Solutions of Riccati's Equation as Functions of Initial Values, J. Rat. Mech. Anal., vol. 5, pp. 835-848, 1960.
R. M. Redheffer, Novel Uses of Functional Equations, J. Rat. Mech. Anal., vol. 3, pp. 271-279, 1954.
A. Ping and I. Wang, Time-dependent Transport Process, J. Franklin Inst., vol. 287, pp. 409-422, 1969.
For an application of invariant imbedding to wave propagation, see
R. Bellman and R. Kalaba, Functional Equations, Wave Propagation, and Invariant Imbedding, J. Math. Mech., vol. 8, pp. 683-703, 1959.
For discussions of two-point boundary-value problems of unconventional type using functional analysis, see
F. R. Krall, Differential Operators and Their Adjoints Under Integral and Multiple Point Boundary Conditions, J. Diff. Eqs., vol. 4, pp. 327-336, 1968.
W. R. Jones, Differential Systems with Integral Boundary Conditions, J. Diff. Eqs., vol. 3, pp. 191-202, 1967.
R. Bellman, On Analogues of Poincare-Lyapunov Theory for Multi-point Boundary-value Problems, J. Math. Anal. Appl., vol. 14, pp. 522-526, 1966.
For an extensive application of invariant imbedding to Fredholm integral equation theory, with particular reference to the Wiener-Hopf equation, see
A. McNabb and A. Schumitzky, Factorization of Operators, I: Algebraic Theory and Examples, University of Southern California, USCEE-373, July, 1969.
A. McNabb and A. Schumitzky, Factorization of Integral Operators, II: A Nonlinear Volterra Method for Numerical Solution of Linear Fredholm Equations, University of Southern California, USCEE-330, March, 1969.
A. McNabb and A. Schumitzky, Factorization of Operators, III: Initial Value Methods for Linear Two-point Boundary Value Problems, University of Southern California, USCEE-371, July, 1969.
19
Numerical Inversion of the Laplace Transform and Tychonov Regularization

1. Introduction. In Chap. 11 we briefly indicated some of the advantages of using the Laplace transform to treat linear ordinary differential equations of the form

    x' = Ax,    x(0) = c                                           (1)

where A is a constant matrix. If we set

    y(s) = L(x) = ∫_0^∞ e^{-st} x(t) dt                            (2)

we obtain for Re(s) sufficiently large the simple explicit result

    y = (sI - A)^{-1} c                                            (3)

The problem of obtaining x given y poses no analytical difficulty in this case since all that is required is the elementary result

    ∫_0^∞ e^{-st} e^{at} dt = 1/(s - a)                            (4)

together with limiting forms.
Comparing (1) and (3), we see that one of the properties of the Laplace transform is that of reduction of level of transcendence. Whereas (1) is a linear differential equation for x, (3) is a linear algebraic equation for y. As soon as we think in these terms, we are stimulated to see if the Laplace transform plays a similar role as far as other important types of functional equations are concerned. As we shall readily show, it turns out that it does. This explains its fundamental role in analysis.
This, however, is merely the start of an extensive investigation centering about the use of a readily derived Laplace transform of a function to deduce properties of the function. Many interesting questions of matrix theory arise in the course of this investigation which force us to consider ill-conditioned systems of linear algebraic equations. In this fashion we make contact with some important ideas of Tychonov which
lie at the center of a great deal of current work in the analytic and computational study of partial differential equations and equations of more complex type.

EXERCISE

1. Establish the Borel identity

    L(∫_0^t u(t - t_1) v(t_1) dt_1) = L(u) L(v)
under appropriate conditions on u and v by considering integration over the shaded region in Fig. 1 and interchanging the orders of integration.
FIG. 1.
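As a quick numerical illustration of the identity, one can approximate both sides by the rectangle rule. The sketch below is only an illustration in plain Python with NumPy; the test functions u(t) = e^{-t}, v(t) = e^{-2t}, the grid, and the truncation of the integrals are arbitrary choices, not taken from the text.

    import numpy as np

    # Rough numerical check of the Borel identity L(u * v) = L(u) L(v),
    # where (u * v)(t) is the convolution integral from 0 to t.
    # Test functions and the truncation of [0, infinity) are arbitrary choices.
    dt = 0.01
    t = np.arange(0.0, 40.0, dt)
    u = np.exp(-t)
    v = np.exp(-2.0 * t)

    def laplace(f, s):
        # crude Laplace transform by the rectangle rule on the grid t
        return np.sum(f * np.exp(-s * t)) * dt

    conv = np.convolve(u, v)[: len(t)] * dt      # (u * v)(t) on the same grid

    for s in (0.5, 1.0, 3.0):
        print(s, laplace(conv, s), laplace(u, s) * laplace(v, s))

For each s the two printed values should agree up to the discretization error of the grid.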
2. The Heat Equation. Consider the partial differential equation

    k(x) u_t = u_xx,    u(0,t) = u(1,t) = 0,   t > 0               (1)
                        u(x,0) = g(x),         0 < x < 1
Let us proceed formally to obtain an ordinary differential equation for L(u) assuming that all of the operations we perform are justifiable.
As mentioned above, our aim is solely to point out some of the uses of the Laplace transform. We have
    L(k(x) u_t) = L(u_xx)
    k(x) L(u_t) = L(u)_xx                                          (2)
    s k(x) L(u) - k(x) g(x) = L(u)_xx

Hence, setting

    v = v(x,s) = L(u)                                              (3)

we obtain for each fixed s an ordinary differential equation for v,

    v_xx - s k(x) v = -k(x) g(x),    v(0) = v(1) = 0               (4)
For any value of s it is an easy matter to solve (4) numerically.

EXERCISES

1. What modifications are necessary if the boundary conditions are u(0,t) = a(t), u(1,t) = b(t)?
2. Obtain an analytic expression for L(u) in the case where k(x) ≡ k, a positive constant.
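To make the remark about solving (4) numerically concrete, a standard central-difference scheme can be used for each fixed s. The sketch below is only an illustration (plain Python with NumPy); the choices k(x) = 1 + x and g(x) = x(1 - x) are arbitrary and not taken from the text.

    import numpy as np

    # Finite-difference solution of  v_xx - s k(x) v = -k(x) g(x),  v(0) = v(1) = 0,
    # for a fixed value of s.  k and g below are arbitrary illustrative choices.
    def solve_transformed_heat(s, n=200):
        h = 1.0 / n
        x = np.linspace(0.0, 1.0, n + 1)
        k = 1.0 + x                       # illustrative k(x)
        g = x * (1.0 - x)                 # illustrative g(x)

        # Interior unknowns v_1, ..., v_{n-1}; central differences for v_xx.
        main = -2.0 / h**2 - s * k[1:-1]
        off = np.ones(n - 2) / h**2
        A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
        rhs = -k[1:-1] * g[1:-1]

        v = np.zeros(n + 1)
        v[1:-1] = np.linalg.solve(A, rhs)
        return x, v

    x, v = solve_transformed_heat(s=2.0)
    print(v.max())

Each call of this kind produces one value of v(x,s) = L(u); this is the sense in which an evaluation of the transform can be appreciably more expensive than an ordinary function evaluation.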
3. The Renewal Equation. Consider the renewal equation

    u(t) = f(t) + ∫_0^t k(t - t_1) u(t_1) dt_1                     (1)

which arises in many parts of pure and applied mathematics. Using the Borel result (Exercise 1 at the end of Sec. 1), we have

    L(u) = L(f) + L(k) L(u)                                        (2)

whence

    L(u) = L(f) / (1 - L(k))                                       (3)
an important explicit representation for L(u).
4. Differential-difference Equations. As a third example of the use of the Laplace transform in simplifying equations, consider the linear differential-difference equation

    u'(t) = a u(t) + b u(t - 1),    t ≥ 0                          (1)
    u(t) = g(t),                    -1 ≤ t ≤ 0

We have

    L(u') = a L(u) + b L(u(t - 1))                                 (2)
Using the relations

    L(u') = -g(0) + s L(u)
    ∫_0^∞ u(t - 1) e^{-st} dt = ∫_0^1 + ∫_1^∞
        = ∫_{-1}^0 g(t_1) e^{-s(1+t_1)} dt_1 + e^{-s} ∫_0^∞ e^{-st} u(t) dt     (3)
        = ∫_{-1}^0 g(t_1) e^{-s(1+t_1)} dt_1 + e^{-s} L(u)

we readily obtain the expression

    L(u) = [g(0) + b ∫_{-1}^0 g(t_1) e^{-s(1+t_1)} dt_1] / (s - a - b e^{-s})    (4)
5. Discussion. We have illustrated the point that in a number of important cases it is easy to obtain an explicit representation for L(u). The basic question remains: What is the value of this as far as ascertaining the behavior of u is concerned?
There are two powerful techniques which we can employ. The first is the use of the complex inversion formula

    u = (1/2πi) ∫_C L(u) e^{st} ds                                 (1)

where C is a suitable contour. Once (1) has been established, we have
at our command the powerful resources of the theory of functions of complex variables. The second general technique involves the use of Tauberian theory, a theory which enables us to relate the behavior of u(t) for large t to the behavior of L(u) at s = 0. Both of these methods are discussed in references cited at the end of the chapter.
Here we wish to discuss a numerical inversion technique which yields values of u(t) for t ≥ 0 in terms of values of L(u) for s ≥ 0. This numerical work, however, will often be guided by preliminary and partial results obtained using the two methods just mentioned. That there are serious difficulties in this approach to the numerical inversion of the Laplace transform is due essentially to the fact that in its general form the task is impossible.
6. Instability. By this last comment concerning impossibility, we mean that L^{-1} is an unstable operator, which is to say, arbitrarily small changes in y = L(x) can produce arbitrarily large changes in the function x. Consider as a simple example of this phenomenon the familiar integral
    ∫_0^∞ e^{-st} sin at dt = a/(s^2 + a^2),    a > 0              (1)

For real s, we have

    a/(s^2 + a^2) ≤ 1/a                                            (2)
Hence, by taking a arbitrarily large, we can make L(sin at) arbitrarily small. Nonetheless, the function sin at oscillates between values of ±1 for t ≥ 0. Consequently, the question of obtaining the numerical values of the solution u of the linear integral equation

    ∫_0^∞ e^{-st} u(t) dt = v(s)                                   (3)
given numerical values of v(s), makes sense only if we add to a set of numerical values of v(s) some information concerning the structure of u(t). This is the path we shall pursue.
7. Quadrature. The first step is a simple one, that of approximating to an integral by a sum. There are many ways we can proceed. Thinking of an integral as representing the area under a curve (see Fig. 2), we can write

    ∫_0^1 u(t) dt ≅ u(0)Δ + u(Δ)Δ + ⋯ + u((N - 1)Δ)Δ               (1)

where NΔ = 1. More accurate results can be obtained by using a trapezoidal approximation to the area ∫_{kΔ}^{(k+1)Δ} u(t) dt, and so forth.
As usual in computational analysis, we are torn between two criteria, one in terms of accuracy, the other in terms of time. A method designed to yield a high degree of accuracy can require a correspondingly large amount of computation; a procedure requiring a relatively small number of calculations can produce undesirably large errors. Let us consider, to illustrate this Scylla and Charybdis of numerical analysis, a situation encountered, say, in connection with the heat equation discussed in Sec. 2, where the evaluation of each functional value u(kΔ) consumes an appreciable time. The price of desired accuracy using the quadrature relation in (1), namely choosing sufficiently small Δ, may thus be too high.
FIG. 2.
Let us then provide ourselves with a bit more flexibility by calculating values of u(t) at irregularly spaced points {t_i}, called quadrature points, and suitably weighting the results. We write

    ∫_0^1 u(t) dt ≅ Σ_{i=1}^N w_i u(t_i)                           (2)
where the weights w_i are called the Christoffel numbers. There are clearly many ways of choosing the t_i and w_i, each convenient according to some preassigned criterion of convenience. We shall follow a procedure due to Gauss.
8. Gaussian Quadrature. Let us show that the w_i and t_i are uniquely determined by the condition that (7.2) is exact for polynomials of degree less than or equal to 2N - 1. To demonstrate this, let us recall some facts concerning the Legendre polynomials, {P_n(t)}. These polynomials constitute an orthogonal set over [-1,1],

    ∫_{-1}^1 P_m P_n dt = 0,    m ≠ n                              (1)
which is equivalent to the statement that

    ∫_{-1}^1 t^k P_n dt = 0,    k = 0, 1, 2, . . . , n - 1         (2)
Furthermore, they are a complete set over L^2(-1,1) in the sense that any function u(t) belonging to L^2(-1,1) can be arbitrarily closely approximated by a linear combination Σ_{n=0}^N a_n P_n(t) in the L^2 norm for N sufficiently large. If we take u(t) to be continuous, we can replace the L^2 norm by the maximum norm ||u|| = max_{-1≤t≤1} |u|.
We begin our derivation of the Gaussian quadrature formula by considering the shifted Legendre polynomials

    P_n^*(t) = P_n(1 - 2t),    n = 0, 1, 2, . . .                  (3)

These polynomials constitute a complete orthogonal set over [0,1] and are determined by the analogue of (2),

    ∫_0^1 t^k P_n^*(t) dt = 0,    k = 0, 1, 2, . . . , n - 1       (4)

up to a normalizing factor which is chosen sometimes one way and sometimes another depending upon our convenience. Using the functions t^k P_N^*, k = 0, 1, . . . , N - 1, as "test functions," we see that the required equality in (7.2) yields the equations
    0 = ∫_0^1 t^k P_N^*(t) dt = Σ_{i=1}^N w_i t_i^k P_N^*(t_i)     (5)

k = 0, 1, . . . , N - 1. Regard this as a system of N simultaneous linear algebraic equations for the N quantities w_i P_N^*(t_i), i = 1, 2, . . . , N. The matrix of coefficients is

    (t_i^k)                                                        (6)

the Vandermonde matrix, which, as we know, is nonsingular if the t_i are distinct. Naturally, we assume that they are. Hence, (5) implies that

    P_N^*(t_i) = 0,    i = 1, 2, . . . , N                         (7)

Since the w_i are assumed nonzero, we see that the quadrature points are the N zeros of P_N^*(t), which we know lie in [0,1].

EXERCISES
1. Show that

    w_i = ∫_0^1 [P_N^*(t) / ((t - t_i) P_N^{*'}(t_i))] dt,    i = 1, 2, . . . , N

How does one show that the w_i are positive?
2. With the foregoing determination of t_i and w_i, show that

    ∫_0^1 u(t) dt = Σ_{i=1}^N w_i u(t_i)

for any polynomial u of degree 2N - 1 or less.
3. Determine the possible sets of quadrature points and weights if we assume that (7.2) holds for all polynomials of degree less than or equal to M which are zero at t = 0. What relation holds between M and N if we wish the w_i and t_i to be uniquely determined?
4. Obtain similar results for the case where conditions hold at both t = 0 and t = 1 on u(t) and its derivatives.
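For readers who want to experiment, the quadrature points and Christoffel numbers on [0,1] can be generated from the standard Gauss-Legendre rule on [-1,1] through the change of variable t = (1 - x)/2 suggested by (3). The following sketch, plain Python with NumPy and offered only as an illustration, does this and then checks the statement of exercise 2 on a sample polynomial of degree 2N - 1.

    import numpy as np

    # Gauss-Legendre nodes/weights on [-1,1], mapped to [0,1] via t = (1 - x)/2,
    # which matches the shifted polynomials P_n^*(t) = P_n(1 - 2t) of Sec. 8.
    def shifted_gauss(N):
        x, w = np.polynomial.legendre.leggauss(N)   # rule on [-1, 1]
        t = (1.0 - x) / 2.0                         # zeros of P_N^*(t)
        return t, w / 2.0                           # dt = -dx/2, so weights halve

    # Check exercise 2: the rule is exact for polynomials of degree at most 2N - 1.
    N = 5
    t, w = shifted_gauss(N)
    u = lambda s: s**9 - 3.0 * s**4 + 2.0           # degree 9 = 2N - 1
    exact = 1.0 / 10 - 3.0 / 5 + 2.0                # value of the integral over [0, 1]
    print(np.dot(w, u(t)), exact)                   # should agree to machine precision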
9. Approximating System of Linear Equations. With this as background, let us return to the numerical solution of

    ∫_0^∞ u(t) e^{-st} dt = v(s)                                   (1)

To employ the preceding results, we first make a change of variable

    e^{-t} = r                                                     (2)

obtaining

    ∫_0^1 r^{s-1} u(log 1/r) dr = v(s)                             (3)

Write

    g(r) = u(log 1/r)                                              (4)

and employ the quadrature approximation of Sec. 8. The result is the relation

    Σ_{i=1}^N w_i g(r_i) r_i^{s-1} ≅ v(s)                          (5)

We now set s = 1, 2, . . . , N, and consider the approximate system of linear algebraic equations

    Σ_{i=1}^N w_i r_i^k g(r_i) = v(k + 1),    k = 0, 1, . . . , N - 1    (6)

These N equations uniquely determine the N quantities w_i g(r_i) since the determinant of the coefficients is the Vandermonde determinant, which is nonzero.
10. A Device of Jacobi. There is strong temptation at this point to relegate the further steps of the numerical solution to the sturdy keepers of algorithms for solving linear systems of algebraic equations. There are important reasons for resisting this apparently easy way out. In the first place, there is nothing routine about the procedure; in the second place, some important analytic concepts are involved.
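It is nevertheless instructive to see what the "easy way out" looks like. The sketch below (illustrative Python with NumPy) assembles system (9.6) and hands it to a general linear solver. The test function u(t) = e^{-2t}, for which v(s) = 1/(s + 2), is an arbitrary choice made here, and the quadrature points and weights are generated exactly as in the Sec. 8 sketch.

    import numpy as np

    # Assemble and solve system (9.6):
    #   sum_i w_i r_i^k g(r_i) = v(k + 1),   k = 0, ..., N - 1,
    # for the unknowns g(r_i).  Test case: u(t) = e^{-2t}, so v(s) = 1/(s + 2).
    N = 8
    x, w0 = np.polynomial.legendre.leggauss(N)   # Gauss-Legendre rule on [-1, 1]
    r, w = (1.0 - x) / 2.0, w0 / 2.0             # quadrature points r_i, weights w_i on [0, 1]

    V = np.vander(r, N, increasing=True).T       # rows k = 0..N-1, entries r_i^k
    A = V * w                                    # coefficient of g(r_i) is w_i r_i^k
    b = 1.0 / (np.arange(1, N + 1) + 2.0)        # v(k + 1) = 1/(k + 3)

    g = np.linalg.solve(A, b)
    print(np.max(np.abs(g - r**2)))              # here g(r) = u(log 1/r) = r^2

The printed discrepancy is governed by the conditioning of the coefficient matrix, which is the subject of the sections that follow.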
To begin with, let us show that we can obtain an explicit representation for the inverse of the Vandermonde matrix, which means that we can obtain an explicit solution of (9.6) for arbitrary v. Introduce the new variables y_i = w_i g(r_i), i = 1, 2, . . . , N, and set a_k = v(k + 1). The system in (9.6) takes the simpler form

    Σ_{i=1}^N r_i^k y_i = a_k,    k = 0, 1, . . . , N - 1          (1)

Multiply the kth equation by the parameter q_k, to be determined in a moment, and add. The result is

    Σ_{i=1}^N y_i (Σ_{k=0}^{N-1} q_k r_i^k) = Σ_{k=0}^{N-1} a_k q_k    (2)

Let the polynomial f(r) be defined by

    f(r) = Σ_{k=0}^{N-1} q_k r^k                                   (3)

Then (2) may be written

    Σ_{i=1}^N y_i f(r_i) = Σ_{k=0}^{N-1} a_k q_k                   (4)

It remains to choose f in an expeditious fashion. Suppose that f is chosen to satisfy the conditions

    f(r_j) = 1,    f(r_i) = 0,   i ≠ j                             (5)

Call the polynomial obtained in this fashion f_j and write

    f_j(r) = Σ_{k=0}^{N-1} q_{kj} r^k                              (6)

Then (4) reduces to

    y_j = Σ_{k=0}^{N-1} a_k q_{kj}                                 (7)

The polynomial satisfying (5) is determined in principle by the Lagrange interpolation formula. In this case, it is readily obtained, namely

    f_j(r) = P_N^*(r) / ((r - r_j) P_N^{*'}(r_j)),    j = 1, 2, . . . , N    (8)

With the aid of the recurrence relations for the Legendre polynomials, the q_{kj} can be calculated to any desired degree of accuracy.
EXERCISE
1. Why does it make any difference computationally that we can use (8) rather than the general formula for the Lagrange interpolation polynomial?
11. Discussion. With the aid of analytic and numerical expressions for the explicit inverse of V(r_1, r_2, . . . , r_N) we have apparently completed our self-imposed task, that of obtaining an approximate inversion formula. Nonetheless, the results of Sec. 6, which tell us that L^{-1} is an unstable operator, warn us that there must be a catch somewhere. There is! The unboundedness of L^{-1} manifests itself in the fact that V(r_1, r_2, . . . , r_N) is an ill-conditioned matrix. This is equivalent to saying that x = V^{-1}b, the solution of

    Vx = b                                                         (1)

is highly sensitive to small changes in b. To see this, let us note the values of the inverse of (w_i r_i^{j-1}) for the case N = 10. It is sufficient to present the values of the last row of this matrix to illustrate the point we wish to make.

    -6.2972078903648367   -1
     6.8631241594216109    1
    -1.8007322514163832    3
     1.9787480981636393    4
    -1.1232455134290224    5
     3.6273559299309234    5
    -6.9145818794948088    5
     7.6901847430849219    5
    -4.6080521705455609    5
     1.1482677561426949    5

The last digits indicate the appropriate power of 10 by which the preceding seventeen-digit value is to be multiplied. We see from an examination of the foregoing table that small changes in the values of the a_k in (9.7) can result in large changes in the value of the y_j. For the case where the values of the a_k can be generated analytically, the problem is not overwhelming since we can calculate the desired values, at some small effort, to an arbitrary degree of accuracy, single precision, double precision, triple precision, etc. In many cases, however, the values are obtained experimentally with a necessarily limited accuracy. In other cases, V, the Vandermonde matrix, is replaced by another ill-conditioned matrix whose inverse cannot be readily calculated with any desired degree of accuracy; see Sec. 16 for an illustration of this.
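The ill-conditioning can also be observed directly. The short sketch below (illustrative Python with NumPy, using the same construction of the r_i and w_i as in the earlier sketches) prints the condition number of the matrix (w_i r_i^{j-1}) for a few values of N.

    import numpy as np

    # Condition number of the coefficient matrix of (9.6) as N grows.
    for N in (5, 10, 15):
        x, w0 = np.polynomial.legendre.leggauss(N)
        r, w = (1.0 - x) / 2.0, w0 / 2.0
        M = np.vander(r, N, increasing=True).T * w    # entry (j, i) = w_i r_i^{j-1}
        print(N, np.linalg.cond(M))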
The problem of solving unstable equations ("improperly posed" problems in the terminology of Hadamard) is fundamental in modern science, and its feasible solution represents a continual challenge to the ingenuity and art of the mathematician. It is clear, from the very nature of the problem, that there can never be a uniform or definitive solution to a question of this nature. Hence, we have the desirable situation of a never-ending supply of almost, but not quite, impossible problems in analysis.
12. Tychonov Regularization. The problem we pose is that of devising computational algorithms for obtaining an acceptable solution of

    Ax = b                                                         (1)

where A^{-1} exists, but where A is known to be ill-conditioned. A first idea is to regard (1) as the result of minimizing the quadratic form

    Q(x) = (Ax - b, Ax - b)                                        (2)

and thus replace (1) by

    A^T A x = A^T b                                                (3)

The matrix of coefficients is now symmetric. Unfortunately, it turns out that A^T A may be more ill-conditioned than A and thus that (3) may result in even more chaotic results.
A slight, but fundamental, modification of the foregoing procedure does, however, lead to constructive results. Let us replace the solution of (1) by the problem of minimizing

    R(x) = (Ax - b, Ax - b) + φ(x)                                 (4)

where φ(x) is a function carefully chosen to ensure the stability of the equation for the minimizing x. This is called Tychonov regularization.
The rationale behind this is the following. We accept the fact that the general problem of solving (1), where A is ill-conditioned and b is imprecisely given, is impossible. We cannot guarantee accuracy. Rather than a single solution, (1) determines an equivalence class, the set of vectors x such that ||Ax - b|| ≤ ε, where ||·|| is some suitable norm. Due to the ill-conditioning of A we possess no way of choosing the vector we want in this equivalence class without some further information concerning x. The function φ represents the use of this additional information. Our thesis is that in any particular analytic or scientific investigation there is always more data concerning x than that present in (1). The challenge is to make use of it.
Let us give two examples which we will expand upon. Suppose that as a result of preliminary studies we know that x ≅ c, a given vector.
Then we can choose

    φ(x) = λ(x - c, x - c)                                         (5)

where λ > 0, and consider the problem of minimizing

    R_1(x,λ) = (Ax - b, Ax - b) + λ(x - c, x - c)                  (6)

Alternatively, we may know that the components of x, the x_i, viewed as functions of the index i, vary "smoothly" with i. By this we mean, for example, that we can use the values of x_1, x_2, . . . , x_{i-1} to obtain a reasonable estimate for the value of x_i. Alternately, we can say that there is a high degree of correlation among the values of the x_i. This is an intrinsic condition, as opposed to the first example, where we have invoked external data. In this case we may take

    φ(x) = λ[(x_2 - x_1)^2 + (x_3 - x_2)^2 + ⋯ + (x_N - x_{N-1})^2]    (7)

λ > 0, and consider the problem of minimizing

    R_2(x,λ) = (Ax - b, Ax - b) + λ[(x_2 - x_1)^2 + ⋯ + (x_N - x_{N-1})^2]    (8)

How to choose λ will be discussed below.
13. The Equation φ(x) = λ(x - c, x - c). Let us consider the minimization of

    R(x) = (Ax - b, Ax - b) + λ(x - c, x - c)                      (1)

where λ > 0. It is easy to see that the minimum value is given by

    x = (A^T A + λI)^{-1}(A^T b + λc)                              (2)

The question arises as to the choice of λ. If λ is "small," we are close to the desired value A^{-1}b; if λ is "small," however, A^T A + λI will still be ill-conditioned and we may not be able to calculate x effectively. How then do we choose λ? In general, there is no satisfactory answer since, as we mentioned above, there is no uniform method of circumventing the instability of L^{-1}. We must rely upon trial and error and experience. In some cases we can apply various analytic techniques to circumvent partially the uncertainty concerning a suitable choice of λ. We shall indicate some of these devices in the exercises.

EXERCISES

1. Show that, for small λ, x(λ) = x_0 + λy_1 + λ^2 y_2 + ⋯, where the y_i are independent of λ.
2. Discuss the possibility of using "deferred passage to the limit" to calculate x_0 from x(λ) calculated using convenient values of λ, that is, set x_0 = 2x(λ) - x(2λ) + O(λ^2), etc.
3. Consider the method of successive approximations

    x_{n+1} = (A^T A + λI)^{-1}(A^T b + λ x_n),    n ≥ 0,    x_0 = c

Show that x_n converges to the solution of Ax = b for any λ > 0 and any c.
4. What is the rate of convergence? How does this rate of convergence affect a choice of λ? Discuss the possibility of using nonlinear extrapolation techniques.
5. Consider an application of dynamic programming to the minimization of R(x) along the following lines. Write
    R_1(x) = λ(x_1 - c_1)^2 + Σ_{i=1}^N (a_{i1} x_1 - b_i)^2

    R_M(x) = [λ(x_1 - c_1)^2 + λ(x_2 - c_2)^2 + ⋯ + λ(x_M - c_M)^2] + Σ_{i=1}^N (Σ_{j=1}^M a_{ij} x_j - b_i)^2

and let b = (b_1, b_2, . . . , b_N). Write

    f_M(b) = min_x R_M(x)

Show that

    f_M(b) = min_{x_M} [λ(x_M - c_M)^2 + f_{M-1}(b - x_M a^{(M)})]

where a^{(M)} = (a_{1M}, a_{2M}, . . . , a_{NM}).
6. Use the foregoing functional equation and the fact that

    f_M(b) = (b, Q_M b) + 2(p_M, b) + r_M

to obtain recurrence relations for Q_M, p_M, r_M.
7. Is there any particular advantage in numbering the unknowns x_1, x_2, . . . , x_N in one order rather than another? (What we are hinting at here is that computing may be regarded as an adaptive control process where the objective is to minimize the total error, or to keep it within acceptable bounds.)
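Formula (13.2) is easy to put on a machine. The sketch below is an illustration only (Python with NumPy); the ill-conditioned test matrix, the prior vector c, and the noise level are arbitrary choices made here, not data from the text.

    import numpy as np

    # Tychonov regularization with the prior x close to c, eq. (13.2):
    #   x(lam) = (A^T A + lam I)^{-1} (A^T b + lam c)
    def tychonov(A, b, c, lam):
        n = A.shape[1]
        return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b + lam * c)

    rng = np.random.default_rng(0)
    n = 10
    A = np.vander(np.linspace(0.05, 0.95, n), increasing=True)   # ill-conditioned test matrix
    x_true = np.sin(np.arange(n))
    b = A @ x_true + 1e-8 * rng.standard_normal(n)               # slightly corrupted data
    c = x_true + 0.1                                             # a rough prior guess

    print(np.linalg.norm(np.linalg.solve(A, b) - x_true))        # naive solution, for comparison
    for lam in (1e-10, 1e-6, 1e-2):
        print(lam, np.linalg.norm(tychonov(A, b, c, lam) - x_true))

Varying lam in this way gives a concrete feeling for the trade-off discussed above between fidelity to the data and reliance on the prior information.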
14. Obtaining an Initial Approximation. One way to obtain an initial approximation as far as the numerical inversion of the Laplace transform is concerned is to begin with a low-order quadrature approximation. Although the accuracy of the results obtained is low, we encounter no difficulty in calculating the solution of the associated linear equations. Suppose, for example, we use initially a five-point Gaussian quadrature (see Fig. 3).

FIG. 3.
Using the Lagrange interpolation formula, we pass a polynomial of degree four through these points, {r_{i5}, g(r_{i5})}, and then use this polynomial to obtain starting values for the values of g at the seven quadrature points for a quadrature formula of degree seven. The seven values constitute our initial vector c in the successive approximation scheme of the exercises of Sec. 13. We can continue in this way, increasing the dimension of the process step-by-step until we obtain sufficient agreement. One measure of agreement would be the simultaneous smallness of the quantities (Ax - b, Ax - b) and (x - c, x - c).
15. Self-consistent φ(x). The numerical inversion of the Laplace transform furnishes us with an example of a case where the x_i vary smoothly with i. Let us then consider the function introduced in (12.7). One interesting approach to the problem of minimizing

    R(x) = (Ax - b, Ax - b) + λ[(x_2 - x_1)^2 + (x_3 - x_2)^2 + ⋯ + (x_N - x_{N-1})^2]    (1)
is by way of dynamic programming. Introduce the sequence of functions

    f_k(z,c) = min_{x_k, . . . , x_N} [Σ_{i=1}^M (Σ_{j=k}^N a_{ij} x_j - z_i)^2 + λ[(x_N - x_{N-1})^2 + ⋯ + (x_k - c)^2]]    (2)

k = 2, 3, . . . , N. Then

    f_k(z,c) = min_{x_k} [λ(x_k - c)^2 + f_{k+1}(z - x_k a^{(k)}, x_k)]    (3)

where a^{(k)} = (a_{1k}, a_{2k}, . . . , a_{Mk}). Setting

    f_k(z,c) = (z, Q_k z) + 2c(p_k, z) + r_k c^2                   (4)

we can, as in Chap. 17, use (3) to obtain recurrence relations for Q_k, p_k, and r_k. Once the quadratic f_2(b - x_1 a^{(1)}, x_1) has been determined in this fashion, we minimize over x_1. We leave the nontrivial details as a set of exercises.
EXERCISE

1. Carry out a similar procedure for the minimization of

    R(x) = (Ax - b, Ax - b) + λ[(x_3 - 2x_2 + x_1)^2 + ⋯ + (x_N - 2x_{N-1} + x_{N-2})^2]
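For numerical experiments the dynamic-programming route is not obligatory: the minimizer of (15.1) also satisfies the linear system (A^T A + λ D^T D)x = A^T b, where D is the matrix of first differences, and the exercise above corresponds to second differences. The sketch below (illustrative Python with NumPy; the test problem is an arbitrary choice made here) solves both variants this way.

    import numpy as np

    # Smoothness-regularized least squares, eq. (15.1):
    #   minimize (Ax - b, Ax - b) + lam * sum of squared differences of x,
    # whose minimizer solves (A^T A + lam D^T D) x = A^T b, D the difference matrix.
    def smooth_solve(A, b, lam, order=1):
        n = A.shape[1]
        D = np.eye(n)
        for _ in range(order):                  # order=1: first differences; 2: second
            D = np.diff(D, axis=0)
        return np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ b)

    # Arbitrary ill-conditioned test problem with a smooth solution.
    n = 12
    A = np.vander(np.linspace(0.1, 0.9, n), increasing=True)
    x_true = np.cos(np.linspace(0.0, 1.0, n))
    b = A @ x_true + 1e-9 * np.random.default_rng(1).standard_normal(n)

    for lam in (1e-10, 1e-6):
        print(lam, np.linalg.norm(smooth_solve(A, b, lam) - x_true))
        print(lam, np.linalg.norm(smooth_solve(A, b, lam, order=2) - x_true))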
16. Nonlinear Equations. Suppose that we attempt to apply the same techniques to nonlinear systems. Consider, as a simple case illustrating the overall idea, the equation

    u' = -u + u^2,    u(0) = c                                     (1)

Take |c| < 1, which ensures the existence of u(t) for t ≥ 0. We have

    s L(u) - c = -L(u) + L(u^2)
    L(u) = c/(s + 1) + L(u^2)/(s + 1)                              (2)

We can approach the problem of calculating L(u) by means of successive approximations, namely

    L(u_0) = c/(s + 1)
    L(u_{n+1}) = c/(s + 1) + L(u_n^2)/(s + 1)                      (3)
If we apply quadrature techniques, we see that we encounter the same coefficient matrix at each stage, which means that we can use the explicit inverse previously obtained. Note that L(u_n^2) is evaluated using the values of u_n at the fixed quadrature points.
A serious difficulty with the foregoing approach, however, is that the convergence is slow. Since the evaluation of L(u_n^2) by means of quadrature can introduce nonnegligible errors, it is plausible that repeated use of the foregoing techniques can lead to considerable and unacceptable error. Let us employ then in place of (3) a quasilinearization technique. Writing

    u_{n+1}^2 ≅ 2 u_n u_{n+1} - u_n^2                              (4)

we have

    L(u'_{n+1}) = -L(u_{n+1}) + 2 L(u_n u_{n+1}) - L(u_n^2)
    (s + 1) L(u_{n+1}) = c + 2 L(u_n u_{n+1}) - L(u_n^2)           (5)
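The quasilinearized scheme (5), combined with the quadrature of Sec. 9, can be tried directly on (1), for which the exact solution u(t) = c/(c + (1 - c)e^t) is available for comparison. The sketch below is an illustration under arbitrary choices of c and N (Python with NumPy); at each stage it solves, for the values U_i = u_{n+1}(t_i) at the quadrature points, the linear system obtained by applying the quadrature approximation to every transform in (5) with s = 1, . . . , N.

    import numpy as np

    # Quasilinearized iteration (5) in transform space: for s = 1, ..., N solve
    #   sum_i w_i r_i^{s-1} [(s + 1) - 2 u_n(t_i)] U_i = c - sum_i w_i r_i^{s-1} u_n(t_i)^2
    # for U_i = u_{n+1}(t_i).  c and N below are arbitrary illustrative choices.
    N, c = 8, 0.4
    x, w0 = np.polynomial.legendre.leggauss(N)
    r, w = (1.0 - x) / 2.0, w0 / 2.0
    t = np.log(1.0 / r)                              # quadrature points in the t variable
    powers = np.vander(r, N, increasing=True).T      # r_i^{s-1}, rows s = 1..N
    s = np.arange(1, N + 1).reshape(-1, 1)

    u = np.full(N, c)                                # crude starting guess u_0(t_i) = c
    for _ in range(6):
        A = powers * w * ((s + 1) - 2.0 * u)         # the matrix changes with u at every stage
        rhs = c - (powers * w) @ (u * u)
        u = np.linalg.solve(A, rhs)

    u_exact = c / (c + (1.0 - c) * np.exp(t))        # closed form of (16.1), for comparison
    print(np.max(np.abs(u - u_exact)))

The fact that the coefficient matrix must be rebuilt, and the new ill-conditioned system solved, at every stage is exactly the circumstance referred to below.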
The advantage of quasilinearization is that it provides quadratic convergence. Now, however, when we employ the quadrature technique described in the preceding pages, we face a different ill-conditioned matrix at each stage of the calculation. This is due to the presence of the term L(u_n u_{n+1}) in (5). It is this which motivates the discussion and utilization of Tychonov regularization.
17. Error Analysis. Let us now critically examine the validity of the foregoing procedures. Essentially what we have been assuming, in treating the integral equation
    ∫_0^1 r^{s-1} g(r) dr = v(s)                                   (1)

as we did, is that we can approximate to g(r) sufficiently accurately by means of a polynomial of degree N, namely

    ||g(r) - p(r)|| ≤ ε                                            (2)
For the moment, let us think in terms of the form

    ||·|| = max_{0≤r≤1} |·|                                        (3)
To see the effect of this approximation, let us write
    ∫_0^1 r^{s-1} p(r) dr = v(s) + ∫_0^1 r^{s-1} (p(r) - g(r)) dr    (4)

Since s = 1, 2, . . . , N, and p(r) is a polynomial of degree N, the term on the left may be written

    Σ_{i=1}^N w_i r_i^{s-1} p(r_i)                                 (5)
Then (4) is equivalent to the system of linear equations

    Σ_{i=1}^N w_i r_i^{s-1} p(r_i) = v(s) + ∫_0^1 r^{s-1} (p(r) - g(r)) dr    (6)

where s = 1, 2, . . . , N. It remains to estimate the discrepancy term. Using (2), we have, crudely,
    |∫_0^1 r^{s-1} (p(r) - g(r)) dr| ≤ ||g(r) - p(r)|| ∫_0^1 r^{s-1} dr ≤ ε/s    (7)
We face a stability problem in comparing the solution of (6), given the estimate in (7), with the solution of

    Σ_{i=1}^N w_i r_i^{s-1} g_i = v(s),    s = 1, 2, . . . , N     (8)

The problem is, as we have indicated, a serious one since (w_i r_i^{s-1}) is ill-conditioned. Nevertheless, it is clear that max_i |g_i - p(r_i)| will be small,
provided that ε is sufficiently small. The question in practice is an operational one, depending upon the relation between the degree of approximation of g by a polynomial of relatively low degree and the magnitude of the coefficients in V^{-1}. In view of the size of these terms (see Sec. 10), it is surprising that the method works so well in practice.
One explanation may lie in the fact that we can estimate the term ∫_0^1 r^{s-1}(p(r) - g(r)) dr in a far sharper fashion than that given in (7). We have

    |∫_0^1 r^{s-1} (p(r) - g(r)) dr| ≤ ∫_0^1 r^{s-1} |p(r) - g(r)| dr
        ≤ (∫_0^1 r^{2s-2} dr)^{1/2} (∫_0^1 (p(r) - g(r))^2 dr)^{1/2}
        ≤ (2s - 1)^{-1/2} (∫_0^1 (p(r) - g(r))^2 dr)^{1/2}         (9)
upon applying the Cauchy-Schwarz inequality. It is well known that for the same degree polynomial we can obtain a considerably better mean-square approximation than that available in the Chebyshev norm.

EXERCISE

1. What determines whether we choose s = 1, 2, . . . , N or s = 1, 1 + a, 1 + 2a, . . . , 1 + (N - 1)a?
MISCELLANEOUS EXERCISES
1. Consider the problem of minimizing the quadratic form

    Q(x) = ∫_0^1 (f(t) - x_0 - x_1 t - ⋯ - x_N t^N)^2 dt
where f(t) is a given function. Carry out some numerical experiments to ascertain whether the matrix of the associated system of linear equations is ill-conditioned or not.
2. Discuss the advantages or disadvantages of replacing 1, t, . . . , t^N by the shifted Legendre polynomials.
3. Show that

    | 1   -M    0   ⋯    0 |
    | 0    1   -M   ⋯    0 |
    | ⋮              ⋱   ⋮ |
    | 0    0    ⋯    1  -M |
    | -m   0    ⋯    0   1 |  = 1 - m M^{n-1}

where n is the order of the determinant.
4. Hence, discuss the stability of solutions of triangular systems subject to small changes in the zero elements below the main diagonal. See
A. M. Ostrowski, On Nearly Triangular Matrices, J. Res. Nat. Bur. Standards, vol. 52, pp. 319-345, 1954.
5. Consider the equation x + λAx = b where A is a positive matrix. Show that we can find a parameter k such that

    x = b + ν b_1 + ν^2 b_2 + ⋯

converges for all λ ≥ 0, where ν = λ/(λ + k).
6. Consider the equation x = y + sAx where s is a scalar parameter. Show that

    x = [y + s(A + k_1 I)y + s^2(A^2 + k_1 A + k_2 I)y + ⋯] / [1 + k_1 s + k_2 s^2 + ⋯]

is a solution for any sequence of values k_1, k_2, . . . . What different sets of criteria are there for choosing various sets of values?
7. Consider the solution of x = y + Ax by means of continued fractions using the fact that A^n x = A^n y + A^{n+1} x. See Bellman and Richardson,¹ Brown,² and Drobnies.³
¹ R. Bellman and J. M. Richardson, A New Formalism in Perturbation Theory Using Continued Fractions, Proc. Nat. Acad. Sci. USA, vol. 48, pp. 1913-1915, 1962.
² T. A. Brown, Analysis of a New Formalism in Perturbation Theory Using Continued Fractions, Proc. Nat. Acad. Sci. USA, vol. 50, pp. 598-601, 1963.
³ S. I. Drobnies, Analysis of a Family of Fraction Expansions in Linear Analysis, J. Math. Anal. Appl., to appear in 1970.
Bibliography and Discussion
§1. For a detailed discussion of the Laplace transform and various inversion techniques, see
G. Doetsch, Handbuch der Laplace-Transformation (3 volumes), Basel, 1950-1956.
R. Bellman and K. L. Cooke, Differential-difference Equations, Academic Press Inc., New York, 1963.
For some interesting and detailed accounts of the computational solution of Ax = b, see
J. Todd, The Problem of Error in Digital Computation, in L. B. Rall (ed.), Error in Digital Computation, vol. 1, pp. 3-41, 1965.
G. E. Forsythe, Today's Computational Methods of Linear Algebra, SIAM Review, vol. 9, pp. 489-515, 1967.
A. S. Householder, Theory of Matrices in Numerical Analysis, Blaisdell Publishing Company, New York, 1964.
A. M. Ostrowski, Solution of Equations and Systems of Equations, Academic Press Inc., New York, 1960.
§§2 to 4. See the books cited above for detailed discussion of these three classes of equations.
§7. For a discussion of quadrature techniques and for further discussion of most of the results of this chapter, see
R. Bellman, R. Kalaba, and J. Lockett, Numerical Inversion of the Laplace Transform, American Elsevier Publishing Company, Inc., New York, 1966.
For a different approach, see
L. Weiss and R. N. McDonough, Prony's Method, Z-transforms and Pade Approximation, SIAM Review, vol. 5, pp. 145-149, 1963.
§10. For extensive numerical applications of the foregoing techniques, see the book cited immediately above. For a careful discussion of the ill-conditioning of the Vandermonde matrix and the impossibility of completely circumventing it, see
W. Gautschi, On Inverses of Vandermonde and Confluent Vandermonde Matrices, Numer. Math., vol. 4, pp. 117-123, 1962.
W. Gautschi, On the Conditioning of a Matrix Arising in the Numerical Inversion of the Laplace Transform, Math. Comp., to appear.
§11. See
J. Todd, On Condition Numbers, Programmation en Mathematiques Numeriques, Colloques Internationaux du Centre National de la Recherche Scientifique, no. 165, pp. 141-159, 1968.
A novel approach to solving Ax = b where A is ill-conditioned is given in
Ju. V. Vorobev, A Random Iteration Process, II, Z. Vycisl. Mat. i Mat. Fiz., vol. 5, pp. 787-795, 1965. (Russian.)
J. N. Franklin, Well-posed Stochastic Extensions of Ill-posed Linear Problems, Programming Rept. 135, California Institute of Technology, 1969.
§12. See the books
R. Lattes and J. L. Lions, Theory and Application of the Method of Quasi-reversibility, American Elsevier Publishing Company, Inc., New York, 1969.
M. M. Lavrentiev, Some Improperly Posed Problems of Mathematical Physics, Springer-Verlag, Berlin, 1967.
for a general discussion of "regularization," one of the important concepts of modern analysis. See the book cited in Sec. 7 for all that follows.
§13. See
R. S. Lehman, Dynamic Programming and Gaussian Elimination, J. Math. Anal. Appl., vol. 5, pp. 1-16, 1962.
§16. For a detailed discussion of quasilinearization, see
R. Bellman and R. Kalaba, Quasilinearization and Nonlinear Boundary-value Problems, American Elsevier Publishing Company, Inc., New York, 1965.
Appendix A
Linear Equations and Rank

1. Introduction. In this appendix, we wish to recall some elementary results concerning determinants and the solution of linear systems of equations of the form

    Σ_{j=1}^N a_{ij} x_j = c_i,    i = 1, 2, . . . , N             (1)

by their means. The coefficients a_{ij} and c_i are assumed to be complex numbers. Using these results, we wish to determine when a system of homogeneous linear equations of the form

    Σ_{j=1}^N a_{ij} x_j = 0,    i = 1, 2, . . . , M               (2)
possesses a solution distinct from x_1 = x_2 = ⋯ = x_N = 0, a solution we shall call the trivial solution.
For our purposes in this volume, the most important of these linear systems is that where M = N. However, we shall treat this case in the course of studying the general problem.
Finally, we shall introduce a function of the coefficients in (2) called the rank and study some of its elementary properties.
2. Determinants. We shall assume that the reader is acquainted with the elementary facts about determinants and their relation to the solution of linear systems of equations. The two following results will be required.
Lemma 1. A determinant with two rows or two columns equal has the value zero.
This result is a consequence of Lemma 2.
Lemma 2. The determinant

    |a_{ij}| = | a_11  a_12  ⋯  a_1N |
               | a_21  a_22  ⋯  a_2N |
               | ⋮                ⋮  |
               | a_N1  a_N2  ⋯  a_NN |                             (1)

may be expanded in the elements of the first row,

    |a_{ij}| = a_11 A_11 + a_12 A_12 + ⋯ + a_1N A_1N               (2)

where A_{ij} is (-1)^{i+j} times the determinant of order N - 1 formed by striking out the ith row and jth column of |a_{ij}|. The quantity A_{ij} is called the cofactor of a_{ij}.
3. A Property of Cofactors. Combining Lemmas 1 and 2, we see that the cofactors A_{1j}, j = 1, 2, . . . , N, satisfy the following system of linear equations
    Σ_{j=1}^N a_{ij} A_{1j} = 0,    i = 2, . . . , N               (1)
since this expresses the fact that the determinant with first and ith rows equal is zero.
4. Cramer's Rule. If |a_{ij}| ≠ 0, we may express the solution of (1.1) in the form

          | c_1  a_12  ⋯  a_1N |
          | c_2  a_22  ⋯  a_2N |
          | ⋮               ⋮  |
          | c_N  a_N2  ⋯  a_NN |
    x_1 = -----------------------                                  (1)
                |a_{ij}|
with x_i given by a similar expression with the ith column of |a_{ij}| replaced by the column of coefficients c_1, c_2, . . . , c_N.
5. Homogeneous Systems. Let us now use the foregoing facts to establish the following result.
Theorem 1. The system of linear equations

    Σ_{j=1}^N a_{ij} x_j = 0,    i = 1, 2, . . . , M               (1)

possesses a nontrivial solution if N > M.
If N ≤ M, a necessary and sufficient condition that there be a nontrivial solution is that every N × N determinant |a_{ij}| formed from N rows be zero. Furthermore, we can find a nontrivial solution in which each of the x_i are polynomials in the a_{ij}.
The proof will be by way of a long, but elementary, induction over M.
Proof. Let us begin with the case M = 1 of the first statement where the statement is clearly true. Next, consider the first statement for M = 2. If N = 2, we have the equations

    a_11 x_1 + a_12 x_2 = 0
    a_21 x_1 + a_22 x_2 = 0                                        (2)

which, if |a_{ij}| = 0, have the solutions

    x_1 = a_22,    x_2 = -a_21                                     (3)

If this is a trivial solution, that is, x_1 = x_2 = 0, let us use the cofactors of the coefficients in the second equation,

    x_1 = -a_12,    x_2 = a_11                                     (4)

If both of these solutions are trivial, then all the coefficients a_{ij} are zero, which means that any set of values x_1, x_2 constitute a solution.
If N > 2, we have a set of equations

    a_11 x_1 + a_12 x_2 + ⋯ + a_1N x_N = 0
    a_21 x_1 + a_22 x_2 + ⋯ + a_2N x_N = 0                         (5)
exists, we can carry out the program of solving for Xi and XJ in terms of the remaining variables. Suppose, however, that all such 2 X 2 matrices are zero. We assert then that one of the equations in (5) is redundant, in the sense that any set of x,-values satisfying one equation necessarily satisfies the other. Specifically, we wish to show that there exist two parameters ~l and A2,
one at least of which is nonzero, such that

    λ_1(a_11 x_1 + a_12 x_2 + ⋯ + a_1N x_N) + λ_2(a_21 x_1 + a_22 x_2 + ⋯ + a_2N x_N) = 0    (8)

identically in the x_i. If λ_1 ≠ 0, it is clear that it is sufficient to satisfy the second equation in (5) in order to satisfy the first. We have thus reduced the problem of solving two equations in N unknowns to that of one equation in N unknowns.
The equation in (8) is equivalent to the statement that the following equations hold:

    λ_1 a_11 + λ_2 a_21 = 0
    λ_1 a_12 + λ_2 a_22 = 0
    . . . . . . . . . . .                                          (9)
    λ_1 a_1N + λ_2 a_2N = 0

Consider the first two equations:

    λ_1 a_11 + λ_2 a_21 = 0
    λ_1 a_12 + λ_2 a_22 = 0                                        (10)

These possess a nontrivial solution since we have assumed that we are in the case where all 2 × 2 determinants are equal to zero. Without loss of generality, assume that a_11 or a_12 is different from zero, since the stated result is trivially true for both coefficients of λ_1 equal to zero; take a_11 ≠ 0. Then

    λ_1 = a_21,    λ_2 = -a_11                                     (11)

is a nontrivial solution. This solution, however, must satisfy the remaining equations in (9) since our assumption has been that all 2 × 2 determinants of the type appearing in (7) are zero.
The interesting fact to observe is that the proof of the first statement, involving N × 2 systems, has been made to depend upon the second statement involving 2 × N systems. We have gone into great detail in this simple case since the pattern of argumentation is precisely the same in the general N × M case.
Let us establish the general result by means of an induction on M. If we have the M equations
    Σ_{j=1}^N a_{ij} x_j = 0,    i = 1, 2, . . . , M               (12)
with N > M, we attempt to find nontrivial solutions by solving for M of the x_i in terms of the remaining N - M variables. We can accomplish this if one of the M × M determinants formed from M different columns in the matrix (a_{ij}) is nonzero.
Suppose, however, that all such M × M determinants are zero. Then we assert that we can find quantities y_1, y_2, . . . , y_M, not all equal to zero, such that

    y_1(Σ_{j=1}^N a_{1j} x_j) + y_2(Σ_{j=1}^N a_{2j} x_j) + ⋯ + y_M(Σ_{j=1}^N a_{Mj} x_j) = 0    (13)
identically in the x_i. Suppose, for the moment, that we have established this, and let, without loss of generality, y_1 ≠ 0. Then, whenever the equations

    Σ_{j=1}^N a_{ij} x_j = 0,    i = 2, 3, . . . , M               (14)

are satisfied, the first equation Σ_{j=1}^N a_{1j} x_j = 0 will also be satisfied. We
see then that we have reduced the problem for M equations to the problem for M - 1 equations.
It remains, therefore, to establish (13), which is equivalent to the equations

    y_1 a_11 + y_2 a_21 + ⋯ + y_M a_M1 = 0
    y_1 a_12 + y_2 a_22 + ⋯ + y_M a_M2 = 0
    . . . . . . . . . . . . . . . . . .                            (15)
    y_1 a_1N + y_2 a_2N + ⋯ + y_M a_MN = 0

This is the type of system of linear equations appearing in the second statement of Theorem 1. Again, we use an obvious technique to find nontrivial solutions. Consider the first M equations. Choose y_1 to be the cofactor of a_11 in the determinantal expansion, y_2 to be the cofactor of a_21, and so on. These values must satisfy all of the other equations, since our assumption has been that all M × M determinants are zero. Hence, if some cofactor A_{i1} is nonzero, we have our desired nontrivial solution. If all the A_{i1} are zero, we consider another set of M equations and proceed similarly. This procedure can fail only if all (M - 1) × (M - 1) determinants formed by striking out one column and N - M + 1 rows of (a_{ji}) are zero.
In that case, we assert that we can set y_M ≡ 0 and still obtain a nontrivial set of solutions for y_1, y_2, . . . , y_{M-1}. For, the remaining system is now precisely of the type described by the second statement of Theorem 1 with M replaced by M - 1. Consequently, in view of our inductive hypothesis, there does exist a nontrivial solution of these equations. This completes the proof.
6. Rank. Examining the preceding proof, we see that an important role is played by the nonzero determinants of highest order, say r, formed by striking out N - r columns and M - r rows from the N × M matrix A = (a_{ij}). This quantity r is called the rank of A. As far as square matrices are concerned, the most important result is Theorem 2.
Theorem 2. If T is nonsingular, the rank of TA is the same as the rank of A.
Proof. If A has rank r, we can express N - r columns as linear combinations of r columns, which are linearly independent. Let these r columns be denoted by a^1, a^2, . . . , a^r, and without loss of generality, let A be written in the form
= (a l a2
•••
ar(cual
+ c12a2 + + clrar)
. (cN_I,la l
+ ... + CN-f'.ra » r
(1)
If the rows of Tare t l , t 2, ••• , tN, we have (t l ,a l )(t l ,a 2)
•••
(t l,ar)[cll(tt,a l)
+ c12(tt,a + ,. + Clr(tl,ar)J 2
l
2
(t2,a )(t ,a 2) B
=
TA
=
"
)
(t ,a )[cl1(t 2,a l ) + c12(t 2,a 2) + 2
r
+ cl (t2,a ») r
r
(2)
Hence the rank of B is at most r. But, since we can write A = T-IB, it follows that if the rank of B were less than r, the rank of A would have to be less than r, a contradiction. EXERCISES 1. istic I. 8. root
What is the connection between the rank of A and the number of zero characterroots? Prove the invariance of rank by means of the differential equation d'Z/dt = A'Z. The rank of A - >'d is N - k where k is the multiplicity of the characteristic
>.,.
Linear Equations and Rank
377
'1. Rank of a Quadratic Form. The rank of a quadratic form (x,Ax) is defined to be the rank of the matrix A. It follows that the rank of (x,Ax) is preserved under a nonsingular transformation x = Ty. 8. Law of Inertia (Jacobi-Sylvester). If (x,Ax) where A is real is transformed by means of x = Ty, T nonsingular and real, into a sum of squares (x,Ax)
, = )
b,y.2
(1)
"':"1 where none of the b. are zero, it follows that r must be the rank of A. In addition, we have Theorem 3. Theorem 3. If a real quadratic form of rank r, (x,Ax), is transformed by two real nonsingular transformations x = Ty, x = Sz, into a sum of r squares of the form (x,Ax) = b1yl 2 + b2y22
+ ... + b,y,1
= CIZ\2 + C2Z2 2 + . . . + C.z,1
(2)
then the number of positive b. is equal to the number of positive Ci. 9. Signature. Let p(A) denote the number of positive coefficients in the foregoing representation and n(A) the number of negative coefficients. The quantity p(A) - n(A) is called the signature of A, s(A). It follows
from the above considerations that s(A) = s(TAT')
(1)
if Tis nonsingular. EXERCISES 1. Show that r(AB)
+ r(BC)
~
r(B)
+ r(ABC),
whenever ABC exists (G. Fro-
beniu8).
i. Derive the Jacobi-Stieltjes result from this.
MISCELLANEOUS EXERCISES 1. For any N X N matrix A ... (ali) of rank r, we have the relations
Whenever % occurs, we agree to interpret it as O.
(KIJ Fan-A. J. Hoffman)
378
Introduction to Matrix Analysis
In the paper by Ky Fan and A. J. Hoffman,! there is a discussion of the problem of fiDdiJlg lower bounds for the rank of A directly in terms of the elements a'i, and the relation between this problem and that of determining estimates for the location of characteristic values. I. For complex A, write A = B + ie, where B = (A + A) /2, C = - (A - A)i/2. Then if A is Hermitian semidefinite, the rank of B is not less than the rank of C (Broeder-Smith). 8. If A is Hermitian semidefinite, the rank of B is not less than the rank of A (Broeder-Smith). '- Let A be a complex matrix, and write A = S + iQ, where S - (A + A)/2, Q - -i(A - A)/2. If A is Hermitian positive-indefinite, then reS) ~ r(Q) and reS) ~ rCA).
Bibliography and Discussion An excellent discussion of the many fascinating questions arising in the course of the apparently routine solution of linear systems is given in G. E. Forsythe, Solving Linear Algebraic Equations Can Be Interesting, Bull. Am. Math. Soc., vol. 59, pp. 299-329, 1953. See also the interesting article C. Lanczos, Linear Systems in Self-adjoint Form, Am. Math. Monthly, vol. 65, pp. 665-679, 1958. For an interesting discussion of the connection between rank and a topological invariant, see D. G. Bourgin, Quadratic Forms, Bull. Am. Math. Soc., vol. 51, pp. 907-908, 1945. For some further results concerning rank, see
R. Oldenburger, Expansions of Quadratic Forms, Bull. Am. Math. Soc., vol. 49, pp. 136-141, 1943. For a generalization of linear dependence, see H. C. Thatcher, Jr., Generalization of Concepts Related to Linear Dependence, J. Ind. Appl. Math., vol. 6, pp. 288-300,1958. G. Whaples, A Note on Degree-n Independence, ibid., pp. 300--301. 1 Ky Fan and A. J. Hoffman, Lower Bounds for the Rank and Location of the Eigenvalues of a Matrix, Contributions to the Solution of Systems of Linear Equations and the Determination oj Eigenvalues, Natl. Bur. Standards Applied Math., ser. 39, 1954.
Appendix B
The Quadratic Form of Selberg

As another rather unexpected use of quadratic forms, let us consider the following method of Selberg, which in many parts of number theory replaces the sieve method of Brun. Suppose that we have a finite sequence of integers {a_k}, k = 1, 2, . . . , N, and we wish to determine an upper bound for the number of the a_k which are not divisible by any prime p ≤ z. Let {x_ν}, 1 ≤ ν ≤ z, be a sequence of real numbers such that x_1 = 1, while the other x_ν are arbitrary. Consider the quadratic form
Q(x) =
Whenever
ak
if (~. X.)2
(1)
I
is not divisible by a prime ~ z, the sum )
Hence, if we denote the number of any prime p ~ z by f(N,z) we have f(N,z)
~
ak,
k
wf:.
X.
= 1, 2, . . . ,
yields XI
=
1.
N, divisible by
N
I (I_I'" X.)2
(2)
• ~I
for all x. with
XI
= 1.
Hence (3)
To evaluate the right-hand side, we write (4)
where a. runs over the sequence ai, a2, . . . ,aN. greatest common divisor of VI and V2. 379
Here K denotes the
380
Introduction to Matrix Analysis
Now suppose that the sequence Iak I possesses sufficient regularity properties so that there exists a formula of the type
L
pi'"
.~1.2 •...
N
+ R(p)
1 = f(p)
(5)
.N
where R p is a remainder term and the I'density function" f(p) is multiplicative, i.e., (6)
Then (7)
Using this formula in (4), the result is
f(N,z)
~N
L 7(::if(~~) + L VI,VI~'
xv,x••R
(V~2)
(8)
VI,VI~'
Consider now the problem of determining the x., 2 the quadratic form
~
v
~
z, for which (9)
is a minimum. To do this, we introduce the function fl(n)
=
I lJ(d)f(n/d)
(10)
dl"
If n is squarefree, we have /l(n)
=
f(n)
n(
1-
f(~»)
(11)
pi"
Using the Mobius inversion formula, (10) yields f(K) = ') fl(n) =
"ftc
l
fl(n)
(12)
"Iv. "I••
Using this expression for f(K) in (9), we see that P(x)
~ ~ f,(n) {~/e;)}'
(13)
The Quadratic Form of Selberg Now perform a change of variable.
~
381
Set Xv
Y.. = ~ f(v)
(14)
IIlv
.~.
Then, using the Mobius inversion formula once more, we have (15)
The problem is thus that of minimizing (16) over all I Yk I such that (17)
It is easily seen that the minimizing IYk I is given by (18)
and that the minimum value of the form P is 1
The corresponding A.. are determined by the relations
(19)
382
Introduction to Matrix Analysis
+"
Inserting these values in (8), we obtain the inequality f(N,z)
~
r
,:5,-
N
pJ(p) h(p)
~
1~••~.Jl(v,v,jK)1
(21)
III,II.:=;_
If z is chosen properly, say NJA-·, the upper bound is nontrivial. A particularly simple case is that where a. EI k, so that f(p) == p. We then obtain a bound for the number of primes less than or equal to N.l
Bibliography and Discussion For a treatment of the Selberg form by means of dynamic programming, see R. Bellman, Dyuamic Programming and the Quadratic Form of Selberg, J. Math. Anal, Appl., vol, 15, pp. 3~32, 1966. I
A. Selberg, On an Elementary Method in the Theory of Primes, Kgl. Nor.k.
Videnlkab. 8eukab. Fork., Bd 19, Nr. 18, 1947.
Appendix C A Method of Hermite Consider the polynomial p(x) with real coefficients and, for the moment, unequal real roots rl, r2, • . . , rN, and suppose that we wish to determine the number of roots greater than a given real quantity al. Let . (1) where the
Xi
are real quantities and Q(x) denotes the quadratic form N
Q(x) =
l
(2)
yN(ri - al)
i~
1
It now follows immediately from the "law of inertia" that if Q(x) is reduced to another sum of squares in any fashion by means of a. real transformation, then the number of terms with positive coefficients will be equal to the number of real roots of p(x) = 0 greater than al. This does not seem to be a constructive result immediately, since the coefficients in Q(x) depend upon the unknown roots. Observe, however, that Q(x) is a. symmetric rational function of the roots and hence can be expressed as a rational function of the known coefficients of p(x). The more interesting case is that where p(x) has complex roots. This was handled by Hermite in the following fashion. Let rl and r2 be conjugate complex roots and write rl = p(cos a YI = U iv
+
where
Ui
and 1
rl - al
Vi
are real.
= r(cos cP
+ i sin a)
r2 = p(cos a - i sin a)
Y2 = U + iv
(3)
Similarly, set
+i
1 r2 - al
sin cP)
(
.) cP - ~.81D cP
(4)
+ v cos cP/2)1
(5)
- - - = r cos
It follows that yl2 rl - at
+ r2 Y2-
2 al
=
2r(u cos cP/2 - V sin cP/2)2 - 2r(u sin cP/2 383
384
Introduction to Matrix Analysis
From this, Hermite's result may be concluded:
Theorem. Let the polynomial p(x) have real coejficients and unequal roots. Let N
Q(x)
=
l
yN(ri - al)
(6)
a-I be reduced to a sum of squares by means of a real nonsingular substitution. Then the number of squares with positive coeJficients will be equal to the number of imaginary roots of the equation p(x) = 0 plus the number of real roots greater than al. On the other hand, the result in Sec. 3 of Chap. 5 also yields a method for determining the number of roots in given intervals by way of Sturmian sequences. The connection between these two approaches and some results of Sylvester is discussed' in detail in the book on the theory of equations by Burnside and Panton to which we have already referred, and whose presentation we have been following here. An extension of the Hermite method was used by Hurwitz l in his classic paper on the determination of necessary and sufficient conditions that a polynomial have all its roots with negative real parts. Further references and other techniques may be found in Bellman, Glicksberg, and Gross.· What we have wished to indicate in this appendix, and in Appendix B devoted to an account of a technique due to Selberg, is the great utility of quadratic forms as Hcounting devices." Another application of this method was given by Hermite' himself in connection with the reduction of quadratic forms. I A. Hurwitz, Uber die Bedingungen unter welchen eine Gleichung nur Wurzeln mit negativen reellen Teilen besitzt, Math. Ann., vol. 46,1895 (Werke, vol. 2, pp. 533-545). E. J. Routh, The Advanced Part oj a Treati8e on the Dynamics oj a SY8tem oj Rigid Bodin, Macmillan &: Co., Ltd., pp. 223-231, London, 1905. • R. Bellman, I. Glick8berg, and O. Gross, S01M A8pects oj the Mathematical Theory oj Control Prou88u, The RAND Corporation, Rept. R-313, January 16, 1958. I C. Hermite, Oeuvre8, pp. 284-287, pp. 397-414; J. Crelle, Bd. LII, pp. 39-51, 1856.
Appendix D Moments and Quadratic Forms 1. Introduction. The level of this appendix will be somewhat higher than its neighbors, since we shall assume that the reader understands what is meant by a Riemann-Stieltjes integral. Furthermore, to complete both of the proofs below, we shall employ the Helly selection theorem. 2. A Device of Stieltjes. Consider the sequence of quantities (ak I, k = 0, 1, . . . ,defined by the integrals ak =
10
1
tk dg(t)
(1)
where get) is a monotone increasing function over [0,1}, with g(l) = 1, g(O) = 0. These quantities are called the moments of dg. It was pointed out by Stieltjes that the quadratic forms N
QN(X)
l
=
ak+lXkXl
=
k,l=O
1
N
(l
Xkt k
)2 dg(t)
~2)
k=O
are necessarily non-negative definite. AN = (ak+Z)
10
Let us call the matrices
k, l = 0, 1, . . . ,N
(3)
Hankel matrices. The criteria we have developed in Chap. 5, Sec. 2, then yield a number of interesting inequalities such as
10 10
1
k
t dg 1
tk +1 dg
101 tk+l dg 10 t + dg 1
k 21
~O
(4)
with strict inequality if g has enough points of increase. If, in place of the moments in (1), we employ the trigonometric moments Ck
(1
.
= Jo e2...kl dg
k = 385
0, ± 1, ±2,
(5)
886
Introduction to Matrix Analysis
then the quadratic form obtained from (6) is called the Toeplitz form, and the corresponding matrix a Toeplitz matrix. It follows from these considerations that a necessary condition that a sequence (Cll) be derived from (5) for some monotone increasing function g(t), normalized by the condition that g(l) - g(O) = 1, is that
(7) be a non-negative definite Hermitian matrix. It is natural to ask whether or not the condition is sufficient. As we shall see, in one case it is; in the other, it is not. A Technique of E. Fischer. Let us consider an infinite sequence of real quantities (a-) satisfying the condition that for each N, the quadratic form
a.
N
QN(X)
= .~
",-0
a~jXiXi
(1)
is positive definite. In this case we wish to establish Theorem 1. Theorem 1. If tM 2N 1 real quantities ao, ai, ••• , aIA' are BUch t/uJt QN(X) is positive definite, then tMY can be represented in tM form
+
k
= 0, 1, • . . , 2N
(2)
where b. > 0 and tM r i are real and different. Proof. Consider the associated quadratic form N
PN(X)
=
l
al+i+J"XiXi
(3)
i.i-O
Since, by assumption, QN(X) is positive definite, we can find a simultaneous representation N
QN(X)
=
L
(mjoXO
+
+ 'mjNXN)1
i-O N
PN(X) =
Lr.(mjoXO + ... + miNxN)1 i-O
where the r, are real.
(4)
Moments and Quadratic Forms
387
The expression for QN(X) yields l!;Hi+I)
=
mOimO,j+1
+ ... + mNimN,HI
i = 0, 1, . . . ,N, j = 0, . . . ,N - 1 (5)
On the other hand, from the equation for PN(x) we obtain = rOmOimOj
al+l+j
+ . . . + rNmNimNj
(6)
Combining (5) and (6), we obtain the system of linear homogeneous equations rOm OJ)
mOi(mO,j+l -
+ . . . + mNi(mN,j+l
for j = 0, I, . . . ,N. Since QN(X) is positive definite, the determinant Hence, (7) yields the relations or
mk,j
k
= rkmk,j
mk,j+l
=
ll::
-
rNmN,j) = 0
(7)
Imiil must be nonzero.
0, ... ,N
rkimk,O
(8) (9)
It follows that we can write QN(X) in the form N
QN(X)
l
Ie:
mj0 2(xo
+ rixi + ... + rl NxN)2
(10)
j-O
from which the desired representation follows. Since QN(X) is positive definite, ri y6 rj for i y6 j. 4. Representation as Moments. Using Theorem 1, it is an easy step to demonstrate Theorem 2. Theorem 2. If a real sequence {a.. } satisfies the two conditions N
l
l!;+jXiXj positive definite for all N
(la)
i,j-O N
~
(ai+j -
l!;+j+I)XiXj positive definite for all N
(lb)
',J -0
it is a moment sequence, i.e., there exists a bounded monotone increasing function g(t) defined over [ -1,1} such that a..
= J~l t.. dg(t)
(2)
Proof. The representation of Theorem 1, combined with condition (lb) above, shows that Ir.1 ~ 1 for i = 0, 1, . . . ,N. Hence, we can find a monotone increasing function gN(t), bounded by ao, such that
n
= 0,1, . . . ,N
(3)
Introduction to Matrix Analysis
388
To complete the proof, we invoke the Helly selection theorem which assures us that we can find a monotone increasing function g(t), bounded byao, with the property that lim u......
fit" dg M(t) = fit .. dg(t) -I
-I
(4)
as M runs through some sequence of the N's. G. A Result of Herglotz. Using the same techniques, it is not difficult to establish the following result of Herglotz. Theorem 3. A necessary and sujficient condition that a sequence of complex numbers (a,,), n = 0, ± 1, . . . ,with a_" = a", have tM property that N
N
l L a"_kx~k ~ 0
(1)
A-I ' - I
for all complex numbers (x.. 1 and all N is that there exist a real monotone nondeereasing function u(y), of bounded variation such that (2"
a.. = Jo ei " l1 du(y)
(2)
Fischer's method is contained in his paper,l
Bibliography and Discussion Quadratic forms of the type
l
~-jXiXi are called L forms after the
i,i-O
connection discovered between these forms and the Laurent expansion of analytic functions and thus the connection with Fourier series, in general. They play an extremely important role in analysis. See, for example, the following papers: O. Szasz, tlber harmonische Funktionen und L-formen, Math. Z., vol. 1, pp. 149-162, 1918. O. Szasz, tlber Potenzreihen und Bilinearformen, Math. Z., vol. 4, pp. 163-176, 1919.
L. Fejer, tlber trigonometrische Polynome, J. Math., vol. 146, pp. 53-82, 1915. 1 E. FillCher, 'Uber die CaratheodoryllChe Problem, Potenzreihen mit positivem reellen Teil betreft'end, Rend. eire. mat. Pakrmo, vol. 32, pp. 240-256, 1911.
Indexes
Name Index Mriat, s. N., 89, 106, 244 Albert, A. A., 30 Alexandroff, P., 310 Amir-Mocz, A. R., 123 Anderson, L. E., 125 Anderson, T. W., 110,284 Angel, E. S., 333n. Anke, K., 260 Antosiewicz, H., 259 Aoki, M., 336 Arley, N., 186, 279 Aronszajn, N., 123 Arrow, K. J., 260, 3().\l Atkinson, F. V., 72, 226n.
Boltyanskii, V. G., 334 Borchsenius, A., 186 Borel, E., 354 Borg, G., 105n., 125 Bourgin, D. G., 378 Brauer, A., 41, 142, 304, 310, 312 Braver, A., 226 Brenner, J. L., 230 Brock, P., 258 Brown, R. D., 71 Brown, T., 352 Brown, T. A., 310, 368n. Browne, E. T., 142 Broyden, C. G., 337 Brun, V., 379 Buckner, H., 260 Burchnall, J. L., 27 Burington, R. S., 305 Burman, J. P., 29 Burnside, W. S., 88
Bailey, P. B., 3iiO Baker, H. F., ts.'> Barnett, S., 25Rn. Barrett, J. H., 184, 228 Bartlett, M. S., 279 Bass, R. W., 260 Bassett, I. M., 71 Calabi, E., 88 Bateman, H., 123 Caratheodory, A., 389 Bauer, F. L., 227 Carlson, D., 279n. Baxter, G., 226 Cartan, E., 246 Beckenbach, E. F., 139, 140, 187, 389 Caspary, F., 26, 109 Beckman, M. J., 277n. Cauchy, A., 134n. Bellman, R., 29, 110, 111, 124, 125, 138- Causey, R. L., 227n. 142, 160-162, 182n., 183n., 184, 186- Cayley, A., 82, 107 188, 227, 230, 246, 247, 259, 261, 2RO, Chen, Kuo-Tsai, 185n. 284, 285, 303n., 310, 312-314, 334-336, Chen, Y. W., 71 Ciarlet, P. G., 311 347, 350-352, 368n., 369, 384, 389 Coddington, E. A., 259 Bendat, J., 111 Bendixson, I., 224 Collar, A. R., 229 Collatz, L., 124, 307n., 312 Berge, C., 162 Bergstrf/lm, H., 141 Conti, R., 229 Berlin, T. H., 248 Cooke, K. L., 162, 230, a51, 369 Bickley, W. G., 123 Courant, R., 122 Birkhoff, G. A., 30, 277n., 306, 309, 310, Craig, H., 101 Cremer, H., 260 351 Blackwell, D., 307n. Bocher, M., BOn., 247 Bochner, S., 110, 314 Dahlquist, G., 335 Bohm, C., 159 Danskin, J. M., 303, 312 Bohnenblust, H., 303, 309, 312 Dantzig, G. B., 313 393
Dean, P., 161 Debreu, G., 310 DeBruijn, N. G., 142, 221n., 248 Deemer, W. L., 29 DeMur, R. E., 307 Diaz, J. B., 123 Diliberto, S., 220, 257 Dines, L. L., 89 Djokovic, D. J., 68 Dobsch, R., 111 Doetsch, G., 369 Dolph, C. L., 71, 161 Drazin, M. P., 72 Dreyfus, S., 337 Drobnies, S. I., 368". Duffin, R. J., 111, 124, 138, 189 Duncan, W. J., 229 Dunford, N., 43 Dwyer, P. S., 107 Dyson, F. J., 284 Easterfield, T. E., 186 Eddington, A. E., 225 Effertz, F. H., 111, 260 Eichler, M., 181n. Englman, R., 285 Enthoven, A. C., 260 Fan, Ky, 67, 70, 71, 123, 126, 131, 132, 136, 137, 139, 143, 186, 225, 304, 306, 307, 310, 313, 377, 378 Faris, W. G., 181n. Feingold, D. G., 230 Fejer, L., 103, 109, 388, 389 Feller, W., 279 Fer, F., 175n., 211 Feynman, R. P., 186 Ficken, F. A., 31 Finsler, P., 89 Fischer, E., 123, 140, 386, 388n. Folkman, 0., 258 Ford, F., 337 Forsyth, A. R., 103 Forsythe, G. E., 70, 369, 378 Foukes, H. 0., 30 Frame, J. S., 100 Frank, E., 261 Franklin, J. N., 370 Frazer, R. A., 229 Fr~chet, M., 280 Freiman, G. A., 261 Freimer, M., 162 Friedman, B., 30 Frobenius, G., 107, 309 Fueter, R., 107
Fuller, A. T., 258n., 260 Furstenberg, H., 285 Gaddum, J. W., 89 Gamkrelidze, R. V., 334 Gantmacher, V., 188, 314 Garding, L., 182 Gasymov, G., 125 Gautschi, W., 369, 370 Gelfand, I. M., 125 Gentry, I. C., 226 Geronimus, J., 66 Gersgorin, S., 109 Girshick, M. A., 110 Givens, W., 261 Glicksberg, I., 124, 141, 187, 384n. Goldberg, K., 18!) Golden, H., 138 Goldstine, H., 108, 186 Golub, G. H., 70 Good, I. J., 88 Good, R. A., 313 Graybill, F. A., 68 Green, H. S., 71 Greenspan, D., 108 Grenander, U., 87, 245, 284 Greub, W., 13Rn. Groebner, W., 72 Gross, 0., 124, 141, 187, 384n. Guderley, K. G., 181 Gunthard, H., 188 Hadamard, J., 140, 362 Hahn, W., 247 Hammersley, J. M., 138n. Hancock, H., 11 Hardy, G. H., 70, 139 Harris, T. E., 308 Hartman, P., 309 Hausdorff, F., 185, 224n. Haynsworth, E. V., 143,280, 305 Hazenbroek, P., 260 Heap, B. R., 311 Hearon, J. Z., 306n., 307, 310, 311 Heinz, E., 247 lIelson, H., 230 Hermite, C., 383, 384n. Herstein, I. N., 77, 307, 310 Herz, C. S., 110 Hestenes, M. R., 31, 108, 229, 334 Hickerson, N., 311 Hilbert, D., 43 Hilbert, 0., 122 Hille, E., 184, 185, 351 Hirsch, K. A., 224, 227
Name Index Hottman, A. J., 143, 186,224, 227n., 377, 378 Hoffman, W. A., 31 Holland, J., 29 Hopi, H., 310 Hori, Jun-ichi, 284 Hotelling, H., 101 Householder, A. S., 186, 227, 230, 369 Hau, P. L., 285 Hua, L. K., 138, 140 Hurwitz, A., 28, 253, 260, 384n. Ingham, A. E., 102, 110, 284 Jacobi, G., 88, 359 Jacobson, N., 72 Jones, W. R., 352 Kac, M., 245ft., 248, 279, 389 Kagiwada, H., 351 Kalaba, R., 29, 183n., 187, 230, 285, 312, 334, 347, 350-352, 369, 370 Kalman, R. E., 162 Kaplan, S., 310 Kaplansky, I., 313 Karlin, S., 161, 279, 280, 311, 312, 314, 315 Karpelevic, F. I., 315 Kato, T., 123,223 Keller, H. B., 185 Keller, J. B., 185 Kellogg, R. B., 311 Kendall, M. G., 285 Kermack, W.O., 180 Kerner, E. H., 284 Kesten, H., 285 Klein, F., 247 Ko, Chao, 224 Koecher, M., 182, 314 Koepcke, R. W., 162 Kolmogoroff, A., 67n. Koopmans, T. C., 277n., 313 Kotelyanskii, D. M., 27 Kotin, L., 306 Krabill, D. M., 221n. Krall, F. R., 352 Krein, M. G., 71, 188, 308, 314 Kubo, R., 180 Lanczos, C., 378 Lane, A. M., 111 Langenhop, C. E., 106 Langer, H., 189
Lappo-Danilevsky, J. A., 187, 258 Larcher, H., 72 Lattes, R., 370 Lavrentiev, M. M., 370 Lax, M., 71 Lax, P. D., 225 Ledermann, W., 277, 279, 304 Lee, H. C., 69n., 224 Lefschetz, S., 227 Lehman, R. S., 182n., 370 Lehman, S., 161, 187 Levin, J. J., 184n. Levinger, B. W., 230 Levinson, N., 159, 259 Levitan, B. M., 125 Lew, A., 183n. Lewy, H., 109 Li Ta, 261 Lions, J. L., 370 Littlewood, D. E., 248 Littlewood, J. E., 70, 139 Lockett, J. A., 162, 230, 369 Loewner, C., 110, 111 Lopes, L., 142 Lowan, A. N., 244 Lowdenslager, D., 230 Lumer, G., 188 Lyapunov, A., 184, 251, 259 Lynn, M. S., 311 Maass, H., 110, 182 McCrea, W. H., 180 McDonough, R. N~, 369 MacDutTee, C. C., 28, 30, 106, 246, 248 McEnteggert, P., 143 McGregor, J. L., 142, 161, 279, 280, 315 Maclane, S., 30 McLaughlin, J. E., 71, 161 McNabb, A., 352 Macphail, M. S., 107 Magnus, W., 185, 186 Mandl, P., 311 Mangasarian, O. L., 307n. Maradudin, A., 285 Marcus, M. D., 124, 138, 142, 143, 245, 248
Marimont, R. B., 311 M8ol'Ilaglia, G., 68 Marx, I., 71, 161 Masani, P., 229 Maybee, J., 311 Mehta, M. L., 284 Menon, M. V., 306n. Metzler, P., 303 Miloud, B., 310 Mine, H., 138n., 143
Minkowski, A., 304 Mirsky, L., 11, 42, 69, 131n., 143, 223n., 225,278 Mishchenko, E. F., 334 Montel, P., 221n. Montroll, E., 280 Mood, A. M., 285 Moore, E. H., 107 Mordell, L. J., 69 Morgenstern, 0., 308, 313 Morse, M., 11 Motzkin, T. S., 101, 222, 223n. Moyls, B. N., 142, 248 Mullikin, T., 312 Murdock, W. L., 24Sn., 389 Murnaghan, F. D., 67, 246
Quirk, J., 311
Oldenburger, R., 109, 378 Olkin, I., 29, 104n., 110 Oppenheim, A., 140 Osborn, H., 122 Ostrowski, A., 31, 70, 88, 142, 150, 186, lR9, 221n., 222n., 305, 306n., 369
Radon, J., 28n. Rao, C. R., 106 Rasch, G., 110, 185 Redheft'er, R. M., 352 Reid, W. T., 229, 336 Rescigno, A., 311 Reuter, G. E., 277, 279 Rheinboldt, W., 138n. Rice, S. 0., 284 Richardson, J. M., 162, 368n. Richter, H., 106 Riesz, M., 70 Rinehart, R. F., 106, 107, 334 Riordan, J., 284 Robertson, H., 137 Robinson, D. W., 104 Robinson, R. M., 27 Rockafellar, R. T., 310 Roll, L. B., 369 Rosenbloom, M., 188, 247 Ross, C. C., Jr., 188 Roth, W. E., 107 Routh, E. J., 384n. Roy, S. N., 285 Rudin, W., 105 Rutherford, D. E., 242, 247 Rutman, M. A., 308 Ryser, H. J., 143, 242n., 248
Paley, R. E. A. C., 29 Panton, A. W., 88 Parker, W. V., 41, 88, 104, 142,223 Parodi, M., 124 Parter, S. V., 306n. Pearl, M. H., 313 Penrose, R., 107 Perron, 0., 228, 261, 308 Petrovsky, 1. G., 69n. Phillips, H. B., 224 Phillips, R., 185 Ping, A., 352 Plackett, R. L., 29 Poincar~, H., 104, 124, 184,261,280 PoUard, J. H., 246 Polya, G., 70, 123, 139, 187, 221n., 314 Pontryagin, L. S., 334 Potter, H. S. A., 110 Preisendorfer, R., 351 Prestrud, M., 351 Price, H. S., 311 Primas, H., 188 Pulag, P., 334 Putnam, C. R., 94, 223, 311
Sack, R. A., 18On. Samelson, H., 310 Samuels, J. C., 262 Schatli, C., 27 Schild, A., 124 Schlesinger, L., 185 Schmidt, E., 101 Schneider, H., 101 Schnirelman, L., 137 Schoenberg, 1. J., 105, 160, 314 Schreiber, S., 278 Schumitzky, A., 352 Schur, I., 69n., 88, 95, 109, 137n., 141, 227, 248, 389 Schwartz, J. T., 43 Schwarz, B., 306 Schwarz, H. R., 260 Schwerdtfeger, H., 11, 106, 247 Seelye, C. J., 87 Sefl, 0., 262 Segre, G., 311 Selberg, A., 379, 382n. Sevieta, E., 311 SJiaft'er, C. V., 187n.
Nambu, Y., 188 Nanda, D. N., 285 Nerlove, M., 260, 3M Newman, M. H. A., 225 Noble, S. B., 310 Nyquist, H., 284
Name Index Shapiro, J. F., 162 Sherman, S., Ill, 278n. Shiffman, M., 313 Shoda, 0., 219 Siegel, C. L., 102, 110 Simon, H. A., 162 Sitgreaves, R., 110 Slepian, P., Ill, 189, 305, 315 Smith, R. A., 183, 184n. Snow, R., 312 Spillane, W. R., 311 Srinivasam, S., 262 Starzinski, V. M., 228 Stein, P., 220, 258 Stenger, F., 246 Stenzel, H., 72 Stieltjes, T. J., 87, 157, 161, 385 Stone, B. J., 183 Storey, C., 258n. Sturm, C., 124 Sylvester, J. J., 43, 101, 107, 219 Synge, J. L., 69n. Szasz, 0., 131, 388 Szego, G., 66, 87, 221n., 245n., 389 Szekeres, G., 221n. Taussky, 0., 30, 67, 88, 108, 137, 143, 222-224, 226, 227, 244, 258, 261, 279, 304 Temple, G., 123 Ter Haar, F., 188 Thatcher, H. C., Jr., 378 Thomas, R. G., 111 Thompson, R. C., 124, 143 Todd, J., 67, 222, 244, 261, 269, 370 Toeplitz, 0., 87, 101, 224n., 228, 248 Trotter, H. F., 181n. Turan, P., 227
Varga, R. S., 230, 306n., 309, 337 Vasudevan, R., 262 Vetter, W. J., 182n. Visser, C., 181n. von Neumann, J., 108, 186, 301, 312, 313 Vorobev, Ju. V., 370 Wang, I., 352 Wedderburn, J. M., 225, 227, 248 Weinberg, L., 111, 189, 305, 315 Weinberger, A. F., 225 Weinstein, A., 123 Weiss, G. H., 285 Weiss, L., 369 Weitzenbuck, R., 242 Wendel, J. G., 226n. Westlake, J. R., 35 Westwick, R., 142, 248 Weyl, H., 103, 247 Whaples, G., 378 Whittle, P., lOOn. Widom, H., 2400. Wiegmann, N. A., 219, 220, 230 Wielandt, H., 123, 220, 225n., 312 Wiener, N., 159, 160,229 Wigner, E. P., 111, 284, 335 Wilcox, R. M., 181n. Wilf, H. S., 141 Wilks, S. S., 284 Williams, J. D., 313 Williamson, J., 67, 141, 244 Wimmer, S., 246 Wing, G. M., 312, 350, 351 Winograd, S., 31 Wintner, A., 67, 309, 312 Wong, Y. K, 309 Woodbury, M. A., 309
Ullman, J. L., 310 Urquehart, N. S., lOOO.
Yanase, M. Y., 335 Yarlagadda, R., 125 Youngs, J. W. T., 306n.
Van Dantzig, D., 142 Van der Werden, B. L., 228, 260
Zohar, S., 159
Subject Index Absorption processes, 348 Adaptive control processes, 351 Additive inequalities, 131 Adjoint equation, 174 Adjoint transformation, 31 Analytic continuation, 151 Approximate techniques, 120 Approximation theorem, 205 Arithmetic-geometric mean inequality, 134 Associativity, 13, 20 Asymptotic behavior, 322 Backward induction, 141 Baker-Campbell-Hausdorff formula, 181 Bilinear form, 28 Biorthogonality, 213 Birth and death processes, 161, 186 Bordered determinants, 86 Bordered matrices, 284 Bottleneck processes, 296 Branching processes, 308
Composite matrices, 83 Compound matrices, 242 Condition of certain matrices, 244n. Conservation laws, 122 Conservation relation, 345 Constrained maxima, 73 Continued fraction expansions, 286 Control, 259 Control processes, 316, 384n. Control variable, 317 Convex functions, 124 Convex matrix functions, 111 Courant-Fischer theorem, 115 Courant parameter, 321 Cramer's rule, 33, 372 Criterion function, 317 Deferred passage to the limit, 363 Design of experiments, 29 Determinants, 108, 371 bordered, 86 Casorati,221 Cauchy, 193 Gramian,46 random, 284 Vandermonde, 193 Diagonal form, 37 Diagonal matrix, 18 Diagonalization of a matrix, 50, 194 Difference equations, 204, 222, 309 Differential approximation, 183n. Differential-difference equations, 230, 355,369 Direct product, 244 Disordered linear chain, 284 Dissipative systems, 124 Domain, 224 Doubly stochastic matrix, 138,277, 30fl Dynamic programming, 29, 144, 187, 246, 301, 310-312, 314, 316, 325, 334, 364, 370, 382
Calculus of variations, 317, 334, 335, 337 Canonical form, 39 Casorati determinant, 221 Cauchy determinant, 193 Cauchy-Schwarz inequality, 126 Cauchy's inequality, 14 Cayley algebra, 30 Cayley-Hamilton theorem, 55, 103, 195, 207 Cayley transform, 92 Characteristic equation, 34 Characteristic frequencies, 113 Characteristic root, 34, 108 Characteristic value, 34 Characteristic vector, 34 Circulants, 231, 242 Cofactor, 372 Combinatorial analysis, 242n. Commutativity, 13, 30 Commutators, 27, 180, 311 Eigenvalue, 34 Complex quadratic forms, 71 Equilibrium states, 163 399
Error analysis, 366 Euclidean algorithm, 31 Euler equation, 318, 319 Euler's method, 190 Exponential operators, 181n. Farkas' theorem, 313 Feedback law, 317 Finite Fourier transform, 27 Finsler's theorem, 76 Fixed-point principle, 71 Fixed-point theorems, 310 Form: bilinear, 28 canonical, 39 Jordan, 198,276, 290 diagonal, 37 negative definite, 8 non-negative definite, 8 positive definite, 8, 40, 54 positive indefinite, 8 quadratic, 1, 22 complex, 71, 161 Selberg, 245 reproducing, 28 Toeplitz, 87 Function: convex, 124 criterion, 317 Green's, 161 Lyapunov, 184n. matrix, 106, 280 convex, 111 self-consistent, 365 symmetric, 231, 234 Functional analysis, 184 Functional equations, 145, 162, 170, 187, 335, 352 Functions of matrices, 90 Game theory, 300, 313 Gauss transform, 27 Gaussian elimination, 370 Gaussian quadrature, 357 Generalized probabilities, 274 Generalized semi-groups, 352 Gersgorin circle theorem, 230 Gram-Schmidt orthogonalization, 44 Gramian determinants, 46 Graph-theoretic methods, 311 Graph theory, 162 Green's functions, 161 Group matrices, 244 Group representation, 246 Growth processes, 286
Hadamard matrix, 305 Hadamard's inequality, 130 Hamilton-Cayley theorem, 224 Heat equation, 354 Hermite polynomials, 47 Hermitian matrix, 24, 36 Hilbert matrix, 223 Hilbert's inequality, 87 Holder inequality, 127 Homogeneous systems, 372 Hypercomplex variable, 107 Idempotent, 67 Identity matrix, 17 Ill-conditioned matrix, 361 Improperly posed problems, 362, 370 Induced matrices, 242n. Induced transformation, 31 Inequality, 70, 88, 111, 126, 187, 248, 335 additive, 131 arithmetic-geometric mean, 134 Cauchy-Schwarz, 126 Cauchy's, 14 Hadamard's, 130 Hilbert's, 87 Holder, 127 linear, 89, 299, 313 multiplicative, 135 Inhomogeneous equation, 173 Inner product, 14 Input-output matrices, 297, 309 Instability, 356 Integral equations, 185 Invariant imbedding, 285, 312, 333n., 338,342 Invariant vectors, 21 Inverse matrix, 91 Inverse problem, 125 Irreducible matrix, 307 Jacobi bracket symbol, 27, 31,220 Jacobian matrix, 28, 104, 105, 150 Jentzsch's theorem, 310 Jordan canonical form, 198, 276, 290 Kato's lemma, 69 Kronecker: delta, 17 derivative, 245 logarithm, 237 powers, 236 products, 231, 235, 247, 281, 284 square, 283 sum, 238
Subject Index Lagrange identity, 133 Lagrange multiplier, 5 Laguerre polynom iala, 47 Lancaster's decomposition, 189 Laplace transform, 215, 230, 353 Law of inertia, 377, 383 Sylvester's, 88 Legendre polynomials, 47 Lie groups, 185 Limit theorems, 246 Linear dependence, 44, 378 Linear inequalities, 89, 299, 313 Linear operators, 43 Linear programming, 399,313 Linear system, 32 Logical nets, 29 Lorentz matrix, 68 Lyapunov equation, 239 Lyapunov functions, 184n. Lyapunov's second method, 259 Markoff matrices, 265 Markoff processes, 265 Markovian decision process, 301, 314 Mathematical economics, 160, 294 Mathieu equation, 208 Matrix, 16 bordered, 284 composite, 83 compound, 83 condition of certain, 244n. diagonal, 18 diagonal form of, 37 reduction to, 38 diagonalization of, 50, 194 doubly stochastic, 138,277, 306n. functions of, 90 group, 244 Hadamard,305 Hermitian, 24, 36 Hilbert, 223 identity, 17 ill-conditioned, 17 induced, 31 input-output, 297, 309 inverse, 91 inversion of, 108 irreducible, 307 Jacobian, 28 Lorentz, 68 Markoff,265 Minkowski-LeontietT, 298 multiplication of, 31 network, 111 non-negative, 30U, 314 nonsingular, 91
Matrix (Cont.): normal, 204, 223n., 226, 228 normalization of, 5 numerical inversion of, 35 orthogonal, 25, 39, 92 oscillation, 311 parametric representation of the elements of, 94 payoff, 300 Penrose-Moore inverse of, 105 generalized, 105 positive, 263, 288 positive definite, 279 positive real, III powers of, 103n. random, 110, 284, 285 rank of, 69, 122, 371 reflection, 339 singular, 91 skew-symmetric, 39, 72, 92, 247, 334 stability, 251 stochastic, 281 symmetric, 23, 35, 36 Toeplitz, 159 totally positive, :ill tournament, 226 trace of, 96 transpose of, 23 triangular, 19, 202 tridiagonal, 125 unitary, 25, 39, 222 Vandermonde, 360, 369 Wronskian, 221n. Matrix exponential, 196 Matrix function, 106 Matrix inversion, 108 Matrix multiplication, 31 Matrix powers, 103n. Matrix publication, 17 Matrix Riccati equation, 184, 336 Maximization, 1, 60 Mean square deviation, 161, 252 Minimax, 124, 189 Minimization, 81 Minimum property, 291 Minkowski-Leontieff matrices, 298 Min-max theorem, 301, 313 Mobius inversion formula, 380 Moments, 387 Monotone behavior, 117 Monotone processes, 310 Multiparametric spectral theory, 72 Multiple characteristic roots, 199 Multiple root, 34 Multiplicative inequalities, 135 Multistage decision processes, 160
"Nearest neighbor" problems, 285 Negative definite forms, 8 Network matrices, 111 Noncommutative operations, 284 Noncommutativity,20 Nonlinear analysis, 259, 262 Nonlinear eigenvalue problems, 181 Nonlinear equations, 365 Nonlinear summability, 347 Non-negative definite forms, 8 Non-negative matrices, 309, 314 Non-negativity, 176 Nonsingular matrix, 91 Nontrivial solution, 33 Normal matrices, 204, 223n., 226, 228 Normalization of matrices, 5 Norms of vectors and matriccs, 165, 182, 186 Numerical analysis, 369 Numerical inversion: of Laplace transform, 230 of a matrix, 35 Operator calculus, 186 Orthogonal matrix, 25, 39, 92 Orthogonal polynomials, 66n. Orthogonal vectors, 36 Orthogonality, 15 Orthonormal set, 46 Oscillation matrices, 311 Pad~ approximation, 369 Parametric representation of the elements of a matrix, 94 Parseval-Plancherel theorem, 183 Partial differential equations, 333n. Particle proceBBes, 350 Payoff matrix, 300 Penrose-Moore inverse of a matrix, 105 generalized, 138 Periodic coefficients, 208, 228 Perron root, 348 Perron theorem, 288 Perturbation, 71 Perturbation theory, 50, 174, 186, 368n. Plucker coordinates, 247 Poincar~-Lyapunov theory, 335, 352 Poincar~ separation theorem, 118 Polya's functional equation, 177, 187n. Position operator, 183 Positive definite form, 8,40,54 Positive definite matrices, 279 Positive definite sequences, 70 Positive indefinite forms, 8 Positive matrix, 263, 288
Positive operator, 308 Positive real matrices, 111 Positive regions, 314 Positivity, 47, 187 Potential equation, 331 Prediction theory, 230 Kolmogorov-Wiener, 159 Primes, 382n. Principle of superposition, 191 Probability theory, 279 Probability vectors, 265 Process: absorption, 348 adaptive control, 351 birth and death, 161, 186 bottleneck, 296 branching, 296 control, 316, 384n. growth, 286 Markoff, 265 Markovian decision, 301, 314 monotone, 310 multistage decision, 160 particle, 350 scattering, 351 stochastic, 263 stochastic control, 331 transport, 340 Product integrals, 185 Projective metrics, 310 Prony's method, 369 PrUfer transformation, 184, 228 Quadratic form, 1, 22 Quadrature, 356 Quantum mechanics, 188 Quasilinearization, 183n., 370 Quasiperiodic coefficients, 227 Radiative transfer, 351 Random determinants, 284 Random matrices, 110, 284, 285 Random walk, 160, 279 Rank of a matrix, 69, 122, 371 Rayleigh quotient, 112, 122 Reactor criticality, 309 Recurrence relations, 146 Reduction to diagonal form of a matrix, 38 Ueflection matrix, 339 Regularization, 313 Relativity,69n. Renewal equation, 355 Reproducing forms, 28 Riccati differential equation, 219, 323
Subject Index Root: characteristic, 34, 108 multiple, 199 variational description of, 113 multiple, 34 Perron, 348 simple, 34 square, 93, 322, 334 Routing problems, 162 Saddlepoint, 3 Scalar multiplication, 14 Scattering processes, 351 Schur product, 30 Schur's theorem, 202 Selberg quadratic form, 245 Self-consistent function, 365 Semi-groups, 185, 351 Shift operators, 18On. Shortest route methods, 162 Simple roots, 34 Singular matrix, 91 Skew-symmetric matrices, 39, 72, 92, 247,334 Slightly intertwined systems, 153 Spectral radius, 122 Spectrum, 43 Square roots, 93, 322, 334 Stability, 164 Stability matrices, 251 Stability theory, 184, 227, 249, 259, 335 Stacking operator, 245 State variable, 317 Stationary point, 1 Stochastic control process, 331 Stochastic matrices, 281 Stochastic process, 263 Sturm-Liouville problems, 147 Sturmian separation theorem, 117 Sturmian sequences, 257 Sum of squares, 75 Superposition, principle of, 191 Sylvester interpolation formula, 102 Sylvester's Law of Inertia, 88 Symmetric functions, 231, 234 Symmetric matrix, 23, 35, 36
Ternary algebra, 31 Theory of games, 300, 313 Toeplitz forms, 87 Toeplitz matrix, 159 Totally positive matrices, 311 Tournament matrices, 226 Trace of a matrix, 96 Tracer kinetics, 307 Transport process, 340 Transport theory, 351 Transpose matrix, 23 Triangular matrix, 19, 202 Tridiagonal matrices, 125 Two-point boundary-value problem, 319 Tychonov regularization, 353, 362 Unitary matrix, 25, 39, 222 Unstable equilibrium, 316 Vandermonde determinant, 193 Vandermonde matrix, 360, 369 Variation-diminishing transformation, 314 Variational description of characteristic roots, 113 Vector, 12 characteristic, 34 inner product, 14 invariant, 21 orthogonal, 36 probability, 365 Wave propagation, 285, 352 Wiener-Hopf'equation, 352 Wiener-Hopf factorization, 226 Wiener integrals, 280 Wishart distribution, 110 Wronskian matrices, 221n. Zonal harmonic, 246