Universitext
Universitext Series Editors: Sheldon Axler San Francisco State University Vincenzo Capasso Universit`a degli Studi di Milano Carles Casacuberta Universitat de Barcelona Angus J. MacIntyre Queen Mary, University of London Kenneth Ribet University of California, Berkeley Claude Sabbah ´ CNRS, Ecole Polytechnique Endre S¨uli University of Oxford Wojbor A. Woyczynski Case Western Reserve University
Universitext is a series of textbooks that presents material from a wide variety of mathematical disciplines at master’s level and beyond. The books, often well class-tested by their author, may have an informal, personal even experimental approach to their subject matter. Some of the most successful and established books in the series have evolved through several editions, always following the evolution of teaching curricula, to very polished texts. Thus as research topics trickle down into graduate-level teaching, first textbooks written for new, cutting-edge courses may make their way into Universitext.
For further volumes: http://www.springer.com/series/223
Paul A. Fuhrmann
A Polynomial Approach to Linear Algebra Second Edition
Paul A. Fuhrmann Ben-Gurion University of the Negev Beer Sheva Israel
ISSN 0172-5939 e-ISSN 2191-6675 ISBN 978-1-4614-0337-1 e-ISBN 978-1-4614-0338-8 DOI 10.1007/978-1-4614-0338-8 Springer New York Dordrecht Heidelberg London Library of Congress Control Number: 2011941877 Mathematics Subject Classification (2010): 15-02, 93-02 © Springer Science+Business Media, LLC 2012 All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
To Nilly
Preface
Linear algebra is a well-entrenched mathematical subject that is taught in virtually every undergraduate program, both in the sciences and in engineering. Over the years, many texts have been written on linear algebra, and therefore it is up to the author to justify the presentation of another book in this area to the public. I feel that my justification for the writing of this book is based on a different choice of material and a different approach to the classical core of linear algebra. The main innovation in it is the emphasis placed on functional models and polynomial algebra as the best vehicle for the analysis of linear transformations and quadratic forms.

In pursuing this innovation, a long-standing trend in mathematics is being reversed. Modern algebra went from the specific to the general, abstracting the underlying unifying concepts and structures. The epitome of this trend was represented by the Bourbaki school. No doubt, this was an important step in the development of modern mathematics, but it had its faults too. It led to several generations of students who could not compute, nor could they give interesting examples of theorems they proved. Even worse, it increased the gap between pure mathematics and the general user of mathematics. It is the last group, made up of engineers and applied mathematicians, that is concerned not only with understanding a problem but also with its computational aspects.

A very similar development occurred in functional analysis and operator theory. Initially, the axiomatization of Banach and Hilbert spaces led to a search for general methods and results. While there were some significant successes in this direction, it soon became apparent, especially in trying to understand the structure of bounded operators, that one has to be much more specific. In particular, the introduction of functional models, through the work of Livsic, Beurling, Halmos, Lax, de Branges, Sz.-Nagy and Foias, provided a new approach to structure theory. It is these ideas that we have taken as our motivation in the writing of this book.

In the present book, at least where the structure theory is concerned, we look at a special class of shift operators. These are defined using polynomial modular arithmetic. The interesting fact about this class is its property of universality, in the
sense that every cyclic operator is similar to a shift and every linear operator on a finite-dimensional vector space is similar to a direct sum of shifts. Thus, the shifts are the building blocks of an arbitrary linear operator.

Basically, the approach taken in this book is a variation on the study of a linear transformation via the study of the module structure induced by it over the ring of polynomials. While module theory provides great elegance, it is also difficult to grasp by students. Furthermore, it seems too far removed from computation. Matrix theory seems to be at the other extreme, that is, it is too much concerned with computation and not enough with structure. Functional models, especially the polynomial models, lie on an intermediate level of abstraction between module theory and matrix theory.

The book includes specific chapters devoted to quadratic forms and the establishment of algebraic stability criteria. The emphasis is shared between the general theory and the specific examples, which are in this case the study of the Hankel and Bezout forms. This general area, via the work of Hermite, is one of the roots of the theory of Hilbert spaces. I feel that it is most illuminating to see the Euclidean algorithm and the associated Bezout identity not as isolated results, but as an extremely effective tool in the development of fast inversion algorithms for structured matrices.

Another innovation in this book is the inclusion of basic system-theoretic ideas. It is my conviction that it is no longer possible to separate, in a natural way, the study of linear algebra from the study of linear systems. The two topics have benefited greatly from cross-fertilization. In particular, the theory of finite-dimensional linear systems seems to provide an unending flow of problems, ideas, and concepts that are quickly assimilated in linear algebra. Realization theory is as much a part of linear algebra as is the long familiar companion matrix.

The inclusion of a whole chapter on Hankel norm approximation theory, or AAK theory as it is commonly known, is also a new addition as far as linear algebra books are concerned. This part requires very little mathematical knowledge not covered in the book, but a certain mathematical maturity is assumed. I believe it is very much within the grasp of a well-motivated undergraduate. In this part several results from early chapters are reconstructed in a context in which stability is central. Thus the rational Hardy spaces enter, and we have analytic models and shifts. Lagrange and Hermite interpolation are replaced by Nevanlinna-Pick interpolation. Finally, coprimeness and the Bezout identity reappear, but over a different ring. I believe the study of these analogies goes a long way toward demonstrating to the student the underlying unity of mathematics.

Let me explain the philosophy that underlies the writing of this book. In a way, I share the aim of Halmos (1958) in trying to treat linear transformations on finite-dimensional vector spaces by methods of more general theories. These theories were functional analysis and operator theory in Hilbert space. This is still the case in this book. However, in the intervening years, operator theory has changed remarkably. The emphasis has moved from the study of self-adjoint and normal operators to the study of non-self-adjoint operators. The hope that a general structure theory for linear operators might be developed seems to be too naive. The methods utilizing
Riesz-Dunford integrals proved to be too restrictive. On the other hand, a whole new area centering on the theory of invariant subspaces and the construction and study of functional models was developed. This new development had its roots not only in pure mathematics but also in many applied areas, notably scattering, network and control theories, and some areas of stochastic processes such as estimation and prediction theories. I hope that this book will show how linear algebra is related to other, more advanced, areas of mathematics. Polynomial models have their roots in operator theory, especially that part of operator theory that centered on invariant subspace theory and Hardy spaces. Thus the point of view adopted here provides a natural link with that area of mathematics, as well as those application areas I have already mentioned.

In writing this book, I chose to work almost exclusively with scalar polynomials, the one exception in this project being the invariant factor algorithm and its application to structure theory. My choice was influenced by the desire to have the book accessible to most undergraduates. Virtually all results about scalar polynomial models have polynomial matrix generalizations, and some of the appropriate references are pointed out in the notes and remarks.

The exercises at the end of chapters have been chosen partly to indicate directions not covered in the book. I have refrained from including routine computational problems. This does not indicate a negative attitude toward computation. Quite to the contrary, I am a great believer in the exercise of computation, and I suggest that readers choose, and work out, their own problems. This is the best way to get a better grasp of the presented material.

I usually use the first seven chapters for a one-year course on linear algebra at Ben-Gurion University. If the group is a bit more advanced, one can supplement this by more material on quadratic forms. The material on quadratic forms and stability can be used as a one-semester course of special topics in linear algebra. Also, the material on linear systems and Hankel norm approximations can be used as a basis for either a one-term course or a seminar.

Paul A. Fuhrmann
Preface to the Second Edition
Linear algebra is one of the most active areas of mathematics, and its importance is ever increasing. The reason for this is, apart from its intrinsic beauty and elegance, its usefulness to a large array of applied areas. This is a two-way road, for applications provide a great stimulus for new research directions. However, the danger of a tower-of-Babel phenomenon is ever present. The broadening of the field has to confront the possibility that, due to differences in terminology, notation, and concepts, the communication between different parts of linear algebra may break down. I strongly believe, based on my long research in the theory of linear systems, that the polynomial techniques presented in this book provide a very good common ground. In a sense, the presentation here is just a commercial for subsequent publications stressing extensions of the scalar techniques to the context of polynomial and rational matrix functions.

Moreover, in the fifteen years since the original publication of this book, my perspective on some of the topics has changed. This, at least partially, is due to the mathematical research I was doing during that period. The most significant changes are the following. Much greater emphasis is put on interpolation theory, both polynomial and rational. In particular, we also approach the commutant lifting theorem via the use of interpolation. The connection between the Chinese remainder theorem and interpolation is explained, and an analytic version of the theorem is given. New material has been added on tensor products, both of vector spaces and of modules. Because of their importance, special attention is given to the tensor products of quotient polynomial modules. In turn, this leads to a conceptual clarification of the role of Bezoutians and the Bezout map in understanding the difference between the tensor products of functional models taken with respect to the underlying field and those taken with respect to the corresponding polynomial ring. This enabled the introduction of some new material on model reduction. In particular, some connections between the polynomial Sylvester equation and model reduction techniques, related to interpolation on the one hand and projection methods on the other, are clarified.

In the process of adding material, I also tried to streamline theorem statements and proofs and generally enhance the readability of the book. It is my hope that this effort was at least partially successful.
I am greatly indebted to my friends and colleagues Uwe Helmke and Abie Feintuch for reading parts of the manuscript and making useful suggestions and to Harald Wimmer for providing many useful references to the history of linear algebra. Special thanks to my beloved children, Amir, Oded, and Galit, who not only encouraged and supported me in the effort to update and improve this book, but also enlisted the help of their friends to review the manuscript. To these friends, Shlomo Hoory, Alexander Ivri, Arie Matsliah, Yossi Richter, and Patrick Worfolk, go my sincere thanks. Paul A. Fuhrmann
Contents

1 Algebraic Preliminaries  1
  1.1 Introduction  1
  1.2 Sets and Maps  1
  1.3 Groups  3
  1.4 Rings and Fields  8
    1.4.1 The Integers  11
    1.4.2 The Polynomial Ring  11
    1.4.3 Formal Power Series  20
    1.4.4 Rational Functions  22
    1.4.5 Proper Rational Functions  23
    1.4.6 Stable Rational Functions  24
    1.4.7 Truncated Laurent Series  25
  1.5 Modules  26
  1.6 Exercises  31
  1.7 Notes and Remarks  32

2 Vector Spaces  33
  2.1 Introduction  33
  2.2 Vector Spaces  33
  2.3 Linear Combinations  36
  2.4 Subspaces  36
  2.5 Linear Dependence and Independence  37
  2.6 Subspaces and Bases  40
  2.7 Direct Sums  41
  2.8 Quotient Spaces  43
  2.9 Coordinates  45
  2.10 Change of Basis Transformations  46
  2.11 Lagrange Interpolation  48
  2.12 Taylor Expansion  51
  2.13 Exercises  52
  2.14 Notes and Remarks  52

3 Determinants  55
  3.1 Introduction  55
  3.2 Basic Properties  55
  3.3 Cramer's Rule  60
  3.4 The Sylvester Resultant  62
  3.5 Exercises  64
  3.6 Notes and Remarks  66

4 Linear Transformations  67
  4.1 Introduction  67
  4.2 Linear Transformations  67
  4.3 Matrix Representations  75
  4.4 Linear Functionals and Duality  79
  4.5 The Adjoint Transformation  85
  4.6 Polynomial Module Structure on Vector Spaces  88
  4.7 Exercises  94
  4.8 Notes and Remarks  95

5 The Shift Operator  97
  5.1 Introduction  97
  5.2 Basic Properties  97
  5.3 Circulant Matrices  109
  5.4 Rational Models  111
  5.5 The Chinese Remainder Theorem and Interpolation  118
    5.5.1 Lagrange Interpolation Revisited  119
    5.5.2 Hermite Interpolation  120
    5.5.3 Newton Interpolation  121
  5.6 Duality  122
  5.7 Universality of Shifts  127
  5.8 Exercises  131
  5.9 Notes and Remarks  133

6 Structure Theory of Linear Transformations  135
  6.1 Introduction  135
  6.2 Cyclic Transformations  135
    6.2.1 Canonical Forms for Cyclic Transformations  140
  6.3 The Invariant Factor Algorithm  145
  6.4 Noncyclic Transformations  147
  6.5 Diagonalization  151
  6.6 Exercises  154
  6.7 Notes and Remarks  158

7 Inner Product Spaces  161
  7.1 Introduction  161
  7.2 Geometry of Inner Product Spaces  161
  7.3 Operators in Inner Product Spaces  166
    7.3.1 The Adjoint Transformation  166
    7.3.2 Unitary Operators  169
    7.3.3 Self-adjoint Operators  173
    7.3.4 The Minimax Principle  176
    7.3.5 The Cayley Transform  176
    7.3.6 Normal Operators  178
    7.3.7 Positive Operators  180
    7.3.8 Partial Isometries  182
    7.3.9 The Polar Decomposition  183
  7.4 Singular Vectors and Singular Values  184
  7.5 Unitary Embeddings  187
  7.6 Exercises  190
  7.7 Notes and Remarks  193

8 Tensor Products and Forms  195
  8.1 Introduction  195
  8.2 Basics  196
    8.2.1 Forms in Inner Product Spaces  196
    8.2.2 Sylvester's Law of Inertia  199
  8.3 Some Classes of Forms  204
    8.3.1 Hankel Forms  205
    8.3.2 Bezoutians  209
    8.3.3 Representation of Bezoutians  213
    8.3.4 Diagonalization of Bezoutians  217
    8.3.5 Bezout and Hankel Matrices  223
    8.3.6 Inversion of Hankel Matrices  230
    8.3.7 Continued Fractions and Orthogonal Polynomials  237
    8.3.8 The Cauchy Index  248
  8.4 Tensor Products of Models  254
    8.4.1 Bilinear Forms  254
    8.4.2 Tensor Products of Vector Spaces  255
    8.4.3 Tensor Products of Modules  259
    8.4.4 Kronecker Product Models  260
    8.4.5 Tensor Products over a Field  261
    8.4.6 Tensor Products over the Ring of Polynomials  263
    8.4.7 The Polynomial Sylvester Equation  266
    8.4.8 Reproducing Kernels  270
    8.4.9 The Bezout Map  272
  8.5 Exercises  274
  8.6 Notes and Remarks  277

9 Stability  279
  9.1 Introduction  279
  9.2 Root Location Using Quadratic Forms  279
  9.3 Exercises  293
  9.4 Notes and Remarks  294

10 Elements of Linear System Theory  295
  10.1 Introduction  295
  10.2 Systems and Their Representations  296
  10.3 Realization Theory  300
  10.4 Stabilization  314
  10.5 The Youla–Kucera Parametrization  319
  10.6 Exercises  321
  10.7 Notes and Remarks  324

11 Rational Hardy Spaces  325
  11.1 Introduction  325
  11.2 Hardy Spaces and Their Maps  326
    11.2.1 Rational Hardy Spaces  326
    11.2.2 Invariant Subspaces  334
    11.2.3 Model Operators and Intertwining Maps  337
    11.2.4 Intertwining Maps and Interpolation  344
    11.2.5 RH∞+ -Chinese Remainder Theorem  353
    11.2.6 Analytic Hankel Operators and Intertwining Maps  354
  11.3 Exercises  358
  11.4 Notes and Remarks  359

12 Model Reduction  361
  12.1 Introduction  361
  12.2 Hankel Norm Approximation  362
    12.2.1 Schmidt Pairs of Hankel Operators  363
    12.2.2 Reduction to Eigenvalue Equation  369
    12.2.3 Zeros of Singular Vectors and a Bezout Equation  370
    12.2.4 More on Zeros of Singular Vectors  376
    12.2.5 Nehari's Theorem  378
    12.2.6 Nevanlinna–Pick Interpolation  379
    12.2.7 Hankel Approximant Singular Values and Vectors  381
    12.2.8 Orthogonality Relations  383
    12.2.9 Duality in Hankel Norm Approximation  385
  12.3 Model Reduction: A Circle of Ideas  392
    12.3.1 The Sylvester Equation and Interpolation  392
    12.3.2 The Sylvester Equation and the Projection Method  394
  12.4 Exercises  397
  12.5 Notes and Remarks  400

References  403
Index  407
Chapter 1
Algebraic Preliminaries
1.1 Introduction

This book emphasizes the use of polynomials, and more generally, rational functions, as the vehicle for the development of linear algebra and linear system theory. This is a powerful and elegant idea, and the development of linear theory leans more toward the conceptual than toward the technical. However, this approach has its own weakness. The stumbling block is that before learning linear algebra, one has to know the basics of algebra. Thus groups, rings, fields, and modules have to be introduced. This we proceed to do, accompanied by some examples that are relevant to the content of the rest of the book.
1.2 Sets and Maps

Let S be a set. If between elements of the set a relation a ∼ b is defined, so that either a ∼ b holds or not, then we say we have a binary relation. If a binary relation in S satisfies the following conditions:
1. a ∼ a holds for all a ∈ S,
2. a ∼ b ⇒ b ∼ a,
3. a ∼ b and b ∼ c ⇒ a ∼ c,
then we say we have an equivalence relation in S. The three conditions are referred to as reflexivity, symmetry, and transitivity respectively. For each a ∈ S we define its equivalence class by Sa = {x ∈ S | x ∼ a}. Clearly Sa ⊂ S and Sa ≠ ∅. An equivalence relation leads to a partition of the set S. By a partition of S we mean a representation of S as the disjoint union of subsets. Since clearly, using transitivity, either Sa ∩ Sb = ∅ or Sa = Sb, and S = ∪a∈S Sa, the set of
equivalence classes is a partition of S. Similarly, any partition S = ∪α Sα defines an equivalence relation by letting a ∼ b if for some α we have a, b ∈ Sα.

A rule that assigns to each member a ∈ A a unique member b ∈ B is called a map or a function from A into B. We will denote this by f : A −→ B, or by writing f over an arrow from A to B. We denote by f(A) the image of the set A defined by f(A) = {y | y ∈ B, ∃x ∈ A s.t. y = f(x)}. The inverse image of a subset M ⊂ B is defined by f−1(M) = {x | x ∈ A, f(x) ∈ M}. A map f : A −→ B is called injective or 1-to-1 if f(x) = f(y) implies x = y. A map f : A −→ B is called surjective or onto if f(A) = B, i.e., for each y ∈ B there exists an x ∈ A such that y = f(x). A map f is called bijective if it is both injective and surjective.

Given maps f : A −→ B and g : B −→ C, we can define a map h : A −→ C by letting h(x) = g(f(x)). We call this map h the composition or product of the maps f and g. This will be denoted by h = g ◦ f. Given three maps A −→ B −→ C −→ D, given by f, g, and h respectively, we compute h ◦ (g ◦ f)(x) = h(g(f(x))) and (h ◦ g) ◦ f(x) = h(g(f(x))). So the product of maps is associative, i.e., h ◦ (g ◦ f) = (h ◦ g) ◦ f. Due to the associative law of composition, we can write h ◦ g ◦ f, and more generally fn ◦ · · · ◦ f1, unambiguously.

Given a map f : A −→ B, we define an equivalence relation in A by letting x1 ∼ x2 ⇔ f(x1) = f(x2). Thus the equivalence class of a is given by Aa = {x | x ∈ A, f(x) = f(a)}. We will denote by A/∼ the set of equivalence classes and refer to this as the quotient set by the equivalence relation. Next we define three transformations

    A −→ A/∼ −→ f(A) −→ B,

given by f1, f2, f3 respectively, with the fi defined by f1(a) = Aa, f2(Aa) = f(a), f3(b) = b, b ∈ f(A). Clearly the map f1 is surjective, f2 is bijective, and f3 is injective. Moreover, we have f = f3 ◦ f2 ◦ f1. This factorization of f is referred to as the canonical factorization. The canonical factorization can be described also via the following commutative diagram:
          f
    A ──────────> B
    │             ↑
 f1 │             │ f3
    ↓             │
   A/∼ ─────────> f(A)
          f2
We note that f2 ◦ f1 is surjective, whereas f3 ◦ f2 is injective.
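For finite sets the canonical factorization is easy to compute. The following Python sketch is an illustration added here, not part of the original text, and the function names are ours: it builds the quotient set A/∼ for the relation x1 ∼ x2 ⇔ f(x1) = f(x2) and checks that f = f3 ◦ f2 ◦ f1.

# A sketch of the canonical factorization f = f3 o f2 o f1 for a map on finite sets.
# Illustrative only; the names used here are not from the book.

def canonical_factorization(A, f):
    # Equivalence classes A_a = {x in A | f(x) = f(a)}, indexed by the common value f(a).
    classes = {}
    for x in A:
        classes.setdefault(f(x), set()).add(x)

    f1 = lambda a: frozenset(classes[f(a)])   # A -> A/~ : a |-> its class
    f2 = lambda cls: f(next(iter(cls)))       # A/~ -> f(A) : class |-> common value
    f3 = lambda y: y                          # f(A) -> B : inclusion
    return f1, f2, f3

A = range(-3, 4)
f = lambda x: x * x                           # not injective on A
f1, f2, f3 = canonical_factorization(A, f)
assert all(f3(f2(f1(x))) == f(x) for x in A)  # f = f3 o f2 o f1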
1.3 Groups

Given a set M, a binary operation in M is a map from M × M into M. Thus, an ordered pair (a, b) is mapped into an element of M denoted by ab. A set M with an associative binary operation is called a semigroup. Thus if a, b ∈ M we have ab ∈ M, and the associative rule a(bc) = (ab)c holds. Thus the product a1 · · · an of elements of M is unambiguously defined. We proceed to define the notion of a group, which is the cornerstone of most mathematical structures.

Definition 1.1. A group is a set G with a binary operation, called multiplication, that satisfies
1. a(bc) = (ab)c, i.e., the associative law.
2. There exists a left identity, or unit element, e ∈ G, i.e., ea = a for all a ∈ G.
3. For each a ∈ G there exists a left inverse, denoted by a−1, which satisfies a−1 a = e.

A group G is called abelian if the group operation is commutative, i.e., if ab = ba holds for all a, b ∈ G. In many cases, an abelian group operation will be denoted using the additive notation, i.e., a + b rather than ab, as in the case of the group of integers Z with addition as the group operation. Other useful examples are R, the set of all real numbers under addition, and R+, the set of all positive real numbers with multiplication as the group operation.

Given a nonempty set S, the set of all bijective mappings of S onto itself forms a group with the group action being composition. The elements of G are called permutations of S. If S = {1, . . . , n}, then the group of permutations of S is called the symmetric group of degree n and denoted by Sn.
Theorem 1.2. 1. Let G be a group and let a be an element of G. Then a left inverse a−1 of a is also a right inverse.
2. A left identity is also a right identity.
3. The identity element of a group is unique.

Proof. 1. We compute

aa−1 = e(aa−1) = ((a−1)−1 a−1)(aa−1) = (a−1)−1 (a−1 a)a−1 = (a−1)−1 (ea−1) = (a−1)−1 a−1 = e.

So in particular, aa−1 = e.
2. Let a ∈ G be arbitrary and let e be a left identity. Then

ae = a(a−1 a) = (aa−1)a = ea = a.

Thus ae = a for all a. So e is also a right identity.
3. Let e, e′ be two identities in G. Then, using the fact that e is a left identity and e′ a right identity, we get e′ = ee′ = e.

In a group G, equations of the form axb = c are easily solvable, with the solution given by x = a−1 cb−1. Also, it is easily checked that we have the following rule for inversion:

(a_1 · · · a_n)^{-1} = a_n^{-1} · · · a_1^{-1}.

Definition 1.3. A subset H of a group G is called a subgroup of G if it is a group with the composition rule inherited from G. Thus H is a subgroup if with a, b ∈ H, we have ab ∈ H and a−1 ∈ H.

This can be made a bit more concise.

Lemma 1.4. A subset H of a group G is a subgroup if and only if with a, b ∈ H, also ab−1 ∈ H.

Proof. If H is a subgroup of G, then with a, b ∈ H, it contains also b−1 and hence also ab−1. Conversely, if a, b ∈ H implies ab−1 ∈ H, then b−1 = eb−1 ∈ H and hence also ab = a(b−1)−1 ∈ H, i.e., H is a subgroup of G.

Given a subgroup H of a group G, we say that two elements a, b ∈ G are H-equivalent, and write a ∼ b, if b−1 a ∈ H. It is easily checked that this is a bona fide equivalence relation in G, i.e., it is a reflexive, symmetric, and transitive relation. We denote by Ga the equivalence class of a, i.e., Ga = {x | x ∈ G, x ∼ a}. If we denote by aH the set {ah | h ∈ H}, then Ga = aH. We will refer to these as right equivalence classes or as right cosets. In a completely analogous way, left equivalence classes, or left cosets, Ha are defined.
Given a subgroup H of G, it is not usually the case that the sets of left and right cosets coincide. If this is the case, then we say that H is a normal subgroup of G. Assuming that a left coset aH is also a right coset Hb, we clearly have a = ae ∈ aH, and hence a ∈ Hb, so necessarily Hb = Ha. Thus, for a normal subgroup, aH = Ha for all a ∈ G. Equivalently, H is normal if and only if for all a ∈ G we have aHa−1 = H.

Given a subgroup H of a group G, any two cosets can be bijectively mapped onto each other. In fact, the map φ(ah) = bh is such a bijection between aH and bH. In particular, all cosets aH have the same cardinality as H. We define the index of a subgroup H in G, denoted by iG(H) or [G : H], as the number of left cosets. Given the trivial subgroup E = {e} of G, the left and right cosets consist of single elements. Thus [G : E] is just the number of elements of the group G. The number [G : E] will be referred to also as the order o(G) of the group G. We can proceed now to study the connection between index and order.

Theorem 1.5 (Lagrange). Given a subgroup H of a group G, we have [G : H][H : E] = [G : E], or o(G) = iG(H)o(H).

Homomorphisms are maps that preserve given structures. In the case of groups G1 and G2, a map φ : G1 −→ G2 is called a group homomorphism if, for all g1, g2 ∈ G1, we have φ(g1 g2) = φ(g1)φ(g2).

Lemma 1.6. Let G1 and G2 be groups with unit elements e and e′ respectively. Let φ : G1 −→ G2 be a homomorphism. Then
1. φ(e) = e′.
2. φ(x−1) = (φ(x))−1.

Proof. 1. For any x ∈ G1, we compute φ(x)e′ = φ(x) = φ(xe) = φ(x)φ(e). Multiplying by φ(x)−1, we get φ(e) = e′.
2. We compute e′ = φ(e) = φ(xx−1) = φ(x)φ(x−1). This shows that φ(x−1) = (φ(x))−1.
A homomorphism φ : G1 −→ G2 that is bijective is called an isomorphism. In this case G1 and G2 will be called isomorphic. An example of a nontrivial isomorphism is the exponential function x → ex , which maps R (with addition as the operation) isomorphically onto R+ (with multiplication as the group operation).
The general canonical factorization of maps discussed in Section 1.2 can now be applied to the special case of group homomorphisms. To this end we define, for a homomorphism φ : G1 −→ G2, the kernel and image of φ by

Ker φ = φ−1{e′} = {g ∈ G1 | φ(g) = e′},

and

Im φ = φ(G1) = {g′ ∈ G2 | ∃g ∈ G1, φ(g) = g′}.

The kernel of a group homomorphism has a special property.

Lemma 1.7. Let φ : G −→ G′ be a group homomorphism and let N = Ker φ. Then N is a normal subgroup of G.

Proof. Let x ∈ G and n ∈ N. Then

φ(xnx−1) = φ(x)φ(n)φ(x−1) = φ(x)e′φ(x−1) = φ(x)φ(x−1) = e′.

So xnx−1 ∈ N, i.e., xNx−1 ⊂ N for every x ∈ G. This implies N ⊂ x−1Nx. Since this inclusion holds for all x ∈ G, we get N = xNx−1, or Nx = xN, i.e., the left and right cosets are equal. So N is a normal subgroup.
(1.1)
Theorem 1.8. Let N ⊂ G be a normal subgroup. Denote by G/N the set of all cosets and define the product of cosets by (1.1). Then G/N is a group called the factor group, or quotient group, of G by N. Proof. Clearly (1.1) shows that G/N is closed under multiplication. To check associativity, we note that (xN · yN) · zN = (xyN) · zN = (xy)zN = x(yz)N = xN · (yzN) = xN · (yN · zN). For the unit element e of G we have eN = N, and eN · xN = (ex)N = xN. So eN = N is the unit element in G/N. Finally, given x ∈ G, we check that the inverse element is (xN)−1 = x−1 N. Theorem 1.9. N is a normal subgroup of the group G if and only if N is the kernel of a group homomorphism.
Proof. By Lemma 1.7, it suffices to show that if N is a normal subgroup, then it is the kernel of a group homomorphism. We do this by constructing such a homomorphism. Let G/N be the factor group of G by N. Define π : G −→ G/N by
π (g) = gN.
(1.2)
Clearly (1.1) shows that π is a group homomorphism. Moreover, π(g) = N if and only if gN = N, and this holds if and only if g ∈ N. Thus Ker π = N.

The map π : G −→ G/N defined by (1.2) is called the canonical projection. A homomorphism φ whose kernel contains a normal subgroup can be factored through the factor group.

Proposition 1.10. Let φ : G −→ G′ be a homomorphism with Ker φ ⊃ N, where N is a normal subgroup of G. Let π be the canonical projection of G onto G/N. Then there exists a unique homomorphism φ̄ : G/N −→ G′ for which φ = φ̄ ◦ π, or equivalently, the following diagram is commutative:
          φ
    G ──────────> G′
    │             ↑
  π │             │ φ̄
    ↓             │
   G/N ──────────┘
Proof. We define a map φ̄ : G/N −→ G′ by φ̄(xN) = φ(x). This map is well defined, since Ker φ ⊃ N. It is a homomorphism because φ is, and of course,
φ(x) = φ̄(xN) = φ̄(π(x)) = (φ̄ ◦ π)(x). Finally, φ̄ is uniquely defined by φ̄(xN) = φ(x).
We call φ̄ the induced map by φ on G/N. As a corollary we obtain the following important result, the prototype of many others, which classifies images of group homomorphisms.

Theorem 1.11. Let φ : G −→ G′ be a surjective group homomorphism with Ker φ = N. Then G′ is isomorphic to the factor group G/N.

Proof. The induced map φ̄ is clearly injective. In fact, if φ̄(xN) = e′, we get φ(x) = e′, or x ∈ Ker φ = N. It is also surjective by the assumption that φ is surjective. So we conclude that φ̄ is an isomorphism.
A group G is cyclic if all its elements are powers of one element a ∈ G, i.e., of the form a^n. In this case, a is called a generator of G. Clearly, Z is a cyclic group with addition as group operation. Defining nZ = {nk | k ∈ Z}, it is obvious that nZ is a normal subgroup of Z. We denote by Zn the quotient group Z/nZ. We can identify Zn with the set {0, . . . , n − 1} with the group operation being addition modulo n.
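As a small illustration, added here and not part of the original text, the identification of Zn with {0, . . . , n − 1} under addition modulo n can be written out directly; the short sketch below prints the group table of Z5.

# Z/nZ realized on the representatives {0, ..., n-1}; a sketch for illustration only.

n = 5
elements = range(n)

def add_mod(a, b):
    return (a + b) % n          # coset (a + nZ) + (b + nZ) = (a + b) + nZ

# the group table of Z_5
for a in elements:
    print([add_mod(a, b) for b in elements])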
1.4 Rings and Fields

Most of the mathematical structures that we will encounter in this book have, unlike groups, two operations, namely addition and multiplication. The simplest examples of these are rings and fields. These are introduced in this section.

Definition 1.12. A ring R is a set with two laws of composition called addition and multiplication that satisfy, for all elements of R, the following.
1. Laws of addition:
   a. Associative law: a + (b + c) = (a + b) + c.
   b. Commutative law: a + b = b + a.
   c. Solvability of the equation a + x = b.
2. Laws of multiplication:
   a. Associative law: a(bc) = (ab)c.
   b. Distributive laws: a(b + c) = ab + ac, (b + c)a = ba + ca.
3. R is called a commutative ring if the commutative law ab = ba holds for all a, b ∈ R.

Law 1(c) implies the existence of a unique zero element, i.e., an element 0 ∈ R satisfying 0 + a = a + 0 = a for all a ∈ R. We call R a ring with identity if there exists an element 1 ∈ R such that for all a ∈ R, we have 1a = a1 = a. An element a in a ring with identity has a right inverse b if ab = 1 and a left inverse if ba = 1. If a has both left and right inverses they must be equal, and then we say that a is invertible and denote its inverse by a−1. A field is a commutative ring with identity in which every nonzero element is invertible.

Definition 1.13. Let R1 and R2 be two rings. A ring homomorphism is a function φ : R1 −→ R2 that satisfies
φ(x + y) = φ(x) + φ(y),    φ(xy) = φ(x)φ(y).

If R1 and R2 are rings with identities, then we require also φ(e) = e′.
As in the case of groups, the kernel of a ring homomorphism φ : R1 −→ R2 is defined by Ker φ = φ−1(0) = {x ∈ R1 | φ(x) = 0}. The kernel of a ring homomorphism has special properties. In fact, if x, y ∈ Ker φ and r ∈ R1, then also x + y, rx ∈ Ker φ. This leads to the following definition.

Definition 1.14. A subset J of a ring R is called a left ideal if x, y ∈ J and r ∈ R implies x + y, rx ∈ J.

Thus, a subset J of R is a left ideal if it is an additive subgroup of R and RJ ⊂ J. If R contains an identity then RJ = J. For right ideals we replace the second condition by JR = J. A two-sided ideal, or just an ideal, is a subset of R that is both a left and right ideal. The sum of a finite number of ideals J1, . . . , Jr in R is defined as the set J1 + · · · + Jr = {a1 + · · · + ar | ai ∈ Ji}.

Proposition 1.15. Let R be a ring. Then
1. The sum of a finite number of left ideals is a left ideal.
2. The intersection of any set of left ideals in R is a left ideal.

Proof. 1. Let J1, . . . , Jk be left ideals in R. Then J = J1 + · · · + Jk = {a1 + · · · + ak | ai ∈ Ji}. Clearly, if ai, bi ∈ Ji and r ∈ R, then

∑_{i=1}^{k} a_i + ∑_{i=1}^{k} b_i = ∑_{i=1}^{k} (a_i + b_i) ∈ J,

and

r ∑_{i=1}^{k} a_i = ∑_{i=1}^{k} r a_i ∈ J.

2. Let J = ∩α Jα with Jα left ideals. If a, b ∈ J then a, b ∈ Jα for all α. Hence a + b ∈ Jα for all α, which implies a + b ∈ J. A similar argument holds to show that ra ∈ J.

Given a two-sided ideal J in a ring R, we can construct the quotient ring, denoted by R/J, whose elements are the cosets a + J, by defining the operations of addition and multiplication by

(a + J) + (b + J) = (a + b) + J,
(a + J)(b + J) = ab + J.
It is easy to check that with the arithmetic operations so defined, R/J is indeed a ring. The following theorem gives a complete characterization of ideals. It is the counterpart, in the setting of rings, of Theorem 1.9. Theorem 1.16. Let R be a ring. A subset J ⊂ R is a two-sided ideal if and only if it is the kernel of a ring homomorphism.
Proof. We saw already that the kernel of a ring homomorphism is a two-sided ideal. Conversely, let J be a two-sided ideal. We define the canonical projection π : R −→ R/J by π (a) = a + J. It is easy to check that π is a surjective ring homomorphism, with Ker π = J. An ideal in a ring R that is generated by a single element, i.e., of the form J = (d) = {rd | r ∈ R}, is called a principal ideal. The element d is called a generator of the ideal. More generally, given a1 , . . . , ak ∈ R, the set J = (ai , . . . , ak ) = {∑ki=1 ri ai | ri ∈ R} is obviously an ideal. We say that a1 , . . . , ak are generators of this ideal. In a ring R a nonzero element a is called a zero divisor if there exists another nonzero element b ∈ R such that ab = 0. A commutative ring without a zero divisors is called an entire ring or an integral domain. A commutative ring R with identity and no zero divisors is called a principal ideal domain, or PID for short, if every ideal in R is principal. In a ring R we have a division relation. If c = ab we say that a divides c or a is a left divisor or left factor of c. Given a1 , . . . , an ∈ R, we say that a is a common left divisor of the ai if it is a left divisor of all ai . We say that a is a greatest common left divisor, or g.c.l.d., if it is a common left divisor and is divisible by any other common left divisor. Two greatest common left divisors differ by a right factor that is invertible in R. We say that a1 , . . . , an ∈ R are left coprime if any greatest common left divisor is invertible in R. Right divisors are similarly defined. If R is a commutative ring, a greatest common left divisor is also a greatest common right divisor and will be refered to as a greatest common divisor, or a g.c.d. for short. Proposition 1.17. Let R be a principal ideal domain with identity. Then a1 , . . . , an ∈ R are coprime if and only if there exist bi ∈ R for which the Bezout identity a1 b 1 + · · · + a n b n = 1
(1.3)
holds. Proof. If there exist bi ∈ R for which the Bezout identity holds, then any common divisor of the ai is a divisor of 1, hence necessarily invertible. Conversely, we consider the ideal J generated by the ai . Since R is a principal ideal domain, J is generated by a single element d, which necessarily is invertible. So 1 ∈ J and hence there exist bi such that the Bezout identity (1.3) holds. We present now a few examples of rings. We pay special attention to the ring of polynomials, due to the central role it plays in this book. Many of the results concerning ideals, factorizations, the Chinese remainder theorem, etc. hold also in the ring of integers. We do not give for those separate proofs nor, for the sake of concreteness, do we give proofs in the general context of Euclidean domains.
1.4.1 The Integers

The set of integers Z is a commutative ring under the usual operations of addition and multiplication.
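To make Proposition 1.17 concrete in this familiar ring, the following Python sketch, added here as an illustration and not taken from the book, computes a greatest common divisor of two integers together with Bezout coefficients by the extended Euclidean algorithm.

# Extended Euclidean algorithm in Z: returns (d, x, y) with d = gcd(a, b) = a*x + b*y.
# A sketch added for illustration; the function name is ours.

def extended_gcd(a, b):
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

d, x, y = extended_gcd(240, 46)
print(d, x, y)                   # 2 -9 47
assert 240 * x + 46 * y == d     # a Bezout identity of the form (1.3)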
1.4.2 The Polynomial Ring

A polynomial is an expression of the form

p(z) = ∑_{i=0}^{n} a_i z^i,    a_i ∈ F, 0 ≤ n ∈ Z.

The numbers a_i are called the polynomial coefficients. We shall denote by F[z] the set of all polynomials with coefficients in the field F, i.e., F[z] = {∑_{i=0}^{n} a_i z^i | a_i ∈ F, 0 ≤ n ∈ Z}. Two polynomials are called equal if all their coefficients coincide. If p(z) = ∑_{i=0}^{n} a_i z^i and a_n ≠ 0, then we say that n is the degree of the polynomial p(z), which we denote by deg p. The coefficient a_n is called the leading coefficient of the polynomial p(z). We define the degree of the zero polynomial to be −∞. For a polynomial p(z) = ∑_{i=0}^{n} a_i z^i, we assume a_k = 0 for k > n = deg p. Given two polynomials p(z) = ∑_{i=0}^{n} a_i z^i and q(z) = ∑_{i=0}^{m} b_i z^i, we define their sum p(z) + q(z) by

(p + q)(z) = ∑_{i=0}^{max(m,n)} (a_i + b_i) z^i.    (1.4)

The product, p(z)q(z), is defined by

(pq)(z) = ∑_{i=0}^{m+n} c_i z^i,    (1.5)

where

c_i = ∑_{j=0}^{i} a_j b_{i−j}.    (1.6)
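For readers who like to compute, here is a small Python sketch of the operations (1.4)-(1.6) on coefficient lists, with index i holding the coefficient of z^i. It is an illustration we add here, not part of the book, and the helper names are ours.

# Polynomials as coefficient lists: p[i] is the coefficient of z^i. Sketch only.

def poly_add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def poly_mul(p, q):
    c = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            c[i + j] += a * b        # c_{i+j} accumulates a_i * b_j, as in (1.6)
    return c

p = [1, 0, 2]            # 1 + 2z^2
q = [3, 4]               # 3 + 4z
print(poly_add(p, q))    # [4, 4, 2]
print(poly_mul(p, q))    # [3, 4, 6, 8]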
It is easily checked that with these operations of addition and multiplication, F[z] is a commutative ring with an identity and has no zero divisors. The next theorem sums up the most elementary properties of polynomials. Theorem 1.18. Let p(z), q(z) be polynomials in F[z]. Then 1. deg(pq) = deg p + degq. 2. deg(p + q) ≤ max{deg p, deg q}. Proof. 1. If p(z) or q(z) is the zero polynomial, then both sides of the equality are equal to −∞. So we assume that both p(z) and q(z) are nonzero. Let
p(z) = ∑_{i=0}^{n} a_i z^i and q(z) = ∑_{i=0}^{m} b_i z^i, with a_n, b_m ≠ 0. Then c_{n+m} = a_n b_m ≠ 0, but c_k = 0 for k > m + n.
2. This is immediate.
Note that the inequality deg(p + q) < max{deg p, deg q} may occur due to cancellations. Corollary 1.19. If p(z), q(z) ∈ F[z] and p(z)q(z) = 0, then p(z) = 0 or q(z) = 0. Proof. If p(z)q(z) = 0, then −∞ = deg pq = deg p + degq. So either deg p = −∞ or deg q = −∞.
In F[z], as in the ring of integers Z, we have a process of division with remainder.

Proposition 1.20. Given a nonzero polynomial p(z) = ∑_{i=0}^{m} p_i z^i with p_m ≠ 0, an arbitrary polynomial q(z) ∈ F[z] can be written uniquely in the form

q(z) = a(z)p(z) + r(z)    (1.7)
with deg r < deg p.

Proof. If the degree of q(z) is less than the degree of p(z), then we write q(z) = 0 · p(z) + q(z) and this is the required representation. So we may assume deg q = n ≥ m = deg p. The proof will proceed by induction on the degree of q(z). We assume that for all polynomials of degree less than n such a representation exists. Clearly,

q_1(z) = q(z) − q_n p_m^{-1} z^{n−m} p(z)

is a polynomial of degree ≤ n − 1. Hence by the induction hypothesis, q_1(z) = a_1(z)p(z) + r_1(z), with deg(r_1) < deg(p). But this implies

q(z) = (q_n p_m^{-1} z^{n−m} + a_1(z))p(z) + r_1(z) = a(z)p(z) + r(z),

with

a(z) = q_n p_m^{-1} z^{n−m} + a_1(z).
To show uniqueness, let q(z) = a1 (z)p(z) + r1 (z) = a2 (z)p(z) + r2 (z). This implies (a1 (z) − a2 (z))p(z) = r2 (z) − r1 (z).
A consideration of the degrees of both sides shows that, necessarily, they are both equal to zero. Hence, the uniqueness of the representation (1.7) follows.

The properties of the degree function in the ring F[z] can be abstracted to general rings.

Definition 1.21. A ring R is called a Euclidean ring if there exists a function δ from the set of nonzero elements in R into the set of nonnegative integers that satisfies
1. For all a, b ≠ 0, we have ab ≠ 0 and δ(ab) ≥ δ(a).
2. For all f, g ∈ R, with g ≠ 0, there exist a, r ∈ R such that f = ag + r and δ(r) < δ(g).

We can define δ(0) = −∞. Obviously, with this definition, Proposition 1.20 implies that the ring of polynomials F[z] is a Euclidean ring. We note that in a Euclidean ring there are no zero divisors.

It is convenient to have a notation for the remainder of a polynomial f(z) divided by q(z). If f(z) = a(z)q(z) + r(z) with deg r < deg q, we shall write r = πq f. We give several properties of the operation of taking a remainder.

Lemma 1.22. Let q(z), a(z), b(z) ∈ F[z] with q(z) nonzero. Then
πq (aπq b) = πq (ab).
(1.8)
Proof. Let b(z) = b1(z)q(z) + (πq b)(z). Then a(z)b(z) = a(z)b1(z)q(z) + a(z)(πq b)(z). Obviously, πq(ab1 q) = 0, and hence (1.8) follows.
πq (a1 · · · ak ) = πq (a1 πq (a2 · · · πq (ak ) · · · )).
Proof. By induction.
The following result simplifies in some important cases the computation of the remainder. Lemma 1.24. Let f (z), p(z), q(z) ∈ F[z], with p(z), q(z) nonzero. Then
π pq(p f ) = pπq ( f ).
(1.9)
Proof. Let r = πq f , i.e., for some polynomial a(z), we have f (z) = a(z)q(z) + r(z) and deg r < deg q. The representation of f (z) implies p(z) f (z) = a(z)(p(z)q(z)) + p(z)r(z). Since deg(pr) = deg p + degr < deg p + degq = deg(pq), it follows that π pq (p f ) = pr = pπq f , and hence (1.9) holds.
14
1 Algebraic Preliminaries
Definition 1.25. Let p(z), q(z) ∈ F[z]. We say that p(z) divides q(z), or that p(z) is a factor of q(z), and write p(z) | q(z), if there exists a polynomial a(z) such that q(z) = p(z)a(z). If p(z) ∈ F[z] and p(z) = ∑ni=0 ai zi , then p(z) defines a function on F given by n
p(α ) = ∑ ai α i ,
α ∈ F.
i=0
Then p(α ) is called the value of p(z) at α . An element α ∈ F is called a zero of p(z) if p(α ) = 0. We never identify the polynomial with the function defined by it. Theorem 1.26. Let p(z) ∈ F[z]. Then α is a zero of p(z) if and only if (z − α ) | p(z). Proof. If (z − α ) | p(z), then p(z) = (z − α )a(z), and hence p(α ) = 0. Conversely, by the division rule, we have p(z) = a(z)(z − α ) + r(z) with r(z) necessarily a constant. Substituting α in this equality implies r = 0.
Theorem 1.27. Let p(z) ∈ F[z] be a polynomial of degree n. Then p(z) has at most n zeros in F. Proof. The proof is by induction. The statement is certainly true for zero-degree polynomials. Assume that we have proved it for all polynomials of degree less than n. Suppose that p(z) is a polynomial of degree n. Either it has no zeros and the statement holds, or there exists a zero α . But then p(z) = (z − α )a(z), and a(z) has, by the induction hypothesis, at most n − 1 zeros. Theorem 1.28. Let F be a field and F[z] the ring of polynomials over F. Then F[z] is a principal ideal domain. Proof. We have already shown that F[z] is a commutative ring with identity that contains no zero divisors. Let J be any ideal in F[z]. If J = {0}, then J is generated by 0. So let us assume that J = {0}. Let d(z) be any nonzero polynomial in J of minimal degree. We will show that J = dF[z]. To this end, let f (z) ∈ J be arbitrary. By the division rule of polynomials we have f (z) = a(z)d(z) + r(z) with deg r < deg d. Now r(z) = f (z) − a(z)d(z) ∈ J, since both f (z) and d(z) are in J. Since d(z) was a nonzero element of smallest degree, we must have r(z) = 0. So f (z) ∈ dF[z] and hence J ⊂ dF[z]. Conversely, since d(z) ∈ J, we have dF[z] ⊂ J, and so equality follows. Definition 1.29. 1. Let p1 (z), . . . , pn (z) ∈ F[z]. A polynomial d(z) ∈ F[z] will be called a greatest common divisor of p1 (z), . . . , pn (z) ∈ F[z] if a. We have the division relation d(z) | pi (z), for all i = 1, . . . , n. b. If d1 (z) | pi (z), for all i = 1, . . . , n, then d1 (z) | d(z).
1.4 Rings and Fields
15
2. Let p1 (z), . . . , pn (z) ∈ F[z]. A polynomial d(z) ∈ F[z] will be called a least common multiple of p1 (z), . . . , pn (z) ∈ F[z] if a. We have the division relation pi (z) | d(z), for all i = 1, . . . , n. b. If pi (z) | d (z), for all i = 1, . . . , n, then d | d . Given polynomials p1 (z), p2 (z), we denote by p1 (z) ∧ p2 (z) their greatest common divisor and by p1 (z)∨ p2 (z) their least common multiple. It is easily shown that a greatest common divisor is unique up to a constant multiple. We will say that polynomials p1 (z), . . . , pn (z) ∈ F[z] are coprime if 1 is a greatest common divisor. The availability of the division process in F[z] leads directly to the Euclidean algorithm. This gives an algorithm for the computation of a g.c.d. of two polynomials. Theorem 1.30. Let p(z), q(z) ∈ F[z]. Set q−1 (z) = q(z) and q0 (z) = p(z). Define inductively, using the division rule for polynomials, a sequence of polynomials qi by qi+1 (z) = ai+1 (z)qi (z) − qi−1 (z)
(1.10)
with deg qi+1 < deg qi . Let qs (z) be the last nonzero remainder, i.e., qs+1 (z) = 0. Then qs (z) is a g.c.d. of q(z) and p(z). Proof. First we show that qs (z) is a common divisor of q(z) and p(z). Indeed, since qs+1 (z) = 0, we have qs−1 (z) = as+1 (z)qs (z), that is, qs (z) | qs−1 (z). Assume that we proved qs (z) | qs−1 (z), . . . , qi (z). Since qi−1 (z) = ai+1 (z)qi (z) − qi+1 (z) it follows that qs (z) | qi−1 (z) and hence, by induction qs (z) divides all qi (z), and in particular q0 (z), q−1 (z). So qs (z) is a common divisor of q(z) and p(z). Let J(qi , qi−1 ) = {aqi + bqi−1 | a(z), b(z) ∈ F[z]} be the ideal generated by qi (z), qi−1 (z). Clearly, equation (1.10) shows, again using an induction argument, that qi+1 (z) ∈ J(qi , qi−1 ) ⊂ J(qi−1 , qi−2 ) ⊂ · · · ⊂ J(q0 , q−1 ). In particular, qs (z) ∈ J(q0 , q−1 ). So there exist polynomials a(z), b(z) such that qs (z) = a(z)p(z) + b(z)q(z). This shows that any common divisor r(z) of p(z) and q(z) divides also qs (z). Thus qs (z) is a g.c.d. of p(z) and q(z). We remark that the polynomials a(z), b(z) in the representation qs (z) = a(z)p(z)+ b(z)q(z) can be easily calculated from the polynomials ai (z). We will return to this in Chapter 8. Corollary 1.31. Let p(z), q(z) ∈ F[z]. Then 1. p(z) and q(z) are coprime if and only if the Bezout equation a(z)p(z) + b(z)q(z) = 1 is solvable in F[z].
(1.11)
16
1 Algebraic Preliminaries
2. A solution pair a(z), b(z) is uniquely determined if we require additionally that deg a < deg q. In that case, we have also deg b < deg p.
Proof. 1. Solvability of (1.11) is clearly a sufficient condition for the coprimeness of p(z), q(z). Necessity is a consequence of Theorem 1.30. 2. The solution polynomials a(z), b(z) of the Bezout equation (1.11) are not unique. In fact, if a(z), b(z) is a solution pair and r(z) ∈ F[z] is arbitrary, then also (a(z) − r(z)q(z)), (b(z) + r(z)p(z)) is a solution. By the division rule, we can choose r(z) such that deg(a − rq) < deg q, which also implies deg(b + rp) < deg p. This proves uniqueness. In view of Theorem 1.16, the easiest way to construct ideals is by taking sums and intersections of kernels of ring homomorphisms. The case of interest for us is for the ring of polynomials. Definition 1.32. We define, for each α ∈ F, a map φα : F[z] −→ F by
φα (p) = p(α ).
(1.12)
Theorem 1.33. A map φ : F[z] −→ F is a ring homomorphism if and only if φ (p) = p(α ) for some α ∈ F. Proof. If φα is defined by (1.12), then clearly it is a ring homomorphism. Conversely, let φ : F[z] −→ F be ring homomorphism. Set α = φ (z). Then, given p(z) = ∑ki=0 pi zi , we have k
k
k
k
i=0
i=0
i=0
i=0
φ (p) = φ ∑ pi zi = ∑ pi φ (zi ) = ∑ pi φ (z)i = ∑ pi α i = p(α ). Corollary 1.34. Given α1 , . . . , αn ∈ F, the set Jα1 ,...,αn = {p(z) ∈ F[z] | p(α1 ) = · · · = p(αn ) = 0} n (z − α ). is an ideal in F[z]. Moreover, Jα1 ,...,αn = dF[z], where d(z) = Πi=1 i
Proof. For φα defined by (1.12), we have Ker φα = {p ∈ F[z] | p(α ) = 0} = Jα , which is an ideal. Clearly Jα1 ,...,αn = ∩ni=1 Jαi , and the intersection of ideals is an ideal. Obviously, for d(z) defined as above and an arbitrary polynomial f (z) we have (d f )(αi ) = d(αi ) f (αi ) = 0, so d(z) f (z) ∈ Jα1 ,...,αn . Conversely, if g(z) ∈ Jα1 ,...,αn , we have g(αi ) = 0, and hence g(z) is divisible by z − αi . Since the αi are distinct, g(z) is divisible by d(z), or g(z) = d(z) f (z).
1.4 Rings and Fields
17
Proposition 1.35. Let d(z) ∈ F[z]. Then dF[z] = {d(z)p(z) | p(z) ∈ F[z]} is an ideal. The next, important, result relates the generator of an ideal to division properties. Theorem 1.36. The ideal J generated by p1 (z), . . . , pn (z) ∈ F[z], namely J=
,
n
∑ ri (z)pi (z) | ri (z) ∈ F[z]
(1.13)
i=1
has the representation J = dF[z] if and only if d(z) is a greatest common divisor of p1 (z), . . . , pn (z) ∈ F[z] . Proof. By Theorem 1.28, there exists a d(z) ∈ J such that J = (d). Since pi (z) ∈ J, there exist polynomials qi (z) such that pi (z) = d(z)qi (z) for i = 1, . . . , n. So d(z) is a common divisor of the pi (z). We will show that it is maximal. Assume that d (z) is another common divisor of the pi (z), i.e., pi (z) = d (z)si (z). Since d(z) ∈ J, there exist polynomials ai (z) such that d(z) = ∑ni=1 ai (z)pi (z). Therefore n
n
n
i=1
i=1
i=1
d(z) = ∑ ai (z)pi (z) = ∑ ai (z)d (z)qi (z) = d (z) ∑ ai (z)qi (z). But this means that d (z) | d(z), and so d(z) is a g.c.d.
Corollary 1.37. Let p1 (z), . . . , pn (z) ∈ F[z], and let d(z) be their greatest common divisor. Then there exist polynomials a1 (z), . . . , an (z) ∈ F[z] such that n
d(z) = ∑ ai (z)pi (z).
(1.14)
i−1
Proof. Let d(z) be the g.c.d. of the pi (z). Obviously, d(z) ∈ J = {∑ni=1 ri (z)pi (z) | ri (z) ∈ F[z]}. Therefore there exist ai (z) ∈ F[z] for which (1.14) holds. Corollary 1.38. Polynomials p1 (z), . . . , pn (z) ∈ F[z] are coprime if and only if there exist polynomials a1 (z), . . . , an (z) ∈ F[z] such that n
∑ ai (z)pi (z) = 1.
(1.15)
i=1
Equation (1.15), one of the most important equations in mathematics, will be refered to as the Bezout equation. The importance of polynomials in linear algebra stems from the strong connection between factorization of polynomials and the structure of linear transformations. The primary decomposition theorem is of particular applicability.
18
1 Algebraic Preliminaries
Definition 1.39. A polynomial p(z) ∈ F[z] is factorizable, or reducible, if there exist polynomials f (z), g(z) ∈ F[z] of degree ≥ 1 such that p(z) = f (z)g(z). If p(z) is not factorizable, it is called a prime or an irreducible polynomial. Note that the reducibility of a polynomial is dependent on the field F. Theorem 1.40. Let p(z), f (z), g(z) ∈ F[z], and assume that p(z) is irreducible and p(z) | ( f (z)g(z)). Then either p(z) | f (z) or p(z) | g(z). Proof. Assume that p(z) | ( f (z)g(z)) but p(z) does not divide f (z). Then a g.c.d. of p(z) and f (z) is 1. There exist therefore polynomials a(z), b(z) such that the Bezout equation 1 = a(z) f (z) + b(z)p(z) holds. From this it follows that g(z) = a(z)( f (z)g(z)) + (b(z)g(z))p(z). This implies p(z) | g(z).
Corollary 1.41. Let p(z) be an irreducible polynomial and assume p(z) | ( f1 (z) · · · fn (z)). Then there exists an index i for which p(z) | f i (z).
Proof. By induction.
Lemma 1.42. Let p(z) and q(z) be coprime. Then if p(z) | q(z)s(z), it follows that p(z) | s(z). Proof. Follows immediately from Theorem 1.40.
Lemma 1.43. Let p(z), q(z) be coprime. Then p(z)q(z) is their l.c.m. Proof. Clearly, p(z)q(z) is a common multiple. Let s(z) be an arbitrary common multiple. In particular, we can write s(z) = q(z)t(z). Since p(z) and q(z) are coprime and p(z) | s(z), it follows that p(z) | t(z) or that t(z) = p(z)t (z). Thus s(z) = (p(z)q(z))t (z), and therefore p(z)q(z) is a least common multiple. A polynomial p(z) ∈ F[z] is called monic if its highest nonzero coefficient is 1. Theorem 1.44. Let p(z) be a monic polynomial in F[z]. Then p(z) has a unique, up to ordering, factorization into a product of prime monic polynomials. Proof. We prove the theorem by induction on the degree of p(z). If deg p = 1, the statement is trivially true. Assume that we proved the statement for all polynomials of degree < n. Let p(z) be of degree n. Then either p(z) is prime or p(z) = f (z)g(z) with 1 < deg f , deg g < n. By the induction hypothesis, both f (z) and g(z) are decomposable into a product of prime monic polynomials. This implies the existence of a decomposition of p(z) into the product of monic primes. It remains to prove uniqueness. Suppose p1 (z), . . . , pm (z) and q1 (z), . . . , qn (z) are all prime and monic and p1 (z) · · · pm (z) = q1 (z) · · · qn (z).
1.4 Rings and Fields
19
Clearly, pm (z) | q1 (z) · · · qn (z), so, by Corollary 1.41, there exists an i such that pm (z) | qi (z). Since both are monic and irreducible, it follows that pm (z) = qi (z). Without loss of generality we may assume pm (z) = qn (z) Since there are no zero divisors in F[z], we get p1 (z) · · · pm−1 (z) = q1 (z) · · · qn−1 (z). We complete the proof by using the induction hypothesis. Corollary 1.45. Given a monic polynomial p(z) ∈ F[z]. There exist monic primes pi (z) and positive integers ni , i = 1, . . . , s, such that p = p1 (z)n1 · · · ps (z)ns .
(1.16)
The primes pi (z) and the integers ni are uniquely determined. Proof. Follows from the previous theorem.
The factorization in (1.16) is called the primary decomposition of p(z). The monicity assumption is necessary only to get uniqueness. Without it, the theorem still holds, but the primes are determined only up to constant factors. The next result relates division in the ring of polynomials to the geometry of ideals. We saw already, in Proposition 1.15, that in a ring the sum and intersection of ideals are also ideals. the next proposition makes this specific for the ring of polynomials. Proposition 1.46. 1. Let p(z), q(z) ∈ F[z]. Then qF[z] ⊂ pF[z] if and only if p(z) | q(z). 2. Let pi (z) ∈ F[z] for i = 1, . . . , n. Then ∩ni=1 pi F[z] = pF[z], where p(z) is the l.c.m. of the pi (z). 3. Let pi (z) ∈ F[z] for i = 1, . . . , n. Then n
∑ pi F[z] = pF[z],
i=1
where p(z) is the g.c.d. of the pi (z). Proof. 1. Assume qF[z] ⊂ pF[z]. Thus there exists a polynomial f (z) for which q(z) = p(z) f (z), i.e., p(z) | q(z). Conversely, assume p(z) | q(z), i.e., q(z) = p(z) f (z) for some polynomial f (z). Then qF[z] = {q · g | g ∈ F[z]} = {p f q | g ∈ F[z]} ⊂ {ph | h ∈ F[z]} = pF[z]. 2. By Theorem 1.28, the intersection ∩m i=1 pi F[z] = pF[z] is a principal ideal, i.e., for some p(z) ∈ F[z] we have ∩m p F[z] = pF[z]. Clearly, pF[z] ⊂ pi F[z] for all i i=1 i. By part (i), this implies pi (z) | p(z). So p(z) is a common multiple of the pi (z).
20
1 Algebraic Preliminaries
Suppose q(z) is any common multiple, i.e., q(z) = pi (z)qi (z) and so qF[z] ⊂ pi F[z] for all i, which implies that qF[z] ⊂ ∩m i=1 pi F[z] = pF[z]. But this inclusion shows that p(z) | q(z), and hence p(z) is the l.c.m of the pi (z). 3. Again, by Theorem 1.28, ∑m i=1 pi F[z] is a principal ideal, i.e., for some p(z) ∈ F[z] we have ∑m p F[z] = pF[z]. Obviously, pi F[z] ⊂ pF[z] for all i, and hence i i=1 p(z) | pi (z). Thus p(z) is a common divisor for the pi (z). Let q(z) be any other common divisor. Then pi F[z] ⊂ qF[z], and hence pF[z] = ∑m i=1 pi F[z] ⊂ qF[z], which shows that q(z) | p(z), that is, p(z) is the g.c.d. For the case of two polynomials we have the following. Proposition 1.47. Let p(z), q(z) be nonzero monic polynomials and let r(z) and s(z) be their g.c.d. and l.c.m. respectively, both taken to be monic. Then we have p(z)q(z) = r(z)s(z). Proof. Write p(z) = r(z)p1 (z), q(z) = r(z)q1 (z), with p1 (z), q1 (z) coprime and monic. Clearly s(z) = r(z)p1 (z)q1 (z) is a common multiple of p(z), q(z). Let s (z) be any other multiple. Since p(z) | s (z), we have s (z) = r(z)p1 (z)t(z) for some polynomial t(z). Since q(z) | s (z), we have q1 (z) | p1 (z)t(z). Since p1 (z), q1 (z) are coprime, it follows from Lemma 1.42 that q1 (z) | t(z). This shows that s(z) = r(z)p1 (z)q1 (z) is the unique monic l.c.m. of p(z) and q(z). The equality p(z)q(z) = r(z)s(z) is now obvious.
1.4.3 Formal Power Series For a given field F, we denote by F[[z]] the set of all formal power series, i.e., of formal sums of the form f (z) = ∑∞j=0 f j z j . Addition and multiplication are defined by ∞
∞
∞
j=0
j=0
j=0
∑ f j z j + ∑ g j z j = ∑ ( f j + g j )z j
and
∞
∞
∞
j=0
j=0
k=0
∑ f j z j · ∑ g j z j = ∑ hk z k ,
with hk =
(1.17)
(1.18)
k
∑ f j gk− j .
(1.19)
j=0
With these operations, F[[z]] is a ring with identity 1. An element f (z) = ∑∞j=0 f j z j ∈ F[[z]] is invertible if and only if f0 = 0. To see this let g(z) = ∑∞j=0 g j z j . Then g(z) is an inverse of f (z) if and only if
1.4 Rings and Fields
21
1 = ( f g)(z) =
∞
k
∑ ∑ f j gk− j
k=0
zk .
j=0
This is equivalent to the solvability of the infinite system of equations k
∑ f j gk− j =
j=0
1 0
k = 0, k > 0.
The first equation is f0 g0 = 1, which shows the necessity of the condition f0 = 0. This is also sufficient, since the system of equations can be solved recursively. The following result analyzes the ideal structure in F[[z]]. Proposition 1.48. J ⊂ F[[z]] is a nonzero ideal if and only if for some nonnegative integer n, we have J = zn F[[z]]. Thus F[[z]] is a principal ideal domain. Proof. Clearly, any set of the form zn F[[z]] is an ideal. To prove the converse, we set, for f (z) = ∑∞j=0 f j z j ∈ F[[z]],
δ(f) =
min{n | fn = 0} −∞
f= 0 f =0
Let now f ∈ J be any nonzero element that minimizes δ ( f ). Then f (z) = zn h(z) with h invertible. Thus zn belongs to J and generates it. In the sequel we find it convenient to work with the ring F[[z−1 ]] of formal power series in z−1 . We study now an important construction that allows us to construct from some ring larger rings. The prototype of this situation is the construction of the field of rational numbers out of the ring of integers. Given rings R and R, we say that R is embedded in R if there exists an injective homomorphism of R into R. A set S in a ring R with identity is called a multiplicative set if 0 ∈ / S, 1 ∈ S, and a, b ∈ S implies also ab ∈ S. Given a commutative ring with identity R and a multiplicative set S in R, we proceed to construct a new ring. Let M be the set of ordered paits (r, s) with r ∈ R and s ∈ S. We introduce a relation in M by saying that (r, s) (r , s ) if there exists σ ∈ S for which σ (s r − sr ) = 0. We claim that this is indeed an equivalence relation. Reflexivity and symmetry are trivial to check. To check transitivity, assume σ (s r − sr ) = 0 and τ (s
r − s
) = 0 with σ , τ ∈ S. We compute
τσ s (s
r) = τ s
σ (s r) = τ s
σ (sr ) = σ sτ (s
r ) = σ sτ (s r
) = σ s τ (sr
), or τσ s (s
r − sr
) = 0. We denote by r/s the equivalence class of the pair (r, s). We denote by R/S the set of all equivalence classes in M . In R/S we define operations of addition and
22
multiplication by
1 Algebraic Preliminaries
rs + sr r r , + = s s ss
(1.20) rr r r · = . s s ss It can be checked that these operations are well defined, i.e., they are independent of the equivalence class representatives. Also, it is straightforward to verify that with these operations R/S is a commutative ring. This is called a ring of quotients of R. Of course, there may be many such rings, depending on the multiplicative sets we are taking. In case R is an entire ring, then the map φ : R −→ R/S defined by φ (r) = r/1 is an injective ring homomorphism. In this case more can be said. Theorem 1.49. Let R be an entire ring. Then R can be embedded in a field. Proof. Let S = R − {0}, that is, S is the set of all nonzero elements in R. Obviously S is a multiplicative set. We let F = R/S. Then F is a commutative ring with identity. To show that F is a field, it suffices to show that every nonzero element is invertible. If a/b = 0, this implies a = 0 and hence (a/b)−1 = b/a. Moreover, the map φ given before provides an embedding of R in F. The field F constructed by the previous theorem is called the field of quotients of R. For our purposes the most important example of a field of quotients is that of the field of rational functions, denoted by F(z), which is obtained as the field of quotients of the ring of polynomials F[z]. Let us consider now an entire ring that is also a principal ideal domain. Let F be the field of fractions of R. Given any f ∈ F, we can consider J = {r ∈ R | r f ∈ R}. Obviously, J is an ideal, hence generated by an element q ∈ R that is uniquely defined up to an invertible factor. Thus q f = p for p ∈ R and f = p/q. Obviously, p, q are coprime, and f = p/q is called a coprime factorization of f . We give now a few examples of rings and fields that are a product of the process of taking a ring of quotients.
1.4.4 Rational Functions For a field F we saw that the ring of polynomials F[z] is a principal ideal domain. Its field of quotients is called the field of rational functions and denoted by F(z). Its elements are called rational functions. Every rational function has a representation of the form p(z)/q(z) with p(z), q(z) coprime. We can make the coprime factorization unique if we require the polynomial q(z) to be monic. By Proposition 1.20, every polynomial p(z) has a unique representation in the form p(z) = a(z)q(z) + r(z) with degr < deg q. This implies
1.4 Rings and Fields
23
r(z) p(z) = a(z) + . q(z) q(z)
(1.21)
A rational function r(z)/q(z) is called proper if deg r ≤ degq and strictly proper if deg r < deg q. The set of all strictly proper rational functions is denoted by F− (z). Thus we have F(z) = F[z] ⊕ F−(z). Here the direct sum representation refers to the uniqueness of the representation (1.21).
1.4.5 Proper Rational Functions We denote by F pr (z) the subring of F(z) defined by F pr (z) =
p(z) f (z) ∈ F(z) | f (z) = , deg p ≤ deg q . q(z)
(1.22)
It is easily checked that F pr (z) is a commutative ring with identity. An element f (z) ∈ F pr (z) is a unit if and only if in any representation f (z) = p(z)/q(z), we have deg p = deg q. We define the relative degree ρ by ρ (p/q) = deg q − deg p, and ρ (0) = −∞. Theorem 1.50. F pr (z) is a Euclidean domain with respect to the relative degree. Proof. We verify the conditions of a Euclidean ring as given in Definition 1.21. Let ρ be the relative degree. Let f (z) = p1 (z)/q1 (z) and g(z) = p2 (z)/q2 (z) be nonzero. Then p1 p 2 = deg(q1 q2 ) − deg(p1 p2 ) ρ ( f g) = ρ q1 q2 = deg q1 + degq2 − deg p1 − deg p2 = (deg q1 − deg p1 ) + (degq2 − deg p2 ) = ρ ( f ) + ρ (g) ≥ ρ ( f ). Next, we turn to division. If ρ ( f ) < ρ (g), we write f (z) = 0 · g(z) + f (z). If g(z) = 0 and ρ ( f ) ≥ ρ (g), we show that g(z) divides f (z). Let σ and τ be the relative degrees of f (z) and g(z); thus σ ≥ τ . So we can write f (z) =
1 f1 (z), zσ
g(z) =
1 g1 (z), zτ
with f1 (z), g1 (z) units in F pr (z), i.e., satisfying ρ ( f1 ) = ρ (g1 ) = 0. Then we can write
24
1 Algebraic Preliminaries
f (z) = Of course
1 g (z)−1 f1 (z) zσ − τ 1
1 1 −1 g1 (z) g1 (z) f1 (z) . zτ zσ −τ
belongs to F pr (z) and has relative degree σ − τ .
Since F pr (z) is Euclidean, it is a principal ideal domain. We proceed to characterize all ideals in F pr (z). Theorem 1.51. A subset J ⊂ F pr (z) is a nonzero ideal if and only if it is of the form J = z1σ F pr (z) for some nonnegative integer σ . Proof. Clearly, J = z1σ F pr (z) is an ideal. Conversely, let J be a nonzero ideal. Let f (z) be a nonzero element in J of least relative degree, say σ . We may assume without loss of generality that f (z) = z1σ . By the proof of Theorem 1.50, f (z) divides every element with relative degree greater than or equal to σ . So f (z) is a generator of J. Heuristically speaking, and adopting the language of complex analysis, we see that ideals are determined by the zeros at infinity, multiplicities counted. In terms of singularities, F pr (z) is the set of all rational functions that have no singularity (pole) at infinity. In comparison, F[z] is the set of all rational functions whose only singularity is at infinity. Of course, there are many intermediate situations and we turn to these.
1.4.6 Stable Rational Functions Fix the field in what follows to be the real field R. We can identify many subrings of R(z) simply by taking the ring of quotients of R[z] with respect to multiplicative sets that are smaller than the set of all nonzero polynomials. We shall say that a polynomial is stable if all its (complex) zeros are in a given subset Σ of the complex plane. It is antistable if all its zeros lie in the complement of Σ . We will assume now that the domain of stability is the open left half-plane. Let S be the set of all stable polynomials. Then S is obviously a multiplicative subset of R[z]. The ring of quotients S = R[z]/S ⊂ R(z) is called the ring of stable rational functions. As a ring of fractions it is a commutative ring with identity. An element f (z) ∈ S is a unit if in an irreducible representation f (z) = p(z)/q(z) the numerator p(z) is stable. Thus we may say that f (z) ∈ S is a unit if it has no zeros in the closed left half-plane. Such functions are sometimes called minimum phase functions. From the point of view of complex analysis, the ring of stable rational functions is the set of rational functions that have all their singularities in the open left halfplane or at the point at infinity. In S we define a degree function δ as follows. Let f (z) = p(z)/q(z) with p(z), q(z) coprime. We let δ ( f ) be the number of antistable zeros of p(z), and hence of f (z). Note that δ does not take into account zeros at infinity.
1.4 Rings and Fields
25
Theorem 1.52. S is a Euclidean ring with respect to the degree function δ . Proof. That δ ( f g) = δ ( f ) + δ (g) ≥ δ ( f ) is obvious. Let now f (z), g(z) ∈ S with g(z) = 0. Assume without loss of generality that δ ( f ) ≥ δ (g). Let g(z) = αg (z)/βg (z) with αg (z), βg (z) coprime polynomials. Similarly, let f (z) = α f (z)/ β f (z) with α f (z), β f (z) coprime. Factor αg (z) = α+ (z)α− (z), with α− (z) stable and α+ (z) antistable. Then g(z) =
α+ (z)α− (z) α− (z)(z + 1)ν α+ (z) α+ (z) = e(z) · , = · ν βg (z) βg (z) (z + 1) (z + 1)ν
with e(z) a unit and ν = δ (g). Since β f (z) is stable, β f (z), α+ (z) are coprime in R[z] and there exist polynomials φ (z), ψ (z) for which φ (z)α+ (z) + ψ (z)β f (z) = α f (z)(z+ 1)ν −1 , and we may assume without loss of generality that deg ψ < deg α+ . Dividing both sides by β f (z)(z + 1)ν −1 , we get
α f (z) = β f (z)
φ (z + 1) β f (z)
α+ (z) (z + 1)ν −1
+
ψ (z) , (z + 1)ν −1
and we check that
δ
α+ (z + 1)ν −1
≤ deg ψ < deg α+ = δ (g).
Ideals in S can be represented by generators that are, up to unit factors, antistable polynomials. For two antistable polynomials p1 (z), p2 (z) we have p2 S ⊂ p1 S if and only if p1 (z) | p2 (z). More generally, for f 1 (z), f2 (z) ∈ S , we have f2 S ⊂ f1 S if and only if f1 (z) | f2 (z). However, the division relation f 1 (z) | f2 (z) means that every zero of f 1 (z) in the closed right half-plane is also a zero of f2 (z) with at least the same multiplicity. Here zeros should be interpreted as complex zeros. Since the intersection of subrings is itself a ring, we can consider the ring R pr (z) ∩ S . This is equivalent to excluding from S elements with a singularity at infinity. We denote this ring by RH∞ + . Because of its great importance to systemtheoretic problems, we shall defer the study of this ring to Chapters 11 and 12.
1.4.7 Truncated Laurent Series Let F be a field. We denote by F((z−1 )) the set of all formal sums of the form nf f (z) = ∑ j=−∞ f j z j with n f ∈ Z. The operations of addition and multiplication are defined by
26
1 Algebraic Preliminaries max{n f ,ng }
∑
( f + g)(z) =
( f j + g j )z j ,
j=−∞
and
{n f +ng }
( f g)(z) =
∑
hk zk ,
k=−∞
with hk =
∞
∑
f j gk− j .
j=−∞
Notice that the last sum is well defined, since it contains only a finite number of nonzero terms. We can check that all the field axioms are satisfied; in particular, all nonzero elements are invertible. We call this the field of truncated Laurent series. Note that F((z−1 )) can be considered the field of fractions of F[[z−1 ]]. Clearly, F(z) can be considered a subfield of F((z−1 )). We introduce for later use the projection maps n
n
f f π+ ∑ j=−∞ f j z j = ∑ j=0 f jz j, nf −1 j π− ∑ j=−∞ f j z = ∑ j=−∞ f j z j .
(1.23)
1.5 Modules The module structure is one of the most fundamental algebraic concepts. Most of the rest of this book is, to a certain extent, an elaboration on the module theme. This is particularly true for the case of linear transformations and linear systems. Let R be a ring with identity. A left module M over the ring R is a commutative group together with an operation of R on M that for all r, s ∈ R and x, y ∈ M, satisfies r(x + y) = rx + ry, (r + s)x = rx + sx, r(sx) = (rs)x, 1x = x. Right modules are defined similarly. Let M be a left R-module. A subset N of M is a submodule of M if it is an additive subgroup of M that further satisfies RN ⊂ M. Given two left R-modules M1 and M2 , a map φ : M1 −→ M2 is an R-module homomorphism if for all x, y ∈ M1 and r ∈ R,
φ (x + y) = φ x + φ y, φ (rx) = rφ (x).
1.5 Modules
27
Given an R-module homomorphism φ : M1 −→ M2 , then Ker φ and Im φ are submodules of M1 and M2 respectively. Given R-modules M0 , . . . , Mn , a sequence of R-module homomorphisms φi−1
φi
−→ Mi−1 −→ Mi −→ Mi+1 −→ is called an exact sequence if Im φi−1 = Ker φi . An exact sequence of the form φ
ψ
0 −→ M1 −→ M2 −→ M3 −→ 0 is called a short exact sequence. This means that φ is injective and ψ is surjective. Given modules Mi over the ring R, we can make the Cartesian product M1 × · · · × Mk into an R-module by defining, for mi , ni ∈ Mi and r ∈ R, (m1 , . . . , mk ) + (n1 , . . . , nk ) = (m1 + n1 , . . . , mk + nk ), r(m1 , . . . , mk ) = (rm1 , . . . , rmk ).
(1.24)
Given a module M and submodules Mi , we say that M is the direct sum of the Mi , and write M = M1 ⊕ · · · ⊕ Mk , if every m ∈ M has a unique representation of the form m = m1 + · · · + mk with mi ∈ Mi . Given a submodule N of a left R-module M, we can construct a module structure in the same manner in which we constructed quotient groups. We say that two elements x, y ∈ M are equivalent if x − y ∈ N. The equivalence class of x is denoted by [x]N = x + N. The set of equivalence classes is denoted by M/N. We make M/N into an R-module by defining, for all r ∈ R and x, y ∈ M, [x]N + [y]N = [x + y]N , r[x]N = [rx]N .
(1.25)
It is easy to check that these operations are well defined, that is, independent of the equivalence class representative. We state without proof the following. Proposition 1.53. With the operations defined in (1.25), the set of equivalence classes M/N is a module over R. This is called the quotient module of M by N. We shall make use of the following result. This is the counterpart of Theorem 1.9. Proposition 1.54. Let M be a module over the ring R. Then N ⊂ M is a submodule if and only if it is the kernel of a R-module homomorphism. Proof. Clearly the kernel of a module homomorphism is a submodule. Conversely, let π : M −→ M/N be the canonical projection. Then π is a module homomorphism and Ker π = N.
28
1 Algebraic Preliminaries
Note that with i : N −→ M the natural embedding, then i
π
{0} −→ N −→ M −→ M/N −→ {0} is a short exact sequence. Proposition 1.55. Let M, N be R-modules and let φ : M −→ N be a surjective Rmodule homomorphism. Then we have the module isomorphism N M/Ker φ . Proof. Let π be the canonical projection of N onto N/Ker φ , that is, π (m) = m + Ker φ . We define now a map τ : N/Ker φ −→ M by τ (m + Ker φ ) = φ (m). The map τ is well defined. For if m1 , m2 are representatives of the same coset, then m1 − m2 ∈ Ker φ . This implies φ (m1 ) = φ (m2 ). That τφ is a homomorphism follows from the fact that φ is one. Clearly, the surjectivity of φ implies that of τ . Finally, τ (m + Ker φ ) = 0 if and only if φ (m) = 0, i.e., m ∈ Ker φ . Thus τ is also injective, hence an isomorphism. Let M be a module over the ring R and let S ⊂ M. By a linear combination, with coefficients in R, we mean a sum ∑α ∈S aα mα , where aα ∈ R, mα ∈ M and S a finite subset of S. The elements aα are called the coefficients of the linear combination. A subset S of M is called linearly independent if whenever ∑α ∈S aα mα = 0, we have aα = 0 for all α ∈ S . A subset S of M generates M if the set of all finite linear combinations of elements in S is equal to M. Equivalently, if the smallest submodule of M that includes S is M itself. S is a basis for M if it is non-empty subset, linearly independent and generates M. We end by giving a few examples of important module structures. Every abelian group G is a module over the ring of integers Z. Every ring is a module over itself, as well as over any subring. A vector space, as we shall see in Chapter 2, is a module over a field. Thus F(z) is a module over F[z]; F((z−1 )) is a module over each of the subrings F[z] and F[[z−1 ]]; z−1 F[[z−1 ]] has an induced F[z]-module structure, being isomorphic to F((z−1 ))/F[z]. This module structure is defined by ∞
hj = j j=1 z
z· ∑
∞
h j+1 . j j=1 z
∑
As F-vector spaces, we have the direct sum representation F((z−1 )) = F[z] ⊕ z−1F[[z−1 ]].
(1.26)
We denote by π+ and π− the projections of F((z−1 )) on F[z] and z−1 F[[z−1 ]] respectively, i.e., given by
1.5 Modules
29 N
∑
π−
h jz j =
j=−∞ N
π+
∑
−1
∑
h jz j
j=−∞
h jz j =
j=−∞
N
∑ h jz j
(1.27)
j=0
Clearly, π+ and π− are complementary projections, i.e., satisfy π±2 = π± and π+ + π− = I. Since F((z−1 )) is a 1-dimensional vector space over the field F((z−1 )), every F((z−1 ))-linear map from F((z−1 )) to F((z−1 )) has a representation (LA f )(z) = A(z) f (z),
(1.28)
for some A(z) ∈ F((z−1 )). The map LA defined by (1.28) is called the Laurent operator, and A(z) is called the symbol of the Laurent operator LA . Of course in terms of the expansions of A(z) and f (z), we have g = LA f = ∑k gk zk with gk = ∑∞j=−∞ A j fk− j . The sum is well defined, since there are only a finitely many nonzero terms in it. Of special importance is the Laurent operator acting in F((z−1 )) with symbol z. Because of the natural interpretation in terms of the expansion coefficients, we call it the bilateral shift, or simply the shift, and denote it by S. The shift S is clearly an invertible map in F((z−1 )), and we have (S f )(z) = z f (z), (S
−1
f )(z) = z−1 f (z).
(1.29)
The subsets F[z] and F[[z−1 ]] of F((z−1 )) are closed under addition and multiplication; hence they inherit natural ring structures. Units in F[[z−1 ]] are biproper, i.e., are proper and have a proper inverse. Similarly F((z−1 )) is a module over both F[z] and F[[z−1 ]]; F[z] ⊂ F((z−1 )) is an F[z] submodule and similarly z−1 F[[z−1 ]] ⊂ F((z−1 )) is an F[[z−1 ]] submodule. Furthermore, F((z−1 )) becomes a module over various rings including F, F[z], F((z−1 )) and F[[z−1 ]]. Various module structures will be of interest in the sequel. In particular, we note the following. Proposition 1.56. 1. F[z] is an F[z]-submodule of F((z−1 )). 2. F[[z−1 ]] and z−1 F[[z−1 ]] are F[[z−1 ]] submodules of F((z−1 )). 3. As F[z] modules we have the following short exact sequence of module homomorphisms: j
π
0 −→ F[z] −→ F((z−1 )) −→ F((z−1 ))/F[z] −→ 0, with j the embedding of F[z] into F((z−1 )) and π the canonical projection onto the quotient module.
30
1 Algebraic Preliminaries
Elements of F((z−1 ))/F[z] are equivalence classes, and two elements are in the same equivalence class if and only if they differ in their polynomial terms only. For h ∈ F((z−1 )), we denote by [h]F[z] its equivalence class. A natural choice of representative in each equivalence class is the one element whose polynomial terms are all zero. This leads to the isomorphism F((z−1 ))/F[z] z−1 F[[z−1 ]],
(1.30)
given by [h]F[z] → π− h. There are important linear transformations induced by S in the spaces F[z] and z−1 F[[z−1 ]]. Noting that F[z] is an F[z]-submodule of F((z−1 )), then F[z] is Sinvariant. Thus, we can define a map S+ : F[z] −→ F[z] by restricting S to F[z], i.e., S+ = S | F[z].
(1.31)
Similarly, S induces a map in the quotient module F((z−1 ))/F[z] given by [h]F[z] → [Sh]F[z]. Using the isomorphism (1.33), we also define S− : z−1 F[[z−1 ]] −→ z−1 F[[z−1 ]] by S− h = π− zh.
(1.32)
We note that S+ is injective but not surjective, whereas S− is surjective but not injective. Also codim Im S+ = dim Ker S− = dim F = 1. We will refer to S+ as the forward shift operator and to S− as the backward shift operator. Clearly, F((z−1 )), being a module over F((z−1 )), is also a module over any subring of F((z−1 )). In particular, it has a module structure over F[z]m×m and F[[z−1 ]], as well as over the rings F[z] and F[[z−1 ]]. We can state the following. Proposition 1.57. 1. F[z] is a left F[z] submodule, and hence also a left F[z] submodule of F((z−1 )). 2. z−1 F[[z−1 ]] is an F[[z−1 ]] submodule and hence also an F[[z−1 ]] submodule of F((z−1 )). 3. We have the isomorphism z−1 F[[z−1 ]] F((z−1 ))/F[z],
(1.33)
and z−1 F[[z−1 ]] has therefore the naturally induced F[z]-module structure. Also, it has an F[z]-module structure given, for A ∈ F[z], by A · h = π− Ah,
h ∈ z−1 F[[z−1 ]].
(1.34)
4. Given A ∈ F[z] p×m, with the F[z]-module structures, the map X : z−1 F[[z−1 ]] −→ z−1 F[[z−1 ]] p defined by h → π− Ah is a module homomorphism. 5. F[z] has an F[[z−1 ]]-module structure given, for A ∈ F[[z−1 ]], by A · f = π+ A f ,
f ∈ F[z].
(1.35)
1.6 Exercises
31
1.6 Exercises 1. Show that the order o(Sn ) of the symmetric group Sn is n!. 2. Show that any subgroup of a cyclic group is cyclic, and so is any homomorphic image of G. 3. Show that every group of order ≤5 is abelian. 4. Prove the isomorphism C R[z]/(z2 + 1)R[z]. 5. Given a polynomial p(z) = ∑nk=0 pk zk , we define its formal derivative by p (z) = ∑nk=0 kpk zk−1 . Show that (p(z) + q(z)) = p (z) + q (z), (p(z)q(z)) = p(z)q (z) + q(z)p (z), (p(z)m ) = mp (z)p(z)m−1 . Show that over a field of characteristic 0, a polynomial p(z) factors into the product of distinct irreducible factors if and only if the greatest common divisor of p(z) and p (z) is 1. 6. Let M, M1 be R-modules and N ⊂ M a submodule. Let π : M −→ M/N be the canonical projection and φ : M −→ M1 an R-homomorphism. Show that if Ker φ ⊃ N, there exists a unique R-homomorphism, called the induced homomorphism, φ |M/N : M/N −→ M1 for which φ |M/N = φ ◦ π . Show that Ker φ |M/N = (Ker φ )/N, Im φ |M/N = Im φ . 7. Let M be a module and Mi submodules. Show that if
M = M1 + · · · + Mk Mi ∩ ∑ j=i M j = {0},
then the map φ : M1 × · · · × Mk −→ M1 + · · · + Mk defined by
φ (m1 , . . . , mk ) = m1 + · · · + mk is a module isomorphism. 8. Let M be a module and Mi submodules. Show that if M = M1 + · · · + Mk and M1 ∩ M2 = 0, (M1 + M2 ) ∩ M3 = 0, .. . (M1 + · · · + Mk−1 ) ∩ Mk = 0, then M = M1 ⊕ · · · ⊕ Mk .
32
1 Algebraic Preliminaries
9. Let M be a module and K, L submodules. a. Show that (K + L)/K L/(K ∩ L). b. If K ⊂ L ⊂ M, then
M/L (M/K)/(L/K).
10. A module M over a ring R is called free if it is the zero module or has a basis. Show that if M is a free module over a principal ideal domain R having n basis elements and N is a submodule, then N is free and has at most n basis elements. 11. Show that F[z]n , z−1 F[[z−1 ]]n , F((z−1 ))n are F[z]-modules.
1.7 Notes and Remarks The development of algebra can be traced to antiquity, in particular to the Babylonians and the Chinese. For an interesting survey of early non-European mathematics, see Joseph (2000). Galois, one of the more brilliant and colorful mathematicians, introduced the term group in the context of solvability of polynomial equations by radicals. This was done in 1830, but Galois’ writings were first published, by Liouville, only in 1846, fifteen years after Galois died in a duel. The abstract definition of a group is apparently due to Cayley. The development of ring theory as a generalization of integer arithmetic owes much to the efforts to prove Fermat’s last theorem. Although, in the context of algebraic number theory, ideals appear already in the work of Kummer, the concepts of rings and fields are probably due to Dedekind, Kronecker, and Hilbert. An axiomatic approach to the theory of rings and modules was first developed by E. Noether. Modern expositions of algebra all stem from the classical book by van der Waerden (1931), which in turn was based on lectures by E. Noether and E. Artin. Incidentally, this seems to be the first book having a chapter devoted to linear algebra. For a modern, general book on algebra, Lang (1965) is recommended. Our emphasis on the ring of polynomials is not surprising, considering their wellentrenched role in the study of linear transformations in finite-dimensional vector spaces. The similar exposure given to the field of rational functions, and in particular to the subrings of stable rational functions and bounded stable rational functions, is motivated by the role they play in system theory. This will be treated in Chapters 10 to 12. For more information in this direction, the reader is advised to consult Vidyasagar (1985).
Chapter 2
Vector Spaces
2.1 Introduction Vector spaces provide the setting in which the rest of the topics that are to be presented in this book are developed. The content is geometrically oriented and we focus on linear combinations, linear independence, bases, dimension, coordinates, subspaces, and quotient spaces. Change of basis transformations provide us with a first instance of linear transformations.
2.2 Vector Spaces Vector spaces are modules over a field F. For concreteness, we give an ab initio definition. Definition 2.1. Let F be a field. A vector space V over F is a set, whose elements are called vectors, that satisfies the following set of axioms: 1. For each pair of vectors x, y ∈ V there exists a vector x + y ∈ V , called the sum of x and y, and the following holds: a. The commutative law: x + y = y + x. b. The associative law: x + (y + z) = (x + y) + z. c. There exists a unique zero vector 0 that satisfies 0 + x = x + 0, for all x ∈ V . d. For each x ∈ V there exists a unique element −x such that P.A. Fuhrmann, A Polynomial Approach to Linear Algebra, Universitext, DOI 10.1007/978-1-4614-0338-8 2, © Springer Science+Business Media, LLC 2012
33
34
2 Vector Spaces
x + (−x) = 0, that is, V is a commutative group under addition. 2. For all x ∈ V and α ∈ F there exists a vector α x ∈ V , called the product of α and x, and the following are satisfied: a. The associative law:
α (β x) = (αβ )x.
b. For the unit 1 ∈ F and all x ∈ V we have 1 · x = x. 3. The distributive laws: a. (α + β )x = α x + β x, b. α (x + y) = α x + α y. Examples: ⎧⎛ ⎞ ⎫ ⎪ ⎪ ⎪ a1 ⎪ ⎪ ⎪ ⎪ ⎜ . ⎟ ⎪ ⎪ ⎪ ⎨⎜ ⎟ ⎬ ⎜ ⎟ n F = ⎜ . ⎟ |ai ∈ F . ⎜ ⎟ ⎪ ⎪ ⎪ ⎪ ⎪⎝ . ⎠ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ a ⎭ n
1. Let
We define the operations of addition and multiplication by a scalar α ∈ F by ⎛
⎞ ⎛ ⎞ ⎛ ⎞ a1 b1 a 1 + b1 ⎜ . ⎟ ⎜ . ⎟ ⎜ . ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ . ⎟ + ⎜ . ⎟ = ⎜ . ⎟, ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎝ . ⎠ ⎝ . ⎠ ⎝ . ⎠ an bn an + bn
⎞ ⎛ ⎞ α a1 a1 ⎜ . ⎟ ⎜ . ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ α ⎜ . ⎟ = ⎜ . ⎟. ⎜ ⎟ ⎜ ⎟ ⎝ . ⎠ ⎝ . ⎠ α an an ⎛
With these definitions, Fn is a vector space. 2. An m × n matrix over the field F is a set of mn elements ai j arranged in rows and columns, i.e., ⎞ ⎛ a11 . . . a1n ⎜ . ... . ⎟ ⎟ ⎜ ⎟ ⎜ A = ⎜ . . . . . ⎟. ⎟ ⎜ ⎝ . ... . ⎠ am1 . . . amn We denote by Fm×n the set of all such matrices. We define in Fm×n the operations of addition and multiplication by scalars by
2.2 Vector Spaces
35
(ai j ) + (bi j ) = (ai j + bi j ), α (ai j ) = (α ai j ). These definitions make Fm×n into a vector space. Given the matrix A = (ai j ), ˜ as the n × m matrix given by we define its transpose, which we denote by A, (a˜i j ) = (a ji ). Given matrices A ∈ F p×m , B ∈ Fm×n , we define the product AB ∈ F p×n of the matrices by (AB)i j =
m
∑ aik bk j .
(2.1)
k=1
It is easily checked that matrix multiplication is associative and distributive, i.e., we have A(BC) = (AB)C, A(B1 + B2 ) = AB1 + AB2 , (A1 + A2)B = A1 B + A2 B.
(2.2)
In Fn×n we define the identity matrix In by In = (δi j ).
(2.3)
Here δi j denotes the Kronecker delta function, defined by
δi j =
0 if i = j, 1 if i = j.
(2.4)
3. The rings F[z], F((z−1 )), F[[z]], F[[z−1 ]] are all vector spaces over F. So is F(z) the field of rational functions. 4. Many interesting examples are obtained by considering function spaces. A typical example is CR (X ), the space of all real-valued continuous functions on a topological space X . With addition and multiplication by scalars are defined by ( f + g)(x) = f (x) + g(x), (α f )(x) = α f (x), CR (X) is a vector space. We note the following simple rules. Proposition 2.2. Let V be a vector space over the field F. Let α ∈ F and x ∈ V . Then 1. 0x = 0. 2. α x = 0 implies α = 0 or x = 0.
36
2 Vector Spaces
2.3 Linear Combinations Definition 2.3. Let V be a vector space over the field F. Let x1 , . . . , xn ∈ V and α1 , . . . , αn ∈ F. The vector α1 x1 + · · · + αn xn is an element of V and is called a linear combination of the vectors x1 , . . . , xn . The scalars α1 , . . . , αn are called the coefficients of the linear combination. A vector x ∈ V is called a linear combination of the vectors x1 , . . . , xn if there exist scalars α1 , . . . , αn for which x = ∑ni=1 αi xi . A linear combination in which all coefficients are zero is called a trivial linear combination.
2.4 Subspaces Definition 2.4. Let V be a vector space over the field F. A nonempty subset M of V is called a subspace of V if for any pair of vectors x, y ∈ M and any pair of scalars α , β ∈ F, we have α x + β y ∈ M . Thus a subset M of V is a subspace if and only if it is closed under linear combinations. An equivalent description of subspaces is subsets closed under addition and multiplication by scalars. Examples: 1. For an arbitrary vector space V , V itself and {0} are subspaces. These are called the trivial subspaces. 2. Let ⎧⎛ ⎞ ⎫ ⎪ ⎪ a1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎜ ⎟ ⎪ ⎪ ⎪ . ⎨⎜ ⎟ ⎬ ⎜ ⎟ M = ⎜ . ⎟ |an = 0 . ⎜ ⎟ ⎪ ⎪ ⎪ ⎪ ⎪⎝ . ⎠ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ ⎭ an Then M is a subspace of Fn . 3. Let A be an m × n matrix. Then M = {x ∈ Fn |Ax = 0} is a subspace of Fn . This is the space of solutions of a system of linear homogeneous equations. Theorem 2.5. Let {Mα }α ∈A be a collection of subspaces of V . Then M = ∩α ∈A Mα is a subspace of V . Proof. Let x, y ∈ M and α , β ∈ F. Clearly, M ⊂ Mα for all α , and therefore x, y ∈ Mα , and since Mα is a subspace, we have that α x + β y belongs to Mα , for all α , and hence to the intersection. So α x + β y ∈ M .
2.5 Linear Dependence and Independence
37
Definition 2.6. Let S be a subset of a vector space V . The set L(S), or span (S), the subspace spanned by S, is defined as the intersection of the (nonempty) set of all subspaces containing S. This is therefore the smallest subspace of V containing S. Theorem 2.7. Let S be a subset of a vector space V . Then span (S), the subspace spanned by S, is the set of all finite linear combinations of elements of S. Proof. Let M = {∑ni=1 αi xi |αi ∈ F, xi ∈ S, n ∈ N}. Clearly S ⊂ M , and M is a subspace of V , since linear combinations of linear combinations of elements of S are also linear combinations of elements of S. Thus span (S) ⊂ M . Conversely, we have S ⊂ span (S). So necessarily span (S) contains all finite linear combinations of elements of S. Hence M ⊂ span (S), and equality follows. We saw that if M1 , . . . , Mk are subspaces of V , then so is M = ∩ki=1 Mi . This, in general, is not true for unions of subspaces. The natural concept is that of sum of subspaces. Definition 2.8. Let M1 , . . . , Mk be subspaces of a vector space V . The sum of p these subspaces, ∑i=1 Mi , is defined by p
p
∑ Mi = ∑ αi xi |αi ∈ F, xi ∈ Mi
i=1
.
i=1
2.5 Linear Dependence and Independence Definition 2.9. Vectors x1 , . . . , xk in a vector space V are called linearly dependent if there exist α1 , . . . , αk ∈ F, not all zero, such that
α1 x1 + · · · + αk xk = 0. Vectors x1 , . . . , xk in a vector space V are called linearly independent if they are not linearly dependent. Thus x1 , . . . , xk are linearly dependent if there exists a nontrivial, vanishing linear combination. On the other hand, x1 , . . . , xk are linearly independent if and only if α1 x1 + · · · + αk xk = 0 implies α1 = · · · = αk = 0. That is, the only vanshing linear combination of linearly independent vectors is the trivial one. We note that, as a consequence of Proposition 2.2, a set containing a single vector x ∈ V is linearly independent if and only if x = 0. Definition 2.10. Vectors x1 , . . . , xk in a vector space V are called a spanning set if V = span (x1 , . . . , xk ). Theorem 2.11. Let S ⊂ V . 1. If S is a spanning set and S ⊂ S1 ⊂ V , then S1 is also a spanning set.
38
2 Vector Spaces
2. If S is a linearly independent set and S0 ⊂ S, then S0 is also a linearly independent set. 3. If S is linearly dependent and S ⊂ S1 ⊂ V , then S1 is also linearly dependent. 4. Every subset of V that includes the zero vector is linearly dependent. As a consequence, a spanning set must be sufficiently large, whereas for linear independence the set must be sufficiently small. The case in which these two properties are in balance is of special importance and this leads to the following. Definition 2.12. A subset B of vectors in V is called a basis if 1. B is a spanning set. 2. B is linearly independent. V is called a finite-dimensional space if there exists a basis in V having a finite number of elements. Note that we defined the concept of finite dimensionality before having defined dimension. Example: Let V = Fn . Let ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ 1 0 0 ⎜0⎟ ⎜1⎟ ⎜.⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜.⎟ ⎜0⎟ ⎜.⎟ e1 = ⎜ ⎟ , e2 = ⎜ ⎟ , . . . , en = ⎜ ⎟ . ⎜.⎟ ⎜.⎟ ⎜.⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎝.⎠ ⎝.⎠ ⎝0⎠ 0 0 1 Then B = {e1 , . . . , en } is a basis for Fn . The following result is the main technical instrument in the study of bases. Theorem 2.13 (Steinitz). Let x1 , . . . , xm ∈ V and let e1 , . . . , e p be linearly independent vectors that satisfy ei ∈ span (x1 , . . . , xm ) for all i. Then there exist p vectors in {xi }, without loss of generality we may assume they are the first p vectors, such that span (e1 , . . . , e p , x p+1 , . . . , xm ) = span (x1 , . . . , xm ). Proof. We prove this by induction on p. For p = 1, we must have e1 = 0 by linear independence. Therefore there exist αi such that e1 = ∑m i=1 αi xi . Necessarily αi = 0 for some i. Without loss of generality we assume α1 = 0. Therefore we can write m
x1 = α1−1 e1 − α1−1 ∑ αi xi . i=2
This means that x1 ∈ span (e1 , x2 , . . . , xm ). Of course we also have xi ∈ span (e1 , x2 , . . . , xm ) for i = 2, . . . , m. Therefore
2.5 Linear Dependence and Independence
39
span (x1 , x2 , . . . , xm ) ⊂ span (e1 , x2 , . . . , xm ). On the other hand, by our assumption, e1 ∈ span (x1 , x2 , . . . , xm ) and hence span (e1 , x2 , . . . , xm ) ⊂ span (x1 , x2 , . . . , xm ). From these two inclusion relations, the following equality follows: span (e1 , x2 , . . . , xm ) = span (x1 , x2 , . . . , xm ). Assume that we have proved the assertion for up to p − 1 elements and assume that e1 , . . . , e p are linearly independent vectors that satisfy ei ∈ span (x1 , . . . , xm ) for all i. By the induction hypothesis, we have span (e1 , . . . , e p−1 , x p , . . . , xm ) = span (x1 , x2 , . . . , xm ). Therefore, e p ∈ span (e1 , . . . , e p−1 , x p , . . . , xm ), and hence there exist αi such that e p = α1 x1 + · · · + α p−1 + α p x p + · · · + αm xm . It is impossible that α p , . . . , αm are all 0, for that implies that e p is a linear combination of e1 , . . . , e p−1 , contradicting the assumption of linear independence. So at least one of these numbers is nonzero, and without loss of generality, reordering the elements if necessary, we assume α p = 0. Now x p = α p−1 (α1 e1 + · · · + α p−1 e p−1 − e p + α p+1x p+1 + · · · + αn xn ), that is, x p ∈ span (e1 , . . . , e p x p+1 , . . . , xm ). Therefore, span (x1 , x2 , . . . , xm ) ⊂ span (e1 , . . . , e p x p+1 , . . . , xm ) ⊂ span (x1 , x2 , . . . , xm ), and hence the equality span (e1 , . . . , e p x p+1 , . . . , xm ) = span (x1 , x2 , . . . , xm ). Corollary 2.14. The following assertions hold. 1. Let {e1 , . . . , en } be a basis for the vector space V , and let { f1 , . . . , fm } be linearly independent vectors in V . Then m ≤ n. 2. Let {e1 , . . . , en } and { f1 , . . . , fm } be two bases for V . Then n = m. Proof. 1. Apply Theorem 2.13.
40
2 Vector Spaces
2. By the first part we have both m ≤ n and n ≤ m, so equality follows.
Thus two different bases in a finite-dimensional vector space have the same number of elements. This leads us to the following definition. Definition 2.15. Let V be a vector space over the field F. The dimension of V is defined as the number of elements in an arbitrary basis. We denote the dimension of V by dim V . Theorem 2.16. Let V be a vector space of dimension n. Then 1. Every subset of V containing more than n vectors is linearly dependent. 2. A set of p < n vectors in V cannot be a spanning set.
2.6 Subspaces and Bases Theorem 2.17. 1. Let V be a vector space of dimension n and let M be a subspace. Then dim M ≤ dim V . 2. Let {e1 , . . . , e p } be a basis for M . Then there exist vectors {e p+1 , . . . , en } in V such that {e1 , . . . , en } is a basis for V . Proof. It suffices to prove the second assertion. Let {e1 , . . . , e p } be a basis for M anf { f1 , . . . , fn } a basis for V . By Theorem 2.13 we can replace p of the fi by the e j , j = 1, . . . , p, and get a spanning set for V . But a spanning set with n elements is necessarily a basis for V . From two subspaces M1 , M2 of a vector space V we can construct the subspaces M1 ∩ M2 and M1 + M2 . The next theorem studies the dimensions of these subspaces. For the sum of two subspaces of a vector space V we have the following. Theorem 2.18. Let M1 , M2 be subspaces of a vector space V . Then dim(M1 + M2 ) + dim(M1 ∩ M2 ) = dim M1 + dimM2 .
(2.5)
Proof. Let {e1 , . . . , er } be a basis for M1 ∩ M2 . Then there exist vectors { fr+1 , . . . , f p } and {gr+1, . . . , gq } such that the set {e1 , . . . , er , fr+1 , . . . , f p } is a basis for M1 and the set {e1 , . . . , er , gr+1 , . . . , gq } is a basis for M2 . We will proceed to show that the set {e1 , . . . , er , fr+1 , . . . , f p , gr+1 , . . . , gq } is a basis for M1 + M2 . Clearly {e1 , . . . , er , fr+1 , . . . , f p , gr+1 , . . . , gq } is a spanning set for M1 + M2 . So it remains to show that it is linearly independent. Assume that they are linearly dependent, i.e., there exist αi , βi , γi such that r
p
i=1
i=r+1
∑ αi ei + ∑
βi f i +
q
∑
i=r+1
γi gi = 0,
(2.6)
2.7 Direct Sums
or
41
q
∑
γi g i = −
i=r+1
r
p
i=1
i=r+1
∑ αi ei + ∑
βi f i .
So it follows that ∑qi=r+1 γi gi ∈ M1 . On the other hand, ∑qi=r+1 γi gi ∈ M2 as a linear q combination of some of the basis elements of M2 . Therefore ∑i=r+1 γi gi belongs to M1 ∩ M2 and hence can be expressed as a linear combination of the ei . So there exist numbers ε1 , . . . , εr such that r
q
i=1
i=r+1
∑ εi ei + ∑
γi gi = 0.
However, this is a linear combination of the basis elements of M2 ; hence εi = γ j = 0. Now (2.6) reduces to r
p
i=1
i=r+1
∑ αi ei + ∑
βi fi = 0.
From this, we conclude by the same reasoning that αi = β j = 0. This proves the linear independence of the vectors {e1 , . . . , er , fr+1 , . . . , f p , gr+1 , . . . , gq }, and so they are a basis for M1 + M2 . Now dim(M1 + M2 ) = p + q − r = dim M1 + dim M2 − dim(M1 ∩ M2 ).
2.7 Direct Sums Definition 2.19. Let Mi , i = 1, . . . , p, be subspaces of a vector space V . We say that p ∑i=1 Mi is a direct sum of the subspaces Mi , and write M = M1 ⊕ · · · ⊕ M p , if p p for every x ∈ ∑i=1 Mi there exists a unique representation x = ∑i=1 xi with xi ∈ Mi . Proposition 2.20. Let M1 , M2 be subspaces of a vector space V . Then M = M1 ⊕ M2 if and only if M = M1 + M2 and M1 ∩ M2 = {0}. Proof. Assume M = M1 ⊕ M2 . Then for x ∈ M we have x = x1 + x2, with xi ∈ Mi . Suppose there exists another representation of x in the form x = y1 + y2 , with yi ∈ Mi . From x1 + x2 = y1 + y2 we get z = x1 − y1 = y2 − x2 . Now x1 − y1 ∈ M1 and y2 − x2 ∈ M2 . So, since M1 ∩ M2 = {0}, we have z = 0, that is, x1 = y1 and x 2 = y2 . Conversely, suppose every x ∈ M has a unique representation x = x1 + x2 , with xi ∈ Mi . Thus M = M1 + M2 . Let x ∈ M1 ∩ M2 . Then x = x + 0 = 0 + x, which implies, by the uniqueness of the representation, that x = 0.
42
2 Vector Spaces
We consider next some examples. 1. The space F((z−1 )) of truncated Laurent series has the spaces F[z] and F[[z−1 ]] as subspaces. We clearly have the direct sum decomposition F((z−1 )) = F[z] ⊕ z−1 F[[z−1 ]].
(2.7)
The factor z−1 guarantees that the constant elements appear in one of the subspaces only. 2. We denote by Fn [z] the space of all polynomials of degree < n, i.e., Fn [z] = {p(z) ∈ F[z]| deg p < n}. The following result is based on Proposition 1.46. Proposition 2.21. Let p(z), q(z) ∈ F[z] with deg p = m and deg q = n. Let r(z) be the g.c.d. of p(z) and q(z) and let s(z) be their l.c.m. Let deg r = ρ . Then 1. pFn [z] + qFm[z] = rFm+n−ρ [z]. 2. pFn [z] ∩ qFm [z] = sFρ [z]. Proof. 1. We know that with r(z) the g.c.d. of p(z) and q(z), pF[z] + qF[z] = rF[z]. So, given f (z), g(z), ∈ F[z], there exists h(z) ∈ F[z] such that p(z) f (z) + q(z)g(z) = r(z)h(z). Now we take remainders after division by p(z)q(z), i.e., we apply the map π pq. Now π pq p f = pπq f and π pq qg = qπ p g. Finally, since by Proposition 1.47 we have p(z)q(z) = r(z)s(z), it follows that π pq rh = πrs rh = rπs h. Now deg s = deg p + deg q − deg r = n + m − ρ . So we get the equality pFm [z] + qFm[z] = rFm+n−ρ [z]. 2. We have pF[z]∩qF[z] = sF[z]. Again we apply π pq. If f ∈ pF[z] then f = p f , and hence π pq p f = pπq f ∈ pFn [z]. Similarly, if f (z) ∈ qF[z] then f (z) = q(z) f (z) and π pq f = π pq · q(z) f (z) = q(z)π p f ∈ qFm [z]. On the other hand, f (z) = s(z)h(z) implies π pqsh = πrs sh = sπr h ∈ sFρ [z]. Corollary 2.22. Let p(z), q(z) ∈ F[z] with deg p = m and degq = n. Then p(z) ∧ q(z) = 1 if and only if Fm+n [z] = pFn [z] ⊕ qFm [z]. We say that the polynomials p1 (z), . . . , pk (z) are mutually coprime if for i = j, pi (z) and p j (z) are coprime. Theorem 2.23. Let p1 (z), . . . , pk (z) ∈ F[z] with deg pi = ni with ∑ki=1 ni = n. Then the pi (z) are mutually coprime if and only if Fn [z] = π1 (z)Fn1 [z] ⊕ · · · ⊕ πk (z)Fnk [z], where the π j (z) are defined by
πi (z) = Π j=i p j (z).
2.8 Quotient Spaces
43
Proof. The proof is by induction. Assume the pi (z) are mutually coprime. For k = 2 the result was proved in Proposition 2.21. Assume it has been proved for positive integers ≤ k − 1. Let τi (z) = Π k−1 j=1 p j (z). Then j=i
Fn1 +···+nk−1 [z] = τ1 (z)Fn1 [z] ⊕ · · · ⊕ τk−1 (z)Fnk−1 [z]. Now πk (z) ∧ pk (z) = 1, so Fn [z] = πk (z)Fnk [z] ⊕ pk (z)Fn1 +···+nk−1 [z] = pk (z) τ1 (z)Fn1 [z] ⊕ · · · ⊕ τk−1 (z)Fnk−1 [z] ⊕ πk (z)Fnk (z) = π1 (z)Fn1 [z] ⊕ · · · ⊕ πk (z)Fnk [z]. Conversely, if Fn [z] = π1 (z)Fn1 [z] ⊕ · · · ⊕ πk (z)Fnk [z], then there exist polynomials fi (z) such that 1 = ∑ πi fi . The coprimeness of the πi (z) implies the mutual coprimeness of the pi (z). Corollary 2.24. Let p(z) = p1 (z)n1 · · · pk (z)nk be the primary decomposition of p(z), with deg pi = ri and n = ∑ki=1 ni . Then Fn [z] = p2 (z)n2 · · · pk (z)nk Fr1 n1 [z] ⊕ · · · ⊕ p1 (z)n1 · · · pk−1 (z)nk−1 Frk nk [z]. Proof. Follows from Theorem 2.23, replacing pi (z) by pi (z)ni .
2.8 Quotient Spaces

We begin by introducing the concept of codimension.

Definition 2.25. We say that a subspace M ⊂ X has codimension k, denoted by codim M = k, if
1. There exist k vectors {x_1, . . . , x_k}, linearly independent over M, i.e., for which ∑_{i=1}^{k} α_i x_i ∈ M if and only if α_i = 0 for all i = 1, . . . , k.
2. X = L(M, x_1, . . . , x_k).

Now let X be a vector space over the field F and let M be a subspace. In X we define a relation
x ∼ y if x − y ∈ M.   (2.8)
It is easy to check that this is indeed an equivalence relation, i.e., it is reflexive, symmetric, and transitive. We denote by [x]_M = x + M = {x + m | m ∈ M} the equivalence class of x ∈ X. We denote by X/M the set of equivalence classes with respect to the equivalence relation induced by M as in (2.8).
So far, X/M is just a set. We introduce in X/M two operations, addition and multiplication by a scalar, as follows:
[x]_M + [y]_M = [x + y]_M,  x, y ∈ X,
α[x]_M = [αx]_M.

Proposition 2.26. 1. The operations of addition and multiplication by a scalar are well defined, i.e., independent of the representatives x, y.
2. With these operations, X/M is a vector space over F.
3. If M has codimension k in X, then dim X/M = k.

Proof. 1. Let x′ ∼ x and y′ ∼ y. This means that x′ = x + m_1 and y′ = y + m_2 with m_i ∈ M. Hence x′ + y′ = x + y + (m_1 + m_2), which shows that [x′ + y′]_M = [x + y]_M. Similarly, αx′ = αx + αm_1, so αx′ − αx = αm_1 ∈ M; hence [αx′]_M = [αx]_M.
2. The axioms of a vector space for X/M are easily shown to result from those in X.
3. Let x_1, . . . , x_k be linearly independent over M and such that L(x_1, . . . , x_k, M) = X. We claim that {[x_1], . . . , [x_k]} are linearly independent, for if ∑_{i=1}^{k} α_i [x_i] = 0, it follows that [∑ α_i x_i]_M = 0, i.e., ∑_{i=1}^{k} α_i x_i ∈ M. Since x_1, . . . , x_k are linearly independent over M, necessarily α_i = 0, i = 1, . . . , k. Let now [x]_M be an arbitrary equivalence class in X/M. Then there exist α_i ∈ F such that x = α_1 x_1 + ⋯ + α_k x_k + m for some m ∈ M. This implies [x]_M = α_1 [x_1]_M + ⋯ + α_k [x_k]_M. So indeed {[x_1]_M, . . . , [x_k]_M} is a basis for X/M, and hence dim X/M = k.

Corollary 2.27. Let q ∈ F[z] be a polynomial of degree n. Then
1. qF[z] is a subspace of F[z] of codimension n.
2. dim F[z]/qF[z] = n.

Proof. The polynomials 1, z, . . . , z^{n−1} are obviously linearly independent over qF[z]. Moreover, applying the division rule of polynomials, it is clear that 1, z, . . . , z^{n−1} together with qF[z] span all of F[z].

Proposition 2.28. Let F((z^{-1})) be the space of truncated Laurent series and F[z] and z^{-1}F[[z^{-1}]] the corresponding subspaces. Then we have the isomorphisms
F[z] ≅ F((z^{-1}))/z^{-1}F[[z^{-1}]]
and
z^{-1}F[[z^{-1}]] ≅ F((z^{-1}))/F[z].
Proof. Follows from the direct sum representation (2.7).
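As an illustration of Corollary 2.27, take q(z) = z^2. Two polynomials are equivalent modulo qF[z] exactly when they have the same remainder after division by z^2, so every class has a unique representative of degree less than 2; for instance [z^3 + 2z + 3] = [2z + 3]. The classes [1] and [z] form a basis of F[z]/z^2F[z], which therefore has dimension 2, in accordance with the corollary.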
2.9 Coordinates

Lemma 2.29. Let V be a finite-dimensional vector space of dimension n and let B = {e_1, . . . , e_n} be a basis for V. Then every vector x ∈ V has a unique representation as a linear combination of the e_i, that is,
x = ∑_{i=1}^{n} α_i e_i.   (2.9)

Proof. Since B is a spanning set, such a representation exists. Since B is linearly independent, the representation (2.9) is unique.

Definition 2.30. The scalars α_1, . . . , α_n will be called the coordinates of x with respect to the basis B, and we will use the notation
[x]^{B} = \begin{pmatrix} α_1 \\ \vdots \\ α_n \end{pmatrix}.
The vector [x]^{B} will be called the coordinate vector of x with respect to B. We will always write it in column form. The map x ↦ [x]^{B} is a map from V to F^n.

Proposition 2.31. Let V be a finite-dimensional vector space of dimension n and let B = {e_1, . . . , e_n} be a basis for V. The map x ↦ [x]^{B} has the following properties:
1. [x + y]^{B} = [x]^{B} + [y]^{B}.
2. [αx]^{B} = α[x]^{B}.
3. [x]^{B} = 0 if and only if x = 0.
4. For every α_1, . . . , α_n ∈ F there exists a vector x ∈ V for which
[x]^{B} = \begin{pmatrix} α_1 \\ \vdots \\ α_n \end{pmatrix}.
Proof. 1. Let x = ∑_{i=1}^{n} α_i e_i, y = ∑_{i=1}^{n} β_i e_i. Then
x + y = ∑_{i=1}^{n} α_i e_i + ∑_{i=1}^{n} β_i e_i = ∑_{i=1}^{n} (α_i + β_i) e_i.
So
[x + y]^{B} = \begin{pmatrix} α_1 + β_1 \\ \vdots \\ α_n + β_n \end{pmatrix} = \begin{pmatrix} α_1 \\ \vdots \\ α_n \end{pmatrix} + \begin{pmatrix} β_1 \\ \vdots \\ β_n \end{pmatrix} = [x]^{B} + [y]^{B}.
2. Let α ∈ F. Then αx = ∑_{i=1}^{n} αα_i e_i, and therefore
[αx]^{B} = \begin{pmatrix} αα_1 \\ \vdots \\ αα_n \end{pmatrix} = α \begin{pmatrix} α_1 \\ \vdots \\ α_n \end{pmatrix} = α[x]^{B}.
3. If x = 0 then x = ∑_{i=1}^{n} 0 · e_i, and [x]^{B} = 0. Conversely, if [x]^{B} = 0 then x = ∑_{i=1}^{n} 0 · e_i = 0.
4. Let α_1, . . . , α_n ∈ F. Then we define a vector x ∈ V by x = ∑_{i=1}^{n} α_i e_i. Then clearly
[x]^{B} = \begin{pmatrix} α_1 \\ \vdots \\ α_n \end{pmatrix}.
2.10 Change of Basis Transformations Let V be a finite-dimensional vector space of dimension n and let B = {e1 , . . . , en } and B1 = { f1 , . . . , fn } be two different bases in V . We will explore the connection between the coordinate vectors with respect to the two bases.
For each x ∈ V there exist unique α_j, β_j ∈ F for which
x = ∑_{j=1}^{n} α_j e_j = ∑_{j=1}^{n} β_j f_j.   (2.10)
Since for each j = 1, . . . , n, the basis vector e_j has a unique expansion as a linear combination of the f_i, we set
e_j = ∑_{i=1}^{n} t_{ij} f_i.
These equations define n^2 numbers, which we arrange in an n × n matrix. We denote this matrix by [I]^{B_1}_{B}. We refer to this matrix as the basis transformation matrix from the basis B to the basis B_1. Substituting back into (2.10), we get
x = ∑_{i=1}^{n} β_i f_i = ∑_{j=1}^{n} α_j e_j = ∑_{j=1}^{n} α_j ∑_{i=1}^{n} t_{ij} f_i = ∑_{i=1}^{n} (∑_{j=1}^{n} t_{ij} α_j) f_i.
Equating coefficients of the f_i, we obtain
β_i = ∑_{j=1}^{n} t_{ij} α_j.
Thus we conclude that
[x]^{B_1} = [I]^{B_1}_{B} [x]^{B}.
So the basis transformation matrix transforms the coordinate vector with respect to the basis B to the coordinate vector with respect to the basis B_1.

Theorem 2.32. Let V be a finite-dimensional vector space of dimension n and let B = {e_1, . . . , e_n}, B_1 = {f_1, . . . , f_n}, and B_2 = {g_1, . . . , g_n} be three bases for V. Then we have
[I]^{B_2}_{B} = [I]^{B_2}_{B_1} [I]^{B_1}_{B}.

Proof. Let [I]^{B_2}_{B} = (r_{ij}), [I]^{B_2}_{B_1} = (s_{ij}), [I]^{B_1}_{B} = (t_{ij}). This means that
e_j = ∑_{i=1}^{n} r_{ij} g_i,
f_k = ∑_{i=1}^{n} s_{ik} g_i,
e_j = ∑_{k=1}^{n} t_{kj} f_k.
Therefore r_{ij} = ∑_{k=1}^{n} s_{ik} t_{kj}.
Corollary 2.33. Let V be a finite-dimensional vector space with two bases B and B_1. Then
[I]^{B_1}_{B} = ([I]^{B}_{B_1})^{-1}.

Proof. Clearly [I]^{B}_{B} = I, where I denotes the identity matrix. By Theorem 2.32, we get
I = [I]^{B}_{B} = [I]^{B}_{B_1} [I]^{B_1}_{B}.
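A small numerical example may help. In F^2 let B = {e_1, e_2} be the standard basis and let B_1 = {f_1, f_2} with f_1 = e_1 + e_2 and f_2 = e_1 − e_2 (here we assume the characteristic of F is not 2). Writing e_1 = (1/2)f_1 + (1/2)f_2 and e_2 = (1/2)f_1 − (1/2)f_2, we get
[I]^{B_1}_{B} = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.
For x = 3e_1 + e_2 this gives [x]^{B_1} = [I]^{B_1}_{B}[x]^{B} = (2, 1)^{T}, and indeed x = 2f_1 + f_2.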
Theorem 2.34. The mapping in F^n defined by the matrix [I]^{B_1}_{B} as
[x]^{B} ↦ [I]^{B_1}_{B} [x]^{B} = [x]^{B_1}
has the following properties:
[I]^{B_1}_{B}([x]^{B} + [y]^{B}) = [I]^{B_1}_{B}[x]^{B} + [I]^{B_1}_{B}[y]^{B},
[I]^{B_1}_{B}(α[x]^{B}) = α[I]^{B_1}_{B}[x]^{B}.

Proof. We compute
[I]^{B_1}_{B}([x]^{B} + [y]^{B}) = [I]^{B_1}_{B}[x + y]^{B} = [x + y]^{B_1} = [x]^{B_1} + [y]^{B_1} = [I]^{B_1}_{B}[x]^{B} + [I]^{B_1}_{B}[y]^{B}.
Similarly,
[I]^{B_1}_{B}(α[x]^{B}) = [I]^{B_1}_{B}[αx]^{B} = [αx]^{B_1} = α[x]^{B_1} = α[I]^{B_1}_{B}[x]^{B}.
These properties characterize linear maps. We will discuss those in detail in Chapter 4.
2.11 Lagrange Interpolation

Let F_n[z] = {p(z) ∈ F[z] | deg p < n}. Clearly, F_n[z] is an n-dimensional subspace of F[z]. In fact, B_st = {1, z, . . . , z^{n−1}} is a basis for F_n[z], and we will refer to this as the standard basis of F_n[z]. Obviously, if p(z) = ∑_{i=0}^{n−1} p_i z^i, then
[p]^{st} = \begin{pmatrix} p_0 \\ \vdots \\ p_{n−1} \end{pmatrix}.
We exhibit next another important basis that is intimately related to polynomial interpolation. To this end we introduce

The Lagrange interpolation problem: Given distinct numbers α_i ∈ F, i = 1, . . . , n, and another set of arbitrary numbers c_i ∈ F, i = 1, . . . , n, find a polynomial p(z) ∈ F_n[z] such that p(α_i) = c_i, i = 1, . . . , n.

We can replace this problem by a set of simpler

Special interpolation problems: Find polynomials l_i(z) ∈ F_n[z] such that for all i = 1, . . . , n,
l_i(α_j) = δ_{ij},  j = 1, . . . , n.
Here δ_{ij} is the Kronecker delta function. If l_i(z) are such polynomials, then a solution to the Lagrange interpolation problem is given by
p(z) = ∑_{i=1}^{n} c_i l_i(z).
This is seen by a direct evaluation at all α_i. The existence and properties of the solution to the special interpolation problem are summarized in the following.

Proposition 2.35. Given distinct numbers α_i ∈ F, i = 1, . . . , n, let the Lagrange interpolation polynomials l_i(z) be defined by
l_i(z) = \frac{∏_{j≠i}(z − α_j)}{∏_{j≠i}(α_i − α_j)}.
Then
1. The polynomials l_i(z) are in F_n[z] and
l_i(α_j) = δ_{ij},  j = 1, . . . , n.
2. The set {l_1(z), . . . , l_n(z)} forms a basis for F_n[z].
3. For all p(z) ∈ F_n[z] we have
p(z) = ∑_{i=1}^{n} p(α_i) l_i(z).   (2.11)
Proof. 1. Clearly, l_i(z) has degree n − 1; hence l_i(z) ∈ F_n[z]. They are well defined, since by the distinctness of the α_i, all the denominators are different from zero. Moreover, by construction, we have
l_i(α_j) = δ_{ij},  j = 1, . . . , n.
2. We show first that the polynomials l_1(z), . . . , l_n(z) are linearly independent. For this, assume that ∑_{i=1}^{n} c_i l_i(z) = 0. Evaluating at α_j, we get
0 = ∑_{i=1}^{n} c_i l_i(α_j) = ∑_{i=1}^{n} c_i δ_{ij} = c_j.
Since dim F_n[z] = n, they actually form a basis.
3. Since {l_1(z), . . . , l_n(z)} forms a basis for F_n[z], for every p(z) ∈ F_n[z] there exist c_j such that
p(z) = ∑_{i=1}^{n} c_i l_i(z).
Evaluating at α_j, we get
p(α_j) = ∑_{i=1}^{n} c_i l_i(α_j) = ∑_{i=1}^{n} c_i δ_{ij} = c_j.

The solution of the Lagrange interpolation problem presented above was achieved by an ad hoc construction of the Lagrange polynomials. In Chapter 5, we will see that more general interpolation problems can be solved by applying the Chinese remainder theorem.

Corollary 2.36. Let α_1, . . . , α_n ∈ F be distinct. Let B_in = {l_1(z), . . . , l_n(z)} be the Lagrange interpolation basis and let B_st be the standard basis in F_n[z]. Then the change of basis transformation from the standard basis to the interpolation basis is given by
[I]^{in}_{st} = \begin{pmatrix} 1 & α_1 & \cdots & α_1^{n−1} \\ \vdots & \vdots & & \vdots \\ 1 & α_n & \cdots & α_n^{n−1} \end{pmatrix}.   (2.12)

Proof. From the equality (2.11) we get as special cases, for j = 0, . . . , n − 1, z^j = ∑_{i=1}^{n} α_i^j l_i(z), and (2.12) follows.

The matrix in (2.12) is called the Vandermonde matrix.
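As a worked example, take n = 3 with the nodes α_1 = 0, α_2 = 1, α_3 = 2 (assuming these are distinct elements of F). Then
l_1(z) = (z − 1)(z − 2)/2,  l_2(z) = −z(z − 2),  l_3(z) = z(z − 1)/2.
To find p(z) ∈ F_3[z] with p(0) = 1, p(1) = 0, p(2) = 3, we form
p(z) = 1 · l_1(z) + 0 · l_2(z) + 3 · l_3(z) = 2z^2 − 3z + 1,
and one checks directly that it takes the prescribed values at the three nodes.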
2.12 Taylor Expansion

We proceed to study, algebraically, the Taylor expansion of polynomials, and we do it within the framework of change of basis transformations. Let p(z) ∈ F_n[z], i.e., deg p < n, and let α ∈ F. We would like to have a representation of p(z) in the form
p(z) = ∑_{j=0}^{n−1} p_j (z − α)^j.
We refer to this as the Taylor expansion of p(z) at the point α. Naturally, the standard representation of a polynomial, namely p(z) = ∑_{j=0}^{n−1} p_j z^j, is the Taylor expansion at 0.

Proposition 2.37. Given a positive integer n and α ∈ F. Then
1. The set of polynomials B_α = {1, z − α, . . . , (z − α)^{n−1}} forms a basis for F_n[z].
2. For every p(z) ∈ F_n[z] there exist unique numbers p_{j,α} such that
p(z) = ∑_{j=0}^{n−1} p_{j,α} (z − α)^j.   (2.13)
3. The change of basis transformation is given by
[I]^{st}_{α} = \begin{pmatrix} 1 & −α & α^2 & \cdots & (−α)^{n−1} \\ 0 & 1 & −2α & & \vdots \\ \vdots & & 1 & \ddots & \\ & & & \ddots & −(n−1)α \\ 0 & \cdots & & 0 & 1 \end{pmatrix},
i.e., the ith column entries come from the binomial expansion of (z − α)^{i−1}.

Proof. 1. In B_α we have one polynomial of each degree from 0 to n − 1; hence these polynomials are linearly independent. Since dim F_n[z] = n, they form a basis.
2. Follows from the fact that B_α is a basis.
3. We use the binomial expansion of (z − α)^{i−1}.

The Taylor expansion can be generalized by replacing the basis {(z − λ)^i}_{i=0}^{n−1} with {∏_{j=0}^{i−1}(z − λ_j)}_{i=0}^{n−1}, where the λ_i ∈ F are assumed to be distinct. The proof of the following proposition is analogous to the preceding one and we omit it.

Proposition 2.38. Let λ_i ∈ F, i = 0, . . . , n − 1, be distinct. Then
1. The set of polynomials {∏_{j=0}^{i−1}(z − λ_j)}_{i=0}^{n−1} forms a basis for F_n[z]. Here we use the convention that ∏_{j=0}^{−1}(z − λ_j) = 1.
2. For every p(z) ∈ F_n[z] there exist unique numbers c_i such that
p(z) = ∑_{i=0}^{n−1} c_i ∏_{j=0}^{i−1}(z − λ_j).   (2.14)
We will refer to (2.14) as the Newton expansion of p(z). This expansion is important for solving polynomial interpolation problems, and we will return to it in Section 5.5.3.
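For example, expanding p(z) = z^2 + 1 at the point α = 1 according to Proposition 2.37 gives
p(z) = 2 + 2(z − 1) + (z − 1)^2,
as is verified by multiplying out the right-hand side; the coefficients 2, 2, 1 are the coordinates of p with respect to the basis B_1 = {1, z − 1, (z − 1)^2} of F_3[z].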
2.13 Exercises

1. Let K, L, M be subspaces of a vector space. Show that
K ∩ (K ∩ L + M) = K ∩ L + K ∩ M,
(K + L) ∩ (K + M) = K + (K + L) ∩ M.
2. Let M_i be subspaces of a finite-dimensional vector space X. Show that if dim(∑_{i=1}^{k} M_i) = ∑_{i=1}^{k} dim(M_i), then M_1 + ⋯ + M_k is a direct sum.
3. Let V be a finite-dimensional vector space over F. Let f be a bijective map on V. Define operations by v_1 ∗ v_2 = f^{−1}(f(v_1) + f(v_2)) and α∇v = f^{−1}(αf(v)). Show that with these operations, V is a vector space over F.
4. Let V = {p_{n−1}z^{n−1} + ⋯ + p_1 z + p_0 ∈ F[z] | p_{n−1} + ⋯ + p_1 + p_0 = 0}. Show that V is a finite-dimensional subspace of F[z] and find a basis for it.
5. Let q(z) be a monic polynomial with distinct zeros λ_1, . . . , λ_n, and let p(z) be a monic polynomial of degree n − 1. Show that ∑_{j=1}^{n} p(λ_j)/q′(λ_j) = 1.
6. Let f(z), g(z) ∈ F[z] with g(z) nonzero. Then f(z) has a unique representation of the form f(z) = ∑_i a_i(z)g(z)^i with deg a_i < deg g.
7. Let C_R(X) be the space of all real-valued, continuous functions on a topological space X. Let V_± = {f ∈ C_R(X) | f(x) = ±f(−x)}. Show that C_R(X) = V_+ ⊕ V_−.
2.14 Notes and Remarks Linear algebra is geometric in origin. It traces its development to the work of Fermat and Descartes on analytic geometry. In that context points in the plane are identified with ordered pairs of real numbers. This was extended in the work of Hamilton, who
discovered quaternions. In its early history linear algebra focused on the solution of systems of linear equations, and the use of determinants for that purpose was predominant. Matrices came rather late in the game, formalized by Cayley and Sylvester, both of whom made important contributions. This ushered in the era of matrix theory, which also served as the title of the classic book Gantmacher (1959). For a short discussion of the contributions of Cayley and Sylvester to matrix theory, see Higham (2008). The major step in the development of linear algebra as a field of mathematics is due to Grassmann (1844). In this remarkable book, Grassmann gives a fairly abstract definition of a linear vector space and develops the theory of linear independence very much in the spirit of modern linear algebra expositions. He defines the notions of subspace, independence, span, dimension, sum and intersection of subspaces, as well as projections onto subspaces. He also proves Theorem 2.13, usually attributed to Steinitz. He shows that any finite set has an independent subset with the same span and that any independent set extends to a basis, and he proves the important identity dim(U + V ) = dim U + dim V − dim(U ∩ V ). He obtains the formula for change of coordinates under change of basis, defines elementary transformations of bases, and shows that every change of basis is a product of elementary matrices. Since Grassmann was ahead of his time and did not excel at exposition, his work had little immediate impact. The first formal, modern definition of a vector space appears in Peano (1888), where it is called a linear system. Peano treats also the set of all linear transformations between two vector spaces. However, his most innovative example of a vector space was that of the polynomial functions of a real variable. He noted that if the polynomial functions were restricted to those of degree at most n, then they would form a vector space of dimension n + 1. But if one considered all such polynomial functions, then the vector space would have an infinite dimension. This is a precursor of functional analysis. Thirty years after the publication of Peano’s book, an independent axiomatic approach to vector spaces was given by Weyl. Extensions of the vector space axioms to include topological considerations are due to Schmidt, Banach, Wiener, and Von Neumann. The classic book van der Waerden (1931) seems to be the first algebra book having a chapter devoted to linear algebra. Adopting Noether’s point of view, modules are defined first and vector spaces appear as a special case.
Chapter 3
Determinants
3.1 Introduction Determinants used to be an important tool for the study of systems of linear equations. With the development of the abstract approach, based on the axioms of vector spaces, determinants lost their central role. Still, we feel that taking them out altogether from a text on linear algebra would be a mistake. Throughout the book we shall return to the use of special determinants for showing linear independence of vectors, invertibility of matrices, and coprimeness of polynomials. Since we shall need to compute determinants with polynomial entries, it is natural to develop the theory of determinants over a commutative ring with an identity.
3.2 Basic Properties

Let R be a commutative ring with identity. Let X ∈ R^{n×n} and denote by x_1, . . . , x_n its columns.

Definition 3.1. A determinant is a function D : R^{n×n} −→ R that, as a function of the columns of a matrix in R^{n×n}, satisfies:
1. D(x_1, . . . , x_n) is multilinear, that is, it is a linear function in each of its columns.
2. D(x_1, . . . , x_n) is alternating, by which we mean that it is zero whenever two adjacent columns coincide.
3. It is normalized so that if e_1, . . . , e_n are the columns of the identity matrix, then D(e_1, . . . , e_n) = 1.

Proposition 3.2. If two adjacent columns in a matrix are interchanged, then the determinant changes sign.
Proof. Let x and y be the ith and (i + 1)th columns respectively. Then
0 = D(. . . , x + y, x + y, . . .)
  = D(. . . , x, x + y, . . .) + D(. . . , y, x + y, . . .)
  = D(. . . , x, x, . . .) + D(. . . , x, y, . . .) + D(. . . , y, x, . . .) + D(. . . , y, y, . . .)
  = D(. . . , x, y, . . .) + D(. . . , y, x, . . .).
This property of determinants explains the usage of alternating.

Corollary 3.3. If any two columns in a matrix are interchanged, then the determinant changes sign.

Proof. Suppose the ith and jth columns of the matrix A are interchanged. Without loss of generality, assume i < j. By j − i transpositions of adjacent columns, the jth column can be brought to the ith place. Now the ith column is brought to the jth place by j − i − 1 transpositions of adjacent columns. This changes the value of the determinant by a factor (−1)^{2j−2i−1} = −1.

Corollary 3.4. If in a matrix any two columns are equal, then the determinant vanishes.

Corollary 3.5. If the jth column of a matrix, multiplied by c, is added to the ith column, the value of the determinant does not change.
So far, we have proved elementary properties of the determinant function, but we do not know yet whether such a function exists. Our aim now is to prove the existence of a determinant function, which we will do by a direct, inductive, construction. Definition 3.6. Let A be an n × n matrix over R. We define the (i, j)th minor of A, which we denote by Mi j , as the determinant of the matrix of order n − 1 obtained from A by eliminating the ith row and jth column. We define the (i, j)-cofactor, which we denote by Ai j , by Ai j = (−1)i+ j Mi j . The matrix whose i, j entry is A ji will be called the classical adjoint of A and denoted by adj (A).
Theorem 3.7. For each integer n ≥ 1, a determinant function exists, and is given by the following formula, to which we refer as the expansion by the ith row:
D(A) = ∑_{j=1}^{n} a_{ij} A_{ij}.   (3.1)
Proof. The construction is by an inductive process. For n = 1 we define D(a) = a, and this clearly satisfies the required three properties of the determinant function. Assume that we have constructed determinant functions for integers ≤ n − 1. Now we define a determinant of A via the expansion by rows formula (3.1). We show now that the properties of a determinant are satisfied.
1. Multilinearity: Fix an index k. We will show that D(A) is linear in the kth column. For j ≠ k, a_{ij} is not a function of the entries of the kth column, but A_{ij} is linear in the entries of all columns appearing in it, including the kth. For j = k, the cofactor A_{ik} is not a function of the elements of the kth column, but the coefficient a_{ik} is a linear function. So D(A) is linear as a function of the kth column.
2. Alternacy: Assume that the kth and (k + 1)th columns are equal. Then, for each index j ≠ k, k + 1 we clearly have A_{ij} = (−1)^{i+j} M_{ij}, and M_{ij} contains two adjacent and equal columns, hence is zero by the induction hypothesis. Therefore
D(A) = a_{ik} A_{ik} + a_{i(k+1)} A_{i(k+1)} = a_{ik}(−1)^{i+k} M_{ik} + a_{i(k+1)}(−1)^{i+k+1} M_{i(k+1)} = a_{ik}(−1)^{i+k} M_{ik} + a_{ik}(−1)^{i+k+1} M_{i(k+1)} = 0,
for a_{ik} = a_{i(k+1)} and M_{ik} = M_{i(k+1)}.
3. Normalization: We compute
D(I) = ∑_{j=1}^{n} δ_{ij} (−1)^{i+j} I_{ij} = (−1)^{2i} I_{ii} = 1,
for I_{ii} is the determinant of the identity matrix of order n − 1, hence equal to 1 by the induction hypothesis.

We use now the availability of the determinant function to discuss permutations. By a permutation σ of n elements we will mean a bijective map of the set {1, . . . , n} onto itself. The set of all such permutations S_n is a group and is called the symmetric group. We use alternatively the notation (σ_1, . . . , σ_n) for the permutation. Assume now that (σ_1, . . . , σ_n) is a permutation. Let e_1, . . . , e_n be the standard unit vectors in R^n. Since a determinant function exists, D(e_{σ_1}, . . . , e_{σ_n}) = ±1. We can bring the permutation matrix, whose columns are the e_{σ_i}, to the identity matrix in
a finite number of column exchanges. Clearly, the sign depends on the permutation only. Thus we define
(−1)^σ = sign(σ) = D(e_{σ_1}, . . . , e_{σ_n}).   (3.2)
We will call (−1)^σ the sign of the permutation. We say that the permutation is even if (−1)^σ = 1 and odd if (−1)^σ = −1. Given a matrix A, we denote by A^{(i)} its ith column. Thus we have
A^{(j)} = ∑_{i=1}^{n} a_{ij} e_i.
Therefore, we can compute
D(A) = D(A^{(1)}, . . . , A^{(n)}) = D(∑_{i_1=1}^{n} a_{i_1 1} e_{i_1}, . . . , ∑_{i_n=1}^{n} a_{i_n n} e_{i_n})
     = ∑_{i_1=1}^{n} ⋯ ∑_{i_n=1}^{n} a_{i_1 1} ⋯ a_{i_n n} D(e_{i_1}, . . . , e_{i_n})
     = ∑_{σ∈S_n} a_{σ_1 1} ⋯ a_{σ_n n} D(e_{σ_1}, . . . , e_{σ_n})
     = ∑_{σ∈S_n} (−1)^σ a_{σ_1 1} ⋯ a_{σ_n n} D(e_1, . . . , e_n)
     = ∑_{σ∈S_n} (−1)^σ a_{σ_1 1} ⋯ a_{σ_n n}.   (3.3)
The existence of this expression leads immediately to the following. Theorem 3.8. Let R be a commutative ring with an identity. Then, for any integer n, there exists a unique determinant function, and it is given by (3.4). Proof. Let D be any function defined on Rn×n and satisfying the determinantal axioms. Then clearly, using the computation (3.3), we get D(A) = det(A). There was a basic asymmetry in our development of determinants, since we focused on the columns. The next result is a step toward putting the theory of determinants into a more symmetric state. Theorem 3.9. Given an n × n matrix A over the commutative ring R, we have for ˜ its transpose A, ˜ = det(A). det(A) (3.5) Proof. We use the fact that a˜i j = a ji , where a˜i j denotes the i j entry of the transposed ˜ Thus we have matrix A.
3.2 Basic Properties
59
˜ = ∑σ ∈S (−1)σ a˜σ (1)1 · · · a˜σ (n)n det(A) n = ∑σ ∈Sn (−1)σ a1σ (1) · · · anσ (n) −1 = ∑σ −1 ∈Sn (−1)σ aσ −1 (1)1 · · · aσ −1 (n)n = det(A).
Corollary 3.10. For each integer n ≥ 1, a determinant function exists, and is given by the following formula, to which we refer as the expansion by the jth column:
D(A) = ∑_{i=1}^{n} a_{ij} A_{ij}.   (3.6)

Proof. We apply the expansion by rows to the transposed matrix. Using the fact that ã_{ji} = a_{ij} and Ã_{ji} = A_{ij}, we obtain
det(A) = det(Ã) = ∑_{j=1}^{n} ã_{ji} Ã_{ji} = ∑_{j=1}^{n} a_{ij} A_{ij}.
Corollary 3.11. The determinant, as a function of the rows of a matrix, is multilinear, alternating, and satisfies det(I) = 1.

We can prove now the important multiplicative rule of determinants.

Theorem 3.12. Let R be a commutative ring and let A and B be matrices in R^{n×n}. Then det(AB) = det(A) det(B).

Proof. Let C = AB. Let A^{(j)} and C^{(j)} be the jth columns of A and C respectively. Clearly C^{(j)} = ∑_{i=1}^{n} b_{ij} A^{(i)}. Hence
det(C) = det(C^{(1)}, . . . , C^{(n)}) = det(∑_{i_1=1}^{n} b_{i_1 1} A^{(i_1)}, . . . , ∑_{i_n=1}^{n} b_{i_n n} A^{(i_n)})
       = ∑_{i_1=1}^{n} ⋯ ∑_{i_n=1}^{n} b_{i_1 1} ⋯ b_{i_n n} det(A^{(i_1)}, . . . , A^{(i_n)})
       = ∑_{σ∈S_n} b_{σ(1)1} ⋯ b_{σ(n)n} (−1)^σ det(A^{(1)}, . . . , A^{(n)})
       = det(A) ∑_{σ∈S_n} (−1)^σ b_{σ(1)1} ⋯ b_{σ(n)n} = det(A) det(B).
We can use the expansion by rows and columns to obtain the following result.

Corollary 3.13. Let A be an n × n matrix over R. Then
∑_{k=1}^{n} a_{ik} A_{jk} = δ_{ij} det(A),
∑_{k=1}^{n} a_{ki} A_{kj} = δ_{ij} det(A).   (3.7)

Proof. The first equation is the expansion of the determinant by the jth row if i = j, whereas if i ≠ j it is the expansion by the jth row of a matrix derived from A by
replacing the jth row with the ith. Thus, it is the determinant of a matrix with two equal rows; hence its value is zero. The other formula is proved analogously.

The determinant expansion rules can be written in matrix form by defining a matrix adj A, called the classical adjoint of A, via (adj A)_{ij} = A_{ji}. Clearly, the expansion formulas, given in Corollary 3.13, can now be written concisely as
(adj A) A = A (adj A) = det A · I.   (3.8)
This leads to the important determinantal criterion for the nonsingularity of matrices.

Corollary 3.14. Let A be a square matrix over the field F. Then A is invertible if and only if det A ≠ 0.

Proof. Assume that A is invertible. Then there exists a matrix B such that AB = I. By the multiplication rule of determinants, we have det(AB) = det(A) · det(B) = det(I) = 1, and hence det(A) ≠ 0. Conversely, assume det(A) ≠ 0. Then
A^{-1} = \frac{1}{\det A}\,\mathrm{adj}\,A.   (3.9)
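In the 2 × 2 case formula (3.9) takes a familiar form. For A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} the cofactors give adj A = \begin{pmatrix} d & −b \\ −c & a \end{pmatrix}, so that whenever det A = ad − bc ≠ 0,
A^{-1} = \frac{1}{ad − bc}\begin{pmatrix} d & −b \\ −c & a \end{pmatrix}.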
3.3 Cramer's Rule

We can use determinants to give a closed-form representation of the solution of a system of linear equations, provided that the coefficient matrix of the system is nonsingular.

Theorem 3.15. Given the system of linear equations Ax = b, with A a nonsingular n × n matrix. Denote by A^{(i)} the columns of the matrix A and let x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} be the solution vector, that is, b = ∑_{i=1}^{n} x_i A^{(i)}. Then
x_i = \frac{\det(A^{(1)}, . . . , b, . . . , A^{(n)})}{\det(A)},   (3.10)
or equivalently,
x_i = \frac{1}{\det A}\det\begin{pmatrix} a_{11} & \cdots & b_1 & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & b_n & \cdots & a_{nn} \end{pmatrix},   (3.11)
where the vector b replaces the ith column.
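To illustrate (3.11), consider the real system 2x_1 + x_2 = 5, x_1 + 3x_2 = 10. Here det A = 2 · 3 − 1 · 1 = 5, and Cramer's rule gives
x_1 = \frac{1}{5}\det\begin{pmatrix} 5 & 1 \\ 10 & 3 \end{pmatrix} = 1,  x_2 = \frac{1}{5}\det\begin{pmatrix} 2 & 5 \\ 1 & 10 \end{pmatrix} = 3,
which indeed solves the system.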
Proof. We compute
det(A^{(1)}, . . . , b, . . . , A^{(n)}) = det(A^{(1)}, . . . , ∑_{j=1}^{n} x_j A^{(j)}, . . . , A^{(n)}) = ∑_{j=1}^{n} x_j det(A^{(1)}, . . . , A^{(j)}, . . . , A^{(n)}) = x_i det(A^{(1)}, . . . , A^{(i)}, . . . , A^{(n)}) = x_i det(A).
Another way to see this is to note that in this case, the unique solution is given by x = A^{-1}b. We will compute the ith component, x_i:
x_i = (A^{-1}b)_i = ∑_{j=1}^{n} (A^{-1})_{ij} b_j = \frac{1}{\det A} ∑_{j=1}^{n} (\mathrm{adj}\,A)_{ij} b_j = \frac{1}{\det A} ∑_{j=1}^{n} b_j A_{ji}.
However, this is just the expansion by the ith column of the matrix A, where the ith column has been replaced by the elements of the vector b.

The representation (3.11) of the solution to the system of equations Ax = b goes by the name Cramer's rule.

We prove now an elementary, but useful, result about the computation of determinants.

Proposition 3.16. Let A and B be square matrices, not necessarily of the same size. Then
\det\begin{pmatrix} A & C \\ 0 & B \end{pmatrix} = \det A · \det B.   (3.12)

Proof. We note that \det\begin{pmatrix} A & C \\ 0 & B \end{pmatrix} is multilinear and alternating in the columns of A and in the rows of B. Therefore
\det\begin{pmatrix} A & C \\ 0 & B \end{pmatrix} = \det A · \det B · \det\begin{pmatrix} I & C \\ 0 & I \end{pmatrix}.
But a consideration of elementary operations in rows and columns leads to
\det\begin{pmatrix} I & C \\ 0 & I \end{pmatrix} = \det\begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix} = 1.
Lemma 3.17. Let A, B, C, and D be matrices of appropriate size, with A, D square, such that C and D commute. Then
\det\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det(AD − BC).   (3.13)

Proof. Without loss of generality, let det D ≠ 0. From the equality
\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \begin{pmatrix} A − BD^{-1}C & B \\ 0 & D \end{pmatrix}\begin{pmatrix} I & 0 \\ D^{-1}C & I \end{pmatrix},
we conclude that
\det\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det(A − BD^{-1}C) · \det D = \det(A − BCD^{-1}) · \det D = \det(AD − BC).   (3.14)
(3.15)
3.4 The Sylvester Resultant
63
⎛
p0 ⎜. ⎜ ⎜. ⎜ ⎜. ⎜ ⎜ ⎜p Res (p, q) = ⎜ m ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝
. .. ... .... .... ... .. .
q0 . . . . p0 q n . . . pm
⎞ ⎟ . ⎟ ⎟ .. ⎟ ... ⎟ ⎟ ⎟ . . . q0 ⎟ ⎟ .... ⎟ ⎟ .... ⎟ ⎟ ... ⎟ ⎟ .. ⎠
(3.16)
qn
is nonsingular. 4. The resultant, defined as the determinant of the resultant matrix, is nonzero. Proof. (1) ⇒ (2) Follows from the Bezout equation directly or from Proposition 2.21 as a special case. (2) ⇒ (1). Since 1 ∈ Fm+n [z], it follows that a solution exists to the Bezout equation a(z)p(z) + b(z)q(z) = 1. Moreover, with the conditions deg a < n, deg b < m satisfied, it is unique. (2) ⇒ (3) By Proposition 2.21, the equality Fm+n [z] = pFn [z] + qFm [z] is equivalent to this sum being a direct sum. In turn, this implies that a union of bases for pFn [z] and qFm [z] is a basis for Fm+n [z]. Now {zi p(z)|i = 0, . . . , n − 1} is a basis for pFn [z], and {zi q(z)|i = 0, . . . , m − 1} is a basis for qFm [z]. Thus Bres = {p(z), zp(z), . . . , zn−1 p(z), q(z), zq(z), . . . , zm−1 q(z)} forms a basis for Fm+n [z]. The last space has also the standard basis Bst = {1, z, . . . , zm+n−1 }. It is easy to check that Res (p, q) = [I]stres , i.e., it is a change of basis transformation, hence necessarily invertible. (3) ⇒ (2) If Res (p, q) is nonsingular, this shows that Bres is a basis for Fm+n [z]. This implies the equality (3.15). (3) ⇔ (4) Follows from Corollary 3.14.
We conclude this short chapter by computing the determinant of the Vandermonde matrix, derived in (2.12). Proposition 3.19. Let ⎛
1 λ1 ⎜. . ⎜ ⎜ V (λ1 , . . . , λn ) = ⎜ . . ⎜ ⎝. . 1 λn be the Vandermonde matrix. Then we have
⎞ . . λ1n−1 .. . ⎟ ⎟ ⎟ .. . ⎟ ⎟ .. . ⎠ . . λnn−1
\det V(λ_1, . . . , λ_n) = \begin{vmatrix} 1 & λ_1 & \cdots & λ_1^{n−1} \\ \vdots & \vdots & & \vdots \\ 1 & λ_n & \cdots & λ_n^{n−1} \end{vmatrix} = ∏_{1 ≤ j < i ≤ n} (λ_i − λ_j).   (3.17)

Proof. By induction on n. For n = 1 the equality is trivially satisfied. Assume that it holds for all integers < n. We consider the following determinant:
f(z) = \begin{vmatrix} 1 & λ_1 & \cdots & λ_1^{n−1} \\ \vdots & \vdots & & \vdots \\ 1 & λ_{n−1} & \cdots & λ_{n−1}^{n−1} \\ 1 & z & \cdots & z^{n−1} \end{vmatrix}.
Obviously, f(z) is a polynomial of degree n − 1 that vanishes at the points λ_1, . . . , λ_{n−1}. Thus f(z) = C_{n−1}(z − λ_1) ⋯ (z − λ_{n−1}). Expanding by the last row, and using the induction hypothesis, the leading coefficient is
C_{n−1} = \begin{vmatrix} 1 & λ_1 & \cdots & λ_1^{n−2} \\ \vdots & \vdots & & \vdots \\ 1 & λ_{n−1} & \cdots & λ_{n−1}^{n−2} \end{vmatrix} = ∏_{1 ≤ j < i ≤ n−1} (λ_i − λ_j).
Since det V(λ_1, . . . , λ_n) = f(λ_n), equality (3.17) follows.
We note that for (3.17) to hold, we do not have to assume the λi to be distinct. However, clearly, the determinant of a Vandermonde matrix is nonzero if and only if the λi are distinct.
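To illustrate Theorem 3.18 with the smallest nontrivial case, take p(z) = −1 + z and q(z) = 1 + z, so m = n = 1. The resultant matrix is
\mathrm{Res}(p, q) = \begin{pmatrix} −1 & 1 \\ 1 & 1 \end{pmatrix},
whose determinant is −2. Over a field of characteristic different from 2 this is nonzero, so p(z) and q(z) are coprime and F_2[z] = pF_1[z] ⊕ qF_1[z]; in characteristic 2 the two polynomials coincide and the resultant vanishes.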
3.5 Exercises

1. Let A, B be n × n matrices. For the classical adjoint show the following:
a. adj(AB) = adj B · adj A.
b. det adj A = (det A)^{n−1}.
c. adj(adj A) = (det A)^{n−2} · A.
d. det adj(adj A) = (det A)^{(n−1)^2}.
2. Let a square n × n matrix A be defined by a_{ij} = 1 − δ_{ij}, where δ_{ij} is the Kronecker delta. Show that det A = (n − 1)(−1)^{n−1}.
3. Let A, B be square matrices. Show that
\det\begin{pmatrix} A & B \\ B & A \end{pmatrix} = \det(A + B) · \det(A − B).
4. Given the block matrix \begin{pmatrix} A & B \\ C & D \end{pmatrix}, with A, D square and D nonsingular, show that
\det\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det D · \det(A − BD^{-1}C).
5. Let A be a square n × n matrix, x, y ∈ F^n, and 0 ≠ α ∈ F. Show that
α^{n−1}\det\begin{pmatrix} A & x \\ ỹ & α \end{pmatrix} = \det(αA − xỹ).
Show that adj(I − xỹ) = xỹ + (1 − ỹx)I.
⎛
p0 pn−1 ⎜ p ⎜ 1 p0 ⎜ det ⎜ . . ⎜ ⎝ . . pn−1 .
. . . . .
⎞ . p1 . . ⎟ ⎟ ⎟ . . ⎟ = 0. ⎟ . pn−1 ⎠ p1 p 0
7. Let V (λ1 , . . . , λn ) be the Vandermonde determinant. Show that 1 . . . . 1
λ1 . . . . λn
. . λ1n−2 λ2 · · · λn .. . . .. . . = (−1)n−1V (λ1 , . . . , λn ). .. . . .. . . n−2 . . λn λ1 · · · λn−1
8. Let V (λ1 , . . . , λn ) be the Vandermonde determinant. Show that
66
3 Determinants
1 λ1 . . . . . . . . 1 λn
. . λ1n−2 (λ2 + · · · + λn )n−1 .. . . .. . . = (−1)n−1V (λ1 , . . . , λn ). .. . . .. . . n−2 n−1 . . λn (λ1 + · · · + λn−1)
9. Given λ1 , . . . , λn , define sk = p1 λ1k + · · ·+ pn λnk . Set, for i, j = 0, . . . , n − 1, ai j = si+ j . Show that s 0 . det(ai j ) = . . sn−1
. . . sn−1 . . . . . . . . = p1 · · · pn ∏(λi − λ j )2 . i> j ... . . . . s2n−2
10. Show that 1 x 1 + y1 . . . 1 x +y n 1
... ... ... ... ...
1 x 1 + yn . . . 1 x n + yn
∏i> j (xi − x j )(yi − y j ) = . ∏i, j (xi + y j )
This determinant is generally known as the Cauchy determinant.
3.6 Notes and Remarks The theory of determinants predates the development of linear algebra, and it reached its high point in the nineteenth century. Thomas Muir, in his monumental The Theory of Determinants in the Historical Order of Development, Macmillan, 4 volumes, 1890-1923, (reprinted by Dover Books 1960), begins with Leibniz in 1693. However, a decade earlier, the Japanese mathematician Seki Kowa developed a theory of determinants; see Joseph (2000). The first systematic study is due to Cauchy, who apparently coined the term determinant in the sense we use today, and made important contributions, including the multiplication rule and the expansion rules that are named after him. The axiomatic treatment presented here is due to Kronecker and Weierstrass, probably in the 1860s. Their work became known in 1903, when Weierstrass’ On Determinant Theory and Kronecker’s Lectures on Determinant Theory were published posthumously.
Chapter 4
Linear Transformations
4.1 Introduction The study of linear transformations, and their structure, provides the core of linear algebra. We shall study matrix representations of linear transformations, linear functionals, and duality and the adjoint transformation. To conclude, we show how a linear transformation in a vector space induces a module structure over the corresponding ring of polynomials. This simple construction opens the possibility of using general algebraic results as a tool for the study of linear transformations.
4.2 Linear Transformations

Definition 4.1. Let V and W be two vector spaces over the field F. A mapping T from V to W, denoted by T : V −→ W, is a linear transformation, or a linear operator, if
T(αx + βy) = α(Tx) + β(Ty)   (4.1)
holds for all x, y ∈ V and all α, β ∈ F.

For an arbitrary vector space V, we define the zero and identity transformations by 0x = 0 and I_V x = x, for all x ∈ V, respectively. Clearly, both are linear transformations. If there is no confusion about the space, we may drop the subscript and write I. The property of linearity of a transformation T is equivalent to the following two properties:
• Additivity: T(x + y) = Tx + Ty.
• Homogeneity: T(αx) = α(Tx).
The structure of a linear transformation is very rigid. It suffices to know its action on basis vectors in order to determine the transformation uniquely.
Theorem 4.2. Let V, W be two vector spaces over the same field F. Let {e_1, . . . , e_n} be a basis for V and let {f_1, . . . , f_n} be arbitrary vectors in W. Then there exists a unique linear transformation T : V −→ W for which Te_i = f_i, for all i = 1, . . . , n.

Proof. Every x ∈ V has a unique representation x = ∑_{i=1}^{n} α_i e_i. We define T : V −→ W by
Tx = ∑_{i=1}^{n} α_i f_i.
Clearly Te_i = f_i, for all i = 1, . . . , n. Next we show that T is linear. Let y ∈ V have the representation y = ∑_{i=1}^{n} β_i e_i. Then
T(αx + βy) = T ∑_{i=1}^{n} (αα_i + ββ_i)e_i = ∑_{i=1}^{n} (αα_i + ββ_i) f_i = ∑_{i=1}^{n} (αα_i) f_i + ∑_{i=1}^{n} (ββ_i) f_i = α ∑_{i=1}^{n} α_i f_i + β ∑_{i=1}^{n} β_i f_i = α(Tx) + β(Ty).
To prove uniqueness, let S : V −→ W be a linear transformation satisfying Se_i = f_i for all i. Then, for every x ∈ V,
Sx = S ∑_{i=1}^{n} α_i e_i = ∑_{i=1}^{n} α_i Se_i = ∑_{i=1}^{n} α_i f_i = Tx.
This implies T = S.
α1 y1 + α2 y2 = α1 T x1 + α2 T x2 = T (α1 x1 + α2 x2 ), and α1 y1 + α2 y2 ∈ Im T .
We define the rank of a linear transformation T, denoted by rank(T), by
rank(T) = dim Im T,   (4.2)
and the nullity of T, denoted null(T), by
null(T) = dim Ker T.   (4.3)
The rank and nullity of a linear transformation are connected via the following theorem.

Theorem 4.4. Let V and W be finite-dimensional vector spaces over F and let T : V −→ W be a linear transformation. Then
rank(T) + null(T) = dim V,   (4.4)
or equivalently,
dim Im T + dim Ker T = dim V.   (4.5)

Proof. Let {e_1, . . . , e_n} be a basis of V for which {e_1, . . . , e_p} is a basis of Ker T. Let x ∈ V; then x has a unique representation of the form x = ∑_{i=1}^{n} α_i e_i. Since T is linear, we have
Tx = T ∑_{i=1}^{n} α_i e_i = ∑_{i=1}^{n} α_i Te_i = ∑_{i=p+1}^{n} α_i Te_i,
since Te_i = 0 for i = 1, . . . , p. Therefore it follows that {Te_{p+1}, . . . , Te_n} is a spanning set of Im T. We will show that they are also linearly independent. Assume that there exist {α_{p+1}, . . . , α_n} such that ∑_{i=p+1}^{n} α_i (Te_i) = 0. Now ∑_{i=p+1}^{n} α_i (Te_i) = T ∑_{i=p+1}^{n} α_i e_i, which implies that ∑_{i=p+1}^{n} α_i e_i ∈ Ker T. Hence, it can be represented as a linear combination of the form ∑_{i=1}^{p} α_i e_i. So we get
∑_{i=1}^{p} α_i e_i − ∑_{i=p+1}^{n} α_i e_i = 0.
Using the fact that {e_1, . . . , e_n} is a basis of V, we conclude that α_i = 0 for all i = 1, . . . , n. Thus we have proved that {Te_{p+1}, . . . , Te_n} is a basis of Im T. Therefore null(T) = p, rank(T) = n − p, and null(T) + rank(T) = p + (n − p) = n = dim V.

Let U and V be vector spaces over the field F. We denote by L(U, V), or Hom_F(U, V), the space of all linear transformations from U to V. Given two transformations T, S ∈ L(U, V), we define their sum by
(T + S)x = T x + Sx, and the product with any scalar α ∈ F by (α T )x = α (T x). Theorem 4.5. With the previous definitions, L(U , V ) is a vector space. Proof. We show that T + S and α T are indeed linear transformations. Let x, y ∈ V , α ∈ F. We have (T + S)(x + y) = T (x + y) + S(x + y) = (T x + Ty) + (Sx + Sy) = (T x + Sx) + (Ty + Sy) = (T + S)x + (T + S)y. Similarly, (T + S)(α x) = T (α x) + S(α x) = α (T x) + α (Sx) = α (T x + Sx) = α ((T + S)x). This shows that T +S is a linear transformation. The proof for α T goes along similar lines. We leave it to the reader to verify that all the axioms of a linear space hold for L(U , V ). It is clear that the dimension of L(U , V ) is determined by the spaces U and V . The next theorem characterizes it. Theorem 4.6. Let U and V be vector spaces over the field F. Then dim L(U , V ) = dim U · dimV .
(4.6)
Proof. Let {e_1, . . . , e_n} be a basis for U and {f_1, . . . , f_m} a basis for V. We proceed to construct a basis for L(U, V). By Theorem 4.2 a linear transformation is determined by its values on basis elements. We define a set of mn linear transformations E_{ij}, i = 1, . . . , m, j = 1, . . . , n, by letting
E_{ij} e_k = δ_{jk} f_i,  k = 1, . . . , n.   (4.7)
Here δ_{ij} is the Kronecker delta function, defined by (2.4). On the vector x = ∑_{k=1}^{n} α_k e_k the transformation E_{ij} acts as
E_{ij} x = E_{ij} ∑_{k=1}^{n} α_k e_k = ∑_{k=1}^{n} α_k E_{ij} e_k = ∑_{k=1}^{n} α_k δ_{jk} f_i = α_j f_i,
or
E_{ij} x = α_j f_i.
We now show that the set of transformations E_{ij} forms a basis for L(U, V). To prove this we have to show that this set is linearly independent and spans L(U, V). We begin by showing linear independence. Assume there exist α_{ij} ∈ F for which
∑_{i=1}^{m} ∑_{j=1}^{n} α_{ij} E_{ij} = 0.
We operate with the zero transformation, in both its representations, on an arbitrary basis element e_k in U:
0 = 0 · e_k = ∑_{i=1}^{m} ∑_{j=1}^{n} α_{ij} E_{ij} e_k = ∑_{i=1}^{m} ∑_{j=1}^{n} α_{ij} δ_{jk} f_i = ∑_{i=1}^{m} α_{ik} f_i.
Since the f_i are linearly independent, we get α_{ik} = 0 for all i = 1, . . . , m. The index k was chosen arbitrarily; hence we conclude that all the α_{ij} are zero. This completes the proof of linear independence.
Let T be an arbitrary linear transformation in L(U, V). Since Te_k ∈ V, it has a unique representation of the form Te_k = ∑_{i=1}^{m} b_{ik} f_i, k = 1, . . . , n. We show now that T = ∑_{i=1}^{m} ∑_{j=1}^{n} b_{ij} E_{ij}. Of course, by Theorem 4.2, it suffices to prove the equality on the basis elements in U. We compute
(∑_{i=1}^{m} ∑_{j=1}^{n} b_{ij} E_{ij}) e_k = ∑_{i=1}^{m} ∑_{j=1}^{n} b_{ij} (E_{ij} e_k) = ∑_{i=1}^{m} ∑_{j=1}^{n} b_{ij} δ_{jk} f_i = ∑_{i=1}^{m} b_{ik} f_i = Te_k.
Since k can be chosen arbitrarily, the proof is complete.
In some cases linear transformations can be composed, or multiplied. Let U, V, W be vector spaces over F. Let S ∈ L(V, W) and T ∈ L(U, V), i.e.,
U \xrightarrow{\;T\;} V \xrightarrow{\;S\;} W.
We define the transformation ST : U −→ W by
(ST)x = S(Tx),  ∀x ∈ U.   (4.8)
We call the transformation ST the composition, or product, of S and T . Theorem 4.7. The transformation ST : U −→ W is linear, i.e., ST ∈ L(U , W ). Proof. Let x, y ∈ U and α , β ∈ F. By the linearity of S and T we get (ST )(α x + β y) = S(T (α x + β y)) = S(α (T x) + β (Ty)) = α S(T x) + β S(Ty) = α (ST )x + β (ST)y.
Composition of transformations is associative; that is, if
U \xrightarrow{\;T\;} V \xrightarrow{\;S\;} W \xrightarrow{\;R\;} Y,
then it is easily checked that with the composition of linear transformations defined by (4.8), the following associative rule holds:
(RS)T = R(ST).   (4.9)
We saw already the existence of the identity map I in each linear space U . Clearly, we have IT = T I = T for every linear transformation T . Definition 4.8. Let T ∈ L(V , W ). We say that T is right invertible if there exists a linear transformation S ∈ L(W , V ) for which T S = IW . Similarly, we say that T is left invertible if there exists a linear transformation S ∈ L(W , V ) for which ST = IV . We say that T is invertible if it is both right and left invertible. Proposition 4.9. If T is both right and left invertible, then the right inverse and the left inverse are equal and will be denoted by T −1 . Proof. Let T S = IW and RT = IV . Then R(T S) = (RT )S implies R = RIW = IV S = S.
Left and right invertibility are intrinsic properties of the transformation T , as follows from the next theorem. Theorem 4.10. Let T ∈ L(V , W ). Then 1. T is right invertible if and only if T is surjective. 2. T is left invertible if and only if T is injective. Proof. 1. Assume T is right invertible. Thus, there exists S ∈ L(W , V ) for which T S = IW . Therefore, for all x ∈ W , we have T (Sx) = (T S)x = IW x = x, or x ∈ Im T , and so Im T = W and T is surjective. Conversely, assume T is surjective. Let { f 1 , . . . , fm } be a basis in W and let {e1 , . . . , em } be vectors in V satisfying Tei = fi , for i = 1, . . . , m. Clearly, the
vectors {e_1, . . . , e_m} are linearly independent. We define a linear transformation S ∈ L(W, V) by
S f_i = e_i,  i = 1, . . . , m.
Then, for all i, (TS)f_i = T(Sf_i) = Te_i = f_i, i.e., TS = I_W.
2. Assume T is left invertible. Let S ∈ L(W, V) be a left inverse, that is, ST = I_V. Let x ∈ Ker T. Then
x = I_V x = (ST)x = S(Tx) = S(0) = 0,
that is, Ker T = {0}.
Conversely, assume Ker T = {0}. Let {e_1, . . . , e_n} be a basis in V. It follows from our assumption that {Te_1, . . . , Te_n} are linearly independent vectors in W. We complete this set of vectors to a basis {Te_1, . . . , Te_n, f_{n+1}, . . . , f_m} of W. We now define a linear transformation S ∈ L(W, V) by
S(Te_i) = e_i,  i = 1, . . . , n,
S f_i = 0,  i = n + 1, . . . , m.
Obviously, STe_i = e_i for all i, and hence ST = I_V.
It should be noted that the left inverse is generally not unique. In our construction, we had the freedom to define S differently on the vectors {f_{n+1}, . . . , f_m}, but we opted for the simplest definition.

The previous characterizations are special cases of the more general result.

Theorem 4.11. Let U, V, W be vector spaces over F.
1. Assume A ∈ L(U, W) and B ∈ L(U, V). Then there exists C ∈ L(V, W) such that A = CB if and only if
Ker A ⊃ Ker B.   (4.10)
2. Assume A ∈ L(U, W) and B ∈ L(V, W). Then there exists C ∈ L(U, V) such that A = BC if and only if
Im A ⊂ Im B.   (4.11)

Proof. 1. If A = CB, then for x ∈ Ker B we have Ax = C(Bx) = C0 = 0, i.e., x ∈ Ker A, and (4.10) follows.
Conversely, assume Ker A ⊃ Ker B. Let {e_1, . . . , e_r, e_{r+1}, . . . , e_p, e_{p+1}, . . . , e_n} be a basis in U such that {e_1, . . . , e_p} is a basis for Ker A and {e_1, . . . , e_r} is a basis for Ker B. The vectors {Be_{r+1}, . . . , Be_n} are linearly independent in V. We complete them to a basis of V and define a linear transformation C : V −→ W by
C(Be_i) = Ae_i,  i = p + 1, . . . , n,
C(Be_i) = 0,  i = r + 1, . . . , p,
and arbitrarily on the other basis elements of V. Then, for every x = ∑_{i=1}^{n} α_i e_i ∈ U we have
Ax = A ∑_{i=1}^{n} α_i e_i = ∑_{i=1}^{n} α_i Ae_i = ∑_{i=p+1}^{n} α_i (Ae_i) = ∑_{i=p+1}^{n} α_i (CB)e_i = CB ∑_{i=p+1}^{n} α_i e_i = CB ∑_{i=1}^{n} α_i e_i = CBx,
and so A = CB.
2. If A = BC, then for every x ∈ U we have Ax = (BC)x = B(Cx), or Im A ⊂ Im B.
Conversely, assume Im A ⊂ Im B. Then we let {e_1, . . . , e_r, e_{r+1}, . . . , e_n} be a basis for U such that {e_{r+1}, . . . , e_n} is a basis for Ker A. Then {Ae_1, . . . , Ae_r} are linearly independent vectors in W. By our assumption Im A ⊂ Im B, there exist vectors {g_1, . . . , g_r} in V, necessarily linearly independent, such that
Ae_i = Bg_i,  i = 1, . . . , r.
We complete them to a basis {g_1, . . . , g_r, g_{r+1}, . . . , g_m} of V. We define now C ∈ L(U, V) by
Ce_i = g_i,  i = 1, . . . , r,
Ce_i = 0,  i = r + 1, . . . , n.
For x = ∑_{i=1}^{n} α_i e_i ∈ U, we get
A ∑_{i=1}^{n} α_i e_i = ∑_{i=1}^{r} α_i Ae_i = ∑_{i=1}^{r} α_i Bg_i = B ∑_{i=1}^{r} α_i g_i = B ∑_{i=1}^{r} α_i Ce_i = B ∑_{i=1}^{n} α_i Ce_i = BC ∑_{i=1}^{n} α_i e_i = BCx,
and A = BC.
Theorem 4.12. Let T ∈ L(U , V ) be invertible. Then T −1 ∈ L(V , U ). Proof. Let y1 , y2 ∈ V and α1 , α2 ∈ F. There exist unique vectors x1 , x2 ∈ U such that T xi = yi , or equivalently, xi = T −1 yi . Since
T (α1 x1 + α2 x2 ) = α1 T x1 + α2 T x2 = α1 y1 + α2 y2 , therefore T −1 (α1 y1 + α2 y2 ) = α1 x1 + α2 x2 = α1 T −1 y1 + α2 T −1 y2 .
A linear transformation T : U −→ V is called an isomorphism if it is invertible. In this case we will say that U and V are isomorphic spaces. Clearly, isomorphism of vector spaces is an equivalence relation. Theorem 4.13. Every n-dimensional vector space U over F is isomorphic to Fn . Proof. Let B = {e1 , . . . , en } be a basis for U . The map x → [x]B is an isomorphism. Theorem 4.14. Two finite-dimensional vector spaces V and W over F are isomorphic if and only if they have the same dimension. Proof. If U and V have the same dimension, then we can construct a map that maps the basis elements in U onto the basis elements in V . This map is necessarily an isomorphism. Conversely, if U and V are isomorphic, the image of a basis in U is necessarily a basis in V . Thus dimensions are equal.
4.3 Matrix Representations

Let U, V be vector spaces over a field F. Let B_U = {e_1, . . . , e_n} and B_V = {f_1, . . . , f_m} be bases of U and V respectively. We saw before that the set of linear transformations {E_{ij}}, i = 1, . . . , m, j = 1, . . . , n, defined by
E_{ij} e_k = δ_{jk} f_i,   (4.12)
is a basis for L(U, V). We denote this basis by B_U × B_V. A natural problem presents itself, namely finding the coordinate vector of a linear transformation T ∈ L(U, V) with respect to this basis. We observed before, in Theorem 4.6, that T can be written as
T = ∑_{i=1}^{m} ∑_{j=1}^{n} t_{ij} E_{ij}.   (4.13)
Hence
Te_k = ∑_{i=1}^{m} ∑_{j=1}^{n} t_{ij} E_{ij} e_k = ∑_{i=1}^{m} ∑_{j=1}^{n} t_{ij} δ_{jk} f_i = ∑_{i=1}^{m} t_{ik} f_i,   (4.14)
i.e.,
Te_k = ∑_{i=1}^{m} t_{ik} f_i.   (4.15)
Therefore, the coordinate vector of T with respect to the basis B_U × B_V will have the entries t_{ik} arranged in the same order in which the basis elements are ordered. In contrast to the case of an abstract vector space, we will arrange the coordinates in this case in an m × n matrix (t_{ij}) and call it the matrix representation of T with respect to the bases B_U in U and B_V in V. We will use the following notation:
[T]^{B_V}_{B_U} = (t_{ik}).   (4.16)
The importance of the matrix representation stems from the following theorem, which reduces the application of an arbitrary linear transformation to matrix multiplication.

Theorem 4.15. Let U and V be n-dimensional and m-dimensional vector spaces with respective bases B_U and B_V, and let T : U −→ V be a linear transformation. Then the following diagram is commutative:

U \xrightarrow{\;T\;} V
[·]^{B_U} ↓        ↓ [·]^{B_V}
F^n \xrightarrow{\;[T]^{B_V}_{B_U}\;} F^m

In other words, we have
[Tx]^{B_V} = [T]^{B_V}_{B_U} [x]^{B_U}.
Proof. Assume x = ∑_{j=1}^{n} α_j e_j and Te_j = ∑_{i=1}^{m} t_{ij} f_i. Then
Tx = T ∑_{j=1}^{n} α_j e_j = ∑_{j=1}^{n} α_j ∑_{i=1}^{m} t_{ij} f_i = ∑_{i=1}^{m} (∑_{j=1}^{n} t_{ij} α_j) f_i.
Thus
[Tx]^{B_V} = \begin{pmatrix} ∑_{j=1}^{n} t_{1j} α_j \\ \vdots \\ ∑_{j=1}^{n} t_{mj} α_j \end{pmatrix} = [T]^{B_V}_{B_U} [x]^{B_U}.
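As an example, let U = V = F_3[z] with the standard basis B = {1, z, z^2}, and let T be the differentiation operator, Tp = p′ (we assume here a field of characteristic zero, say F = ℝ). Since T1 = 0, Tz = 1, Tz^2 = 2z, the columns of the matrix representation are the coordinate vectors of these images, so
[T]^{B}_{B} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}.
For p(z) = 1 + 2z + 3z^2 we have [p]^{B} = (1, 2, 3)^{T}, and the matrix product gives [Tp]^{B} = (2, 6, 0)^{T}, i.e., Tp = 2 + 6z, in agreement with p′(z).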
The previous theorem shows that by choosing bases, we can pass from an abstract representation of spaces and transformations to a concrete one in the form of column vectors and matrices. In the latter form, computations are easily mechanized. The next theorem deals with the matrix representation of the product of two linear transformations.

Theorem 4.16. Let U, V, W be vector spaces over a field F of dimensions n, m, and p respectively. Let T ∈ L(U, V) and S ∈ L(V, W). Let B_U = {e_1, . . . , e_n}, B_V = {f_1, . . . , f_m}, and B_W = {g_1, . . . , g_p} be bases in U, V, and W respectively. Then
[ST]^{B_W}_{B_U} = [S]^{B_W}_{B_V} [T]^{B_V}_{B_U}.   (4.17)

Proof. Let Te_j = ∑_{k=1}^{m} t_{kj} f_k, Sf_k = ∑_{i=1}^{p} s_{ik} g_i, and (ST)e_j = ∑_{i=1}^{p} r_{ij} g_i. Therefore
(ST)e_j = S(Te_j) = S ∑_{k=1}^{m} t_{kj} f_k = ∑_{k=1}^{m} t_{kj} Sf_k = ∑_{k=1}^{m} t_{kj} ∑_{i=1}^{p} s_{ik} g_i = ∑_{i=1}^{p} [∑_{k=1}^{m} s_{ik} t_{kj}] g_i.
Since {g_1, . . . , g_p} is a basis for W, we get
r_{ij} = ∑_{k=1}^{m} s_{ik} t_{kj},
and (4.17) follows. Clearly, we have proved the commutativity of the following diagram:

U \xrightarrow{\;T\;} V \xrightarrow{\;S\;} W
[·]^{B_U} ↓        [·]^{B_V} ↓        ↓ [·]^{B_W}
F^n \xrightarrow{\;[T]^{B_V}_{B_U}\;} F^m \xrightarrow{\;[S]^{B_W}_{B_V}\;} F^p
Corollary 4.17. Let the vector space V have the two bases B_1 and B_2. Then we have
[I]^{B_1}_{B_2} [I]^{B_2}_{B_1} = [I]^{B_1}_{B_1} = I,   (4.18)
or alternatively,
[I]^{B_1}_{B_2} = ([I]^{B_2}_{B_1})^{-1}.   (4.19)
An important special case of the previous result is that of the relation between two matrix representations of a given linear transformation, taken with respect to two different bases. We begin by noting that the following diagram commutes:

V \xrightarrow{\;I\;} V \xrightarrow{\;I\;} V
[·]^{B_1} ↓        [·]^{B_2} ↓        ↓ [·]^{B_1}
F^n \xrightarrow{\;[I]^{B_2}_{B_1}\;} F^n \xrightarrow{\;[I]^{B_1}_{B_2}\;} F^n
Theorem 4.18. Let T be a linear transformation in a vector space V. Let B_1 and B_2 be two bases of V. Then we have
[T]^{B_2}_{B_2} = [I]^{B_2}_{B_1} [T]^{B_1}_{B_1} [I]^{B_1}_{B_2},   (4.20)
or
[T]^{B_2}_{B_2} = ([I]^{B_1}_{B_2})^{-1} [T]^{B_1}_{B_1} [I]^{B_1}_{B_2}.   (4.21)
Proof. We observe the following commutative diagram, from which the result immediately follows:

V \xrightarrow{\;I\;} V \xrightarrow{\;T\;} V \xrightarrow{\;I\;} V
[·]^{B_2} ↓     [·]^{B_1} ↓     [·]^{B_1} ↓     ↓ [·]^{B_2}
F^n \xrightarrow{\;[I]^{B_1}_{B_2}\;} F^n \xrightarrow{\;[T]^{B_1}_{B_1}\;} F^n \xrightarrow{\;[I]^{B_2}_{B_1}\;} F^n
Equation (4.21) leads us to the following definition.

Definition 4.19. Given matrices A, B ∈ F^{n×n}, we say that A and B are similar, and write A ∼ B, if there exists an invertible matrix R ∈ F^{n×n} for which
A = R^{-1} B R.   (4.22)
Clearly, similarity is an equivalence relation.
Theorem 4.18 plays a key role in the study of linear transformations. The structure of a linear transformation is determined through the search for simple matrix representations, which is equivalent to finding appropriate bases. The previous theorem’s content is that through a change of basis the matrix representation of a given linear transformation undergoes a similarity transformation.
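For instance, let T act on F^2 with matrix [T]^{B_1}_{B_1} = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix} in the standard basis B_1 = {e_1, e_2}, and let B_2 = {f_1, f_2} with f_1 = e_1, f_2 = e_1 + e_2. Then [I]^{B_1}_{B_2} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, and (4.21) yields
[T]^{B_2}_{B_2} = \begin{pmatrix} 1 & −1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix},
a diagonal matrix; indeed Tf_1 = f_1 and Tf_2 = 2f_2, so the two matrices are similar representations of the same transformation.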
4.4 Linear Functionals and Duality

We note that a field F is at the same time a one-dimensional vector space over itself. Given a vector space V over a field F, a linear transformation from V to F is called a linear functional. The set of all linear functionals on V, i.e., L(V, F), is called the dual space of V and will be denoted by V*. Thus
V* = L(V, F).   (4.23)
Given elements x ∈ V and x* ∈ V*, we will write also
x*(x) = ⟨x, x*⟩   (4.24)
and call ⟨x, x*⟩ a duality pairing of V and V*. Note that ⟨x, x*⟩ is linear in both variables. More generally, given F-vector spaces V, W, we say that a map ⟨v, w⟩ : V × W −→ F is a bilinear pairing or a bilinear form if ⟨v, w⟩ is linear in both variables, i.e.,
⟨α_1 v_1 + α_2 v_2, w⟩ = α_1 ⟨v_1, w⟩ + α_2 ⟨v_2, w⟩,
⟨v, α_1 w_1 + α_2 w_2⟩ = α_1 ⟨v, w_1⟩ + α_2 ⟨v, w_2⟩,   (4.25)
for all α_i ∈ F, v_i ∈ V, w_i ∈ W.

Examples:
• Let V = F^n. Then, for α_1, . . . , α_n ∈ F, the map f : F^n −→ F defined by
f\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = ∑_{i=1}^{n} α_i x_i
defines a linear functional.
• Let F^{n×n} be the space of all square matrices of order n. Given a linear transformation T acting in a finite-dimensional vector space U, we define the trace of T, Trace(T), by
Trace(T) = ∑_{i=1}^{n} t_{ii},   (4.26)
where (t_{ij}) is a matrix representation of T with respect to an arbitrary basis of U. It can be shown directly that the trace is independent of the particular choice of basis. Let X, A ∈ F^{n×n}. Then f defined by
f(X) = Trace(AX)   (4.27)
is a linear functional.

Let V be an n-dimensional vector space, and let B = {e_1, . . . , e_n} be a basis for V. Thus, each vector x ∈ V has a unique representation of the form x = ∑_{j=1}^{n} α_j e_j. We define functionals f_i by
f_i(x) = α_i,  i = 1, . . . , n.
It is easily checked that the fi so defined are linear functionals. To see this, let y = ∑nj=1 β j e j . Then fi (x + y) = fi (∑nj=1 α j e j + ∑nj=1 β j e j ) = fi (∑nj=1 (α j + β j )e j ) = αi + βi = fi (x) + fi (y). In the same way, fi (α x) = f i (α ∑nj=1 α j e j ) = fi (∑nj=1 (αα j )e j ) = ααi = α fi (x). Since clearly, ei = ∑nj=1 δi j e j , we have fi (e j ) = δi j for 1 ≤ i, j ≤ n. Theorem 4.20. Let V be an n-dimensional vector space, and let B = {e1 , . . . , en } be a basis for V . Then 1. We have
dim V* = dim V.   (4.28)
2. The set of linear functionals {f_1, . . . , f_n} defined, through linear extensions, by f_i(e_j) = δ_{ij} is a basis for V*.

Proof. 1. We apply Theorem 4.6, i.e., the fact that dim L(U, V) = dim U · dim V, and the fact that dim F = 1.
2. The functionals {f_1, . . . , f_n} are linearly independent. For let ∑_{i=1}^{n} α_i f_i = 0. Then
0 = (∑_{i=1}^{n} α_i f_i)(e_j) = ∑_{i=1}^{n} α_i f_i(e_j) = ∑_{i=1}^{n} α_i δ_{ij} = α_j.
Therefore α j = 0 for all j, and linear independence is proved. Since dim V ∗ = n, it follows that { f1 , . . . , fn } is a basis for V ∗ .
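As an example, let V = F_n[z] with the Lagrange interpolation basis B_in = {l_1(z), . . . , l_n(z)} associated with distinct points α_1, . . . , α_n (Proposition 2.35). The evaluation functionals φ_i defined by φ_i(p) = p(α_i) satisfy φ_i(l_j) = l_j(α_i) = δ_{ij}, so {φ_1, . . . , φ_n} is precisely the dual basis of B_in in F_n[z]*.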
Definition 4.21. Let V be an n-dimensional vector space, and let B = {e1 , . . . , en } be a basis for V . Then the basis { f1 , . . . , fn } defined by fi (e j ) = δi j is called the dual basis to B and will be denoted by B ∗ . Definition 4.22. Let S be a subset of a vector space V . We denote by S ⊥ the subset of V ∗ defined by S ⊥ = { f ∈ V ∗ | f (s) = 0, for all s ∈ S }. The set S ⊥ is called the annihilator of S . Proposition 4.23. Let S be a subset of the vector space V . Then 1. The set S ⊥ is a subspace of V ∗ . 2. If Q ⊂ S then S ⊥ ⊂ Q⊥ . 3. We have S ⊥ = (span (S ))⊥ . Proof. 1. Let f , g ∈ S ⊥ and α , β ∈ F. Then for an arbitrary x ∈ S , (α f + β g)x = α f (x) + β g(x) = α · 0 + β · 0 = 0, that is, α f + β g ∈ S ⊥ . 2. Let f ∈ S ⊥ . Then f (x) = 0 for all x ∈ S and particularly, by the inclusion Q ⊂ S , for all x ∈ Q. So f ∈ Q ⊥ . 3. It is clear that S ⊂ span (S ), and therefore span(S )⊥ ⊂ S ⊥ . On the other hand, if f ∈ S ⊥ , xi ∈ S and αi ∈ F, then f
(∑_{i=1}^{n} αi xi ) = ∑_{i=1}^{n} αi f (xi ) = ∑_{i=1}^{n} αi · 0 = 0,
which means that f annihilates all linear combinations of elements of S . So f ∈ span (S )⊥ , or S ⊥ ⊂ span (S )⊥ . From these two inclusions the equality S ⊥ = span (S )⊥ follows.
We will find the following proposition useful in the study of duality.
Proposition 4.24. Let V be a vector space and M its subspace. Then the dual space to the quotient space V /M is isomorphic to M ⊥ .
Proof. Let φ ∈ M ⊥ . Then φ induces a linear functional Φ on V /M by Φ ([x]) = φ (x). The functional Φ is well defined, since [x1 ] = [x2 ] if and only if x1 − x2 ∈ M , and this implies φ (x1 ) − φ (x2 ) = φ (x1 − x2 ) = 0. The linearity of Φ follows from the linearity of φ . Conversely, given Φ ∈ (V /M )∗ , we define φ by φ (x) = Φ ([x]). Clearly φ (x) = 0 for x ∈ M , i.e., φ ∈ M ⊥ . Finally, we note that the map φ → Φ is linear, which proves the required isomorphism.
Theorem 4.25. Let V be an n-dimensional vector space. Let M be a subspace of V and M ⊥ its annihilator. Then dim V = dim M + dim M ⊥ .
Proof. Let {e1 , . . . , en } be a basis for V , with {e1 , . . . , em } a basis for M . Let { f1 , . . . , fn } be the dual basis. We will show that { fm+1 , . . . , fn } is a basis for M ⊥ . The linear independence of the vectors { fm+1 , . . . , fn } is clear, since they are a subset of a basis. Moreover, they are clearly in M ⊥ . It remains to show that they actually span M ⊥ . So let f ∈ M ⊥ . It can be written as f = ∑_{j=1}^{n} α j f j . But since f ∈ M ⊥ , we have f (e1 ) = · · · = f (em ) = 0. Since f (ei ) = ∑_{j=1}^{n} α j f j (ei ) = ∑_{j=1}^{n} α j δ ji = αi , we conclude that α1 = · · · = αm = 0 and therefore f = ∑_{j=m+1}^{n} α j f j . The proof is complete, since dim M = m and dim M ⊥ = n − m.
Let U be a vector space over F. We define a hyperspace M to be a maximal nontrivial subspace of U .
Proposition 4.26. Let dim U = n. The following statements are equivalent.
1. M is a hyperspace.
2. dim M = n − 1.
3. M is the kernel of a nonzero functional φ .
Proof. (1) ⇒ (2). Assume M is a hyperspace. Fix a nonzero vector x ∉ M . Then span (x, M ) = {α x + m | α ∈ F, m ∈ M } is a subspace that properly contains M . Thus necessarily span (x, M ) = U . Let {e1 , . . . , ek } be a basis for M . Then {e1 , . . . , ek , x} are linearly independent and span U , so they are a basis for U . Necessarily k = dim M = n − 1.
(2) ⇒ (3). Let M be an (n − 1)-dimensional subspace of U . Let x ∉ M . Define a linear functional φ by setting φ (x) = 1, φ |M = 0. For each u ∈ U there exist unique γ ∈ F and m ∈ M such that u = γ x + m. It follows that φ (u) = φ (γ x + m) = γφ (x) + φ (m) = γ . Therefore, u ∈ Ker φ if and only if u ∈ M .
(3) ⇒ (1). Assume M = Ker φ with φ a nontrivial functional. Fix f ∉ M , which exists by the nontriviality of φ . We show now that every u ∈ U has a unique representation of the form u = γ f + m with m ∈ M . In fact, it suffices to take γ = φ (u)/φ ( f ). Thus M is a hyperspace.
Recalling Definition 2.25, where the codimension of a space was introduced, we can extend Proposition 4.26 in the following way.
Proposition 4.27. Let U be an n-dimensional vector space over F, and let M be a subspace. Then the following statements are equivalent:
1. M has codimension k. 2. We have dim M = n − k. 3. M is the intersection of the kernels of k linearly independent functionals. Proof. (1) ⇒ (2). Let f1 , . . . , fk be linearly independent vectors over M and let L(M , f1 , . . . , fk ) = U . Let e1 , . . . , e p be a basis for M . We claim that {e1 , . . . , e p , f1 , . . . , fk } is a basis for U . The spanning property is obvious. To see linear p independence, assume there exist αi , βi ∈ F such that ∑i=1 αi ei + ∑ki=1 βi fi = 0. p This implies ∑ki=1 βi fi = − ∑i=1 αi ei ∈ M . This implies in turn that all βi are zero. From this we conclude that all αi are zero. Thus we must have that p + k = n, or dim M = n − k. (2) ⇒ (3). Assume dim M = n − k. Let {e1 , . . . , en−k } be a basis for M . We extend it to a basis {e1 , . . . , en }. Define, using linear extensions, k linear functionals φ1 , . . . , φk such that
φi (e_{n−k+ j} ) = δi j , i, j = 1, . . . , k, and φi (e j ) = 0 for j = 1, . . . , n − k.
Obviously, φ1 , . . . , φk are linearly independent and ∩i Ker φi = M .
(3) ⇒ (1). Assume φ1 , . . . , φk are linearly independent and ∩i Ker φi = M . We choose a sequence of vectors xi inductively. We choose a vector x1 such that φ1 (x1 ) = 1. Suppose we have chosen already x1 , . . . , xi−1 . Then we choose xi such that xi ∈ ∩_{j=1}^{i−1} Ker φ j and φi (xi ) = 1. To see that this is possible, we note that if this is not the case, then
Ker φi ⊃ ∩_{j=1}^{i−1} Ker φ j = Ker (φ1 , . . . , φi−1 )T ,
where (φ1 , . . . , φi−1 )T denotes the map x → (φ1 (x), . . . , φi−1 (x))T . Using Theorem 4.11, it follows that there exist α j such that φi = ∑_{j=1}^{i−1} α j φ j , contrary to the assumption of linear independence. Clearly, the vectors x1 , . . . , xk are linearly independent. Moreover, it is easy to check that, for every x ∈ U , x − ∑_{j=1}^{k} φ j (x) x j ∈ ∩_{j=1}^{k} Ker φ j = M . This implies dim M = n − k.
Let V be a vector space over F and let V ∗ be its dual space. Thus, given x ∈ V and f ∈ V ∗ we have f (x) ∈ F. But, fixing a vector x ∈ V , we can view the expression f (x) as an F-valued function defined on the dual space V ∗ . We denote this function by x̂, and it is defined by
x̂( f ) = f (x). (4.29)
Theorem 4.28. The function x̂ : V ∗ −→ F, defined by (4.29), is a linear map.
Proof. Let f , g ∈ V ∗ and α , β ∈ F. Then
x̂(α f + β g) = (α f + β g)(x) = (α f )(x) + (β g)(x) = α f (x) + β g(x) = α x̂( f ) + β x̂(g).
Together with the function x̂ ∈ L(V ∗ , F) = V ∗∗ , we have the function φ : V −→ V ∗∗ defined by
φ (x) = x̂. (4.30)
The function φ so defined is referred to as the canonical embedding of V in V ∗∗ .
Theorem 4.29. Let V be a finite-dimensional vector space over F. The function φ : V −→ V ∗∗ defined by (4.30) is injective and surjective, i.e., it is an invertible linear map.
Proof. We begin by showing the linearity of φ . Let x, y ∈ V , α , β ∈ F, and f ∈ V ∗ . Then
(φ (α x + β y)) f = f (α x + β y) = α f (x) + β f (y) = α x̂( f ) + β ŷ( f ) = (α x̂ + β ŷ)( f ) = (αφ (x) + β φ (y)) f .
Since this is true for all f ∈ V ∗ , we get
φ (α x + β y) = αφ (x) + β φ (y),
i.e., the canonical map φ is linear. It remains to show that φ is invertible. Since dim V ∗ = dim V , we also get dim V ∗∗ = dim V . So, to show that φ is invertible, it suffices to show injectivity. To this end, let x ∈ Ker φ . Then for each f ∈ V ∗ we have
(φ (x))( f ) = 0 = x̂( f ) = f (x).
This implies x = 0. For if x ≠ 0, there exists a functional f for which f (x) ≠ 0. The easiest way to see this is to complete x to a basis and take the dual basis.
Corollary 4.30. Let L ∈ V ∗∗ . Then there exists an x ∈ V such that L = x̂.
Corollary 4.31. Let V be an n-dimensional vector space and let V ∗ be its dual space. Let { f1 , . . . , fn } be a basis for V ∗ . Then there exists a basis {e1 , . . . , en } for V for which fi (e j ) = δi j , i.e., every basis in V ∗ is the dual of a basis in V .
Proof. Let {E1 , . . . , En } be the basis in V ∗∗ that is dual to the basis { f1 , . . . , fn } of V ∗ . Now, there exist unique vectors {e1 , . . . , en } in V for which Ei = êi . So, for each f ∈ V ∗ , we have Ei ( f ) = êi ( f ) = f (ei ), and in particular,
f j (ei ) = Ei ( f j ) = δi j .
Let now M ⊂ V ∗ be a subspace. We define
⊥M = ∩_{ f ∈M} Ker f = {x ∈ V | f (x) = 0, ∀ f ∈ M }.
Theorem 4.32. We have
φ (⊥ M ) = M ⊥ .
Proof. We have x ∈ ∩_{ f ∈M} Ker f if and only if for each f ∈ M we have x̂( f ) = f (x) = 0. However, the last condition is equivalent to x̂ ∈ M ⊥ .
Corollary 4.33. Let V be an n-dimensional vector space and M an (n − k)-dimensional subspace of V ∗ . Let φ (⊥M ) = ∩_{ f ∈M} Ker f . Then dim φ (⊥M ) = k.
Proof. We have dim V ∗ = n = dim M + dim M ⊥ = n − k + dim M ⊥ , that is, dim M ⊥ = k. We conclude by observing that dim φ (⊥M ) = dim M ⊥ .
Theorem 4.34. Let f , g1 , . . . , g p ∈ V ∗ . Then Ker f ⊃ ∩_{i=1}^{p} Ker gi if and only if f ∈ span (g1 , . . . , g p ).
Proof. Define the map g : V −→ F p by
g(x) = (g1 (x), . . . , g p (x))T . (4.31)
Obviously, we have the equality Ker g = ∩_{i=1}^{p} Ker gi . We apply Theorem 4.11 to conclude that there exists a map α = (α1 , . . . , α p ) : F p −→ F such that f = α g = ∑_{i=1}^{p} αi gi .
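Theorem 4.34 also gives a practical rank test: f belongs to span (g1 , . . . , g p ) exactly when adjoining f to the gi does not increase the rank. A short Python/NumPy sketch (the functionals below are arbitrary row vectors chosen for illustration only):

import numpy as np

g = np.array([[1.0, 0.0, 2.0, 0.0],      # g_1 and g_2 as row vectors on F^4
              [0.0, 1.0, 1.0, 0.0]])
f = np.array([2.0, 3.0, 7.0, 0.0])       # here f = 2*g_1 + 3*g_2
same_rank = np.linalg.matrix_rank(g) == np.linalg.matrix_rank(np.vstack([g, f]))
print(same_rank)   # True, so f lies in span(g_1, g_2),
                   # equivalently Ker f contains Ker g_1 ∩ Ker g_2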
4.5 The Adjoint Transformation
Let U and V be two vector spaces over the field F. Assume T ∈ L(U , V ) and f ∈ V ∗ . Let us consider the composition of maps
U --T--> V --f--> F.
It is clear that the product, or composition, of f and T , i.e., f T , is a linear transformation from U to F. This means that f T ∈ U ∗ and we denote this
functional by T ∗ f . Therefore T ∗ : V ∗ −→ U ∗ is defined by
T ∗ f = f T , (4.32)
or
(T ∗ f )u = f (Tu), u ∈ U . (4.33)
The transformation T ∗ is called the adjoint transformation of T . In terms of the duality pairing f (v) = ⟨v, f ⟩, we can write (4.33) as
⟨Tu, f ⟩ = ⟨u, T ∗ f ⟩. (4.34)
Theorem 4.35. The transformation T ∗ is linear, i.e., T ∗ ∈ L(V ∗ , U ∗ ). Proof. Let f , g ∈ V ∗ and α , β ∈ F. Then, for every x ∈ U , (T ∗ (α f + β g))x = (α f + β g))(T x) = α f (T x) + β g(T x) = α (T ∗ f )x + β (T ∗ g)x = (α T ∗ f + β T ∗ g)x. Therefore T ∗ (α f + β g) = α T ∗ f + β T ∗ g, which proves linearity. Theorem 4.36. Let T ∈ L(U , V ). Then (Im T )⊥ = Ker T ∗ .
(4.35)
Proof. For every x ∈ U and f ∈ V ∗ we have (T ∗ f )x = f (T x), so f ∈ (Im T )⊥ if and only if for all x ∈ U , we have 0 = f (T x) = (T ∗ f )x, which is equivalent to f ∈ Ker T ∗ . Corollary 4.37. Let T ∈ L(U , V ). Then rank (T ) = rank (T ∗ ).
(4.36)
Proof. Let dim U = n and dim V = m. Assume rank T = dim Im T = p. Now m = dim V = dim V ∗ = dim Im T ∗ + dim Ker T ∗ = rank T ∗ + dimKer T ∗. Since we have dim Ker T ∗ = dim(Im T )⊥ = m − dim(Im T ) = m − rankT = m − p, it follows that rank T ∗ = p.
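Since, as the next theorem shows, the adjoint is represented in dual bases by the transposed matrix, Corollary 4.37 is the statement that row rank equals column rank; a quick numerical check in Python/NumPy (with an arbitrary matrix):

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])                               # an arbitrary 2 x 3 matrix of rank 1
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))   # 1 1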
Theorem 4.38. Let T ∈ L(U , V ), let BU = {e1 , . . . , en } be a basis in U and let BV = { f1 , . . . , fm } be a basis in V . Let B∗U = {φ1 , . . . , φn } and B∗V = {ψ1 , . . . , ψm } be the dual bases in U ∗ and V ∗ respectively. Then the matrix of T ∗ with respect to the dual bases is the transpose of the matrix of T , i.e.,
([T ∗ ]_{B∗V}^{B∗U})_{i j} = ([T ]_{BU}^{BV})_{ j i} .
Proof. We recall that
([T ]_{BU}^{BV})_{i j} = ψi (Te j ),
so in order to compute, we have to find the dual basis to B∗U . Now we know that B∗∗U = {ê1 , . . . , ên }, so
([T ∗ ]_{B∗V}^{B∗U})_{i j} = êi (T ∗ ψ j ) = (T ∗ ψ j )(ei ) = ψ j (Tei ) = ([T ]_{BU}^{BV})_{ j i} .
Proof. Follows from (4.36).
We consider now a special class of linear transformations T that satisfy T ∗ = T . Assuming T ∈ L(U , V ), it follows that T ∗ ∈ L(V ∗ , U ∗ ). For the equality T ∗ = T to hold, it is therefore necessary that V = U ∗ and V ∗ = U . Of course V = U ∗ implies V ∗ = U ∗∗ , and therefore V ∗ = U will hold only if we identify U ∗∗ with U , which we can do using the canonical embedding. Let now x, y ∈ U . Then we have
(T x)(y) = ŷ(T x) = (T ∗ ŷ)(x) = x̂(T ∗ ŷ). (4.37)
If we rewrite now the action of a functional x∗ on a vector x by
x∗ (x) = ⟨x, x∗ ⟩, (4.38)
we can rewrite (4.37) as
⟨T x, y⟩ = ⟨x, T ∗ y⟩. (4.39)
We say that a linear transformation T ∈ L(U , U ∗ ) is self-dual if T ∗ = T , or equivalently, for all x, y ∈ U we have ⟨T x, y⟩ = ⟨x, Ty⟩.
Proposition 4.40. Let U be a vector space and U ∗ its dual, B a basis in U and B ∗ its dual basis. Let T ∈ L(U , U ∗ ) be a self-dual linear transformation. Then [T ]_{B}^{B∗} is a symmetric matrix.
Proof. We use the fact that U ∗∗ = U and B ∗∗ = B. Then, by Theorem 4.38,
[T ]_{B}^{B∗} = [T ∗ ]_{B∗∗}^{B∗} = the transpose of [T ]_{B}^{B∗} ,
so [T ]_{B}^{B∗} is symmetric.
If B = {e1 , . . . , en }, then every vector x ∈ U has an expansion x = ∑_{i=1}^{n} ξi ei . This leads to ⟨T x, x⟩ = ∑_{i=1}^{n} ∑_{j=1}^{n} Ti j ξi ξ j , where (Ti j ) = [T ]_{B}^{B∗} . The expression ∑_{i=1}^{n} ∑_{j=1}^{n} Ti j ξi ξ j is called a quadratic form. We will return to this topic in much greater detail in Chapter 8.
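In coordinates, self-duality is thus symmetry of the representing matrix, and ⟨T x, x⟩ is the quadratic form just described. A small Python/NumPy sketch with an arbitrary symmetric matrix:

import numpy as np

T = np.array([[2.0, 1.0],
              [1.0, 3.0]])      # symmetric, representing a self-dual map on R^2
x = np.array([1.0, -2.0])
y = np.array([4.0, 5.0])
print(np.isclose(np.dot(T @ x, y), np.dot(x, T @ y)))   # True: <Tx, y> = <x, Ty>
print(np.dot(x, T @ x))                                  # the quadratic form evaluated at x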
4.6 Polynomial Module Structure on Vector Spaces
Let U be a vector space over the field F. Given a linear transformation T in U , there is in U a naturally induced module structure over the ring F[z]. This module structure, which we proceed to define, will be central to our study of linear transformations. Given a polynomial p(z) ∈ F[z], p(z) = ∑_{j=0}^{k} p j z j , and a linear transformation T in a vector space U over F, we define
p(T ) = ∑_{j=0}^{k} p j T j , (4.40)
with T 0 = I. The action of a polynomial p(z) on a vector x ∈ U is defined by
p · x = p(T )x. (4.41)
Proposition 4.41. Given a linear transformation T in a vector space U over F, the map p(z) → p(T ) is an algebra homomorphism of F[z] into L(U ). Proof. Clearly (α p + β q)(T ) = α p(T ) + β q(T ) and (pq)(T ) = p(T )q(T ).
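Evaluating p(T ) for a concrete matrix is conveniently done by Horner's scheme; the following Python/NumPy sketch (with an arbitrary polynomial and matrix, not taken from the text) also checks the homomorphism property (pq)(T ) = p(T )q(T ):

import numpy as np

def poly_of_matrix(coeffs, T):
    # Horner's scheme; coeffs are [p_k, ..., p_1, p_0], highest degree first.
    result = np.zeros_like(T)
    for c in coeffs:
        result = result @ T + c * np.eye(T.shape[0])
    return result

T = np.array([[0.0, 1.0],
              [-2.0, 3.0]])
p = [1.0, 0.0, 1.0]                  # p(z) = z^2 + 1
q = [1.0, 1.0]                       # q(z) = z + 1
pq = np.convolve(p, q)               # coefficients of the product polynomial p(z)q(z)
print(np.allclose(poly_of_matrix(pq, T),
                  poly_of_matrix(p, T) @ poly_of_matrix(q, T)))   # True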
Given two spaces U1 , U2 and linear transformations Ti ∈ L(Ui ), a linear transformation X : U1 −→ U2 is said to intertwine T1 and T2 if XT1 = T2 X .
(4.42)
Obviously (4.42) implies, for an arbitrary polynomial p(z), that X p(T1 ) = p(T2 )X. Thus, for x ∈ U1 , we have X (p · x) = p · Xx. This shows that intertwining maps are F[z]-module homomorphisms from U1 to U2 . Two operators Ti are said to be similar, and we write T1 ≃ T2 , if there exists an invertible map intertwining them. Equivalently, T2 = RT1 R−1 , where R : U1 −→ U2 is invertible. A natural strategy for studying the similarity of two transformations is to characterize intertwining maps and find conditions guaranteeing their invertibility.
Definition 4.42. Let U be an n-dimensional vector space over the field F, and T : U −→ U a linear transformation. 1. A subspace M ⊂ U is called an invariant subspace of T if for all x ∈ M we have also T x ∈ M . 2. A subspace M ⊂ U is called a reducing subspace of T if it is an invariant subspace of T and there exists another invariant subspace N ⊂ U such that U = M ⊕ N. We note that given a linear transformation T in U , T -invariant subspaces are just F[z]-submodules of U relative to the F[z]-module structure defined in U by (4.41). Similarly, reducing subspaces of T are equivalent to module direct summands of the module U . Proposition 4.43. 1. Let M be a subspace of U invariant under T . Let B = {e1 , . . . , en } be a basis for U such that B1 = {e1 , . . . , em } is a basis for M . Then, with respect to this basis, T has the block triangular matrix representation [T ]B B
=
( T11  T12 )
(  0   T22 ) .
Moreover, T11 = [T |M ]_{B1}^{B1} .
2. Let M be a reducing subspace for T and let N be a complementary invariant subspace. Let B = {e1 , . . . , en } be a basis for U such that B1 = {e1 , . . . , em } is a basis for M and B2 = {em+1 , . . . , en } is a basis for N . Then the matrix representation of T with respect to this basis is block diagonal. Specifically,
[T ]_B^B =
( T1   0 )
(  0  T2 ) ,
where T1 = [T |M ]_{B1}^{B1} and T2 = [T |N ]_{B2}^{B2} .
Corollary 4.44. 1. Let T be a linear transformation in U . Let {0} ⊂ M1 ⊂ · · · ⊂ Mk ⊂ U be invariant subspaces. Let B = {e1 , . . . , en } be a basis for U such that Bi = {e_{n1 +···+ni−1 +1} , . . . , e_{n1 +···+ni} } is a basis for Mi . Then [T ]_B^B is block upper triangular,
[T ]_B^B =
( T11  T12   .   .  T1k )
(  0   T22   .   .   .  )
(  .    .    .   .   .  )
(  0    .    .   0  Tkk ) .
2. Let U = M1 ⊕ · · · ⊕ Mk , with Mi being T -invariant subspaces. Let Bi be a basis for Mi and let B be the union of the bases Bi . Then [T ]_B^B is block diagonal,
[T ]_B^B = diag (T11 , T22 , . . . , Tkk ) .
We leave the proof of the two previous results to the reader. An invariant subspace of a linear transformation T in U induces two other transformations, namely the restriction T |M of T to the invariant subspace M and the induced transformation T |U /M in the quotient space U /M , which is defined by T |U /M [x]M = [T x]M . (4.43) Proposition 4.45. Let T ∈ L(U ), let M ⊂ U be a T -invariant subspace, and let π : U −→ U /M be the canonical projection, i.e., the map defined by
π (x) = [x]M .
(4.44)
Then
1. There exists a unique linear transformation T̄ : U /M −→ U /M that makes the following diagram commutative:
U --π--> U /M
|T        |T̄
U --π--> U /M
i.e., we have
T̄ π x = π T x. (4.45)
2. Let B = {e1 , . . . , en } be a basis for U such that B1 = {e1 , . . . , em } is a basis for M . Then B̄ = {π em+1 , . . . , π en } is a basis for U /M . If
[T ]_B^B =
( T11  T12 )
(  0   T22 ) , (4.46)
then
[T̄ ]_{B̄}^{B̄} = T22 . (4.47)
Proof. 1. In terms of equivalence classes modulo M , we have T [x] = [T x]. To show that T is well defined, we have to show that [x] = [y] implies [T x] = [Ty]. This is the direct consequence of the invariance of M under T . 2. Follows from the invariance of M under T . Definition 4.46. 1. Let M ⊂ U be a T -invariant subspace. The restriction of T to M is the unique linear transformation T |M : M −→ M defined by T |M x = T x,
x ∈ M.
2. The induced map, namely the map induced by T on the quotient space U /M , is the unique map defined by (4.45). We will use the notation T |U /M for the induced map. Invariance and reducibility of linear transformations are conveniently described in terms of projection operators. We turn to that. Assume the vector space U admits a direct sum decomposition U = M ⊕ N . Thus every vector x ∈ U can be written, in a unique way, as x = m + n with m ∈ M and n ∈ N . We define a transformation PN : U −→ U by PN x = m. We call PN the projection on M in the direction of N . Clearly PN is a linear transformation and satisfies Ker PN = N and Im PN = M . Proposition 4.47. A linear transformation P is a projection if and only if P2 = P. Proof. Assume PN is the projection on M in the direction of N . If x = m + n, it is clear that PN m = m, so PN2 x = PN m = m = PN x. Conversely, assume P2 = P. Let M = Im P and N = Ker P. Since for every x ∈ U , we have x = Px + (I − P)x, with Px ∈ Im P and (I − P)x ∈ Ker P, it follows that U = M +N . To show that this is a direct sum decomposition, assume x ∈ M ∩N . This implies x = Px = (I − P)y. This implies in turn that x = Px = P(I − P)y = 0. Proposition 4.48. A linear transformation P in U is a projection if and only if I − P is a projection. Proof. It suffices to show that (I − P)2 = I − P, and this is immediate.
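When M and N are given by explicit bases, the projection on M in the direction of N is easy to write down; a small Python/NumPy sketch for an arbitrary decomposition of R^2:

import numpy as np

m = np.array([1.0, 0.0])            # spans M
n = np.array([1.0, 1.0])            # spans N, and R^2 = M ⊕ N
B = np.column_stack([m, n])         # pass to coordinates with respect to (m, n)
# Keep the m-coordinate, discard the n-coordinate, return to standard coordinates.
P = B @ np.diag([1.0, 0.0]) @ np.linalg.inv(B)
print(np.allclose(P @ P, P))        # True: P^2 = P
print(P @ np.array([3.0, 2.0]))     # [1. 0.], the M-component of (3, 2)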
If U = M ⊕ N , and PN is the projection on M in the direction of N , then PM = I − PN is the projection on N in the direction of M . The next proposition is central. It expresses the geometric conditions of invariance and reducibility in terms of arithmetic conditions involving projection operators. Proposition 4.49. Let P be a projection operator in a vector space X and let T be a linear transformation in X . Then 1. A subspace M ⊂ X is invariant under T if and only if for any projection P on M we have T P = PT P. (4.48)
2. Let X = M ⊕ N be a direct sum decomposition of X and let P be the projection on M in the direction of N . Then the direct sum decomposition reduces T if and only if T P = PT. (4.49) Proof. 1. Assume (4.48) holds, and Im P = M . Then, for every m ∈ M , we have T m = T Pm = PT Pm = PT m ∈ M . Thus M is invariant. The converse is immediate. 2. The direct sum reduces T if and only if both M and N are invariant subspaces. By part 1 this is equivalent to the two conditions T P = PT P and T (I − P) = (I − P)T (I − P). These two conditions are equivalent to (4.49). We clearly have, for a projection P, that I = P+ (I − P), which corresponds to the direct sum decomposition X = Im P ⊕ Im (I − P). This generalizes in the following way. Theorem 4.50. Given the direct sum decomposition X = M1 ⊕ · · · ⊕ Mk , there exist k projections Pi on X such that 1. Pi Pj = δi j Pj . 2. I = P1 + · · · + Pk . 3. Im Pi = Mi , i = 1, . . . , k. Conversely, given k operators Pi satisfying (1)-(3), then with Mi = Im Pi , we have X = M 1 ⊕ · · · ⊕ Mk . Proof. Assume we are given the direct sum decomposition X = M1 ⊕ · · · ⊕ Mk . Thus, each x ∈ X has a unique representation in the form x = m1 + · · · + mk with mi ∈ Mi . We define Pi x = mi . The operators Pi are clearly linear, projections with Im Pi = Mi , and satisfy (1)-(3). Moreover, we have KerPi = ∑ j=i M j . Conversely, assume Pi satisfy conditions (1)-(3), and let Mi = Im Pi . From I = P1 + · · · + Pk it follows that x = P1 x + · · · + Pk x and hence X = M1 + · · · + Mk . This representation of x is unique, for if x = m1 + · · · + mk , with mi ∈ Mi another such representation, then since m j ∈ M j , we have m j = Pj y j for some y j . Therefore, k
Pi x = Pi ∑ m j = j=1
k
k
k
j=1
j=1
j=1
∑ Pi m j = ∑ PiPj y j = ∑ δi j Pj y j = Pi yi = mi ,
which shows that the sum is a direct sum.
In search for nontrivial invariant subspaces we begin by looking for those that are 1-dimensional. If M is a 1-dimensional invariant subspace and x a nonzero vector in M , then every other vector in M is of the form α x. Thus the invariance condition is T x = α x, and this leads us to the following definition.
Definition 4.51. 1. Let T be a linear transformation in a vector space U over the field F. A nonzero vector x ∈ U will be called an eigenvector, or characteristic vector, of T if there exists α ∈ F such that T x = α x. Such an α will be called an eigenvalue, or characteristic value, of T . 2. The characteristic polynomial of T , dT (z), is defined by dT (z) = det(zI − T ). Clearly, deg dT = dim U . Proposition 4.52. Let T be a linear transformation in a finite-dimensional vector space U over the field F. An element α ∈ F is an eigenvalue of T if and only if it is a zero of the characteristic polynomial of T . Proof. The homogeneous system (α I − T )x = 0 has a nontrivial solution if and only if α I − T is singular, i.e., if and only if dT (α ) = det(α I − T ) = 0. Let T be a linear transformation in an n-dimensional vector space U . We say that a polynomial p(z) ∈ F[z] annihilates T if p(T ) = 0, where p(T ) is defined by (4.40). Clearly, every linear transformation is annihilated by the zero polynomial. A priori, it is not clear that an arbitrary linear transformation has a nontrivial annihilator. This is proved next. Proposition 4.53. For every linear transformation T in an n-dimensional vector space U there exists a nontrivial annihilating polynomial. Proof. L(U ), the space of all linear transformations in U , is n2 -dimensional. 2 Therefore the set of linear transformations {I, T, . . . , T n } is linearly dependent. Hence, there exist coefficients pi ∈ F, i = 0, . . . , n2 , not all zero, for which 2 2 ∑ni=0 pi T i = 0, or p(T ) = 0, where p(z) = ∑ni=0 pi zi . Theorem 4.54. Let T be a linear transformation in an n-dimensional vector space U . There exists a unique monic polynomial of minimal degree that annihilates T . Proof. Let J = {p(z) ∈ F[z]| p(T ) = 0}. Clearly, J is a nontrivial ideal in F[z]. Since F[z] is a principal ideal domain, there exists a unique monic polynomial mT (z) for which J = mT F[z]. Obviously if 0 = p(z) ∈ J, then deg p ≥ deg m. The polynomial m(z) whose existence is established in the previous theorem is called the minimal polynomial of T . Both the characteristic and the minimal polynomial are similarity invariants.
Proposition 4.55. Let T1 and T2 be similar linear transformations. Let dT1 (z), dT2 (z) be their characteristic polynomials and mT1 (z), mT2 (z) their minimal polynomials. Then dT1 (z) = dT2 (z) and mT1 (z) = mT2 (z). Proof. If T2 T1 , then for some invertible R, zI2 − T2 = R(zI1 − T1 )R−1 . Using the multiplication rule of determinants, it follows that dT2 (z) = det(zI2 − T2 ) = det(zI1 − T1 ) = dT1 (z). Note that T2 = RT1 R−1 implies that for any p(z) ∈ F[z], we have p(T2 ) = Rp(T1 )R−1 . In particular, we have mT1 (T2 ) = RmT1 (T1 )R−1 = 0, which shows that mT2 (z) | mT1 (z). By symmetry, we have mT1 (z) | mT2 (z), and the equality of the minimal polynomials follows. Since the characteristic polynomial of a linear transformation T is a similarity invariant, all coefficients of dT (z) are similarity invariants too. We single out two. If dT (z) = zn +tn−1 zn−1 + · · ·+t0 , then it is easily checked that with Trace T defined by (4.26), we have tn−1 = −TraceT = − ∑ni=1 tii , for any matrix representation of T . In the same way, we check that t0 = (−1)n det T , so det T is also a similarity invariant.
4.7 Exercises 1. Let T be an m × n matrix over the field F. Define the determinant rank of T to be the order of the largest nonvanishing minor of T . Show that the rank of T is equal to its determinantal rank. 2. Let M be a subspace of a finite-dimensional vector space V . Show that M ∗ V ∗ /M ⊥ . 3. Let X be an n-dimensional complex vector space. Let T : X −→ X be such that for every x ∈ X , the vectors x, T x, . . . , T m x are linearly dependent. Show that I, T, . . . , T m are linearly dependent. 4. Let A, B,C be linear transformations. Show that there exists a linear transformation Z such that C = AZB if and only if ImC ⊂ Im A, KerC ⊃ Ker B. 5. Let X , Y be finite-dimensional vector spaces, A ∈ L(X ), and W, Z ∈ L(X , Y ). Show that the following statements are equivalent: a. We have AKerW ⊂ Ker Z. b. We have KerW ⊂ Ker ZA. c. There exists a map B ∈ L(Y ) for which ZA = BW . 6. Let V be a finite-dimensional vector space and let Ai ∈ L(V ), i = 1, . . . , s. Show that if M = ∑si=1 Im Ai (M = ∩si=1 Ker Ai ), then there exist Bi ∈ L(V ) such that Im ∑si=1 Ai Bi = M (Ker ∑si=1 Bi Ai = M ).
7. Let V be a finite-dimensional vector space. Show that there is a bijective correspondence between left (right) ideals in L(V ) and subspaces of V . The correspondence is given by J ↔ ∩A∈J Ker A (J ↔ ∑A∈J Im A). 8. Show that Ker A2 ⊃ Ker A. Show also that Ker A2 = Ker A implies KerA p = Ker A for all p > 0. 9. Let T be an injective linear transformation on a (not necessarily finitedimensional) vector space V . Show that if for some integer k, we have T k = T , then T is also surjective. 10. Let A be an n × n complex matrix with eigenvalues λ1 , . . . , λn . Show that det A = ∏ni=1 λi . Show that given a polynomial p(z), the eigenvalues of p(A) are p(λ1 ), . . . , p(λn ) (Spectral mapping theorem). 11. Let A be an n × n complex matrix with eigenvalues λ1 , . . . , λn . Show that the eigenvalues of adj A are ∏ j=1 λ j , . . . , ∏ j=n λ j . 12. Let A be invertible and let dA (z) be its characteristic polynomial. Show that the characteristic polynomial of A−1 is dA−1 (z) = d(0)−1 zn dA (z−1 ). 13. Let the minimal polynomial of alineartransformation A be ∏(z − λ j )ν j . Show that the minimal polynomial of
( A  I )
( 0  A )
is ∏_j (z − λ j )^{ν j +1} .
14. Let A be a linear transformation in a finite-dimensional vector space V over the field F. Let mA (z) be its minimal polynomial. Prove that given a polynomial p(z), p(A) is invertible if and only if p(z) and mA (z) are coprime. Show that the minimal polynomial can be replaced by the characteristic polynomial and the result still holds.
4.8 Notes and Remarks An excellent source for linear algebra is Hoffman and Kunze (1961). The classic treatise of Gantmacher (1959), though strongly matrix oriented, is still a rich source for results and ideas and is highly recommended as a general reference. So is Malcev (1963), which is close in spirit to the present book. We already mentioned, in Section 2.14, the contributions of Grassmann and Peano to the study of linear transformations. The conceptual rigor of this part of linear algebra owes much to Weierstrass and his students Kronecker and Frobenius.
Chapter 5
The Shift Operator
5.1 Introduction We now turn our attention to the study of a special class of transformations, namely shift operators. These will turn out later to serve as models for all linear transformations, in the sense that every linear transformation is similar to a shift operator.
5.2 Basic Properties We introduce now an extremely important class of linear transformation that will play a central role in the analysis of the structure of linear transformations. Recall that for a nonzero polynomial q(z), we denote by πq f the remainder of the polynomial f (z) after division by q(z). Clearly, πq is a projection operator in F[z]. We note that given a nonzero polynomial q(z), any f (z) ∈ F[z] has a unique representation of the form f (z) = a(z)q(z) + r(z),
(5.1)
with deg r < deg q. The remainder, r = πq f , can be written in another form based on the direct sum representation (2.7), namely F((z−1 )) = F[z] ⊕ z−1 F[[z−1 ]]. Let π+ , π− be defined by (1.23), i.e., the projections of F((z−1 )) on F[z] and z−1 F[[z−1 ]] respectively, which correspond to the above direct sum decomposition. From the representation (5.1) of f (z), we have q(z)−1 f (z) = a(z) + q(z)−1 r(z). Applying the projection π− , we have π− q−1 f = π− q−1 r = q−1 r, which implies an important alternative representation for the remainder, namely
πq f = qπ− q−1 f . (5.2)
The importance of equation (5.2) stems from the fact that it easily extends to the case of polynomial vectors. Proposition 5.1. Let q(z) be a monic polynomial in F[z] and let πq : F[z] −→ F[z] be the projection map defined in (5.2). Then we have Ker πq = qF[z]
(5.3)
F[z] = Xq ⊕ qF[z].
(5.4)
Xq = Im πq = {πq f | f (z) ∈ F[z]},
(5.5)
and the direct sum Defining the set Xq by
we have the isomorphism Xq F[z]/qF[z].
(5.6)
Proof. Clearly, for each nonzero q(z) ∈ F[z], the map πq is a projection map. Equation (5.3) follows by a simple computation. Since πq is a projection, so is I − πq. The identity I = πq + (I − πq ), taken together with (5.3), implies the direct sum (5.4). Finally, the isomorphism (5.6) follows from (5.5) and (5.3). The isomorphism (5.6) allows us to lift the F[z]-module structure to Xq . This module structure is the one induced by the polynomial z. Definition 5.2. Let q(z) be a monic polynomial in F[z]. We define a linear transformation Sq : Xq −→ Xq by Sq f = z · f = π q z f .
(5.7)
We call Sq the shift operator in Xq . We note that for the shift operator Sq , we have for all k ≥ 0, that Sqk f = πq zk f . This implies that for any p(z) ∈ F[z], we have p · f = p(Sq ) f = πq (p f ),
f (z) ∈ Xq .
(5.8)
We refer to Xq , with the F[z]-module structure induced by Sq , as a polynomial model. The next proposition characterizes elements of a polynomial model and studies some important bases for it. Proposition 5.3. Let q(z) = zn + qn−1 zn−1 + · · · + q0 . Then 1. A polynomial f (z) belongs to Xq if and only if q(z)−1 f (z) is strictly proper. 2. We have dim Xq = deg q = n. 3. The following sets are bases for Xq : a. The standard basis, namely Bst = {1, z, . . . , zn−1 }.
b. The control basis, namely Bco = {e1 (z), . . . , en (z)}, where ei (z) = zn−i + qn−1zn−i−1 + · · · + qi . c. In case α1 , . . . , αn , the zeros of q(z), are distinct then the polynomials pi (z) = Π j=i (z − α j ), i = 1, . . . , n form a basis for Xq . We refer to this as the spectral basis Bsp of Xq . d. Under the same assumption, the Lagrange interpolation polynomials, given by li (z) = ppii((z) αi ) are a basis for Xq , naturally called the interpolation basis. Proof. 1. Clearly, f (z) ∈ Xq is equivalent to πq f = f . In turn, this can be rewritten as q−1 f = π− q−1 f , and this equality holds if and only if q−1 f is strictly proper. 2. Clearly, the elements of Xq are all polynomials of degree < n = deg q. Obviously, this is an n-dimensional space. 3. Each of the sets has n linearly independent elements, hence is a basis for Xq . The linear independence of the Lagrange interpolation polynomials was proved in Chapter 2, and the polynomials pi (z) are, up to a multiplicative constant, equal to the Lagrange interpolation polynomials. We proceed by studying the matrix representations of Sq with respect to these bases of Xq . Proposition 5.4. Let Sq : Xq −→ Xq be defined by (5.7). 1. With respect to the standard basis, Sq has the matrix representation ⎛
0 ⎜1 ⎜ ⎜ Cq = [Sq ]stst = ⎜ . ⎜ ⎝ .
⎞ −q0 · ⎟ ⎟ ⎟ · ⎟. ⎟ · ⎠ 1 −qn−1
(5.9)
2. With respect to the control basis, Sq has the diagonal matrix representation ⎛ Cq
=
[Sq ]co co
⎜ ⎜ ⎜ =⎜ ⎜ ⎝
⎞
0 1 .
· .
1 −q0 . . . −qn−1
⎟ ⎟ ⎟ ⎟. ⎟ ⎠
3. With respect to the spectral basis, Sq has the matrix representation
(5.10)
⎛ [Sq ]sp sp
⎜ ⎜ ⎜ =⎜ ⎜ ⎝
⎞
α1 . . .
αn
⎟ ⎟ ⎟ ⎟. ⎟ ⎠
(5.11)
Proof. 1. Clearly, we have Sq zi =
⎧ ⎨ zi+1 ,
i = 0, . . . , n − 2,
⎩−
i = n − 1.
i ∑n−1 i=0 qi z ,
2. We compute, defining e0 (z) = 0, Sq ei = πq zei (z) = πq z(zn−i + qn−1zn−i−1 + · · · + qi ) = πq (zn−i+1 + qn−1zn−i + · · · + qi z) = πq (zn−i+1 + qn−1zn−i + · · · + qi z + qi−1) − qi−1en (z). So we get Sq ei = ei−1 − qi−1 en .
(5.12)
3. Noting that q(z) = (z − αi )pi (z), we compute Sq pi (z) = πq (z − αi + αi )pi = πq (q + αi pi ) = αi pi .
matrices Cq ,Cq
The are called the companion matrices of the polynomial q(z). We note that the change of basis transformation from the control to the standard basis has a particularly nice form. In fact, we have ⎛
q1 . . qn−1 ⎜ . . ⎜ ⎜ [I]stco = ⎜ . . ⎜ ⎝ qn−1 . 1
1
⎞ ⎟ ⎟ ⎟ ⎟. ⎟ ⎠
We relate now the invariant subspaces of the shift operators Sq , or equivalently the submodules of Xq , to factorizations of the polynomial q(z). This is an example of the interplay between algebra and geometry that is one of the salient characteristics of the use of functional models. Theorem 5.5. Given a monic polynomial q(z), M ⊂ Xq is an Sq -invariant subspace if and only if M = q1 Xq2 , for some factorization q(z) = q1 (z)q2 (z).
Proof. Assume q(z) = q1 (z)q2 (z) and M = q1 Xq2 . Thus f (z) ∈ M implies f (z) = q1 (z) f1 (z) with deg f1 < deg q2 . Using Lemma 1.22, we compute Sq f = πq z f = πq1 q2 zq1 f1 = q1 πq2 z f1 = q1 Sq2 f1 ∈ M. Conversely, let M be an Sq -invariant subspace. Now, for each f (z) ∈ Xq there exists a scalar α that depends on f (z) for which Sq f = z f − α q.
(5.13)
Consider now the set N = M + qF[z]. Obviously N is closed under addition, and using (5.13), z{M + qF[z]} ⊂ {M + qF[z]}. Thus N is an ideal in F[z], hence of the form q1 F[z]. Since, obviously, qF[z] ⊂ q1 F[z], it follows from Proposition 1.46 that q1 (z) is a divisor of q(z), that is, we have a factorization q(z) = q1 (z)q2 (z). It is clear that M = πq {M + qF[z]} = πq q1 F[z] = πq1 q2 q1 F[z] = q1 Xq2 .
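Since Sq acts by multiplication by z followed by reduction modulo q(z), it is easy to experiment with polynomial models numerically; the following Python/NumPy sketch (with q(z) = z^2 − 1 chosen only as an example) also illustrates the invariant subspace q1 Xq2 for the factorization q = (z + 1)(z − 1):

import numpy as np

q = [1.0, 0.0, -1.0]                  # q(z) = z^2 - 1, coefficients highest degree first
f = [1.0, 1.0]                        # f(z) = z + 1, an element of X_q lying in (z+1)X_{z-1}
zf = np.polymul([1.0, 0.0], f)        # multiply by z
_, r = np.polydiv(zf, q)              # S_q f = remainder of z f(z) after division by q(z)
print(r)                              # [1. 1.], i.e. S_q f = z + 1, again in (z+1)X_{z-1}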
Proper subspaces of a vector space may have many complementary subspaces and invariant subspaces of polynomial models are no exception. The following proposition exhibits a particular complementary subspace to such a proper invariant subspace. Proposition 5.6. Let q(z) ∈ F[z] be nonzero and let q(z) = q1 (z)q2 (z) be a factorization. Then, as vector spaces, we have the following direct sum representation: Xq = Xq1 ⊕ q1 Xq2 .
(5.14)
Proof. That Xq1 ⊂ Xq follows from the fact that deg q1 ≤ deg q. Next, we note that Xq1 ∩ q1 Xq2 = {0} and every nonzero polynomial in Xq1 has degree < degq1 , whereas a nonzero polynomial in q1 Xq2 has degree ≥ deg q1 . Finally, the equality in (5.14) follows from dim Xq = degq = deg q1 + degq2 = dim Xq1 + dim Xq2 = dim Xq1 ⊕ q1 Xq2 . Note that Xq1 ⊂ Xq is generally not an invariant subspace for Sq . The following proposition sums up the basic arithmetic properties of invariant subspaces of the shift operator. This can be viewed as the counterpart of Proposition 1.46. Proposition 5.7. Given a monic polynomial q(z) ∈ F[z], the following hold: 1. Let q(z) = q1 (z)q2 (z) = p1 (z)p2 (z) be two factorizations. Then we have the inclusion q1 Xq2 ⊂ p1 X p2 (5.15) if and only if p1 (z) | q1 (z), or equivalently q2 (z) | p2 (z).
2. Given factorizations q(z) = pi (z)qi (z), i = 1, . . . , s, then ∩si=1 pi Xqi = pXq with p(z) the l.c.m of the pi (z) and q(z) the g.c.d. of the qi (z). 3. Given factorizations q(z) = pi (z)qi (z), i = 1, . . . , s, then ∑si=1 pi Xqi = pXq with q(z) the l.c.m of the qi (z) and p(z) the g.c.d. of the pi (z). Proof. 1. Assume p1 (z)|q1 (z), i.e., q1 (z) = p1 (z)r(z) for some polynomial r(z). Then q(z) = q1 (z)q2 (z) = (p1 (z)r(z))q2 (z) = p1 (z)(r(z)q2 (z)) = p1 (z)p2 (z), and in particular, p2 (z) = r(z)q2 (z). This implies q1 Xq2 = p1 rXq2 ⊂ p1 Xrq2 = p1 X p2 . Conversely, assume the inclusion (5.15) holds. From this we have q1 Xq2 + qF[z] = q1 Xq2 + q1 q2 F[z] = q1 [Xq2 + q2 F[z]] = q1 F[z]. So (5.15) implies the inclusion q1 F[z] ⊂ p1 F[z]. By Proposition 1.46 it follows that p1 (z)|q1 (z). 2. Let t(z) = pi (z)qi (z), i = 1, . . . , s. Since ∩si=1 pi Xqi is a submodule of Xq , it is of the form pXq with t(z) = p(z)q(z). Now the inclusion pXq ⊂ pi Xqi implies pi (z)|p(z) and q(z)|qi (z), so p(z) is a common multiple of the pi (z), and q(z) a common divisor of the qi (z). Let now q (z) be any common divisor of the qi (z). Since necessarily q (z) | t(z), we can write t(z) = p (z)q (z). Now applying Proposition 1.46, we have p Xq ⊂ pi Xqi and hence p Xq ⊂ ∩si=1 pi Xqi = pXq . This implies q (z) | q(z), and hence q(z) is a g.c.d. of the qi (z). By the same token, we conclude that p(z) is the l.c.m. of the pi (z). 3. Since p1 Xq1 + · · · + ps Xqs is an invariant subspace of Xt it is of the form pXq with t(z) = p(z)q(z). Now the inclusions pi Xqi ⊂ pXq imply the division relations qi (z) | q(z) and p(z) | pi (z). So q(z) is a common multiple of the qi (z), and p(z) a common divisor of the pi (z). Let p (z) be any other common divisor of the pi (z). Then pi (z) = p (z)ei (z) for some polynomials ei (z). Now t(z) = pi (z)qi (z) = p (z)ei (z)qi (z) = p (z)q (z), so ei (z)qi (z) = q (z), and q (z) is a common multiple of the qi (z). Now pi (z) = p (z)ei (z) implies pi Xqi = p ei Xqi ⊂ p Xei qi = p Xq and hence pXq = p1 Xq1 + · · ·+ ps Xqs = p Xq . This shows that q(z) | q (z), and so q(z) is the l.c.m. of the qi (z). Corollary 5.8. Given the factorizations q(z) = pi (z)qi (z), i = 1, . . . , s, then 1. Xq = p1 Xq1 + · · · + ps Xqs if and only if the pi (z) are coprime. 2. The sum p1 Xq1 + · · · + ps Xqs is a direct sum if and only if q1 (z), . . . , qs (z) are mutually coprime. 3. We have the direct sum decomposition Xq = p1 Xq1 ⊕ · · · ⊕ ps Xqs if and only if the pi (z) are coprime and the qi (z) are mutually coprime.
4. We have the direct sum decomposition Xq = p1 Xq1 ⊕ · · · ⊕ ps Xqs if and only if the qi (z) are mutually coprime and q(z) = q1 (z) · · · qs (z). In this case pi (z) = Π j=i q j (z). Proof. 1. Let the invariant subspace p1 Xq1 + · · · + ps Xqs have the representation pν Xqν with pν (z) the g.c.d. of the pi (z) and qν (z) the l.c.m. of the qi (z). Therefore pν Xqν = Xq if and only if pν (z) = 1 or equivalently qν (z) = q(z). 2. The sum p1 Xq1 + · · · + ps Xqs is a direct sum if and only if for each index i, we have pi Xqi ∩ ∑ p j Xq j = {0}. j=i
Now ∑ j=i p j Xq j is an invariant subspace and hence of the form πi Xσi for some factorization q(z) = πi (z)σi (z). Here πi (z) is the g.c.d. of the p j (z), j = i and σi (z) is the l.c.m. of the q j (z), j = i. Now pi Xqi ∩ πi Xσi = {0} if and only if σi (z) and qi (z) are coprime. This, however, is equivalent to qi (z) being coprime with each of the q j (z), j = i, i.e., to the mutual coprimeness of the qi (z), i = 1, . . . , s. 3. Follows from the previous two parts. 4. Clearly, the pi (z) are coprime. Corollary 5.9. Let p(z) = p1 (z)ν1 · · · pk (z)νk be the primary decomposition of the polynomial p(z). Define πi (z) = Π j=i p j (z)ν j , for i = 1, . . . , k. Then X p = π1 X pν1 ⊕ · · · ⊕ πk X pνk . 1
(5.16)
k
Proof. Clearly, the g.c.d. of the πi (z) is 1, whereas the l.c.m. of the pνi i (z) is p(z). The structure of the shift operator restricted to an invariant subspace can be easily deduced from the corresponding factorization. Proposition 5.10. Let q(z) = q1 (z)q2 (z). Then we have the similarity Sq |q1 Xq2 Sq2 .
(5.17)
Proof. Let φ : Xq2 −→ q1 Xq2 be the map defined by
φ ( f ) = q1 f , which is clearly an isomorphism of the two spaces. Next we compute, for f (z) ∈ Xq2 ,
φ Sq2 f = q1 Sq2 f = qπq q1 z f = πq zq1 f = Sq φ f .
Therefore, the following diagram is commutative: Xq2
φ
-
q1 Xq2 Sq |q1 Xq2
S q2 ? Xq2
φ
-
? q1 Xq2
This is equivalent to (5.17).
Since eigenvectors span 1-dimensional invariant subspaces, we expect a characterization of eigenvectors of the shift Sq in terms of the polynomial q(z). Proposition 5.11. Let q(z) be a nonzero polynomial. 1. The eigenvalues of Sq coincide with the zeros of q(z). 2. f (z) ∈ Xq is an eigenvector of Sq corresponding to the eigenvalue α if and only if it has the representation cq(z) f (z) = . (5.18) z−α Proof. Let f (z) be an eigenvector of Sq corresponding to the eigenvalue α , i.e., Sq f = α f . By (5.13), there exists a scalar c for which (Sq f )(z) = z f (z) − cq(z). Thus z f (z) − cq(z) = α f (z), which implies (5.18). Since f (z) is a polynomial, we must have q(α ) = 0. Conversely, if q(α ) = 0, q(z) is divisible by z − α and hence f (z), defined by (5.18), is in Xq . We compute (Sq − α I) f = πq (z − α )
cq(z) = πq cq(z) = 0, z−α
which shows that α is an eigenvalue of Sq , and a corresponding eigenvector is given by (5.18). The previous proposition suggests that the characteristic polynomial of Sq is q(z) itself. Indeed, this is true and is proved next. Proposition 5.12. Let q(z) = zn + qn−1 zn−1 + · · · + q0 be a monic polynomial of degree n, and let Sq be the shift operator defined by (5.7). Then the characteristic polynomial of Sq is q(z). Proof. It suffices to compute det(zI − C) for an arbitrary matrix representation of Sq . We find it convenient to do the computation in the standard basis. In this case the matrix representation is given by the companion matrix Cq of (5.9). We prove the result by induction on n. For n = 1, we have C = (−q0 ) and det(zI −C) = z + q0 .
Assume the statement holds up to n − 1. We proceed to compute, using the induction hypothesis and expanding the determinant by the first row, z q0 −1 · det(zI − Cq ) = . · . · −1 z + qn−1 z −1 z q1 −1 .. · n+1 = z .. . · + (−1) q0 . z . · −1 −1 z + qn−1 = z(zn−1 + qn−1zn−2 + · · · + q1 ) + (−1)n+1 q0 (−1)n−1 = q(z).
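Proposition 5.12 is easy to confirm numerically: the eigenvalues of the companion matrix Cq are exactly the zeros of q(z). A Python/NumPy sketch with an arbitrary cubic:

import numpy as np

q = [1.0, -6.0, 11.0, -6.0]          # q(z) = z^3 - 6z^2 + 11z - 6 = (z-1)(z-2)(z-3)
n = len(q) - 1
C = np.zeros((n, n))                 # companion matrix as in (5.9)
C[1:, :-1] = np.eye(n - 1)           # ones on the subdiagonal
C[:, -1] = -np.array(q[:0:-1])       # last column: -q_0, -q_1, ..., -q_{n-1}
print(np.linalg.eigvals(C))          # approximately 1, 2, 3, the zeros of q(z)
print(np.poly(C))                    # [ 1. -6. 11. -6.], i.e. det(zI - C) = q(z)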
We introduce now the class of cyclic transformations. Their importance lies in that they turn out to be the building blocks of general linear transformations. Definition 5.13. Let U be an n-dimensional vector space over the field F. A map A : U −→ U is called a cyclic transformation if there exists a vector b ∈ U for which {b, Ab, . . ., An−1 b} is a basis for U . Such a vector b will be called a cyclic vector for A. Lemma 5.14. 1. Given a nonzero polynomial q(z) and f (z) ∈ Xq , the smallest Sq invariant invariant subspace of Xq containing f (z) is q1 Xq2 , where q1 (z) = q(z)∧ f (z) and q(z) = q1 (z)q2 (z). 2. Sq is a cyclic transformation in Xq . 3. A polynomial f (z) ∈ Xq is a cyclic vector of Sq if and only if f (z) and q(z) are coprime. Proof. 1. Let M be the subspace of Xq spanned by the vectors {Sqi f |i ≥ 0}. This is the smallest Sq invariant subspace containing f (z). Therefore it has the representation M = q1 Xq2 for a factorization q(z) = q1 (z)q2 (z). Since f (z) ∈ M, there exists a polynomial f 1 (z) ∈ Xq2 for which f (z) = q1 (z) f1 (z). This shows that q1 (z) is a common divisor of q(z) and f (z). To show that it is the greatest common divisor, let us assume that q (z) is an arbitrary common divisor of q(z) and f (z). Thus we have q(z) = q (z)q (z) and f (z) = q (z) f (z). Using Lemma 1.24, we compute Sqk f = πq xk f = πq xk q f = q πq xk f = q Sqk f .
Thus we have M ⊂ q Xq , or equivalently q1 Xq2 ⊂ q Xq . This implies q (z)|q1 (z); hence q1 (z) is the g.c.d. of q(z) and f (z). 2. Obviously 1 ∈ Xq and 1 ∧ q(z) = 1. So 1 is a cyclic vector for Sq . 3. Obviously, since dim q1 Xq2 = dim Xq2 = deg q2 , X = q1 Xq2 if and only if deg q1 = 0, that is, f (z) and q(z) are coprime. The availability of eigenvectors of the shift allows us to study under what conditions the shift Sq is diagonalizable, i.e., has a diagonal matrix representation. Proposition 5.15. Let q(z) be a monic polynomial of degree n. Then Sq is diagonalizable if and only if q(z) splits into the product of n distinct linear factors, or equivalently, it has n distinct zeros. n (z − α ). Proof. Assume α1 , . . . , αn are the distinct zeros of q(z), i.e., q(z) = Πi=1 i q(z) Let pi (z) = z−αi = Π j=i (z − α j ). Then Bsp = {p1 , . . . , pn } is the spectral basis for Xq , differing from the Lagrange interpolation basis by constant factors only. It is easily checked that (Sq − αi )pi = 0. So
⎛ [Sq ]sp sp
⎜ ⎜ ⎜ =⎜ ⎜ ⎝
⎞
α1 . . .
αn
⎟ ⎟ ⎟ ⎟, ⎟ ⎠
(5.19)
and Sq is diagonalizable. Conversely, assume Sq is diagonalizable. Then with respect to some basis it has the representation (5.19). Since Sq is cyclic, its minimal and characteristic polynomials coincide. Necessarily all the αi are distinct. Proposition 5.16. Let q(z) be a monic polynomial and Sq the shift operator in Xq defined by (5.7). Then p(Sq ) f = πq (p f ),
f (z) ∈ Xq .
(5.20)
Proof. Using linearity, it suffices to show that Sqk f = πq zk f ,
f (z) ∈ Xq .
We prove this by induction. For k = 1 this is the definition. Assume we proved it up to an integer k. Now, using the fact that zKer πq ⊂ Ker πq , we compute Sqk+1 f = Sq Sqk f = πq zπq zk f = πq zk+1 f .
Clearly, the operators p(Sq ) all commute with the shift Sq . We proceed to state the simplest version of the commutant lifting theorem. It characterizes operators
commuting with the shift Sq in Xq via operators commuting with the shift S+ in F[z]. The last class of operators consists of multiplication operators by polynomials. Theorem 5.17. 1. Let q(z) be a monic polynomial and Sq the shift operator in Xq defined by (5.7). Let Z be any operator in Xq that commutes with Sq . Then there exists an operator Z that commutes with S+ and such that Z = πq Z|Xq .
(5.21)
Equivalently, the following diagram is commutative:
F[z]
Z
-
πq
πq ?
Xq
F[z]
Z
-
? Xq
2. An operator Z in Xq commutes with Sq if and only if there exists a polynomial p(z) ∈ F[z] for which Z f = πq p f = p(Sq ) f ,
f (z) ∈ Xq .
(5.22)
Proof. 1. By Proposition 6.2, there exists a polynomial p(z) for which Z = p(Sq ). We define Z : F[z] −→ F[z] by Z f = p f . It is easily checked that (5.21) holds. 2. If Z is represented by (5.22), then Sq Z f = πq zπq p f = πq zp f = πq pz f = πq pπq z f = ZSq f . Conversely, if Z commutes with the shift, it has a representation (5.21). Now, any map Z commuting with S+ is a multiplication by a polynomial p(z), which proves (5.22). The proof of the previous theorem is slightly misleading, inasmuch as it uses the fact that Sq is cyclic. In Chapter 8, we will return to this subject using tensor products and the Bezout map. Since any operator of the form p(Sq ) is completely determined by the two polynomials p(z) and q(z), all its properties should be derivable from these data. The next proposition focuses on the invertibility properties of p(Sq ). Theorem 5.18. Given the polynomials p(z), q(z), with q(z) monic, let r(z) and s(z) be the g.c.d and the l.c.m. of p(z) and q(z) respectively. Then
1. We have the factorizations q(z) = r(z)q1 (z), p(z) = r(z)p1 (z),
(5.23)
with p1 (z), q1 (z) coprime, as well as the factorizations s(z) = r(z)p1 (z)q1 (z) = p(z)q1 (z) = q(z)p1 (z).
(5.24)
Moreover, we have Ker p(Sq ) = q1 Xr , Im p(Sq ) = rXq1 .
(5.25)
2. Let p(z), q(z) ∈ F[z], with q(z) nonzero. Then the linear transformation p(Sq ) is invertible if and only if p(z) and q(z) are coprime. Moreover, we have p(Sq )−1 = a(Sq ),
(5.26)
where the polynomial a(z) arises out of any solution of the Bezout equation a(z)p(z) + b(z)q(z) = 1.
(5.27)
Proof. 1. With r(z) and s(z) defined as above, (5.24) is an immediate consequence of (5.23). Applying Theorem 5.5, it follows that q1 Xr and rXq1 are invariant subspaces of Xq . If f (z) ∈ q1 Xr ,, then f = q1 g, with g ∈ Xr . We compute, using (5.23), p(Sq ) f = πq p f = q1 rπ− r−1 q−1 1 rp1 q1 g = q1 r π− g = 0, which shows that q1 Xr ⊂ Ker p(Sq ). Conversely, assume f (z) ∈ Ker p(Sq ). Then πq p f = 0 or there exists g(z) such that p(z) f (z) = q(z)g(z), which implies p1 (z) f (z) = q1 (z)g(z). Since p1 (z) and q1 (z) are coprime, we have f (z) = q1 (z) f1 (z) for some polynomial f1 (z). As f (z) ∈ Xq , we must have f1 (z) ∈ Xr . So Ker p(Sq ) ⊂ q1 Xr , and the first equality in (5.25) follows. Next, assume g(z) ∈ Im p(Sq ), i.e., there exists an f (z) ∈ Xq such that g = πq p f . We compute, using (5.23) and (5.24), g = πq p f = q1 rπ− r−1 q−1 1 rp1 f = rπq1 p1 f ∈ rXq1 , i.e., we have the inclusion Im p(Sq ) ⊂ rXq1 . Conversely, assume g(z) ∈ rXq1 , i.e., g(z) = r(z)g1 (z) with g1 (z) ∈ Xq1 . By the coprimeness of p1 (z) and q1 (z), the map f1 → πq1 p1 f1 , acting in Xq1 , is an invertible map. Hence, there exists f1 (z) ∈ Xq1 for which g1 = πq1 p1 f1 . This implies
rg1 = rq1 π− r−1 q−1 1 p1 r f1 = πq pr f 1 . This shows that rXq1 ⊂ Im p(Sq ), and the second equality in (5.25) follows. 2. From the characterization (5.25) of Ker p(Sq ) and Im p(Sq ), it follows that the injectivity of p(Sq ) is equivalent to the coprimeness of p(z) and q(z), and the same holds for surjectivity. In order to actually invert p(Sq ), we use the coprimeness of p(z) and q(z). This implies the existence of polynomials a(z), b(z) that solve the Bezout equation a(z)p(z) + b(z)q(z) = 1. Applying the functional calculus to the shift Sq , and noting that q(Sq ) = 0, we get a(Sq )p(Sq ) + b(Sq )q(Sq ) = a(Sq )p(Sq ) = I,
and (5.26) follows.
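A tiny worked instance of Theorem 5.18 (the polynomials here are hand-picked for illustration): take q(z) = z^2 and p(z) = z + 1; the Bezout equation (1 − z)(z + 1) + 1 · z^2 = 1 gives a(z) = 1 − z, and a(Sq ) indeed inverts p(Sq ). In Python/NumPy:

import numpy as np

Sq = np.array([[0.0, 0.0],           # companion matrix of q(z) = z^2,
               [1.0, 0.0]])          # acting on X_q with standard basis {1, z}
p_Sq = Sq + np.eye(2)                # p(S_q) for p(z) = z + 1
a_Sq = np.eye(2) - Sq                # a(S_q) for a(z) = 1 - z
print(np.allclose(a_Sq @ p_Sq, np.eye(2)))   # True: a(S_q) = p(S_q)^{-1}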
5.3 Circulant Matrices In this section we give a short account of a special class of structured matrices, called circulant matrices. We do this for its own sake, and as an illustration of the power of polynomial algebra. Definition 5.19. An n × n matrix over a field F is called a circulant matrix, or simply circulant for short, if it has the form ⎛
C = circ (c0 , . . . , cn−1 ) =
( c0     cn−1    .    .    c1   )
( c1     c0      .    .    .    )
(  .      .      .    .    .    )
(  .      .      .    .   cn−1  )
( cn−1    .      .   c1    c0   ) ,
i.e., ci j = c(i− jmod n) . We define a polynomial c(z) by c(z) = c0 + c1 z + · · · + cn−1 zn−1 . The polynomial c is called the representer of circ (c0 , . . . , cn−1 ). Theorem 5.20. For circulant matrices the following properties hold: 1. The circulant matrix circ (c0 , . . . , cn−1 ) is the matrix representation of c(Szn −1 ) with respect to the standard basis of Xzn −1 . 2. The sum of circulant matrices is a circulant matrix. Specifically, circ(a1 , . . . , an ) + circ(b1 , . . . , bn ) = circ(a1 + b1 , . . . , an + bn ). 3. We have
α circ (a1 , . . . , an ) = circ (α a1 , . . . , α an ).
4. Define the special circulant matrix Π by ⎛
0 ⎜1 . . ⎜ ⎜ Π = circ (0, 1, 0, . . . , 0) = ⎜ . . ⎜ ⎝ . 1
5. 6. 7.
8.
⎞ 1 .⎟ ⎟ ⎟ . ⎟. ⎟ .⎠ 0
Then C ∈ Fn×n is a circulant if and only if CΠ = Π C. The product of circulants is commutative. The product of circulants is a circulant. The inverse of a circulant is a circulant. Moreover, the inverse of a circulant with representer c(z) is a circulant with representer a(z), where a(z) comes from a solution of the Bezout equation a(z)c(z) + b(z)(zn − 1) = 1. Over an algebraically closed field, circulants are diagonalizable.
Proof. 1. We compute
⎛
⎞ 0.. . 1 ⎜1 ·⎟ ⎜ ⎟ ⎜ ⎟ st [Szn −1 ]st = ⎜ . · ⎟. ⎜ ⎟ ⎝ . ·⎠ 10
This is a special case of the companion matrix in (5.9). Now Szi n −1 z j
=
zi+ j , zi+ j−n ,
So c(Szn −1 )z j =
i + j ≤ n − 1, i + j ≥ n.
n−1
n−1− j
i=0
i=0
∑ ci Szi n−1 z j =
∑
ci zi+ j +
n−1
∑
ci zi+ j−n ,
i=n− j
and this implies the equality [c(Szn −1 )]stst = circ (c0 , . . . , cn−1 ). 2. Given polynomials a(z), b(z) ∈ F[z], we have (a + b)(Szn−1 ) = a(Szn −1 ) + b(Szn−1 ), and hence circ (a0 + b0 , . . . , an−1 + bn−1 ) = [(a + b)(Szn−1 )]stst = [a(Szn −1 )]stst + [b(Szn−1 )]stst = circ (a0 , . . . , an−1 ) + circ(b0 , . . . , bn−1 ).
3. We compute circ(α a0 , . . . , α an−1 ) = [α a(Szn −1 )]stst = α [a(Szn −1 )]stst = α circ (a0 , . . . , an−1 ). 4. Clearly, Π = circ (0, 1, 0, . . . , 0) = [Szn −1 ]stst . Obviously, Szn −1 is cyclic. Hence, a linear transformation K commutes with Szn −1 if and only if K = c(Szn −1 ) for some polynomial c(z). Thus, assume CΠ = Π C. Then there exists a linear transformation K in Xzn −1 satisfying [K]stst = C, and K commutes with Szn −1 . Therefore K = c(Szn −1 ) and C = circ (c0 , . . . , cn−1 ). Conversely, if C = circ(c0 , . . . , cn−1 ), we have CΠ = [c(Szn −1 )]stst [Szn −1 ]stst = [Szn −1 ]stst [c(Szn −1 )]stst = Π C. 5. Follows from c(Szn −1 )d(Szn −1 ) = d(Szn −1 )c(Szn −1 ). 6. Follows from c(Szn −1 )d(Szn −1 ) = (cd)(Szn −1 ). 7. Let C = circ (c0 , . . . , cn−1 ) = c(Szn −1 ), where c(z) = c0 + c1 z + · · · + cn−1zn−1 . By Theorem 5.18, c(Szn −1 ) is invertible if and only if c(z) and zn − 1 are coprime. In this case there exist polynomials a(z), b(z) satisfying the Bezout identity a(z)c(z) + b(z)(zn − 1) = 1. We may assume without loss of generality that deg a < n. From the Bezout identity we conclude that c(Szn −1 )c(Szn −1 ) = I and hence circ (c0 , . . . , cn−1 )−1 = [a(Szn −1 )]stst = circ (a0 , . . . , an−1 ). 8. The polynomial zn − 1 has a multiple zero if and only if zn − 1 and nzn−1 have a common zero. Clearly, this cannot occur. Since all the roots of zn − 1 are distinct, it follows from Proposition 5.15 that Szn −1 is diagonalizable. This implies the diagonalizability of c(Szn −1 ).
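The circulant calculus is easy to experiment with; the following Python/NumPy sketch (with arbitrary entries) builds circulants directly from the definition and checks that their product is again a circulant, with representer a(z)b(z) reduced modulo z^n − 1:

import numpy as np

def circulant(c):
    # circ(c_0, ..., c_{n-1}): entry (i, j) equals c_{(i-j) mod n}.
    n = len(c)
    return np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)], dtype=float)

A = circulant([1, 2, 0, 3])          # representer a(z) = 1 + 2z + 3z^3
B = circulant([4, 0, 1, 0])          # representer b(z) = 4 + z^2
print(np.allclose(A @ B, B @ A))     # True: circulants commute
print((A @ B)[:, 0])                 # [ 4. 11.  1. 14.], the coefficients of
                                     # a(z)b(z) mod z^4 - 1, so AB is again a circulant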
5.4 Rational Models Given a field F, we saw that the ring of polynomials F[z] is an entire ring. Hence, by Theorem 1.49, it is embeddable in its field of quotients. We call the field of quotients of F[z] the field of rational functions and denote it by F(z). Strictly speaking, the elements of F(z) are equivalence classes of pairs of polynomials (p(z), q(z)), with q(z) nonzero. However, in each nonzero equivalence class, there is a unique pair with p(z), q(z) coprime and q(z) monic. The corresponding equivalence class will p(z) be denoted by q(z) . Given such a pair of polynomials p(z), q(z), there is a unique representation of p(z) in the form p(z) = a(z)q(z) + r(z), with deg r < deg q. This allows us to write r(z) p(z) = a(z) + . (5.28) q(z) q(z)
A rational function r(z)/q(z) with deg r ≤ degq will be called proper, and if p(z) deg r < deg q is satisfied, strictly proper. Thus, any rational function g(z) = q(z) has a unique representation as a sum of a polynomial and a strictly proper rational function. We denote by F− (z) the space of strictly proper rational functions and observe that it is an infinite-dimensional linear space. Equation (5.28) means that for F(z) we have the following direct sum decomposition: F(z) = F[z] ⊕ F−(z).
(5.29)
With this direct sum decomposition we associate two projection operators in F(z), π+ and π− , with images F[z] and F− (z) respectively. To be precise, given the representation (5.28), we have
p =a q
p r = . π− q q
π+
(5.30)
Proper rational functions have an expansion as formal power series in the variable gi n n k i z−1 , i.e., in the form g(z) = ∑∞ i=0 zi . Assume p(z) = ∑k=0 pk z , q(z) = ∑i=0 qi z , with qn = 0. We compute n
n
∞
n gj ∑ pk z = ∑ qi z ∑ z j = ∑ i=0 j=0 k=0 k=−∞ k
i
n
∑ qigi−k
zk .
i=k
By comparing coefficients we get the infinite system of linear equations n
pk = ∑ qi gi−k ,
−∞ < k ≤ n.
i=k
This system has a unique solution, which can be found by solving it recursively, starting with k = n. An alternative way of finding this expansion is via the process of long division of p(z) by q(z). This generalizes easily to the case of the field of rational functions. We can consider the field F(z) of rational functions as a subfield of F((z−1 )), the field of truncated Laurent series. The space F− (z) of strictly proper rational functions can be viewed as a subspace of z−1 F[[z−1 ]]. In the same way that we have the isomorphism z−1 F[[z−1 ]] F((z−1 ))/F[z], we have also F− (z) F(z)/F[z]. In fact, the projection π− : F(z) −→ F− (z) is a surjective linear map with kernel equal to F[z]. However, the spaces F((z−1 )), F(z), and F[z] all carry also a natural F[z]-module structure, with polynomials acting by multiplication. This structure induces an F[z]-module structure in the quotient spaces. This module structure is transferred to F− (z) by defining, for p(z) ∈ F[z],
p · g = π− (pg),
g(z) ∈ F− (z).
(5.31)
In particular, we define the backward shift operator S− , acting in F− (z), by S− g = π− (zg),
g ∈ F− (z).
(5.32)
We shall use the same notation when z−1 F[[z−1 ]] replaces F− (z). We point out that in the behavioral literature, S− is denoted by σ . Using this, equation (5.31) can be rewritten, for p(z) ∈ F[z], as a Toeplitz-like operator p(σ )g = π− (pg),
g(z) ∈ z−1 F[[z−1 ]].
(5.33)
In terms of the expansion of g(z) ∈ F− (z) around infinity, i.e., in terms of the gi representation g(z) = ∑∞ i=1 zi we have ∞
∞ gi gi+1 = . ∑ i i i=1 z i=1 z
S− ∑
This explains the use of the term backward shift for this operator. We will show now that for the backward shift S− in z^{−1}F[[z^{−1}]], any α ∈ F is an eigenvalue and it has associated eigenfunctions of any order. Indeed, let α ∈ F. Then it is easy to check that (z − α)^{−1} = ∑_{i=1}^{∞} α^{i−1} z^{−i} is an eigenvector of S− corresponding to the eigenvalue α. Analogously, (z − α)^{−k} is a generalized eigenvector of S− of order k. Moreover, any other such eigenvector is necessarily of the form c(z − α)^{−k}. Therefore, we have, for k ≥ 1, dim Ker(αI − S−)^k = k. In contrast to this richness of eigenfunctions of the backward shift S−, the forward shift S+ does not have any eigenfunctions. This asymmetry will be further explored in the study of duality.
The previous discussion indicates that the spectral structure of the shift S−, with finite multiplicity, might be rich enough to model all finite-dimensional linear transformations, up to similarity transformations, as the backward shift S− restricted to a finite-dimensional backward-shift-invariant subspace. This turns out to be the case, and this result lies at the heart of the matter, explaining the effectiveness of the use of shifts in linear algebra and system theory. In fact, the whole book can be considered, to a certain extent, to be an elaboration of this idea. We will study a special case of this in Chapter 6.
It is easy to construct many finite-dimensional S−-invariant subspaces of F−(z). In fact, given any nonzero polynomial d(z), we let

X^d = { r/d | deg r < deg d }.
It is easily checked that X^d is indeed an S−-invariant subspace, and its dimension equals the degree of d(z).
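The following sketch (Python with sympy; the polynomial d(z) and the point α are arbitrary illustrative choices, not from the text) checks the two facts just discussed: applying S− to an element of X^d stays in X^d, and (z − α)^{−1} is an eigenvector of S−.

import sympy as sp

z = sp.symbols('z')

def pi_minus(f):
    # strictly proper part of a rational function: subtract the polynomial part
    num, den = sp.fraction(sp.cancel(f))
    quo, rem = sp.div(num, den, z)
    return sp.cancel(rem / den)

d = z**3 - 2*z + 1                 # an illustrative d(z)
h = (z + 4) / d                    # an element of X^d (numerator of degree < deg d)
Sh = pi_minus(z * h)               # the backward shift applied to h
print(sp.cancel(Sh * d))           # a polynomial of degree < 3, so Sh is again in X^d

a = sp.Rational(1, 2)
print(sp.series(1/(z - a), z, sp.oo, 5))                  # expansion with coefficients a^(i-1)
print(sp.cancel(pi_minus(z / (z - a)) / (1/(z - a))))     # prints a, the eigenvalue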
It is natural to consider the restriction of the operator S− to X^d. Thus we define a linear transformation S^d : X^d −→ X^d by S^d = S−|X^d, or equivalently, for h ∈ X^d,

S^d h = S− h = π− zh.     (5.34)

The modules X_d and X^d have the same dimension and are defined by the same polynomial. Just as the polynomial model X_d has been defined in terms of the projection π_d, so can the rational model X^d be characterized as the image of a projection.
Definition 5.21. Let the map π^d : z^{−1}F[[z^{−1}]] −→ z^{−1}F[[z^{−1}]] be defined by

π^d h = π− d^{−1} π+ d h,     h(z) ∈ z^{−1}F[[z^{−1}]].     (5.35)
Proposition 5.22. Let π^d be defined by (5.35). Then
1. π^d is a projection in z^{−1}F[[z^{−1}]].
2. We have

X^d = Im π^d.     (5.36)

3. With d(σ) defined by (5.33), we have

X^d = Ker d(σ).     (5.37)
Proof. 1. For h(z) ∈ z−1 F[[z−1 ]], we compute (π d )2 h = π d (π d h) = π− d −1 π+ d π− d −1 π+ dh = π− d −1 π+ dd −1 π+ dh = π− d −1 π+ π+ dh = π− d −1 π+ dh = π d h, i.e., π d is a projection. 2. Assume h(z) = d −1 r, with deg r < deg d. Then
π d (d −1 r) = π− d −1 π+ dd −1 r = π− d −1 r = d −1 r, which shows that X d ⊂ Im π d . Conversely, assume h(z) ∈ Im π d . Thus there exists g(z) ∈ z−1 F[[z−1 ]] for which h(z) = π d g. Hence
π d h = (π d )2 g = π d g = h. From this it follows that
π− dh = π− d π d h = π− d π− d −1 π+ dh = π− dd −1 π+ dh = π− π+ dh = 0.
This implies that there exists r(z) ∈ F[z] for which d(z)h(z) = r(z) and deg r < deg d. In turn, we conclude that h(z) = d(z)−1 r(z) ∈ X d , which implies the inclusion Im π d ⊂ X d . 3. Assume h ∈ Ker d(σ ), which implies π+ dh = dh. We compute
π d h = π− d −1 π+ dh = π− d −1 dh = π− h = h, i.e., Ker d(σ ) ⊂ X d . Conversely, assume h ∈ X d . This implies h = π− d −1 π+ dh. We compute d(σ )h = π− dh = π− d π− d −1 π+ dh = π− dd −1 π+ dh = π− π+ dh = 0, i.e., X d ⊂ Ker d(σ ). The two inclusions imply (5.37).
It is natural to conjecture that the polynomial model X_d and the rational model X^d must be isomorphic, and this is indeed the case, as the next theorem shows.
Theorem 5.23. Let d(z) be a nonzero polynomial. Then the operators S_d and S^d are isomorphic. The isomorphism is given by the map ρ_d : X^d −→ X_d defined by

ρ_d h = dh.     (5.38)

Proof. We compute

ρ_d S^d h = ρ_d π− zh = d π− zh = d π− d^{−1} dzh = π_d z(dh) = S_d(ρ_d h),

i.e.,

ρ_d S^d = S_d ρ_d,     (5.39)

which, by the invertibility of ρ_d, proves the isomorphism.
The polynomial and rational models that have been introduced are isomorphic, yet they represent two fundamentally different points of view. In the case of polynomial models, all spaces of the form Xq , with deg q = n, contain the same elements, but the associated shifts Sq act differently. On the other hand, the operators Sq in the spaces X q act in the same way, since they are all restrictions of the backward shift S− . However, the spaces are different. Thus the polynomial models represent an arithmetic perspective, whereas rational models represent a geometric one. Our next result is the characterization of all finite-dimensional S− -invariant subspaces. Proposition 5.24. A subset M of F− (z) is a finite-dimensional S− -invariant subspace if and only if we have M = X d .
Proof. Assume that for some polynomial d(z), M = X^d. Then

S−(r/d) = π− z(r/d) = d^{−1} d π− d^{−1} zr = d^{−1} π_d zr ∈ M.

Conversely, let M be a finite-dimensional S−-invariant subspace. By Theorem 4.54, there exists a nonzero polynomial p(z) of minimal degree such that π− ph = 0, for all h ∈ M. Thus, we get M ⊂ X^p. It follows that pM is a submodule of X_p, hence of the form p_1 X_{p_2} for some factorization p(z) = p_1(z)p_2(z). We conclude that M = p^{−1} p_1 X_{p_2} = p_2^{−1} p_1^{−1} p_1 X_{p_2} = X^{p_2}. The minimality of p(z) implies p(z) = p_2(z), up to a constant nonzero factor.
The spaces of rational functions of the form X^d will be referred to as rational models. Note that in the theory of differential equations, these spaces appear as the Laplace transforms of the spaces of solutions of a homogeneous linear differential equation with constant coefficients.
The following sums up the basic arithmetic properties of rational models. It is the counterpart of Proposition 5.7.
Proposition 5.25. 1. Given polynomials p(z), q(z) ∈ F[z], we have the inclusion X^p ⊂ X^q if and only if p(z) | q(z).
2. Given polynomials p_i(z) ∈ F[z], i = 1, . . . , s, then ∩_{i=1}^{s} X^{p_i} = X^p with p(z) the g.c.d. of the p_i(z).
3. Given polynomials p_i ∈ F[z], then ∑_{i=1}^{s} X^{p_i} = X^q with q(z) the l.c.m. of the p_i(z).
Proof. Follows from Proposition 5.7, using the isomorphism of polynomial and rational models given by Theorem 5.23.
The primary decomposition theorem and the direct sum representation (5.16) have a direct implication toward the partial fraction decomposition of rational functions.
Theorem 5.26. Let p(z) = Π_{i=1}^{s} p_i(z)^{ν_i} be the primary decomposition of the nonzero polynomial p(z). Then
1. We have

X^p = X^{p_1^{ν_1}} ⊕ · · · ⊕ X^{p_s^{ν_s}}.     (5.40)

2. Each rational function g(z) ∈ X^p has a unique representation of the form

g(z) = r(z)/p(z) = ∑_{i=1}^{s} ∑_{j=1}^{ν_i} r_{ij}/p_i^j,

with deg r_{ij} < deg p_i, j = 1, . . . , ν_i.
Proof. 1. Given the primary decomposition of p(z), we define π_i(z) = Π_{j≠i} p_j(z)^{ν_j}. Clearly, by Corollary 5.9, we have

X_p = π_1 X_{p_1^{ν_1}} ⊕ · · · ⊕ π_s X_{p_s^{ν_s}}.     (5.41)

We use the isomorphism of the modules X_p and X^p and the fact that p(z) = π_i(z)p_i(z)^{ν_i}, which implies p^{−1} π_i X_{p_i^{ν_i}} = p_i^{−ν_i} π_i^{−1} π_i X_{p_i^{ν_i}} = X^{p_i^{ν_i}}, to get the direct sum decomposition (5.40).
2. For any r_i(z) ∈ X_{p_i^{ν_i}} we have r_i = ∑_{j=0}^{ν_i−1} r_{i(ν_i−j)} p_i^j. With r/p = ∑_{i=1}^{s} r_i/p_i^{ν_i}, using (5.41) and taking r_i ∈ X_{p_i^{ν_i}}, we have

r_i/p_i^{ν_i} = ∑_{j=1}^{ν_i} r_{ij}/p_i^j.     (5.42)
The isomorphism (5.39) between the shifts Sq and Sq can be used to transform Theorems 5.17 and 5.18 into the context of rational models. Thus we have the following Theorem. Theorem 5.27. 1. Let q(z) be a monic polynomial and Sq the shift operator in X q defined by (5.34). Let W be any operator in X q that commutes with Sq . Then there exists an operator Z that commutes with Sq and such that Z = ρqW ρq−1 .
(5.43)
Equivalently, the following diagram is commutative: W Xq
-
ρq ? Xq
Xq
ρq Z
-
? Xq
2. An operator W in X q commutes with Sq if and only if there exists a polynomial p(z) ∈ F[z] for which W h = π− ph = p(Sq )h,
h(z) ∈ X q .
(5.44)
Proof. 1. By the isomorphism (5.39), we have ρq Sq = Sq ρq . Since W Sq = SqW , it follows that W ρq−1 Sq ρq = ρq−1 Sq ρqW , or (ρqW ρq−1 )Sq = Sq (ρqW ρq−1 ). Defining Z = ρqW ρq−1 , we have ZSq = Sq Z. 2. If W is represented by (5.44), then SqW h = π q zπ− ph = π− q−1 π+ qzπ− ph = π− pq−1 q−1 zπ− ph = π− zπ− ph = W Sq h.
Conversely, assume W Sq = SqW , and define Z by (5.43). Then we have ZSq = Sq Z. Applying Theorem 5.17, there exists a polynomial p(z) ∈ F[z] for which Z = p(Sq ). Since W = ρq−1 Z ρq , this implies W h = ρq−1 Z ρq h = q−1 qπ− q−1 pqh = π− ph.
Theorem 5.28. Given the polynomials p(z), q(z), with q(z) monic. Let r(z) and s(z) be the g.c.d. and the l.c.m. of p(z) and q(z) respectively. Then 1. We have the factorizations q(z) = r(z)q1 (z), p(z) = r(z)p1 (z),
(5.45)
with p1 (z), q1 (z) coprime, as well as the factorizations s(z) = r(z)p1 (z)q1 (z) = p(z)q1 (z) = q(z)p1 (z).
(5.46)
We have Ker p(Sq ) = X r , Im p(Sq ) = X q1 .
(5.47)
2. Let p(z), q(z) ∈ F[z], with q(z) nonzero. Then the linear transformation p(Sq ) is invertible if and only if p(z) and q(z) are coprime. Moreover, we have p(S q )−1 = a(Sq ),
(5.48)
where the polynomial a(z) arises out of any solution of the Bezout equation a(z)p(z) + b(z)q(z) = 1.
(5.49)
5.5 The Chinese Remainder Theorem and Interpolation
The roots of the Chinese remainder theorem are in number theory. However, with the underlying ring taken to be F[z], we interpret it here as an interpolation result.
Theorem 5.29 (Chinese remainder theorem). Let q_i(z) ∈ F[z] be mutually coprime polynomials and let q(z) = q_1(z) · · · q_s(z). Then, given polynomials a_i(z) such that deg a_i < deg q_i, there exists a unique polynomial f(z) ∈ X_q, i.e., such that deg f < deg q, and for which π_{q_i} f = a_i.
Proof. The interesting thing about the proof of the Chinese remainder theorem is its use of coprimeness in two distinct ways. One is geometric, the other one is spectral. Let us define d_j(z) = ∏_{i≠j} q_i(z). The mutual coprimeness of the q_i(z) implies the direct sum decomposition

X_q = d_1 X_{q_1} ⊕ · · · ⊕ d_s X_{q_s}.     (5.50)

This is the geometric use of the coprimeness assumption. The condition deg f < deg q is equivalent to f(z) ∈ X_q. Let f(z) = ∑_{j=1}^{s} d_j(z) f_j(z) with f_j(z) ∈ X_{q_j}. Since for i ≠ j, q_i(z) | d_j(z), it follows that in this case π_{q_i} d_j f_j = 0. Hence

π_{q_i} f = π_{q_i} ∑_{j=1}^{s} d_j f_j = π_{q_i} d_i f_i = d_i(S_{q_i}) f_i.

For the spectral use of coprimeness, we observe that the pairwise coprimeness of the q_i(z) implies the coprimeness of d_i(z) and q_i(z). In turn, this shows, applying Theorem 5.18, that the module homomorphism d_i(S_{q_i}) in X_{q_i} is actually an isomorphism. Hence, there exists a unique f_i(z) in X_{q_i} such that a_i = d_i(S_{q_i}) f_i and f_i = d_i(S_{q_i})^{−1} a_i. So f = ∑_{j=1}^{s} d_j d_j(S_{q_j})^{−1} a_j is the required polynomial. Note that the inversion of d_i(S_{q_i}) can be done easily, as in Theorem 5.18, using the Euclidean algorithm. The uniqueness of f(z), under the condition f(z) ∈ X_q, follows from the fact that (5.50) is a direct sum representation. This completes the proof.
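As a computational illustration of this proof, the sketch below (Python with sympy; the moduli and remainders are made-up data) builds the unique f with deg f < deg q by inverting through the Bezout identity, i.e., by the Euclidean algorithm.

import sympy as sp

z = sp.symbols('z')

q = [z - 1, z**2 + 1, z + 2]                 # mutually coprime moduli q_i
a = [sp.Integer(3), z - 5, sp.Integer(7)]    # prescribed remainders a_i, deg a_i < deg q_i

qprod = sp.expand(sp.Mul(*q))
f = sp.Integer(0)
for qi, ai in zip(q, a):
    di = sp.cancel(qprod / qi)                               # d_i = product of the other moduli
    s, t, g = sp.gcdex(sp.expand(di), sp.expand(qi), z)      # s*d_i + t*q_i = 1
    f += di * sp.rem(sp.expand(s * ai / g), qi, z)
f = sp.rem(sp.expand(f), qprod, z)                           # the unique solution with deg f < deg q

print([sp.rem(f, qi, z) for qi in q])                        # reproduces the prescribed remainders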
5.5.1 Lagrange Interpolation Revisited
Another solution to the Lagrange interpolation problem, introduced in Chapter 2, can be easily derived from the Chinese remainder theorem. Indeed, given distinct numbers λ_i ∈ F, i = 1, . . . , n, the polynomials q_i(z) = (z − λ_i) are mutually coprime. We define q(z) = Π_{j=1}^{n} q_j(z) and d_i(z) = Π_{j≠i} q_j(z), which leads to the factorizations q(z) = d_i(z)q_i(z). The coprimeness assumption implies the direct sum representation X_q = d_1 X_{q_1} ⊕ · · · ⊕ d_n X_{q_n}. Thus, any polynomial f(z) ∈ X_q has a unique representation of the form

f(z) = ∑_{j=1}^{n} c_j d_j(z).     (5.51)

We want to find a polynomial f(z) that satisfies the interpolation conditions f(λ_i) = a_i, i = 1, . . . , n. Applying the projection π_{q_i} to the expansion (5.51), and noting that π_{q_i} d_j = 0 for j ≠ i, we obtain

a_i = f(λ_i) = π_{q_i} f = π_{q_i} ∑_{j=1}^{n} c_j d_j(z) = c_i d_i(λ_i).

Defining the Lagrange interpolation polynomials by l_i(z) = d_i(z)/d_i(λ_i), we get f(z) = ∑_{j=1}^{n} a_j l_j(z) for the unique solution in X_q of the Lagrange interpolation problem. Any other solution differs by a polynomial that has a zero at all the points λ_i, hence is divisible by q(z).
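A short sketch of this construction follows (Python with sympy; the nodes and values are made-up illustrative data); the quantities d_i(z)/d_i(λ_i) are exactly the Lagrange polynomials l_i(z) of the text.

import sympy as sp

z = sp.symbols('z')

lam = [0, 1, 3]        # distinct interpolation nodes (illustrative)
a = [2, -1, 5]         # prescribed values

f = sp.Integer(0)
for i, (li, ai) in enumerate(zip(lam, a)):
    di = sp.Mul(*[(z - lj) for j, lj in enumerate(lam) if j != i])
    f += ai * di / di.subs(z, li)          # a_i * l_i(z)
f = sp.expand(f)

print(f, [f.subs(z, li) for li in lam])    # degree < 3, and it matches the data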
5.5.2 Hermite Interpolation
We apply now the Chinese remainder theorem to the problem of higher-order interpolation, or Hermite interpolation. In Hermite interpolation, which is a generalization of the Lagrange interpolation problem, we prescribe not only the value of the interpolating polynomial at given points, but also the value of a certain number of derivatives, the number of which may differ from point to point. Specifying the first ν derivatives, counting from zero, of a polynomial f(z) at a point α means that we are given a representation

f(z) = ∑_{i=0}^{ν−1} f_{i,α}(z − α)^i + (z − α)^ν g(z).

Of course, since deg ∑_{i=0}^{ν−1} f_{i,α}(z − α)^i < ν, this means that

∑_{i=0}^{ν−1} f_{i,α}(z − α)^i = π_{(z−α)^ν} f.

Hence we can formulate the Hermite interpolation problem: Given distinct α_1, . . . , α_k ∈ F, positive integers ν_1, . . . , ν_k, and polynomials f_i(z) = ∑_{j=0}^{ν_i−1} f_{j,α_i}(z − α_i)^j, find a polynomial f(z) such that

π_{(z−α_i)^{ν_i}} f = f_i,    i = 1, . . . , k.     (5.52)

Proposition 5.30. There exists a unique solution f(z), of degree < n = ∑_{i=1}^{k} ν_i, to the Hermite interpolation problem. Any other solution of the Hermite interpolation problem is of the form f(z) + p(z)g(z), where g(z) is an arbitrary polynomial and p(z) is given by

p(z) = Π_{i=1}^{k} (z − α_i)^{ν_i}.     (5.53)
Proof. We apply the Chinese remainder theorem. Obviously, the polynomials (z − αi )νi , i = 1, . . . , k are mutually coprime. Then, with p(z) defined by (5.53), there exists a unique f (z) with deg f < n for which (5.52) holds. If fˆ(z) is any other solution, then h(z) = ( f (z) − fˆ(z)) satisfies π(z−αi )νi h = 0, that is h(z) is divisible by (z − αi )νi . Since these polynomials are mutually coprime, it follows that p(z) | h(z), or h(z) = p(z)g(z) for some polynomial g(z).
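Along the same lines, the Hermite problem can be solved computationally by the Chinese remainder construction. The sketch below (Python with sympy) uses two made-up nodes with multiplicities 2 and 3; the local polynomials f_i, written in powers of (z − α_i), encode the prescribed derivatives.

import sympy as sp

z = sp.symbols('z')

alpha = [0, 1]                                     # nodes (illustrative)
nu = [2, 3]                                        # prescribed orders
f_local = [1 + 2*z, 3 - (z - 1) + 4*(z - 1)**2]    # local data in powers of (z - alpha_i)

moduli = [(z - al)**n for al, n in zip(alpha, nu)]
p = sp.expand(sp.Mul(*moduli))

f = sp.Integer(0)
for qi, fi in zip(moduli, f_local):
    di = sp.cancel(p / qi)
    s, t, g = sp.gcdex(sp.expand(di), sp.expand(qi), z)
    f += di * sp.rem(sp.expand(s * fi / g), qi, z)
f = sp.rem(sp.expand(f), p, z)

# f - f_i is divisible by (z - alpha_i)^{nu_i}: all these derivatives vanish
print([sp.simplify(sp.diff(f - fi, z, k).subs(z, al))
       for al, n, fi in zip(alpha, nu, f_local) for k in range(n)])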
5.5.3 Newton Interpolation
There are situations in which the interpolation data are given to us sequentially. At each stage we solve the corresponding interpolation problem. Newton interpolation is recursive in the sense that the solution at time k is the basis for the solution at time k + 1.
Definition 5.31. Given λ_i, a_i ∈ F, i = 0, 1, . . . , we define the Newton interpolation problem NIP(i): Find polynomials f_i(z) ∈ X_{d_i}, i ≥ 1, satisfying the interpolation conditions

NIP(i) :  f_i(λ_j) = a_j,    0 ≤ j ≤ i − 1.     (5.54)

Theorem 5.32. Given λ_i, a_i ∈ F, i = 0, 1, . . . , with the λ_i distinct, we define polynomials by

q_i(z) = z − λ_i,    d_i(z) = Π_{j=0}^{i−1} (z − λ_j).     (5.55)

Then
1. We have the factorizations

d_{i+1}(z) = d_i(z)q_i(z),     (5.56)

with d_i(z), q_i(z) coprime.
2. We have the following direct sum decomposition:

X_{d_{i+1}} = X_{d_i} ⊕ d_i X_{q_i}.     (5.57)

3. Every f(z) ∈ X_{d_{i+1}} has a unique representation of the form

f(z) = g(z) + c d_i(z),     (5.58)

for some g(z) ∈ X_{d_i} and c ∈ F.
4. Let f_i(z) be the solution to the Newton interpolation problem NIP(i); then there exists a constant c_i for which

f_{i+1}(z) = f_i(z) + d_i(z)c_i     (5.59)

is the solution to the Newton interpolation problem NIP(i + 1), and it has the representation

f_{i+1}(z) = ∑_{j=0}^{i} c_j d_j(z),     (5.60)

where

c_j = (f_{j+1}(λ_j) − f_j(λ_j)) / d_j(λ_j).     (5.61)

Proof. 1. Follows from (5.55). The coprimeness of d_i(z) and q_i(z) is a consequence of our assumption that the λ_j are distinct.
2. Follows from the factorization (5.56) by applying Proposition 5.6.
3. Follows from the direct sum representation (5.57).
4. In equation (5.59), we substitute the corresponding expression for f_i(z) and proceed inductively to get (5.60). Alternatively, we note that deg d_i = i, for i ≥ 0; hence {d_0(z), . . . , d_i(z)} is a basis for X_{d_{i+1}} and an expansion (5.60) exists. Let f_{i+1}(z) be defined by (5.59). Since d_i(λ_j) = 0 for j = 0, . . . , i − 1, it follows that f_{i+1}(λ_j) = a_j for j = 0, . . . , i − 1. All we need for f_{i+1}(z) to be a solution of NIP(i + 1) is to choose c_i such that a_i = f_{i+1}(λ_i) = f_i(λ_i) + d_i(λ_i)c_i, or, since d_i(λ_i) ≠ 0, we get (5.61).
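The recursion (5.59)–(5.61) is easy to run; here is a sketch (Python with sympy, on made-up data) that processes one node at a time, exactly as in item 4 of the theorem.

import sympy as sp

z = sp.symbols('z')

lam = [0, 1, 2, 4]      # distinct nodes, arriving sequentially (illustrative)
a = [1, 3, 2, -1]       # corresponding values

f, d = sp.Integer(0), sp.Integer(1)            # f interpolates nothing yet, d_0 = 1
for li, ai in zip(lam, a):
    c = (ai - f.subs(z, li)) / d.subs(z, li)   # the correction coefficient, as in (5.61)
    f = sp.expand(f + c * d)                   # f_{i+1} = f_i + c_i d_i, as in (5.59)
    d = sp.expand(d * (z - li))                # d_{i+1} = d_i (z - lambda_i)

print(f, [f.subs(z, li) for li in lam])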
5.6 Duality
The availability of both polynomial and rational models allows us to proceed with a deeper study of duality. Our aim is to obtain an identification of the dual space to a polynomial model in terms of a polynomial model.
On F(z), and more generally on F((z^{−1})), we introduce a bilinear form as follows. Given f(z) = ∑_{j=−∞}^{n_f} f_j z^j and g(z) = ∑_{j=−∞}^{n_g} g_j z^j, let

[f, g] = ∑_{j=−∞}^{∞} f_j g_{−j−1}.     (5.62)
Clearly, the sum in (5.62) is well defined, since only a finite number of summands are nonzero. Given a subspace M ⊂ F(z), we let M ⊥ = { f ∈ F(z)|[m, f ] = 0, ∀m ∈ M}. It is easy to check that F[z]⊥ = F[z]. We will need the following simple computational rule.
Proposition 5.33. Let φ(z), f(z), g(z) be rational functions. Then

[φ f, g] = [f, φ g].     (5.63)

Proof. With the obvious notation we compute

[φ f, g] = ∑_{j=−∞}^{∞} (φ f)_j g_{−j−1} = ∑_{j=−∞}^{∞} (∑_{i=−∞}^{∞} φ_i f_{j−i}) g_{−j−1}
        = ∑_{i=−∞}^{∞} ∑_{j=−∞}^{∞} φ_i f_{j−i} g_{−j−1} = ∑_{k=−∞}^{∞} f_k ∑_{j=−∞}^{∞} φ_{j−k} g_{−j−1}
        = ∑_{k=−∞}^{∞} f_k ∑_{i=−∞}^{∞} φ_i g_{−i−k−1} = ∑_{k=−∞}^{∞} f_k (φ g)_{−k−1} = [f, φ g].
Multiplication operators in F(z) of the form Lφ h = φ h are called Laurent operators. The function φ is called the symbol of the Laurent operator. Before getting the representation of the dual space to a polynomial model, we derive a representation of the dual space to F[z]. Theorem 5.34. The dual space of F[z] can be identified with z−1 F[[z−1 ]]. Proof. Clearly, every element h(z) ∈ z−1 F[[z−1 ]] defines, by way of the pairing (5.62), a linear functional on F[z]. Conversely, given a linear functional Φ on F[z], it induces linear functionals φi on F by defining, for ξ ∈ F,
φ_i(ξ) = Φ(z^i ξ),

and an element h ∈ z^{−1}F[[z^{−1}]] is defined by letting h(z) = ∑_{j=0}^{∞} φ_j z^{−j−1}. It follows that Φ(f) = [f, h].
Point evaluations are clearly linear functionals on F[z]. It is easy to identify the representing functions. In fact, this is an algebraic version of Cauchy's theorem.
Proposition 5.35. Let α ∈ F and f(z) ∈ F[z]. Then f(α) = [f, (z − α)^{−1}].
Proof. We have (z − α)^{−1} = ∑_{i=1}^{∞} α^{i−1} z^{−i}. So this follows from (5.62), since

[f, (z − α)^{−1}] = ∑_{i=0}^{n} f_i α^i = f(α).
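Proposition 5.35 can be checked numerically by truncating the expansion of (z − α)^{−1} at infinity. The following sketch (Python with sympy; the polynomial and the point are arbitrary illustrative choices) evaluates the pairing (5.62) directly.

import sympy as sp

z = sp.symbols('z')

def pairing(f, g, N=10):
    # [f, g] = sum_j f_j g_{-j-1}, with g expanded at infinity and truncated at order N
    gs = sp.series(g, z, sp.oo, N).removeO()
    deg = sp.degree(f, z)
    return sp.simplify(sum(f.coeff(z, j) * gs.coeff(z, -j - 1) for j in range(deg + 1)))

f = 3*z**2 - z + 4
alpha = sp.Rational(2, 3)
print(pairing(f, 1/(z - alpha)), f.subs(z, alpha))    # both equal f(alpha)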
Theorem 5.36. Let M = dF[z] with d(z) ∈ F[z]. Then M ⊥ = X d .
Proof. Let f (z) ∈ F[z] and h(z) ∈ M. Then 0 = [d f , h] = [ f , dh] = [ f , π− dh]. But this implies d(z)h(z) ∈ Xd , or h(z) ∈ X d .
Next, we compute the adjoint of the projection πd : F[z] −→ F[z]. Clearly πd∗ is a transformation acting in z−1 F[[z−1 ]]. Theorem 5.37. The adjoint of πd is π d . Proof. Let f (z) ∈ F[z] and h(z) ∈ z−1 F[[z−1 ]]. Then ˜ ˜ = [d −1 f , π+ dh] [πd f , h] = [d π− d −1 f , h] = [π− d −1 f , dh] −1 −1 −1 ˜ ˜ ˜ ˜ ˜ ˜ = [ f , d π+ dh] = [π+ f , d π+ dh] = [ f , π− d π+ dh] ˜ d = [ f , π h].
Not only are we interested in the study of duality on the level of F[z] and its dual space z−1 F[[z−1 ]], but also we would like to study it on the level of the modules Xd and X d . The key to this study is the fact that if X is a vector space and M a subspace, then (X/M)∗ M ⊥ . Theorem 5.38. Let d(z) ∈ F[z] be nonsingular. Then Xd∗ is isomorphic to X d and Sd∗ = Sd . Proof. Since Xd is isomorphic to F[z]/dF[z], then Xd∗ is isomorphic to (F[z]/dF[z])∗ , which, in turn, is isomorphic to (dF[z])⊥ . However, this last module is X d . It is clear that under the duality pairing we introduced, we actually have Xd∗ = X d . Finally, let f (z) ∈ Xd and let h(z) ∈ X d . Then [Sd f , h] = [πd z f , h] = z f , π d h , [z f , h] = [ f , zh] = [π+ f , zh], [ f , π− zh] = [ f , S− h] = f , Sd h . Hence, we can identify Xd˜ with Xd∗ by defining a new pairing f , g = [d −1 f , g] = [ f , d −1 g] for all f (z), g(z) ∈ Xd .
(5.64)
As a direct corollary of Theorem 5.38 we have the following. Theorem 5.39. The dual space of Xd under the pairing , introduced in (5.64) is Xd , and moreover, Sd∗ = Sd . With the identification of the polynomial model Xd with its dual space, we can identify some pairs of dual bases.
Proposition 5.40. 1. Let d(z) = z^n + d_{n−1}z^{n−1} + · · · + d_0, and let B_st = {1, z, . . . , z^{n−1}} be the standard and B_co = {e_1(z), . . . , e_n(z)} the control bases, respectively, of X_d. Then B_co = B_st^∗, that is, the control and standard bases are dual to each other.
2. Let d(z) = Π_{i=1}^{n} (z − λ_i), with the λ_i distinct. Let B_in = {π_1(z), . . . , π_n(z)}, with π_i(z) the Lagrange interpolation polynomials, be the interpolation basis in X_d. Let B_sp = {p_1(z), . . . , p_n(z)} be the spectral basis, with p_i(z) = Π_{j≠i}(z − λ_j). Then B_sp^∗ = B_in.
Proof. 1. We use (5.64) to compute

⟨z^{i−1}, e_j⟩ = [d^{−1}z^{i−1}, π_+ z^{−j}d] = [d^{−1}z^{i−1}, z^{−j}d] = [z^{i−j−1}, 1] = δ_{ij}.

2. We use Proposition 5.35 and note that for every f(z) ∈ X_d we have

⟨f, p_i⟩ = [d^{−1}f, p_i] = [f, p_i/d] = [f, 1/(z − λ_i)] = f(λ_i).

In particular, ⟨π_i, p_j⟩ = π_i(λ_j) = δ_{ij}.
This result explains the connection between the two companion matrices given in Proposition 5.4. Indeed, ∗ co co Cq = [Sq ]stst = [S q ]co = [Sq ]co = Cq .
Next we compute the change of basis transformations. Proposition 5.41. 1. Let q(z) = zn + qn−1zn−1 + · · · + q0 ,. Then ⎛
q1 . . qn−1 ⎜ . . ⎜ ⎜ [I]stco = ⎜ . . ⎜ ⎝ qn−1 . 1 and
1
⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠
(5.65)
⎞ 1 ⎜ . ψ1 ⎟ ⎜ ⎟ ⎜ ⎟ co [I]st = ⎜ ... ⎟, ⎜ ⎟ ⎝ . ... ⎠ 1 ψ1 . . ψn−1
(5.66)
⎛
where for q (z) = zn q(z−1 ), ψ (z) = ψ0 + · · · + ψn−1 zn−1 is the unique solution, of degree < n, of the Bezout equation q (z)ψ (z) + zn σ (z) = 1.
(5.67)
2. The matrices in (5.65) and (5.66) are inverses of each other. 3. Let d(z) = Π nj=1 (z − α j ) with α1 , . . . , αn distinct and let Bsp , Bin be the corresponding spectral and interpolation bases of Xd . Then we have the following change of basis transformations: ⎛
1 α1 ⎜. . ⎜ ⎜ [I]in st = ⎜ . . ⎜ ⎝. . 1 αn
⎞ . . α1n−1 .. . ⎟ ⎟ ⎟ . . . ⎟, ⎟ .. . ⎠ . . αnn−1
(5.68)
⎛
⎞ 1 ... 1 ⎜ α ... α ⎟ ⎜ 1 n ⎟ ⎜ ⎟ [I]co . . . . ⎟, sp = ⎜ . ⎜ ⎟ ⎝ . ... . ⎠ α1n−1 . . . αnn−1 ⎞ ⎛ p1 (α1 ) ⎟ ⎜ . ⎟ ⎜ ⎟ ⎜ in [I]sp = ⎜ . ⎟. ⎟ ⎜ ⎠ ⎝ . pn (αn )
(5.69)
(5.70)
Proof. 1. Note that if ψ (z) is a solution of (5.67), then necessarily ψ0 = 1. With J the transposition matrix defined in (8.39), we compute st −1 J[I]co = (q (Szn ))−1 . st = ([I]co J)
However, if we consider the map Szn , then ⎞ ⎛ 1 qn−1 . . . q1 ⎟ ⎜ . .... ⎟ ⎜ ⎟ ⎜ .... ⎟ ⎜ ⎟ = q (Szn ), ⎜ ⎟ ⎜ ... ⎟ ⎜ ⎝ . qn−1 ⎠ 1 and its inverse is given by ψ (Szn ), where ψ (z) solves the Bezout equation (5.67). This completes the proof.
2. Follows from the fact that [I]stco [I]co st = I. 3. The matrix representation for [I]in st has been derived in Corollary 2.36. The matrix representation for [I]co sp follows from (5.68) by applying duality theory, in particular Theorem 4.38. We can also derive this matrix representation directly, which we proceed to do. For this, we define polynomials s1 (z), . . . , sn (z) by si (z) = e1 (z) + αi e2 (z) + · · · + αin−1en (z). We claim that si (z) are eigenfunctions of Sd corresponding to the eigenvalues αi . Indeed, using equation (5.12) and the fact that 0 = q(αi ) = q0 + q1αi + · · · + αin , we have Sq si = −q0 en + αi (e1 − q1 en ) + · · · + αi (en−1 − qn−1 en ) = αi e1 + · · · + αin−1 en−1 − (q0 + · · · + qn−1αin−1 )en = αi e1 + · · · + αin en = αi si (z). This implies that there exist constants γi such that si (z) = γi pi (z). Since both si (z) and ei (z) are obviously monic, it follows that necessarily γi = 1 and si (z) = pi (z). The equations pi (z) = e1 (z) + αi e2 (z) + · · · + αin−1en (z) imply the matrix representation (5.69). Finally, the matrix representation in (5.70) follows from the trivial identities pi (z) = pi (αi )πi (z).
5.7 Universality of Shifts We end this chapter by explaining the reason that the polynomial and rational models we introduced can be used so effectively in linear algebra and its applications. The main reason for this is the universality property of shifts. For our purposes, we shall show that any linear transformation in a finite-dimensional vector space V over the field F is isomorphic to the compression of the forward shift S+ : F[z]n −→ F[z]n , defined in (1.31), to a quotient space, or alternatively, to the restriction of the backward shift S− : z−1 F[[z−1 ]]n −→ z−1 F[[z−1 ]]n , defined in (1.32), to an invariant subspace. For greater generality, we will have to stray away from scalar-valued polynomial and rational functions. Assume we are given a linear transformation A in a finite-dimensional vector space V , over the field F. Without loss of generality, by choosing a matrix representation, we may as well assume that V = Fn and that A is a square matrix. By Fn [z] we denote the space of vector polynomials with coefficients in Fn , whereas F[z]n will denote the space of vectors with polynomial entries. We will use freely the isomorphism between Fn [z] and F[z]n and we will identify the two spaces.
With the linear polynomial matrix zI − A we associate a map π_{zI−A} : F[z]^n −→ F^n, given by

π_{zI−A} ∑_{j=0}^{k} ξ_j z^j = ∑_{j=0}^{k} A^j ξ_j.     (5.71)
The operation defined above can be considered as taking the remainder of a polynomial vector after left division by the polynomial matrix zI − A.
Proposition 5.42. For the map π_{zI−A} defined by (5.71):
1. The map π_{zI−A} is surjective.
2. We have

Ker π_{zI−A} = (zI − A)F[z]^n.     (5.72)

3. For the map S_+ : F[z]^n −→ F[z]^n defined by (S_+ f)(z) = z f(z), the following diagram is commutative:

[commutative diagram: π_{zI−A} maps F[z]^n onto F^n and intertwines S_+ on F[z]^n with A on F^n]     (5.73)

This implies that

Ax = π_{zI−A} z · x.     (5.74)

4. Given a polynomial p(z) ∈ F[z], we have

p(A)x = π_{zI−A} p(z)x.     (5.75)
Proof. 1. For each constant polynomial x ∈ F[z]^n, we have π_{zI−A} x = x. The surjectivity follows.
2. Assume f(z) = (zI − A)g(z) with g(z) = ∑_{i=0}^{k} g_i z^i. Then

f(z) = (zI − A) ∑_{i=0}^{k} g_i z^i = ∑_{i=0}^{k} g_i z^{i+1} − ∑_{i=0}^{k} A g_i z^i.

Therefore

π_{zI−A} f = ∑_{i=0}^{k} A^{i+1} g_i − ∑_{i=0}^{k} A^i A g_i = 0,

i.e., (zI − A)F[z]^n ⊂ Ker π_{zI−A}.
Conversely, assume f(z) = ∑_{i=0}^{k} f_i z^i ∈ Ker π_{zI−A}, that is, ∑_{i=0}^{k} A^i f_i = 0. Recalling that z^i I − A^i = (zI − A) ∑_{j=0}^{i−1} z^{i−1−j} A^j, we compute

f(z) = ∑_{i=0}^{k} f_i z^i = ∑_{i=0}^{k} f_i z^i − ∑_{i=0}^{k} A^i f_i = ∑_{i=0}^{k} (z^i I − A^i) f_i
     = ∑_{i=0}^{k} (zI − A)(∑_{j=0}^{i−1} z^{i−1−j} A^j) f_i = (zI − A) ∑_{i=0}^{k} (∑_{j=0}^{i−1} z^{i−1−j} A^j) f_i = (zI − A)g,

so Ker π_{zI−A} ⊂ (zI − A)F[z]^n; hence the equality (5.72) follows.
3. To prove the commutativity of the diagram, we compute, with f(z) = ∑_{i=0}^{k} f_i z^i,

π_{zI−A} S_+ f = π_{zI−A} z ∑_{i=0}^{k} f_i z^i = π_{zI−A} ∑_{i=0}^{k} f_i z^{i+1} = ∑_{i=0}^{k} A^{i+1} f_i = A ∑_{i=0}^{k} A^i f_i = A π_{zI−A} f.

4. By linearity, it suffices to prove this for polynomials of the form z^k. We do this by induction. For k = 1 this holds by equation (5.74). Assume it holds up to k − 1. Using the fact that z Ker π_{zI−A} = z(zI − A)F[z]^n ⊂ (zI − A)F[z]^n = Ker π_{zI−A}, we compute

π_{zI−A} z^k x = π_{zI−A} z π_{zI−A} z^{k−1} x = π_{zI−A} z A^{k−1} x = A A^{k−1} x = A^k x.
Clearly, the equality f (z) = (zI − A)g(z) + πzI−A f can be interpreted as πzI−A f being the remainder of f (z) after division by zI − A. As a corollary, we obtain the celebrated Cayley–Hamilton theorem. Theorem 5.43 (Cayley–Hamilton). Let A be a linear transformation in an ndimensional vector space U over F and let dA (z) be its characteristic polynomial. Then dA (A) = 0. Proof. By Cramer’s rule, we have dA (z)I = (zI − A)adj(zI − A); hence we have the inclusion dA (z)F[z]n ⊂ (zI − A)F[z]n . This implies, for each x ∈ U , that dA (A)x = πzI−A d(z)x = 0, so dA (A) = 0.
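The map π_{zI−A} and the Cayley–Hamilton argument are easy to test numerically. The sketch below (Python with numpy; the matrix and vector are arbitrary illustrative data) applies (5.71) to d_A(z)x and obtains the zero vector, in line with (5.75).

import numpy as np

A = np.array([[0.0, 1.0], [-2.0, 3.0]])     # illustrative matrix; characteristic polynomial z^2 - 3z + 2
x = np.array([1.0, -1.0])

def pi_zI_minus_A(coeffs):
    # (5.71): send the vector polynomial sum_j xi_j z^j to sum_j A^j xi_j
    result = np.zeros(A.shape[0])
    for j, xi in enumerate(coeffs):
        result += np.linalg.matrix_power(A, j) @ xi
    return result

char_coeffs = np.poly(A)[::-1]                       # coefficients of d_A(z), lowest power first
print(pi_zI_minus_A([c * x for c in char_coeffs]))   # numerically zero: d_A(A)x = 0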
Corollary 5.44. Let A be a linear transformation in an n-dimensional vector space U over F. Then its minimal polynomial mA (z) divides its characteristic polynomial dA (z). Theorem 5.45. Let A be a linear transformation in Fn . Then A is isomorphic to S− restricted to a finite-dimensional S− -invariant subspace of z−1 F[[z−1 ]]n . Specifically, let Φ : Fn −→ z−1 F[[z−1 ]]n be defined by
Φξ = (zI − A)^{−1} ξ,     (5.76)

and let

L = Im Φ.     (5.77)

Then the following diagram is commutative:

[commutative diagram: Φ maps F^n onto Im Φ and intertwines A on F^n with S_−|Im Φ]

which implies the isomorphism

A ≅ S_−|Im Φ.     (5.78)

Proof. Note that the map Φ is injective, hence invertible as a map from F^n onto L = Im Φ = {(zI − A)^{−1}ξ | ξ ∈ F^n}. Since

(zI − A)^{−1}ξ = ∑_{j=0}^{∞} A^j ξ z^{−(j+1)},

L is a subspace of z^{−1}F[[z^{−1}]]^n. Since

S_− Φξ = S_−(zI − A)^{−1}ξ = π_− z(zI − A)^{−1}ξ = π_−(zI − A + A)(zI − A)^{−1}ξ = (zI − A)^{−1}Aξ = ΦAξ,

the S_−-invariance of L follows, as well as the isomorphism (5.78).
Remark 5.46. 1. The underlying idea is that the subspace Im Φ contains all the information about A. The study of L = Im Φ as an F[z]-module is a tool for the study of the transformation A. 2. The subspace L = Im Φ inherits an F[z]-module structure from z−1 F[[z−1 ]]m . With Fn having the F[z]-module structure induced by A, then Φ is clearly an F[z]-module homomorphism.
5.8 Exercises
5.8 Exercises 1. Assume q(z) = zn + qn−1zn−1 + · · · + q0 is a real or complex polynomial. Show that the solutions of the linear, homogeneous differential equation y(n) + qn−1 y(n−1) + · · · + q0 y = 0 form an n-dimensional space. Use the Laplace transform L to show that y is a solution if and only if L (y) ∈ X q . 2. Let T be a cyclic transformation with minimal polynomial mT (z) of degree n, and let p(z) ∈ F[z]. Show that the following statements are equivalent: a. b. c. d.
The operator p(T ) is cyclic. There exists a polynomial q(z) ∈ F[z] such that (q ◦ p)(z) = z mod (mT ). The map Λ p : F[z] −→ XmT defined by Λ p (q) = πmT (q ◦ p) is surjective. j We have det(πk j ) = 0, where πk = πmT (pk ) and πk (z) = ∑n−1 j=0 πk j z .
3. Assume the minimal polynomial of a cyclic operator T factors into linear factors, i.e., mT (z) = ∏(z − λi )νi with the λi distinct. Show that p(T ) is cyclic if and only if λi = λ j implies p(λi ) = p(λ j ) and p (λi ) = 0 whenever νi > 1. 4. Show that if π (z) ∈ F[z] is irreducible and deg p is a prime number, then p(Sπ ) is either scalar or a cyclic transformation. 5. Let q(z) = zn + qn−1 zn−1 + · · ·+ q0 and let Cq be its companion matrix. Let f (z) ∈ Xq with f (z) = f0 + · · · + fn−1 zn−1 . We put ⎛ ⎜ ⎜ ⎜ f=⎜ ⎜ ⎝
f0 . . .
⎞ ⎟ ⎟ ⎟ ⎟ ∈ Fn . ⎟ ⎠
fn−1 Show that f (Cq ) = f,Cq f, . . . ,Cqn−1 f . 6. Given a linear transformation A in V , a vector x ∈ V , and a polynomial p(z), define the Jacobson chain matrix by Cm (p, x, A) = (x, Ax, . . . , Ar−1 x, . . . , p(A)m−1 x, p(A)m−1 Ax, . . . , p(A)m−1 Ar−1 x). Prove the following a. Let E = Cm (p, 1,C pm ) then E is a solution of C pm E = EH(pm ). b. Every solution X of C pm X = XH(pm) is of the form X = f (Cpm )E for some polynomial f (z) of degree < m deg p.
m−1 + · · · + p and q(z) = zn + q n−1 + · · · + q . Let 7. Let p(z) = zm + p m−1 z 0 n−1 z 0 Cp 0 H(p, q) = , where N is the n × m matrix whose only nonzero element
N Cq
is N1m = 1. Show that the general solution to the equation Cpq X = XH(pq) is of the form X = f (Cpq )K, where deg f < deg p + degq and ⎛
1
p0 . . . . . .
⎜ . ⎜ ⎜ . ⎜ ⎜ ⎜ . ⎜ ⎜ . 1 pm−1 ⎜ K=⎜ 1 ⎜ ⎜ . ⎜ ⎜ . ⎜ ⎜ ⎜ . ⎜ ⎝ .
⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ p0 ⎟ ⎟. . ⎟ ⎟ ⎟ . ⎟ ⎟ . ⎟ ⎟ . ⎠ 1
8. Let C be the circulant matrix

C = circ(c_0, . . . , c_{n−1}) =
⎛ c_0      c_{n−1}  · · ·  c_1 ⎞
⎜ c_1      c_0      · · ·  c_2 ⎟
⎜  ·        ·        ·      ·  ⎟
⎝ c_{n−1}  c_{n−2}  · · ·  c_0 ⎠

Show that det(C) = ∏_{i=1}^{n} (c_0 + c_1 ζ_i + · · · + c_{n−1} ζ_i^{n−1}), where 1 = ζ_1, . . . , ζ_n are the distinct nth roots of unity, that is, the zeros of z^n − 1.
9. Prove the following barycentric representation for the minimal-degree polynomial satisfying the interpolation constraints f(λ_i) = a_i, i = 0, . . . , n − 1, namely

f(z) = ( ∑_{j=0}^{n−1} w_j a_j / (z − λ_j) ) / ( ∑_{j=0}^{n−1} w_j / (z − λ_j) ),    z ≠ λ_j, j = 0, . . . , n − 1,

and f(λ_j) = a_j for z = λ_j, j = 0, . . . , n − 1. Here w_i = 1 / Π_{j≠i}(λ_i − λ_j).
5.9 Notes and Remarks
5.9 Notes and Remarks Shift operators are a cornerstone of modern operator theory. As we explained in Section 5.7, their interest lies in their universality properties, an extremely important fact, first pointed out in Rota (1960). This observation is the key insight to the use of shift operators in modeling linear transformations as well as, more generally, linear systems. The fact that any linear transformation T acting in a finite-dimensional vector space is isomorphic to either a compression of S+ or to a restriction of S− has far-reaching implications. This leads to the use of functional models, in our case polynomial and rational models, in the study of linear transformations. The reason for the great effectiveness of the use of functional models can be traced back to the compactness of the polynomial notation as well as that the polynomial setting provides a richer algebraic language, with terms as zeros, factorizations, ideals, homomorphisms, and modules conveniently at hand. Of the two classes of models we employ, the polynomial models emphasize the arithmetic properties of the transformation, whereas the module structure of rational models is completely determined by the geometry of the model space. In other words, in the case of polynomial models, the space is simple but the tranformation is complicated. On the other hand, for rational models, the space is complicated but the transformation, i.e., the restricted backward shift, is simple. The material in this chapter is mostly standard. The definitions given in Equations (5.71) and (5.75) have far reaching implications. They can be generalized, replacing zI − A by an arbitrary nonsingular polynomial matrix, thus leading to the theory of polynomial models, initiated in Fuhrmann (1976). This is the algebraic counterpart of the functional models used so effectively in operator theory, e.g., Sz.-Nagy and Foias (1970). The results and methods presented in this chapter originate in Fuhrmann (1976) and a long series of follow-up articles. In an analytic, infinite-dimensional setting, Nikolskii (1985) is a very detailed study of shift operators. The central results of this chapter are two theorems. Theorem 5.17 characterizes the maps intertwining the shifts of the form Sq , while Theorem 5.18 studies their invertibility properties in terms of factorizations and polynomial coprimeness. These results generalize to the use of shifts of higher multiplicity, a generalization that requires the use of the algebra of polynomial matrices, which is beyond the scope of the present book. The analytic analogues of these result are the commutant lifting theorem, see Sarason (1967) and Sz.-Nagy and Foias (1970), and a corresponding spectral mapping theorem; see Fuhrmann (1968a,b). We shall return to these topics in Chapter 11. The kernel representation (5.37) of rational models has far-reaching generalizations. With the scalar polynomial replaced by a rectangular polynomial matrix, it is the basic characterizations of behaviors in the behavioral approach to linear systems initiated in Willems (1986, 1989, 1991). In this connection, see also Fuhrmann (2002).
Theorem 5.17 can be extended to the characterization of intertwining maps between two polynomial, or rational, models. We will return to this in Chapter 8, where we give a different proof using tensor products and Bezoutians. Theorem 5.18 highlights the importance of coprimenes and the Bezout equation. Since the Bezout equation can be solved using the Euclidean algorithm, this brings up the possibility of a recursive approach to inversion algorithms for structured matrices. This will be further explored in Chapter 8. The Bezout equation reappears, this time over the ring RH∞ + , in Chapters 11 and 12. Companion matrices of the polynomial q(z) appear in Krull’s thesis. The particularly musical notation for the matrices Cq ,Cq , defined in Proposition 5.4, was introduced by Kalman. Section 5.3 on circulant matrices is based on results taken from Davis (1979). The Chinese remainder theorem has its roots in the problem of solving simultaneous congruences arising in modular arithmetic. It appears in a third-century mathematical treatise by Sun Tzu. For a very interesting account of the history of early non-European mathematics, see Joseph (2000). In working over the ring of polynomials F[z] rather than over the integes Z, it can be applied to interpolation problems. The possibility of applying the Chinese remainder theorem to interpolation problems is not mentioned in many of the classic algebra texts such as van der Waerden, Lang, Mac Lane and Birkhoff or even Gantmacher (1959). Of these monographs, van der Waerden (1931) stands out inasmuch as it discusses both the Lagrange and the Newton interpolation formulas. In connection to interpolation, the following quotation, from Schoenberg (1987), may be of interest: “Sometime in the 1950’s the late Hungarian-Swedish mathematican Marcel Riesz visited the University of Pennsylvania and told us informally that the Chinese remainder theorem (1) can be thought of as an analogue of the interpolation by polynomials.” Cayley, in a letter to Sylvester, actually stated a more general version of the Cayley–Hamilton theorem. That version says that if the square matrices A and B commute and f (x, y) = det(xA − yB), then f (B, A) = 0. This was generalized in Livsic (1983).
Chapter 6
Structure Theory of Linear Transformations
6.1 Introduction In this chapter, we study the structure of linear transformations. Our aim is to understand the structure of a linear transformation in terms of its most elementary components. This we do by representing the transformation by its matrix representation with respect to particularly appropriate bases. The ultimate goal, not always achievable, is to represent a linear transformation in diagonal form.
6.2 Cyclic Transformations We begin our study of linear transformations in a finite-dimensional linear space by studying a special subclass, namely the class of cyclic linear transformations, introduced in Definition 5.13. Later on, we will show that every linear transformation is isomorphic to a direct sum of cyclic transformations. Thus the structure theory will be complete. Proposition 6.1. Let A be a cyclic linear transformation in an n-dimensional vector space U . Then its characteristic and minimal polynomials coincide. Proof. We prove the statement by contradiction. Suppose m(z) = zk + mk−1 zk−1 + · · · + m0 is the minimal polynomial of A, and assume k < n. Then Ak = −m0 I − · · · − +mk−1 Ak−1 . This shows that given an arbitrary vector x ∈ U , we have Ak x ∈ span (x, Ax, . . . , Ak−1 x). By induction, it follows that for any integer p we have span (x, Ax, . . . , A p x) ⊂ span (x, Ax, . . . , Ak−1 x). Since dim span (x, Ax, . . . , Ak−1 x) ≤ k < n, it follows that A is not cyclic. Later on, we will prove that the coincidence of the characteristic and minimal polynomials implies cyclicity.
Given a linear transformation T in a vector space U, its commutant is defined as the set C(T) = {X ∈ L(U) | XT = TX}. The next result is a characterization of the commutant of a cyclic transformation.
Proposition 6.2. Let T : U −→ U be a cyclic linear transformation. Then X : U −→ U commutes with T if and only if X = p(T) for some polynomial p(z).
Proof. Let p(z) ∈ F[z]. Then clearly, T p(T) = p(T)T. Conversely, let X commute with T, where T is cyclic. Let x ∈ U be a cyclic vector, i.e., x, Tx, . . . , T^{n−1}x form a basis for U. Thus there exist p_i ∈ F for which

Xx = ∑_{i=0}^{n−1} p_i T^i x.

This implies

XTx = TXx = T ∑_{i=0}^{n−1} p_i T^i x = ∑_{i=0}^{n−1} p_i T^i Tx,

and by induction, XT^k x = ∑_{i=0}^{n−1} p_i T^i T^k x. Since the T^k x span U, this implies X = p(T), where p(z) = ∑_{i=0}^{n−1} p_i z^i.
Next, we show that the class of shift transformations S_m is not that special. In fact, every cyclic linear transformation in a finite-dimensional vector space is isomorphic to a unique transformation of this class.
Theorem 6.3. Let V be a finite-dimensional linear space and let A : V −→ V be a cyclic transformation, with b ∈ V a cyclic vector. We consider V an F[z]-module, with the module structure induced by A. Then
1. Define a map Φ : F[z] −→ V by

Φ ∑_{i=0}^{k} f_i z^i = ∑_{i=0}^{k} f_i A^i b.     (6.1)

Then the following diagram commutes:

[commutative diagram: Φ maps F[z] to V and intertwines S_+ on F[z] with A on V]
i.e., we have
Φ S+ = AΦ ,
(6.2)
which shows that Φ is an F[z]-homomorphism. 2. The map Φ is surjective and Ker Φ is a submodule of F[z], i.e., an S+ -invariant subspace. 3. There exists a unique monic polynomial m(z) ∈ F[z], with degm = dim V , for which Ker Φ = mF[z]. (6.3) 4. Let φ : F[z]/mF[z] −→ V be the map induced by Φ on the quotient module, i.e.,
φ [ f ]m = Φ f .
(6.4)
Let π : F[z] −→ F[z]/mF[z] be the canonical projection map defined by
π f = [ f ]m ,
(6.5)
where [ f ]m = f (z) + m(z)F[z]. Then φ π = Φ , and the map φ : F[z]/mF[z] −→ V is an isomorphism. Identifying the quotient module F[z]/mF[z] with the polynomial model Xm , we have the F[z]-module isomorphism φ : Xm −→ V given by
φ f = Φ f, i.e., φ = ΦXm . 5. We have
f (z) ∈ Xm ,
φ S m = Aφ ,
(6.7)
i.e., the following diagram is commutative:
φ
Xm
-
Sm
V A
? Xm
φ
-
? V
Proof. 1. We compute k
k
k
i=0
i=0
i=0
Φ (z f ) = Φ ∑ fi zi+1 = ∑ fi Ai+1 b = A ∑ fi Ai b = AΦ f . This proves (6.2).
(6.6)
2. Since {b, Ab, . . ., An−1 b} is a basis of V , the map Φ is clearly surjective. We show that Ker Φ is an ideal, or equivalently, a submodule in F[z]. Assume f (z), g(z) ∈ Ker Φ . Then Φ (α f + β g) = αΦ ( f ) + β Φ (g) = 0, so α f + β g ∈ Ker Φ . Similarly, if f ∈ Ker Φ , then
Φ (z f ) = AΦ ( f ) = A0 = 0, which shows that Ker φ is indeed an ideal. In fact, as is easily seen, a non-trivial ideal. That Ker Φ is a submodule of F[z] follows from (6.2). 3. Follows from the fact that F[z] is a principal ideal domain; hence there exists a unique monic polynomial for which Ker Φ = mF[z]. 4. Follows from Proposition 1.55. 5. We compute Aφ f = Aφ πm f = AΦ f = Φ z f = φ πm z f = φ Sm f .
From this theorem it follows that the study of an arbitrary cyclic transformation reduces to the study of the class of model transformations of the form Sm . By Theorem 5.23, the polynomial and rational models that correspond to the same polynomial are isomorphic. This indicates that Theorem 6.3 can be dualized in order to represent linear transformations by S− restricted to finite-dimensional backward-shift-invariant subspaces. This leads to the following direct proof of Rota’s theorem restricted to cyclic transformations. Theorem 6.4. Let A be a cyclic linear transformation in V and let b ∈ V be a cyclic vector for A. Then 1. Let Φ , π and φ be defined by (6.1), (6.5) and (6.4) respectively. Then a. The adjoint map Φ ∗ : V ∗ −→ z−1 F[[z−1 ]] is given by ∞
(b, A∗(i−1)η ) zi i=1
Φ ∗η = ∑
(6.8)
and is injective. b. The adjoint map π ∗ : X m −→ z−1 F[[z−1 ]] is given by
π ∗ h = h,
(6.9)
i.e., it is the injection of X m into F[[z−1 ]]. c. The adjoint map φ ∗ : V ∗ −→ X m is given, for η ∈ V ∗ , by ∞
(b, A∗(i−1) η ) . zi i=1
φ ∗η = ∑
(6.10)
d. The dual of the diagram in Theorem 6.3 is given by
Φ∗
z−1 F[[z−1 ]]
V∗
6
6 A∗
S−
Φ∗
z−1 F[[z−1 ]]
V∗
i.e., we have
Φ ∗ A∗ = S − Φ ∗ .
(6.11)
e. Im Φ ∗ is a finite-dimensional, S− -invariant subspace of z−1 F[[z−1 ]]. f. We have the following isomorphism: S − φ ∗ = φ ∗ A∗ .
(6.12)
2. Let V be a finite-dimensional linear space and let A : V −→ V be a cyclic transformation, with b ∈ V a cyclic vector. Then A is isomorphic to S− restricted to a finite-dimensional S− -invariant subspace of z−1 F[[z−1 ]]. Specifically, let Ψ : V −→ z−1 F[[z−1 ]] be defined by ∞
(b, Ai−1 ξ ) . zi i=1
Ψξ = ∑
(6.13)
Then the following diagram is commutative:
Ψ
V
- ImΨ
S− |Im Ψ
A ? V
Ψ
? - ImΨ
This implies the isomorphism A S− |Im Ψ .
(6.14)
Proof. 1. a. We compute (Φ f , η ) = (Φ ∑ki=0 fi zi , η ) = (∑ki=0 fi Ai b, η ) = ∑ki=0 fi (Ai b, η ) = ∑ki=0 fi (b, (A∗ )i η ) = [ f , Φ ∗ η ], which implies (6.8). To show injectivity of Φ ∗ , assume η ∈ Ker Φ ∗ . This means that for all i ≥ 0, we have (b, (A∗ )i η ) = (Ai b, η ) = 0. Since the Ai b span V , necessarily η = 0, which proves injectivity. b. We have made the identification (F[z]/mF[z])∗ = (mF[z])⊥ = X m . For f ∈ F[z] and h ∈ X m , we compute [π f , h] = [[ f ]m , h] = [ f , h] = [ f , π ∗ h], from which (6.9) follows. c. For f ∈ F[z] and η ∈ V ∗ , we compute (φ [ f ]m , η ) = (Φ f , η ) = ( f , Φ ∗ ) = ([ f ]m , φ ∗ ), i.e., φ ∗ = Φ ∗ . Using (6.8), (6.10) follows. d. Follows by dualizing (6.7) and using the representation (6.8). e. Follows from the fact that V ∗ is finite-dimensional. f. Follows from (6.11) and the equality φ ∗ = Φ ∗ . 2. Follows from the representation of the adjoints, computed in the previous part, taken together with F[z]∗ z−1 F[[z−1 ]]. Identity (6.11) is obtained by dualizing (6.2).
6.2.1 Canonical Forms for Cyclic Transformations In this section we exhibit canonical forms for cyclic transformations. Theorem 6.5. Let p(z) ∈ F[z] and let p(z) = p1 (z)m1 · · · pk (z)mk be the primary decomposition of p(z). We define
πi (z) = ∏ p j (z)m j .
(6.15)
j =i
Then we have the direct sum decomposition X p = π1 (z)X pm1 ⊕ · · · ⊕ πk (z)X pmk 1
k
(6.16)
and the associated spectral representation S p S p m1 ⊕ · · · ⊕ S p mk . 1
(6.17)
k
Proof. The direct sum decomposition (6.16) was proved in Theorem 2.23. We define a map Z : X pm1 ⊕ · · · ⊕ X pmk −→ X p = π1 (z)X pm1 ⊕ · · · ⊕ πk (z)X pmk 1
1
k
k
by k
Z( f 1 , . . . , fk ) = ∑ πi fi .
(6.18)
i=1
Clearly, Z is invertible, and it is easily checked that the following diagram is commutative: X p m1 ⊕ · · · ⊕ X p mk 1
Z
k
- Xp
S p m1 ⊕ · · · ⊕ S p mk 1
Sp
k
? X p m1 ⊕ · · · ⊕ X p mk 1
Z
-
k
which shows that Z is an F[z]-isomorphism.
? Xp
Thus, by choosing appropriate bases in the spaces X pmi , we get a block matrix i representation for S p . It suffices to consider the matrix representation of a typical operator S pm , where p(z) is an irreducible polynomial. This is the content of the next proposition. Proposition 6.6. Let p(z) = zr + pr−1 zr−1 + · · · + p0 . Then 1. dim X pm = mr.
(6.19)
2. The set of polynomials B jo = {1, z, . . . , zr−1 , p, zp, . . . , zr−1 p, . . . , pm−1 , zpm−1 , . . . , zr−1 pm−1 } (6.20) forms a basis for X pm . We call this the Jordan basis.
3. The matrix representation of S pm with respect to this basis is of the form ⎛
⎞ Cp ⎜ N C ⎟ p ⎜ ⎟ ⎜ ⎟ N . ⎜ ⎟ jo [S pm ] jo = ⎜ ⎟, ⎜ ⎟ .. ⎜ ⎟ ⎝ ⎠ . . N Cp
(6.21)
where Cp is the companion matrix of p(z), defined by (5.9), and ⎞ 0...01 ⎜. 0⎟ ⎟ ⎜ ⎟ ⎜ .⎟ ⎜. N=⎜ ⎟. ⎜. .⎟ ⎟ ⎜ ⎝. .⎠ 0... . 0 ⎛
(6.22)
We refer to (6.21) as the Jordan canonical form of S pm . Proof. 1. We have dim X pm = deg pm = m deg p = mr. 2. In B there is one polynomial of each degree and mr polynomials altogether, so necessarily they form a basis. 3. We prove this by simple computations. Notice that for j < m − 1, S pm zr−1 p j = π pm z · zr−1 p j = π pm zr p j = π pm (zr + pr−1zr−1 + · · · + p0 − pr−1 zr−1 − · · · − p0 )p j i j = π pm p j+1 − ∑r−1 i=0 pi (z p ). On the other hand, for j = m − 1, we have S pm zr−1 pm−1 = π pm zr pm−1 = π pm (zr + pr−1zr−1 + · · · + p0 − pr−1 zr−1 − · · · − p0 )pm−1 i m−1 = π pm pm − ∑r−1 ) i=0 pi (z p r−1 i m−1 ). = − ∑i=0 pi (z p It is of interest to compute the dual basis to the Jordan basis. This leads us to the Jordan form in the block upper triangular form. Proposition 6.7. Given the monic polynomial p(z) of degree r, let {e1 (z), . . . , er (z)} be the control basis of X p . Then
1. The basis of X pm dual to the Jordan basis of (6.20) is given by jo = {pm−1 (z)e1 (z), . . . , pm−1 (z)er (z), . . . , e1 (z), . . . , er (z)}. B
(6.23)
Here e1 (z), . . . , er (z) are the elements of the control basis of X p . We call this basis the dual Jordan basis. 2. The matrix representation of Sqm with respect to the dual Jordan basis is given by ⎞ Cp N˜ ⎟ ⎜ C . ⎟ ⎜ p ⎟ ⎜ . . ⎟ ⎜ [Sqm ] jo = ⎟. ⎜ jo ⎟ ⎜ .. ⎟ ⎜ ˜ ⎝ . N⎠ Cp ⎛
(6.24)
Note that Cp is defined in (5.10). 3. The change of basis transformation from the dual Jordan basis to the Jordan basis is given by ⎛ ⎞ K ⎜ . ⎟ ⎜ ⎟ ⎜ ⎟ jo [I] = ⎜ . ⎟, jo ⎜ ⎟ ⎝ . ⎠ K ⎞ ⎛ p1 . . pr−1 1 ⎟ ⎜ . . ⎟ ⎜ ⎟ ⎜ st where K = [I]co = ⎜ . . ⎟. ⎟ ⎜ ⎠ ⎝ pr−1 . 1 4. The Jordan matrices in (6.21) and (6.24) are similar. A similarity is given by R = [I] jo . jo
Proof. 1. This is an extension of Proposition 5.40. We compute, with 0 ≤ α , β < m and i, j < r, pα ei , pβ z j−1 = [p−m pα π+ z−i p, pβ z j−1 ] = [π+ z−i p, pα +β −m z j−1 ]. The last expression is clearly zero whenever α + β = m−1. When α + β = m−1, we have [π+ z−i p, pα +β −m z j−1 ] = [π+ z−i p, p−1 z j−1 ] = [z−i p, p−1 z j−1 ] = [z j−i−1 , 1] = δi j .
2. Follows either by direct computation or the fact that Sq∗ = Sq coupled with an application of Theorem 4.38. 3. This results from a simple computation.
jo jo jo 4. From S pm I = IS pm we get [S pm ] jo jo [I] = [I] [S pm ] . jo
jo
jo
Note that if the polynomial p(z) is monic of degree 1, i.e., p(z) = z − α , then with respect to the basis B = {1, z − α , . . . , (z − α )n−1 } the matrix representation has the following form: ⎞ α ⎟ ⎜1 α ⎟ ⎜ ⎟ ⎜ 1 . ⎟ ⎜ [S(z−α )n ]B ⎟. B =⎜ ⎟ ⎜ .. ⎟ ⎜ ⎝ . . ⎠ 1α ⎛
If our field F is algebraically closed, as is the field of complex numbers, then every irreducible polynomial is of degree 1. In case we work over the real field, the irreducible monic polynomials are either linear, or of the form (z − α )2 + β 2 , with α , β ∈ R. In this case, we make a variation on the canonical form obtained in Proposition 6.6. This leads to the real Jordan form. Proposition 6.8. Let p(z) be the real polynomial p(z) = (z − α )2 + β 2 . Then 1. We have dim X pn = 2n.
(6.25)
2. The set of polynomials B = {β , z − α , β p(z), (z − α )p(z), . . . , β pn−1 (z), (z − α )pn−1 (z)} forms a basis for X pn . 3. The matrix representation of S pn with respect to this basis is given by ⎞ A ⎟ ⎜N A ⎟ ⎜ ⎟ ⎜ N. ⎟ ⎜ jo [S pn ] jo = ⎜ ⎟, ⎟ ⎜ .. ⎟ ⎜ ⎝ . . ⎠ NA ⎛
where
A=
α −β β α
,
N=
0 β −1 0 0
(6.26)
.
Proof. The proof is analogous to that of Proposition 6.6 and is omitted.
6.3 The Invariant Factor Algorithm
6.3 The Invariant Factor Algorithm The invariant factor algorithm is the tool that allows us to move from the special case of cyclic operators to the general case. We begin by introducing an equivalence relation in the space F[z]m×n , i.e., the space of all m × n matrices over the ring F[z]. We will use the identification of F[z]m×n with Fm×n [z], the space of matrix polynomials with coefficients in Fm×n . Thus given a matrix A ∈ Fm×m , then zI − A is a matrix polynomial of degree one, but usually we will consider it also as a polynomial matrix. Definition 6.9. A polynomial matrix U(z) ∈ F[z]m×m is called unimodular if it is invertible in F[z]m×m , that is, there exists a matrix V (z) ∈ F[z]m×m for which U(z)V (z) = V (z)U(z) = I.
(6.27)
Lemma 6.10. U(z) ∈ F[z]m×m is unimodular if and only if detU(z) = α ∈ F and α = 0. Proof. Assume U(z) is unimodular. Then there exists V (z) ∈ F[z]m×m for which U(z)V (z) = V (z)U(z) = I. By the multiplicative rule of determinants we have detU(z) detV (z) = 1. This implies that both detU(z) and detV (z) are nonzero scalars. Conversely, if detU(z) is a nonzero scalar, then, using (3.9), we conclude that U(z)−1 is a polynomial matrix. Definition 6.11. Two matrices A, B ∈ F[z]m×n are called equivalent if there exist unimodular matrices U(z) ∈ F[z]m×m and V (z) ∈ F[z]n×n such that U(z)A(z) = B(z)V (z) = I. It is easily checked that this definition indeed yields an equivalence relation. The next theorem leads to a canonical form for this equivalence. However, we find it convenient to prove first the following lemma. Lemma 6.12. Let A(z) ∈ F[z]m×n . Then A(z) is equivalent to a block matrix of the form ⎛ ⎞ a(z) 0 . . . 0 ⎜ . ⎟ ⎜ ⎟ ⎜ ⎟ B(z) ⎜ . ⎟, ⎜ ⎟ ⎝ . ⎠ 0 with a(z) | bi j (z). Proof. If A(z) is the zero matrix, there is nothing to prove. Otherwise, interchanging rows and columns as necessary, we can bring a nonzero entry of least degree
to the upper left-hand corner. We use the division rule of polynomials to write each element of the first row in the form a1 j (z) = c1 j (z)a11 (z) + a1 j (z), with deg a1 j < deg a11 . Next, we subtract the first column, multiplied by c1 j (z), from the jth column. We repeat the process with the first column. If all the remainders are zero we check whether a11 (z) divides all other entries in the matrix. If it does, we are through. Otherwise, we bring, using column and row interchanges, a lowest-degree nonzero element to the upper left-hand corner and repeat the process. Theorem 6.13 (The invariant factor algorithm). Let A(z) ∈ F[z]m×n . Then 1. A(z) is equivalent to a diagonal polynomial matrix with the diagonal elements di (z) monic polynomials satisfying di (z) | di−1 (z). 2. The polynomials di (z) are uniquely determined and are called the invariant factors of A(z). Proof. We use the previous lemma inductively. Thus, by elementary operations, A is reducible to diagonal form with the diagonal entries di (z). If d1 (z) does not divide di (z), then we add the ith row to the first. Then, by an elementary column operation, reduce di (z) modulo d1 (z) and reapply Lemma 6.12. We repeat the process as needed, till d1 (z) | di (z) for i > 1. We proceed by induction to get di (z) | di+1 (z). This is the opposite ordering from that in the statement of the theorem. We can get the right ordering by column and row interchanges. The polynomial matrix D(z) having nontrivial invariant factors on the diagonal will be called the Smith form of A(z). Proposition 6.14. A polynomial matrix U(z) is unimodular if and only if it is the product of a finite number of elementary unimodular matrices. Proof. The product of unimodular matrices is, by the multiplicative rule for determinants, also unimodular. Conversely, assume U(z) is unimodular. By elementary row and column operations, it can be brought to its Smith form D(z), with the polynomials di (z) on the diagonal. Since D(z) is also unimodular, D(z) = I. Let now Ui (z) and V j (z) be the elementary unimodular matrices representing the elementary operations in the diagonalization process. Thus Uk (z) · · ·U1 (z)U(z)V1 (z) · · ·Vl (z) = I, which implies U(z) = U1 (z)−1 · · ·Uk (z)−1Vl (z)−1 · · ·V1 (z)−1 . This is a representation of U(z) as the product of elementary unimodular matrices. To prove the uniqueness of the invariant factors, we introduce the determinantal divisors. Definition 6.15. Let A(z) ∈ F[z]m×n . Then we define D0 (A) = 1 and Dk (A) to be the g.c.d., taken to be monic, of all k × k minors of A(z). The Di (A) are called the determinantal divisors of A(z). Proposition 6.16. Let A(z), B(z) ∈ F[z]m×n . If B(z) is obtained from A(z) by a series of elementary row or column operations, then Dk (A) = Dk (B).
6.4 Noncyclic Transformations
Proof. We consider the set of all k × k minors of A(z). We consider the effect of applying elementary transformations on A(z) on this set. Multiplication of a row or column by a nonzero scalar α can change a minor by at most a factor α . This does not change the g.c.d. Interchanging two rows or columns leaves the set of minors unchanged, except for a possible sign. The only elementary operation that has a nontrivial effect is adding the jth row, multiplied by a polynomial p(z), to the ith row. In a minor that does not contain the ith row, or contains elements of both the ith and the jth rows, this has no effect. If a minor contains elements of the ith row but not of the jth, we expand by the ith row. We get the original minor added to the product of another minor by the polynomial p(z). Again, this does not change the g.c.d. Theorem 6.17. Let A(z), B(z) ∈ F[z]m×n . Then A(z) and B(z) are equivalent if and only if Dk (A) = Dk (B). Proof. It is clear from the previous proposition that if a single elementary operation does not change the set of elementary divisors, then also a finite sequence of elementary operation leaves the determinantal divisors unchanged. Since any unimodular matrix is the product of elementary unimodular matrices, we conclude that the equivalence of A(z) and B(z) implies Dk (A) = Dk (B). Assume now that Dk (A) = Dk (B) for all k. Using the invariant factor algorithm, we reduce A(z) and B(z) to their Smith form with the invariant factors d1 (z), . . . , dr (z), ordered by di−1 (z) | di (z), and e1 (z), . . . , es (z), similarly ordered, respectively. Clearly Dk (A) = d1 (z) · · · dk (z) and Dk (B) = e1 (z) · · · ek (z). This implies r = s and, assuming the invariant factors are all monic, ei (z) = di (z). By transitivity of equivalence we get A(z) B(z). Corollary 6.18. The invariant factors of A(z) ∈ F[z]m×n are uniquely determined. We can view the invariant factor algorithm as a far-reaching generalization of the Euclidean algorithm, for the Euclidean algorithm is the invariant factor algorithm as applied to a polynomial matrix of the form p(z) q(z) .
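Definition 6.15 and Theorem 6.17 give a direct way of computing invariant factors without carrying out the reduction: take quotients of successive determinantal divisors. The sketch below (Python with sympy; the matrix A is an illustrative non-cyclic example of ours) does this for zI − A.

import sympy as sp
from itertools import combinations

z = sp.symbols('z')

def determinantal_divisors(M):
    # D_k = monic g.c.d. of all k x k minors, with D_0 = 1 (Definition 6.15)
    rows, cols = M.shape
    D = [sp.Integer(1)]
    for k in range(1, min(rows, cols) + 1):
        minors = [M[list(r), list(c)].det()
                  for r in combinations(range(rows), k)
                  for c in combinations(range(cols), k)]
        g = sp.gcd_list([sp.expand(m) for m in minors])
        if g == 0:
            break
        D.append(sp.monic(g, z) if sp.degree(g, z) > 0 else sp.Integer(1))
    return D

A = sp.Matrix([[2, 0, 0], [1, 2, 0], [0, 0, 2]])
D = determinantal_divisors(z * sp.eye(3) - A)
print([sp.factor(sp.cancel(D[k] / D[k - 1])) for k in range(1, len(D))])
# quotients D_k / D_{k-1}: here 1, z - 2, (z - 2)**2, the invariant factors of zI - A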
6.4 Noncyclic Transformations We continue now with the study of general, i.e., not necessarily cyclic, transformations in finite-dimensional vector spaces. Our aim is to reduce their study to that of the cyclic ones, and we will do so using the invariant factor algorithm. However, to use it effectively, we will have to extend the modular polynomial arithmetic to the case of vector polynomials. We will not treat the case of remainders with respect to general nonsingular polynomial matrices, since for our purposes, it will suffice to consider two special cases. The first case is the operator πzI−A , defined in (5.71)
148
6 Structure Theory of Linear Transformations
and discussed in Proposition 5.42. The second case is that of nonsingular diagonal polynomial matrices. There is a projection map, similar to the projection πd , defined in (5.2), that we shall need in the following, and which we proceed to introduce. Assume D(z) = ⎛ ⎞ ⎝
d1 (z)
..
⎠, with di (z) nonzero polynomials. Given a vector polynomial
.
⎛
d (z)
n ⎞ f1 (z) ⎜ ⎟ f (z) = ⎝ ... ⎠, we define
fn (z) ⎞ π d1 f 1 ⎟ ⎜ πD f = ⎝ ... ⎠ , ⎛
(6.28)
π dn f n where πdi fi are the remainders of fi (z) after division by di (z). Proposition 6.19. Given a nonsingular diagonal polynomial matrix D(z), then 1. We have 2.
Ker πD = DFn [z]. ⎧⎛ ⎪ ⎨ ⎜ XD = Im πD = ⎝ ⎪ ⎩
(6.29)
⎫ ⎞ f1 ⎪ ⎬ .. ⎟ | f (z) ∈ Im π di . . ⎠ i ⎪ ⎭ fn
(6.30)
In particular, XD Xd1 ⊕ · · · ⊕ Xdn . 3. Defining the map SD : XD −→ XD by SD f = πD z f for f (z) ∈ XD , we have SD S d 1 ⊕ · · · ⊕ S dn . Proof. 1. Clearly, f (z) ∈ Ker πD if and only if for all i, fi (z) is divisible by di (z). Thus ⎧⎛ ⎫ ⎞ ⎪ ⎪ ⎨ d1 f 1 ⎬ ⎜ .. ⎟ Ker πD = ⎝ . ⎠ | fi (z) ∈ F[z] = DFn [z]. ⎪ ⎪ ⎩ ⎭ dn f n 2. Follows from the definition of πD . 3. We compute ⎛ ⎞ f1 ⎜ ⎜ ⎟ SD ⎝ ... ⎠ = πD z ⎝ ⎛
fn
⎞ ⎛ ⎞ ⎛ ⎞ π d1 f 1 f1 S d1 f 1 .. ⎟ = ⎜ .. ⎟ = ⎜ .. ⎟ . . ⎠ ⎝ . ⎠ ⎝ . ⎠ fn
π dn f n
S dn f n
6.4 Noncyclic Transformations
149
With this background material, we can proceed to the structure theorem we have been after. It shows that an arbitrary linear transformation is isomorphic to a direct sum of cyclic transformations corresponding to the invariant factors of A. Theorem 6.20. Let A be a linear transformation in Fn . Let d1 (z), . . . , dn (z) be the invariant factors of zI − A. Then A is isomorphic to Sd1 ⊕ · · · ⊕ Sdn . Proof. Since d1 (z), . . . , dn (z) are the invariant factors of zI − A, there exist unimodular polynomial matrices U(z) and V (z) satisfying
This equation implies
U(z)(zI − A) = D(z)V (z).
(6.31)
UKer πzI−A ⊂ Ker πD .
(6.32)
Since U(z) and V (z) are invertible as polynomial matrices, we have also
which implies
U(z)−1 D(z) = (zI − A)V (z)−1 ,
(6.33)
U −1 Ker πD ⊂ Ker πzI−A .
(6.34)
We define now two maps, Φ : Fn −→ XD and Ψ : XD −→ Fn , by
Φ x = πDUx, and
x ∈ Fn ,
Ψ f = πzI−AU −1 f ,
(6.35)
f (z) ∈ XD .
(6.36)
We claim now that Φ is invertible and its inverse is Ψ . Indeed, using (6.34)-(6.36), we compute
Ψ Φ x = πzI−AU −1 πDUx = πzI−AU −1Ux = πzI−A x = x. Similarly, using (6.32), we have, for f ∈ XD ,
ΦΨ f = πDU πzI−AU −1 f = πDUU −1 f = πD f = f . Next, we show that the following diagram is commutative: A - n n F F
Φ
Φ ? XD
SD
-
? XD
150
6 Structure Theory of Linear Transformations
Indeed, for x ∈ Fn , using (6.32) and the fact that zKer πD ⊂ Ker πD , we have SD Φ x = πD z · πDUx = πD zUx = πDUzx = πDU πzI−A zx = πDU(Ax) = Φ Ax. Since Φ is an isomorphism, we have A SD . But SD Sd1 ⊕ · · · ⊕ Sdn , and the result follows. Corollary 6.21. Let A be a linear transformation in an n-dimensional vector space V . Let d1 (z), . . . , dn (z) be the invariant factors of A. Then A is similar to the diagonal block matrix composed of the companion matrices corresponding to the invariant factors, i.e., ⎛ ⎞ Cd1 ⎜ ⎟ . ⎜ ⎟ ⎜ ⎟ A⎜ (6.37) ⎟. . ⎜ ⎟ ⎝ ⎠ . Cdn Proof. By Theorem 6.20, A is isomorphic to Sd1 ⊕ · · · ⊕ Sdn acting in Xd1 ⊕ · · · ⊕ Xdn . Choosing the standard basis in each Xdi and taking the union of these basis elements, we have [Sd1 ⊕ · · · ⊕ Sdn ]stst = diag ([Sd1 ]stst , . . . , [Sdn ]stst ). We conclude by recalling that [Sdi ]stst = Cd i .
Theorem 6.22. Let A and B be linear transformations in n-dimensional vector spaces U and V respectively. Then the following statements are equivalent. 1. A and B are similar. 2. The polynomial matrices zI − A and zI − B are equivalent. 3. The invariant factors of A and B coincide. Proof. 1. Assume A and B are similar. Let X be an invertible matrix for which XA = BX. This implies X (zI − A) = (zI − B)X and so the equivalence of zI − A and zI − B. 2. The equivalence of the polynomial matrices zI − A and zI − B shows that their invariant factors coincide. 3. Assuming the invariant factors coincide, both A and B are similar to Sd1 ⊕ · · · ⊕ Sdn , and the claim follows from the transitivity of similarity. Corollary 6.23. Let A be a linear transformation in an n-dimensional vector space U over F. Let d1 (z), . . . , dn (z) be the invariant factors of A, ordered so that di | di−1 . Then
6.5 Diagonalization
151
1. The minimal polynomial of A, mA (z), satisfies mA (z) = d1 (z). 2. dA (z), the characteristic polynomial of A, satisfies n
dA (z) = ∏ di (z).
(6.38)
i=1
Proof. 1. Since A Sd1 ⊕ · · · ⊕ Sdn , for an arbitrary polynomial p(z) we have p(A) p(Sd1 ) ⊕ · · · ⊕ p(Sdn ). So p(A) = 0 if and only if p(Sdi ) = 0 for all i, or equivalently, di | p. In particular, d1 | p. But d1 (z) is the minimal polynomial of Sd1 , so it is also the minimal polynomial of Sd1 ⊕ · · · ⊕ Sdn and, by similarity, also of A. 2. Using Proposition 6.20 and the matrix representation (6.37), (6.38) follows. Next we give a characterization of cyclicity in terms of the characteristic and minimal polynomials. Proposition 6.24. A transformation T in U is cyclic if and only if its characteristic and minimal polynomials coincide. Proof. It is obvious that if T is cyclic, its characteristic and minimal polynomials have to coincide. Conversely, if the characteristic and minimal polynomials coincide, then the only nontrivial invariant factor d(z) is equal to the characteristic polynomial. By Theorem 6.20 we have T Sd , and since Sd is cyclic, so is T .
6.5 Diagonalization The maximal simplification one can hope to obtain for the structure of an arbitrary linear transformation in a finite-dimensional vector space is diagonalizability. Unhappily, that is too much to ask. The following theorem gives a complete characterization of diagonalizability. Theorem 6.25. Let T be a linear transformation in a finite-dimensional vector space U . Then T is diagonalizable if and only if mT (z), the minimal polynomial of T , splits into distinct linear factors. Proof. This follows from Proposition 5.15, taken in conjunction with Theorem 6.20. However, we find it of interest to give a direct proof. That the existence of a diagonal representation implies that mT (z), the minimal polynomial of T , splits into distinct linear factors is obvious.
152
6 Structure Theory of Linear Transformations
Assume now that the minimal polynomial splits into distinct linear factors. So k (z − λ ), where the λ are distinct. Let π (z) be the corresponding mT (z) = Πi=1 i i i Lagrange interpolation polynomials. Clearly (z − λi )πi (z) = mT (z). Since for i = j, (z − λi ) | π j (z), we get mT (z) | πi (z)π j (z). (6.39) Similarly, the polynomial πi (z)2 − πi (z) vanishes at all the points λi , i = 1, . . . , k. So mT (z) | πi (z)2 − πi (z).
(6.40)
The two division relations imply now
πi (T )π j (T ) = 0, and
i = j,
πi (T )2 = πi (T ).
(6.41)
(6.42)
It follows that the π (T ) are projections on independent subspaces. We set Ui = Im πi (T ). Then, since the equality 1 = ∑ki=1 πi (z) implies I = ∑ki=1 πi (T ), we get the direct sum decomposition U = U1 ⊕ · · · ⊕ Uk . Moreover, since T πi (T ) = πi (T )T , the subspaces Ui are T -invariant. From the equality (z − λi )πi (z) = mT (z) we get (T − λi I)πi (T ) = 0, or Ui ⊂ Ker (λi I − T ). This can be restated as T |Ui = λi IUi . Choosing bases in the subspaces Ui and taking their union leads to a basis of U made up of eigenvectors. With respect to this basis, T has a diagonal matrix representation. The following is the basic structure theorem, yielding a spectral decomposition for a linear transformation. The result follows from our previous considerations and can be read off from the Jordan form. Still, we find it of interest to give also another proof. Theorem 6.26. Let T be a linear transformation in a finite-dimensional vector space U . Let mT (z) be the minimal polynomial of T and let mT (z) = p1 (z)ν1 · · · pk (z)νk be the primary decomposition of mT (z). We set Ui = Ker pi (T )νi , i = 1, . . . , k, and define polynomials by πi (z) = Π j =i pνi i (z). Then 1. There exists a unique representation 1=
k
∑ π j (z)a j (z),
(6.43)
j=1
with a j ∈ X 2. We have
νj
pj
. mT | πi π j ,
for i = j,
(6.44)
and mT | πi2 a2i − πi ai .
(6.45)
6.5 Diagonalization
153
3. The operators Pi = πi (T )ai (T ), i = 1, . . . , k, are projection operators satisfying Pi Pj I T Pi Im Pi
= δi j Pj , = ∑ki=1 Pi , = Pi T Pi , = Ker pνi i (T ).
4. Define Ui = Im Pi . Then U = U1 ⊕ · · · ⊕ Uk . 5. The subspaces Ui are T -invariant. 6. Let Ti = T |Ui . Then the minimal polynomial of Ti is pi (z)νi . Proof. 1. Clearly, the πi (z) are relatively prime and pi (z)νi πi (z) = mT (z). By the Chinese remainder theorem, there exists a unique representation of the form 1 = ∑kj=1 π j (z)a j (z), with a j (z) ∈ X ν j . The a j (z) can be computed as follows. Since pj
for i = j we have π pνi π j f = 0, we have i
1 = π pνi i
k
∑ π j a j = π piνi πi ai = πi (S piνi )ai ,
j=1
and hence ai = πi (S pνi )−1 1. That the operator πi (S pνi ) : X pνi −→ X pνi is invertible i
i
i
i
follows, by Theorem 5.18, from the coprimeness of πi (z) and pνi i (z). 2. To see (6.44), note that if j = i, pνi i | π j , and hence mT (z) is a factor of πi (z)π j (z). Naturally, we get also mT | πi ai π j a j . To see (6.45), it suffices to show that for all j, pνi i | π 2j a2j − π j a j , or equivalently, that π 2j a2j and π j a j have the same remainder after division by pνi i . For j = i this is trivial, since pνi i |π j . So we assume j = i and compute
π pνi (πi2 a2i ) = π pνi (πi ai )π pνi πi ai = 1, i
3.
4. 5. 6.
i
i
for π pνi πi ai = πi (S pνi )ai = 1. i i Properties (6.44) and (6.45) have now the following consequences. For i = j we have πi (T )ai (T )π j (T )a j (T ) = 0, and (πi (T )ai (T ))2 = πi (T )ai (T ). This implies that Pi = πi (T )ai (T ) are commuting projections on independent subspaces. Set now Ui = Im Pi = Im πi (T )ai (T ). Equation (6.43) implies I = ∑ki=1 Pi , and hence we get the direct sum decomposition U = U1 ⊕ · · · ⊕ Uk . Since Pi commutes with T , the subspaces Ui actually reduce T . Let Ti = T |Ui . Since pνi i πi = mT , it follows that for all x ∈ U , pνi i (T )Pi x = pνi i (T )πi (T )ai (T )x = 0, i.e., the minimal polynomial of Ti divides pνi i . To see the converse, let q(z) be any polynomial such that q(Ti ) = 0. Then q(T )πi (T ) = 0. It follows that q(z)πi (z) is divisible by the minimal polynomial of T . But that means that pνi i πi | qπi or pνi i | q. We conclude that the minimal polynomial of Ti is pi (z)νi .
154
6 Structure Theory of Linear Transformations
6.6 Exercises 1. Let T (z) be an n × n nonsingular polynomial matrix. Show that there exists a unimodular matrix P(z) such that P(z)T (z) has the upper triangular form ⎛
t11 (z) ⎜ 0 ⎜ ⎜ ⎜ . ⎜ ⎝ . 0
⎞ . . . t1n (z) .. . . ⎟ ⎟ ⎟ .. . . ⎟ ⎟ .. . . ⎠ . . 0 tnn (z)
with tii (z) = 0 and deg(t ji ) < deg(tii ). 2. Let T be a linear operator on F2 . Prove that any nonzero vector that is not an eigenvector of T is a cyclic vector for T . Prove that T is either cyclic or a multiple of the identity. 3. Let T be a diagonalizable operator in an n-dimensional vector space. Show that if T has a cyclic vector, then T has n distinct eigenvalues. If T has n distinct eigenvalues, and if x1 , . . . , xn is a basis of eigenvectors, show that x = x1 + · · · + xn is a cyclic vector for T . 4. Let T be a linear transformation in the finite-dimensional vector space V . Prove that ImT has a complementary invariant subspace if and only if Im T and Ker T are independent subspaces. Show that if Im T and Ker T are independent subspaces, then Ker T is the unique complementary invariant subspace for Im T . 5. How many possible Jordan forms are there for a 6 × 6 complex matrix with characteristic polynomial (z + 2)4(z − 1)2 . 6. Given a linear transformation A in an n-dimensional space V , let d(z) and m(z) be its characteristic and minimal polynomials respectively. Show that d(z) divides mn (z). 7. Let p(z) = p0 + p1 z + · · · + pk−1 zk−1 . Solve the equation
πzk qp = 1. 8. Let A =
a c bd
be a complex matrix. Apply the Gram-Schmidt process to the
columns of A. Show that this results in a basis for C2 if and only if detA = 0. 9. Let A be an n×n matrix over the field F. We say that A is diagonalizable if there exists a matrix representation of A that is diagonal. Show that diagonalizable and cyclic are independent concepts, i.e., show the existence of matrices that are DC, DC, DC, DC. (Here DC means nondiagonalizable and cyclic.) 10. Let A be an n × n Jordan block matrix, i.e.,
6.6 Exercises
155
⎛
⎞ λ 1 ⎜ ⎟ ⎜ . . ⎟ ⎜ ⎟ A=⎜ . . ⎟, ⎜ ⎟ ⎝ . 1⎠ λ and
⎛
⎞ 01 ⎜ . . ⎟ ⎜ ⎟ ⎜ ⎟ N=⎜ . . ⎟, ⎜ ⎟ ⎝ . 1⎠ 0
i.e., A = λ I + N. Let p be a polynomial of degree r. Show that p(A) =
n−1
1
∑ k! p(k) (λ )N k .
k=0
11. Show that if A is diagonalizable and p(z) ∈ F[z], then p(A) is diagonalizable. 12. Let p(z) ∈ F[z] be of degree n. a. Show that the dimension of the smallest invariant subspace of X p that contains f (z) is n − r, where r = deg( f ∧ p). b. Conclude that f (z) is a cyclic vector of S p (z) if and only if f (z) and p(z) are coprime. 13. Let A : V −→ V and A1 : V1 −→ V1 be linear transformations with minimal polynomials m and m1 respectively. a. Show that if there exists a surjective (onto) map Z : V −→ V1 satisfying ZA = A1 Z, then m1 | m. (Is the converse true? Under what conditions?) b. Show that if there exists an injective (1 to 1) map Z : V1 −→ V satisfying ZA1 = AZ, then m1 | m. (Is the converse true? Under what conditions?) c. Show that the same is true with the minimal polynomials replaced by the characteristic polynomials. 14. Let A : V −→ V be a linear transformation and M ⊂ V an invariant subspace of A. Show that the minimal and characteristic polynomials of A|M divide the minimal and characteristic polynomials of A respectively. 15. Let A : V −→ V be a linear transformation and M ⊂ V an invariant subspace of A. Let d1 , . . . , dn be the invariant factors of A and e1 , . . . , em be the invariant factors of A|M . a. Show that ei | di for i = 1, . . . , m. b. Show that given arbitrary vectors v1 , . . . , vk in V , we have
156
6 Structure Theory of Linear Transformations
dim span {Ai v j | j = 1, . . . , k, i = 0, . . . , n − 1} ≤
k
∑ deg d j .
j=1
16. Let A : V −→ V and assume V = span {Ai v j | j = 1, . . . , k, i = 0, . . . , n − 1}. Show that the number of nontrivial invariant factors of A is ≤ k. 17. Let A(z) and B(z) be polynomial matrices and let C(z) = A(z)B(z). Let ai (z), bi (z), ci (z) be the invariant factors of A, B,C respectively. Show that ai | ci and bi | ci . 18. Let A and B be cyclic linear transformations. Show that if ZA = BZ and Z has rank r, then the degree of the greatest common divisor of the characteristic polynomials of A and B is at least r. 19. Show that if the minimal polynomials of A1 and A2 are coprime, then the only solution to ZA1 = A2 Z is the zero solution. 20. If there exists a rank r solution to ZA1 = A2 Z, what can you say about the characteristic polynomials of A1 and A2 ? 21. Let V be an n-dimensional linear space. Assume T : V −→ V is diagonalizable. Prove a. If T is cyclic, then T has n distinct eigenvalues. b. If T has n distinct eigenvalues, then T is cyclic. c. If b1 , . . . , bn are the eigenvectors corresponding to the distinct eigenvalues, then b = b1 + · · · + bn is a cyclic vector for T . 22. Show that if T 2 is cyclic, then T is cyclic. Is the converse true? 23. Let V be an n-dimensional linear space over the field F. Show that every nonzero vector in V is a cyclic vector of T if and only if the characteristic polynomial of T is irreducible over F. 24. Show that if m(z) is the minimal polynomial of a linear transformation A : V −→ V , then there exists a vector x ∈ V such that {p ∈ F[z] | p(A)x = 0} = mF[z]. 25. Let A : V −→ V be a linear transformation. Show that: a. For each x ∈ V , there exists a smallest A-invariant subspace Mx containing x. b. x is a cyclic vector for A|Mx . 26. Show that if A1 , A2 are cyclic maps with cyclic vectors b1 , b2 respectively and if the minimal polynomials of A1 and A2 are coprime, then b1 ⊕ b2 is a cyclic vector for A1 ⊕ A2 .
6.6 Exercises
157
27. Let A be a cyclic transformation in V and let M ⊂ V be an invariant subspace of T . Show that M has a complementary invariant subspace if and only if the characteristic polynomials of T |M and TV /M are coprime. 28. Show that if T 2 − T = 0, then T is similar to a matrix of the form diag (1, . . . , 1, 0, . . . , 0). 29. The following series of exercises, based on Axler (1995), provides an alternative approach to the structure theory of a linear transformation, which avoids the use of determinants. We do not assume the underlying field to be algebraically closed. Let T be a linear transformation in a finite-dimensional vector space X over the field F. A nonzero polynomial p(z) ∈ F[z] is called a prime of T if p(z) is monic, irreducible and there exists a nonzero vector x ∈ X for which p(T )x = 0. Such a vector x is called a p-null vector. Clearly, these notions generalize eigenvalues and eigenvectors. a. Show that every linear transformation T in a finite-dimensional vector space X has at least one prime. b. Let p(z) be a prime of T and let x be a p-null vector. If q(z) is any polynomial for which q(T )x = 0, then p(z) | q(z). In particular, if deg q < deg p and q(T )x = 0, then necessarily q(z) = 0. c. Show that nonzero pi -null vectors corresponding to distinct primes are linearly independent. d. Let p(z) be a prime of T with deg p = ν . Show that {x | p(T )k x = 0, for some k = 1, 2, . . .} = {x | p(T )[n/ν ] x = 0}. We call elements of Ker p(T )[n/ν ] generalized p-null vectors. e. Show that X is spanned by generalized p-null vectors. f. Show that generalized p-null vectors corresponding to distinct primes are linearly independent. g. Let p1 (z), . . . , pm (z) be the distinct primes of T and let Xi = Ker pi (T )[n/νi ] . Prove i. ii. iii. iv.
X has the direct sum representation X = X1 ⊕ · · · ⊕ Xm . Xi are T -invariant subspaces. The operators pi (T )|Xi are nilpotent. The operator T |Xi has only one prime, namely pi (z).
30. Given a nilpotent operator N, show that it is similar to a matrix of the form ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝
⎞
N1
⎟ ⎟ ⎟ ⎟, ⎟ ⎠
. . . Nk
158
6 Structure Theory of Linear Transformations
with Ni of the form
⎛
⎞ 01 ⎜ . . ⎟ ⎜ ⎟ ⎜ ⎟ . . ⎟. ⎜ ⎜ ⎟ ⎝ . 1⎠ 0
31. Let T be a linear operator in an n-dimensional complex vector space X . Let νi λ1 , . . . , λm be the distinct eigenvalues of T and let m(z) = ∏m i=1 (z − λi ) be its minimal polynomial. Let Hi (z) be the minimal degree solution to the Hermite (k) interpolation problem Hi (λ j ) = δ0 k δi j , i, j = 1, . . . , m, k = 0, . . . , ν j − 1. Define Pk = Hk (T ). Show that for any polynomial p(z), we have p(T ) =
m νk −1
∑∑
k=1 j=0
p( j) (λk ) (T − λk I) j Pk . j!
32. Let T be a linear operator in an n-dimensional vector space X . Let di (z) be the invariant factors of T and let δi = degdi . Show that the commutant of T has dimension equal to ∑ni=1 (2i − 1)δi . Conclude that the commutant of T is L(X) if and only if T is scalar, i.e., T = α I for some α ∈ F. Show also that the dimension of the commutant of T is equal to n if and only if T is cyclic.
6.7 Notes and Remarks The polynomial module structure induced by a linear transformation acting in a vector space is what in functional analysis is called a functional calculus; see Dunford and Schwartz (1958). The Jordan canonical form originated with C. Jordan and K. Weierstrass, using different techniques. Actually, in 1874, Jordan and Kronecker, himself a student of Weierstrass, were quarreling over priorities regarding pencil equivalence and the reduction to canonical forms. Jordan was stressing spectral theory, i.e., eigenvalues and eigenvectors, whereas Weierstrass, defended by Kronecker, was stressing invariant factors and elementary divisors. For a comprehensive discussion of this controversy, see Brechenmacher (2007). The study of a linear transformation on a vector space via the study of the polynomial module structure induced by it on that space appears already in van der Waerden (1931). Although it is very natural, it did not become the standard approach in the literature, most notably in books aimed at a broader mathematical audience. This is probably due to the perception that the concept of module is too abstract. Our approach to the structure theory of linear transformations and the Jordan canonical form, based on polynomial models, rational models, and shift operators,
6.7 Notes and Remarks
159
is a concretization of that approach. It was originated in Fuhrmann (1976), which in turn was based on ideas stemming from Hilbert space operator theory. In this connection, the reader is advised to consult Sz.-Nagy and Foias (1970) and Fuhrmann (1981). The transition from the cyclic case to the noncyclic case, described in Section 6.4, uses a special case of general polynomial model techniques introduced in Fuhrmann (1976).
Chapter 7
Inner Product Spaces
7.1 Introduction In this chapter we focus on the study of vector spaces and linear transformations that relate to notions of distance, angle, and orthogonality. We will restrict ourselves throughout to the case of the real field R or the complex field C.
7.2 Geometry of Inner Product Spaces Definition 7.1. Let U be a vector space over C (or R). An inner product on U is a function (·, ·) : U × U −→ C that satisfies the following conditions: 1. (x, x) ≥ 0, and (x, x) = 0 if and only if x = 0. 2. For α1 , α2 ∈ C and x1 , x2 , y ∈ U we have (α1 x1 + α2 x2 , y) = α1 (x1 , y) + α2 (x2 , y). 3. We have (x, y) = (y, x). We note that these axioms imply the antilinearity of the inner product in the right variable, that is, (x, α1 y1 + α2 y2 ) = α1 (x, y1 ) + α2 (x, y2 ). A function φ (x, y) that is linear in x and antilinear in y and satisfies φ (x, y) = φ (y, x) is called a Hermitian form. Thus the inner product is a Hermitian form in U . We define the norm of a vector x ∈ U by 1
x = (x, x) 2 ,
P.A. Fuhrmann, A Polynomial Approach to Linear Algebra, Universitext, DOI 10.1007/978-1-4614-0338-8 7, © Springer Science+Business Media, LLC 2012
161
162
7 Inner Product Spaces
where we take the nonnegative square root. A vector x ∈ U is called normalized, or a unit vector, if x = 1. The existence of the inner product allows us to introduce and utilize the all-important notion of orthogonality. We say that x, y ∈ U are orthogonal, and write x ⊥ y, if (x, y) = 0. Given a set S, we write x ⊥ S if (x, s) = 0 for all s ∈ S. We define the set S⊥ by S⊥ = {x | (x, s) = 0, for all s ∈ S} =
{x | (x, s) = 0}.
s∈S
A set of vectors {xi } is called orthogonal if (xi , x j ) = 0 whenever i = j. A set of vectors {xi } is called orthonormal if (xi , x j ) = δi j . Theorem 7.2 (The Pythagorean theorem). Let {xi }ki=1 be an orthogonal set of vectors in U . Then k k 2 ∑ xi = ∑ xi 2 . i=1
Proof. k 2 ∑ xi = i=1
k
k
i=1
j=1
∑ xi , ∑ x j
i=1
k
k
k
= ∑ ∑ (xi , x j ) = ∑ xi 2 . i=1 j=1
i=1
Theorem 7.3 (The Schwarz inequality). For all x, y ∈ U we have |(x, y)| ≤ x · y.
(7.1)
Proof. If y = 0, the equality holds trivially. So, assuming y = 0, we set e = y/y. We note that x = (x, e)e + (x − (x, e)e) and (x − (x, e)e) ⊥ (x, e)e. Applying the Pythagorean theorem, we get x2 = (x, e)e2 + x − (x, e)e2 ≥ (x, e)e2 = |(x, e)|2 , or |(x, e)| ≤ x. Substituting y/y for e, inequality (7.1) follows.
Theorem 7.4 (The triangle inequality). For all x, y ∈ U we have x + y ≤ x + y.
(7.2)
Proof. We compute x + y2 = (x + y, x + y) = (x, x) + (x, y) + (y, x) + (y, y) = x2 + 2Re(x, y) + y2 ≤ x2 + 2|(x, y)| + y2 ≤ x2 + 2xy + y2 = (x + y)2.
7.2 Geometry of Inner Product Spaces
163
Next, we prove two important identities. The proof of both is computational and we omit it. Proposition 7.5. For all vectors x, y ∈ U we have 1. The polarization identity: 1 (x, y) = {x + y2 − x − y2 + ix + iy2 − ix − iy2} 4
(7.3)
2. The parallelogram identity: x + y2 + x − y2 = 2 x2 + y2 .
(7.4)
Proposition 7.6 (The Bessel inequality). Let {ei }ki=1 be an orthonormal set of vectors in U . Then k
x2 ≥ ∑ |(x, ei )|2 . i=1
Proof. Let x be an arbitrary vector in U . Obviously, we can write x = ∑ki=1 (x, ei )ei + k x − ∑i=1 (x, ei )ei . Since the vector x − ∑ki=1 (x, ei )ei is orthogonal to all e j , j = 1, . . . , k, we can use the Pythagorean theorem to obtain x2 = x − ∑ki=1 (x, ei )ei 2 + ∑ki=1 (x, ei )ei 2 ≥ ∑ki=1 (x, ei )ei 2 = ∑ki=1 |(x, ei )|2 .
Proposition 7.7. An orthogonal set of nonzero vectors {xi }ki=1 is linearly independent. Proof. Assume ∑ki=1 ci xi = 0. Taking the inner product with x j , we get k
0 = ∑ ci (xi , x j ) = c j x j 2 . i=1
This implies c j = 0 for all j.
Since orthogonality implies linear independence, we can search for bases consisting of orthogonal, or even better, orthonormal, vectors. Thus an orthogonal basis for U is an orthogonal set that is also a basis for U . Similarly, an orthonormal basis for U is an orthonormal set that is also a basis for U . The next theorem, generally known as the Gram–Schmidt orthonormalization process, gives a constructive method for computing orthonormal bases. Theorem 7.8 (Gram–Schmidt orthonormalization). Let {x1 , . . . , xk } be linearly independent vectors in U . Then there exists an orthonormal set {e1 , . . . , ek } such that span (e1 , . . . , e j ) = span (x1 , . . . , x j ), 1 ≤ j ≤ k. (7.5)
164
7 Inner Product Spaces
Proof. We prove this by induction. For j = 1 we set e1 = x j
x1 x1 .
Assume we have
j−1 constructed e1 , . . . , e j−1 as required. We consider next = x j − ∑i=1 (x j , ei )ei . Clearly x j = 0, for otherwise, x j ∈ span (e1 , . . . , e j−1 ) = span (x1 , . . . , x j−1 ), contrary to the assumption of linear independence of the xi . We define now
ej =
x j x j
=
j−1 x j − ∑i=1 (x j , ei )ei j−1 x j − ∑i=1 (x j , ei )ei
.
Since ei ∈ span (x1 , . . . , x j ) for i = 1, . . . , j − 1 and since e j ∈ span (e1 , . . . , e j−1 , x j ) ⊂ span (x1 , . . . , x j ), we obtain the inclusion span (e1 , . . . , e j ) ⊂ span (x1 , . . . , x j ). To get the inverse inclusion, we observe that span(x1 , . . . , x j−1 ) ⊂ span(e1 , . . . , e j ) follows from the xj =
j−1
j−1
i=1
i=1
∑ (x j , ei )ei + x j − ∑ (x j , ei )ei e j
implies x j ∈ span (e1 , . . . , e j ). So span (x1 , . . . , x j ) ⊂ span (e1 , . . . , e j ), and these two inclusions imply (7.5). Corollary 7.9. Every finite-dimensional inner product space has an orthonormal basis. Corollary 7.10. Every orthonormal set in a finite-dimensional inner product space U can be extended to an orthonormal basis. Proof. Let {e1 , . . . , em } be an orthonormal set, thus necessarily linearly independent. Hence, by Theorem 2.13, it can be extended to a basis {e1 , . . . , em , xm+1 , . . . , xn } for U . Applying the Gram–Schmidt orthonormalization process, we get an orthonormal basis {e1 , . . . , en }. The first m vectors remain unchanged. Orthonormal bases are particularly convenient for vector expansions, or alternatively, for computing coordinates. Proposition 7.11. Let {e1 , . . . , en } be an orthonormal basis for a finite-dimensional inner product space U . Then every vector x ∈ U has a unique representation in the form n
x = ∑ (x, ei )ei . i=1
We consider next an important approximation problem. We define the distance of a vector x ∈ U to a subspace M by
δ (x, M ) = inf{x − m|m ∈ M }. A best approximant for x in M is a vector m0 ∈ M that satisfies x − m0 = δ (x, M ).
7.2 Geometry of Inner Product Spaces
165
Theorem 7.12. Let U be a finite-dimensional inner product space and M a subspace. Let x ∈ U . Then 1. A best approximant for x in M exists. 2. A best approximant for x in M is unique. 3. A vector m0 ∈ M is a best approximant for x in M if and only if x − m0 is orthogonal to M . Proof. 1. Let {e1 , . . . , em } be an orthonormal basis for M . Extend it to an orthonormal basis {e1 , . . . , en } for U . We claim that m0 = ∑m i=1 (x, ei )ei is a best approximant. Clearly m0 ∈ M . Any other vector m ∈ M has a unique representation of the form m = ∑m i=1 ci ei . Therefore m
m
m
i=1
i=1
i=1
x − ∑ ci ei = x − ∑ (x, ei )ei + ∑ [(x, ei ) − ci ]ei . Now, the vector x − ∑m i=1 (x, ei )ei is orthogonal to L(e1 , . . . , em ). Hence, applying the Pythagorean theorem, we get m m 2 2 2 x − ∑m i=1 ci ei = x − ∑i=1 (x, ei )ei + ∑i=1 [(x, ei ) − ci ]ei m 2 ≥ x − ∑i=1 (x, ei )ei .
2. Let ∑m i=1 ci ei be another best approximant. By the previous computation, we must m 2 2 have ∑m i=1 [(x, ei ) − ci ]ei = ∑i=1 |(x, ei ) − ci | , and this happens if and only if ci = (x, ei ) for all i. Thus the two approximants coincide. 3. Assume m0 ∈ M is the best approximant. We saw that m0 = ∑m i=1 (x, ei )ei and hence x − ∑m i=1 (x, ei )ei is orthogonal to M = L(e1 , . . . , em ). Conversely, assume m0 ∈ M and x − m0 ⊥ M . Then, for any vector m ∈ M , we have x − m = (x − m0 ) + (m − m0 ). Since m − m0 ∈ M , using the Pythagorean theorem, we have x − m2 = x − m0 2 + m − m02 ≥ x − m02 . Hence m0 is the best approximant.
In the special case of inner product spaces, we will reserve the notation U = M ⊕ N for the case of orthogonal direct sums, that is, U = M + N and M ⊥ N . We note that M ⊥ N implies M ∩ N = {0}. Theorem 7.13. Let U be a finite-dimensional inner product space and M a subspace. Then we have U = M ⊕ M ⊥. M⊥
(7.6)
are orthogonal. It suffices to show that they Proof. The subspaces M and span U . Thus, let x ∈ U , and let m be its best approximant in M . We can write x = m + (x − m). By Theorem 7.12, we have x − m ∈ M ⊥ .
166
7 Inner Product Spaces
Inner product spaces are a particularly convenient setting for the development of duality theory. The key to this is the identification of the dual space to an inner product space with the space itself. This is done by identifying a linear functional with an inner product with a vector. Theorem 7.14. Let U be a finite-dimensional inner product space. Then f is a linear functional on U if and only if there exists a vector ξ f ∈ U such that f (x) = (x, ξ f ).
(7.7)
Proof. If f is defined by (7.7), then obviously it is a linear functional on U . To prove the converse, we note first that if f is the zero functional, then we just choose ξ f = 0. Thus, without loss of generality, we can assume that f is a nonzero functional. In this case, M = Ker f is a subspace of codimension 1. This implies dim M ⊥ = 1. Let us choose an arbitrary nonzero vector y ∈ M ⊥ and set ξ f = α y. Since (y, ξ f ) = (y, α y) = α y2 , by choosing α = the same holds, by linearity, for all (x, ξ f ) = 0. Thus (7.7) follows.
f (y) , y2
we get f (y) = (y, ξ f ), and
vectors in M ⊥ . For x ∈ M
it is clear that f (x) =
7.3 Operators in Inner Product Spaces In this section we study some important classes of operators in inner product spaces. We begin by studying the adjoint transformation. Due to the availability of inner products and the representation of linear functionals, the definition is slightly different from that given for general transformations.
7.3.1 The Adjoint Transformation Theorem 7.15. Let U , V be inner product spaces and let T : U −→ V be a linear transformation. Then there exists a unique linear transformation T ∗ : V −→ U such that (T x, y) = (x, T ∗ y) (7.8) for all x ∈ U and y ∈ V . Proof. Fix a vector y ∈ V . Then f (x) = (T x, y) defines a linear functional on U . Thus, by Theorem 7.14, there exists a unique vector ξ ∈ U such that (T x, y) = (x, ξ ). We define T ∗ by T ∗ y = ξ . Therefore, T ∗ is a map from V to U . It remains to show that T ∗ is a linear map. To this end, we compute
7.3 Operators in Inner Product Spaces
167
(x, T ∗ (α1 y1 + α2 y2 )) = (T x, α1 y1 + α2 y2 ) = α1 (T x, y1 ) + α2 (T x, y2 ) = α1 (x, T ∗ y1 ) + α2 (x, T ∗ y2 ) = (x, α1 T ∗ y1 ) + (x, α2 T ∗ y2 ) = (x, α1 T ∗ y1 + α2 T ∗ y2 ). By the uniqueness of the representing vector, we get T ∗ (α1 y1 + α2 y2 ) = α1 T ∗ y1 + α2 T ∗ y2 .
We will call the transformation T ∗ the adjoint or, if emphasis is needed, the Hermitian adjoint of T . Given a complex matrix A = (ai j ), its adjoint A∗ = (a∗i j ) ˜ is defined by a∗ = a ji , or alternatively, by A∗ = A. ij
Proposition 7.16. Let T : U −→ V be a linear transformation. Then (Im T )⊥ = Ker T ∗ . Proof. We have for x ∈ U , y ∈ V , (T x, y) = (x, T ∗ y). So y ⊥ Im T if and only if T ∗ y ⊥ U , i.e., if and only if y ∈ Ker T ∗ . Corollary 7.17. Let T be a linear transformation in an inner product space U . Then we have the following direct sum decompositions: U = Im T ⊕ Ker T ∗ , U = Im T ∗ ⊕ KerT. Moreover, rankT = rank T ∗ and dim Ker T = dim Ker T ∗ . Proof. Follows from Theorem 7.13 and Proposition 7.16.
Proposition 7.18. Let T be a linear transformation in U . Then M is a T -invariant subspace of U if and only if M ⊥ is a T ∗ -invariant subspace. Proof. Follows from equation (7.8).
We proceed to study the matrix representation of a linear transformation T with respect to orthonormal bases. These are particularly easy to compute. Proposition 7.19. Let T : U −→ V be a linear transformation and let B = {e1 , . . . , en } and B1 = { f1 , . . . , fm } be orthonormal bases for U and V respectively. Then 1 1. For the matrix representation [T ]B B = (ti j ), we have
ti j = (Te j , fi ).
168
7 Inner Product Spaces
2. We have for the adjoint T ∗ , B1 ∗ [T ∗ ]B B1 = ([T ]B ) .
Proof. 1. The matrix representation of T with respect to the given bases is defined by Te j = ∑m k=1 tki f k . We compute m
(Te j , fi ) = ( ∑ tk j fk , fi ) = k=1
m
m
k=1
k=1
∑ tk j ( fk , fi ) = ∑ tk j δik = ti j .
∗ 2. Let [T ∗ ]B B1 = (ti j ). Then
ti∗j = (T ∗ f j , ei ) = ( f j , Tei ) = (Tei , f j ) = t ji .
Next, we study the properties of the map from T → T ∗ as a function from L(U , V ) into L(V , U ). Proposition 7.20. Let U , V be inner product spaces. Then the adjoint map has the following properties: 1. 2. 3. 4. 5. 6.
(T + S)∗ = T ∗ + S∗. (α T )∗ = α T ∗ . (ST )∗ = T ∗ S∗ . (T ∗ )∗ = T . ∗ =I . For the identity map IU in U , we have IU U −1 ∗ If T : U −→ V is invertible, then (T ) = (T ∗ )−1 .
Proof. 1. Let x ∈ U and y ∈ V . We compute (x, (T + S)∗ y) = ((T + S)x, y) = (T x + Sx, y) = (T x, y) + (Sx, y) = (x, T ∗ y) + (x, S∗y) = (x, T ∗ y + S∗y) = (x, (T ∗ + S∗ )y). By uniqueness we get (T + S)∗ = T ∗ + S∗ . 2. Computing (x, (α T )∗ y) = ((α T )x, y) = (α T x, y) = α (T x, y) = α (x, T ∗ y) = (x, α T ∗ y), we get, using uniqueness, that (α T )∗ = α T ∗ . 3. We compute (x, (ST )∗ y) = (ST x, y) = (T x, S∗ y) = (x, T ∗ S∗ y), or (ST )∗ = T ∗ S∗ .
7.3 Operators in Inner Product Spaces
4.
169
(T x, y) = (x, T ∗ y) = (T ∗ y, x) = (y, T ∗∗ x) = (T ∗∗ x, y),
and this implies T ∗∗ = T . 5. Let IU be the identity map in U . Then (IU x, y) = (x, y) = (x, IU y), ∗ =I . so IU U 6. Assume T : U −→ V is invertible. Then T −1 T = IU and T T −1 = IV . The first ∗ = T ∗ (T −1 )∗ , that is, (T −1 )∗ is a right inverse of T ∗ . equality implies IU = IU The second equality implies that it is a left inverse, and (T −1 )∗ = (T ∗ )−1 follows. The availability of the norm function on an inner product space allows us to consider a numerical measure of the size of a linear transformation.
Definition 7.21. Let U1 , U2 be two inner product spaces. Given a linear transformation T : U1 −→ U2 , we define its norm by T = sup T x. x≤1
Clearly, for an arbitrary linear transformation T , the norm T is finite. For it is defined as the supremum of a continuous function on the unit ball {x ∈ U1 | x ≤ 1}, which is a compact set. It follows that the supremum is actually attained, and thus there exists a, not necessarily unique, unit vector x for which T x = T . The following proposition, whose standard proof we omit, sums up the basic properties of the operator norm. Proposition 7.22. We have the following properties of the operator norm: 1. 2. 3. 4. 5. 6.
T + S ≤ T + S. α T = |α | · T . T S ≤ T · S. T ∗ = T . I = 1. T = supx=1 T x = supx,y≤1 |(T x, y)| = supx,y=1 |(T x, y)|.
7.3.2 Unitary Operators The extra structure that inner product spaces have over linear spaces, namely the existence of inner products, leads to a new definition, that of an isomorphism of two inner product spaces.
170
7 Inner Product Spaces
Definition 7.23. Let V and W be two inner product spaces over the same field. A linear transformation T : V −→ W is an isometry if it preserves inner products, i.e., if (T x, Ty) = (x, y), for all x, y ∈ V . An isomorphism of inner product spaces is an isomorphism that is isometric. We will also call it an isometric isomorphism or alternatively a unitary isomorphism. Proposition 7.24. Given a linear transformation T : V −→ W , then the following properties are equivalent: 1. T preserves inner products. 2. For all x ∈ V we have T x = x. 3. We have T ∗ T = I. Proof. Assume T preserves inner products, i.e., (T x, Ty) = (x, y), for all x, y ∈ V . Choosing y = x, we get T x2 = (T x, T x) = (x, x) = x2 . If T x = x, then using the polarization identity (7.3), we get (T x, Ty) = =
1 {(T x + Ty)2 − (T x − Ty)2 + i(T x + iTy)2 − i(T x − iTy)2} 4 1 {(x + y)2 − (x − y)2 + i(x + iy)2 − i(x − iy)2} = (x, y). 4
Next observe that if T preserves inner products, then ((I − T ∗ T )x, y) = 0, which implies T ∗ T = I. Finally, if the equality T ∗ T = I holds, then clearly T preserves inner products. Corollary 7.25. An isometry T : V −→ W maps orthonormal sets into orthonormal sets. The next result is the counterpart, for inner product spaces, of Theorem 4.14. Corollary 7.26. Two inner product spaces are isomorphic if and only if they have the same dimension. Proof. Assume a unitary isomorphism U : V −→ W exists. Since it maps bases into bases, the dimensions of V and W coincide. Conversely, suppose V and W have the same dimension. We choose orthonormal bases {e1 , . . . , en } and { f1 , . . . , fn } in V and W respectively. We define a linear map U : V −→ W by Uei = fi . It is immediate that the operator U so defined is a unitary isomorphism. Proposition 7.27. Let U : V −→ W be a linear transformation. Then U is unitary if and only if U ∗ = U −1 . Proof. If U ∗ = U −1 , we get U ∗U = UU ∗ = I, i.e., U is an isometric isomorphism. Conversely, if U is unitary, we have U ∗U = I. Since U is an isomorphism and the inverse transformation, if it exists, is unique, we obtain U ∗ = U −1 .
7.3 Operators in Inner Product Spaces
171
Next, we consider matrix representations. We say that a complex matrix is unitary if A∗ A = I. Here A∗ is defined by a∗i j = a ji. We proceed to consider matrix representations for unitary maps. We say that a complex n × n matrix A is unitary if A∗ A = I, where A∗ is the Hermitian adjoint of A. We expect that the matrix representation of a unitary matrix, when taken with respect to orthonormal bases, should reflect the unitarity property. We state this in the following proposition, omitting the simple proof. Proposition 7.28. Let U : V −→ W be a linear transformation between two inner product spaces. Let B1 = {e1 , . . . , en } and B2 = { f1 , . . . , fn }, be orthonormal bases in V and W respectively. Then U is unitary if and only if its matrix representation 2 [U]B B1 is a unitary matrix. The notion of similarity can be strengthened in the context of inner product spaces. Definition 7.29. Let V and W be two inner product spaces over the same field, and let T : V −→ V and S : W −→ W be linear transformations. We say that T and S are unitarily equivalent if there exists a unitary map U : V −→ W for which UT = SU, or equivalently, for which the following diagram is commutative: V
U
-
T
W S
? V
U
-
? W
The unitary transformations play, in the algebra of linear transformations on an inner product space, the same role that complex numbers of absolute value one play in the complex number field C. This is reflected in the eigenvalues of unitary transformations as well as in their structure. Proposition 7.30. Let U be a unitary operator in an inner product space V . Then 1. 2. 3. 4.
All eigenvalues of U have absolute value one. If Ux = λ x, then U ∗ x = λ x. Eigenvectors corresponding to different eigenvalues are orthogonal. We have Ker (U − λ1 I)ν1 · · · (U − λk I)νk = Ker (U − λ1 I) · · · (U − λk I).
5. The minimal polynomial mU (z) of U splits into distinct linear factors.
172
7 Inner Product Spaces
Proof. 1. Let λ be an eigenvalue of U and x a corresponding eigenvector. Then Ux = λ x implies (Ux, x) = (λ x, x) = λ (x, x). Now, using the unitarity of U, we have x2 = Ux2 = λ x2 = |λ |2 x2 . This implies |λ | = 1. 2. Assume Ux = λ x. Using the fact that U ∗U = UU ∗ = I, we compute U ∗ x − λ x2 = (U ∗ x − λ x,U ∗ x − λ x) = (U ∗ x,U ∗ x) − λ (x,U ∗ x) − λ (x,Ux) + |λ |2x2 = (UU ∗ x, x) − λ (Ux, x) − λ (x,Ux) + x2 = x2 − λ (λ x, x) − λ (x, λ x) + x2 = x2 − 2|λ |2x2 + x2 = 0. 3. Let λ , μ be distinct eigenvalues and x, y the corresponding eigenvectors. Then
λ (x, y) = (Ux, y) = (x,U ∗ y) = (x, μ y) = μ (x, y). Hence, (λ − μ )(x, y) = 0. Since λ = μ , we conclude that (x, y) = 0 or x ⊥ y. 4. Since U is unitary, U and U ∗ commute. Therefore also U − λ I and U ∗ − λ I commute. Assume now that (U − λ I)ν x = 0. Without loss of generality we may n assume that (U − λ I)2 x = 0. This implies 0 = (U ∗ − λ I)2 (U − λ I)2 x = [(U ∗ − λ I)2 n
n
n−1
(U − λ I)2
n−1
]2 x.
So 0 = ([(U ∗ − λ I)2
n−1
(U − λ I)2
n−1
]2 x, x) = (U ∗ − λ I)2
n−1
(U − λ I)2
n−1
x2 ,
or (U ∗ − λ I)2 (U − λ I)2 x = 0. This implies in turn that (U − λ I)2 x = 0, n−1 and hence (U − λ I)2 x = 0. By repeating the argument we conclude that (U − λ I)x = 0. Next, we use the fact that all factors U − λi I commute. Assuming (U − λ1 I)ν1 · · · (U − λk I)νk x = 0, we conclude n−1
n−1
n−1
0 = (U − λ1 I)(U − λ2 I)ν2 · · · (U − λk I)νk x = (U − λ2 I)ν2 (U − λ1 I)(U − λ3 I)ν3 · · · (U − λk I)νk x. This implies (U − λ1 I)(U − λ2 I)(U − λ3 I)ν3 · · · (U − λk I)νk x = 0. Proceeding inductively, we get (U − λ1 I) · · · (U − λk I)x = 0. 5. Follows from part 4.
7.3 Operators in Inner Product Spaces
173
7.3.3 Self-adjoint Operators Probably the most important class of operators in an inner product space is that of self-adjoint operators. An operator T in an inner product space U is called selfadjoint, or Hermitian, if T ∗ = T . Similarly, a complex matrix A is called selfadjoint or Hermitian if it satisfies A∗ = A, or equivalently ai j = a ji . The next proposition shows that self-adjoint operators in an inner product space play in L(U ) a role similar to that of the real numbers in the complex field. Proposition 7.31. Every linear transformation T in an inner product space can be written in the form T = T1 + iT2 with T1 , T2 self-adjoint. Such a representation is unique. Proof. Write 1 1 T = (T + T ∗ ) + i (T − T ∗ ) 2 2i and note that T1 = 12 (T + T ∗ ) and T2 = 2i1 (T − T ∗ ) are both self-adjoint. Conversely, if T = T1 + iT2 with T1 , T2 self-adjoint, then T ∗ = T1 − iT2 . By elimination we get that T1 , T2 have necessarily the form given previously. In preparation for the spectral theorem, we prove a few properties of self-adjoint operators that are of intrinsic interest. Proposition 7.32. Let T be a self-adjoint operator in an inner product space U . Then 1. All eigenvalues of T are real. 2. Eigenvectors corresponding to different eigenvalues are orthogonal. 3. We have Ker (T − λ1 I)ν1 · · · (T − λk I)νk = Ker (T − λ1 I) · · · (T − λk I). 4. The minimal polynomial mT (z) of T splits into distinct linear factors. Proof. 1. Let λ be an eigenvalue of T and x a corresponding eigenvector. Then T x = λ x implies (T x, x) = (λ x, x) = λ (x, x). Now, using the self-adjointness of (T x,x) T , we have (T x, x) = (x, T x) = (T x, x), that is, (T x, x) is real. So is λ = (x,x) . 2. Let λ , μ be distinct eigenvalues of T , and x, y corresponding eigenvectors. Using the fact that eigenvalues of T are real, we get (T x, y) = (λ x, y) = λ (x, y) and (T x, y) = (x, Ty) = (x, μ y) = μ (x, y) = μ (x, y). By subtraction we get (λ − μ )(x, y) = 0, and since λ = μ , we conclude that (x, y) = 0, or x ⊥ y.
174
7 Inner Product Spaces
3. We prove this by induction on the number of factors. So, let k = 1 and assume that (T − λ1 I)ν1 x = 0. Then, by multiplying this equation by (T − λ1 I)μ , we may n assume without loss of generality that (T − λ1 I)2 x = 0. Then we get n
0 = ((T − λ1 I)2 x, x) = ((T − λ1 I)2
n−1
x, (T − λ1I)2
n−1
x) = (T − λ1 I)2
n−1
x,
n−1
which implies (T − λ1 I)2 x = 0. Repeating this argument, we finally obtain (T − λ1 I)x = 0. Assume now the statement holds for up to k − 1 factors and k (T − λ I)νi x = 0. Therefore (T − λ I)νk Π k−1 (T − λ I)νi x = 0. By the that Πi=1 i i k i=1 argument for k = 1 we conclude that k−1 k−1 (T − λi I)νi x = Πi=1 (T − λi I)νi [(T − λk I)x]. 0 = (T − λiI)Πi=1
By the induction hypothesis we get k−1 k 0 = Πi=1 (T − λi I)[(T − λk I)x] = Πi=1 (T − λiI)x, k (T − λ I) = 0. or, since x is arbitrary, Πi=1 i 4. Follows from the previous part.
Theorem 7.33. Let T be self-adjoint. Then there exists an orthonormal basis consisting of eigenvectors of T . k (z − λ ), with Proof. Let mT (z) be the minimal polynomial of T . Then mT (z) = Πi=1 i the λi distinct. Let πi (z) be the corresponding Lagrange interpolation polynomials. We observe that the πi (T ) are orthogonal projections. That they are projections has been proved in Theorem 6.25. That they are orthogonal projections follows from their self-adjointness, noting that, given a real polynomial p(z) and a selfadjoint operator T , necessarily p(T ) is self-adjoint. This implies that U = Ker (λ1 I − T ) ⊕ · · · ⊕ Ker (λk I − T ) is an orthogonal direct sum decomposition. Choosing an orthonormal basis in each subspace Ker (λi I − T ), we get an orthonormal basis made of eigenvectors.
As a consequence, we obtain the spectral theorem. Theorem 7.34. Let T be self-adjoint. Then it has a unique representation of the form s
T = ∑ λi Pi , i=1
where the λi ∈ R are distinct and Pi are orthogonal projections satisfying s
Pi Pj = δi j Pj , ∑ Pi = I. i=1
7.3 Operators in Inner Product Spaces
175
Proof. Let λi be the distinct eigenvalues of T . We define the Pi to be the orthogonal projections on Ker (λi I − T ). Conversely, if such a representation exists, it follows that necessarily, the λi are the eigenvalues of T and Im Pi = Ker (λi I − T ). In the special case of self-adjoint operators, the computation of the norm has a further characterization. Proposition 7.35. Let T be a self-adjoint operator in a finite-dimensional inner product space U . Then 1. We have T = supx≤1 |(T x, x)| = supx=1 |(T x, x)|. 2. Let λ1 , . . . , λn be the eigenvalues of T ordered so that |λi | ≥ |λi+1 |. Then T = |λ1 |, i.e., the norm equals the modulus of the largest eigenvalue. Proof. 1. Since for x satisfying x ≤ 1, we have |(T x, x)| ≤ T x · x ≤ T · x2 ≤ T , it follows that supx=1 |(T x, x)| ≤ supx≤1 |(T x, x)| ≤ T . To prove the converse inequality, we argue as follows. We use the following version of the polarization identity: 4Re (T x, y) = (T (x + y), x + y) − (T(x − y), x − y), which implies 1 sup |(T z, z)|[x + y2 + x − y2], 4 z≤1 1 |(Re (T x, y)| ≤ sup |(T z, z)|[2x2 + 2y2] ≤ sup |(T z, z)|. 4 z≤1 z≤1
|(Re (T x, y)| ≤
Choosing y = TT xx it follows that T x ≤ supz≤1 |(T z, z)|. The equality supz≤1 |(T z, z)| = supz=1 |(T z, z)| is obvious. 2. Let λ1 be the eigenvalue of T of largest modulus and x1 a corresponding, normalized, eigenvector. Then |λ1 | = |(λ1 (x1 , x1 )| = |(λ1 x1 , x1 )| = |(T x1 , x1 )| ≤ T · x1 2 = T . So the absolute values of all eigenvalues of T are bounded by T . Assume now that x is a vector for which T x = T . We will show that it is an eigenvector of T corresponding either to T or to −T . The equality sup |(T x, x)| = T implies either sup(T x, x) = T or inf(T x, x) = −T . We assume the first is satisfied. Thus there exists a unit vector x for which (T x, x) = T . We compute
176
7 Inner Product Spaces
0 ≤ T x − T x2 = T x2 − 2T (T x, x) + T 2 x2 ≤ T 2 x2 − 2T 2 + T 2 x2 = 0. So necessarily T x = T · x, or T = λ1 . If inf(T x, x) = −T , the same argument can be used.
7.3.4 The Minimax Principle For self-adjoint operators we have a very nice characterization of eigenvalues, known as the minimax principle. Theorem 7.36. Let T be a self-adjoint operator on an n-dimensional inner product space U. Let λ1 ≥ · · · ≥ λn be the eigenvalues of T . Then
λk =
min
max{(T x, x) | x = 1}.
{M |dim M =n−k+1} x∈M
(7.9)
Proof. Let {e1 , . . . , en } be an orthonormal basis of U consisting of eigenvectors of T . Let now M be an arbitrary subspace of U of dimension n − k + 1. Let Mk = L(e1 , . . . , ek ). Then since dim Mk + dim Nk = k + (n − k + 1) = n + 1, their intersection Mk ∩ Nk is nontrivial and contains a unit vector x. For this vector we have k
k
i=1
i=1
(T x, x) = ∑ λi |(x, ei )|2 ≥ λk ∑ |(x, ei )|2 = λk . This shows that min{M |dim M =n−k+1} maxx∈M {(T x, x) | x = 1} ≥ λk . To complete the proof, we need to exhibit at least one subspace on which the reverse inequality holds. To this end, let Mk = L(ek , . . . , en ). Then for any x ∈ M with x = 1, we have n
n
i=k
i=k
(T x, x) = ∑ λi |(x, ei )|2 ≤ λk ∑ |(x, ei )|2 ≤ λk x2 = λk . Therefore, we have n
n
i=k
i=k
max{(T x, x)| x = 1} = ∑ λi |(x, ei )|2 ≤ λk ∑ |(x, ei )|2 ≤ λk x2 = λk .
x∈M
7.3.5 The Cayley Transform The classes of unitary and self-adjoint operators generalize the sets of complex numbers of absolute value one and the set of real numbers respectively. Now the fractional linear transformation w = z−i z+i maps the upper half-plane onto the unit
7.3 Operators in Inner Product Spaces
177
disk, and in particular the real line onto the unit circle. Naturally, one wonders whether this can be extended to a map of self-adjoint operators onto unitary operators. The next theorem focuses on this map. Theorem 7.37. Let A be a self-adjoint operator in a finite-dimensional inner product space V . Then 1. For each vector x ∈ V , we have (A + iI)x2 = (A − iI)x2 = Ax2 + x2. 2. The operators A + iI, A − iI are both injective, hence invertible. 3. The operator U defined by U = (A − iI)(A + iI)−1
(7.10)
is a unitary operator for which 1 is not an eigenvalue. The operator U is called the Cayley transform of A. 4. Given a unitary map U in a finite-dimensional inner product space such that 1 is not an eigenvalue, the operator A defined by A = i(I + U)(I − U)−1
(7.11)
is a self-adjoint operator. This map is called the inverse Cayley transform. Proof. 1. We compute (A + iI)x2 = Ax2 + x2 + (ix, Ax) + (Ax, ix) = Ax2 + x2 = (A − iI)x2 . 2. The injectivity of both A + iI and A − iI is an immediate consequence of the previous equality. This shows the invertibility of both operators. 3. Let x ∈ V be arbitrary. Then there exists a unique vector z such that x = (A + iI)z or z = (A + iI)−1x. Therefore (A − iI)(A + iI)−1x = (A − iI)z = (A + iI)z = x. This shows that U defined by (7.10) is unitary. To see that 1 is not an eigenvalue of U, assume Ux = x. Thus (A − iI)(A + iI)−1 x = x = (A + iI)(A + iI)−1x and hence 2i(A + iI)−1 x = 0. This implies x = 0. 4. Assume U is a unitary map such that 1 is not an eigenvalue. Then I − U is invertible. Defining A by (7.11), and using the fact that U ∗ = U −1 , we compute A∗ = −i(I + U ∗ )(I − U ∗)−1 = −i(I + U −1)UU −1 (I − U −1 )−1 = −i(U + I)(U − I)−1 = i(I + U)(I − U)−1 = A. So A is self-adjoint.
178
7 Inner Product Spaces
7.3.6 Normal Operators Our analysis of the classes of unitary and self-adjoint operators in inner product spaces showed remarkable similarities. In particular, both classes admitted the existence of orthonormal bases made up of eigenvectors. One wonders whether the set of all operators having this property can be characterized. In fact, this can be done, and the corresponding class is that of normal operators, which we proceed to introduce. Definition 7.38. Let T be a linear transformation in a finite-dimensional inner product space. We say that T is normal if it satisfies T ∗T = T T ∗.
(7.12)
Proposition 7.39. The operator T is normal if and only if for every x ∈ V , we have T x = T ∗ x.
(7.13)
Proof. We compute, assuming T is normal, T x2 = (T x, T x) = (T ∗ T x, x) = (T T ∗ x, x) = (T ∗ x, T ∗ x) = T ∗ x2 , which implies (7.13). Conversely, assuming the equality (7.13), we have (T T ∗ x, x) = (T ∗ T x, x), and hence, by Proposition 7.35, we get T T ∗ = T ∗ T , i.e., T is normal. Corollary 7.40. Let T be a normal operator. Let α be an eigenvalue of T . Then T x = α x implies T ∗ x = α x. Proof. If T is normal, so is T − α I, and we apply Proposition 7.39.
Proposition 7.41. Let T be a normal operator in a complex inner product space. If for some complex number λ , we have (T − λ I)ν x = 0, then (T − λ I)x = 0. Proof. The proof is exactly as in the case of unitary operators and is omitted.
Lemma 7.42. Let T be a normal operator and assume λ , μ are distinct eigenvalues of T . If x, y are corresponding eigenvectors, then x ⊥ y. Proof. We compute
λ (x, y) = (λ x, y) = (T x, y) = (x, T ∗ y) = (x, μ y) = μ (x, y). Hence (λ − μ )(x, y) = 0, and since λ = μ , we conclude that (x, y) = 0. Thus x, y are orthogonal. The following is known as the spectral theorem for normal operators.
7.3 Operators in Inner Product Spaces
179
Theorem 7.43. Let T be a normal operator in a finite-dimensional complex inner product space V . Then there exists an orthonormal basis of V consisting of eigenvectors of T . k (z − λ )νi be the primary decomposition of the minimal Proof. Let mT (z) = Πi=1 i polynomial of T . We set Mi = Ker (λi I − T )νi . By Proposition 7.41 we have Mi = Ker (λi I −T ), i.e., it is the eigenspace corresponding to the eigenvalue λi . By Lemma 7.42, the subspaces Mi are mutually orthogonal. We apply now Theorem 6.26 to conclude that V = M1 ⊕ · · · ⊕ Mk , where now this is an orthogonal direct sum decomposition. Choosing an orthonormal basis in each subspace Mi and taking their union, we get an orthonormal basis for V made of eigenvectors.
Theorem 7.44. Let T be a normal operator in a finite-dimensional inner product space V . Then 1. A subspace M ⊂ V is invariant under T if and only if it is invariant under T ∗ . 2. Let M be an invariant subspace for T . Then the direct sum decomposition V = M ⊕ M ⊥ reduces T . 3. T reduced to an invariant subspace M , that is, T |M is a normal operator. Proof. 1. Let M be invariant under T . Since T |M has at least one eigenvalue and a corresponding eigenvector, say T x1 = λ1 x1 , then also T ∗ x1 = λ1 x1 . Let M1 be the subspace spanned by x1 . We consider M M1 = M ∩ M1⊥ . This is also invariant under T . Proceeding by induction, we conclude that M is spanned by eigenvectors of T hence, by Proposition 7.41, by eigenvectors of T ∗ . This shows that M is T ∗ invariant. The converse follows by symmetry. 2. Since M is invariant under both T and T ∗ , so is M ⊥ . 3. Let {e1 , . . . , em } be an orthonormal basis of M consisting of eigenvectors. Thus Tei = λi ei and T ∗ ei = λi ei . Let x = ∑m i=1 αi ei ∈ M . Then ∗ m T ∗ T x = T ∗ T ∑m i=1 αi ei = T ∑i=1 αi λi ei m = ∑i=1 αi |λi |2 ei = T ∑m i=1 αi λi ei m ∗ ∗ = T T ∑i=1 αi ei = T T x.
This shows that (T |M )∗ = T ∗ |M .
Theorem 7.45. The operator T is normal if and only if T ∗ = p(T ) for some polynomial p. Proof. If T ∗ = p(T ), then obviously T and T ∗ commute, i.e., T is normal. Conversely, if T is normal, there exists an orthonormal basis {e1 , . . . , en } consisting of eigenvectors corresponding to the eigenvalues λ1 , . . . , λn . Let p be any polynomial that interpolates the values λi at the points λi . Then p(T )ei = p(λi )ei = λi ei = T ∗ ei . This shows that T ∗ = p(T ).
180
7 Inner Product Spaces
7.3.7 Positive Operators We consider next a subclass of self-adjoint operators, that of positive operators. Definition 7.46. An operator T on an inner product space V is called nonnegative if for all x ∈ V , we have (T x, x) ≥ 0. The operator T is called positive if it is nonnegative and (T x, x) = 0 for x = 0 only. Similarly, a complex Hermitian matrix A is called nonnegative if for all x ∈ Cn , we have (Ax, x) ≥ 0. Proposition 7.47. 1. A Hermitian operator T in an inner product space is nonnegative if and only if all its eigenvalues are nonnegative. 2. A Hermitian operator T in an inner product space is positive if and only if all its eigenvalues are positive. Proof. 1. Assume T is Hermitian and nonnegative. Let λ be an arbitrary eigenvalue of T and x a corresponding eigenvector. Then
λ‖x‖² = λ(x, x) = (λx, x) = (Tx, x) ≥ 0,

which implies λ ≥ 0. Conversely, assume T is Hermitian and all its eigenvalues are nonnegative. Let {e1, . . . , en} be an orthonormal basis consisting of eigenvectors corresponding to the eigenvalues λi ≥ 0. An arbitrary vector x has the expansion x = ∑_{i=1}^n (x, ei)ei; hence

(Tx, x) = (T ∑_{i=1}^n (x, ei)ei, ∑_{j=1}^n (x, ej)ej) = ∑_{i=1}^n ∑_{j=1}^n (x, ei)(ej, x)(Tei, ej) = ∑_{i=1}^n ∑_{j=1}^n (x, ei)(ej, x)(λiei, ej) = ∑_{i=1}^n λi|(x, ei)|² ≥ 0.

2. The proof is the same except for the inequalities being strict.
An easy way of producing nonnegative operators is to consider operators of the form A*A or AA*. The next result shows that this is the only way.

Proposition 7.48. T ∈ L(U) is nonnegative if and only if T = A*A for some operator A in U.

Proof. That A*A is nonnegative is immediate. Conversely, assume T is nonnegative. By Theorem 7.33, there exists an orthonormal basis B = {e1, . . . , en} in U made out of eigenvectors of T corresponding to the eigenvalues λ1, . . . , λn. Since T is nonnegative, all the λi are nonnegative and have nonnegative square roots λi^{1/2}. Define a linear transformation S in U by letting Sei = λi^{1/2}ei. Clearly, S²ei = λiei = Tei. The operator S so defined is self-adjoint, for given x, y ∈ U, we have
(Sx, y) = (S ∑_{i=1}^n (x, ei)ei, ∑_{j=1}^n (y, ej)ej)
        = (∑_{i=1}^n λi^{1/2}(x, ei)ei, ∑_{j=1}^n (y, ej)ej)
        = (∑_{i=1}^n (x, ei)ei, ∑_{j=1}^n λj^{1/2}(y, ej)ej)
        = (∑_{i=1}^n (x, ei)ei, S ∑_{j=1}^n (y, ej)ej) = (x, Sy).

So T = S² = S*S.
Uniqueness of a nonnegative square root will be proved in Theorem 7.51.

Definition 7.49. Given vectors x1, . . . , xk in an inner product space V, we define the corresponding Gram matrix, or Gramian, G = (gij) by gij = (xi, xj).

Clearly, a Gramian is always a Hermitian matrix. The next result connects Gramians with positivity.

Theorem 7.50. Let V be a complex n-dimensional inner product space. A necessary and sufficient condition for a k × k, with k ≤ n, Hermitian matrix G to be nonnegative definite is that it be the Gram matrix of k vectors in V.

Proof. Assume G is a Gramian, i.e., there exist vectors x1, . . . , xk ∈ V for which gij = (xi, xj). Let ξ = (ξ1, . . . , ξk)^T ∈ C^k. Then we compute

(Gξ, ξ) = ∑_{i=1}^k ∑_{j=1}^k gij ξj ξ̄i = ∑_{i=1}^k ∑_{j=1}^k (xi, xj) ξj ξ̄i = (∑_{i=1}^k ξ̄i xi, ∑_{j=1}^k ξ̄j xj) ≥ 0,

which shows that G is nonnegative. To prove the converse, assume G is nonnegative. By Theorem 7.34, there exists a k × k unitary matrix U, with columns u1, . . . , uk, for which U*GU = diag(δ1, . . . , δk), with δi ≥ 0. Let γi be the nonnegative square root of δi. We define Γ = diag(γ1, . . . , γk). Thus we have U*GU = Γ*Γ, or, with R = ΓU^{−1}, G = R*R. Let r1, . . . , rk be the columns of R. Then it follows that gij = (rj, ri), i.e., G is the Gramian of k vectors in C^k. To show that G is also the Gramian of k vectors in V, we choose an arbitrary orthonormal basis e1, . . . , en in V. We define a map S : C^k −→ V by Sui = ei and extend it linearly. Clearly, S is isometric, and with xi = γiei ∈ V, it follows that G is also the Gramian of x1, . . . , xk.

Effectively, in the proof of Proposition 7.48, we have constructed a nonnegative square root. We formalize this.

Theorem 7.51. A nonnegative operator T has a unique nonnegative square root.

Proof. The existence of a nonnegative square root has been proved in Proposition 7.48.
To prove uniqueness, let λ1 ≥ ··· ≥ λn be the eigenvalues of T and B = {e1, . . . , en} an orthonormal basis in U made out of eigenvectors. Let A be an arbitrary nonnegative square root of T, i.e., T = A². We compute

0 = (λiI − A²)ei = (λi^{1/2}I + A)(λi^{1/2}I − A)ei.

Now, in case λi > 0, the operators λi^{1/2}I + A are invertible, and hence Aei = λi^{1/2}ei. In case λi = 0, we compute

‖Aei‖² = (Aei, Aei) = (A²ei, ei) = (Tei, ei) = 0.

So Aei = 0. Thus a nonnegative square root is completely determined by T, whence uniqueness.
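As a quick numerical companion to Proposition 7.48 and Theorem 7.51 (a sketch with illustrative names, not part of the text), one can form the nonnegative square root in an orthonormal eigenbasis and verify it:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
T = A.T @ A                      # T = A*A is nonnegative (Proposition 7.48)

lam, Q = np.linalg.eigh(T)       # orthonormal eigenbasis, lam >= 0
S = Q @ np.diag(np.sqrt(np.clip(lam, 0, None))) @ Q.T

print(np.allclose(S @ S, T))                     # S is a square root of T
print(np.all(np.linalg.eigvalsh(S) >= -1e-12))   # and it is nonnegative
```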
7.3.8 Partial Isometries

Definition 7.52. Let U1, U2 be inner product spaces. An operator U : U1 −→ U2 is called a partial isometry if there exists a subspace M ⊂ U1 such that

‖Ux‖ = ‖x‖ for x ∈ M,    Ux = 0 for x ⊥ M.

The space M is called the initial space of U and the image of U, UM, the final space.

In the following proposition we collect the basic facts on partial isometries.

Proposition 7.53. Let U1, U2 be inner product spaces, and U : U1 −→ U2 a linear transformation. Then
1. U is a partial isometry if and only if U*U is an orthogonal projection onto the initial space of U.
2. U is a partial isometry with initial space M if and only if U* is a partial isometry with initial space UM.
3. UU* is the orthogonal projection on the final space of U and U*U is the orthogonal projection on the initial space of U.

Proof. 1. Assume U is a partial isometry with initial space M. Let P be the orthogonal projection of U1 on M. Then for x ∈ M, we have

(U*Ux, x) = ‖Ux‖² = ‖x‖² = (x, x) = (Px, x).
On the other hand, if x ⊥ M, then (U*Ux, x) = ‖Ux‖² = 0 = (Px, x). So ((U*U − P)x, x) = 0 for all x, and hence U*U = P. Conversely, suppose P = U*U is a, necessarily orthogonal, projection in U1. Let M = {x | U*Ux = x}. Then for any x ∈ U1,

‖Ux‖² = (U*Ux, x) = (Px, x) = ‖Px‖².

This shows that U is a partial isometry with initial space M.
2. Assume U is a partial isometry with initial space M. Note that we have U2 = Im U ⊕ Ker U* = UM ⊕ Ker U*. Obviously U*|Ker U* = 0, whereas if x ∈ M, then U*Ux = Px = x. So ‖U*Ux‖ = ‖x‖ = ‖Ux‖, i.e., U* is a partial isometry with initial space UM. By the previous part, it follows that UU* is the orthogonal projection on UM, the final space of U.
3. From Part 1, it follows that UU* is the orthogonal projection on the initial space of U*, which is UM, the final space of U. The rest follows by duality.
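A small numerical illustration of Definition 7.52 and Proposition 7.53 (Python/numpy; the construction and names are ours): map an orthonormal basis of the initial space M isometrically, send M⊥ to zero, and check that U*U and UU* are the expected orthogonal projections.

```python
import numpy as np

# M = span{e1, e2} in R^3; U maps it isometrically into R^4 and kills e3.
Q, _ = np.linalg.qr(np.random.default_rng(5).standard_normal((4, 2)))  # orthonormal columns
U = np.zeros((4, 3))
U[:, :2] = Q

print(np.allclose(U.T @ U, np.diag([1.0, 1.0, 0.0])))  # projection onto M
print(np.allclose(U @ U.T, Q @ Q.T))                   # projection onto the final space UM
```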
7.3.9 The Polar Decomposition

In analogy with the polar representation of a complex number in the form z = re^{iθ}, we have a representation of an arbitrary linear transformation in an inner product space as the product of a unitary operator and a nonnegative one.

Theorem 7.54. Let U1, U2 be two inner product spaces and let T : U1 −→ U2 be a linear transformation. Then there exist partial isometries V : U1 −→ U2 and W : U2 −→ U1 for which

T = V(T*T)^{1/2} = (TT*)^{1/2}W.    (7.14)

The partial isometry V is uniquely defined if we require Ker V = Ker T. Similarly, the partial isometry W is uniquely defined if we require Ker W = Ker T*.

Proof. For x ∈ U1, we compute

‖(T*T)^{1/2}x‖² = ((T*T)^{1/2}x, (T*T)^{1/2}x) = (T*Tx, x) = (Tx, Tx) = ‖Tx‖².    (7.15)

Note that we have

Im (T*T)^{1/2} = Im T*,    Ker (T*T)^{1/2} = Ker T.    (7.16)
We define V : U1 −→ U2 by

V(T*T)^{1/2}x = Tx,  x ∈ U1,
Vz = 0,              z ∈ Ker T.    (7.17)

Recalling the direct sum representation U1 = Im (T*T)^{1/2} ⊕ Ker T, it follows from (7.15) that V is a well-defined partial isometry with Im (T*T)^{1/2} as initial space and Im T = Im (TT*)^{1/2} as final space. The other representation in (7.14) follows by duality.

A remark on uniqueness is in order. It is clear from the construction of V that it is uniquely determined on Im (T*T)^{1/2}. Therefore V is uniquely determined if Ker T = Ker (T*T)^{1/2} = {0}. Noting that the definition of V on Ker T was a choice we made, it follows that we could make a different choice, provided Im T has a nontrivial orthogonal complement. Since (Im T)⊥ = Ker T*, Ker T* = {0} is also a sufficient condition for the uniqueness of V. In general, if dim U2 ≥ dim U1, we can assume V to be a, not necessarily unique, isometry. If dim U1 ≥ dim U2, we can assume W to be a, not necessarily unique, coisometry, i.e., W* an isometry. In case we have dim U1 = dim U2, we can assume both V and W to be unitary. In this case, V and W are uniquely determined if and only if T is invertible.
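Numerically, the polar decomposition (7.14) is easy to obtain from a singular value decomposition; the following sketch (numpy; not the book's construction) illustrates the square case, where V may be taken unitary.

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.standard_normal((4, 4))

U, s, Wt = np.linalg.svd(T)
P = Wt.T @ np.diag(s) @ Wt        # P = (T*T)^{1/2}, nonnegative
V = U @ Wt                        # orthogonal (in the real case), a fortiori a partial isometry

print(np.allclose(V @ P, T))                 # T = V (T*T)^{1/2}
print(np.allclose(V.T @ V, np.eye(4)))       # V is orthogonal
```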
7.4 Singular Vectors and Singular Values

In this section we study the basic properties of singular vectors and singular values. We use this to give a characterization of singular values as approximation numbers. These results will be applied, in Chapter 12, to Hankel norm approximation problems.

Starting from a polar decomposition of a linear transformation, and noting that the norm of a unitary operator, or for that matter also of isometries and nontrivial partial isometries, is 1, it seems plausible that the operators (T*T)^{1/2} and (TT*)^{1/2} provide a measure of the size of T.

Proposition 7.55. Let U1, U2 be two inner product spaces and let T : U1 −→ U2 be a linear transformation. Let (T*T)^{1/2} and (TT*)^{1/2} be the nonnegative square roots of T*T and TT* respectively. Then (T*T)^{1/2} and (TT*)^{1/2} have the same nonzero eigenvalues, including multiplicities.

Proof. Let μ be a nonzero eigenvalue of (T*T)^{1/2} and x ≠ 0 a corresponding eigenvector. Clearly, this implies that also T*Tx = μ²x. Define y = Tx/μ. We proceed to compute

TT*y = TT*(Tx/μ) = (T/μ)(T*Tx) = (T/μ)(μ²x) = μ²(Tx/μ) = μ²y.
Note that x ≠ 0 implies y ≠ 0, and this shows that μ² is an eigenvalue of TT*. In turn, this implies that μ is an eigenvalue of (TT*)^{1/2}. Let now Vi = {x ∈ U1 | (T*T)^{1/2}x = μix} and Wi = {y ∈ U2 | (TT*)^{1/2}y = μiy}. We define maps Φi : Vi −→ Wi and Ψi : Wi −→ Vi by

Φix = Tx/μi,   x ∈ Vi,
Ψiy = T*y/μi,  y ∈ Wi.    (7.18)

We compute

‖Φix‖² = (Tx/μi, Tx/μi) = (1/μi²)(T*Tx, x) = ‖x‖²,

i.e., Φi is an isometry. The same holds for Ψi by symmetry. Finally, it is easy to check that ΨiΦi = I and ΦiΨi = I, i.e., both maps are unitary. Hence, Vi, Wi have the same dimension, and so the multiplicity of μi as an eigenvalue of (T*T)^{1/2} and (TT*)^{1/2} is the same.

This leads us to the following definition.

Definition 7.56. Let T : U1 −→ U2 be a linear transformation. A pair of vectors {φ, ψ}, with φ ∈ U1 and ψ ∈ U2, is called a Schmidt pair of T, corresponding to the nonzero singular value μ, if

Tφ = μψ,   T*ψ = μφ    (7.19)

holds. We also refer to φ, ψ as the singular vectors of T and T* respectively.

The minimax principle leads to the following characterization of singular values.

Theorem 7.57. Let U1, U2 be inner product spaces and let T : U1 −→ U2 be a linear transformation. Let μ1 ≥ ··· ≥ μn ≥ 0 be its singular values. Then

μk = min_{{M | codim M = k−1}} max_{x∈M} ‖Tx‖/‖x‖.

Proof. The μi² are the eigenvalues of T*T. Applying the minimax principle, we have

μk² = min_{{M | codim M = k−1}} max_{x∈M} (T*Tx, x)/(x, x) = min_{{M | codim M = k−1}} max_{x∈M} ‖Tx‖²/‖x‖²,

which is equivalent to the statement of the theorem.
With the use of the polar decomposition and the spectral representation of self-adjoint operators, we get a convenient representation of arbitrary operators.

Theorem 7.58. Let T : U1 −→ U2 be a linear transformation, let μ1 ≥ ··· ≥ μr be its nonzero singular values, and let {φi, ψi} be the corresponding orthonormal sets of Schmidt pairs. Then for every pair of vectors x ∈ U1, y ∈ U2, we have

Tx = ∑_{i=1}^r μi(x, φi)ψi    (7.20)

and

T*y = ∑_{i=1}^r μi(y, ψi)φi.    (7.21)

Proof. Let μ1 ≥ ··· ≥ μr > 0 be the nonzero singular values and {φi, ψi} the corresponding Schmidt pairs. We have seen that with V defined by Vφi = ψi, and of course V*ψi = φi, we have T = V(T*T)^{1/2}. Now, for x ∈ U1, we have

(T*T)^{1/2}x = ∑_{i=1}^r μi(x, φi)φi,

and hence

Tx = V(T*T)^{1/2}x = V ∑_{i=1}^r μi(x, φi)φi = ∑_{i=1}^r μi(x, φi)ψi.

The other representation, for T*, can be derived analogously, or alternatively, can be obtained by computing the adjoint of T using (7.20).

The availability of the previously obtained representations for T and T* leads directly to an extremely important characterization of singular values in terms of approximation properties. The question we address is the following. Given a linear transformation of rank m between two inner product spaces, how well can it be approximated, in the operator norm, by transformations of lower rank? We have the following characterization.

Theorem 7.59. Let T : U1 −→ U2 be a linear transformation of rank m. Let μ1 ≥ ··· ≥ μm be its nonzero singular values. Then

μk = inf{‖T − T̃‖ | rank T̃ ≤ k − 1}.    (7.22)

Proof. By Theorem 7.58, we may assume that Tx = ∑_{i=1}^n μi(x, φi)ψi. We define a linear transformation T̃ by T̃x = ∑_{i=1}^{k−1} μi(x, φi)ψi. Obviously rank(T̃) = k − 1 and (T − T̃)x = ∑_{i=k}^n μi(x, φi)ψi, and therefore we conclude that ‖T − T̃‖ = μk. Conversely, let us assume rank(T̃) = k − 1, i.e., codim Ker T̃ = k − 1. By Theorem 7.57, we have
μk ≤ max_{x∈Ker T̃} ‖Tx‖/‖x‖ = max_{x∈Ker T̃} ‖(T − T̃)x‖/‖x‖ ≤ ‖T − T̃‖.
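The content of Theorem 7.59 is the classical optimal low-rank approximation property of singular values; a quick numerical check (numpy; illustrative only) is:

```python
import numpy as np

rng = np.random.default_rng(3)
T = rng.standard_normal((6, 5))

U, s, Vt = np.linalg.svd(T)
k = 3
T_approx = U[:, :k-1] @ np.diag(s[:k-1]) @ Vt[:k-1, :]   # best rank-(k-1) approximant

# ||T - T_approx|| in the operator (spectral) norm equals the k-th singular value.
print(np.isclose(np.linalg.norm(T - T_approx, 2), s[k-1]))
```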
7.5 Unitary Embeddings

The existence of nonnegative square roots is a central tool in modern operator theory. In this section we discuss some related embedding theorems. The basic question we address is, given a linear transformation A in an inner product space U, when can it be embedded in a 2 × 2 block unitary operator matrix of the form

V = [A  B; C  D].

This block matrix is to be considered initially to be an operator defined on the given inner product space U ⊕ U, by [A  B; C  D][x; y] = [Ax + By; Cx + Dy]. It is clear that if V as above is unitary, then, observing that P defined by P[x; y] = [x; 0] is an orthogonal projection, it follows that A = PV|U⊕{0}, and we have ‖A‖ = ‖PV|U⊕{0}‖ ≤ ‖V‖ = 1, that is, A is necessarily a contraction. The following theorem focuses on the converse.

Theorem 7.60. A linear transformation A can be embedded in a 2 × 2 block unitary matrix if and only if it is a contraction.

Proof. That embeddability in a unitary matrix implies that A is contractive has been shown before. Thus, assume A is contractive, i.e., ‖Ax‖ ≤ ‖x‖ for all x ∈ U. Then

‖x‖² − ‖Ax‖² = (x, x) − (Ax, Ax) = (x, x) − (A*Ax, x) = ((I − A*A)x, x) = ‖(I − A*A)^{1/2}x‖².

Here we used the fact that the nonnegative operator I − A*A has a unique nonnegative square root. This implies that, setting C = (I − A*A)^{1/2}, the block column [A; C] is isometric, i.e., satisfies (A*  C*)[A; C] = I. Similarly, using the requirement that (A  B)[A*; B*] = I, we set B = (I − AA*)^{1/2}.

Computing now
[I  0; 0  I] = [A  (I − AA*)^{1/2}; (I − A*A)^{1/2}  D] · [A*  (I − A*A)^{1/2}; (I − AA*)^{1/2}  D*],

we have (I − A*A)^{1/2}A* + D(I − AA*)^{1/2} = 0. Using the equality (I − A*A)^{1/2}A* = A*(I − AA*)^{1/2}, we get (A* + D)(I − AA*)^{1/2} = 0. This indicates that we should choose D = −A*. It remains to check that

V = [A  (I − AA*)^{1/2}; (I − A*A)^{1/2}  −A*]    (7.23)

is a required embedding.
The embedding (7.23) is not unique. In fact, if K and L are unitary operators in U, then the matrices

[A  (I − AA*)^{1/2}L; K(I − A*A)^{1/2}  −KA*L]

parametrize all such embeddings.
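The following sketch (Python/numpy, real case for simplicity; for complex A one would use conjugate transposes, and the helper below is ours, not the text's) builds the embedding (7.23) for a random contraction and checks that the resulting block matrix is unitary.

```python
import numpy as np

def sqrtm_psd(M):
    # nonnegative square root of a symmetric nonnegative matrix via its eigenbasis
    w, Q = np.linalg.eigh(M)
    return Q @ np.diag(np.sqrt(np.clip(w, 0, None))) @ Q.T

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
A = 0.9 * A / np.linalg.norm(A, 2)            # scale so that ||A|| < 1, i.e., A is a contraction

n = A.shape[0]
I = np.eye(n)
V = np.block([[A,                      sqrtm_psd(I - A @ A.T)],
              [sqrtm_psd(I - A.T @ A), -A.T                  ]])

print(np.allclose(V.T @ V, np.eye(2 * n)))    # expected: True, V is orthogonal
```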
The embedding presented in the previous theorem is possibly redundant, since it requires a space of twice the dimension of the original space U. For example, if A is unitary to begin with, then no extension is necessary. Thus the dimension of the minimal space where a unitary extension is possible seems to be related to the distance of the contraction A from unitary operators. This is generally measured by the ranks of the operators (I − A*A)^{1/2} and (I − AA*)^{1/2}. The next theorem handles this situation. However, before embarking on that route, we give an important dilation result. This is an extension of a result originally obtained in Halmos (1950). That construction played a central role, via the Schäffer matrix, see Sz.-Nagy and Foias (1970), in the theory of unitary dilations.

Theorem 7.61. Given a contraction A in C^n, there exists a unitary matrix [A  B; C  D], with B injective and C surjective, if and only if this matrix has the form

[A  (I − AA*)^{1/2}V; U(I − A*A)^{1/2}  −UA*V],    (7.24)

where U : C^n −→ C^p is a partial isometry with initial space Im (I − A*A)^{1/2} and where V : C^p −→ C^n is an isometry with final space Im (I − AA*)^{1/2}.

Proof. Let P_{Im(I−AA*)^{1/2}} and P_{Im(I−A*A)^{1/2}} be the orthogonal projections on the appropriate spaces. Assume U, V are partial isometries as described in the theorem. Then we have

UU* = I,   U*U = P_{Im(I−A*A)^{1/2}},   V*V = I,   VV* = P_{Im(I−AA*)^{1/2}}.    (7.25)
These identities imply the following:

VV*(I − AA*)^{1/2} = (I − AA*)^{1/2},   (I − AA*)^{1/2}VV* = (I − AA*)^{1/2}.    (7.26)

We compute now the product

[A  (I − AA*)^{1/2}V; U(I − A*A)^{1/2}  −UA*V] · [A*  (I − A*A)^{1/2}U*; V*(I − AA*)^{1/2}  −V*AU*].

We use the identities in (7.26). So

AA* + (I − AA*)^{1/2}VV*(I − AA*)^{1/2} = AA* + (I − AA*)^{1/2}(I − AA*)^{1/2} = I.

Next,

A(I − A*A)^{1/2}U* − (I − AA*)^{1/2}VV*AU* = A(I − A*A)^{1/2}U* − (I − AA*)^{1/2}AU* = 0.

Similarly,

U(I − A*A)^{1/2}A* − UA*VV*(I − AA*)^{1/2} = U[(I − A*A)^{1/2}A* − A*(I − AA*)^{1/2}] = 0.

Finally,

U(I − A*A)^{1/2}(I − A*A)^{1/2}U* + UA*VV*AU* = U(I − A*A)U* + UA*AU* = UU* = I.

This shows that the matrix in (7.24) is indeed unitary.

Conversely, suppose we are given a contraction A. If C satisfies A*A + C*C = I, then C*C = I − A*A. This in turn implies that for every vector x, we have

‖Cx‖² = ‖(I − A*A)^{1/2}x‖².

Therefore, there exists a partial isometry U, defined on Im (I − A*A)^{1/2}, for which

C = U(I − A*A)^{1/2}.    (7.27)

Obviously, Im C ⊂ Im U. Since C is surjective, this forces U to be surjective. Thus UU* = I and U*U = P_{Im(I−A*A)^{1/2}}. In a similar fashion, we start from the equality AA* + BB* = I and see that for every x,
So we conclude that there exists a partial isometry V ∗ with initial space Im (I − 1 AA∗ ) 2 for which 1 B∗ = V ∗ (I − AA∗) 2 , or 1
B = (I − AA∗ ) 2 V. Since B is injective, so is V . So we
get V ∗V
= I,
(7.28)
and VV ∗
Next we use the equality DB∗ + CA∗ = 0, to get 1
=P 1. Im (I−AA∗ ) 2
1
1
DV ∗ (I − AA∗ ) 2 + U(I − A∗ A) 2 A∗ = (DV ∗ + UA∗ )(I − AA∗ ) 2 = 0, so DV ∗ +UA∗ |
1
1
Im (I−AA∗ ) 2
1
= 0. Now we have Cn = Im (I − AA∗ ) 2 ⊕ Ker (I − AA∗ ) 2 . 1
1
1
Moreover, the identity A(I − A∗ A) 2 = (I − AA∗ ) 2 A implies AIm (I − A∗ A) 2 ⊂ 1 1 1 Im (I − AA∗ ) 2 as well as AKer (I − A∗ A) 2 ⊂ Ker (I − AA∗ ) 2 . Similarly, A∗ Ker (I − 1 1 1 AA∗ ) 2 ⊂ Ker (I − A∗ A) 2 . Letting now x ∈ Ker (I − AA∗ ) 2 , then A∗ x ∈ Ker (I − 1 1 A∗ A) 2 . Since U is a partial isometry with initial space Im (I − A∗ A) 2 , necessarily 1 UA∗ x = 0. On the other hand V ∗ is a partial isometry with initial space (I − AA∗ ) 2 ∗ ∗ and hence V ∗ | 1 = 0. So DV + UA | 1 = 0, and hence we can Ker (I−AA∗ ) 2 Ker (I−AA∗ ) 2 ∗ ∗ ∗ conclude that DV + UA = 0. Using the fact that V V = I, this implies D = −UA∗V.
(7.29)
It is easy to check now, as in the sufficiency part, that with B,C, D satisfying (7.27), (7.28), and (7.29) respectively, all other equations are satisfied.
7.6 Exercises

1. Show that the rational function q(z) = ∑_{k=−n}^{n} qk z^k is nonnegative on the unit circle if and only if q(z) = p(z)p∗(z) for some polynomial p(z); here p∗(z) = p(z^{−1}). This theorem, the simplest example of spectral factorization, is due to Fejér.
2. Show that a polynomial q(z) is nonnegative on the imaginary axis if and only if for some polynomial p(z), we have q(z) = p(z)p(−z).
3. Given vectors x1, . . . , xk in an inner product space U, the Gram determinant is defined by G(x1, . . . , xk) = det((xi, xj)). Show that x1, . . . , xk are linearly independent if and only if G(x1, . . . , xk) ≠ 0.
4. Let X be an inner product space of functions. Let f0, f1, . . . be a sequence of linearly independent functions in X and let φ0, φ1, . . . be the sequence of
functions obtained from it by way of the Gram–Schmidt orthonormalization procedure. a. Show that
φn(x) is, up to the normalization fixed in part (b), the determinant of the (n+1) × (n+1) matrix whose i-th row, for i = 0, . . . , n − 1, is (f0, fi), (f1, fi), . . . , (fn, fi), and whose last row is f0(x), f1(x), . . . , fn(x).
b. Let φn(x) = ∑_{i=0}^{n} c_{n,i} fi(x), and define D_{−1} = 1 and

Dn = det((fj, fi))_{i,j=0}^{n}.

Show that

c_{n,n} = (D_{n−1}/Dn)^{1/2}.
5. In the space of real polynomials R[x], an inner product is defined by

(p, q) = ∫_{−1}^{1} p(x)q(x)dx.

Apply the Gram–Schmidt orthogonalization procedure to the sequence of polynomials 1, x, x², . . . . Show that, up to multiplicative constants, the resulting polynomials coincide with the Legendre polynomials Pn(x) = (1/(2^n n!)) d^n(x² − 1)^n/dx^n.
6. Show that the rank of a skew-symmetric matrix in a real inner product space is even.
7. Let K be a skew-Hermitian operator, i.e., K* = −K. Show that U = (I + K)^{−1}(I − K) is unitary. Show that every unitary operator is of the form U = λ(I + K)^{−1}(I − K), where λ is a complex number of absolute value 1.
8. Show that if A and B are normal and Im A ⊥ Im B, then A + B is normal.
9. Show that A is normal if and only if for some unitary U, we have A* = AU.
10. Let N be a normal operator and let AN = NA. Show that AN* = N*A.
11. Given a normal operator N and any positive integer k, show that there exists an operator A satisfying A^k = N.
12. Show that a circulant matrix A ∈ C^{n×n} is normal.
13. Let T be a real orthogonal matrix, i.e., it satisfies T̃T = I. Show that T is similar to a block diagonal matrix diag(I, R1, . . . , Rk), where

Ri = [cos θi  sin θi; −sin θi  cos θi].

14. Show that a linear operator A in an inner product space having eigenvalues λ1, . . . , λn is normal if and only if Trace(A*A) = ∑_{i=1}^n |λi|².
15. Given contractive operators A and B in finite-dimensional inner product spaces U1 and U2 respectively, show that there exists an operator X : U2 −→ U1 for which

[A  X; 0  B]

is contractive if and only if X = (I − AA*)^{1/2}C(I − B*B)^{1/2} and C : U2 −→ U1 is a contraction.
16. Define the Jacobi tridiagonal matrices Jk to be the k × k matrices with diagonal entries a1, . . . , ak, superdiagonal entries −b1, . . . , −b_{k−1}, and subdiagonal entries −c1, . . . , −c_{k−1}, assuming bi, ci > 0. Let Dk(λ) = det(λI − Jk).
a. Show that the following recurrence relation is satisfied: Dk(λ) = (λ − ak)D_{k−1}(λ) − b_{k−1}c_{k−1}D_{k−2}(λ).
b. Show that Jk is similar to a self-adjoint transformation.
17. Singular value decomposition. Let A ∈ C^{m×n} and let σ1 ≥ ··· ≥ σr be its nonzero singular values. Show that there exist unitary matrices U ∈ C^{m×m} and V ∈ C^{n×n} such that

A = U [D  0; 0  0] V*,

where D = diag(σ1, . . . , σr). Compare with Theorem 7.58.
18. Show that the eigenvalues of

[0  A; A*  0]

are σ1, . . . , σn, −σ1, . . . , −σn.
7.7 Notes and Remarks Most of the results in this chapter go over in one way or another to the context of Hilbert spaces, where, in fact, most of them were proved initially. Young (1988) is a good, very readable modern source for the basics of Hilbert space theory. The spectral theorem in Hilbert space was one of the major early achievements of functional analysis and operator theory. For an extensive source on these topics and the history of the subject, see Dunford and Schwartz (1958, 1963). Singular values were introduced by E. Schmidt in 1907. Because of the importance of singular values as approximation numbers, equation (7.22) is sometimes taken as the definition of singular values. In this connection, see Young (1988).
Chapter 8
Tensor Products and Forms
8.1 Introduction In this chapter, we discuss the general theory of quadratic forms and the notion of congruence and its invariants, as well as the applications of the theory to the analysis of special forms. We will focus primarily on quadratic forms induced by rational functions, most notably the Hankel and Bezout forms, because of their connection to system-theoretic problems such as stability and signature symmetric realizations. These forms use as their data different representations of rational functions, namely power series and coprime factorizations respectively. But we will discuss also the partial fraction representation in relation to the computation of the Cauchy index of a rational function and the proof of the Hermite–Hurwitz theorem and the continued fraction representation as a tool in the computation of signatures of Hankel matrices as well as in the problem of Hankel matrix inversion. Thus, different representations of rational functions, i.e., different encodings of the information carried by a rational function, provide efficient starting points for different methods. The results obtained for rational functions will be applied to root-location problems for polynomials in the next chapter. In the latter part of the chapter, we approach the study of bilinear forms from the point of view of tensor products and their representations as spaces of homomorphisms. This we specialize to the tensor product of polynomial and rational models, emphasizing concrete, functional identifications for the tensor products. Since polynomial models carry two natural structures, namely that of vector spaces over the underlying field F as well as that of modules over the polynomial ring F[z], it follows that there are two related tensor products. The relation between the two tensor products illuminates the role of the Bezoutians and the Bezout map. In turn, this leads to the characterization of the maps that intertwine two polynomial models, yielding a generalization of Theorem 5.17.
8.2 Basics 8.2.1 Forms in Inner Product Spaces Given a linear transformation T in a finite-dimensional complex inner product space U , we define a field-valued function φ on U × U by
φ (x, y) = (T x, y).
(8.1)
Clearly, φ is linear in the variable x and antilinear in the variable y. Such a function will be called a sesquilinear form or just a form. If the field is the field R of real numbers, then a form is actually linear in both variables, i.e., a bilinear form. It might seem that the forms defined by (8.1) are rather special. This is not the case, as is seen from the following. Theorem 8.1. Let U be a finite-dimensional complex inner product space. Then φ is a form on U if and only if there exists a linear transformation T in U such that (8.1) holds. The operator T is uniquely determined by φ . Proof. We utilize Theorem 7.14 on the representation of linear functionals on inner product spaces. Thus, if we fix a vector y ∈ U , the function φ (x, y) is a linear functional, hence given by an inner product with a vector ηy , that is, φ (x, y) = (x, ηy ). It is clear that ηy depends linearly on y. Therefore, there exists a linear transformation S in U for which ηy = Sy. We complete the proof by defining T = S∗ . To see uniqueness, assume T1 , T2 represent the same form, i.e., (T1 x, y) = (T2 x, y). This implies ((T1 − T2 )x, y) = 0, for all x, y ∈ U . Choosing y = (T1 − T2 )x, we get (T1 − T2 )x = 0 for all x ∈ U . This shows that T2 = T1 . Suppose we choose a basis B = { f 1 , . . . , fn } in U . Then arbitrary vectors in U can be written as x = ∑nj=1 ξ j f j and y = ∑ni=1 ηi fi . In terms of the coordinates, we can compute the form by
φ (x, y) = (T x, y) = (T ∑nj=1 ξ j f j , ∑ni=1 ηi fi ) = ∑nj=1 ∑ni=1 ξ j ηi (T f j , fi ) = ∑nj=1 ∑ni=1 φi j ξ j ηi , where we define φi j = (T f j , fi ). We call the matrix (φi j ) the matrix representation of the form φ in the basis B, and denote this matrix by [φ ]B B . Using the standard inner product in Cn , we can write B B φ (x, y) = ([φ ]B B [x] , [x] ).
Next, we consider how the matrix of a quadratic form changes with a change of B basis in U . Let B = {g1, . . . , gn } be another basis in U . Then [x]B = [I]B B [x] and therefore
B B B B B B B φ (x, y) = ([φ ]B B [x] , [y] ) = ([φ ]B [I]B [x] , [I]B [y] )
∗ B B B B B B B = (([I]B B ) [φ ]B [I]B [x] , [y] ) = ([φ ]B [x] , [y] ),
which shows that
B ∗ B B [ φ ]B B = ([I]B ) [φ ]B [I]B ,
(8.2)
i.e., in a change of basis, the matrix of a quadratic form changes by congruence. The next result clarifies the connection between the matrix representation of a form and the matrix representation of the corresponding linear transformation. We say that a complex square matrix A1 is congruent to a matrix A if there exists a nonsingular matrix R such that A1 = R∗ AR. In case we deal with the real field, the ˜ Hermitian adjoint R∗ is replaced by the transpose R. The following proposition is standard, and we omit the proof. Proposition 8.2. In the ring of square real or complex matrices, congruence is an equivalence relation. A quadratic form in a complex inner product space U is a function of the form
φˆ (x) = φ (x, x),
(8.3)
where φ (x, y) is a symmetric bilinear form on U . We say that a form on U is a symmetric form if we have φ (x, y) = φ (y, x) for all x, y ∈ U and a Hermitian form if we have the additional property
φ (x, y) = φ (y, x). Proposition 8.3. A form φ on a complex inner product space U is Hermitian if and only if there exists a Hermitian operator T in U for which φ (x, y) = (T x, y). Proof. Assume T is Hermitian, i.e., T ∗ = T . Then
φ (x, y) = (T x, y) = (x, Ty) = (Ty, x) = φ (y, x). Conversely, assume φ (x, y) = φ (y, x). Let T be the linear transformation for which φ (x, y) = (T x, y). We compute (T x, y) = φ (x, y) = φ (y, x) = (Ty, x) = (x, Ty), and this for all x, y ∈ U . This shows that T is Hermitian.
Proposition 8.4. Let U be a complex inner product space and φ a Hermitian form on U . Let T be the uniquely defined linear transformation in U for which φ (x, y) = B∗ (T x, y), and let B = { f1 , . . . , fn } be a basis for U . Then [φ ]B B = [T ]B .
Proof. Let B ∗ = {g1 , . . . , gn } be the basis dual to B. Then [T ]B B = (ti j ) where the ti j are defined through T f j = ∑nk=1 tk j gk . Computing (T f j , fi ) =
n
∑ tk j gk , fi
k=1
=
n
∑ tk j δik = ti j ,
k=1
the result follows.
We are interested in the possibility of reducing a Hermitian form, using congruence transformations, to a diagonal form. Theorem 8.5. Let φ be a Hermitian form on a finite-dimensional complex inner product space U . Then there exists an orthonormal basis B = {e1 , . . . , en } in U for which [φ ]B B is diagonal with real entries. Proof. Let T : U −→ U be the Hermitian operator representing the form φ . By Theorem 7.33, there exists an orthonormal basis B = {e1 , . . . , en } in U consisting of eigenvectors of T corresponding to the real eigenvalues λ1 , . . . , λn . With respect to this basis, the matrix of the form is given by
φi j = (Te j , ei ) = λ j (e j , ei ) = λ j δi j .
Just as positive operators are a subset of self-adjoint operators, so positive forms are a subset of Hermitian forms. We say that a form φ on a complex inner product space U is nonnegative if it is Hermitian and φ (x, x) ≥ 0 for all x ∈ U . We say that a form φ on an inner product space U is positive if it is Hermitian and φ (x, x) > 0 for all 0 = x ∈ U . Corollary 8.6. Let φ be a nonnegative Hermitian form on the complex inner product space U . Then there exists an orthogonal basis B = { f1 , . . . , fn } for which the matrix [φ ]B B is diagonal with
φii =
1, 0,
0 ≤ i ≤ r, r < i ≤ n.
Proposition 8.7. A form φ on a complex inner product space U is positive if and only if there exists a positive operator P for which φ (x, y) = (Px, y). Of course, to check positivity of a form, it suffices to compute all the eigenvalues and check them for positivity. However, computing eigenvalues is not, in general, an algebraic process. So our next goal is to search for a computable criterion for positivity.
8.2.2 Sylvester’s Law of Inertia If we deal with real forms, then due to the different definition of congruence, we cannot apply the complex result directly. The proof of Theorem 8.5 is particularly simple. However, it uses the spectral theorem, and that theorem is constructive only if we have access to the eigenvalues of the appropriate self-adjoint operator. Generally, this is not the case. Thus we would like to re-prove the diagonalization by congruence result in a different way. It is satisfying that this can be done algebraically, and the next theorem, known as the Lagrange reduction method, accomplishes this. Theorem 8.8. Let φ be a symmetric bilinear form on a real inner product space U . Then there exists a basis B = { f1 , . . . , fn } of U for which [φ ]B B is diagonal. Proof. Let φ (x, y) be a symmetric bilinear form on U . We prove the theorem by induction on the dimension n of U . If n = 1, then obviously [φ ]B B , being a 1 × 1 matrix, is diagonal. Assume we have proved the theorem for spaces of dimension ≤ n − 1. Let now φ be a symmetric bilinear form in an n-dimensional space U . Assume B = { f1 , . . . , fn } is an arbitrary basis of U . We consider two cases. In the first, we assume that not all the diagonal elements of [φ ]B B are zero, i.e., there exists an index i for which φ ( f i , f i ) = 0. The second case is the complementary one, in which all the φ ( fi , fi ) are zero. Case 1: Without loss of generality, using symmetric transpositions of rows and columns, we will assume that φ ( f1 , f1 ) = 0. We define a new basis B = { f1 , . . . , fn }, where ⎧ i = 1, f , ⎪ ⎪ ⎨ 1 fi = φ ( f1 , fi ) ⎪ ⎪ ⎩ fi − f1 , 1 < i ≤ n. φ ( f1 , f1 ) Clearly, B is indeed a basis for U and
φ ( f1 , fi ) =
⎧ ⎨ φ ( f1 , f1 ), i = 1, ⎩
1 < i ≤ n.
0,
Thus, with the subspace M = L( f2 , . . . , fn ), we have [φ ]
B B
=
φ ( f1 , f1 )
0
0
[φ |M]B0
B
.
0
Here φ |M is the restriction of φ to M and B0 = { f2 , . . . , fn } is a basis for M. The proof, in this case, is completed using the induction hypothesis.
Case 2: Assume now that φ ( fi , fi ) = 0 for all i. If φ ( fi , f j ) = 0 for all i, j, then φ is the zero form, and hence [φ ]B B is the zero matrix, which is certainly diagonal. So we assume there exists a pair of distinct indices i, j for which φ ( fi , f j ) = 0. Without loss of generality, we will assume that i = 1, j = 2. To simplify notation we will write a = φ ( f1 , f2 ). First, we note that defining f1 = f1 +2 f2 and f2 = f1 −2 f2 , we have 1 a φ ( f1 , f1 ) = φ ( f1 + f2 , f1 + f2 ) = , 4 2 1 φ ( f1 , f2 ) = φ ( f1 + f2 , f1 − f2 ) = 0, 4 1 a φ ( f2 , f2 ) = φ ( f1 − f2 , f1 − f2 ) = − . 4 2 We choose the other vectors in the new basis to be of the form fi = fi − αi f1 − βi f2 , with the requirement that for i = 3, . . . , n, we have φ ( f1 , fi ) = φ ( f2 , fi ) = 0. Therefore we get the pair of equations 0 = φ ( f1 + f2 , fi − αi f1 − βi f2 ) = φ ( f1 , fi ) + φ ( f2 , fi ) − αi φ ( f2 , f1 ) − βi φ ( f2 , f1 ) 0 = φ ( f1 − f2 , fi − αi f1 − βi f2 ) = φ ( f1 , fi ) − φ ( f2 , fi ) + αi φ ( f2 , f1 ) − βi φ ( f2 , f1 ). This system has the solution αi = { f1 , . . . , fn }, where
φ ( f2 , fi ) a
and βi =
φ ( f1 , fi ) . a
f1 + f2 , 2 f1 − f2 , f2 = 2 φ ( f2 , fi ) φ ( f1 , fi ) f1 − f2 . fi = fi − a a
We define now B =
f1 =
(8.4)
It is easy to check that B is a basis for U . With respect to this basis we have the block diagonal matrix ⎞
⎛a
[φ ]B B
0 ⎜2 a ⎜0 − =⎜ 2 ⎝
B
[φ |M]B0
⎟ ⎟ ⎟. ⎠
0
Here B0 = { f3 , . . . , fn } and M = L( f3 , . . . , fn ). We complete the proof by applying the induction hypothesis to φ |M. Corollary 8.9. Let φ be a quadratic form defined on a real inner product space U . Let B = { f1 , . . . , fn } be an arbitrary basis of U and let B = {g1 , . . . , gn } be a
basis of U that diagonalizes φ . Let φi = φ (gi , gi ). If ⎞ ξ1 ⎜ ⎟ [x]B = ⎝ ... ⎠ , ⎛
⎞ η1 ⎜ ⎟ = ⎝ ... ⎠ , ⎛
[x]B
ξn
ηn
and [I]B B = (ai j ), then
n
n
i=1
j=1
φ (x, x) = ∑ φi ( ∑ ai j ξ j )2 .
(8.5)
Proof. We have
B B B B B B B B B B φ (x, x) = ([φ ]B B [x] , [x] ) = ([φ ]B [x] , [x] ) = ([φ ]B [I]B [x] , [[I]B x] ).
n B Also [x]B = [I]B B [x] implies ηi = ∑ j=1 ai j ξ j . So
n n B B φ (x, x) = [φ ]B = ∑ φi ηi2 = ∑ φi B [x] , [x] i=1
i=1
n
∑ ai j ξ j
2 .
j=1
We say that (8.5) is a representation of the quadratic form as a sum of squares. Such a representation is far from unique. In different representations, the diagonal elements as well as the linear forms may differ. We have, however, the following result, known as Sylvester’s law of inertia, which characterizes the congruence invariant. Theorem 8.10. Let U be a real inner product space and φ (x, y) a symmetric bilinear form on U . Then there exists a basis B = { f1 , . . . , fn } for U such that the matrix [φ ]B B is diagonal, with φ ( fi , f j ) = εi δi j , with ⎧ ⎨ 1, 0 ≤ i ≤ k, εi = −1, k < i ≤ r, ⎩ 0, r < i ≤ n. Moreover, the numbers k and r are uniquely determined by φ . Proof. By Theorem 8.8, there exists a basis B such that [φ ]B B is real and diagonal, with λ1 , . . . , λn the diagonal elements. Without loss of generality, we can assume 1 λi > 0 for 1 ≤ i ≤ k, λi < 0 for k < i ≤ r and λi = 0 for r < i ≤ n. We set μi = |λi | 2 and εi = sign λi . Hence, we have, for all i, λi = εi μi2 . We consider now the basis 1 B1 = {μ1−1 e1 , . . . , μn−1 en }. Clearly, with respect to this basis, the matrix [φ ]B B1 has the required form.
To prove the uniqueness of k and r, we note first that r is equal to the rank of [ φ ]B B . So we have to prove uniqueness for k only. Suppose we have another basis B = {g1 , . . . , gn } such that φ (g j , gi ) = εi δi j , where ⎧ ⎨ 1, 0 ≤ i ≤ k , εi = −1, k < i ≤ r, ⎩ 0, r < i ≤ n. Without loss of generality, we assume k < k . Given x ∈ U , we have x = ∑ni=1 ξi fi = ∑ni=1 ηi gi , that is, the ξi and ηi are the coordinates of x with respect to B and B respectively. Computing the quadratic form, we get n
n
i=1
i=1
φ (x, x) = ∑ εi ξi2 = ∑ εi ηi2 , or
2 ξ12 + · · · + ξk2 − ξk+1 − · · · − ξr2 = η12 + · · · + ηk2 − ηk2 +1 − · · · − ηr2 .
This can be rewritten as 2 ξ12 + · · · + ξk2 + ηk2 +1 + · · · + ηr2 = η12 + · · · + ηk2 + ξk+1 + · · · + ξr2 .
(8.6)
Now we consider the set of equations ξ1 = · · · = ξk = ηk +1 = · · · = ηr = 0. We have here r − k + k = r − (k − k) < r equations. Thus there exists a nontrivial vector x for which not all the coordinates ξk+1 = · · · = ξr are zero. However, by equation (8.6), we get ξk+1 = · · · = ξr = 0, which is a contradiction. Thus we must have k ≥ k, and by symmetry we get k = k. Definition 8.11. The numbers r and s = 2k − r determined by Sylvester’s law of inertia are called the rank and signature respectively of the quadratic form φ . We will denote the signature of a quadratic form by σ (φ ). Similarly, the signature of a symmetric (Hermitian) matrix A will be denoted by σ (A). The triple (π , ν , δ ), with r−s π = r+s 2 , ν = 2 , δ = n − r, is called the inertia of the form, and denoted by In (A). Corollary 8.12. 1. Two quadratic forms on a real inner product space U are congruent if and only if they have the same rank and signature. 2. Two n × n symmetric matrices are congruent if and only if they have the same rank and signature. Proposition 8.13. Let A be an m × m real symmetric matrix, X an m × n matrix of full row rank, and B the n × n symmetric matrix defined by ˜ B = XAX. Then the rank and signature of A and B coincide.
(8.7)
Proof. The statement concerning ranks is trivial. From (8.7) we obtain the two ˜ Let us choose a basis for Rn that is inclusions Ker B ⊃ Ker X and Im B ⊂ Im X. compatible with the direct sum decomposition ˜ Rn = Ker X ⊕ Im X. In that basis, X = 0 X1 with X1 invertible and B=
00 0 B1
=
0 X˜1
A 0 X1
or B1 = X˜1 AX1 . Thus σ (B) = σ (B1 ) = σ (A), which proves the theorem.
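As a numerical aside (a sketch in Python/numpy with illustrative data, not part of the text), the invariance asserted by Proposition 8.13 and Sylvester's law can be checked by comparing inertias before and after a congruence, here with a square invertible X:

```python
import numpy as np

def inertia(A, tol=1e-10):
    # (number of positive, negative, zero eigenvalues) of a symmetric matrix
    w = np.linalg.eigvalsh(A)
    return (int(np.sum(w > tol)), int(np.sum(w < -tol)), int(np.sum(np.abs(w) <= tol)))

rng = np.random.default_rng(6)
A = np.diag([3.0, 1.0, -2.0, 0.0])        # inertia (2, 1, 1)
X = rng.standard_normal((4, 4))           # invertible with probability one
B = X.T @ A @ X                           # congruent to A

print(inertia(A), inertia(B))             # expected: identical triples
```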
The positive definiteness of a quadratic form can be determined from any of its matrix representations. The following theorem gives a determinantal characterization for positivity. Theorem 8.14. Let φ be a symmetric bilinear form on an n-dimensional vector space U and let [φ ]B B = (φi j ) be its matrix representation with respect to the basis B. Then the following statements are equivalent: 1. φ is positive definite. ˜ 2. [φ ]B B = (φi j ) = X X, with X invertible and lower triangular. 3. All principal minors are positive, i.e., φ 11 . . . φk1
. . . . .
. . . . .
. . . . .
φ1k . . > 0, . φkk
for k = 1, . . . , n. Proof. (2) ⇒ (1) This is obvious. (3) ⇒ (2) We prove this by induction on k.For k = 1, this is immediate. Assume it holds for all i = 1, . . . , k − 1 and that A =
A11 b b˜ d
is k × k with A11 ∈ R(k−1)×(k−1) positive.
By the induction hypothesis, there exists a matrix X ∈ R(k−1)×(k−1) for which A11 = X1 X˜1 , with X1 nonsingular. We write
A11 b b˜ d
=
X1 0 0 1
I X1−1 b 1 −1 d b˜ X
1 0 X , 0 1
with
I v v˜ d
> 0, where v = X1−1 b. Using the computation
I v v˜ d
=
I 0 v˜ d
I 0 0e
I v 01
=
I v , v˜ e + vv ˜
i.e., e + vv ˜ = d and necessarily ˜ > 0, and setting e = ε 2 , it follows that e =d − vv I 0 ˜ where X = X1 0 . A = X X, 0 1
v˜ ε
(1) ⇒ (3) Assume all principal minors have positive determinants. Let (φij) be the matrix of the quadratic form. Let R be the matrix that reduces (φij) to upper triangular form. If we do not normalize the diagonal elements, then R is lower triangular with all elements on the diagonal equal to 1. Thus RφR̃ is diagonal with the determinants of the principal minors on the diagonal. Since RφR̃ is positive by assumption, so is φ.

The same result holds for Hermitian forms, where the transpose is replaced by the Hermitian adjoint.

Theorem 8.15. 1. Let v1, . . . , vk ∈ C^n, λ1, . . . , λk ∈ R, and let A be the Hermitian matrix ∑_{i=1}^k λi vi vi*. Then A is Hermitian congruent to Λ = diag(λ1, . . . , λk, 0, . . . , 0).
2. If A1 and A2 are Hermitian and rank(A1 + A2) = rank(A1) + rank(A2), then
σ (A1 + A2 ) = σ (A1 ) + σ (A2).
Proof. 1. Extend {v1 , . . . , vk } to a basis {v1 , . . . , vn } of Cn . Let V be the matrix whose columns are v1 , . . . , vn . Then V Λ V ∗ = A. 2. Let λ1 , . . . , λk be the nonzero eigenvalues of A1 , with corresponding eigenvectors v1 , . . . , vk and let λk+1 , . . . , λk+l be the nonzero eigenvalues of A2 , with corresponding eigenvectors vk+1 , . . . , vk+l . Thus A1 = ∑ki=1 λi vi v∗i and A2 = ∗ ∑k+l i=k+1 λi vi vi . By our assumption, Im A1 ∩ Im A2 = {0}. Therefore, the vectors v1 , . . . , vk+l are linearly independent. Hence, by part (1), A1 + A2 is Hermitian congruent to diag (λ1 , . . . , λk+l , 0, . . . , 0). In the sequel, we will focus our interest on some important quadratic forms induced by rational functions.
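A computational companion to Theorem 8.14 (a sketch; the helper below is ours, not the book's algorithm): test positive definiteness by the leading principal minors and compare with the eigenvalue criterion.

```python
import numpy as np

def leading_minors_positive(A):
    # Theorem 8.14(3): all leading principal minors are positive
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, A.shape[0] + 1))

rng = np.random.default_rng(7)
M = rng.standard_normal((4, 4))
A = M.T @ M + 0.1 * np.eye(4)             # positive definite by construction

print(leading_minors_positive(A))          # expected: True
print(np.all(np.linalg.eigvalsh(A) > 0))   # agrees with the eigenvalue criterion
```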
8.3 Some Classes of Forms We pass now from the general theory of forms to some very important examples. We will focus mainly on the Hankel and Bezoutian forms and their interrelations. These forms arise out of different representations of rational functions. Since positivity is
8.3 Some Classes of Forms
not essential to the principal results, we assume that the field F is arbitrary. Assume V is a finite-dimensional F-vector space. A bilinear form φ(x, y) on V × V is called symmetric if, for all x, y ∈ V, we have
φ (x, y) = φ (y, x).
(8.8)
8.3.1 Hankel Forms We begin our study of special forms by focusing on Hankel operators and forms. Though our interest is mainly in Hankel operators with rational symbols, we need to enlarge the setting in order to give Kronecker’s characterization of rationality in terms of infinite Hankel matrices. Let g(z) = ∑nj=−∞ g j z j ∈ F((z−1 )) and let π+ and π− be the projections, defined in (1.23), that correspond to the direct sum representation F((z−1 )) = F[z] ⊕ z−1 F[[z−1 ]]. We define the Hankel operator Hg : F[z] −→ z−1 F[[z−1 ]] by Hg f = π− g f ,
f (z) ∈ F[z].
(8.9)
Here g(z) is called the symbol of the Hankel operator. Clearly, Hg is a well-defined linear transformation. We recall, from Chapter 1, that F[z], z−1 F[[z−1 ]] and F((z−1) ) carry also natural F[z]-module structures. In F[z] this is simply given in terms of polynomial multiplication. However, z−1 F[[z−1 ]] is not an F[z]-submodule of F((z−1) ). Therefore, we define for h(z) ∈ z−1 F[[z−1 ]], p · h = π− ph. For the polynomial z we denote the corresponding operators by S+ : F[z] −→ F[z] and S− : z−1 F[[z−1 ]] −→ z−1 F[[z−1 ]], defined by
(S+ f )(z), = z f (z)
f (z) ∈ F[z],
(S− h)(z), = π− zh(z)
h(z) ∈ z−1 F[[z−1 ]].
(8.10)
Proposition 8.16. Let g(z) ∈ F((z−1 )) and let Hg : F[z] −→ z−1 F[[z−1 ]] be the Hankel operator defined in (8.9). Then 1. Hg is an F[z]-module homomorphism. 2. Any F[z]-module homomorphism from F[z] to z−1 F[[z−1 ]] is a Hankel operator. 3. Let g1 (z), g2 (z) ∈ F((z−1 ). Then Hg1 = Hg2 if and only if g1 (z) − g2 (z) ∈ F[z]. Proof. 1. It suffices to show that Hg S+ = S− Hg .
(8.11)
This we do by computing Hg S+ f = π− gz f = π− z(g f ) = π− zπ− (g f ) = S− Hg f . Here we used the fact that S+ Ker π− ⊂ Ker π− . 2. Let H : F[z] −→ z−1 F[[z−1 ]] be an F[z]-module homomorphism. We set g = H1 ∈ z−1 F[[z−1 ]]. Then, since H is a homomorphism, for any polynomial f (z) we have H f = H( f · 1) = f · H1 = π− f g = πg f = Hg f . Thus we have identified H with a Hankel operator. 3. Since Hg1 − Hg2 = Hg1 −g2 , it suffices to show that Hg = 0 if and only if g(z) ∈ F[z]. Assume therefore Hg = 0. Then π− g1 = 0, which means that g(z) ∈ F[z]. The converse is obvious. Equation (8.11) expresses the fact that Hg : F[z] −→ z−1 F[[z−1 ]] is an F[z]-module homomorphism, where both spaces are equipped with the F[z]-module structure. We call (8.11) the functional equation of Hankel operators . Corollary 8.17. Let g(z) ∈ F((z−1 )). Then Hg as a map from F[z] to z−1 F[[z−1 ]] is not invertible. Proof. Were Hg invertible, it would follow from (8.11) that S+ and S− are similar transformations. This is impossible, since for S− each α ∈ F is an eigenvalue, whereas S+ has no eigenvalues at all. Corollary 8.18. Let Hg be a Hankel operator. Then Ker Hg is a submodule of F[z] and Im Hg a submodule of z−1 F[[z−1 ]].
Proof. Follows from (8.11).
Submodules of F[z] are ideals, hence of the form qF[z]. Also, we have a characterization of finite-dimensional submodules of F− (z) ⊂ z−1 F[[z−1 ]], the subspace of rational, strictly proper functions. This leads to Kronecker’s theorem. Theorem 8.19 (Kronecker). Let g(z) ∈ z−1 F[[z−1 ]]. 1. A Hankel operator Hg has finite rank if and only if g(z) is a rational function. p(z) 2. If g(z) = q(z) with p(z), q(z) coprime, then Ker H p = qF[z]
(8.12)
Im H p = X q .
(8.13)
q
and q
Proof. Assume g(z) is rational. So g(z) = p(z)/q(z), and we may assume that p(z) and q(z) are coprime. Clearly qF[z] ⊂ Ker Hg , and because of coprimeness, we actually have the equality qF[z] = Ker Hg . This proves (8.12). Now we have F[z] = Xq ⊕ qF[z] as F-vector spaces and hence Im Hg = {π− q−1 p f | f ∈ Xq }. We compute π− q−1 p f = q−1 (qπ− q−1 p f ) = q−1 π− qp f ∈ X q . Using coprimeness, and Theorem 5.18, we have {πq p f | f ∈ Xq } = Xq and therefore Im Hg = X q , which shows that Hg has finite rank. Conversely, assume Hg has finite rank. So M = Im Hg is a finite-dimensional subspace of z−1 F[[z−1 ]]. Let A = S− |M and let q(z) be the minimal polynomial of A. Therefore, for every f (z) ∈ F[z] we have 0 = π− qπ− g f = π− qg f . Choosing f (z) = 1 we conclude that q(z)g(z) = p(z) is a polynomial and hence g(z) = p(z)/q(z) is rational. In terms of expansions in powers of z, since g(z) = ∑∞j=1 g j z− j , the Hankel operator has the infinite matrix representation ⎛
g1 ⎜g ⎜ 2 ⎜ H = ⎜ g3 ⎜ ⎝. .
g2 g3 g4 . .
g3 g4 g5 . .
. . . . .
⎞ . .⎟ ⎟ ⎟ .⎟. ⎟ .⎠ .
(8.14)
Any infinite matrix (gi j ), satisfying gi j = gi+ j−1 is called a Hankel matrix. Clearly, it is the matrix representation of the Hankel operator Hg with respect to the basis {1, z, z2 , . . .} in F[z] and {z− j | j ≥ 1} in z−1 F[[z−1 ]]. Thus H has finite rank if and only if the associated Hankel operator has finite rank. In terms of Hankel matrices, Kronecker’s theorem can be restated as follows. Theorem 8.20 (Kronecker). An infinite Hankel matrix H has finite rank if and only if its symbol g(z) = ∑∞j=1 g j z− j is a rational function. Definition 8.21. Given a proper rational function g(z), we define its McMillan degree, δ (g), by δ (g) = rank Hg . (8.15) Proposition 8.22. Let g(z) = p(z)/q(z), with p(z), q(z) coprime, be a proper rational function. Then δ (g) = deg q. (8.16) Proof. By Theorem 8.19 we have Im H p/q = X q , and so
δ (g) = rank Hg = dim X q = deg q.
(8.17)
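The following sketch (numpy; the companion-form realization used to generate the expansion coefficients is a standard device, not taken from the text) illustrates Proposition 8.22: for coprime p, q the finite Hankel matrices built from the expansion of p/q have rank deg q.

```python
import numpy as np

# g(z) = p(z)/q(z) with q(z) = z^3 - 6z^2 + 11z - 6 = (z-1)(z-2)(z-3), p(z) = z^2 + 1 (coprime).
q = [-6.0, 11.0, -6.0]          # q_0, q_1, q_2 (monic cubic)
p = [1.0, 0.0, 1.0]             # p_0, p_1, p_2
n = len(q)

# Controllable companion realization: g_j = c A^{j-1} b are the expansion coefficients.
A = np.zeros((n, n)); A[:-1, 1:] = np.eye(n - 1); A[-1, :] = -np.array(q)
b = np.zeros(n); b[-1] = 1.0
c = np.array(p)

k = 5
g = [c @ np.linalg.matrix_power(A, j) @ b for j in range(2 * k)]
H = np.array([[g[i + j] for j in range(k)] for i in range(k)])   # finite section of (8.14)

print(np.linalg.matrix_rank(H))    # expected: 3 = deg q = McMillan degree of g
```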
Proposition 8.23. Let gi (z) = pi (z)/qi (z) be rational functions, with pi (z), qi (z) coprime. Then 1. We have Im Hg1 +g2 ⊂ Im Hg1 + Im Hg2
(8.18)
δ (g1 + g2 ) ≤ δ (g1 ) + δ (g2).
(8.19)
δ (g1 + g2 ) = δ (g1 ) + δ (g2 )
(8.20)
and
2. We have if and only if q1 (z), q2 (z) are coprime. Proof. 1. Obvious. 2. We have g1 (z) + g2(z) =
q1 (z)p2 (z) + q2 (z)p1 (z) . q1 (z)q2 (z)
Assume q1 (z), q2 (z) are coprime. Then by Proposition 5.25, we have X q1 + X q2 = X q1 q2 . On the other hand, the coprimeness of q1 (z), q2 (z) implies that of q1 (z)q2 (z), q1 (z)p2 (z) + q2 (z)p1 (z). Therefore Im Hg1 +g2 = X q1 q2 = X q1 + X q2 = Im Hg1 + ImHg2 . Conversely, assume δ (g1 + g2) = δ (g1 ) + δ (g2 ). This implies Im Hg1 ∩ Im Hg2 = X q1 ∩ X q2 = {0}. But invoking once again Proposition 5.25, this implies the coprimeness of q1 (z), q2 (z). For the special case of the real field, the Hankel operator defines a quadratic form on R[z]. Since R[z]∗ = z−1 R[[z−1 ]], it follows from our definition of self-dual maps that Hg is self-dual. Thus we have also an induced quadratic form on R[z] given by ∞
∞
[Hg f , f ] = ∑ ∑ gi+ j+1 fi f j .
(8.21)
i=0 j=0
Since f (z) is a polynomial, the sum in (8.21) is well defined. Thus, with the Hankel map is associated the infinite Hankel matrix (8.14). With g(z) we also associate the finite Hankel forms Hk , k = 1, 2, . . . , defined by the matrices
⎛
g1 ⎜ . ⎜ ⎜ Hk = ⎜ . ⎜ ⎝ . gk
. . . . .
⎞ . . gk .. . ⎟ ⎟ ⎟ . . . ⎟. ⎟ .. . ⎠ . . g2k−1
In particular, assuming as we did that p(z) and q(z) are coprime and that deg(q) = n, then rank(Hk ) = deg(q) = n = δ (g) for k ≥ n. Naturally, not only the rank information for Hn is derivable from g(z) through its polynomial representation but also the signature information. This is most conveniently done by the use of Bezoutians or the Cauchy index. We turn next to these topics.
8.3.2 Bezoutians A quadratic form is associated with each symmetric matrix. We will focus now on quadratic forms induced by polynomials and rational functions. In this section we present the basic theory of Bezout forms, or Bezoutians. Bezoutians are a special class of quadratic forms that were introduced to facilitate the study of root location of polynomials. Except for the study of signature information, we do not have to restrict the field. The study of Bezoutians is intimately connected to that of Hankel forms. The study of this connection will be undertaken in Section 8.3.5. Let p(z) = ∑ni=0 pi zi and q(z) = ∑ni=0 qi zi be two polynomials and let z and w be two, generally noncommuting, variables. Then p(z)q(w) − q(z)p(w) = ∑ni=0 ∑nj=0 pi q j (zi w j − z j wi ) = ∑0≤i< j≤n (pi q j − qi p j )(zi w j − z j wi ). Observe now that j−i−1
zi w j − z j wi =
∑
ν =0
zi+ν (w − z)w j−ν −1 .
Thus, equation (8.22) can be rewritten as p(z)q(w) − q(z)p(w) =
∑
j−i−1
(pi q j − p j qi )
0≤i< j≤n
∑
ν =0
zi+ν (w − z)w j−ν −1 ,
(8.22)
or n
q(z)p(w) − p(z)q(w) = ∑
n
∑ bi j zi−1 (z − w)w j−1 .
(8.23)
i=1 j=1
The last equality is obtained by changing the order of summation and properly defining the coefficients bi j . Equation (8.23) can also be written as n n q(z)p(w) − p(z)q(w) = ∑ ∑ bi j zi−1 w j−1 . z−w i=1 j=1
(8.24)
Definition 8.24. 1. The matrix (bi j ) defined by equation (8.23) or (8.24) is called the Bezout form associated with the polynomials q(z) and p(z) or just the Bezoutian of q(z) and p(z) and is denoted by B(q, p). The function q(z)p(w)−p(z)q(w) z−w is called the generating function of the Bezoutian form. 2. If g(z) = p(z)/q(z) is the unique coprime representation of a rational function g(z), with q(z) monic, then we will write B(g) for B(q, p). We shall call B(g) the Bezoutian of g(z). It should be noted that equation (8.23) holds even when the variables z and w are noncommuting. This observation is extremely useful in the derivation of matrix representations of Bezoutians. Note that B(q, p) defines a bilinear form on Fn by n
B(q, p)(ξ , η ) = ∑
n
∑ bi j ξi η j ,
ξ , η ∈ Fn .
(8.25)
i=1 j=1
No distinction will be made in the notation between the matrix and the bilinear form. The following theorem summarizes the elementary properties of the Bezoutian. Theorem 8.25. Let q(z), p(z) ∈ F[z] with max(deg q, deg p) ≤ n. Then 1. The Bezoutian matrix B(q, p) is a symmetric matrix. 2. The Bezoutian B(q, p) is linear in q(z) and p(z), separately. 3. B(p, q) = −B(q, p). It is these basic properties of the Bezoutian, in particular the linearity in both variables, that turn the Bezoutian into such a powerful tool. In conjunction with the Euclidean algorithm, it is the starting point for many fast matrix inversion algorithms. Lemma 8.26. Let q(z) and r(z) be two coprime polynomials and p(z) an arbitrary nonzero polynomial. Then
and
rank (B(qp, rp)) = rank (B(q, r))
(8.26)
σ (B(qp, rp)) = σ (B(q, r)).
(8.27)
Proof. The Bezoutian B(qp, rp) is determined by the polynomial expansion of p(z)
q(z)r(w) − r(z)q(w) p(w), z−w
˜ which implies B(qp, pr) = XB(q, r)X with ⎛
p0 . ⎜ . ⎜ ⎜ X =⎜ ⎜ ⎝
. . pm ... ... .. p0
⎞ ⎟ . ⎟ ⎟ .. ⎟. ⎟ ⎠ ... . . . pm
The result follows now by an application of Theorem 8.13 .
The following theorem is central in the study of Bezout and Hankel forms. It gives a characterization of the Bezoutian as a matrix representation of an intertwining map. Thus, the study of Bezoutians is reduced to that of intertwining maps of the form p(Sq ), studied in Chapter 5. These are easier to handle and yield information on Bezoutians. This also leads to a conceptual theory of Bezoutians, in contrast to the purely computational approach that has been prevalent in their study for a long time. Theorem 8.27. Let p(z), q(z) ∈ F[z], with deg p ≤ deg q. Then the Bezoutian B(q, p) of q(z) and p(z) satisfies B(q, p) = [p(Sq )]stco . (8.28) Here Bst and Bco are the standard and control bases of Xq respectively defined in Proposition 5.3. Proof. Note that ⎧ ⎪ 0 ⎪ ⎨ π+ w−k (z − w)w j−1 = −1 ⎪ ⎪ ⎩ (z − w)w j−k−1
j k.
So, from equation (8.23), it follows that n
q(z)π+ w−k p(w) − p(z)π+ w−k q(w)|w=z = ∑ bik zi−1 , i=1
or, with {e1 (z), . . . , en (z)} the control basis of Xq , and pk (z) defined by pk (z) = π+ z−k p(z),
we have n
q(z)pk (z) − p(z)ek (z) = ∑ bik zi−1 . i=1
Applying the projection πq , we obtain n
πq pek = p(Sq )ek = ∑ bik zi−1 ,
(8.29)
i=1
which, from the definition of a matrix representation, completes the proof.
Corollary 8.28. Let p(z), q(z) ∈ F[z], with deg p < deg q. Then the last row and column of the Bezoutian B(q, p) consist of the coefficients of p(z). Proof. By equation (8.29) and the fact that the last element of the control basis satisfies en (z) = 1, we have n
n−1
i=1
i=0
∑ binzi−1 = (πq pen )(z) = (πq p)(z) = p(z) = ∑ pi zi ,
(8.30)
i.e., bin = pi−1 ,
i = 1, . . . , n.
(8.31)
The statement for rows follows from the symmetry of the Bezoutian. As an immediate consequence, we derive some well-known results concerning Bezoutians. Corollary 8.29. Let p(z), q(z) ∈ F[z], with deg p ≤ deg q. Assume p(z) has a factorization p(z) = p1 (z)p2 (z). Let q(z) = ∑ni=0 qi zi be monic, i.e., qn = 1. Let Cq and Cq be the companion matrices of q(z), given by (5.9) and (5.10) respectively, and let K be the Hankel matrix ⎛ ⎞ q1 . . . qn−1 1 ⎜. . ..1 0⎟ ⎜ ⎟ ⎜ ⎟ . .. .⎟ ⎜. K=⎜ (8.32) ⎟. ⎜. . .. .⎟ ⎜ ⎟ ⎝ qn−1 1 ⎠ 1 0... 0 Then 1. 2.
B(q, p1 p2 ) = B(q, p1 )p2 (Cq ) = p1 (Cq )B(q, p2 ).
(8.33)
B(q, p) = K p(Cq ) = p(Cq )K.
(8.34)
3.
B(q, p)Cq = Cq B(p, q).
(8.35)
Proof. Note that for the standard and control bases of Xq , we have Cq = [Sq ]stst and C = C˜ q = [Sq ]co . Note also that the matrix K in equation (8.32) satisfies K = B(q, 1) q
co
and is a Bezoutian as well as a Hankel matrix. 1. This follows from the equality p(Sq ) = p1 (Sq )p2 (Sq ) and the fact that B(q, p1 p2 ) = [(p1 p2 )(Sq )]stco = [p1 (Sq )p2 (Sq )]stco st st = [p1 (Sq )]stco [p2 (Sq )]co co = [p1 (Sq )]st [p2 (Sq )]co .
2. This follows from Part (1) for the trivial factorization p(z) = p(z) · 1. 3. From the commutativity of Sq and p(Sq ), it follows that st st [p(Sq )]stco [Sq ]co co = [Sq ]st [p(Sq )]co . st We note that [Sq ]co co = ([Sq ]st ).
Representation (8.34) for the Bezoutian is sometimes referred to as the Barnett factorization. Already in Section 3.4, we derived a determinantal test, using the resultant, for the coprimeness of two polynomials. Now, with the introduction of the Bezoutian, we have another such test. We can classify the resultant approach as geometric, for basically it gives a condition for a polynomial model to be a direct sum of two shift-invariant subspaces. The Bezoutian approach, on the other hand, is arithmetic, inasmuch as it is based on the invertibility of intertwining maps. We shall see in the sequel yet another such test, given in terms of Hankel matrices, and furthermore, we shall clarify the connection between the various tests. Theorem 8.30. Given two polynomials p(z), q(z) ∈ F[z], then 1. dim(Ker B(q, p)) is equal to the degree of the g.c.d. of q(z) and p(z). 2. B(q, p) is invertible if and only if q(z) and p(z) are coprime. Proof. Part (1) follows from Theorem 5.18. Part (2) follows from Theorem 5.18 and Theorem 8.27.
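As a numerical illustration of the coprimeness test in Theorem 8.30 (a sketch; the entrywise formula below is derived from the generating function (8.24), and the helper names are ours), one can build B(q, p) directly from the coefficients and inspect its rank:

```python
import numpy as np

def bezoutian(q, p):
    """Bezout matrix B(q, p); q, p given as coefficient lists, constant term first."""
    n = max(len(q), len(p)) - 1
    qc = np.zeros(n + 1); qc[:len(q)] = q
    pc = np.zeros(n + 1); pc[:len(p)] = p
    B = np.zeros((n, n))
    for a in range(n):          # row index i = a + 1
        for b in range(n):      # column index j = b + 1
            lo, hi = max(0, a + b + 1 - n), min(a, b)
            B[a, b] = sum(qc[a + b + 1 - v] * pc[v] - pc[a + b + 1 - v] * qc[v]
                          for v in range(lo, hi + 1))
    return B

q = [-6.0, 11.0, -6.0, 1.0]     # q(z) = (z-1)(z-2)(z-3)
p = [ 1.0,  0.0,  1.0]          # p(z) = z^2 + 1, coprime with q
B = bezoutian(q, p)
print(np.allclose(B, B.T), np.linalg.matrix_rank(B))   # symmetric, full rank 3

p_shared = [-2.0, 3.0, -1.0]    # -(z-1)(z-2): shares two roots with q
print(np.linalg.matrix_rank(bezoutian(q, p_shared)))   # expected: 1 = 3 - deg gcd
```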
8.3.3 Representation of Bezoutians Relation (8.23) leads easily to some interesting representation formulas for the Bezoutian. For a polynomial a(z) of degree n, we define the reverse polynomial a (z) by a (z) = a(z−1 )zn . (8.36)
214
8 Tensor Products and Forms
Theorem 8.31. Let p(z) = ∑ni=0 pi zi and q(z) = ∑ni=0 qi zi be polynomials of degree n. Then the Bezoutian has the following representations: ˜ (S) − q(S)p ˜ (S)]J B(q, p) = [p(S)q ˜ − q(S)p(S)] ˜ = J[p(S)q (S) ˜ ˜ = −J[p (S)q(S) − q (S)p(S)] ˜ − q(S)p(S)]J. ˜ = −[p (S)q(S)
(8.37)
Proof. We define the n × n shift matrix S by ⎛
01 ⎜0 . ⎜ ⎜ ⎜0 . S=⎜ ⎜0 . ⎜ ⎝0 . 0 .
. . 1. . . . . . . . .
. . . 1 . .
⎞ 0 0⎟ ⎟ ⎟ 0⎟ ⎟ 0⎟ ⎟ 1⎠ 0
(8.38)
and the n × n transposition matrix J by ⎛
0 ⎜0 ⎜ ⎜ J = ⎜0 ⎜ ⎝0 1
. . . . . . 1. . .
. 1 . . .
⎞ 1 0⎟ ⎟ ⎟ 0⎟. ⎟ 0⎠ 0
(8.39)
˜ From Note also that for an arbitrary polynomial a(z), we have Ja(S)J = a(S). equation (8.23), we easily obtain q(z)p (w) − p(z)q (w) = [q(z)p(w−1 ) − p(z)q(w−1 )]wn = ∑ni=1 ∑nj=1 bi j zi−1 (z − w−1 )w− j+1 wn = ∑ni=1 ∑nj=1 bi j zi−1 (zw − 1)wn− j . In this identity, we substitute now z = S˜ and w = S. Thus, we get for the central term ⎛
1 ⎜0 ⎜ ˜ − I) = − ⎜ (SS ⎜0 ⎜ ⎝0 0
0. 0. 0. 0. 0.
. . . . .
. . . . .
⎞ 0 0⎟ ⎟ ⎟ 0 ⎟, ⎟ 0⎠ 0
8.3 Some Classes of Forms
215
˜ − I)Sn− j is the matrix whose only nonzero term is −1 in the i, n − j and S˜i−1 (SS ˜ (S) − q(S)p ˜ (S)}J. The other position. This implies the identity B(q, p) = {p(S)q identities are similarly derived. From the representations (8.37) of the Bezoutian we obtain, by expanding the matrices, the Gohberg–Semencul formulas, of which we present one only: ˜ (S) − q(S)p ˜ (S))J B(q, p) = (p(S)q ⎞⎛
⎡⎛
p0 ⎢⎜ . ⎢⎜ ⎜ =⎢ ⎢⎜ . ⎣⎝ . pn−1 ⎡⎛
p0 ⎢⎜ . ⎢⎜ ⎜ =⎢ ⎢⎜ . ⎣⎝ . pn−1
p0 . p0 . . p0 . . . p0
⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎠⎝
qn qn−1 . . . .. .. .
⎞ ⎛ q1 q0 ⎜. . ⎟ ⎟ ⎜ ⎜ . ⎟ ⎟ − ⎜. . ⎠ ⎝. qn
⎞⎛
q1 ⎟⎜ . p0 ⎟⎜ ⎟⎜ . . p0 ⎟⎜ ⎠⎝ q . . p0 n−1 . . . p0 qn
qn−1
⎞⎛ . .. . . . q0
⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎠⎝
pn pn−1 . . . .. .. .
⎞⎤ p1 ⎥ . ⎟ ⎟⎥ ⎥ . ⎟ ⎟⎥ J . ⎠⎦ pn
⎞ ⎛ ⎞⎛ . . qn−1 qn q0 p1 ⎟ ⎜. ⎟⎜ . . . qn ⎟ ⎜ ⎟⎜ ⎟ − ⎜. ⎟⎜ . . . . ⎟ ⎜ ⎟⎜ ⎝ ⎠ ⎠⎝ p . .. qn n−1 qn−1 . . . q0 pn
⎞⎤ . . pn−1 pn ⎟⎥ . . pn ⎟⎥ ⎟⎥. . . ⎟⎥ ⎠⎦ pn
(8.40) Given two polynomials p(z) and q(z) of degree n, we let their Sylvester resultant, Res (p, q), be defined by ⎛
p0 . ⎜ . ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ Res (p, q) = ⎜ ⎜ q0 . ⎜ ⎜ . ⎜ ⎜ ⎜ ⎝
. . pn−1 ... ... .. p0 . . qn−1 ... ... .. q0
pn . . . p1 qn . . . q1
⎞ ⎟ . ⎟ ⎟ .. ⎟ ... ⎟ ⎟ ⎟ . . . pn ⎟ ⎟. ⎟ ⎟ ⎟ . ⎟ ⎟ .. ⎟ ... ⎠ . . . qn
(8.41)
It has been proved, in Theorem 3.18, that the resultant Res (p, q) is nonsingular if and only if p(z) and q(z) are coprime. Equation (8.41) can be rewritten as the 2 × 2 block matrix ˜ p(S) p (S) Res (p, q) = ˜ , q(S) q (S) where the reverse polynomials p (z) and q (z) are defined in equation (8.36).
216
8 Tensor Products and Forms
Based on the preceeding, we can state the following. Theorem 8.32. Let p(z) = ∑ni=0 pi zi and q(z) = ∑ni=0 qi zi be polynomials of degree n. Then 0J 0 B(p, q) Res (p, q) Res (p, q) = . (8.42) −J 0 −B(p, q) 0 Proof. By expanding the left side of (8.42) we have
˜ ˜ q(S) ˜ 0 J p(S) p (S) p(S) ˜ q(S) q (S) −J 0 p (S) q (S) ˜ − q(S)J ˜ p (S)) ˜ ˜ ˜ p(S)) (p(S)Jq ˜ (S) (p(S)Jq(S) − q(S)J = ˜ − q(S)J p (S)) ˜ (p (S)Jq(S) − q(S)J p(S)) (p (S)Jq (S)
Now ˜ ˜ p(S) = J[p(S)q(S) − q(S)p(S)] = 0. p(S)Jq(S) − q(S)J Similarly, ˜ − q (S)J p (S) ˜ = [p (S)q (S) − q(S)p (S)]J = 0. p (S)Jq (S) In the same way, ˜ (S) ˜ − q(S)J ˜ p (S) ˜ = J[p(S)q (S) ˜ − q(S)p(S)] ˜ = B(q, p). p(S)Jq Finally, we compute ˜ (S) − q(S)p(S)] ˜ p (S)Jq(S) − q(S)J p(S) = J[p(S)q = −B(q, p).
This proves the theorem.
The nonsingularity of both the Bezoutian and resultant matrices is equivalent to the coprimeness of the two generating polynomials. As an immediate corollary to the previous theorem, we obtain the equality, up to a sign, of their determinants. Yet another determinantal test for coprimeness, given in terms of Hankel matrices, will be given in Corollary 8.41. Corollary 8.33. Let p(z) and q(z) be polynomials of degree n. Then | det Res (p, q)| = | det B(p, q)|.
(8.43)
In particular, the nonsingularity of either matrix implies that of the other. As we already know, these conditions are equivalent to the coprimeness of the polynomials p(z) and q(z). Actually, we can be more precise about the relationship between the determinants of the resultant and Bezoutian matrices.
8.3 Some Classes of Forms
217
Corollary 8.34. Let p(z) and q(z) be polynomials of degree n. Then 1. 2.
n(n+1) 2
detRes (q, p).
(8.44)
det p(Sq ) = det p(Cq ) = det Res (p, q).
(8.45)
det B(q, p) = (−1)
˜ (S) − q(S)p ˜ (S)}J of the Proof. 1. From the representation B(q, p) = {p(S)q Bezoutian we get, by applying Lemma 3.17, det B(q, p) = det Res (q, p) det J. Since B(q, p) = −B(p, q), it follows that det B(q, p) = (−1)n det B(p, q). On the n(n−1)
other hand, for J the n × n matrix defined by (8.39), we have det J = (−1) 2 . Therefore (8.44) follows. 2. We can compute the determinant of a linear transformation in any basis. Thus det p(Sq ) = det p(Cq ) = det p(Cq ). st Also, by Theorem 8.27, B(q, p) = [p(Sq )]stco = [I]stco [p(Sq )]co co , and since det[I]co =
(−1)
n(n−1) 2
, we get det p(Sq ) = (−1)
n(n−1) 2
= (−1)
n(n−1) 2
det B(q, p) (−1)n det B(p, q)
n(n−1) 2
(−1)n (−1) = (−1) = det Res (p, q).
n(n+1) 2
det Res (p, q)
8.3.4 Diagonalization of Bezoutians Since the Bezoutian of a pair of polynomials is a symmetric matrix, it can be diagonalized by a congruence transformation. The diagonalization procedure described here is based on polynomial factorizations and related direct sum decompositions. We begin by analyzing a special case, in which the congruence transformation is given in terms of a Vandermonde matrix. This is related to Proposition 5.15. In the proof of the following proposition, we shall make use of four distinct bases of the polynomial model Xq . These are the standard basis Bst , the control basis Bco , the interpolation basis Bin consisting of the Lagrange interpolation polynomials, and the spectral basis Bsp = {s1 (z), . . . , sn (z)}, where si (z) = Π j=i (z − λ j ). Proposition 8.35. Let q(z) be a monic polynomial of degree n having distinct zeros λ1 , . . . , λn , and let p(z) be a polynomial of degree ≤ n. Let V (λ1 , . . . , λn ) be the Vandermonde matrix
218
8 Tensor Products and Forms
⎛
1 ⎜. ⎜ ⎜ V (λ1 , . . . , λn ) = [I]in st = ⎜ . ⎜ ⎝. 1
λ1 . . . λn
⎞ . . λ1n−1 .. . ⎟ ⎟ ⎟ . . . ⎟. ⎟ .. . ⎠ . . λnn−1
Then the Bezoutian B(q, p) satisfies the following identity: V B(q, p)V˜ = R, where R is the diagonal matrix diag (r1 , . . . , rn ) and ri = p(λi )si (λi ) = p(λi )q (λi ). Proof. The trivial operator identity, I p(Sq )I = p(Sq ), implies the matrix equality st co in [I]in st [p(Sq )]co [I]sp = [p(Sq )]sp .
(8.46)
Since Sq si = λi si , it follows that p(Sq )pi = p(λi )pi = p(λi )pi (λi li . Now si (λi ) = ∏ j=i (λi − λ j ), but q (z) = ∑ni=1 si (z), and hence q (λi ) = ∏ j=i (λi − λ j ) = si (λi ). ∗ = Bin . Thus R = [p(Sq )]in Next, we use the duality relations Bst∗ = Bco and Bsp sp , and the result follows. The previous proposition uses a factorization of q(z) into distinct linear factors. However, such a factorization does not always exist. In case it does exist, then the polynomials pi (z) = ∏ j=i (z − λ j ), which, up to a constant factor, are equal to the Lagrange interpolation polynomials, form a basis for Xq . Moreover, we have Xq = p1 Xz−λ1 ⊕ · · · ⊕ dn Xz−λn . The fact that the sum is direct is implied by the assumption that the zeros λi are distinct, or that the polynomials z − λi are pairwise coprime. In this case the internal direct sum decomposition is isomorphic to the external direct sum Xz−λ1 ⊕ · · · ⊕ Xz−λn . This can be siginificantly extended and generalized. To this end, let q1 (z), . . . , qs (z) be monic polynomials in F[z]. Let Xq1 ⊕ · · · ⊕ Xqs be the external direct sum of the corresponding polynomial models. We would like to compare this space with Xq1 ···qs . We define polynomials by di (z) = ∏ j=i q j (z) and the map Z : Xq1 ⊕ · · · ⊕ Xqs −→ Xq1 ···qs by ⎛
⎞ f1 s ⎜ ⎟ Z ⎝ ... ⎠ = ∑ di fi . fs
(8.47)
i=1
Then we have the following result, related to the Chinese remainder theorem.
8.3 Some Classes of Forms
219
Proposition 8.36. Let Z : Xq1 ⊕ · · · ⊕ Xqs −→ Xq1 ···qs be defined by (8.47). Let BST be the standard basis in Xq1 ···qs and let Bst be the basis of Xq1 ⊕ · · ·⊕ Xqs constructed from the union of the standard bases in the Xqi . Then 1. The adjoint map to Z, that is, Z ∗ : Xq1 ···qs −→ Xq1 ⊕ · · · ⊕ Xqs , is given by ⎛
⎞ π q1 f ⎜ ⎟ Z ∗ f = ⎝ ... ⎠ .
(8.48)
πq s f 2. The map Z defined by (8.47) is invertible if and only if the qi (z) are mutually coprime. 3. For the case s = 2 we have [Z]ST st = Res (q2 , q1 ),
(8.49)
where Res (q2 , q1 ) is the resultant matrix defined in (3.16) and BST is the standard basis of Xq1 ⊕ Xq2 constructed from the union of the standard bases of Xq1 and Xq2 . 4. Assume qi (z) = z − λi. Then ⎛ ⎞ 1 λ1 . . λ1s−1 ⎜. . .. . ⎟ ⎜ ⎟ ⎜ ⎟ [Z ∗ ]stST = ⎜ . . . . . ⎟ . (8.50) ⎜ ⎟ ⎝. . .. . ⎠ 1 λn . . λns−1 5. We have [Z ∗ ]stST = [Z ∗ ]stin [I]in ST and [Z ∗ ]stin = I. Proof. 1. We compute Z f , g = ∑si=1 di fi , g = [q−1 ∑si=1 di fi , g] = ∑si=1 [q−1 di fi , g] s = ∑si=1 [q−1 i f i , g] = ∑i=1 f i , πqi g = f , Z ∗ g. 2. Note that Z ∗ is exactly the map given by the Chinese remainder theorem, namely, it maps an element of Xq into the vector of its remainders modulo the polynomials qi (z). It follows that Z ∗ is an isomorphism if and only if the qi (z) are mutually coprime. Finally, the invertibility of Z ∗ is equivalent to the invertibility of Z. 3. In this case, d1 (z) = q2 (z) and d2 (z) = q1 (z).
220
8 Tensor Products and Forms
4. This follows from the fact that for an arbitrary polynomial p(z), we have πz−λi p = p(λi ). 5. That [I]in ST is the Vandermonde matrix has been proved in Corollary 2.36. That [Z ∗ ]stin = I follows from the fact that for the Lagrange interpolation polynomials, we have πi (λ j ) = δi j . Next, we proceed to study a factorization of the map Z defined in (8.47). This leads to, and generalizes, from a completely different perspective, the classical formula, given by (3.17), for the computation of the determinant of the Vandermonde matrix. Proposition 8.37. Let q1 (z), . . . , qs (z) be polynomials in F[z]. Then, 1. The map Z : Xq1 ⊕ · · · ⊕ Xqs −→ Xq1 ···qs , defined in (8.47), has the factorization Z = Zs−1 · · · Z1 ,
(8.51)
where Zi : Xq1 ···qi ⊕ Xqi+1 ⊕ · · · ⊕ Xqs −→ Xq1 ···qi+1 ⊕ Xqi+2 ⊕ · · · ⊕ Xqs is defined by ⎛
⎛
⎞ (q1 · · · qi ) fi+1 + qi+1 g ⎜ fi+1 ⎟ ⎜ ⎟ fi+2 ⎜ ⎟ ⎜ ⎟ Z⎜ . ⎟ = ⎜ ⎟. .. ⎝ .. ⎠ ⎝ ⎠ . g
⎞
fs
fs
2. Choosing the standard basis in all spaces, we get [Zi ]stst =
Res (qi+1 , qi · · · q1 ) 0 . 0 I
3. For the determinant of [Zi ]stst we have i
det[Zi ]stst = ∏ det Res (qi+1 , q j ). j=1
4. For the determinant of [Z]stst we have det[Z]stst =
∏
det Res (q j , qi ).
(8.52)
1≤i< j≤s
5. For the special case qi (z) = z − λi , we have detV (λ1 , . . . , λs ) =
∏
(λ j − λi).
1≤i< j≤s
(8.53)
8.3 Some Classes of Forms
221
Proof. 1. We prove this by induction. For s = 2 this is trivial. Assume we have proved it for all integers ≤ s. Then we define the map Z : Xq1 ⊕ · · · ⊕ Xqs+1 −→ Xq1 ···qs ⊕ Xqs+1 by ⎛
⎞ f1 ⎜ .. ⎟ s f d ∑ ⎜ ⎟ j j=1 j . Z ⎜ . ⎟ = ⎝ fs ⎠ fs+1 fs+1 Here di (z) = ∏ j=i,s+1 q j (z). Now ds+1 (z) = ∏ j=s+1 q j (z) and qs+1 (z)di (z) = qs+1 (z)
∏
j=i,s+1
q j (z) = ∏ q j (z) = di (z). j=i
So ⎛ ⎞ f1 ⎜ ⎜ .. ⎟ s s+1 ⎜ ⎜ ⎟ Zs Z ⎜ . ⎟ = (q1 · · · qs ) fs+1 + qs+1 ∑ d j f j = ∑ d j f j = Z ⎜ ⎝ ⎝ fs ⎠ j=1 j=1 ⎛
fs+1
⎞ f1 .. ⎟ . ⎟ ⎟. fs ⎠ fs+1
2. We use Proposition 8.36. 3. From equation (8.49) it follows that det[Zi ]stst = det Res (qi+1 , qi · · · q1 ). Now we use (8.45), i.e., the equality det Res (q, p) = det p(Sq ), to get det Res (qi+1 , qi · · · q1 ) = (−1)(∑ j=1 m j )mi+1 det Res (qi · · · q1 , qi+1 ) i
= (−1)(∑ j=1 m j )mi+1 det(qi · · · q1 )(Sqi+1 ) i
= ∏ij=1 (−1)m j mi+1 det(q j )(Sqi+1 ) = ∏ij=1 (−1)m j mi+1 detRes (q j , qi+1 ) = ∏ij=1 det Res (qi+1 , q j ). 4. Clearly, by the factorization (8.51) and the product rule of determinants, we have s−1 i st det[Z]stst = ∏s−1 j=1 det[Zi ]st = ∏ j=1 ∏ j=1 detRes (qi+1 , q j )
= ∏1≤i< j≤s det Res (q j , qi ).
222
8 Tensor Products and Forms
5. By Proposition 8.36, we have in this case V (λ1 , . . . , λs ) = [Z ∗ ]stST . Now det Z ∗ = det Z implies that ST ST detV (λ1 , . . . , λs ) = det[Z ∗ ]stST = det[Z ∗ ]CO co = det [Z]st = det[Z]st .
Also, in this case, −λ −λ j . det Res (z − λi , z − λ j ) = i 1 1 Thus (8.53) follows from (8.50). This agrees with the computation of the Vandermonde determinant given previously in Chapter 3. We can proceed now with the block diagonalization of the Bezoutian. Here we assume an arbitrary factorization of q(z). Proposition 8.38. Let q(z) = q1 (z) · · · qs (z) and let Z : Xq1 ⊕ · · · ⊕ Xqs −→ Xq1 ···qs be the map defined by (8.47). Then 1. The following is a commutative diagram: Xq1 ⊕ · · · ⊕ Xqs
Z
(pd1 )(Sq1 ) ⊕ · · · ⊕ (pds)(Sqk ) ? Xq1 ⊕ · · · ⊕ Xqs
- Xq ···q s 1 p(Sq )
Z∗
? Xq1 ···qs
i.e., Z ∗ p(Sq )Z = (pd1 )(Sq1 ) ⊕ · · · ⊕ (pds )(Sqk ).
(8.54)
2. Let BCO be the control basis in Xq1 ···qs and let Bco be the basis of Xq1 ⊕ · · · ⊕ Xqs constructed from the ordered union of the control bases in the Xqi , i = 1, . . . , s. We have V˜ B(q, p)V = diag (B(q1 , pd1 ), . . . , B(qs , pds )), where di (z) = ∏ j=i q j (z) and V = [Z]CO co .
8.3 Some Classes of Forms
223
Proof. 1. We compute ⎛
⎞ f1 ⎜ ⎟ Z ∗ p(Sq )Z ⎝ ... ⎠ = Z ∗ p(Sq ) ∑sj=1 d j f j = Z ∗ πq p ∑sj=1 d j f j fs ⎛
⎞ ⎛ ⎞ ⎛ ⎞ πq1 ∑sj=1 πq pd j f j (pd1 )(Sq1 ) f1 πq1 πq pd1 f1 ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ .. .. .. =⎝ ⎠=⎝ ⎠=⎝ ⎠ . . .
πqs ∑sj=1 πq pd j f j
πqs πq pds fs
(pds )(Sqs ) fs
⎛
⎞ f1 ⎜ ⎟ = ((pd1 )(Sq1 ) ⊕ · · · ⊕ (pds )(Sqk )) ⎝ ... ⎠ . fs 2. We start from (8.54) and take the following matrix representations: ST st [Z]CO [Z ∗ ]stST [p(Sq )]CO co = [(pd1 )(Sq1 ) ⊕ · · · ⊕ (pds )(Sqk )]co .
The control basis in Xq1 ⊕ · · ·⊕ Xqs refers to the ordered union of the control bases CO . in the Xq . Of course, we use Theorem 4.38 to infer the equality [Z ∗ ]st = [Z] ST
i
co
8.3.5 Bezout and Hankel Matrices Given a strictly proper rational function g(z), we have considered two different p(z) representations of it. The first is the coprime factorization g(z) = q(z) , where we assume deg q = n, whereas the second is the expansion at infinity, i.e., the gi representation g(z) = ∑∞ i=1 zi . We saw that these two representations give rise to two different matrices, namely the Bezoutian matrix on the one hand and the Hankel matrix on the other. Our next goal is to study the relationship between these two objects. We already saw that the analysis of Bezoutians is facilitated by considering them as matrix representations of intertwining maps; see Theorem 5.18. The next theorem interprets the finite Hankel matrix in the same spirit. Theorem 8.39. Let p(z), q(z) be polynomials, with q(z) monic and deg p < deg q = p(z) gi q n. Let g(z) = q(z) = ∑∞ i=1 zi , and let H : Xq −→ X be defined by H f = Hg f ,
f ∈ Xq .
224
8 Tensor Products and Forms
Let ρq : X q −→ Xq be the map defined by (5.38), i.e., ρq h = qh. Let Bst , Bco be n (z) 1 (z) , . . . , eq(z) } be the rational the standard and control bases of Xq and let Brc = { eq(z) q control basis of X defined as the image of the control basis in Xq under the map ρq−1 . Then ⎛
1.
⎞ . . . gn ⎟ .... ⎟ ⎟ .... ⎟. ⎟ ⎠ .... . . . g2n−1
g1 ⎜. ⎜ ⎜ Hn = [H]rc st = ⎜ . ⎜ ⎝. gn
(8.55)
2. Hn is invertible if and only if p(z) and q(z) are coprime. 3. If p(z) and q(z) are coprime, then Hn−1 = B(q, a), where a(z) arises out of any solution to the Bezout equation a(z)p(z) + b(z)q(z) = 1. Proof. 1. We compute, using the representation of πq given in (5.35),
ρq−1 p(Sq ) f = q−1 πq (p f ) = q−1 qπ− q−1 p f = π− g f = Hg f = H f , i.e., we have H = ρq−1 p(Sq ).
(8.56)
This implies that the following diagram is commutative: p(Sq )
Xq
- Xq
@ H
@
@
ρq−1
@
@
?
@ R @
Xq To compute the matrix representation of H, let hi j be the i j element of [H]rc st . Then we have, by the definition of a matrix representation of a linear transformation, that n
Hz j−1 = π− pq−1 z j−1 = ∑ hi j ei /q. i=1
So n
∑ hi j ei = qπ−q−1 pz j−1 = πq pz j−1.
i=1
8.3 Some Classes of Forms
225
Using the fact that Bco is the dual basis of Bst under the pairing (5.64), we have hk j = ∑ni=1 hi j ei , zk−1 = πq pz j−1 , zk−1 = [q−1 qπ− q−1 pz j−1 , zk−1 ] = [π− q−1 pz j−1 , zk−1 ] = [gz j−1 , zk−1 ] = [g, z j+k−2 ] = g j+k−1 . 2. The invertibility of H and (8.55) implies the invertibility of Hn . 3. From the equality H = ρq−1 p(Sq ) it is clear that H is invertible if and only if p(Sq ) is, and this, by Theorem 5.18, is the case if and only if p(z) and q(z) are coprime. The Bezout and Hankel matrices corresponding to a rational function g(z) are both symmetric matrices arising from the same data. It is therefore natural to conjecture that not only are their ranks equal but also the signatures of the corresponding quadratic forms. The next theorem shows that indeed they are congruent, and moreover, it provides the relevant congruence transformations. Theorem 8.40. Let g(z) = p(z)/q(z) with p(z) and q(z) coprime polynomials −i n n−1 + satisfying deg p < deg q = n. Assume g(z) = ∑∞ i=1 gi z and q(z) = z + qn−1 z · · ·+q0 . Then the Hankel matrix Hn and the Bezoutian B(q, p) = (bi j ) are congruent. Specifically, we have 1. ⎛
g1 ⎜. ⎜ ⎜ ⎜. ⎜ ⎝. gn
⎞ . . . gn ⎟ .... ⎟ ⎟ .... ⎟ ⎟ ⎠ .... . . . g2n−1 ⎛
⎜ ⎜ ⎜ =⎜ ⎜ ⎝ 1
⎞⎛ b11 1 ⎜ ⎟ . ψ1 ⎟ ⎜ . ⎟⎜ ... ⎟⎜. ⎟⎜ ⎠⎝. . ... ψ1 . . ψn−1 bn1
⎞⎛ . . . b1n ⎜ .... ⎟ ⎟⎜ ⎟⎜ . . . . ⎟⎜ ⎟⎜ . . . . ⎠⎝ . . . . bnn 1 ψ1
⎞ 1 . ψ1 ⎟ ⎟ ⎟ ... ⎟. ⎟ ⎠ ... . . ψn−1
(8.57)
Here, the ψi are given by the polynomial characterization of Proposition 5.41.
226
8 Tensor Products and Forms
2. In the same way, we have ⎛
b11 ⎜. ⎜ ⎜ ⎜. ⎜ ⎝. bn1
⎞ . . . b1n .... ⎟ ⎟ ⎟ .... ⎟ ⎟ .... ⎠ . . . bnn ⎛
q1 ⎜. ⎜ ⎜ = ⎜. ⎜ ⎝ qn−1 1
. . qn−1 . .1 . . 1
1
⎞⎛
g1 ⎟⎜ . ⎟⎜ ⎟⎜ ⎟⎜ . ⎟⎜ ⎠⎝ . gn
⎞⎛ . . . gn q1 ⎟⎜ . .... ⎟⎜ ⎟⎜ .... ⎟⎜ . ⎟⎜ ⎠ ⎝ qn−1 .... 1 . . . g2n−1
. . qn−1 . .1 . . 1
1
⎞ ⎟ ⎟ ⎟ ⎟. ⎟ ⎠
(8.58)
Note that equation (8.58) can be written more compactly as B(q, p) = B(q, 1)H p/qB(q, 1).
(8.59)
3. We have rank (Hn ) = rank (B(q, p)) and
σ (Hn ) = σ (B(q, p)). Proof. 1. To this end, recall that (8.56) and the definition of the bases Bst , Bco , and Brc enables us to compute a variety of matrix representations, depending on the choice of bases. In particular, it implies −1 rc st co −1 rc co st co [H]rc st = [ρq ]st [p(Sq )]co [I]st = [ρq ]co [I]st [p(Sq )]co [I]st .
It was proved in Theorem 8.39 that ⎛
g1 ⎜. ⎜ ⎜ [H]rc st = Hn = ⎜ . ⎜ ⎝. gn
⎞ . . . gn ⎟ .... ⎟ ⎟ .... ⎟. ⎟ ⎠ .... . . . g2n−1
st Obviously, [ρq−1 ]rc co = I and, using matrix representations for dual maps, [I]co = ∗ ]st = [I] st . Here, we used the fact that B and B are dual bases. So we see [I st co co
co
that [I]stco is symmetric. In fact, it is a Hankel matrix, easily computed to be
8.3 Some Classes of Forms
227
⎛
q1 ⎜. ⎜ ⎜ ⎜. st [I]co = ⎜ ⎜. ⎜ ⎝ qn−1 1
. . . qn−1 .... ... .. .
1
⎞ ⎟ ⎟ ⎟ ⎟ ⎟. ⎟ ⎟ ⎠
Finally, from the identification of the Bezoutian as the matrix representation of an intertwining map for Sq , i.e., from B(q, p) = [p(Sq )]stco , and defining R = [I]co st = co [I]st , we obtain the equality ˜ p)R. Hn = RB(q,
(8.60)
2. Follows from part 1 by inverting the congruence matrix. See Proposition 5.41. 3. Follows from the congruence relations. The following corollary gives another determinantal test for the coprimeness of two polynomials, this time in terms of a Hankel matrix. Corollary 8.41. Let g(z) = p(z)/q(z) with p(z) and q(z) polynomials satisfying −i deg p < deg q = n. If g(z) = ∑∞ i=1 gi z and ⎛
g1 ⎜. ⎜ ⎜ Hn = ⎜ . ⎜ ⎝. gn
⎞ . . . gn ⎟ .... ⎟ ⎟ .... ⎟, ⎟ ⎠ .... . . . g2n−1
then detHn = detB(q, p).
(8.61)
As a consequence, p(z) and q(z) are coprime if and only if det Hn = 0. Proof. Clearly, det R = (−1)n(n−1)/2, and so (detR)2 = 1, and hence (8.61) follows from (8.60). The matrix identities appearing in Theorem 8.40 are interesting also for other reasons. They include in them some other identities for the principal minors of Hn , which we state in the following way.
228
8 Tensor Products and Forms
Corollary 8.42. Under the previous assumption we have, for k ≤ n, ⎛
g1 ⎜. ⎜ ⎜ ⎜. ⎜ ⎝. gk
. . . . .
. . . . .
. . . . .
⎞ gk ⎟ . ⎟ ⎟ . ⎟ ⎟ ⎠ . g2k−1
⎞⎛ b(n−k+1)(n−k+1) 1 ⎟⎜ ⎜ ψ . . ⎜ 1 ⎟⎜ ⎟⎜ ⎜ =⎜ . ⎟⎜ ... ⎟⎜ ⎜ ⎠⎝ ⎝ . ... . bn(n−k+1) 1 ψ1 . . ψk−1 ⎛
⎞⎛ . . . b(n−k+1)n ⎟⎜ ... . ⎟⎜ ⎟⎜ ... . ⎟⎜ ⎟⎜ ⎠⎝ . ... . ... bnn 1 ψ1
⎞ 1 . ψ1 ⎟ ⎟ ⎟ ⎟. ... ⎟ ⎠ ... . . ψk−1 (8.62)
In particular, we obtain the following. Corollary 8.43. The rank of the Bezoutian of the polynomials q(z) and p(z) is equal to the rank of the Hankel matrix H p/q . Proof. Equation (8.62) implies the following equality: det(gi+ j−1 )ki, j=1 = det(bi j )ni, j=n−k+1 .
(8.63)
At this point, we recall the connection, given in equation (8.61), between the determinant of the Bezoutian and that of the resultant of the polynomials p(z) and q(z). In fact, not only can this be made more precise, we can find relations between the minors of the resultant and those of the corresponding Hankel and Bezout matrices. We state this as follows. Theorem 8.44. Let q(z) be of degree n and p(z) of degree ≤ n. Then the lower k × k right-hand corner of the Bezoutian B(q, p) is given by ⎛
b (n−k+1)(n−k+1) ⎜ . ⎜ ⎜ . ⎜ ⎜ ⎝ . bn(n−k+1)
⎞ . . . b(n−k+1)n ⎟ ... . ⎟ ⎟ ... . ⎟ ⎟ ⎠ ... . ... bnn
8.3 Some Classes of Forms
229
⎛
⎞⎛ qn−k+1 . . pn−2k+1 ⎜ ⎟ ... . ⎟⎜ . ⎟⎜ ... . ⎟⎜ . ⎟⎜ ⎠ ⎝ qn−1 ... . . . . pn−k qn
⎞ . qn−1 qn ⎟ . . . ⎟ ⎟ . . ⎟ ⎟ ⎠ qn
⎛
⎞⎛ pn−k+1 . . qn−2k+1 ⎟⎜ . ... ⎟⎜ ⎟⎜ ... ⎟⎜ . ⎟⎜ ⎠ ⎝ pn−1 ... . . . qn−k pn
⎞ . pn−1 pn ⎟ . . . ⎟ ⎟ . . ⎟ ⎟ ⎠ pn
pn−k ⎜ . ⎜ ⎜ =⎜ . ⎜ ⎝ . pn−1 qn−k ⎜ . ⎜ ⎜ −⎜ . ⎜ ⎝ . qn−1 ⎡⎛
⎞⎛ qn qn−1 . . . pn−2k+1 ⎟⎜ . ... . ⎟⎜ ⎟⎜ ... . ⎟⎜ ⎟⎜ ⎠⎝ ... . . . . pn−k
⎛
⎞⎛ pn pn−1 . . . qn−2k+1 ⎟⎜ . ... ⎟⎜ ⎟⎜ ... ⎟⎜ ⎟⎜ ⎠⎝ ... . . . qn−k
pn−k ⎢⎜ . ⎢⎜ ⎢⎜ = ⎢⎜ . ⎢⎜ ⎣⎝ . pn−1 qn−k ⎜ . ⎜ ⎜ −⎜ . ⎜ ⎝ . qn−1
.. .. .. .
⎞ qn−k+1 ⎟ . ⎟ ⎟ . ⎟ ⎟ ⎠ . qn
⎞⎤ . . pn−k+1 ⎟⎥ ... ⎟⎥ ⎟⎥ ... ⎟⎥ Jk , ⎟⎥ ⎠⎦ .. pn
(8.64)
where we take p j = q j = 0 for j < 0. Moreover, we have det(gi+ j )ki, j=1 = det(bi j )ni, j=n−k+1 pn−k . . . p = n−1 pn
qn−k . . . qn−1 qn
pn−k−1 pn−k . . . pn−1 pn
qn−k−1 qn−k . . . . qn
. . pn−k ... ... ... ... pn−1 pn
qn−2k−1 qn−k . . . . . qn−1 qn
(8.65)
230
8 Tensor Products and Forms
Proof. We use equation (8.40) to infer (8.64). Using Lemma 3.17, we have p n−k . . . pn−1 det(bi j )ni, j=n−k+1 = detJk = pn
. . . . . pn−1 .
. . pn−2k+1 ... ... ... . . pn−k . . pn−k+1 ... ... ... . pn−1 pn
qn−k . . . qn−1 qn
. . . qn−2k+1 .... .... .... . . . qn−k . . . qn−k+1 . .... ... ... . qn−1 qn
k(k−1)
Since det Jk = (−1) 2 , we can use this factor to rearrange the columns of the determinant on the right to get (8.65). An application of equation (8.63) yields equality (8.65). Equation (8.65) will be useful in the derivation of the Hurwitz stability test.
8.3.6 Inversion of Hankel Matrices In this section we study the problem of inverting a finite nonsingular Hankel matrix ⎛
g1 ⎜. ⎜ ⎜ Hn = ⎜ . ⎜ ⎝. gn
⎞ . . . gn ⎟ .... ⎟ ⎟ .... ⎟. ⎟ ⎠ .... . . . g2n−1
(8.66)
We already saw, in Theorem 8.39, that if the gi arise out of a rational function −i g(z) = p(z)/q(z), with p(z), q(z) coprime, via the expansion g(z) = ∑∞ i=1 gi z , the corresponding Hankel matrix is invertible. Our aim here is to utilize the special structure of Hankel matrices in order to facilitate the computation of the inverse. What is more, we would like to provide a recursive algorithm, one that uses the information in the nth step for the computation in the next one. This is a natural entry point to the general area of fast algorithms for computing with specially structured matrices, of which the Hankel matrices are just an example. The method we shall employ is to use the coprime factorization of g(z) as the data of the problem. This leads quickly to concrete representations of the inverse. Going back to the data, given in the form of the coefficients g1 , . . . , g2n−1 , we construct all possible rational functions with rank equal to n and whose first 2n − 1 coefficients
8.3 Some Classes of Forms
231
are prescribed by the gi . Such a rational function is called a minimal rational extension of the sequence g1 , . . . , g2n−1 . We use these rational functions to obtain inversion formulas. It turns out, as is to be expected, that the nature of a specific minimal extension is irrelevant as far as the inversion formulas are concerned. Finally, we will study the recursive structure of the solution, that is, how to use the solution to the inversion problem for a given n when the amount of data is increased. Suppose now that rather than having a rational function g(z) as our data, we are given only numbers g1 , . . . , g2n−1 for which the matrix Hn is invertible. We consider i this to correspond to the rational function ∑2n−1 i=1 gi /z . We look now for all other possible minimal rational extensions. It turns out that, preserving the rank condition, there is relatively little freedom in choosing the coefficients gi , i > 2n − 1. In fact, we have the following. Proposition 8.45. Given g1 , . . . , g2n−1 for which the Hankel matrix Hn is invertible, any minimal rational extension is completely determined by the choice of ξ = g2n . Proof. Since Hn is invertible, there exists a unique solution to the equation ⎛
g1 ⎜. ⎜ ⎜ ⎜. ⎜ ⎝. gn
⎞⎛ ⎞ ⎛ ⎞ . . . gn x0 gn+1 ⎟⎜ . ⎟ ⎜ . ⎟ .... ⎟⎜ ⎟ ⎜ ⎟ ⎟⎜ ⎟ ⎜ ⎟ .... ⎟⎜ . ⎟ = ⎜ . ⎟. ⎟⎜ ⎟ ⎜ ⎟ ⎠⎝ . ⎠ ⎝ . ⎠ .... . . . g2n−1 xn−1 g2n
Since the rank of the infinite Hankel matrix does not increase, we have ⎛
g1 ⎜. ⎜ ⎜. ⎜ ⎜. ⎜ ⎜ ⎜ gn ⎜ ⎜ gn+1 ⎜ ⎜. ⎜ ⎝. .
⎞ ⎞ ⎛ . . . gn gn+1 ⎟ ⎟ ⎜ .... ⎟⎛ ⎞ ⎜ . ⎟ ⎟ ⎟ ⎜ x0 .... ⎟ ⎜ . ⎟ ⎜ ⎟ ⎟ ⎜ ⎟ .... ⎟⎜ . ⎟ ⎜ . ⎟ ⎟⎜ ⎟ ⎟ ⎜ . . . g2n−1 ⎟ ⎜ . ⎟ = ⎜ g2n ⎟ , ⎟⎜ ⎟ ⎟ ⎜ ⎝ . ⎠ ⎜ g2n+1 ⎟ . . . g2n ⎟ ⎟ ⎟ ⎜ ⎟ xn−1 ⎜ . ⎟ .... ⎟ ⎟ ⎜ ⎠ ⎝ . ⎠ .... ....
.
which shows that the gi , with i > 2n, are completely determined by the choice of g2n , via the recurrence relation gk = ∑n−1 i=0 xi gk−n+i . Incidentally, it is quite easy to obtain the monic-denominator polynomial of a minimal rational extension. Theorem 8.46. Given g1 , . . . , g2n−1 for which the Hankel matrix Hn is invertible. Let gξ (z) = pξ (z)/qξ (z) be any minimal rational extension of the sequence g1 , . . . , g2n−1 determined by ξ = g2n , pξ (z) and qξ (z) coprime, and qξ (z) monic. Let g(z) = p(z)/q(z) be the extension corresponding to ξ = 0. Also, let a(z) and
232
8 Tensor Products and Forms
b(z) be the unique polynomials of degree < n and < deg p respectively that solve the Bezout equation a(z)p(z) + b(z)q(z) = 1. (8.67) Then, 1. The polynomial a(z) is independent of ξ , and its coefficients are given as a solution of the system of linear equations ⎛
g1 ⎜. ⎜ ⎜ ⎜. ⎜ ⎝. gn
⎛ ⎞ ⎞⎛ ⎞ a0 0 . . . gn ⎜. ⎟ ⎟⎜ . ⎟ .... ⎜ ⎟ ⎟⎜ ⎟ ⎜ ⎟ ⎟⎜ ⎟ .... ⎟⎜ . ⎟ = −⎜. ⎟. ⎜ ⎟ ⎟⎜ ⎟ ⎝0⎠ ⎠⎝ . ⎠ .... . . . g2n−1 1 an−1
(8.68)
2. The polynomial b(z) is independent of ξ and has the representation n
b(z) = − ∑ gi ai (z), i=1
where ai (z), i = 1, . . . , n − 2, are the polynomials defined by ai (z) = π+ z−i a. 3. The coefficients of the polynomial qξ (z) = zn + qn−1 (ξ )zn−1 + · · · + q0 (ξ ) are solutions of the system of linear equations ⎛
g1 ⎜. ⎜ ⎜ ⎜. ⎜ ⎝. gn
⎞⎛ ⎞ ⎛ ⎞ q0 (ξ ) . . . gn gn+1 ⎟⎜ . ⎟ ⎜. ⎟ .... ⎟⎜ ⎟ ⎜ ⎟ ⎟⎜ ⎟ ⎜ ⎟ .... ⎟⎜ . ⎟. ⎟ = −⎜ . ⎟⎜ ⎟ ⎜ ⎟ ⎠⎝ . ⎝ g2n−1 ⎠ ⎠ .... qn−1 (ξ ) . . . g2n−1 ξ
(8.69)
The polynomial qξ can also be given in determinantal form as g 1 . −1 . qξ (z) = (det Hn ) . gn gn+1 4. We have qξ (z) = q(z) − ξ a(z). 5. We have pξ (z) = p(z) + ξ b(z).
. . . gn 1 ... . z ... . . . ... . . . . . g2n−1 zn−1 ... ξ zn
(8.70)
8.3 Some Classes of Forms
233
6. With respect to the control basis polynomials corresponding to qξ (z), i.e., ei (ξ , z) = π+ z−i qξ = qi (ξ ) + qi+1(ξ )z + · · · + zn−i , i = 1, . . . , n, the polynomial pξ (z) has the representation n
pξ (z) = ∑ gi ei (ξ , z).
(8.71)
i=1
Proof. 1. Let pξ (z), qξ (z) be the polynomials in the coprime representation gξ (z) = pξ (z)/qξ (z) corresponding to the choice g2n = ξ . Let a(z) and b(z) be the unique polynomials, assuming dega < deg qξ , that solve the Bezout equation a(z)pξ (z) + b(z)qξ (z) = 1. We rewrite this equation as a(z)
pξ (z) 1 + b(z) = , qξ (z) qξ (z)
(8.72)
and note that the right-hand side has an expansion of the form 1 σn+1 1 = n + n+1 + · · · . qξ (z) z z Equating coefficients of z−i , i = 1, . . . , n, in (8.72), we have a0 g1 + · · · + an−1gn a0 g2 + · · · + an−1gn+1 . . . a0 gn−1 + · · · + an−1g2n−2 a0 gn + · · · + an−1g2n−1
= 0, = 0,
= 0, = −1,
which is equivalent to (8.68). Since g1 , . . . , g2n−1 do not depend on ξ , this shows that the polynomial a(z) is also independent of ξ . 2. Equating nonnegative indexed coefficients in (8.72), we obtain an−1 g1 = −bn−2 , an−1 g2 + an−2 g1 = −bn−3 , . . . an−1 gn−1 + · · · + a1 g1 = −b0 ,
234
8 Tensor Products and Forms
or in matrix form
⎛ ⎞⎛ ⎞ ⎞ g0 bn−2 an−1 ⎜ . ⎟ ⎟⎜ . ⎟ ⎜ . . ⎜ ⎟⎜ ⎟ ⎟ ⎜ ⎜ ⎟⎜ ⎟ ⎟ ⎜ . ⎟⎜ . ⎟ = −⎜ . ⎟. ⎜ . ⎜ ⎟⎜ ⎟ ⎟ ⎜ ⎝ . ⎠ ⎠⎝ . ⎠ ⎝ . . a1 . . . an−1 gn−1 b0 ⎛
(8.73)
This means that the polynomial b(z) is independent of ξ and has the representation b(z) = − ∑ni=1 gi ai (z). 3. From the representation gξ (z) = pξ (z)/qξ (z), we get pξ (z) = qξ (z)gξ (z). In the last equality we equate coefficients of z−1 , . . . , z−n to get the system of equations (8.69). Since Hn is assumed to be invertible, this system has a unique solution, and the solution obviously depends linearly on the parameter ξ . This system can be solved by inverting Hn to get ⎞ ⎛ ⎞ ⎛ ⎞ q0 (ξ ) gn+1 gn+1 ⎟ ⎜. ⎜ . ⎟ ⎜ . ⎟ ⎟ ⎜ ⎜ ⎟ ⎟ adjHn ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ −1 ⎜ ⎟ = −Hn ⎜ . ⎟ = − ⎜. ⎜ . ⎟. ⎟ ⎜ ⎜ ⎟ ⎟ detHn ⎜ ⎠ ⎝. ⎝ g2n−1 ⎠ ⎝ g2n−1 ⎠ ξ ξ qn−1 (ξ ) ⎛
Multiplying on the left by the polynomial row vector 1 z · · ·zn−1 , and using equation (3.14), we get (ξ ) qξ (z) = zn + qn−1(ξ )zn−1 + · · · + q0⎛ ⎞ gn+1 ⎜ . ⎟ ⎜ ⎟ adj H ⎜ ⎟ n = zn − 1 z · · ·zn−1 ⎜ . ⎟ ⎟ det Hn ⎜ ⎝ g2n−1 ⎠ ξ ⎡
⎞⎤ gn+1 ⎢ ⎜ . ⎟⎥ ⎢ ⎜ ⎟⎥ ⎢ ⎜ ⎟⎥ −1 n n−1 = (det Hn ) ⎢det Hn z − 1 z · · ·z adjHn ⎜ . ⎟⎥ ⎢ ⎜ ⎟⎥ ⎣ ⎝ g2n−1 ⎠⎦ ξ g1 . . . gn gn+1 . ... . . . . ... . = (det Hn )−1 , . ... . . gn . . . g2n−1 ξ 1 z . . zn−1 zn
which is equivalent to (8.70).
⎛
8.3 Some Classes of Forms
235
4. By writing the last row of the determinant in (8.70) as
gn+1 . . . ξ zn = gn+1 . . . 0 zn + 0 . . 0 ξ 0 ,
and using the linearity of the determinant as a function of each of its columns, we get g1 . . . gn 1 . ... . z . . ... . qξ (z) = (det Hn )−1 . ... . . gn . . . g2n−1 zn−1 gn+1 . . . ξ zn g1 . . = (det Hn )−1 . gn gn+1 g1 . . +(detHn )−1 . gn 0
. . . gn 1 ... . z ... . . ... . . . . . g2n−1 zn−1 ... 0 zn .. .. .. .. .. ..
. gn 1 . . z . . . . . . . g2n−1 zn−1 0 ξ 0
= q(z) − ξ r(z), where
g 1 . r(z) = . . gn
. . . gn−1 1 ... . z ... . . . ... . . . . . g2n−2 zn−1
We will show that r(z) = a(z). From the Bezout equations a(z)p(z) + b(z)q(z) = 1 and a(z)pξ (z) + b(z)qξ (z) = 1 it follows by subtraction that a(z)(pξ (z) − p(z)) + b(z)(qξ (z) − q(z)) = 0. Since qξ (z) = q(z) − ξ r(z), we must have a(z)(pξ (z) − p(z)) = ξ b(z)r(z). The coprimeness of a(z) and b(z) implies that a(z) is a divisor of r(z). Setting r(z) = a(z)d(z), it follows that pξ (z) = p(z) + ξ b(z)d(z) and qξ (z) = q(z) − ξ a(z)d(z). Now
236
8 Tensor Products and Forms
p(z) + ξ b(z)d(z) p(z) ξ (a(z)p(z) + b(z)q(z))d(z) − = q(z) − ξ a(z)d(z) q(z) q(z)(q(z) − ξ a(z)d(z)) ξ d(z) = . q(z)(q(z) − ξ a(z)d(z))
gξ (z) − g(z) =
Since gξ (z) and g(z) have expansions in powers of z−1 agreeing for the first 2n − 1 terms, and the next term has to be ξ z−2n , this implies d = 1. 5. Follows from pξ (z) = p(z) + ξ b(z)d(z), by substituting d(z) = 1. 6. Equating positively indexed coefficients in the equality pξ (z) = qξ (z)gξ (z), we have pn−1 (ξ ) = g1 , pn−2 (ξ ) = g2 + qn−1 (ξ )g1 , . . . p0 (ξ ) = gn + qn−1 (ξ )gn−1 + · · · + q1 (ξ )g1 , or
⎞ q1 (ξ ) . . . qn−1 (ξ ) 1 ⎛ ⎞ p 0 (ξ ) ⎟ g1 ⎜ . . .. . ⎟⎜ . ⎟ ⎟ ⎜ ⎜ . ⎟⎜ ⎟ ⎟ ⎜ ⎜ . . .. ⎟⎜ ⎟ ⎟ ⎜ ⎜ . ⎟⎜ . ⎟. ⎟=⎜ ⎜ ⎟⎜ ⎟ ⎟ ⎜ ⎜ . . . ⎟⎝ . ⎠ ⎠ ⎜ ⎝ . ⎠ ⎝ qn−1 (ξ ) 1 gn pn−1 (ξ ) 1 ⎛
⎞
⎛
This is equivalent to pξ (z) = g1 e1 (ξ , z) + · · · + gn en (ξ , z),
which proves (8.71).
Incidentally, parts (4) and (5) of the theorem provide a nice parametrization of all solutions to the minimal partial realization problem arising out of a nonsingular Hankel matrix. Corollary 8.47. Given the nonsingular Hankel matrix Hn , if g(z) = p(z)/q(z) is the minimal rational extension corresponding to the choice g2n = 0, then the minimal rational extension corresponding to g2n = ξ is given by gξ (z) =
p(z) + ξ b(z) , q(z) − ξ a(z)
where a(z), b(z) are the polynomials arising from the solution of the Bezout equation a(z)p(z) + b(z)q(z) = 1. This theorem explains the mechanism of how the particular minimal rational extension chosen does not influence the computation of the inverse, as of course it
8.3 Some Classes of Forms
237
should not. Indeed, we see that B(qξ , a) = B(q − ξ a, a) = B(q, a) − ξ B(a, a) = B(q, a), and so Hn−1 = B(qξ , a) = B(q, a).
8.3.7 Continued Fractions and Orthogonal Polynomials The Euclidean algorithm provides a tool not only for the computation of a g.c.d. of a pair of polynomials but also, if coprimeness is assumed, for the solution of the corresponding Bezout equation. The application of the Euclidean algorithm yields a representation of a rational function in the form of a continued fraction. It should be noted that a slight modification of the process makes the method applicable to an arbitrary (formal) power series. This is the theme of the present section. Given an irreducible representation p(z)/q(z) of a strictly proper rational function g(z), we define, using the division rule for polynomials, a sequence of polynomials qi (z), nonzero constants βi , and monic polynomials ai (z), by
q−1 (z) = q(z), q0 (z) = p(z), qi+1 (z) = ai+1 (z)qi (z) − βi qi−1 (z),
(8.74)
with deg qi+1 < deg qi . The procedure ends when qn = 0, and then qn−1 is the g.c.d. of p(z) and q(z). Since p(z) and q(z) are assumed coprime, qn−1 (z) is a nonzero constant. The pairs {βi−1 , ai (z)} are called the atoms of g(z). We define αi = deg ai . For i = 0, we rewrite equation (8.74) as
β0 p q0 = = . q q−1 a1 − q1 q0 Thus g1 = q1 /q0 is also a strictly proper rational function, and furthermore, this is an irreducible representation. By an inductive argument, this leads to the continued fraction representation of g(z) in terms of the atoms βi−1 and ai (z):
β0
g(z) =
.
β1
a1 (z) − a2 (z) −
β2 . a3 (z) − . .
βn−2 an−1 (z) −
βn−1 an (z)
(8.75)
238
8 Tensor Products and Forms
There are two ways to truncate the continued fraction. The first one is to consider the top part, i.e., let
β0
gi (z) =
.
β1
a1 (z) −
(8.76)
β2
a2 (z) −
βi−2
..
a3 (z) − .
ai−1 (z) −
βi−1 ai (z)
The other option is to consider the rational functions γi defined by the bottom part, namely βi . (8.77) γi (z) = βi+1 ai+1 (z) − βn−2 . ai+2 (z) − . . βn−1 an−1 (z) − an (z) p(z) . We also set γn (z) = 0. Clearly, we have gn (z) = γ0 (z) = g(z) = q(z) Just as a rational function determines a unique continued fraction, it is also completely determined by it.
Proposition 8.48. Let g(z) be a rational function having the continued fraction representation (8.75). Then {βi , ai+1 }n−1 i=0 are the atoms of g(z). Proof. Assume g(z) = p(z)/q(z). Then g(z) =
β0 p(z) = , q(z) a1 − γ1
with γi strictly proper. This implies p(z)γ1 = a1 (z)p(z) − β0 q(z). This shows that r(z) = p(z)γ1 (z) is a polynomial, and since γ1 (z) is strictly proper, we have deg r < deg p. It follows that {β0 , a1 (z)} is the first atom of g(z). The rest follows by induction. Fractional linear transformations are crucial to the analysis of continued fractions and play a central role in a wide variety of parametrization problems. Definition 8.49. A fractional linear transformation, or a M¨obius transformation, is a function of the form f (w) =
aw + b , cw + d
8.3 Some Classes of Forms
239
under the extra condition ad − bc = 0. With this fractional linear transformation we associate the matrix
ab cd
.
The following lemma relates composition of fractional linear transformations to matrix multiplication. w+bi , i = 1, 2, be two fractional linear transformations. Lemma 8.50. Let fi (w) = aciiw+d i Then their composition is given by ( f1 ◦ f 2 )(w) = aw+b cw+d , where
ab cd
=
a 1 b1 c 1 d1
a 2 b2 c2 d2
.
Proof. By computation.
We define now two sequences of polynomials {Qk (z)} and {Pk (z)} by the three term recurrence formulas Q−1 (z) = 0, Q0 (z) = 1, (8.78) Qk+1 (z) = ak+1 (z)Qk (z) − βk Qk−1 (z), and
P−1 (z) = −1, P0 (z) = 0, Pk+1 (z) = ak+1 (z)Pk (z) − βk Pk−1 (z).
(8.79)
The polynomials Qi (z) and Pi (z) so defined are called the Lanczos polynomials of the first and second kind respectively. Theorem 8.51. Let g(z) be a strictly proper rational function having the continued fraction representation (8.75). Let the Lanczos polynomials be defined by (8.78) and (8.79), and define the polynomials Ak (z), Bk (z) by Ak (z) =
Qk−1 (z) , β0 · · · βk−1
−Pk−1(z) Bk (z) = . β0 · · · βk−1
(8.80)
Then 1. For the rational functions γi (z), i = 1, . . . , n, we have ai+1 (z)γi (z) − βi = γi (z)γi+1 (z).
(8.81)
2. Defining the rational functions Ei (z) by Ei (z) = Qi (z)Pn (z)/Qn (z) − Pi (z) = (Qi (z)Pn (z) − Pi (z)Qn (z))/Qn (z), (8.82)
240
8 Tensor Products and Forms
they satisfy the following recurrence:
E−1 = 1, E0 = g0 = g, Ek+1 (z) = ak+1 (z)Ek (z) − βk Ek−1 (z).
3. We have
(8.83)
Ei (z) = γ0 (z) · · · γi (z).
(8.84)
4. In the expansion of Ei (z) in powers of z−1 , the leading terms is
β0 ···βi . zα1 +···+αi+1
5. With αi = deg ai , we have deg Qk = ∑ki=1 αi and deg Pk = ∑ki=2 αi . 6. The polynomials Ak (z), Bk (z) are coprime and solve the Bezout equation Pk (z)Ak (z) + Qk (z)Bk (z) = 1.
(8.85)
7. The rational function Pk (z)/Qk (z) is strictly proper, and the expansions of p(z)/q(z) and Pk (z)/Qk (z) in powers of z−1 agree up to order 2 ∑ki=1 deg ai + deg ak+1 = 2 ∑ki=1 αi + αi+1 . 8. For the unimodular matrices
0 β0 −1 a1
0 βi −1 ai+1
, i = 0, . . . , n − 1, we have
0 βi−1 ··· −1 ai
=
−Pi−1 Pi −Qi−1 Qi
.
9. Let Ti bethe fractional linear transformation corresponding to the matrix 0 βi . Then −1 ai+1
g(z) = (T0 ◦ · · · ◦ Ti−1 )(γi ) =
Pi − Pi−1 γi . Qi − Qi−1 γi
(8.86)
In particular, g(z) = (T0 ◦ · · · ◦ Tn−1)(γn ) = Pn (z)/Qn (z), i.e., we have Pn (z) = p(z), 10. The 11. The
Qn (z) = q(z).
atoms of Pk (z)/Qk (z) are {βi−1 , ai (z)}ki=1 . generating function of the Bezoutian of Qk (z) Qk (z)Ak (w) − Ak (z)Qk (w) z−w = ∑kj=1 (β0 · · · β j−1 )−1 Q j (z)
(8.87)
and Ak (z) satisfies
a j+1 (z) − a j+1(w) Q j (w). z−w
(8.88)
8.3 Some Classes of Forms
241
The formula for the expansion of the generating function is called the generalized Christoffel–Darboux formula . The regular Christoffel–Darboux formula is the generic case, in which all the atoms a j have degree one. 12. The Bezoutian matrix B(Qk , Ak ) can be written as B(Qk , Ak ) =
k−1
∑ (β0 · · · β j−1)−1 R˜ j , B(a j+1, 1)R j ,
j=1
where R j is the n j × n Toeplitz matrix ⎛
( j)
( j)
⎞
( j)
σ0 σ1 . . . σn j ⎜ . ... . . ⎜ ⎜ ... . . ⎜ Rj = ⎜ ⎜ .. . . ⎜ ⎝ . . . ( j) ( j) σ0 σ1
. .. ... ( j) . . . σn j
⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠
(8.89)
( j)
and Q j (z) = ∑ σν zν . 13. Assume the Hankel matrix Hn of (8.66) to be nonsingular. Let g = p/q be any minimal rational extension of the sequence g1 , . . . , g2n−1 . Then Hn−1 =
n
1
∑ β0 · · · β j−1 R˜ j B(1, a j+1)R j .
j=1
Proof. 1. Clearly, by the definition of the γi (z), we have, for i = 1, . . . , n − 1,
γi (z) =
βi , ai+1 (z) − γi+1 (z)
(8.90)
which can be rewritten as (8.81). Since γn−1 (z) = βn−1 /an (z) and γn (z) = 0, equation (8.81) holds also for i = n. 2. We compute first the initial conditions, using the initial conditions of the Lanczos polynomials. Thus E−1 (z) = Q−1 (z)Pn (z)/Qn (z) − P−1(z) = 1, E0 (z) = Q0 (z)Pn (z)/Qn (z) − P0 (z) = Pn (z)/Qn (z) = γ0 (z) = g(z). Next, we compute
242
8 Tensor Products and Forms
Ei+1 (z) = Qi+1 (z)Pn (z)/Qn (z) − Pi+1 (z) = (Qi+1 (z)Pn (z) − Pi+1 (z)Qn (z))/Qn (z) = [(ai+1 (z)Qi (z) − βi Qi−1 (z))Pn (z) − (ai+1 (z)Pi (z) − βi Pi−1 (z))Qn (z)] /Qn (z) = ai+1 (z)(Qi (z)Pn (z) − Pi (z)Qn (z))/Qn (z) − βi (Qi−1 (z)Pn (z) − Pi−1 (z)Qn (z))/Qn (z) = ai+1 (z)Ei (z) − βi Ei−1 (z). 3. We use the recurrence formula (8.83) and equation (8.81). The proof goes by induction. For i = 1 we have E1 = a1 E0 − β0 = a1 γ0 − β0 = γ0 γ1 . Assuming that Ei (z) = γ0 (z) · · · γi (z), we compute Ei+1 (z) = ai+1 Ei (z) − βi Ei−1 (z) = ai+1 (z)γ0 (z) · · · γi (z) − βi γ0 (z) · · · γi−1 (z) = γ0 (z) · · · γi−1 (z)(ai+1 (z)γi (z) − βi ) = γ0 (z) · · · γi−1 (z)(γi (z)γi+1 (z)). 4. From (8.90) we have βi /zαi+1 for the leading term of the expansion of γi . Since Ei (z) = γ0 (z) · · · γi (z), we have β0 · · · βi /zα1 +···+αi+1 = β0 · · · βi /zνi+1 for the leading term of the expansion of Ei (z). 5. By induction, using the recurrence formulas as well as the initial conditions. 6. We compute the expression Qk+1 (z)Pk (z) − Qk (z)Pk+1 (z) using the defining recurrence formulas. So Qk+1 (z)Pk (z) − Qk (z)Pk+1 (z) = (ak+1 (z)Qk (z) − βk Qk−1 (z))Pk (z) −(ak+1 (z)Pk (z) − βk Pk−1 (z))Qk (z) = βk (Qk (z)Pk−1 (z) − Qk−1 (z)Pk (z)), and by induction, this implies Qk+1 (z)Pk (z) − Qk (z)Pk+1 (z) = βk · · · β0 (Q0 (z)P−1 (z) − Q−1 (z)P0 (z)) = − βk · · · β0 . Therefore, for each k, Qk (z) and Pk (z) are coprime. Moreover, dividing by −βk · · · β0 , we see that the following Bezout identities hold: Pk (z)Ak (z) + Qk (z)Bk (z) = 1. 7. With the Ei (z) defined by (8.82), we compute Pn (z)/Qn (z) − Pi (z)/Qi (z) = (Pn (z)Qi (z) − Qn (z)Pi (z))/Qn (z)Qi (z) = Ei (z)/Qi (z).
8.3 Some Classes of Forms
243
Since deg Qi = ∑ij=1 α j and using part (4), it follows that the leading term of Ei (z)/Qi (z) is β0 · · · βi /z2 ∑ j=1 α j +αi+1 . This is in agreement with the statement of the theorem, since deg ai = ∑ij=1 α j . 8. We will prove it by induction. Indeed, for i = 0, we have i
0 β0 −1 a1
=
−P0 P1 −Q0 Q1
.
Next, using the induction hypothesis, we compute
−Pi−1 Pi −Qi−1 Qi
0 βi −1 ai+1
=
−Pi ai+1 Pi − βiPi−1 −Qi ai+1 Qi − βiQi−1
=
−Pi Pi+1 −Qi Qi+1
.
9. We clearly have g = γ0 = T0 (γ1 ) = T0 (T1 (γ2 )) = (T0 ◦ T1)(γ2 ) = · · · = (T0 ◦ · · · ◦ Ti−1 )(γi ). To prove (8.87), we use the fact that γn = 0. 10. From (8.76), and using (8.86), we get gk = (T0 ◦ · · · ◦ Tk−1 )(0) = Pk /Qk . 11. Obviously, the Bezout identity (8.85) implies also the coprimeness of the polynomials Ak (z) and Qk (z). Since the Bezoutian is linear in each of its arguments we can get a recursive formula for B(Ak , Qk ): B(Qk , Ak ) = B(Qk , (β0 · · · βk−1 )−1 Qk−1 ) = B(ak Qk−1 − βk−1 Qk−2 , (β0 · · · βk−1 )−1 Qk−1 ) = (β0 · · · βk−1 )−1 Qk−1 (z)B(ak , 1)Qk−1 (w) + B(Qk−1 , (β0 · · · βk−2 )−1 Qk−2 ) = · · · = ∑(β0 · · · βk−1 )−1 Q j (z)B(a j+1 , 1)Q j (w). 12. This is just the matrix representation of (8.88). 13. We use Theorem 8.39, the equalities Qn (z) = q(z), Pn (z) = p(z), and the Bezout equation (8.85) for k = n. A continued fraction representation is an extremely convenient tool for the computation of the signature of the Hankel or Bezout quadratic forms. Theorem 8.52 (Frobenius). Let g(z) be rational having the continued fraction representation (8.75). Then r i−1 1 + (−1)degai −1 . (8.91) σ (Hg ) = ∑ sign ∏ β j 2 j=0 i=1
244
8 Tensor Products and Forms
Proof. It suffices to compute the signature of the Bezoutian B(q, p). Using the equations of the Euclidean algorithm, we have a1 q0 − q 1 , q0 β0 q0 (z)B(a1 , 1)q0 (w) + β1 B(q0 , q1 )
B(q, p) = B(q−1 , q0 ) = B =
1 β0
0
Here we used the fact that the Bezoutian is alternating. Since p(z) and q(z) are coprime, we have rank B(a1 , 1) + β1 rank B(q0 , q1 ) = deg a1 + degq0 = deg q = rank B(q, p). 0
The additivity of the ranks implies, by Theorem 8.15, the additivity of the signatures, and therefore 1 1 σ (B(q, p)) = σ B(a1 , 1) + σ B(q0 , q1 ) . β0 β0 We proceed by induction and readily derive r
σ (B(p, q)) = ∑ sign
i=1
1 β0 · · · βi
σ (B(ai+1 , 1)).
Now, given a polynomial c(z) = zk + ck−1 zk−1 + · · · + c0 , we have B(c, 1) =
k zi − wi c(z) − c(w) = ∑ ci , z−w z−w i=1
with ck = 1. Using the equality zi − wi = z−w we have
⎛
i
∑ z j wi− j−1 ,
j=1
c1 ⎜c ⎜ 2 ⎜ ⎜. B(c, 1) = ⎜ ⎜. ⎜ ⎝ ci−1 1
c2 . . . 1
. . ci−1 ... .. .
1
⎞ ⎟ ⎟ ⎟ ⎟ ⎟. ⎟ ⎟ ⎠
(8.92)
8.3 Some Classes of Forms
Hence
245
1
if deg c is odd,
0
if deg c is even.
Equality (8.92) can now be rewritten as (8.91).
The Lanczos polynomials Qi (z) generated by the recurrence relation (8.78) satisfy certain orthogonality conditions. These orthogonality conditions depend of course on the rational function g(z) from which the polynomials were derived. These orthogonality relations are related to an inner product induced by the Hankel operator Hg . We will explain now this relation. Starting from the strictly proper rational function g(z), we have the bilinear form defined on the space of polynomials by {x, y} = [Hg x, y] = ∑ ∑ gi+ j ξi η j , i
(8.93)
j
where x(z) = ∑i ξi zi and y(z) = ∑ j η j z j . If g(z) has the coprime representation g(z) = p(z)/q(z), then KerHg = qF[z], and hence we may as well restrict ourselves to the study of Hg as a map from Xq to X q = Xq∗ . However, we can view all the action as taking place in Xq , using the pairing ·, · defined in (5.64). Thus, for x(z), y(z) ∈ Xq , we have [Hg x, y] = [π− q−1 px, y] = [q−1 qπ− q−1 px, y] = πq px, y = p(Sq )x, y. In fact, p(Sq ) is, as a result of Theorem 5.39, a self-adjoint operator in Xq . The pairing {x, y} = p(Sq )x, y can of course be evaluated by choosing an appropriate basis. In fact, we have st st co p(Sq )x, y = ([p(Sq )]co st [x] , [y] ). Since [p(Sq )]st = Hn , we recover (8.93). Proposition 8.53. Let Pi (z), Qi (z) be the Lanczos polynomials associated with the rational function g(z). Relative to the pairing {x, y} = p(Sq )x, y = [Hg x, y] = ∑ ∑ gi+ j ξi η j , i
(8.94)
j
the polynomial Qi (z) is orthogonal to all polynomials of lower degree. Specifically, {Qi , z j } = [Hg Qi , z j ] =
0, j < α1 + · · · + αi , β 0 · · · β i , j = α1 + · · · + αi .
(8.95)
Proof. Recalling that the rational functions Ei (z) are defined in (8.82) by Ei (z) = g(z)Qi (z) − Pi (z), this means that
246
8 Tensor Products and Forms
π− Ei = π− gQi = Hg Qi . Since the leading term of Ei (z) is
β0 ···βi , zα1 +···+αi+1
this implies (8.95).
i−1 The set {z ji (Πk=0 Qk (z)) | i = 1, . . . , r − 1; ji = 0, . . . , ni − 1}, lexicographically ordered, is clearly a basis for Xq since it contains one polynomial for each degree between 0 and n − 1. We denote it by Bor and refer to it as the orthogonal basis because of its relation to orthogonal polynomials. The properties of the Lanczos polynomials can be used to compute a matrix representation of the shift Sq relative to this basis. p(z) Theorem 8.54. Let the strictly proper transfer function g(z) = q(z) have the r sequence of atoms {ai (z), βi−1 }i=1 , and assume that α1 , . . . , αr are the degrees of the atoms and
ak (z) = zαk +
αk −1
∑
(k)
ai zi .
i=0
Then the shift operator Sq has the following, block tridiagonal, matrix representation with respect to the orthogonal basis: ⎛
⎞ A11 A12 ⎜A A . ⎟ ⎜ 21 22 ⎟ ⎜ ⎟ or [Sq ]or = ⎜ . .. ⎟, ⎜ ⎟ ⎝ .. Ar−1 r ⎠ Ar r−1 Arr where
⎛
0.. . ⎜1 ⎜ ⎜ Aii = ⎜ . ⎜ . ⎝ 1 ⎛ 0..0 ⎜. ⎜ ⎜ Ai+1 i = ⎜ . ⎜ ⎝. 0.. .
⎞ (i) −a0 . ⎟ ⎟ ⎟ . ⎟ , i = 1, . . . r, ⎟ . ⎠ (i) −a ⎞αi−1 1 0⎟ ⎟ ⎟ . ⎟ , i = 1, . . . r − 1, ⎟ .⎠ 0
and Ai i+1 = βi−1 Ai+1 i . Proof. On the basis elements zi Q j−1 (z), i = 0, . . . , n j − 2, j = 1, . . . , r − 1, the map Sq indeed acts as the shift to the next basis element. For i = n j − 1 we compute
8.3 Some Classes of Forms
247
Sq zn j −1 Q j−1 = πq zn j Q j−1
n j −1 ( j) k n j −1 ( j) k = πq zn j + ∑k=0 ak z − ∑k=0 ak z Q j−1 n −1 ( j)
j = a j Q j−1 − ∑k=0 ak zk · Q j−1 n j −1 ( j) k = Q j + β j−1Q j−2 − ∑k=0 ak z · Q j−1 .
In the generic case, in which all Hankel matrices Hi , i = 1, . . . , n, are nonsingular, all polynomials ai (z) are linear, and we write ai (z) = z − θi . The recurrence equation for the orthogonal polynomials becomes Qi+1 (z) = (z − θi+1 )Qi (z) − βi Qi−1 (z), and we obtain the following corollary. Corollary 8.55. Assuming all Hankel matrices Hi , i = 1, . . . , n, are nonsingular, then with respect to the orthogonal basis {Q0 (z), . . . , Qn−1 (z)} of Xq = XQn , the shift operator Sq has the tridiagonal matrix representation ⎛
⎞ θ1 β 1 ⎜1 θ . ⎟ 2 ⎜ ⎟ ⎜ ⎟ . .. ⎜ ⎟ or [Sq ]or = ⎜ ⎟. ⎜ ⎟ .. . ⎜ ⎟ ⎝ . . βn−1 ⎠ 1 θn The Hankel matrix Hn is symmetric, hence can be diagonalized by a congruence transformation. In fact, we can exhibit explicitly a diagonalizing congruence. Using the same method, we can clarify the connection between Cq , the companion matrix of q(z), and the tridiagonal representation. Proposition 8.56. Let g(z) = p(z)/q(z) be strictly proper, rational of McMillan degree n = degq. Assume all Hankel matrices Hi , i = 1, . . . , n, are nonsingular and let {Q0 (z), . . . , Qn−1 (z)} be the orthogonal basis of Xq . Then 1.
⎛
⎞⎛ q0,0 g1 ⎜ q ⎜ ⎟ ⎜ 1,0 q1,1 ⎟⎜. ⎜ ⎟⎜ . ⎜ . ⎟⎜. ⎜ ⎟⎜ ⎝ . ⎠⎝. . gn qn−1,0 . . .qn−1,n−1 ⎛ ⎜ ⎜ ⎜ =⎜ ⎜ ⎝
β0
⎞⎛ ⎞ . . . gn q0,0 q1,0 . . qn−1,0 ⎟⎜ ⎟ .... q1,1 . ⎟⎜ ⎟ ⎟⎜ ⎟ .... . . ⎟⎜ ⎟ ⎟⎜ ⎟ ⎠⎝ ⎠ .... . . . . . g2n−1 qn−1,n−1
⎞
β 0 β1 . .
β0 · · · βn−1
⎟ ⎟ ⎟ ⎟. ⎟ ⎠ (8.96)
248
8 Tensor Products and Forms
2. ⎛
⎞⎛ ⎞ q0,0 q1,0 . . qn−1,0 −q0 ⎜ ⎟ · ⎟ q1,1 . ⎟⎜ ⎟ ⎟⎜ ⎟ · ⎟⎜ . . ⎟ ⎟⎜ ⎟ ⎠ · ⎠⎝ . . 1 −qn−1 qn−1,n−1 ⎞⎛ ⎞ ⎛ q0,0 q1,0 . . qn−1,0 θ1 β1 ⎟⎜ 1 θ . ⎟ ⎜ q1,1 . ⎟⎜ ⎟ ⎜ 2 ⎟⎜ ⎟ ⎜ =⎜ . . . . ⎟⎜ ⎟. ⎟⎜ ⎟ ⎜ ⎠⎝ ⎝ . . . . βn−1 ⎠ 1 θn qn−1,n−1
0 ⎜1 ⎜ ⎜ ⎜ . ⎜ ⎝ .
(8.97)
Proof. 1. This is the matrix representation of the orthogonality relations given in (8.95). 2. We use the trivial operator identity Sq I = ISq and take the matrix representation [Sq ]stst [I]stor = [I]stor [Sq ]or or . This is equivalent to (8.97). It is easy to read off from the diagonal representation (8.96) of the Hankel matrix the relevant signature information. A special case is the following. Corollary 8.57. Let g(z) be a rational function with the continued fraction representation (8.75). Then Hn is positive definite if and only if all ai (z) are linear and all the βi are positive.
8.3.8 The Cauchy Index Let g(z) be a rational transfer function with real coefficients having the coprime representation g(z) = p(z)/q(z). The Cauchy index of g(z), denoted by Ig , is defined as the number of jumps, on the real axis, of g(z) from −∞ to +∞ minus the number of jumps from +∞ to −∞. Thus, the Cauchy index is related to discontinuities of g(z) on the real axis. These discontinuities arise from the real zeros of q(z). However, the contributions to the Cauchy index come only from zeros of q(z) of odd order. In this section we will establish some connections between the Cauchy index, the signature of the Hankel map induced by g(z), and the existence of signaturesymmetric realizations of g(z). The central result is the classical Hermite–Hurwitz theorem. However, before proving it, we state and prove an auxiliary result that is of independent interest.
8.3 Some Classes of Forms
249
Proposition 8.58. Let g(z) be a real rational function. Then the following scaling operations (1) g(z) −→ mg(z), m > 0, (2) g(z) −→ g(z − a), a ∈ R, (3) g(z) −→ g(rz), r > 0, leave the rank and signature of the Hankel map as well as the Cauchy index invariant. Proof. (1) is obvious. To prove the rank invariance, let g(z) = p(z)/q(z) with p(z) and q(z) coprime. By the Euclidean algorithm, there exist polynomials a(z) and b(z) such that a(z)p(z) + b(z)q(z) = 1. This implies a(z − a)p(z − a) + b(z − a)q(z − a) = 1 as well as a(rz)p(rz) + b(rz)q(rz) = 1, i.e., p(z − a), q(z − a) are coprime and so are p(rz), q(rz). Now g(z − a) = p(z − a)/q(z − a) and g(rz) = p(rz)/q(rz), which proves the invariance of McMillan degree, which is equal to the rank of the Hankel map. Now it is easy to check that given any polynomial u(z), we have [Hg u, u] = [Hga ua , ua ], where ga (z) = g(z − a). If we define a map Ra : R[z] −→ R[z] by (Ra u)(z) = u(z − a) = ua (z), then Ra is invertible, R−1 a = R−a , and [Hg u, u] = [Hga ua , ua ] = [Hga Ra u, Ra u] = [Ra ∗ Hga Ra u, u], which shows that Hg = R∗a Hga Ra and hence that σ (Hg ) = σ (Hga ). This proves (2). To prove (3), define, for r > 0, a map Pr : R[z] −→ R[z] by (Pr u)(z) = u(rz). Clearly, Pr is invertible and Pr−1 = P1/r . Letting ur = Pr u, we have [Hgr u, u] = [π− g(rz)u, u] = [π− ∑(gk /rk+1 zk+1 )u, u] = ∑ ∑(gi+ j r−i− j ui )u j = ∑ ∑ gi+ j (ui r−i )(u j r− j ) = [Hg Pr u, Pr u] = [Pr∗ Hg Pr u, u],
250
8 Tensor Products and Forms
hence Hgr = Pr∗ Hg Pr , which implies σ (Hgr ) = σ (Hg ). The invariance of the Cauchy index under these scaling operations is obvious. Theorem 8.59 (Hermite–Hurwitz). Let g(z) = p(z)/q(z) be a strictly proper, real rational function with p(z) and q(z) coprime. Then Ig = σ (Hg ) = σ (B(q, p)).
(8.98)
Proof. That σ (Hg ) = σ (B(q, p)) has been proved in Theorem 8.40. So it suffices to prove the equality Ig = σ (Hg ). Let us analyze first the case that q(z) is a polynomial with simple real zeros, i.e., q(z) = ∏nj=1 (z − a j ) and ai = a j for i = j. Let di (z) = q(z)/(z − ai). Given any polynomial u(z) ∈ Xq , it has a unique expansion u(z) = ∑ni=1 ui di (z). We compute [Hg u, u] = [π− gu, u] = [π− q−1 pu, u] = [q−1 qπ− q−1 pu, u] = πq pu, u = p(Sq )u, u = ∑ni=1 ∑nj=1 p(Sq )di , d j ui u j = ∑ni=1 ∑nj=1 p(ai )di , d j ui u j = ∑ni=1 p(ai )di (ai )u2i , since di (z) are eigenfunctions of Sq corresponding to the eigenvalues ai and since by Proposition 5.40, di , d j = di (ai )δi j . From this computation it follows, since [p(Sq )]stco = B(q, p), that n
σ (Hg ) = σ (B(q, p)) = ∑ sign [p(ai )di (ai )]. i=1
On the other hand, we have the partial fraction decomposition g(z) = or
n
n ci p(z) =∑ , q(z) i=1 z − ai
p(z) = ∑ ci i=1
n q(z) = ∑ ci di (z), z − ai i=1
which implies p(ai ) = ci di (ai ), or equivalently, that ci = n
n
i=1
i=1
Ig = ∑ sign (ci ) = ∑ sign
p(ai ) di (ai ) .
Now obviously
p(ai ) , di (ai )
and, since sign (p(ai )di (ai )) = sign (p(ai )/di (ai )), the equality (8.98) is proved in this case.
8.3 Some Classes of Forms
251
We pass now to the general case. Let q(z) = q1 (z) · · · qs (z) be the unique factorization of q(z) into powers of relatively prime irreducible monic polynomials. As before, we define polynomials di (z) by q(z) . qi (z)
di (z) =
Since we have the direct sum decomposition Xq = d1 Xq1 ⊕ · · ·⊕ ds Xqs , it follows that each f (z) ∈ Xq has a unique representation of the form f (z) = ∑si=1 di (z)ui (z) with ui (z) ∈ Xqi . Relative to the indefinite metric of Xq , introduced in (5.64), this is an orthogonal direct sum decomposition, i.e., di Xqi , d j Xq j = 0
for i = j.
Indeed, if fi (z) ∈ Xqi and g j (z) ∈ Xq j , then di fi , d j f j = [q−1 di fi , d j f j ] = [d j q−1 di fi , f j ] = 0, since di (z)d j (z) is divisible by q(z) and F[z]⊥ = F[z]. Let g(z) = ∑si=1 pi (z)/qi (z) be the partial fraction decomposition of g(z). Since the zeros of the qi (z) are distinct, it is clear that s
Ig = I p = ∑ I pi /qi . q
i=1
Also, as a consequence of Proposition 8.23, it is clear that for the McMillan degree δ (g) of g(z), we have δ (g) = ∑si=1 δ (pi /qi ), and hence, by Theorem 8.15, the signatures of the Hankel forms are additive, namely σ (Hg ) = ∑si=1 σ (H pi /qi ). Therefore, to prove the Hermite–Hurwitz theorem it suffices to prove it in the case of q(z) being the power of a monic prime. Since we are discussing the real case, the primes are of the form z − a and ((z − a)2 + b2 ), with a, b ∈ R. By applying the previous scaling result, the proof of the Hermite–Hurwitz theorem reduces to the two cases q(z) = zm or q(z) = (z2 + 1)m . Case 1: q(z) = zm . Assuming p(z) = p0 + p1 z + · · · + pm−1 zm−1 , then the coprimeness of p(z) and q(z) is equivalent to p0 = 0. Therefore we have g(z) = pm−1 z−1 + · · · + p0 z−m , which shows that 0 if m is even, Ig = I(p/zm) = sign (p0 ) if m is odd. On the other hand, Ker Hg = zm+1 R[z], and so
σ (Hg ) = σ (Hg |Xzm ).
252
8 Tensor Products and Forms
Relative to the standard basis, the truncated Hankel map has the matrix representation ⎛ ⎞ pm−1 . . . p1 p0 ⎜. ⎟ .... ⎜ ⎟ ⎜ ⎟ ... ⎜. ⎟ ⎜ ⎟. ⎜. ⎟ .. ⎜ ⎟ ⎝ p1 . ⎠ p0 Now, clearly, the previous matrix has the same signature as ⎛
0 ⎜. ⎜ ⎜ ⎜. ⎜ ⎜. ⎜ ⎝0 p0 and hence
σ (Hg ) =
⎞ . . . 0 p0 ⎟ .... ⎟ ⎟ ... ⎟ ⎟, ⎟ .. ⎟ ⎠ .
sign (p0 ) if m is odd, 0 if m is even.
Case 2: q(z) = (z2 + 1)m . Since q(z) has no real zeros, it follows that in this case Ig = I p = 0. So it suffices q
to prove that also σ (Hg ) = 0. Let g(z) = p(z)/(z2 + 1)m with deg p < 2m. Let us expand p(z) in the form p(z) =
m−1
∑ (pk + qkz)(z2 + 1)k,
k=0
with the pk and qk uniquely determined. The coprimeness condition is equivalent to p0 and q0 not being zero together. The transfer function g(z) has therefore the following representation: g(z) =
m−1
∑
k=0
pk + q k z . (z2 + 1)m−k
In much the same way, every polynomial u(z) can be written, in a unique way, as u(z) =
m−1
∑ (ui + vi z)(z2 + 1)i.
i=0
8.3 Some Classes of Forms
253
We compute now the matrix representation of the Hankel form with respect to the basis B = {1, z, (z2 + 1), z(z2 + 1), . . ., (z2 + 1)m−1 , z(z2 + 1)m−1 } of X(z2 +1)m . Thus, we need to compute [Hg zα (z2 + 1)λ , zβ (z2 + 1)μ ] = [Hg zα +β (z2 + 1)λ +μ , 1]. Now, with 0 ≤ γ ≤ 2 and 0 ≤ ν ≤ 2m − 2, we compute
m−1 (pi + qi z)(z2 + 1)i γ 2 ∑i=0 z (z + 1)ν , 1 (z2 + 1)m ! γ 2 ν +i " m−1 (pi + qiz)z (z + 1) ,1 . = ∑i=0 (z2 + 1)m
[Hg zγ (z2 + 1)ν , 1] =
The only nonzero contributions come from the terms in which ν + i = m − 2, m − 1, or equivalently, when i = m − λ − μ − 2, m − λ − μ − 1. Now
!
"
⎧⎧ ⎪ ⎨ qi , γ = 0, ⎪ ⎪ ⎪ ⎪ p , γ = 1, ⎪ ⎪⎩ i ⎪ ⎪ ⎨ −qi , γ = 2,
(pi + qi z)zγ (z2 + 1)ν +i ,1 = ⎧ ⎪ (z2 + 1)m ⎪ ⎪ ⎨ 0, γ = 0, ⎪ ⎪ ⎪ ⎪ 0, γ = 1, ⎪ ⎪ ⎩⎩ qi , γ = 2,
i = m − ν − 1,
i = m − ν − 2.
Thus the matrix of the Hankel form in this basis has the following block triangular form: ⎞ ⎛ qm−1 pm−1 q1 p1 q0 p 0 ⎜ pm−1 qm−2 − qm−1 p1 q0 − q1 p0 −q0 ⎟ ⎟ ⎜ ⎟ ⎜ q0 p0 ⎟ ⎜ ⎟ ⎜ p0 −q0 ⎟ ⎜ M=⎜ ⎟. ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎠ ⎝ q0 p0 p0 −q0 q0 p0 By our coprimeness assumption, the matrix is nonsingular and its p0 −q0
determinant is negative. Hence it has signature equal to zero. This implies that also the signature of M is zero, and with this, the proof of the theorem is complete.
254
8 Tensor Products and Forms
8.4 Tensor Products of Models The underlying idea in this book is that the study of linear algebra is simplified and clarified when functional, or module-theoretic, methods are used. This motivated us in the introduction of polynomial and rational models. Naturally, we expect that the study of tensor products and forms by the use of these models will lead to similar simplifications, and this we proceed to do. In our approach, we emphasize the concreteness of specific representations of tensor products of polynomial models. Thus, we expect that the tensor products of polynomial models, whether over the underlying field F or over the polynomial ring F[z], will also turn out to be (isomorphic to) polynomial models in one or two variables as the case may be. There are several different models we can use. Two of them arise from Kronecker products of multiplication maps, the difference being whether the product is taken over the field F or the ring F[z].
8.4.1 Bilinear Forms Let X , Y be F-vector spaces. We shall denote by L(X , Y ), as well as by Hom F (X , Y ), the space of F-linear maps from X to Y . We take now a closer look at the case of tensor products of two finite-dimensional F-vector spaces X , Y . In this situation, and in order to clearly distinguish it from the algebraic dual M of a module M, we will write X ∗ = Hom F (X , F) for the vector space dual. Let BX = { fi }ni=1 , BY = {gi }m i=1 be bases of X and Y respectively. Let ∗ = {φ }n be the basis of X ∗ that is dual to B , i.e., it satisfies φ ( f ) = δ . BX i i=1 i j ij X Let F be a field and let X , Y , Z be F-vector spaces. A function φ : X × Y −→ Z is called a Z -valued bilinear form on X × Y if φ is linear in both variables. We will denote by Bilin (X , Y ; Z ) the space of all Z -valued bilinear functions. We will denote by Bilin (X , Y ; F) the set of all F valued bilinear functions on X × Y and refer to its elements as bilinear forms. There is a close connection between bilinear forms and linear transformations which we summarize in the following proposition. Proposition 8.60. Given F-linear vector spaces X , Y , a necessary and sufficient condition for a function φ : X × Y −→ F to be a bilinear form is the existence of a, uniquely defined, linear map T : X −→ Y ∗ for which
φ (x, y) = T x, y.
(8.99)
Proof. Assume T ∈ Hom F (X , Y ∗ ). Defining φ : X × Y −→ F by (8.99), it clearly follows that φ (x, y) is an F-valued bilinear form. Conversely, if φ (x, y) is an F-valued bilinear form on X × Y , then for each x ∈ X , φ (x, y) is a linear functional on Y . Thus there exists y∗x ∈ Y ∗ for which
8.4 Tensor Products of Models
255
φ (x, y) = y∗x , y. Define a map T : X −→ Y ∗ by T x = y∗x . Using the linearity of φ (x, y) in the x variable, we compute y∗α1 x1 +α2 x2 , y = φ (α1 x1 + α2 x2 , y) = α1 φ (x1 , y) + α2 φ (x2 , y) = α1 y∗x1 , y + α2 y∗x2 , y = α1 y∗x1 + α2 y∗x2 , y. This implies y∗α1 x1 +α2 x2 = α1 y∗x1 + α2 y∗x2 , which can be rewritten as T (α1 x1 + α2 x2 ) = α1 T x1 + α2 T x2 , i.e., T ∈ Hom F (X , Y ∗ ). Uniqueness of T follows by a standard argument.
8.4.2 Tensor Products of Vector Spaces The availability of duality pairings allows us to take another look at matrix representations. Given F-vector spaces X , Y and elements f ∈ X ∗ and w ∈ Y , we use the notation v, f = f (v). Next, we define a map w ⊗ f : X −→ Y by (w ⊗ f )v = v, f w,
v∈X.
(8.100)
It is easily checked that w⊗ f is the zero map if and only if f = 0 or w = 0; otherwise, it has rank one. Proposition 8.61. We have Hom F (X , Y ) =
s
∑
# wi ⊗ v∗i
| wi ∈ Y
, v∗i
∗
∈ X ,s ∈ N .
(8.101)
i=1
Proof. Clearly, from (8.100) and the bilinearity of the pairing v, f , it follows that w ⊗ f is a linear transformation. Since Hom F (X , Y ) is closed under linear combinations, we have the inclusion Hom F (X , Y ) ⊃ {∑i wi ⊗ fi | wi ∈ Y , fi ∈ X ∗ }. Conversely, let T ∈ Hom F (X , Y ). We consider the bases BV = {e1 , . . . , en } of X and BW = { f1 , . . . , fm } of Y . A vector v ∈ X has a unique representation of the form v = ∑nj=1 α j e j . Similarly, w = T v ∈ Y has a unique representation of the form ∗ ∗ ∗ T v = ∑m i=1 βi f i . With BV ∗ = {e1 , . . . , en } the dual basis to BV , we have e j , ei = δi j ∗ and BV ∗ = BV . Let Ei j ∈ Hom F (X , Y ) be the map defined by (4.12), i.e., by Ei j ek = δ jk fi . Clearly, we have Ei j v = v, e∗j fi = ( fi ⊗ e∗j )v. The computation (4.14) shows that
(8.102)
256
8 Tensor Products and Forms m
n
T = ∑ ∑ ti j fi ⊗ e∗j ,
(8.103)
i=1 j=1
and hence the inclusion Hom F (X , Y ) ⊂ {∑i wi ⊗ fi |wi ∈ Y , fi ∈ X ∗ }. The two inclusions are equivalent to (8.101). We proceed to define the tensor product of two vector spaces. Our preference is to do it in a concrete way, so that the tensor product is identified with a particular representation. Given F-vector spaces X , Y , let X ∗ be the vector space dual of X . For x∗ ∈ X ∗ and y ∈ Y , we define a map y ⊗ x∗ ∈ Hom F (X , Y ) by (y ⊗ x∗ )ξ = (x∗ ξ )y,
ξ ∈X.
(8.104)
Although we can give an abstract definition of a tensor product, we prefer to give a concrete one. Definition 8.62. Given F-vector spaces X , Y , we define Y ⊗F X ∗ = Hom F (X , Y ).
(8.105)
We call Y ⊗F X ∗ = Hom F (X , Y ) the tensor product of Y and X ∗ . The map φ : Y × X ∗ −→ Y ⊗F X ∗ defined by
φ (y, x∗ ) = y ⊗ x∗
(8.106)
is called the canonical map. Clearly, in this case, the canonical map φ is bilinear and bijective. Note that for finite-dimensional F-vector spaces X , Y , and using reflexivity, that is, (X ∗ )∗ X , we can also define Y ⊗F X = Hom F (X ∗ , Y ). Proposition 8.63. Let BX = {e1 , . . . , en } and BY = { f1 , . . . , fm } be bases of X ∗ = {e∗ , . . . , e∗ } be the dual basis to B . Then and Y respectively and let BX X n 1 1. The set
∗ BY ⊗ BX = { fi ⊗ e∗j | i = 1, . . . , m, j = 1, . . . , n}
(8.107)
∗ as the tensor is a basis for Y ⊗F X ∗ . We will refer to the basis BY ⊗ BX ∗ . product of the bases BY and BX 2. We have dim Y ⊗ X ∗ = dim Y · dim X . (8.108) B
3. For T ∈ Hom F (X , Y ), with the matrix representation [T ]BYX defined by (4.16), we have ∗ B [T ]BY ⊗BX = [T ]BYX . (8.109) Proof. 1. Note that we have the equality
8.4 Tensor Products of Models
257
fi ⊗ e∗j = Ei j ,
(8.110)
where the maps Ei j are defined by (4.7). Thus, the statement follows from Theorem 4.6. ∗ is mn; hence (8.108) follows. 2. The number of elements in the basis BY ⊗ BX n ∗ 3. Based on the identification (8.105), we have T = ∑m i=1 ∑ j=1 ti j f i ⊗ e j . Comparing this with (4.15), (8.109) follows. n ∗ The representation T = ∑m i=1 ∑ j=1 ti j ei ⊗ f j of T ∈ Hom F (X , Y ) can generally be simplified. If rankT = dim Im T = k, then there exists a minimal-length representation k
T = ∑ ψ i ⊗ φi ,
(8.111)
i=1
where {φi }ki=1 is a basis of (Ker T )⊥ ⊂ X ∗ and {ψi }ki=1 is a basis for Im T ⊂ Y . The following proposition describes the underlying property of tensor products. Proposition 8.64. Let γ : Y × X ∗ −→ Z be a bilinear map. Then there exists a unique linear map γ∗ : Y ⊗F X ∗ −→ Z for which the following diagram commutes:
φ
Y ×X ∗
- Y ⊗X ∗
@
@
γ
@
@
γ∗
@
@ R @
? Z
Proof. Given the bilinear map γ ∈ Hom F (Y × X ∗ , Z ), we define γ∗ ∈ Hom F (Y ⊗ X ∗ , Z ) by
γ∗ (y ⊗ x∗ ) = γ (y, x∗ ).
(8.112)
It is easily checked that γ∗ is well defined and linear and that γ = γ∗ ◦ φ holds. Since any element of Y ⊗ X ∗ is a finite sum ∑ri=1 yi ⊗ x∗i , the result follows. One of the fundamental properties of tensor products is the connection between maps defined on tensor products, bilinear forms (hence also quadratic forms), and module homomorphisms. In the special case of F-vector spaces X and Y , we have the following result.
258
8 Tensor Products and Forms
Proposition 8.65. 1. Let X , Y be finite-dimensional F-vector spaces. Then we have the following isomorphisms: (X ⊗F Y )∗ Bilin (X , Y ; F) Hom F (X , Y ∗ ).
(8.113)
2. Let X , Y be finite-dimensional F-vector spaces. Then we have the following isomorphism (X ⊗F Y )∗ X ∗ ⊗F Y ∗ Y ∗ ⊗F X ∗ . (8.114) Proof. 1. We give only a sketch of the proof. Let φ (x, y) be a bilinear form on X × Y . For fixed x ∈ X , this defines a linear functional ηx ∈ Y ∗ by
ηx (y) = φ (x, y).
(8.115)
Clearly, ηx depends linearly on x; hence there exists a map Z : X −→ Y ∗ for which Zx = ηx . So φ (x, y) = (Zx)(y). Conversely, assume Z ∈ Hom F (X , Y ∗ ). Then for every x ∈ X , we have Zx ∈ Y ∗ . Defining φ (x, y) = (Zx)(y), we get a bilinear form defined on X × Y . The homomorphisms φ → Z and Z → φ are inverses to each other; hence the isomorphism Bilin (X , Y ; F) Hom F (X , Y ∗ ) follows. Given a bilinear form γ : X × Y −→ F, by the definition of tensor products and with γ : X × Y −→ X ⊗F Y the canonical map, there exists a unique γ∗ ∈ (X ⊗F Y )∗ such that γ = γ∗ φ . The uniqueness implies that the map γ → γ∗ is injective. It is also surjective, since any γ∗ ∈ (X ⊗F Y )∗ induces a bilinear form γ on X ×Y by letting γ = γ∗ φ . Thus we have also the isomorphism (X ⊗F Y )∗ Bilin (X , Y ; F). 2. Assume ξi ∈ X ∗ , ηi ∈ Y ∗ . Then for every x ∈ X , we define s
s
i=1
i=1
∑ (ηi ⊗ ξi)x = ∑ ξi (x)ηi ,
(8.116)
which defines a map Y ∗ ⊗ X ∗ −→ Hom F (X , Y ∗ ). Conversely, if X is finite-dimensional, the image of any Z ∈ Hom (X , Y ∗ ) q is finite-dimensional, hence has a basis {ηi }i=1 , with ηi ∈ Y ∗ . Therefore we have, for suitable ξi ∈ X ∗ , the representation q
q
i=1
i=1
Zx = ∑ ξi (x)ηi = ∑ (ηi ⊗ ξi )x.
(8.117)
Thus Z → ∑qi=1 ξi ⊗ ηi defines an isomorphism from Hom(X , Y ∗ ) to Y ∗ ⊗X ∗ , and thus, by the isomorphism (8.113), also an isomorphism with X ∗ ⊗F Y ∗ .
8.4 Tensor Products of Models
259
8.4.3 Tensor Products of Modules Our definition of tensor products of vector spaces was an ad hoc one. To get into line with standard algebraic terminology, we use the characterization of tensor products, given in Proposition 8.64, as the basis for the following definition. Definition 8.66. Let R be a commutative ring and X , Y R-modules. An R-module X ⊗R W is called a tensor product over R if there exists a bilinear map φ : X × Y −→ X ⊗R W such that for any bilinear map γ : X × Y −→ Z , there exists a unique R-linear map γ∗ : X ⊗R W −→ Z for which
γ = γ∗ φ .
(8.118)
The following theorem summarizes the main properties of tensor products. We will not give a proof and refer the reader to Hungerford (1974) and Lang (1965) for the details. Theorem 8.67. Let R be a commutative ring and V , W R-modules. Then 1. For any R-modules V , W , a tensor product V ⊗R W exists and is unique up to isomorphism. 2. We have the following isomorphisms: (⊕ki=1 Mi ) ⊗R N ⊕ki=1 (Mi ⊗R N), M ⊗R (⊕lj=1 N j ) ⊕lj=1 (M ⊗R N j ), M ⊗R (N ⊗R P) (M ⊗R N) ⊗R P, M ⊗R N N ⊗R M.
(8.119)
3. If S ⊂ R is any subring and M, N are R-modules, then we have a well-defined surjective S-linear map M ⊗S N −→ M ⊗R N defined by m ⊗S n → m ⊗R n. 4. Let M1 , M2 be R-modules, with R a commutative ring. Let Ni ⊂ Mi be submodules. The quotient spaces Mi /Ni have a natural R-module structure. Let N be the submodule generated in M1 ⊗R M2 by N1 ⊗R M2 and M1 ⊗R N2 . Then we have the isomorphism M1 /N1 ⊗R M2 /N2 (M1 ⊗R M2 )/N.
(8.120)
5. Given R-modules M, N, L, we denote by Bilin R (M, N; L) the set of all R-bilinear maps φ : M × N −→ L. We have the following isomorphisms: Hom R (M ⊗R N, L) Bilin R (M, N; L) Hom R (M, Hom R (N, L)).
(8.121)
260
8 Tensor Products and Forms
8.4.4 Kronecker Product Models We define the F-Kronecker product of two scalar polynomials d1 (z), d2 (z) ∈ F[z] as the map d1 ⊗F d2 : F[z, w] −→ F[z, w] given by (d1 ⊗F d2 )q(z, w) = d1 (z)q(z, w)d2 (w). This map induces a projection πd1 ⊗F d2 in F[z, w] defined by
πd1 ⊗F d2 q(z, w) = (d1 ⊗F d2 )(π−z ⊗ π−w )(d1 ⊗F d2 )−1 q(z, w).
(8.122)
Thus, if q(z, w) is given as q(z, w) = ∑ri=1 ai (z)bi (w), then
πd1 ⊗F d2 q(z, w) = ∑ri=1 (πd1 ai )(z)(πd2 bi )(w) = ∑ri=1 d1 (z)π− (d1−1 ai )(z)π− (d2−1 bi )(w)d2 (w).
(8.123)
We obtain the F-Kronecker product model as Xd1 ⊗F d2 = Xd1(z)d2 (w) = Im πd1 ⊗F d2 . From now on, we will mainly use the more suggestive notation Xd1 (z)d2 (w) for the F-Kronecker product model. By inspection, one verifies that Xd1 (z)d2 (w) =
deg d1 −1 deg d2 −1
∑
i=0
∑
# i
fi j z w
j
.
(8.124)
j=0
From this description, it follows that we have the dimension formula dimF (Xd1 (z)d2 (w) ) = deg d1 · degd2 .
(8.125)
Similarly, we define the F[z]-Kronecker product d1 ⊗F[z] d2 of two scalar polynomials d1 (z), d2 (z) as the map d1 ⊗F[z] d2 : F[z] −→ F[z] given by d1 ⊗F[z] d2 q(z) = d1 (z)q(z)d2 (z). This defines the projection map πd1 ⊗d2 : F[z] −→ F[z] given by πd1 ⊗F[z] d2 f = π(d1 d2 ) f . (8.126) In turn, this allows us to introduce the F[z]-Kronecker product model as Xd1 ⊗F[z] d2 F[z]/(d1 d2 )F[z] Xd1 d2 .
(8.127)
Consequently, for the F[z]-Kronecker product model, we have the dimension formula dimF Xd1 ⊗F[z] d2 = deg(d1 d2 ) = deg d1 + degd2 = dimF Xd1 + dimF Xd2 .
(8.128)
8.4 Tensor Products of Models
261
Finally, from duality of polynomial and rational models, we achieve a representation of the dual to the Kronecker product model as (Xd1 ⊗F[z] d2 )∗ = (Xd1 d2 )∗ = X d1 d2 .
(8.129)
8.4.5 Tensor Products over a Field The tensor products were defined, in Definition 8.66, in a formal way. Our conviction is that a concrete representation for the tensor product is always preferable. Thus, we have the identifications, i.e., up to isomorphism, for tensor products of polynomial spaces, i.e., F[z] ⊗F F[z] F[z, w], (8.130) as well as F[z] ⊗F[z] F[z] = F[z].
(8.131)
For our purposes, this is less interesting than the identification of the tensor product of two polynomial models. This will be achieved by the use of Theorem 8.67, and in particular, equation (8.120). Central to this is the isomorphism (5.6), namely Xq F[z]/qF[z]. The availability of Kronecker product polynomial models allows us to give concrete representations for the tensor products. We show that the tensor product Xd1 ⊗F Xd2 of two polynomial models over the field F can be interpreted as a polynomial model for the two-variable polynomial d1 (z)d2 (w). Using (8.120) and the isomorphism (5.6), we have Xd1 ⊗F Xd2 (F[z]/d1 F[z]) ⊗F (F[z]/d2 F[z]) F[z, w]/(d1 (z)F[z, w] + d2 (w)F[z, w]). The submodule d1 (z)F[z, w] + d2 (w)F[z, w] can be seen as the kernel of the projection operator πd1 (z) ⊗ πd2 (w) acting in F[z, w] by (πd1 (z) ⊗ πd2(w) )( f (z)g(w)) = (πd1 (z) f (z))(πd2 (w) g(w)). From this, the above representation of Xd1 ⊗F Xd2 easily follows. Defining Xd1 (z)d2 (w) = Im (πd1 (z) ⊗ πd2(w) ) implies the isomorphism Xd1 ⊗F Xd2 Xd1 (z)d2 (w) ,
(8.132)
i.e., the F-tensor product of polynomial models is a polynomial model in two variables. Proposition 8.68. Given nontrivial polynomials d1 (z), d2 (z) ∈ F[z], we have the isomorphism Xd1 ⊗F Xd2 Xd1 ⊗F d2 = Xd1 (z)d2 (w) . (8.133)
262
8 Tensor Products and Forms
In particular, the F-tensor product of two polynomial models in one variable is the two-variable Kronecker product polynomial model, and the map γ∗ : Xd1 ⊗F Xd2 −→ Xd1 (z)d2 (w) induced by γ∗ ( f1 ⊗ f2 ) = f1 (z) f2 (w) (8.134) is an F-isomorphism. Proof. Let φ : Xd1 × Xd2 −→ Xd1 ⊗F Xd2 be the canonical map. Consider the Fbilinear map γ : Xd1 × Xd2 −→ Xd1 (z)d2 (w) defined by γ ( f1 , f2 ) = f1 (z) f2 (w). Thus γ∗ exists and is F-linear, leading to the following commutative diagram:
φ
Xd1 × Xd2
- Xd ⊗F Xd 1 2
@
@
γ
@
@
γ∗
@
@ R @
? Xd1 (z)d2 (w)
The description (8.124) shows that γ∗ is surjective. From the dimension formula (8.125) we conclude that Xd1 ⊗F Xd2 and Xd1 (z)d2 (w) have the same F-dimension, and the result follows. Being the tensor product of two vector spaces over the field F, the tensor product model Xd1 ⊗F Xd2 is clearly an F-vector space. We will see later that it carries also a natural F[z, w]-module structure. We now connect the space of F-linear maps between polynomial models Xd1 , Xd2 , namely Hom F (Xd1 , Xd2 ), with the tensor product Xd2 ⊗F Xd1 . Using the residue form on Xd1 , i.e., f , g = (d1−1 f g)−1 , where h−1 denotes the coefficient of z−1 in the Laurent series expansion of h(z), we can get the concrete form of the isomorphism as follows. Theorem 8.69. We define a map Ψ : Xd2 (z)d1 (w) −→ Hom F (Xd1 , Xd2 ) as follows. For q(z, w) ∈ Xd2 ⊗F Xd1 , the map is q(z, w) → Ψ (q), where Ψ (q) : Xd1 −→ Xd2 is defined, for f ∈ Xd1 , by
Ψ (q) f = f , q(z, ·) = (q(z, ·)d1 (·)−1 f (·))−1 .
(8.135)
The map q(z, w) → Ψ (q) defines a vector space isomorphism, i.e., we have the following isomorphism: Xd2 ⊗F Xd1 Hom F (Xd1 , Xd2 ).
(8.136)
8.4 Tensor Products of Models
263
Proof. Let q1 (z), . . . , qs (z) denote a basis of Xd1 (z)d2 (w) . Then it is easily seen that Ψ (q1 ), . . . , Ψ (qn ) are linearly independent in Hom F (Xd1 , Xd2 ). The result follows because both spaces have the same dimension. Using the isomorphism Xd2 ⊗F Xd1 Xd2 (z)d1 (w) ,
(8.137)
we obtain the explicit isomorphism Ψ : Xd2 ⊗F Xd1 −→ Hom F (Xd1 , Xd2 ) given, for g ∈ Xd1 , by Ψ ( f2 ⊗ f1 )g = [d1−1 g, f1 ] f2 = g, f1 f2 , (8.138) where the pairing g, f is defined by (5.64). We consider next the F-vector space dual of the tensor product. Proposition 8.70. Given nontrivial polynomials d1 (z), d2 (z) ∈ F[z], we have the following identification of the vector space dual of Xd1 ⊗F Xd2 : (Xd1 ⊗F Xd2 )∗ X d1 ⊗F X d2 X d1 (z) ∩ X d2 (w) .
(8.139)
Proof. We compute, using vector space annihilators, (Xd1 ⊗F Xd2 )∗ (F[z]/d1 F[z] ⊗F F[z]/d2 F[z])∗ (d1 (z)F[z, w] + d2 (w)F[z, w])⊥ (d1 (z)F[z, w])⊥ ∩ (d2 (w)F[z, w])⊥ d1 ⊗1 ∩ X 1⊗d2 =X =
h(z, w) ∈ z−1 F[[z−1 , w−1 ]]w−1 |h(z, w) =
$ p(z, w) . d1 (z)d2 (w)
8.4.6 Tensor Products over the Ring of Polynomials From Proposition 8.68, we have seen that the F-tensor product of polynomial models, namely Xd1 ⊗F Xd2 , takes us out of the realm of single-variable polynomial spaces. The situation changes when the underlying ring is F[z]. Since polynomial models Xd1 , Xd2 are F[z]-modules, their tensor product, namely Xd1 ⊗F[z] Xd2 , is an F[z]-module too. The tensor product is, by Definition 8.66, an abstract object. Again, for clarity, as well as computational purposes, we would like to have a concrete representation for it, and this is done next. In analogy with (8.133), we have the F[z]-module isomorphism
264
8 Tensor Products and Forms
Xd1 ⊗F[z] Xd2 F[z]/d1 F[z] ⊗F[z] F[z]/d2 F[z] F[z]/(d1 (z)F[z] + d2 (z)F[z]) F[z]/((d1 ∧ d2 )F[z]) Xd1 ∧d2 .
(8.140)
Proposition 8.71. Given nontrivial polynomials d1 (z), d2 (z) ∈ F[z], let φ : Xd1 × Xd2 −→ Xd1 ⊗F[z] Xd2 be the canonical map and let γ : Xd1 × Xd2 −→ Xd1 ∧d2 be defined by γ ( f1 , f2 ) = πd1 ∧d2 ( f1 f2 ). (8.141) Then the map γ∗ : Xd1 ⊗F[z] Xd2 −→ Xd1 ∧d2 , defined by
γ∗ ( f1 ⊗ f2 ) = πd1 ∧d2 ( f1 f2 ),
(8.142)
is an F[z]-module isomorphism that implies the isomorphism Xd1 ⊗F[z] Xd2 Xd1 ∧d2 .
(8.143)
The commutativity of the following diagram holds:
φ
Xd1 × Xd2
- Xd ⊗F[z] Xd 1 2
@
@
γ
@
γ∗
@
@
@ R @
? Xd1 ∧d2
Proof. We check first that γ∗ is well defined on the tensor product space. This follows immediately from the identity
γ∗ (p · f1 ⊗ f2 ) = πd1 ∧d2 (p f1 f2 ) = πd1 ∧d2 pπd1 ∧d2 ( f1 f2 ) = p · γ∗ ( f1 ⊗ f2 ), which shows that γ∗ is a well defined F[z]-homomorphism. Clearly, the map γ is surjective, since for any f (z) ∈ Xd1 ∧d2 , we have γ ( f , 1) = f . This implies also the surjectivity of γ∗ . Next, we note that dimF (Xd1 ⊗F[z] Xd2 ) = deg(d1 ∧ d2 ) = dimF Xd1 ∧d2 , which shows that γ∗ is bijective. This proves the isomorphism (8.143).
(8.144)
8.4 Tensor Products of Models
265
Clearly, we have the dimension formula dimF Xd1 ⊗F[z] Xd2 = dimF X(d1 ∧d2 ) = deg(d1 ∧ d2 ) ≤ max(deg d1 , deg d2 ) (8.145) ≤ deg d1 · degd2 = dimF (Xd1 ⊗F Xd2 ). This computation shows that in the case of tensor products over rings, a lot of dimension collapsing can occur. In fact, in the special case that d1 (z), d2 (z) are coprime, then Xd1 ⊗F[z] Xd2 = 0. The following theorem is the analogue of the isomorphism (8.136) for the case of F[z]-tensor products. We give two versions of it, using both Xd1 ⊗F[z] Xd2 and Xd1 ∧d2 . Theorem 8.72. Given nontrivial polynomials d1 (z), d2 (z) ∈ F[z], then 1. We have the F[z]-isomorphism of F[z]-modules Xd1 ⊗F[z] Xd2 Hom F[z] (Xd1 , Xd2 ).
(8.146)
The isomorphism is given by the map ψ : Xd1 ⊗F[z] Xd2 −→ Hom F[z] (Xd1 , Xd2 ) defined by f1 ⊗ f2 → ψ f1 ⊗ f2 :
ψ f1 ⊗ f2 (g) = d2 π− ((d1 ∧ d2 )−1 f1 f2 g) =
d2 d1 ∧ d 2
πd1 ∧d2 ( f1 f2 g),
g ∈ Xd1 . (8.147)
2. We have the F[z]-isomorphism of F[z]-modules Xd1 ∧d2 Hom F[z] (Xd1 , Xd2 ).
(8.148)
The isomorphism is given by the map ψ : Xd1 ∧d2 −→ Hom F[z](Xd1 , Xd2 ) defined by n → ψn defined, for n ∈ Xd1 ∧d2 , by
ψn (g) = πd2 (n2 g),
g ∈ Xd1 .
(8.149)
Proof. 1. We use the isomorphism (8.143) to identify Xd1 ⊗F[z] Xd2 with Xd1 ∧d2 . Choosing any element f (z) ∈ Xd1 ∧d2 , by construction ψ f is F[z]-linear. To show injectivity of ψ , assume ψ f = 0, i.e., for all g(z) ∈ Xd1 we have 0 = ψ f g = d2 d1 ∧d2 πd1 ∧d2 ( f g). This holds in particular for g(z) = 1. Thus πd1 ∧d2 f = 0 and necessarily f (z) ∈ (d1 ∧ d2 )F[z]. However, we assumed f (z) ∈ Xd1 ∧d2 . Thus the direct sum decomposition F[z] = Xd1 ∧d2 ⊕ (d1 ∧ d2 )F[z] implies f (z) = 0, and the injectivity of ψ follows. Using the F-linear isomorphism (Xd1 ⊗F[z] X d2 )∗ Hom F[z] (Xd1 , Xd2 ), it follows that dimF Hom F[z] (Xd1 , Xd2 ) = dimF Xd1 ⊗F[z] Xd2 = dimF Xd1 ⊗F[z] Xd2 . Hence injectivity implies surjectivity, and we are done. n(z) 2. Note that n(z) ∈ Xd1 ∧d2 if and only if h(z) = d (z)∧d ∈ X d1 ∧d2 = X d1 ∩ X d2 . 1 2 (z) In turn, h(z) is rational and strictly proper and has a unique representation of
266
8 Tensor Products and Forms
the form h(z) = d (z)∧d (z) . Writing di (z) = (d1 (z) ∧ d2 (z))di (z), we have, with 1 2 ni (z) = n(z)di (z), n(z)
h(z) =
n(z) n1 (z) n2 (z) = = , d1 (z) ∧ d2 (z) d1 (z) d2 (z)
(8.150)
which also implies the intertwining relation n2 (z)d1 (z) = d2 (z)n1 (z).
(8.151)
We compute now, with n(z) = πd1 (z)∧d2 (z) ( f1 (z) f2 (z)),
ψ f1 ⊗ f2 g = d2 π− (d1 ∧ d2 )−1 f1 f2 g = d2 π− (d1 ∧ d2 )−1 (πd1 ∧d2 ( f1 f2 ))g = d2 π− (d1 ∧ d2 )−1 ng = d2 π− d2−1 n2 g = πd2 n2 g = ψn (g). The most important aspect of Theorem 8.72 is the establishment of a concrete connection between the space of maps Z intertwining the shifts Sd1 and Sd2 , i.e., satisfying ZSd1 = Sd2 Z on the one hand, and the F[z]-tensor product of the polynomial models Xd1 and Xd2 on the other. Note that (8.149) is the more familiar representation of an intertwining map; see Theorem 5.17. Next, we turn to the identification of the dual to the tensor product. Proposition 8.73. Given nontrivial polynomials d1 (z), d2 (z) ∈ F[z], then we have the series of F[z]-isomorphisms (Xd1 ⊗F[z] Xd2 )∗ (Xd1 ∧d2 )∗ X d1 ∧d2 X d1 ∩ X d2 Xd1 ⊗F[z] Xd2 Hom F[z] (Xd1 , Xd2 ).
(8.152)
Proof. We use the duality results on polynomial models, as well as the isomorphisms (8.143) and (8.148).
8.4.7 The Polynomial Sylvester Equation Note that the polynomial models Xd1 and Xd2 not only have a vector space structure but are actually, by (5.8), F[z]-modules. This implies that Xd2 (z)d1 (w) , and hence also, using the isomorphism (8.132), Xd2 ⊗F Xd1 , have natural F[z, w]-module structures. This is defined by p(z, w) · q(z, w) = πd2 (z)⊗d1 (w) (p(z, w)q(z, w)), where p(z, w) ∈ F[z, w].
q(z, w) ∈ Xd2 (z)⊗d1 (w) , (8.153)
8.4 Tensor Products of Models
267
Similarly, we define an F[z, w]-module structure on the tensored rational model X d2 (z)⊗d1 (w) by letting, for p(z, w) = ∑ki=1 ∑lj=1 pi j zi−1 w j−1 ∈ F[z, w] and h(z, w) ∈ X d2 (z)⊗d1 (w) , p(z, w) · h(z, w) = π d2 (z)⊗d1 (w) [∑ki=1 ∑lj=1 pi j zi−1 h(z, w)w j−1 ].
(8.154)
Proposition 8.74. Let d2 (z) ∈ F[z] and d1 (w) ∈ F[w] be nonsingular polynomials. Then 1. The F[z, w]-module structure on X d2 (z)⊗d1 (w) defined by (8.154) can be rewritten as k
l
p(z, w) · h(z, w) = (π−z ⊗ π−w) ∑ ∑ pi j zi−1 h(z, w)w j−1 .
(8.155)
i=1 j=1
2. With the F[z, w]-module structure on Xd2 (z)⊗d1 (w) and X d2 (z)⊗d1 (w) , given by (8.155) and (8.154) respectively, the multiplication map d2 (z) ⊗ d1(w) : X d2 (z)⊗d1 (w) −→ Xd2 (z)⊗F d1 (w) is an F[z, w]-module isomorphism, i.e., we have Xd2 (z)⊗d1 (w) X d2 (z)⊗d1 (w) .
(8.156)
Proof. 1. Follows from (8.154). 2. Follows, using Proposition 8.74, from the fact that h(z, w) ∈ X d2 (z)⊗d1 (w) if and only if h(z, w) = π d2 (z)⊗d1 (w) h(z, w), equivalently, if and only if
πd2 (z)⊗d1 (w) d2 (z)h(z, w)d1 (w) = d2 (z)h(z, w)d1 (w), i.e., d2 (z)h(z, w)d1 (w) ∈ Xd2 (z)⊗d1 (w) .
Special cases that are of interest are the single-variable shift operators Sz , Sw : Xd2 (z)⊗d1 (w) −→ Xd2 (z)⊗d1 (w) , defined by Sz q(z, w) = πd2 (z)⊗d1 (w) zq(z, w) = πd2 (z) zq(z, w), Sw q(z, w) = πd2 (z)⊗d1 (w) q(z, w)w = πI⊗d1 (w) q(z, w)w.
(8.157)
If we specialize definition (8.153) to the polynomial p(z, w) = z − w, we get, with q(z, w) ∈ Xd1 (z)⊗F d2 (w) , S q(z, w) = (z − w) · q(z, w) = πd2 (z)⊗F d1 (w) (zq(z, w) − q(z, w)w).
(8.158)
In fact, with A2 ∈ F p×p and A1 ∈ Fm×m , if D2 (z) = zI − A2 and D1 (w) = wI − A1 , then Q(z, w) ∈ XD2 (z)⊗D˜ 1 (w) if and only if Q(z, w) ∈ F p×m, i.e., Q(z, w)
268
8 Tensor Products and Forms
is a constant matrix. In that case we have XD2 (z)⊗F D˜ 1 (w) = F p×m and (z − w) · Q = π(zI−A2 )⊗(wI−A˜ 1 ) (z − w)Q = A2 Q − QA1 ,
(8.159)
which is the standard Sylvester operator. The equation (z − w) · Q = R reduces in this case to the Sylvester equation A2 Q − QA1 = R.
(8.160)
We proceed now to a more detailed study of the Sylvester equation in the tensored polynomial model framework. We saw above that the classical Sylvester equation AX − XB = C
(8.161)
S Q(z, w) = π(zI−A)⊗(wI−B) ˜ (z − w)Q = C,
(8.162)
corresponds to the equation
with Q,C ∈ X(zI−A)⊗F(wI−B) ˜ necessarily constant matrices. Note that every t(z, w) ∈ Xd2 (z)⊗d1 (w) has a full row rank factorization of the form 1 (w), t(z, w) = R2 (z)R
(8.163)
with R2 (z) ∈ XD2 and R1 (w) ∈ XD1 both of full row rank. The following theorem reduces the analysis of the general Sylvester equation to a polynomial equation of Bezout type. A special case is of course the homogeneous Sylvester equation which has a direct connection to the theory of Bezoutians. Theorem 8.75. Let d2 (z) ∈ F[z] and d1 (w) ∈ F[w] be nonsingular. Defining the Sylvester operator S : Xd2 (z)⊗d1 (w) −→ Xd2 (z)⊗d1 (w) by S q(z, w) = πd2 (z)⊗d1 (w) (z − w)q(z, w) = R2 (z)R˜ 1 (w)
(8.164)
(8.164), then for R2 (z) ∈ Xd2 ⊗I and R˜ 1 (w) ∈ XI⊗d1 we have 1. The Sylvester equation Sd2 q − qSd1 = t(z, w) = R2 (z)R˜ 1 (w)
(8.165)
S q = R2 (z)R˜ 1 (w),
(8.166)
or equivalently is solvable if and only if there exist polynomials n2 (z) ∈ Xd2 (z) and n1 (z) ∈ Xd1 (z) for which d2 (z)n1 (z) − n2 (z)d1 (z) + R2 (z)R˜ 1 (z) = 0. (8.167)
8.4 Tensor Products of Models
269
We will refer to (8.167) as the polynomial Sylvester equation, or PSE. In that case, the solution is given by q(z, w) =
d2 (z)n1 (w) − n2 (z)d1 (w) + R2 (z)R˜ 1 (w) . z−w
(8.168)
2. Q(z, w) ∈ XD2 (z)⊗D˜ 1 (w) solves the homogeneous polynomial Sylvester equation, or HPSE, if and only if there exist polynomials n2 (z) ∈ Xd2 and n1 (z) ∈ Xd1 that satisfy d2 (z)n1 (z) − n2 (z)d1 (z) = 0, (8.169) in terms of which q(z, w) =
d2 (z)n1 (w) − n2 (z)d1 (w) , z−w
(8.170)
i.e., q(z, w) is a Bezoutian. Proof. 1. Assume there exist polynomials n2 (z) ∈ Xd2 ⊗I and n1 (z) ∈ Xg1 solving equation (8.167), and for which q(z, w) is defined by (8.168). We note first that under our assumptions on R2 (z), R1 (z), d2 (z)−1 q(z, w)d1 (w)−1 n1 (w)d1 (w)−1 − d2 (z)−1 n2 (z) + d2 (z)−1 R1 (z)R2 (w)d1 (w)−1 = z−w is strictly proper in both variables, i.e., q(z, w) is in Xd2 (z)⊗d1 (w) . We compute S q(z, w) = πd2 (z)⊗d1 (w) (z − w)q(z, w) = πd2 (z)⊗d1 (w) (d2 (z)n1 (w) − n2 (z)d1 (w) + R2 (z)R1 (w)) = R2 (z)R1 (w), i.e., q(z, w) is indeed a solution. To prove the converse, we note that for the single-variable case, given a nonzero polynomial d2 (z) ∈ F[z], we have, for f (z) ∈ Xd1 , Sd1 f = z f (z) − d1 (z)ξ f , where ξ f = (d1−1 f )−1 . This implies that for q(z, w) ∈ Xd2 (z)⊗d1 (w) , we have Sz⊗1 q(z, w) = zq(z, w) − d2 (z)n1 (w) and S1⊗w q(z, w) = q(z, w)w − n2 (z)d1 (w) with n1 d1−1 , d2−1 n2 strictly proper. Thus, assuming q(z, w) is a solution of the PSE, we have Sz−w q(z, w) = [zq(z, w) − d2 (z)n1 (w)] − [q(z, w)w − n2 (z)d1 (w)] = [zq(z, w) − q(z, w)w] − [d2 (z)n1 (w) − n2 (z)d1 (w)] = R2 (z)R1 (w). Equation (8.162) reduces to
270
8 Tensor Products and Forms
[zQ(z, w) − Q(z, w)w] − [d2 (z)n1 (w) − n2 (z)d1 (w)] = R2 (z)R1 (w), or q(z, w) =
d2 (z)n1 (w) − n2 (z)d1 (w) + R2 (z)R1 (w) . z−w
(8.171)
(8.172)
However, since q(z, w) ∈ Xd2 (z)⊗d1 (w) is a polynomial, we must have that (8.167) holds. 2. Follows from the previous part.
8.4.8 Reproducing Kernels In Theorem 8.69, we proved the isomorphism (8.136), which shows that any linear transformation K : Xd1 −→ Xd2 can be represented by a unique polynomial k(z, w) ∈ Xd2 (z)d1 (w) , with the representation given, for f (z) ∈ Xd1 , by (K f )(z) = f (·), k(z, ·).
(8.173)
With the identification Xq∗ = Xq , given in terms of the pairing (5.64), we can introduce reproducing kernels for polynomial models. To this end we consider the ring F[z, w] of polynomials in the two variables z, w. Let now q(z) be a monic polynomial of degree n. Given a polynomial k(z, w) ∈ F[z, w] of degree less than n in each variable, it can be written as k(z, w) = ∑ni=1 ∑nj=1 ki j zi−1 w j−1 . Such a polynomial k(z, w) induces a linear transformation K : Xd −→ Xd , defined by n
n
(K f )(z) = f (·), k(z, ·) = ∑ ∑ ki j zi−1 w j−1 , f .
(8.174)
i=1 j=1
We will say that the polynomial k(z, w) is the kernel of the map K defined in (8.174). Now, any f (w) ∈ Xd has an expansion f (w) = ∑nk=1 fk ek (w), with {e1 (z), . . . , en (z)} the control basis in Xq . Thus, we compute (K f )(z) = ∑ni=1 ∑nj=1 ki j zi−1 w j−1 , f = ∑ni=1 ∑nj=1 ∑nk=1 ki j zi−1 fk w j−1 , ek = ∑ni=1 ∑nj=1 ∑nk=1 ki j fk zi−1 δ jk = ∑ni=1 ∑nj=1 ki j f j zi−1 . This implies that [K]stco = (ki j ). Given a polynomial q(z) of degree n, we consider now the special kernel, defined by q(z) − q(w) k(z, w) = . (8.175) z−w
8.4 Tensor Products of Models
271
Proposition 8.76. Given a monic polynomial q(z) = zn + qn−1 zn−1 + · · · + q0 of degree n, then, with k(z, w) defined by (8.175): 1. We have k(z, w) =
n
n
j=1
j=1
∑ w j−1 e j (z) = ∑ z j−1 e j (w).
(8.176)
f (·), k(z, ·) = f (z).
(8.177)
2. For any f (z) ∈ Xd , we have
3. We have the following matrix representations: st [K]co co = [K]st = I.
(8.178)
j−1 k j−k−1 Proof. 1. We use the fact that z j − w j = (z − w) ∑k=1 zw to get j j q(z) − q(w) = zn − wn + ∑n−1 j=1 q j (z − w ) = (z − w) ∑k=1 zk wn−k−1 + (z − w) ∑ j=1 q j ∑k=1 zk w j−k−1 = (z − w) ∑ j=1 w j−1 e j (z)
= ∑nj=1 w j−1 e j (z). The equality and hence q(z)−q(w) z−w follows by symmetry. 2. Let f (z) ∈ Xq and f (z) = ∑nj=1 f j e j (z). Then
q(z)−q(w) z−w
= ∑nj=1 z j−1 e j (w)
k(z, ·), f = ∑nj=1 w j−1 e j (z), ∑nk=1 fk ek (w) = ∑nj=1 ∑nk=1 fk e j (z)w j−1 , ek (w) = ∑nj=1 ∑nk=1 fk e j (z)δ jk = ∑nk=1 fk ek (z) = f (z). 3. Obviously, by the previous part, K acts as the identity; hence its matrix representation in any basis is the identity matrix. The matrix representations in (8.178) can also be verified by direct computations. A kernel with the properties described in Proposition 8.76 is called a reproducing kernel for the space Xq . We note that the map K : Xq −→ Xq , defined by Proposition 8.76, is the identity map. As such, it clearly satisfies ISq = Sq I, i.e., it is an intertwining map. Of course, not every map in Hom F (Xq , Xq ) is intertwining. What makes K an intertwining map is the special form of the kernel given in (8.175), namely the fact that the kernel is one of the simplest Bezoutians, a topic we will pick up in the sequel. Given a polynomial M(z, w) = ∑ni=1 ∑nj=1 Mi j zi−1 w j−1 in the variables z, w, we have an induced bilinear form in Xq × Xq defined by
φ ( f , g) = M f , g = M(z, ·), f w gz .
(8.179)
272
8 Tensor Products and Forms
In turn, this induces a quadratic form given by φ ( f , f ) = M f , f = M(z, ·), f w f z . Proposition 8.77. Let q(z) ∈ F[z] and let Bst and Bco be the standard and control bases respectively of Xq , defined in Chapter 5. 1. The matrix representation of the bilinear form φ ( f , g), given by (8.179), with respect to the control and standard bases of Xq is (Mi j ). 2. φ is symmetric if and only if (Mi j ) is symmetric. 3. The form φ is positive definite if and only if (Mi j ) is a positive definite matrix. Proof. 1. Mi j , the i j entry of [M]stco , is given by Me j , ei = ∑nk=1 ∑nl=1 Mi j zk−1 wl−1 , e j w , ei z = ∑nk=1 ∑nl=1 Mi j zk−1 wl−1 , e j w , ei z = ∑nk=1 Mk j zk−1 , ei z = ∑nk=1 Mk j δik = Mi j . 2. Follows from (8.179) and the previous computation. 3. Follows from the equality φ ( f , f ) = ([M]stco [ f ]co , [ f ]co ).
In the next subsection, we will return to the study of representations of general linear transformations in terms of kernels. This will be done using the polynomial representations of the tensor products of polynomial models, both over the underlying field F and over the polynomial ring F[z].
8.4.9 The Bezout Map Since a map intertwining the polynomial models Xq1 and Xq2 is automatically linear, there is a natural embedding of Hom F[z] (Xd1 , Xd2 ) in Hom F (Xd1 , Xd2 ). This, taken together with the isomorphisms (8.137) and (8.146), shows that there exists a natural embedding of Xd1 ⊗F[z] Xd2 in Xd1 ⊗F Xd2 . In order to make this embedding specific, we use concrete representations for the two tensor products, namely the isomorphisms (8.136) and (8.146). One result of using concrete representations for the tensor product is that they reveal, in the clearest possible conceptual way, the role of the Bezoutians in the analysis of maps intertwining two shifts. Theorem 8.78. Given d1 (z), d2 (z) ∈ F[z] − {0}, let Ψ : Xd1 (z)d2 (w) −→ Hom F (Xd1 , Xd2 ) be given by (8.135) and let ψ : Xd1 ⊗F[z] Xd2 −→ Hom F[z](Xd1 , Xd2 ) be given by (8.147). Let i : Hom F[z] (Xd1 , Xd2 ) −→ Hom F (Xd1 , Xd2 ) be the natural embedding. For n(z) ∈ Xd1 ∧d2 having the representation (8.150), we define the Bezout map β : Xd1 ∧d2 −→ Xd1 (z)d2 (w) by
β (n) = q(z, w) =
d2 (z)n1 (w) − n2 (z)d1 (w) , z−w
(8.180)
8.4 Tensor Products of Models
273
i.e., q(z, w), denotes the Bezoutian based on the intertwining relation (8.151). Then β is injective, and we have the equality
Ψ ◦ β = i ◦ ψ,
(8.181)
i.e., the Bezout map is the concretization of the embedding of Xd1 ⊗F[z] Xd2 in Xd1 ⊗F Xd2 . Proof. Note, as above, that any element n(z) ∈ Xd1 ∧d2 yields a unique strictly n(z) proper function h(z) = d (z)∧d = nd1 (z) = nd2 (z) . Thus n(z) determines unique 1 2 (z) 1 (z) 2 (z) polynomials n1 (z), n2 (z) satisfying the intertwining relation (8.151). By inspection, d1 (z)−1 q(z, w)d2 (w)−1 ∈ z−1 Fsep [[z−1 , w−1 ]]w−1 and thus q(z, w) ∈ Xd1 (z)d2 (w) . This shows that β is well defined and F-linear. The map β is injective, since q(z, w) = 0 n2 (w) implies, by (8.150), that nd1 (z) (z) = d (w) is constant. In turn, we conclude that h(z) = 1
2
n(z) d1 (z)∧d2 (z)
is a constant which, by strict properness, is necessarily zero. To prove the equality (8.181), we compute, with β (n) = q(z, w) given by (8.180), & % d2 (z)n1 (w) − n2 (z)d1 (w) g, q(z, ·) = g, z−w ! " d2 (z)n1 (w) − n2 (z)d1 (w) −1 = d1 (w) g(w), z−w w d2 (z)n1 (w) − n2 (z)d1 (w) −1 d1 (w) g(w) = z−w w−1 d2 (z)n1 (w)d1 (w)−1 g(w) n2 (z)g(w) − = z−w z−w −1 = −d2 π+ n1 d1−1 g + n2g = n2 g − d2π+ d2−1 n2 g = πd2 n2 g.
(8.182)
Here (F(z, w))w−1 denotes the coefficient of w−1 in the Laurent expansion of F(z, w) with respect to the variable w. The availability of the Bezout map and the representation (8.182) allow us to extend Theorem 5.17, giving a characterization of elements of Hom F[z] (Xd1 , Xd2 ). Theorem 8.79. Given nonzero polynomials d1 (z), d2 (z) ∈ F[z], then Z ∈ Hom (Xd1 , Xd2 ), i.e., it satisfies ZSd1 = Sd2 Z, if and only if there exist polynomials n1 (z), n2 (z) satisfying n2 (z)d1 (z) = d2 (z)n1 (z), (8.183) and for which Z has the representation Zg = πd2 n2 g, Proof. Follows from Theorem 8.78.
g(z) ∈ Xd1 .
(8.184)
274
8 Tensor Products and Forms
8.5 Exercises 1. Jacobi’s signature rule. Let A(x, x) be a Hermitian form and let Δ i be the determinants of the principal minors. If the rank of A is r and Δ1 , . . . , Δ r are nonzero then
π = P(1, Δ1 , . . . , Δr ),
ν = V (1, Δ1 , . . . , Δr ),
where P and V denote, respectively, the number of sign permanences and the number of sign changes in the sequence 1, Δ 1 , . . . , Δ r . m n m i i 2. Let p(z) = ∑m i=0 pi z = pm ∏i=1 (z − αi ) and q(z) = ∑i=0 qi z = qn ∏i=1 (z − βi ). n m Show that detRes (p, q) = pm qn ∏i, j (βi − α j ). 3. Show that a real Hankel form n
n
Hn (x, x) = ∑ ∑ si+ j−1 ξi ξ j i=1 j=1
is positive if and only if the parameters sk allow a representation of the form sk =
n
∑ ρ j θ jk ,
j=1
with ρ j > 0 and θ j distinct real numbers. 4. Given g1 , . . . , g2n−1 , consider the Hankel matrix Hn given by (8.66). Let a(z) = n−1 i i ∑n−1 i=0 ai z and x(z) = ∑i=0 xi z be the polynomials arising out of the solutions of the system of linear equations ⎛
g1 ⎜. ⎜ ⎜ ⎜. ⎜ ⎝. gn
⎞⎛ ⎞ ⎛ ⎞ . . . gn x0 r1 ⎟⎜. ⎟ ⎜. ⎟ .... ⎟⎜ ⎟ ⎜ ⎟ ⎟⎜ ⎟ ⎜ ⎟ .... ⎟⎜. ⎟ = ⎜. ⎟ ⎟⎜ ⎟ ⎜ ⎟ ⎠⎝. ⎠ ⎝. ⎠ .... . . . g2n−1 xn−1 rn
with the right-hand side being given respectively by ri = and
0, i = 1, . . . , n − 1, 1, i = n,
ri =
1, i = 1, 0, i = 2, . . . , n.
8.5 Exercises
275
Show that if a0 = 0, we have xn−1 = a0 , the Hankel matrix Hn is invertible, and its inverse is given by Hn−1 = B(y, a), where the polynomial y is defined by y(z) = a(0)−1 zx(z) = q(z) − a(0)−1q(0)a(z). 5. Show that a Hermitian Toeplitz form n
Tn (x, x) = ∑
n
∑ ci− j ξi ξ j
i=1 j=1
with the matrix
⎛
c0 ⎜ . ⎜ ⎜ Tn = ⎜ . ⎜ ⎝ . cn
. . c−n+1 .. . .. . .. . .. .
⎞ c−n . ⎟ ⎟ ⎟ . ⎟ ⎟ . ⎠ c0
is positive definite if and only if the parameters ck = c−k allow a representation of the form ck =
n
∑ ρ j θ jk ,
j=1
with ρ j > 0, |θ j | = 1, and the θ j distinct. 6. We say a sequence c0 , c1 , . . . of complex numbers is a positive sequence if for all n ≥ 0, the corresponding Toeplitz form is positive definite. An inner product on C[z] is defined by ⎛
c0 ⎜ . ⎜ ⎜ a, bC = a0 . . . an ⎜ . ⎜ ⎝ . cn
⎞⎛ ⎞ . . . cn b0 ⎜ ⎟ ... . ⎟ ⎟⎜ . ⎟ n n ⎟⎜ ⎟ . . . . ⎟ ⎜ . ⎟ = ∑ ∑ c(i− j) ai b j . ⎟⎜ ⎟ . . . . ⎠ ⎝ . ⎠ i=0 j=0 . . . c0 bn
Under our assumption of positivity, this is a definite inner product on C[z]. a. Show that the inner product defined above has the following properties: zi , z j C = c(i− j) and za, zbC = a, bC . b. Let φn be the sequence of polynomials, obtained from the polynomials 1, z, z2 , . . . by an application of the Gram-Schmidt orthogonalization procedure. Assume all the φn are monic. Show that
276
8 Tensor Products and Forms
φ0 (z) = 1, c 0 . 1 φn (z) = . det Tn−1 cn−1 1 c. Show that
φn , zn−i C = 0,
d. Show that
.. . .. . .. . . . c0 . . zn−1
c−n . . . c−1 zn
i = 0, . . . , n − 1. n−1
φn (z) = 1 − z ∑ γi+1 φi (z), i=0
where the γi are defined by
γi+1 =
1, zφi . φi 2
The γi are called the Schur parameters of the sequence c0 , c1 , . . . . e. Show that the orthogonal polynomials satisfy the following recurrences φn (z) = zφn−1 (z) − γn φn−1 (z), φn (z) = φn−1 (z) − zγn φn−1 (z),
with the initial conditions for the recurrences given by φ0 (z) = φ0 (z) = 1. Equivalently,
f. Show that g. Show that
φn (z) φn (z)
=
z − γn −zγn 1
φn−1 (z) φn−1 (z)
,
φ0 (0) φ0 (0)
=
1 . 1
γn = −φn (0). φn 2 = (1 − γn2 )φn−1 2 /
h. Show that the Schur parameters γn satisfy |γn | < 1. i. The Levinson algorithm. Let {φn (z)} be the monic orthogonal polynomials i associated with the positive sequence {cn }. Assume φn (z) = zn + ∑n−1 i=0 φn,i z .
8.6 Notes and Remarks
277
Show that they can be computed recursively by rn+1 = (1 − γn2 )rn , 1 n γn+1 = ∑ ci+1 φn,i , rn i=0 φn+1 (z) = zφn (z) − γn+1 φn (z),
r0 = c0 ,
φ0 (z) = 1.
j. Show that the shift operator Sφn acting in Xφn satisfies i
Sφn φi = φi+1 − γi+1 ∑ γ j Πνi = j+1 (1 − γν2)φ j + γi+1 Πνi =1 (1 − γν2 )φ0 . j=1
k. Let c0 , c1 , . . . be a positive sequence. Define a new sequence cˆ0 , cˆ1 , . . . through the infinite system of equations given by ⎞ ⎛ ⎞ 1 cˆ0 /2 c0 /2 0 . . . ⎜ ⎟ ⎜0⎟ c ˆ ⎜ ⎟ 1 ⎟ ⎜ c1 c0 /2 0 . . ⎟ ⎜ ⎟ ⎜ ⎟ ⎟⎜ ⎜ = . ⎜ . ⎟. ⎟ ⎜ ⎝ c2 c1 . . . ⎠ ⎜ ⎟ ⎜ ⎟ ⎝ . ⎠ ⎝.⎠ . . . .. . . ⎛
⎞
⎛
Show that cˆ0 , cˆ1 , . . . is also a positive sequence. l. Show that the Schur parameters for the orthogonal polynomials ψn , corresponding to the new sequence, are given by −γn . m. Show that ψn satisfy the recurrence relation ψn = zψn−1 + γn ψn−1 , ψn = ψn−1 + zγn ψn−1 .
The initial conditions for the recurrences are ψ0 (z) = ψ0 (z) = 1.
8.6 Notes and Remarks The study of quadratic forms owes much to the work of Sylvester and Cayley. Independently, Hermite studied both quadratic and Hermitian forms and, more or less simultaneously, Sylvester’s law of inertia. Bezout forms and matrices were studied already by Sylvester and Cayley. The connection of Hankel and Bezout forms was known to Jacobi. For some of the history of the subject, we recommend Krein and Naimark (1936), as well as Gantmacher (1959).
278
8 Tensor Products and Forms
Continued fraction representations need not be restricted to rational functions. In fact, most applications of continued fractions are in the area of rational approximations of functions, or of numbers. Here is the generalized Euclidean algorithm; see Magnus (1962). Given g(z) ∈ F((z−1 )), we set g(z) = a0 (z) + g0 (z), with a0 (z) = π+ g(z). Define recursively
βn−1 = an (z) − gn (z), gn−1 (z) where βn−1 is a normalizing constant, an (z) a monic polynomial, and gn (z) ∈ z−1 F[[z−1 ]]. This leads, in the nonrational case, to an infinite continued fraction, which can be applied to approximation problems. In the proof of Proposition 8.65, we follow Lang (1965). Theorem 8.32 is due to Kravitsky (1980). The Gohberg–Semencul representation formula (8.40) for the Bezoutian is from Gohberg and Semencul (1972). The interpretation of the Bezoutian as a matrix representation of an intertwining map is due to Fuhrmann (1981b) and Helmke and Fuhrmann (1989). For the early history of the Bezoutian, see also Wimmer (1990). The approach to tensor products of polynomial models and the Bezout map are based on Fuhrmann and Helmke (2010). The analysis of the polynomial Sylvester equation extends the method, introduced in Willems and Fuhrmann (1992), for the analysis of the Lyapunov equation. In turn, this was influenced by Kalman (1969). The characterization of the inverse of finite nonsingular Hankel matrices as Bezoutians, given in Theorem 8.39, is due to Lander (1974). For a more detailed discussion of the vectorial versions of many of the results given in this chapter, in particular to the tensor products of vectorial functional models and the corresponding version of the Bezout map, we refer to Fuhrmann and Helmke (2010).
Chapter 9
Stability
9.1 Introduction The formalization of mathematical stability theory is generally traced to the classic paper Maxwell (1868). The problem is this: given a system of linear time-invariant, differential (or difference) equations, find a characterization for the asymptotic stability of all solutions. This problem reduces to finding conditions on a given polynomial that guarantee that all its zeros, i.e., roots, lie in the open left half-plane (or open unit disk).
9.2 Root Location Using Quadratic Forms The problem of root location of algebraic equations has served to motivate the introduction and study of quadratic and Hermitian forms. The first result in this direction seems to be that of Borchardt (1847). This in turn motivated Jacobi (1857), as well as Hermite (1856), who made no doubt the greatest contribution to this subject. We follow, as much as possible, the masterful exposition of Krein and Naimark (1936), which every reader is advised to consult. The following theorem summarizes one of the earliest applications of quadratic forms to the problem of zero location. Theorem 9.1. Let q(z) = ∑ni=0 qi zi be a real polynomial, g(z) = infinite Hankel matrix of g(z). Then
q (z) q(z) ,
and Hg the
1. The number of distinct, real or complex, roots of q(z) is equal to rank(Hg ). 2. The number of distinct real roots of q(z) is equal to the signature σ (Hg ), or equivalently to the Cauchy index Ig of g(z). Proof. 1. Let α1 , . . . , αm be the distinct roots of q(z) = ∑ni=0 qi zi , whether real or complex. So P.A. Fuhrmann, A Polynomial Approach to Linear Algebra, Universitext, DOI 10.1007/978-1-4614-0338-8 9, © Springer Science+Business Media, LLC 2012
279
280
9 Stability m
q(z) = qn ∏(z − αi )νi , i=1
with ν1 + · · · + νm = n. Hence the rational function g(z) defined below satisfies g(z) =
m νi q (z) =∑ . q(z) i=1 (z − αi )
From this it is clear that m is equal to δ (g), the McMillan degree of g(z), and so in turn to rank(Hg ). 2. On the other hand, the Cauchy index of g(z) is obviously equal to the number of real zeros, since all residues νi are positive. But by the Hermite-Hurwitz theorem, we have Ig = σ (Hg ). The analysis of Bezoutians, undertaken in Chapter 8, can be efficiently applied to the problem of root location of polynomials. We begin by applying it to the same problem as above. Theorem 9.2. Let q(z) be as in the previous theorem. Then 1. The number of distinct, real or complex, roots of q(z) is equal to codim Ker B(q, q ). 2. The number of distinct real roots of q(z) is equal to σ (B(q, q )). 3. All roots of q(z) are real if and only if B(q, q ) ≥ 0. Proof. 1. From our study of Bezoutians and intertwining maps we know that dim Ker B(q, q ) = deg r, where r(z) is the g.c.d. of q(z) and q (z). It is easy to νi −1 ; check that the g.c.d. of q(z) and q (z) is equal to r(z) = qn ∏m i=1 (z − αi ) hence its degree is n − m. 2. This follows as above. 3. Note that if the inertia, see Definition 8.11, is In (B(q, q )) = (π , ν , δ ), then ν = (rank (B(q, q )) − σ (B(q, q)))/2. So ν = 0 if and only if rank(B(q, q )) = σ (B(q, q )), i.e., if and only if all roots of q(z) are real. Naturally, in Theorem 9.1, the infinite quadratic form Hg can be replaced by Hn , the n × n truncated form. In fact, it is this form, in various guises, that appears in the early studies. We digress a little on this point. If we expand 1/(z − αi ) in powers of z−1 , we obtain g(z) =
∞ m m ∞ αik q (z) νi sk =∑ = ∑ νi ∑ k+1 = ∑ k+1 , q(z) i=1 z − αi i=1 k=1 z z k=1
k with sk = ∑m i=1 νi αi . If all the zeros are simple, the numbers sk are called the Newton sums of q(z). The finite Hankel quadratic form ∑n−1 i=0 si+ j ξi ξ j is easily seen to be a j 2 different representation of ∑ni=1 (∑n−1 ξ α ) . j=0 j i
9.2 Root Location Using Quadratic Forms
281
The result stated in Theorem 9.1 far from exhausts the power of the method of quadratic forms. We address ourselves now to the problem of determining, for an arbitrary complex polynomial q(z), the number of its zeros in an open half-plane. We begin with the problem of determining the number of zeros of q(z) = ∑nk=0 qk zk in the open upper half-plane. Once this is accomplished, we can apply the result to other half-planes, most importantly to the open left half-plane. To solve this problem, Hermite introduced the notion of Hermitian forms, which had far-reaching consequences in mathematics, Hilbert space theory being one offshoot. Given a complex polynomial q(z) of degree n, we define q(z) ˜ = q(z).
(9.1)
Clearly, (9.1) implies that q( ˜ α ) = 0 if and only if q(α ) = 0. Thus, α is a common zero of q(z) and q(z) ˜ if and only if α is a real zero of q(z) or α , α is a pair of complex conjugate zeros of q(z). Therefore the degree of q(z)∧ q(z) ˜ counts the number of real and complex conjugate zeros of q(z). The next theorem gives a complete analysis. Theorem 9.3. Given a complex polynomial q(z) of degree n, we define a polynomial Q(z, w) in two variables by Q(z, w) = −iB(q, q) ˜ = −i =
n
q(z)q(w) ˜ − q(z)q(w) ˜ z−w
n
∑ ∑ Q jk z j−1 wk−1 .
(9.2)
j=1 k=1
The polynomial Q(z, w) is called the generating function of the quadratic form, defined on Cn , by Q=
n
n
∑ ∑ Q jk ξ j ξk .
(9.3)
j=1 k=1
Then 1. The form Q defined in (9.3) is Hermitian, i.e., we have Q jk = Qk j . 2. Let (π , ν , δ ) be the inertia of the form Q. Then the number of real zeros of q(z) together with the number of zeros of q(z) arising from complex conjugate pairs is equal to δ . There are π more zeros of q(z) in the open upper half-plane and ν more in the open lower half-plane. In particular, all zeros of q(z) are in the open upper half-plane if and only if Q is positive definite.
282
9 Stability
Proof. 1. We compute
q(w)q(z) ˜ − q(w)q(z) ˜ Q(w, z) = −i w−z
q(z)q(w) ˜ − q(z)q(w) ˜ q(w)q(z) ˜ − q(w)q(z) ˜ = −i =i w−z z−w
= Q(z, w). In turn, this implies ∑nj=1 ∑nk=1 Q jk z j−1 wk−1 = ∑nj=1 ∑nk=1 Q jk w j−1 zk−1 = ∑nj=1 ∑nk=1 Q jk zk−1 w j−1 = ∑nj=1 ∑nk=1 Qk j z j−1 wk−1 . Comparing coefficients, the equality Q jk = Qk j is obtained. 2. Let d(z) be the g.c.d. of q(z) and q(z). ˜ We saw previously that the zeros of d(z) are the real zeros of q(z) and all pairs of complex conjugate zeros of q(z). Thus, we may assume that d(z) is a real monic polynomial. Write q(z) = d(z)q (z) and q(z) ˜ = d(z)q˜ (z). Using Lemma 8.26, we have B(q, q) ˜ = B(dq , d q˜ ) and hence rank B(q, q) ˜ = rankB(q , q˜ ) and σ (−iB(q, q)) ˜ = σ (−iB(q , q˜ )). This shows that δ = deg d is the number of real zeros added to the number of zeros arising from complex conjugate zeros of q(z). Therefore, for the next step, we may as well assume that q(z) and q(z) ˜ are coprime. Let q(z) = q1 (z)q2 (z) be any factorization of q(z) into real or complex factors. We do not assume that q1 (z) and q2 (z) are coprime. We compute −i
q1 (z)q˜1 (w) − q˜1 (z)q1 (w) q(z)q(w) ˜ − q(z)q(w) ˜ = −iq2 (z) q˜2 (w) z−w z−w −iq˜1 (z)
q2 (z)q˜2 (w) − q˜2 (z)q2 (w) q1 (w). z−w
Since by construction, the polynomials qi (z), q˜i (z), i = 1, 2 are coprime, the ranks of the two Hermitian forms on the right are degq1 and deg q2 respectively. By Theorem 8.15, the signature of Q is equal to the sum of the signatures of the forms q1 (z)q˜1 (w) − q˜1 (z)q1 (w) Q1 = −i z−w and Q2 = −i
q2 (z)q˜2 (w) − q˜2 (z)q2 (w) . z−w
9.2 Root Location Using Quadratic Forms
283
Let α1 , . . . , αn be the zeros of q(z). Then by an induction argument, it follows that
σ (−iB(q, q)) ˜ = ∑nk=1 σ (−iB(z − αk , z − α k )) = ∑nk=1 sign (2Im (αk )) = π − ν . Thus π is the number of zeros of q(z) in the upper half-plane and ν the number in the lower half-plane. We can restate the previous theorem in terms of either the Cauchy index or the Hankel form. Theorem 9.4. Let q(z) be a complex polynomial of degree n. Define its real and imaginary parts by q(z) + q(z) ˜ , 2 q(z) − q(z) ˜ qi (z) = . 2i
qr (z) =
(9.4)
We assume deg qr ≥ deg qi . Define the proper rational function g(z) = qi (z)/qr (z). Then 1. The Cauchy index of g(z), Ig = σ (Hg ), is the difference between the number of zeros of q(z) in the lower half-plane and in the upper half-plane. 2. We have Ig = σ (Hg ) = −n, i.e., Hg or equivalenly B(qr , qi ) are negative definite if and only if all the zeros of q(z) lie in the open upper half-plane. Proof. 1. Note that the assumption that deg qr ≥ deg qi does not limit generality, since if this condition is not satisfied, we can consider instead the polynomial iq(z). The Hermitian form −iB(q, q) ˜ can be also represented as the Bezoutian of two real polynomials. Indeed, if q(z) = qr (z) + iqi (z) with qr and qi real polynomials, then q(z) ˜ = qr (z) − iqi (z) and −iB(q, q) ˜ = −iB(qr + iqi , qr − iqi ) = −i[B(qr , qr ) + iB(qi , qr ) − iB(qr , qi ) + B(qi , qi )] = 2B(qi , qr ), or − iB(q, q) ˜ = 2B(qi , qr ).
(9.5)
˜ = σ (B(qi , qr )) and, by the From equality (9.5) it follows that σ (−iB(q, q)) Hermite–Hurwitz theorem, that
σ (B(qi , qr )) = −σ (B(qr , qi )) = −σ (Hg ) = −Ig . The result follows from Theorem 9.3. 2. Follows from part 1.
284
9 Stability
Definition 9.5. A complex polynomial q(z) is called stable, or a Hurwitz polynomial, if all its zeros lie in the open left half-plane Π− . To obtain stability characterizations, we need to replace the open upper half-plane by the left open half-plane and this can easily be done by a change of variable. Theorem 9.6. Let q(z) be a complex polynomial. Then a necessary and sufficient condition for q(z) to be a Hurwitz polynomial is that the Hermite–Fujiwara quadratic form with generating function H=
q(z)q(w) ˜ − q(−z)q(−w) ˜ = z+w
n
n
∑ ∑ H jk z j−1 wk−1
(9.6)
j=1 k=1
be positive definite. Proof. Obviously, a zero α of q(z) is in the open left half-plane if and only if −iα , which is in the upper half-plane, is a zero of f (z) = q(iz). Thus, by Theorem 9.4, all the zeros of q(z) are in the left half-plane if and only if the Hermitian form −iB( f , f˜) is positive definite. ˜ it follows that Now, since f˜(z) = f (z) = q(iz) = q((−iz)) = q(−iz), −iB( f , f˜) = −i
f (z) f˜(w) − f˜(z) f (w) q(iz)q(−iw) ˜ − q(−iz)q(iw) ˜ = . z−w i(z − w)
If we substitute ζ = iz and ω = −iw, then we get the Hermite-Fujiwara generating function (9.6), and this form has to be positive for all the zeros of q(z) to be in the upper half plane. Definition 9.7. Let q(z) be a real monic polynomial of degree m with simple real zeros α1 < α2 < · · · < αm and let p(z) be a real polynomial of degree ≤ m with positive leading coefficient and zeros β1 < β2 < · · · < βm−1 if deg p = m − 1, β1 < β2 < · · · < βm if deg p = m. Then 1. We say that q(z) and p(z) are a real pair if the zeros satisfy the interlacing property
α1 < β1 < α2 < β2 < · · · < βm−1 < αm β1 < α1 < β2 < · · · < βm < αm
if if
deg p = m − 1, deg p = m.
(9.7)
2. We say that q(z) and p(z) form a positive pair if they form a real pair and αm < 0 holds.
9.2 Root Location Using Quadratic Forms
285
Theorem 9.8. Let q(z) and p(z) be a pair of real polynomials as above. Then 1. q(z) and p(z) form a real pair if and only if B(q, p) > 0. 2. q(z) and p(z) form a positive pair if and only if B(q, p) > 0 and B(zp, q) > 0. Proof. 1. Assume q(z) and p(z) form a real pair. By our assumption, all zeros αi of q(z) are real and simple. Define a rational function by g(z) = p(z)/q(z). Clearly c q(z) = ∏mj=1 (z − α j ). Hence g(z) = ∑mj=1 z−αj j , and it is easily checked that c j = p(αi )/q (αi ). Using the Hermite-Hurwitz theorem, to show that B(q, p) > 0 it suffices to show that σ (B(q, p)) = Ig = m, or equivalently that p(αi )/q (αi ) > 0, for all i. Since both p(z) and q(z) have positive leading coefficients, we have, except for the trivial case m = 1 and deg p = 0, limx→∞ p(x) = limx→∞ q(x) = +∞. This implies q (αm ) > 0 and p(αm ) > 0. As a result, we have p(αm )/q (αm ) > 0. The simplicity of the zeros of q(z) and p(z) and the interlacing property forces the signs of the q (αi ) and p(αi ) to alternate, and this forces all other residues p(αi )/q (αi ) to be positive. The reader is advised to draw a picture in order to gain a better understanding of the argument. Conversely, assume B(q, p) > 0. Then by the Hermite-Hurwitz theorem, I p = m, which means that all the zeros of q(z) have to be real and simple and q
the residues satisfy p(αi )/q (αi ) > 0. Since all the zeros of q(z) are simple necessarily the signs of the q (αi ) alternate. Thus p(z) has values of different signs at neighboring zeros of q(z), and hence its zeros are located between zeros of q(z). If deg p = m − 1, we are done. If deg p = m, there is an extra zero of p(z). Since q (αm ) > 0, necessarily p(αm ) > 0. Now βm > αm would imply limx→∞ p(x) = −∞, contrary to the assumption that p has positive leading coefficient. So necessarily we have β1 < α1 < β2 < · · · < βm < αm , i.e., q(z) and p(z) form a real pair. 2. Assume now q(z) and p(z) are a positive pair. By our assumption all αi are negative. Moreover, since in particular q(z) and p(z) are a real pair, it follows by part (1) that B(q, p) > 0. This means that p(αi )/q (αi ) > 0 for all i = 1, . . . , m and hence that αi p(αi )/q (αi ) < 0. But these are the residues of zp(z)/q(z), and so I zp = σ (B(q, zp)) = −m. So B(q, zp) is negative definite and B(zp, q) positive q definite. Conversely, assume the two Bezoutians B(q, p) and B(zp, q) are positive definite. By part (1), the polynomials q(z) and p(z) form a real pair. Thus the residues satisfy p(αi )/q (αi ) > 0 and αi p(αi )/q (αi ) < 0. So in particular all zeros of q(z) are negative. In particular, αm < 0, and so q(z) and p(z) form a positive pair. As a corollary we obtain the following characterization. Theorem 9.9. Let q(z) be a complex polynomial and let qr (z), qi (z) be its real and imaginary parts, defined in (9.4). Then all the zeros of q(z) = qr (z) + iqi (z) are in the open upper half-plane if and only if all the zeros of qr (z) and qi (z) are simple and real and separate each other.
286
9 Stability
Proof. Follows from Theorems 9.4 and 9.8.
In most applications what is needed are stability criteria for real polynomials. The assumption of realness of the polynomial q(z) leads to further simplifications. These simplifications arise out of the decomposition of a real polynomial into its even and odd parts. We say that a polynomial p(z) is even if p(z) = p(−z) and odd if p(z) = −p(−z). With q(z) = ∑ q j z j , the even and odd parts of q(z) are determined by q+ (z2 ) =
q(z) + q(−z) = 2
q(z) − q(−z) = q− (z2 ) = 2
∑ q2 j z 2 j ,
j≥0
(9.8)
∑ q2 j+1z2 j .
j≥0
This is equivalent to writing q(z) = q+ (z2 ) + zq−(z2 ).
(9.9)
In preparation for the next theorem, we state the following lemma, describing a condition for stability. Lemma 9.10. Let q(z) be a monic real polynomial with all its roots real. Then q(z) = zn + qn−1 zn−1 + · · · + q0 is stable if and only if qi > 0 for i = 0, . . . , n − 1. n (z + β ). Proof. Assume q(z) is stable and let −βi < 0 be its zeros, i.e., q(z) = Πi=1 i This implies the positivity of the coefficients qi .
Since, by assumption, all zeros of q(z) are real, for stability it suffices to check that q(z) has no nonnegative zeros. But clearly, for every x ≥ 0 we have q(x) = xn + qn−1xn−1 + · · · + q0 > 0. The following theorem sums up the central results on the characterization of stability of real polynomials. Theorem 9.11. Let q(z) be a real monic polynomial of degree n, and let q+ (z), q− (z) be its even and odd parts respectively, as defined in (9.8). Then the following statements are equivalent: 1. 2. 3. 4. 5.
q(z) is a stable, or Hurwitz, polynomial. The Hermite–Fujiwara form is positive definite. The two Bezoutians B(q+ , q− ) and B(zq− , q+ ) are positive definite. The polynomials q+ (z) and q− (z) form a positive pair. The Bezoutian B(q+ , q− ) is positive definite and all qi are positive.
Proof. (1) ⇔ (2) By Theorem 9.6 the stability of q(z) is equivalent to the positive definiteness of the Hermite Fujiwara form. (2) ⇔ (3) Since the polynomial q(z) is real, we have in this case q(z) ˜ = q(z). From (9.9) it follows that q(−z) = q+ (z2 ) − zq−(z2 ). Therefore
9.2 Root Location Using Quadratic Forms
287
q(z)q(w) − q(−z)q(−w) (z + w) =
(q+ (z2 )+zq− (z2 ))(q+ (w2 )+wq− (w2 ))−(q+ (z2 )−zq− (z2 ))(q+ (w2 )−wq− (w2 )) z+w
=2
zq− (z2 )q+ (w2 ) + q+(z2 )wq− (w2 ) z+w
=2
z2 q− (z2 )q+ (w2 ) − q+(z2 )w2 q− (w2 ) q+(z2 )q− (w2 ) − q−(z2 )q+ (w2 ) + 2zw . z2 − w2 z2 − w2
One form contains only even-indexed terms, the other only odd ones. So the positive definiteness of the form (9.6) is equivalent to the positive definiteness of the two Bezoutians B(q+ , q− ) and B(zq− , q+ ). (3) ⇔ (4) This is the content of Theorem 9.8. (4) ⇒ (5) Since the first four conditions are equivalent, from (3) it follows that B(q+ , q− ) > 0. Also from (1), applying Lemma 9.10, it follows that all qi are positive. (5) ⇒ (4) B(q+ , q− ) > 0 implies, by Theorem 9.8, that q+ (z), q− (z) are a real pair. In particular, all zeros of both polynomials are real and simple. The positivity of all qi implies the positivity of all coefficients of q+ (z). Applying Lemma 9.10, all zeros of q+ (z) are negative, which from the interlacing properties (9.7) shows that the polynomials q+ (z) and q− (z) form a positive pair. The next corollary gives the stability characterization in terms of the Cauchy index. Corollary 9.12. Let q(z) be a real monic polynomial of degree n, having the expansion q(z) = zn + qn−1 zn−1 + · · · + q0 , and let q(z) = q+ (z2 ) + zq− (z2 ) be its decomposition into its even and odd parts. Then q(z) is a Hurwitz polynomial if and only if the Cauchy index of the function g(z) defined by g(z) =
qn−1 zn−1 − qn−3zn−3 + · · · zn − qn−2zn−2 + · · ·
is equal to n. Proof. We distinguish two cases. Case I: n is even. Let n = 2m. Clearly, in this case
288
9 Stability
q+ (−z2 ) = (−1)m (q2m z2m − q2m−2z2m−2 + · · · ) = (−1)m (zn − qn−2zn−2 + · · · ), −zq− (−z2 ) = (−1)m (q2m−1 z2m−1 − q2m−3z2m−3 + · · · ) = (−1)m (qn−1 zn−1 − qn−3 zn−3 + · · · ). −zq (−z ) So we have, in this case, g(z) = q −(−z2 ) . By the Hermite-Hurwitz theorem, the + Cauchy index of g(z) is equal to the signature of the Bezoutian of the polynomials q+ (−z2 ), −zq− (−z2 ), and moreover, Ig = n if and only if that Bezoutian is positive definite. We compute 2
−q+ (−z2 )wq− (−w2 ) + zq−(−z2 )q+ (−w2 ) z−w =
[−q+ (−z2 )wq− (−w2 ) + zq− (−z2 )q+ (−w2 )](z + w) z2 − w2
=
z2 q− (−z2 )q+ (−w2 ) − q+(−z2 )w2 q− (−w2 ) z2 − w2 + zw
q− (−z2 )q+ (−w2 ) − q+(−z2 )q− (−w2 ) z2 − w2
So the Bezoutian of q+ (−z2 ) and −zq− (−z2 ) is isomorphic to B(zq− , q+ ) ⊕ B(q+ , q− ). In particular, the Cauchy index of g(z) is equal to n if and only if both Bezoutians are positive definite. This, by Theorem 9.11, is equivalent to the stability of q(z). Case II: n is odd. Let n = 2m + 1. In this case q+ (−z2 ) = (−1)m (q2m z2m − q2m−2z2m−2 + · · ·) = (−1)m (qn−1 zn−1 − qn−3 zn−3 + · · · ), −zq− (−z2 ) = (−1)m+1 (q2m+1 z2m+1 − q2m−1 z2m−1 + · · · ) = (−1)m+1 (zn − qn−2 zn−2 + · · · ). So we have, in this case, g(z) =
q+ (−z2 ) . We conclude the proof as before. zq− (−z2 )
Since Bezout and Hankel forms are related by congruence, we expect that stability criteria can also be given in terms of Hankel forms. We present next such a result.
9.2 Root Location Using Quadratic Forms
289
As before, let q(z) = q+ (z2 ) + zq− (z2 ). Since for n = 2m, we have deg q+ = m and deg q− = m − 1, whereas for n = 2m + 1, we have deg q+ = deg q− = m, the rational function g(z) defined by g(z) =
q− (z) q+ (z)
is proper for odd n and strictly proper for even n. Thus g(z) can be expanded in a power series in z−1 , g(z) = g0 + g1 z−1 + · · · , with g0 = 0 in case n is even. Theorem 9.13. Let q(z) be a real monic polynomial of degree n and let q+ (z), q− (z) be its even and odd parts respectively. Let m = [n/2]. Define the rational function g(z) by ∞ gi −q− (−z) = −g0 + ∑ i . g(z) = q+ (−z) i=1 z Then q(z) is stable if and only if the two Hankel forms ⎛
g1 ⎜. ⎜ ⎜ Hm = ⎜ . ⎜ ⎝. gm and
⎛
. . . . .
.. .. .. .. ..
g2 ⎜. ⎜ ⎜ (σ H)m = ⎜ . ⎜ ⎝. gm+1
⎞ gm ⎟ . ⎟ ⎟ . ⎟ ⎟ ⎠ . g2m−1 ⎞ . . . gm+1 ⎟ .... ⎟ ⎟ .... ⎟ ⎟ ⎠ .... . . . g2m
are positive definite. Proof. For the purpose of the proof we let p(z) ˆ = p(−z). By Theorem 8.40, the Hankel forms Hm and (σ H)m are congruent to the Bezoutians −B(qˆ+ , qˆ− ) and B(qˆ+ , z
q− ) respectively. It is easily computed, by simple change of variables, that −B(qˆ+ , qˆ− ) = B(q+ , q− ) and B(qˆ+ , z
q− ) = B(zq− , q+ ). By Theorem 9.11, the stability of q(z) is equivalent to the positive definiteness of the Bezoutians B(q+ , q− ) and B(zq− , q+ ), and the result follows. We wish to remark that if we do not assume q(z) to be monic or to have its highest coefficient positive, we need to impose the extra condition g0 > 0 if n is odd. We conclude our discussion with the Hurwitz determinantal stability criterion.
290
9 Stability
Theorem 9.14. All the zeros of the real polynomial q(z) = ∑ni=0 qi zi lie in the open left half plane if and only if, under the assumption qn > 0, the n determinantal conditions qn−3 qn−4 qn−5 qn−2 qn−3 > 0, H3 = qn−1 qn−2 qn−3 > 0, . . . , H1 = |qn−1 | > 0, H2 = qn qn−1 0 q q n n−1 q0 . . Hn = . . 0
0. . . 0 . . . . . . . . . . > 0, . . qn−3 qn−4 qn−5 . . qn−1 qn−2 qn−3 . . 0 qn qn−1
are satisfied. We interpret qn−k = 0 for k > n. These determinants are called the Hurwitz determinants. Proof. Clearly, the assumption qn > 0 involves no loss of generality; otherwise, we consider the polynomial −q(z). By Theorem 9.8, the polynomial q(z) is stable if and only if the two Bezoutians B(q+ , q− ) and B(zq− , q+ ) are positive definite. This, by Theorem 8.14, is equivalent to the positive definiteness of all principal minors of both Bezoutians. We find it convenient to check for positivity all the lower-right-hand-corner minors rather than all upper-left-hand-corner minors. In the proof, we will use the Gohberg-Semencul formula (8.40) for the representation of the Bezoutian. The proof will be split in two parts according to n = deg q being even or odd. We will prove only the case that n is even. The case that n is odd proceeds along similar lines, and we omit it. Assume n is even and n = 2m. The even and odd parts of q(z) are given by q+ (z) = q0 + · · · + q2mzm , q− (z) = q1 + · · · + q2m−1zm−1 . The Bezoutian B(q+ , q− ) has therefore the representation ⎡⎛ ⎢⎜ ⎢⎜ ⎢⎜ ⎢⎜ ⎣⎝
q1 . . . q2m−1
⎞⎛
. .. ... . . . q1
q2m q2m−2 . . ⎟⎜ . .. ⎟⎜ ⎟⎜ .. ⎟⎜ ⎠⎝ .
⎞
⎛
q2 q0 ⎜ . . ⎟ ⎟ ⎜ ⎜ . ⎟ ⎟−⎜ . . ⎠ ⎝ . q2m q2m−2
⎞⎛
⎞⎤
0 q2m−1 . . q3 ⎟⎜ ⎥ . . .. . ⎟ ⎟⎜ ⎟⎥ ⎟ ⎜ ⎟ .. . . . ⎟⎥ ⎟⎜ ⎥ Jm . . . . ⎠⎝ . q2m−1 ⎠⎦ . . . q0 0
We consider now the lower right-hand k × k submatrix. This is given by ⎡⎛ ⎢⎜ ⎢⎜ ⎢⎜ ⎢⎜ ⎣⎝
q2m−2k+1 . . . q2m−1
⎞⎛ . . . . . . . . . q2m−2k+1
⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎠⎝
q2m . . . q2m−2k+2 . . . . . . . . . q2m
⎞
⎛
⎟ ⎜ ⎟ ⎜ ⎟−⎜ ⎟ ⎜ ⎠ ⎝
q2m−2k . . . q2m−2
⎞⎛ . . . . . . . . . q2m−2k
⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎠⎝
⎞⎤
0 q2m−1 . . q2m−2k+3 ⎟⎥ . . . . ⎟⎥ ⎟⎥ Jk . . . . ⎟⎥ . q2m−1 ⎠⎦ 0
9.2 Root Location Using Quadratic Forms
291
Applying Lemma 3.17, we have det(B(q+ , q− ))m i, j=m−k+1 q2m−2k+1 . . . q = 2m−1 0
. . . . . . q2m−2k+1 . . . q2m−2k+3 ... . .. . . q2m−1 0
Which, using the factor det Jk = (−1)
k(k−1) 2
q2m−2k+1 q2m−2k 0 . q2m−2k+2 q2m−2k+1 . . . . q2m−2 . q2m−1 0 q2m . . 0 . . .
q2m−2k . . . q2m−2 q2m
. . . . . . q2m−2k · detJk , . . . q2m−2k+2 ... . .. . . . q 2m
, can be rearranged in the form q2m−2k . q2m−2k+2 . . q2m−2 q2m . . . 0
. . . . q2m−2k+1 . . . q2m−2k+3 . . .. . . 0 q2m q2m−1 . 0
Since q2m = qn > 0, it follows that the following determinantal conditions hold: q2m−2k+1 q2m−2k 0 . q2m−2k+2 q2m−2k+1 . . . . q2m−2 . q2m−1 0 q2m . . 0 . .
. . . . q2m−2k+1 > 0. . . . q2m−2k+3 . . .. . . 0 q2m q2m−1
292
9 Stability
Using the fact that n = 2m, we can write the last determinantal condition as qn−2k+1 qn−2k 0 q . q n−2k+2 n−2k+1 . . . . qn−2 . qn−1 0 qn . . 0 . .
.
. . . . . . . . .. . 0 qn
qn−2k+1 > 0. qn−2k+3 . q n−1
Note that the order of these determinants is 1, . . . , 2m − 1. Next, we consider the Bezoutian B(zq− , q+ ), which has the representation ⎡⎛
q0 . . .
⎢⎜ ⎢⎜ ⎢⎜ ⎢⎜ ⎢⎜ ⎢⎜ ⎢⎜ ⎣⎝
q2m−2
⎞⎛
. .. ... . . . q0
⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎠⎝
q2m−1 q2m−3 . . . .. .. .
q1 . . .
⎞
⎛
q2m−1
⎞⎛
0 q1 . .
⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟−⎜ ⎟ ⎜ ⎟ ⎜ ⎠ ⎝
q2m−3
. .. .. . . . q1 0
⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎠⎝
q2m
⎞⎤
. . q2 ⎟⎥ ⎥ ... . ⎟ ⎟⎥ ⎟⎥ ⎥ Jm . .. . ⎟ ⎟⎥ ⎟⎥ . ⎠⎦ q2m
The lower right hand k × k submatrix is given by ⎡⎛ ⎢⎜ ⎢⎜ ⎢⎜ ⎢⎜ ⎢⎜ ⎣⎝
q2m−2k . . . q2m−2
⎞⎛
. . . . . . . . . q2m−2k
⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎠⎝
⎞
⎞⎛
⎛
q2m−1 q2m−3 . . q2m−2k+1 q2m−2k−1 ⎟ ⎜ . . . . . ⎟ ⎜ ⎟ ⎜ ⎟−⎜ . . . . ⎟ ⎜ ⎠ ⎝ . . . q2m−1 q2m−3
. . . . . . . . . q2m−2k−1
⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎠⎝
q2m
Again, by the use of Lemma 3.17, we have det(B(zq− , q+ ))m i, j=m−k+1 q2m−2k . . . q = 2m−2 q2m
. . .. .. .. .
. . q2m−2k . q2m−2k+2 . . . . . q2m−1 q2m
q2m−2k−1 . . . q2m−3 q2m−1
. . . ... ... ... .. .
⎞⎤
. . q2m−2k+2 ⎟⎥ . . . . ⎟⎥ ⎟⎥ ⎟⎥ Jk . . . . ⎟⎥ ⎠⎦ . q2m
q2m−2k−1 · detJk q2m−2k+1 . . . q2m−1
9.3 Exercises
293
q2m−2k . . . q = 2m−1 q2m . . .
q2m−2k−1 0 q2m−2k+1 q2m−2k . . . q2m−2 . . q2m−1 . . . . ..
q2m−2k . q2m−2k+2 . . q2m−3 q . . . 0
. . .
. .
q2m−2k+1 q2m−2k+3
. . q2m q2m−1 q2m−2 . q2m
2m−1
Using n = 2m, this determinant can be rewritten as qn−2k . . . qn−1 qn . . .
qn−2k−1 0 qn−2k+1 qn−2k . . . qn−2 . . . . qn−2k+1 qn−1 . . . . qn−2k+3 . . . .. . . qn qn−1 qn−2 . qn
qn−2k . qn−2k+2 . . qn−3 qn−1 . . . 0
Case II: This is the case in which n = 2m + 1 is odd. The proof proceeds along similar lines.
9.3 Exercises 1. Let the real polynomial q(z) = ∑ni=0 qi zi have zeros μ1 , . . . , μn . Prove Orlando’s formula, i.e., that for the Hurwitz determinants defined in Theorem 9.14, we have det Hn−1 = (−1)
n(n−1) 2
qn−1 n ∏( μi + μk ).
(9.10)
i
2. a. Show that a real polynomial p(z) is a Hurwitz polynomial if and only if the zeros and poles of f (z) =
p(z) − p(−z) p(z) + p(−z)
are simple, located on the imaginary axis, and mutually separate each other.
294
9 Stability
b. Show that a real polynomial p(z) = ∑ni=0 pi zi has all its zeros in the open unit disk if and only if |pn | > |p0 | and the zeros and poles of f (z) =
p(z) − p (z) p(z) + p (z)
are simple, located on the unit circle, and mutually separate each other. Here p (z) = zn p(z−1 ) is the reciprocal polynomial to p. 3. Given a polynomial f (z) ∈ R[z] of degree n, we define S( f ) = f (z) f (z) − f (z)2 . Show that f (z) has n distinct real roots if and only if S( f ) < 0, S( f ) < 0, . . . , S( f (n−2 ) < 0. Show that f (z) = z3 + 3uz + 2v has three distinct real roots if and only if u < 0 and v2 + u3 < 0.
9.4 Notes and Remarks The approach to problems of root location via the use of quadratic forms goes back to Jacobi. The greatest impetus to this line of research was given by Hermite (1856). A very thorough exposition of the method of quadratic forms is given in the classic paper Krein and Naimark (1936). This paper also contains a comprehensive bibliography. Theorem 9.11.4 is usually known as the Hermite–Biehler theorem. There are other approaches to the stability analysis of polynomials. Some are based on complex analysis, and in particular on the argument principle. A very general approach to stability was developed by Liapunov (1893). For the case of a linear system x˙ = Ax, this leads to the celebrated Lyapunov matrix equation AX + XA∗ = −Q. Surprisingly, it took half a century to clarify the connection between Lyapunov stability theory and the algebraic stability criteria. For such a derivation, in the spirit of this book, see Willems and Fuhrmann (1992).
Chapter 10
Elements of Linear System Theory
10.1 Introduction This chapter is devoted to a short introduction to algebraic system theory. We shall focus on the main conceptual underpinnings of the theory, more specifically on the themes of external and internal representations of systems and the associated realization theory. We feel that these topics are to be considered an essential part of linear algebra. In fact, the notions of reachability and observability, introduced by Kalman, fill a gap that the notion of cyclicity left open. Also, they have such a strong intuitive appeal that it would be rather perverse not to use them and search instead for sterile, but “pure,” substitute terms. We saw the central role that polynomials play in the structure theory of linear transformations. The same role is played by rational functions in the context of algebraic system theory. In fact, realization theory is, for rational functions, what the shift operator and the companion matrices are for polynomials. From the purely mathematical point of view, realization theory has, in the algebraic context, as its main theme a special type of representation for rational functions. Note that given a quadruple of matrices (A, b, c, d) of sizes n × n, n × 1, 1 × n, and 1 × 1 respectively, the function g(z) defined by g(z) = d + c(zI − A)−1 b
(10.1)
is a scalar proper rational function. The realization problem is the corresponding inverse problem. Namely, given a scalar proper rational function g(z), we want to find a quadruple of matrices (A, b, c, d) for which (10.1) holds. In this sense, the realization problem is an extension of a simpler problem that we solved earlier. That problem was the construction of an operator, or a matrix, that had a given polynomial as its characteristic polynomial. In the solution to that problem, shift operators, and their matrix representations in terms of companion matrices, played a central role. Therefore, it is natural to expect that the same objects will play a similar role in the solution of the realization problem, and this in fact is the case. P.A. Fuhrmann, A Polynomial Approach to Linear Algebra, Universitext, DOI 10.1007/978-1-4614-0338-8 10, © Springer Science+Business Media, LLC 2012
295
296
10 Elements of Linear System Theory
In Sections 10.4 and 10.5 we use the Hardy space RH∞ + . For the reader unfamiliar with the basic results on Hardy spaces and the operators acting on them, it is advisable to read first Chapter 11.
10.2 Systems and Their Representations Generally, we associate the word system with dynamics, that is, with the way an object evolves with time. Time itself can be modeled in various ways, most commonly as continuous or discrete, as the case may be. In contrast to some other approaches to the study of systems, we will focus here on the proverbial black box approach, given by the schematic diagram u
-
Σ
y
-
where Σ denotes the system, u the input or control signal, and y the output or observation signal. The way the output signal depends on the input signal is called the input/output relation. Such a description is termed an external representation. An internal representation of a system is a model, usually given in terms of difference or differential equations, that explains, or is compatible with, the external representation. Unless further assumptions are made on the properties of the input/output relations, i.e., linearity and continuity, there is not much of interest that can be said. Because of the context of linear algebra and the elementary nature of our approach, with the tendency to emphasize linear algebraic properties, it is natural for us to restrict ourselves to linear, time-invariant, finite-dimensional systems, which we define below. For discrete-time systems, with an eye to applications such as coding theory, we choose to work over an arbitrary field. On the other hand, for continuous-time systems, requiring derivatives, we work over the real or complex fields. Definition 10.1. 1. A discrete time, finite-dimensional linear time-invariant system is a triple (U , X , Y ) of finite-dimensional vector spaces over a field F and a quadruple of linear transformations A ∈ L(X , X ), B ∈ L(U , X ), C ∈ L(X , Y ), and D ∈ L(U , Y ), with the system equations given by xn+1 = Axn + Bun , yn = Cxn + Dun . (10.2) 2. A continuous-time, finite-dimensional linear time-invariant system is a triple (U , X , Y ) of finite-dimensional vector spaces over the real or complex field and a quadruple of linear transformations A ∈ L(X , X ), B ∈ L(U , X ), C ∈ L(X , Y ), and D ∈ L(U , Y ), with the system equations given by x˙ = Ax + Bu, y = Cx + Du.
(10.3)
10.2 Systems and Their Representations
297
The spaces U , X , Y are called the input space, state space, and output space respectively. Usually, we identify these spaces with Fm , Fn , F p respectively. In that case, the transformations A, B,C, D aregivenby matrices. In both cases, we will use the notation
A B C D
to describe the system. Such
representations are called state space-realizations. Since the development of discrete-time linear systems does not depend on analysis, we will concentrate our attention on this class. Let us start from the state equation xn+1 = Axn + Bun . Substituting in this xn = Axn−1 +Bun−1, we get xn+1 = A2 xn−1 +ABun−1 +Bun, and proceeding by induction, we get xn+1 = Ak xn−k+1 + Ak−1 Bun−k+1 + · · · + Bun .
(10.4)
Our standing assumption is that before a finite time, all signals were zero, that is, in the remote past the system was at rest. Thus, for some n0 , un = 0, xn = 0 for n < n0 . With this, equation (10.4) reduces to xn+1 =
∞
∑ A j Bun− j ,
j=0
and in particular, x0 =
∞
∑ A j Bu− j−1.
j=0
Thus, for n ≥ 0, equation (10.4) could also be written as xn+1 = ∑nj=0 A j Bun− j + ∑∞j=n+1 A j Bun− j = An+1 ∑∞j=0 A j Bu− j−1 + ∑nj=0 A j Bun− j = An+1 x0 + ∑nj=0 A j Bun− j . This is the state evolution equation based on initial conditions at time zero. From the preceding, the input/output relations become yn+1 =
∞
∑ CA j Bun− j .
(10.5)
j=0
We write y = f˜(u) and call f˜ the input/output map of the system. Suppose we look now at sequences of input and output signals shifted by one time unit. Let us write θn = un−1 and denote the corresponding states and outputs by ξn and ηn respectively. Then
ξn =
∞
∑ A j Bθn− j−1 =
j=0
∞
∑ A j Bun− j−2 = xn−1
j=0
298
10 Elements of Linear System Theory
and
ηn = Cξn = Cxn−1 = yn−1 . Let us introduce now the shift operator σ acting on time signals by σ (u j ) = u j−1 . Then the input/output map satisfies f˜(σ (u)) = σ (y) = σ ( f˜(u)).
(10.6)
If the signal, zero in the remote past, goes over to the truncated Laurent series ∑∞j=−∞ u− j z j , we conclude that σ (u) is mapped into ∞
∑
∞
∑
u− j−1 z j =
j=−∞
u− j−1 z j+1 = z ·
j=−∞
∞
∑
u− j z j ,
j=−∞
that is, in U ((z−1 )), the shift σ acts as multiplication by z. Since f˜ is a linear map, we get by induction that for an arbitrary polynomial p, we have f˜(p · u) = p · f˜(u). This means that with the polynomial module structure induced by σ , the input/output map f˜ is an F[z]-module homomorphism. n0 We find it convenient to associate with the infinite sequence {u j }−∞ the truncated Laurent series ∑ j≥n0 u j z− j . Thus positive powers of z are associated with past signals, whereas negative powers are associated with future signals. With this convention applied also to the state and output sequences, the input/output relation (10.5) can be written now as y(z) = G(z)u(z),
(10.7)
where y(z) = ∑ j≥n0 y j z− j and ∞
CAi−1 B = D + C(zI − A)−1 B. zi i=1
G(z) = D + ∑
(10.8)
The function G(z) defined in (10.8) is called the transfer function of the system. From this we can conclude the following result. A B Proposition 10.2. Let Σ = be a finite-dimensional, linear time-invariant C D
system. Then its transfer function G(z) = D + C(zI − A)−1 B is proper rational. Proof. Follows from the equality (zI − A)−1 =
adj (zI−A) det(zI−A)
.
From the system equations (10.2) it is clear that given that the system is at rest in the remote past, there will be no output from the system prior to an input being sent
10.2 Systems and Their Representations
299
into the system. In terms of signal spaces, the space of future inputs is mapped into the space of future output signals. This is the concept of causality, and it is built into the internal representation. We formalize this as follows. Definition 10.3. Let U and Y be finite-dimensional linear spaces over the field F. 1. An input/output map is an F[z]-module homomorphism f˜ : U ((z−1 )) −→ Y ((z−1 )). 2. An input/output map f˜ is causal if f˜(U [[z−1 ]]) ⊂ Y [[z−1 ]] and strictly causal if f˜(U [[z−1 ]]) ⊂ z−1 Y [[z−1 ]]. We have the following characterization. Proposition 10.4. 1. f˜(U ((z−1 ))) ⊂ Y ((z−1 )) is an F[z]-module homomorphism if and only if there exists G ∈ L(U , Y )((z−1 )) for which f˜(u) = G · u. 2. f˜ is causal if and only if G ∈ L(U , Y )[[z−1 ]] and strictly causal if and only if G ∈ z−1 L(U , Y )[[z−1 ]]. Proof. 1. Assume f˜ : U ((z−1 )) −→ Y ((z−1 )) is defined by f˜(u) = G · u for some G ∈ L(U , Y )((z−1 )). Then z · f˜(u) = z(Gu) = G(zu) = f˜(z · u), i.e., f˜ is an F[z]-module homomorphism. Conversely, assume f˜ is an F[z]-module homomorphism. Let e1 , . . . , em be a basis in U. Set gi = f˜(ei ). Let G have columns gi . Then, using the fact that f˜ is a homomorphism, we compute m
m
m
i=1
i=1
i=1
f˜(u) = f˜ ∑ ui zi = ∑ zi f˜(ui ) = ∑ zi Gui = Gu. ∞ i i 2. Assume G(z) has the expansion G(z) = ∑∞ i=−n Gi /z and u(z) = ∑i=0 ui /z . The polynomial part, not including the constant term, of Gu is given by ∑nk=0 ∑ki=0 G−n+k−i ui zk . This vanishes if and only if ∑ki=0 G−n+k−i ui = 0 for all choices of u0 , . . . , un−1 . This in turn is equivalent to G−n = · · · = G−1 = 0. Strict causality is handled similarly.
Let us digress a bit on the notion of state. Heuristically, the state of the system is the minimum amount of information we need to know about the system at present so that its future outputs can be computed, given that we have access to future inputs. In a sense, the present state of the system is the combined effect of all past inputs. But this type of information is overly redundant. In particular, many past inputs may lead to the same state. Since the input/output map is linear, we can easily eliminate the future inputs from the definition. Therefore, given an input/output map f˜, it is natural to introduce the following notion of equivalence of past inputs. We say that, with u(z), v(z) ∈ U [z], u f v if π− f˜u = π− f˜v. This means that the past input
300
10 Elements of Linear System Theory
sequences u(z) and v(z) have the same future outputs; thus they are indistinguishable based on future observations. The moral of this is that in order to elucidate the notion of state, it is convenient to introduce an auxiliary input/output map. Definition 10.5. Let f˜(U ((z−1 ))) ⊂ Y ((z−1 )) be an input/output map. Then the restricted input/output map f : U [z] −→ z−1 Y [[z−1 ]] is defined by f (u) = π− f˜(u). If fˆ has the transfer function G(z), then the restricted input/output map is given by f (u) = π− Gu, i.e., it is the Hankel operator defined by G(z). The relation between the input/output map and the restricted input/output map is best described by the following commutative diagram: fˆ
U ((z−1 ))
-Y ((z−1 ))
6
π−
i
f
U [z]
? -z−1 Y [[z−1 ]]
Here i : U [z] −→ U ((z−1 )) is the natural embedding.
10.3 Realization Theory We turn now to the realization problem. Given a causal input/output map fˆ with proper transfer function G(z), we want to find a state-space realization of it, i.e., we want to find linear transformations A, B,C, D such that G(z) =
AB CD
= D + C(zI − A)−1 B.
i Since G(z) = ∑∞ i=0 Gi /z , we must have for the realization,
D = G0 , CAi−1 B = Gi , i ≥ 1. The coefficients Gi are called the Markov parameters of the system.
10.3 Realization Theory
301
To this end, we go back to a state space system
A B C D
. We define the
reachability map R : U [z] −→ X by n
n
i=0
i=0
R ∑ ui zi = ∑ Ai Bui ,
(10.9)
and the observability map O : X −→ z−1 Y [[z−1 ]] by ∞
CAi−1 x . zi i=1
Ox = ∑
Proposition 10.6. Given the state space system
(10.10)
A B C D
with state space X , let
X carry the F[z]-module structure induced by A as in (4.41). Then 1. The reachability and observability maps are F[z]-module homomorphisms. 2. If f is the restricted input/output map of the system and G(z) = D+C(zI − A)−1 B the transfer function, then f = HG = OR. (10.11) Proof. 1. We compute Rz ∑ni=0 ui zi = R ∑ni=0 ui zi+1 = ∑ni=0 Ai+1 Bui = A ∑ni=0 Ai Bui = AR ∑ni=0 ui zi .
(10.12)
Also ∞ CAi−1 Aξ CAi ξ = ∑ zi ∑ zi i=1 i=1 ∞ CAi−1 ξ = S− ∑ = S− O ξ . zi i=1
OAx =
∞
(10.13)
2. Let ξ ∈ U and consider it as a constant polynomial in U [z]. Then we have ∞
CAi−1 Bx = G(z)ξ . zi i=1
OR ξ = O(Bξ ) = ∑
Since both O and R are F[z]-module homomorphisms, it follows that ∞
CA j−1 (Ai Bξ ) zj j=1 ∞ ∞ CAi+ j−1 Bξ CA j−1 Bξ = ∑ = π− zi ∑ j z zj j=1 j=1
OR(zi ξ ) = O(Ai Bξ ) =
∑
= π− zi G(z)ξ = π− G(z)zi ξ . By linearity we get, for u ∈ U [z], that (10.11) holds.
302
10 Elements of Linear System Theory
So far, nothing has been assumed that would imply further properties of the reachability and observability maps. In particular, recalling the canonical factorizations of maps discussed in Chapter 1, we are interested in the case that R is surjective and O is injective. This leads to the following definition. A B Definition 10.7. Given the state space system Σ with transfer function G = C D
with state space X , then 1. The system Σ is called reachable if the reachability map R is surjective. 2. The system Σ is called observable if the observability map O is injective. 3. The system is called canonical if it is both reachable and observable. Proposition 10.8. Given the state space system Σ with transfer function G = A B with the n-dimensional state space X , then C D
1. The following statements are equivalent: a. The system Σ is reachable. b. The zero state can be steered to an arbitrary state ξ ∈ X in a finite number of steps. c. We have rank(B, AB, . . . , An−1 B) = n. (10.14) d.
e.
∗ ∗ i ∩n−1 i=0 Ker B (A ) = {0}.
(10.15)
∗ ∗ i ∩∞ i=0 Ker B (A ) = {0}.
(10.16)
2. The following statements are equivalent: a. The system Σ is observable. b. The only state with all future outputs zero is the zero state. c. i ∩∞ i=0 KerCA = {0}.
(10.17)
d. e. We have
i ∩n−1 i=0 KerCA = {0}.
(10.18)
rank (C∗ , A∗C∗ , . . . , (A∗ )n−1C∗ ) = n.
(10.19)
i Proof. 1. (a) ⇒ (b) Given a state ξ ∈ X , there exists v(z) = ∑k−1 i=0 vi z such that k−1 i ξ = Rv = ∑i=0 A Bvi . Setting ui = vk−1−i and using this control sequence, the solution to the state equation, with zero initial condition, is given by xk = k−1 i i ∑k−1 i=0 A Buk−1−i = ∑i=0 A Bvi = ξ .
10.3 Realization Theory
303
i (b) ⇒ (c) Since every state ξ ∈ X has a representation ξ = ∑k−1 i=0 A Bui for some k, it follows by an application of the Cayley–Hamilton theorem that we can assume without loss of generality that k = n. Thus the map i (B, AB, . . . , An−1 B) : U n −→ X defined by (u0 , . . . , un ) → ∑n−1 i=0 A Bui is surjecn−1 tive. Hence rank (B, AB, . . . , A B) = n. (c) ⇒ (d) The adjoint to the map (B, AB, . . . , An−1 B) is
⎛
⎞ B∗ ⎜ B∗ A∗ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ : X ∗ −→ (U ∗ )n . · ⎜ ⎟ ⎝ ⎠ · ∗ ∗ n−1 B (A ) Applying Theorem 4.36, we have ⎞ B∗ ⎜ B∗ A∗ ⎟ ⎟ n−1 ⎜ ⎟
⎜ Ker ⎜ Ker B∗ (A∗ )i = {0}. · ⎟= ⎟ i=0 ⎜ ⎠ ⎝ · ∗ ∗ n−1 B (A ) ⎛
n−1 ∗ ∗ i (d) ⇒ (e) This follows from the inclusion ∩∞ i=0 Ker B (A ) ⊂ ∩i=0 ∗ ∗ i Ker B (A ) . ∗ ∗ i (e) ⇒ (a) Let φ ∈ (Im R)⊥ = Ker R ∗ = ∩∞ i=0 Ker B (A ) = {0}. Then φ = 0, and the system is reachable. 2. This can be proved similarly. Alternatively, it follows from the first part by duality considerations.
Proposition 10.6 implied that any realization of an input/output map leads to a factorization of the restricted input/output map. We proceed to prove the converse. Theorem 10.9. Let f˜ : U ((z−1 )) −→ Y ((z−1 )) be an input/output map. Then to any realization of f corresponds a unique factorization f = hg, with h : X −→ Y ((z−1 )) and g : U ((z−1 )) −→ X F[z]-module homomorphisms. Conversely, given any factorization f = hg into a product of F[z]-module homomorphisms, there exists a unique associated realization. The factorization is canonical, that is, g is surjective and h injective, if and only if the realization is canonical. Proof. We saw that a realization leads to a factorization f = OR with R, O the reachability and observability maps respectively.
304
10 Elements of Linear System Theory
Conversely, assume we are given a factorization U [z]
f
- z−1 F p [[−1 ]] * h
H
HH g HH
HH j
X with X an F[z]-module and g, h both F[z]-module homomorphisms. We define a triple of maps A : X −→ X , B : U −→ X and C : X −→ Y by Ax = z · x, Bu = g(i(u)), Cx = π1 h(x).
(10.20)
Here i : U −→ U [z] is the embedding of U in U [z] that identifies a vector with the corresponding constant polynomial vector, and π1 : z−1 Y [[z−1 ]] is the map that reads the coefficient of z−1 in the expansion of an element of z−1 Y [[z−1 ]]. It is immediate to check that these equations constitute a realization of f with g and h the reachability and observability maps respectively. In particular, the realization is canonical if and only if the factorization is canonical. There are natural concepts of isomorphism for both factorizations as well as realizations. Definition 10.10. 1. Let f = hi gi , i = 1, 2 be two factorizations of the restricted input/output map f through the F[z]-modules Xi respectively. We say that the two factorizations are isomorphic if there exists an F[z]-module isomorphism φ for which the following diagram is commutative:
g1 U [z]
*
X1
H
HH h1 HH HH j - z−1 Y [[−1 ]] * h2
φ
f
HH
H g2 HH
HH j
2. We say that two realizations
? X2
A1 B1 C1 D1
and
A2 B2 C2 D2
are isomorphic if there
exists an isomorphism Z : X1 −→ X2 such that the following diagram is commutative.
10.3 Realization Theory
*
U
305
A1
X1
- X1
B1 H
HH B 2 HH
HH j
Z
Z
? X2
? - X2
A2
HH
C HH 1 HH j H Y * C2
Theorem 10.11 (State space-isomorphism). 1. Given two canonical factorizations of a restricted input/output map, then they are isomorphic. Moreover, the isomorphism is unique. A1 B1 A2 B2 2. Assume G = = and the two realizations are canonical. Then C1 D1
C2 D2
D1 = D2 and the two realizations are isomorphic, and moreover, the isomorphism is unique. Proof. 1. Let f = h1 g1 = h2 g2 be two canonical factorizations through the F[z]modules X1 , X2 respectively. We define φ : X1 −→ X2 by φ (g1 (u)) = g2 (u). First we show that φ is well defined. We note that the injectivity of h1 , h2 implies Ker gi = Ker f . So g1 (u1 ) = g1 (u2 ) implies u1 − u2 ∈ Ker g1 = Ker f = Ker g2 and hence also g2 (u1 ) = g2 (u2 ). Next we compute
φ (z · g1 (u)) = φ (g1 (z · u)) = g2 (z · u) = z · g2 (u) = z · φ (g1 (u)). This shows that φ is an F[z]-homomorphism. It remains to show that h2 φ = h1 . Now, given x ∈ X1 , we have x = g1 (u) and h1 (x) = h1 g1 (u) = f (u). On the other hand, h2 φ (x) = h2 φ (g1 (u)) = h2 g2 (u) = f (u), and the equality h2 φ = h1 is proved. To prove uniqueness, assume φ1 , φ2 are two isomorphisms from X1 to X2 that make the isomorphism diagram commutative. Then f = h2 g2 = h2 φ1 g1 = h2 φ2 g1 , which implies h2 (φ2 − φ1 )g1 = 0. Since g1 is surjective and h2 injective, the equality φ2 = φ1 follows. 2. Let R1 , R2 be the reachability maps of the two realizations respectively and O1 , O2 the respective observability maps. All of these four maps are F[z]homomorphisms and by our assumptions, the reachability maps are surjective while the observability maps are injective. Thus the factorizations f = Oi Ri are canonical factorizations. By the first part, there exists a unique F[z]-isomorphism Z : X1 −→ X2 such that ZR1 = R2 and O2 Z = O1 . That Z is a module isomorphism implies that Z is an invertible map satisfying ZA1 = A2 Z. Restricting ZR1 = R2 to constant polynomials implies ZB1 = B2 .
306
10 Elements of Linear System Theory
Finally, looking at coefficients of z−1 in the equality O2 Z = O1 implies C2 Z = C1 . These equalities, taken together, imply the commutativity of the diagram. Uniqueness follows from part 1. We have used the concept of canonical factorizations, but so far have not demonstrated their existence. Actually, this is easy and we take it up next. Proposition 10.12. Let f : U [z] −→ z−1 Y [[z−1 ]] be an F[z]-homomorphism, and f = HG the Hankel operator of the transfer function. Then canonical factorizations exist. Specifically, the following factorizations are canonical: U [z]
HG
- z−1 Y [[z−1 ]] * HG
HH
HH
π
HH j H
U [z]/Ker HG
Here, the map π is the canonical projection and HG the map induced by HG on the quotient space U [z]/Ker HG . U [z]
HG HH
- z−1 Y [[z−1 ]] * i
H
HH
HH j H
Im HG
Here, the map i is the embedding of Im HG in z−1 Y [[z−1 ]] and H is the Hankel operator considered as a map from U [z] onto Im HG . Proof. We saw in Chapter 8 that the Hankel operator is an F[z]-homomorphism. Thus we can apply Exercise 6 in Chapter 1 to conclude that HG |U [z]/Ker HG is an injective F[z]-homomorphism. From now on, we will treat only the case of single-input single-output systems. These are the systems with scalar-valued transfer functions. If we further assume that the systems have finite-dimensional realizations, then by Kronecker’s theorem, this is equivalent to the transfer function being rational. In this case both Ker Hg and Im Hg are submodules of F[z] and z−1 F[[z−1 ]] respectively. Moreover, as linear spaces, Ker Hg has finite codimension and Im Hg is finite-dimensional. We can take advantage of the results on representation of submodules contained in Theorems 1.28 and 5.24 to go from the abstract formulation of realization theory to very concrete state space representations. Since Im Hg is a finite-dimensional S− invariant subspace, it is necessarily of the form X q for some, uniquely determined, monic polynomial q(z). This leads to the coprime factorization g(z) = p(z)q(z)−1 = q(z)−1 p(z). A similar argument applies to KerHg , which is necessarily of the form qF[z], and this leads to the same coprime factorization.
10.3 Realization Theory
307
However, rather than start with a coprime factorization, our starting point will be any factorization g(z) = p(z)q(z)−1 = q(z)−1 p(z),
(10.21)
with no coprimeness assumptions. We note that π1 ( f ) = [ f , 1], with the form defined by (5.62), in particular π1 ( f ) = 0 for any polynomial f (z). The reader may wonder, since we are in a commutative setting, why distinguish between the two factorizations in (10.21). The reason for this is that in our approach to realization theory, it makes a difference if we consider q(z) as a right denominator or left denominator. In the first case, it determines the input pair (A, B) up to similarity, whereas in the second case it determines the output pair (C, A) up to similarity. Theorem 10.13. Let g be a strictly proper rational function and assume the factorization (10.21) with q(z) monic. Then the following are realizations of g(z). 1. Assume g(z) = q(z)−1 p(z). In the state space Xq define A f = Sq f , Bξ = p · ξ , C f = f , 1 . Then g(z) =
A B C 0
(10.22)
. This realization is observable. The reachability map R :
F[z] −→ Xq is given by Ru = πq (pu).
(10.23)
The reachable subspace is Im R = q1 Xq2 , where q(z) = q1 (z)q2 (z) and q1 (z) is the greatest common divisor of p(z) and q(z). Hence the realization is reachable if and only if p(z) and q(z) are coprime. 2. Assume g(z) = p(z)q(z)−1 . In the state space Xq define A f = Sq f , Bξ = ξ , C f = f , p . Then g(z) =
A B C 0
(10.24)
. This realization is reachable, and the reachability map
R : F[z] −→ Xq is given by Ru = πq u,
u ∈ F[z].
(10.25)
The unobservable subspace is given by Ker O = q1 Xq2 , where q(z) = q1 (z)q2 (z) and q2 (z) is the greatest common divisor of p(z) and q(z). The system is observable if and only if p(z) and q(z) are coprime.
308
10 Elements of Linear System Theory
3. Assume g(z) = q(z)−1 p(z). In the state space X q define Ah = Sq h, Bξ = g · ξ , Ch = [h, 1]. Then g(z) =
A B C 0
(10.26)
. This realization is observable. It is reachable if and only if
p(z) and q(z) are coprime. 4. Assume g(z) = p(z)q(z)−1 . In the state space X q define Ah = Sq h Bξ = q−1 · ξ Ch = [h, p] Then g(z) =
(10.27)
A B . This realization is reachable. It is observable if and only if
C 0
p(z) and q(z) are coprime. We refer to these realizations as shift realizations. 5. If p(z) and q(z) are coprime, then all four realizations are isomorphic. In particular, the following commutative diagram shows the isomorphism of realizations (10.22) and (10.24):
1· F
*
Xq
Sq
- Xq
p(Sq )
H
HH p· HH
HH j
? Xq
HH
p(Sq )
π (pq−1 ·) HH 1 HH j H F * −1 π1 (q ·)
Sq
? - Xq
Proof. 1. Let ξ ∈ F. We compute CAi−1 Bξ = Sqi−1 pξ , 1 = [q−1 qπ− q−1 zi−1 pξ , 1] = [zi−1 gξ , 1] = [g, zi−1 ]ξ = gi ξ . Hence (10.22) is a realization of g(z). i To show the observability of this realization, assume f ∈ ∩∞ i=0 KerCA . Then, for i ≥ 0, 0 = Sqi f , 1 = [q−1 qπ− q−1 zi f , 1] = [q−1 f , zi ].
10.3 Realization Theory
309
This shows that q−1 (z) f (z) is necessarily a polynomial. However f (z) ∈ Xq implies that q(z)−1 f (z) is strictly proper. So q(z)−1 f (z) = 0 and hence also f (z) = 0. We proceed to compute the reachability map R. We have, for u(z) = ∑i ui zi , Ru = R ∑ ui zi = ∑ Sqi pui = πq ∑ pui zi = πq (pu). i
i
i
Clearly, since R is an F[z]-module homomorphism, Im R is a submodule of Xq hence of the form Im R = q1 Xq2 for a factorization q(z) = q1 (z)q2 (z) into monic factors. Since p(z) ∈ Im R, it follows that p(z) = q1 (z)p1 (z), and therefore q1 (z) is a common factor of p(z) and q(z). Let now q (z) be any other common factor of p(z) and q(z). Then q(z) = q (z)q (z) and p(z) = q (z)p (z). For any polynomial u(z) we have, by Lemma (1.24), Ru = πq (pu) = q πq (p u) ∈ q Xq . So we have obtained the inclusion q1 Xq2 ⊂ q Xq , which, by Proposition 5.7, implies q (z) | q1 (z), i.e., q1 (z) is the greatest common divisor of p(z) and q(z). Clearly Im R = q1 Xq2 = Xq if and only if q1 (z) = 1, i.e., p(z) and q(z) are coprime. 2. Let ξ ∈ F. We compute CAi−1 Bξ = Sqi−1 ξ , p = [q−1 qπ− q−1 zi−1 ξ , p] = [gzi−1 , 1]ξ = gi ξ . Hence (10.24) is a realization of g. We compute the reachability map of this realization: Ru = R ∑i ui zi = ∑i Sqi ui = ∑i πq zi ui = πq ∑i zi ui = πq u. This implies the surjectivity of R and hence the reachability of this realization. Now the unobservable subspace is a submodule of Xq , hence of the form q1 Xq2 for a factorization q(z) = q1 (z)q2 (z). So for every f (z) ∈ q1 Xq2 we have 0 = CAi f = Sqi f , p = [q−1 qπ− q−1 zi f , p] = [q−1 zi f , p] −1 i −1 i = [q−1 2 q1 z q1 f 2 , p] = [pq2 f 2 , z ]. In particular, choosing f1 (z) = 1, we conclude that p(z)q2 (z)−1 is a polynomial, say p2 (z). So p(z) = p2 (z)q2 (z) and q2 (z) is a common factor of p(z) and q(z). Let now q (z) be any other common factor of p(z) and q(z) and set q(z) = q (z)q (z), p(z) = p (z)q (z). For f = q f ∈ q Xq” we have
310
10 Elements of Linear System Theory
CAi f = Sqi q f , p = [q−1 qπ− q−1 zi q f , p] = [p(q”)−1 zi f , 1] = [p f , zi ] = 0. So we get the inclusion q Xq ⊂ q1 Xq2 and hence q (z) | q2 (z), which shows that q2 (z) is indeed the greatest common factor of p(z) and q(z). Clearly, this realization is observable if and only if KerO = q1 Xq2 = {0}, which is the case only if q2 (z) is constant and hence p(z) and q(z) coprime. 3. Using the isomorphism (5.38) of the polynomial and rational models, we get the isomorphisms of the realizations (10.22) and (10.26). 4. By the same method, we get the isomorphisms of the realizations (10.24) and (10.27). These isomorphisms are independent of coprimeness assumptions. 5. By parts (3) and (4), it suffices to show that the realizations (10.22) and (10.24) are isomorphic. By the coprimeness of p(z) and q(z) and Theorem 5.18, the transformation p(Sq ) is invertible and certainly satisfies p(Sq )Sq = Sq p(Sq ), i.e., it is a module ismorphism. Since also p(Sq )1 = πq (p · 1) = p and p(Sq ) f , 1 = f , p , it follows that the diagram is commutative. Based on the shift realizations, we can, by a suitable choice of bases whether in Xq or X q , get concrete realizations given in terms of matrices. Theorem 10.14. Let g(z) be a strictly proper rational function and assume g(z) = p(z)q(z)−1 = q(z)−1 p(z) with q(z) monic. Assume q(z) = zn + qn−1 zn−1 + · · · + i q0 , p(z) = pn−1 zn−1 + · · · + p0 , and g(z) = ∑∞ i=1 gi /z . Then the following are realizations of g(z): 1. Controller realization ⎛ ⎞ 0 ⎜.⎟ ⎟ ⎜ . ⎜ ⎟ ⎟ ⎜ ⎜ ⎟ ⎟ ⎜ A=⎜ . ⎟ , B = ⎜ . ⎟ , C = p0 . . . pn−1 . ⎜ ⎟ ⎟ ⎜ ⎝0⎠ ⎝ 1 ⎠ 1 −q0 . . . −qn−1 ⎛
0 1
⎞
2. Controllability realization ⎛
⎛ ⎞ ⎞ 0 . . . −q0 1 ⎜1 ⎜0⎟ . ⎟ ⎜ ⎜ ⎟ ⎟ ⎜ ⎜ ⎟ ⎟ A=⎜ . . ⎟ , B = ⎜ . ⎟ , C = g1 . . . gn . ⎜ ⎜ ⎟ ⎟ ⎝ ⎝.⎠ . . ⎠ 1 −qn−1 0
10.3 Realization Theory
311
3. Observability realization ⎛
⎛
⎞
⎞ g1 ⎜ ⎜ . ⎟ ⎟ . ⎜ ⎜ ⎟ ⎟ ⎜ ⎜ ⎟ ⎟ A=⎜ . ⎟, B = ⎜ . ⎟, C = 1 0 . . 0 . ⎜ ⎜ ⎟ ⎟ ⎝ ⎝ . ⎠ 1 ⎠ −q0 . . . −qn−1 gn 0 1
4. Observer realization ⎛ ⎞ ⎞ p0 0 . . . −q0 ⎜ . ⎟ ⎜1 . ⎟ ⎜ ⎟ ⎟ ⎜ ⎜ ⎟ ⎟ ⎜ A=⎜ . . ⎟, B = ⎜ . ⎟, C = 0 . . 0 1 . ⎜ ⎟ ⎟ ⎜ ⎝ . ⎠ ⎝ . . ⎠ 1 −qn−1 pn−1 ⎛
5. If p(z) and q(z) are coprime, then the previous four realizations are all canonical, hence isomorphic. The isomorphisms are given by means of the following diagram:
Controller Realization
Observability Realization
p(Cq ) @
@
@
@
@
K B(q, a) ?
Controllability Realization
K
@
@ B(q, p) @ @ @ R @ p(Cq ) -
?
Observer Realization
312
10 Elements of Linear System Theory
Here B(q, p) and B(q, a) are Bezoutians with the polynomial a(z) arising from the solution to the Bezout equation a(z)p(z) + b(z)q(z) = 1, and K is the Hankel matrix ⎛ ⎞ q1 . . . qn−1 1 ⎜. ⎟ . ..1 ⎜ ⎟ ⎜ ⎟ . . . . ⎜ ⎟ K = [I]stco = ⎜ ⎟. ⎜. ⎟ . . ⎜ ⎟ ⎝ qn−1 1 ⎠ 1 Proof. 1. We use the representation g(z) = p(z)q(z)−1 and take the matrix representation of the shift realization (10.24) with respect to the control basis {e1 , . . . , en }. In fact, we have A = [Sq ]co co = Cq , and since en (z) = 1, also B has the required form.
st [ f ]co and Finally, the matrix representation of C follows from C f = p, f = [p]
st the obvious fact that [p] = p . . . p . 0
n−1
2. We use the representation g(z) = p(z)q(z)−1 and take the matrix representation of the shift realization (10.24) with respect to the standard basis {1, z, . . . , zn−1 }. In this case, A = [Sq ]stst = Cq . The form of B is immediate. To conclude, we use
co = g . . . g . equation (8.71), that is, p(z) = ∑ni=1 gi ei (z), to get [p] n 1 3. We use the representation g(z) = q(z)−1 p(z) and take the matrix representation of the shift realization (10.22) with respect to the control basis. Here again A = [Sq ]co co = Cq and B is again obtained by applying (8.71). The matrix C is obtained
st . as [1]
4. We use the representation g(z) = q(z)−1 p(z) and take the matrix representation of the shift realization (10.22) with respect to the standard basis. The derivation follows as before. 5. We note that the controller and controllability realizations are different matrix representations of realization (10.22) taken with respect to the control and standard bases. Thus the isomorphism is given by K = [I]stco , which has the required Hankel form. A similar observation holds for the isomorphism of the observability and observer realizations. On the other hand, the controller and observability realizations are matrix representations of realizations (10.22) and (10.24), both taken with respect to the control basis. By Theorem 10.13, under the assumption of coprimeness for p and q, these realizations are isomorphic, and the isomorphism is given by the map p(Sq ). Taking the matrix representation of this map with respect to the control co basis, we get [p(Sq )]co co = p([Sq ]co ) = p(Cq ). A similar derivation holds for the isomorphism of the controllability and observer realizations. The controller and observer realizations are matrix representations of (10.22) and (10.24), taken with respect to the control and standard bases respectively. Hence based on the diagram in Theorem 10.13 and using Theorem 8.27, the isomorphism is given by [p(Sq )]stco = B(q, p).
10.3 Realization Theory
313
Finally, since by Theorem 5.18 the inverse of p(Sq ) is a(Sq ), the polynomial a(z) coming from a solution to the Bezout equation, the same reasoning as before shows that the isomorphism between the observability and controllability realizations is given by [a(Sq )]stco = B(q, a). Note that the controller and observer realizations are dual to each other, and the same is true for the controllability and observability realizations. The continued fraction representation (8.75) of a rational function can be translated immediately into a canonical form realization that incorporates the atoms. Theorem 10.15. Let the strictly proper transfer function g(z) = p(z)/q(z) have the sequence of atoms {βi , ai+1 (z)}, and assume that α1 , . . . , αr are the degrees of the atoms and ak (z) = zαk +
αk −1
∑
(k)
ai zi .
(10.28)
i=0
Then g(z) has a realization (A, b, c) of the form ⎛
⎞ A11 A12 ⎜A A . ⎟ ⎜ 21 22 ⎟ ⎜ ⎟ A=⎜ . .. ⎟, ⎜ ⎟ ⎝ .. Ar−1 r ⎠ Ar r−1 Arr
(10.29)
where ⎛
(i) ⎞ 0 . . . −a0 ⎜1 . ⎟ ⎜ ⎟ ⎜ ⎟ . ⎟, Aii = ⎜ . ⎜ ⎟ ⎝ . . ⎠ (i) 1 −aαi −1
⎛
Ai+1 i
and Ai i+1 = βi−1 Ai+1 i ,
⎞ 0..01 ⎜. 0⎟ ⎜ ⎟ ⎜ ⎟ =⎜. .⎟ ⎜ ⎟ ⎝. .⎠ 0.. . 0
i = 1, . . . r,
(10.30)
314
10 Elements of Linear System Theory
⎛ ⎞ 1 ⎜0⎟ ⎜ ⎟ ⎜ ⎟ b = ⎜ . ⎟, ⎜ ⎟ ⎝. ⎠ 0
c = 0 . . β0 0 . . 0
(10.31)
with β0 being in the α1 position. Proof. We use the shift realization (10.24) of g(z) which, by the coprimeness of p and q, is canonical. Taking its matrix representation with respect to the orthogonal basis Bor = {1, z, . . . , zn1 −1 , Q1 , zQ1 , . . . , zn2 −1 Q1 , . . . , Qr−1 , . . . , znr−1 −1 Qr−1 }, with Qi the Lanczos polynomials of the first kind, proves the theorem. In the computation of the matrix representation we lean heavily on the recurrence formula (8.78) for the Lanczos polynomials.
10.4 Stabilization The purpose of modeling, and realization theory is part of that process, is to gain a better understanding of the system in order to improve its performance by controlling it. The most fundamental control problem is that of stabilization. We say that a system is stable if given any initial condition, the system tends to its rest point with increasing time. If the system is given by a difference equation xn+1 = Axn , the stability condition is that all eigenvalues of A lie in the open unit disk. If we have a continuous-time system given by x˙ = Ax, the stability condition is that all eigenvalues of A lie in the open left half-plane. Since all the information needed for the computation of the future behavior of the system is given in the system equations and the present state, that is the initial condition of the system, we expect that a control law should be a function of the state, that is, a map, linear in the context we work in, F : X −→ U . In the case of a system given by equations (10.2), we have xn+1 = Axn + Bun , un = Fxn + vn , and the closed-loop equation becomes xn+1 = (A + BF)xn + Bvn. In terms of flow diagrams we have the feedback configuration
(10.32)
10.4 Stabilization
315
v -
u -
-
G
x -
F
There are two extreme idealizations inherent in the assumptions of this scheme. The first is the assumption that we have a precise model of our system, which is hardly ever the case. The other idealization is the assumption that we have access to full state information. This difficulty is more easily overcome via the utilization of observers, that is auxiliary systems that reconstruct, asymptotically, the state, or more generally by the use of dynamic controllers. The first problem is more serious, and one way to cope with it is to develop the theory of robust control. This, however, is outside the scope of this book. It is still instructive to study the original problem of stabilizing a given linear system with full state information. The following theorem on pole-shifting states that using state feedback, we have full freedom in assigning the characteristic polynomial of the closed-loop system, provided the pair (A, B) is reachable. Here we handle only the single-input case. Theorem 10.16. Let the pair (A, b) be reachable with A an n × n matrix and b an n vector. Then for each monic polynomial s of degree n, there exists a control law k : Fn −→ F for which the characteristic polynomial of A − bk is s. Proof. The characteristic polynomial is independent of the basis chosen. We choose to work in the control basis corresponding to q(z) = det(zI − A) = zn + qn−1zn−1 + · · · + q0. With respect to this basis, the pair (A, b) has the matrix representation ⎛ ⎜ ⎜ ⎜ A=⎜ ⎜ ⎝
⎞
0 1 .
· .
1 −q0 . . . −qn−1
⎟ ⎟ ⎟ ⎟, ⎟ ⎠
⎛ ⎞ 0 ⎜.⎟ ⎜ ⎟ ⎜ ⎟ b = ⎜ . ⎟. ⎜ ⎟ ⎝.⎠ 1
Let s(z) = zn + sn−1 zn−1 + · · · + s0 . Then k = q0 − s0 . . . qn−1 − sn−1 is the required control law. This theorem explains the reference to the control basis. We pass now to the analysis of stabilization when we have no direct access to full state information. We assume that the system is given by the equations xn+1 = Axn + Bun , yn = Cxn .
(10.33)
316
10 Elements of Linear System Theory
We construct an auxiliary system, called an observer, that utilizes all the available information, namely the control signal and the observation signal:
ξn+1 = Aξn + Bun − H(yn − Cξn ), ηn = ξn . We define the error signal by
(10.34)
en = xn − ξn .
The object of the construction of an observer is to choose the matrix H so that the error en tends to zero. This means that with increasing time, the output of the observer gives an approximation of the current state of the system. Subtracting the two system equations, we get xn+1 − ξn+1 = A(xn − ξn ) + H(Cxn − Cξn ), or en+1 = (A + HC)en . Thus, we see that the observer construction problem is exactly dual to that of stabilization by state feedback. Theorem 10.17. An observable linear single-output system has an observer, or equivalently, an asymptotic state estimator. Proof. We apply Theorem 10.16. This shows that the characteristic polynomial of A + HC can be arbitrarily chosen. Of course, state estimation and the state feedback law can be combined into a single framework. Thus we use the observer to get an asymptotic approximation for the state, and we feed this back into the system as if it were the true state. The next theorem shows that as regards stabilization, this is sufficient. A B , Theorem 10.18. Given a canonical single-input/single-output system C 0
choose linear maps H, F so that A + BF and A + HC are both stable matrices. Then 1. The closed-loop system given by xn+1 = Axn + Bun, yn = Cxn ,
ξn+1 = Aξn + Bun − H(yn − Cξn ), un = −F ξn + vn , is stable.
(10.35)
10.4 Stabilization
317
2. The transfer function of the closed-loop system, that is, the transfer function from v to y, is Gc (z) = C(zI − A + BF)−1 B. Proof. 1. We can rewrite the closed-loop system equations as
xn+1 ξn+1
A −BF = HC A − HC − BF xn yn = C 0 , ξn
xn ξn
,
and it remains to compute the characteristic polynomial of the new state matrix. We use the similarity
I 0 −I I
A −BF HC A − HC − BF
I0 I I
Therefore, the characteristic polynomial of
=
A − BF −BF . 0 A − HC
A −BF HC A − HC − BF
is the product of
the characteristic polynomials of A − BF and A − HC. This shows also that the controller and observer for the original system can be designed independently. 2. We compute the transfer function of the closed-loop system. The closed-loop system transfer function is given by ⎞ B A −BF Gc (z) = ⎝ HC A − HC − BF B ⎠ . C 0 0 ⎛
Utilizing the previous similarity, we get ⎛
⎞ A − BF −BF B Gc (z) = ⎝ 0 A − HC 0 ⎠ = C(zI − A + BF)−1 B. C 0 0 Note that the closed-loop transfer function does not depend on the specifics of the observer. This is the case because the transfer function desribes the steady state of the system. The observer does affect the transients in the system. Let us review the previous observer-controller construction from a transfer function point of view. Let G(z) = C(zI − A)−1 B have the polynomial coprime factorization p/q. Going back to the closed-loop system equations, we can write ξn+1 = (A − HC)ξn + Hyn +Bun . Hence the transfer function from (u, y) to the control signal is given by F(zI − A + HC)−1B F(zI − A + HC)−1H , and the overall transfer function is determined from
318
10 Elements of Linear System Theory
y = Gu, u = v − Hu u − Hy y, where
(10.36)
Hu = F(zI − A + HC)−1B, Hy = F(zI − A + HC)−1H.
The flow chart of this configuration is given by v
-
u
-
6
? -
-
G
Hy
Hu
y -
?
?
Note now that given that we chose H to stabilize A − HC, the two transfer functions Hu , Hy have the following representations: Hu =
pu , qc
Hy =
py , qc
where qc (z) = det(zI − A + HC) is a stable polynomial with deg qc = deg q. From (10.36) we get (I + Hu )u = v − Hy y, or u = (I + Hu )−1 v − (I + Hu )−1 Hy y. This implies y = Gu = G(I + Hu )−1 v − G(I + Hu )−1 Hy y. So the closed-loop transfer function is given by Gc = (I + G(I + Hu )−1 Hy )−1 G(I + Hu )−1 . Since we are dealing here with single-input single-output systems, all transfer functions are scalar-valed and have polynomial coprime factorizations. When these are substituted in the previous expression, we obtain 1 1 p · · py q 1 + p u 1 p 1+ · · qc q 1 + pu qc qc py p 1 pqc qc = py · q · q · q + p = q(q + p ) + pp . p c c u c u y 1+ · q qc + p u
Gc =
The denominator polynomial has degree 2 deg q and has to be chosen to be stable. So let t(z) be an arbitrary stable polynomial of degree 2 degq. Since p(z) and q(z) are
10.5 The Youla–Kucera Parametrization
319
coprime, the equation a(z)p(z) + b(z)q(z) = t(z) is solvable, with an infinite number of solutions. We can specify a unique solution by imposing the extra requirement that deg a < deg q. Since deg ap < 2 deg q − 1, we must have deg(bq) = degt. So deg b = deg q, and we can write b(z) = qc + pu (z) and a(z) = py (z). Moreover, pu (z), py (z) have degrees smaller than degq. A very special case is obtained if we choose a stable polynomial s(z) of degree equal to the degree of q(z), and let t(z) = s(z) · qc . The closed-loop transfer function reduces in this case to p(z)/s(z), which is the case in the observer-controller configuration. We go back now to the equation q(z)(qc + pu(z)) + p(z)py (z) = s(z)qc and divide by the right-hand side to obtain q(z) qc + pu p(z) py (z) · · = 1. + s(z) qc s(z) qc (z)
(10.37)
Now all four rational functions appearing in this equation are proper and stable, that is, they belong to RH∞ + . Moreover, equation (10.37) is just a Bezout identity in p(z)/s(z) ∞ , and g(z) = RH∞ + q(z)/s(z) is a coprime factorization of g(z) over the ring RH+ . This analysis is important, since we can now reverse our reasoning. We see that for stabilization it was sufficient to solve a Bezout equation over RH∞ + that is based on a coprime factorization of g(z) over the same ring. We can take this as a starting point of a more general approach.
10.5 The Youla–Kucera Parametrization We go back now to the stabilization problem. We would like to design a compensator, that is, an auxiliary system, that takes as its input the output of the original system, and its output is taken as an additional control signal, in such a way that the closed-loop system is stable. In terms of flow charts we are back to the configuration v -
u -
-
G
C
y -
with the sole difference that we no longer assume the state to be accessible. It is a simple exercise to check that the closed-loop transfer function is given by Gc = (I − GC)−1 G = G(I − CG)−1 .
320
10 Elements of Linear System Theory
Now a natural condition for stabilization would be that Gc should be stable. Indeed, from an external point of view, this seems the right condition. However, this condition does not guarantee that internal signals stay bounded. To see what the issue is, we note that there are two natural entry points for disturbances. One is the observation errors and the other is that of errors in the control signals. Thus we are led to the standard control configuration e1 -
-
y2
C
y1 -
-
G
e2
and the internal stability requirement is that the four transfer functions from the error variables u1 , u2 to the internal variables e1 , e2 should be stable. Since the system equations are e2 = u2 + Ge1 , e1 = u1 + Ce2 , we get
e1 e2
=
(I − CG)−1 (I − CG)−1C (I − GC)−1 G (I − GC)−1
.
We denote the 2 × 2 block transfer function by H(G,C). This leads to the following. Definition 10.19. Given a linear system with transfer function G and a compensator C, we say that the pair (G,C) is internally stable if H(G,C) ∈ RH∞ +. Thus internal stability is a stronger requirement than just closed loop stability, which only requires (I − G(z)C(z))−1 G(z) to be stable. The next result gives a criterion for internal stability via a coprimeness condition over the ring RH∞ + . We use the fact that any rational transfer function has a coprime factorization over the ring RH∞ +. Proposition 10.20. Given the rational transfer function G(z) and the compensator C(z), let G(z) = N(z)/M(z) and C(z) = U(z)/V (z) be coprime factorizations over RH∞ + . Then (G(z),C(z)) is internally stable if and only if Δ (z) = M(z)V (z) − N(z)U(z) is invertible in RH∞ +. MV MU −1 ∞ −1 ∈ RH∞ Proof. Assume Δ ∈ RH+ . Then clearly H(G,C) = Δ +. Conversely, assume H(G,C) ∈ RH∞ + . Since
NV MV
(I −G(z)C(z))−1 G(z)C(z) = (I −G(z)C(z))−1 G(z)C(z)(I −(I −G(z)C(z))) ∈ RH∞ +,
10.6 Exercises
321
it follows that Δ (z)−1 N(z)U(z) ∈ RH∞ + . This implies M(z) Δ (z)−1 V (z) U(z) ∈ RH∞ +. N(z) Since coprimeness in RH∞ + is equivalent to the Bezout identities, it follows that Δ (z)−1 ∈ RH∞ +. Corollary 10.21. Let G(z) = N(z)/M(z) be a coprime factorization over RH∞ +. Then there exists a stabilizing compensator C(z) if and only if C(z) = U(z)/V (z) and M(z)V (z) − N(z)U(z) = I. Proof. If there exist U(z),V (z) such that M(z)V (z) − N(z)U(z) = I, then C(z) = U(z)/V (z) is easily checked to be a stabilizing controller. Conversely, if C(z) = U1 (z)/V1 (z) is a stabilizing controller, then by Proposition 10.20, Δ (z) = M(z)V1 (z) − N(z)U1 (z) is invertible in RH∞ + . Defining U(z) = U1 (z)Δ (z)−1 and V (z) = V1 (z)Δ (z)−1 , we obtain C(z) = U(z)/V (z) and M(z)V (z)− N(z)U(z) = I. The following result, going by the name of the Youla–Kucera parametrization, is the cornerstone of modern stabilization theory. Theorem 10.22. Let G(z) = N(z)/M(z) be a coprime factorization. Let U (z),V (z) be any solution of the Bezout equation M(z)V (z) − N(z)U(z) = I. Then the set of all stabilizing controllers is given by U(z) + M(z)Q(z) ∞ | Q(z) ∈ RH+ ,V (z) + N(z)Q(z) = 0 . C(z) = V (z) + N(z)Q(z) ∞ Proof. Suppose C(z) = U(z)+M(z)Q(z) V (z)+N(z)Q(z) with Q(z) ∈ RH+ . We check that M(z)(V (z)+ N(z)Q(z)) − N(z)(U(z) + M(z)Q(z)) = M(z)V (z) − N(z)U(z) = I. Thus, by Corollary 10.21, C(z) is a stabilizing compensator. Conversely, assume C(z) stabilizes G(z). We know, by Corollary 10.21, that C(z) = U1 (z)/V1 (z) with M(z)V1 (z) − N(z)U1 (z) = I. By subtraction we get M(z) (V1 (z) − V (z)) = N(z)(U1 (z) − U(z)). Since M(z), N(z) are coprime, M(z) divides U1 (z) −U(z), so there exists Q(z) ∈ RH∞ + such that U1 (z) = U(z) + M(z)Q(z). This immediately implies V1 (z) = V (z) + N(z)Q(z).
10.6 Exercises 1. (The Hautus test). Consider the pair (A, b) with A a real n × n matrix and b an n vector. a. Show that (A, b) is reachable if and only if rank zI − A b = n for all z ∈ C. b. Show that (A, b) is stabilizable by state feedback if and only if for all z in the closed right half-plane, we have rank zI − A b = n.
322
10 Elements of Linear System Theory
2. Let φn (z), ψn (z) be the monic orthogonal polynomials of the first and second kind respectively, associated with the positive sequence {cn }, which were defined in exercise 8.6b. Show that 1 ψn A n Bn , = 2 φn Cn Dn where the matrices An , Bn ,Cn , Dn are given by ⎞ n−1 2 (1 − γ 2) . γ1 γ2 (1 − γ12 ) γ3 Πi=1 . . γn Πi=1 (1 − γi2) i n−1 ⎜ 1 −γ γ γ2 (1 − γ12 ) . . . −γn γ1 Πi=2 (1 − γi2 ) ⎟ 2 1 ⎟ ⎜ ⎟ ⎜ 1 −γ3 γ2 . ⎟ ⎜ ⎟ ⎜ An = ⎜ ⎟, . . ⎟ ⎜ ⎟ ⎜ . . ⎟ ⎜ 2 ⎝ 1 −γn γn−2 (1 − γn−1) ⎠ 1 −γn γn−1 ⎛
⎛ ⎞ 1 ⎜0⎟ ⎜ ⎟ ⎜ ⎟ ⎜.⎟ Bn = ⎜ ⎟ , ⎜.⎟ ⎜ ⎟ ⎝.⎠ 0
n−1 Cn = γ1 γ2 (1 − γ12 ) . . . γn Πi=1 (1 − γi2 ) ,
Dn = 12 . 3. Show that detAn = (−1)n−1 γn . 4. Show that the Toeplitz matrix can be diagonalized via the following congruence transformation: ⎛ ⎞⎛ ⎞ ⎞⎛ 1 c0 . . . . cn 1 φ1,0 φn,0 ⎜φ 1 ⎟⎜ . . . . . . ⎟⎜ 1 φn,1 ⎟ ⎜ 1,0 ⎟⎜ ⎟ ⎟⎜ ⎜ ⎟⎜ ⎟ ⎟⎜ . . ⎜ ⎟⎜ . . . . . . ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎟⎜ ⎜ ⎟⎜ . . . . . . ⎟⎜ ⎟ . . ⎜ ⎟⎜ ⎟ ⎟⎜ ⎝ ⎠⎝ . . . . . . ⎠⎝ . . φn,n−1 ⎠ cn . . . . c0 φn,0 φn,n−1 1 1 ⎛ ⎜ ⎜ ⎜ ⎜ = c0 ⎜ ⎜ ⎜ ⎝
⎞
h0
⎛
⎟ ⎜ (1 − γ 2) ⎟ ⎜ 1 ⎟ ⎜ . ⎟ ⎜ ⎟ = c0 ⎜ ⎟ ⎜ . ⎟ ⎜ ⎠ ⎝ .
h1 . . . hn
⎞
1
⎟ ⎟ ⎟ ⎟ ⎟. ⎟ ⎟ ⎠ 2 ) · · · (1 − γ 2 ) (1 − γn−1 1
10.6 Exercises
323
5. Show that 2 ). det Tn = c0 (1 − γ12 )n−1 (1 − γ22)n−2 · · · (1 − γn−1
(10.38)
6. Define a sequence of kernel functions by Kn (z, w) =
φ j (z)φ j (w) . hj j=0 n
∑
Show that the kernels Kn (z, w) have the following properties: a. We have Kn (z, w) = Kn (w, z). b. The matrix Kn = (ki j ), defined through Kn (z, w) = ∑ni=0 ∑nj=0 Ki j zi w j , is Hermitian, i.e., ki j = k ji. c. The kernels Kn are reproducing kernels, i.e., for each f ∈ Xφn+1 we have f , K(·, w) C = f (w). d. The reproducing kernels have the following determinantal representation: c 0 1 Kn (z, w) = − detTn cn 1
. . . cn ... ... ... . . . c0 z . . zn
1 w . . . wn 0
e. The following representation for the reproducing kernels holds: Kn (z, w) =
φ j (z)φ j (w) φn (z)φn (w) − zwφn (z)φn (w) . = ∑ hj hn (1 − zw) j=0 n
This representation of the reproducing kernels Kn (z, w) is called the Christoffel–Darboux formula. 7. Show that all the zeros of the orthogonal polynomials φn are located in the open unit disk. 8. Prove the following Gohberg–Semencul formula
324
10 Elements of Linear System Theory
Kn =
1 ˜ − Sφn (S) ˜ φn (S)S] ˜ [φ (S)φn (S) hn n ⎡⎛
1 ⎢⎜ φn,n−1 ⎢⎜ = ⎢⎜ . ⎣⎝ . φn,1
⎞⎛ . .. ... . . φn,n−1 1
1 φn,n−1 . . ⎟⎜ . .. ⎟⎜ .. ⎟⎜ ⎠⎝ .
⎞⎛ 0 φn,0 0 ⎟⎜ . ⎜φ ⎟⎜ ⎜ n,0 ⎟⎜ ⎜ − ⎜. . ⎟⎜ ⎟⎜ ⎜ ⎠⎝ ⎝. .. φn,n−1 . . φn,0 0 ⎛
.. .. .. .
⎞
φn,1 ⎟ . ⎟ . ⎟ φn,n−1 ⎠ 1
⎞⎤ φn,n−1 ⎟⎥ . ⎟⎥ ⎟⎥ . ⎟⎥ . ⎟⎥ φn,0 ⎠⎦ 0
9. With Kn (z, w) = ∑nj=0
n n φ j (z)φ j (w) = ∑ ∑ Ki j zi w j , hj i=0 j=0
show that Kn = Tn−1 .
10.7 Notes and Remarks The study of systems, linear and nonlinear, is the work of many people. However, the contributions of Kalman to this topic were, and remain, central. His insight connecting an applied engineering subject to abstract module theory is of fundamental importance. This insight led eventually to the wide use of polynomial techniques in linear system theory. An early exposition of this is given in Chapter 11 of Kalman, Falb, and Arbib (1969). Another approach to systems problems, using polynomial techniques, was developed by Rosenbrock (1970), who also proved the ultimate result on pole placement by state feedback. The connection between the abstract module approach employed by Kalman and Rosenbrock’s method of polynomial system matrices was clarified in Fuhrmann (1977). A more recent approach was initiated in Willems (1986, 1989, 1991). This approach moves away from the input/output point of view and concentrates on system equations.
Chapter 11
Rational Hardy Spaces
11.1 Introduction We devote this chapter to a description of the setting in which we will study, in Chapter 12, model reduction problems. These will include approximation in Hankel norm as well as approximation by balanced truncation. Since optimization is involved, we choose to work in rational Hardy spaces. This fits with our focus on finite-dimensional linear systems. The natural setting for the development of AAK theory is that of Hardy spaces, whose study is part of complex and functional analysis. However, to keep our exposition algebraic, we shall assume the rationality of functions and the finite dimensionality of model spaces. Moreover, to stay in line with the exposition in previous chapters, we shall treat only the case of scalar-valued transfer functions. There is another choice we are making, and that is to study Hankel operators on Hardy spaces corresponding to the open left and right half-planes, rather than the spaces corresponding to the unit disk and its exterior. Thus in effect we are studying continuous-time systems, though we shall not delve into this in great detail. The advantage of this choice is greater symmetry. The price we pay for this is that to a certain extent shift operators have to be replaced by translation semigroups. Even this can be corrected. Indeed, once we get the identification of the Hankel operator ranges with rational models, shift operators reenter the picture. Of course, with the choices we made, the results are not the most general. However, we gain the advantage of simplicity. As a result, this material can be explained to an undergraduate without any reference to more advanced areas of mathematics such as functional analysis and operator theory. In fact, the material presented in this chapter covers, albeit in simplified form, some of the most important results in modern operator theory and system theory. Thus it can also be taken as an introduction to these topics. The chapter is structured as follows. In Section 11.2 we collect basic information on Hankel operators, invariant subspaces, and their representation via Beurling’s theorem. Next we introduce model intertwining operators. We do this using the P.A. Fuhrmann, A Polynomial Approach to Linear Algebra, Universitext, DOI 10.1007/978-1-4614-0338-8 11, © Springer Science+Business Media, LLC 2012
325
326
11 Rational Hardy Spaces
frequency-domain representation of the right-translation semigroup. We study the basic properties of intertwining maps and in particular their invertibility properties. The important point here is the connection of invertibility to the solvability of an H+∞ Bezout equation. We follow this by defining Hankel operators. For the case of a rational, antistable function we give specific, Beurling-type, representations for the cokernel and the image of the corresponding Hankel operator. Of importance is the connection between Hankel operators and intertwining maps. This connection, coupled with invertibility properties of intertwining maps, is the key to duality theory.
11.2 Hardy Spaces and Their Maps 11.2.1 Rational Hardy Spaces Our principal interest in the following will be centered around the study of Hankel operators with rational and bounded symbols. In the algebraic approach, taken in Chapter 8, the Hankel operators were defined using the direct sum decomposition F((z−1 )) = F[z] ⊕ z−1 F[[z−1 ]], or alternatively F(z) = F[z] ⊕ Fspr (z), where Fspr (z) is the space of all strictly proper rational functions. The way to think of this direct sum decomposition is that it is a decomposition based on the location of singularities of rational functions. Functions in Fspr (z) have only finite singularities and behave nicely at ∞, whereas polynomials behave nicely at all points but have a singularity at ∞. We shall now consider analogous spaces. Since we are passing to an area bordering on analysis, we restrict the field to be the complex field C. The two domains where we shall consider singularities are the open left and right half-planes. This choice is dictated by considerations of symmetry, which somewhat simplify the development of the theory. In particular, many results concerning duality are made more easily accessible Our setting will be that of Hardy spaces. Thus, H+2 is the Hilbert space of all analytic functions in the open right half-plane C+ , with 1 π x>0
|| f ||2 = sup
∞ −∞
| f (x + iy)|2 dy.
The space H−2 is similarly defined in the open left half plane. It is a theorem of Fatou that guarantees the existence of boundary values of H±2 -functions on the imaginary axis. Thus the spaces H±2 can be considered closed subspaces of L2 (iR), the space of Lebesgue square integrable functions on the imaginary axis. It follows from the Fourier-Plancherel and Paley-Wiener theorems that L2 (iR) = H+2 ⊕ H−2 ,
11.2 Hardy Spaces and Their Maps
327
with H+2 and H−2 the Fourier-Plancherel transforms of L2 (0, ∞) and L2 (−∞, 0) respectively. Also, H+∞ and H−∞ will denote the spaces of bounded analytic functions in the open right and left half-planes respectively. These spaces can be considered subspaces of L∞ (iR), the space of Lebesgue measurable and essentially bounded functions on the imaginary axis. We will define f ∗ (z) = f (−z).
(11.1)
Note that on the imaginary axis we have f ∗ (it) = f (it). Also, since we shall deal with inner product spaces, we will reserve the use of M ⊕ N for the orthogonal direct ·
sum of the spaces M and N, whereas we shall use M +N for the algebraic direct sum. This describes the general analytic setting. Since we want to keep the exposition elementary, that is, as algebraic as possible, we shall not work with these spaces but rather with the subsets of these spaces consisting of their rational elements. We use the letter R in front of the various spaces to denote rational. The following definition introduces the spaces of interest. ∞ 2 2 2 Definition 11.1. We denote by RL∞ , RH∞ + , RH− , RL , RH+ , RH− the sets of all rational functions that are contained in L∞ , H+∞ , H−∞ , L2 , H+2 , H−2 respectively.
The following proposition, the proof of which is obvious, gives a characterization of the elements of these spaces. Proposition 11.2. In terms of polynomials, the spaces of Definition 11.1 have the following characterizations: ∞
RL
RH∞ + RH∞ − RL2 RH2+ RH2−
p(z) |p(z), q(z) ∈ C[z], deg p ≤ deg q, q(ζ ) = 0, ζ ∈ iR , = q(z) p(z) ∞ ∈ RL |q(z) stable , = q(z) p(z) ∈ RL∞ |q(z) antistable , = q(z) p(z) ∞ ∈ RL | deg p < deg q , = q(z) p(z) 2 ∈ RL |q(z) stable , = q(z) p(z) ∈ RL2 |q(z) antistable . = (11.2) q(z)
328
11 Rational Hardy Spaces
We note that all the spaces introduced here are obviously infinite-dimensional, and RL2 is an inner product space, with the inner product given, for f (z), g(z) ∈ RL2 , by 1 ∞ ( f , g) = f (it)g(it)dt. (11.3) 2π −∞ We observe that, assuming the functions f (z), g(z) to be rational, the integral can be evaluated by computing residues, using partial fraction decompositions and replacing the integral on the imaginary axis by integration over large enough semi circles. Proposition 11.3. We have the following additive decompositions: ∞ RL∞ = RH∞ + + RH− ,
(11.4)
RL2 = RH2+ ⊕ RH2−.
(11.5)
Proof. Follows by applying partial fraction decomposition. The sum in (11.4) is not a direct sum, since the constants appear in both summands. The fact that the direct sum in (11.5) is orthogonal follows from the most elementary properties of complex contour integration and the partial fraction decomposition, the details of which are omitted. We shall denote by P+ and P− the orthogonal projections of RL2 on RH2+ and RH2− respectively that correspond to the direct sum (11.5). We note that if f (z) ∈ p(z) with p(z), q(z) coprime polynomials, then q(z) has no zeros on RL2 and f (z) = q(z) the imaginary axis. Thus, we have the factorization q(z) = q− (z)q+ (z) with q− (z) stable and q+ (z) antistable. Thus there exists a unique partial fraction decomposition p(z) = qp−1 (z) + qp+2 (z) , with deg p1 < deg q− and deg p2 < deg q+ . Clearly, f (z) = q(z) (z) (z) p1 (z) q− (z)
∈ RH2+ and
p2 (z) q+ (z)
∈ RH2− , and we have P+
p1 p = , q q−
p2 p P− = . q q+
(11.6)
The space RL∞ is clearly a commutative ring with identity. The spaces RH∞ + and RH∞ − are subrings. So RH∞ +=
p(z) | q(z) ∈ S, deg p ≤ deg q . q(z)
Thus RH∞ + is the set of all rational functions that are uniformly bounded in the closed right half plane. This is clearly a commutative ring with identity.
11.2 Hardy Spaces and Their Maps
329
Given f (z) = p(z)/q(z) ∈ RH∞ + and a factorization p(z) = p+ (z)p− (z) into the stable factor p− (z) and antistable factor p+ (z), we use the notation ∇ f = p+ and set π+ = deg ∇ f . As in Chapter 1, we define the relative degree ρ by ρ ( qp ) = deg q − deg p, and ρ (0) = −∞. We define a degree function δ : RH∞ + −→ Z+ by
δ(f) =
deg ∇ f + ρ ( f ), −∞,
f= 0, f = 0.
(11.7)
∞ Obviously f (z) ∈ RH∞ + is invertible in RH+ if and only if it can be represented as the quotient of two stable polynomials of equal degree. This happens if and only if δ ( f ) = 0. Let us fix now a monic, stable polynomial σ (z). To be specific we can choose σ (z) = z + 1. Given f (z) = p(z)/q(z) ∈ RH∞ + , then f (z) can be brought, by multiplication by an invertible element, to the form σ pν++π , where ν = degq − deg p. (z) pi + Proposition 11.4. Given fi (z) = qpii(z) ∈ RH∞ + , i = 1, 2, with ∇( qi ) = pi and πi,+ = deg p+ i , we set νi = deg qi = deg pi . Let p+ (z) be the greatest common divisor of p+ (z) + p1 (z), p+ 2 (z) and set π+ = deg p+ and ν = min{ν1 , ν2 }. Then σ (z)ν +π+ is a greatest
common divisor of
pi (z) , qi (z)
i = 1, 2.
Proof. Without loss of generality we can assume f1 (z) =
p+ 1 , ν σ 1 +π1,+
+ Writing p+ i (z) = p+ (z) pˆi (z) we have f i =
ρ
p+ σ ν +π+
pˆ+ i
σ νi −ν +πi,+−π+
This shows that
f2 (z) = ·
p+ 2 . ν σ 2 +π2,+ σ
pˆ + i νi −ν +πi,+ −π+
. Now
= νi − ν + πi,+ − π+ − πi,+ + deg p+ = νi − ν ≥ 0.
pˆ + i νi −ν +πi,+ −π+ σ
p+ is σ ν +π+ (z) of qpi(z) , i i
is proper and hence
Let now r(z) s(z) be any other common factor generality we can assume that r(z) is antistable. Let
a common factor. = 1, 2. Without loss of
pi (z) r(z) ui (z) = · . qi (z) s(z) ti (z) νi +πi,+ , it follows that r (z), the antistable factor From the equality sti p+ + i = rui σ + of r(z), divides pi (z) and hence divides also p+ (z), the greatest common divisor of the p+ r+ (z). Since u(z)/t(z) is proper, we have i (z). Let us write p+ (z) = r+ (z)ˆ deg s − degr ≤ deg qi − deg pi = νi and hence deg s − degr ≤ ν . Now we write
p+ ν σ +π+
=
s s r+ rˆ+ r+ rˆ+ r− r · ν +π = = · ν +π . ν + π + + +r σ s σ r− s σ −
330
11 Rational Hardy Spaces
We compute now the relative degree of the right factor:
ν + π+ + degr− − degs = ν + degr− + deg p+ − degs + (π+ − deg p+ ) = ν + degr − deg s ≥ 0. s σ ν +π+ r− p1 (z) (z) and qp2(z) . q1 (z) 2
This shows that divisor of
is proper, and hence
p+ σ ν +π+
is indeed a greatest common
The next theorem studiies the solvability of the Bezout equation in RH∞ +. ∞ Theorem 11.5. Let f 1 (z), f2 (z) ∈ RH∞ + be coprime. Then there exist gi (z) ∈ RH+ such that
g1 (z) f1 (z) + g2 (z) f2 (z) = 1.
(11.8)
Proof. We may assume without loss of generality that f1 (z) =
p1 (z) , σ (z)π1
f2 (z) =
p2 (z) σ (z)ν +π2
with pi (z) coprime, antistable polynomials and πi = deg pi . Clearly, there exist polynomials α1 (z), α2 (z) for which α1 p1 + α2 p2 = σ π1 +π2 +ν , and we may assume without loss of generality that deg α2 < deg p1 . Dividing by the right-hand side, we obtain p1 p2 α1 α2 · + · = 1. σ π2 +ν σ π1 σ π1 σ π2 +ν p2 (z) α2 (z) Now σ (z) π2 +ν is proper, whereas σ (z)π1 is strictly proper by the condition deg α2 < deg p1 . So the product of these functions is strictly proper. Next, we note that the α1 (z) 1 (z) to be proper. Defining relative degree of σp(z) π1 is zero, which forces σ (z)π2 +ν
g1 (z) =
α1 (z) , σ (z)π2 +ν
g2 (z) =
α2 (z) , σ (z)π1
we have, by the stability of σ (z), that gi (z) ∈ RH∞ + and that the Bezout identity (11.8) is satisfied. Corollary 11.6. Let f 1 (z), f2 (z) ∈ RH∞ + and let f (z) be a greatest common divisor of f 1 (z), f2 (z). Then there exist gi (z) ∈ RH∞ + such that g1 (z) f1 (z) + g2(z) f2 (z) = f (z). Theorem 11.7. The ring RH∞ + is a principal ideal domain.
(11.9)
11.2 Hardy Spaces and Their Maps
331
Proof. It suffices to show that any ideal J ⊂ RH∞ + is principal. This is clearly the case for the zero ideal. So we may as well assume that J is a nonzero ideal. Now any p(z) − (z) = p+ (z)p with p(z), q(z) nonzero element f (z) ∈ J can be written as f (z) = q(z) q coprime and q(z) stable and p(z) factored into its stable factor p− (z) and antistable factor p+ (z). Of all nonzero elements in J we choose f (z) for which π+ = deg ∇ f is minimal. Let C = { f ∈ J | deg f = π+ }. In C we choose an arbitrary element g(z) of minimal relative degree. We will show that g(z) is a generator of J. To this end let f (z) be an arbitrary element in J. Let h(z) be a greatest common divisor of f (z) and g(z). By Corollary 11.6, there exist k(z), l(z) ∈ RH∞ + for which k(z) f (z) + l(z)g(z) = h(z). Since deg ∇g is minimal, it follows that deg ∇g ≤ deg ∇h. On the other hand, since h(z) divides g(z), we have ∇h|∇g and hence deg ∇g ≥ deg ∇h. So the equality deg ∇g = deg ∇h follows. This shows that h(z) ∈ C and hence ρ (g) ≤ ρ (h), since g(z) is the element of C of minimal relative degree. Again the division relation h(z) | g(z) implies ρ (g) ≥ ρ (h). Hence we have the equality ρ (g) = ρ (h). It follows that g(z) and h(z) differ at most by a factor that is invertible in RH∞ + , so g(z) divides f (z). Actually, it can be shown that with the degree function defined in (11.7), RH∞ + is a Euclidean domain if we define δ ( f ) = ρ ( f ) + deg ∇ f . Thus δ ( f ) counts the number of zeros of f (z) in the closed right half-plane together with the zeros at infinity. We omit the details. The following theorem introduces important module structures. The proofs of the statements are elementary and we omit them. Theorem 11.8. 1. RL∞ is a ring with identity under the usual operation of addition ∞ and multiplication of rational functions, with RH∞ + and RH− subrings. 2 2. RL is is a linear space but also a module over the ring RL∞ , with the module structure defined, for ψ (z) ∈ RL∞ and f (z) ∈ RL2 , by (ψ · f )(z) = ψ (z) f (z),
(11.10)
∞ as well as over the rings RH∞ + and RH− . ∞ 3. With the RH+ induced module structure on RL2 , RH2+ is a submodule. 4. RH2− has an RH∞ + module structure given by
ψ · h = P− (ψ h),
h(z) ∈ RH2− .
We wish to point out that since z ∈ / RL∞ , the multiplication by z operator is not defined in these spaces. This forces us to some minor departures from the algebraic versions of some of the statements. With all these spaces at hand, we can proceed to introduce several classes of operators.
332
11 Rational Hardy Spaces
Definition 11.9. Let ψ (z) ∈ RL∞ . 1. The operator Lψ : RL2 −→ RL2 defined by Lψ f = ψ f
(11.11)
is called the Laurent operator with symbol ψ (z). 2. The operator Tψ : RH2+ −→ RH2+ defined by Tψ f = P+ ψ f
(11.12)
is called the Toeplitz operator with symbol ψ (z). If ψ (z) ∈ RH∞ + , Tψ will be called an analytic Toeplitz operator. 3. The operator Hψ : RH2+ −→ RH2− defined by Hψ f = P− ψ f
(11.13)
is called the Hankel operator with symbol ψ (z). Similarly, the operator Hˆ ψ : RH2− −→ RH2+ , Hˆ ψ f = P+ ψ f ,
(11.14)
is called the reverse Hankel operator with symbol ψ (z). The next proposition studies the duality properties of these operators. Proposition 11.10. Relative to the inner product in RL2 and the induced inner products in RH2± , we have for ψ (z) ∈ RL∞ and with ψ ∗ (z) is defined by (11.1), L∗ψ = Lψ ∗ ,
(11.15)
Tψ∗ = Tψ ∗ ,
(11.16)
Hψ∗ = Hˆ ψ ∗ .
(11.17)
and
Proof. By elementary computations.
The next proposition introduces an orthogonal projection operator in RH2+ that links to the projection operator π q , defined in (5.35). This in preparation for Proposition 11.13, which links analytic and algebraic invariance concepts. Proposition 11.11. Given a stable polynomial q(z), then 1. The map Pq : RH2+ −→ RH2+ defined by Pq f = is an orthogonal projection.
q∗ q P− ∗ f , q q
f (z) ∈ RH2+ ,
(11.18)
11.2 Hardy Spaces and Their Maps
333
2. We have Ker Pq =
q∗ RH2+ q
(11.19)
and Im Pq = X q .
(11.20)
3. We have the orthogonal direct sum decomposition RH2+ = X q ⊕
q∗ RH2+ , q
(11.21)
as well as (X q )⊥ = 4. We have dim X q = dim
q∗ 2 q RH+
⊥
q∗ RH2+ . q
(11.22)
= deg q.
Proof. 1. We compute, for f (z) ∈ RH2+ , Pq2 f =
q∗ q∗ q∗ q q∗ q q q P− ∗ P− ∗ f = P−2 ∗ f = P− ∗ f = Pq f . q q q q q q q q
So Pq is indeed a projection. For f (z), g(z) ∈ RH2+ we have ∗ q q q q P− ∗ f , g = P− ∗ f , ∗ g (Pq f , g) = q q q q ∗ q q q q = f , P− ∗ g = f , P− ∗ g = ( f , Pq g). q∗ q q q This shows that Pq∗ = Pq , i.e., Pq is an orthogonal projection. 2. Assume f (z) ∈
q∗ (z) 2 q(z) RH+ .
Then f (z) =
Pq f = Thus
q∗ 2 q RH+
Conversely, let f (z) ∈ Ker Pq . Then
q q∗ q P− q∗
f = 0. This shows that
and hence f (z) ∈ or Ker Pq ⊂ follows. Assume f (z) = p(z)/q(z) ∈ X q . Then
RH2+
2 q RH+
Pq f = So X q ⊂ Im Pq .
and
q∗ q q∗ q∗ P− ∗ g = P− g = 0. q q q q
⊂ Ker Pq . q∗
q∗ (z) q(z) g(z)
q∗
2 q RH+ .
q(z) q∗ (z)
f (z) ∈
Thus equality (11.19)
q p q∗ p q∗ p P− ∗ = P− ∗ = = f . q q q q q q
334
11 Rational Hardy Spaces
Conversely, suppose f (z) ∈ Im Pq , i.e., f (z) =
q∗ (z) q(z) h(z)
with h = P− qq∗ f1 ∈
RH2+ . This representation shows that f (z) ∈ X q . Thus Im Pq ⊂ X q and equality follows. 3. Follows from I = Pq + (I − Pq ). 4. Follows from Proposition 5.3.
11.2.2 Invariant Subspaces Before the introduction of Hankel operators, we digress a bit on invariant subspaces of RH2+ . Since we are using the half-planes for our definition of the RH2+ spaces, we do not have the shift operators conveniently at our disposal. This forces us to a slight departure from the usual convention. Definition 11.12. 1. A subspace M ⊂ RH2+ is called an invariant subspace if for each ψ (z) ∈ RH∞ + , we have Tψ M ⊂ M . 2. A subspace M ⊂ RH2+ is called a backward invariant subspace if for each ψ (z) ∈ RH∞ + , we have Tψ∗ M ⊂ M .
Since RH2+ is a subspace of R− (z), the space of strictly proper rational functions, we have in it two notions of backward invariance. One is analytic, given by Definition 11.12. The other is algebraic, namely invariance with respect to the shift S− , defined in equation (5.32), restricted to R− (z). The following proposition shows that in this case, analysis and algebra meet and the two notions of invariance coincide. Proposition 11.13. Let M ⊂ RH2+ be a subspace. Then M is a backward invariant subspace if and only if M is S− -invariant. Proof. Assume first that M to be S− -invariant. Let f (z) = p(z)/q(z) ∈ M with p(z), q(z) coprime and q(z) stable. We show first that X q ⊂ M . By the coprimeness of p(z) and q(z), there exist polynomials a(z), b(z) ∈ C[z] that solve the Bezout equation a(z)p(z) + b(z)q(z) = 1. Then a(S− ) f = π− a ·
p 1 1 ap + bq = π− = π− · = . q q q q
We conclude that 1/q(z) ∈ M and hence zi /q(z) ∈ M for i = 1, . . . , deg q − 1. Thus we have X q ⊂ M . For f (z) as before, and ψ (z) = r(z)/s(z) ∈ RH∞ + , we compute now, taking partial fractions and using the fact that q(z) and s∗ (z) are coprime, v(z) t(z) r∗ (z) p(z) = + s∗ (z) q(z) s∗ (z) q(z)
11.2 Hardy Spaces and Their Maps
335
and hence P+ ψ ∗ f = P+
r∗ p t = ∈ Xq ⊂ M . ∗ s q q
This shows that M is backward invariant. Conversely, let M ⊂ RH2+ be backward invariant, and let f (z) = p(z)/q(z) ∈ M . Clearly, there exists a scalar α such that zp(z)/q(z) = α + t(z)/q(z), and hence S− f = t/q with degt < deg q. To show algebraic invariance, we have to show the ∗ existence of a proper rational stable function ψ (z) = r(z)/s(z) for which P+ rs∗ qp = qt . This, of course, is equivalent to the partial fraction decomposition r ∗ (z) p(z) t(z) v(z) = + ∗ . ∗ s (z) q(z) q(z) s (z) Let us choose now s(z) to be an arbitrary stable polynomial satisfying deg s = deg q − 1. Let e∗ = πq zs∗ , i.e., there exists a constant γ such that e∗ (z) = zs∗ (z) − γ q(z) with deg e < deg q. We compute now e∗ (z) p(z) = s∗ (z) q(z)
zs∗ (z)−γ q(z) p(z) s∗ (z) q(z)
= α+ =
=
zp(z) q(z)
−
γ p(z) s∗ (z)
t(z) γ p(z) t(z) α s∗ (z) − γ p(z) − ∗ = + q(z) s (z) q(z) s∗ (z)
v(z) t(z) + ∗ q(z) s (z)
and hence P+
e∗ p t(z) = , s∗ q q(z)
and the proof is complete.
Clearly, orthogonal complements of invariant subspaces are backward-invariant subspaces. However, in inner product spaces, we may have proper subspaces whose orthogonal complement is trivial. The next result is of this type. Proposition 11.14. Let M ⊂ RH2+ be an infinite-dimensional backward-invariant subspace. Then M ⊥ = {0}. Proof. Let f (z) ∈ M ⊥ . Since f (z) is rational, it has an irreducible representation f (z) = e(z)/d(z), with deg e < deg d = δ . Since, by assumption, M is infinitedimensional, we can find δ linearly independent functions, g1 (z), . . . , gδ (z) in M . Let M1 ⊂ M be the smallest backward-invariant subspace of RH2+ containing all the gi (z). We claim that M1 is finite-dimensional. For this, it suffices to show that if ∗ s g(z) = r(z)/s(z) and φ (z) ∈ RH∞ + , then Tφ = t(z)/s(z) ∈ X . Assume therefore that φ (z) = a(z)/b(z). Then by partial fraction decomposition,
φ ∗ (z)g(z) =
a∗ (z) r(z) α (z) β (z) · = + b∗ (z) s(z) s(z) b∗ (z)
336
11 Rational Hardy Spaces α (z)
and hence Tφ∗ g = P+ φ ∗ g = s(z) ∈ X s . Now if gi (z) = ri (z)/si (z), then clearly M1 ⊂ X s1 ···sδ , and so M1 is finite-dimensional. By Proposition 5.24, there exists a polynomial q(z) for which M1 = X q . Since g1 (z), . . . , gδ (z) ∈ M1 , it follows ∗ that δ ≤ dim M1 = deg q. Now, by Proposition 11.11, M1⊥ = qq RH2+ . So, for ∗
(z) f1 (z) with f1 (z) ∈ RH2+ . f (z) = e(z)/d(z), we have the representation f (z) = qq(z) ∗ Since q (z) is antistable, there can be no pole-zero cancellation between the zeros of q∗ (z) and those of the denominator of f1 (z). This clearly implies that q∗ (z) divides e(z). But dege < δ ≤ deg q. Necessarily e(z) = 0 and, of course, also f (z) = 0.
Definition 11.15. 1. A function m(z) ∈ RL∞ is called an all-pass function if it satisfies m∗ m = 1 on the imaginary axis, i.e., |m(it)| = 1. 2. A function m(z) ∈ RH∞ + is called inner if it is also an all-pass function. Rational inner functions have a simple characterization. Proposition 11.16. m(z) ∈ RH∞ + is an inner function if and only if it has a representation m(z) = α
p∗ (z) p(z)
(11.23)
for some stable polynomial p(z) and α ∈ C, with |α | = 1. Proof. Assume p(z) is a stable polynomial. Then m(z), defined by (11.23), is clearly in RH∞ + . Moreover, on the imaginary axis, we have ∗ ∗ p (z) p (z) ∗ m (z)m(z) = = 1. p(z) p(z) Conversely, assume m(z) ∈ RH∞ + is inner. Let m(z) = r(z)/p(z) with p(z) a stable polynomial. We may assume without loss of generality that r(z) and p(z) r∗ (z) ∗ ∗ are coprime. Since m(z) is inner, we get r(z) p(z) p∗ (z) = 1, or r(z)r (z) = p(z)p (z). ∗ Using our coprimeness assumption, we have r(z) | p (z) | r(z). This implies that r(z) = α p∗ (z), with |α | = 1. Note that m(z) ∈ RH∞ + is inner if and only if m∗ (z) = m(z)−1 .
(11.24)
We proceed to prove the basic tool for all that follows, namely the representation of invariant subspaces. We give an algebraic version of his theorem that is better suited to our needs. Theorem 11.17 (Beurling). M ⊂ RH2+ is a nonzero invariant subspace if and only if M = mRH2+ , where m(z) is a rational inner function.
11.2 Hardy Spaces and Their Maps
337
Proof. If m(z) is inner, then clearly M = mRH2+ is an invariant subspace. To prove the converse, assume M ⊂ RH2+ is a nonzero invariant subspace. By Proposition 11.14, it follows that M ⊥ is necessarily finite-dimensional. Since M ⊥ is backward invariant, by Proposition 11.13, it is also S− -invariant. We invoke now Proposition 5.24 to conclude that M ⊥ = X q for some polynomial q(z). Using ∗ Proposition 11.11, it follows that M = qq RH2+ . Two functions f 1 (z), f2 (z) ∈ RH∞ + are called coprime if their greatest common inner factor is the 1. Clearly, this is equivalent to the existence of δ > 0 for which | f1 (z)| + | f2 (z)| ≥ δ . Proposition 11.18. Let f1 (z), f2 (z) ∈ RH∞ + . Then f 1 (z), f 2 (z) are coprime if and only if there exist functions a1 (z), a2 (z) ∈ RH∞ + such that the Bezout identity holds, a1 (z) f1 (z) + a2 (z) f2 (z) = 1.
(11.25)
Proof. Clearly, if (11.25) holds then f1 (z), f2 (z) cannot have a nontrivial common inner factor. Conversely, assume that f1 (z), f2 (z) are coprime. Then f1 RH2+ , f2 RH2+ are both invariant subspaces of RH2+ and so is their sum. Therefore there exists a rational inner function m(z) such that f1 RH2+ + f2 RH2+ = mRH2+ .
(11.26)
This implies that m(z) is a common inner factor of the fi (z). Hence necessarily we have m(z) = 1. Let us choose α > 0. Then z+1α ∈ RH2+ . Equation (11.26), with m = 1, shows that there exist strictly proper functions b1 (z), b2 (z) ∈ RH2+ such that b1 (z) f1 (z) + b2 (z) f2 (z) = z+1α . Defining ai (z) = (z + α )bi (z), we have ai (z) ∈ RH∞ +, and they satisfy (11.25).
11.2.3 Model Operators and Intertwining Maps 2 The RH∞ + module structure on RH+ , defined by (11.10), induces a similar module structure on backward-invariant subspaces. Given an inner function m(z) ∈ RH∞ + , we consider the backward-invariant subspace
H(m) = {mRH2+ }⊥ = RH2+ mRH2+.
(11.27)
H(m∗ ) = {m∗ RH2− }⊥ = RH2− m∗RH2− .
(11.28)
Similarly, we define
The following proposition not only gives a useful characterization for elements of H(m), but actually shows that in the rational case, the set of invariant subspaces
338
11 Rational Hardy Spaces
of the form H(m) actually coincides with the set of rational models X d , where we d ∗ (z) have m(z) = d(z) . This makes this set of spaces a meeting point of algebra and analysis. Proposition 11.19. Let m(z) ∈ RH∞ + be inner. Then 1. We have the orthogonal direct sum RH2+ = H(m) ⊕ mRH2+.
(11.29)
2. f (z) ∈ H(m) if and only if m∗ (z) f (z) ∈ RH2− . 3. The orthogonal projection PH(m) of RH2+ onto H(m) has the representation PH(m) f = mP− m∗ f ,
f (z) ∈ RH2+ .
(11.30)
4. Let m(z) have the representation m(z) = α
d ∗ (z) , d(z)
(11.31)
for some stable polynomial d(z). Then as sets, we have H(m) = X d .
(11.32)
Proof. 1. Follows from (11.27). 2. Since m(z) is inner, multiplication by m∗ (z) is unitary in RL2+ and therefore preserves orthogonality. Since m∗ mRH2+ = RH2+ , the direct sum (11.29) implies that m∗ H(m) is orthogonal to RH2+ . 3. Given f (z) ∈ RH2+ , let f (z) = g(z) + m(z)h(z) be its decomposition corresponding to the direct sum representation (11.29). This implies m∗ (z) f (z) = m∗ (z)g(z) + h(z), with m∗ (z)g(z) ∈ RH2− . Applying the orthogonal projection P− to the previous equality, we obtain P− m∗ f = m∗ g and hence (11.30) follows. n(z) 4. Assume f (z) = d(z) , with d(z) stable. Then, with m(z) defined by (11.31), we compute m∗ f =
d(z) n(z) n(z) = ∈ RH2− . d ∗ (z) d(z) d ∗ (z)
This shows that X d ⊂ H(m). Conversely, assume f (z) ∈ H(m). Since f (z) is rational, it has a polynomial p(z) , with q(z) stable. Now, by Part 2, we have coprime factorization f (z) = q(z) d(z) p(z) 2 d ∗ (z) q(z) ∈ RH− . Taking a partial fraction decomposition n(z) r(z) d(z) p(z) = ∗ + , d ∗ (z) q(z) d (z) q(z)
11.2 Hardy Spaces and Their Maps
339
we conclude that, necessarily, r(z) = 0. From this it follows that f (z) = f (z) ∈ X d . Thus H(m) ⊂ X d , and (11.32) is proved.
n(z) d(z) ,
i.e.,
2 ∞ The RH∞ + -module structure on RH+ induces an RH+ -module structure on invariant subspaces. This is done by defining the module structure to be given by the following. ∞ Definition 11.20. 1. Given an inner function m(z) ∈ RH∞ + , we define the RH+ module structure on H(m) by
ψ · f = PH(m) (ψ f ),
f (z) ∈ H(m),
(11.33)
for all ψ (z) ∈ RH∞ +. 2. For each ψ (z) ∈ RH∞ + , the map Tψ : H(m) −→ H(m) is defined by Tψ f = ψ · f = PH(m) ψ f ,
f (z) ∈ H(m).
(11.34)
We will refer to operators of the form Tψ as model operators. Thus, the RH∞ + -module structure on the invariant subspaces H(m) is identical to the module structure induced by the algebra of analytic Toeplitz operators. The next proposition shows that we have indeed a module structure on H(m) and use it to explore the lattice of its submodules. Proposition 11.21. Given an inner function m(z) ∈ RH∞ + , then 1. For all φ (z), ψ (z) ∈ RH∞ + , we have Tφ Tψ = Tψ Tφ .
(11.35)
2. A subspace M ⊂ H(m) is a submodule with respect to the RH∞ + -module structure on H(m) if and only if there exists a factorization m(z) = m1 (z)m2 (z)
(11.36)
into inner factors for which we have the representation M = m1 H(m2 ).
(11.37)
Proof. 1. It suffices to show that for ψ (z) ∈ RH∞ + , the inclusion Tψ Ker PH(m) ⊂ Ker PH(m) holds. By Proposition 11.19, the general element of Ker PH(m) is of the form m(z) f (z) with f (z) ∈ RH2+ . So, the required inclusion follows immediately from ψ (z)(m(z) f (z)) = m(z)(ψ (z) f (z)).
340
11 Rational Hardy Spaces
2. Assume a factorization (11.36) exists and M is given by (11.37). First, we show that m1 H(m2 ) ⊂ H(m). Indeed, if f (z) ∈ m1 H(m2 ), then f (z) = m1 (z)g(z), with g(z) ∈ H(m2 ). Using the characterization given in Proposition 11.19, we compute m∗ f = m∗2 m∗1 (m1 g) = m∗2 g ∈ RH2− , i.e., f (z) ∈ H(m). Next, f (z) ∈ M implies f (z) = m1 (z)g(z), with g(z) ∈ H(m2 ). For ψ (z) ∈ RH∞ + , we compute
ψ · f = Tψ f = PH(m) (ψ f ) = mP− m∗ ψ m1 g = m1 m2 P− m∗2 m∗1 m1 ψ g = m1 PH(m2 ) (ψ g) ∈ m1 H(m2 ), i.e., m1 H(m2 ) is a submodule of H(m). Conversely, assume M is a submodule of H(m). This implies that M + mRH2+ is a submodule of RH2+ with respect to the RH∞ + -module structure. By Theorem 11.17, we have M + mRH2+ = m1 RH2+ ,
(11.38)
for some inner function m1 (z). Since (11.38) implies the inclusion mRH2+ ⊂ m1 RH2+ , it follows that a factorization (11.37) exists. Now, for f (z) ∈ M , (11.38) implies a representation f (z) = m1 (z)g(z). Since f (z) ∈ H(m), it follows that m∗ f = m∗2 m∗1 m1 g = m∗2 g ∈ RH2− , i.e., g(z) ∈ H(m2 ). This establishes the representation (11.37).
There is a natural division relation for inner functions. If m(z) = m1 (z)m2 (z) with m(z), m1 (z), m2 (z) inner functions in RH∞ + , then we say that m1 (z) divides m(z). The greatest common inner divisor of m1 (z), m2 (z) is a common inner divisor that is divided by every other common inner divisor. The least common inner multiple is similarly defined. 2 RH∞ + -submodules of RH+ are closed under sums and intersections. These lattice operations can be easily interpreted in terms of arithmetic operations on inner functions. Proposition 11.22. Let m(z), mi (z) ∈ RH∞ + , i = 1, . . . , s, be rational inner functions. Then 1. We have mRH2+ ⊂ m1 RH2+ if and only if m1 (z) divides m(z). 2. We have s
∑ mi RH2+ = mRH2+,
i=1
where m(z) is the greatest common inner divisor of all mi (z), i = 1, . . . , s.
11.2 Hardy Spaces and Their Maps
341
3. We have ∩si=1 mi RH2+ = mRH2+ , where m(z) is the least common inner multiple of all mi (z). Proof. The proof follows along the same lines as that of Proposition 1.46. We omit the details. The previous discussion has its counterpart in any backward-invariant subspace H(m). Proposition 11.23. Let m(z) ∈ RH∞ + be an inner function. We consider H(m) with the RH∞ + -module structure defined in (11.33). Then 1. Let m(z) = mi (z)ni (z) be factorizations into inner factors. Then m1 H(n1 ) ⊂ m2 H(n2 )
(11.39)
if and only if m2 (z) is an inner factor of m1 (z), or equivalently, n1 (z) is an inner factor of n2 (z). 2. Given factorizations m(z) = mi (z)ni (z), i = 1, . . . , s, into inner factors, then we have m0 H(n0 ) = ∩si=1 mi H(ni ),
(11.40)
where m0 (z) is the least common inner multiple of all mi (z), and n0 (z) is the greatest common inner divisor of all ni (z). 3. Given factorizations m(z) = mi (z)ni (z), i = 1, . . . , s, into inner factors, then we have s
m0 H(n0 ) = ∑ mi H(ni )
(11.41)
i=1
where m0 is the greatest common inner divisor of all mi and n0 is the least common inner multiple of all ni . 4. We have s
H(m) = ∑ mi H(ni )
(11.42)
i=1
if and only if the mi (z) are coprime. 5. ∑si=1 mi H(ni ) is an algebraic direct sum if and only if the ni (z) are mutually coprime. 6. We have the algebraic direct sum decomposition ·
·
H(m) = m1 H(n1 )+ · · · +ms H(ns )
(11.43)
s n (z). if and only if the ni are mutually coprime and m(z) = Πi=1 i
Proof. The proof follows along the same lines as that of Proposition 5.7. We omit the details.
342
11 Rational Hardy Spaces d ∗ (z)
The equality (11.32), namely H(m) = X d , with m(z) = d(z) , shows that H(m) carries two different module structures, one over RH∞ + , defined by (11.33), and the other over C[z], defined by (5.34). In both cases, the lattices are determined by factorizations. This is the content of the following proposition. Proposition 11.24. Let m(z) have the representation m(z) =
d ∗ (z) , d(z)
(11.44)
for some stable polynomial d(z). Then the lattices of invariant subspaces, with respect to the two module structures described above, are isomorphic. Proof. We note that factorizations of m(z) are related to factorizations of d(z). d ∗ (z) If d(z) = d1 (z)d2 (z), the di (z) are stable. Defining mi (z) = di (z) , we have the i factorization m(z) = m1 (z)m2 (z). The argument is reversible. The isomorphism of lattices follows from the correspondence of X d2 and m1 H(m2 ). Note that in view of Proposition 11.24, the results of Propositions 11.23 and 11.22 could have been derived from the analogous results on polynomial and rational models. The next theorem sums up duality properties of intertwining operators of the form Tψ . Theorem 11.25. Let ψ (z), m(z) ∈ RH∞ + with m(z) an inner function, and let Tψ be defined by (11.34). Then 1. Its adjoint, Tψ∗ , is given by Tψ∗ f = P+ ψ ∗ f , for f (z) ∈ H(m).
(11.45)
2. The operator τm : RL2 −→ RL2 defined by
τm f := m f ∗
(11.46)
is unitary. 3. The operators Tψ ∗ and Tψ∗ are unitarily equivalent. Specifically, we have Tψ τm = τm Tψ∗ . Proof. 1. For f (z), g(z) ∈ H(m), we compute (Tψ f , g) = (PH(m) ψ f , g) = (mP− m∗ ψ f , g) = (P− m∗ ψ f , m∗ g) = (m∗ ψ f , P− m∗ g) = (m∗ ψ f , m∗ g) = (ψ f , g) = ( f , ψ ∗ g) = (P+ f , ψ ∗ g) = ( f , P+ ψ ∗ g) = ( f , Tψ∗ g). Here we used the fact that g(z) ∈ H(m) if and only if m∗ (z)g(z) ∈ RH2− .
(11.47)
11.2 Hardy Spaces and Their Maps
343
2. Clearly the map τm , as a map in RL2 , is unitary. From the orthogonal direct sum decomposition RL2 = RH2− ⊕ H(m) ⊕ mRH2+ it follows, by conjugation, that RL2 = m∗ RH2− ⊕ {RH2− m∗ RH2− } ⊕ RH2+. Hence m{RH2− m∗RH2− } = H(m). 3. We compute, with f (z) ∈ H(m), Tψ τm f = Tψ m f ∗ = PH(m) ψ m f ∗ = mP− m∗ ψ m f ∗ = mP− ψ f ∗ . Now
τm Tψ∗ = τm (P+ ψ ∗ f ) = m(P+ ψ ∗ f )∗ = mP− ψ f ∗ . Comparing the two expressions, (11.47) follows. We proceed to study the invertibility properties of the maps Tψ that are the counterpart of Theorem 5.18. This will be instrumental in the analysis of Hankel operators restricted to their cokernels. Theorem 11.26. Let ψ (z), m(z) ∈ RH∞ + with m(z) an inner function. The following statements are equivalent: 1. The operator Tψ : H(m) −→ H(m), defined in (11.34), is invertible. 2. There exists δ > 0 such that |ψ (z)| + |m(z)| ≥ δ , for all z with Re z > 0.
(11.48)
3. There exist ξ (z), η (z) ∈ RH∞ + that solve the Bezout equation
ξ (z)ψ (z) + η (z)m(z) = 1.
(11.49)
In this case we have Tψ−1 = Tξ . Proof. 1 ⇒ 2 We prove this by contradiction. Being inner, m(z) has no zero at ∞; hence if no such δ exists, then necessarily there exists a point α in the open right half-plane C+ such that ψ (α ) = m(α ) = 0. This implies the existence of factorizations m(z) = πα (z)mα (z),
ψ (z) = ψα (z)mα (z),
344
11 Rational Hardy Spaces
α where mα (z) = z− z+α . To the first factorization there corresponds the one-dimensional invariant subspace πα H(mα ) ⊂ H(m). The second factorization implies the inclusion Ker Tmα ⊂ πα H(mα ). With lα (z) = z+1α ∈ H(mα ), we have
Tmα πα lα = PH(m) mα πα lα = 0. Next, we compute Tψ (πα lα ) = PH(m) ψπα lα = PH(m) ψα mα πα lα = PH(m) ψα PH(m) mlα = 0. So πα lα ∈ Ker Tψ and Tψ cannot be invertible. 2 ⇒ 3 That coprimeness implies the solvability of the Bezout equation was proved in Proposition 11.18. 3 ⇒ 1 Assume there exist ξ (z), η (z) ∈ RH∞ + that solve the Bezout equation (11.49). We will show that Tψ−1 = Tξ . To this end, let f (z) ∈ H(m). Then Tξ Tψ f = PH(m) ξ PH(m) ψ f = PH(m) ξ ψ f = PH(m) (1 − mη ) f = f . Here we used the fact that ξ Ker PH(m) = ξ mRH2+ ⊂ mRH2+ = Ker PH(m) .
2 In Theorem 11.8, we introduced natural RH∞ + -module structures in both RH+ 2 ∞ 2 ∞ and RH− . Given an inner function m(z) ∈ RH+ , mRH+ is an RH+ submodule, hence it follows that there is a naturally induced RH∞ + -module structure on the quotient RH2+ /mRH2+. Using the orthogonal decomposition (11.29), we have the RH∞ + -module isomorphism
RH2+ /mRH2+ H(m).
(11.50)
∞ Thus H(m) is a RH∞ + module, with the module structure induced by the RH+ 2 module structure of RH+ , i.e.,
ψ · f = PH(m) (ψ f ),
f (z) ∈ H(m).
(11.51)
11.2.4 Intertwining Maps and Interpolation Definition 11.27. Let m(z) be an inner function. We say that a map X : H(m) −→ H(m) is an intertwining map if for all ψ (z) ∈ RH∞ + , we have XTψ = Tψ X .
(11.52)
Because of the commutativity property (11.35), all operators Tψ , defined in (11.34), are intertwining maps. Putting it differently, (11.35) expresses the fact that the maps Tψ are RH∞ + homomorphisms in H(m) with the module structure defined
11.2 Hardy Spaces and Their Maps
345
by (11.51). Next, we will show that being of the form Tψ is not only sufficient for being an intertwining map, but also necessary. We will do so by showing that there is a close connection between intertwining maps and a rational interpolation problem. We begin this by exploring this connection for a special case, namely that of inner functions with distinct zeros, which leads to a representation theorem for intertwining maps. This will be extended, in Theorem 11.33, to the general case. Next, we introduce the RH∞ + -interpolation problem. Definition 11.28. 1. Given ξi ∈ C, and λi ∈ C+ , i = 1, . . . , s, find ψ (z) ∈ RH∞ + for which IP :
ψ (λi ) = ξi ,
i = 1, . . . , s,
we say that ψ (z) solves the interpolation problem IP. 2. Given ξ j ∈ C, and λ j ∈ Π+ , j = 1, . . . , s, find ψi (z) ∈ RH∞ + for which 0, j = i, IP(i) : ψi (λ j ) = ξi , j = i,
(11.53)
(11.54)
we say that ψi (z) solves the interpolation problem IP(i). Clearly, the second part of the definition is geared to a Lagrange-like solution of the interpolation problem (11.53). The next proposition addresses these interpolation problems. Proposition 11.29. Given ξi ∈ C and λi ∈ C+ , i = 1, . . . , s, define the functions m(z), πλi (z), mλi (z), lλi (z) by s m(z) = Πi=1
πλi (z) = Π j=i mλi (z) = lλi (z) =
z − λi z+ λi z− λj z+λ j
z − λi z+ λi 1 z+ λi
, ,
, .
(11.55)
Then 1. There exist solutions ψi (z) of the interpolation problems IP(i). A solution is given by
ψi (z) = ξi πλi (λi )−1 πλi (z).
(11.56)
2. A solution of the interpolation problems IP is given by s
ψ (z) = ∑ ξi πλi (λi )−1 πλi (z). i=1
(11.57)
346
11 Rational Hardy Spaces
Proof. 1. For any ψi (z) ∈ RH∞ + , (11.61) holds. Therefore, ψi (λ j ) = 0 for all j = i if and only if ψ (z) is a multiple of πi (z). We assume therefore that, for some constant αi , we have ψi (z) = αi πi (z). The second interpolation condition in IP(i) reduces to ξi = ψi (λi ) = αi πλi (λi ), i.e., αi = ξi πλi (λi )−1 . Thus (11.56) follows. 2. This follows easily from the first part. Note that ψi (z), defined in (11.56), is proper and has McMillan degree s − 1. On the other hand, since the least common inner multiple of all πi (z) is m(z), it follows that ψ (z), given in (11.57), has McMillan degree ≤ s. Proposition 11.30. Given an inner function m(z) ∈ RH∞ + having all its zeros, λi , i = 1, . . . , s, distinct, we define functions m(z), πλi (z), mλi (z), lλi (z) by (11.55). Then 1. We have the factorizations m(z) = πλi (z)mλi (z). 2. The space H(mλi ) is spanned by the function lλi (z) =
(11.58) 1 . z+λ i
We have
RH2+ = H(mλi ) ⊕ mλi RH2+ .
(11.59)
3. We have the algebraic direct sum decomposition in terms of invariant subspaces ·
·
H(m) = πλ1 H(mλ1 )+ · · · +πλs H(mλs ),
(11.60)
and {πλi (z)lλi (z)}si=1 forms a basis for H(m). 4. For any ψ (z) ∈ RH∞ + , we have Tψ (πλi (z)lλi (z)) = ψ (λi )πλi (z)lλi (z).
(11.61)
5. A map X : H(m) −→ H(m) is an intertwining map if and only if there exist ξi ∈ C such that X(πλi lλi ) = ξi πλi (z)lλi (z).
(11.62)
6. A map X : H(m) −→ H(m) is an intertwining map if and only if there exists ψ (z) ∈ RH∞ + such that X = Tψ .
(11.63)
7. If ψ (z) ∈ RH∞ + , we have, for Tψ : H(m) −→ H(m) defined by (11.34), Tψ ≤ ψ ∞ . Proof. 1. This is obvious. 2. We compute, applying Cauchy’s theorem, for f (z) ∈ RH2+ , 1 1 ∞ f (it) dt = dt 2π −∞ −it + λi −∞ it + λi f (ζ ) 1 d ζ = f (λi ). = 2π i γ ζ − λ i
1 ( f , li ) = 2π
∞
f (it)
1
11.2 Hardy Spaces and Their Maps
347
Here the contour γ is a positively oriented semicircle of sufficiently large radius. 3. Follows from Proposition 11.23. 4. We compute, using the representation (11.30) of PH(m) , Tψ (πλi lλi ) = mP− m∗ ψπλi lλi = πλi mλi P− m∗λi πλ∗i ψπλi lλi ψ (z) − ψ (λi ) ψ (λi ) = πi PH(mi ) ψ lλi = πλi PH(mλ ) + i z+ λi z+ λi = ψ (λi )(πλi (z)lλi (z)). Here we used the fact that
ψ (z)−ψ (λi ) z+λ i
∈ mλi RH2+ .
5. Assume first that there exist ξi ∈ C such that (11.62) holds. Then for every ψ (z) ∈ RH∞ + , we compute XTψ (πλi (z)lλi (z)) = X (ψ (λi )πλi (z)lλi (z)) = ψ (λi )X(πλi (z)lλi (z)) = ψ (λi )ξi (πλi (z)lλi (z)) = Tψ ξi (πλi (z)lλi (z)) = Tψ X (πλi (z)lλi (z)). Since {πλi (z)lλi (z)} form a basis of H(m), it follows that XTψ = Tψ X, i.e., X is an intertwining map. Conversely, assume X is an intertwining map. We note that we have Ker Tmλ = i πλi H(mλi ) for Tmλ (πλi (z)lλi (z)) = PH(m) (πλi (z)lλi (z)) = mP− m∗ (πλi (z)lλi (z)) i
= πλi mλi P− m∗λi πλ∗i πλi (z)lλi (z) = πλi PH(mλ ) mλi (z)lλi (z) = 0. i
Since XTmλ = Tmλ X , it follows that i
i
Tmλ X (πλi (z)lλi (z)) = XTmλ (πλi (z)lλi (z)) = 0. i
i
So X (πλi (z)lλi (z)) ∈ πλi H(mλi ), i.e., there exist ξi ∈ C such that (11.62) holds. 6. If X has the representation X = Tψ , then by (11.35) it is intertwining. To prove the converse, assume to begin with that all the zeros λi of m(z) are distinct. Thus, we have the factorization m(z) = Π sj=1 mλi (z), where mλi (z) = z−λi . z+λ i
By Proposition 11.21, we have lλi (z) =
1 z+λ i
∈ H(m), and since they are
linearly independent, these functions form a basis of
H(m). Using the duality results of Theorem 11.25, also Π j=i mλi (z) lλi (z) is a basis for H(m). This implies the algebraic direct sum representation
· · H(m) = Π j=1 mλ j H(mλ1 )+ · · · + Π j=s mλ j H(mλs ).
(11.64)
348
11 Rational Hardy Spaces
Let now X : H(m) −→ H(m) be an intertwining map. Since Ker Tmi = πλi H(mλi ), we have for f (z) = πλi (z)lλi (z), Tmi X (πλi lλi ) = XTmi (πλi lλi ) = 0. This means that X Ker Tmi ⊂ Ker Tmi . In turn, this implies the existence of constants ξi for which X (πλi lλi ) = ξi πλi lλi . Thus X = Tψ for any ψ (z) ∈ RH∞ + that solves the interpolation problem (11.53). In particular, ψ (z) given by (11.57) is such a solution. 7. Follows from the facts that PH(m) is an orthogonal projection and that for ψ (z) ∈ 2 RH∞ + and f (z) ∈ RH+ , we have Tψ f 2 = PH(m) ψ f 2 ≤ ψ f 2 ≤ ψ ∞ f 2 . We note that, due to (11.61), the direct sum representation (11.60) is a spectral decomposition for all maps Tψ . We proceed to extend Proposition 11.30 to the case that the inner function m(z) may have multiple zeros. However, before doing that, we introduce and study higher-order interpolation problems. νi −1 Definition 11.31. 1. Given λi ∈ C+ , νi ∈ N, and {ξi,t ∈ C}t=0 , i = 1, . . . , k. If ∞ ψ (z) ∈ RH+ has the local expansions
ψ (z) =
νi −1
∑ ψi,t mλi (z)t + mλi (z)νi ρνi (z),
i = 1, . . . , k,
(11.65)
i = 1, . . . , k, t = 0, . . . , νi − 1,
(11.66)
t=0
and satisfies the interpolation conditions
ψi,t = ξi,t ,
HOIP :
we say that ψ (z) solves the high-order interpolation problem HOIP. νi −1 , i = 1, . . . , k, find ψi (z) ∈ RH∞ 2. Given distinct λi ∈ C+ , νi ∈ N, and {ξi,t ∈ C}t=0 +, i = 1, . . . , k, having the local expansions
ψi (z) =
mλ j (z)ν j ρi,ν j (z), j = i, νi −1 ψi,t mλi (z)t + mλi (z)νi ρi,νi (z), j = i, ∑t=0
(11.67)
and satisfies the interpolation conditions HOIP(i) :
ψi,t =
ξi,t , i = j, t = 0, . . . , νi − 1, 0, i = j,
(11.68)
we say that ψi (z) solves the high-order interpolation problem HOIP(i).
11.2 Hardy Spaces and Their Maps
349
i −1 Proposition 11.32. Given distinct λi ∈ C+ , νi ∈ N, and {ξi j ∈ C}νj=0 , i = 1, . . . , k, define the functions m(z), πλi (z), mλi (z), lλi (z) by
k m(z) = Πi=1 mλi (z)νi ,
mλi (z) =
z − λi z + λi
,
πλi (z) = Π j=i mλ j (z)ν j , lλi (z) =
1 z + λi
.
(11.69)
Then 1. For i = 1, . . . , k, we have the factorizations
πλi (z)mλi (z)νi = mλi (z)νi πλi (z),
(11.70)
with mλi (z)νi , πλi (z) coprime. 2. There exist aλi (z), bλi (z) ∈ RH∞ + solving the Bezout equation aλi (z)πλi (z) + bλi (z)mλi (z)νi = 1.
(11.71)
3. Define the intertwining map Xi : H(mνλii ) −→ H(mνλii ) by fi (z) = Xi gi = PH(mνi ) πλi gi , λi
gi (z) ∈ H(mνλi ). i
(11.72)
Then Xi is invertible with its inverse, Xi−1 : H(mνλii ) −→ H(mνλii ), given by gi (z) = Xi−1 fi = PH(mνi ) aλi fi , λi
fi (z) ∈ H(mνλi ), i
(11.73)
where aλi (z) is determined by the solution to the Bezout equation (11.71). 4. A solution to HOIP(i) is given by
ψi (z) = πλi (z)gi (z),
(11.74)
with vλi lλi = Xi−1 (wλi lλi ) determined by (11.73). 5. A solution of the interpolation problems HOIP is given by k
ψ (z) = ∑ ψi (z), i=1
with ψi (z) given by (11.74).
(11.75)
350
11 Rational Hardy Spaces
Proof. 1. This is immediate. 2. Follows, using Proposition 11.18, from the coprimeness of πλi (z) and mλi (z)νi . 3. Follows from Theorem 11.26. 4. For ψi (z) to satisfy the homogeneous interpolation conditions of (11.68), it is ν necessary and sufficient that we have the factorizations ψi (z) = mλ j (z)uλ j (z) j for all j = i. Since the λ j are distinct, this implies the factorization ψi (z) = πλi (z)vλi (z) for some vλi (z) ∈ RH∞ + . So, all that remains is to choose vλi (z) so as to satisfy the last interpolation constraint of (11.68). Note that mλi (z)−νi +1 (πλi (z)vλi (z) − wλi (z)) ∈ RH∞ − if and only if PH(mνi ) (πλi (z)vλi (z) − wλi (z))lλi (z) ∈ RH2− . λi
However, the last condition can be interpreted as Xi (vλi lλi ) = wλi lλi ; hence vλi lλi = Xi−1 (wλi lλi ) and from this the result follows. 5. Follows from the previous part. The following theorem, characterizing intertwining maps, is the algebraic analogue of the commutant lifting theorem. Theorem 11.33. Given an inner function m(z) ∈ RH∞ + having the primary inner decomposition k m(z) = Πi=1 mλi (z)νi ,
where mλ j (z) =
z−λ j z+λ j
(11.76)
and the inner functions mλ j (z) are mutually coprime, we define
πλi (z) = Π sj=i mλ j (z)ν j , lλi (z) =
1 z+ λi
(11.77)
.
Then 1. For i = 1, . . . , k, we have the factorizations m(z) = πλi (z)mλi (z)νi .
(11.78)
2. We have the algebraic direct sum decomposition of invariant subspaces ·
·
ν
H(m) = πλ1 H(mνλ1 )+ · · · +πλk H(mλk ).
(11.79)
Ker Tm j = πλi mνλi − j H(mλj ),
(11.80)
1
k
3. We have λi
i
i
11.2 Hardy Spaces and Their Maps
351
and a basis Bλi , j for this invariant subspace is given by the following set of vectors
πλi (z)mλi (z)νi − j lλi (z), πλi (z)mλi (z)νi − j+1 lλi (z), . . . , πλi (z)mλi (z)νi −1 lλi (z) . (11.81)
Moreover, we have ν −j
dim πλi mλii H(mλi ) = j. j
(11.82)
4. For j = 1, . . . , νi , a basis for Ker Tm j is given by λi
Bλi , j = πλi (z)mλi (z)νi − j lλi (z), . . . , πλi (z)mλi (z)νi −1 lλi (z) .
(11.83)
5. Let mλ (z) = z−λ be an inner function and let ψ (z) ∈ RH∞ + . Then for each ν ≥ 0, z+λ there exist uniquely determined ψλ ,t ∈ C, t = 0, . . . , ν − 1, and ρν (z) ∈ RH∞ + such that we have the representation
ψ (z) =
ν −1
∑ ψλ ,t mλ (z)t + mλ (z)ν ρν (z).
(11.84)
t=0
6. Let ψ (z) ∈ RH∞ + have the local representations (11.84). Then Tψ πλi (z)mλi (z)νi − j lλi (z) =
j−1
∑ ψt,λi πλi (z)mλi (z)νi − j+t lλi (z).
(11.85)
t=0
Therefore, the matrix representation of Tψ | Ker Tm j , with respect to the basis λi
Bλi , j given in (11.83), is ⎛
⎞ ψλi ,0 0 . . 0 ⎜ . . . ⎟ ⎜ ⎟ ⎜ ⎟ . . ⎟. ⎜ . ⎜ ⎟ ⎝ . . 0 ⎠ ψλi , j−1 . . . ψλi ,0
(11.86)
7. A map X : H(m) −→ H(m) is an intertwining map if and only if there exists ψ (z) ∈ RH∞ + for which X = Tψ .
(11.87)
Proof. 1. Follows from (11.76) and the definition of $\pi_{\lambda_i}(z)$ given in (11.77).
2. Follows from the mutual coprimeness of the $m_{\lambda_i}(z)$.
3. Equality (11.80) follows from the factorization $m(z) = (\pi_{\lambda_i}(z) m_{\lambda_i}(z)^{\nu_i - j})\, m_{\lambda_i}(z)^{j}$. Using the characterization of Proposition 11.19, all elements $\pi_{\lambda_i}(z) m_{\lambda_i}(z)^{\nu_i - t} l_{\lambda_i}(z)$, $i = 1, \dots, k$, $t = 1, \dots, j$, are in $\pi_{\lambda_i} m_{\lambda_i}(z)^{\nu_i - j} H(m_{\lambda_i}^{j})$. Their linear independence is easily verified, and since $\dim H(m_{\lambda_i}^{j}) = j$, it follows that $B_{\lambda_i, j}$ is indeed a basis for $\pi_{\lambda_i}(z) m_{\lambda_i}(z)^{\nu_i - j} H(m_{\lambda_i}^{j})$.
4. The proof is similar to the proof of the previous part.
5. Let $\psi(z) \in RH^\infty_+$ and let $l_\lambda(z)$ be defined by (11.77). We note that $B_\lambda = \{ m_\lambda(z)^t l_\lambda(z) \}_{t=0}^{\nu-1}$ is a basis for $H(m_\lambda^\nu)$. Obviously, $\psi(z) l_\lambda(z) \in RH^2_+$, and therefore it has a decomposition with respect to the direct sum $RH^2_+ = H(m_\lambda^\nu) \oplus m_\lambda^\nu RH^2_+$. So there exist $\psi_{\lambda,t} \in \mathbb{C}$ and $\theta_\nu(z) \in RH^2_+$ for which we have $\psi(z) l_\lambda(z) = \sum_{t=0}^{\nu-1} \psi_{\lambda,t}\, m_\lambda(z)^t l_\lambda(z) + m_\lambda(z)^\nu \theta_\nu(z)$. Since $m_\lambda(z)^\nu \theta_\nu(z) \in RH^2_+$, it follows that $\rho_\nu(z) = \theta_\nu(z) l_\lambda(z)^{-1} \in RH^\infty_+$, and (11.84) follows.
6. Follows from the local expansion (11.84) and
$$ T_{m_{\lambda_i}^{t}}\, \pi_{\lambda_i}(z) m_{\lambda_i}(z)^{s} l_{\lambda_i}(z) = \begin{cases} 0, & t+s \geq \nu_i, \\ \pi_{\lambda_i}(z) m_{\lambda_i}(z)^{t+s} l_{\lambda_i}(z), & t+s < \nu_i. \end{cases} \qquad (11.88) $$
7. As in the proof of Proposition 11.30, if $X$ has the representation $X = T_\psi$, then by (11.35), it is intertwining. To prove the converse, assume $X : H(m) \longrightarrow H(m)$ is intertwining. We compute, for $f(z) \in \operatorname{Ker} T_{m_{\lambda_i}^{j}}$,
$$ 0 = X T_{m_{\lambda_i}^{j}} f = T_{m_{\lambda_i}^{j}} X f, $$
which shows that
$$ X\, \pi_{\lambda_i}(z) m_{\lambda_i}(z)^{\nu_i - j} H(m_{\lambda_i}^{j}) \subset \pi_{\lambda_i}(z) m_{\lambda_i}(z)^{\nu_i - j} H(m_{\lambda_i}^{j}). \qquad (11.89) $$
In turn, this implies the existence of constants $\xi_{\lambda_i,t}$, $t = 0, \dots, j-1$, for which
$$ X\, \pi_{\lambda_i}(z) m_{\lambda_i}(z)^{\nu_i - j} l_{\lambda_i}(z) = \sum_{t=0}^{j-1} \xi_{\lambda_i,t}\, \pi_{\lambda_i}(z) m_{\lambda_i}(z)^{\nu_i - j + t} l_{\lambda_i}(z). \qquad (11.90) $$
Choosing $\psi(z)$ to be the solution of the high-order interpolation problem (11.65)–(11.66), with $\psi_{\lambda_i,t} = \xi_{\lambda_i,t}$, we get the representation (11.87).

We can interpret (11.84) as a division rule in $RH^\infty_+$, and we will refer to $\sum_{t=0}^{\nu-1} \psi_{\lambda,t}\, m_\lambda(z)^t$ as the remainder. Note that $m_\lambda(z)^{-\nu} \sum_{t=0}^{\nu-1} \psi_{\lambda,t}\, m_\lambda(z)^t \in RH^\infty_-$.
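As a small illustration of this division rule (our example, not from the text), take $\psi(z) = \frac{1}{z+1}$, $\lambda = 1$ and $\nu = 1$, so that $m_\lambda(z) = \frac{z-1}{z+1}$. Writing $\psi(z) = \psi_{\lambda,0} + m_\lambda(z)\rho_1(z)$ and evaluating at $z = 1$ gives $\psi_{\lambda,0} = \psi(1) = \tfrac12$, and then
$$ \frac{1}{z+1} = \frac{1}{2} - \frac{1}{2}\,\frac{z-1}{z+1}, $$
so $\rho_1(z) = -\tfrac12 \in RH^\infty_+$ and the remainder is the constant $\tfrac12$. Note that $m_\lambda(z)^{-1}\cdot\tfrac12 = \tfrac12\,\frac{z+1}{z-1} \in RH^\infty_-$, as asserted.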
11.2.5 $RH^\infty_+$-Chinese Remainder Theorem

We have already seen many analogies between polynomial and rational models on the one hand and invariant subspaces of $RH^2_+$ on the other. Results applying coprimeness of inner functions follow the lines of results applying polynomial coprimeness. This applies to direct sum representations, as well as to the spectral analysis of intertwining maps. This analogy leads us to an analytic version of the Chinese remainder theorem, a result that can be immediately applied to rational interpolation problems.

Theorem 11.34. 1. Let $m_i(z)$ be mutually coprime inner functions, $m(z) = \prod_{i=1}^{s} m_i(z)$, and let $g_i(z) \in H(m_i)$. Then there exists a unique $f(z) \in H(m)$ for which
$$ f(z) - g_i(z) \in m_i RH^2_+. \qquad (11.91) $$
2. Let $m_i(z)$ be mutually coprime inner functions, $m(z) = \prod_{i=1}^{s} m_i(z)$. Let $w_i(z) \in RH^\infty_+$ be such that $m_i(z)^{-1} w_i(z) \in RH^\infty_-$. Then there exists a unique $v(z) \in RH^\infty_+$ such that
a. $m(z)^{-1} v(z) \in RH^\infty_-$,
b. $v(z) - w_i(z) \in m_i(z) RH^\infty_+$.

Proof. 1. We define $\pi_i(z) = \prod_{j \neq i} m_j(z)$. This implies the factorizations
$$ m(z) = \pi_i(z) m_i(z). \qquad (11.92) $$
In turn, we have the direct sum representation
$$ H(m) = \pi_1 H(m_1) \dotplus \cdots \dotplus \pi_s H(m_s), \qquad (11.93) $$
and $f(z) \in H(m)$ has a unique representation of the form $f(z) = \sum_{j=1}^{s} \pi_j(z) f_j(z)$, with $f_j(z) \in H(m_j)$. If $f(z)$ satisfies (11.91), then $P_- m_i^{-1} \bigl(\sum_{j=1}^{s} \pi_j f_j - g_i\bigr) = 0$, which implies $P_{H(m_i)} \pi_i f_i = X_i f_i = g_i$, where $X_i : H(m_i) \longrightarrow H(m_i)$ is defined by $X_i f_i = P_{H(m_i)} \pi_i f_i$. This implies $f_i = X_i^{-1} g_i$. In order to invert $X_i$, we apply Theorem 11.26 to infer that $X_i^{-1} : H(m_i) \longrightarrow H(m_i)$ is given by $X_i^{-1} g_i = P_{H(m_i)} a_i g_i$, where $a_i(z)$ arises from the solution of the $RH^\infty_+$ Bezout equation $a_i(z)\pi_i(z) + b_i(z)m_i(z) = 1$.
2. Let now $e(z) \in H(m)$ be a cyclic vector. For example, we can take, for any $\alpha$ in the open left half-plane, $e(z) = P_{H(m)} \frac{1}{z-\alpha} = \frac{1 - m(z)m^*(\alpha)}{z-\alpha} \in H(m)$. Clearly, $m(z)^{-1} v(z)e(z) \in RH^2_-$, which is equivalent to $m(z)^{-1}v(z) \in RH^\infty_-$, if and only if $v(z)e(z) \in H(m)$. In that case, we have a decomposition with respect to the direct sum representation (11.93), namely
$$ v(z)e(z) = \sum_{j=1}^{s} \pi_j(z) f_j(z), $$
with $f_j(z) \in H(m_j)$. Using $v(z) - w_i(z) \in m_i(z) RH^\infty_+$, we have
$$ m_i(z)^{-1} \Bigl( \sum_{j=1}^{s} \pi_j(z) f_j(z) - w_i(z) e_i(z) \Bigr) \in RH^2_+, $$
which implies
$$ 0 = m_i P_- m_i^{-1} \Bigl( \sum_{j=1}^{s} \pi_j(z) f_j(z) - w_i(z) e_i(z) \Bigr) = P_{H(m_i)} \Bigl( \sum_{j=1}^{s} \pi_j(z) f_j(z) - w_i(z) e_i(z) \Bigr) = P_{H(m_i)} \bigl( \pi_i(z) f_i(z) - w_i(z) e_i(z) \bigr) = (X_i f_i)(z) - w_i(z) e_i(z). $$
In order to evaluate $f_i(z)$, we have to invert the maps $X_i : H(m_i) \longrightarrow H(m_i)$, defined by $X_i f = P_{H(m_i)} \pi_i f$, and by Theorem 11.26, this is accomplished by solving the Bezout equation $a_i(z)\pi_i(z) + b_i(z)m_i(z) = 1$.
11.2.6 Analytic Hankel Operators and Intertwining Maps

Our next topic is the detailed study of Hankel operators, introduced in Definition 11.9. We shall study their kernel and image and their relation to Beurling's theorem. We shall also describe the connection between Hankel operators and intertwining maps.

Definition 11.35. Given a function $\phi(z) \in RL^\infty(i\mathbb{R})$, the Hankel operator $H_\phi : RH^2_+ \longrightarrow RH^2_-$ is defined by
$$ H_\phi f = P_-(\phi f), \qquad f \in RH^2_+. \qquad (11.94) $$
The adjoint operator $(H_\phi)^* : RH^2_- \longrightarrow RH^2_+$ is given by
$$ (H_\phi)^* f = P_+(\phi^* f), \qquad f \in RH^2_-. \qquad (11.95) $$
Thus, assume $\phi(z) = \frac{n(z)}{d(z)} \in RH^\infty_-$ and $n(z) \wedge d(z) = 1$. Our assumption implies that $d(z)$ is antistable. In spite of the slight ambiguity, we will write $n = \deg d$. It will always be clear from the context what $n$ means. This leads to
$$ \phi(z) = \frac{n(z)}{d(z)} = \Bigl( \frac{d(z)}{d^*(z)} \Bigr)^{-1} \frac{n(z)}{d^*(z)}. $$
Thus
$$ \phi(z) = m^*(z)\eta(z) = m(z)^{-1}\eta(z), \qquad (11.96) $$
with $\eta(z) = \frac{n(z)}{d^*(z)}$ and $m(z) = \frac{d(z)}{d^*(z)}$, is a coprime factorization over $RH^\infty_+$. This particular coprime factorization, where the denominator is an inner function, is called the Douglas, Shapiro, and Shields (DSS) factorization.
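As a quick illustration of the DSS factorization (this example is ours, not from the text), take $n(z) = 1$ and $d(z) = z-1$, so that $\phi(z) = \frac{1}{z-1} \in RH^\infty_-$. Then $d^*(z) = d(-z) = -(z+1)$, and
$$ m(z) = \frac{d(z)}{d^*(z)} = -\frac{z-1}{z+1}, \qquad \eta(z) = \frac{n(z)}{d^*(z)} = -\frac{1}{z+1}, $$
both of which lie in $RH^\infty_+$, with $|m(i\omega)| = 1$ for all real $\omega$. One checks directly that $m(z)^{-1}\eta(z) = \bigl(-\frac{z+1}{z-1}\bigr)\bigl(-\frac{1}{z+1}\bigr) = \frac{1}{z-1} = \phi(z)$, as claimed in (11.96).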
The next theorem discusses the functional equation of Hankel operators.

Theorem 11.36. Let $\phi(z) \in RL^\infty$.
1. For every $\psi(z) \in RH^\infty_+$ the Hankel operator $H_\phi : RH^2_+ \longrightarrow RH^2_-$ satisfies the Hankel functional equation
$$ P_- \psi H_\phi f = H_\phi \psi f, \qquad f \in RH^2_+. \qquad (11.97) $$
2. $\operatorname{Ker} H_\phi$ is an invariant subspace, i.e., for $f(z) \in \operatorname{Ker} H_\phi$ and $\psi(z) \in RH^\infty_+$ we have $\psi(z) f(z) \in \operatorname{Ker} H_\phi$.

Proof. 1. We compute $P_- \psi H_\phi f = P_- \psi P_- \phi f = P_- \psi \phi f = P_- \phi \psi f = H_\phi \psi f$.
2. Follows from the Hankel functional equation.

We have an $RH^\infty_+$-module structure on any finite-dimensional backward-invariant subspace $H(m)$. The $RH^\infty_+$-module homomorphisms are the maps $X : H(m) \longrightarrow H(m)$ that satisfy $X T_\psi = T_\psi X$ for every $\psi(z) \in RH^\infty_+$. At the same time, the abstract Hankel operators, i.e., the operators $H : RH^2_+ \longrightarrow RH^2_-$ that satisfy the Hankel functional equation (11.97), are also $RH^\infty_+$-module homomorphisms. The next result relates these two classes of module homomorphisms.

Theorem 11.37. A map $H : RH^2_+ \longrightarrow RH^2_-$ is a Hankel operator with a nontrivial kernel $m RH^2_+$, with $m(z)$ rational inner, if and only if we have
$$ H = m^* X P_{H(m)}, \qquad (11.98) $$
or equivalently, the following diagram is commutative:

                 X
      H(m) -----------> H(m)
          \               |
           \              |
         H  \             |  m^*
             \            v
              `-------> H(m^*)

Here $X : H(m) \longrightarrow H(m)$ is an intertwining map, i.e., satisfies $X T_\psi = T_\psi X$ for every $\psi(z) \in RH^\infty_+$.
Proof. Assume $X : H(m) \longrightarrow H(m)$ is an intertwining map. We define $H : RH^2_+ \longrightarrow RH^2_-$ by (11.98). Let now $\psi(z) \in RH^\infty_+$ and $f(z) \in RH^2_+$. Then
$$ H\psi f = m^* X P_{H(m)} \psi f = m^* X P_{H(m)} \psi P_{H(m)} f = m^* X T_\psi P_{H(m)} f = m^* T_\psi X P_{H(m)} f = m^* P_{H(m)} \psi X P_{H(m)} f = m^* m P_- m^* \psi X P_{H(m)} f = P_- \psi\, m^* X P_{H(m)} f = P_- \psi H f. $$
So $H$ satisfies the Hankel functional equation. Conversely, if $H$ is a Hankel operator with kernel $m RH^2_+$, then we define $X : H(m) \longrightarrow H(m)$ by $X = m H|_{H(m)}$. For $f(z) \in H(m)$ and $\psi(z) \in RH^\infty_+$, we compute
$$ X T_\psi f = m H P_{H(m)} \psi f = m H \psi f = m P_- \psi H f = m P_- m^* m \psi H f = P_{H(m)} \psi\, m H f = T_\psi X f. $$
So $X$ is intertwining. Obviously, (11.98) holds.
The following theorem shows that the Hankel functional equation characterizes the class of Hankel operators.

Theorem 11.38. Let $H : RH^2_+ \longrightarrow RH^2_-$ satisfy the Hankel functional equation (11.97). Then there exists a function $\psi(z) \in RH^\infty_-$ for which $H = H_\psi$.

Proof. First proof: Choosing $\alpha \in \mathbb{C}_+$, then $\frac{1}{z+\alpha} \in RH^2_+$, and we clearly have
$$ RH^2_+ = \Bigl\{ \frac{\theta(z)}{z+\alpha} \ \Big|\ \theta(z) \in RH^\infty_+ \Bigr\}. \qquad (11.99) $$
Note that with $\alpha, \beta$ in the open right half-plane, $\frac{z+\alpha}{z+\beta}$ is a unit in $RH^\infty_+$. If we had $H = H_\psi$, with $\psi(z) \in RH^\infty_-$, then we would have as well
$$ H \frac{1}{z+\alpha} = P_- \frac{\psi(z)}{z+\alpha} = P_- \Bigl( \frac{\psi(z) - \psi(-\alpha)}{z+\alpha} + \frac{\psi(-\alpha)}{z+\alpha} \Bigr) = \frac{\psi(z) - \psi(-\alpha)}{z+\alpha} = \phi(z). $$
Therefore, we define
$$ \psi(z) = \phi(z)(z+\alpha) + \psi(-\alpha). $$
Obviously $\psi(z) \in RH^\infty_-$. We clearly have
$$ P_- \psi \frac{1}{z+\alpha} = P_- \frac{\phi(z)(z+\alpha) + \psi(-\alpha)}{z+\alpha} = \phi(z). $$
Therefore, for any $\theta(z) \in RH^\infty_+$,
$$ H \frac{\theta}{z+\alpha} = P_- \theta H \frac{1}{z+\alpha} = P_- \theta P_- \psi \frac{1}{z+\alpha} = P_- \theta \psi \frac{1}{z+\alpha} = P_- \psi \frac{\theta}{z+\alpha}. $$
So $H f = P_- \psi f = H_\psi f$ for every $f(z) \in RH^2_+$. This shows that $H = H_\psi$.

Second proof: By our assumption, $\operatorname{Ker} H$ is a nontrivial invariant subspace of $RH^2_+$. Thus, by Theorem 11.37, there exists a rational inner function $m(z)$ for which $\operatorname{Ker} H = m RH^2_+$. Since for any $f(z) \in RH^2_+$ we have $m(z) f(z) \in \operatorname{Ker} H$, it follows from the functional equation of Hankel operators that $P_- m H f = H(m f) = 0$. This implies $\operatorname{Im} H = H(m^*)$. Taking the restriction of $H$ to a map from $H(m)$ to $H(m^*)$, it follows that the map $X : H(m) \longrightarrow H(m)$, defined by $X = mH$, is an intertwining map. Applying Theorem 11.33, it follows that $X = T_\Theta$ for some $\Theta(z) \in RH^\infty_+$, which, without loss of generality, can be taken to satisfy $m(z)^* \Theta(z) \in RH^\infty_-$. From (11.98), it follows that for $f(z) \in H(m)$,
$$ H f = m(z)^* X f = m(z)^* T_\Theta f = m(z)^* P_{H(m)} \Theta f = H_{m^*\Theta} f = H_\psi f, $$
where $\psi(z) = m(z)^* \Theta(z) \in RH^\infty_-$.
It follows from Beurling’s theorem that Ker Hφ = mRH2+ for some inner function m ∈ RH∞ + . Since we are dealing with the rational case, the next theorem can make this more specific. It gives a characterization of the kernel and image of a Hankel operator and clarifies the connection between them and polynomial and rational models. Theorem 11.39. Let d(z) be antistable and φ (z) =
n(z) d(z)
∈ RH∞ − . Then
1. We have Ker Hφ ⊃ dd∗ RH2+ , with equality holding if and only if n(z) and d(z) are coprime. 2. If n(z) and d(z) are coprime, then {Ker Hφ }⊥ =
d RH2+ d∗
⊥
∗
= Xd .
(11.100)
358
3. We have Im Hφ ⊂ RH2−
11 Rational Hardy Spaces
d∗ RH2− = X d with equality holding if and only if n(z) d
and d(z) are coprime. 4. If n(z) and d(z) are coprime, then
dim Im Hφ = deg d. Proof. 1. {Ker Hφ }⊥ contains only rational functions. Let f (z) =
(11.101) p(z) q(z)
∈
d
2 ⊥. d ∗ RH+
d ∗ (z)p(z) 2 ∗ d(z)q(z) ∈ RH− . So q(z) | d (z)p(z). But, since p(z) ∧ q(z) = 1, it follows d∗ that q(z) | d ∗ (z), i.e., d ∗ (z) = q(z)r(z). Hence f (z) = r(z)p(z) d ∗ (z) ∈ X . ∗ d ∗ . Then p(z) = p(z) d(z) or d (z) p(z) ∈ RH2 . Conversely, let dp(z) ∗ (z) ∈ X − d ∗ (z) d(z) d ∗ (z) d(z) d ∗ (z) d 2 }⊥ . ∈ { RH Therefore we conclude that dp(z) ∗ ∗ (z) + d
Then
2. -4. These statements follow from Proposition 11.11.
As a corollary, we can state the following theorem.

Theorem 11.40 (Kronecker). Let $\phi(z) \in RL^\infty$. Then $\operatorname{rank} H_\phi = k$ if and only if $\phi(z)$ has $k$ poles, counting multiplicities, in the open right half-plane.

Proof. Since $\phi(z) \in RL^\infty$, it cannot have any poles on the imaginary axis. Taking a partial fraction decomposition into stable and antistable parts, and noting that the stable part lies in $RH^\infty_+$ and hence induces the zero Hankel operator, we may assume without loss of generality that $\phi(z)$ is antistable. Writing $\phi(z) = \frac{n(z)}{d(z)}$ with $n(z) \wedge d(z) = 1$, Theorem 11.39 now gives $\operatorname{rank} H_\phi = \dim \operatorname{Im} H_\phi = \deg d$, which is the number of poles of $\phi(z)$ in the open right half-plane.

The previous theorem, though of an elementary nature, is central to all further development, since it provides the direct link between the infinite-dimensional object, namely the Hankel operator, and the well-developed theory of polynomial and rational models. This link will be continuously exploited. Hankel operators in general, and those with rational symbol in particular, are never invertible. Still, we may want to invert the Hankel operator as a map from its cokernel, i.e., the orthogonal complement of its kernel, to its image. We saw that such a restriction of a Hankel operator is of considerable interest because of its connection to intertwining maps of model operators. Theorem 11.26 gave a full characterization of invertibility properties of intertwining maps. These will be applied, in Chapter 12, to the inversion of the restricted Hankel operators, which will turn out to be of great importance in the development of a duality theory, linking model reduction with robust stabilization.
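As a numerical aside (not part of the original text), Kronecker's theorem can be checked computationally: for a rational antistable symbol, the singular values of the Hankel operator coincide with those of the stable reflection $\phi(-z)$, and the latter can be read off from the controllability and observability Gramians of a minimal state-space realization. The sketch below assumes SciPy is available; the realization and function names are our own illustration, not the book's.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, eigvals

def hankel_singular_values(A, B, C):
    """Hankel singular values of the stable system (A, B, C):
    sigma_i = sqrt(eig(P Q)), where A P + P A^T = -B B^T and A^T Q + Q A = -C^T C."""
    P = solve_continuous_lyapunov(A, -B @ B.T)
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)
    return np.sort(np.sqrt(np.real(eigvals(P @ Q))))[::-1]

# Illustration: phi(z) = 1/(z - 1) is antistable; its stable reflection
# phi(-z) = -1/(z + 1) has the minimal realization A = -1, B = 1, C = -1.
A = np.array([[-1.0]]); B = np.array([[1.0]]); C = np.array([[-1.0]])
print(hankel_singular_values(A, B, C))   # [0.5]; the Hankel operator has rank 1
```

The number of computed singular values equals the McMillan degree, in agreement with the rank count of Theorem 11.40.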
11.3 Exercises

1. Spectral factorization: Show that for $g(z) \in RH^\infty_+$ we have $g(i\omega) \geq 0$ for all $\omega \in \mathbb{R}$ if and only if for some $h(z) \in RH^\infty_+$ we have $g(z) = h^*(z)h(z)$. Here $h^*(z) = h(-z)$. Show that if $g(z)$ is nonzero on the extended imaginary axis, then $h(z)$ can be chosen such that also $h(z)^{-1} \in RH^\infty_+$.
2. Normalized coprime factorization: Let $g(z) \in RH^\infty_-$. Show that there exist $n(z), m(z) \in RH^\infty_+$ for which $g(z) = n(z)m(z)^{-1}$ and $n^*(z)n(z) + m^*(z)m(z) = 1$.
11.4 Notes and Remarks

The topics covered in this chapter have their roots early in the twentieth century in the work of Carathéodory, Fejér, and Schur. In two classic papers, Schur (1917, 1918) gives a complete study of contractive analytic functions in the unit disk. The problem is attacked by a variety of methods, including what has become known as the Schur algorithm, as well as the use of quadratic forms. One of the early problems considered was the minimum $H^\infty$-norm extension of polynomials. This in turn led to the Nevanlinna-Pick interpolation problem. Nehari's theorem was proved in Nehari (1957). Many of the interpolation problems can be recast as best-approximation problems, which are naturally motivated by computational considerations. A new impetus for the development of operator theory, based on complex analysis and the use of functional models, can be traced to the work of Livsic, Potapov, de Branges, Sz.-Nagy and Foias (1970), Lax and Phillips (1967), and many others. The characterization of invariant subspaces of $H^2$ by Beurling (1949) proved to be of central importance. Theorem 11.17 is a mild algebraic version. For a nice geometric proof of Beurling's theorem, see Helson (1964). Of great impact was the work of Sarason (1967) on the connection between modern operator theory and generalized interpolation. This led in turn to the commutant lifting theorem, proved in full generality by Sz.-Nagy and Foias. Proposition 11.18 has a very simple proof. However, if we remove the assumption of rationality and work in the $H^\infty$ setting, then the equivalence of the coprimeness condition $\sum_{i=1}^{s} |a_i(z)| \geq \delta > 0$ for all $z$ in the open right half-plane and the existence of an $H^\infty$ solution to the Bezout identity is a deep result in analysis, referred to as the corona theorem, due to Carleson (1962). The corona theorem was applied to the characterization of the invertibility of intertwining maps in Fuhrmann (1968a,b). The study of the DSS coprime factorization (11.96) is due to Douglas, Shapiro, and Shields (1971). For a matrix-valued version of the DSS factorization, see Fuhrmann (1975).
Chapter 12
Model Reduction
12.1 Introduction

Analytic Hankel operators are generally defined in the time domain, and via the Fourier transform their frequency domain representation is obtained. We will skip this part and introduce Hankel operators directly as frequency-domain objects. Our choice is to develop the theory of continuous-time systems. This means that the relevant frequency domain spaces are the Hardy spaces of the left and right half-planes. Thus we will study Hankel operators defined on half-plane Hardy spaces rather than on those of the unit disk, as was done by Adamjan, Arov, and Krein (1968a,b, 1971, 1978). In this, we follow the choice of Glover (1984). This choice seems to be a very convenient one, since all results on duality simplify significantly, due to the greater symmetry between the two half-planes in comparison to the unit disk and its exterior.

In this, the last chapter of this book, we have chosen as our theme the theory of Hankel norm approximation problems. This theory has come to be generally known as the AAK theory, in honor of the pioneering and very influential contribution made by Adamjan, Arov, and Krein. The reason for the choice of this topic is twofold. First and foremost, it is an extremely interesting and rich circle of ideas and allows us the consideration of several distinct problems within a unified framework. The second reason is just as important. We shall use the problems under discussion as a vehicle for the development of many results that are counterparts in the setting of Hardy spaces of results obtained previously in polynomial terms. This development can be considered the construction of a bridge that connects algebra and analysis.

In Section 12.2.1, we do a detailed analysis of Schmidt pairs of a Hankel operator with scalar, rational symbol. Some important lemmas are derived in our setting from an algebraic point of view. These lemmas lead to a polynomial formulation of the singular-value singular-vector equation of the Hankel operator. This equation, to which we refer as the fundamental polynomial equation (FPE), is easily reduced, using the theory of polynomial models, to a standard eigenvalue problem. Using
nothing more than the polynomial division algorithm, the subspace of all singular vectors corresponding to a given singular value is parametrized via the minimal-degree solution of the FPE. We obtain a connection between the minimal-degree solution and the multiplicity of the singular value. The FPE can be transformed, using a simple algebraic manipulation, to a form that leads immediately to lower bound estimates on the number of antistable zeros of $p_k(z)$, the minimal-degree solution corresponding to the $k$th Hankel singular value. This lower bound is shown to actually coincide with the degree of the minimal-degree solution for the special case of the smallest singular value. Thus this polynomial turns out to be antistable. Another algebraic manipulation of the FPE leads to a Bezout equation over $RH^\infty_+$. This provides the key to duality considerations.

Section 12.2.3 has duality theory as its main theme. Using the previously obtained Bezout equation, we invert the intertwining map corresponding to the initial Hankel operator. The inverse intertwining map is related to a new Hankel operator that has inverse singular values to those of the original one. Moreover, we can compute the Schmidt pairs corresponding to this Hankel operator in terms of the original Schmidt pairs. The estimates on the number of antistable zeros of the minimum-degree solutions of the FPE that were obtained for the original Hankel operator Schmidt vectors are applied now to the new Hankel operator Schmidt vectors. Thus we obtain a second set of inequalities. The two sets of inequalities, taken together, lead to precise information on the number of antistable zeros of the minimal-degree solutions corresponding to all singular values. Utilizing this information leads to the solution of the Nehari problem as well as that of the general Hankel norm approximation problem. Section 12.2.6 is a brief introduction to Nevanlinna-Pick interpolation, i.e., interpolation by functions that minimize the $RH^\infty_+$ norm. In Section 12.2.7, we study the singular values and singular vectors of both the Nehari complement and the best Hankel norm approximant. In both cases the singular values are a subset of the set of singular values of the original Hankel operator, and the corresponding singular vectors are obtained by orthogonal projections on the respective model spaces. Section 12.2.9, using the analysis carried out in Section 12.2.3, is devoted to a nontrivial duality theory that connects robust stabilization with model reduction problems.
12.2 Hankel Norm Approximation

Real-life problems tend to be complex, and as a result, the modeling of systems may lead to high-complexity models, where complexity is usually measured by dimension. Complexity can be measured in various ways, e.g., by McMillan degree or by the number of stable or antistable poles. In order to facilitate computational algorithms, we want
to approximate complex linear systems by systems of lower complexity, while retaining the essential features of the original system. These problems fall under the general term of model reduction. Since we saw, in Chapter 7, that approximating a linear transformation is related to singular values and singular vectors, and as the external behavior of a system is described by the Hankel operator associated with its transfer function, it is only to be expected that the analysis of the singular values and singular vectors of this Hankel operator will play a central role.
12.2.1 Schmidt Pairs of Hankel Operators

We saw, in Section 7.4, that singular values of linear operators are closely related to the problem of best approximation by operators of lower rank. That this basic method could be applied to the approximation of Hankel operators, by Hankel operators of lower ranks, through the detailed analysis of singular values and the corresponding Schmidt pairs, is a fundamental contribution of Adamjan, Arov, and Krein. We recall that given a linear operator $A$ on an inner product space, $\mu$ is a singular value of $A$ if there exists a nonzero vector $f$ such that $A^*A f = \mu^2 f$. Rather than solve the previous equation, we let $g = \frac{1}{\mu} A f$ and go over to the equivalent system
$$ A f = \mu g, \qquad A^* g = \mu f, \qquad (12.1) $$
i.e., $\mu$ is a singular value of both $A$ and $A^*$. For the rational case, we present an algebraic derivation of the basic results on Hankel singular values and vectors. We recall Definition 11.1, i.e.,
$$ f^*(z) = f(-z). \qquad (12.2) $$
Proposition 12.1. Assume, without loss of generality, that $\phi(z) = \frac{n(z)}{d(z)} \in RH^\infty_-$ is strictly proper, with $n(z)$ and $d(z)$ coprime polynomials with real coefficients. Let $H_\phi : RH^2_+ \longrightarrow RH^2_-$ be the Hankel operator, defined by (11.94).
1. The pair $\{\frac{p(z)}{d^*(z)}, \frac{\hat p(z)}{d(z)}\}$ is a Schmidt pair for $H_\phi$ if and only if there exist polynomials $\pi(z)$ and $\xi(z)$ for which
$$ n(z)p(z) = \mu\, d^*(z)\hat p(z) + d(z)\pi(z), \qquad (12.3) $$
$$ n^*(z)\hat p(z) = \mu\, d(z)p(z) + d^*(z)\xi(z). \qquad (12.4) $$
We will say that a pair of polynomials $(p(z), \hat p(z))$, with $\deg p, \deg \hat p < \deg d$, is a solution pair if there exist polynomials $\pi(z)$ and $\xi(z)$ such that equations (12.3) and (12.4) are satisfied.
2. Let $\{\frac{p(z)}{d^*(z)}, \frac{\hat p(z)}{d(z)}\}$ and $\{\frac{q(z)}{d^*(z)}, \frac{\hat q(z)}{d(z)}\}$ be two Schmidt pairs of the Hankel operator $H_{\frac{n}{d}}$, corresponding to the same singular value $\mu$. Then
$$ \frac{p(z)}{\hat p(z)} = \frac{q(z)}{\hat q(z)}, $$
i.e., this ratio is independent of the Schmidt pair.
3. Let $\{\frac{p(z)}{d^*(z)}, \frac{\hat p(z)}{d(z)}\}$ be a Schmidt pair associated with the singular value $\mu$. Then $\frac{p(z)}{\hat p(z)}$ is unimodular, that is, we have $\bigl|\frac{p(z)}{\hat p(z)}\bigr| = 1$ on the imaginary axis. Such a function is also called an all-pass function.
4. Let $\mu$ be a singular value of the Hankel operator $H_{\frac{n}{d}}$. Then there exists a unique, up to a constant factor, solution pair $(p(z), \hat p(z))$ of minimal degree. The set of all solution pairs is given by
$$ \{(q(z), \hat q(z)) \mid q(z) = p(z)a(z),\ \hat q(z) = \hat p(z)a(z),\ \deg a < \deg d - \deg p\}. $$
5. Let $p(z), q(z) \in \mathbb{C}[z]$ be coprime polynomials such that $\frac{p(z)}{q(z)}$ is all-pass. Then $q(z) = \alpha p^*(z)$, with $|\alpha| = 1$.
6. Let $p(z), \hat p(z)$ be polynomials such that $\frac{p(z)}{\hat p(z)}$ is all-pass. Then, with $r(z) = p(z) \wedge \hat p(z)$, we have $p(z) = r(z)s(z)$ and $\hat p(z) = \alpha r(z)s^*(z)$.
Proof. 1. In view of equation (7.19), in order to compute the singular vectors of the Hankel operator $H_\phi$, we have to solve
$$ H_\phi f = \mu g, \qquad H_\phi^* g = \mu f. \qquad (12.5) $$
Since by Theorem 11.39 we have $\operatorname{Ker} H_\phi = \frac{d}{d^*} RH^2_+$ and $\operatorname{Im} H_\phi = X^d$, we may as well consider $H_\phi$ as a map from $H(m)$ to $H(m^*)$. Therefore a Schmidt pair for a singular value $\mu$ is necessarily of the form $\{\frac{p(z)}{d^*(z)}, \frac{\hat p(z)}{d(z)}\}$. Thus (12.5) translates into
$$ P_- \frac{n}{d}\,\frac{p}{d^*} = \mu\, \frac{\hat p}{d}, \qquad P_+ \frac{n^*}{d^*}\,\frac{\hat p}{d} = \mu\, \frac{p}{d^*}. $$
This means that there exist polynomials $\pi(z)$ and $\xi(z)$ such that we have the following partial fraction decomposition:
$$ \frac{n(z)}{d(z)}\,\frac{p(z)}{d^*(z)} = \mu\, \frac{\hat p(z)}{d(z)} + \frac{\pi(z)}{d^*(z)}, \qquad \frac{n^*(z)}{d^*(z)}\,\frac{\hat p(z)}{d(z)} = \mu\, \frac{p(z)}{d^*(z)} + \frac{\xi(z)}{d(z)}. $$
2. Let the polynomials $p(z), \hat p(z)$ correspond to one Schmidt pair, and let the polynomials $q(z), \hat q(z)$ correspond to another Schmidt pair, i.e., we have
$$ n(z)q(z) = \mu\, d^*(z)\hat q(z) + d(z)\rho(z) \qquad (12.6) $$
and
$$ n^*(z)\hat q(z) = \mu\, d(z)q(z) + d^*(z)\eta(z). \qquad (12.7) $$
Now, from equations (12.3) and (12.7) we get
$$ 0 = \mu\, d(z)\bigl(p(z)\hat q(z) - q(z)\hat p(z)\bigr) + d^*(z)\bigl(\xi(z)\hat q(z) - \eta(z)\hat p(z)\bigr). $$
Since $d(z)$ and $d^*(z)$ are coprime, it follows that $d^*(z) \mid (p(z)\hat q(z) - q(z)\hat p(z))$. On the other hand, from equations (12.3) and (12.6), we get
$$ 0 = \mu\, d^*(z)\bigl(\hat p(z)q(z) - \hat q(z)p(z)\bigr) + d(z)\bigl(\pi(z)q(z) - \rho(z)p(z)\bigr), $$
and hence $d(z) \mid (\hat p(z)q(z) - \hat q(z)p(z))$. Now both $d(z)$ and $d^*(z)$ divide $\hat p(z)q(z) - \hat q(z)p(z)$, and by the coprimeness of $d(z)$ and $d^*(z)$, so does $d(z)d^*(z)$. Since $\deg(\hat p q - \hat q p) < \deg d + \deg d^*$, it follows that $\hat p(z)q(z) - \hat q(z)p(z) = 0$, or equivalently,
$$ \frac{p(z)}{\hat p(z)} = \frac{q(z)}{\hat q(z)}, $$
i.e., $\frac{p(z)}{\hat p(z)}$ is independent of the particular Schmidt pair associated to the singular value $\mu$.
3. Going back to equation (12.4) and the dual of (12.3), we have
$$ n^*(z)\hat p(z) = \mu\, d(z)p(z) + d^*(z)\xi(z), \qquad n^*(z)p^*(z) = \mu\, d(z)\hat p^*(z) + d^*(z)\pi^*(z). $$
It follows that
$$ 0 = \mu\, d(z)\bigl(p(z)p^*(z) - \hat p(z)\hat p^*(z)\bigr) + d^*(z)\bigl(\xi(z)p^*(z) - \pi^*(z)\hat p(z)\bigr), $$
and hence $d^*(z) \mid (p(z)p^*(z) - \hat p(z)\hat p^*(z))$. By symmetry also $d(z) \mid (p(z)p^*(z) - \hat p(z)\hat p^*(z))$, and so, arguing as before, necessarily $p(z)p^*(z) - \hat p(z)\hat p^*(z) = 0$. This can be rewritten as $\frac{p(z)}{\hat p(z)}\frac{p^*(z)}{\hat p^*(z)} = 1$, i.e., $\frac{p(z)}{\hat p(z)}$ is all-pass.
4. Clearly, if $\mu$ is a singular value of the Hankel operator, then a nonzero solution pair $(p(z), \hat p(z))$ of minimal degree exists. Let $(q(z), \hat q(z))$ be any other solution pair with $\deg q, \deg \hat q < \deg d$. By the division rule for polynomials, we have $q(z) = a(z)p(z) + r(z)$ with $\deg r < \deg p$. Similarly, $\hat q(z) = \hat a(z)\hat p(z) + \hat r(z)$ with $\deg \hat r < \deg \hat p$. From equation (12.3) we get
$$ n(z)(a(z)p(z)) = \mu\, d^*(z)(a(z)\hat p(z)) + d(z)(a(z)\pi(z)), \qquad (12.8) $$
whereas equation (12.6) yields
$$ n(z)(a(z)p(z) + r(z)) = \mu\, d^*(z)(\hat a(z)\hat p(z) + \hat r(z)) + d(z)\tau(z). \qquad (12.9) $$
By subtraction we obtain
$$ n(z)r(z) = \mu\, d^*(z)\bigl((\hat a(z) - a(z))\hat p(z) + \hat r(z)\bigr) + d(z)\bigl(\tau(z) - a(z)\pi(z)\bigr). \qquad (12.10) $$
Similarly, from equation (12.7) we get
$$ n^*(z)(\hat a(z)\hat p(z) + \hat r(z)) = \mu\, d(z)(a(z)p(z) + r(z)) + d^*(z)\eta(z), \qquad (12.11) $$
whereas equation (12.4) yields
$$ n^*(z)(a(z)\hat p(z)) = \mu\, d(z)(a(z)p(z)) + d^*(z)(a(z)\xi(z)). \qquad (12.12) $$
Subtracting one from the other gives
$$ n^*(z)\bigl((\hat a(z) - a(z))\hat p(z) + \hat r(z)\bigr) = \mu\, d(z)r(z) + d^*(z)\bigl(\eta(z) - a(z)\xi(z)\bigr). \qquad (12.13) $$
Equations (12.10) and (12.13) imply that $\bigl\{\frac{r(z)}{d^*(z)}, \frac{(\hat a(z) - a(z))\hat p(z) + \hat r(z)}{d(z)}\bigr\}$ is a $\mu$-Schmidt pair. Since necessarily $\deg r = \deg((\hat a - a)\hat p + \hat r)$, we get $\hat a(z) = a(z)$. Finally, since we assumed $(p(z), \hat p(z))$ to be of minimal degree, we must have $r(z) = \hat r(z) = 0$. Conversely, if $a(z)$ is any polynomial satisfying $\deg a < \deg d - \deg p$, then from equations (12.3) and (12.4), it follows by multiplication that $(p(z)a(z), \hat p(z)a(z))$ is also a solution pair.
5. Since $\frac{p(z)}{q(z)}$ is all-pass, it follows that $\frac{p(z)}{q(z)}\frac{p^*(z)}{q^*(z)} = 1$, or $p(z)p^*(z) = q(z)q^*(z)$. Since the polynomials $p(z)$ and $q(z)$ are coprime, it follows that $p(z) \mid q^*(z)$. By the same token, we have $q^*(z) \mid p(z)$ and hence $q^*(z) = \pm p(z)$.
6. Write $p(z) = r(z)s(z)$, $\hat p(z) = r(z)\hat s(z)$. Then $s(z) \wedge \hat s(z) = 1$ and $\frac{s(z)}{\hat s(z)}$ is all-pass. The result follows by applying Part 5.
367
Equation (12.3), considered as an equation modulo the polynomial d(z), is not an eigenvalue equation, since there are too many unknowns. Specifically, we have to find the coefficients of both p(z) and p(z). ˆ To overcome this difficulty, we study in more detail the structure of Schmidt pairs of Hankel operators. The importance of the next theorem is due to the fact that it reduces the analysis of Schmidt pairs to one polynomial. This leads to an equation that is easily reduced to an eigenvalue problem. n(z) ∈ RH∞ Theorem 12.2. Let φ (z) = d(z) − be strictly proper, with n(z) and d(z) coprime polynomials with real coefficients. Let μ be a singular value of Hφ and let (p(z), p(z)) ˆ be a nonzero, minimal degree solution pair of equations (12.3) and (12.4). Then p(z) is a solution of
n(z)p(z) = λ d ∗ (z)p∗ (z) + d(z)π (z),
(12.14)
with λ real and |λ | = μ . Proof. Let (p(z), p(z)) ˆ be a nonzero minimal-degree solution pair of equations (12.3) and (12.4). By taking their adjoints, we can easily see that ( pˆ∗ (z), p∗ (z)) is also a nonzero minimal-degree solution pair. By uniqueness of such a solution, i.e., by Proposition 12.1, we have pˆ ∗ (z) = ε p(z).
(12.15)
p(z) ˆ is all-pass and both polynomials are real, we have ε = ±1. Let us put Since p(z) λ = ε μ . Then (12.15) can be rewritten as p(z) ˆ = ε p∗ (z), and so (12.14) follows from (12.3).
We will refer to equation (12.14) as the fundamental polynomial equation (FPE), and it will be the basis for all further derivations. Corollary 12.3. Let μi be a singular value of Hφ and let pi (z) be the minimaldegree solution of the fundamental polynomial equation, i.e., n(z)pi (z) = λi d ∗ (z)p∗i (z) + d(z)πi (z). Then 1.
deg pi = deg p∗i = deg πi .
n−1 j j 2. Putting pi (z) = ∑n−1 j=0 pi, j z and πi (z) = ∑ j=0 πi, j z , we have the equality
πi,n−1 (z) = λi pi,n−1 (z).
(12.16)
Corollary 12.4. Let $p(z)$ be a minimal-degree solution of equation (12.14). Then
1. The set of all singular vectors of the Hankel operator $H_{\frac{n}{d}}$ corresponding to the singular value $\mu$ is given by
$$ \operatorname{Ker}\bigl(H_{\frac{n}{d}}^* H_{\frac{n}{d}} - \mu^2 I\bigr) = \Bigl\{ \frac{p(z)a(z)}{d^*(z)} \ \Big|\ a(z) \in \mathbb{C}[z],\ \deg a < \deg d - \deg p \Bigr\}. \qquad (12.17) $$
2. The multiplicity of $\mu = \|H_\phi\|$ as a singular value of $H_\phi$ is equal to $m = \deg d - \deg p$, where $p(z)$ is the minimum-degree solution of (12.14).
3. There exists a constant $c$ such that $c + \frac{n(z)}{d(z)}$ is a constant multiple of an antistable all-pass function if and only if $\mu_1 = \cdots = \mu_n$.

Proof. We will prove (3) only. Assume all singular values are equal to $\mu$. Thus the multiplicity of $\mu$ is $\deg d$. Hence the minimal-degree solution $p(z)$ of (12.14) is a constant and so is $\pi(z)$. Putting $c = -\frac{\pi}{p}$, (12.14) can be rewritten as
$$ \frac{n(z)}{d(z)} + c = \lambda\, \frac{d^*(z)p^*(z)}{d(z)p(z)}, $$
and this is a multiple of an antistable all-pass function. Conversely, assume, without loss of generality, that $\frac{n(z)}{d(z)} + c$ is antistable all-pass. Then the induced Hankel operator is isometric, and all its singular values are equal to 1.

The following simple proposition is important in the study of zeros of singular vectors.

Proposition 12.5. Let $\mu_k$ be a singular value of $H_\phi$ and let $p_k(z)$ be the minimal-degree solution of $n(z)p_k(z) = \lambda_k d^*(z)p_k^*(z) + d(z)\pi_k(z)$. Then
1. The polynomials $p_k(z)$ and $p_k^*(z)$ are coprime.
2. The polynomial $p_k(z)$ has no imaginary-axis zeros.

Proof. 1. Let $e(z) = p_k(z) \wedge p_k^*(z)$. Without loss of generality, we may assume that $e(z) = e^*(z)$. The polynomial $e(z)$ has no imaginary-axis zeros, for that would imply that $e(z)$ and $\pi_k(z)$ have a nontrivial common divisor. Thus the fundamental polynomial equation could be divided by a suitable polynomial factor, in contradiction to the assumption that $p_k(z)$ is a minimal-degree solution.
2. This clearly follows from the first part.
12.2.2 Reduction to Eigenvalue Equation

The fundamental polynomial equation is easily reduced, using polynomial models, to either a generalized eigenvalue equation or a regular eigenvalue equation. Starting from (12.14), we apply the standard functional calculus and the fact that $d(S_d) = 0$, i.e., the Cayley–Hamilton theorem, to obtain
$$ n(S_d)p_i = \lambda_i\, d^*(S_d)p_i^*. \qquad (12.18) $$
Now $d(z), d^*(z)$ are coprime, since $d(z)$ is antistable and $d^*(z)$ is stable. Thus, by Theorem 5.18, $d^*(S_d)$ is invertible. In fact, the inverse of $d^*(S_d)$ is easily computed through the unique solution of the Bezout equation $a(z)d(z) + b(z)d^*(z) = 1$, satisfying the constraints $\deg a, \deg b < \deg d$. In this case, the polynomials $a(z)$ and $b(z)$ are uniquely determined, which, by virtue of symmetry, forces the equality $a(z) = b^*(z)$. Hence
$$ b^*(z)d(z) + b(z)d^*(z) = 1. \qquad (12.19) $$
From this we get $b(S_d)d^*(S_d) = I$, or $d^*(S_d)^{-1} = b(S_d)$. Because of the symmetry in the Bezout equation (12.19), we expect that some reduction in the computational complexity should be possible. This indeed turns out to be the case. Given an arbitrary polynomial $f(z)$, we let
$$ f_+(z^2) = \frac{f(z) + f^*(z)}{2}, \qquad f_-(z^2) = \frac{f(z) - f^*(z)}{2z}. $$
The Bezout equation can be rewritten as
$$ \bigl(b_+(z^2) - z b_-(z^2)\bigr)\bigl(d_+(z^2) + z d_-(z^2)\bigr) + \bigl(b_+(z^2) + z b_-(z^2)\bigr)\bigl(d_+(z^2) - z d_-(z^2)\bigr) = 1, $$
or
$$ 2\bigl(b_+(z^2)d_+(z^2) - z^2 b_-(z^2)d_-(z^2)\bigr) = 1. $$
We can, of course, solve the lower-degree Bezout equation $2(b_+(z)d_+(z) - z b_-(z)d_-(z)) = 1$. This is possible because, by the assumption that $d(z)$ is antistable, $d_+(z)$ and $z d_-(z)$ are coprime. Putting $b(z) = b_+(z^2) + z b_-(z^2)$, we get a solution to the Bezout equation (12.19). Going back to equation (12.18), we have
$$ n(S_d)b(S_d)p_i = \lambda_i p_i^*. \qquad (12.20) $$
To simplify, we let $r = \pi_d(bn) = (bn) \bmod d$. Then (12.20) is equivalent to
$$ r(S_d)p_i = \lambda_i p_i^*. \qquad (12.21) $$
If $K : X_d \longrightarrow X_d$ is given by $(Kp)(z) = p^*(z)$, then (12.21) is equivalent to the generalized eigenvalue equation $r(S_d)p_i = \lambda_i K p_i$. Since $K$ is obviously invertible and $K^{-1} = K$, the last equation transforms into the regular eigenvalue equation $K r(S_d)p_i = \lambda_i p_i$. To get a matrix equation, one can take the matrix representation with respect to any choice of basis in $X_d$.
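This reduction is easy to carry out numerically. The following sketch (ours, not from the text; it assumes NumPy and works in the monomial basis of $X_d$) forms $S_d$ as a companion matrix, evaluates $n(S_d)$ and $d^*(S_d)$, and reads off the signed singular values $\lambda_i$ as the eigenvalues of $K\,d^*(S_d)^{-1}n(S_d)$, which by (12.18) is the same map as $K\,r(S_d)$.

```python
import numpy as np

def shift_matrix(d):
    # Matrix of S_d (multiplication by z modulo d(z)) in the basis 1, z, ..., z^{n-1}.
    # d = [d_0, ..., d_{n-1}] are the coefficients of the monic d(z) = z^n + ... + d_0.
    n = len(d)
    S = np.zeros((n, n))
    if n > 1:
        S[1:, :-1] = np.eye(n - 1)          # z * z^k = z^{k+1} for k < n-1
    S[:, -1] -= np.asarray(d, dtype=float)  # z * z^{n-1} = z^n = -(d_0 + d_1 z + ...)
    return S

def polyval_matrix(coeffs, S):
    # Evaluate p(S) for p(z) = coeffs[0] + coeffs[1] z + ... (ascending coefficients).
    P = np.zeros_like(S)
    for c in reversed(coeffs):
        P = P @ S + c * np.eye(S.shape[0])
    return P

def signed_singular_values(n_coeffs, d_coeffs):
    # lambda_i of (12.21) for phi(z) = n(z)/d(z), with d(z) monic antistable.
    S = shift_matrix(d_coeffs)
    deg = len(d_coeffs)
    dstar = [(-1) ** k * c for k, c in enumerate(d_coeffs)] + [(-1) ** deg]  # d*(z) = d(-z)
    N = polyval_matrix(list(n_coeffs), S)
    Dstar = polyval_matrix(dstar, S)
    K = np.diag([(-1.0) ** k for k in range(deg)])   # (Kp)(z) = p(-z) in the monomial basis
    return np.linalg.eigvals(K @ np.linalg.solve(Dstar, N))

# Example: phi(z) = 1/(z-1) gives the single signed singular value -1/2,
# in agreement with the worked example after Theorem 12.2.
print(signed_singular_values([1.0], [-1.0]))
```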
12.2.3 Zeros of Singular Vectors and a Bezout Equation

We begin now the study of the zero location of the numerator polynomials of singular vectors. This, of course, is the same as the study of the zeros of minimal-degree solutions of equation (12.14). The following proposition provides a lower bound on the number of zeros the minimal-degree solutions of (12.14) can have in the open right half-plane. However, we also show that the lower bound is sharp in one special case, and that leads to the solvability of a special Bezout equation. Using the connection between Hankel operators and model intertwining maps given in Theorem 11.37, together with the invertibility of intertwining maps given in Theorem 11.26, opens the road to a full duality theory.

Proposition 12.6. Let $\phi(z) = \frac{n(z)}{d(z)} \in RH^\infty_-$ be strictly proper, with $n(z)$ and $d(z)$ coprime polynomials with real coefficients.
1. Let $\mu_k$ be a singular value of $H_\phi$ satisfying
$$ \mu_1 \geq \cdots \geq \mu_{k-1} > \mu_k = \cdots = \mu_{k+\nu-1} > \mu_{k+\nu} \geq \cdots \geq \mu_n, $$
i.e., $\mu_k$ is a singular value of multiplicity $\nu$. Let $p_k(z)$ be the minimum-degree solution of (12.14) corresponding to $\mu_k$. Then the number of antistable zeros of $p_k(z)$ is $\geq k-1$.
2. If $\mu_n$ is the smallest singular value of $H_\phi$ and is of multiplicity $\nu$, i.e.,
$$ \mu_1 \geq \cdots \geq \mu_{n-\nu} > \mu_{n-\nu+1} = \cdots = \mu_n, $$
and $p_{n-\nu+1}(z)$ is the corresponding minimum-degree solution of (12.14), then all the zeros of $p_{n-\nu+1}(z)$ are antistable.
3. The following Bezout identity in $RH^\infty_+$ holds:
$$ \frac{n(z)}{d^*(z)}\,\frac{1}{\lambda_n}\,\frac{p_{n-\nu+1}(z)}{p_{n-\nu+1}^*(z)} - \frac{d(z)}{d^*(z)}\,\frac{1}{\lambda_n}\,\frac{\pi_{n-\nu+1}(z)}{p_{n-\nu+1}^*(z)} = 1. \qquad (12.22) $$
Proof. 1. From equation (12.14), i.e., $n(z)p_k(z) = \lambda_k d^*(z)p_k^*(z) + d(z)\pi_k(z)$, we get, dividing by $d(z)p_k(z)$,
$$ \frac{n(z)}{d(z)} - \frac{\pi_k(z)}{p_k(z)} = \lambda_k\, \frac{d^*(z)p_k^*(z)}{d(z)p_k(z)}, $$
which implies, of course, that
$$ \Bigl\| H_{\frac{n}{d}} - H_{\frac{\pi_k}{p_k}} \Bigr\| \leq \Bigl\| \frac{n}{d} - \frac{\pi_k}{p_k} \Bigr\|_\infty = \mu_k \Bigl\| \frac{d^* p_k^*}{d\, p_k} \Bigr\|_\infty = \mu_k. $$
This means, by the definition of singular values, that $\operatorname{rank} H_{\frac{\pi_k}{p_k}} \geq k-1$. But this implies, by Kronecker's theorem, that the number of antistable poles of $\frac{\pi_k(z)}{p_k(z)}$, which is the same as the number of antistable zeros of $p_k(z)$, is $\geq k-1$.
2. If $\mu_n$, the smallest singular value, has multiplicity $\nu$, and $p_{n-\nu+1}(z)$ is the minimal-degree solution of equation (12.14), then it has degree $n-\nu$. But by the previous part it must have at least $n-\nu$ antistable zeros. This implies that all the zeros of $p_{n-\nu+1}(z)$ are antistable.
3. From equation (12.14) we obtain, dividing by $\lambda_n d^*(z)p_{n-\nu+1}^*(z)$, the Bezout identity (12.22). Since the polynomials $p_{n-\nu+1}(z)$ and $d(z)$ are antistable, all four functions appearing in the Bezout equation (12.22) are in $RH^\infty_+$.

The previous result is extremely important from our point of view. It shifts the focus from the largest singular value, the starting point in all derivations so far, to the smallest singular value. Certainly, the derivation is elementary, inasmuch as we use only the definition of singular values and Kronecker's theorem. The great advantage is that at this stage we can solve an important Bezout equation that is the key to duality theory. We have now at hand all that is needed to obtain the optimal Hankel norm approximant corresponding to the smallest singular value. We shall delay this analysis to a later stage and develop duality theory first.

We will say that two inner product space operators $T : H_1 \longrightarrow H_2$ and $T' : H_3 \longrightarrow H_4$ are equivalent if there exist unitary operators $U : H_1 \longrightarrow H_3$ and $V : H_2 \longrightarrow H_4$ such that $VT = T'U$. Clearly, this is an equivalence relation.
Lemma 12.7. Let $T : H_1 \longrightarrow H_2$ and $T' : H_3 \longrightarrow H_4$ be equivalent. Then $T$ and $T'$ have the same singular values.

Proof. Let $T^*T x = \mu^2 x$. Since $VT = T'U$, it follows that $U^* T'^* T' U x = T^* V^* V T x = T^* T x = \mu^2 x$, or $T'^* T'(Ux) = \mu^2 (Ux)$.

The following proposition is bordering on the trivial, and no proof need be given. However, when applied to Hankel operators, it has far-reaching implications. In fact, it provides a key to duality theory and leads eventually to the proof of the central results.

Proposition 12.8. Let $T$ be an invertible linear transformation. If $x$ is a singular vector of the operator $T$ corresponding to the singular value $\mu$, i.e., $T^*T x = \mu^2 x$, then $T^{-1}(T^{-1})^* x = \mu^{-2} x$, i.e., $x$ is also a singular vector for $(T^{-1})^*$ corresponding to the singular value $\mu^{-1}$.

In view of this proposition, it is of interest to compute $[(H_\phi|H(m))^{-1}]^*$. Before proceeding with this, we compute the inverse of a related intertwining operator $T_\theta$ in $H(m)$. This is a special case of Theorem 11.26. Note that, since $\|T_\theta^{-1}\| = \mu_n^{-1}$, there exists, by Theorem 11.26, $\xi(z) \in RH^\infty_+$ such that $T_\theta^{-1} = T_\xi$ and $\|\xi\|_\infty = \mu_n^{-1}$. The next theorem provides this $\xi(z)$.
n(z) d(z)
∈ RH∞ − , with n(z) and d(z) coprime polynomials
n(z) ∞ with real coefficients and let m(z) = dd(z) ∗ (z) . Then θ (z) = d ∗ (z) ∈ RH+ and the operator Tθ : H(m) −→ H(m), defined by equation (11.34), is invertible and its inverse given by T 1 pn∗ , where λn is the last signed singular value of Hφ and pn (z) λn pn
is the minimal-degree solution of the FPE n(z)pn (z) = λn d ∗ (z)p∗n (z) + d(z)πn (z).
(12.23)
Proof. From the previous equation we obtain the Bezout equation n(z) d ∗ (z) which shows that
pn (z) p∗n (z)
1 pn (z) d(z) πn (z) − ∗ = 1. λn p∗n (z) d (z) λn p∗n (z)
∈ RH∞ + . This, by Theorem 11.26, implies the result.
(12.24)
We know, see Corollary 10.21, that stabilizing controllers are related to solutions of Bezout equations over RH∞ + . Thus we expect equation (12.22) to lead to a stabilizing controller. The next corollary is a result of this type.
12.2 Hankel Norm Approximation
Corollary 12.10. Let φ (z) =
373
n(z) d(z)
∈ RH∞ − , with n(z) and d(z) coprime polynomials
(z) stabilizes φ (z). If the multiplicity with real coefficients. The controller k(z) = πpnn(z) of μn is ν , there exists a stabilizing controller of degree n − ν .
Proof. Since pn (z) is antistable, we get from (12.14) that n(z)pn (z) − d(z)πn (z) = λn d ∗ (z)p∗n (z) is stable. We compute n(z) φ (z) −n(z)πn (z) −n(z)πn (z) d(z) = = ∈ RH∞ = +. pn (z) n(z) 1 − k(z)φ (z) n(z)pn (z) − d(z)πn (z) λn d ∗ (z)p∗n (z) 1− πn (z) d(z) Theorem 12.11. Let φ (z) =
∈ RH∞ − , with n(z) and d(z) coprime polynomials
with real coefficients. Let H
−→ X d be defined by H = Hφ | X d . Then
n(z) d(z) ∗ : Xd
∗
∗
1. H −1 : X d −→ X d is given by 1 d pn P− ∗ h. ∗ λn d pn
H −1 h =
(12.25)
∗
2. (H −1 )∗ : X d −→ X d is given by (H −1 )∗ f =
1 d ∗ p∗n P+ f . λn d pn
∗
(12.26)
∗
d −→ X d be the map given by T = mH n . Proof. 1. Let m(z) = dd(z) ∗ (z) and let T : X d Thus we have the following commutative diagram:
Xd
H
∗
@
- Xd
@ @ T @
m
@
@
@
@
@ R
? Xd
∗
374
12 Model Reduction
We compute Tf =
n d∗ n n n d d P P f = PH( d ) ∗ f = PX d ∗ ∗ f , f = − − ∗ ∗ ∗ ∗ d d d d d d d d
i.e., T = Tθ , where θ (z) = Tθ−1 = T 1
pn λn p∗n
. So for
H −1 h =
n(z) d ∗ (z) . Now h(z) ∈ X d ,
from Tθ = mH we have, by Theorem 12.9,
1 1 d 1 d pn d d ∗ pn d pn PH( d ) ∗ ∗ h = P− h= P− h. ∗ d ∗ pn d λn λn d d p∗n d ∗ λn d ∗ p∗n
2. The previous equation can also be rewritten as H −1 h = T 1
pn λn p∗n
mh. ∗
Therefore, using Theorem 11.25, we have, for f (z) ∈ X d , (H −1 )∗ f = m∗ (T 1
pn λn p∗n
)∗ f =
d∗ 1 d ∗ pn 1 p∗n P+ P+ ∗ f . f= d λ n pn λn d pn
Corollary 12.12. There exist polynomials αi (z), of degree ≤ n − 2, such that
λi p∗n (z)pi (z) − λn pn (z)p∗i (z) = λi d ∗ (z)αi (z), i = 1, . . . , n − 1. This holds also formally for i = n with αn (z) = 0. Proof. Since H dn
pi p∗ = λi i , ∗ d d
it follows that ∗ (H −1 n ) d
∗ pi −1 pi . = λ i d∗ d
So, using equation (12.26), we have 1 p∗i 1 d ∗ p∗n pi = P+ , λn d pn d ∗ λi d i.e.,
λn p∗i p∗n pi = P . + λi d ∗ pn d ∗
(12.27)
12.2 Hankel Norm Approximation
375
This implies, using a partial fraction decomposition, the existence of polynomials ∗ p∗ (z) pi (z) αi (z), i = 1, . . . , n, such that deg αi < deg pn = n−1, and ppnn (z) = λλn d ∗i (z) + αpni (z) , (z) d ∗ (z) (z) i i.e., (12.27) follows. We observed in Theorem 12.11 that for the Hankel operator Hφ , the map (Hφ−1 )∗ is not a Hankel map. However, there is an equivalent Hankel map. We sum this up in the following. Theorem 12.13. Let φ (z) =
∈ RH∞ − with n(z) and d(z) coprime polynomials
with real coefficients. Let H
−→ X d be defined by H = Hφ |X d . Then
n(z) d(z) ∗ : Xd
∗
1. The operator (Hφ−1 )∗ is equivalent to the Hankel operator H 1 has singular values
2. The Hankel operator H 1
d ∗ pn λn d p∗n pi (z)
p∗ (z)
Schmidt pairs are { d ∗i (z) ,
d(z)
μ1−1
d ∗ pn λn d p∗n
.
< · · · < μn−1 , and its
}.
Proof. 1. We saw, in (12.26), that ∗ (H −1 n ) = d
Since multiplication by
d ∗ (z) d(z)
d∗ ∗ T1 d λn
pn p∗n
. ∗
is a unitary map of X d onto X d , the operator
∗ ∗ (H −1 n ) has, by Lemma 12.7, the same singular values as T 1
pn λn p∗n
d
the same as those of the adjoint operator T 1 equivalent to the Hankel operator H 1 the all-pass function d∗ T1 d λn
pn p∗ n
f=
d∗ d
d ∗ pn λn d p∗n
pn λn p∗n
. These are
. However, the last operator is
. Indeed, noting that multiplication by ∗
is a unitary map from X d to X d , we compute
d∗ d∗ d 1 pn d ∗ 1 pn f = P f =H1 PH( d∗ ) − d d λn p∗n d d∗ d λn p∗n λn
d ∗ pn d p∗n
f.
2. Next, we show that this Hankel operator has singular values μ1−1 < · · · < μn−1 p∗ (z)
i (z) and its Schmidt pairs are { d ∗i (z) , pd(z) }. Indeed
H d ∗ pn d p∗ n
p∗i d ∗ pn p∗i pn p∗i = P = P . − − d∗ d p∗n d ∗ d p∗n
Now from equation (12.27), we get pn (z)p∗i (z) = or, taking the dual of that equation, pn (z)p∗i (z) =
λi λn
p∗n (z)pi (z) − λλni d ∗ (z)αi (z),
λn ∗ ∗ λi pn (z)pi (z) + d(z)αi (z).
So
pn (z)p∗i (z) λn p∗n (z)pi (z) d(z)αi∗ (z) λn pi (z) αi∗ (z) = + = + ∗ , d(z)p∗n (z) λi d(z)p∗n (z) d(z)p∗n (z) λi d(z) pn (z) which implies P−
pn p∗i d p∗n
=
λn p i λi d
and therefore
p∗i 1 ∗ λn H d p∗n d ∗ d pn
=
1 pi λi d .
376
12 Model Reduction
12.2.4 More on Zeros of Singular Vectors The duality results obtained previously allow us to complete our study of the zero structure of minimal-degree solutions of the fundamental polynomial equation (12.14). This in turn leads to an elementary proof of the central theorem in the AAK theory. Theorem 12.14 (Adamjan, Arov, and Krein). Let φ (z) = and d(z) coprime polynomials with real coefficients.
n(z) d(z)
∈ RH∞ − , with n(z)
1. Let μk be a singular value of Hφ satisfying
μ1 ≥ · · · ≥ μk−1 > μk = · · · = μk+ν −1 > μk+ν ≥ · · · ≥ μn , i.e., μk is a singular value of multiplicity ν . Let pk (z) be the minimum-degree solution of (12.14) corresponding to μk . Then the number of antistable zeros of pk (z) is exactly k − 1. 2. If μ1 is the largest singular value of Hφ and is of multiplicity ν , i.e.,
μ1 = · · · = μν > μν +1 ≥ · · · ≥ μn , and p1 (z) is the corresponding minimum-degree solution of (12.14), then all the (z) is outer. zeros of p1 (z) are stable. This is equivalent to saying that dp∗1 (z) Proof. 1. We saw, in the proof of Proposition 12.6, that the number of antistable zeros of pk (z) is ≥ k − 1. Now by Theorem 12.13, p∗k (z) is the minimum-degree solution of the fundamental equation corresponding to the transfer function −1 −1 1 d ∗ (z)pn (z) λn d(z)p∗ (z) and the singular value μk+ν −1 = · · · = μk . Clearly we have n
−1 −1 −1 −1 −1 μn−1 ≥ · · · ≥ μk+ ν > μk+ν −1 = · · · = μk > μk−1 ≥ · · · ≥ μ1 .
In particular, applying Proposition 12.6, the number of antistable zeros of p∗k (z) is ≥ n − k − ν + 1. Since the degree of p∗k (z) is n − ν , it follows that the number of stable zeros of p∗k (z) is ≤ k − 1. However, this is the same as saying that the number of antistable zeros of pk (z) is ≤ k − 1. Combining the two inequalities, it follows that the number of antistable zeros of pk (z) is exactly k − 1. 2. The first part implies that the minimum-degree solution of (12.14) has only stable zeros. We now come to apply some results of the previous sections to the case of Hankel norm approximation. We use here the characterization, obtained in Section 7.4, of singular values μ1 ≥ μ2 ≥ · · · of a linear transformation A : V1 −→ V2 as approximation numbers, namely
μk = inf{A − Ak | rank Ak ≤ k − 1}.
12.2 Hankel Norm Approximation
377
∞ We shall denote by RH∞ [k−1] the set of all rational functions in RL that have at most k − 1 antistable poles. n(z) Theorem 12.15 (Adamjan, Arov, and Krein). Let φ (z) = d(z) ∈ RH∞ − be a scalar strictly proper transfer function, with n(z) and d(z) coprime polynomials and d(z) monic of degree n. Assume that
μ1 ≥ · · · ≥ μk−1 > μk = · · · = μk+ν −1 > μk+ν ≥ · · · ≥ μn > 0 are the singular values of Hφ . Then μk = inf Hφ − A|rankA ≤ k − 1 = inf Hφ − Hψ |rankHψ ≤ k − 1
= inf φ − ψ ∞|ψ ∈ RH∞ [k−1] . Moreover, the infimum is attained on a unique function ψk (z) = φ (z) − (z) μk gf k (z) , k
(Hφ f k )(z) f k (z)
=
φ (z) − where { fk (z), gk (z)} is an arbitrary Schmidt pair of Hφ that corresponds to μk . Proof. Given ψ (z) ∈ RH∞ [k−1] , we have, by Kronecker’s theorem, that rankHψ = k − 1. Therefore we clearly have μk = inf Hφ − A|rankA ≤ k − 1 ≤ inf Hφ − Hψ |rankHψ ≤ k − 1
≤ inf φ − ψ ∞|ψ ∈ RH∞ [k−1] . Therefore, the proof will be complete if we can exhibit a function ψk (z) ∈ RH∞ [k−1] for which the equality μk = φ − ψ ∞ holds. To this end let pk (z) be the minimal-degree solution of (12.14), and define ψk (z) = polynomial equation
πk (z) pk (z) . From the fundamental
n(z)pk (z) = λk d ∗ (z)p∗k (z) + d(z)πk (z) we get, dividing by d(z)pk (z), that d ∗ (z)p∗k (z) n(z) πk (z) − = λk . d(z) pk (z) d(z)pk (z)
(12.28)
378
12 Model Reduction
This is, of course, equivalent to
ψk (z) = since for fk (z) =
p∗k (z) d(z)
d ∗ (z)p∗k (z) Hφ fk πk (z) n(z) , = − λk = φ (z) − pk (z) d(z) d(z)pk (z) fk we have Hφ fk = λk dp∗k (z) , and by Proposition 12.1, the ratio (z)
Hφ f k fk
is independent of the particular Schmidt pair. So from (12.28), we obtain the following norm estimate: d ∗ p∗k n πk φ − ψ ∞ = − ∞ = λ k ∞ = μk . d pk d pk Moreover,
πk (z) pk (z)
(12.29)
∈ RH∞ [k−1] , as pk (z) has exactly k − 1 antistable zeros.
Corollary 12.16. The polynomials πk (z) and pk (z), defined in Theorem 12.15, have no common antistable zeros. Proof. Follows from the fact that rank H πk ≥ k − 1.
pk
The error estimates (12.28) and (12.29) are important from the point of view of model reduction. We note that in the generic case of distinct singular values, the polynomials πk (z), pk (z) have degree n − 1, so the quotient πpk (z) , which is proper, (z) has 2n − 1 free parameters. From (12.28) it follows that
πk (z) pk (z)
k
interpolates the values
n(z) at the 2n − 1 zeros of d ∗ (z)p∗k (z). In the special case of k = n, the of φ (z) = d(z) ∗ polynomials d (z), p∗k (z) are both stable, so all interpolation points are in the open left half-plane. This observation is important, inasmuch as it places interpolation as a potential tool in model reduction. The choice of interpolation points is crucial for controlling the error. Our method not only establishes the optimal Hankel norm approximant but also gives a tight bound on the L∞ error.
12.2.5 Nehari’s Theorem Nehari’s theorem is one of the simplest model-reduction problems, i.e., the problem of approximating a given system by a system of smaller complexity. It states that if one wants to approximate an antistable function g(z) by a stable function, then the smallest error norm that can be achieved is precisely the Hankel norm of g(z). We are ready to give now a simple proof of Nehari’s theorem in our rational context. Theorem 12.17 (Nehari). Given a rational function φ (z) = d(z) = 1, then
n(z) d(z)
∈ RH∞ − and n(z)∧
12.2 Hankel Norm Approximation
379
σ1 = Hφ = inf{φ − q∞ | q(z) ∈ RH∞ + }, and this infimum is attained on a unique function q(z) = φ (z) − σ1 g(z) f (z) , where { f (z), g(z)} is an arbitrary σ1 -Schmidt pair of Hφ . Proof. Let σ1 = Hφ . It follows from equation (11.94) and the fact that for q(z) ∈ RH∞ + , we have Hq = 0, that
σ1 = Hφ = Hφ − Hq = Hφ −q ≤ φ − q∞, and so σ1 ≤ infq∈RH∞+ φ − q∞. To complete the proof we will show that there exists q(z) ∈ RH∞ + for which equality holds. We saw, in Theorem 12.14, that for σ1 = Hφ there exists a stable solution p1 (z) of n(z)p1 (z) = λ1 d ∗ (z)p∗1 (z) + d(z)π1 . Dividing this equation by d(z)p1 (z), we get d ∗ (z)p∗1 (z) n(z) π1 (z) − = λ1 . d(z) p1 (z) d(z)p1 (z) So, with q(z) =
π1 (z) p1 (z)
=
n(z) d(z)
− λ1
d ∗ (z)p∗1 (z) d(z)p1 (z)
∈ RH∞ + , we get
φ − q∞ = σ1 = Hφ .
12.2.6 Nevanlinna–Pick Interpolation We discuss now briefly the connection between Nehari’s theorem in the rational case and the finite Nevanlinna–Pick interpolation problem, which is described next. Definition 12.18. Given points λ1 , . . . , λs in the open right half-plane and complex numbers c1 , . . . , cs , then ψ (z) ∈ RH∞ + is a Nevanlinna–Pick interpolant if it is a function of minimum RH∞ norm that satisfies +
ψ (λi ) = ci ,
i = 1, . . . , s.
We can state the following. Theorem 12.19. Given the Nevanlinna–Pick interpolation problem of Definition 12.18, we define s
d(z) = ∏(z − λi ), i=1
(12.30)
380
12 Model Reduction
and let n(z) be the unique minimal-degree polynomial satisfying the interpolation constraints n(λi ) = d ∗ (λi )ci . Let p1 (z) be the minimal-degree solution of n(z)p1 (z) = λ1 d ∗ (z)p∗1 (z) + d(z)π1 (z) corresponding to the largest singular value σ1 . Then the Nevanlinna–Pick interpolant is given by p∗ (z) ψ (z) = λ1 1 . (12.31) p1 (z) Proof. Note that since the polynomial d(z), defined by (12.30), is clearly antistable, therefore d ∗ (z) is stable. We proceed to construct a special RH∞ + interpolant. Let n(z) be the unique polynomial, with deg n < deg d = s, that satisfies the following interpolation constraints: n(λi ) = d ∗ (λi )ci ,
i = 1, . . . , s.
(12.32)
This polynomial interpolant can be easily constructed by Lagrange interpolation or any other equivalent method. We note that since d ∗ (z) is stable, d ∗ (λi ) = 0 for ∞ i = 1, . . . , s and dn(z) ∗ (z) ∈ RH+ . Moreover, equation (12.32) implies n(λi ) = ci , d ∗ (λi )
i = 1, . . . , s,
(12.33)
n(z) ∞ d ∗ (z) is an RH+ interpolant. Any other interpolant is necessarily of the form n(z) d(z) ∞ d ∗ (z) − d ∗ (z) θ (z) for some θ (z) ∈ RH+ . To find infθ ∈RH∞+ dn∗ − dd∗ θ ∞ , is equivalent, noting that dd(z) ∗ (z) is inner, to finding infθ (z)∈RH∞+ dn − θ ∞ . However, this is just the content of Nehari’s theorem, and we
i.e.,
have
n inf ∞ − θ = σ1 . θ ∈RH+ d ∞ d ∗ (z)p∗ (z)
n(z) Moreover, the minimizing function is given by θ (z) = πp1 (z) = d(z) − λ1 d(z)p 1(z) . 1 (z) 1 Going back to the interpolation problem, we get for the Nevanlinna–Pick interpolant
ψ (z) =
d ∗ (z)p∗1 (z) p∗1 (z) n(z) d(z) d(z) θ (z) = λ λ − − = . 1 1 d ∗ (z) d ∗ (z) d ∗ (z) d(z)p1 (z) p1 (z)
12.2 Hankel Norm Approximation
381
12.2.7 Hankel Approximant Singular Values and Vectors Our aim in this section is, given a Hankel operator with rational symbol φ (z), to study the geometry of singular values and singular vectors corresponding to the Hankel operators with symbols equal to the best Hankel norm approximant and the Nehari complement. For the simplicity of exposition, in the rest of this chapter we n(z) ∈ RH∞ will make the genericity assumption that for φ (z) = d(z) − , all the singular values of H dn are simple, i.e., have multiplicity 1. In our approach, the geometric aspects of the Hankel norm approximation method become clear, for we shall see that the singular vectors of the approximant are the orthogonal projections of the initial singular vectors on the appropriate model space. They correspond to the original singular values except the smallest one. n(z) ∈ RH∞ Theorem 12.20. Let φ (z) = d(z) − , and let pi (z) be the minimal-degree solutions of n(z)pi (z) = λi d ∗ (z)p∗i (z) + d(z)πi (z).
Consider the best Hankel norm approximant πpnn (z) = (z) sponds to the smallest nonzero singular value. Then 1.
πn pn
n(z) d(z)
∗ (z)p∗ (z)
− λn dd(z)pnn(z) that corre-
∈ RH∞ − and H πn has the singuar values σi = |λi |, i = 1, . . . , n − 1, and the pn
α ∗ (z)
σi -Schmidt pairs of H πn are given by { αp∗i (z) , i }, where the polynomials αi (z) pn n (z) pn (z) are given by λi p∗n (z)pi (z) − λn pn (z)p∗i (z) = λi d ∗ (z)αi (z). (12.34) 2. Moreover, we have
αi pi = PX p∗n ∗ , ∗ pn d
(12.35)
i.e., the singular vectors of H πn are projections of the singular vectors of H dn onto pn
∗
X pn , the orthogonal complement of Ker H πn = ppn∗ RH2+ . n pn 3. We have
2 2 αi pi 2 = 1 − λn p∗ λi d ∗ . n
(12.36)
Proof. 1. Rewrite equation (12.27) as
λi So
λi
pi (z) pn (z)p∗i (z) αi (z) λ − = λi ∗ . n d ∗ (z) d ∗ (z)p∗n (z) pn (z)
πn (z) pi (z) πn (z) pn (z)p∗i (z) πn (z) αi (z) − λn = λi . ∗ pn (z) d (z) pn (z) d ∗ (z)p∗n (z) pn (z) p∗n (z)
(12.37)
382
12 Model Reduction
Projecting on RH2− , and recalling that pn (z) is antistable, we get P− which implies
pi (z) d ∗ (z)
−
αi (z) p∗n (z)
π n pi πn αi = P− , ∗ pn d pn p∗n
∈ Ker H πn . This is also clear from pn
pi (z) αi (z) λn pn (z) pi (z) pn − = ∈ RH2+ = Ker H πn . pn d ∗ (z) p∗n (z) λi p∗n (z) d ∗ (z) p∗n From
πn (z) n(z) d ∗ (z)pn (z)∗ (z) = − λn , pn (z) d(z) d(z)pn (z)
we compute H πn
pn
pi pi pi pi =H n = H dn ∗ −λn H d ∗ p∗n ∗ d ∗ p∗ ( d −λn d pnn ) d ∗ d∗ d d pn d = λi
p∗i d ∗ p∗n pi p∗ λn p∗n pi −λn P− = λi i −P− ∗ d d pn d d d pn
= λi
p∗i λn p∗n pi p∗ {λi pn p∗i − λid ∗ αi } − = λi i − d d pn d d pn
= λi
d αi∗ α∗ = λi i . d pn pn
Finally, we get H πn
pn
αi αi∗ = λ . i p∗n pn
2. Note that equation (12.37) can be rewritten as
αi (z) λn pn (z) p∗i (z) pi (z) = ∗ + . ∗ d (z) pn (z) λi p∗n (z) d ∗ (z) p∗ (z)
∗
(12.38)
Since d ∗i (z) ∈ RH2+ , this yields, by projecting on X pn = { pp∗n RH2+ }⊥ , equation n (12.35). 3. Follows from equation (12.38), using orthogonality and computing norms.
12.2 Hankel Norm Approximation
383
12.2.8 Orthogonality Relations We present next the derivation of some polynomial identities arising out of the singular value/singular vector equations. We proceed to interpret these relations as orthogonality relations between singular vectors associated with different singular values. Furthermore, the same equations provide useful orthogonal decompositions of singular vectors. Equation (12.34), in fact more general relations, could be derived directly by simple computations, and this we proceed to explain. Starting from the singular value equations, i.e., n(z)pi (z) = λi d ∗ (z)p∗i (z) + d(z)πi (z), n(z)p j (z) = λ j d ∗ (z)p∗j (z) + d(z)π j (z), we get 0 = d ∗ (z){λi p∗i (z)p j (z) − λ j pi (z)p∗j (z)} + d(z){πi (z)p j (z) − π j (z)pi (z)}. (12.39) Since d(z) and d ∗ (z) are coprime, there exist polynomials αi j (z), of degree ≤ n − 2, for which λi p∗i (z)p j (z) − λ j pi (z)p∗j (z) = d(z)αi j (z) (12.40) and πi (z)p j (z) − π j (z)pi (z) = −d ∗ (z)αi j (z). These equations have a very nice interpretation as orthogonality relations. In fact, we know that for any self-adjoint operator, eigenvectors corresponding to different eigenvalues are orthogonal. In particular, this applies to singular vectors. Thus, under our genericity assumption, we have for i = j, p
i , d∗
pj = 0. d ∗ RH2+
(12.41)
This orthogonality relation could be derived directly from the polynomial equations by contour integration in the complex plane. Indeed, equation (12.40) could be rewritten as ∗ αi j p∗i (z) p j (z) λ j pi (z) p j (z) − = ∗ . d(z) d ∗ (z) λi d ∗ (z) d(z) d (z) This equation can be integrated over the boundary of a half-disk of sufficiently large radius R, centered at origin and that lies in the right half-plane. Since d ∗ (z) is stable, the integral on the right-hand side is zero. A standard estimate, using the fact that deg p∗i p j ≤ 2n − 2, leads in the limit, as R → ∞, to p
j , d∗
λ j pi p j pi = , . d ∗ RH2+ λi d ∗ d ∗ RH2+
384
12 Model Reduction
This indeed implies the orthogonality relation (12.41). Equation (12.40) can be rewritten as p∗i (z)p j (z) =
λj 1 pi (z)p∗j (z) + d(z)αi j (z). λi λi
However, if we rewrite (12.40) as λ j pi (z)p∗j (z) = λi p∗i (z)p j (z) − d(z)αi j (z), then, λi λj
after conjugation, we get p∗i (z)p j (z) = two expressions leads to
λi λ j − λ j λi
pi (z)p∗j (z) =
pi (z)p∗j (z) − λ1 d ∗ (z)αi∗j (z). Equating the j
1 1 d(z)αi j (z) + d ∗ (z)αi∗j (z). λi λj
Putting j = i, we get αii (z) = 0. Otherwise, we have pi (z)p∗j (z) =
1
λi2 − λ j2
{λ j d(z)αi j (z) − λi d ∗ (z)αi∗j (z)}.
(12.42)
Conjugating this last equation and interchanging indices leads to pi (z)p∗j (z) =
1
λ j2 − λi2
{λ j d(z)α ji (z) − λi d ∗ (z)α ∗ji (z)}.
(12.43)
Comparing the two expressions leads to
α ji (z) = −αi j (z).
(12.44)
We continue by studying two special cases. For the case j = n we put αi = obtain
or
∗ αin λi
to
λi p∗n (z)pi (z) − λn pn (z)p∗i (z) = λi d ∗ (z)αi (z), i = 1, . . . , n − 1,
(12.45)
λi p∗i (z)pn (z) − λn pi (z)p∗n (z) = λi d(z)αi∗ (z), i = 1, . . . , n − 1.
(12.46)
From equation (12.39) it follows that πn (z)pi (z) − πi (z)pn (z) = λi d ∗ (z)αi∗ (z), or equivalently, πn∗ (z)p∗i (z) − πi∗ (z)p∗n (z) = λi d(z)αi (z). If we specialize now to the case i = 1, we obtain. πn (z)p1 (z)− π1 (z)pn (z) = λ1 d ∗ (z)α1∗ (z), which, after dividing through by p1 (z)pn (z) and conjugating, yields
πn∗ (z) π1∗ (z) d(z)α1 (z) − = λ1 ∗ . p∗n (z) p∗1 (z) p1 (z)p∗n (z)
(12.47)
12.2 Hankel Norm Approximation
385
Similarly, starting from equation (12.40) and putting i = 1, we get λ1 p∗1 (z)pi (z)− λi p1 (z)p∗i (z) = d(z)α1i (z). Putting also βi (z) = λ1−1 α1i (z), we get
λ1 p∗1 (z)pi (z) − λi p1 (z)p∗i (z) = λ1 d(z)βi (z),
(12.48)
and of course β1 (z) = 0. This can be rewritten as p∗1 (z)pi (z) −
λi p1 (z)p∗i (z) = d(z)βi (z), λ1
(12.49)
p1 (z)p∗i (z) −
λi ∗ p (z)pi (z) = d ∗ (z)βi∗ (z). λ1 1
(12.50)
or
This is equivalent to
p∗i (z) βi∗ (z) λi p∗1 (z) pi = + . d ∗ (z) p1 (z) λ1 p1 (z) d ∗
(12.51) p∗ (z)
We note that equation (12.51) is nothing but the orthogonal decomposition of d ∗i (z) p∗ relative to the orthogonal direct sum RH2+ = X p1 ⊕ 1 RH2+ . Therefore we have p1 ∗ 2 ∗ 2 2 βi pi |λi |2 2 pi 2 = pi − |λi | = 1 − . p1 d∗ |λ1 |2 d ∗ d∗ |λ1 |2
(12.52)
Notice that if we specialize equation (12.46) to the case i = 1 and equation (12.48) to the case i = n, we obtain the relation βn (z) = α1∗ (z).
12.2.9 Duality in Hankel Norm Approximation In the present subsection we will shed some light on intrinsic duality properties of the problems of Hankel norm approximation and Nehari extensions. Results strongly suggesting such an underlying duality have appeared previously. This, in fact, turns out to be the case, though the duality analysis is far from being obvious. There are three operations that we shall apply to a given, antistable, transfer function, namely, inversion of the restricted Hankel operator, taking the adjoint map and finally one-sided multiplication by unitary operators. The last two operations do not change the singular values, whereas the first operation inverts them. In the process, we will prove a result dual to Theorem 12.20. While this analysis, after leading to the form of the Schmidt pairs in Theorem 12.22, is not necessary for the proof, it is of independent interest, and its omission would leave all intuition out of the exposition.
386
12 Model Reduction
The analysis of duality can be summed up in the following scheme, which exhibits the relevant Hankel operators, their singular values, and the corresponding Schmidt pairs:
∗
H dn : X d −→ X d σ1> · · · >σn pi p∗i , d∗ d
H1
d ∗ pn λn d p∗n
σ1−1< · · · <σn−1 p∗i pi , d∗ d
ICE -
(n − 1)th approx. ?
0th approx. ? ∗
H1
H π1∗ : X p1 −→ X p1 p∗ 1
σ σn 2 > · · · > βi βi∗ , p1 p∗1
∗
: X d −→ X d
d ∗ α1∗ λn p∗ p∗n 1
ICE
∗
: X p1 −→ X p1
σ2−1< · · · <σn−1 βi∗ βi , p1 p∗1
We would like to analyze the truncation that corresponds to the largest singular value. To this end we invert (I) the Hankel operator H dn and conjugate (C) it, i.e., take its adjoint, as in Theorem 12.11. This operation preserves Schmidt pairs and inverts singular values. However, the operator so obtained is not a Hankel operator. This we correct by replacing it with an equivalent Hankel operator (E). This preserves singular values but changes the Schmidt pairs. Thus ICE in the previous diagram stands for a sequence of these three operations. To the Hankel operator so obtained, i.e., to H 1 d ∗ pn , we apply Theorem 12.20, which leads to the Hankel operator H1
d ∗ α1∗ λn p∗ p∗n 1
λn d p∗n
. This is done in Theorem 12.21. To this Hankel operator we apply again
the sequence of three operations ICE, and this leads to Theorem 12.22. We proceed to study this Hankel map. Theorem 12.21. For the Hankel operator H 1
d ∗ pn λn d p∗n
, the Hankel norm approximant
corresponding to the least singular value, i.e., to σ1−1 , is For the Hankel operator H 1 d ∗ α1∗ we have λn p∗ p∗n 1
∗ ∗ 1 d α1 λn p∗1 p∗n .
12.2 Hankel Norm Approximation
1. Ker H 1
d ∗ α1∗ λn p∗ p∗n 1 p∗1 p1
=
387
p∗1 RH2+ . p1
2. X p1 = { RH2+ }⊥ . 3. The singular values of H 1
d ∗ α1∗ λn p∗ p∗n 1
4. The Schmidt pairs of H 1
d ∗ α1∗ λn p∗ p∗n 1
are σ2−1 < · · · < σn−1 . β ∗ (z)
are { pi (z) , pβ∗i (z) }, where (z) 1
1
βi∗ p∗ = PX p1 ∗i , p1 d or
(12.53)
p∗ (z) λi p∗1 (z) pi (z) βi∗ (z) = ∗i − . p1 (z) d (z) λ1 p1 (z) d ∗ (z)
Proof. By Theorem 12.13, the Schmidt pairs for H 1
d ∗ pn λn d p∗n
p∗ (z)
i (z) are { d ∗i (z) , pd(z) }. There-
fore the best Hankel norm approximant associated to σ1−1 is, using also equation (12.45), 1 d ∗ (z)pn (z) 1 p1 (z) p∗1 (z) 1 d ∗ (z)pn (z) 1 d ∗ (z)p1 (z) − / = − λn d(z)p∗n (z) λ1 d(z) d ∗ (z) λn d(z)p∗n (z) λ1 d(z)p∗1 (z) =
1 d ∗ (z){λ1 d(z)α1∗ (z)} 1 d ∗(z) {λ1 pn (z)p∗1 (z) − λn p1 (z)p∗n (z)} = ∗ ∗ λ 1 λn d(z)p1 (z)pn (z) λ1 λn d(z)p∗1 (z)p∗n (z)
=
1 d ∗ (z)α1∗ (z) . λn p∗1 (z)p∗n (z)
1. Let f (z) ∈
p∗1 2 p1 RH+ ,
i.e., f (z) = P−
since both
d ∗ (z)α1∗ (z) p1 (z)p∗n (z)
p∗1 (z) p1 (z) g(z),
for some g(z) ∈ RH2+ . Then
1 1 d ∗ α1∗ p∗1 d ∗ α1∗ g = P− g = 0, ∗ ∗ λ n p1 pn p1 λn p1 p∗n
and g(z) are in RH2+ .
Conversely, let f (z) ∈ Ker H 1
d ∗ α1∗ λn p∗ p∗n 1
d∗α ∗
i.e., P− p∗ p1∗ f = 0. This implies, p∗1 (z) | 1 n
d ∗ (z)α1 (z) f (z). Now p1 (z) and d(z) are coprime, since the first polynomial is stable, whereas the second is antistable. This implies naturally the coprimeness of p∗1 (z) and d ∗ (z). Also we have
λ1 p∗1 (z)pn (z) − λn p1 (z)p∗n (z) = λ1 d(z)α1∗ (z).
388
12 Model Reduction
If p∗1 (z) and α1∗ (z) are not coprime, then by the previous equation, p∗1 (z) has a common factor with p1 (z)p∗n (z). However, p1 (z) and pn (z) are coprime, since the first is stable and the second antistable. So are p1 (z) and p∗1 (z), and for the same f (z) reason. Therefore, we must have that p∗ (z) is analytic in the right half-plane. So p∗
p1 (z) p∗1 (z)
1
f (z) ∈ RH2+ , i.e., f (z) ∈ p11 RH2+ , 2. Follows from the previous part. 3. This is a consequence of Theorem 12.20. 4. Follows also from Theorem 12.20, since the singular vectors of H 1
d ∗ α1∗ λn p∗ p∗n 1
p∗
are
given by PX p1 d ∗i . This can be computed. Indeed, starting from equation (12.48), we have λ1 p∗1 (z)pi (z) − λi p1 (z)p∗i (z) . βi (z) = λ1 d(z) We compute
λi βn (z)p∗i (z) − λn βi (z)p∗n (z) λ1 p∗1 (z)pn (z) − λn p1 (z)p∗n (z) p∗i (z) = λi λ1 d(z) λ1 p∗1 (z)pi (z) − λi p1 (z)p∗i (z) p∗n (z) − λn λ1 d(z) =
λ1 p∗1 (z) {λi p∗i (z)pn (z) − λn pi (z)p∗n (z)} λ1 d(z)
=
p∗1 (z) {λi d(z)αi∗ (z)}{λi p∗i (z)pn (z) − λn pi (z)p∗n (z)} d(z)
= λi p∗1 (z)αi∗ (z). Recalling that p∗ (z) λi p∗1 (z) pi (z) βi∗ (z) = ∗i − p1 (z) d (z) λ1 p1 (z) d ∗ (z)
(12.54)
and βn (z) = α1∗ (z), we have H
∗ ∗ 1 d α1 λn p∗ pn∗ 1
βi∗ 1 βi 1 − = P− ∗ ∗ p1 λi p∗1 p 1 pn p 1
1 1 ∗ ∗ ∗ d β n β i − p 1 pn β i . λn λi
Thus, it suffices to show that the last term is zero. From equation (12.50), we have λi d ∗ (z)βi∗ (z) = p1 (z)p∗i (z) − p∗1 (z)pi (z), λ1
12.2 Hankel Norm Approximation
389
hence
1 ∗ 1 ∗ ∗ d (z)βn (z)βi (z) − p1 (z)pn (z)βi (z) λn λi 1 λi 1 p1 (z)p∗i (z)βn (z) − pi (z)p∗1 (z)βn (z) = P− ∗ ∗ p1 (z)pn (z)p1 (z) λn λ1 1 − p1 (z)p∗n (z)βi (z) λi p1 1 λi ∗ ∗ ∗ = P− ∗ ∗ ( λ i p i β n − λ n pn β i ) − p i p 1 β n p 1 pn p 1 λ i λ n λ1 1 λi p1 ∗ ∗ ∗ λ1 λi p1 αi − pi p1 βn = P− ∗ ∗ p 1 pn p 1 λ i λ n λ1 1 λ1 λi p1 αi∗ − pi βn = 0, = P− ∗ p n p1 λ n λ1
1 P− ∗ p1 (z)p∗n (z)p1 (z)
since p1 (z)p∗n (z) is stable. Theorem 12.22. Let φ (z) = Let
π1 (z) p1 (z)
n(z) d(z)
∈ RH∞ − and n(z) ∧ d(z) = 1, i.e., d(z) is antistable.
be the optimal causal, i.e., RH∞ + , approximant, to
n(z) d(z) .
Then
n(z) − πp1 (z) is all-pass. 1. d(z) 1 (z) 2. The singular values of the Hankel operator H π1∗ are σ2 > · · · > σn , and the p∗1
corresponding Schmidt pairs of H π1∗ are p∗ 1
defined by
βi (z) =
β (z) β ∗ (z) { p i (z) , pi∗ (z) }, 1 1
where the βi (z) are
λ1 p∗1 (z)pi (z) − λi p1 (z)p∗i (z) . λ1 d(z)
(12.55)
Proof. 1. Follows from the equality n(z) π1 (z) d ∗ (z)p∗1 (z) − = λ1 . d(z) p1 (z) d(z)p1 (z) π ∗ (z)
π ∗ (z)
π ∗ (z)
α1 (z) ∞ n 2. We saw, in equation (12.47), that p1∗ (z) = pn∗ (z) − λ1 pd(z) ∗ (z)p∗ (z) . Since p∗ (z) ∈ RH− , n n n 1 1 the associated Hankel operator is zero. Hence H π1∗ = H−λ d α1 . Thus we have to
show that
p∗1
1 p∗ p∗ 1 n
390
12 Model Reduction
H−λ
d α1 1 p∗ p∗ 1 n
βi β∗ = λi i∗ . p1 p1
To this end, we start from equation (12.49), which, multiplied by α1 , yields
λi p1 (z)p∗i (z)α1 (z). λ1
(12.56)
d(z)α1 (z) βi (z) pi (z)α1 (z) λi α1 (z)p∗i (z) = − . p∗1 (z)p∗n (z) p1 (z) p1 (z)p∗n (z) λ1 p∗1 (z)p∗n (z)
(12.57)
d(z)α1 (z)βi (z) = p∗1 (z)pi (z)α1 (z) − This in turn implies
Since
pi (z)α1 (z) p1 (z)p∗n (z)
∈ H+2 , we have − λ1 P−
d α1 β i α1 p∗i = λ P . i − ∗ p1 p∗n p1 p∗1 p∗n
(12.58)
All we have to do is to obtain a partial fraction decomposition of the last term. To this end, we go back to equation (12.49), from which we get d(z)βi (z)pn (z) = p∗1 (z)pi (z)pn (z) −
λi p1 (z)p∗i (z)pn (z), λ1
λi d(z)βn (z)pi (z) = p∗1 (z)pn (z)pi (z) − p1 (z)p∗n (z)pi (z). λ1
(12.59)
Hence p1 (z) {λi p∗i (z)pn (z) − λn p∗n (z)pi (z)} λ1 p1 {λi d(z)αi∗ (z)} = λ1 (12.60)
d(z)(βn (z)pi (z) − d(z)βi (z)pn (z)) =
and
βn pi (z) − d(z)βi pn (z) =
λi p1 (z)αi∗ (z). λ1
(12.61)
Now
βn∗ (z)p∗i (z) − βi∗ (z)p∗n (z) = α1 (z)p∗i (z) − βi∗ (z)p∗n (z) = Dividing through by p∗1 (z)p∗n (z), we get
α1 (z)p∗i (z) βi∗ (z) λi αi (z) . = + , p∗1 (z)p∗n (z) p∗1 (z) λ1 p∗n (z)
λi ∗ p (z)αi (z). λ1 1
12.2 Hankel Norm Approximation
391 α p∗
and from this it follows that P− p∗1 p∗i = 1 n
− λ1 P−
βi∗ p∗1 .
Using equation (12.58), we have
d α1 βi β∗ = λi i∗ , ∗ ∗ p 1 pn p 1 p1
(12.62)
and this completes the proof.
There is an alternative way of looking at duality, and this is summed up in the following diagram:
∗
H dn : X d −→ X d σ1> · · · >σn pi p∗i , d∗ d
H1
d ∗ pn λn d p∗n
∗
: X d −→ X d
σ1−1< · · · <σn−1 p∗i pi , d∗ d
ICE -
(n − 1)th approx.
0th approx. ?
? ∗
H
H πn : X pn −→ X pn pn
σ1 > · · · > σn−1 αi αi∗ , p∗n pn
ICE
∗ 1 pn αn−1 λn−1 pn α ∗ n−1
∗
: X pn −→ X pn
−1 σ1−1< · · · < σn−1 ∗ αi αi , p∗n pn
-
We will not go into the details except for the following. Theorem 12.23. The Hankel operator values
σ1−1
< ··· <
−1 σn−1
1 ∗ λn−1 H pn αn−1
and Schmidt pairs
∗ pn αn−1
∗
: X pn −→ X pn has singular
α ∗ (z) { pi∗ (z) , αpni (z) }. (z) n
Proof. Starting from
πn (z)αi (z) = λi p∗n (z)αi∗ (z) + pn (z)ζi (z), πn (z)α j (z) = λ j p∗n (z)α ∗j (z) + pn (z)ζ j (z),
392
12 Model Reduction
we get 0 = p∗n (z){λi α j (z)αi∗ (z) − λ j αi (z)α ∗j (z)} + pn(z){α j (z)ζi (z) − αi (z)ζ j (z)}. ∗ (z) = λ p (z)κ (z), or For j = n − 1 we can write λi αn−1 (z)αi∗ (z) − λn−1 αi (z)αn−1 i n i
αn−1 (z)αi∗ (z) = pn (z)κi (z) +
λn−1 ∗ λi αi (z)αn−1 (z),
i.e.,
∗ (z) αi∗ (z) κi (z) λn−1 αn−1 αi (z) = + . pn (z) αn−1 (z) λi αn−1 (z) pn (z)
Thus we have 1
λn−1
H p∗n αn−1
∗ pn αn−1
αi∗ 1 1 = P− ∗ ∗ pn λn−1 pn αn−1 =
1
λn−1
P−
p n κi +
λn−1 ∗ αi αn−1 λi
1 1 αi κi αi + P− = . ∗ αn−1 λi pn λi pn
12.3 Model Reduction: A Circle of Ideas In this section we explore a circle of ideas related to model reduction, i.e., to the approximation of a large-scale linear system by a lower-order system, easier to compute with, but such that the approximation error is kept under control. In the spirit of this book, we will treat only the scalar case, i.e., the case of a SISO system. Recently, several different methods have been explored for the purpose of system approximation. These include, apart from Hankel norm approximation, approximation by interpolation, the application of the Sylvester equation, and truncation or projection methods of which balanced truncation is a special case. Not much effort has been invested in the study of the connections between the different methods. Our aim will be to show how the polynomial Sylvester equation can be used to illuminate the interrelation between the different methods. We will show how the polynomial Sylvester equation can be used directly for model-reduction purposes using either the interpolation method or the projection method. We will show also how the important methods of Hankel norm approximation and balanced truncation relate to the polynomial Sylvester equation.
12.3.1 The Sylvester Equation and Interpolation The simplest case of model reduction by interpolation is that of approximation at ∞ up to a certain order. Given a strictly proper rational function g(z) having the g expansion g(z) = ∑∞j=1 z jj at ∞, we look for a lower-degree strictly proper rational function g(z) that matches the Markov parameters of g(z), i.e., the coefficients {g j }∞j=1 , up to a certain order. A realization of g(z) is called a partial realization of g(z).
12.3 Model Reduction: A Circle of Ideas
393
Theorem 12.24. Let g(z) = p(z)/q(z) be stricly proper with q(z), p(z) ∈ F[z] coprime polynomials. Let q(z), p(z) ∈ R[z] be the unique solution of the polynomial Bezout–Sylvester equation q(z)p(z) − p(z)q(z) + 1 = 0
(12.63)
that satisfies the degree constraints deg q < deg q,
deg p < deg p.
(12.64)
Then a realization of g(z) = p(z)/q(z) is a partial realization of g(z) that approximates g(z) at ∞ up to order n + n. Proof. The Bezout–Sylvester equation (12.63) implies, for the error transfer function, p(z) p(z) 1 e(z) = g(z) − g(z) = − = . q(z) q(z) q(z)q(z) Since deg q(z) = n and deg q(z) = n, this implies that e(z) has a zero of order n + n at ∞. Hence g(z) interpolates the value of g(z) and its first n + n − 1 derivatives at ∞. It has been known since antiquity that the Bezout–Sylvester equation (12.63) can be solved using the Euclidean algorithm, i.e., by recursively using the division rule of polynomials. The approximant can be obtained by use of the Lanczos polynomials, introduced in Chapter 8, which are the orthogonal polynomials corresponding to a Hankel matrix associated with g(z). So far, we have used only the Bezout–Sylvester equation. If we replace it by the polynomial Sylvester equation, namely by q(z)p(z) − p(z)q(z) + r(z) = 0,
(12.65)
where we assume ρ = deg r < deg q, then again there exists a unique solution that satisfies the degree constraints (12.64). Thus for the error transfer function we obtain e(z) = g(z) − g(z) =
r(z) p(z) p(z) − = . q(z) q(z) q(z)q(z)
(12.66)
This shows that g(z) can be obtained by interpolating the values of g(z), and its derivatives up to appropriate orders, depending on the zeros of r(z), as well as at ∞ to the order of n + n − ρ .
394
12 Model Reduction
12.3.2 The Sylvester Equation and the Projection Method Another approach to model reduction is to project the system on a subspace of the state space. The projection, in general, is an oblique projection. Proposition 12.25. Given a transfer function g(z), of McMillan degree n, having the minimal realization g(z) = D +C(zI − A)−1 B, in the state space X , let X be a linear space of lower dimension. Let W : X −→ X be surjective and Y : X −→ X injective linear maps for which we have W Y = I.
(12.67)
A = W AY , B = W B, C = CY , D = D.
(12.68)
Define linear maps A, B,C,D by
Then we have 1. The map P : X −→ X , defined by P=Y W,
(12.69)
Ker P = Ker W , Im P = Im Y .
(12.70)
is a projection. 2. We have
3. We have the direct sum decomposition X = Ker W = Ker W ⊕ Im Y .
(12.71)
4. Let { f 1 , . . . , fn−k } be a basis for Ker W and {e1 , . . . , ek } a basis for X . Then { f 1 , . . . , fn−k , Y e1 , . . . , Y ek } is a basis for X , and with respect to these bases, we have the matrix representations 0 W = 0I , Y = . (12.72) I 5. If
A=
A11 A12 A21 A22
, B=
B1 B2
, C = C1 C2
(12.73)
are the matrix representations with respect to the above bases, then we have A = A11 , B = B1 , C = C1 .
(12.74)
12.3 Model Reduction: A Circle of Ideas
395
Proof. 1. We compute, using (12.67), P2 = (Y W )(Y W ) = Y (W Y )W = Y W = P, i.e., P = Y W is a projection. 2. The factorization (12.69) implies the inclusion Ker W ⊂ Ker P. The injectivity of Y proves the opposite inclusion. Similarly, from (12.69), we conclude that Im P ⊂ Im Y . The surjectivity of W shows the opposite inclusion. 3. Follows from (12.70). 4. That {Y e1 , . . . , Y ek } is a linearly independent set is a consequence of the injectivity of Y . That it spans Im Y is immediate from the fact that {e1 , . . . , ek } spans X . We note that W f j = 0 for j = 1, . . . , n − k, while, using (12.67), W (Y ei ) = ei for i = 1, . . . , k. The representation of Y is immediate. 5. Follows from the matrix representations (12.72) and (12.73). A B is a The previous proposition shows that the approximating system C D A B . It goes without saying that the properties projection of the original system C D
of the approximating system depend on the choice of the maps W , Y . Unless we get good bounds for the error, the projection method remains but a formal procedure. Our next result in this section is the clarification of the connection between the solution of the polynomial Sylvester equation (12.63) and the projection method. This we achieve by interpreting the polynomial data from a geometric point of view. As usual, the interpretation uses polynomial models and the shift realization. Furthermore, the maps W , Y , used in Proposition 12.25, are defined using the polynomials p(z), q(z), obtained in solving (12.63). With g(z) = p(z)/q(z) we associate the shift realization, in the state space Xq , ⎧ ⎨ A = Sq , Σq−1 p := Bα = pα , α ∈ F, ⎩ −1 C f = (q f )−1 ,
(12.75)
whereas with g(z) = p(z)/q(z), we associate the shift realization
Σ pq−1
⎧ ⎨ A = Sq , := Bα = α , α ∈ F, ⎩ Cg = (pq−1 g)−1 ,
(12.76)
with both realizations constructed in Theorem 10.13. Theorem 12.26. Let q(z), p(z) ∈ F[z] be coprime polynomials. Let q(z), p(z) ∈ F[z] be the unique solution of the polynomial Sylvester equation (12.63) that satisfies
396
12 Model Reduction
the degree constraints (12.64). For the realizations defined by (12.75) and (12.76), define linear maps Y : Xq −→ Xq and W : Xq −→ Xq as follows: Y = p(Sq )πq , W = πq q(Sq ).
(12.77)
With such choices, W is surjective, Y injective, and equations (12.63) and (12.68) are satisfied. Proof. Since as sets, we have Xq ⊂ Xq , the projection πq : Xq −→ Xq is clearly injective. The polynomial Sylvester equation (12.63) implies the coprimeness of q(z) and q(z). So by Theorem 5.18, q(Sq ) is invertible. Since πq : Xq −→ Xq is surjective, the surjectivity of W follows. We compute, for g(z) ∈ Xq and using (12.63), W Y g = πq q(Sq )p(Sq )πq g = πq πq qpπq g = πq πq (1 + qp)g = πq g = g. This proves (12.67). To prove equations (12.68), we begin by computing, with g ∈ Xq , W AY g = πq q(Sq )Sq p(Sq )πq g = πq πq zqpg = πq πq z(1 + pq)g = πq πq zg = Sq g = Ag. Here we used the fact that since degq < deg q, we have for g(z) ∈ Xq that zg(z) ∈ Xq , and hence πq zg = zg. Similarly, W Bα = πq πq qπq pα = πq πq qpα = πq πq (1 + qp)α = πq α = α = Bα . Finally, using q(z)−1 p(z) = p(z)q(z)−1 + q(z)−1 q(z)−1 , which is a consequence of the polynomial Sylvester equation (12.63), we compute CY g = (q−1 πq pg)−1 = (q−1 qπ− q−1 pg)−1 = (q−1 pg)−1 = ((pq−1 + q−1q−1 )g)−1 = (pq−1 g)−1 = Cg. The polynomial Sylvester equation (12.63) is, under the isomorphism Xq(z)q(w) Hom F (Xq , Xq ),
(12.78)
equivalent to a standard Sylvester equation. This is summed up by the following proposition.
12.4 Exercises
397
Proposition 12.27. Let q(z), p(z) ∈ F[z] be coprime polynomials. Let q(z), p(z) ∈ F[z] be the unique solutions of the polynomial Sylvester equation (12.63) that satisfy the degree constraints (12.64). Define y(z, w) ∈ Xq(z)q(w) by y(z, w) =
q(z)p(w) − p(z)q(w) + 1 , z−w
(12.79)
and define Y : Xq −→ Xq by Y g = g(·), y(z, ·),
g ∈ Xq .
(12.80)
Let E : Xq −→ Xq be defined, for g(z) ∈ Xq , by E g = g, 1 = (q−1 g)−1 = ξ g .
(12.81)
Then Y is a solution of the Sylvester equation S q Y − Y Sq = E .
(12.82)
Proof. In view of equation (12.63), the polynomials q(z), q(z) are coprime; hence equation (12.82) is solvable and the solution is unique. To see that Y , given by (12.80), is that solution, we compute (Sq Y − Y Sq )g = Sq p(Sq )Iq,q g − p(Sq)Iq,q Sq g = p(Sq ) Sq g − Iq,q Sq g = p(Sq ) (zg − qξg ) − (zg − qξ g ) = p(Sq )qξ g = πq pqξ g = qπ− q−1 pqξ g = qπ− (p + q−1 )ξ g = πq ξ g = ξ g = E g. Here we used the polynomial Sylvester equation (12.63), equation (12.81), and the representation Sq g = zg − q(z)ξg of the shift action.
12.4 Exercises 1. Define the map J : RL2 −→ RL2 by J f (z) = f ∗ (z) = f (−z). Clearly this is a unitary map in RL2 , and it satisfies JRH2± = RH2∓ . Let Hφ be a Hankel operator. a. Show that JP− = P+ J, and JHφ = Hφ∗ J | RH2+ . b. If { f , g} is a Schmidt pair of Hφ , then {J f , Jg} is a Schmidt pair of Hφ∗ . c. Let σ > 0. Show that
398
12 Model Reduction
i. The map Uˆ : RH2+ −→ RH2+ defined by 1 Uˆ f = H ∗ J f σ is a bounded linear operator in RH2+ ˆ ii. Ker (H ∗ H − σ 2 I) is an invariant subspace for U. ∗ 2 ∗ iii. The map U : Ker (H H − σ I) −→ Ker (H H − σ 2 I) defined by U = Uˆ | Ker (H ∗ H − σ 2 I) satisfies U = U ∗ = U −1 . iv. Defining I +U I −U , K− = , K+ = 2 2 show that K± are orthogonal projections and Ker (I − U) = Im K+ , Ker (I + U) = Im K− , and
Ker (H ∗ H − σ 2 I) = Im K+ ⊕ ImK− .
Also U = K+ − K− , which is the spectral decomposition of U, i.e., U is a signature operator. 2. Let σ be a singular value of the Hankel operator Hφ , let J be defined as before, and let p(z) be the minimal-degree solution of (12.14). Assume deg d = n and deg p = m. If ε = σλ , prove the following:
n−m+1 , 2
n−m , 2
dim Ker (Hφ − λ J) = dim Ker (Hφ + λ J) =
dim Ker (Hφ∗ Hφ − σ 2 I) = n − m, dim Ker (Hφ − λ J) − dimKer (Hφ + λ J) = 3. A minimal realization
A B C D
0, n − m even, 1, n − m odd.
of an asymptotically stable (antistable) transfer
function G(z) is called balanced if there exists a diagonal matrix diag (σ1 , . . . , σn ) such that ˜ AΣ + Σ A˜ = −BB, ˜ ˜ AΣ + Σ A = −CC
12.4 Exercises
399
(with the minus signs removed in the antistable case). The matrix Σ is called the gramian of the system and its diagonal entries are called the system singular n(z) ∈ RH∞ values. Let φ (z) = d(z) − . Assume all singular values of H dn are distinct. Let pi (z) be the minimal-degree solutions of the FPE, normalized so that dp∗i 2 = σi . Show that a. The system singular values are equal to the singular values of H dn . b. The function φ (z) has a balanced realization of the form ε j bi b j , λi + λ j B = (b1 , . . . , bn )˜, C = (ε1 b1 , . . . , εn bn ), D = φ (∞),
A=
with bi = (−1)n εi pi,n−1 , ci = (−1)n−1 pi,n−1 = −εi bi . c. The balanced realization is sign-symmetric. Specifically, with εi = diag (ε1 , . . . , εn ), we have
λi σi
and J =
˜ ˜ JA = AJ, JB = C. d. Relative to a conformal block decomposition
Σ= we have
BB˜ =
Σ1 0 0 Σ2
,
A=
A11 A12 , A21 A22
A11 Σ1 + Σ1 A˜ 11 A12 Σ2 + Σ1 A˜ 21 . A21 Σ1 + Σ2 A˜ 12 A22 Σ2 + Σ2 A˜ 22
e. With respect to the constructed balanced realization, we have the following representation: p∗i (z) = C(zI − A)−1ei . d(z) 4. Let φ (z) =
n(z) d(z)
∈ RH∞ − and let
π1 (z) p1 (z)
of Theorem 12.17 be the Nehari extension of
With respect to the balanced realization of φ (z), given in Exercise 3, show AN BN admits a balanced realization that πp1 (z) , with (z) n(z) . d(z)
1
CN DN
400
12 Model Reduction
⎧ ε j μi μ j b i b j ⎪ ⎪ AN = − , ⎪ ⎪ ⎪ λi + λ j ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ B = (μ b , . . . , μ b ), N n n 2 2 ⎪ ⎪ ⎪ ⎪ ⎪ CN = (μ2 ε2 b2 , . . . , μn εn bn ), ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ DN = λ1 , where
μi =
λ1 − λi λ1 + λi
for i = 2, . . . , n.
12.5 Notes and Remarks Independently of Sarason’s work on H ∞ interpolation, Krein and his students undertook a detailed study of Hankel operators, motivated by classical extension and approximation problems. This was consolidated in a series of articles that became known as AAK theory; see Adamjan, Arov, and Krein (1968a,b, 1971, 1978). The relevance of AAK theory to control problems was recognized by J.W. Helton and P. Dewilde. It was immediately and widely taken up, not least due to the influence of the work of G. Zames, which brought a resurgence of frequency-domain methods. A most influential contribution to the state-space aspects of the Hankel norm approximation problems was given in the classic paper Glover (1984). The content of this chapter is based mostly on Fuhrmann (1991, 1994). Young (1988) is a very readable account of elementary Hilbert space operator theory and contains an infinite-dimensional version of AAK theory. Nikolskii (1985) is a comprehensive study of the shift operator. The result we refer to in this chapter as Kronecker’s theorem was not proved in this form. Actually, Kronecker proved that an infinite Hankel matrix has finite rank if and only if its generating function is rational. Hardy spaces were introduced later. For more on the partial realization problem, we refer the reader to Gragg and Lindquist (1983). An extensive monograph on model reduction is Antoulas (2005). We note that Hankel norm approximation is related to a polynomial Sylvester equation and the optimal approximant determined by interpolation. However, we have to depart from the convention adopted before of taking the solution of the Sylvester equation that satisfies the degree constraints (12.64). We proceed to explain the reason for this. It has been stated in the literature, see for example Sorensen and Antoulas (2002) or Gallivan, Vandendorpe, and Van Dooren (2003), that for model reduction we may assume that if the transfer function of the system is strictly proper, then so is the approximant. Our analysis of Hankel norm
12.5 Notes and Remarks
401
approximation shows that this is obviously false when norm bounds for the error function are involved. In fact, in the Hankel norm approximation case, the error function turns out to be a constant multiple of an all-pass function, so the optimal approximant cannot be strictly proper. A simple example is g(z) = 2(z − 1)−1 ∈ RH∞ − with g∞ = 2. A 0th-order approximant then is −1 with e∞ = 1: e∞ = 2(z − 1)−1 − (−1)∞ = (z + 1)(z − 1)−1∞ = 1. An important, SVD-related, method of model reduction is balanced truncation, introduced in Moore (1981), see also Glover (1984). Exercise 3 gives a short description of the method. We indicate, again in the scalar rational case, how balanced truncation reduces to an interpolation result. Assume g(z) = n(z)/d(z) is a strictly proper antistable with n(z), d(z) coprime, and having a function, minimal realization g(z) =
A b c 0
. It has been shown that there exists a state-space
isomorphism that brings the realization into balanced form, namely one for which the Lyapunov equations AX + XA∗ = bb∗ , (12.83) A∗ X + XA = c∗ c, have a common diagonal and positive solution Σ = diag (σ1 , . . . , σn ). The σi , decreasingly ordered, are the Hankel singular values of g(z). If the set of singular values σi satisfies σ1 ≥ · · · ≥ σk >> σk+1 ≥ · · · ≥ σn , then we write accordingly Σ = diag (σ1 , . . . , σn ) = diag (σ1 , . . . , σk ) ⊕ diag (σk+1 , . . . , σn ) and partition the realization conformally to get ⎛
⎞ A11 A12 b1 g(z) = ⎝ A21 A22 b1 ⎠. c1 c2 0 By truncation, or equivalently by projection, of this realization, the transfer function gb (z) =
A11 b1 c1 0
is obtained. It is easy to check that gb (z) is antistable too and
gb (z) = nb (z)/db (z) serves as an approximation of g(z) for which an error estimate is available. In fact, if we partition Σ = diag (σ1 , . . . , σn ) as Σ = diag (σ1 , . . . , σn−1 ) ⊕ σn , and take p∗n (z)/d ∗ (z) to be the Hankel singular vector corresponding to the singular value σn , then db (z) is antistable too. It can be shown, see Fuhrmann (1991, 1994) for the details, that the Glover approximation error satisfies p∗ (z) e(z) = g(z) − gb (z) = λn n pn (z)
d ∗ (z) db∗ (z) + , d(z) db (z)
(12.84)
which implies e∞ ≤ 2σn . Furthermore, we have db (z)n(z) − d(z)nb (z) = −εn (p∗n (z))2 ,
(12.85)
402
12 Model Reduction
which implies the error estimate e(z) = εn
(p∗n (z))2 . d(z)db (z)
(12.86)
Equation (12.86) shows that the balanced truncation that corresponds to the smallest singular value can be obtained by second-order interpolation at the zeros of p∗n (z).
References
Adamjan, V.M., Arov, D.Z., and Krein, M.G. (1968a) “Infinite Hankel matrices and generalized problems of Carath´eodory-Fej´er and F. Riesz,” Funct. Anal. Appl. 2, 1–18. Adamjan, V.M., Arov, D.Z., and Krein, M.G. (1968b) “Infinite Hankel matrices and generalized problems of Carath´eodory-Fej´er and I. Schur,” Funct. Anal. Appl. 2, 269–281. Adamjan, V.M., Arov, D.Z., and Krein, M.G. (1971) “Analytic properties of Schmidt pairs for a Hankel operator and the generalized Schur-Takagi problem,” Math. USSR Sbornik 15, 31–73. Adamjan, V.M., Arov, D.Z., and Krein, M.G. (1978) “Infinite Hankel block matrices and related extension problems,” Amer. Math. Soc. Transl., series 2, Vol. 111, 133–156. Antoulas, A.C. (2005) Approximation of Large Scale Dynamical Systems, SIAM, Philadelphia. Axler, S. (1995) “Down with determinants,” Amer. Math. Monthly, 102, 139–154. Beurling, A. (1949) “On two problems concerning linear transformations in Hilbert space,” Acta Math. 81, 239–255. Brechenmacher, F. (2007) “Algebraic generality vs arithmetic generality in the controversy between C. Jordan and L. Kronecker (1874).” http://arxiv.org/ftp/arxiv/papers/0712/0712.2566. pdf. Carleson, L. (1962) “Interpolation by bounded analytic functions and the corona problem,” Ann. of Math. 76, 547–559. Davis, P.J. (1979) Circulant Matrices, J. Wiley, New York. Douglas, R.G., Shapiro, H.S. & Shields, A.L. (1971) “Cyclic vectors and invariant subspaces for the backward shift,” Ann. Inst. Fourier, Grenoble 20, 1, 37–76. Dunford, N. and Schwartz, J.T. (1958) Linear Operators, Part I, Interscience, New York. Dunford, N. and Schwartz, J.T. (1963) Linear Operators, Part II, Interscience, New York. Duren, P. (1970) Theory of H p Spaces, Academic Press, New York. Fuhrmann, P.A. (1968a) “On the corona problem and its application to spectral problems in Hilbert space,” Trans. Amer. Math. Soc. 132, 55–67. Fuhrmann, P.A. (1968b) “A functional calculus in Hilbert space based on operator valued analytic functions,” Israel J. Math. 6, 267–278. Fuhrmann, P.A. (1975) “On Hankel operator ranges, meromorphic pseudo-continuation and factorization of operator valued analytic functions,” J. Lon. Math. Soc. (2) 13, 323–327. Fuhrmann, P.A. (1976) “Algebraic system theory: An analyst’s point of view,” J. Franklin Inst. 301, 521–540. Fuhrmann, P.A. (1977) “On strict system equivalence and similarity,” Int. J. Contr. 25, 5–10. Fuhrmann, P.A. (1981) Linear Systems and Operators in Hilbert Space, McGraw-Hill, New York. Fuhrmann, P.A. (1981b) “Polynomial models and algebraic stability criteria,” Proceedings of Joint Workshop on Synthesis of Linear and Nonlinear Systems, Bielefeld June 1981, 78–90. Fuhrmann, P.A. (1991) “A polynomial approach to Hankel norm and balanced approximations,” Lin. Alg. Appl. 146, 133–220. P.A. Fuhrmann, A Polynomial Approach to Linear Algebra, Universitext, DOI 10.1007/978-1-4614-0338-8, © Springer Science+Business Media, LLC 2012
403
404
References
Fuhrmann, P.A. (1994) “An algebraic approach to Hankel norm approximation problems,” in Differential Equations, Dynamical Systems, and Control Science, the L. Markus Festschrift, Edited by K.D. Elworthy, W.N. Everitt, and E.B. Lee, M. Dekker, New York, 523–549. Fuhrmann, P.A. (1994b) “A duality theory for robust control and model reduction,” Lin. Alg. Appl. 203–204, 471–578. Fuhrmann, P.A. (2002) “A study of behaviors,” Lin. Alg. Appl. (2002), 351–352, 303–380. Fuhrmann, P.A. and Helmke, U. (2010) “Tensored polynomial models,” Lin. Alg. Appl. (2010), 432, 678–721. Gallivan, K., Vandendorpe, A. and Van Dooren, P. (2003) ”Model reduction via truncation: an interpolation point of view,” Lin. Alg. Appl. 375, 115–134. Gantmacher, F.R. (1959) The Theory of Matrices, Chelsea, New York. Garnett, J.B. (1981) Bounded Analytic Functions, Academic Press, New York. Glover, K. (1984) “All optimal Hankel-norm approximations and their L∞ -error bounds,” Int. J. Contr. 39, 1115–1193. Glover, K. (1986) “Robust stabilization of linear multivariable systems, relations to approximation,” Int. J. Contr. 43, 741–766. Gohberg, I.C. and Krein, M.G. (1969) Introduction to the Theory of Nonselfadjoint Operators, Amer. Math. Soc., Providence. Gohberg, I.C. and Semencul, A.A. (1972) “On the inversion of finite Toephtz matrices and their continuous analogs” (in Russian), Mat. Issled. 7, 201–233. Gragg, W.B. and Lindquist, A. (1983) “On the partial realization problem,” Lin. Alg. Appl. 50, 277–319. Grassmann, H. (1844) Die Lineale Ausdehnungslehre, Wigand, Leipzig. Halmos, P.R. (1950) “Normal dilations and extensions of operators,” Summa Brasil. 2, 125–134. Halmos, P.R. (1958) Finite-Dimensional Vector Spaces, Van Nostrand, Princeton. Helmke, U. and Fuhrmann, P. A. (1989) “Bezoutians,” Lin. Alg. Appl. 122–124, 1039–1097. Helson, H. (1964) Lectures on Invariant Subspaces, Academic Press, New York. Hermite, C. (1856) “Sur le nombre des racines d’une equation algebrique comprise entre des limites donn´es,” J. Reine Angew. Math. 52, 39–51. Higham, N.J. (2008) “Cayley, Sylvester, and early matrix theory,” Lin. Alg. Appl. 428, 39–43. Hoffman, K. (1962) Banach Spaces of Analytic Functions, Prentice Hall, Englewood Cliffs. Hoffman, K. and Kunze, R. (1961) Linear Algebra, Prentice-Hall, Englewood Cliffs. Householder, A.S. (1970) “Bezoutians, elimination and localization,” SIAM Review 12, 73–78. Hungerford, T.W. (1974) Algebra, Springer-Verlag, New York. ¨ Hurwitz, A. (1895) “Uber die bedingungen, unter welchen eine Gleichung nur Wurzeln mit negativen reelen Teilen besitzt,” Math. Annal. 46, 273–284. Joseph, G.G. (2000) The Crest of the Peacock: Non-European Roots of Mathematics, Princeton University Press. Kailath, T. (1980) Linear Systems, Prentice-Hall, Englewood Cliffs. Kalman, R.E. (1969) “Algebraic characterization of polynomials whose zeros lie in algebraic domains,” Proc. Nat. Acad. Sci. 64, 818–823. Kalman, R.E. (1970) “New algebraic methods in stability theory,” Proceeding V. International Conference on Nonlinear Oscillations, Kiev. Kalman R.E., Falb P., and Arbib M. (1969) Topics in Mathematical System Theory, McGraw Hill. Kravitsky N. (1980) “On the discriminant function of two noncommuting nonselfadjoint operators,” Integr. Eq. and Oper. Th. 3, 97–124. Krein, M.G. and Naimark, M.A. (1936) “The method of symmetric and Hermitian forms in the theory of the separation of the roots of algebraic equations,” English translation in Linear and Multilinear Algebra 10 (1981), 265–308. Lander, F. I. 
(1974) “On the discriminant function of two noncommuting nonselfadjoint operators,” (in Russian), Mat. Issled. IX(32), 69–87. Lang, S. (1965) Algebra, Addison-Wesley, Reading. Lax, P.D. and Phillips, R.S. (1967) Scattering Theory, Academic Press, New York.
References
405
Livsic, M.S. (1983) “Cayley–Hamilton theorem, vector bundles and divisors of commuting operators,” Integr. Eq. and Oper. Th. 6, 250–273. Liapunov, A.M. (1893) “Problem`e g´en´eral de la stabilit´e de mouvement”, Ann. Fac. Sci. Toulouse 9 (1907), 203–474. (French translation of the Russian paper published in Comm. Soc. Math. Kharkow). Mac Lane, S. and Birkhoff, G. (1967) Algebra, Macmillan, New York. Magnus, A. (1962) “Expansions of power series into P-fractions,” Math. Z. 80, 209–216. Malcev, A.I. (1963) Foundations of Linear Algebra, W.H. Freeman & Co., San Francisco. Maxwell, J.C. (1868) “On governors,” Proc. Roy. Soc. Ser. A, 16, 270–283. Moore, B.C. (1981) “Principal component analysis in linear systems: Controllability, observability and model reduction,” IEEE Trans. Automat. Contr. 26, 17–32. Nehari, Z. (1957) “On bounded bilinear forms,” Ann. of Math. 65, 153–162. Nikolskii, N.K. (1985) Treatise on the Shift Operator, Springer-Verlag, Berlin. Peano, G. (1888) Calcolo geometrico secondo l’Ausdehnungslehre di H. Grassmann, preceduto dalle operazioni della Iogica deduttiva, Bocca, Turin. Prasolov, V.V. (1994) Problems and Theorems in Linear Algebra, Trans. of Math. Monog. v. 134, Amer. Math. Soc., Providence. Rosenbrock, H.H. (1970) State-Space and Multivariable Theory, John Wiley, New York. Rota, G.C. (1960) “On models for linear operators,” Comm. Pure and Appl. Math. 13, 469–472. Routh, E.J. (1877) A Treatise on the Stability of a Given State of Motion, Macmillan, London. Sarason, D. (1967) “Generalized interpolation in H ∞ ,” Trans. Amer. Math. Soc. 127, 179–203. Schoenberg, I.J. (1987) “The Chinese remainder problem and polynomial interpolation,” The College Mathematics Journal, 18, 320–322. ¨ Schur, I. (1917,1918) “Uber Potenzreihen die im Innern des Einheitskreises beschr¨ankt sind,” J. Reine Angew. Math. 147, 205–232; 148, 122–145. Sorensen, D.C. and Antoulas, A.C. (2002) “The Sylvester equation and approximate balanced reduction,” Lin. Alg. Appl. 351–352, 671–700. Sz.-Nagy, B. and Foias, C. (1970) Harmonic Analysis of Operators on Hilbert Space, North Holland, Amsterdam. Vidyasagar, M. (1985) Control System Synthesis: A Coprime Factorization Approach, M.I.T. Press, Cambridge MA. van der Waerden, B.L. (1931) Moderne Algebra, Springer-Verlag, Berlin. Willems, J.C. (1986) ”From time series to linear systems. Part I: Finite-dimensional linear time invariant systems,” Automatica, 22, 561–580. Willems, J.C. (1989) “Models for dynamics,” Dynamics Reported, 2, 171–269, U. Kirchgraber and H.O. Walther (eds.), Wiley-Teubner. Willems, J.C. (1991) ”Paradigms and puzzles in the theory of dynamical systems,” IEEE Trans. Autom. Contr. AC-36, 259–294. Willems, J.C. and Fuhrmann, P.A. (1992) “Stability theory for high order systems,” Lin. Alg. Appl. 167, 131–149. Wimmer, H. (1990) “On the history of the Bezoutian and the resultant matrix,” Lin. Alg. Appl. 128, 27–34. Young, N. (1988) An Introduction to Hilbert Space, Cambridge University Press, Cambridge.
Index
Symbols F[z]-Kronecker product, 260 F[z]-Kronecker product model, 260 A AAK theory, 400 abelian group, 3 addition, 8 adjoint, 167 adjoint transformation, 86 all-pass function, 364 analytic Toeplitz operator, 332 annihilator, 81 atoms, 237
B backward invariant subspace, 334 backward shift operator, 30, 113 balanced realization, 398 Barnett factorization, 213 barycentric representation, 132 basis, 28, 38 basis transformation matrix, 47 Bessel inequality, 163 Beurling’s theorem, 336 Bezout equation, 15, 17 Bezout form, 210 Bezout identity, 10 Bezout map, 272 Bezoutian, 210 bijective map, 2 bilateral shift, 29 bilinear form, 79, 196, 254 bilinear pairing, 79
binary operation, 3 binary relation, 1
C canonical embedding, 84 canonical factorization, 2 canonical map, 256 canonical projection, 7 Cauchy determinant, 66 Cauchy index, 248 causality, 299 Cayley transform, 177 Cayley–Hamilton theorem, 129 characteristic polynomial, 93 characteristic vector, 93 Chinese remainder theorem, 118, 353 Christoffel–Darboux formula, 323 circulant matrix, 109 classical adjoint, 56, 60 codimension, 43 coefficients of a linear combination, 36 cofactor, 56 commutant, 136 commutative ring, 8 companion matrices, 100 composition, 2, 71 congruent, 197 continued fraction representation, 237 control basis, 99 controllability realization, 310 controller realization, 310 coordinate vector, 45 coordinates, 45 coprime factorization, 22, 354 cosets, 4
P.A. Fuhrmann, A Polynomial Approach to Linear Algebra, Universitext, DOI 10.1007/978-1-4614-0338-8, © Springer Science+Business Media, LLC 2012
407
408 Cramer’s rule, 61 cyclic group, 8 cyclic transformation, 105 cyclic vector, 105
D degree, 11 degree function, 329 determinantal divisors, 146 determinants, 55 diagonalizable matrix, 154 dimension, 40 direct sum, 41 DSS factorization, 354 dual basis, 81 dual Jordan basis, 143 dual space, 79
E eigenvalue, 93 eigenvector, 93 entire ring, 10 equivalence classes, 4 equivalence relation, 1 equivalent, 145, 371 Euclidean algorithm, 15 Euclidean ring, 13 exact sequence, 27 external representation, 296
F factor, 14 factor group, 6 factorizable polynomial, 18 field, 8 field of quotients, 22 final space, 182 finite-dimensional space, 38 formal derivative, 31 formal power series, 20 forward shift operator, 30 fractional linear transformation, 238 free module, 32 functional equation of Hankel operators, 206 fundamental polynomial equation, 367
G g.c.d., 10 generalized Christoffel–Darboux formula, 241 generalized Euclidean algorithm, 278
Index generating function, 210, 281 generating set, 28 generator, 8, 10 Gohberg–Semencul formulas, 215 Gram matrix, 181 Gram–Schmidt orthonormalization, 163 Gramian, 181 gramian, 399 greatest common divisor, 10 greatest common left divisor, 10 group, 3 group homomorphism, 5
H Hankel functional equation, 355 Hankel matrix, 207 Hankel operator, 205, 332, 354 Hardy spaces, 326 Hautus test, 321 Hermite interpolation, 120 Hermite interpolation problem, 120 Hermite–Fujiwara quadratic form, 284 Hermite–Hurwitz theorem, 250 Hermitian adjoint, 167 Hermitian form, 161, 197 Hermitian operator, 173 high-order interpolation problem, 348 homogeneous polynomial Sylvester equation, 269 Hurwitz determinants, 290 Hurwitz polynomial, 284 hyperspace, 82
I ideal, 9 identity, 8 identity matrix, 35 image, 68 index, 5 induced map, 7, 91 inertia, 202 initial space, 182 injectivity, 2 inner function, 336 inner product, 161 input space, 297 input/output relation, 296 integral domain, 10 interlacing property, 284 internal representation, 296 internal stability, 320 interpolation basis, 99
Index interpolation problem, 345 intertwining map, 88, 344 invariant factor algorithm, 145 invariant factors, 146 invariant subspace, 89, 334 invertible, 72 invertible element, 8 irreducible polynomial, 18 isometric isomorphism, 170 isometry, 170 isomorphic realizations, 304 isomorphism, 5, 75 J Jacobi’s signature rule, 274 Jacobson chain matrix, 131 Jordan basis, 141 Jordan canonical form, 142 K kernel, 9, 68, 270 Kronecker delta function, 35 Kronecker product model, 260 Kronecker product of polynomials, 260 L Lagrange interpolation, 119 Lagrange interpolation polynomials, 49, 120 Lagrange interpolation problem, 49 Lagrange reduction method, 199 Lanczos polynomials, 239 Laurent operator, 332 Laurent operators, 123 leading coefficient, 11 left coprimeness, 10 left ideal, 9 left inverse, 8 left invertible transformation, 72 left module, 26 Levinson algorithm, 276 linear combination, 28, 36 linear dependence, 37 linear functional, 79 linear independence, 37 linear operator, 67 linear transformation, 67 linearly independent, 28 M M¨obius transformation, 238 Markov parameters, 300
409 matrix, 34 matrix representation, 76 McMillan degree, 207 minimal polynomial, 93 minimal rational extension, 231 minimax principle, 176 minimum phase, 24 minor, 56 model operators, 339 model reduction, 363 monic polynomial, 18 multiplication, 8 multiplicative set, 21 mutual coprimeness, 42
N Nehari’s theorem, 378 Nevanlinna–Pick interpolant, 379 Newton expansion, 52 Newton interpolation problem, 121 Newton sums, 280 nonnegative form, 198 nonnegative operator, 180 norm, 161, 169 normal operator, 178 normal subgroup, 5 normalized coprime factorization, 359 nullity, 69
O observability map, 301 observability realization, 311 observer realization, 311 order, 5 Orlando’s formula, 293 orthogonal basis, 163, 246 orthogonal vectors, 162 orthonormal basis, 163 output space, 297
P parallelogram identity, 163 partial isometry, 182 partial realization, 392 partition, 1 permutations, 3 PID, 10 polar decomposition, 183 polarization identity, 163 polynomial, 11 polynomial coefficients, 11
410 polynomial model, 98 polynomial Sylvester equation, 269 positive form, 198 positive operator, 180 positive pair, 284 positive sequence, 275 primary decomposition, 19 prime polynomial, 18 principal ideal, 10 principal ideal domain, 10 projection, 91 proper, 23 proper rational function, 112 Pythagorean theorem, 162
Q quadratic form, 88 quotient group, 6 quotient module, 27
R rank, 69 rational control basis, 224 rational functions, 22, 111 rational models, 116 reachability map, 301 real Jordan form, 144 real pair, 284 reducible polynomial, 18 reducing subspace, 89 relative degree, 23, 329 remainder, 13, 352 representer, 109 reproducing kernel, 271 restricted input/output map, 300 restriction, 91 resultant, 63 resultant matrix, 62 reverse Hankel operator, 332 reverse polynomial, 213 right inverse, 8 right-invertible transformation, 72 ring, 8 ring homomorphism, 8 ring of quotients, 22
S Schmidt pair, 185 Schur parameters, 276 Schwarz inequality, 162
Index self-adjoint operator, 173 self-dual map, 87 sesquilinear form, 196 shift, 29 shift operator, 98 shift realizations, 308 short exact sequence, 27 signature, 202 similar, 78 similarity, 88 singular value, 185 singular value decomposition, 192 singular vectors, 185 Smith form, 146 solution pair, 364 spanning set, 37 spectral basis, 99 spectral factorization, 190, 358 spectral representation, 141 spectral theorem, 174 stable polynomial, 284 standard basis, 48, 98 state space, 297 state space isomorphism theorem, 305 state space-realization, 297 strict causality, 299 strictly proper, 23 subgroup, 4 submodule, 26 subspace, 36 sum of squares, 201 surjectivity, 2 Sylvester equation, 268 Sylvester operator, 268 Sylvester resultant, 215 symbol, 332 symbol of Hankel operator, 205 symbol of Laurent operator, 123 symmetric bilinear form, 205 symmetric form, 197 symmetric group, 3, 57
T Taylor expansion, 51 tensor product, 256 tensor product of the bases, 256 tensor product over a ring, 259 Toeplitz form, 275 Toeplitz operator, 332 trace, 79 transfer function, 298 transpose, 35 triangle inequality, 162
Index trivial linear combination, 36 truncated Laurent series, 26, 42
U unimodular, 145 unitary equivalence, 171 unitary isomorphism, 170
V Vandermonde matrix, 50, 63 vector, 33
411 vector space, 33 vector space dual, 254
Y Youla–Kucera parametrization, 321
Z zero, 14 zero divisor, 10 zero element, 8