METHODS OF NONLINEAR ANALYSIS Volume I
This is Volume 61 in MATHEMATICS IN SCIENCE AND ENGINEERING A series of monographs and textbooks Edited by RICHARD BELLMAN, University of Southern California A complete list of the books in this series appears at the end of this volume.
METHODS OF NONLINEAR ANALYSIS Richard Bellman Departments of Mathematics, Electrical Engineering, and Medicine University of Southern California Los Angeles, California
VOLUME I
1970
ACADEMIC PRESS
New York and London
COPYRIGHT © 1970, BY ACADEMIC PRESS, INC.
ALL RIGHTS RESERVED. NO PART OF THIS BOOK MAY BE REPRODUCED IN ANY FORM, BY PHOTOSTAT, MICROFILM, RETRIEVAL SYSTEM, OR ANY OTHER MEANS, WITHOUT WRITTEN PERMISSION FROM THE PUBLISHERS.
ACADEMIC PRESS, INC. 111 Fifth Avenue, New York, New York 10003
United Kingdom Edition published by ACADEMIC PRESS, INC. (LONDON) LTD. Berkeley Square House, London W1X 6BA
LIBRARY OF CONGRESS CATALOG CARD NUMBER: 78-91424
PRINTED IN THE UNITED STATES OF AMERICA
To EMIL SELETZ
Surgeon, Sculptor, Humanitarian, and Friend
PREFACE
The demands of modern science inexorably force the mathematician to explore the nonlinear world. That it is a difficult and often humbling journey with painfully crude maps and rather primitive direction-finders cannot be gainsaid, but in return it can be asserted that it is richly rewarding. The few areas that have so far been examined with any care have been full of surprises and vastly stimulating to the imagination. There is every reason to believe from what has so far been glimpsed that many more surprises lie in store, novel phenomena which will open up undreamt-of vistas for mathematics. It is an exciting prospect in an exciting field in an exciting time.

Explicit analytic solutions of nonlinear equations in terms of the familiar, well-tamed functions of analysis are not to be expected, although fortuitous and treasured examples occur here and there. Consequently, if either analytic or computational results are desired, various approximate methods must be applied. By and large, the effective solution of particular problems is an art. However, there do exist a number of powerful procedures for solving nonlinear problems which have been slowly forged and polished over the last one hundred years. As in all intellectual areas, art combined with method is more effective than untutored art. This book is intended as an introduction to the study of certain systematic and sophisticated techniques.

The power and versatility of these methods has been tremendously amplified by the digital computer. Even more, this new tool has motivated a careful reexamination of older methods and thus a creation of new techniques specifically designed to exploit the peculiar properties of the electronic calculator. That a good deal of mathematical ingenuity and experience is required to study significant problems with the aid of a computer hardly needs emphasizing.
This volume may also be regarded as a contribution to a new mathematical theory that is slowly emerging, a theory of closure of operations. Abstractly, the general problem may be described in the following terms. We are given the privilege of using a certain limited number of mathematical operations, such as, for example, the solution of finite systems of linear or nonlinear differential equations subject to initial conditions, or the solution of a finite system of linear algebraic equations. The task is then that of solving a particular equation, such as a partial differential equation, a two-point boundary value problem for
ordinary differential equations, or an integral equation, to a specified degree of accuracy using only these algorithms. The study becomes still more interesting and significant if we impose a constraint on the number of operations of particular types that can be employed, or on the time that may be consumed in the overall calculation. Usually, the computational facilities available automatically impose these constraints. The two types of operations mentioned above have been carefully singled out for explicit mention since they correspond to the two major capabilities of the digital computer in the field of analysis. That they are not guaranteed capabilities merely adds to the zest of using computers to obtain numerical results.

We will present a spectrum of methods which can be used for a variety of purposes, ranging from the derivation of a simple exponential or algebraic approximation to a sequence of algorithms of increasing complexity which require a digital computer. At the moment, our orientation, as far as large-scale computing is concerned, is toward a digital computer, which is to say a leaning toward initial value problems. As hybrid computers become more powerful and prevalent, a certain mix of methods involving two-point boundary value problems and initial value problems will occur. In general, the word "solution" must be defined operationally in terms of the various technological tools available for obtaining numerical results. The arrival of the digital computer has already drastically changed the significance of this term "solution," and there will be further radical changes over the next twenty-five years. The majority of the methods we present here can be applied to the study of partial differential equations and to the still more complex functional equations that the determined engineer and intrepid physicist are forced to face.
The applications within this broader context are naturally of greater significance than those that can be made using ordinary differential equations. Despite this, we have deliberately refrained from any excursion in force into the area of partial differential equations. In this volume, the first of two, we have discussed only ordinary differential equations. However, since any complete separation between ordinary and partial differential equations is unnatural, we have broken this self-imposed vow in the second volume. This is particularly the case in the treatment of dynamic programming and invariant imbedding. Since we are primarily concerned with introducing the reader to a variety of fundamental methods, we feel that there is considerable pedagogical force to keeping the setting as familiar as possible while new ideas are being introduced. Once acquainted with the concepts, the reader can readily apply them to all types of functional equations with a small amount of additional background. References will be found throughout to their utilization in the theory of partial differential equations.
Another strong argument for using ordinary differential equations as a proving ground is that it is relatively easy to provide a number of numerical examples in this area to illustrate different methods. In addition, a large number of interesting analytic results are available as exercises. These have principally been taken from research papers.

Having briefly described our overall aims, let us examine the structure of the book. The first three chapters contain some of the fundamental results and methods that will serve throughout both as foundation and ancillary tools. Chapter 1 discusses first- and second-order linear differential equations, subject to initial and boundary value problems, with some attention to the Riccati differential equation and a detailed study of the behavior of the physical solutions of nonlinear equations of the form

u' = p(u, t)/q(u, t),  (1)

where p and q are polynomials in their arguments. This last represents a brief account of extensive work by Borel and Hardy, very important and useful results which are still not as well known as they should be. Some related results for the Emden-Fowler (or Fermi-Thomas) equation will be presented at the end of Chapter 4 as applications of stability theory. Throughout we have tried to preserve some sort of a careful balance between general methods and particular problems. We have constantly kept in mind the famous dictum of Hurwitz, "It is easier to generalize than particularize."

In Chapter 2, we present a brief account of basic results in algebraic aspects of matrix analysis that will be employed throughout the remainder of the book. The principal contents are the reduction of quadratic forms to canonical forms and associated variational problems, and the Perron theorem for positive matrices. Chapter 3 discusses the use of matrices in the study of systems of linear differential equations with both constant and variable coefficients. It is impossible to study multidimensional problems in any meaningful fashion without matrix theory.

Chapter 4 contains some basic results concerning stability theory which we will employ in subsequent chapters to validate certain methods of approximation. Following the lines initiated by Poincaré and Lyapunov, we wish to compare the solutions of

T(u) = 0  (2)

with those of

T(u) = N(u),  (3)

where N(u) is "small" in some sense. The most important case is that where T is a linear operator with the property that T(u) = 0 possesses a convenient
solution. Closely connected with this question is the problem of estimating the difference between the solution of (2) and a function w satisfying the inequality

||T(w)|| ≤ ε,  (4)

where || · || denotes some appropriate norm.

With these "back-up" results available, we can turn to our principal goal, the study of certain powerful methods of analytic and computational approximation. In Chapter 5, we present the Bubnov-Galerkin method, and in Chapters 7 and 8 that of Rayleigh-Ritz. Although in certain special, but important, cases the methods overlap, they are quite different conceptually and extend in different ways. The guiding ideas are simple, as are all fundamental mathematical ideas. As always, effort and ingenuity enter in making these procedures work in particular cases.

Let us begin with a description of the Bubnov-Galerkin method. Let T(u) = 0 be the equation whose solution is desired. This is equivalent to minimizing the scalar quantity ||T(u)||, for any norm, over the class of admissible functions. We now introduce a closure technique by restricting u to some smaller class of functions, for example, one defined by a finite set of parameters. The most important example of this is that where the restricted class is defined by

u = Σ_{k=1}^N a_k u_k,  (5)

where the u_k are fixed functions and the a_k are parameters. The infinite-dimensional problem of minimizing ||T(u)|| is then replaced by the approximating finite-dimensional problem of minimizing the function

||T(Σ_{k=1}^N a_k u_k)||  (6)

with respect to the a_k. This problem may be attacked by any of a number of techniques developed in optimization theory over the last twenty years: search techniques, gradient methods, Newton-Raphson, nonlinear programming, expansion methods, and so on. Let us note that with such methods in mind, we have deliberately refrained from any automatic use of the usual quadratic norm in the foregoing description. In the text, however, succumbing to the lure of analytic simplicity, we have considered principally quadratic functionals. More general nonlinear functionals give us an opportunity to discuss the Newton-Raphson-Kantorovich method and the use of the Lagrange expansion.

Closely associated with the Galerkin method are the methods of mean-square
approximation and differential approximation. The first is discussed at the end of Chapter 5, the second in Chapter 6. The technique of mean-square approximation may be described in the following terms. Let T(u) = 0, as usual, be the original equation, and let

S(v, a) = 0  (7)

be another equation, depending on the vector parameter a, which is analytically or computationally more tractable than the original equation. Thus, for example, the equation in (7) may be linear with the original equation nonlinear, or it may be a nonlinear differential equation subject to an initial value condition with the original equation linear and subject to multipoint boundary conditions. Alternatively, the original equation may contain stochastic elements, while (7) is deterministic, or conversely. The existence of analog, digital, and hybrid computers, as well as the availability of many powerful analytic theories, has considerably altered the concept of "tractable." A great deal of flexibility now exists. Many different types of mathematical models are available to treat various kinds of physical processes. We have avoided stochastic processes in this volume since a good deal of effort is required to make various useful methods rigorous.

We wish to determine the parameter a so that

||S(u, a)||  (8)

is small, where u is the solution of T(u) = 0 and some convenient norm is employed. Presumably, this ensures that v, the solution of (7), is close to u. This is a stability question. Interesting complications arise from the fact that u itself is unknown. There are various "bootstrap" methods that can be employed to circumvent this annoying "little detail." Here we make brief contact with "self-consistent" methods, of such importance in modern physics. A major analytic problem is that of choosing the operator S(v, a) in such a way that the function v preserves certain desirable properties of u. Very little is known in this area.

The method of differential approximation is the following. Let R(u, b) be a family of operators depending on a finite-dimensional vector b, and let b be chosen so that
||R(u, b)||  (9)

is minimized, where u is now given implicitly as the solution of T(u) = 0. We then use the solution of

R(v, b) = 0  (10)

as an approximation to u. Once again, any discussion of the validity of this approach requires stability considerations.
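As a concrete numerical sketch of the differential approximation just described (the example, the grid, and all names below are our own illustration, not the author's): take u(t) = 1/(1 + t), the solution of the nonlinear equation u' = −u², and fit the linear family R(v, b) = v' + bv by minimizing the mean-square residual ∫(u' + bu)² dt over the scalar b. The minimizing b is obtained from a one-dimensional quadratic, and v = e^{−bt} then serves as the tractable approximation to u.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 2001)
u = 1.0 / (1.0 + t)            # solution of the nonlinear equation u' = -u^2, u(0) = 1
du = -1.0 / (1.0 + t) ** 2

h = t[1] - t[0]
def integral(f):
    return np.sum((f[1:] + f[:-1]) * h / 2)   # trapezoidal rule

# Minimize ||R(u, b)||^2 = ∫ (u' + b u)^2 dt over the scalar b: setting the
# derivative to zero gives b = -∫ u'u dt / ∫ u^2 dt, here (3/8)/(1/2) = 0.75.
b = -integral(du * u) / integral(u * u)

v = np.exp(-b * t)             # solution of the tractable equation R(v, b) = v' + b v = 0
gap = np.max(np.abs(v - u))    # how well the linear model tracks the nonlinear solution
```

Note that u appears explicitly here; when u is known only implicitly, the "bootstrap" and self-consistent devices mentioned above must supply a surrogate for it.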
A case of particular importance is that where

R(u, b) = Σ_{k=1}^N b_k R_k(u),  (11)

with the R_k(u) differential operators. Observe that the aim of this procedure is once again closure. We want to solve complex functional equations using only the algorithms required to solve the more familiar and placid differential equations.

We now turn to an entirely different type of artifice. The Rayleigh-Ritz method hinges upon the observation that many equations of the form T(u) = 0 may be viewed as the Euler equation of an associated functional J(u). By this we mean that a solution of T(u) = 0 is a stationary point for J(u) as u varies over an appropriate space. Let us suppose that we are looking for a minimum value. The question of determining the minimum of J(u) over the original infinite-dimensional space is then replaced by the finite-dimensional problem of minimizing J(u) over a finite-dimensional subspace, each element of which is characterized by a finite number of parameters, say

u = Σ_{k=1}^N b_k u_k.  (12)

Here the u_k are carefully chosen functions. The new problem, that of minimizing the expression

J(Σ_{k=1}^N b_k u_k),  (13)

can now be approached in a number of ways. In many cases of significance, J(u) is a quadratic functional, the minimization of which leads to linear equations for the b_k. There are, however, numerous difficulties associated with the solution of large systems of linear algebraic equations, which means that the real difficulties often begin at this point.

Observe that in both of the principal methods described above, monotone approximation is obtained immediately upon increasing the dimension of the finite space of functions over which the variation is allowed. Thus, if we set

Δ_N = min_{a_k} ||T(Σ_{k=1}^N a_k u_k)||,  (14)

it is clear that

Δ_{N+1} ≤ Δ_N.  (15)
Similarly, if we set

d_N = min_{b_k} J(Σ_{k=1}^N b_k u_k),  (16)

we have

d_{N+1} ≤ d_N.  (17)
Several fundamental questions immediately present themselves. The first is that of determining when

lim_{N→∞} Δ_N = min_u ||T(u)||,  (18)

lim_{N→∞} d_N = min_u J(u).  (19)

The second is that of determining when the function u^(N) which yields Δ_N converges to the function u which yields min_u ||T(u)||, with the corresponding problem for the Rayleigh-Ritz functional. Under reasonable conditions on T(u), J(u), and the spaces over which u varies, these are not difficult to answer. Far more difficult and important are the associated stability problems of estimating ||u − u^(N)|| in terms of Δ_N − Δ_∞, or d_N − d_∞, and in determining ||u − u^(N)|| as a function of N. These are essential matters when the effective determination of u^(N) is of importance. A few numerical examples, together with references to extensive work in this area, will be given.

In view of the considerable effort required to treat the finite-dimensional variational problems when N is large, there is considerable motivation for finding ways of obtaining useful estimates for small N. In a sense, the major problem is the converse. It is one of determining the smallest value of N which yields an acceptable approximation. Questions of acceleration of convergence and extrapolation arise in this connection, with techniques that go back to Euler and Kronecker. We shall touch briefly on these matters.

In Chapter 8, we show how a linear equation containing a parameter can be considered to be the Euler equation associated with the minimization of a functional subject to a global constraint. Once again, Rayleigh-Ritz methods can be employed to obtain approximate results.

Many of the problem areas discussed in this volume can be further illuminated, or considered by alternative techniques, using the methods of the second volume. There we consider duality as a technique for providing upper and lower bounds, Caplygin's method and differential inequalities, quasilinearization, dynamic programming, invariant imbedding, the theory of iteration, and truncation techniques. The work was divided into two volumes to prevent the single,
massive volume that is so forbidding and discouraging to the newcomer into a field. Let us encourage the reader with the flat statement that very little is known about nonlinear analysis and that it is not obvious that major breakthroughs will be made in the near future, or ever. Hundreds and thousands of fascinating and significant problems abound, each of which may require a new theory for its elucidation.

I have been fortunate in having three friends read through the book and help considerably with all aspects of preparation of the manuscript: David Collins, Thomas J. Higgins, and Art Lew. I wish to express my appreciation for their help, and to Jeanette Blood and Rebecca Karush for typing the manuscript.

RICHARD BELLMAN
Los Angeles, 1969
CONTENTS

Preface    vii

Chapter 1. First- and Second-order Differential Equations

1.1. Introduction    1
1.2. The First-order Linear Differential Equation    2
1.3. Fundamental Inequality    3
1.4. Second-order Linear Differential Equations    5
1.5. Inhomogeneous Equation    7
1.6. Lagrange Variation of Parameters    8
1.7. Two-point Boundary Value Problem    10
1.8. Connection with Calculus of Variations    11
1.9. Green's Functions    12
1.10. Riccati Equation    14
1.11. The Cauchy-Schwarz Inequality    16
1.12. Perturbation and Stability Theory    18
1.13. A Counter-example    20
1.14. ∫^∞ |f(t)| dt < ∞    21
1.15. ∫^∞ |f'(t)| dt < ∞    22
1.16. Asymptotic Behavior    23
1.17. The Equation u'' − (1 + f(t))u = 0    24
1.18. More Refined Asymptotic Behavior    26
1.19. ∫^∞ f² dt < ∞    27
1.20. The Second Solution    29
1.21. The Liouville Transformation    30
1.22. Elimination of Middle Term    31
1.23. The WKB Approximation    33
1.24. The One-dimensional Schrödinger Equation    33
1.25. u'' + (1 + f(t))u = 0; Asymptotic Behavior    33
1.26. Asymptotic Series    35
1.27. The Equation u' = p(u, t)/q(u, t)    37
1.28. Monotonicity of Rational Functions of u and t    38
1.29. Asymptotic Behavior of Solutions of u' = p(u, t)/q(u, t)    39
Miscellaneous Exercises    42
Bibliography and Comments    51

Chapter 2. Matrix Theory

2.1. Introduction    54
2.2. Determinantal Solution    55
2.3. Elimination    58
2.4. Ill-conditioned Systems    59
2.5. The Importance of Notation    60
2.6. Vector Notation    60
2.7. Norm of a Vector    61
2.8. Vector Inner Product    61
2.9. Matrix Notation    63
2.10. Noncommutativity    64
2.11. The Adjoint, or Transpose, Matrix    65
2.12. The Inverse Matrix    65
2.13. Matrix Norm    67
2.14. Relative Invariants    68
2.15. Constrained Minimization    71
2.16. Symmetric Matrices    72
2.17. Quadratic Forms    74
2.18. Multiple Characteristic Roots    75
2.19. Maximization and Minimization of Quadratic Forms    76
2.20. Min-Max Characterization of the λ_k    77
2.21. Positive Definite Matrices    79
2.22. Determinantal Criteria    81
2.23. Representation for A⁻¹    82
2.24. Canonical Representation for Arbitrary A    82
2.25. Perturbation of Characteristic Frequencies    84
2.26. Separation and Reduction of Dimensionality    85
2.27. Ill-conditioned Matrices and Tychonov Regularization    86
2.28. Self-consistent Approach    88
2.29. Positive Matrices    88
2.30. Variational Characterization of λ(A)    89
2.31. Proof of Minimum Property    91
2.32. Equivalent Definition of λ(A)    92
Miscellaneous Exercises    94
Bibliography and Comments    101

Chapter 3. Matrices and Linear Differential Equations

3.1. Introduction    104
3.2. Vector-Matrix Calculus    104
3.3. Existence and Uniqueness of Solution    105
3.4. The Matrix Exponential    107
3.5. Commutators    108
3.6. Inhomogeneous Equation    110
3.7. The Euler Solution    111
3.8. Stability of Solution    113
3.9. Linear Differential Equation with Variable Coefficients    114
3.10. Linear Inhomogeneous Equation    116
3.11. Adjoint Equation    118
3.12. The Equation X' = AX + XB    118
3.13. Periodic Matrices: the Floquet Representation    120
3.14. Calculus of Variations    121
3.15. Two-point Boundary Condition    122
3.16. Green's Functions    123
3.17. The Matrix Riccati Equation    123
3.18. Kronecker Products and Sums    124
3.19. AX + XB = C    125
3.20. Random Difference Systems    127
Miscellaneous Exercises    127
Bibliography and Comments    131

Chapter 4. Stability Theory and Related Questions

4.1. Introduction    134
4.2. Dini-Hukuhara Theorem-I    135
4.3. Dini-Hukuhara Theorem-II    138
4.4. Inverse Theorems of Perron    140
4.5. Existence and Uniqueness of Solution    140
4.6. Poincaré-Lyapunov Stability Theory    142
4.7. Proof of Theorem    143
4.8. Asymptotic Behavior    146
4.9. The Function q(c)    148
4.10. More Refined Asymptotic Behavior    149
4.11. Analysis of Method of Successive Approximations    150
4.12. Fixed-point Methods    152
4.13. Time-dependent Equations over Finite Intervals    152
4.14. Alternative Norm    155
4.15. Perturbation Techniques    156
4.16. Second Method of Lyapunov    157
4.17. Solution of Linear Systems    157
4.18. Origins of Two-point Boundary Value Problems    158
4.19. Stability Theorem for Two-point Boundary Value Problem    159
4.20. Asymptotic Behavior    160
4.21. Numerical Aspects of Linear Two-point Boundary Value Problems    161
4.22. Difference Methods    163
4.23. Difference Equations    165
4.24. Proof of Stability    165
4.25. Analysis of Stability Proof    166
4.26. The General Concept of Stability    168
4.27. Irregular Stability Problems    168
4.28. The Emden-Fowler-Fermi-Thomas Equation    170
Miscellaneous Exercises    171
Bibliography and Comments    182

Chapter 5. The Bubnov-Galerkin Method

5.1. Introduction    187
5.2. Example of the Bubnov-Galerkin Method    188
5.3. Validity of Method    189
5.4. Discussion    190
5.5. The General Approach    190
5.6. Two Nonlinear Differential Equations    192
5.7. The Nonlinear Spring    193
5.8. Alternate Average    196
5.9. Straightforward Perturbation    196
5.10. A "Tucking-in" Technique    198
5.11. The Van der Pol Equation    198
5.12. Two-point Boundary Value Problems    200
5.13. The Linear Equation L(u) = g    200
5.14. Method of Moments    202
5.15. Nonlinear Case    202
5.16. Newton-Raphson Method    204
5.17. Multidimensional Newton-Raphson    207
5.18. Choice of Initial Approximation    208
5.19. Nonlinear Extrapolation and Acceleration of Convergence    210
5.20. Alternatives to Newton-Raphson    211
5.21. Lagrange Expansion    212
5.22. Method of Moments Applied to Partial Differential Equations    214
Miscellaneous Exercises    215
Bibliography and Comments    222

Chapter 6. Differential Approximation

6.1. Introduction    225
6.2. Differential Approximation    225
6.3. Linear Differential Operators    226
6.4. Computational Aspects-I    226
6.5. Computational Aspects-II    227
6.6. Degree of Approximation    228
6.7. Orthogonal Polynomials    229
6.8. Improving the Approximation    231
6.9. Extension of Classical Approximation Theory    231
6.10. Riccati Approximation    232
6.11. Transcendentally-transcendent Functions    233
6.12. Application to Renewal Equation    233
6.13. An Example    236
6.14. Differential-Difference Equations    238
6.15. An Example    239
6.16. Functional-Differential Equations    240
6.17. Reduction of Storage in Successive Approximations    242
6.18. Approximation by Exponentials    242
6.19. Mean-square Approximation    242
6.20. Validity of the Method    243
6.21. A Bootstrap Method    244
6.22. The Nonlinear Spring    244
6.23. The Van der Pol Equation    246
6.24. Self-consistent Techniques    248
6.25. The Riccati Equation    248
6.26. Higher-order Approximation    250
6.27. Mean-square Approximation-Periodic Solutions    251
Miscellaneous Exercises    253
Bibliography and Comments    255

Chapter 7. The Rayleigh-Ritz Method

7.1. Introduction    259
7.2. The Euler Equation    259
7.3. The Euler Equation and the Variational Problem    260
7.4. Quadratic Functionals: Scalar Case    261
7.5. Positive Definiteness for Small T    263
7.6. Discussion    264
7.7. The Rayleigh-Ritz Method    265
7.8. Validity of the Method    265
7.9. Monotone Behavior and Convergence    267
7.10. Estimation of ||u − v|| in Terms of J(v) − J(u)    268
7.11. Convergence of Coefficients    269
7.12. Alternate Estimate    270
7.13. Successive Approximations    271
7.14. Determination of the Coefficients    272
7.15. Multidimensional Case    273
7.16. Reduction of Dimension    274
7.17. Minimization of Inequalities    275
7.18. Extension to Quadratic Functionals    277
7.19. Linear Integral Equations    279
7.20. Nonlinear Euler Equation    280
7.21. Existence and Uniqueness    281
7.22. Minimizing Property    282
7.23. Convexity and Uniqueness    282
7.24. Implied Boundedness    283
7.25. Lack of Existence of Minimum    284
7.26. Functional Analysis    284
7.27. The Euler Equation and Haar's Device    286
7.28. Discussion    287
7.29. Successive Approximations    288
7.30. Lagrange Multiplier    288
7.31. A Formal Solution Is a Valid Solution    289
7.32. Raising the Price Diminishes the Demand    289
7.33. The Courant Parameter    290
7.34. Control Theory    291
Miscellaneous Exercises    291
Bibliography and Comments    301

Chapter 8. Sturm-Liouville Theory

8.1. Equations Involving Parameters    304
8.2. Stationary Values    305
8.3. Characteristic Values and Functions    306
8.4. Properties of Characteristic Values and Functions    307
8.5. Generalized Fourier Expansion    312
8.6. Discussion    313
8.7. Rigorous Formulation of Variational Problem    314
8.8. Rayleigh-Ritz Method    315
8.9. Intermediate Problem of Weinstein    316
8.10. Transplantation    316
8.11. Positive Definiteness of Quadratic Functionals    317
8.12. Finite Difference Approximations    318
8.13. Monotonicity    319
8.14. Positive Kernels    320
Miscellaneous Exercises    322
Bibliography and Comments    329

Author Index    331
Subject Index    337
Chapter 1
FIRST- AND SECOND-ORDER DIFFERENTIAL EQUATIONS
1.1. Introduction
In this chapter we wish to prepare the way for the subsequent chapters by considering in some detail a number of results pertaining to first- and second-order ordinary differential equations. The pathbreaking consists of two parts. In the first place, we want to observe the kinds of results that can be obtained in these relatively simple situations, and in the second place we want to note the methods that are employed. In many cases the same methods can be employed to treat higher-order equations.

We will consider first the linear first-order differential equation
u' + p(t)u = q(t),   u(0) = c,  (1.1.1)
and then the second-order linear differential equation

u'' + p(t)u' + q(t)u = f(t),  (1.1.2)
subject to both initial and two-point boundary conditions. The two-point boundary conditions introduce Green's functions. Following this, we will analyze the asymptotic behavior of solutions of

u'' ± (1 + f(t))u = 0  (1.1.3)
as t → ∞, where f(t) is "small" in some sense. This will lay the groundwork for the study of the solutions of the more general equation

u'' ± R(t)u = 0,  (1.1.4)
relying upon an ingenious change of dependent variable due to Liouville. The Riccati equation plays an important role in the study of (1.1.3) and (1.1.4). Although it appears here only as an artifice, its true fundamental role in analysis will be made apparent in Volume II in connection with dynamic programming and invariant imbedding. The results obtained in connection with (1.1.3) will permit us to discuss the validity of the WKB approximation.
Finally, we will indicate how elementary, albeit occasionally tedious, analysis can be used to obtain precise asymptotic behavior of the equation

u' = p(u, t)/q(u, t),  (1.1.5)

where p and q are polynomials in their arguments, in the important case where we are considering only solutions which remain finite for all t > t_0, the so-called physical solutions. This work was initiated by Borel and Hardy. The explicit results obtained here can be used occasionally in the later chapters to illustrate the fancier analytic and computational methods designed to handle the more complex functional equations thrust upon us by biology, economics, engineering, and physics. Although specific examples are absolutely essential, there is no harm done in occasionally minimizing the algebraic and arithmetic labor involved in this exposition.

When an equation is displayed in this chapter, it will be tacitly assumed that a solution exists and, furthermore, that this solution is unique. In Chapters 2 and 3 we will provide the necessary existence and uniqueness theorems validating both the analysis that follows and various statements that are made about the properties of these solutions.

1.2. The First-order Linear Differential Equation
The equation

u' + p(t)u = q(t),   u(0) = c,  (1.2.1)

plays an important role in the theory of differential equations, due equally to its simplicity and to the fact that it is one of the few equations which can be solved explicitly in terms of elementary functions and elementary operations. To obtain this explicit analytic solution, we multiply by the integrating factor

exp(∫_0^t p(t_1) dt_1)

and integrate both sides between 0 and t. We thus obtain

u exp(∫_0^t p(t_1) dt_1) − c = ∫_0^t q(t_1) exp(∫_0^{t_1} p(t_2) dt_2) dt_1,  (1.2.2)

or

u = c exp(−∫_0^t p(t_1) dt_1) + ∫_0^t q(t_1) exp(−∫_{t_1}^t p(t_2) dt_2) dt_1.  (1.2.3)

Let us denote the expression on the right by T(p, q, c). This is an operation on the functions p and q and a simple linear function of c. Observe that it is also linear in q. Observe further that the positivity of the exponential function permits us to assert that the operation T is monotone in both c and q, that is to say,

if c_1 ≥ c_2, then T(p, q, c_1) ≥ T(p, q, c_2);
if q_1 ≥ q_2, then T(p, q_1, c) ≥ T(p, q_2, c).  (1.2.4)

These properties will play an important role in the chapter on quasilinearization in Volume II.

Exercises
1. Use the preceding results to show that if u satisfies the differential inequality u' + p(t)u ≤ q(t), u(0) = c, then u ≤ v for t ≥ 0, where v is the solution of the corresponding equation.

2. Show that if c ≥ 0, q ≥ 0, then T(p_1, q, c) ≤ T(p_2, q, c) for p_1 ≥ p_2.

3. How many solutions of u' + au = 1 can satisfy the condition that lim_{t→∞} u(t) exists? Consider separately the cases a > 0, a < 0, a = 0.

4. Consider the same question for u' + a(t)u = f(t), under the assumption that the limits of a(t) and f(t) exist as t → ∞.

5. Obtain an explicit representation for the solution of u' + p_1(t)u = p_2(t)u^{N+1}, u(0) = c.
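The explicit representation u = T(p, q, c) obtained above, and its monotonicity in c, can be checked by direct quadrature. The sketch below is our own illustration (the function name and test data are not from the text): it evaluates the formula numerically for p ≡ 1, q ≡ 1, where the exact solution of u' + u = 1, u(0) = c is u = 1 + (c − 1)e^{−t}.

```python
import numpy as np

def T(p, q, c, t_end, n=4001):
    """Evaluate the explicit solution of u' + p(t)u = q(t), u(0) = c,
    at t_end, by trapezoidal quadrature of the integral representation."""
    t = np.linspace(0.0, t_end, n)
    h = t[1] - t[0]
    pv = p(t)
    # P[i] approximates the integral of p from 0 to t_i (cumulative trapezoidal rule)
    P = np.concatenate(([0.0], np.cumsum((pv[1:] + pv[:-1]) * h / 2)))
    # integrand: q(t_1) exp(-(integral of p from t_1 to t_end))
    integrand = q(t) * np.exp(P - P[-1])
    return c * np.exp(-P[-1]) + np.sum((integrand[1:] + integrand[:-1]) * h / 2)

one = lambda s: np.ones_like(s)
u1 = T(one, one, 2.0, 1.0)   # u' + u = 1, u(0) = 2: exact value u(1) = 1 + e^{-1}
u2 = T(one, one, 3.0, 1.0)   # raising c raises the solution, as (1.2.4) asserts
```

Here the quadrature plays the role of the "elementary operations" of the text; any sufficiently accurate integration rule would serve.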
1.3. Fundamental Inequality
From the monotonicity noted above we can readily establish the following basic inequality:

Lemma. If

u(t) ≤ c + ∫_0^t u(t_1) v(t_1) dt_1,  (1.3.1)

where c > 0 and u, v ≥ 0 for t ≥ 0, then

u(t) ≤ c exp(∫_0^t v(t_1) dt_1).
To obtain this result from the result of the foregoing section, let

w(t) = ∫_0^t u(t_1) v(t_1) dt_1.  (1.3.2)

Then the integral inequality yields, after multiplication by v(t), the differential inequality

dw/dt ≤ cv + vw,   w(0) = 0,  (1.3.3)

whence, taking account of Exercise 1 following the last section,

w ≤ exp(∫_0^t v dt_1) ∫_0^t c v(t_1) exp(−∫_0^{t_1} v(t_2) dt_2) dt_1.  (1.3.4)

Since the integration can be readily carried out, we have

w ≤ c [exp(∫_0^t v dt_1) − 1].  (1.3.5)

Using this estimate in the original integral inequality, we have

u ≤ c exp(∫_0^t v dt_1),  (1.3.6)

the desired result.

Exercises
1. Establish the foregoing result starting with the relation

u(t) v(t)/(c + ∫_0^t u(t1) v(t1) dt1) ≤ v(t)

and integrating between 0 and t. (This proof is much shorter, but it masks the reason for the validity of the result.)

2. What can we deduce if u(t) ≤ f(t) + ∫_0^t u(t1) v(t1) dt1 for t ≥ 0 with v ≥ 0?

3. Is the restriction u ≥ 0 essential?

4. Carry through a proof of (1.3.1) by iteration of the inequality.

5. Consider the inequality u ≤ h(t) + ae^{−bt} ∫_0^t e^{bt1} u(t1) dt1. Show that ∫_0^t e^{bt1} u(t1) dt1 ≤ v(t), where v' = he^{bt} + av, v(0) = 0.

6. Hence, show that if 0 < a < b and h(t) → 0 as t → ∞, then ∫_0^t e^{bt1} u(t1) dt1 = o(e^{bt}) as t → ∞. We use the notation f = o(g) to signify that f/g → 0. (The notation f = O(g) signifies that |f|/|g| is uniformly bounded as t → ∞.)

7. If au(t) + (1 − a) ∫_0^t u(t1) dt1 → b as t → ∞, where 0 < a < 1, what can we say about the limiting behavior of u(t)?

8. If au(t) + (1 − a) ∫_0^t u(t1) dt1/t → b as t → ∞, where 0 < a < 1, what can we say about the limiting behavior of u(t)?

9. If au_n + (1 − a) Σ_{k=1}^n u_k/n → b as n → ∞, where 0 < a < 1, what can we say about the limiting behavior of u_n as n → ∞?
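The fundamental inequality can be exercised numerically in its sharp case. The sketch below (hypothetical choices v(t) = 1/(1 + t), c = 2, not from the text) takes u = c exp(∫_0^t v dt1) = c(1 + t), verifies by quadrature that it satisfies the integral relation (1.3.1) with equality, and confirms that the exponential bound is attained.

```python
import math

def trapz(f, a, b, n=4000):
    """Trapezoidal rule on [a, b]."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return s * h

c = 2.0
v = lambda t: 1.0 / (1.0 + t)     # v >= 0, an arbitrary test function
u = lambda t: c * (1.0 + t)       # u = c * exp(int_0^t v dt1), the extremal case

t = 3.0
lhs = c + trapz(lambda s: u(s) * v(s), 0.0, t)   # c + int_0^t u v dt1
bound = c * math.exp(trapz(v, 0.0, t))           # c * exp(int_0^t v dt1)
# here u satisfies (1.3.1) with equality, so the Gronwall bound is sharp
```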
1.4. Second-order Linear Differential Equations

Consider the linear equation

u'' + p(t)u' + q(t)u = 0,   (1.4.1)

where p and q are assumed continuous in [0, T]. This is a fundamental equation of mathematical physics, arising in analytical mechanics, quantum mechanics, wave propagation, and many other areas. It remains challenging despite all of the work that has been devoted to its study, since we cannot read off the properties of the solution with the aid of a general explicit analytic solution in terms of the elementary functions and a finite number of integrations and differentiations. Every solution of this equation is a linear combination of two particular solutions u1, u2, the principal solutions, determined by the initial conditions

u1(0) = 1, u1'(0) = 0, u2(0) = 0, u2'(0) = 1.   (1.4.2)

The Wronskian of two solutions, u, v, of (1.4.1) is a useful function defined by the determinantal expression

W(u, v) = | u  v ; u'  v' | = uv' − u'v.   (1.4.3)

It is easily seen that

W' = −p(t) W.   (1.4.4)

In particular, W(u1, u2) = 1 in the case where p = 0 and u1 and u2 are principal solutions.
Exercises

1. Use the Wronskian relation to obtain the general solution of (1.4.1) given one particular solution.

2. Show that W(u, v) ≠ 0 for any t > 0 if W(u(0), v(0)) ≠ 0.

3. If u1 and u2 are principal solutions, show that a1u1(t1) + a2u2(t1) = b1, a1u1'(t1) + a2u2'(t1) = b2 has a unique solution for the coefficients a1 and a2.

4. If p(t) = 0, show that W(u1, u2) is a constant for t ≥ 0.

5. Show that W(u1, u2) is never zero unless it is identically zero.

6. Show that the general solution of the linear difference equation

u_{n+2} + a1 u_{n+1} + a2 u_n = 0,

where a1, a2 are constants, is u_n = c1 r1^n + c2 r2^n, where r1 and r2 are the roots of the quadratic equation r² + a1 r + a2 = 0, provided that r1 ≠ r2. What is the form of the solution if r1 = r2?

7. Show that a necessary and sufficient condition that all solutions of the foregoing equation approach zero is that the roots of r² + a1 r + a2 = 0 be less than one in absolute value.

8. Determine the analytic form of r_n, where r_{n+1} = (ar_n + b)/(cr_n + d). Hence determine the asymptotic form of r_n as n → ∞. Hint: Set r_n = u_n/v_n and pick u_n and v_n judiciously.
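The closed form in Exercise 6 is easy to test in code. The sketch below (hypothetical coefficients a1 = −5, a2 = 6, so the characteristic roots are 2 and 3) fits c1, c2 from the first two terms and compares the recurrence-generated sequence with c1 r1^n + c2 r2^n.

```python
# u_{n+2} + a1*u_{n+1} + a2*u_n = 0 with a1 = -5, a2 = 6:
# characteristic equation r**2 - 5r + 6 = 0, roots r1 = 2, r2 = 3.
a1, a2 = -5.0, 6.0
r1, r2 = 2.0, 3.0

u0, u1 = 1.0, 0.0
# Solve c1 + c2 = u0 and c1*r1 + c2*r2 = u1 for the closed-form constants.
c2 = (u1 - r1 * u0) / (r2 - r1)
c1 = u0 - c2

# Generate the sequence directly from the recurrence...
seq = [u0, u1]
for n in range(20):
    seq.append(-a1 * seq[-1] - a2 * seq[-2])

# ...and from the closed form u_n = c1*r1**n + c2*r2**n.
closed = [c1 * r1**n + c2 * r2**n for n in range(len(seq))]
```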
1.5. Inhomogeneous Equation

Consider first the case where the coefficients p and q are constant and u satisfies the inhomogeneous equation

u'' + pu' + qu = f(t).   (1.5.1)

We can take u(0) = u'(0) = 0 without loss of generality since it is sufficient to obtain any particular solution in order to obtain the general solution. Let us use a method which is particularly well suited to equations with constant coefficients, namely the Laplace transform. It possesses the merit of extending immediately to many other types of functional equations such as linear differential-difference equations and linear partial differential equations, and even to certain types of nonlinear equations containing convolution terms. Write

L(u) = ∫_0^∞ e^{−st} u(t) dt,   (1.5.2)

assumed convergent for Re(s) > 0. Actually, convergence is of no particular significance here since we use the Laplace transform solely to obtain the desired result. It is then verified using a direct method. From (1.5.1) we derive by repeated integration by parts

L(u) = L(f)/(s² + ps + q).   (1.5.3)

Let r1, r2 be the roots of r² + pr + q = 0, assumed distinct for the moment. Then

1/(s² + ps + q) = [1/(r1 − r2)][1/(s − r1) − 1/(s − r2)].   (1.5.4)

Hence, the inverse transform of (s² + ps + q)^{−1} is

k(t) = (e^{r1 t} − e^{r2 t})/(r1 − r2).   (1.5.5)

Let us now employ the fundamental relation for the Laplace transform of the convolution of two functions,

L(∫_0^t u(t − t1) v(t1) dt1) = L(u) L(v).   (1.5.6)

This means that ∫_0^t u(t − t1) v(t1) dt1 is the Laplace inverse of L(u)L(v) and thus, referring to (1.5.3), that

u = ∫_0^t k(t − t1) f(t1) dt1.   (1.5.7)

Once the result has been obtained, we can readily verify by direct differentiation that it is indeed a solution. Hence, there is no need to impose any conditions on f(t) at t = ∞ or worry about the convergence of the integrals involved. In particular, if the equation is

u'' + ω²u = f(t),   (1.5.8)

we have r1 = ωi, r2 = −ωi and

k(t) = (e^{ωit} − e^{−ωit})/(2ωi) = (sin ωt)/ω,   (1.5.9)

and thus the frequently useful result

u = (1/ω) ∫_0^t sin ω(t − t1) f(t1) dt1.   (1.5.10)
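The convolution representation (1.5.10) is directly computable. As a sanity check, the sketch below (trapezoidal quadrature; the choices ω = 1, f ≡ 1 are illustrative) evaluates (1/ω)∫_0^t sin ω(t − t1) f(t1) dt1 and compares it with 1 − cos t, the solution of u'' + u = 1 with u(0) = u'(0) = 0.

```python
import math

def particular(omega, f, t, n=2000):
    """u(t) = (1/omega) * int_0^t sin(omega*(t - t1)) f(t1) dt1, trapezoidal rule."""
    h = t / n
    g = lambda t1: math.sin(omega * (t - t1)) * f(t1)
    s = 0.5 * (g(0.0) + g(t)) + sum(g(i * h) for i in range(1, n))
    return s * h / omega

# f = 1, omega = 1: the formula should reproduce u(t) = 1 - cos t,
# which satisfies u'' + u = 1, u(0) = u'(0) = 0.
t = 2.5
approx = particular(1.0, lambda t1: 1.0, t)
exact = 1.0 - math.cos(t)
```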
Exercises

1. Determine the form of k(t) in the case that the roots r1 and r2 are equal.

2. Obtain the general solution of u_{n+2} + a1 u_{n+1} + a2 u_n = f_n using the generating function approach, i.e., set u(s) = Σ_{n=0}^∞ u_n s^n, etc.

3. Obtain an explicit expression for the Laplace transform of the solution of the differential-difference equation u'(t) = a1 u(t) + a2 u(t − 1), t ≥ 1, u(t) = f(t), 0 ≤ t ≤ 1, and thus a contour integral representation for u(t). For further results, see

R. Bellman and K. L. Cooke, Differential-Difference Equations, Academic Press, New York, 1963.

1.6. Lagrange Variation of Parameters
If p and q are dependent on t, it is most convenient in this second-order case to use a special device rather than either of the general approaches based on matrix theory and the use of the adjoint operator given in the following chapter. The method employed, due to Lagrange, is useful in other situations and is worth keeping in mind. Let u1, u2 be, as above, the principal solutions of the homogeneous equation and write

u = v1u1 + v2u2,   (1.6.1)

where v1 and v2 are functions to be chosen expeditiously. Then

u' = v1u1' + v2u2' + v1'u1 + v2'u2.   (1.6.2)

Let us use one free choice to simplify this expression for u' by setting

v1'u1 + v2'u2 = 0.   (1.6.3)

This yields one relation between v1' and v2'. Since this leaves the relation u' = v1u1' + v2u2', combining (1.6.2) and (1.6.3), we have, upon a further differentiation,

u'' = v1'u1' + v2'u2' + v1u1'' + v2u2''.   (1.6.4)

Hence

u'' + pu' + qu = v1[u1'' + pu1' + qu1] + v2[u2'' + pu2' + qu2] + v1'u1' + v2'u2' = v1'u1' + v2'u2',   (1.6.5)

upon recalling the definition of u1 and u2. Thus, the equation

u'' + pu' + qu = f   (1.6.6)

yields a second relation for v1' and v2',

v1'u1' + v2'u2' = f.   (1.6.7)

Solving (1.6.3) and (1.6.7) simultaneously for the unknown functions v1', v2', we have

v1' = −u2 f/W(u1, u2), v2' = u1 f/W(u1, u2).   (1.6.8)

Hence, ignoring the constants of integration,

v1 = −∫_0^t [u2(t1) f(t1)/W(t1)] dt1, v2 = ∫_0^t [u1(t1) f(t1)/W(t1)] dt1.   (1.6.9)

Exercises
1. Why can we ignore the constants of integration?

2. Obtain the general solution of u'' + ω²u = e^{at} and thus, by differentiation with respect to a, the general solution of u'' + ω²u = te^{at}; setting a = 0 then gives the general solution of u'' + ω²u = t.
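The Lagrange recipe can be checked on an equation with genuinely variable coefficients. The example below is an illustrative choice, not from the text: u'' − (2/t²)u = 1 on t ≥ 1, with the known homogeneous solutions u1 = t², u2 = 1/t and Wronskian W = u1u2' − u1'u2 = −3. The quadratures for v1 and v2 are taken from t = 1, and the result is compared against the particular solution obtained by hand, u_p = (t²/3) ln t − (t³ − 1)/(9t).

```python
import math

def trapz(g, a, b, n=2000):
    h = (b - a) / n
    return (0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n))) * h

# u'' - (2/t**2) u = f(t), homogeneous solutions u1 = t**2, u2 = 1/t, W = -3
u1 = lambda t: t * t
u2 = lambda t: 1.0 / t
W = -3.0
f = lambda t: 1.0

def u_part(t):
    """Variation of parameters: v1' = -u2 f/W, v2' = u1 f/W, u_p = v1 u1 + v2 u2."""
    v1 = trapz(lambda s: -u2(s) * f(s) / W, 1.0, t)
    v2 = trapz(lambda s: u1(s) * f(s) / W, 1.0, t)
    return v1 * u1(t) + v2 * u2(t)

# hand computation: v1 = (ln t)/3, v2 = -(t**3 - 1)/9
t = 2.0
exact = (t * t / 3.0) * math.log(t) - (t**3 - 1.0) / (9.0 * t)
```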
1.7. Two-point Boundary Value Problem

In many important cases, such as those arising in the calculus of variations and in many parts of mathematical physics, the solution is not determined by initial-value conditions but rather by conditions at two points, such as

u(0) = c1, u(T) = c2.   (1.7.1)

Sometimes these two-point conditions determine a solution uniquely, and sometimes not, as we shall see. To obtain the solution of (1.4.1) subject to (1.7.1), we proceed as follows. Let u1, u2 be the principal solutions. Then the general solution has the form u = au1 + bu2, where a and b are arbitrary constants. The condition at t = 0 determines a, namely, a = c1. Hence we have

u = c1u1 + bu2,   (1.7.2)

where the remaining unknown parameter b is determined by the condition at t = T,

c2 = c1u1(T) + bu2(T).   (1.7.3)

Hence, if u2(T) ≠ 0, there is a unique solution. If u2(T) = 0, then u is a solution only if c2 = c1u1(T). In this case, there is a one-parameter family of solutions. More general boundary conditions can be discussed in a similar way. The determination of when u2(T) ≠ 0 is related to questions of characteristic values, which we shall discuss subsequently.

Exercises
1. Solve u'' + u = 0, u(0) = c1, u(T) = c2. Discuss the condition on T to ensure existence and uniqueness.

2. Solve u'' − au = 0, u(0) = c1, u'(T) = c2, and when u(0) = c1, u(T) = c2. Discuss the question of existence and uniqueness.

3. Consider the equation u'' + p(t)u' + q(t)u = 0 subject to conditions of the form

∫_0^T u(t) v1(t) dt = b1, ∫_0^T u(t) v2(t) dt = b2.

Let u1, u2 be two linearly independent solutions of the equation. Show that the question of existence and uniqueness of a solution subject to the foregoing conditions is dependent upon the nonvanishing of the determinant

Δ = | ∫_0^T u1v1 dt  ∫_0^T u2v1 dt ; ∫_0^T u1v2 dt  ∫_0^T u2v2 dt |.

4. Show that

Δ = (1/2) ∫_0^T ∫_0^T [u1(t1)u2(t2) − u1(t2)u2(t1)][v1(t1)v2(t2) − v1(t2)v2(t1)] dt1 dt2.

5. Show that the domain of integration may be taken to be 0 ≤ t1 ≤ t2 ≤ T.

6. Using this result, discuss the problem of solving u'' + u = 0 subject to ∫_0^T u(t) e^{λ1 t} dt = b1, ∫_0^T u(t) e^{λ2 t} dt = b2. For more general results and the connection with Čebyčev systems of functions, see

R. Bellman, "A Note on the Identification of Linear Systems," Proc. Amer. Math. Soc., Vol. 17, 1966, pp. 68-71.

7. Consider the parabolic partial differential equation u_t = u_{xx}, u(x, 0) = g(x), u(0, t) = u(1, t) = 0, t > 0. Show that L(u), the Laplace transform of u, satisfies an ordinary differential equation subject to a two-point boundary condition when considered as a function of x.
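The construction of Sec. 1.7 translates directly into code for the concrete equation u'' + u = 0, whose principal solutions are u1 = cos t, u2 = sin t. The sketch below (boundary values and T are arbitrary test choices) determines b from (1.7.3) and signals the non-uniqueness case u2(T) = sin T = 0.

```python
import math

def bvp_solution(c1, c2, T):
    """Solve u'' + u = 0, u(0) = c1, u(T) = c2, via u = c1*u1 + b*u2
    with u1 = cos, u2 = sin; uniquely solvable iff u2(T) = sin T != 0."""
    if abs(math.sin(T)) < 1e-12:
        raise ValueError("u2(T) = 0: no unique solution")
    b = (c2 - c1 * math.cos(T)) / math.sin(T)
    return lambda t: c1 * math.cos(t) + b * math.sin(t)

u = bvp_solution(1.0, 2.0, 1.0)

# at T = pi, u2(T) = 0 and uniqueness fails
try:
    bvp_solution(1.0, 2.0, math.pi)
    unique = True
except ValueError:
    unique = False
```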
1.8. Connection with Calculus of Variations

The equation

u'' + q(t)u = 0   (1.8.1)

possesses the fundamental property of being the Euler equation associated with the quadratic functional

J(u) = ∫_0^T (u'² − q(t)u²) dt.   (1.8.2)

This stationarity property holds regardless of the boundary conditions that are imposed. In Chapter 8 we will discuss the rigorous aspects in great detail. Here, let us merely indicate the connection which motivates many of the ingenious artifices that are used in the study of (1.8.1). Let u be a solution of (1.8.1), v be another function such that v' ∈ L²(0, T), and ε a parameter. Then

J(u + εv) = J(u) + 2ε ∫_0^T [u'v' − q(t)uv] dt + ε²J(v).   (1.8.3)

Integrating the term ∫_0^T u'v' dt by parts, we have

∫_0^T u'v' dt = u'v ]_0^T − ∫_0^T u''v dt.   (1.8.4)

Hence

J(u + εv) = J(u) − 2ε ∫_0^T v[u'' + q(t)u] dt + ε²J(v) = J(u) + ε²J(v).   (1.8.5)

We have blithely assumed that the integrated term u'v ]_0^T is equal to zero. This may be ensured either by virtue of the original restrictions on u and v or by choice of v; see Chapter 8 for further details.

Exercise
1. Can one obtain the equation u'' + p(t)u' + q(t)u = 0, p ≠ 0, as the Euler equation of a quadratic functional of the form

∫_0^T r(t)(u'² − q(t)u²) dt?
1.9. Green's Functions

Let us next consider the inhomogeneous equation

u'' + q(t)u = f(t), u(0) = u(T) = 0.   (1.9.1)

Referring to Sec. 1.7, we see that if u2(T) ≠ 0 there is at most one solution. Furthermore, if u2(T) ≠ 0 there is exactly one solution. To obtain this solution we proceed as follows. Using the representation in Sec. 1.6, we have

u = c1u1 + c2u2 + ∫_0^t q(t, t1) f(t1) dt1,   (1.9.2)

where c1 and c2 are constants to be determined, and

q(t, t1) = [−u1(t)u2(t1) + u2(t)u1(t1)]/W(t1).   (1.9.3)

Without loss of generality, we can take u1 and u2 to be principal solutions so that W = 1. Setting t = 0, we see that c1 = 0. To obtain c2, we set t = T,

0 = c2u2(T) + ∫_0^T q(T, t1) f(t1) dt1.   (1.9.4)

Using this value of c2 in (1.9.2), we have

u = ∫_0^t q(t, t1) f(t1) dt1 − [u2(t)/u2(T)] ∫_0^T q(T, t1) f(t1) dt1.   (1.9.5)

The dependence of the kernel k on T is not usually explicitly indicated. We ordinarily write

u = ∫_0^T k(t, t1) f(t1) dt1,   (1.9.6)

where

k(t, t1) = q(t, t1) − u2(t) q(T, t1)/u2(T), 0 ≤ t1 ≤ t,
k(t, t1) = −u2(t) q(T, t1)/u2(T), t ≤ t1 ≤ T.   (1.9.7)
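The two-piece kernel (1.9.7) is concrete enough to code. The sketch below specializes to the illustrative case q(t) ≡ 0, i.e., u'' = f with u(0) = u(T) = 0, whose principal solutions are u1 = 1, u2 = t (so W = 1); it evaluates (1.9.6) by quadrature for f ≡ 1 and compares with the hand solution u = t(t − T)/2, and checks the symmetry of k.

```python
def greens_k(t, t1, T):
    """Green's function of u'' = f, u(0) = u(T) = 0, built from the
    principal solutions u1 = 1, u2 = t via (1.9.7)."""
    q = lambda a, b: a - b                   # q(t, t1) = u2(t)u1(t1) - u1(t)u2(t1)
    if t1 <= t:
        return q(t, t1) - t * q(T, t1) / T   # equals t1*(t - T)/T
    return -t * q(T, t1) / T                 # equals t*(t1 - T)/T

def solve(f, t, T, n=2000):
    """u(t) = int_0^T k(t, t1) f(t1) dt1 by the trapezoidal rule."""
    h = T / n
    g = lambda t1: greens_k(t, t1, T) * f(t1)
    return (0.5 * (g(0.0) + g(T)) + sum(g(i * h) for i in range(1, n))) * h

# f = 1: the solution of u'' = 1, u(0) = u(T) = 0, is u = t(t - T)/2
T = 3.0
u_mid = solve(lambda t1: 1.0, 1.0, T)   # exact value 1*(1 - 3)/2 = -1
```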
It follows from this (cf. (1.9.7)) that

k(t, t1) = k(t1, t),

a most important reciprocity relation. The function k(t, t1) is called the Green's function associated with the equation and the boundary conditions. Different conditions produce different Green's functions.

Exercises
1. Show that k(t, t1) is continuous in t for 0 ≤ t ≤ T, but that k(t, t1) has a discontinuous partial derivative with respect to t at t = t1. What is the nature of the discontinuity?

2. Obtain the Green's functions for the equations
(a) u'' = f(t), u(0) = u(T) = 0,
(b) u'' = f(t), u(0) = 0, u'(T) = 0,
(c) u'' + ω²u = f(t), u(0) = u(T) = 0,
(d) u'' + ω²u = f(t), u(0) = 0, u'(T) = 0,
(e) u'' + ω²u = f(t), u(0) = 0, u'(T) + bu(T) = 0.

3. Consider the Green's function for Exercise 2(e) as a function of b. Is it continuous as b → 0 and b → ∞?

4. Consider the Green's function for εu'' + u = f(t), u(0) = u(T) = 0. Discuss the limiting behavior as ε → 0.

5. Similarly discuss the behavior of the solutions of

εu'' − u = 0, εu'' + u' = 0, εu'' − u' = 0, εu'' + tu' = 0,
εu'' + tu' − u = 0, εu'' + tu' + u = 0, εu'' + tu' − u/2 = 0,

as ε → 0. See,

C. E. Pearson, "On a Differential Equation of Boundary Layer Type," J. Math. Phys., Vol. 47, 1968, pp. 134-154.

1.10. Riccati Equation
The change of dependent variable,

u = exp(∫_0^t v dt1),   (1.10.1)

replaces the linear second-order equation

u'' + pu' + qu = 0   (1.10.2)

with the nonlinear first-order equation

v' + v² + pv + q = 0.   (1.10.3)

This is called a Riccati equation and plays a basic role in the theory of the linear second-order equation.* The transformation which appears so formal here is actually well motivated by both analytic and physical considerations, as we will see in the chapters on dynamic programming and invariant imbedding in Volume II.

Exercises
1. If v satisfies a Riccati equation, show that v^{−1} also satisfies a Riccati equation.

2. If v satisfies a Riccati equation, show that w = (av + b)/(cv + d) also satisfies a Riccati equation for any functions of t, a, b, c, and d.

3. If v(t) possesses a power series expansion v(t) = Σ_{n=0}^∞ a_n t^n, and p and q are analytic in t, show how to obtain a continued fraction expansion for v, where c1, c2, ... are positive integers. Hint: Set v = a0 + a1 t^{c1}/v1, etc.

4. If p and q are analytic in a parameter ε and v possesses a power series expansion in ε, v = v0(t) + εv1(t) + ..., show how to obtain a continued fraction expansion for v, where b1 is a positive integer. Hint: Set v = v0 + εw, etc. (The last two results are connected with the subject of Padé approximations, a topic considered in the exercises at the end of the chapter.)

* Also in the study of nonlinear equations. See H. P. F. Swinnerton-Dyer, "On a Problem of Littlewood Concerning Riccati's Equation," Proc. Cambridge Phil. Soc., Vol. 65, 1969, pp. 651-662.
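The transformation (1.10.1)-(1.10.3) can be verified symbolically or, as here, numerically on a case where everything is elementary. For u'' − u = 0 (p = 0, q = −1) take u = cosh t; then v = u'/u = tanh t must satisfy the Riccati equation v' + v² + pv + q = 0, since v' = sech²t = 1 − tanh²t. The check below evaluates the residual on a few grid points.

```python
import math

# u'' - u = 0, i.e. p = 0, q = -1; u = cosh t gives v = u'/u = tanh t
p, q = 0.0, -1.0
v = math.tanh
vprime = lambda t: 1.0 / math.cosh(t) ** 2   # derivative of tanh

# residual of the Riccati equation v' + v**2 + p*v + q at several points
residuals = [vprime(t) + v(t) ** 2 + p * v(t) + q for t in [0.0, 0.5, 1.0, 2.0]]
```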
1.11. The Cauchy-Schwarz Inequality

In the course of various estimations, we will make frequent use of the following basic inequality:

(∫_0^T fg dt)² ≤ (∫_0^T f² dt)(∫_0^T g² dt),   (1.11.1)

valid for any two real functions for which the right-hand side exists. Let us give two different proofs; each can be extended in different ways. Start with the inequality

0 ≤ (a − b)² = a² − 2ab + b²,   (1.11.2)

valid for any two real numbers a and b, equivalent to

2ab ≤ a² + b².   (1.11.3)

Now set

a = f/(∫_0^T f² dt)^{1/2}, b = g/(∫_0^T g² dt)^{1/2},   (1.11.4)

obtaining

2fg/[(∫_0^T f² dt)^{1/2}(∫_0^T g² dt)^{1/2}] ≤ f²/∫_0^T f² dt + g²/∫_0^T g² dt.   (1.11.5)

Integrating between 0 and T, we obtain (1.11.1). In the second proof, we start with the fact that we have

0 ≤ ∫_0^T (af + bg)² dt = a² ∫_0^T f² dt + 2ab ∫_0^T fg dt + b² ∫_0^T g² dt   (1.11.6)

for any real numbers a and b and for any two functions f and g belonging to L²(0, T). For the quadratic form in a and b to be non-negative for all real a and b, it is necessary that the quadratic equation

r² ∫_0^T f² dt + 2r ∫_0^T fg dt + ∫_0^T g² dt = 0   (1.11.7)

have imaginary roots. The usual discriminant condition yields (1.11.1).
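The inequality (1.11.1) also holds for the discrete weighted sums produced by a quadrature rule, so a numerical check is exact up to rounding. The sketch below uses the illustrative pair f = sin t, g = e^{−t} on [0, 2] (arbitrary choices) and verifies that the squared inner product does not exceed the product of the squared norms.

```python
import math

def trapz(g, a, b, n=1000):
    h = (b - a) / n
    return (0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n))) * h

# a hypothetical test pair on [0, T]
T = 2.0
f = lambda t: math.sin(t)
g = lambda t: math.exp(-t)

lhs = trapz(lambda t: f(t) * g(t), 0.0, T) ** 2
rhs = trapz(lambda t: f(t) ** 2, 0.0, T) * trapz(lambda t: g(t) ** 2, 0.0, T)
# discrete Cauchy-Schwarz with nonnegative trapezoidal weights guarantees lhs <= rhs
```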
Exercises

1. Show that ab ≤ a²/2 + b²/2 for a, b ≥ 0 by examining areas in the following figure. What is the case of equality?

2. In a similar fashion, establish Young's inequality, ab ≤ a^p/p + b^q/q for a, b ≥ 0, p > 1 and q determined by 1/p + 1/q = 1.

3. Thus, establish the Hölder inequality

∫_0^T fg dt ≤ (∫_0^T f^p dt)^{1/p} (∫_0^T g^q dt)^{1/q}, f, g ≥ 0.

4. Establish the triangle inequality

(∫_0^T (f + g)² dt)^{1/2} ≤ (∫_0^T f² dt)^{1/2} + (∫_0^T g² dt)^{1/2}.

5. Establish the Minkowski inequality

(∫_0^T (f + g)^p dt)^{1/p} ≤ (∫_0^T f^p dt)^{1/p} + (∫_0^T g^p dt)^{1/p}

for f, g ≥ 0, p ≥ 1.

6. From the inequality in Exercise 1, show by application to (a1 + a2) and (a3 + a4), where a1, a2, a3, a4 ≥ 0, that

(a1a2a3a4)^{1/4} ≤ (a1 + a2 + a3 + a4)/4,

and hence, generally, that

(a1a2 ... aN)^{1/N} ≤ (a1 + a2 + ... + aN)/N

for N = 2^n if the a_i are nonnegative.

7. By suitable specialization of the a_i show that this general arithmetic mean-geometric mean inequality holds for any integer N. (This proof by a combination of backward and forward induction is due to Cauchy.)

8. By suitable specialization of the a_i and a limiting procedure derive Young's inequality given in Exercise 2.

9. Show that

∫_0^T fg dt ≤ ||f|| ||g||,

where we employ the norm ||f|| = (∫_0^T |f|² dt)^{1/2}.

10. Hence, derive Minkowski's inequality

||g + h|| ≤ ||g|| + ||h||.

11. Similarly, show that

∫_0^T fg dt ≤ ||f||_p ||g||_q,

where ||f||_p = (∫_0^T |f|^p dt)^{1/p}, ||g||_q = (∫_0^T |g|^q dt)^{1/q}, with p > 1, 1/p + 1/q = 1.

12. Let φ(x) = (x1^p − x2^p − ... − xn^p)^{1/p} for x_i in the region defined by x_i ≥ 0, i = 1, 2, ..., n, and x1 > (x2^p + ... + xn^p)^{1/p}, p ≥ 1. Show that if x = (x1, x2, ..., xn), y = (y1, y2, ..., yn), then x + y belongs to the foregoing region when x and y do, and that

φ(x + y) ≥ φ(x) + φ(y).

Hint: Write φ(x) = min_z (Σ_{i=1}^n x_i z_i) for z in a suitable region.

13. Show that

This is an example of the interesting class of problems which arise in an attempt to show explicitly that an expression is nonnegative.

1.12. Perturbation and Stability Theory
The equation

u'' + a(t)u = 0   (1.12.1)

plays an important role in mathematical analysis and in mathematical physics, particularly in connection with the Schrödinger equation. Since the solution cannot, in general, be obtained in terms of the elementary functions of analysis and a finite number of operations of integration and differentiation, we must have recourse to a number of approximate techniques. We shall present only a few of the many ingenious methods that exist in the pages that follow. An important case to begin with is that where a(t) is close to a constant, which is to say, the equation has the form

u'' + (1 + f(t))u = 0,   (1.12.2)

where f is small in some sense. Once we have analyzed equations of this quite special type in detail, we will be in an excellent position to study the behavior of solutions of (1.12.1) with the aid of a simple basic transformation due to Liouville.*

Exercises

1. Consider the equation u'' + (1 + εf(t))u = 0, where ε is a parameter. Show that we can obtain a formal solution of the form u = u0 + εu1 + ε²u2 + ..., where

u0'' + u0 = 0, u1'' + u1 + fu0 = 0,

and so on. We will examine the validity of perturbation expansions of this nature in Chapter 3.
2. Consider the associated Riccati equation

v' + v² + (1 + εf) = 0.

Show that we have a formal solution of the form

v = v0 + εv1 + ε²v2 + ...,

where

v0' + v0² + 1 = 0, v1' + 2v0v1 + f = 0,

and so on.

3. Consider the equation u'' + u = g with the formal solution u = g − u'' = g − g^{(2)} + g^{(4)} − .... Under what conditions does the series represent a solution of the equation?

* See also K. Stach, "Die allgemeine Eigenschaften der Kummerschen Transformation zweidimensionaler Räume von stetigen Funktionen," Math. Rev., Vol. 36, August 1968, No. 1720.
1.13. A Counter-example

It is reasonable to expect that the condition f(t) → 0 would ensure that all solutions of

u'' + (1 + f(t))u = 0   (1.13.1)

are bounded as t → ∞* and, furthermore, are well approximated by the solutions of the tame equation

v'' + v = 0.   (1.13.2)

As the following example shows, the situation is far more complex. Consider the function

u = exp(∫_0^t g(s) cos s ds) cos t,   (1.13.3)

where g will be chosen malevolently in a moment. We have

u = w cos t, u' = −w sin t + (g cos² t)w,   (1.13.4)

upon setting w = exp(∫_0^t g cos s ds), and

u'' = −w cos t − (g cos t sin t)w − (2g cos t sin t)w + (g' cos² t)w + (g² cos³ t)w.   (1.13.5)

Hence

u'' + u = w cos t [g' cos t − 3g sin t + g² cos² t] = u[g' cos t − 3g sin t + g² cos² t],   (1.13.6)

which means that u satisfies the equation

u'' + (1 + φ)u = 0,   (1.13.7)

where

φ = 3g sin t − g' cos t − g² cos² t.   (1.13.8)

Choosing

g = cos t/t,   (1.13.9)

we see that g, g', and g² all approach zero as t → ∞. Nonetheless, the integral ∫^t g(s) cos s ds diverges, which means that u increases indefinitely in amplitude as t → ∞. Hence, the study of (1.13.1) has its interesting aspects in the case where f(t) → 0 as t → ∞.

* As a matter of fact, a well-known mathematician published a "proof" of this.

1.14. ∫^∞ |f(t)| dt < ∞
The example presented in the preceding section shows that it is not sufficient to require that f(t) → 0 as t → ∞ if we want all solutions of u'' + (1 + f(t))u = 0 bounded as t → ∞. Let us now show that the condition

∫_0^∞ |f(t)| dt < ∞   (1.14.1)

is sufficient. To do this, we convert the linear differential equation into the linear integral equation

u = c1 cos t + c2 sin t − ∫_0^t sin(t − t1) f(t1) u(t1) dt1.   (1.14.2)

Hence

|u| ≤ |c1| + |c2| + ∫_0^t |f(t1)| |u(t1)| dt1.   (1.14.3)

Applying the fundamental inequality of Sec. 1.3, we have

|u| ≤ (|c1| + |c2|) exp(∫_0^t |f(t1)| dt1) ≤ c3 < ∞,   (1.14.4)

the desired result.

Exercises
1. Extend the foregoing method to show that if all solutions of u'' + a(t)u = 0 are bounded as t → ∞, then all solutions of u'' + (a(t) + b(t))u = 0 are bounded, under the assumption that ∫^∞ |b(t)| dt < ∞.

2. Show that |u'| is bounded as t → ∞ under the assumption ∫^∞ |f| dt < ∞.

3. Show that if all solutions of u'' + a(t)u = 0 belong to L²(0, ∞), then all solutions of u'' + (a(t) + b(t))u = 0 belong to L²(0, ∞) if |b(t)| ≤ c1 < ∞.

4. Show that all solutions of u'' + u/t^k = 0 cannot be bounded if k > 1.

5. Construct a function f(t) with the property that max_{0≤t≤T} |f| approaches ∞ as T → ∞ and yet ∫^∞ |f| dt < ∞.

6. Show that all solutions of u'' + (1 + b(sin 2t)/t)u = 0 cannot be bounded as t → ∞.

7. Show that all solutions of u'' + gu' + u = 0 are bounded if ∫^∞ |g| dt < ∞.

8. Obtain a bound for the difference between the solutions of u'' + pu' + qu = 0 and v'' + p1v' + q1v = 0 over 0 ≤ t ≤ T in terms of the integrals ∫_0^T |p − p1| dt, ∫_0^T |q − q1| dt.
1.15. ∫^∞ |f'(t)| dt < ∞

Next, let us demonstrate that all solutions of

u'' + (1 + f(t))u = 0   (1.15.1)

are bounded as t → ∞ under the conditions

f(t) → 0 as t → ∞, ∫^∞ |f'(t)| dt < ∞.   (1.15.2)

Starting with (1.15.1), we have

u'u'' + (1 + f(t)) uu' = 0.   (1.15.3)

Integrating,

(u')² + u² = c1 − f(t)u² + ∫_0^t f'(t1) u² dt1.   (1.15.4)

Without loss of generality, take |f(t)| ≤ 1/2 for t ≥ 0. Then (1.15.4) yields

u² ≤ 2c1 + 2 ∫_0^t |f'(t1)| u² dt1.   (1.15.5)

Applying the fundamental inequality, we have

u² ≤ 2c1 exp(2 ∫_0^t |f'(t1)| dt1) ≤ c2 < ∞,   (1.15.6)

the desired result.
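The boundedness results of Secs. 1.14-1.15 are easy to observe numerically. The sketch below (a plain fourth-order Runge-Kutta integrator; the choice f(t) = 1/(1 + t)², which satisfies f → 0, ∫|f| dt = 1, and ∫|f'| dt < ∞, is illustrative) integrates u'' + (1 + f(t))u = 0 far out in t and records the largest |u|; by (1.14.4) with c1 = 1, c2 = 0 the amplitude should stay below e ≈ 2.72.

```python
def f(t):
    return 1.0 / (1.0 + t) ** 2     # f -> 0, int |f| dt = 1, int |f'| dt < inf

def rhs(t, y):
    u, up = y
    return (up, -(1.0 + f(t)) * u)

def rk4(y, t, h):
    k1 = rhs(t, y)
    k2 = rhs(t + h/2, (y[0] + h/2*k1[0], y[1] + h/2*k1[1]))
    k3 = rhs(t + h/2, (y[0] + h/2*k2[0], y[1] + h/2*k2[1]))
    k4 = rhs(t + h, (y[0] + h*k3[0], y[1] + h*k3[1]))
    return (y[0] + h/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            y[1] + h/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

y, t, h = (1.0, 0.0), 0.0, 0.01     # u(0) = 1, u'(0) = 0
max_u = 0.0
for _ in range(20000):              # integrate out to t = 200
    y = rk4(y, t, h)
    t += h
    max_u = max(max_u, abs(y[0]))
# bound (1.14.4): |u| <= (|c1| + |c2|) exp(int |f|) = e
```

Replacing f by the malevolent φ of Sec. 1.13 and integrating long enough exhibits the slow growth of the counterexample.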
Exercises

1. Show that |u'| is bounded under the same hypotheses.

2. Show that all solutions of u'' + (1 + f + g)u = 0 are bounded under the assumptions that ∫^∞ |f| dt < ∞, ∫^∞ |g'| dt < ∞, and g(t) → 0 as t → ∞.

3. Is the full force of g(t) → 0 as t → ∞ required?

1.16. Asymptotic Behavior
Now that we have established boundedness under the condition that ∫^∞ |f| dt < ∞, let us obtain some more precise information concerning the behavior of u(t) as t → ∞. Turning to (1.14.2), we have

u = c1 cos t + c2 sin t − (sin t) ∫_0^t (cos t1) f(t1) u(t1) dt1 + (cos t) ∫_0^t (sin t1) f(t1) u(t1) dt1.   (1.16.1)

Since |u(t)| is bounded and ∫^∞ |f(t)| dt < ∞, we see that both integrals converge as t → ∞. Hence

u = cos t [c1 + ∫_0^∞ (sin t1) f(t1) u(t1) dt1] + sin t [c2 − ∫_0^∞ (cos t1) f(t1) u(t1) dt1] + o(1)
  = c3 cos t + c4 sin t + o(1),*   (1.16.2)

replacing the definite integrals by their constant values. It is clear that c3 and c4 are linear functions of c1 and c2. They are, however, quite complex functionals of f. In the chapter on invariant imbedding in Volume II we will pursue the question of their analytic and numerical determination. There are important applications of results of this nature to scattering theory.

Exercises
1. Consider the equation u'' + (1 + φ(t))u = 0 and the associated integral equation

u = v − ∫_0^t sin(t − t1) φ(t1) u(t1) dt1,

where v = c1 cos t + c2 sin t. Iterating, show that

u = v − ∫_0^t sin(t − t1) φ(t1) v(t1) dt1 + ∫_0^t [∫_{t2}^t sin(t − t1) sin(t1 − t2) φ(t1) dt1] φ(t2) u(t2) dt2.

2. Hence, show that all solutions are bounded if

(a) ∫_{t0}^t φ(t1) dt1, ∫_{t0}^t φ(t1) sin 2t1 dt1, ∫_{t0}^t φ(t1) cos 2t1 dt1 are uniformly bounded for t ≥ t0,

(b) ∫_{t0}^t |∫_{t2}^t φ(t1) sin(t − t1) sin(t1 − t2) dt1| |φ(t2)| dt2 is uniformly bounded for some t0.

What conditions on φ ensure (a) and (b)?

* Recall our previous use of the o-notation.
3. Show that all solutions of u'' + (1 + sin at/t^b)u = 0 are bounded if a ≠ 2, b > 1/2.

4. Show that all solutions of u'' + (1 + sin t^{2a})u = 0 are bounded if a > 1.

5. Obtain the asymptotic behavior of solutions of u'' + gu' + u = 0 as t → ∞ under the hypothesis that ∫^∞ |g| dt < ∞.

6. Are all solutions of u'' + gu' + u = 0 bounded if we assume that g → 0, ∫^∞ |g'| dt < ∞, and |g| ≤ k < 1 for t ≥ t0?

1.17. The Equation u'' − (1 + f(t))u = 0
Let us now consider the equation

u'' − (1 + f(t))u = 0   (1.17.1)

under the assumption that f(t) → 0 as t → ∞. The behavior of the solutions of this equation is very much easier to discuss than that of u'' + (1 + f(t))u = 0 since we can employ the associated Riccati equation,

v' + v² − (1 + f(t)) = 0.   (1.17.2)

We shall carry through the analysis in detail since it will give us an opportunity to introduce techniques that will be employed in more general form in Chapter 3. In (1.17.2), set v = 1 + w. Then w satisfies the equation

w' + 2w + w² − f(t) = 0.   (1.17.3)

Regarding f(t) − w² as a forcing term, we see that w satisfies the integral equation

w = ce^{−2t} + e^{−2t} ∫_0^t e^{2t1}[f(t1) − w²] dt1,   (1.17.4)

where c = w(0). Let us now show that if |c| is sufficiently small (a bound will be given below), the solution of (1.17.4) exists for all t > 0 and w → 0 as t → ∞. For our present purposes, we could choose c = 0, but it is instructive to give the full result. To do this, we use the method of successive approximations. Set

w0 = ce^{−2t} + e^{−2t} ∫_0^t e^{2t1} f(t1) dt1,
w_{n+1} = ce^{−2t} + e^{−2t} ∫_0^t e^{2t1}[f(t1) − w_n²] dt1, n ≥ 0.   (1.17.5)

Without loss of generality, we suppose that |f(t)| ≤ |c| for t ≥ 0, since we are interested in the case where f → 0 as t → ∞. We wish to show that

|w_n| ≤ 4|c|   (1.17.6)

for n ≥ 0. We do this inductively. The result holds for n = 0 since

|w0| ≤ |c| e^{−2t} + |c| e^{−2t} ∫_0^t e^{2t1} dt1 ≤ |c| + |c|/2 ≤ 4|c|.

Assume that it holds for n. Then

|w_{n+1}| ≤ |c| + e^{−2t} ∫_0^t e^{2t1}(|c| + 16|c|²) dt1 ≤ |c| + |c|/2 + 8|c|² ≤ 4|c|,

if |c| ≤ 1/4. To establish convergence of w_n as n → ∞, we write

w_{n+1} − w_n = −e^{−2t} ∫_0^t e^{2t1}(w_n + w_{n−1})(w_n − w_{n−1}) dt1   (1.17.9)

for n ≥ 1. For n = 1, we have

|w1 − w0| ≤ e^{−2t} ∫_0^t e^{2t1} w0² dt1 ≤ 16|c|² e^{−2t} ∫_0^t e^{2t1} dt1 ≤ 8|c|².   (1.17.10)

Iterating the relation in (1.17.9), we have

|w_{n+1} − w_n| ≤ 8|c|² (8|c|t)^n/n!.   (1.17.11)

Hence, the series Σ_{n=0}^∞ |w_{n+1} − w_n| converges uniformly in any finite t-interval, and thus w_n converges to a function w which satisfies the integral equation in (1.17.4). It remains to show that w → 0 as t → ∞. We have

w(t) = ce^{−2t} + e^{−2t} ∫_0^t e^{2t1} f(t1) dt1 − e^{−2t} ∫_0^t e^{2t1} w² dt1.   (1.17.12)

We are now ready to employ Exercise 6 at the end of Sec. 1.3. We have

|w| ≤ |ce^{−2t} + e^{−2t} ∫_0^t e^{2t1} f(t1) dt1| + 4|c| e^{−2t} ∫_0^t e^{2t1} |w| dt1,   (1.17.13)

and we know that the first term on the right approaches zero as t → ∞ since f → 0 as t → ∞. Hence, since 4|c| < 1 we have, by the cited exercise,

e^{−2t} ∫_0^t e^{2t1} |w| dt1 → 0   (1.17.14)

as t → ∞. Returning to (1.17.13), this yields the desired result that w → 0 as t → ∞. An alternate proof yielding a stronger result is given below.
1. Why can't we use the Riccati equation to treat the asymptotic (1 f ( t ) ) u= 0 ? behavior of the solutions of U"
+ +
1.18. M o r e Refined Asymptotic Behavior
Now that we have demonstrated that w + 0 as t + co, let us see if we can obtain some more accurate bounds. From the equation (1.18.1)
1.19. l m f z d t <
27
00
we have ( 1.1 8.2)
Hence
< lgl+4 Clearly,
Ig I
=
I
ce-2'
+
e-2'
It
(1.18.3)
fi
e2ty dt,
0
dt,
.
(1.18.4)
A direct calculation shows that e-2'
St
e2t11 g
0
I dt,
< I c I e-t + 2e-t f e t l I f I dt, ,
(1.18.5)
0
upon using the right-hand bound on the integral. Hence, for small 1 c 1 we have lg I
< 2 [I c I e-t
+ e-t
fet1 0
I ~dt,]. I
(1.1 8.6)
Exercise
1. Why did we deliberately replace e-Zt by 1.19. J" f Z d t
<
e-1,
a weaker estimate ?
03
If we use the foregoing inequality (1.18.6) and iterate, we can obtain a still stronger result. Without loss of generality, take c = 0, since the term ce^{−2t} is much smaller than e^{−2t} ∫_0^t e^{2t1} g dt1 in most cases of interest. Furthermore, as indicated below, we are really interested in demonstrating the existence of a solution of (1.17.3) with the property that w → 0. Then we have

e^{−2t} ∫_0^t e^{2t1} w² dt1 = O(∫_0^t e^{−(t−t1)} |f|² dt1),   (1.19.1)

upon using the Cauchy-Schwarz inequality. Hence

w = e^{−2t} ∫_0^t e^{2t1} f dt1 + O(∫_0^t e^{−(t−t1)} |f|² dt1).   (1.19.2)

Since u'/u = 1 + w, we see that it is the behavior of ∫_0^t w dt1 which we want in order to estimate that of u(t) as t → ∞. We have

∫_0^t w dt1 = ∫_0^t e^{−2t1} (∫_0^{t1} e^{2t2} f dt2) dt1 + O(∫_0^t f² dt1).   (1.19.3)

Integration by parts yields

∫_0^t w dt1 = (1/2) ∫_0^t f dt1 − (1/2) ∫_0^t e^{−2(t−t1)} f(t1) dt1 + O(∫_0^t f² dt1).   (1.19.4)

Hence

u = c1 exp(t + (1/2) ∫_0^t f dt1 + o(1))   (1.19.5)

as t → ∞. We can thus assert that if f → 0 as t → ∞ and ∫^∞ f² dt < ∞, there are two solutions of

u'' − (1 + f)u = 0   (1.19.6)

possessing the asymptotic forms

u1 = exp(t + (1/2) ∫_0^t f dt1 + o(1)), u2 = exp(−t − (1/2) ∫_0^t f dt1 + o(1)).   (1.19.7)

We can obtain the behavior of u2 from u1 by using the representation u2 = u1 ∫_t^∞ dt1/u1² for a second solution of (1.17.1) in terms of a first.

Exercises
1. Show that if f → 0 and ∫^∞ |f|³ dt < ∞, there are two solutions of u'' − (1 + f(t))u = 0 possessing the asymptotic expansions.

2. Simplify this in the case where f = sin t/√t.

3. Consider the case where f possesses the convergent expansion f = a1/t + a2/t² + ... for large t, and obtain corresponding expansions for u1 and u2. Are the expansions convergent?

4. What is the eventual behavior of the solution of v' + v² − 1 − f(t) = 0 with v(0) > 1 if f → 0 as t → ∞? Hint: Consider the behavior of v in the (v, t)-plane.

1.20. The Second Solution
T h e equation
+
v'
v2 -
1 -f(t)
=
(1.20.1)
0
appears to have a solution that approaches -1 as t + GO, provided that f -+ 0 as t -+ 03. We can establish this indirectly as above by using the properties of u p . Can we demonstrate the existence of this solution directly ? Set v = - 1
+ w , obtaining the equation w'
=
2w - w2 + f ( t ) .
(1.20.2)
If we convert this into an integral equation in the usual fashion, obtaining

w = e^{2t}c + e^{2t} ∫_0^t e^{-2t_1}[f(t_1) - w^2] dt_1,   (1.20.3)

we see that the presence of e^{2t} causes some trouble. Hence, let us write

w = e^{2t}c + e^{2t} ∫_0^∞ e^{-2t_1}[f(t_1) - w^2] dt_1 - e^{2t} ∫_t^∞ e^{-2t_1}[f(t_1) - w^2] dt_1.   (1.20.4)

Since we want w to approach 0 as t → ∞, the infinite integral should be convergent. Choose c so that

c + ∫_0^∞ e^{-2t_1}[f(t_1) - w^2] dt_1 = 0.   (1.20.5)
Alternatively, we can begin with the integral equation

w = e^{2t} ∫_t^∞ e^{-2t_1}[w^2 - f(t_1)] dt_1   (1.20.6)

and employ the method of successive approximations to establish the existence of the desired solution,
w_0 = -e^{2t} ∫_t^∞ e^{-2t_1} f(t_1) dt_1,   (1.20.7)

w_{n+1} = e^{2t} ∫_t^∞ e^{-2t_1}[w_n^2 - f(t_1)] dt_1,   n ≥ 0.
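The successive approximations converge rapidly when f is small, and the limit can be checked against the differential equation w' = 2w - w^2 + f. A discretized sketch, in which the choice f(t) = 1/(1+t)^2, the grid, and the truncation of the infinite integral at T = 30 are our illustrative assumptions:

```python
import math

# grid on [0, T]; the kernel e^{-2(t1-t)} and f(t) = 1/(1+t)^2 both decay,
# so truncating the infinite integral at T = 30 is harmless
T, N = 30.0, 3000
h = T / N
f = [1.0 / (1.0 + i * h) ** 2 for i in range(N + 1)]

def next_iterate(w):
    # w_new(t) = e^{2t} ∫_t^∞ e^{-2 t1} [w(t1)^2 - f(t1)] dt1,
    # accumulated backward with the trapezoidal rule:
    # I(t_i) = e^{-2h} I(t_{i+1}) + (h/2)(g_i + e^{-2h} g_{i+1})
    g = [w[i] ** 2 - f[i] for i in range(N + 1)]
    decay = math.exp(-2.0 * h)
    out = [0.0] * (N + 1)
    acc = 0.0
    for i in range(N - 1, -1, -1):
        acc = decay * acc + (h / 2.0) * (g[i] + decay * g[i + 1])
        out[i] = acc
    return out

w = [0.0] * (N + 1)            # one sweep from w = 0 reproduces w_0 of (1.20.7)
for _ in range(40):
    w_new = next_iterate(w)
    delta = max(abs(a - b) for a, b in zip(w, w_new))
    w = w_new

# the limit should satisfy w' = 2w - w^2 + f; check at an interior point
i = N // 3
residual = (w[i + 1] - w[i - 1]) / (2 * h) - (2 * w[i] - w[i] ** 2 + f[i])
print(delta, residual)   # both very small
```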
We leave it to the reader to carry out the details. We shall employ this device again in Chapter 3.

1.21. The Liouville Transformation
Let us now engage in some analytic chicanery which will reduce the equation of (1.12.1) to an equation whose coefficients are close to constants in many cases of significance. To simplify the algebra, let us take the equation in the form

u'' ± a^2(t)u = 0.   (1.21.1)

Perform the change of variable

s = ∫_0^t a(t_1) dt_1,   (1.21.2)

an idea due to Liouville. Then a simple calculation shows that (1.21.1) becomes

d^2u/ds^2 + (a'(t)/a^2(t)) du/ds ± u = 0.   (1.21.3)
We then see that there are two cases of importance where (1.21.3) is close in form to an equation with constant coefficients. The first is where the t-interval is finite and |a'(t)/a(t)| << 1 in this interval,* i.e., where a(t) is slowly varying. The second is where a(t) → ∞ as t → ∞ and |a'(t)/a^2(t)| → 0 as t → ∞. As we know from what has preceded, a sufficient condition for closeness of the solution of (1.21.3) to the solution of u''(s) + u(s) = 0 over the infinite interval is that

∫^∞ |a'(t)/a^2(t)| ds < ∞.   (1.21.4)

If a'(t) > 0, a(t) > 0, we see that

∫^∞ (a'(t)/a^2(t)) ds = ∫^∞ (a'(t)/a(t)) dt = log a(t) ]^∞ = ∞,   (1.21.5)

if a(t) → ∞ as t → ∞. Hence, we require a further transformation of the equation in this case of frequent occurrence. Let us consider first the case u'' + a^2(t)u = 0. The equation with the minus sign is easier to consider.

* We shall use the notation a << b to mean that a is very much smaller than b.

Exercises
1. Use the Liouville transformation to establish the following result of Gusarov: If u'' + (1 + f(t))u = 0 with 1 + f(t) ≥ a^2 > 0, ∫^∞ |f''(t)| dt < ∞, then u is bounded as t → ∞.
2. Hence, discuss the boundedness of the solutions of

u'' + (1 + (cos t^a)/t^b) u = 0

for 0 < a, b < 1. See

R. Bellman, "Boundedness of Solutions of Second Order Linear Differential Equations," Duke Math. J., Vol. 22, 1955, pp. 511-514.
3. Show that if we set u = g exp[i ∫ dt/(pg^2)], then -(pu')' + qu = 0 is converted into ((pg')'/g) - (1/pg^4) = q. For applications of this result, see

J. Walter, "Bemerkungen zu dem Grenzpunktfallkriterium von N. Levinson," Math. Zeit., Vol. 105, 1968, pp. 345-350.
J. F. deSpautz and R. A. Lerman, "Equations Equivalent to Nonlinear Differential Equations," Proc. Amer. Math. Soc., Vol. 18, 1967, pp. 441-444.

1.22. Elimination of Middle Term
Consider the general linear equation

u'' + p(s)u' + q(s)u = 0.   (1.22.1)

The change of variable

u = exp(-(1/2) ∫^s p ds_1) v   (1.22.2)

converts this into the equation

v'' + (q - p'/2 - p^2/4)v = 0,   (1.22.3)

as a direct calculation shows. Thus, if we set

p(s) = a'(t)/a^2(t),   q(s) = 1,   (1.22.4)

we see that v satisfies the equation

v'' + [1 - (1/2)(d/ds)(a'/a^2) - (1/4)(a'/a^2)^2] v = 0.   (1.22.5)

If the integrals

∫^∞ |(d/ds)(a'/a^2)| ds,   ∫^∞ (a'/a^2)^2 ds   (1.22.6)

converge, we can apply the results of Sec. 1.16 to conclude that the solutions of (1.22.5) are asymptotically given by the solutions of

w'' + w = 0   (1.22.7)

as s → ∞. From this we can determine the asymptotic behavior of the solution of (1.21.1), as we do below. Prior to this, let us see what the conditions of (1.22.6) are in the t-variable. Since s = ∫_0^t a(t_1) dt_1, ds/dt = a(t), we see that they are

∫^∞ |(d/dt)(a'/a^2)| dt < ∞,   ∫^∞ (a'^2/a^3) dt < ∞.   (1.22.8)
Exercises

1. Determine the asymptotic behavior of the solutions of u'' + t^a u = 0, a > 0; u'' + e^{at} u = 0, a > 0; u'' + (log t)^a u = 0, a > 0. The last equation illustrates the fact that it may be necessary to apply the Liouville transformation twice.

2. Determine the asymptotic behavior in the cases where a < 0.

3. Determine the asymptotic behavior of the solution of

u'' + (1 + 1/t^a)u = 0,   1 > a > 0.
1.23. The WKB Approximation

Retracing our steps to the original equation, (1.21.1), we see that under appropriate conditions on a(t), e.g., those of (1.22.8), a useful approximation to the solution of

u'' + a^2(t)u = 0   (1.23.1)

as t → ∞ is given by

u ≅ exp[±i ∫^t a(t_1) dt_1] / a(t)^{1/2}.   (1.23.2)
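A numerical illustration of the WKB amplitude law: since (1.23.2) predicts |u| ~ a(t)^{-1/2}, the quantity I(t) = u'^2/a + a u^2 should remain nearly constant along solutions of u'' + a^2(t)u = 0. The choice a(t) = 10 + t, which satisfies the conditions of (1.22.8), and the step size are our illustrative assumptions:

```python
import math

def a(t):
    # a'/a^2 = 1/(10+t)^2 is uniformly small and the integrals of (1.22.8) converge
    return 10.0 + t

def integrate(t_end, u, v, h=0.001):
    # RK4 for u' = v, v' = -a(t)^2 u, starting at t = 0
    t = 0.0
    def d(t, u, v):
        return v, -a(t) ** 2 * u
    while t < t_end - 1e-12:
        k1u, k1v = d(t, u, v)
        k2u, k2v = d(t + h/2, u + h/2*k1u, v + h/2*k1v)
        k3u, k3v = d(t + h/2, u + h/2*k2u, v + h/2*k2v)
        k4u, k4v = d(t + h, u + h*k3u, v + h*k3v)
        u += h/6*(k1u + 2*k2u + 2*k3u + k4u)
        v += h/6*(k1v + 2*k2v + 2*k3v + k4v)
        t += h
    return u, v

u20, v20 = integrate(20.0, 1.0, 0.0)
I0 = 0.0 ** 2 / a(0.0) + a(0.0) * 1.0 ** 2      # = 10 at t = 0
I20 = v20 ** 2 / a(20.0) + a(20.0) * u20 ** 2
print(I20 / I0)   # stays close to 1, confirming the a^(-1/2) amplitude law
```

The amplitude of u itself drops from 1 to roughly (a(0)/a(20))^{1/2} ≈ 0.58 over the run, as (1.23.2) predicts.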
1.24. The One-dimensional Schrödinger Equation

In quantum mechanics, it is often a question of solving an equation of the form (1.24.1), where t is constrained to a finite interval. Carrying out the foregoing changes of independent and dependent variable, we obtain the equation (1.24.2). Hence, if the functions (1.24.3) are well behaved over the corresponding finite s-interval, we can write

v = v_0 + Σ_{k=1}^∞ λ^k v_k,   (1.24.4)

a convergent series, and thus, retracing our steps, obtain an approximation for u.

1.25. u'' + (1 + f(t))u = 0; Asymptotic Behavior

Using the Liouville transformation, we can readily determine the asymptotic behavior of the solutions of

u'' + (1 + f(t))u = 0   (1.25.1)
in an important case where f → 0 but ∫^∞ |f| dt = ∞. Let us suppose that

∫^∞ |f'| dt < ∞.   (1.25.2)

Performing the change of variable

s = ∫_0^t (1 + f)^{1/2} dt_1,   (1.25.3)

we obtain the new equation

d^2u/ds^2 + [f'/(2(1 + f)^{3/2})] du/ds + u = 0.   (1.25.4)

The condition

∫^∞ |f'/(2(1 + f)^{3/2})| ds < ∞   (1.25.5)

is met if (1.25.2) is satisfied and f → 0 as t → ∞. Hence, (1.25.4) has solutions which are asymptotically of the form

u_1 = [1 + o(1)] cos s,
u_2 = [1 + o(1)] sin s.   (1.25.6)
This means that (1.25.1) has solutions of the form

u_1 = [1 + o(1)] cos[∫_0^t (1 + f)^{1/2} dt_1],
u_2 = [1 + o(1)] sin[∫_0^t (1 + f)^{1/2} dt_1].   (1.25.7)

If, in addition, we suppose that ∫^∞ f^2 dt < ∞, we can write

u_1 = [1 + o(1)] cos[t + (1/2) ∫_0^t f dt_1],
u_2 = [1 + o(1)] sin[t + (1/2) ∫_0^t f dt_1].   (1.25.8)
Exercise
1. Determine the asymptotic behavior of the solutions of

u'' + (1 + 1/t^a)u = 0   for a > 0.
1.26. Asymptotic Series

The techniques discussed in the foregoing pages enable us to find the principal terms in the asymptotic expressions for solutions of equations of the form

u'' - (1 + 1/t^2)u = 0.   (1.26.1)

Suppose that we wish to obtain more refined estimates. It is tempting to employ the following procedure. Set

u = e^t (a_0 + a_1/t + a_2/t^2 + ⋯),   (1.26.2)

where the a_i are constants, and substitute in (1.26.1). Equating coefficients of powers of 1/t we obtain the recurrence relation

2n a_n = (n^2 - n - 1) a_{n-1},   n ≥ 1,   (1.26.3)

with a_0 = 1. Since

|a_n| ≥ ((n - 1)/2) |a_{n-1}|,   n ≥ 2,   (1.26.4)
we see that |a_n| grows essentially like (n - 1)!. Consequently, the series in (1.26.2) diverges for all t. Nonetheless, it can be used to calculate numerical values of u(t) for large t! It is a particular example of an asymptotic series. These series of paramount importance were introduced into analysis in a systematic fashion by Poincaré and Stieltjes. Let us now give a formal definition. We say that a series

S(t) = a_0 + a_1/t + a_2/t^2 + ⋯   (1.26.5)

is an asymptotic series for a function f(t) as t → ∞ if the coefficients are obtained in the following fashion:

a_0 = lim_{t→∞} f(t),
a_1 = lim_{t→∞} t(f(t) - a_0),
. . .
a_n = lim_{t→∞} t^n [f(t) - a_0 - a_1/t - ⋯ - a_{n-1}/t^{n-1}].   (1.26.6)
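A concrete illustration of why a divergent asymptotic series is still useful for computation: f(t) = ∫_0^∞ e^{-x}/(1 + x/t) dx has the asymptotic series Σ (-1)^n n!/t^n (expand the denominator and integrate term by term), which diverges for every t, yet truncating near n = t gives excellent accuracy. The example and the quadrature parameters are our own:

```python
import math

def f(t, h=0.001, upper=60.0):
    # trapezoidal quadrature for ∫_0^∞ e^{-x}/(1 + x/t) dx;
    # the integrand is negligible beyond x = 60
    n = int(upper / h)
    total = 0.5 * (1.0 + math.exp(-upper) / (1.0 + upper / t))
    for i in range(1, n):
        x = i * h
        total += math.exp(-x) / (1.0 + x / t)
    return total * h

t = 5.0
exact = f(t)
errors = []
partial, term = 0.0, 1.0                # term = (-1)^n n!/t^n, starting at n = 0
for n in range(15):
    partial += term
    errors.append(abs(partial - exact))
    term *= -(n + 1) / t                # next term of the asymptotic series
print(errors)   # decreases until n is of order t, then grows without bound
```

The smallest error occurs near n = t = 5; beyond that the factorial growth of the coefficients takes over.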
We write

f(t) ~ Σ_{n=0}^∞ a_n t^{-n}.   (1.26.7)

When we use the notation

f(t) ~ e^t Σ_{n=0}^∞ a_n t^{-n},   (1.26.8)

we mean that e^{-t} f(t) possesses the asymptotic series Σ_{n=0}^∞ a_n t^{-n}. The concept of asymptotic series is of particular significance in the theory of differential equations, where it is often not difficult to show the existence of generalized solutions in the form of asymptotic series. A fundamental problem is then that of establishing, under suitable hypotheses, that these correspond to actual solutions.
Exercises

1. Show that if f ~ Σ_{n=0}^∞ a_n t^{-n} and g ~ Σ_{n=0}^∞ b_n t^{-n}, then

c_1 f + c_2 g ~ Σ_{n=0}^∞ (c_1 a_n + c_2 b_n) t^{-n},

and that fg ~ [a_0 b_0 + (a_0 b_1 + a_1 b_0)/t + ⋯].

2. If a_0 ≠ 0, show that f^{-1} ~ [1/a_0 - a_1/(a_0^2 t) + ⋯].

3. If f ~ a_2/t^2 + a_3/t^3 + ⋯, show that ∫_t^∞ f dt_1 ~ a_2/t + a_3/(2t^2) + ⋯. Hence, show that if f ~ a_0 + a_1/t + ⋯ and f' possesses an asymptotic series, then f' ~ -a_1/t^2 - ⋯.

4. Show by means of a specific example that f may possess an asymptotic series without f' possessing an asymptotic series. Hint: Consider a case where f ~ 0 + 0/t + ⋯.

5. Let f(t) be defined for t ≥ 0 by

…

Show that

…

and obtain a linear differential equation satisfied by f(t).
6. Show that u'' - (1 + 1/t^2)u = 0 possesses two solutions with the asymptotic series obtained formally, by converting the differential equation into a suitable integral equation and proceeding as in the foregoing pages.

7. Obtain the detailed asymptotic behavior of the solutions of u'' - (1 + 1/t)u = 0.

For the foregoing and additional references, see

R. Bellman, Stability Theory of Differential Equations, Dover Publications, New York, 1969.

For asymptotic series associated with differential-difference equations, see the book by Bellman and Cooke previously referred to. For the delicate question of deciding how to use asymptotic series effectively, see

G. M. Roe, "An Accurate Method for Terminating Asymptotic Series," J. Soc. Indust. Appl. Math., Vol. 8, 1960, pp. 354-367.
1.27. The Equation u' = p(u, t)/q(u, t)

Let us now show that we can analyze the nature of the solutions of

u' = p(u, t)/q(u, t),   (1.27.1)

where p and q are polynomials in their arguments, using quite simple analytic and geometric ideas, provided that we agree to restrict our attention to those solutions which are continuous for all large t. The methods are quite different from those we applied to study the Riccati equation. Our first result is that every such solution is ultimately monotonic. We must show that u' cannot vanish for a sequence of t-values of arbitrary magnitude, unless u = c, a constant. The proof is by contradiction. Suppose that u' vanishes at the sequence of points {t_k}, where t_k → ∞. Then the solution u and the curve v = v(t) defined by the equation p(v, t) = 0 intersect at these points. Since p is a polynomial in v and t, the curve defined in this fashion possesses only a finite number of branches. Hence, the solution u must intersect one of these branches infinitely often. Without loss of generality, we may suppose that p and q possess no common factor and thus, that q possesses a constant sign in the immediate neighborhood of a solution of p = 0.
The branches of p = 0 which extend to infinity consist of curves of the form

(a) v = c,   (b) v = v(t),   (1.27.2)

where v(t) is ultimately monotonic. Let us consider the second case first and show that the solution cannot intersect one of these branches infinitely often. For t sufficiently large, the points of intersection cannot be maxima or minima. Drawing a figure will convince the reader of this. Hence, the points of intersection of u and v are points of inflection. But again, a simple diagram will show that two points of intersection which are points of inflection must be separated by a point of intersection which is not an inflection point, a contradiction. We will provide an analytic proof below to supplement this intuitive geometric proof. Let us next consider the possibility of intersection of u with curves of the form v = c. These intersections must again be points of inflection of u and have one of the four forms shown in Fig. 1.1. We can eliminate (a) and (b) since u' changes sign in the neighborhood, but p(u, t)/q(u, t) does not change sign if u ≤ c or u ≥ c. If (c) and (d) occur, they can occur only a finite number of times since u can return to intersect v = c only by means of intersections with curves of the form v = v(t) or by means of intersections of type (a) and (b) with curves of the form v = c_1.

Figure 1.1
1.28. Monotonicity of Rational Functions of u and t

We can establish a stronger result using some simple analysis. Let us show that if u is a solution of (1.27.1) continuous for all large t, then any rational function of u and t,

h(u, t),   (1.28.1)
is ultimately strictly monotone unless r(u, t) = 0 is satisfied by a solution of (1.27.1), or unless h(u, t) is constant along a solution of (1.27.1). We have

dh/dt = (∂h/∂u)(p/q) + ∂h/∂t = r(u, t)/s(u, t).   (1.28.2)
If dh/dt is not of constant sign as t → ∞, it either vanishes infinitely often or becomes infinite infinitely often as t → ∞, where u is a solution of (1.27.1). Consider first the case where dh/dt = 0 infinitely often, which means that one branch of r(u, t) = 0 has an infinite number of intersections with u. For t ≥ t_0, this branch has an expansion of the form

v = a_0 t^{c_0} + a_1 t^{c_1} + ⋯,   c_0 > c_1 > ⋯,   a_0 ≠ 0.   (1.28.3)

At the intersections,

du/dt = b_0 t^{d_0} + b_1 t^{d_1} + ⋯,   d_0 > d_1 > ⋯,   b_0 ≠ 0,   (1.28.4)

upon using the series in (1.28.3). Returning to (1.28.3), we have

dv/dt = a_0 c_0 t^{c_0 - 1} + a_1 c_1 t^{c_1 - 1} + ⋯.   (1.28.5)
From this we see that for large t we have one of the following conditions at the intersections:

du/dt > dv/dt,   du/dt < dv/dt,   du/dt = dv/dt.   (1.28.6)

The persistent inequalities lead to a contradiction, as we see most easily from a figure. Hence at the points of intersection we must have du/dt = dv/dt. If this holds for infinitely many t, we must have

b_0 = a_0 c_0,   d_0 = c_0 - 1,   b_1 = a_1 c_1,   d_1 = c_1 - 1, ...,   (1.28.7)

which means that du/dt ≡ dv/dt. Hence, r(u, t) = 0 contains a solution of (1.27.1), which means that h(u, t) is constant for this solution. The case where dh/dt becomes infinite infinitely often is treated in exactly the same way.

1.29. Asymptotic Behavior of Solutions of u' = p(u, t)/q(u, t)

We are now ready to demonstrate that any solution of

u' = p(u, t)/q(u, t),   (1.29.1)
continuous for t ≥ t_0, is ultimately monotonic, as are all of its derivatives, and satisfies one or the other of the relations

…,   (1.29.2)

where p is a polynomial in t, and c is an integer. Consider the expression q(u, t)u' - p(u, t), which contains terms of the form a_l t^m u^n or b_l t^m u^n u'. From the foregoing discussion we know that the ratio of any two such terms is ultimately monotone and thus approaches a limit as t → ∞. This limit may be 0, ±∞, or a nonzero constant. Since the equation in (1.29.1) holds, there must be at least two terms whose ratio approaches a nonzero constant. If only one contains u', we obtain

a_1 t^m u^n u' ~ c_1.   (1.29.3)

If both or neither contain u', the result is

u ~ c_2 t^{p/q},   (1.29.4)

where p, q are rational. We will present the detailed analysis as a series of exercises.

Exercises
1. Show that (1.29.3) leads to different results in the cases

(a) n ≠ -1, m ≠ +1;
(b) n = -1, m ≠ +1;
(c) n = -1, m = +1;
(d) n ≠ -1, m = +1.

2. Show that Exercise 1(a) leads to u^{n+1}/(n + 1) ~ c_1 t^{1-m}/(1 - m) + d_1, and Exercise 1(d) to u^{n+1}/(n + 1) ~ c_1 log t + d_1.

3. Show that Exercise 1(b) leads to log u ~ c_1 t^{1-m}/(1 - m) + d_1, which requires further consideration. Take 1 > m > 0 and consider the equation

…,

where the p_i and q_i are polynomials in t. Show that

…

(Why is it sufficient to consider only the case c_1 > 0?)

4. Hence show that

…

for some a. Integrating, show that

log u = p(t) + ca log t + O(1/t).
5. Show that c_1 < 0 may be treated by replacing u by 1/u in the original equation.

6. Consider Exercise 1(c). There are two terms, at^b u^c u' and dt^{b-1} u^{c+1}, of equal order. Show that there is no other term of equal order.

7. Let τ be any third term. Then the quotient (at^b u^c u' - dt^{b-1} u^{c+1})/τ tends to a limit as t → ∞. There are now two possibilities: (a) there is a third term whose order is equal to that of the difference between the two principal terms; (b) there is no such third term.

8. Consider the first possibility, and show that both of the resulting cases lead to the stated result in (1.29.2).

9. Complete the analysis for part (b) of Exercise 7. More detailed results can be obtained both for the solutions of polynomial equations of the form p(t, u, u') = 0 and for equations of the form u'' = p(u, t)/q(u, t). See

G. H. Hardy, "Some Results Concerning the Behavior at Infinity of a Real and Continuous Solution of an Algebraic Differential Equation of the First Order," Proc. London Math. Soc., Vol. 10, 1912, pp. 451-468.
R. H. Fowler, "Some Results on the Form Near Infinity of Real Continuous Solutions of a Certain Type of Second Order Differential Equation," Proc. London Math. Soc., Vol. 13, 1914, pp. 341-371.
R. Bellman, Stability Theory of Differential Equations, Dover Publications, New York, 1969.

10. Use the foregoing techniques to analyze the behavior of the solutions of v' + v^2 - a^2(t) = 0 under various assumptions concerning a(t).
Miscellaneous Exercises
1. Let p_n(t) and q_m(t) be polynomials of degrees n and m respectively whose coefficients are determined by the condition that

p_n(t) f(t) + q_m(t) = O(t^{m+n+1}),

where f(t) = f_0 + f_1 t + ⋯. These polynomials determine a Padé table, and -q_m/p_n is called a Padé approximation to f(t). Are p_n and q_m uniquely determined by this condition?

2. Determine Padé approximants for e^t. Hint: Start with

p_n(t) e^t + q_m(t) = t^{m+n+1} + ⋯,

and differentiate to obtain recurrence relations.
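A sketch of the construction in Exercises 1 and 2 for the [2/2] entry of the Padé table of e^t. The normalization q(0) = 1 and the convention e^t ≈ p(t)/q(t) used below are one common choice, differing only in sign conventions from the hint's p_n e^t + q_m form:

```python
import math

# [2/2] Padé approximant of e^t: p = p0 + p1 t + p2 t^2, q = 1 + q1 t + q2 t^2,
# requiring the Taylor series of e^t q(t) - p(t) to begin at t^5.
# The Taylor coefficients c_k = 1/k! give two linear equations for q1, q2
# (the coefficients of t^3 and t^4 in e^t q(t) must vanish).
c = [1.0 / math.factorial(k) for k in range(5)]

# q1*c2 + q2*c1 = -c3  and  q1*c3 + q2*c2 = -c4, solved by Cramer's rule
det = c[2] * c[2] - c[1] * c[3]
q1 = (-c[3] * c[2] + c[1] * c[4]) / det
q2 = (-c[2] * c[4] + c[3] * c[3]) / det
# the numerator coefficients are then read off from orders t^0, t^1, t^2
p = [c[0],
     c[1] + q1 * c[0],
     c[2] + q1 * c[1] + q2 * c[0]]

t = 0.1
num = p[0] + p[1] * t + p[2] * t ** 2
den = 1.0 + q1 * t + q2 * t ** 2
print(q1, q2, num / den - math.exp(t))   # tiny error, of order t^5
```

The computed coefficients reproduce the classical result p = 1 + t/2 + t^2/12, q = 1 - t/2 + t^2/12.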
3. Determine Padé approximants for (1 + a_1 t + a_2 t^2)^{1/2}. Hint: Uniformize by means of trigonometric functions and count zeros and poles.

4. Determine Padé approximants for (1 + a_1 t + a_2 t^2 + a_3 t^3)^{1/2}. Hint: Uniformize by means of elliptic functions.

For the foregoing and additional references, see

R. Bellman and E. G. Straus, "Algebraic Functions, Continued Fractions and the Padé Table," Proc. Nat. Acad. Sci., Vol. 35, 1949, pp. 472-476.
5. Consider the differential equation u'' + au' + bu = 0, a, b > 0. Show that constants a_1, a_2 can be found such that u^2 + a_1 uu' + a_2 u'^2 is positive for all nontrivial values of u and u' and, in addition, such that

d/dt (u^2 + a_1 uu' + a_2 u'^2) ≤ -a_3 u^2 - a_4 u'^2,

where a_3, a_4 > 0.

6. Hence, show that u, u' → 0 as t → ∞ and, indeed, that u^2 + u'^2 ≤ a_5 e^{-a_6 t} for some positive constants a_5, a_6, without calculating the explicit solutions. This is a particular application of the powerful "second method" of Lyapunov we shall discuss in Volume II in connection with the topic of differential inequalities.
7. Consider the equation u'' + au' + bu = 0, u(0) = c_1, u'(0) = c_2, and suppose that the roots of r^2 + ar + b = 0 have negative real parts. From the relations

∫_0^∞ u(u'' + au' + bu) dt = 0,   ∫_0^∞ u'(u'' + au' + bu) dt = 0,

plus integration by parts, obtain expressions for ∫_0^∞ u^2 dt, ∫_0^∞ u'^2 dt as quadratic forms in c_1, c_2 without using the explicit form of u.

8. From the expression for ∫_0^∞ u^2 dt as a quadratic form in c_1 and c_2, derive a necessary and sufficient condition for the roots of r^2 + ar + b = 0 to have negative real parts in terms of a and b.
9. Similarly, derive necessary and sufficient conditions that the roots of r^3 + a_1 r^2 + a_2 r + a_3 = 0 have negative real parts. See

A. Hurwitz, "Über die Bedingungen unter welchen eine Gleichung nur Wurzeln mit negativen reellen Teilen besitzt," Math. Ann., Vol. 46, 1895 (Werke, Vol. 2), pp. 533-545.
H. Cremer and F. H. Effertz, "Über die algebraischen Kriterien für die Stabilität von Regelungssystemen," Math. Ann., Vol. 137, 1959, pp. 328-350.
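For the cubic of Exercise 9, the Hurwitz conditions reduce to a_1 > 0, a_3 > 0, and a_1 a_2 > a_3. A quick check against polynomials with known roots (the test cases are our own):

```python
# Hurwitz's criterion for r^3 + a1 r^2 + a2 r + a3 = 0 to have all roots
# with negative real parts: a1 > 0, a3 > 0, and a1*a2 > a3.
def hurwitz_stable(a1, a2, a3):
    return a1 > 0 and a3 > 0 and a1 * a2 > a3

# cubic with roots -1, -2, -3:  r^3 + 6r^2 + 11r + 6
print(hurwitz_stable(6, 11, 6))        # True
# cubic with roots 1, -2, -3:   r^3 + 4r^2 + r - 6
print(hurwitz_stable(4, 1, -6))        # False: a3 < 0
# roots 0.1 ± 2i and -1:        r^3 + 0.8r^2 + 3.81r + 4.01
print(hurwitz_stable(0.8, 3.81, 4.01)) # False: a1*a2 = 3.048 < a3
```

Note that the last case is caught only by the determinant condition a_1 a_2 > a_3; all of its coefficients are positive.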
10. Show that if a(t) ≥ 0 then all solutions of u'' + a(t)u' + u = 0 remain bounded as t → ∞. Hint:

d(u'^2 + u^2)/dt = 2u'u'' + 2uu' = 2u'(-a(t)u' - u) + 2uu' = -2a(t)u'^2 ≤ 0.
11. By consideration of the equation u'' + (2 + e^t)u' + u = 0, show that the condition that a(t) ≥ a_0 > 0 is not sufficient to ensure that all solutions approach zero as t → ∞.

12. By means of a change of variable, show that f(t) → 0 as t → ∞ cannot be a sufficient condition to ensure that all solutions of f(t)u'' + u' + u = 0 approach a solution of u' + u = 0 as t → ∞.

13. If a(t) > 0, show that no solution of u'' - a(t)u = 0 can have more than one zero. Hint: Without loss of generality, let u(t) be negative between the two points t_1 and t_2 where u(t_1) = u(t_2) = 0. Let t_3 be a point where u(t) assumes a relative minimum value for t_1 < t < t_2. Show that this leads to a contradiction upon using the fact that u'' - a(t)u = 0.
14. Consider the equation εu'' + (1 + ε)u' + u = 0, u(0) = 1, u'(0) = 0, where ε is a positive constant. Write the solution as u(t, ε) to indicate the dependence on ε. Does lim_{ε→0} u(t, ε) exist? Does it satisfy the differential equation u' + u = 0? What is the initial condition?
15. Consider the Green's function k(t, t_1) associated with u'' = f, u(0) = u(T) = 0. Show that

…

for 0 ≤ t, t_1, t_2 ≤ T. This shows that k(t, t_1) is a "reproducing kernel". See

N. Aronszajn, "The Theory of Reproducing Kernels," Trans. Amer. Math. Soc., Vol. 68, 1950, pp. 337-404.
N. Aronszajn and K. T. Smith, "Characterization of Positive Reproducing Kernels. Applications to Green's Functions," Amer. J. Math., Vol. 79, No. 3, July 1957, pp. 611-622.

16. Establish this result for the Green's function of the second-order linear differential equation u'' + a(t)u = f, u(0) = u(T) = 0.
17. Show that the Green's function of the foregoing equation can be characterized by the condition that k(t, s) is a solution of the linear differential equation in (0, s) and (s, T), with a discontinuity in the derivative at t = s, namely,

k'(s - 0, s) - k'(s + 0, s) = 1

for 0 < s < T.
18. Show that the determinantal equation

| u    u_1    u_2  |
| u'   u_1'   u_2' |  = 0
| u''  u_1''  u_2'' |

has the general solution u = c_1 u_1 + c_2 u_2, and hence establish relations between the solutions of u'' + pu' + qu = 0 and the coefficient functions, p and q.

19. Determine the Green's function associated with u'' + a^2 u = f(t), u(0) = 0, ∫_0^∞ u^2 dt < ∞.
20. If p(t) > 0, and f_1(t), f_2(t) are monotone in [a, b], we have

∫_a^b p dt ∫_a^b p f_1 f_2 dt ≷ ∫_a^b p f_1 dt ∫_a^b p f_2 dt.

The upper sign holds if f_1 and f_2 are both increasing or both decreasing. The lower sign applies if one function is decreasing and the other increasing (Čebyšev). For some applications, see

J. A. Shohat and A. V. Bushkovitch, "On some Applications of the Tchebycheff Inequality for Definite Integrals," J. Math. Phys., Vol. 21, 1942, pp. 211-217.
21. The equation u' - u^2 = -2/t^2 + O(1/t) as t ↓ 0 implies that either u = 1/t + O(1) as t ↓ 0 or u = -2/t + O(1) as t ↓ 0 (Korevaar).

22. The equation u' - u^2 = O(1/t) as t ↓ 0 implies that either u = O(log 1/t) as t ↓ 0 or u = -1/t + O(log 1/t) as t ↓ 0.

Z. Koshiba and S. Uchiyama, "On the Existence of Prime Numbers in an Arithmetic Progression," Proc. Japan Acad., Vol. 42, 1966, pp. 696-701.
23. Let {x_k}, {f_k}, and {z_k}, k = 0, 1, ..., m, be real-valued sequences and let {z_k} be nonnegative. If for k = 0, 1, ..., m,

x_k ≤ f_k + Σ_{0≤i<k} z_i x_i,

then

x_k ≤ f_k + Σ_{0≤i<k} z_i f_i Π_{i<j<k} (1 + z_j)

for k = 0, 1, ..., m. See

G. S. Jones, "Fundamental Inequalities for Discrete and Discontinuous Functional Equations," J. Soc. Indust. Appl. Math., Vol. 12, 1964, pp. 43-57.

24. If u'' + (1 + f(t))u = 0, f(t) → 0 as t → ∞, ∫^∞ |f'(t)| dt < ∞, then

…
26. If u'' + g(t)u = 0, g(t) ≥ a^2 > 0, and g(t)^{-1/4} is convex in t, then

…

(M. Zlámal.)

27. The general solution of

u'' + (g^2(t) - g''(t)/(2g(t)) + (3/4)(g'(t)/g(t))^2) u = 0

is

u = c_1 g(t)^{-1/2} sin(∫^t g(t_1) dt_1 + c_2).

(M. Ráb.)
28. Let f(n) be defined for n = 0, 1, 2, ..., and set

f(n) = a_0 + a_1 n + a_2 n(n - 1)/2! + a_3 n(n - 1)(n - 2)/3! + ⋯.

Show that

a_k = Δ^k f(0),   where Δf(n) = f(n + 1) - f(n)

(Newton Interpolation Formula).

29. Write … Show that … where now ….

30. Hence show that ….
See
R. Bellman, “A q-version of the Newton Interpolation Formula and Some Eulerian Identities,” Boll. U.M.I., Vol. 16, 1961, pp. 285-287.
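The coefficients in Exercise 28 are iterated forward differences, which makes the formula easy to verify numerically; the cubic test function below is our own choice:

```python
from math import comb

# Newton interpolation: for f on n = 0, 1, 2, ..., the coefficients in
# f(n) = a0 + a1*C(n,1) + a2*C(n,2) + ... are a_k = Δ^k f(0),
# where Δf(n) = f(n+1) - f(n).
def forward_differences(values):
    coeffs, row = [], list(values)
    while row:
        coeffs.append(row[0])                      # Δ^k f(0)
        row = [row[i + 1] - row[i] for i in range(len(row) - 1)]
    return coeffs

def newton_eval(coeffs, n):
    return sum(a * comb(n, k) for k, a in enumerate(coeffs))

f = lambda n: n ** 3 + 2 * n + 1
coeffs = forward_differences([f(n) for n in range(6)])
print(coeffs)                      # [1, 3, 6, 6, 0, 0]
print(newton_eval(coeffs, 8))      # 529 = f(8)
```

Because f is a cubic, the fourth and higher differences vanish and the formula is exact even beyond the sampled range.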
31. Consider the two equations (a) u'' + f(t)u = 0, (b) v'' + g(t)v = 0. Suppose that there exist two solutions of (a), u_1, u_2, such that

∫^∞ (u_1^2 + u_2^2) |f - g| dt < ∞.

Then every solution of (b) can be written v = c_1 u_1 + c_2 u_2 + o(|u_1| + |u_2|). (Wintner.) See

R. Bellman, "On a Generalization of a Result of Wintner," Quart. Appl. Math., Vol. 16, 1958, pp. 431-432.
32. Consider the equation u'' + (1 + λg(t))u = 0, where ∫_0^∞ |g(t)| dt < ∞. Let r(λ) and θ(λ) be defined by the asymptotic relation

lim_{t→∞} {u(t, λ) - r(λ) sin(t + θ(λ))} = 0.

Show that if g is real, then r(λ) and θ(λ) are analytic functions of λ for real λ. See

R. D. Kemp and N. Levinson, "On u'' + (1 + λg(x))u = 0 for ∫_0^∞ |g(x)| dx < ∞," Proc. Amer. Math. Soc., Vol. 10, 1959, pp. 82-86.
33. Obtain the solution of y' + ay = ∫_0^b y dx_1, y(0) = c. Hint: First set ∫_0^b y dx_1 = m. Solve for y in terms of m and then obtain an equation for m.

34. Similarly, solve

y' + ay = ∫_0^1 x_1 y dx_1,   y(0) = 1,
y' + ay = ∫_0^1 f(x_1) y dx_1,   y(0) = 1.
35. Show that y' + ay = p(x), where p(x) is a polynomial of degree k in x, always has a solution which is a polynomial of degree k, provided that a ≠ 0. What happens if a = 0?

36. Obtain the solution of

u = 1 + ∫_0^1 exp(k |x - x_1|) u(x_1) dx_1

by reducing the equation to a second-order differential equation subject to a two-point boundary condition.
37. Show that the factorization

u'' + u = (d/dt + i)(d/dt - i)u

is not unique by means of the example

u'' + u = (d/dt - tan t)(d/dt + tan t)u.

Are there any other factorizations of u'' + u?
38. If u_1, u_2 are two linearly independent solutions of u'' + f(t)u = 0, then we may write

u = c r(t) cos(g(t) - θ),

where r(t) = (u_1^2 + u_2^2)^{1/2}. The functions (r, g) are occasionally called a "polar pair" and the representation a "polar representation". For applications of this representation, see

I. I. Kolodner, "Phase Shift of Solutions of Second-Order Linear Ordinary Differential Equations and Related Problems," J. Math. Anal. Appl., Vol. 4, 1962, pp. 422-439.

39. Let sgn t = +1 if t > 0, -1 if t < 0, 0 if t = 0. Study the asymptotic behavior of the solutions of

(a) u'' + a_1 u' + a_2 u = sgn(b_1 u' + b_2 u),
(b) u'' + (sgn u)u = 0,
(c) u'' + (sgn u)u' + u = 0.
40. Establish Steffensen's inequality: Let f(t) be nonnegative and monotone decreasing in [a, b], and let 0 ≤ g(t) ≤ 1. Then

∫_{b-c}^b f dt ≤ ∫_a^b f g dt ≤ ∫_a^{a+c} f dt,

where c = ∫_a^b g dt. Hint: Introduce the function u(s), defined by ∫_a^s f g dt = ∫_a^{u(s)} f dt, and consider the differential equation satisfied by u.
41. Establish the following generalization. Let f(t) be nonnegative and monotone increasing in [a, b], f ∈ L^p[a, b], g ≥ 0 in [a, b], and g ∈ L^q[a, b], where 1/p + 1/q = 1, p > 1. Then

…,

where c = (∫_a^b g dt)^p. See

R. Bellman, "On Inequalities with Alternating Signs," Proc. Amer. Math. Soc., Vol. 10, 1959, pp. 807-809.
42. Let u = f'(t) be monotone increasing, with f(0), f'(0) ≥ 0. Show that

Σ_{n=1}^N (-1)^{n+1} f(a_n) ≥ f(Σ_{n=1}^N (-1)^{n+1} a_n)

if a_1 ≥ a_2 ≥ ⋯ ≥ a_N > 0, by referring to a figure.

43. Obtain the result

Σ_{j=1}^n (-1)^{j-1} w_j f(u_j) ≥ f(Σ_{j=1}^n (-1)^{j-1} w_j u_j),

where 1 ≥ w_1 ≥ w_2 ≥ ⋯ ≥ w_n ≥ 0 (Olkin's inequality), by suitable specialization of Exercise 40.
44. Show that if u(t), 0 ≤ t ≤ a, is any nonnegative L^n-integrable function with

k u(t)^m ≥ (a - t)^p ∫_t^a (a - s)^q u(s)^n ds,   t > 0,

where k, m, n, p, q > 0 and m > n, then

…,

where c and r are constants depending only on k, m, n, p, and q. For the foregoing and additional references, see

R. Bellman, "On a Differential Inequality of Cesari and Turner," Rendiconti del Circolo Matematico di Palermo, Ser. II, Tomo VII, 1958, pp. 1-3.
45. Show that u_{xx} + u_{yy} + a^2(x, y)u = 0 can be transformed into u_{ss} + u_{tt} + u = 0 by a suitable change of independent variables, provided that log a(x, y) is a harmonic function of x and y. See

R. Bellman, "On a Liouville Transformation for u_{xx} + u_{yy} + a^2(x, y)u = 0," Boll. U.M.I., Vol. 13, 1958, pp. 535-538.

46. Obtain the result of Blaschke-Pick. Let u(t) and v(t) be convex functions of t for 0 ≤ t ≤ 1, normalized by the conditions

u(0) = u(1) = 0,   v(0) = v(1) = 0.

Then

∫_0^1 uv dt ≥ (1/2) ∫_0^1 u dt ∫_0^1 v dt.

Hint: u convex means that u'' ≥ 0, or that u'' = f(t), f(t) ≥ 0. Hence u(t) = ∫_0^1 k(s, t)f(s) ds, where k(s, t) is an appropriate Green's function. See

R. Bellman, "Converses of Schwarz's Inequality," Duke Math. J., Vol. 23, 1956, pp. 429-434.
Z. Nehari, "Inverse Hölder Inequalities," J. Math. Anal. Appl., Vol. 21, 1968, pp. 405-420.
E. F. Beckenbach and R. Bellman, Inequalities, Springer-Verlag, Berlin, 1961.
47. Consider the two-point boundary-value problem u'' + a(t)u = 0, u(0) = c_1, u(T) = c_2, and the following scheme for successive approximations. Let u'(0) = b_1 and consider the solution of the initial-value problem u'' + a(t)u = 0, u(0) = c_1, u'(0) = b_1. Let u'(T) = b_2 be determined in this fashion. Consider the initial-value problem u'' + a(t)u = 0, u(T) = c_2, u'(T) = b_2, and let u'(0) = b_3 be determined in this fashion. Does this procedure converge? See:

R. Bellman, "On the Iterative Solution of Two-Point Boundary-Value Problems," Boll. U.M.I., Vol. 16, 1961, pp. 145-149.
R. Bellman and T. Brown, "On the Computational Solution of Two-Point Boundary-Value Problems," Boll. U.M.I., Vol. 19, 1964, pp. 121-123.
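A sketch of the alternating scheme of Exercise 47 on a case where it does contract. The constant coefficient a(t) = 0.1 and the data T = 1, c_1 = 1, c_2 = 2 are our illustrative choices; for other data the iteration can diverge, which is the point of the exercise:

```python
# alternating initial-value sweeps for u'' + a u = 0, u(0) = 1, u(T) = 2;
# with constant a the contraction factor is cos^2(sqrt(a) T) < 1
A, T, C1, C2 = 0.1, 1.0, 1.0, 2.0
STEPS = 1000

def step(u, v, h):
    # one RK4 step for the system u' = v, v' = -A u
    def d(u, v):
        return v, -A * u
    k1u, k1v = d(u, v)
    k2u, k2v = d(u + h/2*k1u, v + h/2*k1v)
    k3u, k3v = d(u + h/2*k2u, v + h/2*k2v)
    k4u, k4v = d(u + h*k3u, v + h*k3v)
    return (u + h/6*(k1u + 2*k2u + 2*k3u + k4u),
            v + h/6*(k1v + 2*k2v + 2*k3v + k4v))

def sweep(u0, v0, h):
    # integrate over |h|*STEPS time units; h < 0 integrates backward
    u, v = u0, v0
    for _ in range(STEPS):
        u, v = step(u, v, h)
    return u, v

b = 0.0                                   # initial guess for u'(0)
for _ in range(400):
    _, b_end = sweep(C1, b, T / STEPS)    # forward sweep from t = 0
    _, b = sweep(C2, b_end, -T / STEPS)   # backward sweep from t = T
u_T, _ = sweep(C1, b, T / STEPS)
print(u_T)   # ≈ 2: the converged slope solves the boundary-value problem
```

At the converged slope the forward trajectory satisfies u(T) = c_2 to within the integration error, so the limit of the iteration is the solution of the boundary-value problem.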
48. Can one find convenient analytic expressions for the Padé approximants of the solutions of u'' - (1 + 1/t^2)u = 0?

49. If u(a) = 0, u' ∈ L^2(a, b), then ∫_a^b |uu'| dt ≤ (1/2)(b - a) ∫_a^b |u'|^2 dt. (Opial's inequality.) See

D. W. Boyd, "Best Constants in Inequalities Related to Opial's Inequality," J. Math. Anal. Appl., Vol. 25, 1969, pp. 378-387.

Bibliography and Comments

§1.1. For an elementary account of analytic and computational aspects of ordinary differential equations, see R. Bellman, Modern Elementary Differential Equations, Addison-Wesley, Reading, Massachusetts, 1968.

§1.3. This inequality plays a fundamental role in the modern theory of stability of differential equations. See R. Bellman, Stability Theory of Differential Equations, Dover, New York, 1969, where further references to its use may be found. It also plays an important part in the derivation of a priori inequalities in the theory of partial differential equations. See, for example, p. 115 of
R. Lattes and J. L. Lions, The Quasi-Reversibility Method and Applications, Dunod, Paris, 1967. (English translation, American Elsevier, New York, 1969.)
§1.7. We shall discuss two-point boundary-value problems again using the theory of invariant imbedding in Volume II. See

T. Y. Na, "Transforming Boundary Conditions to Initial Conditions for Ordinary Differential Equations," SIAM Review, Vol. 9, 1967, pp. 204-210.

§§1.8-1.9. For a detailed discussion of these questions, see

R. Bellman, Introduction to the Mathematical Theory of Control Processes: I. Linear Equations and Quadratic Criteria, Academic Press, New York, 1967.

§1.10. A powerful method for obtaining analytic approximations to the Riccati equation depends upon the use of continued fractions and, more generally, Padé approximations. The method is originally due to Laguerre; see

E. Laguerre, "Sur la réduction en fractions continues d'une fonction qui satisfait une équation différentielle linéaire du premier ordre dont les coefficients sont rationnels," J. Math. Pures Appl., Vol. 1, No. 4, 1885, pp. 135-165.

See also

E. P. Merkes and W. T. Scott, "Continued Fraction Solutions of the Riccati Equation," J. Math. Anal. Appl., Vol. 4, 1962, pp. 309-327.
W. Fair, "Padé Approximation to the Solution of the Riccati Equation," Math. of Computation, Vol. 18, 1964, pp. 627-634.
§1.11. For a presentation of the modern theory of inequalities, see

G. H. Hardy, J. E. Littlewood, and G. Pólya, Inequalities, Cambridge University Press, New York, 1934.
E. F. Beckenbach and R. Bellman, Inequalities, Springer-Verlag, Berlin, 1961.

§1.12. For more intensive treatment, see the book by Bellman cited above in §1.3, and

R. Bellman, Perturbation Techniques in Mathematics, Physics, and Engineering, Holt, Rinehart and Winston, New York, 1964.
L. Cesari, Asymptotic Behavior and Stability Problems in Ordinary Differential Equations, Springer-Verlag, Berlin, 1959.
A powerful approach to the solution of the second-order linear differential equation is afforded by the fact that the solution of u'' - 2(a + b v(t))u = 0 can be expressed in terms of the Wiener integral of the functional exp[-b ∫_0^t v(x(t_1)) dt_1 - at]. See

M. Kac, "On Some Connections between Probability Theory and Differential Equations," Proc. Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, 1951, pp. 189-215.
I. M. Gelfand and A. M. Yaglom, "Integration in Functional Spaces and Its Application in Quantum Physics," J. Math. Phys., Vol. 1, 1960, pp. 48-69.

The equation of (1.12.1) plays a fundamental role in wave propagation. We will study it in detail using the theory of invariant imbedding in Volume II. For a discussion of the propagation of waves in a dispersive medium using various approximate techniques, see

R. M. Lewis, "Asymptotic Theory of Transients," Electromagnetic Wave Theory, Pergamon Press, Oxford, 1967, pp. 845-869.
§1.13. This counter-example is due to A. Wintner.
§1.14. This is a special case of the Dini–Hukuhara theorem given in Secs. 4.2 and 4.3 of Chapter 4.
§§1.14–1.15. For the results of this section and the exercises, see the book cited in §1.3.
§§1.16–1.18. See the books cited in §1.12.
§1.16. Additional results may be obtained by means of the theory of invariant imbedding. See
R. Bellman and R. Kalaba, “Functional Equations, Wave Propagation and Invariant Imbedding,” J. Math. Mech., Vol. 8, 1959, pp. 683–704.
G. M. Wing, “Mathematical Methods Suggested by Transport Theory,” Proceedings of Symposium on Transport Theory (R. Bellman and G. Birkhoff, editors), 1969.
F. V. Atkinson, “Wave Propagation and the Bremmer Series,” J. Math. Anal. Appl., Vol. 1, 1960, pp. 255–276.
A paper which brings to the forefront the physical significance of the WKB approximation is
H. Bremmer, “The WKB Approximation as the First Term of a Geometrical-Optical Series,” Theory of Electromagnetic Waves, A Symposium, pp. 169–170, Wiley (Interscience), New York, 1951.
See also
E. W. Montroll, “Markoff Chains, Wiener Integrals and Quantum Theory,” Comm. Pure and Applied Math., Vol. 5, 1952, pp. 415–453.
§1.23. The physical significance of the “WKB approximation” (inappropriately named, as usual) will be given in Volume II in the treatment of invariant imbedding. A more accurate title would include the names Liouville, Green, Jeffreys, and Blumenthal. See, for example,
F. W. J. Olver, “Error Bounds for the Liouville–Green (WKB) Approximation,” Proc. Cambridge Phil. Soc., Vol. 57, 1961, pp. 790–810.
§1.24. See
J. J. Mahoney, “The Reflection of Short Waves in a Variable Medium,” Quart. Appl. Math., Vol. 25, 1967, pp. 313–316.
If the function we have written as a²(t) changes sign in the interval of interest, the situation is far more complex. Study of these problems of great physical interest was initiated by Langer. The basic idea is a simple but powerful one. Instead of using u″ + a²(t₀)u = 0 as a comparison equation for |t − t₀| ≪ 1, Langer used the equation u″ + c₁(t − t₀)u = 0, which can be solved explicitly in terms of Bessel functions. The idea can be extended to the case where a²(t) ≅ c₂(t − t₀)² for |t − t₀| ≪ 1. See
A. Erdélyi, Differential Equations with Transition Points. I. The First Approximation, Dept. of Math., California Institute of Technology, Pasadena, California, Tech. Report 6, 1955, 22 pp. For further results, see
N. Fröman, “Detailed Analysis of Some Properties of the JWKB-Approximation,” Arkiv för Fysik, Vol. 31, No. 26, 1966, pp. 381–408.
N. Fröman and P. O. Fröman, JWKB Approximation, Contributions to the Theory, North-Holland, Amsterdam, 1965.
§1.26. For an interesting application of the Lagrange expansion (see Sec. 5.21 of Chapter 5) to the determination of the coefficients in the asymptotic expansion of solutions of u″ + (1 + b₁/t²)u′ + (1 + b₂/t² + ⋯)u = 0, see
C. Hunter, “Asymptotic Solutions of Certain Linear Difference Equations with Applications to Some Eigenvalue Problems,” J. Math. Anal. Appl., Vol. 24, 1968, pp. 279–289.
Chapter 2 MATRIX THEORY
2.1. Introduction
In this chapter we wish to consider a number of questions centering about the apparently prosaic problem of solving a system of linear algebraic equations

    ∑_{j=1}^{N} a_ij x_j = b_i,   i = 1, 2,..., N.   (2.1.1)
The ability to obtain the unknowns x_1, x_2,..., x_N by means of a finite number of purely arithmetic operations is one of the foundations of analysis, both theoretical and numerical. In general, we exert our best efforts to reduce the solution of any given equation to one of the foregoing form. Nonetheless, as we shall see, there are also cogent reasons for trying our best to avoid these equations at certain times. A careful discussion of the obstacles encountered in solving (2.1.1) provides motivation for a good deal of our subsequent work. This is particularly so when we encounter ill-conditioned systems. In the discussion at the end of the chapter we will indicate various points of contact of this type of question with parts of contemporary analysis dealing with improperly posed problems of mathematical physics. One approach to (2.1.1), susceptible to considerable generalization, is to regard it as the variational equation associated with the minimization of the quadratic function

    Q(x_1, x_2,..., x_N) = ∑_{i,j=1}^{N} a_ij x_i x_j − 2 ∑_{i=1}^{N} b_i x_i.   (2.1.2)
This equivalence holds when a_ij = a_ji. Another approach is to regard the solution of (2.1.1) as equivalent to the problem of minimizing the quadratic function ∑_{i=1}^{N} (∑_{j=1}^{N} a_ij x_j − b_i)².
A slight, but important, modification of this leads to the regularization method of Tychonov. These methods clearly indicate the need for a systematic study of quadratic forms, and thus symmetric matrices, if we are seriously concerned with linear systems. The variational principles encountered in this study play a fundamental role in the Rayleigh–Ritz method analyzed in Chapter 7. Extensions of these variational methods, the Courant–Fischer min-max theorems, will be introduced as a way of studying perturbation techniques. Finally, we will present the basic result of Perron concerning positive matrices and the important variational characterization due first to Collatz, and rediscovered by many others. No attempt has been made to penetrate into the remarkable and fascinating theory of matrices in any depth. We have bounded ourselves, albeit reluctantly, with the minimum material needed for subsequent chapters.

2.2. Determinantal Solution
From one point of view, the problem of solving the linear system in
(2.1.1) has long since been resolved. We refer, of course, to the fact that we possess the elegant analytic representation of Cramer. If the determinant of coefficients is nonzero, we may write

          | b_1   a_12  ⋯  a_1N |
          | b_2   a_22  ⋯  a_2N |
    x_1 = |  ⋮     ⋮          ⋮  | / | a_ij |,   (2.2.1)
          | b_N   a_N2  ⋯  a_NN |

with similar expressions for x_2,..., x_N. A corollary of this result is that a necessary condition for the existence of a nontrivial solution of the homogeneous system

    ∑_{j=1}^{N} a_ij x_j = 0,   i = 1, 2,..., N,   (2.2.2)

is that the determinant

    det(a_ij) = | a_ij | = 0.   (2.2.3)
An inductive argument shows that it is also a sufficient condition. This result is an important tool in many investigations, as the exercises at the end of the section indicate. Satisfying as the Cramer solution is analytically, it possesses severe, indeed crippling, drawbacks computationally. If we were to employ the standard rule for evaluating a determinant of order N, we would be forced to form N! products and then carry out a total of N! − 1 additions and subtractions to calculate |a_ij|. Two basic considerations blackball this approach: time and accuracy. To begin with, note that 10! = 3,628,800 and that 20! is an awesome quantity best appreciated by means of the simple inequality

    20! > 10^10 (10!).   (2.2.4)

Even at microsecond or micromicrosecond speeds we cannot contemplate 100! additions or multiplications. Yet we frequently meet even larger systems of equations in scientific applications. Secondly, on a digital computer, multiplication can only be carried out to a specified precision. At some point, after ten, twenty, or thirty digits, we must “round off”, which is to say truncate. This introduces a systematic error with each arithmetic operation, which means that we must carefully ration the number and type of operations performed. We see then that a critical consideration of the simplest system of equations forces us to accept three basic premises:
1. Numerical solution requires a consideration of errors and thus brings in stability theory.
2. Numerical solution involves a scheduling of operations subject to various constraints.
3. One form of a solution may be far preferable to another for numerical purposes.
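The arithmetic behind the first consideration can be checked directly. A small sketch (the function names cramer_ops and elimination_ops are ours) compares the N! − 1 additions demanded by the determinantal rule with the rough 2N³ multiplication bound for the elimination scheme of Sec. 2.3:

```python
import math

def cramer_ops(n):
    # Additions and subtractions needed to evaluate one determinant
    # of order n by the standard rule: n! - 1.
    return math.factorial(n) - 1

def elimination_ops(n):
    # Multiplication bound for the elimination method, from (2.3.2).
    return 2 * n ** 3

for n in (5, 10, 20):
    print(n, cramer_ops(n), elimination_ops(n))

# The inequality (2.2.4): 20! exceeds 10^10 * (10!).
assert math.factorial(20) > 10 ** 10 * math.factorial(10)
```

Already at N = 10 the determinantal count exceeds three million against two thousand for elimination, and the gap widens factorially.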
Exercises
1. If Ax = b possesses at least one solution for any b, then |A| ≠ 0 and the equation possesses a unique solution.
2. Consider the simultaneous polynomial equations p(x, y) = 0, q(x, y) = 0, where p and q are quadratic in x,

    p_0(y)x² + p_1(y)x + p_2(y) = 0,
    q_0(y)x² + q_1(y)x + q_2(y) = 0.
Show that we can eliminate x in the following manner. Multiply both equations by x, obtaining

    p_0 x³ + p_1 x² + p_2 x = 0,
    q_0 x³ + q_1 x² + q_2 x = 0.

Regard the four equations as four linear algebraic homogeneous equations in 1, x, x², x³. Thus y is a solution of the determinantal equation

    | p_0  p_1  p_2   0  |
    |  0   p_0  p_1  p_2 |
    | q_0  q_1  q_2   0  |  =  0.
    |  0   q_0  q_1  q_2 |

Generalize to the case where p and q are polynomials in x of degrees m and n respectively (Sylvester dialytic method of elimination). Does every solution of the determinantal equation correspond to a solution of the simultaneous equations p(x, y) = 0, q(x, y) = 0?
3. Consider the congruence x² + a_1 x + a_2 ≡ 0 (mod p), a_2 ≠ 0. Multiply by x, x²,..., x^{p−2} to obtain p − 1 linear homogeneous equations in the quantities 1, x, x²,..., x^{p−2}, using the Fermat result x^{p−1} ≡ 1 (mod p), x ≢ 0. Obtain in this way the determinantal relation satisfied by a_1, a_2 in order that a solution of the congruence exists.
4. Consider the partial differential equation u_t = u_xx. Let v be determined by u_x = vu. Show that u_xx = v_x u + vu_x, u_xt = v_t u + vu_t, u_xt = v_xx u + 2v_x u_x + vu_t. Thus we have four linear homogeneous equations in u, u_x, u_t, u_xt. Hence, obtain the Burgers equation v_t = v_xx + 2vv_x.
5. Consider the equation u_t = u_x, and let v be determined by u_x = vu. Find a nonlinear partial differential equation for v.
6. Start with

    u_t = u_xxx + a_1 u_xx + a_2 u_x + a_3 u

and set

    u_x + b_1 u + b_2 = v(c_1 u_xx + c_2 u_x + c_3 u).

With suitable choice of the parameters a_1, a_2, a_3, b_1, b_2, c_1, c_2, c_3, how close can we get to the Korteweg–de Vries equation, v_t = v_xxx + vv_x? For an interesting discussion of some questions connected with this partial differential equation, see
P. D. Lax, Integrals of Nonlinear Equations of Evolution and Solitary Waves, Courant Institute, New York University, 1968, NYO-1480-87.

2.3. Elimination
In place of the foregoing determinantal procedure we can use a systematic elimination method to determine x_1, x_2,..., x_N. Multiply the first equation in (2.1.1) by a_2N and the second by a_1N and subtract. The result is

    (a_2N a_11 − a_1N a_21)x_1 + ⋯ + (a_2N a_1,N−1 − a_1N a_2,N−1)x_{N−1} = a_2N b_1 − a_1N b_2.   (2.3.1)

We then multiply the first equation by a_3N and the third by a_1N and subtract, and so on. In this way, we obtain (N − 1) simultaneous linear equations for the (N − 1) quantities x_1, x_2,..., x_{N−1}. The same procedure may be repeated to eliminate x_{N−1} from these (N − 1) equations, and so on. The total number of multiplications required to determine x_1, say, in this fashion is thus

    2N(N − 1) + 2(N − 1)(N − 2) + ⋯ + 3·2 < 2N³.   (2.3.2)
This is considerably less than N! for N ≥ 6, and the algorithm is thus seen to be a feasible one for values of N of the order of several hundred, as far as time is concerned. Nonetheless, the foregoing method by no means disposes of the fear of loss of significance as a consequence of the arithmetic operations. If, for example, the quantities (a_2N a_11 − a_1N a_21) and so forth, as well as (a_2N b_1 − a_1N b_2), are small compared to the magnitudes of the a_ij and b_i, we can easily obtain meaningless results. It follows that there may be considerable merit to eliminating the x_i in some specified order which depends on the nature of the particular values of the a_ij. This is indeed the case. References to the pivoting procedure, an important instance of “adaptive computation,” will be found at the end of the chapter.

Exercise
1. How many multiplications and divisions are required to determine x_1, x_2,..., x_N on the basis of the foregoing method?
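The elimination just described, together with the row interchanges ("pivoting") referred to above, can be sketched as follows. The function name solve is ours, and the routine is illustrative rather than industrial strength:

```python
def solve(a, b):
    """Solve a linear system by elimination with partial pivoting.

    a is a list of N rows of N coefficients, b a list of N right-hand sides.
    """
    n = len(b)
    a = [row[:] for row in a]   # work on copies
    b = b[:]
    for k in range(n):
        # Pivot: move the largest |a[i][k]| into row k; this is the
        # "specified order" of elimination that guards against loss
        # of significance.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back-substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x

print(solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 4.0]))  # → [1.0, 1.0]
```

Counting the operations performed here is exactly the exercise above.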
2.4. Ill-conditioned Systems
Either of the above procedures (that of Sec. 2.2 or Sec. 2.3) shows that the solutions of (2.1.1), the x_i, when they exist, are linear functions of the b_i,

    x_i = ∑_{j=1}^{N} c_ij b_j,   i = 1, 2,..., N.   (2.4.1)

If the c_ij oscillate in sign, with large magnitude compared to the b_i, small variations in the b_i can produce much larger variations in the x_i. We see then that the x_i are unstable functions of the b_i (in a sense to be made precise in Chapter 4), and we say that the original linear system is ill-conditioned. In some cases we can predict this unpleasant property well in advance; in other cases we can only wait and worry. It is because of a fear of ill-conditioning that we strive to avoid linear systems wherever possible. This is a first indication of the modern trend toward nonlinearization in many areas. We will provide an instance of this instability in connection with our discussion of the numerical inversion of the Laplace transform in Volume II. In some cases one can apply dynamic programming to the numerical solution of ill-conditioned equations. In other cases, we can apply dynamic programming and invariant imbedding to bypass the task of solving linear systems. These matters will be discussed in Volume II.
Exercise
1. Consider the linear system

    x_1 − Mx_2 = b_1,
    x_2 − Mx_3 = b_2,
    ⋯
    x_{N−1} − Mx_N = b_{N−1},
    x_N − mx_1 = b_N.

Show that the determinant of the system is 1 − mM^{N−1}. Discuss the sensitivity of the solution to the value of m. See
A. M. Ostrowski, “On Nearly Triangular Matrices,” J. Research Nat. Bur. Standards, Vol. 52, 1954, pp. 319–345.
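The determinant claimed in the exercise can be checked numerically by building the matrix and expanding directly; the helper names det and nearly_triangular are ours, and the cofactor expansion is meant only for small N:

```python
def det(a):
    # Laplace expansion down the first column (adequate for small N).
    n = len(a)
    if n == 1:
        return a[0][0]
    return sum((-1) ** i * a[i][0]
               * det([row[1:] for row in a[:i] + a[i + 1:]])
               for i in range(n))

def nearly_triangular(n, M, m):
    # The system x_i - M x_{i+1} = b_i for i < n, and x_n - m x_1 = b_n.
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0
    for i in range(n - 1):
        a[i][i + 1] = -M
    a[n - 1][0] = -m
    return a

N, M, m = 5, 10.0, 5e-5
print(det(nearly_triangular(N, M, m)), 1 - m * M ** (N - 1))  # the two agree
```

Since M^{N−1} is enormous when M is large, a tiny change in m swings the determinant past zero; this is the sensitivity the exercise asks about.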
2.5. The Importance of Notation
Finite systems of linear algebraic and differential equations will play an essential role in much of our subsequent work. To escape a welter of subscripts and superscripts and symbols of bizarre type when treating multidimensional matters, it is necessary to employ vector-matrix notation, and even then to devote critical care and attention to the choice of notation. It cannot be sufficiently emphasized that a simple, flexible notation which subliminally guides the intuition and obviates tedious, confusing calculations is an invaluable aid to both research and pedagogy. Let us then turn to vector-matrix terminology.

2.6. Vector Notation
Lower case letters will be used to denote column vectors,

        ( x_1 )
    x = ( x_2 )   (2.6.1)
        (  ⋮  )
        ( x_N )

the only types of vectors we shall employ. The quantities x_1, x_2,..., x_N, real or complex, are called the components of x, with N the dimension. When we wish to indicate different vectors, we will use different letters. Generally, x, y, z will denote vectors whose components are variables, and a, b, c those whose components are constants. The relation

    x = y   (2.6.2)

signifies that the components are equal term by term. We define

            ( x_1 + y_1 )
    x + y = (     ⋮     )   (2.6.3)
            ( x_N + y_N )

and subtraction in a similar fashion.* For a scalar c_1 we define multiplication of a vector by

            ( c_1 x_1 )
    c_1 x = (    ⋮    )   (2.6.4)
            ( c_1 x_N )

The symbol 0 denotes the vector all of whose components are zero.
* The “+” sign connecting x and y is not the “+” sign of scalar arithmetic. But since it enjoys the same properties, it seems rather niggling to employ a different symbol.
Occasionally, when there is no chance of confusion with components, we will use subscripts to denote different vectors; if there is a reasonable possibility of ambiguity, we will reluctantly employ superscripts.

2.7. Norm of a Vector
As a scalar magnitude of a vector, we have a choice of a number of functions, each convenient for various purposes. Let ‖x‖, read “norm of x”, denote this quantity. We demand the following properties:

    (a) ‖x‖ ≥ 0 for all x; ‖x‖ = 0 implies that x = 0;
    (b) ‖c_1 x‖ = |c_1| ‖x‖ (homogeneity);
    (c) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality).   (2.7.1)

The two most popular candidates are

    ‖x‖_1 = ∑_{i=1}^{N} |x_i|,   ‖x‖_2 = (∑_{i=1}^{N} x_i²)^{1/2}.   (2.7.2)
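Both candidates, and the requirements of (2.7.1), can be checked numerically; a minimal sketch (the names norm1 and norm2 are ours):

```python
def norm1(x):
    # The first norm of (2.7.2): sum of absolute components.
    return sum(abs(t) for t in x)

def norm2(x):
    # The second norm of (2.7.2): the Euclidean length.
    return sum(t * t for t in x) ** 0.5

x, y = [3.0, -4.0], [1.0, 2.0]
s = [a + b for a, b in zip(x, y)]
assert norm1(s) <= norm1(x) + norm1(y)   # triangle inequality, (2.7.1c)
assert norm2(s) <= norm2(x) + norm2(y)
assert norm1([-2.0 * t for t in x]) == 2.0 * norm1(x)   # homogeneity, (2.7.1b)
```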
The second has the great advantage of geometric and physical significance, the first that of analytic simplicity. It is easy to verify that ‖x‖_1 satisfies the three requirements of (2.7.1). To establish the triangle inequality for ‖x‖_2, we require Minkowski’s inequality referred to in Sec. 1.11.

Exercise
1. Show that we can use max_i |x_i| as a norm.

2.8. Vector Inner Product
A fundamental bilinear function of two real vectors x and y is the inner product

    (x, y) = ∑_{i=1}^{N} x_i y_i.   (2.8.1)
Observe that it is symmetric, (x, y) = (y, x), and that for real x,

    ‖x‖_2² = (x, x).   (2.8.2)

If (x, y) = 0, we say that x and y are orthogonal. If we allow complex components, it is necessary to use a modified inner product,

    (x, y) = ∑_{i=1}^{N} x_i ȳ_i,   (2.8.3)

where ȳ_i denotes the complex conjugate of y_i. In this way (x, x) preserves the desired nonnegativity property. Observe, however, that this extended inner product is no longer symmetric. We have

    (x, y) = \overline{(y, x)}.   (2.8.4)

Unless specifically mentioned or indicated in what follows, all quantities of interest will be real.

Exercises
1. Consider (ηx + y, ηx + y) as a quadratic polynomial in η. From the fact that it is nonnegative, deduce the Cauchy–Schwarz inequality (x, x)(y, y) ≥ (x, y)².

2. From this deduce the triangle inequality (x + y, x + y)^{1/2} ≤ (x, x)^{1/2} + (y, y)^{1/2}.

3. A set of vectors [x_1, x_2,..., x_k] is said to be linearly independent if any relation of the form c_1 x_1 + c_2 x_2 + ⋯ + c_k x_k = 0, with the c_i scalars, implies that c_1 = c_2 = ⋯ = c_k = 0. Show that any set of more than N N-dimensional vectors cannot be linearly independent.

4. If (x, y) = 0, we say that x and y are orthogonal. Show that we can orthogonalize any set of linearly independent vectors [x_1, x_2,..., x_k] in the following fashion (Gram–Schmidt orthogonalization). Set y_1 = x_1, y_2 = x_2 + c_1 y_1; determine c_1 so that (y_1, y_2) = 0. Set y_3 = x_3 + c_11 y_1 + c_12 y_2. Determine c_11 and c_12 so that (y_3, y_1) = (y_3, y_2) = 0, etc.

5. What is the advantage of this procedure over setting y_3 = x_3 + c_11 x_1 + c_12 x_2, etc.? Are there any disadvantages?
6. If x_1, x_2,..., x_N are N linearly independent vectors of dimension N, we can express any other N-dimensional vector x uniquely as a linear combination of the x_i.

7. If the N vectors of dimension N are mutually orthogonal, i.e., (x_i, x_j) = 0, i ≠ j, then they are linearly independent if nontrivial.

8. If x = ∑_{i=1}^{N} c_i x_i, with the x_i orthonormal, that is, (x_i, x_j) = 0, i ≠ j, (x_i, x_i) = 1, then c_i = (x, x_i), i = 1, 2,..., N.
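The orthogonalization of Exercise 4 can be sketched directly; the names inner and gram_schmidt are ours, and each coefficient is the value −(x, y_j)/(y_j, y_j) forced by the orthogonality conditions:

```python
def inner(x, y):
    # Real inner product (2.8.1).
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors):
    """Orthogonalize a linearly independent set, as in Exercise 4."""
    ys = []
    for x in vectors:
        y = x[:]
        for prev in ys:
            c = inner(x, prev) / inner(prev, prev)
            y = [yi - c * pi for yi, pi in zip(y, prev)]
        ys.append(y)
    return ys

ys = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
for i in range(3):
    for j in range(i + 1, 3):
        assert abs(inner(ys[i], ys[j])) < 1e-12   # pairwise orthogonal
```

Exercise 5's alternative, correcting with the original x_j instead of the y_j, leads to simultaneous equations for the coefficients rather than one-at-a-time formulas.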
2.9. Matrix Notation
To signify a square matrix, the type we shall deal with almost exclusively, we shall use upper case letters,

    A = (a_ij),   i, j = 1, 2,..., N.   (2.9.1)

As in the case of vectors, X, Y, Z will be used as variables, A, B, C as constants. The quantities a_ij are called the elements of A, and N is called the dimension. Equality of two matrices, A = B, means that corresponding elements are equal, a_ij = b_ij. Two matrices of frequent occurrence are the identity matrix

        ( 1  0  ⋯  0 )
    I = ( 0  1  ⋯  0 )   (2.9.2)
        ( ⋮         ⋮ )
        ( 0  0  ⋯  1 )

and the null matrix 0, all of whose elements are zero. Matrices possess a far richer and more interesting algebra than vectors. The relations that we introduce here are carefully chosen so that we can employ matrices to study the properties of linear transformations. Many other relations can be introduced for other purposes. Addition is defined in the expected fashion,

    A + B = (a_ij + b_ij),   (2.9.3)

with subtraction the inverse operation. Multiplication of two matrices is induced by considering the properties of two linear transformations applied one after the other. Consider the linear transformation

    y_i = ∑_{j=1}^{N} a_ij x_j,   i = 1, 2,..., N,   (2.9.4)
64
MATRIX THEORY
and define the product of a vector x by a matrix A so that (2.9.4) may be written

    y = Ax,   (2.9.5)

that is, Ax is the vector whose components are given by the right-hand side of (2.9.4). Then, if z = By, we have

    z = By = B(Ax).   (2.9.6)

If we define BA so that (2.9.6) may conveniently be written

    z = (BA)x,   (2.9.7)

we see that the ijth element of BA is given by the expression

    (BA)_ij = ∑_k b_ik a_kj.   (2.9.8)
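The rule (2.9.8) is exactly what makes B(Ax) = (BA)x; a small numerical check (the helper names matvec and matmul are ours):

```python
def matvec(a, x):
    # y = Ax, Eqs. (2.9.4)-(2.9.5).
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def matmul(b, a):
    # (BA)_ij = sum_k b_ik a_kj, Eq. (2.9.8).
    n = len(a)
    return [[sum(b[i][k] * a[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, 1.0], [1.0, 0.0]]
x = [1.0, -1.0]
# Applying A then B agrees with applying the product BA, (2.9.6)-(2.9.7):
assert matvec(B, matvec(A, x)) == matvec(matmul(B, A), x)
```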
Exercises
1. Show that X + 0 = X for all X and that 0 is the only matrix possessing this property.

2. Show that XI = X for all X and that I is the only matrix possessing this property.

3. Let y_i = y_i(x_1, x_2,..., x_N) be a set of N functions. Set J(y, x) = (∂y_i/∂x_j), the Jacobian matrix. Show that J(z, y)J(y, x) = J(z, x).

4. Establish the Jacobi identity

    [[A, B], C] + [[C, A], B] + [[B, C], A] = 0,

where [A, B] = AB − BA, etc. For an interesting geometrical interpretation of this important result, see

W. C. Hoffman, “The Neuron as a Lie Group Germ and a Lie Product,” Quart. Appl. Math., Vol. 25, 1968, pp. 423–440.

2.10. Noncommutativity
It is easy to see by means of simple examples for 2 × 2 matrices that, in general,

    BA ≠ AB.   (2.10.1)
It is precisely this property which renders matrix algebra so important in science and which makes matrix theory a fascinating and challenging domain. However, despite (2.10.1), it is true that associativity holds, namely,

    (AB)C = A(BC),   (2.10.2)

which means ABC is unambiguously determined. We leave the proof of this as an exercise. If AB = BA, we say that A and B commute.

Exercises
1. Show that Aⁿ for n = 3, 4,..., is unambiguously determined.

2. If XY = YX for all X, then Y = kI, where k is a scalar. Is the requirement that it hold for all X necessary for this determination of Y?
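A standard pair of 2 × 2 examples makes (2.10.1) concrete, and the same helper confirms the associativity (2.10.2); the name matmul is ours:

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[0, 1], [0, 0]]
B = [[0, 0], [1, 0]]
AB, BA = matmul(A, B), matmul(B, A)
print(AB, BA)   # [[1, 0], [0, 0]] versus [[0, 0], [0, 1]]
assert AB != BA                                    # noncommutativity, (2.10.1)

C = [[1, 2], [3, 4]]
assert matmul(AB, C) == matmul(A, matmul(B, C))    # associativity, (2.10.2)
```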
2.11. The Adjoint, or Transpose, Matrix

An important matrix function of A is the transpose, or adjoint, matrix defined by

    Aᵀ = (a_ji).   (2.11.1)

It is “induced” using the inner product; namely, we can define Aᵀ by the relation

    (x, Ay) = (Aᵀx, y),   (2.11.2)

valid for all x and y.

Exercise

1. Show that AᵀA = ((a^(i), a^(j))), where a^(i) is the vector whose components are the elements of the ith column of A.

2.12. The Inverse Matrix
Let us now return to our original objective, the solution of (2.1.1). If we attempt to invert the relation

    y = Ax,   (2.12.1)
which is to say, express x as a linear transformation of y, we are led to the inverse transformation

    x = A⁻¹y.   (2.12.2)

This is our definition of A⁻¹. As we know from the theory of linear equations, the inverse matrix exists and is unique if and only if the determinant of A, written

    |A| = |a_ij|,   (2.12.3)

is nonzero. We can use Cramer’s rule to determine the elements of A⁻¹. If |A| = 0, A is said to be singular.

Exercises
1. From (2.12.1) and (2.12.2) we see that A⁻¹ is a left inverse, i.e., x = A⁻¹(Ax) implies that A⁻¹A = I. Show that it is also a right inverse, that is, AA⁻¹ = I.

2. Show that (AB)⁻¹ = B⁻¹A⁻¹ when B⁻¹ and A⁻¹ exist.

3. Let y = Sx be a relation which preserves distance, that is, (y, y) = (x, x) for all x. Show that this means that S is an orthogonal matrix, Sᵀ = S⁻¹.

4. Let y = Sx be a relation which preserves probabilities, that is, x_i ≥ 0, ∑_{i=1}^{N} x_i = 1 implies that y_i ≥ 0, ∑_{i=1}^{N} y_i = 1. Show that S must be a Markov matrix, that is, s_ij ≥ 0, ∑_{i=1}^{N} s_ij = 1 for j = 1, 2,..., N. If S merely preserves the property of nonnegativity, then S is a nonnegative matrix, that is, s_ij ≥ 0.

5. Show that the product of two orthogonal matrices is again an orthogonal matrix and that the product of two Markov matrices is again a Markov matrix, both with and without calculation.

6. Show that A is nonsingular if |a_ii| > ∑_{j≠i} |a_ij| for i = 1, 2,..., N. For a history of this important and useful result, as well as extensions, see

O. Taussky, “A Recurring Theorem on Determinants,” Amer. Math. Monthly, Vol. 56, 1949, pp. 672–676.

For some quantitative refinements of this result, see

A. M. Ostrowski, “Note on Bounds for Determinants with Dominant Principal Diagonal,” Proc. Amer. Math. Soc., Vol. 3, 1952, pp. 26–30.
2.13. MATRIX NORM
67
A. M. Ostrowski, “Note on Bounds for Some Determinants,” Duke Math. J., Vol. 22, 1955, pp. 95-102.
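The diagonal-dominance criterion of Exercise 6 can be checked on a small example; the helpers det and diagonally_dominant are ours, and the cofactor expansion is meant only for small N:

```python
def det(a):
    # Laplace expansion down the first column.
    n = len(a)
    if n == 1:
        return a[0][0]
    return sum((-1) ** i * a[i][0]
               * det([row[1:] for row in a[:i] + a[i + 1:]])
               for i in range(n))

def diagonally_dominant(a):
    # |a_ii| > sum over j != i of |a_ij| in every row, as in Exercise 6.
    return all(abs(row[i]) > sum(abs(v) for j, v in enumerate(row) if j != i)
               for i, row in enumerate(a))

A = [[4.0, 1.0, 1.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]]
assert diagonally_dominant(A)
print(det(A))   # nonzero, as the criterion guarantees
```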
7. The “generalized inverse” of an M × N complex matrix A is the unique matrix B such that ABA = A, BAB = B, (BA)* = BA, (AB)* = AB. Here C* = (C̄)ᵀ, where C̄ is the complex conjugate of C. See

R. Penrose, “A Generalized Inverse for Matrices,” Proc. Cambridge Phil. Soc., Vol. 51, 1955, pp. 406–413.

8. Show that Ax = 0 has a nontrivial solution for x if and only if |A| = 0 and that this solution may be taken to be real if A is real. Hint: Use induction on the order of A.

9. Geometrically, in two and three dimensions what do orthogonal matrices correspond to?

10. If |A| = 0 the columns are linearly dependent, as are also the rows. If |A| ≠ 0 the columns and rows are linearly independent.

2.13. Matrix Norm
For the measure of the magnitude of A we shall usually employ the quantity

    ‖A‖_1 = ∑_{i,j=1}^{N} |a_ij|.   (2.13.1)
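The submultiplicative property that Exercise 1 below asks for can be checked numerically for this norm; norm1 and matmul are our helper names:

```python
def norm1(a):
    # ||A||_1 = sum of |a_ij| over all elements, Eq. (2.13.1).
    return sum(abs(v) for row in a for v in row)

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1.0, -2.0], [3.0, 0.5]]
B = [[0.0, 1.0], [-1.0, 2.0]]
assert norm1(matmul(A, B)) <= norm1(A) * norm1(B)   # ||AB|| <= ||A|| ||B||
```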
Occasionally, one can profitably employ

    ‖A‖_2 = (∑_{i,j=1}^{N} a_ij²)^{1/2}.   (2.13.2)
Exercises
1. Prove that ‖AB‖ ≤ ‖A‖ ‖B‖ and ‖Ac‖ ≤ ‖A‖ ‖c‖, using the norm ‖·‖_1.

2. Do these results hold for the norm ‖·‖_2?

3. Introduce the notation tr(A) = ∑_{i=1}^{N} a_ii (read “trace of A”) and the “inner product” of two matrices A and B, (A, B) = tr(AB). Show that tr(AB) = tr(BA) and that ‖A‖_2 = [tr(AAᵀ)]^{1/2}.
4. The vector norm can be used to induce a matrix norm (as is common in operator theory) by writing

    ‖A‖_m = max_{‖x‖=1} ‖Ax‖,

where ‖x‖ is one of the vector norms. Show that ‖A‖_m is indeed a norm.
5. Consider the six norms

    N_1(A) = max_{1≤j≤N} ∑_{i=1}^{N} |a_ij|,   N_2(A) = max_{(x,x)=1} (Ax, Ax)^{1/2},   N_3(A) = max_{i,j} |a_ij|,   ⋯.

Determine the complex matrices for which the ratios N_r(A)/N_s(A) attain their upper and lower bounds.

B. J. Stone, Best Possible Ratios of Certain Matrix Norms, Technical Report No. 19, Applied Mathematics and Statistics Laboratories, Stanford University, 1962.

2.14. Relative Invariants
Given a transformation, it is always of interest to find its fixed points, which is to say the vectors left invariant under the transformation. In general, as we shall see below, the only solution of

    Ax = x   (2.14.1)

is x = 0. This is not particularly interesting. Much more interesting, and indeed basic to the theory of linear transformations, is the search for relative invariants.
Can we find a scalar λ and a vector x, determined up to a factor of proportionality, such that

    Ax = λx?   (2.14.2)

We can think of x as representing an invariant direction. Writing this relation in the form

    (A − λI)x = 0,   (2.14.3)

we see that a necessary and sufficient condition for the existence of a nontrivial vector x satisfying (2.14.3) is the determinantal equation

    |A − λI| = 0.   (2.14.4)

This is a polynomial equation of degree N, the characteristic equation of A; and its N roots, λ_1, λ_2,..., λ_N, are called the characteristic roots of A. In general, they are complex, even if A is real. The corresponding vectors, determined up to a multiplicative scalar constant, are called the characteristic vectors of A. Although there are always N characteristic roots, simple or multiple, real or complex, there are not necessarily N distinct characteristic vectors. To illustrate this, consider the 2 × 2 matrix

    A = ( 1  1 )   (2.14.5)
        ( 0  1 )

It is easy to see that one is a multiple characteristic root and that there is exactly one characteristic vector,

    x = ( 1 )   (2.14.6)
        ( 0 )

in the sense that all invariant vectors are scalar multiples of this vector. If there is a full complement of linearly independent characteristic vectors, x^(1), x^(2),..., x^(N), we can, as we know, express an arbitrary vector x as a linear combination of the x^(i),

    x = ∑_{i=1}^{N} c_i x^(i),   (2.14.7)

where the c_i are scalars. Then

    Ax = ∑_{i=1}^{N} c_i Ax^(i) = ∑_{i=1}^{N} c_i λ_i x^(i),   (2.14.8)
a simple and useful representation of the effect of the transformation equivalent to A. It is in the hopes of doing this that we study the relative invariants.

Exercises
1. Determine the possible limiting behaviors of Aⁿx as n → ∞ if (2.14.8) holds.

2. If A is a given matrix, show that we can always find a B with ‖B‖ < ε such that A + B has distinct characteristic roots.
+
+
3. Show that tr(A) = ELl hi and that tr(A B) = tr(A) tr(B). Are the characteristic roots of A B sums of the characteristic roots of A and B ?
+
4. Show that 1 I
+ EA I = 1 + &(A) + ....
a : i , and hence establish the Cauchy5. Show that tr(AAT) = Schwarz and triangle inequalities for 11 A 1, =
6. Consider the linear system 9x1 qx2
+ +
PX,
= Ax1
7
h,, = Ax, ,
PX, = PX,
qxn-l
=
Ax,.
Determine the characteristic values and characteristic vectors of the associated matrix. This system plays an important role in the study of random walk. See
M. Kac, “Random Walk and the Theory of Brownian Motion,” Arner. Math. Monthly, Vol. 54, 1947, pp. 369-390. T h e determination of the properties of characteristic values and vectors of a particular class of matrices is usually a matter of some difficulty. See, for example, A. M. Ostrowski, “On the Spectrum of a One-Parametric Family of Matrices,” J . Reine und Angew. Math., Vol. 193, 1954, pp. 143160. M. Kac, W. L. Murdock, and G. Szego, “On the Eigen-values of Certain Hermitian Forms,” J. Ratl. Math. and Mech., Vol. 2, 1953, pp. 767-800,
R. Bellman, Introduction to Matrix Analysis, McGraw-Hill, New York, 1960. (Second Edition, in preparation.)
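Exercise 1 above can be explored numerically: when (2.14.8) holds and one characteristic root dominates in magnitude, repeated application of A turns any starting vector toward the corresponding characteristic vector. A sketch (the names matvec and power_iterate are ours):

```python
def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def power_iterate(a, x, steps):
    """Apply A repeatedly, rescaling so the largest component has size one."""
    for _ in range(steps):
        x = matvec(a, x)
        m = max(abs(t) for t in x)
        x = [t / m for t in x]
    return x

A = [[2.0, 1.0], [1.0, 2.0]]      # characteristic roots 3 and 1
x = power_iterate(A, [1.0, 0.0], 60)
print(x)   # settles on a multiple of (1, 1), the vector belonging to the root 3
```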
7. Are the characteristic roots continuous functions of the elements of A? See

A. Ostrowski, Mathematical Miscellanea, XXVII: On the Continuity of Characteristic Roots in Their Dependence on the Matrix Elements, Technical Report No. 2, Applied Mathematics and Statistics Laboratories, Stanford University, 1959.

2.15. Constrained Minimization
The problem of determining characteristic roots and characteristic vectors arises in a natural way from the theory of linear differential equations, as we shall see in the following chapter. It also arises from the following variational problem. Consider the quadratic form

    Q(x_1, x_2,..., x_N) = ∑_{i,j=1}^{N} a_ij x_i x_j   (2.15.1)

and ask for its stationary points on the N-dimensional sphere

    ∑_{i=1}^{N} x_i² = 1.   (2.15.2)

Proceeding formally, let us employ a Lagrange multiplier. Form the new function

    Q_λ(x_1, x_2,..., x_N) = ∑_{i,j=1}^{N} a_ij x_i x_j − λ ∑_{i=1}^{N} x_i²   (2.15.3)

and suppose that the x_i are now unconstrained. Then the variational equations are

    ∂Q_λ/∂x_i = 0,   or   ∑_{j=1}^{N} a_ij x_j − λx_i = 0,   (2.15.4)

or

    Ax = λx.   (2.15.5)

Using the canonical representation of a quadratic form, we can make the foregoing rigorous in a simple fashion (see Sec. 2.19). At the moment we wish to indicate the fundamental role that matrices occupy in multidimensional maximization problems.
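For a 2 × 2 example the statement can be checked directly: on the circle (2.15.2), the quadratic form (2.15.1) ranges between the characteristic roots, and the stationary values are attained at the characteristic vectors. The matrix below is our illustrative choice:

```python
import math

A = [[2.0, 1.0], [1.0, 2.0]]      # characteristic roots 3 and 1

def Q(theta):
    # The form (2.15.1) evaluated at the unit vector (cos t, sin t),
    # which automatically satisfies the constraint (2.15.2).
    x = [math.cos(theta), math.sin(theta)]
    return sum(A[i][j] * x[i] * x[j] for i in range(2) for j in range(2))

values = [Q(k * math.pi / 1000) for k in range(2000)]
print(max(values), min(values))   # close to the characteristic roots 3 and 1
```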
Exercise
1. In place of using a Lagrange multiplier, consider the quotient

    (x, Ax)/(x, x),

and once again derive (2.15.5). What is the meaning of the characteristic values as derived from this formulation?

2.16. Symmetric Matrices
There are a number of important classes of matrices which possess a full complement of characteristic vectors. Of these, those most frequently occurring are the real symmetric matrices. If Aᵀ = A we say that A is symmetric. The reason for the interest in symmetric matrices lies in the fact that the quadratic form

    Q(x) = ∑_{i,j=1}^{N} a_ij x_i x_j   (2.16.1)

can be written in the form

    Q(x) = (x, Ax),   (2.16.2)

where A = (a_ij) is a symmetric matrix. Let us begin by showing that characteristic vectors corresponding to different characteristic roots of A are orthogonal if A is symmetric. As agreed, we suppose that all matrices appearing are real unless otherwise specified. From

    Ax^(i) = λ_i x^(i),   Ax^(j) = λ_j x^(j),   (2.16.3)

we have

    (Ax^(i), x^(j)) = λ_i (x^(i), x^(j)),   (x^(i), Ax^(j)) = λ_j (x^(i), x^(j)).   (2.16.4)

Since (Ax^(i), x^(j)) = (x^(i), Ax^(j)), by virtue of the symmetry of A, we have

    (λ_i − λ_j)(x^(i), x^(j)) = 0.

Since by assumption λ_i ≠ λ_j, we have

    (x^(i), x^(j)) = 0.   (2.16.5)
From this result, it follows readily that the characteristic roots of A are all real. For if λ is a complex characteristic root, with associated characteristic vector x, then λ̄, the complex conjugate of λ, is also a characteristic root with characteristic vector x̄. Since, by assumption, λ ≠ λ̄, we have from the foregoing the relation (x, x̄) = 0, a contradiction.

Let λ_1, λ_2, ..., λ_N be the N characteristic roots of A and assume for the moment that they are all distinct. Then the associated characteristic vectors, x^(1), x^(2), ..., x^(N), can be taken to be real and we can normalize them by the condition (x^(i), x^(i)) = 1. The orthogonality of the x^(i) makes it clear that they are linearly independent. Let R be the matrix whose columns are the x^(i). Then the relations Ax^(i) = λ_i x^(i), i = 1, 2, ..., N, yield the equation

AR = RΛ,   (2.16.6)

where Λ is the diagonal matrix

Λ = diag(λ_1, λ_2, ..., λ_N).   (2.16.7)

Hence, we have

R^{-1}AR = Λ,   A = RΛR^{-1},   (2.16.8)

provided that R^{-1} exists. The fact that the columns of R are linearly independent ensures this. However, we can go further. We see that

R^T R = ((x^(i), x^(j))).   (2.16.9)

Since the x^(i) are orthonormal, we see that R^T R = I, which means that R^{-1} = R^T and thus that R is an orthogonal matrix. Hence, (2.16.8) simplifies to

R^T A R = Λ,   A = RΛR^T,   (2.16.10)

a fundamental canonical representation. This yields a parametric representation for the elements of a symmetric matrix that can be used to establish a number of results.
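The canonical representation can be checked directly in a few lines. The sketch below uses NumPy's `eigh` routine on an arbitrarily chosen symmetric matrix; the matrix itself is merely illustrative and not part of the text:

```python
import numpy as np

# An arbitrary real symmetric matrix (illustrative choice)
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh returns the characteristic roots in ascending order and an
# orthonormal set of characteristic vectors as the columns of R
lam, R = np.linalg.eigh(A)

assert np.allclose(R.T @ R, np.eye(3))          # R is orthogonal: R^T R = I
assert np.allclose(R.T @ A @ R, np.diag(lam))   # R^T A R = Lambda
assert np.allclose(R @ np.diag(lam) @ R.T, A)   # A = R Lambda R^T
```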
Exercises
1. Show that A^k = RΛ^kR^{-1} for k = 1, 2, ..., and for k = −1, if A is nonsingular.
2. Why can we safely suppose that a_ij = (a_ij + a_ji)/2, i, j = 1, 2, ..., N, in the quadratic form Q(x)?
3. Consider the perturbed matrix A + εB, where A and B are symmetric and ε is a scalar. Under what circumstances are the characteristic roots λ_i(A + εB) expressible in the form λ_i(A) + ελ_i^(1)(A, B) + ⋯? What can happen if multiple characteristic roots of A occur?
4. Show that a necessary and sufficient condition that the two symmetric matrices A and B can be reduced to diagonal form by the same orthogonal transformation is that A and B commute.
5. Show that any symmetric matrix with distinct characteristic roots may be written in the form A = Σ_{i=1}^N λ_i E_i, where the E_i are nonnegative definite matrices satisfying the conditions E_iE_j = 0, i ≠ j, E_i² = E_i, and the λ_i are the characteristic values of A. This is called the spectral decomposition of A. What is the expression for A^n?
6. Write

|λI − A| = λ^N − φ_1(A)λ^{N−1} + ⋯ + (−1)^N φ_N(A).

Show that φ_k(AB) = φ_k(BA).
2.17. Quadratic Forms
Returning to the representation

Q(x) = (x, Ax)   (2.17.1)

for the quadratic form Q(x) = Σ_{i,j=1}^N a_ij x_i x_j, let us make the change of variable

x = Ry.   (2.17.2)

Then

(x, Ax) = (Ry, ARy) = (y, R^TARy) = (y, Λy).   (2.17.3)

Hence

Σ_{i,j=1}^N a_ij x_i x_j = Σ_{i=1}^N λ_i y_i².   (2.17.4)

This representation as a sum of squares plays an essential role in the study of quadratic forms. Observe that since R^TR = I, we also have

(x, x) = (Ry, Ry) = (y, R^TRy) = (y, y).   (2.17.5)
Exercise
1. Show that (x, Ax) = 0 for all x implies that A^T = −A. Hence if the quadratic form Σ_{i,j} a_ij x_i x_j is identically zero for all x, then a_ij = 0 for all i and j, if A is symmetric.

2.18. Multiple Characteristic Roots
The fundamental canonical representation of (2.16.8) remains valid in the general case where the symmetric matrix A has multiple characteristic roots. Since we will make no particular use of this result, let us merely sketch two approaches to its demonstration.

The first proceeds by establishing the result for the perturbed matrix A + B, where B is a real symmetric matrix with ||B|| < ε chosen so that A + B has no multiple characteristic roots, and then proceeding to the limit as B → 0. This is not an esthetically satisfying method, since it requires extraneous analytic concepts of limiting procedures in the demonstration of an algebraic result, but it is nonetheless a useful method in many cases. It must be applied in each particular investigation, however, with a certain amount of care, since sometimes the limiting procedure is valid, and sometimes not. In this case, the fact that R is orthogonal, which implies that R^{-1} = R^T, hence RR^T = I, provides the key. We leave the details to the reader.

The second approach is inductive and algebraic. If λ_1 is a characteristic root, we begin by showing that we can find an orthogonal matrix R_1 such that

R_1^T A R_1 = [ λ_1  0      ]
              [ 0    A_{N−1} ].   (2.18.1)

It is easy to see that A_{N−1} is again symmetric and that it possesses the remaining N − 1 characteristic roots of A. We now iterate this procedure, using the group property of orthogonal matrices. Again the details are left to the reader.

The fundamental fact is that a characteristic root of multiplicity r has an r-dimensional linear subspace of characteristic vectors. An orthonormal basis may readily be constructed for this subspace.

In what follows we shall suppose that the characteristic roots are distinct unless otherwise specifically stated.
Exercises
1. Establish the Cayley-Hamilton theorem for symmetric matrices: Every symmetric matrix satisfies its characteristic equation.
2. Establish the Cayley-Hamilton theorem for matrices with distinct characteristic roots.
3. Establish the Cayley-Hamilton theorem for arbitrary matrices using a limiting process, or otherwise. (A simple proof uses the Jordan canonical form mentioned in Sec. 2.24. However, it is not simple to obtain this canonical form.)

2.19. Maximization and Minimization of Quadratic Forms
As we shall see in subsequent chapters, a large number of basic problems of analysis can be reduced to the problem of the minimization or maximization of a quadratic form

Q(x) = (x, Ax) + 2(x, b)   (2.19.1)

over some region in x-space. There are two cases of particular importance. The first is that where x can vary over the entire N-dimensional space. The stationary points are then determined by the linear equation

Ax + b = 0.   (2.19.2)

The second is that where b = 0 and x is constrained to the unit sphere

(x, x) = 1.   (2.19.3)

A rigorous discussion of the connection between the solution of (2.19.2) and the original variational problem requires the concept of positive definite matrices, which we present in Sec. 2.21. Let us consider here the constrained case.

There are several ways of proceeding. Probably the easiest is the approach which makes use of the reduction to diagonal form as in (2.17.4). We see that the range of values assumed by (x, Ax) when x traverses (x, x) = 1 is the same as the range of values of the simpler expression

Σ_{i=1}^N λ_i y_i².   (2.19.4)
Let us arrange the characteristic values in order of increasing magnitude,

λ_1 ≤ λ_2 ≤ ⋯ ≤ λ_N.   (2.19.5)

It follows that

min_{(x,x)=1} (x, Ax) = λ_1,   (2.19.6)

and that

max_{(x,x)=1} (x, Ax) = λ_N.   (2.19.7)
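The extremal property of λ_1 and λ_N on the unit sphere is easy to observe numerically. In the sketch below (NumPy; the random matrix and sample count are illustrative choices), every sampled value of (x, Ax) lies in [λ_1, λ_N], and the bounds are attained at the associated characteristic vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                       # a random real symmetric matrix

lam = np.linalg.eigvalsh(A)             # characteristic roots, ascending

# Sample unit vectors: every value of (x, Ax) lies in [lam_1, lam_N]
X = rng.standard_normal((10000, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)
q = np.einsum('ij,jk,ik->i', X, A, X)   # (x, Ax) for each row x of X
assert lam[0] - 1e-9 <= q.min() and q.max() <= lam[-1] + 1e-9

# The bounds are attained at the corresponding characteristic vectors
w, V = np.linalg.eigh(A)
assert np.isclose(V[:, 0] @ A @ V[:, 0], lam[0])
assert np.isclose(V[:, -1] @ A @ V[:, -1], lam[-1])
```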
Exercises
1. Write λ_1(A), λ_N(A) to indicate the dependence on A. Show that

λ_1(A + B) ≥ λ_1(A) + λ_1(B),   λ_N(A + B) ≤ λ_N(A) + λ_N(B).

2. Show that we can employ Rayleigh quotients and write

λ_1 = min_x (x, Ax)/(x, x),   λ_N = max_x (x, Ax)/(x, x).
2.20. Min-Max Characterization of the λ_k
Let x^(1), ..., x^(N) be characteristic vectors associated with λ_1, λ_2, ..., λ_N. Then, referring to (2.19.4), we see that we can write

λ_2 = min Σ_{i=2}^N λ_i y_i²,   Σ_{i=2}^N y_i² = 1,   (2.20.1)

where the minimum is taken over the y-region defined by y_1 = 0. Translating this back into x-space, we may write

λ_2 = min (x, Ax),   (2.20.2)

where the minimum is taken over the x-region

(x, x^(1)) = 0,   (x, x) = 1.   (2.20.3)
The first condition is an orthogonality condition, the geometric meaning of which is clear if we think in terms of an N-dimensional ellipsoid. Similarly, we can write down relations for λ_3, ..., λ_{N−1}. Since the characteristic vectors x^(1), ..., x^(N) depend in a complex way on A, these relations are not easily used to obtain various bounds for λ_2, ..., λ_{N−1} that are often needed to establish other results. To overcome this defect, a basic representation due to Fischer and Courant is extensively used, a min-max characterization. It takes the form

λ_1 = min_x (x, Ax)/(x, x),
λ_2 = max_{(y,y)=1} min_{(x,y)=0} (x, Ax)/(x, x),   (2.20.4)

and, equivalently,

λ_{k+1} = max_{(y^(i),y^(i))=1} min_{(x,y^(i))=0, i=1,2,...,k} (x, Ax)/(x, x).   (2.20.5)

Similarly,

λ_N = max_x (x, Ax)/(x, x),   λ_{N−1} = min_{(y,y)=1} max_{(x,y)=0} (x, Ax)/(x, x).
It is not difficult to verify the foregoing results using the canonical representation, once they have been stated. Their origin becomes quite clear if we consider the geometrical interpretation of the characteristic roots in terms of the lengths of the axes of the multidimensional ellipsoid; see the two-dimensional figure below (Fig. 2.1).
Figure 2.1
As we allow y to vary, it is easy to see that the length OP of x, kept perpendicular to y, becomes a minimum when OP coincides with the minor semiaxis of the ellipse and OQ coincides with the major semiaxis of the ellipse. Observe the desired fact that the domain of variation of the x and y^(i) in (2.20.4) and (2.20.5) no longer depends on the matrix A.
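The min-max characterization (2.20.4) can also be checked numerically. In the sketch below (NumPy; the matrix and sample count are illustrative), for each trial vector y the inner minimum over {x : (x, y) = 0} is computed exactly by restricting A to the orthogonal complement of y; the maximum, λ_2, is attained at y = x^(1):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2
lam, V = np.linalg.eigh(A)              # ascending characteristic roots

def min_over_complement(y):
    """min of (x,Ax)/(x,x) over x orthogonal to y."""
    Q, _ = np.linalg.qr(np.column_stack([y, np.eye(4)]))
    Q = Q[:, 1:]                        # orthonormal basis of y's complement
    return np.linalg.eigvalsh(Q.T @ A @ Q)[0]

# For any y the inner minimum is at most lambda_2 ...
for _ in range(200):
    y = rng.standard_normal(4)
    assert min_over_complement(y) <= lam[1] + 1e-9

# ... and the maximum, lambda_2, is attained at y = x^(1)
assert np.isclose(min_over_complement(V[:, 0]), lam[1])
```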
2.21. Positive Definite Matrices
If a real symmetric matrix A has the property that

(x, Ax) > 0   (2.21.1)

for all real nontrivial vectors x, we say that A is positive definite and we write

A > 0.   (2.21.2)

If it is merely true that (x, Ax) ≥ 0, we say that A is positive indefinite, or nonnegative definite, and write A ≥ 0.

It follows from the decomposition into a sum of squares given in (2.17.4) that a necessary and sufficient condition that A be positive definite is that all characteristic roots of A be positive.

The condition of positive definiteness introduces a partial ordering in real symmetric matrices. We write A ≥ B if A − B ≥ 0. Clearly A ≥ B, B ≥ C implies that A ≥ C, and c_1 > 0, A > 0 implies that c_1A > 0.
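A brief numerical sketch, checking positive definiteness both through the characteristic roots and through the leading principal minors (anticipating the determinantal criteria of Sec. 2.22); the test matrices below are illustrative choices:

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """Check positive definiteness of a real symmetric A two equivalent ways."""
    # (a) all characteristic roots positive
    by_roots = np.linalg.eigvalsh(A).min() > tol
    # (b) Sylvester's criterion: all leading principal minors positive
    by_minors = all(np.linalg.det(A[:k, :k]) > tol for k in range(1, len(A) + 1))
    assert by_roots == by_minors        # the two criteria agree
    return by_roots

A = np.array([[2.0, 1.0], [1.0, 2.0]])  # characteristic roots 1 and 3
B = np.array([[1.0, 2.0], [2.0, 1.0]])  # characteristic roots -1 and 3
assert is_positive_definite(A)
assert not is_positive_definite(B)
```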
Exercises
1. Show that A² ≥ 0 for any real symmetric matrix A.
2. Show that

∫_{-∞}^{∞} ⋯ ∫_{-∞}^{∞} e^{-(x,Ax)} dx_1 dx_2 ⋯ dx_N = π^{N/2}/|A|^{1/2},

if A > 0.

3. Show that BAB^T ≥ 0 if A > 0.

4. If A, B ≥ 0, then λA + (1 − λ)B ≥ 0 for 0 ≤ λ ≤ 1.
5. Why do we say “partial ordering”? Can we compare any two symmetric matrices A and B in this fashion?
6. If A > 0, then A^{-1} > 0.

7. If (x, Ax) > 0 whenever (x, Bx) = 0, where B is positive indefinite, then there exists a scalar constant λ such that (x, Ax) + λ(x, Bx) is positive definite. (Finsler.)
8. A necessary and sufficient condition that (x, Ax) be positive for all nontrivial x satisfying the equations (x, b^(i)) = 0, i = 1, 2, ..., k, is that (x, Ax) + λ Σ_{i=1}^k (x, b^(i))² be positive definite for λ ≫ 1.*

9. Show that a sufficient condition that (x, Ax) be positive for all

* We use the notation λ ≫ 1 to mean that λ must be sufficiently large.
nontrivial x satisfying b_1x_1 + b_2x_2 + ⋯ + b_Nx_N = 0 is that the bordered determinants

| a_11  a_12  ⋯  a_1k  b_1 |
| a_21  a_22  ⋯  a_2k  b_2 |
| ⋯                        |
| a_k1  a_k2  ⋯  a_kk  b_k |
| b_1   b_2   ⋯  b_k   0  |

be nonnegative for k = 1, 2, ..., N.
10. If A = (a_ij), B = (b_ij) are positive definite, then C = (a_ij b_ij) is positive definite. (Schur.) Hint: Use Sec. 2.17.
11. Show that a necessary and sufficient condition that A be positive definite is that tr(AB) > 0 for all positive definite B.

12. If A is positive definite and B symmetric, show that

∫ exp[−(x, Ax) − i(x, Bx)] dx = π^{N/2}/|A + iB|^{1/2}.

13. Show that

∫_{(x,Ax)≤1} dx_1 dx_2 ⋯ dx_N = π^{N/2} |A|^{−1/2}/Γ(N/2 + 1).

14. Show that |A|/|A^(i)| = min_x (x, Ax), where x is constrained by the condition x_i = 1 and A^(i) is the matrix obtained from A by deleting the ith row and ith column.
15. If H is a symmetric matrix with no characteristic value in [a, b], then (H − aI)(H − bI) is positive definite. (Kato’s Lemma.) G. Temple, “An Elementary Proof of Kato’s Lemma,” Mathematika, Vol. 2, 1955, pp. 39-41.

16. Let {A_n} be a sequence of positive definite matrices such that A_1 ≤ A_2 ≤ ⋯ ≤ A_n ≤ ⋯ ≤ B. Show that A_n converges as n → ∞. Hint: (c, A_n c) is a monotone increasing scalar function for all c. Choose c adroitly.
17. If A is a positive definite matrix, it possesses a unique positive definite square root B = A^{1/2}. For a representation of A^{1/2} which exhibits some of its properties in a simple fashion, see Chapter 7. For a surprising way in which square roots enter, see E. P. Wigner and M. M. Yanase, “Information Content of Distributions,” Proc. Nat. Acad. Sci. U.S., Vol. 49, 1963, pp. 910-918.
2.22. Determinantal Criteria
Since, however, any property requiring a knowledge of all of the characteristic roots is obviously a difficult condition to verify, it is essential to have a more tractable set of relations. To obtain these, let us begin with the condition that A > 0 implies that |A| > 0. There are several ways of demonstrating this result. Since the characteristic roots, λ_i, are the solutions of

|A − λI| = 0,   (2.22.1)

we see that

|A| = λ_1λ_2 ⋯ λ_N.   (2.22.2)

Since A > 0 implies that λ_i > 0, i = 1, 2, ..., N, the result follows.

An important type of proof using imbedding starts with the fact that A > 0 implies that |A| ≠ 0. For if |A| = 0, there is a nontrivial vector x satisfying Ax = 0. For this value of x, (x, Ax) = 0, a contradiction. Now consider the family of matrices A(t) = tI + (1 − t)A, where 0 ≤ t ≤ 1. Clearly, A(t) > 0 for 0 ≤ t ≤ 1. It follows from this that |A(t)| ≠ 0 for 0 ≤ t ≤ 1. Since |A(1)| = |I| = 1 > 0, we see then that |A(0)| = |A| > 0.

It follows, upon consideration of the submatrices of A, that a necessary condition that A > 0 is that the determinantal criteria of Sylvester,

D_1 = a_11 > 0,   D_2 = | a_11 a_12 ; a_21 a_22 | > 0,   ...,   D_N = |A| > 0,   (2.22.3)
hold. This is also a sufficient condition, as follows from the representation of Exercise 2 below. A proof without an explicit representation follows from the Poincaré separation theorem of Sec. 2.26.

Exercises
1. Let f_1, f_2, ..., f_N be a set of N real functions, each in L²(0, T). Considering the quadratic form Q(x) = ∫_0^T (x_1f_1 + x_2f_2 + ⋯ + x_Nf_N)² dt, show that the determinant |∫_0^T f_i f_j dt| is nonnegative. Show that strict positivity holds if the f_i are linearly independent over [0, T].
2. Show that if Q(x_1, x_2) = a_11x_1² + 2a_12x_1x_2 + a_22x_2², we have

Q = a_11 (x_1 + (a_12/a_11) x_2)² + (a_22 − a_12²/a_11) x_2²,
and thus show that a necessary and sufficient condition that Q be positive definite is that

a_11 > 0,   | a_11 a_12 ; a_12 a_22 | > 0.
3. Generally, show that

Q(x) = Σ_{k=1}^N (D_k/D_{k−1}) y_k²,

where D_0 = 1, D_k = |a_ij|, i, j = 1, 2, ..., k, and

y_N = x_N,   y_k = x_k + Σ_{j=k+1}^N c_kj x_j.   (Lagrange.)
4. Hence, establish the result stated in (2.22.3).

2.23. Representation for A^{-1}
Consider the quadratic expression

f(y) = 2(x, y) − (y, Ay),   (2.23.1)

where A > 0. Then the maximum of f(y) over y is uniquely determined as the solution of

Ay = x,   (2.23.2)

and

max_y f(y) = (x, A^{-1}x).   (2.23.3)
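The representation (2.23.3) can be verified numerically. In the sketch below (NumPy; the random positive definite A and the vector x are illustrative), f is extremized at the solution of Ay = x and the extreme value equals (x, A^{-1}x):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
A = M @ M.T + np.eye(4)            # positive definite
x = rng.standard_normal(4)

f = lambda y: 2 * x @ y - y @ A @ y

y_star = np.linalg.solve(A, x)     # the stationary point: Ay = x
val = x @ np.linalg.solve(A, x)    # (x, A^{-1} x)
assert np.isclose(f(y_star), val)

# Any other y does worse, since f is concave for A > 0
for _ in range(100):
    y = y_star + rng.standard_normal(4)
    assert f(y) <= f(y_star) + 1e-9
```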
Exercises
1. Show that B > A > 0 implies A^{-1} > B^{-1} > 0.

2. Does A ≥ B imply A² ≥ B²?

3. Establish that B > A > 0 implies A^{-1} > B^{-1} > 0 without use of (2.23.3).
2.24. Canonical Representation for Arbitrary A
It is easy to extend the foregoing results to complex matrices A which are Hermitian,

Ā^T = A.   (2.24.1)
An immediate reduction to canonical form may be carried out if all of the characteristic roots of A are distinct. We have

Λ = T^{-1}AT,   (2.24.2)

where T is orthogonal in the extended sense, T̄^T = T^{-1}; Λ and T are in general complex. A matrix T such that T̄^T = T^{-1} is called unitary. If the characteristic roots are not distinct, we use the inductive approach mentioned earlier. With no restriction on the roots, we can carry out an inductive procedure to obtain a triangular form

T^{-1}AT = [ λ_1  b_12  ⋯  b_1N ]
           [ 0    λ_2   ⋯  b_2N ]
           [ ⋯                  ]
           [ 0    0     ⋯  λ_N ].   (2.24.3)

This is useful in connection with linear differential equations.

Another useful canonical form, although quite tedious to derive, is that due to Jordan. Consider the matrices of special form

L_k(λ) = [ λ  1         ]
         [    λ  1      ]
         [       ⋱   1 ]
         [           λ ]   (k × k).   (2.24.4)

Then it can be shown that a nonsingular transformation T exists, reducing a general matrix A to the form

T^{-1}AT = diag( L_{k_1}(λ_1), L_{k_2}(λ_2), ..., L_{k_r}(λ_r) ),   (2.24.5)

where k_1 + k_2 + ⋯ + k_r = N, the dimension of A. If all of the k_i are equal to one, we say that A has simple elementary divisors. We will make only a single particular use of (2.24.5) and thus not tarry for its demonstration.
Exercises
1. Show that unitary matrices form a group
2. Let A be a matrix with complex elements. A is called normal if A.*Ir A T A . Show that A is normal if and only if one of the follon i n g holds: :
I? j iC,where R and C are Hermitian and commute. (a) A (b) B has a complete set of orthonormal characteristic vectors. (c) A U T DlJ where U is unitary and D is diagonal. liH ZIU, uhe re C' is unitary and H is Hermitian. (d) A (e) T h e characteristic values of AAT are 1 A, 12, 1 A, 12, ..., 1 AN 12, where A, , A, ,..., Ah are the characteristic values of A. (f) T h e characteristic values of A are ~
~
~
~
'I
+
> ' 2
+
+
&?
>..')AN 1
1 he foregoing desirable properties focus attention on the problem of approximating to a given complex matrix B by a normal matrix A in the senbe of minimizing a suitable matrix norm I/ R A (1 for A a normal matri\. T h e question first was investigated by Mirsky; see also r .
~
I<. I,. Causey, On Closest Normal Matrices, Computer Science l h ision, ~ Stanford University, Technical Report CSIO, June 1964.
2.25. Perturbation of Characteristic Frequencies
In many situations the characteristic roots of a symmetric matrix A are associated with the natural frequencies of a physical system. We shall discuss this in the following chapter. What happens to these frequencies when the system parameters are altered? Very often, all that we want to know is that they are decreased or increased. In particular, what can we say about the characteristic roots of A + B as compared to those of A when B is positive definite?

It is intuitively clear that all characteristic roots should be increased. It is clear that the smallest characteristic root is increased. We have

λ_1(A + B) = min_{(x,x)=1} (x, (A + B)x) ≥ λ_1(A) + λ_1(B) > λ_1(A).   (2.25.1)
With the aid of the foregoing min-max characterization, we can assert that all of the characteristic roots are increased. Furthermore, we can even provide a quantitative result, namely,

λ_k(A + B) ≥ λ_k(A) + λ_1(B),   k = 1, 2, ..., N.   (2.25.2)
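A quick numerical check of (2.25.2); the random matrices below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                    # symmetric
P = rng.standard_normal((5, 5))
B = P @ P.T + 0.1 * np.eye(5)        # positive definite

la = np.linalg.eigvalsh(A)
lb = np.linalg.eigvalsh(B)
lab = np.linalg.eigvalsh(A + B)

# lambda_k(A + B) >= lambda_k(A) + lambda_1(B) > lambda_k(A)
assert np.all(lab >= la + lb[0] - 1e-9)
assert np.all(lab > la)
```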
Exercise
1. Is λ_k(A) a convex or concave function of A?
2.26. Separation and Reduction of Dimensionality
In many investigations only the first few characteristic values are desired. Often only the smallest value is important; in other cases, the distance between the first two is significant. If A is high-dimensional, we may wish to try an approximate procedure of the following type. Set

x = Σ_{k=1}^M u_k y^(k),   (2.26.1)

where the y^(k) are a set of orthonormal vectors, with M < N, and the u_k scalars. Then

(x, Ax) = Σ_{k,l=1}^M u_k u_l (y^(k), Ay^(l)) = (u, B_M u),   (2.26.2)

where B_M = ((y^(k), Ay^(l))). Furthermore,

(x, x) = (u, u).   (2.26.3)

What can we say about the characteristic roots of B_M? The min-max representation given above allows us to state that

λ_k(A) ≤ λ_k(B_M) ≤ λ_{k+N−M}(A),   k = 1, 2, ..., M.   (2.26.4)

This is the Poincaré separation theorem.
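The separation theorem is easily tested numerically. The sketch below (NumPy; dimensions and matrices are illustrative) compresses a random symmetric A onto M orthonormal trial vectors and checks the asserted inequalities:

```python
import numpy as np

rng = np.random.default_rng(4)
S = rng.standard_normal((8, 8))
A = (S + S.T) / 2
N, M = 8, 3

# Orthonormal trial vectors y^(k): the columns of Y
Y, _ = np.linalg.qr(rng.standard_normal((N, M)))
BM = Y.T @ A @ Y                     # the M x M compressed matrix

la = np.linalg.eigvalsh(A)
lb = np.linalg.eigvalsh(BM)

# Poincare separation: lambda_k(A) <= lambda_k(B_M) <= lambda_{k+N-M}(A)
for k in range(M):
    assert la[k] - 1e-9 <= lb[k] <= la[k + N - M] + 1e-9
```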
Exercise
1. Consider the sequence of symmetric matrices defined by A_r = (a_ij), i, j = 1, 2, ..., r, for r = 1, 2, ..., N. Then

λ_1(A_{r+1}) ≤ λ_1(A_r) ≤ λ_2(A_{r+1}).

Obtain corresponding inequalities for the other characteristic values.

2.27. Ill-conditioned Matrices and Tychonov Regularization
Let us now return to the problem of obtaining a solution of an ill-conditioned system

Ax = b.   (2.27.1)

Generally, we regard as a solution of (2.27.1), to a prescribed degree of accuracy, a vector z satisfying the condition ||Az − b|| ≤ ε, with some convenient norm. This relation is equivalent to

Az − b = y,   ||y|| ≤ ε.   (2.27.2)

This yields

z = A^{-1}b + A^{-1}y = x + A^{-1}y.   (2.27.3)
If A is ill-conditioned, ||A^{-1}y|| may be excessively large, even if ε is small. The problem of paramount interest in numerous investigations is that of determining x, the solution of (2.27.1), with some desired accuracy. There is no general solution to this problem. If, however, we add some additional information concerning x, information available from analytical, computational, or scientific sources, then we can often achieve success.

One method, due to Tychonov, which has been applied with considerable success in recent years is that of regularization. In place of attempting to minimize, say,

||Ax − b||² = (Ax − b, Ax − b),   (2.27.4)

we use the modified norm (actually, the square of a norm),

N(x) = (Ax − b, Ax − b) + φ(x),   (2.27.5)

where φ(x) is carefully chosen to take advantage of all that we know concerning x. Thus, for example, we may possess an approximate value
for the solution of (2.27.1) based on an entirely different approach to the original problem. Call this approximate value c. Consider then the problem of minimizing

N(x, λ) = (Ax − b, Ax − b) + λ(x − c, x − c).   (2.27.6)

An easy calculation shows that the vector x which minimizes is given by

(A^T A + λI)x = A^T b + λc.   (2.27.7)

Observe that A^T A + λI is positive definite for λ ≥ 0. If λ is chosen adroitly (and this is where experience and experimentation count), we will have A^T A + λI far better conditioned than A, so that x can be calculated from (2.27.7), and yet also have

(Ax − b, Ax − b) ≤ ε_1,   (x − c, x − c) ≤ ε_2.
The foregoing procedure gives rise to a method of successive approximations. Let c = x_0, and let x_{n+1} be determined by the condition that it minimize the expression

(Ax − b, Ax − b) + λ(x − x_n, x − x_n).   (2.27.8)

We leave it to the reader to show that the sequence {x_n} determined in this fashion converges to the solution of Ax = b for any λ ≥ 0, provided that A is nonsingular. As will be seen, the rate of convergence of x_n to the solution of (2.27.1) is dependent on the size of λ. If λ is small, it is rapid; if λ is large, it is slow. However, if λ is large, A^T A + λI is well-conditioned; if λ is small, A^T A + λI is ill-conditioned. A fine balance must be struck.
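The regularized equation (2.27.7) and the successive-approximation scheme (2.27.8) can be sketched as follows. The test matrix, its spectrum, and the choice λ = 10^{-4} below are illustrative assumptions, chosen so that the per-mode contraction factor λ/(σ² + λ) of the iteration is at most 1/2:

```python
import numpy as np

def tychonov_step(A, b, c, lam):
    """Minimizer of (Ax-b, Ax-b) + lam (x-c, x-c): solve (A^T A + lam I) x = A^T b + lam c."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b + lam * c)

# A mildly ill-conditioned matrix with prescribed singular values (cond = 100)
rng = np.random.default_rng(5)
U, _ = np.linalg.qr(rng.standard_normal((6, 6)))
V, _ = np.linalg.qr(rng.standard_normal((6, 6)))
A = U @ np.diag([1.0, 0.8, 0.5, 0.2, 0.05, 0.01]) @ V.T
x_true = rng.standard_normal(6)
b = A @ x_true

# Successive approximations: c = x_n at each stage
x = np.zeros(6)
for _ in range(100):
    x = tychonov_step(A, b, x, lam=1e-4)

assert np.allclose(x, x_true, atol=1e-8)
```

Note that each step solves the well-conditioned system A^T A + λI rather than A itself; the price is the iteration, whose speed degrades as λ grows, exactly the balance described above.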
Exercises
1. Let x(λ) denote the vector x which minimizes N(x, λ). What is the analytic structure of x(λ) as a function of λ? Can we use extrapolation techniques to determine x(0) from the values of x(λ) for sufficiently large λ?
2. Consider the minimization of

∫_0^1 (u_1 + u_2 t + ⋯ + u_N t^{N−1} − f(t))² dt

with respect to the u_i. Show that the matrix of coefficients (the Hilbert matrix) is ill-conditioned. How do we avoid this problem in the determination of the minimum of the integral?
2.28. Self-consistent Approach
In other cases we do not possess any useful approximate solution to use for c. We do, however, usually know something about the components of x, x_1, x_2, ..., x_N, as functions of the index. For example, we may know that x_k is monotone increasing or decreasing in k, or convex. Or, we may merely know that x_i and x_{i+1} must be reasonably close together in value. In a situation of this type, we can take φ(x) in (2.27.5) to have the form

φ(x) = λ[(x_2 − x_1)² + (x_3 − x_2)² + ⋯ + (x_N − x_{N−1})²],   (2.28.1)

or

φ(x) = λ[(x_3 − 2x_2 + x_1)² + (x_4 − 2x_3 + x_2)² + ⋯].   (2.28.2)

Again it is necessary to choose λ carefully. In Volume II we will indicate how dynamic programming may be applied to take advantage of the special structure of φ(x).

2.29. Positive Matrices
A combination of research in stochastic processes and mathematical economics has produced a considerable interest in matrices with nonnegative elements. We shall restrict ourselves here to the proof of the fundamental result for matrices all of whose elements are positive. These are called positive matrices. The more general case of nonnegativity, investigated first by Frobenius, requires delicate and detailed arguments.

A basic result, due to Perron, is the following:

Theorem. If A is a positive matrix, there is a unique characteristic root of A, λ(A), which has greatest absolute value. This root is positive and simple, and its associated characteristic vector may be taken to be positive.
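Before turning to the proof, the theorem can be observed numerically. The sketch below (NumPy; the random positive matrix and iteration count are illustrative) approximates the Perron root and vector by repeated application of A, anticipating Exercise 1 of Sec. 2.32:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.uniform(0.5, 2.0, size=(5, 5))   # a positive matrix

# Power iteration: A^n c, renormalized, approaches the Perron vector
x = np.ones(5)
for _ in range(500):
    x = A @ x
    x /= x.sum()
lam = (A @ x)[0] / x[0]                  # componentwise quotient at the limit

w = np.linalg.eigvals(A)
perron = w[np.argmax(np.abs(w))]

assert abs(perron.imag) < 1e-10          # the dominant root is real
assert np.isclose(perron.real, lam)      # and equals the power-iteration limit
assert np.all(x > 0)                     # positive characteristic vector

# It strictly dominates every other root in absolute value
others = np.abs(np.delete(w, np.argmax(np.abs(w))))
assert np.all(others < perron.real)
```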
There are many proofs of algebraic, analytic, and topological nature of this elegant result. The proof we present below is important for our purposes since it provides a variational formulation for λ(A) which furnishes a means of obtaining upper and lower bounds. In addition, the characterization can be extended to the more general operator case. We will give an application to integral equations in Chapter 8. The root λ(A) is often called the Perron root.

We will wish to consider nonnegative vectors, which is to say vectors all of whose components are nonnegative. We will write

x ≥ 0   (2.29.1)
to signify this. We say x is positive, x > 0, if all of its components are positive. The notation x ≥ y will mean that x − y ≥ 0. The point of this is that a nonnegative matrix preserves the region x ≥ 0, i.e., x ≥ 0 and A nonnegative imply Ax ≥ 0. This invariance plays a fundamental role in what follows.

Exercise
1. Let A be of dimension N. If A^{-1} exists and more than N of the elements of A are positive, then A^{-1} has at least one negative element. Thus A = PD, with P a permutation matrix and D a diagonal matrix with positive elements, if and only if A and A^{-1} have nonnegative entries. (R. Spira.)

2.30. Variational Characterization of λ(A)
The proof of the foregoing theorem will follow as a consequence of:

Theorem. Let A be a positive matrix. Let S(λ) be the set of nonnegative λ for which there exist nonnegative vectors x such that Ax ≥ λx. Let T(λ) be the set of positive λ for which there exist positive vectors x such that Ax ≤ λx. Then

λ(A) = max λ,  λ ∈ S(λ),   (2.30.1)
     = min λ,  λ ∈ T(λ).
We begin by normalizing all nonnegative vectors we consider. We take

||x|| = Σ_{i=1}^N x_i = 1.   (2.30.2)

This excludes the null vector automatically. Let us employ the norm

||A|| = Σ_{i,j=1}^N a_ij   (2.30.3)

for a nonnegative matrix. If λx ≤ Ax, we have

λ = λ Σ_i x_i ≤ Σ_{i,j} a_ij x_j ≤ ||A||,   (2.30.4)

whence

λ ≤ ||A||.   (2.30.5)
Hence S(λ) is a bounded set, and clearly not empty if A is positive. Let

λ_0 = sup λ,  λ ∈ S(λ),   (2.30.6)

and let {λ_i} be a sequence of λ's in S(λ) converging to λ_0. Let {x^(i)} be an associated set of vectors, by which we mean

λ_i x^(i) ≤ Ax^(i),   i = 1, 2, ....   (2.30.7)

Since ||x^(i)|| = 1, which implies that all of the x^(i) lie in a bounded region of N-dimensional space, we can choose a subsequence, say {x^(j)}, which converges to x^(0), necessarily nontrivial since ||x^(0)|| = 1. Since, by virtue of (2.30.7),

λ_0 x^(0) ≤ Ax^(0),   (2.30.8)

it follows that λ_0 ∈ S(λ), whence the supremum is actually an attained maximum.

It remains to show that the inequality is actually an equality. The proof is by contradiction. Suppose that strict inequality holds in the first component. Then, letting x^(0) have the components x_1, x_2, ..., x_N, we have

d_1 + λ_0 x_1 ≤ Σ_{j=1}^N a_1j x_j,   (2.30.9)

where d_1 > 0. Let us show that (2.30.9) permits us to modify x^(0) in such a way that λ_0 may be increased. Choose

y = x^(0) + ε e^(1),   ε > 0,   (2.30.10)

where e^(1) is the first coordinate vector. It follows from (2.30.9) that, for ε sufficiently small,

λ_0 y < Ay,   (2.30.11)

with strict inequality in every component. Since A is positive, this strict inequality permits a slightly larger value of λ, which contradicts the maximum property of λ_0. Hence, (2.30.9) cannot hold and (2.30.8) is an equality,

λ_0 x^(0) = Ax^(0).   (2.30.12)
Thus, λ_0 is a characteristic root of A, and x^(0) is an associated characteristic vector. We must now show that λ_0 = λ(A). We proceed again by contradiction. Assume that there exists a characteristic root λ with |λ| ≥ λ_0, and let z be its associated characteristic vector. From Az = λz, we have

|λ| |z| ≤ A|z|,   (2.30.13)

where |z| denotes the vector whose components are the absolute values of the components of z. It follows from the definition of λ_0 that |λ| ≤ λ_0. Hence, we must have |λ| = λ_0. But then the same argument as before shows that (2.30.13) must be an equality. Hence

A|z| = λ_0 |z|,   (2.30.14)

which means that z = c_1 w, with w > 0 and c_1 a complex number. Thus, Az = λz is equivalent to Aw = λw, whence λ is real and positive and thus equal to λ_0.

Let us finally show that w is equivalent to x^(0), which means that apart from scalar multiples there is only one characteristic vector associated with λ_0. Let z be another characteristic vector associated with λ(A). As above, we see that it may be taken to be positive. Then x^(0) + εz for all scalar ε is a characteristic vector associated with λ_0. Decreasing ε to some negative value, we meet a first value of ε for which one or several components of x^(0) + εz are zero, with the remaining components positive, provided x^(0) and z are really linearly independent. This contradicts the relation

A(x^(0) + εz) = λ(A)(x^(0) + εz),   (2.30.15)

since A is positive.

Exercises
1. Use the Jordan canonical form to show that λ(A) must be a simple root.

2. Let A_{N−1} be obtained from A by striking out an ith row and a jth column. Show that λ(A) > λ(A_{N−1}).

3. Show that λ(A^T) = λ(A).

2.31. Proof of Minimum Property
The minimum property may be derived by repeating the foregoing steps. However, it is worth pointing out that it follows readily by using
the adjoint operator, the transpose matrix. Suppose that Ay ≤ λy for some y > 0. Then for any z > 0,

λ(z, y) ≥ (z, Ay) = (A^T z, y).   (2.31.1)

Let z be a characteristic vector of A^T associated with λ(A). Then

λ(z, y) ≥ (A^T z, y) = λ(A)(z, y).   (2.31.2)

Since (z, y) > 0, we have λ ≥ λ(A).
The definition of λ(A) given in the theorem of Sec. 2.30 is equivalent to the representation

λ(A) = min_{x>0} max_i ( Σ_{j=1}^N a_ij x_j / x_i ).   (2.32.1)
Judicious choices of the x_i yield upper and lower bounds for λ(A). We return to this point in Chapter 8, where we give an example of the use of this method in connection with integral equations.

Exercises
1. Let c be any nonnegative vector with A a positive matrix. Then

v = lim_{n→∞} [A^n c / λ(A)^n]

exists and is a characteristic vector of A associated with λ(A).
2. If A is such that a_ij > 0, i ≠ j, then the root of A with largest real part, p(A), is real and simple. There is an associated characteristic vector which is positive and unique up to a scalar multiple. Furthermore, p(A) admits a min-max characterization analogous to (2.32.1).
3. If a_ij ≥ 0, Σ_{i=1}^N a_ij < 1, j = 1, 2, ..., N, then the system

x_i = y_i + Σ_{j=1}^N a_ij x_j,   i = 1, 2, ..., N,

has a unique solution, which is positive if the y_i are positive. If a_ij > 0, then (I − A)^{-1} is positive.
4. Under the same conditions, show that |I − A| > 0. Hint: Consider |I − λA| for 0 ≤ λ ≤ 1.

5. Let A be a positive matrix. Then

λ(A) = max_{R'} min_i ( Σ_{j=1}^N a_ij x_j / x_i ),

where R' is defined by x_i ≥ d > 0, Σ_i x_i = 1, and d is a parameter depending on A. Specifically, we may take d sufficiently small, depending on the elements of A.

6. λ(A) is the unique solution of

λ = min_{R'} max_i [ Σ_{j=1}^N a_ij x_j + λ(1 − x_i) ].

7. λ(A) may be obtained as the limit of the sequence {u_n} determined by

u_{n+1} = min_{R'} max_i { Σ_{j=1}^N a_ij x_j + u_n(1 − x_i) },

with u_0 = min_{R'} max_i (Σ_{j=1}^N a_ij x_j). Show that the convergence is monotone. The foregoing is closely related to the theory of games, particularly to multistage games. See
R. Bellman and J. P. LaSalle, Non-zero Sum Games and Stochastic Processes, The RAND Corporation, RM-271, 1949.
L. Shapley, “Stochastic Games,” Proc. Natl. Acad. Sci. U.S., Vol. 39, 1953, pp. 1095-1100.
R. Bellman, “On an Iterative Procedure for Obtaining the Perron Root of a Positive Matrix,” Proc. Amer. Math. Soc., Vol. 6, 1955, pp. 719-725.
8. Let M be a Markoff matrix with positive elements and consider the recurrence relation x_{n+1} = Mx_n, x_0 = c, with c a probability vector, that is, c_i ≥ 0, Σ_{i=1}^N c_i = 1. Show that x_n converges as n → ∞ and that it converges to the unique probability vector satisfying x = Mx.

9. If we wish to calculate x_n, is it better to proceed via (M^n)c or via M(M^{n−1}c), if we are thinking in terms of time and accuracy?

10. Let A be a positive symmetric matrix. Use the Rayleigh quotient and the representation of the Perron root given in Sec. 2.32 to obtain upper and lower bounds for λ(A).

Miscellaneous Exercises
1. If B = lim_{N→∞} A^N, what can be said about the characteristic roots of B? See

O. Taussky, “Matrices C with C^n → 0,” J. Algebra, Vol. 1, 1964, pp. 5-10.
2. Let Y be a positive definite 2 × 2 matrix. Consider the integral

J(Y) = ∫_{X>0} e^{−tr(XY)} |X|^{s−3/2} dx_11 dx_12 dx_22,

where the notation indicates that the region of integration is that determined by x_11 > 0, x_11x_22 − x_12² > 0. Then

J(Y) = π^{1/2} Γ(s) Γ(s − 1/2) |Y|^{−s}

for Re(s) > 3/2. (Ingham-Siegel.) Integrals of this type arise in analytic number theory and mathematical statistics. For further results, see

R. Bellman, “A Generalization of Some Integral Identities due to Ingham and Siegel,” Duke Math. J., Vol. 24, 1956, pp. 571-578.
I. Olkin, “A Class of Integral Identities with Matrix Argument,” Duke Math. J., Vol. 26, 1959, pp. 207-213.
3. Show that

… ,

where the contour of integration in each integral is |x_i| = 1, under suitable hypotheses concerning A.
4. Evaluate

(1/2πi) ∮_C f(z)(zI − A)^{-1} dz

under suitable assumptions concerning f(z), A, and the contour C. This contour integral can be used as a basis for the construction of a theory of functions of matrices. See
R. F. Rinehart, “The Equivalence of Definitions of a Matric Function,” Amer. Math. Monthly, Vol. 62, 1955, pp. 395-414.
R. F. Rinehart, “The Derivative of a Matric Function,” Proc. Amer. Math. Soc., Vol. 7, 1956, pp. 2-5.
R. F. Rinehart, “Elements of a Theory of Intrinsic Functions on Algebras,” Duke Math. J., Vol. 27, 1960, pp. 1-20.

5. If A = (a_ij) is a Jacobi matrix, that is, a_ij = 0 for |i − j| ≥ 2, under what conditions are the elements a_ij determined by the characteristic roots?
6. If A is positive definite, then
where the integration is over a k-dimensional linear subspace R of N-dimensional space whose volume element is dVk and the maximum is taken over all R.
7. If A is positive definite, then |A| ≤ a_{11} a_{22} ⋯ a_{NN}.
8. Establish Hadamard's inequality: If B is a real matrix, then

|B|^2 ≤ ∏_{i=1}^{N} ( ∑_{j=1}^{N} b_{ij}^2 ).

Hint: Apply the previous result to A = BB'. For the connection with integral equations and Hadamard's thoughts concerning this, see
MATRIX THEORY
J. Hadamard, The Psychology of Invention in the Mathematical Field, Princeton University Press, Princeton, New Jersey, 1949.
9. If A is a positive definite matrix with the property that a_ij ≤ 0 for i ≠ j, then A^{−1} has all positive elements. (Stieltjes.)
10. Show that

| 1          1          ⋯  1          |
| λ_1        λ_2        ⋯  λ_N        |
| ⋮                         ⋮         |
| λ_1^{N−1}  λ_2^{N−1}  ⋯  λ_N^{N−1} |  =  ∏_{i>j} (λ_i − λ_j),

the Vandermonde determinant.
11. Evaluate the Cauchy determinant
12. If A and B are positive definite, then for all x and y,

(A^{−1}x, x) + (B^{−1}y, y) ≥ ((A + B)^{−1}(x + y), x + y).
13. Let A be a positive definite matrix and let λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_N be the characteristic values. Then, for (x, x) = 1,

(x, Ax)(x, A^{−1}x) ≤ (1/4) [ (λ_1/λ_N)^{1/2} + (λ_N/λ_1)^{1/2} ]^2.

Show that the upper bound can be achieved. (Kantorovich.)
14. Let x be an N-dimensional vector such that (x, x) = 1. Determine for these x the minimum of (x − b, A(x − b)), where A is a Hermitian matrix. See
G. E. Forsythe and G. H. Golub, "On the Stationary Values of a Second-degree Polynomial on the Unit Sphere," J. Soc. Indust. Appl. Math., Vol. 13, 1965, pp. 1050-1068.
15. Let A, B be positive definite. Then tr[(AB)^{2^{n+1}}] ≤ tr[(A^2 B^2)^{2^n}] for n = 0, 1, .... (Golden.)
16. Let F denote the set of nonnegative functions f(t) defined over [0, 1]. Let T be any nonnegative linear one-to-one mapping of F into itself. Then Tf(t) = g(t) f(h(t)), where g is a particular function and h(t) is a permutation of the domain of F. (T. A. Brown, M. L. Juncosa.)
17. Consider the matrix A = (a_ij). Let λ_1, λ_2, ..., λ_N be the characteristic roots. Then

∑_{i,j=1}^{N} |a_ij|^2 ≥ ∑_{i=1}^{N} |λ_i|^2.  (Schur.)
18. Using the foregoing result and the matrix

| 0    a_1  0    ⋯  0        |
| 0    0    a_2  ⋯  0        |
| ⋮                  ⋮        |
| 0    0    0    ⋯  a_{N−1}  |
| a_N  0    0    ⋯  0        |

where the a_i ≥ 0, establish the arithmetic mean-geometric mean inequality

(a_1 a_2 ⋯ a_N)^{1/N} ≤ (a_1 + a_2 + ⋯ + a_N)/N.
19. Let A and B be real symmetric matrices. A necessary and sufficient condition that the pencil of matrices λA + μB, λ, μ real scalars, contain a positive definite matrix is that (x, Ax) = 0, (x, Bx) = 0 imply that x = 0, provided that the dimension, N, is greater than 2. (Calabi.)
20. Any pencil λA + μB for which the foregoing holds can be transformed into diagonal form. See
O. Taussky, "Positive Definite Matrices," Inequalities, Academic Press, New York, 1967.
21. Let A, B be real symmetric matrices, x real. Consider the values y_1 = (x, Ax), y_2 = (x, Bx) for (x, x) = 1. Show that the set of points in the (y_1, y_2)-plane is convex. (Brickman.)
22. Show that

∫ ⋯ ∫_{x_1 ≤ x_2 ≤ ⋯ ≤ x_N} |φ_i(x_j)| |ψ_i(x_j)| dx_1 dx_2 ⋯ dx_N = |∫ φ_i(x) ψ_j(x) dx|,

where |φ_i(x_j)| denotes the determinant of the N × N matrix (φ_i(x_j)). This useful result is due to C. Andreief. See
G. Polya and G. Szego, Aufgaben und Lehrsätze aus der Analysis, Dover Publications, New York, 1947.
For some analogous results, see
N. G. de Bruijn, "On Some Multiple Integrals Involving Determinants," J. Indian Math. Soc., Vol. 19, 1955, pp. 133-151.
Here integrals of the form
are treated.
23. Using the previous exercise, obtain some necessary and sufficient conditions for the existence and uniqueness of the solution of

See
R. Bellman, "A Note on the Identification of Linear Systems," Proc. Amer. Math. Soc., Vol. 17, 1966, pp. 68-71.
24. Let {X} be a linear space of N × N real matrices with the property that every linear combination has only real characteristic values. Then λ_max(X) is a convex matrix function of X and λ_min(X) is a concave matrix function.
25. Let X and Y be two real matrices all of whose linear combinations have real characteristic values and suppose that X < 0. Then the characteristic values of X + iY have negative real parts.
26. Let {X} be as above. Then λ_1(X), λ_2(X), ..., λ_N(X), the N characteristic roots of X, are monotone functions of X. For the foregoing with applications to hyperbolic partial differential equations, see
P. D. Lax, "Differential Equations, Difference Equations and Matrix Theory," Comm. Pure Appl. Math., Vol. 11, 1958, pp. 175-194.
27. Consider the linear equation x + λAx = c. Show that the Liouville-Neumann solution x = c − λAc + λ^2 A^2 c − ⋯ obtained by iteration converges within the circle |λ| < 1/|λ_1|, where λ_1 is the, or a, characteristic root of A of largest absolute value.
28. Show that if A is positive definite and λ_1 ≤ 1, we can make the change of variable μ = λ/(1 + λ), obtaining

x(1 − μ) + μAx = c − cμ,

and, by iteration,

x = c − μAc + ∑_{n=2}^∞ (−μ)^n (A − I)^{n−1} Ac,

and assert that the new series converges for all λ ≥ 0. See
R. Bellman, "On Perturbation Methods Involving Expansions in Terms of a Parameter," Quart. Appl. Math., Vol. 13, 1955, pp. 195-200.
29. Let f(θ) be a continuous function in (−π, π) possessing the Fourier series f(θ) ~ ∑_{n=−∞}^∞ a_n e^{inθ}. Then f(θ) e^{−ikθ} ~ ∑_{n=−∞}^∞ a_{n+k} e^{inθ}. Thus f(θ) is in some sense represented by the infinite matrix (a_{i−j}). What connection, if any, exists between values of f(θ) and characteristic roots of the finite matrix (a_{i−j}), i, j = 1, 2, ..., N? These matrices are called Toeplitz matrices. See
M. Kac, W. L. Murdock, and G. Szego, "On the Eigen-values of Certain Hermitian Forms," J. Ratl. Mech. Anal., Vol. 2, 1953, pp. 767-800.
Matrices of this type are of interest in connection with "nearest neighbor" problems of the kind that arise in statistical mechanics, such as the Ising model.
30. Let H be a positive definite Hermitian matrix. Then

where z = x + iy. See
R. Bellman, "Representation Theorems and Inequalities for Hermitian Matrices," Duke Math. J., Vol. 26, 1959, pp. 485-490.
31. A skew matrix is one for which A^T = −A. Show that every real skew matrix is a square root of a negative definite real symmetric matrix and that every skew Hermitian matrix is a square root of a negative definite, or semidefinite, Hermitian matrix. See
R. F. Rinehart, "Skew Matrices as Square Roots," Amer. Math. Monthly, Vol. 67, 1960, pp. 157-161.
For some geometric aspects of skew-symmetric matrices, see
H. Schwerdtfeger, "Skew-symmetric Matrices and Projective Geometry," Amer. Math. Monthly, Vol. 51, 1944, pp. 137-148.
32. Hence show that |H + iH_1| ≥ |H| whenever H_1 is a positive definite Hermitian matrix, and thus that |A + B| ≥ |A| when A is a real nonnegative definite matrix and B is skew-symmetric. (A. Ostrowski and O. Taussky.)
33. Let f(T) be a complex-valued function of the variable T ranging over symmetric matrices. Set

(d/dT) f(T) = (c_ij ∂f/∂t_ij),

where c_ij = 1 if i = j and 1/2 if i ≠ j. Show that

(d/dT) |T| = |T| T^{−1} if T^{−1} exists.

R. Bellman and S. Lehman, "The Reciprocity Formula for Multidimensional Theta Functions," Proc. Amer. Math. Soc., Vol. 12, 1961, pp. 954-961.
34. Let A and B be positive definite matrices. Show that (A^{2N} + B^{2N})^{1/2N} converges as N → ∞ to a matrix C = C(A, B) such that C ≥ A, C ≥ B. Show that
35. Let A and B be positive definite. Determine the minimum over X of tr X where X ≥ A, X ≥ B. Write M(A, B) for the matrix
which minimizes. Similarly write M(A_1, A_2, ..., A_N) for the matrix which minimizes tr X where X ≥ A_i, i = 1, 2, ..., N. Does M(A_1, A_2, A_3) = M(A_1, M(A_2, A_3)) hold?
36. Consider the equation x = c + Ax and the method of successive approximations x_0 = c, x_{n+1} = c + Ax_n, n ≥ 0. Under what conditions on A does the sequence {x_n} converge to x?
37. Let us write the equations of approximation in component form

x_i^{(n+1)} = c_i + ∑_{j=1}^{N} a_ij x_j^{(n)},  i = 1, 2, ..., N,  x_i^{(0)} = c_i,

and then a modified version in which we "update" the values of the x_i^{(n)}, namely,

y_i^{(n+1)} = c_i + ∑_{j=1}^{i−1} a_ij y_j^{(n+1)} + ∑_{j=i}^{N} a_ij y_j^{(n)},  y_i^{(0)} = c_i,

i = 1, 2, ..., N. Does the new system yield a convergent sequence, and if so, does it converge more rapidly or more slowly than the original?
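Exercise 37's comparison can be explored numerically. The sketch below is an illustrative instance (the particular matrix and vector are hypothetical, chosen so that both iterations converge); the "updated" scheme is the Gauss-Seidel-style iteration of the exercise.

```python
import numpy as np

# Iterations for x = c + Ax with a small-norm nonnegative A, so both converge.
A = np.array([[0.0, 0.2, 0.1],
              [0.1, 0.0, 0.2],
              [0.2, 0.1, 0.0]])
c = np.array([1.0, 2.0, 3.0])
x_exact = np.linalg.solve(np.eye(3) - A, c)

def jacobi_step(x):
    # x_i^(n+1) = c_i + sum_j a_ij x_j^(n): every component uses old values
    return c + A @ x

def updated_step(x):
    # each component is refreshed as soon as a new value is available
    y = x.copy()
    for i in range(len(y)):
        y[i] = c[i] + A[i] @ y
    return y

xj = c.copy()
xg = c.copy()
for _ in range(12):
    xj = jacobi_step(xj)
    xg = updated_step(xg)

print(np.abs(xj - x_exact).max(), np.abs(xg - x_exact).max())
```

For a nonnegative iteration matrix such as this one, the updated scheme typically converges in fewer steps than the original.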
38. What conditions must b and c satisfy so that Xb = c, with X positive definite? (Wimmer.)
Bibliography and Comments

§2.1. For a general introduction to matrix analysis, see
R. Bellman, Introduction to Matrix Analysis, McGraw-Hill, New York, 1960. (Second Edition, in preparation.)
L. Mirsky, Introduction to Linear Algebra, Oxford University Press, New York, 1955.
We have avoided any detailed discussion of the art of solving the linear system Ax = b. See
J. Todd, "Motivation for Working in Numerical Analysis," Comm. Pure Appl. Math., Vol. 8, 1955, pp. 97-116.
G. E. Forsythe, "Solving Linear Algebraic Equations Can Be Interesting," Bull. Amer. Math. Soc., Vol. 59, 1953, pp. 299-329.
R. Bellman, R. Kalaba, and J. Lockett, Numerical Inversion of the Laplace Transform, American Elsevier, New York, 1966.
J. R. Westlake, A Handbook of Numerical Matrix Inversion and Solution Techniques, Wiley & Sons, 1968.
One aspect of closure, one of the themes of this volume, is the production of an algorithm which calculates the desired values with as few additional values as possible.
Thus, for example, given the linear system Ax = b, we may want to calculate x_1, the first component of x, without calculating the other values. Cramer's rule accomplishes this, but not in a practical fashion, as we have noted. In some cases, Monte Carlo techniques can be used. In other types of problems, stochastic approximation can be employed. In general, the task of constructing an algorithm leading to the desired information and only this information is an intriguing one, and little has been done in this area.
§2.2. For Exercises 4, 5, 6, see
R. Bellman, "Invariant Imbedding and Multipoint Boundary-value Problems," J. Math. Anal. Appl., Vol. 24, 1968, pp. 461-466.
§2.3. We meet ill-conditioned systems in numerical inversion of the Laplace transform and in the solution of Fredholm integral equations in general. See the Bellman, Kalaba, Lockett book referred to in §2.1 above. We also encounter them in the study of improperly posed problems of mathematical physics, and of inverse problems. See
M. M. Lavrentiev, Some Improperly Posed Problems of Mathematical Physics, Springer, New York, 1967.
R. Lattès and J. L. Lions, Méthode de Quasi-réversibilité et Applications, Dunod, Paris, 1967. English translation, American Elsevier, 1969. Extensive references to the work of Tychonov will be found there. See also,
R. Bellman, R. Kalaba, and J. Lockett, "Dynamic Programming and Ill-conditioned Systems," J. Math. Anal. Appl., Vol. 10, 1965, pp. 206-215.
R. Bellman, R. Kalaba, and J. Lockett, "Dynamic Programming and Ill-conditioned Systems," J. Math. Anal. Appl., Vol. 12, 1965, pp. 393-400.
A discussion of the method of “pivoting” may be found in
G. B. Dantzig, Linear Programming and Extensions, Princeton University Press, Princeton, New Jersey, 1963.
What we are desirous of emphasizing is the fact that numerical computation is a control process in which the objective is to minimize the overall error. This means that it should be a "feedback control process," an adaptive process in which all of the information available at each stage of the computation is used to guide the subsequent calculations. This in turn implies that the computational process may be viewed as a stochastic control process. For further details, see
R. Bellman, Adaptive Control Processes: A Guided Tour, Princeton University Press, Princeton, New Jersey, 1961.
§2.4. See
G. Temple, "Linearization and Delinearization," Proc. International Congress of Mathematicians, 1958, pp. 14-21.
§2.13. See
A. Ostrowski, "Über Normen von Matrizen," Math. Zeit., Vol. 63, 1955, pp. 2-18.
§2.14. For an extensive survey of far-reaching generalizations, see
F. V. Atkinson, "Multiparametric Spectral Theory," Bull. Amer. Math. Soc., Vol. 74, 1968, pp. 1-27.
§2.15. We are interested in the minimization and maximization of quadratic forms because of their applicability to the study of more general variational problems. See Chapter 8. See also
R. J. Duffin, "The Rayleigh-Ritz Method for Dissipative or Gyroscopic Systems," Quart. Appl. Math., Vol. 18, 1960, pp. 215-222.
R. J. Duffin, "A Minimax Theory for Overdamped Networks," J. Ratl. Mech. Anal., Vol. 4, 1955, pp. 221-233.
R. J. Duffin and A. Schild, "On the Change of Natural Frequencies Induced by Small Constraints," J. Ratl. Mech. Anal., Vol. 6, 1957, pp. 731-758.
§2.21. For some ramifications of this concept, see
O. Taussky, "Positive-definite Matrices," Inequalities, Academic Press, New York, 1967.
See the discussion on pp. 108-109 of the book by Bellman cited in §2.1 for references to the far-reaching work of Loewner on monotone matrix functions; i.e., analytic functions such that A ≥ B implies f(A) ≥ f(B).
§2.29. Perron's original result was obtained in connection with the study of the multidimensional continued fractions of Jacobi. See
O. Perron, "Zur Theorie der Matrizen," Math. Ann., Vol. 64, 1907, pp. 248-263.
A resurgence of interest in this area is due to the study of branching processes. See
T. E. Harris, The Mathematical Theory of Branching Processes, Springer-Verlag, Berlin, 1963.
G. Birkhoff and R. S. Varga, "Reactor Criticality and Nonnegative Matrices," J. Soc. Indust. Appl. Math., Vol. 6, 1958, pp. 354-377.
Problems of this nature can also be studied by means of differential equations. See
P. Hartman and A. Wintner, "Linear Differential and Difference Equations with Monotone Solutions," Amer. J. Math., Vol. 75, 1953, pp. 731-743.
A new powerful method for treating positive operators has been developed by G. Birkhoff using the Hilbert projective metric. See
G. Birkhoff, "Uniformly Semi-primitive Multiplicative Processes," Trans. Amer. Math. Soc., Vol. 104, 1962, pp. 37-51.
R. Bellman and T. A. Brown, "Projective Metrics in Dynamic Programming," Bull. Amer. Math. Soc., Vol. 71, 1965, pp. 773-775.
See the books cited in §2.1 for further references to nonnegative matrices.
Chapter 3
MATRICES AND LINEAR DIFFERENTIAL EQUATIONS
3.1. Introduction
In this chapter we wish to discuss the use of matrices in the solution of linear systems of differential equations of the form

dx_i/dt = ∑_{j=1}^N a_ij(t) x_j + f_i(t),  x_i(0) = c_i,  (3.1.1)

i = 1, 2, ..., N. Using matrix notation, this takes the compact form

dx/dt = Ax + f(t),  x(0) = c.  (3.1.2)
We will also consider some particular matrix equations of importance,

dX/dt = AX,  (3.1.3)

dX/dt = AX + XB,  (3.1.4)

and the Riccati matrix equation

dX/dt = A + BX + XC + XDX.  (3.1.5)

The first equation is classical; the second has risen in prominence over the last fifty years; the third has emerged as one of the fundamental equations of analysis and applied mathematics over the last ten years in connection with control theory, dynamic programming, and invariant imbedding.

3.2. Vector-Matrix Calculus
We wish to consider vectors and matrices which depend on the independent variable t. We write x(t) and A(t) to indicate this, when necessary, and introduce in the expected fashion the derivatives

dx/dt = (dx_i/dt),  dA/dt = (da_ij/dt),  (3.2.1)

and the integrals
Using the foregoing notation, we can write the system of linear differential equations in (3.1.1) more simply as

dx/dt = A(t)x + f(t),  x(0) = c.  (3.2.3)
It remains to show that there are advantages to this compression of data. We will begin with the homogeneous equation

dx/dt = Ax,  x(0) = c,  (3.2.4)

where A is a constant matrix.

Exercises
1. Show that ‖∫_a^b x dt_1‖ ≤ ∫_a^b ‖x‖ dt_1.
2. Show that d/dt(XY) = (dX/dt)Y + X(dY/dt), d/dt(Xy) = (dX/dt)y + X(dy/dt).
3. Show that d/dt(X^{−1}) = −X^{−1}(dX/dt)X^{−1}.
4. Show that d/dt(x, y) = (dx/dt, y) + (x, dy/dt).
3.3. Existence and Uniqueness of Solution
Let us begin by demonstrating the existence and uniqueness of the solution of (3.2.4), employing the method of successive approximations.
Set

x_0 = c,  x_{n+1} = c + ∫_0^t A x_n dt_1.  (3.3.1)

It is easy to see inductively that x_n is a polynomial of degree n in t. We have

x_0 = c,  x_1 = c + Act,  ...,  x_n = c + Act + ⋯ + A^n c t^n/n!.  (3.3.2)

We have

x_{n+1} − x_n = A^{n+1} c t^{n+1}/(n + 1)!  (3.3.3)

and thus

‖x_{n+1} − x_n‖ ≤ ‖A‖^{n+1} ‖c‖ |t|^{n+1}/(n + 1)!.  (3.3.4)
Hence the series ∑_{n=0}^∞ ‖x_{n+1} − x_n‖, and thus also the series ∑_{n=0}^∞ (x_{n+1} − x_n), converges uniformly in any finite t-interval to a function x(t) which satisfies the integral equation

x = c + ∫_0^t Ax dt_1  (3.3.5)

and thus the original differential equation. The right to differentiate (3.3.5) is not immediate since all we know about x from its construction is that it is continuous. But then ∫_0^t Ax dt_1, as an integral of a continuous function, is differentiable, which implies that x as given by (3.3.5) is actually differentiable. We shall employ this circuitous route again in Chapter 4. From (3.3.2), we see that we may write

x = (∑_{n=0}^∞ A^n t^n/n!) c.  (3.3.6)

It remains to establish uniqueness. Let y be another solution of
(3.2.4). Then we have successively

d/dt (x − y) = A(x − y),  x − y = ∫_0^t A(x − y) dt_1,

‖x − y‖ ≤ ∫_0^t ‖A‖ ‖x − y‖ dt_1.  (3.3.7)
From the last relation we have

‖x − y‖ ≤ ε + ∫_0^t ‖A‖ ‖x − y‖ dt_1  (3.3.8)

for any ε > 0. Using the fundamental lemma in Sec. 1.3, we have

‖x − y‖ ≤ ε e^{t‖A‖}.  (3.3.9)

Since this holds for any ε > 0, we must have ‖x − y‖ = 0. This means that x = y, the desired uniqueness.
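The successive approximations of (3.3.1) can be carried out numerically; the sketch below (an illustrative 2 × 2 example) builds the partial sums of (3.3.2) at a fixed t and compares them with a known closed form.

```python
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, 0.0]])  # generator of a rotation
c = np.array([1.0, 0.0])
t = 1.0

# partial sums x_n = sum_{k=0}^n A^k c t^k / k!, as in (3.3.2)
x = c.copy()
term = c.copy()
for k in range(1, 30):
    term = A @ term * t / k
    x = x + term

# for this particular A, the limit e^{At}c is (cos t, -sin t)
exact = np.array([np.cos(t), -np.sin(t)])
print(np.abs(x - exact).max())
```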
3.4. The Matrix Exponential
The formal analogy of the series in (3.3.6) to what is obtained in the scalar case, where the solution is e^{at}c, prompts us to introduce the matrix exponential defined by the series

e^{At} = ∑_{n=0}^∞ A^n t^n/n!.  (3.4.1)

The estimate used in (3.3.4) shows that the matrix series converges uniformly in any finite t-interval. Hence, we see that e^{At} satisfies the matrix equation

dX/dt = AX,  X(0) = I.  (3.4.2)
The uniqueness of solution of (3.4.2) follows from the corresponding result for the vector equation. The functional equation

e^{A(t+s)} = e^{At} e^{As}  (3.4.3)

can be demonstrated as in the scalar case by series expansion in powers of the exponent and appropriate grouping of terms using the absolute convergence of the series involved. From (3.4.3) we derive the basic result

I = e^{A(t−t)} = e^{At} e^{−At}.  (3.4.4)

Hence, e^{At} is never singular and its reciprocal is e^{−At}.
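The series (3.4.1) and the inverse relation (3.4.4) are easy to check numerically. The truncated series below is only a sketch (a hypothetical example matrix; the bare series is not a robust way to compute e^{At} when ‖At‖ is large).

```python
import numpy as np

def expm_series(M, terms=40):
    # partial sum of I + M + M^2/2! + ..., i.e., (3.4.1) with M = At
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for n in range(1, terms):
        term = term @ M / n
        out = out + term
    return out

A = np.array([[1.0, 2.0], [0.5, -1.0]])
t = 0.8
E = expm_series(A * t)

# e^{At} e^{-At} = I, so e^{At} is never singular, as in (3.4.4)
residual = np.abs(E @ expm_series(-A * t) - np.eye(2)).max()
print(residual)
```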
Exercises
1. It is instructive to regard (3.4.3) as an expression of the law of causality. Derive it from the uniqueness of solution of the equation

dX/dt = AX,  X(0) = e^{As}.

2. Show that we can establish uniqueness for (3.4.2) by viewing it as N vector equations of dimension N or as one vector equation of dimension N^2.
3. Using the functional equation of (3.4.3), establish the functional equations for cos t and sin t.
4. Show that we can establish existence and uniqueness for

u^{(N)} + a_1 u^{(N−1)} + ⋯ + a_N u = 0,  u^{(i)}(0) = c_i,  i = 0, 1, ..., N − 1,

by viewing it as equivalent to the system

u′ = u_1,  u_1′ = u_2,  ...,  u′_{N−1} = −a_1 u_{N−1} − a_2 u_{N−2} − ⋯ − a_N u,
u(0) = c_0,  u_1(0) = c_1,  ....

5. Show that

e^{At} = lim_{n→∞} (I + At/n)^n.
6. Consider the difference equation x_{n+1} − x_n = Ax_n, x_0 = c. Show that x_n = (I + A)^n c.
7. Consider the difference equation y(t + Δ) = (I + AΔ) y(t), y(0) = c, where t = 0, Δ, 2Δ, .... Examine the convergence of y(NΔ), where NΔ = t, as Δ → 0.
3.5. Commutators
The relation in (3.4.3) is an analogue of what holds in the scalar case. On the other hand,

e^{At} e^{Bt} ≠ e^{(A+B)t}  (3.5.1)

for general A and B. If equality holds in (3.5.1) for −∞ < t < ∞, we have, upon expanding,
Examining the coefficient of t^2 on both sides, we have

A^2/2 + AB + B^2/2 = (A^2 + AB + BA + B^2)/2,  (3.5.3)

or

AB = BA.  (3.5.4)
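The necessity of commutativity is easy to see numerically; the sketch below (hypothetical 2 × 2 matrices) evaluates both sides of (3.5.1) at t = 1 for a noncommuting pair and for a commuting pair.

```python
import numpy as np

def expm_series(M, terms=40):
    # truncated exponential series, adequate for these small matrices
    out, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for n in range(1, terms):
        term = term @ M / n
        out = out + term
    return out

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])

# AB != BA here, and e^A e^B differs from e^{A+B}
gap_noncommuting = np.abs(expm_series(A) @ expm_series(B) - expm_series(A + B)).max()

# replacing B by a polynomial in A forces the commutator to vanish, and so does the gap
B2 = 2 * A + 3 * np.eye(2)
gap_commuting = np.abs(expm_series(A) @ expm_series(B2) - expm_series(A + B2)).max()

print(gap_noncommuting, gap_commuting)
```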
Thus, equality can hold only if A and B commute. If A and B do commute, we can group terms as in the scalar case to show that equality holds in (3.5.1). Since e^{At}e^{Bt} is nonsingular, we know that it can be written as an exponential (see end of Sec. 3.7). The task of determining the "logarithm" of e^{At}e^{Bt} is an interesting and important one. We leave it to the reader to show formally that

e^{At} e^{Bt} = exp[(A + B)t + (AB − BA)t^2/2 + ⋯].  (3.5.5)

The expression [A, B] = AB − BA is called the commutator of A and B and is a basic matrix function of A and B. The series obtained in the exponent is called the Baker-Campbell-Hausdorff series and plays a vital role in many investigations. See the exercises after Sec. 3.12 for some results in this area.

Exercises
1. Let A and B be matrices which do not necessarily commute. Then

e^{(A+B)t} = lim_{n→∞} (e^{At/n} e^{Bt/n})^n.

See
H. F. Trotter, "On the Product of Semigroups of Operators," Proc. Amer. Math. Soc., Vol. 10, 1959, pp. 545-551.
This has interesting applications to the study of the Schrödinger equation with Feynman integrals. See
W. G. Faris, "The Trotter Product Formula for Perturbations of Semibounded Operators," Bull. Amer. Math. Soc., Vol. 73, 1967, pp. 211-215.
2. Consider the equation X = C + ε(AX + XB), where ε is a scalar parameter. Write X = C + ∑_{n=1}^∞ ε^n y_n(A, B), where

y_0(A, B) = C,  y_n(A, B) = A y_{n−1}(A, B) + y_{n−1}(A, B) B,  n ≥ 1.
Show inductively that

y_n(A, B) = A^n C + \binom{n}{1} A^{n−1}CB + \binom{n}{2} A^{n−2}CB^2 + ⋯ + CB^n,

where the \binom{n}{k} are the binomial coefficients.
3. Introduce a position operator P with the property that when it operates on a monomial consisting of powers of A and B in any order with C somewhere, it shifts all powers of A in front of C and all powers of B after C. Thus,

P(A^{α_1}B^{β_1} ⋯ A^{α_k}B^{β_k} C A^{α_{k+1}}B^{β_{k+1}} ⋯) = A^{Σα_i} C B^{Σβ_i}.

Further, define P to be additive,

P(m_1(A, B) + m_2(A, B)) = P(m_1(A, B)) + P(m_2(A, B)),

where m_1 and m_2 are monomials of the foregoing type. Show that

y_n(A, B) = P((A + B)^n C).
4. Show that X = P[I − ε(A + B)]^{−1} C.
5. If X = E + ε(AX + XB + CXD), then X = P[I − ε(A + B + CD)]^{−1} E.
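Exercise 1 above, Trotter's product formula, can also be checked numerically. The sketch below (the same hypothetical noncommuting pair as before, at t = 1) watches the product approximation approach e^{A+B}.

```python
import numpy as np

def expm_series(M, terms=40):
    # truncated exponential series, adequate for these small matrices
    out, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for n in range(1, terms):
        term = term @ M / n
        out = out + term
    return out

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])
target = expm_series(A + B)

def trotter(n):
    # (e^{A/n} e^{B/n})^n  ->  e^{A+B} as n -> infinity
    step = expm_series(A / n) @ expm_series(B / n)
    return np.linalg.matrix_power(step, n)

errs = [np.abs(trotter(n) - target).max() for n in (1, 10, 100, 1000)]
print(errs)  # the error shrinks roughly like 1/n
```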
3.6. Inhomogeneous Equation

Consider next the equation

dx/dt = Ax + f(t),  x(0) = c.  (3.6.1)

We obtain the solution using an integrating factor as in the scalar case. Write

d/dt (e^{−At} x) = e^{−At} f(t),  (3.6.2)

whence

x = e^{At} c + ∫_0^t e^{A(t−t_1)} f(t_1) dt_1.  (3.6.3)
Exercise

1. Obtain this solution using the Laplace transform.

3.7. The Euler Solution

The equation

dx/dt = Ax,  x(0) = c,  (3.7.1)
can also be solved in the following fashion. Ignoring the initial condition, let us look for particular solutions of the form

x = e^{λt} b,  (3.7.2)

where λ is a scalar and b is a constant vector. Then (3.7.1) leads to

λb = Ab,  (3.7.3)

which means that λ is a characteristic value of A and b is an associated characteristic vector. Let us suppose for the moment that the characteristic values λ_1, λ_2, ..., λ_N are distinct, and let T be a matrix which reduces A to diagonal form,

T^{−1}AT = Λ = diag(λ_1, λ_2, ..., λ_N).  (3.7.4)

Setting

x = Ty,  (3.7.5)

we have

dx/dt = T dy/dt = ATy,  (3.7.6)

whence

dy/dt = T^{−1}ATy = Λy.  (3.7.7)

From x(0) = Ty(0) = c we obtain y(0) = T^{−1}c. Hence, we see the connection between the matrix exponential solution and the usual solution in terms of exponentials.
We have

e^{At} = T e^{Λt} T^{−1} = T diag(e^{λ_1 t}, e^{λ_2 t}, ..., e^{λ_N t}) T^{−1}.  (3.7.8)
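When the characteristic roots are distinct, (3.7.8) can be verified numerically with an eigendecomposition; the sketch below (a hypothetical example matrix) compares the diagonalized form against the power series.

```python
import numpy as np

A = np.array([[0.0, 1.0], [-2.0, -3.0]])  # distinct characteristic roots -1, -2
t = 0.5

# columns of T are characteristic vectors; T reduces A to diagonal form
lam, T = np.linalg.eig(A)

# e^{At} = T diag(e^{lam_i t}) T^{-1}, as in (3.7.8)
E_diag = T @ np.diag(np.exp(lam * t)) @ np.linalg.inv(T)

# compare against the partial sums of A^n t^n / n!
E_series, term = np.eye(2), np.eye(2)
for n in range(1, 40):
    term = term @ (A * t) / n
    E_series = E_series + term

print(np.abs(E_diag - E_series).max())
```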
If A has multiple characteristic roots, the representation requires more than exponentials in general. To obtain the form of e^{At} we can proceed in either of two ways, use of the Jordan canonical form or a canonical representation in triangular form. Writing

A = T diag(J_1, J_2, ..., J_k) T^{−1},  (3.7.9)

where the J_i are the Jordan matrices

| λ_i  1           |
|      λ_i  1      |
|           ⋱   ⋱ |
|               λ_i |,  (3.7.10)

we see that

e^{At} = T diag(e^{J_1 t}, e^{J_2 t}, ..., e^{J_k t}) T^{−1}.  (3.7.11)

We leave the determination of the analytic form of e^{J_i t} as an exercise. Alternatively, we can use a representation of the form
where the elements below the main diagonal are zero. This is a considerably simpler result to establish. The equation for y, where x = Ty as before,

dy/dt = (T^{−1}AT) y,  (3.7.13)

shows that if λ is a multiple root, then in general, terms of the form t^{k−1}e^{λt} will arise when k is the multiplicity of λ. We will avoid these more complex cases in what follows and therefore content ourselves with this cursory discussion.

Exercises
1. If A has distinct characteristic roots and A is nonsingular, show that A = e^B for some matrix B.
2. Use the Jordan canonical form to establish this exponential representation for general nonsingular A.
3. Obtain the exponential representation without use of the Jordan canonical form.
4. If x_{n+1} = Ax_n, x_0 = c, obtain the analytic representation of x_n.
5. Consider the vector equation x″ + Ax = 0 where A > 0. Show that if all of the characteristic roots of A are distinct, the general solution is a linear combination of elementary oscillations.
6. Show that this result is true even if A has multiple roots. Hint: Use the first integral (x′, x′) + (x, Ax).
7. Show that we can obtain approximate values for the characteristic frequencies using variational techniques.
3.8. Stability of Solution
We are now able to answer a most important question: Under what conditions on A do all solutions of

dx/dt = Ax  (3.8.1)

approach zero as t → ∞? From what has preceded, we see that a necessary and sufficient condition that this be the case is that all of the characteristic roots of A have negative real parts. If this condition holds, we call A a stability matrix. The problem of testing for this condition without explicit calculation of the characteristic roots is a difficult one to which much attention has been directed in recent years.
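A direct numerical test of the stability criterion is straightforward; the two matrices below are hypothetical examples, and the series evaluation of e^{At} is only a sketch.

```python
import numpy as np

def is_stability_matrix(A):
    # all characteristic roots must have negative real parts
    return bool(np.all(np.linalg.eigvals(A).real < 0))

A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # roots -1 and -2: stable
B = np.array([[0.0, 1.0], [2.0, 1.0]])     # one root is positive: unstable

print(is_stability_matrix(A), is_stability_matrix(B))

# solutions of x' = Ax decay: evaluate e^{At}c at t = 2 by the series
t = 2.0
E, term = np.eye(2), np.eye(2)
for n in range(1, 60):
    term = term @ (A * t) / n
    E = E + term
c = np.array([1.0, 1.0])
print(np.linalg.norm(E @ c), "<", np.linalg.norm(c))
```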
Exercises
1. If A, B, C are nonnegative definite and either A or C is positive definite, then |λ^2 A + 2λB + C| = 0 has no roots with positive real parts. Hint: Consider the behavior of the solutions of

Ax″ + 2Bx′ + Cx = 0.

2. If x′ = Ax, x(0) = c, and A is a stability matrix, then ∫_0^∞ (x, Ax) dt can be evaluated directly in terms of the elements of A and the components of c.
3. If A is a stability matrix, the solution of Ay = b may be obtained as the limit of the solution of x′ = Ax − b, x(0) = c, as t → ∞.
4. What is the rate of convergence?
5. Show that if A is nonsingular we can find a permutation matrix P and a diagonal matrix D such that DPA is stable. (Folkman.)
3.9. Linear Differential Equation with Variable Coefficients

Let us next consider the equation

dx/dt = A(t)x,  x(0) = c.  (3.9.1)

To establish the existence and uniqueness of the solution of (3.9.1), we once again convert the differential equation to an integral equation

x = c + ∫_0^t A(t_1) x(t_1) dt_1  (3.9.2)
and employ successive approximations,

x_0 = c,  x_{n+1} = c + ∫_0^t A(t_1) x_n(t_1) dt_1.  (3.9.3)

To establish convergence of x_n we consider the series ∑_{n=0}^∞ (x_{n+1} − x_n). We have

x_{n+1} − x_n = ∫_0^t A(x_n − x_{n−1}) dt_1  (3.9.4)

and thus

‖x_{n+1} − x_n‖ ≤ ∫_0^t ‖A‖ ‖x_n − x_{n−1}‖ dt_1.  (3.9.5)

Let us suppose that A(t) is continuous for 0 ≤ t ≤ t_0 and put k_1 = max_t ‖A(t)‖. Then, from (3.9.5),

‖x_{n+1} − x_n‖ ≤ k_1 ∫_0^t ‖x_n − x_{n−1}‖ dt_1.  (3.9.6)

Since

‖x_1 − x_0‖ ≤ k_1 ‖c‖ t,

we obtain inductively

‖x_{n+1} − x_n‖ ≤ ‖c‖ k_1^{n+1} t^{n+1}/(n + 1)!.

Thus, once again, the series ∑_{n=0}^∞ (x_{n+1} − x_n) converges uniformly in [0, t_0], which means that x_n converges uniformly to a function satisfying (3.9.2) and thus (3.9.1). The proof of uniqueness follows as before. Let X(t) be defined as the solution of

dX/dt = A(t)X,  X(0) = I.  (3.9.9)
We can build X(t) up out of N columns satisfying (3.9.1) for appropriate values of c, or establish its properties directly proceeding as above. The matrix X is nonsingular for 0 ≤ t ≤ t_0, a fact which is readily established by observing that Y = X^{−1} satisfies the linear equation

dY/dt = −YA(t),  Y(0) = I.  (3.9.10)
We establish this in reverse. Start with the functions X and Y defined by (3.9.9) and (3.9.10). From (3.9.9) and (3.9.10) we see that

d/dt (YX) = (dY/dt)X + Y(dX/dt) = −YA(t)X + YA(t)X = 0.  (3.9.11)

Thus, YX = C, a constant matrix. Setting t = 0, we see that C = I. Hence, Y = X^{−1}.

Exercises
1. Show that the solution of X′ = A(t)X, X(0) = I, is nonsingular for t ≥ 0 if A(t) is integrable for t ≥ 0 by demonstrating the result of Jacobi,

|X(t)| = exp[∫_0^t tr(A(t_1)) dt_1].
2. Show that X is nonsingular by showing that the vanishing of |X(t)| at some point t = t_1, 0 ≤ t_1 ≤ t_0, implies the existence of a nontrivial vector b such that X(t_1)b = 0. Hence, y(t) = X(t)b is a solution of dy/dt = A(t)y, y(t_1) = 0, and thus y(t) ≡ 0 for 0 ≤ t ≤ t_0. This is a contradiction since y(0) = X(0)b = b ≠ 0.
3. Modify the foregoing proof to demonstrate that the integral equation x = c + ∫_0^t A(t_1)x dt_1 possesses a unique solution in any interval [0, t_0] where ‖A(t)‖ is L-integrable, that is, ∫_0^{t_0} ‖A‖ dt_1 < ∞. Suppose that A(t) is piecewise continuous. What interpretation do we give the differential equation at a point of discontinuity of A(t)?
4. Show that X = exp[∫_0^t A(t_1) dt_1] is the solution of X′ = A(t)X, X(0) = I, if and only if A(t) ∫_0^t A(t_1) dt_1 = (∫_0^t A(t_1) dt_1) A(t).
5. Consider the positive definite matrix A(s) with distinct characteristic values λ_i(s) and associated orthonormal characteristic vectors x_i(s). Show that λ_n′(s) = (A′(s)x_n, x_n), where ′ denotes d/ds, and that

(Kalaba, Schmaedke, Vereeke.)

3.10. Linear Inhomogeneous Equation

Let us now consider the inhomogeneous equation

dx/dt = A(t)x + f(t),  x(0) = c.  (3.10.1)
To obtain the solution, we can employ the variation of parameters technique. Set

x = Xy,  (3.10.2)

where X(t) is as in the previous section. Then

dx/dt = (dX/dt)y + X(dy/dt) = A(t)Xy + f(t).  (3.10.3)

Since X′ = A(t)X, this reduces to

dy/dt = X^{−1}f,  (3.10.4)

whence

x = X(t)c + ∫_0^t X(t) X(t_1)^{−1} f(t_1) dt_1.  (3.10.5)
Exercises
1. Let K(t, t_1) = X(t) X(t_1)^{−1}. Establish the semigroup property, K(t, t_1) = K(t, t_2) K(t_2, t_1).
2. Obtain expressions for the terms in the perturbation expansion e^{A+εB} = e^A + εB_1 + ⋯ by considering e^{A+εB} as the value at t = 1 of X(t), where X′ = (A + εB)X, and converting the differential equation into an appropriate integral equation.
3. Show that if a_ij(t) ≥ 0, c_i ≥ 0, then the solutions of x′ = A(t)x, x(0) = c, are nonnegative for t ≥ 0.
4. Show that the same holds under the conditions a_ij(t) ≥ 0, i ≠ j, c_i ≥ 0. Hint: Consider the integral equation

i = 1, 2, ..., N.
5. Establish the result using the difference approximation

y_i(t + Δ) = (1 − a_ii(t)Δ) y_i(t) + ∑_{j≠i} a_ij(t) y_j(t) Δ,

i = 1, 2, ..., N, t = 0, Δ, 2Δ, ....
6. If x_i′ = −a_ii x_i + ∑_{j≠i} a_ij x_j, x_i(0) = c_i, with c_i ≥ 0, ∑_i c_i = 1, a_ij ≥ 0, a_jj = ∑_{i≠j} a_ij, then x_i ≥ 0, ∑_i x_i(t) = 1, t ≥ 0. Equations
In dealing with more general linear functional equations, the matrix inverse is replaced by the solution of an adjoint equation. To see how this goes, let Z be a matrix function, as yet unspecified, and consider the integral

∫_0^t Z(t_1) (dx/dt_1 − A(t_1)x) dt_1.  (3.11.1)

Integrating by parts, we have on the left

∫_0^t Z (dx/dt_1) dt_1 = Z(t)x(t) − ∫_0^t (dZ/dt_1) x dt_1,  (3.11.2)

upon taking x(0) = 0, as is permissible. Let us then choose Z(t_1) as the solution of

dZ/dt_1 = −Z A(t_1),  Z(t) = I,  (3.11.3)

for 0 ≤ t_1 ≤ t. With this choice of Z we have, from (3.11.1) and (3.11.2),

x(t) = ∫_0^t Z(t_1) f(t_1) dt_1.  (3.11.4)
The solution of (3.11.3) is easily seen to be Z = X(t) X(t_1)^{−1}. In general, (3.11.3) is called the adjoint equation.

3.12. The Equation X′ = AX + XB

An equation that arises in a number of interesting situations is

dX/dt = AX + XB,  X(0) = C.  (3.12.1)

Let us first consider the case where A and B are constant. It is easy to see that

X = e^{At} C e^{Bt}  (3.12.2)

is a solution. Uniqueness shows that it is the solution.
To solve

dX/dt = AX + XB + F(t),  X(0) = C,  (3.12.3)

we use the left integrating factor e^{−At} and the right integrating factor e^{−Bt}, obtaining

X = e^{At} C e^{Bt} + ∫_0^t e^{A(t−t_1)} F(t_1) e^{B(t−t_1)} dt_1.  (3.12.4)
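The representation (3.12.2) can be checked numerically by comparing a difference quotient of X(t) = e^{At} C e^{Bt} with AX + XB. The sketch below uses hypothetical example matrices and a truncated exponential series.

```python
import numpy as np

def expm_series(M, terms=60):
    # truncated power series for the matrix exponential
    out, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for n in range(1, terms):
        term = term @ M / n
        out = out + term
    return out

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[-1.0, 0.5], [0.0, -2.0]])
C = np.array([[1.0, 2.0], [0.0, 1.0]])

def X(t):
    # the representation (3.12.2)
    return expm_series(A * t) @ C @ expm_series(B * t)

# centered difference quotient of X versus the right-hand side AX + XB
t, h = 0.7, 1e-5
dX = (X(t + h) - X(t - h)) / (2 * h)
rhs = A @ X(t) + X(t) @ B
print(np.abs(dX - rhs).max())
```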
From the representation in (3.12.2) we obtain an interesting result. Integrate (3.12.1) between 0 and ∞ and suppose that A and B are stability matrices, so that X → 0 as t → ∞. Then

    −C = A ∫_0^∞ X dt_1 + (∫_0^∞ X dt_1) B,    (3.12.5)

which means that

    Y = −∫_0^∞ e^{At_1} C e^{Bt_1} dt_1    (3.12.6)

is a solution of

    C = AY + YB.    (3.12.7)
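The representation (3.12.6) lends itself to a direct numerical check. A sketch with NumPy/SciPy follows; the stability matrices, the truncation point of the infinite integral, and the quadrature grid are all our own illustrative choices:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-2.0, 1.0], [0.0, -3.0]])   # stability matrix (roots -2, -3)
B = np.array([[-1.0, 0.5], [0.0, -4.0]])   # stability matrix (roots -1, -4)
C = np.array([[1.0, 2.0], [3.0, 4.0]])

# Truncate the infinite integral at t = 20; the integrand decays like e^{-3t}.
ts = np.linspace(0.0, 20.0, 8001)
vals = np.array([expm(A * t) @ C @ expm(B * t) for t in ts])
dt = ts[1] - ts[0]
trapezoid = dt * ((vals[0] + vals[-1]) / 2 + vals[1:-1].sum(axis=0))
Y = -trapezoid                              # Y = -∫_0^∞ e^{At} C e^{Bt} dt

assert np.allclose(A @ Y + Y @ B, C, atol=1e-3)   # (3.12.7) holds
```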
In Sec. 3.19 below, we will discuss the question of when (3.12.7) has a unique solution. In the meantime, let us note that the existence of (3.12.6) for all C when A and B are stability matrices means that (3.12.7) has a unique solution in this case.

Exercises

1. Obtain the result corresponding to (3.12.4) when A and B are variable.

2. Show that the solution of X' = AX + XA^T, X(0) = C, is positive definite when C > 0.
3. Let H be a function of a parameter t, H = H(t), and s a parameter. Then

    (∂/∂t) e^{sH} = ∫_0^s e^{(s−s_1)H} (∂H/∂t) e^{s_1 H} ds_1.

4. Show that

    e^{−sH} (∂/∂t) e^{sH} = s ∂H/∂t − (s²/2!)[H, ∂H/∂t] + (s³/3!)[H, [H, ∂H/∂t]] − ⋯.
5. If [A, H(t)] = ∂H/∂t, then [A, f(H)] = ∂f(H)/∂t.

6. Show that

    e^{A} B e^{−A} = B + [A, B] + [A, [A, B]]/2! + ⋯.

7. Write e^{Z} = e^{tA} e^{tB}, Z = Σ_{n=1}^∞ Z_n t^n, Z' = Σ_n n t^{n−1} Z_n. Show that

    ∫_0^1 e^{rZ} Z'(t) e^{−rZ} dr = A + e^{tA} B e^{−tA},

and thus obtain recurrence relations for the Z_n. In particular, show that Z_1 = A + B, Z_2 = [A, B]/2, Z_3 = ([A, [A, B]] + [[A, B], B])/12. (Baker-Campbell-Hausdorff formula.)
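The commutator series of Exercise 6 converges rapidly for small matrices and can be summed directly. A sketch with NumPy/SciPy; A and B are small arbitrary matrices of our own choosing:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A = 0.3 * rng.standard_normal((3, 3))
B = 0.3 * rng.standard_normal((3, 3))

lhs = expm(A) @ B @ expm(-A)

# Sum B + [A,B] + [A,[A,B]]/2! + ... by iterating the commutator:
# term_n = [A, term_{n-1}]/n gives ad_A^n(B)/n! at step n.
term = B.copy()
rhs = B.copy()
for n in range(1, 30):
    term = (A @ term - term @ A) / n
    rhs = rhs + term

assert np.allclose(lhs, rhs, atol=1e-10)
```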
For the foregoing five results and many further results and additional references, see

R. M. Wilcox, "Exponential Operators and Parameter Differentiation in Quantum Physics," J. Math. Phys., Vol. 8, 1967, pp. 962-982.
I. Bialynicki-Birula, B. Mielnik, and J. Plebanski, "Explicit Solutions of the Continuous Baker-Campbell-Hausdorff Problem...," Ann. Phys., Vol. 51, 1969, pp. 187-200.

3.13. Periodic Matrices: the Floquet Representation
In general, it is very difficult to obtain representations for the solution of X' = A(t)X, X(0) = I. If A(t) is periodic, i.e., A(t + 1) = A(t), a useful representation exists. We can show that

    X = P(t) e^{Bt},    (3.13.1)

where B is constant and P(t + 1) = P(t). To see this, observe that X(t) and X(t + 1) are simultaneously solutions of the differential equation if A(t) is periodic. Hence

    X(t + 1) = X(t)C    (3.13.2)

for some constant matrix C. Setting t = 0, C = X(1), a nonsingular matrix. Hence, we can write C = e^{B} for some matrix B. Set

    X(t) = Y(t) e^{Bt};    (3.13.3)

then (3.13.2) yields

    Y(t + 1) e^{B(t+1)} = Y(t) e^{Bt} e^{B},    (3.13.4)

whence

    Y(t + 1) = Y(t),    (3.13.5)

the desired result.
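The construction behind (3.13.1), compute the monodromy matrix C = X(1), take B = log C, and form P(t) = X(t)e^{−Bt}, can be carried out numerically. A sketch with SciPy; the periodic coefficient matrix A(t) below is an arbitrary example of period 1:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import expm, logm

def A(t):
    # An arbitrary coefficient matrix with A(t + 1) = A(t)
    return np.array([[0.0, 1.0], [-1.0 - 0.3 * np.cos(2 * np.pi * t), -0.2]])

def X(t):
    # Fundamental matrix: integrate X' = A(t)X, X(0) = I
    def rhs(s, v):
        return (A(s) @ v.reshape(2, 2)).ravel()
    sol = solve_ivp(rhs, (0.0, t), np.eye(2).ravel(), rtol=1e-10, atol=1e-12)
    return sol.y[:, -1].reshape(2, 2)

Bmat = logm(X(1.0))                 # C = X(1) = e^B

def P(t):
    return X(t) @ expm(-Bmat * t)   # P(t) = X(t) e^{-Bt}

# P should have period 1: P(t + 1) = P(t)
for t in (0.25, 0.6):
    assert np.allclose(P(t + 1.0), P(t), atol=1e-6)
```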
The representation in (3.13.1) is an elegant and important one but is not easy to use for quantitative purposes. There is no readily available way of determining B.

Exercises
1. Consider the equation

    dX/dt = (P_1(t + θ) + P_2(t + θ))X,    X(0) = I,

where P_1(t + 1) = P_1(t), P_2(t + λ) = P_2(t) with λ irrational. Denote the solution by X(t, θ). Show that X(t + 1, θ) = X(t, θ + 1) X(1, θ).

2. Hence, show that

    X(t + n, θ) = X(t, θ + n) X(1, θ + n − 1) X(1, θ + n − 2) ⋯ X(1, θ).

3. Write X(1, θ) = e^{B(θ)}. Then

    ∏_{k=0}^{n−1} X(1, θ + n − 1 − k) = ∏_{k=0}^{n−1} exp[B(θ + n − 1 − k)].

Does an expression of the above form possess any simple asymptotic behavior as n → ∞? See

R. Bellman, "A Note on Linear Differential Equations with Quasiperiodic Coefficients," J. Math. Anal. Appl., forthcoming.

3.14. Calculus of Variations
As in the scalar case, variational problems lead to two-point conditions. Consider the problem of minimizing the quadratic functional

    J(x) = ∫_0^T [(x', x') + (x, A(t)x)] dt    (3.14.1)

over the class of functions subject to the initial condition x(0) = c, and the existence of the integral. Let x̄ be a function which renders J(x) stationary and consider

    J(x̄ + εy),    (3.14.2)

where ε is a scalar. The condition of stationarity yields the condition

    ∫_0^T [(x̄', y') + (A(t)x̄, y)] dt = 0    (3.14.3)

for all y. Integrating by parts, this becomes

    (x̄', y)|_0^T + ∫_0^T [−(x̄'', y) + (A(t)x̄, y)] dt = 0.    (3.14.4)

Hence, we are led to conjecture the Euler equation

    x'' − A(t)x = 0,    (3.14.5)

subject to the two-point boundary condition

    x(0) = c,    x'(T) = 0,    (3.14.6)

as a necessary condition to be satisfied by the minimizing function. We will discuss the rigorous aspects of this procedure in Chapter 7.

3.15. Two-point Boundary Condition
We solve (3.14.5) subject to (3.14.6) in exactly the same fashion that we treated the scalar case. Let X_1 and X_2 be the principal solutions of the matrix equation

    X'' − A(t)X = 0,    (3.15.1)

that is,

    X_1(0) = I,    X_1'(0) = 0,
    X_2(0) = 0,    X_2'(0) = I.    (3.15.2)

Set

    x = X_1 b^{(1)} + X_2 b^{(2)},    (3.15.3)

where b^{(1)} and b^{(2)} are vectors to be determined. Then (3.14.6) leads to

    b^{(1)} = c,    X_1'(T)c + X_2'(T) b^{(2)} = 0.    (3.15.4)

Thus, there is a unique solution for b^{(2)} if and only if X_2'(T) is nonsingular. It is not easy to give general conditions on A which ensure the existence and uniqueness of a solution. On the whole, we must be guided by biological, chemical, economic, engineering, and physical processes which suggest various analytic conditions. The theories of dynamic programming and invariant imbedding play a significant role in this study, both analytically and computationally, as we shall discuss in the second volume.

3.16. Green's Functions
If X_2'(T) is nonsingular, the equation

    x'' − A(t)x = f,    x(0) = 0,    x'(T) = 0,    (3.16.1)

has a unique solution expressible in the form

    x(t) = ∫_0^T K(t, s) f(s) ds.    (3.16.2)

The matrix K is called the Green's function associated with the equation and the specific boundary conditions. We proceed exactly as in the scalar case to determine its analytic form.

Exercise
1. Find the Green's functions associated with the conditions (a) x(0) = 0, x'(T) = 0, (b) x(0) = x(T) = 0, (c) x(0) = 0, x(T) + λx'(T) = 0, λ a scalar.
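The principal-solution construction of Sec. 3.15 can be exercised numerically: integrate X_1 and X_2, solve (3.15.4) for b^{(2)}, and verify the two-point condition by an independent integration. A sketch with NumPy/SciPy; the coefficient matrix A(t), the vector c, and the horizon T are arbitrary illustrative choices:

```python
import numpy as np
from scipy.integrate import solve_ivp

N, T = 2, 1.0

def A(t):
    # Arbitrary (positive definite) coefficient matrix
    return np.array([[1.0 + 0.1 * t, 0.2], [0.2, 0.5]])

def rhs(t, v):
    # v packs (X, X') for a matrix solution of X'' = A(t)X
    X, Xp = v[:N * N].reshape(N, N), v[N * N:].reshape(N, N)
    return np.concatenate([Xp.ravel(), (A(t) @ X).ravel()])

def principal(X0, Xp0):
    v0 = np.concatenate([X0.ravel(), Xp0.ravel()])
    sol = solve_ivp(rhs, (0.0, T), v0, rtol=1e-10, atol=1e-12)
    v = sol.y[:, -1]
    return v[:N * N].reshape(N, N), v[N * N:].reshape(N, N)

I2, Z2 = np.eye(N), np.zeros((N, N))
X1T, X1pT = principal(I2, Z2)          # X1(0) = I, X1'(0) = 0
X2T, X2pT = principal(Z2, I2)          # X2(0) = 0, X2'(0) = I

c = np.array([1.0, -0.5])
b2 = np.linalg.solve(X2pT, -X1pT @ c)  # (3.15.4): X1'(T)c + X2'(T)b2 = 0

# Independent check: integrate x'' = A(t)x with x(0) = c, x'(0) = b2
def vec_rhs(t, v):
    return np.concatenate([v[N:], A(t) @ v[:N]])

sol = solve_ivp(vec_rhs, (0.0, T), np.concatenate([c, b2]),
                rtol=1e-10, atol=1e-12)
assert np.allclose(sol.y[N:, -1], 0.0, atol=1e-6)   # x'(T) = 0 achieved
```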
3.17. The Matrix Riccati Equation

An important role in both mathematical control theory and mathematical physics is occupied by the matrix Riccati equation, as we shall see in the chapters on dynamic programming and invariant imbedding in Volume II. Purely formally, we obtain an equation of this nature in the following fashion. Consider the linear system of matrix equations

    X' = AX + BY,    X(0) = X_1,
    Y' = CX + DY,    Y(0) = I,    (3.17.1)

and the new variable

    Z = XY^{-1}.    (3.17.2)

The inverse Y^{-1} exists in some interval [0, t_0]. Then

    Z' = (AX + BY)Y^{-1} − XY^{-1}(CX + DY)Y^{-1}
       = AXY^{-1} + B − XY^{-1}CXY^{-1} − XY^{-1}D
       = AZ + B − ZCZ − ZD,    (3.17.3)

with Z(0) = X_1. Conversely, we can transform the equation for Z into a system of two linear equations of the form of (3.17.1) by means of (3.17.2).
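The reduction (3.17.3) can be confirmed numerically by integrating the linear system (3.17.1) and the Riccati equation side by side and comparing Z with XY^{-1}. A sketch with NumPy/SciPy; the coefficient matrices are arbitrary small examples:

```python
import numpy as np
from scipy.integrate import solve_ivp

n = 2
rng = np.random.default_rng(2)
A, B, C, D = (0.5 * rng.standard_normal((n, n)) for _ in range(4))
X1 = 0.5 * rng.standard_normal((n, n))

def lin_rhs(t, v):
    # The linear system (3.17.1)
    X, Y = v[:n * n].reshape(n, n), v[n * n:].reshape(n, n)
    return np.concatenate([(A @ X + B @ Y).ravel(), (C @ X + D @ Y).ravel()])

def ric_rhs(t, v):
    # The Riccati equation (3.17.3)
    Z = v.reshape(n, n)
    return (A @ Z + B - Z @ C @ Z - Z @ D).ravel()

t1 = 0.5   # stay inside an interval [0, t0] on which Y^{-1} exists
lin = solve_ivp(lin_rhs, (0.0, t1),
                np.concatenate([X1.ravel(), np.eye(n).ravel()]),
                rtol=1e-10, atol=1e-12)
ric = solve_ivp(ric_rhs, (0.0, t1), X1.ravel(), rtol=1e-10, atol=1e-12)

X = lin.y[:n * n, -1].reshape(n, n)
Y = lin.y[n * n:, -1].reshape(n, n)
Z = ric.y[:, -1].reshape(n, n)
assert np.allclose(Z, X @ np.linalg.inv(Y), atol=1e-7)
```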
Exercises

1. Show that Z^{-1} satisfies an equation of the same type.

2. Show that (A_1 Z + B_1)(C_1 Z + D_1)^{-1} satisfies an equation of the same form, with and without calculation.

3.18. Kronecker Products and Sums
B,)(C,Z Dl)-’ satisfies an equation of the same 2. Show that (A,Z form, with and without calculation. 3.18. Kronecker Products and Sums
1,et us now introduce an important matrix product which arises in surprising Mays-the Kronecker product. Let A and B be two matrices of dimension M and N respectively. Define the MN-dimensional matrix A
0R
=
u,,B
=
i , j = 1, 2 ,...,M ,
(uz,B),
(:
uM,B
.‘.
...
“‘““i
. ~MMB
(3.18.1)
’This is called the Krorkecker product of B by A. One way in which the matrix arises is the following. Let x and y respectively undergo the two linear transformations x(‘) = Ax, y(1) = BY, (3.18.2) and consider the linear transformation induced in z , the MN-dimensional vector,
2-
(3.18.3)
It is easy to see that the MN characteristic roots of A ⊗ B are λ_i μ_j, where the λ_i are the characteristic roots of A and the μ_j are the characteristic roots of B. The Kronecker sum of A and B is defined by

    A ⊕ B = A ⊗ I_N + I_M ⊗ B,    (3.18.4)

where I_M and I_N are respectively the M- and N-dimensional identity matrices. If x and y respectively satisfy

    dx/dt = Ax,    dy/dt = By,    (3.18.5)

then z as defined by (3.18.3) satisfies

    dz/dt = [A ⊕ B]z.    (3.18.6)

It follows, thus or otherwise, that the characteristic roots of A ⊕ B are the MN values λ_i + μ_j.
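Both eigenvalue statements can be confirmed numerically. A sketch with NumPy; the dimensions M, N and the matrices are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 3, 2
A = rng.standard_normal((M, M))
B = rng.standard_normal((N, N))

lam = np.linalg.eigvals(A)
mu = np.linalg.eigvals(B)

# Roots of A ⊗ B are the MN products λ_i μ_j
prod_roots = np.sort_complex(np.linalg.eigvals(np.kron(A, B)))
assert np.allclose(prod_roots,
                   np.sort_complex(np.outer(lam, mu).ravel()), atol=1e-8)

# Roots of A ⊕ B = A ⊗ I_N + I_M ⊗ B are the MN sums λ_i + μ_j
ksum = np.kron(A, np.eye(N)) + np.kron(np.eye(M), B)
sum_roots = np.sort_complex(np.linalg.eigvals(ksum))
assert np.allclose(sum_roots,
                   np.sort_complex(np.add.outer(lam, mu).ravel()), atol=1e-8)
```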
Exercises

1. Show that (A ⊗ B)(C ⊗ D) = AC ⊗ BD, (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}, and (A ⊗ B)^T = A^T ⊗ B^T.

2. Introduce the "stacking operator," a transformation from an N × N matrix into an N²-dimensional vector,

    S(A) = (a^{(1)}, a^{(2)},..., a^{(N)}),

where the a^{(i)} are the columns of A. Prove that S(PAQ) = (Q^T ⊗ P) S(A). See

D. Nissen, "A Note on the Variance of a Matrix," Econometrica, Vol. 36, 1968, pp. 603-604.

3.19. AX + XB = C
As a first application of the Kronecker sum, consider the matrix equation

    AX + XB = C.    (3.19.1)

If we regard this as a linear equation of dimension N² for the elements x_ij and introduce the vector

    y = (x_11, x_12,..., x_1N, x_21,..., x_NN),    (3.19.2)

it is easy to show that (3.19.1) takes the form

    (A ⊕ B^T) y = d,    (3.19.3)

where d is formed from C in the same way that y was formed from X. From this it follows that (3.19.1) has a solution for all C if and only if λ_i + μ_j ≠ 0 for all i and j. This extends the previous result for the case where A and B were stability matrices, in which case this condition was automatically satisfied.
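The vectorized form (3.19.3) translates directly into an N²-dimensional linear solve. A sketch with NumPy/SciPy; the example matrices are arbitrary, and SciPy's Sylvester solver serves as an independent check:

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(4)
N = 3
A = rng.standard_normal((N, N))
B = rng.standard_normal((N, N))
C = rng.standard_normal((N, N))

# With y the row-stacking of X, AX + XB = C becomes (A ⊕ B^T) y = d
K = np.kron(A, np.eye(N)) + np.kron(np.eye(N), B.T)   # A ⊕ B^T
y = np.linalg.solve(K, C.ravel())                      # d = row-stacked C
X = y.reshape(N, N)

assert np.allclose(A @ X + X @ B, C, atol=1e-8)
assert np.allclose(X, solve_sylvester(A, B, C), atol=1e-8)
```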
Exercises
1. A necessary and sufficient condition that a real matrix A be a stability matrix is that A^T Y + YA = −I possess a unique solution which is positive definite. Hint: Let x satisfy dx/dt = Ax and u(t) = (x, Yx). Consider the behavior of u(t) as a function of t. Alternatively, use the character of the solution Y in terms of the Kronecker sum.
2. Let A be an N × N complex matrix with characteristic roots λ_i, λ_i + λ_j ≠ 0. Then the N × N matrix G, the solution of AG + GA^T = I, is nonsingular and has as many positive characteristic roots as there are λ_i with positive real parts. See

O. Taussky, "A Generalization of a Theorem of Lyapunov," J. Soc. Ind. Appl. Math., Vol. 9, 1961, pp. 640-643.
3. Show that if all characteristic roots of A are less than one in absolute value, the solution of A*XA − X = −Q, where A* is the conjugate transpose of A, is given by

    X = (2πi)^{-1} ∮ (A* − z^{-1}I)^{-1} Q (A − zI)^{-1} z^{-1} dz,

where ∮ denotes integration round the circle | z | = 1 in the complex z-plane. Alternately,

    X = (2π)^{-1} ∫_{−π}^{π} (G^{-1})* Q G^{-1} dθ,

where G = A − Ie^{iθ}. See

R. A. Smith, "Matrix Calculations for Lyapunov Quadratic Forms," J. Diff. Eqns., Vol. 2, 1966, pp. 208-217.
R. A. Smith, "Bounds for Quadratic Lyapunov Functions," J. Math. Anal. Appl., Vol. 12, 1966, pp. 425-435.
4. Show formally that a solution of X − UXV = W is given by X = Σ_{k=0}^∞ U^k W V^k. When does the series converge and represent the solution? See

R. A. Smith, "Matrix Equation XA + BX = C," SIAM J. Appl. Math., Vol. 16, 1968, pp. 198-201.

5. Define X = (C, B) to be the unique solution of XB + BX = C in the case λ_i + λ_j ≠ 0, λ_i the ith characteristic root of B. Show that (C, (C, B)) = B.
3.20. Random Difference Systems
Kronecker products also arise from the consideration of linear difference equations of the form

    x_{n+1} = R_n x_n,    x_0 = c,    (3.20.1)

where the R_n are independent random matrices. It is easy to calculate the expected values of the x_n, namely,

    E(x_{n+1}) = E(R_n) E(x_n),    E(x_0) = c.    (3.20.2)
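The recursion (3.20.2), and the fact that second moments propagate through E(R_n ⊗ R_n), can be checked by simulation. A Monte Carlo sketch with NumPy; the distribution of the R_n is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(5)
c = np.array([1.0, -1.0])
n_steps, n_samples = 3, 100_000

M = np.array([[0.6, 0.2], [0.1, 0.5]])          # E(R_n)
Rs = M + rng.uniform(-0.5, 0.5, size=(n_samples, n_steps, 2, 2))

x = np.broadcast_to(c, (n_samples, 2)).copy()
for k in range(n_steps):
    x = np.einsum('sij,sj->si', Rs[:, k], x)    # x_{n+1} = R_n x_n, per sample

# First moments: E(x_n) = E(R)^n c, as in (3.20.2)
Ex = np.linalg.matrix_power(M, n_steps) @ c
assert np.allclose(x.mean(axis=0), Ex, atol=0.02)

# Second moments: E(x_{n+1} ⊗ x_{n+1}) = E(R_n ⊗ R_n) E(x_n ⊗ x_n)
ERR = np.einsum('sij,skl->ikjl', Rs[:, 0], Rs[:, 0]).reshape(4, 4) / n_samples
Exx = np.linalg.matrix_power(ERR, n_steps) @ np.kron(c, c)
emp = np.einsum('si,sj->ij', x, x).ravel() / n_samples
assert np.allclose(emp, Exx, atol=0.05)
```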
To calculate the second and higher moments, we require Kronecker products. Detailed results will be found in the references at the end of the chapter.

Miscellaneous Exercises
1. Consider the matrix transformation

    f(A, T) = T^{-1}AT − T^{-1} dT/dt.

Show that

    f(A, ST) = f(f(A, S), T).

Hint: Start with the differential equation dx/dt = A(t)x, x(0) = c, and consider the equation for y where x = Ty.
2. Consider the linear functional equation

    f = e + λT(fg),

where e is the identity element, λ is a scalar, and T is an operator satisfying a relation of the form

    (Tu)(Tv) = T(uT(v) + T(u)v − θuv),

for any two functions u and v, where θ is a fixed scalar. Show that for small λ,

    f = exp(Σ_{n=1}^∞ (λ^n θ^{n−1}/n) T(g^n)).

(Baxter.) See

F. V. Atkinson, "Some Aspects of Baxter's Functional Equation," J. Math. Anal. Appl., Vol. 6, 1963, pp. 1-29.
J. G. Wendel, "Brief Proof of a Theorem of Baxter," Math. Scand., Vol. 11, 1962, pp. 107-108.

It is interesting to point out that operators of this nature are connected with the Reynolds operator of importance in turbulence theory. See
G. C. Rota, "Reynolds Operators," Proc. Symposia in Appl. Math., Vol. 16, Stochastic Processes in Mathematical Physics and Engineering, Amer. Math. Soc., 1964, pp. 70-83.

3. Consider the differential-integral equation

    dx/dt = Ax + ∫_0^t K(t − t_1) x(t_1) dt_1,    x(0) = c.

Use the Laplace transform to find a representation for

    L(x) = ∫_0^∞ e^{−st} x(t) dt,

assumed to exist for Re(s) sufficiently large.
4. Let X(t) be the solution of the matrix equation

    dX/dt = AX + ∫_0^t K(t − t_1) X(t_1) dt_1,    X(0) = I,

and let Y(t) be the solution of

    dY/dt = YA + ∫_0^t Y(t − t_1) K(t_1) dt_1,    Y(0) = I.

What relation exists between X and Y?
5. Let X(t) satisfy the foregoing equation. Show that

    X(t + s) − X(t)X(s) = ∫_0^t [∫_0^s X(t_1) K(t + s − t_1 − s_1) X(s_1) ds_1] dt_1.

See

J. M. Richardson, "Quasi-differential Equations and Generalized Semigroup Relations," J. Math. Anal. Appl., Vol. 2, 1961, pp. 293-298.
6. Show that if X(t) is a differentiable matrix function satisfying the functional equation X(t + s) = X(t)X(s) for −∞ < t, s < ∞, and if X(0) = I, then X(t) = e^{At} for some constant matrix A. (Hint: Differentiate with respect to both t and s and compare results.)

7. Show that the same result holds if X(t) is merely required to be continuous. (This is a classic result due to Polya.)
8. Let u satisfy the equation u = f + λT(u), where f is a known function and λ is a parameter. Using the relation

    T^{(n)}(u) = T^{(n)}(f) + λT^{(n+1)}(u),

obtain a formal infinite continued fraction for u/T(u) and, hence, for u.

9. Apply this to the equation u = 1 + λ∫_0^t u dt_1 to find a continued fraction expansion for e^{λt}. See

R. Bellman and J. M. Richardson, "A New Formalism in Perturbation Theory Using Continued Fractions," Proc. Nat. Acad. Sci. U.S., Vol. 48, 1962, pp. 1913-1915.
10. Let u_n be the nth convergent of this infinite continued fraction. Investigate the accuracy of u_n as an approximation to u. (T. Brown.) See

S. I. Drobnies, "On a Method of Bellman and Richardson in Perturbation Theory," J. Math. Anal. Appl., forthcoming.

11. Consider the Fredholm integral equation
    u(x) = f(x) + λ ∫_a^b k(x, y) u(y) dy.

Let the first approximation be determined by

    u_1(x) = f(x) + λ a_1 ∫_a^b k(x, y) dy,

where a_1 = ∫_a^b u_1(x) dx/(b − a). Show that

    a_1 = ∫_a^b f(x) dx / D(λ),

where D(λ) = (b − a) − λ ∫_a^b ∫_a^b k(x, y) dx dy. As the nth approximation, take

    u_n(x) = f(x) + λ ∫_a^b k(x, y)[u_{n−1}(y) + a_n] dy,

where a_n = ∫_a^b (u_n(x) − u_{n−1}(x)) dx/(b − a). Show that a_n is again determined by a linear equation with the same denominator D(λ).
Discuss the convergence of this method due to Sokolov. See

A. Y. Luchka, The Method of Averaging Functional Corrections: Theory and Applications, Academic Press, New York, 1963.

12. Consider the linear integral equations

    (1) y(u) = g(u) + ∫_0^∞ k(u, v) y(v) dv,

and

    (2) y(u, x) = g(u) + ∫_0^∞ [∫_0^x exp[−(x − y_1)] k(u, v) y(v, y_1) dy_1] dv,

where it is assumed that k(u, v) ≥ 0 for u, v ≥ 0. Show formally that lim_{x→∞} y(u, x) = y(u).

13. Prove that if (1) has a unique positive solution y(u), then (2) possesses a unique solution which converges monotonically to y(u) as x → ∞.

14. For what matrices A and B do we have e^A e^B = e^{A+B}, in addition to those for which A and B commute? For the foregoing and additional references, see
R. F. Rinehart, "The Equation e^X e^Y = e^{X+Y} in Quaternions," Rend. Circ. Mat. Palermo, Ser. II, Tomo VII, 1959, pp. 1-3.

Bibliography and Comments

§3.1. For more extensive coverage of the material in this chapter, see Chapters 10 through 13 of

R. Bellman, Introduction to Matrix Analysis, McGraw-Hill, New York, 1960. (Second edition in preparation.)

For a survey of numerical techniques, see

G. E. Forsythe, "Today's Computational Methods of Linear Algebra," SIAM Review, Vol. 9, 1967, pp. 489-515.

§3.3. See B. W. Helton, "Integral Equations and Product Integrals," Pac. J. Math., Vol. 16, 1966, pp. 277-322.

§3.5. See

W. Magnus, "Algebraic Aspects of the Theory of Systems of Linear Differential Equations," Comm. Pure Appl. Math., Vol. 7, 1954.
R. M. Wilcox, "Exponential Operators and Parameter Differentiation in Quantum Physics," J. Math. Phys., Vol. 8, 1967, pp. 962-982.
Kuo-Tsai Chen, "Integration of Paths, Geometric Invariants, and a Generalized Baker-Hausdorff Formula," Ann. Math., Vol. 65, 1957, pp. 163-178.

See also

W. L. Miranker and B. Weiss, "The Feynman Operator Calculus," SIAM Review, Vol. 8, 1966, pp. 224-232.

§3.8. For a detailed discussion of what can be done using Kronecker sums and Lyapunov matrices, see

A. T. Fuller, "Conditions for a Matrix to Have Only Characteristic Roots with Negative Real Parts," J. Math. Anal. Appl., Vol. 23, 1968, pp. 71-98.
S. Kaplan and M. Ben Miloud, "On Eigenvalue and Identification Problems in the Theory of Drug Distribution," USCEE-310, University of Southern California, 1968.
J. Z. Hearon, "Theorems on Linear Systems," Ann. N.Y. Acad. Sci., Vol. 108, 1963, pp. 36-68.

§3.11. For a detailed account of the use of the adjoint operator in differential-difference equations, see

R. Bellman and K. L. Cooke, "Asymptotic Behavior of Solutions of Differential-Difference Equations," Mem. Amer. Math. Soc., No. 35, 1959.
R. Bellman and K. L. Cooke, Differential-Difference Equations, Academic Press, New York, 1963.

§3.13. For applications of functional analysis to the establishment of a generalization of the Floquet theorem, see

A. Stokes, "A Floquet Theorem for Functional Differential Equations," Proc. Nat. Acad. Sci. U.S., Vol. 48, 1962, pp. 1330-1334.

See also

Y. Sibuya, "Note on Real Matrices and Linear Differential Systems with Periodic Coefficients," J. Math. Anal. Appl., Vol. 1, 1960, pp. 363-372.
Yu. A. Mitropol'skiy and A. M. Samoylenko, "Construction of Solutions of Linear Differential Equations with Quasi-periodic Coefficients," AN UkrSSR, Matematicheskaya fizika, No. 3, Kiev, Naukova dumka, 1967, pp. 185-198.
I. Z. Shtokalo, "Stability Criterion of Solutions of Linear Differential Equations with Nearly Periodic Coefficients," Recueil Mathematique (Matematiceskii Sbornik), 1946.

§3.14. See Chapter 7 and

R. Bellman, Introduction to the Mathematical Theory of Control Processes, Vol. I: Linear Equations and Quadratic Criteria, Academic Press, New York, 1967.

For a discussion of an important computational question, see

M. E. Fowler and R. M. Warten, "Numerical Integration Technique for Ordinary Differential Equations with Widely Separated Eigenvalues," IBM Jour. of Research and Development, Vol. 11, 1967, pp. 537-543.
J. Legras, "Résolution Numérique des Grands Systèmes Différentiels Linéaires," Numerische Mathematik, Vol. 8, 1966, pp. 14-28.

§3.17. See the book cited above under §3.14 for a detailed discussion of the Riccati equation.

§3.20. See

R. Bellman, "Limit Theorems for Noncommutative Operations-I," Duke Math. J., Vol. 21, 1954, pp. 491-500.
H. Furstenberg and H. Kesten, "Products of Random Matrices," Annals Math. Stat., Vol. 31, 1960, pp. 457-469.
J. H. Pollard, "On the Use of the Direct Matrix Product in Analyzing Certain Stochastic Population Models," Biometrika, Vol. 53, 1966, pp. 397-415.

Also see

M. L. Mehta, Random Matrices, Academic Press, New York, 1967.
Chapter 4
STABILITY THEORY AND RELATED QUESTIONS
4.1. Introduction
In this chapter we wish to lay a rigorous foundation for a number of approximate methods we will employ in subsequent chapters and to discuss some topics of independent interest. Our theme will be "stability." In very general terms, we are interested in what happens to certain properties of the solutions of an equation when the structure of the equation is altered in various ways. Starting with the equation

    dx/dt = Ax,    x(0) = c,    (4.1.1)

the properties of whose solutions are readily noted, we wish to study first the boundedness and asymptotic behavior, as t → ∞, of the linear equation with "almost-constant" coefficients,

    dx/dt = (A + B(t))x,    x(0) = c.    (4.1.2)

Then we turn to some corresponding questions for the nonlinear equation

    dx/dt = Ax + g(x) + h(t),    x(0) = c,    (4.1.3)

and present results of Poincaré and Lyapunov. We will also study the case where the solution is determined by conditions at two points, say t = 0 and t = T. Equations of this type arise in the calculus of variations in connection with the minimization of functionals of the form

    J(x) = ∫_0^T [(x', x') + (x, A(t)x)] dt,    (4.1.4)

and in mathematical physics in the consideration of transport processes associated with plane-parallel media of finite thickness.
In addition to results of the foregoing nature, we are interested in the behavior of solutions of nonlinear differential equations over finite intervals. In particular, we wish to study the connections between the solutions of

    dx/dt = g(x),    x(0) = c,    (4.1.5)

and that of the perturbed equation

    dy/dt = g(y) + h(y, t),    y(0) = c,    (4.1.6)

over 0 ≤ t ≤ t_0 under various assumptions concerning the order of magnitude and analytic properties of h(y, t). These results are basic to our application of the Bubnov-Galerkin technique in Chapter 5. We will also discuss stability theory for difference equations, briefly, in connection with the application of difference methods to the study of partial differential equations. At the close of the chapter we discuss the Emden-Fowler-Fermi-Thomas equation as an illustration of the use of Poincaré-Lyapunov results.

4.2. Dini-Hukuhara Theorem-I
The first result we wish to demonstrate is:

Theorem. If all solutions of

    dx/dt = Ax    (4.2.1)

are bounded as t → ∞, and if

    ∫_0^∞ || B(t)|| dt < ∞,    (4.2.2)

then all solutions of

    dy/dt = (A + B(t))y    (4.2.3)

are bounded as t → ∞.

The proof we present depends upon the conversion of (4.2.3) into a linear integral equation. We have

    dy/dt − Ay = B(t)y,    (4.2.4)
whence, setting y(0) = c, we have

    y = e^{At}c + ∫_0^t e^{A(t−t_1)} B(t_1) y(t_1) dt_1,    (4.2.5)

or

    y = x + ∫_0^t e^{A(t−t_1)} B(t_1) y(t_1) dt_1,    (4.2.6)

where x is the solution of (4.2.1) subject to x(0) = c. The assumption of (4.2.2) guarantees that the solution of (4.2.3) satisfying y(0) = c exists and is unique for any finite t-interval. From (4.2.6), it follows that

    || y || ≤ k_1 + k_1 ∫_0^t || B(t_1)|| || y || dt_1,    (4.2.7)

where we set

    k_1 = max[max_t || x ||, max_t || e^{At} ||].    (4.2.8)

By virtue of our assumption concerning the boundedness of x, we know that || e^{At} || is uniformly bounded, and thus that k_1 < ∞. Applying the fundamental lemma of Sec. 1.3, we have

    || y || ≤ k_1 exp(k_1 ∫_0^t || B(t_1)|| dt_1),    (4.2.9)

for t ≥ 0. This completes the proof.

Using this result, we can go further. Returning to (4.2.6), we have

    || y − x || ≤ ∫_0^t || e^{A(t−t_1)} || || B(t_1)|| || y || dt_1 ≤ k_2 ∫_0^t || B(t_1)|| dt_1,    (4.2.10)

where k_2 depends on the uniform bounds for || e^{At} || and || y ||. Thus, we have not only a stability theorem, but also a stronger result providing an estimate of the difference between x and y.

We can also determine asymptotic behavior. Consider the expression in (4.2.5) under the assumption that A possesses distinct characteristic roots. The integral ∫_0^t e^{A(t−t_1)} B(t_1) y dt_1 contains two types of terms, those corresponding to characteristic roots with negative real part and
those corresponding to characteristic roots with zero real parts. Thus, we may write

    y = Σ_1 + Σ_2,    (4.2.11)

where in Σ_1, Re(λ_k) < 0, and in Σ_2, Re(λ_k) = 0. It is easy to verify that the fact that ∫_0^∞ || By || dt_1 < ∞ implies that

    e^{λ_k t} ∫_0^t e^{−λ_k t_1}(By) dt_1 = o(1),    t → ∞,    (4.2.12)

for any term in Σ_1. In Σ_2, write

    ∫_0^t e^{−λ_k t_1}(By) dt_1 = ∫_0^∞ e^{−λ_k t_1}(By) dt_1 − ∫_t^∞ e^{−λ_k t_1}(By) dt_1.    (4.2.13)

The infinite integral converges since Re(λ_k) = 0 and ∫_0^∞ || By || dt_1 < ∞. Thus, we can write

    y = Σ_2 A_k c e^{λ_k t} + Σ_2 A_k e^{λ_k t} ∫_0^∞ e^{−λ_k t_1}(By) dt_1 + o(1) = Σ_2 e^{λ_k t} b_k + o(1),    (4.2.14)

where the b_k are constant vectors linearly dependent on c. The determination of the b_k is not a completely simple matter. We shall discuss some aspects of this in the chapter on invariant imbedding in Volume II.

Exercises
1. Let x' = Ax where all the characteristic roots of A have negative real parts. Show that all solutions of y' = (A + B(t))y are bounded if || B(t)|| ≤ k_1, where k_1 is sufficiently small, and that all solutions approach zero as t → ∞ if k_1 is small enough.

2. Consider the equation x' = Ax + f(t). Show that all solutions are bounded if all solutions of y' = Ay are bounded and if ∫_0^∞ || f || dt < ∞.
3. Show that if A is such that all solutions of x' = Ax are bounded as t → ∞, then A is similar to a matrix of the canonical form

    diag(λ_1, λ_2,..., λ_r, A_1),

where Re(λ_1) = Re(λ_2) = ⋯ = Re(λ_r) = 0, and all the characteristic roots of A_1 have negative real parts. Another way of putting this is that all the Jordan factors associated with roots of zero real part are diagonal.

4. Using the foregoing result, obtain a version of the result in the foregoing section in which simplicity of characteristic roots is not assumed.

4.3. Dini-Hukuhara Theorem-II
Using very much the same technique, we can demonstrate a more general result:

Theorem. If all solutions of

    dx/dt = A(t)x    (4.3.1)

are bounded as t → ∞, if

    || X(t)(X(t_1))^{-1} || ≤ k_1 < ∞    (4.3.2)

for t ≥ t_1 ≥ 0, and if

    ∫_0^∞ || B(t_1)|| dt_1 < ∞,    (4.3.3)

then all solutions of

    dy/dt = (A(t) + B(t))y    (4.3.4)

are bounded as t → ∞.
The proof proceeds as before. We write

    dy/dt − A(t)y = B(t)y,    (4.3.5)

and consider the associated integral equation

    y = X(t)c + ∫_0^t X(t)(X(t_1))^{-1} B(t_1) y(t_1) dt_1,    (4.3.6)

where, as before, X(t) is the solution of the matrix equation

    dX/dt − A(t)X = 0,    X(0) = I.    (4.3.7)

Hence, as above, || y || is bounded for t ≥ 0.

Exercises
1. Show that (4.3.2) is satisfied if all solutions of (4.3.1) are bounded and if tr(A(t)) = 0 for t ≥ 0.

2. More generally, show that (4.3.2) is satisfied if all solutions of (4.3.1) are bounded and ∫_0^t tr(A(t_1)) dt_1 is uniformly bounded for t ≥ 0.

3. Consider the scalar equation u'' + (a(t) + b(t))u = 0, where ∫_0^∞ | b(t)| dt < ∞. Show that all solutions are bounded if all solutions of v'' + a(t)v = 0 are bounded.

4. Show that all solutions of y' = (A(t) + B(t))y are bounded if all solutions of x' = A(t)x are bounded, if A(t) is periodic in t, and if ∫_0^∞ || B(t)|| dt < ∞.
5. Show by considering the systems

    dy_1/dt = −ay_1,
    dy_2/dt = (sin log t + cos log t − 2a) y_2,

and

    dx_1/dt = −ax_1,
    dx_2/dt = (sin log t + cos log t − 2a) x_2 + x_1 e^{−at},

that the result in the preceding section is not true under the sole condition that all solutions of x' = A(t)x are bounded as t → ∞. See

O. Perron, Math. Zeit., Vol. 29, 1929, pp. 129-160.
R. Bellman, Stability Theory of Differential Equations, Dover Publications, New York, 1969, p. 87.

6. Under the conditions of Exercises 1-3, show that all solutions of y' = A(t)y + f(t) are bounded as t → ∞ provided that ∫_0^∞ || f || dt < ∞.

4.4. Inverse Theorems of Perron
Suppose that a system is described by the equation

    dx/dt = A(t)x + f(t),    x(0) = c,    (4.4.1)

and that we want to determine something about the structure of the system by observing its behavior, i.e., the solution x, under various classes of forcing functions f(t). We may, for example, consider the class of functions satisfying || f(t)|| ≤ k_1 < ∞, or ∫_0^∞ || f || dt_1 < ∞, or ∫_0^∞ || f ||² dt < ∞, and so on. This type of problem was first investigated by O. Perron. It is most easily treated by Banach space methods. References will be found at the end of the chapter.
4.5. Existence and Uniqueness of Solution
We will establish the existence of a solution of the equations we consider subsequently by means of successive approximations. In addition, we want to guarantee uniqueness of the solution. In many important cases, the question can be readily settled.
Let us suppose that the equation

    dx/dt = g(x),    x(0) = c,    (4.5.1)

has two solutions, x and y, in some interval [0, t_0]. Then

    x − y = ∫_0^t [g(x) − g(y)] ds.    (4.5.2)

Assume now that g(x) satisfies a Lipschitz condition, that is,

    || g(x) − g(y)|| ≤ k_1 || x − y ||,    (4.5.3)

where k_1 depends on the allowed range of || x || and || y ||. Then we have, from (4.5.2),

    || x − y || ≤ ∫_0^t || g(x) − g(y)|| ds ≤ k_1 ∫_0^t || x − y || ds,    (4.5.4)

where k_1 is determined by max || x ||, max || y ||, as t ranges over [0, t_0]. As we know, this implies that x = y, the desired uniqueness.

There are four important classes of problems associated with differential equations. The first is that of establishing existence and uniqueness of the solution within some t-interval and some restricted class of functions. The second is that of determining the precise interval of existence of this solution. In our case we are interested in solutions that can be continued for all t > 0. Thirdly, we are concerned with analytic expressions and approximations for the solution. Finally, we wish to obtain various types of computational algorithms. In general, different methods are used for different types of problems. In this chapter we concentrate on the first three classes of problems. In subsequent chapters we will present some computational results.

The proof of uniqueness is not included solely as a mathematical frill. Simple examples show that without further conditions on g(x), uniqueness will not hold. Consider, for example, the equation

    u' = √u,    u(0) = 0.    (4.5.5)

The function u ≡ 0 is readily seen to be a solution, but so also is

    u = t²/4.    (4.5.6)

We are not interested here in the weakest conditions which ensure uniqueness but rather in simple conditions which guarantee this property and which in addition permit us to exhibit a constructive demonstration of the existence of a solution.
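The failure of uniqueness in (4.5.5) is easy to see numerically as well; here is a small NumPy sketch checking both solutions on a grid (the grid itself is arbitrary):

```python
import numpy as np

t = np.linspace(0.0, 2.0, 201)
u = t**2 / 4
assert np.allclose(t / 2, np.sqrt(u))       # u = t²/4 satisfies u' = √u
assert u[0] == 0.0                          # and u(0) = 0

zero = np.zeros_like(t)
assert np.allclose(zero, np.sqrt(zero))     # u ≡ 0 satisfies it as well
```

The square root is continuous but not Lipschitz at u = 0, which is exactly where (4.5.3) fails.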
Exercises
1. Show that it is sufficient to consider equations of the form in (4.5.1) by reducing dy/dt = g(y, t) to the foregoing form by the change of variable x_1 = y_1, x_2 = y_2,..., x_N = y_N, x_{N+1} = t.

2. Show that (4.5.3) may be replaced by || g(x) − g(y)|| ≤ k_1(t)|| x − y ||, where ∫_0^{t_0} k_1(t) dt < ∞.
4.6. Poincaré-Lyapunov Stability Theory
Let us now examine the nonlinear equation

    dx/dt = Ax + g(x),    x(0) = c,    (4.6.1)

where the components of g(x) are convergent power series in the components of x, lacking constant and linear terms, for || x || small. We are interested in determining simple criteria which will allow us to use the linear approximation

    dy/dt = Ay,    y(0) = c,    (4.6.2)

as a guide to the qualitative and quantitative behavior of x as t → ∞. The fundamental result in this area is a theorem due to Poincaré and Lyapunov:

Theorem. If

    (a) all solutions of dy/dt = Ay approach zero as t → ∞,
    (b) the components of g(x) are power series in the components of x lacking constant and linear terms, convergent for || x || sufficiently small,
    (c) || c || is sufficiently small,    (4.6.3)

then

    (a) the solution of (4.6.1) exists for t ≥ 0,
    (b) the solution approaches zero as t → ∞,    (4.6.4)

and, for some constant k_1,

    || x − y || ≤ k_1 || c ||².    (4.6.5)
Observe that we are simultaneously establishing an existence and uniqueness theorem, a stability theorem, and a result concerning asymptotic behavior.
4.7. Proof of Theorem
The proof relies upon the fundamental technique in the theory of differential equations, conversion of the differential equation into an integral equation, a technique we have already employed. Proceeding as in Sec. 4.2, we convert (4.6.1) into the nonlinear integral equation

    x = e^{At}c + ∫_0^t e^{A(t−s)} g(x(s)) ds.    (4.7.1)

We now employ the method of successive approximations, since it will permit us not only to establish the stated result but also to obtain important information concerning the asymptotic behavior of x as t → ∞. Write

    x_0 = e^{At}c,    x_{n+1} = e^{At}c + ∫_0^t e^{A(t−s)} g(x_n) ds,    n ≥ 0.    (4.7.2)
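The successive approximations (4.7.2) are easy to carry out numerically. A sketch for the scalar example x' = −x + x², x(0) = c, so that A = −1 and g(x) = x²; the example, the grid, the iteration count, and the closed-form comparison solution are our own illustrative choices:

```python
import numpy as np

c = 0.1                                  # ||c|| small, as the theorem requires
t = np.linspace(0.0, 5.0, 2001)
dt = t[1] - t[0]

x = np.exp(-t) * c                       # x_0(t) = e^{At} c with A = -1
for _ in range(20):
    # x_{n+1}(t) = e^{-t}[c + ∫_0^t e^{s} x_n(s)² ds], trapezoidal rule
    h = np.exp(t) * x**2
    cum = np.concatenate(([0.0], np.cumsum(dt * (h[1:] + h[:-1]) / 2)))
    x = np.exp(-t) * (c + cum)

# Closed-form solution of x' = -x + x², x(0) = c, for comparison
exact = c * np.exp(-t) / (1 - c + c * np.exp(-t))
assert np.allclose(x, exact, atol=1e-5)
```

Note how quickly the iterates settle down for small c, in line with the contraction estimate (4.7.13) below.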
The proof now proceeds in easy stages. We begin by showing that

    || x_n || ≤ 2b_1 || c ||,    n = 0, 1,...,    (4.7.3)

provided that || c || is sufficiently small. Here b_1 is a constant we shall specify in a moment. The hypothesis made above concerning the behavior of g(x) enables us to assert that

    || g(x_n)|| ≤ b_2 || x_n ||²,    (4.7.4)

provided that || x_n || is sufficiently small. This condition allows us to establish (4.7.3) inductively. Let us now fill in the details. We have

    || x_0 || = || e^{At}c || ≤ || e^{At} || || c || ≤ b_1 e^{r_1 t} || c ||,    (4.7.5)

for some r_1 < 0 and some scalar b_1. This is a consequence of our hypothesis concerning the solutions of dy/dt = Ay. Subsequently, we shall use the full force of (4.7.5). At the moment, we use only the estimate || x_0 || ≤ b_1 || c ||.
Let us now proceed inductively. Suppose that (4.7.3) is valid for n. Then, using (4.7.2),

    || x_{n+1} || ≤ || e^{At}c || + ∫_0^t || e^{A(t−s)} g(x_n)|| ds
               ≤ b_1 || c || + b_1 ∫_0^t e^{r_1(t−s)} || g(x_n)|| ds,    (4.7.6)

using (4.7.4). Since || x_n ||² ≤ 4b_1² || c ||² by the inductive hypothesis, we have

    || x_{n+1} || ≤ b_1 || c || + 4b_1³b_2 || c ||² ∫_0^t e^{r_1(t−s)} ds
               ≤ b_1 || c || + 4b_1³b_2 || c ||² ∫_0^∞ e^{r_1 s} ds
               ≤ 2b_1 || c ||,    (4.7.7)

provided that || c || is sufficiently small. This establishes the induction.
provided that 11 c 11 is sufficiently small. This establishes the induction. T h e next step is to show that the sequence {x,} converges. We have x,+~
x,
~
=
1,:
e A ( t - s ) [g(x,)
(4.7.8)
- g(xnP1)] ds.
T h e assumption concerning g(x), namely, the power series nature of components of g(x), allows us to assert that
for some scalar b, , again under the assumption that
I/ xn+1
-
x,
I1 < b3 /I c I1 j t /I eA(t-s)ll /I x,
- xn-1
11 c 11
< 1 . Hence
/I ds (4.7.10)
From this, we see inductively that
3 0. Thus, the series any finite t-interval.
for n
2, 11 xntl - x, Ij converges uniformly in
A stronger result can be obtained by a slight modification of the foregoing argument. Let

$$u_n = \sup_{t \ge 0} \|x_{n+1} - x_n\|,$$   (4.7.12)

a quantity which we know is finite. Then, from (4.7.10),

$$u_n \le b_4\|c\|\,u_{n-1}.$$   (4.7.13)

Hence, if $b_4\|c\| = r < 1$, we see that $u_n \le b_5 r^n$, and thus that the series

$$\sum_n u_n$$   (4.7.14)

converges. This, in turn, implies the uniform convergence of $\sum_n (x_{n+1} - x_n)$, and therefore of $x_n$, for $t \ge 0$. Let $x$ denote the limit function, $x = \lim_{n\to\infty} x_n$. Then uniform convergence yields

$$x = e^{At}c + \int_0^t e^{A(t-s)}\,g(x(s))\,ds.$$   (4.7.15)
Thus, we see that $x$ satisfies the original differential equation. Since $y = e^{At}c$, we have

$$\|x - y\| \le \int_0^t \|e^{A(t-s)}\|\,\|g(x)\|\,ds \le 4b_1^2 b_2\|c\|^2 \int_0^t \|e^{A(t-s)}\|\,ds \le 4b_1^2 b_2\|c\|^2 \int_0^\infty \|e^{As}\|\,ds \le b_6\|c\|^2.$$   (4.7.16)

Hence, if $\|c\|$ is small, the approximation of $x$ by $y$ over $0 \le t < \infty$ is an excellent one. The proof that $x$ approaches zero as $t \to \infty$ will be deferred until the next section, where we establish a much stronger result.

Exercises
1. Is uniform convergence needed to establish (4.7.15)?

2. Establish the uniform boundedness of any solution of

$$x = y + \int_0^t e^{A(t-s)}\,g(x)\,ds,$$

where no Lipschitz condition is invoked, by the following argument: We wish to show that $\|x\| < 2b_1\|c\|$, $0 \le t < \infty$, for $\|c\|$ sufficiently small. Assume the contrary and let $t_0$ be the first point at which $\|x\| = 2b_1\|c\|$. Then

$$2b_1\|c\| = \|x(t_0)\| \le \|y(t_0)\| + \int_0^{t_0} \|e^{A(t_0-s)}\,g(x)\|\,ds \le b_1\|c\| + b_7\|c\|^2 \int_0^{t_0} \|e^{A(t_0-s)}\|\,ds,$$

whence a contradiction for $\|c\|$ sufficiently small. (Levinson.)
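The estimate (4.7.16) can be observed numerically in the scalar case $u' = -u + u^2$, so that $A = -1$, $g(u) = u^2$, and $y = ce^{-t}$. The sketch below is illustrative only; the forward-Euler stepper, the horizon, and the step count are arbitrary choices, and the check simply confirms that halving $c$ roughly quarters the maximal gap, as an $O(\|c\|^2)$ bound predicts:

```python
import math

# Scalar check of (4.7.16): for u' = -u + u^2 (A = -1, g(u) = u^2), the gap
# between the nonlinear solution and the linear solution y = c e^{-t}
# should be O(c^2): halving c should roughly quarter the maximal gap.
def gap(c, t_max=10.0, n=100000):
    h, u, worst = t_max / n, c, 0.0
    for i in range(n):
        worst = max(worst, abs(u - c * math.exp(-i * h)))
        u += h * (-u + u * u)       # forward Euler step
    return worst

g1, g2 = gap(0.1), gap(0.05)
print(3.0 < g1 / g2 < 5.0)
```

The closed form $u = ce^{-t}/(1 - c(1 - e^{-t}))$ shows the gap is approximately $c^2 e^{-t}(1 - e^{-t})$, so the ratio of the two gaps is close to 4.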
4.8. Asymptotic Behavior
Having established the uniform boundedness of $\|x(t)\|$ for $t \ge 0$, we now wish to investigate the asymptotic behavior of $x$ as $t \to \infty$. Let us consider only the simple case where the characteristic roots of $A$, $\lambda_1, \lambda_2,\ldots, \lambda_N$, are distinct. By assumption, $0 > \mathrm{Re}(\lambda_1) \ge \mathrm{Re}(\lambda_2) \ge \cdots \ge \mathrm{Re}(\lambda_N)$. If complex roots occur, they occur in conjugate pairs, since we have taken $A$ real. Let us simplify the situation still further by supposing that $\lambda_1$ is real, since we are primarily interested in pointing out the kind of result that can be obtained rather than deriving the most general result. Since the roots are distinct by assumption, we have

$$e^{At} = \sum_{k=1}^{N} e^{\lambda_k t} A_k,$$   (4.8.1)

where the $A_k$ are constant matrices. Using this representation in the basic nonlinear integral equation of (4.7.1), we may write

$$x = e^{\lambda_1 t}\left[A_1 c + \int_0^t e^{-\lambda_1 s} A_1 g(x)\,ds\right] + \sum_{k=2}^{N} e^{\lambda_k t}\left[A_k c + \int_0^t e^{-\lambda_k s} A_k g(x)\,ds\right].$$   (4.8.2)

As we suspect, the principal contribution on the right-hand side is

$$e^{\lambda_1 t} A_1 c + \int_0^t e^{\lambda_1(t-s)} A_1 g(x)\,ds = e^{\lambda_1 t} A_1 c + e^{\lambda_1 t}\int_0^\infty e^{-\lambda_1 s} A_1 g(x)\,ds - e^{\lambda_1 t}\int_t^\infty e^{-\lambda_1 s} A_1 g(x)\,ds.$$   (4.8.3)
The infinite integral exists since $\lambda_1 < 0$ and $\|A_1 g(x)\|$ is bounded by a term of the form $O(e^{2\lambda_1 t})$. It turns out that the first two terms dominate as $t \to \infty$, with the third term of lower order of magnitude. Let us now establish this statement. To begin with, we show that for $t \ge 0$ we have

$$\|x\| \le b_6 e^{\lambda_1 t}\|c\|,$$   (4.8.4)

where $b_6$ is a constant. We can do this inductively, using the recurrence relation of (4.7.2). The result is valid for $x_0 = y = e^{At}c$, and the induction can be established using the arguments employed in Sec. 4.7. Alternatively, we can start from

$$\|e^{At}\| \le b_7 e^{\lambda_1 t}$$   (4.8.5)

for some scalar $b_7$ independent of $c$. Hence

$$\|x\| \le b_7 e^{\lambda_1 t}\|c\| + b_7 b_2 \int_0^t e^{\lambda_1(t-s)}\|x\|^2\,ds.$$   (4.8.6)

Applying the fundamental inequality, we obtain

$$e^{-\lambda_1 t}\|x\| \le b_7\|c\|\exp\left(b_7 b_2 \int_0^t \|x\|\,ds\right),$$   (4.8.7)

which yields $\|x\| \le b_7 e^{\lambda_1 t} e^{\epsilon t}$, where $\epsilon$ is small since $\|c\| \ll 1$. Returning to (4.8.6), we see that

$$\|x\| \le b_7 e^{\lambda_1 t}\|c\| + b_8 e^{\lambda_1 t}\int_0^t e^{-\lambda_1 s}\|x\|^2\,ds.$$   (4.8.8)

Since $\int_0^t e^{-\lambda_1 s}\|x\|^2\,ds$ converges as $t \to \infty$, for $\|c\|$ sufficiently small we obtain the desired estimate for $\|x\|$.

Let us now return to (4.8.3). The estimate $\|g(x)\| \le b_2\|x\|^2$ for $t \ge 0$, plus that of (4.8.4), allows us to conclude that the integral $\int_0^\infty e^{-\lambda_1 s}\,g(x)\,ds$ converges.
Furthermore,

$$\left\|e^{\lambda_1 t}\int_t^\infty e^{-\lambda_1 s} A_1 g(x)\,ds\right\| \le e^{\lambda_1 t}\int_t^\infty e^{-\lambda_1 s}\|A_1\|\,\|g(x)\|\,ds \le b_{10} e^{\lambda_1 t}\int_t^\infty e^{-\lambda_1 s}\|x\|^2\,ds \le b_{11} e^{\lambda_1 t}\int_t^\infty e^{-\lambda_1 s} e^{2\lambda_1 s}\,ds \le b_{12} e^{2\lambda_1 t}.$$   (4.8.9)

Thus, we see that as $t \to \infty$ the third term in (4.8.3) is negligible compared to the first two. Similarly, we see that

$$\left\|e^{\lambda_k t}\int_0^t e^{-\lambda_k s} A_k g(x)\,ds\right\| \le b_{13} e^{\mathrm{Re}(\lambda_k)t}\int_0^t e^{-\mathrm{Re}(\lambda_k)s}\|g(x)\|\,ds \le b_{14} e^{\mathrm{Re}(\lambda_k)t}\int_0^\infty e^{-\mathrm{Re}(\lambda_k)s} e^{2\lambda_1 s}\,ds = b_{15} e^{\mathrm{Re}(\lambda_k)t}.$$   (4.8.10)

Collecting the foregoing results, we can assert that the following holds:

$$\lim_{t\to\infty} x e^{-\lambda_1 t} = \varphi(c),$$   (4.8.11)

where we introduce the function $\varphi(c)$ to indicate that the limit depends upon the initial value $c$. Let us recall that this was established under the proviso that $\lambda_1$ is a simple root. The case of multiple roots is not more complex in principle, but the details can become onerous due to the presence of terms such as $te^{\lambda_1 t}$, and so on.

Exercise
1. Establish the limiting behavior of $x$ in the case that $\lambda_1$ is a multiple root. Consider first the case where it is a root of multiplicity two.

4.9. The Function $\varphi(c)$
The function $\varphi(c)$ is of interest to us in our subsequent work on approximation and closure. Our next assertion is that $\varphi(c)$ is an analytic function of $c$ for $\|c\|$ sufficiently small. To establish this, we observe that the assumptions concerning $g(x)$ guarantee that each $x_n$ is an analytic function of $c$ for $\|c\| \le c_0$ and for all $t \ge 0$. Since $x_n$ converges uniformly to $x$ for $t \ge 0$, it follows that $x$ is an analytic function of $c$ for $t \ge 0$ and $\|c\| \le c_0$. An examination of the estimates we have employed shows that the limit in (4.8.11) is uniform for $\|c\| \le c_0$. It follows that $\varphi(c)$ is an analytic function as stated. We have thus established the following theorem:
Theorem. Let $g(x)$ be an analytic function of $x$ for $\|x\| \le x_0$, lacking constant and first-order terms; let the characteristic roots of $A$ all have negative real parts, and let the one with largest real part, $\lambda_1$, be simple; suppose finally that $\|c\|$ is sufficiently small. Then the solution of

$$\frac{dx}{dt} = Ax + g(x), \qquad x(0) = c,$$   (4.9.1)

satisfies the asymptotic relation

$$\lim_{t\to\infty} x e^{-\lambda_1 t} = \varphi(c),$$   (4.9.2)

where $\varphi(c)$ is an analytic function of $c$ for $\|c\|$ small.

At the moment, we do not possess an efficient way of determining $\varphi(c)$. Subsequently, in the chapter on invariant imbedding in Volume II, we shall present an effective procedure. This determination of $\varphi(c)$ will provide some information concerning the required degree of smallness of $\|c\|$ and, more generally, the domain of stability. Nonetheless, it must be admitted that we do not possess adequate methods for the determination of this domain at the moment.

Exercises
1. Consider the scalar equation u’ = -u by using the explicit analytic solution.
2. Use the integral equation
+
+ u2, u(0) = c. Calculate p(c) 6
e-k Joe-(l-tl)u2(t,)dt, to calculate ) means of iteration. the first three terms of the power series for ~ ( c by u =
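For Exercise 1, a numerical check is possible: the explicit solution of $u' = -u + u^2$, $u(0) = c$, is $u = ce^{-t}/(1 - c + ce^{-t})$, so $\varphi(c) = \lim_{t\to\infty} ue^t = c/(1 - c)$. A sketch (the RK4 stepper, horizon, and tolerance are arbitrary choices):

```python
import math

# Exercise 1 check: u' = -u + u^2, u(0) = c, has the closed form
# u = c e^{-t} / (1 - c + c e^{-t}), hence phi(c) = lim u e^t = c / (1 - c).
def rk4(f, u0, t1, n):
    """Integrate u' = f(u) from t = 0 to t = t1 with n classical RK4 steps."""
    h, u = t1 / n, u0
    for _ in range(n):
        k1 = f(u)
        k2 = f(u + 0.5 * h * k1)
        k3 = f(u + 0.5 * h * k2)
        k4 = f(u + h * k3)
        u += (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
    return u

c = 0.1
phi_numeric = rk4(lambda u: -u + u * u, c, 15.0, 15000) * math.exp(15.0)
phi_exact = c / (1 - c)
print(abs(phi_numeric - phi_exact) < 1e-6)
```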
4.10. More Refined Asymptotic Behavior
With some further effort and an additional assumption, we can establish the following result:
Theorem. Add to the foregoing assumptions the condition that the characteristic roots of $A$, $\lambda_1, \lambda_2,\ldots, \lambda_N$, are all real and simple, with $0 > \lambda_1 > \lambda_2 > \cdots > \lambda_N$. Then for each $k$ there is an $(N - k + 1)$-dimensional set of initial vectors $c$ such that

$$\lim_{t\to\infty} x e^{-\lambda_k t} = \varphi_k(c)$$   (4.10.1)

holds. The functions $\varphi_k(c)$ are analytic for $\|c\| \le c_0$.

To establish this result, we start with the basic integral equation
and make some suitable adjustments. To eliminate the terms $e^{\lambda_1 t}, e^{\lambda_2 t},\ldots, e^{\lambda_{k-1}t}$, we wish to set

$$A_r c + \int_0^\infty e^{-\lambda_r s} A_r g(x)\,ds = 0$$   (4.10.3)

for $r = 1, 2,\ldots, k - 1$. To accomplish this, we start anew with the integral equation

$$x = -\sum_{i=1}^{k-1} e^{\lambda_i t}\int_t^\infty e^{-\lambda_i s} A_i g(x)\,ds + \sum_{i=k}^{N} e^{\lambda_i t}\left[A_i c + \int_0^t e^{-\lambda_i s} A_i g(x)\,ds\right].$$   (4.10.4)

To establish that this equation possesses a solution with the desired properties, the method of successive approximations can be applied as before, starting with the initial approximation

$$x_0 = \sum_{i=k}^{N} e^{\lambda_i t} A_i c.$$   (4.10.5)

We shall omit the details, which involve some tedious estimations, since we have no particular application of the results subsequently.

Exercise
1. Carry through the proof of the theorem above.

4.11. Analysis of Method of Successive Approximations
Let us pause to analyze our procedure and point out the three basic steps in the application of the method of successive approximations. First, the equation is written in the form

$$x = T(x),$$   (4.11.1)

where $T$ is a transformation derived from the original equation. Frequently, some experimentation is required in order to obtain an appropriate form which permits a convergence proof. As we shall discuss below, there are often alternative formulations which possess different advantages as far as accuracy, stability, ease of computation, and so forth are concerned. In many cases, the choice of an initial approximation is crucial. Generally speaking, $T(x)$ must possess the property of leaving a convenient class of functions invariant. Assuming that we have obtained a suitable transformation, we introduce a sequence of functions by means of the recurrence relation

$$x_{n+1} = T(x_n),$$   (4.11.2)

$n = 0, 1,\ldots$, with $x_0$ prescribed. In other words, we consider the solution of (4.11.1) to be the steady-state limiting form of the dynamic process defined by (4.11.2), that is, $x = \lim_{n\to\infty} x_n$.

To establish the convergence of the sequence $\{x_n\}$, we usually proceed in the following steps. First, we demonstrate that the sequence is bounded in a suitable norm. Following this, we prove the convergence of the sequence by considering the infinite series

$$\sum_{n=0}^{\infty} (x_{n+1} - x_n).$$   (4.11.3)

In Volume II, in our discussion of quasilinearization, we provide an entirely different procedure based on monotone convergence. To obtain an estimate of the terms in this series, we return to the equation in (4.11.2) and write

$$x_{n+1} - x_n = T(x_n) - T(x_{n-1}).$$   (4.11.4)

If $T$ is linear, this simplifies to

$$x_{n+1} - x_n = T(x_n - x_{n-1}).$$   (4.11.5)

In general, however, we must impose some condition on the transformation $T$ which enables us to estimate $\|T(x_n) - T(x_{n-1})\|$ in terms of $\|x_n - x_{n-1}\|$. In the previous discussions a Lipschitz condition suffices, which is to say a bound on the partial derivatives of the components of $g(x)$.

Having established convergence in this fashion, we then turn to the problem of showing that the limit function $x = \lim_{n\to\infty} x_n$ is actually a solution of the original equation. This requires a continuity property of $T$, namely, that

$$T(x) = \lim_{n\to\infty} T(x_n).$$   (4.11.6)
This is easy to establish as a consequence of the bound on $\|T(x) - T(x_n)\|$ in terms of $\|x - x_n\|$. This does not conclude the proof, however. In many cases the equation of (4.11.1) is obtained from the original equation by means of some preliminary transformations, and it may not be easy to show that these transformations are reversible. This is particularly the case in dealing with partial differential equations.
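The three steps just described can be traced on the scalar example of Sec. 4.7, with $T(u) = ce^{-t} + \int_0^t e^{-(t-s)}u(s)^2\,ds$. A minimal sketch (grid size, trapezoidal quadrature, and iteration count are arbitrary choices):

```python
import math

def picard(c, t_max=5.0, n=200, iters=30):
    """Successive approximations u_{n+1} = T(u_n) for the integral equation
    u = c e^{-t} + int_0^t e^{-(t-s)} u(s)^2 ds, i.e. u' = -u + u^2, u(0) = c."""
    h = t_max / n
    t = [i * h for i in range(n + 1)]
    u = [c * math.exp(-tk) for tk in t]               # initial approximation x_0
    for _ in range(iters):
        new = []
        for i, ti in enumerate(t):
            vals = [math.exp(-(ti - t[j])) * u[j] ** 2 for j in range(i + 1)]
            integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1])) if i else 0.0
            new.append(c * math.exp(-ti) + integral)  # u_{n+1} = T(u_n)
        u = new
    return t, u

t, u = picard(0.1)
# closed form u = c e^{-t} / (1 - c(1 - e^{-t})) for comparison
exact = [0.1 * math.exp(-tk) / (0.9 + 0.1 * math.exp(-tk)) for tk in t]
err = max(abs(a - b) for a, b in zip(u, exact))
print(err < 1e-4)
```

The iteration converges geometrically here because $T$ is a contraction on a ball of small functions, exactly the property the convergence proofs above exploit.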
4.12. Fixed-point Methods
One of the advantages of using the representation

$$x = T(x)$$

is that it is apparent that the desired solution is a fixed point of the transformation $T$. It is tempting then, stimulated by the finite-dimensional fixed-point theorem of Brouwer, to see if one can establish the existence of a solution using only very general properties of $T$, such as continuity with respect to a suitable norm. This method, introduced by Birkhoff and Kellogg and greatly extended by Leray and Schauder and others, is a very elegant and powerful one. Existence theorems based upon fixed-point methods play a vital role in the formulation of mathematical models in new domains of science. These existence theorems, derived on the basis of simple characteristics of $T$, show that a problem is well-posed. In return, these methods are nonconstructive. They provide no algorithm for the determination of $x$. As we shall see in subsequent chapters, however, once existence, and hopefully uniqueness, of the solution has been demonstrated, a number of paths open for the calculation of the solution. This is particularly so now that digital computers are available.
4.13. Time-dependent Equations over Finite Intervals
The preceding methods will now be used to obtain an estimate of the norm of the difference between the solutions of the two equations

$$\text{(a)}\;\; \frac{dx}{dt} = g(x), \quad x(0) = c; \qquad \text{(b)}\;\; \frac{dy}{dt} = g(y) + h(t), \quad y(0) = c,$$   (4.13.1)

over a finite interval $[0, T]$. Our basic assumption is that (4.13.1a) has a solution over $[0, T]$. Furthermore, we suppose that $g(x)$ is analytic in $x$ in a neighborhood of $x = c$, which guarantees uniqueness. As we shall see, the full force of this assumption is not necessary. The analysis can be carried out under various assumptions concerning $\|h\|$, where any of a number of different norms may be employed. Results of this type will be used in subsequent chapters, specifically in Chapter 5.

To bring this problem into a form similar to that already discussed, let us write $y = x + w$, where we assume that $x$ exists over $[0, T]$. Then

$$\frac{dw}{dt} = g(x + w) - g(x) + h(t), \qquad w(0) = 0.$$   (4.13.2)
Under the assumption that $g(x)$ is analytic in $x$, we can write, using the multidimensional Taylor expansion,

$$g(x + w) - g(x) = J(x)w + \varphi(x, w),$$   (4.13.3)

where $J(x)$ is the Jacobian matrix. By assumption, provided that $\|w\|$ is sufficiently small,

$$\|\varphi(x, w)\| \le b_1\|w\|^2,{}^*$$   (4.13.4)

where the choice of $b_1$ depends upon the values assumed by $x$ in $[0, T]$. Thus, $w$ satisfies the equation

$$\frac{dw}{dt} = J(x)w + \varphi(x, w) + h(t), \qquad w(0) = 0.$$   (4.13.5)

In order to treat the problem of the estimation of $\|w\|$ in terms of $\|h\|$, we use the properties of linear systems with variable coefficients as in Sec. 4.3. Let $X(t)$ denote the solution of the matrix equation

$$\frac{dX}{dt} = J(x)X, \qquad X(0) = I.$$   (4.13.6)

Then the solution of (4.13.5) satisfies the integral equation

$$w = \int_0^t X(t)X(t_1)^{-1}\,[\varphi(x, w) + h(t_1)]\,dt_1.$$   (4.13.7)

* We now introduce another set of constants $b_1, b_2,\ldots$.
Let us now employ successive approximations, both to establish the existence of a solution of (4.13.7) and to estimate its norm. Set

$$w_0 = \int_0^t X(t)X(t_1)^{-1}\,h(t_1)\,dt_1, \qquad w_{n+1} = \int_0^t X(t)X(t_1)^{-1}\,[\varphi(x, w_n) + h(t_1)]\,dt_1.$$   (4.13.8)

We now show inductively that

$$\|w_n\|_M \le 2b_2\|h\|_M,$$   (4.13.9)

where we have introduced the new norm

$$\|f\|_M = \max_{0 \le t \le T}\|f(t)\|.$$   (4.13.10)

We have

$$\|w_0\| \le \int_0^t \|X(t)X(t_1)^{-1}\|\,\|h(t_1)\|\,dt_1.$$   (4.13.11)

Since $X(t)^{-1}$ satisfies the adjoint equation

$$\frac{dY}{dt} = -YJ(x), \qquad Y(0) = I,$$   (4.13.12)

we see that $\|X(t)^{-1}\|$ is uniformly bounded for $0 \le t \le T$, under the assumption that $x$ exists in $[0, T]$. Hence

$$\|w_0\|_M \le b_2\|h\|_M.$$   (4.13.13)

Let us now proceed with the induction. We have

$$\|w_{n+1}\|_M \le \int_0^t \|X(t)X(t_1)^{-1}\|\,\|\varphi(x, w_n)\|\,dt_1 + \|w_0\|_M \le b_3\|w_n\|_M^2 + b_2\|h\|_M \le 4b_2^2 b_3\|h\|_M^2 + b_2\|h\|_M \le 2b_2\|h\|_M,$$   (4.13.14)
provided that $\|h\|_M$ is sufficiently small. The convergence of the sequence $\{w_n\}$ follows a familiar route. Hence, we have established the following theorem:

Theorem. Let the equation

$$\frac{dx}{dt} = g(x), \qquad x(0) = c,$$   (4.13.15)

possess a solution over $[0, T]$. Let $g(x)$ be analytic in $x$ (guaranteeing uniqueness), and let $\|h\|_M$, as defined above, be sufficiently small. Then the equation

$$\frac{dy}{dt} = g(y) + h(t), \qquad y(0) = c,$$   (4.13.16)

possesses a unique solution over $[0, T]$, and $\|x - y\|_M \le b_4\|h\|_M$.

The uniqueness of the solution of (4.13.16) is a consequence of the postulated analyticity of $g(y)$. We can reduce the requirement of analyticity, but this is of little interest since this condition is usually fulfilled in important applications.
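The linear dependence of $\|x - y\|_M$ on $\|h\|_M$ is easy to observe numerically. A sketch for the scalar case $g(u) = -u + u^2$ with forcing $h(t) = \delta\sin t$ (the Euler step size, horizon, and the factor-of-two check are arbitrary choices):

```python
import math

def solve(c, delta, t_max=5.0, n=50000):
    """Euler integration of u' = -u + u^2 + delta*sin(t), u(0) = c."""
    h, u, out = t_max / n, c, [c]
    for i in range(n):
        u += h * (-u + u * u + delta * math.sin(i * h))
        out.append(u)
    return out

c = 0.1
x = solve(c, 0.0)                          # unforced solution
gap = lambda d: max(abs(a - b) for a, b in zip(x, solve(c, d)))
ratio = gap(0.02) / gap(0.01)
print(1.8 < ratio < 2.2)                   # doubling delta roughly doubles the gap
```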
Exercises

1. Suppose that $x' = g(x)$, $x(0) = c$, possesses a unique solution over $[0, T]$, where $g(x)$ satisfies a Lipschitz condition. Then $y' = g(y) + h(y)$, $y(0) = c$, possesses a unique solution provided that $h(y)$ satisfies a Lipschitz condition in the neighborhood of $y = x$, and $\|h(x)\| \le \epsilon$, where $\epsilon$ is sufficiently small.

2. Extend the result of the preceding section to cover the case where $dx/dt = g(x, t)$, $dy/dt = g(y, t) + h(t)$.

3. Consider the general case $dy/dt = g(y, t) + h(y, t)$ under suitable assumptions.

4. Consider the case where $x(0) = c$, $y(0) = d$, $\|c - d\| \le \epsilon$.
4.14. Alternative Norm

In some cases, we want to use a different norm. For example, we may want to use the norm

$$\|f\|_2 = \left(\int_0^T \|f\|^2\,dt\right)^{1/2}.$$   (4.14.1)

The same type of argument as before establishes the existence of $y$ and the inequality

$$\|x - y\| \le b_3\|h\|_2$$   (4.14.2)

or

$$\|x - y\|_2 \le b_4\|h\|_2,$$   (4.14.3)

under the assumption that $\|h\|_2$ is sufficiently small. We shall employ these results in connection with both differential approximation in Chapter 6 and the Bubnov–Galerkin method in Chapter 5.

4.15. Perturbation Techniques
It is also possible to study the equation

$$\frac{dx}{dt} = Ax + g(x), \qquad x(0) = c,$$   (4.15.1)

by means of perturbation techniques. We can often profitably regard the coefficients in the expansion of $g(x)$ as parameters. Thus, for example, given the scalar equation

$$u' = -u + \epsilon u^2, \qquad u(0) = c,$$   (4.15.2)

we can write

$$u = u_0 + \epsilon u_1 + \epsilon^2 u_2 + \cdots,$$   (4.15.3)

and thereby obtain an infinite sequence of differential equations

$$u_0' = -u_0, \quad u_0(0) = c; \qquad u_1' = -u_1 + u_0^2, \quad u_1(0) = 0; \qquad u_2' = -u_2 + 2u_0 u_1, \quad u_2(0) = 0; \;\ldots.$$   (4.15.4)

These equations can be solved recurrently, since the $n$th equation is linear in $u_n$. It is easy to establish the convergence of the series in (4.15.3) for $|\epsilon|$ small. In general, we can assert:
Theorem. Consider the equation

$$\frac{dx}{dt} = Ax + g(x, \epsilon), \qquad x(0) = c.$$   (4.15.5)

If $g(x, \epsilon)$ is analytic in $x$ and $\epsilon$ for $\|x\|$ and $|\epsilon|$ small, and the previous conditions concerning $A$ and $g$ are satisfied, then $x$ is analytic in $\epsilon$ for $|\epsilon|$ small, and each of the functions $\varphi_k(c)$ introduced above is analytic in $\epsilon$, for $t \ge 0$.

Since we shall not use the result, we will not present the details of the proof.

Exercise

1. Establish the convergence of the series in (4.15.3) for $|\epsilon| < 1$.
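Taking (4.15.2) in the form $u' = -u + \epsilon u^2$, the cascade (4.15.4) can be solved in closed form: $u_0 = ce^{-t}$, $u_1 = c^2(e^{-t} - e^{-2t})$, $u_2 = c^3(e^{-t} - 2e^{-2t} + e^{-3t})$, and the three-term truncation error should then be $O(\epsilon^3)$. A numerical sketch (the closed form of the scalar equation is used for comparison; the test values of $c$, $t$, $\epsilon$ are arbitrary):

```python
import math

def series(c, eps, t):
    # first three terms of the perturbation series (4.15.3)
    u0 = c * math.exp(-t)
    u1 = c * c * (math.exp(-t) - math.exp(-2 * t))
    u2 = c ** 3 * (math.exp(-t) - 2 * math.exp(-2 * t) + math.exp(-3 * t))
    return u0 + eps * u1 + eps * eps * u2

def exact(c, eps, t):
    # closed form of u' = -u + eps*u^2, u(0) = c (Bernoulli equation)
    return c * math.exp(-t) / (1 - eps * c * (1 - math.exp(-t)))

c, t = 0.5, 1.0
err = lambda e: abs(series(c, e, t) - exact(c, e, t))
ratio = err(0.2) / err(0.1)
print(6.0 < ratio < 10.0)   # error ~ eps^3, so the ratio is near 8
```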
4.16. Second Method of Lyapunov
In the previous pages we presented one powerful approach to the discussion of the stability of the solutions of an important class of differential equations. Another method of great importance is the "second method of Lyapunov." This method can be applied to many situations where the nonlinear term is of unconventional structure, and also quite easily to more complex functional equations. On the other hand, it furnishes much less information concerning the nature of the solution. Since the technique is closely allied with the theory of differential inequalities, we will defer any discussion until the appropriate chapter in Volume II.

4.17. Solution of Linear Systems
The foregoing discussion of the nonlinear equation

$$\frac{dx}{dt} = Ax + g(x), \qquad x(0) = c$$   (4.17.1)

implicitly assumes that there are various attractive and simple aspects to the linear equation

$$\frac{dy}{dt} = Ay, \qquad y(0) = c.$$   (4.17.2)

If the dimension of $y$ is small, we can reasonably think of using the explicit analytic form of $y$ to calculate values of $y(t)$. If the dimension is large, it may be far more convenient to calculate these values using some other approach. If a method based upon difference equations is employed, considerable difficulty can be encountered if the characteristic roots of $A$ vary greatly in magnitude. This corresponds to the fact that (4.17.2) describes a physical process in which different subprocesses are going on with vastly disparate time scales. The problem encountered here is very close to the problem of ill-conditioning discussed in Chapter 2. We would like to emphasize the fact that there is nothing routine about the solution of high-dimensional linear systems of differential equations. Novel types of stability questions arise, particularly in connection with two-point boundary value problems.
4.18. Origins of Two-point Boundary Value Problems

We now wish to examine some properties of nonlinear differential equations subject to two-point boundary conditions. Equations of this nature arise in several important ways. In the first place, in the process of minimizing a functional of the form

$$J(x) = \int_0^T [(x', x') + (x, Bx) + h(x)]\,dt,$$   (4.18.1)

subject to $x(0) = c$, we obtain the Euler equation

$$x'' - Bx - g(x) = 0, \qquad x(0) = c, \qquad x'(T) = 0,$$   (4.18.2)

where $g$ is obtained in a simple fashion from $h$. We will examine the rigorous details in the chapter on the calculus of variations (Chapter 7).

Two questions are now of interest. If a unique solution to (4.18.2) exists when $g(x) = 0$, the linear case, does a unique solution exist if $\|g(x)\|$ is small? Secondly, if $\|g(x)\| = o(\|x\|)$ as $\|x\| \to 0$, and $\|c\|$ is small, what is the asymptotic behavior of $\min_x J(x)$ as $T \to \infty$? These are important questions in connection with the theory of control processes.

Questions of a similar nature arise in the study of transport processes in plane-parallel slabs of finite thickness $T$. If we discretize both the angle and the energy of the particle, we obtain a finite-dimensional set of vector equations, (4.18.3), where, as indicated in Fig. 4.1, $x$ may be regarded as the right-hand flux and $y$ the left-hand flux. The determination of limiting behavior as $T \to \infty$ is of interest in connection with the properties of very thick slabs and of planetary atmospheres, which can often be considered semi-infinite.

Figure 4.1
4.19. Stability Theorem for Two-point Boundary Value Problem

To illustrate the kind of result that can be obtained, consider the scalar equation

$$u'' - u = g(u), \qquad u(0) = c, \qquad u'(T) = 0.$$   (4.19.1)

As before, we proceed to convert this into an integral equation,

$$u = u_0 + \int_0^T K(t, t_1, T)\,g(u(t_1))\,dt_1,$$   (4.19.2)

where

$$u_0 = \frac{c\cosh(t - T)}{\cosh T}$$   (4.19.3)

and the Green's function is given by

$$K(t, t_1, T) = -\frac{\cosh(T - t)\sinh t_1}{\cosh T}, \quad t_1 \le t; \qquad K(t, t_1, T) = -\frac{\cosh(T - t_1)\sinh t}{\cosh T}, \quad t \le t_1.$$   (4.19.4)

Observe that as $T \to \infty$,

$$K(t, t_1, T) \sim -\tfrac{1}{2}e^{-(t - t_1)}, \quad 0 \le t_1 \le t; \qquad K(t, t_1, T) \sim -\tfrac{1}{2}e^{-(t_1 - t)}, \quad 0 \le t \le t_1 < \infty.$$   (4.19.5)

To demonstrate the existence of a solution of (4.19.2), and hence of (4.19.1), under the usual assumptions, namely

$$|g(u)| \le b_1 u^2 \quad \text{for } |u| \text{ small},$$   (4.19.6)

we employ successive approximations,

$$u_{n+1} = u_0 + \int_0^T K(t, t_1, T)\,g(u_n(t_1))\,dt_1, \qquad n = 0, 1,\ldots,$$   (4.19.7)

and follow the usual steps. With the aid of the bounds on $K(t, t_1, T)$ given in (4.19.5), there is no difficulty in completing the proof of the existence of one solution of (4.19.1) and (4.19.2). The question of uniqueness is, however, more complex. There is no difficulty in showing that this solution is unique in the class of functions satisfying a constraint of the type $|u| \le k_1|c|$, for some constant $k_1$, for $|c| \le 1$.
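The successive-approximation scheme (4.19.7) is directly computable. A sketch for $g(u) = u^2$ on a grid (rectangle-rule quadrature and the iteration count are arbitrary choices; the check below only confirms that $u(0) = c$ is preserved automatically by the kernel and that the nonlinear correction to $u_0$ is small, of order $c^2$):

```python
import math

# Iterate u_{n+1} = u0 + int_0^T K(t, t1, T) u_n(t1)^2 dt1   (cf. (4.19.2))
# for u'' - u = u^2, u(0) = c, u'(T) = 0.
c, T, n = 0.1, 3.0, 150
h = T / n
t = [i * h for i in range(n + 1)]

def K(ti, t1):
    lo, hi = min(ti, t1), max(ti, t1)
    return -math.cosh(T - hi) * math.sinh(lo) / math.cosh(T)   # (4.19.4)

u0 = [c * math.cosh(ti - T) / math.cosh(T) for ti in t]        # (4.19.3)
u = u0[:]
for _ in range(15):
    u = [u0[i] + h * sum(K(t[i], t[j]) * u[j] ** 2 for j in range(n + 1))
         for i in range(n + 1)]

d = max(abs(a - b) for a, b in zip(u, u0))
print(abs(u[0] - c) < 1e-9 and 0 < d < 0.02)
```

Note that $K(0, t_1, T) = 0$ for all $t_1$, so the boundary value $u(0) = c$ is maintained exactly at every iteration.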
4.20. Asymptotic Behavior

As $T \to \infty$, the equation in (4.19.2) formally approaches the equation

$$v = v_0 + \int_0^\infty K(t, t_1)\,g(v(t_1))\,dt_1,$$   (4.20.1)

where

$$v_0 = \lim_{T\to\infty} u_0, \qquad K(t, t_1) = \lim_{T\to\infty} K(t, t_1, T).$$   (4.20.2)

The function $v$ satisfies the equation

$$v'' - v = g(v),$$   (4.20.3)

with $v(0) = c$ and, presumably, $v'(\infty) = 0$. Existence of a solution of the above under the usual conditions on $g$ and $c$ is readily demonstrated, as well as uniqueness in the class of functions satisfying a bound $|v| \le k_2|c|$. We leave as an exercise the task of showing that

$$\lim_{T\to\infty} u(t) = v(t),$$   (4.20.4)

where $u(t)$ is determined by (4.19.1) and the further condition that $|u(t)| \le k_1|c|$.

Analogous results can be obtained for the vector-matrix equation

$$x'' - Bx = g(x), \qquad x(0) = c, \qquad x'(T) = 0,$$   (4.20.5)

as well as the equations in (4.18.3).
4.21. Numerical Aspects of Linear Two-point Boundary Value Problems

In Sec. 4.17 we mentioned the fact that linear differential equations subject to initial values can present serious computational difficulties. When two-point boundary conditions are present, the situation can exhibit further obstacles. As we know (see Sec. 3.15), the solution of

$$x'' - Ax = 0, \qquad x(0) = d^{(1)},$$   (4.21.1)

with a terminal condition of the form

$$x(T) = d^{(2)} \qquad \text{or} \qquad x'(T) = d^{(3)},$$   (4.21.2)

along classical lines involves the solution of initial value equations plus the solution of a linear system of algebraic equations. Let us suppose that this has been carried out. The determination of the missing value $x'(0)$ will inevitably involve some numerical error. It is essential to investigate the effect of this inaccuracy in the value of $x'(0)$ upon the value of $x(t)$ for $t > 0$. To illustrate what can then occur even in the simplest cases, consider the scalar equation

$$u'' - u = 0, \qquad u(0) = 1, \qquad u'(T) = 0.$$   (4.21.3)

The solution is readily seen to be

$$u = \frac{\cosh(t - T)}{\cosh T},$$   (4.21.4)

a function having the form illustrated in Fig. 4.2. The initial condition for $u'(0)$ is seen to be $u'(0) = -\tanh T \approx -1$ for large $T$.

Figure 4.2

Suppose we change the initial conditions to read

$$u(0) = 1 + \Delta, \qquad u'(0) = -\tanh T + \Delta.$$   (4.21.5)

We introduce a small error in the values of both $u(0)$ and $u'(0)$ to simplify the subsequent analysis. The solution corresponding to these new conditions is readily seen to be given by

$$u = \frac{\cosh(t - T)}{\cosh T} + \Delta e^t.$$   (4.21.6)

If $\Delta$ is small and $T$ is not too large, which is to say if we are not integrating over too large an interval, the unwanted term $\Delta e^t$ causes no particular trouble. If, however, $T \gg 1$, we can easily observe a situation of the type in Fig. 4.3, where the numerical solution departs sharply from the true solution.

Figure 4.3

We will discuss ways of circumventing this unpleasant behavior in the chapters on dynamic programming and invariant imbedding in Volume II. Observe that if $A$ is positive definite in (4.21.1), we automatically have particular solutions with both positive and negative exponents. In the matrix case, a condition equivalent to $T \gg 1$ can easily occur even if $T$ is not large, by virtue of the existence of characteristic roots varying greatly in magnitude. This situation arises in many parts of science, such as biology, chemistry, and physics.
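The growth of the parasitic term $\Delta e^t$ is easy to reproduce. A sketch (the first-order integration scheme and the particular values of $T$ and $\Delta$ are arbitrary choices; the true solution (4.21.4) never exceeds 1, while the slope error is amplified by roughly $e^T/2$):

```python
import math

# u'' - u = 0, u(0) = 1, integrated as a "shooting" problem with the slope
# u'(0) = -tanh(T) + delta.  The parasitic component delta*e^t is amplified
# by roughly e^T/2, although cosh(t-T)/cosh(T) never exceeds 1 on [0, T].
def shoot(T, delta, n=100000):
    h, u, v = T / n, 1.0, -math.tanh(T) + delta
    for _ in range(n):
        u, v = u + h * v, v + h * u    # crude first-order integration
    return u                            # computed u(T); exact value is 1/cosh(T)

T = 10.0
err = abs(shoot(T, 1e-3) - 1.0 / math.cosh(T))
print(err > 1.0)   # a 1e-3 slope error has grown by orders of magnitude
```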
Exercises

1. Consider the equation $u'' - au = 0$, $u(0) = 1$, $u'(0) = r$, with $a > 0$. Write $v = u'u^{-1}$. Show that $v$ satisfies the Riccati equation $v' + v^2 - a = 0$, $v(0) = r$. Examine the stability of the particular solution $v = a^{1/2}$.

2. Consider the equation $w' + w^2 - a = p(t)$, $w(0) = r_1$. If $|r_1| \ll 1$ and $|p(t)| \le 1$ for $t > 0$, is $w$ bounded?

3. Consider the matrix equation $X'' - AX = 0$, $X(0) = I$, $X'(0) = R_0$, with $A > 0$. Write $R = X'X^{-1}$. Show that $R' = A - R^2$, $R(0) = R_0$.

4. Examine the stability of the steady-state solution $R = A^{1/2}$. Hint: Write $R = A^{1/2} + S$ and consider the equation $S' = -A^{1/2}S - SA^{1/2} - S^2$, $S(t_0) = S_0$, $t \ge t_0$. Then $S$ satisfies the equation

$$S = e^{-B(t-t_0)} S_0 e^{-B(t-t_0)} - \int_{t_0}^{t} e^{-B(t-t_1)} S^2 e^{-B(t-t_1)}\,dt_1,$$

where $B = A^{1/2}$, etc.

5. Consider the stability of solutions of $R' = A(t) - R^2$.
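For Exercise 1, the stability of the particular solution $v = a^{1/2}$ can be seen numerically: solutions of $v' = a - v^2$ started above $-a^{1/2}$ are attracted to $a^{1/2}$. A sketch (Euler integration; the value of $a$, the initial slopes, and the tolerance are arbitrary choices):

```python
# Exercise 1: v = u'/u satisfies the Riccati equation v' = a - v^2.  The
# particular solution v = sqrt(a) is stable: initial values above -sqrt(a)
# are attracted to it (here a = 4, sqrt(a) = 2).
def riccati(v0, a=4.0, t_max=6.0, n=60000):
    h, v = t_max / n, v0
    for _ in range(n):
        v += h * (a - v * v)      # forward Euler step
    return v

ok = all(abs(riccati(v0) - 2.0) < 1e-6 for v0 in (-1.0, 0.5, 3.0))
print(ok)
```

The other root, $v = -a^{1/2}$, is unstable; this is the scalar picture behind the matrix stability question of Exercise 4.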
4.22. Difference Methods
Use of a digital computer to obtain the numerical solution of a differential equation of the form

$$x' = g(x), \qquad x(0) = c,$$   (4.22.1)

requires that this equation be replaced by an approximate equation whose numerical solution requires only the elementary arithmetic operations. One procedure of this type (by no means the most efficient) uses the difference equation

$$y(t + \Delta) - y(t) = \Delta\,g(y(t)), \qquad y(0) = c,$$   (4.22.2)

where $t$ assumes only the values $0, \Delta, 2\Delta,\ldots$. The function $y(t)$ may then be defined for all values of $t$ by use of linear interpolation. This approximation technique is of major significance, both analytically and computationally, particularly in the domain of partial differential equations. As discussed in Volume II, it often enables us to demonstrate fundamental properties of the solution in an inductive fashion.
Denote the solution of (4.22.2), as extended by linear interpolation, by $y(t, \Delta)$. The question we pose is that of estimating the error $\|x(t) - y(t, \Delta)\|$, $0 \le t \le T$, when $\Delta$ is small. What we wish to point out is that this basic problem of approximation can be regarded as one within the classical domain of stability theory.

Let $g(x)$ satisfy a uniform Lipschitz condition for $\|x\| \le b_1$. Take $\|c\| < b_1$ and assume that $T$ is small enough so that $x$ remains in the desired range. Then (4.22.1) yields

$$x(t + \Delta) - x(t) = \int_t^{t+\Delta} g(x)\,dt_1 = \Delta\,g(x(t)) + o(\Delta).$$

We see then that the closeness of $x$ to $y$ can be considered as a statement concerning the stability of solutions of the difference equation of (4.22.2). The methods presented above can be readily applied to resolve this question when appropriate assumptions are made concerning $g$. A typical result of this nature is given in the next section.
Exercises

1. Consider the scalar recurrence relation $v(t + \Delta) - v(t) = a\Delta v(t)$, $v(0) = 1$, $t = 0, \Delta,\ldots$, and the associated differential equation $u' = au$, $u(0) = 1$. Write $v(t, \Delta)$ for the solution of the difference equation. Show that

$$v(t, \Delta) - u(t) = \Delta u_1(t) + \Delta^2 u_2(t) + \cdots$$

for small $\Delta$ and $t = n\Delta$, $n = 0, 1, 2,\ldots$, where the $u_i(t)$ depend on $t$ alone.

2. Hence, consider $2v(t, \Delta/2) - v(t, \Delta)$ as an improved estimate for $u(t)$. Is there any computational advantage to using this approximation rather than $v(t, \Delta/2)$? (This is a particular example of "deferred passage to the limit." It is illustrative of the fact that information concerning the analytic structure of the solution can often be used to obtain a greatly improved computational procedure.)
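Exercise 2 can be checked directly: for $v(t + \Delta) - v(t) = a\Delta v(t)$, $v(0) = 1$, the solution at $t = n\Delta$ is $(1 + a\Delta)^n$, and the combination $2v(t, \Delta/2) - v(t, \Delta)$ cancels the $O(\Delta)$ term of the error expansion. A sketch (the values of $a$, $t$, $\Delta$, and the factor-of-50 comparison are arbitrary choices):

```python
import math

# v(t + D) - v(t) = a*D*v(t), v(0) = 1, has the explicit solution
# v(t, D) = (1 + a*D)^(t/D).  The extrapolation 2*v(t, D/2) - v(t, D)
# cancels the O(D) term of the expansion v(t, D) = u(t) + D*u1(t) + ... .
def v(t, D, a=1.0):
    return (1.0 + a * D) ** round(t / D)

t = 1.0
u = math.exp(t)                       # exact solution of u' = u, u(0) = 1
plain = abs(v(t, 0.01) - u)
extrap = abs(2 * v(t, 0.005) - v(t, 0.01) - u)
print(extrap < plain / 50.0)          # O(D^2) versus O(D)
```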
4.23. Difference Equations
Consider the recurrence relation

$$x_{n+1} = g(x_n), \qquad x_0 = c.$$   (4.23.1)

We can regard this relation as defining the $n$th iterate of $g$, the vector $x_n$, or we can think of it as a difference equation. As we have indicated above, equations of this type arise in connection with the numerical solution of differential equations when we employ a difference approximation of some type. Since numerical solution by means of a digital computer necessarily entails errors of various types, we must accept the fact that (4.23.1) is to be replaced by the perturbed equation

$$y_{n+1} = g(y_n) + \epsilon_n, \qquad y_0 = c,$$   (4.23.2)

when used for computational purposes. Here $\epsilon_n$ arises both as a consequence of round-off error and the arithmetic error involved in the evaluation of $g(y_n)$. We see then that any estimate of the accuracy of the computational solution involves stability analysis. If

$$g(x) = Ax + h(x),$$   (4.23.3)

where $h(x)$ contains second-order and higher terms in its power series expansion around $x = 0$, we can proceed exactly as before to study the relation between $x_n$ and $y_n$.
4.24. Proof of Stability
The first step is that of obtaining the solution of the linear inhomogeneous equation

$$y_{n+1} = Ay_n + f_n, \qquad y_0 = c,$$   (4.24.1)

in terms of the solution of

$$x_{n+1} = Ax_n, \qquad x_0 = c.$$   (4.24.2)

Setting $y_n = A^n z_n$, we have, from (4.24.1),

$$z_{n+1} = z_n + A^{-(n+1)} f_n.$$   (4.24.3)
This step is only valid if $A^{-1}$ exists but, as we shall see, this requirement does not enter into the final result. From (4.24.3), we obtain

$$z_n = c + \sum_{k=0}^{n-1} A^{-(k+1)} f_k.$$   (4.24.4)

Hence

$$y_n = A^n z_n = A^n c + \sum_{k=0}^{n-1} A^{n-k-1} f_k = x_n + \sum_{k=0}^{n-1} A^{n-k-1} f_k.$$   (4.24.5)

We now observe that only nonnegative powers of $A$ appear. Furthermore, we can verify by direct substitution that (4.24.5) is a solution of (4.24.1), and therefore clearly the solution.

Next, we convert $y_{n+1} = Ay_n + h(y_n) + \epsilon_n$ into the nonlinear difference equation

$$y_n = A^n c + \sum_{k=0}^{n-1} A^{n-k-1}\,[h(y_k) + \epsilon_k].$$   (4.24.6)

From this point on, the study of the boundedness and asymptotic behavior of $y_n$ as $n \to \infty$ is precisely analogous to that previously given for differential equations.

Exercises
1. State an analogue of the Poincaré–Lyapunov result.

2. Obtain the detailed asymptotic behavior of $y_n$ as $n \to \infty$.

3. Let the characteristic roots of $A$ be $\lambda_1, \lambda_2,\ldots, \lambda_N$ with $1 > |\lambda_1| > |\lambda_2| \ge \cdots \ge |\lambda_N|$. Under what conditions on $h$ does $\lim_{n\to\infty} y_n\lambda_1^{-n} = \varphi(c)$ exist?

4. Consider the difference equation $u(t + \Delta) - u(t) = g(u)\Delta$, $t = 0, \Delta,\ldots$, where $g(u)$ is analytic in $u$ for $|u| \le b_1$. Under what conditions on $g$ can we write $u(t) = u(t, \Delta) = u_0(t) + u_1(t)\Delta + u_2(t)\Delta^2 + \cdots$, where the $u_i(t)$ are independent of $\Delta$?
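The discrete variation-of-constants formula (4.24.5) can be verified by direct iteration. A sketch with a hypothetical $2 \times 2$ matrix and forcing sequence (all numerical values are arbitrary):

```python
# Direct check of the discrete variation-of-constants formula (4.24.5):
# y_{n+1} = A y_n + f_n implies y_n = A^n c + sum_{k<n} A^{n-k-1} f_k.
# Hypothetical 2x2 data; plain-Python matrix-vector helpers.
def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]

def add(x, y):
    return [p + q for p, q in zip(x, y)]

A = [[0.5, 0.2], [-0.1, 0.3]]
c = [1.0, -2.0]
f = [[0.1 * k, 0.05] for k in range(6)]    # arbitrary forcing sequence

y = c                                       # iterate the recurrence directly
for k in range(6):
    y = add(matvec(A, y), f[k])

def apow(x, m):                             # x -> A^m x
    for _ in range(m):
        x = matvec(A, x)
    return x

z = apow(c, 6)                              # A^6 c + sum_k A^{6-k-1} f_k
for k in range(6):
    z = add(z, apow(f[k], 6 - k - 1))

diff = max(abs(p - q) for p, q in zip(y, z))
print(diff < 1e-12)
```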
4.25. Analysis of Stability Proof
The method we followed in the preceding pages is one of the two fundamental approaches in stability theory. The other, Lyapunov's second method mentioned above, will be presented in Volume II.
It is important to analyze the various stages involved in the approach we followed, since similar questions of stability of the equilibrium position arise in connection with differential-difference equations such as

$$\frac{dx}{dt} = Ax(t) + Bx(t - 1) + g(x(t)), \quad t \ge 1; \qquad x(t) = h(t), \quad 0 \le t \le 1,$$   (4.25.1)

parabolic partial differential equations such as

$$u_t = u_{xx} + g(u), \quad 0 < x < 1, \; t > 0; \qquad u(0, t) = u(1, t) = 0; \qquad u(x, 0) = \varphi(x), \quad 0 < x < 1,$$   (4.25.2)

and in connection with many other classes of functional equations in wide areas of mathematical physics and engineering.

Characteristically, there are three stages. We begin with the equation $L(u) = g(u)$, where $L$ is a linear operator and $g(u)$ is "small" with respect to $u$ in some sense. It is first of all necessary to obtain an expression for the solution of the linear inhomogeneous equation

$$L(u) = g.$$   (4.25.3)
This is usually done by means of the adjoint operator, yielding an equation of the form
$$u = u_0 + K(g), \tag{4.25.4}$$
where $K$ is a linear operator and $u_0$ is the solution of the homogeneous equation. Next, we replace $g$ by $g(u)$ and transform the original nonlinear equation
$$L(u) = g(u) \tag{4.25.5}$$
into the nonlinear equation
$$u = u_0 + K(g(u)). \tag{4.25.6}$$
This is almost always an integral equation. This equation is then solved by means of successive approximations,
$$u_{n+1} = u_0 + K(g(u_n)), \qquad n = 0, 1, \ldots. \tag{4.25.7}$$
The convergence of $\{u_n\}$ is usually established by means of consideration of the series $\sum_n (u_{n+1} - u_n)$. Finally, there is the task of showing that the solution of (4.25.6) yields a solution of (4.25.5).
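The three stages can be exercised on a small model problem. The following sketch is our illustration (the choice $u' = -u + \epsilon u^2$, $u(0) = 1$ is not from the text): the linear part gives $u_0 = e^{-t}$, the operator $K$ is convolution with $e^{-(t-s)}$, and the iteration (4.25.7) converges rapidly for small $\epsilon$:

```python
import math

# Stages of Sec. 4.25 for u' = -u + eps*u^2, u(0) = 1:
# (4.25.6) reads u(t) = e^{-t} + eps * int_0^t e^{-(t-s)} u(s)^2 ds,
# and we iterate (4.25.7) on a grid, comparing with the known exact solution.
eps, T, N = 0.1, 2.0, 400
h = T / N
ts = [i * h for i in range(N + 1)]
u0 = [math.exp(-t) for t in ts]          # solution of the linear part
u = u0[:]
for _ in range(30):                      # successive approximations (4.25.7)
    new = []
    for i, t in enumerate(ts):
        vals = [math.exp(-(t - ts[j])) * u[j] ** 2 for j in range(i + 1)]
        # trapezoidal rule for the integral operator K(g(u)) on [0, t]
        integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1])) if i > 0 else 0.0
        new.append(u0[i] + eps * integral)
    u = new

exact = [1.0 / (eps + (1.0 - eps) * math.exp(t)) for t in ts]
err = max(abs(x - y) for x, y in zip(u, exact))
assert err < 1e-3   # only the trapezoidal discretization error remains
```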
The second step depends crucially upon establishing certain properties of the operator $K$. We shall discuss some aspects of this for the partial differential equation in Volume II. For the differential-difference equation, Laplace transform techniques are useful. References will be found at the end of the chapter.

4.26. The General Concept of Stability
We have so far discussed some particular aspects of the stability of solutions of the nonlinear differential equation
$$\frac{dx}{dt} = Ax + g(x) + h(t), \qquad x(0) = c. \tag{4.26.1}$$
To be precise, we have focussed upon the stability of the equilibrium solution, $x = 0$, of the linear equation
$$\frac{dx}{dt} = Ax, \tag{4.26.2}$$
under both a change in the structure of the equation, namely, the addition of the nonlinear term $g(x)$ and the forcing term $h(t)$, and a change in the initial conditions, from $x = 0$ to $x = c$. It is this point of view which leads us to a general definition of stability. Suppose that an equation has a solution, or a class of solutions, possessing certain structural properties such as boundedness, approach to an equilibrium state as time increases, and so forth. If the new equation obtained by certain changes in form and in initial and boundary conditions possesses a solution, or class of solutions, with the same structural properties, we say the original equation is stable under perturbations of the stated type. Since there are many different kinds of properties of solutions of interest to us and many different ways in which we can change the form of the equation, it is clear that there are many different types of stability. In general, it is the physical background of the equation that tells us what types of stability to examine.

4.27. Irregular Stability Problems
In the foregoing pages, as noted, we focused our attention primarily upon equations of the form
$$\frac{dx}{dt} = Ax + g(x), \qquad x(0) = c. \tag{4.27.1}$$
Without much additional effort, we could consider equations of the form (4.27.2).
Problems of an entirely different level of difficulty, however, arise when we allow equations containing higher derivatives. One of the most important examples of this is that where the equation has the form
$$\epsilon \frac{d^2x}{dt^2} + \frac{dx}{dt} = Ax + g(x), \tag{4.27.3}$$
with $\epsilon$ a small parameter. We wish to know the connections between the solutions of this equation and that of (4.27.1), the limiting form when $\epsilon = 0$. A great deal of work has been done in this area. A reference will be found at the end of the chapter. Little work, however, has been done to date on the analysis of equations of the form
$$\frac{dx}{dt} = Ax + g\!\left(x, \frac{dx}{dt}, \frac{d^2x}{dt^2}\right). \tag{4.27.4}$$
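The connection between (4.27.3) and its limit can be seen numerically. In this sketch (our construction: a scalar $a = -1$ and $g(x) = 0.1x^2$ stand in for $A$ and $g$, and the initial slope is placed on the limiting solution so that no boundary layer is excited), the discrepancy at $t = 1$ shrinks with $\epsilon$:

```python
# Comparison of eps*x'' + x' = a*x + g(x), cf. (4.27.3), with its limiting
# first-order form x' = a*x + g(x), using a hand-rolled RK4 integrator.
a, c = -1.0, 0.5
g = lambda x: 0.1 * x * x

def rk4(f, y, t, h):
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (p + 2 * q + 2 * r + s)
            for yi, p, q, r, s in zip(y, k1, k2, k3, k4)]

def solve_full(eps, T=1.0, n=2000):
    h = T / n
    y = [c, a * c + g(c)]                 # x(0) and x'(0) on the limiting slope
    f = lambda t, v: [v[1], (a * v[0] + g(v[0]) - v[1]) / eps]
    for i in range(n):
        y = rk4(f, y, i * h, h)
    return y[0]

def solve_limit(T=1.0, n=2000):
    h = T / n
    y = [c]
    f = lambda t, v: [a * v[0] + g(v[0])]
    for i in range(n):
        y = rk4(f, y, i * h, h)
    return y[0]

gap_big = abs(solve_full(1e-2) - solve_limit())
gap_small = abs(solve_full(1e-3) - solve_limit())
assert gap_small < gap_big < 0.05        # discrepancy shrinks with eps
```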
Exercise

1. Consider the equation $u''^2 = u + u'$. For $u(0)$ sufficiently small and $u'(0)$ suitably chosen, is there a solution which exists for all $t \ge 0$ and approaches zero like $e^{-t}$ as $t \to \infty$? Hint: Write $v = u'$ and consider the two associated systems
$$(1')\qquad u' = v, \quad v' = (u + v)^{1/2},$$
$$(1'')\qquad u' = v, \quad v' = -(u + v)^{1/2}.$$
Show that $(1')$ has exactly one solution with slope $-1$ at the origin, whereas $(1'')$ has a one-parameter family of such solutions which fill up a region in the $(u, v)$-plane. These are the only solutions corresponding to the desired asymptotic behavior. See

J. L. Massera, "Estudio Cualitativo de la Ecuación $u''^2 = u + u'$," Publicaciones del Instituto de Matemática y Estadística, Vol. 3, 1956, pp. 1-10 (Montevideo, Uruguay).
4.28. The Emden-Fowler-Fermi-Thomas Equation

Let us now consider the equation
$$u'' - t^m u^n = 0, \qquad n > 1, \tag{4.28.1}$$
which arose in astrophysics as the Emden-Fowler equation and in quantum mechanics as the Fermi-Thomas equation. There is no difficulty in finding a particular solution of the form
$$u = at^b, \tag{4.28.2}$$
where
$$b = -\frac{m+2}{n-1}, \qquad a = [b(b-1)]^{1/(n-1)}. \tag{4.28.3}$$
We suppose that $m \ne -2$. The interesting and important question is: How representative of the general solution of (4.28.1) is this particular solution? One way to answer this is to set
$$u = at^b v \tag{4.28.4}$$
and obtain an equation for $v$. We have
$$ab(b-1)t^{b-2}v + 2abt^{b-1}v' + at^b v'' - a^n t^{m+nb} v^n = 0. \tag{4.28.5}$$
Using the fact that $m + nb = b - 2$, this reduces to
$$t^2 v'' + 2bt v' + b(b-1)(v - v^n) = 0. \tag{4.28.6}$$
The function $v = 1$ obviously satisfies this last equation. We want to determine whether it is a stable solution locally and, more important, whether it is stable globally. Starting at any point in the $(v, v')$ phase plane, does a solution which exists for all $t \ge 0$ necessarily approach the point $v = 1$, $v' = 0$? Questions of this nature can be answered if we introduce the concept of a proper solution. This is a solution which exists for all $t \ge t_0$. If we set $t = e^\tau$, we obtain an equation free of the independent variable, of the form
$$v'' + c_1 v' + c_2(v - v^n) = 0. \tag{4.28.7}$$
A detailed analysis of all the cases arising from different ranges of the parameters would consume a considerable amount of space. Some of the results will be given as exercises.
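The particular solution (4.28.2)-(4.28.3) can be verified directly; the sample values $m = 0$, $n = 3$ below (which give $b = -1$, $a = \sqrt{2}$) are our choice:

```python
# Numerical check of the particular solution u = a t^b of (4.28.1), u'' = t^m u^n,
# with b = -(m+2)/(n-1) and a = [b(b-1)]^{1/(n-1)} as in (4.28.3).
m, n = 0, 3
b = -(m + 2) / (n - 1)                    # b = -1
a = (b * (b - 1)) ** (1.0 / (n - 1))      # a = sqrt(2)

u = lambda t: a * t ** b
for t in (0.5, 1.0, 2.0, 5.0):
    h = 1e-4
    u2 = (u(t + h) - 2.0 * u(t) + u(t - h)) / h ** 2    # central-difference u''
    assert abs(u2 - t ** m * u(t) ** n) < 1e-4
```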
Exercises

1. If $\sigma + n + 1 < 0$, all positive proper solutions of $u'' - t^\sigma u^n = 0$ have one of the following asymptotic forms:
$$u \sim at, \qquad u \sim a, \qquad u \sim \left[\frac{(\sigma+2)(\sigma+n+1)}{(n-1)^2}\right]^{1/(n-1)} t^{-(\sigma+2)/(n-1)}.$$
Hint: Begin by showing that all solutions are eventually monotone. Then consider the three cases $u' \to 0$, $u' \to a \ne 0$, $u' \to \infty$.

2. If $\sigma + 2 < 0 < \sigma + n + 1$, every positive proper solution of $u'' - t^\sigma u^n = 0$ has the asymptotic form $u \sim a$, where $a$ is a positive constant.

3. If $\sigma + 2 < 0 = \sigma + n + 1$, every positive proper solution of $u'' - t^\sigma u^n = 0$ has the asymptotic form $u \sim a$, where $a$ is a positive constant.

4. Every positive proper solution of $t^2 u'' - u^n = 0$ has the asymptotic form $u \sim [(n-1)\log t]^{-1/(n-1)}$.

5. If $\sigma + 2 > 0$, every positive proper solution of $u'' - t^\sigma u^n = 0$ has the asymptotic form $u \sim t^{-(\sigma+2)/(n-1)}$.
Miscellaneous Exercises
1. Consider the vector systems $x' = f(t, x)$, $y' = f(t, y) + g(t, y)$, $x(t_0) = c$, $y(t_0) = c$. Show under appropriate assumptions that
$$y - x = \int_{t_0}^{t} Z(t, s, y(s))\, g(s, y(s))\, ds,$$
where $Z(t, s, c)$ is the matrix solution of $Z' = J(f(x))Z$, $Z(t, t, c) = I$, for all $c$. See

V. M. Alekseev, "An Estimate for the Perturbations of the Solutions of Ordinary Differential Equations," Vestn. Moskov. Univ., Ser. I Mat. Meh., No. 2, 1961, pp. 28-36.

For some applications, see

F. Brauer, "Perturbations of Nonlinear Systems of Differential Equations, II," J. Math. Anal. Appl., Vol. 17, 1967, pp. 418-434.

2. Show that $f_n \ge 0$, $f_{m+n} \le f_m + f_n$ implies that $\lim_{n\to\infty} f_n/n$ exists.

3. Show that $f(x + y) \le f(x) + f(y)$, $x, y \ge 0$, $f(x)$ Lebesgue integrable for $x \ge 0$, implies that $\lim_{x\to\infty} f(x)/x = k$ exists, with $-\infty \le k < \infty$.
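Exercise 2 is Fekete's subadditivity lemma: the limit equals $\inf_n f_n/n$. A small numerical illustration (the sample sequence is ours; $\sqrt{n}$ is subadditive, so $f_n = 2n + \sqrt{n}$ qualifies and $f_n/n \to 2$):

```python
import math

# Fekete's lemma: f_n >= 0 and f_{m+n} <= f_m + f_n force f_n/n -> inf_n f_n/n.
f = lambda n: 2 * n + math.sqrt(n)

for m in range(1, 40):                    # verify subadditivity on a range
    for n in range(1, 40):
        assert f(m + n) <= f(m) + f(n) + 1e-12

ratios = [f(n) / n for n in range(1, 20001, 1000)]
assert all(r2 <= r1 + 1e-12 for r1, r2 in zip(ratios, ratios[1:]))
assert abs(f(10 ** 8) / 10 ** 8 - 2.0) < 1e-3     # limit is inf f_n/n = 2
```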
4. If $f(x + y) \le f(x) + f(y) + g(x + y)$, $x, y \ge 0$, and if $g(x)$ is nondecreasing, then the finiteness of $\int^\infty g(x)\, dx/x^2$ is a necessary and sufficient condition that $\lim_{x\to\infty} f(x)/x$ exists. See

J. M. Hammersley, "Generalization of the Fundamental Theorem on Subadditive Functions," Proc. Cambridge Phil. Soc., Vol. 58, 1962, pp. 235-238.

5. Consider the nonlinear functional equation
$$f(p) = \max_q\, [g(p, q) + h(p, q) f(T(p, q))],$$
where $p$ belongs to a space $S$ and $T(p, q) \in S$ for all $q$. Consider the existence and uniqueness of the solution under the following conditions: (a) $|h(p, q)| \le a < 1$ for all $p$ and $q$, and $|g(p, q)| \le m < \infty$ for all $p$ and $q$; or (b) $|h(p, q)| \le 1$ for all $p$ and $q$, and $\|T(p, q)\| \le a\|p\|$, where $a < 1$, for all $p$ and $q$, for some suitable norm. This is a typical functional equation of the theory of dynamic programming. See

R. Bellman, Dynamic Programming, Princeton University Press, Princeton, New Jersey, 1957.

6. Consider the equation
$$f_n(p) = \max_q\, [g(p, q) + f_{n-1}(T(p, q))], \qquad n \ge 1,$$
with $f_0(p) = 0$. Here $p$ belongs to a space $S$, $T(p, q) \in S$ for all $p$ and $q$, and $0 \le g(p, q) \le k < \infty$ for all $p$ and $q$. Let $u_n = \max_p f_n(p)$. Show that $u_{m+n} \le u_m + u_n$, and thus that $u_n/n$ approaches a limit as $n \to \infty$. Under what conditions on the nature of $S$ and the transformation $T(p, q)$ can we conclude that $f_n(p)/n$ converges for each $p$? See

R. Bellman, "Functional Equations in the Theory of Dynamic Programming-XI: Limit Theorems," Rend. Circ. Mat. Palermo, Vol. 8, 1959, pp. 1-3.
R. Bellman and T. A. Brown, "Projective Metrics in Dynamic Programming," Bull. Amer. Math. Soc., Vol. 71, 1965, pp. 773-775.
7. Let $p \in S$ and $T(p) \in S$ with $T$ continuous. Consider the sequence defined by
$$u_n(p) = g(p) + g(T(p)) + \cdots + g(T^{(n-1)}(p)),$$
where $g$ is a continuous function over $S$. Show that $u_{m+n}(p) = u_n(p) + u_m(T^n(p))$. Let $v_n = \max_p u_n(p)$. Show that $v_{m+n} \le v_m + v_n$, and hence that $v_n/n$ converges as $n \to \infty$. Under what conditions on $S$ and $T(p)$ does $u_n(p)/n$ converge as $n \to \infty$?
8. If $u(t)$ is continuous for all $t$ and $|u(t + s) - u(t) - u(s)| \le \epsilon$ for $-\infty < t, s < \infty$, show that there exists a constant $k$ such that $|u(t) - kt| \le 2\epsilon$ for all $t$. See

D. Hyers and S. Ulam, "Approximately Convex Functions," Proc. Amer. Math. Soc., Vol. 3, 1952, pp. 821-828.
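A quick numerical illustration of Exercise 8 (the particular $u$ below is our choice): $u(t) = kt + d\sin t$ is $\epsilon$-approximately additive with $\epsilon = 3d$, and it indeed stays within $2\epsilon$ of the linear function $kt$:

```python
import math

# Exercise 8 in miniature: an approximately additive u is uniformly close
# to a linear function kt.
k, d = 2.0, 0.3
u = lambda t: k * t + d * math.sin(t)
eps = 3 * d          # |sin(t+s) - sin t - sin s| <= 3, so u is eps-additive

pts = [x / 7.0 for x in range(-70, 71)]
for t in pts:
    for s in pts:
        assert abs(u(t + s) - u(t) - u(s)) <= eps
    assert abs(u(t) - k * t) <= 2 * eps
```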
9. Let $g(u)$ and $f(t)$ be continuous for $-\infty < u < \infty$, $0 \le t < \infty$. Suppose that $g(u)$ is such that $g(u) < 0$ for $0 < u < \infty$, $g(u) > 0$ for $-\infty < u < 0$, and that $|g(u)| \to \infty$ as $|u| \to \infty$. Suppose that either $f(t) \to 0$ as $t \to \infty$ or that $\int^\infty f(t)\, dt$ converges. Then, as $t$ increases, no continuation of any solution of $u' = g(u) + f(t)$ can cease to exist at a finite value of $t$, and $u(t) \to 0$ as $t \to \infty$ holds for every continuation of a solution.

10. Let $g(u)$ and $f(t)$ be continuous for $-\infty < u < \infty$, $0 \le t < \infty$. Suppose that $\int^\infty |f(t)|^p\, dt < \infty$ for some $p \ge 1$. If $u(t) \to 0$ as $t \to \infty$, then $\int^\infty |g(u(t))|^p\, dt < \infty$ and $\int^\infty |u'(t)|^p\, dt < \infty$.
11. If $f(t)$ is a continuous function satisfying $\int^\infty |f|^p\, dt < \infty$ for some $p$, $1 \le p \le 2$, then every solution of $u'' - (1 + f(t))u = 0$ is a linear combination of two particular solutions $u_1$, $u_2$, where $u_1 \sim e^t$ and $u_2 \sim e^{-t}$ as $t \to \infty$. Hint: If $v = u'/u$, then $v' + v^2 = 1 + f(t)$. For the foregoing, see

P. Hartman, "Unrestricted Solution Fields of Almost-separable Differential Equations," Trans. Amer. Math. Soc., 1947, pp. 560-580.

12. Consider the equation $u' = -au + p(t)$, where $a > 0$ and $p(t)$ is periodic of period $T$. Show that every solution tends toward a unique periodic solution, whatever the initial value $u(0)$.
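For Exercise 12 the convergence can be seen directly: the difference $d$ of any two solutions satisfies $d' = -ad$ exactly, so all solutions collapse onto one periodic orbit. A numerical sketch (the parameter values are ours):

```python
import math

# Exercise 12: for u' = -a u + p(t), the forcing cancels in the difference of
# two solutions, which therefore decays like e^{-a t}.
a = 1.5
p = lambda t: math.cos(2 * math.pi * t)

def euler(u, T=10.0, n=100000):
    h = T / n
    for i in range(n):
        u += h * (-a * u + p(i * h))
    return u

gap0 = 5.0
gap_T = abs(euler(1.0 + gap0) - euler(1.0))
assert abs(gap_T - gap0 * math.exp(-a * 10.0)) < 1e-3
```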
13. Consider the nonlinear equation $u' = -au + u^2 + p(t)$, $u(0) = c$, $|c| \ll 1$, where $p(t)$ is, as above, periodic. What is the asymptotic behavior of the solution? Suppose that $p(t)$ is a sum of trigonometric polynomials.

14. Extend the foregoing to the multidimensional case
$$x' = Ax + g(x) + h(t), \qquad x(0) = c,$$
where $A$ is a stability matrix.

15. Consider the nonlinear integral equation
$$u(x) = 1 + \lambda \int_x^1 u(y)\, u(y - x)\, dy, \qquad 0 \le x \le 1.$$
This equation has two real solutions for $0 < \lambda < 1/2$ and no real solution for $\lambda > 1/2$. These solutions are positive, monotone decreasing in $x$, and at least twice differentiable. If $u(x)$ is any real solution and $b = \int_0^1 u(x)\, dx$, then $\lambda b^2 - 2b + 2 = 0$. See

G. H. Pimbley, Jr., "Positive Solutions of a Quadratic Integral Equation," Arch. Rational Mech. Anal., Vol. 24, 1967, pp. 107-127.

16. Consider the nonlinear integral equation $u(t) = \int k(t, s) f(s, u(s))\, ds$. Let $K$ be a linear operator and $\lambda$ a scalar. Consider the equivalent equation $u - \lambda Ku = \int k(t, s) f(s, u(s))\, ds - \lambda Ku$, whence, for suitable $\lambda$,
$$u = (I - \lambda K)^{-1}\left[\int k(t, s) f(s, u(s))\, ds - \lambda Ku\right].$$
Can one find cases where $K$ and $\lambda$ can be chosen adroitly so that the new equation can be solved by means of successive approximations, or fixed-point techniques, even though these methods do not apply to the original equation? See

I. I. Kolodner, Contractive Methods for Hammerstein Type Equations on Hilbert Spaces, Department of Mathematics, University of New Mexico, Technical Report No. 35, 1963.
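The quadratic relation of Exercise 15 above already locates the two solution branches: the discriminant $4 - 8\lambda$ of $\lambda b^2 - 2b + 2 = 0$ vanishes at $\lambda = 1/2$. A quick check (the sample $\lambda$ is our choice):

```python
import math

# Exercise 15: b = int_0^1 u dx satisfies lam*b^2 - 2b + 2 = 0, so real
# solutions exist only for lam <= 1/2, with two branches below that value.
lam = 0.4
disc = 4 - 8 * lam
assert disc > 0                            # two real branches for lam < 1/2
b1 = (2 - math.sqrt(disc)) / (2 * lam)
b2 = (2 + math.sqrt(disc)) / (2 * lam)
for b in (b1, b2):
    assert abs(lam * b * b - 2 * b + 2) < 1e-12
assert 4 - 8 * 0.6 < 0                     # no real solution for lam = 0.6 > 1/2
```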
17. Consider the equations
$$\text{(a)}\quad \frac{dx}{dt} = A(t)x, \quad x(0) = c; \qquad \text{(b)}\quad \frac{dy}{dt} = B(t)y, \quad y(0) = c.$$
Let $X$ be the solution of $dX/dt = A(t)X$, $X(0) = I$. Then, under an appropriate smallness condition on $B(t) - A(t)$, every solution of (b) may be written $y = Xc + o(\|X\|)$. See

R. Bellman, "On a Generalization of a Result of Wintner," Quart. Appl. Math., Vol. 16, 1958, pp. 431-432.
18. Consider the determinantal equation
$$\begin{vmatrix} u' - a_{11}(t) & -a_{12}(t) \\ -a_{21}(t) & u' - a_{22}(t) \end{vmatrix} = 0,$$
where $u = u(t)$ and $u' = du/dt$. How does one determine whether to use the maximum or minimum operator? Obtain upper and lower bounds for the solutions in this way for particular choices of the $a_{ij}$.
19. Consider the Levinson oscillator $u'' + g(u)u' + u = 0$, where
$$g(u) = \begin{cases} +1, & |u| \ge 1, \\ -1, & |u| < 1. \end{cases}$$
Discuss the limiting behavior of $u(t)$ as $t \to \infty$ for any set of initial conditions $u(0) = c_1$, $u'(0) = c_2$. See

N. Levinson, "A Second Order Differential Equation with Singular Solutions," Ann. Math., Vol. 50, 1949, pp. 127-153.
20. Consider the Blasius equation $f''' + ff'' = 0$, subject to the boundary conditions $f(0) = f'(0) = 0$, $f' \to 2$ as $t \to \infty$. Show that this is equivalent to the nonlinear integral equation
$$\log F''(t) = \varphi(F'') = -\frac{1}{2}\int_0^t (t - s)^2 F''(s)\, ds,$$
where $f(t) = cF(ct)$. Show that if we employ the successive approximation scheme $\log F_{i+1}'' = \varphi(F_i'')$, $F_0'' \equiv 1$, then $F_1'' \le F_0''$, $F_2'' \ge F_1''$, $F_3'' \le F_2''$, and so on. Furthermore, the sequence $\{F_i''\}$ converges, and any two consecutive members form upper and lower bounds to the solution. Show that $c$ may be chosen so that $f' \to 2$ as $t \to \infty$ and that $c^3 \approx 1.328$. See

H. Weyl, "Concerning the Differential Equations of Some Boundary Layer Problems," Proc. Nat. Acad. Sci. U.S., Vol. 27, 1941, pp. 578-583.
G. Temple, "Linearization and Delinearization," Proc. International Congress of Mathematicians, 1958, pp. 233-247.
21. Consider the Blasius equation $f''' + ff'' = 0$, subject to $f(0) = f'(0) = 0$, $f'(\infty) = 1$. Set $f''(0) = \alpha$, a parameter to be determined, and write $f'' = \alpha e^{-F}$, where $F = \int_0^t f\, dt_1$. Then $\alpha = 1/\int_0^\infty e^{-F}\, dt$. Use the power series expansion $F \approx \alpha t^3/6$, for small $t$, to obtain an approximate relation for $\alpha$, and thus an estimate for $\alpha$. Why is it sufficient to approximate $F$ for small $t$? How does one obtain an improved estimate for $\alpha$? See

D. Meksyn, New Methods in Boundary Layer Theory, Pergamon Press, New York, 1961.
D. Meksyn, "Integration of the Boundary Layer Equations," Proc. Royal Soc. (London), A237, 1956, p. 543.
W. Squire, "Application of Generalized Gauss-Laguerre Quadrature to Boundary-Layer Problems," J. Aero/Space Sci., Vol. 26, No. 8, August 1959, pp. 540-541.
W. Squire, "A Note on the Blasius Equation with Three-Point Boundary Conditions," J. Aero/Space Sci., Vol. 26, No. 10, October 1959, pp. 678-679.
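The small-$t$ approximation of Exercise 21 can be pushed to a closed form. With $F \approx \alpha t^3/6$, the integral $\int_0^\infty e^{-\alpha t^3/6}\,dt$ evaluates to $(6/\alpha)^{1/3}\Gamma(4/3)$ (this Gamma-function step is our addition, not the book's), giving $\alpha = [6^{1/3}\Gamma(4/3)]^{-3/2}$:

```python
import math

# alpha = 1 / int_0^inf exp(-alpha t^3 / 6) dt; the substitution x = alpha t^3/6
# yields alpha^(2/3) * 6^(1/3) * Gamma(4/3) = 1.
alpha = (6 ** (1.0 / 3.0) * math.gamma(4.0 / 3.0)) ** -1.5

# crude midpoint-rule check of the integral identity
h, total = 1e-3, 0.0
for i in range(40000):
    t = (i + 0.5) * h
    total += h * math.exp(-alpha * t ** 3 / 6.0)
assert abs(alpha * total - 1.0) < 1e-3

# within a few percent of the exact Blasius value f''(0) ~ 0.4696
assert 0.46 < alpha < 0.50
```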
22. Consider the operator $D$ defined by $DF = g\,\partial F/\partial x_1 + h\,\partial F/\partial x_2$, where $g$, $h$, and $F$ are analytic functions of the variables $x_1$ and $x_2$. Show that $D(F + G) = DF + DG$ and $D(FG) = F(DG) + G(DF)$.
23. Show that if $x_1$ and $x_2$ denote the solutions of the system $x_1' = g(x_1, x_2)$, $x_2' = h(x_1, x_2)$, $x_1(0) = c_1$, $x_2(0) = c_2$, then any analytic function $F(x_1, x_2)$ of the solutions can be represented in the form $F(x_1, x_2) = e^{tD}F(c_1, c_2)$. This representation, due to Lie, has been used extensively by Gröbner for analytic and numerical treatment of the $n$-body problem of celestial mechanics. See

W. Gröbner, Die Lie-Reihen und ihre Anwendungen, Verlag der Wissenschaften, Berlin, 1960.
24. Consider the equation
$$u(t) = f(t) + \int_0^t g(u(s))(t - s)^{-1/2}\, ds.$$
Let $f(t)$ be continuous for $0 \le t < \infty$ and satisfy a uniform Lipschitz condition of order $b_1 > 0$ on any finite interval. Let $g(u)$ be monotone increasing in $u$, $g(0) = 0$, and let $|g(u_1) - g(u_2)| \le k|u_1 - u_2|$ for $|u_1|, |u_2| \le b_2$, where $k = k(b_2)$. Then the equation possesses a unique solution. See

N. Levinson, "A Nonlinear Volterra Equation Arising in the Theory of Superfluidity," J. Math. Anal. Appl., Vol. 1, 1960, pp. 1-11.
25. Consider a linear differential equation of the form $L(u) = f(t) = f(t, c)$. For various classes of functions $\{f(t, c)\}$ and linear operators there are particular solutions of the form $u = f(t, g(c))$. Consider a nonlinear differential equation $N(u) = f(t, c)$. For which nonlinear differential equations and function classes $\{f(t, c)\}$ does this property persist? Begin with the equation $u' - g(u) = f(t, c)$.
26. Consider the nonlinear equation $u'' + u + \epsilon u^3 = 0$ as an equation of the form $u'' + (1 + \epsilon u^2)u = 0$. Using the WKB approximation, we obtain an approximate solution. Is this a valid or useful approximation?
27. Consider the Fredholm integral equation
$$u(t) = g(t) + \lambda \int_0^\infty k(t - t_1)\, u(t_1)\, dt_1, \qquad t \ge 0,$$
where $k$ satisfies suitable integrability conditions and $k(t) \to 0$ as $t \to \infty$. Show that the equation possesses a unique solution which approaches zero as $t \to \infty$, if $g(t) \to 0$ as $t \to \infty$.

28. Consider the truncated equation
$$v(t) = g(t) + \lambda \int_0^t k(t - t_1)\, v(t_1)\, dt_1,$$
an equation of Volterra type. Show that this possesses a unique solution and that, under the foregoing hypotheses, with the assumption that $g(t) \to 0$, $|u(t) - v(t)| \to 0$ as $t \to \infty$.
29. Consider the case where $k(t) = e^{-a|t|}$. Show that the conditions are fulfilled if $a > 1$. Is this condition necessary?

30. Show that $u(t) = g(t) + \lambda \int_0^\infty e^{-a|t - t_1|}\, u(t_1)\, dt_1$ can be converted into a linear second-order differential equation subject to a two-point boundary condition, while the truncated equation, the equation for $v$, is an initial value problem.
31. Consider the equation
$$u(t) = g(t) + \lambda \int_0^\infty e^{-a|t - t_1|}\, u(t_1)\, dt_1,$$
$a > 1$, with $\lambda$ not equal to a characteristic value, and the truncated equation
$$v(t) = g(t) + \lambda \int_0^t e^{-a|t - t_1|}\, v(t_1)\, dt_1.$$
If $g(t) \to 0$ as $t \to \infty$, does $|u(t) - v(t)| \to 0$ as $t \to \infty$?
32. Let $\{u_n(t)\}$ be a sequence of continuous functions defined over a closed finite interval $[a, b]$ which converges monotonically to a continuous function $u(t)$. Show that the sequence necessarily converges uniformly. (Dini.)
33. Let $\{u_n\}$ be a sequence of values determined by the relation $u_n = \sum_{i=1}^N a_i r_i^n$. Given the values $\{u_n\}$, how does one determine $N$? Hint: $u_n$ satisfies a linear difference equation $u_{n+N} + b_1 u_{n+N-1} + \cdots + b_N u_n = 0$. See

R. Bellman, "On the Separation of Exponentials," Boll. U.M.I., (3), Vol. 15, 1960, pp. 38-39.

The problem is a particular case of a more general question of determining $N$ and the parameters when it is known that $u_n = \sum_{i=1}^N a_i g(r_i, n)$. See

P. Medgyessy, Decomposition of Superpositions of Distribution Functions, Akademiai Kiado, Budapest, 1961.

34. Discuss the numerical stability of the procedure.

35. Let $x(t)$ be a nonconstant periodic solution with period $p$ of $x' = F(x)$. If $\|F(x_1) - F(x_2)\| \le L\|x_1 - x_2\|$ for all $x_1$, $x_2$, then $p \ge 2\pi/L$. See

J. A. Yorke, Periods of Periodic Solutions and the Lipschitz Constant, Institute for Fluid Mechanics and Applied Mathematics, TN BN-583, 1968.
36. Consider the equation $u'' + r^2(t)u - u^n = 0$, $n \ge 2$, $u(0) = u(\pi) = 0$. If a positive solution exists, it is unique.

37. No positive solution can exist unless $r^2 > 1$.

38. What is the asymptotic behavior of solutions as $\lambda \to \infty$ if $r^2(t) = \lambda p(t)$, $p(t) > 0$? Consider first the case $p = 1$. See

J. Canosa and J. Cole, "Asymptotic Behavior of Certain Nonlinear Boundary-value Problems," J. Math. Phys., Vol. 8, 1968, pp. 1915-1921.
39. Consider the Riccati equation $u' = -u + u^2$, $u(0) = c$. Show that it is equivalent to the infinite-dimensional linear system
$$u_{n+1}' = -(n+1)u_{n+1} + (n+1)u_{n+2}, \qquad u_{n+1}(0) = c^{n+1}, \qquad n = 0, 1, \ldots.$$
Hint: $u_n = u^n$. This type of linearization was first used by Carleman.

40. Show similarly that any nonlinear differential equation of the form $u' = g(u)$, $u(0) = c$, where $g(u)$ is a power series in $u$, may be transformed into a linear system of differential equations of infinite dimension.
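The Carleman linearization of Exercise 39 and its truncation can be tried out directly; in the following sketch (the values of $c$, $N$, and the Euler discretization are our choices) the truncated moment system reproduces the closed-form solution of the Riccati equation:

```python
import math

# Carleman moments u_n = u^n of u' = -u + u^2 satisfy u_n' = -n u_n + n u_{n+1};
# truncate at N (dropping u_{N+1}), integrate by forward Euler, and compare u_1
# with the exact solution u(t) = c e^{-t} / (1 - c + c e^{-t}).
c, N, T, steps = 0.2, 8, 1.0, 20000
h = T / steps
v = [c ** n for n in range(1, N + 1)]      # v[n-1] approximates u^n
for _ in range(steps):
    dv = [-n * v[n - 1] + n * (v[n] if n < N else 0.0) for n in range(1, N + 1)]
    v = [vi + h * di for vi, di in zip(v, dv)]

exact = c * math.exp(-T) / (1.0 - c + c * math.exp(-T))
assert abs(v[0] - exact) < 1e-3
```

Truncation works here because $|c| < 1$ makes the high moments negligible, which is exactly the regime contemplated in the truncation exercises that follow.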
41. Show that in place of the sequence $\{u^n\}$ we can use the sequence $\{P_n(u)\}$ of Legendre polynomials for the same purpose. Are there any advantages to using the sequence of Legendre polynomials as opposed to the sequence of powers?
42. Consider the truncated system
$$v_n' = -n v_n + n v_{n+1}, \qquad v_n(0) = c^n, \qquad n = 1, 2, \ldots, N - 1,$$
$$v_N' = -N v_N, \qquad v_N(0) = c^N,$$
where $|c| < 1$. Write $v_n = v_n^{(N)}$. Show that $\lim_{N\to\infty} v_1^{(N)} = u$.

43. Consider the vector equation $x' = Ax + g(x)$, $x(0) = c$, where the components of $g(x)$ are power series in the components of $x$, lacking constant and linear terms. Obtain the analogue of the Carleman linearization of Exercise 39.

44. Examine the validity of the truncation scheme analogous to that of Exercise 43 under the assumption that $A$ is a stability matrix and that $\|c\| < 1$. For some results concerning truncation, see

R. Bellman, "On the Validity of Truncation for Infinite Systems of Ordinary Differential Equations," Math. and Phys. Sci., Vol. 1, 1967, pp. 95-100.
R. Bellman and J. M. Richardson, "On Some Questions Arising in the Approximate Solution of Nonlinear Differential Equations," Quart. Appl. Math., Vol. 20, 1963, pp. 333-339.

45. Consider the matrix equation $X(s + t) - X(s)X(t) = F(s, t)$, $-\infty < s, t < \infty$. Under what conditions on $F$ is every solution of exponential form? See
A. B. Buche and A. T. Bharucha-Reid, "On Some Functional Equations Associated with Semigroups of Operators," Proc. Nat. Acad. Sci. U.S., Vol. 60, 1968, pp. 1170-1174.
J. M. Richardson, "Quasi-differential Equations and Generalized Semigroup Relations," J. Math. Anal. Appl., Vol. 2, 1961, pp. 293-298.

46. Show that
$$\frac{\partial u(s, t)}{\partial t} = \int_R k(s, r)\, u(r, t)\, dr$$
possesses a unique solution which reduces to a continuous function at $t = 0$, under suitable assumptions concerning $k$. (Volterra.)

47. Study the existence and uniqueness of solutions of the nonlinear integrodifferential equation considered in

M. L. Heard, "On a Nonlinear Integrodifferential Equation," J. Math. Anal. Appl., Vol. 26, 1969, pp. 170-189.

48. Let $p(t)$ be a periodic function of $t$ of period 1. Under what further conditions on $p(t)$, if any, does $u' + u + u^2 = p(t)$ possess a periodic solution of period 1?

49. Let $p(t) = a_1 e^{i\lambda_1 t} + a_2 e^{i\lambda_2 t}$, with $\lambda_1$, $\lambda_2$ incommensurable. Under what conditions does the equation possess a solution of the form
$$\sum_{m,n=-\infty}^{\infty} a_{mn} \exp[i(m\lambda_1 + n\lambda_2)t]\,?$$
50. Consider the equation
$$u' + u + u^2 = c_2 e^{-at}, \qquad u(0) = c_1, \qquad |c_1| + |c_2| < 1.$$
Then $u = \sum_{m,n=0}^{\infty} u_{mn}(t)\, c_1^m c_2^n$. How does one determine $u_{mn}(t)$ in an efficient fashion? See

R. Bellman, "A Note on the Asymptotic Behavior of Solutions of Nonlinear Differential Equations," J. Math. Anal. Appl., to appear.
51. Consider the nonlinear system
$$\frac{dx_i}{dt} = -a_i(x)\, x_i + \sum_{j \ne i} a_{ij}(x)\, x_j, \qquad x_i(0) = c_i, \qquad i = 1, 2, \ldots, N,$$
where $c_i \ge 0$, $\sum_{i=1}^N c_i = 1$, $a_{ij} \ge 0$, and $a_{ji}(x) = a_{ij}(x)$. Suppose further that the $a_{ij}(x)$ satisfy uniform Lipschitz conditions in $|x_i| \le 1$. Show that there is a unique solution for $t \ge 0$, and that this solution approaches a limit as $t \to \infty$. See

R. Bellman, "Topics in Pharmacokinetics-I: Concentration-dependent Rates," Math. Biosci., to appear.

52. Consider the nonlinear integral equation
$$x(t) = f(t) + \int_0^t K(t - s)\, g(x(s))\, ds$$
and the linearized equation
$$y(t) = f(t) + \int_0^t K(t - s)\, J y(s)\, ds,$$
where $J$ is the Jacobian of $g$ evaluated at $x = 0$. Under what conditions can we assert that $x \sim y$? See

J. A. Nohel, "Problems in Qualitative Behavior of Solutions of Nonlinear Volterra Equations," Nonlinear Integral Equations, University of Wisconsin Press, Madison, Wisconsin, 1964.
R. K. Miller, "On the Linearization of Volterra Integral Equations," J. Math. Anal. Appl., Vol. 23, 1968, pp. 198-208.
Bibliography and Comments

§4.1. For a detailed discussion of how nonlinear equations enter into engineering and science, see
T. von Kármán, "The Engineer Grapples with Nonlinear Problems," Modern Trends in Control Theory, R. Bellman and R. Kalaba (editors), Dover Publications, New York, 1963.
N. J. Zabusky (editor), Topics in Nonlinear Physics, Springer-Verlag, Berlin, 1968.

For an interesting discussion of some of the advantages of nonlinearity, see

G. Temple, "Linearization and Delinearization," Proc. International Congress of Mathematicians, August 1958, pp. 233-247.

For a discussion of the role of stability theory in mathematical economics, see

H. Uzawa, "Walras' Tatonnement in the Theory of Exchange," Review of Economic Studies, Vol. 27, pp. 182-194.

For systematic discussions of stability theory, see

R. Bellman, Stability Theory of Differential Equations, Dover Publications, New York, 1969.
L. Cesari, Asymptotic Behavior and Stability Problems in Ordinary Differential Equations, Springer, New York, 1963.
R. Reissig, G. Sansone, R. Conti, Nichtlineare Differentialgleichungen höherer Ordnung, Edizioni Cremonese, Rome, 1969.

See also

J. P. LaSalle and S. Lefschetz, "Recent Soviet Contributions to Ordinary Differential Equations and Nonlinear Mechanics," J. Math. Anal. Appl., Vol. 2, 1961, pp. 467-499.

The use of the "projective metric" of Birkhoff is a further example of the fact that linear equations are often best treated by nonlinear techniques. See

G. Birkhoff, "Extensions of Jentzsch's Theorem," Trans. Amer. Math. Soc., Vol. 85, 1957, pp. 219-227.
R. Bellman and T. Brown, "Projective Metrics in Dynamic Programming," Bull. Amer. Math. Soc., Vol. 71, 1965, pp. 773-775.
J. F. deSpautz and R. A. Lerman, "Equations Equivalent to Nonlinear Differential Equations," Proc. Amer. Math. Soc., Vol. 18, 1967, pp. 441-444.

§4.3. See

F. V. Atkinson, "On Asymptotically Periodic Linear Systems," J. Math. Anal. Appl., Vol. 24, 1968, pp. 646-653.
J. Kevorkian, "The Two Variable Expansion Procedure for the Approximate Solution of Certain Nonlinear Differential Equations," Space Mathematics (Proc. Summer Seminar, Ithaca, New York, 1963), Part 3, Amer. Math. Soc., Providence, R.I., 1966, pp. 206-275.
§4.4. See

O. Perron, "Die Stabilitätsfrage bei Differentialgleichungen," Math. Zeit., Vol. 32, 1930, pp. 703-728.
K. P. Persidskii, "Über die Stabilität einer Bewegung nach der ersten Näherung," Mat. Sbornik, Vol. 40, 1933, pp. 284-293 (in Russian).
I. G. Malkin, "On Stability in the First Approximation," Sbornik Naucnyh Trudov Kazanskogo ..., Vol. 3, 1935, pp. 7-17 (in Russian).
M. G. Krein, "On Some Questions Related to the Ideas of Lyapunov in the Theory of Stability," Uspehi Mat. Nauk (N.S.), Vol. 3, 1948, pp. 166-169 (in Russian).
R. Bellman, "On an Application of a Banach-Steinhaus Theorem to the Study of the Boundedness of Solutions of Nonlinear Differential and Difference Equations," Ann. Math., Vol. 49, 1948, pp. 515-522.
J. L. Massera and J. J. Schäffer, "Linear Differential Equations and Functional Analysis, I," Ann. Math., Vol. 67, 1958, pp. 517-573; "II," ibid., Vol. 69, 1959, pp. 88-103; "III," ibid., Vol. 69, 1959, pp. 535-574; "IV," Math. Ann., Vol. 139, 1960, pp. 287-342.
T. G. Hallam, "On the Asymptotic Growth of the Solutions of a System of Nonhomogeneous Linear Differential Equations," J. Math. Anal. Appl., Vol. 25, 1969, pp. 254-265.
§4.5. We will subsequently, in Volume II, briefly mention the use of Lie series and the Liouville operator to study nonlinear differential equations. See

W. Gröbner, Die Lie-Reihen und ihre Anwendungen, Verlag der Wissenschaften, Berlin, 1960.

See also

W. V. Petryshyn, "Projection Methods in Nonlinear Numerical Functional Analysis," J. Math. Mech., Vol. 17, 1967, pp. 353-372.

For questions of the existence of multiple solutions, see

S. Bancroft, J. K. Hale, and D. Sweet, "Alternative Problems for Nonlinear Functional Equations," J. Math. Anal. Appl., Vol. 4, 1968, pp. 40-56.

For the use of inequalities to establish uniqueness, see
I. Bihari, "A Generalization of a Lemma of Bellman and Its Application to Uniqueness Problems of Differential Equations," Acta Math. Hungar., Vol. 7, 1956, pp. 81-94.

§4.6. For a different approach to the determination of approximate solutions of nonlinear systems of the form $x'' = g(x) + h(t)$, a generalization of the WKB-Liouville approximation, see

J. J. Gilvarry, "Linear Approximations in a Class of Nonlinear Vector Differential Equations," Quart. Appl. Math., Vol. 11, 1953, pp. 145-156.

See

G. Burnand and G. Sablos, "Determination of the Domain of Stability," J. Math. Anal. Appl., Vol. 23, 1968, pp. 714-722.
R. J. Duffin, "Stability of Systems with Nonlinear Damping," J. Math. Anal. Appl., Vol. 23, 1968, pp. 428-439.
E. J. Davison and K. C. Cowan, "A Computational Method for Determining the Stability Region of a Second-order Nonlinear Autonomous System," Intern. J. Control, Vol. 9, 1969, pp. 349-358.

§4.10. For further results, see

J. P. Rosen, "Stability and Bounds for Nonlinear Systems of Difference and Differential Equations," J. Math. Anal. Appl., Vol. 2, 1961, pp. 370-393.
Y. A. Mitropolskiy, "Construction of the General Solution of Nonlinear Differential Equations by a Method Ensuring 'Accelerated' Convergence," Ukrainskiy Matem. Zhur., Vol. 16, 1964, pp. 475-501.

§4.11. See

J. Warga, "On a Class of Iterative Procedures for Solving Normal Systems of Ordinary Differential Equations," J. Math. Phys., Vol. 31, 1953, pp. 223-243.
I. Manning, "A Method for Improving Iteration Procedures," Proc. Cambridge Phil. Soc., Vol. 63, 1967, pp. 183-186.
E. H. Rogers, "Stability and Convergence of Approximation Schemes," J. Math. Anal. Appl., Vol. 20, 1967, pp. 442-453.
J. Moser, "A New Technique for the Construction of Solutions of Nonlinear Differential Equations," Proc. Nat. Acad. Sci. U.S., Vol. 47, 1961, pp. 1824-1831; Institute of Mathematical Sciences, New York University, New York, 1961.

In a number of cases we can employ quasilinearization to obtain monotone convergence. See

R. Bellman and R. Kalaba, Quasilinearization and Nonlinear Boundary-Value Problems, American Elsevier, New York, 1965.

§4.12. For the use of fixed-point theorems in stability theory, see
M. Hukuhara, "Sur les points singuliers des équations différentielles linéaires," J. Fac. Sci., Hokkaido Imp. Univ., Ser. I, Math. 2, 1934-36, pp. 13-88.
R. Bellman, "On the Boundedness of Solutions of Nonlinear Difference and Differential Equations," Trans. Amer. Math. Soc., Vol. 62, 1947, pp. 357-386.
A. Stokes, "The Application of a Fixed Point Theorem to a Variety of Nonlinear Stability Problems," Contributions to the Theory of Nonlinear Oscillations, Vol. 5, Princeton University Press, Princeton, New Jersey.
§4.15. For a discussion of expansion in terms of a function of the parameter appearing, see the paper by Temple cited in §4.1 and

R. Bellman, "On Perturbation Methods Involving Expansions in Terms of a Parameter," Quart. Appl. Math., Vol. 13, 1955, pp. 195-200.
§4.18. For a detailed discussion of how two-point boundary problems arise in modern control theory, see

R. Bellman, Introduction to the Mathematical Theory of Control Processes: Linear Equations and Quadratic Criteria, Academic Press, New York, 1967.
R. Conti, "Recent Trends in the Theory of Boundary-value Problems for Ordinary Differential Equations," Boll. U.M.I., Vol. 22, 1967, pp. 135-178.

For their appearance in transport theory, and their avoidance by means of invariant imbedding, see
R. Bellman, R. Kalaba, and M. Prestrud, Invariant Imbedding and Radiative Transfer in Slabs of Finite Thickness, American Elsevier, New York, 1963. R. Bellman, H. Kagiwada, R. Kalaba, and M. Prestrud, Invariant Imbedding and Timedependent Processes, American Elsevier, New York, 1964. Tsung Yen Na, “Transforming Boundary Conditions to Initial Conditions for Ordinary Differential Equations,” S I A M Review, Vol. 9, 1967, pp. 204-210.
§4.19. See
R. Bellman, "On Analogues of Poincaré-Lyapunov Theory for Multipoint Boundary-Value Problems," J. Math. Anal. Appl., Vol. 14, 1966, pp. 522-526.
§4.20. See
R. S. Bucy, "New Results in Asymptotic Control Theory," J. SIAM Control, Vol. 4, 1966, pp. 397-402.
R. Bellman and R. S. Bucy, "Asymptotic Control Theory," J. SIAM Control, Vol. 2, 1964, pp. 11-18.
§4.21. For an interesting application of the Riccati equation, see
H. Osborn, "The Morse Index Theorem," Proc. Amer. Math. Soc., Vol. 18, 1967, pp. 759-762.
R. Bellman, "Functional Equations in the Theory of Dynamic Programming-XIII: Stability Considerations," J. Math. Anal. Appl., Vol. 12, 1965, pp. 537-540.
§4.25. See
R. Bellman and K. L. Cooke, Differential-Difference Equations, Academic Press, New York, 1963.
For discussion of ways of avoiding the instability of the initial value problem, see
D. Greenspan, "Approximate Solution of Initial Value Problems for Ordinary Differential Equations by Boundary Value Techniques," J. Math. and Phys. Sci., Vol. 1, 1967, pp. 261-274.
§4.26. For a discussion of types of irregular perturbation problems, see the paper by Temple cited in §4.1 and
W. Wasow, Asymptotic Expansions for Ordinary Differential Equations, Wiley & Sons, New York, 1966.
§4.27. See
H. G. Handelman, J. B. Keller, and R. E. O'Malley, Jr., "Loss of Boundary Conditions in the Asymptotic Solution of Linear Ordinary Differential Equations," Comm. Pure and Appl. Math., Vol. 21, 1968, I, pp. 227-242; II, pp. 243-262.
§4.28. See
R. Bellman, Stability Theory of Differential Equations, Dover Publications, New York, 1969.
R. I. Anitschenko, "A Boundary Value Problem for Thomas-Fermi-Dirac Equations," Akad. Nauk SSSR Doklady, Vol. 145, 1962, pp. 483-486.
A. M. Arthurs and P. D. Robinson, "Complementary Variational Principles for ∇²φ = f(φ) with Applications to the Thomas-Fermi and Liouville Equations," Proc. Cambridge Phil. Soc., Vol. 65, 1969, pp. 535-542.
E. Hille, "On the Thomas-Fermi Equation," Proc. Nat. Acad. Sci. U.S., Vol. 62, 1969, pp. 7-10.
G. Rosen, "A Necessary Condition for the Existence of Singularity-Free Global Solutions to Nonlinear Ordinary Differential Equations," Quart. Appl. Math., Vol. 27, 1969, pp. 133-134.
T. V. Atopanova, "Generalized Emden-Fowler Equation" (Russian), Differencial'nye Uravnenija, Vol. 3, 1967, pp. 831-838.
Let us point out that the stability analysis we have been carrying out for the EFFT equation is a special case of the stability analysis for similarity solutions of partial differential equations. Little has been done in this direction.
Let us mention finally a powerful approximation technique due to Von Zeipel, dependent on a sequence of physically motivated transformations of variables; see
H. Von Zeipel, Arkiv Astron. Mat. Fys., Vol. 11, No. 1, 1916.
Brown and Shook, Planetary Theory, Cambridge University Press, London and New York, 1933.
R. F. Arenstorf, "A New Method of Perturbation Theory and Its Application to the Satellite Problem of Celestial Mechanics," J. Reine Angew. Math., Vol. 221, 1966, pp. 113-145.
Chapter 5

THE BUBNOV-GALERKIN METHOD
5.1. Introduction
The equations of science are generally not susceptible to explicit solution in terms of the elementary functions of analysis. Progress, therefore, both mathematical and scientific, depends critically upon our ability to obtain various kinds of approximate results. Sometimes it is sufficient to obtain an algorithm which will provide accurate numerical results; sometimes we want analytic expressions which readily exhibit certain basic qualitative properties of the solution. In many cases, these analytic expressions can be used both in guiding the construction of a numerical algorithm and in providing a valuable test of the validity and even feasibility of a long-winded computer calculation.

In this chapter we discuss one of the major approximation techniques of modern applied mathematics, of great analytic and computational significance, the Bubnov-Galerkin method. The basic idea of the procedure, which is quite simple (as are all important ideas), will be described and validated for differential equations of the form

dx/dt = g(x),  (5.1.1)
subject to both initial value and two-point boundary value conditions. The proof of validity is a direct application of the stability concepts of the preceding chapter. Following this, we will illustrate the application of the method to the equation of the nonlinear spring,

u'' + u + εu³ = 0,  u(0) = c,  u'(0) = 0,  (5.1.2)

and to the Van der Pol equation

u'' + λ(u² − 1)u' + u = 0,  (5.1.3)
for the case where the parameters ε and λ are small. Since these equations will be used in the following chapter also to demonstrate the use of different types of approximations, we shall devote a few words to their properties.
The Bubnov-Galerkin method, applied to linear equations, leads quickly to the "method of moments," a method we will very briefly discuss in connection with the heat equation. The problem of applying the Bubnov-Galerkin method, or any of its variants, to nonlinear equations forces us to consider techniques for solving a finite system of nonlinear equations. To this end, we will discuss the Newton-Raphson method and also, in this connection, the use of the Lagrange expansion. We will indicate the utility of extrapolation techniques for accelerating convergence and some approaches to obtaining a suitable initial approximation. Numerous references to the application of the Bubnov-Galerkin method to the domain of partial differential equations, its area of principal significance, will be found at the end of the chapter.

5.2. Example of the Bubnov-Galerkin Method
Consider the vector differential equation

dx/dt = g(x),  x(0) = c.  (5.2.1)

In the case where g(x) is a general nonlinear function of x, and N, the dimension of x, is larger than one, the equation is analytically intractable. Despite the fact that we can now readily obtain the numerical solution of equations of this type with the aid of a digital computer for N of the order of several hundred, and, indeed, several thousand, it is important for many reasons to possess systematic techniques for obtaining simple analytic approximations to x. Let us attempt to do this in the following fashion. Form the functional
J(x) = ∫₀^T ||dx/dt − g(x)||² dt,  (5.2.2)

and let us proceed to minimize J(x) over all x for which the functional exists, subject to the condition that x(0) = c. If x is subject to a two-point, or multipoint, boundary condition, we can impose these conditions upon the trial functions or pursue a slightly different path, as we indicate subsequently.

To obtain an approximate solution to this variational problem (which is, of course, equivalent to the original when the original possesses a solution), we restrict the candidates for minimization to those functions of the type

x = y(a, t),  y(a, 0) = c,  (5.2.3)
where y(a, t) is an element of a conveniently chosen family of functions dependent on a k-dimensional vector a. Write

J(y(a, t)) = h(a) = h(a₁, a₂, ..., a_k).  (5.2.4)
The minimization of J(y(a, t)) thus involves the minimization of a function of k variables, a problem within the domain of classical calculus, approachable by nonlinear programming algorithms or by various search techniques of a sophisticated nature. If k is large, even this approximate problem is by no means simple. However, as we shall show below, we are often in a situation where we possess a reasonable initial approximation, both to the minimizing values of the quantities a₁, a₂, ..., a_k and to a suitable family of functions. In this case, we can use a Newton-Raphson approach to good avail to ascertain the minimizing values. We will discuss this below.

5.3. Validity of Method
Naturally, the question arises: Given that J(y) ≤ ε², can we obtain an estimate for ||x − y|| for 0 ≤ t ≤ T? The stability theorems of Chapter 4 readily enable us to obtain the desired estimates. Let us demonstrate:

Theorem. Let x be the solution of the equation

dx/dt = g(x),  x(0) = c,  0 ≤ t ≤ T.  (5.3.1)

Let g(x) be analytic in x,* and let y be a function such that

J(y) = ∫₀^T ||dy/dt − g(y)||² dt ≤ ε².  (5.3.2)

Then, provided that ε is sufficiently small, we have

||x − y||_M ≤ k₁ε,  (5.3.3)

where k₁ is a constant independent of ε which depends upon the solution of (5.3.1).
* All we actually need for uniqueness of solution of (5.3.1) is that g(x) satisfy a Lipschitz condition. In practice, we usually need analyticity, or some equally strong structural property, to carry out the approximation procedure. Hence, we may as well impose this stronger condition from the beginning.
Here, we are using the norm

||f||_M = max_{0≤t≤T} ||f(t)||.  (5.3.4)

To see this, write

dy/dt = g(y) + Δ(t),  (5.3.5)

where, by (5.3.2),

∫₀^T ||Δ(t)||² dt ≤ ε².  (5.3.6)
Referring to the theorem established in Sec. 4.13, we see that (5.3.3) holds, provided, as hypothesized, that ε is sufficiently small.

5.4. Discussion
What is particularly interesting about the foregoing result is that it shows that an inspired guess as to the kind of function that makes J(y) small actually yields information concerning x, the solution of (5.3.1). The guess can be inspired by analytic properties of the equation or by a knowledge of the underlying physical process. A straightforward, and usually successful, computational approach is one based on the use of a trial function of the form

y = Σ_{i=1}^k a_i z_i(t),  (5.4.1)
where the z_i are conveniently chosen functions. What is important from the standpoint of analysis is that, conversely, the existence of the solution of (5.3.5), with ||Δ(t)|| small, entails the existence of the solution of (5.3.1). This is particularly important in the case of boundary value problems, where again mathematical or physical experience can be used to provide an excellent mean-square approximation. We shall discuss some examples of this below.

5.5. The General Approach
Before discussing some examples illustrating the use of the method, let us describe the steps that are taken in general. We begin with an equation

N(x) = 0,  (5.5.1)
where x(t) is to be determined over some interval [0, T]. We replace this question by the problem of minimizing a functional of the form

J(x) = ||N(x)||,  (5.5.2)

where as yet the norm is unspecified. Usually, analytic considerations dictate the use of the mean-square expression

J(x) = ∫₀^T ||N(x)||² dt,  (5.5.3)

where ||·|| is the usual vector norm. With the development of linear and nonlinear programming techniques, it becomes feasible in some cases to use a Čebyšev norm,

J(x) = max_{0≤t≤T} ||N(x)||.  (5.5.4)

To determine the minimum value of J(x) to some desired degree of approximation, we restrict the class of admissible functions to those of the form y(t, a), where a is a k-dimensional vector and y(t, a) is some convenient family of functions. We suppose then that a has been chosen so that

J(y(t, a)) ≤ ε².  (5.5.5)

The remaining problem lies in the domain of stability theory, namely, that of estimating ||x − y|| over [0, T]. The approach taken depends on the form of N(x) and the ability to handle an equation of the form

N(x) = Δ(t).  (5.5.6)
Often there are initial and boundary conditions on x. Sometimes it is convenient to take account of these by suitable choice of a. Sometimes, it is convenient to introduce an equivalent averaging over boundary conditions. Thus, if we had conditions of the form

x(0) = c,  x(T) = d,  (5.5.7)

we could, if we wished, consider the new functional

J_λ(x) = ∫₀^T ||N(x)||² dt + λ(||x(0) − c||² + ||x(T) − d||²).  (5.5.8)

The parameter λ is the Courant parameter, of which more later.
What we wish to emphasize is the flexibility of the Bubnov-Galerkin method. This flexibility is a manifestation of the approximate nature of all mathematical formulations of physical processes.

5.6. Two Nonlinear Differential Equations
In order to illustrate the efficacy of the methods we present in this and subsequent chapters, it is convenient to focus upon particular equations whose properties are reasonably well known. To that end, we consider the equation of the nonlinear spring,

u'' + u + εu³ = 0,  u(0) = c,  u'(0) = 0,  (5.6.1)

and the Van der Pol equation, the equation of the multivibrator,

u'' + λ(u² − 1)u' + u = 0,  (5.6.2)

where λ > 0. The first equation is more readily treated and possesses more easily appreciated properties. The first integral of (5.6.1),
u'² + u² + εu⁴/2 = c² + εc⁴/2,  (5.6.3)
shows that the solution of (5.6.1) is periodic, with period and amplitude dependent on ε and c. We can introduce a change of variable, u = cv, to make c = 1, so that there is only one basic parameter, εc². The period and amplitude can be determined explicitly in terms of elliptic functions. We are interested here only in the case where |ε| ≪ 1 and the solution will not differ appreciably from the simple solution valid for ε = 0. As we shall see in a moment, this is an attractive, plausible, and intuitive assumption which is often wrong in dealing with nonlinear phenomena. In this case, every periodic solution of u'' + u = 0 leads to an associated periodic solution of (5.6.1). As we shall show, however, useful perturbation techniques are not as direct as might be imagined.

The behavior of the solutions of the Van der Pol equation is entirely different. If λ > 0, there is a unique periodic solution of (5.6.2) to which every other solution tends. This means that of all the periodic solutions of u'' + u = 0, one, and only one, is associated with the periodic solution of (5.6.2). The fact that the smallest amount of nonlinearity of the form appearing in (5.6.2) quantizes the continuum of periodic solutions of u'' + u = 0 has important engineering and biological consequences. Possibly it plays a significant role in the physical world as well.
References to detailed discussion of these and closely related results for other types of nonlinear differential equations will be found at the end of the chapter.

Exercises
1. Prove that the solution of u'' + u + εu³ = 0, u(0) = 1, u'(0) = 0, is periodic.

2. Prove that the solution is stable as a function of ε in the neighborhood of ε = 0.

3. Obtain an approximate expression for the period for |ε| ≪ 1.
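Exercise 3 can be checked numerically. The following sketch (the RK4 integrator, step size, and test values are our own choices, not part of the text) integrates the spring equation with c = 1 and measures the period directly; the result should agree with the first-order frequency correction ω ≈ 1 + 3ε/8 derived in Sec. 5.7, i.e., T ≈ 2π/(1 + 3ε/8).

```python
import math

def spring_period(eps, h=1e-3, t_max=20.0):
    """Period of u'' + u + eps*u**3 = 0, u(0) = 1, u'(0) = 0, found by
    RK4 integration: u' vanishes at t = 0, T/2, T, so the second sign
    change of u' after t = 0 occurs at t = T."""
    def f(u, v):
        return v, -(u + eps * u**3)
    u, v, t = 1.0, 0.0, 0.0
    prev_v = v
    crossings = 0
    while t < t_max:
        # one classical RK4 step for the system u' = v, v' = -(u + eps u^3)
        k1u, k1v = f(u, v)
        k2u, k2v = f(u + 0.5*h*k1u, v + 0.5*h*k1v)
        k3u, k3v = f(u + 0.5*h*k2u, v + 0.5*h*k2v)
        k4u, k4v = f(u + h*k3u, v + h*k3v)
        u += h*(k1u + 2*k2u + 2*k3u + k4u)/6
        v += h*(k1v + 2*k2v + 2*k3v + k4v)/6
        t += h
        if prev_v != 0.0 and prev_v * v < 0.0:
            crossings += 1
            if crossings == 2:
                return t - h * v / (v - prev_v)  # interpolate the zero of u'
        prev_v = v
    raise RuntimeError("no full period found before t_max")
```

For ε = 0.1 this gives T ≈ 6.06, against 2π/(1 + 3ε/8) ≈ 6.056; the period decreases as ε grows, as one expects of a hardening spring.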
5.7. The Nonlinear Spring
To illustrate an application of the Bubnov-Galerkin method, let us consider first the nonlinear spring,

u'' + u + εu³ = 0,  u(0) = 1,  u'(0) = 0.  (5.7.1)
We know from previous theoretical analysis, as pointed out above, that the solution this equation possesses is periodic, and that this solution is close to the solution of the equation with ε = 0, for ε small, for t ranging over a single period. Let us first use a Bubnov-Galerkin approach to obtain an approximation to the solution of (5.7.1), and then discuss the difficulty encountered in a straightforward perturbation approach. It is this difficulty, encountered throughout the domain of nonlinear equations, which motivates the use of less pedestrian methods. Consider the functional

J(u) = ∫₀^π (u'' + u + εu³)² dt.  (5.7.2)
We observe, analytically or experimentally (either physically or with a digital computer), that the effect of the nonlinearity is to change both the period and amplitude of oscillation of (5.7.1). Let us ignore the change in amplitude, which turns out to be a smaller effect, and see if we can obtain the correction to the frequency. In any case, there is no harm done in trying the simpler approximation first. Write

u = cos ωt,  (5.7.3)
where we take

ω = 1 + b₁ε + b₂ε² + ⋯,  (5.7.4)

retaining ω = 1 + b₁ε + b₂ε² for our present purposes.
for our present purposes. Th en J(c0s W t )
=
JI
=
JO
=
2 jr(-2b,
(-2blE
cos wt
€2
+
cos w t
0
=
+
((1 - w2) cos w t
jT(-2b,
cos t
0
E(C0S Wt)3)2
<(COS 4
)
+ (cos ~
+ (cos
+
3
t)3)2
dt
dt~
+
0(€3)
+
t )dt ~ )O(e3) ~
dt
+
0(~3),
(5.7.5)
upon using the fact that ω = 1 + O(ε). Observe that a change in the limits of integration from [0, π] to [0, π + O(ε)] does not affect the value of J as far as the ε² term is concerned. It contributes only to the O(ε³) term. Similarly, a change in the amplitude from 1 to 1 + O(ε) has no effect upon the determination of b₁. Hence, we minimize J(cos ωt) to the desired order of accuracy by choosing b₁ to be the quantity which minimizes the quadratic expression in b₁,

∫₀^π (−2b₁ cos t + (cos t)³)² dt.  (5.7.6)
This leads to the equation

∫₀^π (−2b₁ cos t + (cos t)³) cos t dt = 0,  (5.7.7)

whence

b₁ = ∫₀^π (cos t)³ cos t dt / (2 ∫₀^π (cos t)² dt).  (5.7.8)

Since

(cos t)³ = (cos 3t)/4 + (3 cos t)/4,  (5.7.9)

this reduces to

b₁ = (3/4) ∫₀^π (cos t)² dt / (2 ∫₀^π (cos t)² dt) = 3/8.  (5.7.10)
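The value b₁ = 3/8 is easy to confirm numerically. A short sketch (the Simpson quadrature and the grid search are our own choices) evaluates the ratio (5.7.8) and, as a cross-check, minimizes the quadratic (5.7.6) directly:

```python
import math

def simpson(f, a, b, n=400):
    """Composite Simpson rule; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

# b1 as the ratio of moments, eq. (5.7.8)
b1 = simpson(lambda t: math.cos(t)**4, 0.0, math.pi) / \
     (2 * simpson(lambda t: math.cos(t)**2, 0.0, math.pi))

# cross-check: minimize the quadratic (5.7.6) over a grid of b-values
def q(b):
    return simpson(lambda t: (-2*b*math.cos(t) + math.cos(t)**3)**2, 0.0, math.pi)

b_star = min((k / 200 for k in range(201)), key=q)   # grid on [0, 1]
```

Both computations return 0.375, so the frequency correction is ω ≈ 1 + 3ε/8.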
Exercises
1. Take an approximation of the form

u = (1 + b₂ε) cos(1 + b₁ε)t + b₃ε cos 3(1 + b₁ε)t,

and determine b₂, b₃, using the value of b₁ found above. For small ε, what is the order of magnitude of min_{b_i} J(u)?

2. Consider the equation u' = −u + u², u(0) = c. Set

u = ae^{−t} + (c − a)e^{−2t},

and determine a so as to minimize g(a) = ∫₀^∞ (u' + u − u²)² dt. Determine the solution of g'(a) = 0 which has the form a = c + O(c²).

3. Similarly, determine the coefficients a and b if we set

u = ae^{−t} + be^{−2t} + (c − a − b)e^{−3t}.

4. Set u = ae^{−t} + be^{−2t} and determine a and b so as to minimize

J_λ(u) = ∫₀^∞ (u' + u − u²)² dt + λ(a + b − c)²,  λ > 0.

What happens as λ → ∞?

5. Consider the equation x' = Ax + g(x), x(0) = c. Set

x = b⁽¹⁾e^{λ₁t} + b⁽²⁾e^{λ₂t},

where λ₁ and λ₂ are the characteristic roots of A with the property that 0 > λ₁ > λ₂ > ⋯ > λ_N. Set b⁽¹⁾ + b⁽²⁾ = c and determine b⁽¹⁾ so as to minimize

g(b⁽¹⁾) = ∫₀^∞ (x' − Ax − g(x), x' − Ax − g(x)) dt.

6. Consider the equation u'' + (1 + εφ(t))u = 0, u(0) = 1, u'(0) = 0. Let u = cos(1 + b₁ε)t and determine b₁ so that

∫₀^T (u'' + (1 + εφ(t))u)² dt

is a minimum. Here ε is a small quantity.

7. Set u = (1 + a₁ε) cos(1 + b₁ε)t − a₁ε cos 2(1 + b₁ε)t and determine a₁ and b₁ so that the corresponding functional is minimized.
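Exercise 2 above can be explored numerically. A sketch (the truncation point T = 30, the value c = 0.1, and the grid search are our own choices); for small c the minimizing a indeed has the form c + O(c²):

```python
import math

c = 0.1   # initial value u(0) = c

def g(a, T=30.0, n=600):
    """Simpson approximation to g(a) = int_0^oo (u' + u - u^2)^2 dt
    for the trial function u = a e^{-t} + (c - a) e^{-2t}."""
    def r2(t):
        u = a * math.exp(-t) + (c - a) * math.exp(-2*t)
        du = -a * math.exp(-t) - 2 * (c - a) * math.exp(-2*t)
        return (du + u - u*u) ** 2
    h = T / n
    s = r2(0.0) + r2(T)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * r2(i * h)
    return s * h / 3

# grid search for the minimizing a in a neighborhood of a = c
a_star = min((0.05 + 0.0005 * k for k in range(301)), key=g)
```

For c = 0.1 the minimizer falls strictly between c and c + 2c², consistent with a = c + O(c²).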
5.8. Alternate Average
In the foregoing analysis, we used the interval [0, π] despite the fact that we knew that the solution was no longer periodic of period 2π. We were able to justify this because the change in the period was O(|ε|). In general, we can accept the fact that we are looking for a solution of unknown period and use the mean

lim_{T→∞} (1/T) ∫₀^T (u'' + u + εu³)² dt.

This avoids the problem of an a priori determination of the period to any degree of accuracy.

5.9. Straightforward Perturbation
The reader may wonder at this point why we went to all of this effort. Why not use standard perturbation techniques? To this end, write

u = cos t + εu₁ + ε²u₂ + ⋯,  (5.9.1)

where u₁, u₂, ..., are independent of ε, and substitute in (5.7.1),

0 = u'' + u + εu³
  = [−cos t + εu₁'' + ε²u₂'' + ⋯] + [cos t + εu₁ + ε²u₂ + ⋯] + ε[cos³ t + 3ε(cos² t)u₁ + ⋯].  (5.9.2)
Equating the coefficients of ε, ε², ..., to zero, we obtain the infinite system of differential equations

u₁'' + u₁ + cos³ t = 0,
u₂'' + u₂ + 3(cos² t)u₁ = 0,  (5.9.3)

and so on. The initial conditions are

u₁(0) = u₁'(0) = 0,
u₂(0) = u₂'(0) = 0,  (5.9.4)
and so on. Consider the equation for u₁,

u₁'' + u₁ = −cos³ t = −(cos 3t)/4 − (3 cos t)/4.  (5.9.5)
Let us use superposition and consider the two forcing terms separately.
The term −(cos 3t)/4 yields the particular solution

u₁₁ = (cos 3t)/32,  (5.9.6)

which causes no particular trouble. The term −3(cos t)/4 requires a bit more effort because of resonance. We see that this leads to the particular solution

u₁₂ = −(3t/8) sin t.  (5.9.7)

Hence, the solution of (5.9.5) is given by

u₁ = (cos 3t)/32 − (3t sin t)/8 + c₁ cos t + c₂ sin t,  (5.9.8)

where the constants c₁ and c₂ are determined by the conditions in (5.9.4). These yield the equations

0 = 1/32 + c₁,
0 = c₂.  (5.9.9)

Hence, explicitly,

u₁ = (cos 3t)/32 − (3t sin t)/8 − (cos t)/32.  (5.9.10)
Observe, however, that there is now a peculiar term, −(3t sin t)/8, which increases indefinitely in amplitude as t → ∞. This is rather strange behavior for a periodic solution. The explanation of this anomaly, the so-called "secular term" phenomenon, lies in the fact that we have attempted to expand the solution of (5.7.1) in terms of solutions of the linear equation obtained when ε = 0, an equation whose solutions possess a different fundamental frequency. That difficulties should arise is not at all surprising. Precisely the same phenomenon occurs in connection with the much simpler linear equation

v'' + (1 + ε)²v = 0,  v(0) = 1,  v'(0) = 0.  (5.9.11)

The solution is

v = cos(1 + ε)t = cos t − εt sin t + ⋯,  (5.9.12)

when we expand around ε = 0. We see then that "top of the head" perturbation theory has its limitations.
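The limitation is easy to exhibit numerically for the linear example (5.9.11). Taking ε = 0.01 (the parameter value and sample points below are our own choices), the first-order expansion tracks the bounded exact solution only while εt remains small:

```python
import math

eps = 0.01
exact  = lambda t: math.cos((1 + eps) * t)              # bounded for all t
approx = lambda t: math.cos(t) - eps * t * math.sin(t)  # secular term eps*t*sin t

# agreement while eps*t is small (t up to 10, so eps*t <= 0.1) ...
err_small = max(abs(exact(t) - approx(t)) for t in [0.5 * k for k in range(21)])

# ... but for t of order 1/eps the "approximation" leaves [-1, 1] entirely
t_big = 500.0
```

At t = 500 the expansion exceeds 1 in magnitude, which no solution of the harmonic oscillator with these initial values can do.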
A number of ingenious techniques have been found over the last two hundred years for avoiding the secular term. The Bubnov-Galerkin technique is one, and references to a number of others will be found at the end of the chapter.

5.10. A "Tucking-in" Technique
Let us now show that the straightforward perturbation technique can be modified to construct the periodic solution of interest to us, provided that we have a good idea of what to look for. This is a point we emphasized in connection with the Bubnov-Galerkin technique discussed previously. Beginning with (5.9.10), write
u = cos t + ε((cos 3t)/32 − (3t sin t)/8 − (cos t)/32) + O(ε²)
  = (1 − ε/32) cos t − (3εt/8) sin t + (ε/32) cos 3t + O(ε²)
  = (1 − ε/32) cos(1 + 3ε/8)t + (ε/32) cos 3(1 + 3ε/8)t + O(ε²).  (5.10.1)
The O terms are all uniform as long as t is restricted to the fixed interval [0, 2π]. We thus obtain the correction to the frequency and to the amplitude of the principal part of the solution. Consequently, we can use the elementary perturbation solution directly to obtain an approximate expression for u, or to guide and check the Bubnov-Galerkin technique. Observe that there is an abstract analogy between this "tucking-in technique" and the Bubnov-Galerkin method.

Exercise
1. Show that we can obtain arbitrarily accurate representations for u in this fashion. In particular, determine the frequency to terms in O(ε³).

5.11. The Van der Pol Equation
Let us now consider the equation

u'' + λ(u² − 1)u' + u = 0,  λ ≥ 0,  (5.11.1)
one of the most well-known differential equations in analysis. In the case of the equation of the nonlinear spring of (5.7.1), it can be shown that every solution of the associated linear equation

u'' + u = 0  (5.11.2)

leads to a solution of the nonlinear equation. The situation is quite the contrary here. It turns out, as noted above, that there is exactly one periodic solution of (5.11.1), to which all other solutions tend as t → ∞. Suppose that we know that this solution has the approximate form

u ≈ a cos t  (5.11.3)
as λ → 0, where a is a parameter we wish to determine. Here, we know on the basis of experience that the frequency of the periodic solution is relatively unaffected as λ → 0. In any case, as mentioned above, there is no harm done in trying the simplest approximation first. Let us employ the Bubnov-Galerkin method to determine this limiting amplitude. We have

h(a) = J(a cos t) = λ²a² ∫₀^{2π} ((a² cos² t − 1) sin t)² dt.  (5.11.4)

If we determine a² so as to minimize the integral expression

∫₀^{2π} (a² cos² t − 1)² sin² t dt,

we obtain the standard result, a² = 2.

Exercises
1. Why don't we minimize the expression a² ∫₀^{2π} (a² cos² t − 1)² sin² t dt?

2. Obtain a better approximation of the form u = (√2 + b₂λ) cos(1 + b₁λ²)t.

3. Consider the equation u'' + u³ = sin t and look for a periodic solution of the form u ≈ b₁ sin t. Show that the Galerkin approximation yields b₁ ≅ 1.4923.

4. Obtain an improved approximation of the form u ≈ b₁ sin t + b₂ sin 3t. Show, using the Newton-Raphson method (Sec. 5.16 below), or otherwise, that b₁ ≅ 1.43, b₂ ≅ 0.142.
200
THE BUBNOV-GALERKIN METHOD
5. Consider the Van der Pol equation with a forcing term u“
+ h(u2 - 1) u‘ + u = X sin t .
If u 6, cos t + b, sin t , show that b, -2.38, 6 , For detailed results of the foregoing type, see
-0.142.
M. Urabe and A. Reiter, “Numerical Computation of Nonlinear Forced Oscillations by Galerkin’s Procedure,” J. Math. Anal. Appl., Vol. 14, 1966, pp. 107-140. M. Urabe, “Les Vibrations ForcCes dans les Systttmes NonlinCaires,” Colloques Internationaux du Centre National de la Recherche Scientifque, No. 148, 1964. M. Urabe, “Galerkin’s Procedure for Nonlinear Periodic Systems,” Arch. Ratl. Mech. Anal., Vol. 20, 1965, pp. 120-152. 5.12. Two-point Boundary Value Problems
The equation

u'' − u = g(u),  u(0) = c₁,  u(T) = c₂,  (5.12.1)
may be treated by means of the Bubnov-Galerkin method in very much the same fashion as the corresponding initial value problem. The principal difficulty resides in establishing the desired stability property. As indicated in Sec. 4.19, we can obtain results of this nature, provided that we agree to settle for results of a relative type. By this we mean that we ask for stability within a restricted class of functions.

5.13. The Linear Equation L(u) = g
In the treatment of partial differential equations, even linear equations present formidable analytic and computational problems. The Bubnov-Galerkin method is particularly effective in dealing with linear functional equations. Let us now see what the method leads to when the equation under consideration is of the form

L(u) = g,  (5.13.1)

where L is a scalar linear operator. In place of the original problem, we consider the new problem of minimizing the expression

J(u) = ∫₀^T (L(u) − g)² dt.  (5.13.2)
Let us use a trial function of the type

u = Σ_{k=1}^N a_k u_k,  (5.13.3)

where the u_k are prescribed functions and the a_k are free parameters. J(u) is then a quadratic expression in the a_k. Minimizing over the a_k, we obtain the system of linear equations

∫₀^T (L(Σ_{k=1}^N a_k u_k) − g) L(u_i) dt = 0,  i = 1, 2, ..., N.  (5.13.4)

Observe that the left-hand sides are the moments of the function L(Σ_{k=1}^N a_k u_k) − g with respect to the sequence of functions {L(u_i)}. We are thus led to a system of linear equations (5.13.4) whose ease of solution depends upon the facility with which we can calculate the integrals

b_ij = ∫₀^T L(u_i) L(u_j) dt,  (5.13.5)

and upon the conditioning of the matrix B = (b_ij). The conditioning of B often dictates the choice of the set of functions {u_i}.

Exercises
Consider the application of the method to the integral equation.

Consider the application to the differential-difference equation u'(t) = u(t − ε), t ≥ ε, u(t) = 1, 0 ≤ t ≤ ε, where ε is a small positive parameter and an approximation to u(t) is desired over 1 ≤ t ≤ 3.
Consider the application to the solution of the linear system Ax = b, where A has a slightly-coupled form. Here |ε| ≪ 1.
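The linear procedure of Sec. 5.13 can be carried out in a few lines. As a sketch (the operator L(u) = u' + u, the right side g = 1, the interval [0, 1], and the polynomial basis u_k = t^k are all our own choices, selected so that the integrals b_ij in (5.13.5) come out in closed form), we solve the normal equations B a = m and compare with the exact solution u = 1 − e^{−t} of u' + u = 1, u(0) = 0:

```python
import math

def solve(B, m):
    """Gaussian elimination with partial pivoting for the small system B a = m."""
    n = len(m)
    M = [row[:] + [m[i]] for i, row in enumerate(B)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    a = [0.0] * n
    for r in range(n - 1, -1, -1):
        a[r] = (M[r][n] - sum(M[r][k] * a[k] for k in range(r + 1, n))) / M[r][r]
    return a

T, N = 1.0, 4
# L(t^i) = i t^{i-1} + t^i, so L(t^i) L(t^j) integrates in closed form:
B = [[i*j*T**(i+j-1)/(i+j-1) + T**(i+j) + T**(i+j+1)/(i+j+1)
      for j in range(1, N+1)] for i in range(1, N+1)]
m = [T**i + T**(i+1)/(i+1) for i in range(1, N+1)]   # moments of g = 1
a = solve(B, m)

def u_approx(t):
    return sum(a[k] * t**(k+1) for k in range(N))
```

With four basis functions the maximum error on [0, 1] is already below 2 × 10⁻²; imposing u(0) = 0 through the basis sidesteps the Courant penalty term of Sec. 5.5.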
5.14. Method of Moments
Analyzing the foregoing method, we see that there is no necessity to restrain oneself to moment equations of the form appearing in (5.13.4), derived from a variational problem. We could equally well choose another convenient sequence of functions, {v_k}, and determine the coefficients a_k by the requirement that they satisfy the equations

∫₀^T (L(Σ_{k=1}^N a_k u_k) − g) v_i dt = 0,  i = 1, 2, ..., N.  (5.14.1)

It may be much easier to calculate the integrals ∫₀^T L(u_k) v_i dt and thus carry out the procedure numerically. Suppose, for example, that the sequences {L(u_k)} and {v_k} are biorthogonal. Then (5.14.1) yields

a_i = ∫₀^T g v_i dt,  i = 1, 2, ..., N,  (5.14.2)

which, in turn, leads to a simple expression for the approximate solution,

u = Σ_{k=1}^N a_k u_k.  (5.14.3)

This is the Nth partial sum of a series expansion for the solution of L(u) = g, upon setting u = Σ_{n=1}^∞ a_n u_n. Hence, under reasonable assumptions concerning the sequences {u_n} and {v_n}, we can expect that the sequence of approximations obtained in this fashion converges, at least in a mean-square sense.

5.15. Nonlinear Case
Consider now the nonlinear equation

L(u) + M(u) − g = 0,  (5.15.1)

where M(u) is a nonlinear operator. Let us suppose that it has a solution u which can be expanded as above in a series

u = Σ_{n=1}^∞ b_n u_n,  (5.15.2)

where the coefficients b_n are determined by

b_k = (g, v_k) − (M(Σ_{n=1}^∞ b_n u_n), v_k),  k = 1, 2, ....  (5.15.3)
Suppose that we use the trial expression

u = Σ_{n=1}^N a_n u_n,  (5.15.4)

and determine the a_n as the solution of the N equations

(L(u) + M(u) − g, v_k) = 0,  k = 1, 2, ..., N.  (5.15.5)

We wish to examine the connection between the a_n and the b_n. The a_n are determined by the equations

a_k = (g, v_k) − (M(Σ_{n=1}^N a_n u_n), v_k),  k = 1, 2, ..., N.  (5.15.6)
This system has a unique solution under various appropriate conditions on the nonlinear function M(u). It may be obtained by successive approximations along Newton-Raphson lines, or the multidimensional Lagrange expansion may be used. These methods will be described below. Write, referring to (5.15.2), the solution in the form

u = Σ_{n=1}^N b_n u_n + r_N,  (5.15.7)

where r_N approaches zero in a suitable norm, say that of L²(0, T), as N → ∞.
Since (5.15.1) holds, we have

(L(u) + M(u) − g, v_k) = 0,  k = 1, 2, ..., N,

or

b_k = (g, v_k) − (M(Σ_{n=1}^N b_n u_n + r_N), v_k),  k = 1, 2, ..., N.  (5.15.8)
Thus, the condition that b_k → a_k as N increases is a stability condition. We want the solution of the system

b_k = (g, v_k) − (M(Σ_{n=1}^N b_n u_n), v_k) + ε_k,  k = 1, 2, ..., N,

to be close to that of (5.15.6) for ||r_N||, |ε_k| ≤ ε. Under appropriate assumptions concerning M(u), this is the case.
Exercises
1. Obtain an approximate solution of u'' + u + u³ = 0, u(0) = 1, u'(0) = 0, over 0 ≤ t ≤ π, using a trial function of the form

u = c₁ cos t + c₂ cos 3t + c₃ cos 5t.

2. Obtain an approximate solution, using the functional

J(u) = ∫₀^T (u'' + u + u³)² dt + λ[(u(0) − 1)² + u'(0)²],

where λ ≫ 1.

3. Determine conditions on M(u) which ensure the uniqueness of solution of (5.15.6) and the desired stability of solution.

5.16. Newton-Raphson Method
A method of great utility for the solution of nonlinear equations of all types is the Newton-Raphson method. When extended to the study of nonlinear functional equations, it is usually called the Newton-Raphson-Kantorovich method, in acknowledgment of the pioneering work of Kantorovich. We shall examine its close relation to the theory of quasilinearization in Volume II.

Let us begin with the one-dimensional case, where the geometric significance is clearest. Let f(t) be a scalar function of the scalar variable t with a zero at r, as indicated in Fig. 5.1. Let t₀ be an approximate value for r and replace the curve u = f(t) in the neighborhood of t₀ by its tangent,

u = f(t₀) + f'(t₀)(t − t₀),  (5.16.1)
Figure 5.1
as indicated. Let t₁, the next approximation to r, be determined by the intersection of the tangent with the t-axis. Thus,

t₁ = t₀ − f(t₀)/f'(t₀).  (5.16.2)

Continuing in this way, we obtain the sequence {t_n} generated by the recurrence relation

t_{n+1} = t_n − f(t_n)/f'(t_n).  (5.16.3)

It is clear from the figure that if u(t) is convex as indicated in the neighborhood of the zero, then t_n converges to r, and indeed monotonically. But even more is true. Without loss of generality, take r = 0, which means that u(t) has the expansion

u(t) = a₁t + a₂t² + ⋯,
u'(t) = a₁ + 2a₂t + ⋯,  (5.16.4)

in the neighborhood of t = 0. Then, from (5.16.3),

t_{n+1} = t_n − [a₁t_n + a₂t_n² + ⋯]/[a₁ + 2a₂t_n + ⋯] = (a₂/a₁)t_n² + O(t_n³).  (5.16.5)
This is quadratic convergence. Computationally, it means that the number of significant figures essentially doubles on each iteration once we get sufficiently close to the root. Analytically, this type of approximation often provides a crucial additional degree of freedom in circumventing some troublesome difficulty. A rigorous proof of the convergence of t_n can readily be obtained under various assumptions. If u(t) is not convex, the choice of the initial approximation is of the utmost significance.
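The recurrence (5.16.3) and its quadratic convergence are easily illustrated. A sketch for f(t) = t² − 2 (the example and the starting value are our own choices), where the iterates decrease monotonically to √2 from the convex side:

```python
import math

def newton(f, df, t0, steps):
    """Iterates of t_{n+1} = t_n - f(t_n)/f'(t_n), eq. (5.16.3)."""
    ts = [t0]
    for _ in range(steps):
        ts.append(ts[-1] - f(ts[-1]) / df(ts[-1]))
    return ts

ts = newton(lambda t: t * t - 2.0, lambda t: 2.0 * t, 2.0, 5)
errs = [t - math.sqrt(2.0) for t in ts]
```

The number of correct digits roughly doubles at each step: the errors run 0.59, 0.086, 2.5 × 10⁻³, 2.1 × 10⁻⁶, 1.6 × 10⁻¹², essentially zero.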
Exercises
1. Consider the problem of determining a square root using the ancient algorithm u_{n+1} = ½(x²/u_n + u_n), u₀ = y, where 0 < x, y. Show that u_n converges to x as n → ∞ for any initial choice of y and that the convergence is quadratic.

2. Set v_n = u_n − x. Show that v_{n+1} = v_n²/2(x + v_n), and thus that v_n ≥ 0 for n ≥ 1, and that v_{n+1} < v_n.

3. Consider v_n = v_n(x, y) as a function of x and y with 1 ≤ x ≤ a, 1 ≤ y ≤ a. Show that v_n(x, y) is a decreasing function of x for 1 ≤ x ≤ y and an increasing function of x for y ≤ x ≤ a. At x = y, v_n = 0, ∂v_n/∂x = 0. Hence, for 1 ≤ x ≤ a, v_n(x, y) assumes its maximum in x either at x = 1 or x = a:

max_{1≤x≤a} v_n(x, y) = max(v_n(1), v_n(a)),

where we have omitted the y-dependence on the right.
< <
y a, a,( I ) = v,( 1, y ) is a monotone increasing function of y and vn(u) = v,(a,y) is a monotone decreasing function of y. Therefore, there is exactly one point of intersection of vn( 1, y ) and v,(a, y ) for 1 y a. Call this value y, . T h is yields the minimum a, of the expression max, (u, - x). over y , for 1 y
4. For 1
< < < <
5. Show that y_n > y_{n+1}.

6. Show that y_n converges to √a as n → ∞. Hint: Show that

lim_{n→∞} v_n(1, √a)/v_n(a, √a) = 1.
7. Establish corresponding results for the sequence {u_n} generated by the analogous recurrence relation; this may be used to calculate the Rth root of a number.
8. Establish corresponding results for the sequence {u_n} generated by u_{n+1} = xe^{−u_n} + u_n − 1, u_0 = x, used to generate logarithms.
9. What results hold for the solution of φ(x) = x generated in this fashion, where φ is convex?
10. Consider the case where a = 10. Using the recurrence relation v_{n+1} = v_n²/2(x + v_n), calculate v_n(1, y) and v_n(10, y) and show graphically that, approximately,

y_2 = 4.71,  min_y v_2 = 2.20
y_3 = 4.13,  min_y v_3 = 1.04
y_4 = 3.72,  min_y v_4 = 0.312
y_5 = 3.44,  min_y v_5 = 3.59 × 10^{−2}
y_6 = 3.34,  min_y v_6 = 2.45 × 10^{−4}
y_7 = 3.23,  min_y v_7 = 2.44 × 10^{−8}
11. Show that if for n = 7 we use y = 3 instead of y_7, we find an error at x = 10 of 3.58 × 10^{−8}; if y = 4, an error at x = 1 of 3.39 × 10^{−8}. This shows the sensitivity to the choice of an initial approximation if a fixed number of iterations is employed. The foregoing is taken from an unpublished paper, R. Bellman, On a Minimax Problem Occurring in the Computation of Square Roots, 1952. See also
P. C. Hammer, Proc. Computing Seminar, IBM, December 1949, p. 132.
J. J. Sylvester, "Meditation on the Idea of Poncelet's Theorem" and "Notes on Meditation," Mathematical Papers, Vol. II.
D. G. Moursund and G. D. Taylor, "Optimal Starting Values for the Newton-Raphson Calculation of Inverses of Certain Functions," SIAM J. Numer. Anal., Vol. 5, 1968, pp. 138-150.

5.17. Multidimensional Newton-Raphson
The same approach can be used when

f(x) = 0    (5.17.1)

is a set of N simultaneous equations for the N components of the vector x. Using the multidimensional expansion, we have

f(x) = f(x^{(n)}) + J(x^{(n)})(x − x^{(n)}) + ...,    (5.17.2)
where J(x) is the Jacobian matrix

J(x) = (∂f_i/∂x_j),  i, j = 1, 2, ..., N.    (5.17.3)

Proceeding as before, we are led to the recurrence relation

x^{(n+1)} = x^{(n)} − J(x^{(n)})^{−1} f(x^{(n)}),  n = 0, 1, ...,    (5.17.4)

for the determination of successive approximations to the solution of (5.17.1). The proof of convergence now proceeds in the usual fashion. Let x = 0 be the desired solution. We show first that ||x^{(n)}|| ≤ k_1, n = 1, 2, ..., provided that ||x^{(0)}|| is sufficiently small. Then we show that ||x^{(n+1)}|| ≤ k_2 ||x^{(n)}||, where k_2 < 1, which implies that x^{(n)} → 0 as n → ∞. Finally, we show that the convergence is quadratic. A simple set of requirements is that J(x) evaluated at x = 0 be nonsingular and that f possess uniformly bounded second partial derivatives with respect to the components of x in the neighborhood of x = 0. We leave the filling in of the details as a set of exercises.
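A minimal sketch of the recurrence (5.17.4) for N = 2, with the 2 × 2 Jacobian inverted by Cramer's rule; the sample system and starting point are our own illustrative choices:

```python
def newton2(f, jac, x0, iters=25):
    # x^{(n+1)} = x^{(n)} - J(x^{(n)})^{-1} f(x^{(n)})   (eq. 5.17.4), N = 2.
    x, y = x0
    for _ in range(iters):
        f1, f2 = f(x, y)
        (a, b), (c, d) = jac(x, y)          # 2x2 Jacobian matrix
        det = a * d - b * c
        # Solve J * step = f by Cramer's rule, then subtract the step.
        x -= (f1 * d - b * f2) / det
        y -= (a * f2 - c * f1) / det
    return x, y

# Sample system (our own): x^2 + y^2 - 4 = 0, x - y = 0; root x = y = sqrt(2).
f = lambda x, y: (x * x + y * y - 4.0, x - y)
jac = lambda x, y: ((2.0 * x, 2.0 * y), (1.0, -1.0))
sol = newton2(f, jac, (1.0, 0.5))
```

For larger N one would of course solve the linear system J step = f rather than invert J explicitly, precisely the cost that motivates the alternatives of Sec. 5.20.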
5.18. Choice of Initial Approximation
What the foregoing illustrates is that the choice of an initial approximation is crucial. The convergence of the method, as well as the time required to carry it through, depend on x^{(0)}. There are a number of systematic ways of obtaining an initial approximation. In general, however, this is a catch-as-catch-can affair, with the physical background and previous experience playing a considerable role. Let us discuss three different approaches to illustrate the value of a flexible stance in these matters. To begin with, returning to the equation of (5.15.1), we can always use the solution of the linear approximation L(u) − g = 0 as a starting point. Call this function u^{(0)} and write

u^{(0)} = Σ_{k=1}^{N} a_k^{(0)} u_k.    (5.18.2)

Choose the N values [a_1^{(0)}, ..., a_N^{(0)}] as the first approximation to the solution in (5.15.6).
Secondly, we can use the technique of continuation. Suppose that the problem is that of solving (5.18.1) over an interval [0, T]. The best approximation of the form

u^{(N)} = Σ_{n=1}^{N} a_n u_n    (5.18.3)

depends upon T. Let us indicate this specifically by writing

a_1 = a_1(T), ..., a_N = a_N(T)    (5.18.4)

for the optimal set of coefficients. For small T, it is often easy to determine the values a_i(T) quickly by means of the Newton-Raphson techniques, or otherwise, since the solution of L(u) = g will usually be an excellent approximation. If then the desired interval of approximation is [0, T_0], we can construct a sequence of problems, say for the intervals [0, Δ], [0, 2Δ], ..., [0, NΔ], where NΔ = T_0, using the approximation obtained for [0, kΔ] as the initial approximation for [0, (k + 1)Δ]. The choice of Δ is again a matter of experience and some experimentation. In Volume II, in connection with invariant imbedding and dynamic programming, we will use this idea in a more sophisticated way. Finally, let us briefly indicate the use of some extrapolation techniques. For small N, say N = 1, 2, 3, search methods can be used to obtain initial approximations to the minimizing a_k. Let the respective values be
At the next step, N the values
N
=
1:
u:)
N
=
2:
a(2) 1 ’ ($2)2
N
=
3:
a?),
=
4, we can use as the initial approximation [UP), UP), u
(5.18.5)
43),43).
y , 01
(5.18.6)
and then continue with the Newton-Raphson algorithm for the solution of the equations of (5.15.6). In doing this, we are implicitly assuming that the uk approach zero as k increases. T h e u, are usually arranged so that this is the case. Alternatively, we can use the values u p ) to predict ai4) (see Fig. 5.2) and, similarly, the values of a:%),a$%),n = 1, 2, 3, and thus obtain an improved initial approximation, [a?),
fq, 01. UP),
(5.18.7)
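The continuation idea described earlier in this section, solving a sequence of nearby problems and using each solution to seed the next, can be sketched in a toy setting; the scalar equation x³ + x = λ and the unit step in λ are our own illustrative choices:

```python
def solve_for(lmbda, x0, iters=30):
    # Newton's method for x^3 + x - lmbda = 0, warm-started at x0.
    x = x0
    for _ in range(iters):
        x -= (x ** 3 + x - lmbda) / (3.0 * x ** 2 + 1.0)
    return x

# Continuation: step the parameter lmbda from 1 to 10, seeding each solve
# with the solution of the previous, easier problem -- the analogue of
# using the approximation on [0, k*Delta] to start the one on [0, (k+1)*Delta].
x = 0.0
for k in range(1, 11):
    x = solve_for(float(k), x)
```

At λ = 10 the exact root is x = 2 (since 2³ + 2 = 10), and each warm start keeps Newton's method safely inside its quadratic-convergence region.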
Figure 5.2
If we don't like the initial choice 0 for a_4^{(4)}, we can use the values a_1^{(4)}, a_2^{(4)}, a_3^{(4)} to predict a value of a_4^{(4)}, considering a_k^{(4)} as a function of the subscript k. We can expect this type of extrapolation procedure to be a more and more effective technique as we increase the value of N and obtain additional values of a_i^{(n)}. Since we feel that the use of extrapolation techniques will increase rapidly over the coming years, we will consider a specific procedure in more detail below.

5.19. Nonlinear Extrapolation and Acceleration of Convergence
What we have dubbed "nonlinear extrapolation" can also be viewed as a method for the acceleration of convergence of a sequence {u_n}. Clearly, this is an important matter wherever approximate techniques are being employed. Let us describe a method, first suggested by Kronecker, which has proved of great utility. Suppose that the sequence {u_n} converges to a value u_∞ and that u_n possesses the asymptotic expansion

u_n ~ u_∞ + a_1 b_1^n + a_2 b_2^n + ...,    (5.19.1)

where 1 > b_1 > b_2 > ... . Our aim is to obtain another sequence whose limit is u_∞, the desired limiting value, possessing a more rapid approach to this value as n → ∞. What this amounts to is a procedure for filtering out the first exponential term, a_1 b_1^n, appearing above. One way to do this is the following. Write

u_n = u_∞ + a_1 b_1^n,    (5.19.2)

and solve for u_∞ in terms of the u_n. Three values are needed, which are sensibly taken consecutive. A simple calculation yields

u_∞ = (u_n u_{n+2} − u_{n+1}²)/(u_n + u_{n+2} − 2u_{n+1}).    (5.19.3)
Consider then the new sequence {v_n}, where

v_n = (u_n u_{n+2} − u_{n+1}²)/(u_n + u_{n+2} − 2u_{n+1}).    (5.19.4)

It is easy to see that

v_n = u_∞ + O(b_2^n),    (5.19.5)

as n → ∞, which is to say that the convergence has been accelerated. This procedure may now be repeated, starting with the sequence {v_n}, to obtain arbitrarily rapid convergence.

Exercise
1. What is the corresponding result if we start with the expression u_n = u_∞ + a_1 b_1^n + a_2 b_2^n?
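The filtering formula (5.19.3)-(5.19.4) is easy to test on a synthetic sequence; the values b_1 = 0.5, b_2 = 0.1 and limit u_∞ = 1 below are our own illustrative choices:

```python
def accelerate(u):
    # v_n = (u_n u_{n+2} - u_{n+1}^2) / (u_n + u_{n+2} - 2 u_{n+1})
    # per (5.19.3)-(5.19.4); needs three consecutive terms per output value.
    return [(u[n] * u[n + 2] - u[n + 1] ** 2) /
            (u[n] + u[n + 2] - 2.0 * u[n + 1])
            for n in range(len(u) - 2)]

# Synthetic sequence u_n = 1 + b1^n + b2^n with b1 = 0.5, b2 = 0.1, limit 1.
u = [1.0 + 0.5 ** n + 0.1 ** n for n in range(10)]
v = accelerate(u)
```

The dominant 0.5^n term is filtered out, leaving an error of order 0.1^n, markedly smaller than that of the raw sequence at the same index.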
5.20. Alternatives to Newton-Raphson
One of the serious difficulties associated with the use of the Newton-Raphson technique is the calculation of the Jacobian matrix J(x^{(n)}) and its inversion at each step. For that reason, a number of alternative procedures may be found in the literature. References are given at the end of the chapter.

Exercise
1. As a method of approximating the solution of x = φ(x) without the use of a derivative, consider the recurrence relation of Steffensen,

x_{n+1} = x_n − (φ(x_n) − x_n)² / (φ(φ(x_n)) − 2φ(x_n) + x_n).

What is the order of convergence? For extensions and additional references, see
S. Yu. Ulm, "Extension of Steffensen's Method for Solving Nonlinear Operator Equations," USSR Comp. Math. & Math. Phys., Vol. 4, 1964, pp. 159-165.
I. Pavaloiu, "Sur la Méthode de Steffensen pour la Résolution des Équations Opérationnelles Non Linéaires," Revue Roumaine de Mathématiques Pures et Appliquées, Tome 13, 1968, pp. 857-862.
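A sketch of the classical Steffensen iteration for a fixed-point equation x = φ(x); the sample map φ(x) = cos x and the iteration count are our own choices:

```python
import math

def steffensen(phi, x0, iters=20):
    # Classical derivative-free Steffensen iteration for x = phi(x):
    # x_{n+1} = x_n - (phi(x_n) - x_n)^2 / (phi(phi(x_n)) - 2 phi(x_n) + x_n)
    x = x0
    for _ in range(iters):
        p, pp = phi(x), phi(phi(x))
        denom = pp - 2.0 * p + x
        if denom == 0.0:        # already at (or numerically at) the fixed point
            return x
        x -= (p - x) ** 2 / denom
    return x

# Fixed point of phi(x) = cos x (about 0.739085).
root = steffensen(math.cos, 1.0)
```

Only evaluations of φ are required, yet the convergence near a simple fixed point is quadratic, like Newton's method.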
5.21. Lagrange Expansion
We have indicated in the preceding pages that the application of the Bubnov-Galerkin method to the solution of nonlinear functional equations leads to the problem of solving a system of nonlinear equations of the form

x = a + g(x).    (5.21.1)

One approach, as we have seen, to the numerical solution of equations of this type is based on the use of the Newton-Raphson method. Another approach depends upon the introduction of a fictitious parameter t. Write

x = a + tg(x),    (5.21.2)

and attempt to determine x as a power series in t,

x = a + tg(a) + ... .    (5.21.3)

It would appear to be quite tedious to obtain the higher-order terms. Fortunately, an elegant formula is available for obtaining the coefficients in this power series expansion. In the one-dimensional case, it is due to Lagrange; in the multidimensional case, to Stieltjes and Poincaré. Let us consider here only the one-dimensional case to indicate the type of result available. Let g(x) be an analytic function of x within a contour surrounding a, and let f(x) be another analytic function of x within this contour. Then, for |t| << 1, we have

f(x) = f(a) + Σ_{n=1}^{∞} (t^n/n!) (d/da)^{n−1} [f'(a) g(a)^n].    (5.21.4)

There are many ways to derive this, as we indicate below in the exercises. Probably the most straightforward is by means of the Cauchy integral formula. We have

f(x) = (1/2πi) ∫_C f(w)[1 − tg'(w)]/[w − a − tg(w)] dw,    (5.21.5)

where C is the contour mentioned above. Take t small enough so that |tg(w)| << |w − a| on the contour. Then, using the expansion

1/[w − a − tg(w)] = Σ_{n=0}^{∞} t^n g(w)^n/(w − a)^{n+1},    (5.21.6)
we obtain two sums of contour integrals. Integrating the integrals in the second sum by parts, we derive the stated result. There are analogous results for the multidimensional case, as indicated in the references.

Exercises
1. Use the foregoing result to obtain power series solutions of the equations
(a) u = a + tu^n,
(b) u = a + te^{bu} (Ramanujan's equation),
(c) u = a + t sin u (Kepler's equation).

2. Starting with the equation z = a + t(z² − 1), deduce the corresponding expansion.
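For Kepler's equation in part (c), the first few Lagrange coefficients can be checked numerically; this sketch (our own construction) compares the truncated series with a direct fixed-point solution at a small value of t:

```python
import math

def kepler_fixed_point(a, t, iters=60):
    # Solve u = a + t*sin(u) (Kepler's equation) by direct iteration,
    # contractive for |t| < 1.
    u = a
    for _ in range(iters):
        u = a + t * math.sin(u)
    return u

def kepler_lagrange(a, t):
    # First terms of the Lagrange expansion
    # u = a + sum_n (t^n/n!) (d/da)^{n-1} [sin^n a]:
    c1 = math.sin(a)                                   # n = 1
    c2 = math.sin(a) * math.cos(a)                     # (1/2!) d/da sin^2 a
    c3 = (math.sin(a) * math.cos(a) ** 2
          - math.sin(a) ** 3 / 2.0)                    # (1/3!) d^2/da^2 sin^3 a
    return a + t * c1 + t ** 2 * c2 + t ** 3 * c3

u_exact = kepler_fixed_point(1.0, 0.05)
u_series = kepler_lagrange(1.0, 0.05)
```

The truncated series agrees with the converged iterate to O(t⁴), confirming the first three Lagrange coefficients.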
3. Starting with z = a + tz, establish the expansion of e^{−a/(1−t)}/(1 − t) as a power series in t.
4. If u = g(x − ut), show that u_t = −uu_x, and thus derive the Lagrange expansion. This was the procedure used by Lagrange. See

E. Goursat and E. R. Hedrick, A Course in Mathematical Analysis, Ginn & Co., Boston, 1904, pp. 404-405.
E. D. Banta, "Lossless Propagation of One-dimensional Finite Amplitude Sound Waves," J. Math. Anal. Appl., Vol. 10, 1965, pp. 166-173.
P. Henrici, "An Algebraic Proof of the Lagrange-Bürmann Formula," J. Math. Anal. Appl., Vol. 8, 1964, pp. 218-224.
W. F. Noh and M. H. Protter, "Difference Methods and the Equations of Hydrodynamics," J. Math. Mech., Vol. 12, 1963, pp. 149-192.
5. Consider the following formal technique for deriving the Lagrange expansion. Let φ(x) and u(x) be infinitely differentiable functions and form the integral ∫ φ(x) u(x − tg(x)) dx, with suitably chosen limits. Perform the change of variable x − tg(x) = y. Using the Taylor expansion of u(x − tg(x)) about x, integrate each term ∫ φ(x) g(x)^n u^{(n)}(x) dx repeatedly by parts to eliminate u^{(n)}, and thus, by equating coefficients of t^n on each side, obtain the Lagrange expansion. (E. Heine.)
6. Use the double integral

∫∫ φ(x_1, x_2) u(x_1 − t_1 g_1(x_1, x_2), x_2 − t_2 g_2(x_1, x_2)) dx_1 dx_2

in a similar fashion to obtain a two-dimensional Lagrange expansion.
7. Use the fact that u = g(x − ut, y − vt), v = h(x − ut, y − vt) constitute the solution of u_t = −(uu_x + vu_y), v_t = −(uv_x + vv_y), u = g(x, y), v = h(x, y) at t = 0 to obtain a multidimensional Lagrange expansion.
5.22. Method of Moments Applied to Partial Differential Equations
The "method of moments" permits us to reduce ordinary differential equations to algebraic equations and partial differential equations to ordinary differential equations. A good example of this is furnished by the equation

L(u) = u_xx − u_t − gu = f,  0 < x < π,  t > 0,    (5.22.1)

where u is subject to the conditions

u(x, 0) = 0,  u(0, t) = u(π, t) = 0.    (5.22.2)

Use the approximation

u = Σ_{k=1}^{N} u_k(t) sin kx    (5.22.3)
and the moment conditions

∫_0^π (L(u) − f) sin kx dx = 0,  k = 1, 2, ..., N.    (5.22.4)

Then (5.22.4) yields a system of linear differential equations

−j² u_j − u_j' − Σ_{k=1}^{N} u_k b_kj − a_j = 0,    (5.22.5)

where

a_j = (2/π) ∫_0^π f sin jx dx,  b_kj = (2/π) ∫_0^π g sin kx sin jx dx.

A discussion of the validity of this procedure will be found in the references given at the end of the chapter.
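As a numerical sketch (our own construction, with g = 0 so that the b_kj terms drop out), the moment system (5.22.5) can be integrated directly and compared against the exact solution u = (1 − e^{−t}) sin x of the test case f = −sin x:

```python
import math

# Moment (Galerkin) reduction of u_xx - u_t - g(x)u = f(x), u(x,0) = 0,
# u(0,t) = u(pi,t) = 0, with u = sum_k u_k(t) sin kx.  The conditions
# (5.22.4) give  u_j' = -j^2 u_j - sum_k u_k b_kj - a_j.
# Test case (our own): g = 0, f = -sin x; exact solution u = (1 - e^{-t}) sin x,
# i.e. u_1(t) = 1 - e^{-t} and u_2 = u_3 = 0.

N, M = 3, 400                       # number of modes, quadrature points
xs = [(i + 0.5) * math.pi / M for i in range(M)]
w = math.pi / M                     # midpoint-rule weight

def coeff(func, j):
    # (2/pi) * integral_0^pi func(x) sin(jx) dx by the midpoint rule
    return (2.0 / math.pi) * w * sum(func(x) * math.sin(j * x) for x in xs)

a = [coeff(lambda x: -math.sin(x), j) for j in range(1, N + 1)]

u = [0.0] * N                       # u_j(0) = 0
dt = 1e-3
for _ in range(1000):               # forward Euler on the moment ODEs, to t = 1
    u = [u[j] + dt * (-(j + 1) ** 2 * u[j] - a[j]) for j in range(N)]
```

At t = 1 the first mode reproduces 1 − e^{−1} to the accuracy of the Euler step, and the higher modes remain at zero, as the exact solution requires.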
Miscellaneous Exercises
1. Write u = a sin(t + φ), where a and φ are functions of t. Then

du/dt = (da/dt) sin(t + φ) + a cos(t + φ) + [a cos(t + φ)] dφ/dt.

Set

(da/dt) sin(t + φ) + [a cos(t + φ)] dφ/dt = 0.

Then du/dt = a cos(t + φ) and

d²u/dt² + u = (da/dt) cos(t + φ) − [a sin(t + φ)] dφ/dt.

Using these results, obtain an alternate solution of u″ + u = f(t).
2. Consider the equation u″ + u + εf(u, u′) = 0. As above, set

u = a sin(t + φ),  u′ = a cos(t + φ),

where a and φ are functions of t. Proceeding as before, obtain the equations

da/dt = −εf(a sin(t + φ), a cos(t + φ)) cos(t + φ),
dφ/dt = (ε/a) f(a sin(t + φ), a cos(t + φ)) sin(t + φ).

Conclude from this that a and φ are constant to terms which are O(ε).
3. From the foregoing, show that a useful approximation for a and φ is given by the averaged equations

da/dt = −(ε/2π) ∫_0^{2π} f(a sin θ, a cos θ) cos θ dθ,
dφ/dt = (ε/2πa) ∫_0^{2π} f(a sin θ, a cos θ) sin θ dθ.

See

N. Kryloff and N. Bogoliuboff, "Introduction to Nonlinear Mechanics," Ann. Math. Studies, No. 11, Princeton University Press, Princeton, New Jersey, 1947.
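As an illustrative check (our own construction), the averaged amplitude equation of Exercise 3 can be evaluated by quadrature for the Van der Pol nonlinearity f(u, u′) = (u² − 1)u′; the averaged amplitude should approach the well-known limit-cycle value a = 2:

```python
import math

def f(u, uprime):
    # Van der Pol nonlinearity (our illustrative choice): f(u, u') = (u^2 - 1) u'
    return (u * u - 1.0) * uprime

def da_dt(a, eps, K=200):
    # Averaged equation of Exercise 3:
    # da/dt = -(eps/2pi) * integral_0^{2pi} f(a sin th, a cos th) cos th dth,
    # evaluated by the rectangle rule (exact here: the integrand is a
    # low-degree trigonometric polynomial).
    h = 2 * math.pi / K
    s = sum(f(a * math.sin(k * h), a * math.cos(k * h)) * math.cos(k * h)
            for k in range(K))
    return -eps / (2 * math.pi) * s * h

# Integrate the slow amplitude equation by Euler; a tends to the
# classical Van der Pol limit-cycle amplitude a = 2.
a, eps, dt = 0.5, 0.1, 0.05
for _ in range(4000):
    a += dt * da_dt(a, eps)
```

Carrying out the quadrature analytically gives da/dt = (εa/2)(1 − a²/4), whose stable equilibrium is indeed a = 2.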
4. Consider the special case where f(u, u′) = f(u). Show that an approximate solution according to the foregoing technique is then

u = a sin[ω(a) t + φ],

where

ω(a)² = (1/πa) ∫_0^{2π} [a sin φ_1 + εf(a sin φ_1)] sin φ_1 dφ_1.

5. Consider the equation u″ + u + ε(u − u³/6) = 0. What is the approximate change in frequency of the solution due to nonlinearity?
6. Consider the equation u″ + λg(u, u′) + u = 0. Introduce polar coordinates u = r cos θ, u′ = r sin θ. Show that the equation becomes

dr/dθ = λ sin θ g(r cos θ, r sin θ) / [1 + λ cos θ g(r cos θ, r sin θ)/r].
7. Show that a periodic solution of the equation in u corresponds to a solution of the equation in r which has period 2π in θ. Show that the condition for periodicity is

∫_0^{2π} sin θ g(r cos θ, r sin θ) / [1 + λ cos θ g(r cos θ, r sin θ)/r] dθ = 0.

8. In the limit as λ → 0, we have r = a, a constant, and thus the limiting values of the amplitudes of periodic solutions as λ → 0 are the roots of

∫_0^{2π} sin θ g(a cos θ, a sin θ) dθ = 0.
9. Apply the foregoing to the equation u″ + λ sin u′ + u = 0. Do all of the values obtained correspond to periodic solutions for λ > 0? Which periodic solutions are stable? See

H. Hochstadt and B. H. Stephan, "On the Limit Cycles of x″ + μ sin x′ + x = 0," Arch. Ratl. Mech. Anal., Vol. 23, 1967, pp. 369-379.
10. Let u(t) be a scalar function which is strictly convex, u″ > 0 for all t. Show that u(t) can have at most one relative minimum and that if this point exists it is an absolute minimum.
11. Let f(x_1, x_2) be a function of two variables which is strictly convex, i.e., the quadratic form f_{x_1x_1} r_1² + 2f_{x_1x_2} r_1 r_2 + f_{x_2x_2} r_2² is positive definite. Show that f has at most one stationary point, i.e., a point where f_{x_1} = 0, f_{x_2} = 0, and that if this point exists, it is an absolute minimum. Hint: Consider the function of one variable f(t) = f(ta_1 + (1 − t) b_1, ta_2 + (1 − t) b_2), where (a_1, a_2) and (b_1, b_2) are two supposed stationary points. Show that f′(0) = f′(1) = 0, f″(t) > 0, and hence deduce a contradiction.
12. If ug(u) > 0 for u ≠ 0, the equation u″ + g(u) = 0 has a family of closed paths given by (u′)²/2 + G(u) = k, where G(u) = ∫_0^u g(u_1) du_1. A necessary and sufficient condition that every solution oscillate around u = u′ = 0 with bounded period is that ug(u) > 0 for u ≠ 0, g(0) = 0, g′(0) ≠ 0.

13. Let τ_1 and τ_2 respectively be the times required to reach (0, −R) from (0, R) through (a, 0) and (0, R) from (0, −R) through (b, 0). Show that R² = 2G(a) = 2G(b).
14. Under what conditions is the period of every periodic solution of u″ + g(u) = 0 equal to the same constant ω_0? For the foregoing and additional references, see

M. Urabe, Relation Between Periods and Amplitudes of Periodic Solutions of x″ + g(x) = 0, Mathematics Research Center Report No. 437, November, 1963.
15. Determine g(u) so that the period is a given function of the velocity at R. See

M. Urabe, "The Potential Force Yielding a Periodic Motion Whose Period is an Arbitrary Continuous Function of the Amplitude of the Velocity," Arch. Ratl. Mech. Anal., Vol. 11, 1962, pp. 27-33.
16. (Lindstedt renormalization.) Consider the equation u″ + u + εu³ = 0 and make a change of independent variable

t = s(1 + c_1 ε + c_2 ε² + ...),

where c_1, c_2, ..., are constants to be determined. The new equation has the form

u″ + u + ε(2c_1 u + u³) + ε²(...) + ... = 0.

Write u = a cos s + εu_1(s) + ε²u_2(s) + ... . Then

u_1″ + u_1 = −(2ac_1 cos s + a³ cos³ s).

Show that c_1 can be determined so that u_1 has no secular term, and that this condition yields c_1 = −3a²/8. Thus,

u = a cos s + (a³ε/32) cos 3s + O(ε²),
s = t/(1 − 3a²ε/8) + O(ε²).
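The frequency shift predicted by Exercise 16 can be checked numerically; in this sketch (our own, with a = 1 and ε = 0.1) we integrate the equation with a Runge-Kutta scheme, measure the period directly, and compare with 2π(1 − 3a²ε/8):

```python
import math

def duffing_period(a, eps, dt=1e-4):
    # Integrate u'' + u + eps*u^3 = 0 from u(0) = a, u'(0) = 0 with RK4 and
    # measure the time between the first two downward zero crossings of u,
    # i.e. one full period.
    def deriv(u, v):
        return v, -(u + eps * u ** 3)
    u, v, t = a, 0.0, 0.0
    crossings = []
    while len(crossings) < 2 and t < 50.0:
        k1u, k1v = deriv(u, v)
        k2u, k2v = deriv(u + 0.5 * dt * k1u, v + 0.5 * dt * k1v)
        k3u, k3v = deriv(u + 0.5 * dt * k2u, v + 0.5 * dt * k2v)
        k4u, k4v = deriv(u + dt * k3u, v + dt * k3v)
        un = u + dt * (k1u + 2 * k2u + 2 * k3u + k4u) / 6
        vn = v + dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        if u > 0.0 >= un:                       # downward zero crossing of u
            crossings.append(t + dt * u / (u - un))
        u, v, t = un, vn, t + dt
    return crossings[1] - crossings[0]

T_numeric = duffing_period(1.0, 0.1)
T_lindstedt = 2 * math.pi * (1 - 3 * 0.1 / 8)   # 2*pi*(1 - 3*a^2*eps/8), a = 1
```

The measured period is shorter than 2π, as the hardening nonlinearity requires, and agrees with the first-order Lindstedt value to within the expected O(ε²) discrepancy.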
17. Obtain the solution to O(ε³) in this fashion.
18. Carry through the same procedure for the Van der Pol equation u″ + ε(u² − 1) u′ + u = 0.
19. Consider the solution u of u″ + u + εu³ = 0 as an analytic function of ε, and thus the period as an analytic function of ε. What is the radius of convergence of the power series in ε for the period?

20. (Shohat renormalization.) Consider the following technique for increasing the range of validity of the Lindstedt technique. Write s = tv as before, obtaining

v²u″ + u + εu³ = 0,

or

(εv)² u″ + ε²u + ε³u³ = 0.

Now set μ = ε/(1 + ε), or ε = μ/(1 − μ), and

εv = μ + c_2 μ² + c_3 μ³ + ...,
u = cos s + μu_1(s) + μ²u_2(s) + ...,

and proceed as before to determine c_2, c_3, ..., by the condition that no secular terms arise in u_1, u_2, ... . Show that

c_2 = 1.37500,  c_3 = 1.66797,  c_4 = 1.91845,  c_5 = 2.14083.

Why should we expect the series in μ to converge for a larger range of positive ε-values, and does a bilinear transformation μ = ε/(k + ε) exist which yields a power series in μ convergent for all ε ≥ 0?
21. Carry out the same procedure for the Van der Pol equation, obtaining c_2 = 1, c_3 = 15/16, c_4 = 13/16. This yields the following approximate values:

λ       v (Shohat)    v (Van der Pol)
0.33    0.98          0.99
1.0     0.93          0.90
2.0     0.77          0.78
8.0     0.35          0.39
10.0    0.30          0.31
See
J. Shohat, "On Van der Pol's and Related Nonlinear Differential Equations," J. Appl. Phys., Vol. 15, 1944, pp. 568-574.
R. Bellman, "On Perturbation Methods Involving Expansions in Terms of a Parameter," Quart. Appl. Math., Vol. 13, 1955, pp. 195-200.
Ja. F. Kajuk, "Analytic Continuation of Solutions of Nonlinear Differential Equations with Respect to a Parameter" (in Russian), Ukrain. Mat. Z., Vol. 19, 1967, pp. 131-137.
G. Sandri, "Uniformization of Asymptotic Expansions," Nonlinear Partial Differential Equations: A Symposium on Methods of Solution (Newark, Delaware, 1965), Academic Press, New York, 1967, pp. 259-277.
Although it would, from the agreement between the results of Van der Pol obtained graphically and those of Shohat, appear that the Shohat series converges for all λ ≥ 0, this is not the case, as has been shown by deBruijn. See

N. G. deBruijn, "A Note on Van der Pol's Equation," Philips Research Reports, Vol. 1, 1946, pp. 401-406.

For many further results, see

M. Urabe, "Periodic Solution of Van der Pol's Equation with Damping Coefficient λ = 0 (0.2) 1.0," J. Sci. Hiroshima Univ., Ser. A, Vol. 21, 1958, pp. 195-207.
M. Urabe, H. Yanagiwara, and Y. Shinohara, "Periodic Solution of Van der Pol's Equation with Damping Coefficient λ = 2 - 10," ibid., Vol. 23, 1960, pp. 325-366.
M. Urabe, "Remarks on Periodic Solutions of Van der Pol's Equation," ibid., Vol. 24, 1960, pp. 197-199.
See also, for extensions of the Lindstedt technique,
J. Lighthill, "A Technique for Rendering Approximate Solutions to Physical Problems Uniformly Valid," Phil. Mag., Vol. 40, 1949, pp. 1179-1201.
G. Temple, "Linearization and Nonlinearization," Proc. Intern. Congress of Mathematicians, August 1958, pp. 233-247.
M. H. Millman and J. B. Keller, "Perturbation Theory of Nonlinear Boundary-value Problems," J. Math. Phys., Vol. 10, 1969, pp. 342-360.
22. Consider the equation u″ + ω²u + εf(u, u′) = 0, where ε << 1, and the approximate solution u = a(t) sin(ωt + φ(t)), where a(t) and φ(t) are determined by the Krylov-Bogoliuboff averaging method. If a(t) and φ(t) are given, can one determine f(u, u′)? Consider the case f(u, u′) = u′g(u²) + uk(u²). Show that one is led to an Abel integral equation.

M. Tobak, On Deduction of Certain Nonlinear Differential Equations from their Solutions, NASA Technical Note TN D-2779, 1965.
23. Consider the equation u″ + 1.5u + (u − 1.5 sin t)³ = 2 sin t. Let u = a_1 sin t + a_3 sin 3t be a Galerkin approximation. Show that a_1 = 1.59941, a_3 = −0.00004. (M. Urabe.)

24. Let the fifth-order approximation be

u = b_1 sin t + b_3 sin 3t + b_5 sin 5t.
Using the value of b_1 given above as an initial approximation, and b_3 = b_5 = 0, together with the Newton-Raphson procedure, show that

b_1 = 1.431189037,  b_3 = 0.126915530,  b_5 = 0.009754734.

(L. Cesari, M. Urabe.)

25. Consider the Van der Pol equation with a forcing term

u″ + λ(u² − 1) u′ + u = λ sin t.

For λ = 0.1, obtain the Galerkin approximation

u = −2.3788 cos t − 0.14233 sin t − 0.0046 cos 3t + 0.0419 sin 3t + ... .

M. Urabe, "Galerkin's Procedure for Nonlinear Periodic Systems," Les Vibrations Forcées dans les Systèmes Nonlinéaires, Marseille, 1964, pp. 217-235.
26. Consider the equation εu″ + u′ + u = 0, u(0) = c, ε > 0, and the expression J(u) = ∫_0^∞ (εu″ + u′ + u)² dt. Similarly, consider the equation εu″ + u′ + u = u², u(0) = c, |c| < 1. Can one use the Bubnov-Galerkin method to obtain approximate solutions in the case where |ε| << 1?
27. Consider the following method due to Steinhaus for finding the solution of the linear system

L_1 = a_11 x_1 + a_12 x_2 + a_13 = 0,
L_2 = a_21 x_1 + a_22 x_2 + a_23 = 0.

From an arbitrary point P_0 on L_1, drop a perpendicular on L_2. This determines P_1. From P_1, drop a perpendicular on L_1. This determines P_2, and so on. Show that the sequence {P_n} converges to the desired solution. Obtain the N-dimensional generalization of this.
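A short sketch of this alternating-perpendicular scheme (the particular lines and the starting point are our own illustrative choices):

```python
def foot(p, line):
    # Perpendicular foot of the point p = (x, y) on the line a*x + b*y + c = 0.
    a, b, c = line
    t = (a * p[0] + b * p[1] + c) / (a * a + b * b)
    return (p[0] - a * t, p[1] - b * t)

# L1: x + 2y - 4 = 0 and L2: x - y - 1 = 0 meet at (2, 1).
L1, L2 = (1.0, 2.0, -4.0), (1.0, -1.0, -1.0)
p = (10.0, -3.0)            # a point on L1, our arbitrary P_0
for _ in range(200):
    p = foot(p, L2)         # drop a perpendicular on L2 ...
    p = foot(p, L1)         # ... then back on L1, and repeat
```

Each pair of projections contracts the distance to the intersection by the square of the cosine of the angle between the lines, so the sequence converges geometrically.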
28. Extend the foregoing procedure to finding the simultaneous solution of f(x, y) = 0, g(x, y) = 0.

Bibliography and Comments

§5.1. For an important survey of applications of the Bubnov-Galerkin method, see
T. J. Higgins, "The Approximate Mathematical Methods of Applied Physics as Exemplified by Application to Saint-Venant's Torsion Problem," J. Appl. Phys., Vol. 14, 1943, pp. 469-481.

See also
B. A. Finlayson and L. E. Scriven, "The Method of Weighted Residuals - A Review," Appl. Mech. Reviews, Vol. 19, 1966, pp. 735-748.
§5.6. For a discussion of methods of treating the nonlinear equation

u″ + f(u) = a cos ωt,

together with numerous examples and references, see

N. Minorsky, Nonlinear Oscillations, D. Van Nostrand, Princeton, New Jersey, 1962.
K. O. Friedrichs and J. J. Stoker, "Forced Vibrations of Systems with Nonlinear Restoring Force," Quart. Appl. Math., Vol. 1, 1943, pp. 97-115.
M. E. Levenson, "A Numerical Determination of Subharmonic Response for the Duffing Equation x″ + αx + βx³ = F cos ωt, α > 0," Quart. Appl. Math., Vol. 25, 1967, pp. 11-17.
H. H. Denman and Y. King Liu, "Application of Ultraspherical Polynomials to Nonlinear Oscillations II. Free Oscillations," Quart. Appl. Math., Vol. 22, 1965, pp. 273-292.
H. H. Denman, "Ultraspherical Potential Approximation in Nonlinear Symmetric Free Oscillations," J. Franklin Inst., Vol. 286, 1968, pp. 346-352.
J. E. Littlewood, "Unbounded Solutions of y″ + g(y) = p(t) with p(t) Periodic and Bounded, and g(y)/y → ∞ as y → ±∞," J. London Math. Soc., Vol. 41, 1966, pp. 497-507.

For a detailed discussion of the application of Galerkin's method to problems of this nature, see
M. Urabe, "Galerkin's Procedure for Nonlinear Periodic Systems and Its Extension to Multipoint Boundary Value Problems for General Nonlinear Systems," Numerical Solutions of Nonlinear Differential Equations, Wiley & Sons, New York, 1966, pp. 297-327.
M. Urabe, "Galerkin's Procedure for Nonlinear Periodic Systems," Arch. Ratl. Mech. Anal., Vol. 20, 1965, pp. 120-152.
M. Urabe and A. Reiter, "Numerical Computation of Nonlinear Forced Oscillations by Galerkin's Procedure," J. Math. Anal. Appl., Vol. 14, 1966, pp. 107-140.
M. Urabe, "Numerical Solution of Multi-point Boundary Value Problems in Chebyshev Series - Theory of the Method," Numerische Mathematik, Vol. 9, 1967, pp. 341-366.

The applicability of nonlinear oscillators of the Van der Pol type to the explanation of various biological phenomena was indicated by Van der Pol in his original paper and was also considered by N. Wiener; see
B. Van der Pol, "Über Relaxationsschwingungen," Jahrb. drahtl. Telegr. und Teleph., Vol. 28, 1927, pp. 178-184; Vol. 29, 1927, pp. 114-118.

The possible application to quantum mechanics was discussed in 1946 by N. Minorsky and R. Bellman in unpublished work, and more recently in

J. Andrade e Silva, F. Fer, P. Lenuste, and G. Lochak, "Nonlinéarité, Cycles Limites et Quantification," C. R. Acad. Sci. Paris, Vol. 251, 1960, pp. 2662-2664.
Duerr, Heisenberg, Mitter, Schlieder, and Yamazaki, "Zur Theorie der Elementarteilchen," Zeit. Naturforschung, Bd. 14a, 1959, pp. 441-485.

For the application of elliptic functions to equations of this nature, see
C. S. Hsu, “On the Application of Elliptic Functions in Nonlinear Forced Oscillations,” Quart. Appl. Math., Vol. 17, 1960, pp. 393-407.
K. Munakata, “Use of Elliptic Functions for Nonlinear Equations,” J. Phys. Soc. Japan, Vol. 7, 1952.
Elliptic functions can also be used to study the equation u″ + (a + b sin t)u = 0, an equation of Mathieu type, in the following fashion. The solution of

u″ + (a + b sn t)u = 0,

where sn t is the sine-amplitude function, can be expressed in terms of elliptic functions, a classical result of Hermite. The limit of sn t as the modulus, k², approaches zero is sin t. Thus an explicit solution of the Mathieu equation can be obtained in terms of the limit of appropriate elliptic functions. With the aid of modern digital computers, this becomes a feasible approach to the analytic and computational study of differential equations with periodic coefficients.

§5.10. See

R. Bellman, "A 'Tucking-in' Technique in Perturbation Theory," J. Diff. Eq., Vol. 4, 1968, pp. 1-3.
§5.16. For an important variant of the Newton-Raphson method, which avoids the use of partial derivatives at the expense of a convergence order of (1 + √3)/2 < 2, see

S. M. Robinson, "Interpolative Solution of Systems of Nonlinear Equations," SIAM J. Numer. Anal., Vol. 3, 1966, pp. 650-658.

See also

A. M. Ostrowski, Solution of Equations and Systems of Equations, Academic Press, New York, 1960, Appendix D.
R. Bellman and R. Kalaba, Quasilinearization and Nonlinear Boundary Value Problems, American Elsevier, New York, 1965.
D. G. Moursund, "Optimal Starting Values for Newton-Raphson Calculation of x^{1/2}," Comm. ACM, Vol. 10, 1967, pp. 430-432.
D. G. Moursund and G. D. Taylor, "Optimal Starting Values for the Newton-Raphson Calculation of Inverses of Certain Functions," SIAM J. Numer. Anal., Vol. 5, 1968, pp. 138-150.
L. L. Wegge, "A Family of Functional Iterations and the Solution of the Maximum Likelihood Estimating Equation," Econometrica, Vol. 37, 1969, pp. 122-130.
P. H. Sterbenz and C. T. Fike, "Optimal Starting Approximations for Newton's Method," Mathematics of Computation, Vol. 23, 1969, pp. 313-318.

For the use of quadratic convergence in an important fashion, see

J. Moser, "A New Technique for the Construction of Solutions of Nonlinear Differential Equations," Proc. Nat. Acad. Sci. U.S., Vol. 47, 1961, pp. 1824-1831.

§5.19. As might be imagined, an enormous amount of effort has been devoted to the development of both linear and nonlinear extrapolation methods for the acceleration of convergence; see, for example,

D. Shanks, "Nonlinear Transformations of Divergent and Slowly Convergent Sequences," J. Math. and Phys., Vol. 34, 1955, pp. 1-42.
H. Rutishauser, Der Quotienten-Differenzen-Algorithmus, Birkhäuser-Verlag, Basel/Stuttgart, 1957.
P. Wynn, "Confluent Forms of Certain Nonlinear Algorithms," Archiv der Mathematik, Vol. 11, 1960, pp. 215-236.
R. Bellman and R. Kalaba, "A Note on Nonlinear Summability Techniques in Invariant Imbedding," J. Math. Anal. Appl., Vol. 6, 1963, pp. 465-472.

§5.21. See

P. A. Sturrock, "Generalization of the Lagrange Expansion with Applications to Physical Problems," J. Math. and Phys., Vol. 1, 1960, pp. 405-408.

For the multidimensional expansion, see
I. J. Good, "Generalizations to Several Variables of Lagrange's Expansion, with Applications to Stochastic Processes," Proc. Cambridge Phil. Soc., Vol. 56, 1960, pp. 367-380.
D. Gorman and J. Zaborszky, "Functional Lagrange Expansion in State Space and the S-Domain," IEEE Trans. Automatic Control, Vol. AC-11, 1966, pp. 498-505.

For an integral representation due to Kronecker of the solution of a system of equations, with applications, see
G. Birkhoff and O. D. Kellogg, "Fixed Points in Function Space," Trans. Amer. Math. Soc., Vol. 23, 1922, pp. 96-115.

For the use of the Wiener integral to provide an explicit solution to various types of functional equations, see
I. M. Gelfand and A. M. Yaglom, "Integration in Functional Spaces and its Application to Quantum Mechanics," J. Math. and Phys., Vol. 1, 1960, pp. 48-68.

§5.22. For a proof that the procedure given in this section constitutes an effective method for obtaining an arbitrarily good approximation to the solution of the original equation (5.22.1), see
J. W. Green, "An Expansion Method for Parabolic Partial Differential Equations," J. Research Natl. Bur. Standards, Vol. 51, 1953, pp. 127-132.
See, for another application,
J. McCoy, "An Application of the Method of Moments to Stochastic Equations," Quart. Appl. Math., Vol. 26, 1969, pp. 521-536.
Chapter 6 DIFFERENTIAL APPROXIMATION
6.1. Introduction
In previous chapters we studied some approaches to obtaining analytical and computational solutions of equations of the form

T(u) = 0.    (6.1.1)
In this chapter we wish to reverse the process and study the problem of finding equations satisfied by a given function u. The question is particularly interesting when u is given implicitly as a solution of a functional equation of one type and we are looking for an equation of another type. In so doing we make contact with some novel problems concerning the storage and retrieval of functions. These problems have become of major significance as digital computers have increased in power and versatility. The investigation bifurcates according as we assume that u may be considered known in this fashion or must be regarded as the basic unknown quantity. In the latter case our methods furnish another technique for obtaining approximate solutions. A number of illustrations of the efficacy of the general method will be given in the text and in the exercises.
Let {T(u, b)} be a class of operators dependent on a vector parameter b. Our aim is to determine b so as to minimize the functional

|| T(u, b) ||    (6.2.1)

where || · || is an assigned norm. We expect to find that a suitably chosen solution of

T(v, b) = 0    (6.2.2)

furnishes a "convenient" approximation to u. The measure of convenience is dictated by certain properties of digital, analog, and, more generally, hybrid computers. This, in turn, is part of the contemporary concept of algorithm, a point we shall return to below.
As usual, the question of validity turns out to lie in the domain of stability theory. Studies of this nature represent an extensive generalization of the classical theory of the approximation of functions.

6.3. Linear Differential Operators
Linear differential operators with constant coefficients constitute the most important class of operators for our present purposes,

T(u, b) = u^(N) + b_1 u^(N-1) + ... + b_N u.    (6.3.1)
From the standpoint of the employment of both analog and digital computers and, of course, from that of analytic approximation, there are considerable advantages to having a function defined as a solution of an equation of the form

T(v, b) = 0.    (6.3.2)

Let us consider then the problem of minimizing the expression

∫_0^T (u^(N) + b_1 u^(N-1) + ... + b_N u)^2 dt,    (6.3.3)

with respect to the b_i, where u(t) is a given function. If we assume that the functions u, u', ..., u^(N) are linearly independent, the minimizing values are uniquely determined as the solution of the system of simultaneous linear algebraic equations
∫_0^T (u^(N) + b_1 u^(N-1) + ... + b_N u) u^(N-i) dt = 0,    (6.3.4)

i = 1, 2, ..., N, obtained by differentiation of the expression in (6.3.3).

Exercise
1. Let x(t) be a given vector function. Determine the matrix A and the vector b which minimize the expression

∫_0^T (x' - Ax - b, x' - Ax - b) dt.
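As an illustration of the minimization in Sec. 6.3, discretizing the integral in (6.3.3) turns the normal equations (6.3.4) into an ordinary linear least-squares problem. A minimal sketch (the choices u = e^{-t^2}, T = 2, N = 2 are illustrative, not from the text):

```python
import numpy as np

# Sketch: pick b_1, b_2 minimizing the discretized integral of
# (u'' + b_1 u' + b_2 u)^2 over [0, T] for the sample u = e^{-t^2}.
T = 2.0
t = np.linspace(0.0, T, 2001)
h = t[1] - t[0]

u = np.exp(-t**2)               # u = e^{-t^2}
du = -2.0 * t * u               # u'
d2u = (4.0 * t**2 - 2.0) * u    # u''

M = np.stack([du, u], axis=1)   # columns multiply b_1 and b_2
b, *_ = np.linalg.lstsq(M, -d2u, rcond=None)

residual = d2u + M @ b
print(b)                          # minimizing coefficients b_1, b_2
print((residual**2).sum() * h)    # approximate min ||T(u, b)||^2
```

Since u = e^{-t^2} satisfies no constant-coefficient linear equation, the minimum is positive; it is simply the best fit in the sense of (6.3.3).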
6.4. Computational Aspects-I
To obtain numerical values for the b_i, it is necessary, as we see from (6.3.4), to evaluate integrals of the form

I_kl = ∫_0^T u^(k) u^(l) dt,
k, l = 0, 1, ..., N - 1. Repeated integration by parts shows that the evaluation of I_kl requires a knowledge of u^(k)(0), u^(k)(T), k = 0, 1, ..., N - 1, and of I_kk, k = 0, 1, ..., N - 1. Since these integrals cannot be evaluated analytically in the general case, some type of numerical procedure must be used to determine the quantities I_kk, k = 0, 1, ..., N - 1.

Exercises
1. Determine the b_i for N = 1, 2, 3, 4 in the case where u = e^{-t^2} and T = ∞. Hint: u satisfies the equation u' + 2tu = 0, u(0) = 1. Determine min_b || T(u, b) || as a function of N for N = 1, 2, 3, 4.

2. What expressions must be evaluated for N = 3, 4 for the functions determined by the equations

(a) u'' + tu = 0,
(b) u' = -u + u^2,
(c) u'' + u + u^3 = 0,
(d) u'' + λ(u^2 - 1) u' + u = 0.

For a number of numerical results of this nature, see

A. Lew, Some Results in Differential Approximation, University of Southern California, USCEE-314, November 1968.

3. Apply the Gram-Schmidt procedure to the determination of the orthonormal sequence over [0, T] determined by u and its derivatives, i.e., form the new sequence {v_k}, where

v_k = u_k + Σ_{l=0}^{k-1} a_{kl} u_l,

k = 0, 1, ..., and ∫_0^T v_k v_l dt = δ_{kl}. Here u_l = u^(l). Express min_b || T(u, b) || in terms of the v_k.

4. From the point of view of ease of calculation of the sequence {v_k}, what advantage is there in writing v_k = u_k + Σ_{l=0}^{k-1} a_{kl} v_l in place of the foregoing, as far as the determination of the coefficients a_{kl} is concerned?

6.5. Computational Aspects-II
In many cases of importance, we can evaluate the integrals I_kl in a far more convenient fashion than numerical quadrature. Consider the case where u satisfies a differential equation

u^(M) = g(u, u', ..., u^(M-1), t),    (6.5.1)

subject to initial conditions u^(k)(0) = c_k, k = 0, 1, ..., M - 1. Introduce the new variables w_kl by means of the relation

dw_kl/dt = u^(k) u^(l),    w_kl(0) = 0,    k, l = 0, 1, ..., M - 1.    (6.5.2)

Let us then consider (6.5.1) and (6.5.2) simultaneously using an appropriate integration scheme. Then w_kl(T) is the required value of I_kl. This method works without further effort in the case where M ≥ N - 1. If M < N - 1, we differentiate (6.5.1) a total of N - 1 - M times to begin with and then proceed as before. We shall encounter this idea of adjoining differential equations again below.
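A minimal sketch of this adjoining device, for the illustrative case u' = -2tu (so u = e^{-t^2}) with the single quadrature variable w_00 satisfying w_00' = u^2; the interval, step count, and hand-rolled RK4 integrator are arbitrary choices:

```python
import numpy as np

def f(t, y):
    # State y = [u, w00]:  u' = -2 t u  (so u = e^{-t^2}),  w00' = u^2.
    u, w00 = y
    return np.array([-2.0 * t * u, u * u])

def rk4(f, y0, t0, t1, n):
    # Classical fixed-step fourth-order Runge-Kutta.
    h = (t1 - t0) / n
    t, y = t0, np.array(y0, dtype=float)
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y

u_T, I_00 = rk4(f, [1.0, 0.0], 0.0, 2.0, 400)
print(u_T, I_00)   # u(2) and w_00(2) = I_00 = ∫_0^2 u^2 dt
```

No values of u need to be stored: the quadrature rides along with the defining differential equation, which is the point of the method.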
Exercises
1. For the case where u = e^{-t^2}, determine the system of linear differential equations satisfied by u, u', ..., u^(N-1).

2. Obtain the similar system for the Bessel function of the first kind J_1(t), where J_1'' + J_1'/t + (1 - 1/t^2) J_1 = 0.

3. Given the equation u'' + a_1 u' + a_2 u = 0, u(0) = 1, u'(0) = 0, determine the coefficient b_1 which minimizes ∫_0^T (u' + b_1 u)^2 dt. What initial condition for v' + b_1 v = 0 minimizes ∫_0^T (u - v)^2 dt?

4. Discuss the possibility of the use of the foregoing techniques in reducing the dimension of an equation describing a system.

6.6. Degree of Approximation
Let u be a given function and let the b_i be determined as above so as to minimize the functional

∫_0^T (u^(N) + b_1 u^(N-1) + ... + b_N u)^2 dt.    (6.6.1)

Consider the associated differential equation

v^(N) + b_1 v^(N-1) + ... + b_N v = 0,    (6.6.2)

and suppose, for the moment, that we set

v^(k)(0) = u^(k)(0),    k = 0, 1, ..., N - 1,    (6.6.3)
as appropriate initial conditions. What relation exists between v and u? This is a stability question of the type previously discussed in Chapters 4 and 5. It is, however, of a considerably more obdurate kind in general. We can proceed initially as in Chapter 5. We see that u satisfies the equation

u^(N) + b_1 u^(N-1) + ... + b_N u = h(t),    (6.6.4)

where we suppose that the b_i have been determined so that || h(t) || ≤ ε. We might expect then that u would be close to v. The difficulty in pursuing this line of attack is that the b_i, as determined by the minimization process, are complicated functionals of u and its derivatives. It seems to be quite difficult to construct any general argument showing that we can obtain arbitrarily good approximation as N increases and to obtain some estimate of || u - v ||. Fortunately, in applications of the type described below we are not interested in large values of N. Indeed, as we shall see, the differential approximation method is meaningful only if N is of moderate size. In these cases we can try the method directly and see if it works. It shares this pragmatic property with the Bubnov-Galerkin method.

6.7. Orthogonal Polynomials
The problem of determining the asymptotic properties of the minimizing b_i as N → ∞ is crucial in determining the effectiveness of this approximation procedure for large N. Observe that if we write

|| T(u, b) ||^2 = ∫_0^T (u^(N) + b_1 u^(N-1) + ... + b_N u)^2 dt,    (6.7.1)
it is not at all obvious, as mentioned above, that min_b || T(u, b) ||^2 decreases as a function of N as N increases. If, however, we consider

min_b ∫_0^T (u + b_1 u^(1) + ... + b_N u^(N))^2 dt,    (6.7.2)

then, of course, the minimum value must decrease as N increases. Nonetheless, the values of the minimizing coefficients in (6.7.2) enter critically when we consider the equation

u + b_1 u^(1) + ... + b_N u^(N) = h(t)    (6.7.3)

and compare its solution with the solution for the case where h = 0.
In some cases, however, we can use classical analysis to guide us. Consider the situation where the interval of interest is (-∞, ∞) and u^(k) ∈ L^2(-∞, ∞) for every k. Let v be the Fourier transform of u,

v(s) = (1/√(2π)) ∫_{-∞}^{∞} u(t) e^{ist} dt.    (6.7.4)

Then

∫_{-∞}^{∞} (u^(k))^2 dt = ∫_{-∞}^{∞} s^{2k} | v(s) |^2 ds,    (6.7.5)

and

∫_{-∞}^{∞} (u^(N) + b_1 u^(N-1) + ... + b_N u)^2 dt = ∫_{-∞}^{∞} | (is)^N + b_1 (is)^{N-1} + ... + b_N |^2 | v(s) |^2 ds,    (6.7.6)

whence

min_b || T(u, b) ||^2 = min_b ∫_{-∞}^{∞} | (is)^N + b_1 (is)^{N-1} + ... + b_N |^2 | v(s) |^2 ds.    (6.7.7)

Thus the determination of the minimizing b_i is equivalent to the determination of complex orthogonal polynomials over (-∞, ∞) with the weight function | v(s) |^2. We can now apply the classical theory of orthogonal polynomials to obtain asymptotic behavior under various assumptions concerning v. We shall not pursue this line of investigation since we will have no need for results of this nature.

Exercises
1. Determine explicitly the b_i which minimize || T(u, b) ||^2 for the case where u = e^{-t^2} and the interval is (-∞, ∞). Hint: Use Hermite polynomials.

2. Using the results for the asymptotic behavior of H_n(t) as n → ∞, determine the asymptotic behavior of the b_i and that of min_b || T(u, b) || as n → ∞.

3. Using a Fourier series expansion for u and its derivatives, obtain a result analogous to (6.7.7) for the case where the interval is (0, T).
4. Let u^(k) ∈ L^2(0, ∞) for k = 0, 1, ..., N - 1, and let the b_i be determined by the condition that

∫_0^∞ (u^(N) + b_1 u^(N-1) + ... + b_N u)^2 dt

is minimized. Consider the associated characteristic roots, the roots of r^N + b_1 r^{N-1} + ... + b_N = 0. Do they all have negative real parts? (This is an important question as far as numerical stability in the calculation of the function v is concerned.)
6.8. Improving the Approximation
We begin with a function u and are led to a function v which is a solution of

T(v, b) = 0 = v^(N) + b_1 v^(N-1) + ... + b_N v,    (6.8.1)

v^(k)(0) = u^(k)(0), k = 0, 1, ..., N - 1. A question which arises immediately is whether or not we can obtain a different solution of (6.8.1) which yields a better approximation to u in the sense of decreasing the quantity

|| v - u ||^2 = ∫_0^T (v - u)^2 dt.    (6.8.2)

Since this is a global norm, we can expect that we can decrease it by loosening the fit at t = 0. Let v_1, v_2, ..., v_N denote the principal solutions of (6.8.1), and set

v = Σ_{i=1}^{N} a_i v_i,    (6.8.3)

where the a_i are to be chosen so as to minimize (6.8.2). This again leads to the solution of a system of linear algebraic equations if pursued directly. Alternatively, we can use a set of mutually orthogonal solutions of (6.8.1) in place of the principal solutions and determine the a_i quite simply.

Exercise
1. How would one go about calculating a set of mutually orthogonal solutions of T(v, b) = 0?

6.9. Extension of Classical Approximation Theory
The foregoing algorithm can be viewed as a step toward the solution of the following problem: Determine the coefficients b_1, b_2, ..., b_N and the initial conditions a_1, a_2, ..., a_N so that

∫_0^T (u - v)^2 dt    (6.9.1)
is a minimum, where u is a given function and v satisfies the differential equation

v^(N) + b_1 v^(N-1) + ... + b_N v = 0,    v^(k)(0) = a_k,    k = 0, 1, ..., N - 1.    (6.9.2)

Observe that if we fix the b_i in various ways we get some classical problems:

(a) b_1 = b_2 = ... = b_N = 0    (polynomial approximation),
(b) T(v, b) = (d^2/dt^2 + 1)(d^2/dt^2 + 4) ... (d^2/dt^2 + n^2) v    (trigonometric approximation).    (6.9.3)
Once we have agreed to allow the use of a digital computer to effect an algorithm, there is no reason why we have to restrain ourselves to linear operators, nor even to differential operators. We can contemplate nonlinear operators such as

T(u, b) = u' - b_1 u - b_2 u^2,    (6.9.4)

differential-difference operators such as

u'(t) - b_1 u(t) - b_2 u(t - T),    (6.9.5)

and integro-differential operators such as

u'(t) - b_1 u(t) - ∫_0^t k_1(t - s) u(s) ds,    (6.9.6)
and, of course, various multidimensional versions. In general, our aim is to approximate to the output of one process by the output of a more convenient process. The question of effective determination of the minimizing parameters can often be discussed using the theory of quasilinearization. We will pursue this topic in the second volume where we discuss its interconnections with the identification of systems.

6.10. Riccati Approximation
The major advantage in using a linear matrix equation,

Y' = AY,    (6.10.1)

to obtain an approximation to a specified matrix X(t) lies in the fact that the solutions of (6.10.1) form a linear manifold. The linearity leads to simple analytic representations for the general solution of (6.10.1). Recall that the general solution of

Z' = AZ + B + ZD + ZCZ    (6.10.2)

can be expressed in terms of the general solution of the linear system

X' = AX + BY,    Y' = -CX - DY,    (6.10.3)
namely, Z = XY^{-1}. Consequently, we can use a differential approximation of the type appearing in (6.10.2) without giving up in any essential way the analytic advantages of linearity. We may consider the approximation obtained in this fashion as a partial extension of the classical Padé approximation.

6.11. Transcendentally-transcendent Functions
In Sec. 6.3 we assumed that the functions u, u', ..., u^(N) were linearly independent. In this case, linear independence means that u satisfies no linear differential equation with constant coefficients of degree N or less. This property becomes obvious, for the case where u is e^{-t^2} or a Bessel function, upon examining the asymptotic behavior as t → ∞. A more interesting and difficult problem is that of determining when a function satisfies no polynomial differential equation of the form

p(u, u', ..., u^(N), t) = 0.    (6.11.1)

A function with this property is called "transcendentally-transcendent." Questions of this nature arise in the study of transcendental numbers. References to results in this area will be found at the end of the chapter.

Exercise
1. Prove the Hölder-Moore result that Γ(t), the Euler gamma function, is transcendentally-transcendent.
6.12. Application to Renewal Equation
Let us present an application of differential approximation to an interesting equation which arises in many parts of analysis and mathematical physics, and in many applications of analysis to biology and economics: the linear integral equation

u(t) = f(t) + ∫_0^t k(t - t_1) u(t_1) dt_1,    (6.12.1)

usually called the "renewal equation." We have indicated previously some of the interesting features of the equation

du/dt = au + ∫_0^t k(t - t_1) u(t_1) dt_1,    u(0) = c_1.    (6.12.2)
The convolution term introduces some novel problems as far as a computational solution is concerned. There are, however, two immediate approaches to the task of obtaining a numerical solution of equations of this type. The first involves a pedestrian evaluation of u(nΔ) in terms of the previous values u(0), u(Δ), ..., u((n - 1)Δ), using a quadrature formula of some type to evaluate the integral over [0, t] in terms of the values of u at the grid points nΔ, n = 0, 1, ... . For example, we may replace (6.12.1) by

v(t) = f(t) + Σ_{n=0}^{N-1} k(t - nΔ) v(nΔ) Δ,    (6.12.3)

where t = NΔ, N = 1, 2, ..., with v(0) = f(0). For small Δ, we expect v(t) to furnish a good approximation to the value of u(t) at the points t = 0, Δ, 2Δ, ... .

This is a simple straightforward method, easily carried out and made far more accurate by use of an appropriate quadrature formula in place of the rudimentary one invoked above. A difficulty, however, that one faces immediately is that a considerable amount of rapid-access storage space is required for this procedure, as opposed to what is involved in the solution of a differential equation. Namely, to calculate v(NΔ) using (6.12.3) we must store the values v(0), v(Δ), v(2Δ), ..., v((N - 1)Δ). As we see, the further out we proceed in t, the greater the demand on storage. This additional storage is clearly of no particular importance in the one-dimensional case, considering the capacities of modern digital computers. It can easily become of crucial importance in a multidimensional case where the equation assumes the form

x(t) = f(t) + ∫_0^t K(t - t_1) x(t_1) dt_1,    (6.12.4)

with x(t) an N-dimensional vector and K an N × N matrix.
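The pedestrian scheme (6.12.3) can be sketched as follows, using the example of Sec. 6.13 below (kernel k(t) = e^{-t^2}, with f chosen so that the exact solution is u(t) = 1); the grid spacing is an illustrative choice:

```python
import numpy as np

# Direct quadrature for the renewal equation u = f + k * u  (6.12.3).
t = np.linspace(0.0, 1.0, 101)
d = t[1] - t[0]                      # grid spacing Δ

k = np.exp(-t**2)
# f(t) = 1 - ∫_0^t e^{-s^2} ds, evaluated by a cumulative trapezoid rule.
f = 1.0 - np.concatenate(([0.0], np.cumsum((k[1:] + k[:-1]) / 2 * d)))

v = np.empty_like(t)
v[0] = f[0]
for n in range(1, len(t)):
    # v(nΔ) = f(nΔ) + Σ_{m=0}^{n-1} k((n-m)Δ) v(mΔ) Δ
    v[n] = f[n] + d * np.dot(k[n:0:-1], v[:n])

err = np.abs(v - 1.0).max()
print(err)   # small discretization error against the exact u(t) = 1
```

Note that the inner product at step n runs over all previously computed values: the storage demand growing with t is visible directly in `v[:n]`.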
A second straightforward approach is based on the use of the Laplace transform. We have, returning to (6.12.1), and taking advantage of the effect of this transform upon a convolution kernel,

L(u) = L(f) + L(k) L(u),    (6.12.5)

whence

L(u) = L(f) / (1 - L(k)).    (6.12.6)

Here, as before,

L(u) = ∫_0^∞ e^{-st} u(t) dt.    (6.12.7)
We can evaluate L(f) and L(k) again by means of a quadrature technique and then determine u(t) by means of any of a number of numerical inversion methods. We will discuss this in the second volume. References will be found at the end of the chapter.

Let us present here a third approach based upon the technique of differential approximation discussed in the foregoing sections. The idea of using and combining a variety of different approaches in place of a single method becomes meaningful when each requires so little time for execution. Furthermore, it should be emphasized that with the development of parallel programs, "parallelization," the advantage, insofar as reliability is concerned, of having alternative methods available becomes considerable.

Consider the equation in (6.12.1). Differentiating repeatedly, we obtain

u'(t) = f'(t) + k(0) u(t) + ∫_0^t k'(t - t_1) u(t_1) dt_1,

u''(t) = f''(t) + k(0) u'(t) + k'(0) u(t) + ∫_0^t k''(t - t_1) u(t_1) dt_1,

...

u^(r)(t) = f^(r)(t) + k(0) u^(r-1)(t) + ... + k^(r-1)(0) u(t) + ∫_0^t k^(r)(t - t_1) u(t_1) dt_1.    (6.12.8)

Hence, for any set of constants b_i, we have

Σ_{i=0}^{r} b_i u^(i)(t) = E_r(f) + Σ_{i=0}^{r-1} a_i u^(i)(t) + ∫_0^t [Σ_{i=0}^{r} b_i k^(i)(t - t_1)] u(t_1) dt_1,    (6.12.9)
where E_r(f) is a linear expression in f, f^(1), ..., f^(r), with the b_i as coefficients, and the a_i are certain linear combinations of the b_i. Take b_r = 1 as a normalization. If the function k satisfies a linear differential equation
with constant coefficients, we see that u likewise satisfies an inhomogeneous linear differential equation. The initial conditions for u, u', ..., u^(r-1) can be obtained from (6.12.1) and (6.12.8).

In general, k will not satisfy a linear differential equation of the desired nature. If, however, the b_i can be chosen so that

∫_0^T [Σ_{i=0}^{r} b_i k^(i)(t)]^2 dt    (6.12.10)

is small enough, it is plausible that the function v, the solution of the approximating equation

Σ_{i=0}^{r} b_i v^(i)(t) = E_r(f) + Σ_{i=0}^{r-1} a_i v^(i)(t),    v^(i)(0) = u^(i)(0),    (6.12.11)

i = 0, 1, ..., r - 1, will be close to the function u, the solution of (6.12.9). We will present an example of this below. The method is only of practical importance in overcoming rapid-access storage difficulties if we can find a set of b_i, i = 0, 1, ..., r, for small r, which renders the expression in (6.12.10) sufficiently small. Consequently, there is no particular interest in examining the related stability problem for large r.
6.13. An Example
Consider the equation

u(t) = 1 - ∫_0^t e^{-t_1^2} dt_1 + ∫_0^t e^{-(t - t_1)^2} u(t_1) dt_1,    (6.13.1)

carefully chosen to have the solution u(t) = 1. Starting with the differential equation for k(t) = e^{-t^2}, namely,

k' + 2tk = 0,    k(0) = 1,    (6.13.2)

we readily determine the values of b_1, b_2, b_3 which minimize the expression

∫_0^T [k^(3) + b_1 k^(2) + b_2 k^(1) + b_3 k]^2 dt.    (6.13.3)

These are

b_1 = 2.7402990,    b_2 = 7.9511452,    b_3 = 5.7636455.    (6.13.4)
The function

f(t) = 1 - ∫_0^t e^{-t_1^2} dt_1,    (6.13.5)

satisfies the third-order linear differential equation

f^(3) + 2t f^(2) + 2 f^(1) = 0,    f(0) = 1,    f'(0) = -1,    f''(0) = 0.    (6.13.6)
Proceeding as indicated in Sec. 6.12, the renewal equation (6.13.1) is replaced by two simultaneous linear differential equations: one for v, the function obtained by ignoring the integral term in (6.12.9), the solution of (6.12.11), and one for f, both subject to initial conditions. We obtain the values in Table I for v(t) in this way. As we see, the agreement with u(t) is excellent. One pleasant feature of the method is that we can vary the value of r until we get satisfactory agreement between the original function and the approximating function, or suitable agreement in the results for two successive values of r. Extrapolation techniques can again be invoked.

TABLE I
t      v(t)
0.1    0.999999
0.2    0.999999
0.3    0.999969
0.4    0.999937
0.5    0.999909
0.6    0.999898
0.7    0.999909
0.8    0.999938
0.9    0.999970
1.0    0.999989
Exercises
1. Consider the equation u(t) = f(t) + ∫_0^1 k(| t - t_1 |) g(u(t_1)) dt_1. Show that if k(t) = Σ_{i=1}^{N} a_i e^{λ_i t}, the equation can be transformed into the two-point boundary value problem

v_i' - λ_i v_i = g(u),    v_i(0) = 0,
w_i' + λ_i w_i = -g(u),    w_i(1) = 0,    i = 1, 2, ..., N,

where u = f + Σ_{i=1}^{N} a_i (v_i + w_i).
2. Show that a similar result holds if k ( t ) satisfies any linear differential equation with constant coefficients.
3. Discuss the problem of using differential approximation to replace the nonlinear integral equation by a two-point boundary value problem.
6.14. Differential-Difference Equations

Let us now consider the question of obtaining the computational solution of a differential-difference equation, say

u'(t) = g(u(t), u(t - 1)),    t ≥ 1,    u(t) = h(t),    0 ≤ t ≤ 1.    (6.14.1)

There is again no difficulty in utilizing standard algorithms for this purpose. If we store u(t) over [k, k + 1] at a sufficiently large number of grid points, say {k + rΔ}, r = 0, 1, ..., N, where NΔ = 1, we can use (6.14.1) to calculate u(t) at the translated set of grid points {k + 1 + rΔ}. We then repeat this process in the interval [k + 1, k + 2] and continue in this fashion. Once again we can run into rapid-access storage difficulties rather easily if we attempt the same procedure for a vector system, or even if we demand an extremely fine grid of points for a scalar problem.

To avoid this obstacle, let us proceed in the following fashion. Introduce the sequence of functions

u_n(t) = u(t + n),    0 ≤ t ≤ 1,    n = 0, 1, ... .    (6.14.2)
Then (6.14.1) can be written as an infinite system of ordinary differential equations

u_n'(t) = g(u_n, u_{n-1}),    0 ≤ t ≤ 1,    n = 1, 2, ... .    (6.14.3)

We know the value u_1(0) = u(1) = h(1), but we do not know the values of u_n(0) for n ≥ 2. Hence, we cannot proceed immediately to solve some truncated version of (6.14.3) as a system of ordinary differential equations. We can, however, proceed as follows. Using the value of u_1(0), we can integrate the differential equation

u_1'(t) = g(u_1, u_0),    u_0(t) = h(t),    u_1(0) = h(1),    (6.14.4)
numerically to determine the value u_1(1) = u_2(0). Having determined this new initial condition, u_2(0), we then consider the two equations

u_1'(t) = g(u_1, u_0),    u_1(0) = h(1),
u_2'(t) = g(u_2, u_1),    u_2(0) = u_1(1),    (6.14.5)

and integrate them numerically to find u_2(1) = u_3(0). With this new initial condition, u_3(0), we can consider the three equations

u_1'(t) = g(u_1, u_0),    u_1(0) = h(1),
u_2'(t) = g(u_2, u_1),    u_2(0) = u_1(1),
u_3'(t) = g(u_3, u_2),    u_3(0) = u_2(1),    (6.14.6)

and so on. If we assume that h(t), the functional value over [0, 1], is a function of simple analytic form, or itself a solution of an ordinary differential equation, we have eliminated the storage problem for differential-difference equations at the expense of solving larger and larger systems of ordinary differential equations with initial conditions as the t-interval is increased.

In practice, again in the vector case, the number of differential equations can become embarrassingly large. One way to avoid this mushrooming effect is by means of differential approximation. If we use the foregoing techniques, together with the differential equations of (6.14.3), we can obtain an approximating function to u_n(t), a function v which satisfies a differential equation
dv/dt = y(v),    0 ≤ t ≤ 1.    (6.14.7)

The function v, of course, will depend upon n, but there is no need to make the dependence explicit. We then start over, using the preceding method to integrate

u'(t) = g(u(t), u(t - 1)),    t ≥ n + 1,
u(t) = v(t),    n ≤ t ≤ n + 1.    (6.14.8)
In the next section we give an example of this procedure.

6.15. An Example

Let us consider the equation

u'(t) = -u(t - 1)[1 + u(t)],    t ≥ 1,
u(t) = 1,    0 ≤ t ≤ 1,    (6.15.1)
and replace it as indicated above by the system of ordinary differential equations

u_0'(t) = 0,    u_0(0) = 1,    (6.15.2)

u_n'(t) = -u_{n-1}(t)[1 + u_n(t)],    u_n(0) = u_{n-1}(1),    n = 1, 2, ... .    (6.15.3)
A grid size of Δt = 0.00390625 was chosen, and integration began at t = 1.0 min and ended at t = 20.0 min. After t = 4k min, k = 1, ..., 4, differential approximation of order R = 1, 2, 3 was used. Table II compares u(t) using differential approximation with its actual values for t = 4, 6, 8, ..., 18. R = 2 gives excellent results in about two-thirds the time.

TABLE II
t     u(t) exact    u with R = 1    u with R = 2    u with R = 3
4      0.16534       0.16838         0.16533         0.16534
6     -0.05369      -0.05267        -0.05369        -0.05369
8     -0.00455      -0.00606        -0.00454        -0.00455
10     0.01657       0.01762         0.01657         0.01657
12    -0.01477      -0.01533        -0.01477        -0.01477
14     0.00932       0.00956         0.00932         0.00932
16    -0.00466      -0.00473        -0.00466        -0.00466
18     0.00179       0.00179         0.00179         0.00179
6.16. Functional-Differential Equations
Realistic mathematical models of phenomena in fields ranging from electromagnetic theory and economic and engineering control theory to respiratory theory and neurophysiology lead to functional equations of the form

u'(t) = g(u(t), u(h(t))),    (6.16.1)
and, more generally, to equations such as

u'(t) = g(t, u(t), u(h(u, t))).    (6.16.2)

Let us extend the discussion in Sec. 6.14 to cover equations such as (6.16.1). Presumably, equations such as (6.16.2) may then be treated by means of successive approximations, but this possibility has not been investigated to date. There is certainly nothing routine about this method in this situation.

Let us restrict the discussion here to (6.16.1). We suppose that h(t) ≤ t for t ≥ 0, so that the future is determined solely by the past. Further, suppose that h'(t) > 0 for t ≥ 0, and let H(t) = h^{-1}(t), the inverse function. Take u(t) to be known in some initial interval [0, t_1], where t_1 = H(0), and let the sequence {t_n} be defined recursively by

t_n = H(t_{n-1}),    n = 2, 3, ... .    (6.16.3)
Then t_n = H^(n)(0), the nth iterate of H(t), evaluated at t = 0. Observe that H(t) maps the interval [t_{n-1}, t_n] onto [t_n, t_{n+1}] in a one-to-one fashion, and H^(k)(t) maps [t_{n-1}, t_n] onto [t_{n-1+k}, t_{n+k}]. Consider the sequence of functions

u_n(s) = u(H^(n)(s)),    n = 0, 1, 2, ...,    (6.16.4)

where 0 ≤ s ≤ t_1 and H^(0)(s) = s. Then, by virtue of (6.16.1),

u_n'(s) = [dH^(n)(s)/ds] g(u_n(s), u_{n-1}(s)),    (6.16.5)
where the derivative of H^(n)(s) may be readily evaluated by means of the formula

dH^(n)(s)/ds = H'(H^(n-1)(s)) dH^(n-1)(s)/ds.    (6.16.6)

Now set t = H^(n)(s) with 0 ≤ s ≤ t_1, for n = 1, 2, ... . Then (6.16.1) yields

u'(H^(n)(s)) = g(u(H^(n)(s)), u(H^(n-1)(s))) = g(u_n(s), u_{n-1}(s)).    (6.16.7)

Hence, using (6.16.6) and (6.16.7), we have

u_n'(s) = [dH^(n)(s)/ds] g(u_n(s), u_{n-1}(s)),    (6.16.8)

n = 1, 2, ... . We now proceed as before to obtain a computational solution of this equation.
6.17. Reduction of Storage in Successive Approximations
The device of adjoining the defining differential equation in place of storing the function can be used with considerable effect in connection with the application of the method of successive approximations to the solution of two-point boundary value problems. We shall discuss this in detail in the chapter on quasilinearization in the second volume.

6.18. Approximation by Exponentials
At this point the reader may wonder why we have not considered the apparently simpler problem of minimizing the expression

∫_0^T [u(t) - Σ_{k=1}^{N} a_k e^{λ_k t}]^2 dt    (6.18.1)

with respect to the parameters a_k and λ_k, in place of using the linear differential equation of the preceding sections. The first part of the answer is that this is a complex nonlinear variational problem. The second part of the answer is that approximation by exponentials is a notoriously delicate enterprise. Conceivably, if we allow polynomial coefficients and complex exponents, the situation would improve. However, the λ_k and a_k determined by the minimization of (6.18.1) are unstable functionals of u. Small changes in the value of u can cause much greater changes in the values of a_k and λ_k. Examples of this will be found in the book by Lanczos cited at the end of the chapter.

6.19. Mean-square Approximation
Let us now see if we can use the general technique of differential approximation as a method for obtaining an approximate solution of a given equation,

dx/dt = g(x),    x(0) = c.    (6.19.1)

Let

dy/dt = Ay + b,    y(0) = c,    (6.19.2)

be an associated linear equation, where A and b are to be chosen to minimize the functional

J(x) = ∫_0^T (g(x) - Ax - b, g(x) - Ax - b) dt.    (6.19.3)
In view of this norm, this particular application of differential approximation is called mean-square approximation. The question remains: How do we calculate A and b when x is unknown?

Exercises

1. Determine the expressions for A and b as functionals of x, assuming x known.

2. Write Ax + b = Pg(x). Show that P is a projection operator, that is, P^2 = P, and that

∫_0^T (g(x) - Ax - b, Ax + b) dt = 0.
6.20. Validity of the Method
We begin with the determination of A and b which minimize

J(x) = ∫_0^T (g(x) - Ax - b, g(x) - Ax - b) dt.    (6.20.1)

With A and b chosen in this fashion, y is determined as the solution of

dy/dt = Ay + b,    y(0) = c.    (6.20.2)

What can we say about || x - y ||? Writing the equations for x and y in the form

dx/dt = Ax + b + (g(x) - Ax - b),    x(0) = c,
dy/dt = Ay + b,    y(0) = c,    (6.20.3)

we can conclude that if

|| Ax + b - g(x) || ≤ ε,    (6.20.4)

with ε ≪ 1, then || x - y || ≤ k_1 ε. Everything depends upon obtaining a sufficiently good mean-square approximation. In some cases we know in advance that this can be done; in others we can only hope.
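A minimal scalar sketch of Secs. 6.19-6.20 (the equation x' = g(x) = -x - 0.1x^3, x(0) = 1, the interval, and the Euler integrator are all illustrative choices): x is computed numerically, A and b are fitted by least squares, and the linear surrogate y' = Ay + b is solved and compared with x.

```python
import numpy as np

g = lambda x: -x - 0.1 * x**3
T, n = 4.0, 4000
h = T / n

# Step 1: the "unknown" x(t), here computed by Euler on a fine grid.
x = np.empty(n + 1)
x[0] = 1.0
for i in range(n):
    x[i + 1] = x[i] + h * g(x[i])

# Step 2: A, b minimizing the discretized ∫ (g(x) - A x - b)^2 dt.
M = np.stack([x, np.ones_like(x)], axis=1)
(A, b), *_ = np.linalg.lstsq(M, g(x), rcond=None)

# Step 3: the linear surrogate y' = A y + b, y(0) = x(0).
y = np.empty(n + 1)
y[0] = 1.0
for i in range(n):
    y[i + 1] = y[i] + h * (A * y[i] + b)

err = np.max(np.abs(x - y))
print(A, b, err)   # small residual in (6.20.4) gives small ||x - y||
```

Here the mean-square residual is small because the cubic term is small along the trajectory, which is exactly the situation contemplated in the bootstrap method of the next section.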
Exercise
1. Examine a proof of the validity of the method starting with the equations

dy/dt = Ay + b,    y(0) = c,
dx/dt = Ax + b + (g(x) - Ax - b),    x(0) = c.
6.21. A Bootstrap Method
One approach we will employ is the following. Suppose that in some fashion we know that a particular function x_0 is an excellent approximation to x, the solution of (6.20.3). Then we can calculate candidates for the optimal A and b by minimizing the expression

∫_0^T (g(x_0) - Ax_0 - b, g(x_0) - Ax_0 - b) dt    (6.21.1)

in place of the expression in (6.20.1). A particularly important illustration of this idea is the case where the equation of interest has the form

dx/dt = A_1 x + b_1 + εg(x),    x(0) = c,    (6.21.2)

with ε ≪ 1. It is clear that if we take x_0 as the solution of

dx_0/dt = A_1 x_0 + b_1,    x_0(0) = c,    (6.21.3)

then the expression in (6.21.1) can be made at least O(ε^2) by proper choice of A and b.
As a first example of the application of the method, consider the familiar equation

u'' + u + εu^3 = 0,    u(0) = 1,    u'(0) = 0.    (6.22.1)

We wish to determine an approximating linear equation

v'' + (1 + bε) v = 0,    v(0) = 1,    v'(0) = 0,    (6.22.2)

where b is a constant, and to compare u and v over [0, π].
Let us obtain a value of b accurate to O(ε) by minimizing

∫_0^π (u + εu^3 - (u + bεu))^2 dt = ε^2 ∫_0^π (u^3 - bu)^2 dt.    (6.22.3)

In view of the order of the term in (6.22.3), namely O(ε^2), we see that it is sufficient to use for u the approximate value u_1 = cos t. Substituting in (6.22.3) and differentiating, b is determined to this order of accuracy by

∫_0^π u_1^4 dt - b ∫_0^π u_1^2 dt = 0.    (6.22.4)

Since

∫_0^π cos^2 t dt = π/2    (6.22.5)

and

(cos t)^4 = ((e^{it} + e^{-it})/2)^4 = 3/8 + (cos 2t)/2 + (cos 4t)/8,    (6.22.6)

whence

∫_0^π (cos t)^4 dt = 3π/8,    (6.22.7)

we see that

b = 3/4.    (6.22.8)
Hence, the approximating equation is

v'' + (1 + 3ε/4) v = 0,    v(0) = 1,    v'(0) = 0,    (6.22.9)

leading to the approximate relation

v = cos((1 + 3ε/8) t),

in agreement with previous approximations obtained in Chapter 5 in various ways in the text and exercises.

Exercises
1. Determine the order of magnitude in ε of | u - v | over [0, π].

2. Do we get the same degree of approximation as that afforded by the Bubnov-Galerkin method?
3. Find in the same fashion an approximate solution for u'' + sin u = 0, u(0) = c, u'(0) = 1, | c | ≪ 1, over [0, T].

4. What happens if we attempt to obtain a more accurate approximation using the linear equation v'' + b_2 ε v' + (1 + b_1 ε) v = 0?
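The comparison of u, the solution of (6.22.1), with the differential-approximation solution v = cos((1 + 3ε/8)t) of (6.22.9) can be sketched numerically; the value ε = 0.1 and the RK4 grid are illustrative choices:

```python
import numpy as np

eps = 0.1

def f(y):
    # y = (u, u');  u'' = -(u + ε u^3), the nonlinear spring (6.22.1).
    return np.array([y[1], -(y[0] + eps * y[0]**3)])

n = 2000
h = np.pi / n
t = np.linspace(0.0, np.pi, n + 1)
y = np.array([1.0, 0.0])
us = [y[0]]
for _ in range(n):                  # classical RK4 over [0, π]
    k1 = f(y); k2 = f(y + h/2*k1); k3 = f(y + h/2*k2); k4 = f(y + h*k3)
    y = y + h/6*(k1 + 2*k2 + 2*k3 + k4)
    us.append(y[0])

u = np.array(us)
v = np.cos((1.0 + 3.0*eps/8.0) * t)
err = np.max(np.abs(u - v))
print(err)   # small discrepancy, of the order of ε
```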
6.23. The Van der Pol Equation

Let us now turn to the familiar equation

u'' + ε(u^2 - 1) u' + u = 0    (6.23.1)

and see if we can obtain an approximation to the unique periodic solution over [0, 2π] for small ε. Let us write

(u^2 - 1) u' ≈ b_1 u' + b_2 u    (6.23.2)

and determine b_1 and b_2 by the condition that the expression

∫_0^{2π} [(u^2 - 1) u' - b_1 u' - b_2 u]^2 dt    (6.23.3)
1 1
2n 0
[(u2 - 1 ) u‘
‘Ln
-
b,u’ - b2u]u’ dt
=
0,
(6.23.4) [(I?
0
-
1) u’
-
b,u’ - b2u]u dt
=
0.
T h e second equation reduces to
whence 6 , = 0. T h e first equation then yields 211 0
(u2 ~1 ) uI2 dt =
b,
1
2n
0
u‘2
dt.
(6.23.6)
Let us take as our approximation for small ε a function of the form

u ≈ k cos t,    (6.23.7)
where k is to be determined. Since the approximating equation is V”
+
+ ZJ = 0,
b,V’
(6.23.8)
we see that k is reasonably to be determined by the condition that b, = 0. This is a “self-consistent” condition of a type we shall refer to again below. Hence, from (6.23.6),
1 (k2 277
cos2 t
- 1) sin2 t dt
=
0
(6.23.9)
0
or
Jrsin2t dt
Jrsin2t dt
4
JE sin2t cos2 t dt
J:n sin22t dt
=
(6.23.10)
4.
Hence, as before, k
=
2.
(6.23.1 1)
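The self-consistent determination of k is easy to check numerically: with the trial function u = k cos t, the coefficient b₁ of (6.23.6) is evaluated by quadrature and the root of b₁(k) = 0 is located by bisection. A sketch (ours, for illustration):

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def b1(k):
    """b1 of (6.23.6), evaluated with the trial function u = k cos t."""
    num = simpson(lambda t: (k * k * math.cos(t) ** 2 - 1) * math.sin(t) ** 2,
                  0.0, 2 * math.pi)
    den = simpson(lambda t: math.sin(t) ** 2, 0.0, 2 * math.pi)
    return num / den   # analytically equal to k^2/4 - 1

# self-consistency: choose k > 0 with b1(k) = 0; b1 changes sign on [1, 3]
lo, hi = 1.0, 3.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if b1(lo) * b1(mid) <= 0.0:
        hi = mid
    else:
        lo = mid
k = 0.5 * (lo + hi)
print(k)  # approximately 2.0, the familiar Van der Pol amplitude
```

Since b₁(k) = k²/4 − 1, the bisection converges to the amplitude k = 2 of (6.23.11).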
Exercises

1. Suppose we set u ≅ (2 + k₂ε) cos((1 + k₃ε)t) and proceed as above. What values do we get for k₂ and k₃?

2. Similarly determine the approximate solution of u'' + ε(u² − 1)u' + u = cos ωt for ω² ≠ 1, |ε| ≪ 1.

3. Find an approximate solution to u'' + bu' + u + εg(u) = 0.

4. Find an approximate solution to u'' + (1 + ε cos 2t)u = 0, the Hill-Mathieu equation.

5. Find an approximate solution to u'' + bu' + (1 + ε₁ cos 2t)u + ε₂u³ = 0, where ε₁ and ε₂ are small. See

A. Blaquiere, Nonlinear System Analysis, Academic Press, New York, 1966, pp. 188-192.
6.24. Self-consistent Techniques

Let us now introduce a different approach, equally versatile in its own fashion. Returning to the original formulation, we have

dx/dt = g(x),  x(0) = c,
dy/dt = Ay + b,  y(0) = c,  (6.24.1)

where A and b are to be determined so as to minimize the expression ‖g(x) − Ax − b‖. To bypass the difficulty that x is unknown, let us replace x by y, with y determined as above, and consider the new problem of minimizing

f(A, b) = ‖g(y) − Ay − b‖.  (6.24.2)

This is a formidable analytic problem in most cases, but on the other hand one involving no unknown functions. The general question of validating this procedure requires a far more detailed investigation in each particular case, and we shall therefore not attempt to present any results.

6.25. The Riccati Equation
As a simple example of the use of the self-consistent method, consider the Riccati equation

u' = −u + u²,  u(0) = c,  (6.25.1)

where |c| is small enough to ensure that u exists for all t ≥ 0. We propose to approximate to the nonlinear term by a linear expression,

u² ≅ a₁u.  (6.25.2)

The quantity a₁ is to be chosen so as to minimize the expression

f(a₁, u) = ∫₀^T (u² − a₁u)² dt.  (6.25.3)

In place of u, we use the function v obtained from the approximating equation

v' = −v + a₁v,  v(0) = c,  (6.25.4)
namely

v = c exp[−(1 − a₁)t].  (6.25.5)

The problem is then that of minimizing the transcendental function of a₁,

f(a₁) = ∫₀^T (c² exp[−2(1 − a₁)t] − ca₁ exp[−(1 − a₁)t])² dt,  (6.25.6)

a rather complicated problem even in this one-dimensional case. One way to simplify (comparable to the connection between the Bubnov-Galerkin and the moment methods of Chapter 5) is to obtain the variational equation for a₁ from (6.25.3), regarding u as independent of a₁, namely

∫₀^T u³ dt = a₁ ∫₀^T u² dt,  (6.25.7)

and then replace u by v. As we see, this is a reasonable procedure when |c| ≪ 1. In this way, we obtain the transcendental equation

a₁ = (2c/3) (1 − exp[−3(1 − a₁)T]) / (1 − exp[−2(1 − a₁)T]).  (6.25.8)

If we suppose that a₁ < 1 and take T large, we obtain the value

a₁ = 2c/3.  (6.25.9)

The exact solution of (6.25.1) is then

u = [(1 − c)c⁻¹eᵗ + 1]⁻¹.  (6.25.10)
In Table III, the exact and approximate solutions are compared for selected values of t for c = 1/2.

TABLE III
COMPARISON OF APPROXIMATE AND EXACT SOLUTIONS (c = 1/2)

t        u_exact     u_approx    u_approx - u_exact
0        0.500       0.500        0.000
0.1      0.475       0.468       -0.007
0.2      0.450       0.438       -0.012
0.5      0.378       0.358       -0.020
1.0      0.269       0.257       -0.012
2.0      0.119       0.132        0.013
5.0      0.00669     0.0178       0.0111
10.0     0.000045    0.00063      0.00058

The approximation is remarkably good, considering the crudity of the methods employed and the fact that 1/2 is not a particularly small quantity. As the reader will have noted, a certain amount of improvisation is required to thread one's way through the maze of analytic and computational complications that arise. This is characteristic of the self-consistent method and, as a matter of fact, of the implementation of all methods. As Hurwitz said, "It's easier to generalize than particularize." There is no infallible guide to the use of this powerful method, but engineering and physical intuition are extremely useful, and experience is best of all.

Exercises
1. Examine the possibility of using the approximation u² ≅ a₁u + a₂ as a starting point for the self-consistent method.

2. For |c| ≪ 1, obtain values of a₁, a₂ accurate to O(c³).

3. If the equation for v is taken to be v' = −v + a₁v, v(0) = c₁, with both a₁ and c₁ as free variables, determine the values which minimize ∫₀^T (u² − a₁u)² dt, where c₁ ≅ c and |c| ≪ 1.

4. Apply the foregoing techniques to

(a) u'' + u + εu³ = cos ωt,  ω² ≠ 1,

(b) u'' + ε(u² − 1)u' + u = cos ωt,  ω² ≠ 1.
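The comparison of Table III in Sec. 6.25 can be reproduced directly from (6.25.5), (6.25.9), and (6.25.10). A minimal sketch (our own check, not part of the text):

```python
import math

c = 0.5
a1 = 2.0 * c / 3.0                       # (6.25.9), the large-T self-consistent value

def u_exact(t):                          # (6.25.10)
    return 1.0 / ((1.0 - c) / c * math.exp(t) + 1.0)

def v_approx(t):                         # (6.25.5) with a1 = 2c/3
    return c * math.exp(-(1.0 - a1) * t)

# reproduces the columns of Table III
for t in (0.0, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0):
    print(t, round(u_exact(t), 3), round(v_approx(t), 3))
```

At t = 1, for instance, the pair is approximately (0.269, 0.257), as in the table.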
6.26. Higher-order Approximation

The problem of obtaining higher-order approximations along the foregoing lines is quite easy for one type of behavior, but quite difficult as far as periodic solutions are concerned. Consider first the Riccati differential equation

u' = −u + u²,  u(0) = c,  (6.26.1)

where |c| ≪ 1. A corresponding equation for u² is

(u²)' = 2uu' = 2u(−u + u²) = −2u² + 2u³,  (u²)(0) = c².  (6.26.2)

Writing

u₁ = u,  u₂ = u²,  (6.26.3)
we have the system

u₁' = −u₁ + u₂,  u₁(0) = c,
u₂' = −2u₂ + 2u₁³,  u₂(0) = c².  (6.26.4)

This is in the spirit of the Carleman linearization previously noted. To obtain an approximating linear system we write

u₁³ ≅ a₁u₁ + a₂u₂  (6.26.5)

and determine the coefficients a₁ and a₂ by the condition that the expression

J(u, a₁, a₂) = ∫₀^∞ (u³ − a₁u − a₂u²)² dt  (6.26.6)

is minimized. The calculation is carried out using the function v₁ obtained as the solution of

v₁' = −v₁ + v₂,  v₁(0) = c,
v₂' = −2v₂,  v₂(0) = c².  (6.26.7)

Continuing in this fashion, using higher powers of u, we can obtain approximations of arbitrary accuracy.
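The lowest-order truncation (6.26.7) is linear and solvable in closed form, so its accuracy is easy to examine; for small c the error in v₁ is O(c³). A sketch (the closed-form solution of (6.26.7) below is our own computation, offered for illustration):

```python
import math

c = 0.1  # |c| small

def u_exact(t):
    """Exact solution of u' = -u + u^2, u(0) = c; cf. (6.25.10)."""
    return 1.0 / ((1.0 - c) / c * math.exp(t) + 1.0)

def v1(t):
    """First component of the truncated system (6.26.7):
       v2 = c^2 e^{-2t}, and v1' = -v1 + v2, v1(0) = c, gives
       v1 = c e^{-t} + c^2 (e^{-t} - e^{-2t})."""
    return c * math.exp(-t) + c * c * (math.exp(-t) - math.exp(-2.0 * t))

# maximum discrepancy on [0, 5]
err = max(abs(u_exact(0.1 * i) - v1(0.1 * i)) for i in range(51))
print(err)  # of order c^3, i.e. much smaller than c itself
```

The first two terms of the expansion of the exact solution in powers of c are reproduced exactly, which is why the residual error is cubic in c.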
Exercise

1. Determine a₁ and a₂ and examine the order of magnitude of |u − v₁| after the mean-square approximation has been carried out.

6.27. Mean-square Approximation - Periodic Solutions

Let us now consider an application of the foregoing method to the equation

u'' + u + εu³ = 0.  (6.27.1)

We introduce the equivalent system

u' = v,  v' = −u − εu³,  (6.27.2)

the four-dimensional vector of cubic monomials

y = (u³, u²v, uv², v³)ᵀ,  (6.27.3)

and let

x = (u, v)ᵀ.  (6.27.4)
Proceeding as above, we obtain an equation for y of the form

dy/dt = By + εh(x),  (6.27.5)

so that (6.27.1) may be written

dx/dt = Ax + εB₁y,
dy/dt = By + εh(x),  (6.27.6)

where B₁ is a readily calculated 2 × 4 matrix. If we now try to obtain an approximation merely by neglecting h(x), we encounter the secular-term difficulties described in Chapter 5. If, however, we employ mean-square approximation to the term h(x), say

h(x) ≅ B₂y + B₃x,  (6.27.7)

where B₂ is a 4 × 4 and B₃ is a 4 × 2 matrix, proceeding as in Sec. 6.26, we encounter a new type of difficulty. The resulting linear system

dx/dt = Ax + εB₁y,
dy/dt = εB₃x + (B + εB₂)y,  (6.27.8)

need not, and will not in general, possess a general solution which is periodic; i.e., the characteristic roots may not be multiples of one particular root. There are ways of circumventing this by means of "tucking-in" techniques, discussed earlier in Chapter 5, but it is desirable to have methods which avoid the necessity for this from the beginning.
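The structure dy/dt = By + εh(x) is easy to verify pointwise. Taking y = (u³, u²v, uv², v³) as the vector of cubic monomials and differentiating along (6.27.2) yields a constant matrix B and a quintic remainder; the particular B and h below are computed here for this check, not quoted from the text:

```python
eps = 0.1
u, v = 0.7, -0.3                     # an arbitrary test point
du, dv = v, -u - eps * u ** 3        # the system (6.27.2)

# y = (u^3, u^2 v, u v^2, v^3) and its derivative along the flow
y = [u**3, u**2 * v, u * v**2, v**3]
dy = [3 * u**2 * du,
      2 * u * v * du + u**2 * dv,
      v**2 * du + 2 * u * v * dv,
      3 * v**2 * dv]

B = [[0, 3, 0, 0],                   # d(u^3)/dt   = 3 u^2 v
     [-1, 0, 2, 0],                  # d(u^2 v)/dt = -u^3 + 2 u v^2 - eps u^5
     [0, -2, 0, 1],                  # d(u v^2)/dt = -2 u^2 v + v^3 - 2 eps u^4 v
     [0, 0, -3, 0]]                  # d(v^3)/dt   = -3 u v^2 - 3 eps u^3 v^2
h = [0.0, -u**5, -2 * u**4 * v, -3 * u**3 * v**2]   # quintic remainder

for i in range(4):
    lin = sum(B[i][k] * y[k] for k in range(4)) + eps * h[i]
    assert abs(dy[i] - lin) < 1e-12
print("dy/dt = B y + eps h(x) checked")
```

Note that h depends on quintic monomials in (u, v), which is exactly why a further mean-square closure of the form (6.27.7) is needed.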
Exercise

1. What degree of approximation to the frequency of the periodic solution do we obtain from (6.27.8)?

Miscellaneous Exercises
1. Consider the Riccati equation u' = −u + u², u(0) = r, where r is a random variable with |r| ≪ 1. Consider the moments vₙ(t) = E(uⁿ), where E denotes the expected value with respect to r. Show that for n = 1, 2, ..., we have

vₙ' = −nvₙ + nvₙ₊₁,  vₙ(0) = E(rⁿ) = mₙ.

2. Consider the truncated system

w₁' = −w₁ + w₂,  w₁(0) = m₁,
w₂' = −2w₂,  w₂(0) = m₂.

Are w₁ and w₂ necessarily moments, i.e., do we necessarily have w₂ ≥ w₁² for t ≥ 0?

3. Write v₃ ≅ a₁v₁ + a₂v₂ and consider the new system

w₁' = −w₁ + w₂,  w₁(0) = m₁,
w₂' = −2w₂ + a₁w₁ + a₂w₂,  w₂(0) = m₂.

How do we choose a₁ and a₂ so that w₂ ≥ w₁² for t ≥ 0?
4. Consider the matrix equation X' = BX + XBᵀ, X(0) = R, where R is a positive definite matrix. Show that X is positive definite for t ≥ 0.

5. Write the system v₁' = −v₁ + v₂, v₁(0) = m₁, v₂' = −2v₂ + 2v₃, v₂(0) = m₂, where v₀ = 1, in the form

d/dt (v₁, v₂)ᵀ = (−v₁ + v₂, −2v₂ + 2v₃)ᵀ ≅ B (v₁, v₂)ᵀ,  B = (b₁ b₂; b₃ b₄),

and let the elements of B be determined by the condition that the norm of the residual matrix is small, where we use the norm ‖X‖² = ∫₀^∞ tr(XXᵀ) dt. Use a bootstrap method to determine b₁, b₂, b₃, b₄. Will this truncation or "closure" technique preserve the moment property?

6. Show that two of the coefficients vanish, and that the condition that w₂ → 0 as t → ∞ forces a third to vanish as well; the minimization then yields the remaining coefficient.

7. Obtain the corresponding analytic expressions for w₁ and w₂, and compare the results with the exact solution.
8. Consider the case where three moments are involved. Show that similar reasoning immediately reduces B to a form containing only four free elements b₁, b₂, b₃, b₄, and determine the expressions for b₁, b₂, b₃, and b₄.

9. Obtain a better approximation for N = 2 by using the fact that the solution of X' = BX + XBᵀ + XCX, X(0) = A, is positive definite if A, C > 0 and B is real. For the foregoing, see

R. Bellman and J. M. Richardson, "Closure and Preservation of Moment Properties," J. Math. Anal. Appl., Vol. 23, 1968, pp. 639-644.
10. Let f(t) = ∫₀^∞ u(s)e^{−st} ds, t ≥ 0, and consider the integral Iₙ. Show that the evaluation of Iₙ yields a rational approximation to f(t) with a remainder term. Hint: Integrate by parts.

11. Show that determination of the coefficients aₖ so that

u⁽ⁿ⁾ + a₁u⁽ⁿ⁻¹⁾ + ··· + aₙu

vanishes to the highest possible order at s = 0 yields an approximation of Padé type.

12. What advantages or disadvantages are there to choosing the aₖ so that

∫₀^∞ [u⁽ⁿ⁾ + a₁u⁽ⁿ⁻¹⁾ + ··· + aₙu]² dt

is minimized? See
R. Bellman, "On Approximate Expressions for the Exponential Integral and the Error Function," J. Math. Phys., Vol. 30, 1952, pp. 226-231.

Y. L. Luke, "On Economic Representations of Transcendental Functions," J. Math. Phys., Vol. 38, 1960, pp. 279-294.

L. Weiss and R. N. McDonough, "Prony's Method, Z-transforms and Padé Approximation," SIAM Review, Vol. 5, 1963, pp. 145-149.

G. Merz, "Padésche Näherungsbrüche und Iterationsverfahren höherer Ordnung," Computing, Vol. 3, 1968, pp. 165-183.

Y. L. Luke, The Special Functions and Their Approximations, Vols. I and II, Academic Press, New York, 1969.

Bibliography and Comments

§6.1. For discussions of a number of techniques for approximate linearization of nonlinear equations, together with a number of additional references, see

N. M. Aleksandrovskii and A. M. Deich, "Determination of Dynamic Characteristics of Nonlinear Objects," Automation and Remote Control, January 1968, pp. 142-160.

A. Blaquiere, Nonlinear System Analysis, Academic Press, New York, 1966.

R. Bellman and R. Kalaba, Quasilinearization and Nonlinear Boundary-value Problems, American Elsevier, New York, 1965.

The type of problem treated in this chapter is quite similar to that which motivated Cebycev in his work on approximation. See the discussion and Cebycev's paper in

R. Bellman, Modern Mathematical Classics - I: Analysis, Dover Publications, New York, 1961.
§6.2. Differential-difference equations with small lags readily lead to questions of this nature. Thus, if u'(t) = u(t − c), we would like to use the ordinary differential equation v'(t) = v(t) − cv'(t) + ··· + (−1)ⁿcⁿv⁽ⁿ⁾(t)/n! for a suitable n as an approximation. See

R. Bellman and K. L. Cooke, "On the Limit of Solutions of Differential-difference Equations as the Retardation Approaches Zero," Proc. Natl. Acad. Sci. U.S., Vol. 45, 1959, pp. 1026-1028.

§6.4. The problem of the approximation of a complex system by one of simpler nature, which in many cases means one described by a lower order differential equation, is one of major importance. See

M. Aoki, Chapter 10 in The Method of Aggregation and Some Applications, Multi-level Control Systems, D. A. Wismer (editor), McGraw-Hill, New York.

M. Aoki, "Note on Aggregation and Bounds for the Solution of the Matrix Riccati Equations," J. Math. Anal. Appl., Vol. 21, 1968, pp. 377-383.

E. J. Davison, "A New Method for Simplifying Large Linear Dynamic Systems," IEEE Trans. Automatic Control, Vol. AC-13, 1968, pp. 214-215.
A. A. Kardashov, "Analysis of the Quality of Automatic Control by the Method of Reducing the Order of the Differential Equations," Avtomatika i Telemekhanika, Vol. 24, 1963, pp. 1073-1083.

L. Meier, III and D. G. Luenberger, "Approximation of Linear Constant Systems," IEEE Trans. Automatic Control, Vol. AC-12, 1967, pp. 585-589.

§6.8. See

C. R. Hobby and J. R. Rice, "Approximation from a Curve of Functions," Arch. Ratl. Mech. Anal., Vol. 24, 1967, pp. 91-106.

Also see the book on quasilinearization cited in §6.1.

§6.11. For further results and references, see

R. Bellman and C. L. Siegel, Transcendental Numbers, Ann. Math. Studies, No. 16, Princeton University Press, Princeton, New Jersey, 1947.

§6.12. This method was given in

R. Bellman, R. Kalaba, and B. Kotkin, "Differential Approximation Applied to the Solution of Convolution Equations," Math. of Comp., Vol. 18, 1964, pp. 487-491.

For the Laplace transform approach, see

R. Bellman, R. Kalaba, and J. Lockett, Numerical Inversion of the Laplace Transform with Applications, American Elsevier, New York, 1965.

R. Bellman and K. L. Cooke, Differential-Difference Equations, Academic Press, New York, 1963.

For further discussion of the renewal equation, see

M. Shinbrot, "A Generalization of Latta's Method for the Solution of Integral Equations," Quart. Appl. Math., Vol. 16, 1958, pp. 415-421.

G. E. Latta, "The Solution of a Class of Integral Equations," J. Ratl. Mech. Anal., Vol. 5, 1956, pp. 821-833.

§6.14. This was given in

R. Bellman, "On the Computational Solution of Differential-Difference Equations," J. Math. Anal. Appl., Vol. 2, 1961, pp. 108-110.

For a discussion of how physiological models give rise to differential-difference equations, see

R. Bellman, "From Chemotherapy to Computers to Trajectories," Mathematical Problems in the Biological Sciences, American Math. Soc., Providence, Rhode Island, 1962, pp. 225-232.

§6.15. The results in this section are due to S. Azen.

§6.16. This follows

R. Bellman and K. L. Cooke, "On the Computational Solution of a Class of Functional Differential Equations," J. Math. Anal. Appl., Vol. 12, 1965, pp. 495-500.
§6.17. See

R. Bellman, "Successive Approximations and Computer Storage Problems in Ordinary Differential Equations," Comm. Assoc. Comput. Machinery, Vol. 4, 1961, pp. 222-223.

C. A. Lawson, "Recent Publications in Approximation Theory with Emphasis on Computer Applications," Computing Reviews, Vol. 9, 1968, pp. 691-699.

§6.18. For examples of the dangers of dealing with sums of exponentials in connection with approximation, see

A. Marzollo, "On the Mean Square Approximation of a Function with a Linear Combination of Exponentials," Intern. J. Control, Vol. 9, 1969, pp. 17-26.

C. Lanczos, Applied Analysis, Prentice-Hall, Englewood Cliffs, New Jersey, 1956.

M. I. Liechenstein, Two-parameter Exponential and Rational Functions for Least-square Approximations, The RAND Corporation, RM-5707-PR, 1968.

§6.19. The results concerning mean-square approximation were first presented in

R. Bellman and J. M. Richardson, "Renormalization Techniques and Mean-square Averaging - I: Deterministic Equations," Proc. Natl. Acad. Sci. U.S., Vol. 47, 1961, pp. 1191-1194.

R. Bellman and J. M. Richardson, "Self-consistent Solutions of Deterministic and Stochastic Nonlinear Differential Equations," Proc. Kiev Symposium on Nonlinear Oscillations, 1963, pp. 87-92.

For a discussion of how these techniques may be applied in statistical mechanics, see

L. Levitt, J. M. Richardson and E. R. Cohen, "Linearized Superposition Theory of a Classical One-component Plasma," The Physics of Fluids, Vol. 10, 1967, pp. 406-413.

For applications to stochastic systems, see

J. L. Bogdanoff, "Mean-square Approximate Systems and Their Application in Estimating Response in Complex Disordered Linear Systems," J. Acoustical Soc. Amer., Vol. 38, 1965, pp. 244-252.

J. L. Bogdanoff, "Uniqueness of Mean-square Approximate Systems," J. Acoustical Soc. Amer., Vol. 39, 1966, pp. 261-268.

W. J. Culver and M. D. Mesarovic, "Dynamic Statistical Linearization," Comm. and Electronics, July 1963, pp. 317-324.

J. B. Farison, "Approximation of Exponential Functions of Random Variables by Statistical Linearization," IEEE Trans. Automatic Control, Vol. AC-13, 1968, pp. 174-178.

I. C. Kazakov, "Generalization of the Method of Statistical Linearization to Multidimensional Systems," Avtomatika i Telemekhanika, Vol. 26, 1965, pp. 1201-1206.

M. Z. Kolovskii, "Estimating the Accuracy of Solutions Obtained by the Method of Statistical Linearization," Automation and Remote Control, Vol. 27, 1966, pp. 1692-1701.

M. Z. Kolovskii and A. A. Pervozvanskiy, "On Linearization by the Method of Distribution Functions for Problems in the Theory of Nonlinear Oscillations," Akad. Nauk, No. 5, 1962, pp. 118-128.

A. K. Nikitin and G. M. Ulanov, "Investigating Random Processes in Control Systems with a Variable Structure Using Statistical Linearization Methods," Automation and Remote Control, Vol. 10, 1968, pp. 1553-1563.
E. D. Zaidenberg, "A Third Method for the Statistical Linearization of a Class of Nonlinear Differential Equations," Avtomatika i Telemekhanika, Vol. 25, 1964, pp. 195-200.

§6.24. Self-consistent techniques are of great utility in mathematical physics but as yet little explored from the mathematical side.

§6.26. An interesting discussion of closure is contained in

H. J. Kushner, "Approximation to Nonlinear Filters," IEEE Trans. Automatic Control, Vol. AC-12, 1967, pp. 546-556.

For a different type of closure of operations, see

A. S. Vitushkin and G. M. Khenkin, "Linear Superposition of Functions," Russian Math. Surveys, London Math. Soc., Vol. 22, 1967, pp. 72-126.

Interesting problems arise in replacing a differential-difference equation by a differential equation. See

I. M. Repin, "On the Approximate Replacement of Systems with Lag by Ordinary Dynamical Systems," Appl. Math. Mech., Vol. 29, 1965, pp. 254-264.
Chapter 7

THE RAYLEIGH-RITZ METHOD

7.1. Introduction

In this chapter we wish to discuss the Rayleigh-Ritz method, one of the most elegant and powerful of existing techniques for the approximate analytic and numerical solution of functional equations. This will require a certain incursion into the calculus of variations. To keep the presentation self-contained, we will provide some background results required for a rigorous treatment of some classes of linear and nonlinear equations and associated functionals. We shall, however, restrict our attention to certain fundamental material in order to keep the chapter within moderate bounds. We will primarily be concerned with quadratic functionals. These are important in their own right and form the foundation for various methods of approximation in the study of more general functionals.
7.2. The Euler Equation

Consider the problem of minimizing the functional

J(u) = ∫₀^T g(u, u') dt  (7.2.1)

over some class of functions {u}, subject either to an initial condition alone, u(0) = c, or to both initial and terminal conditions. Let us proceed formally for the moment. Subsequently we will present a rigorous approach. Set

u = ū + v,  (7.2.2)

where ū presumably minimizes J(u) and v is a function at our disposal. Then

J(ū + v) = ∫₀^T g(ū + v, ū' + v') dt = J(ū) + ∫₀^T [v g_u + v' g_{u'}] dt + ···  (7.2.3)

Integrating by parts, the second term yields

∫₀^T v [g_u − (d/dt) g_{u'}] dt + v g_{u'} |₀^T.  (7.2.4)

Since v is "arbitrary," we suspect that ū satisfies the equation

g_u − (d/dt) g_{u'} = 0.  (7.2.5)

This is the celebrated equation of Euler. If no condition at T had been assigned, we would obtain, in addition to (7.2.5), the terminal condition

g_{u'} |_{t=T} = 0.  (7.2.6)
With the foregoing in mind, it is reasonable to think of reversing the process. Starting with equation (7.2.5) as the original object of attention, let us see if we can use its connection with the minimization of J(u) to obtain an analytic and computational hold on its solution. Once the problem has been transformed into that of minimizing the functional J(u), we can proceed as in the chapter on the Bubnov-Galerkin method to use a number of approximate methods. The idea is attractive and feasible, but there are a number of points which require discussion before we can proceed with confidence in this direction.

7.3. The Euler Equation and the Variational Problem

The first matter to investigate is the connection between the two problems. What the formal procedure above shows is that any solution of the Euler equation is a stationary point for the functional J(u). How do we select the stationary point that yields the minimum value, and how do we even know that the minimum value exists? It must be clearly stated that in many of the most important scientific applications the minimum does not exist, with the consequence that the Rayleigh-Ritz procedures are purely formal. In these cases, one can occasionally use physical intuition to guide the analysis and test the validity of numerical results. Here, however, we are interested only in those cases where a firm basis can be readily provided. References will be given to a number of other applications of the formalism. Let us first turn to the Euler equation itself, (7.2.5), together with the initial condition u(0) = c and the terminal condition (7.2.6). Since we are looking for a solution subject to two-point boundary conditions, it is not clear that a solution exists, nor, if so, that there is a unique solution. In a number of cases of importance, fortunately, we possess very simple means for demonstrating the existence and uniqueness of the desired solution and, in addition, the fact that this solution furnishes the absolute minimum of J(u) over a suitable class of functions. Armed with this knowledge, we can fruitfully use various computational procedures and obtain estimates of the accuracy of approximation.

7.4. Quadratic Functionals: Scalar Case
Let us begin with the one-dimensional case where the functional has the form

J(u) = ∫₀^T (u'² + φ(t)u²) dt.  (7.4.1)

The associated Euler equation is linear,

u'' − φ(t)u = 0.  (7.4.2)

Let us assume for simplicity that φ(t) is continuous in [0, T], or at least piecewise continuous. All that is needed is that |φ| ∈ L(0, T). We impose the boundary conditions

u(0) = c,  u'(T) = 0;  (7.4.3)

the condition at T is the free boundary condition. The procedure we follow is equally applicable to the treatment of the more general condition

u'(T) + au(T) = b.  (7.4.4)

Observe that by suitable choice of a and b, and a limiting procedure, we obtain the two cases of most frequent occurrence, u(T) = c₁ and u'(T) = 0. Clearly, a minimal requirement is that J(u) be nonnegative for all admissible functions. For if J(u) were negative for some function u, by virtue of its homogeneity, we could make it arbitrarily negative. Let us assume that positive definiteness holds, i.e., that J(u) > 0 for all nontrivial u. In some cases, such as φ(t) > 0, this condition is obviously satisfied. In other cases, whether or not the condition holds depends upon the magnitude of T. The precise constraint on the size of T will be made clear in our discussion of the Sturm-Liouville problem in Chapter 8.
We have used the term "admissible" above. Let us now make it precise. We demand that u satisfy the assigned end conditions and that u be such that the integral J(u) exists. The easiest condition to ensure this is that u' ∈ L²(0, T). This implies that u is continuous, but not necessarily that u' is continuous. Actually, we suspect that the minimizing u' is continuous, since we expect it to satisfy (7.4.2). But we have no need for this information at the moment. And, as a matter of fact, useful approximations can be obtained from piecewise continuous trial functions, as we shall see in Chapter 8.

Let us first show that the assumed positive definiteness of J(u) ensures that the equation in (7.4.2) has a unique solution. Let u₁, u₂ be the principal solutions of the equation u'' − φ(t)u = 0, which is to say the solutions determined by the initial conditions

u₁(0) = 1,  u₁'(0) = 0,  u₂(0) = 0,  u₂'(0) = 1.  (7.4.5)

To solve (7.4.2), set u = cu₁ + bu₂, where the coefficient b is to be determined by the condition at t = T,

u'(T) = 0 = cu₁'(T) + bu₂'(T).  (7.4.6)

The constant b is thus uniquely determined if u₂'(T) ≠ 0. It is, however, easy to show that this condition holds under the foregoing assumption concerning J(u). For if u₂'(T) = 0, we begin with the relation

∫₀^T u₂(u₂'' − φ(t)u₂) dt = 0  (7.4.7)

and thus, integrating by parts, obtain

u₂u₂' |₀^T − ∫₀^T [u₂'² + φ(t)u₂²] dt = 0,  (7.4.8)

or, finally,

J(u₂) = ∫₀^T (u₂'² + φ(t)u₂²) dt = 0.  (7.4.9)

This is a contradiction if J(u) is positive definite, since u₂ is not identically zero. Hence, there is a unique solution to (7.4.2). That this solution u furnishes the absolute minimum of J(u) is easily seen. We have
J(u + v) = J(u) + J(v) + 2 ∫₀^T [u'v' + φ(t)uv] dt  (7.4.10)

for any function v such that v' ∈ L²(0, T), v(0) = 0. Integrating by parts, we have

∫₀^T (u'v' + φ(t)uv) dt = vu' |₀^T − ∫₀^T v(u'' − φ(t)u) dt = 0.  (7.4.11)
The integrated term vanishes, since u'(T) = 0, v(0) = 0. Hence

J(u + v) = J(u) + J(v) > J(u)  (7.4.12)

for any admissible variation v, v ≢ 0. From the foregoing, we readily deduce the uniqueness of the function minimizing J. Let v be a function which supposedly yields a value of the functional J no larger than J(u), that is, J(v) ≤ J(u). Then we have

J(v) = J(u + (v − u)) = J(u) + J(v − u) > J(u)  (7.4.13)

if v is not identically equal to u, a contradiction.

Exercises
1. Obtain the same conclusion by considering the convexity of the quadratic polynomial in λ, J(λu + (1 − λ)v), for 0 ≤ λ ≤ 1.

2. Carry through the arguments for the case where the terminal condition is u(T) = c₁.

3. What happens if we attempt to impose the terminal condition u'(T) = c₂, c₂ ≠ 0?

7.5. Positive Definiteness for Small T
Let us now show that the functional J(u) is positive definite for small T, regardless of the sign of φ(t). Analyzing the foregoing proof, we see that the crux of the argument is actually the fact that J(u₂) ≠ 0. To show that this always holds for small enough T, we start with the relation

u₂(t) = ∫₀^t u₂'(s) ds.  (7.5.1)

Recall that u₂(0) = 0. Hence, using the Cauchy-Schwarz inequality,

u₂(t)² ≤ t ∫₀^t u₂'² ds.  (7.5.2)

Thus, for 0 ≤ t ≤ T,

u₂(t)² ≤ t ∫₀^T u₂'² ds,  (7.5.3)

whence

∫₀^T |φ(t)| u₂(t)² dt ≤ (∫₀^T u₂'² ds)(∫₀^T t|φ(t)| dt).  (7.5.4)

Thus,

J(u₂) ≥ ∫₀^T u₂'² dt − ∫₀^T |φ| u₂² dt ≥ (∫₀^T u₂'² dt)(1 − ∫₀^T t|φ| dt).  (7.5.5)

Hence, if T is taken small enough so that 1 − ∫₀^T t|φ| dt > 0, we have the desired nonnegativity. Observe in this case that we have the stronger condition J(u₂) ≥ k₁(∫₀^T u₂'² dt) for some constant k₁ > 0. As we shall see subsequently in Chapter 8, this is always a consequence of positivity.

7.6. Discussion
Let us briefly review the path we have followed. Starting with a functional, we derived a variational equation, the Euler equation, in a purely formal fashion. Ignoring its dubious antecedents, we focussed our attention upon the equation and showed, for the simple functional

J(u) = ∫₀^T (u'² + φ(t)u²) dt,  (7.6.1)

where the equation was

u'' − φ(t)u = 0,  u(0) = c,  u'(T) = 0,  (7.6.2)

that under certain reasonable conditions this equation possessed a unique solution. Furthermore, we then showed directly that this solution furnished the absolute minimum of J(u) over the class of functions satisfying u(0) = c and the condition that u' ∈ L²(0, T). We shall pursue the same devious path for nonlinear functionals. Subsequently, we shall present an alternate, and more direct, approach based upon functional analysis which permits us to deduce the Euler equation from the fact that the minimum value exists.
7.7. The Rayleigh-Ritz Method

Let us now indicate the basic idea of the Rayleigh-Ritz method, a method of great power and versatility. We use the functional J(u) of Sec. 7.4 for illustrative purposes. Let {wₖ(t)} be a sequence of convenient functions and consider the "trial function"

u = Σ_{k=1}^N aₖwₖ(t).  (7.7.1)

Then

J(u) = Q(a₁, a₂, ..., a_N),  (7.7.2)

a quadratic form in the coefficients a₁, a₂, ..., a_N. These are now determined to minimize Q subject to the prescribed initial condition

u(0) = c = Σ_{k=1}^N aₖwₖ(0).  (7.7.3)

Sometimes it is convenient to choose the wₖ so that (7.7.3) is automatically satisfied. For example, we may use a trial function of the form

u = c + Σ_{k=1}^{N−1} aₖtᵏ  (7.7.4)

or

u = c + Σ_{k=1}^{N−1} aₖ sin kt.  (7.7.5)

In any case, the original variational problem has been replaced by a finite-dimensional minimization problem, which in turn can be transformed into the task of solving a system of linear algebraic equations.

7.8. Validity of the Method
I n any case, the original variational problem has been replaced by a finite-dimensional minimization problem, which in turn can be transformed into the task of solving a system of linear algebraic equations. 7.8. Validity of the Method
Once we have established the existence and uniqueness of a solution of the original variational problem, it is easy to demonstrate the validity of the Rayleigh-Ritz approach. Let u be the minimizing function, and let {vk)be a sequence of functions over [0, TI, complete in the L2-sense.
THE RAYLEIGH-RITZ METHOD
266
By this we mean that given any function w inL2[0, TI we can find a set of coefficients h, , 6, ,..., h, such that (7.8.1)
for N 2 N ( E ) ,for any E Choose w = u'. Then
> 0. (7.8.2)
implies that /u-c
upon applying the Cauchy-Schwarz inequality. Hence
for some constant c l . Let us then consider the trial function
On one hand, we have as a consequence of the minimizing property, /(')
<
(7.8.6)
J('N).
On the other hand, we have as in (7.4.12),
I(' = , I(') > + J('N Hence, using (7.8.2) and (7.8.4), J('N)
f- ['
-
+ cl'l*
(7.8.7)
(7.8.8)
7.9. MONOTONE BEHAVIOR A N D CONVERGENCE
267
Combining (7.8.6) and (7.8.Q
T h u s we can choose N large enough to obtain an arbitrarily good approximation to the desired minimum value. What is usually important in practice, of course, is to choose the sequence {v,} so that a moderate value of N suffices. With the development of digital computers we can readily allow larger values of N , but naturally the effort required increases rapidly as N increases. T h e task, however, of choosing the functions v k in some adroit fashion so as to obtain good approximations for moderate N is not an easy one. Physical intuition often plays an important role here, and a certain amount of experience is required.
Exercise
1. Let {wk}be orthonormal over [0, TI, f € L 2 ( 0 ,T ) , and consider the T N minimization of ( f ( t ) - x k = l u , w ~ dt ) ~ with respect to the uk . Deduce Bessel’s inequality, J i f ( t ) 2dt 2 Zzzl ( J i f ( t ) w ,dt)2, for any N .
so
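The scheme of Secs. 7.7-7.8 is easy to try numerically. In the sketch below (our own illustration; the particular functional, interval, and basis are chosen here, not in the text), J(u) = ∫₀^1 (u'² + u²) dt is minimized with u(0) = 1 over trial functions of the form (7.7.4); for this functional the minimizing value can be computed to be tanh 1 ≈ 0.7616, and the Ritz values decrease monotonically toward it:

```python
import math

def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def ritz_value(N, T=1.0):
    """Minimize J(u) = int_0^T (u'^2 + u^2) dt over u = 1 + sum_{k=1}^N a_k t^k.
       Setting the gradient to zero gives the linear system
       sum_k a_k int (w_j' w_k' + w_j w_k) dt = -int w_j dt, with w_j = t^j."""
    A = [[j * k * T**(j + k - 1) / (j + k - 1) + T**(j + k + 1) / (j + k + 1)
          for k in range(1, N + 1)] for j in range(1, N + 1)]
    rhs = [-T**(j + 1) / (j + 1) for j in range(1, N + 1)]
    a = solve(A, rhs)
    # evaluate J at the Ritz minimizer by Simpson's rule
    def u(t):  return 1.0 + sum(a[k] * t**(k + 1) for k in range(N))
    def up(t): return sum(a[k] * (k + 1) * t**k for k in range(N))
    n, h = 2000, T / 2000
    s = sum((1 if i in (0, n) else 4 if i % 2 else 2) * (up(i * h)**2 + u(i * h)**2)
            for i in range(n + 1))
    return s * h / 3

vals = [ritz_value(N) for N in (1, 2, 3, 4)]
print(vals, math.tanh(1.0))  # J_1 >= J_2 >= ... decreasing toward tanh 1
```

The monotone decrease of the values J_N is exactly the behavior established in the following section.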
7.9. Monotone Behavior and Convergence

Let {wₖ} be a sequence of functions complete in L²(0, T), and let J_N denote the minimum of J(Σ_{k=1}^N aₖwₖ) with respect to the aₖ, subject to the initial value constraint Σ_{k=1}^N aₖwₖ(0) = c. This minimum exists by virtue of the fact that

J(Σ_{k=1}^N aₖwₖ) ≥ J(u),  (7.9.1)

where u is the minimizing function; it is unique as a consequence of the convexity of J(Σ_{k=1}^N aₖwₖ) as a function of the aₖ. Let [a₁^{(N)}, ..., a_N^{(N)}] denote the set of minimizing values. It is clear that

J₁ ≥ J₂ ≥ ··· ≥ J_N ≥ ···,  (7.9.2)

and, as indicated above, that

lim_{N→∞} J_N = min J(u).  (7.9.3)

7.10. Estimation of |u − v| in Terms of J(v) − J(u)
Let us now consider the converse problem. Suppose that we find an admissible function v such that
J(v) − J(u) ≤ ε.    (7.10.1)

What can we conclude about |u − v|? We begin with the simple case where φ(t) ≥ 0. Then, since

J(v) = J(u) + J(v − u),    (7.10.2)

and

J(v − u) ≥ 0,    (7.10.3)

we can conclude that

J(v − u) ≤ ε.    (7.10.4)

Since

J(v − u) = ∫₀ᵀ [(v′ − u′)² + φ(t)(v − u)²] dt,    (7.10.5)

the assumption φ ≥ 0 implies

∫₀ᵀ (v′ − u′)² dt ≤ ε.    (7.10.6)

Hence, since v(0) = u(0) = c, the Cauchy–Schwarz inequality yields

|v(t) − u(t)| = |∫₀ᵗ (v′ − u′) ds| ≤ (Tε)^{1/2}.    (7.10.7)
This may not be a very good estimate. It follows then that J(v) can be close to J(u) without v being a particularly accurate approximation to u. This has both desirable and undesirable aspects. On one hand, it means that a rough estimate of u can lead to an excellent value for J(u). Indeed, in many cases only the minimum value of J(u) is of interest, as we shall indicate below. In return, however, we must be careful not to be misled by the closeness of J(u) to J(v), if it is really the estimation of u itself that is of importance.
7.11. Convergence of Coefficients

Suppose that we have a sequence of functions {u_n} such that lim_{n→∞} J(u_n) = J(u). The foregoing discussion shows that

lim_{n→∞} u_n = u,    (7.11.1)

assuming, as we do throughout, that T is finite. Consider the situation described above in Sec. 7.9 where u_n has the form

u_n = Σ_{k=1}^n a_k^(n) w_k.    (7.11.2)

What can we say about the convergence of {a_k^(n)} as n → ∞? There are several important cases where we can demonstrate convergence of the sequence of coefficients in a simple fashion. Perhaps the most important is that where the w_k are orthonormal over [0, T]. Then

a_k^(n) = ∫₀ᵀ u_n w_k dt.    (7.11.3)

Hence, if we set

b_k = ∫₀ᵀ u w_k dt,    (7.11.4)

we have

|a_k^(n) − b_k| ≤ (∫₀ᵀ (u − u_n)² dt)^{1/2}.    (7.11.5)

Thus, we see that

lim_{n→∞} a_k^(n) = b_k,    (7.11.6)

under the assumption that u_n converges strongly to u in the L²-space. Actually, the Bessel inequality (Exercise 1 of Sec. 7.8) establishes the stronger result

Σ_{k=1}^n (a_k^(n) − b_k)² ≤ ∫₀ᵀ (u − u_n)² dt.    (7.11.7)

If the w_k are not orthonormal, more detailed argumentation is required.
In the general case, where our assumption is that J(v) is positive definite, we must use the fact that this implies

J(u − v) ≥ c₂ ∫₀ᵀ (u′ − v′)² dt    (7.11.8)

for a positive constant c₂. A derivation of this result will be given in Chapter 8 using Sturm–Liouville theory. An alternate proof can be given on the basis of functional analysis.

Exercise

1. Do we have convergence of the coefficients if u_n = Σ_{k=1}^n a_k^(n) t^k?
7.12. Alternate Estimate
We possess a different approach to estimating the degree of approximation we can obtain to u, based on an estimate of the difference between J(u) and J(v), where v is a trial function. Recall that the Euler equation associated with the minimization of J(u) is

u″ − φ(t)u = 0,    u(0) = c,    u′(T) = 0.    (7.12.1)

The value of the minimum of J(u) is readily obtained in terms of the solution of (7.12.1). We have first

∫₀ᵀ u(u″ − φ(t)u) dt = 0.    (7.12.2)

Integration by parts yields

uu′|₀ᵀ − ∫₀ᵀ (u′² + φ(t)u²) dt = 0.    (7.12.3)

Thus

J_min = −u′(0)c.    (7.12.4)

This yields the missing initial condition in (7.12.1). Hence, if

J(u) ≤ J(v) ≤ J(u) + ε,    (7.12.5)
we can use J(v) as an approximation to the missing initial condition for u′(0), with the assurance that v, the function determined by v(0) = c, v′(0) = −J(v)/c, is a reasonable approximation to u. Namely, if

u″ − φ(t)u = 0,    u(0) = c,    u′(0) = −J(u)/c,
v″ − φ(t)v = 0,    v(0) = c,    v′(0) = −J(v)/c,    (7.12.6)

we have

u = cu₁ − (J(u)/c)u₂,    v = cu₁ − (J(v)/c)u₂.    (7.12.7)
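For the illustrative special case φ(t) ≡ 1 (my choice, not from the text), the principal solutions are u₁ = cosh t and u₂ = sinh t, and the effect of using an approximate minimum value J(v) = J(u) + ε as the missing slope in (7.12.6) can be checked directly:

```python
import numpy as np

T, c = 1.0, 2.0
t = np.linspace(0.0, T, 1001)
u1, u2 = np.cosh(t), np.sinh(t)       # principal solutions of u'' - u = 0
Jmin = c**2 * np.tanh(T)              # exact minimum value, equal to -u'(0) c
u = c * u1 - (Jmin / c) * u2          # exact minimizer, per (7.12.7)

eps = 1e-3
v = c * u1 - ((Jmin + eps) / c) * u2  # same equation, approximate slope -J(v)/c
max_err = float(np.max(np.abs(u - v)))
bound = (eps / c) * np.sinh(T)        # the error bound (eps/|c|) |u2(t)| at t = T
```

The pointwise error is exactly (ε/c)|u₂(t)|, so it never exceeds (ε/c) sinh T.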
Hence

|u(t) − v(t)| ≤ (ε/|c|)|u₂(t)|    (7.12.8)

for 0 ≤ t ≤ T. Here u₁ and u₂ are, as usual, the principal solutions.

7.13. Successive Approximations
The problem of minimizing J(Σ_{k=1}^N a_k w_k) can be attended with serious difficulties if N is large, since it involves the solution of a large system of linear algebraic equations. One way to avoid this onerous task is to employ the method of successive approximations. To illustrate this, suppose that we begin with the problem of minimizing the expression J(Σ_{k=1}^{N/2} a_k w_k). Let [a₁^(1), ..., a_{N/2}^(1)] denote the solution to this problem. As the next step, consider the minimization of the expression

J(Σ_{k=1}^{N/2} a_k^(1) w_k + Σ_{k=N/2+1}^N a_k w_k).    (7.13.1)

Let a_{N/2+1}^(1), ..., a_N^(1) denote the values determined in this way, and consider the minimization of

J(Σ_{k=1}^{N/2} a_k w_k + Σ_{k=N/2+1}^N a_k^(1) w_k).    (7.13.2)

Continuing in this fashion, we can solve the original minimization problem of dimension N in terms of a sequence of problems of dimension N/2. If N is quite large, it may be better to break the problem up into even smaller blocks. The precise decomposition used depends upon the computing facilities available and the time and accuracy constraints.
It is worth noting that extrapolation methods can also profitably be employed in situations of this type. Having calculated the sequences [a₁^(n), a₂^(n), ..., a_N^(n)] for convenient values of n, we can then estimate a₁^(∞), a₂^(∞), ..., a_N^(∞) using nonlinear extrapolation procedures of the type previously described.

Exercise

1. Show that the method of successive approximations described above yields a sequence converging to the set of values minimizing J(Σ_{k=1}^N a_k w_k).

7.14. Determination of the Coefficients

The minimization of J(Σ_{k=1}^N a_k w_k) leads to the problem of solving a system of linear algebraic equations whose coefficients are determined by integrals of the form

∫₀ᵀ w_k′w_l′ dt,    ∫₀ᵀ φ(t)w_k w_l dt.    (7.14.1)

In many cases, we can simplify the calculations by choosing the base functions w_k so that either

∫₀ᵀ w_k′w_l′ dt = δ_{kl}    (7.14.2)

or

∫₀ᵀ φ(t)w_k w_l dt = δ_{kl},    (7.14.3)

orthogonality conditions. As we shall see in Chapter 8, we cannot simultaneously satisfy both conditions without knowing enough to solve the original minimization problem in a much simpler fashion.

How then do we evaluate the integrals in (7.14.1)? A direct approach is that of numerical quadrature. An approach which may be more convenient in many cases is based upon the integration of the system of ordinary differential equations

dr_{kl}/dt = w_k′w_l′,    r_{kl}(0) = 0,
ds_{kl}/dt = φ(t)w_k w_l,    s_{kl}(0) = 0,    k, l = 1, 2, ..., N.    (7.14.4)
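A sketch of the computation in (7.14.4), with cumulative trapezoidal integration standing in for the ODE integrator (the basis functions w_k and the weight φ below are illustrative choices of mine):

```python
import numpy as np

def cumtrap(y, t):
    # running integral from 0 to t, trapezoidal rule
    out = np.zeros_like(y)
    out[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(t))
    return out

T = 1.0
t = np.linspace(0.0, T, 2001)
phi = 1.0 + t**2                              # a sample weight phi(t)
w = [np.sin((k + 1) * t) for k in range(3)]
dw = [(k + 1) * np.cos((k + 1) * t) for k in range(3)]

# r_kl(T) and s_kl(T), obtained by integrating from r(0) = s(0) = 0
r = np.array([[cumtrap(dw[k] * dw[l], t)[-1] for l in range(3)] for k in range(3)])
s = np.array([[cumtrap(phi * w[k] * w[l], t)[-1] for l in range(3)] for k in range(3)])
```

Both arrays come out symmetric, as they must, and only the terminal values r_{kl}(T), s_{kl}(T) need be stored.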
The required integrals in (7.14.2) and (7.14.3) are the values r_{kl}(T), s_{kl}(T). If each function w_k satisfies a differential equation, as is often the case, the foregoing procedure results in a considerable decrease in storage requirements for the computer. We have already discussed this point in the chapter on differential approximation.

7.15. Multidimensional Case
There is no difficulty in extending the foregoing considerations to the multidimensional case. Consider the linear differential equation

x″ − Ax = 0,    (7.15.1)

subject to the two-point conditions

x(0) = c^(1) and either x(T) = c^(2) or x′(T) = 0.    (7.15.2)

Equation (7.15.1) is readily seen to be the Euler equation associated with the functional

J(x) = ∫₀ᵀ [(x′, x′) + (x, Ax)] dt.    (7.15.3)

The proof of the existence and uniqueness of the solution of (7.15.1), subject to the boundary conditions (7.15.2), now proceeds as before, under the assumption either that A is positive definite and T is arbitrary, or that A is arbitrary and T ≤ 1. Let us briefly sketch the details for the sake of completeness.

Consider uniqueness first. If x and y are two solutions of (7.15.1) satisfying the same boundary conditions, then z = x − y satisfies

z″ − Az = 0,    z(0) = z(T) = 0,    (7.15.4)
or z(0) = z′(T) = 0, depending upon which terminal condition we have imposed. From this we have

∫₀ᵀ (z, z″ − Az) dt = 0    (7.15.5)

or, integrating by parts,

∫₀ᵀ [(z′, z′) + (z, Az)] dt = 0.    (7.15.6)

The integrated term drops out under either of the two boundary conditions above. If A is positive definite, the relation in (7.15.6) is impossible if z ≠ 0. If A is arbitrary and T ≤ 1, a use of the Cauchy–Schwarz inequality as in Sec. 7.5 likewise shows that this relation cannot hold for z ≠ 0. Hence, we have uniqueness.

Now let us consider existence. Let X₁ and X₂ be the principal solutions of the matrix equation

X″ − AX = 0,    (7.15.7)

that is,

X₁(0) = I,  X₁′(0) = 0;    X₂(0) = 0,  X₂′(0) = I.    (7.15.8)

Write

x = X₁b^(1) + X₂b^(2),    (7.15.9)

where b^(1) and b^(2) must be determined from the boundary conditions. Let us use the conditions x(0) = c^(1), x(T) = c^(2) to begin with. Then b^(1) = c^(1), and b^(2) is to be determined from the condition

c^(2) = X₁(T)c^(1) + X₂(T)b^(2).    (7.15.10)

If X₂(T) is nonsingular, b^(2) is uniquely determined. Let us show that X₂(T) cannot be singular under either of the foregoing assumptions concerning A and T. If X₂(T) is singular, there is a nontrivial vector b such that

X₂(T)b = 0.    (7.15.11)

In this case x = X₂(t)b is a nontrivial solution of (7.15.4). But this we know is a contradiction. This establishes the existence and uniqueness of the solution of (7.15.1) and (7.15.2), and an argument similar to that given in the scalar case shows that x yields the absolute minimum of J(x).

One of the reasons we have gone into the details is to make explicit the simple ideas upon which the method is based. The same overall approach can readily be applied to quadratic functionals of quite general form and, in particular, to those that arise in the study of partial differential equations.

7.16. Reduction of Dimension
An interesting point to observe is that the variational approach to the solution of

x″ − Ax = 0,    x(0) = c,    x(T) = b,    (7.16.1)
provides us with some straightforward ways of reducing the dimension of the equation. As we know, we can solve (7.16.1) by means of a linear system of algebraic equations of order N. This may be an onerous task if N ≫ 1. In place of this approach, we can set

y = Σ_{k=1}^M c_k w_k    (7.16.2)

as a trial function for the minimization of the associated quadratic functional (7.16.3), and minimize over the c_k subject to the initial condition y(0) = c. This leads to a system of M linear algebraic equations; if M ≪ N, this leads to a considerable simplification. Secondly, we can take as a trial function

w =    (7.16.4)

where only variation over the x_i is allowed, with the y_k fixed. This can be made the basis for a technique of successive approximations by shifting the location of the unknown functions.

Exercise

1. Examine the convergence of a method of successive approximations of this nature.

7.17. Minimization of Inequalities
As pointed out above, the use of the Rayleigh-Ritz method can lead to the task of minimizing a quadratic form

Q(x) = (x, Ax)    (7.17.1)

subject to a set of linear constraints

(x, b_i) = a_i,    i = 1, 2, ..., M.    (7.17.2)
The matrix A is assumed to be positive definite. The case where

Q(x) = (x, Ax) + 2(x, b)    (7.17.3)

can be reduced to the foregoing by means of a change of variable which eliminates the additional term.

Problems of this type can, of course, be treated by means of Lagrange multipliers, a topic we shall discuss below. Nonetheless, it is important to note that the minimum value of Q(x) can be obtained quite easily on the basis of some elementary concepts concerning quadratic forms. It is worthwhile doing this since a rigorous validation of the use of Lagrange multipliers requires some effort.

Let x have the form Σ_{k=1}^{M+1} c_k x_k. Then Q(Σ_{k=1}^{M+1} c_k x_k) is a positive definite quadratic form in the variables c_k, k = 1, 2, ..., M + 1. Since

Q(Σ_{k=1}^{M+1} c_k x_k) = Σ_{k,l} c_k c_l (x_k, Ax_l),    (7.17.4)

we see that we have the determinantal inequality
det[(x_k, Ax_l)] ≥ 0,    (7.17.5)

with equality only if a linear relation exists among the x_k. Let x₁ = x, and let the vectors x₂, ..., x_{M+1} be determined by the relations

Ax_i = b_{i−1},    i = 2, ..., M + 1.    (7.17.6)

Then (7.17.5) yields the inequality

det[(x_i, Ax_j)] ≥ 0, where (x₁, Ax₁) = (x, Ax), (x₁, Ax_j) = (x, b_{j−1}), and (x_i, Ax_j) = (b_{i−1}, A⁻¹b_{j−1}) for i, j ≥ 2,    (7.17.7)

with equality if and only if

x = Σ_{k=2}^{M+1} c_k x_k    (7.17.8)
for some scalar constants c₂, ..., c_{M+1}. Let us determine the c_k by the condition that (7.17.2) holds. Then

Σ_{k=2}^{M+1} c_k (x_k, b_i) = a_i,    i = 1, 2, ..., M,

or

Σ_{k=2}^{M+1} c_k (b_{k−1}, A⁻¹b_i) = a_i,    i = 1, 2, ..., M.

Since A⁻¹ is also positive definite, this set of linear equations possesses a unique solution. With the c_k determined in this fashion, the determinant in (7.17.7), set equal to zero, yields the minimum value of (x, Ax).

7.18. Extension to Quadratic Functionals
The same approach can be used to determine the minimum of the quadratic functional

J(x) = ∫₀ᵀ [(x′, x′) + (x, A(t)x)] dt    (7.18.1)

over all functions x for which J(x) exists, say x′ ∈ L²(0, T), and which satisfy the constraints

∫₀ᵀ (x, f_i) dt = a_i,    i = 1, 2, ..., M.    (7.18.2)

Introduce the extended "inner product"

⟨x, y⟩ = ∫₀ᵀ [(x′, y′) + (x, A(t)y)] dt,    (7.18.3)

and suppose that J(x) is positive definite. Then, consideration of the quadratic form J(Σ_{k=1}^{M+1} c_k x_k) = Q(c₁, ..., c_{M+1}) yields the determinantal inequality

det[⟨x_k, x_l⟩] ≥ 0,    (7.18.4)

with equality only if x₁ is a linear combination of the other x_i. Integrating by parts, we have

⟨x₁, x_i⟩ = (x₁, x_i′)|₀ᵀ + ∫₀ᵀ (x₁, −x_i″ + A(t)x_i) dt.    (7.18.5)
Let x_i, i = 2, ..., M + 1, be determined as the solution of

−x_i″ + A(t)x_i = f_{i−1},    x_i′(0) = x_i′(T) = 0,    i = 2, ..., M + 1.    (7.18.6)

The positive definiteness of J(x) assures us of the existence and uniqueness of these functions, as we have seen above. Write

x_i = ∫₀ᵀ K(t, s)f_{i−1}(s) ds,    i = 2, ..., M + 1.    (7.18.7)
Then (7.18.4) yields, upon setting x₁ = x, the corresponding determinantal inequality (7.18.8). The minimum value of J(x) subject to the constraints ∫₀ᵀ (x, f_i) dt = a_i is obtained, as before, by replacing the terms ∫₀ᵀ (x, f_i) dt in the determinant by a_i. Equality holds for x = Σ_{k=2}^{M+1} c_k x_k. The scalars c_k are determined by the relations in (7.18.2),

Σ_{k=2}^{M+1} c_k ∫₀ᵀ (x_k, f_i) dt = a_i,    i = 1, 2, ..., M,    (7.18.9)

or, substituting (7.18.7),

Σ_{k=2}^{M+1} c_k ∫₀ᵀ∫₀ᵀ (K(t, s)f_{k−1}(s), f_i(t)) ds dt = a_i,    i = 1, 2, ..., M.

The positive definite character of the Green's function K(t, s) ensures the nonvanishing of the determinant of the coefficients. Thus, the minimizing x is uniquely determined.

Exercises
1. Determine the minimum of J(x) subject to x(0) = c as a limiting case of the foregoing.

2. Determine the minimum value of J(x) subject to x(t_i) = c_i, i = 1, 2, ..., R, 0 ≤ t₁ < t₂ < ⋯ < t_R ≤ T, as a limiting case of the foregoing results.
7.19. Linear Integral Equations
The Rayleigh-Ritz method can also be applied to study the linear integral equation

u(t) + ∫₀ᵀ k(t, s)u(s) ds = f(t).    (7.19.1)

Suppose that k(t, s) is a real symmetric kernel. Then (7.19.1) can be considered to be the variational equation associated with the quadratic functional

J(u) = ∫₀ᵀ u(t)² dt + ∫₀ᵀ∫₀ᵀ k(t, s)u(t)u(s) dt ds − 2∫₀ᵀ f(t)u(t) dt.    (7.19.2)

The question of whether J(u) has an absolute minimum leads to a characteristic value problem. It is necessary for this that the quadratic terms be positive definite. We will elucidate this point in the next chapter. Let us suppose that this condition is satisfied, so that J(u) does possess an absolute minimum. Let v be a trial function, and let

J(u) ≤ J(v) ≤ J(u) + ε.    (7.19.3)

Then, as before, making use of (7.19.1), we have

J(v) = J(u) + ∫₀ᵀ (u(t) − v(t))² dt + ∫₀ᵀ∫₀ᵀ k(t, s)(u(t) − v(t))(u(s) − v(s)) dt ds.    (7.19.4)

Hence, under the assumption that the quadratic part of J(u) is positive definite, we see that (7.19.3) implies that

∫₀ᵀ (u(t) − v(t))² dt ≤ a₁ε,    (7.19.5)

where a₁ is a constant. This shows that the mean-square value of u − v is small. Nonetheless, we want a point estimate.
From (7.19.1), we see that

u(t) − [f(t) − ∫₀ᵀ k(t, s)v(s) ds] = −∫₀ᵀ k(t, s)(u(s) − v(s)) ds.    (7.19.6)

Hence, by the Cauchy–Schwarz inequality,

|u(t) − [f(t) − ∫₀ᵀ k(t, s)v(s) ds]| ≤ (∫₀ᵀ k(t, s)² ds)^{1/2} (∫₀ᵀ (u(s) − v(s))² ds)^{1/2}.    (7.19.7)

Thus, if v is a trial function which renders J(v) small, it may be that

v₁(t) = f(t) − ∫₀ᵀ k(t, s)v(s) ds    (7.19.8)

is a better point-by-point approximation to u than v itself.

The minimum value of J(u) can be given quite simply in terms of u. We have

J_min = −∫₀ᵀ f(t)u(t) dt    (7.19.9)

if (7.19.1) is satisfied. Hence, we can obtain approximate values for the quantity ∫₀ᵀ f(t)u(t) dt using trial functions of simple type.

7.20. Nonlinear Euler Equation
Let us now turn to the more difficult case where the Euler equation is nonlinear. We consider only the scalar case, since the general methods are the same in principle for the scalar and vector cases, and follow the same path outlined in the simpler case of quadratic functionals. Consider the equation

u″ − g(u) = 0,    u(0) = c₁,    u′(T) = 0,    (7.20.1)

the Euler equation associated with the minimization of the functional

J(u) = ∫₀ᵀ [u′² + 2h(u)] dt,    (g(u) = h′(u)),    (7.20.2)

subject to the initial value condition u(0) = c₁.
Under various reasonable assumptions which we will discuss below, we can follow the preceding route, namely: establish the existence of a solution of (7.20.1); establish the uniqueness of this solution; and demonstrate the minimizing property of this solution. In some cases, as we will illustrate, the methods of functional analysis are far more powerful in demonstrating the existence of a solution. In all cases, obtaining an effective numerical procedure is a task of some difficulty. In the second volume we approach this general problem by means of a variety of different methods: quasilinearization, duality, dynamic programming, and invariant imbedding. Here we concentrate solely on the Rayleigh-Ritz method.

7.21. Existence and Uniqueness

Following the procedure of Chapter 4, there is no difficulty in establishing the existence of a solution of (7.20.1) under the assumptions

(a) |c₁| and T sufficiently small, or
(b) |g′(u)| bounded in some neighborhood of u = c₁.    (7.21.1)
For example, we can use the method of successive approximations. Determine u₀(t) as the solution of

u₀″ = 0,    u₀(0) = c₁,    u₀′(T) = 0,    (7.21.2)

and {u_n(t)} recurrently by

u_n″ = g(u_{n−1}),    u_n(0) = c₁,    u_n′(T) = 0,    n ≥ 1.    (7.21.3)
Under the foregoing assumptions {u_n} converges uniformly to a solution of (7.20.1).

Uniqueness is a different matter. As we have noted in Chapter 4, there is a unique solution in the class of functions satisfying a constraint such as |u| ≤ k₁|c₁|. In general, we cannot expect unrestricted uniqueness of the solution of the Euler equation. Consider, for example, the problem of determining a geodesic on a torus as in Fig. 7.1. There is a minimum distance between P and Q

Figure 7.1

for those curves which do not go through the hole of the "doughnut," a minimum distance for those curves which loop around once, as indicated, and so forth. Hence, the associated Euler equation has a denumerable set of solutions.

Exercise

1. Can one obtain the same lack of uniqueness on a surface with hills and valleys but no holes? Would there be an infinite number of solutions of the Euler equation?

7.22. Minimizing Property
Before continuing a discussion of uniqueness of solution, let us turn to an examination of the minimizing property of the suitably constrained solution of (7.20.1). We have

J(u + v) = J(u) + ∫₀ᵀ [2u′v′ + 2g′(u)v] dt + ∫₀ᵀ [v′² + v²g″(θ)] dt,    (7.22.1)

where θ is a function of t with values in the interval [u, u + v], depending upon both u and v. Here we have employed the mean-value theorem, adding for our convenience the additional assumption that g″ exists for all u. The second term vanishes, since u is a solution of the Euler equation and v(0) = 0, u′(T) = 0. We can conclude that u provides an absolute minimum, provided that

∫₀ᵀ [v′² + v²g″(θ)] dt > 0    (7.22.2)

for all v, and a relative minimum within an appropriate class of functions v otherwise. Thus, for example, if we restrict u and u + v to a class of functions with |u| ≤ k₁, it is clear that (7.22.2) will hold for functions such that |g″(θ)|T < 1. An important consequence of the foregoing argument is that (7.22.2) holds trivially if g″(θ) > 0, which is to say, provided that g(u) is convex in u.
7.23. Convexity and Uniqueness
We can obtain a simple condition for uniqueness of the solution of (7.20.1) from what has preceded. If the function g(u) is convex, each solution of (7.20.1) yields an absolute minimum of J(u). Since the absolute minimum is unique, this rules out the existence of more than one solution.
Exercise

1. J(u) = ∫₀ᵀ (u′² + eᵘ) dt possesses an absolute minimum over all functions u such that u′ ∈ L²(0, T), u(0) = c. How does one obtain the numerical values of this minimizing function?

7.24. Implied Boundedness
In some important cases, the origin of the equation in the variational problem directly yields the required boundedness condition. Consider, for example, the minimization of

J(u) = ∫₀ᵀ (u′² + u² + u⁴) dt,    (7.24.1)

where u is subject to the initial condition u(0) = c. The trial function ce⁻ᵗ shows that we need only consider the class of functions satisfying

J(u) ≤ k₂c²,    (7.24.2)

assuming |c| ≤ 1, where k₂ is independent of T. Hence, we have for the minimizing function

∫₀ᵗ (u′² + u²) dt ≤ ∫₀ᵗ (u′² + u² + u⁴) dt ≤ ∫₀ᵀ (u′² + u² + u⁴) dt ≤ k₂c².    (7.24.3)

Since u′² + u² ≥ 2|uu′|, we have

|u(t)² − c²| = 2|∫₀ᵗ uu′ dt| ≤ ∫₀ᵀ (u² + u′²) dt ≤ k₂c².    (7.24.4)

Thus, we have the desired a priori bound

|u| ≤ (1 + k₂)^{1/2}|c|.    (7.24.5)

Hence, we have established the desired uniqueness for |c| ≤ 1.
7.25. Lack of Existence of Minimum
The general situation is quite complicated as far as nonlinear variational problems are concerned. We can easily encounter a case where a solution of the Euler equation exists without the associated functional possessing an absolute minimum. Consider, for example, the functional

J(u) = ∫₀ᵀ (u′² − u⁴) dt,    (7.25.1)

where u(0) = 1. The equation

u″ + 2u³ = 0,    u(0) = 1,    u′(T) = 0,    (7.25.2)

possesses a solution for T ≤ 1. The trial function eᵏᵗ with k > 1 shows, however, that J(u) can assume arbitrarily large negative values and, thus, that no absolute minimum exists.

Similarly, it is essential to examine the class of admissible functions carefully. The functional in (7.25.3) does not assume a minimum value if we allow u to range over the class of continuous functions such that u(0) = 1. We can come arbitrarily close to inf J(u) = 0, but no admissible function yields the value 0 for the functional. Simple examples of this nature force one to examine formal techniques with a suspicious eye.
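The first assertion is easy to check numerically: for the trial function eᵏᵗ with T = 1, J(u) = ∫₀ᵀ (u′² − u⁴) dt can be evaluated in closed form and decreases without bound as k grows:

```python
import numpy as np

T = 1.0

def J_trial(k):
    # J(e^{kt}) = int_0^T (k^2 e^{2kt} - e^{4kt}) dt, in closed form
    return k * (np.exp(2 * k * T) - 1.0) / 2.0 - (np.exp(4 * k * T) - 1.0) / (4.0 * k)

vals = [J_trial(k) for k in (1.0, 2.0, 4.0, 8.0)]
```

The quartic term dominates, so the values plunge toward minus infinity, confirming that no absolute minimum exists.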
7.26. Functional Analysis
Let us now see if we can provide a direct proof of the existence of the minimum value of a functional under suitable hypotheses. Once this has been established, we can proceed directly to an application of the Rayleigh-Ritz procedure, without worrying about the Euler equation if we wish. We utilize a modicum of functional analysis. Let

J(u) = ∫₀ᵀ (u′² + g(u)) dt,    (7.26.1)

and suppose that g(u) is convex. We wish to minimize J(u) over the class of functions such that u′ ∈ L²(0, T) and u(0) = c. Since g(u) is convex, we have g(u) ≥ k₁ > −∞ for all u, and thus

J(u) ≥ k₂ > −∞    (7.26.2)

for all admissible u. Thus

R₁ = inf_u J(u) > −∞    (7.26.3)

exists. Let {u_n} be a sequence of admissible functions with the property that

lim_{n→∞} J(u_n) = R₁.    (7.26.4)
We now show that {u_n′} actually converges strongly in L²(0, T). We have the identity

½[J(u_n) + J(u_m)] = J(½(u_n + u_m)) + ¼∫₀ᵀ (u_n′ − u_m′)² dt + ∫₀ᵀ [½(g(u_n) + g(u_m)) − g(½(u_n + u_m))] dt.    (7.26.5)

Since ½(u_n + u_m) is admissible,

J(½(u_n + u_m)) ≥ R₁,    (7.26.6)

and since, by virtue of convexity,

½(g(u_n) + g(u_m)) − g(½(u_n + u_m)) ≥ 0,    (7.26.8)

we have

½[J(u_n) + J(u_m)] ≥ R₁ + ¼∫₀ᵀ (u_n′ − u_m′)² dt.    (7.26.9)
Hence, as m and n → ∞, we have

∫₀ᵀ (u_n′ − u_m′)² dt → 0,    (7.26.10)

the desired strong convergence. It follows that u_n′ converges strongly in L²(0, T) to a function v̄ ∈ L²(0, T), and thus u_n(t) = c + ∫₀ᵗ u_n′ ds converges uniformly to the function u(t) = c + ∫₀ᵗ v̄ ds, for which the infimum R₁ is attained. Since g(u) is convex, J(u) is also convex, which means that u is the unique admissible minimizing function.
7.27. The Euler Equation and Haar's Device
It is interesting to show that the function u singled out in this fashion satisfies the Euler equation. Let v be a function such that v′ ∈ L²(0, T) with v(0) = 0. Then u + v is an admissible function and

J(u + v) = J(u) + ∫₀ᵀ [2u′v′ + 2g′(u)v] dt + ⋯.    (7.27.1)

(Replace g(u) by 2g(u) to simplify the arithmetic.) Since J(u) is the absolute minimum, we must have the variational equation

∫₀ᵀ [2u′v′ + 2g′(u)v] dt = 0    (7.27.2)

for all functions of the foregoing type. Integrating by parts, we have

2u′v|₀ᵀ + 2∫₀ᵀ v[−u″ + g′(u)] dt = 0,    (7.27.3)

whence we would like to conclude that

−u″ + g′(u) = 0,    u′(T) = 0.    (7.27.4)
The difficulty with this direct approach resides in the fact that we do not know as yet that u″ exists. We know only that u′ ∈ L²(0, T). To circumvent this difficulty, following Haar, let us integrate by parts in the reverse fashion. Namely,

∫₀ᵀ g′(u)v dt = [−v ∫ₜᵀ g′(u) ds]₀ᵀ + ∫₀ᵀ v′[∫ₜᵀ g′(u) ds] dt = ∫₀ᵀ v′[∫ₜᵀ g′(u) ds] dt,    (7.27.5)
since v(0) = 0. Hence

∫₀ᵀ v′[u′ + ∫ₜᵀ g′(u) ds] dt = 0    (7.27.6)

for all v′ ∈ L²(0, T). But

v′ = u′ + ∫ₜᵀ g′(u) ds    (7.27.7)

is itself a function of this class if u′ ∈ L²(0, T). Hence (7.27.6) yields

∫₀ᵀ [u′ + ∫ₜᵀ g′(u) ds]² dt = 0,    (7.27.8)

whence

u′ + ∫ₜᵀ g′(u) ds = 0    (7.27.9)

almost everywhere. Without loss of generality, redefine u′ so that the relation holds everywhere. From (7.27.9) it follows that u′ is the integral of a continuous function. Hence, we may differentiate, obtaining the desired Euler equation. Since every solution of the equation
u″ − g′(u) = 0,    u(0) = c,    u′(T) = 0,    (7.27.10)

yields an absolute minimum of J(u), (7.27.10) has only one solution, the strong limit obtained above.

This emphasis upon u′ rather than u as the basic function appears rather strange from the standpoint of the calculus of variations, but is completely understandable from the standpoint of dynamic programming, as we shall discuss in Volume II.

7.28. Discussion
We have thus illustrated one of the fundamental services of the calculus of variations, that of providing a firm basis for the existence and uniqueness of solutions of functional equations. Often complex equations cannot be approached in any other fashion.

Once existence and uniqueness have been established, the Rayleigh-Ritz method can be applied with confidence. Using an appropriate trial function, u = Σ_{k=1}^N a_k w_k, we have

J(Σ_{k=1}^N a_k w_k) = F(a₁, a₂, ..., a_N),    (7.28.1)

a function of a finite number of variables. The various methods discussed in Chapter 5, as well as a number of methods we have not mentioned, can be applied to the task of obtaining numerical values. There is no difficulty in applying the same ideas to the multidimensional versions of the variational problem. Naturally, the computational complexity increases as the dimension of the system increases. Furthermore, the relation is nonlinear: the time required to solve a system of linear algebraic equations, for example, is roughly proportional to the cube of the dimension; hence our constant reference to the size of N.

7.29. Successive Approximations
The relative facility with which we can handle quadratic variational problems suggests that there may be considerable advantage in treating more general questions by means of a sequence of carefully chosen approximations in terms of quadratic functionals. We shall pursue this in Volume II in connection with the theory of quasilinearization and duality.

7.30. Lagrange Multiplier
In a number of important processes we meet the problem of minimizing a functional

J(u) = ∫₀ᵀ g(u, u′) dt,    (7.30.1)

subject to an auxiliary condition such as

H(u) = ∫₀ᵀ h(u, u′) dt = k₁,    (7.30.2)

and an initial condition u(0) = c₁. These problems are considerably more difficult than those so far considered, both analytically and computationally. There is, however, one general method which can be successfully employed in a number of cases, the method of the Lagrange multiplier. In place of the original problem, consider the problem of minimizing the modified functional

J(u, λ) = ∫₀ᵀ g(u, u′) dt + λ∫₀ᵀ h(u, u′) dt,    (7.30.3)
where λ is the Lagrange multiplier. The idea of the method is to minimize J(u, λ) subject only to the restriction u(0) = c₁, obtaining a minimizing function u(t, λ) dependent upon the parameter λ. A suitable value of λ is then determined by use of the original integral constraint, leading to the equation

∫₀ᵀ h(u(t, λ), u′(t, λ)) dt = k₁.    (7.30.4)
The method has a simple geometric interpretation, as can be seen by considering the values of J and H as Cartesian coordinates as u assumes all admissible functional values. It is, however, perhaps most intuitive from the economic or engineering point of view. The constraint in (7.30.2) often arises as a restriction on the quantity of resources that can be employed to attain a certain objective. Alternatively, we may consider that we have an unlimited supply of these resources that can be used at a certain unit price. The parameter λ then represents this price. It is reasonable to suspect that this price can be adjusted so that the total consumption of resources is equal to the initial quantity available. In many important cases, this is true.

7.31. A Formal Solution Is a Valid Solution
As mentioned above, a detailed discussion requires a considerable amount of effort. Let us content ourselves then with the proofs of two simple, but highly important, results. To begin with, let us suppose that u = u(t, λ) furnishes the absolute minimum of J(u, λ) and that (7.30.4) is satisfied. Then we maintain that u minimizes J(u) in (7.30.1) subject to (7.30.2). We proceed by contradiction. Suppose that there exists an admissible function w, satisfying w(0) = c₁ and H(w) = k₁, with

J(w) < J(u).    (7.31.1)

Then

J(w, λ) = J(w) + λk₁ < J(u) + λk₁ = J(u, λ),

a contradiction to the assumption that u provided the absolute minimum of J(u, λ).

7.32. Raising the Price Diminishes the Demand
In certain cases we can obtain u(t, λ) explicitly. In general, however, it is necessary to solve the problem computationally for a set of values of λ, and then to use some type of search, extrapolation, or interpolation procedure to obtain more accurate results.

Intuitively, we would expect that as λ, the "price," increases, the functional H(u(t, λ)) of (7.30.2), the quantity of resources used, will decrease. Let us demonstrate this. Let u = u(t, a) correspond to λ = a and v = u(t, b) correspond to λ = b. Then, by virtue of the minimizing property,

J(u) + aH(u) ≤ J(v) + aH(v),    J(v) + bH(v) ≤ J(u) + bH(u).

Adding these inequalities, we obtain (a − b)(H(u) − H(v)) ≤ 0, so that for a > b,

H(u) ≤ H(v).    (7.32.4)

From the foregoing, we also obtain

J(v) ≤ J(u).    (7.32.5)
The difficulty in the use of the Lagrange multiplier resides in the fact that H(u(t, λ)) need not be a continuous function of λ unless we add further conditions, essentially of a convexity nature. It is the validation of these additional conditions that requires a more sophisticated analysis.

7.33. The Courant Parameter
We have just indicated one way of disposing of a constraint. Another method is to consider a functional modified in a different fashion, namely
J(u, k) = J(u) + k(H(u) − k₁)².    (7.33.1)

The parameter k is called the Courant parameter. We would suspect that, under appropriate conditions, the limit as k → ∞ of the function u(t, k) minimizing J(u, k) yields the solution to the original constrained variational problem. References to studies of this question will be found at the end of the chapter. There are often analytic and computational advantages to considering (7.33.1) rather than the original problem.
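A finite-dimensional sketch of the penalty idea (the matrix A, vector b, and level a₁ are my choices): minimize (x, Ax) + k((b, x) − a₁)² for increasing k, and compare with the exact constrained minimizer a₁A⁻¹b/(b, A⁻¹b):

```python
import numpy as np

A = np.array([[2.0, 0.5],
              [0.5, 1.0]])           # positive definite
b = np.array([1.0, -1.0])
a1 = 3.0                             # the constraint is (b, x) = a1

Ainv_b = np.linalg.solve(A, b)
x_star = a1 * Ainv_b / (b @ Ainv_b)  # exact constrained minimizer of (x, Ax)

def x_penalized(k):
    # minimizer of (x, Ax) + k ((b, x) - a1)^2: solve (A + k b b^T) x = k a1 b
    return np.linalg.solve(A + k * np.outer(b, b), k * a1 * b)

errs = [float(np.linalg.norm(x_penalized(k) - x_star)) for k in (1.0, 10.0, 1e4)]
```

As k grows, the penalized minimizer approaches the constrained one, which is the behavior conjectured for (7.33.1).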
7.34. Control Theory
A prolific source of variational problems is the modern theory of control processes. We encounter questions of the following nature. Given that the vectors x and y are related by the differential equation

x′ = g(x, y),    x(0) = c,    (7.34.1)

determine y so as to minimize the functional (7.34.2). A new feature, intrinsic to many significant control processes in economics and engineering, is the presence of local constraints of the form

r_i(x, y) ≤ 0,    i = 1, 2, ..., k.    (7.34.3)

The vector x is called the state variable and the vector y the control variable. The problem of obtaining analytic and computational results when constraints are present is very difficult. No universally effective methods exist, nor do we expect to find any. In the second volume we will discuss the application of quasilinearization and dynamic programming to these matters.
Miscellaneous Exercises

1. Consider the equation u″ − x^{−1/2}u^{3/2} = 0, u(0) = 1, u(∞) = 0, called the Emden-Fowler equation in astrophysics and the Fermi-Thomas equation in quantum mechanics. Show that this is the Euler equation of J(u) = ∫₀^∞ [u′² + (4/5)x^{−1/2}u^{5/2}] dx.
2. Show that for the minimizing function,
3. Use the trial function u(x) = (1 + αx^{1/2})^{−β}, and show by appropriate choice of α and β that min_u J(u) < 1.5883.
T. Ikebe and T. Kato, "Application of Variational Method to the Thomas-Fermi Equations," J. Phys. Soc. Japan, Vol. 12, 1957, pp. 201-203.
For more recent results on the Emden-Fowler equation and additional references, see

T. V. Atepanova, "Generalized Emden-Fowler Equation," Differencial'nye Uravnenija, Vol. 3, 1967, pp. 831-838.
4. Show that (y, A⁻¹y) = −min_x [(x, Ax) − 2(x, y)] if A > 0.

5. Use a trial function x_M = Σ_{k=1}^M c_k x_k, M ≤ N, where N is the dimension of x, to obtain estimates for A⁻¹. Write

(y, B_M y) = −min_{(c_k)} [(x_M, Ax_M) − 2(x_M, y)].

Show that B₁ ≤ B₂ ≤ ⋯ ≤ B_M ≤ A⁻¹, and consider the possibility of the use of extrapolation techniques to obtain approximations to the elements of A⁻¹.
6. Let J(x, y) = (x, Bx) + 2(x, Ay) + (y, By) − 2(a, x) − 2(b, y), where B > 0 and A is real, not necessarily symmetric. Show that the variational equations are Bx + Ay = a, A'x + By = b, when we minimize over both x and y. Under what conditions does an absolute minimum exist?

7. Let J(x, y) = (x, Cx) + 2(x, By) − (y, Cy) − 2(x, b) − 2(a, y), where B and C are real and symmetric, with C > 0. Show that min_x max_y J(x, y) = max_y min_x J(x, y).
8. Consider the problem of minimizing

    J_λ(u) = ∫₀^T (u'² + u²) dt + λ(∫₀^T u dt − k)²

over all admissible u(t) subject to u(0) = c, where λ ≥ 0. What is the limiting behavior of the minimum value of J_λ(u) and the minimizing function as λ → ∞?

9. Consider the functional J_λ(u) = ∫₀^T (λu'² + u²) dt, where λ ≥ 0 and u is subject to u(0) = c. Write g(λ) = min_u J_λ(u). Is g(λ) continuous for λ ≥ 0? Is the minimizing function continuous for λ ≥ 0?
10. Consider J(u) = ∫₀^T (u'² + u²) dt, where u is subject to u(0) = c₁, u(T) = c₂. Show that as T → 0, u ≈ c₁ + t(c₂ − c₁)/T, u' ≈ (c₂ − c₁)/T.

11. Consider the minimization of J(u) subject to u(0) = c, u(T) + bu'(T) = a. Obtain the analytic representation of min_u J(u) and the minimizing function, and study the behavior as a → ∞ and b → ∞. Could one predict this behavior without use of the explicit analytic representation?
12. Consider

    J_λ(u) = ∫₀^T (λu'² + u²) dt,    λ > 0,    u(0) = c.

What is the behavior of min_u J_λ(u) as T → ∞? Does

    lim_{λ→0} lim_{T→∞} min_u J_λ(u) = lim_{T→∞} lim_{λ→0} min_u J_λ(u)?
13. Determine the function which minimizes

where u(0) = u(T) = 0.
14. Show that the minimum value of

    K(u) = ∫₀^T (u'² + u² − 2h(t)u) dt,

subject to u(0) = u(T) = 0, is a quadratic functional

    Q(h) = −∫₀^T ∫₀^T k(t, t₁) h(t) h(t₁) dt dt₁.

Hence, show that k(t, t₁) is a positive definite kernel, that is,

    ∫₀^T ∫₀^T k(t, t₁) h(t) h(t₁) dt dt₁ > 0

for all nontrivial h(t). Hint: u = 0 is an admissible function.
15. Consider the problem of minimizing J(u) = ∫₀^T u'² dt, subject to ∫₀^T u² dt = k, k > 0, with u(0) = c. Consider the associated functional

    J_λ(u) = ∫₀^T (u'² + λu²) dt,

where u(0) is again equal to c, λ ≥ 0. Show explicitly that ∫₀^T u² dt decreases as λ increases.
16. Consider the problem of minimizing

    J(u, v) = ∫₀^T (u² + v²) dt

over v, where u' = au + v, u(0) = c. Show that v satisfies the equation v' + av = u, v(T) = 0, and that u and v are uniquely determined in this fashion.
17. Consider the minimization of

    J(u) = ∫₀^T (u'² + u²) dt,

subject to u(0) = c₁, u(T) = c₂. Is it true that

    min_{u(0)=c₁} J(u) = min_{c₂} [ min_{u(0)=c₁, u(T)=c₂} J(u) ] ?
18. Show that the solution to the problem of minimizing

    J(u, v) = ∫₀^T (u² + v²) dt

over all v, where u' = au + v, u(0) = c, is given by

    u = c [√(1+a²) cosh(√(1+a²)(T − t)) − a sinh(√(1+a²)(T − t))] / [√(1+a²) cosh(√(1+a²)T) − a sinh(√(1+a²)T)],

    v = −c sinh(√(1+a²)(T − t)) / [√(1+a²) cosh(√(1+a²)T) − a sinh(√(1+a²)T)].

Determine the limiting behavior as T → ∞ and as a → ∞.
19. Consider the problem of minimizing

    J(u) = ∫₀^T (u² + u'²) dt,    u(0) = c,    u(T) = c₁.

Show that the minimizing function is given by

    u = [c₁ sinh t + c sinh(T − t)] / sinh T,

and

    min_u J(u) = (c² + c₁²) coth T − 2cc₁/sinh T.
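The closed form asserted in Exercise 19 can be checked by direct quadrature; the grid size and the test values c = 1, c₁ = 2, T = 1 below are arbitrary choices.

```python
import numpy as np

# Evaluate J at the claimed minimizer and compare with the closed form
# (c^2 + c1^2)*coth(T) - 2*c*c1/sinh(T); also verify it beats a competitor.
c, c1, T = 1.0, 2.0, 1.0
t = np.linspace(0.0, T, 200001)
h = t[1] - t[0]

u  = (c1 * np.sinh(t) + c * np.sinh(T - t)) / np.sinh(T)
du = (c1 * np.cosh(t) - c * np.cosh(T - t)) / np.sinh(T)
f = u**2 + du**2
J = np.sum(f[1:] + f[:-1]) * h / 2.0          # trapezoidal rule
closed = (c**2 + c1**2) / np.tanh(T) - 2.0 * c * c1 / np.sinh(T)
assert abs(J - closed) < 1e-6

# the straight line through the same endpoints does strictly worse
ul = c + (c1 - c) * t / T
dul = np.full_like(t, (c1 - c) / T)
fl = ul**2 + dul**2
Jl = np.sum(fl[1:] + fl[:-1]) * h / 2.0
assert J < Jl
```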
20. Let

    R(c, T, λ) = min_u [∫₀^T (u'² + u²) dt + λ(u(T) − c₁)²],    u(0) = c.

Show that the minimizing function is given by

    u = [λc₁ sinh t + λc sinh(T − t) + c cosh(T − t)] / [λ sinh T + cosh T],

and

    R(c, T, λ) = [λ(c² + c₁²) cosh T + c² sinh T − 2λcc₁] / [λ sinh T + cosh T].

21. Consider the problem of maximizing ∫₀^T u² dt, subject to the condition

    ∫₀^T u'² dt = A₁,    u(0) = c.

Let f(c, T) denote this maximum value. Is there a choice of c which maximizes, or which minimizes?
22. Consider the problem of minimizing the function J_λ(u, v) over all u and v, with u(0) = c. What is the limiting behavior of the minimizing functions as λ → ∞?

For the preceding Exercises (8-22), see

R. Bellman, Introduction to the Mathematical Theory of Control Processes, Volume I, Academic Press, New York, 1967.
23. Show that u'' − u = h, with u ∈ L²(0, ∞) and h ∈ L²(0, ∞), implies that ∫₀^T u'² dt ≤ c₂ for some constant c₂ independent of T; hence, show that u' ∈ L²(0, ∞).

24. Establish corresponding results for the case where |u|, |u''| are uniformly bounded for t ≥ 0.
25. Using the identity

    d/dt [e^{−t}(u' + u)] = e^{−t}(u'' − u),

show that

    u' = −u − e^t ∫_t^∞ e^{−s} [u'' − u] ds.

Hence, show that

    max_{t≥0} |u'| ≤ 2 max_{t≥0} |u| + max_{t≥0} |u''|.
26. Replace u(t) by u(rt), where r > 0, and deduce that

    r max_{t≥0} |u'| ≤ 2 max_{t≥0} |u| + r² max_{t≥0} |u''|

for any r > 0. Hence, conclude that

    (max_{t≥0} |u'|)² ≤ 8 (max_{t≥0} |u|)(max_{t≥0} |u''|).

27. Obtain corresponding inequalities connecting ∫₀^∞ u² dt, ∫₀^∞ u'² dt, and ∫₀^∞ u''² dt. The constant 8 is not the best possible constant. For further results along these lines, and for additional references, see
G. H. Hardy, J. E. Littlewood, and G. Pólya, Inequalities, Cambridge University Press, New York, 1934.
E. F. Beckenbach and R. Bellman, Inequalities, Springer-Verlag, Berlin, 1961.
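The inequality of Exercise 26 can be spot-checked numerically; the test function u(t) = e^{−t} sin t below is an arbitrary choice.

```python
import numpy as np

# Check (max|u'|)^2 <= 8 * max|u| * max|u''| for u(t) = exp(-t)*sin(t), t >= 0.
# Derivatives computed analytically: u' = exp(-t)*(cos t - sin t),
# u'' = -2*exp(-t)*cos t.
t = np.linspace(0.0, 20.0, 200001)
u   = np.exp(-t) * np.sin(t)
du  = np.exp(-t) * (np.cos(t) - np.sin(t))
d2u = -2.0 * np.exp(-t) * np.cos(t)

M0, M1, M2 = np.abs(u).max(), np.abs(du).max(), np.abs(d2u).max()
assert M1**2 <= 8.0 * M0 * M2
```

Here M1 = 1 and M2 = 2 (both attained at t = 0), so the bound holds with plenty of room, consistent with the remark that 8 is not best possible.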
28. Show that if (u'' + a₁(t)u' + a₂(t)u) ∈ L²(0, ∞), u ∈ L²(0, ∞), and |a₁|, |a₂| ≤ m < ∞ for t ≥ 0, then u' ∈ L²(0, ∞).

29. Assume that all of the following integrals exist. Then

    ∫₀^∞ (u'' + u)² dt = ∫₀^∞ u''² dt + ∫₀^∞ u² dt − 2 ∫₀^∞ u'² dt,

provided that either u(0) or u'(0) is zero. In this case, for any positive r,

    r⁴ ∫₀^∞ u''² dt + ∫₀^∞ u² dt − 2r² ∫₀^∞ u'² dt ≥ 0,

and thus

    (∫₀^∞ u''² dt)(∫₀^∞ u² dt) ≥ (∫₀^∞ u'² dt)²,

if either u(0) = 0 or u'(0) = 0.
30. Determine the minimum of ∫₀^∞ (u² + b₁v²) dt over v, where au'' + bu' + cu = v, u(0) = c₁, u'(0) = c₂. See

O. L. R. Jacobs, "The Damping Ratio of an Optimal Control System," IEEE Trans. Automatic Control, Vol. AC-10, 1965, pp. 473-476.
31. Determine the minimum over v of ∫₀^∞ (u² − b²(u'')²) dt. See

G. C. Newton, L. A. Gould, and J. F. Kaiser, Analytic Design of Linear Feedback Controls, Wiley & Sons, New York, 1957, Chapter 2.
32. Determine the minimum over v of ∫₀^∞ (u² + u'²) dt. See

R. E. Kalman and J. E. Bertram, "Control System Analysis and Design via the Second Method of Lyapunov," Trans. ASME, Ser. D, Vol. 82, 1960, pp. 371-373.
C. W. Merriam III, Optimization Theory and the Design of Feedback Control Systems, McGraw-Hill, New York, 1964.

33. If one assumes in all these cases that the optimal control (v is the control variable) is linear, v = b₁u + b₂u', where b₁ and b₂ are constants, how does one determine these constants?
34. Consider the equation u'' − u − 2u³ = 0, u(0) = 0, u'(0) = c₁ > 0. Show that u is nonzero for t > 0 in two ways: (a) Assume that u(T) = 0 and consider ∫₀^T u(u'' − u − 2u³) dt. (b) Assume that T is the first value for which u(T) = 0, and examine the consequence of the fact that u has at least one relative maximum in [0, T].
35. Suppose that we employ a perturbation procedure to minimize

    J(u, ε) = ∫₀^T (u'² + u² + εu⁴) dt,

where ε > 0, and u(0) = c. The Euler equation is u'' − u − 2εu³ = 0, u(0) = c, u'(T) = 0. Set u = u₀ + εu₁ + ⋯, where u₀, u₁, ..., are independent of ε. Then

    u₀'' − u₀ = 0,    u₀(0) = c,    u₀'(T) = 0,
    u₁'' − u₁ − 2u₀³ = 0,    u₁(0) = 0,    u₁'(T) = 0,

and so on. Show that u₀, u₁, ..., are uniquely determined, and exhibit u₀, u₁ explicitly. Does the perturbation series converge for any negative value of ε? Does the minimization problem have a meaning for any negative value of ε?
36. Consider the problem of minimizing

    ∫₀^T (x'² + x² + 2εG(x)) dt,    G'(x) = g(x),

where ε is a small parameter and x(0) = c. Show that the associated Euler equation x'' − x − εg(x) = 0, x(0) = c, x'(T) = 0, has a solution by considering the Jacobian of x'(t) at t = T for ε = 0.

37. What is the behavior of the minimum of the corresponding functional as ε → 0, where x(0) = c and A is positive definite?
38. Consider the problem of minimizing J(x, y) with respect to y, where x' = Ax + y and x(0) = c₁, x(T) = c₂. To avoid the problem of determining when y can be chosen so as to meet the second condition, x(T) = c₂, consider the problem of minimizing

    J(x, y, λ) = J(x, y) + λ(x(T) − c₂, x(T) − c₂)

for λ > 0, where the only constraint is now x(0) = c₁. Study the asymptotic behavior of min J(x, y, λ) as λ → ∞, and thus obtain a sufficient condition that a "control" y exists such that x(T) = c₂. For a discussion of the significance of questions of this nature, see the book on control theory by R. Bellman, cited in Exercise 22.

39. Consider the following generalization of the Riesz-Fischer theorem. Let {φ_n} be an almost-orthogonal sequence, that is, ∫₀¹ φ_m φ_n dt = a_{mn}, m ≠ n, where

    |Σ_{m,n=0}^∞ a_{mn} u_m u_n| ≤ c₁ Σ_{n=0}^∞ u_n².

Show that, given any sequence {b_k} with Σ_{k=0}^∞ b_k² < ∞, there is a function f such that

    ∫₀¹ f φ_k dt = b_k,    k = 0, 1, 2, ....

See

R. Bellman, "Almost Orthogonal Series," Bull. Amer. Math. Soc., Vol. 50, 1944, pp. 517-519.
There are very interesting applications of the concept of almost orthogonality in probability theory and analytic number theory.
40. Let L(u) be a linear functional on the space of functions u ∈ L²(0, 1). By considering the minimization of the quadratic functional

    J(u) = ∫₀¹ u² dx + L(u)

over u ∈ L²(0, 1), establish the Riesz representation theorem, namely,

    L(u) = ∫₀¹ uv dx

for some v ∈ L²(0, 1). See
R. Bellman, "On the Riesz Representation Theorem," Boll. U.M.I., Vol. 20, 1965, p. 122.

41. Show that the minimum of

    J(u) = ∫₀^∞ [u⁶/(u² + u'²) + 2(u² + u'²)^{1/2}] dt,

subject to the conditions that the integral exists and ∫₀^∞ u² dt = k₁, is furnished by the solutions of (u² + u'²)^{1/3} = u⁴(u² + u'²)^{−2/3}, and the minimum value is 3k₁. Hint: Apply Young's inequality,

    ∫₀^∞ f³ dt + 2 ∫₀^∞ g^{3/2} dt ≥ 3 ∫₀^∞ fg dt,

for suitable f and g. See

R. Bellman, "On a Variational Problem of Miele," Astronautica Acta, Vol. 9, No. 3, 1963.
A. Miele (editor), Theory of Optimum Aerodynamic Shapes, Academic Press, New York, 1965.
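The Hint of Exercise 41 leaves the choice of f and g to the reader; one choice that works (our assumption, not stated in the text) is the following.

```latex
f=\frac{u^{2}}{(u^{2}+u'^{2})^{1/3}},\qquad g=(u^{2}+u'^{2})^{1/3},
\qquad\text{so that}\qquad
f^{3}=\frac{u^{6}}{u^{2}+u'^{2}},\quad
g^{3/2}=(u^{2}+u'^{2})^{1/2},\quad
fg=u^{2}.
```

Young's inequality fg ≤ (1/3)f³ + (2/3)g^{3/2}, integrated over (0, ∞), then gives J(u) ≥ 3∫₀^∞ u² dt = 3k₁, with equality exactly when f³ = g^{3/2}, which is the condition (u² + u'²)^{1/3} = u⁴(u² + u'²)^{−2/3} stated in the exercise.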
For other applications of the theory of inequalities to the solution of variational problems, see

D. C. Benson, "Solution of Classical Variational Problems Using an Elementary Inequality," Bull. Amer. Math. Soc., Vol. 67, 1967, p. 418.
42. If u(t) is continuous in [a, b], show that
43. Using the fact that

    max_{a≤t≤b} |g(u, u', t)| = lim_{n→∞} [∫_a^b g(u, u', t)^{2n} dt]^{1/2n},

obtain an Euler equation for the problem of determining

    min_u max_{a≤t≤b} |g(u, u', t)|.

See

G. Aronsson, "Minimization Problems for the Functional sup_x F(x, f(x), f'(x)) — I," Ark. Mat., Vol. 6, 1966, pp. 33-53.
G. Aronsson, "Minimization Problems for the Functional sup_x F(x, f(x), f'(x)) — II," Ark. Mat., Vol. 6, 1966, pp. 409-431.
D. S. Carter, "A Minimum-Maximum Problem for Differential Expressions," Canadian J. Math., Vol. 9, 1957, pp. 132-140.

44. A number of questions originally considered under the guise of stability theory can be treated in more precise fashion by the use of variational techniques. Consider, for example, the problem of determining the maximum over f of J(u, f) = ∫₀^T u² dt, where

    u'' + (a² + f(t))u = 0,    u(0) = c₁,    u'(0) = c₂,

and f satisfies an integral constraint. Similarly, consider the maximum of ∫₀^T u'² dt. What is the asymptotic behavior of the maximum as T → ∞? For related results, see

N. G. de Bruijn, "A Problem of Optimal Control," J. Math. Anal. Appl., Vol. 14, 1966, pp. 185-190.
G. Borg, "Über die Stabilität gewisser Klassen von linearen Differentialgleichungen," Ark. Mat. Astr. Fys., Vol. 31A, 1944, No. 1.
45. Consider the problem of minimizing J(u) = ∫₀^T (u'² + φ(t)u²) dt, with φ(t) continuous, u(0) = c, and the discrete analogue, the problem of minimizing J({u_n}), where u_0 = c, φ_n = φ(nΔ), NΔ = T. Is it true that

    lim_{Δ→0} Δ[min J({u_n})] = min_u J(u)?

46. Discuss the use of deferred passage to the limit techniques.

47. Consider the problem of minimizing J({v_n}) with respect to the v_n, where u is determined by u_0(0) = c, u_0' = v_0, u_n' = v_n for nΔ ≤ t ≤ (n+1)Δ, u_n(nΔ) = u_{n−1}(nΔ), n ≥ 1, NΔ = T. Is it true that lim_{Δ→0} Δ[min J({v_n})] = min_v J(u)?

48. Discuss monotonicity of convergence and the use of extrapolation techniques. Do we have convexity as a function of Δ?

In Volume II we will discuss systematic techniques for obtaining upper and lower bounds based on the use of duality.

Bibliography and Comments
§7.1. Some survey papers which will provide a large number of additional references are
B. A. Finlayson and L. E. Scriven, "On the Search for Variational Principles," Intern. J. Heat Mass Transfer, Vol. 10, 1967, pp. 799-821.
D. E. Newland, "On the Methods of Galerkin, Ritz, and Krylov-Bogoliubov in the Theory of Nonlinear Vibrations," Intern. J. Mech. Sci., Vol. 7, 1965, pp. 159-172.
See also
S. G. Mikhlin, The Problem of the Minimum of a Quadratic Functional, Holden-Day, San Francisco, 1965.
R. Bellman, Introduction to the Mathematical Theory of Control Processes, I: Linear Equations and Quadratic Criteria, Academic Press, New York, 1967.

§7.2. For the general theory of the calculus of variations, see

M. R. Hestenes, Calculus of Variations and Optimal Control Theory, Wiley & Sons, New York, 1966.
M. Morse, The Calculus of Variations in the Large, Amer. Math. Soc. Colloquium Publications, Vol. 18, 1960.
§7.3. An interesting paper unifying classical and contemporary variational principles in mathematical physics is
S. Altschuler, “Variational Principles for the Wave Function in Scattering Theory,” Phys. Rev., Vol. 109, 1958, pp. 1830-1836. For a rigorous presentation of some important formal techniques, plus additional references, see
T. Kato, "Note on Schwinger's Variational Method," Progress of Theoretical Physics, Vol. 6, 1951, pp. 295-305.
Euler derived the Euler equation by a passage to the limit from a discrete analogue of the original variational problem. This provides an important analytic and computational approach. For further use of the method of this section, see, for example,
R. Bellman, Introduction to the Mathematical Theory of Control Processes, II: Nonlinear Processes, Academic Press, New York, forthcoming.
F. V. Atkinson, Discrete and Continuous Boundary Problems, Academic Press, New York, 1964.

§7.6. See the two books by Bellman cited above for systematic use of this approach.

§7.17. One of the considerable advantages of the use of the theory of inequalities, as opposed to standard variational techniques, is that in this fashion one bypasses sticky questions of existence and uniqueness and relative versus absolute minima.

§7.19. In some fortunate cases, the expression ∫ fu dt is precisely the quantity desired. See, for example,
R. E. Marshak, "Variational Methods for Asymptotic Neutron Densities," Phys. Rev., Vol. 71, 1947, pp. 688-693.
B. Davison, "Remark on the Variational Method," Phys. Rev., Vol. 71, 1947, pp. 694-697.
See also
I. Stakgold, "Bounds for Linear Functionals," Bull. Amer. Math. Soc., Vol. 73, 1967, pp. 180-183.
J. B. Diaz, "Upper and Lower Bounds for Linear Functionals," Proc. Symp. Spectral Theory and Differential Equations, Stillwater, 1951, pp. 279-289.

§7.20. In the second volume we will present different approaches to the analytic and computational solution of two-point boundary value problems. See
R. Bellman and R. Kalaba, Quasilinearization and Nonlinear Boundary-value Problems, American Elsevier, New York, 1965.
E. S. Lee, Quasilinearization and Invariant Imbedding, Academic Press, New York, 1968.
M. M. Vainberg, Variational Methods for the Study of Nonlinear Operators, Holden-Day, San Francisco, 1964.

§7.25. We follow the argument given in
R. Bellman, I. Glicksberg, and O. Gross, Some Aspects of the Mathematical Theory of Control Processes, Publishers of Foreign Literature, Moscow, 1962.
See also the books by Bellman and Hestenes cited in §§7.1 and 7.2.

§7.29. See
M. Aoki, "On a Successive Approximation Technique in Solving Some Control System Optimization Problems," J. Math. Anal. Appl., Vol. 5, 1962, pp. 418-434.
The problem of obtaining the best quadratic fit to a given functional can be approached by mean-square approximation techniques. For an application of approximation techniques involving quadratic functionals in connection with the use of the Wiener-Feynman integral, see
R. P. Feynman, "Slow Electrons in a Polar Crystal," Phys. Rev., Vol. 97, 1955, pp. 660-665.
The method has not been systematically exploited.

§7.34. See the books by Bellman and Hestenes previously cited and
R. Bellman, Adaptive Control Processes: A Guided Tour, Princeton University Press, Princeton, New Jersey, 1961.
Let us finally mention some areas we have not touched upon. First of all, the topic of invariance of automatic control systems introduces some interesting and important variational problems. See
L. I. Rozonoer, "Variational Approach to the Problem of Invariance of Automatic Control Systems — I," Automation and Remote Control, Vol. 24, 1963, pp. 680-691.
Second, there is the topic of asymptotic control, which is to say the behavior of the solution of the variational problem as the interval increases indefinitely. See
R. Bellman and R. S. Bucy, "Asymptotic Control Theory," SIAM J. Control, Vol. 2, 1964, pp. 11-18.
Finally, there is the wide class of questions, which we have not discussed, associated with the problem of determining the variational problem for which a given function is the solution. See
R. Bellman and R. Kalaba, "An Inverse Problem in Dynamic Programming and Automatic Control," J. Math. Anal. Appl., Vol. 7, 1963, pp. 322-325.
This may be considered to be an extension of differential approximation. It leads to some interesting questions of stability of variational problems, an area as yet relatively uninvestigated.
Chapter 8 STURM-LIOUVILLE THEORY
8.1 Equations Involving Parameters
In the treatment of the ordinary and partial differential equations of engineering and mathematical physics, we constantly encounter equations containing both an unknown function and an unknown parameter. The parameter is to be determined by the condition that solutions of a certain specified type exist. Thus, for example, we may be given an equation of the form

    u'' + λφ(t)u = 0    (8.1.1)
and be asked to obtain the values of λ, if any, which permit the existence of a solution satisfying the two-point boundary condition

    u(0) = u(T) = 0.    (8.1.2)
Or, the equation may have the form

    u'' + (λ + φ(t))u = 0,    (8.1.3)

and it may be required to determine λ so that there are nontrivial solutions which belong to L²(0, ∞). Questions of this nature are part of Sturm-Liouville theory. Consideration of some rudimentary questions of this nature will give us a fine opportunity to present another application of the Rayleigh-Ritz method. We will not discuss this fundamental problem area in any depth. References to extensive treatments of this domain of basic mathematical and scientific importance will be found at the end of the chapter.

Exercise
1. Consider the partial differential equation k(x)u_t = u_xx, u(x, 0) = g(x), u(0, t) = u(1, t) = 0, t > 0. Show that a search for particular solutions of the equation of the form u(x, t) = e^{−λt}w(x) leads to the Sturm-Liouville equation w'' + λk(x)w = 0, w(0) = w(1) = 0.
8.2. Stationary Values
Let us now indicate a connection between the Sturm-Liouville equation of (8.1.1) and the calculus of variations. Consider the homogeneous functional

    K(u) = ∫₀^T u'² dt / ∫₀^T φ(t)u² dt.    (8.2.1)

Let us determine necessary conditions satisfied by functions satisfying the conditions u(0) = u(T) = 0, u' ∈ L²(0, T), which render K(u) stationary. As before, we suppose that |φ| ∈ L(0, T). We proceed at first in a formal fashion. Set u = ū + εv, where ū is a function in this class, ε is a scalar, and v is an arbitrary admissible function, that is, v' ∈ L²(0, T), v(0) = v(T) = 0. Starting with

    K(ū + εv),    (8.2.2)

take the derivative with respect to ε and set it equal to zero. We obtain in this fashion the variational equation

    (∫₀^T φu² dt)(∫₀^T u'v' dt) − (∫₀^T u'² dt)(∫₀^T φuv dt) = 0    (8.2.3)

for all admissible v. Introduce the parameter

    λ = ∫₀^T u'² dt / ∫₀^T φ(t)u² dt.    (8.2.4)

Then (8.2.3) reads

    ∫₀^T u'v' dt − λ ∫₀^T φ(t)uv dt = 0,    (8.2.5)

for all admissible v. Integration by parts yields the Euler equation

    u'' + λφ(t)u = 0    (8.2.6)

and provides an interpretation for the constant λ. Observe that the introduction of this parameter enables us to replace a nonlinear equation by a linear equation.
Exercise
1. What is a corresponding variational problem yielding u'' + (λ + φ(t))u = 0, u(0) = u(T) = 0?
8.3. Characteristic Values and Functions
It remains to establish, as before, precise connections between the differential equation and the original variational problem. If we can, we are in a position once again to apply the Rayleigh-Ritz method. To this end, consider the initial value problem

    u'' + λφ(t)u = 0,    u(0) = 0,    u'(0) = 1.    (8.3.1)

Since we want u to be nontrivial and since the equation is homogeneous, there is no loss of generality in using the initial condition u'(0) = 1. This is a normalization. Write

    u = Σ_{n=0}^∞ λⁿ u_n(t).    (8.3.2)
We wish to show that u is an entire function of λ for each t in the interval [0, T]. Substituting in (8.3.1), we have a sequence of equations determining the coefficients:

    u_0'' = 0,    u_0(0) = 0,    u_0'(0) = 1,
    u_n'' + φ(t)u_{n−1} = 0,    u_n(0) = 0,    u_n'(0) = 0,    n ≥ 1.    (8.3.3)

We see that

    u_n(t) = −∫₀^t (t − s) φ(s) u_{n−1}(s) ds,    n ≥ 1.    (8.3.4)

Let us suppose that |φ(t)| ≤ b for 0 ≤ t ≤ T. Then

    |u_n(t)| ≤ bⁿ t^{2n+1}/(2n+1)!.    (8.3.5)

This is equivalent to saying that the power series in λ for u is majorized by the power series in λ for the solution of

    v'' − λbv = 0,    v(0) = 0,    v'(0) = 1.    (8.3.6)
Since v is given explicitly by

    v = sinh(√(λb) t)/√(λb),    (8.3.7)

it follows that u(t) = u(t, λ) is an entire function of λ for 0 ≤ t ≤ T. Hence, the values of λ which provide nontrivial solutions of (8.3.1), satisfying the further condition u(T) = 0, are the zeros of an entire function u(T, λ). Thus, we know that the admissible values of λ are denumerable and that the only cluster point is ∞. Call these admissible values of λ characteristic values and the associated solutions of (8.3.1) characteristic functions.

Exercises
1. Obtain corresponding results for u'' + (λ + φ(t))u = 0, u(0) = u(T) = 0.

2. Is the condition ∫₀^T |φ| dt < ∞ sufficient to establish the foregoing result?
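For the special case φ(t) ≡ 1, T = 1, the entire function described above is u(1, λ) = sin(√λ)/√λ, so the characteristic values are exactly λ_n = n²π². The sketch below (step counts and bracketing interval are arbitrary choices) locates the first zero of λ ↦ u(1, λ) by shooting with RK4 and bisection.

```python
import math

def u_at_1(lam, n=2000):
    """RK4 for u'' = -lam*u, u(0)=0, u'(0)=1; returns u(1, lam)."""
    h = 1.0 / n
    u, v = 0.0, 1.0                      # the normalization u'(0) = 1
    for _ in range(n):
        k1u, k1v = v, -lam * u
        k2u, k2v = v + h/2*k1v, -lam * (u + h/2*k1u)
        k3u, k3v = v + h/2*k2v, -lam * (u + h/2*k2u)
        k4u, k4v = v + h*k3v,   -lam * (u + h*k3u)
        u += h/6*(k1u + 2*k2u + 2*k3u + k4u)
        v += h/6*(k1v + 2*k2v + 2*k3v + k4v)
    return u

lo, hi = 5.0, 15.0                       # u(1, .) changes sign across pi^2
assert u_at_1(lo) > 0.0 > u_at_1(hi)
for _ in range(60):                      # bisection on the shooting function
    mid = 0.5 * (lo + hi)
    if u_at_1(mid) > 0.0:
        lo = mid
    else:
        hi = mid
lam1 = 0.5 * (lo + hi)
assert abs(lam1 - math.pi**2) < 1e-4     # exact first characteristic value
```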
8.4. Properties of Characteristic Values and Functions
Let us now use the Sturm-Liouville equation in a simple fashion to establish the following basic results. If φ(t) > 0, then:

(a) The values of λ are real and positive. Hence, we can write λ₁ < λ₂ < ⋯.
(b) The associated solutions u₁(t), u₂(t), ..., may be chosen real, and they are orthogonal with the weight function φ(t), that is, ∫₀^T φ(t)u_i u_j dt = 0, i ≠ j.    (8.4.1)

Let us start with the orthogonality, from which the other properties follow. From

    u_i'' + λ_i φ(t)u_i = 0,
    u_j'' + λ_j φ(t)u_j = 0,    (8.4.2)

we have

    u_i u_j'' − u_j u_i'' + (λ_j − λ_i) φ(t)u_i u_j = 0.    (8.4.3)
Integrating between 0 and T, we obtain

    [u_i u_j' − u_j u_i']₀^T + (λ_j − λ_i) ∫₀^T φ(t)u_i u_j dt = 0.    (8.4.4)

The boundary conditions satisfied by u_i and u_j reduce this to

    (λ_j − λ_i) ∫₀^T φ(t)u_i u_j dt = 0.    (8.4.5)

If λ_i ≠ λ_j, we have

    ∫₀^T φ(t)u_i u_j dt = 0,    (8.4.6)

the desired orthogonality. This holds with no restriction on φ(t). Henceforth, let us assume that φ(t) > 0. This assumption will enable us to use the orthogonality to good effect. Before proceeding further, let us show that u(T, λ) = 0 cannot have a double root. If u(T, λ) = 0, u_λ(T, λ) = 0, then v(t) = u(t, λ) and w(t) = u_λ(t, λ) are simultaneously solutions of
the desired orthogonality. This holds with no restriction on p)(t). Henceforth, let us assume that cp(t) > 0. This assumption will enable us to use the orthogonality to good effect. Before proceeding further, let us show that u( T , A) = 0 cannot have a double root. If u(T, A) = 0, u,(T, A) = 0, then v(t) = u(t, A) and w ( t ) = u,(t, A) are simultaneously solutions of V''
w"
++(t) v + Xy(t) w
=
0,
v(0) v,
= -p)(t)
=
v ( T ) = 0,
w(0)
=
w(T)= 0.
(8.4.7)
€Ience wv" - vw" = p)(t) 02.
Integrating, 0 = [wv'- .w'],'=
s:
y ( t )212 dt,
(8.4.8)
(8.4.9)
a contradiction if φ(t) > 0, v ≢ 0. Hence, λ_i ≠ λ_j for i ≠ j, and (8.4.6) holds for i ≠ j. From this we can easily show that the λ_i are real and thus that the u_i can be chosen real. For if λ_i, u_i and λ̄_i, ū_i are respectively characteristic values and functions with λ_i ≠ λ̄_i (here λ̄_i, ū_i denote the complex conjugates of λ_i and u_i respectively), we have, from (8.4.6),

    ∫₀^T φ(t) u_i ū_i dt = 0,    (8.4.10)

a contradiction if φ(t) > 0, as we have assumed.
Let us note for further reference that (8.4.6) plus integration by parts yields the additional orthogonality property

    ∫₀^T u_i' u_j' dt = 0,    i ≠ j.    (8.4.11)

So far, we have used the normalization u_i'(0) = 1, i = 1, 2, .... Let us replace this by the more convenient

    ∫₀^T φ(t)u_i² dt = 1,    i = 1, 2, ....    (8.4.12)

Then

    ∫₀^T φ(t)u_i u_j dt = δ_{ij},    ∫₀^T u_i' u_j' dt = λ_i δ_{ij}.    (8.4.13)

The second result follows by means of integration by parts.

Exercises
1. Assume that the function φ is positive for 0 ≤ t ≤ 1 and possesses a continuous second derivative. Show that the Sturm-Liouville problem u'' + λφu = 0, u(0) = u(1) = 0, can be transformed into v'' + (λ + g)v = 0, v(0) = v(1) = 0, by means of a sequence of changes of dependent and independent variable. Hint: Begin with the change of independent variable s = ∫₀^t φ^{1/2} dt₁.
2. Show that v satisfies the integral equation

    v(t) = sin(μt)/μ − (1/μ) ∫₀^t sin(μ(t − t₁)) g(t₁) v(t₁) dt₁,

where μ² = λ. Hence, μ is determined by the equation

    sin μ / μ = (1/μ) ∫₀¹ sin(μ(1 − t₁)) g(t₁) v(t₁) dt₁,

where, of course, v = v(t, λ).
3. Consider the characteristic value problem

    u''(t) + (λ − q(t))u(t) = 0,
    u(a) cos α − u'(a) sin α = 0,
    u(b) cos β − u'(b) sin β = 0,

where q(t) is an absolutely continuous function in [a, b]. Let 0 ≤ α < π, 0 ≤ β < π. Then, as n → ∞:

(a) If α = 0, β = 0,

    λ_n = (n + 1)²π²/(b − a)² + [1/(b − a)] ∫_a^b q(t) dt + O(n⁻¹).

(b) If α = 0, 0 < β < π,

    λ_n = (n + 1/2)²π²/(b − a)² + [1/(b − a)] [∫_a^b q(t) dt − 2 cot β] + O(n⁻¹).

(c) If 0 < α < π, β = 0,

    λ_n = (n + 1/2)²π²/(b − a)² + [1/(b − a)] [∫_a^b q(t) dt + 2 cot α] + O(n⁻¹).

(d) If 0 < α < π, 0 < β < π,

    λ_n = n²π²/(b − a)² + [1/(b − a)] [∫_a^b q(t) dt + 2 cot α − 2 cot β] + O(n⁻¹).

See

J. H. E. Cohn, "Large Eigenvalues of a Sturm-Liouville Problem," Proc. Cambridge Phil. Soc., Vol. 63, 1967, pp. 473-475.
E. L. Ince, Ordinary Differential Equations, Dover Publications, New York, 1944, pp. 270ff.
4. Consider the equation u'' + λφ(t)u = 0, u(0) = 0, u'(0) = 1, with the solution u = t + Σ_{n≥1} u_n(t)λⁿ. The equation determining the characteristic values is f(λ) = u(1, λ) = 1 + Σ_{n≥1} u_n(1)λⁿ = 0. From the estimates of Sec. 8.3, show that f(λ) possesses a Weierstrass factorization of the form f(λ) = ∏_{n=1}^∞ (1 − λ/λ_n).
5. Write b_r = Σ_{n=1}^∞ λ_n^{−r}, r = 1, 2, .... Then

    log f(λ) = −Σ_{r=1}^∞ b_r λʳ/r.

6. If log f(λ) = Σ_{k=1}^∞ c_k λᵏ, then

    n u_n(1) = n c_n + Σ_{k=1}^{n−1} k c_k u_{n−k}(1).
7. Show that b_k/b_{k+1} > λ₁ > b_k^{−1/k}, k = 1, 2, ..., and that {b_k/b_{k+1}} is monotone decreasing while {b_k^{−1/k}} is monotone increasing. Show that

    lim_{k→∞} b_k/b_{k+1} = lim_{k→∞} b_k^{−1/k} = λ₁.

8. For large k, which estimate furnishes a better approximation to λ₁, b_k/b_{k+1} or b_k^{−1/k}?
9. Applying these results to the equation u'' + λ(1 + t)u = 0, u(0) = u(1) = 0 (which can be treated explicitly by means of Bessel functions), show that we obtain the values in Tables I and II.

TABLE I

    n     (−1)ⁿ(2n+1)! u_n(1)
    0      1.0000000
    1      1.5000000
    2      2.2380952
    3      3.3333333
    4      4.960730
    5      7.3782054
    6     10.971506
    7     16.311836
    8     24.248509
    9     36.043599
    10    53.572540

TABLE II

    k     b_k                 b_k/b_{k+1}    b_k^{−1/k}
    1     2.5000000 × 10⁻¹    9.921260       4.000000
    2     2.5198413 × 10⁻²    6.958904       6.299606
    3     3.6210317 × 10⁻³    6.632490       6.512122
    4     5.4595362 × 10⁻⁴    6.567805       6.542008
    5     8.3125740 × 10⁻⁵    6.553064       6.547159
    6     1.2685019 × 10⁻⁵    6.549537       6.548143
    7     1.9367810 × 10⁻⁶    6.548676       6.548342
    8     2.9575153 × 10⁻⁷    6.548465       6.548384
    9     4.5163492 × 10⁻⁸    6.548412       6.548393
    10    6.8968612 × 10⁻⁹                   6.548395
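The entries of Tables I and II can be reproduced by direct computation. The sketch below implements the recursion of Sec. 8.3 for φ(t) = 1 + t by repeated cumulative quadrature, then recovers the b_r from the power series of log f(λ); the grid size and truncation order are arbitrary choices.

```python
import numpy as np

N, K = 4000, 11
t = np.linspace(0.0, 1.0, N + 1)
h = t[1] - t[0]

def cumtrapz(y):
    """Cumulative trapezoidal integral of y from 0 to t."""
    out = np.zeros_like(y)
    out[1:] = np.cumsum((y[1:] + y[:-1]) * h / 2.0)
    return out

# u_0 = t;  u_n'' = -(1+t)*u_{n-1},  u_n(0) = u_n'(0) = 0
c = [1.0]                          # c_n = u_n(1); u_0(1) = 1
u = t.copy()
for _ in range(1, K):
    u = -cumtrapz(cumtrapz((1.0 + t) * u))
    c.append(u[-1])

# log f(lambda) = sum d_n lambda^n from f*(log f)' = f';  b_r = -r*d_r
d = [0.0] * K
for n in range(1, K):
    d[n] = c[n] - sum(k * d[k] * c[n - k] for k in range(1, n)) / n
b = [-r * d[r] for r in range(K)]

assert abs(-6.0 * c[1] - 1.5) < 1e-5        # Table I, row n = 1
assert abs(b[1] - 0.25) < 1e-5              # Table II, b_1 = 1/4
assert abs(b[9] / b[10] - 6.548412) < 1e-3  # ratio estimate of lambda_1
```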
10. Obtain an improved estimate for λ₁ by use of the Newton-Raphson technique.

11. Obtain an estimate for λ₁λ₂ using the determinant

    b_k^{(2)} = | b_k      b_{k+1} |
                | b_{k+1}  b_{k+2} |.

Is the sequence {(b_k^{(2)})^{1/k}} monotone? Show that b_k^{(2)}/b_{k+1}^{(2)} > b_{k+1}^{(2)}/b_{k+2}^{(2)}. See for the foregoing

R. Bellman, "On the Determination of Characteristic Values for a Class of Sturm-Liouville Problems," Illinois J. Math., Vol. 2, 1958, pp. 577-585.
P. F. Filchakov, "An Effective Method for Determining the Eigenvalues for Ordinary Differential Equations," AN UkrRSR, Serija A, Fizyko-tekhnichni ta matematychni nauky, No. 10, 1967, pp. 883-890.
C. Shoemaker, Computation of Characteristic Values of Sturm-Liouville Problems with a Digital Computer, University of Southern California, USCEE-267, 1968.

The point of the foregoing exercises is that the higher characteristic values can usually be determined quite accurately from their asymptotic expansions and that it is the first few characteristic values which are difficult to determine accurately. Consequently, it is worth developing special techniques specifically for the determination of the small characteristic values. We shall subsequently discuss methods for determining the first N characteristic values at one time.

8.5. Generalized Fourier Expansion
The orthogonality relation (8.4.13) of Sec. 8.4 makes it tempting to expand a function u(t) in an orthogonal series using the u_k(t). If we set

    u(t) = Σ_{k=1}^∞ a_k u_k(t),    (8.5.1)

the coefficients will be determined by means of the relations

    a_k = ∫₀^T φ(t) u(t) u_k(t) dt.    (8.5.2)
Formally, we have the Parseval relation

    ∫₀^T φu² dt = Σ_{k=1}^∞ a_k²    (8.5.3)

and, by virtue of the second relation in (8.4.13), the additional result

    ∫₀^T u'² dt = Σ_{k=1}^∞ λ_k a_k².    (8.5.4)

Indeed, the principal purpose of the foregoing sections has been to obtain these relations, (8.5.3) and (8.5.4), which enable us to discuss the minimization of

    K(u) = ∫₀^T u'² dt / ∫₀^T φu² dt    (8.5.5)

in a simple, rigorous fashion.
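For the classical case φ(t) ≡ 1 on [0, 1], where u_k = √2 sin kπt and λ_k = k²π², the relations (8.5.3) and (8.5.4) can be verified numerically; the test function u = t(1 − t) and the truncation at 60 terms below are arbitrary choices.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 100001)
h = t[1] - t[0]

def trap(y):
    """Trapezoidal rule on the fixed grid."""
    return np.sum(y[1:] + y[:-1]) * h / 2.0

u = t * (1.0 - t)
# generalized Fourier coefficients a_k = integral of u * u_k, u_k = sqrt(2) sin(k pi t)
a = np.array([trap(u * np.sqrt(2.0) * np.sin(k * np.pi * t))
              for k in range(1, 61)])
lam = np.array([(k * np.pi)**2 for k in range(1, 61)])

du = 1.0 - 2.0 * t
assert abs(trap(u**2) - np.sum(a**2)) < 1e-6         # (8.5.3); both equal 1/30
assert abs(trap(du**2) - np.sum(lam * a**2)) < 1e-3  # (8.5.4); both equal 1/3
```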
8.6. Discussion
For the case of second-order linear differential equations of the type discussed above, the behavior of the characteristic functions and characteristic values can be analyzed in detail, as indicated in the exercises at the end of Sec. 8.4. This enables us to use equiconvergence theorems of the type due to Haar to conclude that essentially what is valid for ordinary Fourier series is valid for Sturm-Liouville series. This approach can be pursued for higher-dimensional ordinary differential equations. For partial differential equations, the situation is quite different. It is necessary to use the Fredholm theory of integral equations to develop a theory of characteristic values and functions and, indeed, the Fredholm theory was created specifically for this purpose. The basic idea is to convert an equation such as

    u'' + λφ(t)u = 0,    u(0) = u(T) = 0,    (8.6.1)

into the homogeneous integral equation

    u(t) = λ ∫₀^T K(t, s) φ(s) u(s) ds,    (8.6.2)

by means of the appropriate Green's function.
The theory of integral equations, à la F. Riesz, is part of the general theory of linear operators in Hilbert space. We have not mentioned the theory, nor used the notation, since we have no particular need of it in this or the second volume. Rather than provide any ad hoc proof of the Parseval relations of (8.5.3) and (8.5.4), we shall assume their validity under the condition that φ(t) is a positive continuous function and proceed from there.

8.7. Rigorous Formulation of Variational Problem
Using the results of (8.5.3) and (8.5.4), we can write

    K(u) = Σ_{k=1}^∞ λ_k a_k² / Σ_{k=1}^∞ a_k²    (8.7.1)

for u ∈ L²(0, T), u(0) = u(T) = 0, u = Σ_{k=1}^∞ a_k u_k(t). From this representation of K(u), it is immediate that

    min_u K(u) = λ₁,    (8.7.2)

where u runs over the admissible functions, and further that the minimum is attained uniquely for u proportional to u₁. Furthermore, this representation in (8.7.1) clearly shows the significance of the higher characteristic values as relative minima. Thus,

    λ₁ = min_{R₁} K(u),    (8.7.3)

where R₁ is the region of function space determined by

    u(0) = u(T) = 0,    u' ∈ L²(0, T);    (8.7.4)

similarly,

    λ₂ = min_{R₂} K(u),    (8.7.5)

over the region R₂ determined by u' ∈ L²(0, T), and

    u(0) = u(T) = 0,    ∫₀^T φ(t) u u₁ dt = 0.    (8.7.6)
Once (8.7.1) has been established, we can invoke various min-max representations of the higher characteristic values which are far more convenient for many analytic purposes.

8.8. Rayleigh-Ritz Method
As noted above, in many important scientific investigations only the characteristic values are of interest. They correspond to natural frequencies, energy levels, and so forth. Hence, it is advantageous to process a variational representation which provides a quick way of obtaining approximate values in terms of relatively simple algebraic operations. Given the expression X
min
- u(O)=u(T)=O
S,'
ut2dt J,' ~ ( tu2) dt '
(8.8.1)
we can use the Rayleigh-Ritz method in the expected fashion to obtain upper bounds for A, and, with enough effort, accurate estimates. Set N
==
(8.8.2)
>
k=l
where vk(0)= vk( T ) = 0, with the v k conveniently chosen functions and the ak free scalar parameters. We can obtain an upper bound for A, by minimizing the expression (8.8.3)
with respect to the ak , the components of the vector a. Here A and B are symmetric matrices, A
=
(l'
z~;w~' d t )
0
, (8.8.4)
This is an algebraic problem which, as we know, leads to the problem of determining the characteristic roots of a determinantal equation of order N , namely I A - AB I = 0. Let the characteristic roots obtained -.- A',". in this fashion be hiN) A$'" Not only does X i N ) yield information concerning A,, but the other characteristic roots, A i N ) ,..., A("), yield information concerning A, , A, ,....
<
< <
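The procedure just described is a small generalized eigenvalue computation. The sketch below uses the illustrative choices φ(t) = 1, T = 1, and trial functions v_k(t) = t^k(1 − t) — none of these are from the text — to form the matrices of (8.8.4) and compute the roots of |A − λB| = 0; the exact lowest characteristic value here is π².

```python
# Rayleigh-Ritz sketch for (8.8.1)-(8.8.4) with phi(t) = 1 on [0, 1].
# Trial functions v_k(t) = t^k (1 - t) (an illustrative choice) satisfy
# v_k(0) = v_k(1) = 0; the exact answer is lambda_1 = pi^2.
import numpy as np

T, N, M = 1.0, 4, 20000
h = T / M
t = (np.arange(M) + 0.5) * h            # midpoint quadrature nodes
phi = np.ones_like(t)

def v(k):  return t**k * (T - t)
def vp(k): return k * t**(k - 1) * (T - t) - t**k   # derivative of v_k

A = np.array([[np.sum(vp(j) * vp(k)) * h for k in range(1, N + 1)]
              for j in range(1, N + 1)])            # A_jk = int v_j' v_k' dt
B = np.array([[np.sum(phi * v(j) * v(k)) * h for k in range(1, N + 1)]
              for j in range(1, N + 1)])            # B_jk = int phi v_j v_k dt

# roots of |A - lambda B| = 0; the smallest is the upper bound lambda_1^(N)
lam = np.sort(np.linalg.eigvals(np.linalg.solve(B, A)).real)
print(lam[0], np.pi**2)   # lambda_1^(N) is an upper bound for pi^2
```

Increasing N enlarges the trial class, so the computed λ₁^(N) can only decrease, illustrating Exercise 2 below.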
Exercises

1. Show that λ₁ ≤ λ₁^(N), ..., λ_N ≤ λ_N^(N).

2. Show that {λ₁^(N)} is decreasing in N, and discuss the use of extrapolation techniques to estimate λ₁, given {λ₁^(1), λ₁^(2), ..., λ₁^(N)}.

8.9. Intermediate Problem of Weinstein
The variational characterization of the characteristic roots of

u'' + λφ(t)u = 0,   u(0) = u(T) = 0,   (8.9.1)

provides a ready method for obtaining sequences of values {λ_i^(N)} which converge from above to λ_i, the ith characteristic root. The problem of obtaining lower bounds is much more subtle and complex. A powerful systematic approach, particularly significant in connection with partial differential equations, was created by Weinstein and extended by Aronszajn and others. References to this work will be found at the end of the chapter.

8.10. Transplantation
Consider the equation

u'' + λφ(t, ε)u = 0,   u(0) = u(T) = 0,   (8.10.1)

where φ(t, ε) depends on a parameter ε; for example,

φ(t, ε) = φ(t) + εψ(t).   (8.10.2)
Frequently, we want estimates of the variation of λ₁ as ε changes, or perhaps as T, the interval length, varies. Consider the first question. Set

K(u, ε) = ∫₀ᵀ u'² dt / ∫₀ᵀ φ(t, ε)u² dt.   (8.10.3)

In order to obtain estimates, corresponding say to ε = 0, ε = ε₁, we use the solution for ε = 0 as an estimate in K(u, ε₁) and the solution for ε = ε₁ as an estimate in K(u, 0). This is called transplantation. It often yields estimates precise enough to use for many purposes. Thus, if u^(0) and u^(1) denote the minimizing functions for ε = 0 and ε = ε₁ respectively,

λ₁(ε₁) ≤ K(u^(0), ε₁),   λ₁(0) ≤ K(u^(1), 0).   (8.10.4)
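The transplantation estimate is easy to try numerically. In the sketch below the choices φ(t, ε) = 1 + εt on [0, π], ε₁ = 0.3, and the grid size are illustrative assumptions, not from the text; the minimizer for ε = 0, namely sin t, is transplanted into the quotient for ε = ε₁.

```python
# Transplantation sketch for (8.10.3)-(8.10.4): phi(t, eps) = 1 + eps * t on
# [0, pi] (an illustrative choice).  The minimizer for eps = 0 is sin t; using
# it as a trial function in K(u, eps1) gives an upper bound on lambda_1(eps1).
import numpy as np

T, n, eps1 = np.pi, 400, 0.3
h = T / (n + 1)
t = np.arange(1, n + 1) * h                    # interior grid points

def lambda1(phi):
    # smallest characteristic value of u'' + lambda phi u = 0, u(0) = u(T) = 0,
    # via the second-difference matrix
    D = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    return np.min(np.linalg.eigvals(D / phi[:, None]).real)

def K(u, phi):
    # discrete form of the quotient (8.10.3)
    du = np.diff(np.concatenate(([0.0], u, [0.0]))) / h
    return np.sum(du**2) / np.sum(phi * u**2)

u0 = np.sin(t)                                 # exact minimizer for eps = 0
phi1 = 1.0 + eps1 * t
bound = K(u0, phi1)                            # transplanted upper bound (8.10.4)
exact = lambda1(phi1)
print(bound, exact)                            # bound >= exact, with a small gap
```

The gap between the bound and the true value is of second order in ε₁, which is why transplantation is often adequate in practice.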
Exercises

1. Consider the case where φ(t, ε) = φ(t) + εψ(t), and obtain an estimate for |λ₁(ε) − λ₁(0)|.

2. Show that λ₁(T) is monotone decreasing as T increases.

3. Obtain in this fashion some estimates of the variation of λ₁(T) as T, the interval length, changes.

4. Obtain an estimate for

min_u ∫₀ᵀ (u'² + u² + εu⁴) dt,

where ε ≥ 0 and u is subject to the conditions u(0) = u(T) = 0.
8.11. Positive Definiteness of Quadratic Functionals

With the aid of Sturm-Liouville theory we are in a position to determine the precise interval [0, T] over which the absolute minimum of the quadratic functional

J(u) = ∫₀ᵀ (u'² − φ(t)u²) dt,   (8.11.1)

subject to u(0) = c, u(T) = d, exists and is determined by the solution of

u'' + φ(t)u = 0,   u(0) = c,   u(T) = d.   (8.11.2)

We suppose that φ(t) > 0. Let {λ_n}, {u_n} be respectively the characteristic values and functions associated with the Sturm-Liouville equation

u'' + λφ(t)u = 0,   u(0) = u(T) = 0.   (8.11.3)

Returning to (8.11.1), let us write u = g(t) + v, where g(t) is the linear function of t satisfying g(0) = c, g(T) = d. Then
J(u) = J(g + v) = J(g) + 2 ∫₀ᵀ (g'v' − φ(t)gv) dt + J(v).   (8.11.4)

The positive definite nature of J(u) depends then upon the positive definite nature of J(v). Writing

v = Σ_{k=1}^∞ b_k u_k(t),   (8.11.5)

we have

J(v) = Σ_{k=1}^∞ (λ_k − 1) b_k².   (8.11.6)

Hence, if λ₁ > 1, the functional J(u) is positive definite. Since λ₁(T) decreases monotonically as T increases, it is possible that there is a critical value of T for which λ₁(T) = 1. Call this value T_c. For T > T_c, J(u) does not possess an absolute minimum.

Exercises
1. If J(u) = ∫₀ᵀ (u'² − u²) dt, u(0) = c, u(T) = 0, what is the critical value?

2. If J(u) = ∫₀ᵀ (u'² − u²) dt, u(0) = c, u'(T) = 0, what is the critical value?
3. What happens if φ(t) changes sign over [0, T]?

8.12. Finite Difference Approximations
We can reduce the problem of obtaining estimates for the λ_i to an algebraic problem in another fashion. Use the simple quadrature approximations

∫₀ᵀ φ(t)u² dt ≅ Δ Σ_{k=0}^{N−1} φ(kΔ) u²(kΔ),
∫₀ᵀ u'² dt ≅ Δ⁻¹ Σ_{k=1}^N [u(kΔ) − u((k − 1)Δ)]²,   (8.12.1)

where NΔ = T. Let

u(kΔ) = u_k,   φ(kΔ) = φ_k.   (8.12.2)

Then the new problem is an algebraic one of minimizing the quotient

Σ_{k=1}^N (u_k − u_{k−1})² / (Δ² Σ_{k=0}^{N−1} φ_k u_k²).   (8.12.3)

Here u₀ = u_N = 0. This again leads to the calculation of the characteristic values of a matrix.
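For the simplest data this algebraic problem can be carried out directly. In the sketch below the choices φ(t) = 1 on [0, 1] and the mesh are illustrative assumptions, not from the text; the exact answer is λ₁ = π².

```python
# Finite-difference sketch of (8.12.1)-(8.12.3): minimizing the discrete
# quotient is the characteristic-value problem for the second-difference
# matrix.  Illustrative data: phi(t) = 1 on [0, 1], exact lambda_1 = pi^2.
import numpy as np

T, N = 1.0, 200
d = T / N                                   # the mesh width Delta
t = np.arange(1, N) * d                     # interior points; u_0 = u_N = 0
phi = np.ones_like(t)

# the quotient (8.12.3) is (u, Du) / (u, diag(phi) u), with D the tridiagonal
# second-difference matrix divided by Delta^2
D = (2.0 * np.eye(N - 1) - np.eye(N - 1, k=1) - np.eye(N - 1, k=-1)) / d**2
lam = np.sort(np.linalg.eigvals(D / phi[:, None]).real)
print(lam[0], np.pi**2)   # this scheme approximates pi^2 from below here
```

Note that this particular discretization approaches λ₁ from below, in contrast with the scheme of the next section, which is monotone from above.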
8.13. Monotonicity
As Δ → 0, we can expect that the characteristic values and characteristic functions obtained in this fashion will approach those associated with the differential equation. This may again be regarded as a stability problem, which we will not pursue here. Let us show, however, how we can obtain monotonicity by a slight change in the foregoing formulation. Instead of using the approximations of Sec. 8.12, write

u'(t) = u_k,   kΔ ≤ t < (k + 1)Δ,   k = 0, 1, ..., N − 1.   (8.13.1)

This is an approximation in policy space, to use the terminology of dynamic programming. This will be discussed in detail in Volume II. It reflects the idea previously stated that u', rather than u, is the basic function in variational problems. Then

∫₀ᵀ u'² dt = Δ Σ_{k=0}^{N−1} u_k².   (8.13.2)

Write v_k = u(kΔ). Then (8.13.1) yields

u(t) = v_k + u_k(t − kΔ),   kΔ ≤ t ≤ (k + 1)Δ,   (8.13.3)

whence

v_{k+1} = v_k + u_k Δ,   (8.13.4)

and

∫₀ᵀ φ(t)u² dt = Σ_{k=0}^{N−1} ∫_{kΔ}^{(k+1)Δ} φ(t)[v_k + u_k(t − kΔ)]² dt.   (8.13.5)

In this way we obtain another algebraic problem. If we let λ₁(Δ) denote the smallest characteristic root obtained in this way, clearly

λ₁ ≤ λ₁(Δ).   (8.13.6)

Once again we have monotone convergence of the approximating values as Δ → 0. Extrapolation procedures, such as deferred passage to the limit, can be used to estimate λ₁ once a sequence of values of the form {λ₁(Δ/2^k)}, k = 1, 2, ..., R, has been obtained. Furthermore, it is easy to see that we have convergence, since we can approximate to the function u'(t) arbitrarily closely by a step function of the type described in (8.13.1).
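Such a sequence {λ₁(Δ/2^k)} can be combined by deferred passage to the limit. The sketch below uses the illustrative choices φ(t) = 1 on [0, 1] (so λ₁ = π²) and two mesh sizes, which are assumptions not taken from the text, and performs one Richardson extrapolation step.

```python
# Deferred passage to the limit for the scheme of Sec. 8.13 with phi(t) = 1
# on [0, 1].  Piecewise-constant u' makes u piecewise linear, and minimizing
# the quotient gives the generalized matrix problem K v = lambda M v with the
# usual stiffness and mass matrices; lambda_1(Delta) >= pi^2, as in (8.13.6).
import numpy as np

def lam1(N, T=1.0):
    d = T / N
    n = N - 1                                   # interior values v_1, ..., v_{N-1}
    K = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / d
    M = (4 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)) * d / 6
    return np.min(np.linalg.eigvals(np.linalg.solve(M, K)).real)

a, b = lam1(8), lam1(16)
extrap = (4 * b - a) / 3        # one Richardson step, assuming an O(Delta^2) error
print(a, b, extrap, np.pi**2)   # a > b > pi^2, and extrap is markedly closer
```

The monotone decrease a > b > π² is exactly the behavior promised by (8.13.6), and the extrapolated value is accurate to roughly two more decimal places than b.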
8.14. Positive Kernels
The representation

λ₁ = min_u ∫₀ᵀ u'² dt / ∫₀ᵀ φ(t)u² dt   (8.14.1)

makes it easy to obtain upper bounds for λ₁. How do we obtain lower bounds? One approach already mentioned is through the theory of "intermediate problems" of Weinstein. Another approach is based on the theory of positive operators. In Chapter 2 we indicated corresponding results for positive matrices. Let us here describe an application to integral equations. We begin with the conversion of the equation

u'' + λφu = 0,   u(0) = u(1) = 0,   (8.14.2)

into the integral equation

u(t) = λ ∫₀¹ k(t, s)φ(s)u(s) ds,   (8.14.3)

where k(t, s) is the appropriate Green's function as discussed in Sec. 8.6. Let us replace λ by its reciprocal and consider the integral equation

λu(t) = ∫₀ᵃ k(t, s)u(s) ds,   (8.14.4)
where the essential facts are that k(t, s) ≥ 0 for 0 ≤ s, t ≤ a, k(t, s) is symmetric, and ∫₀ᵃ∫₀ᵃ k²(t, s) dt ds < ∞. We shall quote the results required from the theory of integral equations without demonstration. They are direct analogs of those derived in the matrix case. Appropriate references are given at the end of the chapter. The familiar representation for λ_M (now the largest characteristic root) is

λ_M = max_u ∫₀ᵃ∫₀ᵃ k(t, s)u(t)u(s) dt ds / ∫₀ᵃ u² dt.   (8.14.5)

We know from the theory of positive operators that if k(t, s) ≥ 0, then the characteristic function u_M associated with λ_M is nonnegative. Hence, we have, in addition to (8.14.5), the further results

min_{0≤t≤a} [∫₀ᵃ k(t, s)u(s) ds / u(t)] ≤ λ_M ≤ max_{0≤t≤a} [∫₀ᵃ k(t, s)u(s) ds / u(t)],   (8.14.6)

valid for any positive continuous u.
Whether or not equality holds is extraneous to the present discussion since we are only concerned here with upper and lower bounds. A series of exercises below will illustrate the use of (8.14.5) and (8.14.6).

Exercises
1. Consider the integral equation λf(t) = ∫₀ᵃ k(t − s)f(s) ds, where we suppose that k(t) is nonnegative, even, and monotone decreasing for 0 < t < ∞, and that ∫_{−∞}^∞ k dt < ∞. Then

λ_M ≥ (1/a) ∫₀ᵃ∫₀ᵃ k(t − s) dt ds.

2. Show that

max_{0≤t≤a} ∫₀ᵃ k(t − s) ds ≥ λ_M,

and thus that

λ_M ≤ 2 ∫₀^{a/2} k(t) dt.

3. Obtain the estimate of Exercise 2 without any use of the theory of positive operators. Hint:

λu(t) = ∫₀ᵃ k(t, s)u(s) ds

implies

λu(t) ≤ [max_s u(s)] ∫₀ᵃ k(t, s) ds.

Choose t appropriately.

4. Combining Exercises 1 and 2, conclude that

(1/a) ∫₀ᵃ∫₀ᵃ k(t − s) dt ds ≤ λ_M ≤ 2 ∫₀^{a/2} k(t) dt.

5. For the case where k(t) = e^{−t²}, show that we have the bounds

0.843 ≥ λ_M(2)/π^{1/2} ≥ 0.713,
0.995 ≥ λ_M(4)/π^{1/2} ≥ 0.749,
0.999 ≥ λ_M(10)/π^{1/2} ≥ 0.899.
6. If k(t) is nonnegative, continuous, even, and monotone decreasing for 0 < t < ∞, the characteristic function f_M(t) associated with λ_M, normalized by the requirement that ∫₀ᵃ f_M(t) dt = 1, possesses the following properties: (a) f_M(t) = f_M(a − t); (b) f_M is monotone increasing in 0 ≤ t ≤ a/2. Hint: Let

Tg = ∫₀ᵃ k(t, s)g(s) ds.

Use the fact that f_M(t) is proportional to lim_{n→∞} T^n g/λ_M^n and that Tg preserves the desired property.

7. Use the trial functions

f(t) = t/b,  0 ≤ t ≤ b;   f(t) = 1,  b ≤ t ≤ a − b;   f(t) = (a − t)/b,  a − b ≤ t ≤ a,

with 0 < b < a/2, to estimate λ_M with appropriate choice of the free parameters. For the foregoing, see
R. Bellman and R. Latter, "On the Integral Equation λf(x) = ∫₀ᵃ k(x − y)f(y) dy," Proc. Amer. Math. Soc., Vol. 3, 1952, pp. 884-891.

For considerable improvements of the methods and the results, see

G. M. Wing, "On a Generalization of a Method of Bellman and Latter for Obtaining Eigenvalue Bounds for Integral Operators," J. Math. Anal. Appl., Vol. 23, 1968, pp. 384-396.
G. M. Wing, "On a Method for Obtaining Bounds on the Eigenvalues of Certain Integral Equations," J. Math. Anal. Appl., Vol. 11, 1965, pp. 160-175.
G. M. Wing, "Mathematical Methods Suggested by Transport Theory," Amer. Math. Soc. Symposium Proceedings, R. Bellman and G. Birkhoff (editors), forthcoming.

Miscellaneous Exercises
1. Show that if u(0) = 0, u(1) = 1, k ≥ 4, u' ∈ L²(0, 1), we have

∫₀¹ (u'² − u²/(kt²)) dt ≥ 1/2 + α,

where α = (1/4 − 1/k)^{1/2}.
2. Hence, show that if u(0) = 0, we have

∫₀^∞ u²/(4t²) dt ≤ ∫₀^∞ u'² dt,

and thus

∫₀^∞ (u/t)² dt ≤ 4 ∫₀^∞ u'² dt.

3. Show that

∫₀¹∫₀¹ (u(t₁) − u(t₂))² dt₁ dt₂ = 2 ∫₀¹ u² dt − 2 (∫₀¹ u dt)²,

and, using (u(t₁) − u(t₂))² = (∫_{t₂}^{t₁} u'(t) dt)², conclude that

∫₀¹ u² dt − (∫₀¹ u dt)² ≤ ∫₀¹ t(1 − t)u'² dt.

See
G. H. Hardy and J. E. Littlewood, “Some Integral Inequalities Connected with the Calculus of Variations,” Quart. J . Math., Vol. 3, 1932, pp. 241-252.
4. Let E(u) denote the expression a(t)u'' + b(t)u' + c(t)u and E*(v) the adjoint expression (a(t)v)'' − (b(t)v)' + c(t)v. Show that for p ≥ 1,

∫₀ᵀ [u^p E*(v) − v(p u^{p−1}E(u) + p(p − 1)a u^{p−2}u'² + (1 − p)c u^p)] dt = [u^p((av)' − bv) − p a v u^{p−1}u']₀ᵀ.
Hint: Consider the integral ∫₀ᵀ [u^p E*(v) − v E(u^p)] dt.

5. Show that if a(t) ≥ 0, c(t) ≤ 0 for 0 ≤ t ≤ T and if there exists a function v such that v < 0, E*(v) > 0 in [0, T], then if u(0) = u(T) = 0, u ≥ 0 in [0, T], we have

(∫₀ᵀ u^p dt)^{1/p} ≤ [p max_t |v| / min_t E*(v)] (∫₀ᵀ |E(u)|^p dt)^{1/p}.
Hint: From the foregoing, we obtain

[min_t E*(v)] ∫₀ᵀ u^p dt ≤ ∫₀ᵀ u^p E*(v) dt ≤ p max_t |v| ∫₀ᵀ u^{p−1}|E(u)| dt.

Now apply Hölder's inequality.
6. Show that if u(0) = u(T) = 0, then

∫₀ᵀ u² dt ≤ (T/π)⁴ ∫₀ᵀ (u'')² dt.
7. If a(t) ≥ 0, c(t) ≤ 0, u(0) = u(T) = 0, u ≥ 0, 0 ≤ t ≤ T, then

(∫₀ᵀ u^p dt)^{1/p} ≤ [p / min_t (b' − a'' − c)] (∫₀ᵀ |E(u)|^p dt)^{1/p}.

Hint: Take v = −1.

8. What modification must be made if u changes sign in [0, T]?
9. If a(t) ≥ 0, c(t) ≤ 0, u(0) = u(T) = 0, show that a corresponding bound holds for max_t |u(t)|. Hint: Use the result

max |f| = lim_{p→∞} (∫₀ᵀ |f|^p dt)^{1/p}.
10. Let a(t) ≥ 0, c(t) ≤ 0 in [0, T]. For any u such that u(0) = 0, u'(T) = 0, a corresponding bound holds, provided that there exists a function v satisfying the conditions v ≤ 0 in [0, T], (av' − bv)_{t=T} ≥ 0, and q₀ is the smallest integer such that p > q₀, E*(v) + q₀cv ≥ cv.
11. If a(t) ≥ 0, c(t) ≤ 0 in [0, T], and if there exists a function v(t) satisfying the conditions v(t) ≤ 0, E*(v) > 0, (av' − bv)_{t=0} ≤ 0, (av' − bv)_{t=T} ≥ 0 in [0, T], then a corresponding bound holds if u'(0) = u'(T) = 0.
12. Let a(t) ≥ 0, c(t) ≤ 0 in [0, T]. If u is a solution of E(u) = 0 and if there exists a function v such that v < 0, E*(v) > 0, v(0) = v(T) = 0, then a bound of the same type holds with

k = min[|av'(0)|, |av'(T)|].
13. Specialize the foregoing to the case a(t) = 1, b(t) = 0.
14. If a(t) ≥ 0, c(t) ≤ 0, a(0) ≠ 0, then for any solution of E(u) = 0 such that u(T) = 0, we have

(∫₀ᵀ |u|^p dt)^{1/p} ≤ k |u(0)|,

where k = (a(0)|v'(0)| / min_t |E*(v)|)^{1/p}, provided that v is a function such that v ≤ 0, E*(v) > 0, v(0) = 0, av'(T) − bv(T) ≤ 0. In the limit as p → ∞ we obtain max_t |u(t)| ≤ |u(0)|.
15. Use the foregoing results to obtain inequalities for the characteristic roots of the Sturm-Liouville equation au'' + bu' + cu = λu, under various boundary conditions at t = 0 and t = T.
For the foregoing results and a number of further results concerning systems of equations, see

M. P. Colautti, "Sulla Maggiorazione a priori delle Soluzioni delle Equazioni e dei Sistemi di Equazioni Differenziali Lineari Ordinarie del Secondo Ordine," Le Matematiche, Vol. 11, 1956, pp. 8-99. Reprinted as N. 484, Pubblicazioni dell'Istituto per le Applicazioni del Calcolo, Rome, 1957.
For extensive generalizations of the foregoing results to classes of partial differential equations, see

G. Fichera, Proc. Intern. Congress Math., Vol. 3, 1956, pp. 220-232.
G. Fichera, "Alcuni Recenti Sviluppi della Teoria dei Problemi al Contorno per le Equazioni alle Derivate Parziali Lineari," Atti Cong. Internat. Equazioni alle Derivate Parziali, Trieste, 1954.

These are also available as N. 495 and N. 466 of the series of publications mentioned above.
16. Let q(t) be a given positive continuous function defined for 0 < t < ∞. Two functions u(t) and p(t) are to be determined so as to maximize the functional ∫₀ᵇ q(t)u(t) dt, subject to the following conditions:

(a) u satisfies d/dt(p(t) du/dt) − q(t)u = 0, 0 < t < b, u(0) = 1, p(b)u'(b) = 0;
(b) p(t) is continuous, p(t) > 0 in (0, b), ∫₀ᵇ p(t) dt = k, where k is a given constant.
Show that the solution is unique and that it is given by u(t) = 1 − t/b, p(t) = ∫ₜᵇ (b − t₁)q(t₁) dt₁, and the maximum value is ∫₀ᵇ (1 − t/b)q(t) dt, where b is the unique solution of k = ∫₀ᵇ (∫ₜᵇ (b − t₁)q(t₁) dt₁) dt.

17. Show that the problem is equivalent to that of maximizing the functional of the preceding exercise, subject only to u and p being nonnegative and piecewise continuous and p satisfying the condition ∫₀ᵇ p(t) dt = k. See
R. J. Duffin, "A Variational Problem Relating to Cooling Fins," J. Math. Mech., Vol. 8, 1959, pp. 47-56.

18. Consider the linear differential equation d/dt(p(t) du/dt) − q(t)u = 0 and the associated quadratic functional E(u) = ∫ₐᵇ [p(t)u'² + q(t)u²] dt. Take p and q continuous in [a, b] and p > 0. Assume that E is positive definite for all nontrivial u such that u(a) = u(b) = 0. Consider the truncated functional E(u, t) = ∫ₜᵇ [p(t₁)u'² + q(t₁)u²] dt₁, where a ≤ t ≤ b. Let v satisfy the differential equation and u be a function such that u(t) = v(t), u(b) = v(b). Show that E(u, t) = E(v, t) + E(z, t), where z = u − v.

19. Hence, show that E(u, t) ≥ E(v, t) = p(b)v'(b)v(b) − p(t)v'(t)v(t).

20. Performing the change of variable u(t₁) = v(t)w(t₁)/w(t), show that we obtain v²(t)E(w, t)/w²(t) + p(t)v'(t)v(t) ≥ 0. Hence, show that

v(t) ≥ v(a) exp(−∫ₐᵗ p⁻¹w⁻²E(w, t₁) dt₁) + v(b) exp(−∫ₜᵇ p⁻¹z⁻²E(z, t₁) dt₁)

for a ≤ t ≤ b, where w and z are arbitrary positive functions subject to the constraints w(a) = z(b) = 0. See

R. J. Duffin and W. D. Serbyn, "Approximate Solution of Differential Equations by a Variational Method," J. Math. Phys., Vol. 37, 1958, pp. 162-168.
21. Consider the equation (p(t)x')' + q(t)f(x) = 0, where f(x) is continuous, xf(x) > 0 for x ≠ 0, and lim_{x→∞} ∫₀ˣ f(x₁) dx₁ = +∞. Introduce the function

V(t) = ∫₀^{x(t)} f(x₁) dx₁ + p(t)x'²/(2q(t)).

Show that if p and q are positive and absolutely continuous, then

V(t) ≤ V(t₀) exp(∫_{t₀}^t γ⁻(t₁) dt₁),

where γ(t) = (p'/p + q'/q) and γ⁻ = max(−γ, 0). In particular, if ∫^∞ γ⁻ dt < ∞, then all solutions are bounded. Hint: Consider dV/dt. See

J. S. Wong, "Explicit Bounds for Solutions of Certain Second Order Nonlinear Differential Equations," J. Math. Anal. Appl., Vol. 17, 1967, pp. 339-342.
22. Let p(u, t) and g(u, t) be continuously differentiable and p(u, t) > 0. Then

(a) p(u' − g_u p⁻¹)² ≥ 0,
(b) p(u' − g_u p⁻¹)² + 2u'g_u + 2g_t ≥ (d/dt)(2g).

Equality holds if and only if u' = g_u p⁻¹. For this and the following results, see
D. C. Benson, "Inequalities Involving Integrals of Functions and Their Derivatives," J. Math. Anal. Appl., Vol. 17, 1967, pp. 292-308.

23. Using the foregoing results, show that

(a) ∫₀^{2π} (u'² + u²) dx ≥ M² − m², if u(0) = m, u(2π) = M, m = inf u, M = sup u, ∫₀^{2π} u dx = 0;

(b) ∫ₐᵇ [u'² + (g' + g²)u²] dt ≥ u(b)²g(b) − u(a)²g(a).

Hence, show that

(c) (∫₀^∞ u² dt)² ≤ 4 (∫₀^∞ t²u² dt)(∫₀^∞ u'² dt).

Show that for any real λ, we have

(d) ∫ₐᵇ (u'² + λ²u²) dt ≥ λ(u(b)² − u(a)²).
Results concerning functions of several variables may also be found in the foregoing paper.
24. Let h(x) be positive and convex, satisfying h'(0) ≤ 0. Then

∫₀^{2π} h(x)f'(x)² dx ≥ (1/2π) (∫₀^{2π} h(x) dx) (∫₀^{2π} f²(x) dx),

with equality for h(x) = 1 (the Wirtinger inequality). (B. A. Troesch.)
25. Let R(x) be the Rayleigh quotient (Ax, x)/(x, x). Show that for any two scalars a and b, we have

R(ax + by) + R(ay − bx) = R(x) + R(y),

provided that (x, x) = (y, y) and (x, y) = 0. See

J. B. Diaz and F. T. Metcalf, "A Functional Equation for the Rayleigh Quotient, with Some Applications," J. Math. Mech., Vol. 17, 1968, pp. 623-630.
B. Zwahlen, "Über die Eigenwerte der Summe zweier selbstadjungierter Operatoren," Comment. Math. Helvetici, Vol. 40, 1966, pp. 81-116.
26. Let the matrix A(t) be such that ∫₀ᵀ [(x', x') + (x, Ax)] dt > 0 for all nontrivial x with x(0) = x(T) = 0. Then if X(t) is the solution of X'' − A(t)X = 0, subject either to X(0) = I, X'(0) = 0, or to X(0) = 0, X'(0) = I, we cannot have |X(t₁)| = 0 for 0 < t₁ ≤ T. See

R. L. Sternberg, "Variational Methods and Non-Oscillation Theorems for Systems of Differential Equations," Pacific J. Math., Vol. 19, 1952, pp. 311-322.
27. Show that the problem of minimizing a layered functional can be converted into a problem involving an unknown parameter. For a formal treatment, see

R. Bellman, "Functional Equations in the Theory of Dynamic Programming, XV: Layered Functionals and Partial Differential Equations," J. Math. Anal. Appl., forthcoming.
28. Consider the problem of minimizing a quotient of the type considered in Sec. 8.12, where u₀ = u_N = 0, φ_k ≥ 0. Let λ_k^(N) denote the associated characteristic roots, k = 1, 2, ..., N − 1, and consider the sum s_N = Σ_{k=1}^{N−1} (λ_k^(N) − k²π²). Under what conditions on the φ_k does s_N possess a limiting behavior as N → ∞? Hint: tr(A) = Σ_i λ_i.

29. Obtain in this fashion, or otherwise, an expression for Σ_{n=1}^∞ (λ_n − n²π²), where λ_n is the nth characteristic root of u'' + (λ + φ)u = 0, u(0) = u(1) = 0, under appropriate assumptions concerning φ.
30. Obtain a corresponding expression for Σ_{n=1}^∞ (λ_n − n²π²)². Hint: tr(A²) = Σ_i λ_i². See

I. M. Gelfand and B. M. Levitan, Dokl. Akad. Nauk SSSR, Vol. 88, 1953, pp. 593-596.

Also see the review of an article by V. A. Javrjan in Mathematical Reviews, Vol. 33, 1967, No. 5996.

Bibliography and Comments

§8.1. For a detailed discussion of Sturm-Liouville theory, see
S. H. Gould, Variational Methods in Eigenvalue Problems, University of Toronto Press, Toronto, 1957.
E. L. Ince, Ordinary Differential Equations, Dover Publications, New York, 1944.

§8.4. It is of some interest to determine the type of inhomogeneity that minimizes or maximizes the different characteristic values. For results of this nature, see

B. Schwarz, "Some Results on the Frequencies of Nonhomogeneous Rods," J. Math. Anal. Appl., Vol. 5, 1962, pp. 169-175.
B. Schwarz, Bounds for the Principal Frequency of Nonuniformly Loaded Strings, Mathematics Research Center Report No. 341, University of Wisconsin, October, 1962.
M. G. Krein, "On Certain Problems on the Maximum and Minimum of Characteristic Values and on the Lyapunov Zones of Stability," Amer. Math. Soc. Translations, Series 2, Vol. 1, 1955, pp. 163-187.

A question of some importance in engineering and theoretical physics is that of determining the function g(x) in u'' + (λ + g(x))u = 0 on the basis of a knowledge of sets of characteristic values. See

G. Borg, "Eine Umkehrung der Sturm-Liouvilleschen Eigenwertaufgabe," Acta Math., Vol. 78, 1946, pp. 1-96.
I. M. Gelfand and B. M. Levitan, "On the Determination of a Differential Equation from its Spectral Function," Amer. Math. Soc. Translations, Vol. 2, 1955, pp. 253-304.
B. M. Levitan and M. G. Gasymov, "Determination of a Differential Equation from Two Spectra," Uspekhi Mat. Nauk, Vol. 19, 1964, pp. 3-63 (in Russian).
R. Bellman and J. M. Richardson, "A Note on an Inverse Problem in Mathematical Physics," Quart. Appl. Math., Vol. 19, 1961, pp. 269-271.
L. Anderson, "Summary of Some Results Concerning the Determination of the Wave Operator from Given Spectral Data, in the Case of a Difference Equation Corresponding to a Sturm-Liouville Differential Equation," J. Math. Anal. Appl., forthcoming.

§8.6. See the book by Ince previously cited.

§8.9. See

A. Weinstein, "Some Numerical Results in Intermediate Problems for Eigenvalues," Numerical Solution of Partial Differential Equations, Academic Press, New York, 1966, pp. 167-191.
A. Weinstein, "On the Sturm-Liouville Theory and the Eigenvalues of Intermediate Problems," Numerische Mathematik, Vol. 5, 1963, pp. 238-245.
§8.14. See

G. Birkhoff, "Extensions of Jentzsch's Theorem," Trans. Amer. Math. Soc., Vol. 85, 1957, pp. 219-227.
W. R. Boland, "Convexity Theorems for the Eigenvalues of Certain Fredholm Operators," J. Math. Anal. Appl., Vol. 25, 1969, pp. 162-181.

For a new approach to the determination of characteristic values based on invariant imbedding, see

M. R. Scott, L. F. Shampine, and G. M. Wing, "Invariant Imbedding and the Calculation of Eigenvalues for Sturm-Liouville Systems," Computing, Vol. 4, 1969, pp. 10-23.
AUTHOR INDEX
A
Boland, W. R., 330 Borel, E., 2 Borg, G., 300, 329 Boyd, D. W., 51 Braver, F., 172 Bremmer, H., 52 Brickman, L., 97 Brown, 186 Brown, T. A., 50, 97, 103, 173, 182 Buche, A. B., 180 Bucy, R. S., 185, 303 Burnand, G., 183 Bushkovitch, A. V., 45
Aleksandrovskii, N. M., 255 Alekseev, V. M., 172 Altschuler, S., 301 Andersson, L., 330 Andrade, S., 223 Andreief, C., 98 Anitschenko, R. I., 185 Aoki, M., 255, 302 Arenstorf, R. F., 186 Aronsson, G., 300 Aronszajn, N., 44, 316 Arthurs, A. M., 186 Atkinson, F. V., 52, 102, 128, 182, 302
C
B Bancroft, S., 183 Banta, E. D., 213 Baxter, G., 128 Beckenbach, E. F., 50, 52, 296 Bellman, R. 8, 11, 31, 37, 41, 42, 47, 49, 50, 51, 52, 71, 93, 94, 98, 99, 100, 101, 102, 103, 121, 130, 131, 132, 133, 140, 172, 173, 175, 179, 180, 181, 182, 183, 184, 185, 207, 219, 223, 224, 254, 255, 256, 257, 295, 296, 298, 299, 301, 302, 303, 312, 322, 328, 330 Ben Miloud, M., 132 Benson, D. C., 299, 327 Bertram, J. E., 297 Bharucha-Reid, A. T., 180 Bialynicki-Birula, I., 120 Bihari, I., 183 Birkhoff, G., 103, 152, 182, 224, 330 Blaquiere, A., 247, 255 Bogdanoff, J. L., 257 Bogoliuboff, N., 216
Calabi, E., 97 Canosa, J., 179 Carleman, 179 Carter, D. S., 300 Cauchy, A., 18 Causey, R. L., 84 Cesari, L., 52, 182, 221 Chen, K-T, 132 Cohen, E. R., 257 Cohn, J. H. E., 310 Colautti, M. P., 325 Cole, J., 179 Conti, R., 182, I84 Cooke, K. L., 8, 37, 132, 185, 255, 256 Courant, R., 78 Cowan, K. C., 184 Cramer, H., 55 Cremer, H., 43 Culver, W. J., 257
D Dantzig, G. B., 102 Davison, B., 302
Davison, E. J., 184, 255 deBruijn, N. G., 98, 220, 300 Deich, A. M., 255 Denman, H. H., 222 deSpautz, J. F., 31, 182 Diaz, J. B., 302, 328 Drobnies, S. I., 130 Lherr, 223 Duffin, R. J., 103, 183, 326
E Effertz, F. I I . , 43 Erdelyi, A., 53
Fair, W., 51 Faris, W. G., 109 Farison, J. B., 257 Feynman, R. P., 303 Fichera, G., 325 Fike, C. T., 224 Filchakov, P. F., 312 Finlayson, B. A., 222, 301 Finsler, P., 79 Fischer, E., 78 Folltman, J., 114 Forsythe, G. E., 96, 101, 131 Fowler, M. E., 132 Fowler, R. H., 41 Friedrichs, K. O., 222 Froman, N., 53 Froman, P. O., 53 Fuller, A. T., 132 Furstenberg, H., 133
Green, J. W., 53 Greenspan, D., 185 Grobner, W., 177, 183 Gross, O., 302 Gusarov, 31
H Hadamard, J., 95, 96 Hale, J . K., 183 Hallam, T. G., 183 Hammer, P. C . , 207 Hammerslry, J. M., 172 Handelman, H. G., 185 Hardy, G. H., 2, 41, 52, 296, 323 Harris, T. E., 103 Hartman, P., 103, 174 Heard, M. L., 181 Hearon, J. Z., 132 Hedrick, E. R., 213 Heine, E., 214 Heisenberg, 223 Helton, B. W., 131 Henrici, P., 213 Herrnite, C., 223 Hestenes, M. R., 301 Higgins, T. J., 222 Hille, E., 186 Hobby, C. R . , 256 Hochstadt, H., 217 Hoffman, W. A., 64 Hsu, C . S., 223 Hukuhara, M., 184 Hunter, C . , 53 Hurwitz, A., 43 Hyers, D., 173
I
G Gasynov, M. G., 330
Gelfand, I. M., 52, 224, 329 Gilvarry, J. J., 183 Glicksberg, I., 302 Golden, 96 Golub, G. H., 96 Good, I. J., 224 Gorman, D., 224 Gould, L. A., 297 Gould, S. H., 329 Goursat, E., 213
Ikebe, T., 291 Ince, E. L., 310, 329 Ingharn, A. E., 94
Jacobi, K. G . J., 103, 116
Jacobs, 0. L. R., 296 Jao, F. I:., 223
Jeffreys, 53 Jones, G. S., 45
Jordan, T., 83 Juncosa, M. L., 97
K Kac, M., 52, 79, 99 Kagiwada, H., 185 Kaiser, J. F., 297 Kajuk, Ja. F., 219 Kalaba, R., 52, 101, 102,116, 183, 184, 185, 223, 224, 255, 256, 302, 303 Kalman, R. E., 297 Kantorovich, L. V., 204 Kaplan, S., 132 Kardashov, A. A., 256 Kato, T., 291, 302 Kazakov, I. C., 257 Keller, J. B., 185, 220 Kellogg, 0. G., 152, 224 Kemp, R. D., 47 Kesten, H., 133 Kevorkian, J., 182 Khenkin, G. M., 258 Kolodner, I. I., 48, 175 Kolovskii, M. Z., 257 Koshiba, Z., 45 Kotkin, B., 256 Krein, M. G., 183, 329 Kryloff, N., 216 Kushner, H. J., 258
L Lagrange, J. L., 213 Laguerre, E., 51 Lanczos, C., 242, 257 Langer, R. E., 53 LaSalle, J. P., 93, 182 Latta, G. E., 256 Latter, R., 322 Lattes, R., 51, 102 Lavrentiev, M. M., 102 Lawson, C. A., 257 Lax, P. D., 58, 98 Lee, E. S., 302 Lefschetz, S., 182 Legras, J., 133 Lehman, S., 100 Leray, J., 152 Lerman, R. A., 31, 182
Levenson, M. E., 222 Levinson, N., 47, 175, 177 Levitan, B. M., 329, 330 Levitt, L., 257 Lew, A., 227 Lewis, R. D., 52 Liechenstein, M. I., 257 Lighthill, J., 220 Lions, J. L., 51, 102 Liouville, J., 30, 53 Littlewood, J. E., 52, 222, 296, 323 Liu, Y. K., 222 Lochak, G., 223 Lockett, J., 101, 102, 256 Loewner, C., 103 Luchka, A. Y., 131 Luenberger, D. G., 256 Luke, Y. L., 255 Lyapunov, A. A., 134, 142, 166
Mc McCoy, J., 224 McDonough, R. N., 255
M Magnus, W., 131 Mahoney, J. J., 53 Malkin, I. G., 183 Manning, I., 184 Marshak, R. E., 302 Marzollo, A., 257 Massera, J. L., 169, 183 Medgyessy, P., 179 Mehta, M. L., 133 Meier, L., 256 Meksyn, D., 176 Merkes, E. P., 51 Merriam, C. W., 297 Merz, G., 255 MesaroviC, M. D., 257 Metcalf, F. T., 328 Miele, A., 299 Mielnik, B., 120 Mikhlin, S. G., 301 Miller, R. K., 181 Millman, M. H., 220 Miranker, W. L., 132 Mirsky, L., 101
Mitropolskiy, Yu. A., 132, 184 Mitter, 223 Montroll, E. W., 53 Morse, M., 301 Moser, J., 184, 224 Moursund, D. G., 207, 223 Munakata, K., 223 Murdock, W. L., 70, 99
N N * ~‘r. , Y., 51, 185 Nehari, Z., 50 Newland, U. E., 301 Newton, C;. C., 297 Nikitin, ,4. K., 257 Nisscn, U., 125 Noh, W. F., 213 Nohel, J. A , , 181
0 Olkin, I., 94 Olver, F. W.J., 53 O’Mallev, 11. E., 185 Osborn, H., 185 Ostrowski, A. M., 59, 66, 67, 70, 71, 100, 102, 223
P Pavaloiu, I . , 211 Pearson, C. E., 14 Penrose, R., 67 Perron, O., 103, 140, 183 Persidskii, K. P., 183 Pervozvanskiy, A. A , , 257 Petryshyn. W. V., 183 Philippc, L., 223 Pimbley, G. H., 174 Plebanski, J., 120 Poincark, H., 35, 134 Pollard, J. H., 133 Polya, G., 52, 98, 129, 296 Prestrud, M., 185 Protter, I\I. H., 213
R Rah, M., 46 Reissig, R., 182
Reiter, A., 200, 222 Repin, I. M., 258 Rice, J. R., 256 Richardson, J. M., 129, 130, 180, 254, 257, 330 Riesz, F., 314 Rinehart, R. F., 95, 100, 131 Robinson, P. D., 186 Robinson, S. M., 223 Roe, G. M., 37 Rogers, E. H., 184 Rosen, G., 186 Rosen, J. P., 184 Rota, G. C., 128 Rozonoer, L. I., 303 Rutishauser, H., 224
S Sablos, G., 183 Samoylenko, A. M., 132 Sandri, G., 219 Sansone, G., 182 Schaefer, J. J., 183 Schauder, 152 Schild, A., 103 Schlieder, 223 Schmaedeke, IV., I I6 Schur, I., 80 Schwardtfegcr, H., 100 Schwarz, B., 329 Scott, M. R., 330 Scott, W.T., 51 Scriven, L. E., 222, 301 Serbyn, W. D., 326 Shampine, L. F., 330 Shapley, I>., 94 Shinbrot, M., 256 Shinohara, Y . , 220 Shoemaker, C., 3 I2 Shohat, J. A., 45, 219, 220 Shook, 186 Shtokalo, I. Z., 132 Sibuya, Y., 132 Siegel, C. L., 94, 256 Smith, K. T., 44 Smith, R. A., 127 Sobolov, V. I., 131 Squire, W., 176 Stach, K., 19
Stakgold, I., 302 Steffenson, 211 Stephan, B. H., 217 Sterbenz, P. H., 224 Sternberg, R. L., 328 Stieltjes, T. J., 35, 96, 212 Stoker, J. J., 222 Stokes, A., 132, 184 Stone, B. J., 68 Straus, E. G., 42 Sturrock, P. A., 224 Sweet, D., 183 Swinnerton-Dyer, H. P. F., 15 Sylvester, J. J., 81, 207 Szegö, G., 70, 98, 99
T Taussky, O., 66, 94, 97, 100, 103, 126 Taylor, G. D., 207, 223 Temple, G., 80, 102, 176, 182, 185, 220 Tobak, M., 220 Todd, J., 101 Trotter, H. F., 109 Tychonov, A. N., 55, 86, 102
U Uchiyama, S., 45 Ulam, S., 173 Ulanov, G. M., 257 Ulm, S. Yu., 211 Urabe, M., 200, 217, 218, 220, 221, 222 Uzawa, G., 182
V Vainberg, M. M., 302 Van der Pol, B., 220, 222, 223 Varga, R. S., 103 Vereeke, P., 116 Vitushkin, A. S., 258
Volterra, V., 180 von Karman, T., 182 Von Zeipel, H., 186
W Wafter, J., 31 Warga, J., 184 Warten, R. M., 132 Wasow, W., 185 Wegge, L. L., 223 Weinstein, A., 316, 330 Weiss, B., 132 Weiss, L., 255 Wendef, J. G., 128 Westlake, J. R., 101 Weyl, H., 176 Wiener, N., 222 Wigner, E. P., 80 Wilcox, H., 131 Wilcox, R. M., 120 Wimmer, 101 Wing, G. M., 52, 322, 330 Wintner, A., 103 Wong, J. S., 327 Wynn, P., 224
Y Yaglom, A. M., 52, 224 Yamazaki, 223 Yanagiwara, H., 220 Yanase, M. Y., 80 Yorke, J. A., 179
Z Zaborszky, J., 224 Zaidenberg, E. D., 258 Zlamal, M., 46 Zwaben, P., 328
SUBJECT INDEX
A Accelerated convergence, 184 Acceleration of convergence, 210 Adaptive computation, 58 Adaptive control processes, 102 Adjoint, 65 Adjoint equation, 118 Adjoint operator, 132 Aggregation, 225 Algorithm, 225 Analytic continuation of solutions, 219 Approximately convex functions, 173 Approximation by exponentials, 242 Asymptotic behavior, 23, 52, 146, 136, 149, 160, 179, 181 Asymptotic control theory, 185 Asymptotic series, 35 Averaging, 131 Averaging method, 250
B Baker-Campbell-Hausdorff, 109 Baker-Campbell-Hausdorff formula, 120 Banach space methods, 140 Baxter's functional equation, 128 Blasius equation, 176 Boundary layer, 14 Boundary layer problems, 176 Branching processes, 103 Brownian motion, 70 Bubnov-Galerkin method, 187 Burger's equation, 57
Cauchy-Schwarz inequality, 16 Cayley-Hamilton theorem, 76 Characteristic equation, 69 Characteristic functions, 307 Characteristic roots, 69 Characteristic values, 307 Characteristic vectors, 69 Closure, 258 Commutators, 108 Computational methods, 131 Conditioned matrices, 86 Conditioned systems, 59 Constrained minimization, 71 Continued fractions, 51, 130 Control processes, 51, 132, 184 Convolution, 7, 256 Courant parameter, 191 Cramer solution, 56
D Deferred passage to the limit, 164 Determinantal criteria, 81 Determinantal solution, 55 Difference approximations, 31 8 Difference equations, 165 Difference methods, 163 Differential approximation, 225, 227, 256 Differential-difference equations, 8, 37, 132, 238, 255, 256 Differential-integral equation, 128 Domain of stability, 183 Dynamic programming, 104, 172
E
C Calculus of variations, 11, 121 Cauchy determinant, 96
Elimination, 58 Elliptic functions, 223 Emden-Fowler equation, 170, 186
Euler equation, 12, 305 Euler solution, 111 Existence, 140 Exponential operators, 120 Extrapolation techniques, 209, 237
F Fermi-Thomas equation, 170 Feynman integrals, 109 Feynman operator calculus, 132 Fixed-point methods, 152 Fixed-point theorems, 184 Fixed points, 152 Floquet representation, 120 Floquet theorem, 132 Fourier transform, 230 Fredholm integral equation, 130, 177 Functional-differential equations, 240 Fundamental inequality, 3 Fundamental lemma, 136
G Generalized Baker-Hausdorff formula, 132 Generalized Fourier expansion, 312 Generalized inverse, 67 Green’s functions, 12, 14, 123, 320 Gram-Schmidt orthogonalization, 62 Gram-Schmidt procedure, 227
H Hadamard’s inequality, 95 Hammerstein type equations, 175 Hermitian, 82 Hilbert matrix, 87 Hilbert projective metric, 103 Hölder inequality, 17 Hyperbolic partial differential equations, 98
I Identification, 11, 98 Identification problems, 132 Inequalities, 50, 52, 183 Inhomogeneous equation, 7 Initial approximations, 208 Inner product, 61
Integral identities, 94 Integral inequalities, 323 Integration in functional spaces, 52 Intermediate problem, 316 Invariant imbedding, 52, 102, 104, 185 Inverse theorems, 140 Irregular perturbation problem, 185 Irregular stability problem, 168
J Jacobi identity, 64 Jacobian matrix, 64 Jordan canonical form, 113
K Kato’s Lemma, 80 Kepler’s equation, 213 Korteweg-de Vries, 57 Kronecker products, 124
L Lagrange-Bürmann formula, 213 Lagrange expansion, 53, 212, 224 Lagrange multiplier, 72 Lagrange transform, 7 Lagrange variation of parameters, 8 Laplace transform, 7, 101, 250 Layered functional, 328 Levinson oscillator, 175 Lie-Reihen, 177 Limit cycles, 217 Lindstedt renormalization, 218 Linear Algebra, 101 Linear algebraic equations, 54 Linear programming, 102 Linearization, 179 Linearly independent, 62 Liouville transformation, 30, 50 Lipschitz condition, 141 Lyapunov functions, 127
M Markov matrix, 66 Matric function, 95 Matrix exponential, 107
Matrix norm, 67 Matrix Riccati equation, 123 Mean-square approximation, 242, 251, 257 Method of moments, 202, 214, 224 Min-Max characterization, 77 Minimax theory, 103 Minimax problem, 207 Minkowski inequality, 17 Moments, 201 Monotone matrix functions, 103 Monotonicity, 319 Multidimensional Newton-Raphson, 207 Multidimensional theta functions, 100 Multiparametric spectral theory, 102 Multiple characteristic roots, 75 Multiple integrals, 98 Multiple solutions, 183 Multipoint value problems, 222
N Nearly periodic coefficients, 132 Newton Interpolation Formula, 46 Newton-Raphson-Kantorovich, 204 Newton-Raphson method, 204 Noncommutative operations, 133 Noncommutativity, 64 Nonlinear damping, 183 Nonlinear extrapolation, 210 Nonlinear forced oscillations, 122 Nonlinear integral equation, 143, 181 Nonlinear mechanics, 216 Nonlinear oscillations, 222 Nonlinear periodic systems, 222 Nonlinear problems, 182 Nonlinear spring, 192, 193, 244 Nonlinear summability techniques, 214 Nonlinear Volterra Equation, 177 Nonzero sum games, 9 Norm, 61 Normal matrices, 84 Numerical analysis, 101 Numerical aspects, 161 Numerical inversion, 101 Numerical stability, 231
O Optimal starting, 224 Optimal starting values, 207, 223
Orthogonal matrix, 66 Orthogonal Polynomials, 229
P Prime numbers, 42 Principal solutions, 5 Product integrals, 131 Projective metrics, 173 Proper solution, 170
Q Quadratic convergence, 205, 224 Quadrature, 176 Quantum mechanics, 203 Quasi-differential equations, 129, 180 Quasilinearization, 184, 223, 232, 255 Quasiperiodic coefficients, 121 Quotienten-Differenzen algorithm, 224
R Ramanujan’s equation, 213 Random matrices, 127, 133 Random walk, 70 Rayleigh-Ritz method, 103, 259, 304 Reduction of dimensionality, 85 Reduction of storage, 242 Relative invariants, 68 Renewal equation, 233 Reynolds operations, 128 Reproducing inequality, 45 Reproducing kernels, 44 Riccati approximation, 232 Riccati equation, 14, 51, 179, 185, 248, 253 Riccati equations, 255
S Schrödinger equation, 33 Second method of Lyapunov, 157 Self-consistent approach, 88 Self-consistent condition, 247 Semigroup property, 117 Semigroups of operators, 180 Separation of exponentials, 179 Shohat renormalization, 218 Singular, 66
Skew matrices, 100 Slightly-coupled, 201 Solitary waves, 58 Spectral decomposition, 74 Square roots, 80, 207 Stability, 113, 166 Stability matrix, 114 Stability theory, 37, 56, 134, 182 Stacking operator, 125 Stationary values, 305 Statistical linearization, 257, 258 Steffensen’s inequality, 48 Steffensen’s method, 211 Stochastic games, 94 Sturm-Liouville theory, 304 Subadditive functions, 172 Subharmonic response, 222 Successive approximations, 25, 105, 112, 150, 227 Sylvester dialytic method of elimination, 57 Symmetric matrices, 72
T Tchebycheff inequality, 45 Toeplitz matrices, 99 Transcendental numbers, 250 Transcendentally-transcendent functions, 233 Transition points, 53 Transport theory, 52, 322 Truncated equation, 178 Truncated system, 180 Truncation, 254 “Tucking-in” technique, 198, 223
Two-point boundary condition, 122 Two-point boundary value problem, 10, 158, 159 Tychonov regularization, 86
U Ultraspherical polynomials, 222 Uniformization of asymptotic expansions, 219 Uniqueness, 140
V Van der Pol equation, 192, 198, 200, 218, 220, 221, 246 Vandermonde determinant, 96 Variational characterization, 89 Variation problem, 314
W WKB approximation, 52, 177 WKB approximations, 33 Wave propagation, 52 Weighted residuals, 222 Widely separated eigenvalues, 132 Wiener integral, 224 Wronskian, 5
Y Young’s inequality, 17
Mathematics in Science and Engineering
A Series of Monographs and Textbooks
Edited by RICHARD BELLMAN, University of Southern California
1. T. Y. Thomas. Concepts from Tensor Analysis and Differential Geometry. Second Edition. 1965
2. T. Y. Thomas. Plastic Flow and Fracture in Solids. 1961
3. R. Aris. The Optimal Design of Chemical Reactors: A Study in Dynamic Programming. 1961
4. J. LaSalle and S. Lefschetz. Stability by Liapunov's Direct Method with Applications. 1961
5. G. Leitmann (ed.). Optimization Techniques: With Applications to Aerospace Systems. 1962
6. R. Bellman and K. L. Cooke. Differential-Difference Equations. 1963
7. F. A. Haight. Mathematical Theories of Traffic Flow. 1963
8. F. V. Atkinson. Discrete and Continuous Boundary Problems. 1964
9. A. Jeffrey and T. Taniuti. Non-Linear Wave Propagation: With Applications to Physics and Magnetohydrodynamics. 1964
10. J. T. Tou. Optimum Design of Digital Control Systems. 1963
11. H. Flanders. Differential Forms: With Applications to the Physical Sciences. 1963
12. S. M. Roberts. Dynamic Programming in Chemical Engineering and Process Control. 1964
13. S. Lefschetz. Stability of Nonlinear Control Systems. 1965
14. D. N. Chorafas. Systems and Simulation. 1965
15. A. A. Pervozvanskii. Random Processes in Nonlinear Control Systems. 1965
16. M. C. Pease, III. Methods of Matrix Algebra. 1965
17. V. E. Benes. Mathematical Theory of Connecting Networks and Telephone Traffic. 1965
18. W. F. Ames. Nonlinear Partial Differential Equations in Engineering. 1965
19. J. Aczel. Lectures on Functional Equations and Their Applications. 1966
20. R. E. Murphy. Adaptive Processes in Economic Systems. 1965
21. S. E. Dreyfus. Dynamic Programming and the Calculus of Variations. 1965
22. A. A. Fel'dbaum. Optimal Control Systems. 1965
23. A. Halanay. Differential Equations: Stability, Oscillations, Time Lags. 1966
24. M. N. Oguztoreli. Time-Lag Control Systems. 1966
25. D. Sworder. Optimal Adaptive Control Systems. 1966
26. M. Ash. Optimal Shutdown Control of Nuclear Reactors. 1966
27. D. N. Chorafas. Control System Functions and Programming Approaches (In Two Volumes). 1966
28. N. P. Erugin. Linear Systems of Ordinary Differential Equations. 1966
29. S. Marcus. Algebraic Linguistics; Analytical Models. 1967
30. A. M. Liapunov. Stability of Motion. 1966
31. G. Leitmann (ed.). Topics in Optimization. 1967
32. M. Aoki. Optimization of Stochastic Systems. 1967
33. H. J. Kushner. Stochastic Stability and Control. 1967
34. M. Urabe. Nonlinear Autonomous Oscillations. 1967
35. F. Calogero. Variable Phase Approach to Potential Scattering. 1967
36. A. Kaufmann. Graphs, Dynamic Programming, and Finite Games. 1967
37. A. Kaufmann and R. Cruon. Dynamic Programming: Sequential Scientific Management. 1967
38. J. H. Ahlberg, E. N. Nilson, and J. L. Walsh. The Theory of Splines and Their Applications. 1967
39. Y. Sawaragi, Y. Sunahara, and T. Nakamizo. Statistical Decision Theory in Adaptive Control Systems. 1967
40. R. Bellman. Introduction to the Mathematical Theory of Control Processes Volume I. 1967 (Volumes II and III in preparation)
41. E. S. Lee. Quasilinearization and Invariant Imbedding. 1968
42. W. Ames. Nonlinear Ordinary Differential Equations in Transport Processes. 1968
43. W. Miller, Jr. Lie Theory and Special Functions. 1968
44. P. B. Bailey, L. F. Shampine, and P. E. Waltman. Nonlinear Two Point Boundary Value Problems. 1968
45. Iu. P. Petrov. Variational Methods in Optimum Control Theory. 1968
46. O. A. Ladyzhenskaya and N. N. Ural'tseva. Linear and Quasilinear Elliptic Equations. 1968
47. A. Kaufmann and R. Faure. Introduction to Operations Research. 1968
48. C. A. Swanson. Comparison and Oscillation Theory of Linear Differential Equations. 1968
49. R. Hermann. Differential Geometry and the Calculus of Variations. 1968
50. N. K. Jaiswal. Priority Queues. 1968
51. H. Nikaido. Convex Structures and Economic Theory. 1968
52. K. S. Fu. Sequential Methods in Pattern Recognition and Machine Learning. 1968
53. Y. L. Luke. The Special Functions and Their Approximations (In Two Volumes). 1969
54. R. P. Gilbert. Function Theoretic Methods in Partial Differential Equations. 1969
55. V. Lakshmikantham and S. Leela. Differential and Integral Inequalities (In Two Volumes). 1969
56. S. H. Hermes and J. P. LaSalle. Functional Analysis and Time Optimal Control. 1969
57. M. Iri. Network Flow, Transportation, and Scheduling: Theory and Algorithms. 1969
58. A. Blaquiere, F. Gerard, and G. Leitmann. Quantitative and Qualitative Games. 1969
59. P. L. Falb and J. L. de Jong. Successive Approximation Methods in Control and Oscillation Theory. 1969
60. G. Rosen. Formulations of Classical and Quantum Dynamical Theory. 1969
61. R. Bellman. Methods of Nonlinear Analysis, Volume I. 1970
62. R. Bellman, K. L. Cooke, and J. A. Lockett. Algorithms, Graphs, and Computers. 1970
In preparation
A. H. Jazwinski. Stochastic Processes and Filtering Theory
S. R. McReynolds and P. Dyer. The Computation and Theory of Optimal Control
J. M. Mendel and K. S. Fu. Adaptive, Learning, and Pattern Recognition Systems: Theory and Applications
E. J. Beltrami. Methods of Nonlinear Analysis and Optimization
H. H. Happ. The Theory of Network Diakoptics
M. Mesarovic, D. Macko, and Y. Takahara. Theory of Hierarchical Multilevel Systems