PROCEEDINGS OF SYMPOSIA IN APPLIED MATHEMATICS VOLUME VIII
CALCULUS OF VARIATIONS AND ITS APPLICATIONS
McGRAW-HILL BOOK COMPANY, INC. NEW YORK
TORONTO
LONDON
1958
FOR THE AMERICAN MATHEMATICAL SOCIETY 80 WATERMAN STREET, PROVIDENCE, RHODE ISLAND
PROCEEDINGS OF THE EIGHTH SYMPOSIUM IN APPLIED MATHEMATICS OF THE AMERICAN MATHEMATICAL SOCIETY Held at the University of Chicago April 12-13, 1956
COSPONSORED BY
THE OFFICE OF ORDNANCE RESEARCH
Lawrence M. Graves EDITOR
Prepared by the American Mathematical Society under Contract No. DA-19-020-ORD-3777 with the Ordnance Corps, U.S. Army.
Printed in the United States of America. All rights reserved except those granted to the United States Government. Otherwise, this book, or parts thereof, may not be reproduced in any form without permission of the publishers. Copyright © 1958 by the McGraw-Hill Book Company, Inc.
Library of Congress Catalog Card Number 50-1183
CONTENTS

EDITOR'S PREFACE

On Variational Principles in Elasticity
BY ERIC REISSNER

Variational Principles in the Mathematical Theory of Plasticity
BY D. C. DRUCKER

Discussion of D. C. Drucker's Paper "Variational Principles in the Mathematical Theory of Plasticity"
BY P. G. HODGE, JR.

A Geometrical Theory of Diffraction
BY JOSEPH B. KELLER

Upper and Lower Bounds for Eigenvalues
BY J. B. DIAZ

Stationary Principles for Forced Vibrations in Elasticity and Electromagnetism
BY J. L. SYNGE

A Variational Computation Method for Forced-vibration Problems
BY H. F. WEINBERGER

Applications of Variational Methods in the Theory of Conformal Mapping
BY M. M. SCHIFFER

Dynamic Programming and Its Application to Variational Problems in Mathematical Economics
BY RICHARD BELLMAN

Variational Methods in Hydrodynamics
BY S. CHANDRASEKHAR

Some Applications of Functional Analysis to the Calculus of Variations
BY E. H. ROTHE

INDEX
EDITOR'S PREFACE

This volume contains the papers presented at the Eighth Symposium in Applied Mathematics, sponsored by the American Mathematical Society and the Office of Ordnance Research, and devoted to The Calculus of Variations and Its Applications. In addition to the nine invited addresses, there are included two brief notes, by P. G. Hodge, Jr., and by H. F. Weinberger, which were invited by the Program Committee and which embody discussion of the papers by D. C. Drucker and by J. L. Synge, respectively.

It seems obvious that one symposium could not profitably pay attention to all the directions in which variational methods have been applied. From the consultations of the Program Committee there resulted a group of addresses principally directed to applications in dynamics, but treating several other topics also.

The editor wishes to make special acknowledgment to the McGraw-Hill Book Company for their care in the production of the volume, and to all the authors for the careful preparation of their manuscripts. As a result the editor's task has been a comparatively light one.

LAWRENCE M. GRAVES
Editor
ON VARIATIONAL PRINCIPLES IN ELASTICITY¹
BY ERIC REISSNER
1. Introduction. Boundary-value problems for the differential equations of the theory of elasticity have in common with many other differential-equation problems the property of being equivalent to problems of the calculus of variations. Recognition of this fact, for the problems of the elastic rod, goes back to Euler and Daniel Bernoulli. The general three-dimensional problem was first considered in this fashion by Green, in 1837.

We may, in the discussion of variational principles in elasticity, distinguish a number of phases as follows:

1. The formulation of different variational principles and their interrelation. The best-known examples of this are Green's minimum principle for displacements and Castigliano's maximum principle for stresses.
2. The application of variational principles to the establishment of approximate two- and one-dimensional theories for three-dimensional problems. A classical example of this is Kirchhoff's treatment of the differential equations and boundary conditions for transverse bending of thin plates.
3. The application of variational principles for the determination of numerical values of the solution of boundary-value problems.
4. The simultaneous use of different variational principles for the determination of upper and lower bounds of numerical values.
5. The use of variational principles for the proof of uniqueness and existence theorems in elasticity theory.

The present paper has as its object the consideration of some of the questions associated with phases 1 and 4, as they have been of interest to the author.
2. The boundary-value problem. We consider the following system of nine differential equations for six components of stress, $\tau_{ij} = \tau_{ji}$, and three components of displacement, $u_i$:

$$\tau_{ij,j} + \psi_{,u_i} = 0, \tag{1}$$

$$\tfrac{1}{2}(u_{i,j} + u_{j,i}) = W_{,\tau_{ij}}. \tag{2}$$

In these equations and in what follows we make use of the summation convention according to which one sums over repeated subscripts. A comma in front of a subscript denotes partial differentiation with respect to the variable in question, except that $f_{,i}$ indicates differentiation of $f$ with respect to the Cartesian coordinate $x_i$.

¹ The work leading to this paper has been supported by the Office of Naval Research under Contract No. Nonr-1841(17) with the Massachusetts Institute of Technology.
The function $\psi$ in the equilibrium equations is taken in the form

$$\psi = X_i u_i + \tfrac{1}{2} Y_{ij} u_i u_j, \tag{3}$$

where the $X_i$ and $Y_{ij} = Y_{ji}$ are given functions of the coordinates $x_i$. The function $W$ in the stress-strain relations (2) is taken in the form

$$W = A_{ij}\tau_{ij} + \tfrac{1}{2} B_{ijkl}\tau_{ij}\tau_{kl}, \tag{4}$$

where the $A_{ij} = A_{ji}$ and $B_{ijkl} = B_{jikl} = B_{ijlk} = B_{jilk}$ are given functions of $x_i$.
The system (1) and (2) is to be solved in the interior of a region $V$ with boundary surface $S$. We divide the surface $S$ in two parts, $S_u$ and $S_p$, and consider the following system of conditions:

$$\text{On } S_u:\; u_i = \phi_{,p_i}, \qquad \text{On } S_p:\; p_i = \chi_{,u_i}. \tag{5}$$

The functions $\phi$ and $\chi$ are taken in the form

$$\phi = \bar u_i p_i + \tfrac{1}{2} b_{ij} p_i p_j, \qquad \chi = \bar p_i u_i + \tfrac{1}{2} c_{ij} u_i u_j, \tag{6}$$

where $\bar u_i$, $\bar p_i$, $b_{ij} = b_{ji}$, $c_{ij} = c_{ji}$ are given functions of position on $S_u$ and $S_p$, respectively. The quantities $p_i$ are the $x_i$ components of the surface-stress intensity, given by

$$p_i = \cos(n, x_j)\,\tau_{ij}, \tag{7}$$

where $n$ is the outward normal direction to the surface $S$. The system of equations (1), (2), and (5), with $\psi$, $W$, $\phi$, and $\chi$ defined by (3), (4), and (6), may be shown to represent the Euler equations and natural boundary conditions of a variational problem as stated below.

3. The general variational equation. Appropriate synthesis leads to the conclusion that a variational problem which has the differential equations (1) and (2) as Euler (differential) equations and the boundary conditions (5) as natural (or Euler) boundary conditions is the problem

$$\delta I = 0, \tag{8}$$

where

$$I = \int_V (\gamma_{ij}\tau_{ij} - \psi - W)\,dV - \int_{S_p} \chi\,dS - \int_{S_u} (p_i u_i - \phi)\,dS, \tag{9}$$

the quantities $\gamma_{ij}$ being defined by

$$\gamma_{ij} = \tfrac{1}{2}(u_{i,j} + u_{j,i}), \tag{10}$$

and where the $\tau_{ij}$ and $u_i$ are varied independently.²

² A variational theorem in which equations (10) are not used as definitions but are considered six of a total of fifteen differential equations for stresses, strains, and displacements has recently been formulated by K. Washizu in Technical Report 25-18 of the Aeroelastic and Structures Research Laboratory of the Massachusetts Institute of Technology (March, 1955).
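To verify the correctness of the above statement, we write

$$\delta I = \int_V (\gamma_{ij}\,\delta\tau_{ij} + \tau_{ij}\,\delta\gamma_{ij} - \psi_{,u_i}\,\delta u_i - W_{,\tau_{ij}}\,\delta\tau_{ij})\,dV - \int_{S_p} \chi_{,u_i}\,\delta u_i\,dS - \int_{S_u} (u_i\,\delta p_i + p_i\,\delta u_i - \phi_{,p_i}\,\delta p_i)\,dS, \tag{11}$$

and we transform the second term in the volume integral by integration by parts, as follows:

$$\tfrac{1}{2}\int_V \tau_{ij}\,\delta(u_{i,j} + u_{j,i})\,dV = -\int_V \tau_{ij,j}\,\delta u_i\,dV + \int_S p_i\,\delta u_i\,dS. \tag{12}$$

Combination of (11) and (12) gives

$$\delta I = \int_V \left[(\gamma_{ij} - W_{,\tau_{ij}})\,\delta\tau_{ij} - (\tau_{ij,j} + \psi_{,u_i})\,\delta u_i\right] dV + \int_{S_p} (p_i - \chi_{,u_i})\,\delta u_i\,dS - \int_{S_u} (u_i - \phi_{,p_i})\,\delta p_i\,dS, \tag{13}$$

and this shows that the Euler equations of the problem are the differential equations (1) and (2) and the boundary conditions (5).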
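The stationarity expressed by (13) can be checked numerically on a small example. The following is a minimal sketch, not part of the original paper: it assumes a one-dimensional bar on $0 \le x \le 1$ with both ends belonging to $S_u$ (so $S_p$ is empty), $\psi = Xu$, $W = \tau^2/2E$, and $\phi = \bar u\,p$. The values of `E`, `X`, `D` and the variation shapes `eta`, `xi` are hypothetical choices made only for illustration.

```python
# Hypothetical data for a 1D bar: modulus E, constant body force X,
# prescribed displacements u = 0 at x = 0 and u = D at x = 1.
E, X, D = 2.0, 3.0, 0.5

def simpson(f, n=200):
    """Composite Simpson rule on [0, 1]; exact for the quadratic integrands used here."""
    h = 1.0 / n
    total = f(0.0) + f(1.0)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(k * h)
    return total * h / 3.0

def mixed_functional(tau, u, du):
    """I of equation (9) for psi = X*u, W = tau^2/(2E), phi = u_bar*p, S_p empty."""
    volume = simpson(lambda x: du(x) * tau(x) - X * u(x) - tau(x) ** 2 / (2.0 * E))
    # Surface stress p = tau * n, with outward normal n = -1 at x = 0 and +1 at x = 1.
    p0, p1 = -tau(0.0), tau(1.0)
    surface = p0 * (u(0.0) - 0.0) + p1 * (u(1.0) - D)  # (p*u - phi) summed over S_u
    return volume - surface

# Solution of (1), (2), (5) for this bar: tau' + X = 0, u' = tau/E, u(0) = 0, u(1) = D.
tau_exact = lambda x: E * D + X * (0.5 - x)
u_exact = lambda x: D * x + X * x * (1.0 - x) / (2.0 * E)
du_exact = lambda x: D + X * (1.0 - 2.0 * x) / (2.0 * E)

# Independent variations of stress and displacement; neither is required
# to vanish on the boundary, because all boundary conditions are natural.
eta = lambda x: 1.0 + x        # shape of the stress variation
xi = lambda x: x * x + 0.3     # shape of the displacement variation
dxi = lambda x: 2.0 * x

def perturbed(eps):
    return mixed_functional(
        lambda x: tau_exact(x) + eps * eta(x),
        lambda x: u_exact(x) + eps * xi(x),
        lambda x: du_exact(x) + eps * dxi(x),
    )

eps = 1e-3
first_variation = (perturbed(eps) - perturbed(-eps)) / (2.0 * eps)
second_order_gap = perturbed(eps) - perturbed(0.0)
print(first_variation, second_order_gap)
```

The central difference approximates the first variation, which should vanish to rounding error, while the remaining gap is of order `eps**2`. This is consistent with $I$ being stationary at the solution without any restriction on the variations.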
The variational theorem implied by (8) to (10) is a generalization of a theorem which was formulated earlier.³ It reduces to the earlier theorem if it is assumed that the body force function $\psi$ is absent and that the functions $\phi$ and $\chi$ in the boundary conditions are of the form

$$\phi = \bar u_i p_i, \qquad \chi = \bar p_i u_i. \tag{6'}$$

What we have done in going from (6') to (6) is to take the step from having either stress or displacement boundary conditions on $S_p$ and $S_u$ to a system of mixed boundary conditions on both $S_p$ and $S_u$ in such a manner as to preserve the form of the original theorem as a special case. Were it not for the desirability of accomplishing this within the framework of the generalized problem, there would, for the generalized problem, be no need for a separate consideration of the boundary portions $S_p$ and $S_u$.

4. Variational equations for displacements or stresses. In order to bring out the significance of the general variational equation (8) for displacements and stresses, we state separately the less-general variational equations for displacements or stresses. In doing this, we are limiting ourselves here to stress and displacement boundary conditions of the form (6').

a. Variational principle for displacements (Green). The stress-strain relations (2) are considered as equations of definition for the stresses (so that stress variations are dependent on displacement variations), and displacement variations are limited such that $\delta u_i = 0$ on $S_u$. Equations (2) are inverted and written, with the help of a function $U$, in the form

$$\tau_{ij} = U_{,\gamma_{ij}}. \tag{14}$$

³ E. Reissner, On a variational theorem in elasticity, J. Math. Phys. vol. 29 (1950) pp. 90-95.
We further find that

$$\gamma_{ij}\tau_{ij} - W = U, \tag{15}$$

and that the variational equation which has the equilibrium equations (1) and the stress boundary conditions in (6') as Euler equations is of the form

$$\delta I_u = 0, \tag{16}$$

where

$$I_u = \int_V (U - \psi)\,dV - \int_{S_p} \bar p_i u_i\,dS. \tag{17}$$

b. Variational principle for stresses (Castigliano). We now assume that stress variations and displacement variations are such that all comparison states are equilibrium states. (We then have $\delta p_i = 0$ on $S_p$ and

$$\delta(\tau_{ij,j} + \psi_{,u_i}) = 0 \tag{18}$$

in the interior of the body.) We find that the variational equation which has the stress-strain relations (2) and the displacement boundary conditions in (6') as Euler equations is of the form

$$\delta I_\tau = 0, \tag{19}$$

where

$$I_\tau = \int_V (-W - \psi + u_i\psi_{,u_i})\,dV + \int_{S_u} \bar u_i p_i\,dS. \tag{20}$$

We may note that, as long as $\psi$ is a linear function of the $u_i$, which corresponds to the case of body forces independent of displacements, we have that $\psi - u_i\psi_{,u_i} = 0$ and therewith the disappearance of body-force terms in the variational equation. The extension of the principle to the case where $\psi$ is a special quadratic function of the $u_i$, which allows use of the principle in connection with vibration problems, has been stated previously.⁴
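5. A transformation and two inequalities. Useful information may be deduced from a comparison of the values of $I$ for functions $\tau_{ij}$ and $u_i$ which are not solutions of $\delta I = 0$ and for the functions $\tau_{ij}$ and $u_i$ which are determined from $\delta I = 0$. We may designate the solution functions of $\delta I = 0$ by $\tilde\tau_{ij}$ and $\tilde u_i$ and write

$$\tau_{ij} = \tilde\tau_{ij} + \delta\tau_{ij}, \qquad u_i = \tilde u_i + \delta u_i. \tag{21}$$

If we introduce (21) in (9), we shall have

$$I = \int_V \left[(\tilde\gamma_{ij} + \delta\gamma_{ij})(\tilde\tau_{ij} + \delta\tau_{ij}) - \psi(\tilde u + \delta u) - W(\tilde\tau + \delta\tau)\right] dV - \int_{S_p} \chi(\tilde u + \delta u)\,dS - \int_{S_u} \left[(\tilde p_i + \delta p_i)(\tilde u_i + \delta u_i) - \phi(\tilde p + \delta p)\right] dS. \tag{22}$$

⁴ E. Reissner, Note on the method of complementary energy, J. Math. Phys. vol. 27 (1948) pp. 159-160.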
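We shall from now on in this section limit ourselves to the case for which $\psi$, $\chi$, and $\phi$ are linear functions and $W$ is a homogeneous second-degree function. We then have

$$\psi(\tilde u + \delta u) = X_i\tilde u_i + X_i\,\delta u_i, \tag{23a}$$

$$\chi(\tilde u + \delta u) = \bar p_i\tilde u_i + \bar p_i\,\delta u_i, \tag{23b}$$

$$\phi(\tilde p + \delta p) = \bar u_i\tilde p_i + \bar u_i\,\delta p_i, \tag{23c}$$

and

$$W(\tilde\tau + \delta\tau) = W(\tilde\tau) + W_{,\tilde\tau_{ij}}\,\delta\tau_{ij} + W(\delta\tau). \tag{24}$$

We further write

$$I = \tilde I + \delta I + \delta^2 I, \tag{25}$$

where $\tilde I$ is the value of $I$ when $\tau_{ij} = \tilde\tau_{ij}$ and $u_i = \tilde u_i$, where $\delta I$ contains all terms linear in the variations $\delta\tau_{ij}$ and $\delta u_i$, and where $\delta^2 I$ contains all terms of second degree in the variations. Rearrangement of terms in (22) gives us

$$\tilde I = \int_V \left[\tilde\gamma_{ij}\tilde\tau_{ij} - X_i\tilde u_i - W(\tilde\tau)\right] dV - \int_{S_p} \bar p_i\tilde u_i\,dS - \int_{S_u} (\tilde u_i - \bar u_i)\tilde p_i\,dS, \tag{26}$$

$$\delta I = 0 \tag{27}$$

(as it should be), and

$$\delta^2 I = \int_V \left[\delta\gamma_{ij}\,\delta\tau_{ij} - W(\delta\tau)\right] dV - \int_{S_u} \delta p_i\,\delta u_i\,dS. \tag{28}$$

Equations (26) and (28) may be simplified if account is taken of some of the basic relations. Since $W$ is homogeneous of the second degree, we have

$$W(\tilde\tau) = \tfrac{1}{2} W_{,\tilde\tau_{ij}}\tilde\tau_{ij}. \tag{29}$$

Furthermore, $\tilde\gamma_{ij} = W_{,\tilde\tau_{ij}}$, while $\tilde u_i = \bar u_i$ on $S_u$, and $\tilde p_i = \bar p_i$ on $S_p$. Therewith

$$\tilde I = \int_V \left(\tfrac{1}{2}\tilde\gamma_{ij}\tilde\tau_{ij} - X_i\tilde u_i\right) dV - \int_{S_p} \bar p_i\tilde u_i\,dS. \tag{30}$$

We further have

$$\int_V \tilde\gamma_{ij}\tilde\tau_{ij}\,dV = -\int_V \tilde\tau_{ij,j}\tilde u_i\,dV + \int_S \tilde p_i\tilde u_i\,dS, \tag{31}$$

and, since $\tilde\tau_{ij,j} + X_i = 0$, finally

$$\tilde I = -\tfrac{1}{2}\int_V X_i\tilde u_i\,dV - \tfrac{1}{2}\int_{S_p} \bar p_i\tilde u_i\,dS + \tfrac{1}{2}\int_{S_u} \tilde p_i\bar u_i\,dS. \tag{32}$$

In order to transform $\delta^2 I$ as given by (28), we have at our disposal the relations

$$W(\delta\tau) = \tfrac{1}{2} W_{,\delta\tau_{ij}}\,\delta\tau_{ij} \tag{33}$$

and

$$\int_V \delta\gamma_{ij}\,\delta\tau_{ij}\,dV = -\int_V \delta\tau_{ij,j}\,\delta u_i\,dV + \int_S \delta p_i\,\delta u_i\,dS. \tag{34}$$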
It is not immediately apparent in which way to utilize these two facts. However, let us write $\delta^2 I$ in the following two alternate forms:

$$\delta^2 I = \int_V \left[-\delta\tau_{ij,j}\,\delta u_i - W(\delta\tau)\right] dV + \int_{S_p} \delta p_i\,\delta u_i\,dS \tag{35}$$

or

$$\delta^2 I = \int_V \left[(\delta\gamma_{ij} - W_{,\delta\tau_{ij}})\,\delta\tau_{ij} + W(\delta\tau)\right] dV - \int_{S_u} \delta p_i\,\delta u_i\,dS. \tag{36}$$

In general, the quantity $\delta^2 I$ may be made both positive and negative by a suitable choice of the integrands. There are two exceptional cases where this is not so. These cases are given when

$$\delta p_i = 0 \text{ on } S_p \quad\text{and}\quad \delta\tau_{ij,j} = 0 \text{ in } V \tag{37}$$

or when

$$\delta\gamma_{ij} - W_{,\delta\tau_{ij}} = 0 \text{ in } V \quad\text{and}\quad \delta u_i = 0 \text{ on } S_u. \tag{38}$$

We now take account of the fact that the function $W$ is positive-definite. Accordingly, when (37) holds, we have $\delta^2 I < 0$, and when (38) holds, we have $0 < \delta^2 I$. We note that (37) represents the same limitations on variations as those associated with the variational principle for stresses [equation (19)] and that (38) represents the same limitations on variations as those associated with the variational principle for displacements [equation (16)]. We conclude then from (25) and (27) that the following basic inequality holds:

$$I_\tau \le \tilde I \le I_u. \tag{39}$$

Equation (39) confirms the known fact that, in the variational theorem for displacements, one is concerned with a minimum problem and, in the variational theorem for stresses, one is concerned with a maximum problem. In contrast to this, the general variational theorem for stresses and displacements is no more than a stationary-value problem. What is of importance in the present demonstration of this known fact is the explicit way in which both the minimum principle for displacements and the maximum principle for stresses are seen to be direct consequences of a more general principle for stresses and displacements.

MASSACHUSETTS INSTITUTE OF TECHNOLOGY, CAMBRIDGE, MASS.
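The two-sided bound of the preceding section can be illustrated on a small example. The sketch below is not taken from the paper: it assumes a one-dimensional bar on $0 \le x \le 1$ held at both ends ($u = 0$ at $x = 0$, $u = D$ at $x = 1$), with constant body force $X$, $W = \tau^2/2E$, and $U = E\gamma^2/2$. The numbers `E`, `X`, `D` and the one-parameter trial families are hypothetical. Displacement trials satisfy the conditions on $S_u$, and stress trials satisfy equilibrium, so they fall under (38) and (37) respectively.

```python
# Hypothetical data: modulus E, constant body force X, end displacement D.
E, X, D = 2.0, 3.0, 0.5

def simpson(f, n=200):
    """Composite Simpson rule on [0, 1]; exact for the quadratic integrands used here."""
    h = 1.0 / n
    total = f(0.0) + f(1.0)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(k * h)
    return total * h / 3.0

def I_u(a):
    """Functional (17) for the trial u = D*x + a*x*(1 - x); S_p is empty here."""
    u = lambda x: D * x + a * x * (1.0 - x)
    du = lambda x: D + a * (1.0 - 2.0 * x)
    return simpson(lambda x: 0.5 * E * du(x) ** 2 - X * u(x))

def I_tau(c):
    """Functional (20) for the equilibrium trial tau = c + X*(1 - x)."""
    tau = lambda x: c + X * (1.0 - x)
    # Only the end x = 1 contributes to the S_u term, since u_bar = 0 at x = 0.
    return simpson(lambda x: -tau(x) ** 2 / (2.0 * E)) + D * tau(1.0)

# Stationary value for this bar, worked out by hand from the exact solution.
I_exact = 0.5 * E * D ** 2 - 0.5 * X * D - X ** 2 / (24.0 * E)

trial_a = [-1.0, 0.0, 0.5, X / (2.0 * E), 2.0]   # a = X/(2E) is the exact displacement
trial_c = [-2.0, -0.5, 0.0, E * D - X / 2.0, 1.0]  # c = E*D - X/2 is the exact stress

for a, c in zip(trial_a, trial_c):
    print(f"I_tau({c:+.2f}) = {I_tau(c):+.4f}   I_exact = {I_exact:+.4f}   I_u({a:+.2f}) = {I_u(a):+.4f}")
```

Every stress trial gives a value at or below the stationary value, and every displacement trial a value at or above it, with equality only for the exact members of each family.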
VARIATIONAL PRINCIPLES IN THE MATHEMATICAL THEORY OF PLASTICITY¹
BY D. C. DRUCKER
1. Introduction. The fundamental definitions of work hardening and perfect plasticity have been shown to have strong implications with respect to uniqueness of solution for elastic-plastic bodies. It is not surprising, therefore, to find that they lead rather directly to the variational principles as well. Perfect-plasticity theory and both the incrementally linear and the incrementally nonlinear theories for work-hardening materials are considered. The several counterparts of the minimum-potential-energy and the minimum-complementary-energy theorems are derived in a unified manner for stress-strain relations of great generality. Absolute-minimum principles rather than relative ones are established.
There are any number of approaches to the establishment of variational principles. One is to state the principles directly and then proceed to prove them. Although clear and precise statements can be made, the motivation for the original inspiration does not appear. A newcomer to the field then frequently will be unable to appreciate the development and generally will not see how to produce appropriate theorems or modifications of his own. The approach to be followed here does not suppose the result to be known in advance. It is synthetic in a sense, because the basic theorems have been stated and proved for a number of special materials [1-6].² Nevertheless, it is a procedure which arises logically from fundamental postulates in elasticity and in plasticity theory, and it is systematic.

In the theory of elasticity, whether linear or nonlinear, the steps are reasonably straightforward. The equation of virtual work is written first under the implicit assumption of continuity of displacement and what may be termed equilibrium continuity of the stresses (surface tractions must be continuous across any surface, but the normal stress components parallel to the surface may be discontinuous). In a common notation, repeated subscripts indicating summation,

$$\int_A T_i^{*} u_i\,dA + \int_V F_i^{*} u_i\,dV = \int_V \sigma_{ij}^{*}\epsilon_{ij}\,dV. \tag{1}$$
The starred quantities are related through equilibrium, and the unstarred are compatible. There need be no relation between the two sets of quantities. For convenience, the surface area $A$ is divided into the region $A_T$, on which the surface tractions $T_i$ are specified, and the region $A_u$, over which displacements $u_i$ are given. The true and unique solution (no buckling, no initial stress) to the boundary-value problem with given body forces $F_i$ thus satisfies

$$\int_V \sigma_{ij}\epsilon_{ij}\,dV - \int_{A_u} T_i u_i\,dA - \int_{A_T} T_i u_i\,dA - \int_V F_i u_i\,dV = 0. \tag{2}$$

If approximate solutions are sought, two procedures suggest themselves immediately. One is to choose a compatible strain-displacement field $\epsilon_{ij}^{c}$, $u_i^{c}$ and satisfy the boundary conditions on $A_u$. The other is to select an equilibrium stress field $\sigma_{ij}^{E}$ which satisfies the surface-traction boundary conditions on $A_T$. More elaborate mixed schemes may be devised, but they cannot be classed as obvious [3].

¹ The results presented in this paper were obtained in the course of research conducted under Contract Nonr 562(10) between the Office of Naval Research and Brown University.

² Numbers in brackets refer to the bibliography at the end of the paper.
The value of an approximation procedure, or of a guess, must be determined by comparison of the approximate solution with the unknown true answer. The real difficulty and the intuitive heart of the problem lie in the decision on what should be compared. As has been noted, the equation of virtual work will be satisfied if the natural strains and displacements are replaced by any chosen set satisfying compatibility and the boundary conditions on $A_u$. Therefore

$$\int_V \sigma_{ij}\epsilon_{ij}\,dV - \int_{A_T} T_i u_i\,dA - \int_V F_i u_i\,dV = \int_V \sigma_{ij}\epsilon_{ij}^{c}\,dV - \int_{A_T} T_i u_i^{c}\,dA - \int_V F_i u_i^{c}\,dV. \tag{3}$$

Transposing and calling the difference between the true and the assumed solution $\Delta\epsilon_{ij}$, $\Delta u_i$,

$$\int_V \sigma_{ij}\,\Delta\epsilon_{ij}\,dV - \int_{A_T} T_i\,\Delta u_i\,dA - \int_V F_i\,\Delta u_i\,dV = 0. \tag{4}$$

This form suggests strongly a consideration of the elastic-strain energy density written as a function of strain alone,

$$W(\epsilon_{ij}) = \int_0^{\epsilon_{ij}} \sigma_{ij}\,d\epsilon_{ij}, \tag{5}$$

because $dW = (\partial W/\partial\epsilon_{ij})\,d\epsilon_{ij} = \sigma_{ij}\,d\epsilon_{ij}$. Equation (4) can then be restated as

$$\delta_\epsilon\left[\int_V W(\epsilon_{ij}^{c})\,dV - \int_{A_T} T_i u_i^{c}\,dA - \int_V F_i u_i^{c}\,dV\right] = \delta_\epsilon[\mathrm{P.E.}^{c}] = 0, \tag{6}$$

where $\delta_\epsilon$ is to be interpreted as the first variation of the expression in brackets (the potential energy) as $u_i$ and $\epsilon_{ij}$ are varied in accordance with compatibility and the boundary conditions on $u_i$.

A variational principle is not necessarily very helpful in solving problems. The assumed state may not be close to the true one. What is required instead is an absolute-maximum or -minimum principle, a comparison of the value of the potential energy for the assumed state with that of the true state, without restriction on the magnitude of the difference between the states. The presentation here is, however, within the framework of small-displacement theory.
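A comparison may be made with the aid of the identity

$$\int_V W(\epsilon_{ij}^{c})\,dV - \int_{A_T} T_i u_i^{c}\,dA - \int_V F_i u_i^{c}\,dV = \int_V W(\epsilon_{ij}^{t})\,dV - \int_{A_T} T_i u_i^{t}\,dA - \int_V F_i u_i^{t}\,dV + \int_V \left[W(\epsilon_{ij}^{c}) - W(\epsilon_{ij}^{t}) - \sigma_{ij}\,\Delta\epsilon_{ij}\right] dV + \int_V \sigma_{ij}\,\Delta\epsilon_{ij}\,dV - \int_{A_T} T_i\,\Delta u_i\,dA - \int_V F_i\,\Delta u_i\,dV. \tag{7}$$

In view of (4), therefore, the potential energy of any admissible compatible state is algebraically more than the potential energy of the true state by

$$\mathrm{P.E.}^{c} - \mathrm{P.E.}^{t} = \int_V \left[W(\epsilon_{ij}^{c}) - W(\epsilon_{ij}^{t}) - \sigma_{ij}\,\Delta\epsilon_{ij}\right] dV. \tag{8}$$

The integrand may be rewritten as

$$\int_0^{\epsilon_{ij}^{c}} \sigma_{ij}\,d\epsilon_{ij} - \int_0^{\epsilon_{ij}^{t}} \sigma_{ij}\,d\epsilon_{ij} - \sigma_{ij}^{t}\,\Delta\epsilon_{ij} = \int_{\epsilon_{ij}^{t}}^{\epsilon_{ij}^{c}} (\sigma_{ij} - \sigma_{ij}^{t})\,d\epsilon_{ij}. \tag{9}$$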
The rectangles in Fig. 1 symbolize $\sigma_{ij}\,\Delta\epsilon_{ij}$. The shaded triangles represent (9), the integrand of (8). In Fig. 1a to c the triangles are on the positive-strain-energy side of the symbolic stress-strain curves for any magnitude $\Delta\epsilon_{ij}$.

FIG. 1. Potential energy is a minimum for a stable elastic material (a to c). Curves in (d) and (e) are for an unstable material.
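The sign of the integrand (9) can be examined directly for a one-dimensional stress-strain curve. The following minimal sketch is an illustration added here, not the author's computation. The two stress-strain laws are hypothetical: a monotonically rising curve stands in for a stable material, and a curve with a falling branch stands in for an unstable one.

```python
def excess_energy_density(W, sigma, eps_true, eps_trial):
    """Integrand of (8): W(eps_c) - W(eps_t) - sigma_t * (eps_c - eps_t)."""
    return W(eps_trial) - W(eps_true) - sigma(eps_true) * (eps_trial - eps_true)

# Hypothetical stable law: sigma rises monotonically with strain.
stable_sigma = lambda e: e + e ** 3
stable_W = lambda e: 0.5 * e ** 2 + 0.25 * e ** 4      # integral of stable_sigma

# Hypothetical unstable law: sigma falls once e exceeds 1/sqrt(3).
unstable_sigma = lambda e: e - e ** 3
unstable_W = lambda e: 0.5 * e ** 2 - 0.25 * e ** 4    # integral of unstable_sigma

grid = [k / 10.0 for k in range(-20, 21)]

# Smallest value of the integrand over all pairs (true strain, trial strain).
stable_min = min(
    excess_energy_density(stable_W, stable_sigma, et, ec) for et in grid for ec in grid
)
unstable_example = excess_energy_density(unstable_W, unstable_sigma, 1.0, 1.5)

print("stable material, smallest excess:", stable_min)
print("unstable material, excess for eps_t = 1.0, eps_c = 1.5:", unstable_example)
```

For the rising curve the excess never goes negative whatever the size of the strain difference, which is the absolute-minimum property. For the falling curve a trial strain on the descending branch produces a negative excess.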
The potential energy is an absolute minimum for a linear or for a stable nonlinear elastic material. For an unstable material (Fig. 1d and e), the shaded triangles are on the negative side for some $\Delta\epsilon_{ij}$, and the potential energy is not an absolute minimum.

A similar set of steps leads to the principle of minimum complementary energy. Equation (2) is satisfied if the $\sigma_{ij}$, $T_i$, $F_i$ system is replaced by any other in equilibrium. For any state of stress $\sigma_{ij}^{E}$ which satisfies the boundary conditions on $A_T$ and is in equilibrium with $F_i$,

$$\int_V \Delta\sigma_{ij}\,\epsilon_{ij}\,dV - \int_{A_u} \Delta T_i\,u_i\,dA = 0, \tag{10}$$

where $\sigma_{ij}^{E} - \sigma_{ij} = \Delta\sigma_{ij}$ and $\Delta T_i$ is the corresponding change in surface traction on $A_u$. The complementary-energy density as a function of stress alone is suggested by the first integral:

$$\Omega(\sigma_{ij}) = \int_0^{\sigma_{ij}} \epsilon_{ij}\,d\sigma_{ij}, \tag{11}$$

because $d\Omega = (\partial\Omega/\partial\sigma_{ij})\,d\sigma_{ij} = \epsilon_{ij}\,d\sigma_{ij}$. Equation (10) can then be restated as

$$\delta_\sigma\left[\int_V \Omega(\sigma_{ij}^{E})\,dV - \int_{A_u} T_i^{E} u_i\,dA\right] = \delta_\sigma[\mathrm{C.E.}^{E}] = 0, \tag{12}$$

where $\delta_\sigma$ is to be interpreted as the first variation of the expression in brackets (complementary energy) as $\sigma_{ij}$ and $T_i$ are varied in accordance with equilibrium with $F_i$ and the boundary conditions on $A_T$. Figure 2 is symbolic of the fact that the complementary energy is an absolute minimum for a stable material.

FIG. 2. Complementary energy is a minimum for a stable elastic material (a to c). Curves in (d) and (e) are for an unstable material.

Corresponding to (8) and (9),

$$\mathrm{C.E.}^{E} - \mathrm{C.E.}^{t} = \int_V \left[\Omega(\sigma_{ij}^{E}) - \Omega(\sigma_{ij}^{t}) - \epsilon_{ij}\,\Delta\sigma_{ij}\right] dV, \tag{13}$$

where the integrand may be rewritten as

$$\int_{\sigma_{ij}^{t}}^{\sigma_{ij}^{E}} (\epsilon_{ij} - \epsilon_{ij}^{t})\,d\sigma_{ij}. \tag{14}$$
The symbolic representation of the stress and strain tensors by one dimension each in Figs. 1 and 2 and the terms stable and unstable can be given general meaning and made precise.

2. The fundamental postulate for elasticity and plasticity. A basic postulate has been formulated for both elastic and plastic media [4] without time effects. It is essentially a definition of a stable material and may be stated as follows:

No work can be extracted from the material and the system of forces acting upon it.

A more useful statement is in terms of an external agency which applies a set of additional forces to the body under a given load and then removes the added forces. The external agency must do positive work in the application of force. Over the cycle of application and removal the work done by the external agency must be positive if plastic deformation occurs in work-hardening material and will be zero if only elastic changes take place. For a perfectly plastic material, the work done by the external agency may also be zero when plastic deformation takes place, although generally it will be positive.

The basic postulate may be applied to a homogeneous material under homogeneous stress $\sigma_{ij}^{a}$ and strain $\epsilon_{ij}^{a}$. Suppose that the external agency changes the state of stress by $\Delta\sigma_{ij}$ to $\sigma_{ij}^{b}$. The strain will change by $\Delta\epsilon_{ij}$ to $\epsilon_{ij}^{b}$. Then the postulate requires

$$\int_{\epsilon_{ij}^{a}}^{\epsilon_{ij}^{b}} (\sigma_{ij} - \sigma_{ij}^{a})\,d\epsilon_{ij} > 0. \tag{15}$$

The value of the integral is strongly path-dependent in the plastic range but is, of course, independent of path for any elastic material.

3. Absolute-minimum principles in elasticity. The inequality (15) is a formal expression of the requirement that the shaded triangles of Fig. 1 be on the positive-strain-energy side of the stress-strain curve. Although for a nonlinear elastic material $\Delta\epsilon_{ij}$ depends upon $\sigma_{ij}^{a}$ as well as upon $\Delta\sigma_{ij}$, the integral is path-independent. Choosing a straight-line path in stress space from $\sigma_{ij}^{a}$ to $\sigma_{ij}^{b}$, it is obvious that inequality (15) may be continued as

$$0 < \int_{\epsilon_{ij}^{a}}^{\epsilon_{ij}^{b}} (\sigma_{ij} - \sigma_{ij}^{a})\,d\epsilon_{ij} < (\sigma_{ij}^{b} - \sigma_{ij}^{a})(\epsilon_{ij}^{b} - \epsilon_{ij}^{a}) = \Delta\sigma_{ij}\,\Delta\epsilon_{ij}. \tag{16}$$

Also, from

$$(\sigma_{ij}^{b} - \sigma_{ij}^{a})(\epsilon_{ij}^{b} - \epsilon_{ij}^{a}) = \int_a^b d\left[(\sigma_{ij} - \sigma_{ij}^{a})(\epsilon_{ij} - \epsilon_{ij}^{a})\right] = \int_{\epsilon_{ij}^{a}}^{\epsilon_{ij}^{b}} (\sigma_{ij} - \sigma_{ij}^{a})\,d\epsilon_{ij} + \int_{\sigma_{ij}^{a}}^{\sigma_{ij}^{b}} (\epsilon_{ij} - \epsilon_{ij}^{a})\,d\sigma_{ij}, \tag{17}$$

$$0 < \int_{\sigma_{ij}^{a}}^{\sigma_{ij}^{b}} (\epsilon_{ij} - \epsilon_{ij}^{a})\,d\sigma_{ij} < \Delta\sigma_{ij}\,\Delta\epsilon_{ij}. \tag{18}$$

Inequality (18) expresses the requirement that the shaded triangles be as shown in Fig. 2a to c and not as in Fig. 2d and e.
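Inequalities (16) and (18) and the identity (17) are easy to confirm for a single stress component. The sketch below is an added illustration under stated assumptions: the monotonically rising law `sigma` and the end states `eps_a`, `eps_b` are hypothetical, and the stress integral in (18) is evaluated by the substitution $d\sigma = (d\sigma/d\epsilon)\,d\epsilon$.

```python
def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b]; exact for the cubic integrands used here."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0

# Hypothetical stable nonlinear elastic law and its slope.
sigma = lambda e: e + e ** 3
dsigma = lambda e: 1.0 + 3.0 * e ** 2

eps_a, eps_b = 0.2, 0.9                  # hypothetical initial and final strains
sig_a, sig_b = sigma(eps_a), sigma(eps_b)

# Middle term of (16): work of the added stress on the strain change.
work = simpson(lambda e: sigma(e) - sig_a, eps_a, eps_b)

# Middle term of (18), written as an integral over strain.
complementary = simpson(lambda e: (e - eps_a) * dsigma(e), eps_a, eps_b)

product = (sig_b - sig_a) * (eps_b - eps_a)   # Delta sigma * Delta epsilon

print(work, complementary, product)
```

The two integrals are the "triangles" of Figs. 1 and 2. Each is positive and smaller than the rectangle `product`, and by (17) they add up to it.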
Materials of the type of Figs. 1d and e and 2d and e are thus excluded from our consideration, although not necessarily from physical reality. For elastic materials which follow the postulated behavior, comparison of (16) with (9) and of (18) with (14) proves that the potential energy and the complementary energy of the true state are both an absolute minimum:

$$\mathrm{P.E.}^{t} \le \mathrm{P.E.}^{c}, \qquad \mathrm{C.E.}^{t} \le \mathrm{C.E.}^{E}, \tag{19}$$

the equality sign applying only to the trivial case of the admissible state $c$ or $E$ coinciding with the true state.

4. Deformation or total theories of plasticity. If no distinction is made between loading and unloading, or if each point of the body is assumed to be at the maximum load intensity in its history, deformation theories postulate a unique relation between stress and total strain. Although physically unacceptable, in general, because plastic deformation is path-dependent and irreversible, such theories do in some instances lead to very useful results. Under the assumptions mentioned, there is no need to consider deformation theory further, as the theory is indistinguishable from nonlinear elasticity. No matter how elaborate the stress-strain relation, if the material postulated is stable, the principles of minimum potential and minimum complementary energy apply without any change.

If, on the other hand, loading is taken to be nonlinear whereas unloading is assumed to follow a linear elastic relation, the inconsistency of deformation theory becomes of primary importance. The mathematical and physical meaning of solutions then becomes quite obscure.
FIG. 3. Normality of plastic-strain increment (rate).
5. Work-hardening relations involving increments of stress and strain. The fundamental postulate of positive work by an external agency has very far reaching implications. As shown in Fig. 3, the plastic-strain increment or strain-rate vector $\epsilon_{ij}'^{p}$ must be normal to the yield or loading surface at a smooth point and between normals to adjacent points at a corner. At a smooth point

$$\epsilon_{ij}' = C_{ijkl}\sigma_{kl}' + G\,\frac{\partial f}{\partial\sigma_{ij}}\,\frac{\partial f}{\partial\sigma_{kl}}\,\sigma_{kl}', \tag{20}$$

where $C_{ijkl}\sigma_{kl}'$ is the elastic response and $G$ and $f$ are functions of the state of the material, which may include strain and the history of loading as well as the existing state of stress. $G$ may, in addition, be a homogeneous function of order zero in the stress rate $\sigma_{ij}'$. In pictorial terms (Fig. 3), $G$ may depend upon the direction of $\sigma_{ij}'$, but doubling $\sigma_{ij}'$ doubles $\epsilon_{ij}'^{p}$. In all stress-strain relations in use today, $G$ is taken as completely independent of $\sigma_{ij}'$, so that in the form

$$\epsilon_{ij}'^{p} = H_{ijkl}\sigma_{kl}' \tag{21}$$

the $H_{ijkl}$ likewise are independent of $\sigma_{kl}'$.
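The plastic term of (20) can be made concrete with a small computation. The sketch below is illustrative only and rests on assumptions not made in the paper: the loading function is taken as $f = \tfrac{1}{2}s_{ij}s_{ij}$ (with $s_{ij}$ the stress deviator, so that $\partial f/\partial\sigma_{ij} = s_{ij}$), `G` is a hypothetical constant independent of the stress rate, and the stress point is assumed to lie on the loading surface.

```python
def deviator(stress):
    """Stress deviator s_ij = sigma_ij - (sigma_kk / 3) * delta_ij for a 3x3 nested list."""
    mean = (stress[0][0] + stress[1][1] + stress[2][2]) / 3.0
    return [[stress[i][j] - (mean if i == j else 0.0) for j in range(3)] for i in range(3)]

def plastic_strain_rate(stress, stress_rate, G):
    """Plastic part of (20): G * (df/dsigma_ij) * (df/dsigma_kl * sigma'_kl).

    Assumes f = 0.5 * s_ij * s_ij, so that df/dsigma_ij = s_ij.
    The plastic term is set to zero when the stress rate points inward (unloading).
    """
    normal = deviator(stress)
    loading = sum(normal[i][j] * stress_rate[i][j] for i in range(3) for j in range(3))
    if loading <= 0.0:
        return [[0.0] * 3 for _ in range(3)]
    return [[G * normal[i][j] * loading for j in range(3)] for i in range(3)]

def contract(p, q):
    """Double contraction p_ij * q_ij."""
    return sum(p[i][j] * q[i][j] for i in range(3) for j in range(3))

# Hypothetical state of stress (already deviatoric) and hypothetical constant G.
stress = [[100.0, 20.0, 0.0], [20.0, -50.0, 0.0], [0.0, 0.0, -50.0]]
G = 1.0e-4

rate = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
doubled = [[2.0 * rate[i][j] for j in range(3)] for i in range(3)]
reversed_rate = [[-rate[i][j] for j in range(3)] for i in range(3)]

eps_p = plastic_strain_rate(stress, rate, G)
eps_p_doubled = plastic_strain_rate(stress, doubled, G)
eps_p_unloading = plastic_strain_rate(stress, reversed_rate, G)

print("plastic work rate:", contract(rate, eps_p))
```

With `G` constant, doubling the stress rate doubles the plastic strain rate, its direction is that of the normal $\partial f/\partial\sigma_{ij}$ whatever the direction of the stress rate, and reversing the stress rate gives no plastic strain rate at all.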
The coefficients $H_{ijkl}$, which appear similar to the $C_{ijkl}$ of linear anisotropic elasticity, may be horribly complicated functions of the present state and prior history.

If a corner is considered to be a set of intersecting loading surfaces [7-10], each of which makes its independent contribution to the plastic-strain rate (Fig. 4), then

$$\epsilon_{ij}' = C_{ijkl}\sigma_{kl}' + {}^{(1)}H_{ijkl}\sigma_{kl}' + {}^{(2)}H_{ijkl}\sigma_{kl}' + \cdots + {}^{(m)}H_{ijkl}\sigma_{kl}' = (C_{ijkl} + B_{ijkl})\sigma_{kl}'. \tag{22}$$

It is important to keep in mind that the coefficients ${}^{(m)}H_{ijkl}$ are to be taken as zero unless $\sigma_{ij}'$ has an outward-pointing normal component, just as in (20), where the plastic term must be chosen as zero if unloading takes place. Stress-rate vectors having different directions often will activate different loading surfaces, so that, despite the apparent linearity of (22), it is not true in general that the strain rate produced by two stress rates acting simultaneously is the sum of the strain rates for each individually. In fact, at a corner the combination of two stress rates, each of which individually produces plastic action, may result in an unloading (Fig. 5). Even when the ${}^{(m)}H_{ijkl}$ are independent of the stress rate, the $B_{ijkl}$ will be functions of the stress rate (order zero).

FIG. 4. A corner as an intersection of two or more loading surfaces. Two only are drawn in (a), but there may be infinitely many, as symbolized in (b).

FIG. 5. Two loadings may combine to an unloading at a corner.

The basic postulate requires the time rate of work by the external agency to be positive; therefore

$$\tfrac{1}{2}\sigma_{ij}'\epsilon_{ij}' = \tfrac{1}{2}(C_{ijkl} + B_{ijkl})\sigma_{ij}'\sigma_{kl}' > 0, \tag{23}$$

and because the elastic component is recoverable,

$$\tfrac{1}{2}\sigma_{ij}'\epsilon_{ij}'^{p} = \tfrac{1}{2}B_{ijkl}\sigma_{ij}'\sigma_{kl}' > 0 \tag{24}$$

for a work-hardening material, unless $\epsilon_{ij}'^{p} = 0$.
In the demonstration of the uniqueness theorem for stress and strain rates [7-10], the entire point lies in the proof of

$$({}^{a}\sigma_{ij}' - {}^{b}\sigma_{ij}')({}^{a}\epsilon_{ij}' - {}^{b}\epsilon_{ij}') > 0 \qquad (a \ne b), \tag{25}$$

where $a$ and $b$ are two assumed solutions for the rates from the stress point $\sigma_{ij}$. If an infinitesimal time, arbitrarily chosen as unity, is permitted to elapse, the two stress states are $\sigma_{ij} + {}^{a}\sigma_{ij}'$ and $\sigma_{ij} + {}^{b}\sigma_{ij}'$ (Fig. 6).

FIG. 6. Permissible paths (panels a to d).

At a smooth point of the loading surface it is possible to go from stress point $b$ to stress point $a$ (Fig. 6a and b) or from stress point $a$ to $b$ (Fig. 6a and c) and change the strain by ${}^{a}\epsilon_{ij}' - {}^{b}\epsilon_{ij}'$ or ${}^{b}\epsilon_{ij}' - {}^{a}\epsilon_{ij}'$, respectively, in accord with (20) and (23). The work postulate for the $b$-to-$a$ case then gives [see (9)]

$$\int_{b}^{a} (\sigma_{ij}' - {}^{b}\sigma_{ij}')\,d\epsilon_{ij}' > 0. \tag{26}$$
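The result (25) therefore is established [7], but the value of the integral itself is of interest here.

$$0 < \tfrac{1}{2}\,{}^{a}\sigma_{ij}'\,{}^{a}\epsilon_{ij}' + \tfrac{1}{2}\,{}^{b}\sigma_{ij}'\,{}^{b}\epsilon_{ij}' - {}^{b}\sigma_{ij}'\,{}^{a}\epsilon_{ij}' < ({}^{a}\sigma_{ij}' - {}^{b}\sigma_{ij}')({}^{a}\epsilon_{ij}' - {}^{b}\epsilon_{ij}'), \tag{27}$$

so that, as for (18),

$$0 < \int_{b}^{a} (\epsilon_{ij}' - {}^{b}\epsilon_{ij}')\,d\sigma_{ij}' = \tfrac{1}{2}\,{}^{a}\sigma_{ij}'\,{}^{a}\epsilon_{ij}' + \tfrac{1}{2}\,{}^{b}\sigma_{ij}'\,{}^{b}\epsilon_{ij}' - {}^{a}\sigma_{ij}'\,{}^{b}\epsilon_{ij}'. \tag{28}$$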
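The expressions in (27) and (28) can be evaluated for a simple linear case. The following minimal sketch is an added illustration under restrictive assumptions: both rate solutions are taken to activate the same loading surfaces at a smooth point, so that one symmetric positive-definite matrix `M` (a hypothetical stand-in for $C_{ijkl} + B_{ijkl}$, reduced to two stress-rate components) relates stress rate to strain rate. The rate vectors `sa` and `sb` are likewise hypothetical.

```python
# Hypothetical symmetric positive-definite compliance for two stress-rate components.
M = [[2.0, 0.5],
     [0.5, 1.0]]

def strain_rate(s):
    """Linear rate relation eps' = M * sigma' (both solutions on the same branch)."""
    return [sum(M[i][k] * s[k] for k in range(2)) for i in range(2)]

def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

# Two hypothetical assumed solutions a and b for the stress rates.
sa, sb = [1.0, -2.0], [0.5, 3.0]
ea, eb = strain_rate(sa), strain_rate(sb)

value_27 = 0.5 * dot(sa, ea) + 0.5 * dot(sb, eb) - dot(sb, ea)
value_28 = 0.5 * dot(sa, ea) + 0.5 * dot(sb, eb) - dot(sa, eb)

diff_s = [x - y for x, y in zip(sa, sb)]
diff_e = [x - y for x, y in zip(ea, eb)]
product_25 = dot(diff_s, diff_e)

print(value_27, value_28, product_25)
```

Both values are positive and their sum equals the product in (25), so each is smaller than that product. With a symmetric `M` the two values coincide. At a corner, where different surfaces may be active for `a` and `b`, this simple linear check no longer applies as written.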
For the a-to-b path, (27) and (28) merely interchange, and the result is therefore unaffected. When there are two or more loading surfaces at a corner (Fig. 6d), the permissible path may be from b to a for one set of plastic-strain rates and from a to b for another. As both (27) and (28) apply to each path, with the elasticstrain rates being counted only once, it is clear that (27) and (28) apply just as they are, even for this very complicated case. 6. Two minimum theorems for incremental work-hardening theories. In general, theorems which hold in the elastic range cannot be expected to apply in the plastic. As a consequence of irreversibility, the uniqueness theorem for work-hardening theories of plasticity is in terms of the increments or rates of stress and strain and not of the stresses and strains themselves. The equivalent variational or minimum principles likewise will be in terms of rates. Following the procedure established previously, the principle of virtual work is written for the rates (29)
dA - fAT
fv 0'/y*E2 dV - fAu
Ti*uv dA - fv Fv*u dV = 0.
Equation (29) applies to the actual rates, as it does for any compatible strain-rate distribution and any stress-rate field in equilibrium. When the true stress rates are varied under the restriction that they satisfy equilibrium and the boundary conditions on $A_T$ ($\Delta T'_i = 0$), the analogue to (10) is

$$\int_V \Delta\sigma'_{ij}\,\epsilon'_{ij}\,dV - \int_{A_u} \Delta T'_i\,u'_i\,dA = 0. \tag{30}$$
The analogue to complementary-energy density (11) is simply

$$\omega(\sigma'_{ij}) = \tfrac12\,\sigma'_{ij}\epsilon'_{ij}, \tag{31}$$

corresponding to linear elasticity, because the stress-strain relations are time-independent, as exhibited by (21). The form suggested for a variational principle similar to (12) is

$$\delta_{\sigma'}\left[\int_V \omega(\sigma'_{ij})\,dV - \int_{A_u} T'_i u'_i\,dA\right] = 0. \tag{32}$$
This complementary rate principle will be valid whenever the $H_{ijkl}$ of (21) are independent of stress rate. All currently used forms (20) for smooth loading surfaces are in this category. At a corner, however, as previously explained, the $H_{ijkl}$ of (22) which are nonzero depend upon the direction of $\sigma'_{ij}$. Therefore, $\delta_{\sigma'} H_{ijkl} \neq 0$ for some or all directions of loading, and (32) is not valid.
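A complementary rate minimum principle in a form equivalent to (19) would be much more valuable:

$$\int_V \omega(\sigma'_{ij})\,dV - \int_{A_u} T'_i u'_i\,dA < \int_V \omega(\sigma^{\prime E}_{ij})\,dV - \int_{A_u} T^{\prime E}_i u'_i\,dA. \tag{33}$$

The right-hand side is algebraically larger than the left by

$$\int_V \left[\omega(\sigma^{\prime E}_{ij}) - \omega(\sigma'_{ij})\right] dV - \int_{A_u} \left(T^{\prime E}_i - T'_i\right) u'_i\,dA = \int_V \left[\tfrac12\left(\sigma^{\prime E}_{ij}\epsilon^{\prime E}_{ij} - \sigma'_{ij}\epsilon'_{ij}\right) - \left(\sigma^{\prime E}_{ij} - \sigma'_{ij}\right)\epsilon'_{ij}\right] dV, \tag{34}$$

where $\epsilon^{\prime E}_{ij}$ is computed from $\sigma^{\prime E}_{ij}$ [equation (22)]. The integrand of (34) may be rewritten as

$$\tfrac12\,\sigma^{\prime E}_{ij}\epsilon^{\prime E}_{ij} + \tfrac12\,\sigma'_{ij}\epsilon'_{ij} - \sigma^{\prime E}_{ij}\epsilon'_{ij}. \tag{35}$$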
Comparison with (28) shows that the fundamental work postulate requires (35) to be positive and thus guarantees the minimum principle (33), although (32) does not apply.

If now the strain rates are varied in the virtual-work expression for the true rate state while the condition $\Delta u'_i = 0$ on $A_u$ is satisfied, the equivalent of (4) is

$$\int_V \sigma'_{ij}\,\Delta\epsilon'_{ij}\,dV - \int_{A_T} T'_i\,\Delta u'_i\,dA - \int_V F'_i\,\Delta u'_i\,dV = 0. \tag{36}$$

The analogue to strain-energy density (5) would be

$$w(\epsilon'_{ij}) = \tfrac12\,\sigma'_{ij}\epsilon'_{ij} \tag{37}$$

if (22) can be inverted as

$$\sigma'_{ij} = A_{ijkl}\,\epsilon'_{kl}. \tag{38}$$

Such inversion is not possible if the material is incompressible in either the elastic or the elastic-plastic range. This difficulty can be circumvented for such materials by solving for the stress-rate deviation

$$s'_{ij} = a_{ijkl}\,\epsilon'_{kl} \tag{39}$$

and writing

$$w(\epsilon'_{ij}) = \tfrac12\,s'_{ij}\epsilon'_{ij}. \tag{40}$$
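It can be shown, then, that a minimum-potential rate principle holds:

$$\int_V w(\epsilon'_{ij})\,dV - \int_{A_T} T'_i u'_i\,dA - \int_V F'_i u'_i\,dV < \int_V w(\epsilon^{\prime c}_{ij})\,dV - \int_{A_T} T'_i u^{\prime c}_i\,dA - \int_V F'_i u^{\prime c}_i\,dV. \tag{41}$$

The right-hand side exceeds the left by

$$\int_V \left[w(\epsilon^{\prime c}_{ij}) - w(\epsilon'_{ij})\right] dV - \int_V \sigma'_{ij}\left(\epsilon^{\prime c}_{ij} - \epsilon'_{ij}\right) dV \tag{42}$$

when the surface-traction-rate and body-force-rate integrals are replaced by volume integrals of stress rates. The sum of the integrands is

$$\tfrac12\,\sigma^{\prime c}_{ij}\epsilon^{\prime c}_{ij} + \tfrac12\,\sigma'_{ij}\epsilon'_{ij} - \sigma'_{ij}\epsilon^{\prime c}_{ij}. \tag{43}$$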
Comparison of (43) with (27) proves (41) and once again demonstrates that the minimum principles follow from the fundamental postulate or definition of work hardening.

Incrementally nonlinear stress-strain relations have not been studied in any detail. If G in (20) or $H_{ijkl}$ in (21) depends on the direction of $\sigma'_{ij}$, the usual proof of the uniqueness theorem breaks down. On the other hand, if uniqueness is assured, the basic postulate in the form of (26) or the first inequality (28) will ensure the validity of the absolute-minimum principles (33) and (41), corresponding to complementary and potential energy. A simple assumption which leads to uniqueness is that the plastic-strain rate is a monotonically increasing function of the normal component of $\sigma'_{ij}$. Although apparently a very reasonable postulate, as a consequence of the proportionality between rates of stress and strain for a given direction of $\sigma'_{ij}$, it is equivalent, unfortunately, to a reduction to linearity.

7. Restricted minimum theorems. The complete parallelism of the developments of the minimum principles for the elastic and for the elastic-plastic cases may have blurred the basic approach. There are but a few independent combinations possible for stress, stress rate, strain, and strain rate, and it is worthwhile running through some of them. Since uniqueness of rates only is guaranteed, it is not likely that any new theorems of true generality will result for work-hardening materials.

Suppose a theorem is desired for the plastic range which contains the stresses and the strains themselves. Equations (2) to (4) and (10) are written exactly as for elastic bodies. Again (5) and (11) would be suggested by the form of (4) and (10). Now, however, the path of loading is important, and it is not true in general that $\int_0^{\epsilon_{ij}} \sigma_{ij}\,d\epsilon_{ij} = W$ is a function of final strain only, nor is $\int_0^{\sigma_{ij}} \epsilon_{ij}\,d\sigma_{ij}$ a function of stress only. Nevertheless, if at each point of the material there has been no unloading from any of the loading surfaces, the irreversibility is not apparent to the material. In this very limited sense, minimum-complementary-energy and minimum-potential-energy theorems hold for a very restricted and yet possibly useful class of alternative admissible states [11].

Next suppose that a theorem is desired for strain rates and stress. Virtual work is then written in the form

$$\int_V \sigma_{ij}\epsilon'_{ij}\,dV - \int_{A_u} T_i u'_i\,dA - \int_{A_T} T_i u'_i\,dA - \int_V F_i u'_i\,dV = 0. \tag{44}$$
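Varying the strain-rate system from the true state without changing the displacement-rate boundary conditions leads to

$$\int_V \sigma_{ij}\,\Delta\epsilon'_{ij}\,dV - \int_{A_T} T_i\,\Delta u'_i\,dA - \int_V F_i\,\Delta u'_i\,dV = 0. \tag{45}$$

The variational principle suggested is

$$\delta_{\epsilon'}\left[\int_V \varphi(\epsilon'_{ij})\,dV - \int_{A_T} T_i u'_i\,dA - \int_V F_i u'_i\,dV\right] = 0, \tag{46}$$

where

$$\frac{\partial\varphi}{\partial\epsilon'_{ij}} = \sigma_{ij}. \tag{47}$$

Such a principle can have meaning only if $\epsilon'_{ij}$ determines $\sigma_{ij}$ or at least $\sigma_{ij}\epsilon'_{ij}$. As this will not be true in general, (46) can be valid for a restricted set of loading paths or special materials at most. Varying the equilibrium system in the familiar manner results in

$$\delta_{\sigma}\left[\int_V \psi(\sigma_{ij})\,dV - \int_{A_u} T_i u'_i\,dA\right] = 0, \tag{48}$$

which has meaning only if $\sigma_{ij}$ determines $\epsilon'_{ij}$ or at least $\sigma_{ij}\epsilon'_{ij}$, so that

$$\frac{\partial\psi}{\partial\sigma_{ij}} = \epsilon'_{ij}. \tag{49}$$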
Therefore, (48) cannot apply to a work-hardening material.

Another possible set of theorems relates stress rates and strain. The corresponding equations then are found by interchanging primes and no primes in (44) to (49), and the end results are

$$\delta_{\epsilon}\left[\int_V \Phi(\epsilon_{ij})\,dV - \int_{A_T} T'_i u_i\,dA - \int_V F'_i u_i\,dV\right] = 0, \tag{50}$$

where (and only if $\epsilon_{ij}$ determines $\sigma'_{ij}$)

$$\frac{\partial\Phi}{\partial\epsilon_{ij}} = \sigma'_{ij}, \tag{51}$$

and

$$\delta_{\sigma'}\left[\int_V \Psi(\sigma'_{ij})\,dV - \int_{A_u} T'_i u_i\,dA\right] = 0, \tag{52}$$

where (and only if $\sigma'_{ij}$ determines $\epsilon_{ij}$)

$$\frac{\partial\Psi}{\partial\sigma'_{ij}} = \epsilon_{ij}. \tag{53}$$

Such theorems as these are therefore inappropriate for work-hardening materials as defined. They will have limited validity at least for materials whose state is described by surfaces in strain space rather than in stress space [12,13].
8. Elastic-perfectly plastic material. A perfectly plastic material may be defined directly or may equally well be considered as the limiting case of a work-hardening material for which all subsequent loading surfaces coincide with the initial yield surface, f = k, bounding purely elastic action. Unlimited plastic deformation may occur at yield. As no stress increment is required for flow at yield, the work done by an external agency may be zero when plastic deformation takes place. The plastic-strain-rate vector is normal to the yield surface in the extended sense (Fig. 3). The stress-strain relation at a smooth point on the yield surface is

$$\epsilon'_{ij} = \epsilon^{\prime e}_{ij} + \epsilon^{\prime p}_{ij} = C_{ijkl}\,\sigma'_{kl} + \lambda\,\frac{\partial f}{\partial\sigma_{ij}} \tag{54}$$

and at a corner

$$\epsilon'_{ij} = C_{ijkl}\,\sigma'_{kl} + \lambda_1\,\frac{\partial f_1}{\partial\sigma_{ij}} + \lambda_2\,\frac{\partial f_2}{\partial\sigma_{ij}} + \cdots, \tag{55}$$

where each (fixed) intersecting yield surface makes its own contribution. The $\lambda$ are homogeneous of degree -1 in time, because the stress-strain relations are independent of time. Each $\lambda$ is to be taken as zero unless the stress point is on its yield surface and remains there in the interval under consideration. They are otherwise indeterminate for a homogeneous state of stress.

As a consequence of a fixed yield surface and normality,

$$\sigma'_{ij}\,\epsilon^{\prime p}_{ij} = 0 \tag{56}$$

for all permissible $\sigma'_{ij}$. Therefore

$$2\omega_e(\sigma'_{ij}) = \sigma'_{ij}\epsilon'_{ij} = C_{ijkl}\,\sigma'_{ij}\sigma'_{kl} > 0. \tag{57}$$

Also, the total-strain-rate vector is resolved uniquely into an elastic and a plastic component at each stress point on the yield surface. At an interior point, the strain rate is, of course, purely elastic. Therefore, when the existing stress is known and the strain rate is given, the stress rate is determined except, again, for an incompressible material, in which the stress deviation rate is determined [as in (38) and (39)], and

$$2w_p(\epsilon'_{ij}) = \sigma'_{ij}\epsilon'_{ij} = \sigma'_{ij}\epsilon^{\prime e}_{ij} = A_{ijkl}\,\epsilon'_{ij}\epsilon'_{kl} > 0. \tag{58}$$
All the equations (26) to (43) are valid, therefore, with $\omega_e$ replacing $\omega$ and $w_p$ replacing $w$. The discussion of what happens at a corner is simplified for a perfectly plastic material. If the strain-rate vector points outward from the yield surface and lies between the normals to adjacent points (see Fig. 3), it is purely plastic, and the stress rate is identically zero. If the strain-rate vector points outward but has a component tangent to the adjacent surface, that portion of the surface governs, and the stress point is then effectively at a smooth point on the yield surface.

It is not surprising that a minimum-complementary rate principle and a minimum-potential rate principle apply. As has been stated, a perfectly plastic material is a limiting case of a work-hardening one as all successive loading surfaces approach the initial yield surface. The lack of limitation on $\epsilon^{\prime p}_{ij}$ for a homogeneous state of stress does not matter because, at a given stress point, $\sigma'_{ij}\epsilon^{\prime p}_{ij}$ is zero when the strain rate and the stress rate are related and is zero or negative when they represent two independent states.
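The inequality that underlies both rate principles can be checked numerically in the simplest setting. The sketch below assumes an incrementally linear law $\epsilon' = C\sigma'$ with a symmetric positive-definite compliance $C$, and stores stress rates as six-component vectors; both choices, and the helper names `quad`, `excess`, and `make_spd`, are illustrative assumptions and not part of the paper. Under those assumptions the integrand of the form (35), $\tfrac12\sigma^{\prime E}\epsilon^{\prime E} + \tfrac12\sigma'\epsilon' - \sigma^{\prime E}\epsilon'$, reduces to $\tfrac12(\sigma^{\prime E} - \sigma')\,C\,(\sigma^{\prime E} - \sigma')$, which cannot be negative.

```python
import random


def quad(C, a, b):
    """Return the bilinear form a . C b for a matrix C given as nested lists."""
    n = len(a)
    return sum(a[i] * C[i][j] * b[j] for i in range(n) for j in range(n))


def excess(C, s_true, s_trial):
    """Integrand of the form (35), assuming eps' = C sigma' (hypothetical linear law).

    1/2 sE.eE + 1/2 s.e - sE.e, with eE = C sE and e = C s.
    """
    return (0.5 * quad(C, s_trial, s_trial)
            + 0.5 * quad(C, s_true, s_true)
            - quad(C, s_trial, s_true))


def make_spd(n, rng):
    """Build a symmetric positive-definite matrix as M^T M + I (illustrative only)."""
    M = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    return [[sum(M[k][i] * M[k][j] for k in range(n)) + (1.0 if i == j else 0.0)
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    C = make_spd(6, rng)
    s = [rng.uniform(-1.0, 1.0) for _ in range(6)]      # "true" stress rate
    sE = [rng.uniform(-1.0, 1.0) for _ in range(6)]     # trial equilibrium stress rate
    diff = [x - y for x, y in zip(sE, s)]
    print(excess(C, s, sE), 0.5 * quad(C, diff, diff))  # the two values agree
```

The check covers only the smooth, incrementally linear case; at a corner the coefficients depend on the direction of the stress rate and the work postulate itself, rather than this quadratic identity, supplies the sign.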
Additional theorems of some generality would be expected for a perfectly plastic material, because the yield surface does not depend upon loading. The plastic-strain-rate vector is normal to the surface and so determines the stress point itself, or at worst a straight line or plane of the surface (Fig. 7). Actually, $\epsilon^{\prime p}_{ij}$ determines the rate of dissipation, $\sigma_{ij}\epsilon^{\prime p}_{ij}$, uniquely. The total-strain-rate vector by itself does not provide any such information for an elastic-plastic material. However, for a plastic-rigid material or for an elastic-plastic material at the limit load [14], the elastic-strain rates are identically zero. The total-strain rates are then plastic only and do determine the dissipation and the stress to a considerable extent.

FIG. 7. The plastic-strain-rate vector determines the rate of dissipation of energy $\sigma_{ij}\epsilon^{\prime p}_{ij}$ and the stress point $\sigma_{ij}$ itself, except at a flat spot.
Under the restriction of zero elastic-strain rate, following steps (44) to (47), $\epsilon'_{ij}$ does determine $\sigma_{ij}\epsilon'_{ij} = \varphi$, and the principle (46) is established. Following steps (48) and (49), $\Delta\sigma_{ij}\,\epsilon'_{ij}$ is now zero, so that $\Delta\psi = 0$ and (48) applies. Absolute-minimum principles can be established which are generalizations of the Markov [15] and Hill [16] principles and are in fact equivalent to the limit theorems [14]. The minimum principle for (46) is

$$\int_V \varphi(\epsilon'_{ij})\,dV - \int_{A_T} T_i u'_i\,dA - \int_V F_i u'_i\,dV \le \int_V \varphi(\epsilon^{\prime c}_{ij})\,dV - \int_{A_T} T_i u^{\prime c}_i\,dA - \int_V F_i u^{\prime c}_i\,dV, \tag{59}$$

where $\epsilon^{\prime c}_{ij}$, $u^{\prime c}_i$ is any compatible system taken as plastic only and where $\varphi$ is the dissipation function. The right-hand side is algebraically greater than the left by the volume integral of

$$\varphi(\epsilon^{\prime c}_{ij}) - \varphi(\epsilon'_{ij}) - \sigma_{ij}\left(\epsilon^{\prime c}_{ij} - \epsilon'_{ij}\right) = \sigma^{c}_{ij}\epsilon^{\prime c}_{ij} - \sigma_{ij}\epsilon^{\prime c}_{ij}, \tag{60}$$

which is positive or zero in accordance with the basic work postulate [4], or equally well from the convexity of the yield surface and the normality of the plastic-strain-rate vector, which themselves are consequences of the postulate.
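The sign of an expression of the form (60) can be illustrated with a deliberately simple yield surface. The sketch below assumes a circular yield surface of radius `K` in a two-dimensional stress space, so that the stress associated with a plastic-strain rate by normality is `K` times the unit vector along that rate and the dissipation is `K` times its magnitude. The circle, the two-dimensional space, and the function names are hypothetical choices made only for illustration; the paper places no such restriction on the yield surface.

```python
import math

K = 1.0  # hypothetical yield radius of the circular yield surface


def dissipation(eps_rate, k=K):
    """Dissipation function for the circle: k * |eps'| (maximum of sigma . eps' over |sigma| <= k)."""
    return k * math.hypot(eps_rate[0], eps_rate[1])


def stress_from_rate(eps_rate, k=K):
    """Stress point on the yield circle whose outward normal is along eps' (normality)."""
    n = math.hypot(eps_rate[0], eps_rate[1])
    return (k * eps_rate[0] / n, k * eps_rate[1] / n)


def excess_60(eps_true, eps_c, k=K):
    """Expression of the form (60): (sigma^c - sigma) . eps'^c for the circular surface."""
    s_true = stress_from_rate(eps_true, k)
    s_c = stress_from_rate(eps_c, k)
    return sum((sc - st) * e for sc, st, e in zip(s_c, s_true, eps_c))


if __name__ == "__main__":
    # Perpendicular trial rate: the excess equals the full trial dissipation.
    print(excess_60((1.0, 0.0), (0.0, 2.0)), dissipation((0.0, 2.0)))
    # Trial rate parallel to the true rate: the excess vanishes.
    print(excess_60((1.0, 0.0), (3.0, 0.0)))
```

For the circle the result is just the Cauchy-Schwarz inequality; for a general convex yield surface the same sign follows from convexity and normality, as stated in the text.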
The upper-bound limit theorem [14] may be obtained from (59) by observing that, if $u'_i$ vanishes on $A_u$, the left-hand side is zero from virtual work, and

$$\int_{A_T} T_i u^{\prime c}_i\,dA + \int_V F_i u^{\prime c}_i\,dV \le \int_V \varphi(\epsilon^{\prime c}_{ij})\,dV. \tag{61}$$

The minimum principle corresponding to (48) may be written as a maximum principle by multiplying through by -1:

$$\int_{A_u} T_i u'_i\,dA \ge \int_{A_u} T^{E}_i u'_i\,dA. \tag{62}$$

Proof follows directly from virtual work, as the left-hand side exceeds the right by the volume integral of

$$\left(\sigma_{ij} - \sigma^{E}_{ij}\right)\epsilon'_{ij}, \tag{63}$$

which is positive or zero just as (60) was. It is (63) which is the key in the lower-bound theorem of limit analysis [14].
The yield value and the yield criterion as well need not be the same at each point of the material, since nowhere in the proofs do such restrictions appear. Therefore (59) and (62) apply equally well to a rigid work-hardening material at each stage of loading. As mentioned in the previous section, theorems involving total strain are not appropriate for the types of materials postulated in the paper [13]; hence the list of simple minimum principles seems exhausted.

9. Conclusion. A systematic procedure is presented for establishing variational and minimum principles. The virtual-work expression is written in terms of the quantities for which a theorem is sought, and a variation is tried which suggests a possible principle. Use is then made of a basic postulate for stable materials without time effects, which had been formulated previously [4]: in the very strictest sense, work cannot be extracted from the stressed material and the system of forces acting upon it. Substitution of the relation between the quantities which is given directly by the fundamental postulate provides immediate proof of the valid absolute-minimum principles.

Minimum-potential-energy and minimum-complementary-energy theorems (19) are established for linear and nonlinear elastic bodies and for deformation theories of plasticity by virtual work (2), the appropriate variation (4) or (10), and inequalities (16) or (18), as given by the postulate.

Corresponding theorems [2,5] in which rates replace total quantities (41) and (33) are established for work-hardening and for perfectly plastic materials by virtual work (29), the appropriate variation (36) or (30), and inequalities (27) or (28) given by the postulate. All incrementally linear and the most complicated combinations of incrementally linear forms (Fig. 4) are included.

Extended theorems (59) and (62) involving stress quantities and strain and displacement rates [15,16] are established for rigid-perfectly plastic materials or elastic-perfectly plastic materials at limit loading or collapse. The steps are virtual work (44), the variations (45) or (48), and inequalities given by the postulate, as indicated following (60) or (63).

BIBLIOGRAPHY
1. W. Prager and J. L. Synge, Approximations in elasticity based on the concept of function space, Quart. Appl. Math. vol. 5 (1947) pp. 241-269. J. B. Diaz and H. J. Greenberg, Upper and lower bounds for the solution of the first boundary value problem of elasticity, Quart. Appl. Math. vol. 6 (1948) pp. 326-331.
2. H. J. Greenberg, On the variational principles of plasticity, ONR Rep. All-S4 (March, 1949) Brown University.
3. E. Reissner, On a variational theorem in elasticity, J. Math. Phys. vol. 29 no. 2 (July, 1950) pp. 90-95. K. Washizu, On the variational principles of elasticity and plasticity, ONR Rep. 2S-18 (March, 1955) Massachusetts Institute of Technology.
4. D. C. Drucker, Some implications of work-hardening and ideal plasticity, Quart. Appl. Math. vol. 7 no. 4 (January, 1950) pp. 411-418. A more fundamental approach to plastic stress-strain relations, Proc. 1st U.S. Nat. Congress Appl. Mech. ASME (1951) pp. 487-491.
5. R. Hill, The mathematical theory of plasticity, Clarendon Press, Oxford, 1950.
6. W. Prager and P. G. Hodge, Jr., Theory of perfectly plastic solids, John Wiley & Sons, Inc., New York, 1951.
7. D. C. Drucker, On uniqueness in the theory of plasticity, Quart. Appl. Math. vol. 14 no. 1 (April, 1956) pp. 35-42.
8. W. Koiter, Stress-strain relations, uniqueness and variational theorems for elastic-plastic materials with a singular yield surface, Quart. Appl. Math. vol. 11 (1953) pp. 350-353.
9. J. L. Sanders, Plastic stress-strain relations based on infinitely many plane loading surfaces, Proc. 2d U.S. Nat. Congress Appl. Mech. ASME (1954) pp. 455-460.
10. B. Budiansky, Fundamental theorems and consequences of the slip theory of plasticity, Ph.D. Thesis, Brown University, 1950.
11. P. G. Hodge, Jr., Minimum principles of piecewise linear isotropic plasticity, ONR Rep. 298 (August, 1955) Polytechnic Institute of Brooklyn.
12. D. Trifan, A minimum principle of plasticity, Quart. Appl. Math. vol. 13 (1955) pp. 337-339.
13. W. Prager, On limiting states of deformation, ONR Report C11-9 (February, 1956) Brown University.
14. D. C. Drucker, W. Prager, and H. J. Greenberg, Extended limit design theorems for continuous media, Quart. Appl. Math. vol. 9 no. 4 (January, 1952) pp. 381-389.
15. A. A. Markov, On variational principles in the theory of plasticity, Prikladnaia Mat. Mek. vol. 11 (1947) pp. 339-350.
16. R. Hill, A variational principle of maximum plastic work in classical plasticity, Quart. J. Mech. Appl. Math. vol. 1 (1948) pp. 18-28.

BROWN UNIVERSITY,
PROVIDENCE, R.I.
DISCUSSION OF D. C. DRUCKER'S PAPER "VARIATIONAL PRINCIPLES IN THE MATHEMATICAL THEORY OF PLASTICITY" BY
P. G. HODGE, JR.
In his most interesting paper Professor Drucker has mentioned the fact that he is concerned with minimum principles rather than with variational principles. In many branches of physical science, the distinction between these two viewpoints need not be made, since the unique point of zero variation is a minimum. Therefore, it is perhaps of interest to comment more fully on this difference, and to give an example of how it enters into applications.
FIG. 1. Points of zero first variation (A and B) and minimum points (A and C).
Figure 1 illustrates the difference between minimum points and points of zero variation. The point marked A is a minimum point of zero variation. The points marked B are of zero variation but are not minima. Such points are of interest in elastic stability problems. Finally, points C are minimum points where the first variation does not vanish. It is with such points that the minimum principles of plasticity are concerned.

As an elementary example of the importance of this fact, let us consider the bending of a simply supported circular plate under a uniform load. The plate is assumed to be made of a rigid-plastic material which satisfies a certain type of integrated flow law. Let $M_r$ and $M_\varphi$ represent the dimensionless radial and circumferential bending moments, respectively, and $K_r$ and $K_\varphi$ the corresponding curvatures. The relevant portions of the stress-strain law may then be written as follows:

If $M_r < 1$, $M_\varphi < 1$, then
$$K_r = 0, \qquad K_\varphi = 0. \tag{1a}$$

If $0 \le M_\varphi < M_r$, $M_r \ge 1$, then
$$M_r - 1 = cK_r, \qquad K_\varphi = 0. \tag{1b}$$

If $M_\varphi > 1$, $0 \le M_r < M_\varphi$, then
$$K_r = 0, \qquad M_\varphi - 1 = cK_\varphi. \tag{1c}$$

If $M_\varphi = M_r \ge 1$, then
$$M_\varphi - 1 = M_r - 1 = c(K_r + K_\varphi), \qquad K_r \ge 0, \qquad K_\varphi \ge 0, \tag{1d}$$

where c is a constant. The four situations expressed by equations (1a) to (1d) are illustrated in Fig. 2. The small circles represent the point with coordinates $M_r$, $M_\varphi$, and the arrows are vectors with coordinates $K_r$, $K_\varphi$.
FIG. 2. Stress-strain law.
The total-potential-energy function for a plate of radius 1 is proportional to

$$\Pi = \int_0^1 \left[K_r(M_r + 1) + K_\varphi(M_\varphi + 1) - 12PW\right] r\,dr, \tag{2}$$

where W is the dimensionless vertical displacement and P the dimensionless load. Finally, the curvatures are given in terms of W by

$$K_r = -W'', \qquad K_\varphi = -\frac{W'}{r}. \tag{3}$$

Now, let it be desired to find the "best" fourth-degree function for the displacement, i.e., to determine A and B so as to minimize $\Pi$, if

$$W = A(1 - r^2) + B(1 - r^4). \tag{4}$$
The curvatures are easily determined by substituting equation (4) into equation (3), but in order to find the moments, it is necessary to decide which of the four equations (1) is applicable. As a first guess, we assume that equations (1d) apply throughout the plate. $\Pi$ then turns out to be a quadratic function of A and B which possesses a minimum of zero variation. When this is located and when the resulting values of A and B are substituted into the curvature equations, the results

$$K_r = \frac{6P - 4 - 9Pr^2}{8}, \qquad K_\varphi = \frac{6P - 4 - 3Pr^2}{8} \tag{5}$$

are obtained.

FIG. 3. Comparison of displacement at plate center for exact and approximate solutions. (Plot of the center displacement W against the load P, with one curve for the exact solution and one for the approximate solution.)
Now, it turns out that the assumed solution is a reasonable approximation to the true one only if P is greater than 1. However, this implies $K_r < 0$ near r = 1, in violation of the assumed law (1d). In other words, the assumed law (1d) places certain restrictions on A and B, and the analytic minimum of $\Pi(A,B)$ does not satisfy these restrictions and hence is meaningless.

The obvious solution to this dilemma is to change our hypothesis as to which equation (1) is applicable. However, an alternative approach is to incorporate the restriction of equations (1d) into the assumed form of W. In other words, since the true minimum of $\Pi(A,B)$ lies outside the admissible domain, we shall locate the minimum on the boundary. To this end, we require that $K_r(1) = 0$, i.e., that $A = -6B$. $\Pi$ then becomes a quadratic function of the single variable B, and its minimum leads to the curvatures and displacements
$$K_r = 12\beta(1 - r^2), \qquad K_\varphi = 4\beta(3 - r^2), \qquad W = \beta(1 - r^2)(5 - r^2), \tag{6}$$

where $\beta = -B$ is proportional to $7P - 8$.
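The polynomial structure of the boundary solution can be checked directly from equations (3) and (4). The sketch below is only an arithmetic check, not a reproduction of the minimization: it evaluates the curvatures of the trial displacement, imposes the constraint A = -6B, and confirms the factors (1 - r²), (3 - r²), and (1 - r²)(5 - r²). The function names are hypothetical, and the load-dependent value of B, which comes from minimizing the potential energy, is not computed here.

```python
def curvatures(A, B, r):
    """K_r = -W'' and K_phi = -W'/r for the trial function W = A(1 - r^2) + B(1 - r^4)."""
    k_r = 2.0 * A + 12.0 * B * r ** 2
    k_phi = 2.0 * A + 4.0 * B * r ** 2
    return k_r, k_phi


def deflection(A, B, r):
    """Trial displacement of equation (4)."""
    return A * (1.0 - r ** 2) + B * (1.0 - r ** 4)


def boundary_family(B, r):
    """Curvatures and displacement on the constraint line A = -6B, where K_r(1) = 0."""
    A = -6.0 * B
    k_r, k_phi = curvatures(A, B, r)
    return k_r, k_phi, deflection(A, B, r)


if __name__ == "__main__":
    B = -1.0  # arbitrary illustrative value; the sign makes the curvatures non-negative
    for r in (0.0, 0.5, 1.0):
        k_r, k_phi, w = boundary_family(B, r)
        # Closed forms implied by A = -6B:
        #   K_r = -12B(1 - r^2), K_phi = -4B(3 - r^2), W = -B(1 - r^2)(5 - r^2)
        print(r, k_r, k_phi, w)
```

With B negative, both curvatures stay non-negative over the whole plate, which is the restriction attached to (1d).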
Since the solution (6) satisfies the restrictions of equations (1d) for $P > 8/7$, it is the desired minimum. Figure 3 shows the displacement at r = 0 computed from equations (6) and compared with the exact solution. The close agreement indicates that the approximate solution obtained in this manner is reasonably accurate.

POLYTECHNIC INSTITUTE OF BROOKLYN,
BROOKLYN, N.Y.
NOW AT ILLINOIS INSTITUTE OF TECHNOLOGY, CHICAGO, ILLINOIS.
A GEOMETRICAL THEORY OF DIFFRACTION¹ BY
JOSEPH B. KELLER
1. Introduction. Geometrical optics is a theory of light propagation based on the assumption that light travels along certain curves, called rays, which are determined by the laws of geometrical optics. Although experience has shown that this theory is essentially correct, there are still many cases in which light appears in places where there are no rays (i.e., in shadows). Such discrepancies between experience and geometrical optics are called diffraction effects. It is the purpose of this article to show that geometrical optics can be so modified as to include diffraction. The modification consists in introducing new rays, called diffracted rays, by extending the laws of geometrical optics. These new rays account for the appearance of light in shadows and also alter the light in lit regions. It seems evident that diffracted rays should be produced when a ray hits an edge or a vertex or when a ray grazes an interface or a boundary. Geometrical optics does not describe what happens in any of these cases; hence we will extend it to do so. Then it will yield the diffracted rays. Our extension of the laws of optics will be presented in two equivalent forms. The first is the explicit form, in which we enumerate the different situations in which diffracted rays are produced and describe the different kinds of diffracted
rays which occur in each case. The second formulation is based upon an extension of Fermat's principle. The equivalence of the two formulations follows from the usual considerations of the calculus of variations. Once the diffracted rays have been introduced, we shall define diffracted
wavefronts and the phase, or eiconal, function by means of them. In this way we shall obtain new solutions of the eiconal equation. Conversely, from appropriate solutions of this equation, diffracted wavefronts and rays can be determined. A number of examples in which diffracted rays occur will be described to illustrate this part of our theory. In these examples the diffracted rays cover the shadows of ordinary geometrical optics. However, we shall also find certain cases in which shadows remain even after the introduction of diffracted rays. To obtain rays in such shadow regions, we shall further extend the concept of a ray by introducing imaginary rays. These rays can be used in much the same way as real rays. For example, a complex phase function can be defined in terms of them. It provides the analytic continuation, from a lit region into the shadow, of a solution of the eiconal equation.

¹ This theory was first presented at the Symposium on Microwave Optics at McGill University, Montreal, Canada, in June, 1953. A brief account appeared in the program of the symposium. It was also presented, with various extensions, at the Symposium on the Calculus of Variations and Its Applications, University of Chicago, April, 1956. The research reported in this article has been sponsored by the Air Force Cambridge Research Center, Air Research and Development Command, under Contract No. AF 19(604)1717.

The second part of our theory shows how the rays and wavefronts can be used for the quantitative description of the light distribution. This necessitates the introduction of an amplitude function and certain principles for its determination. Finally we shall discuss the relation of our theory to previous work on diffraction. This may partially justify the introduction of the new rays by showing how some kinds of them have already appeared in special cases.

All our considerations can be applied to other single-integral variational problems and to other first-order partial differential equations in any number of variables. They lead to the introduction of diffracted extremals and diffracted characteristics and of complex characteristics. With the aid of these characteristics, additional branches of solutions of first-order equations can be constructed. Complex solutions, which are analytic continuations of certain real solutions, can also be obtained for analytic equations.

As an example, consider the Hamilton-Jacobi equation of classical mechanics. The characteristics of this equation are the classical-mechanical trajectories which satisfy Hamilton's canonical equations or Newton's equations of motion. The complex characteristics are complex-valued solutions of Newton's or Hamilton's equations. These complex trajectories enter the "forbidden regions" into which real trajectories cannot penetrate. They enable us to continue solutions of the Hamilton-Jacobi equation into these regions, and yield complex values for the solutions there. The appearance of trajectories in forbidden regions is related to the "tunnel effect" of quantum mechanics. In fact, the present considerations provide a new classical interpretation of this effect.

FIG. 1. The cone of diffracted rays produced by an incident ray which hits the edge of a thin screen.

2. Diffracted rays. The first kind of diffracted ray is produced when an incident ray hits an edge (see Figs. 1 to 4). The incident ray produces infinitely many diffracted rays, traveling in directions determined by the law of diffraction. This law states that each diffracted ray which lies in the same medium as the incident ray makes the same angle with the edge as does the incident ray. Furthermore, the incident and diffracted rays lie on opposite
sides of the plane normal to the edge at the point of diffraction. However, the diffracted ray need not lie in the same plane as the incident ray and the edge. Therefore the diffracted rays form the surface of a cone with its vertex at the point of diffraction. If a diffracted ray and the incident ray lie in different media, the angle between the diffracted ray and the edge is related to the angle between the incident ray and the edge by Snell's law (see Fig. 5a). But here again, the diffracted ray is not restricted to lie in the same plane as the incident ray and the edge.
Therefore these diffracted rays also form the surface of a cone.
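The law of edge diffraction stated above translates into a short computation. The sketch below generates unit direction vectors on the cone of diffracted rays for a given edge tangent and incident propagation direction. It rests on two assumptions that are not spelled out in the text: directions are represented as propagation vectors, so that "same angle with the edge" becomes equality of the components along the edge, and Snell's law for a second medium is written in terms of the angle with the edge as n_incident cos(beta_incident) = n_diffracted cos(beta_diffracted). The function names and the `index_ratio` parameter are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)


def diffracted_cone(edge, incident, n_rays=8, index_ratio=1.0):
    """Unit propagation directions of diffracted rays leaving the point of diffraction.

    edge        -- tangent vector of the edge at the point of diffraction
    incident    -- propagation direction of the incident ray
    index_ratio -- n_incident / n_diffracted (assumed form of Snell's law);
                   1.0 means both rays lie in the same medium
    Returns an empty list if no real diffracted ray satisfies the law.
    """
    t = normalize(edge)
    d = normalize(incident)
    cos_beta = index_ratio * dot(d, t)
    if abs(cos_beta) > 1.0:
        return []
    sin_beta = math.sqrt(1.0 - cos_beta ** 2)
    # Orthonormal basis (e1, e2) of the plane normal to the edge.
    helper = (1.0, 0.0, 0.0) if abs(t[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = normalize(cross(t, helper))
    e2 = cross(t, e1)
    rays = []
    for k in range(n_rays):
        phi = 2.0 * math.pi * k / n_rays
        rays.append(tuple(cos_beta * t[i]
                          + sin_beta * (math.cos(phi) * e1[i] + math.sin(phi) * e2[i])
                          for i in range(3)))
    return rays


if __name__ == "__main__":
    edge = (0.0, 0.0, 1.0)
    incident = (1.0, 0.0, 1.0)  # oblique incidence, 45 degrees to the edge
    for s in diffracted_cone(edge, incident, n_rays=4):
        print(s, dot(s, normalize(edge)))
```

For a ray normally incident on the edge the component along the edge is zero and the cone opens into the plane normal to the edge, which is the situation of Fig. 2.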
FIG. 2. The plane of diffracted rays produced by a ray normally incident on the edge of a thin screen.
When an incident ray hits a vertex (e.g., a junction of two or more edges), it produces infinitely many diffracted rays which leave the vertex in all directions (see Fig. 6). Thus at a vertex a single incident ray produces a two-parameter family of diffracted rays. When a ray grazes an interface or boundary surface (i.e., when it is tangent to the surface), the ray splits in two (see Fig. 7). One part continues, unaffected by the surface, as an ordinary ray. The other part travels along the surface. Its path on the surface is a surface ray, i.e., a curve which satisfies the differential equations for a ray, when these equations are specialized to a surface. A surface ray also makes Fermat's integral stationary among all curves lying on the surface. At every point on its path this ray again splits in two, one part continuing along the surface and the other part leaving the surface along the tangent to the surface ray, provided that the tangent lies on the same side
of the surface as the surface ray. In defining a surface ray, one must use the value of the index of refraction appropriate to that side of the surface from which the incident ray comes. If the surface ray lies on that side of the surface having the lesser index of refraction, it also sheds another diffracted ray at each point on its path (see Fig. 8). This diffracted ray is a critically refracted ray, which leaves the surface at the critical angle on the side opposite the surface ray. Conversely, a surface ray is produced when a ray is incident at the critical angle on that side of the surface having the greater index of refraction (see Fig. 9). In this case the refracted ray is initially tangent to the surface and then proceeds along it as a surface ray. Surface rays are also produced by a ray incident at an edge or a vertex, since in these cases some of the diffracted rays leave the edge or vertex along the surfaces meeting there (see Fig. 5).

FIG. 3. The diffracted rays produced by a plane wave obliquely incident upon a slit in a thin screen. The two incident rays which hit the slit edges are shown, along with some of the singly diffracted rays which they produce. One diffracted ray from each edge is shown crossing the slit and hitting the opposite edge, producing doubly diffracted rays and then triply diffracted rays.

When a ray is incident on a surface of discontinuity of any derivative of the index of refraction, it is reflected and refracted just as at a surface of discontinuity of the index itself. Similarly, diffracted rays are produced when incident rays hit edges or vertices of such surfaces or when they graze these surfaces. In other words, such surfaces behave in all respects like discontinuity surfaces of the index of refraction itself. Just as discontinuity surfaces of derivatives of the index of refraction must be included as surfaces, so must lines of discontinuity of any derivatives of surfaces be counted as edges. Thus an ordinary edge is a line along which first derivatives of a surface (i.e., the slopes) are discontinuous. Similarly, a line along which second derivatives of a surface (i.e., the curvatures) are discontinuous must also be counted as an edge. Discontinuities in higher derivatives of a surface also yield edges. In the same way, discontinuities in any derivatives of edges must be counted as vertices. In addition, isolated points (not on edges) at which derivatives of a surface are discontinuous also play the role of vertices. The apex of a cone is an example of such a point. Diffracted rays are produced when rays hit any of these edges or vertices, in the same way as they are produced at ordinary edges and vertices.

FIG. 4. A plane wave normally incident upon an aperture in a plane screen. The incident rays are normal to the edge; hence the rays diffracted from each point of the edge lie in a plane normal to the edge.
On the basis of the preceding descriptions, we may say that a ray is diffracted whenever it hits an edge or vertex or grazes a surface. In every such case the ray produces infinitely many diffracted rays. Thus the process of diffraction, which occurs in these cases, splits a single ray into infinitely many diffracted rays. It is natural to expect that the light intensity associated with these diffracted rays, as well as with rays reflected from discontinuity surfaces of derivatives of the refractive index, is much smaller than that associated with the incident ray. This is indeed the case, and is of particular importance in using our theory for the quantitative calculation of intensities. Away from discontinuity or boundary surfaces, all rays, both ordinary and diffracted, are determined by the usual laws of geometrical optics. These laws, plus the laws of reflection and refraction and the foregoing laws governing diffraction, completely determine all real rays.
JOSEPH B. KELLER
As a consequence of the theory of diffraction just described, diffracted rays will exist, in addition to ordinary rays, in any medium which is bounded or in
which the refractive index or any of its derivatives is discontinuous. Of course, in specific examples the incident rays might be so arranged that no diffraction occurs (see Fig. 10). In such examples no incident ray can hit an edge or graze a surface; hence no shadows can be formed. Therefore the absence of diffracted rays in these examples is not unexpected.
FIG. 5. (a) Some of the diffracted rays produced by a ray incident on the edge of a wedge. The light velocity within the wedge is greater than it is outside the wedge. Therefore the angle between the refracted rays and the edge is less than that between the incident ray and the edge. (b) A section of the wavefront resulting from incidence of a plane wave upon a wedge. The velocity within the wedge is greater than that outside. The incident, reflected, refracted, and diffracted wavefronts are shown. The plane wavefronts produced by rays critically refracted into the outer region from the diffracted wavefront inside the wedge are also shown. (c) A section of the wavefront resulting from incidence of a plane wave on a wedge. The velocity within the wedge is less than that outside. The incident, reflected, refracted, and diffracted wavefronts are shown.
3. Examples. Let us now consider some examples of diffracted rays. First, consider a plane wave (i.e., a set of parallel rays) incident obliquely upon a thin opaque screen in the form of a half plane (see Fig. 1). If the medium is homogeneous, then all the rays, incident, reflected, and diffracted, are straight lines. The only diffracted rays are produced by those incident rays which hit
A GEOMETRICAL THEORY OF DIFFRACTION
FIG. 6. Diffracted rays produced by a ray hitting the tip of an opaque cone. The diffracted rays emanate from this tip in all directions.
FIG. 7. Some of the diffracted and reflected rays produced when a plane wave hits an opaque convex cylinder. One of the two grazing (tangent) rays is shown. This ray splits, part continuing unaffected and part running along the cylinder surface. At each point of its path this surface ray sheds a diffracted ray along the tangent to its path.
the edge of the screen. Since the incident rays are parallel to each other, the cones of diffracted rays will also be parallel to each other. Each point of the edge will be the vertex of a cone. One diffracted ray will reach each point in the medium, accounting for the appearance of light in the "shadow" and also providing additional light rays in the illuminated region. If the incident rays are perpendicular to the edge instead of being oblique, the diffracted rays will also be perpendicular to the edge (see Fig. 2). Thus in
FIG. 8. Some of the diffracted, reflected, transmitted, and critically refracted rays produced when a plane wave hits a convex cylinder of lower light velocity than the surrounding medium. One of the two grazing rays is shown, along with some of the diffracted and critically refracted rays it produces.
FIG. 9. A ray incident at the critical angle on a plane interface between two media. Some of the resulting diffracted rays are shown. The dotted line is a section of the diffracted wavefront.
this case each cone of diffracted rays is opened up to become a plane of diffracted rays. As a third example, consider a plane wave incident obliquely upon a slit in a thin opaque screen (see Fig. 3). The rays which hit the edges of the screen give rise to diffracted rays. Some of the rays diffracted from one edge will hit the other edge and give rise to a new set of diffracted rays. Some of these new rays will in turn hit the opposite edge, producing still other diffracted rays, etc. Thus, in this case there is an infinite set of multiply diffracted rays. Some of these rays are shown in Fig. 3. These singly and multiply diffracted rays are
the only diffracted rays which occur in this problem, and they account for the occurrence of light in the shadow behind the screen. In addition, the usual incident and reflected rays are present. As a fourth example, let us consider a plane wave normally incident upon a plane screen which contains an aperture with a smooth rim (see Fig. 4). Each incident ray which hits the rim is perpendicular to it. Therefore each set of diffracted rays lies in the plane perpendicular to the edge at the point of diffraction. As in the case of the slit, multiply diffracted rays will also be produced. However, the cones of multiply diffracted rays will in general not be planes, since the diffracted rays which produce them will in general not be perpendicular to the edge.
FIG. 10. A case in which no diffracted rays occur. No incident ray is tangent to the reflecting surface.
In all the preceding examples, the diffracting edge is a caustic of the diffracted rays, i.e., a locus of points of intersection of neighboring rays. Obviously, this is always the case with a diffracting edge. However, in the last example above, the singly and multiply diffracted rays also possess other caustics in addition to the edge. The caustic of the singly diffracted rays can be determined very simply, because these rays lie in planes normal to the edge. Therefore, to locate the intersections of neighboring rays, it suffices to consider the intersection of the neighboring planes which contain the rays. Since all these planes are perpendicular to the plane of the screen, any two of them must intersect in a straight line which is also perpendicular to the screen. As the caustic is made up of such lines, it is a cylinder with generators perpendicular to the screen. To determine its cross section, we consider the curve of intersection of the cylinder and the plane of the screen. In the plane of the screen, the diffracted
rays are perpendicular to the rim, and therefore their envelope is just the
envelope of the normals to the rim. But this envelope is called the evolute of the rim. Thus we see that the caustic of the singly diffracted rays is a cylinder with generators normal to the screen and with the evolute of the rim as its cross section. If the screen is removed and replaced by a thin plate having the same rim as the aperture, the preceding considerations apply equally well. This fact is a geometrical form of Babinet's principle. Furthermore, in the case of the plate, the caustic just described will lie (at least partly) in the shadow of the plate. A cross section of it would appear as a bright line in the shadow, since light intensity is greater at a caustic than elsewhere. If the plate has a circular rim, the evolute is a single point, the center, and the caustic is just the axis of the circular plate. The resulting bright spot on the axis is well known experimentally. The bright lines in the shadows of plates of other shapes have also been observed and found to be the evolutes of the rims [1], as the above considerations predict. In the case of a plane wave obliquely incident upon a flat plate, a more detailed analysis shows that the caustic of the singly diffracted rays approaches a cylindrical shape far behind the plate. The generators of this limiting cylinder are parallel to the shadow boundary (i.e., to the incident-ray direction). The cross section of this cylinder is the evolute of the cross section of the shadow, i.e., of the curve obtained by cutting the shadow with a plane perpendicular to the incident-ray direction. This shadow cross section is just a
projection of the rim of the plate. Thus, far behind a plate illuminated obliquely, there should be a bright line in any cross section of the shadow, and it should be the evolute of the boundary of the shadow. This prediction of the theory is also in agreement with the experimental observations of the bright lines in the shadows of various plates illuminated obliquely [2]. If the bright lines had also been observed closer to the plates, they should have been found to differ from the evolutes described above. However, such observations were not made. Let us now examine another example, in which a plane wave in a homogeneous medium is incident upon a wedge of a different material. The incident
rays corresponding to the plane wave will be reflected and refracted at the surfaces of the wedge in the usual manner. In addition, the rays which hit the edge will produce cones of diffracted rays, as in the preceding examples. However, now some diffracted rays will also be produced inside the wedge (see Fig. 5a). If i denotes the angle between an incident ray and the edge and r
denotes the angle between one of the resulting diffracted rays and the edge, then r is determined by the equation cos r = (1/n) cos i. Here n is the relative refractive index of the two materials; i.e., n is the velocity in the surrounding medium divided by the velocity in the wedge. Thus if the light velocity is faster within the wedge material, the angle r is smaller than i. Of the diffracted rays produced inside the wedge by a single incident ray, one proceeds along each wall of the wedge. These two rays are surface rays
within the faster medium, if n < 1. Therefore they shed refracted rays back into the surrounding medium all along their paths. These refracted rays leave the surface at the critical angle $i_c$, determined by $\cos i_c = n$. They also lie in a plane normal to the surface. The "first" critically refracted ray coincides with one of the diffracted rays produced by the incident ray in the outer medium, as can be shown by simple trigonometry. Thus the shed refracted rays due to a single incident ray lie in a plane sector bounded by the outer diffracted cone and a wall of the wedge. The refracted rays shed from one wall of the wedge due to all the incident rays are all parallel to each other and thus form a plane wave. In each medium the cones of diffracted rays from the different points on the edge are also parallel to each other and form a conical wave. Some sections of these wavefronts are shown in Fig. 5b and 5c. When
n > 1, that is, when the faster medium is outside the wedge, the refracted rays are shed into the wedge, and then the resulting plane waves lie inside the wedge.
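The two angle relations used in the wedge example, cos r = (1/n) cos i for rays diffracted into the wedge and cos i_c = n for the critical angle, can be evaluated directly. The following is a minimal sketch; the function names and the use of degrees are assumptions made for illustration and do not come from the text.

```python
import math


def interior_diffraction_angle(i_deg, n):
    """Angle r (degrees) between the edge and a ray diffracted into the wedge.

    Uses cos r = (1/n) cos i, with n the velocity outside divided by the
    velocity inside the wedge. Returns None when (1/n) cos i exceeds 1 in
    magnitude, i.e. when no real angle r satisfies the relation.
    """
    c = math.cos(math.radians(i_deg)) / n
    if abs(c) > 1.0:
        return None
    return math.degrees(math.acos(c))


def critical_angle(n):
    """Critical angle i_c (degrees) from cos i_c = n; real only for 0 < n <= 1."""
    if not 0.0 < n <= 1.0:
        raise ValueError("cos i_c = n has a real solution only for 0 < n <= 1")
    return math.degrees(math.acos(n))


if __name__ == "__main__":
    for n in (0.8, 1.5):
        r = interior_diffraction_angle(60.0, n)
        print(f"n = {n}: i = 60.00 deg, r = {r:.2f} deg")
    print(f"critical angle for n = 0.8: {critical_angle(0.8):.2f} deg")
```

For n < 1 (faster wedge) the computed r is smaller than i, and for n > 1 it is larger, in agreement with the statement above.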
Next, suppose that a plane wave is incident upon an opaque cone, as in Fig. 6. Then, in addition to the usual reflected rays, diffracted rays will be produced by that incident ray which hits the vertex or tip of the cone. These
rays will go in all directions from the tip, and the corresponding diffracted wavefronts will be spheres with the tip as center. Let us now consider a plane wave normally incident upon an opaque cylinder of convex cross section in a homogeneous medium (see Fig. 7). In this case, in addition to the reflected rays, two surface rays will be produced by the two incident rays tangent to the cylinder. These rays will lie in a plane normal to the axis of the cylinder and will encircle the cylinder in opposite directions. At each point on its path each surface ray will shed a diffracted ray along the tangent to the cross section of the cylinder (see Fig. 7). Two diffracted rays will pass through each point in space, one coming from each of the surface rays. The two rays through a given point are the two tangents to the cross section which pass through that point. Actually, each of these rays represents an infinite number of diffracted rays, one being shed by the surface ray each time it encircles the cylinder. These rays account for the
illumination in the shadow region and also provide additional light in the lit region. The cross sections of the diffracted wavefronts, i.e., the surfaces orthogonal to the diffracted rays, are just the involutes of the cross section of the cylinder. Suppose that the cylinder of the previous example is composed of a homogeneous material with a lower light velocity and therefore a higher refractive index than the surrounding medium. Then the incident rays which hit the
cylinder will produce refracted rays inside the cylinder in addition to the reflected rays in the surrounding medium (see Fig. 8). These refracted rays will hit the cylinder surface and again produce reflected and transmitted rays, and this process will be repeated ad infinitum. All of these multiply reflected and transmitted rays are ordinary rays. In addition to them, the grazing rays will produce diffracted rays in the outer medium as before. But now the sur-
face rays will also shed refracted rays into the cylinder. These refracted rays will leave the surface at the critical angle of refraction. Once inside the cylin-
der, these diffracted rays will hit the opposite surface and be reflected and refracted, etc. As a final example, let us consider a spherical wave incident upon the plane interface between two homogeneous media, as shown in Fig. 9. The corre-
sponding incident rays are straight lines emanating from the center of the spherical wave. They are reflected and refracted at the interface in the usual way. However, if the second medium has the faster light velocity, there is a critical angle at which the refracted ray is parallel to the interface.
Therefore
this critically refracted ray is a surface ray in the faster medium. Consequently, it sheds refracted rays back into the slower medium. These shed rays leave the interface at the critical angle. The corresponding diffracted wavefronts are cones, one of which is shown in cross section in Fig. 9. It appears there as a straight-line segment. 4. A generalization of Fermat's principle. In Sec. 2 we extended the laws of geometrical optics by introducing diffracted rays and giving an explicit characterization of them. Now we shall consider an alternative (but equivalent) extension of the laws of geometrical optics based upon a generalization of Fermat's principle. This principle, which is the basis of ordinary optics, involves the index of refraction n(x). This is a real positive function which characterizes the optical behavior of the medium. In terms of it the optical length L of any curve x(s) connecting two points P and Q is defined as
$$L = \int_P^Q n[x(s)]\,ds.$$
The parameter s denotes arc length. Fermat's principle states that the optical rays connecting P and Q are those curves which make L stationary in the class $C_0$ of all smooth curves joining P and Q. This principle applies to an unbounded continuous medium (i.e., one in which n is continuous). It does not apply to bounded or discontinuous media [i.e., media in which n(x) is discontinuous]. We may try to apply it to such media by considering, instead of $C_0$, a class of curves with a finite number of corners. However, this formulation turns out to be unsatisfactory, because it yields only some rays, "direct" ones, but does not include any reflected rays.
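To illustrate how a reflected ray can arise from a stationarity condition, the sketch below considers a homogeneous medium bounded by a plane mirror y = 0 and a two-segment path from P to Q whose corner is required to lie on the mirror. Minimizing the optical length over the corner position recovers equal angles of incidence and reflection. The mirror geometry, the function names, and the ternary search are assumptions chosen for this illustration; they are not part of the original argument.

```python
import math


def optical_length(x, p, q, n=1.0):
    """Optical length of the broken path p -> (x, 0) -> q in a medium of index n."""
    return n * (math.hypot(x - p[0], p[1]) + math.hypot(q[0] - x, q[1]))


def stationary_corner(p, q, lo, hi, iters=200):
    """Ternary search for the corner position on y = 0 minimizing the optical length.

    The length is a convex function of x, so the search converges to the
    unique stationary point in [lo, hi].
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if optical_length(m1, p, q) < optical_length(m2, p, q):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def sines_of_angles(x, p, q):
    """Sines of the angles the two segments make with the mirror normal."""
    sin_in = (x - p[0]) / math.hypot(x - p[0], p[1])
    sin_out = (q[0] - x) / math.hypot(q[0] - x, q[1])
    return sin_in, sin_out


if __name__ == "__main__":
    P, Q = (0.0, 1.0), (3.0, 2.0)   # both points lie above the mirror
    x_star = stationary_corner(P, Q, -10.0, 10.0)
    print(x_star, sines_of_angles(x_star, P, Q))
```

For the sample points the corner is found near x = 1, where both segments make the same angle with the mirror normal.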
In order to obtain a principle valid for ordinary optics in discontinuous media, we introduce for each integer $r \ge 0$ the class of curves $C_r$. This is the class of curves with exactly r points on the boundaries or discontinuity surfaces of the medium. These points are to be inner points of these surfaces; i.e., they may not lie on edges or vertices. Now we formulate Fermat's principle as follows: The rays are those curves in each class $C_r$ which make the optical length stationary in $C_r$. Upon examining the consequences of this formulation, we find that the class $C_0$ yields rays which do not touch the boundary or discontinuity surfaces; $C_1$ yields singly reflected or refracted rays; and $C_r$ yields r-tuply reflected and/or refracted rays. The preceding formulation of Fermat's principle for discontinuous and/or bounded media is presumably implicit in older formulations of geometrical optics. Although it includes reflected rays, it still fails to take account of diffracted rays. Therefore we shall further modify Fermat's principle by introducing additional classes of curves. For each triple of nonnegative integers r, s, and t we shall define the class $D_{rst}$. This class consists of curves with r smooth arcs on the boundary or discontinuity surfaces, s points on edges of the boundary or discontinuity surfaces, and t points on vertices of these surfaces. Any number of the r arcs may be degenerate arcs, i.e., points. To each arc the value of n on one side of the surface is assigned. We now define the rays as those curves in each class $D_{rst}$ which make the optical length stationary in $D_{rst}$. The class $D_{000}$ is the previously considered class $C_0$, and thus it yields the direct rays. The class $D_{r00}$ contains the previously considered class $C_r$ and thus yields r-tuply reflected and/or refracted rays. Each ray in any of the other classes has at least one point on an edge or vertex and is thus a new ray not included in ordinary optics. Some of the rays in $D_{r00}$ are also new rays, since they have arcs on the boundary or discontinuity surfaces. From the above extension of Fermat's principle a number of conclusions can be drawn which suffice to characterize the rays explicitly. First, let us
consider any smooth arc of a ray not containing a boundary point in its interior.
By applying the usual considerations of the calculus of variations, we conclude that each such arc must be an extremal, i.e., a solution of the Euler equations. Similarly, each boundary arc must be a surface extremal. Second, by applying the appropriate considerations of the calculus of variations to each corner at an inner point of a boundary surface, we find that the law of reflection or the law of refraction must be satisfied, according as the two parts of the ray lie on the same or on opposite sides of the boundary. Third, at each inner point of a boundary edge, we find that a law of diffraction must be satisfied. This law states that the two parts of the ray make equal angles
with the edge, if they lie in the same region at the edge, and that the angles are related by Snell's law if they lie in different regions. Fourth, at a vertex the two parts of a ray may make any angles. Finally, at an inner point of a boundary, a surface extremal and an extremal in space may join together smoothly. The foregoing consequences of the extended Fermat principle are essentially the explicit rules given in Sec. 2 for the determination of the rays. Since these
consequences also suffice to make the optical length stationary in each class, it follows that our two prescriptions for determining rays are equivalent. All the preceding considerations are based on the assumption that the index of refraction n(x) is a piecewise smooth function of x. This means that space is divided into a finite number of regions in each of which n and its deriva-
tives are continuous and have limits at the boundary. The boundary is also assumed to be piecewise smooth, i.e., to consist of a finite number of parts each having continuous derivatives which have limits at the edge. The edge is assumed to be piecewise smooth, i.e., to consist of a finite number of arcs each having continuous derivatives which have limits at each end point of each arc. The end points of the edge arcs are called vertices. Isolated points of the boundary at which the derivatives are discontinuous are also vertices. The extended form of Fermat's principle for discontinuous media is complicated, compared to the original form for continuous media. Therefore it is natural to inquire whether the complicated form can be deduced from the simple form by considering a discontinuous medium to be the limit of a family of continuous media. Then the rays of the discontinuous medium could be defined as the limits of families of corresponding rays in the family of continuous media.
The answer to this question is negative. It turns out that the indicated limit process yields only some, but not all, of the rays determined by the extended form of Fermat's principle. In particular, many of the reflected rays are not obtained by the limit process, viz., those rays reflected at any angle
from a slower medium or normally reflected from a faster medium. However, some diffracted rays are given by this limit process. This result may be sum-
marized by stating that the geometrical optics of discontinuous media is not the limit of the geometrical optics of continuous media. 5. Diffracted wavefronts. In ordinary geometrical optics one always deals with normal congruences of rays. A normal congruence is a family of rays, all of which are normal to some surface. Such a surface is called a wavefront. The theorem of Malus guarantees that a normal congruence remains a normal congruence after reflection or refraction. Therefore reflected and refracted wavefronts can always be defined. Now suppose some of the rays of a normal congruence undergo diffraction. If the resulting diffracted rays also form a normal congruence, then diffracted wavefronts can be defined as the surfaces normal to the family of diffracted rays. That this is indeed the case can be proved, providing an extension of Malus's theorem to diffraction. The proof will not be given here. Some diffracted wavefronts have already been considered in the examples of Sec. 3.
In ordinary geometrical optics we define the eiconal, or phase, function $\Psi(P)$ at a point P as the optical distance to P from some fixed wavefront, measured along an ordinary ray. We then show that $\Psi$ satisfies the eiconal equation $(\nabla\Psi)^2 = n^2$ and that the surfaces $\Psi$ = constant are wavefronts. We also find that $\Psi(P)$ is double-valued if an incident and a reflected ray pass through P. These two values of $\Psi$ become equal as P tends toward the reflecting surface. Thus the reflecting surface is a branch surface of $\Psi$. If many rays (incident, reflected, and refracted) pass through P, then $\Psi(P)$ is many-valued, and the reflecting and refracting surfaces are the branch surfaces on which two or three different branches are equal. All the foregoing considerations can also be applied to diffracted rays.
We first define the eiconal $\Psi(P)$ as the optical distance to P from some fixed wavefront measured along any ray, ordinary or diffracted. We can then show that $\Psi$ still satisfies the eiconal equation. With this new definition $\Psi(P)$ is even more multiple-valued than before. Not only are the boundaries and discontinuity surfaces of n(x) branch surfaces of $\Psi$, but the discontinuity surfaces of derivatives of n(x) are also branch surfaces. Furthermore the edges and vertices are branch lines and branch points of $\Psi$. In ordinary geometrical optics, in a continuous unbounded medium, it is possible to utilize the wavefronts and the eiconal equation as a basis for geometrical optics. We prescribe some smooth surface as an initial wavefront and consider a solution $\Psi$ of the eiconal equation which has the value zero on the given surface. Because the eiconal equation is quadratic, there are two such solutions, but they differ from each other only in sign. For either solution we define the surfaces $\Psi$ = constant as a family of wavefronts and the orthogonal trajectories of these wavefronts as rays. These rays are exactly the same as the rays given by Fermat's principle. By choosing different initial surfaces, we obtain precisely all the rays given by Fermat's principle. Thus
we see that in this case the wavefront formulation is equivalent to the ray formulation. No similar formulation in terms of wavefronts has been given for ordinary geometrical optics in bounded or discontinuous media. This is undoubtedly due to the occurrence of diffraction in such media. However, the present theory, which includes diffraction, can presumably be formulated in terms of wavefronts for any medium. To this end we proceed as above, by prescribing some smooth surface as an initial wavefront and by considering a solution $\Psi$ of the eiconal equation which is zero on the given surface. Then we define the surfaces $\Psi$ = constant to be wavefronts and their orthogonal trajectories to be rays, just as before. The only difference is that now we must consider a multiple-valued solution which has as branch points, lines, and surfaces the vertices, edges, and surfaces determined by the boundaries and by the refractive index and its derivatives. This solution must be complete, in the sense that it must branch at every permissible branch point, line, or surface. Unfortunately, the foregoing requirements do not determine a unique solution $\Psi$. Other conditions, perhaps at the boundaries or discontinuity surfaces, must be imposed to obtain uniqueness. Consequently, the equivalence of the ray and wavefront formulations of our theory is not yet demonstrated, since the wavefront formulation is not complete. In two dimensions, an interesting wavefront results from diffraction by an object bounded by a smooth convex curve. The diffracted wavefronts are the involutes of the curve. In three dimensions, toroidal wavefronts result from diffraction of a normally incident plane wave by a circular disk.
6. Imaginary rays. In an unbounded medium in which the refractive index and all its derivatives are continuous, no diffracted rays occur. This is clear from the above laws governing diffraction. Therefore in such a medium the
theory so far presented coincides with ordinary geometrical optics. However, ordinary geometrical optics sometimes yields shadows in such media. An example is the region on that side of a caustic surface through which no rays pass (see Fig. 11). Experimentally some light is observed in these shadows. Since our theory fails to account for this light, the theory is incomplete. To complete the theory, we introduce another new type of ray, which we call an imaginary ray. Such a ray is a complex-valued solution of the ray equations. Thus, an imaginary ray in a homogeneous medium is a complex straight line. The definition presupposes that n(x) is analytic or piecewise analytic. Now we may consider an analytic normal congruence of real rays. By analytic we mean that the rays of the congruence are analytic functions of two real parameters. Then complex values of these parameters determine imaginary rays of the same congruence. Therefore every analytic congruence contains imaginary rays. Some of them will enter the shadows of the type considered above and thus account for the light observed there.
FIG. 11. A set of rays forming a caustic, or envelope. The shadow on one side of the caustic is devoid of real rays.
To see this, consider a two-dimensional homogeneous medium. Suppose a given curve C is a caustic, i.e., an envelope of a normal congruence of rays. These rays are then straight lines tangent to C. Let t denote arc length along C, and let the parametric equations of C be x = x(t), y = y(t). Then the equations of a ray tangent to C at the point [x(t),y(t)] are
$$x = x(t) + s\,\dot{x}(t), \qquad y = y(t) + s\,\dot{y}(t). \tag{1}$$
In (1) the parameter s is the signed distance from [x(t),y(t)] on C to (x,y) on the ray. If the point (x,y) is given, the rays through it are determined by the solutions s,t of (1). Each solution yields one ray through (x,y). If C is convex, there will generally be two rays through each point on the convex side of C but no real rays through any point on the concave side. This is because there are no tangents from such points to C. However, if C is analytic, (1) may have complex solutions for s and t. Then for each solution s,t the point (x,y) lies on the complex ray (1) which is tangent to C at the complex point [x(t),y(t)]. The value of s is the complex distance from (x,y) to the point of tangency, and (x,y) is the only real point on the ray.
As an example, suppose that C is a circle of radius a. Let the polar coordinates $(a,\phi)$ denote a point on C and let $(r,\theta)$ denote a point off C. Then (1) becomes
$$r\cos\theta = a\cos\phi - s\sin\phi, \qquad r\sin\theta = a\sin\phi + s\cos\phi. \tag{2}$$
Solving these equations for the point of tangency $\phi$ and the distance s, we obtain
$$\phi = \theta \pm \cos^{-1}\frac{a}{r}, \tag{3}$$
$$s = \mp\sqrt{r^2 - a^2}. \tag{4}$$
From (3) we see that, for $r \ge a$, there are two real values of $\phi$, if $\cos^{-1}$ is restricted to the range 0 to $\pi$. There are likewise two real values of s, one corresponding to each value of $\phi$. If r < a, (3) and (4) do not yield real values of $\phi$ and s, but they do give the complex values
$$\phi = \theta \pm i\cosh^{-1}\frac{a}{r}, \tag{5}$$
$$s = \mp i\sqrt{a^2 - r^2}. \tag{6}$$
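The circle example can be checked numerically. The sketch below computes the point of tangency and the distance s for a point outside or inside the circle and substitutes them back into the tangent-ray equations r cos θ = a cos φ − s sin φ, r sin θ = a sin φ + s cos φ. The function names are assumptions for illustration; the upper and lower signs are paired so that the pair (φ, s) satisfies these equations.

```python
import cmath
import math


def tangent_rays_to_circle(r, theta, a):
    """Return the two (phi, s) solutions for the point (r, theta) and circle radius a.

    For r >= a the auxiliary angle alpha = arccos(a/r) is real; for r < a it
    is taken as i * arccosh(a/r), which makes phi and s complex.
    """
    ratio = a / r
    if ratio <= 1.0:
        alpha = complex(math.acos(ratio), 0.0)
    else:
        alpha = complex(0.0, math.acosh(ratio))
    solutions = []
    for sign in (+1, -1):
        phi = theta + sign * alpha          # phi = theta +/- alpha
        s = -sign * r * cmath.sin(alpha)    # s = -/+ r sin(alpha)
        solutions.append((phi, s))
    return solutions


def residual(r, theta, a, phi, s):
    """Largest absolute error when (phi, s) is substituted into the ray equations."""
    e1 = r * math.cos(theta) - (a * cmath.cos(phi) - s * cmath.sin(phi))
    e2 = r * math.sin(theta) - (a * cmath.sin(phi) + s * cmath.cos(phi))
    return max(abs(e1), abs(e2))


if __name__ == "__main__":
    for r in (2.0, 0.5):   # one point outside and one inside the circle a = 1
        for phi, s in tangent_rays_to_circle(r, 0.3, 1.0):
            print(r, phi, s, residual(r, 0.3, 1.0, phi, s))
```

Outside the circle both solutions are real; inside it φ acquires an imaginary part and s is purely imaginary, while the residuals remain at rounding level.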
Thus two complex straight lines through $(r,\theta)$ are tangent to the circle r = a at the two points with $\phi$ coordinates given by (5). These complex lines are the imaginary rays through $(r,\theta)$ which belong to the normal congruence having the circle r = a as caustic. As a second example, consider as caustic the parabola $x' = ay'^2$. If a line through (x,y) is tangent to the parabola at (x',y'), then
$$\frac{y - y'}{x - x'} = \frac{1}{2ay'}. \tag{7}$$
Solving for the point of tangency, we obtain
$$y' = y \pm \sqrt{y^2 - \frac{x}{a}}. \tag{8}$$
Equation (8) shows that there are two, one, or no real points of tangency according as (x,y) is outside, on, or inside the parabola. In the latter case there are two complex points of tangency and thus two imaginary rays through (x,y). The distance s from (x,y) to the point of tangency is given by
$$s = \sqrt{y^2 - \frac{x}{a}}\,\sqrt{1 + 8a^2y^2 - 4ax \pm 8a^2y\sqrt{y^2 - \frac{x}{a}}}. \tag{9}$$
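The parabola formulas can be verified in the same spirit. The sketch below evaluates the points of tangency y' = y ± sqrt(y² − x/a) and the corresponding distance s, then checks the tangency condition x − x' = 2ay'(y − y') and the relation s² = (x − x')² + (y − y')². The function name and the sample points are assumptions for illustration; s is compared through its square so that the choice of square-root branch does not matter.

```python
import cmath


def parabola_tangency(x, y, a):
    """Return the two (y_prime, s) pairs for the parabola x' = a * y'**2.

    Complex arithmetic is used throughout, so points inside the parabola
    (where y**2 < x/a) yield complex points of tangency.
    """
    d = cmath.sqrt(y * y - x / a)
    results = []
    for sign in (+1, -1):
        y_prime = y + sign * d
        s = d * cmath.sqrt(1 + 8 * a * a * y * y - 4 * a * x + sign * 8 * a * a * y * d)
        results.append((y_prime, s))
    return results


if __name__ == "__main__":
    a = 1.0
    for x, y in ((-1.0, 0.0), (1.0, 0.5)):   # a point outside, then one inside
        for y_prime, s in parabola_tangency(x, y, a):
            x_prime = a * y_prime * y_prime
            tangency_error = abs((x - x_prime) - 2 * a * y_prime * (y - y_prime))
            distance_error = abs(s * s - ((x - x_prime) ** 2 + (y - y_prime) ** 2))
            print((x, y), y_prime, s, tangency_error, distance_error)
```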
This distance is complex for points inside the parabola.
7. The complex eiconal, or phase, function. An analytic normal congruence of rays may be defined as the set of rays normal to a given analytic surface S. Such a congruence contains imaginary as well as real rays. By means of the real rays, we have already defined the real eiconal, or phase, function $\Psi(P)$. This is the optical length to P from S along a ray of the congruence. Now we may define the complex eiconal $\Psi(P)$ in exactly the same way by means of the complex rays. It is readily seen that $\Psi(P)$ is complex and that it satisfies the eiconal equation. Furthermore, it is the analytic continuation of the real phase function, which is defined only at points P lying on real rays of the congruence. In the same way, we can obtain the analytic continuation of the solution of the Cauchy problem for the eiconal equation. In this problem a surface S, not necessarily a wavefront, is given, and a function $\Psi_0$ of position on S is also given. We are to find a solution $\Psi$ of the eiconal equation such that $\Psi = \Psi_0$ on S. This problem is usually solved by means of certain real rays through S. The analytic continuation can be obtained by applying the same considerations to the imaginary rays through S. As an example, let us consider a two-dimensional homogeneous medium with
n(x) = 1. We seek a solution $\Psi$ of the eiconal equation $(\nabla\Psi)^2 = 1$ having the value $\Psi = t$ on C. As before, t denotes arc length along C. Since the derivative of $\Psi$ along C is unity, and since the length of $\nabla\Psi$ is unity, we see that $\nabla\Psi$ is tangent to C. Therefore the problem we have posed is a characteristic boundary-value problem, since C is everywhere characteristic (i.e., tangent to the rays). This problem has two solutions because the eiconal equation is quadratic. From the eiconal equation the derivative of $\Psi$ along a ray is $\pm 1$. Therefore the solutions are given in terms of the parameters s and t by equations (1) and
$$\Psi = s + t. \tag{10}$$
At points (x,y) for which s and t are real, $\Psi$ is also real. However, for points lying on imaginary rays, both s and t are complex; hence $\Psi$ is complex. In the case of the circle treated above, s is given by (4) or (6) and $t = a\phi$, where $\phi$ is given by (3) or (5). Then from (10) we have in this case
$$\Psi = a\theta \pm a\cos^{-1}\frac{a}{r} \mp \sqrt{r^2 - a^2}, \qquad (r \ge a), \tag{11}$$
$$\Psi = a\theta \pm i\left[a\cosh^{-1}\frac{a}{r} - \sqrt{a^2 - r^2}\right], \qquad (r \le a). \tag{12}$$
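As a numerical cross-check of the circle solution, the sketch below evaluates the phase outside the circle, aθ ± [a arccos(a/r) − sqrt(r² − a²)], and inside it, aθ ± i[a arccosh(a/r) − sqrt(a² − r²)], and estimates (∂Ψ/∂x)² + (∂Ψ/∂y)² by central differences. The function names, the step size h, and the sample points are assumptions made for illustration; the points are kept away from r = a, from the origin, and from the negative x-axis, where the polar angle returned by atan2 jumps.

```python
import math


def psi_circle(x, y, a, sign=+1):
    """Eiconal for the circle caustic of radius a; complex when r < a."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x)
    if r >= a:
        return a * theta + sign * (a * math.acos(a / r) - math.sqrt(r * r - a * a))
    return a * theta + sign * 1j * (a * math.acosh(a / r) - math.sqrt(a * a - r * r))


def gradient_squared(f, x, y, h=1e-5):
    """Central-difference estimate of (df/dx)**2 + (df/dy)**2 (no modulus taken)."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return fx * fx + fy * fy


if __name__ == "__main__":
    def psi(x, y):
        return psi_circle(x, y, 1.0)

    print(gradient_squared(psi, 1.5, 0.7))   # real branch, outside the circle
    print(gradient_squared(psi, 0.3, 0.4))   # complex branch, inside the circle
```

Both printed values should be close to 1, the second one as a complex number with negligible imaginary part.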
For the parabola previously considered, if t = 0 at x = y = 0, we find
$$t = \frac{y'}{2}\sqrt{4a^2y'^2 + 1} + \frac{1}{4a}\log\left(2ay' + \sqrt{4a^2y'^2 + 1}\right). \tag{13}$$
When this value of t is used in (10), together with s, given by (9), two solutions $\Psi$ result which are real on and outside the parabola but complex inside it. Complex solutions of the eiconal equation can also be obtained without making use of imaginary rays. As an example, consider the above problem for any curve C. Let a(t) be the radius of curvature of C and $\rho$ be the distance along the normal to C, measured positively toward the convex side of C. In terms of $\rho$ and t the eiconal equation becomes
$$\Psi_\rho^2 + \left(\frac{a}{a + \rho}\right)^2\Psi_t^2 = 1. \tag{14}$$
On the basis of the explicit solutions above, we assume that ' has the form (15)
`Y = t + Pq j b2(t)p1'2. =o
Upon inserting (15) into (14) and utilizing the boundary condition on C, we find (16)
t-I p
2
3
2 {-
a
ap
6a
+a2-3aa-27 a
45
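The leading behavior implied by the form (15) can be checked against the explicit circle solution (11). The sketch below is an illustration, not part of the original derivation: it assumes a circle of radius `a`, takes the branch $\Psi - a\theta = \sqrt{r^2 - a^2} - a\cos^{-1}(a/r)$ of (11) at $r = a + \rho$, and compares it with $\tfrac{2}{3}\sqrt{2/a}\,\rho^{3/2}$. That leading coefficient is obtained here by integrating $d(\Psi - a\theta)/dr = \sqrt{r^2 - a^2}/r$ near $r = a$; the function names are hypothetical.

```python
import math

def circle_phase_offset(a, rho):
    """Psi - a*theta from (11) at r = a + rho (rho > 0), branch with +sqrt(r^2 - a^2)."""
    r = a + rho
    return math.sqrt(r * r - a * a) - a * math.acos(a / r)

def leading_term(a, rho):
    """Assumed leading term (2/3) * sqrt(2/a) * rho**1.5 of an expansion of the form (15)."""
    return (2.0 / 3.0) * math.sqrt(2.0 / a) * rho ** 1.5

if __name__ == "__main__":
    a = 1.0
    for rho in (1e-1, 1e-2, 1e-3):
        ratio = circle_phase_offset(a, rho) / leading_term(a, rho)
        print(f"rho = {rho:g}: exact / leading term = {ratio:.6f}")
```

The ratio should approach 1 as $\rho \to 0$, which is consistent with a first term of order $\rho^{3/2}$ in (15).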
The remaining coefficients can be found from a recursion formula which we will omit. The result (16) shows that $\Psi$ is real for $\rho > 0$ and complex for $\rho < 0$, and that the imaginary part of $\Psi$ is proportional to $|\rho|^{3/2}$ for $\rho$ small. Let us write $\Psi = R + iI$, where $R$ and $I$ are real. Then the eiconal equation yields

$$\nabla R \cdot \nabla I = 0, \tag{17}$$

$$(\nabla R)^2 = n^2 + (\nabla I)^2. \tag{18}$$
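The step from the eiconal equation to (17) and (18) is a separation of real and imaginary parts. A short worked version, assuming the eiconal equation in the form $(\nabla\Psi)^2 = n^2$ with real $n$:

```latex
\begin{align*}
(\nabla\Psi)^2
  &= (\nabla R + i\,\nabla I)\cdot(\nabla R + i\,\nabla I) \\
  &= (\nabla R)^2 - (\nabla I)^2 + 2i\,\nabla R\cdot\nabla I \;=\; n^2 , \\[4pt]
\text{imaginary part:}\quad & \nabla R\cdot\nabla I = 0 , \\
\text{real part:}\quad & (\nabla R)^2 = n^2 + (\nabla I)^2 .
\end{align*}
```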
Equation (17) shows that the surfaces $R$ = constant are orthogonal to the surfaces $I$ = constant. In the next section these surfaces will be shown to be surfaces of constant phase and of constant amplitude, respectively, for a field associated with $\Psi$. Thus (17) shows that for this field these surfaces are mutually orthogonal.

8. Field and amplitude. To make our theory quantitative, we associate a field $u(s)$ with each ray. It is composed of an amplitude $A(s)$ and a phase $\Psi(s)$ in the form

$$u(s) = A(s)\,e^{ik\Psi(s)}. \tag{19}$$

In (19), $k = \omega/c$ is the propagation constant, determined by the angular frequency $\omega$ of the field and the propagation velocity $c$ in empty space. Equivalently, $k = 2\pi/\lambda$, where $\lambda$ is the wavelength of the field in empty space. Thus our construction applies to a time-harmonic field. The time factor $e^{-i\omega t}$ will be omitted. The total field at a point $P$ is the sum of the fields (19) on all rays through $P$. When we deal with light, $u$ is either the electric or the magnetic field and therefore $A$ is a vector. However, our theory also applies to other types of field (e.g., acoustic pressure). For simplicity we shall describe it for a scalar $u$ and then indicate the modifications which occur for vector fields.
We first assume that the phase difference $\Psi(P) - \Psi(Q)$ between two points on a ray is equal to the optical length $L$ of the ray from $Q$ to $P$. We also assume that a direction of propagation is associated with each ray and that $\Psi$ increases in this direction. From these assumptions it follows that $\Psi$ can be determined at any point $P$ if it is known at some point $Q$ on the same ray:

$$\Psi(P) = \Psi(Q) + L. \tag{20}$$

We further require that $\Psi$ be constant on some wavefront of a normal congruence of rays. Then $\Psi$ is just the eiconal, or phase, function previously introduced.
Next we assume that the principle of conservation of energy applies in its optical form. This states that the energy flux is the same at every cross section of a tube of rays. We assume that the energy flux per unit area is proportional to $nA^2$. Then the energy principle yields, for a narrow tube of rays,

$$nA^2\,d\sigma = n_0A_0^2\,d\sigma_0. \tag{21}$$

Here $n$ and $A$ are evaluated at a point $P$ on a ray in the tube, and $d\sigma$ is the cross-sectional area of the tube at $P$. The quantities $n_0$, $A_0$, and $d\sigma_0$ are evaluated at some other point $Q$ of the same ray. From (21) we obtain

$$A = A_0\sqrt{\frac{n_0\,d\sigma_0}{n\,d\sigma}}. \tag{22}$$
Thus we can compute $A$ at any point $P$ on a ray, provided that we know the amplitude $A_0$ at some point $Q$ on the same ray. The ratio $d\sigma_0/d\sigma$ in (22) is the ratio of the areas of the cross sections at $Q$ and $P$. Since these cross sections are portions of wavefronts, this ratio is just the Jacobian of the mapping from a wavefront at $P$ to that at $Q$ by means of rays. When $A$ is a vector, we assume that its amplitude satisfies (22). Its direction, if $A$ is an electric or magnetic field, is obtained from $A_0$ by parallel transport along the ray with respect to the metric $n\,ds$.

If $\rho_1$ and $\rho_2$ denote the principal radii of curvature of the wavefront at $Q$, then, in a homogeneous medium, the corresponding radii at $P$ are $\rho_1 + s$ and $\rho_2 + s$. Here $s$ denotes the distance along the ray from $Q$ to $P$. Since the area ratio is inversely proportional to the ratio of Gaussian curvatures, (22) becomes

$$A = A_0\left[\frac{\rho_1\rho_2}{(\rho_1 + s)(\rho_2 + s)}\right]^{1/2}. \tag{23}$$
From (23) and (20) we see that, in a homogeneous medium,

$$u(s) = A_0\left[\frac{\rho_1\rho_2}{(\rho_1 + s)(\rho_2 + s)}\right]^{1/2} e^{ik(\Psi_0 + ns)}. \tag{24}$$

Here $\Psi_0 = \Psi(Q)$.
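To make the decay rates implied by (24) concrete, here is a minimal sketch that evaluates the spreading factor of (23) and the field (24) along a single ray in a homogeneous medium. The function names and the sample values of $\rho_1$, $\rho_2$, $k$, and $n$ are illustrative assumptions; an infinite radius of curvature is represented by `math.inf`, and positive radii with $s \ge 0$ are assumed so that no caustic is crossed.

```python
import cmath
import math

def spreading_factor(s, rho1, rho2):
    """sqrt(rho1*rho2 / ((rho1 + s)*(rho2 + s))) from (23).

    Assumes rho1, rho2 > 0 and s >= 0; math.inf marks a flat direction.
    """
    factor = 1.0
    for rho in (rho1, rho2):
        if math.isinf(rho):
            continue  # rho / (rho + s) -> 1 as rho -> infinity
        factor *= rho / (rho + s)
    return math.sqrt(factor)

def ray_field(s, a0, psi0, rho1, rho2, k, n=1.0):
    """Field (24): A0 * spreading factor * exp(i*k*(Psi0 + n*s))."""
    return a0 * spreading_factor(s, rho1, rho2) * cmath.exp(1j * k * (psi0 + n * s))

if __name__ == "__main__":
    k = 50.0
    cases = (("spherical", 1.0, 1.0),
             ("cylindrical", 1.0, math.inf),
             ("plane", math.inf, math.inf))
    for label, rho1, rho2 in cases:
        near = abs(ray_field(1.0e3, 1.0, 0.0, rho1, rho2, k))
        far = abs(ray_field(4.0e3, 1.0, 0.0, rho1, rho2, k))
        print(f"{label:12s} |u(4000)| / |u(1000)| = {far / near:.4f}")
```

Quadrupling a large distance $s$ should reduce $|u|$ by a factor of about 4, 2, or 1 in the three cases, matching the spherical, cylindrical, and plane-wave behavior.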
The field (24) becomes infinite at two, one, or no points on a ray, according as both, one, or neither of the radii of curvature are finite. These points are on the caustics of the ray congruence. In these various cases, $u$ decays for large $s$ like $s^{-1}$, $s^{-1/2}$, or $s^0$, that is, as in a spherical, cylindrical, or plane wave. Later we shall indicate how to modify our theory in order to obtain a finite value for $u$ on a caustic.

In homogeneous media it is often convenient to measure $s$ from a point $Q$ on the caustic $C$. To do this, we first rewrite (23) in the form

$$A\left[(\rho_1 + s)(\rho_2 + s)\rho_2^{-1}\right]^{1/2} = A_0\rho_1^{1/2}. \tag{25}$$

The left side of (25) has a limit as $Q$ tends to $C$, and therefore the right side must also. This is understandable since $A_0$ becomes infinite and $\rho_1$ becomes zero as $Q$ tends to $C$. Let us denote this limit by $A_0' = \lim A_0\rho_1^{1/2}$. Then (24) becomes

$$u(s) = A_0'\left[\frac{\rho_2}{s(\rho_2 + s)}\right]^{1/2} e^{ik(\Psi_0 + ns)}. \tag{26}$$

In a two-dimensional medium, or for cylindrical waves in three dimensions, $\rho_2$ is infinite, and (26) becomes

$$u(s) = s^{-1/2}A_0'\,e^{ik(\Psi_0 + ns)}. \tag{27}$$

In two dimensions, as we have seen, $\Psi_0 = nt$, where $t$ denotes arc length along $C$. Furthermore, since $A_0'$ varies from ray to ray, we may designate each ray by its point of tangency $t$ and write $A_0' = A_0'(t)$. Then (27) becomes

$$u(s,t) = s^{-1/2}A_0'(t)\,e^{ikn(t+s)}. \tag{28}$$
If two rays pass through a point $P$, as is often the case near a caustic, then $u(P)$ is a sum of two terms of the form (28). Let us apply (28) to the congruence of rays tangent to a circular caustic of radius $a$. Making use of our previous results for $s$ and $t$, we obtain for $r > a$,

$$u(r,\theta) = \frac{A_0'[a\theta - a\cos^{-1}(a/r)]}{\sqrt[4]{r^2 - a^2}}\,e^{ikn[a\theta - a\cos^{-1}(a/r) + \sqrt{r^2 - a^2}]} + \frac{A_0'[a\theta + a\cos^{-1}(a/r)]}{\sqrt[4]{r^2 - a^2}}\,e^{ikn[a\theta + a\cos^{-1}(a/r) - \sqrt{r^2 - a^2}] - i(\pi/2)}. \tag{29}$$
For $r < a$ we also obtain two terms, each corresponding to one of the imaginary rays through the point $(r,\theta)$. One of these terms increases with distance from the caustic, whereas the other decreases. We now assume that the increasing term must be omitted. Then we obtain for $r < a$

$$u(r,\theta) = \frac{A_0'[a\theta + ia\cosh^{-1}(a/r)]}{\sqrt[4]{a^2 - r^2}}\,e^{kn[ia\theta - a\cosh^{-1}(a/r) + \sqrt{a^2 - r^2}] - (i\pi/4)}. \tag{30}$$

For the result (30) we require that $A_0'(t)$ be an analytic function of $t$.
Let us now consider the function $v(r,\theta)$ defined by

$$v(r,\theta) = B\left[H^{(1)}_{nka}(nkr) - H^{(2)}_{nka}(nkr)\right]e^{inka\theta}. \tag{31}$$
This function is an exact solution of the reduced wave equation in two dimensions if $B$ is a constant. We now expand it asymptotically for large $nka$ and $nkr$. This yields exactly (29) for $r > a$ and (30) for $r < a$, provided that $A_0'$ is constant and that $B = e^{i\pi/4}\sqrt{kn\pi/2}\,A_0'$. This agreement indicates that our construction yields the leading term in the asymptotic expansion with respect to $k$, for $k$ large, of the exact solution of the wave equation. We believe that this is always the case.

So far, we have described how the amplitude varies along a ray. Now we shall explain how the initial value of the amplitude is to be determined. First, on rays which come from a source, even if it is at infinity, the amplitude must be prescribed. This prescription characterizes the source. Second, on a reflected or refracted ray at the point of reflection or refraction, we assume that the amplitude is proportional to that on the corresponding incident ray at this point. The proportionality factors are called reflection and transmission coefficients, $R$ and $T$, respectively. For vector fields these coefficients are matrices. Third, on a ray diffracted from an edge or vertex we assume that the field is also proportional to that on the corresponding incident ray at the point of diffraction. The proportionality factor we call a diffraction coefficient (or matrix, in the vector case). Additional hypotheses must be made to treat the fields on diffracted rays which have arcs on boundaries, but we shall not consider them here.
We assume that the various coefficients just introduced are determined solely by local conditions at the point of reflection, refraction, or diffraction. Thus, for example, the reflection and transmission coefficients depend only upon the angle between the incident ray and the surface normal as well as upon the properties of the media at the point of reflection. Therefore they can be determined from the solution of a canonical problem, that of reflection and refraction of a plane wave at a plane interface. The diffraction coefficients can also be obtained from the solutions of appropriate canonical problems. The various coefficients depend upon the type of field under consideration. Sound waves will have different coefficients from water waves, electromagnetic waves, or other waves. Consequently, these coefficients must be determined separately for different fields. Mathematically, this difference will be manifested by the differential equations and boundary conditions which occur in the canonical problems.

Canonical solutions can also be used to modify the results of our theory at and near caustics. Thus, for example, let us again consider a two-dimensional homogeneous medium in which a circular caustic occurs. Our results (29) and (30) for the field $u$ become infinite on the caustic $r = a$. But the function $v$ in (31), which is asymptotic to $u$ for large $k$, remains finite on the caustic. Therefore we can use $v$ instead of $u$ on and near the caustic in order to obtain a finite value for the field. We can also assume that a finite value for the field at a point on any caustic can be obtained from the field $u$ off the caustic by the same correction factor, involving the radius of curvature of the caustic at the point.

9. Relation to other work. Some types of diffracted rays and diffracted wavefronts have already occurred in the solutions of particular diffraction problems.
Some others have been observed experimentally or have been introduced to explain particular experimental results. We will now describe some of this previous work. First we recall Thomas Young's proposal that diffraction through an aperture in a screen is an edge effect. This proposal is in agreement with the present theory, which even makes it precise. Next we note that Sommerfeld's solution of Maxwell's equations for two-dimensional diffraction of waves by a half plane contains a cylindrical wave emanating from the edge [3]. The cylindrical wavefronts of this wave are just the diffracted wavefronts, and the normals to these cylinders are the diffracted rays, of our theory. The solutions of Sommerfeld and Macdonald for two-dimensional diffraction by wedges also contain cylindrical waves emanating from the edge. Their solutions for the three-dimensional case contain the cone of diffracted rays from each edge point. The bright lines in the shadows of plates, observed by G. G. Becknell and J. Coulson [1,2] have already been mentioned and explained in terms of our theory. Later Nijboer observed similar bright lines in the diffraction patterns of apertures. He introduced diffracted rays emanating normally from the edge and found that the caustics of these rays were exactly the observed bright lines. The present theory predicts the bright spot on the axis of a circular disk, as was noted above. This result is particularly interesting, because the observation of the bright spot was a strong argument for the wave theory of light. We now see that this result is also predicted by a ray theory. Therefore, if this ray theory had been available at the time of the controversy between ray and wave theory, it might have forestalled the acceptance of the latter. The field diffracted through an aperture in a screen can be represented as an integral over the aperture and screen. Using Kirchhoff's approximate
values for the integrand, A. Rubinowicz [4] reduced this integral to a line integral along the aperture rim and evaluated it by the method of stationary phase. The stationary points which he obtained for a given field point P coincide exactly with the places on the edge at which the diffracted rays through P are produced. N. G. van Kampen [5] evaluated asymptotically the integrals given by the modified Kirchhoff method. His result also contains one stationary point corresponding to each edge-diffracted ray through P, and in addition one stationary point corresponding to each corner of the edge, accounting for the corner-diffracted rays. R. M. Lewis, B. D. Seckler, and the present author [6] have obtained similar results from W. Braunbek's [7] modification of the Kirchhoff theory.
Surface rays appear in the asymptotic expansion for large ka of the field diffracted by a sphere or cylinder of radius a. This was originally shown by G. N. Watson and elaborated by B. van der Pol and H. Bremmer [8], B. Friedman [9], I. Imai [10], W. Franz [11], and others. The tangent rays shed by these surface rays are exhibited in the exact solution of W. Franz [11] and the approximate solution of W. Franz and K. Depperman [12]. The latter authors showed that calculations of radar reflection from cylinders, based on the idea of surface rays, agreed excellently with the measurements of Limbach. F. G. Friedlander [13] introduced surface rays and the associated wavefronts in studying diffraction by cylinders of convex cross section. Surface rays produced by refraction at the critical angle occur in the work of E. Gerjuoy [14] and of L. Brehovskih [15]. These authors examined the field produced by a point source near a plane interface between two media, in the high-frequency limit. They found that each critically refracted ray gave rise
to the appropriate diffracted rays. Such rays have been observed experimentally in acoustics. Spherical waves emanating from the tip occur in the solution for the field diffracted by a circular or elliptic cone. The wavefronts and rays of these
waves are just the diffracted wavefronts and rays predicted by the theory in this case. The rays leave the vertex in all directions. Rays reflected from surfaces of discontinuity of derivatives of the index of refraction do not seem to have been considered before. However, the fact that such discontinuities do reflect at normal incidence was noticed by J. Feinstein [16] and S. A. Schelkunoff [17]. The possibility of using rays in a systematic way for the calculation of fields
was investigated by R. K. Luneberg [18]. He suggested that the ray construction would yield the leading term in the asymptotic expansion of the field for large k. The procedure for obtaining further terms in this asymptotic expansion was given by M. Kline [19], both for Maxwell's equations and for more general equations. Other authors have considered the same type of expansion for various equations. Thus F. G. Friedlander [20], H. Bremmer [21], and E. T. Copson [22] also considered Maxwell's equations; S. C. Lowell [23] considered waves in shallow water; J. B. Keller [24] considered weak shock waves; G. D. Birkhoff [25], L. Brillouin [26], G. Wentzel [27], P. A. M. Dirac [28], and J. B. Keller [29] considered the Schroedinger equation of quantum mechanics; F. G. Friedlander and J. B. Keller [30] considered the reduced wave equation; and W. J. Trjitzinsky [31] considered a very general linear equation. All of these authors restricted their attention to the rays of ordinary geometrical optics.
Many diffraction problems have been solved with the ray method by C. Schensted [32], J. B. Keller, R. M. Lewis, and B. Seckler [33], J. B. Keller [24,29,34,35], K. O. Friedrichs and J. B. Keller [36], B. R. Levy and J. B. Keller [37], S. N. Karp and J. B. Keller [38], B. D. Seckler and J. B. Keller [40], etc. Whenever possible, the fields constructed by the ray method were compared with asymptotic expansions (for large $k$) of exact solutions. In all such cases perfect agreement was obtained. In other cases numerical results were compared, and good agreement was obtained for $ka \ge 2$, where $a$ is a typical length in the problem. All of these results suggest that the ray method does yield the leading terms in the asymptotic expansions of solutions of diffraction problems. However, a general proof of this statement has not yet been obtained. Partial results of this kind are given by R. K. Luneberg [18], M. Kline [19], W. J. Trjitzinsky [31], W. L. Miranker [39], and R. M. Lewis [41].

BIBLIOGRAPHY
1. J. Coulson and G. G. Becknell, Reciprocal diffraction relations between circular and elliptical plates, Phys. Rev. vol. 20 (1922) p. 594.
2. J. Coulson and G. G. Becknell, An extension of the principle of the diffraction evolute and some of its structural detail, Phys. Rev. vol. 20 (1922) p. 607.
3. A. J. W. Sommerfeld, Optics, Academic Press, Inc., New York, 1954.
4. A. Rubinowicz, The diffraction waves in Kirchhoff's theory of diffraction phenomena, Ann. Phys. vol. 53 (1917) p. 257.
5. N. G. van Kampen, An asymptotic treatment of diffraction problems, Physica vol. 14 (1949) p. 575.
6. J. B. Keller, R. M. Lewis, and B. D. Seckler, Diffraction by an aperture, II, New York Univ. Inst. Math. Sci. Research Rep. EM-96 (1956); J. Appl. Phys. vol. 28 no. 5 (May, 1957) pp. 570-579.
7. W. Braunbek, Neue Näherungsmethode für die Beugung am ebenen Schirm, Zeit. Physik vol. 127 (1950) p. 381. W. Braunbek, Zur Beugung an der Kreisscheibe, Zeit. Physik vol. 127 (1950) p. 405.
8. H. Bremmer, Terrestrial radio waves, Elsevier Press, Inc., Houston, 1949.
9. B. Friedman, Comm. Pure Appl. Math. vol. 4 (1951) p. 317.
10. I. Imai, Die Beugung electromagnetischer Wellen an einem Kreiszylinder, Zeit. Physik vol. 137 (1954) pp. 31-48.
11. W. Franz, Zeit. Natur. vol. 9a (1954) pp. 705-716.
12. K. Depperman and W. Franz, Theorie der Beugung an der Kugel unter Berücksichtigung der Kriechwelle, Ann. Phys. Ser. 6 vol. 14 (1954) pp. 253-264.
13. F. G. Friedlander, Proc. Cambridge Philos. Soc. vol. 38 (1942) p. 383.
14. E. Gerjuoy, Comm. Pure Appl. Math. vol. 6 (1953) p. 73.
15. L. Brekovskih, Tech. Phys. USSR vol. 18 (1948) p. 455.
16. J. Feinstein, Trans. IRE, PGAP, AP-2 (1954) p. 23.
17. S. A. Schelkunoff, Comm. Pure Appl. Math. vol. 4 (1951) p. 181.
18. R. K. Luneberg, The mathematical theory of optics, Brown University, 1944. Propagation of electromagnetic waves, New York University, 1948.
19. M. Kline, An asymptotic solution of Maxwell's equations, Comm. Pure Appl. Math. vol. IV no. 2-3 (August, 1951) pp. 225-263. Asymptotic solution of linear hyperbolic partial differential equations, J. Rational Mech. Anal. vol. 3 no. 3 (May, 1954).
20. F. G. Friedlander, Geometrical optics and Maxwell's equations, Proc. Cambridge Philos. Soc. vol. 43 part 2 (1946) pp. 284-286.
21. H. Bremmer, The jumps of discontinuous solutions of the wave equation, Comm. Pure Appl. Math. vol. IV no. 4 (November, 1951) pp. 419-427.
22. E. T. Copson, The transport of discontinuities in an electromagnetic field, Comm. Pure Appl. Math. vol. IV no. 4 (November, 1951) pp. 427-435.
23. S. C. Lowell, The propagation of waves in shallow water, Comm. Pure Appl. Math. vol. 2 no. 2-3 (1949) pp. 275-291.
24. J. B. Keller, Geometrical acoustics, I, The theory of weak shocks, J. Appl. Phys. vol. 25 no. 8 (August, 1954) pp. 938-947.
25. G. D. Birkhoff, Some remarks concerning Schroedinger's wave equation, Proc. Nat. Acad. Sci. U.S.A. vol. 19 (1933) pp. 339-344; and in Collected mathematical papers, Vol. II, American Mathematical Society, 1950, pp. 813-818. Quantum mechanics and asymptotic series, Amer. Math. Soc. Bull. vol. 39 (1933) pp. 681-700; and in Collected mathematical papers, Vol. II, American Mathematical Society, 1950, pp. 837-856.
26. L. Brillouin, Remarques sur la mécanique ondulatoire, J. Phys. Radium vol. 7 (1936) pp. 353-368. La mécanique ondulatoire; une méthode générale de résolution par approximations successives, C. R. Acad. Sci. Paris vol. 183 (1926) p. 24.
27. G. Wentzel, Eine Verallgemeinerung der Quantenbedingungen für die Zwecke der Wellenmechanik, Zeits. Physik vol. 38 (1926) p. 518.
28. P. A. M. Dirac, The principles of quantum mechanics, Oxford University Press, London, 3d ed., 1947, pp. 121-123.
29. J. B. Keller, Derivation of the Bohr-Sommerfeld quantum conditions from an asymptotic solution of the Schroedinger equation, New York Univ. Inst. Math. Sci. Research Rept. CX-10 (July, 1953).
30. F. G. Friedlander and J. B. Keller, Asymptotic expansions of solutions of $(\nabla^2 + k^2)u = 0$, New York Univ. Inst. Math. Sci. Research Rep. EM-67 (September, 1954); Comm. Pure Appl. Math. vol. 8 no. 3 (August, 1955) pp. 387-394.
31. W. J. Trjitzinsky, Analytic theory of parametric linear partial differential equations, Rec. Math. vol. 15 (1944) p. 179.
32. C. Schensted, The electromagnetic transport equation and the Luneberg-Kline method of solution, Univ. of Michigan, Eng. Research Inst. Rep. 15-25-(504)-3.
33. J. B. Keller, R. M. Lewis, and B. D. Seckler, Asymptotic solution of some diffraction problems, New York Univ. Inst. Math. Sci. Research Rep. EM-81 (1955); Comm. Pure Appl. Math. vol. 9 (1956) p. 207.
34. J. B. Keller, Diffraction by an aperture, I, New York Univ. Inst. Math. Sci. Research Rept. EM-92 (1956); J. Appl. Phys. vol. 28 no. 4 (April, 1957) pp. 426-444.
35. J. B. Keller, Trans. IRE, PGAP, AP-4 (1956) pp. 312-321.
36. K. O. Friedrichs and J. B. Keller, Geometrical acoustics, II; Diffraction, reflection and refraction of a weak spherical or cylindrical shock at a plane interface, Jour. Appl. Phys. vol. 26 (1955) pp. 961-966.
37. J. B. Keller and B. Levy, Diffraction by a smooth object, to be published, New York Univ. Inst. Math. Sci. EM Series; B. R. Levy and J. B. Keller, Diffraction by a smooth object, New York Univ. Inst. Math. Sci. Research Rep. EM-109 (December, 1957).
38. S. N. Karp and J. B. Keller, Diffraction by an aperture, III; to be published, New York Univ. Inst. Math. Sci. EM Series.
39. W. L. Miranker, The asymptotic theory of solutions of $\Delta u + k^2u = 0$, New York Univ. Inst. Math. Sci. BR-21 (1956).
40. B. D. Seckler and J. B. Keller, Diffraction in inhomogeneous media, New York Univ. Inst. Math. Sci. Research Rep. MME-7 (December, 1957).
41. R. M. Lewis, Discontinuous initial value problems and asymptotic expansion of steady-state solution, New York Univ. Inst. Math. Sci. Research Rep. MME-8 (December, 1957).

INSTITUTE OF MATHEMATICAL SCIENCES, NEW YORK UNIVERSITY,
NEW YORK, N.Y.
UPPER AND LOWER BOUNDS FOR EIGENVALUES¹
BY
J. B. DIAZ
1. Introduction. The problem of finding general methods for the approximation of eigenvalues of self-adjoint differential problems has attracted a great deal of attention in the scientific literature. The well-known Rayleigh-Ritz method (see Rayleigh [62], Ritz [64,65], and H. Poincaré [58]) furnishes upper bounds for the eigenvalues of a differential eigenvalue problem. On the other hand, A. Weinstein [90], in connection with certain eigenvalue problems of the theory of plates, developed a method for obtaining lower bounds for eigenvalues. It seemed to be of interest, in accordance with the theme of this Symposium, to present an account, as self-contained as possible, of the underlying ideas of these two fundamental methods for the approximation of eigenvalues.

In order to proceed quickly, the following points of view have been adopted: First of all, for definiteness, attention has been focused on a particular differential problem, that of a vibrating clamped plate, which was used by Weinstein [90] in developing his method originally. It has also been found convenient, so as not to encumber the exposition unduly, to omit for the most part all the relevant differentiability hypotheses required of the "arbitrary" functions occurring in the discussion. The eigenvalues are supposed to be defined by means of certain variational problems, and the question of the equivalence between these problems and the corresponding differential problems is not analyzed. The variational problems are used in Sec. 2 as a basis for all later considerations. The Rayleigh-Ritz method is dealt with in Sec. 3, and the Weinstein method is treated in Sec. 4. The last section contains a number of remarks, which were placed together at the end in order not to interrupt the trend of thought of the preceding sections.

During the intervening years since the publication of Weinstein's volume in the series Mémorial des sciences mathématiques, his method has been developed further by himself and other mathematicians, notably N. Aronszajn. A unified presentation of the Rayleigh-Ritz and the Weinstein methods for the approximation of the eigenvalues of operators in a Hilbert space is contained in Aronszajn's report [2]. It would by far exceed the modest aims of the present account to even mention the many recent contributions of Aronszajn and his colleagues which appear among the items listed in the bibliography to this paper.

¹ This research was supported by the U.S. Air Force through the Air Force Office of Scientific Research of the Air Research and Development Command under Contract No. AF18(600) 573.
The detailed treatment of a numerical application of Weinstein's method is not attempted here. However, attention is called to the many numerical applications made by Weinstein and his colleagues and also to Aronszajn [4, pp. 26-40], where one finds an application of the methods of Weinstein and of Rayleigh-Ritz to the computation of lower and upper bounds for the first 13 eigenvalues of a vibrating clamped square plate. Quite interesting numerical and theoretical considerations relative to plate problems are to be found in the report by Aronszajn and Donoghue [7].
It is hoped that the present summary of the basic ideas behind these two methods for the approximation of eigenvalues will make them more readily accessible to workers in related fields of mathematical physics.

2. The basic inequalities. The eigenvalue problem in question is the following:

$$\Delta\Delta w - \lambda^2 w = 0 \qquad (\text{on } D), \tag{1}$$

$$w = \frac{\partial w}{\partial n} = 0 \qquad (\text{on } C), \tag{2}$$
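where $D$ is a bounded plane open connected set with a smooth boundary $C$, where the operator $\Delta = \partial^2/\partial x^2 + \partial^2/\partial y^2$ is the two-dimensional Laplacian, and where $\partial/\partial n$ denotes differentiation in the direction of the outer normal to $C$. If $w \not\equiv 0$ is an eigenfunction of (1),(2) corresponding to the eigenvalue $\lambda^2$, then an application of Green's identity

$$\iint_D (f\,\Delta g - g\,\Delta f)\,dx\,dy = \int_C \left(f\,\frac{\partial g}{\partial n} - g\,\frac{\partial f}{\partial n}\right) ds \tag{3}$$

yields at once that

$$0 < \lambda^2 = \frac{(\Delta w,\Delta w)}{(w,w)} = \frac{(w,\Delta\Delta w)}{(w,w)}. \tag{4}$$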
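The step from Green's identity to (4) can be written out as follows. This is a sketch that assumes Green's identity in its standard second form, applied with $f = w$ and $g = \Delta w$, and that uses the scalar-product notation $(f,g) = \iint_D fg\,dx\,dy$:

```latex
\begin{align*}
(w,\Delta\Delta w) - (\Delta w,\Delta w)
  &= \iint_D \bigl(w\,\Delta(\Delta w) - \Delta w\,\Delta w\bigr)\,dx\,dy \\
  &= \int_C \Bigl(w\,\frac{\partial(\Delta w)}{\partial n}
       - \Delta w\,\frac{\partial w}{\partial n}\Bigr)\,ds
   \;=\; 0 \qquad \text{by (2)}, \\[4pt]
\lambda^2\,(w,w) &= (w,\Delta\Delta w) = (\Delta w,\Delta w) \qquad \text{by (1)}.
\end{align*}
```

Positivity follows because $(\Delta w,\Delta w) = 0$ would force $\Delta w = 0$ in $D$, and a harmonic function vanishing on $C$ is identically zero, contrary to $w \not\equiv 0$.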
Notice that here, as elsewhere in the sequel, the usual notation

$$(f,g) = \iint_D f(x,y)\,g(x,y)\,dx\,dy \tag{5}$$

for the scalar product of the two functions $f$ and $g$ has been employed. The relation (4) explains why the eigenvalues in (1) have been designated by $\lambda^2$; one may also suppose that $\lambda > 0$, for definiteness.

To each eigenvalue $\lambda^2$ of (1),(2) corresponds a positive integer, called its multiplicity, which is the maximum number of linearly independent, not identically zero solutions of (1),(2). Sometimes, if the number $\lambda^2$ is not an eigenvalue of (1),(2), it will be convenient to express this by saying in this case that the multiplicity of $\lambda^2$ is zero. Let the eigenvalues of (1),(2), with due account being taken of their multiplicity (i.e., each eigenvalue being represented exactly as many times as its multiplicity), be denoted by the following nondecreasing sequence:

$$0 < \lambda_1^2 \le \lambda_2^2 \le \lambda_3^2 \le \cdots. \tag{6}$$
The following considerations are based on R. Courant's [17] fundamental characterization of the eigenvalues (6) by means of variational problems. Let $A$ designate the class of admissible functions for the eigenvalue problem (1),(2), i.e., the set of all real-valued functions $w$ defined on $D + C$ and satisfying the boundary conditions $w = \partial w/\partial n = 0$ on $C$. [In order to simplify the statement of the various results, the following convention will be agreed upon: it shall be understood, by definition, that the function $w \equiv 0$, which is a member of the set $A$, assigns the value $+\infty$ to the "Rayleigh" quotient

$$\frac{(\Delta w,\Delta w)}{(w,w)} = \frac{(w,\Delta\Delta w)}{(w,w)}$$

which occurs in Courant's "maximum-minimum" definition of the eigenvalues (6).] Then, according to Courant, one has

$$\lambda_1^2 = \min_{w \in A} \frac{(\Delta w,\Delta w)}{(w,w)} \tag{7}$$

and

$$\lambda_n^2 = \max_{\phi_1,\ldots,\phi_{n-1}}\ \min_{\substack{w \in A \\ (w,\phi_i) = 0 \\ i = 1,\ldots,n-1}} \frac{(\Delta w,\Delta w)}{(w,w)} \tag{8}$$
for $n = 2, 3, \ldots$, where the functions $\phi_i$ are arbitrary real-valued functions defined on $D + C$. Given $n = 2, 3, \ldots$ and the $n - 1$ functions $\phi_1, \ldots, \phi_{n-1}$, equation (8) states that, in order to obtain $\lambda_n^2$, the quotient $(\Delta w,\Delta w)/(w,w)$ is to be minimized first over the subset (of all the admissible functions $A$) which is orthogonal to the functions $\phi_1, \ldots, \phi_{n-1}$, and that this minimum value, which depends on the particular choice of $\phi_1, \ldots, \phi_{n-1}$, is then to be maximized over all the possible choices for $\phi_1, \ldots, \phi_{n-1}$. It is clear that, under a perfectly natural interpretation (the functions $\phi_i$ being absent in this case), the equation (7) may be subsumed under equation (8) for $n = 1$, if desired. The cases $n = 1$ and $n > 1$ have been written separately above only to emphasize the fact that for $n > 1$ there are two distinct nontrivial processes to be carried out, first a minimum and then a maximum, while for $n = 1$ there is just one, a minimum, or, if one wishes to think of it that way, first a minimum and then a (trivial) maximum.

Let the eigenvalues given by (7),(8) be denoted by $\lambda_n^2(A)$, $n = 1, 2, 3, \ldots$, in order to show their dependence on the set $A$. The chief idea of the estimation of eigenvalues $\lambda_n^2(A) = \lambda_n^2$ [see (6)], based on their variational definitions (7),(8), is simply the following: If $S$ is a set of functions which is smaller than $A$ (i.e., a subset of $A$) and if the "eigenvalues" $\lambda_n^2(S)$, $n = 1, 2, 3, \ldots$, are defined by replacing $A$ by $S$ in (7),(8), then one obtains upper bounds for the eigenvalues $\lambda_n^2(A)$, $n = 1, 2, 3, \ldots$; while if $B$ is a set of functions which is bigger than $A$ (i.e., a superset of $A$) and if the "eigenvalues" $\lambda_n^2(B)$, $n = 1, 2, 3, \ldots$, are defined by replacing $A$ by $B$ in (7),(8), then one obtains lower bounds for the eigenvalues $\lambda_n^2(A)$, $n = 1, 2, 3, \ldots$. Clearly, in order that the definitions of $\lambda_n^2(S)$ and $\lambda_n^2(B)$ remain meaningful for all sets $S$ and $B$, one may have to interpret "min" as "greatest lower bound" and "max" as "least upper bound," and this understanding will be taken for granted throughout. Thus if

$$S \subset A \subset B, \tag{9}$$

then

$$\lambda_n^2(B) \le \lambda_n^2(A) \le \lambda_n^2(S), \qquad (n = 1, 2, 3, \ldots), \tag{10}$$

where

$$\lambda_1^2(B) = \min_{w \in B} \frac{(\Delta w,\Delta w)}{(w,w)}, \tag{11}$$

$$\lambda_n^2(B) = \max_{\phi_1,\ldots,\phi_{n-1}}\ \min_{\substack{w \in B \\ (w,\phi_i) = 0 \\ i = 1,\ldots,n-1}} \frac{(\Delta w,\Delta w)}{(w,w)}, \qquad (n = 2, 3, \ldots), \tag{12}$$

and similar definitions hold for $\lambda_n^2(S)$, $n = 1, 2, 3, \ldots$. It is clear that

$$0 \le \lambda_1^2(B) \le \lambda_2^2(B) \le \lambda_3^2(B) \le \cdots$$

and

$$0 \le \lambda_1^2(S) \le \lambda_2^2(S) \le \lambda_3^2(S) \le \cdots.$$
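The inequalities (10) follow at once from the fact that, for example,

$$\min_{w \in B} \frac{(\Delta w,\Delta w)}{(w,w)} \le \min_{w \in A} \frac{(\Delta w,\Delta w)}{(w,w)}$$

and

$$\min_{\substack{w \in B \\ (w,\phi_i) = 0 \\ i = 1,\ldots,n-1}} \frac{(\Delta w,\Delta w)}{(w,w)} \le \min_{\substack{w \in A \\ (w,\phi_i) = 0 \\ i = 1,\ldots,n-1}} \frac{(\Delta w,\Delta w)}{(w,w)}, \qquad (n = 2, 3, \ldots).$$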
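The upper-bound half of (10) can be illustrated on a one-dimensional analogue of the clamped plate. The sketch below is an illustration under stated assumptions, not an example from the paper: it replaces $D$ by the interval $[0,1]$, $\Delta w$ by $w''$, and the boundary conditions (2) by $w = w' = 0$ at both ends (a clamped beam), and it takes for $S$ the span of one or two polynomial trial functions $x^p(1-x)^p$. All function names are hypothetical. For this model problem the lowest eigenvalue is known to be about 500.56, so the computed values should lie above it and should not increase when the trial space is enlarged.

```python
from fractions import Fraction
import math

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (ascending powers)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_diff(p):
    """Derivative of a polynomial coefficient list."""
    return [k * c for k, c in enumerate(p)][1:] or [Fraction(0)]

def inner(p, q):
    """(p, q): exact integral of p*q over [0, 1]."""
    return sum(c / (k + 1) for k, c in enumerate(poly_mul(p, q)))

def energy(p, q):
    """(p'', q''): the one-dimensional stand-in for (Delta p, Delta q)."""
    return inner(poly_diff(poly_diff(p)), poly_diff(poly_diff(q)))

def trial(power):
    """x^power * (1 - x)^power; for power >= 2 it and its derivative vanish at x = 0 and 1."""
    p = [Fraction(1)]
    for _ in range(power):
        p = poly_mul(poly_mul(p, [Fraction(0), Fraction(1)]), [Fraction(1), Fraction(-1)])
    return p

def ritz_lowest(basis):
    """Smallest value of the quotient over the span of one or two trial functions."""
    if len(basis) == 1:
        (p,) = basis
        return float(energy(p, p) / inner(p, p))
    p, q = basis
    k11, k12, k22 = energy(p, p), energy(p, q), energy(q, q)
    m11, m12, m22 = inner(p, p), inner(p, q), inner(q, q)
    # det(K - mu*M) = a*mu^2 - b*mu + c = 0; take the smaller root.
    a = m11 * m22 - m12 * m12
    b = k11 * m22 + k22 * m11 - 2 * k12 * m12
    c = k11 * k22 - k12 * k12
    disc = b * b - 4 * a * c
    return float(2 * c) / (float(b) + math.sqrt(float(disc)))

if __name__ == "__main__":
    print("one trial function :", ritz_lowest([trial(2)]))
    print("two trial functions:", ritz_lowest([trial(2), trial(3)]))
```

Exact rational arithmetic is used for the integrals so that the only rounding occurs in the final quadratic formula; with more than two trial functions a general symmetric eigenvalue routine would be needed instead.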
It is to be noticed that the set-inclusion relation (9) is reflected in the inequalities (10), which seem to go the wrong way, so to speak, as far as the order of $S$, $A$, and $B$ is concerned. In Hilbert space, or in the theory of integral equations, one considers $1/\lambda_n^2$ rather than $\lambda_n^2$, as above, and if this replacement is made, then the modified (10) appears with the inequality signs reversed, so that then the relative order of $S$, $A$, and $B$ is preserved in passing from (9) to the modified (10).

Given the sets of functions $S$ and $B$, the inequalities (10) furnish upper and lower bounds for the eigenvalues $\lambda_n^2(A)$. It is now clear how to proceed in order to obtain a monotone nondecreasing sequence of lower bounds and a monotone nonincreasing sequence of upper bounds for $\lambda_n^2(A)$. Let

$$S = S_0 \subset S_1 \subset S_2 \subset \cdots \subset S_m \subset \cdots \subset A \subset \cdots \subset B_m \subset \cdots \subset B_2 \subset B_1 \subset B_0 = B; \tag{13}$$

then in an obvious notation,

$$\lambda_n^2(B) = \lambda_n^2(B_0) \le \lambda_n^2(B_1) \le \lambda_n^2(B_2) \le \cdots \le \lambda_n^2(B_m) \le \cdots \le \lambda_n^2(A) \le \cdots \le \lambda_n^2(S_m) \le \cdots \le \lambda_n^2(S_2) \le \lambda_n^2(S_1) \le \lambda_n^2(S_0) = \lambda_n^2(S), \tag{14}$$

for $n = 1, 2, 3, \ldots$. It is clear that, for each $m = 1, 2, 3, \ldots$, one has

$$0 \le \lambda_1^2(B_m) \le \lambda_2^2(B_m) \le \lambda_3^2(B_m) \le \cdots$$

and

$$0 \le \lambda_1^2(S_m) \le \lambda_2^2(S_m) \le \lambda_3^2(S_m) \le \cdots.$$

It appears plausible, but will not be discussed further here, that under favorable circumstances it will be true that $\lim_{m\to\infty} \lambda_n^2(B_m) = \lambda_n^2(A) = \lim_{m\to\infty} \lambda_n^2(S_m)$ for $n = 1, 2, 3, \ldots$ (see N. Aronszajn and A. Weinstein [11]).

For numerical applications, it is essential that the sets $S_m$ and $B_m$ be chosen in such a way that the eigenvalues $\lambda_n^2(S_m)$ and $\lambda_n^2(B_m)$ (or at least an initial segment of them) be explicitly computable. Notice that, in order to be able to write that

$$\lambda_n^2(S_m) \ge \lambda_n^2(A) \ge \lambda_n^2(B_m),$$
one must actually know the proper multiplicities of the eigenvalues corresponding to the sets of functions S. and Bm. These questions will be dealt with in the next two sections. 3. The Rayleigh-Ritz method and Aronszajn's generalized Rayleigh-Ritz method for the determination of upper bounds for the eigenvalues. Consider the Rayleigh-Ritz method first. Here the initial step is to choose So to be the subset of A consisting of the identically zero function alone. In accordance with the convention made as to the value of the quotient (Aw,A) when w = 0, one obtains the (trivial) upper bounds (15)
Xln(A) < oc,
(n = 1, 2, 3,
).
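Now choose a sequence $\psi_1, \psi_2, \psi_3, \ldots$ of real-valued functions defined on $D + C$, and satisfying the boundary conditions (2):
$$\psi_i = \frac{\partial \psi_i}{\partial n} = 0, \qquad (i = 1, 2, 3, \ldots), \qquad \text{(on } C). \tag{16}$$
These admissible functions $\psi_i$ are the "coordinate functions," in the terminology of Ritz. Suppose further, for convenience, that for each positive integer $m$ the functions $\psi_1, \ldots, \psi_m$ are linearly independent, so that their Gram determinant
$$\begin{vmatrix} (\psi_1,\psi_1) & (\psi_1,\psi_2) & \cdots & (\psi_1,\psi_m) \\ (\psi_2,\psi_1) & (\psi_2,\psi_2) & \cdots & (\psi_2,\psi_m) \\ \vdots & \vdots & & \vdots \\ (\psi_m,\psi_1) & (\psi_m,\psi_2) & \cdots & (\psi_m,\psi_m) \end{vmatrix} \neq 0. \tag{17}$$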
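Let, for each positive integer $m$, the set $S_m$ be the set $[\psi_1, \ldots, \psi_m]$ of all linear combinations, with real coefficients, of the functions $\psi_1, \ldots, \psi_m$. Since each of the functions $\psi_1, \ldots, \psi_m$ is an admissible function, it follows that $S_m$ is contained in $A$ and that, for any $m = 1, 2, 3, \ldots$,
$$\lambda_n^2(A) \le \lambda_n^2(S_m), \qquad (\text{for } n = 1, 2, 3, \ldots), \tag{18}$$
where
$$\lambda_1^2(S_m) = \min_{w \in [\psi_1,\ldots,\psi_m]} \frac{(\Delta w,\Delta w)}{(w,w)} \tag{19}$$
and
$$\lambda_n^2(S_m) = \max_{\phi_1,\ldots,\phi_{n-1}} \; \min_{\substack{w \in [\psi_1,\ldots,\psi_m] \\ (w,\phi_i)=0,\ i=1,\ldots,n-1}} \frac{(\Delta w,\Delta w)}{(w,w)} \tag{20}$$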
for $n = 2, 3, \ldots$. Actually, the upper bounds for $\lambda_n^2(A)$ given by (18) are trivial for $n > m$, because only the upper bounds $\lambda_1^2(S_m), \ldots, \lambda_m^2(S_m)$ are finite. To see this, it suffices to show that
$$\lambda_{m+1}^2(S_m) = \infty, \tag{21}$$
because
$$\lambda_{m+1}^2(S_m) \le \lambda_{m+2}^2(S_m) \le \lambda_{m+3}^2(S_m) \le \cdots.$$
But (21) readily follows upon taking $n = m + 1$ in (20) and choosing $\phi_1 = \psi_1$, $\phi_2 = \psi_2$, $\ldots$, $\phi_m = \psi_m$. For then, if $w = C_1\psi_1 + \cdots + C_m\psi_m$, with real coefficients $C_1, \ldots, C_m$, satisfies the $m$ orthogonality conditions
$$(w,\psi_i) = 0, \qquad (i = 1, \ldots, m);$$
that is,
$$\sum_{k=1}^{m} C_k(\psi_k,\psi_i) = 0, \qquad (i = 1, \ldots, m);$$
one has, on account of (17), that $C_1 = \cdots = C_m = 0$ and that $w = 0$. Hence
$$\min_{\substack{w \in [\psi_1,\ldots,\psi_m] \\ (w,\psi_i)=0,\ i=1,\ldots,m}} \frac{(\Delta w,\Delta w)}{(w,w)} = \infty,$$
which implies (21).
It remains to verify that the $m$ eigenvalues $\lambda_1^2(S_m), \ldots, \lambda_m^2(S_m)$ are finite.
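This can be seen from the fact that their determination reduces to an algebraic problem, the determination of the $m$ eigenvalues of a certain $m$-by-$m$ symmetric matrix. For if $w = C_1\psi_1 + \cdots + C_m\psi_m$, with real coefficients $C_1, \ldots, C_m$, not all zero, is a function of the set $S_m$, then
$$\frac{(\Delta w,\Delta w)}{(w,w)} = \frac{\sum_{j=1}^{m}\sum_{k=1}^{m} C_jC_k(\Delta\psi_j,\Delta\psi_k)}{\sum_{j=1}^{m}\sum_{k=1}^{m} C_jC_k(\psi_j,\psi_k)} \tag{22}$$
is the ratio of two positive-definite quadratic forms in the $m$ real variables $C_1, \ldots, C_m$. This follows from the fact that, for each positive integer $j$, with $1 \le j \le m$, one has that the Gram determinant
$$\begin{vmatrix} (\psi_1,\psi_1) & \cdots & (\psi_1,\psi_j) \\ \vdots & & \vdots \\ (\psi_j,\psi_1) & \cdots & (\psi_j,\psi_j) \end{vmatrix} > 0, \tag{23}$$
and that also the Gram determinant
$$\begin{vmatrix} (\Delta\psi_1,\Delta\psi_1) & \cdots & (\Delta\psi_1,\Delta\psi_j) \\ \vdots & & \vdots \\ (\Delta\psi_j,\Delta\psi_1) & \cdots & (\Delta\psi_j,\Delta\psi_j) \end{vmatrix} > 0. \tag{24}$$
The inequality (23) follows from the given linear independence of the $j$ functions $\psi_1, \ldots, \psi_j$ and the inequality (24) from the linear independence of the $j$ functions $\Delta\psi_1, \ldots, \Delta\psi_j$, which will now be proved. Suppose, on the contrary, that the functions $\Delta\psi_1, \ldots, \Delta\psi_j$ are linearly dependent. Then there exist real numbers $C_1, \ldots, C_j$, not all zero, such that
$$C_1\Delta\psi_1 + \cdots + C_j\Delta\psi_j = \Delta(C_1\psi_1 + \cdots + C_j\psi_j) = 0, \qquad \text{(on } D).$$
But, in addition
$$C_1\psi_1 + \cdots + C_j\psi_j = \frac{\partial}{\partial n}(C_1\psi_1 + \cdots + C_j\psi_j) = 0, \qquad \text{(on } C),$$
and thus
$$C_1\psi_1 + \cdots + C_j\psi_j = 0, \qquad \text{(on } D),$$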
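contradicting the linear independence of the $j$ functions $\psi_1, \ldots, \psi_j$. From the preceding remarks and from (22) one sees that the $m$ eigenvalues $\lambda_1^2(S_m), \ldots, \lambda_m^2(S_m)$ are precisely the $m$ (positive) roots, in increasing order of magnitude, of the $m$th-degree polynomial in $\lambda^2$ which is given by the $m$th-order determinant
$$\begin{vmatrix} (\Delta\psi_1,\Delta\psi_1) - \lambda^2(\psi_1,\psi_1) & \cdots & (\Delta\psi_1,\Delta\psi_m) - \lambda^2(\psi_1,\psi_m) \\ (\Delta\psi_2,\Delta\psi_1) - \lambda^2(\psi_2,\psi_1) & \cdots & (\Delta\psi_2,\Delta\psi_m) - \lambda^2(\psi_2,\psi_m) \\ \vdots & & \vdots \\ (\Delta\psi_m,\Delta\psi_1) - \lambda^2(\psi_m,\psi_1) & \cdots & (\Delta\psi_m,\Delta\psi_m) - \lambda^2(\psi_m,\psi_m) \end{vmatrix}. \tag{25}$$
If the functions $\psi_1, \ldots, \psi_m$ are orthonormalized, i.e., if
$$(\psi_i,\psi_j) = \begin{cases} 1, & (\text{if } i = j), \\ 0, & (\text{if } i \neq j), \end{cases}$$
for $i,j = 1, \ldots, m$, then (25) becomes just the characteristic determinant for the $m$-by-$m$ matrix with elements $(\Delta\psi_j,\Delta\psi_k)$, where $j,k = 1, \ldots, m$. The denominator in (22) is then just the square of the Euclidean distance of the point $(C_1, \ldots, C_m)$ from the origin, and the $m$ eigenvalues $\lambda_1^2(S_m), \ldots, \lambda_m^2(S_m)$ are seen to be the squares of the reciprocals (arranged in increasing order of magnitude) of the principal semiaxes of the "ellipsoid"
$$\sum_{j=1}^{m}\sum_{k=1}^{m} C_jC_k(\Delta\psi_j,\Delta\psi_k) = 1$$
in $m$-dimensional Euclidean space with coordinates $(C_1, \ldots, C_m)$.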
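As an illustration of the reduction to the determinant (25), the following Python sketch treats a one-dimensional analogue that is not part of the paper: the clamped beam on the interval $0 \le x \le 1$, with quotient $(w'',w'')/(w,w)$ and boundary conditions $w = w' = 0$ at both ends. The coordinate functions $\psi_1 = x^2(1-x)^2$ and $\psi_2 = x^3(1-x)^3$, and all function names, are assumptions chosen for the example. The inner products are computed exactly with rational arithmetic, and for $m = 2$ the two roots of (25) are found from the quadratic formula.

```python
from fractions import Fraction
import math

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_diff(p):
    """Derivative of a polynomial."""
    return [k * c for k, c in enumerate(p)][1:] or [Fraction(0)]

def inner(p, q):
    """(p, q) = integral of p*q over [0, 1], computed exactly."""
    return sum(c / (k + 1) for k, c in enumerate(poly_mul(p, q)))

def energy(p, q):
    """(p'', q''), the one-dimensional stand-in for (Delta p, Delta q)."""
    return inner(poly_diff(poly_diff(p)), poly_diff(poly_diff(q)))

# Hypothetical coordinate functions; both satisfy w = w' = 0 at x = 0 and x = 1.
PSI1 = [Fraction(c) for c in (0, 0, 1, -2, 1)]         # x^2 (1 - x)^2
PSI2 = [Fraction(c) for c in (0, 0, 0, 1, -3, 3, -1)]  # x^3 (1 - x)^3

def ritz_bound_m1():
    """Single upper bound obtained from S_1 = [psi_1]."""
    return energy(PSI1, PSI1) / inner(PSI1, PSI1)

def ritz_bounds_m2():
    """Both roots of the 2-by-2 determinant (25) for S_2 = [psi_1, psi_2]."""
    a11, a12, a22 = energy(PSI1, PSI1), energy(PSI1, PSI2), energy(PSI2, PSI2)
    g11, g12, g22 = inner(PSI1, PSI1), inner(PSI1, PSI2), inner(PSI2, PSI2)
    # det(A - t G) = qa t^2 - qb t + qc, with t standing for lambda^2
    qa = g11 * g22 - g12 ** 2
    qb = a11 * g22 + a22 * g11 - 2 * a12 * g12
    qc = a11 * a22 - a12 ** 2
    root = math.sqrt(float(qb * qb - 4 * qa * qc))
    smaller = float(2 * qc) / (float(qb) + root)   # numerically stable form
    larger = (float(qb) + root) / float(2 * qa)
    return smaller, larger
```

Under these assumptions the bound from $S_1$ is exactly 504, and enlarging the set to $S_2$ can only lower the first bound, in agreement with (14); the lowest eigenvalue of the clamped beam itself is approximately 500.56.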
In the generalized Rayleigh-Ritz method of Aronszajn [2, pp. 54-56], the initial set $S_0$ may be an infinite-dimensional subset of $A$, so that one may obtain in this way finite upper bounds for all the $\lambda_n^2(A)$ at the first step. The successive improvement of the initial upper bounds $\lambda_n^2(S_0)$ obtained by choosing sets of functions $S_1, S_2, \ldots$ leads to considerations which resemble those arising in the discussion of Weinstein's method, which is the subject of the next section, and for this reason will not be entered into at this juncture.

4. Weinstein's method for the determination of lower bounds for the eigenvalues. The discussion falls naturally into three parts. First, there is the determination of the initial set of functions $B_0$ and of its corresponding eigenfunctions and eigenvalues (each eigenvalue with its proper multiplicity):
$$0 < \lambda_1^2(B_0) \le \lambda_2^2(B_0) \le \lambda_3^2(B_0) \le \cdots. \tag{26}$$
For numerical applications, the set $B_0$ must be chosen so that its corresponding eigenfunctions and eigenvalues may be regarded as known. The set $B_0$ to be used here is that originally employed by Weinstein [90]; for other possibilities of choosing $B_0$, reference is made to Aronszajn [4, pp. 26-40]. In the second place, there is the determination, for $m = 1, 2, 3, \ldots$, of the set of functions $B_m$ and of its corresponding eigenvalues (each with its proper multiplicity):
$$0 < \lambda_1^2(B_m) \le \lambda_2^2(B_m) \le \lambda_3^2(B_m) \le \cdots. \tag{27}$$
The knowledge of the exact multiplicities (at least for an initial "segment" of the eigenvalues) is essential, for without this information one cannot assign to the (computed) eigenvalues corresponding to $B_0$ and $B_m$ their correct integral subscripts in the nondecreasing sequences (26) and (27), respectively, and hence one cannot obtain the desired inequalities (or even an initial "segment" of them)
$$\lambda_n^2(B_0) \le \lambda_n^2(B_m) \le \lambda_n^2(A), \qquad (\text{for } m,n = 1, 2, 3, \ldots). \tag{28}$$
Third, a fundamental difficulty (having to do with this question of the knowledge of the exact multiplicities) arises in this second part of Weinstein's method [i.e., in the successive improvement of the initial lower bounds $\lambda_n^2(B_0)$ by considering the sets of functions $B_1 \supset B_2 \supset \cdots \supset B_m \supset \cdots$]. This difficulty does not occur at all in the Rayleigh-Ritz method but does occur in the generalized Rayleigh-Ritz method of Aronszajn, which was mentioned near the end of the last section. For each $m = 1, 2, 3, \ldots$, Weinstein [90] constructed (employing the known eigenfunctions and eigenvalues corresponding to $B_0$) a meromorphic function of a single complex variable (this function will be called $W_m$) and showed that, if the positive number $\lambda^2$ is an eigenvalue corresponding to $B_m$ but is not an eigenvalue corresponding to $B_0$, then $\lambda^2$ must be a zero of the meromorphic function $W_m$. Weinstein also showed how to obtain, by an analytic criterion, the multiplicity (relative to $B_m$) of each such eigenvalue $\lambda^2$ corresponding to $B_m$ [i.e., which occurs in the sequence (27)] but which is not an eigenvalue corresponding to $B_0$ [i.e., which does not occur in the sequence (26)]. In order to determine those (if there are any "remaining") eigenvalues (with their proper multiplicities) corresponding to $B_m$ which are also eigenvalues corresponding to $B_0$, and thus be able to obtain the complete sequence (27) (or at least an initial segment of it), Weinstein employed a certain sequence of harmonic functions, which he termed a "privileged sequence." By means of this privileged sequence, Weinstein was able to surmount the difficulty just indicated. Aronszajn [2, pp. 38-53] gave a different criterion for the same purpose and showed that the consideration of the zeros and poles of $W_m$ (together with their "order," thought of as a positive integer for a zero and as a negative integer for a pole) can be used for the exact determination of the sequence (27) of eigenvalues corresponding to $B_m$.
The present account of the method, which in its main outline is patterned after Weinstein [90], will however employ Aronszajn's criterion just alluded to.

1. The initial set of functions $B_0$ will be taken to be the set of all real-valued functions $w$ defined on $D + C$ and vanishing on $C$, that is, satisfying the single boundary condition
$$w = 0, \qquad \text{(on } C), \tag{29}$$
[which is only a part of the boundary conditions (2) of the original problem; hence $A \subset B_0$]. The eigenvalues $\lambda_n^2(B_0)$ are defined by the following variational problems [cf. (11) and (12)]:
$$\lambda_1^2(B_0) = \min_{w \in B_0} \frac{(\Delta w,\Delta w)}{(w,w)}, \tag{30}$$
$$\lambda_n^2(B_0) = \max_{\phi_1,\ldots,\phi_{n-1}} \; \min_{\substack{w \in B_0 \\ (w,\phi_i)=0,\ i=1,\ldots,n-1}} \frac{(\Delta w,\Delta w)}{(w,w)}, \qquad (n = 2, 3, \ldots). \tag{31}$$
It will now be shown that the eigenvalues and eigenfunctions relative to $B_0$ (the eigenvalues and eigenfunctions of the "base problem" in the terminology of Weinstein) are quite simply related to the eigenvalues and eigenfunctions of the problem of the vibrating membrane with fixed edges for the same domain $D + C$. The eigenvalues $\lambda_n^2(B_0)$ have been defined by (30) and (31) in terms of "maximum-minimum" variational problems. This will lead to their recursive definition, in terms of the successive eigenfunctions, and finally to the corresponding differential problems, which will be seen to reduce to that of the vibrating membrane with edges fixed along $C$. The process followed here is, in a certain sense, a reversal of the usual steps (cf., e.g., Courant and Hilbert [20, pp. 398-407]), where one starts with the eigenvalues and eigenfunctions of a differential problem first, then obtains a recursive definition and finally a "maximum-minimum" definition. Let $w_1 \not\equiv 0$ be an eigenfunction corresponding to the eigenvalue $\lambda_1^2(B_0)$. From (30) it follows that
$$\frac{(\Delta w_1,\Delta w_1)}{(w_1,w_1)} = \lambda_1^2(B_0) \le \frac{(\Delta[w_1 + \epsilon\zeta],\Delta[w_1 + \epsilon\zeta])}{(w_1 + \epsilon\zeta, w_1 + \epsilon\zeta)} = \frac{(\Delta w_1,\Delta w_1) + 2\epsilon(\Delta w_1,\Delta\zeta) + \epsilon^2(\Delta\zeta,\Delta\zeta)}{(w_1,w_1) + 2\epsilon(w_1,\zeta) + \epsilon^2(\zeta,\zeta)}, \tag{32}$$
where $\zeta$ is an arbitrary function satisfying the boundary condition $\zeta = 0$ on $C$ and where $\epsilon$ is any real number. Inequality (32), in view of the arbitrariness of $\epsilon$, implies that
$$(\Delta w_1,\Delta\zeta) - \lambda_1^2(B_0)(w_1,\zeta) = 0, \tag{33}$$
which, together with Green's identity (3), upon putting $\psi = \zeta$ and $\phi = w_1$, yields
$$([\Delta\Delta w_1 - \lambda_1^2(B_0)w_1],\zeta) + \int_C \left( \Delta w_1 \frac{\partial \zeta}{\partial n} - \zeta \frac{\partial \Delta w_1}{\partial n} \right) ds = 0. \tag{34}$$
In view of the boundary condition $\zeta = 0$ on $C$, and of the arbitrariness of $\zeta$ in $D$ and of $\partial\zeta/\partial n$ on $C$, it follows from (34) that $w_1$ is a solution of the differential eigenvalue problem
$$\Delta\Delta w_1 - \lambda_1^2(B_0)w_1 = 0, \qquad \text{(on } D), \tag{35}$$
$$w_1 = \Delta w_1 = 0, \qquad \text{(on } C). \tag{36}$$
The additional boundary condition that $\Delta w_1 = 0$ on $C$, which is actually fulfilled by the eigenfunction $w_1$ but is not required of it for membership in the set $B_0$, is a "natural boundary condition," in the terminology of R. Courant [18].
Since the set $B_0$ is not empty, from (32) one has that $0 \le \lambda_1^2(B_0) < \infty$. Let $\lambda_1(B_0) \ge 0$ denote the (nonnegative) square root of $\lambda_1^2(B_0)$. Then (35) and (36) yield
$$\Delta[\Delta w_1 + \lambda_1(B_0)w_1] - \lambda_1(B_0)[\Delta w_1 + \lambda_1(B_0)w_1] = 0, \qquad \text{(on } D), \tag{37}$$
$$\Delta w_1 + \lambda_1(B_0)w_1 = 0, \qquad \text{(on } C), \tag{38}$$
with $\lambda_1(B_0) \ge 0$. Consequently, the function $\Delta w_1 + \lambda_1(B_0)w_1$ must vanish throughout $D + C$, and $w_1$ must be an eigenfunction, with eigenvalue equal to $\lambda_1(B_0)$, of the vibrating-membrane equation
$$\Delta w_1 + \lambda_1(B_0)w_1 = 0, \qquad \text{(on } D), \tag{39}$$
$$w_1 = 0, \qquad \text{(on } C). \tag{40}$$
Since $w_1 \not\equiv 0$, it follows that $\lambda_1(B_0)$ and $\lambda_1^2(B_0)$ are both positive. Without loss, one may assume that $(w_1,w_1) = 1$, and then $\lambda_1^2(B_0) = (\Delta w_1,\Delta w_1)$. Let $w_2 \not\equiv 0$ be a solution of the variational problem [cf. (31)]
$$\min_{\substack{w \in B_0 \\ (w,w_1)=0}} \frac{(\Delta w,\Delta w)}{(w,w)}, \tag{41}$$
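where $w_1$ is the eigenfunction of (30) already considered. Suppose that $(w_2,w_2) = 1$; then the minimum value of (41) is just $(\Delta w_2,\Delta w_2)$. Further, from (30), (31), and the definitions of $w_1$ and $w_2$, one has that
$$0 < \lambda_1^2(B_0) = (\Delta w_1,\Delta w_1) \le (\Delta w_2,\Delta w_2) \le \lambda_2^2(B_0); \tag{42}$$
while from Green's identity (3), with $\phi = w_1$ and $\psi = w_2$, one obtains
$$(\Delta w_1,\Delta w_2) = \lambda_1^2(B_0)(w_1,w_2). \tag{43}$$
On the other hand, $(w_1,w_2) = 0$ from the definition (41) of $w_2$. Now, from (31)
$$\lambda_2^2(B_0) = \max_{\phi_1} \; \min_{\substack{w \in B_0 \\ (w,\phi_1)=0}} \frac{(\Delta w,\Delta w)}{(w,w)}, \tag{44}$$
but for any given $\phi_1$ there always exist real numbers $C_1$ and $C_2$, not both zero, such that $(C_1w_1 + C_2w_2, \phi_1) = 0$. Since
$$\frac{(\Delta[C_1w_1 + C_2w_2], \Delta[C_1w_1 + C_2w_2])}{(C_1w_1 + C_2w_2, C_1w_1 + C_2w_2)} = \frac{C_1^2(\Delta w_1,\Delta w_1) + C_2^2(\Delta w_2,\Delta w_2)}{C_1^2 + C_2^2}$$
[cf. (43) and $(w_1,w_2) = 0$], and since by (42) this weighted mean does not exceed $(\Delta w_2,\Delta w_2)$, it follows that
$$\min_{\substack{w \in B_0 \\ (w,\phi_1)=0}} \frac{(\Delta w,\Delta w)}{(w,w)} \le (\Delta w_2,\Delta w_2),$$
and consequently [cf. (44)] that
$$\lambda_2^2(B_0) \le (\Delta w_2,\Delta w_2). \tag{45}$$
In conclusion, then
$$\lambda_2^2(B_0) = (\Delta w_2,\Delta w_2) = \min_{\substack{w \in B_0 \\ (w,w_1)=0}} \frac{(\Delta w,\Delta w)}{(w,w)}. \tag{46}$$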
It still remains to show that $w_2$ is an eigenfunction of the membrane problem, with eigenvalue equal to the positive square root of $\lambda_2^2(B_0)$. The procedure is similar to that just carried out above with $w_1$. From (46) it follows that
$$\lambda_2^2(B_0) \le \frac{(\Delta[w_2 + \epsilon\zeta], \Delta[w_2 + \epsilon\zeta])}{(w_2 + \epsilon\zeta, w_2 + \epsilon\zeta)}, \tag{47}$$
where $\zeta$ is an arbitrary function satisfying the boundary condition $\zeta = 0$ on $C$ and $(\zeta,w_1) = 0$, and where $\epsilon$ is any real number. In view of the arbitrariness of $\epsilon$, inequality (47) implies that
$$(\Delta w_2,\Delta\zeta) - \lambda_2^2(B_0)(w_2,\zeta) = 0, \tag{48}$$
whenever both $\zeta = 0$ on $C$ and $(\zeta,w_1) = 0$. But (48) continues to hold whenever $\zeta = 0$ on $C$ only [i.e., even if $(\zeta,w_1) \neq 0$], because $(\Delta w_2,\Delta w_1) = (w_2,w_1) = 0$ [cf. (43)]. An application of Green's identity (3), use being made of the boundary condition $\zeta = 0$ on $C$, and of the arbitrariness of $\zeta$ on $D$ and of $\partial\zeta/\partial n$ on $C$, then yields that $w_2$ is a solution of the differential eigenvalue problem
$$\Delta\Delta w_2 - \lambda_2^2(B_0)w_2 = 0, \qquad \text{(on } D), \tag{49}$$
$$w_2 = \Delta w_2 = 0, \qquad \text{(on } C). \tag{50}$$
Further, if $\lambda_2(B_0)$ denotes the positive square root of $\lambda_2^2(B_0)$, then (49) and (50) imply that
$$\Delta w_2 + \lambda_2(B_0)w_2 = 0, \qquad \text{(on } D), \tag{51}$$
$$w_2 = 0, \qquad \text{(on } C). \tag{52}$$
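Proceeding in the above manner, one obtains that, for $n = 2, 3, \ldots$ [cf. (46)],
$$\lambda_n^2(B_0) = (\Delta w_n,\Delta w_n) = \min_{\substack{w \in B_0 \\ (w,w_i)=0,\ i=1,\ldots,n-1}} \frac{(\Delta w,\Delta w)}{(w,w)}, \tag{53}$$
where $(w_j,w_k) = 1$ for $j = k$ and $(w_j,w_k) = 0$ for $j \neq k$, with both $j,k = 1, 2, \ldots, n$. Further
$$\Delta\Delta w_n - \lambda_n^2(B_0)w_n = 0, \qquad \text{(on } D), \tag{54}$$
$$w_n = \Delta w_n = 0, \qquad \text{(on } C), \tag{55}$$
and
$$\Delta w_n + \lambda_n(B_0)w_n = 0, \qquad \text{(on } D), \tag{56}$$
$$w_n = 0, \qquad \text{(on } C). \tag{57}$$
Consider the differential eigenvalue problem for the vibrating membrane:
$$\Delta u + \omega u = 0, \qquad \text{(on } D), \tag{58}$$
$$u = 0, \qquad \text{(on } C), \tag{59}$$
and let
$$u_1, u_2, u_3, \ldots \tag{60}$$
and
$$0 < \omega_1 < \omega_2 \le \omega_3 \le \cdots \tag{61}$$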
be its sequences of eigenfunctions and eigenvalues, respectively. It has so far been shown that, if $w$ is an eigenfunction corresponding to $B_0$, with eigenvalue $\lambda^2 > 0$, then $w$ is also a membrane eigenfunction, with eigenvalue $\lambda > 0$. But, in order to show that indeed
$$\lambda_n(B_0) = \omega_n, \qquad (n = 1, 2, 3, \ldots), \tag{62}$$
it still remains to show that the converse of this last statement holds, i.e., that if $u$ is a membrane eigenfunction with eigenvalue $\omega > 0$, then $u$ is also an eigenfunction relative to $B_0$, with eigenvalue $\omega^2$. From (58) and (59) one has
$$\Delta\Delta u - \omega^2 u = 0, \qquad \text{(on } D), \tag{63}$$
$$u = \Delta u = 0, \qquad \text{(on } C), \tag{64}$$
and the proof may then be completed along the lines of a reasoning of H. Herrmann [34] (see also A. Weinstein [90, p. 11]), which will be omitted here for the sake of brevity.

2. Let $p_1(s), p_2(s), \ldots, p_k(s), \ldots$ be a sequence of real-valued functions defined on $C$, where $s$ denotes arc length on $C$. It is clear that, if this sequence $p_1(s), p_2(s), \ldots, p_k(s), \ldots$ is complete on $C$, then the boundary conditions (2), that is, $w = \partial w/\partial n = 0$ on $C$, may be replaced by the equivalent boundary conditions (infinitely many in number):
$$w = 0, \qquad \text{(on } C), \tag{65}$$
$$\int_C p_k(s) \frac{\partial w}{\partial n}\, ds = 0, \qquad (\text{for } k = 1, 2, \ldots). \tag{66}$$
This immediately suggests the definition of the set of functions $B_m$, for $m = 1, 2, \ldots$, as the set of all real-valued functions $w$ defined on $D + C$ and satisfying the boundary conditions
$$w = 0, \qquad \text{(on } C), \tag{67}$$
$$\int_C p_k(s) \frac{\partial w}{\partial n}\, ds = 0, \qquad (\text{for } k = 1, 2, \ldots, m). \tag{68}$$
The variational problem [cf. (11) and (12)] corresponding to the set of functions $B_m$ is called the "$m$th intermediate problem," in Weinstein's terminology. For each $m$ there is a sequence of eigenfunctions
$$w_{m1}, w_{m2}, \ldots, w_{mn}, \ldots \tag{69}$$
corresponding to the eigenvalues
$$0 < \lambda_1^2(B_m) \le \lambda_2^2(B_m) \le \cdots \le \lambda_n^2(B_m) \le \cdots. \tag{70}$$
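The eigenfunction $w_{mn}$ (see Weinstein [90, pp. 20-22]) satisfies the equation [cf. equation (34) above]
$$([\Delta\Delta w_{mn} - \lambda_n^2(B_m)w_{mn}],\zeta) + \int_C \left( \Delta w_{mn} \frac{\partial \zeta}{\partial n} - \zeta \frac{\partial \Delta w_{mn}}{\partial n} \right) ds = 0 \tag{71}$$
whenever the function $\zeta$ satisfies
$$\zeta = 0, \qquad \text{(on } C), \tag{72}$$
and
$$\int_C p_k(s) \frac{\partial \zeta}{\partial n}\, ds = 0, \qquad (\text{for } k = 1, 2, \ldots, m). \tag{73}$$
This implies that there exist $m$ real numbers $A_{1n}, A_{2n}, \ldots, A_{mn}$ such that $w_{mn}$ satisfies the natural boundary condition [cf. the natural boundary condition in (36) above]
$$\Delta w_{mn} = \sum_{k=1}^{m} A_{kn}p_k, \qquad \text{(on } C). \tag{74}$$
In all, therefore, $w_{mn}$ is a solution of the differential eigenvalue problem [cf. (35) and (36) above]
$$\Delta\Delta w_{mn} - \lambda_n^2(B_m)w_{mn} = 0, \qquad \text{(on } D), \tag{75}$$
$$w_{mn} = 0, \qquad \Delta w_{mn} = \sum_{k=1}^{m} A_{kn}p_k, \qquad \text{(on } C). \tag{76}$$
Since $w_{mn} \not\equiv 0$, it is clear from (75) and (76) that not all the $m$ real numbers $A_{1n}, \ldots, A_{mn}$ vanish.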
3. The eigenvalues $\lambda_n^2(B_m)$ will now be determined in terms of the supposedly known eigenvalues and eigenfunctions relative to the initial set $B_0$. The argument (cf. Weinstein [90, pp. 26-49]) is based on the following two facts concerning the eigenvalues $\omega_n$ and the eigenfunctions $u_n$ of the vibrating-membrane problem [see equations (58) to (61) for the notation employed]. The first is Parseval's equality
$$(f,g) = \sum_{i=1}^{\infty} (f,u_i)(g,u_i); \tag{77}$$
and the second is the known solution (see Courant and Hilbert [20, p. 361] and Aronszajn [2, pp. 11-13]) of the boundary-value problem
$$\Delta u + \omega u = f, \qquad \text{(on } D), \tag{78}$$
$$u = 0, \qquad \text{(on } C), \tag{79}$$
where $f$ is a given function and $\omega$ is a given complex number. If $\omega \neq \omega_i$ for $i = 1, 2, 3, \ldots$, then there is one and only one solution to (78) and (79), and it is given by the infinite series
$$u = \sum_{i=1}^{\infty} \frac{(f,u_i)}{\omega - \omega_i}\, u_i. \tag{80}$$
On the other hand, if $\omega = \omega_j$ for some positive integer $j$ (the case of "resonance"), then either $f$ is not orthogonal to all the eigenfunctions of the membrane [cf. (58), (59)] which correspond to the eigenvalue $\omega_j$, or else $f$ is orthogonal to all these eigenfunctions. In the first instance, the problem of (78) and (79) has no solution, while in the second instance there are infinitely many solutions, the maximum number of linearly independent solutions being exactly equal to the multiplicity of $\omega_j$ as an eigenvalue of the membrane problem (58) and (59).
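The series solution of (78) and (79) can be checked numerically in a one-dimensional analogue, which is an assumption of this sketch and not part of the paper: the problem $u'' + \omega u = f$ on $0 < x < \pi$ with $u(0) = u(\pi) = 0$, whose normalized eigenfunctions are $u_i = \sqrt{2/\pi}\,\sin ix$ with eigenvalues $\omega_i = i^2$. For the hypothetical choice $f = 1$ the inner products $(f,u_i)$ are available in closed form, and the truncated series is compared with the elementary closed-form solution.

```python
import math

def series_solution(x, omega, n_terms=2000):
    """Truncated eigenfunction series sum_i (f,u_i)/(omega - omega_i) u_i(x) for f = 1."""
    norm = math.sqrt(2.0 / math.pi)
    total = 0.0
    for i in range(1, n_terms + 1):
        omega_i = float(i * i)
        # (f, u_i) = sqrt(2/pi) * integral of sin(i x) over (0, pi)
        f_ui = norm * (1.0 - math.cos(i * math.pi)) / i
        total += f_ui / (omega - omega_i) * norm * math.sin(i * x)
    return total

def closed_form_solution(x, omega):
    """Direct solution of u'' + omega u = 1, u(0) = u(pi) = 0, for omega > 0 not equal to any i^2."""
    k = math.sqrt(omega)
    b = (math.cos(k * math.pi) - 1.0) / (omega * math.sin(k * math.pi))
    return 1.0 / omega - math.cos(k * x) / omega + b * math.sin(k * x)
```

With a non-resonant value such as $\omega = 2.5$ the two functions agree to within the truncation error of the series. As $\omega$ approaches some $\omega_j = j^2$ with $j$ odd, the corresponding term of the series grows without bound, which is the "resonance" case in which $f$ is not orthogonal to $u_j$.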
It will now be further supposed that the auxiliary functions $p_1(s), p_2(s), \ldots, p_k(s), \ldots$, defined on $C$, which were introduced in step 2 above, are the boundary values of a sequence of harmonic functions defined on $D + C$. In an obvious notation, these harmonic functions will be denoted by
$$p_1(x,y), p_2(x,y), \ldots, p_k(x,y), \ldots. \tag{81}$$
One has that
$$\Delta p_k(x,y) = 0, \qquad \text{(on } D), \tag{82}$$
$k = 1, 2, 3, \ldots$. In view of (82), since by Green's identity it is true that
$$\int_C \left( p_k \frac{\partial w}{\partial n} - w \frac{\partial p_k}{\partial n} \right) ds = (p_k,\Delta w) - (w,\Delta p_k), \tag{83}$$
with $k = 1, 2, 3, \ldots$, the boundary conditions (67) and (68) required of a function $w$ to belong to the set $B_m$ may be rewritten thus:
$$w = 0, \qquad \text{(on } C), \tag{84}$$
$$(\Delta w,p_k) = 0, \qquad (k = 1, 2, \ldots, m). \tag{85}$$
It is in this last form that the boundary conditions for the set of functions $B_m$ will be used in the next step, the determination of the eigenvalues $\lambda_n^2(B_m)$.
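Let $w \not\equiv 0$ be an eigenfunction, with eigenvalue $\lambda^2 > 0$, relative to the set of functions $B_m$. Then the function $w$ satisfies the following conditions [cf. (75), (76), and (85)]:
$$\Delta\Delta w - \lambda^2 w = 0, \qquad \text{(on } D), \tag{86}$$
$$w = 0, \qquad \Delta w = \sum_{k=1}^{m} A_kp_k, \qquad \text{(on } C), \tag{87}$$
$$(\Delta w,p_k) = 0, \qquad (k = 1, 2, \ldots, m), \tag{88}$$
with $m$ real constants $A_1, \ldots, A_m$, which are not all zero. By Green's identity, in view of the partial differential equation (86) and the boundary conditions (87), it follows that [put $\phi = w$ and $\psi = u_i$ in (3)], for $i = 1, 2, 3, \ldots$,
$$\begin{aligned} (\Delta w,u_i) &= -\frac{1}{\omega_i}(\Delta w,\Delta u_i) \\ &= -\frac{1}{\omega_i}\left[ \lambda^2(w,u_i) + \int_C \Delta w \frac{\partial u_i}{\partial n}\, ds \right] \\ &= \frac{\lambda^2}{\omega_i^2}(\Delta w,u_i) - \frac{1}{\omega_i}\sum_{j=1}^{m} A_j \int_C p_j \frac{\partial u_i}{\partial n}\, ds \\ &= \frac{\lambda^2}{\omega_i^2}(\Delta w,u_i) + \sum_{j=1}^{m} A_j(p_j,u_i), \end{aligned} \tag{89}$$
proper use being made of the equations [cf. (58) and (59)]
$$\Delta u_i + \omega_iu_i = 0, \qquad \text{(on } D), \tag{90a}$$
$$u_i = 0, \qquad \text{(on } C), \tag{90b}$$
which are satisfied by the membrane eigenfunction $u_i$. If, further, $\lambda^2 \neq \omega_i^2$ for any positive integer $i$, then (89) implies that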
$$(\Delta w,u_i) = \frac{\omega_i^2}{\omega_i^2 - \lambda^2} \sum_{j=1}^{m} A_j(p_j,u_i), \tag{91}$$
while Parseval's equality (77), together with equation (91), yields
$$(\Delta w,p_k) = \sum_{i=1}^{\infty} (\Delta w,u_i)(p_k,u_i) = \sum_{j=1}^{m} A_j \sum_{i=1}^{\infty} \frac{\omega_i^2}{\omega_i^2 - \lambda^2}(p_j,u_i)(p_k,u_i) \tag{92}$$
for $k = 1, \ldots, m$. Finally, from (92) and (88),
$$\sum_{j=1}^{m} A_j \sum_{i=1}^{\infty} \frac{\omega_i^2}{\omega_i^2 - \lambda^2}(p_j,u_i)(p_k,u_i) = 0, \tag{93}$$
$k = 1, \ldots, m$. Since not all the $m$ numbers $A_1, A_2, \ldots, A_m$ are zero, the eigenvalue $\lambda^2$ must be a zero of the determinant of the coefficients of the system of $m$ linear equations (93). Thus (cf. Weinstein [90, p. 20]), if $\lambda^2$ is an eigenvalue corresponding to $B_m$ but is not an eigenvalue corresponding to $B_0$, then $\lambda^2$ is a zero of the $m$-rowed determinant (note that $j,k = 1, \ldots, m$)
$$W_m(\eta) = \det\left\{ \sum_{i=1}^{\infty} \frac{\omega_i^2}{\omega_i^2 - \eta}(p_j,u_i)(p_k,u_i) \right\}, \tag{94}$$
which, as can be seen, depends explicitly on the eigenvalues and eigenfunctions corresponding to $B_0$, that is, on the eigenvalues and eigenfunctions of the vibrating-membrane problem. The determinant $W_m(\eta)$ is a meromorphic function of the complex variable $\eta$.
For each complex number $\xi$, let $\nu_\xi(W_m)$, called the "exponent of $\xi$ with respect to the function $W_m$," denote the uniquely determined integer $\nu$ such that
$$W_m(\eta) = (\eta - \xi)^{\nu}[C_0 + C_1(\eta - \xi) + C_2(\eta - \xi)^2 + \cdots], \tag{95}$$
whenever $|\eta - \xi| > 0$ is sufficiently small, the analytic function in the square bracket in (95) being regular and different from zero (that is, $C_0 \neq 0$) at $\eta = \xi$. Clearly, if the function $W_m$ is regular at $\xi$, then $\nu_\xi(W_m)$ is zero if $W_m(\xi) \neq 0$; whereas $\nu_\xi(W_m)$ is a positive integer (equal to the order of the zero of $W_m$ at $\xi$) if $W_m(\xi) = 0$. On the other hand, if $W_m$ has a pole at $\xi$, then $\nu_\xi(W_m)$ is a negative integer (equal to minus the order of the pole of $W_m$ at $\xi$). Further, for each complex number $\xi$ let $\mu_\xi(B_0)$, called "the multiplicity of $\xi$ with respect to $B_0$," denote the number of times $\xi$ appears in the sequence of eigenvalues $\lambda_n^2(B_0)$ corresponding to the set of functions $B_0$. Let $\mu_\xi(B_m)$, for $m = 1, 2, 3, \ldots$, have a similar meaning with respect to $B_m$. Then, according to Aronszajn [2, pp. 38-53], the following general relation holds for any complex number $\xi$ and any $m = 1, 2, 3, \ldots$:
$$\mu_\xi(B_m) - \mu_\xi(B_0) = \nu_\xi(W_m). \tag{96}$$
This relation enables one, once the orders of the zeros and the poles of the function $W_m$ are known, to determine explicitly the sequence of eigenvalues $\lambda_n^2(B_m)$, the proper multiplicities being taken into account. It should be noticed that (96) yields more information than the conclusion obtained from (93). For if $\lambda^2$ is an eigenvalue corresponding to $B_m$ [so that $\mu_{\lambda^2}(B_m)$ is a positive integer], but is not an eigenvalue corresponding to $B_0$ [so that $\mu_{\lambda^2}(B_0) = 0$], then (93) furnishes only the information that $\lambda^2$ must be a zero of the determinant $W_m(\eta)$ of (94), whereas (96), which in the present case reads
$$\mu_{\lambda^2}(B_m) = \nu_{\lambda^2}(W_m), \tag{97}$$
yields again the information that $\lambda^2$ must be a zero of $W_m$, but also that the exact number of times that the number $\lambda^2$ occurs in the sequence of the eigenvalues $\lambda_n^2(B_m)$ is precisely equal to the order of the zero of the function $W_m(\eta)$ at the number $\lambda^2$.
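The bookkeeping implied by relation (96) can be sketched in a few lines of Python. The function names and the numerical data below are purely hypothetical; the sketch only shows how, given the multiplicities relative to $B_0$ and the exponents of $W_m$ at its zeros and poles, the nondecreasing sequence (27) would be assembled.

```python
def intermediate_multiplicities(base_mult, exponents):
    """Apply (96): mu_xi(B_m) = mu_xi(B_0) + nu_xi(W_m) at every relevant number xi."""
    result = {}
    for xi in set(base_mult) | set(exponents):
        mu = base_mult.get(xi, 0) + exponents.get(xi, 0)
        if mu < 0:
            raise ValueError("inconsistent data: negative multiplicity at %r" % xi)
        if mu > 0:
            result[xi] = mu
    return result

def eigenvalue_sequence(mult):
    """List the eigenvalues in nondecreasing order, each repeated by its multiplicity."""
    sequence = []
    for xi in sorted(mult):
        sequence.extend([xi] * mult[xi])
    return sequence

# Hypothetical base problem: 4 (simple), 25 (double), 64 (simple).
base = {4.0: 1, 25.0: 2, 64.0: 1}
# Hypothetical exponents of W_1: simple poles at 4, 25, 64 and simple zeros at 10, 40, 90.
nu = {4.0: -1, 10.0: 1, 25.0: -1, 40.0: 1, 64.0: -1, 90.0: 1}
intermediate = eigenvalue_sequence(intermediate_multiplicities(base, nu))
```

In this made-up example the double base eigenvalue 25 persists as a simple eigenvalue of the intermediate problem, and the resulting sequence 10, 25, 40, 90 dominates the base sequence 4, 25, 25, 64 term by term, as (28) requires.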
The proof of (96) will be carried out here (partially) only for $m = 1$. For the complete proof, for any $m$, reference is made to Aronszajn [2, pp. 38-53]. When $m = 1$, from (94),
$$W_1(\eta) = \sum_{i=1}^{\infty} \frac{(p_1,u_i)^2}{1 - (\eta/\omega_i^2)}. \tag{98}$$
It can be readily seen that $W_1$ is a meromorphic function of $\eta$ whose zeros and poles are simple and positive [notice that the poles can only occur at the eigenvalues $\lambda_i^2(B_0) = \omega_i^2$ corresponding to the initial set $B_0$]. Now suppose that $\lambda^2$ is an eigenvalue corresponding to $B_1$ [so that $\mu_{\lambda^2}(B_1)$ is a positive integer], but is not an eigenvalue corresponding to $B_0$ [so that $\mu_{\lambda^2}(B_0) = 0$]. The number $\lambda^2$ must be a zero of $W_1(\eta)$, and since this function has simple zeros only, it follows that $\nu_{\lambda^2}(W_1) = 1$. Hence [cf. (97)] one just has to prove that
$$\mu_{\lambda^2}(B_1) = 1, \tag{99}$$
and this fact can be readily seen. For, if $w \not\equiv 0$ is an eigenfunction corresponding to the eigenvalue $\lambda^2 > 0$ of $B_1$, then [cf. (75), (76), (85)]
$$\Delta\Delta w - \lambda^2 w = 0, \qquad \text{(on } D), \tag{100}$$
$$w = 0, \qquad \Delta w = A_1p_1, \qquad \text{(on } C), \tag{101}$$
$$(\Delta w,p_1) = 0, \tag{102}$$
where the number $A_1$ is not zero. But the only function satisfying equations (100) and (101), with $A_1$ replaced by zero, is the identically zero function. Consequently, the eigenfunctions corresponding to the eigenvalue $\lambda^2$ of $B_1$ are precisely the functions $a(w/A_1)$, where $a \neq 0$ is an arbitrary real number. This set of functions is one-dimensional, and this is precisely what (99) asserts.
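The behavior of $W_1$ between consecutive poles can be explored with a small Python sketch. It assumes a truncation of the series for $W_1$ to finitely many terms, distinct values $\omega_i$, and nonzero coefficients $(p_1,u_i)$; the function names and the sample data are hypothetical. Each term of the truncated sum is increasing in $\eta$ away from its pole, so the sum has exactly one zero between two consecutive poles, and bisection locates it.

```python
def w1(eta, omegas, coeffs):
    """Truncated W_1(eta) = sum_i c_i^2 / (1 - eta / omega_i^2), with c_i = (p_1, u_i)."""
    return sum(c * c / (1.0 - eta / (w * w)) for w, c in zip(omegas, coeffs))

def zeros_between_poles(omegas, coeffs, iterations=200):
    """Locate by bisection the single zero of the truncated W_1 between consecutive poles."""
    poles = sorted(w * w for w in omegas)
    zeros = []
    for left, right in zip(poles, poles[1:]):
        gap = right - left
        lo, hi = left + 1e-9 * gap, right - 1e-9 * gap
        for _ in range(iterations):
            mid = 0.5 * (lo + hi)
            # W_1 is negative just to the right of a pole and increases to +infinity
            if w1(mid, omegas, coeffs) < 0.0:
                lo = mid
            else:
                hi = mid
        zeros.append(0.5 * (lo + hi))
    return zeros

# Hypothetical membrane eigenvalues omega_i and coefficients (p_1, u_i).
sample_omegas = [1.0, 2.0, 3.0, 4.0]
sample_coeffs = [1.0, 0.5, 0.25, 0.125]
sample_zeros = zeros_between_poles(sample_omegas, sample_coeffs)
```

For data of this kind the zeros interlace with the poles $\omega_i^2$, which is the pattern one expects from simple zeros and simple poles; a real computation would of course use the actual eigenvalues and the actual inner products $(p_1,u_i)$.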
5. Concluding remarks. 1. Lord Rayleigh [62] formulated the following conjecture [cf. (58) and (59)], which was proved by G. Faber [24] and E. Krahn [41] (see also L. Tonelli [74]): of all membranes of a given area the circle has the gravest fundamental tone (lowest principal frequency $\omega_1$). The corresponding result for $n$-dimensional Euclidean space, namely that the smallest eigenvalue $\omega_1$ of the problem
$$\Delta u + \omega u = 0, \qquad \Delta = \frac{\partial^2}{\partial x_1^2} + \cdots + \frac{\partial^2}{\partial x_n^2}, \qquad \text{(on } D), \tag{103}$$
$$u = 0, \qquad \text{(on } C), \tag{104}$$
for all $n$-dimensional domains $D$ of a given volume occurs when $D$ is an $n$-dimensional sphere, was proved by E. Krahn [42, specially pp. 39-43]. "Isoperimetric" theorems of this kind have received a great deal of attention recently. Reference is made, in particular, to the book of G. Pólya and G. Szegő [61]. The following assertion was made by E. T. Kornhauser and I. Stakgold [40]: of all simply connected domains $D$ of a given area, the circle has the maximum value for the second eigenvalue $\mu_2$ of the problem
$$\Delta u + \mu u = 0, \qquad \Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}, \qquad \text{(on } D), \tag{105}$$
$$\frac{\partial u}{\partial n} = 0, \qquad \text{(on } C). \tag{106}$$
(It is readily seen that the first eigenvalue $\mu_1$ is always zero.) This statement was proved by G. Szegő [72], by an argument based on conformal mapping. The corresponding result for $n$-dimensional space has been recently proved by H. F. Weinberger (see L. E. Payne and H. F. Weinberger [55]).

2. In the study of vibrations and buckling of continuous beams and other composite systems, E. Saibel [66,67] (see also E. Saibel [68,69], E. Saibel and E. d'Appolonia [70], E. Saibel and W. F. Z. Lee [71], and W. F. Z. Lee and E. Saibel [46]) has developed a method for the determination of eigenfrequencies which is very closely related to Weinstein's method (see Sec. 4 of the present paper). Although Saibel's analytical approach employs Lagrange
multipliers, his scheme of "developing the solution in terms of the eigenfunctions and eigenvalues of the beam with inner constraints removed, referred to as the simple beam" (E. Saibel and W. F. Z. Lee [71, p. 499]), is to be compared with Weinstein's approach [90, pp. 5-7] of reducing plate problems to simpler membrane problems. For a fuller discussion of the exact relationship
of these various ideas, reference is made to H. F. Weinberger [82, specially pp. 12-13].

3. In the notation of the preceding sections, let $\omega_n$ denote the $n$th eigenvalue of the vibrating membrane [cf. (58) and (59)], and $\lambda_n^2$ denote the $n$th eigenvalue of the vibrating clamped plate [cf. (1) and (2)]. Weinstein [84, p. 50] has called attention to the inequalities
$$\omega_n^2 < \lambda_n^2, \qquad (n = 1, 2, 3, \ldots), \tag{107}$$
connecting the eigenvalues of the two problems in question. R. Courant [19] (see Weinstein [90, p. 20]) had arrived earlier at the inequalities
$$\omega_n^2 \le \lambda_n^2, \tag{108}$$
[which state that $\lambda_n^2(B_0) \le \lambda_n^2$, in the notation of Sec. 4 of this paper]. Weinstein's [90, p. 50] proof of the inequalities (107) involves the use of what he termed a "suite fondamentale privilégiée de fonctions harmoniques." It appears to be of some interest to obtain a proof of (107) which does not employ the notion of a privileged sequence. The following direct proof of (107) was developed during a conversation with L. E. Payne and H. F. Weinberger. Recall first the definition "by recurrence" of the eigenvalues and eigenfunctions under discussion:
$$\omega_1 = \min \frac{(u,-\Delta u)}{(u,u)}, \qquad (\text{where } u = 0 \text{ on } C), \tag{109}$$
$$\omega_n = \min_{\substack{(u,u_i)=0 \\ i=1,\ldots,n-1}} \frac{(u,-\Delta u)}{(u,u)}, \qquad (\text{where } u = 0 \text{ on } C, \text{ for } n = 2, 3, \ldots), \tag{110}$$
and [cf. (7) and (8)]
$$\lambda_1^2 = \min \frac{(\Delta w,\Delta w)}{(w,w)}, \qquad \left(\text{where } w = \frac{\partial w}{\partial n} = 0 \text{ on } C\right), \tag{111}$$
$$\lambda_n^2 = \min_{\substack{(w,w_i)=0 \\ i=1,\ldots,n-1}} \frac{(\Delta w,\Delta w)}{(w,w)}, \qquad \left(\text{where } w = \frac{\partial w}{\partial n} = 0 \text{ on } C, \text{ for } n = 2, 3, \ldots\right). \tag{112}$$
Two other facts will be needed in the proof. The first is Schwarz's inequality
$$(v,-\Delta v)^2 \le (v,v)(\Delta v,\Delta v), \tag{113}$$
where, if the function $v \not\equiv 0$, the equality sign holds if and only if there is a real constant $k$ such that
$$\Delta v + kv = 0, \qquad \text{(on } D). \tag{114}$$
This implies that, if $v \not\equiv 0$, then one has the inequality
$$\left[ \frac{(v,-\Delta v)}{(v,v)} \right]^2 \le \frac{(\Delta v,\Delta v)}{(v,v)}, \tag{115}$$
with equality if and only if (114) holds for some constant $k$. The second fact needed is H. Weber's [79, specially p. 5] representation formula
$$v(x,y) = \frac{1}{4}\int_C \left( v \frac{\partial Y_0}{\partial n} - Y_0 \frac{\partial v}{\partial n} \right) ds, \tag{116}$$
for a solution $v$ of equation (114), where $(x,y)$ is any point of $D$, the directional derivative $\partial/\partial n$ is taken along the outer normal to $C$, the function $Y_0 = Y_0(r\sqrt{k})$ is Bessel's function of the second kind of order zero, and $r$ denotes the Euclidean distance from the point $(x,y)$. (Equation (116) is just "Green's third identity" for the partial differential equation (114), in the terminology of O. D. Kellogg [39, p. 219].)
The proof of (107) may now be readily carried out. Consider the case $n = 2, 3, \ldots$ first. There exist $n$ real constants $C_1, \ldots, C_n$, not all zero, such that the function
$$v = C_1w_1 + C_2w_2 + \cdots + C_nw_n \tag{117}$$
satisfies the $n - 1$ orthogonality relations
$$(v,u_i) = 0, \qquad (\text{for } i = 1, \ldots, n - 1). \tag{118}$$
The function $v$ of (117) satisfies the boundary conditions $v = \partial v/\partial n = 0$ on $C$. Further, the function $v$ is not identically zero, since the $n$ eigenfunctions $w_1, \ldots, w_n$ are linearly independent. Consequently, for this particular function $v$, inequality (115) holds in the strict sense [if the equality sign did hold, then $v$ would be identically zero, from (116) and the boundary conditions satisfied by $v$]. Thus, with $v$ given by (117), one has, in view of the recurrence definitions (109) to (112), that, for $n = 2, 3, \ldots$,
$$\omega_n^2 \le \left[ \frac{(v,-\Delta v)}{(v,v)} \right]^2 < \frac{(\Delta v,\Delta v)}{(v,v)} \le \frac{(\Delta w_n,\Delta w_n)}{(w_n,w_n)} = \lambda_n^2. \tag{119}$$
In the case $n = 1$, it suffices to put $v = w_1$ in (115) and to argue similarly. The proof of (107) is now complete.
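The inequalities (107) can be observed numerically in a one-dimensional analogue, which is an assumption of this sketch and is not treated in the paper. On the unit interval the "membrane" problem $u'' + \omega u = 0$, $u(0) = u(1) = 0$, has $\omega_n = (n\pi)^2$, while the "clamped plate" problem $w'''' - \lambda^2 w = 0$, $w = w' = 0$ at both ends, has $\lambda_n^2 = \beta_n^4$, where $\beta_n$ is the $n$th positive root of $\cos\beta\cosh\beta = 1$. The function names are hypothetical.

```python
import math

def clamped_beam_root(n, iterations=200):
    """n-th positive root of cos(b) cosh(b) = 1, bracketed in (n pi, (n + 1) pi)."""
    def f(b):
        return math.cos(b) * math.cosh(b) - 1.0
    lo, hi = n * math.pi, (n + 1) * math.pi
    f_lo_negative = f(lo) < 0.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        # keep the half-interval on which f changes sign
        if (f(mid) < 0.0) == f_lo_negative:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def compare_string_and_beam(n_max=5):
    """Pairs (omega_n^2, lambda_n^2) for the one-dimensional analogue of (107)."""
    pairs = []
    for n in range(1, n_max + 1):
        omega_n = (n * math.pi) ** 2
        lambda_n_squared = clamped_beam_root(n) ** 4
        pairs.append((omega_n ** 2, lambda_n_squared))
    return pairs
```

For $n = 1$ this gives $\omega_1^2 = \pi^4 \approx 97.4$ against $\lambda_1^2 = \beta_1^4 \approx 500.6$, so that the strict inequality holds with a wide margin in this analogue.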
4. Weinstein's monograph [90] is concerned not only with the vibration problem (1),(2) for a clamped plate, but also with the buckling problem
$$\Delta\Delta w + \Lambda\,\Delta w = 0, \qquad \text{(on } D), \tag{120}$$
$$w = \frac{\partial w}{\partial n} = 0, \qquad \text{(on } C). \tag{121}$$
In the course of certain numerical applications of his method for obtaining lower bounds for the eigenvalues of (120),(121), Weinstein was led to conjecture the following inequality, which connects the first eigenvalue of the buckling problem (120),(121) and the second eigenvalue of the vibrating-membrane problem (58),(59):
$$\omega_2 \le \Lambda_1. \tag{122}$$
This conjecture has been recently proved by L. E. Payne [52]. In the same paper, L. E. Payne and H. F. Weinberger have shown further that the equality sign holds in (122) only when the domain is a circle. 5. In all methods for finding bounds for eigenvalues, it is of both practical and theoretical importance to develop estimates for the error. Estimates of this nature have been given for the Rayleigh-Ritz method by N. Kryloff [44], N. Kryloff and N. Bogoliuboff [45], and still more recently by N. Aronszajn [1] and H. F. Weinberger [80,81,83], who give error estimates also in the case of Weinstein's method. 6. The bibliography is, needless to say, representative rather than complete. It consists, for the most part, of the references which have been consulted at some time or other during the preparation of the present account. (Added in proof: Special reference is.made to the excellent treatise by S. H. Gould, Vari-
J. B. DIAZ
74
ational methods for eigenvalue problems. An introduction to the methods of Rayleigh, Ritz, Weinstein, and Aronszajn, Mathematical Expositions No. 10, University of Toronto Press, Toronto, 1957, which appeared while the present article was in press.) BIBLIOGRAPHY
1. N. Aronszajn, The Rayleigh-Ritz and the Weinstein methods for approximation of eigenvalues, I, Operators in a Hilbert space, Proc. Nat. Acad. Sci. U.S.A. vol. 34 (1948) pp. 474-480; II, Differential problems, Proc. Nat. Acad. Sci. U.S.A. vol. 34 (1948) pp. 594-601.
2. N. Aronszajn, Studies in eigenvalue problems: The Rayleigh-Ritz and the Weinstein methods for approximation of eigenvalues, I, Operators in a Hilbert space, Okla. Agric. and Mech. Coll. Tech. Rep. 1 (1949).
3. N. Aronszajn, The Rayleigh-Ritz and the Weinstein methods for approximation of eigenvalues, II, Differential operators, Okla. Agric. and Mech. Coll. Tech. Rep. 2 (1949).
4. N. Aronszajn, The Rayleigh-Ritz and the Weinstein methods for approximation of eigenvalues, III, Application of Weinstein's method with an auxiliary problem of type I, Okla. Agric. and Mech. Coll. Tech. Rep. 3 (1950).
5. N. Aronszajn, The Rayleigh-Ritz and the Weinstein methods for approximation of eigenvalues, IV, Application of Weinstein's method with an auxiliary problem of type II, Okla. Agric. and Mech. Coll. Tech. Rep. 4 (1950).
6. N. Aronszajn, Approximation methods for eigenvalues of completely continuous symmetric operators, Proc. Symposium on Spectral Theory and Differential Problems, Okla. Agric. and Mech. Coll. (1951) pp. 179-202.
7. N. Aronszajn and W. F. Donoghue, Variational approximation methods applied to eigenvalues of a clamped rectangular plate, Univ. of Kansas Tech. Rep. 12 (1954).
8. N. Aronszajn and K. T. Smith, Functional spaces and functional completion, Univ. of Kansas Tech. Rep. 10 (1954).
9. N. Aronszajn and A. Weinstein, Sur la convergence d'un procédé variationnel d'approximation dans la théorie des plaques encastrées, C. R. Acad. Sci. Paris vol. 204 (1937) pp. 96-98.
10. N. Aronszajn and A. Weinstein, Existence, convergence, and equivalence in the unified theory of eigenvalues of plates and membranes, Proc. Nat. Acad. Sci. U.S.A. vol. 27 (1941) pp. 188-191.
11. N. Aronszajn and A. Weinstein, On a unified theory of eigenvalues of plates and membranes, Amer. J. Math. vol. 64 (1942) pp. 623-645.
12. N. Aronszajn and A. Zeichner, Studies in eigenvalue problems. A new type of auxiliary problem for approximation of eigenvalues by Weinstein's method, Okla. Agric. and Mech. Coll. Tech. Rep. 5 (1951).
13. B. Budiansky and Pai C. Hu, The Lagrangian multiplier method of finding upper and lower limits to critical stresses of clamped plates, NACA Tech. Note 1103 (1946).
14. L. Collatz, Eigenwertprobleme und ihre numerische Behandlung, Chelsea Publishing Co., New York, 1948.
15. L. Collatz, Numerische Behandlung von Differentialgleichungen, Springer-Verlag, Berlin, 2d ed., 1955.
16. R. Courant, Beweis des Satzes, dass von allen homogenen Membranen gegebenen Umfanges und gegebener Spannung die kreisförmige den tiefsten Grundton besitzt, Math. Zeit. vol. 1 (1918) pp. 321-328.
17. R. Courant, Ueber die Eigenwerte bei den Differentialgleichungen der mathematischen Physik, Math. Zeit. vol. 7 (1920) pp. 1-57.
18. R. Courant, Ueber die Lösungen der Differentialgleichungen der Physik, Math. Ann. vol. 85 (1922) pp. 280-325.
19. R. Courant, Ueber die Schwingungen eingespannter Platten, Math. Zeit. vol. 15 (1922) pp. 195-200.
20. R. Courant and D. Hilbert, Methods of mathematical physics, vol. 1, Interscience, New York, 1st English ed., 1953.
21. J. B. Diaz, Upper and lower bounds for quadratic functionals, Proc. Symposium on Spectral Theory and Differential Problems, Okla. Agric. and Mech. Coll. (1951) pp. 279-289.
22. J. B. Diaz, Upper and lower bounds for quadratic functionals, Collectanea Math. Seminario Mat. Barcelona vol. 4 (1951) pp. 3-50.
23. J. B. Diaz and A. Weinstein, Schwarz's inequality and the methods of Rayleigh-Ritz and Trefftz, J. Math. Phys. vol. 26 (1947) pp. 133-136.
24. G. Faber, Beweis, dass unter allen homogenen Membranen von gleicher Fläche und gleicher Spannung die kreisförmige den tiefsten Grundton gibt, Bayrisch. Akad. Wiss. Sitzungsber. (1923) pp. 169-172.
25. S. Faedo, I metodi ispirati a quello di Ritz nel calcolo delle variazioni e nella teoria delle equazioni differenziali, Rend. Ist. Lombardo Sci. Lett. vol. 86 (1953) pp. 291-302.
26. G. Fichera, Risultati concernenti la risoluzione delle equazioni funzionali lineari dovuti all'Istituto Nazionale per le Applicazioni del Calcolo, Mem. Accad. Naz. Lincei, Ser. VIII vol. 3 (1950) pp. 3-81.
27. G. Fichera, On general computation methods for eigenvalues and eigenfunctions, Nat. Bur. Standards Appl. Math. Ser. 29 (1953) pp. 79-82.
28. G. Fichera, Formule di maggiorazione connesse ad una classe di trasformazioni lineari, Ann. Mat. Ser. IV vol. 36 (1954) pp. 273-296.
29. G. Fichera, Methods of linear functional analysis in mathematical physics, Proc. Internat. Congress Mathematicians, Amsterdam vol. I (1954).
30. G. Fichera, Su un metodo del Picone per il calcolo degli autovalori e delle autosoluzioni, Scritti matematici offerti a Mauro Picone, Cooperativa Tipografica Azzoguidi, Bologna, 1955, pp. 649-669. (Ann. Mat. Pura Appl. Ser. IV, vols. 39-40).
31. K. O. Friedrichs, Die Randwert- und Eigenwertprobleme aus der Theorie der elastischen Platten (Anwendung der direkten Methoden der Variationsrechnung), Math. Ann. vol. 98 (1927) pp. 205-247.
32. J. Hadamard, Sur un problème d'analyse relatif à l'équilibre des plaques élastiques encastrées, Mémoires présentés par divers savants étrangers à l'Académie des Sciences, vol. 33 (1908) Mémoire no. 4.
33. H. R. Hassé, The bending of a uniformly loaded clamped plate in the form of a circular section, Quart. J. Mech. Appl. Math. vol. 3 (1950) pp. 271-278.
34. H. Herrmann, Beziehungen zwischen den Eigenwerten und Eigenfunktionen verschiedener Eigenwertprobleme, Math. Zeit. vol. 40 (1935) pp. 221-241.
35. J. Indritz, Applications of the Rayleigh-Ritz method to variational problems, Pacific J. Math. vol. 5 suppl. 1 (1955) pp. 765-797.
36. A. K. Jennings, Studies in eigenvalue problems: Some developments and applications of a new approximation method for partial differential eigenvalue problems, Okla. Agric. and Mech. Coll. Tech. Rep. 6 (1951).
37. E. Kamke, Differentialgleichungen. Lösungsmethoden und Lösungen, vol. I, Gewöhnliche Differentialgleichungen, Chelsea Publishing Co., New York, 1948, specially pp. 182-245.
38. T. Kato, On the upper and lower bounds for eigenvalues, J. Phys. Soc. Japan vol. 4 (1949) pp. 415-438.
39. O. D. Kellogg, Potential theory, Springer-Verlag, Berlin, 1929.
40. E. T. Kornhauser and I. Stakgold, A variational theorem for $\nabla^2 u + \lambda u = 0$ and its applications, J. Math. Phys. vol. 31 (1952) pp. 45-54.
41. E. Krahn, Ueber eine von Rayleigh formulierte Minimaleigenschaft des Kreises, Math. Ann. vol. 94 (1924) pp. 97-100.
42. E. Krahn, Ueber Minimaleigenschaften der Kugel in drei und mehr Dimensionen, Acta et Commentationes Universitatis Dorpatensis vol. A9 (1926) pp. 1-44.
43. E. Krahn, Ueber Eigenschwingungszahlen freier Platten, Acta et Commentationes Dorpatensis vol. A21 (1931) pp. 1-12.
44. N. Kryloff, Les méthodes de solution approchée des problèmes de la physique mathématique, Mém. Sci. Math. vol. 49 (1931).
45. N. Kryloff and N. Bogoliuboff, Application de la méthode de l'algorithme variationnel à la solution approchée des équations différentielles aux dérivées partielles du type elliptique. Estimation des erreurs qu'on commet en s'arrêtant à la n-me approximation dans le calcul des valeurs et des fonctions singulières, I, II, Bull. Acad. Sci. URSS Ser. 7 (1930) pp. 43-71, 105-114.
46. W. F. Z. Lee and E. Saibel, Free vibrations of constrained beams, J. Appl. Mech. vol. 19 (1952) pp. 471-477.
47. H. J. Mähly, Die genäherte Berechnung von Eigenwerten elastischer Schwingungen anisotroper Körper, Ergeb. exak. Naturwiss. vol. 24 (1951) pp. 402-442.
48. S. G. Mihlin, Direct methods in mathematical physics, Moscow, 1950.
49. S. G. Mihlin, The problem of the minimum of a quadratic functional, Moscow, 1952.
50. S. Minakshisundaram, Expansion in eigenfunctions of the membrane problem, Proc. Symposium on Spectral Theory and Differential Problems, Okla. Agric. and Mech. Coll. (1951) pp. 317-325.
51. Marston Morse, The calculus of variations in the large, Amer. Math. Soc. Colloquium Pub. vol. 18 (1934) chap. IV.
52. L. E. Payne, Inequalities for eigenvalues of membranes and plates, J. Rational Mech. Anal. vol. 4 (1955) pp. 517-529.
53. L. E. Payne, G. Pólya, and H. F. Weinberger, Sur le quotient de deux fréquences propres consécutives, C. R. Acad. Sci. Paris vol. 241 (1955) pp. 917-919.
54. L. E. Payne and H. F. Weinberger, New bounds in harmonic and biharmonic problems, J. Math. Phys. vol. 33 (1955) pp. 291-307.
55. L. E. Payne and H. F. Weinberger, Two inequalities for eigenvalues of membranes, Univ. of Maryland Inst. for Fluid Dynamics and Appl. Math. Tech. Note BN-65 (1955).
56. M. Picone, Sul metodo delle minime potenze ponderate e sul metodo di Ritz per il calcolo approssimato nei problemi della fisica matematica, Rend. Circ. Mat. Palermo vol. 52 (1928) pp. 225-253.
57. A. Pleijel, On Green's functions for elastic plates with clamped, supported and free edges, Proc. Symposium on Spectral Theory and Differential Problems, Okla. Agric. and Mech. Coll. (1951) pp. 413-437.
58. H. Poincaré, Sur les équations aux dérivées partielles de la physique mathématique, Amer. J. Math. vol. 12 (1890) pp. 211-294, specially pp. 237-244, 259-261.
59. G. Pólya, Remarks on the foregoing paper, J. Math. Phys. vol. 31 (1952) pp. 55-57.
60. G. Pólya and M. Schiffer, Convexity of functionals by transplantation, J. Anal. Math. vol. 3 (1954) pp. 246-345.
61. G. Pólya and G. Szegö, Isoperimetric inequalities in mathematical physics, Annals of Mathematics Studies, vol. 27, Princeton, N.J., 1951.
62. J. W. S. Rayleigh, The theory of sound, 2 vols., London, 2d edition, 1894, 1896.
63. E. Reissner, Buckling of plates with intermediate rigid supports, J. Aero. Sci. vol. 12 (1945) pp. 375-377.
64. W. Ritz, Ueber eine neue Methode zur Lösung gewisser Variationsprobleme der mathematischen Physik, J. Reine Angew. Math. vol. 135 (1909) pp. 1-61.
65. W. Ritz, Oeuvres complètes, Paris, 1911.
66. E. Saibel, Vibration frequencies of continuous beams, J. Aero. Sci. vol. 11 (1944) pp. 88-90.
67. E. Saibel, Buckling loads of beams or plates on continuous supports, J. Aero. Sci. vol. 11 (1944) pp. 399-403. [See also letter to the editor in vol. 12 (1945) p. 251.]
68. E. Saibel, Free and forced vibrations of composite systems, Proc. Symposium on Spectral Theory and Differential Problems, Okla. Agric. and Mech. Coll. (1951) pp. 333-343.
69. E. Saibel, Buckling of continuous beams on elastic supports, J. Franklin Inst. vol. 253 (1952) pp. 563-566.
70. E. Saibel and E. d'Appolonia, Forced vibrations of continuous beams, Trans. ASCE vol. 117 (1952) pp. 1075-1090.
71. E. Saibel and W. F. Z. Lee, Vibrations of a continuous beam under a constant moving force, J. Franklin Inst. vol. 254 (1952) pp. 499-516.
72. G. Szegö, Inequalities for certain membranes of a given area, J. Rational Mech. Anal. vol. 3 (1954) pp. 343-356.
73. G. Temple and W. G. Bickley, Rayleigh's principle and its applications to engineering, London, 1933.
74. L. Tonelli, Sur un problème de Lord Rayleigh, Monatsh. Math. Phys. vol. 37 (1930) pp. 253-280.
75. E. Trefftz, Die Bestimmung der Knicklast gedrückter, rechteckiger Platten, Zeit. Angew. Math. Mech. vol. 15 (1935) pp. 339-344; vol. 16 (1936) p. 64.
76. F. H. van den Dungen, Application du principe de Rayleigh au calcul des systèmes acoustiques, Mémoires in 4° Acad. Roy. Belgique vol. 12 (1938).
77. T. Viola, Calcolo approssimato di autovalori, Rend. Mat. Applicazioni, Università di Roma Ser. 5 vol. 2 (1941) pp. 71-106.
78. K. Washizu, On the bounds of eigenvalues, Quart. J. Mech. Appl. Math. vol. 8 (1955) pp. 311-325.
79. H. Weber, Ueber die Integration der partiellen Differentialgleichung: $\partial^2 u/\partial x^2 + \partial^2 u/\partial y^2 + k^2 u = 0$, Math. Ann. vol. 1 (1869) pp. 1-36.
80. H. F. Weinberger, Error estimates in the Weinstein method for eigenvalues, Proc. Amer. Math. Soc. vol. 3 (1952) pp. 643-646.
81. H. F. Weinberger, An optimum problem in the Weinstein method for eigenvalues, Pacific J. Math. vol. 2 (1952) pp. 413-418.
82. H. F. Weinberger, An extension of the classical Sturm-Liouville theory, Duke Math. J. vol. 22 (1955) pp. 1-14.
83. H. F. Weinberger, A Rayleigh-Ritz procedure giving upper and lower bounds for eigenvalues, Univ. of Maryland Inst. for Fluid Dynamics and Appl. Math. Tech. Note BN-41 (1954).
84. A. Weinstein, Sur la stabilité des plaques encastrées, C. R. Acad. Sci. Paris vol. 200 (1935) pp. 107-109.
85. A. Weinstein, On a minimal problem in the theory of elasticity, J. London Math. Soc. vol. 10 (1935) pp. 184-192.
86. A. Weinstein, Sur l'équation des vibrations d'une plaque encastrée, C. R. Acad. Sci. Paris vol. 202 (1936) pp. 1899-1901.
87. A. Weinstein, On the symmetries of the solutions of a certain variational problem, Proc. Cambridge Philos. Soc. vol. 33 (1956) pp. 96-101.
88. A. Weinstein, Sur l'équation des vibrations d'une plaque, C. R. Soc. Phys. Genève vol. 53 (1936) p. 184.
89. A. Weinstein, Sur le spectre de l'équation des vibrations d'une plaque encastrée, C. R. Acad. Sci. Paris vol. 205 (1937) pp. 707-708.
90. A. Weinstein, Étude des spectres des équations aux dérivées partielles de la théorie des plaques élastiques, Mémorial des Sciences Mathématiques, vol. 88, Paris, 1937.
91. A. Weinstein, Les vibrations et le calcul des variations, Portugaliae Mathematica vol. 2 (1941) pp. 1-20.
92. A. Weinstein, Separation theorems for the eigenvalues of partial differential equations, H. Reissner Anniversary Volume, J. W. Edwards, Publisher, Inc., Ann Arbor, Mich., 1948, pp. 404-414.
93. A. Weinstein, Quantitative methods in Sturm-Liouville theory, Proc. Symposium on Spectral Theory and Differential Problems, Okla. Agric. and Mech. Coll. (1951) pp. 345-352.
94. A. Weinstein, Variational methods for the approximation and exact computation of eigenvalues, Nat. Bur. Standards Appl. Math. Ser. 29 (1953) pp. 83-89.
95. A. Weinstein and W. Z. Chien, On the vibrations of a clamped plate under tension, Quart. Appl. Math. vol. 1 (1943) pp. 61-68.
96. A. Weinstein and J. A. Jenkins, On a boundary value problem for a clamped plate, Trans. Roy. Soc. Canada Sec. III (3) vol. 40 (1946) pp. 59-67.
97. A. Weinstein and D. H. Rock, On the bending of a clamped plate, Quart. Appl. Math. vol. 2 (1944) pp. 262-266.
98. R. Weinstock, Inequalities for a classical eigenvalue problem, J. Rational Mech. Anal. vol. 3 (1954) pp. 745-753.
99. H. Weyl, Ueber die Abhängigkeit der Eigenschwingungen einer Membran von deren Begrenzung, J. Reine Angew. Math. vol. 144 (1912) pp. 1-11.
100. H. Weyl, Ueber das Spektrum der Hohlraumstrahlung, J. Reine Angew. Math. vol. 141 (1912) pp. 163-181.
101. H. Weyl, Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung), Math. Ann. vol. 71 (1912) pp. 441-479.
102. H. Weyl, Ramifications, old and new, of the eigenvalue problem, Bull. Amer. Math. Soc. vol. 56 (1950) pp. 115-139.

INSTITUTE FOR FLUID DYNAMICS AND APPLIED MATHEMATICS,
UNIVERSITY OF MARYLAND,
COLLEGE PARK, MD.
STATIONARY PRINCIPLES FOR FORCED VIBRATIONS IN ELASTICITY AND ELECTROMAGNETISM¹
BY
J. L. SYNGE
1. Geometrization of stationary principles. It is possible to exhibit certain stationary principles in a simple geometrical form, common to them all, by using the concept of function space with a suitable scalar product.² This scalar product provides a metric, which may be positive-definite or indefinite. In the physical applications, it is found that a positive-definite metric occurs in problems of static equilibrium, the positive-definite character being due to the positive-definite character of the energy of the system. In problems of forced vibrations, on the other hand, the appropriate metric is indefinite, like the metric in Minkowskian space-time. With a positive-definite metric we get strong minimum principles, but an indefinite metric yields only stationary principles.

FIG. 1
The basic idea behind all this work is to "split the problem," so that the point $S$ of function space which corresponds to the solution is the point of intersection of two orthogonal linear subspaces $L'$ and $L''$. Let $S'$ be any point of $L'$ and $S''$ any point of $L''$ (Fig. 1). If the metric is positive-definite, then, by the inequality of Schwarz (or, equivalently, the theorem of Pythagoras), we have
$$(S - S'')^2 \le (S' - S'')^2, \tag{1}$$
$$(S - S')^2 \le (S' - S'')^2. \tag{2}$$

¹ This research was supported by the U.S. Air Force, through the Office of Scientific Research of the Research and Development Command, under Contract No. AFOSR-TN-56122 AD-82518. The author was on leave of absence January to April, 1956, at the National Bureau of Standards.
² This procedure was originally devised in connection with problems of elastic equilibrium; cf. W. Prager and J. L. Synge, Quart. Appl. Math. vol. 5 (1947) pp. 241-269. The present treatment of forced vibrations is based on A. J. McConnell, Proc. Roy. Irish Acad. vol. 54A (1951) pp. 263-290. A detailed account of these methods, with numerous applications, will be found in J. L. Synge, The hypercircle in mathematical physics, Cambridge University Press, London, 1957.
Define $\Phi$ by
$$\Phi = (S' - S'')^2. \tag{3}$$
We have then the following two minimum principles:

(4) For $S''$ fixed, $\Phi$ is minimized when $S' = S$.
(5) For $S'$ fixed, $\Phi$ is minimized when $S'' = S$.

These may be stated in a slightly different way:

(6) For $S''$ fixed, $S'^2 - 2S'\cdot S''$ is minimized when $S' = S$.
(7) For $S'$ fixed, $-2S'\cdot S'' + S''^2$ is minimized when $S'' = S$.

Here $S'\cdot S''$ is the scalar product; $S'^2 = S'\cdot S'$, etc.

If the metric is indefinite, we no longer have the inequalities (1) and (2). Instead, we have merely the orthogonality condition
$$(S' - S)\cdot(S'' - S) = 0. \tag{8}$$
For arbitrary variations of $S'$, $S''$ on $L'$, $L''$, respectively, this gives
$$\delta S'\cdot(S'' - S) = 0, \tag{9}$$
$$(S' - S)\cdot\delta S'' = 0. \tag{10}$$
With $\Phi$ defined by (3), we have
$$\delta\Phi = 2(S' - S'')\cdot(\delta S' - \delta S''). \tag{11}$$
Hence we have the following two stationary principles:

(12) For $S''$ fixed, $\Phi$ is stationary when $S' = S$.
(13) For $S'$ fixed, $\Phi$ is stationary when $S'' = S$.

And, as in (6) and (7), these may also be stated in the form:

(14) For $S''$ fixed, $S'^2 - 2S'\cdot S''$ is stationary when $S' = S$.
(15) For $S'$ fixed, $-2S'\cdot S'' + S''^2$ is stationary when $S'' = S$.
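As a minimal numerical sketch of principles (12) and (13), one may replace function space by three-dimensional space with the indefinite scalar product $x\cdot y = x_1y_1 + x_2y_2 - x_3y_3$. The point `S`, the spanning vectors `a`, `b`, `c`, and the step `t` are illustrative assumptions, not taken from the text. The sketch checks that the first variation of $\Phi$ vanishes at $S$, while the second-order change can take either sign, so that $\Phi$ is stationary but not minimal there.

```python
def dot(x, y):
    """Indefinite scalar product on R^3 with signature (+, +, -)."""
    return x[0] * y[0] + x[1] * y[1] - x[2] * y[2]


def shift(p, v, t):
    """Return the point p + t*v."""
    return tuple(pi + t * vi for pi, vi in zip(p, v))


def phi(s1, s2):
    """Phi = (S' - S'')^2 in the indefinite metric."""
    d = shift(s1, s2, -1.0)
    return dot(d, d)


S = (1.0, -2.0, 0.5)      # assumed "solution" point
a = (2.0, 0.0, 1.0)       # L'  = S + span{a}
b = (0.0, 1.0, 0.0)       # L'' = S + span{b, c}
c = (1.0, 0.0, 2.0)

# L' and L'' are orthogonal in the indefinite metric.
orth_ab = dot(a, b)
orth_ac = dot(a, c)

S2_fixed = shift(shift(S, b, 0.4), c, -0.3)   # some point S'' on L''
S1_fixed = shift(S, a, 0.7)                   # some point S' on L'

t = 1e-3
# First variation of Phi at S' = S (S'' fixed), by a central difference.
d_first = (phi(shift(S, a, t), S2_fixed) - phi(shift(S, a, -t), S2_fixed)) / (2 * t)

# Second-order changes of Phi at S'' = S (S' fixed), along b and along c.
up = phi(S1_fixed, shift(S, b, t)) - phi(S1_fixed, S)
down = phi(S1_fixed, shift(S, c, t)) - phi(S1_fixed, S)
```

In this toy setting `up` is positive and `down` is negative, which is the finite-dimensional picture of a saddle: the value at $S$ is stationary, but it is neither a minimum nor a maximum with respect to $S''$.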
Thus, by passage from a positive-definite metric to an indefinite one, the two complementary minimum principles (4), (5), or (6), (7), are replaced by the two complementary stationary principles (12), (13), or (14), (15). The stationary principles derived below are either known or closely related to known results. The interest of the present work would seem to lie in the geometrical interpretation of these principles.

2. Vibrating membrane. Consider a membrane $A$ bounded by a curve $B$. The transverse displacement $u^*$ satisfies the equation
$$\nu\,\frac{\partial^2 u^*}{\partial t^2} = T\,\Delta u^* + P^*, \tag{16}$$
where $\nu$ = density, $T$ = tension, $P^*$ = applied pressure, and $\Delta$ is the two-dimensional Laplacian. For simple harmonic forced vibrations we put
$$u^* = \tfrac12\left(u e^{i\omega t} + \bar u e^{-i\omega t}\right), \qquad P^* = \tfrac12\left(P e^{i\omega t} + \bar P e^{-i\omega t}\right), \tag{17}$$
where $u$ and $P$ are complex, and the bar indicates the complex conjugate; hence we get
$$\Delta u + k^2 u + Q = 0, \tag{18}$$
where $k^2 = \nu\omega^2/T$, $Q = P/T$. We seek the solution of (18) in $A$ with boundary conditions
$$(u)_{B_1} = f, \qquad \left(\frac{\partial u}{\partial n}\right)_{B_2} = g, \tag{19}$$
where $B_1 + B_2 = B$ and $f$ and $g$ are given functions. With boundary conditions of this form, the real and imaginary parts of $u$ separate, and progressive waves may be treated as a superposition of standing waves. There is then no loss of generality in regarding $u$ and $Q$ as real, and our problem as that of solving (18), (19) with $f$ and $g$ given real functions.

We assume that $k$ is not an eigenvalue. By this we mean that the homogeneous problem presented by (18), (19) with $Q = f = g = 0$ has no solution except $u = 0$. We are then assured that the inhomogeneous problem cannot have two solutions, since their difference would satisfy the homogeneous problem.

Let $x_i$ be rectangular Cartesian coordinates, small Latin suffixes taking the values 1, 2 with the summation convention. Let a point $S$ in function space correspond to a vector field $p_i$ together with a scalar field $u$. We write
$$S \leftrightarrow (p_i, u). \tag{20}$$
The scalar product is defined by
$$S\cdot S' = \int (p_i p_i' - k^2 u u')\,dA; \tag{21}$$
this gives the indefinite metric
$$S^2 = \int (p_i p_i - k^2 u^2)\,dA. \tag{22}$$
Note that, like the Minkowskian metric of space-time, this metric is the difference of two positive-definite parts. In the above definitions, $S$ and $S'$ are general F-points (points of function space). Henceforth $S$ (unprimed) will denote the F-point corresponding to the solution of (18), (19), so that
$$S \leftrightarrow (u_{,i}, u), \tag{23}$$
the comma indicating the partial derivative ($u_{,i} = \partial u/\partial x_i$). We split the problem by defining linear subspaces $L'$ and $L''$ as follows:
$$L':\quad S' \leftrightarrow (p_i', u'), \qquad p_i' = u'_{,i}, \qquad (u')_{B_1} = f; \tag{24}$$
$$L'':\quad S'' \leftrightarrow (p_i'', u''), \qquad p''_{i,i} + k^2 u'' + Q = 0, \qquad (p_i'' n_i)_{B_2} = g; \tag{25}$$
here $n_i$ is the unit normal to $B_2$, drawn outward. We take $u'$ continuous (but $u'_{,i}$ need only be piecewise continuous), and we take $p_i'' n_i$ continuous across any curve with unit normal $n_i$ (but $p_i''$ need only be piecewise continuous).
It is clear that the solution $S$ is on $L'$ and on $L''$. We shall prove that $L'$ is orthogonal to $L''$, in the sense that every vector $T'$ lying in $L'$ is orthogonal to every vector $T''$ lying in $L''$. (A vector lies in a subspace when it is the difference of two points in that subspace.) Thus we may write
$$T' \leftrightarrow (p_i', u'), \qquad p_i' = u'_{,i}, \qquad (u')_{B_1} = 0; \tag{26}$$
$$T'' \leftrightarrow (p_i'', u''), \qquad p''_{i,i} + k^2 u'' = 0, \qquad (p_i'' n_i)_{B_2} = 0. \tag{27}$$
Hence
$$T'\cdot T'' = \int (p_i' p_i'' - k^2 u' u'')\,dA = \int (u'_{,i} p_i'' - k^2 u' u'')\,dA = \int u' p_i'' n_i\,dB - \int u'(p''_{i,i} + k^2 u'')\,dA. \tag{28}$$
The last integral vanishes, by (27). The preceding integral is
$$\int u' p_i'' n_i\,dB = \int u' p_i'' n_i\,dB_1 + \int u' p_i'' n_i\,dB_2 = 0, \tag{29}$$
by (26) and (27). Hence
$$T'\cdot T'' = 0, \tag{30}$$
and the required orthogonality is established.
The situation is now as in Fig. 1, and we have the two complementary stationary principles (12), (13) or (14), (15). To use these in the latter form, we calculate $S'\cdot S''$, obtaining [cf. (28)]
$$S'\cdot S'' = \int f\,p_i'' n_i\,dB_1 + \int u' g\,dB_2 + \int u' Q\,dA. \tag{31}$$
Hence we can state the two following principles:

(i) The expression
$$\int (u'_{,i} u'_{,i} - k^2 u'^2)\,dA - 2\int u' g\,dB_2 - 2\int u' Q\,dA, \tag{32}$$
where $u'$ is a scalar field arbitrary except for
$$(u')_{B_1} = f, \tag{33}$$
has a stationary value for $u' = u$, where $u$ is the solution of (18), (19).

(ii) The expression
$$-2\int f\,p_i'' n_i\,dB_1 + \int (p_i'' p_i'' - k^2 u''^2)\,dA, \tag{34}$$
where $p_i''$ is an arbitrary vector field satisfying
$$(p_i'' n_i)_{B_2} = g, \tag{35}$$
and $u''$ is given by
$$k^2 u'' = -p''_{i,i} - Q, \tag{36}$$
has a stationary value for $p_i'' = u_{,i}$, the gradient of the solution of (18), (19).
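Principle (i) can be tried out on a one-dimensional analogue. The sketch below assumes a string on $0 \le x \le 1$ with $B_2$ empty, so that (18) becomes $u_{xx} + k^2 u + Q = 0$ with $u$ prescribed at both ends. The grid size, the finite-difference form of (32), and the values of `k2`, `q`, `f0`, `f1` are illustrative assumptions, not part of the text. With $k^2$ chosen between the first and second eigenvalues, the discrete functional falls under a low-mode perturbation and rises under a higher-mode one.

```python
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm without pivoting; assumes no zero pivot occurs."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for j in range(1, n):
        pivot = diag[j] - lower[j] * c[j - 1]
        c[j] = upper[j] / pivot
        d[j] = (rhs[j] - lower[j] * d[j - 1]) / pivot
    x = [0.0] * n
    x[-1] = d[-1]
    for j in range(n - 2, -1, -1):
        x[j] = d[j] - c[j] * x[j + 1]
    return x


def functional(u, k2, q, h):
    """Discrete form of (32) with B_2 empty: sum (du)^2/h - h*sum(k2*u^2 + 2*q*u)."""
    stretch = sum((u[j + 1] - u[j]) ** 2 for j in range(len(u) - 1)) / h
    rest = h * sum(k2 * u[j] ** 2 + 2.0 * q * u[j] for j in range(1, len(u) - 1))
    return stretch - rest


n = 50                      # number of grid intervals (assumed)
h = 1.0 / n
k2, q = 20.0, 1.0           # k^2 lies between pi^2 and 4*pi^2; constant load Q
f0, f1 = 0.0, 1.0           # prescribed end values, playing the role of f on B_1

# Finite-difference form of u_xx + k^2 u + Q = 0 at the interior nodes.
m = n - 1
rhs = [h * h * q] * m
rhs[0] += f0
rhs[-1] += f1
interior = solve_tridiagonal([-1.0] * m, [2.0 - h * h * k2] * m, [-1.0] * m, rhs)
u = [f0] + interior + [f1]

residual = max(
    abs((u[j - 1] - 2.0 * u[j] + u[j + 1]) / (h * h) + k2 * u[j] + q)
    for j in range(1, n)
)


def perturbed(mode, eps):
    """Add eps*sin(mode*pi*x) at the interior nodes; the end values stay fixed."""
    return [
        u[j] + eps * math.sin(mode * math.pi * j * h) if 0 < j < n else u[j]
        for j in range(n + 1)
    ]


J0 = functional(u, k2, q, h)
J_low = functional(perturbed(1, 0.1), k2, q, h)    # mode below k^2
J_high = functional(perturbed(3, 0.1), k2, q, h)   # mode above k^2
```

Because the discrete functional is quadratic and its gradient vanishes at the computed `u`, the two perturbed values differ from `J0` only through second-order terms, whose signs follow the sign of (eigenvalue minus $k^2$) for the chosen mode.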
3. Vibrations of an elastic body. Henceforth small Latin suffixes will take the values 1, 2, 3 with the summation convention for a repeated suffix. Let $x_i$ be rectangular Cartesian coordinates, $u_i^*$ displacement, $E_{ij}^*$ stress, and $X_i^*$ body force per unit volume. Stress is connected with strain ($e_{ij}^*$) by the generalized Hooke's law,
$$e_{ij}^* = C_{ijkl} E_{kl}^*, \tag{37}$$
where the elastic coefficients satisfy
$$C_{ijkl} = C_{jikl} = C_{klij}. \tag{38}$$
Strain is related to displacement by
$$e_{ij}^* = \tfrac12 (u_{j,i}^* + u_{i,j}^*). \tag{39}$$
The equations of motion are
$$\rho\,\frac{\partial^2 u_i^*}{\partial t^2} = E_{ij,j}^* + X_i^*, \tag{40}$$
where $\rho$ is density. For simple harmonic forced vibrations, we put
$$u_i^* = u_i \cos\omega t, \qquad X_i^* = X_i \cos\omega t, \qquad \text{etc.}, \tag{41}$$
where $u_i$ and $X_i$ are functions of position only, and (40) becomes
$$E_{ij,j} + k^2 u_i + X_i = 0, \qquad (k^2 = \rho\omega^2). \tag{42}$$
We assign boundary conditions as follows:
$$(u_i)_{B_1} = f_i, \qquad (E_{ij} n_j)_{B_2} = g_i, \tag{43}$$
where $B_1 + B_2 = B$, the bounding surface of the body, and $f_i$ and $g_i$ are given. By (37) and (39) we have
$$e_{ij} = C_{ijkl} E_{kl}, \tag{44}$$
$$e_{ij} = \tfrac12 (u_{j,i} + u_{i,j}). \tag{45}$$
We assume that $k^2$ is not an eigenvalue, so that the solution is unique (cf. Sec. 2).
In dealing with the problem (42), (43), we think of a tensor field $E_{ij}$ and a vector field $u_i$ independent of it. We regard (44) as the definition of $e_{ij}$, and we disregard (45) (unless specifically imposed). For a point $S$ of function space, we write
$$S \leftrightarrow (E_{ij}, u_i), \tag{46}$$
and we define the scalar product by
$$S\cdot S' = \int (e_{ij} E_{ij}' - k^2 u_i u_i')\,dV, \tag{47}$$
integrated through the volume $V$ of the body. By virtue of (38) we have $S\cdot S' = S'\cdot S$. The indefinite metric of function space is
$$S^2 = \int (e_{ij} E_{ij} - k^2 u_i u_i)\,dV. \tag{48}$$
Here $S$ and $S'$ are general points. Henceforth $S$ refers to the solution of (42), (43). Define linear subspaces $L'$ and $L''$ by
$$L':\quad S' \leftrightarrow (E_{ij}', u_i'), \qquad e_{ij}' = \tfrac12 (u_{i,j}' + u_{j,i}'), \qquad (u_i')_{B_1} = f_i; \tag{49}$$
$$L'':\quad S'' \leftrightarrow (E_{ij}'', u_i''), \qquad E_{ij,j}'' + k^2 u_i'' + X_i = 0, \qquad (E_{ij}'' n_j)_{B_2} = g_i. \tag{50}$$
We take $u_i'$ continuous, and $E_{ij}'' n_j$ continuous across any surface with unit normal $n_i$. Then $S$ is on $L'$ and on $L''$, and (as is easily seen) $L'$ and $L''$ are orthogonal. Accordingly, the stationary principles (12), (13) apply, or equivalently (14), (15). We have
$$S'\cdot S'' = \int (e_{ij}' E_{ij}'' - k^2 u_i' u_i'')\,dV = \int u_i' E_{ij}'' n_j\,dB - \int u_i'(E_{ij,j}'' + k^2 u_i'')\,dV = \int f_i E_{ij}'' n_j\,dB_1 + \int u_i' g_i\,dB_2 + \int u_i' X_i\,dV. \tag{51}$$
Accordingly, the first stationary principle reads as follows:³

(i) The expression
$$\int (e_{ij}' E_{ij}' - k^2 u_i' u_i')\,dV - 2\int u_i' g_i\,dB_2 - 2\int u_i' X_i\,dV, \tag{52}$$
where $u_i'$ is an arbitrary vector field satisfying
$$(u_i')_{B_1} = f_i \tag{53}$$
and $e_{ij}'$ and $E_{ij}'$ are derived from it according to (44) and (45), has a stationary value for $u_i' = u_i$, where $u_i$ is the displacement vector in the solution of (42), (43).

The second stationary principle reads:

(ii) The expression
$$-2\int f_i E_{ij}'' n_j\,dB_1 + \int (e_{ij}'' E_{ij}'' - k^2 u_i'' u_i'')\,dV, \tag{54}$$
where $E_{ij}''$ is any symmetric tensor field satisfying
$$(E_{ij}'' n_j)_{B_2} = g_i, \tag{55}$$
where $e_{ij}''$ is derived from it according to (44), and where $u_i''$ is given by
$$k^2 u_i'' = -E_{ij,j}'' - X_i, \tag{56}$$
has a stationary value when $E_{ij}'' = E_{ij}$, the stress in the solution of (42), (43).

The time average of strain energy is
$$W = \tfrac14 \int e_{ij} E_{ij}\,dV, \tag{57}$$
and the time average of kinetic energy is
$$K = \tfrac14 \rho\omega^2 \int u_i u_i\,dV. \tag{58}$$
These formulas are meaningful even for the artificial states considered above, so that the stationary expressions in (52) and (54) may be written
$$4(W' - K') - 2\int u_i' g_i\,dB_2 - 2\int u_i' X_i\,dV, \tag{59}$$
$$-2\int f_i E_{ij}'' n_j\,dB_1 + 4(W'' - K''). \tag{60}$$

³ This is essentially Hamilton's principle. Cf. E. Reissner, Journal Math. and Physics vol. 27 (1948) pp. 159-160.

4. Electromagnetic vibrations. If $E_i^*$ and $H_i^*$ are the electric and magnetic vectors, Maxwell's equations in vacuo read
$$\frac{1}{c}\frac{\partial E_i^*}{\partial t} = \epsilon_{ijk} H_{k,j}^*, \qquad E_{i,i}^* = 0, \qquad \frac{1}{c}\frac{\partial H_i^*}{\partial t} = -\epsilon_{ijk} E_{k,j}^*, \qquad H_{i,i}^* = 0, \tag{61}$$
where $c$ is the speed of light and $\epsilon_{ijk}$ the permutation symbol, skew-symmetric in all its subscripts and such that $\epsilon_{123} = 1$. For simple harmonic vibrations, we put
$$E_i^* = E_i \sin\omega t, \qquad H_i^* = H_i \cos\omega t, \tag{62}$$
where $E_i$ and $H_i$ are functions of position only. Then (61) becomes
$$kE_i = \epsilon_{ijk} H_{k,j}, \qquad kH_i = \epsilon_{ijk} E_{k,j}, \qquad E_{i,i} = 0, \qquad H_{i,i} = 0, \tag{63}$$
where $k = \omega/c$.

Consider electromagnetic vibrations in a cavity $V$, with certain boundary conditions on the boundary surface $B$; we shall take these conditions to be
$$(\epsilon_{ijk} E_j n_k)_{B_1} = f_i, \qquad (\epsilon_{ijk} H_j n_k)_{B_2} = g_i, \tag{64}$$
where $B_1 + B_2 = B$, $n_i$ is the unit normal to $B$, and $f_i$ and $g_i$ are given. These conditions assign the tangential component of $E_i$ on $B_1$ and the tangential component of $H_i$ on $B_2$.

We assume that $k$ is not an eigenvalue, so that the solution is unique (cf. Sec. 2). As a point of function space we take
$$S \leftrightarrow (E_i, H_i), \tag{65}$$
a pair of vector fields, in general independent of one another. Define the scalar product by
$$S\cdot S' = \int (E_i E_i' - H_i H_i')\,dV, \tag{66}$$
so that we have the indefinite metric
$$S^2 = \int (E^2 - H^2)\,dV. \tag{67}$$
Henceforth $S$ will refer to the solution of (63), (64).
We split the problem by defining linear subspaces $L'$ and $L''$ as follows:
$$L':\quad S' \leftrightarrow (E_i', H_i'), \qquad kH_i' = \epsilon_{ijk} E_{k,j}', \qquad (\epsilon_{ijk} E_j' n_k)_{B_1} = f_i; \tag{68}$$
$$L'':\quad S'' \leftrightarrow (E_i'', H_i''), \qquad kE_i'' = \epsilon_{ijk} H_{k,j}'', \qquad (\epsilon_{ijk} H_j'' n_k)_{B_2} = g_i. \tag{69}$$
Note that, to get a vector $S'$, we may choose any field $E_i'$ satisfying the conditions on $B_1$ and derive $H_i'$ from it; similarly, we may derive $E_i''$ from $H_i''$. We take the tangential components of $E_i'$ and $H_i''$ to be continuous across any surface.

It is clear that $S$ is on $L'$ and on $L''$. We now establish the orthogonality of $L'$ and $L''$. To do this, let us first calculate $S'\cdot S''$. We have
$$S'\cdot S'' = \int (E_i' E_i'' - H_i' H_i'')\,dV. \tag{70}$$
Now
$$\int E_i' E_i''\,dV = k^{-1}\int E_i' \epsilon_{ijk} H_{k,j}''\,dV = k^{-1}\int E_i' \epsilon_{ijk} H_k'' n_j\,dB - k^{-1}\int E_{i,j}' \epsilon_{ijk} H_k''\,dV = k^{-1}\int f_i H_i''\,dB_1 - k^{-1}\int E_i' g_i\,dB_2 + \int H_i' H_i''\,dV, \tag{71}$$
and therefore
$$S'\cdot S'' = k^{-1}\left(\int f_i H_i''\,dB_1 - \int g_i E_i'\,dB_2\right). \tag{72}$$
If $T'$ and $T''$ are vectors lying in $L'$ and $L''$, respectively, the calculation of $T'\cdot T''$ proceeds along the same lines as above, except that we are to replace $f_i$ and $g_i$ by zero. Hence $T'\cdot T'' = 0$, so that the orthogonality of $L'$ and $L''$ is established. We have then the two complementary stationary principles as in (14), (15):

(i) The expression
$$\int (E'^2 - H'^2)\,dV + 2k^{-1}\int g_i E_i'\,dB_2, \tag{73}$$
where $E_i'$ and $H_i'$ satisfy (68), has a stationary value for the actual field.

(ii) The expression
$$-2k^{-1}\int f_i H_i''\,dB_1 + \int (E''^2 - H''^2)\,dV, \tag{74}$$
where $E_i''$ and $H_i''$ satisfy (69), has a stationary value for the actual field.

5. A procedure for improving approximations. The metrics (22), (48), and (67) are all of Minkowskian type, the difference of two positive-definite parts.
The argument which follows is based on this Minkowskian property and applies generally. For purposes of illustration, let us take the vibrating membrane with metric (22). Consider these two linear subspaces:

(i) Subspace $M$ consisting of all points of the form $(p_i, 0)$.
(ii) Subspace $N$ consisting of all points of the form $(0, u)$.

It is obvious that $M$ and $N$ are orthogonal. Further, any vector $S$ can be resolved uniquely by orthogonal projection into the sum of two vectors lying in $M$ and $N$, respectively:
$$S = S_M + S_N. \tag{75}$$
We have then
$$S_M\cdot S_N = 0, \qquad S_M^2 \ge 0, \qquad S_N^2 \le 0, \tag{76}$$
the metrics in $M$ and $N$ being, respectively, positive-definite and negative-definite, for we have
$$S_M^2 = \int p_i p_i\,dA, \qquad S_N^2 = -\int k^2 u^2\,dA. \tag{77}$$
The projection is illustrated in Fig. 2.

FIG. 2
Let $S'$ and $S''$ be points on the linear subspaces $L'$ and $L''$ defined in (24) and (25), and let their orthogonal projections on $M$ and $N$ be indicated by these subscripts. Define
$$D = (S_M' - S_M'')^2 - (S_N' - S_N'')^2. \tag{78}$$
(Note the central minus sign!) Then
$$D \ge 0. \tag{79}$$
If $D = 0$, then
$$S_M' = S_M'', \qquad S_N' = S_N'', \qquad S' = S'', \tag{80}$$
and therefore
$$S' = S'' = S, \tag{81}$$
where $S$ is the solution (the unique common point of $L'$ and $L''$). We can regard $D$ as a "measure of error" for the approximate solution
$$A = \tfrac12 (S' + S''), \tag{82}$$
in the sense that $D = 0$ implies
$$A = S. \tag{83}$$
The technique for improving an approximation (i.e., decreasing $D$) proceeds as follows: Let $S'$ and $S''$ be points on $L'$ and $L''$, respectively, and let $T'$ and $T''$ be vectors lying in these subspaces. For any values of the scalars $a'$ and $a''$, the points
$$S' + a'T', \qquad S'' + a''T'' \tag{84}$$
are on $L'$ and $L''$, respectively, and they give an error, according to (78),
$$D(a',a'') = (S_M' - S_M'' + a'T_M' - a''T_M'')^2 - (S_N' - S_N'' + a'T_N' - a''T_N'')^2. \tag{85}$$
This we minimize by choosing $a'$ and $a''$ to satisfy the two linear equations
$$\begin{aligned}
(S_M' - S_M'' + a'T_M' - a''T_M'')\cdot T_M' - (S_N' - S_N'' + a'T_N' - a''T_N'')\cdot T_N' &= 0,\\
(S_M' - S_M'' + a'T_M' - a''T_M'')\cdot T_M'' - (S_N' - S_N'' + a'T_N' - a''T_N'')\cdot T_N'' &= 0.
\end{aligned} \tag{86}$$
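The step described by (84) to (86) can be sketched in a finite-dimensional stand-in. The example below assumes three-dimensional space with scalar product $x_1y_1 + x_2y_2 - x_3y_3$, takes $M$ as the span of the first two coordinates and $N$ as the third, and uses illustrative points and vectors (`S`, `a`, `b`, `c`) that are not from the text. It solves the two linear equations by Cramer's rule and compares the resulting error with the closed form quoted in (87).

```python
def mink(x, y):
    """Indefinite scalar product with signature (+, +, -)."""
    return x[0] * y[0] + x[1] * y[1] - x[2] * y[2]


def proj_M(x):
    """Orthogonal projection on M (the positive-definite part)."""
    return (x[0], x[1], 0.0)


def proj_N(x):
    """Orthogonal projection on N (the negative-definite part)."""
    return (0.0, 0.0, x[2])


def comb(x, y, s):
    """Return x + s*y componentwise."""
    return tuple(xi + s * yi for xi, yi in zip(x, y))


def pos(x, y):
    """Bilinear form behind D in (78): M-part minus N-part of the metric."""
    return mink(proj_M(x), proj_M(y)) - mink(proj_N(x), proj_N(y))


def improve(Sp, Spp, Tp, Tpp):
    """Solve the two linear equations (86) for (a', a'') by Cramer's rule."""
    P = comb(Sp, Spp, -1.0)
    a11, a12, b1 = pos(Tp, Tp), -pos(Tpp, Tp), -pos(P, Tp)
    a21, a22, b2 = pos(Tp, Tpp), -pos(Tpp, Tpp), -pos(P, Tpp)
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


S = (1.0, -2.0, 0.5)                          # assumed solution point
a, b, c = (2.0, 0.0, 1.0), (0.0, 1.0, 0.0), (1.0, 0.0, 2.0)
Sp = comb(S, a, 0.7)                          # S' on L' = S + span{a}
Spp = comb(comb(S, b, 0.4), c, -0.3)          # S'' on L'' = S + span{b, c}

P = comb(Sp, Spp, -1.0)
D0 = pos(P, P)                                # error before the step

ap, app = improve(Sp, Spp, a, c)              # T' = a, T'' = c
E = comb(comb(Sp, a, ap), comb(Spp, c, app), -1.0)
D1 = pos(E, E)                                # error after the step

# Closed form for the minimum, as in (87).
U = comb(tuple(ap * ai for ai in a), c, -app)
D_formula = pos(P, P) + pos(P, U)
```

Using a single vector `c` in $L''$ cannot remove the component of $S'' - S$ along `b`, so the improved error stays positive here; adding `b` as a further vector would lead to a three-equation system of the same type.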
The corresponding minimum of $D(a',a'')$ is
$$D_{\min} = (S_M' - S_M'')^2 - (S_N' - S_N'')^2 + (S_M' - S_M'')\cdot(a'T_M' - a''T_M'') - (S_N' - S_N'')\cdot(a'T_N' - a''T_N''), \tag{87}$$
where $a'$ and $a''$ have the values given by (86).

To effect a greater improvement in one step, we should use more vectors, say $T_1', T_2', \dots, T_r'$ lying in $L'$ and $T_1'', T_2'', \dots, T_s''$ lying in $L''$. Then the minimization of $D$ leads us to $r + s$ linear equations of the same type as (86) for the determination of $r + s$ weighting factors $a_1', \dots, a_r', a_1'', \dots, a_s''$.

DUBLIN INSTITUTE FOR ADVANCED STUDIES,
DUBLIN, IRELAND
A VARIATIONAL COMPUTATION METHOD FOR FORCED-VIBRATION PROBLEMS¹
BY
H. F. WEINBERGER
As has been shown by Professor Synge in the preceding paper, problems of forced vibrations yield, in general, stationary principles for indefinite quadratic functionals. These principles provide no estimate of the deviation of an approximating function from the correct solution.

When the quadratic functional naturally associated with a problem is used as the norm of a Hilbert space, it yields the orthogonality of the subspaces $L'$ and $L''$, in the terminology of Synge. Then, if $S'$ and $S''$ are any vectors in $L'$ and $L''$, respectively, and if $S$ is the correct solution, which is the intersection of $L'$ and $L''$, the Pythagorean theorem yields
$$(S'' - S')\cdot(S'' - S') = (S' - S)\cdot(S' - S) + (S'' - S)\cdot(S'' - S). \tag{1}$$
(Following Synge, we use the dot to denote scalar product.) If the norm is positive-definite, each of the terms on the right, which represent the deviations of S' and S" from S, is bounded by the computable quantity on the left. This is the only use that is made of the orthogonality of L' and L". Clearly, if the norm is indefinite, the identity (1) gives no bound for the deviations S' - S and S" - S. In fact, even if it were possible to bound the norm of, say, S' - S, this would only result in a measure of the deviation of S' from S plus an arbitrary null vector, rather than from S itself.
In the last part of his paper, Professor Synge suggests the use of a positive-definite quadratic form in S' - S" to provide a measure of the deviations of S' and S" from S when the natural norm is not positive-definite. It is the purpose of this discussion to show that in many cases this procedure will lead to an estimate for the deviations of S' and S" in a positive-definite norm.
From now on, we shall work in a Hilbert space with a positive-definite norm. In general, L' and L" will not be orthogonal in this norm, so that (1) does not hold. However, it is sufficient to have the inequality
$$(S' - S) \cdot (S' - S) \le K\,(S'' - S') \cdot (S'' - S') \tag{2}$$
for some constant K. A simple geometric consideration shows that the smallest value of K is given by
$$K^{-1} = \sin^2\theta, \tag{3}$$
where $\theta$ is the greatest lower bound of angles between vectors in L' and L".
¹ This research was supported by the U.S. Air Force through the Air Force Office of Scientific Research of the Air Research and Development Command under Contract No. AF (600)-573.
Since this is a symmetric functional of L' and L", (2) is equivalent to
$$(S'' - S) \cdot (S'' - S) \le K\,(S'' - S') \cdot (S'' - S'). \tag{4}$$
It is clear that, if L' and L" intersect in a line (i.e., if the solution S is not unique), then $\theta = 0$, and there will be no inequalities (2) and (4). Since L' and L" are in general not complete, the same may happen even when S is unique.
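The geometric content of (2) and (3) can be illustrated in the simplest possible setting. The following is only a sketch under stated assumptions: the Hilbert space is taken to be the Euclidean plane, S is the origin, and L' and L" are two lines through S enclosing an angle theta. The function names `smallest_K` and `deviation_ratio` are hypothetical and chosen for this illustration.

```python
import math

def smallest_K(theta):
    """Constant K of (3): K^{-1} = sin^2(theta)."""
    return 1.0 / math.sin(theta) ** 2

def deviation_ratio(a, b, theta):
    """Ratio |S' - S|^2 / |S'' - S'|^2 for S = 0,
    S' = a (1, 0) on L' and S'' = b (cos theta, sin theta) on L''."""
    sx, sy = a, 0.0
    tx, ty = b * math.cos(theta), b * math.sin(theta)
    gap = (tx - sx) ** 2 + (ty - sy) ** 2
    return (sx * sx + sy * sy) / gap

theta = math.radians(30.0)
K = smallest_K(theta)
# Scan S'' along L'' for fixed S'; the ratio should never exceed K.
worst = max(deviation_ratio(1.0, b / 100.0, theta) for b in range(-300, 301))
print(K, worst)
```

In this planar model the bound is attained when S" is the orthogonal projection of S' on L", that is, for b = a cos(theta); as theta tends to zero the constant K grows without bound, in agreement with the remark on intersecting subspaces.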
Once K is determined, the quantity (S" - S') · (S" - S') may be made small by the usual Ritz procedures, and thus S' and S" may be made to approximate S in our norm. The constant K must, of course, be determined for the given problem. We restrict the calculation of K to an example. A more general discussion will be published elsewhere.
Let A be a simply connected domain in the xy-plane with smooth boundary B. Consider the problem
$$\Delta u + k^2 u + Q = 0 \quad (\text{in } A), \qquad u = f \quad (\text{on } B), \tag{5}$$
where Q and f are given functions. Let the eigenvalues of the corresponding problem
$$\Delta u + \lambda u = 0 \quad (\text{in } A), \qquad u = 0 \quad (\text{on } B), \tag{6}$$
be denoted by $\lambda_1, \lambda_2, \ldots$. We suppose $k^2$ to be different from the $\lambda_j$. Following Synge, we consider the space of function triplets $S \leftrightarrow (p_1, p_2, u)$, but we introduce the positive-definite norm
$$S \cdot S = \int (p_1^2 + p_2^2 + c^2 u^2)\, dA, \tag{7}$$
where $c^2$ is a positive constant. The spaces L' and L" are again defined by
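$$S' \leftrightarrow (p_1', p_2', u') \in L' \ \text{ as } \ p_1' = u'_{,1},\ p_2' = u'_{,2} \ (\text{in } A); \quad u' = f \ (\text{on } B),$$
$$S'' \leftrightarrow (p_1'', p_2'', u'') \in L'' \ \text{ as } \ p''_{1,1} + p''_{2,2} + k^2 u'' + Q = 0 \ (\text{in } A). \tag{8}$$
Let S' and S" be given vectors in L' and L", respectively, and let
$$S \leftrightarrow (u_{,1}, u_{,2}, u), \tag{9}$$
where u is the solution of (5). Then
$$S'' - S' = (S'' - S) - (S' - S). \tag{10}$$
Letting
$$S'' - S' \leftrightarrow (p_1, p_2, \mu), \tag{11}$$
that is, $p_i = p_i'' - u'_{,i}$ and $\mu = u'' - u'$, and
$$v = u' - u, \tag{12}$$
we have
$$S' - S \leftrightarrow (v_{,1}, v_{,2}, v) \tag{13}$$
and
$$-\Delta v - k^2 v = p_{1,1} + p_{2,2} + k^2 \mu \quad (\text{in } A), \qquad v = 0 \quad (\text{on } B). \tag{14}$$
Thus,
$$v = \int G\,(p_{1,1} + p_{2,2} + k^2 \mu)\, dA, \tag{15}$$
where G is the Green's function associated with the problem (5). Furthermore
$$(S' - S) \cdot (S' - S) = \int (v_{,1}^2 + v_{,2}^2 + c^2 v^2)\, dA \tag{16}$$
and
$$(S'' - S') \cdot (S'' - S') = \int (p_1^2 + p_2^2 + c^2 \mu^2)\, dA. \tag{17}$$
A simple computation involving the expansion of Green's function in the eigenfunctions of (6) shows that
$$\int (v_{,1}^2 + v_{,2}^2 + c^2 v^2)\, dA \le \max_i \left[ 2\lambda_i (\lambda_i + c^2)(\lambda_i - k^2)^{-2} \right] \int (p_1^2 + p_2^2)\, dA + \max_i \left[ 2k^4 c^{-2} (\lambda_i + c^2)(\lambda_i - k^2)^{-2} \right] \int c^2 \mu^2\, dA. \tag{18}$$
Thus, we have obtained the inequality (2) with
$$K = \sup \left\{ \max_i \left[ 2\lambda_i (\lambda_i + c^2)(\lambda_i - k^2)^{-2} \right],\ \max_i \left[ 2k^4 c^{-2} (\lambda_i + c^2)(\lambda_i - k^2)^{-2} \right] \right\}. \tag{19}$$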
It is easy to show that the maximum will occur when $\lambda_i$ is either the eigenvalue above $k^2$ nearest to $k^2$ or the eigenvalue below $k^2$ nearest to $k^2$. Furthermore, if a lower bound is used for the former and an upper bound for the latter, a larger value of K is obtained, so that the inequality (2) still holds. Thus, it is only necessary to know the distance of $k^2$ from the spectrum of (6).
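The evaluation of the constant K of (19) may be sketched numerically. The example below is an illustration only and rests on assumptions not made in the text: the domain A is taken to be the unit square, whose eigenvalues for (6) are pi^2 (m^2 + n^2), and the values k^2 = 30 and c^2 = 1 are arbitrary. The function name `bound_constant` is hypothetical.

```python
import math

def bound_constant(eigenvalues, k2, c2):
    """Evaluate K of (19) from a list of eigenvalues of problem (6)."""
    first = max(2 * lam * (lam + c2) / (lam - k2) ** 2 for lam in eigenvalues)
    second = max(2 * k2 ** 2 * (lam + c2) / (c2 * (lam - k2) ** 2)
                 for lam in eigenvalues)
    return max(first, second)

# Assumed test case: unit square, eigenvalues pi^2 (m^2 + n^2), m, n >= 1.
square = sorted(math.pi ** 2 * (m * m + n * n)
                for m in range(1, 8) for n in range(1, 8))
k2, c2 = 30.0, 1.0

# Nearest eigenvalues below and above k^2.
below = max(lam for lam in square if lam < k2)
above = min(lam for lam in square if lam > k2)

K_all = bound_constant(square, k2, c2)
K_nearest = bound_constant([below, above], k2, c2)
print(K_all, K_nearest)
```

In this assumed case the two values coincide, which is the statement that only the eigenvalues adjacent to k^2 enter the constant.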
We then have the inequalities (2) and (4), so that S is approximated by both S' and S" in the norm (7). If the vector S" is taken with $p_1'' = u''_{,1}$, $p_2'' = u''_{,2}$, then u" - u satisfies a homogeneous differential equation of second order. Therefore the inequality (4), unlike (2), leads to arbitrarily close bounds for the value of u at a fixed interior point.
INSTITUTE FOR FLUID DYNAMICS AND APPLIED MATHEMATICS, UNIVERSITY OF MARYLAND, COLLEGE PARK, MD.
APPLICATIONS OF VARIATIONAL METHODS IN THE THEORY OF CONFORMAL MAPPING BY
M. M. SCHIFFER
1. Introduction. The theory of conformal mapping has attracted the attention of many analysts for various reasons. It is an essential tool in the general theory of analytic functions of a complex variable. It is of great use in boundary-value problems of two-dimensional potential theory and plays, therefore, a role in electrostatics, elasticity, and fluid dynamics. Finally, it may be considered as an interesting branch of functional analysis in which we study the various quantities involved in their dependence on curves and domains which are geometrically given. The problems in conformal mapping may be classed into three major groups. We have to deal with existence proofs for specified canonical mappings; we need constructive procedures to carry these mappings out; and finally, we wish to estimate the various quantities arising in conformal mapping by means of more easily accessible quantities which depend upon the geometry of the curves or domains considered. In all three types of problems, methods of the calculus of variations have been successfully applied. In so far as the problem of conformal mapping may be reduced to a particular boundary-value problem of potential theory, we may consider the Dirichlet principle as the oldest variational approach to
the theory. In the Dirichlet principle we characterize the solution of a boundary-value problem by a minimum property within the class of all differentiable functions with a finite Dirichlet integral and with the prescribed boundary values. The characterization is easy, since the class of competing functions is so large that it is easy to vary within the class and to find admissible neighbors. On the other hand, since the class is so wide, it is very difficult
to prove that there is an extremal function which would satisfy the characterization. This difficulty invalidated many of the intuitively obtained results of Riemann and Lord Kelvin; it necessitates careful analysis and elaborate limit procedures or Hilbert space arguments.
On the other hand, it is well known that the class of univalent analytic functions in a fixed domain of the complex plane forms a normal family; that means that from each sequence of functions of the family a convergent subsequence can be selected [13]. This subsequence converges uniformly in each closed subdomain and has as its limit either a constant or a univalent function. By restricting the class of univalent functions by means of some normalization, one can exclude the possibility of a limit function which is constant, and one obtains in this manner a compact class of univalent functions. In such a class every extremum problem will determine at least one function of the class for which the extreme value is actually achieved. But now the class is already
so narrow that it is much harder to characterize the extremum function by comparison with neighbor functions of the class. In other words, by restricting ourselves to univalent functions, we simplify very much the existence problems but increase the difficulties of the technique of variations. We are led to a new type of variational problem in which the univalency of the admissible function is the hardest side condition. It will be seen in the following pages that a variational procedure can be established which operates within the class of univalent functions.
There exists a close relationship between the functions mapping a domain upon some canonical domain like a circular region and the Green's function of potential theory. We shall show how a variational formula for the Green's function of a domain can be established and how the theory of univalent functions can also be approached from this point of view. Since the Green's function is a functional of the domain considered, the functional-analytic aspect of the theory is most clearly exhibited by this method.
2. The group property of univalent functions. The possibility of varying univalent functions is based on the following obvious remark: If w = f(z) is univalent in a domain $D_z$ in the z-plane and maps $D_z$ onto the domain $D_w$ in the w-plane; if, moreover, W = F(w) is univalent in $D_w$, then W = F[f(z)] is univalent in $D_z$. We shall refer to this fact as the group property of univalent functions.
Our next problem is to provide functions F(w) which are univalent in the image domain $D_w$ and very near to the identity function $F_0(w) = w$. Clearly, F[f(z)] will then be a neighbor function of f(z) within the class of univalent functions. We assume that $D_z$ has as complement in the z-plane only proper continua; then the complement $\Delta_w$ of $D_w$ in the w-plane will likewise be a set of proper continua. We choose a point $w_0 \in \Delta_w$ and a finite continuum $\Gamma(w_0) \subset \Delta_w$ which contains it. The complement of $\Gamma(w_0)$ in the w-plane can be mapped onto the domain $|W| > \rho$ by a univalent function which has near infinity the series development
$$W = F(w) = w - w_0 + a_0 + \frac{a_1}{w - w_0} + \frac{a_2}{(w - w_0)^2} + \cdots. \tag{1}$$
The number $\rho$ is called the exterior radius of $\Gamma(w_0)$; it is a monotonic function of the set $\Gamma$ and can be made arbitrarily small by shrinking $\Gamma$ down to $w_0$.
The inverse function $w = F^{-1}(W)$ is univalent in the circular domain $|W| > \rho$, and its coefficients can be easily estimated by means of the classical area theorem. Working these estimates back to the original coefficients $a_\nu$, we can show that
$$|a_\nu| \le (4\rho)^{\nu+1}. \tag{2}$$
Then putting $a_\nu = b_\nu \rho^{\nu+1}$, we may assert that the function
$$W = w + \rho b_0 + \frac{b_1 \rho^2}{w - w_0} + O(\rho^3) \tag{3}$$
represents a small variation of the identity function and yields, by the group property of the univalent functions, a variation for the univalent function f(z) in $D_z$. We may take any function $\Phi(W)$ which is univalent for $|W| > \rho$ and construct the new function $\Phi[F(w)]$, which is also univalent in the complement of $\Gamma(w_0)$ and, a fortiori, in $D_w$. Thus, we have for given $w_0$ and $\Gamma(w_0)$ a large class of functions which are univalent in $D_w$ and have a development (3) in powers of $w - w_0$. This development converges in the exterior of a circle around $w_0$ which encloses $\Gamma(w_0)$ and, for $\rho$ small enough, converges everywhere in $D_w$ except for a very small neighborhood of $w_0$. Thus, we can prove [18]:
THEOREM I. If f(z) is univalent in $D_z$ and maps this domain onto $D_w$ in the w-plane and if $\Delta_w$ is the complement of $D_w$, then for each point $w_0 \in \Delta_w$ there exists an infinity of univalent functions in $D_z$ with arbitrarily small $\rho$ of the form
$$f^*(z) = f(z) + \frac{b_1 \rho^2}{f(z) - w_0} + O(\rho^3), \qquad (|b_1| \le 16), \tag{4}$$
where the estimate $O(\rho^3)$ is uniform in each closed subdomain of $D_z$.
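The development (1) can be made concrete in the one case where F is elementary. The sketch below assumes that the continuum is a straight segment of length 4 rho centered at w_0 and parallel to the real axis; its complement is then mapped onto |W| > rho by the inverse of the Joukowski function, and the development (1) begins with (w - w_0) - rho^2/(w - w_0), so that b_1 = -1 in this assumed case. The function names are hypothetical.

```python
import cmath

def exterior_map_of_segment(w, w0, rho):
    """Map the complement of the segment [w0 - 2 rho, w0 + 2 rho]
    onto |W| > rho (inverse of u = W + rho^2 / W, with u = w - w0)."""
    u = w - w0
    return 0.5 * (u + u * cmath.sqrt(1 - 4 * rho * rho / (u * u)))

def two_term_variation(w, w0, rho, b1=-1.0):
    """Leading terms of the development (1) with a0 = 0:
    (w - w0) + b1 rho^2 / (w - w0)."""
    u = w - w0
    return u + b1 * rho * rho / u

w0, rho = 0.2 + 0.1j, 0.01
w = 1.0 + 0.5j
# The difference between the exact map and its two leading terms is of
# higher order in rho.
remainder = abs(exterior_map_of_segment(w, w0, rho) - two_term_variation(w, w0, rho))
# Points close to, but not on, the segment are mapped outside |W| = rho.
near_points = [w0 + 0.005j, w0 + 0.03, w0 - 0.02 + 0.001j]
moduli = [abs(exterior_map_of_segment(p, w0, rho)) for p in near_points]
print(remainder, moduli)
```

For a segment the remainder is in fact of order rho^4, smaller than the general estimate O(rho^3) of Theorem I; rotating the segment by an angle changes b_1 by a unimodular factor, which gives a first indication of the freedom in the choice of b_1.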
In general, we have a very great freedom in the choice of the coefficient $b_1$ of the varied function (4). Indeed, the domain $D_w$ must be of a very special form in order that we may not prescribe the sign of $b_1$ arbitrarily. This is shown by
THEOREM II. Let $\Gamma$ be a continuum in $\Delta_w$ and s(w) be analytic on $\Gamma$. If we have, for every point $w_0 \in \Gamma$ and every function (4), the inequality
$$\operatorname{Re}\{b_1 s(w_0)\} \ge O(\rho), \tag{5}$$
then $\Gamma$ must be an analytic arc w(t) in the w-plane which depends on a real parameter t. We can choose this parameter such that $\Gamma$ satisfies the differential equation
$$\left(\frac{dw}{dt}\right)^2 s(w) + 1 = 0. \tag{6}$$
The proof of Theorem II is rather difficult and will not be given here. But we wish to point out the importance of the two theorems for the general calculus of variations within the class of univalent functions. The first theorem provides a formalism for varying a function, while the second allows us to draw conclusions from the extremum property and thus to characterize the extremum function.
Let us illustrate the method by the following application: Let $D_z$ contain the point at infinity, and denote by $\mathfrak{F}$ the family of all functions which are univalent in $D_z$ and which have at infinity the development
$$f(z) = z + c_0 + c_1 z^{-1} + \cdots. \tag{7}$$
If $z_0$ is a given fixed point in $D_z$, let us ask for the maximum and minimum value of $|f'(z_0)|$ within the family $\mathfrak{F}$.
We know from the compactness argument that an extremum function must exist, and we let f(z) satisfy, say, the conditions of the maximum problem. If $D_w$ is the image domain and $\Delta_w$ its complement in the w-plane, we can obtain by Theorem I an infinity of competing functions (4), all in $\mathfrak{F}$. But the maximum property of f(z) implies the inequality
$$|f^{*\prime}(z_0)| = |f'(z_0)| \left| 1 - \frac{b_1 \rho^2}{[f(z_0) - w_0]^2} + O(\rho^3) \right| \le |f'(z_0)|. \tag{8}$$
An easy transformation leads from (8) to the inequality
$$\operatorname{Re}\{b_1 [f(z_0) - w_0]^{-2}\} \ge O(\rho), \tag{9}$$
which allows the immediate application of Theorem II. We recognize that $\Delta_w$ consists of analytic arcs each of which satisfies the differential equation
$$\frac{w'(t)^2}{[w(t) - a]^2} + 1 = 0, \qquad a = f(z_0). \tag{10}$$
This equation can be readily integrated and yields
$$w(t) = a + k_\nu e^{it}. \tag{11}$$
The constant of integration $k_\nu$ depends on the particular component of $\Delta_w$ considered.
Thus, we have proved: There exists a conformal mapping of each domain $D_z$ by a function of $\mathfrak{F}$ which maps $D_z$ onto the complex plane slit along circular arcs around a common center a. We may prescribe arbitrarily the point $z_0$ in $D_z$ which shall go into a.
Similarly, we would have obtained the following result by considering the analogous minimum problem: There exists a conformal mapping of $D_z$ by a function of the family $\mathfrak{F}$ which maps $D_z$ onto the complex plane slit along linear segments which all point to a common point a. We may prescribe arbitrarily the point $z_0$ in $D_z$ which shall go into a.
We have thus obtained existence theorems for canonical slit mappings by solving appropriate extremum problems. It is clear that in the same way a great variety of existence theorems and canonical domains may be obtained. But, at the same time, our method yields also inequalities and estimates for conformal mappings. Indeed, let $D_z$ be the circular domain $|z| > 1$. $\mathfrak{F}$ is then the much-investigated class of univalent functions in the exterior of the unit circle which are represented in the entire domain by the power series (7). If we ask for the minimum value of $|f'(z_0)|$ for an arbitrary $|z_0| > 1$, we have to determine a function in $\mathfrak{F}$ which maps $|z| > 1$ onto the w-plane slit along a linear segment pointing in the direction of $f(z_0)$. Since f(z) is obviously only determined up to an additive constant, we may assume $f(z_0) = 0$. All mappings of $|z| = 1$ into linear segments which are of the family $\mathfrak{F}$ have the form
$$f(z) = z + m + \frac{e^{2i\chi}}{z}. \tag{12}$$
Under these mappings the unit circle goes into a segment of length 4, centered at the point m, and having the direction $e^{i\chi}$. If we wish that the segment should be radial, clearly $m = |m| e^{i\chi}$. The sign itself is now determined by the requirement $f(z_0) = 0$, which yields
$$-e^{i\chi} = \operatorname{sgn} z_0. \tag{13}$$
Hence finally
$$f'(z_0) = 1 - \frac{1}{|z_0|^2}, \tag{14}$$
and since this is the minimum value, we derive the general estimate
$$|f'(z)| \ge 1 - \frac{1}{|z|^2}, \tag{15}$$
valid for all functions of $\mathfrak{F}$. It is obvious that many estimates and inequalities can be obtained in this manner. The procedure is as easily applied in the case of arbitrary multiply connected domains as for the case of a circle, to which most alternative methods are particularly suited.
The limitation of the method of variation described lies in the fact that it is not very well adapted to take care of many side conditions. If we are dealing, for example, with problems of interpolation in conformal mapping, we have to consider subclasses of univalent functions which satisfy at N given points $z_\nu$ ($\nu = 1, 2, \ldots, N$) of the domain $D_z$ the conditions $f(z_\nu) = w_\nu$, with N given numbers $w_\nu$. If we want to vary in this subclass, the formalism provided by Theorem I becomes hard to apply. A second disadvantage lies in the too restrictive limitation to analytic single-valued functions in the domain. Some of the most important functions of potential theory, like the Green's function, cannot be dealt with adequately in this way.
3. The method of interior variation. If we want to deal with the general problem of two-dimensional potential theory, it is advisable to focus our attention on the Green's function $g(z,\zeta)$. This function is harmonic and symmetric in both variables in the domain D. Only if $z = \zeta$ does it become infinite, but
$$h(z,\zeta) = g(z,\zeta) + \log|z - \zeta| \tag{16}$$
is regular harmonic if z is near $\zeta$. If z or $\zeta$ approaches the boundary C of D, the Green's function tends to zero.
The importance of $g(z,\zeta)$ lies in the fact that all important quantities connected with the potential theory of the domain D are expressible in terms of g. For example, if D is simply connected and if $\phi(z)$ maps D upon the interior of the unit circle $|w| < 1$, we have
$$g(z,\zeta) = \log\left| \frac{1 - \overline{\phi(\zeta)}\,\phi(z)}{\phi(z) - \phi(\zeta)} \right|. \tag{17}$$
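Formula (17) is easily tested on a domain for which both the mapping function and the Green's function are known in closed form. The sketch below assumes that D is the upper half-plane, mapped onto the unit circle by the Cayley transformation (z - i)/(z + i); the classical Green's function of the half-plane, log |(z - conj(zeta))/(z - zeta)|, serves as comparison. The function names are hypothetical.

```python
import math

def cayley(z):
    """Maps the upper half-plane onto the unit disk |w| < 1."""
    return (z - 1j) / (z + 1j)

def green_from_mapping(z, zeta, phi=cayley):
    """Green's function obtained from the mapping function by formula (17)."""
    a, b = phi(z), phi(zeta)
    return math.log(abs((1 - b.conjugate() * a) / (a - b)))

def green_half_plane(z, zeta):
    """Classical closed form log |(z - conj(zeta)) / (z - zeta)|."""
    return math.log(abs((z - zeta.conjugate()) / (z - zeta)))

z, zeta = 0.3 + 1.2j, -0.5 + 0.7j
print(green_from_mapping(z, zeta), green_half_plane(z, zeta))
# Symmetry in the two arguments, and decay toward the boundary Im z = 0.
print(green_from_mapping(zeta, z), green_from_mapping(2.0 + 1e-9j, zeta))
```

The two expressions agree to rounding error, the function is symmetric in z and zeta, and it tends to zero as z approaches the real axis, in accordance with the properties listed above.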
Thus, the knowledge of the Green's function allows a determination of the mapping function onto the circle. Similarly, in the case that D is multiply connected, the functions mapping D upon the important canonical domains like parallel-slit domain, circular and radial-slit domain, etc., are expressible in terms of $g(z,\zeta)$.
We want to derive now a formula which describes the change of the Green's function with a change of the domain D. This formula contains then, in principle, also a method for variation of univalent functions. For we may characterize the univalent functions by the domain D upon which they map and express them in terms of the Green's function of the domain. Instead of varying the functions by an arithmetical expression (4), we can describe their variation by the deformation of their image domain and express the change of the function by means of the formula for the variation of the Green's function with the domain. In fact, both theorems of the preceding section can be deduced from the following theory of the Green's function.
Let us start at first with a domain D bounded by n analytic curves $C_\nu$, $C = \sum_{\nu=1}^{n} C_\nu$. Let $\phi(z)$ be an analytic function defined on and near the boundary C of D. We assume that there exists a curve system $\Gamma$ in D which is homotopic to C such that in the ring system bounded by C and $\Gamma$ the function $\phi(z)$ is analytic. Consider now the mapping
$$z^* = z + \varepsilon\phi(z). \tag{18}$$
It is easily seen that, if $|\varepsilon|$ is small enough, the curves $C_\nu$ will be mapped in a one-to-one manner upon n analytic curves $C_\nu^*$, which will in turn determine a domain $D^*$ in the z-plane. Let $\Delta$ denote the subdomain of D bounded by the curve system $\Gamma$; if $|\varepsilon|$ is sufficiently small, $\Delta$ will also be a subdomain of $D^*$, which will now be assumed. We denote by $g(z,\zeta)$ and $g^*(z,\zeta)$ the Green's functions with respect to D and $D^*$. Let $p(z,\zeta)$ and $p^*(z,\zeta)$ be their analytic completions, i.e., analytic functions of z whose real parts are these Green's functions. We consider the harmonic functions
$$\Phi(z) = g(z,\zeta), \qquad \Psi(z) = g^*(z + \varepsilon\phi(z), \eta), \tag{19}$$
which are regular in the domain $D - \Delta$ if $\zeta$ and $\eta$ are chosen in $\Delta$ and if $|\varepsilon|$ is small enough. Observe that both functions vanish if z tends to C, since $g^*$ vanishes on $C^*$ and the latter curve is obtained from C by the mapping (18). By Green's identity, we have
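$$\frac{1}{2\pi} \oint_{C+\Gamma} \left( \Phi \frac{\partial \Psi}{\partial n} - \Psi \frac{\partial \Phi}{\partial n} \right) ds = \frac{1}{2\pi} \oint_{\Gamma} \left( \Phi \frac{\partial \Psi}{\partial n} - \Psi \frac{\partial \Phi}{\partial n} \right) ds = 0. \tag{20}$$
On the other hand, the functions $g(z,\zeta)$ and $g^*(z,\eta)$ are harmonic in $\Delta$ except for their poles at $\zeta$ and $\eta$. Hence, we have
$$\frac{1}{2\pi} \oint_{\Gamma} \left[ g(z,\zeta) \frac{\partial g^*(z,\eta)}{\partial n} - g^*(z,\eta) \frac{\partial g(z,\zeta)}{\partial n} \right] ds = g(\zeta,\eta) - g^*(\zeta,\eta). \tag{21}$$
Subtracting (21) from (20), we obtain
$$\frac{1}{2\pi} \oint_{\Gamma} \left\{ g(z,\zeta) \frac{\partial}{\partial n} \left[ g^*(z + \varepsilon\phi(z), \eta) - g^*(z,\eta) \right] - \left[ g^*(z + \varepsilon\phi(z), \eta) - g^*(z,\eta) \right] \frac{\partial g(z,\zeta)}{\partial n} \right\} ds = g^*(\zeta,\eta) - g(\zeta,\eta). \tag{22}$$
Using the complex functions p and $p^*$, we can bring (22) into the simpler form
$$g^*(\zeta,\eta) - g(\zeta,\eta) = \operatorname{Re}\left\{ \frac{1}{2\pi i} \oint_{\Gamma} \left[ p^*(z + \varepsilon\phi(z), \eta) - p^*(z,\eta) \right] p'(z,\zeta)\, dz \right\}, \tag{23}$$
where $p'(z,\zeta) = dp(z,\zeta)/dz$.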
The identity (23) is of fundamental importance in the theory of variation for the Green's function. It has been derived under the assumption that the boundary C of the domain D is an analytic curve system. But it connects only expressions of the Green's functions taken at points of $\Gamma$ and inside $\Delta$. Hence, if we approximate a given domain D by a sequence of analytically bounded domains, we may use formula (23) at first for each domain of the sequence; because of the uniform convergence of the Green's functions and their derivatives in each closed subdomain of D, we can then conclude the validity of (23) for the Green's functions of D and its varied domain $D^*$. Thus, (23) is now established for the most general plane domain D.
We observe that (23) is an integral equation between $g$ and $g^*$ and can be used to compute $g^*(\zeta,\eta)$ as a series of iterated integrals involving $g$. Indeed, a rapidly converging computational procedure can be established for the calculation of the one Green's function in terms of the other. We do not pursue this subject further since we are interested in the case of very small $\varepsilon$. We obtain from (23) by series development in $\varepsilon$
$$g^*(\zeta,\eta) - g(\zeta,\eta) = \operatorname{Re}\left\{ \frac{1}{2\pi i} \oint_{\Gamma} p^{*\prime}(z,\eta)\, p'(z,\zeta)\, \varepsilon\phi(z)\, dz \right\} + O(\varepsilon^2). \tag{24}$$
The error term $O(\varepsilon^2)$ is a harmonic function in $\zeta$ and $\eta$ which can be estimated uniformly in $\Delta$; from this we conclude easily that $p^{*\prime}(z,\eta) = p'(z,\eta) + O(\varepsilon)$, and hence we replace (24) by the more useful result
$$g^*(\zeta,\eta) - g(\zeta,\eta) = \operatorname{Re}\left\{ \frac{1}{2\pi i} \oint_{\Gamma} p'(z,\eta)\, p'(z,\zeta)\, \varepsilon\phi(z)\, dz \right\} + O(\varepsilon^2). \tag{25}$$
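If the boundary C of D consists of smooth curves $C_\nu$, we may replace in (25) the path of integration $\Gamma$ by C in view of the fact that the integrand is analytic in the ring domain $D - \Delta$. But on C we clearly have
$$p'(z,\zeta) = -i\, \frac{\partial g(z,\zeta)}{\partial n}\, \frac{1}{z'}, \qquad \left( z' = \frac{dz}{ds} \right). \tag{26}$$
Hence (25) assumes the form
$$g^*(\zeta,\eta) - g(\zeta,\eta) = -\frac{1}{2\pi} \oint_{C} \frac{\partial g(z,\zeta)}{\partial n}\, \frac{\partial g(z,\eta)}{\partial n}\, \operatorname{Re}\left\{ \frac{\varepsilon\phi(z)}{i z'} \right\} ds + O(\varepsilon^2). \tag{27}$$
Observe that
$$\delta n = \operatorname{Re}\left\{ \frac{\varepsilon\phi(z)}{i z'} \right\} \tag{28}$$
describes the shift in the direction of the inner normal of each boundary point $z \in C$ under the variation (18). Thus, we arrive at the classical variational formula of Hadamard [11]:
$$\delta g(\zeta,\eta) = -\frac{1}{2\pi} \oint_{C} \frac{\partial g(z,\zeta)}{\partial n}\, \frac{\partial g(z,\eta)}{\partial n}\, \delta n\, ds. \tag{29}$$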
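Hadamard's formula (29) can be checked numerically in the one case where every quantity is elementary. The sketch assumes that D is the disk |z| < R and that the boundary is shifted by a constant amount along the inner normal, so that the varied domain is the disk of radius R minus that amount; the normal derivative of the Green's function on the circle is then the Poisson kernel. The function names and the sample points are assumptions made for this illustration.

```python
import cmath
import math

def green_disk(z, zeta, R=1.0):
    """Green's function of the disk |z| < R with pole at zeta."""
    return math.log(abs((R * R - zeta.conjugate() * z) / (R * (z - zeta))))

def inner_normal_derivative(z, zeta, R=1.0):
    """dg/dn along the inner normal at a boundary point |z| = R."""
    return (R * R - abs(zeta) ** 2) / (R * abs(z - zeta) ** 2)

def hadamard_first_variation(zeta, eta, delta_n, R=1.0, samples=2000):
    """Right-hand side of (29) for a constant inner-normal shift delta_n,
    evaluated with the trapezoidal rule on the circle |z| = R."""
    total = 0.0
    for j in range(samples):
        z = R * cmath.exp(2j * math.pi * j / samples)
        total += (inner_normal_derivative(z, zeta, R)
                  * inner_normal_derivative(z, eta, R))
    ds = 2 * math.pi * R / samples
    return -delta_n * total * ds / (2 * math.pi)

zeta, eta, shift = 0.3 + 0.1j, -0.2 + 0.4j, 1e-3
exact_change = green_disk(eta, zeta, 1.0 - shift) - green_disk(eta, zeta, 1.0)
predicted = hadamard_first_variation(zeta, eta, shift)
print(exact_change, predicted)
```

The two numbers differ only by a term of second order in the shift; both are negative, since a shift along the inner normal shrinks the domain and hence decreases the Green's function.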
The importance of this formula in numerous applications is well known. Hadamard's formula was first applied to problems of conformal mapping by Julia [12]. It allows us to make statements on the monotonic change with the domain D of numerous functionals which are connected with the Green's function and leads in this way to comparison theorems and estimates. But it is made precise and admits of an estimation for the error term by virtue of the integral equation (23), from which it has been derived. The limitation of (29) to the case of smooth boundaries is a serious one and restricts the usefulness of Hadamard's formula considerably.
Suppose, on the other hand, that $\phi(z)$ is meromorphic in $\Delta$; then (25) can be calculated explicitly by means of the residue theorem. For the sake of simplicity, we consider the case
$$\phi(z) = \frac{1}{z - z_0}, \qquad (z_0 \in \Delta). \tag{30}$$
Then (25) reduces to [20]
$$g^*(\zeta,\eta) - g(\zeta,\eta) = \operatorname{Re}\left\{ \varepsilon \left[ p'(z_0,\zeta)\, p'(z_0,\eta) - \frac{p'(\zeta,\eta)}{\zeta - z_0} - \frac{p'(\eta,\zeta)}{\eta - z_0} \right] \right\} + O(\varepsilon^2). \tag{31}$$
Formula (31) gives the first variation of the Green's function for the case of the very special deformation (30) of the boundary. But it is valid for the most general domain D and expresses the variation by the values of the Green's function at an interior point of D. Equation (31) is called the interior variational formula for the Green's function. It is most valuable if one has to characterize a domain by its extremum property. In this case, (31) leads to simple differential equations for the Green's function of the domain considered. The case
$$\phi(z) = \sum_{\nu=1}^{N} \frac{a_\nu}{z - z_\nu} \tag{30'}$$
can be reduced to the preceding one, since the first variations superimpose linearly. On the other hand, formula (30') enables us to introduce so large a number of parameters as to enforce rather difficult side conditions for the variation.
We have given an explicit formula for the first variation of the Green's function under the interior variations (18) and (30). It is clear that we can calculate $g^*(\zeta,\eta)$ as a power series in $\varepsilon$ with coefficients depending on $g(\zeta,\eta)$ and that the variation of every order can be obtained if necessary. The knowledge of the fact that the second variation of a functional is always of a fixed sign for a given class of deformations leads often to useful convexity theorems, which have theoretical as well as numerical significance [5].
4. Applications of interior variation. Let D be a domain containing the point at infinity and consider its Green's function, $g(z) = g(z, \infty)$. We have near infinity
$$g(z) = \log|z| - \gamma + O\!\left(\frac{1}{|z|}\right). \tag{32}$$
The quantity $\gamma$ is called the capacity constant of the domain D (or of its boundary C) and is, indeed, closely related to the electrostatic capacity of cylindric conductors with the cross section C.
In the case that D is simply connected, we can also find a geometric interpretation for $\gamma$. We map D upon the circular domain $|w| > 1$ by means of the univalent function
$$w = \phi(z) = \frac{z}{r} + a_0 + \frac{a_1}{z} + \cdots, \qquad (r > 0). \tag{33}$$
The constant r is called the mapping radius of the domain D or the exterior radius of its boundary C. In view of (17) we have
$$g(z) = \log|\phi(z)| = \log|z| + \log\frac{1}{r} + O\!\left(\frac{1}{|z|}\right), \tag{34}$$
which shows that
$$r = e^{\gamma}. \tag{35}$$
It is easily seen that, if $D_z$ is mapped upon a domain $D_w$ by means of a univalent function (7), the capacity constant $\gamma$ is unchanged.
We wish to apply our variational formulas in order to obtain various results concerning $\gamma$. In this way we will illustrate the potential theoretical significance and possibilities of the various formulas. We obtain from the definition (32) of $\gamma$ and from (29) the Hadamard-type variational formula
$$\delta\gamma = \frac{1}{2\pi} \oint_{C} \left( \frac{\partial g(z)}{\partial n} \right)^2 \delta n\, ds, \tag{36}$$
which was given first by Poincaré [14]. On the other hand, (31) specializes to
$$\gamma^* = \gamma - \operatorname{Re}\{\varepsilon\, p'(z_0)^2\} + O(\varepsilon^2) \tag{37}$$
for the case of the variation
$$z^* = z + \frac{\varepsilon}{z - z_0}, \qquad (z_0 \in D). \tag{38}$$
We consider, for example, the following problem. Let $\mathfrak{F}$ be the family of all functions f(z) of normalization (7) at infinity and univalent outside of the unit circle. We decompose the circumference $|z| = 1$ into two arcs A and B of angle $\alpha$ and $\beta$, respectively. Consider the set of image points corresponding to A; they form in the w-plane a continuum $\mathfrak{A}$ which determines a domain with capacity constant $\gamma_a$. We ask for the minimum value of $\gamma_a$ within all mappings in $\mathfrak{F}$. The existence of a minimum function f(z) can be easily shown.
If F(w) is univalent outside of $\mathfrak{A}$ and normalized at infinity, then the group property of univalent functions implies that F[f(z)] lies also in $\mathfrak{F}$ and is also a minimum function, for the capacity constant $\gamma_a$ is unchanged under normalized mappings. We overcome the indeterminacy of the minimum function by asking in addition that the continuum $\mathfrak{A}$ be a circle in the w-plane. This can always be achieved by an admissible mapping in the w-plane. It is easily seen that the radius of the circle must be $r = e^{\gamma_a}$. We must characterize the extremum function by stating what happens to the image of B if A is mapped on the circle.
We know that B will be mapped onto a continuum $\mathfrak{B}$ in the w-plane. Let $w_0$ be a point of $\mathfrak{B}$. By Theorem I of Sec. 2 there exists an infinity of functions (4) which are univalent and normalized. These are competing functions to the extremum function f(z). But observe that the mapping
$$w^* = w + \frac{b_1 \rho^2}{w - w_0} + O(\rho^3) \tag{39}$$
will change the capacity constant $\gamma_a$ of the circle $|w| = e^{\gamma_a}$. Since
$$g(w) = \log|w| - \gamma_a, \qquad p(w) = \log w - \gamma_a, \tag{40}$$
we find that the varied image of A will have the capacity constant
$$\gamma_a^* = \gamma_a - \operatorname{Re}\{b_1 \rho^2 w_0^{-2}\} + O(\rho^3). \tag{41}$$
Because of the minimum property of f(z), we have necessarily $\gamma_a^* \ge \gamma_a$. Hence, for all admissible functions (4), we have
$$\operatorname{Re}\{b_1 w_0^{-2}\} \le O(\rho). \tag{42}$$
We can now apply Theorem II of Sec. 2 and conclude that the image of B is an analytic arc $\mathfrak{B}$ which satisfies the differential equation
$$w'^2 w^{-2} = 1. \tag{43}$$
Thus, $\mathfrak{B}$ is a ray $w = t e^{i\chi}$, where $e^{i\chi}$ is a fixed sign factor and t the real parameter. Hence, we have proved: The minimum value for $\gamma_a$ is attained when the arc A is mapped into a circle and the arc B into a radial slit.
Once this geometric characterization of the minimum function f(z) is obtained by variational methods, it is easy to express f(z) explicitly in elementary functions. In particular, the calculation shows that in the minimum case
$$r_a = e^{\gamma_a} = \sin^2\frac{\alpha}{4}. \tag{44}$$
Hence, we have proved: Every function of the family $\mathfrak{F}$ maps an arc of the unit circumference of opening $\alpha$ into a continuum whose mapping radius r satisfies the inequality
$$r \ge \sin^2\frac{\alpha}{4}. \tag{45}$$
Since $\alpha + \beta = 2\pi$ implies $\sin(\beta/4) = \cos(\alpha/4)$, we see that two complementary arcs A and B are mapped onto continua $\mathfrak{A}$ and $\mathfrak{B}$ whose mapping radii satisfy the inequality
$$r_a + r_b \ge 1. \tag{46}$$
Observe now that the unit circumference is mapped by a normalized mapping onto the continuum $\mathfrak{A} + \mathfrak{B}$ and that, by definition (33), its mapping radius is, therefore, exactly 1. It is easily seen that, if we start with a closed curve C and subdivide it into two arcs A and B, we have always
$$r_A \ge r_C \sin^2\frac{\alpha}{4}, \qquad r_B \ge r_C \sin^2\frac{\beta}{4}, \qquad r_A + r_B \ge r_C. \tag{47}$$
Here $\alpha$ and $\beta$ are the angles of the images of A and B if C is mapped upon a full circumference, and $r_A$, $r_B$, $r_C$ are the exterior radii of the continua. The quantities $r_C$ and $\alpha$ play an important role in the lift theory of an infinite cylindrical wing with cross section C which is subdivided by the stagnation points into subarcs A and B. The inequalities (47) are of significance in this connection.
We have shown in the preceding example how one can combine the method of interior variation with the method of boundary variation of Sec. 2. We obtained the complete answer to our minimum problem and a number of interesting inequalities.
Let us show next an application of (36). We prescribe a closed curve C and ask for another closed curve $C_1$ which shall contain C in its interior, shall enclose a prescribed area A, and shall have a minimum value for its capacity constant $\gamma$. While it is possible to show that an extremal curve $C_1$ must exist, it is by no means sure that it will be smooth enough to admit the variational formula (36). For the sake of analysis, let us assume that a smooth extremum curve $C_1$ exists and try to characterize it by means of (36).
We decompose $C_1$ into those arcs which are in touch with points of C and into "free" arcs which do not touch C and can be shifted a little without violating the side condition that $C_1$ contain C. Under a variation $\delta n$ of the free arcs we have obviously
$$\delta A = -\oint_{C_1} \delta n\, ds. \tag{48}$$
If we keep $\delta A = 0$, we must necessarily have $\delta\gamma \ge 0$, and the fundamental lemma of the calculus of variations leads to the consequence
$$\frac{\partial g}{\partial n} = \text{constant} \qquad (\text{on the free arcs}). \tag{49}$$
We can now give an interesting fluid-dynamical interpretation to the situation described by the above analysis. We may conceive g(z) as the stream function of an incompressible and irrotational fluid flow. It is generated by a vortex at infinity and has $C_1$ as closed streamline. Along the free arcs of $C_1$ the stream velocity is constant, and hence, by Bernoulli's law, the pressure is constant along these arcs. In other words, g(z) describes a circulation around the given obstacle C which leaves parts of the fluid around C at rest. The combined obstacle of C and the stagnant fluid forms the body $C_1$; the condition of equal pressure in the fluid at rest is guaranteed by the extremum condition (49).
We have not proved the existence of such a flow pattern around the curve C, since our reasoning by means of (36) was purely heuristic. But once we have got an insight into the nature of the extremum function, we may apply the method of interior variation in order to analyze the extremum function in an exact manner and to prove the existence of the flow pattern from compactness theorems in analytic function theory. This method was applied successfully by Garabedian and Spencer [8,15] in order to prove the existence of cavitation in two-dimensional fluid dynamics, and it can be readily extended to more complicated situations [4].
The role of Hadamard's variational formula (29) as a heuristic tool and of the interior variational formula (31) as a less intuitive but more precise and applicable tool is well illustrated by the above example. It is often easy to foresee the answer to an extremum problem by means of Hadamard's formula, and afterwards one can give an exact proof by means of interior variations.
We mention finally the application of the method of interior variations to the coefficient problem for univalent functions [6,7,17]. One obtains immediately differential equations for the extremum functions; since the coefficients in these differential equations depend in turn on the unknown extremum function, one is led to an interesting functional problem which has been solved in several cases.
The method of interior variations is flexible enough to admit variations for important subclasses of univalent functions. One can preserve under variation the property that the function has real coefficients in its Taylor development at the origin and the condition that the function be bounded by a given value. V. Singh has recently considered the class of univalent functions whose image domain contains a fixed given circle; it is possible to vary in such a way as to preserve this property of univalent functions.
It is also possible to generalize the variational procedure to the case of potential theory on Riemann surfaces and to apply it to the theory of p-valent functions [19,24].
5. The Fredholm eigenvalues. We want to discuss in this section a fundamental problem of two-dimensional potential theory. If we have a domain D bounded by a closed smooth curve C, we can apply the Poincaré-Fredholm theory and reduce the solution of every boundary-value problem to solving an inhomogeneous integral equation
$$\psi(z) = \varphi(z) - \frac{\lambda}{\pi}\int_C k(z,\zeta)\,\varphi(\zeta)\,ds_\zeta, \tag{50}$$
with
$$k(z,\zeta) = \frac{\partial}{\partial n_\zeta}\log\frac{1}{|z-\zeta|}, \qquad (z \in C,\ \zeta \in C). \tag{51}$$
The integral equation with the transposed kernel leads to the solution of Neumann-type boundary-value problems. The same integral equations serve also to solve corresponding boundary-value problems for the complementary domain $\tilde D$, the outside of C.
It can be shown that the lowest eigenvalue $\lambda$ of the kernel $k(z,\zeta)$ which belongs to a nonconstant eigenfunction is larger than 1. Hence, a convergent Neumann-Liouville series for the resolvent kernel of $k(z,\zeta)$ can be derived, and (50) can be solved by successive approximation. However, the convergence of this iterative procedure depends strongly upon the value of $\lambda$ and is better if $\lambda$ is larger. It seems, therefore, important to study the eigenvalue $\lambda$ as a functional of the curve C and to derive variational formulas for it. In this way, we shall obtain some results which are important if one wishes to apply the integral-equation technique to harmonic boundary-value problems.
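As a rough numerical illustration of how the lowest nontrivial eigenvalue depends on the curve, the following sketch applies the operator $\varphi \mapsto \frac{1}{\pi}\int_C k\,\varphi\,ds$ on an ellipse with semi-axes a > b. It rests on several assumptions that are not taken from the text: the normal is the outward one, the ellipse is parametrized as x = a cos t, y = b sin t, and the kernel is entered in a closed form obtained by direct computation for that parametrization. Under these assumptions cos t is an eigenfunction, and the estimate should reproduce the classical value (a + b)/(a - b) for the ellipse. The function names are hypothetical.

```python
import math

def apply_kernel_on_ellipse(a, b, phi, n=64):
    """Apply (1/pi) * integral of k(z, zeta) * phi(zeta) ds_zeta on the ellipse
    x = a cos t, y = b sin t, with the trapezoidal rule in the parameter t.
    Assumption: outward normal, kernel in closed form for this parametrization."""
    ts = [2.0 * math.pi * i / n for i in range(n)]
    h = 2.0 * math.pi / n
    out = []
    for t in ts:
        total = 0.0
        for s in ts:
            sigma = 0.5 * (s + t)
            # k(z, zeta) ds_zeta = -(ab/2) / (a^2 sin^2(sigma) + b^2 cos^2(sigma)) ds
            k_ds = -0.5 * a * b / (a * a * math.sin(sigma) ** 2
                                   + b * b * math.cos(sigma) ** 2)
            total += k_ds * phi(s) * h
        out.append(total / math.pi)
    return ts, out

def lowest_eigenvalue_estimate(a, b, n=64):
    """Estimate the lowest eigenvalue with a nonconstant eigenfunction.
    The operator maps cos t to mu * cos t; the eigenvalue is 1/|mu|."""
    ts, image = apply_kernel_on_ellipse(a, b, math.cos, n)
    mu = image[0] / math.cos(ts[0])  # evaluate the ratio at t = 0
    return 1.0 / abs(mu)

if __name__ == "__main__":
    for a, b in [(2.0, 1.0), (5.0, 4.0), (1.01, 1.0)]:
        print(a, b, lowest_eigenvalue_estimate(a, b), (a + b) / (a - b))
```

As the ellipse approaches a circle the estimate grows without bound, and the successive approximations, whose error shrinks roughly like a power of $1/\lambda$, converge correspondingly faster.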
M. M. SCHIFFER
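We start with the integral equation
$$\varphi_\nu(z) = \frac{\lambda_\nu}{\pi}\int_C k(z,\zeta)\,\varphi_\nu(\zeta)\,ds_\zeta, \qquad (z \in C), \tag{52}$$
which defines the eigenvalue $\lambda_\nu$ of $k(z,\zeta)$ and the eigenfunction $\varphi_\nu(z)$ on C. We introduce the harmonic function
$$h_\nu(z) = \frac{\lambda_\nu}{\pi}\int_C k(z,\zeta)\,\varphi_\nu(\zeta)\,ds_\zeta, \tag{53}$$
which is defined for $z \in D$. Because of the well-known discontinuity behavior of a double-layer potential, we have
$$\lim_{z\to z_1} h_\nu(z) = (1+\lambda_\nu)\,\varphi_\nu(z_1), \qquad (z_1 \in C). \tag{54}$$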
Thus, $h_\nu(z)$ is a harmonic function in D whose boundary values are proportional to the eigenfunction $\varphi_\nu(z)$, which is defined only on C. We next define the analytic function in D
$$v_\nu(z) = \frac{\partial}{\partial z}\,h_\nu(z). \tag{55}$$
The integral equation (52) can be expressed in terms of $v_\nu(z)$ in the elegant form
$$v_\nu(z) = \frac{\lambda_\nu}{2\pi i}\int_C (\zeta - z)^{-1}\,\overline{v_\nu(\zeta)}\;\overline{d\zeta}, \qquad (z \in D). \tag{56}$$
Observe that the differentiation in (55) weeds out the constant eigenfunction and that all eigenvalues of (56) are larger than 1. It is easily seen that we can assume all eigenfunctions to be orthonormalized according to the condition
$$\iint_D v_\nu\,\overline{v_\mu}\;d\tau = \delta_{\nu\mu}. \tag{57}$$
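Similarly, we may define in the complementary domain $\tilde D$ a harmonic function $\tilde h_\nu(z)$ with an analytic derivative $\tilde v_\nu(z)$. It is easily seen that
$$\lim_{z\to z_1} \tilde h_\nu(z) = (1-\lambda_\nu)\,\varphi_\nu(z_1), \qquad (z_1 \in C), \tag{54'}$$
and that
$$\tilde v_\nu(z) = \frac{\lambda_\nu}{2\pi i}\int_C (\zeta - z)^{-1}\,\overline{\tilde v_\nu(\zeta)}\;\overline{d\zeta}, \qquad (z \in \tilde D). \tag{56'}$$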
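Since the normal derivative of the double-layer potential passes continuously through the charged curve, we have
$$\frac{\partial}{\partial n}\,\tilde h_\nu(z) = \frac{\partial}{\partial n}\,h_\nu(z), \qquad (\text{for } z \in C), \tag{58}$$
while (54) and (54') imply
$$\tilde h_\nu(z) = \frac{1-\lambda_\nu}{1+\lambda_\nu}\,h_\nu(z). \tag{59}$$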
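If we differentiate the boundary relations (58) and (59), we obtain the following connection between $v_\nu(z)$ and $\tilde v_\nu(z)$ on C:
$$\tilde v_\nu(z)\,z' = (1+\lambda_\nu)^{-1}\,v_\nu(z)\,z' - \lambda_\nu(1+\lambda_\nu)^{-1}\,\overline{v_\nu(z)\,z'}. \tag{60}$$
On the other hand, we may use the right-hand side in (56) in order to define an analytic function $V_\nu(z)$ in $\tilde D$. By the well-known theorem of Plemelj for the jump of a function defined by a Cauchy integral, we find for $z \in C$
$$V_\nu(z) = v_\nu(z) - \lambda_\nu\,\overline{v_\nu(z)}\;\overline{z'}^{\,2}. \tag{61}$$
Comparing (60) with (61), we find
$$(1+\lambda_\nu)\,\tilde v_\nu(z) = \frac{\lambda_\nu}{2\pi i}\int_C (\zeta - z)^{-1}\,\overline{v_\nu(\zeta)}\;\overline{d\zeta}, \qquad (\text{for } z \in \tilde D), \tag{62}$$
and analogously
$$(1-\lambda_\nu)\,v_\nu(z) = \frac{\lambda_\nu}{2\pi i}\int_C (\zeta - z)^{-1}\,\overline{\tilde v_\nu(\zeta)}\;\overline{d\zeta}, \qquad (\text{for } z \in D). \tag{62'}$$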
Thus, the knowledge of one analytic eigenfunction leads immediately to a simple representation for the corresponding eigenfunction of the complementary domain. We assumed the eigenfunction $\varphi_\nu(z)$ chosen in such a way that $v_\nu(z)$ is normalized. Since
$$\iint_D |v_\nu|^2\,d\tau = \frac{1}{4}\int_C h_\nu\,\frac{\partial h_\nu}{\partial n}\,ds = 1, \tag{63}$$
the conditions (58) and (59) show that
$$\iint_{\tilde D} |\tilde v_\nu|^2\,d\tau = \frac{\lambda_\nu - 1}{\lambda_\nu + 1}. \tag{64}$$
We introduce, therefore, the normalized eigenfunction
v,(z) = i \Ix. + 1
(65)
ivv(z),
which also satisfies (56') and is related to vv(z) by the symmetric formulas (z) (66)
u,(z) =
z c D),
Ay 2Jri xy
2t 1/1-vAai
df,
(for z v- D).
Finally, we remark that the integral equation (56) may be expressed in the alternative form
$$v_\nu(z) = \frac{\lambda_\nu}{\pi}\iint_D \frac{\overline{v_\nu(\zeta)}}{(\zeta - z)^2}\,d\tau_\zeta, \qquad (z \in D), \tag{67}$$
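which leads to close relations to the theory of the Hilbert transform [1,2]. Let us perform now a variation
$$z^* = z + \frac{\varepsilon}{z - z_0}, \qquad (\text{with } z_0 \in \tilde D). \tag{68}$$
It will transform the curve C into a curve C* with eigenfunctions $v_\nu^*(z)$ and eigenvalues $\lambda_\nu^*$. We write down integral equation (56) with respect to C*, but by the aid of (68) we can refer everything back to D. Observe that
$$W_\nu(z) = v_\nu^*\!\left(z + \frac{\varepsilon}{z - z_0}\right)\left(1 - \frac{\varepsilon}{(z - z_0)^2}\right) \tag{69}$$
is regular analytic in D. Hence, we find
$$\frac{W_\nu(z)}{1 - \dfrac{\varepsilon}{(z - z_0)^2}} = \frac{\lambda_\nu^*}{2\pi i}\int_C \frac{\overline{W_\nu(\zeta)}\;\overline{d\zeta}}{(\zeta - z)\left[1 - \dfrac{\varepsilon}{(z - z_0)(\zeta - z_0)}\right]}. \tag{70}$$
Thus, $W_\nu(z)$ and $\lambda_\nu^*$ are eigenfunctions and eigenvalues for an integral equation with respect to the same domain but with a slightly changed kernel.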
Let us suppose, for the sake of simplicity, that to the eigenvalue $\lambda_\nu$ there corresponds exactly one eigenfunction $v_\nu(z)$. Then, clearly, the corresponding eigenfunction $W_\nu(z)$ will be near to $v_\nu(z)$. Hence, using (56) and (66), we can transform (70) into (70')
Wy (z)
'!r
ff ( - Z)2 d7- D
(Z
+ 0(E). ZO)2
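Finally, we multiply this identity by $\overline{v_\nu(z)}$ and integrate over the domain D. Using (66) and (67), we obtain
$$\iint_D W_\nu\,\overline{v_\nu}\;d\tau = \frac{\lambda_\nu^*}{\lambda_\nu}\iint_D v_\nu\,\overline{W_\nu}\;d\tau - \pi\varepsilon\,\frac{1-\lambda_\nu^2}{\lambda_\nu}\,\tilde v_\nu(z_0)^2 + o(\varepsilon). \tag{71}$$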
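Now, $W_\nu(z)$ is very near to $v_\nu(z)$, and $v_\nu(z)$ is normalized. Hence, taking the real parts on both sides of (71), we find [2]
$$\lambda_\nu^* - \lambda_\nu = \operatorname{Re}\{\pi\varepsilon(1-\lambda_\nu^2)\,\tilde v_\nu(z_0)^2\} + o(\varepsilon). \tag{72}$$
If we had used a variation of the form (68) but with $z_0 \in D$, the symmetry of the functions $v_\nu$ and $\tilde v_\nu$ would have led to the analogous result:
$$\lambda_\nu^* - \lambda_\nu = \operatorname{Re}\{\pi\varepsilon(1-\lambda_\nu^2)\,v_\nu(z_0)^2\} + o(\varepsilon). \tag{73}$$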
In the case of an eigenvalue of higher multiplicity, we would have obtained
a secular equation for $\lambda^*$, the terms of which are expressible in an obvious manner in terms of the eigenfunctions $v_\nu(z)$ and $\tilde v_\nu(z)$ which belong to the degenerate eigenvalue.
It is easy to derive a Hadamard-type variational formula, once the interior formulas (72) and (73) have been established. For this purpose we put (73) into the form
$$\delta\lambda_\nu = \operatorname{Re}\left\{\frac{\varepsilon}{2i}\,(1-\lambda_\nu^2)\int_C \frac{v_\nu(\zeta)^2}{\zeta - z_0}\,d\zeta\right\}, \qquad (z_0 \in D). \tag{74}$$
Since $z_0 \in D$, we have, by Cauchy's integral theorem and the fact [obvious from (66)] that $\tilde v_\nu(\infty) = 0$,
$$0 = \operatorname{Re}\left\{\frac{\varepsilon}{2i}\,(1-\lambda_\nu^2)\int_C \frac{\tilde v_\nu(\zeta)^2}{\zeta - z_0}\,d\zeta\right\}, \qquad (z_0 \in D). \tag{74'}$$
But using (60) and the definition (65), we see easily that on C
$$(1-\lambda_\nu^2)\,\tilde v_\nu^2 z'^2 + (\lambda_\nu^2 - 1)\,v_\nu^2 z'^2 = \lambda_\nu^2 v_\nu^2 z'^2 + \lambda_\nu^2\,\overline{v_\nu}^{\,2}\,\overline{z'}^{\,2} - 2\lambda_\nu |v_\nu|^2 \tag{75}$$
is purely real. Thus, combining (74) with (74'), we obtain
$$\delta\lambda_\nu = -\lambda_\nu\int_C \left[\lambda_\nu \operatorname{Re}\{v_\nu^2 z'^2\} - |v_\nu|^2\right]\operatorname{Re}\left\{\frac{\varepsilon}{i z'(z - z_0)}\right\}ds. \tag{76}$$
Using the geometric interpretation (28) for the last factor, we arrive at
$$\delta\lambda_\nu = -\lambda_\nu\int_C \left[\lambda_\nu \operatorname{Re}\{v_\nu^2 z'^2\} - |v_\nu|^2\right]\delta n\;ds. \tag{77}$$
It can also be shown from the boundary relation (60) that
$$\lambda_\nu \operatorname{Re}\{v_\nu^2 z'^2\} - |v_\nu|^2 = -\left(\lambda_\nu \operatorname{Re}\{\tilde v_\nu^2 z'^2\} - |\tilde v_\nu|^2\right). \tag{78}$$
Since the interior normal with respect to D has the opposite sign to that with respect to $\tilde D$, we see the complete symmetry of (77).
We have derived the variational formula (77) with respect to the particular kind of variation (68). However, we can obtain the most general variation $\delta n$ of C by superposition of variations of this type, and, by a limit argument, we can establish (77) for the most general case admissible.
The eigenvalue variational formulas have been derived under the assumption that the curve C is smooth. In the case that C has a slit component, however small, it can be shown that its lowest eigenvalue is necessarily 1. This fact shows that the lowest eigenvalue depends in a highly unstable fashion upon the boundary, and consequently we shall have to frame extremum problems for this eigenvalue in a particularly careful manner. We shall be able to assert the existence of extremum curves C and characterize them by variational procedures, if we admit some class of analytic curves C and vary within this class.
A very useful extremum problem for the lowest eigenvalue is the following: Suppose that there exists an analytic function p(z) which is defined and univalent in the ring domain
$$r \le |z| \le R, \qquad (r < 1 < R),$$
which contains the unit circumference. The function p(z) will map the unit circumference upon a curve C. We ask for the minimum value of the lowest eigenvalue of C.
The significance of this minimum problem is obvious. We can refer every analytic curve C to a parameter varying on the unit circle. The requirement that the parameter function be analytic and univalent in a fixed ring surrounding this circle leads to something like a uniform analyticity of C. We can show that the measure (r,R) of the analyticity leads to an estimate for $\lambda$ and, consequently, to a valuable statement on the quality of the iteration procedure for solving boundary-value problems. In particular, we can predict the convergence of the Neumann-Liouville series for the conformal mapping of the interior of C upon the unit circle. If we apply the variational formulas (72) and (73) and the method of variation of Sec. 2 for univalent functions, we find that the minimum value for $\lambda$ is (79)
2 pp
(80)
Xmin -
R2
1 + (rR) 2.
It is attained if w = p(z) maps the circumference $|z| = R$ upon the part of the real axis $w^2 \ge 1$, while the circumference $|z| = r$ goes into a segment of the imaginary axis $-w^2 \le p$, where p depends on r and R.
While the eigenvalues $\lambda_\nu$ are very interesting functionals of a curve C, there are very few curves for which these eigenvalues have been explicitly calculated. One knows that in the case of a circle all eigenvalues are infinite, and it is easy to calculate the eigenvalues for an ellipse. It seems desirable to give more examples of eigenfunctions and eigenvalues, and this can be done easily by variational methods. We want to show how to construct eigenfunctions and eigenvalues for many nearly circular domains.
Let C* be a curve obtained from the unit circle by a variation
$$z^* = z + \frac{\varepsilon}{z - z_0}, \qquad (|z_0| > 1). \tag{68}$$
Let f(z) be analytic on and inside of C*. If $\varepsilon$ is small enough, the function
$$g(z) = f\!\left(z + \frac{\varepsilon}{z - z_0}\right)\left(1 - \frac{\varepsilon}{(z - z_0)^2}\right) \tag{81}$$
will be regular analytic for $|z| \le 1$. Hence, we may write (82)
21
Itl=r
(i - z)-1 (1
- (z - zo)( - zo)/l
9
") di'.
But for 1, we may replace by f-1 and compute the right-side integral explicitly by the residue theorem. We find easily (83)
1 tai
fc*
* - z*)-lf(T*) d3'* =
- zo
7-z
77
with E
7=zo+ z-z0
(84)
If f (z) is an eigenfunction of the homogeneous integral equation (56) with respect to the curve C*, it must satisfy the condition f(z*) = A 7 7-Zzo 1
(85)
1
+1W
This leads to the simple functional equation for g(z) : 1 9 (z) = - (z -EZ0)1; g (1).
(86)
introduce the analytic function G(z) by the definition
g(z) = d G(z),
(87)
and, by proper choice of the constant of integration, we arrive at the very elegant functional equation
G(z) _ -XG
(88)
_
-((2_0+
C),
which may be solved even for finite values of E.
Instead of solving (88) explicitly, we might develop both sides of (85) in powers of e and compare equal terms on both sides. we find (89)
f (Z) -i- z
E
Zo
f' (z) -f-
.
.
_
zy(z 0
In view of (81) and (84),
E z0)2f (2
Since f(z) is a normalized function, eX cannot be small. (90)
A
f (z) = (Z - z0)2
+ (E),
E - (IzpI2 -
Hence, we find easily 1)E + O(E).
Thus, for a nearly circular domain of this type, all eigenvalues are still very high. We can construct further examples of eigenfunctions and eigenvalues if we consider a linear superposition of elementary variations of the type (68). While we might have obtained the exact value of $\lambda$ by explicitly solving the functional equation (88), this becomes virtually impossible in the more general case, and the series development in powers of $\varepsilon$ remains the only tool for an approximate determination of the eigenvalues.
The last considerations show a useful field of applications for variational procedures, viz., the construction of rather general examples for difficult functionals. In this way, we have the possibility of testing conjectures and obtaining intuitive insight into the nature of the functionals in question.
6. Conclusion. We have tried to show the basic ideas of the variational calculus in conformal mapping. It seems evident that this technique is useful in existence problems, provides complete information in extremum problems, and leads to a large number of inequalities and comparison theorems. There remains the problem of the actual construction of important canonical map functions and of the Green's function for a given domain. Since variational methods, by their very nature, connect only infinitesimally near domains, it is clear that we can expect from them only results with respect to the conformal mapping of domains near a well-known standard domain, say, a circle. The integral equation (23) indeed allows a calculation of the Green's function and the mapping functions of a domain D* which lies near to a domain D whose Green's function is known. One obtains a series development in powers of the approximation parameter $\varepsilon$ which converges well if $\varepsilon$ is small.
When it comes to the problem of mapping a graphically given domain of arbitrary structure, it is clear that methods of difference equations or purely graphical procedures will give the answers faster. But the drawback of purely numerical and graphical methods is the lack of information on the dependence of the solution on the parameters of the problem. This is just the point where the variational theory is strongest, and the most promising approach to the theory of conformal mapping seems, therefore, a judicious blending of the more theoretical variational calculus with numerical and graphical methods.
BIBLIOGRAPHY
1. L. V. Ahlfors, Conformality with respect to Riemannian metrics, Ann. Acad. Sci. Fennicae Ser. A 202 (1955).
2. S. Bergman and M. Schiffer, Kernel functions and conformal mapping, Compositio Math. vol. 8 (1951) pp. 205-249.
3. R. Courant, Dirichlet's principle, conformal mapping and minimal surfaces, New York, 1950. Appendix by M. Schiffer.
4. P. R. Garabedian, H. Lewy, and M. Schiffer, Axially symmetric cavitational flow, Ann. of Math. vol. 56 (1952) pp. 560-602.
5. P. R. Garabedian and M. Schiffer, Convexity of domain functionals, J. Analyse Math. vol. 2 (1953) pp. 281-368.
6. P. R. Garabedian and M. Schiffer, A coefficient inequality for schlicht functions, Ann. of Math. vol. 61 (1955) pp. 116-136.
7. P. R. Garabedian and M. Schiffer, A proof of the Bieberbach conjecture for the fourth coefficient, J. Rational Mech. Anal. vol. 4 (1955) pp. 427-465.
8. P. R. Garabedian and D. C. Spencer, Extremal methods in cavitational flow, J. Rational Mech. Anal. vol. 1 (1952) pp. 359-409.
9. G. Golusin, Method of variations in the theory of conformal mapping, Rec. Math. (Mat. Sbornik) N.S. vol. 19 (1946) pp. 203-236; vol. 21 (1947) pp. 83-117, 119-132.
10. G. Golusin, Geometrical theory of functions of a complex variable, Moscow, 1952.
11. J. Hadamard, Mémoire sur le problème d'analyse relatif à l'équilibre des plaques élastiques encastrées, Acad. Sci. Paris, Mémoires des savants étrangers, vol. 33 (1908).
12. G. Julia, Sur une équation aux dérivées fonctionnelles liée à la représentation conforme, Ann. École Norm. (3) vol. 39 (1922) pp. 1-28.
13. P. Montel, Leçons sur les fonctions univalentes ou multivalentes, Paris, Gauthier-Villars, 1933.
14. H. Poincaré, Figures d'équilibre d'une masse fluide, Paris, Gauthier-Villars, 1902.
15. D. Riabouchinsky, Sur un problème de variation, C.R. Acad. Sci. Paris vol. 185 (1927) pp. 840-841.
16. A. C. Schaeffer and D. C. Spencer, A variational method in conformal mapping, Duke Math. J. vol. 14 (1947) pp. 949-966.
17. A. C. Schaeffer and D. C. Spencer, Coefficient regions for schlicht functions, Amer. Math. Soc. Colloquium Pub. vol. 35 (1950).
18. M. Schiffer, A method of variation within the family of simple functions, Proc. London Math. Soc. vol. 44 (1938) pp. 432-449.
19. M. Schiffer, Variation of the Green function and theory of the p-valued functions, Amer. J. Math. vol. 65 (1943) pp. 341-360.
20. M. Schiffer, Hadamard's formula and variation of domain functions, Amer. J. Math. vol. 68 (1946) pp. 417-448.
21. M. Schiffer, Variational methods in the theory of conformal mapping, Proc. Internat. Congress Math. (1950) pp. 233-240.
22. M. Schiffer, Variational methods in the theory of Riemann surfaces, Contributions to the theory of Riemann surfaces, Princeton University Press, Princeton, N.J., 1953, pp. 15-30.
23. M. Schiffer, Variation of domain functionals, Bull. Amer. Math. Soc. vol. 60 (1954) pp. 303-328.
24. M. Schiffer and D. C. Spencer, Functionals of finite Riemann surfaces, Princeton University Press, Princeton, N.J., 1954.
25. D. C. Spencer, Some problems in conformal mapping, Bull. Amer. Math. Soc. vol. 53 (1947) pp. 417-439.
STANFORD UNIVERSITY, STANFORD, CALIF.
DYNAMIC PROGRAMMING AND ITS APPLICATION TO VARIATIONAL PROBLEMS IN MATHEMATICAL ECONOMICS
BY
RICHARD BELLMAN
I. INTRODUCTION
The purpose of this paper is to discuss some variational problems arising from mathematical economics, and some of the methods that can be used to treat these questions both analytically and computationally. Since the range of mathematical economics is so extensive (and indeed the subject possesses no precise boundaries), and since the array of mathematical techniques which have been borrowed, begged, stolen, or improvised to cope with this field is so imposing, we cannot hope to present any adequate survey in any reasonably sized article. In consequence, we have restricted our attention to two important and interesting classes of processes, allocation and smoothing processes, and to a discussion of the application of the theory of dynamic programming to these processes.
The paper is divided into four major parts, aside from this introductory section. In the first part, we present the basic features of the theory of dynamic programming and then show its relation to the calculus of variations.
The second part is devoted to the treatment of allocation processes, containing a formulation of the mathematical model and an application of functional-equation techniques. In the third part, we consider smoothing processes in a similar vein. The final part of the paper contains the details of the computational solution of a "bottleneck process," using the functional-equation approach of dynamic programming.
II. DYNAMIC PROGRAMMING
1. Introduction. Multistage decision process. Before we formulate any variational problems arising from mathematical images of the economic universe and discuss the application of the theory of dynamic programming thereto, let us briefly sketch the salient features of the theory.¹ A fundamental concept is that of a multistage decision process, which we now proceed to enlarge upon.
Consider a mathematical system S whose state at any time t, which we shall take initially to be discretized, is specified by an n-dimensional vector p. There exists a set of transformations, T(p,q), which may be used upon the system. Here q is a vector parameter which distinguishes one transformation from another. Let
$$p_1 = T(p, q_1), \tag{1}$$
for any particular choice of q, say $q_1$. Let, further, $\varphi(p)$ be some preassigned scalar function of p, and assume that $q_1$ is to be chosen so as to maximize $\varphi(p_1)$. A process of this type is called a decision process.
¹ More detailed discussions may be found in [3,4], where additional references are given.
Let us now introduce some dynamic features. Suppose that, after $q_1$ has been chosen to determine $p_1$, we must choose $q_2$ so as to determine $p_2$, etc., obtaining in this fashion a sequence of states
$$p_1 = T(p, q_1), \quad p_2 = T(p_1, q_2), \quad \ldots, \quad p_N = T(p_{N-1}, q_N). \tag{2}$$
The set of values $(q_1, q_2, \ldots, q_N)$ is to be determined so as to maximize $\varphi(p_N)$. A process of this type is called a multistage decision process.
At this point it is important to emphasize the fact that there is no sharp distinction between a single-stage and multistage decision process, since by suitable alteration of the dimensions of p, we can always convert one into the other. Conceptual and analytic convenience alone dictate the name we bestow upon any particular process. To simplify the presentation, we have constrained ourselves initially to a discussion of deterministic processes which are time-independent. We shall remedy this below.
Under appropriate assumptions of continuity and differentiability for the functions $\varphi(p)$ and T(p,q), for p and q in a product region, the problem of determining the variables $(q_1, q_2, \ldots, q_N)$ which maximize $\varphi(p_N)$ is a problem in calculus. At these words, there may be a tendency among the mathematical audience to lose interest. At best, it would seem, these problems require partial differentiation and the solution of simultaneous systems of equations, with, perhaps, some Lagrangian multipliers; at worst, they can be foisted off upon modern computing machines.
There are many reasons why the first approach either fails or is unsatisfactory, as we shall discuss again below. Let us at this point merely indicate why computing machines provide no panacea. Consider the case, avoiding all temptation to load the dice, where p is a scalar and there are 10 stages to the process, so that N = 10. The problem reduces to one of determining the maximum of a function of 10 variables. A rapid estimation of grid points will convince the reader that any routine search method is out of the question, using any of the current machines. If we permit p to be k-dimensional, k > 1, the problem requires a maximization over Nk-dimensional space. The increase in grid points is exponential, rather than proportional, with a corresponding increase in computing time.
Fortunately, the function $\varphi(p_N)$ is not an arbitrary function of N variables, but one arising from an iterative process. As a consequence, it possesses a specific structure. On metamathematical grounds, appealing to that most useful of principles, the principle of wishful thinking, there should exist ways and means of taking advantage of the special features of the function so as to reduce the dimensionality of the problem and so to ease the analytic and computational difficulties. As we shall see below, this is indeed the case.
2. The principle of optimality. Let us begin by introducing some notation. Any admissible set of choice variables $(q_1, q_2, \ldots, q_N)$ will be called a policy,
and a set which maximizes the criterion function $\varphi(p_N)$ will be called an optimal policy.
The problem of maximizing $\varphi(p_N)$ is equivalent to that of determining all optimal policies. These optimal policies may be characterized by the following simple and intuitive principle:
PRINCIPLE OF OPTIMALITY. An optimal policy has the property that, whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.
As we shall see in the following pages, the analytic transliteration of this principle yields the functional equations which can be used to resolve the original maximization problem.
3. Functional equations. In order to solve the original problem, we begin by complicating it. In place of the original definite problem involving a given initial state p and a fixed number of stages, we pose ourselves the problem of determining optimal policies for similar processes involving any number of stages, starting from an arbitrary initial position. In other words, we embed the original problem within a family of related problems. Furthermore, in place of determining the entire set of choices $(q_1, q_2, \ldots, q_N)$ which constitute an optimal policy, we endeavor to determine $q_1$, the first choice, as a function of p and N. This information, when available for all p and N, suffices to determine all optimal policies.
Consistent with this idea, define the sequence of functions
$$f_N(p) = \max_{[q_1, q_2, \ldots, q_N]} \varphi(p_N), \tag{3}$$
for N = 1, 2, …, and for all p within a prescribed region of variability D. Let us assume that $p \in D$ implies that $T(p,q) \in D$ for all q, and impose sufficient continuity restrictions to ensure the existence of the above maximum. The principle of optimality cited in Sec. 2 yields the recurrence relations
$$f_{N+1}(p) = \max_{q} f_N(T(p,q)), \tag{4}$$
for N = 1, 2, …, with
$$f_1(p) = \max_{q_1} \varphi(T(p, q_1)). \tag{5}$$
Starting with the computation of $f_1(p)$, involving a k-dimensional maximization, where k is the dimension of the q-vectors, we compute $f_2(p)$, using (4) above, then $f_3(p)$, etc. In this way the original Nk-dimensional problem is reduced to a sequence of N k-dimensional problems.
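The recurrence (4)-(5) can be sketched on a small finite example and compared with a direct search over all policies. Everything specific below is a hypothetical choice made for illustration only: the state set, the choice set, the transformation `T`, and the criterion `phi` do not come from the text; the only structural requirement respected is that `T` maps the state set into itself.

```python
from itertools import product

def solve_by_recurrence(states, choices, T, phi, n_stages):
    """Return f_N as a dict {state: value} using
    f_1(p) = max_q phi(T(p, q)) and f_{N+1}(p) = max_q f_N(T(p, q))."""
    f = {p: max(phi(T(p, q)) for q in choices) for p in states}
    for _ in range(n_stages - 1):
        f = {p: max(f[T(p, q)] for q in choices) for p in states}
    return f

def solve_by_enumeration(p, choices, T, phi, n_stages):
    """Maximize phi(p_N) by trying every policy (q_1, ..., q_N) in turn."""
    best = None
    for policy in product(choices, repeat=n_stages):
        state = p
        for q in policy:
            state = T(state, q)
        value = phi(state)
        if best is None or value > best:
            best = value
    return best

# Hypothetical toy process: states 0..6, three admissible choices per stage.
STATES = range(7)
CHOICES = (1, 2, 3)

def T(p, q):
    return (p * q + 1) % 7   # keeps the state inside STATES

def phi(p):
    return (p - 3) ** 2

if __name__ == "__main__":
    f5 = solve_by_recurrence(STATES, CHOICES, T, phi, 5)
    for p in STATES:
        print(p, f5[p], solve_by_enumeration(p, CHOICES, T, phi, 5))
```

The recurrence performs N maximizations over the choice set for each state, while the enumeration examines every one of the $3^N$ policies; the two agree on the maximum, which is the content of the reduction described above.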
4. Continuous decision processes and the calculus of variations. Let us now consider a continuous version of the above discrete process. In place of determining a sequence $\{q_k\}$ which maximizes, we wish to determine a function y(t) which maximizes. In lieu of an abstract definition, which can involve us in a great deal of verbiage, most of it extraneous to the present question, let us consider a specific problem in the calculus of variations, a member of the general class we shall treat in succeeding sections, and interpret it to be a continuous decision process.
We wish to maximize the functional
$$J(x) = \int_0^T F(x, x')\,dt \tag{6}$$
over all functions x(t) satisfying the initial condition x(0) = c. Let us assume that F satisfies appropriate conditions ensuring that the maximum is assumed by a function x(t) possessing a derivative. Since J(x) may be written
$$J(x) = \int_0^S + \int_S^T, \qquad (0 < S < T), \tag{7}$$
it is easy to see that, upon having chosen x(t) over the initial interval [0,S], we have a variational problem of precisely the same type as the original, with the distinction that the length of the interval is now T - S and the initial value is the value of x(t) at t = S. This suggests that we follow the same approach as above, and define, for $T \ge 0$ and all c,
$$f(c,T) = \max_{x} J(x). \tag{8}$$
The principle of optimality yields the functional equation
$$f(c,T) = \max_{x[0,S]} \left[\int_0^S F(x, x')\,dt + f(c(S), T - S)\right], \tag{9}$$
where c(S) = x(S) and the maximization is over all functions x(t) that are defined over all $0 \le t \le S$, that satisfy x(0) = c, and that possess derivatives. More precisely, we are maximizing over all values of x'(t) over the interval [0,S].
Let us now assume that the solution of the variational problem is a continuous function of the state variables c and T, and that f(c,T) has the requisite continuous partial derivatives. Letting $S \to +0$, the limit of (9) is the nonlinear partial differential equation
$$\frac{\partial f}{\partial T} = \max_{v}\left[F(c,v) + v\,\frac{\partial f}{\partial c}\right], \tag{10}$$
where we have set v = v(c,T) = x'(0).
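A small worked case may make the passage from (9) to (10) concrete. The integrand $F(x,x') = -(x^2 + x'^2)$ is an assumption chosen here only because everything can then be done in closed form; it is not an example from the text. For this F the value function has the form $f(c,T) = -a(T)\,c^2$, equation (10) reduces to $a' = 1 - a^2$ with $a(0) = 0$, so that $a(T) = \tanh T$, and the discrete analogue of (9) with a short step h becomes a one-line recursion for a. The sketch below iterates that recursion and can be compared with $\tanh T$; the function name is hypothetical.

```python
import math

def value_coefficient(T, steps):
    """Iterate the discrete analogue of (9) for the assumed integrand
    F(x, x') = -(x**2 + x'**2), for which f(c, T) = -a(T) * c**2."""
    h = T / steps
    a = 0.0  # f(c, 0) = 0
    for _ in range(steps):
        # Over one short step with constant slope v, the quantity
        #   -(c**2 + v**2) * h - a * (c + v * h)**2
        # is maximized at v = -a * c / (1 + a * h), with maximum value
        #   -(h + a / (1 + a * h)) * c**2.
        a = h + a / (1.0 + a * h)
    return a

if __name__ == "__main__":
    for steps in (10, 100, 1000, 10000):
        print(steps, value_coefficient(1.0, steps), math.tanh(1.0))
```

The maximizing slope $v = -a(T)\,c$ in the limit is the policy function v(c,T) for this special case: the slope is prescribed as a function of the current state and the remaining time.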
This is the analogue of the recurrence relation of (4). Once again we have reduced a global maximization to a local maximization. The classical approach seeks to determine the solution in the form of a function x(t), while the dynamic-programming approach is to determine x' as a function of x and T - t. Geometrically, the classical approach regards a curve as a locus of points, whereas we are taking it to be an envelope of tangents. The duality will manifest itself in a moment in our derivation of the Euler equation from (10) above.
Differentiating the right-hand side of (10) to determine a maximizing v, we see that (10) is equivalent to the two equations
$$\frac{\partial f}{\partial T} = F(c,v) + v\,\frac{\partial f}{\partial c}, \qquad 0 = \frac{\partial F}{\partial v} + \frac{\partial f}{\partial c}, \tag{11}$$
or
$$\frac{\partial f}{\partial c} = -\frac{\partial F}{\partial v}, \qquad \frac{\partial f}{\partial T} = F - v\,\frac{\partial F}{\partial v}. \tag{12}$$
Eliminating f between these equations, we obtain the first-order quasi-linear partial differential equation for v,
$$\frac{\partial^2 F}{\partial v^2}\,v_T = v\,v_c\,\frac{\partial^2 F}{\partial v^2} + v\,\frac{\partial^2 F}{\partial c\,\partial v} - F_c. \tag{13}$$
As is easily verified, the equations of the characteristics are equivalent to the usual Euler equation for the variational problem. A fuller discussion of this approach may be found in [5] and [7].
5. Constraints. If we consider the maximization problem of (6) subject to a constraint of the form
$$0 \le x' \le 1, \tag{14}$$
a type of problem of great importance in applications, as we shall see below, the equation corresponding to (10) is
$$\frac{\partial f}{\partial T} = \max_{0 \le v \le 1}\left[F(c,v) + v\,\frac{\partial f}{\partial c}\right]. \tag{15}$$
This equation will be discussed again in Sec. 17.
6. Multidimensional case. Let us now formulate in similar terms the general problem of determining the maximum of
$$J(y_1, y_2, \ldots, y_K) = \int_0^T F(x_1, x_2, \ldots, x_N,\ y_1, y_2, \ldots, y_K)\,dt, \tag{16}$$
over all functions $y_j$ satisfying a set of constraints of the form
$$R_j(x_1, x_2, \ldots, x_N,\ y_1, y_2, \ldots, y_K) \le 0, \qquad (j = 1, 2, \ldots, M), \tag{17}$$
where the x's and the y's are connected by means of the differential equations
$$\frac{dx_i}{dt} = G_i(x_1, x_2, \ldots, x_N,\ y_1, y_2, \ldots, y_K), \qquad x_i(0) = c_i, \qquad (i = 1, 2, \ldots, N). \tag{18}$$
We shall only present the formalism, and omit any discussion of the oftentimes quite unpleasant rigorous details, since, as we shall indicate below, in many cases of importance, we have a means of by-passing these thorny questions completely. Set
$$\max_{\{y\}} J = f(c_1, c_2, \ldots, c_N, T). \tag{19}$$
The analogue of (15), derived as in Sec. 4, is
$$f_T = \max_{\{v\}}\left[F(c_1, c_2, \ldots, c_N,\ v_1, v_2, \ldots, v_K) + \sum_{i=1}^{N} G_i(c_1, c_2, \ldots, c_N,\ v_1, v_2, \ldots, v_K)\,\frac{\partial f}{\partial c_i}\right], \tag{20}$$
where the $v_i$ are subject to the relations
$$R_j(c_1, c_2, \ldots, c_N,\ v_1, v_2, \ldots, v_K) \le 0, \qquad (j = 1, 2, \ldots, M). \tag{21}$$
The function $f(c_1, c_2, \ldots, c_N, T)$ satisfies the initial condition
$$f(c_1, c_2, \ldots, c_N, 0) = 0, \tag{22}$$
for all $c_i$.
Eliminating f from (20), in the case where we want an unrestricted maximum, we obtain, in much the same fashion as before, a system of quasi-linear equations for the policy functions $v_i$, i = 1, 2, …, K. The proof that the characteristics of these equations are equivalent to the Euler equations obtained in the usual way is no longer simple (cf. Osborn, Chap. 7 of [5]).
7. Time-dependent case. Consider now the case where t occurs explicitly. We wish to maximize
$$J(y_1, y_2, \ldots, y_K) = \int_0^T F(x,y,t)\,dt, \tag{23}$$
subject to the constraints
$$R(x,y,t) \le 0, \tag{24}$$
where the vectors x and y are connected by the vector equations
$$\frac{dx}{dt} = G(x,y,t), \qquad x(0) = c. \tag{25}$$
In order to take account of the dependence upon t, we fix T and introduce the state variable a, the time at which the process begins. The new problem is to maximize
$$J(y_1, y_2, \ldots, y_K) = \int_a^T F(x,y,t)\,dt, \tag{26}$$
subject to
$$R(x,y,t) \le 0, \tag{27}$$
and
$$\frac{dx}{dt} = G(x,y,t), \qquad x(a) = c. \tag{28}$$
Define
$$f(c,a) = \max_{\{y\}} J(y_1, y_2, \ldots, y_K). \tag{29}$$
Then, as above, we obtain the equation
$$-\frac{\partial f}{\partial a} = \max_{\{v\}}\left[F(c_1, c_2, \ldots, c_N,\ v_1, v_2, \ldots, v_K,\ a) + \sum_{i=1}^{N} G_i(c_1, c_2, \ldots, c_N,\ v_1, v_2, \ldots, v_K,\ a)\,\frac{\partial f}{\partial c_i}\right], \tag{30}$$
where $R(c,v,a) \le 0$, with the initial condition
$$f(c_1, c_2, \ldots, c_N, T) = 0. \tag{31}$$
8. Computational techniques. We shall discuss several methods for obtaining computational solutions in the third part of the paper in connection with allocation processes. Furthermore, we shall point out some of the difficulties arising in the application of classical methods.
III. MULTISTAGE ALLOCATION PROCESSES
9. Description of the general allocation process. Discrete form. This section of the paper is devoted to a discussion of some general classes of variational problems arising from the study of multistage production processes. These problems are important mathematically, as well as from the standpoint of applications, since, on the one hand, many purely analytic processes may be profitably interpreted as multistage allocation processes, a view which occasionally furnishes a certain amount of insight, and, on the other hand, the applications focus attention on many interesting analytic processes which might otherwise languish unnoticed.
Consider an economic system S whose state at any time t is specified by a vector x(t) whose components measure the levels of N interdependent "resources." Specifically, let
$$x_i(t) = \text{the quantity of the } i\text{th resource},\ i = 1, 2, \ldots, N,\ \text{at time } t. \tag{32}$$
Consider, to begin with, a discrete process, where t assumes only the values 0, 1, 2, ..., T. At each of these times, we divide the quantity $x_i(t)$ into N + 1 parts,
$$ x_i(t) = \sum_{j=1}^{N} x_{ij}(t) + r_i(t), \qquad (i = 1, 2, \ldots, N), \tag{33} $$
where $x_{ij}(t)$ is the amount of the ith resource allocated to the production of the jth resource over the interval [t, t + 1], and $r_i(t)$ is the amount remaining in the stockpile of the ith resource. An immediate constraint is
$$ 0 \le \sum_{j} x_{ij}(t) \le x_i(t), \qquad x_{ij}(t) \ge 0. \tag{34} $$
As a result of these allocations, new quantities of these resources are available at time t + 1:
$$ x_i(t+1) = r_i(t) + g_i\big(x_{1i}(t), x_{2i}(t), \ldots, x_{Ni}(t)\big), \qquad (i = 1, 2, \ldots, N). \tag{35} $$
We may conceive, if we wish, of some of these resources as being productive resources, and others as raw materials, or we may consider them all to be raw materials, with productive capacities available at no cost. Given the initial quantities, $x_i(0) = c_i$, the problem is to determine the $x_{ij}(t)$ so as to maximize an expression of the form
$$ F\big(x_1(T), x_2(T), \ldots, x_N(T)\big). \tag{36} $$
In some cases, in addition to the constraints of (34), there are additional constraints of the form
$$ x_{ij}(t) \le b_{ij}. \tag{37} $$
A case of particular interest is that where the amount of the ith resource generated over [t, t + 1] depends upon the minimum allocation over this interval. Let
$$ z_i(t) = \min_{j}\, a_{ij}\, x_{ji}(t), \tag{38} $$
where $(a_{ij})$ is a given matrix. Then (35) takes the form
$$ x_i(t+1) = r_i(t) + g_i\big(z_i(t)\big). \tag{39} $$
If we fix upon an allocation policy, the problem of determining the set $x_1(T), x_2(T), \ldots, x_N(T)$ is one of iteration. In this respect, the growth of an economic system is a particular case of a "branching process." It would be interesting to know whether results concerning asymptotic behavior, of the precision obtainable in the probabilistic case, can be derived for the economic case.
A survey of the theory of branching processes is contained in Harris [20].
10. Discussion of computational difficulties. Before discussing the continuous version of this process, let us discuss some of the difficulties associated with the discrete version when considered as a calculus problem. We shall list them and then discuss them in order.
a. Dimensionality
b. Nonuniqueness of solution; local maxima
c. Noninterior maximum; constraints
d. Nondifferentiable functions
   (1) Nonanalytic
   (2) Imprecise
e. Stability analysis
As we shall point out again below, the functional-equation approach eliminates all these difficulties except the first and reduces this to a great degree and, in a sense, to its irreducible level.
a. Dimensionality. In Sec. 1 we discussed the dimensionality difficulties associated with N-stage decision processes described there. These difficulties are common to all multistage processes.
b. Nonuniqueness of solution; local maxima. In locating the maximum of a function of many variables, we have two obvious methods immediately available. We may use a search method to locate the maximum, or we may rely upon partial derivatives, either directly or by means of a gradient method. In either case, we encounter grave difficulties if local maxima exist. If there is a unique maximum, a number of efficient search procedures are available, although the problem of determining most efficient procedures remains open (cf. [21,22]).
c. Noninterior maximum; constraints. When constraints exist on the region of variation, a characteristic feature of allocation processes, the maximum may be inside the region of variation, in which case differentiation will yield the location of the maximum, or it may be on the boundary. In multidimensional variation problems, where combinations of the above cases occur, a consideration of all possible cases is time-consuming and analytically baffling.
d. Nondifferentiable functions. The functions which appear in many applications may be nondifferentiable for two reasons: they may be either nonanalytic or imprecise. In the first case, we may have functions such as $\max(y_i, c_i)$ or $|y_i - c_i|$; in the second case we may have functions defined only by a set of points. The function g(t) we use may be a very good approximation to the actual values, but the derivative g'(t) may be a very poor approximation.
e. Stability analysis. Having performed the laborious task of determining the maximum by means of a search procedure, we find, in many cases, that the individual answer is of little importance. What we really desire is an analysis of the sensitivity of the optimal policy to a variation of the parameters defining the process. This is a very long-winded operation if we must
examine the parameter space point by point. For a further discussion of this point, see [6].
11. Dynamic-programming approach. Consider the problem posed in Sec. 9, equations (32) to (36). Define
$$ \max_{\{x_{ij}\}} F\big(x_1(T), x_2(T), \ldots, x_N(T)\big) = f(c_1,c_2,\ldots,c_N,\,T), \tag{40} $$
for $T \ge 1$, with
$$ f(c_1,c_2,\ldots,c_N,\,0) = F(c_1,c_2,\ldots,c_N). \tag{41} $$
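Proceeding as in the previous part, we obtain the recurrence relation
$$ f(c_1,c_2,\ldots,c_N,\,T+1) = \max_{\{v_{ij}\}} f\big(r_1 + g_1(v_{11},v_{21},\ldots,v_{N1}),\; r_2 + g_2(v_{12},v_{22},\ldots,v_{N2}),\; \ldots,\; r_N + g_N(v_{1N},v_{2N},\ldots,v_{NN}),\; T\big), \tag{42} $$
where
$$ 0 \le \sum_{j=1}^{N} v_{ij} \le c_i, \qquad (v_{ij} \ge 0). \tag{43} $$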
Starting with the determination of $f(c_1,c_2,\ldots,c_N,\,0)$, we compute $f(c_1,c_2,\ldots,c_N,\,1)$, using (42); then $f(c_1,c_2,\ldots,c_N,\,2)$; etc. This approach overcomes difficulties b through e of Sec. 10 but faces grave difficulties, with present-day computing machines, as far as dimensionality is concerned. The value of T, no matter how large, causes no trouble; it is the value of N which is significant.
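As an illustration only, the recurrence (42) with the constraint (43) can be sketched for N = 2 resources held in whole units. The production functions `g1` and `g2`, the criterion `F`, and the restriction to integer allocations are all assumptions made for this sketch (they are of the "minimum allocation" type (38) but are not taken from the paper); the stockpiles are taken as $r_i = c_i - \sum_j v_{ij}$, following (33).

```python
from functools import lru_cache
from itertools import product

# Hypothetical production functions of the minimum-allocation type (38)-(39):
# one unit of resource 1 together with one unit of resource 2 yields either
# 2 units of resource 1 or 3 units of resource 2.
def g1(v11, v21):
    return 2 * min(v11, v21)

def g2(v12, v22):
    return 3 * min(v12, v22)

def F(c1, c2):
    # Assumed terminal criterion (36): total stock at the end of the process.
    return c1 + c2

@lru_cache(maxsize=None)
def f(c1, c2, T):
    """Return f(c1, c2, T) of (40)-(42) for integer stock levels."""
    if T == 0:
        return F(c1, c2)  # initial condition (41)
    best = None
    # v_ij = amount of resource i used to produce resource j, as in (43).
    for v11, v12 in product(range(c1 + 1), repeat=2):
        if v11 + v12 > c1:
            continue
        for v21, v22 in product(range(c2 + 1), repeat=2):
            if v21 + v22 > c2:
                continue
            r1 = c1 - v11 - v12  # stockpile left over, from (33)
            r2 = c2 - v21 - v22
            value = f(r1 + g1(v11, v21), r2 + g2(v12, v22), T - 1)
            if best is None or value > best:
                best = value
    return best
```

Even in this toy setting the table of f is indexed by every reachable pair (c1, c2), which gives some feeling for why the number of resources N, rather than the number of stages T, governs the cost of the computation.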
For values of N larger than 4, not only do we run up against memory problems, but the question of access time becomes significant.
12. Continuous version. To pass from the discrete case to a continuous allocation process, we consider allocations over the interval $[t, t + \Delta]$ having the form $x_{ij}(t)\Delta$. The quantity $x_{ij}(t)$ now represents a rate of allocation. In place of (35), we obtain a system of differential equations
$$ \frac{dx_i}{dt} = h_i(x_{1i}, x_{2i}, \ldots, x_{Ni}), \qquad x_i(0) = c_i. \tag{44} $$
The inequalities of (34) reduce to
$$ 0 \le x_{ij}(t), \tag{45} $$
with no upper bound. On the other hand, a new constraint enters, viz.,
$$ x_i(t) \ge 0, \tag{46} $$
a constraint automatically satisfied in the discrete case. Subject to these constraints, we wish to maximize the function
$$ F\big(x_1(T), x_2(T), \ldots, x_N(T)\big), \tag{47} $$
or, more generally, an integral of the form
$$ \int_0^T F\big(x_1(t), x_2(t), \ldots, x_N(t)\big)\, dG(t). \tag{48} $$
The continuous case appears to afford a much more attractive problem than the discrete case, because of the absence of many of the constraints. Unfortunately, precisely because of the absence of upper bounds, the maximum may not exist, unless we allow the use of delta functions. In precise terminology, it may be necessary to examine variational problems over distribution functions in the sense of L. Schwartz.
To avoid this, in order to obtain a solution of the type we wish to admit as an economic solution, it is customary to impose an upper bound on the rate of allocation:
$$ 0 \le x_{ij}(t) \le m_{ij} < \infty, \qquad (i, j = 1, 2, \ldots, N). \tag{49} $$
Under this condition, we can establish the existence of a solution of the variational problem, under reasonable assumptions concerning the functions appearing, using a "weak-convergence" argument.
13. Discussion of computational difficulties-continuous case. Many of the difficulties discussed above in Sec. 10 in connection with the application of classical variational methods to the discrete process are also attendant upon the continuous case. We shall list some of the difficulties and discuss only those which have not been considered above.
a. Two-point boundary conditions
b. Nonuniqueness of solution; local maxima
c. Noninterior maximum; constraints
d. Nondifferentiable functions
e. Stability analysis
The analogue of the finite-dimensional variational equations, obtained by equating partial derivatives to zero, is the Euler equation. This is, in general, a nonlinear differential equation. In order to illustrate in its simplest form what we mean by the difficulty of the two-point boundary condition, consider the problem of maximizing
$$ J(x) = \int_0^T F(x,x')\,dt, \qquad \Big(x' = \frac{dx}{dt}\Big), \tag{50} $$
over all x subject to x(0) = c. Proceeding purely formally, the Euler equation is
$$ \frac{\partial F}{\partial x} - \frac{d}{dt}\,\frac{\partial F}{\partial x'} = 0. \tag{51} $$
This is to be solved subject to the two conditions
$$ \text{a. } x(0) = c, \qquad \text{b. } \frac{\partial F}{\partial x'}\bigg|_{t=T} = 0. \tag{52} $$
Observe that two conditions are required, since (51) is a second-order equation. The first condition (52a) is part of the data; the second condition is derived from the variation.
If (51) is nonlinear, as it is in general, there exists no uniform method for determining the solution of (51) satisfying (52), nor even for determining the existence and uniqueness of solutions.
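To make the two-point difficulty concrete, here is a minimal sketch of a trial-and-error ("shooting") iteration on the unknown initial slope x'(0). The integrand $F(x,x') = -\tfrac{1}{2}(x^2 + x'^2)$ is an assumption chosen purely for illustration: it makes the Euler equation linear, $x'' = x$, with terminal condition $x'(T) = 0$, so that the answer $x'(0) = -c\tanh T$ is known in closed form and the sketch can be checked. The function names and the bracket [-10, 10] are likewise hypothetical.

```python
import math

def terminal_slope(s, c, T, n_steps=2000):
    """Integrate x'' = x from x(0) = c, x'(0) = s with RK4; return x'(T)."""
    h = T / n_steps
    x, p = c, s  # p stands for x'
    for _ in range(n_steps):
        # First-order system: x' = p, p' = x.
        k1x, k1p = p, x
        k2x, k2p = p + 0.5 * h * k1p, x + 0.5 * h * k1x
        k3x, k3p = p + 0.5 * h * k2p, x + 0.5 * h * k2x
        k4x, k4p = p + h * k3p, x + h * k3x
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        p += h * (k1p + 2 * k2p + 2 * k3p + k4p) / 6
    return p

def shoot(c, T, lo=-10.0, hi=10.0, tol=1e-10):
    """Bisect on the unknown initial slope s until x'(T) = 0."""
    f_lo = terminal_slope(lo, c, T)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = terminal_slope(mid, c, T)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # root lies in the upper half
        else:
            hi = mid               # root lies in the lower half
    return 0.5 * (lo + hi)
```

In one dimension a single scalar is adjusted, which is why bisection suffices; with several state variables the unknown initial slopes form a vector and no comparably simple search is available.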
The usual approach, in any particular problem, is to take x'(0) as an unknown and try a range of values until (52b) is satisfied. Although this method, with modifications, is efficient in one dimension, in multidimensional problems it breaks down.
Let us also briefly mention some of the difficulties introduced by constraints. Assume that, in the above problem (50), we had imposed a constraint
$$ 0 \le x' \le 1. \tag{53} $$
We know that there are three possible regimes:
$$ \text{a. } x' = 0, \qquad \text{b. } x' = 1, \qquad \text{c. } \frac{\partial F}{\partial x} - \frac{d}{dt}\,\frac{\partial F}{\partial x'} = 0 \quad (\text{valid when } 0 < x' < 1). \tag{54} $$
The question is how to fit these three types of behavior together so as to determine the complete solution. At the present time, there is no systematic method for attacking this problem without imposing special conditions upon F(x,x').
14. Functional equations and computational solutions. Let us now examine the functional-equation approach with a view toward both analytic and computational solutions. Once again, to avoid analytic difficulties which are extraneous to our discussion of the moment, let us consider a simple one-dimensional problem: "Maximize over all functions y, the functional
$$ J(y) = \int_0^T F(x,y)\,dt, \tag{55} $$
subject to
$$ \text{a. } \frac{dx}{dt} = G(x,y), \quad x(0) = c, \qquad \text{b. } 0 \le y \le x." \tag{56} $$
Setting
$$ f(c,T) = \max_{y} J(y), \tag{57} $$
we obtain the nonlinear partial differential equation
$$ \frac{\partial f}{\partial T} = \max_{0 \le v \le c} \Big[ F(c,v) + G(c,v)\,\frac{\partial f}{\partial c} \Big], \tag{58} $$
under the usual assumptions concerning continuous partial derivatives, with the initial condition
$$ f(c,0) = 0. \tag{59} $$
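A formal sketch of how (58) arises may be helpful; it assumes, as the text does, that f has continuous partial derivatives, and it writes $\Delta$ for a short initial interval (a symbol introduced here only for the derivation). Choosing a constant rate v over $[0,\Delta]$ and proceeding optimally over the remaining time T gives

$$ f(c,\,T+\Delta) = \max_{0 \le v \le c} \big[ F(c,v)\,\Delta + f\big(c + G(c,v)\,\Delta,\; T\big) \big] + o(\Delta). $$

Expanding the second term,

$$ f\big(c + G(c,v)\,\Delta,\; T\big) = f(c,T) + G(c,v)\,\Delta\,\frac{\partial f}{\partial c} + o(\Delta), $$

and noting that $f(c,T)$ does not depend on v, we may subtract it from both sides, divide by $\Delta$, and let $\Delta \to 0$:

$$ \frac{\partial f}{\partial T} = \max_{0 \le v \le c} \Big[ F(c,v) + G(c,v)\,\frac{\partial f}{\partial c} \Big]. $$

The condition $f(c,0) = 0$ simply records that no return is accumulated over an interval of zero length.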
There are now two approaches we can use to the computational solution of (58). The principle of the first is to replace all derivatives by differences and thereby convert (58) into a partial difference equation. Thus
$$ \frac{\partial f}{\partial T} \cong \frac{f(c,\,T+\Delta T) - f(c,T)}{\Delta T}, \tag{60} $$
leading to
$$ f(c,\,T+\Delta T) - f(c,T) = \Delta T \max_{v} \Big[ F(c,v) + G(c,v)\,\frac{f(c+\Delta c,\,T) - f(c,T)}{\Delta c} \Big]. \tag{61} $$
This equation enables us to compute $f(c,\,T+\Delta T)$ given the values of f(c,T). Starting with (59), we can theoretically determine f(c,T) for all points of the grid $[k\,\Delta c,\; l\,\Delta T]$.
We have carefully used the words "principle" and "theoretically" in the above paragraph, because, in actuality, practically nothing is known about the convergence and stability properties of approximation methods of the above type. Nor do we know much about optimal grid sizes or optimal choices of approximating difference quotients, when dealing with nonlinear partial differential equations. In actual practice, we have found that the above method is unsatisfactory, because of stability difficulties.
Apart from these matters of practical concern, the above procedure raises a number of difficult questions for the theoretician. First, in order to use (58), he must know a great deal about the solution of the variational problem. Second, he must show that the sequence generated by (61) converges to the solution of (58) as $\Delta c$ and $\Delta T$ both approach zero. Both parts of the problem are quite difficult. In order to avoid these difficulties, and simultaneously to obtain a more efficient computational procedure, let us employ an approximation scheme of a different type.
15. Computational Solutions-II. Let us replace the continuous variational problem of (55) and (56) by the following discrete variational problem: "Maximize
$$ J(y) = \sum_{k=0}^{N} F(x_k,y_k)\,\Delta, \tag{62} $$
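over all sequences $\{y_k\}$ subject to
$$ \text{a. } x_{k+1} = x_k + G(x_k,y_k)\,\Delta, \quad x_0 = c, \qquad \text{b. } 0 \le y_k \le x_k." \tag{63} $$
Here
$$ T = N\Delta, \qquad x_k = x(k\Delta), \qquad y_k = y(k\Delta). \tag{64} $$
Let
$$ \max_{y} J(y) = f_N(c). \tag{65} $$
Then, using the techniques of the preceding sections, we have
$$ f_0(c) = \max_{0 \le y \le c} F(c,y)\,\Delta, \qquad f_{N+1}(c) = \max_{0 \le y \le c} \big[ F(c,y)\,\Delta + f_N\big(c + G(c,y)\,\Delta\big) \big]. \tag{66} $$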
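The recurrence (66) can be sketched in a few lines. Everything below beyond (66) itself is an assumption made for illustration: f_N is tabulated on a fixed grid of c-values, the maximization runs over a finite set of trial controls y = theta * c, and values of f_N between grid points are obtained by linear interpolation (clamped at the ends of the grid). The names `interp` and `solve_recurrence` are hypothetical.

```python
from bisect import bisect_right

def interp(x, xs, fs):
    """Piecewise-linear interpolation on a sorted grid, clamped at the ends."""
    if x <= xs[0]:
        return fs[0]
    if x >= xs[-1]:
        return fs[-1]
    i = bisect_right(xs, x) - 1
    w = (x - xs[i]) / (xs[i + 1] - xs[i])
    return (1.0 - w) * fs[i] + w * fs[i + 1]

def solve_recurrence(F, G, c_grid, n_stages, delta, n_controls=101):
    """Tabulate f_N on c_grid via (66), where N = n_stages."""
    thetas = [k / (n_controls - 1) for k in range(n_controls)]
    # f_0(c) = max over 0 <= y <= c of F(c, y) * delta
    f = [max(F(c, th * c) * delta for th in thetas) for c in c_grid]
    for _ in range(n_stages):
        # f_{N+1}(c) = max [F(c, y) * delta + f_N(c + G(c, y) * delta)]
        f = [
            max(
                F(c, th * c) * delta
                + interp(c + G(c, th * c) * delta, c_grid, f)
                for th in thetas
            )
            for c in c_grid
        ]
    return f
```

For the hypothetical choice F(x,y) = y, G(x,y) = -y the optimal control is y = x at every stage, and one finds $f_N(c) = c\,[1 - (1-\Delta)^{N+1}]$, which gives a simple check on the tabulated values. Only a function of one variable is stored at each stage, in contrast with the two-dimensional grid required by (61).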
This technique is one that has worked very well on a number of examples.
16. Convergence problems. The question arises as to the convergence of $f_N(c)$ to f(c,T) as $\Delta \to 0$. Fortunately, this is a matter which can be resolved quite easily under various reasonable conditions upon the functions F(x,y) and G(x,y). Interestingly enough, it can be shown that the sequence $\{f_N(c) = f_N(c,\Delta)\}$ converges to a function f(c,T) under conditions which are weaker than those required to establish the existence of a solution of the corresponding variational problem. This point has significance in connection with the theory of multistage games of continuous type (cf. [8]). For a discussion of the convergence problem we refer to [12], [24], [25].
17. Structure theorems. As we have discussed above, in many of the variational problems arising in mathematical economics, we are interested not only in the numerical solution, but also in the structure of the optimal policy. Let us now indicate, in a heuristic fashion, how the nonlinear partial differential equation of (58) may be used to deduce the analytic character of the optimal policy in certain situations. Consider, as an example, the problem of maximizing
$$ J(y) = \int_0^T (x - y)\,dt, \tag{67} $$
where
$$ \text{a. } \frac{dx}{dt} = G(x,y), \quad x(0) = c, \qquad \text{b. } 0 \le y \le x. \tag{68} $$
The corresponding functional equation is
$$ \frac{\partial f}{\partial T} = \max_{0 \le v \le c} \Big[ (c - v) + G(c,v)\,\frac{\partial f}{\partial c} \Big]. \tag{69} $$
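Let us assume that the function G(x,y) satisfies the following conditions:
$$ \begin{aligned} &\text{a. } G(x,y) > 0, && (\text{for } 0 \le y \le x), \\ &\text{b. } \infty > m \ge G_y(x,y) > 0, && (\text{for } 0 \le y \le x), \\ &\text{c. } G_{yy}(x,y) < 0, && (\text{for } 0 \le y \le x), \\ &\text{d. } |G(x,y)| \le a_1 + a_2|x|, && (\text{for } 0 \le y \le x).^1 \end{aligned} \tag{70} $$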
It is clear that $\partial f/\partial c \to 0$ as $T \to 0$, and it is plausible that $\partial f/\partial c$ is a monotone increasing function of T for $T \ge 0$, if we think of an economic interpretation of the analytic problem.
FIG. 1
It follows then that for small T, with c fixed, the maximum will be at v = 0. As T increases, the maximum stays at v = 0 until we reach a value of T where (71)
ac
For larger T, there will be a maximum inside the interval [0,c] until T becomes so large that (72)
(;)= v
For larger values of T, the maximum remains at v = c. It follows that the "policy function," v = v(c,T), has the form shown in Fig. 1 as a function of c and T. Having determined the structure of the policy, we can now determine the precise boundary curves, employing (68), (71), and (72).
1 This condition is imposed to ensure the existence of a solution of (68a) for all $T \ge 0$.
A rigorous discussion of this problem, using classical variational methods, is presented in [13].
18. Linear functions and functionals. A particularly interesting class of problems derives from the general problem of Sec. 12 under the assumption that all functions involved are linear. Using vector-matrix notation, the problem is that of maximizing the inner product
$$ J(y) = \int_0^T \big(x(t), a\big)\,dt, \tag{73} $$
where
$$ \text{a. } \frac{dx}{dt} = Ax + By, \quad x(0) = c, \qquad \text{b. } Cy \le Dx. \tag{74} $$
An initial investigation of these problems, using the functional-equation approach outlined above, is contained in [5], where the importance of the dual problem is emphasized. A unified discussion of these problems will require the recent theory of distributions of L. Schwartz. For the solution of some specific problems of this general type, see [5], [18], and Lehman [23], where a continuous version of the "simplex method" of Dantzig is presented.

IV. SMOOTHING PROCESSES
19. Introduction. In the previous sections, we have discussed a variety of allocation processes which arise in mathematical economics. In this part, we wish to discuss some representative examples of a problem area which has become of great significance in recent years, the study of "smoothing processes."
We shall consider only a part of the economic aspects of the over-all problem, referring the reader interested in other aspects of the problem to the work of I. J. Schoenberg [26] on data smoothing, to the prediction theory of N. Wiener [29], and to the general theory of feedback control in servomechanism theory [27], where some related work on noise theory is also discussed.
20. General smoothing processes. Let us consider an economic system S specified by a vector x(t), and let y(t) be another vector, not necessarily of the same dimension, which we shall call the "control vector." The equation governing the system is assumed known, viz.,
$$ \frac{dx}{dt} = g(x,y), \qquad x(0) = c, \tag{75} $$
where g is a known function. It is desired to maintain the system in the state z(t), which is not the solution of (75). The loss incurred as a result of the fact that x(t) is not equal to
z(t) over [0,T] is measured by
$$ J(y) = \int_0^T \|x - z\|\,dG(t), \tag{76} $$
where $\|x - z\|$ is some norm such as
(77) a. b.
A more realistic, but less frequently treated, criterion is the Cebysev norm
$$ J_a(y) = \max_{0 \le t \le T} \|x - z\|. \tag{78} $$
A discussion of this norm using dynamic-programming techniques is contained in [10]. We now wish to choose the vector y so as to minimize J(y). In many cases, restrictions will exist covering the admissible y of the following forms:
$$ \text{a. } |y_i| \le m_i, \quad (i = 1, 2, \ldots, M), \qquad \text{b. } \int_0^T \|y\|\,dt \le c, \tag{79} $$
one or the other or both in conjunction. Processes of this general type are called smoothing processes. Alternatively, we may take the total cost of the smoothing process to consist of the cost of deviation, $\int_0^T \|x - z\|\,dG(t)$, plus the cost of control,
$$ a \int_0^T \|y\|\,dt, $$
where a is a positive constant. The y-norm and the (x,z)-norm are not necessarily the same.
21. Stochastic smoothing process. In some cases, in place of (75), the basic equation of the system is
$$ \frac{dx}{dt} = g\big(x,y,r(t)\big), \qquad x(0) = c, \tag{80} $$
where r(t) is a random function of t. A typical problem is now to determine y so as to minimize
$$ E\big(J(y)\big), \tag{81} $$
where E denotes the expected value over r.
22. Functional equations. It is clear that, although the underlying processes are quite distinct in nature, the analytic problems discussed in the preceding sections are equivalent to those arising from allocation processes, discussed in Part III. Consequently, the same techniques may be used to treat these problems.
In the following sections, we shall discuss some simple processes, in order to illustrate these methods.
23. A discrete smoothing process. Let us assume that we have a monthly demand $r_K$ for a given item, and production facilities which can be used to satisfy this demand. The rules of the game are that we must produce enough to fulfill the demand. When the demand goes up, we must hire more people to operate the facilities; when it goes down, we have a choice of firing some of these or keeping the same staff. The incentive to fire is obvious; the incentive to retain lies in the fact that there is a cost associated with hiring new people, or, generally speaking, in expanding facilities.
Let $x_K$ be the actual production rate, and take as the cost of excess production at the Kth stage the function $g_K(x_K - r_K)$. Furthermore, take as the cost of increasing production from the Kth to the (K + 1)st stage the function $h_K(x_{K+1} - x_K)$. We wish to determine a production schedule which minimizes the total cost of the N-stage process,
$$ J_N(x) = \sum_{K=1}^{N} g_K(x_K - r_K) + \sum_{K=1}^{N-1} h_K(x_{K+1} - x_K). \tag{82} $$
A simple choice for $h_K$ is
$$ h_K(x_{K+1} - x_K) = a_K \max(x_{K+1} - x_K,\, 0). \tag{83} $$
The variational problem is that of minimizing $J_N(x)$ over all sets $x_K$ satisfying the constraint
$$ x_K \ge r_K. \tag{84} $$
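Let us define
$$ f_R(c) = \min_{\{x\}} \Big[ \sum_{K=R}^{N} g_K(x_K - r_K) + \sum_{K=R-1}^{N-1} h_K(x_{K+1} - x_K) \Big], \qquad x_{R-1} = c, \tag{85} $$
for R = 2, 3, ..., N, and
$$ f_1(c) = \min_{\{x\}} J_N(x). \tag{86} $$
Then
$$ f_R(c) = \min_{x \ge r_R} \big[ g_R(x - r_R) + h_{R-1}(x - c) + f_{R+1}(x) \big], \tag{87} $$
for R = 2, 3, ..., N - 1.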
A particularly interesting case is that where the $g_K$ and $h_K$ are quadratic and the restriction $x_K \ge r_K$ is dropped. In this case, it can be shown inductively that each element of the sequence $\{f_R(c)\}$ is quadratic in c,
$$ f_R(c) = u_R + v_R c + w_R c^2, \tag{88} $$
and it is easy to obtain a recurrence relation for the sequence {uR,vR,wR}, using (87). The details are given in [11].
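The flavor of such a recurrence can be sketched under assumptions that are ours, not those of [11]: take every stage cost to be $(x - r)^2 + a\,(x - c)^2$, where c is the previous production level and a > 0 a hypothetical weight, drop the constraint on x, and set the cost beyond the last stage to zero. Minimizing a quadratic in x in closed form then maps the coefficients (u, v, w) of one stage into those of the preceding stage.

```python
def quadratic_coefficients(r, a):
    """Return (u, v, w) such that f_1(c) = u + v*c + w*c**2.

    r is the list of demands, stage by stage; each stage costs
    (x - r_k)**2 + a*(x - c)**2, an assumed quadratic choice of g and h,
    with no constraint on the production level x.
    """
    u = v = w = 0.0  # the cost beyond the last stage is taken to be zero
    for rk in reversed(r):
        A = 1.0 + a + w        # coefficient of x**2 inside the bracket
        B0 = v - 2.0 * rk      # part of the x-coefficient independent of c
        # The bracket is A*x**2 + (B0 - 2*a*c)*x + (rk**2 + a*c**2 + u);
        # its minimum over x is C - B**2 / (4*A), again quadratic in c.
        u, v, w = (
            rk * rk + u - B0 * B0 / (4.0 * A),
            a * B0 / A,
            a - a * a / A,
        )
    return u, v, w
```

For a single stage with r = 3 and a = 1 the minimizing level is x = (3 + c)/2 and the cost is $(c - 3)^2/2$, which the recurrence reproduces as (4.5, -3, 0.5). Only three numbers are carried from stage to stage, whatever the range of c.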
Closely related results may be obtained for the case where the restriction $x_K \ge r_K$ is retained.
24. The continuous version. A continuous version of the above discrete process brings up the problem of minimizing the integral
$$ J(x) = \int_0^T \Big[ g\big(x(t) - r(t)\big) + h\Big(\frac{dx}{dt}\Big) \Big]\,dt, \tag{89} $$
over all x(t) satisfying the condition $x(t) \ge r(t)$. For the case where the functions are undistinguished by any special properties, the simplest procedure is to use the discrete process of Sec. 23 as an approximation. If g and h are quadratic, there are simple results based upon the formulation of (88) (cf. [11]). The explicit solution of a particular case is given in [17] using special methods, and a number of related problems are discussed by Arrow, Beckmann, and Karlin in [2].
25. The criterion min max $\|x - z\|$. Problems involving the minimization of
$$ \max_{0 \le t \le T} \|x - z\|, \tag{90} $$
where
$$ \frac{dx}{dt} = g(x,y), \qquad x(0) = c, \tag{91} $$
and y(t) is the independent variable, which seem impervious, in general, to the usual variational attack, may be handled by the functional-equation technique. The procedure is discussed in [10].
A certain class of problems of this type can be resolved by variational techniques borrowed from the theory of games (cf. [6]).
26. The optimal inventory problem. A range of smoothing problems of great interest and difficulty may be grouped together under the heading of "optimal inventory" problems. The basic problem is that of ordering, or producing, in the face of an uncertain future demand, with various penalties for under- and overproduction. Since the subject is one that merits a full exposition of its own, we shall refer the interested reader to Arrow, Harris, and Marschak [1], Dvoretzky, Kiefer, and Wolfowitz [19], Whitin [28], and [14] and [15], where a variety of processes are discussed and treated.

V. COMPUTATIONAL SOLUTION OF A BOTTLENECK PROCESS
27. Introduction. In this part we wish to discuss the details of the numerical solution of a particular bottleneck process of the type described in Sec. 18.
28. A bottleneck process. Let us assume that we have two interdependent industries, the "auto" industry and the "steel" industry, and that the state of each industry at any particular time may be completely specified by two quantities, the stockpile of raw material required for production, and the maximum productive capacity. To simplify the problem for this discussion, we shall take the auto capacity to be unbounded. We assume that the purpose of the process is to produce as many autos as possible over a time period T. The process is taken to be discrete, with allocations made only at the times t = 0, 1, 2, ..., T - 1. At any particular time n let
(92)  $x_s(n)$ = amount of steel in the steel stockpile,
      $x_m(n)$ = capacity of steel mills.
At each time n the steel in the stockpile may be used for one of three purposes:
1. To produce additional steel, using the existing steel capacity
2. To increase the steel capacity
3. To produce autos, using the existing auto capacity
Let us write
$$ x_s(n) = y_s(n) + y_m(n) + y_a(n), \tag{93} $$
where
(94)  $y_s(n)$ = the quantity of steel used to produce additional steel,
      $y_m(n)$ = the quantity of steel used to increase steel capacity,
      $y_a(n)$ = the quantity of steel used to produce autos.
Let us introduce the following constraints on allowable allocations:
$$ \text{a. } 0 \le y_a(n) \le r_1 x_s(n), \qquad \text{b. } y_s(n) \le x_m(n). \tag{95} $$
The first constraint says that it is not possible to use more than a fixed percentage of steel for auto production over any stage, k to k + 1, while the second asserts that there is no point to allocating more steel to the steel mills than the maximum capacity. Let us now see how the state of the system is affected by these allocations, assuming linearity of production:
$$ x_s(n+1) = r_2\, y_s(n), \qquad x_m(n+1) = x_m(n) + r_3\, y_m(n). \tag{96} $$
Finally, let us assume that the quantity of autos produced in a stage is $y_a(n)$.
It is required to choose the quantities $y_s(n)$, $y_m(n)$, and $y_a(n)$, for n = 0, 1, ..., T - 1, so as to maximize the total quantity of autos produced over the time period [0,T], given the initial quantities,
$$ x_s(0) = c_1, \qquad x_m(0) = c_2. \tag{97} $$
29. Discussion. It is clear that this problem can be formulated as a linear programming problem, involving the maximization of the linear form
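$$ L(y) = \sum_{n=0}^{T-1} y_a(n), \tag{98} $$
subject to the constraints
$$ \begin{aligned} &\text{a. } y_s(n) + y_m(n) + y_a(n) = x_s(n), \\ &\text{b. } y_s,\; y_m,\; y_a \ge 0, \\ &\text{c. } y_a \le r_1 x_s, \\ &\text{d. } y_s \le x_m, \\ &\text{e. } x_s(n+1) = r_2\, y_s(n), \\ &\text{f. } x_m(n+1) = x_m(n) + r_3\, y_m(n). \end{aligned} \tag{99} $$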
If, however, T is of moderate size, say 20 or 30, we see that the problem is a formidable one, although not by any means impossible. As we have noted above, the interest in the solution of problems of the type posed above lies not so much in the numerical solution of a particular representative as in the determination of the structure of an optimal policy. In the next section we shall formulate the problem in dynamic terms and see how our requirements concerning a solution are met.
30. Functional equations. Let us define the sequence
$$ f_T(c_1,c_2) = \max_{\{y\}} L(y), \tag{100} $$
where L(y) is, as in (98), the total auto production resulting from a sequence of allocations. Then, clearly,
$$ f_1(c_1,c_2) = \max_{y_a \le r_1 c_1} y_a(0) = r_1 c_1, \tag{101} $$
and
$$ f_T(c_1,c_2) = \max_{R} \big[ y_a + f_{T-1}(r_2 y_s,\; c_2 + r_3 y_m) \big], \tag{102} $$
for T = 2, 3, ..., where the maximization is over the region R defined by
$$ \begin{aligned} &\text{a. } y_a + y_m + y_s = c_1, \\ &\text{b. } y_a,\; y_s,\; y_m \ge 0, \\ &\text{c. } y_a \le r_1 c_1, \\ &\text{d. } y_s \le c_2. \end{aligned} \tag{103} $$
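A rough sketch of the recurrence (101) to (103) follows. It makes no attempt at an efficient search: allocations are simply tried on a coarse lattice of fractions of the admissible ranges, which is an assumption of this sketch, as are the numerical values chosen for r1, r2, r3 and the requirement r1 <= 1. The result is therefore a lower bound on $f_T$ that need not be exact for T >= 2.

```python
def f(T, c1, c2, r1=0.5, r2=2.0, r3=1.0, n=5):
    """Approximate f_T(c1, c2) of (101)-(103) by a coarse lattice search.

    The lattice of n-by-n trial allocations is an illustrative assumption;
    r1 <= 1 is assumed so that the steel left after autos is nonnegative.
    """
    if T == 1:
        return r1 * c1  # equation (101)
    best = 0.0
    for i in range(n):
        ya = (i / (n - 1)) * r1 * c1       # 0 <= ya <= r1*c1, (103c)
        ys_max = min(c2, c1 - ya)          # ys <= c2 (103d), ys <= c1 - ya
        for j in range(n):
            ys = (j / (n - 1)) * ys_max
            ym = c1 - ya - ys              # remainder expands capacity (103a)
            value = ya + f(T - 1, r2 * ys, c2 + r3 * ym, r1, r2, r3, n)
            best = max(best, value)
    return best
```

Because every trial allocation is a fixed fraction of quantities that scale with (c1, c2), the computed values inherit the homogeneity of the process: doubling both c1 and c2 doubles the result.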
31. Discussion. Given the function $f_1(c_1,c_2)$, as determined by (101), we can now compute $f_2(c_1,c_2)$ from (102), etc. Although the maximization is over a three-dimensional region, we know, actually, that the maximum occurs at a vertex of the polyhedral region determined by (103). These vertices may readily be computed, and thus the maximization is reduced to the trivial problem of choosing among a handful of alternatives, each of which has an important economic interpretation.
There are, however, still two unpleasant aspects remaining to the computation. We must, first of all, compute a sequence of functions of two variables, requiring therefore a large memory in our digital computer, and, second, we are faced by the possibility of an expanding region in the $(c_1,c_2)$-plane. This means that, to compute the values of $f_T(c_1,c_2)$ in some rectangle $0 \le c_1 \le \bar{c}_1$, $0 \le c_2 \le \bar{c}_2$, we must determine $f_{T-1}(c_1,c_2)$ in a larger rectangle, $f_{T-2}(c_1,c_2)$ in a still larger rectangle, etc. This can be a serious roadblock to feasible computation.
32. Reduction in dimension and constrained grid. Let us now show how we can utilize the homogeneity of the process so as to reduce the sequence of functions of two variables to sequences of functions of one variable, and simultaneously to maintain a uniform region of values for all T. We begin with the observation that
$$ f_T(c_1,c_2) = c_1 f_T\Big(1, \frac{c_2}{c_1}\Big) = c_2 f_T\Big(\frac{c_1}{c_2}, 1\Big) \tag{104} $$
for $c_1, c_2 > 0$. Thus it is sufficient to determine the sequence $\{f_T(1,x)\}$ for all x > 0, or $\{f_T(x,1)\}$ for all x > 0. Let
$$ g_T(x) = f_T(1,x), \qquad h_T(x) = f_T(x,1). \tag{105} $$
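It will be shown that the computation of $f_T(c_1,c_2)$ can be reduced to the simultaneous computation of the two sequences $\{g_T(x)\}$ and $\{h_T(x)\}$. Returning to (102), we have
$$ g_T(x) = f_T(1,x) = \max_{R} \big[ y_a + f_{T-1}(r_2 y_s,\; c_2 + r_3 y_m) \big], \tag{106} $$
with $c_1 = 1$, $c_2 = x$. Hence
$$ g_T(x) = \max_{R} \Big[ y_a + r_2 y_s\, f_{T-1}\Big(1, \frac{c_2 + r_3 y_m}{r_2 y_s}\Big) \Big] = \max_{R} \Big[ y_a + r_2 y_s\, g_{T-1}\Big(\frac{c_2 + r_3 y_m}{r_2 y_s}\Big) \Big], \tag{107} $$
if $r_2 y_s \ge c_2 + r_3 y_m$, and
$$ g_T(x) = \max_{R} \Big[ y_a + (c_2 + r_3 y_m)\, h_{T-1}\Big(\frac{r_2 y_s}{c_2 + r_3 y_m}\Big) \Big], \tag{108} $$
if $r_2 y_s < c_2 + r_3 y_m$.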
A computation of this type is readily performed on a digital computer. In conjunction with Stuart Dreyfus, a number of these computations have been run, disclosing some interesting structural behavior of the optimal policy, which we shall discuss elsewhere.
BIBLIOGRAPHY
1. K. Arrow, T. E. Harris, and J. Marschak, Optimal inventory policy, Econometrica (July, 1951).
2. K. Arrow, M. J. Beckmann, and S. Karlin, Game theory methods applied to the optimal expansion of the capacity of a firm, Stanford Univ. Dept. of Economics, Tech. Rep. 27 (1955).
3. R. Bellman, The theory of dynamic programming, Bull. Amer. Math. Soc. vol. 60 (1954) pp. 503-516.
4. R. Bellman, Dynamic programming, in E. F. Beckenbach (ed.), Mathematics for modern engineers, McGraw-Hill Book Company, Inc., New York, 1956, chap. 11.
5. R. Bellman, Dynamic programming of continuous processes, RAND Monograph R-271 (1954).
6. R. Bellman, An introduction to the theory of dynamic programming, RAND Monograph R-245 (1953).
7. R. Bellman, Dynamic programming and a new formalism in the calculus of variations, Rivista di Parma vol. 6 (1955) pp. 193-213.
8. R. Bellman, Functional equations in the theory of dynamic programming, III, Multi-stage games, Studies in game theory, Rendiconti di Palermo, Serie II, Tomo V (1956) pp. 1-23.
9. R. Bellman, Functional equations in the theory of dynamic programming, IV, Multi-stage decision processes of continuous type, The RAND Corp., Paper No. P-705, July 12, 1955.
10. R. Bellman, On the minimum of maximum deviation, Quart. Appl. Math. vol. XIV (1957) pp. 419-423.
11. R. Bellman, On a class of variational problems, Quart. Appl. Math. vol. XIV (1957) pp. 353-359.
12. R. Bellman, Functional equations in the theory of dynamic programming, VI, A direct convergence proof, Ann. of Math. vol. 65 (1957) pp. 215-223.
13. R. Bellman, W. Fleming, and D. V. Widder, On a class of variational problems, Ann. di Mat. Serie IV Tomo XLI (1956) pp. 301-323.
14. R. Bellman, I. Glicksberg, and O. Gross, On some nonlinear integral equations occurring in the theory of dynamic programming, Proc. Nat. Acad. Sci. U.S.A. vol. 41 (1955) pp. 227-229.
15. R. Bellman, I. Glicksberg, and O. Gross, On the optimal inventory equation, Management Sci. vol. 2 (1955) pp. 83-104.
16. R. Bellman, I. Glicksberg, and O. Gross, On some unconventional problems in the calculus of variations, Proc. Amer. Math. Soc. vol. 7 (1956) pp. 87-94.
17. R. Bellman and O. Gross, Some problems in the theory of dynamic programming, a smoothing problem, J. Soc. Ind. Appl. Math. vol. 2 (1954) pp. 82-89.
18. R. Bellman and S. Lehman, Studies in bottleneck processes, RAND Paper P-492 (1953).
19. A. Dvoretzky, J. Kiefer, and J. Wolfowitz, The inventory problem, I, II, Econometrica vol. 20 (1952) pp. 187-222.
20. T. E. Harris, Some mathematical models for branching processes, 2d Berkeley Symposium on Mathematical Statistics and Probability (1950) pp. 305-328.
21. S. M. Johnson, Best exploration for maximum is fibonaccian, The RAND Corp., Paper No. P-856, May 4, 1956.
22. J. Kiefer, Sequential minimax search for a maximum, Proc. Amer. Math. Soc. vol. 4 (1953).
23. S. Lehman, On the continuous simplex method, RAND Research Memorandum RM-1386 (1953).
24. H. Osborn, On the convergence of discrete stochastic processes to their continuous analogies, RAND Corporation RM-1368 (1954).
25. H. Osborn, The problem of continuous programs, RAND Paper P-718 (1954).
26. I. J. Schoenberg, On smoothing operations and their generating functions, Bull. Amer. Math. Soc. vol. 59 (1953) pp. 199-230.
27. J. G. Truxal, Automatic feedback control system synthesis, McGraw-Hill Book Company, Inc., New York, 1955.
28. T. Whitin, The theory of inventory control, Princeton University Press, Princeton, N.J., 1953.
29. N. Wiener, Cybernetics, John Wiley & Sons, Inc., New York, 1951.

Note added in proof, December, 1957:
Since this article was written, the author has obtained a number of additional results which may be found in his book
1. Dynamic programming, Princeton University Press, Princeton, N.J., 1957.
and in the following articles:
2. On the application of the theory of dynamic programming to the study of control processes, Proc. of the Symposium on Nonlinear Circuit Analysis, Polytechnic Institute of Brooklyn, 1956.
3. Terminal control, time-lags and dynamic programming, Proc. Nat. Acad. Sci. USA vol. 43 (1957) pp. 927-930.
4. Dynamic programming, nonlinear variational processes, and successive approximations, The RAND Corp., Paper No. P-1133, Sept. 6, 1957.
5. Some new techniques in the dynamic programming solution of variational problems, Quart. Appl. Math. (to appear).
6. Dynamic programming and the computational solution of feedback design control problems, Conference on Computers in Control, AIEE Atlantic City Oct. 16-18, 1957.
7. Dynamic programming, successive approximations and variational problems of combinatorial nature, The RAND Corp., Paper No. 1175, Sept. 13, 1957.
8. Dynamic programming approach to optimal inventory processes with delay in delivery, The RAND Corp., Paper No. P-1227, Dec. 6, 1957.

THE RAND CORPORATION, SANTA MONICA, CALIF.
VARIATIONAL METHODS IN HYDRODYNAMICS BY
S. CHANDRASEKHAR
1. Introduction. Studies in hydrodynamic and hydromagnetic stability have led to characteristic-value problems in differential equations of high order, and it has been possible to solve several of them by variational methods which, at least in the manner of their applications, appear novel. In this paper we shall briefly describe two such examples; a list of further examples will be found in Chandrasekhar [1].

2. First example. In the study of the instability of a layer of fluid heated from below and subject to Coriolis forces resulting from rotation with an angular velocity $\Omega$ about the vertical, one is led to the following problem: To solve
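$$(D^2 - a^2 - i\sigma)Z = -\frac{2\Omega}{\nu}\,d\,DW, \tag{1}$$
$$(D^2 - a^2)(D^2 - a^2 - i\sigma)W - \frac{2\Omega}{\nu}\,d^3\,DZ = F, \tag{2}$$
and
$$(D^2 - a^2 - i\omega\sigma)F = -Ra^2W, \tag{3}$$
together with the boundary conditions
$$W = F = 0 \quad \text{for } z = \pm\tfrac{1}{2} \tag{4}$$
and
$$\begin{aligned}
\text{Either}\quad & DW = Z = 0 \text{ on } z = \pm\tfrac{1}{2},\\
\text{Or}\quad & D^2W = DZ = 0 \text{ on } z = \pm\tfrac{1}{2},\\
\text{Or}\quad & DW = Z = 0 \text{ on } z = +\tfrac{1}{2} \text{ and } D^2W = DZ = 0 \text{ on } z = -\tfrac{1}{2},
\end{aligned} \tag{5}$$
where $D = d/dz$, $a$, $\Omega$, $\nu$, and $\omega$ are assigned constants, and $\sigma$ is a parameter to be determined by the condition that the characteristic value $R$ is real. The solution of the physical problem requires the minimum (with respect to $a^2$) of these real characteristic values of $R$ for various assigned values of $T$ ($= 4\Omega^2d^4/\nu^2$) and $\omega$.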
A consideration of the foregoing double characteristic-value problem (double since both $\sigma$ and $R$ are to be determined) leads to the following variational principle:
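From equations (1) to (3) it follows that
$$R = \frac{\displaystyle\int_{-1/2}^{+1/2} \bigl[(DF)^2 + (a^2 + i\omega\sigma)F^2\bigr]\,dz}{\displaystyle a^2\int_{-1/2}^{+1/2} \bigl\{[(D^2 - a^2)W]^2 + d^2[(DZ)^2 + a^2Z^2] + i\sigma[(DW)^2 + a^2W^2 + d^2Z^2]\bigr\}\,dz}. \tag{6}$$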
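As a sanity check on formula (6), the following sketch evaluates it in a special case chosen here for illustration and not treated in the text: no rotation ($\Omega = 0$, so that $Z = 0$), $\sigma = 0$, the boundary conditions $W = D^2W = 0$ on $z = \pm\tfrac{1}{2}$, and the trial function $F = \cos \pi z$. Equation (2) then gives $W = F/(\pi^2 + a^2)^2$, and (6) reduces to $R = (\pi^2 + a^2)^3/a^2$, whose minimum over $a$ is the classical value $27\pi^4/4$. The function names and grid sizes are assumptions of the sketch.

```python
import numpy as np


def trapezoid(y, x):
    """Composite trapezoidal rule, written out to avoid NumPy version differences."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def rayleigh_number(a, n_points=2001):
    """Evaluate formula (6) for Omega = 0, sigma = 0 and F = cos(pi z)."""
    z = np.linspace(-0.5, 0.5, n_points)
    F = np.cos(np.pi * z)
    DF = -np.pi * np.sin(np.pi * z)
    lam = np.pi**2 + a**2
    # With Omega = 0 and sigma = 0, equation (2) reads (D^2 - a^2)^2 W = F,
    # so W = F / lam^2 and (D^2 - a^2) W = -F / lam; Z vanishes identically.
    op_W = -F / lam
    numerator = trapezoid(DF**2 + a**2 * F**2, z)
    denominator = a**2 * trapezoid(op_W**2, z)
    return numerator / denominator


def critical_rayleigh_number(a_grid=None):
    """Minimize R over the wave number a by a simple grid scan."""
    if a_grid is None:
        a_grid = np.linspace(1.0, 4.0, 3001)
    values = np.array([rayleigh_number(a) for a in a_grid])
    best = int(np.argmin(values))
    return a_grid[best], values[best]


if __name__ == "__main__":
    a_c, R_c = critical_rayleigh_number()
    print(a_c, R_c, 27 * np.pi**4 / 4)
```

In this special case the trial function happens to be an exact solution, so the scan reproduces the closed-form minimum up to the resolution of the grid in $a$.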
It can now be readily verified that the variation $\delta R$ in $R$ given by equation (6) due to variations $\delta W$ and $\delta Z$ in $W$ and $Z$ compatible only with the boundary conditions on $W$, $Z$, and $F$, is given by
$$\delta R = -\frac{2}{a^2I_2}\int_{-1/2}^{+1/2} \delta F\,\bigl[(D^2 - a^2 - i\omega\sigma)F + Ra^2W\bigr]\,dz, \tag{7}$$
where $I_2$ stands for the integral in the denominator of the expression on the right-hand side of (6). Accordingly, $\delta R = 0$ for all small arbitrary variations $\delta F$, provided that
$$(D^2 - a^2 - i\omega\sigma)F + Ra^2W = 0, \tag{8}$$
i.e., if the differential equation governing W is satisfied. On this account, formula (6) provides the basis for a variational procedure for solving equations (1) to (3) (for any assigned $a^2$ and $\sigma$) and satisfying the boundary conditions of the problem. It should be noted that formula (6) does not express $R$ as the ratio of two positive-definite integrals; indeed, for an arbitrarily assigned $\sigma$, $R$ will be complex. Nevertheless, it appears that the simplest trial function for $F$, namely, $F = \cos \pi z$, already leads to surprisingly accurate determinations for the characteristic values provided that, for the chosen form of $F$, the functions $W$ and $Z$ are determined as solutions of equations (1) and (2). (For the details of the solution and for the comparison with experimental results, see Chandrasekhar and Elbert [2] and Fultz and Nakagawa [3].)

3. Second example. The stability of an incompressible, heavy, viscous fluid of variable density leads to the following characteristic-value problem (Chandrasekhar [4]): To solve
$$D\Bigl\{\Bigl[\rho - \frac{\mu}{n}(D^2 - k^2)\Bigr]Dw - \frac{1}{n}(D\mu)(D^2 + k^2)w\Bigr\} = k^2\Bigl\{-\frac{g}{n^2}(D\rho)w + \Bigl[\rho - \frac{\mu}{n}(D^2 - k^2)\Bigr]w - \frac{2}{n}(D\mu)(Dw)\Bigr\}, \tag{9}$$
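together with the boundary conditions
$$w = 0 \quad \text{for } z = 0 \text{ and } 1, \tag{10}$$
and
$$\begin{aligned}
\text{Either}\quad & Dw = 0 \text{ for } z = 0 \text{ and } 1,\\
\text{Or}\quad & D^2w = 0 \text{ for } z = 0 \text{ and } 1,\\
\text{Or}\quad & Dw = 0 \text{ for } z = 0 \text{ and } D^2w = 0 \text{ for } z = 1,\\
\text{Or}\quad & D^2w = 0 \text{ for } z = 0 \text{ and } Dw = 0 \text{ for } z = 1,
\end{aligned} \tag{11}$$
where $\rho = \rho(z)$ and $\mu = \mu(z)$ are given functions of $z$, $k$ is an assigned (real) constant, and $n$ is the characteristic-value parameter. (Note that $n$ can be complex.)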
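One can deduce from equation (9) that
$$n\int_0^1 \rho\Bigl\{w^2 + \frac{1}{k^2}(Dw)^2\Bigr\}\,dz - \frac{g}{n}\int_0^1 (D\rho)w^2\,dz = -\int_0^1 \Bigl\{\mu\Bigl[k^2w^2 + 2(Dw)^2 + \frac{1}{k^2}(D^2w)^2\Bigr] + (D^2\mu)w^2\Bigr\}\,dz; \tag{12}$$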
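and again this last equation provides the basis for a convenient variational procedure for determining $n$. For, considering the effect on $n$ [determined in accordance with equation (12)] of an arbitrary variation $\delta w$ in $w$ compatible only with the boundary conditions on $w$, we find that
$$-\frac{1}{2}k^2\Bigl(I_1 + \frac{g}{n^2}I_2\Bigr)\frac{\delta n}{n} = \int_0^1 \delta w\,\Bigl\{k^2\Bigl[\rho w - \frac{\mu}{n}(D^2 - k^2)w - \frac{g}{n^2}(D\rho)w - \frac{2}{n}(D\mu)(Dw)\Bigr] - D\Bigl[\rho\,Dw - \frac{\mu}{n}(D^2 - k^2)Dw - \frac{1}{n}(D\mu)(D^2 + k^2)w\Bigr]\Bigr\}\,dz, \tag{13}$$
where
$$I_1 = \int_0^1 \rho\Bigl\{w^2 + \frac{1}{k^2}(Dw)^2\Bigr\}\,dz \tag{14}$$
and
$$I_2 = \int_0^1 (D\rho)w^2\,dz. \tag{15}$$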
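Multiplying (12) by $n$ shows that, for a given trial function, $n$ is a root of the quadratic $I_1n^2 + I_3n - gI_2 = 0$, where $I_3$ (a name introduced here) denotes the integral on the right-hand side of (12) and $g$ is taken to be the acceleration of gravity. The following sketch evaluates the three integrals numerically and solves the quadratic. The exponential density profile, the constant viscosity, the trial function $w = \sin \pi z$ (which satisfies $w = D^2w = 0$ at $z = 0$ and $1$), and all numerical values are hypothetical choices made only for illustration.

```python
import numpy as np


def trapezoid(y, x):
    """Composite trapezoidal rule, written out to avoid NumPy version differences."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def growth_rates(beta, k=2.0, mu=0.1, g=9.81, n_points=2001):
    """Roots n of I1*n^2 + I3*n - g*I2 = 0 for the trial function w = sin(pi z).

    beta is the logarithmic density gradient of the assumed profile
    rho(z) = exp(beta * z); beta > 0 means heavier fluid on top.
    """
    z = np.linspace(0.0, 1.0, n_points)
    rho = np.exp(beta * z)
    Drho = beta * rho
    w = np.sin(np.pi * z)
    Dw = np.pi * np.cos(np.pi * z)
    D2w = -np.pi**2 * w
    I1 = trapezoid(rho * (w**2 + Dw**2 / k**2), z)
    I2 = trapezoid(Drho * w**2, z)
    # Constant viscosity, so the (D^2 mu) w^2 term drops out of I3.
    I3 = trapezoid(mu * (k**2 * w**2 + 2.0 * Dw**2 + D2w**2 / k**2), z)
    return np.roots([I1, I3, -g * I2])


if __name__ == "__main__":
    print("heavier fluid on top:", growth_rates(0.5))
    print("lighter fluid on top:", growth_rates(-0.5))
```

With heavier fluid on top ($\beta > 0$) the product of the two roots is negative, so one root is real and positive; with $\beta < 0$ both roots have negative real part. Neither the sum nor the product of the roots has a fixed sign in general, which is the sense in which the coefficients are not positive-definite.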
It will be noticed that the variational procedure in this instance involves the solution of a quadratic equation none of whose coefficients are positive-definite. Nevertheless, as Hide [5] has shown, simple trial functions for $w$ satisfying the boundary conditions enable the complicated dependence of $n$ on the various parameters of the problem to be determined.

BIBLIOGRAPHY

1. S. Chandrasekhar, On characteristic value problems in high order differential equations which arise in studies on hydrodynamic and hydromagnetic stability, Amer. Math. Monthly vol. 61 (1954) pp. 32-45.
2. S. Chandrasekhar and D. Elbert, The instability of a layer of fluid heated below and subject to Coriolis forces, II, Proc. Roy. Soc. London Ser. A vol. 231 (1955) pp. 198-210.
3. D. Fultz and Y. Nakagawa, Experiments on over-stable thermal convection in mercury, Proc. Roy. Soc. London Ser. A vol. 231 (1955) pp. 211-225.
4. S. Chandrasekhar, The character of the equilibrium of an incompressible heavy viscous fluid of variable density, Proc. Cambridge Philos. Soc. vol. 51 (1955) pp. 162-178.
5. R. Hide, The character of the equilibrium of an incompressible heavy viscous fluid of variable density: an approximate theory, Proc. Cambridge Philos. Soc. vol. 51 (1955) pp. 179-201.

UNIVERSITY OF CHICAGO, CHICAGO, ILL.
SOME APPLICATIONS OF FUNCTIONAL ANALYSIS TO THE CALCULUS OF VARIATIONS BY
E. H. ROTHE
1. Introduction. Methods of functional analysis for proving existence and other theorems in the calculus of variations have been used for a long time¹ and by many mathematicians. As in [9] and [10], the treatment in the present paper will be based on the well-known fact that the ball in a reflexive Banach space is weakly compact and on the theory of gradient mappings. For the general motivation and the facts needed of the latter theory, the reader is referred to the introduction of [10] and the literature mentioned there. In [10] a multiple-integral problem was treated, and among other things an existence proof was given. In Sec. 2 of the present paper a simpler existence proof for the problem in question is given by the explicit use of Friedrichs' theory of half-bounded operators [2]. (See also the exposition of this theory in [6].)
In Sec. 3 the method used in [10] is applied to problems connected with the integral
$$j(y) = \int_{a_1}^{a_2} f(t, y_1, \ldots, y_n, p_1, \ldots, p_n)\,dt, \qquad \Bigl(p_i = \frac{dy_i}{dt}\Bigr), \tag{1}$$
in the special case that²
$$f = \sum_{i,k=1}^{n} a_{ik}(t)\,p_ip_k + \sum_{i=1}^{n} b_i(t)\,p_i + c(t, y_1, \ldots, y_n), \tag{2}$$
the object at present being not to obtain new results or even the same generality as in the classical theory (see, for example, [7]) but to show that the theory mentioned above gives a new approach to problems of this kind. For instance, the "index form" connected with the integral (1) will not be defined
by "intermediate manifolds" [7, Chap. III] but will appear directly as a quadratic form in Hilbert space (cf. also [4]) in the form of a scalar product (x, x + K(x)), where K is a completely continuous symmetric linear operator. The classical spectral theorem for such operators allows then the immediate
conclusion that the form can be written as $\sum_\nu \lambda_\nu x_\nu^2$ and that at most a finite number of the coefficients $\lambda_\nu$ are negative.

2. Existence proof for a multiple-integral problem. Let
$$j(y) = \int_D f(t, y, y_1, \ldots, y_n)\,dt. \tag{3}$$

¹ For a survey of older literature (until 1931) of the subject, see [11].
² For notations and assumptions, see the beginning of Sec. 3.
Here $D$ denotes a bounded open domain in the $n$-dimensional real Euclidean space of points $t$ with coordinates $t_1, t_2, \ldots, t_n$, while $y_i = \partial y/\partial t_i$, and
$$f = \sum_{i,k=1}^{n} a_{ik}(t)\,y_iy_k + \sum_{i=1}^{n} b_i(t)\,y_i + c(t,y). \tag{4}$$

We make the following assumptions: the boundary $\dot D$ of $D$ has Hoelder continuous derivatives up to order three. The $a_{ik}(t)$ are defined in the closure $\bar D$ of $D$ and have there Hoelder continuous derivatives up to order two, while their matrix is positive-definite. The $b_i(t)$ are in $L_2 = L_2(D)$;³ $c(t,y)$ is continuous for $t \in D$ and $-\infty < y < +\infty$ and satisfies a (uniform) Lipschitz condition in $y$. We consider the minimum problem of (3) among functions $y = y(t)$ satisfying the boundary condition
$$y = 0 \qquad (\text{on } \dot D). \tag{5}$$

As usual, the scalar product of two elements $y$ and $\eta$ of $L_2$ is denoted by $(y,\eta)$ and $\|y\| = +\sqrt{(y,y)}$. Moreover, we denote by $H$ the set of those $y \in L_2$ satisfying (5) which have continuous derivatives up to order three in $D$. For $y, \eta$ in $H$, we set
$$Q(y,\eta) = 2\int_D \sum_{i,k=1}^{n} a_{ik}\,y_i\eta_k\,dt. \tag{6}$$
$Q$ is symmetric. Moreover, $Q$ is half-bounded; more precisely, there exists a constant $C$ such that⁴
$$\|y\|^2 \le C\,Q(y,y), \qquad (y \in H). \tag{7}$$
Moreover, if we set
$$L(y) = -2\sum_{i,k=1}^{n} \frac{\partial}{\partial t_i}\Bigl(a_{ik}\frac{\partial y}{\partial t_k}\Bigr), \tag{8}$$
an application of Green's theorem [taking (5) into consideration] gives us
$$Q(\eta,y) = (L(\eta),y), \qquad (y \in H). \tag{9}$$
It follows [6, p. 16, example 2.1] that $Q$ is closable.⁵ Let then $\bar Q$ be the closure of $Q$ and $H_Q$ the domain of $\bar Q$. Then $H_Q$ is a complete Hilbert space if the scalar product and norm in $H_Q$ are defined by
$$(\eta,y)_Q = \bar Q(\eta,y), \qquad \|y\|_Q = +\sqrt{(y,y)_Q}. \tag{10}$$
Moreover, $L$ can be extended to a self-adjoint operator $\bar L$, whose domain $H_L$ is dense in $H_Q$. (For a proof of these facts, see [6, pp. 12-13 and p. 24, Theorem 4.1].) Relation (7) is then valid for $\bar Q$ and all $y \in H_Q$, and (9) holds for the extensions $\bar L$ and $\bar Q$, that is, for all $y \in H_Q$ and $\eta \in H_L$.

³ $L_2(D) = \{x(t) \mid \int_D x^2\,dt < \infty\}$.
⁴ Inequality (7) follows from (2.10) in [10] if, in that inequality, $x_n - x_m$ is replaced by $x$ and $y_n - y_m$ by $y = M(x)$. Note that, by (1.11) and (2.4) of that paper and (6) of the present paper, $Q(y,y) = [x,x]$.
⁵ A form $Q$ with domain $\mathfrak{D}_Q$ is closed if the following is true: if $y_n$ is a sequence of elements of $\mathfrak{D}_Q$ such that $y_n \to y$ and $Q(y_n - y_m, y_n - y_m) \to 0$ as $n, m \to \infty$, then $y \in \mathfrak{D}_Q$ and $Q(y - y_n, y - y_n) \to 0$. $Q$ is closable if it has a closed extension. For details see, for example, [6, pp. 11-13].

That $j(y)$ can be extended to $H_Q$ is seen as follows: Obviously we have to show that, if $\{y_n\}$ is a sequence of elements of $H$ which is a Cauchy sequence in $H_Q$, then $j(y_n)$ is a Cauchy sequence (of real numbers). Now by (3), (4), (6), and (10), we have
$$j(y) = \tfrac{1}{2}\|y\|_Q^2 + I(y), \tag{11}$$
with
$$I(y) = \int_D \sum_{i=1}^{n} b_i(t)\,y_i\,dt + \int_D c(t,y)\,dt. \tag{12}$$
Now (7) shows that $\{y_n\}$ is a Cauchy sequence also in $L_2$, and the assumed Lipschitz condition on $c(t,y)$ shows then that $\int_D c(t,y_n)\,dt$ is a Cauchy sequence. To prove the same for the sequence
$$\Bigl\{\int_D \sum_{i=1}^{n} b_i(t)\,y_i^{(n)}(t)\,dt\Bigr\}, \tag{13}$$
we note that the positivity assumption concerning the matrix of the $a_{ik}$ implies the existence of a positive constant $m$ such that
$$\int_D \sum_{i=1}^{n} y_i^2\,dt \le \frac{1}{m}\int_D \sum_{i,k=1}^{n} a_{ik}\,y_iy_k\,dt = \frac{1}{2m}\,Q(y,y). \tag{14}$$
This inequality obviously implies that the sequence $\{y_i^{(n)}\}$ is a Cauchy sequence, and therefore also (13). Finally it is clear that $\|y_n\|_Q$ is a Cauchy sequence.

We now prove the following: Let $K$ be a positive number and $V$ the ball of radius $K$ with the zero element of $H_Q$ as center, that is, $V = \{y \mid \|y\|_Q \le K\}$. Then $j(y)$ takes a minimum in $V$; that is, there is a $y_0 \in V$ such that $j(y) \ge j(y_0)$ for all $y$ in $V$.⁶

⁶ $y_0$ is an element of $L_2$, since, as a point set, $H_Q$ can be identified with a subset of $L_2$. Cf. [6, p. 14].

For the proof we denote by $V_K$ the topological space which, as point set, coincides with $V$ and whose topology is induced by the intersections of the usual weak neighborhoods with $V$ [8, p. 423, Definition 2.4]. Since $V_K$ is compact, it will be sufficient to prove that $j(y)$ is lower semicontinuous in $V_K$ [8, p. 424, Lemma 2.3]. Since the square of a Banach space norm is lower semicontinuous in $V_K$ [8, p. 424, Lemma 2.4], the expression (11) for $j(y)$ shows that only the lower semicontinuity for $I(y)$ has to be proved. Actually $I(y)$ is continuous in $V_K$, which may be seen as follows: The continuity in $V_K$ of the first integral in the expression (12) for $I(y)$ is trivial, since this linear functional of $y$ is bounded in $H_Q$ as follows readily from the inequality (14) [together with (10)]. Thus it remains to prove the continuity in $V_K$ of the second integral in (12). We remark first that the Lipschitz assumption concerning $c(t,y)$ implies the existence of a positive constant $\alpha$ such that
$$\Bigl(\int_D \bigl(c(t, y + \eta) - c(t,y)\bigr)\,dt\Bigr)^2 \le \Bigl(\alpha\int_D |\eta|\,dt\Bigr)^2 \le \alpha^2\|\eta\|^2 \cdot (\text{measure of } D).$$
This inequality shows that it will be sufficient to prove the following lemma:

LEMMA. $\|y\|$ is continuous in $V_K$.⁷

Proof. To given $y_0 \in V_K$ and $\varepsilon > 0$ we have to construct a neighborhood $W$ of $y_0$ in $V_K$ such that
$$(\|y\| - \|y_0\|)^2 < \varepsilon^2, \qquad (\text{for } y \in W). \tag{15}$$
Now by Friedrichs' inequality [1, p. 489], there exist, to given $\delta > 0$, elements $w_1, \ldots, w_N$ such that
$$\|h\|^2 = (h,h) \le \sum_{\nu=1}^{N} (h,w_\nu)^2 + \delta\,Q(h,h). \tag{16}$$
But $l_\nu(h) = (h,w_\nu)$ is a bounded linear functional in $H_Q$, as follows immediately from (7). We now choose $\delta = \varepsilon^2/8K^2$ and define $W$ by
$$W = \Bigl\{y = y_0 + h \Bigm| |l_\nu(h)| < \frac{\varepsilon}{\sqrt{2N}}\Bigr\} \cap V_K. \tag{17}$$
We then see from (16) and (10) that the left member of (15) is not greater than
$$\|h\|^2 \le \sum_{\nu=1}^{N} |l_\nu(h)|^2 + \frac{\varepsilon^2}{8K^2}\,\|h\|_Q^2,$$
and this is indeed not greater than $\varepsilon^2$, as follows from (17) together with the fact that $\|h\|_Q \le 2K$. This finishes the existence proof.

We finally remark: If $c > 0$, then there exists a $K_0$ such that, for $K \ge K_0$, the minimum of $j(y)$ in $V_K$ is not taken at the boundary of $V_K$. Therefore, by what has just been proved, there exists an interior minimum point $y_0$. If $g(y) = \operatorname{grad} j(y)$ exists, then $g(y_0) = 0$; i.e., $y_0$ is a critical point. The proofs of these facts are essentially the same as the proofs for the corresponding facts in [10], to which we therefore refer, particularly to Theorem 3.14 of that paper.

⁷ This lemma is closely connected with Rellich's theorem. See [1, p. 489].
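The arithmetic behind the choice $\delta = \varepsilon^2/8K^2$ in the proof of the lemma can be spelled out as follows. This is a sketch which assumes that the neighborhood $W$ in (17) is cut out by the bounds $|l_\nu(h)| < \varepsilon/\sqrt{2N}$ for $\nu = 1, \ldots, N$ (the constant is read from context here), and which uses $Q(h,h) = \|h\|_Q^2$ from (10).

```latex
\begin{align*}
(\|y\| - \|y_0\|)^2 &\le \|y - y_0\|^2 = \|h\|^2 \\
  &\le \sum_{\nu=1}^{N} |l_\nu(h)|^2 + \delta\,\|h\|_Q^2 \\
  &< N \cdot \frac{\varepsilon^2}{2N} + \frac{\varepsilon^2}{8K^2}\,(2K)^2
   = \frac{\varepsilon^2}{2} + \frac{\varepsilon^2}{2} = \varepsilon^2 .
\end{align*}
```

Here $\|h\|_Q \le \|y\|_Q + \|y_0\|_Q \le 2K$ because both $y$ and $y_0$ lie in the ball $V$, and the first line is the reverse triangle inequality in $L_2$.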
3. Remarks on the index form. We consider in this section the integral (1) which is extended over the interval
$$a_1 \le t \le a_2 \tag{18}$$
of the real $t$-axis. Concerning the coefficients in (2), we suppose that the $a_{ik}$ and their first two derivatives with respect to $t$, as well as the $b_i(t)$, are continuous in the interval (18). $c(t,y) = c(t, y_1, \ldots, y_n)$ is supposed to be continuous when $t$ is in that interval and when $-\infty < y_i < +\infty$ and to satisfy a (uniform) Lipschitz condition in $y$. Moreover, the matrix of the $a_{ik}$ is assumed to be symmetric and positive-definite. We denote by $L_2$ the space of vector functions $y(t) = (y_1, \ldots, y_n)$, which are square-integrable over (18),⁸ and by $\mathfrak{L}$ the subset of elements of $L_2$ which have Hoelder continuous first derivatives. We set, for $i = 1, 2, \ldots, n$,
$$L_i(y) = -2\sum_{k=1}^{n} \frac{d}{dt}\Bigl(a_{ik}\frac{dy_k}{dt}\Bigr), \qquad (L(y) = (L_1, \ldots, L_n)). \tag{19}$$
If then $x = (x_1, \ldots, x_n) \in \mathfrak{L}$, the following facts are well known: The equation
$$L(y) = x \tag{20}$$
has one and only one solution which satisfies the boundary conditions
$$y(a_1) = 0, \qquad y(a_2) = 0. \tag{21}$$
There exists a constant $\mu$ such that for this solution⁹
$$\|y\|^2 \le \mu\|x\|^2, \qquad \|y'\|^2 \le \mu\|x\|^2. \tag{22}$$
Therefore $y = M(x)$, $y' = M_1(x)$ are well-defined, linear, bounded operators whose domain $\mathfrak{L}$ is dense in $L_2$. Moreover, $M$ is symmetric, as follows from the Lagrange identity (together with the symmetry assumptions concerning the $a_{ik}$)
$$(\eta, L(y)) = 2\int_{a_1}^{a_2} \sum_{r,s=1}^{n} a_{rs}\,\frac{dy_r}{dt}\,\frac{d\eta_s}{dt}\,dt, \tag{23}$$
which holds for any couple $x \in \mathfrak{L}$, $\xi \in \mathfrak{L}$ if $y$ satisfies (20), (21), and
$$L(\eta) = \xi, \qquad \eta(a_1) = \eta(a_2) = 0. \tag{24}$$

⁸ This means that, if $y = (y_1, \ldots, y_n)$, then the component functions $y_i = y_i(t)$ are square-integrable over (18). If $z = (z_1, \ldots, z_n)$, the scalar product of $y$ and $z$ is defined by $(y,z) = \int_{a_1}^{a_2} \sum_{i=1}^{n} y_iz_i\,dt$, and $\|y\| = +\sqrt{(y,y)}$.
⁹ This follows, for example, if one expresses $y$ in terms of a Green's tensor ($y' = dy/dt$). See [5, p. 249].
The positivity assumption concerning the matrix of the $a_{ik}$ together with (23) implies the existence of a positive constant $m$ such that
$$(x, M(x)) \ge m\int_{a_1}^{a_2} \sum_{i=1}^{n} \Bigl(\frac{dy_i}{dt}\Bigr)^2\,dt = m\|y'\|^2 = m\|M_1(x)\|^2. \tag{25}$$
From this it is easily seen that actually
$$(x, M(x)) > 0, \tag{26}$$
unless $x = 0$. We therefore can introduce the left member of (26) as new norm in $\mathfrak{L}$; i.e., we introduce
$$(x, M(\xi)) = [x, \xi] \tag{27}$$
as new scalar product in $\mathfrak{L}$. The linear space thus obtained is called $\dot H$, and its completion $H$. Now, for $x \in \dot H$, the functions $y = M(x)$, $y' = M_1(x)$ are continuous, so that they can be substituted in (1); in other words, for $x \in \dot H$, the definition
$$i(x) = j(M(x)) \tag{28}$$
makes sense. We see then from (1), (2), (23), and (27) that
$$i(x) = \frac{[x,x]}{2} + I(x), \tag{29}$$
where
$$I(x) = \int_{a_1}^{a_2} \sum_{i=1}^{n} b_i(t)\,\frac{dy_i}{dt}\,dt + \int_{a_1}^{a_2} c(t,y)\,dt, \qquad (y = M(x)). \tag{30}$$
It can be seen that $i(x)$ can be extended to $H$.¹⁰ If
$$c_1(t,y) = \Bigl(\frac{\partial c}{\partial y_1}, \ldots, \frac{\partial c}{\partial y_n}\Bigr) \tag{31}$$
exists and if its components satisfy the assumptions previously made concerning $c$, then it can be easily verified that the first Frechet differential $di(x,\xi)$ of $i(x)$ at the point $x$ belonging to the increment $\xi$ exists and is given by the formula
$$di(x,\xi) = [\xi, g(x)], \tag{32}$$
where the "Euler operator" $g(x)$ is defined by¹¹
$$g(x) = x + G(x), \qquad G(x) = -\frac{db}{dt} + c_1(t, M(x)), \qquad (b = (b_1, \ldots, b_n)). \tag{33}$$

¹⁰ For a proof, see [10, Lemma 3.1].
¹¹ Here again the operator is first defined for $x \in \dot H$ and is then extended. Cf. the preceding footnote.
If in addition the second derivatives $\partial^2c/\partial y_i\,\partial y_j$ exist and satisfy the conditions previously made on $c$, then it can be verified that the second Frechet differential $d^2i(x,\xi,\zeta)$ of $i(x)$ at the point $x$ corresponding to the increments $\xi$, $\zeta$ exists and is given by the formula
$$d^2i(x,\xi,\zeta) = [\xi, k(x,\zeta)], \tag{34}$$
where the "Jacobi operator" $k(x,\zeta) = \zeta + K(x,\zeta)$ is defined by
$$K(x,\zeta) = (K_1, \ldots, K_n), \tag{35}$$
with
$$K_i = \sum_{j=1}^{n} \frac{\partial^2c}{\partial y_i\,\partial y_j}\,\theta_j, \qquad \theta = (\theta_1, \ldots, \theta_n) = M(\zeta) \tag{36}$$
(see footnote 11). If we use the classical notation¹²
$$\Omega(\eta,\eta') = \frac{1}{2}\sum_{i,j=1}^{n} \bigl[f_{p_ip_j}\,\eta_i'\eta_j' + f_{y_iy_j}\,\eta_i\eta_j\bigr],$$
then direct computation shows that
$$d^2i(x,\xi,\xi) = 2\int_{a_1}^{a_2} \Omega(\eta,\eta')\,dt, \qquad (\eta = M(\xi)). \tag{37}$$

Now the advantage of introducing, instead of $y$, $\eta$, the "new variables" $x = L(y)$, $\xi = L(\eta)$ lies in the fact that the operator $G(x)$ defined in (33) is completely continuous and that the operator $K(x,\zeta)$ defined in (35) is completely continuous in $x$ as well as in $\zeta$; in the latter variable it is, moreover, linear and symmetric.¹³ This allows us to draw nearly immediately the following conclusions from the theory of gradient mappings: From (32) we see that $g(x)$ is the gradient of $i(x)$, and comparison of (29) with (33) shows then that the completely continuous $G(x)$ is the gradient of $I(x)$. From this, the existence of a minimum for $i(x)$ in any ball $[x,x] \le K^2$ follows [8, p. 430, Theorem 4.1]. If $c > 0$, we can conclude that, for $K$ big enough, the minimum is taken in an interior point $x_0$ of the ball and that $x_0$ is a critical point, i.e., that $g(x_0) = 0$ (cf. the corresponding remark in the last paragraph of Sec. 2).

¹² See, for example, [7, p. 7]. Note that, in our case, $f_{p_iy_j} = 0$.
¹³ As to the proof of this, we remark first that, on account of the Lipschitz assumptions for the first and second derivatives of $c$ with respect to the $y_i$, it is easily seen that the complete continuity of $G$ and $K$ follows from that of $y = M(x)$. The proof for the complete continuity of this latter operator is quite analogous to the proof of the first part of Theorem 2.1 in [10], to which we therefore refer. The symmetry of $K$ follows from (35) together with the symmetry of $d^2i(x,\xi,\zeta)$ in $\xi$ and $\zeta$.

Moreover, from (34) and the spectral theorem for completely continuous symmetric operators, we see that the "index form" (37) can be written in the form
$$d^2i(x,\xi,\xi) = \sum_{\nu=1}^{\infty} \lambda_\nu\,[e_\nu, \xi]^2, \tag{38}$$
where $e_\nu$ and $\lambda_\nu$ are the eigenelements and corresponding eigenvalues of the problem
$$k(x,\xi) = \lambda\xi \tag{39}$$
and that at most a finite number of the $\lambda_\nu$ are negative. This follows from the fact that the numbers $\lambda_\nu - 1$ are the eigenvalues of the completely continuous (in $\zeta$) operator $K(x,\zeta)$; these are either finite in number or tend to zero as $\nu \to \infty$ so that, in the latter case, $\lambda_\nu \to +1$. Also to each eigenvalue belong a finite number of eigenelements $e_\nu$, since the $e_\nu$ are also eigenelements of the completely continuous $K(x,\zeta)$. For the same reason the "nullity" [i.e., the number of eigenelements to the eigenvalue 0 or, what is the same, the number of linearly independent solutions of the "Jacobi equation" $k(x,\xi) = 0$¹⁴] is finite.

If $x = x_0$ is a critical point, then the index of $x_0$ is defined as the number of negative terms in the quadratic form (38) or, what is the same, the number of linearly independent eigenelements of (39) belonging to negative eigenvalues.
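The finiteness of the index can be illustrated numerically in a discretized model. The following sketch takes $n = 1$, the interval $0 \le t \le 1$, $a_{11} = 1$, and a prescribed function standing for $\partial^2c/\partial y^2$ along the critical point; all of these, together with the finite-difference discretization and the function names, are assumptions made for the illustration. With $\eta = M(\xi)$ the problem (39) becomes $L\eta + c_{yy}\eta = \lambda L\eta$, which the code brings to the symmetric form $I + K$ by a Cholesky factorization of the discretized $L$.

```python
import numpy as np


def index_form_eigenvalues(cyy, n_grid=200):
    """Eigenvalues lambda of (L + C) eta = lambda * L eta on a uniform grid.

    L discretizes -2 d^2/dt^2 with eta(0) = eta(1) = 0, and C is the diagonal
    matrix of the assumed second derivative cyy(t) of c with respect to y.
    """
    h = 1.0 / (n_grid + 1)
    t = np.linspace(h, 1.0 - h, n_grid)
    L = (2.0 / h**2) * (2.0 * np.eye(n_grid)
                        - np.eye(n_grid, k=1)
                        - np.eye(n_grid, k=-1))
    C = np.diag(cyy(t))
    R = np.linalg.cholesky(L)            # L = R R^T, R lower triangular
    X = np.linalg.solve(R, C)            # R^{-1} C
    K = np.linalg.solve(R, X.T).T        # R^{-1} C R^{-T}, the discrete K
    K = 0.5 * (K + K.T)                  # remove rounding asymmetry
    return np.linalg.eigvalsh(np.eye(n_grid) + K)


def index_of(cyy, n_grid=200):
    """Number of negative eigenvalues of the discretized index form."""
    return int(np.sum(index_form_eigenvalues(cyy, n_grid) < 0.0))


if __name__ == "__main__":
    lam = index_form_eigenvalues(lambda t: -100.0 * np.ones_like(t))
    print("index:", int(np.sum(lam < 0.0)), "largest eigenvalue:", lam.max())
```

For the constant value $c_{yy} = -100$ the continuous problem has the eigenvalues $\lambda_m = 1 - 100/(2m^2\pi^2)$, of which exactly two are negative, while the remaining ones accumulate at 1 from below. For $c_{yy} \ge 0$ the index is zero.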
If the "Jacobi equation" has no (nontrivial) solutions, then the quadratic form (38) is nondegenerate, and it can be shown that then the critical point $x_0$ is isolated (see [10, Theorem 4.2]). In this case the index can be related to the topologically defined Morse-type numbers of real-valued functions in Hilbert space (see [10, Theorem 4.3]).
We finally remark that the fact that the index is the number of negative terms in the quadratic form (38) for the second differential makes it nearly evident that a critical point $x_0$ furnishes a (relative) minimum if and only if the index is zero, provided that we add a proper "differentiability condition"; e.g., if we assume that the third Frechet differential exists at $x_0$.¹⁵ For a formal proof we have only to apply "Taylor's theorem" (with remainder term) up to terms of order three (see [3, Theorem 5]).

¹⁴ If $k = (k_1, \ldots, k_n)$, direct computation shows that (for $x \in \dot H$), $-k_i(x,\xi) = \frac{d}{dt}\Omega_{\eta_i'} - \Omega_{\eta_i}$. Since $\eta(a_1) = \eta(a_2) = 0$, it follows that, if $\xi$ is a solution of the "Jacobi equation" of the text, then $\eta = M(\xi)$ is a solution of the classical Jacobi equation vanishing in the end points. If $\xi$ is nontrivial, then obviously $\eta$ is nontrivial.
¹⁵ Sufficient for this is that the third derivatives of $c$ with respect to the $y_i$ exist and satisfy the assumptions previously made about $c$.

BIBLIOGRAPHY

1. R. Courant and D. Hilbert, Methoden der mathematischen Physik, vol. 2, Springer-Verlag OHG, Berlin, 1937.
2. K. Friedrichs, Spektraltheorie halbbeschränkter Operatoren und Anwendung auf die Spektralzerlegung von Differentialoperatoren, Math. Ann. vol. 109 (1934) pp. 465-487.
3. L. M. Graves, Riemann integration and Taylor's theorem in general analysis, Trans. Amer. Math. Soc. vol. 29 (1927) pp. 163-177.
4. M. R. Hestenes, Application of the theory of quadratic forms in Hilbert space to the calculus of variations, Pacific J. Math. vol. 1 (1951) pp. 525-581.
5. E. Kamke, Differentialgleichungen, Lösungsmethoden und Lösungen, vol. 1, Gewöhnliche Differentialgleichungen, Akademische Verlagsgesellschaft M.b.H., Leipzig, 3d ed., 1944.
6. T. Kato, Quadratic forms in Hilbert space and asymptotic perturbation series, Tech. Rept. No. 7, prepared under Contract DA-04-200-ORD-171, Task order 5, for Office of Ordnance Research, Department of Mathematics, University of California, Berkeley, Cal., April, 1955.
7. M. Morse, The calculus of variations in the large, Amer. Math. Soc. Colloquium Publ. vol. 18 (1934).
8. E. H. Rothe, Gradient mappings and extrema in Banach spaces, Duke Math. J. vol. 15 (1948) pp. 421-431.
9. E. H. Rothe, A note on the Banach spaces of Calkin and Morrey, Pacific J. Math. vol. 3 (1953) pp. 493-499.
10. E. H. Rothe, Remarks on the application of gradient mappings to the calculus of variations and the connected boundary value problems, Comm. Pure Appl. Math., N. Y. U. vol. 9 (1956) pp. 551-568.
11. R. G. Sanger, Functions of lines and the calculus of variations, Univ. of Chicago Dept. of Math. Contributions to the Calculus of Variations (1931-1932) pp. 191-293.

UNIVERSITY OF MICHIGAN, ANN ARBOR, MICH.
INDEX

Allocation processes, multistage, 121
Aronszajn, N., 54
Bottleneck process, 134
Caustic, 35
Characteristic-value problem, double, 139
Courant, 55
Decision processes, continuous, 118
  multistage, 115
Deformation theories, 12
Eiconal equation, 27, 40
Eiconal function, complex, 43
Eigenvalues, 105, 150
Electromagnetic vibrations, 85
Energy, complementary, 9
  potential, 9
Equilibrium-stress field, 8
Euler (differential) equations, 2
Existence proof, 143
Fermat's principle, 38, 41
Green's function, 97
Hadamard variational formula, 102, 109
Hilbert transform, 108
Index form, 147
Instability, 139
Interior variation, method of, 97
Malus, theorem of, 40
Mappings, gradient, 149
Metric, definite, 79
  indefinite, 79
Operators, half-bounded, 143
Optimal policy, 117
Perfectly plastic material, 18
Rate principle, complementary, 15
Rayleigh-Ritz method, 57
Rays, diffracted, 27, 28
  imaginary, 41
  normal congruences of, 40
Smoothing processes, 130
Snell's law, 29, 39
Spectral theorem, 143
Stationary principles, 79, 89
Strain-displacement field, 8
Univalent functions, group property of, 94
Variational principle, for displacements, 3
  for stresses, 4
Vibrating membrane, 80
Vibrations of an elastic body, 83
Weinstein's method, 60
Work hardening, 15
Work-hardening relations, 13