Inequalities: With Applications to Engineering
Michael J. Cloud Byron C. Drachman
Springer
Preface
We might wonder ...
179 downloads
1453 Views
898KB Size
Report
This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
Report copyright / DMCA form
Inequalities: With Applications to Engineering
Michael J. Cloud Byron C. Drachman
Springer
Preface
We might wonder why it is necessary to study inequalities. Many applied science and engineering problems, for instance, can be pursued without their explicit mention. Nevertheless, a facility with inequalities seems to be necessary for an understanding of much of mathematics at intermediate and higher levels. Inequalities serve a natural purpose of comparison, and they sometimes afford us indirect routes of reasoning or problem solving when more direct routes might be inconvenient or unavailable. This small guide to inequalities was originally written with engineers and other applied scientists in mind. Comments from those mathematicians who have seen the manuscript lead us to hope that some mathematicians will find some of the applications interesting, and that students of mathematics will also find the book useful. It is intended to help fill the gap between college-algebra treatments of inequalities and the formidable treatises on the subject that exist in the mathematics literature. Important techniques are all reinforced through the exercises that appear at the end of each chapter, and hints are included to expedite the reader’s progress. We review a few topics from calculus, but make no attempt at a thorough review. In order to simplify the discussion, we use a stronger hypothesis than is necessary in some of the statements or proofs of theorems and in some of the exercises. For a review of calculus, we recommend the fine classic by Landau [37]. Among the many good books on analysis, we can recommend Stromberg [57]. We would like to thank Edward Rothwell of the Department of Electrical Engineering at Michigan State University, for encouragement during the early stages of writing this book. Thanks are also due to Beth Lannon-
vi
Preface
Cloud for comments and suggestions on the figures. We thank Catherine Friess and Tammy Hatfield for creating the first LATEX version of this book. Thanks to Val Drachman for encouragement and support while we wrote the book. We owe a substantial debt to the staff at Springer-Verlag, especially Allan Abrams, Frank Ganz, Ina Lindemann, and Anne Fossella. Two anonymous reviewers were very helpful, proposing several topics that were not included in the preliminary version. We want to thank Glen Anderson, Mavina Vamanamurthy, and Matti Vuorinen for their generous help, in particular for pointing out the importance of l’Hˆ opital’s monotone rule and for suggesting several related exercises. Finally, we wish to thank Carl Ganser for many helpful discussions and suggestions, and for generously agreeing to read the manuscript.
Contents
Preface
v
1 Basic Review of Inequalities 1.1 Preliminaries . . . . . . . . . . . . . . . . . 1.2 Elementary Properties and Survival Rules . 1.3 Bounded Set Terminology . . . . . . . . . . 1.4 Quadratic Inequalities . . . . . . . . . . . . 1.5 Absolute Value and the Triangle Inequality 1.6 Miscellaneous Examples . . . . . . . . . . . 1.7 Exercises . . . . . . . . . . . . . . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
1 1 2 4 5 5 10 14
2 Methods from the Calculus 2.1 Introduction . . . . . . . . . . . . . . . 2.2 Function Terminology and Facts . . . 2.3 Basic Results for Integrals . . . . . . . 2.4 Results from the Differential Calculus 2.5 Some Applications . . . . . . . . . . . 2.6 Exercises . . . . . . . . . . . . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
19 19 19 21 23 27 32
3 Some Standard Inequalities 3.1 Introduction . . . . . . . . . 3.2 Bernoulli’s Inequality . . . . 3.3 Young’s Inequality . . . . . 3.4 The Inequality of the Means
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
37 37 37 38 38
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
viii
Contents
3.5 3.6 3.7 3.8 3.9 3.10
H¨ older’s Inequality . . . . . . . . Minkowski’s Inequality . . . . . . The Cauchy–Schwarz Inequality . Chebyshev’s Inequality . . . . . . Jensen’s Inequality . . . . . . . . Exercises . . . . . . . . . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
40 43 44 45 46 49
4 Inequalities in Abstract Spaces 4.1 Introduction . . . . . . . . . . . . . . . 4.2 Metric Spaces . . . . . . . . . . . . . . 4.3 Iteration in a Metric Space . . . . . . 4.4 Linear Spaces . . . . . . . . . . . . . . 4.5 Orthogonal Projection and Expansion 4.6 Exercises . . . . . . . . . . . . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
53 53 53 56 57 62 65
5 Some Applications 5.1 Introduction . . . . . . . . . . . . . . . . . . 5.2 Estimation of Integrals . . . . . . . . . . . . 5.3 Series Expansions . . . . . . . . . . . . . . . 5.4 Simpson’s Rule . . . . . . . . . . . . . . . . 5.5 Taylor’s Method . . . . . . . . . . . . . . . 5.6 Special Functions of Mathematical Physics . 5.7 A Projectile Problem . . . . . . . . . . . . . 5.8 Geometric Shapes . . . . . . . . . . . . . . 5.9 Electrostatic Fields and Capacitance . . . . 5.10 Applications to Matrices . . . . . . . . . . . 5.11 Topics in Signal Analysis . . . . . . . . . . 5.12 Dynamical System Stability and Control . . 5.13 Some Inequalities of Probability . . . . . . . 5.14 Applications in Communication Systems . . 5.15 Existence of Solutions . . . . . . . . . . . . 5.16 A Duality Theorem and Cost Minimization 5.17 Exercises . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
67 67 67 68 72 74 77 82 84 88 93 100 103 110 112 115 121 123
Appendix
. . . . . .
. . . . . .
Hints for Selected Exercises
127
References
143
Index
147
1 Basic Review of Inequalities
1.1 Preliminaries A few set and logic symbols shall serve as convenient shorthand, including: ∈ ⊆ ∪ ∩ R N C ⇒ ⇔
for for for for for for for for for
set membership; subset containment; set union; set intersection; the set of all real numbers; the natural numbers (positive integers); the complex numbers; logical implication; logical equivalence.
The set-builder notation S = {x | P(x)} specifies S as the set of all elements x such that proposition P(x) holds. For instance, S = {x ∈ R | x2 − 1 = 0} defines S as the set of all real solutions of x2 − 1 = 0. Recall that z ∈ C can be written as z = x + iy, where i is the imaginary unit, x = [z] is the real part of z, and y = [z] is the imaginary part of z. In polar form, z = |z| exp(iφ), where |z| is the (nonnegative real) modulus of z and φ is the argument of z. We denote the complex conjugate of z by z.
2
1. Basic Review of Inequalities
We agree to leave the fundamental relation “≤” (is less than or equal to) undefined; for present purposes, the reader’s intuitive notion of what is meant by a ≤ b will suffice. We may then proceed to define the other basic inequality relations. Let a, b, c ∈ R. Then, for example, • a ≥ b means that b ≤ a; • a < b means that a ≤ b and a = b; • a > b means that b < a; • a ≤ b ≤ c means that a ≤ b and b ≤ c. Other compound inequality notations, such as a < b < c and a > b ≥ c, are given analogous definitions. We say that the inequalities a < b and c < d have the same sense, while a < b and c > d have opposite sense. Inequalities such as a ≤ b are sometimes called weak or mixed, while those such as a < b (in which equality is precluded) are called strict. Example 1.1. The various finite intervals along the real line are subsets of R defined as: • (a, b) = {x ∈ R | a < x < b}; • [a, b) = {x ∈ R | a ≤ x < b}; • (a, b] = {x ∈ R | a < x ≤ b}; • [a, b] = {x ∈ R | a ≤ x ≤ b}. Infinite intervals are defined using: • [a, ∞) = {x ∈ R | x ≥ a}; • (−∞, a) = {x ∈ R | x < a}; and so forth.
1.2 Elementary Properties and Survival Rules The basic laws underlying inequality manipulations are the axioms [18] that distinguish the set R as an ordered field. For any a, b, c ∈ R, (a) if a ≤ b and b ≤ c, then a ≤ c; (b) we have a ≤ b and b ≤ a if and only if a = b; (c) either a ≤ b or b ≤ a; (d) if a ≤ b, then a + c ≤ b + c;
1.2 Elementary Properties and Survival Rules
3
(e) if 0 ≤ a and 0 ≤ b, then 0 ≤ ab. We recognize axiom (a) as the familiar transitive property. Axiom (b) is helpful when we want to show indirectly that two real numbers a and b are equal. Unlike R, the field C cannot be ordered (Exercise 1.8). However, |z| is real; hence any inequality established for real numbers can also be applied to the moduli of complex numbers, and vice versa. In addition to the axioms, a number of useful order properties can be established. The following list, while by no means exhaustive, will serve as a review of some of the most important aspects of inequality manipulation. Suppose a, b, c, d ∈ R. • One and only one of the following holds: a < b, a = b, a > b. • If a ≤ b and b < c, then a < c. • We have a ≤ b if and only if a + c ≤ b + c; a < b if and only if a + c < b + c. • If c < 0 and a < b, then ac > bc. • We have a ≤ b if and only if a − b ≤ 0. • If a ≤ b and c ≥ 0, then ac ≤ bc. • If a ≤ 0 and b ≥ 0, then ab ≤ 0; if a ≤ 0 and b ≤ 0, then ab ≥ 0. • We have a2 ≥ 0; furthermore, if a = 0, then a2 > 0. • We have a < 0 if and only if 1/a < 0; a > 0 if and only if 1/a > 0. • We have 0 < a < b if and only if 0 < 1/b < 1/a. • If a > 0 and b > 0, then a/b > 0. • If a < b and c < d, then a + c < b + d. • If 0 < a < b and 0 < c < d, then ac < bd. • If a ≤ b and c ≤ d, then a + c ≤ b + d. • If a ≤ b and c < d, then a + c < b + d. • If a > 1, then a2 > a. If 0 < a < 1, then a2 < a. • For c ≥ 0 and 0 < a ≤ b, we have ac ≤ bc , with equality if and only if b = a or c = 0. • If a is less than every positive real number ε, then a ≤ 0.
4
1. Basic Review of Inequalities
Generally, a term can be transposed from one side of an inequality to the other, provided that its algebraic sign is changed in the process. Inequalities can be added together, and inequalities between positive numbers can be multiplied. However, inequalities cannot in general be subtracted or divided. It is false, for instance, that for every a, b, c, d, the inequalities a < b and c < d imply a−c < b−d. For b−d > a−c is equivalent to (b−a)−(d−c) > 0, which cannot be guaranteed under the hypothesis that a < b and c < d alone. The other caution, about dividing inequalities, exists for similar reasons; however, it should suffice to note that dividing the inequality 1 < 2 by itself would yield the false result 1 < 1.
1.3 Bounded Set Terminology Some additional points about the real numbers shall be useful later. Let S be a set of real numbers. If there is a number B such that s ≤ B for every s ∈ S, then S is bounded above and B is an upper bound for S. Of course, a set that is bounded above has many upper bounds. If there exists M such that M is an upper bound for S and no number less than M is an upper bound for S, then M is called the least upper bound or supremum for S, denoted by sup S. If sup S ∈ S, we may refer to sup S as the maximum of S, denoted by max S. These concepts are elementary, but the reader unfamiliar with them should not pass over them too quickly and we offer an example for clarification. Example 1.2. The interval I1 = [0, 1] is bounded above by 1 (and by any x > 1). No number less than 1 is an upper bound for I1 , because if y < 1, then there exists ε > 0 such that y + ε ∈ I1 . Hence sup I1 = 1. Moreover, 1 ∈ I1 so that max I1 = 1. The interval I2 = [0, 1), on the other hand, has no maximum value although it does have a supremum (viz., 1). A fundamental property of the real numbers is that any nonempty set of real numbers that is bounded above has a supremum. Similarly, we can formulate definitions for the analogous concepts of bounded below, lower bound, and greatest lower bound or infimum. The infimum of S is symbolized as inf S, and always exists for sets that are bounded below. If inf S ∈ S, then S has a minimum denoted by min S. Again, the supremum and infimum concepts are convenient and important because for a bounded set these values always exist, whereas maximum and minimum values may not. The supremum and infimum concepts may also be applied to functions. Let f be a real-valued function with domain D, and let S be a nonempty subset of D. The image of S under f is f (S) = {f (x) | x ∈ S}
1.5 Absolute Value and the Triangle Inequality
5
and we have, by definition, sup f (x) = sup f (S)
and
inf f (x) = inf f (S).
x∈S
x∈S
If f (S) is not bounded above or below, respectively, we write sup f (x) = ∞ or inf f (x) = −∞. Various properties of, and relations between, suprema and infima are given in Exercises 1.10 and 1.11.
1.4 Quadratic Inequalities Consider the quadratic polynomial g(x) = ax2 + 2bx + c, a = 0, with discriminant ∆ = b2 − ac. Completing the square, we have 1 g(x) = a
x+
b a
2 −
∆ . a2
Therefore, ∆ ≤ 0 implies (1/a)g(x) ≥ 0 for all x. Conversely, if (1/a)g(x) ≥ 0 for all x, then in particular letting x = −b/a gives ∆ ≤ 0. It is clear that • (1/a)g(x) ≥ 0 for all x if and only if ∆ ≤ 0; • (1/a)g(x) > 0 for all x if and only if ∆ < 0.
√ Geometrically, of course, g(x) is a parabola with roots (−b ± ∆)/a by the quadratic formula. If ∆ ≤ 0, then g(x) does not have two distinct real roots; hence its graph never crosses the real axis, and g(x) has the same sign everywhere. Conversely, if either g(x) ≥ 0 for all x or g(x) ≤ 0 for all x, then ∆ ≤ 0. If ∆ < 0, then g(x) has no real roots; hence its graph never touches the real axis, and g(x) is strictly positive or strictly negative. Conversely, if either g(x) > 0 for all x or g(x) < 0 for all x, then ∆ < 0.
1.5 Absolute Value and the Triangle Inequality The absolute value |x| of x ∈ R is given by x if x ≥ 0, |x| = −x if x < 0. Some useful properties hold for the absolute value. We have: • |x| ≥ 0, with equality if and only if x = 0; • |ab| = |a||b| and, if b = 0, |a/b| = |a|/|b|; • |x − a| < b if and only if −b < x − a < b;
6
1. Basic Review of Inequalities
• −|a| ≤ a ≤ |a|, and ab ≤ |a||b|; • |a| ≤ |b| if and only if a2 ≤ b2 . Example 1.3. The set Nε (x0 ) = {x ∈ R | |x − x0 | < ε} is called an ε-neighborhood of x0 in R. Example 1.4. The restriction |x − 3| < 1 is sufficient to insure that the inequality |x + 2|−1 < 1/4 holds, because |x − 3| < 1
⇒ −1 < x − 3 < 1 ⇒ 4<x+2<6 ⇒ |x + 2| > 4
and we can then take reciprocals. This is a convenient place to introduce sequences and convergence into our discussion. A sequence {an } has limit A as n tends to ∞, written lim an = A
n→∞
or an → A as n → ∞, if and only if for every positive number ε, there exists a corresponding number N such that the inequality |an − A| < ε holds whenever n > N . The sequence is said to converge to A. The following simple fact about sequences will be crucial in our later work with inequalities, because it will provide a framework for passage from results for finite sums to corresponding results for series or integrals. Lemma 1.1. Let an → A and bn → B as n → ∞, with an ≤ bn for all n. Then A ≤ B. Proof. Suppose A > B, and let ε = (A − B)/2 > 0. Then there exist Na and Nb such that n > Na
⇒ |an − A| < ε,
n > Nb
⇒ |bn − B| < ε.
Choose n > max{Na , Nb } and add the two inequalities −ε < bn − B < ε, −ε < A − an < ε, to obtain −2ε < bn − an + (A − B) < 2ε, or −2ε < bn − an + 2ε < 2ε. This implies that bn − an < 0, a contradiction.
1.5 Absolute Value and the Triangle Inequality
7
Note, however, that strict inequality can be lost in such a limiting process. Consider, for instance, the case an = 0, bn = 1/n; we have an < bn for all n, but A = B = 0. In certain cases, absolute value signs can be cleared from an inequality (i.e., an equivalent inequality can be obtained) through the equivalence |a| ≤ |b|
⇔
a2 ≤ b2 .
This can help us verify inequalities that involve several pairs of such signs. Example 1.5. To establish ||a| − |b|| ≤ |a − b| we can use the following chain of equivalences, eventually reaching an inequality that is self-evident: ||a| − |b|| ≤ |a − b| ⇔ (|a| − |b|)2 ≤ (a − b)2 ⇔ −2|a||b| ≤ −2ab ⇔ |a||b| ≥ ab. A pivotal inequality involving the absolute value is as follows: Theorem 1.1 (Triangle Inequality). Let z1 , . . . , zn be nonzero complex numbers. Then n n zi ≤ |zi |. (1.1) i=1
i=1
Equality holds if and only if the zi all have the same arguments. Proof. The case n = 1 is trivial, so we examine n = 2 as the verification step for mathematical induction. Some elementary facts about complex numbers are needed here. For z ∈ C, [z] = x ≤ x2 + y 2 = |z|. Similarly, [z] ≤ |z|. It is also easily shown that |z| = |z|,
|z|2 = zz,
[z] =
1 (z + z), 2
[z] =
1 (z − z). 2i
We use these facts as follows: |z1 + z2 |2 = (z1 + z2 )(z1 + z2 ) = |z1 |2 + |z2 |2 + 2 [z1 z2 ]. However, 2 [z1 z2 ] ≤ 2|z1 z2 | = 2|z1 ||z2 |
8
1. Basic Review of Inequalities
so that
|z1 + z2 |2 ≤ |z1 |2 + |z2 |2 + 2|z1 ||z2 | = (|z1 | + |z2 |)2 .
Taking a square root and noting that both sides are positive, we obtain |z1 + z2 | ≤ |z1 | + |z2 |.
(1.2)
In general, we have n n+1 n zi = zi + zn+1 ≤ zi + |zn+1 | i=1
i=1
≤
n
|zi | + |zn+1 | =
i=1
i=1
n+1
|zi |.
i=1
For a shorter proof and a discussion of the conditions for equality, see Exercise 4.9. Similarly |z1 − z2 |2 = |z1 |2 + |z2 |2 − 2 [z1 z2 ] ≥ |z1 |2 + |z2 |2 − 2|z1 ||z2 | = (|z1 | − |z2 |)2 , so that |z1 − z2 | ≥ ||z1 | − |z2 || and this can be combined with (1.2) in the convenient form ||z1 | − |z2 || ≤ |z1 ± z2 | ≤ |z1 | + |z2 |
(1.3)
which provides both upper and lower bounds for the modulus of a sum or difference. Geometrically, the length of any side of a triangle can neither exceed the sum of the lengths of the two remaining sides, nor fall short of the difference in the lengths of the two remaining sides. Opportunities to employ the triangle inequality range from direct applications to those requiring prior setup or possibly multiple, sequential applications of the inequality. Example 1.6. We show that |a + b|1/2 ≤ |a|1/2 + |b|1/2 for a, b ∈ R. Because both sides are nonnegative, we may square both sides to obtain the equivalent statement |a + b| ≤ |a| + |b| + 2|a|1/2 |b|1/2 . But this is implied by the triangle inequality, completing the proof.
1.5 Absolute Value and the Triangle Inequality
9
Example 1.7. The uniqueness of a sequence limit is easily established. We begin by supposing {an } has limits A1 , A2 . Let ε > 0; then there exist N1 and N2 such that n > N1
⇒ |an − A1 | < ε,
n > N2
⇒ |an − A2 | < ε.
We choose n > max{N1 , N2 } and write |A1 − A2 | = |A1 − an + an − A2 | ≤ |A1 − an | + |A2 − an | < ε + ε = 2ε. Because ε is arbitrary, it follows that |A1 − A2 | = 0 and hence A1 = A2 . Example 1.8. To show that the limit of a product of sequences is the product of the limits, the triangle inequality is employed only after some algebraic manipulation. Supposing an → A and bn → B as n → ∞, we have |an bn − AB| = |(an − A)(bn − B) + A(bn − B) + B(an − A)| ≤ |an − A||bn − B| + |A||bn − B| + |B||an − A|. Each quantity on the right can now be made arbitrarily small for n sufficiently large, and the proof is easily completed. Example 1.9. With z = x + iy, we have |sinh y| ≤ |sin z| ≤ cosh y because
||eix ||e−y | − |e−ix ||ey || ei(x+iy) − e−i(x+iy) |eix ||e−y | + |e−ix ||ey | ≤ ≤ |2i| 2i |2i|
by (1.3), where |e±ix | = |i| = 1. Example 1.10. Here is a rough theorem for bounding polynomial zeros. Suppose that f (z) = a0 z n + a1 z n−1 + · · · + an−1 z + an (a0 = 0) where z is complex and where the coefficients ai may be real or complex. We assert and would like to show that all the zeros of f (z) have moduli less than or equal to the number n (1.4) max |ak | . ξ =1+ |a0 | 1≤k≤n We look for a disk such that outside the disk the leading term of the polynomial dominates; that is, we seek ξ such that n−1 n i an−i z . |z| > ξ ⇒ |a0 z | > i=0
10
1. Basic Review of Inequalities
First note that if |z| ≥ 1 we have n−1
|z|i ≤ n|z|n−1
i=0
and so n−1 n−1 n−1 i an−i z ≤ |an−i ||z|i ≤ M |z|i ≤ nM |z|n−1 , i=0
i=0
i=0
where M = max |ak |. We now choose ξ as in (1.4) so that |z| > ξ implies both |z| > 1 and |z| > nM/|a0 |. Then for |z| > ξ, |a0 ||z|n − nM |z|n−1 > 0 and n−1 n−1 n i n i an−i z ≥ |a0 z | − an−i z |f (z)| = a0 z + i=0
i=0
≥ |a0 z n | − nM |z|n−1 > 0 by (1.3). Thus all zeros of f (z) are in the disk |z| ≤ ξ. (This argument is used in complex variable theory to prove the fundamental theorem of algebra via Rouch´e’s theorem.)
1.6 Miscellaneous Examples A number of powerful standard inequalities are available for application in addition to the triangle inequality, and these will be developed in Chapter 3. However, we can also “invent” useful inequalities in an ad hoc manner based on simple ideas. For instance, the obvious fact that 1 1 < n2 n(n − 1) for all n > 1 is used in the following example: Example 1.11. A sequence {an } is increasing if for every n ∈ N, an ≤ an+1 . It can be shown that every increasing sequence that is bounded above is convergent (Exercise 1.12). We can use this fact to prove convergence of the increasing sequence 1,
1 1+ , 4
1+
1 1 + , 4 9
1+
1 1 1 + + ,··· , 4 9 16
1.6 Miscellaneous Examples
11
because its terms for n ≥ 2 are given by n−1
n−1 1 1 ≤ 1 + an = 1 + (k + 1)2 k(k + 1) k=1 k=1 n−1 1 1 1 − −1 =1− =1− k+1 k n k=1
< 2. Here we have illustrated a common practice — the use of an inequality as an alternative to an attempt at direct algebraic simplification of the original expression. The idea, of course, is to compare a difficult expression with one that is simpler. We also used partial fraction expansion and, to evaluate the summation, the telescoping property (Exercise 1.7). Some inequalities may not be so obvious at first glance; however, they become clear upon revelation of the process by which they are obtained. Example 1.12. If n ∈ N and n > 1, then 1 1 1 1 1 + + ··· + < . < 4n (n + 1)2 (n + 2)2 (n + n)2 n To see why, we note that of the n terms in the expression an =
1 1 1 + + ··· + 2 2 (n + 1) (n + 2) (n + n)2
the smallest is 1/(n + n)2 and the largest is 1/(n + 1)2 . Obviously 1 1 n < n < a n (n + n)2 (n + 1)2 or
1 n . < an < 4n (n + 1)2
Reducing the denominator of the rightmost member (we give a little and replace n + 1 by n) yields the desired result. The give-a-little approach can also be combined with other techniques, such as mathematical induction. Example 1.13. To show that for n ∈ N and 0 ≤ x ≤ 1, (1 + x)n ≤ 1 + (2n − 1)x, we put n = k into the inequality and create a proposition P(k). P(1) is 1 + x ≤ 1 + x, and thus holds trivially. It remains to show that P(k) ⇒
12
1. Basic Review of Inequalities
P(k + 1). Hence we assume P(k) holds, and multiply both sides by the positive quantity 1 + x: (1 + x)k+1 ≤ (1 + x) + (1 + x)(2k − 1)x ≤ (1 + x) + 2(2k − 1)x = 1 + (2k+1 − 1)x. This is P(k + 1), as required. Other useful inequalities, such as (1 + x)n >
n(n − 1) 2 x 2
for x > 0, n ∈ N, n > 1, may be obtained ad hoc from the binomial expansion (1 + x)n = 1 + nx +
n(n − 1) 2 x + ··· . 2!
(1.5)
Example 1.14. To show that lim
n→∞
n =0 an
for a > 1, we can write n n n 2 = < = . n n n(n − 1) a [1 + (a − 1)] (n − 1)(a − 1)2 2 (a − 1) 2 Then, for n > 1, 0<
n 2 < an (n − 1)(a − 1)2
and as n → ∞ the nth term of the sequence is squeezed to zero. This squeezing idea will occur repeatedly throughout the book, and a rigorous justification is requested in Exercise 1.4. Many problems of physical interest rely on simple inequality concepts for their solution. We give two illustrations at this point, and will be able to offer many more after further stocking our arsenal with inequalities. Example 1.15. Consider a person walking across flat, nonslippery terrain. One leg swings forward pendulum-like while the other foot is planted firmly on the ground. In a simple biomechanical model [2] for the body during the stride, the leg attached to the planted foot is represented by a straight, rigid member of length L, while the rest of the body is a point mass m on top of the leg (Figure 1.1). This mass describes a circular arc at some tangential velocity v, and having centripetal acceleration v 2 /L. This value
1.6 Miscellaneous Examples
13
m v XXX v X z L FIGURE 1.1. Example on walking.
of acceleration certainly cannot exceed the free-fall acceleration constant g, and we are led immediately to the inequality v ≤ gL. We can use this to estimate the speed at which a transition from the walking gait to a running gait must occur for an average person with L ≈ 0.9 meters; since g = 9.8 m/s2 , the model suggests that the person must run to exceed a speed of 3 m/s. Example 1.16. Electric current divides among parallel resistors in such a way that power dissipation is minimized. For suppose that resistances R1 , . . . , Rm are connected in parallel, let i be the total current entering the parallel combination, and let in be the current through Rn . The resistors all share an identical voltage v. The power dissipated in Rn is i2n Rn , and the total power dissipated is given by P =
m
i2n Rn .
n=1
Consider now what would happen if the total current i were distributed in some other way. Letting the current through Rn be in + δn instead, the constraint m (in + δn ) = i n=1
implies that the δn sum to 0. If P is the new dissipated power, then P − P =
m
(in + δn )2 Rn −
n=1
m
i2n Rn = 2v
n=1
Hence P − P =
m n=1
and we conclude that P ≥ P .
m n=1
δn2 Rn ≥ 0
δn +
m n=1
δn2 Rn .
14
1. Basic Review of Inequalities
1.7 Exercises 1.1. Assuming n, m ∈ N, prove the following: (a) x, y > 1 ⇒ x + y < 2xy. (b) x > 0 ⇒ x + x−1 ≥ 2. (c) (x + y)2 ≤ 2(x2 + y 2 ). (d) n > 2 ⇒ n! > 2n−1 . (e) 2n+1 > n2 . (f) a, b > 0 ⇒ a4 + b4 ≥ ab(a2 + b2 ). n 1. n2 >1+ n (g) n > 1 ⇒ 2 n −1 (h) (n + m)! ≥ n!(n + 1)m . √ m √1 > 2( m + 1 − 1). (i) n=1 n (j) sinh x ≤ cosh x. (k) cosh x ≥ 1. 1.2. Simple physical applications: (a) An ice skater spins with arms fully outstretched. Show that when she pulls in her arms, her angular frequency and rotational kinetic energy both increase. (b) Show that the parallel combination of a set of electrical resistors gives an equivalent resistance that cannot exceed any of the individual resistances in the set. 1.3. Establish Weierstrass’s inequalities: (a) For positive real numbers a1 , . . . , an , n
(1 + ai ) ≥ 1 +
i=1
n
ai .
i=1
(b) For n ≥ 2 with 0 ≤ ai < 1, n
(1 − ai ) > 1 −
i=1
n
ai .
i=1
(For related inequalities, along with applications to the convergence of infinite products, see Bromwich [12].) 1.4. Problems on sequence limits: (a) Prove the squeeze principle: if an → L and cn → L as n → ∞ and in addition an ≤ bn ≤ cn for n > N , then bn → L. (b) Show that n1/n → 1 as n → ∞. (c) Find the limit of n!/nn as n → ∞.
1.7 Exercises
15
1.5. Given positive numbers a1 , . . . , am , the numbers A, H, and G defined by m 1 an , A= m n=1
H=
m 1 −1 an m n=1
−1
,
G=
m
1/m an
,
n=1
are called the arithmetic, harmonic, and geometric means, respectively, of the set. Show that if the an are not all equal, then each of the means lies between the minimum and maximum values of the an . 1.6. The Fibonacci numbers fn are defined by the recursion fn = fn−1 + fn−2 with f1 = f2 = 1. Show that
fn < 2n
for n ∈ N. 1.7. Prove the telescoping property m
(an+1 − an ) = am+1 − a1
n=1
for finite summations. Note that if b1 , . . . , bm is another set of numbers such that bn > an+1 − an , then the inequality m
bn > am+1 − a1
n=1
may be asserted even if the sum on the left happens to be unavailable in closed √ form. Apply this idea with an = n to show that m √ √ 1 √ . 2( m + 1 − m) < n n=1
1.8. For a number system S to be ordered, S must have a subset P of positive numbers such that the following requirements are met. First, the sum and product of any pair of positive numbers must also be positive. Second, for every x ∈ S exactly one of the conditions x = 0, x ∈ P , −x ∈ P must hold. Use this definition to establish that the system C cannot be ordered. 1.9. Show that with z = x + iy: (a) |sinh y| ≤ |cos z| ≤ cosh y; (b) sinh |x| ≤ |sinh z| ≤ cosh x. 1.10. Supply proofs for the following assertions about suprema and infima. Assume all sets are subsets of R. (a) If sup S exists, then it is unique (likewise for inf S). (b) If x is an upper bound for S and x ∈ S, then x = max S.
16
1. Basic Review of Inequalities
(c) We have s = sup S if and only if: (1) for all ε > 0, if x ∈ S, then x < s + ε; and (2) for all ε > 0, there exists y ∈ S such that y > s − ε. (d) If sup S exists, then sup S = inf U , where U is the set of all upper bounds for S. If inf S exists, then inf S = sup L, where L is the set of all lower bounds for S. (e) If S is nonempty, then inf S ≤ sup S, with equality if and only if S contains a single number. (f) Let A ⊆ B. If sup A and sup B both exist, then sup A ≤ sup B. If inf A and inf B both exist, then inf A ≥ inf B. (g) Let S − = {x | −x ∈ S} where S is a bounded set. Then sup S = −inf S − , inf S = −sup S − . (h) Let S c = {x | x/c ∈ S} where S is bounded and c > 0. Then sup S c = c sup S, inf S c = c inf S. 1.11. Assume that f (x) and g(x) are defined over a common domain D, and establish the following relations. (Take all subsets to be nonempty.) (a) If S1 ⊆ S2 ⊆ D, then sup f (x) ≤ sup f (x),
x∈S1
x∈S2
inf f (x) ≥ inf f (x).
x∈S1
x∈S2
(b) If f (x) ≤ g(x) for all x ∈ S ⊆ D, then sup f (x) ≤ sup g(x),
x∈S
x∈S
inf f (x) ≤ inf g(x).
x∈S
x∈S
(c) For S ⊆ D, sup[f (x) + g(x)] ≤ sup f (x) + sup g(x),
x∈S
x∈S
x∈S
inf [f (x) + g(x)] ≥ inf f (x) + inf g(x),
x∈S
x∈S
x∈S
sup[−f (x)] = − inf f (x),
x∈S
x∈S
inf [−f (x)] = − sup f (x),
x∈S
x∈S
sup[f (x) − g(x)] ≤ sup f (x) − inf g(x),
x∈S
x∈S
x∈S
inf [f (x) − g(x)] ≥ inf f (x) − sup g(x).
x∈S
x∈S
x∈S
1.7 Exercises
17
(d) If f (x) ≥ 0 and g(x) ≥ 0 whenever x ∈ S ⊆ D, then inf [f (x)g(x)] ≥ inf f (x) inf g(x),
x∈S
x∈S
x∈S
sup[f (x)g(x)] ≤ sup f (x) sup g(x),
x∈S
x∈S
x∈S
sup[1/f (x)] = 1/ inf f (x), x∈S
x∈S
inf [1/f (x)] = 1/ sup f (x).
x∈S
x∈S
1.12. Prove that every bounded monotonic sequence converges. (Recall that a sequence is monotonic if it is either increasing or decreasing; it is increasing if am ≥ an whenever m > n, or decreasing if am ≤ an whenever m > n.) 1.13. By definition lim f (x) = L,
x→∞
where L is finite, means that corresponding to ε > 0 there exists N > 0 such that |f (x) − L| < ε whenever x > N . Prove the following assertions: (a) The limit, if it exists, is unique. (b) The limit of a sum is the sum of the limits, and the limit of a product is the product of the limits. (c) If f (x) ≥ 0 for x sufficiently large, then L ≥ 0.
This page intentionally left blank
2 Methods from the Calculus
2.1 Introduction Several topics studied in calculus are basic to working with inequalities. Facts involving function continuity, differentiation, and extrema are particularly important, as are certain facts about integration. We review the most crucial of these here, and then advance to some preliminary applications.
2.2 Function Terminology and Facts Let I represent an interval. By the statement f (x) is bounded on I, we mean that there is a number B such that for every x ∈ I, |f (x)| ≤ B. Example 2.1. The function sin x is bounded on any interval. Two forms of notation are sometimes useful in comparing the behavior of functions of the same independent variable as that independent variable tends to a limit. Given two functions f (x) and g(x), we may write f (x) = O(g(x))
as x → ∞
if there exist positive numbers N and B such that |f (x)| ≤ Bg(x)
(2.1)
20
2. Methods from the Calculus
whenever x > N . Similarly, f (x) = O(g(x)) as x → 0+ if there exist positive numbers δ and B such that (2.1) holds whenever 0 < x < δ. We understand statements of the form f (x) = g(x) + O(h(x)) to mean that the function f (x) − g(x) is O(h(x)). If f (x)/g(x) → 0 as x → x0 , we may write f (x) = o(g(x))
as x → x0 .
In these O and o statements, the gauge function g is usually chosen to have a simple form, such as x−1 , 1, or x. We assume a working knowledge of function limits. One fact, however, is worthy of explicit mention. We omit the proof, which is analogous to the corresponding proof for sequences (Exercise 1.4). Lemma 2.1 (Squeeze Principle). If g(x) ≤ f (x) ≤ h(x), and if lim g(x) = lim h(x) = L,
x→a
x→a
then lim f (x) = L.
x→a
We also take a moment to review function continuity. The statement f (x) is continuous at x = a means that for every ε > 0, there is a δ > 0 such that |f (x) − f (a)| < ε whenever |x − a| < δ. We write f (x) ∈ C(I) if f is continuous at every x ∈ I (suitable modifications having been made for continuity at endpoints of closed intervals). Continuity of f on [a, b] is denoted by f (x) ∈ C[a, b]. Two useful facts about continuity are the following: Lemma 2.2 (Persistence of Sign). Suppose f (x) ∈ C at x = x0 , and suppose that f (x0 ) is nonzero. Then there is an open interval containing x0 such that f (x) is nonzero at every point of the interval. Proof. Assume f (x0 ) > 0. (Otherwise replace f by −f .) Let ε = f (x0 ). There exists δ > 0 such that |x − x0 | < δ implies |f (x) − f (x0 )| < ε so if x ∈ (x0 − δ, x0 + δ), then −ε < f (x) − f (x0 ) < ε. Hence f (x0 ) − ε < f (x) < f (x0 ) + ε or, since f (x0 ) = ε, 0 < f (x). Theorem 2.1 (Continuity and Convergence). A function f (x) is continuous at x0 if and only if f (xn ) → f (x0 ) whenever xn → x0 . Proof. Suppose f is continuous at x0 and xn → x0 . Let ε > 0. There exists δ > 0 such that |x − x0 | < δ implies |f (x) − f (x0 )| < ε. Now suppose xn → x0 . Choose N such that n > N implies |xn − x0 | < δ. For this N, n > N implies |f (xn )−f (x0 )| < ε and therefore f (xn ) → f (x0 ). Conversely,
2.3 Basic Results for Integrals
21
suppose xn → x0 implies f (xn ) → f (x0 ). To show f is continuous at x0 , we suppose f is not continuous at x0 , and seek a contradiction. There exists ε > 0 such that for any δ > 0, there exists some x with |x − x0 | < δ but |f (x) − f (x0 )| ≥ ε. In particular we may choose a sequence δi = 1/i and xi with |xi − x0 | < δi but |f (xi ) − f (x0 )| ≥ ε for all i ∈ N. Then xi → x0 but it is false that f (xi ) → f (x0 ). The consequences of continuity on a closed interval are particularly important. We state the following without proof, and refer the reader to any standard calculus text for more details. The first of these consequences is known as the intermediate value property. Theorem 2.2. If f (x) ∈ C[a, b], then f (x) assumes every value between f (a) and f (b), f (x) is bounded on [a, b], and f (x) takes on its supremum and its infimum in [a, b]. Finally, we review concepts relating to function monotonicity and extrema. A function f (x) is increasing on I if f (x2 ) ≥ f (x1 ) whenever x2 > x1 for all x1 , x2 in I. Similarly, f (x) is decreasing if f (x2 ) ≤ f (x1 ) whenever x2 > x1 . If strict inequality holds we use the terms strictly increasing or decreasing, respectively. Let x0 ∈ I. If f (x0 ) ≥ f (x) for all x ∈ I, then f (x) has a maximum on I equal to f (x0 ). The definition of minimum is analogous.
2.3 Basic Results for Integrals The formal definition of the Riemann integral appears in Exercise 2.8. It is helpful to keep in mind that f (x) is integrable on [a, b] if f (x) is continuous or monotonic on [a, b]. Several useful inequalities for integrals can be established by forming Riemann sums. Given an integral b f (x) dx, a
we use the notation ∆x = (b − a)/n and xi = a + i∆x for i = 0, . . . , n, and write the corresponding Riemann sum as n
f (xi )∆x.
i=1
Once an inequality is established for such a sum, we may let n → ∞ and apply Lemma 1.1 to obtain an inequality involving the integral. Theorem 2.3. If f (x) and g(x) are integrable on [a, b] with f (x) ≤ g(x), then b b f (x) dx ≤ g(x) dx. a
a
22
2. Methods from the Calculus
Proof. With the notation described above, we form Riemann sums: n
f (xi )∆x ≤
i=1
n
g(xi )∆x.
i=1
The result follows as n → ∞ by Lemma 1.1. Corollary 2.3.1 (Simple Estimate). If f (x) is integrable on [a, b] with m ≤ f (x) ≤ M , then
b
m(b − a) ≤
f (x) dx ≤ M (b − a).
a
Corollary 2.3.2 (Modulus Inequality). If f (x) is integrable on [a, b], then b b f (x) dx ≤ |f (x)| dx. a a The second corollary follows from the inequalities −|f (x)| ≤ f (x) ≤ |f (x)| and plays the role of the triangle inequality for integrals. If continuity is assumed in the integrand function f (x), the persistence of sign property leads to the next result. Lemma 2.3. Let f ∈ C[a, b] and suppose that f (x) ≥ 0 on [a, b] with f (x) > 0 for some x ∈ [a, b]. Then
b
f (x) dx > 0. a
Proof. Suppose f (x0 ) > 0 where x0 ∈ (a, b). There is an open interval about x0 where f (x) > 0. Choose a smaller closed interval where f (x) > 0, say I = [x0 − δ, x0 + δ]. Let m be the minimum value of f (x) in I. Then
b
f (x) dx ≥ m(2δ) > 0.
a
If x0 is an endpoint, f (x) is also positive at an interior point so the argument still applies. A class of results known as mean value theorems are also useful. We give two of these next, and refer the reader to Exercise 2.10 for other examples.
2.4 Results from the Differential Calculus
23
Theorem 2.4 (Second Mean Value Theorem for Integrals). If f ∈ C[a, b], and g(x) is integrable and never changes sign on [a, b], then for some ξ ∈ [a, b] b b f (x)g(x) dx = f (ξ) g(x) dx. (2.2) a
a
Proof. Assume that g(x) ≥ 0 on [a, b]; otherwise, replace g(x) by −g(x). Let M and m denote the maximum and minimum values, respectively, of f (x) on [a, b]. Then mg(x) ≤ f (x)g(x) ≤ M g(x) for all x, hence m If
b a
b
g(x) dx ≤
a
b
f (x)g(x) dx ≤ M
a
b
g(x) dx. a
g(x) dx = 0 then any choice of ξ will do. Otherwise
b f (x)g(x) dx m ≤ a b ≤ M. g(x) dx a
By the intermediate value property applied to f ,
b f (x)g(x) dx f (ξ) = a b g(x) dx a for some ξ ∈ [a, b], and (2.2) follows. Corollary 2.4.1 (First Mean Value Theorem for Integrals). If f ∈ C[a, b], then for some ξ ∈ [a, b] b f (x) dx = f (ξ)(b − a). a
Hence there is a point ξ ∈ [a, b] such that f (ξ) equals the average of f (x) taken over [a, b].
2.4 Results from the Differential Calculus The notation f ∈ C n [I] means that the first n derivatives of f exist and are continuous on I. In the case n = 0 it means that f is continuous, and we omit the superscript. We first recall the fundamental theorem of calculus. A proof may be found in any standard calculus text.
24
2. Methods from the Calculus
Theorem 2.5. If f ∈ C[a, b] and F (x) = f (x), then b f (x) dx = F (b) − F (a). a
The next result is a source of series expansions that are useful in approximating functions and, as we shall see, in generating inequalities. Theorem 2.6 (Taylor’s Theorem). Let x > a, f (x) ∈ C n [a, x], and let f (n+1) (x) exist on (a, x). Then f (x) = f (a) + f (a)(x − a) + · · · +
f (n) (a) f (n+1) (ξ) (x − a)n + (x − a)n+1 n! (n + 1)!
for some ξ strictly between a and x. Proof. The first n + 1 terms constitute the Taylor polynomial of degree n for f (x) about the point a; the last term is called the remainder term. To simplify the proof, assume f ∈ C (n+1) [a, b]. By Theorem 2.5, x f (t) dt. f (x) − f (a) = a
Integrate by parts with u = f (t), du = f (t)dt, v = −(x − t), and dv = dt; then x x f (t) dt = f (a)(x − a) +
a
f (t)(x − t) dt.
a
Repeat with u = f (t), du = f (t)dt, v = −(x − t)2 /2, dv = (x − t)dt, and continue the process until f (n) (a) 1 x (n+1) f (x) = f (a)+f (a)(x−a)+· · ·+ (x−a)n + f (t)(x−t)n dt. n! n! a Because (x − t)n never changes sign in the interval with endpoints a and x, by (2.2) the remainder term can be rewritten f (n+1) (ξ) x f (n+1) (ξ) 1 x (n+1) f (t)(x−t)n dt = (x−a)n+1 (x−t)n dt = n! a n! (n + 1)! a for some ξ between a and x. The next two results, although both important in their own right, can be viewed as immediate consequences of Taylor’s theorem. Corollary 2.6.1 (Mean Value Theorem). Let f ∈ C[a, b], with f (x) differentiable on (a, b). Then there exists ξ ∈ (a, b) such that f (b) = f (a) + f (ξ)(b − a).
(2.3)
2.4 Results from the Differential Calculus
25
y 6
a
ξ
b
x
FIGURE 2.1. Mean value theorem.
Intuitively, there is a point in (a, b) such that the slope of the line tangent to f (x) at that point is equal to the slope of the secant line connecting the functional values at the endpoints of [a, b]. See Figure 2.1. Corollary 2.6.2 (Rolle’s Theorem). If f (x) ∈ C[a, b], f (x) is differentiable on (a, b), and f (b) = f (a) = 0, then there exists ξ ∈ (a, b) such that f (ξ) = 0. Rolle’s theorem indicates that between every two zeros of a continuous function the derivative has at least one zero. Theorem 2.7 (Conditions for Monotonicity). If f ∈ C[a, b] and f (x) is differentiable on (a, b) with f (x) ≥ 0, then f (x) is increasing on [a, b]. If f (x) > 0, then f (x) is strictly increasing. Corresponding statements hold for decreasing functions, for which f (x) ≤ 0. Proof. We prove only the first part of the theorem, and leave the rest for the reader. Suppose a ≤ x1 < x2 ≤ b. By Corollary 2.6.1, there is a number ξ ∈ (x1 , x2 ) such that f (x2 ) − f (x1 ) = f (ξ)(x2 − x1 ). But f (ξ) ≥ 0 by hypothesis and x2 − x1 > 0, so f (x2 ) − f (x1 ) ≥ 0. Hence f (x2 ) ≥ f (x1 ) whenever x2 > x1 on [a, b], as required. Theorem 2.8 (Cauchy’s Mean Value Theorem). Let f, g ∈ C[a, b] be differentiable on (a, b). Then there exists ξ ∈ (a, b) such that [f (b) − f (a)] g (ξ) = [g(b) − g(a)] f (ξ). Proof. Call A = f (b) − f (a), B = −[g(b) − g(a)], C = −Bf (a) − Ag(a), and apply Rolle’s theorem to h(x) = Ag(x) + Bf (x) + C. The following is useful for establishing the monotonicity of the ratio of two functions:
26
2. Methods from the Calculus
Theorem 2.9 (l’Hˆ opital’s Monotone Rule). Let f, g ∈ C[a, b], with f and g differentiable on (a, b) such that g (x) = 0 on (a, b). Let f (x)/g (x) be increasing (or decreasing) on (a, b). Then the functions f (x) − f (a) g(x) − g(a)
and
f (x) − f (b) g(x) − g(b)
are also increasing (or decreasing) on (a, b). If f (x)/g (x) is strictly increasing (or decreasing) so are f (x) − f (a) g(x) − g(a)
and
f (x) − f (b) . g(x) − g(b)
Proof. (See [3]). We may assume g (x) > 0 on (a, b). (If not, multiply f and g by −1.) By Theorem 2.8, for x ∈ (a, b) there exists y ∈ (a, x) with f (y) f (x) f (x) − f (a) = ≤ , g(x) − g(a) g (y) g (x)
so
f (x) ≥ g (x)
f (x) − f (a) . g(x) − g(a)
Now use the quotient rule and the last expression to deduce that the derivative of [f (x) − f (a)]/[g(x) − g(a)] is nonnegative, hence Theorem 2.7 applies. By l’Hˆopital’s rule, to evaluate a ratio of the indeterminate form 0/0 we differentiate both numerator and denominator and try to evaluate again. Theorem 2.9 is almost as easily remembered. To determine whether a ratio is monotonic on an interval, we verify that we get 0/0 at one of the endpoints, then differentiate numerator and denominator and try again (making sure the new denominator is nonzero on the open interval). Theorem 2.10 (Second Derivative Test). Assume f (x) ∈ C 2 (a, b). Let x0 ∈ (a, b), and suppose that f (x0 ) = 0 and f (x0 ) > 0. Then f (x) has a local minimum at x0 . That is, there is a positive δ such that if 0 < |x − x0 | < δ, then f (x) > f (x0 ). Proof. Because f (x0 ) > 0, by Lemma 2.2 there is a positive δ such that f (x) > 0 if |x − x0 | < δ. Now let 0 < |∆x| < δ. By Theorem 2.6 there exists some ξ strictly between x0 and x0 + ∆x such that 1 f (x0 + ∆x) = f (x0 ) + f (x0 )∆x + f (ξ)(∆x)2 . 2
(2.4)
Since f (x0 ) = 0 and f (ξ) > 0 the result follows by inspection. Note that if we assume f (x0 ) = 0 and f (x0 ) < 0, then f (x) has a local maximum at x0 . We will state and prove the theorem for n variables later.
2.5 Some Applications
27
2.5 Some Applications We begin by exploiting Corollary 2.6.1, the mean value theorem. Example 2.2. We can verify the useful inequality tan x > x
(2.5)
for 0 < x < π/2 by applying Corollary 2.6.1 with f (x) = tan x, a = 0, and b = x < π/2; i.e., by asserting that tan x − tan 0 =
1 (x − 0) cos2 ξ
for some ξ ∈ (0, x), and simply noting that 0 < cos2 ξ < 1. Similarly, Corollary 2.6.1 yields sin x < x whenever x > 0 (Exercise 2.4), so sin x < x < tan x whenever 0 < x < π/2. Example 2.3. Applying Corollary 2.6.1 to the natural log, we obtain ln(1 + x) − ln 1 =
1 [(1 + x) − 1] ξ
for some ξ ∈ (1, 1 + x). Therefore x < ln(1 + x) < x 1+x whenever x > 0. This is the logarithmic inequality. (The range of validity can be extended to include −1 < x < 0 as well.) The other differentiation theorems also provide ways to check many proposed inequalities. One plan is as follows. Suppose the proposed inequality is of the general form g(x) < h(x)
(x > x0 ),
(2.6)
where g(x0 ) = h(x0 ) and the functions g(x) and h(x) have known derivatives. Defining f (x) = h(x) − g(x) we have f (x0 ) = 0. If we can further show that f (x) > 0 for x > x0 , then (2.6) is established.
28
2. Methods from the Calculus
y 6
1 sin x 2x/π x
π/2 FIGURE 2.2. Jordan’s inequality.
Example 2.4. We can prove xr ≤ rx + (1 − r)
(2.7)
for x > 0 and 0 < r < 1 by this method. Defining f (x) = (1 − r) + rx − xr we find f (1) = 0 and f (x) = r − rxr−1 = r 1 −
1 x1−r
.
For x > 1, f (x) > 0; for 0 < x < 1, f (x) < 0. Hence (2.7) holds, with equality if and only if x = 1. Similarly, xr ≥ rx + (1 − r) whenever x > 0 and r > 1. Example 2.5. For 0 < x < π/2, x − tan x d sin x <0 = cos x dx x x2 by (2.5), so sin x/x is strictly decreasing on the interval of interest. Because sin x/x → 2/π as x → π/2, we conclude that sin x >
2 x π
whenever 0 < x < π/2. This is Jordan’s inequality. It is useful and easily remembered (Figure 2.2).
2.5 Some Applications
29
Another handy technique relies on the inspection of series expansions to establish inequalities. We saw in Chapter 1 how the binomial expansion could be used for this purpose. Taylor series are also useful in this regard. Example 2.6. From the Taylor series ∞ xn e = n! n=0 x
we see that ex > 1 + x +
x2 2
for all x > 0. Even more simply ex > 1 + x, but we can replace x by x/n to get the less obvious result x n ex > 1 + n for all x > 0 and n ∈ N. Example 2.7. If z ∈ C, (1.1) yields ∞ 2n−1 n−1 z (−1) |sin z| = (2n − 1)! n=1 ∞ 2n−1 (−1)n−1 z ≤ (2n − 1)! n=1 ∞ |z|2n−1 = (2n − 1)! n=1
and we have |sin z| ≤ sinh |z|. The basic integration properties are useful in estimating (i.e., putting bounds on) integrals. Example 2.8. For any positive constants a, b, c, d with a < b, (b − a) ca3 + d <
b
cx3 + d dx < (b − a) cb3 + d.
a
Example 2.9. Consider the integral 1 x5 dx. I= 1/2 0 (x + 25)
30
2. Methods from the Calculus
y 6
xp
0
1
2
3
4
5
x
FIGURE 2.3. Overestimating an integral.
On the interval [0, 1] we have x5 ≥ 0; hence by Theorem 2.4 there exists ξ ∈ [0, 1] such that 1 . I= 6(ξ + 25)1/2 Therefore
1 1 √ ≤I≤ . 30 6 26
Example 2.10. A simple observation shows that t
∞
2
∞ −x2 2 xe−x xe dx ≤ dx 2n+1 x t2n+1 t t ∞ 2 1 e−t −x2 = − 2n+1 (−2x)e dx = 2n+1 . 2t 2t t
e−x dx = x2n
∞
Example 2.11. The average of a positive, increasing function is increasing. We can see this as follows. Let f (x) be increasing on [0, a]. Then for every x ∈ (0, a] we have x 1 1 x du ≥ f (u) du. f (x) ≥ max f (u) = max f (u) x 0 x 0 u∈[0,x] u∈[0,x] Hence
1 f (x) − x
or 1 x2
x
0
xf (x) −
f (u) du ≥ 0
x
f (u) du 0
≥ 0.
2.5 Some Applications
31
y 6
xp
0
1
2
3
x
4
FIGURE 2.4. Underestimating an integral.
By the quotient rule for differentiation, d 1 x f (u) du ≥ 0 dx x 0 as required. It is possible to obtain other inequalities involving integrals through an ad hoc consideration of areas bounded by various plane curves. This simple process, reminiscent of the integral test from calculus, is probably best described in an example. Example 2.12. The function f (x) = xp , where −1 < p < 0, is strictly decreasing. From Figures 2.3 and 2.4 it is apparent that
n+1
p
x dx < 1
n
p
k <
n
xp dx.
0
k=1
Hence, after carrying out the integrations, np+1 (n + 1)p+1 − 1 p k < < . p+1 p+1 n
k=1
Integration along a contour in the complex plane follows many rules that are analogous to those for real integration, with little modification. In particular, Corollary 2.3.2 extends to the complex case: if g(z) is integrable on contour C, then g(z) dz ≤ |g(z)| |dz|. C
C
32
2. Methods from the Calculus
Example 2.13. Suppose C is of finite length L. If there is a number M > 0 such that for all z ∈ C the inequality |g(z)| < M holds, then
g(z) dz < M L C
because g(z) dz ≤ |g(z)| |dz| < M |dz| = M |dz| = M L. C
C
C
C
2.6 Exercises 2.1. Let p ∈ R, p > 0. Use the fact that ln x = 1
x
dt t
and the squeeze principle to show that lim
x→∞
ln x = 0. xp
2.2. Use differentiation to prove the following. Assume n ∈ N. (a) ln x ≤ n(x1/n − 1) for x > 0 with equality if and only if x = 1. (b) xn + (n − 1) ≥ x for x ≥ 0. (c) 2 ln(sec x) < sin x tan x for 0 < x < π/2. (d) sinh x ≥ x for x ≥ 0. (e) |x ln x| ≤ e−1 for 0 ≤ x ≤ 1. (f) ex < (1 − x)−1 for x < 1. (g) π e < eπ . (h) (s + t)a ≤ sa + ta ≤ 21−a (s + t)a for s, t > 0, 0 < a ≤ 1. (i) 21−b (s + t)b ≤ sb + tb ≤ (s + t)b for s, t > 0, b ≥ 1. 2.3. Obtain the inequality ex ≥
ex a
a for x > a and a > 0, and determine the circumstances for equality to hold. (See Mitrinovic [45] for several related inequalities.) 2.4. Use Corollary 2.6.1 to derive the inequalities: (a) sin x < x for x > 0. (b) x/(1 + x2 ) < tan−1 x < x for x > 0. √ √ (c) 1 + (x/2 1 + x) < 1 + x < 1 + x/2 for x > 0.
2.6 Exercises
33
(d) ex (y − x) < ey − ex < ey (y − x) for y > x. (e) (1 + x)a ≤ 1 + ax(1 + x)a−1 , for a > 1 and x > −1, with equality if and only if x = 0. Also prove that if |f (x)| ≤ B for some constant B, then f satisfies the Lipschitz condition |f (x2 ) − f (x1 )| ≤ B|x2 − x1 |. 2.5. Applications of l’Hˆ opital’s monotone rule: opital’s (a) For a > 1 and x > −1, x = 0, define h(x) = ((1+x)a −1)/x. Use l’Hˆ rule to define h(0) = a. Use Theorem 2.9 to show that h(x) is increasing on [−1, ∞) and hence that (1 + x)a ≥ 1 + ax with equality if and only if x = 0 (cf., Example 2.4). (b) Show that h(x) =
ln cosh x ln((sinh x)/x)
is decreasing on (0, ∞). (c) Prove that for x ∈ (0, 1), π<
sin πx ≤ 4. x(1 − x)
(d) Prove that 1 > sin x/x > 2/π on (0, π/2) (cf., Example 2.5.) 2.6. Use series expansions to establish the following inequalities: (a) tan 2x ≥ tan x + tanh x for 0 ≤ x < π/4. (b) |cos z| ≤ cosh |z|, z ∈ C. (c) |ln(1 + x)| ≤ − ln(1 − |x|) if |x| < 1. ∞
∞ (d) n=1 (1 + an ) ≤ exp n=1 an if 0 ≤ an < 1 for all n. 2.7. Show that if n is an integer greater than 1 and a, b are positive with a > b, then an − bn bn−1 < < an−1 . n(a − b) Use this to prove that no positive real number can have more than one positive nth root. 2.8. The following exercises involve the definition of integration. Recall from calculus that f (x) is integrable on [a, b] and
b
f (x) dx = I a
means that given ε > 0 there exists some δ > 0 such that if a = x0 < x1 < · · · < xn = b
34
2. Methods from the Calculus
and if xi − xi−1 < δ for i = 1, . . . , n and if ξi ∈ [xi−1 , xi ] for i = 1, . . . , n, then n f (ξi )(xi − xi−1 ) − I < δ. i=1
Note as a special case that if f (x) is integrable on [a, b], then lim
n→∞
n
b
f (a + i∆x)∆x =
f (x) dx, a
i=1
where ∆x = (b − a)/n. (a) Show that if f (x) is integrable on [a, b], then f (x) is bounded on [a, b]. (b) Show that if f (x) is integrable on [a, b], then the function x f (t) dt F (x) = a
is continuous on [a, b]. (c) Define f (x) = x−1/2 if 0 < x ≤ 1 and f (0) = 0. Does
1 0
f (x) dx exist?
2.9. More exercises involving integration: (a) Put simple lower and upper bounds on the family of integrals 1 dx I(α, β) = β α 0 (x + 1) where α, β ≥ 0. (b) Show that
π/2
0
ln(1/ sin t) dt < ∞.
(c) A function f (t) is of exponential order on [0, ∞) if there exist positive numbers b and C such that whenever t ≥ 0, |f (t)| ≤ Cebt . Show that the Laplace transform of f (t) given by ∞ f (t)e−st dt F (s) = 0
exists if f (t) is of exponential order. (d) Verify that 0
π/2
(sin x)2n+1 dx ≤
0
π/2
(sin x)2n dx ≤
π/2
(sin x)2n−1 dx
0
and establish Wallis’s product 2 2 4 4 6 6 2m 2m π = · · · · · · ··· · · · ··· . 2 1 3 3 5 5 7 2m − 1 2m + 1
2.6 Exercises (e) Show that
T
sin x dx x
lim
T →∞
35
0
exists and is between 1 and 3. 2.10. Prove the following statements. Parts (a) and (b) are challenging; according to Hobson [29], they were first given by Bonnet circa 1850. (a) Let f (x) be a monotonic decreasing, nonnegative function on [a, b], and let g(x) be integrable on [a, b]. Then for some ξ with a ≤ ξ ≤ b,
b
ξ
f (x)g(x) dx = f (a)
g(x) dx.
a
a
(b) Let f (x) be a monotonic increasing, nonnegative function on [a, b], and let g(x) be integrable on [a, b]. Then for some η with a ≤ η ≤ b,
b
b
f (x)g(x) dx = f (b)
g(x) dx.
a
η
(c) Let f (x) be bounded and monotonic on [a, b], and let g(x) be integrable on [a, b]. Then for some ξ with a ≤ ξ ≤ b,
b
f (x)g(x) dx = f (a) a
ξ
b
g(x) dx + f (b) a
g(x) dx. ξ
This is also referred to as the second mean value theorem for integrals, particularly in older books. (d) Let f (x) be a monotonic function integrable on [a, b], and suppose that f (a)f (b) ≥ 0 and |f (a)| ≥ |f (b)|. Then, if g is a real function integrable on [a, b], b ξ ≤ |f (a)| max . f (x)g(x) dx g(x) dx a≤ξ≤b a
a
This is Ostrowski’s inequality for integrals. 2.11. Exercises using graphical approaches: (a) Verify pictorially that
n
n+1
ln x dx < ln(n!) < 1
ln x dx 1
for n ∈ N, n > 1. (b) Sketch the curve y = 1/x for x > 0, and consider the area bounded by this curve, the x-axis, and the lines x = a and x = b (b > a). Compare this with the areas of two suitable trapezoids and obtain 2(b − a) b b2 − a2 < ln . < b+a a 2ab
36
2. Methods from the Calculus
2.12. Euler’s constant C is defined by
n 1 − ln n . C = lim Cn = lim n→∞ n→∞ j j=1
Verify that C exists and is positive by showing that Cn is strictly decreasing with lower bound 1/2. (Preliminary hint: Use trapezoids as in Exercise 2.11.) 2.13. Show that a thin metal ring rolls more slowly down an incline than any solid circular disk of the same radius and same total mass. 2.14. Prove the following generalized version of Rolle’s theorem. Let g ∈ C n [a, b], and let x0 < x1 < · · · < xn be n + 1 points in [a, b]. Suppose g(x0 ) = g(x1 ) = · · · = g(xn ) = 0. Then there exists ξ ∈ [a, b] such that g (n) (ξ) = 0. 2.15. Prove that if g(x) ≥ 0, g(x) ∈ C[a, b], and b g(x) dx = 0, a
then g(x) ≡ 0 on [a, b]. 2.16. A set A is said to be dense in a set B if every element of B is the limit of a sequence of elements belonging to A. Show that if f (x) and g(x) are continuous on B with f (x) ≤ g(x) for every x in some dense subset of B, then f (x) ≤ g(x) for all x ∈ B. Explain how this idea could be used to extend to real arguments an inequality proved for rational arguments. 2.17. (A simple caution.) Given a valid inequality between two functions, is it generally possible to obtain another valid inequality by direct differentiation? Is it true, for instance, that f (x) > g (x) whenever f (x) > g(x)? Note, however, that if f (x) > g (x) on [a, b], then we do have f (b) − f (a) > g(b) − g(a).
3 Some Standard Inequalities
3.1 Introduction Here we examine some famous inequalities. Many of these are extremely powerful, and have left bold imprints on both pure and applied mathematics. We begin with a simple result called Bernoulli’s inequality, and proceed to examine results of a more advanced nature.
3.2 Bernoulli’s Inequality Theorem 3.1 (Bernoulli’s Inequality). If n ∈ N and x ≥ −1, then (1 + x)n ≥ 1 + nx.
(3.1)
Equality holds if and only if n = 1 or x = 0. (See Exercise 2.5 for a generalization.) Proof. We can give an elementary proof by induction. Let P(n) be the proposition x ≥ −1 ⇒ (1 + x)n ≥ 1 + nx with equality if and only if n = 1 or x = 0. P(1) holds trivially. Now let n ∈ N and assume P(n) is true. Note that since n + 1 = 1, conditions for equality in P(n + 1) are simply x = 0. Multiplying by the nonnegative number 1 + x, we have (1 + x)n+1 ≥ (1 + x)(1 + nx) = 1 + (n + 1)x + nx2 ≥ 1 + (n + 1)x. (3.2) Equality holds in (3.2) if and only if nx2 = 0, which holds if and only if x = 0.
38
3. Some Standard Inequalities
y 6 y = f (x) x = g(y)
h A2 A1 w
x
FIGURE 3.1. Young’s inequality.
3.3 Young’s Inequality Consider two continuous functions f and g, both strictly increasing and inverses of one another. Suppose further that both functions vanish at the origin when graphed as in Figure 3.1. Area A1 +A2 clearly exceeds the area of a rectangle of width w and height h (for any choice of positive numbers w, h), and we are led immediately to the following: Theorem 3.2 (Young’s Inequality). Let f, g ∈ C be strictly increasing and inverse to one another for nonnegative argument, with f (0) = g(0) = 0. Then w h wh ≤ f (x) dx + g(x) dx. (3.3) 0
0
Equality holds if and only if h = f (w). An analytical proof is requested in Exercise 3.1.
3.4 The Inequality of the Means We now present the celebrated arithmetic mean – geometric mean, or AM– GM, inequality. Theorem 3.3 (Weighted AM–GM Inequality). Let a1 , . . . , an be positive numbers and let δ1 , . . . , δn be positive numbers (weights) such that δ1 + · · · + δn = 1. Then δ1 a1 + · · · + δn an ≥ aδ11 · · · aδnn and equality holds if and only if the ai are all equal.
(3.4)
3.4 The Inequality of the Means
39
Proof. (See [21]). We begin with the fact that x − 1 − ln x ≥ 0 whenever x > 0, with equality if and only if x = 1 (Exercise 2.2). Call A=
n
δk ak .
k=1
For each i, ai /A − 1 − ln(ai /A) ≥ 0. Multiplying each such term by δi and summing over i, we have n
(δi ai /A − δi ) −
i=1
Since
n
δi ln(ai /A) ≥ 0.
(3.5)
i=1 n
(δi ai /A − δi ) = 0
i=1
we have
n
δi ln(ai /A) ≤ 0.
i=1
Now by the fact that the exponential function is increasing (see Theorem 2.7) and the laws of exponents,
n δi ln(ai /A) ≤ exp(0) = 1. exp i=1
Hence (aδ11 · · · aδnn )/A ≤ 1, and aδ11 · · · aδnn ≤ δ1 a1 + · · · + δn an .
(3.6)
Equality holds in (3.6) if and only if it holds in (3.5). Because each summand is nonnegative, equality holds in (3.5) if and only if each summand is zero which is equivalent to each ai /A = 1. In other words, equality holds in (3.6) if and only if a1 = · · · = an . For other proofs, see Exercises 3.5, 3.9, and 3.23. The choice of weights δi = 1/n for all i leads to the next result. Corollary 3.3.1 (AM–GM Inequality). If a1 , . . . , an are positive numbers, then a1 + · · · + an ≥ (a1 · · · an )1/n . n
(3.7)
Equality holds if and only if the ai are all equal. The left member is the ordinary arithmetic mean of the n numbers, while the right member is by definition the ordinary geometric mean.
40
3. Some Standard Inequalities
Example 3.1. Application of (3.7) to the reciprocals 1/ai gives n 1/n . −1 ≤ (a1 · · · an ) a−1 + · · · + a n 1 The left member is called the harmonic mean of the ai (recall Exercise 1.5). Thus the harmonic mean of positive numbers never exceeds the geometric mean. Example 3.2. A simple technique is to multiply by unity and then apply (3.7). Consider, for instance, the sequence an =
1 1+ n
n .
We have an = =
1 1+ n
n
n+1 1 +1 n+1 n 1+ n n+2 ·1< = n+1 n+1
1 1+ n+1
n+1 .
Hence an < an+1 , and an is monotone increasing. An integral form of the AM–GM inequality is introduced in Exercise 3.10.
3.5 H¨older’s Inequality This result can be obtained in one step from the weighted AM–GM inequality. Theorem 3.4 (H¨ older’s Inequality). Suppose for each j, 1 ≤ j ≤ n, that aj1 , . . . , ajm are nonzero numbers. Suppose δ1 , . . . , δn are positive numbers such that δ1 + · · · + δn = 1. For each j denote Sj =
m
|aji |.
i=1
Then m i=1
|a1i |δ1 · · · |ani |δn ≤ S1δ1 · · · Snδn .
(3.8)
3.5 H¨ older’s Inequality
41
Proof. m i=1
|a1i |δ1 · · · |ani |δn S1δ1 · · · Snδn
= ≤
δ m |a1i | 1 i=1 m i=1
S1 δ1
···
|ani | Sn
δ n
|a1i | |ani | + · · · + δn S1 Sn
= δ1 + · · · + δn =1
(3.9)
by the application of (3.4) to each summand. With n = 2 write δ1 = 1/p, δ2 = 1/q, and let a1i = |ai |p and a2i = |bi |q for i = 1, . . . , m. Then (3.8) becomes m
|ai bi | ≤
i=1
m
|ai |
i=1
p
1/p m
1/q |bi |
q
.
(3.10)
i=1
This special case is also commonly referred to as H¨older’s inequality, and we can give another proof based on Young’s inequality. Putting f (x) = xp−1 and g(x) = xq−1 with complementary exponents that satisfy 1 1 + =1 p q
(1 < p < ∞)
we obtain from (3.3) wh ≤
wp hq + . p q
With two sets of m numbers a1 , . . . , am and b1 , . . . , bm , we form the quantities 1/p 1/q m m |aj |p , β= |bj |q . α= j=1
j=1
Assuming that α, β are both nonzero, we have |ai | |bi | 1 |ai |p 1 |bi |q + ≤ p α β p α q βq for any positive integer i. Summation over i produces m m m 1 1 1 q 1 1 p |αi ||bi | ≤ |ai | + q |bi | = + = 1, αβ i=1 pαp i=1 qβ i=1 p q
as required.
42
3. Some Standard Inequalities
Taking m → ∞ we have, by Lemma 1.1, ∞ 1/p ∞ 1/q ∞ p q |ai bi | ≤ |ai | |bi | i=1
i=1
i=1
provided the series on the right both converge. The corresponding result for integrals, provided the integrals exist, is 1/p 1/q b
a
b
|f (x)g(x)| dx ≤
b
|f (x)|p dx
a
|g(x)|q dx
.
(3.11)
a
See Exercise 3.14 for a derivation. In order to discuss when equality holds in H¨ older’s inequality, we note that if αi ≥ 0 for all i, then m
αi = 0
i=1
if and only if each αi = 0. If αi ≥ βi for all i, then m i=1
αi =
m
βi
i=1
if and only if αi = βi for all i. Thus equality holds in (3.9) if and only if for each i δ δ |ani | n |a1i | |ani | |a1i | 1 ··· = δ1 + · · · + δn . (3.12) S1 Sn S1 Sn From the weighted AM–GM inequality, (3.12) holds for each i if and only if |a1i | |ani | = ··· = . S1 Sn
(3.13)
Hence equality holds in H¨ older’s inequality (3.8) if and only if (3.13) holds for all i. In the case n = 2, equality holds in (3.10) if and only if |bi |q |a |p m i = m p q i=1 |ai | i=1 |bi |
(3.14)
for all i. It is convenient to remove the condition that each aji be nonzero. If aj1 = · · · = ajm = 0, then (3.8) holds by inspection. Now suppose each set {aj1 , . . . , ajm } contains at least one nonzero term. For each index i in (3.9) it is still true that δ δ |a1i | 1 |ani | n |a1i | |ani | ··· ≤ δ1 + · · · + δn S1 Sn S1 Sn
3.6 Minkowski’s Inequality
43
(by (3.4) if each aji = 0, by inspection otherwise) so (3.9) and (3.10) are still valid. We summarize this discussion applied to (3.10) as follows: Theorem 3.5 (H¨ older’s Inequality). Let p > 1 and q > 1 and p−1 + −1 q = 1. Let a1 , . . . , am and b1 , . . . , bm be two sequences of real numbers. Then m 1/p m 1/q m p q |ai bi | ≤ |ai | |bi | . i=1
i=1
i=1
Equality holds if and only if one of the sequences ai or bi consists entirely of zeros or else |a |p |bi |q m i = m p q i=1 |ai | i=1 |bi | for all i.
3.6 Minkowski’s Inequality Theorem 3.6 (Minkowski’s Inequality). Assume that a1 , . . . , am and b1 , . . . , bm are real numbers, and let p ≥ 1. Then m 1/p m 1/p m 1/p p p p |ai + bi | ≤ |ai | + |bi | . (3.15) i=1
i=1
i=1
Proof. If p = 1 this follows from the triangle inequality. Now suppose p > 1, older’s inequality as and choose q > 1 so that p−1 + q −1 = 1. Write H¨ 1/q 1/p m m m |αi βi | ≤ |αi |p |βi |q . i=1
i=1
i=1
Let αi = |ai | and βi = |ai +bi | , and then let αi = |bi | and βi = |ai +bi |p/q to get 1/p m 1/q m m p/q p p |ai ||ai + bi | ≤ |ai | |ai + bi | (3.16) p/q
i=1
i=1
and m i=1
|bi ||ai + bi |
p/q
≤
m i=1
i=1
|bi |
p
1/p m
1/q |ai + bi |
p
,
(3.17)
i=1
respectively. Since p = 1 + (p/q), for each i, |ai + bi |p = |ai + bi ||ai + bi |p/q ≤ |ai ||ai + bi |p/q + |bi ||ai + bi |p/q .
(3.18)
44
3. Some Standard Inequalities
Summing over the terms in (3.18), and using (3.17) and (3.16), we have m
|ai + bi | ≤ p
m
i=1
|ai |
p
1/p m
i=1
+
m
1/q |ai + bi |
p
i=1
|bi |
p
1/p m
i=1
1/q |ai + bi |
p
i=1
1/p m 1/p m 1/q m p p p = |ai | + |bi | |ai + bi | . i=1
i=1
We may assume that
m
i=1
|ai + bi |p = 0
i=1
(because (3.15) obviously holds otherwise) and the theorem is proved. For conditions when equality holds, see Exercise 4.8. Minkowski’s inequality can be extended to apply to infinite series, provided the series converge, as ∞
1/p |ai + bi |
≤
p
∞
i=1
1/p |ai |
p
+
∞
i=1
1/p |bi |
p
i=1
and to integrals, provided the integrals exist, as
b
1/p |f (x) + g(x)| dx p
a
≤
b
1/p |f (x)| dx p
+
a
b
1/p |g(x)| dx p
.
a
3.7 The Cauchy–Schwarz Inequality Theorem 3.7 (Cauchy–Schwarz Inequality). Suppose a1 , . . . , am and b1 , . . . , bm are nonnegative real numbers. Then m 2 m m ai bi ≤ a2i b2i . (3.19) i=1
i=1
i=1
Equality holds if and only if all ai are zero, or all bi are zero, or a2 aj = 2i bj bi for all j.
3.8 Chebyshev’s Inequality
45
Proof. Let p = q = 2 in H¨ older’s inequality. Alternatively, we can use our previous remarks on quadratic inequalities as follows. First, if ai = 0 for all i, then (3.19) holds trivially. If ai = 0 for at least one i, then for every x ∈ R certainly m (ai x + bi )2 ≥ 0 i=1
or
g(x) = αx2 + 2βx + γ ≥ 0,
where α=
m
a2i ,
β=
i=1
m
ai bi ,
γ=
m
i=1
b2i .
i=1
2
Hence β − αγ = ∆ ≤ 0, and (3.19) is established. Provided the series on the right converge, ∞
2 ≤
ai bi
∞
i=1
a2i
∞
i=1
b2i
.
i=1
We can also write (3.19) for Riemann sums and apply Lemma 1.1 to obtain
2
b
f (x)g(x) dx
≤
a
b
f 2 (x) dx
a
b
g 2 (x) dx
a
for functions f (x) and g(x), provided the integrals exist.
3.8 Chebyshev’s Inequality Theorem 3.8 (Chebyshev’s Inequality). Let ai and bi be similarly ordered such that either a1 ≥ · · · ≥ am , a1 ≤ · · · ≤ am , or b1 ≤ · · · ≤ b m , b1 ≥ · · · ≥ b m . Then
1 ai bi ≥ m i=1 m
1 ai m i=1 m
m 1 bi m n=1
with equality if and only if a1 = · · · = am or b1 = · · · = bm . Proof. For either of the two cases it is evident that for any choice of i, j, (ai − aj )(bi − bj ) ≥ 0.
46
3. Some Standard Inequalities
Summation over both indices yields m m
(ai − aj )(bi − bj ) ≥ 0
i=1 j=1
and expansion gives m
ai bi
i=1
m j=1
(1) −
m i=1
or 2m
ai
m
bj −
j=1
m
ai bi − 2
i=1
m
aj
j=1 m i=1
m i=1
ai
m
bi +
m
aj bj
j=1
m
(1) ≥ 0
i=1
bi ≥ 0,
i=1
as required. Example 3.3. Choosing bi = ai for all i, we see that the square of the arithmetic mean never exceeds the mean of the squares. With functions f (x) and g(x), analogous operations yield b b b 1 f (x)g(x) dx ≥ f (x) dx g(x) dx b−a a a a if f (x) and g(x) are either both increasing or both decreasing on [a, b]. If one function is increasing and the other is decreasing, the inequality sign is reversed.
3.9 Jensen’s Inequality A function f (x) is convex on the open interval (a, b) if and only if the inequality f (px1 + (1 − p)x2 ) ≤ pf (x1 ) + (1 − p)f (x2 )
(3.20)
holds for all x1 , x2 ∈ (a, b) and every p ∈ (0, 1). In the case of strict inequality for x1 = x2 , f is strictly convex on (a, b). We note that any xp ∈ (x1 , x2 ) can be expressed as xp = x1 + (1 − p)(x2 − x1 ) = px1 + (1 − p)x2 for some p ∈ (0, 1). The straight line connecting the points (x1 , f (x1 )) and (x2 , f (x2 )) is f (x2 ) − f (x1 ) (x − x1 ) fs (x) = f (x1 ) + x2 − x1 so that fs (xp ) = pf (x1 ) + (1 − p)f (x2 ). Geometrically then, the convexity condition prevents the graph of f (x) from rising above the secant line connecting any two of its points (Figure 3.2).
3.9 Jensen’s Inequality
47
y 6
f (x) fs (x)
a
x1
xp
x2
b
x
FIGURE 3.2. Function convexity.
Upon reflection it seems natural to associate convexity with the requirement that f (x) ≥ 0 on (a, b). In fact, this requirement is equivalent to (3.20) for functions twice continuously differentiable on (a, b) (Exercise 3.22). The reader should also be aware that other definitions of convexity exist. An example is the midpoint convexity definition requiring that x1 + x2 f (x1 ) + f (x2 ) f ≤ 2 2 for x1 , x2 ∈ (a, b). Here the geometric requirement is only that the midpoint of every secant line lie on or above the graph of f . For more detailed information on convexity, see Mitrinovic [45]. Our main result for convex functions is as follows: Theorem 3.9 (Jensen’s Inequality). Let f (x) be convex on (a, b), and let x1 , . . . , xm be m points of (a, b). Also let c1 , . . . , cm be nonnegative constants such that c1 + · · · + cm = 1. Then m m f ci xi ≤ ci f (xi ). (3.21) i=1
i=1
If f is strictly convex and if additionally each ci > 0, then equality holds if and only if x1 = · · · = xm . Proof. We first consider the case for which cm < 1, and proceed by induction. With m = 2, (3.21) holds by convexity of f . If x1 = x2 , then equality holds in (3.21) by inspection, and if f is strictly convex, all ci > 0, and equality holds in (3.21) we must have x1 = x2 , for otherwise f (c1 x1 + c2 x2 ) < c1 f (x1 ) + c2 f (x2 ).
48
3. Some Standard Inequalities
Now assume the theorem is true for m = k and suppose the numbers c1 , . . . , ck+1 sum to 1. We have f
k+1
ci xi
i=1
ci xi + ck+1 xk+1 = f (1 − ck+1 ) 1 − ck+1 i=1 k ci ≤ (1 − ck+1 )f xi + ck+1 f (xk+1 ) 1 − ck+1 i=1 k
by convexity of f . Since the numbers ci /(1 − ck+1 ) for 1 ≤ i ≤ k sum to 1, f
k i=1
ci xi 1 − ck+1
k 1 ci f (xi ), ≤ 1 − ck+1 i=1
(3.22)
hence (with m = k + 1 ) (3.21) holds. If x1 = · · · = xk+1 equality holds in (3.21) by inspection. Now suppose equality holds (with m = k+1) in (3.21), f is strictly convex, and all ci > 0. Then equality also holds in (3.22), for if not then equality cannot hold in (3.21) either, contrary to hypothesis. Hence, since the theorem is assumed true for k numbers, x1 = · · · = xk . Putting this into (3.21), we obtain k k ci x1 + ck+1 xk+1 = ci f (x1 ) + ck+1 f (xk+1 ), f i=1
i=1
so by the case m = 2, xk+1 = x1 and hence the induction is complete. The other case, for which cm = 1, is much easier; for then c1 = · · · = cm−1 = 0, whence (3.21) becomes simply f (xm ) ≤ f (xm ). Example 3.4. The function f (x) = xn , n ∈ N, is convex on (0, ∞). Hence for all x, y > 0 we have n x+y xn + y n ≤ . 2 2 Example 3.5. Choosing ci = 1/m for i = 1, . . . , m, we have m m 1 1 f xi ≤ f (xi ) m i=1 m i=1 for any convex f . If instead f is “concave” such that −f is convex, our inequality becomes m m 1 1 xi ≥ f (xi ). f m i=1 m i=1
3.10 Exercises
An example is f (x) = sin x on (0, π), and we have m m 1 1 sin θi ≤ sin θi m i=1 m n=1
49
(3.23)
for 0 < θ1 ≤ · · · ≤ θm < π. In fact, −sin x is strictly convex on (0, π) so that equality holds in (3.23) only if θ1 = · · · = θm . See Exercise 3.21 for an application. An integral form of Jensen’s inequality is introduced in Exercise 3.24.
3.10 Exercises 3.1. Construct an analytical proof of Young’s inequality. 3.2. Assume a, b, c, d > 0 and prove the following: (a) a2 + b2 ≥ 2ab. (b) a4 + b4 ≥ 2a2 b2 . (c) a2 + b2 + c2 ≥ ab + bc + ca. (d) a4 + b4 + c4 + d4 ≥ 4abcd. (e) (a + b)(b + c)(c + a) ≥ 8abc. 3.3. Use AM–GM to show that n! <
n+1 2
n .
Prove also that if n > 1 is a natural number, then [45] (2n − 1)!! < nn
and
(n + 1)n > (2n)!!,
where (2n − 1)!! = (2n − 1) · (2n − 3) · (2n − 5) · · · · · 5 · 3 · 1 (2n)!! = (2n) · (2n − 2) · (2n − 4) · · · · · 4 · 2. 3.4. Simple applications of the AM–GM inequality: (a) Show that of all rectangles having a given area, a square has the least perimeter. (b) A charge q is removed from a given electric charge Q to make two separate charges q and Q − q. Determine q so that repulsion between the charges at a given distance is maximized. 3.5. Prove the weighted AM–GM inequality by induction on n. 3.6. Show that if the product of N positive numbers equals 1, then the sum of those numbers cannot be less than N .
50
3. Some Standard Inequalities
3.7. Show that the sequence (1 − n−1 )n is monotone increasing. 3.8. Let a1 , . . . , aN be positive numbers that sum to 1, and let m be a positive integer. Show that N a−m ≥ N m+1 . n n=1
3.9. Let n ∈ N, and let x1 , . . . , xn and δ1 , . . . , δn be positive numbers such that n n t 1/t . i=1 δi = 1. For any real number t = 0 define g(t) = ( i=1 δi xi ) n n δi (a) Show that g(t) → i=1 xi as t → 0. Hence we define g(0) = i=1 xδi i . (b) Show that g is increasing. Preliminary hint: Take the logarithm and use l’Hˆ opital’s monotone rule on (0, ∞) and (−∞, 0). (c) Note that g(−1) ≤ g(0) ≤ g(1) gives n −1 n n δi /xi ≤ xδi i ≤ δi xi i=1
i=1
i=1
(the weighted harmonic–geometric–arithmetic means inequality). 3.10. Let f (x) ∈ C[a, b] and let f (x) > 0 on [a, b]. Prove b b b−a 1 1 ≤ exp ln f (x) dx ≤ f (x) dx. b b−a a b−a a (1/f (x)) dx a This is the harmonic–geometric–arithmetic mean inequality for integrals. 3.11. n ∈ N and x1 , . . . , xn be positive numbers. On (0, ∞) define h(t) = Let1/t ( n x ) . Show that h is decreasing. i i=1 3.12. Show that for any m numbers ai that satisfy 0 < a1 < · · · < am and any m positive numbers λi that sum to 1, Kantorovich’s inequality m m 2 λi A ≤ λi ai ai G i=1 i=1 holds, where A = (a1 + am )/2 and G =
√
a1 am .
3.13. Suppose p and q are positive real numbers such that p−1 + q −1 = 1. Let a1 , . . . , am be nonzero numbers. Define bi = c|ai |p−1 for i = 1, . . . , m. Verify that |bi |p |a |p m i = m p p i=1 |ai | i=1 |bi |
(3.24)
for i = 1, . . . , m. By Theorem 3.5 equality must hold in H¨ older’s inequality. Verify this by direct substitution. Conversely, show that (3.24) implies there is a c > 0 such that |bi | = c|ai |p−1
(3.25)
for i = 1, . . . , m. Hence the condition for equality in H¨ older’s inequality can be stated by (3.25).
3.10 Exercises
51
3.14. Justify equation (3.11). 3.15. A function f (x) is square integrable on [a, b] if
b
|f (x)|2 dx < ∞.
a
Show that the sum of two square integrable functions is square integrable. 3.16. Prove that if h(x) ≥ 0, then
b a
2 f (x)g(x)h(x) dx ≤
b
f 2 (x)h(x) dx
a
b
g 2 (x)h(x) dx.
a
3.17. A particle undergoes rectilinear motion at speed v. Show that the temporal average of v never exceeds its spatial average: 1 T
T 0
v(t) dt ≤
1 X
X
v(x) dx. 0
Under what condition does equality hold? 3.18. Show that if ci > 0 for i = 1, . . . , n, then n n 1 2 . ci n ≤ ci i=1 i=1 3.19. Use the Chebyshev inequalities for integrals to derive the inequalities
b
1 b−a
[f (x)]2 dx ≥
a
and
b
b
f (x) dx a
a
b
2 f (x) dx
a
dx ≥ (b − a)2 . f (x)
3.20. Show that the sum of finitely many convex functions is convex. 3.21. Prove that of all N sided polygons that can be inscribed in a circle of fixed radius, a regular polygon has the greatest area. 3.22. Show that the condition 1 (1 − p)(x2 − x1 )2 0
s
(1−p)s
f (x1 + (x2 − x1 )t) dt ds ≥ 0
is equivalent to (3.20) for functions f that are twice continuously differentiable. Reason from this fact that f (x) ≥ 0 is necessary and sufficient for the convexity of such functions. 3.23. Use Jensen’s inequality with f (x) = −ln x, x > 0, to deduce the weighted AM–GM inequality.
52
3. Some Standard Inequalities
3.24. Let g(t) and p(t) be continuous and defined for a ≤ t ≤ b such that α ≤ g(t) ≤ β and p(t) > 0. Let f (u) be continuous and convex on the interval α ≤ u ≤ β. Show that b b f (g(t)) p(t) dt a g(t)p(t) dt ≤ a . f b b p(t) dt p(t) dt a
This is Jensen’s inequality for integrals.
a
4 Inequalities in Abstract Spaces
4.1 Introduction Generality is gained by working in abstract spaces. For instance, all essential aspects of the topics of convergence and continuity can be studied in the context of a metric space. When we search for solutions to problems of physical interest, we must often search among the members of linear or vector spaces. Inequalities provide basic structure for abstract spaces like these, and we turn to a consideration of that topic in the present chapter. In doing so we present a few topics from functional analysis. Needless to say our coverage is neither broad nor deep: we hope only to catch a glimpse of inequalities in the kind of abstract setting that can help unify many of our previous results before we proceed to the final chapter on applications.
4.2 Metric Spaces Let M be a nonempty set of elements (points). Let d(x, y) be a real-valued function defined for each x, y ∈ M such that: M1. d(x, y) = 0 if and only if x = y. M2. d(x, y) = d(y, x). M3. d(x, y) ≤ d(x, z) + d(z, y) for every x, y, z ∈ S.
54
4. Inequalities in Abstract Spaces
Then M taken together with d is a metric space, and d is the distance or metric for M . Property M3 is an abstract version of the familiar triangle inequality. Putting y = x in M3 and using the other two properties, we get 2d(x, z) ≥ 0 so that distance as defined is never negative. The distance is also symmetric by M2, and we see that the properties assigned to the metric do satisfy our primary expectations about the distance concept. Example 4.1. R and C are metric spaces, with distances each defined by the usual metric d(x, y) = |x − y|. More generally, Rn and Cn , the spaces consisting of n-tuples of elements of R and C, respectively, are metric spaces. A closed interval [a, b] in R is also a metric space. The set C[a, b] of all real-valued functions defined and continuous on [a, b] can form a metric space if the metric is chosen as d(f, g) = max |f (x) − g(x)|. x∈[a,b]
(4.1)
To verify this we check to see that d(f, g) satisfies the required properties. We have d(f, g) = 0 if and only if |f (x) − g(x)| = 0 for all x ∈ [a, b], verifying M1. Property M2 is obviously satisfied. For M3, |f (x) − g(x)| = |f (x) − h(x) + h(x) − g(x)| ≤ |f (x) − h(x)| + |h(x) − g(x)|, so that max |f (x) − g(x)| ≤ max |f (x) − h(x)| + max |h(x) − g(x)|,
x∈[a,b]
x∈[a,b]
x∈[a,b]
as desired. The metrics in Rn and Cn are given, respectively, by d(x, y) = (x1 − y1 )2 + · · · + (xn − yn )2 and d(z, w) =
|z1 − w1 |2 + · · · + |zn − wn |2 .
We now state some definitions important in the study of metric spaces. An ε-neighborhood of x0 in M is a set Nε (x0 ) = {x ∈ M | d(x, x0 ) < ε}. This is a direct extension of the corresponding definition in R (recall Example 1.3). A set S in M is open if, given any x0 ∈ S, there exists ε > 0 such that Nε (x0 ) is contained in S. A set S in M is said to be closed if its complement in M is open. A point z ∈ M is a limit point of a set S if every ε-neighborhood of z contains at least one point of S distinct from z. It can be shown that S is closed if and only if it contains all its limit points. A sequence of points {xn } converges to the limit x if, for every ε > 0, there exists N such that d(xn , x) < ε whenever n > N . A sequence {xn } is a Cauchy sequence if, for every ε > 0, there exists N such that for every pair of numbers m, n, the inequalities m > N and n > N together imply that d(xm , xn ) < ε.
4.2 Metric Spaces
55
Theorem 4.1. If {xn } converges, then {xn } is a Cauchy sequence. Proof. Let {xn } → x. Then by the triangle inequality, d(xm , xn ) ≤ d(xm , x) + d(x, xn ) < ε/2 + ε/2 = ε for sufficiently large n, m. The converse of Theorem 4.1 is false, and a metric space M is called complete if every Cauchy sequence in M converges to a point in M . Example 4.2. Let M = C[a, b], with d defined as in (4.1). Then M is complete. Let {fn } be a Cauchy sequence in M . For each x ∈ [a, b], {fn (x)} is a Cauchy sequence in R and hence has a limit which we denote by φ(x). To show that φ(x) is continuous at x1 ∈ [a, b] we use an ε/3 argument. For x2 ∈ [a, b] |φ(x1 ) − φ(x2 )| ≤ |φ(x1 ) − fn (x1 )| + |fn (x1 ) − fn (x2 )| + |fn (x2 ) − φ(x2 )|. The first and third terms can be made less than ε/3 for a sufficiently large choice of n, independent of x1 and x2 , and with n fixed, the middle term can be made less than ε/3 by choosing x2 sufficiently close to x1 . Also, if K > 0, then {f ∈ C[a, b] | |f (x)| ≤ K for all x ∈ [a, b]} is a complete metric space by the previous argument and Lemma 1.1. Next, we take M1 , M2 to be two metric spaces with distance functions d1 , d2 , respectively, and denote by F : M1 → M2 a mapping (i.e., function) from M1 to M2 . If F : M → M , then we say that F is a mapping on M . A mapping F : M1 → M2 is continuous at x0 ∈ M1 if for every ε > 0, there is a δ > 0 such that d2 (F (x), F (x0 )) < ε whenever d1 (x, x0 ) < δ. Lemma 4.1 (Persistence of Sign). If M is a metric space and f : M → R is continuous at x0 with f (x0 ) > 0, then there exists a δ > 0 such that f (x) > 0 whenever d(x, x0 ) < δ. Proof. We leave it to the reader to modify the proof of Lemma 2.2. The interrelationship between convergence and continuity in R, noted in Theorem 2.1, extends to a general metric space. Theorem 4.2. The mapping F : M1 → M2 is continuous at x0 ∈ M1 if and only if F (xn ) → F (x0 ) whenever xn → x0 . Proof. We leave it to the reader to make the necessary modifications to the proof of Theorem 2.1.
56
4. Inequalities in Abstract Spaces
Let F be a mapping on M . F is a contraction mapping on M if there exists a number α ∈ [0, 1) such that d(F (x1 ), F (x2 )) ≤ αd(x1 , x2 )
(4.2)
whenever x1 , x2 ∈ M . The point y is a fixed point of F if F (y) = y.
4.3 Iteration in a Metric Space An iterative process is a method of locating a fixed point of F , or, in other words, of solving the equation y = F (y). The method is based on use of the recursion yn+1 = F (yn ) for n = 0, 1, 2, . . . , to obtain successive approximations to the desired fixed point y. The construction of such a sequence is referred to as Picard iteration. If F is a contraction, then repeated application of (4.2) gives, for any n ∈ N, d(yn+1 , yn ) ≤ αn d(y1 , y0 ). Because 0 ≤ α < 1, the successive approximations form a sequence of points y0 , y1 , y2 , . . . in the metric space that cluster together at a rate controlled by α. The reader may prove that if F is a contraction mapping on M , then F is continuous on M (in the ε − δ definition of continuity, choose δ = ε). The following is one of the most important theorems in all of mathematics: Theorem 4.3 (Banach Contraction Mapping Theorem). Let M be a complete metric space and let F : M → M be a contraction. Then F has a unique fixed point. Proof. Choose any initial point y0 ∈ M . As above, let ym+1 = F (ym ) for all m. For m > n, d(ym , yn ) ≤ d(ym , ym−1 )+d(ym−1 , ym−2 )+· · ·+d(yn+2 , yn+1 )+d(yn+1 , yn ), so that d(ym , yn ) ≤ (αm−1 + αm−2 + · · · + αn+1 + αn )d(y1 , y0 ) = αn (1 + α + · · · + αm−n−2 + αm−n−1 )d(y1 , y0 ) n α d(y1 , y0 ). ≤ 1−α Now αn /(1 − α) can be made arbitrarily small by choosing n large enough. Hence {ym } is a Cauchy sequence, so there is a point Y ∈ M such that {ym } → Y as m → ∞. By continuity of F , Y = lim F (ym ) = F lim ym = F (Y ) m→∞
m→∞
4.4 Linear Spaces
57
and the existence of the fixed point is established. For uniqueness, suppose that Y = F (Y ) and Z = F (Z). Then d(Y, Z) = d[F (Y ), F (Z)] ≤ αd(Y, Z). But α < 1; hence d(Y, Z) = 0, and uniqueness is established. We will encounter several applications of the contraction mapping theorem in Chapter 5.
4.4 Linear Spaces A linear space over a field F is a set X whose elements are called vectors together with two operations called vector addition and scalar multiplication such that the following axioms are satisfied: 1. X is closed with respect to the two operations. That is, x + y ∈ X and αx ∈ X whenever x, y ∈ X and α ∈ F . 2. Addition is both commutative and associative. 3. There is an additive identity element (zero vector) in X. For each vector x ∈ X, there is an additive inverse −x ∈ X. 4. If x, y ∈ X and α, β ∈ F , then: (a) α(x + y) = αx + αy. (b) (α + β)x = αx + βx. (c) (αβ)x = α(βx). (d) 1x = x. If the field of scalars is R, then X is a real linear space; if F = C, then X is a complex linear space. Example 4.3. Rn and Cn are linear spaces. The collection of all realvalued continuous functions on [a, b] is also a linear space. Some terms and concepts are important in the study of linear spaces. If M is a nonempty subset of a linear space X, then M is a subspace of X if M is itself a vector space under the same operations of addition and multiplication as X. A linear combination of the vectors x1 , . . . , xn is a sum of the form α1 x1 + · · · + αn xn where the scalars α1 , . . . , αn ∈ F . A set of vectors {x1 , . . . , xn } is linearly dependent if there exist α1 , . . . , αn ∈ F , not all zero, such that α1 x1 + · · · + αn xn = 0. A set of vectors that is not linearly dependent is linearly independent. If every vector x ∈ X can be expressed as a linear combination of the vectors from a set S, then S is a
58
4. Inequalities in Abstract Spaces
spanning set for X. A linearly independent spanning set is called a basis. A basis is essentially a coordinate system. Any vector space which has a finite spanning set (i.e., any finite-dimensional space) contains a basis. All bases of such a space contain the same number of vectors, called the dimension of the space. If X has dimension n, then any set of n linearly independent vectors in X is a basis of X. Thus far, the issues of “magnitude and direction” normally associated with physical vectors have not been addressed. To introduce the notion of magnitude into a linear space, we define the vector norm. A norm is a real-valued function that assigns to each vector x ∈ X a number x such that: N1. x ≥ 0, and x = 0 if and only if x = 0. N2. αx = |α|x. N3. x + y ≤ x + y. Example 4.4. In Rn , x =
x21 + · · · + x2n .
A function space is an example of an infinite-dimensional linear space. Norms often used with function spaces include the supremum norm ||f || = max |f (x)| x∈[a,b]
and the L2 norm
||f || =
b
1/2 |f (x)|2 dx
.
a
For a particular selection of norm, the function space is defined as the set of all f with ||f || < ∞. X is now a normed linear space, and we have the following result: Theorem 4.4 (Triangle Inequality). Assume that x, y are two vectors in a normed linear space. Then |x − y| ≤ x − y ≤ x + y.
(4.3)
Proof. By N3 with x replaced by x − y, x − y ≤ x − y. Swapping x and y we have, by N2, y − x ≤ y − x = (−1)(x − y) = x − y. Therefore |x − y| ≤ x − y, and the use of N3 again yields the desired inequality.
4.4 Linear Spaces
59
It is often convenient to have a measure of the distance between arbitrary elements of the normed space. Hence we employ the norm to induce a metric using d(x, y) = x − y. The normed linear space is now a metric space also, and our previous concepts involving Cauchy sequences, convergence, and completeness apply. The following theorem, for instance, is useful in applications: Theorem 4.5. In a normed linear space every Cauchy sequence is bounded. Proof. If {xn } is a Cauchy sequence, then with ε = 1 there exists N such that xn − xm < 1 whenever n, m > N . With m = N + 1, this reads xn − xN +1 < 1 whenever n > N . For all n > N , xn = xn − xN +1 + xN +1 ≤ xn − xN +1 + xN +1 < xN +1 + 1. Hence an upper bound for xn is given by B = max{x1 , . . . , xN , xN +1 + 1}. To get the concept of angle (and hence of vector direction) into a linear space, we introduce the inner product. An inner product on a real linear space is a function which assigns to each pair of vectors x, y a real number x, y such that: I1. x, y = y, x. I2. αx, y = αx, y. I3. x + y, z = x, z + y, z. I4. x, x ≥ 0, and x, x = 0 if and only if x = 0. To define an inner product on a complex linear space, we modify the above so that x, y ∈ C, and rewrite I1 as: I1. x, y = y, x. A linear space furnished with an inner product is called an inner product space. Example 4.5. In Rn and Cn we use, respectively, the inner product expressions n n x, y = xi yi , x, y = xi yi . i=1
i=1
An inner product of the form f (x), g(x) =
b
f (x)g(x) dx a
60
4. Inequalities in Abstract Spaces
is often used with functions. (Here we gloss over a technical point involving measure theory and Lebesgue integration; the reader is referred to Oden [47] for a more thorough treatment.) With inner-product structure in a linear space, we can watch more familiar inequalities arise. Theorem 4.6 (Cauchy–Schwarz Inequality). Let x, y be two vectors in a complex inner product space. Then (4.4) |x, y| ≤ x, xy, y and equality holds if and only if there is a scalar β such that x = βy. Furthermore, in the case of equality with y = 0, x, y = βy, y = βy, y so β = x, y/y, y. Thus equality holds if and only if x = 0 or y = 0 or else x = (x, y/y, y)y. Proof. By property I4, for every scalar α, 0 ≤ x+αy, x+αy with equality if and only if x = −αy = βy. By the other properties this inequality can be manipulated into the equivalent form 0 ≤ x, x + αx, y + αx, y + ααy, y. To shorten the notation, we write a = x, x, b = x, y, c = y, y and have 0 ≤ |α|2 c + 2 [αb] + a. Note that a and c are real and nonnegative. If c = 0, we may put α = −b/c and get |b|2 ≤ ac as desired. If c = 0 but a = 0, the roles of x and y may be reversed in the definitions of a, b, c above to yield the same result. If c and a are both zero, then x and y are both zero by I4, and Cauchy–Schwarz holds trivially. Example 4.6. Substituting the various inner product expressions given in Example 4.5, we may quickly generate the specific forms n n n 2 2 xi yi ≤ xi yi , i=1 i=1 i=1 n n n xi yi ≤ |xi |2 |yi |2 , i=1 i=1 i=1 b b b 2 f (x)g(x) dx ≤ |f (x)| dx |g(x)|2 dx. a a a Note the economy and power of the abstract space approach.
4.4 Linear Spaces
61
For any two vectors x, y in a real linear space, the Cauchy–Schwarz inequality can be written as x, y2 ≤ x, xy, y.
(4.5)
Theorem 4.7 (Minkowski Inequality). Suppose x, y are two vectors in a linear space. Then x + y, x + y ≤ x, x + y, y. (4.6) Proof. x + y, x + y = x, x + 2 x, y + y, y ≤ x, x + 2|x, y| + y, y ≤ x, x + 2 x, xy, y + y, y 2 x, x + y, y . = Conditions for equality are treated in Exercise 4.8. Whereas the triangle inequality makes a statement about generalized distances, we see that the Cauchy–Schwarz and Minkowski inequalities, as stated above, are concerned with the properties of generalized angles. A norm can also be conveniently induced by the inner product using the equation x2 = x, x.
(4.7)
The Cauchy–Schwarz and Minkowski inequalities can then be written as |x, y| ≤ xy and x + y ≤ x + y, respectively. In this case we also have the following theorem: Theorem 4.8 (Parallelogram Law). Let x, y be vectors in a linear space in which the norm is induced by the inner product. Then x + y2 + x − y2 = 2x2 + 2y2 . Proof. This follows from straightforward expansion and manipulation of the quantity x + y, x + y + x − y, x − y using the basic inner product properties.
62
4. Inequalities in Abstract Spaces
Note that in R2 the vectors x, y represent adjacent sides of a parallelogram, while the vectors x + y, x − y are the diagonals. With a norm present, we may also employ concepts of convergence and completeness. A Hilbert space is a complete linear space equipped with an inner product. The following two items are also of some use in applications: Theorem 4.9 (Continuity of the Inner Product). Assume the norm is induced by the inner product, and suppose that xn → x and yn → y. Then xn , yn → x, y. Proof. We use the triangle and Cauchy–Schwarz inequalities: |xn , yn − x, y| = |xn , yn − xn , y + xn , y − x, y| = |xn , yn − y + xn − x, y| ≤ |xn , yn − y| + |xn − x, y| ≤ xn yn − y + xn − x y. Since {xn } is convergent it is bounded with xn ≤ B for some finite B. The other n-dependent quantities can be made as small as desired by choosing n sufficiently large. Corollary 4.9.1 (Continuity of the Norm). If the norm is induced by the inner product, then xn → x implies xn → x. The notion of angle also leads to a generalized notion of “perpendicularity.” Two vectors x, y are orthogonal if x, y = 0. We say {x1 , x2 , . . . } is an orthogonal set if for all i, j > 0, i = j implies xi , xj = 0. An orthonormal set is an orthogonal set where for all i, xi = 1. It can be shown that mutual orthogonality among the members of a finite set of nonzero vectors implies linear independence among those vectors. Through an algorithm called the Gram–Schmidt procedure, we can generate a mutually orthogonal set from any linearly independent set of vectors. A handy theorem, which follows directly from the inner product and orthogonality definitions, is the following: Theorem 4.10 (Pythagorean Theorem). Let the vectors x and y be orthogonal. Then x + y2 = x2 + y2 .
4.5 Orthogonal Projection and Expansion Given a vector x in a Hilbert space H, and a subspace M of H, some optimization schemes require a vector m ∈ M that is “closest” to x in the sense that x − m is minimized. Such a vector m0 is known as a minimizing vector. Assuming M is closed, we can establish the existence
4.5 Orthogonal Projection and Expansion
63
and uniqueness of m0 ∈ M . First, we wish to show that, corresponding to the given x, there exists m0 ∈ M such that for all m ∈ M , x − m ≥ x − m0 . Let x ∈ H be given. If x ∈ M , we simply choose m0 = x. If x ∈ M , we define δ = inf x − m. m∈M
Note that for any mi , mj ∈ M , mj − mi 2 = (mj − x) + (x − mi )2 and by Theorem 4.8 (mj − x) + (x − mi )2 + (mj − x) − (x − mi )2 = 2x − mj 2 + 2x − mi 2 . Hence 2 mi + mj mj − mi 2 = 2x − mj 2 + 2x − mi 2 − 4 x − 2 ≤ 2x − mj 2 + 2x − mi 2 − 4δ 2 .
(4.8)
The inequality follows by definition of δ, because M is a subspace and therefore contains the vectors (mi + mj )/2. Now let {mi } be a sequence in M such that x − mi → δ. As i, j → ∞ then, the squeeze principle gives mj − mi → 0 so that {mi } is a Cauchy sequence in M (and in H). Because {mi } is a Cauchy sequence and M is closed, {mi } converges to a point m0 ∈ M . By continuity of the norm, x − m0 = δ. The minimizing vector is unique. For supposing that m01 , m02 ∈ M are two minimizing vectors, then choosing mi = m01 and mj = m02 in (4.8) gives m02 − m01 2 ≤ 2x − m02 2 + 2x − m01 2 − 4δ 2 ≤ 2δ 2 + 2δ 2 − 4δ 2 = 0, and hence m02 = m01 . From an intuitive “best approximation” standpoint, it is not surprising (Figure 4.1) that m0 is the unique minimizing vector if and only if the error vector m0 − x is orthogonal to every m ∈ M . For suppose there exists m ∈ M that fails to be orthogonal to m0 − x. We may assume (by dividing m by its norm if necessary) that m is a unit vector. We get a vector in M closer to x by subtracting the component of m0 − x along m from m0 as follows: denote m0 − x, m = α = 0. The vector m0 − αm
64
4. Inequalities in Abstract Spaces
xp C
C
H C C
C
C C
m0 p
M C Cp m
FIGURE 4.1. Minimizing vector.
satisfies x − (m0 − αm)2 = (x − m0 ) + αm, (x − m0 ) + αm = x − m0 , x − m0 + αm, x − m0 + x − m0 , αm + αm, αm
(4.9)
= x − m0 2 + 2 (αm, x − m0 ) + |α|2 m2 = x − m0 2 − 2|α|2 + |α|2 = x − m0 2 − |α|2 < x − m0 2 . Because x − (m0 − αm) < x − m0 , m0 cannot be a minimizing vector, a contradiction. The orthogonality of x − m0 to every m ∈ M is sufficient for uniqueness of m0 because for any m ∈ M we have x − m2 = x − m0 + m0 − m2 = x − m0 2 + m − m0 2 , and hence x − m > x − m0 unless m = m0 . Abstract spaces also offer a concise viewpoint on the subject of Fourier series expansion. Let f be an arbitrary vector in a Hilbert space, and {x1 , x2 , . . . } an orthonormal set in the space. For fixed n ∈ N, form the subspace generated by functions g(x) =
n
ci xi .
i=1
Using the properties of the inner product and orthonormality, it is easily shown that f − g, f − g = f − g2 = f 2 + 2
n
n i=1
2
|ci − f, xi |2 −
n
|f, xi |2 .
i=1
The terms f and i=1 |f, xi | are fixed; hence, with ci = f, xi the function f −g is minimized in the least squares sense and Bessel’s inequality
4.6 Exercises n
65
|f, xi |2 ≤ f 2
i=1
holds. These choices of ci are the Fourier coefficients of f . As n → ∞, the resulting series on the left must converge; therefore, we deduce the Riemann lemma lim f, xi = 0. i→∞
Consider the classical setting for Fourier series, where H is the set of square-summable functions on [0, 2π] using Lebesgue integration with the inner product given by 2π 1 f, g = f (x)g(x) dx. 2π 0 With n fixed, let M be the subspace generated by all trigonometric polynomials of order n. That is, let {x1 , . . . , xn } = {1, e±ix , e±2ix , . . . , e±mix }. The Fourier coefficients ci chosen as above makes g(x) the minimizing vector, the vector closest to a given f .
4.6 Exercises 4.1. Prove that in any metric space: (a) |d(x, y) − d(x, z)| ≤ d(y, z). (b) d(x1 , xn ) ≤ d(x1 , x2 ) + d(x2 , x3 ) + · · · + d(xn−1 , xn ). 4.2. Prove that if a sequence converges in a metric space, then its limit is unique. 4.3. (The space lp .) Let p ≥ 1 be a fixed real number, ∞let X pbe the set of all real sequences of the form x = {ξ1 , ξ2 , . . . } such that i=1 |ξi | is convergent. For two points x = {ξ1 , ξ2 , . . . } and y = {η1 , η2 , . . . }, let the distance be defined as 1/p ∞ p |ξi − ηi | . d(x, y) = i=1
Show that: (a) the series defining d(x, y) is convergent for all x, y ∈ X; and (b) X is a metric space. 4.4. Show that C[a, b] with distance defined using b |f (x) − g(x)| dx d(f, g) = a
is a metric space.
66
4. Inequalities in Abstract Spaces
4.5. Show that the set of all bounded sequences {xi } with d(x, y) = sup |xi − yi | 1≤i<∞
is a metric space. 4.6. Show that equation (4.7) gives rise to a valid norm. 4.7. In a triangle ABC let the points β and α be the midpoints of sides AC and BC, respectively. Show that if the side lengths of the triangle satisfy AC > BC, then the medians satisfy Aα > Bβ. 4.8. State and prove conditions for equality in Minkowski’s inequality. 4.9. Use the Cauchy–Schwarz inequality to prove Theorem 1.1.
5 Some Applications
5.1 Introduction We now look at a small subset of inequality applications. The reader who has worked patiently through the mathematical content of the previous chapters should be comfortable dealing with these and many other areas of inequality application. The topics of this chapter are chosen for variety, and are presented in no particular order (just as we might encounter them in practice).
5.2 Estimation of Integrals This idea was introduced in Chapter 2, and we now offer some additional examples. Note that the triangle, Cauchy–Schwarz, and Minkowski inequalities all provide upper bounds for an integral. A useful lower bound can often be obtained from the Chebyshev inequality. Example 5.1. Consider the integral I=
1
1 + x5 dx.
0
√ Because the integrand √ takes extreme values of 1 and 2 on [0, 1], the inequality 1 ≤ I ≤ 2 is easily obtained. However, Cauchy–Schwarz with
68
5. Some Applications
g(x) ≡ 1 gives an improved upper bound: 1 1 7 5 | 1 + x | dx ≤ (1 + x5 ) dx = ≈ 1.167. 6 0 0 So 1 ≤ I ≤ 7/6. (A numerical evaluation gives I ≈ 1.075.) Example 5.2. Given two functions f (t) and g(t) defined on (−∞, ∞), the convolution of f (t) and g(t), written f (t) ∗ g(t), is defined by ∞ f (x)g(t − x) dx f (t) ∗ g(t) = −∞
provided the integral exists. The function f ∗ g is bounded provided that both f (t) and g(t) are square integrable on (−∞, ∞) (see Exercise 3.15). For by Cauchy–Schwarz we have ∞ 2 f (x)g(t − x) dx |f (t) ∗ g(t)|2 = −∞ ∞ ∞ ≤ |f (x)|2 dx |g(t − x)|2 dx. −∞
−∞
Hence |f (t) ∗ g(t)| < ∞ for all t, and f ∗ g is bounded. Example 5.3. The integrand of 5 I= ex (x + 1) dx 2
is a product of functions that both increase on [2, 5]. Hence by the Chebyshev inequality 5 1 5 x 9 I≥ e dx (x + 1) dx = (e5 − e2 ) ≈ 635. 3 2 2 2 An upper bound can be obtained from Cauchy–Schwarz: 5 5 e2x dx (x + 1)2 dx ≈ 832. I≤ 2
2
The precise value of I is close to 727.
5.3 Series Expansions Series of functions arise in many contexts. Suppose the functions ui (x), for i = 1, 2, 3, . . . , have a common domain D along the x-axis. The nth partial
5.3 Series Expansions
sum of the series
69
ui (x) is n
Sn (x) =
ui (x).
i=1
ui (x0 ) as a numerical series and For each fixed x0 ∈ D we could view decide on its convergence or divergence via the standard tests from calculus. This would yield information about pointwise convergence of ui (x). However, uniform convergence is more important. We say that ui (x) converges uniformly to u(x) on D if and only if for every ε > 0 there is an N > 0 (dependent on ε but not on x) such that for every n and for every x in D, n > N ⇒ |u(x) − Sn (x)| < ε. Uniform convergence can settle the question whether a given series of functions can be integrated or differentiated termwise. The manipulation b ∞ b ∞ ui (x) dx = ui (x) dx a i=1
i=1
a
is valid provided that the functions ui (x) are integrable and verges uniformly on [a, b]. The manipulation ∞
ui (x) con-
∞
d d ui (x) ui (x) = dx i=1 dx i=1 is valid for all x ∈ [a, b] if the functions ui (x) have continuous derivatives in [a, b], ui (x) converges uniformly on [a, b], and the differentiated series on the right converges uniformly in [a, b]. A useful lemma called the Weierstrass M -test provides sufficient conditions for uniform convergence. Suppose a convergent series of positive constants Mi can befound such (x)| ≤ M . Call u(x) = ui (x) and that for all x in D and for all i, |u i i M = Mi . Then ∞ ∞ ∞ n ui (x) ≤ |ui (x)| ≤ Mi = M − Mi . |u(x) − Sn (x)| =
i=n+1
i=n+1
i=n+1
i=1
But Mi is convergent; hence, given ε > 0, we can choose N such that for n > N the last absolute value quantity is less than ε. Thus ui (x) converges uniformly on D. Example 5.4. Let f (x) be periodic with period 2π. The series a0 +
∞
an cos nx + bn sin nx
n=1
is the Fourier series of f (x) if a0 =
1 2π
π
f (x) dx, −π
70
5. Some Applications
and for n ∈ N, an =
1 π
π
f (x) cos nx dx, −π
bn =
1 π
π
f (x) sin nx dx. −π
Convergence (especially uniform convergence) of Fourier series has received a great deal of study, and a general treatment of the topic involves Lebesgue integration. However, it is instructive to see a simple set of convergence conditions established using the M test. If f (x) has continuous derivatives through order 2 for all x, then the trigonometric Fourier series of f (x) converges uniformly everywhere. We integrate the formula for an by parts twice and make use of the periodicity of f (x) and its derivative to get π 1 f (x) cos nx dx. an = − 2 n π −π Now since f (x) is continuous on [−π, π], it attains maximum and minimum values on that interval. Hence, for some B > 0, π 1 2B |an | ≤ 2 |f (x)| dx ≤ 2 . n π −π n Similarly, |bn | ≤ 2B/n2 . The desired conclusion −p follows from the M test and convergence of the numerical series n for p > 1. Another important type of expansion is the asymptotic expansion. In an asymptotic sequence of functions each term is dominated, in o fashion, by the previous term. Thus, the criterion for {wn (x)} to be an asymptotic sequence for x → x0 is wn+1 (x) = o(wn (x)), or equivalently lim
x→x0
wn+1 (x) = 0. wn (x)
The weighted sum an wn (x), where the an are constants, might turn out to be a good approximation to some function f (x) when x is close to x0 . If f (x) −
m
an wn (x) = o(wm (x))
(x → x0 ),
(5.1)
n=1
then the summation is an asymptotic expansion to m terms of f (x) for x → x0 , and we write f (x) ∼
m
an wn (x)
(x → x0 ).
n=1
(The special case m = 1 gives rise to a single-term “expansion” known as an asymptotic formula for f .) For fixed m, the difference between f and its
5.3 Series Expansions
71
asymptotic expansion approaches zero faster than the last term included in the expansion. Many functions have asymptotic expansions for large x of the form m an (x → ∞), f (x) ∼ n x n=0 i.e., in inverse powers of x. If such a function can be written without approximation as m an f (x) = + Rm (x) xn n=0 then suitable criteria on the remainder are that for any fixed m, 1 Rm (x) → 0, Rm = O (x → ∞). xm+1
(5.2)
The latter O requirement stipulates the existence of some finite B such that for sufficiently large x, |Rm | ≤ B/xm+1 . Hence xm |Rm | ≤ B/x, and the quantity Rm /(1/xm ) is squeezed to zero as x → ∞, implementing the o requirement in (5.1). Example 5.5. Consider the function g(x) defined by the integral
∞
g(x) = x
ex−t dt. t
An m-fold integration by parts yields g(x) =
(m − 1)! 1 2! 3! 1 − 2 + 3 − 4 + · · · + (−1)m−1 + Rm (x), x x x x xm
where
m
∞
Rm (x) = (−1) m! x
ex−t dt. tm+1
But we can give a little and write
∞
x
ex−t dt ≤ tm+1
∞
x
ex−t 1 dt = m+1 , xm+1 x
so that (5.2) is satisfied, and thus g(x) ∼
1 2! 1 (m − 1)! 3! − + 3 − 4 + · · · + (−1)m−1 . x x2 x x xm
72
5. Some Applications
5.4 Simpson’s Rule Suppose we need a numerical estimation of an integral of the form b f (x) dx.
(5.3)
a
We assume that all derivatives of f (x) formed in the next discussion exist and are continuous. Interval [a, b] is partitioned into 2n subintervals each of length ∆x = (b − a)/2n, and f (x) is approximated by a quadratic polynomial on the first two subintervals, another quadratic polynomial on the third and fourth subintervals, and so on. Of course, polynomials are easy to integrate, and the sum of the integrals of the approximating polynomials is used to approximate (5.3). In order to carry out the integration of the polynomials, we mention Lagrange interpolation: Let {x0 , x1 , . . . , xn } be n + 1 distinct points. Define the function li (x) = j=i
(x − xj ) . (xi − xj )
Then li (xi ) = 1 and li (xj ) = 0 if j = i. The polynomial pn (x) =
n
f (xi )li (x)
i=0
interpolates f (x) at {x0 , x1 , . . . , xn }; that is, pn (x) = f (x) at each xi . We call h = ∆x. We now restrict our attention to the first two intervals: x1 = x0 + h, x2 = x0 + 2h. On [x0 , x2 ] p2 (x) = f (x0 )l0 (x) + f (x1 )l1 (x) + f (x2 )l2 (x).
x2 The integral x0 p2 (x) dx, which we denote by S[x0 ,x2 ] , is after simplification h (f (x0 ) + 4f (x1 ) + f (x2 )). (5.4) 3
x To find the difference (error) between the integral x02 f (x) dx and the approximation S[x0 ,x2 ] we expand both terms in Taylor series. Define F (x) =
x f (t) dt. By Theorem 2.5, F (x) = f (x), F (x) = f (x), etc. x0 S[x0 ,x2 ] =
F (x0 + 2h) = F (x0 ) + F (x0 )2h + · · · + = f (x0 )2h + · · · +
F (5) (x0 ) (2h)5 + O(h6 ) 5!
f (4) (x0 ) (2h)5 + O(h6 ), 5!
f (x0 + h) = f (x0 ) + f (x0 )h + · · · +
f (4) (x0 ) 4 (h) + O(h5 ), 4!
5.4 Simpson’s Rule
f (x0 + 2h) = f (x0 ) + f (x0 )2h + · · · +
Since
73
f (4) (x0 ) (2h)4 + O(h5 ). 4!
x2
f (x) dx = F (x0 + 2h) x0
substituting the Taylor series for the various terms and simplifying, we get x2 −h5 (4) f (x0 ) + O(h6 ). f (x) dx − S[x0 ,x2 ] = 90 x0 Summing over all pairs of intervals, we get the difference b −h5 (4) f (x0 ) + · · · f (x) dx − S[a,b] = 90 a −h5 (4) + f (x2n−2 ) 90 + O(h6 ) + · · · + O(h6 ),
(5.5)
where S[a,b] is the Simpson approximation given by S[a,b] = S[x0 ,x2 ] + · · · + S[xn−2 ,xn ] h = (f (x0 ) + 4f (x1 ) + 2f (x2 ) + · · · + 4f (x2n−1 ) + f (x2n )). 3 Let M and m denote the maximum and minimum values, respectively, of f (4) (x) on [a, b]. Then nm ≤ f (4) (x0 ) + · · · + f (4) (x2n−2 ) ≤ nM, f (4) (x0 ) + · · · + f (4) (x2n−2 ) ≤ M. n So by the intermediate value property, for some ξ ∈ [a, b], m≤
f (4) (x0 ) + · · · + f (4) (x2n−2 ) = f (4) (ξ). n Since b − a = 2nh, we can write the sum of the n terms −h5 (4) −h5 (4) −h4 (4) f (x0 ) + · · · + f (x2n−2 ) = f (ξ)(b − a). 90 90 180 Similarly, the terms O(h6 ) + · · · + O(h6 ) = nO(h6 ) = ((b − a)/2h)O(h6 ) = O(h5 ). The error, (−h4 /180)f (4) (ξ)(b − a) + O(h5 ), more simply put, is b f (x) dx − S[a,b] = O(h4 ). a
74
5. Some Applications
This expression for the error is of theoretical interest: if (b − a) and the higher derivatives of f (x) are not large, then Simpson’s rule is very accurate for small h. In practice, we seldom know the fourth derivative of a function, and instead keep doubling the number of partitions until (within a specified tolerance) convergence, noting that the sum of function evaluations at the interior points at one iteration gives the sum of the function evaluations with even indices at the next iteration, hence does not need to be recomputed. For other methods including Rhomberg’s method, see Patel [49].
5.5 Taylor’s Method Suppose we are given the initial-value problem y = f (x, y)
(y(a) = y0 ).
(5.6)
Suppose y(x) is a solution to the problem and that a numerical estimate of y(b) is to be computed. We suppose y(x) ∈ C p+1 [a, b]. Assuming y(xn ) has been computed (accurately), we want to compute y at the next value of x, at xn+1 = xn + h. By Taylor’s theorem h2 + ··· 2 hp hp+1 . + y (p) (xn ) + y (p+1) (ξ) p! (p + 1)!
y(xn+1 ) = y(xn ) + y (xn )h + y (xn )
(5.7)
Since y(x) is an unknown function of x, we do not know y (x), y (x), . . . , explicitly. By (5.6) we do know y (x) in terms of x, y(x) and by the chain rule we can express y (x), y (x), . . . in terms of x, y(x). To simplify notation, we write ∂f /∂x as fx , ∂f /∂y as fy , etc. Since y (x) = f (x, y),
(5.8)
by the chain rule y (x) = f (x) = fx (x, y) + fy (x, y)y (x) = fx (x, y) + fy (x, y)f (x, y). We shorten the notation for the right-hand side so y (x) = (fx + fy f )|(x,y) .
(5.9)
y = (fxx + 2f fxy + fyy f 2 + fx fy + fy2 f )|(x,y) .
(5.10)
Similarly,
Similar but more complicated expressions hold for the higher derivatives. Fortunately, the tedium formerly involved in entering such expressions into
5.5 Taylor’s Method
75
a Fortran or C program has been overcome by the availability of symbolic manipulators, which permit us to cut and paste symbolic computations of derivatives into a program (see Exercise 5.5). We write Taylor’s series for y(xn + h) as y(xn+1 ) = y(xn ) + hΦp (xn , y(xn )) + y (p+1) (ξ) where
Φp (x, y) =
f+
hp+1 , (p + 1)!
h h(p−1) (p−1) , (fx + fy f ) + · · · + f 2 p! (x,y)
where the term f (p−1) (x, y(x)) is the (p − 1)th derivative with respect to x and can be expanded as in (5.8), (5.9), etc. Taylor’s algorithm of order p uses the Taylor polynomial of degree p to get the next approximation to y(xn+1 ). In other words, let yn denote the last computed approximation to the exact value y(xn ). Then the next approximation yn+1 = yn + hΦp (xn , yn ). The special case of p = 1 gives the familiar Euler method: yn+1 = yn + hf (xn , yn ). Of course, the remainder term has been dropped, so at each step there is a (discretization) error caused by approximating the value of y by its Taylor series of order p. In order to simplify the discussion we ignore any errors due to roundoff in floating point arithmetic. A local error can grow along with a solution, and so at a later stage earlier discretization errors might have grown along with the solution. To see how the error can grow, call the difference at xn between the exact solution and the computed approximation en = y(xn ) − yn . Then en+1 − en = y(xn+1 ) − yn+1 − (y(xn ) − yn )) = y(xn+1 ) − y(xn ) − (yn+1 − yn ) = hΦp (xn , y(xn )) + y (p+1) (ξn )
hp+1 − hΦp (xn , yn ). (p + 1)! (5.11)
We now add the hypothesis that there exists a positive constant L such that for all u, v ∈ R and all x ∈ [a, b], |Φp (x, u) − Φp (x, v)| ≤ L|u − v|.
(5.12)
Since we have assumed y(x) ∈ C p+1 [a, b] there is some constant Y such that |y (p+1) (x)| ≤ Y
(x ∈ [a, b]).
(5.13)
76
5. Some Applications
Then by (5.11), (5.12), and (5.13), |en+1 | ≤ |en | + hL|y(xn ) − yn | + Y
hp+1 , (p + 1)!
hence |en+1 | ≤ (1 + hL)|en | + Y
hp+1 . (p + 1)!
(5.14)
To see how quickly en can grow we construct, by replacing inequality with equality, a sequence {zn } that dominates {|en |}. In other words, assuming our initial condition y(a) = y0 is correct, we define z0 = e0 = 0 and, for all positive n, hp+1 . zn+1 = (1 + hL)zn + Y (p + 1)! Call B = Y hp+1 /(p + 1)!
(5.15)
so z1 = B, z2 = (1 + hL)z1 + B = ((1 + hL) + 1)B, .. . zn = ((1 + hL)n−1 + · · · + 1)B.
(5.16)
Summing the geometric series, we get zn = B
(1 + hL)n − 1 (1 + hL)n − 1 = B. 1 + hL − 1 hL
By a Taylor series argument 1 + hL < ehL , so zn ≤
ehLn − 1 B. hL
Since our x values xn are in [a, b], nh ≤ (b − a), and, using (5.15), we have hp Y L(b−a) |en | ≤ zn ≤ e →0 (h → 0). −1 L (p + 1)! Because Y and L are constants, at least in theory Taylor’s method converges to the exact solution as h → 0. The error bound, although comforting, is not useful in practice. However, we can compare the results at the same final point b for two different choices of h, say h and h/2, and (usually) safely assume that h has been made sufficiently small if the results agree to within a specified tolerance. See [49] for more sophisticated methods of choosing stepsize h appropriately.
5.6 Special Functions of Mathematical Physics
77
5.6 Special Functions of Mathematical Physics Applied science is replete with so-called special functions, many of which satisfy interesting inequalities [1, 4, 27, 58]. We provide a basic selection of examples in order to whet the reader’s appetite. Example 5.6. If [z] > 0, the gamma function Γ(z) is given by Euler’s integral of the second kind: ∞ Γ(z) = tz−1 e−t dt. 0
When z = x where x is real and positive, Γ(x) can be differentiated any number of times, with ∞ dn Γ(x) = tx−1 e−t (ln t)n dt dxn 0 for any x > 0. By the Cauchy–Schwarz inequality, ∞ ∞ |Γ (x)|2 ≤ (t(x−1)/2 e−t/2 )2 dt (t(x−1)/2 e−t/2 ln t)2 dt 0
0
and we obtain the result |Γ (x)|2 ≤ Γ(x)Γ (x). In the complex-argument case, ∞ x−1 −t iy |Γ(x + iy)| = t e t dt ≤ 0
∞
0
|tx−1 e−t ||tiy | dt,
where |tiy | = |eiy ln t | = 1, and hence |Γ(x + iy)| ≤ |Γ(x)|. Of course, with positive integer arguments the gamma function reduces to the factorial function, with Γ(n) = (n − 1)! (see Exercise 5.6). Example 5.7. For x > 0 and n = 0, 1, 2, . . . , a sequence of exponential integrals ∞ −xt e dt En (x) = tn 1 may be defined. The observation ∞ −xt 2 ∞ −xt/2 −xt/2 2 e e e dt = dt tn t(n−1)/2 t(n+1)/2 1 1 2 ∞ −xt/2 2 ∞ e−xt/2 e ≤ dt dt (n−1)/2 t t(n+1)/2 1 1
78
5. Some Applications
leads to the inequality En2 (x) ≤ En−1 (x)En+1 (x) for n = 1, 2, 3, . . . . Example 5.8. The function Tn (x) = cos(n cos−1 x) for n ∈ N is the Chebyshev polynomial of the first kind, order n. Obviously, |Tn (x)| ≤ 1 whenever −1 ≤ x ≤ 1. With x = cos p, differentiation gives 1 dTn (p) sin np dTn (x) =− =n . dx sin p dp sin p The maximum occurs at p = 0, and we obtain dTn (x) 2 dx ≤ n whenever −1 ≤ x ≤ 1. Example 5.9. The Bessel function of the first kind and integer order n may be defined for −∞ < x < ∞ by the series expansion Jn (x) =
∞ (−1)m (x/2)2m+n , m!(m + n)! m=0
or by the integral representation 1 π Jn (x) = cos(nt − x sin t) dt. π 0 Immediately from the integral representation 1 π 1 π |cos(nt − x sin t)| dt ≤ (1) dt = 1. |Jn (x)| ≤ π 0 π 0 Other useful properties of Jn (x), such as J−n (x) = (−1)n Jn (x) may be derived from the series expansion. Still others follow from the generating function relation ∞ x 1 Jn (x)tn , t− = exp 2 t n=−∞
5.6 Special Functions of Mathematical Physics
79
among these the symmetry property Jn (−x) = (−1)n Jn (x), the fact that J0 (0) = 1, and the addition theorem ∞
Jn (x + y) =
Jm (x)Jn−m (y).
m=−∞
Putting y = −x and n = 0 we obtain ∞
J0 (0) = 1 =
Jm (x)J−m (−x)
m=−∞
so that 1 = J0 (x)J0 (−x) + = J02 (x) + 2
∞
∞
[J−m (x)Jm (−x) + Jm (x)J−m (−x)]
m=1 2 Jm (x).
m=1
Hence the bound
√ |Jm (x)| ≤ 1/ 2
for m = 1, 2, . . . . It is also of interest to note the interlacing property of the zeros of consecutively ordered Bessel functions. We take Rolle’s theorem, along with the differentiation formulas [xn Jn (x)] = xn Jn−1 (x) = −x−n Jn+1 (x) which hold for any x > 0. With n = k and n = k + 1, we obtain [xk Jk (x)] = −x−k Jk+1 (x),
[xk+1 Jk+1 (x)] = xk+1 Jk (x),
respectively. The first of these equations implies that between any two zeros of Jk , Jk+1 has at least one zero; but the second implies that between any two zeros of Jk+1 , Jk has at least one zero. Hence, each function has one and only one zero between each pair of zeros of the other, and the interlacing property is established. Example 5.10. The Legendre polynomials Pn (x), for n = 0, 1, 2, . . . , are solutions to a certain common ordinary differential equation; they are also given [27] by the Laplace integral formula 1 π 1 π Pn (x) = [x + x2 − 1 cos t]n dt = [x + i 1 − x2 cos t]n dt π 0 π 0
80
5. Some Applications
for |x| ≤ 1, and possess many other properties, among them the integral 1 xm Pn (x) dx = 0 (0 ≤ m < n), −1
and the recursion formula (x2 − 1)Pn (x) = n[xPn (x) − Pn−1 (x)] By the Laplace formula,
π|Pn (x)| ≤
π
0 π
= ≤
0 π 0
(|x| < 1).
|x + i 1 − x2 cos t|n dt [x2 + (1 − x2 ) cos2 t]n/2 dt [x2 + (1 − x2 )]n/2 dt.
Hence the upper bound |Pn (x)| ≤ 1 whenever |x| ≤ 1 and n = 0, 1, 2, . . . . Alternatively, π [x2 + (1 − x2 ) cos2 t]n/2 dt π|Pn (x)| ≤ 0
π/2
=2 0
≤2
π/2
0
<2
0
π/2
[1 − (1 − x2 ) sin2 t]n/2 dt
2
1 − (1 − x )
2t π
2 n/2 dt
n/2 4(1 − x2 )t2 dt. exp − π2
The last step follows from the fact that e−x > 1 − x for every x > 0. Then π/2 2n(1 − x2 )t2 π|Pn (x)| ≤ 2 exp − dt π2 0 ∞ 2n(1 − x2 )t2 exp − ≤2 dt. π2 0 The remaining integral exists in closed form for n = 0 and gives us ! π |Pn (x)| ≤ 2n(1 − x2 ) for |x| < 1 and n = 1, 2, 3, . . . . The recursion relation can be treated with the triangle inequality, |xPn (x)| + |Pn−1 (x)| n|xPn (x) − Pn−1 (x)| ≤n 2 |x − 1| |x2 − 1| |x| + 1 |x| + 1 =n , ≤n 2 |x − 1| ||x|2 − 1|
|Pn (x)| =
5.6 Special Functions of Mathematical Physics
81
giving the upper bound |Pn (x)| ≤
n 1 − |x|
on the first derivatives whenever |x| < 1. We spend a bit more time on the Legendre polynomials. Example 5.11. The Legendre polynomials are orthogonal polynomials. A family of polynomials pn (x) for n = 0, 1, 2, . . . is said to be orthogonal with respect to a weight function w(x) > 0 on [a, b] if for m = n
b
pn (x)pm (x)w(x) dx = 0. a
One interesting fact about orthogonal polynomials concerns the ease with which a bound can be placed on the locations of their zeros. Putting m = 0 we have, for all n ≥ 1,
b
pn (x)w(x) dx = 0, a
and it is evident that there exists at least one x ∈ (a, b) at which pn (x) has a sign change. If all such points are denoted by x1 , . . . , xk , then the quantity pn (x)(x−x1 ) · · · (x−xk )w(x) never changes sign in [a, b]. However, k < n would imply that
b
pn (x)(x − x1 ) · · · (x − xk )w(x) dx = 0,
a
because pn (x) is orthogonal to any polynomial of degree less than n. (This follows from the fact that any polynomial of degree less than n can be written uniquely as a linear combination of the polynomials pj (x) for j < n.) So k ≥ n, hence k = n because pn (x) cannot have more than n zeros. Our conclusion: the zeros of pn (x) are all real, distinct, and located in (a, b). Example 5.12. Suppose that Pn (x), a polynomial of degree n, is divided through by its leading coefficient cn to give the related polynomial πn (x) = Pn (x)/cn . It turns out that πn (x) has a smaller norm on (−1, 1) than any other polynomial fn (x) of degree n with leading coefficient 1. To show this, we define the difference polynomial dn−1 (x) = fn (x) − πn (x)
82
5. Some Applications
of degree n − 1, and note that 1 2 [dn−1 (x) + πn (x)]2 dx fn (x) = −1 1
=
−1
d2n−1 (x) dx + 2
1
−1 2
dn−1 (x)πn (x) dx +
1
−1
πn2 (x) dx
= dn−1 (x)2 + πn (x) . Hence fn (x)2 ≥ πn (x)2 . We can sometimes make direct use of power series expansions in establishing bounds for special functions. Example 5.13. The modified Bessel function of the first kind and zeroth order is given by ∞ x2n . I0 (x) = (n! 2n )2 n=0 It is easily shown by induction that (n! 2n )2 ≥ (2n)! for any nonnegative integer n. Hence ∞ x2n I0 (x) ≤ = cosh x. (2n)! n=0 See also Exercise 5.10. We hasten to point out that bounds for interesting special functions are not always as easy to obtain as those we have seen here. For instance, much more work is needed to obtain a bound on the Hermite polynomials Hn (x) that are important in quantum mechanics. The interested reader can see Indritz [30] for a treatment of these functions, including an outline of steps leading to the inequality 2
|Hn (x)| ≤ (2n n!)1/2 ex
/2
.
5.7 A Projectile Problem Suppose an object is thrown straight upward with initial speed v0 . If the drag due to air resistance is directly proportional to instantaneous speed, which part of the subsequent motion would take longer: the upward flight, or the return trip? Newton’s second law dictates that the velocity vu (t) for the upward motion be described by dvu = −mg − kvu , m dt
5.7 A Projectile Problem
83
where m is the mass of the object, g is the free-fall acceleration constant, and k is the proportionality constant quantifying air resistance. With the given initial condition, this equation has solution vu (t) = γg[(1 + α)e−t/γ − 1], where γ = m/k and α = v0 /γg. The time tu to complete the upward motion can be found from the condition vu (tu ) = 0; it is tu = γ ln(1 + α) and the maximum height reached is tu vu (t) dt = γ 2 g[α − ln(1 + α)]. h= 0
Referring a new time origin to the start of the downward motion, the speed vd for the downward trip is governed by dvd 1 + vd = g. dt γ When subjected to the initial condition vd (0) = 0, the solution becomes vd (t) = γg(1 − e−t/γ ). The time interval td for completion of the downward motion must then satisfy td vd (t) dt = h or
0
γgtd + γ 2 g(e−td /γ − 1) = γ 2 g[α − ln(1 + α)].
This is a transcendental equation for the unknown td ; introducing the variable Td = td /γ, we can write it as F (Td ) = 0 where F (x) = x + e−x − (1 + α) + ln(1 + α). Because F (x) = 1−e−x > 0, F (x) is strictly increasing. Defining Tu = tu /γ we have 2+α 1 = 2 ln(1 + α) − α < 0, F (Tu ) = 2 ln(1 + α) − (1 + α) + 1+α 1+α where the last inequality is easily verified by differentiation. This and the monotonicity of F are enough to conclude (Figure 5.1) that Tu < Td . Hence, the object spends more time in its descent than it does in its ascent. Of course, the physical reliability of this conclusion depends on the correctness of the model employed. It is well known that in many (if not most) situations drag is actually proportional to the square of the speed — see Glaister [26] for a treatment of this case. For a more general analysis with arbitrary air resistance, see de Alwis [16].
84
5. Some Applications
y 6
F (x)
Td
F (Tu )
x
FIGURE 5.1. Times for projectile motion.
5.8 Geometric Shapes It is worthwhile to examine a few applications of inequalities to simple geometrical objects. Example 5.14. A polyhedron is a solid figure bounded by planes. Such a figure can be considered as a union of a (finite) number of polygonal faces. The faces are joined along line segments called edges, and at the two ends of each edge are points called vertices. Among the most beautiful of the polyhedra are the regular polyhedra, where the faces are all congruent regular polygons. That there exist only five of these (the Platonic solids) can be shown using simple arguments with inequalities. The proof is based on Euler’s formula, which states that for any simple polyhedron, the number of faces F , the number of edges E, and the number of vertices V are all related by the equation F − E + V = 2. For instance, the cube has F = 6, E = 12, V = 8, while the tetrahedron has F = 4, E = 6, V = 4. A precise definition of the term “simple” would require a topological digression that would send us too far afield; suffice it to say that we rule out shapes with holes in them, such as toroidal-shaped polyhedra [6]. Consider then a simple, regular polyhedron. Because the face polygons are all assumed identical, we may define a constant σ as the number of edges per face, and another constant v as the number of edges meeting at each vertex. It is readily apparent that σ ≥ 3, v ≥ 3. We may also assert that 2E = σF = vV , because each edge has two vertices and is shared by two faces. Elimination of F and V from Euler’s formula gives 1 1 1 1 + = + . σ v 2 E Now because
1 1 1 + > σ v 2
(5.17)
5.8 Geometric Shapes
85
we must rule out the possibility that both σ > 3 and v > 3. Putting σ = 3 in (5.17) we get 1 1 1 − = >0 v 6 E and hence the restriction 3 ≤ v ≤ 5; putting instead v = 3 we get 1 1 1 − = >0 σ 6 E or 3 ≤ σ ≤ 5. Hence there are only five permissible combinations, and they can be tabulated as follows: σ 3 3 3 4 5
ν 3 4 5 3 3
E 6 12 30 12 30
F = 2E/σ 4 8 20 6 12
V = 2E/v 4 6 12 8 20
Object tetrahedron octahedron icosahedron cube dodecahedron
These are the Platonic solids. An interesting class of inequalities called isoperimetric inequalities provides information about various extremal properties of geometric shapes. We encounter one of these inequalities in our next example (see [44]). Example 5.15. Let f (x) be periodic with period L. We showed previously that under suitable restrictions it can be represented by a uniformly convergent Fourier series. The form of the series in this case is f (x) = a0 +
∞
an cos
n=1
2πn 2πn x + bn sin x. L L
Similarly, for another function F (x) of the same period, F (x) = A0 +
∞ n=1
An cos
2πn 2πn x + Bn sin x. L L
Integration of the product of these two functions over [0, L] gives Parseval’s identity
L ∞ L 2a0 A0 + f (x)F (x) dx = (an An + bn Bn ) . (5.18) 2 0 n=1 Consider a simple, smooth, closed plane curve C with known length L as in Figure 5.2. We ask what shape C must have in order that its enclosed area A be maximized. (This is an extremal problem, but not of the type
86
5. Some Applications
y 6
s=0 q
∆y
A A q si A ∆sA A ∆x
C x FIGURE 5.2. Derivation of an isoperimetric inequality.
normally encountered in calculus courses.) We choose a reference point P on C, and define a parameter s to measure arc length along C to the point (x, y) as shown. We assume that x(s) and y(s), each with period L, are sufficiently smooth (i.e., differentiable) so that we may represent them as Fourier series: x(s) = a0 + y(s) = A0 +
∞
an cos
n=1 ∞ n=1
2πn 2πn s + bn sin s, L L
An cos
2πn 2πn s. s + Bn sin L L
Moreover, uniform convergence permits termwise differentiation: ∞ 2πn 2πn 2πn s − an sin s , x (s) = bn cos L L L n=1 ∞ 2πn 2πn 2πn y (s) = Bn cos s − An sin s . L L L n=1 The application of (5.18) to these series and addition of the results gives L ∞ 2π 2 2 2 [x2 (s) + y 2 (s)] ds = n (an + b2n + A2n + Bn2 ). L n=1 0 But x2 (s) + y 2 (s) ≡ 1 so that ∞ n=1
n2 (a2n + b2n + A2n + Bn2 ) =
L2 . 2π 2
5.8 Geometric Shapes
87
Referring again to the figure, we have for the enclosed area A≈
x(si )∆y =
i
x(si )
i
∆y ∆s ∆s
or in the limit
L
A=
x(s)y (s) ds = π
0
∞
n(an Bn − An bn )
n=1
by (5.18) and use of the differentiated series above. Then L2 − 4πA = 2π 2 = 2π 2
∞ n=1 ∞
n2 (a2n + b2n + A2n + Bn2 ) − 4π 2
∞
n(an Bn − An bn )
n=1
[(nan − Bn )2 + (nAn + bn )2 + (n2 − 1)(b2n + Bn2 )]
n=1
or, since the right member is nonnegative, A ≤ L2 /4π. This is the desired isoperimetric inequality, and will answer our maximization question. It is apparent that equality holds if and only if: (1) all the Fourier coefficients vanish whenever n ≥ 2; and (2) a1 = B1 and A1 = −b1 . Under these conditions 2π 2π s + b1 sin s, L L 2π 2π y(s) = A0 − b1 cos s + a1 sin s. L L
x(s) = a0 + a1 cos
Squaring and adding to eliminate s, we obtain (x − a0 )2 + (y − A0 )2 = a21 + b21 . Hence of all closed curves of a given length, a circle encloses the greatest area. Example 5.16. We can show that of all triangles having a given fixed perimeter, an equilateral triangle encloses the greatest area. The area A of a triangle is given by Heron’s formula A = s(s − a)(s − b)(s − c), where a, b, c are the side lengths and s is one-half the perimeter p. By the AM–GM inequality, ! 2 s (s − a) + (s − b) + (s − c) 3 A = 3 (s − a)(s − b)(s − c) ≤ = . s 3 3
88
5. Some Applications
Hence
s2 p2 A≤ √ = √ . 3 3 12 3
Equality is attained only if the numbers s − a, s − b, s − c are all equal.
5.9 Electrostatic Fields and Capacitance Electrostatics is the study of stationary electric charges and their effects on one another. Utmost in electrostatics is the conservative nature of the force field, which permits the vector electric intensity to be expressed as the gradient of a potential function. The electrostatic potential Φ(x, y, z) is produced by electric charge and satisfies Poisson’s equation ∇2 Φ =
∂2Φ ∂2Φ ∂2Φ ρ + + =− , ∂x2 ∂y 2 ∂z 2 0
where ρ = ρ(x, y, z) is the density of electric charge (Coulombs/meter3 ) and 0 is the free-space permittivity (a positive constant). Φ is convenient because of its scalar nature, and we can study some of its most fundamental properties through basic manipulations with inequalities. Example 5.17. Consider electric charge distributed with continuous density ρ(x, y, z) throughout a volume region V in unbounded free space. The resulting potential at points (x, y, z) external to V is given by ρ(x , y , z ) 1 dx dy dz , Φ(x, y, z) = 4π0 V R where R is the distance from (x, y, z) to an element of electric charge at the location (x , y , z ) of the differential volume dx dy dz . We inquire as to how Φ behaves at large distance from V . For simplicity we assume ρ(x, y, z) ≥ 0 in V ; the following argument is easily modified if negative charge is present. Let the maximum and minimum values of R for a fixed point (x, y, z) be RM and Rm , respectively. Then 1 1 1 ≤ ≤ RM R Rm and we obtain V
or
ρ dx dy dz ≤ RM
V
ρ dx dy dz ≤ R
V
ρ dx dy dz Rm
Q R Q R ≤ RΦ ≤ , 4π0 RM 4π0 Rm
5.9 Electrostatic Fields and Capacitance
where Q=
89
ρ dx dy dz
V
is the total charge in V . As R → ∞, R/RM and R/Rm both approach 1 and the middle term RΦ is squeezed to Q/4π0 . Hence Φ = O(R−1 ) and the potential is said to be regular at infinity. Example 5.18. (See [60]). Consider a two-dimensional situation where ∂2Φ ∂2Φ ρ(x, y) + =− ∂x2 ∂y 2 0
(5.19)
holds within a bounded domain D of the xy-plane. Let the boundary curve of D be C. We investigate a property possessed by any continuous solution Φ(x, y) under the condition that ρ is strictly negative so that the forcing function for (5.19) is strictly positive. If Φ is continuous on D ∪ C, then it must attain a maximum on D ∪ C, at point p0 = (x0 , y0 ) say. Now p0 ∈ D implies that simultaneously ∂ 2 Φ ∂ 2 Φ ≤ 0, ≤ 0, ∂x2 p0 ∂y 2 p0 a contradiction, and hence p0 ∈ C. Under the given assumptions max Φ must occur on the boundary contour C. Next, suppose that ρ is required only to be nonpositive in D. We take B to be an upper bound for Φ on C, and let Φ(x, y) = φ(x, y) − ε(x2 + y 2 ), where φ is a new function and ε > 0 is arbitrary. Substitution into (5.19) gives ∂2φ ∂2φ ρ(x, y) + 2 =− + 4ε. ∂x2 ∂y 0 Because φ is a solution of (5.19) and its forcing function is strictly positive, we know that max φ occurs on C. Because Φ(x, y) ≤ φ(x, y), max Φ ≤ max φ ≤ max φ = max [Φ + ε(x2 + y 2 )]
(x,y)∈D
(x,y)∈D
(x,y)∈C
(x,y)∈C
= B + ε max (x2 + y 2 ) (x,y)∈C
for every ε > 0. Hence for every (x, y) ∈ D, Φ(x, y) ≤ B. The two results obtained above, called maximum principles, yield prior knowledge about the behavior of all possible solutions of Poisson’s equation under certain prescribed conditions. The case ρ(x, y) ≡ 0 of (5.19) is the important Laplace equation ∂2Φ ∂2Φ + = 0. ∂x2 ∂y 2
(5.20)
90
5. Some Applications
We suppose for (5.20) that there exist positive numbers b and B such that for every (x, y) ∈ C, b ≤ Φ(x, y) ≤ B.
(5.21)
Then Φ(x, y) ≤ B in D by the maximum principle above. Moreover, −Φ also satisfies (5.20) and the condition −B ≤ −Φ(x, y) ≤ −b on C. Hence on D we have −Φ(x, y) ≤ −b and it follows that (5.21) holds there. In other words, both the maximum and minimum values of any solution to (5.20) in a bounded domain must occur on the boundary of the domain. The reader interested in pursuing this area further could obtain Protter [51]. A quantity of interest in electrostatics is the capacitance of a metallic body. Consider a conducting solid with boundary surface A and held at potential Φ0 , and let Φ be the potential produced by the charge on the body. The capacitance of the body is defined as the ratio of the total charge it carries to the surface potential Φ0 , and is given by 0 C= 2 |∇Φ|2 dV, (5.22) Φ0 Ve where volume integration is done over the space Ve exterior to A. Based on this relation we can derive an inequality that provides a convenient upper bound on the capacitance. We introduce two new scalar fields f and δ (not having any particular physical interpretation) such that f (x, y, z) = Φ(x, y, z) + δ(x, y, z), where δ = 0 on the body A and δ → 0 at large distance from A so that f satisfies the same boundary conditions as Φ. Notice that |∇f |2 dV = |∇Φ + ∇δ|2 dV Ve Ve = |∇Φ|2 dV + 2 ∇δ · ∇Φ dV + |∇δ|2 dV, Ve
Ve
Ve
so that by (5.22), Green’s formula " ∂ψ dS = φ ∇φ · ∇ψ dV + φ∇2 ψ dV S ∂n V V and Laplace’s equation, " Φ20 C ∂Φ 2 2 dS − |∇f | dV = +2 δ δ∇ Φ dV + |∇δ|2 dV 0 Ve S ∂n Ve Ve Φ20 C + |∇δ|2 dV. (5.23) = 0 Ve
5.9 Electrostatic Fields and Capacitance
91
VeL V
FIGURE 5.3. Bound on capacitance.
Because the rightmost term is nonnegative, we have Dirichlet’s principle 0 |∇f |2 dV. (5.24) C≤ 2 Φ0 Ve Equality is attained when f = Φ, i.e., the actual potential; any other function that fits the same boundary conditions will overestimate C. Example 5.19. The capacitance C of a body is less than that of any other body that can completely surround it (Figure 5.3). For let ΦL be the actual potential due to the larger body, and take f = ΦL at points in the region VeL outside the larger body while taking f = Φ0 everywhere inside. Then f satisfies the boundary conditions for both bodies and 0 0 0 C≤ 2 |∇f |2 dV = 2 |∇Φ0 |2 dV + 2 |∇ΦL |2 dV = CL . Φ0 Ve Φ0 V Φ0 VeL For instance, because the capacitance of a sphere is elementary we could get a rough estimate of the capacitance of a cube by inscribing and circumscribing appropriate spheres. The reader interested in more sophisticated uses of Dirichlet’s principle is referred to Polya and Szeg¨o [50] who, for instance, show that the capacitance of a convex but otherwise arbitrarily shaped body cannot exceed the capacitance of a certain related prolate spheroid. The capacitance of a prolate spheroid of eccentricity e is well known [56] and is given by C=
8π0 ae , ln[(1 + e)/(1 − e)]
(5.25)
where a is the major semiaxis of the generating ellipse. The capacitance of a convex body can never exceed the capacitance of a prolate spheroid, the major and minor semiaxes of which are the mean radius and surface radius, respectively, of the body. These last terms are further defined in the
92
5. Some Applications
reference, where this elegant result is used to attack the difficult problem of better estimating the capacitance of a cube. Another approach to the estimation of electrical capacitance is based on a geometrical concept of symmetrization. Steiner symmetrization of a given solid B with respect to a plane P is an operation which changes B into new solid B such that: • B is symmetric with respect to P ; • if L is a straight line perpendicular to P , then L intersects B if and only if L intersects B , and both intersections have the same length; • L ∩ B is just one line segment, bisected by P (or, is a point of P in a degenerate case). P is known as the plane of symmetrization for the operation. For instance, suppose our original solid is the hemispherical ball B = {(x, y, z) | 0 < z < a2 − x2 − y 2 } and P is the z = 0 plane. We see that the new solid # 1 2 2 2 a −x −y B = (x, y, z) | |z| < 2 satisfies the three conditions of the definition of symmetrization; hence, B is the ellipsoid y2 z2 x2 + + = 1. a2 a2 (a/2)2 Symmetrization has useful properties. First, the solids B and B have equal volumes (as is easily verified for the example above). Second, the operation does not increase surface area; that is, supposing B has boundary area S and B has boundary area S , then S ≤ S. A similar relation holds between the capacitances C and C of metallic objects formed in the shapes of B and B , respectively, i.e., that C ≤ C. Polya and Szeg¨o discuss an ingenious application of this idea to the calculation of capacitance of arbitrarily shaped bodies. The main ideas are as follows. We begin with an arbitrarily shaped body B (0) having known volume V and unknown capacitance C (0) . Now imagine symmetrizing the body repeatedly, with respect to any number of different successive planes. After the nth such symmetrization, we get a body B (n) whose volume is still V but whose capacitance C (n) satisfies C (n) ≤ C (n−1) ≤ · · · ≤ C (0) .
5.10 Applications to Matrices
93
As n → ∞ we should arrive at a sphere of volume V ; letting e → 0 in (5.25), the capacitance of this simple object is 4π0 a where a is the radius. Since the volume is V = 4πa3 /3, we can eliminate a from the capacitance expression and assert that ! 3 3V (0) C ≥ 4π0 4π for the body B (0) . For a metallic cube of edge length L, for instance, this would yield ! 3 3 3L ≈ 7.7960 L C ≥ 4π0 4π as a lower bound.
5.10 Applications to Matrices Matrix theory and linear algebra contain many references to inequalities. Given an n × n square matrix of complex elements a11 · · · a1n .. , .. A = ... . . an1 · · · ann an important related matrix is the conjugate transpose A† of A: a11 · · · an1 .. . .. A† = ... . . a1n
···
ann
In terms of inner products, if x, y ∈ Cn , then x, Ay = A† x, y. The matrix A is called Hermitian or self-adjoint if A = A† . Because x, Ay = A† x, y for any vectors x, y ∈ Cn and any complex matrix A, the Hermitian matrix A satisfies x, Ay = Ax, y.
(5.26)
Next, we derive an important inequality involving the eigenvalues of a square matrix A. Recall that the eigenvalues λ1 , λ2 , . . . , λn of A are scalars satisfying Ax = λi x for some nonzero column vectors x (the corresponding eigenvectors). We first note that if λ is an eigenvalue of the Hermitian matrix A, then λ must be real. To see this, let λ be an eigenvalue with corresponding eigenvector x. Then x, xλ = x, λx = x, Ax = Ax, x = λx, x = λx, x.
94
5. Some Applications
Since x, x = 0, λ = λ. In case A is a real matrix, Hermitian means symmetric: aij = aji for all i, j. We now restrict our discussion to symmetric matrices. Now suppose λ1 and λ2 are two (distinct) eigenvalues of the symmetric matrix A with corresponding eigenvectors x1 , x2 . Then x1 and x2 are orthogonal, i.e., x1 , x2 = 0. To see this, we write x1 , x2 λ2 = x1 , λ2 x2 = x1 , Ax2 = Ax1 , x2 = λ1 x1 , x2 = λ1 x1 , x2 . But λ1 = λ2 , hence x1 , x2 = 0. The following several theorems are useful: Theorem 5.1. Let A be an n × n symmetric (real) matrix. Suppose the (real) eigenvalues {λi } satisfy λ1 < λ2 < · · · < λn . Define the quadratic form for x ∈ Rn by Q(x) = x, Ax. Then, for any x ∈ Rn , λ1 x2 ≤ Q(x) ≤ λn x2 . Proof. Let {x1 , . . . , xn } be corresponding eigenvectors. We may assume each xi satisfies xi = 1. (Otherwise replace xi by xi /xi .) By our previous observation, the vectors {x1 , . . . , xn } are an orthonormal set. Hence they are linearly independent and form a basis. Thus there exist coefficients {ci } such that n ci xi . x= i=1
Then Q(x) =
& n
ci xi , A
i=1
n
' cj xj
=
& n
j=1
so that Q(x) ≤ λn
i=1 n
ci xi ,
n j=1
' cj λj xj
=
n
c2i λi ,
i=1
c2i = λn x2 .
i=1 2
Similarly, λ1 x ≤ Q(x). Theorem 5.2 (Sylvester’s Criterion). Let A be a symmetric n × n real matrix. The quadratic form defined by Q(x) = xT Ax = x, Ax for x ∈ Rn is positive definite if and only if the determinants a11 · · · a1n a11 a12 . .. .. a11 , . . a21 a22 , . . . , .. an1 · · · ann are all positive.
5.10 Applications to Matrices
95
Proof. Recall from linear algebra that an n × n symmetric matrix A is called positive definite if xT Ax > 0 whenever the vector x = 0. We discuss the theorem for n = 2 and refer the reader to Gelfand [25] for the general case.Suppose Q(x) is positive definite. That is, Q(x) > 0if x = 0. Choose x1 1 x= . Then Q(x) = a11 > 0. Now choose x = . Because x = 0 1 0 we have Q(x) = a11 x21 + 2a12 x1 + a22 > 0 for all x1 so by our discussion of quadratic inequalities in Chapter 1, a11 a12 > 0. ∆= a12 a22 The converse is proved in a similar way. Theorem 5.3 (Second Derivative Test for n Variables). Let U be an open set in Rn . Let f (x) ∈ C 2 (U ). Let x0 ∈ U and suppose f (x0 ) = 0 and f (x0 ) is positive definite. Then f (x) has a local minimum at x0 . That is, there is a positive δ such that if 0 < x − x0 < δ, then f (x) > f (x0 ). Proof. We first discuss some of the terms used. f (x) ∈ C 2 (U ) means all first and second partial derivatives of f with respect to {x1 , . . . , xn } exist and are continuous in U . f (x0 ) represents the n-tuple, or row vector (∂f (x0 )/∂x1 , . . . , ∂f (x0 )/∂xn ). The second derivative f (x0 ), or Hessian, is the n×n whose (i,j)th entry is ∂ 2 f (x0 )/∂xi ∂xj . If x is the column matrix x1 .. vector . , then xT (the transpose) is the row vector (x1 , . . . , xn ). By xn Sylvester’s criterion, the second derivative f (x0 ) is positive definite if and only if the determinants ∂2f ∂2f · · · 2 2 2 ∂ f ∂ f ∂x1 ∂xn ∂x1 ∂ 2 f ∂x2 ∂x ∂x 1 2 .. .. .. 1 , . . . , 2 . . . ∂x2 , ∂ 2 f ∂ f 1 ∂x ∂x ∂2f ∂2f 1 2 ∂x22 ··· ∂x ∂x 2 n 1 ∂xn are all positive at x0 . By persistence of sign applied to n variables, if the determinants are positive at x0 , then they are positive nearby. Choose a positive δ such that x ∈ U and all the determinants are positive at x whenever x − x0 < δ. Now let 0 < ∆x < δ. The following generalizes equation (2.4) to n variables: 1 f (x0 + ∆x) = f (x0 ) + f (x0 )∆x + (∆x)T f (ξ)(∆x) 2 for some ξ (on the line segment) strictly between x0 and x0 +∆x. (See [37, 17].) Because f (x0 ) = 0 and f (ξ) is positive definite the result follows by inspection.
96
5. Some Applications
Another useful concept is the trace of a square matrix M , denoted by † tr[M ] and defined n as the sum of the diagonal elements of M . With B = A A we have bij = k=1 aki akj and the trace of B is n
bmm =
m=1
n n
akm akm =
m=1 k=1
n n
|akm |2 ≥ 0.
m=1 k=1
Hence tr[A† A] ≥ 0, and equality holds if and only if A is the zero matrix. We use this simple result to derive another inequality involving matrix eigenvalues. Because λx−Ax = λIx−Ax = (λI −A)x where I is the identity matrix having the same dimension as A, the eigenvalues are conveniently computed as solutions of the characteristic equation det(λI − A) = 0. For any square matrix S there is a unitary matrix U (i.e., one having U −1 = U † ) such that U † SU is upper triangular. This upper triangular matrix T = U † SU is said to be unitarily similar to S. Since det(λI − T ) = det(λU † IU − U † SU ) = det(U −1 ) det(λI − S) det(U ) = det(λI − S) it is apparent that T has the same eigenvalues as does S; moreover, the eigenvalues of T reside along the main diagonal. We use these facts as follows. Suppose A is square with eigenvalues λ1 , . . . , λn . Then B = U † AU is upper triangular and BB † = (U † AU )(U † AU )† = (U † AU )(U † (U † A)† ) = (U † AU )(U † A† U ) = U † AA† U, so that BB † is unitarily similar to AA† . Because the trace of a matrix is the sum of its eigenvalues, tr[BB † ] = tr[AA† ] or n n
|aij |2 =
i=1 j=1
n n
|bij |2 .
i=1 j=1
But remembering that B is upper triangular and unitarily similar to A, we have n n n i−1 n |bij |2 = |bij |2 + |bii |2 + |bij |2 i=1 j=1
i=1
=
n i=1
j=1
|λi |2 +
j=i+1 n n i=1 j=i+1
|bij |2 .
5.10 Applications to Matrices
Hence
n n
|aij |2 ≥
i=1 j=1
n
97
|λi |2 .
i=1
This is Schur’s inequality. Equality holds if and only if n n
|bij |2 = 0,
i=1 j=i+1
i.e., if and only if B is a diagonal matrix. Example 5.20. Schur’s inequality can be applied to find a rough bound on the magnitudes of the individual eigenvalues. Since |λi |2 ≤
n
|λi |2 ≤
i=1
n n
|aij |2 ≤ n2 max |aij |2
i=1 j=1
we have |λi | ≤ n max|aij | for i = 1, . . . , n. Example 5.21. Schur’s inequality can also be used in conjunction with AM–GM to obtain a bound on the determinant of A. Recall that the determinant of a square matrix equals the product of its eigenvalues: n n | det A| = λi = |λi |. i=1
Then |det A|2/n
n =
n
i=1
i=1
1 1 |λi |2 ≤ |aij |2 n i=1 n i=1 j=1 n
|λi |2 ≤
n
n
1 ≤ n2 max|aij |2 , n and therefore |det A| ≤ nn/2 (max|aij |)n . Inequalities also arise in the discussion of vector and matrix norms. For a column vector X = (x1 , . . . , xn )T , scalars called Lp norms can be defined through the equation n 1/p |xi |p , Xp = i=1
where p = 1, 2, . . . , ∞. In particular, n |xi |2 X2 = i=1
98
5. Some Applications
is called the Euclidean or L2 norm. This norm has many applications in systems engineering, as does the L1 norm n
X1 =
|xi |.
(5.27)
i=1
The definition for p = ∞ is interpreted as X∞ = max |xi |. Inequalities are available to relate the various vector norms to one another. For instance, the reader might wish to pause momentarily and verify the following: X2 ≤
√
(X2 )2 ≤ X1 X∞ .
nX∞ ,
For the vector norm Xp , we define the induced matrix norm of the n × n matrix A by Ap = max X=0
AXp Xp
# .
(5.28)
Then AXp ≤ Ap Xp for all X, and AXp = Ap Xp for some X. Although X2 is the most natural way to measure the length of a vector, A2 is difficult to compute in general. For this and other reasons, p = 1 or p = ∞ are frequent choices. A1 gives the max column sum, i.e., A1 = max
1≤j≤n
( n
) |aij |
i=1
while A∞ gives the max row sum. Two other matrix norms that are commonly used are the Frobenius norm n n |aij |2 = tr[A† A] AF = i=1 j=1
and the cubic norm AC = n max|aij |. A property that must be satisfied by any valid matrix norm is AB ≤ A B
5.10 Applications to Matrices
99
for any two matrices A, B. It is easily verified that the Frobenius norm satisfies this general property; by the Cauchy–Schwarz inequality, 2 n n n aik bkj ABF = i=1 j=1 k=1 n n n n ≤ |aik |2 |bmj |2 i=1 j=1 k=1
m=1
n n n n = |aik |2 |bmj |2 i=1 k=1
j=1 m=1
= AF BF , as desired. The same property is also easily verified for the cubic norm. Another property of interest is that √ A2 ≤ AF ≤ nA2 , providing an estimate for the more difficult to compute A2 . A matrix norm is said to be compatible with a given vector norm if the inequality AX ≤ A X holds everywhere. For example, A2 and AF are both compatible with X2 . The inequality from (5.28) AXp ≤ Ap Xp is sharp in the sense that equality holds for some x = 0. However, the Frobenius norm is not sharp (Exercise 5.15). We pay a penalty for having an easily computed matrix norm AF ; it overestimates the “size” of the matrix A as an operator. Using any compatible norms, we can write λi X = |λi |X = AX ≤ A X and because the eigenvectors are nonzero, |λi | ≤ A for i = 1, . . . , n. The spectral radius ρ[A] of the matrix A is defined to be the magnitude of the largest eigenvalue of A. Obviously then, ρ[A] ≤ A. We have already met a specific instance of this inequality in Example 5.20, where the cubic norm was effectively used. See Theorem 6.9.2 of Stoer and Bulirsch [55] for more discussion on the spectral radius. See Marcus and Minc [42] for more discussion on matrix inequalities in general. Another good reference on matrices is L¨ utkepohl [40].
100
5. Some Applications
5.11 Topics in Signal Analysis Consider a periodic square wave w(t), given over one cycle by −π/2 for − π ≤ t < 0, w(t) = π/2 for 0 ≤ t < π. By expressions given earlier, w(t) can be represented as the Fourier series w(t) = 2
∞ sin(2n − 1)t . 2n − 1 n=1
(5.29)
Waveform w(t) has a jump discontinuity at t = 0, and it is well known that a truncated version of the Fourier series will overshoot this jump (the Gibbs phenomenon). We now compute the amount of overshoot present (see [41]). Let us call the mth partial sum of the series Swm (t). Differentiation gives m dSwm (t) =2 cos(2n − 1)t. dt n=1
Using the identity m
cos(2n − 1)t ≡
n=1
1 sin 2mt csc t 2
and then integrating, we get Swm (t) =
0
t
sin 2mτ dτ. sin τ
Defining ∆m (t) = Swm (t) −
0
t
sin 2mτ dτ τ
(5.30)
(the motivation becomes clear later) we see that t 1 sin τ τ − 2 ∆m (t) = sin 2mτ dτ sin τ τ τ 0 t τ 1 sin τ ≤ | sin 2mτ | − 2 dτ. sin τ τ τ 0 But sin τ ≥ 2τ /π whenever 0 ≤ τ ≤ π/2 by Jordan’s inequality; also, 1 sin τ τ3 τ5 τ − 2 = − + − ··· , τ τ 3! 5! 7!
5.11 Topics in Signal Analysis
101
so that for small positive τ 0< and
1 sin τ τ − 2 < τ τ 3!
∆m (t) ≤
0
t
π τ π 2 · dτ = t . 2 3! 24
Hence for any ε > 0, there is a T > 0 such that ∆m (t) < ε whenever 0 ≤ t ≤ T . For m > π/2T , we may choose in particular t = π/2m and after a change of variables write (5.30) as π sin τ Swm π − (5.31) dτ < ε. 2m τ 0 But
0
π
sin τ dτ = τ
0
∞
sin τ dτ − τ
∞
π
π sin τ dτ = − τ 2
∞
π
sin τ dτ τ
and (5.31) is * + ∞ sin τ Swm π − π − − dτ < ε. 2m 2 τ π So the difference between the series and w(t) in the neighborhood to the right of the jump discontinuity at t = 0 is ∞ sin τ − dτ ≈ 0.281. τ π This is roughly 9% of the jump height π, and is independent of m. The fact that the Gibbs overshoot cannot be eliminated by a sufficiently large choice of m is, of course, related to the fact that the convergence of the series of functions in (5.29) is not uniform. For aperiodic signals f (t), the Fourier transform ∞ f (t)e−iωt dt F[f (t)] = F (ω) = −∞
and its inverse F −1 [F (ω)] = f (t) =
1 2π
∞
F (ω)eiωt dω
−∞
are used to study frequency content as a function of the continuous angular frequency variable ω. By the differentiation property n d f (t) (5.32) = (iω)n F (ω) F dtn
102
we have
5. Some Applications
1 |ω F (ω)| = n i n
∞
−∞
∞ n d f dn f −iωt e dt ≤ n dt n dt −∞ dt
with the resulting bounds 1 |ω n |
|F (ω)| ≤
n d f n dt −∞ dt
∞
for n = 0, 1, 2, . . . , on the spectrum of f . This inequality lends support to our intuitive notion that only signals that vary rapidly with time (i.e., having significant nth derivatives for large n) can have significant spectral content at high frequencies. A related fact is that short-duration time signals have broadband frequency spectra. In order to quantify this relationship, we use second-moment integrals to define the temporal duration of f (t) as ∞
D2 =
t2 f 2 (t) dt
−∞
and the bandwidth of its spectrum as ∞ 2 ω 2 |F (ω)|2 dω. B = −∞
The uncertainty principle states that if f (t) = o(t−1/2 ), then DB ≥ π/2.
(5.33)
To obtain (5.33) we write the Cauchy–Schwarz inequality for integrals as ∞ 2 ∞ ∞ 2 df df 2 ≤ dt [tf (t)] [tf (t)] dt dt. dt dt −∞ −∞ −∞ But integration by parts gives ∞ ∞ ∞ 2 ∞ df f (t) f 2 (t) [tf (t)] tf (t) df (t) = t − dt = dt, dt 2 −∞ 2 −∞ −∞ −∞ where the first term in the rightmost member vanishes by the o condition on f . The integral ∞ f 2 (t) dt E= −∞
is called the normalized energy in the signal f ; without loss of generality this can be set to unity to give ∞ 2 df 1 ≤ D2 dt. 4 dt −∞
5.12 Dynamical System Stability and Control
103
It only remains to invoke (5.32) and Parseval’s identity in the form ∞ ∞ 2π f 2 (t) dt = |F (ω)|2 dω −∞
−∞
to obtain (5.33). It is easily shown (Exercise 5.16) that the minimum duration-bandwidth product is realized (i.e., equality is attained in (5.33)) when the signal f is a Gaussian pulse.
5.12 Dynamical System Stability and Control A broad class of continuous-time systems can be modeled using an initialvalue problem of the form dx(t) = f (x(t)) (t ≥ t ), 0 dt x(t0 ) = x0 , where x(t) is the N -dimensional state vector of the system, and the system structure is reflected in function f . Such systems are unforced (i.e., no input signal). The set of all possible x is the state space of the system, and solution curves in state space are known as system trajectories. Stability theory deals with sensitivity to unwanted disturbances. Of special concern are disturbances tending to perturb the system from an equilibrium state, a value x = xe such that f (xe ) = 0 whenever t ≥ t0 . Because such a state may always be transferred to the origin of state space by a suitable coordinate translation, it is customary to take xe = 0. If xe is unstable, a slight perturbation could put the system on a trajectory leading away from xe ; on the other hand, the trajectory could stay within a small neighborhood of xe , or it could lead back to xe . Technically, there are several notions of stability. The origin xe = 0 is . . . • stable in the sense of Lyapunov if for every given ε > 0 there is a number δ(ε) > 0 such that if x(t0 ) < δ, then the resulting trajectory satisfies x(t) < ε for all t > t0 . • asymptotically stable if it is stable in the sense of Lyapunov and, in addition, there exists γ > 0 such that whenever x(t0 ) < γ the resulting trajectory has x(t) → 0 as t → ∞. • exponentially stable if there exist positive numbers α, λ, such that for all t > t0 , x(t) ≤ αx0 e−λt whenever the initial state x0 falls sufficiently close to xe . Lyapunov theory can yield conclusions about stability without explicit knowledge of x(t). This theory is extensive, and here we can offer only
104
5. Some Applications
a few preliminary remarks for the reader. A principal notion is that if a system having just one equilibrium state is dissipative, then the system will always return to that state after any perturbation from it. This equilibrium will be a point of minimum energy for the system, and as any trajectory is followed toward this point the system energy must continually decrease. Use is therefore made of a “generalized energy” function V (x), called a Lyapunov function. Assume F (x) is continuous with continuous partial derivatives in state space, and let Ω be a region about x = 0. We say that F (x) is positive definite in Ω if F (0) = 0 and, for every nonzero x ∈ Ω, F (x) > 0. F (x) is negative semidefinite in Ω if F (0) = 0 and, for every nonzero x ∈ Ω, F (x) ≤ 0. Similar definitions can be formulated for the terms positive semidefinite and negative definite. We can now state a simple stability result. If a positive-definite function V (x) can be determined for a system such that dV /dt is negative semidefinite, then the equilibrium point x = 0 is stable in the sense of Lyapunov. Here dV /dt means dV (x(t))/dt, which is also written as dV (x)/dt. For let ε > 0 be given. Let Sε = {x | x = ε}. Because Sε is closed and bounded, as in the case for one variable, V (x) assumes a minimum value m on Sε . Note that m > 0 because V (x) is positive definite about x = 0. Continuity of V at x = 0 guarantees a δ > 0 such that δ < ε and such that for all x having x ≤ δ, V (x) ≤ m/2. Also, because dV (x)/dt is negative semidefinite we know that V (x(t)) is nonincreasing with respect to t. Hence, for an initial condition with x(t0 ) ≤ δ, we have V (x(t)) ≤ V (x(t0 )) ≤ m/2 whenever t > t0 . But this implies that x(t) < ε whenever t > t0 . To see this, suppose to the contrary that x(t) ≥ ε at some time t > t0 . Because x(t0 ) < δ < ε, at some intermediate time t0 < t ≤ t, x(t ) = ε. But on Sε , V (x) ≥ m contradicting V (x(t)) ≤ m/2 whenever t > t0 . Moreover, if dV (x)/dt is negative definite, then x = 0 is asymptotically stable. Geometrically, we may imagine a level contour or surface V (x) = constant > 0 (Figure 5.4). Let x be on this contour or surface. Since x = xe we require dV (x)/dt < 0. By the chain rule, dV (x) T = (∇V (x)) f (x), dt
(5.34)
∇V (x) represents an outward normal to the level contour or surface, and the vector field f (x) provides tangent vectors along solutions. In engineering T terminology (∇V (x)) f (x) is the dot product, the product of the magnitudes (norms) with the cosine of the angle between the two vectors. The fact that the cosine of the angle is negative means that the solution is directed inward, hence a solution starting in the interior of the level contour or surface can never leave the interior and so remains bounded for all time and, in fact, approaches xe as t → ∞. The Lyapunov function V (x) sometimes corresponds to actual physical energy, but not always.
5.12 Dynamical System Stability and Control
105
solution curve
xe ) x f (x)
-
∇V (x)
V (x)=constant FIGURE 5.4. Lyapunov function.
Example 5.22. Consider the motion of a mass m attached to a nonlinear spring with stiffness k(x + x3 ), where x is the distance from the equilibrium position, and a dashpot (shock absorber) is attached to provide a damping force c(dx/dt). The differential equation is mx + cx + k(x + x3 ) = 0. The equation is converted to a first-order system by substituting x = y so our system is dx = y, dt
c dy k = − (x + x3 ) − y. dt m m
The kinetic and potential energies are given respectively by 2 x 1 x4 1 + P E = k(x + x3 ) dx = k KE = mv 2 = my 2 , . 2 2 2 4 Hence the total energy should be a candidate for the Lyapunov function 2 x x4 1 x (5.35) + + my 2 . =k V y 2 4 2 By (5.34), dV dt
x y
=
∇V
x y
T f
x y
= −cy 2 .
(5.36)
Equations (5.35) and (5.36) show that V is positive definite and dV /dt is negative semidefinite, so the system is stable. However, physical intuition tells us the damped system should be asymptotically stable. Unfortunately, for this choice of V the derivative is zero along the x-axis, and we wanted dV x x 0 <0 if = . y y 0 dt
106
5. Some Applications
After some fiddling, let 2 x x4 c x2 1 x V =k + + my 2 + β xy + . y 2 4 2 m 2 Then, using (5.34) again, βk 2 dV x (x + x4 ). = (−c + β)y 2 − y dt m Thus, if we choose 0 < β < c, then dV /dt is negative definite. We want V to be positive definite. Rewrite kx4 x x +W . V = y y 4 Recognize
W
x y
=
k βc + 2 2m
x2 + βxy +
as the quadratic form T a11 x x W = a21 y y
a12 a22
m 2 y 2
x y
,
where a11 = k/2 + βc/2m, a12 = a21 = β/2, and a22 = m/2. Recall from Theorem 5.2 that if A is symmetric and a a (5.37) a11 > 0 and the determinant 11 12 > 0, a21 a22 then W is positive definite. It is easily verified that the choice β = c/2 guarantees that (5.37) is satisfied. Hence W and therefore V is positive definite. Because 0 < β < c we have guaranteed that dV /dt is negative definite. In other words, the system is asymptotically stable at the origin. In this relatively simple example finding a Lyapunov function was not immediate. Before continuing we simplify matters by assuming m = k = c = 1 and β = 1/2. By Theorem 5.1 the quadratic form W satisfies λ1 (x2 + y 2 ) ≤ W ≤ λ2 (x2 + y 2 ), where the eigenvalues of A are √ 5− 5 λ1 = 8
and
√ 5+ 5 λ2 = . 8
Since V = (x4 /4) + W , x4 x4 + λ1 (x2 + y 2 ) ≤ V ≤ + λ2 (x2 + y 2 ). 4 4
(5.38)
5.12 Dynamical System Stability and Control
Now
107
x4 dV x2 + y 2 − =− dt 2 2
so we may substitute for x2 + y 2 to get V ≤
dV dV x4 + λ2 −2 − x4 ≤ −2λ2 , 4 dt dt
which implies
V (t) ≤ V (0)e−t/2λ2 .
We use (5.38) and observe λ1 x2 ≤
x4 + λ1 (x2 + y 2 ) ≤ V, 4
hence
|x(t)| ≤
V (0) −t/4λ2 e . λ1
Similarly, |y(t)| and therefore (x, y)T are bounded by decaying exponentials and the system is exponentially stable at the origin. The reader interested in pursuing Lyapunov theory further is invited to consult the references [11, 32, 35]. When considering systems with nonzero input signals, other notions of stability must be employed. A crucial question is whether a bounded input signal will always give rise to a bounded output signal. If so, the system is said to have bounded-input, bounded-output (BIBO) stability. More precisely, a system is BIBO stable if there is a constant I such that if the input |u(t)| ≤ B for all t, then the output |y(t)| ≤ BI for all t. One class of systems that covers many engineering situations is the linear, time-invariant (LTI) system, which can be modeled by an equation of the form L[y] = u where operator L is time-independent and linear, for instance, a linear constant-coefficient differential operator of the form L = an
dn d + · · · + a1 + a0 . n dt dt
An LTI system is relaxed if has zero initial conditions. In this case, there exists a function h(t) such that ∞ y(t) = h(t) ∗ u(t) = h(τ )u(t − τ ) dτ 0
for any input u(t). Knowledge of the system’s weighting function h(t) is sufficient to determine the output corresponding to any given input. We
108
5. Some Applications
take the system to be causal so that h(t) ≡ 0 whenever t < 0. For bounded inputs ∞
|y(t)| ≤
0
|h(τ )||u(t − τ )| dτ ≤ B1
∞
0
|h(τ )| dτ,
and thus a sufficient condition for BIBO stability is that h(t) be absolutely integrable on [0, ∞): ∞
|h(t)| dt = B2 < ∞.
0
Conversely, suppose the system has BIBO stability. In particular, if we choose B = 1 there exists a constant M such that if the input is bounded by B, the output is bounded by M . We claim that ∞ |h(t)| dt < ∞. 0
If not, choose T so that 0
T
|h(t)| dt > M.
Define the bounded input h(T − t) , 0 ≤ t ≤ T, |h(T − t)| u(t) = 0, otherwise. Then
y(T ) = 0
T
h(τ )u(T − τ ) dτ =
0
h(T − t) = 0,
T
|h(τ )| dτ > M,
which is a contradiction. An alternative system description often employed for relaxed LTI systems is the transfer function H(s), defined as the Laplace transform of h(t): ∞ h(t)e−st dt. H(s) = 0
Again, this function should contain all necessary information about the internal structure of the system. As a function of the complex variable s, however, its interesting properties involve its singularities in the complex plane. Of these, the poles (the values of s for which |H(s)| → ∞) are of greatest importance. To see why, let s0 be a point located either on the imaginary axis or in the right-half of the s-plane so that [s0 ] ≥ 0. Then for any t ≥ 0, |e−s0 t | ≤ 1 and ∞ |H(s0 )| ≤ |h(t)| dt = B2 0
5.12 Dynamical System Stability and Control
109
if the system is BIBO stable. Hence stability implies that all the poles of H(s) must lie in the left-half of the s-plane, a result familiar to electrical and mechanical engineers. Like stability, the subject of control is huge. We give one example. Example 5.23. The differential equation dω(t) + aω(t) = Kv(t) dt can model the angular shaft speed ω(t) of a fixed-field, armature-controlled dc motor. Here the forcing function v(t) is the voltage applied to the armature, while a and K are positive constants that incorporate necessary data on the resistance of the motor windings, rotational inertial of the shaft and its load, frictional effects, back emf, etc. Application of Laplace transform methods, with zero initial conditions on the shaft speed, yields the corresponding weighting function h(t) = Ke−at for t > 0, and hence by convolution with the input ω(t) = 0
t
v(λ)h(t − λ) dλ =
t
v(λ)Ke−a(t−λ) dλ
0
for all t > 0. A simple motor control question is this: What input v(t) should be applied in order to bring the shaft from rest to some given speed ωd , in time T , while keeping the input energy integral Ev =
T
|v(t)|2 dt
0
a minimum? By the Cauchy–Schwarz inequality, ωd2 ≤ Ev
T
K 2 e−2a(T −λ) dλ
0
and hence the energy required for the task satisfies Ev ≥
2aωd2 . − e−2aT )
K 2 (1
Equality is attained when v(λ) = Kp e−a(T −λ) , where the proportionality constant Kp is determined by setting ωd =
0
T
Kp e−a(T −λ) Ke−a(T −λ) dλ =
Kp K (1 − e−2aT ). 2a
110
5. Some Applications
Hence the optimal driving voltage waveform is given for t > 0 by v(t) =
2aωd e−a(T −t) . K(1 − e−2aT )
Putting this back into the convolution integral, we write the resulting shaft speed as at e − e−at sinh at = ωd ω(t) = ωd . eaT − e−aT sinh aT
5.13 Some Inequalities of Probability Take a random variable X ≥ 0. Then for any t > 0, the probability of the event X ≥ t satisfies P (X ≥ t) ≤
µX , t
(5.39)
where µX is the mean or expected value of X. This is the Markov inequality. To illustrate how it is developed, let us consider the case where X is a discrete random variable having frequency function fX (x) = P (X = x) ≥ 0 (recall that probabilities are always nonnegative). We have µX = xfX (x) = xfX (x) + xfX (x), x≥0
so that µX ≥
x≥t
0≤x
xfX (x) ≥ t
x≥t
fX (x) = tP (X ≥ t),
x≥t
and (5.39) follows. The development for a continuous random variable is similar. The inequality P (|X − µX | ≥ t) ≤
2 σX , t2
(5.40)
2 is the variance of X, is the Chebyshev inequality. To derive it where σX from the Markov inequality, we start from the obvious fact that
(X − µX )2 ≥ 0. Then, for every t > 0, 2 0 E[(X − µX )2 ] / σX = . P (X − µX )2 ≥ t2 ≤ t2 t2
But (X − µX )2 ≥ t2 if and only if |X − µX | ≥ t, and (5.40) follows.
5.13 Some Inequalities of Probability
111
Example 5.24. The special case P (|X − µX | ≥ nσX ) ≤ 1/n2 immediately leads to the interpretation that a random variable X is likely to fall close to its mean. Example 5.25. Binomially distributed random variables have mean np and variance np(1 − p), where n is the number of trials of the experiment and p is the probability of “success” in each trial. Then t = nβ gives P (|X − np| < nβ) ≥ 1 −
p(1 − p) . nβ 2
For instance, suppose a population contains an unknown proportion p of defective objects. Let X be the number of defectives in a sample of size N . Then for every β > 0, X p(1 − p) P − p < β ≥ 1 − . N N β2 Now max[p(1 − p)] = 0.25; hence for fixed β, N , the minimum probability that the observed proportion of defectives in the sample differs from the actual proportion p by an amount less than β is 1 − 0.25/N β 2 . Hence, to insure that this probability meets or exceeds some value P , we need N ≥ 0.25/(1 − P )β 2 . The Chebyshev inequality is too conservative for some applications. A better estimate on the tail of a probability distribution is often reached through the more specialized Chernoff bound (see Lafrance [36] for a nice development). Many important continuous random variables are normally distributed. Recall that the standard normal density has 2 1 fX (x) = √ e−x /2 2π
for −∞ < x < ∞. It is often convenient to work with the related coerror function ∞ 2 1 e−t /2 dt Q(x) = P [X > x] = √ 2π x for which repeated integration by parts yields the asymptotic expansion [1] 2 1·3 (−1)n 1 · 3 · 5 · · · (2n − 1) e−x /2 1 1 − 2 + 4 + ··· + Q(x) = √ x x x2n 2πx 2 (−1)n+1 1 · 3 · 5 · · · (2n + 1) ∞ e−t /2 √ + dt t2n+2 2π x
112
5. Some Applications
gi (t) + ni (t)
H(f )
go (t) + no (t)
FIGURE 5.5. Pre-decision filter.
whenever x > 0. The inequalities 2 2 1 1 1 √ 1 − 2 e−x /2 < Q(x) < √ e−x /2 x 2πx 2πx are thus apparent. Tighter bounds on Q(x) are also available (see [10] and Exercise 5.17). This function is of interest in communication engineering, where it is often assumed that the system noise is Gaussian. We touch on some other aspects of communication systems in the next section.
5.14 Applications in Communication Systems The field of communications strives toward the accurate reception of information signals in the presence of noise. Noise phenomena can be typed according to their physical mode of production, or simply according to the shape of their power spectra. For instance, much noise is produced by electrons moving randomly in conductors; this noise, called thermal noise, is approximately white because power in the associated waveform tends to be distributed evenly across the frequency spectrum. In binary communication, time is divided into successive bit intervals, of length T seconds say, and during each interval at the receiver a known deterministic signal gi (t) is either present or absent (corresponding to binary 1 or binary 0). Of course, noise ni (t) is present in either case (subscript i denotes waveforms at the receiver input, as in Figure 5.5). Since the form of gi (t) is known in advance, the receiver’s function is simply to make a presence/absence decision during each bit interval. The decision process is, of course, complicated by ni (t), which in many cases enters in additive fashion so that the received waveform is gi (t) + ni (t). An error occurs if the receiver decides gi (t) is present when it was never transmitted, or vice versa. It can be shown that the probability of such an error is minimized if the decision is based on a sample taken from the received waveform at a time instant when the signal-to-noise power ratio S/N is maximum. Hence we become interested in a system block that can enhance signal power at some instant of time while simultaneously reducing average noise power. Such a device, known as a matched filter, can be found as follows. We assume additive white noise with power spectral density N0 /2 (watts per Hz), and seek an expression for the Fourier-domain transfer function
5.14 Applications in Communication Systems
113
H(f ). The signal output from the filter is given by Fourier inversion as ∞ g0 (t) = F −1 [G0 (f )] = Gi (f )H(f )ej2πf t df, −∞
so that at the sample time t = T its normalized power is ∞ 2 S = g02 (T ) = Gi (f )H(f )ej2πf T df . −∞
The power spectral density of the output noise is given by (N0 /2)|H(f )|2 because the power gain of the linear system H(f ) at frequency f is |H(f )|2 ; hence the normalized output average noise power is ∞ (N0 /2)|H(f )|2 df N= −∞
and the signal-to-noise ratio is
2 ∞ j2πf T G (f )H(f )e df i −∞ S
∞ = . N (N0 /2)|H(f )|2 df −∞ The integral in the numerator can be expressed in inner-product form (see Example 4.5): ∞ ∞ Gi (f )H(f )ej2πf T df = H(f )Gi (f )e−j2πf T df. −∞
−∞
But by the Cauchy–Schwarz inequality (see Example 4.6),
∞
∞ |Gi (f )|2 df −∞ |H(f )|2 df S −∞
∞ ≤ . N (N0 /2) −∞ |H(f )|2 df Equality is attained when H(f ) ∝ Gi (f )e−j2πf T leading to the choice h(t) = F −1 {Gi (f )e−j2πf T } = g i (T − t) for the matched filter. For more details on the matched filter, along with derivation of the optimal error rate expressions in terms of coerror function Q(x), the reader is referred to Couch [15]. The study of communications naturally involves some aspects of the broadly based mathematical discipline known as information theory, a subject replete with inequalities beginning at its most elementary level. To
114
5. Some Applications
ζ = {S1 , . . . , SN }
S
FIGURE 5.6. An information source.
formally define such a nebulous concept as “information” must have been a substantial challenge to Claude Shannon and the other early workers in this area. To help make the idea precise, we can imagine a hypothetical machine called a discrete memoryless source (DMS). The DMS has an alphabet ζ, which is just a discrete set of symbols ζ = {S1 , . . . , SN }, and it periodically emits one of these symbols S as a message to the outside world (Figure 5.6). The DMS is random in its methods of symbol selection, with pn = 1), that are assigned symbol probabilities P (S = Sn ) = pn (with time independent and such that successive symbols are statistically independent (hence the term memoryless). By definition, the self-information associated with each source symbol is given by 1 (n = 1, . . . , N ). In = logb pn This definition passes some key intuitive tests. Because 0 ≤ pn ≤ 1, we have In ≥ 0, and need not worry about the possibility of getting “negative information” from the source. As pn → 1, In → 0; that is, a symbol that occurs with such stubborn repetitiveness as to be deterministic would never surprise an observer and should carry no information. We have In > Im whenever pn < pm ; the monotonic behavior of the logarithm means that unlikely symbols carry more information than likely ones. Finally, the joint probability P (Sn and Sm ) = pn pm for two successive independent messages, leading to a joint information quantity satisfying 1 1 1 = logb + logb = In + Im . Inm = logb p n pm pn pm The logarithmic base b is arbitrary, but does determine the unit of information. Commonly b = 2 is employed, giving information measured in bits. Because the self-information I is a random variable taking possible values I1 , . . . , IN , we can compute its expected value as N N 1 In pn = pn log2 H(ζ) = . p n n=1 n=1 This quantity, the average information per symbol, is the entropy of the DMS. Of special interest are bounds on H(ζ). Certainly H(ζ) ≥ 0
5.15 Existence of Solutions
115
and equality holds if and only if all the pn except one vanish (again the case of no uncertainty, no information). For an upper bound we may convert to the natural log via the formula log2 x = K ln x (the constant K is immaterial) and use ln x ≤ x − 1 (Exercise 2.2), which is regarded as a fundamental inequality of information theory. We have log2 N = log2 N
N
pn =
n=1
N
pn log2 N
n=1
so that N 1 1 pn log2 pn log2 H(ζ) − log2 N = − log2 N = pn N pn n=1 n=1 N N 1 1 =K pn ln pn −1 . ≤K N pn N pn n=1 n=1 N
The last quantity vanishes, hence H(ζ) ≤ log2 N. This upper bound is attained if and only if 1/(N pn ) = 1 for all n, that is, when pn = 1/N for all n. The entropy of a DMS is therefore greatest when its output is least predictable on average, i.e., when all the message outputs are equally probable. As usual, we could only touch on a few preliminary aspects of this fascinating subject here. The interested reader can consult the references [9, 36, 54] for more.
5.15 Existence of Solutions Theorem 4.3, the contraction mapping theorem, is a powerful result that allows us to prove the existence of unique solutions to many equations of practical importance. In addition, the proofs yield practical methods such as Neumann series and Picard iteration for solving certain integral equations and differential equations, and Newton’s method for solving systems of nonlinear equations. Integral equations, where the unknown is a function appearing underneath an integral sign, arise naturally in areas like mechanics, electromagnetics, control, and population dynamics. For example, the equation b K(x, t)ψ(t) dt (a ≤ x ≤ b), (5.41) ψ(x) = g(x) + λ a
116
5. Some Applications
where ψ(x) is unknown, is called a Fredholm integral equation of the second kind. We assume that g(x) ∈ C[a, b], and that the kernel K(x, t) is continuous for both a ≤ x ≤ b and a ≤ t ≤ b. We seek a condition under which the integral operator b K(x, t)ψ(t) dt F (ψ)(x) = g(x) + λ a
is a contraction on C[a, b]. Now since K(x, t) is continuous on a closed and bounded domain, we know that K(x, t) is bounded (by B, say). Let u(x) and v(x) be arbitrary members of C[a, b]. Then b d(F (u), F (v)) = max λ K(x, t)[u(t) − v(t)] dt x∈[a,b] a b ≤ max |λ| |K(x, t)||u(t) − v(t)| dt x∈[a,b]
a
b
≤ B|λ| max
x∈[a,b]
|u(t) − v(t)| dt
a
≤ B|λ|(b − a) max |u(x) − v(x)| x∈[a,b]
= B|λ|(b − a) d(u(x), v(x)). For F to be a contraction on C[a, b] then, we require that |λ| < 1/B(b − a). Provided this condition is satisfied, we may iterate to solve (5.41) for ψ(x) on [a, b]. Starting with an initial guess of ψ (0) (x) = g(x), the first iteration yields b K(x, t)ψ (0) (t) dt = g + λΓ[g], ψ (1) (x) = g(x) + λ a
where for notational convenience we have defined Γ as the integral operator b K(x, t)ψ(t) dt. Γ[ψ] = a
The second iteration is then ψ (2) = g + λΓ[g] + λΓ2 [g] = g +
2 i=1
and, in general, ψ (n) = g +
n i=1
λi Γi [g].
λi Γi [g]
5.15 Existence of Solutions
117
By Theorem 4.3, we can express the solution of (5.41) as ψ = lim ψ (n) = g + n→∞
∞
λi Γi [g].
i=1
This is called the Neumann series for the integral equation. Example 5.26. A specific instance of (5.41) is furnished by 1 ex−t ψ(t) dt (0 ≤ x ≤ 1). ψ(x) = g(x) + λ
(5.42)
0
In this case it is easily verified that Γn [g] = κex for n = 1, 2, 3, . . . , where 1 κ= e−t g(t) dt. 0
Hence ψ(x) = g(x) +
∞
λi κex = g(x) + κex
i=1
λ 1−λ
(5.43)
for 0 ≤ x ≤ 1, provided that |λ| < 1. The validity of (5.43) as a solution to (5.42) is easily verified by direct substitution. The reader interested in pursuing integral equations further could refer to Jerri [31]. Another application of Theorem 4.3 is to the proof of existence of solutions to differential equations. Theorem 5.4 (Picard–Lindel¨ of Theorem). Suppose we are given the differential equation dy = f (x, y) with initial condition y(x0 ) = y0 . dx
(5.44)
We may assume x0 = y0 = 0 (by taking appropriate translations if necessary). Suppose f is continuous in a rectangle D = {(x, y) | |x| ≤ a, |y| ≤ b} where a and b are positive constants. Also suppose that f satisfies a Lipschitz condition in y in D, namely that there exists a positive constant k such that |f (x, y1 ) − f (x, y2 )| ≤ k|y1 − y2 |
for all (x, y1 ) and (x, y2 ) ∈ D. (5.45)
Then there exists a constant α > 0 so that in the interval I = {x | |x| ≤ α} there exists a unique solution y = φ(x) to the initial-value problem (5.44). Proof. Before giving the proof we give some motivation. Suppose we already knew φ(x) exists. Then x f (t, φ(t)) dt for x ∈ I (5.46) φ(x) = 0
118
5. Some Applications
by Theorem 2.5. Let M be the space of continuous functions on the closed interval I. M is a complete metric space with metric d(f, g) = max |f (x) − g(x)|. x∈I
Define F from M to itself as follows: if ψ ∈ M , define F (ψ) by x f (t, ψ(t)) dt (x ∈ I). F (ψ)(x) = 0
So if φ(x) exists, then it is a fixed point of F ; i.e., F (φ) = φ. By Theorem 4.3 F will have a unique fixed point if it is a contraction. Now let φ1 , φ2 ∈ M . Suppose in addition that |φ1 (t)| ≤ b
and |φ2 (t)| ≤ b
for all t ∈ I,
(5.47)
d(F (φ1 ), F (φ2 )) = max |F (φ1 )(x) − F (φ2 )(x)| x∈I x x = max f (t, φ1 (t)) dt − f (t, φ2 (t)) dt x∈I
0
0
≤ max |f (t, φ1 (t)) − f (t, φ2 (t))||x − 0|
(5.48)
≤ k max |φ1 (t) − φ2 (t)|α
(5.49)
= αkd(φ1 , φ2 ).
(5.50)
t,x∈I
t∈I
We will want to choose α so that αk < 1 in order that F be a contraction. Also we will want for any φ1 and φ2 ∈ M that (5.47) be satisfied so that (5.45) will allow us to deduce (5.49) from (5.48). We are now ready to prove the theorem. Since f (x, y) is continuous on D there is a positive constant Q such that |f (x, y)| ≤ Q for all (x, y) ∈ D. Choose α sufficiently small so that αk = λ < 1 and α < b/Q. We now define I = {x | |x| ≤ α} and modify our original definition of M so that M = {φ | φ ∈ C(I) and |φ(t)| ≤ b for all t ∈ I}, M is a complete metric space. To see that F maps M into M , for φ ∈ M and x ∈ I, x f (t, φ(t)) dt ≤ αQ < b. |φ(x)| = 0
The derivation of the inequality (5.50) is now valid. Since αk < 1, F is a contraction on M and therefore has a fixed point φ. We may choose φ0 (x) as the (constant) zero function, and for all i perform Picard iteration x φi+1 (x) = F (φi )(x) = f (t, φi (t)) dt for all x ∈ I. 0
5.15 Existence of Solutions
119
Then {φi (x)} → φ(x) as i → ∞. With one minor difference, we have given the same proof twice. In finding a solution to (5.41), F is a contraction when λ is sufficiently small; in finding a solution to (5.46), F is a contraction when the interval of integration from 0 to x is sufficiently small. Recall from our discussion of the mean value theorem for derivatives that if f : R → R ∈ C 1 and x, x + ∆x ∈ (a, b), then f (x + ∆x) = f (x) + f (η)∆x
(5.51) 1
for some η between x and x + ∆x. However, if f : R → R ∈ C so that ∂f1 ∂f1 ··· f1 (x) ∂xn ∂x1 .. , .. f (x) = ... , f (x) = ... . . ∂fn ∂fn fn (x) ··· ∂x1 ∂xn n
n
then it is not true in general that given x, ∆x ∈ Rn there exists η between (in the line segment joining) x and x + ∆x such that (5.51) holds. An example [17] is x x1 e 1 − x2 2 2 f: R → R , = . f x2 x21 − 2x2 Our concern is as follows. For f : [a, b] → [a, b] to be a contraction in [a, b], if |f (x)| ≤ λ < 1 on [a, b], then f is a contraction on [a, b] by Corollary 2.6.1: |f (x) − f (y)| = |f (η)(x − y)| ≤ λ|x − y| for some η. We cannot extend this argument directly to f : Rn → Rn . However x+∆x 1 f (x + ∆x) − f (x) = f (z) dz = f (x + t∆x)∆x dt x
0
by componentwise application of Theorem 2.5. This implies that f (x + ∆x) − f (x) ≤ M ∆x
where M = max f (x), ¯ x∈U
¯ is a closed neighborhood containing x and x + ∆x. Using this, we where U can prove the following important theorem: Theorem 5.5 (Implicit Function Theorem). Let F ∈ C 1 (U × V, W ) where U, V, W are open subsets of Rn , Rm , Rn , respectively. Let (x0 , y0 ) ∈ U × V with F (x0 , y0 ) = 0 and ∂F1 · · · ∂F1 ∂xn ∂x1 .. (x , y ) .. Dx F (x0 , y0 ) = ... 0 0 . . ∂Fn · · · ∂Fn ∂x1 ∂xn
120
5. Some Applications
be nonsingular. Then there exists a neighborhood U1 × V1 ⊂ U × V and a function f : V1 → U1 , f ∈ C 1 , where f (y0 ) = x0 such that F (x, y) = 0
for (x, y) ∈ U1 × V1
⇔
x = f (y).
Proof. The basic idea is that if we have more unknowns than equations, we may choose and rename surplus variables as y1 , . . . , ym . Then, holding these variables constant, we may solve the set of equations in suitable neighborhoods for x1 , . . . , xn by Newton’s method. Staying nearby, as the values of y1 , . . . , ym vary, so will the values of x1 , . . . , xn . Usually we do not know an explicit formula, but x1 , . . . , xn are determined implicitly by y1 , . . . , ym . In the above, we use the standard notation U × V = {(x, y) | x ∈ U and y ∈ V }. We now give the proof. Since Dx F (x0 , y0 ) is nonsingular, there is a neighborhood of (x0 , y0 ) in which Dx F is nonsingular (the determinant is a continuous function that is nonzero at a point, and hence in a neighborhood). By choosing U and V smaller if necessary, (Dx F (x, y))−1 is defined on U × V . Define G : U × V → Rn by G(x, y) = x − (Dx F (x, y))−1 F (x, y). Then G(x0 , y0 ) = x0 and Dx G(x0 , y0 ) = I − I = 0, where I is the identity matrix. Since G(x0 , y0 ) = x0 and Dx G(x0 , y0 ) = 0, and since G ∈ C 1 , there exists a neighborhood U1 × V1 of (x0 , y0 ) with Dx G(x, y) ≤ α < 1 ¯1 × V¯1 → U ¯1 ¯1 × V¯1 . Choose this neighborhood U1 × V1 such that G: U in U ¯1 → U ¯1 is a contraction, and hence (see above). For each y ∈ V1 , G(x, y): U has a unique fixed point which we denote by f (y). To see that f is smooth, and to see an illustrative example, consult Edwards [22]. A special case of the preceding proof guarantees convergence of Newton’s method (under reasonable hypotheses) if the initial guess is sufficiently close to the solution. Theorem 5.6. Let F ∈ C 1 (U, W ) with U, W open in Rn . Suppose F (ξ) = 0 and F (ξ) is nonsingular. Then there is a neighborhood U1 of ξ such that if x0 ∈ U1 and xn+1 = xn − (F (xn ))−1 F (xn ) for all n, then the sequence {xn } converges to ξ. Proof. Consider m = 0 and Rm as the empty set. That is, take F (x) instead of F (x, y). G: U → Rn becomes G(x) = x − (F (x))−1 F (x) on sufficiently ¯1 → U ¯1 a contraction; hence if x0 ∈ U1 , {xn+1 = G(xn )} small U1 , with G: U converges to the fixed point ξ of G (where F (ξ) = 0). For a treatment of the implicit function theorem, existence of solutions of differential equations, and related topics in greater generality, see Chow and Hale [14].
5.16 A Duality Theorem and Cost Minimization
121
5.16 A Duality Theorem and Cost Minimization The proof of the duality theorem and an example are by Duffin [19, 20]. Suppose c1 , . . . , cn is a sequence of positive constants and for each i = 1, . . . , n there is a sequence of real numbers αi1 , . . . , αik . Suppose these are used to define for positive t1 , . . . , tk the cost function 11 1k n1 nk u(t1 , . . . , tk ) = c1 tα . . . tα + · · · + cn tα . . . tα 1 1 k k .
Denote Rk+ = {t = (t1 , . . . , tk ) | each ti > 0}, ( δ = (δ1 , . . . , δn ) | each δi > 0 and
∆n = =
) δi = 1 ,
i=1
( ∆α n
n
δ | δ ∈ ∆n and
n
αij δi = 0 for j = 1, . . . , k
) ,
i=1 αik i1 Pi (t) = tα 1 . . . tk n δi ci v(δ) = δ i i=1
for t ∈ Rk+ , for δ ∈ ∆n .
Using the above notation with the positive constants c1 , . . . , cn and the matrix of real numbers (the exponents in the cost function) {αij } fixed, we state two problems: • Problem 1: Let M = inf t∈R+ {u(t)}. Find M . k
• Problem 2 (The dual problem): Let m = supδ∈∆αn {v(δ)}. Find m. Zero is clearly a lower bound for the set in Problem 1 hence M exists. On the interval (0, 1), (1/δi )δi is bounded by e1/e so m exists for Problem 2. To show how the two problems are related we apply the weighted AM– GM inequality to u(t): for any δ ∈ ∆n and t ∈ Rk+ , u(t) =
n i=1
δi
ci Pi (t) δi
n
≥ i=1
where each Dj =
ci Pi (t) δi
n
δi
Dk 1 = v(δ) tD 1 · · · tk ,
αij δi .
i=1 + α If δ ∈ ∆α n , then each Dj = 0. Thus u(t) ≥ v(δ) for all t ∈ Rk and δ ∈ ∆n . + For t ∈ Rk and δ ∈ ∆n define Dk 1 Q(t, δ) = u(t) − v(δ) tD 1 . . . tk .
122
5. Some Applications
Then Q(t, δ) ≥ 0 with equality if and only if all ci Pi (t)/δi are equal. Now suppose that u(t) attains its infimum M at a point t∗ ∈ Rk+ . For each i choose δi∗ = ci Pi (t∗ )/M . Then δ ∗ ∈ ∆n and all ci Pi (t∗ )/δi∗ are equal, hence Q(t∗ , δ ∗ ) = 0. Since we have assumed the cost function u(t) attains its minimum at t∗ in the open set Rk+ , ∂u/∂tj = 0 for j = 1, . . . , k at this point. Since Q(t, δ) attains its minimum at (t∗ , δ ∗ ), all its partial derivatives, hence the first k, ∂Q/∂tj = 0 at (t∗ , δ ∗ ). But ∂Q ∂ ∂ Dk 1 = u− v(δ) tD 1 . . . tk . ∂tj ∂tj ∂tj Since the first term is zero, ∂ Dj −1 Dk D1 1 k v(δ) tD . . . tD 1 . . . tk = Dj v(δ) t1 . . . tj k = 0. ∂tj ∗ α The conditions Dj = 0 are exactly that δ ∈ ∆α n . Thus δ ∈ ∆n and + ∗ α v(δ ) = M. Since v(δ) ≤ u(t) for all t ∈ Rk and δ ∈ ∆n , in particular ∗ v(δ) ≤ u(t∗ ) = M for all δ ∈ ∆α n with equality when δ = δ . Thus M = m and + for all δ ∈ ∆α n and t ∈ Rk
v(δ) ≤ M ≤ u(t) ∗
(5.52)
∗
with equality at t = t and δ = δ . Therefore the following has been proved: Theorem 5.7. If the cost function u(t) in Problem 1 attains its infimum M in Rk+ , then the dual function v(δ) in Problem 2 has maximum value M in ∆α n. Example 5.27. Suppose 400 yd3 of material must be ferried across a river in an open box of length t1 , width t2 , and height t3 . The bottom and sides cost $10 per yd2 and the ends cost $20 per yd2 . Two runners costing $2.50 per yd are needed for the box to slide on. Each round trip of the ferry costs $0.10. Minimize the total cost 40 u(t1 , t2 , t3 ) = + 20t1 t3 + 40t2 t3 + 10t1 t2 + 5t1 . t1 t2 t3 (Ignore the fact that a fraction of a trip does not make sense.) The reader might wish to perform the following numerical experiment. Apply Newton’s method to the gradient of u(t), hoping to find a minimum point. In order to force the constraints that ti > 0, substitute ti = ε + x2i and estimate the minimum of u(x) by applying Newton’s method to its gradient. (When using Newton’s method, use the Jacobian derivative of the gradient of u, which is the Hessian of u.) Set ε = 0, take initial guesses ti = x2i = 1, and find values of t1 ≈ 1.54, t2 ≈ 1.11, t3 ≈ .557. Take this value of t as (a good approximation for) t∗ and define δi∗ = ci Pi (t∗ )/u(t∗ ) for ∗ all i and verify (to within a specified tolerance) that δ ∗ ∈ ∆α n and u(t ) = ∗ v(δ ) = 108.69. Note that a typical application of Newton’s method yields only a local result. However, (5.52) tells us that we found M = $108.69, the global minimum cost.
5.17 Exercises
123
5.17 Exercises 5.1. Applications of the Chebyshev inequality: (a) Obtain an upper bound for the integral 5 ex I= dx. 2 x+1 (b) Derive the inequality (sin−1 t)2 <
t t + 1 ln . 2 t − 1
5.2. Show that if u(x) and every un (x) are integrable on [a, b], and if {un (x)} converges uniformly to u(x) on [a, b], then b b b lim un (x) dx = u(x) dx = lim un (x) dx. a n→∞
a
n→∞
a
5.3. Let u(x) and the sequence {un (x)} be defined on [a, b]. We say that {un (x)} converges in mean to u(x) if and only if b [u(x) − un (x)]2 dx = 0. lim n→∞
a
(a) Show that uniform convergence implies mean convergence. (b) Show that mean convergence implies b lim u2n (x) dx = n→∞
a
b
u2 (x) dx.
a
5.4. Write a computer program using Simpson’s rule to estimate the integral 1 x2 e dx. Keep doubling the number of partitions until convergence within a 0 specified tolerance. Do not recompute any function evaluations. 5.5. Use Taylor’s method of order 5 to estimate y(3) given that y = 1 −
y , x
y(2) = 2.
The exact solution is y = x/2 + 2/x. Use a symbolic manipulator to form the derivatives in Φ5 and paste them into a standard language such as Fortran or C. 5.6. Show that for n ∈ N 1 1 1 n 2Γ n + ≤Γ Γ (n + 1) ≤ 2 Γ n + 2 2 2 with strict inequality for n > 1. See [38] for an application to traffic flow. 5.7. For the exponential integrals defined in the text, show that En+1 (x) < En (x) for n = 1, 2, 3, . . . .
124
5. Some Applications
5.8. Show that for x > 1 x − 1 −x e < x2
∞
x
e−t e−x dt < . t x
5.9. The autocorrelation of a real-valued function f is the function f f defined by ∞ f f (t) = f (u)f (t + u) du. −∞
Show that f f reaches its maximum at t = 0. 5.10. Use the power series expansion for Jn (x) to show that for n ≥ 0, |Jn (x)| ≤
|x|n 2n (n!)
ex
2
/4
.
5.11. The complementary error function erfc(x) is defined by ∞ 2 2 erfc(x) = √ e−t dt. π x (a) Establish the following upper bound: √ e−x erfc( x) < √ . πx This bound is useful for x > 3, but obviously not for small x. (b) Show that for 0 ≤ x ≤ y, √ √ erfc( y) ≤ e−(y−x) erfc( x). 5.12. (See Anderson et al. [3]). Let a, b be real, positive numbers. Denote√the arithmetic and geometric means by A(a, b) = (a + b)/2 and G(a, b) = ab, respectively. Define sequences starting with a0 = a, b0 = b and, for all n, an+1 = A(an , bn ), bn+1 = G(an , bn ). (a) Use AM–GM and induction to show that bn ≤ bn+1 ≤ an+1 ≤ an for n ≥ 1. (b) Observe that an is a decreasing sequence bounded below (by b0 ), and hence has a limit. Similarly, bn has a limit. (c) Show that the sequences an and bn have a common limit. This common limit, the arithmetic–geometric mean of a and b, is denoted by AG(a, b). (d) Defining the integral
π/2
T (a, b) = 0
dx 2 2 a cos x + b2 sin2 x
with a1 and b1 defined as above, show that T (a, b) = T (a1 , b1 ).
5.17 Exercises
125
(e) Show that the sequence (a2n cos2 x + b2n sin2 x)1/2 converges uniformly to AG(a, b) on [0, π/2]. (f) Show that π/2 . AG(a, b) √ (g) Now let 0 < r < 1 and set a = 1 and b = 1 − r2 to deduce the stunning result of Gauss concerning Legendre’s elliptic integrals of the first kind: π/2 π/2 dx √ = . AG(1, 1 − r2 ) 0 1 − r2 sin2 x T (a, b) =
(h) Let r = 1/2. Verify that you get six digits of accuracy in using (π/2)/a2 to estimate the elliptic integral π/2 ( 1 − r2 sin2 x)−1 dx. 0
5.13. Show that if ∇2 Φ = 0 throughout a region bounded by a simple closed surface S, then ∂Φ dS ≥ 0. Φ ∂n S 5.14. Two isolated conducting bodies carry electric charges. Show that if they are subsequently connected by a very thin wire, the total stored energy of the system is diminished. 1 1 5.15. Let A = . Show that there is no X = 0 with AX 2 = A F X 2 . 0 1 5.16. Show that of all finite-energy signal forms, the Gaussian pulse of the form f (t) = K2 exp(K1 t2 /2), where K1 , K2 are constants with K1 < 0, has the minimum duration-bandwidth product. 5.17. Show that an improved lower bound on the coerror function is given for x > 0 by 2 x 1 Q(x) > √ e−x /2 . 2π x2 + 1 5.18. Solve the differential equation y = 2y, y(0) = 1 by Picard iteration. 5.19. Describe a suitable choice of the neighborhood U1 when using Newton’s method (Theorem 5.6) for the case n = 1.
This page intentionally left blank
Appendix Hints for Selected Exercises
1.1. (a) Add 1/x < 1 to 1/y < 1, and then multiply by xy. (b) Manipulate (x − 1)2 ≥ 0. (c) (x − y)2 ≥ 0. (d) 1 · 2 · 3 · · · n > 1 · 2 · 2 · · · 2. (e) Verify by substitution for n = 1, 2, 3. Use induction for n ≥ 3. (f) Manipulate the inequality algebraically into (x − 1)(x3 − 1) ≥ 0 where x = b/a. Equality holds if and only if b = a. (g) Apply long division, followed by the binomial theorem and then an ad hoc step. (h) An ad hoc result, obvious upon inspection. (j) Compare the expressions in terms of exponentials. (k) cosh x − 1 = (1/2)(ex + e−x ) − 1 = (1/2)(ex/2 − e−x/2 )2 ≥ 0. 1.2. (b) For a parallel combination of N resistances, the equivalent resistance −1 −1 Req is given by Req = R1−1 + · · · + RN ≥ Rn for any n. 1.3. (a) The inequality holds trivially for n = 1. For the induction step use n+1
n
i=1
i=1
(1 + ai ) = (1 + an+1 ) ≥1+
n
(1 + ai ) =
n
(1 + ai ) + an+1
i=1
ai + an+1
i=1
n
n
(1 + ai )
i=1
(1 + ai ) ≥ 1 +
i=1
n
ai + an+1 .
i=1
(b) First check n = 2. Then n+1
n
i=1
i=1
(1 − ai ) = (1 − an+1 )
= 1 − an+1 −
(1 − ai ) > (1 − an+1 ) 1 −
n i=1
n i=1
ai + an+1
n i=1
ai > 1 −
n+1 i=1
ai .
ai
128
Appendix. Hints for Selected Exercises
1.4. (a) The proof hinges on L − ε < an ≤ bn ≤ cn < L + ε which holds for sufficiently large n. (b) An equivalent statement is dn → 0 where dn = n1/n − 1; from the binomial expansion n = (1 + dn )n > n(n − 1)d2n /2 so that for n > 1, 0 < dn < [2/(n−1)]1/2 . (c) The answer is 0; establish and apply 0 < n!/nn ≤ 2/n2 for n > 2. 1.5. Sum the inequalities min an ≤ an ≤ max an , 1 ≤ n ≤ m, then divide through by m to get the result for A. 1.6. Let P(n) be fn < 2n . P(1) and P(2) hold by inspection. The inequality fn = fn−1 + fn−2 < 2n−1 + 2n−2 = 3(2n−2 ) < 4(2n−2 ) = 2n shows that the truth of P(k) for 1 ≤ k < n implies P(n). By the “strong” variation of the principle of induction, this is enough. √ √ 1.7. an+1 − an√= n + 1 − n. Rationalize the numerator to write this as √ √ an+1 − an = 1/( n + 1 + n). Hence an+1 − an < 1/(2 n) bn . 1.8. Assume C can be ordered by a suitable selection of set P . By definition, i is the solution to the equation z 2 + 1 = 0; from this it is apparent that i = 0 and hence either i ∈ P or −i ∈ P (but not both). The assumption i ∈ P leads to i2 = −1 ∈ P and therefore to i(−1) = −i ∈ P , a violation of trichotomy. Similarly, the assumption −i ∈ P implies that i ∈ P. 1.10. (a) Suppose that S has two distinct suprema s1,2 . Both s1,2 are upper bounds for S; s1 ≥ s2 because s2 is a supremum for S, and likewise s2 ≥ s1 because s1 is a supremum for S. Hence s2 = s1 , contradicting the supposition that s1,2 are distinct. (d) Let U be the set of all upper bounds for S, and show that the suppositions sup S > inf U and sup S < inf U both lead to contradictions. (f) Assume sup A > sup B, with sup A − sup B = δ > 0. For every ε > 0, there is x ∈ A such that x > sup A − ε. Hence there exists x0 ∈ A such that x0 > sup A − δ/2. We then have x0 > δ + sup B − δ/2 = sup B + δ/2. Because δ is arbitrary, x0 > sup B. But x ∈ A ⇒ x ∈ B since A ⊆ B. Thus sup B would not be an upper bound for B, a contradiction. 1.11. (c) To show that supx∈S h(x) ≤ supx∈S f (x) + supx∈S g(x) where h(x) = f (x) + g(x), we may assume the contrary: that sup h(S) − sup f (S) − sup g(S) = δ > 0. For every x ∈ S, f (x) + g(x) ≤ sup f (S) + sup g(S). Also, there exists x0 ∈ S such that h(x0 ) = f (x0 ) + g(x0 ) > sup h(S) − δ/2. Hence sup h(S) − δ/2 < f (x0 ) + g(x0 ) ≤ sup f (S) + sup g(S), or δ < δ/2, a contradiction. Use this result and supx∈S [−f (x)] = − inf x∈S f (x) to obtain supx∈S [f (x) − g(x)] ≤ supx∈S f (x) − inf x∈S g(x). 1.12. Suppose {an } is increasing and bounded above. Then S = sup{an } exists and to any ε > 0 there corresponds k such that S − ε < ak ≤ S. Thus for m > k, S − ε < ak ≤ am ≤ S < S + ε. Therefore, to any ε > 0 there corresponds k such that m > k ⇒ S − ε < am < S + ε ⇒ |am − S| < ε. 1.13. (c) Given ε > 0 there exists N such that for x > N , L − ε < f (x) < L + ε. Hence for sufficiently large x we have 0 ≤ f (x) < L + ε or L ≥ −ε for any ε > 0. From this conclude that L ≥ 0, as required. 2.1. Choose α < 1 with p + α > 1 (e.g., if p ≥ 1 let α = 1/2, if 0 < p < 1 choose 1 − p < α < 1). For x > 1, x x ln x = (1/t) dt ≤ (1/tα ) dt = (x1−α − 1)/(1 − α) 1
1
Appendix. Hints for Selected Exercises
129
hence 0 < ln x/xp ≤ (x1−α−p − x−p )/(1 − α) → 0 as x → ∞. 2.2. (g) Differentiate the function f (x) = ln x/x to show that its maximum is attained at x = e, not at x = π, and hence that f (e) > f (π). (h) Define f (x) = xa + (1 − x)a for 0 ≤ x ≤ 1, 0 < a < 1. Check that f (0) = f (1) = 1, f (x) = 0 at x = 1/2 and f (1/2) = 21−a . Thus 1 ≤ xa + (1 − x)a ≤ 21−a . Now substitute x = s/(s + t). (i) Define f (x) = xb + (1 − x)b for 0 ≤ x ≤ 1, b > 1. Check that f (0) = f (1) = 1, f (x) = 0 at x = 1/2 and f (1/2) = 21−b . Thus 21−b ≤ xa + (1 − x)a ≤ 1. Now substitute x = s/(s + t). 2.3. The result can be obtained via the inequality ln x ≤ x − 1 (Exercise 2.2). Putting x → x/a and rearranging gives x ≥ a[1 + ln(x/a)] = a[ln e + ln(x/a)] = ln[(ex/a)a ]. Now raise e to the power of both sides. Equality holds if and only if x = a. √ 2.4. (b) Take f (x) = tan−1 x. (c) f (x) = x. (d) f (x) = ex . (e) Using f (x) = (1+x)a in the mean value theorem, if x > 0 there exists ξ ∈ (0, x) with ((1+x)a − 1)/x = a(1 + ξ)a−1 < a(1 + x)a−1 , and since x > 0, (1 + x)a − 1 < ax(1 + x)a−1 . If −1 < x < 0 there exists ξ with x < ξ < 0 such that ((1 + x)a − 1)/x = a(1 + ξ)a−1 > a(1 + x)a−1 , and since x < 0, (1 + x)a − 1 < ax(1 + x)a−1 . 2.5. (a) Write h(x) = f (x)/g(x) with f (x) = (1 + x)a − 1 and g(x) = x. Then f (x)/g (x) = a(1+x)a−1 is increasing (since its derivative is a(a−1)(1+x)a−2 > 0). Now use f (0) = g(0) = 0 and l’Hˆ opital’s monotone rule (LMR) for the cases x > 0 giving h(x) > h(0) = a and x < 0 giving h(x) < h(0) = a. (b) ln cosh x ln((sinh x)/x) =0/0 at
−→ x=0 LMR
x tanh2 x x − tanh x =0/0 at
−→ x=0 LMR
1 + 4x/(sinh 2x)
which clearly decreases on (0, ∞). (c) Check that h(x) = sin πx/(x(1 − x)) is symmetric about x = 1/2, so we restrict to (0, 1/2) = (a, b). Now h(x) → π as x → 0+ by l’Hˆ opital’s rule and h(1/2) = 4. To show h is monotonic, sin πx x(1 − x) =0/0 at
−→ x=0 LMR
π cos πx 1 − 2x =0/0 at
−→ x=1/2 LMR
(π 2 sin πx)/2
is increasing on (0, 1/2). (d) h(x) = sin x/x on (0, π/2] extends to [0, π/2] with h(0) = 1 by l’Hˆ opital’s rule, h(π/2) = 2/π. Now sin x x =0/0 at
−→ x=0
LMR
cos x 1
is strictly decreasing hence 1 > sin x/x > 2/π on (0, π/2). 2.6. (d) Apply ex ≥ 1 + x. Thus 1 + an ≤ exp(an ) and N N N (1 + an ) ≤ exp(an ) = exp an . n=1
n=1
n=1
2.7. Substitute x = a/b > 1 and use either differentiation or Corollary 2.6.1 to establish that n(x − 1) < xn − 1 < nxn−1 (x − 1). Next, suppose that an = bn
130
Appendix. Hints for Selected Exercises
with a = b; then bn−1 < 0 < an−1 (a contradiction, because b assumed positive). 2.8. (a) Fix ε and δ and a partition a = x0 < x1 < · · · < xn = b with xi − xi−1 < δ for all i. If f (x) is not bounded on [a, b], then it is not bounded on some subinterval [xk−1 , xk ]. Choose ξj for all j = k. There exists some ξk with |f (ξk )| sufficiently large to contradict | n i=1 f (ξi )(xi − xi−1 ) − I| < δ. (b) Since f (x) is integrable it is bounded, i.e., there exists M with |f (x)| ≤ M on [a, b]. Let ε > 0 be given, and suppose x0 ∈ (a, b). Then x x x x0 |F (x) − F (x0 )| = f (t) dt − f (t) dt = f (t) dt ≤ |f (t)| dt . a
x0
a
x0
Hence |F (x) − F (x0 )| ≤ M |x − x0 | and we may choose δ = ε/M . (c) No. f (x) is not bounded on [0, 1]. However the integral exists as the improper integral 1 limε→0+ ε x−1/2 dx. 2.9. (a) 2−α ≤ I(α, β) ≤ 1. (b) Use Jordan’s inequality. (c) For s > b, ∞ ∞ ∞ C −st −st ≤ . f (t)e dt |f (t)||e | dt ≤ Cebt e−st dt = s−b 0
0
0
(d) Use suitable table lookup integrals to obtain 1≤
(2n − 1)!!(2n + 1)!! π 2n + 1 ≤ . [(2n)!!]2 2 2n
As n → ∞, the middle quantity is squeezed to 1. (e) Consider the alternating series (i+1)π ai = (sin x)/x dx. iπ
Since the alternating series has |ai+1 | < |ai | and |ai | → 0 as i → ∞, the series converges. Write the series of positive terms ∞ (sin x)/x dx = (a0 + a1 ) + (a2 + a3 ) + · · · > a0 + a1 . 0
Using Jordan’s inequality on [0, π/2], (sin x)/x ≥ 2/π so
π/2 0
(sin x)/x dx ≥ 1.
Using symmetry and Figure 2.2, on [π/2, π] (sin x)/x ≥ 2/x−2/π ≥ 2/(π/2)−2/π hence π (sin x)/x dx ≥ 1.
π/2
Also
2π
(sin x)/x dx ≥ −2/3.
π
(Sketch a box from π to 2π touching the curve (sin x)/x at 3π/2.) Hence ∞ (sin x)/x dx > 4/3. 0
Appendix. Hints for Selected Exercises
131
Now regroup and get a0 plus a series of negative terms, ∞ (sin x)/x dx = a0 + (a1 + a2 ) + (a3 + a4 ) + · · · < a0 . 0
Form a left Riemann sum
2
f (xi )∆x > a0 ,
i=0
where ∆x = π/3, xi = i∆x. The integral from 0 to ∞ of the Fourier kernel (sin x)/x is computed using complex variables, with a crucial step invoking Jordan’s lemma, which in turn uses Jordan’s inequality. See [48]. The answer is π/2. 2.10. (a) [53] First assume f (b) = 0. Since g is integrable on [a, b] it is bounded, so we may choose a constant c > 0 with g(x) + c ≥ 0 on [a, b]. Define the ξ (continuous) function G(ξ) = a g(x) dx. Let M denote the maximum and m the minimum of G on [a, b]. Let ∆x = (b − a)/n and xi = a + i∆x for i = 0, . . . , n. Then b n xk (g(x) + c)f (x) dx = (g(x) + c)f (x) dx a
≤ =
k=1 n k=1 n k=1
xk−1
xk
f (xk−1 )
(g(x) + c) dx xk−1
xk
f (xk−1 )
g(x) dx + c xk−1
n
f (xk−1 )(xk − xk−1 ).
k=1
In the last expression the first term can be rewritten n
f (xk−1 )(G(xk ) − G(xk−1 )) =
k=1
n
G(xk )(f (xk−1 ) − f (xk ))
k=1
≤M
n (f (xk−1 ) − f (xk )) = M f (a). k=1
b As n → ∞ the second term approaches c a f (x) dx. Taking limits, by Lemma 1.1, b b (g(x) + c)f (x) dx ≤ M f (a) + c f (x) dx. a
Similarly,
a b
a
(g(x) + c)f (x) dx ≥ mf (a) + c
b
f (x) dx. a
Now apply the intermediate value theorem to G. Finally, note that if f (b) = 0 we b may redefine f (x) to be 0 at x = b without changing the integral a f (x)g(x) dx. (c) If f (x) is monotonic decreasing, replace f (x) by f (x)−f (b) in part (a). If f (x) is monotonic increasing, replace f (x) by f (x) − f (a) in part (b). (d) Immediate from part (a). 2.11. (a) Note that ln(n!) = n k=1 ln(k), and interpret this as a sum of rectangular areas (each of unit width). (b) Both trapezoids are bounded by the x-axis,
132
Appendix. Hints for Selected Exercises
and the lines x = a, b. The fourth side of the smaller trapezoid is formed by the line tangent to y = 1/x at the midpoint x = (a + b)/2 of interval (a, b). The fourth side of the larger trapezoid is formed by the secant line connecting the points x = a, y = 1/a, and x = b, y = 1/b. 2.12. We have Cn − Cn+1 = ln(1 + 1/n) − 1/(n + 1) > 0 by the logarithmic The lower bound of 1/2 would require that n n inequality, so Cn is decreasing. 1/j − 1/2 > ln n = dx/x. To show that this is indeed the case, construct j=1 1 suitable trapezoids to slightly overestimate the area under 1/x as A=
n−1 n 1 1 1 1 1 1 + − − , = 2 j=1 j j+1 j 2 2n j=1
which makes it apparent that n 1 1 1 − > ln n + > ln n. j 2 2n j=1
2.13. The moment of inertia 2 2 I = r2 dm ≤ rmax . dm = mrmax 2.14. Between each xi and xi+1 there is a point where g (x) vanishes. Between every two adjacent such points there is a point where g (x) vanishes. Continue the pattern until reaching a point ξ where g (n) (x) vanishes. 2.15. Assume the contrary and develop a contradiction based on Lemma 2.3. 2.16. Let A be dense in B, and suppose xn → x where x ∈ B and xn is a sequence of points in A. By hypothesis f (xn ) ≤ g(xn ) for all n; hence, by Lemma 1.1 we know that lim f (xn ) ≤ lim g(xn ) and by continuity (Theorem 2.1) the result is proved. Also note that the rationals are dense in the reals. 2.17. No; think of two functions f (x) and g(x) (e.g., two straight lines defined on some interval of the x-axis) such that f (x) is greater than g(x) but the slope of f (x) is less than that of g(x). 3.1. Eggleston [23] gives a proof along the following lines. Let w f (w) f (x) dx, Ig = g(y) dy. If = 0
0
Suppose without loss of generality that h ≥ f (w). (Otherwise interchange w, f with h, g.) First partition the interval [0, w]: let ∆x = w/n and xi = i∆x for i = 0, . . . , n. Observe graphically that because f is increasing If ≥
n−1
f (xi )∆x =
i=0
n−1
fi ∆x,
i=0
where fi = f (xi ) for i = 0, . . . , n. Similarly, because the numbers fi serve to (nonuniformly) partition the interval [0, f (w)], Ig ≥
n−1 i=0
g(fi )(fi+1 − fi ) =
n−1 i=0
xi (fi+1 − fi ).
Appendix. Hints for Selected Exercises
133
Hence If + Ig ≥
n−1 i=0
=
=
n−1
xi (fi+1 − fi )
i=0
n−1 i=0
=
fi ∆x +
fi+1 ∆x +
n−1
n−1
i=0
i=0
(fi − fi+1 )∆x +
n−1
n−1
i=0
i=0
[fi+1 (xi + ∆x) − xi fi ] +
(fi − fi+1 )∆x
n−1
n−1
i=0
i=0
(xi+1 fi+1 − xi fi ) + ∆x
xi (fi+1 − fi )
(fi − fi+1 )
= xn fn − x0 f0 − ∆x(fn − f0 ) = (w − ∆x)f (w) by the telescoping property. But g(y) ≥ g(f (w)) if y ≥ f (w), so that h h g(y) dy ≥ g(f (w)) dy = [h − f (w)]w. f (w)
f (w)
Upon addition to the previous inequality, w h f (x) dx + g(y) dy ≥ wh − ∆xf (w). 0
0
This holds for arbitrary ∆x > 0, leading to the desired conclusion. 3.2. (a) Use AM–GM. (c) Use cyclic permutation on the result of part (a); i.e., add the inequalities a2 + b2 ≥ 2ab, b2 + c2 ≥ 2bc, and c2 + a2 ≥ 2ca. 3.3. The summation formula n k=1 k = n(n + 1)/2 is useful in this problem. For instance, 2 n (2n) + (2n − 2) + · · · + 4 + 2 i=1 i = = n + 1. [(2n)!!]1/n < n n 3.4. (a) Let the rectangle have length L, width W , perimeter P . Then P/4 = (L + W )/2 ≥ (LW )1/2 . Equality occurs if and only if L = W . (b) q = Q/2. 3.5. The case n = 1 is trivial. For n ≥ 2 consider an as a variable by defining for x > 0, δ
n−1 δn f (x) = (δ1 a1 + · · · + δn−1 an−1 + δn x)/(aδ11 · · · an−1 x )
which is rewritten as f (x) = (sn−1 + δn x)/(pn−1 xδn ). Show f (x) = 0 at xm = −δn −1 sn−1 /(1 − δn ) and f (xm ) = (δn /pn−1 )xm (1 − δn ) > 0, so f (x) has its minimum at xm . n−1 f (xm ) = {[δ1 /(1 − δn )]a1 + · · · + [δn−1 /(1 − δn )]an−1 }1−δn /(aδ11 · · · an−1 ).
δ
The weights δ1 /(1 − δn ), . . . , δn−1 /(1 − δn ) add up to 1, so inductively if a1 , . . . , an−1 are not all equal, then δ /(1−δn )
[δ1 /(1 − δn )]a1 + · · · + [δn−1 /(1 − δn )]an−1 > a11
δ
n−1 · · · an−1
/(1−δn )
134
Appendix. Hints for Selected Exercises
so δ /(1−δn )
δ
n−1 · · · an−1
f (x) ≥ f (xm ) > [a11
/(1−δn ) 1−δn
]
δ
n−1 /(aδ11 · · · an−1 ) = 1.
If a1 = · · · = an−1 , then xm = [δ1 /(1 − δn )]a1 + · · · + [δn−1 /(1 − δn )]an−1 = a1 and f (xm ) = 1. For any other choice of x, f (x) > f (xm ) = 1. 3.6. Use AM–GM. 3.7. Multiply by 1 and then use AM–GM. 3.8. Use AM–GM twice: N
a−m ≥N n
n=1
N
1/N a−m n
=N
n=1
≥N
(1/N )
m
1 N n=1
1 1/N ( N n=1 an )
m
.
an
3.9. (a)
δi xti /t lim ln g(t) = lim ln t→0
= lim δi xti ln xi δi xti t→0 = δi ln xi
t→0
by l’Hˆ opital’s rule and the fact that g(t) →
n
n
i=1
xδi i
δi = 1. Hence (t → 0).
i=1
(b)
ln g(t) = ln δi xti t
=0/0
−→ at
t=0
LMR
δi xti ln xi
δi xti .
The last expression is increasing since its derivative, using the quotient rule, has numerator
2 δi xti ln2 xi − δi xti ln xi δi xti which is nonnegative because
2
2
1/2 t/2 1/2 t/2 δi xi δi xi ln xi ≤ δi xti ln xi = δi xti ln2 xi δi xti by the Cauchy–Schwarz inequality (3.19). (This applies to g on (0, ∞) or (−∞, 0).) 3.10. Choose a partition x0 = a, x1 = a + ∆x, . . . , xn = b where ∆x = (b − a)/n. Form Riemann sums approximating each term with a1 = f (x1 ), . . . , an = f (xn ) so that b f (x) dx, (a1 + · · · + an )/n = (f (x1 ) + · · · + f (xn ))∆x/(b − a) → (1/(b − a)) a
Appendix. Hints for Selected Exercises −1 −1 ((a−1 → (b − a) 1 + · · · + an )/n)
135
b
(1/f (x)) dx, a
and (a1 · · · an )1/n = exp[(1/n)(ln a1 + · · · + ln an )] = exp[(1/(b − a))∆x(ln(f (x1 )) + · · · + ln(f (xn ))] b ln f (x) dx → exp (1/(b − a)) a
as n → ∞. Use the previous exercise (c) with each δi = 1/n and Lemma 1.1. yi . Then ln h(t) = 3.11. To simplify notation, denote yi = xti and s = (d/dt) yi 2 ((t/s) y y because i ln xi − ln s)/t is nonpositive since ln s ≥ (t/s) i ln xi ln ss ≥ t ln xyi i = ln yiyi . The last s ln s ≥ ln xyi i , which follows from inequality holds because ss = s yi = syi ≥ yiyi . 3.12. Ptak [52] gives a proof using AM–GM as follows. From the given ai , form √ a new set of numbers bi = ai / a1 am (i = 1, 2, . . . , m). Like the ai , these are positive and satisfy b1 < b2 < · · · < bm ; moreover, they have the additional prop√ √ erty bm = am / a1 am = a1 am /a1 = 1/b1 so that bi ≤ 1/b1 (i = 1, 2, . . . , m). Manipulate this to get b i − b1 ≤ or bi + Hence
bi − b1 1 1 = − bi b1 b1 bi
1 1 ≤ b1 + bi b1
(i = 1, 2, . . . , m).
m m λi 1 λi bi + ≤ b1 + λi bi b1 i=1 i=1 i=1
m
m m 1 λi b1 + b m ≤ . λi bi + 2 i=1 b 2 i i=1
or
By AM–GM then, m
m 2 λi b1 + bm ≤ λi bi . bi 2 i=1 i=1
When rewritten in terms of the ai , this is the desired result. 3.13. If (3.24) holds, then c=
m i=1
|bi |
q
m
1/q |ai |
p
.
i=1
If (3.25) holds, then for each i, m m m q q p−1 q p−1 q p p = ((c|ai | = (|ai | ) |bi | ) ) (c|ai | ) |ai | . (|bi | ) i=1
i=1
i=1
136
Appendix. Hints for Selected Exercises
3.14. Let a = x0 , x1 = x0 + ∆x, . . . , xm = b where ∆x = (b − a)/m. Calling ai = f (xi ), bi = g(xi ), m
m
|ai bi |∆x →
i=1
|ai | ∆x
i=1
m
|f (x)g(x)| dx,
a
1/p
p
b
→
1/p |f (x)| dx ,
b
1/q |g(x)| dx ,
a
1/q |bi |q ∆x
b
→ a
i=1
as m → ∞. Now use (3.10) and Lemma 1.1. 3.15. Use b |f (x) + g(x)|2 dx ≤ a
b
|f (x) + g(x)|2 dx +
a b
=
[f (x) + g(x)]2 dx +
b
a b
a
|f (x) − g(x)|2 dx
[f (x) − g(x)]2 dx
a
b
=2
[f 2 (x) + g 2 (x)] dx.
a
√ √ 3.16. Use Cauchy–Schwarz with the two functions f h and g h. 3.17. By Cauchy–Schwarz
Here X =
T 0
1 T
T 0
2 T T 1 v(t) dt ≤ 2 (1)2 dt v 2 (t) dt T 0 0 2 1 T 2 1 T dx = v (t) dt = dt T 0 T 0 dt X X 1 dx 1 = dx = v(x) dx. T 0 dt T 0
v(t) dt. Equality holds if the particle speed is constant.
3.18. Make a substitution into Cauchy–Schwarz. 3.21. Inscribe a polygon of N sides in a circle of fixed radius R. With θn the central angle subtending the nth side of the polygon, the area of the polygon is A, where N N N R2 1 N R2 1 N R2 A= sin θn = sin sin θn ≤ θn . 2 2 N n=1 2 N n=1 n=1 Thus A ≤ (N R2 /2) sin(2π/N ), and equality holds only with all central angles equal. 3.23. Since f (x) > 0, f (x) is convex. Therefore −ln(δ1 a1 + · · · + δn an ) ≤ −δ1 ln a1 − · · · − δn ln an , which implies that δ1 a1 + · · · + δn an ≥ aδ11 · · · aδnn .
Appendix. Hints for Selected Exercises
137
3.24.(By now this is old hat.) Let ∆t = (b − a)/n, ti = a + i∆t, ci = p(ti )/ n j=1 p(tj ), xi = g(ti ). Apply (3.21) and Lemma 1.1. 4.1. (a) The statement is equivalent to −d(y, z) ≤ d(x, y) − d(x, z) ≤ d(y, z). The left inequality is equivalent to d(x, z) ≤ d(x, y) + d(y, z), while the right is equivalent to d(x, y) ≤ d(x, z) + d(z, y). But these are both occurrences of the triangle inequality. (b) Use induction to generalize the triangle inequality. 4.2. Assume x and y are distinct limits for {xn }. Then for sufficiently large n, d(x, y) ≤ d(x, xn ) + d(xn , y) < ε/2 + ε/2 = ε by the triangle inequality. Because ε is arbitrarily small, we must have x = y, a contradiction. 4.3. (a) Use Minkowski’s inequality. (b) Verification of the first two metric space requirements is trivial. For the third, use the triangle and Minkowski’s inequalities as follows: ∞ 1/p ∞ 1/p p p d(ξ, η) = |ξi − ζi + ζi − ηi | ≤ [|ξi − ζi | + |ζi − ηi |] i=1
≤
∞
1/p |ξi − ζi |
p
+
i=1
1/p
∞
i=1
|ζi − ηi |
p
.
i=1
4.4. Employ the result of Exercise 2.15. 4.7. Let x be the vector from C to β, y the vector from C to α. Then the median vector from α to A is 2x − y, and the median vector from β to B is 2y − x. Compare the magnitudes of these median vectors by using the fact that the inner product of any vector with itself equals its magnitude squared. 4.8. By the proof of Theorem 4.7 equality holds if and only if we have | x, y| =
x, x y, y and x, y = | y, y|, i.e., x, y is real and nonnegative. Hence equality holds if and only if x = 0 or y = 0 or else x = βy for some β real and nonnegative. aj 4.9. Consider each zj = aj + i bj ∈ C as a point wj = ∈ R2 . bj n 2 n 2 n zi = ai + bi i=1 i=1 i=1 n = wi 2 + 2
wi , wj , 1≤i<j≤n
i=1
whereas
n
|zi | =
n
wi .
i=1
i=1
Squaring both sides, (1.1) is equivalent to n i=1
wi 2 +
1≤i<j≤n
2 wi , wj ≤
n i=1
2 wi
=
n i=1
wi 2 +
1≤i<j≤n
2 wi wj .
138
Appendix. Hints for Selected Exercises
By Cauchy–Schwarz for each i, j, wi , wj ≤ wi wj hence the inequality is established. Furthermore, equality holds in the sum if and only if wi , wj = | wi , wj | = wi wj for all i < j. Hence equality holds if and only if for each i < j, wi = βij wj for a constant βij > 0, i.e., arg(zi ) = arg(zj ). 5.1. (a) I ≤ 32.58 (actual value close to 28). (b) Start with
t
√
0
1 1 − x2
2 dx >
1 t
t 0
√
dx 1 − x2
2
5.2. If u(x) and every un (x) are integrable on [a, b], then b b b un (x) dx − u(x) dx = [un (x) − u(x)] dx ≤ a
a
a
.
b
|un (x) − u(x)| dx.
a
Hence if {un (x)} converges uniformly to u(x) on [a, b], then for ε > 0 there exists N such that b b < (b − a)ε u (x) dx − u(x) dx n a
a
whenever n > N . 5.3. (a) Given ε > 0, take N so large that n > N implies |u(x) − un (x)| < [ε/(b − a)]1/2 for all x ∈ [a, b]. Then for n > N , b b [u(x) − un (x)]2 dx < ε/(b − a) dx = ε. a
a
(b) Use Minkowski to generate ! ! b 2 u dx ≤ !
a
a
b
!
b
u2n
a
! u2n dx ≤
!
b
(u − un )2 dx,
a b
u2 dx + a
b
dx +
(u − un )2 dx,
a
which together are equivalent to the needed inequality ! ! ! b b b 2 dx − 2 dx ≤ u u (u − un )2 dx. n a a a 5.4. Convert the following pseudo-code to your favorite language: tol=.5E-3; a=0; b=1; f(x)=eˆ(xˆ2); n=2; h=(b-a)/n; ends=f(a)+f(b); evens=0; odds=f(a+h); aold=(h/3)(ends+4*odds+2*evens); DoLoop n=2*n; h=h/2; evens=evens+odds; odds=f(a+h)+f(a+3h)+...+f(a+(n-1)h); anew=(h/3)(ends+4*odds+2*evens) if |anew-aold|<=tol*|anew| then exit
Appendix. Hints for Selected Exercises else aold=anew; End Doloop Print anew 5.5. A Fortran program:
c c
program taylor dimension yp(5) x=2 y=2 h=.1 do i=1,10 the next five lines were created by a symbolic manipulator and pasted in yp(1)=1-y/x yp(2)=(-x+2*y)/x**2 yp(3)=3*(x-2*y)/x**3 yp(4)=12*(-x+2*y)/x**4 yp(5)=60*(x-2*y)/x**5 Phi=yp(5) do k=5,2,-1 Phi=(h/k)*Phi+yp(k-1) enddo y=y+h*Phi x=x+h enddo write(*,*) ’x,y=’,x,y end
5.6. Some helpful formulas are [27] √ 1 Γ Γ(n) = (n − 1)!, = π, 2
√ π 1 Γ n+ = n (2n − 1)!!. 2 2
Using these the given inequality can be put into the form 2≤
2n n! ≤ 2n (2n − 1)!!
which is easily established by induction. 5.7. t > 1 ⇒ e−xt /tn > e−xt /tn+1 . 5.8. Integrate by parts. 5.9. Use
2
∞
f (u)f (t + u) du −∞
≤
∞
f 2 (u) du
−∞
∞
= −∞
∞
−∞ 2
f 2 (u) du
f 2 (t + u) du = (f f (0))2 .
139
140
Appendix. Hints for Selected Exercises
5.10. (See [59].) Starting with
∞ (−1)m (x/2)2m+n |Jn (x)| = m!(m + n)! m=0
apply the triangle inequality to get ∞ ∞ n (x2 /4)m (x2 /4)m 1 x n x = |Jn (x)| ≤ 2 m=0 m!(m + n)! n! 2 m=0 m!(n + 1)(n + 2) · · · (n + m) 2 ∞ n (x /4) (x2 /4)m 1 x n 1 x = exp ≤ n! 2 m=0 m!(n + 1)m n! 2 n+1 x2 1 x n . ≤ exp n! 2 4 5.11. (b) Letting y = x + d, we have ∞ ∞ 2 2 e−t dt = √ e−t dt = √ y
x+d
≤ e−d
∞
x
∞
e−(u+d) √ du 2 u+d x ∞ 2 e−u √ du = e−d √ e−t dt. 2 u x
√ √ √ √ 5.12. (c) For n ≥ 1, (an+1 −bn+1 ) = (1/2)(( an − bn )/( an + bn ))(an −bn ) ≤ (1/2)(an − bn ) so (a2 − b2 ) ≤ (1/2)(a1 − b1 ), . . . , (an+1 − bn+1 ) ≤ (1/2n )(a1 − b1 ) and now use the squeeze principle on 0 ≤ an+1 − bn+1 ≤ (1/2n )(a1 − b1 ). (d) Rewrite the integrand of T (a, b) as 1/(sin x cos x((a + b)2 + (a cot x − b tan x)2 )1/2 ). Sketch a cot x − b tan x to see that we may substitute tan y = (a cot x − b tan x)/(2b1 ) for −π/2 < y < π/2. Then 1/ cos y = ((2b1 )2 + (a cot x − b tan x)2 )1/2 /(2b1 ). Now using b21 = ab and a few steps of algebra, 2b1 / cos y = a cot x + b tan x. Since also 2b1 tan y = (a cot x − b tan x), subtraction yields b tan x = b1 / cos y − b1 tan y. Next, differentiate to get (b/ cos2 x)dx = b1 (sin y − 1)/ cos2 y) dy. Multiplication of both sides by cot x gives b dx/(sin x cos x) = b1 (sin y − 1)/(cos2 y tan x)dy. Now use 1/ tan x = b cos y/(b1 (1 − sin y)) to get dx/(sin x cos x) = −dy/ cos y hence π/2 1 π/2 sec y dy dy . = T (a, b) = 2 −π/2 a21 + b21 tan2 y a21 cos2 y + b21 sin2 y 0 (e)
b2n cos2 x + b2n sin2 x ≤ a2n cos2 x + b2n sin2 x ≤ a2n cos2 x + a2n sin2 x = an .
bn =
Use the squeeze principle. (f) T (a, b) = T (a1 , b1 ). By induction, we have T (a, b) = T (an , bn ) for all n. Since the sequence (a2n cos2 x + b2n sin2 x)1/2 converges uniformly we may pass the limit inside the integral: π/2 T (a, b) = T (an , bn ) = ( a2n cos2 x + b2n sin2 x)−1 dx 0
→
π/2 0
AG(a, b)−1 dx = (π/2)/(AG(a, b)).
Appendix. Hints for Selected Exercises
5.13. Obvious from
" S
Φ(∂Φ/∂n) dS =
V
141
|∇Φ|2 dV .
5.14. Let the first body carry charge Q1 at surface potential Φ1 , the second body Q2 at potential Φ2 ; the individual capacitances are then C1 = Q1 /Φ1 , C2 = Q2 /Φ2 . After the bodies are put in communication the new charges become Q1 , Q2 , and the shared potential Φ, where Φ = Q1 /C1 = Q2 /C2 and Q1 + Q2 = Q = Q1 + Q2 . These equations yield Φ = Q/(C1 + C2 ) so that the overall capacitance is C1 + C2 . Now it is straightforward to show that the energy stored by any conducting body is given by W = Q2 /2C, where Q is its charge and C is its capacitance. Assuming Q1 and Q2 are both positive, AM–GM gives 2Q1 Q2 C1 C2 ≤ Q22 C12 + Q21 C22 , and suitable algebraic manipulation of this yields (Q1 + Q2 )2 /(C1 + C2 ) ≤ Q21 /C1 + Q22 /C2 , as desired. (In case Q2 = 0, the desired inequality is Q21 /(C1 + C2 ) ≤ Q21 /C1 , which is obvious.) 5.16. Condition for equality in Cauchy–Schwarz: df = K1 tf (t) dt
d 1 df = [ln f (t)] = K1 t f dt dt
⇒
⇒
ln f (t) =
K 1 t2 . 2
5.17. After one integration by parts, Q(x) = √ But 1 √ 2π
∞ x
2 1 1 e−x /2 − √ 2πx 2π
2
e−t /2 1 1 dt < 2 √ t2 x 2π
Hence Q(x) > √
∞
∞ x
2
e−t
2
e−t /2 dt. t2
/2
dt =
x
1 Q(x). x2
2 1 1 e−x /2 − 2 Q(x). x 2πx
Now solve for Q(x). 5.18. Substitute ω = y − 1 so ω = 2ω + 2, ω(0) = 0. For this shifted differential equation, φ0 (x = 0, x x f (t, φ0 (t)) dt = (2φ0 (t) + 2) dt = 2x, φ1 (x) = 0
0
.. . φn (x) =
(2x)2 (2x)n + ··· + + 2x. n! 2
Recognize φn (x) as the Taylor polynomial of degree n for e2x − 1 hence φn (x) → e2x − 1. Therefore y = e2x solves the original differential equation. 5.19. In general we want to choose a neighborhood U 1 of ξ so that G (x) ≤ α < 1 for x ∈ U 1 so that G is a contraction. For the case n = 1, G(x) = x−F (x)/F (x) and G (x) = F (x)F (x)/(F (x)2 ). So if an initial guess x is sufficiently close to ξ so that |F (x)F (x)/(F (x)2 )| ≤ α < 1, then convergence is guaranteed (at least in theory).
This page intentionally left blank
References
[1] Abramowitz, M., and I. Stegun. Handbook of Mathematical Functions. New York: Dover, 1965. [2] Alexander, N. Exploring Biomechanics: Animals in Motion. New York: Scientific American Library, 1992. [3] Anderson, G., M. Vamanamurthy, and M. Vuorinen. Conformal Invariants, Inequalities, and Quasiconformal Maps. New York: Wiley, 1997. [4] Andrews, L. Special Functions of Mathematics for Engineers. New York: McGraw-Hill, 1992. [5] Arbel, B. “From ‘tricks’ to strategies for problem solving,” International Journal of Mathematical Education in Science and Technology, vol. 21, no. 3, 1990. [6] Ballard, W. Geometry. Philadelphia: W.B. Saunders, 1970. [7] Bazaraa, M., and C. Shetty. Nonlinear Programming. New York: Wiley, 1979. [8] Beckenbach, E., and R. Bellman. An Introduction to Inequalities. New York: Random House, 1961. [9] Blahut, R. Principles and Practice of Information Theory. Reading, MA: Addison-Wesley, 1987.
144
References
[10] B¨ orjesson, P., and C. Sundberg. “Simple approximations of the error function Q(x) for communications applications,” IEEE Transactions on Communications, vol. COM-27, no. 3, 1979. [11] Brogan, W. Modern Control Theory. New York: Quantum, 1974. [12] Bromwich, T. An Introduction to the Theory of Infinite Series. London: Macmillan, 1965. [13] Chong, K. “A study of some inequalities involving the modulus signs,” International Journal of Mathematical Education in Science and Technology, vol. 12, no. 4, 1981. [14] Chow, S., and J. Hale. Methods of Bifurcation Theory. New York: Springer-Verlag, 1982. [15] Couch, L. Digital and Analog Communication Systems. New York: Macmillan, 1990. [16] de Alwis, T. “Projectile motion with arbitrary resistance,” College Mathematics Journal, vol. 26, no. 5, 1995. [17] Dennis, J., and R. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Englewood Cliffs, NJ: Prentice Hall, 1983. [18] Dieudonne, J. Foundations of Modern Analysis. New York: Academic Press, 1960. [19] Duffin, R. “Cost minimization problems treated by geometric means,” Operations Research, vol. 10, no. 5, pp. 668–675, 1962. [20] Duffin, R. “Dual programs and minimum cost,” Journal of the Society for Industrial and Applied Mathematics, vol. 10, pp. 119–123, 1962. [21] Duren, P. Theory of Hp Spaces. New York: Academic Press, 1970. [22] Edwards, C. Advanced Calculus of Several Variables. New York: Academic Press, 1973. [23] Eggleston, H. Elementary Real Analysis. Cambridge University Press, UK, 1962. [24] Everitt, W. Inequalities: Fifty years on from Hardy, Littlewood, and Polya: Proceedings of the International Conference. New York: Dekker, 1991. [25] Gelfand, I. Lectures on Linear Algebra. New York: Dover, 1989. [26] Glaister, P. “Does what goes up take the same time to come down?,” College Mathematics Journal, vol. 24, no. 2, 1993.
References
145
[27] Gradshteyn, I., and I. Ryzhik. Table of Integrals, Series, and Products. Boston: Academic Press, 1994. [28] Hardy, G., J. Littlewood, and G. Polya. Inequalities. Cambridge University Press, UK, 1952. [29] Hobson, E. The Theory of Functions of a Real Variable and the Theory of Fourier’s Series, vol. I. New York: Dover, 1957. [30] Indritz, J. Methods in Analysis. New York: Macmillan, 1963. [31] Jerri, A. Introduction to Integral Equations with Applications. New York: Dekker, 1985. [32] Jordan, D., and P. Smith. Nonlinear Ordinary Differential Equations. Clarendon Press, Oxford, UK, 1987. [33] Kazarinoff, N. Geometric Inequalities. New York: Random House, 1961. [34] Klamkin, M. (ed.) Problems in Applied Mathematics. Selections from SIAM Review. Philadelphia: SIAM, 1990. [35] Knowles, J. “Energy decay estimates for damped oscillators,” International Journal of Mathematical Education in Science and Technology, vol. 28, no. 1, 1997. [36] Lafrance, P. Fundamental Concepts in Communication. Englewood Cliffs, NJ: Prentice Hall, 1990. [37] Landau, E. Differential and Integral Calculus (3rd ed.). New York: Chelsea, 1980. [38] Lew, J., J. Frauenthal, and N. Keyfitz. “On the average distances in a circular disc,” in Mathematical Modeling: Classroom Notes in Applied Mathematics. Philadelphia: SIAM, 1987. [39] Libeskind, S. “Summation of finite series — A unified approach,” TwoYear College Mathematics Journal, vol. 12, no. 1, 1981. [40] L¨ utkepohl, H. Handbook of Matrices. Chichester: Wiley, 1996. [41] Manley, R. Waveform Analysis. New York: Wiley, 1945. [42] Marcus, M., and H. Minc. A Survey of Matrix Theory and Matrix Inequalities. New York: Dover, 1992. [43] Marshall, A., and I. Olkin. Inequalities: Theory of Majorization and its Applications. New York: Academic Press, 1979.
146
References
[44] Meschkowski, H. Series Expansions for Mathematical Physicists. New York: Interscience, 1968. [45] Mitrinovic, D. Analytic Inequalities. Berlin: Springer-Verlag, 1970. [46] Mitrinovic, D., and Dragoslav, S. Recent Advances in Geometric Inequalities. Boston: Kluwer Academic, 1989. [47] Oden, J. Applied Functional Analysis: A First Course for Students of Mechanics and Engineering Science. Englewood Cliffs, NJ: Prentice Hall, 1979. [48] Papoulis, A. The Fourier Integral and its Applications. New York: McGraw-Hill, 1962. [49] Patel, V. Numerical Analysis. New York: Saunders College, 1994. [50] Polya, G., and G. Szeg¨ o. Isoperimetric Inequalities in Mathematical Physics. Annals of Mathematics Studies No. 27. Princeton, NJ: Princeton University Press, 1951. [51] Protter, M. Maximum Principles in Differential Equations. New York: Springer-Verlag, 1984. [52] Ptak, V. “The Kantorovich inequality,” American Mathematical Monthly, vol. 102, no. 9, 1995. [53] Rogosinski, W. Volume and Integral. New York: Interscience, 1952. [54] Shannon, C. The Mathematical Theory of Communication. Urbana: University of Illinois Press, 1964. [55] Stoer, J., and R. Bulirsch. Introduction to Numerical Analysis. New York: Springer-Verlag, 1980. [56] Stratton, J. Electromagnetic Theory. New York: McGraw-Hill, 1941. [57] Stromberg, K. An Introduction to Classical Real Analysis. Belmont, CA: Wadsworth International, 1981. [58] Temme, N. Special Functions: An Introduction to the Classical Functions of Mathematical Physics. New York: Wiley, 1996. [59] Watson, G. A Treatise on the Theory of Bessel Functions. Cambridge University Press, UK, 1944. [60] Weinberger, H. A First Course in Partial Differential Equations with Complex Variables and Transform Methods. New York: Wiley, 1965.
Index
absolute value, 5 air resistance, 82 algebra of inequalities, 2 AM–GM inequality, 38, 49–51 arithmetic mean, 15, 39, 50 arithmetic–geometric mean, 124 asymptotic sequence, 70 axioms of order, 2 Bernoulli’s inequality, 37 Bessel’s inequality, 65 bound Chernoff, 111 greatest lower, 4 least upper, 4 bounded above,below, 4 function, 19 capacitance, 90 Cauchy mean value theorem, 25 Cauchy sequence, 54, 59 Cauchy–Schwarz inequality, 44, 60, 61, 66 Chebyshev’s inequality, 45, 51, 110
communication, 112 complementary exponents, 41 complete, 55 continuity, 20, 62 contraction mapping, 56 theorem, 56 control, 109 convergence, 6, 54 in mean, 123 pointwise, 69 uniform, 69, 123 convolution, 68 dense, 36 DMS, 114 duality, 121 eigenvalues, 93 elliptic integral, 125 entropy, 114 equation Laplace, 89 Poisson, 88 estimation of integrals, 29, 67 Euler method, 75
148
Index
Euler’s constant, 36 Euler’s formula, 84 extrema, 21 Fibonacci numbers, 15 fixed point, 56 Fourier coefficients, 65 series, 65, 69, 85 transform, 101 Fredholm integral equation, 116 function autocorrelation, 124 Bessel, 78, 82, 124 bounded, 19 coerror, 125 complementary error, 124 continuity, 20 convex, 46 convolution, 68 decreasing, 21 elliptic integral, 125 exponential integral, 77, 123 gamma, 77, 123 increasing, 21 Lyapunov, 104 monotonic, 21, 35 square integrable, 51 terminology and facts, 19 transfer, 108 geometric mean, 15, 39, 50 geometric shapes, 84 Gibb’s phenomenon, 100 give a little, 11 Green’s formula, 90 H¨ older’s inequality, 40, 42, 43, 50 harmonic mean, 15, 40, 50 harmonic–geometric–arithmetic means inequality, 50 Heron’s formula, 87 Hilbert space, 62 implicit function theorem, 119
inequality AM–GM, 38, 49–51 Bernoulli, 37 Bessel, 65 Cauchy–Schwarz, 44, 60, 61, 66 Chebyshev, 45, 51, 110 convexity, 46 H¨ older, 40, 42, 43, 50 harmonic–geometric– arithmetic means, 50 for integrals, 50 integral, 21 isoperimetric, 85 Jensen, 46, 51, 52 Jordan, 29 Kantorovich, 50 logarithmic, 27 Markov, 110 Minkowski, 43, 61, 66 modulus, 22 of the means, 38, 49–51 Ostrowski, 35 quadratic, 5 Schur, 97 triangle, 7, 58 Weierstrass, 14 Young, 38, 41, 49 infimum, 4 information, 113, 114 inner product, 59, 62 inner product space, 59 intermediate value property, 21 isoperimetric inequality, 85 iterative process, 56 Jensen’s inequality, 46, 51, 52 Jordan’s inequality, 29 Kantorovich’s inequality, 50 l’Hˆ opital’s monotone rule, 25 Lagrange interpolation, 72 Laplace equation, 89
Index
Laplace transform, 108 law parallelogram, 61 transitive, 3 limit, 6 limit point, 54 linear space, 57 Lipschitz condition, 33, 117 logarithmic inequality, 27 Lyapunov function, 104 mapping, 55 continuous, 55 contraction, 56 Markov’s inequality, 110 matched filter, 112 matrix, 93 conjugate transpose, 93 eigenvalues, 93 Hermitian, 93 norm, 98 symmetric, 94 trace, 96 maximum, 4 mean arithmetic, 15, 39, 50 arithmetic–geometric, 124 geometric, 15, 39, 50 harmonic, 15, 40, 50 mean value theorem, 24 Cauchy, 25 for integrals, 23, 35 metric, 54 metric space, 53 minimizing vector, 62 minimum, 4 Minkowski’s inequality, 43, 61, 66 modulus inequality, 22 monotonicity conditions for, 25 monotonicity, conditions for, 25 motor control, 109 neighborhood, 6, 54
Neumann series, 117 Newton’s method, 120, 125 noise, 112 norm, 58, 62 L2 , 58 Lp , 97 compatible, 99 cubic, 98 Frobenius, 98 induced, 61 matrix, 97 supremum, 58 order exponential, 34 symbols O, o, 19 orthogonal projection, 62 orthogonality, 62 orthonormal set, 62 Ostrowski’s inequality, 35 parallelogram law, 61 Parseval’s identity, 85 persistence of sign, 20, 55 Picard iteration, 56, 125 Picard–Lindel¨ of theorem, 117 Platonic solids, 85 Poisson’s equation, 88 polyhedron, 84 polynomial, 82 Chebyshev, 78 Legendre, 79 orthogonal, 81 zeros of, 81 positive definite, 95, 104 principle Dirichlet, 91 maximum, 89 squeeze, 14, 20 uncertainty, 102 projectile problem, 82 Pythagorean theorem, 62 quadratic form, 94 quadratic inequality, 5
149
150
Index
results from differential calculus, 23 Riemann integral, 21 lemma, 65 sum, 21 Rolle’s theorem, 25 Schur’s inequality, 97 second derivative test, 26, 95 sequence, 6 asymptotic, 70 increasing, 10 monotonic, 17 set closed, 54 open, 54 Simpson’s rule, 72, 123 space lp , 65 function, 58 Hilbert, 62 inner product, 59 linear, 57 metric, 53 complete, 55 normed linear, 58 spectral radius, 99 spring, nonlinear, 105 square integrable, 51, 68 squeeze principle, 14, 20 stability, 103, 107 asymptotic, 103 BIBO, 107
exponential, 103 Lyapunov, 103 Steiner symmetrization, 92 successive approximations, 56 supremum, 4 Sylvester’s criterion, 94 Taylor’s method, 74, 123 Taylor’s theorem, 24 telescoping property, 15 theorem contraction mapping, 56 fundamental, of calculus, 23 implicit function, 119 l’Hˆ opital’s monotone, 25 mean value, 24 Cauchy, 25 for integrals, 23, 35 Picard–Lindel¨ of, 117 Pythagorean, 62 Rolle, 25 Taylor, 24 transfer function, 108 transform Fourier, 101 Laplace, 108 triangle inequality, 7, 58 Wallis’s product, 34 Weierstrass’s inequality, 14 Young’s inequality, 38, 41, 49 zeros of polynomials, bounds, 9