Advanced Analysis

Advanced Analysis Min Yan Department of Mathematics Hong Kong University of Science and Technology September 3, 2009 2...

Author: Min Yan

74 downloads 1852 Views 2MB Size Report

This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!

Report copyright / DMCA form

DOWNLOAD PDF

Advanced Analysis Min Yan Department of Mathematics Hong Kong University of Science and Technology September 3, 2009

2

Contents 1 Limit and Continuity 1.1 Limit of Sequence . . . . . . . . . . . . . . . . . . . . . 1.1.1 Definition . . . . . . . . . . . . . . . . . . . . . 1.1.2 Property . . . . . . . . . . . . . . . . . . . . . . 1.1.3 Infinity and Infinitesimal . . . . . . . . . . . . . 1.1.4 Additional Exercise . . . . . . . . . . . . . . . . 1.2 Convergence of Sequence Limit . . . . . . . . . . . . . 1.2.1 Necessary Condition . . . . . . . . . . . . . . . 1.2.2 Supremum and Infimum . . . . . . . . . . . . . 1.2.3 Monotone Sequence . . . . . . . . . . . . . . . . 1.2.4 Convergent Subsequence . . . . . . . . . . . . . 1.2.5 Convergence of Cauchy Sequence . . . . . . . . 1.2.6 Open Cover . . . . . . . . . . . . . . . . . . . . 1.2.7 Additional Exercise . . . . . . . . . . . . . . . . 1.3 Limit of Function . . . . . . . . . . . . . . . . . . . . . 1.3.1 Definition . . . . . . . . . . . . . . . . . . . . . 1.3.2 Variation . . . . . . . . . . . . . . . . . . . . . 1.3.3 Property . . . . . . . . . . . . . . . . . . . . . . 1.3.4 Limit of Trignometric Function . . . . . . . . . 1.3.5 Limit of Exponential Function . . . . . . . . . . 1.3.6 More Property . . . . . . . . . . . . . . . . . . 1.3.7 Order of Infinity and Infinitesimal . . . . . . . . 1.3.8 Additional Exercise . . . . . . . . . . . . . . . . 1.4 Continuous Function . . . . . . . . . . . . . . . . . . . 1.4.1 Definition . . . . . . . . . . . . . . . . . . . . . 1.4.2 Uniformly Continuous Function . . . . . . . . . 1.4.3 Maximum and Minimum . . . . . . . . . . . . . 1.4.4 Intermediate Value Theorem . . . . . . . . . . . 1.4.5 Invertible Continuous Function . . . . . . . . . 1.4.6 Inverse Trignometric and Logarithmic Functions 1.4.7 Additional Exercise . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

7 8 8 13 18 21 21 22 24 28 31 36 37 39 40 40 43 46 50 52 56 59 62 63 63 67 69 71 73 75 79

2 Differentiation 2.1 Approximation and Differentiation 2.1.1 Approximation . . . . . . . 2.1.2 Differentiation . . . . . . . . 2.1.3 Derivative . . . . . . . . . .

. . . .

. . . .

. . . .

. . . .

81 82 82 84 85

3

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

4

CONTENTS

2.2

2.3

2.1.4 Tangent Line and Rate of Change 2.1.5 Rules of Computation . . . . . . 2.1.6 Basic Example . . . . . . . . . . 2.1.7 Derivative of Inverse Function . . 2.1.8 Additional Exercise . . . . . . . . Application of Differentiation . . . . . . 2.2.1 Maximum and Minimum . . . . . 2.2.2 Mean Value Theorem . . . . . . . 2.2.3 Monotone Function . . . . . . . . ˆ 2.2.4 L’Hospital’s Rule . . . . . . . . . 2.2.5 Additional Exercise . . . . . . . . High Order Approximation . . . . . . . . 2.3.1 Quadratic Approximation . . . . 2.3.2 High Order Derivative . . . . . . 2.3.3 Taylor Expansion . . . . . . . . . 2.3.4 Remainder . . . . . . . . . . . . . 2.3.5 Maximum and Minimum . . . . . 2.3.6 Convex and Concave . . . . . . . 2.3.7 Additional Exercise . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

88 90 93 96 99 100 100 102 105 108 112 114 114 116 121 126 129 130 135

3 Integration 3.1 Riemann Integration . . . . . . . . . . . . . . . . . . . . . . 3.1.1 Riemann Sum . . . . . . . . . . . . . . . . . . . . . . 3.1.2 Integrability Criterion . . . . . . . . . . . . . . . . . 3.1.3 Integrability of Continuous and Monotone Functions 3.1.4 Properties of Integration . . . . . . . . . . . . . . . . 3.1.5 Additional Exercise . . . . . . . . . . . . . . . . . . . 3.2 Antiderivative . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.1 Fundamental Theorem of Calculus . . . . . . . . . . 3.2.2 Antiderivative . . . . . . . . . . . . . . . . . . . . . . 3.2.3 Integration by Parts . . . . . . . . . . . . . . . . . . 3.2.4 Change of Variable . . . . . . . . . . . . . . . . . . . 3.2.5 Additional Exercise . . . . . . . . . . . . . . . . . . . 3.3 Topics on Integration . . . . . . . . . . . . . . . . . . . . . . 3.3.1 Integration of Rational Functions . . . . . . . . . . . 3.3.2 Improper Integration . . . . . . . . . . . . . . . . . . 3.3.3 Riemann-Stieltjes Integration . . . . . . . . . . . . . 3.3.4 Bounded Variation Function . . . . . . . . . . . . . . 3.3.5 Additional Exercise . . . . . . . . . . . . . . . . . . .

139 . 140 . 140 . 143 . 146 . 148 . 154 . 158 . 159 . 164 . 170 . 171 . 174 . 179 . 179 . 184 . 189 . 195 . 201

4 Series 4.1 Series 4.1.1 4.1.2 4.1.3 4.1.4 4.1.5

203 . 204 . 204 . 207 . 212 . 215 . 219

of Numbers . . . . . . . . Sum of Series . . . . . . Comparison Test . . . . Conditional Convergence Rearrangement of Series Additional Exercise . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

CONTENTS 4.2

Series 4.2.1 4.2.2 4.2.3 4.2.4 4.2.5

5 of Functions . . . . . . . . . . . . . Uniform Convergence . . . . . . . . Properties of Uniform Convergence Power Series . . . . . . . . . . . . . Fourier Series . . . . . . . . . . . . Additional Exercise . . . . . . . . .

5 Multivariable Function 5.1 Limit and Continuity . . . . . . . . . . . 5.1.1 Limit in Euclidean Space . . . . . 5.1.2 Topology in Euclidean Space . . . 5.1.3 Multivariable Function . . . . . . 5.1.4 Continuous Function . . . . . . . 5.1.5 Multivariable Map . . . . . . . . 5.1.6 Additional Exercise . . . . . . . . 5.2 Multivariable Algebra . . . . . . . . . . . 5.2.1 Linear Transform . . . . . . . . . 5.2.2 Bilinear and Quadratic Form . . 5.2.3 Multilinear Map and Polynomial 5.2.4 Additional Exercises . . . . . . . 6 Multivariable Differentiation 6.1 Differentiation . . . . . . . . . . . . . . 6.1.1 Differentiability and Derivative 6.1.2 Partial Derivative . . . . . . . . 6.1.3 Rules of Differentiation . . . . . 6.1.4 Directional Derivative . . . . . 6.2 Inverse and Implicit Function . . . . . 6.2.1 Inverse Differentiation . . . . . 6.2.2 Implicit Differentiation . . . . . 6.2.3 Hypersurface . . . . . . . . . . 6.2.4 Additional Exercise . . . . . . . 6.3 High Order Differentiation . . . . . . . 6.3.1 Quadratic Approximation . . . 6.3.2 High Order Partial Derivative . 6.3.3 Taylor Expansion . . . . . . . . 6.3.4 Maximum and Minimum . . . . 6.3.5 Constrained Extreme . . . . . . 6.3.6 Additional Exercise . . . . . . . 7 Multivariable Integration 7.1 Riemann Integration . . . . . . . . 7.1.1 Volume in Euclidean Space . 7.1.2 Riemann Sum . . . . . . . . 7.1.3 Properties of Integration . . 7.1.4 Fubini Theorem . . . . . . . 7.1.5 Volume in Vector Space . .

. . . . . .

. . . . . .

. . . . . . . . . . . . . . . . .

. . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . .

. . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . .

. . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . .

. . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . .

. . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . .

. . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . .

. . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . .

. . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . .

. . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . .

. . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . .

. . . . . .

. . . . . .

222 222 228 234 238 244

. . . . . . . . . . . .

249 . 250 . 250 . 253 . 258 . 261 . 263 . 267 . 269 . 269 . 272 . 279 . 287

. . . . . . . . . . . . . . . . .

289 . 290 . 290 . 291 . 295 . 298 . 302 . 302 . 306 . 309 . 311 . 314 . 314 . 317 . 320 . 323 . 328 . 334

. . . . . .

337 . 338 . 338 . 344 . 348 . 352 . 356

6

CONTENTS

7.2

7.3

7.1.6 Change of Variable . . . . . . . . . . . . 7.1.7 Improper Integration . . . . . . . . . . . 7.1.8 Additional Exercise . . . . . . . . . . . . Integration on Hypersurface . . . . . . . . . . . 7.2.1 Rectifiable Curve . . . . . . . . . . . . . 7.2.2 Integration of Function on Curve . . . . 7.2.3 Integration of 1-Form on Curve . . . . . 7.2.4 Surface Area . . . . . . . . . . . . . . . . 7.2.5 Integration of Function on Surface . . . . 7.2.6 Integration of 2-Form on Surface . . . . 7.2.7 Volume and Integration on Hypersurface Stokes Theorem . . . . . . . . . . . . . . . . . . 7.3.1 Green Theorem . . . . . . . . . . . . . . 7.3.2 Independence of Integral on Path . . . . 7.3.3 Stokes Theorem . . . . . . . . . . . . . . 7.3.4 Gauss Theorem . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

360 367 372 374 374 378 380 384 387 388 394 399 399 404 410 415

Chapter 1 Limit and Continuity

7

8

1.1

CHAPTER 1. LIMIT AND CONTINUITY

Limit of Sequence

A sequence is an infinite list x1 , x2 , x3 , . . . , xn , xn+1 , . . . . The sequence can also be denoted as {xn }. The subscript n is called the index and does not have to start from 1. For example, x5 , x6 , x7 , . . . , xn , xn+1 , . . . , is also a sequence, with the index starting from 5. In this chapter, the terms xn of a sequence are assumed to be real numbers and can be plotted on the real number line. A sequence has a limit l if the following implication happens: n is big =⇒ xn is close to l. Intuitively, this means that the sequence “accumulates” around l. However, to give a rigorous definition, the meaning of “big”, “close” and “implies” has to be more precise. The bigness of a number n is usually measured by n > N for some big N . For example, we say n is “in the thousands” if N = 1, 000. The closeness between two numbers u and v is usually measured by (the smallness of) the size of |u − v|. The implication means that the predetermined smallness of |xn − l| may be achieved by the bigness of n.

1.1.1

Definition

Definition 1.1.1. A sequence {xn } of real numbers has limit l (or converges to l), and denoted limn→∞ xn = l, if for any > 0, there is N , such that n > N =⇒ |xn − l| < .

(1.1.1)

A sequence is convergent if it has a (finite) limit. Otherwise, the sequence is divergent. .. l... .. ..................................... .. . .. . ......................................................................................................................................................................................................... ... .. ...x x1 x 2 x5 . . . N +1 x3 x4 xN +3 ....xN +2 . xN +4 Figure 1.1: for any , there is N Note the logical relation between and N . The predetermined smallness for |xn − l| is arbitrarily given, while the size N for n is to be found after is given. Thus the choice of N usually depends on and is often expressed as a function of .

1.1. LIMIT OF SEQUENCE

9

.... .. ..... .. .. .... . . .. ........................................................................................... ......... ... .. ..... . .. . . . ........................................................ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . l . ..... ... .. ............................................................................................................. .. .. ...... .. .. ..... .. .. . .. ... ..................................................................................................................................................................................... N +1 N +4 5 .. 1 .. .. Figure 1.2: another plot of a converging sequence Since the limit is about the long term behavior of a sequence getting closer to a target, only small and big N need to be considered establishing a limit. For example, the limit of a sequence is not changed if the first one hundred terms are replaced by other arbitrary numbers. Exercise 1.1.9 contains more examples. 1 Example 1.1.1. Intuitively, the bigger n is, the smaller gets. This suggests n 1 1 limn→∞ = 0. Rigorously following the definition, for any > 0, choose N = . n Then 1 1 1 n > N =⇒ − 0 = < = . n n N 1 Thus the implication (1.1.1) is established for the sequence . n 1 How did we find the suitable N ? Our goal is to achieve − 0 < . This is n 1 1 the same as n > , which suggests us to take N = . Note that our choice of N may not be a natural number. n Example 1.1.2. The sequence , with index starting from n = 2, is n + (−1)n 2 3 4 5 6 7 , , , , , , ... . 3 2 5 4 7 6 n = 1. n + (−1)n For the rigorous argument, we observe that n 1 1 = − 1 n + (−1)n ≤ n − 1 . n + (−1)n

Plotting the sequence suggests limn→∞

In order for the left side to be less than , it is sufficient to make is the same as n >

1 + 1.

1 < , which n−1

10

CHAPTER 1. LIMIT AND CONTINUITY

Based on the analysis, we have the following formal and rigorous argument for 1 the limit: For any > 0, choose N = + 1. Then 1 1 1 n − 1 = ≤ < = . n > N =⇒ n n n + (−1) n + (−1) n−1 N −1 Example 1.1.3. Consider the sequence 1.4, 1.41, 1.414, 1.4142, 1.41421, 1.414213, 1.4142135, 1.41421356, . . . √ of more and more refined decimal approximations of 2. The intuition suggests √ that limn→∞ xn = 2. The rigorous verification means that for any > 0, we need to find N , such that √ n > N =⇒ |xn − 2| < . Since the n-th√term xn is the decimal expansion up to the n-th decimal point, it satisfies |xn − 2| < 10−n . Therefore it suffices to find N such that the following implication holds n > N =⇒ 10−n < . Assume 1 > > 0. Then has the decimal expansion = 0.00 · · · 0EN EN +1 EN +2 · · · , with EN is from {1, 2, . . . , 9}. In other words, N is the location of the first nonzero digit in the decimal expansion of . Then for n > N , we have ≥ 0.00 · · · 0EN ≥ 0.00 · · · 01 = 10−N > 10−n . The argument above assumes 1 > > 0. This is not a problem because if we can achieve |xn − l| < 0.5 for n > N , then we can certainly achieve |xn − l| < for any ≥ 1 and for the same n > N . In other words, we may add the assumption that is less than a certain fixed number without hurting the overall rigorous argument for the limit. See Exercise 1.1.9 for more freedom we may have in choosing and N . Exercise 1.1.1. Rigorously verify the limits. 1. limn→∞

2n = 2. n−2

4. limn→∞

1 2. limn→∞ √ = 0. n

5. limn→∞

√ √ 3. limn→∞ ( n + 1 − n) = 0.

6. limn→∞

1 = 0. − n1/2 cos n = 0. n √ n − cos n √ = 1. n + sin n n2/3

Exercise 1.1.2. Let a positive real number a > 0 have the decimal expansion a = X.Z1 Z2 · · · Zn Zn+1 · · · , where X is a non-negative integer, and Zn is a single digit integer from {0, 1, 2, . . . , 9} at the n-th decimal point. Prove the sequence X.Z1 , X.Z1 Z2 , X.Z1 Z2 Z3 , X.Z1 Z2 Z3 Z4 , . . . of more and more refined decimal approximations converges to a.

1.1. LIMIT OF SEQUENCE

11

Exercise 1.1.3. Suppose xn ≤ l ≤ yn and limn→∞ (xn −yn ) = 0. Prove limn→∞ xn = limn→∞ yn = l. Exercise 1.1.4. Suppose |xn − l| ≤ yn and limn→∞ yn = 0. Prove limn→∞ xn = l. Exercise 1.1.5. Suppose limn→∞ xn = l. Prove limn→∞ |xn | = |l|. Is the converse true? Exercise 1.1.6. Suppose limn→∞ xn = l. Prove limn→∞ xn+3 = l. Is the converse true? Exercise 1.1.7. Prove that the limit is not changed if finitely many terms are modified. In other words, if there is N , such that xn = yn for n > N , then limn→∞ xn = l if and only if limn→∞ yn = l. Exercise 1.1.8. Prove the uniqueness of the limit. In other words, if limn→∞ xn = l and limn→∞ xn = l0 , then l = l0 . Exercise 1.1.9. Prove the following are equivalent definitions of limn→∞ xn = l. 1. For any c > > 0, where c is some fixed number, there is N , such that |xn − l| < for all n > N . 2. For any > 0, there is a natural number N , such that |xn − l| < for all n > N. 3. For any > 0, there is N , such that |xn − l| ≤ for all n > N . 4. For any > 0, there is N , such that |xn − l| < for all n ≥ N . 5. For any > 0, there is N , such that |xn − l| ≤ 2 for all n > N . Exercise 1.1.10. Which are equivalent to the definition of limn→∞ xn = l? 1. For = 0.001, we have N = 1000, such that |xn − l| < for all n > N . 2. For any 0.001 ≥ > 0, there is N , such that |xn − l| < for all n > N . 3. For any > 0.001, there is N , such that |xn − l| < for all n ≥ N . 4. For any > 0, there is a natural number N , such that |xn − l| ≤ for all n ≥ N. 5. For any > 0, there is N , such that |xn − l| < 22 for all n > N . 6. For any > 0, there is N , such that |xn − l| < 22 + 1 for all n > N . 7. For any > 0, we have N = 1000, such that |xn − l| < for all n > N . 8. For any > 0, there are infinitely many n, such that |xn − l| < . 9. For infinitely many > 0, there is N , such that |xn − l| < for all n > N . 10. For any > 0, there is N , such that l − 2 < xn < l + for all n > N . 11. For any natural number K, there is N , such that |xn − l| <

1 for all n > N . K

The following examples are the most important basic limits. For any given , the analysis leading to the suitable choice of N will be given. It is left to the reader to write down the rigorous formal argument in line with the definition of limit.

12

CHAPTER 1. LIMIT AND CONTINUITY

Example 1.1.4. We have 1 = 0 for a > 0. (1.1.2) na 1 1 1 1 The inequality a < is the same as < a . Thus choosing N = − a should n n make the implication (1.1.1) hold. lim

n→∞

Example 1.1.5. We have lim an = 0 for |a| < 1.

n→∞

Let

(1.1.3)

1 = 1 + b. Then b > 0 and |a| 1 n(n − 1) 2 = (1 + b)n = 1 + nb + b + · · · > nb. n |a | 2

1 . In order to get |an | < , therefore, it suffices to make sure This implies |an | < nb 1 1 < . This suggests us to choose N = . nb b Example 1.1.6. We have √ (1.1.4) lim n n = 1. n→∞ √ Let xn = n n − 1. Then xn > 0 and n(n − 1) 2 n(n − 1) 2 xn + · · · > xn . 2 2 √ . In order to get | n n − 1| = xn < , therefore, it suffices

n = (1 + xn )n = 1 + nxn +

2 n−1 2 2 to make sure < 2 . Thus we may choose N = 2 + 1. n−1 Example 1.1.7. For any a, we have This implies x2n <

an = 0. (1.1.5) n→∞ n! Fix an integer M > |a|. Then for n > M , we have n a |a|M |a| |a| |a| |a| |a|M |a| = · · · ≤ . n! M! M + 1 M + 2 n−1 n M! n n a |a|M |a| Thus in order to get < , we only need to make sure < . This leads n! M! n |a|M +1 to the choice N = max M, . M ! n! 1 (n!)2 1 and < for n > 1. Then use this to Exercise 1.1.11. Prove n < n n (2n)! n+1 n! (n!)2 prove limn→∞ n = limn→∞ = 0. n (2n)! n(n − 1) Exercise 1.1.12. Use the binary expansion of 2n = (1+1)n to prove 2n > . 2 n Then prove limn→∞ n = 0. 2 √ √ Exercise 1.1.13. Prove limn→∞ n 3 = 1 and limn→∞ n n2 + 2n + 3 = 1. r √ n 1 n Exercise 1.1.14. Prove n! > . Then use this to prove limn→∞ √ = 0. n 2 n! lim

1.1. LIMIT OF SEQUENCE

1.1.2

13

Property

A sequence is bounded if there is a constant B, such that |xn | ≤ B for all n. This is equivalent to the existence of constants B1 and B2 , such that B1 ≤ xn ≤ B2 for any n. The constants B, B1 , B2 are respectively called a bound, a lower bound and an upper bound. Proposition 1.1.2. Convergent sequences are bounded. Proof. Suppose limn→∞ xn = l. For = 1 > 0, there is N , such that n > N =⇒ |xn − l| < 1. Moreover, by taking a bigger natural number if necessary, we may further assume N is a natural number. Then xN +1 , xN +2 , . . . , have upper bound l + 1 and lower bound l − 1, and the whole sequence has upper bound max{x1 , x2 , . . . , xN , l + 1} and lower bound min{x1 , x2 , . . . , xN , l − 1}. Exercise 1.1.15. Prove that if |xn | N , then the whole sequence {xn } is bounded. This implies that the boundedness is not changed by modifying finitely many terms in a sequence. Exercise 1.1.16. Suppose limn→∞ xn = 0 and yn is bounded. Prove limn→∞ xn yn = 0.

Proposition 1.1.3 (Arithmetic Rule). Suppose lim xn = l, lim yn = k.

n→∞

n→∞

Then

xn l = , n→∞ n→∞ n→∞ yn k where yn 6= 0 and k = 6 0 are assumed in the third equality. lim (xn + yn ) = l + k, lim xn yn = lk, lim

Proof. For any > 0, there are N1 and N2 , such that n > N1 =⇒ |xn − l| < , 2 n > N2 =⇒ |yn − k| < . 2 Then for n > max{N1 , N2 }, we have |(xn + yn ) − (l + k)| ≤ |xn − l| + |yn − k| <

+ = . 2 2

This completes the proof that limn→∞ (xn + yn ) = l + k. By Proposition 1.1.2, we have |yn | 0, there are N1 and N2 , such that , 2B n > N2 =⇒ |yn − k| < . 2|l| n > N1 =⇒ |xn − l| <

14

CHAPTER 1. LIMIT AND CONTINUITY

Then for n > max{N1 , N2 }, we have |xn yn − lk| = |(xn yn − lyn ) + (lyn − lk)| ≤ |xn − l||yn | + |l||yn − k| <

B + |l| = . 2B 2|l|

This completes the proof that limn→∞ xn yn = lk. Assume yn 6= 0 and k 6= 0. We will prove limn→∞ property of the limit, this implies

1 1 = . By the product yn k

xn 1 1 l = lim xn lim =l = . n→∞ yn n→∞ n→∞ yn k k |k|2 |k| For any > 0, we have 0 = min , > 0. Then there is N , such 2 2 that lim

n > N =⇒ |yn − k| < 0 |k|2 |k| ⇐⇒ |yn − k| < , |yn − k| < 2 2 |k|2 |k| =⇒ |yn − k| < , |yn | > 2 2 2 |k| 1 1 |yn − k| < 2 = . =⇒ − = |k| yn k |yn k| |k| 2 1 1 = . This completes the proof that limn→∞ yn k Example 1.1.8. By the limit (1.1.2) and the arithmetic rule, we have 2n = lim n − 2 n→∞

2

limn→∞ 2 2 = 2. = = 1 1 1−0 1− limn→∞ 1 − limn→∞ n n Here is a more complicated example 1 1 1+2 2 +2 3 n3 + 2n + 2 n n lim = lim 1 1 n→∞ 2n3 + 10n2 + 1 n→∞ 2 + 10 + 3 n n 1 2 1 3 1 + 2 limn→∞ + 2 limn→∞ n n = 1 1 3 2 + 10 limn→∞ + limn→∞ n n 2 3 1+2·0 +2·0 1 = = . 2 + 10 · 0 + 03 2 The idea can be generalized to obtain  if p < q and bq 6= 0 ap np + ap−1 np−1 + · · · + a1 n + a0 0 ap lim = . (1.1.6) if p = q and bq 6= 0 n→∞ bq nq + bq−1 nq−1 + · · · + b1 n + b0  bq lim

n→∞

1.1. LIMIT OF SEQUENCE

15

Exercise 1.1.17. Suppose limn→∞ xn = l and limn→∞ yn = k. Prove limn→∞ max{xn , yn } = max{l, k} and limn→∞ min{xn , yn } = min{l, k}. You may use the formula max{x, y} = 1 (x + y + |x − y|) and the similar one for min{x, y}. 2

Proposition 1.1.4 (Order Rule). Suppose both {xn } and {yn } converge. 1. If xn ≥ yn for big n, then limn→∞ xn ≥ limn→∞ yn . 2. If limn→∞ xn > limn→∞ yn , then xn > yn for big n. A special case of the property is that xn ≤ l implies limn→∞ xn ≤ l, and limn→∞ xn < l implies xn < l for sufficiently big n. Proof. We prove the second statement first. Suppose limn→∞ xn > limn→∞ yn . Then by Proposition 1.1.3, limn→∞ (xn − yn ) = limn→∞ xn − limn→∞ yn > 0. For = limn→∞ (xn − yn ) > 0, there is N , such that n > N =⇒ |(xn − yn ) − | < =⇒ xn − yn > − = 0 ⇐⇒ xn > yn . By exchanging xn and yn in the second statement, we find that lim xn < lim yn =⇒ xn < yn for big n.

n→∞

n→∞

This further implies that we cannot have xn ≥ yn for big n. The combined implication lim xn < lim yn =⇒ opposite of (xn ≥ yn for big n)

n→∞

n→∞

is equivalent to the first statement.

In the second part of the proof above, we used the logical fact that “A =⇒ B” is the same as “(not B) =⇒ (not A)”. Moreover, we note that the following two statements are not opposite of each other. 1. There is N , such that xn < yn for n > N . 2. There is N , such that xn ≥ yn for n > N . Therefore although the proof above showed that the first statement is a consequence of the second, the two statements are not logically equivalent. Proposition 1.1.5 (Sandwich Rule). Suppose xn ≤ yn ≤ zn , lim xn = lim zn = l. n→∞

Then lim yn = l.

n→∞

n→∞

16

CHAPTER 1. LIMIT AND CONTINUITY

Proof. For any > 0, there are N1 and N2 , such that n > N1 =⇒ |xn − l| < , n > N2 =⇒ |zn − l| < . Then n > max{N1 , N2 } =⇒ − < xn − l ≤ yn − l ≤ zn − l < =⇒ |yn − l| < .

n cos n o Example 1.1.9. To find the limit of the sequence , we compare it with n 1 1 cos n 1 1 the sequence in Example 1.1.1. Since − ≤ ≤ and limn→∞ = n n n n n cos n 1 limn→∞ − = 0, we get limn→∞ = 0. n n (−1)n sin n = 0 and limn→∞ = 0. Then by By similar reason, we get limn→∞ n n the arithmetic rule, cos n n sin n (−1)n 1+ n n cos n 1 − lim n→∞ 1 n = lim n→∞ n sin n (−1)n 1 + limn→∞ limn→∞ n n 1−0 = 0. =0 1+0·0 1−

n − cos n 1 lim = lim n→∞ n2 + (−1)n sin n n→∞ n

1 Example 1.1.10. Suppose {xn } is a sequence satisfying |xn − l| < . Then we n 1 1 1 1 = limn→∞ l + = l, by the have l − < xn < l + . Since limn→∞ l − n n n n sandwich rule, we get limn→∞ xn = l. √ √ Example 1.1.11. For any a > 1 and n > a, we have 1 < n a < n n. Thus by the √ limit (1.1.4) and the sandwich rule, we have limn→∞ n a = 1. On the other hand, 1 for 0 < a < 1, we have b = > 1 and a √ n

1 1 √ = 1. a = lim √ = n n→∞ n→∞ b limn→∞ n b √ Combining all the cases, we get limn→∞ n a = 1 for any a > 0. Furthermore, we have 1 1 2 2 1 < (n + a) n+b < (2n) n , 1 < (n2 + an + b) n+c < (2n2 ) n lim

for sufficiently big n. By √ n

√ 2 lim n n = (1 · 1)2 = 1, n→∞ n→∞ n→∞ 2 4 √ √ 2 n 2 n n lim (2n ) = lim 2 lim n = 12 · 14 = 1, 2

lim (2n) n =

n→∞

lim

n→∞

2

n→∞

1.1. LIMIT OF SEQUENCE

17 1

1

and the sandwich rule, we get limn→∞ (n + a) n+b = limn→∞ (n2 + an + b) n+c = 1. The same idea leads to the limit 1

lim (ap np + ap−1 np−1 + · · · + a1 n + a0 ) n+a = 1.

n→∞

(1.1.7)

np for |a| > 1 and any p. The special case p = 0 an is the limit (1.1.3), and the special case p = 1, a = 2 is Exercise 1.1.12. Let |a| = 1 + b. Since |a| > 1, we have b > 0. Fix a natural number P ≥ p. Then for n > P ,

Example 1.1.12. Consider limn→∞

n(n − 1) 2 n(n − 1) · · · (n − P ) P +1 b + ··· + b + ··· 2 (P + 1)! n(n − 1) · · · (n − P ) P +1 > b . (P + 1)!

|a|n = 1 + nb +

Thus p n (P + 1)! nP nP 0 < n ≤ n < a |a| n(n − 1) · · · (n − P ) bP +1 1 n n n (P + 1)! = ··· . n (n − 1) (n − 2) (n − P ) bP +1 1 n = 0, limn→∞ = 1, the fact that P and b are fixed constants, n n−k and the arithmetic rule, the right side has limit 0 as n → ∞. By the sandwich rule, we conclude that By limn→∞

np = 0 for |a| > 1 and any p. n→∞ an lim

(1.1.8)

Exercise 1.1.18. Redo Exercise 1.1.4 by using the sandwich rule. 1 Exercise 1.1.19. Let a > 0 be a constant. Then < a < n for big n. Use this and n √ the limit (1.1.4) to prove limn→∞ n a = 1. Exercise 1.1.20. Compute the limits. 2n7/4 − 3n3/2 . (3n3/4 − n1/2 + 1)(n + 2) √ 2. limn→∞ ( n2 + n − n).

1. limn→∞

3. limn→∞

3n−1 − 8 · 7n + (−1)n+1 . 8n−1 + 2(n + 1)(−5)n (−1)n n

4. limn→∞ √

5. limn→∞

n3

+1

+ (−1)n

.

n! + 10n . n10 + nn

Exercise 1.1.21. Compute the limits. 1. limn→∞

an , where a 6= −1. an + 1

6. limn→∞ 7. limn→∞ 8. limn→∞ 9. limn→∞ 10. limn→∞ 11. limn→∞

(n2 + n + 3)(n!2n + 5n ) . (n + 2)!2n + 5n p √ n n + cos n + sin n. √ n 11 · 2n + 2 · 5n + 5 · 11n . p n 11n n − 5n (n2 + 1). q p √ n 1 + n 2 + n 3 + n. r n 1 · 3 · 5 · · · (2n − 1) . 2 · 4 · 5 · · · 2n

18

CHAPTER 1. LIMIT AND CONTINUITY n

2. limn→∞ (n + a) n2 +bn+c . √ 3. limn→∞ n an + bn + cn , where a, b, c > 0. 1 1 1 4. limn→∞ √ +√ + ··· + √ . 1 + n2 2 + n2 n + n2

1.1.3

Infinity and Infinitesimal

A changing numerical quantity is an infinity if it tends to get arbitrarily big. For sequences, this means the following. Definition 1.1.6. A sequence {xn } diverges to infinity, denoted limn→∞ xn = ∞, if for any b, there is N , such that n > N =⇒ |xn | > b.

(1.1.9)

It diverges to positive infinity, denoted limn→∞ xn = +∞, if for any b, there is N , such that n > N =⇒ xn > b. (1.1.10) It diverges to negative infinity, denoted limn→∞ xn = −∞, if for any b, there is N , such that n > N =⇒ xn < b. (1.1.11) Example 1.1.13. We rigorously verify limn→∞

n2 = +∞. For any b > 0, n + (−1)n

choose N = 2b. Then n > N =⇒

n2 n2 n2 N ≥ > > = b. n n + (−1) n+1 2n 2

Exercise 1.1.22. Rigorously verify the divergence to infinity. 1. limn→∞ (100 + 10n − n2 ) = −∞. √ (−1)n n 2. limn→∞ = ∞. 2 + sin n r q p √ 3. limn→∞ n + (n − 1) + · · · + 2 + 1 = +∞. Exercise 1.1.23. Infinities must be unbounded. Is the converse true? Exercise 1.1.24. Suppose limn→∞ xn = +∞ and limn→∞ yn = +∞. Prove limn→∞ (xn + yn ) = +∞ and limn→∞ xn yn = +∞. Exercise 1.1.25. Suppose limn→∞ xn = ∞ and |xn − xn+1 | < c for some constant c. 1. Prove that either limn→∞ xn = +∞ or limn→∞ xn = −∞. 2. If we further know limn→∞ xn = +∞, prove that for any a > x1 , some term xn lies in the interval (a, a + c).

1.1. LIMIT OF SEQUENCE

19

A changing numerical quantity is an infinitesimal if it tends to get arbitrarily small. For sequences, this means that for any > 0, there is N , such that n > N =⇒ |xn | < . (1.1.12) This simply means limn→∞ xn = 0. Note that the implications (1.1.9) and (1.1.12) are equivalent by changing 1 1 and taking = . Therefore we have xn to xn b 1 {xn } is an infinity ⇐⇒ is an infinitesimal. xn For example, the infinitesimals (1.1.2),(1.1.3), (1.1.5), n (1.1.8) tell us that n! a {na } (for a > 0), {an } (for |a| > 1), , and (for |a| > 1) are an np infinities. Moreover, the first case in the limit (1.1.6) tells us ap np + ap−1 np−1 + · · · + a1 n + a0 = ∞ if p > q and ap 6= 0. n→∞ bq nq + bq−1 nq−1 + · · · + b1 n + b0 lim

On the other hand, since limn→∞ xn = l is equivalent to limn→∞ (xn −l) = 0, we have {xn } converges to l ⇐⇒ {xn − l} is an infinitesimal. √ For example, the limit (1.1.4) tells us that { n n − 1} is an infinitesimal. Exercise 1.1.26. How to characterize a positive infinity {xn } in terms of the in1 finitesimal ? xn Exercise 1.1.27. Explain the infinities. 1. limn→∞

n! = ∞ for any a 6= 0. an

2. limn→∞

n! = ∞ if a + b 6= 0. an + bn

3. limn→∞ √ n

1 = +∞. n−1

4. limn→∞ √ n

1 √ = −∞. n − n 2n

Some properties of finite limits can be extended to infinities and infinitesimals. For example, if limn→∞ xn = +∞ and limn→∞ yn = +∞, then limn→∞ (xn + yn ) = +∞. The property can be denoted as the arithmetic rule (+∞) + (+∞) = +∞. Moreover, if limn→∞ xn = 1, limn→∞ yn = 0, and xn yn < 0 for big n, then limn→∞ = −∞. Thus we have another arithmetic yn 1 rule − = −∞. Common sense suggests more arithmetic rules such as 0 c c c+∞ = ∞, c·∞ = ∞(for c 6= 0), ∞·∞ = ∞, = ∞(for c 6= 0), = 0, 0 ∞

20

CHAPTER 1. LIMIT AND CONTINUITY

where c is a finite number and represents a sequence convergent to c. We must be careful in applying arithmetic rules involving infinities and infinitesimals. For example, we have lim n−1 = 0, lim 2n−1 = 0, lim n−2 = 0,

n→∞

n−1 = n→∞ 2n−1 0 This shows that has no 0 lim

n→∞

n→∞

n−1 1 n−2 , lim −2 = +∞, lim = 0. n→∞ 2n−1 2 n→∞ n definite value.

Example 1.1.14. By the (extended) arithmetic rule, we have lim (n2 + 3)(−2)n = lim (n2 + 3) lim (−2)n = ∞ · ∞ = ∞. n→∞ n→∞ 1 1 lim n + = lim n + lim = (+∞) + 0 = +∞. n→∞ n→∞ n→∞ n n √ √ n n limn→∞ n n 1 √ lim √ = = lim + = +∞. n n n→∞ n−1 limn→∞ ( n − 1) n→∞ 0

n→∞

Exercise 1.1.28. Prove the properties of infinities. 1. (bounded)+∞ = ∞: If {xn } is bounded and limn→∞ yn = ∞, then limn→∞ (xn + yn ) = ∞. 2. min{+∞, +∞} = +∞: If limn→∞ xn = limn→∞ yn = +∞, then limn→∞ min{xn , yn } = +∞. 3. Sandwich rule: If xn ≥ yn and limn→∞ yn = +∞, then limn→∞ xn = +∞. 4. (> c > 0)·(+∞) = +∞: If xn > c for some constant c > 0 and limn→∞ yn = +∞, then limn→∞ xn yn = +∞. Exercise 1.1.29. Show that it is not necessarily true that ∞ + ∞ = ∞ by constructing examples of sequences {xn } and {yn } that diverge to ∞ but one of the following holds. 1. limn→∞ (xn + yn ) = 2. 2. limn→∞ (xn + yn ) = +∞. 3. {xn + yn } is bounded and divergent. Exercise 1.1.30. Show that one cannot make a definite conclusion on 0 · ∞ by constructing examples of sequences {xn } and {yn }, such that limn→∞ xn = 0 and limn→∞ yn = ∞ but one of the following holds. 1. limn→∞ xn yn = 2. 2. limn→∞ xn yn = 0. 3. limn→∞ xn yn = ∞. 4. {xn yn } is bounded and divergent. Exercise 1.1.31. Provide counterexamples to the wrong extensions of the arithmetic rules. +∞ = 1, (+∞) − (+∞) = 0, 0 · ∞ = 0, 0 · ∞ = ∞, 0 · ∞ = 1. +∞

1.2. CONVERGENCE OF SEQUENCE LIMIT

1.1.4

21

Additional Exercise

Ratio Rule xn+1 yn+1 . Exercise 1.1.32. Suppose ≤ x n yn 1. Prove that |xn | ≤ c|yn | for some constant c. 2. Prove that limn→∞ yn = 0 implies limn→∞ xn = 0. 3. Prove that limn→∞ xn = ∞ implies limn→∞ yn = ∞. xn+1 = l. What can you say about limn→∞ xn Exercise 1.1.33. Suppose limn→∞ xn by looking at the value of l? (n!)2 an Exercise 1.1.34. Use the ratio rule to study the limits (1.1.8) and limn→∞ . (2n)!

Power Rule Exercise 1.1.35. Suppose limn→∞ xn = l > 0. Prove limn→∞ xαn = lα by the following steps. 1. Assume xn ≥ 1 and l = 1. By using the sandwich rule and the fact that limn→∞ xan = 1 for any integer a, prove that limn→∞ xαn = 1 for any number α. The same argument applies to the case xn ≤ 1. 2. Use min{xn , 1} ≤ xn ≤ max{xn , 1}, Exercise 1.1.17 and the sandwich rule to remove the assumption xn ≥ 1 in the first part. 3. Use the arithmetic rule to prove the limit for general l.

Average Rule x1 + x2 + · · · + xn . n 1. Prove that if |xn − l| < for n > N , where N is a natural number, then

Exercise 1.1.36. Suppose limn→∞ xn = l. Let yn =

n > N =⇒ |yn − l| <

|x1 | + |x2 | + · · · + |xN | + N |l| + . n

2. Use the first part and Proposition 1.1.2 to prove limn→∞ yn = l. 3. What happens if l = +∞ or ∞? Exercise 1.1.37. Find suitable condition on a sequence {an } of positive numbers, a1 x1 + a2 x2 + · · · + an xn = l. such that limn→∞ xn = l implies limn→∞ a1 + a2 + · · · + an

1.2

Convergence of Sequence Limit

The discussion of convergent sequences in Section 1.1 is based on the explicit value of the limit. However, there are many cases that a sequence must be convergent, but the value of the limit is not known. The limit of the world record in 100 meter dash is one such example. In such cases, the existence of the limit cannot be established by using the definition alone. A more fundamental theory is needed.

22

CHAPTER 1. LIMIT AND CONTINUITY

1.2.1

Necessary Condition

A subsequence of a sequence {xn } is obtained by selecting some terms from the sequence. The indices of the selected terms can be arranged as a strictly increasing sequence n1 < n2 < · · · < nk < · · · , and the subsequence can be denoted as {xnk }. The following are two examples of subsequences. {x3k } : x3 , x6 , x9 , x12 , x15 , x18 , . . . {x2k } : x2 , x4 , x8 , x16 , x32 , x64 , . . . Note that if {xn } starts from n = 1, then nk ≥ k. Thus by reindexing the terms if necessary, we will always assume nk ≥ k in subsequent proofs. Proposition 1.2.1. Suppose a sequence converges to l. Then all its subsequences converge to l. Proof. Suppose limn→∞ xn = l. For any > 0, there is N , such that n > N implies |xn − l| < . Then k > N =⇒ nk ≥ k > N =⇒ |xnk − l| < .

√ √ 2k Example 1.2.1. By limn→∞ n√n = 1, we have lim 2k = 1. Taking the square k→∞ √ 2k n 2 of the limit, we get limn→∞ 2n = (limk→∞ 2k) = 1. Example 1.2.2. The sequence {(−1)n } has subsequences {(−1)2k } = {1} and {(−1)2k+1 } = {−1}. Since the two subsequences have different limits, the original sequence {(−1)n } diverges. Exercise 1.2.1. Prove the sequences diverge. 1.

(−1)n 2n + 1 . n+2

(−1)n 2n(n + 1) √ . ( n + 2)3 p √ p 3. n n + (−1)n − n − (−1)n . 2.

nπ n sin 3 . 4. nπ +2 n cos 2

5.

(−1)n n . n+1

6. x2n = 7.

√ 1 , x2n+1 = n n. n

p n 2n + 3(−1)n n .

8. 1 + n sin 9. cosn

nπ . 2

2nπ . 3

Exercise 1.2.2. Prove that limn→∞ xn = l if and only if limk→∞ x2k = l and limk→∞ x2k+1 = l. Exercise 1.2.3. What is wrong with the following application of Propositions 1.1.3 and 1.2.1: The sequence xn = (−1)n satisfies xn+1 = −xn . Therefore limn→∞ xn = limn→∞ xn+1 = − limn→∞ xn , and we get limn→∞ xn = 0. Exercise 1.2.4. Prove that if a sequence diverges to infinity, then all its subsequences diverge to infinity.

1.2. CONVERGENCE OF SEQUENCE LIMIT

23

Theorem 1.2.2 (Cauchy1 Criterion). Suppose a sequence {xn } converges. Then for any > 0, there is N , such that m, n > N =⇒ |xm − xn | < . Proof. Suppose limn→∞ xn = l. For any > 0, there is N , such that n > N implies |xn − l| < . Then m, n > N implies 2 |xm − xn | = |(xm − l) − (xn − l)| ≤ |xm − l| + |xn − l| <

+ = . 2 2

Cauchy criterion plays a critical role in analysis. Therefore the criterion is called a theorem instead of just a proposition. Moreover, the property described in the criterion is given a special name. Definition 1.2.3. A sequence {xn } is called a Cauchy sequence if for any > 0, there is N , such that m, n > N =⇒ |xm − xn | < .

(1.2.1)

Theorem 1.2.2 says that convergent sequences must be Cauchy sequences. The converse that any Cauchy sequence is convergent is also true and is one of the most fundamental results in analysis. Example 1.2.3. Consider the sequence {(−1)n }. For = 1 > 0 and any N , we can find an even n > N . Then m = n + 1 > N is odd and |xm − xn | = 2 > . Therefore the Cauchy criterion fails and the sequence diverges. Example 1.2.4 (Oresme2 ). The harmonic sequence xn = 1 +

1 1 1 + + ··· + 2 3 n

satisfies x2n − xn =

1 1 1 1 1 1 n 1 + + ··· + ≥ + + ··· + = = . n+1 n+2 2n 2n 2n 2n 2n 2

1 1 Thus for = and any N , we have |xm − xn | > by taking any natural number 2 2 n > N and m = 2n. Therefore the Cauchy criterion fails and the harmonic sequence diverges. 1

Augustin Louis Cauchy, born 1789 in Paris (France), died 1857 in Sceaux (France). His contributions to mathematics can be seem by the numerous mathematical terms bearing his name, including Cauchy integral theorem (complex functions), Cauchy-Kovalevskaya theorem (differential equations), Cauchy-Riemann equations, Cauchy sequences. He produced 789 mathematics papers and his collected works were published in 27 volumes. 2 Nicole Oresme, born 1323 in Allemagne (France), died 1382 in Lisieux (France). Oresme is best known as an economist, mathematician, and a physicist. He was one of the most famous and influential philosophers of the later Middle Ages. His contributions to mathematics were mainly contained in his manuscript Tractatus de configuratione qualitatum et motuum (Treatise on the Configuration of Qualities and Motions).

24

CHAPTER 1. LIMIT AND CONTINUITY

Example 1.2.5. We show the sequence {sin n} diverges. The real line is divided π π by kπ ± into intervals of length > 1. Therefore for any integer k, there are 4 2 integers m and n satisfying 2kπ +

π 3π π 3π < m < 2kπ + , 2kπ − > n > 2kπ − . 4 4 4 4

Moreover, by taking k to be a big positive number, m and n can be as big as we √ 1 1 wish. Then sin m > √ , sin n < − √ , and we have | sin m − sin n| > 2. Thus 2 2 the sequence {sin n} is not Cauchy and must diverge. Exercise 1.2.5. Prove the sequences diverge. 1 1 1 1. xn = 1 + √ + √ + · · · + √ . n 2 3

2. xn =

1 2 3 n + + + ··· + 2 . 2 5 10 n +1

Exercise 1.2.6. Prove that if limn→∞ xn = +∞ and |xn − xn+1 | < c for some constant c < π, then {sin xn } diverges. Exercise 1.1.25 might be helpful here.

1.2.2

Supremum and Infimum

To discuss the converse of Theorem 1.2.2, we have to consider the difference between rational and real numbers. Specifically, consider the converse of Cauchy criterion stated for the real and the rational number systems: 1. Real number Cauchy sequences always have real number limits. 2. Rational number Cauchy sequences always have rational number limits. The key distinction here is that a sequence of rational numbers may have an irrational number as the limit. For √ example, the rational number sequence of the decimal approximations of 2 in Example 1.1.3 is a Cauchy sequence but has no rational number limit. This shows that the second statement is wrong. Therefore the truthfulness of the first statement is closely related to the fundamental question of the definition of real numbers. In other words, the establishment of the first property must also point to the key difference between the rational and the real number systems. One solution to the fundamental question is to simply use the converse of Cauchy criterion as the way of constructing real numbers from rational numbers (by requiring that all Cauchy sequences converge). This is the topological approach and can be dealt with in the larger context of the completion of metric spaces. Alternatively, real numbers can be constructed by considering the order among the numbers. The subsequent discussion will be based on this more intuitive approach, which is called the Dedekind3 cut. The order relation between real numbers enables us to introduce the following concept. 3 Julius Wilhelm Richard Dedekind, born 1831 and died 1916 in Braunschweig (Germany). Dedekind came up with the idea of the cut on November 24 of 1858 while thinking how to teach calculus. He made important contributions to algebraic number theory. His work introduced a new style of mathematics that influenced generations of mathematicians.

1.2. CONVERGENCE OF SEQUENCE LIMIT

25

Definition 1.2.4. Let X be a nonempty set of numbers. An upper bound of X is a number B such that x ≤ B for any x ∈ X. The supremum of X is the least upper bound of the set and is denoted sup X. The supremum λ = sup X is characterized by the following properties. 1. λ is an upper bound: For any x ∈ X, we have x ≤ λ. 2. Any number smaller than λ is not an upper bound: For any > 0, there is x ∈ X, such that x > λ − . The lower bound and the infimum inf X can be similarly defined and characterized. Example 1.2.6. Both the set {1, 2} and the interval [0, 2] have 2 as the supremum. In general, the maximum of a set X is a number ξ ∈ X such that ξ ≥ x for any x ∈ X, and the maximum (if exists) is always the supremum. On the other hand, the interval (0, 2) has no maximum but still has 2 as the supremum. The similar discussion on minimum can be made. √ Example 1.2.7. 2 is the supremum of the set {1.4, 1.41, 1.414, 1.4142, 1.41421, 1.414213, 1.4142135, 1.41421356, . . . } of its decimal expansions. It is also the supremum of the set nm o 2 2 : m and n are natural numbers satisfying m < 2n n of positive rational numbers whose squares are less than 2. Example 1.2.8. Let Ln be the length of an edge of the inscribed regular n-gon in a circle of radius 1. Then 2π is the supremum of the set {3L3 , 4L4 , 5L5 , . . . } of the circumferences of the inscribed regular n-gons. Exercise 1.2.7. Find the suprema and the infima. 1. {a + b : a, b are rational. a2 < 3, |2b + 1| < 5}. n : n is a natural number . 2. n+1 (−1)n n 3. : n is a natural number . n+1 nm o 4. : m and n are natural numbers satisfying m2 > 3n2 . n 1 1 5. + : m and n are natural numbers . 2m 3n 6. {nRn : n ≥ 3 is a natural number}, where Rn is the length of an edge of the circumscribed regular n-gon around a circle of radius 1. Exercise 1.2.8. Prove the supremum is unique. Exercise 1.2.9. Suppose X is a nonempty bounded set of numbers. Prove that λ = sup X is characterized by the following two properties.

26

CHAPTER 1. LIMIT AND CONTINUITY 1. λ is an upper bound: For any x ∈ X, we have x ≤ λ. 2. λ is the limit of a sequence in X: There are xn ∈ X, such that λ = limn→∞ xn .

The following are some properties of the supremum and infimum. Proposition 1.2.5. Suppose X and Y are nonempty bounded sets of numbers. 1. If x ≤ y for any x ∈ X and y ∈ Y , then sup X ≤ inf Y . 2. If |x − y| ≤ c for any x ∈ X and y ∈ Y , then | sup X − sup Y | ≤ c and | sup X − inf Y | ≤ c. 3. If X + Y = {x + y : x ∈ X, y ∈ Y }, then sup(X + Y ) = sup X + sup Y and inf(X + Y ) = inf X + inf Y . 4. If cX = {cx : x ∈ X}, then sup(cX) = c sup X when c > 0 and sup(cX) = c inf X when c < 0. 5. If XY = {xy : x ∈ X, y ∈ Y } and all numbers in X, Y are positive, then sup(XY ) = sup X sup Y and inf(XY ) = inf X inf Y . 6. If X −1 = {x−1 : x ∈ X} and all numbers in X are positive, then sup X −1 = (inf X)−1 . Proof. In the first property, fix any y ∈ Y . Then y is an upper bound of X. Therefore sup X ≤ y. Since sup X ≤ y for any y ∈ Y , sup X is a lower bound of Y . Therefore sup X ≤ inf Y . Now consider the second property. For any > 0, there are x ∈ X and y ∈ Y , such that sup X − < x ≤ sup X and sup Y − < y ≤ sup Y . Then (sup X − ) − sup Y < x − y 0, we conclude that | sup X − sup Y | ≤ c. The inequality | sup X − inf Y | ≤ c can be similarly proved. Next we prove the third property. For any x ∈ X and y ∈ Y , we have x + y ≤ sup X +sup Y , so that sup X +sup Y is an upper bound of X +Y . On the other hand, for any > 0, there are x, y ∈ X, Y , such that x > sup X − and y > sup Y −. Then x+y ∈ X +Y satisfies x+y > sup X +sup Y −2. Since > 0 is arbitrary, this shows that any number smaller than sup X + sup Y is not an upper bound of X + Y . Therefore sup X + sup Y is the supremum of X + Y . The proof of the rest are left as exercises. Exercise 1.2.10. Finish the proof of Proposition 1.2.5.

1.2. CONVERGENCE OF SEQUENCE LIMIT

27

Exercise 1.2.11. Suppose Xi are nonempty sets of numbers. Let X = ∪i Xi and li = sup Xi . Prove that sup X = supi li .

The existence of the supremum is what distinguishes the real numbers from the rational numbers. Definition 1.2.6. Real numbers is a set with the usual arithmetic operations and the order satisfying the usual properties, and the additional property that any bounded set of real numbers has the supremum. The arithmetic operations are addition, subtraction, multiplication and division. An order on a set S is a relation x < y defined for pairs x, y ∈ S, satisfying the following properties: • transitivity: x < y and y < z =⇒ x < z. • exclusivity: If x < y, then x 6= y. The two properties imply that x < y and y < x cannot happen at the same time. The following are some (but not all) of the usual arithmetic and order properties. • commutativity: a + b = b + a, ab = ba. • distributivity: a(b + c) = ab + ac. • unit: There is a special number 1 such that 1a = a. • order compatible with addition: a < b =⇒ a + c < b + c. • order compatible with multiplication: a < b, 0 < c =⇒ ac < bc. Because of these properties, the real numbers form an ordered field. Since the rational numbers also has the arithmetic operations and the order satisfying these usual properties, the rational numbers also form an ordered field. Thus the key distinction between the real and rational numbers is the existence of the supremum. A bounded set of rational numbers may not have rational number supremum. A bounded set of real numbers always has real number supremum. Due to the existence of supremum, the real numbers form a complete ordered field. Finally, we remark that the real numbers is the logical foundation of mathematical analysis. In this course, all the proof are logically derived from the properties (or axioms) about the arithmetic operations and the order (including the existence of the supremum). In fact, we will also use the exponential ab for any positive real number a and any real number b. This means the existence and the usual properties of such exponentials. Some (but not all) properties are listed below. • zero: a0 = 1. • unit: a1 = a, 1a = 1. • exponential compatible with addition: ab+c = ab ac .

28

CHAPTER 1. LIMIT AND CONTINUITY • exponential compatible with multiplication: abc = (ab )c , (ab)c = ac bc . • exponential compatible with order: a > b, c > 0 =⇒ ac > bc ; and a > 1, b > c =⇒ ab > ac .

Note that the exponential and the associated properties are not assumptions added to the real numbers. They can be derived from the existing arithmetic operations and order relation.

1.2.3

Monotone Sequence

A sequence {xn } is increasing if xn+1 ≥ xn . It is strictly increasing if xn+1 > xn . The concepts of decreasing and strictly decreasing sequences can be similarly defined. A sequence is monotone if it is either increasing or decreasing. Proposition 1.2.7. Bounded monotone sequences of real numbers are convergent. Unbounded monotone sequences of real numbers diverge to infinity. Since an increasing sequence {xn } satisfies xn ≥ x1 , the sequence has x1 as a lower bound. Therefore it is bounded if and only if it has an upper bound, and the proposition says that an increasing sequence with upper bound must be convergent. Similar remarks can be made for decreasing sequences. Proof. Let {xn } be a bounded increasing sequence. The sequence has a real number supremum l = sup{xn }. For any > 0, by the second property characterizing the supremum, there is N , such that xN > l −. Then because the sequence is increasing, n > N implies xn ≥ xN > l − . We also have xn ≤ l because l is an upper bound. Thus we conclude that n > N =⇒ l − < xn ≤ l =⇒ |xn − l| < . This proves that the sequence converges to l. Let {xn } be an unbounded increasing sequence. Then it has no upper bound. In other words, for any b, there is N , such that xN > b. Since the sequence is increasing, we have n > N =⇒ xn > xN > b. Thus we conclude that {xn } diverges to +∞. The proof for the decreasing sequences is similar. Example 1.2.9. The sequence in Example 1.2.4 is clearly increasing. Since it is divergent, the sequence has no upper bound. In fact, the proof of Proposition 1.2.7 tells us 1 1 1 lim 1 + + + · · · + = +∞. n→∞ 2 3 n On the other hand, the increasing sequence xn = 1 +

1 1 1 + 2 + ··· + 2 2 2 3 n

1.2. CONVERGENCE OF SEQUENCE LIMIT

29

satisfies 1 1 1 + + ··· + 1·2 2·3 (n − 1)n 1 1 1 1 1 1 =1+ + + ··· + − − − 1 2 2 3 n−1 n 1 1 = 1 + − < 2, 1 n

xn < 1 +

and must be convergent. Much later on, we will see that the sequence xn = 1 +

1 1 1 + p + ··· + p p 2 3 n

converges if and only if p > 1. Example 1.2.10. Suppose a sequence is given inductively by √ x1 = 1, xn+1 = 2 + xn . We claim that the sequence is increasing. It is easy to see that xn+1 > xn is equivalent to x2n − xn − 2 = (xn − 2)(xn + 1) < 0. Since the sequence is clearly positive,√the problem becomes xn < 2. First x1 < 2. Second if xn < 2, then xn+1 < 2 + 2 = 2. Thus the induction proves the inequality xn < 2. Since the sequence is increasing and has upper bound 2, it is convergent. The limit l satisfies l2 = lim x2n+1 = 2 + lim xn = 2 + l. n→∞

n→∞

Solving the equation for l ≥ 0, we conclude that the limit is l = 2. 1 n Example 1.2.11. Let xn = 1 + . The binomial expansion tells us n 1 n(n − 1) 1 2 n(n − 1)(n − 2) 1 3 xn = 1 + n + + n 2! n 3! n n n(n − 1) · · · 1 1 + ··· + n! n 1 1 1 1 1 2 =1+ + 1− + 1− 1− 1! 2! n 3! n n 1 1 2 n−1 + ··· + 1− 1− ··· 1 − . n! n n n By comparing the similar formula for xn+1 , we find the sequence is strictly increasing. The formula also tells us (see Example 1.2.9) xn < 1 + 1 +

1 1 1 1 1 1 + + ··· + <1+1+ + + ··· + < 3. 2! 3! n! 1·2 2·3 (n − 1)n

Therefore the sequence converges. The limit has a special notation 1 n e = lim 1 + = 2.71828182845904 · · · n→∞ n and is a fundamental constant of the nature. Exercise 1.2.12. Prove the sequences converge.

30

CHAPTER 1. LIMIT AND CONTINUITY 1. xn = 1 +

1 1 1 + 3 + ··· + 3. 3 2 3 n

3. xn = 1 +

1 1 1 + 3 + ··· + n. 2 2 3 n

2. xn = 1 +

1 1 1 + + ··· + . 2! 3! n!

4. xn = 1 +

1 1 1 + 2 + ··· + n. 2 2 2

Exercise 1.2.13. Prove that limn→∞ an = 0 for |a| < 1 in the following steps. 1. Prove that for 0 ≤ a < 1, the sequence {an } converges. 2. Prove that the limit of {an } must be 0. 3. Prove that limn→∞ an = 0 for −1 < a ≤ 0. √ 1 1 1 Exercise 1.2.14. Consider the sequence xn = 1 + √ + √ + · · · + √ − 2 n + α. n 2 3 1 1. Prove that if α ≤ , then the sequence is strictly decreasing. 2 1 1 2. Prove that if α > , then the sequence is strictly increasing for n > . 2 16α − 8 3. Prove the sequence converges to the same limit for all α. r q p √ Exercise 1.2.15. For any a > 0, let xn = a + a + a + · · · + a, where the √ 1 + 4a + 1 square root appears n times. Prove xn < xn+1 < and find the limit 2 of the sequence. The result can be extended to a sequence defined by x1 ≤ b, xn+1 = f (xn ), where f (x) be a function satisfying b ≥ f (x) ≥ x for x ≤ b. See Figure 1.3.

.. ..... .... . . .. .. .. ..... ............................. . .. . .... .............. .. ..................................................................... .. ...... .............................................................. ... ......................................................... ... ... ..................... ................ ...... .. .. .. .. ... .. . . . ...... ... ........... ... ... ... .. . . . . . .......................................... ... ... ... ... . ... . .. . . . . . ....... ... ...... ... ... ... ... ... . . . . .. .. .. .. .. .. .. ..... ... ........ . . . . . . . . .............................................................................................................................................................................................. ... .. x1 x2 x3 x4x5 b .. .. Figure 1.3: recursively defined convergent sequence Exercise 1.2.16. Prove the inductively defined sequences converge and find the limits. 1. x1 = 1, xn+1 = 1 +

2 . xn

2. x1 = 1, xn+1 =

x2n + 2 . 2xn

1.2. CONVERGENCE OF SEQUENCE LIMIT

31

Exercise 1.2.17. Discuss the convergence of the sequence defined by x1 = α, xn+1 = γ β + γxn . Do the same to the sequence defined by x1 = α, xn+1 = β + . xn Exercise 1.2.18. Let a, b > 0. Define sequences by a1 = a, b1 = b, an = an−1 + bn−1 a+b 2an−1 bn−1 2ab . Use , bn = ≥ to prove the sequences con2 an−1 + bn−1 2 a+b verge. Moreover, find the limits. 1 n 1 n+1 Exercise 1.2.19. Let xn = 1 + and yn = 1 + . n n 1. Use induction to prove (1+x)n ≥ 1+nx for x > −1 and any natural number n. 2. By showing decreasing.

xn+1 yn−1 > 1 and > 1, prove {xn } is increasing and {yn } is xn yn

3. Prove {xn } and {yn } converge to the same limit e. e . n 1 −n 5. Prove limn→∞ 1 − = e. n 4. Prove e − xn <

Exercise 1.2.20. Prove e ≥ 1 +

1.2.4

1 1 1 + + ··· + for any n. 1! 2! n!

Convergent Subsequence

By Proposition 1.1.2, any convergent sequence is bounded. While bounded sequences may not converge (see Example 1.2.2), the following still holds. Theorem 1.2.8 (Bolzano4 -Weierstrass5 Theorem). A bounded sequence of real numbers has a convergent subsequence. Recall that by Proposition 1.2.1, any subsequence of a convergent sequence is convergent. Theorem 1.2.8 shows that if the original sequence is only assumed to be bounded, then “any subsequence” should be changed to “some subsequence”. Proof. Let {xn } be a bounded sequence. Then all xn lie in a bounded interval I = [a, b]. 4

Bernard Placidus Johann Nepomuk Bolzano, born 1781 and died 1848 in Prague (Bohemia, now Czech). Bolzano insisted that many results which were thought ”obvious” required rigorous proof and made fundamental contributions to the foundation of mathematics. He understood the need to redefine and enrich the concept of number itself and define the Cauchy sequence four years before Cauchy’s work appeared. 5 Karl Theodor Wilhelm Weierstrass, born 1815 in Ostenfelde (Germany), died 1848 in Berlin (Germany). In 1864, he found a continuous but nowhere differentiable function. His lectures on analytic functions, elliptic functions, abelian functions and calculus of variations influenced many generations of mathematicians, and his approach still dominates the teaching of analysis today.

32

CHAPTER 1. LIMIT AND CONTINUITY

a + b a + b Divide I into two equal halves I 0 = a, and I 00 = , b . Then 2 2 either I 0 or I 00 must contain infinitely many xn . We denote this interval by I1 = [a1 , b1 ] and find xn1 ∈ I1 . a1 + b 1 0 and I100 = Further divide I1 into two equal halves I1 = a1 , 2 a1 + b1 , b1 . Then either I10 or I100 must contain infinitely many xn . We 2 denote this interval by I2 = [a2 , b2 ]. Because I2 contains infinitely many xn , we can find xn2 ∈ I2 with n2 > n1 . Keep going, we get a sequence of intervals I = [a, b] ⊃ I1 = [a1 , b1 ] ⊃ I2 = [a2 , b2 ] ⊃ · · · ⊃ Ik = [ak , bk ] ⊃ · · · with the length of Ik being bk −ak =

b−a . Moreover, we have a subsequence 2k

{xnk } satisfying xnk ∈ Ik . The inclusion relation between the intervals tells us

a ≤ a1 ≤ a2 ≤ · · · ≤ ak ≤ · · · ≤ bk ≤ · · · ≤ b2 ≤ b1 ≤ b. Thus {ak } and {bk } are bounded and monotone sequences. By Proposition 1.2.7, both sequences converge. Moreover, the length of Ik and the limit (1.1.3) tell us limk→∞ (bk − ak ) = 0. Therefore the two sequences have the same limit. Denote l = limk→∞ ak = limk→∞ bk . The property xnk ∈ Ik means ak ≤ xnk ≤ bk . By the sandwich rule, we get limk→∞ xnk = l. Thus we find a convergent subsequence {xnk }. Example 1.2.12. Let {xn }, {yn }, {zn } be sequences converging to l1 , l2 , l3 , respectively. Then for the sequence x1 , y1 , z1 , x2 , y2 , z2 , x3 , y3 , z3 , . . . , xn , yn , zn , . . . the limits of convergent subsequences are l1 , l2 , l3 . We need to explain that any l 6= l1 , l2 , l3 is not the limit of any subsequence. 1 There is > 0, such that (take = min{|l − l1 |, |l − l2 |, |l − l3 |}, for example) 2 |l − l1 | ≥ 2, |l − l2 | ≥ 2, |l − l3 | ≥ 2. Then there are N1 , N2 , N3 , such that n > N1 =⇒ |xn − l1 | < , n > N2 =⇒ |yn − l2 | < , n > N3 =⇒ |zn − l3 | < . Since |l − l1 | ≥ 2 and |xn − l1 | < imply |xn − l| > , we have n > max{N1 , N2 , N3 } =⇒ |xn − l| > , |yn − l| > , |zn − l| > . This implies that l cannot be the limit of any convergent subsequence.

1.2. CONVERGENCE OF SEQUENCE LIMIT

33

A more direct argument about the limits of convergent subsequences is the following. Let l be the limit of a convergent subsequence {wm } of the combined sequence. The subsequence {wm } must contain infinitely many terms from at least one of the three sequences. If {wm } contains infinitely many terms from {xn }, then it contains a subsequence {xnk } of {xn }. By Proposition 1.2.1, we get l = lim wm = lim xnk = lim xn = l1 . The second equality is due to the fact that {xnk } is a subsequence of {wm }. The third equality is due to {xnk } being a subsequence of {xn }. Exercise 1.2.21. For sequences in Exercise 1.2.1, find all the limits of convergent subsequences. n1 n2 Exercise 1.2.22. Any real number is the limit of a sequence of the form , , 10 100 n3 , . . . , where nk are integers. Based on this observation, construct a sequence 1000 so that the limits of convergent subsequences are all the numbers between 0 and 1. Exercise 1.2.23. Prove that a number is the limit of a convergent subsequence of √ {xn } if and only if it is the limit of a convergent subsequence of {xn n n}. Exercise 1.2.24. Suppose {xn } and {yn } are two bounded sequences. Prove that there are nk , such that both subsequences {xnk } and {ynk } converge.

The following technical result provides a criterion for a number to be the limit of a subsequence. Proposition 1.2.9. l is the limit of a convergent subsequence of {xn } if and only if for any > 0 and N , there is n > N , such that |xn − l| < . Proof. Let l be the limit of a convergent subsequence {xnk }. Let > 0 and N be given. Then there is K, such that k > K implies |xnk −l| < . It is easy to find k > K, such that nk > N (take k = max{K, N } + 1, for example). Then for n = nk , we have n > N and |xn − l| < . Conversely, suppose for any > 0 and N , there is n > N , such that |xn − l| < . Then for = 1, there is n1 such that |xn1 − l| < 1. Next, for 1 1 = and N = n1 , there is n2 > n1 such that |xn2 − l| < . Keep going, 2 2 1 by taking = and N = nk to find xnk+1 , we construct a subsequence k+1 1 {xnk } satisfying |xnk − l| < . The inequality implies limk→∞ xnk = l. k Here is a remark that is very useful for the discussion of subsequences. Suppose P is a property about terms in a sequence (xn > l or xn > xn+1 , for examples). Then the following statements are equivalent: 1. For any N , there is n > N , such that xn has property P . 2. There are infinitely many xn with property P . 3. There is a subsequence {xnk } such that each term xnk has property P .

34

CHAPTER 1. LIMIT AND CONTINUITY

In particular, the criterion for l to be the limit of a convergent subsequence of {xn } is that for any > 0, there are infinitely many xn satisfying |xn − l| < . Exercise 1.2.25. Let {xn } be a sequence. Suppose limk→∞ lk = l and for each k, lk is the limit of a convergent subsequence of {xn }. Prove that l is also the limit of a convergent subsequence.

Let {xn } be a bounded sequence. Then the set LIM{xn } of all the limits of convergent subsequences of {xn } is also bounded. The supremum of LIM{xn } is called the upper limit and denoted limn→∞ xn . The infimum of LIM{xn } is called the lower limit and denoted limn→∞ xn . For example, for the sequence in Example 1.1.14, the upper limit is max{l1 , l2 , l3 } and the lower limit is min{l1 , l2 , l3 }. The following characterizes the upper limit. The lower limit can be similarly characterized. Proposition 1.2.10. Suppose {xn } is a bounded sequence and l is a number. 1. If l < limn→∞ xn , then there are infinitely many xn > l. 2. If l > limn→∞ xn , then there are only finitely many xn > l. The characterization gives us a clear picture of the upper limit. Let us start with a big l and move downwards. As we decrease l, the number of terms xn > l will increase. If l is an upper bound of {xn }, then this number is zero. When l is lowered and is no longer an upper bound, some terms in the sequence will be higher than l. The number might be finite at the beginning. But there will be a threshold, such that if l is below the threshold, then the number of xn > l becomes infinite. The reason for the existence of such a threshold is that when l is so low to become a lower bound, then all (in particular, infinitely may) xn > l. This threshold is the upper limit. Proof. If l < limn→∞ xn , then by the definition of the upper limit, some l0 > l is the limit of a subsequence. By Proposition 1.1.4, all the terms in the subsequence except finitely many will be bigger than l. Thus we find infinitely many xn > l. The second statement is the same as the following: If there are infinitely many xn > l, then l ≤ limn→∞ xn . We will prove this equivalent statement. Since there are infinitely many xn > l, there is a subsequence {xnk } satisfying xnk > l. By Theorem 1.2.8, the subsequence has a further convergent subsequence {xnkp }. Then xnk > l implies lim xnkp ≥ l. This gives a number in LIM{xn } that is no less than l, so that limn→∞ xn = sup LIM{xn } ≥ l. Proposition 1.2.11. The upper and lower limits are limits of convergent subsequences. Moreover, the sequence converges if and only if the upper and lower limits are equal. The first conclusion is limn→∞ xn , limn→∞ xn ∈ LIM{xn }. In the second conclusion, the equality limn→∞ xn = limn→∞ xn = l means LIM{xn } = {l}, which basically says that all convergent subsequences have the same limit.

1.2. CONVERGENCE OF SEQUENCE LIMIT

35

Proof. Denote l = lim xn . For any > 0, we have l + > lim xn and l − < lim xn . Applying Proposition 1.2.10, we know there are infinitely many xn > l − and only finitely many xn > l + . Therefore there are infinitely many xn satisfying l + ≥ xn > l − . Thus we have proved that for any > 0, there are infinitely many xn satisfying |xn − l| ≤ . By Proposition 1.2.9 (and the remark after the proof), this shows that l is the limit of a convergent subsequence. For the second part, Proposition 1.2.1 says that if {xn } converges to l, then LIM{xn } = {l}, so that lim xn = lim xn = l. Conversely, suppose lim xn = lim xn = l. Then for any > 0, applying the second part of Proposition 1.2.10 to l + > lim xn , we find only finitely many xn > l + . Applying the similar property for the lower limit to l − < lim xn , we also find only finitely many xn < l − . Thus |xn − l| ≤ holds for all but finitely many xn . If N is the biggest index for those xn that do not satisfy |xn −l| ≤ , then we get |xn − l| ≤ for all n > N . This proves that {xn } converges to l.

Exercise 1.2.26. Find all the upper and lower limits of bounded sequences in Exercise 1.2.1. Exercise 1.2.27. Prove the properties of upper and lower limits. 1. limn→∞ (−xn ) = − limn→∞ xn . 2. limn→∞ xn + limn→∞ yn ≥ limn→∞ (xn + yn ) ≥ limn→∞ xn + limn→∞ yn . 3. If xn > 0, then limn→∞

1 1 = . xn limn→∞ xn

4. If xn ≥ 0 and yn ≥ 0, then limn→∞ xn · limn→∞ yn ≥ limn→∞ (xn yn ) ≥ limn→∞ xn · limn→∞ yn . xn+1 < 1, then limn→∞ xn = 0. Prove Exercise 1.2.28. Prove that if limn→∞ x n xn+1 > 1, then limn→∞ xn = ∞. that if limn→∞ xn Exercise 1.2.29. Prove that the upper limit l of a bounded sequence {xn } is characterized by the following two properties. 1. l is the limit of a convergent subsequence. 2. For any > 0, there is N , such that xn < l + for any n > N . The characterization may be compared with the one for the supremum in Exercise 1.2.9. Exercise 1.2.30. Let {xn } be a bounded sequence. Let yn = sup{xn , xn+1 , xn+2 , . . . }. Then {yn } is a bounded decreasing sequence. Prove that limn→∞ yn = limn→∞ xn . Find the similar formula for limn→∞ xn .

36

1.2.5

CHAPTER 1. LIMIT AND CONTINUITY

Convergence of Cauchy Sequence

Now we are ready to prove the converse of Theorem 1.2.2. Theorem 1.2.12. Any Cauchy sequence of real numbers is convergent. Proof. Let {xn } be a Cauchy sequence. We claim the sequence is bounded. For = 1 > 0, there is N , such that m, n > N implies |xm − xn | < 1. Taking m = N + 1, we find n > N implies xN +1 − 1 < xn < xN +1 + 1. Therefore max{x1 , x2 , . . . , xN , xN +1 + 1} is an upper bound for the sequence, and min{x1 , x2 , . . . , xN , xN +1 − 1} is a lower bound. By Theorem 1.2.8, there is a subsequence {xnk } converging to a limit l. Thus for any > 0, there is K, such that k > K =⇒ |xnk − l| < . 2 On the other hand, since {xn } is a Cauchy sequence, there is N , such that m, n > N =⇒ |xm − xn | < . 2 Now for any n > N , we can easily find some k > K, such that nk > N (k = max{K, N } + 1, for example). Then we have both |xnk − l| < and 2 |xnk − xn | < . The inequalities imply |xn − l| < . Thus we established the 2 implication n > N =⇒ |xn − l| < .

The proof of Theorem 1.2.12 does not use the upper and lower limits. Alternatively, we observe that for any > 0, the Cauchy condition implies that any two subsequences will be within of each other after finitely many terms. This implies that the difference between the limits of any two convergent subsequences cannot be more than . Since can be arbitrarily small, this implies the upper and the lower limits must be the same. The Proposition 1.2.11 then shows that the whole sequence converges. Example 1.2.13. For the sequence xn = 1 −

1 (−1)n 1 + − · · · + , 22 32 n2

and m > n, we have (−1)n+1 (−1)n+2 (−1)m |xm − xn | = + + ··· + (n + 1)2 (n + 2)2 m2 1 1 1 < + + ··· + n(n + 1) (n + 1)(n + 2) (m − 1)m 1 1 1 1 1 1 = − + − + ··· + − n n+1 n+1 n+2 m−1 m 1 1 1 = − < . n m n

1.2. CONVERGENCE OF SEQUENCE LIMIT

37

1 For any > 0, let N = . Then for m > n > N , we have 1 < . n

0 < xm − xn <

Thus {xn } is a Cauchy sequence and must converge. Exercise 1.2.31. Prove the sequences converge. 1. xn = 1 +

1 1 1 + + ··· + . 2! 3! n!

2. xn = 1 +

1 1 1 + 3 + ··· + n. 2 2 3 n

3. xn = 1 −

1 1 1 + + · · · + (−1)n 3 . 23 33 n

4. xn = sin 1 +

sin 2 sin 3 sin n + 3 + ··· + 3 . 3 2 3 n

1 Exercise 1.2.32. Let |a| ≤ 1. Define a sequence by x1 = a, xn+1 = (1 + x2n ). 4 1. Prove |xn | ≤ 1 and |xn+1 − xn | < 2. Prove

1 2n−2

.

1 1 1 1 1 + 2 + 3 + ··· + n = 1 − n. 2 2 2 2 2

3. Prove the sequence converges and find the limit.

1.2.6

Open Cover

A set X of numbers is closed if xn ∈ X and limn→∞ xn = l implies l ∈ X. Intuitively, this means that one cannot escape X by taking limits. For example, the order rule on the limit tells us that closed intervals [a, b] are closed sets. The proof of Bolzano-Weierstrass Theorem also gives us the following result, which in modern topological language says that bounded and closed sets of numbers are compact. Theorem 1.2.13 (Heine6 -Borel7 Theorem). Suppose X is a bounded and closed set of numbers. Suppose {(ai , bi )} is a collection of open intervals such that X ⊂ ∪(ai , bi ). Then X ⊂ (ai1 , bi1 ) ∪ (ai2 , bi2 ) ∪ · · · ∪ (ain , bin ) for finitely many intervals in the collection. When X ⊂ ∪(ai , bi ) happens, we say U = {(ai , bi )} is an open cover of X. The theorem says that if X is bounded and closed, then any cover by open intervals has a finite subcover. 6

Heinrich Eduard Heine, born 1821 in Berlin (Germany), died 1881 in Halle (Germany). In addition to the Heine-Borel theorem, he introduced the idea of uniform continuity. 7 ´ F´elix Edouard Justin Emile Borel, born 1871 in Aveyron (France), died 1881 in Paris (France). Borel’ measure theory was the beginning of the modern theory of functions of a real variable. He was French Minister of the Navy from 1925 to 1940.

38

CHAPTER 1. LIMIT AND CONTINUITY

Proof. Suppose X ⊂ I = [α, β] for a bounded and closed interval I. Suppose X cannot be covered by finitely many open intervals in U = {(ai , bi )}. Similar to the proof of Bolzano-Weierstrass Theorem, we divide the inα + β α+β terval into two equal halves I 0 = α, and I 00 = , β . Then 2 2 either X 0 = X ∩ I 0 or X 00 = X ∩ I 00 cannot be covered by finitely many open intervals in U. We denote the corresponding interval by I1 = [α1 , β1 ] and denote X1 = X ∩ I1 . α1 + β1 0 Further divide I1 into two equal halves I1 = α1 , and I100 = 2 α1 + β1 , β1 . Then either X10 = X1 ∩ I10 or X100 = X1 ∩ I100 cannot be covered 2 by finitely many open intervals in U. We denote the corresponding interval by I2 = [α2 , β2 ] and denote X2 = X ∩ I2 . Keep going, we get a sequence of intervals I = [α, β] ⊃ I1 = [α1 , β1 ] ⊃ I2 = [α2 , β2 ] ⊃ · · · ⊃ Ik = [αk , βk ] ⊃ · · · β−α with the length of Ik being βk − αk = , and Xk = X ∩ Ik cannot be 2k covered by finitely many open intervals in U. As argued in the proof before, we have converging limit l = limk→∞ αk = limk→∞ βk . Moreover, picking xk ∈ Xk , we have αk ≤ xk ≤ βk . By the sandwich rule, we get l = limk→∞ xk . Then by the assumption that X is closed, we get l ∈ X. Since X ⊂ ∪(ai , bi ), we have l ∈ (ai0 , bi0 ) for some interval (ai0 , bi0 ) ∈ U. Then by l = limk→∞ αk = limk→∞ βk , we have Xk ⊂ Ik = [αk , βk ] ⊂ (ai0 , bi0 ) for sufficiently big k. In particular, Xk can be covered by one open interval in U. The contradiction shows that X must be covered by finitely many open intervals from U. Exercise 1.2.33. Find a collection U = {(ai , bi )} that covers (0, 1], but (0, 1] cannot be covered by finitely many intervals in U. Find similar counterexample for [0, +∞) in place of (0, 1]. Exercise 1.2.34 (Lebesgue8 ). Suppose [α, β] is covered by a collection U = {(ai , bi )}. Denote X = {x ∈ [α, β] : [α, x] is covered by finitely many intervals in U}. 1. Prove that sup X ∈ X. 2. Prove that if x ∈ X and x < β, then x + δ ∈ X for some δ > 0. 3. Prove that sup X = β. 8 Henri L´eon Lebesgue, born 1875 in Beauvais (France), died 1941 in Paris (France). His 1901 paper “Sur une g´en´eralisation de l’int´egrale d´efinie” introduced the concept of measure and revolutionized the integral calculus. He also made major contributions in other areas of mathematics, including topology, potential theory, the Dirichlet problem, the calculus of variations, set theory, the theory of surface area and dimension theory.

1.2. CONVERGENCE OF SEQUENCE LIMIT

39

This proves Heine-Borel Theorem for bounded and closed intervals. Exercise 1.2.35. Prove Heine-Borel Theorem for a bounded and closed set X in the following way. Suppose X is covered by a collection U = {(ai , bi )}. 1. Prove that there is δ > 0, such that for any x ∈ X, (x − δ, x + δ) ⊂ (ai , bi ) for some (ai , bi ) ∈ U. 2. Use the boundedness of X to find finitely many numbers c1 , c2 , . . . , cn , such that X ⊂ (c1 , c1 + δ) ∪ (c2 , c2 + δ) ∪ · · · ∪ (cn , cn + δ). 3. Prove that if X ∩ (cj , cj + δ) 6= ∅, then (cj , cj + δ) ⊂ (ai , bi ) for some (ai , bi ) ∈ U. 4. Prove that X is covered by no more than n open intervals in U.

1.2.7

Additional Exercise

Extended Supremum and Extended Upper Limit Exercise 1.2.36. Extend the number system by including the “infinite numbers” +∞, −∞ and introduce the order −∞ < x < +∞ for any real number x. Then for any nonempty set X of real numbers and possibly +∞ or −∞, we have sup X and inf X similarly defined. Prove that there are exactly three possibilities for sup X. 1. If X has no upper bound or +∞ ∈ X, then sup X = +∞. 2. If X has a finite number as an upper bound and contains at least one finite real number, then sup X is a finite real number. 3. If X = {−∞}, then sup X = −∞. Write down the similar statements for inf X. Exercise 1.2.37. For a not necessarily bounded sequence {xn }, extend the definition of LIM{xn } by adding +∞ if there is a subsequence diverging to +∞, and adding −∞ if there is a subsequence diverging to −∞. Define the upper and lower limits as the supremum and infimum of LIM{xn }, using the extension of the concepts in Exercise 1.2.36. Prove the following extensions of Proposition 1.2.11. 1. A sequence with no upper bound must have a subsequence diverging to +∞. This means limn→∞ xn = +∞. 2. If there is no subsequence with finite limit and no subsequence diverging to −∞, then the whole sequence diverges to +∞.

Supremum and Infimum in Ordered Set Recall that an order on a set is a relation x < y between pairs of elements satisfying the transitivity and the exclusivity. The concepts of upper bound, lower bound, supremum and infimum can be defined for subsets of an ordered set in a way similar to numbers. Exercise 1.2.38. Provide a characterization of the supremum similar to numbers. Exercise 1.2.39. Prove that the supremum, if exists, must be unique.

40

CHAPTER 1. LIMIT AND CONTINUITY

Exercise 1.2.40. An order is defined for all subsets of the plane R2 by A ≤ B if A is contained in B. Let R be the set of all rectangles centered at the origin and with circumference 1. Find the supremum and infimum of R.

Alternative Proof of Bolzano-Weierstrass Theorem We say a term xn in a sequence has property P if there is M , such that m > M implies xm > xn . The property means that xm > xn for sufficiently big m. Exercise 1.2.41. Suppose there are infinitely many terms in a sequence {xn } with property P . Prove that for any term xn with property P , there is m > n, such that xm > xn and xm also has property P . Then construct an increasing subsequence {xnk } in which each xnk has property P . Exercise 1.2.42. Suppose there are only finitely many terms in a sequence {xn } with property P . Prove that there is N , such that for any n > N , there is a term xm ≤ xn with m > n. Then use this to construct a decreasing subsequence {xnk }. Exercise 1.2.43. Suppose the sequence {xn } is bounded. Prove that the increasing subsequence constructed in Exercise 1.2.41 converges to lim xn , and the decreasing subsequence constructed in Exercise 1.2.42 converges to lim xn .

Set Version of Bolzano-Weierstrass Theorem A number l is an accumulation point of a set X of numbers if for any > 0, there is x ∈ X satisfying 0 < |x − l| < . The set version of BolzanoWeierstrass Theorem says that any infinite, bounded and closed set of numbers has an accumulation point. Theorem 1.2.8 may be considered as the sequence version of Bolzano-Weierstrass Theorem. Exercise 1.2.44. What does it mean for a point l not to be an accumulation point? Use your answer and Heine-Borel Theorem to prove the set version of BolzanoWeierstrass Theorem. Exercise 1.2.45. Use the set version of Bolzano-Weierstrass Theorem to prove the sequence version.

1.3

Limit of Function

The limit process is not restricted to sequences only. For a function f (x) defined near (but not necessarily at) a, we may also consider its tendency as x approaches a. This leads to the definition of the limit of functions. Many properties of the limit of sequences can be extended to functions. The two types of limits are also closely related.

1.3.1

Definition

Definition 1.3.1. A function f (x) defined near a has limit l at a, denoted limx→a f (x) = l, if for any > 0, there is δ > 0, such that 0 < |x − a| < δ =⇒ |f (x) − l| < .

(1.3.1)

1.3. LIMIT OF FUNCTION

41

.... .. ... .. . . .. .. ... .. . . .. .. .... .. . . . .. ...................................................................... .. ... .... ................................................................................. ... . L .. ... .... . ........ ... ... .. .... . . . . . . ............................................. .. .. .. .. .. .. .. ...... .. .. .. .. .................. .......δ.............δ........ .. .. .. .. .. . . . .. ................................................................................................................................................................................ a .. .. .. Figure 1.4: for any , there is δ Similar to the limit of sequences, the predetermined smallness for |f (x)− l| is arbitrarily given, while the size δ for |x − a| is to be found after is given. Thus the choice of δ usually depends on and is often expressed as a function of . Moreover, since the limit is about what happens when numbers are close, only small and δ need to be considered. Example 1.3.1. The graph for the function f (x) = x2 suggests n limx→2 x2 = 4. o Rigorously following the definition, for any > 0, choose δ = min 1, . Then 5 5 =⇒ |x + 2| < 5, |x − 2| < 5 =⇒ |x2 − 4| = |x + 2||x − 2| < 5 = . 5

0 < |x − 2| < δ =⇒ |x − 2| < 1, |x − 2| <

How did we choose δ? We try to achieve |x2 − 4| = |x + 2||x − 2| < by requiring 0 < |x − 2| < δ. Note that when x is close to 2, |x + 2| is close to 4 and |x2 − 4| is close to 4δ. More precisely, if we give some room to x by choosing δ ≤ 1, so that |x + 2| is not more than 5, then we end up with the requirement 5δ ≤ . Combining δ ≤ 1 and 5δ ≤ together yields our choice for δ. √ √ Example 1.3.2. The graph for the function f (x) = x suggests that limx→4 x = 2. Rigorously following the definition, for any > 0, choose δ = . Then √ √ √ | x − 2|| x + 2| |x − 4| δ √ 0 < |x−4| < δ =⇒ | x−2| = = √ < √ < δ = . | x + 2| | x + 2| | x + 2| 1 1 Example 1.3.3. The graph for the function f (x) = suggests limx→1 = 1. To x x rigorously argue by the definition, we estimate how small 1 − 1 = |x − 1| x |x| can be when 0 < |x − 1| < δ. Note that in the quotient, the numerator |x − 1| < δ 1 and the denominator |x| should be close to 1 for small δ. Specifically, if δ < , 2

42

CHAPTER 1. LIMIT AND CONTINUITY

1 |x − 1| then |x − 1| < δ will imply |x| > , and we get < 2δ. We conclude that 2 |x| 1 1 implies − 1 < . 0 < |x − 1| < δ = min , 2 2 x Example 1.3.4. Note that the limit of f (x) at a does not depend on f (a). In fact, the function f does not even need to be defined at a. For example, the function   1 if x 6= 1 f (x) = x 2 if x = 1 1 has limit limx→1 f (x) = limx→1 = 1. In fact, the argument for the limit in x Example 1.1.3 can be used here without any change. Exercise 1.3.1. Rigorously verify the limits. √

1. limx→−2 x2 = 4.

3. limx→1

2. limx→2 3x3 = 24.

4. limx→−2

1 2 = . 2 x 2

x = 1.

5. limx→−2

1 1 =− . x 2

6. limx→a |x| = |a|.

Exercise 1.3.2. Prove that limx→a f (x) = l implies limx→a |f (x)| = |l|. Prove the converse is true if l = 0. What about the converse in case l 6= 0? Exercise 1.3.3. Prove the following are equivalent definitions of limx→a f (x) = l. 1. For any > 0, there is δ > 0, such that 0 < |x − a| ≤ δ implies |f (x) − l| < . 2. For any c > > 0, where c is some fixed number, there is δ > 0, such that 0 < |x − a| < δ implies |f (x) − l| ≤ . 3. For any natural number n, there is δ > 0, such that 0 < |x − a| < δ implies 1 |f (x) − l| ≤ . n 4. For any 1 > > 0, there is δ > 0, such that 0 < |x − a| < δ implies |f (x) − l| < . 1− Exercise 1.3.4. Which are equivalent to the definition of limx→a f (x) = l? 1. For = 0.001, we have δ = 0.01, such that 0 < |x−a| ≤ δ implies |f (x)−l| < . 2. For any > 0, there is δ > 0, such that |x − a| < δ implies |f (x) − l| < . 3. For any > 0, there is δ > 0, such that 0 < |x − a| < δ implies 0 < |f (x) − l| < . 4. For any 0.001 ≥ > 0, there is δ > 0, such that 0 < |x − a| ≤ 2δ implies |f (x) − l| ≤ . 5. For any > 0.001, there is δ > 0, such that 0 < |x − a| ≤ 2δ implies |f (x) − l| ≤ . 6. For any > 0, there is a rational number δ > 0, such that 0 < |x − a| < δ implies |f (x) − l| < .

1.3. LIMIT OF FUNCTION

43

7. For any > 0, there is a natural number N , such that 0 < |x − a| < implies |f (x) − l| < .

1 N

8. For any > 0, there is δ > 0, such that 0 < |x−a| < δ implies |f (x)−l| < 2 . 9. For any > 0, there is δ > 0, such that 0 < |x − a| < δ implies |f (x) − l| < 2 + 1. 10. For any > 0, there is δ > 0, such that 0 < |x − a| < δ + 1 implies |f (x) − l| < 2 .

1.3.2

Variation

In the definition of limx→a f (x), x may approach a from right (i.e., x > a) or from left (i.e., x < a). The two approaches may be treated separately, leading to one side limits. Definition 1.3.2. A function f (x) defined for x > a and near a has right limit l at a, denoted limx→a+ f (x) = l, if for any > 0, there is δ > 0, such that 0 < x − a < δ =⇒ |f (x) − l| < . (1.3.2) A function f (x) defined for x < a and near a has left limit l at a, denoted limx→a− f (x) = l, if for any > 0, there is δ > 0, such that −δ < x − a < 0 =⇒ |f (x) − l| < .

(1.3.3)

The one side limits are often denoted as f (a+ ) = limx→a+ f (x) and f (a− ) = limx→a− f (x). Moreover, the one side limits and the usual (two side) limit are related as follows. The proof is left as an exercise. Proposition 1.3.3. limx→a f (x) = l if and only if limx→a+ f (x) = limx→a− f (x) = l. Example 1.3.5. For the function  1 if x > 1 , f (x) = x 3x − 2 if x < 1 we have lim f (x) = lim

x→1+

x→1

1 = 1, x

lim f (x) = lim (3x − 2) = 1.

x→1−

x→1

Therefore limx→1 f (x) = 1. On the other hand, for the function   1 if x > 1 g(x) = x , 3x if x < 1 we have lim g(x) = lim

x→1+

x→1

1 = 1, x

lim g(x) = lim 3x = 3.

x→1−

x→1

Since the left and right limits are not equal, g(x) diverges at 1.

44

CHAPTER 1. LIMIT AND CONTINUITY

Example 1.3.6. If α > 0, then for any > 0, we have 1

0 < x < δ = α =⇒ 0 < xα < δ α = . This shows limx→0+ xα = 0. Exercise 1.3.5. Let [x] be the biggest integer n satisfying n ≤ x. For example, [1.1] = 1, [0.99] = 0, [−3.2] = −4. Compute the limits. 1. limx→2+ [x].

3. limx→√2+ [x].

2. limx→2− [x].

4. limx→√2− [x].

Exercise 1.3.6. Determine whether limits exist. ( (√ 1 x if x > 0 3. limx→1 1. limx→0 . √ x − −x if x < 0 ( ( 1 2 if x > 1 4. limx→1 . 2. limx→1 x x if x < 1

√ 5. limx→4+ [ x]. √ 6. limx→4− [ x].

if x > 1 . if x < 1 if x > 2 . if x < 2

Another variation of the function limit is to consider what happens when x gets very big. Definition 1.3.4. A function f (x) has limit l at ∞, denoted limx→∞ f (x) = l, if for any > 0, there is N , such that |x| > N =⇒ |f (x) − l| < .

(1.3.4)

The limit at the infinity can also be split into the limits limx→+∞ f (x), limx→−∞ f (x) at positive and negative infinities. Proposition 1.3.3 still holds for the limit at infinity. The index n in sequences are usually taken as positive numbers only, so that the sequence limit limn→∞ xn really means limn→+∞ xn . For the function limit, if only ∞ is used without sign, then both negative and positive infinities are included. Example 1.3.7. We verify that limx→∞ Then

2x + 1 3 = 2. For any > 0, take N = +1. x−1

2x + 1 3 3 |x| > N =⇒ |x − 1| ≥ |x| − 1 > =⇒ − 2 = < . x−1 |x − 1| Example 1.3.8. By treating n in Example 1.1.1 as a real number instead of an 1 integer, the same argument leads to limx→∞ = 0. More generally, the argument x for the limit (1.1.2) also works and gives us 1 = 0 for α > 0. xα

(1.3.5)

lim xα = 0 for α < 0.

(1.3.6)

lim

x→+∞

The limit can be rephrased as x→+∞

1.3. LIMIT OF FUNCTION

45

Note that only +∞ is considered here because xα may not be defined for negative x and non-integer α. Of course in the special case n is a natural number, the same argument leads to 1 lim = 0 for natural number n. (1.3.7) x→∞ xn Example 1.3.9. The analogue of the limit (1.1.3) is lim αx = 0 for 0 < α < 1.

(1.3.8)

x→+∞

1 The proof, however, needs to be modified. Again let = 1 + β. Then β > 0. For α 1 any > 0, take N = + 1. Then β x > N =⇒ x > n for some integer n satisfying x > n >

1 β

1 1 1 > n > nβ > x α α =⇒ 0 < αx < ,

=⇒

1 > nβ was proved in Example 1.2.12. αn Exercise 1.3.7. Rigorously verify the limits. where the inequality

1. limx→∞

2. limx→∞

x2

x = 0. +1

sin x = 0. x

3. limx→∞

x2 − 1 = 1. x2 + 1

4. limx→∞

x2 + x − 1 = 1. x2 + 1

Exercise 1.3.8. Prove lim αx = 0 for α > 1.

x→−∞

(1.3.9)

Exercise 1.3.9. Prove the extension lim xβ αx = 0 for 0 < α < 1

x→+∞

(1.3.10)

of the limit (1.3.8).

The final variation is the divergence to infinity. Definition 1.3.5. A function f (x) diverges to infinity at a, denoted limx→a f (x) = ∞, if for any b, there is δ > 0, such that 0 < |x − a| < δ =⇒ |f (x)| > b.

(1.3.11)

The divergence to positive and negative infinities, denoted limx→a f (x) = +∞ and limx→a f (x) = −∞ respectively, can be similarly defined. Moreover, the divergence to infinity at the left of a, the right of a, or when a is various kind of infinities, can also be similarly defined. Similar to the sequences diverging to infinity, we know f (x) is an infinity 1 if and only if limx→a = 0, i.e., the reciprocal is an infinitesimal. f (x)

46

CHAPTER 1. LIMIT AND CONTINUITY

x Example 1.3.10. We verify that limx→1− 2 = −∞. For any b > 0, choose x −1 1 1 . Then , δ = min 2 b 1 −δ < x − 1 < 0 =⇒ −δ < x − 1 < 0, x > 1 − δ ≥ , x + 1 < 2 2 1 x x b =⇒ 2 = < 2 <− . x −1 (x + 1)(x − 1) 2(−δ) 4 b x can also be arbitrarily big. Therefore 2 is a 4 x −1 negative infinity at the left of 1.

Since b can be arbitrarily big,

Example 1.3.11. Taking the reciprocals of the infinitesimals in the Examples 1.3.6 and 1.3.8, we get some infinities. By considering the sign, we actually get positive or negative infinities. Combining the results with earlier examples, we get the limits of the power functions     if α > 0 if α < 0 0 0 α α lim x = 1 (1.3.12) if α = 0 , lim x = 1 if α = 0 . x→+∞   x→0+   +∞ if α < 0 +∞ if α > 0 Taking the reciprocal of the limits of the exponential function in Example 1.3.9 and in (1.3.9) give us   if 0 < α < 1 0 x lim α = 1 , if α = 1 x→+∞   +∞ if α > 1

  0 x lim α = 1 x→−∞   +∞

if α > 1 . if α = 1 if 0 < α < 1

(1.3.13)

Exercise 1.3.10. Rigorously verify the limits. 1. limx→0

1−x = +∞. 2 x (1 + x)

2. limx→0+

1.3.3

1−x = +∞. x

3. limx→−∞

x2 + 1 = −∞. x+1

4. limx→+∞

x2 + 1 = +∞. 2x − 5

Property

The properties of the limit of sequences can be extended to functions. Proposition 1.3.6. The limit of functions have the following properties. 1. Boundedness: A function convergent at a is bounded near a. 2. Arithmetic: Suppose limx→a f (x) = l and limx→a g(x) = k. Then f (x) l = , x→a g(x) k

lim (f (x) + g(x)) = l + k, lim f (x)g(x) = lk, lim

x→a

x→a

where g(x) 6= 0 and k 6= 0 are assumed in the third equality.

1.3. LIMIT OF FUNCTION

47

3. Order: Suppose limx→a f (x) = l and limx→a g(x) = k. If f (x) ≥ g(x) for x close to a, then l ≥ k. Conversely, if l > k, then f (x) > g(x) for x close to a. 4. Sandwich: Suppose f (x) ≤ g(x) ≤ h(x) and limx→a f (x) = limx→a h(x) = l. Then limx→a g(x) = l. 5. Composition: Suppose limx→a g(x) = b, limy→b f (y) = c. If g(x) 6= b for x near a, or f (b) = c, then limy→a f (g(x)) = c. When something happens near a, we mean that there is δ > 0, such that it happens when 0 < |x − a| < δ. For example, f (x) is bounded near a if there is δ > 0 and B, such that 0 < |x − a| < δ implies |f (x)| < B. The first four properties are parallel to the Propositions 1.1.2, 1.1.3, 1.1.4, 1.1.5 for the limit of sequences, and can be proved in the similar way. The fifth property means that for y = g(x) and z = f (y), we have lim y = b, lim z = c =⇒ lim z = c.

x→a

y→b

x→a

Its analogue for the sequence is Proposition 1.2.1. The property can be proved by combining the following two implications together 0 < |x − a| < δ =⇒ |g(x) − b| < µ, 0 < |y − b| < µ =⇒ |f (y) − c| < , where µ is found for the given , and then δ is found for the given µ. However, there is the technical problem that the right side of the first implication does not quite match the left side of the second implication. To make them match, the implications should be modified either as 0 < |x − a| < δ =⇒ 0 < |g(x) − b| < µ, 0 < |y − b| < µ =⇒ |f (y) − c| < , or as 0 < |x − a| < δ =⇒ |g(x) − b| < µ, |y − b| < µ =⇒ |f (y) − c| < . The first modification simply additionally requires that g(x) = 6 b for x near a. The second modification simply additionally requires that f (b) = c. Example 1.3.12. A polynomial is a function of the form p(x) = cn xn + cn−1 xn−1 + · · · + c1 x + c0 By repeatedly applying the arithmetic rule to limx→a c = c (c is a constant) and limx→a x = a, we get lim p(x) = cn an + cn−1 an−1 + · · · + c1 a + c0 = p(a).

x→a

(1.3.14)

48

CHAPTER 1. LIMIT AND CONTINUITY

More generally, a rational function r(x) =

cn xn + cn−1 xn−1 + · · · + c1 x + c0 dm xm + dm−1 xm−1 + · · · + d1 x + d0

is a quotient of two polynomials. By applying the arithmetic rule to (1.3.14), we get cn an + cn−1 an−1 + · · · + c1 a + c0 lim r(x) = = r(a), (1.3.15) x→a dm am + dm−1 am−1 + · · · + d1 a + d0 as long as the denominator is nonzero. √ Example 1.3.13. We have limx→1 (x2 +x+2) √ = 4 by Example 1.3.12 and limx→4 x = 2 by Example 1.3.2. Then we get limx→1 x2 + x + 2 = 2 by the composition rule.

Proposition 1.3.6 was stated for the two side limit limx→a f (x) = l with finite a and l only. The properties also hold when a is replaced by a+ , a− , ∞, +∞ and −∞. Some properties still hold when l is replaced by various infinities. Here are some examples. 1. The arithmetic rule (+∞)+(+∞) = +∞ for sequences can be extended to functions. For example, if limx→a+ f (x) = +∞ and limx→a+ g(x) = +∞, then limx→a+ (f (x) + g(x)) = +∞. In general, all the valid arithmetic rules for sequences that involve infinities and infinitesimals are still valid for functions. However, as in the sequence case, the same care needs to be taken in applying the arithmetic rules to infinities and infinitesimals. 2. In the composition rule, a, b, c can be basically any symbols. For example, if limx→∞ g(x) = b, g(x) > b and limy→b+ f (y) = c, then limx→∞ f (g(x)) = c. 3. If f (x) ≥ g(x) and limx→a g(x) = +∞, then limx→a f (x) = +∞. This extends the sandwich rule to +∞. There is similar sandwich rule for −∞ but no sandwich rule for ∞. Example 1.3.14. By repeatedly applying the arithmetic rule to limx→∞ c = c (c is 1 a constant) and limx→∞ = 0, we get x 1 2 + 10 2x5 + 10 x = (+∞) 2 + 10 · 0 = −∞, lim = lim x4 1 x→∞ −3x + 1 x→∞ −3 + 0 −3 + x 1 1 1−2 + 3 3 x3 − 2x2 + 1 x x = 2 − 2 · 0 + 0 = −2. lim = lim 1 x→∞ x→∞ −x3 + 1 −1 + 0 −1 + 3 x In general, we have the limit   0  c

cn xn + cn−1 xn−1 + · · · + c1 x + c0 n lim = m m−1 dm x→∞ dm x + dm−1 x + · · · + d1 x + d0    ∞ of a rational function at ∞.

if m > n, dm 6= 0 if m = n, dm 6= 0 if m < n, cn 6= 0

(1.3.16)

1.3. LIMIT OF FUNCTION

49

Example 1.3.15. The power function xα is defined for any α and x > 0. In Example 1.3.11, we obtained the limits at 0+ and +∞. Now we may consider the limit at any finite a > 0. 1 Fix a natural number n > |α|. Then for x > 1, we have n < xα < xn . x 1 By limx→1 xn = 1, limx→1 n = 1 from (1.3.15) and the sandwich rule, we get x limx→1+ xα = 1. Note that the limit is taken from the right because the sandwich 1 inequality holds for x > 1 only. Similarly, from the inequality n > xα > xn for x 0 < x < 1, we get limx→1− xα = 1. Thus we conclude that limx→1 xα = 1. For the limit of the power function at any a > 0, the composition rule gives us lim xα = lim (ax)α = aα lim xα = aα .

x→a

x→1

(1.3.17)

x→1

Example 1.3.16. Suppose limx→0+ f (x) = l. By limx→0 x2 = 0, x2 > 0 for x 6= 0, and the composition rule, we have limx→0 f (x2 ) = l. Conversely, suppose limx→0 f (x2 ) = l. Then by the composition rule and √ √ limx→0+ x = 0 in (1.3.12), we get limx→0+ f (x) = limx→0+ f (( x)2 ) = l. Thus we conclude that limx→0 f (x2 ) = limx→0+ f (x). The equality means that one limit exists if and only if the other also exists. Moreover, the two limits are equal when they exist. By the extension of the composition rule, the equality limx→0 f (x2 ) = limx→0+ f (x) also holds even if the limits diverge to infinity. Exercise 1.3.11. Rewrite the limits as limx→a f (x) for suitable a. 1. limx→a− f (−x). 1 2. limx→0 f . x Exercise 1.3.12. Compute the limits. √ x2 + 3 x + 1 √ 1. limx→1 √ . ( x + 1)( x + 2) 2. limx→−3 3. limx→0

√ 3

x − 5.

√ √ 2x + 1 − x + 1 √ √ . x + 2 − 2x + 2

4. limx→+∞

√ √ 2x + 1 − x + 1 √ √ . x + 2 − 2x + 2

3. limx→0 f ((x + 1)3 ). 1 4. limx→2+ f . x

7. limx→+∞

x−1 . x−1

6. limx→0 √ 3

x √ . 2x + 1 − 3 x + 1

3

5

8. limx→1+ 9.

10.

11. 12.

1

x 2 − 3x 2 + 2x 2 8

√ 3 5. limx→1

5

8

3x 3 + 5x 3 + 2

.

5

3x 3 + 5x 3 + 2

3 1 . 5 x 2 − 3x 2 + 2x 2 r x+1 limx→+∞ x −1 . x−1 r x+1 limx→1+ x −1 . x−1 √ x + 1 − x2 + 3 limx→+∞ . x−1 √ x + 1 − x2 + 3 limx→1+ . x−1

Exercise 1.3.13. Prove the properties of limit. 1. If limx→a f (x) = l and limx→a g(x) = k, then limx→a max{f (x), g(x)} = max{l, k} and limx→a min{f (x), g(x)} = min{l, k}.

50

CHAPTER 1. LIMIT AND CONTINUITY 2. If limx→a+ f (x) = ∞ and there are c > 0 and δ > 0, such that 0 < x − a < δ implies g(x) > c, then limx→a+ f (x)g(x) = ∞. 3. If limx→a g(x) = +∞ and limy→+∞ f (y) = c, then limx→a f (g(x)) = c. 4. If f (x) ≤ g(x) and limx→+∞ g(x) = −∞, then limx→+∞ f (x) = −∞.

1.3.4

Limit of Trignometric Function

We already know the limits (1.3.12) and (1.3.17) of the power function xα . We also know the limits (1.3.15) and (1.3.16) of the polynomials and rational functions. In this section, we will study the limits of trigonometric functions. We begin by establishing an important inequality. In Figure 1.5 are a π circle of radius 1 and an angle 0 < x < . The arc P B has length x, and 2 1 the area of the fan OBP is x. Both the triangles OBP and OBQ have 2 the base OB = 1 and respective heights AP = sin x, BQ = tan x. Since the fan OBP is sandwiched between the triangles OBP and OBQ, we have the inequality between the areas 1 1 π 1 sin x < x < tan x for 0 < x < . 2 2 2 2 Therefore we get 0 < sin x < x for 0 < x < and cos x <

π , 2

sin x π < 1 for 0 < x < . x 2

(1.3.18)

(1.3.19)

.... .. .. . 1 ................................. ......... .. ....... .. ....... ..... .. Q ..... .. ....P ............ .. . ...... .. .. .................... .... . .. . . . . .... .... ...... ... .. ....... ... ...... ... . .. . . . . ... ... ...... ... .. . . . . . ... .. ..... ... .. . . . . . .. ........ .. ......... .. .... .. .......... x . . . . . . . . . . .. . . . . . . . ........................................................................................................................................................ O ... A B .. . Figure 1.5: trignometric function Applying the sandwich rule to (1.3.18), we get limx→0+ sin x = 0. By the composition rule, limx→0− sin x = lim−x→0− sin(−x) = − limx→0+ sin x = 0. Thus by Proposition 1.3.3, lim sin x = 0. x→0

1.3. LIMIT OF FUNCTION

51

This further implies x x 2 lim cos x = lim 1 − 2 sin = 1 − 2 lim sin = 1. x→0 x→0 x→0 2 2

2

Then by the addition formulae for the sine and the cosine functions, we have lim sin x = lim sin(a + x) = lim (sin a cos x + cos a sin x)

x→a

x→0

x→0

= sin a · 1 + cos a · 0 = sin a, lim cos x = lim cos(a + x) = lim (cos a cos x + sin a sin x)

x→a

x→0

(1.3.20)

x→0

= cos a · 1 + sin a · 0 = cos a.

(1.3.21)

Finally, by the arithmetic rule, we get lim tan x =

x→a

limx→a sin x sin a = = tan a, limx→a cos x cos a

(1.3.22)

and similarly for the limits of the other trignometric functions. Applying limx→0 cos x = 1 and the sandwich rule to (1.3.19), we get sin x = 1. x→0 x lim

(1.3.23)

This implies sin x limx→0 tan x x = 1. = (1.3.24) lim x→0 x limx→0 cos x x 2 2 sin2 1 − cos x 2 sin2 x sin x 1 1 2 lim = lim = lim = lim = . (1.3.25) 2 2 2 x→0 x→0 x→0 (2x) x x 2 x→0 x 2 Example 1.3.17. In the limits (1.3.23) and (1.3.24), we substitute x by x − get (the composition rule is used here) lim

x→ π2

π and 2

cos x 1 limπ = −1. π = −1, x→ π 2 x− x− tan x 2 2

Taking the reciprocal of the second limit, we get π limπ x − tan x = −1. 2 x→ 2 √ √ Example 1.3.18. Since limx→+∞ ( x + 1 − x) = limx→+∞ √ the composition rule, we have √ √ x+1− x lim sin = 0, x→+∞ 2

1 √ = 0. By x+1+ x

√ √ sin( x + 1 − x) √ lim = 1. √ x→+∞ x+1− x

Then by the fact that cos is bounded, we have √ √ √ √ √ √ x+1− x x+1+ x lim (sin x + 1 − sin x) = lim 2 sin cos = 0. x→+∞ x→+∞ 2 2

52

CHAPTER 1. LIMIT AND CONTINUITY

Moreover, lim

x→+∞

√

√ √ √ √ √ √ sin( x + 1 − x) √ x sin( x + 1 − x) = lim x( x + 1 − x) lim √ x→+∞ x→+∞ x+1− x √ x 1 1 = . = lim √ √ = lim r x→+∞ x→+∞ 2 x+1+ x 1 1+ +1 x √

Exercise 1.3.14. Compute the limits. sin x . x−π

1. limx→π

sin x . x−π

2. limx→∞

8. limx→0

1 − cos x . sin x

9. limx→0

1 − cos x 1 sin 2 . sin x x

10. limx→0

tan x − sin x . x3

x+

11. limx→0

sin x2 . (sin x)2

5. limx→π

1 + cos x . (x − π)2

12. limx→0

1 − cos x2 . (1 − cos x)2

6. limx→0

tan 2x . tan 3x

13. limx→0

7. limx→0

sin(tan x) . x

14. limx→0

3. limx→0

sin x . x−π

4. limx→− 3π 2

1.3.5

3π 2 . cos x

cos x − cos 2x . x2 √ √ cos x − 3 cos x . sin2 x

Limit of Exponential Function 1

For a > 0, we know limn→∞ a n = 1. This suggests that lim ax = 1.

(1.3.26)

x→0

For the case a ≥ 1 and x > 0, we will prove this by comparing with the 1 known limit of a n . In other words, we compare x with the reciprocals of natural numbers. 1 1 For 0 < x < 1, we have ≤ x < for some natural number n. n+1 n Moreover, the smaller x is, the bigger n will be (the precise estimation is given in the formal argument below). This implies 1

1

1 ≤ a n+1 ≤ ax ≤ a n . 1

Repeating the proof of the sandwich rule, by the limit limn→∞ a n = 1, for 1 any > 0, there is N , such that n > N implies |a n − 1| < . Then 0<x<δ=

1 1 1 =⇒ ≤ x < for some natural number n > N N +1 n+1 n 1 x =⇒ |a − 1| ≤ |a n − 1| < .

1.3. LIMIT OF FUNCTION

53

This completes the proof of limx→0+ ax = 1 in case a ≥ 1. 1 For the case 0 < a ≤ 1, we have ≥ 1, and the arithmetic rule gives us a lim+ ax =

x→0

1 limx→0+

1 x a

= 1.

Further by the composition rule, we have lim− ax = lim+ a−x =

x→0

x→0

1 limx→0+ ax

= 1.

This completes the proof of the limit (1.3.26) in all cases. The limit (1.3.26) is a special case of limx→b ax , which we expect to be ab . The equality limx→b ax = ab can be derived from the following general result. Proposition 1.3.7 (Exponential Rule). Suppose limx→a f (x) = l > 0 and limx→a g(x) = k. Then limx→a f (x)g(x) = lk . Proof. We prove the case a is finite. The proof for the other cases is similar. By the limit (1.3.17) and the composition rule, we have limx→a f (x)k = k l . On the other hand, choose A > 1 satisfying A−1 < l < A. Then limx→a f (x) = l tells us that there is δ > 0, such that 0 < |x − a| < δ implies A−1 < f (x) < A. This further implies A−|g(x)−k| < f (x)g(x)−k < A|g(x)−k| . The assumption limx→a g(x) = k implies limx→a |g(x) − k| = 0. By the limit (1.3.26) and the composition rule, we have lim A−|g(x)−k| = lim A|g(x)−k| = 1.

x→a

x→a

Then by the sandwich rule, we get limx→a f (x)g(x)−k = 1. Thus we conclude that lim f (x)g(x) = lim f (x)k lim f (x)g(x)−k = lk · 1 = lk . x→a

x→a

x→a

Next we establish the limit lim+ xx = 1,

(1.3.27)

x→0

which is not covered by the exponential rule above. Note that for the special 1 1 . So the limit is closely related to the limit case x = , we have xx = √ n n n (1.1.4). This suggests us to prove (1.3.27) by comparing with (1.1.4). We have n1 1 1 1 1 x ≤x< =⇒ 1 > x > = 1 n+1 n n+1 (n + 1) n 1 x =⇒ |x − 1| < 1 − 1 . (n + 1) n

54

CHAPTER 1. LIMIT AND CONTINUITY

Based on the estimation and the known limit limn→∞

1

= 1 (see 1 (n + 1) n (1.1.7)), an argument for the limit (1.3.27) can be made similar to the proof of (1.3.26). Other forms of the exponential rule can be found in Exercise 1.3.19. Example 1.3.19. By the limit (1.3.27), we have limx→0− |x|x = limx→0+ x−x = 1 = 1. Thus limx→0 |x|x = 1. limx→0+ xx Example 1.3.20. By the limit (1.3.27) and the exponential rule, we have lim (5 − 3x + 4x2 )x = 50 = 1.

x→0

lim (2x2 − x3 )x = lim (|x|x )2 (2 − x)x = 12 · 20 = 1.

x→0

3

√

lim (2x + x )

x→0

x

x→0+

= lim (2x2 + x6 )x = lim (|x|x )2 (2 + x4 )x = 12 · 20 = 1. x→0+

x→0

Example 1.3.21. By the limits (1.3.23) and the exponential rule, we have sin x x x x x = 10 · 1 = 1. lim (sin x) = lim x x→0+ x→0+ 0 1 − cos x tan x x 2 tan x 1 tan x x lim (1 − cos x) = lim = (x ) · 12 = 1. 2 x→0 x→0 x 2 Exercise 1.3.15. Compute the limits. 1. limx→0+ (xx )x .

6. limx→+∞ x−x .

x

2. limx→0+ x(x ) . 3. limx→0+ (tan x)x . 4. limx→0

|x|tan x .

5. limx→0 (cos x)x .

1

7. limx→+∞ x x . 1

8. limx→+∞ (2x + 3x ) x . x

9. limx→+∞ (2x + 3x ) x2 +1 .

Exercise 1.3.16. Prove limx→0 |p(x)|x = 1 for any nonzero polynomial p(x). Exercise 1.3.17. Finish the proof of the limit (1.3.27). 1 for sufficiently small x > 0. Exercise 1.3.18. For any a > 0, we have x < a < x Use this to derive the limit (1.3.26) from the limit (1.3.27). Exercise 1.3.19. Prove the following exponential rules. 1. l+∞ = +∞ for l > 1: If limx→a f (x) = l > 1 and limx→a g(x) = +∞, then limx→a f (x)g(x) = +∞. 2. (0+ )k = 0 for k > 0: If f (x) > 0, limx→a f (x) = 0 and limx→a g(x) = k > 0, then limx→a f (x)g(x) = 0. From the two rules, further derive the following exponential rules. 1. l+∞ = 0 for 0 < l < 1.

3. (0+ )k = +∞ for k < 0.

2. l−∞ = 0 for l > 1.

4. (+∞)k = 0 for k > 0.

1.3. LIMIT OF FUNCTION

55

Exercise 1.3.20. Provide counterexamples to the wrong exponential rules. (+∞)0 = 1, 1+∞ = 1, 00 = 1, 00 = 0.

x 1 . Similar to the way the limit Finally we consider limx→∞ 1 + x (1.3.27) is derived, we compare with the natural number case, which is the definition of e in Example 1.2.11. For x > 1, we have n ≤ x ≤ n + 1 for some natural number n. This implies n x n+1 1 1 1 ≤ 1+ ≤ 1+ . (1.3.28) 1+ n+1 x n From Example 1.2.11, we know n+1 n 1 1 1 lim 1 + = lim 1 + lim 1 + = e, n→∞ n→∞ n n n→∞ n n+1 1 n limn→∞ 1 + 1 n+1 = e. lim 1 + = n→∞ 1 n+1 limn→∞ 1 + n+1 Therefore for any > 0, there is N > 0, such that n+1 n 1 1 n > N =⇒ 1 + − e < , 1 + − e < , n n+1

(1.3.29)

Then for x > N + 1, we have n ≤ x ≤ n + 1 for some natural number n > N . By the inequalities (1.3.28) and (1.3.29), this further implies n x n+1 1 1 1 −e≤ 1+ −e≤ 1+ − e < . − < 1 + n+1 x n x 1 This proves that limx→+∞ 1 + = e, which further implies x x −x 1 1 lim 1 + = lim 1 − x→−∞ x→+∞ x x x−1 1 1 1+ = e. = lim 1 + x→+∞ x−1 x−1 Thus we conclude that

lim

x→∞

1 1+ x

x = e.

(1.3.30)

Example 1.3.22. By the composition rule, we have a a x 1 ax 1 x lim 1 + = lim 1 + = lim 1 + = ea . x→∞ x→∞ x→∞ x x x −1 1 1 −x 1 x x lim (1 − x) = lim 1 + = lim 1 + = e−1 . x→∞ x→∞ x→0 x x

56

CHAPTER 1. LIMIT AND CONTINUITY

By the exponential rule and the composition rule, we have 2 x 1 x 1 x lim 1+ = lim 1+ = e−∞ = 0. x→−∞ x→−∞ x x x+x2 1 x 2 x+x2 2 x1 lim (1 + x + x ) = lim (1 + x + x ) x→0

x→0

x+x2 1 limx→0 x = lim (1 + x + x2 ) x+x2 x→0 limx→0 (1+x) 1 y = lim 1 + = e1 = e. y→∞ y

The examples also show that there is no exponential rule for 1+∞ or 1−∞ . Exercise 1.3.21. Compute the limits. 1. limx→0 (1 + x)

−2 x

.

4. limx→∞

1 x 2. limx→∞ 1 + 2 . x

5. limx→0

2+x 2 + 3x

1

x

.

2 + x + x2 2 − 3x + 2x2

1 x2 +x

.

21 x +x 2 + x + x2 3. limx→0 . . 6. limx→∞ 2 2 − 3x + 2x −n 1 1 −x Exercise 1.3.22. Prove that n ≤ x ≤ n+1 implies 1 − ≤ 1− ≤ n+1 x 1 x 1 −(n+1) . Then use this and Exercise 1.2.19 to prove limx→−∞ 1 + = 1− n x e.

1.3.6

2+x 2 + 3x

1

x

More Property

In Section 1.3.5, we saw the close relation between the sequence limit and the function limit. In fact, we used the sequence limit to derive the function limit. Proposition 1.3.8. For a function f (x), limx→a f (x) = l if and only if limn→∞ f (xn ) = l for any sequence {xn } satisfying xn 6= a and limn→∞ xn = a. Proof. We prove the case a and l are finite. The proof for the other cases is similar. Suppose limx→a f (x) = l. Suppose {xn } is a sequence satisfying xn 6= a and limn→∞ xn = a. Then for any > 0, there are δ > 0 and N , such that 0 < |x − a| < δ =⇒ |f (x) − l| < . n > N =⇒ |xn − a| < δ. The assumption xn 6= a further implies |xn − a| > 0. Thus combining the two implications together leads to n > N =⇒ |f (xn ) − l| < ,

1.3. LIMIT OF FUNCTION

57

and we conclude limn→∞ xn = a. Conversely, assume limx→a f (x) 6= l (which means either the limit does not exist, or the limit exists but is not equal to l). Then there is > 0, such that for any δ > 0, there is x satisfying 0 < |x − a| < δ and |f (x) − 1 for natural numbers n, we find a l| ≥ . Specifically, by choosing δ = n 1 sequence {xn } satisfying 0 < |xn − a| < and |f (xn ) − l| ≥ . The first n inequality implies xn 6= a and limn→∞ xn = a. The second inequality implies limn→∞ f (xn ) 6= l. In conclusion, we find a sequence satisfying xn 6= a and limn→∞ xn = a but limn→∞ f (xn ) 6= l. Example 1.3.23. Consider limx→+∞ cos x. The sequences xn = 2nπ, yn = (2n+1)π diverge to +∞. However, the sequences {cos xn } = {1} and {cos yn } = {−1} converge to different limits. By Proposition 1.3.8, cos x diverges at +∞. Example 1.3.24. The Dirichlet9 function is ( 1 if x is rational D(x) = . 0 if x is irrational For any a, we can find a sequence {xn } of rational numbers and a sequence {yn } of 1 irrational numbers convergent to but not equal to a (for a = 0, take xn = and n √ 2 , for example). Then f (xn ) = 1 and f (yn ) = 0, so that limn→∞ f (xn ) = 1 yn = n and limn→∞ f (yn ) = 0. This implies that the Dirichlet function diverges everywhere. 1 Example 1.3.25. The restriction of the limits (1.3.23), (1.3.27), (1.3.30) to , n √ 1 √ , { n} give n 1 = 1. n→+∞ n √ √1 2 √1 lim n n = lim ( n) n = 1. n→∞ n→∞ √n 1 lim 1 + √ = e. n→∞ n lim n sin

Exercise 1.3.23. Determine convergence. tan x + 1 . tan x − 1

1. limx→ π2 tan x.

3. limx→ π2

π 2. limx→ π2 x − tan x. 2

4. limx→π tan x.

9 Johann Peter Gustav Lejeune Dirichlet, born 1805 in D¨ uren (French Empire, now Germany), died in G¨ ottingen (Germany). He proved the famous Fermat’s Last Theorem for the case n = 5 in 1825. He made fundamental contributions to the analytic number theory, partial differential equation, and Fourier series. He introduced his famous function in 1829.

58

CHAPTER 1. LIMIT AND CONTINUITY 5. limx→1 sin

x2

√ √ 6. limx→1 ( 1 + x − 2x) sin

x . −1

x . x2 − 1

Exercise 1.3.24. Compute the limits. sin √ 1 n−1 n2 − 1 . 1. limn→∞ 2n + 1 1 1 tan n 2. limn→∞ sin . n √1 n 1 3. limn→∞ sin . n

√ √ 4. limn→∞ (sin n + 1 − sin n). 1 n 5. limn→∞ 1 + sin . n (−1)n n 6. limn→∞ 1 + . n2

Exercise 1.3.25. Prove that limx→a f (x) converges if and only if limn→∞ f (xn ) converges for any sequence {xn } satisfying xn 6= a and limn→∞ xn = a. Exercise 1.3.26. Prove that limx→a+ f (x) = l if and only if limn→∞ f (xn ) = l for any strictly decreasing sequence {xn } converging to a. Moreover, state the similar criterion for limx→+∞ f (x) = l. Exercise 1.3.27. Prove the exponential rule for the sequence limits similar to Proposition 1.3.7: If limn→∞ xn = l > 0 and limn→∞ yn = k, then limn→∞ xynn = lk . Moreover, extend the exponential rule similar to Exercise 1.3.19.

A function is increasing if x > y =⇒ f (x) ≥ f (y). It is strictly increasing if x > y =⇒ f (x) > f (y). The concepts of decreasing and strictly decreasing functions can be similarly defined. A function is monotone if it is either increasing or decreasing. Proposition 1.2.7 on the convergence of monotone sequences can be extended to the one side limits and the limits at signed infinities of monotone functions. The following result is stated for the right limit, and can be proved in similar way. Proposition 1.3.9. Suppose f (x) is a monotone function defined and bounded for x > a and x near a. Then limx→a+ f (x) converges. We will use derivatives to determine whether certain functions are monotone. The method can be combined with the proposition to show the convergence of limits. Exercise 1.3.28. Suppose f (x) is an increasing function defined and unbounded for x > a and x near a. Prove that limx→a+ f (x) = −∞.

Finally, the Cauchy criterion can also be applied to the limit of functions. Proposition 1.3.10. A function f (x) has limit at a if and only if for any > 0, there is δ > 0, such that 0 < |x − a| < δ, 0 < |y − a| < δ =⇒ |f (x) − f (y)| < .

1.3. LIMIT OF FUNCTION

59

Similar statements can be made for the one side limits and the limits at infinities. Proof. For the convergence implying the Cauchy condition, the proof of Theorem 1.2.2 can be adopted without much change. Conversely, assume f (x) satisfies the Cauchy condition. Choose a sequence {xn } satisfying xn 6= a and limn→∞ xn = a. We prove that {f (xn )} is a Cauchy sequence. For any > 0, we have δ > 0 as given by the Cauchy condition for f (x). Then for δ > 0, there is N , such that n > N =⇒ |xn − a| < δ. The assumption xn 6= a also implies 0 < |xn − a| < δ. Combined with the Cauchy condition for f (x), we get m, n > N =⇒ 0 < |xm − a| < δ, 0 < |xn − a| < δ =⇒ |f (xm ) − f (xn )| < . Therefore {f (xn )} is a Cauchy sequence and must converge to a limit l. Next we prove that l is also the limit of the function at a. For any > 0, we have δ > 0 as given by the Cauchy condition for f (x). Then we can find xn satisfying 0 < |xn − a| < δ and |f (xn ) − l| < . Moreover, for x satisfying 0 < |x − a| < δ, we may apply the Cauchy condition to x and xn to get |f (x) − f (xn )| < . Therefore |f (x) − l| ≤ |f (xn ) − l| + |f (x) − f (xn )| < 2. This completes the proof that limx→a f (x) = l.

Example 1.3.26. Consider limx→+∞ cos x. For any N , we can find a natural number n such that x = 2nπ, y = (2n + 1)π > N , but |cos x − cos y| = 2. Thus the Cauchy criterion fails for = 2, and limx→+∞ cos x diverges. Exercise 1.3.29. Use Proposition 1.3.10 to show the divergence of limits. 1. limx→1

1.3.7

1+x . 1−x

1 2. limx→0 sin . x

3. limx→∞

2 + sin x . 2 − sin x

Order of Infinity and Infinitesimal

The limit limx→a f (x) = l means f (x) − l is an infinitesimal. The limit limx→a f (x) = ∞ means f (x) is an infinity. Thus the concept of limit is really a matter of infinitesimals and infinities. The infinities can be compared. For example, as x → ∞, the infinity x2 gets bigger much faster than the infinity x does. In general, bigger infinities are considered to have higher order. The following are more specific ways of comparing infinities f (x) and g(x) at a. The comparisons can also be applied to sequences.

60

CHAPTER 1. LIMIT AND CONTINUITY

f (x) g(x) = 0 (or equivalently, limx→a = ∞), then we denote g(x) f (x) f (x) = o(g(x)) and call g(x) an infinity of higher order than f (x). For example, If limx→a

xα = o(xβ ) at + ∞ for β > α > 0, xα = o(xβ ) at 0+ for β < α < 0, ax = o(bx ) at + ∞ for b > a > 1, xα = o(ax ) at + ∞ for α > 0, a > 1, nα = o(nβ ) for β > α > 0, nα = o(ax ) for α > 0, a > 1, an = o(n!), n! = o(nn ). If |f (x)| ≤ c|g(x)| for a constant c and x close to a, then we denote f (x) = O(g(x)) and say the order of f (x) is no higher than that of g(x). n For example, x(1 + sin x) = O(x) at ∞, n2+(−1) = O(n3 ). We also denote f (x) = O(1) to mean that f (x) is bounded, although the constant 1 is not an infinity. For example, sin x = O(1) and cos x = O(1) at ∞. If c1 |g(x)| ≤ |f (x)| ≤ c2 |g(x)| for constants c2 > c1 > 0 and x close to a, that is, f (x) = O(g(x)) and g(x) = O(f (x)), then f (x) and g(x) are infinities of the same order. For example, a polynomial of degree n has the same order as xn at ∞, x(3 + 2 sin x) and x have the same order at ∞. f (x) If limx→a = 1, then we denote f (x) ∼ g(x) and call f (x) and g(x) g(x) equivalent infinities. For example, x3 + 2x − 5 ∼ x3 and 2x + x4 ∼ 2x at ∞, x+2 3 ∼ 2 at 1, 3n − 2n+2 ∼ 3n . 2(x − 1) x −1 Example 1.3.27. By v q s p s √ u p √ x+ x+ x x+ x u 1 1 t √ = 1+ = 1+ + , x x x 32 x q p √ q x+ x+ x p √ √ √ we get limx→+∞ = 1 and conclude x + x + x ∼ x at +∞. x Exercise 1.3.30. Compare the infinities at +∞. r q q √ √ 3 √ x + 1 − 1, x + 1 − 1, 1 + 1 + 1 + x, x2 (2 + cos x), xx , 2x . Exercise 1.3.31. Find α, so that the infinities at ∞ have the same order as xα . r q p √ 2 3 5 x(2 + cos x) + x (2 − sin x), 6x + 5x , x + x2 + x3 . Exercise 1.3.32. Is it true that f1 (x) = o(g(x)) and f2 (x) = o(g(x)) =⇒ f1 (x) + f2 (x) = o(g(x))? Discuss the similar properties for product, composition, etc. Discuss the similar properties for other types of comparisons.

1.3. LIMIT OF FUNCTION

61

Exercise 1.3.33. Given a sequence of functions f1 (x), f2 (x), . . . , fn (x), . . . that are infinities as x → a. Construct a function f (x) such that fn (x) = o(f (x)) for all n. In other words, f (x) diverges to infinity much faster than any fn (x).

Similar to the infinities, the infinitesimals can also be compared. In general, smaller infinitesimals are considered to have higher order. The following comparisons are defined for infinitesimals f (x) and g(x) at a. f (x) If limx→a = 0, then we denote f (x) = o(g(x)) and call f (x) an g(x) infinitesimal of higher order than g(x). For example, xβ = o(xα ) at 0+ for β > α > 0, bx = o(ax ) at − ∞ for b > a > 1, nβ = o(nα ) for β < α < 0, an = o(nα ) for α < 0, 0 < a < 1. If |f (x)| ≤ c|g(x)| for a constant c and x close to a, then we denote f (x) = O(g(x)) and say the order of g(x) is no than that of f (x). For higher 1 cos x 1 example, sin x sin = O(x) at 0, =O at ∞. x x x If c1 |g(x)| ≤ |f (x)| ≤ c2 |g(x)| for constants c2 > c1 > 0 and x close to a, that is, f (x) = O(g(x)) and g(x) = O(f (x)), then f (x) and g(x) are infinitesimals of the same order. For example, x2 + 2x − 3 and x − 1 have the same order at 1, 1 − cos x and x2 have the same order at 0. f (x) = 1, then we denote f (x) ∼ g(x) and call f (x) and g(x) If limx→a g(x) equivalent infinitesimals. For example, sin x ∼ x and tan x ∼ x at 0, x2 + 2x − 3 ∼ 4(x − 1) at 1. Example 1.3.28. By q p √ x+ x+ x

s =

1

x8

3 4

x +

p √ x+ x 1

r =

x4

q p √ x+ x+ x we get limx→0+

q 1 x + x 2 + 1, 3 4

= 1 and conclude

1

q p √ 1 x + x + x ∼ x 8 at 0+ .

x8 Example 1.3.29. The signs o(g(x)) and O(g(x)) are often used as a function f (x) satisfying the indicated comparison with g(x). For example, we may write 1 sin x = x + o(x), cos x = 1 − x2 + o(x2 ) 2 to mean that

1 sin x = x + f1 (x), cos x = 1 − x2 + f2 (x), 2

with

f1 (x) f2 (x) = lim = 0. x→0 x2 x Exercise 1.3.34. Compare the infinitesimals at 0. q √ 1 3 √ x + 1 − 1, x + 1 − 1, tan x − x, sin x + cos x − 1, sin x sin . x lim

x→0

62

CHAPTER 1. LIMIT AND CONTINUITY

Exercise 1.3.35. Find α, so that the infinitesimals at 0 have the same order as xα . r q p √ 2 3 5 sin 2x − 2 sin x, 6x + 5x , x + x2 + x3 . Exercise 1.3.36. Do Exercise 1.3.32 again for the infinitesimals. Exercise 1.3.37. How do the comparisons change when we use the reciprocal 1 f (x) ↔ to convert between infinities and infinitesimals? f (x)

1.3.8

Additional Exercise

Extended Supremum and Extended Upper Limit f (x) = 1, then we expect f (x1 ) + f (x2 ) + · · · + f (xn ) and g(x) g(x1 ) + g(x2 ) + · · · + g(xn ) to be very close to each other when x1 , x2 , . . . , xn are very small. The following exercises indicate some cases our expectation is fulfilled. If limx→0

Exercise 1.3.38. Suppose f (x) is a function on (0, 1] satisfying limx→0+ Prove n 1 2 3 lim f +f +f + ··· + f n→∞ n2 n2 n2 n2 1 2 3 n n+1 1 = lim + 2 + 2 + · · · + 2 = lim = . n→∞ n2 n→∞ n n n 2n 2

f (x) = 1. x

f (x) = 1. Suppose for each natural g(x) number n, there are nonzero numbers xn,1 , xn,2 , . . . , xn,kn , so that {xn,k } uniformly converges to 0: For any > 0, there is N , such that n > N implies |xn,k | < . Prove that if lim (g(xn,1 ) + g(xn,2 ) + · · · + g(xn,kn )) = a, Exercise 1.3.39. Suppose g(x) > 0 and limx→0

n→∞

then lim (f (xn,1 ) + f (xn,2 ) + · · · + f (xn,kn )) = a.

n→∞

Upper and Lower Limits of Functions Suppose f (x) is defined near (but not necessarily at) a. Let n o LIMa f = lim f (xn ) : xn 6= a, lim xn = a, lim f (xn ) converges . n→∞

n→∞

n→∞

Define lim f (x) = sup LIMa f, lim f (x) = inf LIMa f.

x→a

x→a

Similar definitions can be made when a is replaced by a+ , a− , ∞, +∞ and −∞. Exercise 1.3.40. Prove the analogue of Proposition 1.2.9: l ∈ LIMa f if and only if for any > 0 and δ > 0, there is x satisfying 0 < |x − a| < δ and |f (x) − l| < .

1.4. CONTINUOUS FUNCTION

63

Exercise 1.3.41. Prove the analogue of Exercise 1.2.25: If limk→∞ lk = l and lk ∈ LIMa f for each k, then l ∈ LIMa f . Exercise 1.3.42. Prove the analogue of Proposition 1.2.10: 1. If l < limx→a f (x), then for any δ > 0, there is x satisfying 0 < |x − a| < δ and f (x) > l. 2. If l > limx→a f (x), then there is δ > 0, such that 0 < |x − a| < δ implies f (x) < l. The two properties completely characterize the upper limit. Exercise 1.3.43. Prove the analogue of Proposition 1.2.11: The upper and the lower limits of f (x) at a belong to LIMa f , and f (x) converges at a if and only if the upper and the lower limits at a are equal. Exercise 1.3.44. Prove the extension of Proposition 1.3.3: LIMa f = LIMa+ f ∪ LIMa− f. In particular, we have

lim f (x) = max lim f (x), lim f (x) , x→a x→a+ x→a− lim f (x) = min lim f (x), lim f (x) . x→a

x→a+

x→a−

Exercise 1.3.45. Extend the arithmetic and order properties of upper and lower limits in Exercise 1.2.27. Exercise 1.3.46. Prove the analogue of Exercise 1.2.30: lim f (x) = lim sup{f (x) : 0 < |x − a| < δ}.

x→a

1.4

δ→0+

Continuous Function

Changing quantities are often described by functions. Most of the changes in the real world are smooth, gradual and well behaved. For example, people do not often press the brake when driving a car, and the weather do not suddenly change from summer to winter. The functions describing such well behaved changes are at least continuous.

1.4.1

Definition

A function is continuous if its graph “does not break”. The graph of a function f (x) may break at a for various reasons. But the breaks are always one of the two types: Either limx→a f (x) diverges or the limit converges but the limit value is not f (a). In Figure 1.6, the function is continuous at a4 and is not continuous at the other four points. Specifically, limx→a1 f (x) exists but is not equal to f (a1 ). limx→a2 f (x) does not exist because the left and the right limits are not the same. limx→a3 f (x) does not exist because the function is not bounded near a3 . limx→a5 f (x) does not exist because the left limit does not exist. The observation leads to the following definition.

64

CHAPTER 1. LIMIT AND CONTINUITY .. .. ....................... .. ........ ....... ...... ... ... . .. . . . . . . . . . . . . . . . . . . . . . . . . . ...... . ... .. ... ..... .. ... . ............. .. . . . . . .. ........ .. ... .. ............................... . . . . . . . ... .. ....... ..... .. ... .. ...... ..... .. ..... ............................. .. ... .. ...... ... .. .. . .. ....... .. . . . . . . . . ... .. . . ......... .. . . . ... .. .. .. .. ... ... .. ...... ... . . .. . . . . . . . . ... .. .. . .. .............. ... . . ................................................................................................................................................................................................................................................................ a1 a2 a4 a5 .. a3 Figure 1.6: continuity and discontinuity

Definition 1.4.1. A function f (x) defined near (and include) a is continuous at a if limx→a f (x) = f (a). Using the -δ language, the continuity of f (x) at a means that for any > 0, there is δ > 0, such that |x − a| < δ =⇒ |f (x) − f (a)| < .

(1.4.1)

A function is right continuous at a if limx→a+ f (x) = f (a), and is left continuous if limx→a− f (x) = f (a). For example, the function in Figure 1.6 is left continuous at a2 and right continuous at a5 . A function is continuous at a if and only if it is both left and right continuous at a. A function defined on an open interval (a, b) is continuous if it is continuous at every point on the interval. A function defined on a closed interval [a, b] is continuous if it is continuous at every point on (a, b), is right continuous at a, and is left continuous at b. Continuities for functions on other kinds of intervals can be defined similarly. Most basic functions are continuous. The limits (1.3.15), (1.3.17), (1.3.20), (1.3.21), (1.3.22), (1.3.26) show that polynomials, rational functions, trignometric functions, power functions, and exponential functions are continuous (at the places where the functions are defined). Then the arithmetic rule and the composition rule in Proposition 1.3.6 imply the following. Proposition 1.4.2. The arithmetic combinations and the compositions of continuous functions are still continuous. tan x x , 2x cos 2 x +1 x+1 are continuous. Thus examples of discontinuity can only be found among more exotic functions. Proposition 1.3.8 implies the following criterion for the continuity in terms of sequences. By the proposition, functions such as sin2 x+tan x2 , √

Proposition 1.4.3. A function f (x) is continuous at a if and only if limn→∞ f (xn ) = f (a) for any sequence {xn } converging to a. The conclusion of the proposition can be written as lim f (xn ) = f lim xn . n→∞

n→∞

(1.4.2)

1.4. CONTINUOUS FUNCTION

65

Moreover, by the composition rule for the limit of functions, the continuity of f (x) at a = limy→b g(y) also means lim f (g(y)) = f lim g(y) . (1.4.3) y→b

y→b

Therefore the function and the limit can be interchanged if the function is continuous. ..... ..... ..... ..... . . .. rational .. rational ... . . .................................................. ................................. .. .... .. ....... .. .. .. ..... .. .. . . irrational . . . . . . . .................................................... .........................................irrational ..................... ....................................................................... . . . ... . . .. . ... . .. .. .... ... . . . . . . ............................................. .. .. ..... . .. xD(x) .. D(x) .. sign(x) . . . .. .. .. ... Figure 1.7: discontinuous functions Example 1.4.1. The sign function   1 sign(x) = 0   −1

if x > 0 if x = 0 if x < 0

is continuous everywhere except at 0. The Dirichlet function in Example 1.3.24 is not continuous everywhere. Multiplying x to the Dirichlet function produces a function ( x if x is rational xD(x) = 0 if x is irrational that is continuous only at 0. p , where p is an integer, q 1 q is a natural number, and p, q are coprime, define R(x) = . For an irrational q number x, define R(x) = 0. Finally define R(0) = 1. We will show that R(x) is continuous at precisely all the irrational numbers. By the way R(x) is defined, for any integer N , the only numbers x satisfying 1 p are the rational numbers with q ≤ N . Therefore on any bounded R(x) ≥ N q 1 interval, we have R(x) < for all except finitely many rational numbers. For N any irrational a, let > 0 be the smallest distance between a and these rational 1 numbers. Then we have |R(x) − R(a)| = R(x) < for all x satisfying |x − a| < . N p This proves that f (x) is continuous at a. On the other hand, for rational a = , q

Example 1.4.2 (Thomae10 ). For a rational number x =

10 Karl Johannes Thomae, born 1840 in Laucha (Germany), died 1921 in Jena (Germany). Thomae made important contributions to the function theory. In 1870 he showed the continuity in each variable does not imply the joint continuity. He constructed the example here in 1875.

66

CHAPTER 1. LIMIT AND CONTINUITY

we can find irrational numbers x arbitrarily close to a, but with |R(x) − R(a)| = 1 R(a) = . Since q does not depend on x, we do not have limx→a R(x) = R(a), q and the function is not continuous at the rational point. Exercise 1.4.1. Which functions are not continuous? Where do the discontinuities happen, and for what reason?  sin 1 if x 6= 0 1. |x|. 8. . x 0 x2 sin x if x = 0 2. 2 . x + sin x  x sin 1 if x 6= 0 3. [x] sin x. 9. . x 0 if x = 0 4. [x] sin πx.  (  |x| xx if x > 0 if x 6= 0 10. . 5. . x 0 if x ≤ 0 0 if x = 0   x  if x > 0  |x| x sin x if x 6= 0 x . 6. 11. (−x) if x < 0 . x 0   if x = 0 1 if x = 0  (  |x| sin x if x 6= 0 x if x is rational 7. . x 12. . 1 1 − x if x is irrational if x = 0 Exercise 1.4.2. Construct functions on (0, 2) satisfying the requirements. 1. f (x) is not continuous at 2. f (x) is continuous at

3 1 , 1 and and is continuous everywhere else. 2 2

3 1 , 1 and and is not continuous everywhere else. 2 2

3. f (x) is continuous everywhere except at

1 for all natural numbers n. n

1 4. f (x) is not left continuous at , not right continuous at 1, neither side 2 3 1 continuous at , and continuous everywhere else (including the right of 2 2 and the left of 1). Exercise 1.4.3. Prove that a function f (x) is continuous at a if any only if there is l, such that for any > 0, there is δ > 0, such that |x − a| < δ =⇒ |f (x) − l| < . By (1.4.1), all you need to do here is to show l = f (a). Exercise 1.4.4. Prove that if a function is continuous on (a, b] and [b, c), then it is continuous on (a, c). What about other types of intervals? Exercise 1.4.5. Suppose f (x) and g(x) are continuous. Prove max{f (x), g(x)} and min{f (x), g(x)} are also continuous. Exercise 1.4.6. Suppose f (x) is continuous on [a, b] and f (r) = 0 for all rational numbers r ∈ [a, b]. Prove f (x) = 0 on the whole interval.

1.4. CONTINUOUS FUNCTION

67

Exercise 1.4.7. Prove that a continuous function f (x) on (a, b) is the restriction of a continuous function on [a, b] if and only if the limits limx→a+ f (x) and limx→b− f (x) exist. Exercise 1.4.8. Suppose f (x) is an increasing function on [a, b]. Prove that if any number in [f (a), f (b)] can be the value of f (x), then the function is continuous. Exercise 1.4.9. Suppose f (x) and g(x) be continuous functions on (a, b). Find the places where the following function is continuous. ( f (x) if x is rational h(x) = g(x) if x is irrational Exercise 1.4.10. Suppose f (x) is an increasing function on [a, b]. By Proposition 1.3.9, the limits f (c+ ) = limx→c+ f (x) and f (c− ) = limx→c− f (x) exist at any c ∈ (a, b). 1. Prove that for any > 0, there are finitely many c satisfying f (c+ )−f (c− ) > . 2. Prove that f (x) is not continuous only at countably many points.

1.4.2

Uniformly Continuous Function

Theorem 1.4.4. Suppose f (x) is a continuous function on a bounded closed interval [a, b]. Then for any > 0, there is δ > 0, such that for x, y ∈ [a, b], |x − y| < δ =⇒ |f (x) − f (y)| < .

(1.4.4)

In the -δ formulation (1.4.1) of the continuity, only one variable x is allowed to change. This means that, in addition to being dependent on , the choice of δ may also be dependent on the location a of the continuity. The property (1.4.4) says that the choice of δ can be the same for all the points on the interval, so that it depends on only. Therefore the property described in the theorem is called the uniform continuity, and the theorem basically says a continuous function on a bounded closed interval is uniformly continuous. Proof of Theorem 1.4.4. Suppose f (x) is not uniformly continuous. Then there is > 0, such that for any natural number n, there are xn , yn ∈ [a, b], such that 1 |xn − yn | < , |f (xn ) − f (yn )| > . (1.4.5) n Since the sequence {xn } is bounded by a and b, by Theorem 1.2.8, there is 1 1 a convergent subsequence {xnk }. Then by xnk − < ynk < xnk + and nk nk the sandwich rule, the subsequence {ynk } also converges to the same limit. Denote limk→∞ xnk = limk→∞ ynk = c. By a ≤ xn , yn ≤ b, we have a ≤ c ≤ b. Therefore by the assumption, f (x) is continuous at c. By Proposition 1.4.3, we have lim f (xnk ) = lim f (ynk ) = f (c).

k→∞

k→∞

68

CHAPTER 1. LIMIT AND CONTINUITY

Then by the second inequality in (1.4.5), we have ≤ lim f (xnk ) − lim f (ynk ) = |f (c) − f (c)| = 0. k→∞

k→∞

The contradiction shows that it was wrong to assume that the function is not uniformly continuous. Example 1.4.3. Consider the function x2 on [0, 2]. For any > 0, take δ = . 4 Then for 0 ≤ x, y ≤ 2, we have |x − y| < δ =⇒ |x2 − y 2 | = |x − y||x + y| ≤ 4|x − y| < . Thus x2 is uniformly continuous on [0, 2]. Now consider the same function on [0, ∞). For any δ > 0, take x =

1 , δ

1 δ δ2 + . Then |x − y| < δ, but |x2 − y 2 | = xδ + > xδ = 1. Thus (1.4.4) fails δ 2 4 for = 1, and x2 is not uniformly continuous on [0, ∞). √ Example 1.4.4. Consider the function x on [1, ∞). For any > 0, take δ = . Then for x, y ≥ 1, we have y=

√ |x − y| |x − y| √ |x − y| < δ =⇒ | x − y| = √ < . √ ≤ 2 | x + y| √ Thus x is uniformly continuous on [1, ∞). √ We also know x is continuous on [0, 1], by Theorem 1.4.4. Then by using √ Exercise 1.4.12, x is uniformly continuous on [0, ∞). Exercise 1.4.11. Which are uniformly continuous? 1. f (x) = x2 on (0, 3). 2. f (x) =

1 on [1, 3]. x

1 on (0, 3]. x √ 4. f (x) = 3 x on (−∞, ∞). 3. f (x) =

3

5. f (x) = x 2 on [1, ∞).

6. f (x) = sin x on (−∞, ∞). 7. f (x) = sin

1 on (0, 1]. x

8. f (x) = x sin

1 on (0, 1]. x

9. f (x) = xx on (0, 1]. 1 x 10. f (x) = 1 + on (0, ∞). x

Exercise 1.4.12. Prove that if a function is uniformly continuous on (a, b] and [b, c), then it is uniformly continuous on (a, c). What about other types of intervals? Exercise 1.4.13. A function f (x) is called Lipschitz11 if there is a constant L such that |f (x) − f (y)| ≤ L|x − y| for any x and y. Prove that Lipschitz functions are uniformly continuous. 11 Rudolf Otto Sigismund Lipschitz, born 1832 in K¨onigsberg (Germany, now Kaliningrad, Russia), died 1903 in Bonn (Germany). He made important contributions in number theory, Fourier series, differential equations, and mechanics.

1.4. CONTINUOUS FUNCTION

69

Exercise 1.4.14. A function f (x) on the whole real line is called periodic if there is a constant p such that f (x + p) = f (x) for any x (the number p is called the period of the function). Prove that continuous periodic functions are uniformly continuous. Exercise 1.4.15. Is the sum of uniformly continuous functions uniformly continuous? What about the product, the maximum and the composition of uniformly continuous functions? Exercise 1.4.16. Suppose f (x) is a continuous function on [a, b]. Prove that g(x) = sup{f (t) : a ≤ t ≤ x} is continuous. Exercise 1.4.17. Let f (x) be a continuous function on (a, b). 1. Prove that if the limits limx→a+ f (x), limx→b− f (x) exist, then f (x) is uniformly continuous. 2. Prove that if (a, b) is bounded and f (x) is uniformly continuous, then the limits limx→a+ f (x), limx→b− f (x) exist. √ 3. Use x to show that the bounded condition in the second part is necessary. 4. Use the second part to show that sin

1 is not uniformly continuous on (0, 1). x

Exercise 1.4.18 (Dirichlet). For any continuous function f (x) on a bounded and closed interval [a, b] and > 0, inductively define c0 = a, cn = sup{c : |f (x) − f (cn−1 )| < on [cn−1 , c]}. Prove that the process must stop after finitely many steps, which means that there is n, such that |f (x) − f (cn−1 )| < on [cn , b]. Then use this to give another proof of Theorem 1.4.4.

1.4.3

Maximum and Minimum

Theorem 1.4.5. A continuous function on a bounded closed interval must be bounded and reaches its maximum and minimum. The theorem says that there are x0 , x1 ∈ [a, b], such that f (x0 ) ≤ f (x) ≤ f (x1 ) for any x ∈ [a, b]. Thus the values of the function are bounded by f (x0 ) and f (x1 ). Moreover, the function reaches its minimum at x0 and reaches its maximum at x1 . We also note that although the theorem tells us the existence of the maximum and the minimum, its does not tell us how to find the extrema. The method for finding the extrema will be developed in Section 2.2.1. Proof of Theorem 1.4.5. Let f (x) be a continuous function on a bounded closed interval [a, b]. We first prove the function is bounded. By Theorem 1.4.4, for = 1 > 0, there is δ > 0, such that x, y ∈ [a, b] and |x − y| < δ implies |f (x) − f (y)| < 1. In other words, f (x) is bounded by f (y) + 1 and f (y) − 1 on any open interval (y − δ, y + δ) of length 2δ. Since the bounded b−a interval [a, b] can be covered by finitely many (any number bigger than 2δ is enough) such intervals, the function is bounded on the whole interval [a, b].

70

CHAPTER 1. LIMIT AND CONTINUITY .... .. . .. ................ ........................................................max . . . . .. . .. . . ... ... ...... .. .. .. . .... .. .. .. . . ..... ... . .. .. . ........... ... .. .. .. .. .. . .. . . .. .. .. .. . . .. .. .. . .. . .. ..... .. . .. .. .. .... .. ... ... .. . ..... .. .. ............................min .. .. ... . . . . . . . . . .. .. .. .. .. .. . .. .. .. .. .. .. .. . . . . ................................................................................................................................................................................. .. a x0 x1 b .. .. . Figure 1.8: maximum and minimum

Let β = sup{f (x) : x ∈ [a, b]}. Showing f (x) reaches its maximum is the same as proving β is a value of f (x). By the characterization of supremum, 1 for any natural number n, there is xn ∈ [a, b], such that β − < f (xn ) ≤ β. n By the sandwich rule, we get lim f (xn ) = β.

n→∞

(1.4.6)

On the other hand, since the sequence {xn } is bounded by a and b, by Theorem 1.2.8, there is a convergent subsequence {xnk } with c = limk→∞ xnk ∈ [a, b]. Then we have (1.4.7) lim f (xnk ) = f (c). k→∞

Combining the limits (1.4.6), (1.4.7) and using Proposition 1.2.1, we get f (c) = β. The proof for the function to reach its minimum is similar. Exercise 1.4.19. Construct functions satisfying the requirements. 1. f (x) is continuous and not bounded on (0, 1). 2. f (x) is continuous and bounded on (0, 1) but does not reach its maximum. 3. f (x) is continuous and bounded on (0, 1). Moreover, f (x) also reaches its maximum and minimum. 4. f (x) is not continuous and not bounded on [0, 1]. 5. f (x) is not continuous on [0, 1] but reaches its maximum and minimum. 6. f (x) is continuous and bounded on (−∞, ∞) but does not reach its maximum. 7. f (x) is continuous and bounded on (−∞, ∞). Moreover, f (x) also reaches its maximum and minimum. What do your examples say about Theorem 1.4.5?

1.4. CONTINUOUS FUNCTION

71

Exercise 1.4.20. Suppose a continuous function f (x) on (a, b) satisfies limx→a+ f (x) = limx→b− f (x) = −∞. Prove that the function reaches its maximum on the interval. Exercise 1.4.21. Suppose f (x) is continuous on a bounded closed interval [a, b]. 1 Suppose for any a ≤ x ≤ b, there is a ≤ y ≤ b, such that |f (y)| ≤ |f (x)|. Prove 2 that f (c) = 0 for some a ≤ c ≤ b. Does the conclusion still hold if the closed interval is changed to an open one? Exercise 1.4.22. Prove that a uniformly continuous function on a bounded interval must be bounded.

1.4.4

Intermediate Value Theorem

Theorem 1.4.6 (Intermediate Value Theorem). Suppose f (x) is a continuous function on a bounded closed interval [a, b]. If y is a number between f (a) and f (b), then there is c ∈ [a, b], such that f (c) = y. By Theorem 1.4.5, for the function f (x) in the theorem above, there are x0 , x1 ∈ [a, b], such that f (x0 ) = min{f (x) : a ≤ x ≤ b}, f (x1 ) = max{f (x) : a ≤ x ≤ b}. Denote α = f (x0 ) and β = f (x1 ). Applying the intermediate value theorem to f (x) on [x0 , x1 ] (or [x1 , x0 ] if x0 > x1 ), we see that f (x) reaches any value between α and β. This proves the following result. Proposition 1.4.7. Suppose f (x) is a continuous function on a bounded closed interval [a, b]. Then the values of the function on the interval again form a bounded closed interval: f ([a, b]) = {f (x) : a ≤ x ≤ b} = [α, β]. Conversely, the proposition implies both Theorems 1.4.5 and 1.4.6. .... .. max .. β ............................................................................. .. .. .. .... .. ..... .. ... .. .. . .. .... .. .. .. . . ...... .... .. . .. .. . ......... .. . . .. .. y ............................................. .. .. .. .. .. .. .. .. . .. .. ... . .. . .. .. ...... .. . .. .. . .... . . .. .. .. .. .......min..... ... . . . . . . . . . .. . . . α ....................... ... .. .. .. .. .. .. .. .. .. .. .. .. . .. ................................................................................................................................................................................... x0 x1 .. a c b .. .. Figure 1.9: intermediate value theorem

72

CHAPTER 1. LIMIT AND CONTINUITY

Proof of Theorem 1.4.6. Without loss of generality, assume f (a) ≤ y ≤ f (b). Let X = {x ∈ [a, b] : f (x) ≤ y}. The set is not empty because a ∈ X. The set is also bounded by a and b. Therefore we have c = sup X ∈ [a, b]. We expect f (c) = y. If f (c) > y, then by the order property in Proposition 1.3.6, there is δ > 0, such that f (x) > y for any x satisfying c − δ < x ≤ c (the right side of c may not be allowed in case c = b). On the other hand, by c = sup X, there is c − δ < x0 ≤ c, such that f (x0 ) ≤ y. Therefore we have a contradiction at x0 . If f (c) < y, then by f (b) ≥ y, we have c < b. Again by the order property in Proposition 1.3.6, there is δ > 0, such that f (c) < y for any x satisfying |x − c| < δ. In particular, any x0 ∈ (c, c + δ) will satisfy x0 > c and f (x0 ) < y. This contradicts with the assumption that c is an upper bound of X. Thus we conclude that f (c) = y, and the proof is complete. Example 1.4.5. For the function f (x) = x5 + 2x3 − 5x2 + 1, we have f (0) = 1, f (1) = −1, f (2) = 29. By the intermediate value theorem, there are a ∈ [0, 1] and b ∈ [1, 2] such that f (a) = f (b) = 0. Example 1.4.6. A number a is a root of a function f (x) if f (a) = 0. For a polynomial f (x) = an xn + an−1 xn−1 + · · · + a1 x + a0 with an > 0 and n odd, we have a1 a0 an−1 + · · · + n−1 + n . f (x) = xn g(x), g(x) = an + x x x Since limx→∞ g(x) = an > 0 and n is odd, we have f (x) > 0 for sufficiently big and positive x, and f (x) < 0 for sufficiently big and negative x. Then by the intermediate value theorem, f (a) = 0 for some a (between two sufficiently big numbers of opposite signs). Similar argument also works for the case an < 0. Therefore we conclude that a polynomial of odd degree must have a real root. π π Example 1.4.7. The function tan x is continuous on − , . By 2 2 lim tan x = −∞,

x→− π2 +

lim tan x = +∞,

x→ π2 −

π π < a < b < , such that tan a < y < tan b. 2 2 Then by the intermediate value theorem, we have y = f (c) for some c ∈ [a, b]. This shows that any real number is the tangent of some angle. for any number y, we can find −

Exercise 1.4.23. Prove that for any polynomial of odd degree, any real number is the value of the polynomial. Moreover, estimate a solution of the equation x5 − 3x4 + 5x = 4 to the accuracy of the first decimal point (i.e., find n such that n+1 n you are certain that there is a solution between and ). 10 10 Exercise 1.4.24. Show that 2x = 3x has solution on (0, 1). Show that 3x = x2 has solution. Exercise 1.4.25. Suppose a continuous function on an interval is never zero. Prove that it is always positive or always negative.

1.4. CONTINUOUS FUNCTION

73

Exercise 1.4.26. Suppose a continuous function f (x) on (a, b) satisfies limx→a+ f (x) = α and limx→b− f (x) = β. Prove that any number in (α, β) is the value of the function. The statement holds even when some of a, b, α, β are infinities. Exercise 1.4.27. Suppose f (x) is a continuous function on (a, b). Prove that if f (x) only takes rational numbers as values, then f (x) is a constant. Exercise 1.4.28. Suppose f (x) and g(x) are continuous functions on [a, b]. Prove that if f (a) < g(a) and f (b) > g(b), then f (c) = g(c) for some c ∈ (a, b). Exercise 1.4.29. Suppose f : [0, 1] → [0, 1] is a continuous function. Prove that f (c) = c for some c ∈ [0, 1]. c is called a fixed point of f . Exercise 1.4.30. Suppose f (x) is a two-to-one function on [a, b]. In other words, for any x ∈ [a, b], there is exactly one other y ∈ [a, b] such that x 6= y and f (x) = f (y). Prove that f (x) is not continuous.

1.4.5

Invertible Continuous Function

Functions are maps. By writing a function in the form f : [a, b] → [α, β], we mean the function is defined on the domain [a, b] and its values lie in the range [α, β]. The function is onto (or surjective) if any γ ∈ [α, β] can be written as γ = f (c) for some c ∈ [a, b]. It is one-to-one (or injective) if x1 6= x2 implies f (x1 ) 6= f (x2 ). It is invertible (or bijective) if there is another function g : [α, β] → [a, b] such that g(f (x)) = x for any x ∈ [a, b] and f (g(y)) = y for any y ∈ [α, β]. The function g is called the inverse of f and is denoted g = f −1 . It is a basic fact that a function is invertible if and only if it is onto and one-to-one. Moreover, the inverse function is unique. In the discussion above, the domain and the range are expressed as closed intervals. The discussion is still valid if other kinds of intervals (or any sets of real numbers) are used. Theorem 1.4.8. A continuous function on an interval is invertible if and only if it is strictly monotone. Moreover, the inverse is also continuous and strictly monotone. Proof. We only prove the case f (x) is a continuous function on a bounded and closed interval [a, b]. Suppose f (x) is strictly increasing. By Theorem 1.4.5, the function is an onto map f : [a, b] → [α, β] = [f (a), f (b)]. Moreover, x1 6= x2 ⇐⇒ x1 > x2 or x1 < x2 =⇒ f (x1 ) > f (x2 ) or f (x1 ) < f (x2 ) ⇐⇒ f (x1 ) 6= f (x2 ). Thus the function is one-to-one. Since onto and one-to-one maps are invertible, we conclude that f (x) invertible. Conversely, suppose f : [a, b] → [α, β] is invertible and f (a) < f (b). We claim f (x) is strictly increasing. If f (x) is not strictly increasing, then there are a ≤ x1 < x2 ≤ b satisfying f (x1 ) ≥ f (x2 ). Since the invertibility implies one-to-one, we must have f (x1 ) > f (x2 ). Then we have two possibilities.

74

CHAPTER 1. LIMIT AND CONTINUITY 1. If f (a) ≥ f (x1 ), then f (b) > f (a) ≥ f (x1 ) > f (x2 ), and we have f (a) > f (x2 ) < f (b). 2. If f (a) < f (x1 ), then f (a) < f (x1 ) > f (x2 ).

By f being one-to-one, we have either y1 < y2 < y3 , f (y1 ) > f (y2 ) < f (y3 ), or y1 < y2 < y3 , f (y1 ) < f (y2 ) > f (y3 ). For the first case, we have either f (y1 ) ≤ f (y3 ) or f (y1 ) ≥ f (y3 ). If f (y1 ) ≤ f (y3 ), then by the intermediate value theorem, there is y1 < y2 ≤ y4 ≤ y3 satisfying f (y1 ) = f (y4 ). Similarly, if f (y1 ) ≥ f (y3 ), then there is y1 ≤ y4 ≤ y2 < y3 satisfying f (y4 ) = f (y3 ). Both contradict the one-to-one property. Thus the first case cannot happen. The proof for the impossibility of the second case is similar. This completes the proof that f (x) must be strictly increasing. It remains to prove that if f (x) is continuous, strictly increasing and invertible, then its inverse f −1 (y) is also continuous and strictly increasing. Let y1 = f (x1 ) and y2 = f (x2 ). Then x1 ≥ x2 =⇒ y1 ≥ y2 . The implication is the same as y1 < y2 =⇒ x1 < x2 , which means exactly that f −1 (x) is strictly increasing. Finally, we prove the continuity of f (x) at γ ∈ [α, β] = [f (a), f (b)]. We just proved that f −1 is strictly increasing, so that by Proposition 1.3.9, the right side limit limy→γ + f −1 (y) = c exists. Applying the continuous function f (x) and (1.4.3), we get −1 −1 γ = lim+ y = lim+ f (f (y)) = f lim+ f (y) = f (c). y→γ

y→γ

y→γ

Therefore limy→γ + f −1 (y) = c = f −1 (γ), which means f −1 is right continuous at γ. By the similar reason, the inverse function is left continuous. For a strictly increasing and continuous function f (x) on an (not necessarily bounded) open interval (a, b), the same argument as before shows it is one-to-one. Moreover, by Proposition 1.3.9, the limits α = limx→a+ f (x) and β = limx→b− f (x) exist. Then by Exercise 1.4.26, the map f : (a, b) → (α, β) is onto. Thus the function is invertible. For an invertible and continuous function f (x) on (a, b), the restriction on any closed interval [a0 , b0 ] ⊂ (a, b) is still invertible and continuous. Then by Theorem 1.4.8 for the closed intervals, the restriction on [a0 , b0 ] is one-to-one. Since [a0 , b0 ] is arbitrary, we conclude that f (x) is one-to-one on (a, b).

1.4. CONTINUOUS FUNCTION

75

The proof of the continuity of the inverse function can also be applied to the strictly increasing and continuous functions on (a, b). Moreover, the proof also shows lim+ f −1 (y) = a, lim− f −1 (y) = b.

y→α

y→β

Exercise 1.4.31. Determine invertible functions. 1. x2 : [0, 1] → [0, 1].

5. x2 : [−2, 0] → [0, 4].

2. x2 : (0, 2) → (0, 4).

6. x2 : (0, ∞) → (0, ∞).

3. x2 : (0, 2) → (0, 5).

7. x2 : (−∞, 0] → [0, ∞).

4. x2 : (−2, 2) → (0, 4).

8. x2 : (−∞, ∞) → (−∞, ∞).

Exercise 1.4.32. Suppose f (x) is a strictly decreasing and continuous function on [a, b). Let α = f (a) and β = limx→b− f (x). Prove that f : [a, b) → (β, α] is invertible and limy→β + f −1 (y) = b.

1.4.6

Inverse Trignometric and Logarithmic Functions

Theorem 1.4.8 may be used to introduce some important inverse functions. The functions h π πi sin : − , → [−1, 1], 2 2 cos : [0, π] → [−1, 1], π π tan : − , → (−∞, ∞), 2 2 are onto, strictly monotone and continuous (see (1.3.20), (1.3.21), (1.3.22)). Therefore they are invertible, and the inverses trigonometric functions h π πi , arcsin : [−1, 1] → − , 2 2 arccos : [−1, 1] → [0, π], π π arctan : (−∞, ∞) → − , , 2 2 are also strictly monotone and continuous. The other threeinversetrigonoπ metric functions may be defined similarly. The equality cos − x = sin x 2 implies that π arcsin x + arccos x = . 2 The similar equalities also imply the similar equations between other inverse trigonometric functions. Moreover, by the remark made after the proof of Theorem 1.4.8, we have π lim arctan x = − , x→−∞ 2

lim arctan x =

x→+∞

π . 2

Similar limits hold for other inverse trigonometric functions.

76

CHAPTER 1. LIMIT AND CONTINUITY

Example 1.4.8. The continuity of inverse sine function says that limx→0 arcsin x = arcsin 0 = 0. Moreover, by the composition rule, the variable x in the limit (1.3.23) x may be substituted by the continuous function arcsin x to get limx→0 = 1. arcsin x arcsin x Taking reciprocal, we get limx→0 = 1. x Exercise 1.4.33. Prove the equalities.

2. arccos(−x) = π − arccos x.

π 1 = for x > 0. x 2 √ 6. cos(arcsin x) = 1 − x2 .

3. arctan(−x) = − arctan x.

7. tan(arcsin x) = √

1. arcsin(−x) = − arcsin x.

5. arctan x+arctan

x . 1 − x2 √ 1 − x2 8. tan(arccos x) = . x

π 4. arcsin(cos x) = − x for 0 ≤ x ≤ 2 π. Exercise 1.4.34. Compute the limits. 1. limx→−1+

2 arcsin x + π . x+1

4. limx→0

(arccos x)2 . x−1 arctan x . x

tan(arcsin x) . sin(arctan x)

2. limx→1−

5. limx→0+ (sin x)arcsin x .

3. limx→0

π 6. limx→+∞ x arctan x − . 2

By (1.3.26), the exponential function αx based on a constant α > 0 is continuous. For α > 1, the function is strictly increasing, and lim αx = 0,

x→−∞

lim αx = +∞.

x→+∞

For 0 < α < 1, the function is strictly decreasing, and lim αx = +∞,

x→−∞

lim αx = 0.

x→+∞

Thus the exponential function αx : (−∞, ∞) → (0, ∞), 0 < α 6= 1 is invertible. The inverse function logα x : (0, ∞) → (−∞, ∞), 0 < α 6= 1 is the logarithmic function, which is also continuous, strictly increasing for α > 1 and strictly decreasing for 0 < α < 1. Moreover, ( −∞ if α > 1 , (1.4.8) lim+ logα x = x→0 +∞ if 0 < α < 1 ( +∞ if α > 1 lim logα x = . (1.4.9) x→+∞ −∞ if 0 < α < 1

1.4. CONTINUOUS FUNCTION

77

The following equalities for the exponential function α0 = 1, α1 = α, αx αy = αx+y , (αx )y = αxy imply the following equalities for the logarithmic function logα 1 = 0, logα α = 1, logα (xy) = logα x + logα y, log x . log α The logarithmic function with the special base α = e is called the natural logarithmic function and is denoted by log or ln. In particular, we have log e = 1. Next we derive some important limits involving the exponential and the logarithmic functions. xβ 1 in the limit (1.3.10), we have limx→+∞ x = 0. By Taking α = e e limx→+∞ log x = +∞ and the composition rule, we may substitute x by log x and get (log x)β lim = 0. (1.4.10) x→+∞ x This means that at +∞, the logarithmic function goes to infinity much slower than x. Even raising the power of the logarithmic function does not help the speed of growth. 1 1 Substituting x by in the limit (1.3.30), we get limx→0 (1 + x) x = e. By x 1 1 x the continuity of log, we have limx→0 log (1 + x) = log limx→0 (1 + x) x = log e = 1. Then by the property of log, the limit is logα xy = y logα x, logα x =

lim

x→0

log(1 + x) = 1. x

(1.4.11)

The continuity of ex tells us limx→0 (ex −1) = 0. Substituting x in (1.4.11) x log(1 + ex − 1) by ex − 1, we get limx→0 x = limx→0 = 1. Taking recipe −1 ex − 1 rocal, we have ex − 1 lim = 1. (1.4.12) x→0 x The continuity of log tells us limx→0 log(1 + x) = 0. Substituting x in (1 + x)α − 1 (1.4.12) by α log(1+x) and using eα log(1+x) = (1+x)α , we get limx→0 = log(1 + x) α. Multiplying the limit with (1.4.11), we get (1 + x)α − 1 = α. x→0 x lim

Example 1.4.9. For β > 0, taking lim

x→+∞

(1.4.13)

1 power of the limit (1.4.10) gives us β log x = 0 for β > 0. xβ

(1.4.14)

78

CHAPTER 1. LIMIT AND CONTINUITY

Then substituting x by

1 1 and using log = − log x, we get x x lim xβ log x = 0 for β > 0.

(1.4.15)

x→0+

The limits (1.4.10), (1.4.14), (1.4.15) can be extended to the logarithms based on log x any α > 0 by using logα x = . log α Example 1.4.10. Substituting x by x log α in (1.4.12) and using ex log α = αx , we get a more general formula αx − 1 = log α. x→0 x lim

(1.4.16)

Example 1.4.11. We already know limx→0+ xx = 1. To further find out how close xx − 1 y−1 xx is to 1, we let y = xx . Then x log x = log y and = . As x → 0+ , x log x log y y−1 z we have y → 1+ , and limy→1 = limz→0 = 1 by (1.4.11). Thus we log y log(1 + z) conclude that xx = 1 + x log x + o(x log x), where x log x is an infinitesimal at 0+ . Exercise 1.4.35. Compute the limits. log x . 1. limx→1 √ x−1 2. limx→0+

log x . x

3. limx→+∞

log(log x) . x

11. limx→+∞

√ log(1 + x) . log(1 + x)

12. limx→+∞

log3 (2 + x3 ) . log4 (3 + x4 )

13. limx→+∞

p √ x

x + log x.

4. limx→1+ (x − 1) log(log x). 5. limx→2

log x − log 2 . x−2

6. limx→+∞ √ 7. limx→+∞ 8. limx→0

log x . x + log x

log(2 + 3x) . log(2 + 7x)

log(2 + 3x) . log(2 + 7x)

log(2 + 3x) 9. limx→− 2 + . log(2 + 7x) 7 √ log(1 + x) 10. limx→0 . log(1 + x)

14. limx→0

logα (1 + x) . logα (1 + 2x − x2 )

15. limx→0

(1 + x + x2 ) 9 − 1 . x

16. limx→0

(1 + x + x2 ) 9 − (1 + x) 9 . x2

7

7

7

2

17. limx→0+

xx − 1 . x log x

√ 18. limx→+∞ x( x x − 1). 19. limx→+∞ ((x + 1)α − xα ).

Exercise 1.4.36. Use Exercise 1.1.36 and the continuity of logarithmic function to √ prove that if xn > 0 and limn→∞ xn = l, then limn→∞ n x1 x2 · · · xn = l. What about the case limn→∞ xn = +∞?

1.4. CONTINUOUS FUNCTION

79

xn √ = l, then limn→∞ n xn = l. xn−1 Use the conclusion to compute the following limits s p √ n n (2n)! n! (2n)! lim , lim , lim n . 2 n→∞ n n→∞ n→∞ n (n!)2 1 1 1 < . Then Exercise 1.4.38. Use Exercise 1.2.19 to derive < log 1 + 1+n n n 1 1 use the inequality to show that xn = 1 + + · · · + − log n is strictly decreasing 2 n 1 1 and yn = 1 + + · · · + − log(n + 1) is strictly increasing. Moreover, prove 2 n that both xn and yn converge to the same limit, called the Euler12 -Mascheroni13 constant: 1 1 1 (1.4.17) lim 1 + + + · · · + − log n = 0.577215669015328 · · · . n→∞ 2 3 n Exercise 1.4.37. Prove that if xn > 0 and limn→∞

A vast extension of the exercise appears in Exercise 4.1.64.

1.4.7

Additional Exercise

Additive and Multiplicative Functions Exercise 1.4.39. Suppose f (x) is a continuous function on R satisfying f (x + y) = f (x) + f (y). 1. Prove that f (nx) = nf (x) for integers n. 2. Prove that f (rx) = rf (x) for rational numbers r. 3. Prove that f (x) = ax for some constant a. Exercise 1.4.40. Suppose f (x) is a continuous function on R satisfying f (x + y) = f (x)f (y). Prove that either f (x) = 0 or f (x) = ax for some constant a > 0.

Cauchy Criterion for Continuity Exercise 1.4.41. Prove that a function f (x) is continuous at a if any only if for any > 0, there is δ > 0, such that |x − a| < δ, |y − a| < δ =⇒ |f (x) − f (y)| < . Exercise 1.4.42. Prove that if a function f (x) is not continuous at a, then one of the following will happen. 12

Leonhard Paul Euler, born 1707 in Basel (Switzerland), died 1783 in St. Petersburg (Russia). Euler is one of the greatest mathematicians of all time. He made important discoveries in almost all areas of mathematics. Many theorems, quantities, and equations are named after Euler. He also introduced much of the modern √ mathematical terminology and notation, including f (x), e, Σ (for summation), i (for −1), and modern notations for trigonometric functions. 13 Lorenzo Mascheroni, born 1750 in Lombardo-Veneto (now Italy), died 1800 in Paris (France). The Euler-Mascheroni constant first appeared in a paper by Euler in 1735. Euler calculated the constant to 6 decimal places in 1734, and to 16 decimal places in 1736. Mascheroni calculated the constant to 20 decimal places in 1790.

80

CHAPTER 1. LIMIT AND CONTINUITY 1. There is > 0, such that for any δ > 0, there are a − δ < x < a and a < y < a + δ, such that |f (x) − f (a)| ≥ and |f (y) − f (a)| ≥ . 2. There is > 0, such that for any δ > 0, there are a − δ < x < a and a < y < a + δ, such that |f (x) − f (y)| ≥ .

Note that x and y are on different sides of a.

One Side Invertible Monotone Function Let f (x) be an increasing but not necessarily continuous function on [a, b]. By Exercise 1.4.10, the function has only countably many discontinuities. A function g(x) on [f (a), f (b)] is a left inverse of f (x) if g(f (x)) = x, and is a right inverse if f (g(y)) = y. Exercise 1.4.43. Prove that if f (x) has a left inverse g(y), then f (x) is strictly increasing, and g(f (x− )) = g(f (x+ )) = x for any x ∈ [a, b]. Exercise 1.4.44. Suppose f (x) is strictly increasing and has only one discontinuity at c ∈ (a, b). Prove that f (x) has a unique left inverse g(y). Moreover, g(y) is strictly increasing on [f (a), f (c− )] and [f (c+ ), f (b)], and g(y) = c on [f (c− ), f (c+ )]. Exercise 1.4.45. Suppose f (x) is strictly increasing. Prove that for any y ∈ [f (a), f (b)], there is a unique x ∈ [a, b] satisfying f (x− ) ≤ y ≤ f (x+ ). Moreover, for such unique x, the definition g(y) = x gives the unique left inverse of f (x). Exercise 1.4.46. Prove that f (x) has a right inverse if and only if f (x) is continuous. Moreover, the right inverse must be strictly increasing, and the right inverse is unique if and only if f (x) is strictly increasing.

Chapter 2 Differentiation

81

82

CHAPTER 2. DIFFERENTIATION

2.1

Approximation and Differentiation

The precalculus mathematics deals with static quantities which always have specific and fixed values. For example, the angle in an equilateral triangle is 60 degrees. The area of the rectangle with width 3 and height 2 is 6. Differentiation is the mathematical tool for studying changing quantities. Such quantities often depend on other quantities. The dependence is often expressed as functions. For example, if two angles in a triangle are α and β degrees, then the third angle is γ = 180 − α − β degrees. Moreover, the area of a rectangle depends on the width and the height (and is given by the product function). Of course, functions may be evaluated to get static values. For example, for width = 3 and height = 2, we get area = 6. Such substitutions simply fix the changing quantities and cannot answer the following questions: 1. Does the area increase when the width and the height are increasing? 2. What is the biggest area of a rectangle with circumference 10? To answer such questions, we have to think of the whole function instead of specific evaluation. We have to study how the function changes. We have to consider the whole evolving process instead of specific moments of the process. The change of general functions may be described in terms of approximations by simple functions. The simplest and the most useful function for the purpose of approximation is the linear function. The linear approximation is called the differential and the key coefficient in the linear approximation is called the derivative.

2.1.1

Approximation

Let us start with some common sense. Suppose P (A) is a problem about an object A. To solve the problem, we may consider a simpler object B that closely approximates A. Because B is simpler, P (B) is easier to solve. Moreover, because B is close to A, the (easily obtained) answer to P (B) is also pretty much the answer to P (A). Example 2.1.1. To find the distance from Hong Kong to Shanghai, we take a map and use the ruler to measure the distance between the two cities and then multiply the scaling ratio of the map. In this process, P is the distance, A is the cities Hong Kong and Shanghai, and B is the two dots representing the two cities on a map. The map is only an approximation of the actual world. Of course, the bigger the map, the more accurate the measurement is. An even better measurement can be obtained by a sphere shaped map (the globe found in stationary stores) because the sphere is a better approximation of the world than the flat plane (some ancient people may disagree, though). On the other hand, how accurate we wish the answer to be depends on our purpose. If our purpose is only to estimate the time needed for flying from Hong Kong to Shanghai, a poster sized map is more than sufficient.

2.1. APPROXIMATION AND DIFFERENTIATION

83

Example 2.1.2. Suppose we want to count how much money there is in a big bag of one dollar coins. We may weigh the whole bag, weigh one coin, and then divide the two numbers. In this process, A is the collection of real coins, with slight variation in size and weight, and B is the imaginary ideal collection in which each coin is identical to the one we choose to weigh. If the big bag is not too big (say two kilo), the error would be small enough and the answer would be quite reliable. Exercise 2.1.1. Think again how you solve problems in everyday life. Do you actually solve the problem as is? Or what you solve is just an approximation of the problem?

Many real world problems can often be mathematically interpreted as quantities related by complicated functions. To solve the problem, we may try to approximate the complicated functions by simple ones and solve the same problem for the simple functions. What are the simple functions? Although the answer could be rather subjective, most people would agree that the following functions are listed from simple to complicated. type zero constant linear quadratic cubic

one variable examples two variable examples 0 0 −1, −3 −1, −3 3 + x, 4 − 5x 3 + x + 2y, 5x − 7y 2 2 1 + x , −4 + 5x − 2x xy, x2 + xy − 3y 2 + y + 1 2x − 2x3 2x2 − 2y 3 , x2 y + xy 2 2 1 x + xy + 3y 2 2+x 1 , , rational x(4 − x) 1 + x x+y x(y − x) √ √ 1/2 1/3 3 algebraic x, (1 + x ) xy, (x − 3y 2 )1/3 x transcendental sin x, e + cos x sin xy, ex+y + cos(x + y) What do we mean by approximating a function with a simple (class of) functions? Consider measuring certain length (say the height of a person, for example) by a ruler with only centimeters. We expect to get an approximate reading of the length with the accuracy within millimeters, or significantly smaller than the base unit of 1cm for the ruler: |actual length − reading from ruler| ≤ (1cm). Similarly, approximating a function f (x) near x0 by a function p(x) of some class would mean the difference |f (x) − p(x)| is significantly smaller than the “base unit” for the class. Specifically, for any > 0, there is δ > 0, such that |x − x0 | < δ =⇒ |f (x) − p(x)| ≤ (unit).

(2.1.1)

The “base unit” for the class of constant functions is 1. Thus the approximation of a function f (x) by a constant function p(x) = a near x0 means that for any > 0, there is δ > 0, such that |x − x0 | < δ =⇒ |f (x) − a| ≤ .

(2.1.2)

84

CHAPTER 2. DIFFERENTIATION

The condition can be split into two parts. First, for x = x0 , we get |f (x0 ) − a| ≤ for any > 0. This means exactly a = f (x0 ). Second, for x 6= x0 , we get 0 < |x − x0 | < δ =⇒ |f (x) − a| ≤ . This means exactly limx→x0 f (x) = a. Combined with a = f (x0 ), we conclude that a function f (x) is approximated by a constant near x0 if and only if it is continuous at x0 . Moreover, the approximating constant is f (x0 ). The approximation by constant functions, which happens for continuous functions only, is good enough for the problem of the sign of the function. Example 2.1.3. Consider the positivity of functions. The cubic function f (x) = x3 − 5x + 2 is approximated by the constant f (3) = 14 near 3. Since the constant function is positive, the cubic function is also positive near 3. Note that the farther away from 3, the less certain we are about the positivity for the cubic function. The reason is that the approximation becomes worse further away from 3. As a matter of fact, if we move away by distance 1 (say at x = 2), the positivity is lost. Exercise 2.1.2. Rigorously prove that if f (x) is continuous at x0 and f (x0 ) > 0, then there is δ > 0, such that f (x) > 0 for any x ∈ (x0 − δ, x0 + δ).

2.1.2

Differentiation

The approximation by constant functions is too crude for most problems. As the next simplest, the linear functions A + Bx often need to be used. For the convenience of discussion, we introduce ∆x = x − x0 (∆ is the Greek equivalence of D, here used to stand for Difference) and rewrite A + Bx = A + B(x0 + ∆x) = (A + Bx0 ) + B∆x = a + b∆x. What is the base unit of a + b∆x as x approaches x0 ? The base unit of the constant term a is 1. The base unit of the difference term b∆x is ∆x, which is very small compared with the unit 1. Therefore the base unit for the linear function a + b∆x is ∆x. The discussion may be compared with the expression am + bcm (a meters and b centimeters). Since 1cm is much smaller than 1m, the base unit for am + bcm is 1cm. The approximation of a function f (x) by a linear function a + b∆x at x0 means the following. Definition 2.1.1. A function f (x) is differentiable at x0 if there is a linear function a + b∆x, such that for any > 0, there is δ > 0, such that |∆x| = |x − x0 | < δ =⇒ |f (x) − a − b∆x| ≤ |∆x|.

(2.1.3)

Example 2.1.4. To find the linear approximation of x2 at x0 = 1, we rewrite the function in terms of ∆x = x − 1 near 1. x2 = (1 + ∆x)2 = 1 + 2∆x + ∆x2 .

2.1. APPROXIMATION AND DIFFERENTIATION

85

Therefore |∆x| < δ = =⇒ |x2 − 1 − 2∆x| = |∆x|2 ≤ δ|∆x|. This shows that 1 + 2∆x is the linear approximation of x2 at 1. Exercise 2.1.3. Show that x2 is linearly approximated by x20 + 2x0 ∆x at x0 . What about the linear approximation of x3 ?

For x = x0 , the condition (2.1.3) tells us the constant coefficient a = f (x0 ). Let ∆f = f (x) − a = f (x) − f (x0 ) be the change of the function caused by the change ∆x of the variable. Then the right side of (2.1.3) becomes |∆f − b∆x| ≤ |∆x|, so that the condition becomes ∆f = b∆x + o(∆x). (2.1.4) In other words, the scaling b∆x of the change of variable is the linear approximation of the change of the function. The viewpoint is caught in the notation of the differential df = bdx (2.1.5) of the function at x0 . Note that the symbols df and dx have not yet been specified as numerical quantities. Thus bdx should be, at least for the moment, considered as an integrated notation instead of the product of two quantities. On the other hand, the notation is motivated from b∆x, which was indeed a product of two numbers. So it is allowed to add two differentials and to multiply numbers to differentials. In more advanced mathematics, the differential symbols will indeed be defined as quantities in some linear approximation spaces (called tangent spaces). However, one has to be careful in multiplying differentials together because this is not a valid operation within linear spaces. Finally, dividing differentials is not allowed.

2.1.3

Derivative

Taking x = x0 in (2.1.3) tells us a = f (x0 ). Taking x 6= x0 , the condition (2.1.3) becomes f (x) − f (x0 ) − b ≤ . (2.1.6) 0 < |∆x| = |x − x0 | < δ =⇒ x − x0 This tells us how to compute the coefficient b. Definition 2.1.2. The derivative of a function f (x) at x0 is f 0 (x0 ) =

df f (x) − f (x0 ) ∆f f (x0 + ∆x) − f (x0 ) = lim = lim = lim . ∆x→0 ∆x ∆x→0 dx x→x0 x − x0 ∆x

We already saw that the differentiability implies the existence of the derivative. Conversely, if the derivative exists, then we have the implication (2.1.6), which is the same as (2.1.3) with a = f (x0 ) and 0 < |x − x0 | < δ. Since (2.1.3) always holds with a = f (x0 ) and x = x0 , we conclude that the condition for differentiability holds with a = f (x0 ) and |x − x0 | < δ.

86

CHAPTER 2. DIFFERENTIATION .... ∆y . .. slope = ∆x .. .. ...... .. . . .. .. slope =f 0 (x0 ) ........ .. . . . . . . . ... ... . f (x) ............................................................................... . . . . . . .. . ... . ..... .. .. ∆y ... ............................. .. .. ................... .. .... f (x0 ) .............................................................................................. .. ...................................................... .. .. .. .... .. . .. f (x) .. . . . . . . .. ..... .. .. .. . . . . .. .. .. linear .. . . . ........∆x .. approx. .... .. .. .................... .. . . . . . .......................................................................................................................................................................................... x x0 .. .. .. Figure 2.1: linear approximation and derivative of f (x) at x0

Proposition 2.1.3. A function f (x) is differentiable at x0 if and only if it has the derivative at x0 . Moreover, the function is continuous at x0 and its linear approximation is f (x0 ) + f 0 (x0 )∆x. Proof. It remains to prove the continuity. Taking = 1 in the definition of differentiability, we find δ > 0, such that |∆x| = |x − x0 | < δ =⇒ |f (x) − f (x0 ) − b∆x| ≤ |∆x| =⇒ |f (x) − f (x0 )| ≤ (|b| + 1)|∆x|. Then for any > 0, we get |x − x0 | < max δ, =⇒ |f (x) − f (x0 )| ≤ (|b| + 1)|x − x0 | ≤ . |b| + 1 This shows f (x) is continuous at x0 . We emphasis here that although the existence of linear approximation is equivalent to the existence of derivative, the two play different roles. Linear approximation is the motivation and the concept. Derivative is the computation of the linear approximation and is derived from it. Therefore linear approximation is much more important in understanding the essence of calculus. As a matter of fact, for multivariable functions, linear approximations may be similarly computed by partial derivatives. However, the existence of partial derivatives does not necessarily imply the existence of linear approximation. Mathematical concepts are always derived from common sense. The formulas for computing the concepts are only obtained after mathematically analysing the common sense. Never equate the formula with the concept itself! Example 2.1.5. The linear approximation to a linear function f (x) = A + Bx must be the linear function itself. Moreover, since ∆f = f (x) − f (x0 ) = B(x − x0 ) = B∆x, the derivative f 0 (x0 ) = B by comparing with (2.1.4).

2.1. APPROXIMATION AND DIFFERENTIATION

87

Example 2.1.6. For a quadratic function f (x) = A + Bx + Cx2 , we have ∆f = f (x0 + ∆x) − f (x0 ) = (B + 2Cx0 )∆x + C∆x2 = (B + 2Cx0 )∆x + o(∆x). By comparing with (2.1.4), the function is differentiable at x0 , with the differential df = (B + 2Cx0 )dx at x0 . Usually we write df = (B + 2Cx)dx for the differential at any x. We also have f 0 (x) = B + 2Cx. Example 2.1.7. For f (x) = sin x, the limit (1.3.23) tells us f 0 (0) = lim

x→0

f (x) − f (0) sin x = lim = 1. x→0 x x

Therefore the sine function is differentiable at 0, with f (0) + f 0 (0)(x − 0) = x as the linear approximation. f (x) − f (0) |x| Example 2.1.8. For f (x) = |x|, the limit limx→0 = limx→0 diverges x x because the left and the right limits are different. Therefore the absolute value function is not differentiable at 0. Exercise 2.1.4. Find the differential and the derivative for the cubic function f (x) = x3 by computing ∆f = f (x0 + ∆x) − f (x0 ). Exercise 2.1.5. Determine the differentiability and find the linear approximations if possible (α, β > 0). 1.

p

|x| at 0.

2. cos x at a. 3. sin2 x at 0. 4. ex at a. 5. log(1 + x) at 0. 6.

| sin πx3 |

at 0 and 1.

( x 7. 1

if x 6= 0 at 0. if x = 0

( −xα 8. (−x)β

if x ≥ 0 at 0. if x < 0

( xα 9. (−x)β

if x ≥ 0 at 0. if x < 0

Exercise 2.1.6. Study the differentiability of the function |x3 (x − 1)(x − 2)2 |. Exercise 2.1.7. Study the differentiability of Thomae’s function in Example 1.4.2. Exercise 2.1.8. Suppose f (0) = 0 and f (x) ≥ |x|. Show that f (x) is not differentiable at 0. What if f (x) ≤ |x|? Exercise 2.1.9. Suppose f (x) is differentiable on a bounded interval (a − , b + ). Suppose f (x) = 0 for infinitely many x ∈ [a, b]. Prove that there is c ∈ [a, b], such that f (c) = 0 and f 0 (c) = 0. Exercise 2.1.10. Suppose f (x) is differentiable at x0 , with f (x0 ) 6= 0. 1 f (x0 + t) t . limt→0 f (x0 )

Find

Exercise 2.1.11. Prove that f (x) is differentiable at x0 if and only if f (x) = f (x0 )+ (x − x0 )g(x) for a function g(x) continuous at x0 .

88

CHAPTER 2. DIFFERENTIATION

2.1.4

Tangent Line and Rate of Change

Geometrically, the differentiation may be understood through the graphs of functions. The graph of a function is a curve, and the graph of a linear function is a straight line. Therefore the linear approximation is the approximation of a curve by a straight line. Such straight line is the “best fit” among all the straight lines passing through (x0 , f (x0 )). To construct the best fit, we let Lx be the line passing through the two points (x0 , f (x0 )) and (x, f (x)) on the graph of the function. Then the limit of the line Lx as x approaches ∆f , the slope of the x0 is the linear approximation. Since the slope of Lx is ∆x linear approximation is the limit of the quotient, or the derivative. The differentiation can also be viewed as the rate of change. To understand how a quantity y depends on another quantity x, it suffices to understand how the change in x induces the change in y. To illustrate the idea, suppose x is the age of a person and y = f (x) = 100 + x2 is the wage the person earns at age x. Then we know the wage changes by Df (x) = f (x + 1) − f (x) = 2x + 1 when the person becomes one year older. Conversely, suppose we know the starting salary f (20) = 500 at the age 20 and we also know that every year the wage changes by Df (x) = 2x + 1. Then we may recover the actual salary f (x) by adding up the changes. For example, the salary at age 25 is f (25) = f (20) + Df (20) + Df (21) + Df (22) + Df (23) + Df (24) = 500 + 41 + 43 + 45 + 47 + 49 = 725. In general, it is not difficult to recover the formula f (x) = 100 + x2 . The process of measuring changes is differentiation, and the process of recovering the whole by adding up the changes is integration. Exercise 2.1.12. Given the initial term x1 of a sequence and the difference xn+1 −xn between consecutive terms, recover the sequence. 1. x1 = 1 and xn+1 − xn = 1. 2. x1 = 1 and xn+1 − xn = n. 3. x1 = 1 and xn+1 − xn = (−1)n . 4. x1 = 1 and xn+1 − xn =

1 . n(n + 1)

5. x1 = 1 and xn+1 − xn = √

n+

1 √

n+1

.

In the example above, the variable is taken to be integers only. In reality, many variables are real numbers and the changes are continuous. In this case, we have to consider the change ∆f = f (x + ∆x) − f (x) of a function when x is changed by ∆x. Since ∆f usually becomes smaller as ∆x gets smaller, ∆f it is more sensible to measure the rate of change for the period between ∆x

2.1. APPROXIMATION AND DIFFERENTIATION

89

∆f x and x + ∆x. When ∆x is approaching zero, the limit lim∆x→0 then ∆x becomes the rate of change at the instant of x. The process of computing the rate of change is differentiation, and the process of recovering the whole from the rate of change is integration. Example 2.1.9. The height of a free falling object on the face of the earth is 1 h(t) = h0 − gt2 , 2 where h0 is the initial height, g is the gravitational constant, and t is the time. As the time is changed from t to t + ∆t, the height is changed by 1 ∆h = h(t + ∆t) − h(t) = −gt∆t − ∆t2 . 2 The average speed of the falling object during the period is ∆h 1 = −gt − ∆t. ∆t 2 As ∆t → 0, we get the instantaneous speed h(t) = lim

∆t→0

∆h = −gt ∆t

at the time t. Note that the speed is getting bigger as time goes by, which is consistent with the observation. As a matter of fact, the formula for the height h(t) is obtained in the reverse way (i.e., by integration). From Newton1 ’s second law of mechanics, we know the speed of the falling object is v(t) = −gt. To recover the height of the object is the same as finding a function h(t) satisfying h0 (t) = v(t) = −gt. Indeed 1 h(t) = h0 − gt2 is the function with the required derivative. 2 Example 2.1.10. Let f (t) be the amount of money in a savings account. The increase of the amount, due to the interest, can be measured by the rate of change, which at a particular time t is proportional to the amount of money f (t) at the time. In other words, we have f 0 (t) = cf (t), where the constant c is the interest rate. Now the question is what kind of function satisfies the (differential) equation. To get some idea, let us consider the discrete case. Suppose the interest is paid out yearly instead of continuously. Then the amount at the (t + 1)-st year is f (t + 1) = (1 + c)f (t). Thus for any natural number t we get f (t) = (1 + c)f (t − 1) = (1 + c)2 f (t − 2) = · · · = (1 + c)t f (0), which is an exponential function of t. If the interest is made monthly payment 1 c at the yearly rate of c, then we have f t + = 1+ f (t), from which we 12 12 c 12t f (0). If the interest payment is made n times can similarly get f (t) = 1 + 12 1 Isaac Newton, born 1643 in Woolsthorpe (England), died 1727 in London (England). Newton is a giant of science and one of the most influential people in human history. Together with Gottfried Leibniz, he invented calculus. For Newton, differentiation is the fundamental concept, and integration only means to recover a function from its derivative.

90

CHAPTER 2. DIFFERENTIATION

c nt a year, then we will get f (t) = 1 + f (0). As n goes to infinity, we get the n formula c nt f (t) = lim 1 + f (0) = f (0)ect n→∞ n for the account for which the interest is paid continuously. We will rigorously show later that the solutions to f 0 (t) = cf (t) are indeed scalar multiples of ect .

2.1.5

Rules of Computation

Proposition 2.1.4. The sum, the product and the composition of differentiable functions are differentiable. Moreover, (f (x) + g(x))0 = f 0 (x) + g 0 (x), (f (x)g(x))0 = f 0 (x)g(x) + f (x)g 0 (x), (g(f (x)))0 = g 0 (f (x))f 0 (x). The rules can also be written as d(f + g) df dg d(f g) df dg dg dg df = + , = g+f , = , dx dx dx dx dx dx dx df dx or in the differential form, d(f + g) = df + dg, d(f g) = gdf + f dg, d(g ◦ f ) = (g 0 ◦ f )df. The formula for the product is called the Leibniz2 rule. The formula for the composition is called the chain rule. A special case of the Leibniz rule is that for any constant c, we have (cf )0 = c0 f + cf 0 = 0f + cf 0 = cf 0 . The formula for the derivative of the sum can be explained through the linear approximation. Suppose the functions f (x) and g(x) are approximated by the linear functions L(x) = f (x0 ) + f 0 (x0 )∆x, K(x) = g(x0 ) + g 0 (x0 )∆x near x0 . Then we expect f (x) + g(x) to be approximated by the sum L(x) + K(x) = (f (x0 ) + g(x0 )) + (f 0 (x0 ) + g 0 (x0 ))∆x of the linear functions. In particular, this implies the derivative (f + g)0 (x0 ) is the coefficient f 0 (x0 ) + g 0 (x0 ) of ∆x. 2 Gottfried Wilhelm von Leibniz, born 1646 in Leipzig, Saxony (Germany), died 1716 in Hannover (Germany). Leibniz was a great scholar who contributed to almost all the subjects in the human knowledge of his time. He invented calculus independent of Newton. He also invented the binary system, the foundation of modern computer system. He was, along with Ren´e Descartes and Baruch Spinoza, one of the three greatest 17th-century rationalists. Leibniz was perhaps the first major European intellect who got seriously interested in Chinese civilization. His fascination of I Ching may be related to his invention of bindary system.

2.1. APPROXIMATION AND DIFFERENTIATION

91

Similarly, we expect f (x)g(x) to be approximated by the product L(x)K(x) = f (x0 )g(x0 ) + (f 0 (x0 )g(x0 ) + f (x0 )g 0 (x0 ))∆x + f 0 (x0 )g 0 (x0 )∆x2 of linear functions. Although L(x)K(x) is not linear, it is further approximated by the linear function M (x) = f (x0 )g(x0 ) + (f 0 (x0 )g(x0 ) + f (x0 )g 0 (x0 ))∆x. Therefore f (x)g(x) is also approximated by the linear function M (x). In particular, (f g)0 (x0 ) is the coefficient f 0 (x0 )g(x0 ) + f (x0 )g 0 (x0 ) of ∆x. For the composition g(f (x)), the functions f (x) and g(y) are approximated by the linear functions L(x) = f (x0 ) + f 0 (x0 )∆x, K(y) = g(y0 ) + g 0 (y0 )∆y near x0 and y0 = f (x0 ). Then we expect g(f (x)) to be approximated by the composition K(L(x)) = g(y0 ) + g 0 (y0 )f 0 (x0 )∆x of linear functions. In particular, this implies the derivative (g ◦ f )0 (x0 ) is the coefficient g 0 (y0 )f 0 (x0 ) = g 0 (f (x0 ))f 0 (x0 ) of ∆x. Proof of Proposition 2.1.4. We prove the formulae by rigorously verifying the claims made above regarding the linear approximations. Consider the sum first. For any 1 > 0, there are δ1 , δ2 > 0, such that |∆x| < δ1 =⇒ |f (x) − L(x)| ≤ 1 |∆x|, |∆x| < δ2 =⇒ |g(x) − K(x)| ≤ 1 |∆x|.

(2.1.7) (2.1.8)

This implies |∆x| < min{δ1 , δ2 } =⇒ |(f (x) + g(x)) − (L(x) + K(x))| ≤ 21 |∆x|. (2.1.9) and find δ1 , δ2 > 0, such that (2.1.7) and 2 (2.1.8) hold. Then (2.1.9) holds and becomes Thus for any > 0, we take 1 =

|∆x| < min{δ1 , δ2 } =⇒ |(f (x) + g(x)) − (L(x) + K(x))| ≤ |∆x|. This proves that f (x) + g(x) is approximated by the linear function L(x) + K(x). Now consider the product. For |∆x| < min{δ1 , δ2 }, we have |f (x)g(x) − L(x)K(x)| ≤ |f (x)g(x) − L(x)g(x)| + |L(x)g(x) − L(x)K(x)| ≤ 1 |∆x||g(x)| + 1 |L(x)||∆x| ≤ 1 (|g(x)| + |L(x)|)|∆x|. Then |f (x)g(x) − M (x)| ≤ |f (x)g(x) − L(x)K(x)| + |f 0 (x0 )g 0 (x0 )∆x2 | ≤ [1 (|g(x)| + |L(x)|) + |f 0 (x0 )g 0 (x0 )∆x|] |∆x|.

92

CHAPTER 2. DIFFERENTIATION

Since |g(x)| and |L(x)| are continuous at x0 , they are bounded near x0 by Proposition 1.3.6. Therefore for any > 0, it is not difficult to find δ3 > 0 and 1 > 0, such that |∆x| < δ3 =⇒ 1 (|g(x)| + |L(x)|) + |f 0 (x0 )g 0 (x0 )∆x| ≤ . Next for this 1 , we may find δ1 , δ2 > 0, such that (2.1.7) and (2.1.8) hold. Then we have |∆x| < min{δ1 , δ2 , δ3 } =⇒ |f (x)g(x) − M (x)| ≤ |∆x|. This proves that f (x)g(x) is approximated by the linear function M (x). Finally consider the composition. For any 1 , 2 > 0, there are δ1 , δ2 > 0, such that |∆x| = |x − x0 | < δ1 =⇒ |f (x) − L(x)| ≤ 1 |∆x|, |∆y| = |y − y0 | < δ2 =⇒ |g(y) − K(y)| ≤ 2 |∆y|.

(2.1.10) (2.1.11)

Then for y = f (x), we have |∆x| < δ1 =⇒ |∆y| = |f (x) − f (x0 )| ≤ |f (x) − L(x)| + |f 0 (x0 )∆x| ≤ (1 + |f 0 (x0 )|)|∆x| < (1 + |f 0 (x0 )|)δ1 . (2.1.12) and |∆x| < δ1 , |∆y| < δ2 =⇒ |g(f (x)) − K(L(x))| ≤ |g(f (x)) − K(f (x))| + |K(f (x)) − K(L(x))| = |g(y) − K(y)| + |g 0 (x0 )||f (x) − L(x)| ≤ 2 |∆y| + |g 0 (x0 )|1 |∆x| ≤ [2 (1 + |f 0 (x0 )|) + 1 |g 0 (x0 )|]|∆x|. (2.1.13) Suppose for any > 0, we can find suitable δ1 , δ2 , 1 , 2 , such that (1 + |f 0 (x0 )|)δ1 ≤ δ2 , 2 (1 + |f 0 (x0 )|) + 1 |g 0 (x0 )| ≤ ,

(2.1.14) (2.1.15)

and (2.1.10), (2.1.11) hold. Then (2.1.12) tells us |∆x| < δ1 implies |∆y| < δ2 , and (2.1.13) becomes |∆x| < δ1 =⇒ |g(f (x))−K(L(x))| ≤ [2 (1 +|f 0 (x0 )|)+1 |g 0 (x0 )|]|∆x| ≤ |∆x|. This proves that f (g(x)) is approximated by the linear function K(L(x)). It remains to find δ1 , δ2 , 1 , 2 such that (2.1.10), (2.1.11), (2.1.14) and (2.1.15) hold. For any > 0, we first find 1 , 2 > 0 satisfying (2.1.15). For this 2 > 0, we find δ2 > 0, such that the implication (2.1.11) holds. Then for the 1 > 0 we already found, it is easy to find δ1 > 0 such that the implication (2.1.10) holds and (2.1.14) is also satisfied. Exercise 2.1.13. A function is odd if f (−x) = −f (x). It is even if f (−x) = f (x). A function is periodic with period p if f (x + p) = f (x). Prove that the derivatives of odd, even, periodic functions are respectively even, odd, periodic.

2.1. APPROXIMATION AND DIFFERENTIATION

2.1.6

93

Basic Example

We establish the following derivatives. (xα )0 (sin x)0 (cos x)0 (tan x)0 (ex )0 (αx )0

= αxα−1 , = cos x, = − sin x, = sec2 x, = ex , = αx log α.

By the limit (1.4.13), for x > 0 we have dxα (x + ∆x)α − xα (x + xu)α − xα = lim = lim u→0 ∆x→0 dx ∆x xu α (1 + u) − 1 = αxα−1 . = lim xα−1 u→0 u For x < 0 and integer α, let y = −x. By the chain rule, dxα d((−y)α ) dy d(y α ) d(−x) = = (−1)α = (−1)α αy α−1 (−1) = αxα−1 . dx dy dx dy dx For natural number α, we also have ( 0 if α > 1 dxα (0 + ∆x)α − 0α = lim = lim (∆x)α−1 = = αxα−1 x=0 . ∆x→0 dx x=0 ∆x→0 ∆x 1 if α = 1 Thus the formula for the derivative of the power function holds whenever the function is defined. The derivative of power functions leads to the derivative for polynomials (cn xn + cn−1 xn−1 + · · · + c1 x + c0 )0 = ncn xn−1 + (n − 1)cn−1 xn−2 + · · · + c1 . On the other hand, by the derivative of x−1 and the chain rule, we get the derivative 0 f0 1 = (−1)(f )−2 f 0 = − 2 f f of the reciprocal of any differentiable function. By further applying the Leibniz rule, we get the derivative 0 0 f 1 f 0g − f g0 01 =f +f = (2.1.16) g g g g2 of the quotient of differentiable functions. In particular, we are able to compute the derivative of any rational function. The limits (1.3.23) and (1.3.25) tell us d sin x sin x − sin 0 sin x = lim = lim = 1, x→0 x dx x=0 x→0 x d cos x cos x − cos 0 cos x − 1 = lim x = 0. = lim x→0 dx x=0 x→0 x x2

94

CHAPTER 2. DIFFERENTIATION

At any x = a, let y = x − a. Then by the chain rule, d sin x d sin(y + a) dy d (sin y cos a + cos y sin a) = = dx x=a dy dx dy y=0 y=0 x=a d sin y d cos y = cos a + sin a = 1 cos a + 0 sin a = cos a. dy y=0 dy y=0 Similar computation gives us the derivative of the cosine function. Further applying (2.1.16) gives us the derivative of the tangent function. d sin x cos x cos x − (− sin x) sin x d tan x 1 = = = = sec2 x. 2 dx dx cos x cos x cos x2 The limit (1.4.12) tells us dex ex+∆x − ex e∆x − 1 = lim = lim ex = ex . ∆x→0 ∆x→0 dx ∆x ∆x The derivative of the other exponential functions can be obtained by the chain rule dex log α dey d(x log α) dαx = = = ex log α log α = αx log α. dx dx dy y=x log α dx Note that all the exponential functions are the solutions of the differential equations of the form f 0 = cf , where c is a constant (see Example 2.2.6 for the converse). Such functions are characterized by the important property that the rate of change is proportional to the value of the function itself (see the discussion on interest rate in Example 2.1.10). From this viewpoint, the case c = 1 is the purest and the most natural case. Only the exponential function based on e corresponds to c = 1. This is why ex is called the natural exponential function. p √ Example 2.1.11. The function f (x) = 1 + sin2 x is the composition of f = u, u = 1 + v 2 , v = sin x. Therefore √ df d u d(1 + v 2 ) d(sin x) 1 sin x cos x = = √ 2v cos x = p . dx du dv dx 2 u 1 + sin2 x

y2 = 1 is an ellipse. The ellipse has upper and 4 √ √ lower parts given respectively by y = 2 1 − x2 and y = −2 1 − x2 . By the chain rule, the derivative of the upper part is 1 d(1 − x2 ) 2x dy d(2u 2 ) 1 1 = = 2 u− 2 (−2x) = − √ . dx du dx 2 1 − x2 u=1−x2

Example 2.1.12. The equation x2 +

Alternatively, instead of finding the explicit formula for y in terms of x, we keep y(x)2 in mind that y is a function of x and take the equation to mean x2 + = 1. 4 d Then taking on both sides of the equation gives us dx d y2 d(x2 ) 1 d(y 2 ) dy 1 dy 2 0= x + = + = 2x + 2y . dx 4 dx 4 dy dx 4 dx

2.1. APPROXIMATION AND DIFFERENTIATION

95

Solving the equation, we get dy 4x =− . dx y dy Yet another way of finding is by the parametrization x = cos t, y = 2 sin t dx dy dy dx of the ellipse. The chain rule = tells us dt dx dt dy dy 2 cos t = dt = = −2 cot t. dx dx − sin t dt The reader is left to verify that the three ways give the same result. Exercise 2.1.14. Compute the derivatives of the trigonometric functions cot x, sec x, csc x. Exercise 2.1.15. Compute the derivative of log x by making use of the limit (1.4.11). Exercise 2.1.16. Compute the derivatives. √ 1 4. 1 + x2 . . 1. 2 x +1 q p √ 2 5. 1 + 2 + 3 + x. 2x + 1 2. 2 . x +1 6. (sin 2x + cos 3x)7 . 2 4 2x + 1 3. . 7. cos(sin(tan x)). x2 + 1

8. sin2

1 . x

9. 2sin 3x . 10. e2x (cos x − 2 sin x). 11.

23x − 3x2 . 32x + 2x3

Exercise 2.1.17. The hyperbolic functions are sh x =

ex − e−x ex + e−x ex − e−x ex + e−x , ch x = , th x = x , coth x = . 2 2 e + e−x ex − e−x

Express the derivatives of hyperbolic functions in terms of hyperbolic functions. Exercise 2.1.18. Suppose f (x) and g(x) are differentiable functions satisfying f (2) = 3, g(2) = 2, f 0 (2) = 1, g 0 (2) = −1. Compute the derivatives of the following functions at 2. f (x)g(x), ef (x) sin πg(x), f (g(x)), g(2f (x)2 − 8g(x)). Exercise 2.1.19. Suppose u = u(x) and v = v(x) are differentiable functions satisfying u(0) = 1, v(0) = −1 and the given equations. Compute the derivatives of u and v at 0. 1. u2 + uv + v 2 = 1, (1 + x2 )u + (1 − x2 )v = x. 2. xu + (x + 1)v = ex , eu + xe−v = ex+1 . Exercise 2.1.20. Suppose y is a function of x satisfying given equation. Compute dy at the given point. the derivative dx 1. y 3 + 2xy 2 − 2x3 = 1, at x = 1, y = 1. 2. x sin(x − y) = y cos(x + y), at x = π, y = 0.

96

CHAPTER 2. DIFFERENTIATION 3. ey = xy, at x = −e−1 , y = −1. 4. (1 + y)ex + (1 + x)ey = xy + 2, at x = 0, y = 0. 5. y 2 sin x = x2 cos y, at x =

π π ,y= . 4 4

dy at the Exercise 2.1.21. For the parametrized curves, compute the derivative dx given point. Then find the equation for the tangent line of the curve. 1. Cycloid: x = t − sin t, y = 1 − cos t, at t =

π . 4

2. Spiral: x = t cos t, y = t sin t, at t = π. 3. Involute of circle: x = cos t + t sin t, y = sin t − t cos t, at t = π. 4. Four-leaved rose: x = cos 2t cos t, y = cos 2t sin t, at t =

2.1.7

π . 4

Derivative of Inverse Function

The inverse trigonometric functions and the logarithmic functions were defined as inverse functions. In order to compute their derivatives, we need to know how to compute the derivatives of inverse functions. If a function y = f (x) is approximated by a linear function y = A + Bx, then the inverse function x = f −1 (y) should be approximated by the inverse 1 linear function x = −B −1 A + B −1 y. This suggests that (f −1 )0 = B −1 = 0 . f Proposition 2.1.5. Suppose a continuous function f (x) is invertible near x0 and is differentiable at x0 . If f 0 (x0 ) 6= 0, then the inverse function is also differentiable at y0 = f (x0 ), with (f −1 )0 (y0 ) =

1 f 0 (x

0)

.

The formula can also be written as −1 dy dx = for y = f (x), x = f −1 (y). dy dx

(2.1.17)

(2.1.18)

Proof. Let f (x) be invertible and differentiable. By Proposition 2.1.3, it is continuous. By Theorem 1.4.8, the inverse function is also continuous. Therefore for y = f (x), we have x → x0 if and only if y → y0 . Moreover, because f is invertible, we have x 6= x0 if and only if y 6= y0 . By the composition rule, we may substitute x, x0 by f −1 (y), f −1 (y0 ) in the limit f 0 (x0 ) = lim

x→x0

and get f 0 (x0 ) = lim

f (x) − f (x0 ) x − x0 y − y0 . − f −1 (y0 )

y→y0 f −1 (y)

2.1. APPROXIMATION AND DIFFERENTIATION

97

If f 0 (x0 ) 6= 0, then this implies f −1 (y) − f −1 (y0 ) 1 = lim . f 0 (x0 ) y→y0 y − y0 Thus f −1 (y) is differentiable at y0 with

1 f 0 (x

0)

as the derivative.

For more detailed discussion on the relation between the invertibility of f (x) near x0 and f 0 (x0 ) 6= 0, see Exercises 2.1.22, 2.1.24, and the remark after Proposition 2.2.4. Exercise 2.1.22. Suppose f (x) is invertible near x0 and is differentiable at x0 . Prove that if the inverse function is differentiable at y0 = f (x0 ), then f 0 (x0 ) 6= 0. This is the “conditional” converse of Proposition 2.1.5. Exercise 2.1.23. Suppose f (x) is invertible near x0 and is differentiable at x0 . Prove directly that if L(x) = f (x0 ) + b∆x is the linear approximation of f (x) at x0 with b 6= 0, then L−1 (y) = x0 + b−1 ∆y is the linear approximation of f −1 (x) at f (x0 ). This gives an alternative proof of Proposition 2.1.5. Exercise 2.1.24. Consider the function   n f (x) = n2 + 1 x

1 ,n ∈ N n . otherwise if x =

Verify that f 0 (0) = 1 but f (x) is not one-to-one. This shows that the invertibility condition is necessary in Proposition 2.1.5. In particular, f 0 (x0 ) 6= 0 does not necessarily imply the function is invertible near x0 .

Now we are ready to derive the following derivatives. 1 (arcsin x)0 = √ , 1 − x2 1 (arccos x)0 = − √ , 1 − x2 1 (arctan x)0 = , 1 + x2 1 (arcsec x)0 = √ , x x2 − 1 1 (log |x|)0 = , x 1 0 . (logα |x|) = x log α Let y = arcsin x. Then x = sin y, − d arcsin x = dx

d sin y dy

−1 =

π π ≤ y ≤ , and 2 2

1 1 1 =p =√ . 2 cos y 1 − x2 1 − sin y

98

CHAPTER 2. DIFFERENTIATION

The derivative of the inverse cosine can be derived similarly, or derived from π arcsin x + arccos x = . For the derivative of the inverse tangent, let y = 2 π π arctan x. Then x = tan y, − ≤ y ≤ , and 2 2 −1 d arctan x d tan y 1 1 1 = = = = . 2 2 dx dy sec y 1 + tan y 1 + x2 Using sec0 x = sec x tan x, we also get the derivative for inverse secant. 1 1 1 d arcsec x p = = = √ . 2 dx sec y tan y y=arcsec x sec y sec y − 1 y=arcsec x x x2 − 1 The derivative of the logarithmic function can be obtained from the derivative of the exponential function. For x > 0, y −1 de d log x 1 1 = = y = . dx dy e y=log x x y=log x

For x < 0, y = log |x|, we have x = −|x| = −ey and −1 y d log |x| d(−e ) 1 1 1 = = = . = y dx dy −e y=log |x| −|x| x y=log |x|

Thus the formula holds for any x 6= 0. The derivative for the other logarithlog x mic functions can be then obtained from logα x = . log α √ (x + 2)7 x2 + 1 , we take Example 2.1.13. To compute the derivative of f (x) = (x2 − x + 1)6 the log and get log f (x) = 7 log(x + 2) +

1 log(x2 + 1) − 6 log(x2 − x + 1). 2

Taking the derivative, we get f 0 (x) 1 2x 2x − 1 =7 + −6 2 . 2 f (x) x + 2 2(x + 1) x −x+1 Thus

√ (x + 2)7 x2 + 1 7 x 12x − 6 f (x) = + − . (x2 − x + 1)6 x + 2 x2 + 1 x2 − x + 1 0

Exercise 2.1.25. Compute the derivatives. 1 1. arcsin . x arcsin x 2. . arccos x 3. (arctan(1 + x2 ))3 .

4. x(log x − 1).

7. log | sin x|.

5. log(log x).

8. arctan(log x).

6. log(x +

√

1 + x2 ).

9. arcsin(arccos x).

2.1. APPROXIMATION AND DIFFERENTIATION

99

Exercise 2.1.26. Suppose f (x) and g(x) are differentiable functions satisfying f (2) = 3, g(2) = 2, f 0 (2) = 1, g 0 (2) = −1. Compute the derivatives of the following functions at 2. ! p 6 arctan f (x) f (x) log(f (x) + g(x)), arcsin p . , f (1 + log2 g(x)), g π 3 g(x) p y Exercise 2.1.27. Suppose y is a function of x satisfying arctan = log x2 + y 2 . x Compute the derivative of y and express the result in terms of x and y. Exercise 2.1.28. Compute the derivatives by making use of the logarithmic function. r √ x (x2 + 1)3 x − 2 1+x 5. x(x ) . √ 1. . 3. x . 3 sin x + cos x (x + 3)9 x3 − x + 2 2. xx .

4. f (x)g(x) .

6. (1 + sin x)arccos x .

Exercise 2.1.29. Find the inverse of hyperbolic functions in Exercise 2.1.17. Then compute the derivatives in two ways: by direct computation and by using Proposition 2.1.5.

2.1.8

Additional Exercise

One Side Derivative Define one side derivatives f+0 (x0 ) = lim+ x→x0

f (x) − f (x0 ) f (x) − f (x0 ) , f−0 (x0 ) = lim− . x − x0 x − x0 x→x0

By Proposition 1.3.3, the usual (two side) derivative exists if and only if the left and right side derivatives exist and are equal. Exercise 2.1.30. Define one side differentiability and prove the equivalence between the right differentiability and the existence of the right derivative. Exercise 2.1.31. For α > 0, compute the right derivative of the power function xα at 0. ( xx if x > 0 Exercise 2.1.32. Determine whether the function is right differen1 if x = 0 tiable at x = 0. Exercise 2.1.33. Prove right differentiability implies right continuity. 0 (x) and (f g)0 (x) = Exercise 2.1.34. Prove the properties (f + g)0+ (x) = f+0 (x) + g+ + 0 (x) for the right derivative. f+0 (x)g(x) + f (x)g+

Exercise 2.1.35. Suppose f+0 (x0 ) > 0. Prove that f (x) > f (x0 ) for x > x0 and near x0 . Exercise 2.1.36. Suppose f (x) and g(y) are right differentiable at x0 and y0 = f (x0 ). Prove that under one of the following conditions, the composition g(f (x)) 0 (y )f 0 (x ). is right differentiable at x0 , and we have the chain rule (g◦f )0+ (x0 ) = g+ 0 + 0 1. f (x) ≥ f (x0 ) for x ≥ x0 .

100

CHAPTER 2. DIFFERENTIATION

2. g(y) is (two side) differentiable at y0 . Note that by Exercise 2.1.35, the first condition is satisfied if f+0 (x0 ) > 0. Can you find the other chain rules for one side derivatives? Exercise 2.1.37. State and prove the one side derivative version of Proposition 2.1.5.

2.2

Application of Differentiation

Having defined and computed the linear approximations, we are ready to solve problems related to the change of functions.

2.2.1

Maximum and Minimum

A function f (x) has a local maximum at x0 if f (x0 ) is the biggest value among all the values near x0 . In other words, there is δ > 0, such that |x − x0 | < δ =⇒ f (x0 ) ≥ f (x).

(2.2.1)

Similarly, f (x) has a local minimum at x0 if there is δ > 0, such that |x − x0 | < δ =⇒ f (x0 ) ≤ f (x).

(2.2.2)

Local maxima and local minima are also called local extrema. In the definition above, the function is assumed to be defined on both sides of x0 . Similar definition can be made when the function is defined on only one side of x. For example, suppose f (x) is defined on a bounded closed interval [a, b]. Then a is a local maximum if there is δ > 0, such that 0 ≤ x − a < δ =⇒ f (a) ≥ f (x). .... absolute .. maximum .. ..... ... .. ..... .. local ..... .. maximum ..... .. .......... . ... ...... .. ... .. . . ... .. ... .. local .. ... maximum .. ... .. . ... ....... . . .. .. .... .. ... . . .. .. ... .. . ..... ....... ... ..... .. ... .. ... .. .. ............... .. local .. minimum . .. .. .. .. absolute .. minimum .. .. .. .. .. .. .. . . ........................................................................................................................................................................................ a .. b .. .. Figure 2.2: maximum and minimum A function f (x) has an absolute maximum at x0 if f (x0 ) is the biggest value among all the values of f (x). In other words, we have f (x0 ) ≥ f (x)

2.2. APPLICATION OF DIFFERENTIATION

101

for any x in the domain of f . The concept of absolute minimum is similarly defined. Absolute extrema are clearly also local extrema. The following result tells us how to find local extrema. Proposition 2.2.1. Suppose a function f (x) is defined on both sides of x0 and has a local extreme at x0 . If f (x) is differentiable at x0 , then f 0 (x0 ) = 0. Proof. Suppose f (x) is differentiable at x0 , with f 0 (x0 ) > 0. Take = Then there is δ > 0, such that

f 0 (x0 ) . 2

|x − x0 | < δ =⇒ |f (x) − f (x0 ) − f 0 (x0 )(x − x0 )| ≤ |x − x0 |. For δ > x − x0 > 0, this tells us f (x) − f (x0 ) ≥ f 0 (x0 )(x − x0 ) − |x − x0 | = (f 0 (x0 ) − )(x − x0 ) > 0. For 0 > x − x0 > −δ, this tells us f (x) − f (x0 ) ≤ f 0 (x0 )(x − x0 ) + |x − x0 | = (f 0 (x0 ) − )(x − x0 ) < 0. Thus x0 is not a local extreme. Similarly, if f 0 (x0 ) < 0, then x0 is also not a local extreme. Note that the proof makes critical use of both sides of x0 . For a function f (x) defined on a bounded closed interval [a, b], this means that the proposition may be applied to the interior points of the interval where the function is differentiable. Therefore the proposition tells us that a local extreme point x0 must be one of the following three cases. 1. a < x0 < b, f 0 (x0 ) does not exist. 2. a < x0 < b, f 0 (x0 ) exists and is 0. 3. x0 = a or b. Typically, the three possibilities would provide finitely many candidates for the potential local extrema. If we take the maximum and minimum of the values at these points, we will get the absolute maximum and minimum. Example 2.2.1. The function f (x) = x2 ex is differentiable everywhere, with the derivative f 0 (x) = x(x + 2)ex . From f 0 (x) = 0, we get two possible local extrema x = −2 and x = 0. By limx→−∞ f (x) = 0, limx→∞ f (x) = +∞, f (x) ≥ 0 for any x, f (−2) = 4e−2 , f (0) = 0, we conclude that the function has the absolute minimum at 0, and the function has no absolute maximum. As to whether x = −2 is a local extreme, see Example 2.2.7. 2

Example 2.2.2. For the function f (x) = (x + 1)x 3 on [−1, 1], we have f 0 (x) = 1 1 2 (5x + 2)x− 3 . The derivative vanishes at x = − and does not exist at x = 0. 3 5 Thus the possible local extrema are the two points and the end points −1, 1. By √ 3 2 3 20 f (−1) = 0, f − = , f (0) = 0, f (1) = 2, we conclude that the function 5 25 has absolute minimum at −1 and 0 and has absolute maximum at 1.

102

CHAPTER 2. DIFFERENTIATION

Example 2.2.3. What is the biggest rectangle with circumference 1? Let one side of the rectangle be x. From practical consideration, we have 1 1 0 ≤ x ≤ , and the area is A(x) = x − x . Thus the problem becomes finding 2 2 1 the maximum of A(x) on the interval 0, . Since A(x) is differentiable, the 2 potential candidates are either the end points of the interval or the interior points 1 1 1 where A0 (x) = − 2x = 0. So the possible local extrema are 0, , . The values 2 4 2 1 1 of A at the three points are 0, , 0, respectively. The biggest value appears 16 16 1 at x = , in which case the rectangle is a square. 4 Example 2.2.4. The function f (x) = x3 satisfies f 0 (0) = 0. However, we have f (x) < f (0) for x < 0 and f (x) > f (0) for x > 0. Therefore 0 is not a local extreme for f (x). The example shows that f 0 (x0 ) = 0 is a necessary instead of a sufficient condition for local extrema (of differentiable functions). Exercise 2.2.1. Find the maximum and the minimum on the given range. 1. x(x + 1)(x + 2) on [−3, 3].

4. sin x + cos x on [0, 2π].

2. x(x + 1)(x + 2) on [−3, 0].

5. sin x + cos x on whole R.

3. |x|ex on [−2, 1].

6. (x + 1) cos x + (x − 1) sin x on [−π, π].

Exercise 2.2.2. How much of the corners of a square with side length 1 should be cut so that the tray made from it has the biggest volume? Exercise 2.2.3. Given the surface area of a cylinder, when is the volume biggest? Exercise 2.2.4. Find the distance from the point (1, 1) to the parabola y 2 = 2x. Exercise 2.2.5. An arrow is shot at the angle α. The location of the arrow at time t is 1 x = tv cos α, y = tv sin α − gt2 , 2 where v is the initial speed, x is the distance traveled and y is the height. For what angle α does the arrow travel the furthest? Exercise 2.2.6. Suppose f (x) is left differentiable at x0 (see Exercise 2.1.30). Prove that if x0 is a local maximum, then f 0 (x− 0 ) ≥ 0. What about the right differentiable case?

2.2.2

Mean Value Theorem

Theorem 2.2.2 (Main Value Theorem). Suppose f (x) is continuous on [a, b] and differentiable on (a, b). Then there is a < c < b, such that f 0 (c) =

f (b) − f (a) . b−a

(2.2.3)

The quotient on the right is the slope of the line segment that connects the end points of the graph of f (x) on [a, b]. The theorem says the line segment is parallel to some tangent line.

2.2. APPLICATION OF DIFFERENTIATION

103

The conclusion of the theorem can also be written as f (b) − f (a) = f 0 (c)(b − a) for some a < c < b,

(2.2.4)

or f (x + ∆x) − f (x) = f 0 (x + θ∆x)∆x for some 0 < θ < 1.

(2.2.5)

Also note that a and b may be exchanged in the theorem, so that we do not have to insist a 0) for the theorem to hold. ..... .. max{f −L}..... .. .............................. .. . ... .. ...... .. .... .. . ... .. ... f (b) . .. .. . . .. ......................................... .. . .. .. ............................ . .. .. . . . . . . . . . . . . . ........ .. L .. . . . .. .. . . . . . . . . . . . .. .. .. .. .. f (a)....... .. .. .. .. .. .. ..... .. .. .. .. ...... .. .... .. .. .. .. ....................... .. .. .. . .. min{f −L} .. .. .. .. .. .. .. . .. .. .. .............................................................................................................................................................................. .. c1 c2 a b .. .. . Figure 2.3: mean value theorem Proof. The line connecting (a, f (a)) and (b, f (b)) is L(x) = f (a) +

f (b) − f (a) (x − a). b−a

As suggested by Figure 2.3, the tangent lines parallel to L(x) can be found where the difference h(x) = f (x) − L(x) = f (x) − f (a) −

f (b) − f (a) (x − a) b−a

reaches maximum or minimum. Since h(x) is continuous, by Theorem 1.4.5, it reaches maximum and minimum at c1 , c2 ∈ [a, b]. If both c1 and c2 are end points a and b, then the maximum and the minimum of h(x) are h(a) = h(b) = 0. This implies h(x) is constantly zero on the interval, so that h0 (c) = 0 for any a < c < b. If one of c1 and c2 , denoted c, is not an end point, then by Proposition 2.2.1, we have h0 (c) = 0. In any case, we have a < c −1. Taking f (x), a, b in 1+x the mean value theorem to be log(1 + x), 0, x, we have x d log(1 + u) (x − 0) = , log(1 + x) = log(1 + x) − log(1 + 0) = du 1 + θx u=θx

Example 2.2.5. We prove

where 0 < θ < 1. For x > 0, we have 1 < 1 + θx < 1 + x. For −1 < x < 0, we have 1 > 1 + θx > 1 + x. In both cases, we conclude x x < < x. 1+x 1 + θx Exercise 2.2.7. Find c in the mean value theorem. 1. x3 on [−1, 1].

2.

1 on [1, 2]. x

3. 2x on [0, 1].

Exercise 2.2.8. Let f (x) = |x − 1|. Is there 0 < c < 3 such that f (3) − f (0) = f 0 (c)(3 − 0)? Does your conclusion contradict the means value theorem? Exercise 2.2.9. Suppose f (x) is continuous on [a, b] and differentiable on (a, b). Prove that f (x) is Lipschitz (see Exercise 1.4.13) if and only if f 0 (x) is bounded on (a, b). Exercise 2.2.10. Use the main value theorem to prove | sin x − sin y| < |x − y| and | arctan x − arctan y| < |x − y|. x Exercise 2.2.11. Prove that < arctan x < x for x > 0. 1 + x2 Exercise 2.2.12 (Rolle3 ’s Theorem). Suppose f (x) is continuous on [a, b] and differentiable on (a, b). Prove that if f (a) = f (b), then f 0 (c) = 0 for some a < c < b. Exercise 2.2.13. Suppose f (x) has continuous derivative on a bounded and closed interval [a, b]. Prove that for any > 0, there is δ > 0, such that |∆x| < δ implies |f (x + ∆x) − f (x) − f 0 (x)∆x| ≤ |∆x|. In other words, f (x) is uniformly differentiable. Exercise 2.2.14. Suppose f (x) is continuous on [a, b] and differentiable on (a, b). Suppose f (a) = 0 and |f 0 (x)| ≤ A|f (x)| for some constant A. First prove that 1 f (x) = 0 on a, a + . Then further prove that f (x) = 0 on the whole interval 2A [a, b]. Exercise 2.2.15. Suppose f (x) is continuous on [a, b] and is left and right differentiable on (a, b) (see Exercise 2.1.30). Prove that there is a < c < b, such that f (b) − f (a) lies between f−0 (c) and f+0 (c). A further extension of the mean value b−a theorem appears in Exercise 2.2.37.

The mean value theorem has the following important consequence, which basically says that a non-changing quantity must be a constant. Proposition 2.2.3. Suppose f 0 (x) = 0 for all x on an interval. Then f (x) is a constant on the interval. 3 Michel Rolle, born √ 1652 in Ambert (France), died 1719 in Paris (France). Rolle invented the notion n x for the n-th root of x. His theorem appeared in an obscure book in 1691.

2.2. APPLICATION OF DIFFERENTIATION

105

For any two points x1 , x2 in the interval, we applying the mean value theorem to get f (x1 ) − f (x2 ) = f 0 (c)(x1 − x2 ) = 0(x1 − x2 ) = 0 for some x1 < c < x2 . This proves the proposition. Example 2.2.6. Suppose f (x) satisfies f 0 (x) = af (x), where a is a constant. Then (e−ax f (x))0 = −ae−ax f (x) + e−ax f 0 (x) = −ae−ax f (x) + ae−ax f (x) = 0. Therefore e−ax f (x) = c is a constant, and f (x) = ceax is a multiple of the exponential function. Exercise 2.2.16. Suppose f (x) satisfies f 0 (x) = xf (x). Prove that f (x) = ce some constant c.

x2 2

for

Exercise 2.2.17. Suppose f (x) satisfies |f (x) − f (y)| ≤ |x − y|α for some constant α > 1. Prove that f (x) is constant. Exercise 2.2.18. Prove that if f 0 (x) = g 0 (x) for all x, then f (x) = g(x) + c for some constant c. Then prove the following equalities. 1. 2 arctan x + arcsin

2x = π for x ≥ 1. 1 + x2

1 2. 3 arccos x − arccos(3x − 4x3 ) = π for |x| ≤ . 2

2.2.3

Monotone Function

To find out whether a function f (x) is increasing near x0 , we consider the linear approximation f (x0 ) + f 0 (x0 )∆x. The linear approximation is increasing if and only if the coefficient f 0 (x0 ) ≥ 0. Because of the approximation, the condition f 0 (x0 ) ≥ 0 very likely will imply that f (x) is also increasing near x0 . This leads to the following criterion for a function to be increasing. Proposition 2.2.4. Suppose f (x) is continuous on an interval and is differentiable on the interior of the interval. Then f (x) is increasing if and only if f 0 (x) ≥ 0. Moreover, if f 0 (x) > 0, then f (x) is strictly increasing. The similar statement for decreasing functions is also true. Moreover, by combining with Theorem 1.4.8, we find that f 0 (x) 6= 0 near x0 implies f (x) is invertible near x0 (compare Proposition 2.1.5). We note that in the proposition, it is not sufficient for the derivative to be non-negative at a single point. The derivative needs to be non-negative everywhere on an interval. Proof. Suppose f (x) is increasing. Then either f (x) = f (y) or f (x) − f (y) f (y) − f (x) ≥ 0 for any x 6= y. This has the same sign as x − y. Therefore y−x implies f 0 (x) ≥ 0 for any x. Conversely, for x > y, the mean value theorem tells us f (x) − f (y) = f 0 (c)(x − y) for some x > c > y. Then the condition f 0 (c) ≥ 0 implies f (x) ≥ f (y), and the condition f 0 (c) > 0 implies f (x) > f (y).

106

CHAPTER 2. DIFFERENTIATION .... . .. .. .. . . .. .. .. . ...... −2 .. ...................... . . . . . . . . . 4e . . . . . ...... .. .. ..... ..... ... . .... . . . . . . . . .... . .. .... .... ...... . . . . . . . . . . . . . . . . . . ..... . ..... ........................ ................................................................................................................................................................................................................................................................................. .. a −2 b .. .. Figure 2.4: graph of x2 ex

Example 2.2.7. The derivative of f (x) = x2 ex is f 0 (x) = x(x + 2)ex . The possible local extrema found in Example 2.2.1 divide the whole real line into three sections (−∞, −2], [−2, 0] and [0, ∞), on the interiors of which we have respectively f 0 (x) > 0, f 0 (x) < 0 and f 0 (x) > 0. Therefore the function is strictly increasing, strictly decreasing and strictly increasing again on the three sections. Since the function changes from increasing to decreasing at x = −2, f (−2) = −2 4e is a local maximum. Since the function changes from decreasing to increasing at x = 0, f (0) = 0 is a local minimum. Combined with limx→−∞ f (x) = 0, limx→+∞ f (x) = +∞, we get a rough sketch for the graph of the function in Figure 2.4. The numbers a and b in the graph will be explained in Example 2.3.21. 2

Example 2.2.8. By taking the derivative of the function f (x) = (x+1)x 3 , we found 2 x = − and x = 0 are the possible local extrema in Example 2.2.2. To determine 5 0 0 whether they are indeed local extrema, we note that f (x) > 0, f (x) < 0 and 2 2 f 0 (x) > 0 in the interiors of the intervals −∞, − , − , 0 and [0, ∞). Thus 5 5 2 the function changes from increasing to decreasing at x = − and changes from √ 5 3 3 20 2 decreasing to increasing at x = 0. This implies that f − = is a local 5 25 maximum and f (0) = 0 is a local minimum. Example 2.2.9. We prove ex > 1+x for x 6= 0. The problem is the same as showing f (x) = ex − x > 1 = f (0). For x > 0, we have f 0 (x) = ex − 1 > 0 for x > 0. Thus f (x) is strictly increasing and f (x) > f (0) for x > 0. The argument for the case x < 0 is similar. Exercise 2.2.19. Study the monotone property of functions and find local maxima and minima. 1. x(x + 1)(x + 2). 2.

x . 1 + x2

5. sin x + cos x. 6. |x|ex . 7. (x + 1) cos x + (x − 1) sin x.

1 3. x − . x

8. x − 2 sin x. 2

4. (x − 1)x 3 .

9. x2 − log x.

2.2. APPLICATION OF DIFFERENTIATION 10. ex cos x.

107

12. xx .

11. x − 2 arctan x. Exercise 2.2.20. Prove the inequalities. 1. tan x > x for 0 < x < 2. sin x >

π . (Hint: Consider sin x − x cos x) 2

2 π sin x x for 0 < x < . (Hint: Consider ) π 2 x

x2 3. sin x + cos x > 1 + x − √ for x > 0. 2 4. cos x > 1 − 5. x −

x2 π for 0 < x < . 2 2

x2 x2 > log(x + 1) > x − for x > 0. 2(1 + x) 2

1 x+1 1 x is strictly increasing and 1 + is Exercise 2.2.21. Prove that 1 + x x x x+1 1 1 <e< 1+ strictly decreasing on (0, +∞). Moreover, prove that 1 + x x 1 x e and e − 1 + < for x > 0. x x 1 x+α Exercise 2.2.22. The following steps show that the smallest α such that 1 + > x 1 e for all x > 0 is α = . 2 1 1 1 1. Convert the problem to α > f , with f (x) = − . Thus the x log(1 + x) x problem is the same as finding the supremum of f (x) on (0, +∞). 1 1 2. Prove that u − > 2 log u for u > 1 and u − < 2 log u for u < 1. Then u u use the inequality to show that f is a decreasing function. 1 3. Show that the supremum of f (x) is limx→0+ f (x) = . 2 1 n+α 1 n+β 4. Find the smallest α and biggest β such that 1 + >e> 1+ n n for all natural numbers n. Exercise 2.2.23. Suppose f (x) is continuous for x ≥ 0 and differentiable for x > 0. f (x) Prove that if f 0 (x) is strictly increasing and f (0) = 0, then is also strictly x increasing. Exercise 2.2.24. Suppose f (x) is left and right differentiable on an interval. Prove that if f+0 (x) ≥ 0 and f−0 (x) ≥ 0, then f (x) is increasing. Moreover, if the inequalities are strict, then f (x) is strictly increasing.

108

2.2.4

CHAPTER 2. DIFFERENTIATION

ˆ L’Hospital’s Rule

The derivative is a useful tool for the computation of function limits of the ∞ 0 . types or 0 ∞ Proposition 2.2.5 (L’Hˆospital4 ’s Rule). Suppose f (x) and g(x) are differentiable on (a − δ, a) ∪ (a, a + δ) for some δ > 0. Assume 1. Either limx→a f (x) = limx→a g(x) = 0 or limx→a f (x) = limx→a g(x) = ∞. 2. limx→a Then limx→a

f 0 (x) exists. g 0 (x) f (x) f 0 (x) f (x) also exists and limx→a = limx→a 0 . g(x) g(x) g (x)

The following counterexample 1+x 1 (1 + x)0 1 = , lim = lim = 1. 0 x→0 2 + x x→0 x→0 2 (2 + x) 1 lim

shows the necessity of the first condition. The second condition means that the existence of the limit for the derivative quotient implies the existence of the limit for the original quotient. The converse of this implication is not necessarily true. The L’Hˆospital’s rule as stated above is only about the finite limit at a finite a. The subsequent proof shows that the technique also applies to one 1 side limits. By converting x to , it is easy to show that the technique also x holds for x approaching various kinds of infinities. Moreover, the rule can also be applied if the limit of the quotient is some kind of infinity. This can f (x) g(x) be shown by reverting the quotient to . g(x) f (x) To prove L’Hˆospital’s rule, we need the following extension of the mean value theorem. Theorem 2.2.6 (Cauchy’s Mean Value Theorem). Suppose f (x) and g(x) are continuous on [a, b] and differentiable on (a, b). If g 0 (x) is never zero, then there is a < c < b, such that f 0 (c) f (b) − f (a) = . g 0 (c) g(b) − g(a)

(2.2.6)

Geometrically, consider (g(x), f (x)) as a parametrized curve in R2 . The vector from one point at x = a to another point at x = b is (g(b)−g(a), f (b)− 4 Guillaume Francois Antoine Marquis de L’Hˆospital, born 1661 in Paris (France), died 1704 in Paris (France). His famous rule was found in his 1696 book “Analyse des infiniment petits pour l’intelligence des lignes courbes”, which was the first textbook in differential calculus.

2.2. APPLICATION OF DIFFERENTIATION

109

.... .. ............ .. . ....... ....... . . .. . .. ... . . ..... .. ... .. .. . ... . . .... .. ... ... .. .. .. . .... .. . . . .............. (g(b), f (b)) ..... .. ..................................... ... ...... .. . . ................... .. .. . .... .. (g(a), f (a)) ............................... ..................................... (g 0 (c), f 0 (c)) .............. .. ... ...... . . .. (g(c), f (c)) . . . . .. ....... .. ................. . .... ................................................................................................................................................................................ .. .. .. Figure 2.5: Cauchy’s mean value theorem f (a)). Cauchy’s mean value theorem says that it should be parallel to a tangent vector (g 0 (c), f 0 (c)) at another point x = c on the curve. This suggests that the theorem may be proved by imitating the proof of the mean value theorem, by considering h(x) = f (x) − f (a) −

f (b) − f (a) (g(x) − g(a)). g(b) − g(a)

The details are left to the reader. Proof of L’Hˆospital’s Rule. We will prove for the limit of the type limx→a+ only, with a a finite number. Thus f (x) and g(x) are assumed to be differentiable on (a, a + δ). First assume limx→a+ f (x) = limx→a+ g(x) = 0. Then f (x) and g(x) can be extended to continuous functions on [a, a+δ) by defining f (a) = g(a) = 0. Cauchy’s mean value theorem then tells us that for any a < x < a + δ, we have f (x) f (x) − f (a) f 0 (c) = = 0 (2.2.7) g(x) g(x) − g(a) g (c) for some c satisfying a < c < x (and c depends on x). As x → a+ , we have c → a+ . Therefore if the limit of the right of (2.2.7) exists, so is the limit on the left, and the two limits are the same. Now consider the case limx→a+ f (x) = limx→a+ g(x) = ∞. The technical difficulty here is that the functions cannot be extended to x = a as before. f (x) − f (a) Still, we try to establish something similar to (2.2.7) by replacing g(x) − g(a) f (x) − f (b) with , where b > a is very close to a. The second equality in g(x) − g(b) (2.2.7) still holds. Although the first equality no longer holds, it is sufficient f (x) f (x) − f (b) to show that and are very close. Of course all these should g(x) g(x) − g(b) be put together in logical order.

110

CHAPTER 2. DIFFERENTIATION f 0 (x) = l. For any > 0, there is δ1 > 0, such that g 0 (x) 0 f (x) a < x ≤ b = a + δ1 =⇒ 0 − l < . g (x)

Denote limx→a+

Then by Cauchy’s mean value theorem, 0 f (x) − f (b) f (c) a < x < b =⇒ − l = 0 − l < , g(x) − g(b) g (c)

(2.2.8)

f (x) − f (b) is bounded. g(x) − g(b) Moreover, by the assumption limx→a+ f (x) = limx→a+ g(x) = ∞, we have

where a < x < c < b. In particular, the quotient

g(b) g(x) lim+ = 1. f (b) x→a 1− f (x) 1−

Therefore, there is δ1 ≥ δ > 0, such that f (x) f (x) − f (b) − a < x < a + δ =⇒ g(x) g(x) − g(b) g(b) 1 − f (x) − f (b) g(x) = − 1 < . f (b) g(x) − g(b) 1 − f (x) Since a < x < a + δ implies a < x < b, the conclusion of (2.2.8) also holds. Thus f (x) a < x < a + δ =⇒ − l g(x) f (x) f (x) − f (b) f (x) − f (b) + ≤ − − l < 2. g(x) g(x) − g(b) g(x) − g(b) sin x Example 2.2.10. To find limx→0 , we note that both sin x and x are continuous x and vanish at x = 0. Moreover, (sin x)0 cos x = lim = cos 0 = 1. x→0 x→0 1 x0 lim

sin x (sin x)0 = limx→0 = 1. x x0 sin x = Unfortunately, the argument above is circular, because the limit limx→0 x 1 was used to compute the derivative of sin x. Similarly, the computations of ex − 1 log(x + 1) limx→0 and limx→0 by L’Hˆospital’s rule are also circular. x x

Therefore by L’Hˆ ospital’s rule, limx→0

2.2. APPLICATION OF DIFFERENTIATION Example 2.2.11. The limit limx→1

1 1 − x − 1 log x

111

may be found by making use

of L’Hˆospital’s rule twice. 1 log x − x + 1 1 lim = lim − x→1 x − 1 x→1 (x − 1) log x log x 1 −1 1−x x = lim =(1) lim x−1 x→1 x log x + x − 1 x→1 log x − x −1 1 (2) = lim =− . x→1 log x + 2 2 Note that one should not be blindly apply L’Hˆospital’s rule without checking the conditions. For the computation above, since 1 − x = x log x + x − 1 = 0 at x = 0 −1 and limx→1 exists, the equality =(2) holds. Then since log x − x + 1 = log x + 2 1 −1 x exists, the equality =(1) holds. (x − 1) log x = 0 at x = 0 and limx→1 x−1 log x − x √ 1 Example 2.2.12. The limit limx→+∞ (x + 1 + x2 ) log x may be found by first applying L’Hˆ ospital’s rule to the log of the function. x 1+ √ 2 √ √1 + x 2 log(x + 1 + x ) x x + 1 + x2 lim = lim = lim √ = 1. 1 x→+∞ x→+∞ x→+∞ log x 1 + x2 x √ Note that L’Hˆ ospital’s rule may be applied because limx→+∞ log(x + 1 + x2 ) = limx→+∞ log x = ∞ and the limit in the middle exists. By taking the exponential, we get p 1 lim (x + 1 + x2 ) log x = e1 = e. x→+∞

Exercise 2.2.25. Compute the limits. x − sin x . x3 ex − 1 − x 2. limx→0 . x2 cos x − 1 1 + . 3. limx→0 x4 2x2

1. limx→0

4. limx→0 sin x log x. 1 5. limx→0 − cot x . x

1

6. limx→0 (cos x) x2 . √ x 7. limx→0

1+x−e . x

8. limx→+∞ (π − 2 arctan x) log x. 2 1 x 9. limx→∞ x sin . x

Exercise 2.2.26. Discuss whether L’Hˆospital’s rule can be applied to the limits. 1. limx→∞

x + sin x . x − cos x

2. limx→+∞

x . x + sin x

x2 sin 3. limx→0

sin x

1 x.

112

CHAPTER 2. DIFFERENTIATION

Exercise 2.2.27. The mean value theorem tells us log(1 + x) − log 1 = x

1 , ex − 1 = xeθx , 1 + θx

1 for some 0 < θ < 1. Prove that in both cases, limx→0 θ = . 2 Exercise 2.2.28. Prove L’Hospital’s rule for the case a = +∞. Moreover, discuss L’Hospital’s rule for the case l = ∞.

2.2.5

Additional Exercise

Ratio Rule By specifying the ratio rule in Exercise 1.1.32 to yn = an , we get the limits in Exercises 1.1.33 and 1.1.34. By making other choices of y0 and using the yn+1 linear approximations to estimate the quotient , we get other concrete yn forms of the ratio rule. Exercise 2.2.29. Prove that if a > b > c > 0, then 1 − ax < (1 − x)b < 1 − cx and 1 + ax > (1 + x)b > 1 + cx for small and positive x. Then prove the following xn+1 ≤ 1 − a for some a > 0 and big n, then limn→∞ xn = 0. 1. If xn n xn+1 ≥ 1 + a for some a > 0 and big n, then limn→∞ xn = ∞. 2. If xn n Exercise 2.2.30. Study the limits. 1. limn→∞

(n!)2 an . (2n)!

2. limn→∞

(n + a)n+b . cn n!

Exercise 2.2.31. Rephrase the rules in 2.2.29 Exercise in terms of the quotient xn xn xn+1 . Then prove that if limn→∞ n xn+1 − 1 > 0, then limn→∞ xn = 0. Find the similar condition for the conclusion limn→∞ xn = ∞. Exercise 2.2.32. Prove that if a > b > c > 0, then 1 −

a (log(x − 1))b < < x log x (log x)b

c a (log(x + 1))b c and 1 + > > 1+ for big and positive x. x log x x log x x log x (log x)b Then prove the following. xn+1 ≤1− a 1. If for some a > 0 and big n, then limn→∞ xn = 0. xn n log n xn+1 ≥1+ a 2. If for some a > 0 and big n, then limn→∞ xn = ∞. xn n log n

1−

Darboux5 ’s Intermediate Value Theorem 5 Jean Gaston Darboux, born 1842 in Nimes (France), died 1917 in Paris (France). Darboux made important contributions to differential geometry and analysis.

2.2. APPLICATION OF DIFFERENTIATION

113

Exercise 2.2.33. Suppose f (x) is differentiable on [a, b]. By considering the extrema of f (x) − γx, prove that for any γ between f 0 (a) and f 0 (b), there is c ∈ [a, b], such that f 0 (c) = γ. Exercise 2.2.34. Find a function f (x) differentiable everywhere on [0, 1], such that f 0 (x) is not continuous. The examples shows that Darboux’s intermediate value theorem is not a consequence of the usual intermediate value theorem.

Extension of Mean Value Theorem6 In Exercise 2.2.15, the mean value theorem is extended to continuous functions that are both left and right differentiable. In what follows, we consider continuous functions that are left or right differentiable. f (b) − f (a) , b−a 0 there is a linear function L(x) satisfying L (x) = l and L(a) > f (a), L(b) < f (b).

Exercise 2.2.35. Prove that for any function f (x) on [a, b] and l <

Exercise 2.2.36. Suppose f (x) is a continuous function f (x) on [a, b] and L is a linear function with the properties in Exercise 2.2.35. Prove that c = sup{x ∈ (a, b) : f ≤ L on [a, x]} satisfies a < c < b and f (c) = L(c). Moreover, if f (x) has any one side derivative at c, then the one side derivative is no less than L0 (c) = l. Exercise 2.2.37. Suppose f (x) is a continuous function on [a, b] and is left or right differentiable at any point on (a, b). Let f∗0 (x) be one of the one side derivatives at x. Prove that f (b) − f (a) inf f∗0 ≤ ≤ sup f∗0 . b−a (a,b) (a,b) Exercise 2.2.38. Use Exercise 2.2.37 to further extend the criterion for the monotonicity in Exercise 2.2.24.

Basic Inequalities The monotone property can be used to prove some important basic inequalities. Let p and q be real numbers satisfying 1 1 + = 1. p q 1

Exercise 2.2.39. For x > 0, prove that x p ≤

1 1 1 1 1 x+ in case p > 1 and x p ≥ x+ p q p q

in case p < 1. Exercise 2.2.40 (Young7 Inequality). For a, b > 0, prove ab ≤

1 p 1 q a + b for p > 1 p q

ab ≥

1 p 1 q a + b for p < 1. p q

and

When does the equality hold? 6 See “Some Remarks on Functions with One-Sided Derivatives” by Miller and V´ yborn´ y, American Math Monthly 93 (1986) 471-475. 7 William Henry Young, born 1863 in London (England), died 1942 in Lausanne (Switzerland). Young discovered a form of Lebesgue integration independently. He wrote an influential book ”The fundamental theorems of the differential calculus” in 1910.

114

CHAPTER 2. DIFFERENTIATION

Exercise 2.2.41 (H¨ older8 Inequality). Suppose p, q > 0. For positive numbers a1 , ai bi a2 , . . . , an , b1 , b2 , . . . , bn , by taking a = P 1 and b = P 1 in the Young ( api ) p ( bqi ) q inequality, prove X 1 X 1 X p q bqi . (2.2.9) ai bi ≤ api When does the equality hold? Exercise 2.2.42 (Minkowski9 Inequality). Suppose p > 1. By applying the H¨older inequality to a1 , a2 , . . . , an , (a1 + b1 )p−1 , (a2 + b2 )p−1 , . . . , (an + bn )p−1 and then to b1 , b2 , . . . , bn , (a1 + b1 )p−1 , (a2 + b2 )p−1 , . . . , (an + bn )p−1 , prove X

(ai + bi )p

1

p

≤

X

api

1

p

+

X

bpi

1

p

.

(2.2.10)

When does the equality hold?

2.3

High Order Approximation

The linear approximation has many nice properties and can be used to solve many problems. Still, there are problems that require approximations more refined than the linear one. In this case, polynomials of higher and higher degrees can be used. This leads to high order derivatives and Taylor series.

2.3.1

Quadratic Approximation

A function is approximated by a quadratic function a + b∆x + c∆x2 at x0 , if for any > 0, there is δ > 0, such that |∆x| < δ =⇒ |f (x) − a − b∆x − c∆x2 | ≤ |∆x|2 .

(2.3.1)

Similar to the linear approximation, the condition (2.3.1) for quadratic approximation means exactly a = f (x0 ), the derivative b = f 0 (x0 ) exists, and the limit f (x0 + ∆x) − f (x0 ) − f 0 (x0 )∆x ∆x→0 ∆x2 f (x) − f (x0 ) − f 0 (x0 )(x − x0 ) = lim x→x0 (x − x0 )2

c = lim

(2.3.2)

exists. It is rather tempting to define c to be the second order derivative. But the following result suggests that it is better to call 2c the second order derivative. 8 Otto Ludwig H¨ older, born 1859 in Stuttgart (Germany), died 1937 in Leipzig (Germany). He discovered the inequality in 1884. H¨older also made fundamental contributions to the group theory. 9 Hermann Minkowski, born 1864 in Alexotas (Russia, now Kaunas of Lithuania), died 1909 in G¨ ottingen (Germany). Minkowski’s fundamental contribution to geometry provided the mathematical foundation of Einstein’s theory of relativity.

2.3. HIGH ORDER APPROXIMATION

115

Proposition 2.3.1. Suppose f (x) is differentiable near x0 , and the derivative function f 0 (x) has further derivative f 00 (x0 ) at x0 . Then f (x) has a quadratic f 00 (x0 ) approximation at x0 , given by f (x0 ) + f 0 (x0 )(x − x0 ) + (x − x0 )2 . 2 Proof. Consider the difference (called remainder) R2 (x) = f (x) − f (x0 ) − f 0 (x0 )(x − x0 ) −

f 00 (x0 ) (x − x0 )2 2

between the function and the expected quadratic approximation. The difference satisfies R2 (x0 ) = R20 (x0 ) = R200 (x0 ) = 0. By Cauchy’s mean value theorem, we have R20 (c) − R20 (x0 ) R2 (x) R2 (x) − R2 (x0 ) R20 (c) = = = (x − x0 )2 (x − x0 )2 − (x0 − x0 )2 2(c − x0 ) 2(c − x0 ) for some c between x0 and x. As x → x0 , we have c → x0 , so that lim

x→x0

R2 (x) R200 (x0 ) R20 (c) − R20 (x0 ) = lim = = 0. (x − x0 )2 c→x0 2(c − x0 ) 2

Because of the proposition, we define the second order differential to be d2 f = 2cdx2 , where dx2 is indeed considered as the square of dx, at least symbolically. We have d2 f = f 00 (x0 )dx2 if the second order derivative exists at x0 . Example 2.3.1. Suppose f (x) has second order derivative at x0 . Then we have f 00 (x0 ) ∆x2 + o(∆x2 ) 2 f (x0 + 2∆x) = f (x0 ) + 2f 0 (x0 )∆x + 2f 00 (x0 )∆x2 + o(∆x2 ). f (x0 + ∆x) = f (x0 ) + f 0 (x0 )∆x +

This implies f (x0 + 2∆x) − 2f (x0 + ∆x) + f (x0 ) = f 00 (x0 )∆x2 + o(∆x2 ), and we get another way of expressing the second order derivative as a limit. f 00 (x0 ) = lim

∆x→0

f (x0 + 2∆x) − 2f (x0 + ∆x) + f (x0 ) . ∆x2

Exercise 2.3.1. Find suitable conditions among constants a, b, λ, µ so that λf (x0 + a∆x) + µf (x0 + b∆x) + f (x0 ) = f 00 (x0 )∆x2 + o(∆x2 ) holds for twice differentiable functions. Then derive some formula for the second order derivative similar to the one in Example 2.3.1. Exercise 2.3.2. Suppose f 00 (0) exists and f 00 (0) 6= 0. Prove that in the mean value 1 theorem f (x) − f (0) = xf 0 (θx), we have limx→0 θ = . This generalizes the 2 observation in Exercise 2.2.27.

116

CHAPTER 2. DIFFERENTIATION

Exercise 2.3.3. Suppose f (x) has second order derivative at x0 . Let h and k be small, distinct and nonzero numbers. Find the quadratic function q(x) = a + b∆x + c∆x2 satisfying q(x0 ) = f (x0 ), q(x0 + h) = f (x0 + h), q(x0 + k) = f (x0 + k). f 00 (x0 ) h as long as 2 h−k is kept bounded. This provides the geometrical interpretation of the quadratic approximation. Then prove that limh,k→0 b = f 0 (x0 ) and limh,k→0 c =

Exercise 2.3.4. Proposition 2.3.1 basically says that the existence of the second order derivative implies the second order differentiability. Show that the converse is not true (in contrast what Proposition 2.1.3 says about the first order case) by considering the following functions at x0 = 0.  ( x3 sin 1 if x 6= 0 x3 if x is rational 1. f (x) = . 2. f (x) = . x2 0 if x is irrational 0 if x = 0 Exercise 2.3.5. Determine the existence of the quadratic approximation and the existence of the second order derivative of functions in Exercise 2.1.5. Exercise 2.3.6. Study the existence of the quadratic approximation and the existence of the second order derivative of function |x3 (x − 1)(x − 2)2 |. Exercise 2.3.7. Suppose P (x) and Q(x) are quadratic approximations of f (x) and g(x) at x0 . 1. Prove that P (x) + Q(x) is the quadratic approximation of f (x) + g(x) at x0 . 2. Prove that although P (x)Q(x) has degree 4, the second order truncation of the product is the quadratic approximation of f (x)g(x). 3. Suppose f (x) and g(x) have second order derivatives. What do the two conclusions tell you about the second order derivatives of f (x) + g(x) and f (x)g(x)? Exercise 2.3.8. Suppose P (x) is the quadratic approximations of f (x) at x0 . Suppose Q(y) is the quadratic approximations of g(y) at y0 = f (x0 ). 1. Prove that the second order truncation of Q(P (x)) is the quadratic approximation of g(f (x)). 2. Suppose f (x) and g(y) have second order derivatives at x0 and y0 . What can you say about the second order derivative of g(f (x))?

2.3.2

High Order Derivative

The quadratic approximation is computed by taking the derivative twice. Approximations by higher degree polynomials are expected to be computed by taking the derivative many times. Given a differentiable function f (x), its derivative f 0 (x) is also a function. If the function f 0 (x) is also differentiable, then we have the second order derivative f 00 (x). If the function f 00 (x) is again differentiable, then we get the

2.3. HIGH ORDER APPROXIMATION

117

third order derivative f 000 (x). In general, the n-th order derivative of f (x) is dn f denoted f (n) (x) or . dxn It is easy to show (by induction, for example), that the high order derivatives of the power, the exponential, and the logarithmic functions are (xα )(n) = α(α − 1) · · · (α − n + 1)xα−n , (ex )(n) = ex , (αx )(n) = αx (log α)n , (log |x|)(n) = (−1)n−1 (n − 1)!x−n . Note that if α is a natural number, then (xα )(n) = 0 for n > α. The high order derivatives of the sine and the cosine functions have periodic pattern. sin0 x = cos x, sin00 x = − sin x, sin000 x = − cos x, sin0000 x = sin x, . . . cos0 x = − sin x, cos00 x = − cos x, cos000 x = sin x, cos0000 x = cos x, . . . However, there is no clear pattern for the derivatives of the tangent function. tan0 x = sec2 x, tan00 x = 2 sec2 x tan x, tan000 x = 4 sec2 x tan2 x + 2 sec4 x = 6 sec4 x − 4 sec2 x, tan0000 x = (24 sec4 x − 8 sec2 x) tan x. The formulae for the derivatives of the sum and the scalar multiplication can be directly extended to the high order derivatives. (f + g)(n) = f (n) + g (n) , (cf )(n) = cf (n) . The Leibniz rule can also be extended. (f g)0 (f g)00 (f g)000 (f g)0000

= f 0g + f g0, = f 00 g + 2f 0 g 0 + f g 00 , = f 000 g + 3f 00 g 0 + 3f 0 g 00 + f g 000 , = f 0000 g + 4f 000 g 0 + 6f 00 g 00 + 4f 0 g 000 + f g 0000 .

By induction, it is not hard to show that the coefficients in the extended Leibniz rule are the same as the ones in the binomial expansion. (f g)(n) = f (n) + nf (n−1) g 0 +

n(n − 1) (n−2) 00 f g + · · · + g (n) . 2

There is no clean extension of the chain rule. The following is the chain rule

118

CHAPTER 2. DIFFERENTIATION

for the second order derivative of z = g(y) = g(f (x)). d2 z d dz (definition of f 00 ) = dx2 dx dx d dz dy = (chain rule) dx dy dx d dz dy dz d dy + (Leibniz rule) = dx dy dx dy dx dx d dz dy dy dz d2 y = + (chain rule and definition of f 00 ) 2 dy dy dx dx dy dx 2 d2 z dy dz d2 y = 2 + . (definition of f 00 ) dy dx dy dx2 If the n-th order derivative exists, then the n-th order differential is dn f = f (n) (x)dxn , where dxn is symbolically the n-th power of dx. The high order differential can be generally computed by dn f = d(dn−1 f ), d(dxn ) = 0 for the variable x, and the Leibniz rule d(αβ) = (dα)β + αdβ. For example, d2 f = d(df ) = d(f 0 dx) = (df 0 )dx + f 0 d(dx) = (f 00 dx)dx = f 00 dx2 , and d3 f = d(d2 f ) = d(f 00 dx2 ) = (df 00 )dx2 + f 00 d(dx2 ) = (f 000 dx)dx2 = f 000 dx3 . Example 2.3.2. Let ( x2 f (x) = −x2 Then

( 2x f (x) = −2x 0

if x ≥ 0 . if x < 0

if x ≥ 0 = 2|x|. if x < 0

By Example 2.1.8, the function f 0 (x) is not differentiable at 0. Therefore f (x) has derivative at 0 only up to the first order, and has all the high order derivatives at any x 6= 0. 0 p sin x cos x 1 + sin2 x = p from Example 2.1.11. Example 2.3.3. We have 1 + sin2 x Then p 0 p p 00 (sin x cos x)0 1 + sin2 x − sin x cos x 1 + sin2 x 1 + sin2 x = p 2 1 + sin2 x p sin x cos x (cos2 x − sin2 x) 1 + sin2 x − sin x cos x p 1 + sin2 x = 1 + sin2 x 2 2 (1 − 2 sin x)(1 + sin x) − sin2 x(1 − sin2 x) = 3 1 + sin2 x 2 =

1 − 2 sin2 x − sin4 x . 3 1 + sin2 x 2

2.3. HIGH ORDER APPROXIMATION

119

Example 2.3.4. By (x3 )(n) = 0 for n > 3 and the Leibniz rule, (x3 ex )(n) = x3 (ex )(n) + n(x3 )0 (ex )(n−1) +

n(n − 1) 3 00 x (n−2) (x ) (e ) 2

n(n − 1)(n − 2) 3 000 x (n−3) (x ) (e ) 6 = (x3 + 3nx2 + 3n(n − 1)x + n(n − 1)(n − 2))ex . +

Example 2.3.5. From the formula for the high order derivatives of xα , it is easy to deduce ((ax + b)α )(n) = α(α − 1) · · · (α − n + 1)an (ax + b)α−n . 1 1 1+x Then by √ = 2(1 − x)− 2 − (1 − x) 2 , we get 1−x 2n−1 1 + x (n) 13 2n − 1 1 1 2n − 3 − 2n+1 2 √ =2 ··· (1 − x) − − ··· (1 − x)− 2 2 2 2 2 2 2 1−x 2n+1 1 · 3 · · · (2n − 3) (1 − x)− 2 (4n − 1 − x). = n 2 dy y2 Example 2.3.6. In Example 2.1.12, the derivative for the ellipse x2 + = 1 was dx 4 computed in three ways. Continuing the three ways, the second order derivative can also be computed. √ In the first way, the upper part y = 2 1 − x2 has the first order derivative dy 2x , = −√ dx 1 − x2 and the second order derivative √ x √ 2 1 − x2 + 2x √ 0 2 1 − − 2x( 1 − x ) 1 − x2 √ =− =− 2 2 dx 1−x ( 1 − x2 )2 2 2 2(1 − x ) + 2x 2 =− =− 3 3 . (1 − x2 ) 2 (1 − x2 ) 2 d2 y

(2x)0

√

x2

In the second way, y is implicitly considered as a function of x. Taking both sides of the equation x2 +

y2 = 1 and using the chain rule, we get 4

d on dx

1 dy 2x + y = 0. 2 dx Taking

d again, we get dx 1 2+ 2

dy dx

2

1 d2 y + y 2 = 0. 2 dx

dy 4x = − . Substituting dx y this into the second equation and solve for the second order derivative, we get y2 2 16 x + d2 y 16 4 =− = − 3. dx2 y3 y In Example 2.1.12, the first equation was solved to give

120

CHAPTER 2. DIFFERENTIATION

In the third way, the ellipse is parametrized by x = cos t, y = 2 sin t. The first dy order derivative = −2 cot t was computed by the chain rule in Example 2.1.12. dx Continuing with the same idea, we get d2 y d(−2 cot t) = = 2 dx dx

d(−2 cot t) 2 csc2 t 2 dt = =− 3 . dx − sin t sin t dt

Example 2.3.7. Let p(x) be a polynomial and  p 1 e− x12 x f (x) =  0

if x 6= 0

.

if x = 0

At x 6= 0, we have 1 1 1 1 2 − 12 0 1 x f (x) = −p e =q e− x2 , +p 2 3 x x x x x 0

where q(x) = −p0 (x)x2 − 2p(x)x3 is also a polynomial. Moreover, by k

1 k − x2

lim x e

x→0

= lim x

−k −x2

x→∞

e

x− 2 = lim =0 x→+∞ ex

for any k, we have f (x) 1 f (0) = lim = lim p x→0 x x→0 x 0

1 1 e− x2 = 0. x

Therefore f 0 (x) is of the same type as f (x), with another polynomial q(x) in place of p(x). In particular, we conclude that f (x) has derivatives of all orders, and f (n) (0) = 0 for any n. Exercise 2.3.9. Compute the derivatives up to the third order. 1.

x2

1 . +1

2x2 + 1 2. 2 . x +1

1 12. arcsin . x

6. arcsin x. 7. arctan x.

13. x(log x − 1).

8. arcsec x.

4. sec x.

10. 2sin 3x .

14. log(log x). √ 15. log(x + 1 + x2 ).

5. csc x.

11. e2x (cos x − 2 sin x).

16. log | sin x|.

3. cot x.

9. (sin 2x + cos 3x)7 .

Exercise 2.3.10. Compute all the high order derivatives. 1. log(2 − 3x). 2.

x2 + x + 1 . 4x2 − 1

Exercise 2.3.11. Prove

3. √

x . 2x + 1

4. x4 log x. dn (f (ax + b)) = an f (n) (ax + b). dxn

2.3. HIGH ORDER APPROXIMATION

121

Exercise 2.3.12. Suppose u = u(x) and v = v(x) are differentiable functions satisfying u(0) = 1, v(0) = −1 and the given equations. Compute the second order derivatives of u and v at 0. 1. u2 + uv + v 2 = 1, (1 + x2 )u + (1 − x2 )v = x. 2. xu + (x + 1)v = ex , eu + xe−v = ex+1 . Exercise 2.3.13. Suppose y is a function of x satisfying the given equation. Comd2 y pute the second order derivative at the given point. dx2 1. y 3 + 2xy 2 − 2x3 = 1, at x = 1, y = 1. 2. x sin(x − y) = y cos(x + y), at x = π, y = 0. 3. ey = xy, at x = −e−1 , y = −1. 4. (1 + y)ex + (1 + x)ey = xy + 2, at x = 0, y = 0. Exercise 2.3.14. For the parametrized curves, compute the second order derivative d2 y at the given point. dx2 π 1. Cycloid: x = t − sin t, y = 1 − cos t, at t = . 4 2. Spiral: x = t cos t, y = t sin t, at t = π. 3. Involute of circle: x = cos t + t sin t, y = sin t − t cos t, at t = π. π 4. Four-leaved rose: x = cos 2t cos t, y = cos 2t sin t, at t = . 4 Exercise 2.3.15. Find the formulae for the derivatives of quotient, composition and inverse functions up to the third order. Exercise 2.3.16. Prove the functions y = arctan x and y = (arcsin x)2 satisfy the equations (1 + x2 )y 00 + 2xy 0 = 0 and (1 − x2 )y 00 − xy 0 = 2. Then use the equations to compute all the high order derivatives of arctan x and (arcsin x)2 at 0. Exercise 2.3.17. Suppose x = x(t) and y = y(t) is a parametrized curve. Prove that d2 y y 00 x0 − y 0 x00 = , dx2 x03 where x0 , x00 , y 0 , y 00 are the derivatives with respect to t. Also find the formula for the third order derivative.

2.3.3

Taylor Expansion

Using high order derivatives, the discussion on the quadratic approximations can be extended to high order approximations. Theorem 2.3.2. Suppose f (x) has the n-th order derivative f (n) (x0 ) at x0 . Then for the n-th degree polynomial Tn (x) = f (x0 ) + f 0 (x0 )(x − x0 ) +

f 00 (x0 ) f (n) (x0 ) (x − x0 )2 + · · · + (x − x0 )n , 2 n! (2.3.3)

122

CHAPTER 2. DIFFERENTIATION

we have lim

x→x0

f (x) − Tn (x) = 0. (x − x0 )n

Note that the existence of f (n) (x0 ) implicitly assumes the existence of f (k) (x) for all k < n and all x near x0 . The function Tn (x) is a polynomial of degree n characterized by the property f (x) = Tn (x) + o(∆xn ), ∆x = x − x0 .

(2.3.4)

Because of the property, we say f (x) is n-th order differentiable. The polynomial Tn (x) is called the n-th order Taylor expansion. Proof of Theorem 2.3.2. The theorem can be proved similar to Proposition 2.3.1. The remainder Rn (x) = f (x) − Tn (x) satisfies Rn (x0 ) = Rn0 (x0 ) = Rn00 (x0 ) = · · · = Rn(n) (x0 ) = 0. Therefore by Cauchy’s mean value theorem, Rn (x) − Rn (x0 ) Rn (x) = (x − x0 )n (x − x0 )n − (x0 − x0 )n 0 Rn0 (c1 ) − Rn0 (x0 ) Rn (c1 ) = = ··· = n(c1 − x0 )n−1 n((c1 − x0 )n−1 − (x0 − x0 )n−1 ) (n−1)

=

Rn (cn−1 ) n(n − 1) · · · 2(cn−1 − x0 )

(2.3.5)

for some c1 between x0 and x, c2 between x0 and c1 , . . . , and cn−1 between x0 and cn−2 . Then we have (n−1)

Rn (x) Rn lim = lim n x→x0 (x − x0 ) cn−1 →x0

(n−1)

(cn−1 ) − Rn n!(c1 − x0 )

(x0 )

(n)

Rn (x0 ) = = 0. n!

The computation of the high order derivatives in Section 2.3.2 immediately gives the following Taylor expansions at 0. α(α − 1) 2 α(α − 1)(α − 2) 3 x + x 2! 3! α(α − 1) · · · (α − n + 1) n + ··· + x + o(xn ), n!

(1 + x)α = 1 + αx +

1 = 1 + x + x2 + x3 + · · · + xn + o(xn ), 1−x 1 1 1 ex = 1 + x + x2 + x3 + · · · + xn + o(xn ), 2! 3! n! 1 2 1 3 (−1)n−1 n log(1 + x) = x − x + x + · · · + x + o(xn ), 2 3 n 1 3 1 5 (−1)n+1 2n−1 sin x = x − x + x + · · · + x + o(x2n ), 3! 5! (2n − 1)! 1 1 (−1)n 2n cos x = 1 − x2 + x4 + · · · + x + o(x2n+1 ). 2! 4! (2n)!

2.3. HIGH ORDER APPROXIMATION

123

Example 2.3.8. By rewriting the function x4 as a polynomial in (x − 1) x4 = (1 + (x − 1))4 = 1 + 4(x − 1) + 6(x − 1)2 + 4(x − 1)3 + (x − 1)4 , and using the characterization (2.3.4), we get the Taylor expansion of various orders of x4 at 1. T1 (x) = 1 + 4(x − 1), T2 (x) = 1 + 4(x − 1) + 6(x − 1)2 , T3 (x) = 1 + 4(x − 1) + 6(x − 1)2 + 4(x − 1)3 , Tn (x) = 1 + 4(x − 1) + 6(x − 1)2 + 4(x − 1)3 + (x − 1)4 , for n ≥ 4. Example 2.3.9. To find the Taylor expansion of log x at 2, we use the Taylor expansion of log(1 + x) at 0 to get x−2 log x = log 2 + log 1 + 2 x − 2 1 (x − 2)2 1 (x − 2)3 = log 2 + + − 2 2 22 3 23 n−1 n (−1) (x − 2) + o((x − 2)n ). + ··· + n 2n Thus the Taylor expansion of log x at 2 is 1 1 1 (−1)n−1 Tn (x) = log 2 + (x − 2) − (x − 2)2 + (x − 2)3 + · · · + (x − 2)n . 2 8 24 n2n Example 2.3.10. The Taylor expansion of ex sin x at 0 can be obtained by multiplying the Taylor expansions of ex and sin x together. In particular, to get the 5-th order Taylor expansion, we have 1 3 1 4 1 3 1 5 1 2 4 6 x x − x + x + o(x ) e sin x = 1 + x + x + x + x + o(x ) 2! 3! 4! 3! 5! 1 1 1 1 1 = 1 + x + x2 + x3 + x4 x − x3 + x5 + o(x5 ) 2! 3! 4! 3! 5! 1 1 1 1 1 1 1 x5 + x5 + o(x5 ). = x + x2 + x3 + x4 + x5 − x3 − x4 − 2! 3! 4! 3! 3! 2! · 3! 5! The second equality uses xm o(xn ) = o(xm+n ). The third equality uses xm = o(xn ) for m > n. Thus the 5-th order Taylor expansion of ex sin x at 0 is 1 1 T5 (x) = x + x2 + x3 − x5 . 3 30 Example 2.3.11. To find the 5-th order Taylor expansion of sec x at 0, we use the Taylor expansions of cos x and (1 − x)−1 at 0 to get −1 1 1 2 1 4 5 sec x = = 1 − x + x + o(x ) cos x 2! 4! 2 1 2 1 4 1 2 1 4 5 5 =1+ x − x + o(x ) + x − x + o(x ) 2! 4! 2! 4! 3 3 ! 1 2 1 4 1 1 + x − x + o(x5 ) + o x2 − x4 + o(x5 ) 2! 4! 2! 4! 2 1 1 1 2 = 1 + x2 − x4 + x + o(x5 ). 2! 4! 2!

124

CHAPTER 2. DIFFERENTIATION

Thus the Taylor expansion of sec x at 0 is 1 5 T5 (x) = 1 + x2 + x4 . 2 24 As a consequence of the Taylor expansion, we get a limit 1 1 1 5 1 − x2 + x4 + 1 + x2 + x4 − 2 + o(x5 ) cos x + sec x − 2 2! 4! 2 24 = lim lim x→0 x→0 (1 + x2 + o(x3 ) − 1)2 (ex2 − 1)2 1 4 x + o(x5 ) 1 4 = lim 4 = . x→0 x + o(x5 ) 4 Example 2.3.12. Here is one more example of using Taylor expansions to compute the limit.   1 1 1 1  = lim  − lim −  1 x→0 x→0 x sin x x x − x3 + o(x4 ) 3!   1 1 − x→0 x

= lim

1

  1 2 1 − x + o(x3 ) 3! 1 1 = lim 1 − 1 − x2 + o(x2 ) x→0 x 3! 1 = lim − x + o(x) = 0. x→0 6 Example 2.3.13. The derivative of f (x) = arcsin x is 1 1 1 1 · 3 · · · (2n − 1) n g(x) = f 0 (x) = (1 − x)− 2 = 1 + x + · · · + x + o(xn ) 2 n! 2n (2n)! n 1 x + o(xn ), = 1 + x + · · · + 2n 2 2 (n!)2

This tells us f (2n) (0) g (2n−1) (0) f (2n+1) (0) g (2n) (0) (2n)! = = 0, = = 2n . (2n − 1)! (2n − 1)! (2n)! (2n)! 2 (n!)2 Thus we get f

(2n)

(0) = 0, f

(2n+1)

(2n)! (0) = (2n)! 2n = 2 (n!)2

(2n)! 2n n!

2 .

Example 2.3.14 (Cauchy). From Example 2.3.7, we know the derivatives of the function ( 1 e− x2 if x = 6 0 f (x) = 0 if x = 0 are 0 at any order. Thus the Taylor expansion of the function is 0 at any order. Exercise 2.3.18. Find the Taylor expansions.

2.3. HIGH ORDER APPROXIMATION

125

1. x3 + 5x − 1 at −1.

3. αx at 1.

5. sin 2x at

2. x3 + 5x − 1 at 0.

4. xα at 1.

6. log

π . 4

3 + x2 at 0. 2+x

Exercise 2.3.19. Find the 5-th order Taylor expansions at 0. 1.

√

2

x + 1ex tan x.

3. arcsin x.

2. (1 + x)x .

4. log

5. log cos x.

sin x . x

6.

ex

x . −1

Exercise 2.3.20. Use Taylor expansions to compute limits. x − tan x . x − sin x 1 1 2. limx→0 − . x2 tan2 x

xex − log(1 + x) . x2 1 2 8. limx→∞ x log x sin . x

1. limx→0

3. limx→0

sin x x

1 log(1−x2 )

7. limx→0

(x − 1) log x . 1 + cos πx 1 1 − . 10. limx→1 x − 1 log x 1 11. limx→0 − cot x . x 9. limx→1

. 1

4. limx→0 (cos x + sin x) x(x+1) . 1

5. limx→0 (cos x) x2 . 1 x . 6. limx→∞ x e − 1 + x

1

12. limx→0

Exercise 2.3.21. Find the high order derivatives of arctan x at 0. Exercise 2.3.22. Suppose f (x) has second order derivative at 0 and satisfies lim

x→0

f (x) 1+x+ x

1

x

1

(1 + 2x + x2 ) x − (1 + 2x − x2 ) x . x

= eλ .

1. Find f (0), f 0 (0) and f 00 (0). 1 f (x) x 2. Find limx→0 1 + . x Exercise 2.3.23. Suppose f (x) has derivatives of any order. How are the high order derivatives of f (x) and f (x2 ) at 0 related? Exercise 2.3.24. Prove that the Taylor expansion at 0 of an odd function contains only terms of odd power. What about an even function? Exercise 2.3.25. Suppose f (x) and g(x) are approximated by linear functions a0 + a1 ∆x and b0 + b1 ∆x at x0 . Suppose a1 > 0. Without computing the derivatives, find the linear approximation of f (x)g(x) at x0 . Moreover, extend the result to quadratic approximations.

As explained after the statement of Theorem 2.3.2, like the quadratic situation, the existence of n-th order derivative at x0 implies the n-th order differentiability at x0 . On the other hand, Exercise 2.3.4 shows that the

126

CHAPTER 2. DIFFERENTIATION

n-th order differentiability does not necessarily imply the existence of n-th order derivative. This is rather different from the first order case, when the differentiability is equivalent to the existence of the derivative. Example 2.3.15. It is easy to see that |x|α is n-th order differentiable at 0 when one of the following happens: 1. α > n: The n-th order approximation is 0. 2. α is an even natural number: We have |x|α = xα . The n-th order approximation is xα if α ≤ n and is 0 if α > n. We claim the converse is also true. Let m be the unique natural number satisfying m + 1 ≥ α > m. Then for any natural number n ≥ α, we have n ≥ m + 1. Therefore n-th order differentiability implies (m + 1)-st order differentiability. Moreover, by the first statement above, the m-th order approximation is 0, so that the (m + 1)-st order approximation is bxm+1 . In other words, for any > 0, there is δ > 0, such that |x| < δ =⇒ |x|α − bxm+1 < |x|m+1 . |x|α This is equivalent to the existence of the limit b = limx→0 m+1 . Since m + 1 ≥ α, x this happens exactly when α = m + 1 is an even number. Therefore the converse is proved. As for the existence of the n-th order derivative, because this implies the nth order differentiability, the conditions above are necessary. Conversely, if α > n ≥ k > 0, then we have (|x|α )(k) = α(α − 1) · · · (α − k + 1)xα−k for x > 0 or x = 0+ and (|x|α )(k) = (−1)k α(α − 1) · · · (α − k + 1)(−x)α−k for x < 0 or x = 0− . Therefore |x|α has n-th order derivative 0 at 0. Moreover, if α is an even natural number, then |x|α = xα has derivative of any order. Therefore the condition for the existence of the n-order derivative is the same as the condition for the n-th order differentiability. Exercise 2.3.26. order derivative ( axα 1. b(−x)β

2.3.4

Study the n-th order differentiability and the existence of the n-th of the functions at 0 (α, β > 0, a, b 6= 0).  |x|α sin 1 if x ≥ 0 if x 6= 0 . |x|β 2. . if x < 0  0 if x = 0

Remainder

Let Tn (x) be the n-th order Taylor expansion of f (x) at x0 . Under the condition of Theorem 2.3.2, all we know about the remainder is Rn (x) = f (x) − Tn (x) = o(∆xn ). Under slightly stronger assumption, however, more can be said about the remainder. Proposition 2.3.3 (Lagrange10 ). Suppose f (t) has (n+1)-st order derivative 10 Joseph-Louis Lagrange, born 1736 in Turin (Italy), died 1813 in Paris (France). In analysis, Lagrange invented calculus of variations, Lagrange multipliers, and Lagrange interpolation. He invented the method of solving differential equations by variation of parameters. In number theory, he proved that every natural number is a sum of four squares. He also transformed Newtonian mechanics into a branch of analysis, now called Lagrangian mechanics.

2.3. HIGH ORDER APPROXIMATION

127

between x and x0 . Then there is c between x and x0 , such that Rn (x) =

f (n+1) (c) (x − x0 )n+1 . (n + 1)!

(2.3.6)

Note that when n = 0, the conclusion of the proposition is exactly the mean value theorem. The expression for Rn (x) in Proposition 2.3.3 is called the Lagrange form of the remainder. In Exercise 2.3.47 and Example 3.3.1, two other formulae for the remainder will be given. Proof. Under the assumption that f has (n + 1)-st order derivative between x0 and x, we have the following computation similar to (2.3.5), Rn (x) Rn (x) − Rn (x0 ) = n+1 (x − x0 ) (x − x0 )n+1 − (x0 − x0 )n+1 Rn0 (c1 ) Rn0 (c1 ) − Rn0 (x0 ) = = = ··· (n + 1)(c1 − x0 )n (n + 1)((c1 − x0 )n − (x0 − x0 )n ) (n)

(n)

(n)

(n+1)

Rn (cn ) − Rn (x0 ) Rn (c) Rn (cn ) = = , = (n + 1)n(n − 1) · · · 2(cn − x0 ) (n + 1)!(cn − x0 ) (n + 1)! where c is between x0 and x. Since Tn is a polynomial of degree n, its (n+1)(n+1) st order derivative is zero. Therefore Rn (c) = f (n+1) (c). The formula for the remainder then follows. Example 2.3.16. The remainder of the Taylor expansion for ex at 0 is Rn (x) =

ec xn+1 . (n + 1)!

Since |c| < |x|, for each fixed x, we have |Rn (x)| <

e|x| |x|n+1 , which converges to (n + 1)!

0 as n → ∞. Therefore

1 2 1 3 1 n e = lim 1 + x + x + x + · · · + x n→∞ 2! 3! n! 1 2 1 3 1 n = 1 + x + x + x + ··· + x + ··· . 2! 3! n! x

Moreover, since |R9 (1)| <

e 3 < < 10−6 , we find 10! 10!

e≈1+1+

1 1 1 + + · · · + ≈ 2.718285 2! 3! 9!

is accurate up to the 6th digit. Example 2.3.17. Suppose f (x) has second order derivative on [0, 1]. Suppose |f (0)| ≤ 1, |f (1)| ≤ 1 and |f 00 (x)| ≤ 1. We would like to estimate the size of f 0 (x).

128

CHAPTER 2. DIFFERENTIATION

Fix any 0 < x < 1. By the second order Taylor expansion at x and the remainder formula, we have f 00 (c1 ) (x − 0)2 , 2 f 00 (c2 ) f (1) = f (x) + f 0 (x)(x − 1) + (x − 1)2 , 2 f (0) = f (x) + f 0 (x)(x − 0) +

0 < c1 < x x < c2 < 1

Subtracting the two, we get f 0 (x) = f (1) − f (0) +

f 00 (c1 ) 2 f 00 (c2 ) x − (x − 1)2 . 2 2

Therefore by the assumption on the sizes of f (1), f (0) and f 00 (x), we get 1 5 |f 0 (x)| ≤ 2 + (x2 + (x − 1)2 ) ≤ . 2 2 Exercise 2.3.27. Estimate the errors of approximations for |x| ≤ 0.2. 1 1 1. cos x ≈ 1 − x2 + x4 . 2 24 2. √

1 3. log(1 + x) ≈ x − x2 . 2

1 1 3 ≈ 1 − x + x2 . 2 8 1+x

4. 2x ≈ 1 + x log 2 +

(log 2)2 2 x . 2

Exercise 2.3.28. Compute the values up to the 4-th digit. 1.

√

e.

2. log 0.9.

3.

√ 5

30.

Exercise 2.3.29. Show that the Taylor expansions of sin x and cos x converge to the respective functions for any x as n → ∞. Exercise 2.3.30. Prove that if there is M , such that |f (n) (x)| ≤ M for all n and x ∈ [a, b], then the Taylor expansion of f (x) converges to f (x) for x ∈ [a, b]. Exercise 2.3.31. Prove that for x > 0, we have x−

x2 x3 x2k x2k+1 x2 x3 x2k x2k+1 + −· · ·+ − > log(1+x) > x− + −· · ·+ − , 2 3 2k (2k + 1)(1 + x)2k+1 2 3 2k 2k + 1

and x−

x2 x3 x2k−1 x2k x2 x3 x2k−1 x2k < log(1+x) < x− + −· · ·− + + −· · ·− + . 2 3 2k − 1 2k(1 + x)2k 2 3 2k − 1 2k

This extends the inequality in Exercise 2.2.20. Also derive the similar inequalities for 0 > x > −1. Then use the inequalities to discuss the convergence of the Taylor series of log(1 + x). Exercise 2.3.32. Suppose f (x) has the third order derivative on [−1, 1], such that f (−1) = 0, f (0) = 0, f (1) = 1, f 0 (0) = 0. Prove that there are −1 < x < 0 and 0 < y < 1, such that f 000 (x) + f 000 (y) = 6.

2.3. HIGH ORDER APPROXIMATION

2.3.5

129

Maximum and Minimum

Suppose f (x) is differentiable at x0 . By Proposition 2.2.1, a necessary condition for x0 to be a local extreme is f 0 (x0 ) = 0. To further determine whether x0 is indeed a local extreme, high order approximations can be used. Proposition 2.3.4. Suppose f (x) has n-th order approximation a+b(x−x0 )n at x0 , with b 6= 0. 1. If n is odd and b 6= 0, then x0 is not a local extreme. 2. If n is even and b > 0, then x0 is a local minimum. If n is even and b < 0, then x0 is a local maximum. Proof. The n-th order approximation means that for any > 0, there is δ > 0, such that |x − x0 | < δ =⇒ |f (x) − a − b(x − x0 )n | ≤ |(x − x0 )n |. This implies a = f (x0 ) by taking x = x0 . In what follows, we will fix to be any number satisfying 0 < < |b|, so that b − and b have the same sign. Suppose n is odd and b > 0. Then for δ > x − x0 > 0, we have f (x) − f (x0 ) > b(x − x0 )n − |(x − x0 )n | = (b − )(x − x0 )n > 0, and for 0 > x − x0 > −δ, we have f (x) − f (x0 ) < b(x − x0 )n + |(x − x0 )n | = (b − )(x − x0 )n < 0, Thus x0 is not a local extreme. Similarly, if b < 0, then x0 is also not a local extreme. Suppose n is even and b > 0. Then |x − x0 | < δ implies f (x) − f (x0 ) ≥ b(x − x0 )n − |(x − x0 )n | = (b − )(x − x0 )n ≥ 0. Thus x0 is a local minimum. Similarly, if b < 0, then x0 is a local maximum. Suppose f (x) has derivatives of sufficiently high order. Then the function has high order approximations by the Taylor expansion. To apply the proposition, we assume f 0 (x0 ) = f 00 (x0 ) = · · · = f (n−1) (x0 ) = 0, f (n) (x0 ) 6= 0. Then we conclude the following. 1. If n is odd, then x0 is not a local extreme. 2. If n is even, then x0 is a local minimum when f (n) (x0 ) > 0, and is a local maximum when f (n) (x0 ) < 0. On the other hand, since the n-th order differentiability is weaker than the existence of the n-th order derivative, Proposition 2.3.4 is a technically stronger statement.

130

CHAPTER 2. DIFFERENTIATION

Example 2.3.18. By Example 2.2.1, we know the possible local extrema of the function f (x) = x2 ex are 0 and −2. Then by f 00 (x) = (x2 + 4x + 2)ex , f 00 (0) = 2 > 0, f 00 (−2) = −2e−2 < 0, we conclude that 0 is a local minimum and −2 is a local maximum. This confirms the same conclusion in Example 2.2.7, which was obtained by studying the monotone properties of the function. 8 5 1 Example 2.3.19. For the function f (x) = (x+ 1)x 3 , we have f 0 (x) = (11x+8)x 3 , 3 2 8 11 f 00 (x) = (11x + 5)x 3 . From f 0 (x) we find two candidates − and 0 for the local 9 8 9 2 11 11 = − 11 3 < 0, we know − is a local maximum. Since extrema. Since f 00 − 8 4 8 f 00 (0) = 0, no immediate conclusion can be drawn from the other candidate 0 by Proposition 2.3.4. Moreover, since f 000 (0) does not exist, the proposition cannot be applied to determine whether x = 0 is a local extreme. On the other hand, for x close to 0, we do have f (x) ≥ 0 = f (0), so that 0 is indeed a local minimum by the definition. Exercise 2.3.33. Find local extrema. 1. x +

1 . x

7. x log x.

2. 6x10 − 10x6 . 3. sin x + cos x. 4.

sin x . 2 + cos x

5. cos x +

1 cos 2x. 2

6. (x2 + 1)ex .

(log x)2 . x 1 9. 1 + x + x2 e−x . 2 1 1 10. 1 + x + x2 + x3 e−x . 2 6 8.

11. arctan x −

1 log(1 + x2 ). 2

Exercise 2.3.34. Does the second part of Proposition 2.3.4 hold if > and < are replaced by ≥ and ≤? Exercise 2.3.35. Study the local extrema of the function   1 e− x12 if x 6= 0 f (x) = x4 . 0 if x = 0

2.3.6

Convex and Concave

A function is convex if the line segment Lx,y connecting any two points (x, f (x)) and (y, f (y)) on the graph of f lies above the graph of f . In other words, for any x < z < y, the point (z, f (z)) lies below Lx,y . See Figure 2.6. From the picture, it is easy to see that the condition is equivalent to any one of the following. 1. slope of Lz,y ≥ slope of Lx,y . 2. slope of Lx,y ≥ slope of Lx,z . 3. slope of Lz,y ≥ slope of Lx,z .

2.3. HIGH ORDER APPROXIMATION

131

f (y) − f (x) , and it is not difficult to verify y−x by direct computation that the three conditions are equivalent. Moreover, the line segment Lx,y is given by

Algebraically, the slope of Lx,y is

Lx,y (z) = f (y) +

f (y) − f (x) (z − y). y−x

Thus the convexity of f (x) means Lx,y (z) ≥ f (z), which is easily seen to be equivalent to the first condition. ..... .. .. .. ... . .. .......... . . .. . . . . . . Lx,y .......... .... .. .. . . .. . ............ Lz,y ......... ... . . . . . . . . . ... . .. . . .. ....... ......... ...................... .. ....................... Lx,z ... ........................ ... .. .. ...... ................ .............. .. .. ............ .. .. ......... .. . ..................... .. .. .. .. . . ... .. .. .. .. . . . . . .. .. .. K........ .. .. .. ........ .. .. . .......... .. .. .. 1 − λ ... .. .. λ .......................................................................................................................................... y . x z Figure 2.6: convex function Convex functions can also be characterized by straight lines below the graph. Proposition 2.3.5. A function f (x) on an open interval is convex if and only if for any z, there is a linear function K(x) such that K(z) = f (z) and K(x) ≤ f (x) for all x. Proof. For a convex function f and fixed z, by the third convexity condition, we have sup(slope of Lx,z ) ≤ inf (slope of Lz,y ). x
y>z

Let B = B(z) be a number between the supremum and the infimum. Then slope of Lx,z ≤ B ≤ slope of Lz,y for any x < z < y. Let K(x) = f (z) + B(x − z) be the linear function with slope B and satisfies K(z) = f (z). Then the relation above between B and the slopes tells us that f (x) ≥ K(x) for any x. Conversely, suppose the linear function K exists with the claimed property. Then geometrically it is clear that the properties of K implies slope of Lx,z ≤ B and slope of Lz,y ≥ B for any x < z < y. This implies the third convexity condition.

132

CHAPTER 2. DIFFERENTIATION

The convexity condition can also be rephrased as follows. Write z = (1 − λ)x + λy, where x < z < y is equivalent to 0 < λ < 1 (and z < x, z = x, x < z < y, z = y, z > y are respectively equivalent to λ < 0, λ = 0, 0 < λ < 1, λ = 1, λ > 1). Either geometrical consideration or algebraic computation tells us Lx,y (z) = (1 − λ)f (x) + λf (y). Thus the convexity is the same as 0 < λ < 1 =⇒ (1 − λ)f (x) + λf (y) ≥ f ((1 − λ)x + λy).

(2.3.7)

A function is concave if the line segment connecting two points on the graph of f lies below the graph of f . By exchanging the directions of the inequalities, the characterizations of convex functions become the characterization of concave functions. .... .. .. ........................ .. ....................... ........... . . . . ............... ... .. .... . .......... .. ......... ... . . ................ . .. .... .. . . . . . . . . .. . .. . .. .. ............ .. ........... ....... .. . .. .... ............ .. . .. .............. .. .. . .. .. .. ....... . . .. .. ... .. . .. .. .. . .. . .. .. .. .. .. .. .. .. .. .. . .. .. .................................................................................................................................................... . Figure 2.7: concave function A point of inflection is where the function is changed from concave to convex, or from convex to concave. Proposition 2.3.6. Suppose f (x) is differentiable on an interval. Then f (x) is convex if and only if f 0 (x) is increasing. Combining Propositions 2.2.4 and 2.3.6, a function f (x) with second order derivative is convex if and only if f 00 (x) ≥ 0. The similar statement for concave functions is also true. The points of inflection are then the places where f 00 (x) changes the sign. Proof. Suppose f (x) is convex. Then for fixed x < y and changing z between x and y, we have f 0 (x) = lim+ (slope of Lx,z ) ≤ slope of Lx,y ≤ lim− (slope of Lz,y ) = f 0 (y). z→x

z→y

Conversely, suppose f 0 is increasing and x < z < y. By the mean value theorem, we have slope of Lx,z = f 0 (c), slope of Lz,y = f 0 (d),

2.3. HIGH ORDER APPROXIMATION

133

for some x < c < z and z < d < y. Since c < d, we have f 0 (c) ≤ f 0 (d), so that the third condition for the convexity holds. 1 > 0, the derivative (− log x)0 is an increasx2 1 1 ing function and − log x is convex. Therefore if p, q > 0 satisfy + = 1, then p q we have 1 1 p 1 q 1 x + y . log xy = log xp + log y q ≤ log p q p q

Example 2.3.20. Since (− log x)00 =

Taking the exponential, we get the Young equality in Exercise 2.2.40 xy ≤

1 p 1 q x + y . p q

Example 2.3.21. The function f (x) = x2 ex in Examples has √ √ 2.2.7 and 2.3.18 f 00 (x) = (x2 +4x+2)ex = (x−a)(x−b)ex , where a = −2− 2, b = −2+ 2. From the signs of the second order derivative, we know that the function is convex on (−∞, a] and [b, ∞), and is concave on [a, b]. The points a and b, where concavity and convexity are exchanged, are the points of inflection. This adds additional information to the graph of the function in Figure 2.4. x 1 − x2 0 (x) = . From f , we find f (x) is 1 + x2 (1 + x2 )2 decreasing on (−∞, −1], increasing on [−1, 1], and decreasing again on [1, ∞). 1 1 Therefore f (−1) = − is a local minimum, and f (1) = is a local maximum. 2 2 2) √ −2x(3 − x , we find f (x) is concave on (−∞, − From f 00 (x) = 3], convex on (1 + x2 )3 √ √ √ [− 3, 0], concave √ again on [0, 3], and then √ convex again on [ 3, ∞). Therefore √ √ 3 3 , f (0) = 0 and f ( 3) = are points of inflection. f (− 3) = − 4 4 Combining the information with limx→∞ f (x) = 0, we get a rough sketch of the function in Figure 2.8. Example 2.3.22. Let f (x) =

..... 1 ...... ............................. ....... .. 2 ............. ......... .. ....... ............. .. .... .............. . . . . . . . . . . . . . . ..........................................√ .....................................................................................................................................√ ................................................................. .. . ....................... − 3 −2 2 3 .......... .... ... . . . . . ........ . . . . . . . . ...... . ...... 1 .................................. .. − 2 x Figure 2.8: graph of 1 + x2 2

Example 2.3.23. Consider the function f (x) = (x + 1)x3 in Example 2.2.8. From 1 2 − 31 0 f (x) = (5x + 2)x , we find f (x) is increasing on −∞, − , decreasing on 3 5 √ 2 2 3 3 20 − , 0 , and increasing again on [0, ∞). Therefore f − = is a local 5 5 25 maximum, and f (0) = 0 is a local minimum.

134

CHAPTER 2. DIFFERENTIATION

2 1 − 43 and convex From = (5x − 1)x , we find f (x) is concave on −∞, 9 5 √ 1 635 1 on = , ∞ . Therefore f is a point of inflection. 5 5 25 f (x) Combined with limx→∞ 5 = 1, we get a rough sketch of the function in x3 Figure 2.9. f 00 (x)

..... .. .. ... . .. .. .. . .. . .. .. . .. .. .. . . √ . .. 3 3 20 .. .... . . . 25 ..... .. ................................... ... .... . . . . . . . .... . . ..... ....... ....... . . . . . . . . . ............................................................................................................................................................................................... .. 1 ... − 52 . 5 ..... 2

Figure 2.9: graph of (x + 1)x 3 Exercise 2.3.36. Sketch the graph of functions. 1. x3 + 6x2 − 15x − 20. 1 2. x − . x 1 3. . 1 + x2

2

1

4. (x − 1) 3 (x + 1) 3 .

7. x − arctan x.

5. log(1 + x2 ).

8.

6. xe−x .

9. e−x cos x.

log x . x

Exercise 2.3.37. Are the sum, product, composition, maximum, minimum of two convex functions still convex? Exercise 2.3.38. convexity of x log x and then use the property to prove Verifythe x + y x+y the inequality ≤ xx y y . 2 Exercise 2.3.39. Suppose p ≥ 1. Show that xp is convex. Then for non-negative a1 , a2 , . . . , an , b1 , b2 , . . . , bn , take 1 P ( api ) p ai bi x= P 1 , y = P 1 , λ = P 1 1 , P ( api ) p ( bpi ) p ( api ) p + ( bpi ) p

in the inequality (2.3.7) and derive the Minkowski inequality in Exercise 2.2.42. Exercise 2.3.40. Suppose f (x) is a convex function. For any λ1 , λ2 , . . . , λn satisfying λ1 + λ2 + · · · + λn = 1 and 0 < λi < 1, prove Jensen11 inequality f (λ1 x1 + λ2 x2 + · · · + λn xn ) ≤ λ1 f (x1 ) + λ2 f (x2 ) + · · · + λn f (xn ). 11 Johan Jensen, born 1859 in Nakskov (Denmark), died 1925 in Copenhagen (Denmark). He proved the inequality in 1906.

2.3. HIGH ORDER APPROXIMATION

135

Then use this to prove that for xi > 0, we have √ n

x1 + x2 + · · · + xn x1 x2 · · · xn ≤ ≤ n

and (x1 x2 · · · xn )

x1 +x2 +···+xn n

r p

xp1 + xp2 + · · · + xpn for p ≥ 1, n

≤ xx1 1 xx2 2 · · · xxnn .

Exercise 2.3.41. Prove that if f 00 (x0 ) = 0 and f 000 (x0 ) 6= 0, then x0 is a point of inflection. Exercise 2.3.42. Prove that function on an interval is convex if and a continuous f (x) + f (y) x+y only if for any x and y on the interval. ≥f 2 2 Exercise 2.3.43. Prove that a function f (x) on an open interval (a, b) is convex if and only if for any a < x < y < b, we have f (z) ≥ Lx,y (z) for any z ∈ (a, x) and z ∈ (y, b). Then prove that a convex function on an open interval must be continuous.

2.3.7

Additional Exercise

Estimation of sin x and cos x 2 π We know the inequality sin x > x for 0 < x < from (1.3.18) and π 2 Exercise 2.2.20. The subsequent exercises extend the estimation to higher order and to cos x. Exercise 2.3.44. Let x2k−1 x3 + · · · + (−1)k−1 − sin x, 3! (2k − 1)! x2 x2k gk (x) = 1 − + · · · + (−1)k − cos x. 2! (2k)!

fk (x) = x −

0 = gk . Then use the equalities to prove that Verify that gk0 = −fk and fk+1

x−

x3 x4k−1 x3 x4k−1 x4k+1 + ··· − < sin x < x − + ··· − + 3! (4k − 1)! 3! (4k − 1)! (4k + 1)!

for x > 0. Also derive the similar inequalities for cos x. Exercise 2.3.45. Prove that x−

x3 x4k−1 2 x4k+1 x3 x4k+1 2 x4k+3 +· · ·− + < sin x < x− +· · ·+ − , 3! (4k − 1)! π (4k + 1)! 3! (4k + 1)! π (4k + 3)!

and 1−

x2 x4k−2 2 x4k x2 x4k 2 x4k+2 + ··· − + < cos x < 1 − + ··· + − 2! (4k − 2)! π (4k)! 2! (4k)! π (4k + 2)!

for 0 < x <

π . 2

136

CHAPTER 2. DIFFERENTIATION

Exercise 2.3.46. Let ( 1 −1 ( 1 x2 x3 xn gn (x) = 1 − x − + + · · · + s2 (n) , s2 (n) = 2! 3! n! −1

x2 x3 xn fn (x) = 1 + x − − + · · · + s1 (n) , s1 (n) = 2! 3! n!

if n = 4k, 4k + 1 , if n = 4k + 2, 4k + 3 if n = 4k − 1, 4k . if n = 4k + 1, 4k + 2

Prove that f4k+1 (x) − and g4k (x) −

√

2

√ x4k x4k+2 < cos x + sin x < f4k−1 (x) + 2 , (4k + 2)! (4k)!

√ x4k+1 √ x4k+3 2 < cos x − sin x < g4k+2 (x) + 2 . (4k + 1)! (4k + 3)!

Moreover, derive similar inequalities for a cos x + b sin x.

Cauchy Form of the Remainder The Lagrange form (2.3.6) is the simplest form of the remainder. However, for certain functions, it is more suitable to use the Cauchy form Rn (x) =

f (n+1) (c) (x − c)n (x − x0 ) n!

(2.3.8)

of the remainder. The proof makes use of the function F (t) = f (x) − f (t) − f 0 (t)(x − t) −

f 00 (t) f (n) (t) (x − t)2 − · · · − (x − t)n 2 n!

defined for any fixed x and x0 . Exercise 2.3.47. By applying the mean value theorem to F (t) for t between x0 and x, prove the Cauchy form of the remainder. Exercise 2.3.48. By applying Cauchy’s mean value theorem to F (t) and G(t) = (x − t)n+1 , derive the Lagrange form (2.3.6) for the remainder. Exercise 2.3.49. Prove the remainder of the Taylor series of (1 + x)α satisfies α(α − 1) · · · (α − n) n+1 for |x| < 1, |Rn | ≤ ρn = A x n! where A = (1 + |x|)α−1 for α ≥ 1 and A = (1 − |x|)α−1 for α < 1. Then use Exercise 1.1.33 to show that limn→∞ ρn = 0. This shows that the Taylor series of (1 + x)α converges for |x| < 1. Exercise 2.3.50. Study the convergence of the Taylor series of log(1 + x).

Relation between the Bounds of a Function and its Derivatives In Example 2.3.17, we saw bounds on a function and its second order derivative will impose a bound on the first order derivative. The subsequent exercises provide more examples.

2.3. HIGH ORDER APPROXIMATION

137

Exercise 2.3.51. Suppose f (x) is a function on [0, 1] with second order derivative and satisfying f (0) = f 0 (0) = 0, f (1) = 1. Prove that if f 00 (x) ≤ 2 for any 0 < x < 1, then f (x) = x2 . In other words, unless f (x) = x2 , we will have f 00 (x) > 2 somewhere on (0, 1). Exercise 2.3.52. Consider functions f (x) on [0, 1] with second order derivative and satisfying f (0) = f (1) = 0 and min[0,1] f (x) = −1. What would be the “lowest bound” a for f 00 (x)? In other words, find biggest a, such that any such function f (x) will have f 00 (x) ≥ a somewhere on (0, 1). Exercise 2.3.53. Study the constraint on the second order derivative for functions on [a, b] satisfying f (a) = A, f (b) = B and min[a,b] f (x) = m. Exercise 2.3.54. Suppose f (x) has the second order derivative on (a, b). Suppose M0 , M1 , M2 are the suprema of |f (x)|, |f 0 (x)|, |f 00 (x)| on the interval. By rewriting the remainder formula as an expression of f 0 in terms of f and f 00 , prove that |f 0 (x)| ≤

h 2 M2 + M0 2 h

√ for any a < x < b and 0 < h < max{x − a, b − x}. Then prove M1 ≤ 2 M2 M0 in case b = +∞. Moreover, verify that the equality happens for  2x2 − 1 if −1 < x < 0 f (x) = x2 − 1 .  if x ≥ 0 2 x +1 Exercise 2.3.55. Suppose f (x) has the second order derivative on (a, +∞). Prove that if f 00 (x) is bounded and limx→+∞ f (x) = 0, then limx→+∞ f 0 (x) = 0.

Convexity Criterion by the One Side Derivative Convex functions are continuous (see Exercise 2.3.43) but not necessarily always differentiable (|x| is convex, for example). So a convexity criterion more general than Proposition 2.3.6 is needed. Exercise 2.3.56. Prove that a convex function f (x) on an open interval is left and right differentiable, and the one side derivatives satisfy f+0 (x) ≤

f (y) − f (x) ≤ f−0 (y). y−x

Exercise 2.3.57. Prove the following are equivalent for a function f (x) on an open interval. 1. f (x) is convex. 2. f (x) is left and right differentiable, with x < y implying f−0 (x) ≤ f+0 (x) ≤ f−0 (y) ≤ f+0 (y). 3. f (x) is left continuous and right differentiable, with increasing f+0 (x).

138

CHAPTER 2. DIFFERENTIATION

Chapter 3 Integration

139

140

3.1

CHAPTER 3. INTEGRATION

Riemann Integration

Discovered independently by Newton and Leibniz, the integration was originally the method of using the antiderivative to find the areas under curves. Thus the method started as an application of the differentiation. Then Riemann1 studied the limiting process leading to the area and established the integration as an independent subject. The new viewpoint further led to other integration theories, among which the most significant is the Lebesgue integration.

3.1.1

Riemann Sum

Let f (x) be a function on a bounded interval [a, b]. To compute the area of the region between the graph of the function and the x-axis, we choose a partition P : a = x0 < x1 < x2 < · · · < xn = b (3.1.1) of the interval and approximate the region by a sequence of rectangles with base [xi−1 , xi ] and height f (x∗i ), where x∗i ∈ [xi−1 , xi ]. The total area of the rectangles is the Riemann sum S(P, f ) =

n X

f (x∗i )(xi

i=1

− xi−1 ) =

n X

f (x∗i )∆xi .

(3.1.2)

i=1

Note that S(P, f ) also depends on the choices of x∗i , although the choice does not explicitly appear in the notation. .... .. .............................. .. .. .. ..... ........... . . . .. .... .. .. ... ................ . . . .. .. ....................... f (x∗i ) . . .............................. .. ... ... .. .. .... .. .. .. ... ... .... .. .. .. ... .. ... . . .. . . ... ... .. . .. .. ............... .. . .. ... ... . . . . .. ... ..... .... ... .. .. ... ... ... . .. .. ..... .................. .. .. ... .... ... .. .. ............................. .. .. .. ... ... ... .. .... .. .. .. .. .. ... .... ... .. ... .. .. .. .. .. ... .... ... .. ... .. .. .. .. .. . . . . . . . .................................................................................................................................................................................. .. xi−1 xi a b .. .. . Figure 3.1: Riemann sum Example 3.1.1. To find the area under the function f (x) = x over the interval i i [0, 1], we choose a partition Pn given by xi = and choose x∗i = . Then n n n X i1 1 n(n + 1) n+1 S(Pn , x) = = 2 = . nn n 2 2n i=1

1 Georg Friedrich Bernhard Riemann, born 1826 in Breselenz (Germany), died 1866 in Selasca (Italy).

3.1. RIEMANN INTEGRATION If we instead choose the middle point

141 x∗i

1 = 2

i−1 i + n n

=

2i − 1 of the interval 2n

[xi−1 , xi ], then the Riemann sum S(Pn , x) =

n X 2i − 1 1

2n n

i=1

=

n(n + 1) − n 1 = . 2n2 2

Exercise 3.1.1. Compute the Riemann sums. 1. f (x) = x, xi =

i ∗ i−1 ,x = . n i n

3. f (x) = x2 , xi =

i ∗ 2i − 1 ,x = . n i 2n

i ∗ i , xi = . n n

4. f (x) = αx , xi =

i ∗ i−1 ,x = . n i n

2. f (x) = x2 , xi =

The Riemann sum is only an approximation of the area of the region. We expect the approximation to get more accurate when the size of the partition kP k = max ∆xi 1≤i≤n

gets smaller. This leads to the definition of the definite integral. Definition 3.1.1. A function f (x) on a bounded interval [a, b] is Riemann integrable, with integral I, if for any > 0, there is δ > 0, such that kP k < δ =⇒ |S(P, f ) − I| < .

(3.1.3)

Because of the similarity to the definition of limits, we write b

Z I=

f (x)dx = lim S(P, f ). kP k→0

a

The numbers a and b are called the lower limit and the upper limit of the integral. As pointed out at the beginning of the chapter, there are several integration theories. Therefore there are different meanings of integrability. In this course, unless otherwise indicated, the integrability will always mean Riemann integrability. Example 3.1.2. For the constant function f (x) = c on [a, b] and any partition P , we have n n X X S(P, c) = c∆xi = c ∆xi = c(b − a). i=1

i=1

Therefore the constant function is integrable, with Z

b

cdx = c(b − a). a

Example 3.1.3. Consider the function ( 0 dc (x) = 1

if x 6= c if x = c

142

CHAPTER 3. INTEGRATION

that is constantly zero except at x = c. For any partition P of a bounded closed interval [a, b] containing c, we have   if no x∗i = c 0   ∆x if xk−1 < x∗k = c < xk k S(P, dc ) = . ∗ = a = c, k = 1 or x∗ = b = c, k = n  ∆x if x  k n 1   ∆x + ∆x ∗ ∗ k k+1 if xk = xk+1 = xk = c, k 6= 0, k 6= n b

Z

dc (x)dx = 0.

This implies that dc (x) is integrable, and a

Example 3.1.4. For the Dirichlet function D(x) in Example 1.3.24 and any partition P of [a, b], we have S(P, D) = b − a if all x∗i are rational numbers and S(P, D) = 0 if all x∗i are irrational numbers. Thus the Dirichlet function is not integrable. Example 3.1.5. Consider Thomae’s function R(x) in Example 1.4.2. For any natural number N , let AN be the set of rational numbers in [0, 1] with denominators ≤ N . Then AN is finite, containing say νN numbers. For any partition P of [0, 1] and choices of x∗i , the Riemann sum S(P, R) can be divided into two parts. The first part consists of those intervals with x∗i ∈ AN , and the second part has x∗i 6∈ AN . The number of terms in the first part is ≤ 2νN (the factor 2 takes into account of the possibility that x∗i = x∗i+1 = xi ∈ AN ), so that the total length of the intervals in the first part is no more than 2νN kP k. Moreover, the total length of the intervals in the second part is no more than the total length 1 of the whole interval [0, 1]. Since we have 0 < R(x∗i ) ≤ 1 the first part and we also have 1 0 ≤ R(x∗i ) ≤ in the second part, we conclude N 0 ≤ S(P, R) ≤ 2νN kP k +

1 1 1 = 2νN kP k + . N N

2 1 , for example, we get 0 ≤ S(P, R) < . Thus we 2N νN N Z 1 conclude that the function is integrable on [0, 1], with R(x)dx = 0. By taking kP k < δ =

0

Exercise 3.1.2. Study the integrability. For the integrable ones, find the integrals. ( ( 0 if 0 ≤ x < 1 1 if x = n1 , n ∈ Z on [−1, 1]. 1. on [0, 2]. 3. 0 if otherwise 1 if 1 ≤ x ≤ 2 ( ( 1 if x = 21n , n ∈ N x if x is rational 2. on [0, 1]. 4. n on [0, 1]. 0 if x is irrational 0 if otherwise Z 1 1 Exercise 3.1.3. Prove that xdx = in the following steps. 2 0 xi + xi−1 is the middle point of the intervals, 2 1 then the Riemann sum Smid (P, x) = . 2

1. For any partition P , if x∗i =

1 2. For any partition P and choice of x∗i , we have |S(P, x)−Smid (P, x)| ≤ kP k. 2

3.1. RIEMANN INTEGRATION

143

Finally, we remark that since the Riemann sum S(P, f ) takes into account of the sign of the function f (x), the integration is actually the signed area, which counts the part of the area corresponding to f (x) < 0 as negative. .... .. .. ........ .. ...... ......... ... .. ..... .... .. . . .... . .. ............... . . .. .. . .. .. .. B .. .. .. .. .. .................................................................................................................................................................................. .. a ..... A .. b .... .. ....... ...... .. ...... .. .. . Z b f (x)dx = −area(A) + area(B) Figure 3.2: a

3.1.2

Integrability Criterion

In Examples 3.1.2 through 3.1.5, we saw a function may or may not be integrable. The following is a simple condition for integrability. Proposition 3.1.2. Riemann integrable functions are bounded. Example 3.1.4 shows the converse is not true. Proof. Let f (x) be integrable on a bounded interval [a, b] and let I be the integral. Then for = 1 > 0, there is a partition P , such that n X ∗ f (xi )∆xi − I = |S(P, f ) − I| < 1 i=1

for any choice of x∗i . Now we fix x∗2 , x∗3 , . . . , x∗n , so that fixed bounded number. Then n X ∗ ∗ |f (x1 )∆x1 | ≤ f (xi )∆xi − I + 1

Pn

i=2

f (x∗i )∆xi is a

i=2

for any x∗1 ∈ [x0 , x1 ]. In particular, this shows that f (x) is bounded by P 1 (| ni=2 f (x∗i )∆xi − I| + 1) on the first interval [x0 , x1 ] of the partition. ∆x1 Similar argument shows that the function is bounded on any interval of the partition. Since the partition contains finitely many intervals, the function is bounded on [a, b].

144

CHAPTER 3. INTEGRATION

Similar to the convergence of sequences, a more refined criterion for integrability can be obtained by considering the Cauchy criterion. By a proof similar to the convergence of sequences and functions, the Riemann sum converges if and only if for any > 0, there is δ > 0, such that kP k, kP 0 k < δ =⇒ |S(P, f ) − S(P 0 , f )| < .

(3.1.4)

Note that hidden in the notation is the choices of x∗i and x0 ∗i for P and P 0 . For the special case P = P 0 , we have 0

S(P, f ) − S(P , f ) =

n X

∗

(f (x∗i ) − f (x0 i ))∆xi ,

i=1

and the supremum of the difference for all possible choices of x∗i and x0 ∗i is ! n X sup S(P, f ) − inf∗ S(P, f ) = sup f (x) − inf f (x) ∆xi . all xi

all x∗i

i=1

[xi−1 ,xi ]

[xi−1 ,xi ]

Define the oscillation of a bounded function f (x) on an interval [a, b] to be ω[a,b] (f ) =

sup |f (x) − f (y)| = sup f (x) − inf f (x). a≤x≤y≤b

[a,b]

Then sup S(P, f ) − inf∗ S(P, f ) =

all x∗i

all xi

n X

[a,b]

ω[xi−1 ,xi ] (f )∆xi ,

i=1

is the Riemann sum of the oscillations. Therefore the specialization of the Cauchy criterion (3.1.4) to the case P = P 0 becomes kP k < δ =⇒

n X

ω[xi−1 ,xi ] (f )∆xi < .

i=1

..... .. .. ...................................................................... .. .... .. ..... .. .. .. .... ............................................ .. ............................. .. .. .. .. .. ... .......................... .. .. .. .. ). .. .. ω(f ... ... ... . .. .. .. .... .. ... ... .... .. .... .... .. ... ... ... .. .................. ............................................ ... ... .... .. . . . . . ... ... ... .. .. ..... .... .. .... .. ... ... .... .. ................................. ... .. .. ... ... ... .. .. ...................................... .. ... ... .... .. .. .. .. .. .. ... ... ... .. .. .. .. .. .. .. .. ... .. .. .. .. .. .. ..................................................................................................................................................................................... xi−1 xi a ... b .. .. Figure 3.3: Riemann sum of oscillations

(3.1.5)

3.1. RIEMANN INTEGRATION

145

It turns out this specialized Cauchy criterion also implies the general Cauchy criterion (3.1.4). As a preparation for the proof, note that for any a ≤ c ≤ b, we have n n X X ∗ |f (c)(b − a) − S(P, f )| = (f (c) − f (xi ))∆xi ≤ |f (c) − f (x∗i )|∆xi ≤

i=1 n X

i=1

ω[a,b] (f )∆xi ≤ ω[a,b] (f )(b − a).

(3.1.6)

i=1

Theorem 3.1.3 (Riemann Criterion). A bounded function f (x) on a bounded interval [a, b] is Riemann integrable if and only if for any > 0, there is δ > 0, P such that kP k < δ implies ni=1 ω[xi−1 ,xi ] (f )∆xi < . Proof. Assume the implication (3.1.5) holds. Let P and P 0 be partitions satisfying kP k, kP 0 k < δ. Let Q be the partition obtained by combining the partition points in P and P 0 together. Make an arbitrary choice of x∗i for Q and form the Riemann sum S(Q, f ). Note that Q is obtained by adding more points into P (Q is called a refinement of P ). For any interval [xi−1 , xi ] in the partition P , denote by Q[xi−1 ,xi ] the part of the partition Q lying inside the interval. Then n X

S(Q, f ) =

S(Q[xi−1 ,xi ] , f ).

i=1

By the inequality (3.1.6), we have n n X X |S(P, f ) − S(Q, f )| = f (x∗i )∆xi − S(Q[xi−1 ,xi ] , f ) i=1

i=1

n X f (x∗i )(xi − xi−1 ) − S(Q[x ≤

≤

i=1 n X

, f) i−1 ,xi ]

ω[xi−1 ,xi ] (f )∆xi .

i=1

Since kP k < δ, the implication (3.1.5) tells us that the right side is less than . By the same reason, we get |S(P 0 , f ) − S(Q, f )| < . Therefore |S(P, f ) − S(P 0 , f )| ≤ |S(P, f ) − S(Q, f )| + |S(P 0 , f ) − S(Q, f )| < 2.

Example 3.1.6. For theP function f (x) = x in Example 3.1.1, we have ω[xi−1 ,xi ] (f ) = P xi − xi−1 ≤ kP k and ω[xi−1 ,xi ] (f )∆xi ≤ kP k∆xi = (b − a)kP k. Then it is easy to see that the Riemann criterion is satisfied, so that f (x) = x is integrable. Z 1 1 Moreover, the computation in Example 3.1.1 tells us xdx = . 2 0 Example P 3.1.7. For the Dirichlet function in Exercise 1.3.24, we have ω[xi−1 ,xi ] (f ) = 1 and ω[xi−1 ,xi ] (f )∆xi = b − a. By the Riemann criterion, the function is not integrable.

146

CHAPTER 3. INTEGRATION

Exercise 3.1.4. Prove that the function f (x) = x2 is integrable on [0, 1]. Exercise 3.1.5. Study the integrability of the functions in Exercise 3.1.2 again by using Theorem 3.1.3. Exercise 3.1.6. Prove the inequality ω(|f |) ≤ ω(f ). Then prove that the integrability of f (x) implies the integrability of |f (x)|. Exercise 3.1.7. Prove that the integrability of f (x) implies the integrability of f (x)2 . Exercise 3.1.8. Suppose a bounded function f (x) on [a, b] is integrable on [c, b] for any a < c < b. Prove that f (x) is integrable on [a, b]. In fact, we also have Z b Z b f (x)dx by Theorem 3.2.1. f (x)dx = lim c→a+

a

3.1.3

c

Integrability of Continuous and Monotone Functions

By using the integrability criterion in Theorem 3.1.3, we may identify some important classes of integrable functions. Proposition 3.1.4. Continuous functions on bounded closed intervals are Riemann integrable. Proof. By Theorem 1.4.4, for any > 0, there is δ > 0, such that |x − y| < δ implies |f (x)−f (y)| < . Suppose kP k < δ. Then for any x, y ∈ [xi−1 , xi ], we have |x − y| ≤ |xi − xi−1 | < δ, which P implies |f (x) − f (y)| < . Therefore the oscillation ω[xi−1 ,xi ] (f ) ≤ and ω[xi−1 ,xi ] (f )∆xi ≤ (b − a). By Theorem 3.1.3, the function is integrable. Proposition 3.1.5. Monotone functions on bounded closed intervals are Riemann integrable. Proof. Let f (x) be an increasing function on a bounded closed interval [a, b]. Then ω[xi−1 ,xi ] (f ) = f (xi ) − f (xi−1 ), and X

X

(f (xi ) − f (xi−1 ))∆xi X ≤ kP k (f (xi ) − f (xi−1 )) = kP k(f (b) − f (a)).

ω[xi−1 ,xi ] (f )∆xi =

By Theorem 3.1.3, this implies that the function is integrable. Example 3.1.8. The function x is integrable on [a, b] by either Proposition 3.1.4 or Z 1 Proposition 3.1.5. The computation in Example 3.1.1 tells us xdx = lim S(Pn , x) = 1 . An extension of the computation shows 2 Z b 1 xdx = (b2 − a2 ). 2 a

0

n→∞

3.1. RIEMANN INTEGRATION

147

The function x2 is integrable on [a, b] by Proposition 3.1.4. To find the integral, i we take a partition Pn to consist of xi = a + (b − a) and choose x∗i = xi . Then n n 2 X i 1 S(Pn , x2 ) = a + (b − a) (b − a) n n i=1 n X 1 2 i2 i 2 = a + 2 2 a(b − a) + 3 (b − a) (b − a) n n n i=1

=

n(n + 1)(2n + 1) n 2 n(n + 1) a(b − a)2 + (b − a)3 . a (b − a) + 2 n n 6n3

Taking the limit as n → ∞, we get Z b 1 x2 dx = (b3 − a3 ). 3 a Exercise 3.1.9. Suppose f (x) is a convex function on [a, b]. Use the inequaliZ b ties in Exercises 2.3.56 and 2.3.57 to prove that f (b) − f (a) = f 0 (x− )dx = a Z b 0 + f (x )dx. a

Proposition 3.1.6. Suppose f (x) is an integrable function on a bounded closed interval [a, b]. Suppose the values of f (x) lie in a finite union U of closed intervals, and φ(y) is a continuous function on U . Then the composition φ(f (x)) is integrable. Proof. By Theorems 1.4.4 and 1.4.5, the continuous function φ is uniformly continuous and bounded on each interval inside U . Since U contains finitely many closed intervals, φ is also uniformly continuous and bounded on U . In other words, for any > 0, there is δ > 0, such that |y − y 0 | < δ implies |φ(y) − φ(y 0 )| < . By taking y = f (x) and y 0 = f (x0 ), for any interval [c, d] ⊂ [a, b], we have ω[c,d] (f ) < δ =⇒ |f (x) − f (x0 )| < δ for any x, x0 ∈ [c, d] =⇒ |φ(f (x)) − φ(f (x0 ))| < for any x, x0 ∈ [c, d] =⇒ ω[c,d] (φ ◦ f ) ≤ . By Theorem 3.1.3, since f (x) is integrable, there is δ 0 > 0 satisfying δ 0 < δ, such that X kP k < δ 0 =⇒ ω[xi−1 ,xi ] (f )∆xi < δ. Then the sum X

ω[xi−1 ,xi ] (φ ◦ f )∆xi =

X <δ

P

P

+

X , ≥δ

where <δ and ≥δ consist respectively of the terms with ω[xi−1 ,xi ] (f ) < δ and the terms with ω[xi−1 ,xi ] (f ) ≥ δ. Since ω[xi−1 ,xi ] (f ) < δ implies ω[xi−1 ,xi ] (φ◦ f ) < , we have X X X ω[xi−1 ,xi ] (φ ◦ f )∆xi ≤ ∆xi ≤ ∆xi = (b − a). <δ

<δ

148

CHAPTER 3. INTEGRATION

P For those intervals with ω[xi−1 ,xi ] (f ) ≥ δ, the total length ≥δ ∆xi of such intervals can be estimated as follows X X X δ ∆xi ≤ ω[xi−1 ,xi ] (f )∆xi ≤ ω[xi−1 ,xi ] (f )∆xi ≤ δ. ≥δ

≥δ

If φ is bounded by B on U , then ω[xi−1 ,xi ] (φ ◦ f ) ≤ 2B and X X ω[xi−1 ,xi ] (φ ◦ f )∆xi ≤ 2B ∆xi ≤ 2B. ≥δ

≥δ

Thus we conclude X X X ω[xi−1 ,xi ] (φ ◦ f )∆xi = + ≤ (b − a) + 2B. ≥δ

<δ

Since a, b, B are all fixed constants, by Theorem 3.1.3, this implies that φ ◦ f is integrable. Example 3.1.9. If f (x) is integrable, then f (x)2 and |f (x)| are also integrable. If we p further have f (x) ≥ 0, then f (x) is integrable. If we further have |f (x)| > c > 0 for a constant c, then by taking U = [−B, −c] ∪ [c, B], where B is the bound for 1 f , we find to be integrable. f Exercise 3.1.10. Does the integrability of |f (x)| imply the integrability of f (x)? What about f (x)2 ? What about f (x)3 ? Exercise 3.1.11. Suppose φ(x) satisfies A(x0 − x) < φ(x0 ) − φ(x) < B(x0 − x) for some constants A, B > 0 and all a ≤ x < x0 ≤ b. 1. Prove that ω[x,x0 ] (f ◦ φ) = ω[φ(x),φ(x0 )] (f ). 2. Prove that if f (y) is integrable on [φ(a), φ(b)], then f (φ(x)) is integrable on [a, b]. Moreover, prove that if φ(x) is continuous on [a, b] and differentiable on (a, b), satisfying A < φ0 (x) 0 and all a ≤ x < x0 ≤ b.

3.1.4

Properties of Integration

Being defined as a certain type of limit, the integration has similar properties as the limit. The properties limn→∞ (xn + yn ) = limn→∞ xn + limn→∞ yn and limn→∞ cxn = c limn→∞ xn have the following analogues. Proposition 3.1.7. Suppose f (x) and g(x) are integrable on [a, b]. Then f (x) + g(x) and cf (x) are also integrable on [a, b], and Z b Z b Z b (f (x) + g(x))dx = f (x)dx + g(x)dx, (3.1.7) a a a Z b Z b cf (x)dx = c f (x)dx. (3.1.8) a

a

3.1. RIEMANN INTEGRATION Z Proof. Denote I =

b

149 Z

f (x)dx and J =

g(x)dx. For any > 0, there is

a

δ > 0, such that

b

a

kP k < δ =⇒ |S(P, f ) − I| < , |S(P, g) − J| < . On the other hand, by choosing the same partition P and the same x∗i for both f (x) and g(x), we have X S(P, f + g) = (f (x∗i ) + g(x∗i ))∆xi X X = f (x∗i )∆xi + g(x∗i )∆xi = S(P, f ) + S(P, g). Thus we conclude kP k < δ =⇒ |S(P, f + g) − I − J| ≤ |S(P, f ) − I| + |S(P, g) − J| < 2. Z b This shows that f + g is also integrable, and (f (x) + g(x))dx = I + J. a Z b The proof of cf (x)dx = cI is similar. a

Example 3.1.10. Let L(x) = A + Bx be the linear function satisfying L(a) = h and L(b) = k. Then k + l = 2A + B(a + b). By Examples 3.1.2 and 3.1.8, we have Z b Z b Z b B 1 L(x)dx = A dx + B xdx = A(b − a) + (b2 − a2 ) = (b − a)(k + l). 2 2 a a a This is indeed the area of the trapezoid under the line. Example 3.1.11. Suppose f (x) and g(x) are integrable. By Proposition 3.1.7, f (x) + g(x) is integrable. By Proposition 3.1.6, f (x)2 , g(x)2 and (f (x) + g(x))2 are also integrable. Then by Proposition 3.1.7 again, the product f (x)g(x) =

1 (f (x) + g(x))2 − f (x)2 − g(x)2 2

is also integrable. However, there is no formula expressing the integral of f (x)g(x) in terms of the integrals of f (x) and g(x). Moreover, if |g(x)| > c > 0 for a constant c, then by the discussion in Example f (x) 1 3.1.9, the quotient = f (x) is integrable. g(x) g(x) Example 3.1.12. Suppose f (x) is integrable on [a, b]. Suppose g(x) = f (x) for all x except at c ∈ [a, b]. Then g(x) = f (x) + λdc (x), where dc (x) is the function in Example 3.1.3 and λ = g(c) − f (c). In particular, by the computation in Example 3.1.3, g(x) is also integrable and Z b Z b Z b Z b g(x)dx = f (x)dx + λ dc (x)dx = f (x)dx. a

a

a

a

The example shows that changing an integrable function at finitely many places does not change the integrability and the integral. In particular, it makes sense to talk about the integrability of a function f (x) on a bounded open interval (a, b) because any numbers may be assigned as f (a) and f (b) without affecting the integrability of (the extended) f (x) on [a, b].

150

CHAPTER 3. INTEGRATION

Example 3.1.13. In Example 3.1.5, we showed that the Thomae’s function R(x) in Example 1.4.2 is integrable. By Example 3.1.12, we also know that the function ( 1 if 0 < x ≤ 1 f (x) = 0 if x = 0 is integrable. However, the composition f (R(x)) is the Dirichlet function, which is not integrable by Example 3.1.4. Exercise 3.1.12. Prove that if f (x) and g(x) are integrable, then max{f (x), g(x)} and min{f (x), g(x)} are also integrable. Exercise 3.1.13. Find a function f (x) on [0, 1] that is never zero and is integrable, 1 such that is not integrable. f (x)

Proposition 3.1.8. Suppose f (x) and g(x) are integrable on [a, b]. If f (x) ≤ g(x), then Z b Z b f (x)dx ≤ g(x)dx. a

Z Proof. Denote I = δ > 0, such that

a

b

Z f (x)dx and J =

b

g(x)dx. For any > 0, there is

a

a

kP k < δ =⇒ |S(P, f ) − I| < , |S(P, g) − J| < . By choosing the same x∗i for both functions, we have X X I − < S(P, f ) = f (x∗i )∆xi ≤ g(x∗i )∆xi = S(P, g) < J + . Therefore we have the inequality I − < J + for any > 0. This implies I ≤ J. Example 3.1.14. If f (x) is integrable, then |f (x)| is integrable, and −|f (x)| ≤ Z b Z b f (x) ≤ |f (x)|. By Proposition 3.1.8, we have − |f (x)|dx ≤ f (x)dx ≤ a a Z b |f (x)|dx. This is the same as a

Z b Z b ≤ f (x)dx |f (x)|dx. a

(3.1.9)

a

Exercise 3.1.14. Suppose f (x) is integrable on [a, b]. Prove that Z (b − a) inf f ≤ [a,b]

b

f (x)dx ≤ (b − a) sup f. a

(3.1.10)

[a,b]

Exercise 3.1.15. Suppose f (x) is continuous on [a, b]. Prove that there is a < c < b, such that Z b

f (x)dx = f (c)(b − a). a

(3.1.11)

3.1. RIEMANN INTEGRATION

151

More generally, prove the first integral mean value theorem: For any non-negative integrable function g(x) on [a, b], there is a < c < b, such that Z

b

Z

b

g(x)dx.

g(x)f (x)dx = f (c)

(3.1.12)

a

a

Exercise 3.1.16. Suppose f (x) is integrable on [a, b]. Prove that Z b f (c)(b − a) − ≤ ω[a,b] (f )(b − a) f (x)dx

(3.1.13)

a

for any a ≤ c ≤ b. Moreover, prove that X Z b ≤ S(P, f ) − f (x)dx ω[xi−1 ,xi ] (f )∆xi .

(3.1.14)

a

Exercise 3.1.17. Suppose f (x) is a continuous function on an open interval conZ b |f (x + t) − f (x)|dx = 0. We will see in Exercise taining [a, b]. Prove that limt→0 a

3.1.46 that the continuity assumption is not needed.

If a region is divided into non-overlapping parts, then the whole area is the sum of the areas of the parts. The following result reflects the intuition. Proposition 3.1.9. Suppose f (x) is a function on [a, c] and a < b < c. Then f (x) is integrable on [a, c] if and only if its restrictions on [a, b] and [b, c] are integrable. Moreover, Z c Z b Z c f (x)dx. (3.1.15) f (x)dx + f (x)dx = b

a

a

Proof. The proof is based on the study of the relation of the Riemann sums of the function on [a, b], [b, c] and [a, c]. Let P be a partition of [a, c]. If P contains b as a partition point, then P is obtained by combining a partition P 0 of [a, b] and a partition P 00 of [b, c] together. For any choice of x∗i for P and the same choice for P 0 and P 00 , we have S(P, f ) = S(P 0 , f ) + S(P 00 , f ). If P does not contain b as a partition point, then xk−1 < b < xk for some k, and the new partition P˜ = P ∪ {b} is still obtained by combining a partition P 0 of [a, b] and a partition P 00 of [b, c] together. For any choice of x∗i for P , we keep all x∗i with i 6= k and introduce xk−1 < x0 ∗k < b, b < x00 ∗k < xk for P˜ . Then S(P˜ , f ) = S(P 0 , f ) + S(P 00 , f ) as before, and |S(P, f ) − S(P 0 , f ) − S(P 00 , f )| = |S(P, f ) − S(P˜ , f )| ∗ ∗ =|f (x∗k )(xk − xk−1 ) − f (x0 k )(b − xk−1 ) − f (x00 k )(xk − b)| ≤2 sup |f | kP k. [xk−1 ,xk ]

Suppose f is integrable on [a, b] and [b, c]. Then by Proposition 3.1.2, f (x) is bounded on the two intervals. Thus |f (x)| 0, there is a

b

152

CHAPTER 3. INTEGRATION

δ > 0, such that for partitions P 0 of [a, b] and P 00 of [b, c] satisfying kP 0 k < δ and kP 00 k < δ, we have |S(P 0 , f ) − I| < and |S(P 00 , f ) − J| < . Then for any partition P of [a, c] satisfying kP k < δ, we always have (regardless we are in the first or the second case above) |S(P, f ) − I − J| ≤|S(P, f ) − S(P 0 , f ) − S(P 00 , f )| + |S(P 0 , f ) − I| + |S(P 00 , f ) − I| <2δB + 2. Z

c

f (x)dx = I + J.

This implies that f (x) is integrable on [a, c], with a

It remains to show that the integrability of f (x) on [a, c] implies the integrability of f on [a, b] and [b, c]. By Cauchy criterion, for any > 0, there is δ > 0, such that for partitions P and Q of [a, c] satisfying kP k < δ and kQk < δ, we have |S(P, f ) − S(Q, f )| < . Now suppose P 0 and Q0 are partitions of [a, b] satisfying kP 0 k < δ and kQ0 k < δ. Let R be any partition of [b, c] satisfying kRk < δ. By adding R to P 0 and Q0 , we get partitions P and Q of [a, b] satisfying kP k < δ and kQk < δ. Moreover, the choices of x∗i (which may be different for P 0 and Q0 ) may be extended by adding the same x∗i for P 00 . Then we get |S(P 0 , f ) − S(Q0 , f )| = |(S(P 0 , f ) + S(R, f )) − (S(Q0 , f ) + S(R, f ))| = |S(P, f ) − S(Q, f )| < . This proves the integrability for f on [a, b]. The proof of the integrability on [b, c] is similar.

Example 3.1.15. A function f (x) on [a, b] is a step function if there is a partition P and constants ci , such that f (x) = ci for xi−1 < x < xi (it does not matter what f (xi ) are). Then Z

b

f (x)dx = a

=

XZ

xi

xi−1 X Z xi

f (x)dx ci dx

(Theorem 3.1.9) (Example 3.1.12)

xi−1

=

X

ci (xi − xi−1 ).

(Example 3.1.2)

Example 3.1.16. Combining Example 3.1.10 with Proposition 3.1.9, we know the integral of a piecewise linear function is equal to the geometrical area of the region under the graph. Exercise 3.1.18. Suppose f (x) is a continuous function on [a, b]. Prove that the following are equivalent. 1. f (x) = 0 for all x. Z

b

|f (x)|dx = 0.

2. a

3.1. RIEMANN INTEGRATION Z

153

d

f (x)dx = 0 for any [c, d] ⊂ [a, b].

3. c

Z

b

f (x)g(x)dx = 0 for any continuous function g(x).

4. a

Exercise 3.1.19. Z bSuppose f (x) is a continuous function on [a, b]. Z b |f (x)|dx if and only if f (x) does not change sign. f (x)dx =

Prove that

a

a

Exercise 3.1.20. Suppose f (x) is a continuous function on [a, b]. Prove that if Z b Z b xf (x)dx = 0, then there are at least two distinct points a < f (x)dx = a

a

c1 , c2 < b, such that f (c1 ) = f (c2 ) = 0. Extend the result to more points. Exercise 3.1.21. Suppose f (x) ≥ 0 is a concave function on [a, b]. Then for any y ∈ [a, b], f (x) is bigger than the function obtained by connecting straight lines from Z b 2 (a, 0) to (y, f (y)) and then to (b, 0). Use this to prove that f (y) ≤ f (x)dx. b−a a Moreover, determine when the equality holds. Exercise 3.1.22. Suppose f (x) is continuous on [a, b] and differentiable on (a, b). f (b) − f (a) Suppose m ≤ f 0 ≤ M on (a, b) and denote µ = . By comparing f (x) b−a with suitable piecewise linear functions, prove that Z b (M − µ)(µ − m) f (a) + f (b) ≤ (b − a) (b − a)2 . (3.1.16) f (x)dx − 2 2(M − m) a Exercise 3.1.23. Suppose f (x) is a non-negative and strictly increasing function on [a, b]. Z b 1. Prove that if f (b) ≤ 1, then limn→∞ f (x)n dx = 0. a

Z 2. Prove that if f (b) > 1 and f (x) is continuous at b, then limn→∞

b

f (x)n dx =

a

+∞. Z

b

f (x)n g(x)dx, where g(x) is non-negative and in-

Extend the result to limn→∞ a

tegrable on [a, b]. Exercise 3.1.24. Suppose f (x) is continuous on [a, b] and f (x) > 0 on (a, b). SupZ b Z b p g(x)dx. pose g(x) is integrable on [a, b]. Prove that limn→∞ g(x) n f (x)dx = a

a

Exercise 3.1.25. Suppose f (x) ≥ 0 is continuous on [a, b]. Prove that Z lim

p→+∞

b p

f (x) dx a

p1 = max f (x). [a,b]

Exercise 3.1.26. Suppose f (x) satisfies the Lipschitz condition |f (x) − f (x0 )| < L|x − x0 | on [a, b]. Prove that for any partition P and any choice of x∗i , we have Z b LX S(P, f ) − f (x)dx ≤ ∆x2i . 2 a This gives an estimate of how close the Riemann sum is to the actual integral.

154

CHAPTER 3. INTEGRATION

Exercise 3.1.27. Suppose g(x) is integrable on [a, b]. Prove that for any partition P of [a, b] and choice of x∗i , we have Z xi Z b X X f (x∗i ) g(x)dx − f (x)g(x)dx ≤ sup |g| ω[xi−1 ,xi ] (f )∆xi . [a,b] xi−1 a In particular, if f (x) is integrable, then Z xi Z b X ∗ f (x)g(x)dx. lim f (xi ) g(x)dx = kP k→0

xi−1

a b

Z

f (x)dx implicitly assumes a <

The definition of the Riemann integral b. If a > b, then we also define Z Z b f (x)dx = − a

Z Moreover, we define

a

a

f (x)dx.

(3.1.17)

b

a

f (x)dx = 0 (which can be considered as a special a

case of the original definition of the Riemann integral). Then the equality Z c Z b Z c f (x)dx f (x)dx + f (x)dx = b

a

a

still holds for any order between a, b, c. Proposition 3.1.7 still holds for a ≥ b, and the direction of the inequality in Proposition 3.1.8 needs to be reversed for a ≥ b.

3.1.5

Additional Exercise

Modified Riemann Sum and Riemann Product Exercise 3.1.28. Let φ(t) be a function defined near 0. For any partition P of [a, b] and choice of x∗i , define the “modified Riemann sum” Sφ (P, f ) =

n X

φ(f (x∗i )∆xi ).

i=1

Prove that if φ is differentiable at 0, such that φ(0) = 0 and φ0 (0) = 1, and f (x) Z b f (x)dx. is integrable on [a, b], then limkP k→0 Sφ (P, f ) = a

Exercise 3.1.29. For any partition P of [a, b] and choice of x∗i , define the “Riemann product” Π(P, f ) = (1 + f (x∗1 )∆x1 )(1 + f (x∗2 )∆x2 ) · · · (1 + f (x∗n )∆xn ). Rb

Prove that if f (x) is integrable on [a, b], then limkP k→0 Π(P, f ) = e

a

f (x)dx

.

Integrability and Continuity Riemann integrable functions are not necessarily continuous. How much discontinuity can Riemann integrable functions have?

3.1. RIEMANN INTEGRATION

155

Exercise 3.1.30. Suppose f (x) is integrable on [a, b]. Prove that for any > 0, there is δ > 0, such that for any partition P satisfying kP k < δ, we have ω[xi−1 ,xi ] (f ) < for some interval [xi−1 , xi ] in the partition. Exercise 3.1.31. Suppose there is a sequence of intervals [a, b] ⊃ [a1 , b1 ] ⊃ [a2 , b2 ] ⊃ · · · , such that an < c < bn for all n and limn→∞ ω[an ,bn ] (f ) = 0. Prove that f (x) is continuous at c. Exercise 3.1.32. Prove that an integrable function must be continuous somewhere. In fact, prove that for any (c, d) ⊂ [a, b], an integrable function on [a, b] is continuous somewhere in (c, d). In other words, the continuous points of the integrable function must be dense. Exercise 3.1.33. Define the oscillation ω(x) = lim ω[x−δ,x+δ] (f ) = lim f (x) − lim f (x) y→x

δ→0+

y→x

of a function at a point (see the definition before Exercise 1.3.40 for the upper and lower limit of functions). Prove that f (x) is continuous at x0 if and only if ω(x0 ) = 0. Exercise 3.1.34. Prove that if f (x) is integrable, then for any > 0 and δ > 0, there is a union U of finitely many intervals, such that the sum of the lengths of the intervals in U is < , and ω(x) ≥ δ implies x ∈ U . Exercise 3.1.35 (Hankel2 ). Prove that if f (x) is integrable, then for any > 0, there is a union U of countably many intervals, such that the sum of the lengths of the intervals in U is < , and all discontinuous points of f (x) lie inside U . This basically says that the set of discontinuous points of a Riemann integrable function has Lebesgue measure 0. The converse is also true.

Strict Inequality in Integration The existence of the continuous points for integrable functions (see Exercise 3.1.32) enables us to change the inequalities in Proposition 3.1.8 to become strict. Z Exercise 3.1.36. Prove that if f (x) > 0 is integrable on [a, b], then

b

f (x)dx > 0. a

In particular, this shows Z f (x) < g(x) =⇒

b

Z f (x)dx <

a

b

g(x)dx. a

Exercise 3.1.37. Suppose f (x) is an integrable function on [a, b]. Prove that the following are equivalent. Z d 1. f (x)dx = 0 for any [c, d] ⊂ [a, b]. c

Z

b

|f (x)|dx = 0.

2. a

2 Hermann Hankel, born 1839 in Halle (Germany), died 1873 in Schramberg (Germany). Hankel was Riemann’s student, and his study of Riemann’s integral prepared for the discovery of Lebesgue integral.

156

CHAPTER 3. INTEGRATION Z

b

f (x)g(x)dx = 0 for any continuous function g(x).

3. a

Z 4.

b

f (x)g(x)dx = 0 for any integrable function g(x). a

5. f (x) = 0 at continuous points.

Refinement of Partition and Integrability Criterion The integrability criterion in Theorem 3.1.3 requires the Riemann sum of oscillations to be small for all partitions P satisfying kP k < δ. The following exercises show that it is sufficient for this to happen for just one partition P . Exercise 3.1.38. Suppose a partition Q : a = y0 < y1 < y2 < · · · < ym = b is obtained by adding k points to a partition P : a = x0 < x1 < x2 < · · · < xn = b. Prove that X X ω[yj−1 ,yj ] (f )∆yj ≤ ω[xi−1 ,xi ] (f )∆xi and X

ω[xi−1 ,xi ] (f )∆xi ≤

X

ω[yj−1 ,yj ] (f )∆yj + 2kkP kω[a,b] (f ).

Exercise 3.1.39. Prove that a functionP f (x) is integrable if and only if for any > 0, there is a partition P , such that ω[xi−1 ,xi ] (f )∆xi < .

Darboux Sum and Darboux Integral For a function f (x) on [a, b] and a partition P of [a, b], the upper and lower Darboux sums are U (P, f ) = sup S(P, f ) = all x∗i

L(P, f ) = inf∗ S(P, f ) =

n X

sup f (x)∆xi ,

i=1 [xi−1 ,xi ] n X

all xi

inf f (x)∆xi .

i=1

(3.1.18) (3.1.19)

[xi−1 ,xi ]

For bounded f (x), the upper and lower Darboux integrals are Z

b

Z f (x)dx = inf U (P, f ), all P

a

b

f (x)dx = sup L(P, f ).

(3.1.20)

all P

a

Exercise 3.1.40. Prove that if Q is a refinement of P , then U (P, f ) ≥ U (Q, f ) ≥ L(Q, f ) ≥ L(P, f ). Exercise 3.1.41. Prove that Z b Z b f (x)dx = lim U (P, f ), f (x)dx = lim L(P, f ). kP k→0

a

kP k→0

a

Exercise 3.1.42. Prove that Z

b

Z f (x)dx ≥

a

b

f (x)dx, a

and the equality holds if and only if f (x) is Riemann integrable on [a, b]. Moreover, Z b the Riemann integral f (x)dx is the common value. a

3.1. RIEMANN INTEGRATION

157

Exercise 3.1.43. Prove that if f (x) is integrable, then Z b X f (x)dx = lim φi ∆xi , kP k→0

a

where φi is any number satisfying inf [xi−1 ,xi ] f (x) ≤ φi ≤ sup[xi−1 ,xi ] f (x). Exercise 3.1.44. Study the properties in Section 3.1.4 for the Darboux integral.

Integral Continuity The equality limt→0 |f (x + t) − f (x)| = 0 means the continuity of f at Z b t. Therefore the equality limt→0 |f (x + t) − f (x)|dx = 0 means the “ina

tegral continuity” of f on [a, b]. Exercise 3.1.17 says that continuity implies integral continuity. The following exercises show that the continuity of f is unnecessary for the integral continuity. Exercise 3.1.45. Suppose f is integrable on an open interval containing [a, b]. Suppose P be a partition of [a, b] by intervals of equal length δ. Prove that if |t| < δ, then Z xi |f (x + t) − f (x)|dx ≤ δ(ω[xi−2 ,xi−1 ] (f ) + ω[xi−1 ,xi ] (f ) + ω[xi ,xi+1 ] (f )). xi−1

Exercise 3.1.46. Suppose f is integrable on an open interval containing [a, b]. Prove Z b Z b that limt→0 |f (x + t) − f (x)|dx = 0 and limt→1 |f (tx) − f (x)|dx = 0. a

a

Integral Inequalities for Convex Functions Exercise 3.1.47. Suppose f (x) is a convex function on [a, b]. By comparing f (x) with linear functions in Figure 2.6, prove that Z b f (a) + f (b) a+b (b − a) ≤ f (x)dx ≤ (b − a). f 2 2 a Exercise 3.1.48. A weight on [a, b] is a function λ(x) satisfying Z b 1 λ(x) = 1. (3.1.21) λ(x) ≥ 0, b−a a Z b 1 We have λ(x)xdx = (1 − µ)a + µb for some 0 < µ < 1. For a convex b−a a function on [a, b], prove that Z b 1 f ((1 − µ)a + µb) ≤ λ(x)f (x)dx ≤ (1 − µ)f (a) + µf (b). b−a a The left inequality is the integral version of the Jensen inequality in Exercise 2.3.40. What do you get by applying the integral Jensen inequality to x2 , ex and log x? Exercise 3.1.49. Suppose f (x) is a convex function on [a, b] and φ(t) is an integrable function on [α, β] satisfying a ≤ φ(t) ≤ b. Suppose λ(x) is a weight function on [α, β] as defined in Exercise 3.1.48. Prove that Z β Z β 1 1 f λ(t)φ(t)dt ≤ λ(t)f (φ(t))dt. β−α α β−α α This further extends the integral Jensen inequality.

158

CHAPTER 3. INTEGRATION

H¨ older and Minkowski Inequalities 1 1 + = 1. Prove the integral versions p q of the H¨ older and Minkowski inequalities in Exercises 2.2.41 and 2.2.42. Z b p1 Z b 1q Z b p q |f (x)g(x)|dx ≤ |f (x)| dx |g(x)| dx , (3.1.22) Exercise 3.1.50. Suppose p, q > 0 satisfy

a

Z

a

b p

p1

|f (x) + g(x)| dx

Z ≤

a b p

|f (x)| dx

a

p1

Z

p

|g(x)| dx

+

a

b

p1 .

(3.1.23)

a

Estimation of Integral Exercise 3.1.51. Suppose f (x) is continuous on [a, b] and differentiable on (a, b). By comparing f (x) with the straight line L(x) = f (a)+m(x−a) for m = sup(a,b) f 0 or m = inf (a,b) f 0 , prove that Z b sup(a,b) f 0 inf (a,b) f 0 (b − a)2 ≤ f (x)dx − f (a)(b − a) ≤ (b − a)2 . (3.1.24) 2 2 a Then use Darboux’s intermediate value theorem in Exercise 2.2.33 to show Z b f 0 (c) f (x)dx = f (a)(b − a) + (b − a)2 (3.1.25) 2 a for some a < c < b. Exercise 3.1.52. Suppose f (x) is continuous on [a, b] and differentiable on (a, b). Suppose g(x) is non-negative and integrable on [a, b]. Prove that Z b Z b Z b f (x)g(x)dx = f (a) g(x)dx + f 0 (c) (x − a)g(x)dx (3.1.26) a

a

a

for some a < c < b. Exercise 3.1.53. Suppose f (x) is continuous on [a, b] and differentiable on (a, b). Use Exercise 3.1.52 to prove that Z b ω(a,b) (f 0 ) f (a) + f (b) ≤ f (x)dx − (b − a) (b − a)2 . (3.1.27) 2 8 a In fact, this estimation can also be derived from Exercise 3.1.22. Exercise 3.1.54. Suppose f (x) is continuous on [a, b] and has second order derivaa+b to prove that tive on (a, b). Use the Taylor expansion at 2 Z b a+b f 00 (c) f (x)dx = f (b − a) + (b − a)3 (3.1.28) 2 24 a for some a < c < b.

3.2

Antiderivative

Riemann’s definition of integration is independent of the differentiation. Before Riemann, however, Newton and Leibniz considered the integration as the inverse of the differentiation. The relation enables us to compute the integrals.

3.2. ANTIDERIVATIVE

3.2.1

159

Fundamental Theorem of Calculus

Let us consider the integral of f (x) as a function by changing the upper limit. The result is a function with f (x) as the derivative. Z x f (t)dt is a Theorem 3.2.1. Suppose f (x) is integrable. Then F (x) = a

continuous function. Moreover, if f (x) is continuous at x0 , then F (x) is differentiable at x0 , with F 0 (x0 ) = f (x0 ). Proof. By Proposition 3.1.2, we have |f (x)| < B for a constant B and all x. Then by Propositions 3.1.8 (see also (3.1.9)) and Proposition 3.1.9, we have Z x f (t)dt ≤ B|x − x0 |. |F (x) − F (x0 )| = x0

This implies limx→x0 F (x) = F (x0 ). Now further assume that f (x) is continuous at x0 . For any < 0, there is δ > 0, such that |x − x0 | < δ implies |f (x) − f (x0 )| < . Then Z x |F (x) − F (x0 ) − f (x0 )(x − x0 )| = f (t)dt − f (x0 )(x − x0 ) Zx0x = (f (t) − f (x0 ))dt ≤ |x − x0 |. x0

This means that F (x0 ) + f (x0 )(x − x0 ) is the linear approximation of F (x) at x0 , with F 0 (x0 ) = f (x0 ). The theorem suggests that, to compute the integral of a continuous function f (x), we may consider a function φ(x) satisfying φ0 (x) = f (x). Although the functions φ(x) and F (x) may not be equal, the property φ0 (x) = f (x) = F 0 (x) implies, by Proposition 2.2.3, that φ(x) = F (x) + C for a constant C. Then Z b f (x)dx = F (b) = F (b) − F (a) = φ(b) − φ(a). a

For the obvious reason, the function φ(x) is called an antiderivative of f (x). It is unique up to adding a constant. Example 3.2.1. For the sign function sign(x) in Example 1.4.1, we have Z x   1dx = x if x > 0   Z x  0 sign(t)dt = 0 if x = 0 Z x  0     −1dx = −x if x < 0 0

by considering Z xthe antiderivative of 1 for x > 0 and the antiderivative of −1 for x < 0. Thus sign(t)dt = |x| is continuous and is differentiable except at x = 0, 0

which is the place where sign(t) is not continuous.

160

CHAPTER 3. INTEGRATION

1 1 Example 3.2.2. The functions c, x, x2 are continuous, with cx, x2 , x3 as an2 3 tiderivatives. Therefore Z b Z b Z b 1 1 1 1 xdx = b2 − a2 , cdx = cb − ca, x2 dx = b3 − a3 . 2 2 3 3 a a a 1 Example 3.2.3. The functions cos2 x = (1 + cos 2x), tan2 x = sec2 x − 1 are 2 1 continuous, with (2x + sin 2x), tan x − x as antiderivatives. Therefore 4 Z π 1 2 1 π π cos2 xdx = 2 + sin π − (2 · 0 + sin 0) = , 4 2 4 4 0 Z π 4 π π π tan2 xdx = tan − − (tan 0 − 0) = 1 − . 4 4 4 0 1 is integrable on [1, 2]. Consider the partition Example 3.2.4. The function x n+1 n+2 2n n+1 n+2 2n P: 1< < < ··· < = 2 and take the right ends , ,..., n n n n n n as x∗i . Then the Riemann sum 1 1 1 1 1 1 1 1 1 1 S P, = + + ··· + = + + ··· + . n + 1 n + 2 2n x n n n n+1 n+2 2n n n n Thus lim

n→∞

1 1 1 + + ··· + n+1 n+2 2n

1 = lim S P, x kP k→0

Z = 1

2

1 dx = log 2, x

where the last equality is obtained from the fact that log x is an antiderivative of 1 . x Z x2 Z y tdt tdt 2 Example 3.2.5. The function g(x) = = h(x ), with h(y) = . 3 3 1+t 0 1+t 0 By Theorem 3.2.1 and the chain rule, we have g 0 (x) = h0 (x2 )2x = 2x

x2 2x3 = . 1 + (x2 )3 1 + x6

By the similar idea, we have Z ex Z ex Z log x d tdt d tdt tdt = − 3 dx dx 1 + t3 1 + t3 log x 1 + t 0 0 ex 1 log x = ex − x 3 1 + (e ) x 1 + (log x)3 e2x log x = − . 3x 1+e x(1 + (log x)3 ) Exercise 3.2.1. Compute the integrals.

3.2. ANTIDERIVATIVE Z

161

b

Z

n

x dx, n ∈ N.

1.

2.

Z sin xdx.

3.

b

Z

α

x dx, b > 0.

b

Z 6.

cos 2xdx.

Exercise 3.2.2. Compute the derivatives. Z x Z sin x 1. sin t2 dt. t2 dt. 3. Z 2.

Z 5.

0 x2

Z sin tdt.

2x dx.

|x|

sin t2 dt.

0 x

4.

Z

sin t2 dt.

sin x

|t|dt.

6.

sin x

0

b

0

0

0

ex dx.

0

4.

1

b

5.

0

0

Z

b

0

Exercise 3.2.3. For non-negative integers m and n, prove that Z

2π

cos mx sin nxdx = 0,   if m 6= n Z 2π 0 cos mx cos nxdx = π if m = n 6= 0 ,  0  2π if m = n = 0 ( Z 2π 0 if m 6= n or m = n = 0 sin mx sin nxdx = . π if m = n 6= 0 0 0

(3.2.1)

Exercise 3.2.4. Compute the limits by relating to the Riemann sums of suitable integrals. 1. limn→∞ 2. limn→∞

3. limn→∞

12 · 1 + 22 · 3 + · · · + n2 · (2n − 1) . n4 1 π 2π (n − 1)π cos + cos + · · · + cos . n n n n √ n n! . n

4. limn→∞

1α + 3α + · · · + (2n + 1)α , α > 0. nα+1

5. limn→∞

1α + 3α + · · · + (2n − 1)α , α > 0. 2α + 4α + · · · + (2n)α Z

Exercise 3.2.5. If f (x) is not continuous at x0 , is it true that F (x) =

x

f (t)dt is a

not differentiable at x0 ? Z Exercise 3.2.6. Suppose f (x) is integrable and F (x) =

x

f (t)dt. Prove that if a

− f (x) has left limit at x0 , then F 0 (x− 0 ) = f (x0 ). In particular, this shows that if f has different left and right limits at x0 , then F (x) is not differentiable at x0 .

Exercise 3.2.7. Suppose f (x) is integrable on [a, b]. Prove that there is a < c < b Z Z c 1 b such that f (x)dx = f (x)dx. 2 a a

162

CHAPTER 3. INTEGRATION

Exercise 3.2.8. Find continuous functions on given intervals satisfying the equalities. Z 1 Z x f (t)dt on [0, 1]. f (t)dt = 1. x

0

Z

x

Z 0

0

3.

x

f (t)dt on (0, +∞).

tf (t)dt = x

2. A

Z

(f (x))2

=2

x

f (t)dt on (−∞, +∞). 0

Exercise 3.2.9. Find continuous functions f (x) on (0, +∞), such that for all b > 0, Z ab the integral f (x)dx is independent of a > 0. a

Exercise 3.2.10. Suppose f (x) is continuous on [a, b], differentiable on (a, b), and Z b 2 Z b 0 satisfies f (a) = 0, 1 ≥ f (x) ≥ 0. Prove that f (x)3 dx. f (x)dx ≥ a

a

As a matter of fact, the continuity assumption can be weakened in order for the integral to be the antiderivative. Theorem 3.2.2. Suppose f (x) is integrable on [a, b]. Suppose F (x) is continZ b 0 uous on [a, b] and is differentiable on (a, b). If F (x) = f (x), then f (x)dx = F (b) − F (a).

a

The theorem is almost the converse of Theorem 3.2.1. Put together, they are called the Fundamental Theorem of Calculus. Proof. For a partition P of [a, b], we have X X F (b) − F (a) = (F (xi ) − F (xi−1 )) = f (x∗i )(xi − xi−1 ), where the second equality comes from the mean value theorem and F 0 (x) = f (x). Therefore F (b) − F (a) is a Riemann sum with suitable choice of x∗i . Z b When kP k → 0, by the integrability of f (x), we get F (b)−F (a) = f (x)dx. a

Example 3.2.6. By Exercise 3.1.8 and Proposition 3.1.4, the function  2x sin 1 − cos 1 if x 6= 0 f (x) = x x 0 if x = 0 is not continuous but still integrable on any bounded interval. Moreover, the function  x2 sin 1 if x 6= 0 F (x) = x 0 if x = 0 is an antiderivative of f (x). Therefore Theorem 3.2.2 can be applied to give us Z 1 f (x)dx = F (1) − F (0) = sin 1. Note that the discussion after Theorem 3.2.1 0

does not apply to the example because of the discontinuity.

3.2. ANTIDERIVATIVE

163

1 Example 3.2.7. Consider F (x) = x2 sin 2 for x 6= 0 and F (0) = 0. The function x is differentiable with the derivative  2x sin 1 − 2 cos 1 if x 6= 0 0 F (x) = . x2 x x2 0 if x = 0 However, F 0 (x) is not integrable on [0, 1] because it is not bounded. So the existence of antiderivative does not necessarily imply the integrability. Z 1 On the other hand, we note that the limit lima→0+ F 0 (x)dx = lim (F (1) − a→0+ a Z 1 F 0 (x)dx is an improper integral F (a)) = sin 1 exists. What we have here is that 0

with sin 1 as the value.

Exercise 3.2.11. Prove that Theorem 3.2.2 still holds if F (x) is differentiable at all but finitely many points (so F (x) is piecewise differentiable). Exercise 3.2.12. Suppose f (x) is differentiable. Prove that f 0 (x) is integrable if Z x and only if there is an integrable function g(x), such that f (x) = f (a) + g(t)dt. a

Exercise 3.2.13. Suppose f (x) is differentiable and Z b [x]f 0 (x)dx, where [x] is defined in Exercise 1.3.5.

f 0 (x)

is integrable. Compute

a

Exercise 3.2.14. Suppose f (x) has integrable derivative on [a, b]. Z 1. Prove that if f (x) vanishes somewhere on [a, b], then |f (x)| ≤

b

|f 0 (x)|dx.

a b

Z 2. Prove that a

Z b Z b 0 |f (x)|dx ≤ max f (x)dx , (b − a) |f (x)|dx . a

a

Exercise 3.2.15. Suppose f (x) is continuous on [a, b] and differentiable on (a, b), such that f 0 (x) is integrable on [a, b]. Use H¨older inequality (3.1.22) in Exercise 3.1.50 to prove that b

Z

2

(f (b) − f (a)) ≤ (b − a)

f 0 (x)2 dx.

a

Then prove the following. Z 1. If f (a) = 0, then a

b

(b − a)2 f (x) dx ≤ 2 2

b

Z

f (x)2 dx ≤

2. If f (a) = f (b) = 0, then a

3. If f

a+b 2

Z = 0, then a

b

Z

b

f 0 (x)2 dx.

a

(b − a)2 4

(b − a)2 f (x) dx ≤ 4 2

b

Z

f 0 (x)2 dx.

a

Z a

b

f 0 (x)2 dx.

(3.2.2)

164

3.2.2

CHAPTER 3. INTEGRATION

Antiderivative Z

Because of the fundamental theorem of calculus, we use

f (x)dx to denote

all the antiderivatives of f (x). In other words, if F (x) is a differentiable function satisfying F 0 (x) = f (x), then Z f (x)dx = F (x) + C, where C is an arbitrary constant. Here are some basic examples.  α+1 Z x + C if α 6= −1 α , x dx = α + 1  log |x| + C if α = −1 Z sin xdx = − cos x + C, Z cos xdx = sin x + C, Z tan xdx = − log | cos x| + C, Z sec2 xdx = tan x + C, Z sec x tan xdx = sec x + C, Z ex dx = ex + C, Z ax ax dx = + C, log a Z log |x|dx = x log |x| − x + C, Z dx √ = arcsin x + C, 2 Z 1−x dx = arctan x + C, 1 + x2 Z dx √ = arcsec x + C. x x2 − 1 1 1 1 1 Example 3.2.8. By = + , we get 1 − x2 2 1−x 1+x Z Z Z dx 1 dx dx = + 1 − x2 2 1−x 1+x 1 + x 1 1 + C. = (− log |1 − x| + log |1 + x|) + C = log 2 2 1 − x 1 Example 3.2.9. By sin x sin 2x = (cos 3x − cos x), we get 2 Z 1 1 sin x sin 2xdx = sin 3x − sin x + C. 6 2

3.2. ANTIDERIVATIVE

165

By sin2 x + cos2 x = 1, we get Z Z dx sin2 x + cos2 x = dx sin2 x cos2 x Z sin2 x cos2 x = (sec2 x + csc2 x)dx = tan x − cot x + C. Exercise 3.2.16. Verify the antiderivatives. Z p dx √ 1. = log |x + x2 + a| + C. x2 + a Z p p 1 2. 1 − x2 dx = (arcsin x + x 1 − x2 ) + C. 2 Z a cos bx + b sin bx 3. eax cos bxdx = eax + C. a2 + b2 Z 4. log |x|dx = x log |x| − x + C. Exercise 3.2.17. Compute the antiderivatives. Z Z 2 dx x −x+1 . 1. 5. dx. x(1 + x) (1 + x)n Z Z 2. x(1 + x)9 dx. 6. (2x + 2−x )2 dx. Z 3. Z 4.

x2 dx. (1 + x)9 2

n

x (1 + x) dx.

Z |x|dx.

7.

sin 2x cos 3xdx.

cos2 xdx.

Z

tan2 xdx.

Z

sin3 xdx.

Z

cos 2xdx . sin2 x cos2 x

9. 10. 11.

Z 8.

Z

12.

The formulae for derivatives can be directly translated into formulae for antiderivatives. For example, the formula (f (x) + g(x))0 = f 0 (x) + g 0 (x) corresponds to the formula Z Z Z (f (x) + g(x))dx = f (x)dx + g(x)dx. (3.2.3) To translate the Leibniz rule, we start with Z Z f (x)dx = F (x) + C, g(x)dx = G(x) + C. In other words, F 0 (x) = f (x) and G0 (x) = g(x). Then (F (x)G(x))0 = F 0 (x)g(x) + F (x)G0 (x) = f (x)G(x) + F (x)g(x). In terms of the antiderivative, this means Z Z F (x)g(x)dx = F (x)G(x) − f (x)G(x)dx. This is the formula for the integration by parts.

(3.2.4)

166

CHAPTER 3. INTEGRATION

Using the differential notation, we have dF (x) = f (x)dx and dG(x) = g(x)dx. Thus the formula can also be written as Z Z F (x)dG(x) = F (x)G(x) − G(x)dF (x). (3.2.5) In other words, the integration by parts is a way of exchanging the functions “inside” and “outside” the differential notation. The viewpoint suggests the usefulness of including dx in the integration (or antiderivative) notaZ tion

f (x)dx. Indeed, in the more advanced mathematics, integrations are

carried out for the differential forms f (x)dx instead of the functions only. Example 3.2.10. The antiderivative of the logarithmic function is Z Z Z 1 log |x|dx = x log |x| − xd log |x| = x log |x| − x dx = x log |x| − x + C. x Example 3.2.11. To find the antiderivative of parts twice. Z Z x e sin xdx = − ex d cos x Z x = −e cos x + Z x = −e cos x + Z x = −e cos x +

ex sin x, we apply the integration by

cos xdex ex cos xdx

ex d sin x Z x x = −e cos x + e sin x − sin xdex Z x x = −e cos x + e sin x − ex sin xdx.

Z Solving for

ex sin xdx, we get Z

1 ex sin xdx = ex (sin x − cos x) + C. 2

Exercise 3.2.18. Compute the antiderivatives. Z Z ax 1. e sin bxdx. 3. x2 2x dx. Z 2.

x

a cos bxdx.

Z 4.

x

xe sin xdx.

Z

x2 e−x cos 2xdx.

Z

xn arctan xdx.

5. 6.

Exercise 3.2.19. Prove the recursive relations. Z Z n−1 1 n n−1 1. sin xdx = − sin x cos x + sinn−2 xdx. n n Z Z 1 n−1 n n−1 2. cos xdx = cos x sin x + cosn−2 xdx. n n

3.2. ANTIDERIVATIVE Z 3.

tann xdx =

167

tann−1 x − n−1

Z

tann−2 xdx.

secn−2 x tan x n − 2 4. sec xdx = − n−1 n−1 Z Z 5. xn ex dx = xn ex − n xn−1 ex dx. Z

n

Z

secn−2 xdx.

Z

Z

1 n x (log |x|) dx = xα+1 (log |x|)n − α+1 α+1

Z

ex sinn xdx =

Z

xn cos xdx = xn sin x + nxn−1 cos x − n(n − 1)

Z

2n x(1 + ax2 )n + (1 + ax ) dx = 2n + 1 2n + 1

6.

α

n

xα (log |x|)n−1 dx.

Z 1 n x n−1 e sin x(n cos x − sin x) + ex sinn−2 xdx. n2 − 1 n+1 Z Z 8. xn sin xdx = −xn cos x + nxn−1 sin x − n(n − 1) xn−2 sin xdx. 7.

9.

10.

Z

2 n

2.

4

sin x cos xdx. Z

3.

5

6

tan xdx.

Z 5.

Z Exercise 3.2.20. Let I(m, n) =

√

Z

ex sin4 xdx.

Z

x4 sin xdx.

Z

dx . (1 + x2 )2

7. x

x (x + 1)e dx. Z

6.

3

xn−2 cos xdx.

(1 + ax2 )n−1 dx.

Then compute the antiderivatives. Z Z 6 1. sin xdx. 4. tan−6 xdx Z

Z

2

x(log x) dx.

8. 9.

cosm x sinn xdx. Use

d sinn x = n sinn−1 x cos xdx, d cosn x = −n cosn−1 x sin xdx to derive the recursive relations. n−1 cosm+1 x sinn−1 x + I(m + 2, n − 2) m+1 m+1 cosm+1 x sinn−1 x n−1 =− + I(m, n − 2) m+n m+n

I(m, n) = −

if m 6= −1 if m + n 6= 0.

Find the similar relations by Z exchanging the Zsine 4and cosine functions. Then sin x compute the antiderivatives cos4 x sin5 xdx, dx. cos3 x Z Exercise 3.2.21. Let I(m, n) = (x − a)m (x − b)n dx. Find a recursive relation between I(m, n) and I(m−1, n+1). Then use the relation to compute the integral Z 1 (x − 1)3 (x + 1)10 dx. −1

168

CHAPTER 3. INTEGRATION

Now we translate the chain rule. Suppose F 0 (y) = f (y), or Z f (y)dy = F (y) + C. Then for any differentiable function φ(x), we have F (φ(x))0 = F 0 (φ(x))φ0 (x) = f (φ(x))φ0 (x). Therefore F (φ(x)) is the antiderivative of f (φ(x))φ0 (x), and we have Z Z f (y)dy = F (φ(x)) + C = f (φ(x))φ0 (x)dx. (3.2.6) y=φ(x)

This is the formula for the change of variable. Z

dx √ , we introduce x = t2 . 1+ x Z Z Z Z d(t2 ) dx 2tdt 1 √ = dt = = 2 1− 1+t 1+t 1+t 1+ x √ √ = 2(t − log |1 + t|) + C = 2( x − log |1 + x|) + C.

Example 3.2.12. To compute

Example 3.2.13. To compute the antiderivative of √

x , we note 5 + 4x − 3 + 4x − x2

x2 = 9 − (x − 2)2 and introduce x − 2 = 3 sin t. Z Z Z xdx (2 + 3 sin t)3d sin t √ p = 3(2 + 3 sin t)dt = 3 + 4x − x2 9 − 9 sin2 t p x−2 = 3(2t − 3 cos t) + C = 6 arcsin − 3 3 + 4x − x2 + C. 3 Z p In general, to compute an antiderivative of the form f (x, ax2 + bx + c)dx, we may complete the square for the quadratic function ax2 + bx + c. Then a suitable trigonometric function can be used to change the variable. Example 3.2.14. We compute the antiderivative of the secant function. Z Z Z 1 + sin x cos xdx d sin x 1 + C. = = log sec xdx = cos2 x 2 1 − sin x 1 − sin2 x In the last step, the computation of Example 3.2.8 is used. Note that by 1 + sin x (1 + sin x)2 (1 + sin x)2 = = = (sec x + tan x)2 , 1 − sin x (1 − sin x)(1 + sin x) cos2 x the antiderivative can also be written as Z sec xdx = log | sec x + tan x| + C. Example 3.2.15. The antiderivative of √

1 x2 + 1

may be computed by introducing

x = tan t. Z Z Z p sec2 tdt dx √ = = sec tdt = log | sec x+tan x|+C = log(x+ x2 + 1)+C. sec t x2 + 1

3.2. ANTIDERIVATIVE

169

The integration of sec t is computed in Example 3.2.14. Similarly, the antideriva1 tive of √ may be computed by introducing x = sec t. 2 x −1 Z Z Z p sec t tan tdt dx √ = = sec tdt = log |x + x2 − 1| + C. tan t x2 − 1 Example 3.2.16. To compute the antiderivative of the inverse sine function, we introduce x = sin t. Z Z Z arcsin xdx = td sin t = t sin t − sin tdt p = t sin t + cos t + C = x arcsin x + 1 − x2 + C. Note that the integration by parts is also used. Exercise 3.2.22. Compute the antiderivatives. Z Z 0 f 0 (x) f (x) dx. 2. dx. 1. f (x)α 1 + f (x)2

Z 3.

Exercise 3.2.23. Compute the antiderivatives. Z p Z dx √ √ 9. x2 + 2x + 5dx. 1. . x+ 3x Z Z √ dx xdx √ 10. . √ 2. . 3 2x − x2 1− x Z p Z √ 10 3 11. x 5 + 4x − x2 dx. 3. (1 + x) dx. Z 4. Z 5. Z 6. Z 7. Z 8.

dx . 2 x + 2x + 5 xdx . x2 + 2x + 5 x3 dx . x2 + 2x + 5

Z 12.

(x2 Z

13.

15.

x3 dx . (x2 + 2x + 5)2

16.

+ 2x + 2)

3 .

Z 17. 18.

csc xdx. Z

tan3 xdx.

Z

sec3 xdx.

Z

sin4 x dx. cos3 x

Z

sin5 x dx. cos3 x

Z

(arcsin x)2 dx.

Z

x(arcsin x)2 dx.

19. 20.

2

(2x + 1)dx p . x(x + 1)

21.

dx . x log x

Z

log x dx. x

23.

dx . x e + e−x

24.

Z

cot xdx. Z

Z 14.

dx . 2 (x + 2x + 5)2

xdx

2f (x) f 0 (x)dx.

Exercise 3.2.24. Compute the antiderivatives (a > 0). Z Z dx α √ 1. (ax + b) dx. 4. . 2 a − x2 Z Z dx dx 2. . √ 5. . 2 a2 + x2 a + x2 Z Z dx dx 3. √ 3 . 6. . 2 (a2 + x2 ) 2 x − a2

22.

Z p 7. a2 − x2 dx. Z p 8. a2 + x2 dx. Z p 9. x2 − a2 dx.

170

CHAPTER 3. INTEGRATION Z

Z √

xdx √ . x2 − a2

10. Z

dx √ . x x2 − a2

11.

3.2.3

Z

x2 − a2 12. dx. x Z p 13. x a2 − x2 dx.

14.

x

p a2 + x2 dx.

x

p x2 − a2 dx.

Z 15.

Integration by Parts

By the fundamental theorem of calculus, the integration by parts formula (3.2.4) for the antiderivatives gives the integration by parts formula for the Riemann integral. Theorem 3.2.3 (Integration by Parts). Suppose F (x) and G(x) are continuous on [a, b] and differentiable on (a, b). Suppose f (x) = F 0 (x) and g(x) = G0 (x) are integrable on [a.b]. Then Z b Z b f (x)G(x)dx. (3.2.7) F (x)g(x)dx = F (b)G(b) − F (a)G(a) − a

a

Note that although f (x) and g(x) are not defined at a and b, arbitrary numbers may be assigned as the values at the two points. By Example 3.1.12, this does not affect the integrability and the integral. The conditions in the theorem are included to make sure the antiderivative can be used to compute the integrals. The condition is weakened in Exercise 3.2.57. The reason behind the weakened condition will be revealed in Theorem 3.3.6. Finally, similar to (3.2.5), the integration by parts formula can also be written as Z b Z b F (x)dG(x) = F (b)G(b) − F (a)G(a) − G(x)dF (x). (3.2.8) a

a

Example 3.2.17. Suppose f (x) has integrable (n + 1)-st order derivative. Then Z x Z x 0 f (x) = f (x0 ) + f (t)dt = f (x0 ) + f 0 (t)d(t − x) x0 x0 Z x 0 0 = f (x0 ) + f (x)(x − x) − f (x0 )(x0 − x) − (t − x)f 00 (t)dt x0 Z 1 x 00 = f (x0 ) + f 0 (x0 )(x − x0 ) − f (t)d(t − x)2 2 x0 Z 1 00 1 x 0 2 = f (x0 ) + f (x0 )(x − x0 ) + f (x0 )(x − x0 ) − (t − x)2 f 000 (t)dt 2 2 x0 = ··· 1 1 = f (x0 ) + f 0 (x0 )(x − x0 ) + f 00 (x0 )(x − x0 )2 + · · · + f (n) (x0 )(x − x0 )n 2! n! Z x n 1 + (−1) (t − x)n f (n+1) (t)dt. n! x0 This gives the integral form 1 Rn (x) = n!

Z

x

(x − t)n f (n+1) (t)dt.

x0

of the remainder of the Taylor expansion.

(3.2.9)

3.2. ANTIDERIVATIVE

171

Exercise 3.2.25 (Jean Bernoulli3 ). Suppose f (t) has continuous n-th order derivative on [0, x]. Prove that Z

x

x2 xn 1 f (t)dt = xf (x)− f 0 (x)+· · ·+(−1)n−1 f (n−1) (x)+(−1)n 2! n! n!

0

Z

x

tn f (n) (t)dt.

0

Exercise 3.2.26. Suppose u(x) and v(x) have continuous n-th order derivative on [a, b]. Prove that Z

b

uv

(n)

Z b h ix=b (n−1) 0 (n−2) n−1 (n−1) n u(n) vdx. dx = uv −uv + · · · + (−1) u v +(−1) x=a

a

a

x

Z

(x − t)n f (n+1) (t)dt to prove the integral form (3.2.9)

Then apply the formula to x0

of the remainder.

Exercise 3.2.27. Suppose f 0 (x) is integrable and limx→+∞ f 0 (x)dx = 0. Prove Z 1 b that limb→+∞ f (x) sin xdx = 0. Moreover, extend the result to high order b a derivatives.

3.2.4

Change of Variable

By the fundamental theorem of calculus, the change of variable formula (3.2.6) for the antiderivatives gives the change of variable formula for the Riemann integral. Theorem 3.2.4 (Change of Variable). Suppose φ(x) is differentiable, with φ0 (x) integrable on [a, b]. Suppose f (y) is continuous on φ([a, b]). Then φ(b)

Z

b

Z

f (φ(x))φ0 (x)dx.

f (y)dy = φ(a)

(3.2.10)

a

Similar to the integration by parts, the differential notation can be used in the change of variable formula to get Z

φ(b)

Z f (y)dy =

φ(a)

b

f (φ(x))dφ(x).

(3.2.11)

a

The formulation makes the meaning of the change of variable rather clear. In particular, the change should also be made for the variable y inside the differential dy. This suggests again the usefulness of including the differential notation in the integration. Z t Proof. By Theorem 3.2.1, the continuity of f (y) implies that F (t) = f (y)dy φ(a)

satisfies F 0 (t) = f (t). Since φ is differentiable, by the chain rule, we have F (φ(x))0 = F 0 (φ(x))φ0 (x) = f (φ(x))φ0 (x). 3

Jean Bernoulli, born 1667 and died 1748 in Basel (Switzerland).

172

CHAPTER 3. INTEGRATION

Since f and φ are continuous, the composition f (φ(x)) is continuous and is therefore integrable. Moreover, φ0 (x) is assumed to be integrable. Therefore the product f (φ(x))φ0 (x) is integrable, and we may apply Theorem 3.2.2 to get Z b

f (φ(x))φ0 (x)dx.

F (φ(b)) − F (φ(a)) = a

By the definition of F (t), this is the formula (3.2.10). The following result shows that the change of variable formula also holds under more strict condition on φ(x) and less strict condition on f (y). However, the result cannot be proved by applying the fundamental theorem of calculus. Riemann sum has to be used. Theorem 3.2.5 (Change of Variable). Suppose φ(x) is increasing and differentiable, with φ0 (x) integrable on [a, b]. Suppose f (y) is integrable on [φ(a), φ(b)]. Then Z

φ(b)

Z f (y)dy =

φ(a)

b

f (φ(x))φ0 (x)dx.

a

Proof. Since φ0 and f are integrable, they are both bounded. Assume |φ0 (x)| < A and |f (y)| < B for all x ∈ [a, b] and y ∈ [φ(a), φ(b)]. Let P : a = x0 < x1 < x2 < · · · < xn = b be a partition of [a, b]. Denote by φ(P ) : φ(a) = φ(x0 ) ≤ φ(x1 ) ≤ φ(x2 ) ≤ · · · ≤ φ(xn ) = φ(b) the corresponding partition of [φ(a), φ(b)]. Note that although it might happen that φ(xi−1 ) = φ(xi ) for some i, it is not hard to see that the definition of Riemann sum is not changed if some equality in the partition is allowed. The assumption |φ0 (x)| < A implies that kφ(P )k ≤ AkP k. For a Riemann sum X S(P, f (φ(x))φ0 (x)) = f (φ(x∗i ))φ0 (x∗i )(xi − xi−1 ) of f (φ(x))φ0 (x) with respect to the partition P , we consider a corresponding Riemann sum X S(φ(P ), f (y)) = f (φ(x∗i ))(φ(xi ) − φ(xi−1 )) X = f (φ(x∗i ))φ0 (x∗∗ i )(xi − xi−1 ) of f (x) with respect to the partition φ(P ), where the second equality is from the mean value theorem. Then |S(P, f (φ(x))φ0 (x)) − S(φ(P ), f (y))| X X ≤ |f (φ(x∗i ))||φ0 (x∗i ) − φ0 (x∗∗ )|(x − x ) ≤ B ω[xi−1 ,xi ] (φ0 )∆xi . i i−1 i

3.2. ANTIDERIVATIVE

173

Since φ0 and f are integrable, for any > 0, there are δ1 , δ2 > 0, such that kP k < δ1 and kφ(P )k ≤ δ2 imply Z φ(b) X f (y)dy < . ω[xi−1 ,xi ] (φ0 )∆xi < , S(φ(P ), f (y)) − φ(a)

δ2 Combined with kφ(P )k ≤ AkP k, we conclude that kP k < min δ1 , A implies Z φ(b) f (y)dy < B + . S(P, f (φ(x))φ0 (x)) − φ(a)

This proves that f (φ(x))φ0 (x) is integrable and the change of variable formula holds. Example 3.2.18. For the special case φ(x) = x + c, we get Z

b+c

b

Z f (x)dx =

a+c

f (x + c)dx. a

For the special case φ(x) = cx, c 6= 0, we get Z

cb

b

Z f (x)dx = c

ca

f (cx)dx. a

Z

π

x sin xdx , we introduce x = π − t. 2 0 1 + cos x Z π Z π (π − t) sin(π − t)dt (π − t) sin tdt sin tdt = =π − I. 2 2 2 1 + cos (π − t) 1 + cos t 0 0 1 + cos t

Example 3.2.19. To compute I = Z

0

I=− π

Thus Z Z Z π π sin tdt π π d cos t π −1 dx I= =− =− 2 0 1 + cos2 t 2 0 1 + cos2 t 2 1 1 + x2 π π2 = (arctan 1 − arctan(−1)) = . 2 4 Exercise 3.2.28. Compute the integrals. Z 1 Z 2 xdx x2 1. . xe dx. 2. 2 0 0 1+x

Z 3. 1

Exercise 3.2.29. Suppose f (x) is integrable on [−a, a]. Z a Z 1. If f (x) is an even function, prove that f (x)dx = 2 −a

Z

3

dx √ . x x+1

a

f (x)dx.

0

a

2. If f (x) is an odd function, prove that

f (x)dx = 0. −a

Exercise 3.2.30. Suppose f (x) is a continuous function on [−a, a]. Prove the following are equivalent.

174

CHAPTER 3. INTEGRATION

1. f (x) is an odd function. Z b 2. f (x)dx = 0 for any 0 0 to prove the property 1 t log x + log y = log(xy) of the logarithmic function. Exercise 3.2.31. Use the formula log x =

Exercise 3.2.32. Suppose f (x) is integrable on an open interval containing [a, b] and is continuous at a and b. Prove that Z b f (x + h) − f (x) lim dx = f (b) − f (a). h→0 a h The result should be compared with the equality Z b Z b f (x + h) − f (x) dx = f 0 (x)dx = f (b) − f (a), lim h→0 h a a which by Theorem 3.2.2 holds when f (x) is differentiable and f 0 (x) is integrable.

3.2.5

Additional Exercise

Estimation of Integral The estimations on the integrals in Exercises 3.1.51, 3.1.53 and 3.1.54 may be extended by using high order derivatives. Exercise 3.2.33. Suppose f (x) is continuous on [a, b] and has n-th order derivative on (a, b). By either integrating theZTaylor expansion at a or considering the Taylor x

expansion of the function F (x) =

f (t)dt, prove that a

Z

b

f (x)dx = f (a)(b−a)+ a

f (n−1) (a) f (n) (c) f 0 (a) (b−a)2 +· · ·+ (b−a)n + (b−a)n+1 , 2! n! (n + 1)! (3.2.12)

where a < c < b. Exercise 3.2.34. Suppose f (x) is continuous on [a, b] and has n-th order derivative on (a, b). Prove that Z n−1 b X f (k) (a) + (−1)k f (k) (b) ω(a,b) (f (n) ) k+1 f (x)dx − ≤ (b − a) (b − a)n+1 (n + 1)!2n+1 a (k + 1)!2k k=0 (3.2.13) for odd n and Z b n−1 X f (k) (a) + (−1)k f (k) (b) f (n) (c) (b−a)k+1 + f (x)dx = (b−a)n+1 (3.2.14) n k (n + 1)!2 (k + 1)!2 a k=0

for even n and some a < c < b.

3.2. ANTIDERIVATIVE

175

Exercise 3.2.35. Suppose f (x) is continuous on [a, b] and has 2n-th order derivative on (a, b). Prove Z

b

f (x)dx = a

n−1 X k=0

1 f (2k) (2k + 1)!22k

a+b 2

(b − a)2k+1 +

f (2n) (c) (b − a)2n+1 . (2n + 1)!22n (3.2.15)

for some a < c < b.

Estimation of Special Riemann Sums Consider the partition of [a, b] by evenly distributed partition points xi = i a + (b − a). We can form the Riemann sums by choosing the left, right and n middle points of the partition intervals. b−aX f (xi−1 ), n X b−aX f (xi ), Sright,n (f ) = f (xi )∆xi = n X xi−1 + xi b−aX xi−1 + xi Smiddle,n (f ) = f f ∆xi = . 2 n 2 Sleft,n (f ) =

X

f (xi−1 )∆xi =

The question is how close these are to the actual integral. Exercise 3.2.36. Suppose f (x) is continuous on [a, b] and differentiable on (a, b), such that f 0 (x) is integrable. Use the estimation in Exercise 3.1.51 to prove that Z b 1 lim n f (x)dx − Sleft,n (f ) = (f (b) − f (a))(b − a), n→∞ 2 a Z b 1 lim n f (x)dx − Sright,n (f ) = − (f (b) − f (a))(b − a). n→∞ 2 a Exercise 3.2.37. Suppose f (x) is continuous on [a, b] and has second order derivative on (a, b), such that f 00 (x) is integrable on [a, b]. Use the estimation in Exercise 3.1.54 to prove that Z b 1 2 lim n f (x)dx − Smiddle,n (f ) = (f 0 (b) − f 0 (a))(b − a)2 . n→∞ 24 a Exercise 3.2.38. Use Exercises 3.2.33, 3.2.34 and 3.2.35 to derive higher order Z b approximation formulae for the integral f (x)dx. a

Average of Functions The average of an integrable function on [a, b] is 1 Av[a,b] (f ) = b−a

Z

b

f (x)dx.

(3.2.16)

a

Exercise 3.2.39. Prove the properties of the average. 1. Av[a+c,b+c] (f (x + c)) = Av[a,b] (f (x)), Av[λa,λb] (f (λx)) = Av[a,b] (f (x)).

176

CHAPTER 3. INTEGRATION

2. If c = λa + (1 − λ)b, then Av[a,b] (f ) = λAv[a,c] (f ) + (1 − λ)Av[c,b] (f ). In particular, Av[a,b] (f ) lies between Av[a,c] (f ) and Av[c,b] (f ). 3. f ≥ g implies Av[a,b] (f ) ≥ Av[a,b] (g). 4. If f (x) is continuous, then Av[a,b] (f ) = f (c) for some a < c < b. Exercise 3.2.40. Suppose f (x) is integrable on [0, a] for any a > 0. Consider the Z 1 x f (t)dt. average function g(x) = Av[0,x] (f ) = x 0 1. Prove that if limx→+∞ f (x) = l, then limx→+∞ g(x) = l (compare Exercise 1.1.36). 2. Prove that if f (x) is increasing, then g(x) is also increasing. 3. Prove that if f (x) is convex, then g(x) is also convex. Exercise 3.2.41. For a weight function λ(x) defined in (3.1.21) in Exercise 3.1.47, the weighted average of an integrable function f (x) is Z b 1 λ(x)f (x)dx (3.2.17) Avλ[a,b] (f (x)) = b−a a Can you extend the properties of average in Exercise 3.2.39 to weighted average?

Integration of Periodic Function and Riemann-Lebesgue Lemma Suppose f (x) is a periodic integrable function with period T . Z

a+T

Z

T

Exercise 3.2.42. Prove that

f (x)dx = f (x)dx. a 0 Z Z 1 b 1 T Exercise 3.2.43. Prove that limb→+∞ f (x)dx = f (x)dx. This says that b a T 0 the limit of the average on bigger and bigger intervals is the average on an interval of the period length. Exercise 3.2.44 (Riemann-Lebesgue Lemma). Suppose f (x) is a periodic integrable function with period T and g(x) is integrable on [a, b]. Prove that Z b Z Z b 1 T lim f (tx)g(x)dx = f (x)dx g(x)dx. t→∞ a T 0 a

Trigonometric Integration Exercise 3.2.45. Let a, b, n be given. 1. Prove that there are A, B, C, such that Z Z dx A sin x + B cos x dx = +C . n n−1 (a sin x + b cos x) (a sin x + b cos x) (a sin x + b cos x)n−2 2. Prove that if |a| = 6 |b|, then there are A, B, C, such that Z Z Z dx A sin x dx dx = +B +C . n n−1 n−1 (a + b cos x) (a + b cos x) (a + b cos x) (a + b cos x)n−2 Z dx Exercise 3.2.46. Compute the antiderivative of by using cos(x + a) cos(x + b) tan(x + a) − tan(x + b) =

sin(a − b) . cos(x + a) cos(x + b)

Use the similar idea to compute the following antiderivatives.

3.2. ANTIDERIVATIVE Z 1.

177

dx . sin(x + a) cos(x + b)

dx . sin x − sin a

Z

dx . cos x + cos a

3.

Z 2.

Z

tan(x + a) tan(x + b)dx.

4.

Wallis4 Formula Z For p, q ≥ 0, define w(p, q) =

1

1

(1 − x p )q dx.

0

q Exercise 3.2.47. Prove w(p, q) = w(p, q − 1). p+q 1

Exercise 3.2.48. Prove w(p, q) = w(q, p) by changing the variable y = (1 − x p )q . m!n! for natural numbers m and n. Exercise 3.2.49. Prove w(m, n) = (m + n)! Exercise 3.2.50. Prove w is strictly decreasing in p and in q. π 1 1 = . Exercise 3.2.51. Show that w , 2 2 4 1 1 Exercise 3.2.52. Use w(n + 1, n + 1) < w n + , n + < w(n, n) to prove 2 2 22 · 42 · 62 · · · (2n)2 π 22 · 42 · 62 · · · (2n)2 · (2n + 2) < < . 1 · 32 · 52 · · · (2n − 1)2 · (2n + 1) 2 1 · 32 · 52 · · · (2n − 1)2 · (2n + 1)2 Exercise 3.2.53. Prove that in the inequality in Exercise 3.2.52, the left is increasπ as the limit. This leads to Wallis ing, the right is decreasing, and both have 2 infinite product formula π 2 2 3 4 5 6 = · · · · · · ··· . 2 1 3 4 5 6 7

Integration by Parts under Weaker Condition Suppose f (x) and g(x) are integrable on [a, b], and Z x Z x F (x) = f (t)dt, G(x) = g(t)dt. a

a

The subsequent exercises will prove the integration by parts formula (3.2.7) under the only assumption that f and g are integrable. Note that up to adding constants to F and G, the assumption is strictly weaker than the one in Theorem 3.2.3. For a partition P : a = x0 < x1 < · · · < xnP= b and a choice of x∗i = xi , define the “partial Riemann sums” Si (P, f ) = ik=1 f (xk )∆xk for 1 ≤ i ≤ n. Exercise Z3.2.54. The partial Riemann sum is the Riemann sum for the integral xi

F (xi ) =

f (t)dt and should approximate the integral. Then by replacing F (xi ) P with the approximation Si (P, f ), the sum ni=1 Si (P, f )g(xi )∆xi should approximate the Riemann sum S(P, F g). By using the estimation (3.1.14) in Exercise 3.1.16, prove the estimation ! n n X X Si (P, f )g(xi )∆xi ≤ ω[xi−1 ,xi ] (f )∆xi S(P, |g|). S(P, F g) − a

i=1

4

i=1

John Wallis, born 1616 in Kent (England), died 1703 in Oxford (England).

178

CHAPTER 3. INTEGRATION

Exercise 3.2.55. Exercise 3.2.54 provides approximations for the Riemann sums of of F g and f G. Prove the sum of the two approximations is n X

Si (P, f )g(xi )∆xi +

i=1

n X

f (xi )∆xi Si (g, P ) = S(P, f )S(P, g) +

i=1

n X

f (xi )g(xi )∆x2i .

i=1

The geometric meaning of the eqality is given in Figure 4.1. Exercise 3.2.56. Prove the integration by parts formula Z b (F (x)g(x) + f (x)G(x))dx = F (b)G(b) a

for the special case F (a) = G(a) = 0. Exercise 3.2.57. Suppose f (x) and g(x) are integrable on [a, b] and Z x Z x F (x) = f (t)dt + c, G(x) = g(t)dt + d, a

a

where c and d are some constants. Prove the integration by parts formula (3.2.7).

Second Integral Mean Value Theorem The first integral mean value theorem appeared in Exercise 3.1.15. Suppose f (x) is differentiable with integrable f 0 (x) on [a, b]. Suppose g(x) is continuous on [a, b]. Z Exercise 3.2.58. Suppose f (x) ≥ 0 and is decreasing. Suppose G(x) = satisfies m ≤ G(x) ≤ M for x ∈ [a, b]. Prove that Z b f (a)m ≤ f (x)g(x)dx ≤ f (a)M.

x

g(x)dx a

a

Then use this to prove that Z b

Z f (x)g(x)dx = f (a)

a

c

g(x)dx a

for some a < c < b. What if f (x) is increasing? Exercise 3.2.59. Suppose f (x) is monotone. Prove that Z b Z c Z b f (x)g(x)dx = f (a) g(x)dx + f (b) g(x)dx a

a

c

for some a < c < b.

Young Inequality Exercise 3.2.60. Suppose f (x) is differentiable for x ≥ 0 and satisfies f 0 (x) > 0, f (0) = 0. Prove that for any a, b > 0, we have the Young inequality Z a Z b f (x)dx + f −1 (y)dy ≥ ab. (3.2.18) 0

0

Then apply to the function xp−1 for p > 1 and derive the Young inequality in Exercise 2.2.40. The inequality actually holds under weaker condition. See Exercise 3.3.30.

3.3. TOPICS ON INTEGRATION

3.3 3.3.1

179

Topics on Integration Integration of Rational Functions

It is a basic algebraic fact that any rational function is a sum of functions A Bx + C of the forms Axn , and 2 , where x2 + ax + b has no n (x − a) (x + ax + b)n real roots. Thus the antiderivative of the rational function is the sum of the antiderivatives of these three types of functions. The antiderivative of the first two types of functions are very simple. To compute the antiderivative of the last type, we complete the square and get x2 + ax + b = (x + α)2 + β 2 . Then Z Z Z (x + α)dx dx (Bx + C)dx =b +c , 2 n 2 2 n (x + ax + b) ((x + α) + β ) ((x + α)2 + β 2 )n where b = B, c = C − Bα. The first antiderivative is Z

 1   log(x2 + ax + b) + C (x + α)dx = 2 1 ((x + α)2 + β 2 )n  +C − 2 2(n − 1)(x + ax + b)n−1

if n = 1 . if n > 1

The change of variable βt = x + α reduces the computation of the second Z dx antiderivative into . This may be computed by the last recursive 2 (t + 1)n relation in Exercise 3.2.19. Example 3.3.1. To compute the antiderivative of the rational function f (x) = x4 + x3 + x2 − 2x + 1 , we find the factorization x5 +x4 −2x3 −2x2 +x+1 = x5 + x4 − 2x3 − 2x2 + x + 1 (x − 1)2 (x + 1)3 of the denominator and write f (x) =

A1 A2 B1 B2 B3 + + + + . 2 2 x − 1 (x − 1) x + 1 (x + 1) (x + 1)3

This is the same as x4 + x3 + x2 − 2x + 1 = A1 (x − 1)(x + 1)3 + A2 (x + 1)3 + B1 (x − 1)2 (x + 1)2 + B2 (x − 1)2 (x + 1) + B3 (x − 1)2 . Then we get the following equations. 2 = 8A2 ,

(at x = 1)

4 = 4B3 ,

(at x = −1) d ( at x = 1) dx d ( at x = −1) dx (coefficient of x4 )

7 = 8A1 + 12A2 , −5 = 4B2 − 4B3 , 1 = A1 + A2 + B 1 + B 2 + B 3 .

180

CHAPTER 3. INTEGRATION

It is easy to solve the equations and get Z Z 1 1 1 1 1 f (x)dx = + + dx + − 2(x − 1) 4(x − 1)2 2(x + 1) 4(x + 1)2 (x + 1)3 1 1 1 1 1 = log |x − 1| − + log |x + 1| + − +C 2 4(x − 1) 2 4(x + 1) 2(x + 1)2 1 x + C. = log |x2 − 1| − 2 (x − 1)(x + 1)2 Example 3.3.2. To compute the antiderivative of the rational function f (x) = 4x3 − x2 + 2x , we find the factorization (x−1)(2x−1)(2x2 + 6 5 8x + 4x − 4x4 − 8x3 − 2x2 + x + 1 2x + 1)2 of the denominator and write f (x) =

A2 B 1 x + C1 B 2 x + C2 A1 . + + 2 + x − 1 2x − 1 2x + 2x + 1 (2x2 + 2x + 1)2

This is the same as 4x3 − x2 + 2x = A1 (2x − 1)(2x2 + 2x + 1)2 + A2 (x − 1)(2x2 + 2x + 1)2 + (B1 x + C1 )(x − 1)(2x − 1)(2x2 + 2x + 1) + (B2 x + C2 )(x − 1)(2x − 1). By various methods (comparing coefficients of xk , evaluating at specific values of x, taking derivative and evaluating at specific values of x, etc.), we get a system of equations. Solving the system, we get f (x) =

1 2 1 x − − + . 2 2 5(x − 1) 5(2x − 1) 5(2x + 2x + 1) (2x + 2x + 1)2

We have Z

dx x−1 Z dx 2x − 1 Z dx 2x2 + 2x + 1 Z xdx 2 (2x + 2x + 1)2

= log |x − 1| + C, Z 1 d(2x − 1) 1 = = log |2x − 1| + C, 2 2x − 1 2 Z d(2x + 1) = = arctan(2x + 1) + C, (2x + 1)2 + 1 Z Z 4xdx (tan t − 1)d tan t = = 2 2 ((2x + 1) + 1) (tan2 t + 1)2 Z 1 = (sin t cos t − cos2 t)dt = (sin2 t − sin t cos t − t) 2 1 = (cos2 t(tan2 t − tan t) − t) + C 2 (2x + 1)2 − (2x + 1) 1 = − arctan(2x + 1) + C 2(2x + 1)2 + 1 2 x(2x + 1) 1 = 2 − arctan(2x + 1) + C. 2x + 2x + 1 2

Thus we conclude Z x−1 1 − 7 arctan(2x + 1) + x(2x + 1) + C. f (x)dx = log 5 2x − 1 10 2x2 + 2x + 1 Exercise 3.3.1. Compute the antiderivatives of rational functions.

3.3. TOPICS ON INTEGRATION Z 1. Z 2. Z 3. Z 4. Z 5. Z 6. Z 7.

181 Z

dx . 2 x (1 + x)

8.

dx . x(1 + x)(2 + x)

Z 9.

x4 dx . 1 − x2

10.

xdx . x2 + x − 2

11.

x5 dx . x2 + x − 2

12.

dx . 1 − x4

13.

(x −

xdx . + 2x + 2)

1)2 (x2

Z

(x4 + 4x3 + 4x2 + 4x + 4)dx . x(x + 2)(x2 + 2x + 2)2

Z

(2x2 + 3)dx . x3 + x2 − 2

Z

dx . +4

x4 Z

x2 dx . (1 − x4 )2

dx . (1 − x4 )2

xdx . x3 + 1

Z 14.

(x +

1)(x2

dx . + 1)(x3 + 1)

By suitable change of variables, many antiderivatives may be converted to the antiderivatives of rational functions. x Example 3.3.3. By the change t = tan , we have 2 Z Z 2t 1 − t2 2dt R(sin x, cos x)dx = R , . 2 2 1+t 1+t 1 + t2 If R is a rational function, then the computation is reduced to the antiderivative of some rational function. As a concrete example, for a 6= 0, we have Z Z Z dx 2dt 2dt !. = = 2 2t a + sin x 1 1 2 a+ (1 + t ) a t+ +1− 2 1 + t2 a a r 1 1 If |a| > 1, then with t + = 1 − 2 u, we further have a a r 1 x Z Z 2 1 − 2 du a tan + 1 dx 2 a √ = = arctan √ 2 + C. 2 1 a + sin x sign(a) a − 1 a2 − 1 2 a 1 − 2 (u + 1) a If |a| = 1, then Z

dx = a + sin x

Z

2dt −2 −2 = +C = + C. x 2 a(t + a) a(t + a) a tan + 1 2

If |a| < 1, then r r 1 2 1 1 1 1 1 t+ + 1 − 2 = (t − α)(t − β), α = − + − 1, β = − − − 1, 2 a a a a a a2

182

CHAPTER 3. INTEGRATION

and we further have Z Z t − α dx 2dt 2 +C = = log a + sin x a(t − α)(t − β) a(α − β) t−β a tan x + 1 − √1 − a2 1 2 + C. =√ log √ x 2 1 − a2 a tan + 1 + 1 − a 2 x Example 3.3.4. Although the change t = tan can always be used for rational 2 functions of trigomonetric functions, it may not be the most efficient one. For example, by t = cos x, we have Z Z Z dx −dt −d cos x = = . 2 (a + cos x) sin x (a + t)(1 − t2 ) (a + cos x) sin x If a 6= ±1, then −1 1 1 1 = 2 − + , 2 (a + t)(1 − t ) (a − 1)(t + a) 2(a − 1)(t + 1) 2(a + 1)(t − 1) and Z

log |t + a| log |t + 1| log |t − 1| dx = − + +C (a + cos x) sin x a2 − 1 2(a − 1) 2(a + 1) log | cos x + a| log | cos x + 1| log | cos x − 1| = − + + C. a2 − 1 2(a − 1) 2(a + 1) r

Example 3.3.5. By the change t = r

Z R x,

n

ax + b cx + d

!

n

ax + b , we have cx + d

Z dx =

R

αtn + β (αδ − βγ)ntn−1 , t dt γtn + δ (γtn + δ)2

Z r Applying the idea to the antiderivative Then x =

x dx, we introduce t = (x + 1)3

r

x . x+1

t2 2tdt , dx = , and 2 1−t (1 − t2 )2

Z r

Z 1 2tdt 1 − − 2 dt |1 − t |t =± (1 − t2 )2 t+1 t−1 t + 1 − 2t + C = ± log t − 1 r √ √ x = ± 2 log( x + x + 1) − 2 + C. x+1

x dx = (x + 1)3

Z

2

The sign is positive if x > 0, and is negative if x < −1. Z r x e −1 Example 3.3.6. To compute the antiderivative dx, we introduce t = ex . ex + 1 r Z t − 1 dt x Then dt = e dx = tdx, and the antiderivative becomes . By further t+1 t

3.3. TOPICS ON INTEGRATION r introducing u = Z r

t−1 = t+1

ex − 1 dx = ex + 1

r

183

ex − 1 , we get ex + 1

Z 1 4u2 du 1 2 du = − − (1 + u2 )(1 − u2 ) u + 1 u − 1 u2 + 1 u + 1 − 2 arctan u + C = log u − 1 r √ √ ex − 1 = 2 log( ex − 1 + ex + 1) − 2 arctan + C. ex + 1 Z

Exercise 3.3.2. Compute the antiderivatives. Z Z dx (1 + sin x)dx 7. . 1. . a sin x + b cos x + c sin x(1 + cos x) Z Z √ dx 8. . tan xdx. 2. 2 2 a sin x + b2 cos2 x Z r Z 1−x (sin x + cos x)dx 9. dx. 3. . 1+x sin x(sin x − cos x) r Z Z dx 1 1−x . 4. 10. dx. a + cos x x2 1 + x Z Z dx dx √ √ . 11. 5. . 2 x (a + cos x) sin x 1 + e + 1 − ex Z Z dx dx √ 12. . 6. . a + tan x ax + b Exercise 3.3.3. Suppose R is a rational function. Suppose r, s are rational numbers such that r + s is an integer. Find a suitable change of variable, such that Z

R(x, (ax + b)r , (cx + d)s )dx is changed into the antiderivative of a rational func-

tion. Exercise 3.3.4. Suppose r, s, t are rational numbers. Z For each of the following cases, find a suitable change of variable, such that xr (a + bxs )t dx is changed into the antiderivative of a rational function. 1. t is an integer. 2.

r+1 is an integer. s

3.

r+1 + t is an integer. s

A theorem by Chebyshev5 says that these are the only cases that the antiderivative can be changed to the antiderivative of a rational function. 5 Pafnuty Lvovich Chebyshev, born 1821 in Okatovo (Russia), died 1894 in St Petersburg (Russia). Chebyshev’s work touches many fields of mathematics, including analysis, probability, number theory and mechanics. Chebyshev introduced his famous polynomials in 1854 and later generalized to the concept of orthogonal polynomials.

184

CHAPTER 3. INTEGRATION

3.3.2

Improper Integration

So far the integrability was defined only for bounded functions on bounded intervals. When the functions or the intervals are unbounded, we may integrate on the bounded part and then take limit. Z c

Suppose f (x) is integrable on [a, c] for any a < c < b. If limc→b− f (x)dx a Z b f (x)dx converges and write converges, then we say the improper integral a

Z a

b

Z f (x)dx = lim− c→b

c

f (x)dx. a

The definition also applies to the case b = +∞, and similar definition can be made when f (x) is integrable on [c, b] for any a < c < b. By Exercise 3.1.8, if f (x) is a bounded function on bounded interval [a, b] and is integrable on [a, c] for any a < c < b, then it is integrable on [a, b]. Therefore an integral on a bounded interval becomes improper only if it is not bounded. Z Example 3.3.7. For any α 6= −1, the limit lima→0+

1

xα dx = lim

a→0+

a

1 − aα+1 Zα + 1 1

xα dx

converges if and only if α + 1 > 0. Therefore the improper integral 0

(which is in fact improper only when α < 0) converges if and only if α > −1, and Z 1 1 xα dx = . α+1 0 Z a aα+1 − 1 On the other hand, the limit lima→+∞ xα dx = lim converges a→+∞ α + 1 1 Z +∞

if and only if α + 1 < 0. Therefore the improper integral xα dx converges if 1 Z +∞ −1 α . and only if α < −1, and x dx = α+1 1 Z +∞ dx Example 3.3.8. The integral is improper at −∞ and +∞. Since 1 + x2 −∞ Z

b

dx π π lim = lim (arctan b − arctan a) = − − = π, a→−∞,b→+∞ a 1 + x2 a→−∞,b→+∞ 2 2 Z +∞ dx the improper integral converges, and = π. 2 −∞ 1 + x Exercise 3.3.5. Determine the convergence of the improper integrals and evaluate the convergent ones. Z ∞ Z +∞ Z +∞ dx α 1. . 3. x dx. 5. ax dx. x(log x)α 2 0 0

Z 2. 0

1

dx . x(− log x)α

Z 4.

1

Z log xdx.

0

0

6. −∞

ax dx.

3.3. TOPICS ON INTEGRATION Z

1

7. −1

Z 8. 2

Z

dx . 1 − x2

+∞

1

√

9. −1

dx . 1 − x2

Z 10.

π 2

185 Z

dx . 1 − x2

+∞

11.

e−x sin xdx.

0

Z

+∞

12.

tan xdx.

e−x | sin x|dx.

0

0

Exercise 3.3.6. Suppose f (x) is continuous for x ≥ 0, and limx→+∞ f (x) = l. Z +∞ f (ax) − f (bx) b Prove that for any a, b > 0, dx = (f (0) − l) log . x a 0

The convergent improper integrals have properties similar to the normal integrals. For example, the integration by parts and the change of variable still hold. The convergence of improper integrals can be determined by the Cauchy criterion. The following is the criterion stated for the case that the integral becomes improper at +∞. Proposition 3.3.1 (Cauchy Criterion). Suppose fZ(x) is integrable on [a, b] +∞ f (x)dx converges if for any b ∈ [a, +∞). Then the improper integral a

and only if for any > 0, there is N , such that Z c b, c > N =⇒ f (x)dx < . b

A consequence of the criterion is the following useful method for determining the convergence, again stated for integrals that are improper at +∞. Proposition 3.3.2 (Comparison Test). SupposeZ f (x) and g(x) are integrable +∞ g(x)dx converges, then on [a, b] for any b > a. If |f (x)| ≤ g(x) and a Z +∞ f (x)dx also converges. a

Proof. of Z +∞ For any > 0, applying the Cauchy criterion to the convergence Z c g(x)dx tells us that there is N , such that b, c > N implies g(x)dx < a

. Then b, c > N implies Z c Z c Z c |f (x)|dx ≤ g(x)dx < . f (x)dx ≤ b

b

b

b

Z Thus the Cauchy criterion for the convergence of

+∞

f (x)dx is verified. a

A special case of the comparison test is that if both f (x) and Z +∞g(x) are f (x) positive, and limx→+∞ = l exists, then the convergence of g(x)dx g(x)Z a +∞

f (x)dx. If l 6= 0, then the two convergences

implies the convergence of a

are equivalent. More generally, if c1 g(x) ≤ f (x) ≤ c2 g(x) for some c1 , c2 > 0, then the two convergences are equivalent.

186

CHAPTER 3. INTEGRATION

p(x) be a rational function. Let m and n be the q(x) r(x) degrees of the polynomials p(x) and q(x). Then limx→∞ m−n converges to a x nonzero number (the quotient of the leading coefficients). By changing the sign of the Zleading coefficients if necessary, we may assume p(x) and q(x) are positive. Example 3.3.9. Let r(x) =

+∞

xm−n dx converges if and only if m − n ≤ −2, by the comparison test, a Z +∞ we conclude that r(x)dx converges if and only if m − n ≤ −2. Since

a

Z

2

Example 3.3.10. The integral 0

sin xdx p is improper at 1 (in fact at both 1+ |x(x − 1)|

sin x and Since limx→1 p = sin 1 6= 0, the convergence of the improper integral |x| Z 2 dx p is equivalent to the convergence of . The later converges (on both |x − 1| 0 Z 1 Z 2 dx sin xdx √ converges. Therefore p sides of 1) because converges. x |x(x − 1)| 0 0 Z 1 √ dx √ converges, by the comparExample 3.3.11. Since limx→0+ x log x = 0 and x 0 Z 1 ison test, the improper integral log xdx converges. Moreover, by 0 < log sin x < 0 Z π 2 log x for 0 < x < 1 and the comparison test, the improper integral log sin xdx 1− ).

0

also converges. To compute the improper integral, we change the variable. Z

π 2

Z π 4 x x dx = 2 log 2 sin cos log(2 sin x cos x)dx 2 2 0 0 Z π Z π 4 4 π log 2 + log sin xdx + log cos xdx 2 0 0 Z π Z π 4 4 π log 2 + 2 log sin xdx − 2 log sin xdx π 2 0 2 Z π 2 π log 2 + 2 log sin xdx. 2 0

Z log sin xdx =

0

= = = Z Therefore 0

π 2

π 2

π log sin xdx = − log 2. 4

Z

b

The comparison test always concludes that the integral |f (x)|dx conZ b a verges. In this case, we way that the improper integral f (x)dx absolutely a

converges. By taking g(x) = |f (x)| in the comparison test, we see that the Z b Z b absolute convergence of f (x)dx implies the convergence of f (x)dx. a

a

However, the converse is not necessarily true, as shown by the next example.

3.3. TOPICS ON INTEGRATION

187 Z

Example 3.3.12. Consider the improper integral 1

+∞

sin x dx. The integral of the x

corresponding absolute value function satisfies Z nπ Z nπ sin x 2 dx ≥ 1 | sin x|dx = . x nπ (n−1)π nπ (n−1)π Z a sin x P1 then implies that The divergence of x dx is not bounded. Therefore n 1 Z +∞ sin x dx diverges, and the comparison test cannot be used. x 1 On the other hand, for a, b > 1, we use integration by parts to get Z b Z b Z b sin x 1 cos x sin b sin a dx. dx = − d cos x = − + + x b a x2 a a x a 1 Therefore if b > a > , then Z Z b Z b 1 sin x sin b sin a b cos x 1 1 dx ≤ + + dx ≤ + + dx < 3. 2 2 x b a x b a a a x a Z +∞ sin x By the Cauchy criterion, the improper integral dx converges. x 1

Z

b

Z

b

|f (x)|dx diverges, then we say the imf (x)dx converges but a a Z b proper integral f (x)dx conditionally converges. The idea used in deriving a Z +∞ sin x the (conditional) convergence of the improper integral dx can be x 1 elaborated to become the following tests. If

Proposition 3.3.3 (Dirichlet Test). Suppose f (x) is Zmonotone and satisfies b limx→+∞ f (x) = 0. Suppose there is M , such that g(x)dx < M for all a Z +∞ f (x)g(x)dx converges. b ∈ [a, +∞). Then a 6 Proposition Z +∞ 3.3.4 (Abel Test). Suppose Z +∞f (x) is monotone and bounded. f (x)g(x)dx converges. Suppose g(x)dx converges. Then a

a

Proof. We prove the tests by making the additional assumption that f (x) is 0 differentiable and Z f (x) is integrable on [a, b] for any b ∈ [a, +∞). x

Let G(x) =

g(t)dt. Using integration by parts, for b, c > a, we have a

Z

c

Z

Z f (x)dG(x) = f (b)G(b) − f (c)G(c) −

f (x)g(x)dx = b

c

b

c

G(x)f 0 (x)dx.

b

6 Niels Henrik Abel, born 1802 in Frindoe (Norway), died 1829 in Froland (Norway). In 1824, Abel proved the impossibility of solving the general equation of fifth degree in radicals. Abel also made contributions to elliptic functions. Abel’s name is enshrined in the term abelian, which describes the commutative property.

188

CHAPTER 3. INTEGRATION

Under the assumption of either test, we have |G(x)| ≤ M for a constant M and all x > a. Moreover, f 0 (x) does not change sign. Therefore Z c Z c Z c 0 0 0 G(x)f (x)dx ≤ M |f (x)|dx = M f (x)dx = M |f (b) − f (c)|. b

b

b

Assuming the condition of the Dirichlet test, for any > 0, there is N , such that x > N implies |f (x)| < . Then b, c > N implies Z c f (x)g(x)dx ≤ |G(b)| + |G(c)| + 2M ≤ 4M . b

+∞

Z

f (x)g(x)dx.

This verifies the Cauchy criterion for the convergence of a

Assuming the condition of the Abel test, we know both limx→+∞ f (x) and limx→+∞ f (x)G(x) converge. The Cauchy criterion for both limits tells us that for any > 0, there is N , such that b, c > N implies |f (b) − f (c)| < and |f (b)G(b) − f (c)G(c)| < . Then b, c > N implies Z c f (x)g(x)dx ≤ + M = (M + 1). b

Z

+∞

f (x)g(x)dx.

This also verifies the Cauchy criterion for the convergence of a

The assumption on the differentiability of f (x) can be removed by using Theorem 3.3.6, which is the more general version of the integration by parts in terms of the Riemann-Stieltjes integral. The relevant inequality used for estimation can be found in Exercises 3.3.24 and 3.3.25. The detailed proof is left as Exercise 3.3.28. Exercise 3.3.7. Compute improper integrals. Z π Z +∞ x sin xdx . 1. 3. xn e−x dx. 0 1 − cos x 0 Z 2.

π

Z x log sin xdx.

0

4. 0

Z 5. 0

1

(log x)n dx.

Z

1

xn dx √ . 1−x

+∞

6. −∞

dx . (1 + x2 )n

Exercise 3.3.8. Determine the convergence of improper integrals. If possible, also find out whether the convergence is absolute or conditional. Z π Z ∞ Z +∞ 1 sin x α β 1. x | log x| dx. 4. dx. 7. dx. xα 2 0 sin x 1 Z +∞ Z 1 Z +∞ x sin x α α β x | sin x|dx. 5. 2. x | log x| dx. 8. dx. 2 0 0 −∞ x + 1 Z +∞ Z 1 Z +∞ α x α β 3. x a dx. 6. x (1 − x) dx. 9. sin x2 dx. 0

0

0

3.3. TOPICS ON INTEGRATION

189

Exercise 3.3.9. Prove that Z +∞ Z b 1 +∞ p 2 f ax + dx = f ( x + 4ab)dx, x a 0 0 provided a, b > 0 and both sides converge. b

Z Exercise 3.3.10. Suppose f (x) < g(x) < h(x). Prove that if Z b Z b h(x)dx converge, then g(x)dx converges. a

f (x)dx and a

a

Z

+∞

Exercise 3.3.11. Prove that if f (x) ≥ 0 and

f (x)dx converges, then there is a

an increasing sequence {xn } diverging to +∞, such that limn→∞ f (xn ) = 0. Moreover, prove that in the special case f (x) is monotone, we have limx→+∞ f (x) = 0. Exercise 3.3.12. Suppose f (x) ≥ 0. b

Z

f (x)2 dx converges, prove that

1. If [a, b] is a bounded interval and

Z

+∞

2. If limx→+∞ f (x) = 0 and

Z f (x)dx converges, prove that

a

also converges.

f (x)dx a

a

also converges.

b

Z

+∞

f (x)2 dx

a

+∞ sin xdx Exercise 3.3.13. Prove that lima→+∞ = 0 when α > β > 0. Then xα a Z a 1 1 prove that lima→0 sin dx = 0. a 0 x Exercise 3.3.14. Derive the Abel test from the Dirichlet test.

aβ

Z

Exercise 3.3.15. Can you extend the Dirichlet and Abel tests to other kinds of improper integrals, such as unbounded function on bounded interval?

3.3.3

Riemann-Stieltjes Integration

Let α be a function on a bounded interval [a, b]. The Riemann sum may be extended to the Riemann-Stieltjes7 sum S(P, f, α) =

n X

f (x∗i )(α(xi )

− α(xi−1 )) =

i=1

n X

f (x∗i )∆αi .

(3.3.1)

i=1

We say f has Riemann-Stieltjes integral (or simply Stieltjes integral) I and Z b denote f dα = I, if for any > 0, there is δ > 0, such that a

kP k < δ =⇒ |S(P, f, α) − I| < .

(3.3.2)

When α(x) = x, we get the Riemann integral. 7 Thomas Jan Stieltjes, born 1856 in Zwolle (Netherland), died 1894 in Toulouse (France). He is often called the father of the analytic theory of continued fractions and is best remembered for his integral.

190

CHAPTER 3. INTEGRATION

Example 3.3.13. Suppose f (x) = c is a constant. Then S(P, f, α) = c(α(b)−α(a)). Z b cdα = c(α(b) − α(a)). Therefore a

Example 3.3.14. Suppose α(x) = α0 is a constant. Then ∆αi = 0. Therefore Z b f dα0 = 0. Since f can be arbitrary, we see that Proposition 3.1.2 cannot be a

extended without additional conditions. Example 3.3.15. Suppose a < c c

is a step function, in which the three values are not the same. Then ( f (x∗i )(α+ − α− ) S(P, f, α) = f (x∗i+1 )(α+ − α0 ) + f (x∗i )(α0 − α− )

if xi−1 < c < xi . if c = xi

If f (x) is continuous at c, then it is easy to see that kP k small implies Z b |S(P, f, α) − f (c)(α+ − α− )| small. Therefore f (x)dα = f (c)(α+ − α− ). a

If f (x) is not continuous at c and α+ 6= α− , then we choose P with xi−1 < c < xi . The discontinuity at c tells us that there is > 0, such that no matter how small kP k is, there are x, y satisfying xi−1 < x, y < xi and |f (x)−f (y)| ≥ . Taking x and y as x∗i respectively and keeping all the other x∗j the same, we get two RiemannStieltjes sums. The difference of the two sums is |(f (x) − f (y))(α+ − α− )| ≥ |α+ − α− |. Therefore f (x) is not Riemann-Stieltjes integrable with respect to α. If f (x) is not continuous at c and α+ = α− , then α0 6= α+ = α− and we choose P with xi = c. Assume f (x) is not left continuous at c. Then there is > 0, such that no matter how small kP k is, there are x, y satisfying xi−1 < x, y ≤ xi = c and |f (x) − f (y)| ≥ . Taking x and y as x∗i respectively and keeping all the other x∗j the same, we get two Riemann-Stieltjes sums. The difference of the two sums is |(f (x) − f (y))(α0 − α− )| ≥ |α0 − α− |. Therefore f (x) is not Riemann-Stieltjes integrable with respect to α. The case f (x) is not right continuous at c is similar. We conclude that f is Riemann-Stieltjes integrable with respect to α if and only if f is continuous at c. Exercise 3.3.16. Prove that if α and f have a common point of discontinuity in [a, b], then f is not Riemann-Stieltjes integrable with respect to α. You may need the characterization of discontinuity in Exercise 1.4.42. Z 2 Exercise 3.3.17. Find suitable α on [0, 2], such that f dα = f (0) + f (1) + f (2) 0

for any continuous f on [0, 2]. Exercise 3.3.18. Prove that if the Dirichlet function is Riemann-Stieltjes integrable Z b with respect α, then α is constant. In particular, this shows that f dα = 0 for a

any function α with respect to which f is Riemann-Stieltjes integrable does not necessarily imply that f is zero.

3.3. TOPICS ON INTEGRATION

191

Exercise 3.3.19. Prove that the only function Riemann-Stieltjes integrable with respect to the Dirichlet function is the constant function. In particular, this shows Z b f dα = 0 for any function f that is Riemann-Stieltjes integrable with that a

respect to α does not necessarily imply that α is a constant.

The Riemann-Stieltjes integrability can be verified by Cauchy criterion. The results from Section 3.1.4 remain mostly true. Proposition 3.1.7 can be adopted without much change (see Exercise 3.3.21). Proposition 3.1.8 still holds if α is an increasing function (see Exercise 3.3.24). Proposition 3.1.9 holds in one direction (see Exercises 3.3.22 and 3.3.23). Exercise 3.3.20. Suppose f is Riemann-Stieltjes integrable with respect to α and β. Suppose c is a constant. Prove that f is Riemann-Stieltjes integrable with respect to α + β and cα, and Z

b

b

Z

a

b

Z

f d(α + β) = a

b

Z

f dα +

f dβ, a

Z f d(cα) = c

b

f dα.

a

(3.3.3)

a

Exercise 3.3.21. Suppose f and g are Riemann-Stieltjes integrable with respect to α. Prove that f + g and cf are Riemann-Stieltjes integrable with respect to α and Z

b

Z

b

(f + g)dα = a

b

Z

Z

f dα + a

b

Z

gdα, a

cf dα = c a

b

f dα.

(3.3.4)

a

Exercise 3.3.22. Suppose a < b < c and f is Riemann-Stieltjes integrable with respect to α on [a, c]. Prove that f is Riemann-Stieltjes integrable with respect to α on [a, b] and [b, c], and Z

c

b

Z f dα =

a

c

Z f dα +

a

f dα.

(3.3.5)

b

Exercise 3.3.23. Suppose a < b < c and f is Riemann-Stieltjes integrable with respect to α on [a, b] and [b, c]. Assume f and α are bounded. 1. Prove that if f is continuous at b, then f is Riemann-Stieltjes integrable on [a, c]. 2. Prove that if α is continuous at b, then f is Riemann-Stieltjes integrable on [a, c]. 3. Construct functions f and α on [a, c], such that f is Riemann-Stieltjes integrable with respect to α on [a, b] and [b, c], but both f and α are not continuous at b. By Exercise 3.3.16, f is not Riemann-Stieltjes integrable with respect to α on [a, c]. Exercise 3.3.24. Suppose f and g are Riemann-Stieltjes integrable with respect to an increasing α on [a, b]. Prove that Z f ≤ g =⇒

b

Z f dα ≤

a

b

gdα. a

Moreover, if α is strictly increasing and f and g are continuous, then the equality holds if and only if f = g.

192

CHAPTER 3. INTEGRATION

Exercise 3.3.25. Suppose f is Riemann-Stieltjes integrable with respect to an increasing α on [a, b]. Prove that Z b Z b ≤ |f |dα, f dα a

a

b

Z (α(b) − α(a)) inf f ≤

f dα ≤ (α(b) − α(a)) sup f,

[a,b]

a

[a,b]

Z b f (c)(α(b) − α(a)) − f dα ≤ ω[a,b] (f )(α(b) − α(a)), a Z b X S(P, f, α) − ≤ f dα ω[xi−1 ,xi ] (f )∆αi . a

Moreover, extend the first integral mean theorem in Exercise 3.1.15 to RiemannStieltjes integral. ExerciseZ3.3.26. Suppose f is Riemann-Stieltjes integrable with respect to α, and x f dα. Prove that if α is strictly monotone and f (x) is continuous at x0 , F (x) = a

then limh→0

F (x0 + h) − F (x0 ) = f (x0 ). This extends the fundamental theorem α(x0 + h) − α(x0 )

of calculus.

The Riemann-Stieltjes integral can often be computed by the ordinary Riemann integral. Theorem 3.3.5. Suppose f is bounded, β is Riemann integrable and α(x) = Z x β(t)dt. Then f is Riemann-Stieltjes integrable with respect to α if and a

only if f β is Riemann integrable. Moreover, Z b Z b f βdx. f dα = a

a

For the special case β is continuous, we have α0 = β and the formula becomes Z b Z b f dα = f α0 dx. (3.3.6) a

a

This provides the rule for moving α from inside d to outside d, which is consistent with the notation dα = α0 dx for the differential. Proof. Suppose |f | 0, there is δ > 0, such that kP k < δ implies ω[xi−1 ,xi ] (β)∆xi < . For the same choice of P and x∗i , the Riemann-Stieltjes sum of f with respect to α is Z xi X X ∗ ∗ S(P, f, α) = f (xi )(α(xi ) − α(xi−1 )) = f (xi ) β(t)dt, xi−1

and the Riemann sum of f β is S(P, f β) =

X

f (x∗i )β(x∗i )∆xi .

3.3. TOPICS ON INTEGRATION

193

Then in case kP k < δ, we have |S(P, f, α) − S(P, f β)| ≤

X

≤

X

Z |f (x∗i )|

xi

xi−1

β(t)dt − β(x∗i )∆xi

Bω[xi−1 ,xi ] (β)∆xi ≤ B.

This implies that the Riemann-Stieltjes sum of f with respect to α converges if and only if the Riemann sum of f β converges. Moreover, the two limits are the same. The relation between the Riemann-Stieltjes integral and the ordinary Riemann integral suggests that the integration by parts and the change of variable formulae can be extended. The integration by parts for the RiemannStieltjes integral turns out to be a lot more symmetric and elegant. Theorem 3.3.6. Suppose f is Riemann-Stieltjes integrable with respect to α. Then α is Riemann-Stieltjes integrable with respect to f . Moreover, Z

b

Z

a

b

αdf = f (b)α(b) − f (a)α(a).

f dα +

(3.3.7)

a

Proof. Let P : a = x0 < x1 < x2 < · · · < xn = b be a partition of [a, b] and x∗i ∈ [xi−1 , xi ] are chosen for 1 ≤ i ≤ n. Consider Q : a = x∗0 ≤ x∗1 ≤ x∗2 ≤ · · · ≤ x∗n ≤ x∗n+1 = b. Q is almost a partition except some repetition may happen among the partition points. By choosing xi−1 ∈ [x∗i−1 , x∗i ] for 1 ≤ i ≤ n + 1, we still use the notation n+1 X S(Q, f, α) = f (xi−1 )(α(x∗i ) − α(x∗i−1 )) i=1

to denote the Riemann-Stieltjes sum of f with respect to α. In case a repetition x∗i = x∗i−1 happens, the corresponding term in the sum simply vanishes. Therefore S(Q, f, α) is the same as the Riemann-Stieltjes sum after removing all the repetitions and we may pretend there is no repetition in S(Q, f, α) in the subsequent argument. In particular, by the integrability of f with respect to α, for any > 0, there is δ, such that Z b kQk < δ =⇒ S(Q, f, α) − f dα < . a

Since x∗i − x∗i−1 ≤ xi − xi−2 ≤ 2kP k, we have kQk ≤ 2kP k and Z b δ kP k < =⇒ S(Q, f, α) − f dα < . 2 a

194

CHAPTER 3. INTEGRATION

Note that (see Figure 4.1 for the geometric meaning) S(Q, f, α) = =

n+1 X i=1 n X

f (xi−1 )α(x∗i ) −

n+1 X

f (xi−1 )α(x∗i−1 )

i=1

f (xi−1 )α(x∗i )

+ f (b)α(b) −

i=1

=

n X

n+1 X

f (xi−1 )α(x∗i−1 ) − f (a)α(a)

i=2

f (xi−1 )α(x∗i ) −

i=1

n X

f (xi )α(x∗i ) + f (b)α(b) − f (a)α(a)

i=1

= −S(P, α, f ) + f (b)α(b) − f (a)α(a). Therefore we have Z b δ f dα < . =⇒ S(P, α, f ) − f (b)α(b) − f (a)α(a) − kP k < 2 a This proves that α is Riemann-Stieltjes integrable with respect to f , and the formula (3.3.7) holds. Exercise 3.3.27. Suppose f (x) is a non-negative and decreasing function on [a, b] that is Riemann-Stieltjes integrable with respect to α. Suppose m ≤ α ≤ M on [a, b]. Prove Z b f dα ≤ f (a)(M − α(a)) f (a)(m − α(a)) ≤ a

and derive the second integral mean value theorem in Exercises 3.2.58 and 3.2.59 without assuming the differentiability. Exercise 3.3.28. Prove the general version of the Dirichlet and Abel tests (Propositions 3.3.3 and 3.3.4) without the differentiability assumption on f (x).

In view of Theorem 3.3.5, the following extends Theorem 3.2.5 for the change of variable. Theorem 3.3.7. Suppose φ is increasing and continuous on [a, b]. Suppose f is Riemann-Stieltjes integrable with respect to α on [φ(a), φ(b)]. Then f ◦ φ is Riemann-Stieltjes integrable with respect to α ◦ φ on [a, b]. Moreover, Z φ(b) Z b f dα = (f ◦ φ)d(α ◦ φ). (3.3.8) φ(a)

a

Proof. Let P be a partition of [a, b]. Similar to the proof of Theorem 3.2.5, we have a partition φ(P ) of [φ(a), φ(b)]. Choose x∗i for P and choose corresponding φ(x∗i ) for φ(P ). Then the Riemann-Stieltjes sum of f ◦ φ with respect to α ◦ φ is X S(P, f ◦ φ, α ◦ φ) = f (φ(x∗i ))(α(φ(xi )) − α(φ(xi−1 ))) = S(φ(P ), f, α). Since f is Riemann-Stieltjes integrable with respect to α, for any > 0, there is δ > 0, such that Z φ(a) kQk < δ =⇒ S(Q, f, α) − f dα < . φ(a)

3.3. TOPICS ON INTEGRATION

195

The continuity of φ implies the uniform continuity. Therefore there is δ 0 > 0, such that kP k < δ 0 implies kφ(P )k < δ. Then kP k < δ 0 implies Z φ(a) Z φ(a) f dα < . f dα = S(φ(P ), f, α) − S(P, f ◦ φ, α ◦ φ) − φ(a) φ(a) This proves that f ◦ φ is Riemann-Stieltjes integrable with respect to α ◦ φ and the formula (3.3.8) holds. Exercise 3.3.29. What will happen to Theorem 3.3.7 if φ is decreasing? What if φ(x) is not assumed to be continuous? Exercise 3.3.30. Suppose f and g are increasing functions satisfying g(f (x)) = x and f (0) = 0. Prove that if f is continuous, then for any a, b > 0, we have the Young inequality Z b Z a g(y)dy ≥ ab. f (x)dx + 0

3.3.4

0

Bounded Variation Function

The discussion of the integrability in Section 3.1.2 critically depends on the inequality (3.1.6). The inequality may be extended to the Riemann-Stieltjes sum |f (c)(α(b) − α(a)) − S(P, f, α)| ≤ ω[a,b] (f )VP (α), (3.3.9) where VP (α) = |α(x0 ) − α(x1 )| + |α(x1 ) − α(x2 )| + · · · + |α(xn−1 ) − α(xn )| is the variation of α with respect to the partition P . We say α has bounded variation if there is a constant V , such that VP (α) ≤ V for any partition P . If a partition Q refines the partition P , then we clearly have VQ (α) ≥ VP (α). This suggests us to define the variation of α on an interval to be V[a,b] (α) = sup{VP (α) : P is a partition of [a, b]}. It is easy to see that monotone functions have bounded variations with V[a,b] (α) = |α(b) − α(a)|, and linear combinations of bounded variation functions also have bounded variations. Moreover, it is easy to establish the following properties for the variations. Proposition 3.3.8. Suppose α and β are bounded variation functions on [a, b]. Then V[a,b] (α) ≥ |α(b) − α(a)|, V[a,c] (α) = V[a,b] (α) + V[b,c] (α) for a < b < c, V[a,b] (λα) = |λ|V[a,b] (α), V[a,b] (α + β) ≤ V[a,b] (α) + V[a,b] (β).

196

CHAPTER 3. INTEGRATION

Proof. The difference |α(b) − α(a)| is the variation with respect to the partition a = x0 < x1 = b. Therefore the first inequality follows from the definition of the variation V[a,b] (α). For any partition P of [a, c], we denote by P ∪ {b} the partition of [a, b] obtained by adding c to P , and denote by P[x,y] the partition of [x, y] obtained by combining x, y and those points in P lying in [x, y]. Then VP[a,b] (α) + VP[b,c] (α) = VP ∪{b} (α) ≥ VP (α). This implies V[a,c] (α) ≤ V[a,b] (α) + V[b,c] (α). Conversely, for any partitions P 0 of [a, b] and P 00 of [b, c], we denote by P 0 ∪ P 00 the partition of [a, c] obtained by combining points in P 0 and P 00 together. Then VP (α) = VP 0 (α) + VP 00 (α). This implies that V[a,c] (α) ≥ V[a,b] (α) + V[b,c] (α). The completes the proof of the second equality. The third equality follows from VP (λα) = |λ|VP (α). The forth inequality follows from VP (α + β) ≤ VP (α) + VP (β). Z Example 3.3.16. Suppose β is Riemann integrable and α(x) = Z P xi β(t)dt , and by (3.1.13) in Exercise 3.1.16, VP (α) = xi−1

x

β(t)dt. Then a

Z xi X ∗ |S(P, |β|) − VP (α)| ≤ β(t)dt |β(xi )|∆xi − xi−1 Z xi X ∗ β(t)dt ≤ β(xi )∆xi − xi−1 X ≤ ω[xi−1 ,xi ] (β)∆xi . This implies α has bounded variation, and Z b V[a,b] (α) = |β(t)|dt. a

Exercise 3.3.31. Prove that any Lipschitz function has bounded variation. Exercise 3.3.32. Prove that any bounded variation function is Riemann integrable. Exercise 3.3.33. Suppose a function α has bounded variation on [a, b]. Prove that α is increasing if and only if V[a,b] (α) = α(b) − α(a). Exercise 3.3.34. Suppose f is Riemann-Stieltjes integrable with respect to α. Suppose a ≤ c ≤ b. Prove that Z b f dα ≤ sup |f |V[a,b] (α), a

[a,b]

Z b f (c)(α(b) − α(a)) − f dα ≤ ω[a,b] (f )V[a,b] (α). a

3.3. TOPICS ON INTEGRATION

197

Exercise 3.3.35. Suppose f is Riemann-Stieltjes integrable with respect to a bounded Z x

f dα. Prove that if α is not continuous at x0

variation function α, and F (x) = a

and f (x0 ) 6= 0, then F is not continuous at x0 .

For a function α(x) with bounded variation on [a, b], define the variation function v(x) = V[a,x] (α). Intuitively, the change of α is positive when α is increasing and negative when α is decreasing. The variation function v(x) keeps track of both as positive changes. Then we may further define the positive variation function and the negative variation function v+ =

v+α v−α , v− = . 2 2

Intuitively, v + (x) keeps track of the positive changes only and becomes constant on the intervals on which α is decreasing, and v − (x) keeps track of the negative changes only. We have α = v + − v − . Moreover, for a ≤ x < y ≤ b, by Proposition 3.3.8, we have v + (y) − v + (x) =

V[x,y] (α) − |α(y) − α(x)| V[x,y] (α) + α(y) − α(x) ≥ ≥ 0. 2 2

Thus v + (x) is increasing. Similarly, v − (x) is also increasing. Therefore we proved the necessary part of the following. Proposition 3.3.9. A function has bounded variation if and only if it is the difference of two increasing functions. The sufficiency follows from the fact that monotone functions have bounded variations and linear combinations of bounded variation functions still have bounded variations. Z Exercise 3.3.36. Suppose β is Riemann integrable and α(x) = Z x Z x that v + (x) = max{0, β(t)}dt, v − (x) = − min{0, β(t)}dt. a

x

β(t)dt. Prove a

a

Exercise 3.3.37. Suppose a function α has bounded variation, with positive and negative variation functions v + and v − . Suppose α = u+ − u− , where u+ and u− are increasing functions. 1. Prove that V[a,b] (α) ≤ (u+ (b) − u+ (a)) + (u− (b) − u− (a)). 2. Prove that v + (y)−v + (x) ≤ u+ (y)−u+ (x) and v − (y)−v − (x) ≤ u− (y)−u− (x) for any a ≤ x < y ≤ b. 3. Prove that the equality in the first part holds if and only if u+ = v + + c and u− = v − + c for some constant c. The result shows that α = v + − v − is the “most efficient” way of expressing a bounded variation function as the difference of increasing functions.

Next we establish properties of continuous bounded variation functions.

198

CHAPTER 3. INTEGRATION

Proposition 3.3.10. A bounded variation function is continuous if and only if its variation function is continuous. Proof. Let α be a bounded variation function on [a, b] and v(x) = V[a,x] (α). By Proposition 3.3.8, for a ≤ x < y ≤ b, we have |α(y) − α(x)| ≤ V[x,y] (α) = v(y) − v(x). This implies that if v is continuous, then α is also continuous. Conversely, for any > 0, there is a partition P of [a, b], such that VP (α) > V[a,b] (α) − = v(b) − . If α is continuous, then there is δ > 0, such that a ≤ x < y ≤ b and y − x < δ implies |α(y) − α(x)| < . Let δP = min ∆xi be the length of the smallest interval in P . Then for a ≤ x < y ≤ b satisfying y − x < δ 0 = min{δ, δP }, the interval (x, y) contains at most one point from P . Denote by Q = P ∪ {x, y} the partition of [a, b] obtained by adding x, y to P . Denote by Q[x,y] the partition of [x, y] obtained by combining x, y and those points in P lying in [x, y] (as in the proof of Proposition 3.3.8). Then VQ[a,x] (α) + VQ[x,y] (α) + VQ[y,b] (α) = VQ (α) ≥ VP (α) > V[a,b] (α) − = V[a,x] (α) + V[x,y] (α) + V[y,b] (α) − . By VQ[a,x] (α) ≤ V[a,x] (α) and VQ[y,b] (α) ≤ V[y,b] (α), we get VQ[x,y] (α) ≥ V[x,y] (α) − . Since (x, y) contains at most one point from P . The partition Q[x,y] is either x < y or x < c < y for some c ∈ P . Since y − x < δ, we have either VQ[x,y] (α) = |α(y) − α(x)| < , or VQ[x,y] (α) = |α(y) − α(c)| + |α(c) − α(x)| < 2. Therefore in either case, we get v(y) − v(x) = V[x,y] (α) ≤ VQ[x,y] (α) + ≤ 3. Since the only condition for this to hold is y−x < δ 0 , this proves the continuity of v. The following is a technical result about continuous bounded variation functions that will be useful for proving further results. Proposition 3.3.11. Suppose α is a continuous bounded variation function. Then for any > 0, that there is δ > 0, such that kP k < δ implies VP (α) > V[a,b] (α) − .

3.3. TOPICS ON INTEGRATION

199

Proof. Example 3.3.16 shows that the close relation between the variation and the Riemann integral. The subsequent proof is similar to Exercises 3.1.38 and 3.1.39. By definition, there is a partition Q, such that VQ (α) > V[a,b] (α) − . Suppose Q contains n partition points. By the continuity of α, there is δ > 0, such that |x − y| < δ =⇒ |α(x) − α(y)| <

. 2n

Now assume P is any partition satisfying kP k < δ. Then the oscillations of α on the intervals in P are < . Denote by P ∪ Q the partition of [a, b] 2n obtained by combining points of P and Q together. Then VP ∪Q (α) ≥ VQ (α) > V[a,b] (α) − . On the other hand, P ∪ Q is obtained from P by adding no more than n points. These n points fall into at most n intervals in P , and the sums for VP ∪Q (α) and VP (α) differ only at these intervals. Specifically, if [xi−1 , xi ] is one such interval that contains k points from Q, then the term |α(xi ) − α(xi−1 )| in VP (α) is replaced by a sum of at most k + 1 terms in VP ∪Q (α). , the sum of these k + 1 terms is < (k + 1) . Adding Since ω[xi−1 ,xi ] (α) < 2n 2n P up all these differences, we get (note that (ki + 1) ≤ 2n) VP ∪Q (α) < VP (α) + 2n

= VP (α) + . 2n

Combining the two inequalities together, we conclude that kP k < δ =⇒ VP (α) + > V[a,b] (α) − .

Exercise 3.3.38. Given the conclusion of Proposition 3.3.11, prove that for any partition P of any interval [c, d] ⊂ [a, b] satisfying kP k < δ, we have VP (α) > V[c,d] (α) − . Use this observation to give another proof of Proposition 3.3.10. ExerciseZ3.3.39. Suppose f is Riemann-Stieltjes integrable with respect to α, and x F (x) = f dα. Prove that if α is continuous and has bounded variation, then F a

is continuous. Compare with the Exercise 3.3.35.

Now we are ready to extend Theorem 3.1.3 on the criterion for Riemann integrabilty to the Riemann-Stieltjes integrability. Theorem 3.3.12. Suppose f is bounded and α has bounded variation. 1. If f is Riemann-Stieltjes integrable with respect to α, then for any > 0, there is δ > 0, such that X kP k < δ =⇒ ω[xi−1 ,xi ] (f )|∆αi | < .

200

CHAPTER 3. INTEGRATION

2. If for any > 0, there is δ > 0, such that X kP k < δ =⇒ ω[xi−1 ,xi ] (f )V[xi−1 ,xi ] (α) < , then f is Riemann-Stieltjes integrable with respect to α. Moreover, if α is monotone, then the two parts are inverse to each other and give a necessary and sufficient condition for the Riemann-Stieltjes integrability. If α is continuous, then the converse of the second part is also true and gives a necessary and sufficient condition for the Riemann-Stieltjes integrability. Proof. The proof is very similar to the proof of Theorem 3.1.3. For the first part, we note that by taking P = P 0 but choosing different x∗i for P and P 0 , we have X ∗ S(P, f, α) − S(P 0 , f, α) = (f (x∗i ) − f (x0 i ))∆αi . Then for each i, we may choose f (x∗i ) − f (x0 ∗i ) to have the same sign as ∆αi and |f (x∗i ) − f (x0 ∗i )| to be as close to the oscillation ω[xi−1 P,xi ] (f ) as possible. 0 The result is that S(P, f, α) − S(P , f, α) is very close to ω[xi−1 ,xi ] (f )|∆αi |. The rest of the proof is the same. For the second part, the key is the following estimation for a refinement Q of P . n n X X ∗ |S(P, f, α) − S(Q, f, α)| = f (xi )∆αi − S(Q[xi−1 ,xi ] , f, α) i=1

i=1

n X f (x∗i )(α(xi+1 ) − α(xi )) − S(Q[x ≤

i−1 ,xi ]

, f, α)

i=1

≤ ≤

n X i=1 n X

ω[xi−1 ,xi ] (f )VQ[xi−1 ,xi ] (α) ω[xi−1 ,xi ] (f )V[xi−1 ,xi ] (α),

i=1

where the second inequality follows from (3.3.9). The rest of the proof is the same. If α is monotone, then |∆αi | = V[xi−1 ,xi ] (α), so that the two parts are inverse to each other. If α is continuous, then by Proposition 3.3.11, for any > 0, there is δ > 0, such that kP k < δ implies VP (α) > V[a,b] (α) − . Now for RiemannStieltjesP integrable f , by the first part, there is δ 0 > 0, such that kP k < δ 0 implies ω[xi−1 ,xi ] (f )|∆αi | < . If |f | < B, then kP k < min{δ, δ 0 } implies X ω[xi−1 ,xi ] (f )V[xi−1 ,xi ] (α) X X ≤ ω[xi−1 ,xi ] (f )|∆αi | + ω[xi−1 ,xi ] (f )(V[xi−1 ,xi ] (α) − |∆αi |) X ≤ + 2B (V[xi−1 ,xi ] (α) − |∆αi |) = + 2B(V[a,b] (α) − VP (α)) < (2B + 1).

3.3. TOPICS ON INTEGRATION

201

With the help of the theorem, the results in Section 3.1.3 may be extended. The proofs are left as exercises. Proposition 3.3.13. Any continuous function is Riemann-Stieltjes integrable with respect to any function with bounded variation. By Theorem 3.3.6, we have the following result that extends Proposition 3.1.5. Proposition 3.3.14. Any function with bounded variation is Riemann-Stieltjes integrable with respect to any continuous function. If α is monotone or continuous, then Theorem 3.3.12 gives us necessary and sufficient criteria for the Riemann-Stieltjes integrability. Based on this, Proposition 3.1.6 can be proved just as before. Proposition 3.3.15. Suppose f is Riemann-Stieltjes integrable with respect to α, which is either monotone or continuous with bounded variation. Suppose the values of f lie in a finite union U of closed intervals, and φ is a continuous function on U . Then the composition φ ◦ f is also RiemannStieltjes integrable with respect to α. By the same reason as Riemann integration, under the assumption about α in Proposition 3.3.15, the products of functions Riemann-Stieltjes integrable with respect to α are still Riemann-Stieltjes integrable with respect to α.

3.3.5

Additional Exercise

Trigonometric Integration Exercise 3.3.40. By writing a sin x + b cos x as a linear combination of c sin x + Z a sin x + b cos x 0 dx d cos x and (c sin x + d cos x) , compute the antiderivatives c sin x + d cos x Z a sin x + b cos x and dx. Use the similar idea to compute the antiderivative (c sin x + d cos x)2 Z a sin x + b cos x + λ dx. c sin x + d cos x + µ Exercise 3.3.41. By writing Au2 + 2Buv + Cv 2 in the form (αu + βv)(au + bv) + Z A sin2 x + 2B sin x cos x + C cos2 x γ(u2 + v 2 ), compute the antiderivative dx. a sin x + b cos x

Gamma Function The Gamma function is Z Γ(x) =

+∞

tx−1 e−t dt.

0

Exercise 3.3.42. Prove that the function is defined and continuous for x > 0. Exercise 3.3.43. Prove limx→0+ Γ(x) = limx→+∞ Γ(x) = +∞.

202

CHAPTER 3. INTEGRATION

Exercise 3.3.44. Prove the other formulae for the Gamma function Z ∞ Z ∞ 2x−1 −t2 x tx−1 e−at dt. Γ(x) = 2 t e dt = a 0

0

Exercise 3.3.45. Prove the equalities for the Gamma function Γ(x + 1) = xΓ(x), Γ(n) = (n − 1)!.

Beta Function The Beta function is 1

Z

tx−1 (1 − t)y−1 dt.

B(x, y) = 0

In Exercise 7.1.69, we will see that the Beta function can be written in terms of the Gamma function. Exercise 3.3.46. Prove that the function is defined for x, y > 0. 1 Exercise 3.3.47. Use the change of variables t = to prove the other formulae 1+u for the Beta function Z ∞ Z 1 x−1 ty−1 dt t + ty−1 B(x, y) = = dt. x+y (1 + t)x+y 0 0 (1 + t) Exercise 3.3.48. Prove Z B(x, y) = 2

π 2

cos2x−1 t sin2y−1 tdt.

0

Exercise 3.3.49. Prove the equalities for the Beta function B(x, y) = B(x, y), B(x + 1, y) =

x B(x, y). x+y

Chapter 4 Series

203

204

4.1

CHAPTER 4. SERIES

Series of Numbers

When n approaches infinity, the Taylor expansion becomes a series T (x) = f (x0 )+f 0 (x0 )(x−x0 )+

f 00 (x0 ) f (n) (x0 ) (x−x0 )2 +· · ·+ (x−x0 )n +· · · . 2 n!

This can be considered as the infinite degree approximation of f (x) at x0 . The value of the series would be a limit, and the natural question is whether the value is equal to the function itself. Another important and useful series is the Fourier1 series 1 a0 + a1 cos x + b1 sin x + a2 cos 2x + b2 sin 2x + · · · + an cos nx + bn sin nx + · · · 2 defined for an integrable periodic function f (x) with period 2π, where Z Z 1 2π 1 2π an = f (x) cos nxdx, bn = f (x) sin nxdx. π 0 π 0 Again the central question here is whether the limit of the series is equal to the function itself. The Taylor series and the Fourier series are series of functions. In this part, we discuss the more elementary theory of the series of numbers. The series of functions will be discussed in the next part.

4.1.1

Sum of Series

A series is an infinite sum ∞ X

x n = x1 + x2 + · · · + xn + · · · .

n=1

The partial sum of the series is the sequence sn =

n X

xk = x 1 + x2 + · · · + xn .

k=1

Definition 4.1.1. A series if limn→∞ sn = s.

P∞

n=1

xn has sum s, and denoted

P∞

n=1

xn = s,

If the limit exists, then the series converges. Otherwise, the series diverges.PIf limn→∞ sn = ∞, we also say the series diverges to infinity and write ∞ n=1 xn = ∞. The Cauchy criterion for the convergence of sequences leadsP immediately to the Cauchy criterion for the convergence of series: A series ∞ n=1 xn converges if and only for any > 0, there is N , such that n ≥ m > N =⇒ |xm + xm+1 + · · · + xn | < .

(4.1.1)

1 Jean Baptiste Joseph Fourier, born 1768 in Bourgogne (France), died 1830 in Paris (France).

4.1. SERIES OF NUMBERS

205

For the special P∞ case that m = n, we find that a necessary condition for a sequence n=1 xn to converge is lim xn = 0.

n→∞

Like sequences, a series does not have to start with index 1. On the other hand, without loss of generality, we can always assume that a series starts with index 1 in theoretical studies. Moreover, modifying, adding or deleting finitely many terms in a series does not change the convergence of the series. Example 4.1.1. The geometric series is partial sum satisfies

P∞

n n=0 a

= 1 + a + a2 + · · · + an + · · · . The

(1 − a)sn = (1 + a + a2 + · · · + an ) − (a + a2 + a3 + · · · + an+1 ) = 1 − an+1 . Therefore sn =

1 − an+1 , and 1−a   1 n a = 1−a diverges n=0 ∞ X

if |a| < 1

.

(4.1.2)

if |a| ≥ 1

Example 4.1.2. If xn ≥ 0, then the partial sum P is an increasing sequence. Therefore the convergence of the non-negative series xn is equivalent to the boundedness of the partial sums. This is the reason behind the divergence of the harmonic P P 1 1 and the convergence of the series ∞ series ∞ n=1 2 in Example 1.2.9. We n=1 n n also note that the computation in Example 1.2.9 tells us ∞ X 1 1 1 1 1 1 = + + ··· + + · · · = lim − = 1. n→∞ 1 n(n + 1) 1·2 2·3 (n − 1)n n n=1

Example 4.1.3. In Example 2.3.16, the estimation of the remainder of the Taylor P xn series tells us that ∞ converges to ex for any x. In particular, we have n=0 n! 1 1 1 1 + + + ··· + + · · · = e. 1! 2! n! Exercise 4.1.1. Compute the sums of convergent series. P∞ P∞ n. 1 1. na n=1 5. n=2 log 1 − 2 . n P∞ n 2. . P∞ 1 n=1 2n − 1 6. . n=1 √ n a P∞ (−1)n P∞ 1 3. . n=1 7. . (2n)! n=1 (a + n)(a + n + 1) 4.

P∞

n=1

(−1)n . (2n + 1)!

8.

P∞

n=1

1 . n(n + 1)(n + 2)

Exercise 4.1.2. Use Exercise 1.4.38 to show 1 1 1 1 1 1 1 1 − + − + − + − + · · · = log 2, 2 3 4 5 6 7 8 1 1 1 1 1 1 1 1 3 1+ − + + − + + − + · · · = log 2. 3 2 5 7 4 9 11 6 2 Note that the second series is obtained from the first by rearranging the terms.

206

CHAPTER 4. SERIES

Exercise 4.1.3. Prove that if y is not an integer multiple of 2π, then n X 1 2n + 1 1 cos(x + ky) = , y sin x + 2 y − sin x − 2 y 2 sin k=0 2 n X 2n + 1 1 1 . sin(x + ky) = y − cos x + 2 y + cos x − 2 y 2 sin k=0 2 P sin ny Use integration between y and π to find the partial sum of the series n for 0 < y < 2π. Then apply Riemann-Lebesgue Lemma in Exercise 3.2.44 to get P∞ sin ny π−y = for 0 < y < 2π. n=1 n 2 P xn Exercise 4.1.4. Suppose xn > 0. Compute ∞ . n=1 (1 + x1 )(1 + x2 ) · · · (1 + xn ) P Exercise 4.1.5. Prove that a sequence xn converges if and only if the series (xn+1 − xn ) converges. P P Exercise 4.1.6. Prove xn and yn converge, then the series P P that if the series (xn + yn ) and cxn also converge. Moreover, X X X X X (xn + yn ) = xn + yn , cxn = c xn . Exercise 4.1.7. Let nk be a strictly increasing sequence of natural P∞ P∞ numbers. Then n=k (xnk +xnk +1 +· · ·+xnk+1 −1 ) is obtained from the series n=1 xn by combining successive terms. P∞ P 1. Prove that if ∞ n=k (xnk + xnk +1 + · · · + xnk+1 −1 ) n=1 xn converges, then also converges. 2. Prove that if the terms P∞ xnk , xnk +1 , . . . , xnk+1 −1 that got combined have the same sign and n=k (xnk + xnk +1 + · · · + xnk+1 −1 ) converges, then the P∞ x also converges, n=1 n P Exercise 4.1.8. Prove that if lim x = 0, then xn converges if and only if n→∞ n P (x2n−1 +x2n ) converges, and the two sums are the same. What about combining three consecutive terms? P Exercise 4.1.9. Suppose xn is decreasing and positive. Prove that xn conP n P 1 and verges if and only if 2 x2n converges. Then study the convergence of nα P 1 . n(log n)α

An infinite product is ∞ Y

xn = x1 x2 · · · xn · · · .

n=1

The partial product is the sequence pn = x1 x2 · · · xn . If limn→∞ pn = p and p 6= 0, then we say the infinite product converges to (or has the product) p. Because the limit limn→∞ pn is assumed to be nonzero, we have xn 6= 0, so that the convergence is not affected if we modify, add or delete finitely many nonzero terms. Moreover, for a convergent infinite product we have pn limn→∞ pn p = = = 1. n→∞ pn−1 limn→∞ pn−1 p

lim xn = lim

n→∞

4.1. SERIES OF NUMBERS

207

In particular, the terms xn will be positive for big enough n. Since the finitely many possibly negative (and nonzero) terms will not affect the convergence, we may pretend all terms are positive and use log to convert the infinite product to a series ∞ X

log xn = log x1 + log x2 + · · · + log xn + · · · ,

n=1

whose partial sum is log x1 + log x2 + · · · + log xn = log pn . Q P Thus the infinite product xn converges if and only if the series log xn converges. Exercise 4.1.10. Compute convergent infinite products. Q 1 Q∞ 1 n 3. ∞ n=1 2 . . 1. n=1 1 + n (−1)n Q n! . 2 4. ∞ n=1 Q (−1)n Q x 2. ∞ . 5. ∞ n=2 1 + n=1 cos n . n 2 P Exercise 4.1.11. Use the Cauchy criterion for the convergence of the series Q log xn to get the Cauchy criterion for the convergence of the infinite product xn . Exercise 4.1.12. Establish properties for infinite products similar to Exercises 4.1.5 and 4.1.6. Exercise 4.1.13. Why do we have to consider the infinite product as divergent when limn→∞ pn = 0? What bad things may happen if the case limn→∞ pn = 0 is considered as convergent?

4.1.2

Comparison Test

Similar to the convergence of improper integrals, the convergence of series may be determined by comparing with other series. P PropositionP4.1.2 (Comparison Test). Suppose |xn | ≤ yn . If yn converges, then xn converges. Proof. By the Cauchy criterion for Pthe convergence of yn , for any > 0, there is N , such that (4.1.1) holds for yn . Then for n ≥ m > N , we have |xm + xm+1 + · · · + xn | ≤ |xm | + |xm+1 | + · · · + |xn | ≤ ym + ym+1 + · · · + yn < . P This verifies the Cauchy criterion for the convergence of xn . P P P A series xn absolutely converges if |xn | converges. The series xn in the proposition absolutely converges. The proposition also tells us that absolute convergence implies convergence. Moreover, by the discussion in

208

CHAPTER 4. SERIES

P Example 4.1.2, a series xn absolutely converges if and only if the partial P sum of |xn | is bounded. A special case of the comparison test is that if both xn and yn are positive, P xn and limn→∞ = l exists, then the convergence of yn implies the converP yn gence of xn . If l 6= 0, then the two convergences are equivalent. More generally, if c1 yn ≤ xn ≤ c2 yn for some c1 , c2 > 0, then the two convergences are also equivalent. A series conditionally converges if it converges but not absolutely converges. P 1 1 1 in Example Example 4.1.4. Since α ≤ 2 for α ≥ 2, the convergence of n n n2 P 1 1 1 4.1.2 implies that converges for α ≥ 2. On the other hand, since α ≥ for α n n n P 1 P1 implies that diverges 0 < α ≤ 1, the divergence of the harmonic series n nα for 0 < α ≤ 1. p Example 4.1.5. Suppose xn satisfies n |xn | ≤ a for someP constant a < 1. Then |xn | ≤ an and the convergence of the geometric series an in Example 4.1.1 P implies that xn absolutely converges. This is called the root test. For a specific example, let p(t) be a nonzero polynomial. Then p p lim n |p(n)an | = |a| lim n |p(n)| = |a|. n→∞

n→∞

p If |a| < 1, then the limit implies that for any |a| < b < 1, we have n |p(n)an | < b for sufficiently big n. Since the convergence is independent of the choice Pof a series of finitely many terms, we conclude that p(n)an converges for |a| < 1. P If |a| ≥ 1, then p(n)an does not converge to 0 as n → ∞. Thus p(n)an diverges for |a| ≥ 1. xn+1 ≤ a for some constant a < 1. Then Example 4.1.6. Suppose xn satisfies xn x2 x3 xn ≤ |x1 |an−1 . |xn | = |x1 | · · · x1 x2 xn−1 P P The convergence of the geometric series |x1 | an−1 implies that xn absolutely converges. This is called the ratio test. P (n!)2 an For a specific example, the series satisfies (2n)! (n!)2 an n2 |a| |a| (2n)! = lim lim = . 2 n−1 n→∞ ((n − 1)!) a n→∞ 2n(2n − 1) 4 (2n − 2)! |a| + 1 < 1 for sufficiently big n, and the series 5 converges. If |a| ≥ 4, then the absolute value of the quotient is > 1 and the individual term does not converge to 0 (by Exercise 1.1.33, the limit is in fact ∞), so that the series diverges.

Thus for |a| < 4, the ratio is <

4.1. SERIES OF NUMBERS

209

Example 4.1.7. Consider the series X (−1)n−1 n

=1−

1 1 1 + − + ··· 2 3 4

By 0<

1 1 1 1 − = < 2 2n − 1 2n 2n(2n − 1) n

and the comparison test, we know the series X 1 1 1 1 1 = 1− + + ··· − − 2n − 1 2n 2 3 4 P 1 1 converges. The partial sum of is the even part s2n of the partial − 2n − 1 2n P (−1)n−1 sum sn of . By the convergence of s2n and limn→∞ (s2n − s2n−1 ) = n 1 limn→∞ = 0, we see that s2n−1 converges to the same limit, so that the series 2n n−1 P (−1) converges. n P1 P (−1)n−1 By the divergence of , we further know that converges condin n tionally. Exercise 4.1.14. Suppose p(x) and q(x) are polynomials. Determine the converP p(n) gence of . q(n) P P Exercise 4.1.15. Is the P convergence of xn and yn related to the convergence P of max{xn , yn } and min{xn , yn }? xn+1 yn+1 for sufficiently big n. Prove that if Exercise 4.1.16. Suppose ≤ xn P yn P yn absolutely converges, then xn absolutely converges. p P Exercise 4.1.17 (Root Test). Prove that xn absolutely converges if limn→∞ n |xn | 1. xn+1 P < Exercise 4.1.18 (Ratio Test). Prove that xn absolutely converges if limn→∞ x n xn+1 > 1. 1 and diverges if limn→∞ xn Exercise 4.1.19. Prove that p xn+1 n , lim |xn | ≤ lim n→∞ n→∞ xn

p xn+1 n . lim |xn | ≥ lim xn n→∞ n→∞

What do the inequalities tell you about the relation between the root and ratio tests? Exercise 4.1.20. Determine the convergence of series. 1.

P

1 p . n(n − 1)

2.

P

3.

P

sin n p

(n2 − 1)(n2 − 2) n

an+(−1) .

.

2

4.

P

an .

5.

P

n2 an .

2

210

CHAPTER 4. SERIES

6.

P√

7.

P

an + bn .

√

10.

1 . + bn

an

P 1 8. (a n − 1). P 1 1 en − 1 − 9. . n

P

an + b cn + d

11.

P log n . n3

12.

P

n .

13.

P

1 . (log n)n

14.

P

n!an .

nan . (n + 1)! p P (2n)!an 16. . n! 15.

1 . (log n)2

P

Exercise 4.1.21. Let [x] be√the biggest integer ≤ x, defined in Exercise 1.3.5. P (−1)[ n] , with α > 0. Consider the series nα 1. Prove that (n+1)2 −1

X

xn =

k=n2

1 1 1 1 + ··· + = 2α + 2 α k n (n + 1) ((n + 1)2 − 1)α

satisfies 2n 2n < xn < 2α . 2 α ((n + 1) − 1) n 1 M 2. If α > , prove that |xn+1 − xn | < 2α for a constant M . 4 n 1 3. If α ≤ , prove that xn is increasing for sufficiently big n. 4 √

P (−1)[ 4. With the help of Exercise 4.1.7, prove that nα 1 and diverges for α ≤ . 4

n]

converges for α >

1 2

Exercise 4.1.22. Suppose xn 6= −1. 1. Prove that if converges.

P

xn converges, then

Q

(1 + xn ) converges if and only if

P

x2n

2. Prove that if converges.

P

x2n converges, then

Q

(1 + xn ) converges if and only if

P

xn

The convergence of series may also be determined by comparing with the convergence of improper integrals. Proposition 4.1.3 (Integral Comparison Test). Suppose f (x) is a decreasing P function on [a, +∞) satisfying limx→+∞ f (x) Z= 0. Then the series f (n) +∞

converges if and only if the improper integral

f (x)dx converges. a

Proof. Since the convergence is not changed if finitely many terms are modified or deleted, we may assume a = 1 without loss of generality.

4.1. SERIES OF NUMBERS

211 Z

k+1

Since f (x) is decreasing, we have f (k) ≥

f (x)dx ≥ f (k + 1). Then k

f (1) + f (2) + · · · + f (n − 1) Z 2 Z Z n f (x)dx = f (x)dx + ≥ 1

1

3

Z

n

f (x)dx + · · · +

2

f (x)dx n−1

≥f (2) + f (3) + · · · + f (n). Z

n

This implies that f (x)dx is bounded if and only if the partial sums of the 1 P series f (n) are bounded. Since Z +∞f (x) ≥ 0, the boundedness is equivalent P f (x)dx converges if and only if f (n) to the convergence. Therefore a converges. Z +∞ P 1 dx Example 4.1.8. The series converges if and only if converges, which α n xα 1 by Example 3.3.7 means α > 1. The Riemann zeta function ζ(α) = 1 +

1 1 1 + α + ··· + α + ··· α 2 3 n

(4.1.3)

is then defined for α > 1. Z +∞ 1 dx converges if and only if Example 4.1.9. The series α n(log n) x(log x)α 2 converges. It is easy to see that the improper integral converges if and only if α > 1. Therefore the series converges if and only if α > 1. P

Example 4.1.10. Suppose a is not a multiple of π. We study the convergence of P | sin na| . nα P 1 1 | sin na| ≤ α , the convergence of , and the comparison For α > 1, by α n n nα P | sin na| test, the series converges. nα Z +∞ | sin x| For α ≤ 1, an argument similar to Example 3.3.12 shows that dx xα 1 diverges. This suggests that the series diverges. However, we cannot apply the | sin x| integral comparison test because the function is not decreasing. xα π π Assume 0 < a ≤ . Then any interval of length contains an integer multiple 2 2 of a. Choose natural numbers nk , such that 4k + 1 4k + 3 π ≤ nk a ≤ π. 4 4 1 Then | sin nk a| ≥ √ , and for α ≤ 1, 2 ∞ X | sin na| n=1

nα

≥

∞ X | sin na| n=1

n

≥

∞ X | sin nk a| k=1

nk

∞

1 X 1 ≥√ . 2 k=1 nk

212

CHAPTER 4. SERIES

By a 1 4 a ≥ > , nk 4k + 3 π 4k P a 1 , and the comparison test, we conclude that ∞ k=1 4k nk P∞ | sin na| diverges for α ≤ 1. diverges. Consequently, the series n=1 nα π For general a that is not an integer multiple of π, there is b, such that 0 < b ≤ 2 and either a+b or a−b is an integer multiple of π. Then we have | sin na| = | sin nb|, P | sin na| and we still conclude that ∞ diverges when α ≤ 1. n=1 nα Exercise 4.1.23. Show that the number of n digit numbers that do not contain the digit 9 is 8 · 9n−1 . Then use the fact to prove that if we delete the terms in the harmonic series that contain the digit 9, then the series becomes convergent. What about deleting the terms that contain some other digit? What about the numbers expressed in the base other than 10? What about deleting similar terms P 1 in the series ? nα Exercise 4.1.24. Determine the convergence of series. P P 1 sin n P a log n n p . 1. . 3. . 5. 1− nα + (log n)β n n(n − 1)(n − 2) the divergence of

2.

P

P∞

k=1

1 . (log n)α log n

4. P

P

nα (log n)β .

6.

P

Z 0

1 n

xα dx. 1 + x2

P xn x2n converges, then also converges for any nα

Exercise 4.1.25. Prove that if 1 1 α > . What if α = ? 2 2 Exercise 4.1.26. Use the idea of the proof of Proposition 4.1.3 to prove log(n−1)! < (n + 1)n+1 nn . n(log n − 1) + 1 < log n!. Then derive the inequality n−1 < n! < e en

4.1.3

Conditional Convergence

P A series xn is alternating if the signs of P xn and xn+1 are different. Another way of expressing an alternating series is (−1)n xn , with xn ≥ 0. Proposition (Leibniz Test). Suppose xn is decreasing and limn→∞ xn = P 4.1.4 n 0. Then (−1) xn converges. Moreover, the sum s and the partial sum sn satisfy |sn − s| ≤ xn+1 . Proof. The partial sum satisfies s2n+1 − s2n−1 = x2n − x2n+1 ≥ 0, s2n+2 − s2n = −x2n+1 + x2n+2 ≤ 0. Thus s2n+1 is increasing and s2n is decreasing. Moreover, by s2n − s2n+1 = x2n+1 ≥ 0, s2n+1 has upper bound and s2n has lower bound. Therefore the sequences converge. Then limn→∞ (s2n − s2n+1 ) = 0 implies that the limits are the same.

4.1. SERIES OF NUMBERS

213

The estimation |sn − s| ≤ xn+1 follows from 0 ≤ s2n − s ≤ s2n − s2n+1 = x2n+1 , 0 ≤ s − s2n−1 ≤ s2n − s2n−1 = x2n .

P (−1)n converges for α > 0. Note that by ExamExample 4.1.11. The series nα ple 4.1.8, the series absolutely converges only for α > 1. Therefore the series conditionally converges for 0 < α ≤ 1. Exercise P a n 4.1.27. Suppose b 6= 0. Find all the combinations of a and b such that n b converge. Exercise 4.1.28. Find all a 6= e−1 such that will be dealt with in Exercise 4.1.48).

P (na)n converges (the case a = e−1 n!

P P 2 Exercise 4.1.29. Construct a convergent series xn such that xn P the series diverges. By Exercise 4.1.22, weQ see that the convergence of xn does not necessarily imply the convergence of (1 + xn ). Exercise 4.1.30. Suppose xn is decreasing and limn→∞ xn = 1. Prove that the Q (−1)n “alternating infinite product” xn converges. Then find suitable xn , such P (−1)n that the series Q yn defined by 1 + yn = xn diverges. This shows P that the convergence of (1 + yn ) does not necessarily imply the convergence of yn .

We have Dirichlet and Abel tests for the convergence of improper integrals in Propositions 3.3.3 and 3.3.4, especially in the conditionally convergent cases. The analogue of the tests for the convergence of the series is the following. Proposition 4.1.5 (Dirichlet Test). Suppose P xn is monotone and limn→∞ P xn = 0. Suppose the partial sums of the series yn are bounded. Then xn y n converges. Proposition 4.1.6 P (Abel Test). Suppose P xn is monotone and bounded. Suppose the series yn converges. Then xn yn converges. Proof. The proof is the discrete version of the proof for the convergence of improper integrals. First we need the discrete version of the integration by parts, which already appeared in Exercise 3.2.55 and the proof of Theorem 3.3.6. P Let sn be the partial sum of yn . Then for n ≥ m, n X

xk yk =xm (sm − sm−1 ) + xm+1 (sm+1 − sm ) + · · · + xn (sn − sn−1 )

k=m

= − xm sm−1 − (xm+1 − xm )sm − (xm+2 − xm+1 )sm+1 − · · · − (xn − xn−1 )sn−1 + xn sn =xn sn − xm sm−1 −

n−1 X

(xk+1 − xk )sk .

k=m

214

CHAPTER 4. SERIES sn ..................................................................................................................................................................... . . sn−1 ......................................................................................................................................................................... . ....................................................................................................................................................................................................................................................................................... ... . . . . ............................................................................................................................... ... ... ... . . .. .. .. .. ......................................................................................................... .. .. .. .. .. .. .. .. .. .. .. sm+1.................................................................................... .. .. .. .. .. .. . . .. .. .. .. . . . . sm ........................................................... .. .. .. .. .. .. .. . . .. .. .. .. . . . . . . .. .. .. .. .. .. sm−1................................................................... .. .. .. .. .. .. .. .................................................... ... .. .. .. .. .. .. ......................................... . .. .. .. .. .. .. ................................................................ ... .. . . . . .. ........................................ . ............................................................................................................................................................. 0 xn−1 xn xm xm+1 xm+2

Figure 4.1: the big square is equal to shaded small square, plus vertical strips, plus horizontal strips The geometric meaning of the equality is illustrated in Figure 4.1. Under the assumption of either test, we have |sk | ≤ M for a constant M and all k. Moreover, the increments xk+1 − xk do not change sign. Therefore n−1 n−1 n−1 X X X (xk+1 − xk ) = M |xn −xm |. M |xk+1 −xk | = M (xk+1 − xk )sk ≤ k=m

k=m

k=m

Assuming the condition of the Dirichlet test, for any > 0, there is N , such that k > N implies |xk | < . Then n ≥ m > N implies n X xk yk ≤ |sn | + |sm−1 | + 2M ≤ 4M . k=m P This verifies the Cauchy criterion for the convergence of x n yn . Assuming the condition of the Abel test, we know both limn→∞ xn and limn→∞ xn sn converge. The Cauchy criterion for both limits tell us that for any > 0, there is N , such that m, n > N implies |xn − xm | < and |xn sn − xm sm | < . Then n ≥ m > N implies n X xk yk ≤ + M = (M + 1). k=m P This also verifies the Cauchy criterion for the convergence of x n yn . Example 4.1.12. Let us consider the discrete version of Example 3.3.12. We are concerned with the series X sin na X 1 = sin na. nα nα P If a is not a multiple of π, then the partial sum of sin na is 1 1 cos n + a − cos a n X 2 2 sin ka = , 1 2 sin a k=1 2

4.1. SERIES OF NUMBERS

215

1 1 . Moreover, for α > 0, the sequence is decreasing which is bounded by nα sin 1 a 2 and converges to 0. By the Dirichlet test, the series converges. P sin na Combined with the discussion in Example 4.1.10, we see that the series nα absolutely converges for α > 1 and conditionally converges for 0 < α ≤ 1. Exercise 4.1.31. Determine the absolute and conditional convergence of series 1.

P

(−1)n nα logβ n.

2.

P

(−1)n . (log n)α log n n(n−1) 2

P (−1) 3. nα

.

4.

5.

P (−1)

n(n−1) 2

nα

P cos na . nα

sin n

.

P sin3 na . nα P √ 7. sin n. 6.

8.

√ P sin n . nα

Exercise 4.1.32. Derive the Leibniz test and the Abel test from the Dirichlet test. P an Exercise 4.1.33. Prove that if β > α, then the convergence of implies the nα P an convergence of . nβ

4.1.4

Rearrangement of Series

P P A rearrangement of a series xn is xkn , where n → kn is a one-to-one correspondence from the index set to itself (i.e., a rearrangement of the indices). Proposition 4.1.7. Any rearrangement of an absolutely convergent series is still absolutely convergent. Moreover, the sum is the same. P Proof. Let s = xn . For any > 0, there is a natural number N , such that PN 0 P∞ 0 i=N +1 |xi | < . Let N = max{i : ki ≤ N }. Then i=1 xki contains all the P P 0 terms x1 , x2 , . . . , xN . Therefore for n > NP, the difference ni=1 xki − N i=1 xi ∞ is a sum of some non-repeating terms in i=N +1 xi , and we have n ∞ n N N X X X X X |xi | < 2. xk i − s ≤ xki − xi + xi − s ≤ 2 i=1

i=1

i=1

i=1

i=N +1

The absolute convergence ofP the rearrangement may be obtained by applying what was just proved to |xn |.

In contrast to absolutely convergent series, Exercise 4.1.2 shows that the rearrangement of conditionally convergent series may change the sum. In fact, the rearrangement can produce any behavior we wish to have. Proposition 4.1.8. A conditionally convergent series may be rearranged to have any number as the sum, or to become divergent.

216

CHAPTER 4. SERIES

P 00 P P 0 xn be the Proof. Suppose xn conditionally converges. Let xn and series obtained byP respectively taking only P 00the non-negative P P 0 terms and the 0 negative terms. xn −P xn also converges. n converges, P If xP P 00then xn = Therefore |xn | = x0n − xn converges. Since |xn | is assumed to P 0 0 diverge, the shows that xP n diverges. Because xn ≥ 0, we P contradiction 00 0 have xn = −∞. conclude xn = +∞. Similarly, P we 0 Let s be any number. By xn = +∞, there is m1 , such that m 1 −1 X

x0k

≤s<

k=1

Then by

P

x0k .

k=1

x00n = −∞, there is n1 , such that nX 1 −1

x00k

≥s−

k=1

Then by

m1 X

P

n>n1

m1 X

x0k

>

k=1

n1 X

x00k .

k=1

x0n = +∞, there is m2 , such that m 2 −1 X

x0k

m1 X

≤s−

k=m1 +1

x0k

−

k=1

n1 X

x00k

k=1

m2 X

<

x0k .

k=m1 +1

Keep going, we get −x00np

≥s−

mp X

x0k

−

np X

x00k

> 0,

−x0mp

≤s−

np−1

x0k

−

X

k=1

k=1

k=1

mp X

x00k < 0.

k=1

Then for mp−1 < j ≤ mp , by x0k > 0, we have −x0mp

≤

np−1 X 0 xk − s− k=1 k=1 mp X

x00k

≤

np−1 X 0 s− xk − k=1 k=1 j X

mp−1

x00k

≤ s−

np−1 X 0 xk − k=1 k=1

X

x00k ≤ −x00np−1 ,

and for np−1 < j ≤ np , by x00k < 0, we have −x00np

≥ s−

mp X k=1

x0k −

np X k=1

x00k

≥ s−

mp X k=1

x0k −

j X k=1

x00k

≥ s−

mp X k=1

np−1

x0k −

X

x00k ≥ −x0mp .

k=1

P

The convergence of xn implies limn→∞ xn = 0. Then the two estimations above show that the rearranged series x01 + · · · + x0m1 + x001 + · · · + x00n1 + x0m1 +1 + · · · + x0m2 + x00n1 +1 + · · · + x00n2 + · · · converges to s. 1 1 1 Exercise 4.1.34. Rearrange the series 1 − + − + · · · so that p positive terms 2 3 4 are followed by q negative terms and the pattern repeated. Show that the sum of p 1 the new series is log 2 + log . For any real number, expand the idea to construct 2 q a rearrangement to have the given number as the limit.

4.1. SERIES OF NUMBERS

217

Exercise 4.1.35. Prove satisfies |kn − n| < M for a P that if the rearrangementP constant M , then xkn converges if and only if xn converges, and the sums are the same. P Exercise 4.1.36. Suppose xn conditionally converges. Prove that for any s and t satisfying s < t, there is a rearrangement with the partial sum s0n satisfying limn→∞ s0n = s, limn→∞ s0n = t. Moreover, prove that any number between s and t is the limit of a convergent subsequence of {s0n }.

P P The product of two series xn and ynPinvolves the product xm yn for all m and n. In general, the product series xP sense only after m yn makesP we arrange all the terms into a linear sequence (xy)k = xmk ynk , which is given by a one-to-one correspondence (mk , nk ) : N → N × N. For example, the following is the “diagonal arrangement” X (xy)k =x1 y1 + x1 y2 + x2 y1 + · · · (4.1.4) + x1 yn−1 + x2 yn−2 + · · · + xn−1 y1 + · · · , and the following is the “square arrangement” X (xy)k =x1 y1 + x1 y2 + x2 y2 + x2 y1 + · · ·

(4.1.5)

+ x1 yn + x2 yn + · · · + xn yn−1 + xn yn + xn yn−1 + · · · + xn y1 + · · · ....n .. .. .. .. .... ......... ......... (xy) . •........ 7 ..•....... ..•....... .•....... . ..... ..... .. ..... ..... ........ ..... .. ..... ... .(xy) .. (xy) .. •........ 4 ..•....... 8 ..•....... .•....... .... .... ... .. ...... .. ...... ......... ........ ..... (xy) . (xy) . (xy) . •........ 2 .•....... 5 .•....... 9 .•....... .. .. .. .. ..... .. ...... ......... ......... .(xy) .. (xy) .. (xy) .. (xy) .......... m •................1.......•............3........•............6........•...........10

......................................... ....n .. .. .. .. . 10 (xy)11 (xy)12 (xy)13 . .•.(xy) .. ................•...............•...............•... .. .. .. ..(xy)5 (xy)6 (xy)7 ..(xy)14 ... .. •.................•...............•... •... .. .. .. .. .. .. .. .. (xy)3 .(xy)8 .(xy)15 2 .•..(xy) •.. •.. ...............•... .. .. .. .. ..(xy)1 ..(xy)4 ...(xy)9 ...(xy)16 •........................•........................•........................•......................... m

Figure 4.2: diagonal and square arrangements In view of Proposition 4.1.7, the following result shows that the arrangement does not matter if the series absolutely converge. P P Proposition 4.1.9. Suppose xn and yn absolutely converge. Then P P P P xm yn also absolutely converges, and xm yn = ( xm ) ( yn ). Proof. Let s=

X

xm , t =

X

yn , S =

X

|xm |, T =

For any > 0, there is a natural number N , such that X X |xm | < , |yn | < . m>N

n>N

X

|yn |

218

CHAPTER 4. SERIES

Then N X xm − s < , m=1

N X yn − t < , |s| ≤ S, n=1

N X yn ≤ T. n=1

PK Let K = max{k : mk ≤ N and nk ≤ N }. Then i=1 (xy)i contains all the terms xmyn with 1 ≤m, n ≤ N . Therefore for k > K, the difference Pk PN PN is a sum of some non-repeating terms i=1 (xy)i − m=1 xm n=1 yn xm yn with either m > N or n > N , and we have ! N ! k N X X X X xm yn ≤ |xm yn | (xy)i − m=1 n=1 i=1 either m>N or n>N X X ≤ |xm yn | + |xm yn | m>N

X

=

n>N

|xm |

X

|yn | +

X

m>N

|xm |

X

|yn |

n>N

≤ (T + S). Combined with N N N N ! ! N X X X X X xm − s yn + |s| yn − t yn − st ≤ xm m=1 n=1 n=1 n=1 m=1 N X ≤ yn + |s| ≤ (T + S), n=1

we get k X (xy)i − st ≤ 2(T + S). i=1

The absolute convergence ofP the productPseries may be obtained by applying what was just proved to |xm | and |yn |. Example 4.1.13. The geometric series absolutely converges to The product of two copies of the geometric series is X i,j≥0

ai aj =

∞ X X n=1 i+j=n

an =

∞ X

1 for |a| < 1. 1−a

(n + 1)an .

n=1

Thus we conclude 1 + 2a + 3a2 + · · · + (n + 1)an + · · · =

1 . (1 − a)2

Exercise 4.1.37. Use the Taylor series of ex to verify ex ey = ex+y . P P Exercise 4.1.38. Suppose xn and yn converge necessarily absolutely). P (not P Does the square arrangement (4.1.5) converge to ( xn )( yn )?

4.1. SERIES OF NUMBERS

219

P P Exercise 4.1.39. Suppose xn absolutely converges and yn converges. Prove that the diagonal arrangement (4.1.4) converges. Moreover, show that the condiP (−1)n √ tion of absolute convergence is necessary by considering the product of n with itself. Exercise 4.1.40. Let pn , n = 1, 2, . . . , be prime numbers in increasing order. Let Sn be the set of natural numbers whose prime factors are among p1 , p2 , . . . , pn . For example, 20 6∈ S2 and 20 ∈ S3 because the prime factors of 20 are 2 and 5. Q 1 −1 P 1 = n∈Sk α . 1. Prove that ki=1 1 − α pi n Q 1 1 − α converge? 2. When does the infinite product pn 3. When does the series

P 1 converge? pαn

1 −1 when i=1 pαi the right side converges. The equality relates the function to the number theory. Note that the zeta function (see Example 4.1.8) ζ(α) =

4.1.5

Q∞

1−

Additional Exercise

Convergence of Series P 1 converges Exercise 4.1.41. Suppose xn > 0 and xn is increasing. Prove that xn P n if and only if converges. x1 + x2 + · · · + xn P Exercise 4.1.42. Suppose xn > 0. Prove that xn converges if and only if P xn converges. x1 + x2 + · · · + xn P xn converges. Exercise 4.1.43. Suppose xn ≥ 0. Prove that (x1 + x2 + · · · + xn )2 Exercise 4.1.44. Suppose Pn and decreasing sequence. Prove that Pnxn is a non-negative (x − x ) = if limP x = 0 and n→∞ n n i=1 i i=1 xi − nxn ≤ B for a fixed bound B, then xn converges.

Raabe2 and Bertrand3 Tests Exercise 4.1.16 is the mother of ratio tests. By applying the exercise to the power series, we get the ratio test in Example 4.1.8 and Exercise 4.1.18. By applying the exercise to the series in Examples 4.1.8 and 4.1.9, we get the Rabbe and Bertrand tests. xn+1 ≤ 1 − a for some a > 1 and big n, then Exercise 4.1.45. Prove that if xn n P xn absolutely converges. 2 Joseph Ludwig Raabe, born 1801 in Brody (now Ukrain), died 1859 in Z¨ urich (Switzerland). 3 Joseph Louis Fran¸cois Bertrand, born 1822 and died 1900 in Paris (France).

220

CHAPTER 4. SERIES

xn+1 ≥ 1 − 1 for some constant a and big n, Exercise 4.1.46. Prove that if xn n−a P then xn does not absolutely converge. Exercise 4.1.47. Rephrase the Raabe test in Exercises 4.1.45 and 4.1.46 in terms xn . of the quotient xn+1 P α(α + 1) · · · (α + n) Exercise 4.1.48. Determine the convergence of the series β(β + 1) · · · (β + n) P nn and . en n! P Exercise 4.1.49. Derive the condition for the absolute convergence of xn in terms xn+1 a and 1 − , or the comparison between of the comparison between xn n log n xn a xn+1 and 1 + n log n .

Kummer Test Exercise 4.1.50. Prove that if there are cn > 0 and δ > 0, such that cn − xn+1 ≥ δ for sufficiently big n, then P xn absolutely converges. cn+1 xn Exercise 4.1.51. Prove that if xn > 0, and there are cn > 0, such that cn − P 1 P xn+1 cn+1 ≤ 0 and diverges, then xn diverges. xn cn P Exercise 4.1.52. Prove xn absolutely converges, then there are cn > 0, that if xn+1 ≥ 1 for all n. such that cn − cn+1 xn Exercise 4.1.53. Derive ratio, Raabe and Bertrand tests from the Kummer test.

Absolutely Convergence of Infinite Product Q Suppose x = 6 −1. An infinite product (1 + xn ) absolutely converges if n Q (1 + |xn |) converges. Q Exercise 4.1.54. Prove that (1 + xn ) absolutely converges if and only if the P Q series xn absolutely converges. Moreover, (1 + |xn |) = +∞ if and only if P xn = +∞. Exercise 4.1.55. Prove that if the infinite product absolutely converges, then the infinite product converges. Q Exercise 4.1.56. Suppose 0 < xn
Ratio Rule Q xn+1 , the limit of the sequence xn xn+1 xn can be studied by considering the ratio . This leads to the extension xn of the ratio rules in Exercises 1.1.32, 2.2.29, 2.2.32. By relating xn to the partial product of

4.1. SERIES OF NUMBERS

221

xn+1 ≤ 1 − yn and 0 < yn < 1. Use Exercises 4.1.54 Exercise 4.1.58. Suppose xn P and 4.1.56 to prove that if yn = +∞, then limn→∞ xn = 0. xn+1 P ≥ 1 + yn and 0 < yn < 1. Prove that if Exercise 4.1.59. Suppose yn = xn +∞, then limn→∞ xn = ∞. xn+1 Exercise 4.1.60. Suppose 1 − yn ≤ ≤ 1 + zn and 0 < yn , zn < 1. Prove that xn P P if yn and zn converge, then limn→∞ xn converges to a nonzero limit. 1

Exercise 4.1.61. Study limn→∞

(n + a)n+ 2 , the case not yet settled in Exercise (±e)n n!

2.2.30.

An Example by Borel P√ Suppose an > 0 and an converges. Suppose {rn } is all the rational P an for numbers in [0, 1]. We study the convergence of the series |x − rn | x ∈ [0, 1]? √ √ Exercise 4.1.62. Prove that if x 6∈ ∪n (rn − c an , rn + c an ), then the series converges. P√ 1 an < Exercise 4.1.63. Use Heine-Borel theorem to prove that if , then 2c √ √ [0, 1] 6⊂ ∪n (rn − c an , rn + c an ). By Exercise 4.1.62, this implies that the series converges for some x ∈ [0, 1].

Approximate Partial Sum by Integral The P proof of Proposition 4.1.3 gives an estimation of the partial sum for f (n) by the integral of f (x) on suitable intervals. The idea leads to an estimation of n! in Exercise 4.1.26. In what follows, we study the approximation in general. Suppose f (x) is a decreasing function on [1, +∞) Z satisfying limx→+∞ f (x) = n

0. Denote dn = f (1) + f (2) + · · · + f (n − 1) −

f (x)dx. 1

Exercise 4.1.64. Prove that dn is increasing and 0 ≤ dn ≤ f (1)−f (n). This implies that dn converge to a limit γ. 1 Exercise 4.1.65. Prove that if f (x) is convex, then dn ≥ (f (1) − f (n)). 2 Exercise 4.1.66. By using Exercise 3.1.53, prove that if f (x) is convex and differentiable, then for m > n, we have dn − dm + 1 (f (n) − f (m)) ≤ 1 (f 0 (m) − f 0 (n)). 8 2 Exercise 4.1.67. For convex and differentiable f (x), prove dn − γ + 1 f (n) ≤ − 1 f 0 (n). 2 8

222

CHAPTER 4. SERIES

Exercise 4.1.68. By using Exercise 3.2.34, prove that if f (x) has second order derivative, then for m > n, we have m−1 m−1 1 X 1 1 0 1 X 00 0 inf f ≤ dn −dm + (f (n)−f (m))− (f (n)−f (m)) ≤ sup f 00 . 24 2 8 24 [k,k+1] [k,k+1] k=n

k=n

Exercise 4.1.69. Let γ be the Euler-Mascheroni constant in Exercise 1.4.38. Prove 1 1 1 1 1 1 ≤ 1 + + ··· + − log n − γ + + 2 ≤ . 2 24(n + 1) 2 n−1 2n 8n 24(n − 1)2 √ 1 1 1 Exercise 4.1.70. Estimate 1+ √ + √ +· · ·+ √ −2 n (see Exercise 1.2.14). n−1 2 3

4.2

Series of Functions

We study of the series of functions, especially the Taylor and Fourier series. To preserve the properties such as limit, differentiation, integration of the functions under the infinite sum, some condition on the uniformity on the convergence is needed. We first introduce such uniformity condition for sequences and series of functions. Then we will study Taylor and Fourier series in detail.

4.2.1

Uniform Convergence

A sequence of functions fn (x) converges to a function f (x) if limn→∞ fn (x) = f (x) for each x. The following are some examples. 1 = 0, n→∞ n + x ( 0 if |x| < 1 lim xn = . n→∞ 1 if x = 1 x n lim 1 + = ex , n→∞ n

x ∈ (−∞, +∞)

lim

x ∈ (−1, 1] x ∈ (−∞, +∞)

The second example indicates that the continuity of fn (x) does not necessarily imply the continuity of the limit function. In other words, the equality limn→∞ limx→a fn (x) = limx→a f (x) = limx→a limn→∞ fn (x) does not necessarily hold. The next example shows that the limit and the integration may not commute. Example 4.2.1. Consider the function fn (x) in Figure 4.3. We have limn→∞ nfn (x) = Z 1 1 0 and nfn (x)dx = . In particular, 2 0 Z lim

n→∞ 0

1

1 nfn (x)dx = = 6 2

Z

1

lim nfn (x)dx = 0.

0 n→∞

4.2. SERIES OF FUNCTIONS

223

.... .. 1 ..... .... .. ...... .. .. .. .. .. ... .. .. ... .. .. ... .. .. ... .... ... .... ... .... .. .... ... . .. ................................................................................................................ 0 ... n1 1 . Figure 4.3: a non-uniform convergent sequence Z Z

b

fn (x)dx =

Given limn→∞ fn (x) = f (x) on [a, b], how can we prove limn→∞ a

b

f (x)dx? We have a

Z b Z b Z b fn (x)dx − f (x)dx ≤ |fn (x) − f (x)|dx. a

a

a

If for any > 0, there is N − f (x)| Z, bsuch that |fZn (x) < for all n > N and b x ∈ [a, b], then we have fn (x)dx − f (x)dx ≤ (b − a) for n > N . a

a

The proof leads to the following definition. Definition 4.2.1. A sequence of functions fn (x) uniformly converges to a function f (x) on [a, b] if for any > 0, there is N , such that n > N, x ∈ [a, b] =⇒ |fn (x) − f (x)| < .

(4.2.1)

The key point of the definition is that the inequality holds for all x in the defining domain. In other words, N is independent of the choice of x. Of course, the defining domain does not have to be closed interval only. The interval [a, b] in the definition can be replaced by any set of numbers. ... ... ... ............................... . . . . . . . . . . ... .......... ... .......... fn . . . . . . . . . . . . . . . . ... ............. ....... ................................. ............... ... .......... ......................... ......... .................. .... .. f + ...... .......... .. ........ ..... ......... . . . . . . . . . . . . . . . . . . .... .......... . . . . . . . . . .......... ........ f .. ... ...... ............... ....... ... ..... ....... ....... .. f − . ............. .. .. ...... .................................................................................................................................................. a b Figure 4.4: uniform convergence It is not difficult to derive the Cauchy criterion for the uniform convergence (the defining domain is omitted in the statement).

224

CHAPTER 4. SERIES

Proposition 4.2.2 (Cauchy Criterion). A sequence fn (x) uniformly converges if and only if for any > 0, there is N , such that m, n > N =⇒ |fm (x) − fn (x)| < .

(4.2.2)

Example 4.2.2. For any fixed R and > 0, we have 1 1 1 < . x ≥ R, n > − R =⇒ n + x > =⇒ n + x 1 uniformly converges to 0 on [R, +∞) for any R. n+x On the other hand, the sequence is defined on the set X of all real numbers except negative integers. However, the sequence is not uniformly convergent on X. Specifically, for = 1 and any N , pick a natural number n > N . Then for 1 1 x = −n + ∈ X, we have − 0 = 2 > . 2 n+x Therefore

Example 4.2.3. Let 0 < R < 1. For any > 0, we have |x| ≤ R, n > logR =

log =⇒ |xn | ≤ Rn < RlogR = . log R

Therefore xn uniformly converges to 0 on [−R, R] for any 0 < R < 1. On the other hand, for any 0 < < 1 and any N , pick a natural number √ n > N . Then we have 0 < x = n < 1 and xn = . This shows the convergence is not uniform on (0, 1). x Example 4.2.4. We have limn→∞ n log 1 + = x for any x. Moreover, for any n fixed R and > 0, by Taylor expansion we have x R2 R2 x c2 |x| ≤ R, n > =⇒ n log 1 + − x = n − 2 − x ≤ < , 2 n n 2n 2n where |c| < |x|. Therefore the convergence is uniform on [−R, R] for any R > 0. x n Taking the exponential, we get limn→∞ 1 + = ex . To see that the n convergence is also uniform on [−R, R], we use the uniform continuity of ex on [−R − 1, R + 1]. For any > 0, there is 0 < δ < 1, such that |x| ≤ R + 1, |y| ≤ R + 1, |x − y| < δ =⇒ |ex − ey | < . Then by the estimation above, we get x R2 =⇒ n log 1 + − x ≤ δ < 1 2δ n x x n =⇒ 1 + − ex = en log(1+ n ) − ex < . n x n On the other hand, for any fixed n, we have limx→+∞ 1 + − ex = n ∞. This that for = 1 and any natural number n, there is x, such implies x n that 1 + − ex > . This shows that the convergence is not uniform on n (−∞, +∞). |x| ≤ R, n >

Exercise 4.2.1. Determine the intervals on which the sequences uniformly converge.

4.2. SERIES OF FUNCTIONS 1

1. x n .

4. 1

2. n(x n − 1). 3.

1 . nx + 1

sin nx . n

x 5. sin . n √ 6. n 1 + xn .

225 1 α . x+ n x α 8. 1 + . n x 9. log 1 + . n 7.

Exercise 4.2.2. Suppose limn→∞ fn (x) = f (x) uniformly on [a, b] and uniformly on [b, c]. Prove that limn→∞ fn (x) = f (x) uniformly on [a, c]. What about other combinations of intervals? Exercise 4.2.3. Prove that limn→∞ fn (x) = f (x) uniformly if and only if limn→∞ sup |fn (x)− f (x)| = 0, where the supremum is taken over all x in the defining domain. Exercise 4.2.4. Suppose limn→∞ fn (x) = f (x) uniformly on X. Suppose g(y) is a function on Y such that g(y) ∈ X for any y ∈ Y . Prove that limn→∞ fn (g(y)) = f (g(y)) uniformly on Y . Is it also true that limn→∞ fn (x) = f (x) not uniformly implies limn→∞ fn (g(y)) = f (g(y)) not uniformly? Exercise 4.2.5. Suppose limn→∞ fn (x) = f (x) uniformly on X. Suppose fn (x) ∈ Y for all n and all x ∈ X. Prove that if g(y) is uniformly continuous on Y , then limn→∞ g(fn (x)) = g(f (x)) uniformly on X. Exercise 4.2.6. Are the sum, product, composition, maximum, etc, of two uniformly convergent sequences of functions still uniformly convergent? Exercise 4.2.7. Suppose f (x) Prove that the sequence is integrable on [a, b +Z1]. x+1 1 Pn−1 i fn (x) = f x+ converges uniformly to f (t)dt on [a, b]. n i=0 n x

P A series un (x) of functions uniformly converges if the sequence of partial sum P functions uniformly converges. Applying the Cauchy criterion, we find un (x) uniformly converges if and only if for any > 0, there is N , such that n ≥ m > N =⇒ |um (x) + um+1 (x) + · · · + un (x)| < .

(4.2.3)

Based on this, it is possible to extend various results on the convergence of series of numbers to the uniform convergence P of series of functions. For example, the uniform convergence of a series un (x) would imply that the sequence un (x) uniformly converges to 0. The comparison test can also be extended. P Proposition 4.2.3 (Comparison Test). Suppose |un (x)| ≤ vn (x). If vn (x) P uniformly converges, then un (x) uniformly converges. P P It is possible for un (x) to uniformly converge but |un (x)| not uniformly converge (either diverge, or converge but not uniformly). So we also need to extend the Leibniz test, the Dirichlet test, and the Abel test. The proof is the same as before and makes the additional use of the following condition: A sequence of functions fn (x) is uniformly bounded if there is M , such that |fn (x)| < M for all n and x. Proposition 4.2.4 (Leibniz Test). Suppose un (x) P is a monotone sequence for each x, and uniformly converges to 0. Then (−1)n un (x) uniformly converges.

226

CHAPTER 4. SERIES

Proposition 4.2.5 (Dirichlet Test). Suppose un (x) is a monotone sequence P for each x, and uniformly converges P to 0. Suppose the partial sums of vn (x) are uniformly bounded. Then un (x)vn (x) uniformly converges. Proposition 4.2.6 (Abel Test). Suppose un (x) P is a monotone sequence for each x, and is uniformly bounded. Suppose vn (x) uniformly converges. P Then un (x)vn (x) uniformly converges. Note that in the monotone condition, it is allowed to have un (x) increasing for some x and decreasing for some other x. (−1)n ≤ 1 . By considering 1 as constant functions, Example 4.2.5. We have 2 n + x2 n2 n2 P 1 uniformly converges. Therefore by the comparison test, the series the series n2 n P (−1) also uniformly converges. n2 + x2 Example 4.2.6. The partial sum of the geometric series 1 + x + x2 + · · · + xn + · · · 1 − xn+1 1 is , and the sum is for |x| < 1. By 1−x 1−x 1 − xn+1 1 |xn+1 | 1 − x − 1 − x = 1 − x , and an argument similar to the one in Example 4.2.3, we find the series uniformly converges on [−R, R] for any 0 < R < 1, but does not uniformly converge on (−1, 1). P (−1)n n x converges to a function f (x) on (−1, 1]. We Example 4.2.7. The series n expect f (x) = − log(1 + x) but the fact is yet to be established. Where does it converge uniformly? (−1)n n For any 0 < R < 1, we have x ≤ Rn on [−R, R]. By considering each nP Rn as a constant function, the series Rn converges uniformly. Therefore by the n P (−1) n comparison test, the series x converges uniformly on [−R, R]. n The convergence is not uniform on (−1, 0]. For any 0 < x < 1, we have 2n X (−1)i xn+1 xn+2 x2n x2n (−x)i = + + ··· + ≥ . n+1 n+2 i 2n 2 i=n+1

x2n 1 For = and any n, we can find x close to 1, such that > . Therefore the 3 2 Cauchy criterion for the uniform convergence fails. The convergence is uniform on [0, 1]. For each x ∈ [0, 1], the series is an 1 alternating series and xn is decreasing. By the estimation in Proposition 4.1.4, n we get n X (−1)i i 1 1 x − f (x) ≤ xn+1 ≤ n+1 i n+1 i=0

for all x ∈ [0, 1]. From this it is easy to see the series uniformly converges on [0, 1]. The uniform convergence on [0, 1] will be extended in a later Proposition 4.2.13.

4.2. SERIES OF FUNCTIONS

227

1 1 1 x+ x2 +· · ·+ xn +· · · 1! 2! n! of ex converges to ex . By the remainder formula (2.3.6) in Proposition 2.3.3, the difference between the sum and the partial sum is

Example 4.2.8. By Example 2.3.16, the Taylor series 1+

n X 1 e|x| |x|n+1 k x x −e < . k! (n + 1)! k=0

Then it is easy to see that for any R, the Taylor series uniformly converges for |x| ≤ R. n x Rn P Rn . Then the series , in which Alternatively, for |x| ≤ R, we have ≤ n! n! n! each term is considered as a constant function, uniformly converges. Therefore by the comparison test, the Taylor series uniformly converges. On the other hand, the Taylor series of ex does not uniformly converge on xn (−∞, +∞) because the sequence does not uniformly converge to 0. n! P sin nx converges for all Example 4.2.9. By Example 4.1.12, we know the series nα x and α > 0. sin nx P 1 1 If α > 1, then by α ≤ α and the convergence of , we conclude n n nα P sin nx that uniformly converges on the whole real line. nα Suppose 0 < α ≤ 1. Then in the argument in Example 4.1.12, the partial P 1 . For any δ > 0, the bounded sum of the series sin nx is bounded by sin 1 x 2 1 is uniform on the interval [δ, 2π − δ]. Since α can be considered as a monotone n sequence uniformly converging to 0, we may use the Dirichlet test to conclude that P sin nx the series uniformly converges on [δ, 2π − δ]. nα Exercise 4.2.8. Determine the intervals on which the series uniformly converge. 1.

P

xn e−nx .

P xn √ . n P x(x + n) n 3. . n

2.

4.

P

5.

P

6.

P

1 . + xα

sin nx

7.

P

1 . x + an

8.

P cos nx . nα

xn . 1 − xn

9.

P sin3 nx . nα

nα

nα (log n)β

.

P Exercise 4.2.9. Show that (−1)n xn (1 − x) uniformly converges on [0, 1] and P absolutely converges for each x ∈ [0, 1]. However, the series |(−1)n xn (1 − x)| does not uniformly converge. P Exercise 4.2.10. Regarding Exercise 4.1.25, if fn (x)2 uniformly converges, can P fn (x) 1 you conclude that also uniformly converges for any α > ? α n 2 Exercise 4.2.11. Can you establish the uniform convergence version of Exercise 4.1.16?

228

4.2.2

CHAPTER 4. SERIES

Properties of Uniform Convergence

Proposition 4.2.7. Suppose fn (x) uniformly converges to f (x) for x 6= a. If limx→a fn (x) = ln exists, then both limn→∞ ln and limx→a f (x) converge and are equal. The conclusion is lim lim fn (x) = lim lim fn (x).

x→a n→∞

n→∞ x→a

(4.2.4)

Therefore uniform convergence implies that the two limits limx→a and limn→∞ commute. Proof. For any > 0, there is N , such that m, n > N, x 6= a =⇒ |fm (x) − fn (x)| < .

(4.2.5)

Taking limx→a of the right side of (4.2.5), we get |lm − ln | ≤ . Therefore limn→∞ ln converges by the Cauchy criterion. Let l = limn→∞ ln . Then n > N =⇒ |l − ln | ≤ . On the other hand, taking limm→∞ of the right side of (4.2.5), we get n > N, x 6= a =⇒ |f (x) − fn (x)| ≤ . Fix one n > N . Since limx→a fn (x) = ln , there is δ > 0, such that 0 < |x − a| < δ =⇒ |fn (x) − ln | < . Then for the fixed choice of n, we have 0 < |x − a| < δ =⇒ |f (x) − l| ≤ |f (x) − fn (x)| + |fn (x) − ln | + |l − ln | < 3. This proves that limx→a f (x) = l. An immediate consequence of Proposition 4.2.7 is that the uniform convergence preserves the continuity. Proposition 4.2.8. Suppose fn (x) is continuous and the sequence fn (x) converges to f (x). 1. If the convergence is uniform, then f (x) is also continuous. 2. (Dini Theorem) If fn (x) is monotone in n and f (x) is continuous, then the convergence is uniform on any bounded and closed interval. Proof. By taking ln = limx→a fn (x) = fn (a), the first part is a consequence of Proposition 4.2.7. Now suppose fn (x) is monotone in n and f (x) is continuous. If the limit lim fn (x) = f (x) is not uniform on [a, b], then there is > 0, a subsequence fnk (x), and a sequence xk ∈ [a, b], such that |fnk (xk ) − f (xk )| ≥ . In the

4.2. SERIES OF FUNCTIONS

229

bounded and closed interval [a, b], xk has a convergent subsequence. Thus without loss of generality, we may assume xk converges to c ∈ [a, b]. By the monotone assumption, we have n ≤ m =⇒ |fn (x) − f (x)| ≥ |fm (x) − f (x)| ≥ . Therefore n ≤ nk =⇒ |fn (xk ) − f (xk )| ≥ |fnk (xk ) − f (xk )| ≥ . For each fixed n, by taking k → ∞ and the continuity of f (x) and fn (x), we get |fn (c) − f (c)| ≥ . This contradicts with the assumption that fn (c) converges to f (c). Thus we proved that fn (x) must converge to f (x) uniformly on [a, b]. Note that in the proof of the second part, we only need fn (x) to be monotone for each x. It is not required that the sequence should be increasing for all x or decreasing for all x. Example 4.2.10. Since xn are continuous and the limit ( 0 if 0 ≤ x < 1 lim xn = n→∞ 1 if x = 1 is not continuous on [0, 1], the convergence is not uniform on [0, 1]. On the other hand, for any 0 < R < 1, the sequence xn is decreasing in n on [0, R] and the limit function 0 is continuous on the interval. By Dini Theorem, the limit limn→∞ xn = 0 is uniform on [0, R]. Example 4.2.11. ByExample 4.2.4, we know the function ex is the uniform limit n x on [−R, R] for any R > 0. Since polynomials are of the polynomials 1 + n continuous, we conclude that ex is continuous on any [−R, R]. In other words, ex is continuous on (−∞, +∞). Instead of deriving the continuity of ex from the uniform convergence, we may x n x also use the continuity of e and the fact that the sequence 1 + is monotone n to derive that the convergence is uniform on [−R, R] for any R > 0. Example 4.2.12. The function fn (x) given in Figure 4.3 is continuous, and the limit limn→∞ fn (x) = 0 is also continuous. However, the convergence is not uniform. Note that the sequence fn (x) is not monotone, so that Dini Theorem cannot be applied. Exercise 4.2.12. Suppose fn (x) uniformly converges to f (x). Prove that if fn (x) are uniformly continuous, then f (x) is uniformly continuous.

The discussion leading to the definition of uniform convergence makes us expect the following. Proposition 4.2.9. Suppose fn (x) is integrable on a bounded interval [a, b] and the sequence fn (x) uniformly converges. Then limn→∞ fn (x) is integrable, with Z b Z b lim fn (x)dx = lim fn (x)dx. (4.2.6) a n→∞

n→∞

a

230

CHAPTER 4. SERIES

Proof. For any > 0, there is N , such that m, n > N, x ∈ [a, b] =⇒ |fm (x) − fn (x)| < . This further implies Z b Z b m, n > N =⇒ fm (x)dx − fn (x)dx ≤ (b − a). a a Z b By Cauchy criterion, the limit limn→∞ fn (x)dx converges. Denote the a

limit by I. Let f (x) = limn→∞ fn (x). For any > 0, there is N , such that n > N, x ∈ [a, b] =⇒ |fn (x) − f (x)| < . This implies that for any partition P of [a, b] and the same choice of x∗i for all functions, we have n > N =⇒ |S(P, fn (x)) − S(P, f (x))| < (b − a). Now we fix one n > N satisfying Z b < . f (x)dx − I n a

For the fixed integrable function fn (x), there is δ > 0, such that Z b fn (x)dx < . kP k < δ =⇒ S(P, fn (x)) − a

Combining everything together, we find kP k < δ implies |S(P, f (x)) − I| ≤|S(P, fn (x)) − S(P, f (x))| Z b Z b + S(P, fn (x)) − fn (x)dx + fn (x)dx − I a

a

<(b − a + 2). Z This shows that f (x) is integrable, with

b

f (x)dx = I. a

Example 4.2.13. Let rn , n = 1, 2, . . . , be all the rational numbers in [0, 1]. Then the functions ( 1 if x = r1 , r2 , . . . , rn fn (x) = 0 otherwise are integrable. However, the limit limn→∞ fn (x) = D(x) is the Dirichlet function, which is not Riemann integrable. Of course the limit is not uniform. The example shows that the limit of Riemann integrable functions are not necessarily Riemann integrable. Thus we often need to attach the uniform convergence condition if we want to consider the limit of Riemann integral. The annoyance will be resolved by the introduction of Lebesgue integral. The Lebesgue integral is an extension of the Riemann integral that allows more functions (such as the Dirichlet function) to be integrable, and has the property that the limit of Lebesgue integrable functions are (almost always) Lebesgue integrable.

4.2. SERIES OF FUNCTIONS

231

Example 4.2.14. By Example 4.2.3, the limit ( 0 n lim x = f (x) = n→∞ 1

if |x| < 1 if x = 1

is not uniform on [0, 1]. However, we still have Z 1 Z 1 1 xn dx = lim lim f (x)dx. =0= n→∞ n + 1 n→∞ 0 0 In general, we have the dominant convergence theorem in the theory of Lebesgue integral: If fn (x) are uniformly bounded and Lebesgue integrable, then f (x) = limn→∞ fn (x) is Lebesgue integrable, and the equality (4.2.6) holds. Exercise 4.2.13. Suppose fn (x) is integrable on a bounded interval Z [a, b] and x

limn→∞ fn (x) = f (x) uniformly. Prove that the convergence limn→∞ Z x lim fn (t)dt is uniform for x ∈ [a, b].

fn (t)dt = a

a n→∞

Exercise 4.2.14. Extend Proposition 4.2.9 to the Riemann-Stieltjes integral. Exercise 4.2.15. Suppose f is Riemann-Stieltjes integrable with respect to each αn . Will the uniform convergence of αn tell you something about the limit of Z b f dαn . a

Proposition 4.2.10. Suppose fn (x) are differentiable on an interval, such that fn (x0 ) converges at some point x0 and fn0 (x) uniformly converges to g(x). Then fn (x) uniformly converges and 0 lim fn (x) = lim fn0 (x). (4.2.7) n→∞

n→∞

Proof. For any > 0, there is N , such that 0 m, n > N =⇒ |fm (x) − fn0 (x)| < , |fm (x0 ) − fn (x0 )| < .

Applying the mean value theorem to fm (x) − fn (x), we get 0 |(fm (y) − fn (y)) − (fm (x) − fn (x))| = |fm (c) − fn0 (c)||y − x| ≤ |x − y| (4.2.8)

for m, n > N and any x, y in the interval. In particular, we get |fm (x) − fn (x)| ≤ |x0 − x| + |fm (x0 ) − fn (x0 )| ≤ (|x0 − x| + 1) for m, n > N and any x. This implies that the sequence fn (x) uniformly converges on any bounded interval. Let f (x) = limn→∞ fn (x). By taking limm→∞ in (4.2.8), we get |(f (y) − f (x)) − (fn (y) − fn (x))| = |(f (y) − fn (y)) − (f (x) − fn (x))| ≤ |y − x|. This means that for fixed x, the function gn (y) =

fn (y) − fn (x) uniformly y−x

f (y) − f (x) for all y not equal to x. By Proposition y−x 4.2.7, we conclude that limy→x g(y) converges and

converges to g(y) =

f 0 (x) = lim g(y) = lim lim gn (y) = lim fn0 (x). y→x

n→∞ y→x

n→∞

232

CHAPTER 4. SERIES

x n Exercise 4.2.16. Derive the derivative of ex from ex = limn→∞ 1 + or ex = n 1 1 1 limn→∞ 1 + x + x2 + · · · + xn and the uniform convergence. 1! 2! n!

The properties of uniform convergence can be rephrased for series. P 1. Suppose un (x) uniformly converges. Then taking the limit commutes with the sum: X X lim un (x) = lim un (x). x→a

2. P Suppose un (x) is continuous and un (x) is also continuous.

x→a

P

un (x) uniformly converges. Then

P 3. (Dini Theorem) Suppose un (x) is continuous, u (x) ≥ 0 and un (x) n P converges to a continuous function. Then un (x) uniformly converges. P 4. P Suppose un (x) is integrable and un (x) uniformly converges. Then un (x) is integrable and the integration commutes with the sum: Z bX XZ b un (x)dx = un (x)dx. a

a

P P 0 5. P Suppose un (x0 ) converges and un (x) uniformly converges. Then un (x) uniformly converges and is differentiable, and the derivative commutes with the sum: X 0 X un (x) = u0n (x). Example 4.2.15. By Example 4.1.8, the Riemann zeta function ζ(x) = 1 +

1 1 1 + + ··· + x + ··· 2x 3x n

is defined on (1, +∞). For any R > 1, we have x ≥ R =⇒ 0 <

1 1 ≤ R. x n n

P 1 Since the series of numbers (considered as constant functions) converges, nR P 1 uniformly converges on [R, +∞). Therefore ζ(x) is continuous the series nx on [R, +∞) for any R > 1. Since R > 1 is arbitrary, we conclude that ζ(x) is continuous on (1, +∞). P log n Consider the series − obtained by differentiating ζ(x) term by term. nx log n log n 1 For any R > 1, choose R0 satisfying R > R0 > 1. Then 0 < ≤ R < R0 x n n n P 1 for x > R and sufficiently big n. The convergence of the series of numbers nR 0 P log n implies the series − uniformly convergent on [R, +∞) for any R > 1. This nx P log n implies that ζ(x) is differentiable and ζ 0 (x) = − . Further argument shows nx that ζ(x) has derivative of any order.

4.2. SERIES OF FUNCTIONS

233

Example 4.2.16. Let h(x) be given by h(x) = |x| on [−1, 1] and h(x + 2) = h(x) for any x. The function is continuous and satisfies 0 ≤ h(x) ≤ 1. Therefore f (x) =

∞ n X 3 n=0

4

h(4n x)

uniformly converges and is a continuous function. However, we will show that the function is not differentiable anywhere. 1 Let δk = ± . Then for any n, by |h(x) − h(y)| ≤ |x − y|, we have 2 · 4k h(4n (a + δk )) − h(4n a) 4n δk ≤ = 4n . δk δk For n > k, 4n δk is a multiple of 2, and we have h(4n (a + δk )) − h(4n a) = 0. δk 1 For n = k, we have 4k δk = ± . By choosing ± sign so that there is no integer 2 1 between 4k a and 4k a ± , we can make sure that |h(4k (a + δk )) − h(4k a)| = 2 1 k k |4 (a + δk ) − 4 a| = . Then 2 h(4k (a + δk )) − h(4k a) = 4k . δk Thus for any fixed a, by choosing a sequence δk with suitable ± sign, we get k k−1 n X f (a + δk ) − f (a) 3 3 3k − 1 3k + 1 k n k ≥ 4 − 4 = 3 − = . δk 4 4 3−1 2 n=0

f (a + δ) − f (a) diverges. δ Exercise 4.2.17. Justify the equalities. Z 1 ∞ X (−a)n−1 1. xax dx = . nn 0 This implies that limδ→0

n=1

Z

a

2. λa

Z

∞ X

! n

n

λ tan λ x dx = − log | cos a| for |a| <

n=0

+∞

(ζ(t) − 1)dt =

3. x

∞ X n=2

1 nx log n

π , |λ| < 1. 2

for x > 1.

Exercise 4.2.18. Find the places where the series converge and have derivatives. Also find the highest order of the derivative. P 1 P 1 n 1. . 2. x+ . n(log n)x n

234 3.

CHAPTER 4. SERIES P+∞

4.2.3

n=−∞

1 . |n − x|α

4.

P (−1)n . nx

Power Series

A power series is a series of the form ∞ X

an x n = a0 + a1 x + a2 x 2 + · · · + an x n + · · · ,

n=0

or more generally, of the form series.

P∞

n=0

an (x − x0 )n . Taylor series are power

Theorem 4.2.11. Suppose R=

1 limn→∞

p . n |an |

(4.2.9)

Then the power series absolutely converges for |x| < R and diverges for |x| > R. Moreover, for any 0 < r < R, the power series uniformly converges for |x| ≤ r. Proof. For any 0 < r < R, choose r0 satisfying r < r0 < R. Then we have p 1 > limn→∞ n |an |. By the second part of Proposition 1.2.10, this means 0 r p 1 that 0 > n |an | for all but finitely many n. Thus for |x| ≤ r, we have r n |x| n r n p n n |an ||x| < |an x | = ≤ 0 r0 r P r n for all but finitely many n. Since 0 < r < r0 , the series of numbers P r0 n converges. Then by the comparison test, the series an x of functions uniformly converges for |x| ≤ r. Since the series converges for |x| ≤ r, where r can be any number satisfying 0 < r < R, we conclude that the series converges for |x| < R. On p 1 the other hand, if |x| > R, then < limn→∞ n |an |. By the first part of |x| p 1 Proposition 1.2.10, there are infinitely many n satisfying < n |an |, or |x| P n n |an x | > 1. Thus the terms of the series an x do not converge to 0, and the series diverges. The number R in Theorem 4.2.11 is called the radius of convergence for the power series. By the theorem, the radius is characterized by the property that the power series converges for |x| < R and diverges for |x| > R. P Example 4.2.17. By Example 4.2.6, the geometrical series xn converges for |x| < 1 and diverges for |x| ≥ 1. Thus the radius of convergence of the geometric series

4.2. SERIES OF FUNCTIONS

235

p is 1. Alternatively, the radius of convergence can be obtained by limn→∞ n |an | = √ limn→∞ n 1 = 1. P xn By Example 4.2.8 (or Example 2.3.16), the Taylor series of ex converges n! for any x. Therefore the radius of convergence is ∞. By the similar reason, the radius of convergence for the Taylor series of sin x and cos x is also ∞. p P (−1)n+1 n The Taylor series of log(1 + x) is x . Since limn→∞ n |an | = n r 1 1 limn→∞ n = 1, the radius of convergence is = 1. n 1 √ n Example 4.2.18. By limn→∞ n! = ∞, the radius of convergence for the series P n n!x is 0. In √ other words, the series diverges for all x 6= 0. P By limn→∞ n 2n + 3n = 3, the radius of convergence for the series (2n +3n )xn 1 is . 3 r P (−1)n n 1 x is By limn→∞ n n = 0, the radius of convergence for the series n nn ∞. In other words, the series converges for all x. P Exercise 4.2.19. Prove the radius of convergence R of a power series an xn satisfies an an ≤ R ≤ lim . lim n→∞ an+1 n→∞ an+1 Then use the result to show that the radius of convergence for the Taylor series of (1 + x)α and log(1 + x) is 1. P Exercise 4.2.20. Suppose the radii of convergence for the power series an xn and P bn xn are R and R0 . P 1. Prove that the radius of convergence for the sum power series (an + bn )xn is at least min{R, R0 }. 2. Prove that if R 6= R0 , then the radius of convergence for the sum power series is equal to min{R, R0 }. P What about the radius of convergence for the product power series (a0 bn + a1 bn−1 + · · · + an b0 )xn ? Exercise 4.2.21. Find the radius of convergence. 1.

P

2

xn .

P 2n n x . n P n n2 −1 3. 2 x .

6.

P

n+1 n

2.

P (2n)! n 4. x . (n!)2 5.

P

n2

a xn .

P 7. (−1)n

P 8. (−1)n P 9. (−1)n

n

xn .

n+1 n

n2

n+1 n

n2

n+1 2n

n

xn . 2

xn .

xn .

Due to the uniform convergence, the calculus of the power series can be done term by term.

236

CHAPTER 4. SERIES

Proposition 4.2.12. Suppose R is the radius of convergence of a power P n series an x . Then the sum has derivative of any order for |x| < R. Moreover, the derivative and integral can be taken term by term for |x| < R: (a0 + a1 x + · · · + an xn + · · · )0 = a1 + 2a2 x + · · · + nan xn−1 + · · · , Z x a2 an a1 (a0 + a1 t + · · · + an tn + · · · ) dt = a0 x + x2 + x3 + · · · + xn+1 + · · · . 2 3 n 0 Note that by the formula (4.2.9), the radius of convergence for the derivative and integral power series remains the same. P n Example 4.2.19. The power series nx = x + 2x2 + 3x3 + · · · has radius of convergence 1. To find the sum, we note that 0 1 1 2 2 3 0 1 + 2x + 3x + · · · = (1 + x + x + x + · · · ) = = . 1−x (1 − x)2 Therefore X

nxn = x(1 + 2x + 3x2 + · · · ) =

x . (1 − x)2

Example 4.2.20. By the equality 1 = 1 + x + x2 + · · · + xn + · · · , |x| < 1, 1−x we get 1 = 1 − x + x2 + · · · + (−1)n xn + · · · , |x| < 1. 1+x By integration, we get 1 (−1)n n+1 1 x + · · · , |x| < 1. log(1 + x) = x − x2 + x3 + · · · + 2 3 n+1 Exercise 4.2.22. Find the sum of power series. 1.

P∞

2.

P∞

2 xn .

3.

P∞

xn . n(n + 1)

x2n+1 . 2n + 1

4.

P∞

xn . n(n + 1)(n + 2)

n=1 n

n=0

n=1

n=1

Exercise 4.2.23. Find the Taylor series of the function and the radius of convergence. Then explain why the sum of the Taylor series is the given function. √ 1. arcsin x. 3. arctan x. 5. log(x + 1 + x2 ). Z x Z x Z x dt sin t −t2 √ 6. dt. 2. dt. 4. e dt. t 1 − t4 0 0 0 Exercise 4.2.24. Verify that the functions defined by the power series are the solutions of the differential equations. 1. f (x) =

P∞

x4n , f (4) (x) = f (x). (4n)!

2. f (x) =

P∞

xn , xf 00 (x) + f 0 (x) − f (x) = 0. (n!)2

n=0

n=0

4.2. SERIES OF FUNCTIONS

237

xn n=1 2 = − n that f (x) + f (1 − x) + log x log(1 − x) = f (1).

Exercise 4.2.25. Show that f (x) =

P∞

Z 0

x

log(1 − t) dt and then verify t

Exercise 4.2.26 (Euler). The vibration of a circular drumhead is described by the differential equation b2 1 0 2 00 f (x) + f (x) + a − 2 f (x) = 0. x x Verify that f (x) = xb 1 −

ax 4 ax 6 1 ax 2 1 1 + − + ··· b+1 2 2!(b + 1)(b + 2) 2 3!(b + 1)(b + 2)(b + 3) 2

is a solution of the equation. Exercise 4.2.27. Derive the derivative of ex , sin x and cos x from the uniform convergence of their Taylor series. P Exercise 4.2.28. Prove that if the radius of convergence for the power series an xn is nonzero, then the sum function f (x) satisfies f (n) (0) = n!an .

The results so far concerns only with what happens within the radius of convergence. The following describes what happens at the radius of convergence. Proposition 4.2.13 (Abel Theorem). Suppose R is the radius of convergence of a power series. If the power series converges at x = R, then the series uniformly converges on [0, R]. If the power series converges at x = −R, then the series uniformly converges on [−R, 0]. P By Proposition 4.2.7, a consequence of Abel theorem is that if an x n converges at x = R > 0, then X X lim− an x n = an R n . x→R

Similar equality holds when the power series converges at x = −R. P Proof. Suppose an Rn converges. Then the Abel test in Proposition 4.2.6 xn may be applied to un (x) = n and vn (x) = an Rn on [0, R]. The functions R un (x) are all bounded by 1, and thePsequence un (x) is increasing for each x ∈ [0, R]. The convergence of series vnP (x) is uniform because the series is P independent of x. The conclusion is that un (x)vn (x) = an xn converges uniformly on [0, R]. Example 4.2.21. By Example 4.2.17, the Taylor series of log(1 + x) has radius of convergence 1. The series also converges at x = 1 because it is alternating. By Example 4.2.20 and the remark made after Proposition 4.2.13, we have 1 1 (−1)n+1 + + ··· + + ··· 2 3 n 1 2 1 3 (−1)n n+1 = lim x − x + x + · · · + x + ··· 2 3 n+1 x→1− = lim log(1 + x) = log 2. 1−

x→1−

238

CHAPTER 4. SERIES

Exercise 4.2.29. Use the Taylor series of arctan x to show that 1−

1 1 1 1 1 π + − + − + ··· = . 3 5 7 9 11 4

Then show 1+

π log 2 1 1 1 1 1 1 1 − − + + − − + ··· = + . 2 3 4 5 6 7 8 4 2

Exercise 4.2.30. Discuss the convergence of the Taylor series of arcsin x at the radius of convergence. Exercise 4.2.31. By making use of the Taylor series of (1 + x)α , prove that ∞ X α(α − 1) · · · (α − n + 1)

n!

n=0

and

= 2α , α > −1,

∞ X α(α + 1) · · · (α + n − 1)

n!

n=0

= 0, α < 0.

Note that the convergence can be obtained by the Leibniz test and the Raabe test (see Example 4.1.45). P Exercise 4.2.32. The series cn with cn = a0 bn +a1 bn−1 +· · ·+an b0 is obtained by P combining terms in the diagonal arrangement of the product of the series a and n P P P P n, n, n at x = 1, prove bn . By considering the power series a x b x c x n n n P P P P P P that if an , bn and cn converge, then cn = ( an )( bn ). Compare with Exercise 4.1.39.

4.2.4

Fourier Series

A trigonometric series is an infinite sum of trigonometric functions ∞

a0 X (an cos nx + bn sin nx). + 2 n=1

(4.2.10)

If the series uniformly P converges to a continuous function f (x) (this happens, for example, when (|an | + |bn |) converges), then the sum f (x) is necessarily a periodic function with period 2π: f (x + 2π) = f (x). The inner product of two periodic integrable real functions with period 2π is Z 2π hf, gi = f (x)g(x)dx. (4.2.11) 0

Because of the periodicity, the integral can be taken over any interval of length 2π. The two functions are said to be orthogonal if hf, gi = 0. A collection of such functions is called an orthogonal system if they are pairwise orthogonal.

4.2. SERIES OF FUNCTIONS

239

The inner product is linear with respect to f : hf1 + f2 , gi = hf1 , gi + hf2 , gi, hcf, gi = chf, gi. It is also symmetric: hf, gi = hg, f i. Therefore the inner product is also linear in g, and we say the inner product is bilinear. Moreover, the inner product has the positivity property: hf, f i ≥ 0. By Exercise 3.2.3, the functions 1 φ1 (x) = , φ2 (x) = cos x, φ3 (x) = sin x, φ4 (x) = cos 2x, φ5 (x) = sin 2x, . . . 2 used in the construction of the trigonometric series form an orthogonal system. Denote P the corresponding coefficients by c1 = a0 , c2 = a1 , c3 = b1 , . . . . Suppose |cn | converges. Then by the uniform convergence and the bilinear property of the inner product, we have Z 2π f (x)φn (x)dx hf, φn i = 0 Z 2π Z 2π Z 2π = c0 φ0 (x)φn (x)dx + c1 φ1 (x)φn (x)dx + c2 φ2 (x)φn (x)dx + · · · 0

0

0

= c0 hφ0 , φn i + c1 hφ1 , φn i + c2 hφ2 , φn i + · · · = cn hφn , φn i. Therefore cn =

hf, φn i . hφn , φn i

In our case (again by the computation in Exercise 3.2.3), we get Z Z 1 2π 1 2π an = f (x) cos nxdx, bn = f (x) sin nxdx. π 0 π 0

(4.2.12)

(4.2.13)

These are the Fourier coefficients. Conversely, for any periodic integrable function with period 2π, we may compute the Fourier coefficients as above and construct the Fourier series ∞

f (x) ∼

a0 X + (an cos nx + bn sin nx). 2 n=1

The notation ∼ only means that the function and the series are related. The two will be equal only under certain conditions. Example 4.2.22. Let 0 ≤ a ≤ 2π. Then ( 0 if 2nπ < x < a + 2nπ f (x) = , n∈Z 1 if a + 2nπ < x < 2(n + 1)π

240

CHAPTER 4. SERIES

is the periodic function with period 2π that uniquely extends the function ( 0 if 0 < x < a f (x) = 1 if a < x < 2π on (0, 2π). Note that since the Fourier coefficients are defined as integrals, the values of the function at 0 and a are not important. The Fourier series is 1−

X 1 a − (sin na cos nx + (1 − cos na) sin nx). 2π nπ

Example 4.2.23. Let f (x) be the periodic function with period 2π given by f (x) = x2 on (0, 2π). Then Z 8π 2 1 2π 2 x dx = a0 = , π 0 3 Z Z 2π Z 2π 2 2 4 1 2π 2 x cos nxdx = − x sin nxdx = 2 2π − cos nxdx = 2 , an = π 0 nπ 0 n π n 0 Z 2π Z 2π 1 1 bn = x2 sin nxdx = − 4π 2 − 2 x cos nxdx π 0 nπ 0 Z 2π 4π 2 4π =− − 2 sin nxdx = − . n n π 0 n Therefore the Fourier series is 4π 2 X + 3

4π 4 cos nx − sin nx . n2 n

Exercise 4.2.33. Compute the Fourier series for the periodic functions with period 2π and satisfy the given formulae. ( 0 if 0 < x < a 1. f (x) = x on (0, 2π). 5. f (x) = . x − a if a < x < 2π 2. f (x) = ax on (0, 2π). 3. f (x) = ax on (−π, π). 4. f (x) = | sin x|.

( x2 6. f (x) = −x2

if 0 < x < π . if π < x < 2π

Exercise 4.2.34. Suppose l > 0 and f (x) is a periodic function with period l: lx f (x + l) = f (x). Then f is a periodic function with period 2π. Use the 2π observation to derive the formula for the coefficients of the Fourier series ∞ 2nπx a0 X 2nπx + an cos + bn sin . 2 l l n=1

Exercise 4.2.35. What can you say about the Fourier series of odd periodic functions? What about even periodic functions? π Exercise 4.2.36. How can you extend a function f (x) on 0, so that its Fourier 2 P series is of the form b2n−1 sin(2n − 1)x? Exercise 4.2.37. Suppose k is a integer. How are the Fourier series of f (x) and f (kx + a) related?

4.2. SERIES OF FUNCTIONS

241

Exercise 4.2.38. Suppose f (x) is a periodic differentiable function with period 2π, such that f 0 (x) is integrable. How are the Fourier series of f (x) and f 0 (x) related? Exercise 4.2.39. Show that {cosZnx} is an orthogonal system on (0, π) with respect π

to the inner product hf, gi =

f (x)g(x)dx. Show that {sin nx} is also an or0

thogonal system on (0, π). However, the combined system is not an orthogonal system on (0, π).

Similar to the Taylor series, we are interested in whether the Fourier series of f (x) converges to f (x). We start the discussion with the following result. Proposition 4.2.14 (Bessel Inequality). For any periodic integrable function f (x) of period 2π, the Fourier coefficients satisfy Z a20 X 2 1 2π 2 + (an + bn ) ≤ f (x)2 dx. (4.2.14) 2 π 0 P Proof. Let sN = N n=1 cn φn . Then hf, sN i = =

=

=

N X n=1 N X

cn hf, φn i

(inner product is bilinear)

c2n hφn , φn i

n=1 N X N X

(definition of cn )

cm cn hφm , φn i

m=1 n=1 * N X

N X

m=1

n=1

φm ,

(φn is orthogonal)

+ φn

= hsN , sN i.

(inner product is bilinear)

Thus by the bilinearity and the symmetric property of the inner product, we have 0 ≤ hf − sN , f − sN i = hf, f i − hf, sN i − hsN , f i + hsN , sN i Z 2π N X = hf, f i − hf, sN i = f (x)2 dx − c2n hφn , φn i. 0

n=1

In our case, we have 1 1 π , = , hcos nx, cos nxi = π, hsin nx, sin nxi = π, 2 2 2 and the inequality (4.2.14) follows. Note that the Bessel inequality applies to any orthogonal system {φn }: hf, f i ≥

∞ X n=1

c2n hφn , φn i, where cn =

hf, φn i . hφn , φn i

(4.2.15)

As a matter of fact, the inequality becomes an equality for the trigonometric series (the fact is related to the completeness of the trigonometric system).

242

CHAPTER 4. SERIES

Proposition 4.2.15 (Riemann-Lebesgue Lemma). Suppose f (x) is an integrable function on a bounded interval [a, b]. Then Z lim

n→∞

b

b

Z f (x) cos nxdx = 0, lim

n→∞

a

f (x) sin nxdx = 0.

(4.2.16)

a

Proof. For a periodic integrable function f (x) with period 2π, Proposition 4.2.14 tells us that the Fourier coefficients converge to 0, which means Z

Z n→∞

0

2π

f (x) sin nxdx = 0.

f (x) cos nxdx = 0, lim

lim

n→∞

2π

0

If |b − a| ≤ 2π, then f (x) may be extended to a periodic integrable function F (x) with period 2π satisfying ( f (x) if a < x < b F (x) = . 0 if b < x < a + 2π Then b

Z

Z f (x) cos nxdx =

a

a+2π

2π

Z F (x) cos nxdx =

a

F (x) cos nxdx. 0

We just proved that this approaches 0 as n → ∞. Z In general, for bounded [a, b], the integration

b

may be divided into a

finitely many integrations on intervals of length ≤ 2π. Then the limit also approaches 0.

A more general version of Riemann-Lebesgue Lemma can be found in Exercise 3.2.44. Note that by cos(n + λ)x = cos λx sin nx − sin λx cos nx and the similar formula for sin(n + λ)x, we also have Z lim

n→∞

b

b

Z f (x) cos(n + λ)xdx = 0, lim

a

n→∞

f (x) sin(n + λ)xdx = 0. (4.2.17) a

Exercise 4.2.40. Suppose f (x) is a periodic function with period 2π that is monotone on (a, a + 2π) for some a. Prove that there is a constant M , such that the M M and |bn | ≤ . Can you draw stronger Fourier coefficients satisfy |an | ≤ n n conclusion if f 0 (x) is monotone? Exercise 4.2.41. Suppose f (x) is decreasing limx→+∞ f (x) = 0. Prove that Z +∞ Z and +∞ f (x) cos nxdx converges and limn→∞ f (x) cos nxdx = 0. a

a

Now we are ready for a simple convergence criterion for the Fourier series.

4.2. SERIES OF FUNCTIONS

243

Theorem 4.2.16. Suppose f (x) is a periodic integrable function with period 2π. Suppose at a, f (x) has the left limit f (a− ) and the right limit f (a+ ), and there are M and δ > 0, such that 0 < t < δ =⇒ |f (a + t) − f (a+ )| ≤ M t, |f (a − t) − f (a− )| ≤ M t. f (a+ ) + f (a− ) at a. 2 The condition of the theorem is satisfied everywhere if the function is piecewise Lipschitz. Then the Fourier series converges to

Proof. The partial sum of the Fourier series is sN (a) = =

= = =

N a0 X + (an cos na + bn sin na) 2 n=1 ! Z N 1 2π 1 X f (t) + (cos nt cos na + sin nt sin na) dt π 0 2 n=1 ! Z N 1 2π 1 X f (t) + cos n(t − a) dt π 0 2 n=1 Z Z 1 2π−a 1 2π f (t)DN (t − a)dt = f (t)DN (t)dt π 0 π −a Z 1 π f (a + t)DN (t)dt, π −π

where the last equality is due to the fact that the integrand is periodic. The Dirichlet kernel function 1 sin N + t N 1 X 2 DN (t) = + cos nt = . (4.2.18) t 2 n=1 2 sin 2 satisfies 1 π

Z

0

1 DN (t)dt = π −π

Z 0

π

1 DN (t)dt = . 2

Therefore f (a+ ) + f (a− ) sN (a) − 2 Z Z Z 1 π 1 0 1 π − = f (a + t)DN (t)dt − f (a )DN (t)dt − f (a+ )DN (t)dt π −π π −π π 0 Z Z 1 0 1 π − = (f (a + t) − f (a ))DN (t)dt + (f (a + t) − f (a+ ))DN (t)dt. π −π π 0 f (a + t) − f (a− ) is integrable on t 2 sin 2 [−π, −] for any > 0. Moreover, the assumption tells us that the function Note that for any fixed a, the function

244

CHAPTER 4. SERIES

is bounded on [−π, 0]. Thus by Exercise 3.1.8, the function is integrable on [−π, 0]. Then by (4.2.17), Z 0 Z 0 f (a + t) − f (a− ) 1 − (f (a + t) − f (a ))DN (t)dt = sin N + tdt t 2 −π −π 2 sin 2 Z π (f (a+t)−f (a+ ))DN (t)dt approaches 0 as N → ∞. By the similar reason, 0

also converges to 0.

Example 4.2.24. The function in Example 4.2.23 satisfies the condition of Theorem 4.2.16. By evaluating the Fourier series at x = 0, we get (2π)2 + 02 4π 2 X 4 = + . 2 3 n2 From this we get 1 1 1 π2 + + · · · + + · · · = . 12 22 n2 6 Exercise 4.2.42. By evaluating the Fourier series in Example 4.2.23, show that P sin nx π−x = on (0, 2π). Then derive the sum of the following infinite series. n 2 1 1 1 1 1 + ···. 1. 1 − + − + − 3 5 7 9 11 1 1 1 1 1 2. 1 + − − + + + ···. 5 7 11 13 17 1 1 1 1 1 1 1 3. 1 + − − + + − − + ···. 2 4 5 7 8 10 11 1 1 1 1 1 1 1 4. 1 + − − + + − − + ···. 3 5 7 9 11 13 15 P 1 Exercise 4.2.43. By studying the Fourier series of x4 , compute . n4 Exercise 4.2.44. Prove that if f (x) has continuous second order derivative, then the Fourier series uniformly converges to f (x).

4.2.5

Additional Exercise

Uniform Convergence of Double Sequence A double sequence xm,n is indexed by natural numbers m and n. We write limn→∞ xm,n = lm if the limit holds for any fixed m. We say the convergence is uniform if for any > 0, there is N , such that n > N =⇒ |xm,n − lm | < . The key point here is that N depends on P only, and is independent of m. The uniform convergence of series n xm,n can also be defined accordingly. Exercise 4.2.45. Determine uniform convergence.

4.2. SERIES OF FUNCTIONS 1. limn→∞ 2. limn→∞ 3. limn→∞

245

1 = 0. m+n m = 0. m+n 1 = 0. mn

4. limn→∞

1 = 0 (n > 1). nm

5. limn→∞

√ n m = 1.

6. limn→∞

m = 0. n

Exercise 4.2.46. Construct a double sequence xm,n satisfying limm→∞ xm,n = 0 and limn→∞ xm,n = 1. In particular, the double limits limm→∞ limn→∞ xm,n and limn→∞ limm→∞ xm,n are different. Exercise 4.2.47. Are the sum, product, maximum, etc, of two uniformly convergent double sequences still uniformly convergent? Exercise 4.2.48. Suppose limn→∞ xm,n = lm uniformly and limm→∞ xm,n = kn exists. Prove that both limm→∞ lm and limn→∞ kn converge and are equal. Exercise P 4.2.49. State the Cauchy criterion for the uniform convergence of the series n xm,n . State the corresponding comparison test. Exercise 4.2.50. Determine uniform convergence (α, β > 0). P mn P 1 1. . 3. na n αm . n 2.

P

n

1 mα nβ

.

4.

P (m + n)! n a . n m!n!

Double Series P A double series m,n≥1 xm,n may converge in several different ways. First, the double series converges to sum s if for any > 0, there is N , such that m X n X m, n > N =⇒ xm,n − s < . i=1 j=1 P P P Second, the double series has P repeated sum x if m,n m n n xm,n conP verges for each m and the series m( n xm,n ) again converges. Of course, P P there is another repeated sum n m xm,n . 2 Third, a one-to-one correspondence P k ∈ N 7→ (m(k), n(k)) ∈ N arranges the double series into a single series k xm(k),n(k) , and we may consider the sum of the single series. Fourth, for any finite subset A ⊂ N2 , we may define the partial sum X sA = xm,n . (m,n)∈A

Then for any sequence Ak of finite subsets satisfying Ak ⊂ Ak+1 and ∪Ak = N2 , we say the double series converges to s with respect to the sequence Ak if limk→∞ sAk = s. For example, we have the spherical sum by considering Ak = {(m, n) ∈ N2 : m2 + n2 ≤ k 2 } and the triangular sum by considering Ak = {(m, n) ∈ N2 : m + n ≤ k}. Finally, the double series absolutely converges (see Exercise 4.2.52 for the reason for the terminology) to s if for any > 0, there is N , such that |sA − s| < for any A containing all (m, n) satisfying m ≤ N and n ≤ N .

246

CHAPTER 4. SERIES

Exercise 4.2.51. State the Cauchy criterion for the convergence of the double series P m,n≥1 xm,n (in the first sense). State the corresponding comparison test. P Exercise 4.2.52. Prove that aP double series m,n≥1 xm,n absolutely converges (in the final sense) if and only if m,n≥1 |xm,n | converges (in the first sense). P Exercise 4.2.53. Prove that a double series m,n≥1 xm,n absolutely converges if P and only if all the arrangement series k xm(k),n(k) converge. Moreover, the arrangement series have the same sum. P Exercise 4.2.54. Prove that a double series m,n≥1 xm,n absolutely converges if and only if the double series converge with respect to all the sequences An . Moreover, the sums with respect to all the sequences An are the same. P Exercise 4.2.55. Prove that if a double series m,n≥1 xm,n absolutely converges, then the two repeated sums converge and have the same value. Exercise 4.2.56. If a double series does not converge absolutely, what can happen to various sums? Exercise 4.2.57. Study the convergence and values of the following double series. 1.

P

2.

P

m,n≥1 a

m,n≥1

mn .

1 . (m + n)α

3.

P

(−1)m+n . (m + n)α

4.

P

1 . nm

m,n≥1

m,n≥2

Uniform Convergence of Two Variable Function Let f (x, t) be a two variable function. If limt→a f (x, t) = l(x), then we say f (x, t) converges to l(x) as t → a. The convergence is uniform if for any > 0, there is δ > 0, such that 0 < |t − a| < δ =⇒ |f (x, t) − l(x)| < . Similar definition can be made for the one-side limit and the limit at infinity. Exercise 4.2.58. Determine the intervals on which the limits are uniformly convergent. 1. limt→∞

1 = 0. xt

2. limt→0 xt = 1. 3. limt→0 t(xt − 1) = 1.

1 = 0. tx + 1 √ √ 5. limt→0 x + t = x. x t 6. limt→∞ 1 + = et . t 4. limt→∞

Exercise 4.2.59. State the Cauchy criterion for f (x, t) to uniformly converges to l(x) as t → a. Exercise 4.2.60. Prove that f (x, t) uniformly converges to l(x) as x → a if and only if f (x, tn ) uniformly converges to l(x) for any sequence tn satisfying tn 6= a and limn→∞ tn → a. Exercise 4.2.61. Suppose f (x) is continuous on an open interval containing [a, b]. Prove that f (x + t) uniformly converges to f (x) on [a, b] as t → 0.

4.2. SERIES OF FUNCTIONS

247

Exercise 4.2.62. Suppose f (x) has continuous an open interval containing [a, b]. f (x + t) − f (x) Prove that uniformly converges to f 0 (x) on [a, b] as t → 0. t Exercise 4.2.63. Suppose f (x) is continuous on an open interval containing [a, b]. Prove that the fundamental theorem of calculus Z 1 x+t lim f (t)dt = f (x) t→0 t x converges uniformly. Exercise 4.2.64. What properties of the uniform convergence of fn (x) still hold for the uniform convergence of f (x, t)?

P∞ an n=1 x n P∞ an The series n=1 x has many properties similar to the power series. n

The Series

P an Exercise 4.2.65. Use the Abel test in Proposition 4.2.6 to show that if connr P an P an P an verges, then uniformly converges on [r, +∞), and = limx→r+ . nx nr nx P an Exercise 4.2.66. Prove that there is R, such that converges on (R, +∞) and nx log |an | diverges on (−∞, R). Moreover, prove that R ≥ limn→∞ . log n Exercise 4.2.67. Prove that we can take terms wise integration and derivative of any order of the series on (R, +∞). Exercise 4.2.68. Prove that there is R0 , such that the series absolutely converges on log |an | (R0 , +∞) and absolutely diverges on (R0 , +∞). Moreover, prove that limn→∞ + log n log |an | + 1. 1 ≥ R0 ≥ limn→∞ log n Exercise 4.2.69. Give an example such that the inequalities in Exercises 4.2.66 and 4.2.68 are strict.

Continuous But Not Differentiable Function Exercise 4.2.70. P Let an be any sequence of points. Let |b| < 1. Prove that the function f (x) = bn |x − an | is continuous and is not differentiable precisely at {an }. Exercise 4.2.71 (Riemann). Let ((x)) be the periodic function with period 1, deP 1 1 1 ((nx)) termined by ((x)) = x for − < x < and (( )) = 0. Let f (x) = ∞ . n=1 2 2 2 n2 1. Prove that f (x) is not continuous precisely at rational numbers with even a denominators, i.e., number of the form r = , where a and b are odd 2b integers. 2. Compute f (r+ ) − f (r) and f (r− ) − f (r) at discontinuous points (you may need the conclusion of Exercise 4.2.24 for the precise value). Z x 3. Prove that f (x) is integrable, and F (x) = f (x)dx is not differentiable 0

precisely at rational numbers with even denominators.

248

CHAPTER 4. SERIES

Exercise 4.2.72 (Weierstrass). Suppose 0 1 + . Prove that ∞ n=0 b cos(a πx) is continuous but nowhere 2 differentiable. Exercise 4.2.73 (Weierstrass). Let {an } be a bounded countable set of numbers. P x n Let h(x) = x + sin log |x|, h(0) = 0, and 0 < b < 1. Prove that ∞ n=1 b h(x − an ) 2 is continuous, strictly increasing, and is not differentiable precisely at {an }.

Chapter 5 Multivariable Function

249

250

5.1

CHAPTER 5. MULTIVARIABLE FUNCTION

Limit and Continuity

The extension of the analysis from single variable to multivariable means that we are going from the real line R to the Euclidean space Rn . While many concepts and results may be extended, the Euclidean space is more complicated in various aspects. The first complication is the many possible choices of the distance in a Euclidean space, which we will show to be all equivalent as far as the mathematical analysis is concerned. The second complication is that it is not sufficient to just do analysis on the rectangles, which are the obvious generalizations of the intervals on the real lines. For example, to analyze the temperature around the globe, we need to deal with a function defined on the 2-dimensional sphere inside the 3-dimensional Euclidean space. Thus we need to set up proper topological concepts that extend closed interval (closed subsets), bounded closed interval (compact subsets), and open interval (open subsets). Once the concepts are established, we find that most of the discussion about the limits and continuity of single variable functions may be extended to multivariable functions (the only exceptions are those dealing with orders among real numbers, such as monotone properties).

5.1.1

Limit in Euclidean Space

The n-dimensional (real) Euclidean space Rn is the collection of n-tuples of real numbers ~x = (x1 , x2 , . . . , xn ), xi ∈ R. Geometrically, R1 is the usual real line and R2 is a plane with origin. An element of Rn can be considered as a point, or as an arrow starting from the origin and ending at the point, called a vector. The operation of addition and scalar multiplication may be applied to Euclidean vectors ~x + ~y = (x1 + y1 , x2 + y2 , . . . , xn + yn ), c~x = (cx1 , cx2 , . . . , cxn ).

(5.1.1)

The operations satisfy the usual properties such as the commutativity and the associativity. In general, a vector space is a set with two operations satisfying these usual properties. The operation ~x · ~y = x1 y1 + x2 y2 + · · · + xn yn .

(5.1.2)

is called the dot product and satisfies the usual properties such as the bilinearity and the positivity. In general, an inner product on a vector space is an operation with numerical value satisfying these usual properties. The dot product, or the inner product in general, induces the length (or the Euclidean norm) k~xk2 =

q √ ~x · ~x = x21 + x22 + · · · + x2n .

(5.1.3)

5.1. LIMIT AND CONTINUITY

251

It also induces the angle θ between two nonzero vectors by the formula cos θ =

~x · ~y . k~xk2 k~y k2

(5.1.4)

The definition of the angle is justified by the Schwarz inequality |~x · ~y | ≤ k~xk2 k~y k2 .

(5.1.5)

By the angle formula, two vectors are orthogonal and denoted ~x ⊥ ~y , if ~x · ~y = 0. Moreover, the orthogonal projection of a vector ~x on another vector ~y is ~y ~x · ~y proj~y ~x = k~xk2 cos θ = ~y . (5.1.6) k~y k2 ~y · ~y Finally, the area of the parallelogram formed by two vectors is p A(~x, ~y ) = k~xk2 k~y k2 | sin θ| = (~x · ~x)(~y · ~y ) − (~x · ~y )2 . (5.1.7) A norm on a vector space is a function k~xk satisfying 1. Positivity: k~xk ≥ 0, and k~xk = 0 if and only if ~x = ~0 = (0, 0, . . . , 0). 2. Scalar Property: kc~xk = |c|k~xk. 3. Triangle Inequality: k~x + ~y k ≤ k~xk + k~y k. The norm induces the distance k~x − ~y k between two vectors. The most often used norms are the Euclidean norm k~xk2 and k~xk1 = |x1 | + |x2 | + · · · + |xn |, k~xk∞ = max{|x1 |, |x2 |, . . . , |xn |}. In general, for any p ≥ 1, the Lp -norm is defined in Exercise 5.1.3. ................................. ........................ .............. .. .. . . . ..... ..... .. .. .. ... . ..... .. .... .. . . . ... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .......... . . . . . . . . . . . . . . . . . . . . . . ~ x ~ x ~ x . . ..... .. . . . . . . . . . . ..... .. .. . .. .. . . . ..... . . . ..... .... .. . .................... ..... ................................ BL∞ (~x, ) BL1 (~x, ) BL2 (~x, ) Figure 5.1: balls with respect to different norms Given any norm, we have the (open) ball and the closed ball B(~x, ) = {~y : k~y − ~xk < }, ¯ x, ) = {~y : k~y − ~xk ≤ } B(~ of radius and centered at ~x. Moreover, for the Euclidean norm, we have the (closed) Euclidean ball and the sphere of radius R BRn = {~x : k~xk2 ≤ R} = {(x1 , x2 , . . . , xn ) : x21 + x22 + · · · + x2n ≤ R2 }, SRn−1 = {~x : k~xk2 = R} = {(x1 , x2 , . . . , xn ) : x21 + x22 + · · · + x2n = R2 }. For the radius R = 1, we have the unit ball B n = B1n and the unit sphere S n−1 = S1n−1 .

252

CHAPTER 5. MULTIVARIABLE FUNCTION

Exercise 5.1.1. Find all the norms on R. Exercise 5.1.2. Directly verify the Schwarz inequality for the dot product q q |x1 y1 + x2 y2 + · · · + xn yn | ≤ x21 + x22 + · · · + x2n y12 + y22 + · · · + yn2 , and find the condition for the equality to hold. Moreover, derive the triangle inequality for the norm k~xk2 from the Schwarz inequality. Note that the H¨ older inequality in Exercise 2.2.41 generalizes the Schwarz inequality. Exercise 5.1.3. Prove that for any p ≥ 1, the Lp -norm k~xkp =

p p |x1 |p + |x2 |p + · · · + |xn |p

(5.1.8)

satisfies the three conditions for the norm. Then prove that the Lp -norms satisfy k~xk∞ ≤ k~xkp ≤

√ p

nk~xk∞ .

Exercise 5.1.4. Prove that for any positive numbers a1 , a2 , . . . , an , and p ≥ 1, p k~xk = p a1 |x1 |p + a2 |x2 |p + · · · + an |xn |p is a norm. Exercise 5.1.5. Suppose k~xk and 9~y 9 are norms on Rm and Rn . Prove that k(~x, ~y )k = max{k~xk, 9~y 9} is a norm on Rm × Rn = Rm+n . Prove that if m = n, then k~xk + 9~x9 is a norm on Rn . Exercise 5.1.6. Prove that for any norm and any vector ~x, there is a number r ≥ 0 and a vector ~u, such that ~x = r~u and k~uk = 1. The expression ~x = r~u is the polar decomposition that describes a vector as characterized by the length r and the direction ~u. Exercise 5.1.7. Prove that any norm satisfies |k~xk − k~y k| ≤ k~x − ~y k. Exercise 5.1.8. Prove that if ~y ∈ B(~x, ), then B(~y , δ) ⊂ B(~x, ) for some radius δ > 0. In fact, we can take δ = − k~y − ~xk.

Comparing the single variable (represented by R) with the multivariable (represented by Rn ), we find that the addition is generalized in a unique way, and the multiplication is generalized to the scalar multiplication and the dot product. Moreover, the absolute value?and therefore the distance) is generalized to many possible choices of norms. A sequence of vectors {~xn } converges to a vector ~l with respect to a norm k~xk, denoted limn→∞ ~xn = ~l, if for any > 0, there is N , such that n > N =⇒ k~xn − ~lk < .

(5.1.9)

The right side is the same as ~xn ∈ B(~l, ). The definition appears to depend on the choice of norm. For example, limn→∞ (xn , yn ) = (l, k) with respect to the Euclidean norm means p lim |xn − l|2 + |yn − k|2 = 0, n→∞

5.1. LIMIT AND CONTINUITY

253

and the same limit with respect to the L∞ -norm means each coordinate converges lim xn = l, lim yn = k. n→∞

n→∞

On the other hand, if two norms k~xk and 9~x9 satisfy c1 k~xk ≤ 9~x9 ≤ c2 k~xk

(5.1.10)

for some c1 , c2 > 0, then it is easy to see that the convergence with respect to k~xk is equivalent to the convergence with respect to 9~x9. Because of the remark, two norms related as in (5.1.10) are said to be equivalent. For example, since all Lp -norms are equivalent by Exercise 5.1.3, the Lp -convergence in Rn means exactly that each coordinate converges. In Theorem 5.1.13, we will prove that all norms on Rn (and on finite dimensional vector spaces in general) are equivalent, so that the convergence is independent of the choice of the norm. For the moment, however, all the subsequent discussions can be rigorously verified with regard to the Lp norm. After Theorem 5.1.13 is established, the discussion becomes valid for any norm. Because the L∞ -convergence is the same as the convergence of each coordinate, by applying the Cauchy criterion to each coordinate, we get the Cauchy criterion for the convergence of vectors: For any > 0, there is N , such that m, n > N implies k~xm − ~xn k < . This extends Theorems 1.2.2 and 1.2.12. By considering individual coordinates, Proposition 1.2.1, Theorem 1.2.8 and Proposition 1.2.9 can be extended to vectors. Because of the lack of order among vectors, Proposition 1.2.7 cannot be extended directly to vectors, but can still be applied to individual coordinates. For the same reason, the concept of upper and lower limits cannot be extended. Exercise 5.1.9. Prove that if the first norm is equivalent to the second norm, and the second norm is equivalent to the third norm, then the first norm is equivalent to the third norm. Exercise 5.1.10. Prove that if limn→∞ ~xn = ~l, then limn→∞ k~xn k = k~lk. Prove the converse is true if ~l = ~0. In particular, convergent sequences of vectors are bounded. Exercise 5.1.11. Prove that if limn→∞ ~xn = ~l and limn→∞ ~yn = ~k, then limn→∞ (~xn + ~yn ) = ~l + ~k and limn→∞ c~xn = c~l. Exercise 5.1.12. Prove that if limn→∞ cn = c and limn→∞ ~xn = ~x, then limn→∞ cn ~xn = c~x. Exercise 5.1.13. Prove that if limn→∞ ~xn = ~x and limn→∞ ~yn = ~y with respect to the Euclidean norm k~xk2 , then limn→∞ ~xn · ~yn = ~x · ~y . Of course the Euclidean norm can be replaced by any norm after Theorem 5.1.13 is established.

5.1.2

Topology in Euclidean Space

The theory of one variable functions are often discussed over intervals. For multivariable functions, it is no longer sufficient to only consider rectangles.

254

CHAPTER 5. MULTIVARIABLE FUNCTION

We need some basic concepts and results for the most useful type of subsets of the Euclidean space. The subsequent definitions are made with respect to any norm. The results are stated for any norm. The discussion and proofs are mostly based on the L∞ -norm. After Theorem 5.1.13 is established, the discussion and results are valid for any norm. Many important results on one-variable functions are stated on a bounded and closed interval. Since the key property used for proving the results is that any sequence in the interval has a subsequence converging to a number in the interval, we introduce the following concept. Definition 5.1.1. A subset of a Euclidean space is compact if any sequence in the subset has a convergent subsequence with the limit still in the subset. A subset K is bounded if there is a constant M such that k~xk < M for any ~x ∈ K. By applying Theorem 1.2.8 to each coordinate (for L∞ -norm, with the help of Exercise 1.2.24), we find that any sequence in a bounded subset K has a convergent subsequence. For K to be compact (with respect to the L∞ -norm for the moment), it remains to make sure the limit is still inside K. This leads to the following concept. Definition 5.1.2. A subset of a Euclidean space is closed if the limit of any convergent sequence in the subset is still in the subset. In other words, a subset is closed if it contains all its limits. Proposition 5.1.3. A subset of the Euclidean space is compact if and only if it is bounded and closed. For the special case of the L∞ -norm, we have argued that bounded and closed subsets are compact. The proof of the converse is given below, for any norm. Proof. Suppose K is not bounded. Then there is a sequence {~xn } in K satisfying limn→∞ k~xn k = ∞. This implies that limn→∞ k~xnk k = ∞ for any subsequence {~xnk }, so that any subsequence is not bounded and must diverge (see Exercise 5.1.10). Thus K is not compact. Suppose K is not closed. Then there is a convergent sequence {~xn } in K such that limn→∞ ~xn = ~y 6∈ K. In particular, any subsequence of {~xn } converges to ~y 6∈ K. Thus no subsequence has limit in K, and K is not compact. Exercise 5.1.14. Prove that if two norms are equivalent, then a subset is bounded, compact, or closed with respect to one norm if and only if it has the same property with respect to the other norm. Exercise 5.1.15. Prove that the subsets {~x : k~xk∞ = 1} and {~x : k~xk∞ ≤ 1} are compact with respect to the L∞ -norm. Exercise 5.1.16. Suppose K is a compact subset. Prove that there are ~a, ~b ∈ K, such that k~a − ~bk = max~x,~y∈K k~x − ~y k.

5.1. LIMIT AND CONTINUITY

255

¯ x, ) is a closed subset (so the name Exercise 5.1.17. Prove that the closed ball B(~ is appropriate). Exercise 5.1.18. Prove that the intersection of closed subsets is closed, the union of finitely many closed subsets is also closed, and the product of closed subsets is closed (use the product norm in Exercise 5.1.5). What about compact subsets?

The definition of closed subsets suggests that, to make any subset closed, we should add all the limits of convergent subsequences in the subset. Definition 5.1.4. The closure A¯ of a subset A consists of the limits of all convergent sequences in A. Clearly, a subset A is closed if and only if A¯ ⊂ A. The closure is also characterized by the following result. Proposition 5.1.5. The closure of A is closed. It consists of points ~y satisfying the property that for any > 0, we have k~y − ~xk < for some ~x ∈ A. Moreover, it is the smallest closed subset containing A. The property in the proposition means B(~y , ) ∩ A 6= ∅ for any > 0. The geometric meaning is illustrated in Figure 5.2. ................ ...... ... .............. .... ....... .......... .. .. ..... ~y ........................................... nonempty . .. intersection ....................... . . . . .. . . . . . . .. ... . . . . .. . .. ....................... . . . . . . . . . . . . . . . . . . .. ... .. A .... . . . . . . . . . .................... .. ................... Figure 5.2: characterization of points in the closure If a subset A is bounded, then it is contained in the closed cube [−M, M ]n for some big M . The proposition implies that the closure A¯ is also contained in [−M, M ]n and is therefore bounded. Then by Proposition 5.1.3, the closure is compact. Proof. Any ~y ∈ A¯ is the limit of a sequence {~xn } in A. Thus for any > 0, we have k~y − ~xn k < for big n, where we also note that ~xn ∈ A. Conversely, suppose ~y has the property described in the proposition. Then 1 1 for = , we have ~xn ∈ A satisfying k~y − ~xn k < . This implies ~y is the n n limit of the sequence {~xn } in A. Next we prove the closure is closed. By the definition of closure, this is the ¯ then ¯ By the characterization just proved, if ~z ∈ A, same as showing A¯ ⊂ A. ¯ Using the characterization for any > 0, we have k~z −~y k < for some ~y ∈ A. of the closure again, we have k~y − ~xk < for some ~x ∈ A. Therefore we get k~z − ~xk ≤ k~z − ~y k + k~y − ~xk < 2. By the characterization of the closure, this shows that ~z ∈ A¯ and completes the proof that A¯ is closed.

256

CHAPTER 5. MULTIVARIABLE FUNCTION

Finally, we prove that the closure is the smallest closed subset containing A. Since any ~x ∈ A is the limit of the constant sequence {~x} in A, we have ¯ Therefore the closure is a closed subset containing A. Moreover, A ⊂ A. suppose C is a closed subset containing A. Then any ~y ∈ A¯ is the limit of a convergent sequence in A. Since A ⊂ C, the sequence also lies in C. Since C is closed, it contains the limit ~z of the convergence sequence. This proves that A¯ ⊂ C. Therefore the closure is the smallest closed subset containing A. Definition 5.1.6. The boundary ∂A of a subset A consists of points that are simultaneously limits of A and Rn − A. The definition means ∂A = A¯ ∩ Rn − A. Equivalently, ~y ∈ ∂A means B(~y , ) ∩ A 6= ∅ and B(~y , ) ∩ (Rn − A) 6= ∅ for any > 0. ¯ x, ) is the closure of Exercise 5.1.19. Prove that for any norm, the closed ball B(~ ¯ B(~x, ), and the sphere B(~x, ) − B(~x, ) is the boundary of B(~x, ). Exercise 5.1.20. Prove that A is closed if and only if A¯ = A. Exercise 5.1.21. Prove that if two norms are equivalent, then the closure and the boundary of a subset with respect to one norm is the same as the the closure and the boundary with respect to the other norm. Exercise 5.1.22. Prove properties of the closure and the boundary (the norm on the product of Euclidean space is given in Exercise 5.1.5). ¯ 1. A ⊂ B =⇒ A¯ ⊂ B.

6. ∂A = ∂(Rn − A).

¯ 2. A ∪ B = A¯ ∪ B.

7. ∂(A ∪ B) ⊂ ∂A ∪ ∂B.

¯ 3. A ∩ B ⊂ A¯ ∩ B.

8. ∂(A ∩ B) ⊂ ∂A ∪ ∂B.

¯ 4. A × B = A¯ × B.

9. ∂(A − B) ⊂ ∂A ∪ ∂B.

5. A¯ = ∂A ∪ A.

¯ ∪ (A¯ × ∂B). 10. ∂(A × B) = (∂A × B)

Exercise 5.1.23. Suppose ~x ∈ A and ~y ∈ / A. Suppose φ(t) = (1 − t)~x + t~y and τ = sup{t : φ(t) ∈ A}. Prove that φ(τ ) ∈ ∂A.

In the analysis of one-variable functions, such as differentiability, we often require functions to be defined at all the points near a given point. For multivariable functions, the requirement becomes the following concept. Definition 5.1.7. A subset of a Euclidean space is open if for any point in the subset, all the points near the point are also in the subset. Thus a subset U is open if ~x ∈ U =⇒ B(~x, ) ⊂ U for some > 0.

(5.1.11)

The geometrical meaning is illustrated in Figure 5.3. Proposition 5.1.8. A subset U of Rn is open if and only if the complement Rn − U is closed.

5.1. LIMIT AND CONTINUITY .................. .................. ......... ..U . .. .. .................................................... . ... . .... ...... ~x ........................

257 .................. .................. ... ... ..U . .. ............. .. . ... .. .. .... ~x ............. ...........................

Figure 5.3: definition of open subset Proof. The subset U is open if and only if (5.1.11) holds. Since the right side of (5.1.11) is the same as B(~x, ) ∩ (Rn − U ) = ∅ for some > 0, by Proposition 5.1.5, we see that U is open if and only if ~x ∈ U =⇒ ~x 6∈ Rn − U . This is logically the same as ~x ∈ Rn − U =⇒ ~x 6∈ U. Since the right side is the same as ~x ∈ Rn − U , the implication means exactly Rn − U ⊂ Rn − U , which by the definition means the complement Rn − U is closed. The following result says that if an open subset U contains a compact subset K, then U contains an -neighborhood of K. Proposition 5.1.9. Suppose a compact subset K is contained in an open subset U . Then there is > 0, such that k~y − ~xk < and ~x ∈ K implies ~y ∈ U . Proof. Suppose the conclusion is not true. Then for any n, there are ~xn ∈ K 1 and ~yn 6∈ U , such that k~yn − ~xn k < . Since K is compact, there is a n 1 subsequence {~xnk } converging to ~z ∈ K. By k~yn − ~xn k < , we know the n subsequence {~ynk } also converges to ~z. Since ~ynk ∈ Rn − U and Rn − U is closed by Proposition 5.1.8, we get ~z ∈ Rn − U . This contradicts with ~z ∈ K ⊂ U . Finally, the Heine-Borel Theorem in Section 1.2.6 can be extended to Euclidean space. Theorem 5.1.10. Suppose K is a compact subset. Suppose {Ui } is a collection of open subsets such that K ⊂ ∪Ui . Then K ⊂ Ui1 ∪ Ui2 ∪ · · · ∪ Uik for finitely open subsets in the collection. The extension can be proved for the L∞ -norm by adopting the original Heine-Borel Theorem with little modification. The compact subset K is contained in a bounded rectangle I = [α1 , β1 ] × [α2 , β2 ] × · · · × [αn ,βn ]. αi + βi By replacing each component interval [αi , βi ] with either αi , or 2

258

CHAPTER 5. MULTIVARIABLE FUNCTION

αi + βi , βi , the rectangle can be divided into 2n rectangles. If K cannot 2 be covered by finitely many open subsets from {Ui }, then for one of the 2n rectangles, denoted I1 , the intersection K1 = K ∩ I1 cannot be covered by finitely many open subsets from {Ui }. The rest of the construction and the argument can proceed as before. Exercise 5.1.24. Prove that if two norms are equivalent, then a subset is open with respect to one norm if and only if it is open with respect to the other norm. Exercise 5.1.25. Prove that for any norm, the ball B(~x, ) is open.

Exercise 5.1.26. Prove that the intersection of finitely many open subsets is open, the union of open subsets is also open, and the product of open subsets is open (use the product norm in Exercise 5.1.5). Exercise 5.1.27. Suppose f (x) is a continuous function on R. Prove that the subset {(x, y) : y < f (x)} is open, with the subset {(x, y) : y = f (x)} as the boundary. What if f (x) is not continuous?

5.1.3

Multivariable Function

Multivariable functions may be defined on all kinds of subsets of the Euclidean space. Compared with the single variable case, we often need to pay more attention to the subset on which the function is defined. A multivariable function f on a subset A may be visualized either by the graph {(~x, f (~x)) : ~x ∈ A} or the levels {~x ∈ A : f (~x) = c}. .... z . .......................................... . . . . . . . . . . . . . . . . .... .. .. .......... . ..................................................................... .. . .. ... . ... .. . ... .. . . . . ... . . . . ... ........................................ ... ....... ... .. ... ...... z = x2 + y 2 ............................................ ... ... .. .. .. ... ... .. .. .. ... ... .. .. .. .... .. .. .. ...... .. . . ... .. ............................................................................................. y .... .. ..... . x

.... y .. .................. f =9 . . . . . . . . . . . . . .... ........ . .. . . . . . ..... . .... . .... . . ............ . ..... ............ ... ........f....=4 . . ... ... ..... ..................f.. =1 ...... ..... .. ... .... ... ..... .... .... . . ..................................................................................................................................... x ... ... ... .. .. .. .. ... ... ..................... ... .. . . ... ..... . .... ......... ... .......... .... ................... . ..... .. .... ....... . . . . . . . .......... . ......... .................... .

Figure 5.4: graph and level of x2 + y 2 Given a norm, a multivariable function f (~x) = f (x1 , x2 , . . . , xn ) has limit l at ~a = (a1 , a2 , . . . , an ) if for any > 0, there is δ > 0, such that 0 < k~x − ~ak < δ =⇒ |f (~x) − l| < .

(5.1.12)

For the L∞ -norm, this means |xi − ai | < δ for all i and xi 6= ai for some i =⇒ |f (x1 , x2 , . . . , xn ) − l| < . After we show that all norms are equivalent in Theorem 5.1.13, we will know that the continuity is independent of the choice of the norm.

5.1. LIMIT AND CONTINUITY

259

In general, if a function is defined on a subset A, then the limit is denoted as lim~x∈A,~x→~a f (~x) = l. If a function is defined for all ~x satisfying 0 < k~x − ~ak < δ for some δ (i.e., ~x is in the punctured ball B(~a, δ) − ~a), then we simply denote lim~x→~a f (~x) = l or limx1 →a1 ,...,xn →an f (x1 , x2 , . . . , xn ) = l. If B ⊂ A, then by restricting the definition of the limit from ~x ∈ A to ~x ∈ B, we find lim

~ x∈A,~ x→~a

f (~x) = l =⇒

lim

~ x∈B,~ x→~a

f (~x) = l.

In particular, if the restrictions of a function on different subsets give different limits, then the limit diverges. The limits of multivariable functions have all the usual properties as before. We may also define various variations at ∞ by replacing k~x − ~ak < δ with k~xk > N or replacing |f (~x) − l| < by |f (~x)| > b. xy(x2 − y 2 ) defined for (x, y) 6= x2 + y 2 √ (0, 0). Since |f (x, y)| ≤ |xy|, we find k(x, y)k∞ < δ = implies |f (x, y)| ≤ . Therefore we have limx,y→0 f (x, y) = 0 in the L∞ -norm. By the equivalence between the Lp -norms, the limit also holds for other Lp -norms. xy Example 5.1.2. Consider the function f (x, y) = 2 defined for (x, y) 6= (0, 0). x + y2 c c We have f (x, cx) = , so that limy=cx,(x,y)→(0,0) f (x, y) = . Thus the 1 + c2 1 + c2 restriction of the function to straight lines of different slopes gives different limits. We conclude that the function diverges at (0, 0). Exercise 5.1.28. Describe the graphs and the levels of functions. Example 5.1.1. Consider the function f (x, y) =

1. ax + by.

6.

x2 y 2 z 2 + 2 − 2. a2 b c

7.

x2 y 2 z 2 − 2 − 2. a2 b c

2. ax + by + cz. 3.

(x − x0 a2

)2

+

)2

(y − y0 . b2

x2 y 2 z 2 4. 2 + 2 + 2 . a b c 5.

(x − x0 )2 (y − x0 )2 − . a2 b2

8. (x2 + y 2 )2 . 9. |x|p + |y|p . 10. xy.

Exercise 5.1.29. What is the relation between lim~x∈A,~x→~a f (~x), lim~x∈B,~x→~a f (~x), lim~x∈A∪B,~x→~a f (~x), and lim~x∈A∩B,~x→~a f (~x). Exercise 5.1.30. Prove that lim~x→~a f (~x) = l if and only if limn→∞ f (~xn ) = l for any sequence {~xn } satisfying ~xn 6= ~a and limn→∞ ~xn = ~a. Exercise 5.1.31. For a subset A ⊂ Rn , the characteristic function is ( 1 if ~x ∈ A χA (~x) = 0 if ~x 6∈ A Find the condition for lim~x→~a χA (~x) to exist. xp y q = 0, where all the (xm + y n )k parameters are positive. Extend the discussion to more variables. Exercise 5.1.33. Compute the convergent limits. All parameters are positive. Exercise 5.1.32. Find the condition for limx,y→0+

260

CHAPTER 5. MULTIVARIABLE FUNCTION

1. limx→1,y→1

1 . x−y

2. limx→0,y→0,z→0

7. limx→∞,y→∞,ax≤y≤bx

xyz . x2 + y 2 + z 2

3. limx→0,y→0 (x − y) sin

1 . x2 + y 2

4. limx→∞,y→∞ (x − y) sin 5. limx→∞,y→∞

(x2 (x4

+ . + y 4 )q

6. limx→0,y→0,0<x
8. limx→0+ ,y→0+ (x + y)xy . 9. limx→∞,y→0

1 . x2 + y 2

y 2 )p

xp y . x2 + y 2

1 . xy

1 1+ x

x2 x+y

.

10. limx→+∞,y→+∞

1 1+ x

11. limx→+∞,y→+∞

x2 + y 2 . ex+y

x2 x+y

.

For multivariable functions, we may first take the limit in some variables and then take the limit in the other variables. In this way, we get repeated limits such as lim~x→~a lim~y→~b f (~x, ~y ). The following result tells us the relation between the repeated limits and the usual limit lim~x→~a,~y→~b f (~x, ~y ) = lim(~x,~y)→(~a,~b) f (~x, ~y ). The proof is given for the the case k(~x, ~y )k = k~xk + k~y k. Proposition 5.1.11. Suppose f (~x, ~y ) is defined near (~a, ~b). Suppose 1. The limit lim(~x,~y)→(~a,~b) f (~x, ~y ) converges to l. 2. For each ~x near ~a, the limit lim~y→~b f (~x, ~y ) converges to g(~x). Then the repeated limit lim~x→~a lim~y→~b f (~x, ~y ) = lim~x→~a g(~x) exists and is equal to l. Proof. For any > 0, there is δ > 0, such that 0 < max{k~x − ~ak, k~y − ~bk} < δ =⇒ |f (~x, ~y ) − l| < . Then for each fixed ~x near ~a, taking ~y → ~b in the inequality above gives us 0 < k~x − ~ak < δ =⇒ |g(~x) − l| = lim |f (~x, ~y ) − l| ≤ . ~ y →~b

This implies that lim~x→~a g(~x) converges to l. xy in Example 5.1.2 satisfies limx→0 f (x, y) = x2 + y 2 0 for each y and limy→0 f (x, y) = 0 for each x. Therefore both repeated limits limy→0 limx→0 f (x, y) and limx→0 limy→0 f (x, y) exist and are equal to 0. However, the limit lim(x,y)→(0,0) f (x, y) diverges. Example 5.1.3. The function f (x, y) =

Exercise 5.1.34. Study the usual limit limx→0,y→0 and the repeated limits limx→0 limy→0 , limy→0 limx→0 .

5.1. LIMIT AND CONTINUITY 1.

x − y + x2 + y 2 . x+y

2. x sin 3.

1 1 + y cos . x x

|x|α |y|β , α, β, γ > 0. (x2 + y 2 )γ

261 4.

x2 y 2 . |x|3 + |y|3

5. (x + y) sin 6.

1 1 sin . x y

ex − ey . sin xy

Exercise 5.1.35. Establish a concept of the uniform convergence and find the condition for the commutativity of the repeated limits such as limx→a limy→b f (x, y) = limy→b limx→a f (x, y), similar to Proposition 4.2.7. Exercise 5.1.36. Construct a function f (x, y), such that the limit limx→a,y→b f (x, y) converges, but the two repeated limits do not converge.

5.1.4

Continuous Function

A multivariable function f (~x) is continuous at ~a if it is defined at ~a and lim~x→~a f (~x) = f (~a). For the L∞ -norm, this means that for any > 0, there is δ > 0, such that |xi − ai | < δ for all i =⇒ |f (x1 , x2 , . . . , xn ) − f (a1 , a2 , . . . , an )| < . Like the single variable case, it is easy to see that the arithmetic combinations of continuous functions are continuous, and the exponential of continuous functions are continuous. A function f (~x) on A is uniformly continuous if for any > 0, there is δ > 0, such that ~x, ~y ∈ A, k~x − ~y k < δ =⇒ |f (~x) − f (~y )| < .

(5.1.13)

The following extension of Theorem 1.4.5 is proved for any norm. Theorem 5.1.12. A continuous function on a compact subset is bounded, uniformly continuous, and reaches its maximum and minimum. Proof. Suppose f (~x) is continuous on a compact subset K. If f (~x) is not bounded, then there is a sequence {~xn } in K such that limn→∞ f (~xn ) = ∞. Since K is compact, there is a subsequence {~xnk } converging to ~a ∈ K. By the continuity of f (~x) on K, we have limk→∞ f (~xnk ) = f (~a). This contradicts with limn→∞ f (~xn ) = ∞. The function is bounded and we may let β = sup~x∈K f (~x). There is a sequence {~xn } in K such that limn→∞ f (~xn ) = β. By the similar argument as before, there is a subsequence converging to ~a ∈ K, such that f (~a) = limk→∞ f (~xnk ) = β. Thus the maximum is reached at ~a. The proof of Theorem 1.4.4 can be adopted to prove the uniform continuity. Exercise 5.1.37. Prove that any norm is a continuous function with respect to itself. Is the continuity uniform?

262

CHAPTER 5. MULTIVARIABLE FUNCTION

Exercise 5.1.38. Prove that if f (~x) and g(~x) are continuous at ~x0 , and h(u, v) is continuous at (f (~x0 ), g(~x0 )) with respect to the L∞ -norm on R2 , then h(f (~x), g(~x)) is continuous at ~x0 . Exercise 5.1.39. Prove that if f (x, y) is monotone and continuous in x and in y, then f (x, y) is continuous with respect to the L∞ -norm. What if the function is monotone and continuous in x but is only continuous in y? Exercise 5.1.40. Study the continuity.   sin xy if x 6= 0 1. . x 0 if x = 0   sin xy if(x, y) 6= (0, 0) |x| + |y| 2. .  0 if(x, y) = (0, 0) ( y if x is rational 3. . −y if x is irrational

( x(x2 + y 2 )p 4. 0 ( y log(x2 + y 2 ) 5. 0 ( x ey 6. 0

if y 6= 0 . if y = 0 if(x, y) 6= (0, 0) . if(x, y) = (0, 0)

if y 6= 0 . if y = 0

Exercise 5.1.41. Suppose φ(x) is uniformly continuous. Is f (x, y) = φ(x) uniformly continuous with respect to the L∞ -norm? Exercise 5.1.42. Prove that with respect to the L∞ -norm, any uniformly continuous function on a bounded subset is bounded. Exercise 5.1.43. Prove that the sum of uniformly continuous functions is uniformly continuous. What about the product?

By using Theorem 5.1.12, we can establish the following result, which allows us to freely use any norm in the discussion of vectors and multivariable functions. For example, a subset is compact with respect to any norm if and only if it is closed and bounded. Theorem 5.1.13. All norms on a finite dimensional vector space are equivalent. Proof. We prove the claim for norms on a Euclidean space. Since linear algebra tells us that any finite dimensional vectors space is isomorphic to a Euclidean space, the result applies to any vector space. We compare any norm k~xk with the L∞ -norm k~xk∞ . Let ~e1 = (1, 0, . . . , 0), ~e2 = (0, 1, . . . , 0), . . . , ~en = (0, 0, . . . , 1) be the standard basis of Rn . By the conditions for norm, we have k~xk = kx1~e1 + x2~e2 + · · · + xn~en k ≤ |x1 |k~e1 k + |x2 |k~e2 k + · · · + |xn |k~en k ≤ (k~e1 k + k~e2 k + · · · + k~en k) max{|x1 |, |x2 |, . . . , |xn |} = (k~e1 k + k~e2 k + · · · + k~en k)k~xk∞ = c1 k~xk∞ . We remark that this implies (see Exercise 5.1.7) k~x − ~ak∞ ≤ δ =⇒ |k~xk − k~ak| ≤ k~x − ~ak ≤ c1 δ.

5.1. LIMIT AND CONTINUITY

263

Based on this, it is easy to see that the function kxk is continuous with respect to the L∞ -norm. It remains to prove that kxk ≥ c2 k~xk∞ for some c2 > 0. The inequality implies k~xk∞ = 1 =⇒ k~xk ≥ c2 . (5.1.14) Conversely, suppose the implication (5.1.14) holds. For any ~x, we write ~x = r~u with r = k~xk∞ and k~uk∞ = 1 (see Exercise 5.1.6). Then k~xk = kr~uk = rk~uk ≥ rc2 = c2 k~xk∞ . Therefore we conclude the inequality k~xk ≥ c2 k~xk∞ is equivalent to the implication (5.1.14). The subset K = {~x : k~xk∞ = 1} is bounded and closed with respect to the L∞ -norm (see Exercise 5.1.15). By Proposition 5.1.3, which was fully proved for the L∞ -norm, K is compact with respect to the L∞ -norm. Then by Theorem 5.1.12, the continuity of the function k~xk on K with respect to the L∞ -norm implies that k~xk reaches its minimum on K at ~a ∈ K. Then we conclude the implication (5.1.14) holds for c2 = k~ak, with ~a ∈ K =⇒ k~ak∞ = 1 =⇒ ~a 6= ~0 =⇒ c2 = k~ak > 0.

5.1.5

Multivariable Map

Multivariable analysis is not restricted to multivariable functions. We may also consider maps from one multivariable to another multivariable. These are the maps between Euclidean spaces. A map from A ⊂ Rn to Rm has m-dimensional vectors as values. Instead of using many arrows, we denote the map by F (~x) = (f1 (~x), f2 (~x), . . . , fm (~x)) : A ⊂ Rn → Rm . Two maps into the same Euclidean space may be added. A number can be multiplied to a map into a Euclidean space. If F : A ⊂ Rn → Rm and G : B ⊂ Rk → Rn are maps such that G(~x) ∈ A for any ~x ∈ B, then we have the composition (F ◦ G)(~x) = F (G(~x)) : B ⊂ Rk → Rm . The map converges to ~l ∈ Rm at ~a ∈ Rn , denoted lim~x→~a F (~x) = ~l, if for any > 0, there is δ > 0, such that 0 < k~x − ~ak < δ =⇒ kF (~x) − ~lk < .

(5.1.15)

By using the L∞ -norm, it is easy to see that lim F (~x) = ~l ⇐⇒ lim fi (~x) = li for all i.

~ x→~a

~ x→~a

(5.1.16)

The property also holds for any norm because any norm is equivalent to the L∞ -norm.

264

CHAPTER 5. MULTIVARIABLE FUNCTION

The map is continuous at ~a if lim~x→~a F (~x) = F (~a). By (5.1.16), a map is continuous if and only if its coordinate functions are continuous. Many properties can be extended from functions to maps. For example, the sum and the composition of continuous maps are still continuous. The scalar product of a continuous function and a continuous map is a continuous map. The dot product of two continuous maps is a continuous function. Some special cases of maps can be visualized in various ways. For example, a function is a map to R and can be visualized by its graph or its levels. On the other hand, a (parametrized) curve (or a path) in Rn is a continuous map φ(t) = (x1 (t), x2 (t), . . . , xn (t)) : [a, b] → Rn . The continuity means that each coordinate function xi (t) is continuous. For example, the straight line passing through ~a and ~b is φ(t) = (1 − t)~a + t~b = ~a + t(~b − ~a),

(5.1.17)

and the unit circle on the plane is φ(t) = (cos t, sin t) : [0, 2π] → R2 .

(5.1.18)

We say the path connects φ(a) to φ(b). A subset A ⊂ Rn is path connected if any two points in A are connected by a path in A (i.e., φ(t) ∈ A for any t ∈ [a, b]). Similar to curves, a map R2 → Rn may be considered as a (parametrized) surface. For example, the sphere in R3 may be parametrized by σ(φ, θ) = (sin φ cos θ, sin φ sin θ, cos φ) : [0, π] × [0, 2π] → R3 ,

(5.1.19)

and the torus by (a > b > 0) σ(φ, θ) = ((a + b cos φ) cos θ, (a + b cos φ) sin θ, b sin φ) : [0, 2π] × [0, 2π] → R3 . (5.1.20) A map Rn → Rn may be considered as a change of variable, or a transform, or a vector field. For example, (x, y) = (r cos θ, r sin θ) : [0, ∞) × R → R2 is the polar coordinate. Moreover, Rθ (x, y) = (x cos θ − y sin θ, x sin θ + y cos θ) : R2 → R2 transforms the plane by rotation of angle θ, and ~x → ~a + ~x : Rn → Rn shifts all the Euclidean space by ~a. A vector field assigns an arrow to a point. For example, the vector field F (~x) = ~x : Rn → Rn is a vector field in the radial direction, while F (x, y) = (y, −x) : R2 → R2 is a counterclockwise rotating vector field, just like the water flow in the sink.

5.1. LIMIT AND CONTINUITY ..... .. .. R. θ (~a) .. ...• . . . . .. . .. .......... .....• ~a . . .... ...........................................................θ................................... . . ..θ......... ... . . . . .. . . ~b •... ... ... •. ... Rθ (~b).. . ..... ........

265 ~a + ~x .....•. . . . . . ..... ...... . . . . . . ...~a ~x •... ...... . . . . . . .... .......~a + ~y . . . . . . ~0 •.. ........•. ....... ............ .... .. . . . . . . . . A .. ..... .. ~a + ..................... ...... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ~y • ..... . . ............ ................ .. . . ...... ... ..............A ...............................

Figure 5.5: rotation and shifting ..... ..... ..... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ............ ............. .... ...... .... .. .... ...... ..... ... . . ... . ....... . .. . ............................................... ........ ... ... .. ... .. ... . . ............ ..... .. ..... ........... . . . .. .. . . .. . . ... ... .......... ......... ..... ... . ........ . . . . . . . . .. . .. .... • ....... ...... ................... ....... • ....... ................... ....... . ... .. ......... ... .. ... .. . .. ... ................... ..... ...... ............. .... ....... ...... ................................. .. ... ........ ....... .... .. .. .. .... ..... . . .. . . . ...... .... ..... ..... ..... ..... ............... ................................................. . ..... ..... ..... . Figure 5.6: radial and rotational vector fields Example 5.1.4. By direct argument, it is easy to see that the product function µ(x, y) = xy : R2 → R is continuous. Suppose f (~x) and g(~x) are continuous. Then F (~x) = (f (~x), g(~x)) : Rn → R2 is also continuous because each coordinate function is continuous. The composition (µ ◦ F )(~x) = f (~x)g(~x) of the two continuous maps is also continuous. Therefore we conclude that the product of continuous functions is continuous. Exercise 5.1.44. Describe the maps in suitable ways. 1. F (x) = (cos x, sin x, x).

5. F (x, y) = (x2 , y 2 ).

2. F (x, y) = (x, y, x + y).

6. F (x, y) = (x2 − y 2 , 2xy).

3. F (x, y, z) = (y, z, x).

7. F (~x) = 2~x.

4. F (x, y, z) = (y, −x, z).

8. F (~x) = ~a − ~x.

The meaning of the sixth map can be seen through the polar coordinate. Exercise 5.1.45. Prove that the composition of two continuous maps is continuous. Exercise 5.1.46. Prove that the addition (~x, ~y ) → ~x + ~y : Rn × Rn → Rn and the scalar multiplication (c, ~x) → c~x : R × Rn → Rn are continuous.

The following extends the similar Propositions 1.3.8 and 1.4.3 for single variable functions. The same proof applies here. Proposition 5.1.14. For a map F (~x), lim~x→~a F (~x) = ~l if and only if limn→∞ F (~xn ) = ~l for any sequence {~xn } satisfying ~xn 6= ~a and limn→∞ ~xn = ~a. In particular, F is continuous at ~a if and only if limn→∞ F (~xn ) = F (~a) for any sequence {~xn } satisfying limn→∞ ~xn = ~a.

266

CHAPTER 5. MULTIVARIABLE FUNCTION

Theorem 5.1.12 on the boundedness and the uniform continuity can also be extended. The extreme is meaningless for maps. Theorem 5.1.15. Suppose F (~x) is a continuous map on a compact subset K. Then F (~x) is bounded and uniformly continuous: For any > 0, there is δ > 0, such that ~x, ~y ∈ K, k~x − ~y k < δ =⇒ kF (~x) − F (~y )k < .

(5.1.21)

We do not expect the intermediate value theorem to extend to multivariable functions in general because the theorem makes critical use of the interval. For the multivariable case, the interval may be replaced by the path connected condition. Theorem 5.1.16. Suppose f (~x) is a continuous function on a path connected subset A. Then for any ~a, ~b ∈ A and y between f (~a) and f (~b), there is ~c ∈ A, such that f (~c) = y. Proof. Since A is path connected, there is a continuous path φ(t) : [a, b] → A such that φ(a) = ~a and φ(b) = ~b. The composition g(t) = f (φ(t)) is then a continuous function for t ∈ [a, b], and y is between g(a) = f (~a) and g(b) = f (~b). By Theorem 1.4.6, there is c ∈ [a, b], such that g(c) = y. The conclusion is the same as f (~c) = y for ~c = φ(c). We cannot talk about multivariable monotone maps. Thus the only part of Theorem 1.4.8 that can be extended is the continuity. Theorem 5.1.17. Suppose F : K ⊂ Rn → Rm is a one-to-one and continuous map on a compact set K. Then the inverse map F −1 : F (K) ⊂ Rm → Rn is continuous. Proof. We claim that F −1 is in fact uniformly continuous. Suppose it is not uniformly continuous, then there is > 0 and ξ~k = F (~xk ), ~ηk = F (~yk ) ∈ F (K), such that kξ~k − ~ηk k → 0 as k → ∞ and k~xk − ~yk k = kF −1 (ξ~k ) − F −1 (~ηk )k ≥ . By the compactness of K, we can find kp , such that both limp→∞ ~xkp = ~a and limp→∞ ~ykp = ~b exist. Then by the continuity of F , we have limp→∞ ξ~kp = limp→∞ F (~xkp ) = F (~a) and limp→∞ ~ηkp = limp→∞ F (~ykp ) = F (~b). We conclude that kF (~a) − F (~b)k = limp→∞ kξ~kp − ~ηkp k = 0, so that F (~a) = F (~b). Since F is one-to-one, we get ~a = ~b. This is in contradiction with limp→∞ ~xkp = ~a, limp→∞ ~ykp = ~b, and k~xk − ~yk k ≥ . Exercise 5.1.47. Prove the following are equivalent for a map F . 1. The map is continuous. 2. The preimage F −1 (C) = {~x : F (~x) ∈ C} ⊂ Rn of any closed subset C ⊂ Rm is closed. 3. The preimage F −1 (U ) of any open subset U ⊂ Rm is open.

5.1. LIMIT AND CONTINUITY

267

4. The preimage F −1 (B(~b, )) of any open ball B(~b, ) is open. Exercise 5.1.48. Prove the following are equivalent for a map F . 1. The map is continuous. ¯ ⊂ F (A) for any subset A. 2. F (A) ¯ ⊃ F −1 (B) for any subset B. 3. F −1 (B) Exercise 5.1.49. Suppose F is a continuous map on a compact subset K. Prove that the image subset F (K) = {F (~x) : ~x ∈ K} is compact. Exercise 5.1.50. A map F (~x) is Lipschitz if kF (~x) − F (~y )k ≤ Lk~x − ~y k for some constant L. Prove that Lipschitz maps are uniformly continuous.

5.1.6

Additional Exercise

Exercise 5.1.51. Suppose f (x) has continuous derivative. Prove that f (x) − f (y) = f 0 (a). x−y (x,y)→(a,a) lim

What if the continuity condition is dropped? Is there a similar conclusion for the second order derivative? Exercise 5.1.52. Prove that F : Rn → Rm is continuous if and only if F · ~a is a continuous function for any ~a ∈ Rm .

Repeated Extreme Exercise 5.1.53. For a function f (~x, ~y ) on A × B. Prove that inf sup f (~x, ~y ) ≥ sup inf f (~x, ~y ) ≥ inf inf f (~x, ~y ) =

~ y ∈B ~ x∈A

y ∈B ~ x∈A ~

~ x∈A ~ y ∈B

inf

~ x∈A,~ y ∈B

f (~x, ~y ).

Exercise 5.1.54. For any a ≥ b1 ≥ c1 ≥ d and a ≥ b2 ≥ c2 ≥ d, can you construct a function f (x, y) on [0, 1] × [0, 1] such that sup

f = a,

~ x∈A,~ y ∈B

inf sup f = b1 .

~ y ∈B ~ x∈A

inf sup f = b2 ,

~ x∈A ~ y ∈B

inf

~ x∈A,~ y ∈B

f = d,

sup inf f = c1 ,

y ∈B ~ x∈A ~

sup inf f = c2 .

x∈A ~ y ∈B ~

Exercise 5.1.55. Suppose f (x, y) is a function on [0, 1] × [0, 1]. Prove that if f (x, y) is increasing in x, then inf y∈[0,1] supx∈[0,1] f (x, y) = supx∈[0,1] inf y∈[0,1] f (x, y). Can the interval [0, 1] be changed to (0, 1)? Exercise 5.1.56. Extend the discussion to three or more repeated extrema. For example, can you give a simple criterion for comparing two strings of repeated extrema?

Limit Along Any Path Exercise 5.1.57. Prove that if lim~x→~a f (~x) converges along any continuous path leading to ~a, then the limit exists.

268

CHAPTER 5. MULTIVARIABLE FUNCTION

xy 2 does not exist, although the limit x2 + y 4 converges zero along any straight line leading to (0, 0) . Exercise 5.1.58. Show that limx→0,y→0

Homogeneous and Multihomogeneous Function A function f (~x) is homogeneous of degree α if f (c~x) = cα f (~x) for any c > 0. More generally, a function is multihomogeneous if cf (x1 , x2 , . . . , xn ) = f (cβ1 x1 , cβ2 x2 , . . . , cβn xn ). The later part of the proof of Theorem 5.1.13 makes use of the fact that any norm is a homogeneous function of degree 1 (also see Exercise 5.1.61). Exercise 5.1.59. Prove that two homogeneous functions of the same degree are equal away from ~0 if and only if their restrictions on the unit sphere S n−1 are equal. Exercise 5.1.60. Prove that a homogeneous function is bigger than another homogeneous function of the same degree away from ~0 if and only if the inequality holds on S n−1 . Exercise 5.1.61. Suppose f (~x) is a continuous homogeneous function of degree α satisfying f (~x) > 0 for ~x 6= ~0. Prove that there is c > 0, such that f (~x) ≥ ck~xkα for any ~x. Exercise 5.1.62. Prove that a homogeneous function is continuous away from ~0 if and only if its restriction on S n−1 is continuous. Then find the condition for a homogeneous function to be continuous at ~0.

Continuous Map and Function on Compact Set Exercise 5.1.63. Suppose F (~x, ~y ) is a continuous map on A × K ⊂ Rm × Rn . Prove that if K is compact, then for any ~a ∈ A and > 0, there is δ > 0, such that ~x ∈ A, ~y ∈ K, and k~x − ~ak < δ imply kF (~x, ~y ) − F (~a, ~y )k < . Exercise 5.1.64. Suppose f (~x, ~y ) is a continuous function on A × K. Prove that if K is compact, then g(~x) = max~y∈K f (~x, ~y ) is a continuous function on A. Exercise 5.1.65. Suppose f (~x, y) is a continuous function on A × [0, 1]. Prove that Z 1 f (~x, y)dy is a continuous function on A. g(~x) = 0

Continuity in Coordinates A function f (~x, ~y ) is continuous in ~x if lim~x→~a f (~x, ~y ) = f (~a, ~y ). It is uniformly continuous in ~x if the limit is uniform in ~y : For any > 0, there is δ > 0, such that k~x − ~x0 k < δ implies kF (~x, ~y ) − F (~x0 , ~y )k ≤ for any ~y . Exercise 5.1.66. Prove that if f (~x, ~y ) is continuous, f (~x, ~y ) is continuous  then  xy if(x, y) 6= (0, 0) in ~x. Moreover, show that the function f (x, y) = x2 + y 2 is 0 if(x, y) = (0, 0) continuous in both x and y, but is not a continuous function. Exercise 5.1.67. Prove that if f (~x, ~y ) is continuous in ~x and is uniformly continuous in ~y , then f (~x, ~y ) is continuous. Exercise 5.1.68. Suppose f (~x, y) is continuous in ~x and in y. Prove that if f (~x, y) is monotone in y, then f (~x, y) is continuous.

5.2. MULTIVARIABLE ALGEBRA

5.2

269

Multivariable Algebra

To extend the differentiation from single to multivariable, we need to consider multivariable linear and polynomial maps. We introduce the necessary linear and multilinear algebras for multivariable analysis, including linear transform, linear functional, bilinear form, multilinear form, and exterior algebra. The discussion will be made on Euclidean spaces and can be easily extended to general finite dimensional vector spaces.

5.2.1

Linear Transform

A map L : Rn → Rm is a linear transform if it preserves the addition and scalar multiplication L(~x + ~y ) = L(~x) + L(~y ), L(c~x) = cL(~x).

(5.2.1)

As a consequence, a linear transform also preserves the linear combinations L(c1~x1 + c2~x2 + · · · + ck ~xk ) = c1 L(~x1 ) + c2 L(~x2 ) + · · · + ck L(~xk ). In particular, for the standard basis ~e1 = (1, 0, . . . , 0), ~e2 = (0, 1, . . . , 0), . . . , ~en = (0, 0, . . . , 1) of Rn and the image ~ai = L(~ei ) of the standard basis, we get L(~x) = x1~a1 + x2~a2 + · · · + xn~an . (5.2.2) Conversely, any map given by the formula (5.2.2) is a linear transform. Denote ~aj = (a1j , a2j , . . . , amj ). Then the i-th coordinate of L(~x) is li (~x) = ai1 x1 + ai2 x2 + · · · + ain xn . By expressing vectors as vertical lists of coordinates, the linear transform becomes     l1 (~x) a11 x1 + a12 x2 + · · · + a1n xn  l2 (~x)   a21 x1 + a22 x2 + · · · + a2n xn      L(~x) =  ..  =   ..  .    . lm (~x) am1 x1 + am2 x2 + · · · + amn xn    a11 a12 · · · a1n x1  a21 a22 · · · a2n   x2     =  .. .. ..   ..  = A~x,  . . .  .  am1 am2 · · · amn xn where A is an m × n matrix. Therefore matrices are really the explicit presentations of linear transforms. A linear transform L produces a matrix A = L(~e1 ) L(~e2 ) · · · L(~en ) . Conversely, a matrix A produces a linear transform L(~x) = A~x. Under the correspondence, the addition, scalar multiplication and composition of linear transforms become the addition, scalar multiplication and product of matrices.

270

CHAPTER 5. MULTIVARIABLE FUNCTION

A number valued linear transform l : Rn → R is called a linear functional. A linear functional can be written as l(~x) = a1 x1 + a2 x2 + · · · + an xn = ~a · ~x

(5.2.3)

for a unique vector ~a = (a1 , a2 , . . . , an ). The levels l(~x) = c of the linear functional are (n − 1)-dimensional hyperplanes in Rn orthogonal to ~a. Exercise 5.2.1. Verify that a map given by the formula (5.2.2) is a linear transform. In other words, the map satisfies (5.2.1). Exercise 5.2.2. Suppose L, K : Rn → Rm are linear transforms and c is a number. Verify that the maps K + L and cK defined by (K + L)(~x) = K(~x) + L(~x), (cK)(~x) = c(K(~x)) are linear transforms. Moreover, prove that if K and L correspond to m × n matrices A and B, then K + L and cK correspond to the matrices A + B and cA (the right way is to study the column vectors of K + L and cK). Exercise 5.2.3. Suppose L : Rn → Rm and K : Rk → Rn are linear transforms. Verify that the composition L◦K : Rk → Rm is a linear transform. Moreover, prove that if K and L correspond to matrices A and B, then L ◦ K correspond to the product matrix BA (again, study the column vectors of the matrix corresponding to the composition). Exercise 5.2.4. Suppose L : Rn → Rm is a linear transform. Verify that for any ~y ∈ Rm , ~x 7→ L(~x) · ~y is a linear functional on Rn . The linear functional (of ~x) can be expressed as L(~x) · ~y = ~x · ~z for a unique ~z ∈ Rn . 1. Verify that L∗ (~y ) = ~z : Rm → Rn is a linear transform. Note that L∗ is the unique linear transform satisfying L(~x) · ~y = ~x · L∗ (~y ) for all ~x ∈ Rn and ~y ∈ Rm .

(5.2.4)

2. Prove that (L + K)∗ = L∗ + K ∗ , (cK)∗ = cK ∗ , (L ◦ K)∗ = K ∗ ◦ L∗ . 3. Prove that if L corresponds to the m × n matrix A, then L∗ corresponds to the transpose AT of A. In particular, this implies (BA)T = AT B T . The linear transform L∗ is called the adjoint of L.

Given a linear transform L : Rn → Rm and norms on Rn and Rm , we have kL(~x)k ≤ |x1 |k~a1 k+|x2 |k~a2 k+· · ·+|xn |k~an k ≤ (k~a1 k+k~a2 k+· · ·+k~an k)k~xk∞ . By the equivalence of norms on Rn , we can find λ, such that kL(~x)k ≤ λk~xk. By kc~xk = |c|k~xk and L(c~x) = cL(~x), the smallest such λ is kLk = inf{λ : kL(~x)k ≤ λk~xk for any ~x} = inf{λ : kL(~x)k ≤ λ for any ~x satisfying k~xk = 1} = sup{kL(~x)k : k~xk = 1}.

(5.2.5)

The number kLk is the norm of the linear transform (with respect to the given norms on the Euclidean spaces). The norm kAk of a matrix A is the norm of the corresponding linear transform. The inequality kL(~x) − L(~y )k = kL(~x − ~y )k ≤ λk~x − ~y k

5.2. MULTIVARIABLE ALGEBRA

271

also implies that linear transforms from finitely dimensional vector spaces are continuous. The linear functions used in the differentiation of single variable functions will be extended to maps of the form ~a + L(~x). We will call such maps linear maps (the formal name is affine maps) to distinguish from linear transforms (for which ~a = ~0). Theorem 5.2.1. The norm of linear transform satisfies the three axioms for the norm. Moreover, the norm of composition satisfies kL ◦ Kk ≤ kLkkKk. Proof. By the definition, we have kLk ≥ 0. If kLk = 0, then kL(~x)k = 0 for any ~x. By the positivity of norm, we get L(~x) = ~0 for any ~x. In other words, L is the zero transform. This verifies the positivity of kLk. By the definition of kLk and kKk, we have kL(~x) + K(~x)k ≤ kL(~x)k + kK(~x)k ≤ kLkk~xk + kKkk~xk ≤ (kLk + kKk)k~xk. Then by the definition of kL + Kk, we get kL + Kk ≤ kLk + kKk. This verifies the triangle inequality. The scalar property can be similarly proved. Finally, we have k(L ◦ K)(~x)k = kL(K(~x))k ≤ kLkkK(~x)k ≤ kLkkKkk~xk. Then by the definition of kL ◦ Kk, we have kL ◦ Kk ≤ kLkkKk. Exercise 5.2.5. Prove that with respect to the Euclidean norm on Rn and the absolute value on R, the norm of the linear functional l(~x) = ~a · ~x : Rn → R is k~ak2 . Then extend the conclusion to linear transforms Rn → Rm with the Euclidean norm on Rn and the L∞ -norm on Rm . Exercise 5.2.6. Prove that if the Euclidean spaces are given the Euclidean norms, then kLk = supk~xk=k~yk=1 L(~x) · ~y . This implies that the adjoint linear transform in Exercise 5.2.4 satisfies kL∗ k = kLk with respect to the Euclidean norms. Exercise Suppose A = (aij ) is the matrix of a linear transform. Prove that q5.2.7. P 2 kAk ≤ aij with respect to the Euclidean norms. Exercise 5.2.8. Prove that the supremum in the definition of the norm of linear transform is in fact maximum. In other words, the supremum is reached at some vector of unit length. Exercise 5.2.9. Suppose a continuous map F : Rn → Rm is additive: F (~x + ~y ) = F (~x) + F (~y ). Prove that F is a linear map.

By using matrices, the collection of all the linear transforms from Rn to Rm is easily identified with the Euclidean space Rmn . On the space Rmn , the preferred norm is the norm of linear transform, although the choice is often not important because all the norms are equivalent. Example 5.2.1. For an m×n matrix A, the left multiplication LA (X) = AX : Rnk → Rmk is a linear transform. By kAXk ≤ kAkkXk, we have kLA k ≤ kAk. Similarly, the right multiplication RA (X) = XA : Rkm → Rkn is a linear transform with kRA k ≤ kAk.

272

CHAPTER 5. MULTIVARIABLE FUNCTION 2

2

Example 5.2.2. Consider the map of taking squares F (L) = L2 : Rn → Rn . If H is a linear transform satisfying kHk < , then by Proposition 5.2.1, we have k(L + H)2 − L2 k = kLH + HL + H 2 k ≤ kLHk + kHLk + kH 2 k ≤ kLkkHk + kHkkLk + kHk2 < 2kLk + 2 . From this it is easy to deduce that the square map is continuous. Exercise 5.2.10. Prove that with respect to the norm of linear transform, the addition, scalar multiplication and composition of linear transforms are continuous maps. Exercise 5.2.11. Is it possible to have kLA k = kAk for some choices of norms on the Euclidean space? Does the equality always hold for the special choice? Exercise 5.2.12. Suppose Rn is given a norm, which induces a norm for the col2 lection Rn of linear transforms from Rn to itself. Study the inverse of linear transforms. P n 2 n2 1. Prove that if kLk < 1, then ∞ n=0 L = 1 + L + L + · · · converges in R and is the inverse of I − L (I is the identity transform). 2. Prove that if L is invertible, then for any linear transform K satisfying kK − −1 2 1 −1 − L−1 k ≤ kK − LkkL k , K is also invertible and kK . Lk < kL−1 k 1 − kK − LkkL−1 k This implies that the collection GL(n) of invertible linear transforms on Rn 2 is an open subset of Rn . 3. Prove that L 7→ L−1 is a continuous map on GL(n).

5.2.2

Bilinear and Quadratic Form

A map B : Rm × Rn → Rk is bilinear if it is linear in the first vector B(~x + ~x0 , ~y ) = B(~x, ~y ) + B(~x0 , ~y ), B(c~x, ~y ) = cB(~x, ~y ),

(5.2.6)

and is also linear in the second vector B(~x, ~y + ~y 0 ) = B(~x, ~y ) + B(~x, ~y 0 ), B(~x, c~y ) = cB(~x, ~y ).

(5.2.7)

Bilinear maps can be considered as generalized products because the scalar product c~x : R × Rn → Rn , the dot product ~x · ~y : Rn × Rn → R, the 3-dimensional cross product (x1 , x2 , x3 )×(y1 , y2 , y3 ) = (x2 y3 −x3 y2 , x3 y1 −x1 y3 , x1 y2 −x2 y1 ) : R3 ×R3 → R3 , and the matrix product AB : Rmn × Rnk → Rmk

5.2. MULTIVARIABLE ALGEBRA

273

are all bilinear. The addition and scalar multiplication of bilinear maps are still bilinear. Moreover, if B is bilinear and L and K are linear, then B(L(~u), K(~v )) is bilinear (note that (L, K)(~u, ~v ) = (L(~u), K(~v )) is a linear transform). On the other hand, if L is linear, then L(B(~x, ~y )) is bilinear. For the standard bases ~ei and f~j of Rm and Rn , we get ! X X X X B(~x, ~y ) = B xi~ei , yj f~j = xi yj B(~ei , f~j ) = xi yj~aij . (5.2.8) i

j

i,j

i,j

Conversely, any map given by the formula (5.2.8) is bilinear. Moreover, we have ! X X kB(~x, ~y )k ≤ |xi ||yj |k~aij k ≤ k~aij k k~xk∞ k~y k∞ . i,j

i,j

This implies that the bilinear map is continuous, and we may define the norm of the bilinear map to be kBk = inf{λ : kB(~x, ~y )k ≤ λk~xkk~y k for any ~x} = sup{kB(~x, ~y )k : k~xk = k~y k = 1}.

(5.2.9)

The norm kBk depends on the choices of norms on the three spaces Rm , Rn and Rk . A map is bilinear if and only if its coordinate functions are bilinear functions. Bilinear functions b(~x, ~y ) for ~x ∈ Rm and ~y ∈ Rn are equivalent to matrices A = (aij ) via the formula b(~x, ~y ) =

X

aij xi yj = A~x · ~y .

(5.2.10)

i,j

The relation can also be written in terms of the linear transform as b(~x, ~y ) = L(~x) · ~y . The addition and scalar multiplication of bilinear functions correspond to the addition and scalar multiplication of matrices or linear transforms. A bilinear function on Rn × Rn is called a bilinear form on Rn . A bilinear form is symmetric if b(~x, ~y ) = b(~y , ~x), and is skew-symmetric if b(~x, ~y ) = −b(~y , ~x). We also call the corresponding matrices symmetric and skew-symmetric. Exercise 5.2.13. Find the norms of the scalar product, dot product and matrix product with respect to the Euclidean norms on the Euclidean spaces. Exercise 5.2.14. Prove that the norm of bilinear map satisfies the three conditions for norms. Exercise 5.2.15. Find the norm of a bilinear function b(~x, ~y ) = L(~x)·~y with respect to the Euclidean norms. Exercise 5.2.16. Prove that a bilinear form is skew-symmetric if and only if b(~x, ~x) = 0 for any ~x.

274

CHAPTER 5. MULTIVARIABLE FUNCTION

Exercise 5.2.17. Prove that any bilinear form is the unique sum of a symmetric form and a skew-symmetric form. Exercise 5.2.18. Suppose B : Rm × Rn → Rk is a continuous map satisfying B(~x + ~x0 , ~y ) = B(~x, ~y ) + B(~x0 , ~y ) and B(~x, ~y + ~y 0 ) = B(~x, ~y ) + B(~x, ~y 0 ). Prove that B is a bilinear map.

A skew-symmetric bilinear form on R2 is x1 y 1 b(~x, ~y ) = a12 x1 y2 + a21 x2 y1 = a12 x1 y2 − a12 x2 y1 = a12 det . x2 y 2 A skew-symmetric bilinear form on R3 is b(~x, ~y ) = a12 x1 y2 + a21 x2 y1 + a13 x1 y3 + a31 x3 y1 + a23 x2 y3 + a23 x3 y2 = a12 (x1 y2 − x2 y1 ) + a31 (x3 y1 − x1 y3 ) + a23 (x2 y3 − x3 y2 ) x2 y 2 x3 y 3 x1 y 1 = a23 det + a31 det + a12 det x3 y 3 x1 y 1 x2 y 2 = ~a · (~x × ~y ). In general, a skew-symmetric bilinear form on Rn is b(~x, ~y ) =

X

aij xi yj =

i6=j

X

aij (xi yj − xj yi ) =

i<j

X i<j

xi y i aij det . xj yj

Motivated by the formula on R3 , we may define n-dimensional cross product n(n−1) ~x × ~y = (−1)i+j+1 (xi yj − xj yi ) i<j : Rn × Rn → R 2 . (5.2.11) Then we still have b(~x, ~y ) = ~a · (~x × ~y ), n(n−1)

where the ij-coordinate of ~a ∈ R 2 is aij when i + j is odd and is aji when i + j is even. The formula for skew-symmetric bilinear forms may be compared with the formula (5.2.3) for linear functionals. The general cross product ~x ×~y is still bilinear and satisfies ~x ×~y = −~y ×~x. Therefore (x1~a1 + x2~a2 ) × (y1~a1 + y2~a2 ) =x1 y1~a1 × ~a1 + x1 y2~a1 × ~a2 + x2 y1~a2 × ~a1 + x2 y2~a2 × ~a2 x1 y1 =x1 y2~a1 × ~a2 − x2 y1~a1 × ~a2 = det ~a × ~a2 . x2 y2 1

(5.2.12)

In R3 , the cross product ~x × ~y is a vector orthogonal to the two vectors, with the direction determined by the right hand rule from ~x to ~y . Moreover, the Euclidean norm of ~x ×~y is given by the area of the parallelogram spanned by the two vectors k~x × ~y k2 = k~xk2 k~y k2 | sin θ|, where θ is the angle between the two vectors. In other dimensions, we cannot talk about the direction because ~x × ~y is in a different Euclidean space.

5.2. MULTIVARIABLE ALGEBRA

275

However, the length of the cross product is still the area of the parallelogram spanned by the two vectors sX X p Area(~x, ~y ) = (~x · ~x)(~y · ~y ) − (~x · ~y )2 = x2i yj2 − xi y i xj y j i,j

=

sX

i,j

sX (x2i yj2 + x2j yi2 − 2xi yi xj yj ) = (xi yj − xj yi )2

i<j

i<j

= k~x × ~y k2 .

(5.2.13)

In R2 , the cross product ~x × ~y ∈ R is the determinant of the matrix formed by the two vectors. The Euclidean norm of the cross product is | det ~x ~y |, which is the area of the parallelogram spanned by ~x and ~y . A linear transform L(~x) = A~x = x1~a1 + x2~a2 : R2 → Rn takes the parallelogram spanned by ~x, ~y ∈ R2 to the parallelogram spanned by L(~x), L(~y ) ∈ Rn . By (5.2.12), the areas of the parallelograms are related by Area(L(~x), L(~y )) = | det ~x ~y |k~a1 × ~a2 k2 = k~a1 × ~a2 k2 Area(~x, ~y ). Exercise 5.2.19 (Archimedes1 ). Find the area of the region A bounded by x = 0, y = 4, and y = x2 in the following way. 1. Show that the area of the triangle with vertices (a, a2 ), (a + h, (a + h)2 ) and (a + 2h, (a + 2h)2 ) is |h|3 . 2. Let Pn be the polygon bounded by the line connecting (0, 0) to (0, 4), the line connecting (0, 4) to (2, 4), and the line segments connecting points on 2 2n 1 the parabola y = x2 with x = 0, n−1 , n−1 , . . . , n−1 . 2 2 2 3. Prove that the area of the polygon Pn − Pn−1 is 4. Prove that the area of the region A is

P∞

n=0

1 . 4n−1

1 4n−1

=

16 . 3

A quadratic form q(~x) = b(~x, ~x) =

X

aij xi xj = A~x · ~x

(5.2.14)

i,j

is obtained by taking two vectors in a bilinear form to be the same. Since skew-symmetric bilinear forms induce the zero quadratic form, usually only symmetric bilinear forms are used to induce quadratic forms. Therefore quadratic forms are in one-to-one correspondence with symmetric bilinear 1

Archimedes of Syracuse, born 287 BC and died 212 BC.

276

CHAPTER 5. MULTIVARIABLE FUNCTION

forms (and symmetric matrices, and self-adjoint linear transforms) by (note that aij = aji ) q(x1 , x2 , . . . , xn ) X X = aii x2i + 2 aij xi xj 1≤i≤n =a11 x21 +

(5.2.15)

1≤i<j≤n

a22 x22

+ · · · + ann x2n + 2a12 x1 x2 + 2a13 x1 x3 + · · · + 2a(n−1)n xn−1 xn .

Exercise 5.2.20. Prove that a symmetric bilinear form can be recovered from the corresponding quadratic form by 1 1 b(~x, ~y ) = (q(~x + ~y ) − q(~x − ~y )) = (q(~x + ~y ) − q(~x) − q(~y )). 4 2

(5.2.16)

Exercise 5.2.21. Prove that a quadratic form is homogeneous of second order q(c~x) = c2 q(~x), and satisfies the parellelogram law q(~x + ~y ) + q(~x − ~y ) = 2q(~x) + 2q(~y ).

(5.2.17)

Exercise 5.2.22. Suppose a function q satisfies the parellelogram law (5.2.17). Define a function b(~x, ~y ) by (5.2.16). 1. Prove that q(~0) = 0, q(−~x) = q(~x). 2. Prove that b is symmetric. 3. Prove that b satisfies b(~x + ~y , ~z) + b(~x − ~y , ~z) = 2b(~x, ~z). 4. Suppose f (~x) is a function satisfying f (~0) = 0 and f (~x+~y )+f (~x−~y ) = 2f (~x). Prove that f is additive (see Exercise 5.2.9): f (~x + ~y ) = f (~x) + f (~y ). 5. Prove that if q is continuous, then b is a bilinear form.

A quadratic form is positive definite if ~x 6= 0 =⇒ q(~x) > 0. It is negative definite if ~x 6= 0 =⇒ q(~x) < 0. It is indefinite if it takes both positive and negative values. In addition to the three possibilities, a quadratic form is semi-positive definite if q(~x) ≥ 0 for all ~x and q(~x) = 0 for some ~x 6= ~0. It is semi-negative definite if q(~x) ≤ 0 for all ~x and q(~x) = 0 for some ~x 6= ~0. The terms of the form aij xi xj , i 6= j are cross terms. A quadratic form without cross terms is q = a11 x21 + a22 x22 + · · · + ann x2n . It is positive definite if and only if all aii > 0. It is negative definite if and only if all aii < 0. It is indefinite if and only if some aii > 0 and some ajj < 0. In general, the cross terms may be eliminated by the technique of completing the squares. Then the nature of the quadratic form can be determined.

5.2. MULTIVARIABLE ALGEBRA

277

Example 5.2.3. Consider the quadratic from q(x, y, z) = x2 + 13y 2 + 14z 2 + 6xy + 2xz + 18yz. Putting together all the terms involving x and completing a square, we get q = x2 + 6xy + 2xz + 13y 2 + 14z 2 + 18yz = [x2 + 2x(3y + z) + (3y + z)2 ] + 13y 2 + 14z 2 + 18yz − (3y + z)2 = (x + 3y + z)2 + 4y 2 + 13z 2 + 12yz. The remaining terms involve only y and z. Putting together all the terms involving y and completing a square, we get 4y 2 + 13z 2 + 12yz = (2y + 3z)2 + 4z 2 . Thus q = (x + 3y + z)2 + (2y + 3z)2 + (2z)2 = u2 + v 2 + w2 is positive definite. Example 5.2.4. The cross terms in the quadratic form q = 4x21 + 19x22 − 4x24 − 4x1 x2 + 4x1 x3 − 8x1 x4 + 10x2 x3 + 16x2 x4 + 12x3 x4 can be eliminated as follows. q = 4[x21 − x1 x2 + x1 x3 − 2x1 x4 ] + 19x22 − 4x24 + 10x2 x3 + 16x2 x4 + 12x3 x4 " 2 # 1 1 1 1 = 4 x21 + 2x1 − x2 + x3 − x4 + − x2 + x3 − x4 2 2 2 2 2 1 1 2 2 + 19x2 − 4x4 + 10x2 x3 + 16x2 x4 + 12x3 x4 − 4 − x2 + x3 − x4 2 2 2 1 1 2 2 = 4 x1 − x2 + x3 − x4 + 18 x22 + x2 x3 + x2 x4 − x23 − 8x24 + 16x3 x4 2 2 3 3 " 2 # 1 1 1 1 x3 + x4 + x3 + x4 = (2x1 − x2 + x3 − 2x4 )2 + 18 x22 + 2x2 3 3 3 3 2 1 1 − x23 − 8x24 + 16x3 x4 − 18 x3 + x4 3 3 2 1 1 2 = (2x1 − x2 + x3 − 2x4 ) + 18 x2 + x3 + x4 − 3(x23 − 4x3 x4 ) − 10x24 3 3 = (2x1 − x2 + x3 − 2x4 )2 + 2(3x2 + x3 + x4 )2 − 3[x23 + 2x3 (−2x4 ) + (−2x4 )2 ] − 10x24 + 3(−2x4 )2 = (2x1 − x2 + x3 − 2x4 )2 + 2(3x2 + x3 + x4 )2 − 3(x3 − 2x4 )2 + 2x24 = y12 + 2y22 − 3y33 + 2y42 . The result tells us that q is indefinite. Example 5.2.5. The quadratic form q = 4xy+y 2 has no x2 term. We may complete the square by using the y 2 term and get q = (y + 2x)2 − 4x2 = u2 − 4v 2 , which is indefinite. The quadratic form q = xy + yz has no square terms. We may eliminate the cross terms by introducing x = x1 +y1 , y = x1 −y1 , so that q = x21 −y12 +x1 z −y1 z. 1 2 1 2 1 Then we complete the square and get q = x1 − z − y1 + z = (x + y − 2 2 4 1 z)2 − (x − y + z)2 . The quadratic form is also indefinite. 4 Exercise 5.2.23. Eliminate the cross terms in the following quadratic forms. Then determine the nature of the quadratic forms.

278

CHAPTER 5. MULTIVARIABLE FUNCTION

1. x2 + 4xy − 5y 2 .

5. −2u2 − v 2 − 6w2 − 4uw + 2vw.

2. 2x2 + 4xy. 3.

4x21

4. x2 + 2y 2 + z 2 + 2xy − 2xz.

+ 4x1 x2 +

5x22 .

6. x21 + x23 + 2x1 x2 + 2x1 x3 + 2x1 x4 + 2x3 x4 .

Exercise 5.2.24. Eliminate the cross terms in the quadratic form x2 + 2y 2 + z 2 + 2xy − 2xz by first completing a square for terms involving z, then completing for terms involving y. Exercise 5.2.25. Prove that if a quadratic form is positive definite, then all the coefficients of the square terms must be positive. Exercise 5.2.26. Prove that if a quadratic form q(~x) is positive definite, then there is λ > 0, such that q(~x) ≥ λk~xk2 for any ~x.

Now let us study the technique of completing the squares in general. The k-th principal minor of a matrix A is the determinant of the k × k matrix formed by the entries in the first k rows and first k columns of A. Consider a quadratic form q(~x) = A~x · ~x, where A is a symmetric matrix. Let a11 a12 , . . . , dn = det A d1 = a11 , d2 = det a21 a22 be the principal minors of the coefficient matrix. If d1 6= 0, then eliminating all the cross terms involving x1 gives us 1 1 2 2 q(~x) = a11 x1 + 2x1 (a12 x2 + · · · + a1n xn ) + 2 (a12 x2 + · · · + a1n xn ) a11 a11 2 2 + a22 x2 + · · · + ann xn + 2a23 x2 x3 + 2a24 x2 x4 + · · · + 2a(n−1)n xn−1 xn 1 (a12 x2 + · · · + a1n xn )2 − a11 2 a1n a12 = d 1 x1 + x2 + · · · + xn + q2 (~x2 ), d1 d1 where q2 is a quadratic form of the truncated vector ~x2 = (x2 , x3 , . . . , xn ). The coefficient matrix A2 for q2 is obtained as follows. For each 2 ≤ i ≤ n, a1i adding − multiple of the first column of A to the i-th column will make the a11 d1 ~0 i-th term in the first row to become zero. Then we get a matrix . ∗ A2 Since the column operation does not change the determinant of the matrix (2) (2) (2) (and all the principal minors), the principal minors d1 , d2 , . . . , dn−1 of A2 (1) (1) (1) are related to the principal minors d1 = d1 , d2 = d2 , . . . , dn = dn of A1 (1) (2) by dk+1 = d1 dk . The discussion sets up an inductive argument. Assume d1 , d2 , . . . , dk are all nonzero. Then we may complete the squares in k steps and obtain (1)

(2)

q(~x) = d1 (x1 + b12 x2 + · · · + b1n xn )2 + d1 (x2 + b23 x3 + · · · + b2n xn )2 (k)

+ · · · + d1 (xk + ak(k+1) xk+1 + · · · + akn xn )2 + qk+1 (~xk+1 ),

5.2. MULTIVARIABLE ALGEBRA with

(i−1)

(i) d1

=

d2

(i−1) d1

279

(i−2)

=

d3

(i−2) d2

(1)

= ··· =

di

(1) di−1

=

di , di−1

dk+1 . dk Thus we have shown that if d1 , d2 , . . . , dn are all nonzero, then there is an “upper triangular” change of variables (k+1)

and the coefficient of x2k+1 in qk+1 is d1

=

y1 = x1 +b12 x2 +b13 x3 + · · · y2 = x2 +b23 x3 + · · · .. . yn = such that q = d1 y12 + criterion

+b1n xn , +b2n xn , xn ,

dn 2 d2 2 y2 + · · · + y . Consequently, we get Sylvester’s d1 dn−1 n

1. If d1 > 0, d2 > 0, . . . , dn > 0, then q(~x) is positive definite. 2. If −d1 > 0, d2 > 0, . . . , (−1)n dn > 0, then q(~x) is negative definite. 3. If d1 > 0, d2 > 0, . . . , dk > 0, dk+1 < 0, or −d1 > 0, d2 > 0, . . . , (−1)k dk > 0, (−1)k+1 dk+1 < 0, then q(~x) is indefinite.

5.2.3

Multilinear Map and Polynomial

A map F (~x1 , ~x2 , . . . , ~xk ) : Rn1 × Rn2 × · · · × Rnk → Rm is multilinear if it is linear in each of its k variables. For example, if B1 (~x, ~y ) and B2 (~u, ~v ) are bilinear, then B1 (~x, B2 (~u, ~v )) is a trilinear map in ~x, ~u, ~v . The addition, scalar multiplication and composition of multilinear maps of matching types are still multilinear. Similar to the discussion of linear and bilinear maps, a map is multilinear if and only if it is given by X F = x1i1 x2i2 · · · xkik~ai1 i2 ···ik , ~ai1 i2 ···ik = F (~ei1 , ~ei2 , . . . , ~eik ), (5.2.18) i1 ,i2 ,··· ,ik

in terms of the coordinates of the variables. Moreover, we have ! X kF (~x1 , ~x2 , . . . , ~xk )k ≤ k~ai1 i2 ···ik k k~x1 k∞ k~x2 k∞ · · · k~xk k∞ . i1 ,i2 ,··· ,ik

Thus the norm of multilinear maps can be defined similar to the linear and bilinear maps. A map is multilinear if and only if its coordinate functions are multilinear (of the same type). Multilinear functions are equivalent to the collection (ai1 i2 ···ik ) of its coefficients, which can be imagined as a “k-dimensional matrix”. Multilinear functions on Rn × Rn × · · · × Rn are called multilinear forms on Rn .

280

CHAPTER 5. MULTIVARIABLE FUNCTION

A multilinear map F on Rn × Rn × · · · × Rn is symmetric if switching any two variables does not change the value F (~x1 , . . . , ~xi , . . . , ~xj , . . . , ~xk ) = F (~x1 , . . . , ~xj , . . . , ~xi , . . . , ~xk ). This is equivalent to the coefficients ~ai1 i2 ···ik being independent of the order of indices. A k-th order form X ai1 i2 ···ik xi1 xi2 · · · xik (5.2.19) φ(~x) = f (~x, ~x, . . . , ~x) = i1 ,i2 ,...,ik

is obtained by taking all vectors in a multilinear form of k vectors to be the same. Similar to the quadratic forms, usually only symmetric multilinear forms are used here, and this gives an equivalence between symmetric multilinear forms of k vectors and k-th order forms. Write ak1 k2 ···kn = ai1 i2 ···ik when the collection {i1 , i2 , . . . , ik } consists of ki copies of i for any 1 ≤ i ≤ n. For example, a244 x2 x4 x4 = a424 x4 x2 x4 = a442 x4 x4 x2 = a0102 x01 x12 x03 x24 . Then we have φ(x1 , x2 , . . . , xn ) =

X k1 +k2 +···+kn

k! ak1 k2 ···kn xk11 xk22 · · · xknn . k1 !k2 ! · · · kn ! =k,k ≥0 i

(5.2.20) The k-th order forms satisfy φ(c~x) = c φ(~x). They are homogeneous functions of order k and are multivariable analogues of the monomial xk for single variable polynomials. Thus a k-th order multivariable polynomial on Rn is a linear combination of j-th order forms with j ≤ k k

p(~x) = φ0 (~x) + φ1 (~x) + φ2 (~x) + · · · + φk (~x) X = bk1 k2 ···kn xk11 xk22 · · · xknn .

(5.2.21)

k1 +k2 +···+kn ≤k,ki ≥0

By defining a polynomial as the sum of reductions of multilinear forms, the concept extends to general vector spaces and to maps between vector spaces. In particular, a map F : Rn → Rm is a polynomial map if and only if all its coordinate functions are polynomials. A multilinear map F on Rn × Rn × · · · × Rn is alternating if switching any two variables changes the sign F (~x1 , . . . , ~xi , . . . , ~xj , . . . , ~xk ) = −F (~x1 , . . . , ~xj , . . . , ~xi , . . . , ~xk ). This is equivalent to the coefficients ~ai1 i2 ···ik changing signs when two indices are exchanged. The alternating property is also equivalent to the value being zero when two vectors are equal (see Exercise 5.2.16) F (~x1 , . . . , ~y , . . . , ~y , . . . , ~xk ) = 0.

5.2. MULTIVARIABLE ALGEBRA

281

In particular, if k > n, then at least two indices in ~ai1 i2 ···ik must be the same, and the coefficient vector is zero. Therefore any alternating multilinear map of k-variables in Rn is zero when k > n. Suppose k = n. Then we have F (~x1 , ~x2 , . . . , ~xn ) X = x1i1 x2i2 · · · xnin F (~ei1 , ~ei2 , . . . , ~ein ) i1 ,i2 ,...,in

=

X

sign(i1 , i2 , . . . , in )x1i1 x2i2 · · · xnin F (~e1 , ~e2 , . . . , ~en )

i1 ,i2 ,...,in



x11 x21 · · ·  x12 x22 · · ·  = det  .. ..  . . x1n x2n · · ·

 xn1 xn2   ..  F (~e1 , ~e2 , . . . , ~en ), . 

(5.2.22)

xnn

where (i1 , i2 , . . . , in ) is a rearrangement of (1, 2, . . . , n) (called a permutation), and sign(i1 , i2 , . . . , in ) is 1 if it takes even number of steps to recover (1, 2, . . . , n) from (i1 , i2 , . . . , in ) by exchanging pairs of numbers, and sign(i1 , i2 , . . . , in ) is −1 if it takes odd number of steps. Thus the determinant of n × n matrices is the unique multilinear alternating function of the n column vectors, such that the value at the identity matrix (corresponding to the columns ~e1 , ~e2 , . . . , ~en ) is 1. Suppose k ≤ n. Then we have F (~ei1 , ~ei2 , . . . , ~eik ) = ±F (~ej1 , ~ej2 , . . . , ~ejk ), where j1 < j2 < · · · < jk is the rearrangement of i1 , i2 , . . . , ik in increasing order. A computation similar to (5.2.22) tells us F (~x1 , ~x2 , . . . , ~xk ) X = x1i1 x2i2 · · · xkik F (~ei1 , ~ei2 , . . . , ~eik ) i1 ,i2 ,...,ik



x1j1 x2j1 · · · x1j x2j · · · X 2  2 = det  .. ..  . . j1 <j2 <···<jk x1jk x2jk · · ·

 xkj1 xkj2   ..  F (~ej1 , ~ej2 , . . . , ~ejk ). . 

(5.2.23)

xkjk

The skew-symmetric bilinear forms can be described by the generalized n(n−1) n(n−1) cross product ~x × ~y : Rn × Rn → R 2 , where the coordinates of R 2 are indexed by 1 ≤ i < j ≤ n. For alternating multilinear k-forms, we need to consider cross product of k vectors with value in a Euclidean space in which the coordinates are indexed by 1 ≤ i1 < i2 < · · · < ik ≤ n. Therefore for any subset I = {i1 , i2 , . . . , ik } ⊂ [n] = {1, 2, . . . , n}, arrange the indices in I in increasing order and introduce the symbol ~e∧I = ~ei1 ∧ ~ei2 ∧ · · · ∧ ~eik .

282

CHAPTER 5. MULTIVARIABLE FUNCTION

n! Define the k-th exterior product Λk Rn ∼ = R k!(n−k)! of Rn to be the vector space with symbols ~e∧I as basis (called the standard basis of Λk Rn ). Then define the k-th order wedge product to be the map

~x1 ∧ ~x2 ∧ · · · ∧ ~xk : Rn × Rn × · · · × Rn → Λk Rn given by 

x1i1 x2i1 · · · x1i x2i · · · X 2  2 det  .. ~x1 ∧ ~x2 ∧ · · · ∧ ~xk = ..  . . i1
 xki1 xki2   ..  ~ei1 ∧ ~ei2 ∧ · · · ∧ ~eik . .  xkik

(5.2.24) The wedge product is multilinear and alternating. The notation is consistent in the sense that if 1 ≤ i1 < i2 < · · · < ik ≤ n and ~xj = ~eij , then the right side of (5.2.24) is indeed equal to ~ei1 ∧ ~ei2 ∧ · · · ∧ ~eik . Moreover, if I = {i1 , i2 , . . . , ik } but ij is not necessarily increasing, then ~ei1 ∧~ei2 ∧· · ·∧~eik = ±~e∧I , where the sign is determined by the number of exchanges needed in order to rearrange the indices in increasing order. Suppose ~y1 = a11~x1 + a12~x2 + · · · + a1k ~xk , ~y2 = a21~x1 + a22~x2 + · · · + a2k ~xk , .. . ~yk = ak1~x1 + ak2~x2 + · · · + akk ~xk , and denote the coefficient matrix A = (aij ). Then we have ~y1 ∧ ~y2 ∧ · · · ∧ ~yk = (det A)~x1 ∧ ~x2 ∧ · · · ∧ ~xk .

(5.2.25)

The equality can be derived similar to (5.2.22). It can also be derived by the fact that both sides are alternating multilinear maps of the row vectors of A, and both sides have the same values when the rows of A are standard basis vectors. For the special case that ~xi form the standard basis of the Euclidean space, the equality (5.2.25) becomes ~x1 ∧ ~x2 ∧ · · · ∧ ~xn = det ~x1 ~x2 · · · ~xn ~e[n] for ~xi ∈ Rn . (5.2.26) Taking the map F in the equality (5.2.22) to be the wedge product will also lead to (5.2.26). The equality also generalizes (5.2.12) for the case n = 2. If ~x1 , ~x2 , . . . , ~xk are linearly independent, then by extending the vectors to a basis of Rn , the equality (5.2.26) implies ~x1 ∧~x2 ∧· · ·∧~xk 6= 0. Conversely, if the vectors are linearly dependent, then one vector, say ~x1 , can be written as a linear combination of the others ~x1 = c2~x2 + · · · + ck ~xk ,

5.2. MULTIVARIABLE ALGEBRA

283

and by the alternating property of the wedge product, ~x1 ∧ ~x2 ∧ · · · ∧ ~xk = c2~x2 ∧ ~x2 ∧ · · · ∧ ~xk + · · · + ck ~xk ∧ ~x2 ∧ · · · ∧ ~xk = 0. Therefore we conclude that ~x1 , ~x2 , . . . , ~xk are linearly independent ⇐⇒ ~x1 ∧ ~x2 ∧ · · · ∧ ~xk 6= 0. (5.2.27) Given a linear transform L : Rn → Rm , we define a linear transform ΛL = Λk L : Λk Rn → Λk Rm by linearly extending the values on the basis vectors ΛL(~ei1 ∧ ~ei2 ∧ · · · ∧ ~eik ) = L(~ei1 ) ∧ L(~ei2 ) ∧ · · · ∧ L(~eik ). Then we have ΛL(~x1 ∧ ~x2 ∧ · · · ∧ ~xk ) = L(~x1 ) ∧ L(~x2 ) ∧ · · · ∧ L(~xk )

(5.2.28)

because both sides are multilinear maps from (Rn )k to ΛRm with the same values at the standard basis vectors. The equality (5.2.28) implies that Λ(L ◦ K) = ΛL ◦ ΛK. Therefore if L is invertible, then ΛL is also invertible. Since any basis ~b1 , ~b2 , . . . , ~bn of Rn is the image of the standard basis ~e1 , ~e2 , . . . , ~en under an invertible linear map, we conclude from (5.2.28) that ~b∧I = ~bi1 ∧ ~bi2 ∧ · · · ∧ ~bik also form a basis of the exterior product Λk Rn . This suggests that the exterior product ΛV can be defined for any vector space V . Because the dimension of Λn Rn is 1, the linear transform Λn L : Λn Rn → Λn Rn induced by a linear transform L : Rn → Rn is multiplying a number L(~x1 ) ∧ L(~x2 ) ∧ · · · ∧ L(~xn ) = d~x1 ∧ ~x2 ∧ · · · ∧ ~xn .

(5.2.29)

By comparing the special case L(~e1 ) ∧ L(~e2 ) ∧ · · · ∧ L(~en ) = d~e∧[n] with (5.2.26), we find the number d to be the determinant of the matrix corresponding to L. Therefore the equation 5.2.29 gives a definition L(~x1 ) ∧ L(~x2 ) ∧ · · · ∧ L(~xn ) = (det L)~x1 ∧ ~x2 ∧ · · · ∧ ~xn

(5.2.30)

of the determinant of a linear transform from Rn to itself without explicitly referring to the standard basis. Such a definition can be directly extended to general vector spaces. We also note that the equality Λn (L◦K) = Λn L◦Λn K simply means det(L ◦ K) = det L det K. (5.2.31) This gives a conceptual proof of the similar equality for matrices.

284

CHAPTER 5. MULTIVARIABLE FUNCTION

We have Λ0 Rn = R, Λ1 Rn = Rn , Λn Rn = R~e∧[n] , and Λk Rn = 0 for k > n. The total exterior product space ΛRn = Λ0 Rn ⊕ Λ1 Rn ⊕ Λ2 Rn ⊕ · · · ⊕ Λn Rn has dimension 2n . Define the wedge product ∧ : Λk Rn × Λl Rn → Λk+l Rn , ΛRn × ΛRn → ΛRn to be the bilinear map that extends the obvious values at the standard basis vectors. Then we have (~x1 ∧~x2 ∧· · ·∧~xk )∧(~y1 ∧~y2 ∧· · ·∧~yl ) = ~x1 ∧~x2 ∧· · ·∧~xk ∧~y1 ∧~y2 ∧· · ·∧~yl (5.2.32) because both sides are multilinear maps from (Rn )k+l to Λk+l Rn with the same values at the standard basis vectors ~e∧I . The associativity (~λ ∧ µ ~ ) ∧ ~ν = ~λ ∧ (~µ ∧ ~ν )

(5.2.33)

and the graded commutativity ~λ ∧ µ ~ = (−1)kl µ ~ ∧ ~λ for ~λ ∈ Λk Rn , µ ~ ∈ Λl R n

(5.2.34)

can also be derived by considering both sides as multilinear maps that coincide at the standard basis vectors. This makes ΛRn into an algebra with unit 1 ∈ Λ0 Rn , called the exterior algebra. The exterior algebra can be generated from the unit 1 and the vectors in Rn by the wedge product. For a linear transform L : Rn → Rm , the induced transform ΛL on the exterior algebra is an algebra homomorphism ΛL(1) = 1, ΛL(a~λ + b~µ) = aΛL(~λ) + bΛL(~µ), ΛL(~λ ∧ µ ~ ) = ΛL(~λ) ∧ ΛL(~µ). By taking the standard basis ∧~eI as an orthonormal basis, the exterior algebra ΛRn becomes an inner product space. Then we have the equality (~x1 ∧~x2 ∧· · ·∧~xk )·(~y1 ∧~y2 ∧· · ·∧~yk ) = det(~xi ·~yj )1≤i,j≤k = det X T Y, (5.2.35) where X = ~x1 ~x2 · · · ~xk , Y = ~y1 ~y2 · · · ~yk . The equality is a consequence of the fact that both sides are multilinear functions on (Rn )2k and are equal at the standard basis. As a result, if U : Rn → Rn is an orthogonal transform, then ΛU : ΛRn → ΛRn is also an orthogonal transform. In other words, if ~b1 , ~b2 , . . . , ~bn is an orthogonal basis of Rn , then ~b∧I also form an orthogonal basis of ΛRn . The equality (5.2.35) also tells us that if ~λ = ~x1 ∧ ~x2 ∧ · · · ∧ ~xk , µ ~ = ~ ~y1 ∧ ~y2 ∧ · · · ∧ ~yk , ξ = ~z1 ∧ ~z2 ∧ · · · ∧ ~zl , ~η = w ~1 ∧ w ~2 ∧ · · · ∧ w ~ l , such that ~xi · w ~ j = 0, then ~ · (~µ ∧ ~η ) = (~λ · µ ~ (~λ ∧ ξ) ~ )(~η · ξ). (5.2.36)

5.2. MULTIVARIABLE ALGEBRA

285

In general, for a subspace V ⊂ Rn , we let ΛV to be the subspace of ΛRn consisting of linear combination of ~x1 ∧ ~x2 ∧ · · · ∧ ~xk with ~xi ∈ V . Now if V and W are subspaces orthogonal to each other, then the equality (5.2.36) holds for ~λ ∈ ΛV and ~η ∈ ΛW . As suggested by the cross product and the extension to higher dimension, the Euclidean norm q √ k~x1 ∧ ~x2 ∧ · · · ∧ ~xk k2 = det(~xi · ~xj )1≤i,j≤k = det X T X (5.2.37) with respect to the inner product in Λk Rn is the k-dimensional volume of the parallelepiped spanned by the k vectors in Rn . Strictly speaking, the concept of volume is yet to be defined, and the formula needs to be justified. The justification will be carried out in Section 7.1.5. Note that by (5.2.36), we have ~λ ∈ ΛV, ~µ ∈ ΛW, V ⊥ W =⇒ k~λ ∧ µ ~ k2 = k~λk2 k~µk2 .

(5.2.38)

Define the duality map (or Hodge star operator) on the exterior algebra ~λ 7→ ~λ? : Λk Rn → Λn−k Rn , as a linear map with the value on the standard basis given by ? ~e∧I = ±~e∧([n]−I) ,

where the sign is chosen so that ? ~e∧I ∧ ~e∧I = ~e∧[n] = ~e1 ∧ ~e2 ∧ · · · ∧ ~en .

(5.2.39)

For example, we have ~ei? = (−1)i−1~e∧([n]−i) , ? ~ei∧j = (−1)i+j−1~e∧([n]−{i,j}) , for i < j, ? ~e∧([n]−i) = (−1)n−i~ei .

Moreover, the dual in R2 is the counterclockwise rotation by 90 degrees (x1 , x2 )? = x1~e1? + x2~e2? = x1~e2 − x2~e1 = (−x2 , x1 ),

(5.2.40)

and the cross product in R3 is the dual of the wedge product ((x1 , x2 , x3 ) ∧ (y1 , y2 , y3 ))? x1 y 1 x1 y 1 x2 y 2 ∗ ∗ = det (~e1 ∧ ~e2 ) + det (~e1 ∧ ~e3 ) + det (~e2 ∧ ~e3 )∗ x2 y 2 x3 y 3 x3 y 3 x1 y 1 x1 y1 x2 y2 = det ~e − det ~e + det ~e x2 y 2 3 x3 y3 2 x3 y3 1 =(x1 , x2 , x3 ) × (y1 , y2 , y3 ).

(5.2.41)

286

CHAPTER 5. MULTIVARIABLE FUNCTION

? ? ?? Suppose I consists of k indices. Then by ~e∧I ∧ ~e∧I = ~e∧[n] = ~e∧I ∧ ~e∧I = k(n−k) ?? ? (−1) ~e∧I ∧ ~e∧I , we have

~λ ?? = (−1)k(n−k)~λ for ~λ ∈ Λk Rn .

(5.2.42)

~λ ∧ µ ~ ? = (~λ · µ ~ )~e∧[n]

(5.2.43)

We also have because both sides are bilinear maps from (Λk Rn )2 to Λn Rn that are equal at ~= the standard basis vectors. By (~λ? · µ ~ ? )~e∧[n] = ~λ? ∧ µ ~ ?? = (−1)k(n−k)~λ? ∧ µ µ ~ ∧ ~λ? = (~µ · ~λ)~e∧[n] = (~λ · µ ~ )~e∧[n] , we find the duality map preserves the inner product ~λ? · µ ~ ? = ~λ · µ ~, (5.2.44) and therefore also preserves the Euclidean norm k~λ? k2 = k~λk2 . The equalities (5.2.42) and (5.2.43) imply µ ~ ∧ ~λ = (~µ · ~λ∗ )~e∧[n] .

(5.2.45)

Since a vector ξ~ ∈ Λk Rn is uniquely determined by the dot product µ ~ · ξ~ for k n ∗ all µ ~ ∈ Λ R , the dual ~λ may be characterized as the unique exterior vector satisfying (5.2.45) for all µ ~. n n Suppose U : R → R is an orthogonal transform. Then ΛU (~µ) ∧ ΛU (~λ) = ΛU (~µ ∧ ~λ) = (~µ · ~λ∗ )ΛU (~e∧[n] ) = (ΛU (~µ) · ΛU (~λ∗ ))(det U )~e∧[n] . Since ΛU (~µ) can be any vector in Λk Rn , the equality tells us ΛU (~λ)∗ = (det U )ΛU (~λ∗ ) for ~λ ∈ Λk Rn and orthogonal U.

(5.2.46)

A basis ~b1 , ~b2 , . . . , ~bn of Rn is positively oriented if det ~b1 ~b2 · · · ~bn > 0. By (5.2.26), this means ~b1 ∧ ~b2 ∧ · · · ∧ ~bn = d~e1 ∧ ~e2 ∧ · · · ∧ ~en , d > 0. Suppose ~x1 , ~x2 , . . . , ~xk and ~xk+1 , ~xk+2 , . . . , ~xn are two collections of linearly independent vectors, such that any vector in the first collection is orthogonal to any vector in the second. Then the union ~x1 , ~x2 , . . . , ~xn of two collections is a basis in Rn . By modifying ~xn by a sign if necessary, we assume the basis ~x1 , ~x2 , . . . , ~xn is positively oriented. Then we claim that (~x1 ∧ ~x2 ∧ · · · ∧ ~xk )∗ = c~xk ∧ ~xk+1 ∧ · · · ∧ ~xn

(5.2.47)

5.2. MULTIVARIABLE ALGEBRA with c=

287

k~x1 ∧ ~x2 ∧ · · · ∧ ~xk k2 > 0. k~xk ∧ ~xk+1 ∧ · · · ∧ ~xn k2

The equality (5.2.47) suggests that the duality can be defined in the exterior algebra of any inner product space. To derive the equality (5.2.47), we choose an orthonormal bases ~b1 , ~b2 , . . . , ~bk for the span of ~x1 , ~x2 , . . . , ~xk and an orthonormal bases ~bk+1 , ~bk+2 , . . . , ~bn for the span of ~xk+1 , ~xk+2 , . . . , ~xn , such that ~x1 ∧ ~x2 ∧ · · · ∧ ~xk = α~b1 ∧ ~b2 ∧ · · · ∧ ~bk , ~xk ∧ ~xk+1 ∧ · · · ∧ ~xn = β~bk+1 ∧ ~bk+2 ∧ · · · ∧ ~bn , for some α, β > 0. Then ~b1 , ~b2 , . . . , ~bn is also a positively oriented orthonormal basis of Rn . Let U be the orthogonal transform given by the matrix ~b1 ~b2 · · · ~bn . Then det U > 0 and by (5.2.46) ~b1 ∧ ~b2 ∧ · · · ∧ ~b∗ = (ΛU (~e1 ∧ ~e2 ∧ · · · ∧ ~ek ))∗ k = ΛU (~ek+1 ∧ ~ek+2 ∧ · · · ∧ ~en ) = ~bk+1 ∧ ~bk+2 ∧ · · · ∧ ~bn , α ~xk ∧ ~xk+1 ∧ · · · ∧ ~xn . β In the special case k = n − 1, we find ~y = (~x1 ∧ ~x2 ∧ · · · ∧ ~xn−1 )∗ is a vector Rn such that Therefore we conclude (~x1 ∧ ~x2 ∧ · · · ∧ ~xk )∗ =

~xi · ~y = 0, ~x1 ∧ ~x2 ∧ · · · ∧ ~xn−1 ∧ ~y = k~x1 ∧ ~x2 ∧ · · · ∧ ~xn−1 k22~e[n] . Therefore ~y is a vector orthogonal to ~x1 , ~x2 , . . . , ~xn−1 and has “compatible” direction. The geometrical meaning and the equality (5.2.41) shows that the map (~x1 ∧ ~x2 ∧ · · · ∧ ~xn−1 )∗ : (Rn )n−1 → Rn is the true generalization of the cross product in R3 .

5.2.4

Additional Exercises

Isometry between Normed Spaces Let k~xk and 9~x9 be norms on Rn and Rm . A map F : Rn → Rm is an isometry if it satisfies 9F (~x) − F (~y )9 = k~x − ~y k. Exercise 7.2.5 says that isometries preserve the length of curves. A map F : Rn → Rm is affine if the following equivalent conditions are satisfied. 1. F ((1 − t)~x + t~y ) = (1 − t)F (~x) + tF (~y ) for any ~x, ~y ∈ Rn and any 0 ≤ t ≤ 1. 2. F ((1 − t)~x + t~y ) = (1 − t)F (~x) + tF (~y ) for any ~x, ~y ∈ Rn and any t ∈ R.

288

CHAPTER 5. MULTIVARIABLE FUNCTION

3. F (~x) = ~a + L(~x) for a fixed vector ~a (necessarily equal to F (~0)) and a linear transform L. Exercise 5.2.27. Suppose the norm 9~x9 on Rm is strictly convex, in the sense 9~x + ~y 9 = 9~x 9 + 9 ~y 9 implies ~x and ~y are parallel. Prove that for any isometry F and ~x, ~y , ~z = (1 − t)~x + t~y ∈ Rn , 0 ≤ t ≤ 1, the vectors F (~z) − F (~x) and F (~y ) − F (~z) must be parallel. Then prove that the isometry F is affine. Exercise 5.2.28. Show that the Lp -norm is strictly convex for 1 < p < ∞. Then show that an isometry between Euclidean spaces with the Euclidean norms must be of the form ~a + L(~x), where L is a linear transform with its matrix A satisfying AT A = I. Exercise 5.2.29. For any ~u ∈ Rn , the map φ(~x) = 2~u − ~x is the reflection with respect to ~u. A subset K ⊂ Rn is symmetric with respect to ~u if ~x ∈ K implies φ(~x) ∈ K. The subset has radius r(K) = sup~x∈K k~x − ~uk. 1. Prove that φ is an isometry from (Rn , k~xk) to itself, kφ(~x) − ~xk = 2k~x − ~uk, and ~u is the only point fixed by φ (i.e., satisfying φ(~x) = ~x). 2. For a subset K symmetric with respect to ~u, prove that the subset has diameter sup~x,~y∈K k~x − ~y k = 2r(K). Then prove that the subset K 0 = 1 {~x : K ⊂ B(~x, r(K))} has radius r(K 0 ) ≤ r(K). 2 ~a + ~b . Prove that 3. For any ~a, ~b ∈ Rn , denote ~u = 2 1 K0 = {~x : k~x − ~ak = k~x − ~bk = k~a − ~bk} 2 is symmetric with respect to ~u. Then prove that the sequence Kn defined by Kn+1 = Kn0 satisfies ∩Kn = {~u}. The last part gives a characterization of the middle point ~u of two points ~a and ~b purely in terms of the norm. Exercise 5.2.30 (Mazur-Ulam Theorem). Prove that an invertible isometry is necessarily affine. Specifically, suppose F : (Rn , k~xk) → (Rn , 9~x9) is an invertible isometry. By using the characterization of the middle point in Exercise 5.2.28, prove that the map preserves the middle point ! ~a + ~b F (~a) + F (~b) F = . 2 2 Then further prove that the property implies F is affine. Exercise 5.2.31. Let φ(t) be a real function and consider F (t) = (t, φ(t)) : R → R2 . For the absolute value on R and the L∞ -norm on R2 , find suitable condition on φ to make sure F is an isometry. The exercise shows that an isometry is not necessarily affine in general.

Chapter 6 Multivariable Differentiation

289

290

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION

6.1

Differentiation

The differentiation can be extended from single to multivariable by considering the linear approximation. The coefficients in the linear approximation are given by partial derivatives, and most of the properties of differentiation can be extended. However, the multivariable differentiability is no longer equivalent to the existence of partial derivatives. Moreover, we can consider derivatives in directions other than the coordinate directions (which give the partial derivatives).

6.1.1

Differentiability and Derivative

The differentiation of maps between Euclidean spaces can be defined by directly generalizing the definition for single variable functions. Denote ∆~x = ~x − ~x0 . Definition 6.1.1. A map F (~x) defined on a ball around ~x0 is differentiable at ~x0 if there is a linear map ~a + L(∆~x), such that for any > 0, there is δ > 0, such that k∆~xk = k~x − ~x0 k < δ =⇒ kF (~x) − ~a − L(∆~x)k ≤ k∆~xk.

(6.1.1)

The linear transform L is the derivative of F at ~x0 and is denoted F 0 (~x0 ). The condition (6.1.1) can be rephrased as kF (~x0 + ∆~x) − F (~x0 ) − L(∆~x)k = 0. k∆~xk ∆~ x→~0

~a = F (~x0 ), lim

(6.1.2)

The differentiability at a point requires the map to be defined everywhere near the point. Thus for single variable functions, the differentiability is defined only for functions on open intervals or unions of open intervals. For multivariable maps, the differentiability is defined only for maps on open subsets. In the future, the differentiability will be extended to maps defined on “differentiable subsets” (called submanifolds). Similar to the single variable case, we denote the differential dF = L(d~x) of a map F . Again at the moment it is only a symbolic notation. Example 6.1.1. For the map F (x, y) = (x2 + y 2 , xy) : R2 → R2 , we have F (x, y) − F (x0 , y0 ) = (2x0 ∆x + ∆x2 + 2y0 ∆y + ∆y 2 , y0 ∆x + x0 ∆y + ∆x∆y) = (2x0 ∆x + 2y0 ∆y, y0 ∆x + x0 ∆y) + (∆x2 + ∆y 2 , ∆x∆y). Since k(∆x2 + ∆y 2 , ∆x∆y)k∞ ≤ 2k(∆x, ∆y)k2∞ , F is differentiable at (x0 , y0 ) and the derivative F 0 (x0 , y0 ) is the linear transform (u, v) 7→ (2x0 u + 2y0 v, y0 u + x0 v). The differential is d(x0 ,y0 ) F = (2x0 dx + 2y0 dy, y0 dx + x0 dy). Example 6.1.2. For the function f (~x) = ~x · ~x = k~xk22 , we have f (~x0 + ∆~x) − f (~x0 ) = 2~x0 · ∆~x + k∆~xk22 . Since 2~x0 · ∆~x is a linear functional of ∆~x, f is differentiable, with the derivative f 0 (~x0 )(~v ) = 2~x0 · ~v and the differential d~x0 f = 2~x0 · d~x.

6.1. DIFFERENTIATION

291 2

Example 6.1.3. The space of n × n matrices form the Euclidean space Rn . For 2 2 the map F (X) = X 2 : Rn → Rn of taking squares of matrices, we have F (A + H) − F (A) = (A2 + AH + HA + H 2 ) − A2 = AH + HA + H 2 . The map H 7→ AH + HA is a linear transform, and by Proposition 5.2.1, kH 2 k ≤ kHk2 ≤ kHk when kHk < . Therefore the map is differentiable, with the derivative F 0 (A)(H) = AH + HA and the differential dA F = A(dX) + (dX)A. Exercise 6.1.1. Use the definition to show the differentiability of the function f (x, y) = ax2 + 2bxy + cy 2 and find the derivative. Exercise 6.1.2. Prove that if a map is differentiable at ~x0 , then the map is continuous at ~x0 . Then show that the Euclidean norm k~xk2 is continuous but not differentiable at ~0. Exercise 6.1.3. Suppose F is differentiable at ~x0 , with F (~x0 ) = ~0. Suppose a function λ(~x) is continuous at ~x0 . Prove that λ(~x)F (~x) is differentiable at ~x0 . Exercise 6.1.4. Find the condition for a homogeneous function to be differentiable at ~0. What about a multihomogeneous function? Exercise 6.1.5. Suppose A is an n × n matrix. Find the derivative of the function A~x · ~x. Exercise 6.1.6. Let X T be the transpose of a matrix X. Prove that the derivative 2 2 of F (X) = X T X : Rn → Rn is F 0 (A)(H) = AT H + H T A. Exercise 6.1.7. For any natural number k, find the derivative of the map of taking the k-th power of matrices. Exercise 6.1.8. Use the equality (I + H)−1 = I − H + H 2 (I + H)−1 and Exercise 5.2.12 to find the derivative of the inverse matrix map at the identity matrix I.

6.1.2

Partial Derivative

By considering the L∞ -norm, we see that a map is differentiable if and only if each coordinate function is differentiable. Moreover, the linear approximation is obtained by putting together the linear approximations of the coordinate functions. A function f (~x) on Rn is approximated at ~x0 by a linear function a + b1 ∆x1 + b2 ∆x2 + · · · + bn ∆xn if for any > 0, there is δ > 0, such that |∆xi | = |xi − xi0 | < δ, for all i =⇒ |f (x1 , x2 , . . . , xn ) − a − b1 ∆x1 − b2 ∆x2 − · · · − bn ∆xn | ≤ max{|∆x1 |, |∆x2 |, . . . , |∆xn |}. If we fix x2 = x20 , . . . , xn = xn0 , and let only x1 change, then the above says that f (x1 , x20 , . . . , xn0 ) is approximated by the linear function a + b1 ∆x1 at x1 = x10 . Thus a = f (~x0 ) and f (x10 + ∆x1 , x20 , . . . , xn0 ) − f (x10 , x20 , . . . , xn0 ) ∆x1 →0 ∆x1

b1 = lim

292

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION

is the derivative of f (~x) in x1 with all the other coordinates fixed. The coefficient is called the partial derivative in the first variable. The other coefficients are the similar partial derivatives and denoted bi =

∂f = Dxi f = fxi . ∂xi

(6.1.3)

Using the notation, the derivative f 0 (~x) is the linear transform (v1 , v2 , . . . , vn ) 7→

∂f ∂f ∂f v1 + v2 + · · · + vn : Rn → R, ∂x1 ∂x2 ∂xn

and the differential of the function is df =

∂f ∂f ∂f dx1 + dx2 + · · · + dxn . ∂x1 ∂x2 ∂xn

(6.1.4)

Moreover, introduce the gradient ∂f ∂f ∂f . ∇f = , ,..., ∂x1 ∂x2 ∂xn

(6.1.5)

Then the linear approximation is f (~x0 ) + ∇f (~x0 ) · ∆~x, the derivative is f 0 (~x)(~v ) = ∇f (~x0 ) · ~v , and the differential is df = ∇f · d~x. Example 6.1.4. The function f (x, y) = 1 + 2x + xy 2 has partial derivatives at (0, 0) f (0, 0) = 1, fx (0, 0) = (2 + y 2 )|x=0,y=0 = 2, fy (0, 0) = 2xy|x=0,y=0 = 0. So the candidate for the linear approximation is 1 + 2(x − 0) + 0(y − 0) = 1 + 2x. Then we verify that (note that ∆x = x, ∆y = y) |f (x, y) − 1 − 2x| = |xy 2 | ≤ k(x, y)k3∞ . Therefore f is differentiable at (0, 0), with d(0,0) f = 2dx. p Example 6.1.5. The function f (x, y) = |xy| satisfies f (0, 0) = 0, fx (0, 0) = 0, fy (0, 0) = 0. So the candidate for the linear approximation at (0, 0) is the zero function. However, by Example 5.1.2, the limit s |f (x, y)| |xy| lim = lim x→0,y→0 x2 + y 2 (x,y)→(0,0) k(x, y)k2 does not exist. Therefore f is not differentiable at (0, 0), despite the existence of the partial derivatives. Exercise 6.1.9. Compute the partial derivatives. z

1. 4xy 3 + 5x3 y − 3y 2 .

3. exyz sin(x+2y+3z).

5. xy .

y 2. arctan . x

4. log(x2 + y 2 ).

6. (xy)z .

Exercise 6.1.10. Discuss the continuity, the existence of partial derivatives and the differentiability at (0, 0). p, q > 0.

6.1. DIFFERENTIATION

293

 p  |x| y if(x, y) 6= (0, 0) 1. f (x, y) = x2 + y 2 .  0 if(x, y) = (0, 0)  xy  if(x, y) 6= (0, 0) 2 + y 2 )p (x . 2. f (x, y) = 0 if(x, y) = (0, 0)  1 (x2 + y 2 )p sin 2 + y2 x 3. f (x, y) = 0  1 |x|p |y|q sin 2 + y2 x 4. f (x, y) = 0

if(x, y) 6= (0, 0)

.

if(x, y) = (0, 0)

if(x, y) 6= (0, 0)

.

if(x, y) = (0, 0)

Exercise 6.1.11. Prove the properties of the gradient. 1. ∇c = ~0.

3. ∇(f g) = f ∇g + g∇f .

2. ∇(f + g) = ∇f + ∇g.

4. ∇φ(f ) = φ0 (f )∇f .

Putting the linear approximations of coordinate functions together, we get the linear approximation of a differentiable map. In particular, the derivative linear transform F 0 (~x0 ) is given by the Jacobian matrix   ∂f1 ∂f1 ∂f1  ∂x1 ∂x2 · · · ∂xn    ∂f2   ∂f2 ∂f2  ··· ∂(f1 , f2 , . . . , fm )  ∂F ∂x1 ∂x2 ∂xn  . (6.1.6) = =  .. ..  ∂~x ∂(x1 , x2 , . . . , xn )  ..  . .   .  ∂fm ∂fm ∂fm  ··· ∂x1 ∂x2 ∂xn For a differentiable multivariable function f (~x) : Rn → R, the Jacobian matrix is the gradient ∇f considered as a 1 × n matrix. It is rather tedious to first use the partial derivatives to find the candidate linear approximation and then verify the condition for differentiability. Fortunately, there is a simple sufficient (but not necessary) condition for the differentiability. Proposition 6.1.2. Suppose a map is defined near ~x0 . If all the partial derivatives exist near ~x0 and the partial derivatives are continuous at ~x0 , then the map is differentiable at ~x0 . A differentiable map F : Rn → Rm is continuously differentiable if the map ~x 7→ F 0 (~x) : Rn → Rmn is continuous. This is equivalent to all the partial derivatives are continuous. Proof. Assume the partial derivatives fx (x, y) and fy (x, y) exist near (x0 , y0 ). Applying the mean value theorem to f (t, y) for t ∈ [x0 , x] and to f (x0 , s) for

294

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION

s ∈ [y0 , y], we get f (x, y) − f (x0 , y0 ) = (f (x, y) − f (x0 , y)) + (f (x0 , y) − f (x0 , y0 )) = fx (c, y)(x − x0 ) + fy (x0 , d)(y − y0 ) = fx (c, y)∆x + fy (x0 , d)∆y for some c ∈ [x0 , x] and d ∈ [y0 , y]. If fx and fy are continuous at (x0 , y0 ), then for any > 0, there is δ > 0, such that |∆x| < δ, |∆y| < δ =⇒ |fx (x, y) − fx (x0 , y0 )| < , |fy (x, y) − fy (x0 , y0 )| < . By |c − x0 | < |∆x| and |d − y0 | < |∆y|, we find |∆x| < δ and |∆y| < δ implies |fx (c, y) − fx (x0 , y0 )| < , |fy (x0 , d) − fy (x0 , y0 )| < , so that |f (x, y) − f (x0 , y0 ) − fx (x0 , y0 )∆x − fy (x0 , y0 )∆y| ≤ |∆x| + |∆y|. This shows that f (x, y) is differentiable at (x0 , y0 ). The proof for the general case is similar. Example 6.1.6. The function f (x, y) = 1+2x+xy 2 in Example 6.1.4 has continuous partial derivatives fx = 2 + y 2 , fy = 2xy. Therefore the function is differentiable with df = (2 + y 2 )dx + 2xydy. Example 6.1.7. Express the map in Example 6.1.1 as u(x, y) = x2 + y 2 , v(x, y) = xy. The partial derivatives ux = 2x, uy = 2y, vx = y, v x are continuous. y = 2xdx + 2ydy du = = Therefore the map is differentiable, with the differential ydx + xdy dv dx 2x 2y . dy y x Example 6.1.8. On the plane, the polar coordinate (r, θ) and the cartesian coordinate (x, y) are related by x = r cos θ, y = r sin θ. The relation is differentiable because the partial derivatives are continuous. The Jacobian matrix ∂(x, y) cos θdr − r sin θdθ dx cos θ −r sin θ = = and the differential = sin θdr + r cos θdθ dy sin θ r cos θ ∂(r, θ) dr cos θ −r sin θ . dθ sin θ r cos θ Example 6.1.9. By Proposition 2.1.3, the differentiability of a parametrized curve φ(t) = (x1 (t), x2 (t), . . . , xn (t)) : (a, b) → Rn is equivalent to the existence of the derivatives x0i . The Jacobian matrix is the vertical version of the tangent vector φ0 = (x01 , x02 , . . . , x0n ). As a linear map, the derivative is u 7→ uφ0 . The definition of parametrized curve allows the constant curve and allows the tangent vector to become zero. It also allows continuously differentiable curves to appear to have sharp corners, such as this example ( (t2 , 0) if t ≤ 0 φ(t) = . (0, t2 ) if t > 0 To avoid such undesirable situations, we say a differentiable parametrized curve is regular if the tangent vector φ0 is never zero. We will see that by the inverse function theorem (see Theorem 6.2.1), this is equivalent to the possibility of local reparametrization by one coordinate (i.e., t = xi for some i).

6.1. DIFFERENTIATION

295

Example 6.1.10. The parametrized sphere (5.1.19) and the parametrized torus (5.1.20) are differentiable surfaces in R3 . In general, if a parametrized surface σ(u, v) : R2 → Rn is differentiable, then the tangent vectors σu and σv span the tangent plane T(u0 ,v0 ) S. As a linear map, the derivative σ 0 is (s, t) 7→ sσu + tσv . The undesirable situation that may happen for parametrized curves may also happen for parametrized surfaces. We say a differentiable parametrized surface is regular if the tangent vectors σu and σv are always linearly independent. We will see that by the inverse function theorem (see Theorem 6.2.1), this is equivalent to the possibility of local reparametrization by two coordinates (i.e., u = xi and v = xj for some i and j). Exercise 6.1.12. Compute the Jacobian matrices and the differentials. p y 1. r = x2 + y 2 , θ = arctan . x 2. u1 = x1 + x2 + x3 , u2 = x1 x2 + x2 x3 + x3 x1 , u3 = x1 x2 x3 . 3. x = r sin φ cos θ, y = r sin φ sin θ, z = r cos φ. Exercise 6.1.13. Suppose F (~x) is a bounded map near ~0. Prove that k~xk22 F (~x) is differentiable at ~0. Then use this to construct a function that is differentiable at ~0 but has no partial derivatives away from ~0. Exercise 6.1.14. Construct a function that is differentiable everywhere but the partial derivatives are not continuous at ~0. Exercise 6.1.15. Prove that if fx (x0 , y0 ) exists and fy (x, y) is continuous at (x0 , y0 ), then f (x, y) is differentiable at (x0 , y0 ). Extend the result to three or more variables. Exercise 6.1.16. Prove that a function f (~x) is differentiable at ~x0 if and only if f (~x) = f (~x0 ) + J(~x) · ∆~x, where J : Rn → Rn is continuous at ~x0 . Extend the fact to differentiable maps.

6.1.3

Rules of Differentiation

The rules of computation of the derivatives can be extended from single to multivariable since the underlying principle still holds. If maps have linear approximations, then their addition, scalar multiplication and composition are still approximated by the similar combinations of the linear approximations. Suppose F, G : Rn → Rm are differentiable at ~x0 . Then the sum F + G : Rn → Rm is also differentiable at ~x0 , with the derivative (F + G)0 = F 0 + G0 . Therefore the Jacobian matrix for F + G is the sum of the Jacobian matrices of F and G. In terms of the individual entries of the Jacobian matrix, the ∂f ∂g ∂(f + g) = + . equality means ∂xi ∂xi ∂xi n Suppose a function λ : R → R and a map F : Rn → Rm are differentiable at ~x0 . Then the scalar multiplication λF : Rn → Rm is also differentiable at ~x0 , with the derivative given by the Leibniz formula (λF )0 = λ0 F + λF 0 .

296

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION

Note that both sides are supposed to be linear transforms from Rn to Rm . At ~x0 , we have λ(~x0 ) ∈ R, λ0 (~x0 ) : Rn → R, F (~x0 ) ∈ Rm and F 0 (~x0 ) : Rn → Rm . Then we see that the right side takes a vector in Rn to a vector in Rm . Of course, in terms of the individual entries of the Jacobian matrix, the equality ∂λ ∂f ∂(λf ) = f +λ . means ∂xi ∂xi ∂xi The most general form of the Leibniz formula can be found in Exercise 6.1.17. For the composition, we get multivariable version of the chain rule. Suppose F : Rn → Rm is differentiable at ~x0 and G : Rm → Rk is differentiable at ~y0 = F (~x0 ). Then the composition G ◦ F : Rn → Rk is also differentiable at ~x0 , with the derivative given by the chain rule (G ◦ F )0 = G0 ◦ F 0 . The right side is a composition of the linear transforms F 0 (~x0 ) : Rn → Rm and G0 (~y0 ) = G0 (F (~x0 )) : Rm → Rk . Therefore the Jacobian matrix of the composition is the product of the Jacobian matrices of individual maps. In terms of the individual entries of the Jacobian matrix, the chain rule means ∂g ∂f1 ∂g ∂f2 ∂g ∂fm ∂(g ◦ F ) = + + ··· + . ∂xi ∂f1 ∂xi ∂f2 ∂xi ∂fm ∂xi Example 6.1.11. Suppose u = ex sin(y + 2z), x = t2 , y = t3 − 2t, z = t2 + t. The relations can be considered as a composition t 7→ (x, y, z) 7→ u. To compute the derivative of u(t) = u(x(t), y(t), z(t)), we differentiate the relations du = ex sin(y + 2z)dx + ex cos(y + 2z)dy + 2ex cos(y + 2z)dz, dx = 2tdt, dy = (3t2 − 2)dt, dz = (2t + 1)dt. Substituting dx, dy, dz into du, we get du = (ex sin(y + 2z))2tdt + (ex cos(y + 2z))(3t2 − 2)dt + (2ex cos(y + 2z))(2t + 1)dt 2

= et (2t sin(t3 + 2t2 ) + (3t2 − 2 + 4t + 2) cos(t3 + 2t2 ))dt. Thus

du 2 = et (2t sin(t3 + 2t2 ) + (3t2 + 4t) cos(t3 + 2t2 )). dt We note that the coefficients in the relation between the differentials form the Jacobian matrix. The substitution of the differential relation besically computes the product of the Jacobian matrix. Example 6.1.12. Suppose u = ex sin(y + 2z), x = s2 t, y = s + 2t, z = s2 − t. Then du = ex sin(y + 2z)dx + ex cos(y + 2z)dy + 2ex cos(y + 2z)dz, dx = 2stds + s2 dt, dy = ds + 2dt, dz = 2sds − dt.

6.1. DIFFERENTIATION

297

Substituting dx, dy, dz into du, we get du = ex sin(y + 2z)(2stds + s2 dt) + ex cos(y + 2z)(ds + 2dt) + 2ex cos(y + 2z)(2sds − dt) 2

2

= es t (2st sin(s + 2s2 ) + (1 + 4s) cos(s + 2s2 ))ds + es t s2 sin(s + 2s2 )dt. In other words, 2

us = es t (2st sin(s + 2s2 ) + (1 + 4s) cos(s + 2s2 )), 2

ut = es t s2 sin(s + 2s2 ). Exercise 6.1.17. Suppose F : Rp → Rm and G : Rp → Rn are differentiable at ~z0 ∈ Rp , and B : Rm × Rn → Rk is a bilinear map. Prove that B(F, G) : Rp → Rk is also differentiable at ~z0 , with the derivative given by the Leibniz formula B(F, G)0 = B(F 0 , G) + B(F, G0 ). Extend the Leibniz formula to multilinear maps. Exercise 6.1.18. Prove the chain rule for multivariable maps. Does the chain rule dg(x(t), y(t)) formula = gx (x(t), y(t))x0 (t) + gy (x(t), y(t))y 0 (t) still hold if g is not dt assumed to be differentiable? Exercise 6.1.19. Suppose F, G : Rn → Rn are maps that are invertible to each other. Suppose F is differentiable at ~x0 and G is differentiable at ~y0 = F (~x0 ). Prove that the linear transforms F 0 (~x0 ) and G0 (~y0 ) are inverse to each other. In particular, the derivative of a differentiable invertible map is an invertible linear transform. Exercise 6.1.20. Compute partial derivatives. 1. z = uv + sin t, u = et , v = cos t, find zt . 2. z = x2 log y, x =

u , y = u − v, find zu , zv . v

3. u = f (x + y, x − y), find ux , uy . 4. u = f (r cos θ, r sin θ), find ur , uθ . x y z , , , find ux , uy , uz . 5. u = f y z x Exercise 6.1.21. Compute the derivative of xx by considering f = uv , u = x, v = x. Exercise 6.1.22. Prove that a differentiable function f (x, y) depends only on the angle in the polar coordinate if and only if xfx + yfy = 0. Can you make a similar claim for a differentiable function that depends only on the length. Exercise 6.1.23. Suppose f (x, y) is a differentiable function satisfying f (x, x2 ) = 1, fx (x, x2 ) = x. Find fy (x, x2 ). Exercise 6.1.24. For a differentiable function f , show that u = yf (x2 − y 2 ) satisfies y 2 ux + xyuy = u. Exercise 6.1.25. Suppose differentiable functions f and g satisfy f (u, v, w) = √ √ √ g(x, y, z) for u = yz, v = zx, w = xy. Prove that ufu + vfv + wfw = xgx + ygy + zgz . Exercise 6.3.47 is a vast generalization.

298

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION

Exercise 6.1.26. Find the partial differential equation characterizing differentiable functions f (x, y y) of the form f (x, y) = φ(xy). What about functions of the form f (x, y) = φ ? x Exercise 6.1.27. Discuss the differentiability of a function f (k~xk2 ) of the Euclidean norm. Exercise 6.1.28. For any invertible matrix A, the inverse map is the composition of the following three maps: X 7→ A−1 X, X 7→ X −1 , X 7→ XA−1 . Use this and Exercise 6.1.8 to find the derivative of the inverse matrix map at a matrix A.

6.1.4

Directional Derivative

The restriction of a function f on a parametrized curve φ(t) : (a, b) → Rn is the composition f (φ(t)). If both f and φ are differentiable, then the change of the function f along the curve φ is measured by f (φ(t))0 =

∂f dx1 ∂f dx2 ∂f dxn + + ··· + = ∇f · φ0 . ∂x1 dt ∂x2 dt ∂xn dt

(6.1.7)

The formula shows that the change depends on the tangent vector instead of the curve itself. Define the derivative of f along any vector ~v to be f (~x + t~v ) − f (~x) = f 0 (~x)(~v ) = ∇f · ~v . t→0 t

D~v f = lim

(6.1.8)

Then we get f (φ(t))0 = Dφ0 (t) f . Note that the derivative along ~v is the value of the derivative linear functional f 0 (~x) at ~v . Thus D~v f is a linear function of ~v and the partial derivatives are simply the values of the linear function at the standard basis vectors. If ~v has unit Euclidean length (i.e., k~v k2 = 1), then only the direction of the vector matters and D~v f is called the directional derivative. Let θ be the angle between ∇f and ~v . Then D~v f = k∇f k2 cos θ. The formula shows that the function “does not change” in directions orthogonal to its gradient and increases the most (at the rate of k∇f k2 ) in the direction of its gradient (and decreases the most in the opposite direction of its gradient). Geometrically, the restriction of the function f on a parametrized curve φ does not change if and only if the curve lies in a level Sc = {~x : f (~x) = c} of the function, which is typically an (n − 1)-dimensional hypersurface in Rn . The tangent space of the level hypersurface consists of the tangent vectors of all the curves inside the level T Sc ={φ0 (t) : φ(t) ∈ Sc } = {φ0 (t) : f (φ(t)) = c} ={φ0 (t) : Dφ0 (t) f = 0} = {~v : ∇f · ~v = 0}.

6.1. DIFFERENTIATION

299

Therefore the tangent space of the level is the vector subspace orthogonal to the gradient. On the other hand, the function changes only if one jumps from one level Sc to a different level Sc0 . The change is fastest (i.e., the directional derivative is biggest) when one moves in the direction orthogonal to the levels, which is the direction of ∇f . Moreover, the magnitude k∇f k2 of the gradient measures how fast this fastest jump is. Therefore k∇f k2 measures how much the levels are “squeezed” together. ....∇f ∇f .. .... . . .................................................................................................................................................. . . . . . . . . . ...... ....... .......... .. .................................................................... ....... ......... ......... .......... ... ............. . . . ... .. .. ... ... ... ... .... ... ... .....S 0 .....S....c........................... ..... .... .... ... . ......c... . .... ... ............................................... ........ .... ....... .... . . . . .. . . .......... ...... ......................................................... .......... ........ ... ........... ......................... ................................... ......... Figure 6.1: gradient and level Example 6.1.13. The gradient of f = x2 + y 2 + z 2 is ∇f = (2x, 2y, 2z). The derivative at (1, 1, 1) in the direction (2, 1, 2) is 1 1 10 D 1 (2,1,2) f (1, 1, 1) = ∇f (1, 1, 1) · (2, 1, 2) = (2, 2, 2) · (2, 1, 2) = . 3 3 3 3 Note the vector (2, 1, 2) is divided by its length, so that only the direction counts. Example 6.1.14. Suppose the derivative of f in the directions (1, 1) and (1, −1) are respectively 2 and −3. Then √ √ D(1,1) f = 2D √1 (1,1) f = 2 2, 2 √ √ D(1,−1) f = 2D √1 (1,−1) f = −3 2, 2

and 1 fx = D(1,0) f = (D(1,1) f + D(1,−1) f ) = 2 1 fy = D(0,1) f = (D(1,1) f − D(1,−1) f ) = 2

−1 √ , 2 5 √ . 2

Example 6.1.15. We try to characterize continuously differentiable functions f (x, y) satisfying xfx = yfy . Note that the condition is the same as ∇f ⊥ (x, −y), which is equivalent to ∇f being parallel to (y, x) = ∇(xy). This means that the levels of f and the levels of xy are tangential everywhere. Therefore we expect the levels of f (x, y) and xy to be the same, and we should have f (x, y) = h(xy) for some continuously differentiable h(t). z For the rigorous argument, let f (x, y) = g(x, xy), or g(x, z) = f x, (we x define such g in case x 6= 0, and we define f (x, y) = g(xy, y) in case y 6= 0). Then z 1 z gx = fx − 2 fy = xfx − fy = 0. x x x

300

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION

This shows that g is independent of the first variable, and we have f (x, y) = h(xy). A vast extension of the discussion is the theory of functional dependence. See Exercises 6.2.20 through 6.2.25. Exercise 6.1.29. Compute directional derivatives. 1. 4xy 3 + 5x3 y − 3y 2 at (0, 1), in direction (1, 2). 2. arctan

y at (1, 1), in direction (1, −1). x

3. xy + yz + zx at (1, 2, 3), in direction (2, 2, 1). 4. a + b1 x1 + b2 x2 + · · · + bn xn at (1, 1, . . . , 1), in direction (v1 , v2 , . . . , vn ). 5. x21 + x22 + · · · + x2n at (1, −1, . . . , (−1)n ), in direction (1, 1, . . . , 1). √ Exercise 6.1.30. Suppose Df = 1 in direction (1, 2, 2), Df = 2 in direction (0, 1, −1), Df = 3 in direction (0, 0, 1). Find the gradient of f . Exercise 6.1.31. The curves φ(t) = (t, t2 ) and ψ(t) = (t2 , t) intersect at (1, 1). Suppose the derivatives of f (x, y) along the two curves are respectively 2 and 3 at (1, 1), what is the gradient of f at (1, 1). Exercise 6.1.32. Suppose ~u1 , ~u2 , . . . , ~un is an orthonormal basis. Prove that ∇f = (D~u1 f )~u1 + (D~u2 f )~u2 + · · · + (D~un f )~un . Exercise 6.1.33. Express the gradient of f (x, y) in terms of the partial derivatives ~xr ~xθ fr , fθ and the directions ~er = = (cos θ, sin θ), ~eθ = = (− sin θ, cos θ) k~xr k2 k~xθ k2 in the polar coordinates. Exercise 6.2.16 is a vast generalization. Exercise 6.1.34. Find continuously differentiable functions satisfying the equations. 1. afx = bfy .

4. (x + y)fx + (x − y)fy = 0.

2. yfx = xfy .

5. fx = fy = fz .

3. xfx + yfy = 0.

6. xfx = yfy = zfz .

Exercise 6.1.35. Suppose ~a is a nonzero vector in Rn . Suppose ~b1 , ~b2 , . . . , ~bn−1 are linearly independent vectors orthogonal to ~a. Prove that a function f on whole Rn (or on any open convex subset) satisfies D~a f = 0 if and only if f (~x) = g(~b1 · ~x, ~b2 · ~x, . . . , ~bn−1 · ~x) for some function g on Rn−1 .

The idea underlying the discussion of the directional derivative can be used to derive the following partial extension of the mean value theorem. Proposition 6.1.3. Suppose f is a function differentiable along the straight line connecting ~a and ~b. Then there is ~c on the straight line, such that f (~b) − f (~a) = ∇f (~c) · (~b − ~a). Proof. The straight line connecting ~a to ~b is φ(t) = (1 − t)~a + t~b for t ∈ [0, 1]. The single variable function f (φ(t)) is differentiable with (f (φ(t)))0 = ∇f (φ(t)) · φ0 (t) = ∇f (φ(t)) · (~b − ~a).

6.1. DIFFERENTIATION

301

By applying the main value theorem to f (φ(t)), we find 0 < c < 1, such that f (~b) − f (~a) = f (φ(1)) − f (φ(0)) = (f (φ(t)))0t=c (1 − 0) = ∇f (φ(c)) · (~b − ~a). Let ~c = φ(c). The proposition is proved. Although the proposition may be extended to individual coordinates of a multivariable map, the choice of ~c may be different for different coordinates. The following is a more unified extension. Proposition 6.1.4. Suppose F is a map differentiable along the straight line connecting ~a and ~b. Then there is ~c on the straight line, such that kF (~b) − F (~a)k2 ≤ kF 0 (~c)kk~b − ~ak2 . where the norm kF 0 (~c)k is with respect to the Euclidean norms. Proof. Fix any vector ~v and consider the function f (~x) = F (~x) · ~v . Then ∇f (~x) · ~u = F 0 (~x)(~u) · ~v . By Proposition 6.1.3, we have |(F (~b) − F (~a)) · ~v | = |∇f (~c) · (~b − ~a)| = |F 0 (~c)(~b − ~a) · ~v | ≤ kF 0 (~c)(~b − ~a)k2 k~v k2 ≤ kF 0 (~c)kk~b − ~ak2 k~v k2 , By taking ~v = F (~b) − F (~a) at the beginning, we get kF (~b) − F (~a)k2 ≤ kF 0 (~c)kk~b − ~ak2 . As a consequence of the mean value theorem, Theorem 2.2.3 can be extended from single to multivariable. Proposition 6.1.5. Suppose a differentiable map on a path connected open subset has zero derivative everywhere. Then the map is a constant map. Proof. For any ~x ∈ U , there is > 0, such that k~y − ~xk < implies ~y ∈ U . Then the straight line connecting ~x and ~y lies in U . By Proposition 6.1.4 and ∇F 0 (~c) = 0, we have F (~y ) = F (~x). Thus F is constant near ~x. For any two points ~x, ~y ∈ U , there is a path φ(t), t ∈ [a, b], connecting ~x to ~y . For any t0 ∈ [a, b], there is > 0, such that k~y − φ(t0 )k < implies ~y ∈ U . Since φ is continuous, there is δ > 0, such that |t − t0 | < δ implies kφ(t) − φ(t0 )k < . Then F (φ(t)) is constant on (t0 − µ, t0 + µ). In particular, we have (F (φ(t)))0 = 0 for any t ∈ [a, b]. For the map F ◦ φ : [a, b] → Rn , the vanishing of the derivative on (a, b) implies that F (φ(a)) = F (φ(b)), which is F (~x) = F (~y ). This completes the proof that F is constant on U . Exercise 6.1.36. What can you say about Proposition 6.1.4 if the norms are different from the Euclidean norm? Exercise 6.1.37. What can you say about Proposition 6.1.5 if only some partial derivatives are constantly zero? The answer makes Example 6.1.15 more rigorous.

302

6.2

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION

Inverse and Implicit Function

The inverse and implicit function theorems basically say that if the similar inverse and implicit problems can be solved for the linear approximation, then the problems can be solved for the multivariable map itself. The result has geometrical interpretation related to the definition of hypersurface.

6.2.1

Inverse Differentiation

Suppose a single variable function f (x) has continuous derivative near x0 . If f 0 (x0 ) 6= 0, then f 0 (x) is either positive for all x near x0 or negative for all x near x0 . Thus f (x) is monotone and is therefore invertible near x0 . The multivariable extension is the following. Theorem 6.2.1 (Inverse Function Theorem). Suppose F : Rn → Rn is continuously differentiable near ~x0 . Suppose the derivative F 0 (~x0 ) is an invertible linear map. Then there is an open subset U containing ~x0 , such that F (U ) is also open, F : U → F (U ) is invertible, F −1 : F (U ) → U is differentiable, and (F −1 )0 (~y ) = (F 0 (~x))−1 when ~y = F (~x). The theorem basically says that if the linear approximation of a map is invertible, then the map is locally also invertible. Moreover, the differential d~x = (F −1 )0 d~y is the solution of the equation d~y = F 0 d~x. Exercise 2.1.24 shows that the continuity assumption cannot be dropped from the theorem. Proof. For any ~y near ~y0 = F (~x0 ), we wish to find ~x near ~x0 satisfying F (~x) = ~y . To solve the problem, we approximate the map F (~x) by the linear map L0 (~x) = F (~x0 )+F 0 (~x0 )(~x −~x0 ) and solve the similar equation L0 (~x) = ~y . The solution ~x1 of the approximate linear equation satisfies ~y = F (~x0 ) + F 0 (~x0 )(~x1 − ~x0 ). Although not exactly equal to the solution ~x that we are looking for, ~x1 is often closer to ~x than ~x0 (see Figure 6.2). So we repeat the process by using L1 (~x) = F (~x1 )+F 0 (~x0 )(~x −~x1 ) to approximate F near ~x1 and solve the similar approximate linear equation L1 (~x) = ~y to get solution ~x2 . Continuing the process with the linear approximation L2 (~x) = F (~x2 ) + F 0 (~x0 )(~x − ~x2 ) of F near ~x2 and so on, we get an inductively defined sequence {~xk } by ~y = Lk (~xk+1 ) = F (~xk ) + F 0 (~x0 )(~xk+1 − ~xk ).

(6.2.1)

To prove the expectation that the sequence {~xk } converges to the solution ~x of the equation F (~x) = ~y , we substitute F (~xk ) in (6.2.1) by using the approximation F (~x) = F (~x0 ) + F 0 (~x0 )(~x − ~x0 ) + R(~x)

(6.2.2)

of F near ~x0 . We get ~y = F (~x0 ) + F 0 (~x0 )(~xk − ~x0 ) + R(~xk ) + F 0 (~x0 )(~xk+1 − ~xk ) = F (~x0 ) + F 0 (~x0 )(~xk+1 − ~x0 ) + R(~xk ).

(6.2.3)

6.2. INVERSE AND IMPLICIT FUNCTION

303

.... .. .. .L0 .L1.L2 ...... ...................... . .. . . . . ... ... ... . ....................F ~y ................................................................................................................................................................................................................. .. .... . .... ..... . . . ...... ................................. ... ... ... .. . . . . . .. .... ........ . .... .. .. .. .............................................. .. .. .. .. . . . . . .......... ....... ....... .. .. .. .. .. . . . . . ........ ....... ....... .. . .. .. .. .. . . . . . . .. ................ ........ ........ .. .. .. .. .. .. .. .. .. .......... .................. ....................................................................................................................................................................................... (~x0 , ~y0 )... ~x1 ~x2 ~x3 ~x . Figure 6.2: find ~x satisfying F (~x) = ~y Write the formula (6.2.3) with k − 1 in place of k and taking the difference with (6.2.3), we get F 0 (~x0 )(~xk+1 − ~xk ) + R(~xk ) − R(~xk−1 ) = ~0.

(6.2.4)

This gives us a relation between ~xk+1 − ~xk and ~xk − ~xk−1 . By the continuity of the differentiation at ~x0 , for any > 0, there is δ > 0, such that F is defined on the ball B(~x0 , δ) and k~x − ~x0 k < δ =⇒ kR0 (~x)k = kF 0 (~x) − F 0 (~x0 )k < .

(6.2.5)

If we know ~xk , ~xk−1 ∈ B(~x0 , δ), then by (6.2.4), (6.2.5) and Proposition 6.1.4, k~xk+1 − ~xk k = kF 0 (~x0 )−1 (R(~xk ) − R(~xk−1 ))k ≤ kF 0 (~x0 )−1 kkR(~xk ) − R(~xk−1 )k ≤ kF 0 (~x0 )−1 kk~xk − ~xk−1 k.

(6.2.6)

Note that the estimation in Proposition 6.1.4 holds for the Euclidean norm only. Therefore this proof will be restricted to the Euclidean norm for vectors. α at the beginning, we Now fix some 0 < α < 1. By choosing < 0 kF (~x0 )−1 k get k~xk+1 − ~xk k ≤ αk~xk − ~xk−1 k. This further implies k~xk+1 − ~xk k ≤ αk k~x1 − ~x0 k. This further tells us that, if we know ~x1 , ~x2 , . . . , ~xk ∈ B(~x0 , δ), then for any 0 ≤ j < i ≤ k + 1, we have k~xi − ~xj k ≤ (αj + αj+1 + · · · + αi−1 )k~x1 − ~x0 k <

αj k~x1 − ~x0 k. (6.2.7) 1−α

This will imply that {~xk } is a Cauchy sequence and therefore converges. Taking the limit of (6.2.1), we further see that the limit ~x = limk→∞ ~xk satisfies ~y = F (~x). How do we know ~x1 , ~x2 , . . . , ~xk ∈ B(~x0 , δ), so that the estimations (6.2.6) and (6.2.7) hold? The estimation (6.2.7) tells us that if ~x1 , ~x2 , . . . , ~xk ∈ B(~x0 , δ), then k~xk+1 − ~x0 k <

1 1 k~x1 − ~x0 k ≤ kF 0 (~x0 )−1 kk~y − F (~x0 )k. 1−α 1−α

304

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION

The right side will be < δ if we assume ~y satisfies k~y − ~y0 k = k~y − F (~x0 )k <

(1 − α)δ . kF 0 (~x0 )−1 k

(6.2.8)

Under the assumption, we have ~x1 , ~x2 , . . . , ~xk , ~xk+1 ∈ B(~x0 , δ) and the inductive estimation can continue. In summary, we should make the following rigorous setup at the very α is beginning of the proof: Assume 0 < α < 1 is fixed, 0 < < 0 kF (~x0 )−1 k given, and δ > 0 is found so that (6.2.5) holds. Then for any ~y satisfying (6.2.8), we may inductively construct ~xk by the formula (6.2.1). Moreover, an inductive argument shows that the sequence lies in B(~x0 , δ). Therefore the limit ~x = lim ~xk , which solves the equation F (~x) = ~y , still satisfies k~x − ~x0 k = lim k~xk+1 − ~x0 k ≤

1 kF 0 (~x0 )−1 kk~y − F (~x0 )k < δ. 1−α

Geometrically, this means

(1 − α)δ F (B(~x0 , δ)) ⊃ B F (~x0 ), kF 0 (~x0 )−1 k

.

(6.2.9)

Note that so far we have only used the fact that F 0 is continuous at ~x0 and F 0 (~x0 ) is invertible. Since F 0 is continuous and F 0 (~x0 ) is invertible, by choosing δ even smaller, we may further assume that F 0 is actually invertible on the ball U = B(~x0 , δ). We show that the image F (U ) is open. A point in F (U ) is of the form F (~z) for some ~z ∈ U . For F (~x) near ~z, we find αz , z and sufficiently small δz as (1 − αz )δz , we have by (6.2.9) that above, with B(~z, δz ) ⊂ U . Then for δz0 = kF 0 (~z)−1 k B(F (~z), δz0 ) ⊂ F (B(~z, δz )) ⊂ F (U ). This proves that F (U ) is open. Next we prove that F is one-to-one on the ball U , so that U can be used as the open subset in the statement of the theorem. By the approximation (6.2.2) and the estimation (6.2.5), for any ~x, ~x0 ∈ U = B(~x0 , δ), kF (~x) − F (~x0 )k = kF 0 (~x0 )(~x − ~x0 ) + R(~x) − R(~x0 )k ≥ kF 0 (~x0 )(~x − ~x0 )k − k~x − ~x0 k 1 − k~x − ~x0 k ≥ 0 −1 kF (~x0 ) k 1−α ≥ k~x − ~x0 k. kF 0 (~x0 )−1 k

(6.2.10)

Since α < 1, ~x 6= ~x0 implies F (~x) 6= F (~x0 ). It remains to prove the differentiability of the inverse map. By (6.2.2), for ~y ∈ F (U ) and ~x = F −1 (~y ), we have ~x = ~x0 + F 0 (~x0 )−1 (~y − ~y0 ) − F 0 (~x0 )−1 R(~x).

6.2. INVERSE AND IMPLICIT FUNCTION

305

Then by R(~x0 ) = ~0, kR0 (~x)k < along the line connecting ~x0 and ~x, and Proposition 6.1.4, we get kF 0 (~x0 )−1 R(~x)k ≤ kF 0 (~x0 )−1 kkR(~x) − R(~x0 )k ≤ kF 0 (~x0 )−1 kk~x − ~x0 k kF 0 (~x0 )−1 k2 ≤ k~y − ~y0 k, 1−α where the last inequality makes use of (6.2.10). The estimation shows that ~x0 + F 0 (~x0 )−1 (~y − ~y0 ) linearly approximates F −1 (~y ). Therefore the inverse map is differentiable at ~x0 , with (F −1 )0 (~x0 ) = F 0 (~x0 )−1 . Note that most of the proof only makes use of the assumption that the derivative F 0 is continuous at ~x0 . The assumption implies that the image of a ball around ~x0 contains a ball around F (~x0 ). Based on this fact, the continuity of the derivative everywhere is (only) further used to conclude that the image of open subsets must be open. ..... .. L L .. ...... 0 .............. .1..................L2 . . . . .. . ...... .. .................... F ..... ~y ........................................................................................................................................................................................................... .. .. . .... ..... . . ...... ....................... ... ... .. . . . . . .. .. .. .... ..... . ...................... ... . .. .. .. . . . . ........... . .. .. .. . . . . . . ......... . .. .. . .. . . . . . . .. ................... .. .. .. .. .. .. .. ........ . . . . . . . . . . ...................................................................................................................................................................... . (~x0 , ~y0 )... ~x1 ~x2 ~x Figure 6.3: Newton’s method The proof actually includes a method for computing the inverse function. In fact, it appears to be more suitable to use Ln (~x) = F (~xk ) + F 0 (~xk )(~x − ~xk ) to approximate F at ~xk (with F 0 at ~xk instead of at ~x0 ). The method is quite effective in case the dimension is small (so that F 0 (~xk )−1 is easier to compute). In particular, for a single variable function f (x), the solution to f (x) = 0 can be found by starting from x0 and successively constructing xn+1 = xn −

f (xn ) . f 0 (xn )

This is Newton’s method. Example 6.2.1. In Example 6.1.8, we computed the differentials of the cartesian coordinate in terms of the polar coordinate. The Jacobian matrix is invertible away from the origin, so that the polar coordinate can also be expressed locally

306

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION

in terms of the cartesian coordinate. In fact, the map (r, θ) → (x, y) is invertible for (r, θ) ∈ (0, ∞) × (a, a + 2π). By solving the system dx = cos θdr − r sin θdθ, dy = sin θdr + r cos θdθ, we get dr = cos θdx + sin θdy, dθ = −r−1 sin θdx + r−1 cos θdy. By the inverse function theorem, the coefficients form the Jacobian matrix ∂(r, θ) cos θ sin θ = . −r−1 sin θ r−1 cos θ ∂(x, y) Exercise 6.2.1. Use the differential in Exercise 6.1.21 to find the differential of the change from the cartesian coordinate (x, y, z) to the spherical coordinate (r, φ, θ). Exercise 6.2.2. Suppose x = eu + u cos v, y = eu + u sin v. Find the places where u and v can be locally expressed as differentiable functions of x and y and then ∂(u, v) compute . ∂(x, y) Exercise 6.2.3. Find the places where z can be locally expressed as a function of x and y and then compute zx and zy . 1. x = s + t, y = s2 + t2 , z = s3 + t3 . 2. x = es+t , y = es−t , z = st. Exercise 6.2.4. Change the partial differential equation (x + y)ux − (x − y)uy = 0 to an equation with respect to the polar coordinate (r, θ).

6.2.2

Implicit Differentiation

Suppose a continuous function f (x, y) satisfies f (x0 , y0 ) = 0 and has continuous partial derivative fy near (x0 , y0 ). Assume fy (x0 , y0 ) > 0. Then fy (x, y) > 0 for (x, y) near (x0 , y0 ). Thus for any fixed x, f (x, y) is strictly increasing in y. In particular, for some small > 0, we have f (x0 , y0 + ) > f (x0 , y0 ) = 0 and f (x0 , y0 − ) < f (x0 , y0 ) = 0. By the continuity in x, there is δ > 0, such that f (x, y0 + ) > 0 and f (x, y0 − ) < 0 for any x ∈ (x0 − δ, x0 + δ). Now for any fixed x ∈ (x0 − δ, x0 + δ), f (x, y) is strictly increasing in y and has different signs at y0 + and y0 − . Therefore there is a unique y ∈ (y0 − , y0 + ) satisfying f (x, y) = 0. We say y is an implicit function of x because the function is only implicitly given by the equation f (x, y) = 0. The argument shows that if f (x0 , y0 ) = 0 and fy (x0 , y0 ) 6= 0, plus some continuity condition, then the equation f (x, y) = 0 can be solved to define a function y = y(x) near (x0 , y0 ). If f is differentiable at (x0 , y0 ), then the equation f (x, y) = 0 is approximated by the linear equation fx (x0 , y0 )(x − x0 ) + fy (x0 , y0 )(y − y0 ) = 0. The assumption fy (x0 , y0 ) 6= 0 makes it possible to solve the linear equation fx (x0 , y0 ) (x − x0 ). So the conclusion is that if the linear and get y = y0 − fy (x0 , y0 )

6.2. INVERSE AND IMPLICIT FUNCTION

307

.... .y + ...................+ ..............................................0................................................... y = y(x) .. .............. .. .. ... ....... .. .. ... . . . . . . ...... .... . . .. .. . . . ... . . . . . . . .. .. .............. .. ... . . . +δ . . . . . . . . . . x..0.....− .......δ..............................................................................................................................x ......0................ .. .. . .. . ....... ... . .. . .. . (x0 , y0 ) ........ .. .. ............. ... .............. ... .. .. ... ..................... .. .. .. ... ................................................................................................................... .. y0 − − .. Figure 6.4: implicit function approximation implicitly defines a function, then the original equation also implicitly defines a function. In general, consider a differentiable map F : Rn × Rm = Rm+n → Rm . We have F 0 (~x0 , ~y0 )(~u, ~v ) =F 0 (~x0 , ~y0 )(~u, ~0) + F 0 (~x0 , ~y0 )(~0, ~v ) =F~x (~x0 , ~y0 )(~u) + F~y (~x0 , ~y0 )(~v ), where F~x (~x0 , ~y0 ) : Rn → Rm and F~y (~x0 , ~y0 ) : Rm → Rm are linear transforms and generalize the partial derivatives. In terms of the Jacobian matrix, F 0 can be written as the block matrix (F~x (~x0 , ~y0 ) F~y (~x0 , ~y0 )). The equation F (~x, ~y ) = ~0 is approximated by the linear equation F~x (~x0 , ~y0 )(~x − ~x0 ) + F~y (~x0 , ~y0 )(~y − ~y0 ) = ~0.

(6.2.11)

The linear equation defines ~y as a linear map of ~x (in other words, the linear equation has a unique solution ~x for each ~y ) if and only if the linear transform F~y (~x0 , ~y0 ) is invertible. We expect the condition is also the condition for the original equation F (~x, ~y ) = ~0 to locally define ~y as a map of ~x. The expectation is summarized in the next result. Theorem 6.2.2 (Implicit Function Theorem). Suppose F : Rn ×Rm → Rm is continuously differentiable near (~x0 , ~y0 ) and satisfies F (~x0 , ~y0 ) = ~0. Suppose the derivative F~y (~x0 , ~y0 ) : Rm → Rm in ~y is an invertible linear map. Then there is an open subset U containing ~x0 and a map G : U ⊂ Rn → Rm , such that G is continuously differentiable near ~x0 , G(~x0 ) = ~y0 and F (~x, G(~x)) = ~0. Moreover, the derivative of G is given by G0 = −F~y−1 F~x . The solution ∆~y = −F~y−1 F~x ∆~x to the linear approximation equation (6.2.11) should be the linear approximation of the solution ~y = G(~x) of the original equation F (~x, ~y ) = ~0. In other words, the linear transform −F~y−1 F~x should be the derivative of G, or the differential d~y = G0 d~x is the solution of the equation dF = F~x d~x + F~y d~y = 0. This is the last claim of the theorem.

308

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION

Another way of understanding the formula G0 = −F~y−1 F~x is that vectors of the form L(∆~x) = (∆~x, −F~y−1 F~x ∆~x) are solutions of the linear approximation equation F 0 (~x0 , ~y0 )(∆~x, ∆~y ) = F~x (~x0 , ~y0 )∆~x + F~y (~x0 , ~y0 )∆~y = ~0. Proof. The map H(~x, ~y ) = (~x, F (~x, ~y )) : Rm+n → Rm+n has continuous derivative H 0 (~u, ~v ) = (~u, F~x (~u) + F~y (~v )) near ~x0 . The invertibility of F~y (~x0 , ~y0 ) implies the invertibility of H 0 (~x0 , ~y0 ). Then by the inverse function theorem, H has inverse H −1 = (S, T ) that is continuously differentiable near H(~x0 , ~y0 ) = (~x0 , ~0), where S : Rm+n → Rn and T : Rm+n → Rm . Since HH −1 = (S, F (S, T )) is the identity, we have S(~x, ~z) = ~x and F (~x, T (~x, ~z)) = ~z. Then F (~x, ~y )) = ~0 ⇐⇒ H(~x, ~y ) = (~x, ~0) ⇐⇒ (~x, ~y ) = (S(~x, ~0), T (~x, ~0)) = (~x, T (~x, ~0)). Therefore for ~y = G(~x) = T (~x, ~0) is exactly the solution of F (~x, ~y ) = ~0. Finally, for the derivative of G, we differentiate the equality F (~x, G(~x)) = ~0 and get F~x + F~y G0 = 0. Since F~y is invertible, we get G0 = −F~y−1 F~x . Example 6.2.2. The unit sphere S 2 ⊂ R3 is given by the equation f (x, y, z) = x2 + y 2 + z 2 = 1. By fz = 2z and the implicit function theorem, z can be expressed pas a function of (x, y) near any place where z 6= 0. In fact, the expression is z = ± 1 − x2 − y 2 , where the sign is the same as the sign of z. By solving the equation df = 2xdx + 2ydy + 2zdz = 0, x y x y we get dz = − dx − dy. Therefore zx = − and zy = − . z z z z Exercise 6.2.5. Find the places where the map is implicitly defined and compute the derivatives of the map. 1. x3 + y 3 − 3axy = 0, find

dy . dx

2. x2 + y 2 + z 2 − 4x + 6y − 2z = 0, find

∂z ∂x and . ∂(x, y) ∂(y, z)

3. x2 + y 2 = z 2 + w2 , x + y + z + w = 1, find 4. z = f (x + y + z, xyz), find

∂(z, w) ∂(x, w) and . ∂(x, y) ∂(y, z)

∂z ∂x and . ∂(x, y) ∂(y, z)

Exercise 6.2.6. Verify that implicitly defined function satisfy the partial differential equation.

6.2. INVERSE AND IMPLICIT FUNCTION

309

1. f (x − az, y − bz) = 0, z = z(x, y) satisfies azx + bzy = 1. 2.

x2 + y 2 + z 2

z = yf , z = z(x, y) satisfies (x2 − y 2 − z 2 )zx + 2xyzy = 4xz. y

Exercise 6.2.7. In solving the equation f (x, y) = 0 for two variable functions, we did not assume anything about the partial derivative in x. Extend the discussion to the general multivariable case and point out what conclusion in the implicit function theorem may not hold.

6.2.3

Hypersurface

For a continuously differentiable map F : Rn → Rm , the graph (~x, F (~x)) : Rn → Rm+n is an n-dimensional hypersurface in Rm+n . However, we should not insist that it is always the last m coordinates that can be written as a map of the first n coordinates. For example, the unit circle S 1 ⊂ R2 is the graph of a function of y in x near (x0 , y0 ) ∈ S 1 if y0 6= 0, and the circle is the graph of a function of x in y if x0 6= 0. Thus an n-dimensional hypersurface in Rm+n is a subset such that near any point, the subset is the graph of a continuously differentiable map of some choice of m variables in terms of the other n variables. There are generally two ways of specifying a hypersurface. For n ≤ m, the image of a continuously differentiable map F : Rn → Rm may give an n-dimensional parametrized hypersurface in Rm . For n ≥ m, the preimage of a continuously differentiable map F : Rn → Rm may give an (n − m)dimensional level hypersurface in Rn . A continuously differentiable map F : Rn → Rm is regular if the derivative 0 F (~x) : Rn → Rm is always an injective linear map (this necessarily implies n ≤ m). For example, a parametrized curve φ(t) is regular if its tangent vector φ0 (t) is not zero, and a parametrized surface σ(u, v) is regular if the tangent vectors σu and σv are not parallel. From linear algebra, the injective linear map F 0 (~x) is invertible when projected to certain n coordinates in Rm . Assume these n coordinates are the first n coordinates. Then F (~x) = (G(~x), H(~x)), where G : Rn → Rn and H : Rn → Rm−n are continuously differentiable and G0 (~x) is invertible. By the inverse function theorem, G is locally invertible, and we can locally change the variable from ~x to ~y = G(~x) and get F (G−1 (~y )) = (~y , H(G−1 (~y ))). In other words, by a local change of variable in Rn , F becomes the graph of a continuously differentiable map H ◦ G−1 : Rn → Rm−n . Thus F specifies an n-dimensional hypersurface S in Rm . The tangent space TF (~x) S of the hypersurface is the collection of tangent vectors of the curves F (φ(t)) in the surface. By F (φ(t))0 = F 0 (φ(t))(φ0 (t)) (linear transform F 0 (φ(t)) applied to vector φ0 (t)), the tangent space is the image of the linear transform F 0 (~x) TF (~x) S = {F 0 (~x)(~v ) : ~v ∈ Rn }.

(6.2.12)

310

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION

Example 6.2.3. For the parametrized sphere (5.1.19), we have   cos φ cos θ − sin φ sin θ ∂(x, y, z)  = cos φ sin θ sin φ cos θ  . ∂(φ, θ) − sin φ 0 The matrix has rank 2 as long as sin φ 6= 0 (i.e., z = cos φ 6= ±1). Therefore the parametrization is a regular surface away from the north and south poles. ∂(x, y) Moreover, if sin φ 6= 0 and cos φ 6= 0 (i.e., z 6= 0 or ±1), then has full ∂(φ, θ) rank 2. This suggests that away from the two poles and the equator, the map (φ, θ) 7→ (x, y) is locally invertible and the surface can be reparametrized by (x, y). As a matter of fact, on the whole northern hemisphere (which means 0 < z < 1), we have the explicit formula for the reparametrization p p p y (φ, θ) = arcsin x2 + y 2 , arctan , z = cos arcsin x2 + y 2 = 1 − x2 − y 2 . x The southern hemisphere can be similarly reparametrized. Exercise 6.2.8. Study the reparametrization of the sphere (5.1.19) by two coordinates along the equator. Exercise 6.2.9. For the parametrized torus (5.1.20), find the places where the surfaces are regular and then compute the partial derivatives of some coordinate in terms of the other two.

A vector ~y0 ∈ Rm is a regular value of a continuously differentiable map F : Rn → Rm if F 0 (~x) : Rn → Rm is a surjective linear map (this necessarily implies n ≥ m) for all ~x satisfying F (~x) = ~y0 (~x is a preimage of ~y0 ). From linear algebra, the surjective linear map F 0 (~x) is invertible when restricted to certain choice of m coordinates in Rn (by taking all the other n − m coordinates to be 0). By taking these m coordinates to be Rm and the remaining n − m coordinates to be Rn in the implicit function theorem, we find the equation F (~x) = ~y0 defines the graph of a continuously differentiable map that expresses the m coordinates in terms of the remaining n − m coordinates. In particular, this shows that F (~x) = ~y0 is an (n − m)-dimensional hypersurface S in Rn . The tangent space T~x S is given by the derivative of the map produced by the implicit function theorem. The derivative of the map is computed from the linear approximation of the defining equation F (~x) = ~y0 . Thus the tangent space T~x S = {~v ∈ Rn : F 0 (~x)(~v ) = ~0} (6.2.13) is the kernel of the linear transform F 0 (~x) : Rn → Rm . For a continuously differentiable function f (~x), a number y0 is a regular value if f (~x) = y0 implies the gradient ∇f 6= ~0 at ~x (or at least one partial derivative is nonzero). The equation f (~x) = y0 specifies an (n − 1)dimensional hypersurface, with the tangent hyperplane given by ∇f · ~v = 0. In other words, the gradient ∇f is the normal vector of the tangent hyperplane. In general, for a continuously differentiable map F = (f1 , f2 , . . . , fm ) : Rn → m R , the derivative F 0 (~x) : Rn → Rm is surjective if and only if the gradients

6.2. INVERSE AND IMPLICIT FUNCTION

311

∇f1 , ∇f2 , . . . , ∇fm are linearly independent c1 ∇f1 + c2 ∇f2 + · · · + cm ∇fm = 0 =⇒ c1 = c2 = · · · = cm = 0. For a regular value ~y0 , the hypersurface S defined by the equation F (~x) = ~y0 is the intersection of the (n − 1)-dimensional hypersurfaces fi (~x) = yi0 . The tangent space T~x S is the intersection of the tangent spaces ∇fi · ~v = 0 for the coordinate functions. The normal vectors ∇f1 , ∇f2 , . . . , ∇fm span the normal space of the hypersurface S at ~x. Example 6.2.4. The unit sphere S n−1 = f −1 (1), where f (~x) = ~x · ~x = x21 + x22 + · · · + x2n . The number 1 is a regular value because f (~x) = 1 =⇒ ~x 6= ~0 =⇒ ∇f = 2~x 6= ~0. As a linear map, the restriction of the derivative ~v 7→ 2~x · ~v to the i-th coordinate is invertible if and only if xi 6= 0. By the implicit function theorem, if xi 6= 0, then xi can be expressed as a function of (x1 , . . . , xi−1 , xi+1 , . . . , xn ) near ~x. Indeed, the q part of the sphere with xi 6= 0 consists of the graphs of the functions xi = ± 1 − x21 − · · · − x2i−1 − x2i+1 − · · · − x2n (which graph ~x belongs to depends on the sign of its i-th coordinate). The whole sphere is then covered by these pieces of hypersurfaces, called coordinate charts. Exercise 6.2.10. Find the regular values of the map and determine the tangent space of the preimage of regular value. 1. f (x, y) = x2 + y 2 + z 2 + xy + yz + zx + x + y + z. 2. F (x, y, z) = (x + y + z, xy + yz + zx). Exercise 6.2.11. Find a regular value λ for xyz, so that the surface xyz = λ is x2 y 2 z 2 tangential to the ellipse 2 + 2 + 2 = 1 at some point. a b c Exercise 6.2.12. Prove that any sphere x2 +y 2 +z 2 = a2 and any cone x2 +y 2 = b2 z 2 are orthogonal at their intersections. Can you extend this to higher dimension? Exercise 6.2.13. The space of n × n symmetric matrices can be identified with n(n+1) n(n+1) 2 R 2 . By considering the map F (X) = X T X : Rn → R 2 in Exercise n(n − 1) 6.1.6, prove that the orthogonal matrices O(n) = {X : X T X = I} is an 2 2 dimensional hypersurface in Rn .

6.2.4

Additional Exercise

Orthogonal Change of Variable A change of variable ~x = F (~y ) : Rn → Rn is orthogonal if the vectors ~xy1 =

∂~x ∂~x ∂~x , ~xy2 = , . . . , ~xyn = ∂y1 ∂y2 ∂yn

are orthogonal. Exercise 6.2.14. Prove that an orthogonal change of variable satisfies 1 k~xyi k22

∂xj . ∂yi

∂yi = ∂xj

312

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION

Exercise 6.2.15. Is the inverse ~y = F −1 (~x) of an orthogonal change of variable also an orthogonal change of variable? Exercise 6.2.16. Prove that under an orthogonal change of variable, the gradient in ~x can be written in terms of the new variable by ∇f =

∂f ~xy2 ∂f ~xyn ∂f ~xy1 + + ··· + . 2 2 ∂y1 k~xy1 k2 ∂y2 k~xy2 k2 ∂yn k~xyn k22

Elementary Symmetric Polynomial The elementary symmetric polynomials for n variables x1 , x2 , . . . , xn are X σk = xi1 xi2 · · · xik , k = 1, 2, . . . , n. 1≤i1
Vieta’s formulae says that they appear as the coefficients of the polynomial p(x) = (x − x1 )(x − x2 ) · · · (x − xn ) = xn − σ1 xn−1 + σ2 xn−2 − · · · + (−1)n−1 σn−1 x + (−1)n σn .

(6.2.14)

Therefore ~x 7→ ~σ : Rn → Rn is the map that takes the roots of polynomials to polynomials. ∂~σ of the polynomial with respect to Exercise 6.2.17. Prove that the derivative ∂~x the roots satisfy ··· xn−2 xn−1 1 1 xn−1 xn−2 · · · 2  2  .. ..  . . n−2 · · · xn−1 x n n 

  0 1 p (x1 ) 0 ··· 0   p (x2 ) · · · 1 ∂~σ  0 + . ..  . ... .  ∂~x  . 0 0 ··· 1

0 0 .. .



   = O.  0 p (xn )

Then prove that when the roots are distinct, the roots can be locally written as continuously differentiable functions of the polynomial.

Power Sum and Newton’s Identity The power sums for n variables x1 , x2 , . . . , xn are X sk = xki = xk1 + xk2 + · · · + xkn , k = 1, 2, . . . , n. 1≤i≤n

For the polynomial p(x) in (6.2.14), by adding p(xi ) = 0 together for i = 1, 2, . . . , n, we get sn − σ1 sn−1 + σ2 sn−2 − · · · + (−1)n−1 σn−1 s1 + (−1)n nσn = 0. Exercise 6.2.18. For ul,k =

X i1
xi1 xi2 · · · xil xkj , l ≥ 0, k ≥ 1, l + k ≤ n,

(6.2.15)

6.2. INVERSE AND IMPLICIT FUNCTION

313

prove that sk = u0,k , σ1 sk−1 = u0,k + u1,k−1 , σ2 sk−2 = u1,k−1 + u2,k−2 , .. . σk−2 s2 = uk−3,3 + uk−2,2 , σk−1 s1 = uk−2,2 + kσk . and derive Newton’s identities sk − σ1 sk−1 + σ2 sk−2 − · · · + (−1)k−1 σk−1 s1 + (−1)k kσk = 0, k = 1, 2, . . . , n. (6.2.16) Exercise 6.2.19. Prove that there is a polynomial invertible map that relates ~s = (s1 , s2 , . . . , sn ) and ~σ = (σ1 , σ2 , . . . , σn ). Then discuss the local invertibility of the map ~x 7→ ~σ : Rn → Rn when there are multiple roots (see Exercise 6.2.17).

Functional Dependence A collection of functions are functionally dependent if some can be written as functions of the others. For example, the functions f = x + y, g = x2 + y 2 , 1 1 h = x3 + y 3 are functionally dependent because h = f 3 − f g. In the 2 2 following exercises, all functions are continuously differentiable. Exercise 6.2.20. Prove that if f1 (~x), f2 (~x), . . . , fn (~x) are functionally dependent, then the gradients ∇f1 , ∇f2 , . . . , ∇fn are linearly dependent. Exercise 6.2.21. Prove that f1 (~x), f2 (~x), . . . , fn (~x) are functionally dependent near ~x0 if and only if there is a function h(~y ) defined near ~y0 = (f1 (~x0 ), f2 (~x0 ), . . . , fn (~x0 )), such that ∇h(~y0 ) 6= ~0 and h(f1 (~x), f2 (~x), . . . , fn (~x)) = 0. Exercise 6.2.22. Suppose the gradients ∇f , ∇g are linearly dependent everywhere, and ∇g(~x0 ) 6= ~0. Prove that there is a function h(y) defined for y near g(~x0 ), such that f (~x) = h(g(~x)) for ~x near ~x0 . ∂g Hint: If 6= 0, then (x1 , x2 , . . . , xn ) 7→ (g, x2 , . . . , xn ) is invertible near ~x0 . ∂x1 After changing the variables from (x1 , x2 , . . . , xn ) to (g, x2 , . . . , xn ), verify that ∂f ∂f = ··· = = 0. ∂x2 ∂xn Exercise 6.2.23. Suppose the gradient vectors ∇f , ∇g1 , . . . , ∇gk are linearly dependent near ~x0 . Suppose ∇g1 , . . . , ∇gk are linearly independent at ~x0 . Prove that there is a function h(~y ) defined for ~y near (f1 (~x0 ), . . . , fk (~x0 )), such that f (~x) = h(g(~x)) for ~x near ~x0 . Exercise 6.2.24. Suppose the rank of the gradient vectors ∇f1 , ∇f2 , . . . , ∇fm is always k near ~x0 . Prove that there are k functions from f1 , f2 , . . . , fm , such that the other m − k functions are functionally dependent on these k functions. Exercise 6.2.25. Determine functional dependence. 1. x + y + z, x2 + y 2 + z 2 , x3 + y 3 + z 3 . 2. x + y − z, x − y + z, x2 + y 2 + z 2 − 2yz.

314 3.

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION x y z , , . x2 + y 2 + z 2 x2 + y 2 + z 2 x2 + y 2 + z 2

x y z 4. p , p , p . 2 2 2 2 2 2 2 x +y +z x +y +z x + y2 + z2

6.3

High Order Differentiation

The high order differentiation and Taylor expansion can also be extended to multivariable. The concepts can be used to solve problems that require high order approximations, such as local maximum and minimum with or without constraint.

6.3.1

Quadratic Approximation

A function f (~x) defined near ~x0 ∈ Rn is approximated by a quadratic function p(~x) = a +

X

X

bi (xi − xi0 ) +

1≤i≤n

cij (xi − xi0 )(xj − xj0 )

1≤i,j≤n

= a + ~b · ∆~x + C∆~x · ∆~x

(6.3.1)

if for any > 0, there is δ > 0, such that k∆~xk < δ =⇒ |f (~x) − p(~x)| ≤ k∆~xk2 .

(6.3.2)

In this case, we say f is second order differentiable at ~x0 , with the quadratic form X f 00 (~x0 )(~v ) = 2 cij vi vj = C~v · ~v (6.3.3) 1≤i,j≤n

as the the second order derivative. The second order differential is X d2 f = 2 cij dxi dxj = 2Cd~x · d~x.

(6.3.4)

1≤i,j≤n

Similar to the single variable case, the second order differentiability im∂f (~x0 ) . We plies the first order differentiability, so that a = f (~x0 ) and bi = ∂xi expect the coefficients cij to be the second order partial derivatives 1 1 ∂ 2 f (~x0 ) 1 ∂ = cij = fxj xi (~x0 ) = 2 2 ∂xi ∂xj 2 ∂xi

∂f ∂xj

.

(6.3.5)

~ x=~ x0

However, a second order differentiable function can be very bad (discontinuous, for example) away from ~x0 . Therefore the formula (6.3.5) may not make sense at all. On the other hand, Theorem 2.3.1 suggests that if the second order partial derivatives do exist and have good enough properties, then the quadratic function (6.3.1) with the coefficients given by suitable partial derivatives should approximate f .

6.3. HIGH ORDER DIFFERENTIATION

315

Theorem 6.3.1. Suppose a function f is differentiable near ~x0 and the par∂f are differentiable at ~x0 . Then f is second order differential derivatives ∂xi tiable at ~x0 , with the quadratic approximation T2 (~x) = f (~x0 ) +

X ∂f (~x0 ) 1 X ∂ 2 f (~x0 ) ∆xi + ∆xi ∆xj . ∂x 2 ∂x ∂x i i j 1≤i≤n 1≤i,j≤n

Proof. We only prove the case n = 2 and ~x0 = (0, 0). The general case is similar. To show that p(x, y) = f (0, 0) + fx (0, 0)x + fy (0, 0)y 1 + (fxx (0, 0)x2 + fxy (0, 0)xy + fyx (0, 0)yx + fyy (0, 0)y 2 ) 2

(6.3.6)

approximates f near (0, 0), we restrict the remainder R2 (x, y) = f (x, y) − p(x, y) to the straight lines passing through (0, 0). Therefore for fixed (x, y) close to (0, 0), we introduce r2 (t) = R2 (tx, ty) = f (tx, ty) − p(tx, ty). Inspired by the proof of Theorem 2.3.1, we apply Cauchy’s mean value theorem to get R2 (x, y) = r2 (1) =

r20 (c) r2 (1) − r2 (0) = , 0 < c < 1. 12 − 02 2c

To compute r20 (c), we note that the differentiability of f near (0, 0) allows us to use the chain rule to get r20 (t) = fx (tx, ty)x + fy (tx, ty)y − fx (0, 0)x − fy (0, 0)y − (fxx (0, 0)x2 + fxy (0, 0)xy + fyx (0, 0)yx + fyy (0, 0)y 2 )t = x[fx (tx, ty) − fx (0, 0) − fxx (0, 0)tx − fxy (0, 0)ty] + y[fy (tx, ty) − fy (0, 0) − fyx (0, 0)tx − fyy (0, 0)ty]. (6.3.7) Further by the differentiability of fx and fy , for any > 0, there is δ > 0, such that k(ξ, η)k < δ implies |fx (ξ, η) − fx (0, 0) − fxx (0, 0)ξ − fxy (0, 0)η| < k(ξ, η)k, |fy (ξ, η) − fy (0, 0) − fyx (0, 0)ξ − fyy (0, 0)η| < k(ξ, η)k. Taking (ξ, η) = (cx, cy), by (6.3.7) we find k(x, y)k < δ implies |r20 (c)| ≤ |x|k(cx, cy)k + |y|k(cx, cy)k = c(|x| + |y|)k(x, y)k. Therefore 1 k(x, y)k < δ =⇒ |R2 (x, y)| ≤ (|x| + |y|)k(x, y)k. 2 By the equivalence of norms, the right side is comparable to k(x, y)k2 . Therefore we conclude that p(x, y) is a quadratic approximation of f (x, y) near (0, 0).

316

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION

Under slightly stronger condition, the partial derivative coefficients are symmetric. Proposition 6.3.2. Suppose f (x, y) has partial derivatives fx , fy , fxy near (x0 , y0 ) and fxy is continuous at (x0 , y0 ), then fyx (x0 , y0 ) exists and fxy (x0 , y0 ) = fyx (x0 , y0 ). Thus if a function has all the first and second order partial derivatives near ~x0 , and the second order partial derivatives are continuous at ~x0 , then the condition of Theorem 6.3.1 is satisfied. Moreover, the second order partial derivatives is independent of the order of the variables. Proof. The second order partial derivative fyx (x0 , y0 ) is defined by a repeated limit fy (x, y0 ) − fy (x0 , y0 ) fyx (x0 , y0 ) = lim x→x0 x − x0 f (x, y) − f (x, y0 ) − f (x0 , y) + f (x0 , y0 ) . = lim lim x→x0 y→y0 (x − x0 )(y − y0 ) The other second order partial derivative fxy (x0 , y0 ) is defined by switching the two limits in the repeated limit. We will show that the existence of fy , fxy near (x0 , y0 ) and the continuity of fxy at (x0 , y0 ) actually imply the convergence of the whole limit fxy (x0 , y0 ) =

lim

x→x0 ,y→y0

f (x, y) − f (x, y0 ) − f (x0 , y) + f (x0 , y0 ) . (x − x0 )(y − y0 )

Note that the existence of fx implies the existence of the limit fy (x, y0 ) − fy (x0 , y0 ) f (x, y) − f (x, y0 ) − f (x0 , y) + f (x0 , y0 ) = lim . y→y0 x − x0 (x − x0 )(y − y0 ) Then by Proposition 5.1.11, the repeated limit fyx (x0 , y0 ) exists and is equal to fxy (x0 , y0 ). For fixed y0 and y, we apply the mean value theorem to the function g(x) = f (x, y) − f (x, y0 ). By the existence of fx near (x0 , y0 ), we get f (x, y) − f (x, y0 ) − f (x0 , y) + f (x0 , y0 ) =g(x) − g(x0 ) = g 0 (c)(x − x0 ) = (fx (c, y) − fy (c, y0 ))(x − x0 ) for some c between x0 and x. Then we fix c and apply the mean value theorem to the function fx (c, y) of y. By the existence of fxy near (x0 , y0 ), we get f (x, y) − f (x0 , y) − f (x, y0 ) + f (x0 , y0 ) = fxy (c, d)(x − x0 )(y − y0 ) for some d between y0 and y. Then the continuity of fxy at (x0 , y0 ) further tells us f (x, y) − f (x0 , y) − f (x, y0 ) + f (x0 , y0 ) lim x→x0 ,y→y0 (x − x0 )(y − y0 ) = lim fxy (c, d) = lim fxy (c, d) = fxy (x0 , y0 ). x→x0 ,y→y0

c→x0 ,d→y0

6.3. HIGH ORDER DIFFERENTIATION

317

Example 6.3.1. The function f = xy 2 z 3 has continuous partial derivatives fx = y 2 z 3 , fy = 2xyz 3 , fz = 3xy 2 z 2 , fxx = 0, fyy = 2yz 3 , fzz = 6xy 2 z, fxy = 2yz 3 , fxz = 3y 2 z 2 , fyz = 6xyz 2 . The values of the partial derivatives at (3, 2, 1) are f = 12, fx = 4, fy = 12, fz = 36, fxx = 0, fyy = 4, fzz = 72, fxy = 4, fxz = 12, fyz = 36. The quadratic approximation of f at (3, 2, 1) is 1 p = 12 + 4∆x + 12∆y + 36∆z + (4∆y 2 + 72∆z 2 + 8∆x∆y + 24∆x∆z + 72∆y∆z), 2 where ∆x = x − 3, ∆y = y − 2, ∆z = z − 1. The quadratic differential is d2(3,2,1) f = 4dy 2 + 72dz 2 + 8dxdy + 24dxdz + 72dydz. Exercise 6.3.1. Compute the partial derivatives up to second order. z

1. 4xy 3 + 5x3 y − 3y 2 .

3. exyz sin(x+2y+3z).

5. xy .

y 2. arctan . x

4. log(x2 + y 2 ).

6. (xy)z .

Exercise 6.3.2. Construct a function that is second order differentiable at (0, 0) but is not continuous anywhere away from (0, 0). Exercise 6.3.3. Show that the function  2 2  xy(x − y ) f (x, y) = x2 + y 2  0

if(x, y) 6= (0, 0) if(x, y) = (0, 0)

has all the second order partial derivatives near (0, 0) but fxy (0, 0) 6= fyx (0, 0). Exercise 6.3.4. Consider the function f (x, y) in Example 5.1.2. Show that f (x, y)2 has all the second order derivatives but the function is still not continuous (so not differentiable) at (0, 0). Can you find a function that has all the partial derivatives but is not continuous? Exercise 6.3.5. Suppose f has continuous second order partial derivatives. Prove that if f fxy = fx fy , then f (x, y) = g(x)h(y). Exercise 6.3.6. Derive the chain rule for the second order derivative by composing the quadratic approximations.

6.3.2

High Order Partial Derivative

The discussion on the quadratic approximation suggests the role to be played by high order partial derivatives in the high order approximation. In general, the k-th order partial derivative ∂kf = Dxi1 xi2 ···xik f = fxik ···xi2 xi1 ∂xi1 ∂xi2 · · · ∂xik

(6.3.8)

318

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION

is obtained by successfully taking partial derivatives in xik , xik−1 , . . . , xi2 , xi1 . The partial derivative may depend on the order of variables in general. However, by Proposition 6.3.2, if all the partial derivatives up to the (k − 1)st order are continuous near ~x0 , and the k-th order partial derivatives are continuous at ~x0 , then the partial derivative is independent of the order. In this case, we can write ∂kf ∂kf = , ∂xi1 ∂xi2 · · · ∂xik ∂xk11 ∂xk22 · · · ∂xknn where kj is the number of xj in the collection {xi1 , xi2 , . . . , xik }. Many techniques for the computation of high order derivatives of single variable functions can be adapted to multivariable functions. Example 6.3.2. For a function f (x, y) and x = x(u, v), y = y(u, v), we have ∂2f ∂ ∂f ∂ ∂f ∂x ∂f ∂y = = + ∂u∂v ∂u ∂v ∂u ∂x ∂v ∂y ∂v ∂ ∂f ∂x ∂f ∂ ∂x ∂ ∂f ∂y ∂f ∂ ∂y = + + + ∂u ∂x ∂v ∂x ∂u ∂v ∂u ∂y ∂v ∂y ∂u ∂v 2 2 2 ∂ f ∂x ∂ f ∂y ∂x ∂f ∂ x = + + 2 ∂x ∂u ∂y∂x ∂u ∂v ∂x ∂u∂v 2 2 ∂ f ∂x ∂ f ∂y ∂y ∂f ∂ 2 y + + + ∂x∂y ∂u ∂y 2 ∂u ∂v ∂y ∂u∂v 2 2 2 ∂x ∂x ∂ f ∂y ∂y ∂ f ∂x ∂y ∂x ∂y ∂ f = + + + 2 2 ∂u ∂v ∂x ∂u ∂v ∂y ∂u ∂v ∂v ∂u ∂x∂y 2 2 ∂ y ∂f ∂ x ∂f + . + ∂u∂v ∂x ∂u∂v ∂y Example 6.3.3. By Example 6.2.1, the partial derivatives of the polar coordinate in terms of the cartesian coordinate is rx = cos θ, ry = sin θ, θx = −r−1 sin θ, θy = r−1 cos θ. Thus rxx = −(sin θ)θx = r−1 sin2 θ, ryy = (cos θ)θy = r−1 cos2 θ, rxy = −(sin θ)θy = −r−1 sin θ cos θ, θxx = r−2 rx sin θ − r−1 (cos θ)θx = 2r−2 sin θ cos θ, θyy = −r−2 ry cos θ + r−1 (− sin θ)θy = −2r−2 sin θ cos θ, θxy = r−2 ry sin θ − r−1 (cos θ)θy = r−2 (sin2 θ − cos2 θ). Example 6.3.4. By Example 6.2.2, the partial derivatives of one coordinate in terms of the other coordinates in the unit sphere x2 + y 2 + z 2 = 1 is (when z 6= 0) x y zx = − , z y = − . z z

6.3. HIGH ORDER DIFFERENTIATION

319

Thus z − xzx x2 + z 2 1 − y2 zxx = − =− = , 2 3 z z z3 xzy xy zxy = 2 = − 3 , z z (1 − y 2 )zx (1 − y 2 )x zxxx = −3 = 3 , z4 z5 −2yz 3 − 3(1 − y 2 )z 2 zy (1 − 3y 2 − 2z 2 )y = . zxxy = z6 z5 Exercise 6.3.7. Compute the third order partial derivatives. z

1. 4xy 3 + 5x3 y − 3y 2 .

3. exyz sin(x+2y+3z).

5. xy .

y 2. arctan . x

4. log(x2 + y 2 ).

6. (xy)z .

Exercise 6.3.8. Compute the partial derivatives. 1. z = uv + sin t, u = et , v = cos t, find ztt , zttt . 2. z = x2 log y, x =

u , y = u − v, find zuu , zuv , zvv . v

3. u = f (x + y, x − y), find uxx , uxy , uxxx , uxyy . 4. u = f (r cos θ, r sin θ), find urr , uθθ , urθ . x y z 5. u = f , , , find uxx , uxy , uxyz . y z x 6. u = xyzf (x, y, z), find

∂ i+j+k u . ∂xi ∂y j ∂z k

Exercise 6.3.9. For a function f on Rn and a linear transform L : Rm → Rn . How are the high order partial derivatives of f and f ◦ L related? Exercise 6.3.10. Verify the functions satisfy the partial differential equations. 1. u =

(x−b)2 1 √ e− 4a2 t , heat equation ut = a2 uxx . 2a πt

2. u = log((x − a)2 + (y − b)2 ), Laplace equation uxx + uyy = 0. 3. u = A((x1 − a1 )2 + (x2 − a2 )2 + · · · + (xn − an )2 ) ux21 + ux22 + · · · + ux2n = 0.

2−n 2

, Laplace equation

4. u = φ(x + at) + ψ(x − at), wave equation utt = a2 uxx . Exercise 6.3.11. Derive the partial differential equations under the new variables. 1. x = r cos θ, y = r sin θ, Laplace equation uxx + uyy = 0. 2. ξ = x + at, η = x − at, wave equation utt = a2 uxx . 3. s = xy, t =

x , equation x2 uxx − y 2 uyy = 0. y

Exercise 6.3.12. Compute the partial derivatives.

320

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION

1. x = s + t, y = s2 + t2 , z = s3 + t3 , find zxx , zxy , zyy . 2. x = es+t , y = es−t , z = st, find zxx , zxxx . 3. x3 + y 3 − 3axy = 0, find yxx and yxxx . 4. x2 + y 2 + z 2 − 4x + 6y − 2z = 0, find

∂2z ∂2x and . ∂(x, y)2 ∂(y, z)2

5. x2 + y 2 = z 2 + w2 , x + y + z + w = 1, find

∂ 2 (z, w) ∂ 2 (x, w) , . ∂(x, y)2 ∂(y, z)2

Exercise 6.3.13. Suppose y = y(~x) is given implicitly by f (~x, y) = 0. Compute the second order partial derivatives of y.

6.3.3

Taylor Expansion

A map P : Rn → Rm is a polynomial map of degree k if all its coordinate functions are polynomials of degree k. A map F (~x) defined near ~x0 is k-th order differentiable at ~x0 if it is approximated by a polynomial map P of degree k. In other words, for any > 0, there is δ > 0, such that k∆~xk < δ =⇒ kF (~x) − P (~x)k ≤ k∆~xkk .

(6.3.9)

Express the approximate polynomial as 1 1 P (~x) = F (~x0 )+F 0 (~x0 )(∆x)+ F 00 (~x0 )(∆x)+· · ·+ F (k) (~x0 )(∆x), (6.3.10) 2 k! where the coordinates of F (i) (~x0 ) are i-th order forms. Then F (i) (~x0 ) is called the i-th order derivative of F at ~x0 . It is easy to see that F is approximated by P if and only if each coordinate is approximated. A function f (~x) defined near ~x0 is k-th order differentiable at ~x0 if there is a polynomial X p(~x) = bk1 k2 ···kn ∆xk11 ∆xk22 · · · ∆xknn k1 +k2 +···+kn ≤k,ki ≥0

of degree k, such that for any > 0, there is δ > 0, such that k∆~xk < δ =⇒ |f (~x) − p(~x)| ≤ k∆~xkk . The k-th order derivative f (k) (~x0 )(~v ) =

X

k!bk1 k2 ···kn v1k1 v2k2 · · · vnkn

k1 +k2 +···+kn =k,ki ≥0

is a k-th order form, and the k-th order differential is X dk f = k!bk1 k2 ···kn dxk11 dxk22 · · · dxknn . k1 +k2 +···+kn =k,ki ≥0

Similar to single variable functions, a high order differentiable function may not be continuous away from the point. On the other hand, if f has

6.3. HIGH ORDER DIFFERENTIATION

321

partial derivatives up to order k at ~x0 , then we may construct the k-th order Taylor expansion 1 ∂ m f (~x0 ) ∆xi1 ∆xi2 · · · ∆xim . (6.3.11) m! ∂x i1 ∂xi2 · · · ∂xim ≤n,0≤m≤k

X

Tk (~x) =

1≤i1 ,i2 ,...,im

If all the partial derivatives exist near ~x0 and are continuous at ~x0 , then the partial derivatives are independent of the order of the variables, and we have X

Tk (~x) =

k1 +k2 +···+kn

1 ∂ k1 +k2 +···+kn f (~x0 ) k1 k2 ∆x1 ∆x2 · · · ∆xknn . k k 2 1 k n k1 !k2 ! · · · kn ! ∂x1 ∂x2 · · · ∂xn ≤k,k ≥0 i

(6.3.12) Theorem 6.3.3. Suppose f (~x) has continuous partial derivatives up to order k − 1 near ~x0 and the (k − 1)-st order partial derivatives are differentiable at ~x0 . Then the k-th order Taylor expansion is the k-th order polynomial approximation at ~x0 . Proof. Restrict the remainder Rk (~x) = f (~x) − Tk (~x) to the straight line connecting ~x0 to ~x rk (t) = Rk ((1 − t)~x0 + t~x) = Rk (~x0 + t∆~x). Since f (~x) has continuous partial derivatives up to order k − 1 near ~x0 , the function f (~x) and its partial derivatives up to order (k − 2) are differentiable near ~x0 . Then by the chain rule, rk has derivatives up to order (k − 1) for t ∈ (−1, 1). Moreover, we have (k−1)

rk (0) = rk0 (0) = rk00 (0) = · · · = rk

(0) = 0,

(6.3.13)

and (k−1)

rk

(t) =

X

[δi1 ,i2 ,...,ik−1 (t∆~x)−λi1 ,i2 ,...,ik−1 (t∆~x)]∆xi1 ∆xi2 · · · ∆xik−1 ,

1≤i1 ,i2 ,...,ik−1 ≤n

where δi1 ,i2 ,...,ik−1 (∆~x) =

∂ k−1 f (~x0 + ∆~x) ∂xi1 ∂xi2 · · · ∂xik−1

is the (k − 1)-st order partial derivative, and λi1 ,i2 ,...,ik−1 (∆~x) =

X ∂ k−1 f (~x0 ) ∂ k f (~x0 ) + ∆xi ∂xi1 ∂xi2 · · · ∂xik−1 1≤i≤n ∂xi ∂xi1 ∂xi2 · · · ∂xik−1

is the linear approximation of the partial derivative.

322

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION

By (6.3.13) and Cauchy’s mean value theorem, we have rk (1) − rk (0) rk0 (c1 ) rk0 (c1 ) − rk0 (0) Rk (~x) = rk (1) = = k−1 = 1k − 0k kc1 k(ck−1 − 0k−1 ) 1 (k−1)

(k−1)

r rk00 (c2 ) (ck−1 ) rk (ck−1 ) = k = = ··· = k−2 k(k − 1) · · · 2ck−1 k!ck−1 k(k − 1)c2 for some 1 > c1 > c2 > · · · > ck−1 > 0. By the assumption that the (k − 1)-st order partial derivatives δi1 ,i2 ,...,im of f are differentiable, for any > 0, there is δ > 0, such that k∆~xk < δ implies |δi1 ,i2 ,...,ik−1 (∆~x) − λi1 ,i2 ,...,ik−1 (∆~x)| ≤ k∆~xk. Then for k∆~xk < δ and |t| < 1, we have X (k−1) |rk (t)| ≤ kt∆~xk|∆xi1 ∆xi2 · · · ∆xik−1 | 1≤i1 ,i2 ,...,ik−1 ≤n

≤ nk−1 |t|k∆~xkk∆~xkk−1 ∞ , and

(k−1)

(ck−1 )| nk−1 ≤ k∆~xkk∆~xkk−1 ∞ . k!|ck−1 | k! Rk (~x) = 0. By the equivalence of norms, this implies that lim∆~x→~0 k∆~xkk |Rk (~x)| =

|rk

Exercise 6.3.14. Find Taylor expansions. 1. xy at (1, 4), to third order. 2. sin(x2 + y 2 ) at (0, 0), to fourth order. 3. xy y z at (1, 1, 1), to third order. Z 1 2 4. (1 + x)t y dt at (0, 0), to third order. 0

Exercise 6.3.15. Find the second and third order derivatives of the map of taking the k-th power of matrices. Exercise 6.3.16. Find the high order derivatives of the map of taking the inverse of matrices. Exercise 6.3.17. Find the condition for a homogeneous function to be k-th order differentiable at ~0. What about a multihomogeneous function?

Under stronger differentiability condition, the remainder formula in Proposition 2.3.3 can also be extended. Proposition 6.3.4. Suppose f (~x) has continuous partial derivatives up to order k + 1 near ~x0 . Then for any ~x near ~x0 , there is ~c on the straight line connecting ~x0 to ~x, such that X 1 ∂ k+1 f (~c) |∆x1 |k1 |∆x2 |k2 · · · |∆xn |kn . |f (~x)−Tk (~x)| ≤ k1 k2 k k 1 !k2 ! · · · kn ! ∂x1 ∂x2 · · · ∂xnn k +k +···+k =k+1 1

2

n

(6.3.14)

6.3. HIGH ORDER DIFFERENTIATION

323

Proof. Under the assumption of the proposition, we have (k)

(k+1)

rk (1) − rk (0) r (ck+1 ) rk (ck ) Rk (~x) = k+1 = k = ··· = k+1 1 −0 (k + 1)k · · · 2ck (k + 1)! for some 1 > c1 > c2 > · · · > ck+1 > 0. Since Tk (~x) is a polynomial of degree k, we have (k+1)

rk

dk+1 f (~x0 + t∆~x) dtk+1 X (k + 1)! ∂ k1 +k2 +···+kn f (~x0 + t∆~x) k1 k2 ∆x1 ∆x2 · · · ∆xknn , = k2 k1 k n k !k ! · · · k ! ∂x1 ∂x2 · · · ∂xn 1 2 n k +k +···+k =k+1

(t) =

1

2

n

and the estimation for the remainder Rk (~x) follows. Example 6.3.5. For the function f = xy 2 z 3 in Example 6.3.1, we have fxxx = fxxy = fxxz = fxyy = fyyy = 0, fxyz = 6yz 2 , fxzz = 6y 2 z, fyyz = 6yz 2 , fyzz = 12xyz. Thus at (3, 2, 1), when |∆x|, |∆y|, |∆z| < δ, the sum of the absolute values of the third order derivatives is bounded by 6(2 + δ)(1 + δ)2 + 6(2 + δ)2 (1 + δ) + 6(2 + δ)(1 + δ)2 + 12(3 + δ)(2 + δ)(1 + δ) < 120(1 + δ)3 . The error of the quadratic 120(1 + δ)3 3 δ = 20(1 + δ)3 δ 3 . approximation is < 3! Exercise 6.3.18. Estimate the errors of the approximations in Exercise 6.3.14.

6.3.4

Maximum and Minimum

A function f (~x) has a local maximum at ~x0 if there is δ > 0, such that k~x − ~x0 k < δ =⇒ f (~x) ≤ f (~x0 ). In other words, the value of f at ~x0 is biggest among the values of f at points near ~x0 . Similarly, f has a local minimum at ~x0 if there is δ > 0, such that k~x − ~x0 k < δ =⇒ f (~x) ≥ f (~x0 ). Suppose the function is defined near ~x0 . Then ~x0 is a local extreme if and only if it is a local extreme in all directions. So a necessary condition is that if the derivative D~v f (~x0 ) exists in the direction of ~v , then D~v f (~x0 ) = 0. By taking ~v to be the coordinate directions, we get the following result. Proposition 6.3.5. Suppose f (~x) is defined near ~x0 and has a local extreme ∂f at ~x0 . If a partial derivative (~x0 ) exists, then the partial derivative must ∂xi be zero. Thus if f (~x) is differentiable at a local extreme ~x0 , then ∇f (~x0 ) = ~0. The condition can also be expressed as df = 0.

324

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION

Example 6.3.6. For f (x, y) = x2 + xy + y 2 , we have fx = 2x + y and fy = x + 2y. The condition fx = fy = 0 tells us the only possible local extreme is (0, 0). Since 1 2 3 2 f (x, y) = x + y + y ≥ 0 = f (0, 0), 2 4 (0, 0) is a local minimum. Similarly, by solving ∇g = 0, the only possible local extreme for g(x, y) = 2 x − y 2 is again (0, 0). Since g(x, y) > 0 = g(0, 0) when |x| > |y| and g(x, y) < 0 = g(0, 0) when |x| < |y|, (0, 0) is neither a local maximum nor a local minimum. Taking a clue from the graph of g, (0, 0) is called a saddle point. Example 6.3.7. The function f (x, y) = |x| + y satisfies fy = 1 6= 0. Therefore there is no local extreme, despite the fact that fx may not exist. p Example 6.3.8. The function f (x, y) = |xy| has the partial derivatives  s  r  1 x  1 y     if y > 0 if x > 0     2 y   2 rx   s     1 y x 1 if x < 0 − , fy = − . fx = if y < 0 y 2 x     2     0 if y = 0     0 if x = 0   does not exist if x = 0, y 6= 0   does not exist if x 6= 0, y = 0 Therefore the possible local extrema are (0, y0 ) for any y0 and (x0 , 0) for any x0 . Since p f (x, y) = |xy| ≥ 0 = f (0, y0 ) = f (x0 , 0), the points on the two axes are indeed local minima. Example 6.3.9. Consider the differentiable function z = z(x, y) given implicitly by x2 − 2xy + 4yz + z 3 + 2y − z = 1. To find the possible local extrema of z(x, y), we take the differential and get (2x − 2y)dx + (−2x + 4z + 2)dy + (4y + 3z 2 − 1)dz = 0. The possible local extrema are obtained by the condition dz = zx dx + zy dy = 0 and satisfy the implicit equation. Thus we try to solve 2x − 2y = 0, −2x + 4z + 2 = 0, x2 − 2xy + 4yz + z 3 + 2y − z = 1. From the first two equations, we have x = y = 2z + 1. Substituting into the third, we get z 3 + 4z 2 + 3z + 1 = 1, which has three solutions z = 0, −1, −3. Using x = y = 2z + 1, we find three possible local extrema (1, 1, 0), (−1, −1, −1), (−5, −5, −3) for z(x, y). Example 6.3.10. The continuous function f = x2 − 2xy reaches its maximum and minimum on the compact subset |x| + |y| ≤ 1. From fx = 2x − 2y = 0 and fy = −2x = 0, we find the point (0, 0) to be the only possible local extreme in the interior |x| + |y| < 1. Then we look for the local extrema of f restricted to the boundary |x| + |y| = 1, which may be devided into four (open) line segments and four points. 2 On the segment x + y = 1, 0 < x < 1, we have f = x2 − 2x(1 − x) = 3x − 2x. 1 2 From fx = 6x − 2 = 0 and x + y = 1, we find the possible local extreme , for 3 3

6.3. HIGH ORDER DIFFERENTIATION

325

1 2 on the f on the segment. Similarly, we find a possible local extreme − , − 3 3 segment x + y = −1, −1 < x < 0, and no possible local extrema on the segments x − y = 1, 0 < x < 1 and −x + y = 1, −1 < x < 0. We also need to consider four points at the ends of the four segments, which are (1, 0), (−1, 0), (0, 1), (0, −1). Comparing the values of f at all the possible local extrema

f (0, 0) = 0, f (1/3, 2/3) = f (−1/3, −1/3) = −1/3, f (1, 0) = f (−1, 0) = 1, f (0, 1) = f (0, −1) = 0, we find ±(1/2, 2/3) are the absolute minima and (±1, 0) are the absolute maxima. Exercise 6.3.19. Find possible local extrema. 1. x2 y 3 (a − x − y). 2. x + y + 4 sin x sin y. 3. xyz log(x2 + y 2 + z 2 ). a1 a2 an + + ··· + . x1 x2 xn x1 x2 · · · xn 5. . (a + x1 )(x1 + x2 ) · · · (xn + b)

4. x1 x2 · · · xn +

Exercise 6.3.20. Find possible local extrema of implicitly defined function z. 1. x2 + y 2 + z 2 − 2x + 6z = 6. 2. (x2 + y 2 + z 2 )2 = a2 (x2 + y 2 − z 2 ). 3. x3 + y 3 + z 3 − axyz = 1. Exercise 6.3.21. Find the absolute extrema for the functions on the given domain. 1. x + y + z on {(x, y, z) : x2 + y 2 ≤ z ≤ 1}. 2. sin x + sin y − sin(x + y) on {(x, y) : x ≥ 0, y ≥ 0, x + y ≤ 2π}. Exercise 6.3.22. What is the shortest distance between two straight lines?

Similar to the single variable case. The linear approximation may be used to find potential local extrema. The high order approximations may be used to determine whether the candidates are indeed local extrema. If a function is continuously second order differentiable, then the second order derivative is a quadratic form, called the Hessian of the function hf (~v ) =

X ∂ 2f X ∂ 2f 2 v + 2 vi vj 2 i ∂x ∂x x i j i 1≤i<j≤n 1≤i≤n

∂ 2f 2 ∂ 2f 2 ∂ 2f 2 = v + v + · · · + 2 vn ∂x21 1 ∂x22 2 ∂xn 2 2 ∂ f ∂ f ∂ 2f +2 v1 v2 + 2 v1 v3 + · · · + 2 vn−1 vn . (6.3.15) ∂x1 ∂x2 ∂x1 ∂x3 ∂xn−1 ∂xn The following extends Proposition 2.3.4 in the second order case.

326

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION

Proposition 6.3.6. Suppose f (~x) has second order partial derivatives near ~x0 , such that the first order derivative is continuous near ~x0 and the second order derivative is continuous at ~x0 . Moreover, assume ∇f (~x0 ) = 0. 1. If the Hessian is positive definite, then ~x0 is a local minimum. 2. If the Hessian is negative definite, then ~x0 is a local maximum. 3. If the Hessian is indefinite, then ~x0 is not a local extreme. Proof. The Hessian is a continuous homogeneous function of order 2. Thus it reaches its maximum and minimum on the compact subset {~v : k~v k = 1}. Suppose the Hessian is positive definite. Then the minimum on the subset is reached at some point, with value c > 0. The homogeneity and the fact that hf (~v ) ≥ c for ~v satisfying k~v k = 1 implies that hf (~v ) ≥ ck~v k2 for any ~v . c By Theorem 6.3.3 and ∇f (~x0 ) = 0, for 0 < < , there is δ > 0, such 2 that k∆~xk < δ implies f (~x) − f (~x0 ) − 1 hf (∆~x) ≤ k∆~xk2 . 2 By hf (∆~x) ≥ ck∆~xk2 , we get 1 f (~x) − f (~x0 ) ≥ hf (∆~x) − k∆~xk2 ≥ 0. 2 The proof that the negative definiteness implies local maximum is similar. The argument also shows that if hf (~v ) > 0 for some ~v , then f (~x0 + t~v ) > f (~x0 ) for sufficiently small t 6= 0. Similarly, if hf (w) ~ < 0 for some other w, ~ then f (~x0 + tw) ~ < f (~x0 ) for sufficiently small t 6= 0. Thus if hf is indefinite, then ~x0 is not a local extreme. Example 6.3.11. We try to find the local extrema of f = x3 + y 2 + z 2 + 12xy + 2z. By solving fx = 3x2 + 12y = 0, fy = 2y + 12x = 0, fz = 2z + 2 = 0, we find two possible local extrema ~a = (0, 0, −1) and ~b = (24, −144, −1). The Hessian of f at the two points are hf,~a (u, v, w) = 2v 2 + 2w2 + 24uv, hf,~b (u, v, w) = 144u2 + 2v 2 + 2w2 + 24uv. Since hf,~a (1, 1, 0) = 26 > 0, hf,~a (−1, 1, 0) = −22 < 0, ~a is not a local extreme. Since hf,~b = (12u + v)2 + v 2 + 2w2 > 0 for (u, v, w) 6= ~0, ~b is a local minimum. Example 6.3.12. We study whether the three possibilities in Example 6.3.9 are −2x + 2y 2x − 4z − 2 indeed local extrema. By zx = , zy = and the fact that 2 4y + 3z − 1 4y + 3z 2 − 1 zx = zy = 0 at the three points, we have zxx =

−2 2 −8(x − 2z − 1) , zxy = , zyy = , 4y + 3z 2 − 1 4y + 3z 2 − 1 (4y + 3z 2 − 1)2

6.3. HIGH ORDER DIFFERENTIATION

327

at the three points, and get the Hessians 2 4 h(1,1,0) (u, v) = − u2 + uv, 3 3 2 h(−1,−1,−1) (u, v) = u − 2uv, 1 2 h(−5,−5,−3) (u, v) = − u2 + uv. 3 3 By taking (u, v) = (1, 1) and (1, −1), we see the Hessians are indefinite and none are local extrema. Example 6.3.13. The only possible local extreme for the function f (x, y) = x3 + y 2 is (0, 0), where the Hessian hf (u, v) = 2v 2 . Although the Hessian is non-negative, it is not positive definite. In fact, the Hessian is semi-positive definite, and (0, 0) is not a local extreme. The problem is that h(1, 0) = 0, which corresponds to the fact that the local extreme problem cannot be solved for f (x, 0) = x3 by the quadratic approximation alone. What we have is a local extreme problem for which the quadratic approximation is needed in y-direction and the cubic approximation is needed in x-direction. Exercise 6.3.23. Try your best to determine whether the possible local extrema in Exercises 6.3.19 and 6.3.20 are indeed local maxima or local minima. Exercise 6.3.24. Suppose a two variable function f (x, y) has continuous derivatives up to second order at (x0 , y0 ). Suppose fx = fy = 0 at (x0 , y0 ). 2 > 0, then (x , y ) is a local minimum. 1. Prove that if fxx > 0 and fxx fyy − fxy 0 0 2 > 0, then (x , y ) is a local maximum. 2. Prove that if fxx < 0 and fxx fyy − fxy 0 0 2 < 0, then (x , y ) is not a local 3. Prove that if fxx 6= 0 and fxx fyy − fxy 0 0 extreme.

Exercise 6.3.25. Show that the function  x3 y 3 x2 + y 2 + (x2 + y 2 )3 f (x, y) =  0

if(x, y) 6= (0, 0) if(x, y) = (0, 0)

has first and second order partial derivatives and satisfy ∇f (0, 0) = 0 and hf (u, v) > 0 for (u, v) 6= (0, 0). However, the function does not have a local extreme at (0, 0). The counterexample is not continuous at (0, 0). Can you find a counterexample that is differentiable at (0, 0)? Exercise 6.3.26. Suppose a function f (~x) has continuous derivatives up to third order at ~x0 , such that all the first and second order partial derivatives vanish at ~x0 . Prove that if some third order partial derivative is nonzero, then ~x0 is not a local extreme.

Example 6.3.13 shows that it is possible for the Hessian to be in none of the three cases, and the quadratic approximation is not enough for concluding the local extreme. Like the single variable case, we may consider the finer higher order approximation. However, two problems appear in the multivariable case. The first is that we may need approximations of different

328

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION

orders in different directions, so we may not have a neat criterion like Proposition 2.3.4. The second is that even if we can use approximations of the same order in all directions, there is no general technique such as completing the squares to convert forms of higher order to some kind of standard form, from which we can see the positive or negative definiteness.

6.3.5

Constrained Extreme

Suppose G : Rn → Rm is a map and G(~x0 ) = ~c. A function f (~x) has a local maximum at ~x0 under the constraint G(~x) = ~c if there is δ > 0, such that G(~x) = ~c, k~x − ~x0 k < δ =⇒ f (~x) ≤ f (~x0 ). In other words, the value of f at ~x0 is biggest among the values of f at points near ~x0 and satisfying G(~x) = ~c. Similarly, f has a local minimum at ~x0 under the constraint G(~x) = ~c if there is δ > 0, such that G(~x) = ~c, k~x − ~x0 k < δ =⇒ f (~x) ≥ f (~x0 ). For example, the hottest and the coldest places in the world are the extrema of the temperature function T (x, y, z) under the constraint g(x, y, z) = x2 + y 2 + z 2 = R2 , where R is the radius of the earth. Suppose G : Rn → Rm is continuously differentiable and ~c is a regular value, then G(~x) = ~c defines an (n − m)-dimensional hypersurface S in Rn . Suppose f is a differentiable function defined on an open subset containing S. Then f has a local maximum at ~x0 under the constraint G(~x) = ~c if and only if the restriction of f on any curve in S passing through ~x0 has a local maximum at ~x0 . In particular, this implies that the derivative D~v f in the direction of any tangent vector ~v ∈ T~x0 S must vanish. The tangent vectors ~v ∈ T~x0 S are characterized by the linear equation 0 G (~x0 )(~v ) = ~0. Suppose G = (g1 , g2 , . . . , gm ). Then G0 (~x0 )(~v ) = (∇g1 (~x0 ) · ~v , ∇g2 (~x0 ) · ~v , . . . , ∇gm (~x0 ) · ~v ). Thus D~v f = ∇f (~x0 ) · ~v vanishes for all tangent vectors ~v means ∇g1 (~x0 ) · ~v = ∇g2 (~x0 ) · ~v = · · · = ∇gm (~x0 ) · ~v = 0 =⇒ ∇f (~x0 ) · ~v = 0. Linear algebra tells us that this is equivalent to ∇f (~x0 ) being a linear combination of ∇g1 (~x0 ), ∇g2 (~x0 ), . . . , ∇gm (~x0 ). Proposition 6.3.7. Suppose G = (g1 , g2 , . . . , gm ) : Rn → Rm is continuously differentiable map and G(~x0 ) = ~c is a regular value of G. Suppose a function f is defined near ~x0 and is differentiable at ~x0 . If f has a local extreme at ~x0 under the constraint G(~x) = ~c, then ∇f (~x0 ) is a linear combination of ∇g1 (~x0 ), ∇g2 (~x0 ), . . . , ∇gm (~x0 ): ∇f (~x0 ) = λ1 ∇g1 (~x0 ) + λ2 ∇g2 (~x0 ) + · · · + λm ∇gm (~x0 ).

6.3. HIGH ORDER DIFFERENTIATION

329

The numbers λ1 , λ2 , . . . , λm are called the lagrange multipliers. The condition can also be written as an equality of linear functionals f 0 (~x0 ) = ~λ · G0 (~x0 ). Example 6.3.14. We try to find the possible local extrema of f = xy 2 on the circle g = x2 + y 2 = 3. At local extrema, we have ∇f = (y 2 , 2xy) = λ∇g = λ(2x, 2y). Combined with the fact the point lies in the circle, we get y 2 = 2λx, 2xy = 2λy, x2 + y 2 = 3. If λ = 0, then the first two equations become xy = y 2 = 0, which is √ the same as y = 0. From the third equation, we get two possible local extrema (± 3, 0). If λ 6= 0, then from the first equation, y = 0 would imply x = 0, which contradicts with the third equation. By y 6= 0 and the first equation, we get 2 2 λ = x. Substituting into the first and the third equations, √ we get y = 2x and 2 x = 1. This gives us four possible local extrema (±1, ± 2). Note that since the circle is compact, the function must reach its maximum and minimum. By comparing the values of the function √ at the six possible local extrema, we find f √has absolute maximum 2 at (1, ± 2), and has absolute minimum −2 at (−1, ± 2). Example 6.3.15. The local extrema problem in Example 6.3.9 can also be considered as the local extrema of the function f (x, y, z) = z under the constraint g(x, y, z) = x2 − 2xy + 4yz + z 3 + 2y − z = 1. At local extrema, we have ∇f = (0, 0, 1) = λ∇g = λ(2x − 2y, −2x + 4z + 2, 4y + 2z 2 − 1). This is the same as 0 = λ(2x − 2y), 0 = λ(−2x + 4z + 2), 1 = λ(4y + 2z 2 − 1). The third equality tells us λ 6= 0. Thus the first two equalities become 2x − 2y = 0 and −2x + 4z + 2 = 0. These are the conditions we found in Example 6.3.9. Example 6.3.16. We try to find the possible local extrema of f = xy + yz + zx on the sphere x2 + y 2 + z 2 = 1. By ∇f = (y + z, z + x, x + y) and ∇(x2 + y 2 + z 2 ) = (2x, 2y, 2z), we have y + z = 2λx, z + x = 2λy, x + y = 2λz, x2 + y 2 + z 2 = 1 at local extrema. Adding the first three equalities together, we get (λ − 1)(x + y + z) = 0. If λ = 1, then x = y = z from the first three equations. Substituting into the 1 1 1 2 fourth equation, we get 3x = 1 and two possible extrema ± √ , √ , √ . 3 3 3 If λ 6= 1, then x + y + z = 0 and the first three equations become (2λ + 1)x = (2λ + 1)y = (2λ + 1)z = 0. Since (0, 0, 0) does not satisfy x2 + y 2 + z 2 = 1, we must have 2λ + 1 = 0, and the four equations is equivalent to x + y + z = 0 and x2 + y 2 + z 2 = 1, which is a circle in R3 .

330

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION

Thus the possible local extrema are ±

1 1 1 √ ,√ ,√ 3 3 3

and the points on the

circle x + y + z = 0, x2 + y 2 + z 2 = 1. Note that the sphere is compact and the continuous function f reaches its 1 1 1 and maximum and minimum on the sphere. Since f = 1 at ± √ , √ , √ 3 3 3 1 1 f = [(x + y + z)2 − (x2 + y 2 + z 2 )] = − along the circle x + y + z = 0, 2 2 1 2 2 2 x + y + z = 1, we find the maximum is 1 and the minimum is − . 2 We can also use fx = y + z = 0, fy = z + x = 0, fz = x + y = 0 to find the possible local extreme (0, 0, 0) of f in the interior x2 + y 2 + z 2 < 1 of the ball x2 + y 2 + z 2 ≤ 1. By comparing f (0, 0, 0) = 0 with the maximum and minimum of f on the sphere, we find f reaches its extrema on the ball at boundary points. Example 6.3.17. We try to find the possible local extrema of f = xyz on the circle given by g1 = x + y + z = 0 and g2 = x2 + y 2 + z 2 = 6. The necessary condition is ∇f = (yz, zx, xy) = λ1 ∇g1 + λ2 ∇g2 = (λ1 + 2λ2 x, λ1 + 2λ2 y, λ1 + 2λ2 z). Canceling λ1 from the three equations, we get (x − y)(z + 2λ2 ) = 0, (x − z)(y + 2λ2 ) = 0. If x 6= y and x 6= z, then y = −2λ2 = z. Therefore at least two of x, y, z are equal. If x = y, then the two constraints g1 = 0 and g2 = 6 tell us 2x + z = 0 and 2x2 + z 2 = 6. Canceling z, we get 6x2 = 6 and two possible local extrema (1, 1, −2) and (−1, −1, 2). By assuming x = z or y = z, we get four more possible local extrema (1, −2, 1), (−1, 2, −1), (−2, 1, 1), (2, −1, −1). By evaluating the function at the six possible local extrema, we find the absolute maximum 2 is reached at three points and the absolute minimum −2 is reached at the other three points. Exercise 6.3.27. Find possible local extrema under the constraint. 1. xα y β z γ under the constraint x + y + z = a, x, y, z > 0, where α, β, γ > 0. 2. sin x sin y sin z under the constraint x + y + z =

π . 2

3. x1 + x2 + · · · + xn under the constraint x1 x2 · · · xn = a. 4. x1 x2 · · · xn under the constraint x1 + x2 + · · · + xn = a. 5. x1 x2 · · · xn under the constraint x1 + x2 + · · · + xn = a, x21 + x22 + · · · + x2n = b. 6. xp1 + xp2 + · · · + xpn under the constraint a1 x1 + a2 x2 + · · · + an xn = b, where p > 0. Exercise 6.3.28. Prove inequalities. x1 + x2 + · · · + xn for xi ≥ 0. n x1 + x2 + · · · + xn p xp1 + xp2 + · · · + xpn 2. ≥ for p ≥ 1, xi ≥ 0. What if n n 0 < p < 1? 1.

√ n

x1 x2 · · · xn ≤

6.3. HIGH ORDER DIFFERENTIATION

331

Exercise P 6.3.29. Derive H¨ older inequality in Exercise P 2.2.41 by considering the q function bi of (b1 , b2 , . . . , bn ) under the constraint ai bi = 1. Exercise 6.3.30. Suppose A is a symmetric matrix. Prove that if the quadratic form q(~x) = A~x ·~x reaches maximum or minimum at ~v on the unit sphere {~x : k~xk2 = 1}, then ~v is an eigenvector of A. Exercise 6.3.31. Fix the base triangle and the height of a pyramid. When is the total area of the side faces smallest? Exercise 6.3.32. The intersection of the plane x + y − z = 0 and the ellipse x2 + y 2 + z 2 − xy − yz − zx = 1 is an ellipse centered at the origin. Find the lengths of the two axes of the ellipse.

After identifying the possible local extrema by using the linear approximation, we need to use the quadratic approximation to determine whether the candidates are indeed local extrema. Assume we are in the situation described in Proposition 6.3.7. Further assume that f and G = (g1 , g2 , . . . , gm ) are continuously second order differentiable. Then we have quadratic approximations 1 pf (~x) = f (~x0 ) + ∇f (~x0 ) · ∆~x + hf (∆~x), 2 1 pgi (~x) = gi (~x0 ) + ∇gi (~x0 ) · ∆~x + hgi (∆~x). 2 of f and gi at the possible local extreme ~x0 . The local maximum problem f (~x) < f (~x0 ) for ~x near ~x0 , ~x 6= ~x0 , and gi (~x) = ci for the original function f is approximated by the similar problem pf (~x) < pf (~x0 ) = f (~x0 ) for ~x near ~x0 , ~x 6= ~x0 , and pgi (~x) = ci for the quadratic approximations (note that the constraint is also quadratically approximated). The local minimum problem can be similarly approximated. The quadratic approximation problem is not easy to solve because of the mixture of linear and quadratic terms. To remove the mix, we introduce the function f˜(~x) = f (~x) − ~λ · G(~x) = f (~x) − λ1 g1 (~x) − λ2 g2 (~x) − · · · − λm gm (~x), where λi are the lagrange multipliers obtained from the linear approximation. Since f˜(~x) = f (~x) − ~λ · ~c for those ~x satisfying the constraint G(~x) = ~c, and ~λ·~c is a constant, f has a local maximum at ~x0 under the constraint G(~x) = ~c if and only if f˜ has a local maximum at ~x0 under the constraint G(~x) = ~c. However, the later problem is simpler because the quadratic approximation pf˜ = pf (~x) − λ1 pg1 (~x) − λ2 pg2 (~x) − · · · − λm pgm (~x) = f˜(~x0 ) + (∇f (~x0 ) − λ1 ∇g1 (~x0 ) − λ2 ∇g2 (~x0 ) − · · · − λm ∇gm (~x0 )) · ∆~x 1 + (hf (∆~x) − λ1 hg1 (∆~x) − λ2 hg2 (∆~x) − · · · − λm hgm (∆~x)) 2 1 = f˜(~x0 ) + (hf (∆~x) − λ1 hg1 (∆~x) − λ2 hg2 (∆~x) − · · · − λm hgm (∆~x)) 2

332

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION

has no first order term. Since the quadratic approximation of f˜ at ~x0 contains no first order term, the local maximum problem for f˜ at ~x0 is not changed if the constraint is modified by terms of second or higher order (this remains to be rigorously proved). Thus the local maximum problem becomes pf˜(~x) < pf˜(~x0 ) = f˜(~x0 ) for ~x near ~x0 , ∆~x 6= ~0, and ∇gi (~x0 ) · ∆~x = 0. The problem is easy to solve: First we solve the linear problem gi (~x0 )·∆~x = 0. Then we substitute the solution ∆~x into pf˜(~x) < f˜(~x0 ), which is the same as the quadratic problem hf (∆~x) − λ1 hg1 (∆~x) − λ2 hg2 (∆~x) − · · · − λm hgm (∆~x) < 0. Proposition 6.3.8. Suppose G = (g1 , g2 , . . . , gm ) : Rn → Rm is a continuously second order differentiable map and G(~x0 ) = ~c is a regular value of G. Suppose f is defined near ~x0 and is continuously second order differentiable at ~x0 . Suppose f 0 (~x0 ) = ~λ · G0 (~x0 ) for a vector ~λ ∈ Rm . Denote the quadratic form q = hf − ~λ · hG = hf − λ1 hg1 − λ2 hg2 − · · · − λm hgm . 1. If q(~v ) is positive definite for vectors ~v satisfying G0 (~x0 )(~v ) = ~0, then ~x0 is a local minimum of f under the constraint G(~x) = ~c. 2. If q(~v ) is negative definite for vectors ~v satisfying G0 (~x0 )(~v ) = ~0, then ~x0 is a local maximum of f under the constraint G(~x) = ~c. 3. If q(~v ) is indefinite for vectors ~v satisfying G0 (~x0 )(~v ) = ~0, then ~x0 is not a local extreme of f under the constraint G(~x) = ~c. Proof. Since G(~x) = ~c implies f˜(~x) = f (~x) − ~λ · G(~x) = f (~x) − ~λ · ~c, the extreme problem for f (~x) under the constraint G(~x) = ~c is the same as the extreme problem for f˜(~x) under the constraint G(~x) = ~c. Since ~c is a regular value, the constraint G(~x) = ~c means that some choice of m coordinates ~z of ~x can be written as a continuously differentiable map H(~y ) of the other n − m coordinates ~y . In other words, after rearranging the coordinates if necessary, we may write ~x = (~y , ~z), and ~x = (~y , H(~y )) is exactly the solution of the constraint G(~x) = ~c near ~x0 = (~y0 , ~z0 ). Then the problem is to determine whether ~y0 is a (unconstrained) local extreme of f˜(~y , H(~y )). We will study the problem by substituting the linear approximation of H(~y ) near ~y0 into the quadratic approximation of f˜(~y , ~z) near (~y0 , ~z0 ). In the process, we need to keep track of the remainder terms and show that they do not affect the local extreme problem. The linear approximation of ~z = H(~y ) near ~y0 is ∆~z = ~z − ~z0 = H 0 (~y0 )(∆~y ) + R1 (~y ), R1 (~y ) = o(k∆~y k). The quadratic approximation of f˜(~x) near ~x0 is 1 f˜(~x) = pf˜(~x) + R2 (~x) = f˜(~x0 ) + q(∆~x) + R2 (~x), R2 (~x) = o(k∆~xk2 ), 2

6.3. HIGH ORDER DIFFERENTIATION

333

where pf˜ has been computed before the statement of the theorem. Denote the linear transform L(~u) = (~u, H 0 (~y0 )(~u)). Then ∆~x = (∆~y , ∆~z) = (∆~y , H 0 (~y0 )(∆~y ) + R1 (~y )) = L(∆~y ) + (~0, R1 (~y )) is the solution of G(~x) = ~c. Substituting into f˜(~x), we get 1 f˜(~x) = f˜(~x0 ) + q(L(∆~y ) + (~0, R1 (~y ))) + R2 (~x) 2 1 1 = f˜(~x0 ) + q(L(∆~y )) + q(~0, R1 (~y )) + b(L(∆~y ), (~0, R1 (~y ))) + R2 (~x), 2 2 where b be the symmetric bilinear form that induces q. Since q is quadratic and b is bilinear, the remainder terms may be estimated as follows q(~0, R1 (~y )) = O(k(0, R1 (~y ))k2 ) = O(kR1 (~y )k2 ) = o(k∆~y k2 ), b(L(∆~y ), (~0, R1 (~y )) = O(kL(∆~y )kk(0, R1 (~y ))k) = O(k∆~y kkR1 (~y )k) = o(k∆~y k2 ), R2 (~x) = o(k∆~xk2 ) = o(kL(∆~y )k2 + k(~0, R1 (~y )k2 ) = o(k∆~y k2 ). Therefore under the constraint G(~x) = ~c, we get 1 f˜(~x) = f˜(~y , H(~y )) = f˜(~x0 ) + q(L(∆~y )) + R(~y ), R(~y ) = o(k∆~y k2 ). 2 Then we may apply the proof of Proposition 6.3.6 for the unconstrained extrema to conclude that the nature of the quadratic form q(L(~u)) determines the nature of the local extreme. What are vectors of the form L(~u)? Note that ∆~x = L(∆~y ) is the linear approximation of the solution of the equation G(~x) = ~c at ~x0 . Therefore ∆~x = L(∆~y ) should be the solution of the linear approximation G0 (~x0 )(∆x) = ~0 of the constraint. See the discussion after the implicit function theorem. Thus vectors of the form L(~u) are exactly the solutions of the linear equation G0 (~x0 )(~v ) = 0, and we get the three statements in the proposition. Example 6.3.18. In Example 6.3.14 we found six possible local extrema of f = xy 2 on the circle g = x2 +y 2 = 3. To determine whether they are indeed local extrema, we compute the Hessians hf (u, v) = 4yuv+2xv 2 and hg (u, v) = 2u2 +2v 2 . Together with ∇g = (2x, 2y), we have the following table showing what happens at the six points. λ hf √ − λhg ∇g √ · (u, v) = 0 hf − λhg restricted 0 2 √3v 2 2 √3v = 0 0 2 0 −2 3v −2 √ 3v = 0 0 √ 2 1 −2u + 4√2uv 2u + 2√2v = 0 −6u2 2 1 −2u − 4√ 2uv 2u − 2 √2v = 0 −6u2 −1 2u2 + 4√2uv −2u + 2√2v = 0 6u2 2 −1 2u − 4 2uv −2u − 2 2v = 0 6u2 √ √ Therefore (1, ± 2) are local maxima√and (−1, ± 2) are local minima. However, we still do not know whether (0, ± 3) are local extrema (an approximation more refined than the quadratic one is needed). (x √0 , y0 ) ( √3, 0) (− √ 3, 0) (1, √2) (1, −√2) (−1, √2) (−1, − 2)

334

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION

1 1 1 Example 6.3.19. In Example 6.3.16, we found that ± √ , √ , √ are the pos3 3 3 sible extrema of f = xy + yz + zx on the sphere x2 + y 2 + z 2 = 1. To determine whether they are indeed local extrema, we compute the Hessians hf (u, v, w) = 2uv + 2vw + 2wu and hx2 +y2 +z 2 (u, v, w) = 2u2 + 2v 2 + 2w2 . At the two points, we have λ = 1 and hf − λhx2 +y2 +z 2 = −2u2 − 2v 2 − 2w2 + 2uv + 2vw + 2wu. By ∇(x2 + y 2 + z 2 ) = (2x, 2y, 2z), we need to consider the sign of hf − λhx2 +y2 +z 2 for those (u, v, w) satisfying 1 1 1 ± √ u + √ v + √ w = 0. 3 3 3 Substituting the solution w = −u − v, we get 1 2 9 2 hf − λhx2 +y2 +z 2 = −6u − 6v − 6uv = −6 u + v − v , 2 2 1 1 1 √ √ √ which is negative definite. Thus ± , , are local maxima. 3 3 3 Exercise 6.3.33. Determine whether the possible local extrema in Exercise 6.3.27 are indeed local maxima or local minima. 2

6.3.6

2

Additional Exercise

Convex Subset and Convex Function A subset A ⊂ Rn is convex if ~x, ~y ∈ A implies the straight line connecting ~x and ~y still lies in A. In other words, (1 − λ)~x + λ~y ∈ A for any 0 < λ < 1. A function f (~x) defined on a convex subset A is convex if 0 < λ < 1 =⇒ (1 − λ)f (~x) + λf (~y ) ≥ f ((1 − λ)~x + λ~y ).

(6.3.16)

Exercise 6.3.34. For any λ1 , λ2 , . . . , λk satisfying λ1 + λ2 + · · · + λk = 1 and 0 ≤ λi ≤ 1, extend Jensen inequality in Exercise 2.3.40 to multivariable convex functions f (λ1 ~x1 + λ2 ~x2 + · · · + λk ~xk ) ≤ λ1 f (~x1 ) + λ2 f (~x2 ) + · · · + λk f (~xk ).

(6.3.17)

Exercise 6.3.35. Prove that a function f (~x) is convex if and only if its restriction on any straight line is convex. Exercise 6.3.36. Prove that a function f (~x) is convex if and only if for any linear function L(~x) = a+b1 x1 +b2 x2 +· · ·+bn xn , the subset {~x : f (~x) ≤ L(~x)} is convex. Exercise 6.3.37. Extend Proposition 2.3.5 to multivariable: A function f (~x) defined on an open convex subset is convex if and only if for any ~z in the subset, there is a linear function K(~x), such that K(~z) = f (~z) and K(~x) ≤ f (~x). Exercise 6.3.38. Prove that any convex function on an open convex subset is continuous.

6.3. HIGH ORDER DIFFERENTIATION

335

Exercise 6.3.39. Prove that a second order continuously differentiable function f (~x) on an open convex subset is convex if and only if the Hessian is semipositive definite: hf (~v ) ≥ 0 for any ~v .

Snell’s Law Fermat’s principle says that light travels along the path of least traveling time. Because of the principle, the direction of light changes as it enter from air to water, as shown in the picture. The phenomenon is called refraction. ........ .. ........ .. ........ .. air, speed u ........ ........ . ........ α ... ........... ....................................................................................................................................................... .. .... ... ....... . β ...... ..... water, speed v ... ..... .. .... . Figure 6.5: Snell’s Law Exercise 6.3.40. Suppose a light travels at the respective speed of u and v in the air and the water. Use Fermat’s principle to prove that the angle α of incidence u v and the angle β of refraction are related by Snell’s law: = . sin α sin β

Laplacian The Laplacian of a function f (~x) is ∆f =

∂ 2f ∂ 2f ∂ 2f + + · · · + . ∂x21 ∂x22 ∂x2n

(6.3.18)

Functions satisfying the Laplace equation ∆f = 0 are called harmonic. Exercise 6.3.41. Prove that ∆(f + g) = ∆f + ∆g and ∆(f g) = g∆f + f ∆g + 2∇f · ∇g. Exercise 6.3.42. Suppose a function f (~x) = u(r) depends only on the Euclidean norm r = k~xk2 of the vector. Prove that ∆f = u00 (r) + (n − 1)r−1 u0 (r) and find the condition for the function to be harmonic. Exercise 6.3.43. Derive the Laplacian in R2 ∆f =

∂2f 1 ∂f 1 ∂2f + + . ∂r2 r ∂r r2 ∂θ2

(6.3.19)

in the polar coordinates. Also derive the Laplacian in R3 ∆f =

∂2f 2 ∂f 1 ∂2f cos φ ∂f 1 ∂2f + + + + . ∂r2 r ∂r r2 ∂θ2 r2 sin φ ∂θ r2 sin2 φ ∂θ2

in the spherical coordinates.

(6.3.20)

336

CHAPTER 6. MULTIVARIABLE DIFFERENTIATION

Exercise 6.3.44. Let ~x = F (~y ) : Rn → Rn be an orthogonal change of variable (see Exercise (6.2.15)). Let |J| = k~xy1 k2 k~xy2 k2 · · · k~xyn k2 be the absolute value of the determinant of the Jacobian matrix. Prove the Laplacian is 1 ∂ |J| ∂f ∂ |J| ∂f ∂ |J| ∂f ∆f = + + ··· + . |J| ∂y1 k~xy1 k22 ∂y1 ∂y2 k~xy2 k22 ∂y2 ∂yn k~xyn k22 ∂yn (6.3.21) In particular, if the change of variable is orthonormal (i.e., k~xyi k2 = 1), then the Laplacian is not changed.

Euler Equation Exercise 6.3.45. Suppose a function f is differentiable away from ~0. Prove that f is homogeneous of degree α if and only if it satisfies the Euler equation x1 fx1 + x2 fx2 + · · · + xn fxn = αf.

(6.3.22)

What about a multihomogeneous function? Exercise 6.3.46. Extend the characterization of homogeneous function (6.3.22) to high order derivatives X 1≤i1 ,i2 ,...,ik

x i1 x i2 · · · x ik

∂kf = α(α − 1) · · · (α − k + 1)f. ∂xi1 ∂xi2 · · · ∂xik

(6.3.23)

Exercise 6.3.47. Prove that a change of variable ~x = H(~z) : Rn → Rn preserves P xi fxi z1 (f ◦ H)z1 + z2 (f ◦ H)z2 + · · · + zn (f ◦ H)zn = x1 fx1 + x2 fx2 + · · · + xn fxn . for any differentiable function f (~x) if and only if it preserves the scaling H(c~z) = cH(~z) for any c > 0.

Chapter 7 Multivariable Integration

337

338

CHAPTER 7. MULTIVARIABLE INTEGRATION

7.1

Riemann Integration

The integration of multivariable functions is closely tied to the concept of volume in two ways. First we can only integrate functions defined on subsets with volumes. Second, the integral of an n-variable function is the volume of an (n + 1)-dimensional subset (the region enclosed by the graph of the function). The connection makes the theory of integration equivalent to the theory of volume. The theory of volume can be developed from the obvious positivity, additivity, and translation invariant properties. The finite additivity leads to the Jordan volume theory and the Riemann integral. In the future, the countable additivity will lead to the Lebesgue measure theory and the Lebesgue integral.

7.1.1

Volume in Euclidean Space

It is easy to imagine how to extend the Riemann integration to multivariable functions on rectangles I = [a1 , b1 ] × [a2 , b2 ] × · · · × [an , bn ] ⊂ Rn .

(7.1.1)

However, integration on rectangles alone is too restrictive. The subsets on which the Riemann integration can be defined should be subsets with volumes. After all, the integral of the constant function 1 is supposed to be the volume of the underlying subset. Thus a theory of volume needs to be established. Let A ⊂ Rn be a bounded subset. Then A ⊂ [−M, M ]n for some big M . By taking a partition of [−M, M ] for each coordinate, we get a partition P of [−M, M ]n consisting of rectangles that intersect only along boundaries. The size of the partition can be measured by kP k = max{d(I) : I ∈ P }, d(I) = max{b1 − a1 , b2 − a2 , . . . , bn − an }. For any partition P , the unions of rectangles − A+ P = ∪{I : I ∈ P, I ∩ A 6= ∅}, AP = ∪{I : I ∈ P, I ⊂ A}

(7.1.2)

form the outer and the inner approximations of A, satisfying − A+ P ⊃ A ⊃ AP .

Define the volume of the rectangle (7.1.1) by µ([a1 , b1 ] × [a2 , b2 ] × · · · × [an , bn ]) = (b1 − a1 )(b2 − a2 ) · · · (bn − an ). Then define µ(A+ P) =

X

µ(I), µ(A− P) =

X

µ(I).

I∈P,I⊂A

I∈P,I∩A6=∅

We have − µ(A+ P ) − µ(AP ) =

X

µ(I),

I∈P,I∩A6=∅,I−A6=∅ − and the volume of A is supposed to be between µ(A+ P ) and µ(AP ).

(7.1.3)

7.1. RIEMANN INTEGRATION

339

Definition 7.1.1. A subset A ⊂ Rn has volume (or is Jordan measurable) if for any > 0, there is δ > 0, such that − kP k < δ =⇒ µ(A+ P ) − µ(AP ) < .

The volume (or the Jordan measure) of A is the unique number µ(A) satis− fying µ(A+ P ) ≥ µ(A) ≥ µ(AP ) for all partitions. Example 7.1.1. Consider A = (a, b) ⊂ R. For any > 0, take P to be a < a + < − b − < b. Then µ(A+ P ) = µ([a, b]) = b − a and µ(AP ) = µ([a + , b − ]) = b − a − 2. This suggests that A has volume (or length) b − a. Note that we cannot yet conclude that A has volume according to the definition, − because we have not verified that µ(A+ P ) − µ(AP ) is small for all P with small kP k. The rigorous reason is given by Proposition 7.1.2. 1 , take P to be 0 < Consider B = {n−1 : n ∈ N}. For any N and < 2N 2 −1 −1 −1 −1 N + < (N − 1) − < (N − 1) + < · · · < 2 − < 2−1 + < 1 − < 1. −1 + 2N < 2N −1 and µ(A− ) = 0. Thus A should have volume Then µ(A+ P) = N P 0. Example 7.1.2. Let A be the set of rational numbers in [0, 1]. Then any interval [a, b] with a < b contains points inside and outside A. Therefore for any partition − P of [0, 1], we have µ(A+ P ) = 1 and µ(AP ) = 0. We conclude that A has no volume. Example 7.1.3. Consider a non-negative bounded single variable function f (x) on [a, b]. The subset G(f ) = {(x, y) : a ≤ x ≤ b, 0 ≤ y ≤ f (x)} ⊂ R2 is the subset under the graph of the function. For any partition P : a = x0 < x1 < · · · < xk = b of [a, b] and any partition Q : 0 = y0 < y1 < · · · < yl of the y-axis, we have − G(f )+ P ×Q = ∪[xi−1 , xi ] × [0, yq ], G(f )P ×Q = ∪[xi−1 , xi ] × [0, ys ].

where yq is the smallest yj > f on [xi−1 , xi ] and ys is the biggest yj ≤ f on [xi−1 , xi ]. The description of yq and ys tells us yq ≥

sup f (x) ≥ yq−1 ≥ yq − kQk, ys ≤ [xi−1 ,xi ]

inf [xi−1 ,xi ]

f (x) ≤ ys+1 ≤ ys + kQk,

P yq ∆xi satisfies Therefore µ G(f )+ P ×Q = X + µ G(f )+ ≥ sup f (x)∆x ≥ µ G(f ) i P ×Q P ×Q − kQk(b − a). [xi−1 ,xi ]

Similarly, we have X µ G(f )− P ×Q ≤

inf [xi−1 ,xi ]

f (x)∆xi ≤ µ G(f )− P ×Q + kQk(b − a).

Thus X − − µ G(f ) − ω (f )∆x µ G(f )+ ≤ 2kQk(b − a). i [xi−1 ,xi ] P ×Q P ×Q This implies that G(f ) has volume (or area) if and only if f is Riemann integrable. Z b Moreover, the volume of G(f ) is f (x)dx. a

340

CHAPTER 7. MULTIVARIABLE INTEGRATION

The following gives some criteria for subsets to have volume. Recall that ~x is a boundary point of A if and only if for any > 0, there are ~a ∈ A and ~b 6∈ A, such that k~x − ~ak < and k~x − ~bk < . Proposition 7.1.2. The following are equivalent for a bounded subset A. 1. A has volume. − 2. For any > 0, there is a partition P , such that µ(A+ P ) − µ(AP ) < .

3. For any > 0, there are rectangles I1 , I2 , . . . , Ik , such that ∂A ⊂ I1 ∪ I2 ∪ · · · ∪ Ik , µ(I1 ) + µ(I2 ) + · · · + µ(Ik ) < .

(7.1.4)

Proof. The first property clearly implies the second. Now assume the second property holds. A boundary point ~x of A lies either on the boundary of some I ∈ P , or inside the interior of some I ∈ P . If ~x is in the interior of I, then I contains points inside as well as outside A. In other words, I appears in the outer approximation A+ P but not in the − inner approximation AP . We conclude that ∂A ⊂ I1 ∪ I2 ∪ · · · ∪ Ik , where each Ii is either part of the boundary of some I ∈ P or Ii ∈ P appears − in A+ P but not in AP . Since the boundary parts of rectangles have volume 0, we have − µ(I1 ) + µ(I2 ) + · · · + µ(Ik ) ≤ µ(A+ P ) − µ(AP ) < . This proves the third property. Finally we assume the third property. For I in (7.1.1) and δ > 0, denote Y I δ = {~y : k~y − ~xk∞ ≤ δ for some ~x ∈ I} = [ai − δ, bi + δ]. Then for sufficiently small δ, we still have µ(I1δ ) + µ(I2δ ) + · · · + µ(Ikδ ) < . Suppose P is a partition with kP k < δ. For any rectangle I ∈ P that appears − in A+ a ∈ I ∩ A and ~b ∈ I − A. Then some point P but not in AP , we can find ~ on the straight line connecting ~a and ~b is on the boundary ∂A of A (see Exercise 5.1.23). Denote the point by ~c = (1 − τ )~a + τ~b, 0 ≤ τ ≤ 1. Then for any ~x ∈ I, we have k~x − ~ck∞ = k(1 − τ )(~x − ~a) + τ (~x − ~b)k∞ ≤ (1 − τ )k~x − ~ak∞ + τ k~x − ~bk∞ ≤ (1 − τ )kP k + τ kP k < δ. By ~c ∈ ∂A ⊂ I1 ∪I2 ∪· · ·∪Ik , the estimation above implies ~x ∈ I1δ ∪I2δ ∪· · ·∪Ikδ . Therefore I ⊂ I1δ ∪ I2δ ∪ · · · ∪ Ikδ , and X − µ(A+ µ(I) ≤ µ(I1δ ) + µ(I2δ ) + · · · + µ(Ikδ ) < . P ) − µ(AP ) = I∩A6=∅,I−A6=∅

7.1. RIEMANN INTEGRATION

341

This proves the first property. We remark that, in the proof above, we did not use the fact that the subsets I in P are rectangles. The observation will be used in the proof of Proposition 7.1.4. Define the outer volume and the inner volume − − µ+ (A) = inf µ(A+ P ), µ (A) = sup µ(AP ). P

P

+ − − Since µ(A+ P ) ≥ µ(AQ ) ≥ µ(AQ ) ≥ µ(AP ) for a refinement Q of P , and any two partitions have a common refinement, we can easily get µ+ (A) ≥ µ− (A). By Proposition 7.1.2, A has volume if and only if µ+ (A) = µ− (A). The outer and inner volumes are clearly monotone

A ⊂ B =⇒ µ+ (A) ≤ µ+ (B), µ− (A) ≤ µ− (B). Moreover, the outer volume is clearly subadditive µ+ (A1 ∪ A2 ∪ · · · ∪ Ak ) ≤ µ+ (A1 ) + µ+ (A2 ) + · · · + µ+ (Ak ).

(7.1.5)

Consequently, the volume is monotone and subadditive. Proposition 7.1.3. A subset has volume if and only if it boundary has volume 0. Moreover, if A and B have volumes, then A ∪ B, A ∩ B and A − B have volumes and satisfy the additive property µ(A ∪ B) = µ(A) + µ(B) − µ(A ∩ B).

(7.1.6)

The additivity (7.1.6) means that the subadditivity property (7.1.5) becomes an equality when the subsets have volumes and are disjoint. Proof. Suppose A has volume. By the criterion (7.1.4) in Proposition 7.1.2, for any > 0, we have µ+ (∂A) ≤ µ+ (I1 ∪ I2 ∪ · · · ∪ Ik ) ≤ µ+ (I1 ) + µ+ (I2 ) + · · · + µ+ (Ik ) = µ(I1 ) + µ(I2 ) + · · · + µ(Ik ) < . Therefore we get µ+ (∂A) = 0. Then by µ− (∂A) ≤ µ+ (∂A), we see that ∂A has volume 0. Conversely, if ∂A has volume 0, then we can derive the third property in Proposition 7.1.2 from the definition of µ+ (∂A) = 0. This completes the proof that A has volume if and only if ∂A has volume 0. The boundaries of A ∪ B, A ∩ B and A − B are all contained in ∂A ∪ ∂B. If A and B have volumes, then by µ+ (∂A ∪ ∂B) ≤ µ+ (∂A) + µ+ (∂B) = 0, we see that all the boundaries have volume 0, so that the three subsets have volumes. Suppose A and B are disjoint. Then for any partition P , we have A− P ∪ − − − − − − BP ⊂ (A ∪ B)P , and AP , BP are disjoint. This leads to µ (A) + µ (B) ≤ µ− (A∪B). On the other hand, we always have µ+ (A)+µ+ (B) ≥ µ+ (A∪B) by

342

CHAPTER 7. MULTIVARIABLE INTEGRATION

the subadditivity. Therefore in case A and B have volumes, we get µ(A∪B) = µ(A) + µ(B). In general, suppose A and B have volumes. Then A − B, B − A, and A ∩ B have volumes and are disjoint. Then (7.1.6) follows from µ(A ∪ B) = µ(A − B) + µ(B − A) + µ(A ∩ B), µ(A) = µ(A − B) + µ(A ∩ B), µ(B) = µ(B − A) + µ(A ∩ B).

Exercise 7.1.1. Find subsets A, B of R, such that A and B do not have volumes, but A ∪ B, A ∩ B have volumes. Is it possible for A − B and B − A also not to have volumes? ¯ Prove that B also Exercise 7.1.2. Suppose A has volume and A − ∂A ⊂ B ⊂ A. has volume and µ(B) = µ(A). In particular, the interior A − ∂A and the closure A¯ have the same volume as A. Conversely, construct a subset A such that both ¯ so that A does not have A − ∂A and A¯ have volumes, but µ(A − ∂A) 6= µ(A), volume. Exercise 7.1.3. Suppose A ⊂ Rm and B ⊂ Rn are bounded. 1. Prove that µ+ (A × B) = µ+ (A)µ+ (B). 2. Prove that if A has volume 0, then A × B has volume 0. 3. Prove that if µ+ (A) 6= 0 and µ+ (B) 6= 0, then A × B has volume if and only if A and B have volumes. 4. Prove that if A and B have volumes, then µ(A × B) = µ(A)µ(B). Exercise 7.1.4. Prove that µ+ (∂A) = µ+ (A) − µ− (A). This gives another proof that A has volume if and only if ∂A has volume 0. Exercise 7.1.5. Determine whether the subsets have volumes and compute the volumes. 1. {m−1 + n−1 : m, n ∈ N}. 2. [0, 1] − Q. 3. {(x, x2 ) : 0 ≤ x ≤ 1}. 4. {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ x2 }. 5. {(x, y) : 0 ≤ x ≤ y 2 , 0 ≤ y ≤ 1}. 6. {(n−1 , y) : n ∈ N, 0 < y < 1 + n−1 }. 7. {(x, y) : |x| < 1, |y| < 1, x ∈ Q or y ∈ Q}. 8. {(x, y) : |x| < 1, |y| < 1, x 6= y}. Exercise 7.1.6. Prove that any n-gon in R2 has volume.

7.1. RIEMANN INTEGRATION

343

So far the partitions are made of rectangles. To allow more flexibility, define a general partition of a subset B to be a finite collection P of subsets such that 1. Each I ∈ P has volume. 2. µ(I ∩ J) = 0 for I 6= J in P . 3. B = ∪I∈P I. The subset B must have volume and µ(B) = the partition by

P

I∈P

µ(I). Extend the size of

kP k = max{d(I) : I ∈ P }, d(I) = sup{k~x − ~y k∞ : ~x, ~y ∈ I}. − For a subset A ⊂ B, the outer and the inner approximations A+ P , AP may also be defined by (7.1.2).

Proposition 7.1.4. Suppose B has volume and A ⊂ B is a subset. The following are equivalent. 1. A has volume. 2. For any > 0, there is δ > 0, such that kP k < δ for a general partition − P implies µ(A+ P ) − µ(AP ) < . 3. For any > 0, there is a general partition P of B, such that µ(A+ P) − − µ(AP ) < . 4. For any > 0, there are subsets A+ and A− with volumes, such that A− ⊂ A ⊂ A+ and µ(A+ ) − µ(A− ) < . Proof. To prove the first property implies the second, we use the last part of the proof of Proposition 7.1.2. As remarked at the end of the proof, that part of the proof does not make use of the assumption that the subsets in P are rectangles. Therefore the proof can be applied to any general partition P . The result is that if A has the third property in Proposition 7.1.2, then for any > 0, there is δ > 0, such that − kP k < δ =⇒ µ(A+ P ) − µ(AP ) <

for general partitions P . In other words, the third property in Proposition 7.1.2 implies the second property in the current theorem. Since A having volume implies the third property in Proposition 7.1.2, we conclude that the first property implies the second. The second property also implies the first by restricting to rectangular partitions. Therefore the first two properties are equivalent. The second property clearly implies the third, and the third implies the − − fourth by taking A+ = A+ P and A = AP . Finally, assume the fourth property. Since A+ and A− have volumes, there is δ > 0, such that if a partition P satisfies kP k < δ, then + − − + − − µ((A+ )+ P ) − µ((A )P ) < , µ((A )P ) − µ((A )P ) < .

344

CHAPTER 7. MULTIVARIABLE INTEGRATION

+ + + − + By A+ P ⊂ (A )P and (A )P ⊂ A , we get + + + + − µ(A+ P ) − µ(A ) ≤ µ((A )P ) − µ((A )P ) < .

Similarly, we get µ(A− ) − µ(A− P ) < . Then − + − + + − − µ(A+ P )−µ(AP ) = (µ(AP )−µ(A ))+(µ(A )−µ(A ))+(µ(A )−µ(AP )) < 3.

This shows that A has volume. Exercise 7.1.7. Prove that the disks in R2 have volumes by comparing the inscribed and circumscribed regular n-gons.

7.1.2

Riemann Sum

Let f (~x) be a function defined on a subset A ⊂ Rn with volume. For any general partition P of A and choices ~x∗I ∈ I, define the Riemann sum X S(P, f ) = f (~x∗I )µ(I). (7.1.7) I∈P

Definition 7.1.5. A function f (~x) is Riemann integrable on a subset A with volume, with integral J, if for any > 0, there is δ > 0, such that kP k < δ =⇒ |S(P, f ) − J| < . We denote

Z f (~x)dµ = J. A

Sometimes we also use dV (dA for n = 2) or dx1 dx2 · · · dxn in place of dµ. Riemann integrable functions might be unbounded if µ(A) = 0 (or some “separated part” of A has volume 0). To avoid complicated statements, we will always insist Riemann integrable functions to be bounded. For a bounded function f , define the oscillation ωA (f ) = sup f (~x) − inf f (~x) ~ x∈A

~ x∈A

(7.1.8)

as in the single variable case. Then we have ~c ∈ A =⇒ |f (~c)µ(A) − S(P, f )| ≤ ωA (f )µ(A).

(7.1.9)

For a function f defined on A ⊂ Rn , the subset (y ∈ [f (~x), 0] if f (~x) ≤ 0) GA (f ) = {(~x, y) : ~x ∈ A, y ∈ [0, f (~x)]} ⊂ Rn+1

(7.1.10)

is the region between the graph of the function and the ~x-plane. Proposition 7.1.6. Suppose f is a bounded function on a subset A with volume. Then the following are equivalent.

7.1. RIEMANN INTEGRATION

345

1. f is Riemann integrable on A. 2. For any > P0, there is δ > 0, such that kP k < δ for a general partition P implies I∈P ωI (f )µ(I) < . P 3. For any > 0, there is a general partition P of A, such that I∈P ωI (f )µ(I) < . 4. The subset GA (f ) has volume in Rn+1 . Proof. The proof that the first two properties are equivalent is the same as the single variable case. The second property clearly implies the third. Let P be a general partition of A. Let |f | < M and Q : − M = y0 < y1 < y2 < · · · < yk = M be a partition of the y-axis. Then − GA (f )+ P ×Q = ∪I∈P I × [yp , yq ], GA (f )P ×Q = ∪I∈P I × [yr , ys ],

where [yp , yq ] and [yr , ys ] are determined as follows. If f ≥ 0 on I, then yp is the biggest yj < 0 and yq is the smallest yj ≥ f on I. Similar to the discussion in Example 7.1.3, this implies yq ≥ sup f ≥ yq−1 ≥ yq − kQk, −kQk ≤ yp < 0 ≤ yp+1 ≤ kQk. I

Moreover, [yr , ys ] is the biggest interval satisfying 0 ≤ yr < ys ≤ f on I. Note that [yr , ys ] exists if and only if f ≥ yp+2 on I. If [yr , ys ] exists, then yr = yp+1 , and similar to the discussion in Example 7.1.3, we have ys ≤ inf f ≤ ys+1 ≤ ys + kQk. I

Therefore |µ(I × [yp , yq ]) − µ(I × [yr , ys ]) − ωI (f )µ(I)| =|(yq − yp ) − (ys − yr ) − ωI (f )|µ(I) ≤(|yq − sup f | + |ys − inf f | + |yp+1 − yp |)µ(I) ≤ 3kQkµ(I). I

I

If [yr , ys ] does not exist, then 0 ≤ inf f ≤ yp+2 < yp+2 − yp ≤ 2kQk, I

and I × [yr , ys ] = ∅ in the formula for GA (f )− P ×Q . Therefore |µ(I × [yp , yq ]) − µ(I × [yr , ys ]) − ωI (f )µ(I)| =|yq − yp − ωI (f )|µ(I) ≤(|yq − sup f | + |yp | + | inf f |)µ(I) ≤ 4kQkµ(I). I

I

If f ≤ 0 on I, then similar argument can be carried out, and we also get |µ(I × [yp , yq ]) − µ(I × [yr , ys ]) − ωI (f )µ(I)| ≤ 4kQkµ(I).

346

CHAPTER 7. MULTIVARIABLE INTEGRATION

If f can be positive and negative on I, then yp is the biggest yj < f on I, yq is the smallest yj > f on I, and I × [yr , ys ] = ∅. We then have yp ≤ inf f ≤ yp+1 ≤ yp + kQk, yq ≥ sup f ≥ yq−1 ≥ yq − kQk, I

I

and |µ(I × [yp , yq ]) − µ(I × [yr , ys ]) − ωI (f )µ(I)| =|yq − yp − ωI (f )|µ(I) ≤(|yq − sup f | + |yp − inf f |)µ(I) ≤ 2kQkµ(I). I

I

Combining the estimations of the three cases together, we have X + − ωI (f )µ(I) µ GA (f )P ×Q − µ GA (f )P ×Q − I X ≤ |(yq − yp ) − (ys − yr ) − ωI (f )|µ(I) ≤ 4kQkµ(A). I∈P

By taking kQk sufficiently small, and with the help of Proposition 7.1.4, we find the second, the third, and the fourth statements are equivalent. Now assume P f is ∗integrable on A and we compare+ the Riemann sum S(P, |f |) = I∈P |f (~xI )|µ(I) with the volume µ(GA (f )P ×Q ). In case f ≥ 0 on I, we have 0 ≤ (yq − yp ) − |f (~x∗I )| ≤ sup f + kQk + kQk − f (~x∗I ) ≤ ωI (f ) + 2kQk. I

In case f ≤ 0 on I, we similarly have 0 ≤ (yq − yp ) − |f (~x∗I )| ≤ ωI (f ) + 2kQk. In the remaining case, we have 0 ≤ (yq − yp ) − |f (~x∗I )| ≤ (sup f + kQk) − (inf f − kQk) ≤ ωI (f ) + 2kQk. I

I

Therefore µ(GA (f )+

P ×Q )

X − S(P, |f |) ≤ |(yq − yp ) − |f (~x∗I )||µ(I) I∈P

≤

X

ωI (f )µ(I) + 2kQkµ(A).

I

Since the right side can be arbitrarily small, we conclude that Z µ(GA (f )) = |f |dA. A

(7.1.11)

7.1. RIEMANN INTEGRATION

347

Example 7.1.4. Consider the characteristic function of a bounded subset A ( 1 if ~x ∈ A χA (~x) = . 0 if ~x ∈ /A Let A ⊂ (−M, M )n and let P be a partition of [−M, M ]n . Then the oscillation ( 1 if I ∩ A 6= ∅ and I − A 6= ∅ ωI (χA ) = , 0 if I ⊂ A or I ∩ A = ∅ P − and we get ωI (χA )µ(I) = µ(A+ P ) − µ(AP ). Thereofre χA is Riemann integrable if and only if A has volume. Moreover, by taking x∗I ∈ I ∩ A for all I satisfying I ∩ A 6= ∅ and I − A 6= ∅, we get S(P, f ) = µ(A+ x∗I ∈ I − A instead, we get S(P, f ) = µ(A− P ). By taking P ). Z Either formula implies µ(A) =

χA dµ in case A has volume.

Exercise 7.1.8. Prove Zthat if µ(A) = 0, then any bounded function is Riemann integrable on A, with f dµ = 0. A

Exercise 7.1.9. The Thomae function in Examples 1.4.2 and 3.1.5 may be extended 1 1 + if x and y are rational numbers with to two variables by R(x, y) = qx qy irreducible denominators qx and qy and R(x, y) = 0 otherwise. Prove that R(x, y) is Riemann integrable on any subset with volume, and the integral is 0. Exercise 7.1.10. Suppose f is a bounded function on a subset A with volume. ˚A (f ) = {(~x, y) : ~x ∈ A, y ∈ [0, f (~x))} has Prove that f is integrable if and only if G volume. Moreover, the integrability of f implies HA (f ) = {(~x, f (~x)) : ~x ∈ A} has volume zero, but the converse is not true. Exercise 7.1.11. Suppose f ≤ gZ ≤ h on a subset A with volume. Suppose f and h Z are integrable, with f dµ = hdµ. Prove that g is also integrable. A

A

Exercise 7.1.12. Prove that if f is integrable on A, and B ⊂ A has volume, then f is integrable on B.

For a map F : A ⊂ Rn → Rm on a subset A with volume, we may also introduce the Riemann sum X S(P, F ) = µ(I)F (x∗I ). I∈P

The map is Riemann integrable on a subset A with volume, with integral J~ ∈ Rm , if for any > 0, there is δ > 0, such that ~ < . kP k < δ =⇒ kS(P, F ) − Jk Then we denote

Z

~ F (~x)dµ = J.

A

It is easy to see that F is integrable if and only if each coordinate is integrable.

348

CHAPTER 7. MULTIVARIABLE INTEGRATION

Exercise 7.1.13. Define the oscillation of a map F : Rn → Rm to be ωA (F ) = sup kF (~x) − F (~y )k. ~ x,~ y ∈A

Prove that a map F is Riemann integrable if and only if for any > 0, there is δ > 0, such that kP k P < δ for a general partition P of A implies the Riemann sum of the oscillation I∈P ωI (F )µ(I) < . Moreover,Pthe integrability is also equivalent to the existence of one partition P satisfying I∈P ωI (F )µ(I) < . Exercise 7.1.14. For a Riemann integrable map, prove

Z

Z

F (~x)dµ ≤ kF (~x)kdµ ≤ sup kF (~x)k µ(A).

A

7.1.3

~ x∈A

A

Properties of Integration

The following property extends Proposition 3.1.4. Proposition 7.1.7. Bounded and continuous maps are Riemann integrable on subsets with volumes. Proof. Suppose f is continuous on a subset A with volume. For any > 0, − there is a partition P , such that µ(A − A− P ) < . The subset K = AP is compact because it is a union of finitely many closed rectangles. Therefore f is uniformly continuous on K. This means that there is δ > 0, such that ~x, ~y ∈ K and k~x − ~y k < δ implies |f (~x) − f (~y )| < . By further refine the rectangles in P such that kP k < δ, we get ωI (f ) < for any I ∈ P satisfying I ⊂ A. Now consider the general partition Q = {I ∈ P : I ⊂ A} ∪ {A − K} of A. If |f | < b on A, then with respect to the partition, we have X X ωI (f )µ(I) ωJ (f )µ(J) = ωA−K (f )µ(A − K) + I∈P,I⊂A

J∈Q

≤ 2b +

X

µ(I) ≤ 2b + µ(A).

I∈P,I⊂A

This verifies the criterion for the integrability of f on A. Proposition 3.1.5 cannot be extended in general because there is no monotonicity concept for multivariable functions. With basically the same proof, Proposition 3.1.6 can be extended. Proposition 7.1.8. Suppose F : Rn → Rm is a Riemann integrable map on a subset A with volume. Suppose the values of F lie in a compact subset K, and Φ is a continuous map on K. Then the composition Φ ◦ F is Riemann integrable on A. Propositions 3.1.7 and 3.1.8 can also be extended, by the same argument. Proposition 7.1.9. Suppose f and g are Riemann integrable on a subset A with volume.

7.1. RIEMANN INTEGRATION

349

1. The linear combination af + bg and Z the product Zf g are Riemann inteZ grable on A, and (af + bg)dµ = a f dµ + b gdµ. A

Z 2. If f ≤ g, then

A

Z f dµ ≤

A

A

A

Z Z gdµ. Moreover, f dµ ≤ |f |dµ. A

A

Denote the “upper half” and the “lower half” of Rn+1 H + = {(~x, y) : y ≥ 0} = Rn × [0, +∞), H − = {(~x, y) : y ≤ 0} = Rn × (−∞, 0]. For any subset G ⊂ Rn+1 with volume, the intersections G ∩ H + and G ∩ H − have volumes. For the special case G = GA (f ), we have GA (f ) ∩ H + = GA (max{f, 0}), GA (f ) ∩ H − = GA (min{f, 0}). If f is Riemann integrable on A, then both intersections have volumes, so that max{f, 0} and min{f, 0} are Riemann integrable on A. Moreover, by (7.1.11), we have Z Z + min{f, 0}dµ = −µ(GA (f ) ∩ H − ). max{f, 0}dµ = µ(GA (f ) ∩ H ), A

A

Adding the two equalities, we get Z f dµ = µ(GA (f ) ∩ H + ) − µ(GA (f ) ∩ H − ).

(7.1.12)

A

The equality and Proposition 7.1.3 imply the following extension of Proposition 3.1.9. Proposition 7.1.10. Suppose f is Riemann integrable on subsets A and B with volume. Then f is Riemann integrable on A ∪ B and A ∩ B, with Z Z Z Z f dµ. f dµ + f dµ = f dµ + A∪B

A∩B

A

B

The next result will be needed in the theory of improper multivariable integrals. Proposition 7.1.11. Suppose f is Riemann integrable on a subset A with volume. Then Z Z max{f, 0}dµ = sup f dµ. A

C⊂A, C has volume

C

Intuitively, we wish C to be the largest subset of A on which f ≥ 0. The natural choice of C would be A+ = {~x ∈ A : f (~x) ≥ 0}. Unfortunately, the subset A+ may not have volume. A counterexample is the negative of the Thomae function in Example 3.1.5.

350

CHAPTER 7. MULTIVARIABLE INTEGRATION

P Proof. For any > 0, there is δ > 0, such that I∈P ωI (f )µ(I) < for any general partition P of A satisfying kP k < δ. Then f + = max{f, 0} satisfies (see Exercise 7.1.20) Z X X + + ≤ S(P, f + ) − f dµ ω (f )µ(I) ≤ ωI (f )µ(I) ≤ . I A

I∈P

I∈P

Moreover, Q = {I ∈ P : f (~x) ≥ 0 for some ~x ∈ I} is a general partition of C = ∪I∈Q I. For each I ∈ P , choose ~x∗I ∈ I. The choice can be made to satisfy f (~x∗I ) ≥ 0 in case I ∈ Q. Then we have Z Z S(P, f + ) − f dµ = S(Q, f ) − f dµ C C X X ≤ ωI (f )µ(I) ≤ ωI (f )µ(I) ≤ . I∈Q

I∈P

Combined with the earlier estimation, we get Z Z + ≤ 2. f dµ − f dµ A

C

On the other hand, we always have Z Z Z + + f dµ. f dµ ≥ f dµ ≥ A

C

C

Then the equality in the proposition follows. Example 7.1.5. For any ~a 6= ~0, consider a codimension 1 hyperplane B = {~x : ~a · ~x = c}. If the last coordinate of ~a is nonzero, then we can solve the last coordinate from ~a · ~x = c and get B = {(~y , z) : z = ~b · ~y }. For any M > 0, B ∩ [−M, M ]n−1 × R is contained in the graph of the linear function ~b · ~y for ~y ∈ [−M, M ]n−1 . Since linear functions are Riemann integrable, we conclude that µ(B ∩ [−M, M ]n−1 × R) = 0. In general, we always have µ(B ∩ [−M, M ]n−1 × R) = 0 up to a permutation of coordinates. A convex polytope A is the intersection of finitely many half spaces ~a · ~x ≥ c, ~a 6= ~0. The boundary of the convex polytope consists of those points such that some inequalities become equalities ~a · ~x = c. Therefore the boundary is a finite union of subsets of the form B above. In particular, if A is bounded, then the boundary has volume zero. Therefore bounded convex polytopes have volume. A bounded polyhedron is a finite union of bounded polytopes and therefore also has volume. Exercise 7.1.15. Prove Proposition 7.1.8.

7.1. RIEMANN INTEGRATION

351

Exercise 7.1.16. Suppose f is Riemann integrable on [a, b]. Suppose A ⊂ R2 is a subset with area, such that (x, y) ∈ A implies a ≤ x + y ≤ b. 1. Prove that there is b, such that µ{(x, y) ∈ A : s ≤ x + y ≤ t} < b(t − s) for any s < t. 2. Prove that f (x + y) is Riemann integrable on A. Moreover, compute the area of {(x, y) : |x| ≤ M, |y| ≤ M, s ≤ xy ≤ t} and study the integrability of f (xy). Exercise 7.1.17. Prove Proposition 7.1.9. Explain that the Riemann integral is not changed if the function is arbitrarily modified on any subset of volume 0. Exercise 7.1.18. Prove Proposition 7.1.10 by directly using the Riemann sum. Exercise 7.1.19. Suppose a function f (x, y) defined on a rectangle [a, b] × [c, d] is increasing in x for fixed y and increasing in y for fixed x. Prove that f is Riemann integrable. What if f in increasing in one variable and continuous in another? Exercise 7.1.20. Suppose f (~x) is integrable on A. Prove that Z f dµ ≤ µ(A) sup f, µ(A) inf f ≤ A

A

A

Z X S(P, f ) − ≤ f dµ ωI (f )µ(I). A

I∈P

Exercise 7.1.21. Suppose A is a path connected and compact subset with volume. ZSuppose f is a continuous function on A. Prove Z that there is ~a ∈ A, such that

f dµ = f (~a)µ(A). Extend the conclusion to A

f gdµ for a non-negative A

integrable function g on A.

Exercise 7.1.22. Suppose A is an open subset with volume. Suppose f is a continuous function on A. Prove the following are equivalent. 1. f = 0. Z 2. |f |dµ = 0. A

Z f dµ = 0 for any subset B ⊂ A with volume.

3. B

Z f dµ = 0 for any rectangle I ⊂ A.

4. I

Z 5.

f gdµ = 0 for any continuous function g on A. A

Does the conclusion still hold if A is not open? Exercise 7.1.23. Extend Proposition 4.2.9 to multivariable Riemann integration. Suppose fn (~x) is integrable on a subset A with volume and the sequence {fn (~x)} uniformly converges on A. Then limn→∞ fn (~x) is integrable on A, with Z Z lim fn (~x)dµ = lim fn (~x)dµ. A n→∞

n→∞ A

352

CHAPTER 7. MULTIVARIABLE INTEGRATION

7.1.4

Fubini Theorem

Theorem 7.1.12 (Fubini Theorem). Suppose A ⊂ Rm and B ⊂ Rn have volumes. Suppose f (~x, ~y ) is Riemann integrable onZA × B. Suppose for each fixed ~y , f (~x, ~y ) is Riemann integrable on A. Then

f (~x, ~y )dµ~x is Riemann A

integrable on B, and Z

Z Z

f (~x, ~y )dµ~x,~y = A×B

f (~x, ~y )dµ~x dµ~y . B

A

∗ ∗ Proof. Let P and Z Q be partitions of A and B. Let ~xI ∈ I ∈ P and ~yJ ∈ J ∈ Q. Let g(~y ) = f (~x, ~y )dµ~x . Then we have the Riemann sums A

X

S(P × Q, f ) =

f (~x∗I , ~yJ∗ )µ~x (I)µ~y (J) =

I∈P,J∈Q

S(Q, g) =

X

X

S(P, f (~x, ~yJ∗ ))µ~y (J),

J∈Q

g(~yJ∗ )µ~y (J)

J∈Q

=

X Z J∈Q

f (~x, ~yJ∗ )dµ~x

µ~y (J).

A

For any > 0, by the integrability of f on A × B, there is δ > 0, such that kP k < δ, kQk < δ implies Z < . S(P × Q, f ) − f (~ x , ~ y )dµ ~ x ,~ y A×B

Fix one partition Q satisfying kQk < δ and fix one choice of ~yJ∗ . Then there is δ ≥ δ 0 > 0, such that kP k < δ 0 implies Z ∗ S(P, f (~x, ~yJ∗ )) − < f (~ x , ~ y )dµ ~ x J A

for all (finitely many) J ∈ Q. For any such P , we have Z X ∗ ∗ f (~x, ~yJ )dµ~x µ~y (J) |S(P × Q, f ) − S(Q, g)| ≤ S(P, f (~x, ~yJ )) − A

J∈Q

≤

X

µ~y (J) = µ~y (B),

J∈Q

and we get Z S(Q, g) −

f (~x, ~y )dµ~x,~y A×B Z ≤|S(P × Q, f ) − S(Q, g)| + S(P × Q, f ) −

A×B

f (~x, ~y )dµ~x,~y

<µ~y (B) + . In particular, the thirdZproperty in Z Theorem 7.1.6 is satisfied. Therefore g is integrable on B, with

gdµ~y = B

f (~x, ~y )dµ~x,~y . A×B

7.1. RIEMANN INTEGRATION

353

..... y . b ..................................................................................... .. .. ........................................... y = h(~x) ....................... .. .. .. .............. .. .. . . . . ........... ... .. .. ...... .. .. B .. .. .. .. .. .. .. .. .. .. ................................................................................... ... ........ .. .. .. y = k(~x) .... .. .. . a ....................................................................................... ....................................................................................................................................................................... ~x .. .................................................A....................................................... .. Figure 7.1: integration between two graphs Example 7.1.6. Suppose h and k are integrable functions on a subset A with volume. Suppose h(~x) ≥ k(~x). Then the region B = {(~x, y) : ~x ∈ A, h(~x) ≥ y ≥ k(~x)} between the graphs of h and k has volume. For an integrable function f (~x, y) on B, we may extend the function to A × [a, b] (b > h(~x) ≥ k(~x) > a for all ~x ∈ A) by taking value 0 outside B. If for each fixed ~x, f (~x, y) is integrable for y ∈ [k(~x), h(~x)], then we have Z Z f (~x, y)dµ~x,y = f (~x, y)dµ~x,y A×[a,b] B ! Z Z Z Z h(~ x)

b

f (~x, y)dy dµ~x .

f (~x, y)dy dµ~x =

= A

a

k(~ x)

A

For a specific example, the integral of (x + y)2 on the region bounded by x = 0, y = 1 and x = y is Z Z 1 Z 1 Z 1 37 (x + 1)3 − (2x)3 2 2 dx = . (x+y) dxdy = (x + y) dy dx = 3 12 0≤x≤1,x≤y≤1 0 x 0 Example 7.1.7. Suppose A ⊂ Rm × Rn is a subset with volume. Suppose for each ~x ∈ Rm , the section A~x = {~y ∈ Rn : (~x, ~y ) ∈ A} also has volume. Then by applying Fubini theorem to the characteristic function (see Example 7.1.4), we find the volume of A to be Z Z Z Z µm+n (A) = χA (~x, ~y )dµ~x,~y = χA (~x, ~y )dµ~y dµ~x = µ~y (A~x )dµ~x . A

Rm

Rn

Rm

The integral on the Euclidean spaces are actually the integral on some rectangles enclosing the subsets. For a specific example, the area of the unit disk in R2 is Z 1 h p Z 1 p Z π p i p 2 2 2 µ − 1 − x , 1 − x dx = 2 1 − x dx = 2 1 − cos2 t sin tdt = π. −1

−1

0

354

CHAPTER 7. MULTIVARIABLE INTEGRATION

Example 7.1.8. The two variable Thomae function in Exercise 7.1.9 is Riemann 1 integrable. However, for each rational y, we have R(x, y) ≥ > 0 for rational x qy and R(x, y) = 0 for irrational x. By the reason similar to the non-integrability of the function, f (x, y) is not integrable in x. Thus the repeated integral Z Dirichlet Z R(x, y)dx dy does not exist. The other repeated integral also does not exist for the same reason. Note that since Riemann integrability is not changed if the function is modified on a subset of volume 0, Fubini theorem essentially holds even if those ~y for which f (~x, ~y ) is not integrable in ~x form a set of volume zero. Unfortunately, this is not the case for the two variable Thomae function, because the set of rational numbers does not have volume zero. Example 7.1.9. If f (~x, ~y ) is integrable on A × B, f (~x, ~y ) is integrable on A for each fixed ~y , and f (~x, ~y ) is integrable on B for each fixed ~x, then by Fubini theorem, the two repeated integrals are the same Z Z Z Z f (~x, ~y )dµ~x dµ~y . f (~x, ~y )dµ~y dµ~x = A

B

B

A

However, if two repeated integrals exist and are equal, it does not necessarily follow that the function is integrable. Consider k l S= , : 0 ≤ k, l ≤ n, p is a prime number . p p The section Sx = {y : (x, y) ∈ S} is empty for any irrational x and is finite for any rational x. As a result, the section has volume (length) 0 for any x. The same holds for the sections Sy . On the other hand, for any 2 dimensional partition P of [0, 1]2 by rectangles, we have SP+ = [0, 1]2 (because S is dense in [0, 1]2 ) and SP− = ∅ (because S contains no rectangles). Therefore µ(SP+ ) = 1, µ(SP− ) = 0, and the set S has no volume. By Example 7.1.4, a subset has volume if and only if its characteristic function is Riemann integrable.Z Therefore in terms function χS , the Z of theZcharacteristic Z two repeated integrals

χS (x, y)dy dx and χS (x, y)dx dy exist and Z are equal to 0, but the double integral χS (x, y)dxdy does not exist.

Exercise 7.1.24. Compute the integral. Z 1. | cos(x + y)|dxdy. [0,π]×[0,π]

Z

y 2 cos(πxy)dxdy.

2. [0,1]×[0,1]

Z

(x1 + x2 + · · · + xn )p dx1 dx2 · · · dxn .

3. [a1 ,b1 ]×[a2 ,b2 ]×···×[an ,bn ]

Z 4. A

(x + y)dxdy, A is the region bounded by y = x2 , y = 4x2 , y = 1 and with

x ≥ 0.

7.1. RIEMANN INTEGRATION Z 5. A

355

sin y dxdy, A is the region bounded by x = 0, y = x and y = π. y

Z xydxdy, A is the region bounded by y = sin x and y = cos x and between

6. A

the intersection points x =

π 5π and x = . 4 4

Z

dxdydz , A is the region bounded by x = 1, x = 2, z = 0, y = x, z = y. 2 2 A x +y Z 1 Z 1 p 2 8. |y − x |dx dy.

7.

−1

0

Z 1 Z

1 2

sin π(1 − y )dy dx.

9. 0

x

Z 10.

(1 − x1 − x2 − · · · − xn )dx1 dx2 · · · dxn , A is the region bounded by x1 = 0, A x2 =

0, . . . , xn = 0, x1 + x2 + · · · + xn = 1.

Exercise 7.1.25. Prove that Z

Z b 1 f (x1 )dx1 dx2 · · · dxn = (b − t)n−1 f (t)dt (n − 1)! a a≤x1 ≤···≤xn ≤b Z b n Z 1 f (t)dt . f (x1 )f (x2 ) · · · f (xn )dx1 dx2 · · · dxn = n! a a≤x1 ≤···≤xn ≤b

Exercise 7.1.26. Compute the volume. 1. The region bounded by x = y and x = y 2 . 2. The area between the graphs of 1 + 2x + 3x3 + 4x4 and 2x + 3x3 + 5x4 , and −1 ≤ x ≤ 1. 3. The region bounded by the elliptic parabloid z = x2 + y 2 and the plane z = 1. 4. The region bounded by the elliptic parabloid z = 2x2 + y 2 and the cylinder z = 2 − y2. 5. The region bounded by x1 = 0, x2 = 0, . . . , xn = 0, x1 + x2 + · · · + xn = 1. Z +∞ Exercise 7.1.27. Suppose f (x) is continuous and the improper integral f (x)dx Z +∞ Z b 0 converges. For 0 < a < b, show that the improper integral f (tx)dt dx also converges and compute the improper integral.

0

a

Exercise 7.1.28. Study the existence and the equalities between the double integral and the repeated integrals. ( 1 if x is rational 1. f (x, y) = on [0, 1] × [0, 1]. 2y if x is irrational ( 1 if x is rational 2. f (x, y) = on [0, 1] × [0, 2]. 2y if x is irrational

356

CHAPTER 7. MULTIVARIABLE INTEGRATION

( 1 if x, y are rational 3. f (x, y) = . 0 otherwise ( R(x) if x, y are rational 4. f (x, y) = , where R(x) is the Thomae function 0 otherwise in Example 1.4.2.  1 if x = 1 , n ∈ N and y is rational 5. f (x, y) = . n 0 otherwise  1  1 if x = , n ∈ N and y is rational   n 6. f (x, y) = 1 if x is rational and y = 1 , n ∈ N .   n   0 otherwise Exercise 7.1.29. Suppose f (x) and g(x) are increasing functions on [0, 1]. By considering the double integral of (f (x) − f (y))(g(x) − g(y)), prove that 1

Z

Z

1

f (x)g(x)dx ≥ 0

Z f (x)dx

1

g(x)dx.

0

0

More generally, for p(x) ≥ 0, prove the Chebychev inequality Z

b

Z

b

Z p(x)f (x)g(x)dx ≥

p(x)dx a

b

Z p(x)f (x)dx

a

a

b

p(x)g(x)dx. a

Exercise 7.1.30. Suppose f (x) is a continuous function on [a, b]. By considering the double integral of (f (x) − f (y))2 , prove that Z

2

b

f (x)dx

Z

b

≤ (b − a)

a

f (x)2 dx.

a

Exercise 7.1.31. Suppose A and B are subsets with volumes. Suppose f (~x) and g(~y ) are bounded functions on A and B. 1. Z Prove that if f (~x) and Zg(~y ) are integrable, then f (~x)g(~y ) is integrable, and Z f (~x)g(~y )dµ~x,~y = f (~x)dµ~x g(~y )dµ~y . A×B

A

B

Z f dµ 6= 0, then f (~x)g(~y ) is integrable

2. Prove that if f (~x) is integrable and A

if and only if g(~y ) is integrable.

7.1.5

Volume in Vector Space

A collection Σ of subsets is a ring if it satisfies A, B ∈ Σ =⇒ A ∪ B, A ∩ B, A − B ∈ Σ. A volume theory is a function ν defined on a ring Σ, such that the following properties are satisfied.

7.1. RIEMANN INTEGRATION

357

1. Non-negativity: ν(A) ≥ 0. 2. Additivity: If A1 , A2 , . . . , Ak are disjoint, then ν(A1 ∪ A2 ∪ · · · ∪ Ak ) = ν(A1 ) + ν(A2 ) + · · · + ν(Ak ). The properties imply the monotonicity A ⊂ B and A, B ∈ Σ =⇒ ν(A) ≤ µ(A) + ν(B − A) = ν(B). We also say a subset A has ν-volume (or ν-measurable) if A ∈ Σ. The notation µ will be reserved for the specific volume theory developed in Section 7.1.1. By Propositions 7.1.9 and 7.1.10, if f is a non-negative function on RZn that is Riemann integrable on any subset with µ-volume, then ν(A) = f dµ is a volume theory on the collection of subsets with µ-volumes.

A

Exercise 7.1.32. Given a volume theory ν on a ring Σ, define the outer and inner volumes of any A ⊂ X by ν + (A) =

inf

K∈Σ,K⊃A

ν(K), ν − (A) =

sup

ν(L).

L∈Σ,L⊂A

We say A has ν¯-volume if ν + (A) = ν − (A), and denote ν¯(A) = ν + (A) = ν − (A). ¯ of subsets with ν¯-volume is still a ring. Moreover, Prove that the collection Σ ¯ prove that ν¯ is indeed a volume theory on Σ. Exercise 7.1.33. Prove that finite disjoint unions of all types of rectangles (i.e., products of all types of intervals, including single points) form a ring Σ of subsets of Rn . Moreover, define the volume µ(I) of rectangles I in a usual way and define P µ(A) = µ(Ij ) for A = ∪Ij ∈ Σ, Ij disjoint. 1. Prove that µ is well-defined. In other words, µ(A) is independent of the way A is divided into disjoint rectangles. 2. Prove that µ is a volume theory on Σ. The volume theory µ ¯ produced by the process in Exercise 7.1.32 is then the Jordan measure theory.

A volume theory ν on a vector space V is translation invariant if A ∈ Σ implies A + ~v ∈ Σ, and ν(A + ~v ) = ν(A) for any ~v ∈ V. The following result shows that translation invariant volume theories are essentially unique. Proposition 7.1.13. Suppose ν is a translation invariant volume theory on Rn , such that all subsets with µ-volumes also have ν-volumes. Then ν(A) = cµ(A) for some constant c.

358

CHAPTER 7. MULTIVARIABLE INTEGRATION

Proof. For any x, N and distinct x1 , x2 , . . . , xN ∈ [a1 , b1 ], we have N ν(x × [a2 , b2 ] × [a3 , b3 ] × · · · × [an , bn ]) =

N X

ν(xi × [a2 , b2 ] × [a3 , b3 ] × · · · × [an , bn ])

(translation invariant)

i=1

=ν({x1 , x2 , . . . , xN } × [a2 , b2 ] × [a3 , b3 ] × · · · × [an , bn ]) ≤ν([a1 , b1 ] × [a2 , b2 ] × [a3 , b3 ] × · · · × [an , bn ]).

(additive) (monotone)

Since N is arbitrary, we must have ν(x × [a2 , b2 ] × [a3 , b3 ] × · · · × [an , bn ]) = 0. Similar equality holds for other “reduced rectangles”, including ones with open or partly open intervals. In particular, the additivity holds if the intersections of Ai are finite unions of reduced rectangles. 1 Let N be a natural number and = . The cube [0, 1]n is the union of N translations of N n copies of the small cube [0, ]n , such that the intersections among the translations are reduced rectangles. Therefore ν([0, ]n ) = n N n ν([0, ]n ) = n ν([0, 1]n ) = cn , c = ν([0, 1]n ). Now for any rectangle I = [a1 , b1 ] × [a2 , b2 ] × [a3 , b3 ] × · · · × [an , bn ], we find non-negative integers ki satisfying ki ≤ bi − ai ≤ (ki + 1). Then Y Y I+ = [ai , ai + (ki + 1)] ⊃ I ⊃ I − = [ai , ai + ki ]. By filling I + and I − with translations of [0, ]n , we get Y Y + n − ν(I ) = (ki + 1) c ≥ ν(I) ≥ ν(I ) = ki cn . By limN →∞ ki = limN →∞ (ki + 1) = bi − ai , we conclude that Y ν(I) = c (bi − ai ) = cµ(I). Let A be any subset with µ-volume. For a partition P , we have the − approximations A+ P and AP of A. By the additivity and the fact that ν = cµ − − + on rectangles, we get ν(A+ P ) = cµ(AP ) ≥ ν(A) ≥ ν(AP ) = cµ(AP ). Since − µ(A+ P ) and µ(AP ) can become arbitrarily close to µ(A) as P gets more refined, we conclude that ν(A) = cµ(A). The discussion on translation invariant volume theory helps us to study the change of the volume under linear transforms. Proposition 7.1.14. Suppose L : Rn → Rn is a linear transform. Suppose a subset A has µ-volume. Then L(A) has µ-volume, and µ(L(A)) = | det(L)|µ(A).

7.1. RIEMANN INTEGRATION

359

Proof. For any rectangle I, L(I) is a parallelepiped, which has µ-volume by Example 7.1.5. This implies that if A is a finite union of rectangles, then L(A) has µ-volume. Denote cL = µ(L([0, 1]n )). By the translation invariance property of A 7→ µ(L(A)) and the proof of Proposition 7.1.13, we have µ(L(A)) = cL µ(A) for any finite union A of rectangles. Repeating the last part of the proof, + for any subset A with µ-volume, we get µ(L(A+ P )) = cL µ(AP ) ≥ µ(L(A)) ≥ − − µ(L(AP )) = cL µ(AP ). Then by the last criterion in Proposition 7.1.4, we conclude that L(A) also has µ-volume. Moreover, the estimation tells us µ(L(A)) = cL µ(A). It remains to show cL = | det(L)|. For linear transforms L, K : Rn → Rn , we have µ(L ◦ K(A)) = µ(L(K(A)) = cL µ(K(A)) = cL cK µ(A). This tells us cL◦K = cL cK . On the other hand, any linear transform is a composition of the following types (called elementary linear transforms). 1. Exchange: L(x1 , x2 , x3 , . . . , xn ) = (x2 , x1 , x3 , . . . , xn ). 2. Scaling: L(x1 , x2 , . . . , xn ) = (cx1 , x2 , . . . , xn ). 3. Shifting: L(x1 , x2 , x3 , . . . , xn ) = (x1 , x2 + cx1 , x3 , . . . , xn ). By cL◦K = cL cK and | det(L ◦ K)| = | det(L)|| det(K)|, it suffices to verify the formula cL = µ(L([0, 1]n )) = | det(L)| for the elementary linear transforms. For the exchange, we have L([0, 1]n ) = [0, 1]n , so that cL = µ([0, 1]n ) = 1 = | − 1| = | det L|. For the scaling, we have L([0, 1]n ) = [0, c] × [0, 1]n−1 , so that cL = µ([0, c] × [0, 1]n−1 ) = |c| = | det L|. For the shifting, we have L([0, 1]n ) = B × [0, 1]n−2 , where B ⊂ R2 is the parallelogram with (0, 0), (1, c), (0, 1), (0, 1 + c) as vertices. It is easy to see that µ(B) = 1, so that cL = µ(B × [0, 1]n−2 ) = 1 = | det L|. Proposition 7.1.14 suggests that the concept of having µ-volume is welldefined on any finite dimensional vector space V . Proposition 7.1.13 further tells us that translation invariant volumes on the collection of subsets of V with µ-volumes are unique up to multiplying a constant. In particular, such a volume theory is completely determined by the volume of a nondegenerate parallelepiped (rectangles are not well defined in vector spaces, parallelepipeds are well defined). Exercise 7.1.34. Suppose A ⊂ Rn is a subset with volume. The cone on A with cone point ~c ∈ Rn+1 is ~cA = {(1 − t)~c + t(~a, 0) : ~a ∈ A, t ∈ [0, 1]}. 1 |cn+1 |µ(A). n+1 Exercise 7.1.35. Suppose αn is the volume of the unit ball B n ⊂ Rn . Z 1 n−1 1. Prove that αn = (1 − r2 ) 2 αn−1 dr. Prove that the volume of ~cA is

−1

2. Find the general formula for αn .

360

CHAPTER 7. MULTIVARIABLE INTEGRATION

3. What is the volume of the ellipsoid x2 + 13y 2 + 14z 2 + 6xy + 2xz + 18yz ≤ 1? The quadratic form appeared in Example 5.2.3.

Now we can justify the claim in the multilinear algebra that the volume of a parallelepiped spanned by ~x1 , ~x2 , . . . , ~xk ∈ Rn is the Euclidean norm k~x1 ∧ ~x2 ∧ · · · ∧ ~xk k2 of the wedge of the spanning vectors. Suppose V ⊂ Rn is the k-dimensional linear subspace spanned by the vectors ~x1 , ~x2 , . . . , ~xk . Then V has an inner product induced from the dot product of Rn . For any orthonormal basis of V , there is a unique translation invariant volume theory on V such that the volume of the cube spanned by the orthonormal basis is 1. Since any two orthonormal basis are related by an orthogonal linear transform, which has determinant ±1, the volume theory is actually independent of the choice of the orthonormal basis. Denote the volume theory by µV (which only depends on the inner product on V ). Equivalently, we can take any isometry L : Rk → V (linear transform preserving the inner product) and define µV = µ ◦ L−1 . Suppose K : Rk → V is the linear transform that takes the standard basis ~e1 , ~e2 , . . . , ~ek to ~x1 , ~x2 , . . . , ~xk . Then A = K(I), where I = [0, 1]k is the parallelepiped spanned by ~e1 , ~e2 , . . . , ~ek . For ~vi = L−1 (~xi ) = L−1 ◦ K(~ei ), we have µV (A) = µV (K(I)) = (µV ◦ L)(L−1 ◦ K(I)) = µ(L−1 ◦ K(I)) (definition of µV ) −1 = | det(L ◦ K)|µ(I) (Proposition 7.1.14) = k~v1 ∧ ~v2 ∧ · · · ∧ ~vk k2 (definition of determinant) = k~x1 ∧ ~x2 ∧ · · · ∧ ~xk k2 . (L is an isometry) The computation also tells us that the equality µV (K(B)) = kK(~e1 ) ∧ K(~e2 ) ∧ · · · ∧ K(~ek )k2 µ(B) holds for B = I. Since both sides are translation invariant volume theories on Rk , the equality holds for any subset B ⊂ Rk with µ-volume. After changing the notations, we make a record of the fact. Proposition 7.1.15. Suppose L : Rk → Rn is an injective linear transform, with image subspace V . Then µV (L(A)) = kL(~e1 ) ∧ L(~e2 ) ∧ · · · ∧ L(~ek )k2 µ(A).

7.1.6

Change of Variable

A change of variable on Rn is an invertible map Φ : U ⊂ Rn → Rn defined on an open subset U , such that both Φ and Φ−1 are continuously differentiable. By the inverse function theorem, the condition is the same as Φ is invertible, continuously differentiable and the derivative Φ0 is invertible everywhere. Moreover, we know Φ(U ) is also open. A change of variable on a subset A is a change of variable on an open subset U containing the closure A¯ of A.

7.1. RIEMANN INTEGRATION

361

The formula for the Riemann integral under a change of variable was established for single variable functions in Theorems 3.2.4 and 3.2.5. For multivariable functions, we first study how the change of variable affects the volume. Proposition 7.1.14 tells us how the volume changes under a linear change of variable. For a general change of variable on a subset A with volume, we consider a partition P of A and choices ~x∗I ∈ I for I ∈ P . Then Φ is approximated by the linear map LI (~x) = Φ(~x∗I ) + Φ0 (~x∗I )(~x − ~x∗I ) on I, and we expect the volume of Φ(I) to be approximated by the volume of LI (I), Φ0 (~x∗I )|µ(I). Thus the volume of Φ(A) is approximated P which is 0| det by I∈P | det Φ (~x∗I )|µ(I), which leads to the expectation that Z µ(Φ(A)) = | det Φ0 (~x)|dµ~x . A

This further leads to the formula for the Riemann integral under a change of variable. Theorem 7.1.16 (Change of Variable). Suppose A ⊂ Rn has volume. Suppose Φ : Rn → Rn is a change of variable on A. Then Φ(A) has volume. Moreover, for any Riemann integrable function f on Φ(A), f ◦ Φ is also Riemann integrable on A, and Z Z 0 f (~y )dµ~y . (7.1.13) f (Φ(~x))| det Φ (~x)|dµ~x = A

Φ(A)

The sketchy argument leading to the formula (7.1.13) contains some technical difficulties. First, we do not yet know that Φ(I) has volume, so we can only talk about µ+ (Φ(I)) or µ− (Φ(I)), at least in the beginning of a rigorous argument. Second, the estimation on the volume of Φ(I) can only be made by finding a nice subset I Φ containing or contained in Φ(I). We expect I Φ to be a “slightly inflated” version of the parallelepiped LI (I) (for containing Φ(I)) or a “slightly shrunken” version (for being contained in Φ(I)). However, the only information we have is the fact that LI (~x) is a linear approximation of Φ, which is some inequality about norms. The subset containment that can be derived from the inequality must be the containment of balls. Thus we must consider partitions consisting of balls. This means that the partitions are made up of closed balls ¯L∞ (~c, r) = {~x : k~x − ~ck∞ ≤ r} = ~c + [−r, r]n B in the L∞ -norm. Moreover, the containment argument between Φ(I) and I Φ must be done after using a linear transform to convert the parallelepiped I Φ to an L∞ -ball. Proof. The change of variable Φ is defined on an open subset U containing the ¯ We need to consider partitions P and take linear approximations closure A.

362

CHAPTER 7. MULTIVARIABLE INTEGRATION

of Φ on I ∈ P that have nonempty intersections with A. Therefore we need to consider Φ on a compact subset slightly bigger than A and still contained in U . By Proposition 5.1.9, there is δ0 > 0, such that ¯ ⊂ U. K = {~y : k~y − ~xk∞ ≤ δ0 for some ~x ∈ A} Since A is bounded, it is easy to see that K is a bounded and closed subset. By Proposition 5.1.3, K is compact. With the help of Theorem 5.1.15, we can set up the bounds and the linear approximations for Φ on K. Specifically, the uniform continuity of Φ0 and Proposition 6.1.4 tell us that for any > 0, there is δ > 0, such that if k~y − ~xk∞ < δ and the straight line connecting ~x to ~y is contained in K, then kΦ(~y ) − Φ(~x) − Φ0 (~x)(~y − ~x)k∞ ≤ k~y − ~xk∞ .

(7.1.14)

Moreover, the boundedness of Φ0 and (Φ0 )−1 tells us that | det Φ0 | < M and k(Φ0 )−1 k < M on K for some constant M , where the norms of the linear transform is with respect to the L∞ -norm. .................. ..... ..... ..... ..... ..... .... .. ...... ... .. ....... ...................................................... .. ............................................ ............................................ .... . . . .. .... .. .. . . . . . . . . . . .. I . . . . .......... .... ... ........ . . .. . .... .. ... . . . . . . . . . . .............. ...... . .......... .. ..........Φ . . .... . . .. . . . . . . . . . . . . . . . . . . . Φ(~c) .............. ....................... . .... ~ c .. ....................... ............... .... .. . ... •.... ~c • • . . ........ . . ..... ................ . . .. . . . .... .. . −1 .. . . .. r . . . . .... ................ ...... . L . . .. . . . . r ... ..... .. . L . ... . . . . .... ............... ...... . . . . . . . ............................................. . . . . . . . . . . .... .... ............... .. ................................................................ .. ........... .... ....... .. . ..... ..... ..... ..... ..... ..... ......... .........br . Figure 7.2: estimate the volume of Φ(I) ¯L∞ (~c, r) ⊂ K with radius r < δ. We Now consider a closed L∞ -ball I = B would like to estimate the volume of Φ(I). Denote the linear approximation of Φ at ~c by L(~x) = Φ(~c) + Φ0 (~c)(~x − ~c). Then for any ~x ∈ I, by (7.1.14) we have kΦ(~x) − L(~x)k∞ ≤ k~x − ~ck∞ ≤ r. This implies kL−1 (Φ(~x)) − ~xk∞ = kΦ0 (~c)−1 (Φ(~x) − L(~x))k∞ ≤ M kΦ(~x) − L(~x)k∞ ≤ M r, and kL−1 (Φ(~x)) − ~ck∞ ≤ kL−1 (Φ(~x)) − ~xk∞ + k~x − ~ck∞ ≤ M r + r = (1 + M )r. ¯L∞ (~c, (1 + M )r), which is the same as Φ(I) ⊂ Therefore L−1 (Φ(I)) ⊂ B ¯L∞ (~c, (1 + M )r)). Thus by Proposition 7.1.14, L(B ¯L∞ (~c, (1 + M )r))) = | det Φ0 (~c)|µ(B ¯L∞ (~c, (1 + M )r)) µ+ (Φ(I)) ≤ µ(L(B = (1 + M )n | det Φ0 (~c)|µ(I). (7.1.15)

7.1. RIEMANN INTEGRATION

363

Next we need to put the estimations together. For the given > 0, there is δ > 0, such that 0

− kP k < δ 0 =⇒ µ(A+ P ) − µ(AP ) < . δ0 δ0 Fix any r < min , δ, and choose P to consist of special rectangles of 2 2 ¯L∞ (~c, r). By 2r < δ0 , we have the form I = B

I ∩ A 6= ∅ =⇒ I ⊂ K. Therefore the estimation (7.1.15) can be applied to Φ(I) and gives us X X | det Φ0 (~c)|µ(I) µ+ (Φ(A)) ≤ µ+ (Φ(I)) ≤ (1 + M )n I∩A6=∅

I∩A6=∅

≤ (1 + M )

n

M µ(A+ P ).

(7.1.16)

Applying the inequality (7.1.16) to ∂A in place of A, we get µ+ (Φ(∂A)) ≤ (1 + M )n M µ((∂A)+ P ). Since A has volume, µ((∂A)+ P ) can become arbitrarily small. Therefore + µ (Φ(∂A)) = 0. By the continuity of Φ and Φ−1 , we have Φ(∂A) = ∂Φ(A). Then by the second criterion in Proposition 7.1.2, we conclude that Φ(A) has volume. Moreover, the inequality (7.1.16) (the first and the third terms, more precisely) further tells us µ(Φ(A)) = µ+ (Φ(A)) ≤ (1 + M )n

X

| det Φ0 (~c)|µ(I)

I∩A6=∅

 ≤ (1 + M )n 

 X I⊂A

X

+

 | det Φ0 (~c)|µ(I)

I∩A6=∅,I∩(Rn −A)6=∅

 ≤ (1 + M )n 

 X

| det Φ0 (~c)|µ(I ∩ A) + M

I∈P

= (1 + M )

n

X

µ(I)

I∩∂A6=∅ 0

S(P ∩ A, | det Φ |) + M µ((∂A)+ P) .

As → 0 and kP k → 0, we get Z µ(Φ(A)) ≤

| det Φ0 |dµ.

A

We obtained an inequality for the change of volume. We will show that this implies a corresponding inequality for the change of Riemann integrals, at least for non-negative functions. Then by applying the inequality to the change of variable Φ−1 , we get an inequality for the change of Riemann integrals in the opposite direction. Two inequalities in both directions become an equality.

364

CHAPTER 7. MULTIVARIABLE INTEGRATION

First we need to clarify the integrability. By Proposition 7.1.6, the integrability of f on Φ(A) means that the graph GΦ(A) (f ) has volume. By what we just proved, the change of variable (Φ−1 , id) takes GΦ(A) (f ) to another subset (Φ−1 , id) GΦ(A) (f ) = GA (f ◦ Φ) with volume. Again by Proposition 7.1.6, this means that f ◦ Φ is integrable on A. Second we study the change of integral for an integrable function f ≥ 0 on A. Suppose P is a general partition of A. Then Φ(P ) is also a general partition. By choosing ~x∗I ∈ I and the corresponding Φ(~x∗I ) ∈ Φ(I), we have Z X X ∗ ∗ f (Φ(xI )) | det Φ0 (~x)|dµ~x . S(f, Φ(P )) = f (Φ(xI ))µ(Φ(I)) ≤ I∈P

I∈P

I

(7.1.17) On the other hand, Z X f (Φ(x∗I )) | det Φ0 (~x)|dµ~x S((f ◦ Φ)| det Φ0 |, P ) − I I∈P Z X ≤ f (Φ(x∗I )) | det Φ0 (x∗I )|µ(I) − | det Φ0 (~x)|dµ~x I I∈P X f (Φ(x∗I ))ωI (| det Φ0 |)µ(I). (7.1.18) ≤ I∈P

Since f is bounded and | det Φ0 | is integrable (by continuity), for any > 0, there is δ > 0, such that kP k < δ implies X

f (Φ(x∗I ))ωI (| det Φ0 |)µ(I) < .

(7.1.19)

I∈P

Combining (7.1.17), (7.1.18), (7.1.19) together, we see kP k < δ implies S(f, Φ(P )) ≤ S((f ◦ Φ)| det Φ0 |, P ) + . The boundedness of Φ0 implies that kΦ(P )k is also small when kP k is small. Thus the inequality above implies Z Z f (~y )dµ~y ≤ f (Φ(~x))| det Φ0 (~x)|dµ~x . (7.1.20) Φ(A)

A

Applying the inequality (7.1.20) to Φ−1 , (f ◦ Φ)| det Φ0 |, Φ(A) in place of Φ, f , A, we get the inequality in the opposite direction. Thus the formula for the change of variable is proved for non-negative functions. The general case can be proved by expressing any Riemann integrable function f as f = max{0, f } − max{0, −f }, where both max{0, f } and max{0, −f } are nonnegative and Riemann integrable.

7.1. RIEMANN INTEGRATION

365

Example 7.1.10. Suppose a region A in the cartesian coordinate corresponds to a region B in the polar coordinate. Then by taking the determinant of the Jacobian matrix in Example 6.1.8, we have Z Z f (r cos θ, r sin θ)rdrdθ. f (x, y)dxdy = B

A

For example, the volume of the intersection of the ball x2 + y 2p + z 2 ≤ R2 and the 2 2 2 2 2 cylinder p x + y ≤ Rx is the region between the graphs of z = R − x − y and z = − R2 − x2 − y 2 over the disk x2 + y 2 ≤ Rx. Since the boundary of the disk π π is given by r = R cos θ for − ≤ θ ≤ , the volume is 2 2 Z Z RZ π p p 2 πR3 2 2 2 2 R − x − y dxdy = R2 − r2 rdrdθ = . 3 x2 +y 2 ≤Rx −π 0 2

Example 7.1.11. Suppose ~ : A ⊂ Rn−1 → S n−1 = {~x : k~xk2 = 1} ⊂ Rn F (θ) is a parametrization of the unit sphere (see Example 6.1.10 for an example in case n = 3). Then ~ = rF (θ) ~ : (0, ∞) × A → Rn Φ(r, θ) is a spherical change of variable (away from ~0). We have Φ0 = F

rF 0 , det Φ0 = rn−1 det F

F0 .

For a Riemann integrable function f (t) on [a, b], b > a ≥ 0, we have Z Z f (k~xk2 )dµ = f (r)rn−1 | det F F 0 |drdµθ~ ~ a≤r≤b,θ∈A Z b

a≤k~ xk2 ≤b

f (r)rn−1 dr.

= βn−1 a

The quantity Z | det F

βn−1 = A

F 0 |dµθ~

is independent of the choice of parametrization F and is the (n − 1)-dimensional volume of the unit sphere. For the special case n = 2 and n = 3, we have β1 = 2π and   Z sin φ cos θ cos φ cos θ − sin φ sin θ det  sin φ sin θ cos φ sin θ sin φ cos θ  dφdθ = 4π. β2 = [0,π]×[0,2π] cos φ − sin φ 0 We also note that by taking f = 1, a = 0, b = 1, we get the volume Z αn = βn−1 0

1

rn−1 dr =

βn−1 n

of the unit ball B n . Then βn−1 can be computed by using Exercise 7.1.35.

366

CHAPTER 7. MULTIVARIABLE INTEGRATION

Example 7.1.12. Suppose A ⊂ R2 has area and all (x, y) ∈ A satisfies y ≥ 0. The rotation of A with respect to the x-axis is the subset B = {(x, y, z) : (x, r) ∈ A, x2 + y 2 = r2 } ⊂ R3 . By using the parametrization Φ(x, r, θ) = (x, r cos θ, r sin θ) : A × [0, 2π) → B and det Φ0 = r, we find the volume Z µ(B) =

Z ydxdy.

rdxdrdθ = 2π A

A×[0,2π)

Introduce the center of weight ∗ (x∗A , yA )

1 = µ(A)

Z (x, y)dxdy A

for the subset A. Then we get ∗ µ(B) = 2πyA µ(A). ∗ is the average The formula is called the Pappus-Guldinus theorem. We note that yA of the distance from points in A to the x-axis. Next we consider the more general case of the rotation along a straight line L : ax + bx = c. Assume A lies in the “positive side” of the straight line, which means that all (x, y) ∈ A satisfies ax + by ≥ c. Denote by B the subset of R3 obtained by rotating A around L. To compute the volume of B, we introduce a new coordinate system on R2 by

F (˜ x, y˜) = (x0 + x ˜ cos α − y˜ sin α, y0 + x ˜ sin α + y˜ cos α) : R2 → R2 , where (x0 , y0 ) is a point on L and α is the angle of inclination of L. The transform takes the x-axis to L, and F −1 (A) lies in the upper half plane. A corresponding rotation in R3 takes the rotation of F (A) with respect to the x-axis to B. Since the rotation does not change distance and volume, we still have ∗ µ(B) = 2π y˜A µ(F −1 (A)) = 2πdµ(A),

where d is the average distance from points in A to L. Since the distance from (a, b) (x, y) to L is y˜ = √ · (x − x0 , y − y0 ), we have a2 + b2 Z ∗ −y ) ∗ −c a(x∗A − x0 ) + b(yA ax∗ + byA (a, b) 0 √ √ √ · (x−x0 , y−y0 )dµ = = A . d= µ(A) a2 + b2 A a2 + b2 a2 + b2 Exercise 7.1.36. Find the formula for the integral under the changes of variable in Exercise 6.1.12. Exercise 7.1.37. Use suitable change of variable to compute the integral. Z 1. |xy|dxdy. x2 +y 2 ≤a2

Z 2.

x−y

e x+y dxdy, A is the region bounded by x = 0, y = 0, x + y = 1. A

7.1. RIEMANN INTEGRATION Z 3.

367

(ax2 +2bxy +cy 2 )dxdy, A is the part of the unit disk in the first quadrant.

A

Z 4.

z 2 dxdydz, A is the intersection of the ball x2 + y 2 + z 2 ≤ 1 and the

A

cylinder x2 + y 2 ≤ x. Exercise 7.1.38. Compute the area and volume. 1. The region on the plane bounded by the curve r = sin nθ in polar coordinate. 2. The region on the plane bounded by the curve (x2 + y 2 )2 = 2ax3 . 3. The intersection of the balls x2 + y 2 + z 2 ≤ 1 and x2 + y 2 + z 2 ≤ 2az. Exercise 7.1.39. Extend the Pappus-Guldinus Theorem in Example 7.1.12 to high dimension. Specifically, we may rotate a subset A ⊂ Rn with volume and on the positive side of the hyperplane ~a · ~x = b around the hyperplane. Moreover, the rotation of a single point can be a sphere instead of a circle. Exercise 7.1.40. Suppose f (t) is a Riemann integrable function. Prove that Z Z p 1 f (k~x − ~ak2 )dµ = βn−2 f ( (x − k~ak2 )2 + y 2 )|y|n−2 dxdy. 2 k~ xk2 ≤R x2 +y 2 ≤R2

7.1.7

Improper Integration

Suppose A ⊂ Rn is an unbounded subset such that A ∩ B has volume for any bounded subset B with volume. Suppose f is a function on A that is Riemann integrable on A ∩ B for any bounded subset B with volume. The condition is the same as that for any R, the intersection A ∩ BRn with the n ball of radius R has Z volume, and f is Riemann integrable on A ∩ BR . The f dµ converges to I if for any > 0, there is R, such that

improper integral A

Z

A∩B

f dµ − I <

for any subset B with volume and containing BRn . Proposition 7.1.17. Suppose f is a function on an unbounded subset A, such that for any bounded subset B with volume, A ∩ B has volume and f is Riemann integrable on A ∩ B. Then the following are equivalent. Z 1. The improper integral f dµ converges. A

Z |f |dµ converges.

2. The improper integral A

3. For any > 0, there is R > 0, such that subset C has Z if a bounded volume and satisfies C ∩ BRn = ∅, then f dµ < . A∩C

368

CHAPTER 7. MULTIVARIABLE INTEGRATION Z |f |dµ < M for all bounded subsets B

4. There is M > 0, such that A∩B

with volume.

Z 5. There is M > 0, such that f dµ < M for all bounded subsets B A∩B with volume. In the fourth statement, B can be replaced by BRn , or by any sequence Bk with the property that any BRn is contained in some Bk . Proof. We have the Cauchy criterion for the convergence: For any > 0, there is R > 0, such that if B1 and B2 have volumes and contain BRn , then Z Z < . f dµ − f dµ A∩B1

A∩B2

By taking B1 = C ∪ BRn and B2 = BRn , the Cauchy criterion becomes the third statement. Conversely, suppose the third statement holds. Then for B1 and B2 in the Cauchy criterion, B1 − B2 and B2 − B1 satisfy the condition for C in the second statement. Therefore Z Z Z Z = < 2. f dµ − f dµ f dµ − f dµ A∩B1

A∩B2

A∩(B1 −B2 )

A∩(B2 −B1 )

This proves that the first and the third statements are equivalent. By an argument similar to the the convergence of monotone sequences, the second and the fourth statements are equivalent. The first statement implies the fifth. By Proposition 7.1.11, the fifth statement implies the fourth. Then the fourth statement, being equivalent to the second statement, implies the first by way of Cauchy criterion. This completes the proof that all statements are equivalent. In contrast to the single variable case, an improper integral converges if and only if it absolutely converges. The reason is that more freedom is allowed for the choice of B in the definition of the improper integral. If we restrict the choice of B to balls only, then we get a concept similar to the single variable case, and there will be a difference between absolute and conditional convergences. The third condition in Proposition 7.1.17 is the Cauchy criterion for the convergence. This allows us to establish the comparison tests for the convergence similar to the single variable case. Moreover, the usual properties of the Riemann integrals, including the Fubini theorem and the change of variable formula, can be extended to improper integrals. Example 7.1.13. Suppose f (r) is a function on [a, +∞). By the computation in Example 7.1.11, we have Z

Z |f (k~xk2 )|dµ = βn−1

a≤k~ xk2 ≤R

a

R

|f (r)|rn−1 dr.

7.1. RIEMANN INTEGRATION

369 Z

Z

+∞

|f (r)|rn−1 dr are equivaa Z R |f (r)|rn−1 dr, we con|f (k~xk2 )|dµ and

f (k~xk2 )dµ and

Since the convergence of k~ xk2 ≥a

Z lent to the boundedness of a≤k~ xk2 ≤R

a

clude the convergence of the two improper integrals are equivalent. In particular, Z dµ converges if and only if p > n, and we have xkp2 k~ xk2 ≥a k~ Z k~ xk2 ≥a

dµ = βn−1 k~xkp2

Z

+∞

a

1 n−1 βn−1 r dr = . p r (p − n)ap−n

Z

e−(x 2 R Z for sufficiently big k(x, y)k2 and

Example 7.1.14. The improper integral 1 + y 2 )2 bini theorem, we have (x2

Z e

−(x2 +y 2 )

2 +y 2 )

dxdy converges because e−(x

x2 +y 2 ≥1

+∞

Z dxdy =

−x2

e

R2

+∞

Z dx

−y 2

e

−∞

2 +y 2 )

dxdy converges. By Fu+ y 2 )2

(x2

dy

Z

+∞

=

−∞

−x2

e

2 dx

.

−∞

On the other hand, we have Z

e−(x

2 +y 2 )

Z

R2

Thus we conclude that Z

2

re−r dr = π.

0

+∞

−x2

e 0

+∞

dxdy = β1

1 dx = 2

Z

√

+∞

e

−x2

dx =

−∞

π . 2

Exercise 7.1.41. State and prove the comparison test for the convergence of improper integrals. Z Exercise 7.1.42. Prove that if f ≥ 0 and the improper integral f dµ converges, A

then Z

Z

Z

f dµ = sup A

B

f dµ = sup A∩B

f dµ,

n R>0 A∩BR

where B are all bounded subsets with volume. Exercise 7.1.43. Suppose A ⊂ Rn is an unbounded subset, such that A ∩ B has volume for any bounded subset B with volume. We say A has volume if µ(A ∩ B) is bounded for all B and denote µ(A) = supB µ(A ∩ B). 1. Prove that A has volume if and Z only if the characteristic function χA is n integrable on R , and µ(A) = χA dµ. Rn

2. Prove that the extended volume satisfies µ(A∪B)+µ(A∩B) = µ(A)+µ(B). Z 3. Prove that for unbounded A with volume, the improper integral f dµ A

converges if and only if GA (f ) has volume.

≤

370

CHAPTER 7. MULTIVARIABLE INTEGRATION

Exercise 7.1.44. Suppose A and B have the property that A ∩Z C and B ∩ C have f dµ converges volume for any bounded subset C with volume. Prove that A∪B Z Z Z Z f dµ = f dµ + f dµ converge. Moreover, f dµ and if and only if A∩B A∪B B A Z Z f dµ. f dµ + A

B

Z

+∞

|f (r2 )|rn−1 dr converges and q(~x) = A~x · ~x is a

Exercise 7.1.45. Suppose 0

quadratic form given by a positive definite matrix A. Prove that Z Z +∞ βn−1 f (q(~x))dµ = √ f (r2 )rn−1 dr. n det A 0 R Exercise 7.1.46. Suppose ϕ(~x) is a non-negative continuously differentiable homoZ geneous function of order p > 0 satisfying ~x·∇ϕ(~x) 6= 0. Prove that f (ϕ(~x))dµ ϕ(~ x)≥1 Z +∞ converges if and only if |f (rp )|rn−1 dr converges. 1

The result still holds even if restricted to a cone with originZ as the vertex. A consequence is that for any 1 ≤ p ≤ ∞, the improper integral f (k~xkp )dµ k~ xkp ≥1 Z +∞ converges if and only if |f (r)|rn−1 dr converges. 1

Exercise 7.1.47. Determine the convergence of the improper integrals and compute the convergent ones if possible. Z Z Z dxdy log(xy)dxdy e−|x−y| dxdy 3. . 5. . 1. . p q 2 2 R2 (1 + |x|) (1 + |y|) [1,+∞)2 x + xy + y R2 1 + |x − y| Z Z Z dxdy (x − y)dxdy sin x sin ydxdy 2. . 4. . . 6. p p (x + y) (x + y + 1) 2 2 (x2 + y 2 )p [1,+∞) [0,+∞) R2 Z xp y q Exercise 7.1.48. Study the convergence of dxdy, where all the m n k x,y≥1 (x + y ) parameters are positive. Extend the study to more variables. Exercise 7.1.49. Study the convergence for any norm k~xk. Z Z Z p 1. k~xk dµ. 2. k~xk−k~xk dµ. 3. Rn

k~ xk≥1

Rn

sin k~xk dµ. (1 + k~xk)p

Suppose f is an unbounded function on a bounded subset A with volume, such that f is Riemann integrable on A − B(~x0 , δ) for any δ > 0. Then we Z say the improper integral δ > 0, such that

f dµ converges to I if for any > 0, there is A

Z

A−B

f dµ − I <

for any subset B with volume satisfying B(~x0 , δ 0 ) ⊂ B ⊂ B(~x0 , δ) for some δ 0 < δ. The improperness of multivariable integration may also appear along more sophisticated subsets. See Example 7.1.16.

7.1. RIEMANN INTEGRATION

371

With the necessary modification, Proposition 7.1.17 remains largely true. In particular, the convergence may be determined by the uniform boundedness. Example 7.1.15. Suppose f (r) is a function on (0, a]. By the computation in Z Example 7.1.11, the improper integral f (k~xk2 )dµ converges if and only if k~ xk2Z≤a Z a dµ |f (r)|rn−1 dr converges. In particular, converges if and only if xkp2 0 k~ xk2 ≤a k~ p < n, and we have Z Z a dµ 1 n−1 βn−1 an−p r dr = . = β n−1 p n−p xkp2 k~ xk2 ≤a k~ 0 r Z dxdy p Example 7.1.16. The integral is improper along 1 × 2 (x + y)(1 − x)(1 − y) (0,1) [0, 1] ∪ [0, 1] × 1 and (0, 0). Near 1 × [0, 1] ∪ [0, 1] × 1, we have 1 1 1 √ p ≤p ≤p . 2 (1 − x)(1 − y) (x + y)(1 − x)(1 − y) (1 − x)(1 − y) Z dxdy p = By the boundedness criterion and the convergence of 2 (1 − x)(1 − y) (0,1) Z Z 1 Z 1 dxdy dx dy √ √ p , we see that converges along 1−y 1−x 0 (x + y)(1 − x)(1 − y) (0,1)2 0 1 × [0, 1] ∪ [0, 1] × 1. Near (0, 0), we have c1 1 c2 p ≤p ≤ p 4 4 2 2 2 (x + y)(1 − x)(1 − y) x +y x + y2 Z dxdy p for some c1 , c2 > 0. By Example 7.1.15, the integral con4 x2 + y 2 x,y≥0,x2 +y 2 ≤1 Z dxdy p verges. Therefore converges at (0, 0). We conclude (x + y)(1 − x)(1 − y) (0,1)2 that the whole integral converges. Z (x2 − y 2 )dxdy Exercise 7.1.50. Prove that the improper integral diverges (x2 + y 2 )2 x2 +y 2 ≤1 Z (x2 − y 2 )dxdy but the Cauchy principal value lim→0+ converges. (x2 + y 2 )2 2 ≤x2 +y 2 ≤1 Exercise 7.1.51. Study the existence of the repeated integrals and the improper integrals. Z Z cos x dxdy. 2. e−xy sin xdxdy. 1. y 2 [0,π]×(0,1] (0,+∞) Exercise 7.1.52. Determine the convergence of the improper integrals and compute the convergent ones if possible. Z |x − y|p 1. dxdy. q (0,1)2 (x + y) Z dx1 · · · dxn 2. . xk22 )p k~ xk2 <1 (1 − k~

372

CHAPTER 7. MULTIVARIABLE INTEGRATION Z

3. x+y+z≤1,x>0,y>0,z>0

zdxdydz . (x + y + z)(y + z)2

Z log sin(y − x)dxdy.

4. 0≤x
Z 5. (0,1)2

7.1.8

xp y q dxdy. (xm + y n )k

Additional Exercise

Integrability and Continuity Exercise 7.1.53. Prove that an integrable function f on A must be continuous at some interior points of A. In fact, for any ball B(~a, ) ⊂ A, f is continuous somewhere in the ball. Exercise 7.1.54. Extend Exercise 3.1.35 to multivariable Riemann integral. Suppose f is integrable on a subset A with volume. Prove that for any > 0, there is a union U of countably many rectangles, such that the sum of the volumes of the rectangles is < , and all discontinuous points of f are inside U . Exercise 7.1.55. Suppose Zf and g are Z integrable on A. Prove that if f (~x) > g(~x) f dµ > g(~x)dµ. in the interior of A, then A

A

Exercise 7.1.56. Suppose f is integrable on A. Prove the following are equivalent. Z 1. |f |dµ = 0. A

Z f dµ = 0 for any subset B ⊂ A with volume.

2. B

Z f dµ = 0 for any rectangle I ⊂ A.

3. B

Z 4.

f gdµ = 0 for any continuous function g on A. A

Z 5.

f gdµ = 0 for any integrable function g on A. A

6. f = 0 at continuous points.

Darboux Sum and Darboux Integral The upper and the lower Darboux sums (3.1.18) and (3.1.19) and the upper and the lower Darboux integrals (3.1.20) may be extended to multivariable functions. Exercise 7.1.57. For a single variable function on [a, b], there are two versions of the Darboux integrals by choosing interval partitions or by choosing partitions with subsets with volumes (lengths). Prove that the two versions are the same. The result can be extended to functions defined on rectangles in Rn .

7.1. RIEMANN INTEGRATION

373 Z

Z Exercise 7.1.58. Prove that

f (~x)dµ~x = limkP k→0 U (P, f ) and

f (~x)dµ~x = A

A

limkP k→0 L(P, f ). Z Exercise 7.1.59. Prove that if f (~x) ≥ 0 on A, then Z f (~x)dµ~x = µ− (GA (f )).

f (~x)dµ~x = µ+ (GA (f )) and

A

A

Z

Z f (~x)dµ~x ≥

Exercise 7.1.60. Prove that

f (~x)dµ~x . Moreover, the equality holds Z f (~x)dµ~x is the common if and only if f (~x) is Riemann integrable on A, and A

A

A

value.

Exercise 7.1.61. Prove that Z Z Z f (~x, ~y )dµ~x dµ~y , f (~x, ~y )dµ~x,~y ≤ A×B

B

A

Z Z

Z

f (~x, ~y )dµ~x,~y ≥ A×B

f (~x, ~y )dµ~x dµ~y . B

A

Z

Z in place of

Also write down the similar inequalities with A

. A

Exercise 7.1.62. Prove the extension of FubiniZtheorem: If f is Riemann Z integrable on A × B, then any function g(~y ) satisfying f (~x, ~y )dµ~x ≤ g(~y ) ≤ A Z Z is Riemann integrable on B, and f (~x, ~y )dµ~x,~y = g(~y )dµ~y . A×B

f (~x, ~y )dµ~x A

B

Exercise 7.1.63. Prove that if f (~x, ~y ) is integrable on A × B, then the collection of ~y for which f is not integrable in ~x is contained in a countable union of subsets of volume 0. Exercise 7.1.64. Study the Darboux integrals and the repeated Darboux integrals for the functions in Exercise 7.1.28.

Integral Continuity Exercise 7.1.65. Suppose A has volume. Suppose f is integrable on an open subset ¯ Prove that containing the closure A. Z lim |f (~x + ~t) − f (~x)|dµ = 0. ~t→~0 A

Kepler’s Second Law on Planet Motion Kepler’s Law says that the line from the sun to the planet will sweep out equal areas in equal intervals of time. Exercise 7.1.66. Suppose φ(t) = (x(t), y(t)) : [a, b] → R2 is a curve, such that the movement from the vector φ to φ0 is counterclockwise. Prove that the area swept by Z 1 b the line connecting the origin and φ(t) as t moves from a to b is (xy 0 − yx0 )dt. 2 a

374

CHAPTER 7. MULTIVARIABLE INTEGRATION

Exercise 7.1.67. Derive Kepler’s law from Exercise 7.1.66 and Newton’s second φ law of motion: φ00 = c , where c is a constant determined by the mass of the kφk32 sun.

Dirichlet Transform The Dirichlet transform is x1 + x2 + · · · + xn = t1 , x2 + · · · + xn = t1 t2 , .. . xn−1 + xn = t1 t2 · · · tn−1 , xn = t1 t2 · · · tn−1 tn . Exercise 7.1.68. Use the Dirichlet transform to prove the relation between the Beta function and the Gamma function B(x, y) =

Γ(x)Γ(y) . Γ(x + y)

Exercise 7.1.69. The standard simplex is ∆n = {(x1 , x2 , . . . , xn ) : xi ≥ 0, x1 + x2 + · · · + xn ≤ 1}. Use the Dirichlet transform to prove that that for p1 , p2 , . . . , pn > 0, we have Z Γ(p1 )Γ(p2 ) · · · Γ(pn ) xp11 −1 xp22 −1 · · · xnpn −1 dx1 dx2 · · · dxn = . (p + p + · · · + pn )Γ(p1 + p2 + · · · + pn ) n 1 2 ∆

7.2

Integration on Hypersurface

The concept of volume can be defined for good geometric objects such as curves and surfaces in Euclidean spaces. Although the definition makes use the parametrizations of the objects, the volume is independent of the parametrizations, thanks to the change of volume formula for the Riemann integral. Then the volume can be further used to define the Riemann integrals on such geometric objects. When the Riemann integrals are used for the physical quantities such as the work or the flux, we get the Riemann integral of differential forms on curves and surfaces, etc. The formalism of differential forms is compatible with orientable change of variables.

7.2.1

Rectifiable Curve

The straight line connecting ~x to ~y has length k~x − ~y k. To define the length of a continuous curve φ : [a, b] → Rn , we take a partition P of [a, b] and approximate the part of φ on [ti−1 , ti ] by connecting a straight line between the end points φ(ti−1 ) and φ(ti ). The length of the approximate curve is X µP (φ) = kφ(ti ) − φ(ti−1 )k.

7.2. INTEGRATION ON HYPERSURFACE

375

We say φ is rectifiable if µP (φ) has an upper bound. The length of a rectifiable curve is µ(φ) = sup µP (φ). P

Proposition 7.2.1. A curve is rectifiable if and only if its coordinate functions have bounded variations. Proof. Because all norms are equivalent, the rectifiability is independent of the choice of the norm. If we take the L1 -norm, then kφ(ti )−φ(ti−1 )k1 = |x1 (ti )−x1 (ti−1 )|+|x2 (ti )−x2 (ti−1 )|+· · ·+|xn (ti )−xn (ti−1 )|, and µP (φ) = VP (x1 ) + VP (x2 ) + · · · + VP (xn ). The proposition then follows. A change of variable (or reparametrization) for a parametrized curve is an invertible continuous map u : [c, d] → [a, b]. The map induces a one-to-one correspondence between the partitions Q of [c, d] and partitions P = u(Q) of [a, b]. Since µP (φ) = µQ (φ ◦ u), and kP k is small if and only if kQk is small, the rectifiability and the length are independent of the choice of parametrization. Proposition 7.2.2. Suppose Ψ : [a, b] → Rn is Riemann integrable. Then Z b Z t kΨ(t)kdt. Ψ(τ )dτ is a rectifiable curve, and its length is φ(t) = ~x0 + a

a

If φ is continuously differentiable, then Ψ = φ0 and Z b µ(φ) = kφ0 (t)kdt.

(7.2.1)

a

Proof. For any partition P of [a, b] and choices t∗i ∈ [ti−1 , ti ], we have

X Z ti

∗

|µP (φ) − S(P, kΨ(t)k)| ≤ Ψ(t)dt − kΨ(ti )k∆ti ti−1

Z X

ti ∗

≤ Ψ(t)dt − ∆t Ψ(t ) i i

ti−1

≤

X

sup

kΨ(t) − Ψ(t∗i )k∆ti .

t∈[ti−1 ,ti ]

The integrability of Ψ shows the right side is small when kP k is small. Example 7.2.1. The graph of a function f (x) on [a, b] is parametrized by φ(x) = (x, f (x)). If f is continuously differentiable, then the Euclidean length of the graph Z bp Z b 02 is 1 + f dt and the L1 -length is (1 + |f 0 |)dt. a

a

376

CHAPTER 7. MULTIVARIABLE INTEGRATION

Example 7.2.2. Consider the following “circular curves”. φ1 (t) = (cos t, sin t), 0 ≤ t ≤ 2π φ2 (t) = (cos t, sin t), 0 ≤ t ≤ 4π φ3 (t) = (cos 2t, sin 2t), 0 ≤ t ≤ 2π φ4 (t) = (cos t, − sin t), 0 ≤ t ≤ 2π ( (cos t, sin t) if 0 ≤ t ≤ π φ5 (t) = (cos t, − sin t) if π < t ≤ 2π Although the images of both φ1 and φ2 are the whole unit circle, the length of φ1 and φ2 are 2π and 4π. As a matter of fact, φ1 wraps around the circle once, and φ2 wraps around twice, and we cannot reparametrize φ1 to get φ2 . Therefore the length of a curve depends on the “track” instead of the image only. The curve φ3 is a reparametrization of φ2 by t → 2t. Therefore the length of φ3 is also 4π. The curve φ4 is a reparametrization of φ1 by t → 2π − t. Therefore the length of φ4 is also 2π. The image of the curve φ5 is the upper unit circle. The curve first moves from (1, 0) to (−1, 0) along the circle and then moves back to (1, 0) along the circle. The total distance 2π of the movement is the length of the curve. 2

2

2

Example 7.2.3. The astroid x 3 + y 3 = a 3 can be parametrized as x = a cos3 t, y = a sin3 t for 0 ≤ t ≤ 2π. The Euclidean length of the curve is Z 2π q (−3a cos2 t sin t)2 + (3a sin2 t cos t)2 dt = 6a. 0

The L∞ -length is Z 2π 1 2 2 √ max{| − 3a cos t sin t|, |3a sin t cos t|}dt = 8 1 − a. 2 2 0 2

2

2

Note that the equality x 3 + y 3 = a 3 only specifies a subset of R2 , which is the image of the astroid. Strictly speaking, the image alone is not sufficient to determine the length, as indicated by Example 7.2.2. On the other hand, we do implicitly understand that the question is to compute the length of one round of the astroid, which is indicated by the range [0, 2π] for the parametrization. Exercise 7.2.1. Prove that the L1 -length of a rectifiable curve is the sum of the variations of the coordinates on the interval. Exercise 7.2.2. Prove that if φ is a rectifiable curve on [a, b], then µ(φ) = µ(φ|[a,c] )+ µ(φ|[c,b] ) for any a < c < b. Exercise 7.2.3. Let n ≥ 2. Prove that the image of any rectifiable curve in Rn has n-dimensional volume zero. Exercise 7.2.4. Suppose φ is a rectifiable curve. Prove that for any > 0, there is δ > 0, such that kP k < δ implies µP (φ) > µ(φ)−. This implies limkP k→0 µP (φ) = µ(φ). Exercise 7.2.5. Suppose F : Rn → Rn is a map satisfying kF (~x) − F (~y )k = k~x −~y k. Prove that µ(F (φ)) = µ(φ). Exercise 7.2.6. Suppose φ(t) is a continuously differentiable curve on [a.b]. For any partition P of [a.b] and choice t∗i , the curve is approximated by the tangent lines φ(t∗i ) + φ0 (t∗i )(t − t∗i ) on intervals [ti−1 , ti ]. Prove that the sum of the lengths of the tangent lines converges to the length of φ as kP k → 0.

7.2. INTEGRATION ON HYPERSURFACE

377

Given a parametrized curve φ(t) : [a, b] → Rn , the arc length function s(t) = µ(φ|[a,t] ) is increasing. By Exercise 3.3.36 and the relation between µP (φ) and the variation, s(t) is also continuous. If φ is not constant on any interval in [a, b], then s(t) is strictly increasing, and the curve can be reparametrized by the arc length. In general, we may pick out the intervals on which φ is constant and modify the parametrization by reducing such intervals to single points. This does not change the arc length and the “track” of φ, and therefore will not affect all the later developments. Thus without loss of generality, we will assume that all curves can be reparametrized by the arc length. Z t

kφ0 (t)kdt by

Suppose φ is continuously differentiable. Then s(t) = a

(7.2.1), or ds = kφ0 (t)kdt. In the special case t = s is the arc length, we get kφ0 (s)k = 1. In other words, φ0 (s) is the tangent vector of unit length. Geometrically, s(t) = µ(φ|[a,t] ) is the arc length counted from the beginning point φ(a) of the curve. It is also possible to count the length s˜(t) = µ(φ|[t,b] ) from the end point φ(b) of the curve. The two arc lengths satisfy s + s˜ = µ(φ) and represent different choices of the directions or orientations of the curve. The arc length s is compatible with the orientation of the parameter t and s˜ is opposite to the orientation of t. The choice of direction will not affect the length, but will affect certain integrals in the future. Example 7.2.4. For the circle φ(θ) = (a cos θ, a sin θ), the Euclidean arc length (counted from θ = 0) is Z

θ

0

Z

kφ (t)k2 dt =

s= 0

θ

p

a2 sin2 t + a2 cos2 tdt = aθ.

0

s s Therefore the circle is parametrized as φ(s) = a cos , a sin by the arc length. a a On the other hand, with respect to the L1 -norm, we have Z

θ

0

Z

kφ (t)k1 dt =

s= 0

θ

|a|(| sin t|+| cos t|)dt = |a|(1−cos θ+sin θ), 0

for 0 ≤ θ ≤

π . 2

Exercise 7.2.7. Find the formula for the Euclidean arc length of a curve in R2 from the parametrized polar coordinate r = r(t), θ = θ(t). Exercise 7.2.8. Compute the Euclidean lengths of the curves. 1. Parabola y = x2 , 0 ≤ x ≤ 1. 2. Spiral r = aθ, 0 ≤ θ ≤ π. 3. Another spiral r = eaθ , 0 ≤ θ ≤ α. 4. Cycloid x = a(t − sin t), y = a(1 − cos t), 0 ≤ t ≤ 2π.

378

CHAPTER 7. MULTIVARIABLE INTEGRATION

5. Cardioid r = 2a(1 + cos θ). 6. Involute of the unit circle x = cos θ + θ sin θ, y = sin θ − θ cos θ, 0 ≤ θ ≤ α. 7. Helix x = a cos θ, y = a sin θ, z = bθ, 0 ≤ θ ≤ α.

We will often use capital letters such as C to denote curves. By this we mean that C is presented by a parametrization but any reparametrization is equally good. Thus different parametrizations are considered to give the same curve C if they are related by invertible continuous maps. Moreover, C can be parametrized by arc length s, which means that only parametrizations that are not constant on any interval are allowed. Thus we have the length µ(C) without any ambiguity. Finally, sometimes we may wish to distinguish the two choice of the directions C may take. This is achieved by restricting the reparametrizations to strictly increasing functions only. This is also equivalent to the choice of one of two possible arc lengths. We will indicate one choice by C and the other choice by −C.

7.2.2

Integration of Function on Curve

Suppose f (~x) is a function defined along a rectifiable curve C. Let φ : [a, b] → Rn be a parametrization of C. For a partition Q of [a, b], we get a partition P of C by the segments Ci between ~xi−1 = φ(ti−1 ) and ~xi = φ(ti ). For choices t∗i ∈ [ti−1 , ti ], we get ~x∗i = φ(t∗i ) ∈ Ci . Define the Riemann sum X X S(P, f ) = f (~x∗i )µ(Ci ) = f (φ(t∗i ))µ(φ|[ti−1 ,ti ] ). The Zfunction is Riemann integrable along the curve, with Riemann integral I=

f ds, if for any > 0, there is δ > 0, such that C

kP k = max µ(Ci ) < δ =⇒ |S(P, f ) − I| < . Z The definition can be easily extended to F ds for maps F : Rn → Rm . C

Since the Riemann sum S(P, f ) depends only on the partition points ~xi , Ci and ~x∗i ∈ Ci , the integral is independent of the choice of parameter t. The arc length s(t) = µ(φ|[a,t] ) is an increasing function, and X S(P, f ) = f (φ(t∗i ))(s(ti ) − s(ti−1 )) = S(P, f ◦ φ, s) is the Riemann-Stieltjes sum of f (φ(t)) with respect to s(t). Therefore we get Z Z b f ds = f (φ(t))ds(t). C

a

The connection to the Riemann-Stieltjes integral gives us many usual properties of the integral along curves. See Exercises 7.2.10 through Z 7.2.16. Moret

over, if the parametrization is further given by φ(t) = ~x0 + Proposition 7.2.2. Then by Theorem 3.3.5, we get Z Z b f ds = f (φ(t))kΨ(t)kdt. C

a

Ψ(τ )dτ as in a

7.2. INTEGRATION ON HYPERSURFACE

379

Example 7.2.5. We try to compute the integral of a linear function l(~x) = ~a · ~x along the straight line C connecting ~u to ~v . The straight line can be parametrized as φ(t) = ~u + t(~v − ~u), with ds = k~v − ~ukdt. Therefore Z 1 Z 1 ~a · (~u + t(~v − ~u))k~v − ~ukdt = (~a · (~u + ~v ))k~v − ~uk. ~a · ~xds = 2 0 C The computation holds to any norm. Example 7.2.6. The integral of |y| along the unit circle with respect to the Euclidean norm is Z π Z 2π Z 2 | sin θ|dθ = 4 | sin θ|dθ = 4. |y|ds = x2 +y 2 =1

0

0

The integral with respect to the L1 -norms is Z Z 2π Z | sin θ|(| sin θ| + | cos θ|)dθ = 4 |y|ds = x2 +y 2 =1

0

π 2

sin θ(sin θ + cos θ)dθ = 4.

0

Example 7.2.7. The integrals of |y| along the curves C1 , C2 , C3 , C4 in Example 7.2.2 with respect to the Euclidean norm are Z Z Z 2π p |y|ds = |y|ds = | sin t| sin2 t + cos2 tdt = 4, ZC1 ZC4 Z0 Z Z |y|ds = |y|ds = |y|ds + |y|ds = 2 |y|ds = 8, C2

φ2 |[0,2π]

C3

Z |y|ds = C5

φ2 |[2π,4π]

Z

Z |y|ds +

φ5 |[0,π]

C1

Z |y|ds = 2

φ5 |[π,2π]

|y|ds = 4. φ5 |[0,π]

Note that C2 may be split into two parts for 0 ≤ t ≤ 2π and 2π ≤ t ≤ 4π. The first part is C1 and the second part is also C1 but reparametrized by t → t + 2π. Therefore the integral on C2 is the double of the integral on C1 . The curve C4 may be split into two parts for 0 ≤ t ≤ π and π ≤ t ≤ 2π. The second part is the first part reparametrized by t → 2π − t. Thus the integral on C4 , which is the sum of the two parts, is the double of the first part. Exercise 7.2.9. Compute the integral along the curve with respect to the Euclidean norm. Z 1. |y|ds, C is the unit circle x2 + y 2 = 1. C

Z 2.

xyds, C is the part of the ellipse C

Z

4

4

2

x2 y 2 + 2 = 1 in the first quadrant. a2 b 2

2

(x 3 + y 3 )ds, C is the astroid x 3 + y 3 = a 3 .

3. C

Z 4.

(a1 x + a2 y + a3 z)ds, C is the circle x2 + y 2 + z 2 = 1, b1 x + b2 y + b3 z = 0.

C

Exercise 7.2.10. Prove that f is integrable alongPa curve C if and only if for any > 0, there is δ > 0, such that kP k < δ implies ωCi (f )µ(Ci ) < . Exercise 7.2.11. Prove that continuous functions and monotone functions (the concept is defined along curves) are integrable along rectifiable curves.

380

CHAPTER 7. MULTIVARIABLE INTEGRATION

Exercise 7.2.12. Suppose f is integrable along a curve. Suppose g is a uniformly continuous function on values of f . Prove that g ◦ f is integrable along the curve. Exercise 7.2.13. Prove that the sum and the product of integrable functions along a curve are still integrable along the curve, and Z Z Z Z Z (f + g)ds = f ds + gds, cf ds = c f ds. C

C

C

C

C

Moreover, prove that Z f ≤ g =⇒

Z f ds ≤

C

gds. C

Exercise 7.2.14. Suppose Zf is a continuous function along a curve C. Prove that f ds = f (~c)µ(C). there is ~c ∈ C, such that C

Exercise 7.2.15. Suppose C is divided into two parts C1 and C2 . Prove that a function is integrable on C if and only if it is integrable on C1 and C2 . Moreover, Z Z Z f ds = f ds + f ds. C

C1

C2

Exercise 7.2.16. Suppose F Z: Rn → Rn is a map satisfying kF (~x)−F (~y )k = k~x −~y k. Z Prove that f (~x)ds = f (F (~x))ds. F (C)

C

Exercise 7.2.17. Suppose f is a Riemann integrable function along a rectifiable ~b. For any ~x ∈ C, denote by C[~a, ~x] the part of C between curve C connecting ~a to Z ~a and ~x. Then F (~x) =

f ds can be considered as the “antiderivative” of f C[~a,~ x]

along C. By using the concept, state and prove the integration by part formula for the integral along C. Exercise 7.2.18. Consider a map F : C ⊂ Rn → Rm on a rectifiable curve C. Prove that F is integrable along C if and only ifZ each coordinate of F is integrable. Moreover, discuss properties of the integral

F ds. C

7.2.3

Integration of 1-Form on Curve

In physics, the force can be represented as a vector. The work done by a constant force F along the straight line from ~a to ~b is F ·(~b−~a). Now consider the work done by a changing force along a curve. Specifically, consider a vector field F : C → Rn along a rectifiable curve C. For a partition P of C and choices ~x∗i ∈ Ci , define the Riemann sum X S(P, F ) = F (~x∗i ) · (~xi − ~xi−1 ). Z F ·d~x is defined as the limit of the Riemann sum as kP k → 0.

The integral C

If the direction of the curve is reversed, then the order of the partition points ~xi is reversed, and the Riemann sum changes the sign. The observation leads to Z Z F · d~x = − F · d~x. −C

C

7.2. INTEGRATION ON HYPERSURFACE

381

a

Z

Z f dx = −

The equality can be compared with b

b

f dx. a

Denote F = (f1 , f2 , . . . , fn ). Let φ = (x1 , x2 , . . . , xn ) : [a, b] → Rn be a parametrization of C. For a partition Q of [a, b] and choices t∗i , we have the corresponding partition P = φ(Q) of C and choices ~x∗i = φ(t∗i ), and S(P, F ) =

X

F (φ(t∗i )) · (φ(ti ) − φ(ti−1 ))

=

X

(f1 (φ(t∗i ))∆x1i + f2 (φ(t∗i ))∆x2i + · · · + fn (φ(t∗i ))∆xni )

= S(P, f1 ◦ φ, x1 ) + S(P, f2 ◦ φ, x2 ) + · · · + S(P, fn ◦ φ, xn ), which is the sum of the Riemann-Stieljes sums of fi (φ(t)) with respect to xi (t). Therefore if fi (φ(t)) are integrable with respect to xi (t), then Z

Z

b

F · d~x = C

b

Z

Z f2 (φ(t))dx2 (t) + · · · +

f1 (φ(t))dx1 (t) +

fn (φ(t))dxn (t). a

a

a

b

Because of the connection to the Riemann-Stieljes integral, we also denote Z Z f1 dx1 + f2 dx2 + · · · + fn dxn . F · d~x = C

C

The expression F · d~x = f1 dx1 + f2 dx2 + · · · + fn dxn is called a 1-form (a differential form of order 1). Z t Ψ(τ )dτ as in Proposition 7.2.2. Then Suppose φ(t) = ~x0 + a

|S(P, F ) − S(Q, (F ◦ φ) · Ψ)| Z ti X ∗ F (φ(ti )) · Ψ(t)dt − F (φ(t )) · ∆t Ψ(t ≤ ) i i i ti−1

Z ti X

Ψ(t)dt − ∆ti Ψ(t∗i ) ≤ kF (φ(ti ))k2

ti−1

≤

X

kF (φ(ti ))k2

kΨ(t) −

sup

2 ∗ Ψ(ti )k2 ∆ti

t∈[ti−1 ,ti ]

If F is bounded, then the integrability of Ψ implies that the right side is very small when kQk is very small. Therefore the 1-form F · d~x is integrable if and only if the function F (φ(t)) · Ψ(t) is Riemann integrable, and Z

Z F · d~x =

C

b

F (φ(t)) · Ψ(t)dt. a

For more properties of the integral of a 1-form along a curve, see Exercises 7.2.22 through 7.2.29.

382

CHAPTER 7. MULTIVARIABLE INTEGRATION

Example 7.2.8. Consider three curves connecting (0, 0) to (1, 1). The curve C1 is the straight line φ(t) = (t, t). The curve C2 is the parabola φ(t) = (t, t2 ). The curve C3 is the straight line from (0, 0) to (1, 0) followed by the straight line from (1, 0) to (1, 1). Then Z 1 Z (tdt + tdt) = 1, ydx + xdy = 0

C1

Z

Z

1

(t2 dt + t · 2tdt) = 1,

ydx + xdy = 0

C2

Z

Z ydx + xdy =

C3

1

1

Z

1dy = 1.

0dx + 0

0

We note that the result is independent of the choice of the curve. In fact, for any continuously differentiable curve φ(t) = (x(t), y(t)), t ∈ [a, b], connecting (0, 0) to (1, 1), we have Z b Z b Z 0 0 (x(t)y(t))0 dt (y(t)x (t) + x(t)y (t))dt = ydx + xdy = a

a

C

= x(b)y(b) − x(a)y(a) = 1 − 0 = 1. Example 7.2.9. Taking the three curves in Example 7.2.8 again, we have Z Z 1 4 xydx + (x + y)dy = (t2 dt + 2tdt) = , 3 C1 0 Z Z 1 17 xydx + (x + y)dy = (t3 dt + (t + t2 )2tdt) = , 12 C2 0 Z 1 Z Z 1 3 0dx + (1 + y)dy = . xydx + (x + y)dy = 2 0 C3 0 In contrast to the integral of ydx + xdy, the integral of xydx + (x + y)dy depends on the curve connecting the two points. Example 7.2.10. Consider the integral of F = (−y, x) along the four parametrized curves in Example 7.2.2. Z Z 2π Z 2π F · d~x = (− sin t, cos t) · (− sin t, cos t)dt = dt = 2π, C1

0

Z

Z

0 4π

F · d~x = C2

Z

dt = 4π,

0

Z

Z

0 2π

F · d~x = C3

4π

(− sin t, cos t) · (− sin t, cos t)dt =

Z (− sin 2t, cos 2t) · (−2 sin 2t, 2 cos 2t)dt =

0

Z

Z

2π

Z (sin t, cos t) · (− sin t, − cos t)dt = −

0

Z

Z

π

Z

0

0

(sin t, cos t) · (− sin t, − cos t)dt π

π

Z

2π

dt −

=

dt = −2π,

2π

(− sin t, cos t) · (− sin t, cos t)dt + Z

2π

0

F · d~x = C5

2dt = 4π, 0

F · d~x = C4

2π

dt = 0. π

The discussion in Example 7.2.7 on the relation between the integrals on the four curves is still valid. Note that the reparametrizations t → t + 2π in C2 and

7.2. INTEGRATION ON HYPERSURFACE

383

t t → in C3 preserve the direction, and the reparametrization t → 2π − t in C4 2 and C5 reverses the direction. In particular, the integral along C5 , which is the sum of the two parts, must be zero. Exercise 7.2.19. Compute the integrals of the 1-forms on the three curves in Example 7.2.8. In case the three integrals are the same, can you provide a general reason? 1. xdx + ydy.

3. (2x + ay)ydx + (x + by)xdy.

2. ydx − xdy.

4. ex (ydx + ady).

Exercise 7.2.20. Compute the integral of 1-form. Z ydx − xdy 1. , C is upper half circle in the counterclockwise direction. 2 2 C x +y Z 2. (2a−y)dx+dy, C is the cycloid x = at−b sin t, y = a−b cos t, 0 ≤ t ≤ 2π. C

Z 3.

xdy + ydz + zdx, C is the straight line connecting (0, 0, 0) to (1, 1, 1). C

Z (x + z)dx + (y + z)dy + (x − y)dz, C is the helix x = cos t, y = sin t, z = t,

4. C

0 ≤ t ≤ α. Z 5. (y 2 −z 2 )dx+(z 2 −x2 )dy+(x2 −y 2 )dz, C is the boundary of the intersection C

of the sphere x2 + y 2 + z 2 = 1 with the first quadrant, and the direction is (x, y)-plane part, followed by (y, z)-plane part, followed by (z, x)-plane part. Exercise 7.2.21. Suppose A is a symmetric matrix. Compute the integral of A~x ·d~x on any continuously differentiable parametrized curve. Exercise 7.2.22. Prove that if F is continuous, then the 1-form F · d~x is integrable along a rectifiable curve. Exercise 7.2.23. Prove that if the 1-form F · d~x is integrable along a rectifiable curve and G is uniformly continuous on values of F , then (G ◦ F ) · d~x is integrable. Exercise 7.2.24. Prove that the sum and scalar multiplication of integrable 1-forms are integrable, and Z Z Z Z Z (F + G) · d~x = F · d~x + G · d~x, cF · d~x = c F · d~x. C

C

C

C

C

Exercise 7.2.25. Prove the inequality Z F · d~x ≤ µ(C) sup kF k2 , C

C

where µ(C) is the Euclidean length of the curve C. Exercise 7.2.26. Suppose C is divided into two parts C1 and C2 . Prove that a 1-form is integrable on C if and only if it is integrable on C1 and C2 . Moreover, Z Z Z F · d~x = F · d~x + F · d~x. C

C1

C2

384

CHAPTER 7. MULTIVARIABLE INTEGRATION

Exercise 7.2.27. Suppose Z and U is an orthogonal linear Z ~a is a constant vector F (~a + U (~x)) · d~x. U (F (~x)) · d~x = transform. Prove that ~a+U (C)

C

Z Exercise 7.2.28. Suppose c is a constant. Prove that

Z F (~x)·d~x = c

cC

F (c~x)·d~x. C

Exercise 7.2.29. Define ωC (F ) = sup~x,~y∈C kF (~xP ) − F (~y )k. Prove that if for any > 0, there is δ > 0, such that kP k < δ implies ωCi (F )µ(Ci ) < , where Ci are the segments of C cut by the partition P , then F · d~x is integrable along C. Show that the converse is not true.

7.2.4

Surface Area

Suppose σ(~u) = σ(u, v) : A ⊂ R2 → Rn is a parametrized surface on a subset A with area. For a partition P of R2 by triangles, we consider the triangles I ∈ P that are contained in A. Each triangle is determined by its three vertices ~u0 , ~u1 , ~u2 . Let I σ ⊂ Rn be the triangle with σ(~u0 ), σ(~u1 ), σ(~u2 ) as the vertices. Then it is natural to think of the union of such I σ as an approximation of the parametrized surface. By (5.2.13), the area of I σ is 1X k(σ(~u1 ) − σ(~u0 )) × (σ(~u2 ) − σ(~u0 ))k2 . 2 I We expect the sum of the areas approaches the area of the parametrized surface as kP k → 0. It turns out that the expectation is right for curves but wrong for surfaces. For curves, the direction of the straight line connecting two nearby points φ(ti−1 ) and φ(ti ) is very close to the direction φ0 (t∗i ). For surfaces, the direction of the plane passing through three nearby points σ(~u0 ), σ(~u1 ), σ(~u2 ) may be far away from the direction of the tangent plane σ(~u∗I ) + σ 0 (~u∗I )(~u − ~u∗I ). Example 7.2.11. Consider the cylinder σ(θ, z) = (cos θ, sin θ, z) for 0 ≤ z ≤ 1. π 1 Let α = and d = . We cut the cylinder at heights 0, d, 2d, . . . , 2nd and m 2n get 2n + 1 unit circles. On the unit circles at heights 0, 2d, 4d, . . . , 2nd, we plot m points at angles 0, 2α, 4α, . . . , (2m − 2)α. On the unit circles at the heights d, 3d, 5d, . . . , (2n − 1)d, we plot m points at angles α, 3α, . . . , (2m − 1)α. By connecting nearby triple p points, we get 2mn identical isosceles triangles with base 2 sin α and height (1 − cos α)2 + d2 . The total area is p 1 sin α 4mn 2 sin α (1 − cos α)2 + d2 = 2π 2 α

s

1 − cos α d

2 + 1.

This has no limit as α, d → 0.

So it is necessary to make sure that the direction of the approximate planes to be really close to the direction of the tangent plane. Assume the parametrization σ is continuously differentiable. Take any general partition P of A, choose ~u∗I ∈ I. Then σ is approximated on I by the linear map LI (~u) = σ(~u∗I ) + σ 0 (~u∗I )(~u − ~u∗I ).

7.2. INTEGRATION ON HYPERSURFACE ........ ......... . . 1 ... .. ......... .. ........ .. ............................................................................... ......... ........... ... . . . . . . . . . (2j+1)d .... . . . . . . . . .. ......... ....................................... ............................... . . . . . . .. ......................................................................................................................................... 2jd ... . .. .. . .. .................... ........... ............. ...... .............. ............................. . . . . ............................................2iα . .. . .. .... .... .... . .............................. (2j−1)d ... ... .... .... ..... ........................................ . ............ ............ ............... .. ............... .. .. ................................................... ..(2i−3)α ................ . .........................................................(2i+3)α .. (2i−1)α (2i+1)α 0 ... ......... ........

385

Figure 7.3: a bad approximation of the cylinder The area of σ(I) is approximated by the area of LI (I). By Proposition 7.1.15, the area of LI (I) is kσ 0 (~u∗I )(~e1 ) × σ 0 (~u∗I )(~e2 )k2 µ(I) = kσu (~u∗I ) × σv (~u∗I )k2 µ(I). P Thus the area of the surface is approximated by I∈P kσu (~u∗I )×σv (~u∗I )k2 µ(I), which is the Riemann sum for the integral of the function kσu × σv k2 on A. Therefore the area of a surface parametrized by a continuously differentiable map σ(u, v) : A ⊂ R2 → Rn is Z Z q µ(S) = kσu × σv k2 dudv = kσu k22 kσv k22 − (σu · σv )2 dudv. A

A

We also denote dA = kσu × σv k2 dudv, where A indicates the area. Intuitively, the area of a surface should be independent of the parametrization. To verify the intuition, we consider a change of variable (u, v) = Φ(s, t) : B → A. We have σs = us σu + vs σv , σt = ut σu + vt σv , and by (5.2.12), ∂(u, v) σu × σv . (7.2.2) σs × σt = det ∂(s, t) Then by the change of variable formula, we have Z Z Z ∂(u, v) kσs × σt k2 dsdt = kσu × σv k2 dudv. det ∂(s, t) kσu × σv k2 dsdt = B B A Example 7.2.12. The graph of a continuously differentiable function f (x, y) defined for (x, y) ∈ A is naturally parametrized as σ(x, y) = (x, y, f (x, y)). By σx × σy = (1, 0, fx ) × (0, 1, fy ) = (−fx , −fy , 1), the area of the surface is Z Z q k(1, 0, fx ) × (0, 1, fy )k2 dxdy = 1 + fx2 + fy2 dxdy. A

A

386

CHAPTER 7. MULTIVARIABLE INTEGRATION

In case z = f (x, y) is implicitly defined by g(x, y, z) = c, the formula becomes q Z Z gx2 + gy2 + gz2 dxdy dxdy = , |gz | A A cos θ where ∇g = (gx , gy , gz ) is normal to the tangent space, and θ is the angle between the tangent plane and the (x, y)-plane. Example 7.2.13. The surface obtained by rotating a curve x = x(t), y = y(t), t ∈ [a, b], around the x-axis is σ(t, θ) = (x(t), y(t) cos θ, y(t) sin θ). The area is Z k(x0 , y 0 cos θ, y 0 sin θ) × (0, −y sin θ, y cos θ)k2 dtdθ [a,b]×[0,2π] Z b

p |y(t)| x0 (t)2 + y 0 (t)2 dt.

=2π a

In particular, the area of the unit sphere is (see Example 7.1.11) Z π p | sin t| (cos t)02 + (sin t)02 dt = 4π. β2 = 2π 0

The torus in Example 6.1.10 is obtained by rotating the circle x = a + b cos φ, z = b sin φ around the z-axis. So the torus has area Z 2π p 2π (a + b cos φ) (a + b cos φ)02 + (b sin φ)02 dφ = 2πab. 0

Example 7.2.14. The parametrized surface σ(u, v) = (cos(u+v), sin(u+v), cos u, sin u) π for 0 ≤ u, v ≤ has area 2 Z Z q p π2 2 2 2 kσu k2 kσv k2 − (σu · σv ) dudv = 2 · 1 − 12 dudv = . 4 [0, π2 ] [0, π2 ] Exercise 7.2.30. Find the area of the graph of F : R2 → Rn−2 . The graph is a surface in Rn . Exercise 7.2.31. Study the effect of transforms of Rn on the areas of surfaces. 1. For the shifting ~x → ~a + ~x, prove µ(~a + S) = µ(S). 2. For the scaling ~x → c~x, prove µ(cS) = c2 µ(S). 3. For an orthogonal linear transform U , prove µ(U (S)) = µ(S). Exercise 7.2.32. Establish and prove the Pappus-Guldinus theorem (see Example 7.1.12) for the area of surfaces obtained by rotation. Exercise 7.2.33. Find the area of surface. 1. The boundary surface of the common part of the ball x2 + y 2 + z 2 ≤ R2 and the cylinder x2 + y 2 ≤ Rx. 1

2

2. The surface (x2 + y 2 ) 3 + z 3 = 1. 3. The parametrized surface σ(u, v) = (u+v, u−v, u2 +v 2 , u2 −v 2 ) for 0 ≤ u ≤ a, 0 ≤ v ≤ b.

7.2. INTEGRATION ON HYPERSURFACE

7.2.5

387

Integration of Function on Surface

Suppose f (~x) is a bounded function on a surface S parametrized by σ(u, v) : A ⊂ R2 → Rn . For a general partition P of A and choices (u∗I , vI∗ ) ∈ I for I ∈ P , we have the Riemann sum X S(P, f, σ) = f (σ(u∗I , vI∗ ))µ(σ(I)). (7.2.3) Z

Z f dµ =

The integral

f dA of f on the surface is defined as the limit of

S

S

the Riemann sum as kP k → 0. The notation dA indicates the integration is against the area. The Riemann sum satisfies |S(P, f, σ) − S(P, (f ◦ σ)kσu × σv k2 )| Z X ∗ ∗ ∗ ∗ ∗ ∗ ≤ |f (σ(uI , vI ))| kσu × σv k2 dudv − kσu (uI , vI ) × σv (uI , vI )k2 µ(I) I X ∗ ∗ ≤ |f (σ(uI , vI ))| sup kσu (u, v) × σv (u, v) − σu (u∗I , vI∗ ) × σv (u∗I , vI∗ )k2 µ(I). (u,v)∈I

By the continuity of σu × σv and the boundedness of f , the right side is small when kP k is small. Therefore f is Riemann integrable on the surface if and only if (f ◦ σ)kσu × σv k2 is Riemann integrable on A. When the parametrization is regular, this is also equivalent to f ◦ σ is Riemann integrable on A. Moreover, we have Z Z f dA = f (σ(u, v))kσu (u, v) × σv (u, v)k2 dudv. S

A

The integral has properties similar to the integral of functions on rectifiable curves in Section 7.2.2. Moreover, the integral is independent of the choice of the parametrization, either by an argument similar to the area of the surface, or by the fact that the Riemann sum (7.2.3) is essentially given by a partition of the surface and a choice of points on the surface. p 2 2 Example 7.2.15. Let S be the part q of the cone z = √ x + y cut by the cylinder (x − 1)2 + y 2 = 1. Then dA = 1 + zx2 + zy2 dxdy = 2dxdy, and Z

p √ (xy + (x + y) x2 + y 2 ) 2dxdy.

Z (xy + yz + zx)dA = (x−1)2 +y 2 ≤1

S

π π The region (x − 1)2 + y 2 ≤ 1 can be described as 0 ≤ r ≤ 2 cos θ, − ≤ θ ≤ in 2 2 the polar coordinate. Therefore Z

Z (xy + yz + zx)dA =

S

π 2

Z

2 cos θ 2

√

(r cos θ sin θ + r(cos θ + sin θ)r) 2rdr dθ

− π2

0

√ Z =4 2

π 2

− π2

(cos θ sin θ + cos θ + sin θ) cos4 θdθ =

64 √ 2. 15

388

CHAPTER 7. MULTIVARIABLE INTEGRATION

Example 7.2.16. For a fixed vector ~a ∈

R3 ,

Z f (~a · ~x)dA on

consider the integral S2

the unit sphere. There is a rotation that moves ~a to (k~ak2 , 0, 0). Since the rotation preserves the areas of surfaces in R3 , we have Z Z Z f (~a · ~x)dA = f ((k~ak2 , 0, 0) · (x, y, z))dA = f (k~ak2 x)dA S2

S2

Parametrize S 2 by σ(x, θ) = (x, and Z Z f (k~ak2 x)dA = S2

S2

√

√ 1 − x2 cos θ, 1 − x2 sin θ). Then kσx ×σθ k2 = 1 Z

1

f (k~ak2 x)dx.

f (k~ak2 x)dxdθ = 2π −1

−1≤x≤1,0≤θ≤2π

Exercise 7.2.34. Use the change of variable formula to prove that the Riemann integral on a surface is independent of the parametrization. Exercise 7.2.35. Study the effect of transforms of Rn on the integrals on surfaces. Z Z f (~a + ~x)dA. f (~x)dA = 1. For the shifting ~x → ~a + ~x, prove S

~a+S

Z 2. For the scaling ~x → c~x, prove

f (~x)dA = c2

cS

Z f (c~x)dA. S

Z 3. For an orthogonal linear transform U , prove

Z f (~x)dA =

U (S)

f (U (~x))dA. S

Exercise 7.2.36. Compute the integral on surface. Z dA 1. , 1 ≥ b > a > 0. x2 +y 2 +z 2 =1,a≤z≤b z Z Z 2 2. x dA and x2 y 2 dA, S is the sphere x2 + y 2 + z 2 = 1. S

S

Z

dA , S is the boundary of the tetrahedron x + y + z ≤ 1, x ≥ 0, 2 S (1 + x + y) y ≥ 0, z ≥ 0. Z 4. (x + y + z)dA, T 2 is part of the torus in (5.1.20) in the quadrant x, y ≥ 0.

3.

T2

Z 5.

(x21 + x22 )dA, S is the surface σ(u, v) = (u + v, u − v, u2 + v 2 , u2 − v 2 ) for

S

0 ≤ u ≤ a, 0 ≤ v ≤ b. Z Exercise 7.2.37. Compute the attracting force S2

on a point ~a.

7.2.6

~x − ~a dA of the unit sphere k~x − ~ak32

Integration of 2-Form on Surface

In physics, a flow can be represented as a vector. The flux of a constant flow F through a region A on a plane ~a · ~x = b in R3 is (F · ~n)µ(A), where ~a ~n = has unit Euclidean length and is called a normal vector of the k~ak2

7.2. INTEGRATION ON HYPERSURFACE

389

plane. A plane in R2 has two normal vectors ~n and −~n that indicate “two sides” of the plane. The sign of the flux indicates whether the flow is the same as or against the normal direction. Suppose S ⊂ R3 is a surface parametrized by a regular continuously differentiable map σ(u, v) : A ⊂ R2 → R3 . Then the parametrization gives a normal vector σu × σv ~n = kσu × σv k2 of (the tangent plane of) the surface. Geometrically, the equality means that the order σu , σv , ~n follows the “right hand rule”. Sometimes instead of one differentiable map that parametrizes the whole surface, we may need several regular parametrizations that combine to cover the whole surface. In this case, we require the normal vectors from different parametrizations coincide on their overlapping. Then the whole surface has a continuous choice of the normal vectors. Such a choice is called an orientation of the surface. Example 7.2.17. The graph of a continuously differentiable function f (x, y) is naturally parametrized as σ(x, y) = (x, y, f (x, y)). By the computation in Exercise 7.2.12, the normal vector induced by the parametrization is ~n =

(−fx , −fy , 1) (−fx , −fy , 1) . =q k(−fx , −fy , 1)k2 1 + fx2 + fy2

Note that the vector points to the positive direction of z, which is upward in the usual way of depicting the (x, y, z) coordinates. To induce the normal vector in the downward direction, we may use the parametrization σ(y, x) = (x, y, f (x, y)) (or perhaps σ(u, v) = (v, u, f (v, u)) in the less confusing way). 2 = {(x, y, z) : x2 + y 2 + z 2 = R2 } of radius R can Example 7.2.18. The sphere SR be covered by the paramerizations p σ1 (x, y) = (x, y, R2 − x2 − y 2 ), x2 + y 2 < R2 , p σ2 (y, x) = (x, y, − R2 − x2 − y 2 ), x2 + y 2 < R2 , p σ3 (z, x) = (x, R2 − x2 − z 2 , z), x2 + z 2 < R2 , p σ4 (x, z) = (x, − R2 − x2 − z 2 , z), x2 + z 2 < R2 , p σ5 (y, z) = ( R2 − y 2 − z 2 , y, z), y 2 + z 2 < R2 , p σ6 (z, y) = (− R2 − y 2 − z 2 , y, z), y 2 + z 2 < R2 .

By

x y x y (σ1 )x × (σ1 )y = 1, 0, − , ,1 , × 0, 1, − = z z z z The normal vector from the first parametrization is (σ1 )x × (σ1 )y (x, y, z) ~x = = . k(σ1 )x × (σ1 )y k2 k(x, y, z)k2 R ~x is the normal vector from the other R ~x parametrizations. Therefore the choice ~n = gives an orientation of the sphere. R

Similar computations also show that

390

CHAPTER 7. MULTIVARIABLE INTEGRATION

Note that the order of the variables in σ2 are deliberately arranged to have y as the first and x as the second. Thus the normal vector should be computed from (σ2 )y × (σ2 )x instead of the other way around.

Under a change of variable u = u(s, t), v = v(s, t), by (7.2.2), we find the normal vectors computed from the two parametrizations are the same if ∂(u, v) and only if det > 0. Now on the overlapping of two parametrizations ∂(s, t) σ1 (u, v), σ2 (s, t) of a surface, the parameters (u, v) and (s, t) are related as a change of variables. We say the two parametrizations are compatibly oriented if the Jacobian matrix of the change of variable on the overlapping has positive determinant. Then we conclude that an orientation of the surface is the same as a collection of compatibly oriented parametrizations. It is possible for a surface to be nonorientable in the sense that there is no continuous choice of the normal vectors. This is equivalent to the nonexistence of a collection of compatibly oriented parametrizations. A typical example is the M¨obius band. ................................................. .......................... ........... . . . . . . . . . . . ........ .. ...... ...... .... . .... . . . ..... . . . ... . .. ......... ... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..... . .................. ....................................... .. . . . . .......................................... ........... .. . . ... .... .... .... ..... . . . . . . . . ........ ........ ............. ............................................................................ Figure 7.4: M¨obius band Back to flux, suppose S ⊂ R3 is a surface with orientation ~n. The flow is a vector field F : S → Rn on the surface. For a regular continuously differentiable parametrization σ : A → S compatible with the orientation, a general partition P of A and choices (u∗I , vI∗ ) ∈ I for I ∈ P , the total flux of the flow through the surface is approximated by the Riemann sum X S(P, F, ~n, σ) = F (σ(u∗I , vI∗ )) · ~n(σ(u∗I , vI∗ ))µ(σ(I)). (7.2.4) The integral of the flow along the surface is defined as the limit of the Riemann sum as kP k → 0. Since the Riemann sum S(P, F, ~n, σ) is the same as the Riemann sum S(P, F ·~n, σ) for the integral of the function F ·~n on the surface, the integrability of F along the surface is the same as the integrability of F · ~n, and Z Z F · ~ndA = F (σ(u, v)) · (σu (u, v) × σv (u, v))dudv. (7.2.5) S

A

The integral has properties similar to the integral of 1-forms along rectifiable curves in Section 7.2.3. Moreover, the integral is independent of the choice of the orientation compatible parametrization, either by an argument similar to the area of the surface, or by the fact that the Riemann sum (7.2.4) is essentially given by a partition of the surface, a choice of points on the surface, and the orientation (indicated by the normal vector ~n).

7.2. INTEGRATION ON HYPERSURFACE

391

Example 7.2.19. By the computation of the normal vector in Example 7.2.18, the (outgoing) flux of the flow F = ~x = (x, y, z) through the sphere of radius R is Z Z ~x 2 ~x · dA = RdA = Rµ(SR ) = R3 β2 = 4πR3 . 2 2 R SR SR Example 7.2.20. The outgoing flux of the flow F = (x2 , y 2 , z 2 ) through the sphere k~x − ~x0 k22 = (x − x0 )2 + (y − y0 )2 + (z − z0 )2 = R2 is Z (x − x0 , y − y0 , z − z0 ) (x2 , y 2 , z 2 ) · dA. R k~ x−~ x0 k2 =R By the shifting ~x → ~x0 + ~x, the integral is equal to Z 1 ((x + x0 )2 x + (y + y0 )2 y + (z + z0 )2 z)dA. R k~xk2 =R Z Z By suitable rotations that exchange the axis, we have xn dA = y n dA = k~ xk2 =R Z k~xk2 =R Z n xn dA = z dA. By the transform (x, y, z) → (−x, y, z), we also have k~ xk2 =R

k~ xk2 =R

0 for odd integers n. Thus the integral above becomes Z Z 1 2 (2x0 x2 + 2y0 y 2 + 2z0 z 2 )dA = (x0 + y0 + z0 ) (x2 + y 2 + z 2 )dA R k~xk2 =R 3R k~ xk2 =R 8 = π(x0 + y0 + z0 )R3 . 3 Exercise 7.2.38. Use the change of variable formula to prove that the Riemann integral (7.2.5) on a surface is independent of the orientation compatible parametrization. Exercise 7.2.39. Study the effect of transforms of Rn on the flux, similar to Exercise 7.2.35. Exercise 7.2.40. Compute the flux. 1. F = (x, y, z), S is a triangle with vertices ~a, ~b, ~c and normal vector ~n (which is necessarily orthogonal to ~b − ~a and ~c − ~a). 2. F = (f (x), g(y), h(z)), S is the boundary of the rectangle [a1 , a2 ] × [b1 , b2 ] × [c1 , c2 ] with ~n pointing outward. 3. F = (xy 2 , yz 2 , zx2 ), S is the sphere (x − x0 )2 + (y − y0 )2 + (z − z0 )2 = 1 with ~n pointing outward.

The right side of (7.2.5) requires the compatibility between the normal vector ~n and the parametrization σ(u, v). The compatibility condition can be included in the integral if a more suitable formal notation is used. We formally extend the cross product to the differential 1-forms. Then for the parametrization σ(u, v) = (x(u, v), y(u, v), z(u, v)), we have dx × dy = (xu du + xv dv) × (yu du + yv dv) ∂(x, y) = (xu yv − xv yu )du × dv = det du × dv ∂(u, v)

392

CHAPTER 7. MULTIVARIABLE INTEGRATION

and the similar formulae for dy × dz and dz × dx. Furthermore, denote F = (f, g, h). Then [F · (σu × σv )]du × dv =[f (yu zv − yv zu ) + g(zu xv − zv xu ) + h(xu yv − xv yu )]du × dv =f dy × dz + gdz × dx + hdx × dy. We also note the formal property dv × du = −du × dv, which is compatible with the fact that exchanging u and Z v would reverse the orientation (changing ~n to −~n) and change the sign of

F · ~ndA. S

Z

The discussion suggests that the integral F · ~ndA should really be S Z f dy × dz + gdz × dx + hdx × dy. Since the cross product is a denoted S

special case of the exterior product, we call ω = f dy ∧ dz + gdz ∧ dx + hdx ∧ dy a differential 2-form on R3 and, inspired by the computation of [F · (σu × σv )]du×dv above, define the integral of the differential 2-form on the oriented surface by Z Z ∂(y, z) ∂(z, x) ∂(x, y) f dy∧dz+gdz∧dx+hdx∧dy = f det + g det + h det dudv. ∂(u, v) ∂(u, v) ∂(u, v) S A The use of exterior product allows us to extend the integral of 2-forms to surfaces in the other dimensions. Define the integral of a differential 2-form X ω= fij (~x)dxi ∧ dxj i<j

on a surface S ⊂ Rn parametrized by a regular continuously differentiable map σ(u, v) : A ⊂ R2 → Rn to be Z XZ ∂(xi , xj ) fij (σ(u, v)) det ω= dudv. (7.2.6) ∂(u, v) S i<j A Similar to the case n = 3, we are concerned about whether the integral depends on the choice of the parametrization. Under a change of variable u = u(s, t), v = v(s, t), we have Z ∂(xi , xj ) f (σ(u, v)) det dudv ∂(u, v) Auv Z ∂(xi , xj ) ∂(u, v) = det dsdt. f (σ(s, t)) det ∂(u, v) ∂(s, t) Ast ∂(xi , xj ) ∂(xi , xj ) ∂(u, v) = , the right side fits into the definition (7.2.6) ∂(s, t) ∂(u, v) ∂(s, t) ∂(u, v) in terms of the new variable (s, t) if and only if det > 0. Therefore ∂(s, t)

By

7.2. INTEGRATION ON HYPERSURFACE

393

the definition of the integral of a 2-form on a surface is not changed if the parametrizations are compatibly oriented. In general, a surface S ⊂ Rn may be covered by finitely many parametrized pieces σi : Ai ⊂ R2 → Si ⊂ Rn , S = ∪Si . On the overlap Si ∩Sj of the pieces, we get the change of variable among different parametrizations (called transition map) ϕij = σj−1 σi : Aij = σi−1 (Si ∩ Sj ) → Aji = σj−1 (Si ∩ Sj ). The parametrizations are compatibly oriented if det ϕ0ij > 0 for any i, j. The surface is orientable if there is a compatibly oriented collection of parametrizations that covers the surface. Such a collection gives an orientation of the surface. Any other parametrization σ∗ : A ⊂ R2 → S∗ ⊂ S is compatible with the orientation if det(σi−1 ◦ σ∗ )0 > 0 on σ −1 (S∗ ∩ Si ). .............................................................................................. ................................................. ..... . . . . . . . . . . . . .......... ................. . . . ..... . . . . . . . . . . . . . . . . . . S . . . . . . . . . . ... .. .... ....i ... ... . ........ .......................... .............................. ..S j ... ......... .............. . . . . . ........ ................ ......................................... ........ . . ... ....... . ..σ.. ............. ...... σj.............. . . . . ...... ................. . i ..... S ...... ........................... ...... ....................... . .... .. ..... . ... ... ... . .. Aji .... ....... ... ........ . . . . . ... . .. . . .................................................................. ........ .. .... ...................................................ϕ . . . . . . . . ij . . . . .... .... .... .. ... Aij ... . .. .. ....... .. ......... ...... . . . . ....... .... .............. .. .. ............................................................ ............................................... . . Figure 7.5: transition between overlapping parametrizations Choose orientation compatible parametrizations σi that cover the surface, such that the intersections Si ∩ Sj have area zero. Then the integral of a 2form ω on the surface is defined as Z XZ ω, (7.2.7) ω= S

Si

where the integrals inside the sum on the right side are defined by (7.2.6). The definition is independent of the choice of orientation compatible parametrizations. Example 7.2.21. The flux in Example 7.2.19 is the integral of the 2-form xdy ∧ 2 with orientation given by compatible dz + ydz ∧ dx + zdx ∧ dy on the sphere SR parametrizations in Example 7.2.18. The integral of zdx ∧ dy may be computed by using σ1 and σ2 Z Z p ∂(x, y) 1 − x2 − y 2 det dxdy zdx ∧ dy = 2 ∂(x, y) SR x2 +y 2 ≤R2 Z p ∂(x, y) + − 1 − x2 − y 2 det dxdy ∂(y, x) x2 +y 2 ≤R2 Z 2π Z R p 4 2 =2 1 − r rdr dθ = πR3 . 3 0 0

394

CHAPTER 7. MULTIVARIABLE INTEGRATION Z

Z xdy ∧ dz =

Similarly, we get 2 SR

4 ydz ∧ dx = πR3 . 2 3 SR

Example 7.2.22. We compute the outgoing flux of F = (x2 , y 2 , z 2 ) in Example (x − x0 )2 (y − y0 )2 (z − z0 )2 7.2.20 through the ellipse S given by + + = 1. By a2 b2 c2 the shifting ~x → ~x0 + ~x, the flux is the integral Z (x + x0 )2 dy ∧ dz + (y + y0 )2 dz ∧ dx + (z + z0 )2 dx ∧ dy. S+~ x0

The integral of (z +z0 )2 dx∧dy may be computed by using parametrizations similar to σ1 and σ2 in Example 7.2.18 !2 r Z Z x2 y 2 ∂(x, y) 2 c 1 − 2 − 2 + z0 det (z + z0 ) dx ∧ dy = 2 2 dxdy x a b ∂(x, y) S+~ x0 + y2 ≤1 2 a b !2 r Z x2 y 2 ∂(x, y) −c 1 − 2 − 2 + z0 det + 2 2 dxdy x a b ∂(y, x) + y2 ≤1 2 a b r Z x2 y 2 = 4z0 c 2 2 1 − 2 − 2 dxdy x a b + y2 ≤1 a2 Z b p 16 = 4abcz0 1 − u2 − v 2 dudv = πabcz0 . 3 2 2 u +v ≤1 16π abc(x0 + y0 + z0 ). 3 Exercise 7.2.41. Compute the integral of 2-form. Z (x − x0 )2 (y − y0 )2 1. xy 2 dy∧dz+yz 2 dz∧dx+zx2 dx∧dy, S is the ellipse + + a2 b2 S (z − z0 )2 = 1 with orientation given by the inward normal vector. c2 Z x2 y2 z2 2. xyzdx ∧ dy, S is the part of the ellipse 2 + 2 + 2 = 1, x ≥ 0, y ≥ 0, a b c S with orientation given by outward normal vector. Z Z xydy ∧ dz and x2 ydy ∧ dz, S is the boundary of the solid enclosed by 3.

The total flux is

S

S

z = x2 + y 2 and z = 1, with orientation given by outward normal vector. Z 4. dx1 ∧ dx2 + dx2 ∧ dx3 + · · · + dxn−1 ∧ dxn , S is the surface σ(u, v) = S

(u + v, u2 + v 2 , . . . , un + v n ) for 0 ≤ u ≤ a, 0 ≤ v ≤ b. Exercise 7.2.42.Z Prove that the area of a surface S ⊂ R3 given by an equation gx dy ∧ dz + gy dz ∧ dx + gz dx ∧ dy q g(x, y, z) = c is . S gx2 + gy2 + gz2

7.2.7

Volume and Integration on Hypersurface

Suppose S ⊂ Rk is a k-dimensional hypersurface parametrized by a continuously differentiable map σ(~u) = σ(u1 , u2 , . . . , uk ) : Rk → Rn on a subset

7.2. INTEGRATION ON HYPERSURFACE

395

A ⊂ Rk with volume. The (k-dimensional) volume of the hypersurface is Z µ(S) = kσu1 ∧ σu2 ∧ · · · ∧ σuk k2 du1 du2 · · · duk A Z q = det(σui · σuj )1≤i,j≤k du1 du2 · · · duk . (7.2.8) A

We also denote dV = kσu1 ∧ σu2 ∧ · · · ∧ σuk k2 du1 du2 · · · duk , where V indicates the volume. The definition is independent of the choice of the parametrization. ~ : Rn−1 → Rn of Example 7.2.23. In Example 7.1.11, for a parametrizationZ F (θ) | det F F 0 |dµθ~ is the unit sphere S n−1 , we claimed that the integral βn−1 = A

the volume of the sphere. Now the claim can be justified. Since F · F = kF k22 = 1, by taking partial derivatives, we get Fθi · F = 0. Therefore F is a unit length vector orthogonal to Fθi . This implies that kFθ1 ∧ Fθ2 ∧ · · · ∧ Fθn−1 k2 = kF ∧ Fθ1 ∧ Fθ2 ∧ · · · ∧ Fθn−1 k2 = | det F F 0 |, where the second equality follows from the interpretation of the determinant as the wedge product of n vectors in Rn . Thus the volume of the sphere is Z Z kFθ1 ∧ Fθ2 ∧ · · · ∧ Fθn−1 k2 dµθ~ = | det F F 0 |dµθ~ . A

A

Continuing with the computation of the volume of the unit sphere, we note that the parametrization of S n−1 induces a parametrization of S n i h ~ φ) = (F (θ) ~ cos φ, sin φ) : A × − π , π → Rn+1 . G(θ, 2 2 We have kGθ1 ∧ Gθ2 ∧ · · · ∧ Gθn−1 ∧ Gφ k2 =k(Fθ1 cos φ, 0) ∧ (Fθ2 cos φ, 0) ∧ · · · ∧ (Fθn−1 cos φ, 0) ∧ (−F sin φ, cos φ)k2 =k(Fθ1 cos φ, 0) ∧ (Fθ2 cos φ, 0) ∧ · · · ∧ (Fθn−1 cos φ, 0)k2 k(−F sin φ, cos φ)k2 =kFθ1 ∧ Fθ2 ∧ · · · ∧ Fθn−1 k2 cosn−1 φ, where the second equality is due to the fact that (−Fqsin φ, cos φ) is orthogonal to (Fθi cos φ, 0), and we use k(−F sin φ, cos φ)k2 = kF k22 sin2 φ + cos2 φ = p sin2 φ + cos2 φ = 1 in the last equality. Therefore Z Z π 2 n−1 βn = kFθ1 ∧Fθ2 ∧· · ·∧Fθn−1 k2 cos φdµθ~ dφ = βn−1 cosn−1 φdφ. π ~ θ∈A,− ≤φ≤ π2 2

− π2

Using integration by parts, we get (see Exercise 3.2.19)  (n − 1)(n − 3) · · · 1   Z π Z π π  2 2 n − 1 n(n − 2) · · · 2 cosn φdφ = cosn−2 φdφ = · · · = (n − 1)(n − 3) · · · 2  n − π2 − π2  2  n(n − 2) · · · 3

if n is even . if n is odd

396

CHAPTER 7. MULTIVARIABLE INTEGRATION

Therefore βn =

2π βn−2 . Combined with β1 = 2π, β2 = 4π, we get n−1  2π k   if n = 2k  (k − 1)! βn−1 = . 2k+1 π k    if n = 2k + 1 (2k − 1)(2k − 3) · · · 1

Finally, by Example 7.1.11, the volume of the unit ball is   πk   if n = 2k βn−1 = k! αn = . k+1 k 2 π  n  if n = 2k + 1  (2k + 1)(2k − 1)(2k − 3) · · · 1 The result may be compared with Exercise 7.1.35.

A function f (~x) is Riemann integrable on the hypersurface S if f (σ(~u)) is Riemann integrable on A. The integral is Z Z f dV = f (σ(u1 , u2 , . . . , uk ))kσu1 ∧ σu2 ∧ · · · ∧ σuk k2 du1 du2 · · · duk S

A

The integral is independent of the choice of regular parametrizations. Example 7.2.24. Consider a k-dimensional hypersurface S parametrized by σ(~u) = (ξ(~u), r(~u)) : A ⊂ Rk → Rn × R. Assume r(~u) ≥ 0, so that the hypersurface is inside the upper half Euclidean space. The l-dimensional rotation of S around the axis Rn × 0 is a (k + l)-dimensional hypersurface ρl (S) = {(~x, ~y ) ∈ Rn × Rl+1 : (~x, k~y k2 ) ∈ S}. ~ : B ⊂ Rl → Rl+1 be a parametrization of the unit sphere S l . Then the Let F (θ) rotation hypersurface ρl (S) is parametrized by ~ = (ξ(~u), r(~u)F (θ)) ~ : A × B → Rn × Rl+1 . ρ(~u, θ) By ρui = (ξui , rui F ), ρθj = (~0, rFθj ), and ρui · ρθj = rui F · rFθj = 0, we get kρu1 ∧ρu2 ∧· · ·∧ρuk ∧ρθ1 ∧ρθ2 ∧· · ·∧ρθl k2 = kρu1 ∧ρu2 ∧· · ·∧ρuk k2 kρθ1 ∧ρθ2 ∧· · ·∧ρθl k2 , and kρθ1 ∧ ρθ2 ∧ · · · ∧ ρθl k2 = rl kFθ1 ∧ Fθ2 ∧ · · · ∧ Fθl k2 . Moreover, there is an orthogonal transform UF on Rl+1 such that UF (F ) = (1, 0, . . . , 0). Then (id, UF ) is an orthogonal transform on Rn × Rl+1 such that (id, UF )ρui = (ξui , rui UF (F )) = (ξui , rui , 0, . . . , 0) = (σui , 0, . . . , 0). Applying the orthogonal transform to kρu1 ∧ ρu2 ∧ · · · ∧ ρuk k2 , we get kρu1 ∧ ρu2 ∧ · · · ∧ ρuk k2 = kσu1 ∧ σu2 ∧ · · · ∧ σuk k2 Thus the volume of the rotation hypersurface is Z µ(ρl (S)) = kσu1 ∧ σu2 ∧ · · · ∧ σuk k2 rl kFθ1 ∧ Fθ2 ∧ · · · ∧ Fθl k2 dµ~u dµθ~ A×B Z Z l = βl kσu1 ∧ σu2 ∧ · · · ∧ σuk k2 r dµ~u = βl rl dV. A

S

7.2. INTEGRATION ON HYPERSURFACE

397 Z

Exercise 7.2.43. Use the rotation to prove that βk+l+1 = βk βl

π 2

cosk θ sinl θdθ.

0

Exercise 7.2.44. Extend the result of Example 7.2.24 to the rotation around a hyperplane ~a · ~x = b (the surface is on the positive side of the hyperplane).

A differential k-form on Rn is an expression of the form X fi1 i2 ···ik (~x)dxi1 ∧ dxi2 ∧ · · · ∧ dxik . ω= i1
The integral of the differential k-form on the hypersurface S with respect to a regular parametrization σ is Z Z ∂(xi1 , xi2 , . . . , xik ) ω= fi1 i2 ···ik (σ(~u)) det du1 du2 · · · duk . ∂(u1 , u2 , . . . , uk ) S A Similar to the integral of differential 2-forms on surfaces, the integral is independent of the parametrizations as long as they are compatibly oriented. The compatibility means that the determinant of the Jacobian for the change of variables (so called transition map) is positive. In general, the hypersurface may be divided into several compatibly oriented pieces and the integral of the differential form is the sum of the integrals on the pieces. Now we consider the special case of the integral of an (n − 1)-form on an oriented (n − 1)-dimensional hypersurface S ⊂ Rn . At any ~x ∈ S, there are two unit length vectors orthogonal to the tangent hyperplane. Specifically, if σ(~u) is a regular parametrization, then the tangent hyperplane is spanned by σu1 , σu2 , . . . , σun−1 . Define the normal vector compatible with the orientation of the parametrization to be ~n = (−1)n−1

(σu1 ∧ σu2 ∧ · · · ∧ σun−1 )? . kσu1 ∧ σu2 ∧ · · · ∧ σun−1 k2

The compatible normal vector is characterized by the property that ~n has length 1, is orthogonal to σui , and the basis ~n, σu1 , σu2 , . . . , σun−1 has positive orientation (meaning det ~n σu1 σu2 · · · σun−1 > 0). Let (xbi means the term xi is missing from the list) ∂(x1 , x2 , . . . , xbi , . . . , xn ) ? ~e ∂(u1 , , u2 , . . . , un−1 ) ∧([n]−i) X ∂(x1 , x2 , . . . , xbi , . . . , xn ) = (−1)n−i det ~ei . ∂(u1 , , u2 , . . . , un−1 )

~a = (σu1 ∧ σu2 ∧ · · · ∧ σun−1 )? =

X

det

Then define the flux of a vector field F = (f1 , f2 , . . . , fn ) : S → Rn to be Z Z n−1 F · ~ndV = (−1) F · ~adu1 du2 · · · duk S A Z X ∂(x1 , x2 , . . . , xbi , . . . , xn ) = (−1)i−1 fi det du1 du2 · · · duk ∂(u1 , , u2 , . . . , un−1 ) A Z X ci ∧ · · · ∧ dxn . = (−1)i−1 fi dx1 ∧ dx2 ∧ · · · ∧ dx (7.2.9) S

398

CHAPTER 7. MULTIVARIABLE INTEGRATION

This extends the relation between the flux in R3 and the integration of 2forms in Section 7.2.6. For the special case n = 2, we have an oriented curve C ⊂ R2 parametrized by φ(t), and the compatible normal vector ~n is the unit length vector with direction obtained by rotating the tangent vector φ0 in clockwise direction by 90 degrees. Moreover, Z Z Z (f, g) · ~nds = f dy − gdx = (−g, f ) · d~x. C

C

C

Exercise 7.2.45. Denote π(v0 , v1 , . . . , vn ) = (v1 , v0 , v2 , . . . , vn ), ρi (v0 , v1 , . . . , vn ) = (vi+1 , vi+2 , . . . , vn , v0 , v1 , . . . , vi−1 , vi ). Prove that the unit sphere S n is covered by compatibly oriented regular parametrizations q q σi+ (~u) = ρi

1 − k~uk22 , ~u , σi− (~u) = ρi π − 1 − k~uk22 , ~u

defined for 1 ≤ i ≤ n and ~u ∈ Rn satisfying k~uk2 < 1. Exercise 7.2.46. Use (7.2.9) to show that the length of a curve C in R2 given by an Z −gy dx + gx dy q equation g(x, y) = c is , similar to the formula in Exercise 7.2.42. C gx2 + gy2 Extend the formula to the volume of an (n − 1)-dimensional hypersurface in Rn given by g(x1 , x2 , . . . , xn ) = c. What about the (n − k)-dimensional hypersurface defined by a map G : Rn → Rk ? Exercise 7.2.47. Compute the integral. Z 1. (~x − ~a) · ~ndV , the normal vector ~n points to the outside of the sphere. n−1 SR

Z 2. n−1 SR

(a1 x1 + a2 x2 + · · · + an xn )dx2 ∧ dx3 ∧ · · · ∧ dxn , the orientation of the

sphere is given by the outward normal vector. Z 3. dx1 ∧dx2 ∧dx3 +dx2 ∧dx3 ∧dx4 +· · ·+dxn−2 ∧dxn−1 ∧dxn , S is the surface S

σ(u, v, w) = ρ(u) + ρ(v) + ρ(w), ρ(u) = (u, u2 , . . . , un ), 0 ≤ u, v, w ≤ a.

We end the section with the discussion of the direction of the normal vector for the hypersurfaces given by graphs of functions. Consider the graph σ(~u) = (~u, h(~u)) of a continuously differentiable function h on Rn−1 . By (σu1 ∧ σu2 ∧ · · · ∧ σun−1 )? =((~e1 + hu1 ~en ) ∧ (~e2 + hu2 ~en ) ∧ · · · ∧ (~en−1 + hun−1 ~en ))? X =(~e1 ∧ ~e2 ∧ · · · ∧ ~en−1 )? + (−1)n−1−i hui (~e1 ∧ ~e2 ∧ · · · ∧ ~ebi ∧ · · · ∧ ~en )? X =~en − hui ~ei = (−hu1 , −hu2 , . . . , −hun−1 , 1),

7.3. STOKES THEOREM

399

the normal vector ~n =

(−1)n−1 (−hu1 , −hu2 , . . . , −hun−1 , 1) q 1 + h2u1 + h2u2 + . . . + h2un−1

points in the direction of xn when n is odd and opposite to the direction of xn when n is even. In general, consider the graph σi (~u) = σi (u1 , u2 , . . . , un−1 ) = (u1 , u2 , . . . , ui−1 , h(~u), ui , · · · , un−1 ) of a continuously differentiable function h. We have σi (~u) = Ui (σ(~u)) for the orthogonal transform U (x1 , x2 , . . . , xn ) = (x1 , x2 , . . . , xi−1 , xn , xi+1 , . . . , xn−1 ). By the characterization of the compatible normal vector, the normal vector compatible with the parametrization σi is (det U )U (~n) =

(−1)i−1 (−hu1 , −hu2 , . . . , −hui−1 , 1, −hui , . . . , −hun−1 ) q . 1 + h2u1 + h2u2 + . . . + h2un−1

The normal direction points in the direction of xi when i is odd and opposite to the direction of xi when i is even.

7.3

Stokes Theorem

The fundamental theorem of calculus relates the integration of a function on an interval to the values of the antiderivative at the end points of the integral. The Stokes theorem extends the result to high dimensional integrations.

7.3.1

Green Theorem

A curve φ : [a, b] → Rn is simple if it has no self intersection. It is closed if φ(a) = φ(b). The concept of simple closed curve is independent of the choice of parametrization. A simple closed curve in R2 divides the plane into two path connected pieces, one bounded and the other unbounded. Therefore we have two sides of a simple closed curve in R2 . Suppose A ⊂ R2 is a bounded subset with finitely many rectifiable simple closed curves C1 , C2 , . . . , Ck as the boundary. Since rectifiable curves have area zero (see Exercise 7.2.3), the subset A has area. We further assume that A is contained in only one side of each Ci . Then Ci has a compatible orientation such that A is “on the left” of C. This means that Ci has counterclockwise orientation if A is in the bounded side of Ci , and has clockwise orientation if A is in the unbounded side. This also means that if Ci is oriented and differentiable, then rotating the tangent vector ~t by 90 degrees in clockwise direction will produce a normal vector ~n pointing away from A.

400

CHAPTER 7. MULTIVARIABLE INTEGRATION ...................................................... ............... .......... ~t. . . . . . . . . ............. ... . . . . . . . .......... . . ......~n . ... .................. . . . . . . . . . . . . . . . . . .... .... . ... .. ..... .... .. ......... .... .. . ...... ..... ....... . . ....C2 ... ... . ... . ... ................... ... ... . ....... C1 . . .. .. ..... . . . . . . . . . . . . . . . . . . ....... . .. .............. ... ........C .. ......... . . . . ... 3 ... .... .. ... ... . ..... . .... .... .. ...... ... ............ ..... ............ ...... .......... A ....... ......... ........ ....... ........... ... ................. ..................................................................................... Figure 7.6: orientation of boundary

Theorem 7.3.1 (Green Theorem). Suppose A ⊂ R2 is a region with compatibly oriented rectifiable simple closed boundary curves C. Then for any continuously differentiable functions f and g, we have Z Z f dx + gdy = (−fy + gx )dxdy. (7.3.1) C

A

In Green theorem, C is understood to be the union of simple closed curves C1 , C2 , . . . , Ck , each oriented in the compatible way. The integral in (7.3.1) along C is understood to be Z Z Z Z = + +··· + . C

C1

C2

Ck

Proof. We first consider the special case that A = {(x, y) : x ∈ [a.b], h(x) ≥ y ≥ k(x)} is a region between the graphs of functions h(x) and k(x) with bounded variations and g = 0. We have ! Z Z Z b

h(x)

fy dxdy =

fy (x, y)dy dx Z

(Fubini Theorem)

k(x)

a

A

b

(f (x, h(x)) − f (x, k(x)))dx.

=

(fundamental theorem)

a

On the other hand, the boundary C consists of four segments with parametrizations compatible with the orientation Cb : Cr : Ct : Cl :

φ(t) = (t, k(t)), φ(t) = (b, t), φ(t) = (−t, h(−t)), φ(t) = (a, −t),

t ∈ [a, b] t ∈ [k(b), h(b)] t ∈ [−b, −a] t ∈ [−h(a), −k(a)]

7.3. STOKES THEOREM

401

.... y .. .. h(x) .. .............y......= ..................................Ct.. .. .. .................... .. .. ............. ........... .. .. .... .. ..... .. .. ... A .C .. .. ..... r .. Cl .. . .. .. ............................................................................... .... ......... .. .. ..... ...... .. .. y = k(x) Cb .. .. .. . . . ........................................................................................................................................................................ x .. a b .. Figure 7.7: Green theorem for a special case Then Z

Z

Z

f dx =

+

C

Cb b

Z =

Z

Z +

f dx

+

Cr

Ct

Z

Cl h(b)

f (t, k(t))dt + a

Z

f (b, t)db k(b)

−a

+

Z

Z

−b b

=

f (a, −t)da −h(a)

Z

f (x, k(x))dx + 0 + a

a

f (x, h(x))dx + 0. b

Z

Z fy dxdy = −

Therefore we have

−k(a)

f (−t, h(−t))d(−t) +

f dx. C

A

Many regions can be divided by vertical lines into several special regions Aj studied Zabove. LetZ Cj be the compatibly oriented boundary of Aj . Then f dx is

f dx, because the integrations along the vertical lines Z Z fy dxdy. This proves the fy dxdy is are all zero. Moreover, the sum of A A j Z Z equality f dx = − fy dxdy for the more general regions. the sum of

Cj

C

C

A

Now consider the most general case that the boundary curves are assumed to be only rectifiable. To simplify the presentation, we will assume that A has only one boundary curve C. Parametrize the curve C by the Euclidean arc length φ(s) : [a, b] → R2 . For any natural number n, divide [a, b] evenly b−a into n intervals of length δ = . Denote by Cj the segment of C between n φ(sj−1 ) and φ(sj ). Let Lj be the straight line connecting φ(sj−1 ) and φ(sj ). Let L be the curve constructed by taking the union of Lj . Let A0 be the region enclosed by the curve L. Then A0 can be divided by vertical lines into special regions. As explained above, we have Z Z f dx = − fy dxdy. L

A0

402

CHAPTER 7. MULTIVARIABLE INTEGRATION .. ........................................................................ . . . . . . . . . . . . . . ........ .. .. A5 .. .............. ........ ....... . . .. .................... .. . . . . . . . ...... ...... ...... .... .. A2 .. . . . .... ... .. .. .................. .. .. ... . . . ... .. . . .. ... .... ..... ... ..... A .... . 8 . .... .. . .. ............................ .. . ..... . . ... . .. .. . .. . ... ...... . . .. .. A7 ................................ .. A ........ A ... .. .. ............. . 1 . 4 ... .... .. ...... ... ... . ..... .. ..... ... ... . .. .... . .. .. .. ....... .... . .. .. ........... ..... .. .................. .. .. .......... .. ...... .. ..A6 .. ......... ........ A3 .. . . ........ . . ........ .. .. .. ...... A ............. ... 9 . . . .................. .......................................................................................... Figure 7.8: divide a general case to special cases

p Let M be the upper bound of k∇f k2 = fx2 + fy2 . Since s is the Euclidean arc length, we have kφ(s) − φ(t)k2 ≤ |s − t|. Therefore for ~x ∈ Cj and ~y ∈ Lj , we have |f (~x) − f (~y )| ≤ M k~x − ~y k2 ≤ M k~x − φ(sj )k2 + M kφ(sj ) − ~y k2 ≤ 2M δ. This implies ! Z Z X ∗ S(P, f dx, φ) − f dx = f (φ(sj ))∆xj − f dx Lj L Z X ∗ (f (φ(s )) − f (~ x ))dx ≤ j Lj X X ≤ 2M δ|∆xj | ≤ 2M δ ∆sj = 2M (b − a)δ. Z Z f dx. The estimation shows that limn→∞ f dx = C

L

On the other hand, let Bj be the region bounded by Cj and Lj . Then Bj ⊂ B(φ(sj ), δ). Therefore µ(Bj ) ≤ πδ 2 and Z XZ Z f dxdy − ≤ |f |dxdy f dxdy A

A0

Bj

≤

X

Z

M µ(Bj ) ≤ nM πδ 2 =

πM (b − a)2 . n

Z

This shows that limn→∞

f dxdy = f dxdy. A Z Z This completes the proof of f dx = − fy dxdy for regions with rectiZC Z A fiable boundaries. The formula gdy = gx dxdy can be similarly proved. A0

C

A

Suppose a curve C is closed but not simple. Then C may enclose some parts of the region more than once, and the orientation of C may not be

7.3. STOKES THEOREM

403

.... ............................ ... ...... ... C ... A1 . 1 ....... . ..... . ....... . . . . ....... ........ ....... ....... ............. . . . . . ..... ........ .. ... A . C2 . 2 .... ........ .. .......... .............. Figure 7.9: a closed but not simple curve

..................................................... ........ ..... . . . .... ........ .... ... ..... ..... ... . . . ...C2 ...C1 . ....... . . ... A1 ... ... . ....... .... . ... . ...................... . ... . .. ... . .... ... . ..... . A . . 2 ....... ....... ........... ............................

compatible with some parts. For example, in Figure 7.9, the left curve intersects itself once, which divides the curve into the outside part C1 and the inside part C2 . Then Z Z Z Z Z (−fx + gy )dxdy f dx + gdy = + f dx + gdy = + A1 C C1 C2 A1 ∪A2 Z Z = 2 + (−fx + gy )dxdy. A1

A2

For the curve on the right, we have Z Z Z f dx + gdy = − (−fx + gy )dxdy A1

C

A2

In general, we still have Green theorem for regions enclosed by (not necessarily simple) closed curves, as long as the regions are counted in the corresponding way. In fact, this remark is already used in the proof of Green theorem above, since the curve L in the proof may not be simple. Example 7.3.1. The area of the region A enclosed by a simple closed curve C is Z Z Z dxdy = xdy = − ydx. A

C

C

For example,Zsuppose C is the counterclockwise unit circle arc from (1, 0) to (0, 1). xdy, we add C1 = [0, 1] × 0 in the rightward direction and C2 = Z Z Z 0 × [0, 1] in the downward direction. Then the integral + + xdy is C C1 C2 Z Z 1 Z π of the quarter unit disk. Since xdy = xd0 = 0 and xdy = the area 4 C1 0 C2 Z 0 Z π 0dy = 0, we conclude that xdy = . 4 1 C Exercise 7.3.1. Compute the integral. Z 1. (ex + 2y − 3x)dx + (8x + cos y 2 )dy, C is the triangle with vertices (0, 0), To compute

C

C

(1, 2), (2, 3), in the clockwise direction. Z 2. (x2 +y 2 )dx−2xydy, C is the boundary of the half disk x2 +y 2 ≤ 4, y ≥ 0, C

in the counterclockwise direction.

404

CHAPTER 7. MULTIVARIABLE INTEGRATION

Exercise 7.3.2. Compute the area. 3

3

1. Region enclosed by the astroid x 2 + y 2 = 1. 2. Region enclosed by the cardioid r = 2a(1 + cos θ). 3. Region enclosed by the parabola (x + y)2 = x and the x-axis. Exercise 7.3.3. Do Exercise 7.1.66 again by using Green theorem. Z Exercise 7.3.4. Extend integration by parts by finding a relation between Z vgy )dxdy and (ux f + vy g)dxdy.

(ufx + A

A

Exercise 7.3.5. The divergence of a vector field F = (f, g) on R2 is divF = fx + gy . Prove that

Z

Z F · ~nds =

divF dA,

C

A

where ~n is the normal vector pointing away from A. Exercise 7.3.6. The Laplacian of a two variable function f is ∆f = fxx + fyy = div∇f . Prove Green identities Z Z Z f ∆gdA = f ∇g · ~nds − ∇f · ∇gdA, A

and

C

A

Z

Z (f ∆g − g∆f )dA =

A

7.3.2

(f ∇g − g∇f ) · ~nds. C

Independence of Integral on Path Z

2

If fy = gx on a region A ⊂ R , then Green theorem tells us

f dx + gdy = 0. C

The conclusion can be interpreted and utilized in different ways. Suppose C1 and C2 are two curves connecting ~a to ~b in R2 . Then C1 ∪ (−C2 ) is an oriented closed Z curve. If fy = gx on the region enclosed by C1 ∪ (−C2 ), then we have

f dx + gdy = 0, which means C1 ∪(−C2 )

Z

Z f dx + gdy =

C1

f dx + gdy. C2

In other words, if fy = gx on the region between two curves with the same beginning and end points, then the integral of the 1-form f dx + gdy along the two curves are the same. Another explanation of the independence of the integral of a 1-form on the choice of paths is given in Examples 7.2.8. The general argument is the following. Suppose f = ϕx and g = ϕy for a differentiable function

7.3. STOKES THEOREM

405

C .........................2................. . . . . C . . . 1 . ......... .... ................................... ...... ...... ......... . . . . . . . . . . . . (x , y ) . . . . . . . . . . . ...... ... 1 1 .. f = g . . . y x . . . . . . . . .. .......... .. fy = gx (x0 , y0 )........ ... ........ .... . . . ..... . . . . . ............. ... ........... ...... ... ................... .......... .. ................ .................. Z Z Figure 7.10: when f dx + gdy = f dx + gdy C1

C2

ϕ(x, y). Suppose a curve C connecting (x0 , y0 ) to (x1 , y1 ) is parametrized by differentiable φ(t) = (x(t), y(t)). Then Z Z b Z b dϕ(φ(t)) 0 0 f dx + gdy = (ϕx x + ϕy y )dt = dt dt C a a = ϕ(φ(b)) − ϕ(φ(a)) = ϕ(x1 , y1 ) − ϕ(x0 , y0 ). The condition f = ϕx and g = ϕy means the 1-form f dx + gdy is the differential dϕ. It also means the vector field (f, g) is the gradient ∇ϕ. The function ϕ, called a potential for the 1-form f dx + gdy or for the vector field (f, g), is simply the antiderivative for the pair of two variable functions! Any continuous single variable function f (x) has essentially unique antiderivative Z x ϕ(x) = ϕ(x0 ) + f (t)dt, x0

by integrating along the line (interval) [x0 , x] connecting x0 to x. However, if ϕx = f , ϕy = g, and f and g have continuous partial derivatives, then f and g must satisfy fy = ϕyx = ϕxy = gx . Therefore fy = gx is a necessary condition for the vector field (f, g) to have antiderivative (potential). Moreover, similar to the single variable case, we expect the potential to be Z Z (x,y) ϕ(x, y) = ϕ(x0 , y0 ) + f dx + gdy = ϕ(x0 , y0 ) + f dx + gdy, (7.3.2) C

(x0 ,y0 )

except there are many curves C connecting (x0 , y0 ) to (x, y). Therefore the two variable antiderivative problem is closely related to the independence of the integral on the choice of paths. Theorem 7.3.2. Suppose f and g are continuous functions on an open subset U ⊂ R2 . Then the following are equivalent. Z 1. The integral f dx + gdy along an oriented rectifiable curve C in U C

depends only on the beginning and end points of C. 2. There is a differentiable function ϕ on U , such that ϕx = f and ϕy = g. Moreover, if U is simply connected and f and g are continuously differentiable, then the above is also equivalent to fy = gx .

406

CHAPTER 7. MULTIVARIABLE INTEGRATION

The simply connected condition means that the subset has no holes. The rigorous definition is that any continuous map S 1 → U from the unit circle (i.e., a closed curve in U ) extends to a continuous map B 2 → U from the unit disk. Any two curves in U with the same beginning and end points form a closed curve in U . The extension to the unit disk means that the region between the two curves still lies in U , so that the integral along the two curves are the same. Proof. Suppose the integral depends only on the beginning and end points. Then we consider ϕ(x, y) given by the formula (7.3.2), in which (x0 , y0 ) is a fixed point in U and ϕ(x0 , y0 ) is the arbitrary constant allowed in the antiderivative. To study the differentiability of ϕ at (x1 , y1 ) ∈ U , for any > 0, we find δ > 0, such that k∆(x, y)k2 = k(x, y) − (x1 , y1 )k2 < δ implies (x, y) ∈ U and k(f (x, y) − f (x1 , y1 ), g(x, y) − g(x1 , y1 ))k2 < . Then for such (x, y), we have Z ϕ(x, y) = ϕ(x1 , y1 ) +

f dx + gdy, I

where I is the straight line connecting (x1 , y1 ) to (x, y) and I still lies in U . For the linear function l(x, y) = ϕ(x1 , y1 ) + f (x1 , y1 )∆x + g(x1 , y1 )∆y, we have Z |ϕ(x, y) − l(x, y)| = (f (x, y) − f (x1 , y1 ))dx + (g(x, y) − g(x1 , y1 ))dy I

≤ sup k(f (x, y) − f (x1 , y1 ), g(x, y) − g(x1 , y1 ))k2 µ(I) (x,y)∈I

≤ k∆(x, y)k2 . This shows that l is a linear approximation of ϕ. Moreover, we have ϕx (x1 , y1 ) = f (x1 , y1 ) and ϕy (x1 , y1 ) = g(x1 , y1 ). Conversely, suppose ϕ is differentiable and ϕx = f , ϕy = g. By the earlier discussion, we have Z f dx + gdy = ϕ(x1 , y1 ) − ϕ(x0 , y0 ) (7.3.3) C

for the special case that C is a differentiably parametrized curve connecting (x0 , y0 ) to (x1 , y1 ). Now consider a general rectifiable curve C parametrized by φ : [a, b] → R2 connecting (x0 , y0 ) to (x1 , y1 ). Let P be a partition of [a, b]. Let Li be the straight line connecting φ(ti−1 ) to φ(ti ). Let L be obtained by combining Li together. Since Li can be differentiably parametrized, we have Z XZ f dx + gdy = f dx + gdy L Li X = (ϕ(φ(ti )) − ϕ(φ(ti−1 ))) = ϕ(x1 , y1 ) − ϕ(x0 , y0 ).

7.3. STOKES THEOREM

407

On Z the other hand, in Z the proof of Theorem 7.3.1, we have shown that f dx + gdy = lim f dx + gdy. Therefore we conclude that (7.3.3) also kP k→0

C

L

holds for the general C. Finally, we know that in case f and g are continuously differentiable, the second statement implies fy = gx . Conversely, we also explained that by Green theorem, the condition fy = gx on a simply connected open subset implies the first statement. Example 7.3.2. The integral of ydx + xdy in Example 7.2.8 is independent of the ∂y ∂x choice of the paths because the equality = holds on the whole plane, which ∂y ∂x is simply connected. The potential of the 1-form is xy, up to adding constants. In Example 7.2.9, the integrals of xydx + (x + y)dy along the three curves are different. Indeed, we have (xy)y = x 6= (x + y)x = 1, and the 1-form has no potential. 1 Example 7.3.3. The vector field 2 (xy 2 + y, 2y − x) is defined on y > 0 and y < 0, y d xy 2 + y 1 d 2y − x both simply connected. It satisfies = − = . Theredy y2 y2 dx Z y 2 y(xy + 1) y(xy + 1) , we get ϕ = dx+ fore it has a potential function ϕ. By ϕx = 2 y y2 x2 x x 2y − x 2 ϑ(y) = + + ϑ(y). Then by ϕy = − 2 + ϑ0 (y) = , we get ϑ0 (y) = , so 2 2 y y y y that ϑ(y) = 2 log |y| + c. The potential function is ϕ=

x2 x + + 2 log |y| + c. 2 y

In particular, we have Z

(2,2)

(1,1)

(xy 2 + y)dx + (2y − x)dy = y2

(2,2) x2 x 3 + + 2 log |y| = + log 2. 2 y 2 (1,1)

The example shows that the formula (7.3.2) may not be the most practical way of computing the potential. ydx − xdy Example 7.3.4. The 1-form satisfies x2 + y 2 y x2 − y 2 −x fy = = 2 = gx = x2 + y 2 y (x + y 2 )2 x2 + y 2 x on R2 − (0, 0), which is unfortunately not simply connected. Let U be obtained by removing the non-positive x-axis (−∞, 0]×0 from R2 . Then U is simply connected, ydx − xdy and Theorem 7.3.2 can be applied. The potential for is ϕ = −θ, where x2 + y 2 −π < θ < π is the angle in the polar coordinate. The potential can be used to compute Z (0,1) π ydx − xdy π = −θ(0, 1) + θ(1, 0) = − + 0 = − 2 2 x +y 2 2 (1,0)

408

CHAPTER 7. MULTIVARIABLE INTEGRATION

for any curve connecting (1, 0) to (0, 1) that does not intersect the non-positive π x-axis. The unit circle arc C1 : φ1 (t) = (cos t, sin t), 0 ≤ t ≤ , and the straight 2 line C2 : φ2 (t) = (1 − t, t), 0 ≤ t ≤ 1, are such curves. On the other hand, consider the unit circle arc C3 : φ3 (t) = (cos t, − sin t), 3π 0≤t≤ , connecting (1, 0) to (0, 1) in the clockwise direction. To compute the 2 integral along the curve, let V be obtained by removing the non-negative diagonal {(x, x) : x ≥ 0} from R2 . The 1-form still has the potential ϕ = −θ on V . The π 9π crucial difference is that the range for θ is changed to < θ < . Therefore 4 4 Z

(0,1)

(1,0)

ydx − xdy π 3π = −θ(0, 1) + θ(1, 0) = − + 2π = 2 2 x +y 2 2

for any curve connecting (1, 0) to (0, 1) that does not intersect the non-negative diagonal. Z (0,1) ydx − xdy depends only on how the curve conIn general, the integral x2 + y 2 (1,0) necting (1, 0) to (0, 1) goes around the origin, the only place where fy = gx fails. Exercise 7.3.7. Explain the computations in Exercise 7.2.19 by Green theorem. Exercise 7.3.8. Use potential function to compute the integral. Z 1. 2xdx + ydy, C is the straight line connecting (2, 0) to (0, 2). C

Z 2. C

Z 3. C

xdx + ydy , C is the circular arc connecting (2, 0) to (0, 2). x2 + y 2 ydx − xdy , C is the unit circle in counterclockwise direction. ax2 + by 2

x2 y2 (x − y)dx + (x + y)dy , C is the elliptic arc + = 1 in the upper x2 + y 2 a2 b2 C half plane connecting (a, 0) to (−a, 0). Z 5. (ex sin 2y −y)dx+(2ex cos 2y −1)dy, C is the circular arc connecting (0, 1) Z

4.

C

to (1, 0) in clockwise direction. Z 6. (2xy 3 − 3y 2 cos x)dx + (−6y sin x + 3x2 y 2 )dy, C is the curve 2x = πy 2 C π connecting (0, 0) to ,1 . 2 Exercise 7.3.9. Find α so that the vector fields

(x, y) (−y, x) and 2 have (x2 + y 2 )α (x + y 2 )α

potentials. Then find the potentials. Z Exercise 7.3.10. Compute the integral

g(xy)(ydx+xdy), where C is the straight C

line connecting (2, 3) to (1, 6). Exercise 7.3.11. Why is the potential function unique up to adding constants?

7.3. STOKES THEOREM

409

............................... ....................................................................................................... . . . . . . . .... ... ....... .... ..... ..... ..... ..... ............................................. ...... ........... . . . ..... . . . . [1]. . .. ... ... ... .. ... . .... . . ... . . . . . . . . . . . . . ... . . . .. ... ...... ....... . . . .. . . ... ... ..C1 .. ... ... .. .. . .... ... ................ . . . . . . . . .. . ... ....................... . . . .. .. . . . . . . . .... ... ...... . . .. ... [3] ....... ....... ....... ..... . . . ... .. ... ............ ............................. ..................... ... [2]... ......... .. .. ... ... ..C2 ... ... .................. .......................... ........... .. .. ... ... .. .. .. ... [4]... .. ... ... .... ... .. ... .... .. ... .... .... .. .. ... ... ..... ... ............ ... ... ..... ..... . . ..... . ...... ....... . C ..... . . . . .................. . . . . . ...... U ....................................... . ............... ....................................................... Figure 7.11: closed curve in open subset with holes Now we discuss another application of Green theorem in case fy = gx . The open subset in Figure 7.11 has two holes. The two holes are enclosed by simple closed curves C1 and C2 in U oriented in counterclockwise direction. The closed curve C in U may be divided into four parts, denoted C[1] , C[2] , C[3] , C[4] . The union of oriented closed curves C[1] ∪(−C1 ), C[2] ∪C[4] ∪(−C1 )∪ (−C2 ) and C[3] also enclose subsets contained in U . If fy = gy on U , then by Green theorem, Z Z Z f dx + gdy = 0. f dx + gdy = f dx + gdy = C[2] ∪C[4] ∪(−C1 )∪(−C2 )

C[1] ∪(−C1 )

Thus Z

Z f dx + gdy =

+

Z f dx + gdy = 2

C1 ∪C2

C1

C

Z

C[3]

Z +

C1

f dx + gdy. C2

Note that the coefficients 2 for C1 and 1 for C2 mean that C wraps around C1 twice and C2 once. In general, suppose U has finitely many holes. We enclose these holes with closed curves C1 , C2 , . . . , Ck , all in counterclockwise orientation. Then any closed curve C in U wraps around the i-th hole ni times. The sign of ni is positive when the wrapping is counterclockwise and is negative if the wrapping is clockwise. Then we say C is homologous to n1 C1 + n2 C2 + · · · + nk Ck . If fy = gx on U , then we have

Z f dx + gdy =

Z n1

C

Z

Z + · · · + nk

+n2 C1

C2

f dx + gdy. Ck

Finally, suppose C and D are two curves in U with the same end points. Because U isZno longer simplyZconnected, the equality no longer implies that the integrals

f dx+gdy and C

f dx+gdy are equal. However, the difference D

between the two integrals can be computed by considering the homology of the closed curve C ∪ (−D).

410

CHAPTER 7. MULTIVARIABLE INTEGRATION

ydx − xdy satisfies fy = gx on U = x2 + y 2 R2 − (0, 0). The unit circle C in the counterclockwise direction encloses the only hole of U . We have Z 2π Z ydx − xdy sin t(− sin t)dt − cos t cos tdt = = −2π. 2 + y2 x cos2 t + sin2 t 0 C

Example 7.3.5. In Example 7.3.4, the 1-form

If C1 is a curve on the first quadrangle connecting (1, 0) to (0, 1) and C2 is a curve on the second, third and fourth quadrangles connecting the two points, then C1 ∪ (−C2 ) is homologous to C, and we have Z Z ydx − xdy ydx − xdy = − 2π. 2 2 2 2 C1 x + y C2 x + y Exercise 7.3.12. Study how the integral of the 1-form

ydx − xdy depends on x2 + xy + y 2

the curves. Exercise 7.3.13. Study how the integral of the 1-form ω=

(x2 − y 2 − 1)dx + 2xydy ((x − 1)2 + y 2 )((x + 1)2 + y 2 )

depends on the curves. Note that Z Z if C is the counterclockwise circle of radius around (1, 0), then ω = lim ω. C0

7.3.3

→0 C

Stokes Theorem

Stokes theorem is the extension of Green theorem to oriented surfaces S ⊂ R3 with compatibly oriented boundary curves C. Suppose the surface is given by an orientation compatible parametrization σ(u, v) = (x(u, v), y(u, v), z(u, v)) : A ⊂ R2 → S ⊂ R3 , and the boundary C of S corresponds to the boundary D of A. Recall that D is oriented in such a way that A is “on the left” of D. Correspondingly, C is oriented in such a way that S is “on the left” of C. Also recall the description in case D is differentiable, the clockwise rotation of the tangent vector of D by 90 degrees will be the normal direction pointing away from A. Correspondingly, in the tangent plane of S, the clockwise rotation of the tangent vector of C by 90 degrees will be the normal direction pointing away from S. Suppose F = (f, g, h) is a continuously differentiable vector field on R3 . Then by Green theorem, Z Z F · d~x = f dx + gdy + hdz C C Z = (f xu + gyu + hzu )du + (f xv + gyv + hzv )dv ZD = ((f xv + gyv + hzv )u − (f xu + gyu + hzu )v )dudv ZA = (fu xv − fv xu + gu yv − gv yu + hu zv − hv zu )dudv. A

7.3. STOKES THEOREM

411

By fu xv − fv xu = (fx xu + fy yu + fz zu )xv − (fx xv + fy yv + fz zv )xu ∂(x, y) ∂(z, x) = −fy det + fz det , ∂(u, v) ∂(u, v) we have Z Z ∂(x, y) ∂(z, x) (fu xv − fv xu )dudv = −fy det dudv + fz det dudv ∂(u, v) ∂(u, v) A A Z = −fy dx ∧ dy + fz dz ∧ dx. S

Combined with the similar computations for g and h, we get the following result. Theorem 7.3.3 (Stokes Theorem). Suppose S ⊂ R3 is an oriented surface with compatibly oriented simple closed boundary curves C. Then for any continuously differentiable F = (f, g, h), we have Z Z f dx + gdy + hdz = (gx − fy )dx ∧ dy + (hy − gz )dy ∧ dz + (fz − hx )dz ∧ dx. C

S

Although the argument was made only for one parametrization, which may not cover the whole surface, the equality can be extended to oriented surfaces by adding the equalities on compatibly oriented parametrized pieces. Introduce the symbol ∂ ∂ ∂ , , (7.3.4) ∇= ∂x ∂y ∂z on R3 . For any function f , we formally have the gradient ∂f ∂f ∂f gradf = ∇f = , , . ∂x ∂y ∂z

(7.3.5)

For any vector field F = (f, g, h) on R3 , we formally have the curl ∂h ∂g ∂f ∂h ∂g ∂f − , − , − . curlF = ∇ × F = ∂y ∂z ∂z ∂z ∂x ∂y Then Stokes theorem can be written as Z Z F · d~x = curlF · ~ndA. C

S

Example 7.3.6. Suppose C is the circle given by x2 +y 2 +z 2 = 1, x+y +z = r, with the counterclockwise orientation when Z viewed from the direction of the x-axis. We would like to compute the integral (ax + by + cz + d)dx. Let S be the disk C

x2 + y 2 + z 2 ≤ 1, x + y + z = r, with the normal direction r given by (1, 1, 1). Then r2 by Stokes theorem and the fact that the radius of S is 1 − , we have 3 Z Z Z 1 c−b r2 √ (−b+c)dA = √ π 1 − (ax+by+cz+d)dx = −bdx∧dy+cdz∧dx = . 3 3 3 C S S

412

CHAPTER 7. MULTIVARIABLE INTEGRATION

Example 7.3.7. Faraday observed that a changing magnetic field B induces an electric field E. More precisely, Faraday’s induction law says that the rate of change of the flux of the magnetic field through a surface S is the negative of the integral of the electric field along the boundary C of the surface S. The law is summarized in the formula Z Z d E · d~x = − B · ~ndA. (7.3.6) dt S C Z By Stokes theorem, the left side is − curlE · ~ndA. Since the equality holds for S

any surface S, we conclude the differential version of Faraday’s law −curlE =

∂B . ∂t

(7.3.7)

This is one of Maxwell’s equations for electromagnetic fields. Z Exercise 7.3.14. Compute y 2 dx + (x + y)dy + yzdz, where C is the ellipse C

x2 + y 2 = 2, x + y + z = 2, with clockwise orientation as viewed from the origin. Exercise 7.3.15. Suppose C is any closed curve on the sphere x2 + y 2 + z 2 = R2 . Z Prove that (y 2 + z 2 )dx + (z 2 + x2 )dy + (x2 + y 2 )dz = 0. In general, what is the C Z condition for f , g, h so that f dx + gdy + hdz = 0 for any closed curve C on C

any sphere centered at the origin? Exercise 7.3.16. Find the formulae for the curls of F + G, gF . Exercise 7.3.17. Prove that curl(gradf ) = ~0. Moreover, compute curl(curlF ). Exercise 7.3.18. The electric field E induced by a changing magnetic field is also changing and follows Ampere’s law Z Z Z d B · d~x = µ0 J · ~ndA + 0 µ0 E · ~ndA, (7.3.8) dt S S C where J is the current density, and µ0 , 0 are some physical constants. Derive the differential version of Ampere’s law, which is the other Maxwell’s equation for electromagnetic fields.

Green theorem can be further extended to surfaces in Euclidean spaces of dimension higher than 3. Suppose S is an oriented surface given by an orientation compatible regular parametrization σ(u, v) : A ⊂ R2 → Rn . The boundary C of S corresponds to the boundary D of A. The compatible orientation of C is described as follows. At any ~x = σ(u, v) ∈ C, the tangent plane T~x S of the surface is spanned by the tangent vectors σu and σv . On the tangent plane is the normal vector ~n ∈ T~x S pointing away from the surface S. Then the tangent vector of C is a unit length vector ~t ∈ T~x S orthogonal to ~n. Among the two choices of ~t (one is the negative of the other), the one compatible with the orientation of S is such that the rotation from σu to σv is in the same direction of the rotation from ~n to ~t. The condition can be rephrased as follows. Since both {σu , σv } and {~n, ~t} are bases of the 2-dimensional subspace T~x S, we have σu ∧ σv = λ~n ∧ ~t,

7.3. STOKES THEOREM

413

for some number λ 6= 0. The tangent vector ~t is the only unit length vector orthogonal to ~n, such that λ > 0 in the equality above. Now for a continuously differentiable f , by Green theorem, we have Z Z ∂xj ∂xj f dxj = f du + dv ∂u ∂v C D Z ∂ ∂xj ∂ ∂xj = f − f dudv ∂v ∂v ∂u ∂u A Z ∂f ∂xj ∂f ∂xj = − dudv ∂v ∂v ∂u ∂u A Z X ∂f ∂xi ∂xj ∂f ∂xi ∂xj = − dudv ∂xi ∂u ∂v ∂xi ∂v ∂u A i Z X Z X ∂(xi , xj ) ∂f ∂f det dudv = dxi ∧ dxj . = ∂(u, v) S i6=j ∂xi A i6=j ∂xi Thus for a continuously differentiable vector field F = (f1 , f2 , . . . , fn ) on Rn , we have Z X Z X Z ∂fj dxi ∧ dxj fj dxj = F · d~x = S i6=j ∂xi C j C Z X ∂fj ∂fi = − dxi ∧ dxj . (7.3.9) ∂xi ∂xj S i<j This is Stokes theorem for oriented surfaces in Rn . Similar to Green theorem, Stokes theorem has implication on how the integral of a 1-form depends on the choice of the curve. Suppose a vector field F satisfies ∂fi ∂fj = (7.3.10) ∂xi ∂xj on an open subset U ⊂ Rn . A collection C of oriented closed curves is homologous to 0 in U if it is the compatibly oriented boundary of an oriented surface S in U (in fact, a continuously differentiable map from S to U is sufficient). Stokes theorem tells Z us that if F satisfies (7.3.10) on U and C is F · d~x = 0. Moreover, two oriented curves C

homologous to 0 in U , then C

and D that have the same begining and end points are homologous in U if the closed curve C ∩ (−D) is homologous to 0. Then the equalities (7.3.10) Z F · d~x depends only on the homologous class of C.

imply that the integral C

A special case is when the subset U is simply connected, which means that any continuous map S 1 → U extends to a continuous map B 2 → U . Then C ∩ (−D) is the boundary of a disk in U . With the same proof, Theorem 7.3.2 may be extended. Theorem 7.3.4. Suppose F = (f1 , f2 , . . . , fn ) is a continuous vector field on an open subset U ⊂ Rn . Then the following are equivalent.

414

CHAPTER 7. MULTIVARIABLE INTEGRATION Z F ·d~x along an oriented rectifiable curve C in U depends

1. The integral C

only on the beginning and end points of C. 2. There is a differentiable function ϕ on U , such that ∇ϕ = F . Moreover, if U is simply connected and F is continuously differentiable, then the above is also equivalent to ∂fj ∂fi = ∂xi ∂xj for any i, j. In R3 , the theorem says that F = gradϕ for some ϕ on a simply connected region if and only if curlF = ~0. The potential function ϕ is given by Z ~x ϕ(~x) = ϕ(~x0 ) + F · d~x ~ x0

and also satisfies dϕ = F · d~x. Example 7.3.8. The vector field (f, g, h) = (yz(2x + y + z), zx(x + 2y + z), xy(x + y + 2z)) satisfies fy = gx = (2x + 2y + z)z, hy = gz = (x + 2y + 2z)x, fz = hx = (2x + y + 2z)y. Therefore the vector field has potential. The potential function can be computed by integrating along successive straight lines connecting (0, 0, 0), (x, 0, 0), (x, y, 0), (x, y, z). The integral is zero on the first two segments, so that Z z ϕ(x, y, z) = xy(x + y + 2z)dz = xyz(x + y + z). 0

Exercise 7.3.19. Determine whether the integral of vector field or the 1-form is independent of the choice of the curves. In the independent case, find the potential function. 1. ex (cos yz, −z sin yz, −y sin yz). 2. y 2 z 3 dx + 2xyz 3 dy + 2xyz 2 dz. 3. (y + z)dx + (z + x)dy + (x + y)dz. 4. (x2 , x3 , . . . , xn , x1 ). 5. (x21 , x22 , . . . , x2n ). dxn dx1 dx2 + + ··· + . 6. x1 x2 · · · xn x1 x2 xn Exercise 7.3.20. Suppose ~a is a nonzero vector. Find condition on a function f (~x) so that the vector field f (~x)~a has a potential function. Exercise 7.3.21. Find condition on a function f (~x) so that the vector field f (~x)~x has a potential function.

7.3. STOKES THEOREM

415

(y − z)dx + (z − x)dy + (x − y)dz . (x − y)2 + (y − z)2 Exercise 7.3.23. Suppose a continuously second order differentiable function g(~x) satisfies ∇g(~x0 ) 6= ~0. Suppose f (~x) is continuously differentiable near ~x0 . Prove that the differential form Exercise 7.3.22. Study the potential function of

f dg = f gx1 dx1 + f gx2 dx2 + · · · + f gxn dxn has a potential near ~x0 if and only if f (~x) = h(g(~x)) for a continuously differentiable h(t).

7.3.4

Gauss Theorem

Green theorem can be extended to regions of higher dimension. For example, let A = {(x, y, z) : (x, y) ∈ B, k(x, y) ≤ z ≤ h(x, y)} be the 3-dimensional region between the graphs of functions h(x, y) and k(x, y) on B ⊂ R2 (see Figure 7.12). Then we have ! Z Z Z h(x,y)

fz (x, y, z)dz

fz dxdydz = B

A

dxdy

k(x,y)

Z (f (x, y, h(x, y)) − f (x, y, k(x, y)))dxdy.

= B

On the other hand, the boundary S of A consists of three pieces. By taking the normal vector ~n of the boundary surface to point outward of A, the orientation compatible parametrizations of the three pieces are Sb : σ(y, x) = (x, y, k(x, y)), St : σ(x, y) = (x, y, h(x, y)), Sv : σ(t, z) = (x(t), y(t), z),

(x, y) ∈ B (x, y) ∈ B k(x, y) ≤ z ≤ h(x, y)

where (x(t), y(t)) is a parametrization of the boundary curve of B. We have Z Z Z Z f dx ∧ dy f dx ∧ dy = + + St Sv Sb S Z ∂(x, y) = f (x, y, k(x, y)) det dxdy ∂(y, x) B Z ∂(x, y) + f (x, y, h(x, y)) det dxdy ∂(x, y) B Z ∂(x, y) dtdz + f (x(t), y(t), z) det ∂(t, z) a≤t≤b,k(x,y)≤z≤h(x,y) Z Z = − f (x, y, k(x, y))dxdy + f (x, y, h(x, y))dxdy + 0. B

Z Thus we proved

B

Z f dx∧dy =

S

fz dxdydz. The equality can be extended A

to regions that can be divided by vertical planes into regions between graphs

416

CHAPTER 7. MULTIVARIABLE INTEGRATION

x

..... z ~n .. .. z = h(x, y) .... .. .. .................................................................................................. St ....... .... .. .. .......... ........... ........................................ .. .. ................. .. .. ................................................ .. .. .. .. .. .. .. .. z = k(x, y) A ..Sv .. .. b .. .. .................................................................................S . .......... ........................... ~n . . . . . . . . . . . .. ............ . ...... ... . . .. . .. ...................................... ..................................................................... .......... .. .. .... .. . ~n . .. . . . . . . ................................................................................................................................................................................. y . . . . ... . . .... .. .... ................................................................................... .... . . ..... . ........ .... B .. . ....................... . . . . . . . . . . . . . . . . . ...... . ......................................... Figure 7.12: Gauss Theorem for a special case

of continuously differentiable functions. Similar equalities can be established by rotating x, y and z. Similar to the 2-dimensional case, we may consider a subset A ⊂ R3 with surfaces S1 , S2 , . . . , Sk as the boundary. We further assume that A is contained in only one side of each Si . Let ~n be the normal vector of the surfaces pointing outward of A. This induces orientations on the surfaces. We denote by S the union of the oriented boundary surfaces and say A is a region with compatibly oriented boundary surfaces S. Theorem 7.3.5 (Gauss Theorem). Suppose A ⊂ R3 a region with boundary surfaces S compatibly oriented with respect to the outward normal vector ~n. Then for any continuously differentiable F = (f, g, h), we have Z Z f dy ∧ dz + gdz ∧ dx + hdx ∧ dy = (fx + gy + hz )dxdydz. S

A

Define the divergence divF = ∇ · F =

∂g ∂h ∂f + + . ∂x ∂y ∂z

The Gauss theorem means that the outward flux of a flow F = (f, g, h) is equal to the integral of the divergence of the flow on the solid Z Z F · ~ndA = divF dV. S

A

Example 7.3.9. The volume of the region enclosed by a surface S ⊂ R3 without boundary is Z Z Z Z 1 xdy ∧ dz = ydz ∧ dx = zdx ∧ dy = (x, y, z) · ~ndA. 3 S S S S

7.3. STOKES THEOREM

417

Example 7.3.10. In Example 7.2.22, the outgoing flux of the flow F = (x2 , y 2 , z 2 ) (x − x0 )2 (y − y0 )2 (z − z0 )2 through the ellipse + + = 1 is computed by surface a2 b2 c2 integral. Alternatively, by Gauss theorem, the flux is Z Z 2(x + y + z)dxdydz F · ~ndA = 2 2 2 (x−x0 ) a2

+

(y−y0 ) b2

+

(z−z0 ) c2

≤1

Z =

2 2 x2 + y2 + z2 ≤1 a2 b c

2(x + x0 + y + y0 + z + z0 )dxdydz.

By the transform ~x → −~x, we get Z 2 2 x2 + y2 + z2 ≤1 a2 b c

(x + y + z)dxdydz = 0.

Therefore the flux is Z 2 2 x2 + y2 + z2 ≤1 a2 b c

2(x0 + y0 + z0 )dxdydz =

8πR3 abc(x0 + y0 + z0 ). 3

Example 7.3.11. To compute the upward flux of F = (xz, −yz, (x2 + y 2 )z) through the surface S given by 0 ≤ z = 4−x2 −y 2 , we introduce the disk D = {(x, y, 0) : x2 + y 2 ≤ 4} on the (x, y)-plane. Taking the normal direction of D to be (0, 0, 1), the surface S ∪ (−D) is the boundary of the region A given by 0 ≤ z ≤ 4 − x2 − y 2 . By Gauss theorem, the flux through S is Z Z Z F · ~ndA = F · ~ndA + F · ~ndA S S∪(−D) D Z Z 2 2 = (z − z + x + y )dxdydz + F (x, y, 0) · (0, 0, 1)dA A x2 +y 2 ≤4 Z Z 32π = (x2 + y 2 )dxdydz = r2 (4 − r2 )rdrdθ = . 3 A 0≤r≤2,0≤θ≤2π Example 7.3.12. The gravitational field created by a mass M at point ~x0 ∈ R is G=−

M (~x − ~x0 ). k~x − ~x0 k32

A straightforward computation shows divG = 0. Suppose A is a region with compatibly oriented surfaces S asZ the boundary and ~x0 6∈ S. If ~x0 6∈ A, then by G ·~ndA = 0. If ~x0 ∈ A, then let B be the ball

Gauss theorem, the outward flux S

of radius centered at ~x0 . The boundary of the ball is the sphere S , which we ~x − ~x0 . given an orientation compatible with the outward normal vector ~n = k~x − ~x0 k2 For sufficiently small , the ball is contained in A. Moreover, A − B is a region not containing ~x0 and has compatibly oriented surfaces S ∪ (−S ) as the boundary. Therefore Z Z Z Z M ~x − ~x0 M G·~ndA = G·~ndA = − 3 (~x−~x0 )· dA = − 2 dA = −4πM. S S S k~ x−~ x0 k2 = More generally, the gravitational field created by several masses at various locations is the sum of the individual gravitational field. The outward flux of the field through S is then −4π multiplied to the total mass contained in A. In particular, the flux does not depend on the location of the mass, but whether the mass is contained in A or not. This is called Gauss’ Law.

418

CHAPTER 7. MULTIVARIABLE INTEGRATION

Exercise 7.3.24. Compute the flux. 1. Outward flux of (x3 , x2 y, x2 z) through boundary of the solid x2 + y 2 ≤ a2 , 0 ≤ z ≤ b. 2. Inward flux of (xy 2 , yz 2 , zx2 ) through the ellipse (z − z0 )2 = 1. c2

(x − x0 )2 (y − y0 )2 + + a2 b2

3. Upward flux of (x3 , y 3 , z 3 ) through the surface z = x2 + y 2 ≤ 1. 4. Outward flux of (x2 , y 2 , −2(x + y)z) through the torus in Example 6.1.10. Exercise 7.3.25. Suppose A ⊂ R3 is a convex region with boundary surface S. Suppose ~a is a vector in the interior of A, and p is the distance from ~a to the Z pdA. Moreover, for the special case S is the Z x2 y 2 z 2 1 ellipse 2 + 2 + 2 = 1 and ~a = (0, 0, 0), compute dA. a b c S p Exercise 7.3.26. Find the formulae for the divergences of F + G, gF , F × G. tangent plane of S. Compute

S

Exercise 7.3.27. Prove that div(curlF ) = 0. Moreover, compute div(gradf ) and grad(divF ).

Gauss theorem can be further extended to higher dimension. Suppose A ⊂ Rn is the n-dimensional region between the graphs xn = h(x1 , x2 , . . . , xn−1 ) and xn = k(x1 , x2 , . . . , xn−1 ) for (x1 , x2 , . . . , xn−1 ) ∈ B. Then we have Z Z fxn dx1 dx2 · · · dxn = (f (~u, h(~u)) − f (~u, k(~u)))du1 du2 · · · dun−1 . B

A

Suppose the boundary S of A is oriented to be compatible with the normal vector pointing away from A. Then the normal vector for the top boundary St points in the direction of xn and the normal vector for the bottom boundary Sb points opposite to the direction of xn . By the discussion at the end of Section 7.2.7, the parametrization σ(~u) = (~u, h(~u)) for St is compatibly oriented if n is odd and is oppositely oriented if n is even. Consequently, we get Z f (~u, h(~u))du1 du2 · · · dun−1 B Z ∂(x1 , x2 , . . . , xn−1 ) = f (σ(~u)) du1 du2 · · · dun−1 ∂(u1 , u2 , . . . , un−1 ) B Z = (−1)n−1 f dx1 ∧ dx2 ∧ · · · ∧ dxn−1 . St

Similar argument may be made for the bottom boundary Sb , for which we note that the parametrization σ(~u) = (~u, k(~u)) is compatibly oriented (the normal vector point opposite to the direction of xn ) if n is even and is oppositely oriented if n is odd. We then conclude that Z Z (f (~u, h(~u))−f (~u, k(~u)))du1 du2 · · · dun−1 = (−1)n−1 f dx1 ∧dx2 ∧· · ·∧dxn−1 . B

S

7.3. STOKES THEOREM

419

The argument can be similarly applied to the case A is given by h(x1 , . . . , xi−1 , xi+1 , . . . , xn−1 ) ≥ xi ≥ k(x1 , . . . , xi−1 , xi+1 , . . . , xn−1 ). In this case, we have Z Z fxi dx1 dx2 · · · dxn = f (u1 , . . . , ui−1 , h(~u), ui , . . . , un )du1 du2 · · · dun−1 A B Z − f (u1 , . . . , ui−1 , k(~u), ui , . . . , un )du1 du2 · · · dun−1 . B

Moreover, St may be parametrized by σi (~u) = σi (u1 , u2 , . . . , un−1 ) = (u1 , u2 , . . . , ui−1 , h(~u), ui , . . . , un−1 ). The normal vector for St points in the direction of xi . By the discussion at the end of Section 7.2.7, the orientation of the parametrization σi is compatible to the normal vector if i is odd and is not compatible if i is even. Consequently, we get Z f (σi (~u))du1 du2 · · · dun−1 B Z ∂(x1 , x2 , . . . , xbi , . . . , xn ) = f (σi (~u)) du1 du2 · · · dun−1 ∂(u1 , u2 , . . . , un−1 ) B Z ci ∧ · · · ∧ dxn−1 . = (−1)i−1 f dx1 ∧ dx2 ∧ · · · ∧ dx St

Therefore we get the following extension of Gauss theorem. Suppose A ⊂ Rn is a region with boundary hypersurface S compatibly oriented with respect to the outward normal vector ~n. Then for any continuously differentiable F = (f1 , f2 , . . . , fn ), we have Z X ci ∧ · · · ∧ dxn (−1)i−1 fi dx1 ∧ dx2 ∧ · · · ∧ dx ZS ∂f2 ∂fn ∂f1 + + ··· + dx1 dx2 · · · dxn . (7.3.11) = ∂x1 ∂x2 ∂xn A Extend the definition of divergence to vector fields in Rn divF =

∂f1 ∂f2 ∂fn + + ··· + . ∂x1 ∂x2 ∂xn

The Gauss theorem can be rewritten as Z Z F · ~ndV = divF dµ~x . S

A