THE AMERICAN MATHEMATICAL
MONTHLY VOLUME 119, NO. 2
FEBRUARY 2012
Solving a Generalized Heron Problem by Means of Convex Analysis
Boris S. Mordukhovich, Nguyen Mau Nam, and Juan Salinas Jr.  87

Jacobi Sum Matrices
Sam Vandervelde  100

Alcuin's Sequence
Donald J. Bindner and Martin Erickson  115

A Case of Continuous Hangover
Burkard Polster, Marty Ross, and David Treeby  122

NOTES
Riemann Maps and Diameter Distance
David A. Herron  140

A Power Series Approach to Some Inequalities
Cristinel Mortici  147

Chebyshev Mappings of Finite Fields
Julian Rosen, Zachary Scherr, Benjamin Weiss, and Michael E. Zieve  151

Collapsing Walls Theorem
Igor Pak and Rom Pinchasi  156

PROBLEMS AND SOLUTIONS  161

REVIEWS
Roads to Infinity: The Mathematics of Truth and Proof. By John Stillwell. Reviewed by José Ferreirós  169

An Official Publication of the Mathematical Association of America
THE AMERICAN MATHEMATICAL
MONTHLY Volume 119, No. 2
February 2012
EDITOR Scott T. Chapman Sam Houston State University NOTES EDITOR Sergei Tabachnikov Pennsylvania State University
Douglas B. West University of Illinois
BOOK REVIEW EDITOR Jeffrey Nunemacher Ohio Wesleyan University
PROBLEM SECTION EDITORS Gerald Edgar Ohio State University
Doug Hensley Texas A&M University
ASSOCIATE EDITORS William Adkins Louisiana State University David Aldous University of California, Berkeley Elizabeth Allman University of Alaska, Fairbanks Jonathan M. Borwein University of Newcastle Jason Boynton North Dakota State University Edward B. Burger Williams College Minerva Cordero-Epperson University of Texas, Arlington Beverly Diamond College of Charleston Allan Donsig University of Nebraska, Lincoln Michael Dorff Brigham Young University Daniela Ferrero Texas State University Luis David Garcia-Puente Sam Houston State University Sidney Graham Central Michigan University Tara Holm Cornell University Roger A. Horn University of Utah Lea Jenkins Clemson University Daniel Krashen University of Georgia
Ulrich Krause Universität Bremen Jeffrey Lawson Western Carolina University C. Dwight Lahr Dartmouth College Susan Loepp Williams College Irina Mitrea Temple University Bruce P. Palka National Science Foundation Vadim Ponomarenko San Diego State University Catherine A. Roberts College of the Holy Cross Rachel Roberts Washington University, St. Louis Ivelisse M. Rubio Universidad de Puerto Rico, Rio Piedras Adriana Salerno Bates College Edward Scheinerman Johns Hopkins University Susan G. Staples Texas Christian University Dennis Stowe Idaho State University Daniel Ullman George Washington University Daniel Velleman Amherst College
EDITORIAL ASSISTANT Bonnie K. Ponce
NOTICE TO AUTHORS The MONTHLY publishes articles, as well as notes and other features, about mathematics and the profession. Its readers span a broad spectrum of mathematical interests, and include professional mathematicians as well as students of mathematics at all collegiate levels. Authors are invited to submit articles and notes that bring interesting mathematical ideas to a wide audience of MONTHLY readers. The MONTHLY’s readers expect a high standard of exposition; they expect articles to inform, stimulate, challenge, enlighten, and even entertain. MONTHLY articles are meant to be read, enjoyed, and discussed, rather than just archived. Articles may be expositions of old or new results, historical or biographical essays, speculations or definitive treatments, broad developments, or explorations of a single application. Novelty and generality are far less important than clarity of exposition and broad appeal. Appropriate figures, diagrams, and photographs are encouraged. Notes are short, sharply focused, and possibly informal. They are often gems that provide a new proof of an old theorem, a novel presentation of a familiar theme, or a lively discussion of a single issue. Beginning January 1, 2011, submission of articles and notes is required via the MONTHLY’s Editorial Manager System. Initial submissions in pdf or LATEX form can be sent to the Editor Scott Chapman at http://www.editorialmanager.com/monthly The Editorial Manager System will cue the author for all required information concerning the paper. Questions concerning submission of papers can be addressed to the Editor at
[email protected]. Authors who use LATEX are urged to use article.sty, or a similar generic style, and its standard environments with no custom formatting. A formatting document for MONTHLY references can be found at http://www.shsu.edu/~bks006/ FormattingReferences.pdf. Follow the link to Electronic Publications Information for authors at http: //www.maa.org/pubs/monthly.html for information about figures and files, as well as general editorial guidelines. Letters to the Editor on any topic are invited. Comments, criticisms, and suggestions for making the MONTHLY more lively, entertaining, and informative can be forwarded to the Editor at
[email protected]. The online MONTHLY archive at www.jstor.org is a valuable resource for both authors and readers; it may be searched online in a variety of ways for any specified keyword(s). MAA members whose institutions do not provide JSTOR access may obtain individual access for a modest annual fee; call 800-3311622. See the MONTHLY section of MAA Online for current information such as contents of issues and descriptive summaries of forthcoming articles: http://www.maa.org/
Proposed problems or solutions should be sent to: DOUG HENSLEY, MONTHLY Problems Department of Mathematics Texas A&M University 3368 TAMU College Station, TX 77843-3368 In lieu of duplicate hardcopy, authors may submit pdfs to
[email protected]. Advertising Correspondence: MAA Advertising 1529 Eighteenth St. NW Washington DC 20036 Phone: (877) 622-2373 E-mail:
[email protected] Further advertising information can be found online at www.maa.org Change of address, missing issue inquiries, and other subscription correspondence: MAA Service Center,
[email protected] All at the address: The Mathematical Association of America 1529 Eighteenth Street, N.W. Washington, DC 20036 Recent copies of the MONTHLY are available for purchase through the MAA Service Center.
[email protected], 1-800-331-1622 Microfilm Editions: University Microfilms International, Serial Bid coordinator, 300 North Zeeb Road, Ann Arbor, MI 48106. The AMERICAN MATHEMATICAL MONTHLY (ISSN 0002-9890) is published monthly except bimonthly June-July and August-September by the Mathematical Association of America at 1529 Eighteenth Street, N.W., Washington, DC 20036 and Lancaster, PA, and copyrighted by the Mathematical Association of America (Incorporated), 2012, including rights to this journal issue as a whole and, except where otherwise noted, rights to each individual contribution. Permission to make copies of individual articles, in paper or electronic form, including posting on personal and class web pages, for educational and scientific use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear the following copyright notice: [Copyright the Mathematical Association of America 2012. All rights reserved.] Abstracting, with credit, is permitted. To copy otherwise, or to republish, requires specific permission of the MAA’s Director of Publications and possibly a fee. Periodicals postage paid at Washington, DC, and additional mailing offices. Postmaster: Send address changes to the American Mathematical Monthly, Membership/Subscription Department, MAA, 1529 Eighteenth Street, N.W., Washington, DC, 20036-1385.
Solving a Generalized Heron Problem by Means of Convex Analysis Boris S. Mordukhovich, Nguyen Mau Nam, and Juan Salinas Jr.
Abstract. The classical Heron problem states: on a given straight line in the plane, find a point C such that the sum of the distances from C to the given points A and B is minimal. This problem can be solved using standard geometry or differential calculus. In light of modern convex analysis, we are able to investigate more general versions of this problem. In this paper we propose and solve the following problem: on a given nonempty closed convex subset of R^s, find a point such that the sum of the distances from that point to n given nonempty closed convex subsets of R^s is minimal.
1. PROBLEM FORMULATION. Heron from Alexandria (10–75 AD) was "a Greek geometer and inventor whose writings preserved for posterity a knowledge of the mathematics and engineering of Babylonia, ancient Egypt, and the Greco-Roman world" (from the Encyclopedia Britannica). One of the geometric problems he proposed in his Catoptrica was as follows: find a point on a straight line in the plane such that the sum of the distances from it to two given points is minimal.

Recall that a subset Ω of R^s is called convex if λx + (1 − λ)y ∈ Ω whenever x and y are in Ω and 0 ≤ λ ≤ 1. Our idea now is to consider a much broader situation, where the two given points in the classical Heron problem are replaced by finitely many closed and convex subsets Ω_i, i = 1, . . . , n, and the given line is replaced by a given closed and convex subset Ω of R^s. We are looking for a point in the set Ω such that the sum of the distances from that point to Ω_i, i = 1, . . . , n, is minimal. The distance from a point x to a nonempty set Ω is understood in the conventional way

    d(x; Ω) = inf{ ||x − y|| : y ∈ Ω },    (1.1)

where || · || is the Euclidean norm in R^s. The new generalized Heron problem is formulated as follows:

    minimize D(x) := Σ_{i=1}^n d(x; Ω_i) subject to x ∈ Ω,    (1.2)
where all the sets Ω and Ω_i, i = 1, . . . , n, are nonempty, closed, and convex; these are our standing assumptions in this paper. Thus (1.2) is a constrained convex optimization problem, and hence it is natural to use techniques of convex analysis and optimization to solve it.

2. ELEMENTS OF CONVEX ANALYSIS. In this section we review some basic concepts of convex analysis used in what follows. This material and much more can be found, e.g., in the books [2, 3, 4, 7].
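Before turning to the convex-analysis toolkit, here is a minimal numerical sketch (not from the paper, which describes a MATLAB implementation only in Section 4) of the distance function (1.1) and the objective (1.2) for the classical Heron data of two points and a constraint line; all specific numbers and names below are illustrative assumptions.

```python
import numpy as np

# Two target points: the "sets" Omega_1, Omega_2 of (1.2) are singletons here.
A, B = np.array([0.0, 3.0]), np.array([4.0, 1.0])

def dist_to_point(x, a):
    """d(x; {a}) from (1.1): distance from x to the singleton {a}."""
    return np.linalg.norm(x - a)

def objective(x):
    """D(x) from (1.2): sum of distances from x to the target sets."""
    return dist_to_point(x, A) + dist_to_point(x, B)

# Constraint set Omega = the line y = 0; minimize D over it by a crude grid search.
ts = np.linspace(-10.0, 10.0, 200001)
vals = [objective(np.array([t, 0.0])) for t in ts]
t_best = ts[int(np.argmin(vals))]
print("approximate Heron point on the line:", (round(t_best, 4), 0.0))
# For this data the classical reflection argument gives the exact answer (3, 0):
# reflect B across the line and intersect the segment from A to the reflection with y = 0.
```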
Let f : R^s → R̄ := (−∞, ∞] be an extended-real-valued function, which may be infinite at some points, and let

    dom f := { x ∈ R^s : f(x) < ∞ }

be its effective domain. The epigraph of f is the subset of R^s × R defined by

    epi f := { (x, α) ∈ R^{s+1} : x ∈ dom f and α ≥ f(x) }.

The function f is closed if its epigraph is closed, and it is convex if its epigraph is a convex subset of R^{s+1}. It is easy to check that f is convex if and only if

    f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) for all x, y ∈ dom f and λ ∈ [0, 1].

Furthermore, a nonempty closed subset Ω of R^s is convex if and only if the corresponding distance function f(x) = d(x; Ω) is a convex function. Note that the distance function f(x) = d(x; Ω) is Lipschitz continuous on R^s with modulus one, i.e., |f(x) − f(y)| ≤ ||x − y|| for all x, y ∈ R^s.

A typical example of an extended-real-valued function is the indicator function

    δ(x; Ω) := 0 if x ∈ Ω,  δ(x; Ω) := ∞ otherwise,    (2.3)

of the set Ω. It follows immediately from the definitions that Ω ⊂ R^s is closed (resp. convex) if and only if the indicator function (2.3) is closed (resp. convex).

An element v ∈ R^s is called a subgradient of a convex function f : R^s → R̄ at x̄ ∈ dom f if it satisfies the inequality

    f(x) ≥ f(x̄) + ⟨v, x − x̄⟩ for all x ∈ R^s,    (2.4)

where ⟨·, ·⟩ stands for the usual scalar product in R^s. Intuitively, a vector v ∈ R^s is a subgradient of f at x̄ if and only if f is bounded below by an affine function that agrees with f at x̄ and whose linear part has coefficients given by v. The set of all the subgradients v in (2.4) is called the subdifferential of f at x̄ and is denoted by ∂f(x̄). If f is convex and differentiable at x̄, then ∂f(x̄) = {∇f(x̄)}. For example, the function f(x) = |x| is not differentiable at x̄ = 0. However, one can draw several "subtangent" lines that go through (0, 0) with slopes belonging to [−1, 1], and these lines stay below the graph
Figure 1. The absolute value function and “subtangent” lines at (0, 0).
of the function. As a result, we obtain the following subdifferential formula for the absolute value function f(x) = |x|:

    ∂f(x) = {−1} if x < 0,    ∂f(x) = [−1, 1] if x = 0,    ∂f(x) = {1} if x > 0.

A well-recognized technique in optimization is to reduce a constrained optimization problem to an unconstrained one using the indicator function of the constraint set. It is obvious that x̄ ∈ Ω is a minimizer of the general constrained optimization problem

    minimize f(x) subject to x ∈ Ω    (2.5)

if and only if it solves the unconstrained problem

    minimize f(x) + δ(x; Ω), x ∈ R^s.    (2.6)
By the definitions, for any convex function ϕ : R^s → R̄, x̄ is a minimizer of ϕ if and only if

    0 ∈ ∂ϕ(x̄),    (2.7)
which is a nonsmooth convex counterpart of the classical Fermat stationary rule. Applying (2.7) to the constrained optimization problem (2.5) via its unconstrained description (2.6) requires the usage of subdifferential calculus. The most fundamental calculus result of convex analysis is the following Moreau-Rockafellar theorem for the subdifferential of sums; see, e.g., [4, p. 51].

Theorem 2.1. Let ϕ_i : R^s → R̄, i = 1, . . . , m, be closed convex functions. Assume that there is a point x̄ ∈ ∩_{i=1}^m dom ϕ_i at which all (except possibly one) of the functions ϕ_1, . . . , ϕ_m are continuous. Then we have the equality

    ∂(Σ_{i=1}^m ϕ_i)(x) = Σ_{i=1}^m ∂ϕ_i(x) := { Σ_{i=1}^m v_i : v_i ∈ ∂ϕ_i(x) for i = 1, . . . , m }

for all x ∈ ∩_{i=1}^m dom ϕ_i.
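For instance (an illustration, not part of the original article), take ϕ_1(x) = |x| and ϕ_2(x) = |x − 1| on R. Both functions are convex and continuous everywhere, so Theorem 2.1 gives ∂(ϕ_1 + ϕ_2)(0) = ∂ϕ_1(0) + ∂ϕ_2(0) = [−1, 1] + {−1} = [−2, 0]. In particular 0 ∈ ∂(ϕ_1 + ϕ_2)(0), so x̄ = 0 minimizes |x| + |x − 1|, consistent with the fact that every point of [0, 1] minimizes this sum.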
Given a convex set Ω ⊂ R^s and a point x̄ ∈ Ω, the corresponding geometric counterpart of (2.4) is the normal cone to Ω at x̄ defined by

    N(x̄; Ω) := { v ∈ R^s : ⟨v, x − x̄⟩ ≤ 0 for all x ∈ Ω }.    (2.8)

Observe that a vector v belongs to N(x̄; Ω) if and only if it makes a right or obtuse angle with the vector from x̄ to x for any x ∈ Ω. It easily follows from the definitions that

    ∂δ(x̄; Ω) = N(x̄; Ω) for every x̄ ∈ Ω,    (2.9)

which allows us, in particular, to characterize minimizers of the constrained problem (2.5) in terms of the subdifferential (2.4) of f and the normal cone (2.8) to Ω by applying Theorem 2.1 to the function f(x) + δ(x; Ω) in (2.7).
Figure 2. A set and its normal vectors at x = (0, 0).
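As a quick sanity check of the subgradient inequality (2.4) (a small illustrative sketch, not part of the article), one can test candidate slopes v against f(x) = |x| on a grid of sample points:

```python
import numpy as np

def is_subgradient(v, x_bar, xs, f):
    """Check the subgradient inequality f(x) >= f(x_bar) + v*(x - x_bar) on sample points."""
    return all(f(x) >= f(x_bar) + v * (x - x_bar) - 1e-12 for x in xs)

f = abs
xs = np.linspace(-5.0, 5.0, 1001)

# At x_bar = 0 every v in [-1, 1] passes, matching the subdifferential formula for |x|
# derived above; v = 1.5 fails the inequality.
print([v for v in (-1.0, -0.5, 0.0, 0.5, 1.0, 1.5) if is_subgradient(v, 0.0, xs, f)])

# At x_bar = 2 the function is differentiable and only v = 1 passes.
print([v for v in (-1.0, 0.0, 0.5, 1.0) if is_subgradient(v, 2.0, xs, f)])
```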
Finally in this section, we present a useful formula for computing the subdifferential of the distance function (1.1) via the Euclidean projection

    Π(x; Ω) := { x̄ ∈ Ω : ||x − x̄|| = d(x; Ω) }    (2.10)

of x ∈ R^s on the closed and convex set Ω ⊂ R^s. It follows from the definition of the Euclidean projection that Π(x; Ω) = {x} if x ∈ Ω and that it is a singleton when x ∉ Ω. In the sequel, we identify the set Π(x; Ω) with its unique element.

Proposition 2.2. Let Ω ≠ ∅ be a closed and convex subset of R^s. Then

    ∂d(x; Ω) = { (x − Π(x; Ω)) / d(x; Ω) } if x ∉ Ω,    ∂d(x; Ω) = N(x; Ω) ∩ B if x ∈ Ω,

where B is the closed unit ball of R^s. The proof of this formula can be found in [3, p. 181].

3. OPTIMAL SOLUTIONS TO THE GENERALIZED HERON PROBLEM. In this section we derive efficient characterizations of optimal solutions to the generalized Heron problem (1.2), which allow us to completely solve this problem in some important settings. First let us present general conditions that ensure the existence of optimal solutions to (1.2).

Proposition 3.1. Assume that one of the sets Ω and Ω_i, i = 1, . . . , n, is bounded. Then the generalized Heron problem (1.2) admits at least one optimal solution.

Proof. Consider the optimal value

    γ := inf{ D(x) : x ∈ Ω }
in (1.2) and take a minimizing sequence {x_k} ⊂ Ω with D(x_k) → γ as k → ∞. If the constraint set Ω is bounded, then by the classical Bolzano-Weierstrass theorem the sequence {x_k} contains a subsequence converging to a point x̄, which belongs to the
set Ω due to its closedness. Since the function D(x) in (1.2) is continuous, we have D(x̄) = γ, and so x̄ is an optimal solution to (1.2). It remains to consider the case when one of the sets Ω_i, say Ω_1, is bounded. In this case we have for the above sequence {x_k} when k is sufficiently large that d(x_k; Ω_1) ≤ D(x_k) < γ + 1, and thus there exists w_k ∈ Ω_1 with ||x_k − w_k|| < γ + 1 for such indices k. Then ||x_k|| < ||w_k|| + γ + 1, which shows that the sequence {x_k} is bounded. The existence of optimal solutions follows in this case from the arguments above.

To characterize optimal solutions to the generalized Heron problem (1.2) in what follows, for any nonzero vectors u, v ∈ R^s define the cosine of the angle between u and v by the quantity

    cos(v, u) := ⟨v, u⟩ / (||v|| · ||u||).    (3.11)

We say that N(x̄; Ω) is representable by a subspace L = L(x̄) ≠ {0} if

    N(x̄; Ω) = L^⊥ := { v ∈ R^s : ⟨v, u⟩ = 0 whenever u ∈ L }.    (3.12)
The next theorem gives necessary and sufficient conditions for optimal solutions to (1.2) via projections (2.10) on Ω_i incorporated into quantities (3.11).

Theorem 3.2. Consider problem (1.2) in which

    Ω_i ∩ Ω = ∅ for i = 1, . . . , n.    (3.13)

Given x̄ ∈ Ω, define the vectors

    a_i(x̄) := (x̄ − Π(x̄; Ω_i)) / d(x̄; Ω_i) ≠ 0,  i = 1, . . . , n.    (3.14)

Then x̄ ∈ Ω is an optimal solution to the generalized Heron problem (1.2) if and only if we have the inclusion

    −Σ_{i=1}^n a_i(x̄) ∈ N(x̄; Ω).    (3.15)

Suppose in addition that the normal cone N(x̄; Ω) to the constraint set is representable by a subspace L. Then (3.15) is equivalent to

    Σ_{i=1}^n cos(a_i(x̄), u) = 0 whenever u ∈ L \ {0}.    (3.16)
Proof. Problem (1.2) is equivalent to the following unconstrained optimization problem:
    minimize D(x) + δ(x; Ω), x ∈ R^s.    (3.17)

Applying the generalized Fermat rule (2.7), we see that x̄ is a solution to (3.17) if and only if

    0 ∈ ∂(Σ_{i=1}^n d(·; Ω_i) + δ(·; Ω))(x̄).    (3.18)
Since all of the functions d(·; Ω_i), i = 1, . . . , n, are convex and continuous, we employ the subdifferential sum rule of Theorem 2.1 to (3.18) and arrive at

    0 ∈ ∂(D + δ(·; Ω))(x̄) = Σ_{i=1}^n ∂d(x̄; Ω_i) + N(x̄; Ω) = Σ_{i=1}^n a_i(x̄) + N(x̄; Ω),    (3.19)

where the second representation in (3.19) is due to (2.9), assumption (3.13), and the subdifferential description of Proposition 2.2 with a_i(x̄) defined in (3.14). It is obvious that (3.19) and (3.15) are equivalent.

Suppose in addition that the normal cone N(x̄; Ω) to the constraint set is representable by a subspace L. Then the inclusion (3.15) is equivalent to

    0 ∈ Σ_{i=1}^n a_i(x̄) + L^⊥,

which in turn can be written in the form

    ⟨Σ_{i=1}^n a_i(x̄), u⟩ = 0 for all u ∈ L.

Taking into account that ||a_i(x̄)|| = 1 for i = 1, . . . , n by (3.14) and assumption (3.13), the latter equality is equivalent to

    Σ_{i=1}^n ⟨a_i(x̄), u⟩ / (||a_i(x̄)|| · ||u||) = 0 for all u ∈ L \ {0},
which gives (3.16) due to the notation (3.11) and thus completes the proof of the theorem.

To further specify the characterization in Theorem 3.2, recall that a subset A of R^s is an affine subspace if there is a vector a ∈ A and a subspace L such that A = a + L. In this case we say that A is parallel to L. Note that the subspace L parallel to A is uniquely defined by L = A − A = {x − y : x ∈ A, y ∈ A} and that A = b + L for any vector b ∈ A.

Corollary 3.3. Let Ω be an affine subspace parallel to a subspace L, and let assumption (3.13) of Theorem 3.2 be satisfied. Then x̄ ∈ Ω is a solution to the generalized Heron problem (1.2) if and only if condition (3.16) holds.
Proof. To apply Theorem 3.2, it remains to check that N(x̄; Ω) is representable by the subspace L in the setting of this corollary. Indeed, we have Ω = x̄ + L, since Ω is an affine subspace parallel to L. Fix any v ∈ N(x̄; Ω) and get by (2.8) that ⟨v, x − x̄⟩ ≤ 0 whenever x ∈ Ω, and hence ⟨v, u⟩ ≤ 0 for all u ∈ L. Since L is a subspace, the latter implies that ⟨v, u⟩ = 0 for all u ∈ L, and thus N(x̄; Ω) ⊂ L^⊥. The opposite inclusion is trivial, which gives (3.12) and completes the proof of the corollary.

The underlying characterization (3.16) can be checked easily when the subspace L in Theorem 3.2 is given as the span of fixed generating vectors.

Corollary 3.4. Let L = span{u_1, . . . , u_m} with u_j ≠ 0, j = 1, . . . , m, in the setting of Theorem 3.2. Then x̄ ∈ Ω is an optimal solution to the generalized Heron problem (1.2) if and only if

    Σ_{i=1}^n cos(a_i(x̄), u_j) = 0 for j = 1, . . . , m.    (3.20)
Proof. We show that (3.16) is equivalent to (3.20) in the setting under consideration. Since (3.16) obviously implies (3.20), it remains to justify the opposite implication. Set

    a := Σ_{i=1}^n a_i(x̄)

and observe that (3.20) yields the condition

    ⟨a, u_j⟩ = 0 for j = 1, . . . , m,    (3.21)

since u_j ≠ 0 for j = 1, . . . , m and ||a_i(x̄)|| = 1 for i = 1, . . . , n. Taking now any vector u ∈ L \ {0}, we represent it in the form

    u = Σ_{j=1}^m λ_j u_j with λ_j ∈ R

and get from (3.21) the equalities

    ⟨a, u⟩ = Σ_{j=1}^m λ_j ⟨a, u_j⟩ = 0.
This justifies (3.16) and completes the proof of the corollary.

Let us examine in more detail the case of two sets Ω_1 and Ω_2 in (1.2) with the normal cone to the constraint set being a straight line generated by a given vector. This is a direct extension of the classical Heron problem to the setting when the two points are replaced by closed and convex sets, and the constraint line is replaced by a closed convex set with the property above. The next theorem gives a complete and verifiable solution to the new problem.

Theorem 3.5. Let Ω_1 and Ω_2 be subsets of R^s such that Ω ∩ Ω_i = ∅ for i = 1, 2 in (1.2). Suppose also that there is a vector a ≠ 0 such that N(x̄; Ω) = span{a}. The
following assertions hold, where the quantities a_i := a_i(x̄) are defined in (3.14):

(i) If x̄ ∈ Ω is an optimal solution to (1.2), then either a_1 + a_2 = 0 or

    cos(a_1, a) = cos(a_2, a).    (3.22)

(ii) Conversely, if s = 2 and

    either a_1 + a_2 = 0 or [a_1 ≠ a_2 and cos(a_1, a) = cos(a_2, a)],    (3.23)

then x̄ ∈ Ω is an optimal solution to the generalized Heron problem (1.2).

Proof. It follows from Theorem 3.2 that x̄ ∈ Ω is an optimal solution to (1.2) if and only if −a_1 − a_2 ∈ N(x̄; Ω). By the assumed structure of the normal cone to Ω, the latter is equivalent to the alternative:

    either a_1 + a_2 = 0 or a_1 + a_2 = λa for some λ ≠ 0.    (3.24)

To justify (i), let us show that the second equality in (3.24) implies the corresponding one in (3.22). Indeed, we have ||a_1|| = ||a_2|| = 1, and thus

    ⟨a_1, λa⟩ = ⟨a_1, a_1 + a_2⟩ = ⟨a_1, a_1⟩ + ⟨a_1, a_2⟩ = 1 + ⟨a_1, a_2⟩ = ⟨a_2, a_2⟩ + ⟨a_2, a_1⟩ = ⟨a_2, a_1 + a_2⟩ = ⟨a_2, λa⟩,

which ensures that ⟨a_1, a⟩ = ⟨a_2, a⟩ as λ ≠ 0. This gives us the equality cos(a_1, a) = cos(a_2, a) due to ||a_1|| = ||a_2|| = 1 and a ≠ 0. Hence we arrive at (3.22).

To justify (ii), we need to prove that the relationships in (3.23) imply

    −a_1 − a_2 ∈ N(x̄; Ω) = span{a}.    (3.25)

If a_1 + a_2 = 0, then (3.25) is obviously satisfied. Consider the alternative in (3.23) when a_1 ≠ a_2 and cos(a_1, a) = cos(a_2, a). Choose a vector b ≠ 0 orthogonal to a and express a_1 and a_2 in terms of the basis {a, b} by a_1 = x_1 a + y_1 b and a_2 = x_2 a + y_2 b. Since cos(a_1, a) = cos(a_2, a), we have x_1 = x_2. Then y_1 = ±y_2 by ||a_1||² = ||a_2||². Due to a_1 ≠ a_2 this implies y_1 = −y_2, so that a_1 + a_2 = 2x_1 a ∈ span{a}, which completes the proof.

Finally in this section, we present two examples illustrating the application of Theorem 3.2 and Corollary 3.4, respectively, to solving the corresponding generalized and classical Heron problems.

Example 3.6. Consider problem (1.2) where n = 2, the sets Ω_1 and Ω_2 are two points A_1 and A_2 in the plane, and the constraint Ω is a disk that does not contain A_1 or A_2. Condition (3.15) from Theorem 3.2 characterizes a solution M ∈ Ω to this generalized
Heron problem as follows. If the line segment A_1A_2 intersects the disk, then any point in the intersection is an optimal solution. In this case the problem may actually have infinitely many solutions. Otherwise, there is a unique point M on the circle such that a normal vector n to Ω at M is the angle bisector of the angle A_1MA_2, and that is the only optimal solution to the generalized Heron problem under consideration; see Figure 3.
Figure 3. Generalized Heron problems for two points with disk constraints.
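As a numerical check of this characterization (an illustrative sketch with made-up data, not taken from the article), one can minimize the sum of distances over the boundary circle directly and inspect the bisector condition:

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

# Illustrative data: disk of radius 1.5 centered at c, points A1, A2 outside,
# chosen so that the segment A1A2 misses the disk.
c, r = np.array([0.0, 0.0]), 1.5
A1, A2 = np.array([-4.0, 3.0]), np.array([5.0, 2.0])

# Minimize D(M) = |M - A1| + |M - A2| over the boundary circle by dense sampling.
thetas = np.linspace(0.0, 2.0 * np.pi, 200000, endpoint=False)
boundary = c + r * np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
D = np.linalg.norm(boundary - A1, axis=1) + np.linalg.norm(boundary - A2, axis=1)
M = boundary[int(np.argmin(D))]

# At the optimum the outward normal n = M - c should bisect the angle A1-M-A2,
# i.e., make equal angles with the directions from M toward A1 and A2 (Example 3.6).
n = unit(M - c)
print(np.dot(unit(A1 - M), n), np.dot(unit(A2 - M), n))  # nearly equal
```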
Example 3.7. Consider problem (1.2), where Ω_i = {A_i}, i = 1, . . . , n, are n points in the plane, and where Ω = L ⊂ R² is a straight line that does not contain any of these points. Then, by Corollary 3.4 of Theorem 3.2, a point M ∈ L is a solution to this generalized Heron problem if and only if

    cos(MA_1, a) + · · · + cos(MA_n, a) = 0,

where the vectors MA_i point from M to A_i and a is a direction vector of L. Note that the latter equation completely characterizes the solution of the classical Heron problem in the plane in both cases, when A_1 and A_2 are on the same side of L and when they are on different sides of L; see Figure 4.
Figure 4. The classical Heron problem.
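Here is a small illustrative check of this cosine condition (the data below are assumptions, not the article's):

```python
import numpy as np

# Illustrative data: three points and the constraint line L = {(t, 0)}.
points = [np.array([-3.0, 4.0]), np.array([0.5, 2.0]), np.array([4.0, 5.0])]
a = np.array([1.0, 0.0])                        # direction vector of L

ts = np.linspace(-20.0, 20.0, 400001)
Ms = np.stack([ts, np.zeros_like(ts)], axis=1)  # candidate points M on L
D = sum(np.linalg.norm(P - Ms, axis=1) for P in points)
M = Ms[int(np.argmin(D))]

# Example 3.7 / Corollary 3.4: at the optimum the cosines of the angles between
# the vectors from M to the points A_i and the direction of L sum to zero.
cos_sum = sum(np.dot(P - M, a) / np.linalg.norm(P - M) for P in points)
print(M, cos_sum)                               # cos_sum should be near 0
```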
4. NUMERICAL ALGORITHM AND ITS IMPLEMENTATION. In this section we present and justify an iterative algorithm to solve the generalized Heron problem (1.2) numerically and illustrate its implementations using MATLAB in two important settings with disk and ball constraints. Here is the main algorithm.

Theorem 4.1. Let Ω and Ω_i, i = 1, . . . , n, be nonempty closed convex subsets of R^s such that at least one of them is bounded. Picking a sequence {α_k} of positive numbers and a starting point x_1 ∈ Ω, consider the iterative algorithm:

    x_{k+1} = Π(x_k − α_k Σ_{i=1}^n v_{ik}; Ω),  k = 1, 2, . . . ,    (4.26)

where the vectors v_{ik} in (4.26) are constructed by

    v_{ik} := (x_k − Π(x_k; Ω_i)) / d(x_k; Ω_i) if x_k ∉ Ω_i,  and  v_{ik} := 0 if x_k ∈ Ω_i.
Assume that the given sequence {α_k} in (4.26) satisfies the conditions

    Σ_{k=1}^∞ α_k = ∞  and  Σ_{k=1}^∞ α_k² < ∞.    (4.27)

Then the iterative sequence {x_k} in (4.26) converges to an optimal solution of the generalized Heron problem (1.2) and the value sequence

    V_k := min{ D(x_j) : j = 1, . . . , k }    (4.28)
converges to the optimal value V̂ in this problem.

Proof. Observe that algorithm (4.26) is well posed, since the projection to a convex set used in (4.26) is uniquely defined. Since one of the sets Ω and Ω_i, i = 1, . . . , n, is bounded, the problem has an optimal solution by Proposition 3.1. This algorithm and its convergence under conditions (4.27) are based on the subgradient method for convex functions in the so-called "square summable but not summable case" (see, e.g., [1, Proposition 8.2.8, p. 480]), the subdifferential sum rule of Theorem 2.1, and the subdifferential formula for the distance function given in Proposition 2.2. The reader can compare this algorithm and its justifications with the related developments in [6] for the numerical solution of the (unconstrained) generalized Fermat-Torricelli problem.

Example 4.2. Consider the generalized Heron problem (1.2) for (not necessarily disjoint) squares Ω_i, i = 1, . . . , n, of right position in R² (i.e., such that the sides of each square are parallel to the x-axis and the y-axis) subject to a given disk constraint Ω. Let c_i = (a_i, b_i) and r_i, i = 1, . . . , n, be the centers and half the side lengths of the squares under consideration. The vertices of the ith square are denoted by

    q_{1i} = (a_i + r_i, b_i + r_i),  q_{2i} = (a_i − r_i, b_i + r_i),  q_{3i} = (a_i − r_i, b_i − r_i),  q_{4i} = (a_i + r_i, b_i − r_i).

Let r and p = (ν, η) be the radius and the center of the constraint disk Ω.
The projection P(x, y) := Π((x, y); Ω) for (x, y) ∉ Ω is calculated by P(x, y) = (w_x + ν, w_y + η) with

    w_x = r(x − ν) / √((x − ν)² + (y − η)²)  and  w_y = r(y − η) / √((x − ν)² + (y − η)²).

For (x, y) ∈ Ω, P(x, y) = (x, y). Let x_k = (x_{1k}, x_{2k}) be the sequence defined by (4.26), in which the quantities v_{ik} are computed by the following formula:

    v_{ik} = 0                                  if |x_{1k} − a_i| ≤ r_i and |x_{2k} − b_i| ≤ r_i,
    v_{ik} = (x_k − q_{1i}) / ||x_k − q_{1i}||  if x_{1k} − a_i > r_i and x_{2k} − b_i > r_i,
    v_{ik} = (x_k − q_{2i}) / ||x_k − q_{2i}||  if x_{1k} − a_i < −r_i and x_{2k} − b_i > r_i,
    v_{ik} = (x_k − q_{3i}) / ||x_k − q_{3i}||  if x_{1k} − a_i < −r_i and x_{2k} − b_i < −r_i,
    v_{ik} = (x_k − q_{4i}) / ||x_k − q_{4i}||  if x_{1k} − a_i > r_i and x_{2k} − b_i < −r_i,
    v_{ik} = (0, 1)                             if |x_{1k} − a_i| ≤ r_i and x_{2k} − b_i > r_i,
    v_{ik} = (0, −1)                            if |x_{1k} − a_i| ≤ r_i and x_{2k} − b_i < −r_i,
    v_{ik} = (1, 0)                             if x_{1k} − a_i > r_i and |x_{2k} − b_i| ≤ r_i,
    v_{ik} = (−1, 0)                            if x_{1k} − a_i < −r_i and |x_{2k} − b_i| ≤ r_i,

for i = 1, . . . , n and k = 1, 2, . . . . To implement this algorithm we developed a MATLAB program. Figure 5 and the corresponding table show the result of applying this algorithm for the disk constraint Ω with center (−3, 4) and radius 1.5, for the squares Ω_i with centers (−7, 1), (−5, −8), (4, 7), and (5, 1) of the same half side length 1, for the starting point x_1 = (−3, 5.5) ∈ Ω, and for the sequence α_k = 1/k in (4.26) satisfying conditions (4.27). The approximate optimal solution and optimal value are x̄ ≈ (−2.04012, 2.84734) and V̂ ≈ 26.13419.
MATLAB RESULT

    k          x_k                      V_k
    1          (−3, 5.5)                30.99674
    100        (−2.02957, 2.85621)      26.13427
    1000       (−2.03873, 2.84850)      26.13419
    100,000    (−2.04010, 2.84735)      26.13419
    200,000    (−2.04011, 2.84735)      26.13419
    400,000    (−2.04012, 2.84734)      26.13419
    600,000    (−2.04012, 2.84734)      26.13419
Figure 5. Generalized Heron problem for squares with a disk constraint.
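For readers who want to experiment with Example 4.2, the following is a Python re-implementation sketch of iteration (4.26) with the data above (the authors' implementation is in MATLAB; this code, its names, and the iteration count are illustrative assumptions). It uses the fact that coordinate-wise clamping computes the projection onto an axis-parallel square, which reproduces the nine cases listed above.

```python
import numpy as np

def proj_disk(x, center, radius):
    """Euclidean projection onto a disk (the formula used in Example 4.2)."""
    d = x - center
    n = np.linalg.norm(d)
    return x if n <= radius else center + radius * d / n

def square_subgradient(x, center, half_side):
    """v_ik from Example 4.2: (x - Pi(x; square)) / d(x; square), or 0 inside the square."""
    p = np.clip(x, center - half_side, center + half_side)  # projection onto the square
    d = np.linalg.norm(x - p)
    return np.zeros_like(x) if d == 0 else (x - p) / d

centers = [np.array(c, dtype=float) for c in [(-7, 1), (-5, -8), (4, 7), (5, 1)]]
half_side = 1.0
disk_center, disk_radius = np.array([-3.0, 4.0]), 1.5

def D(x):  # objective (1.2): sum of distances to the four squares
    return sum(np.linalg.norm(x - np.clip(x, c - half_side, c + half_side)) for c in centers)

x = np.array([-3.0, 5.5])            # starting point x_1 from Example 4.2
best = D(x)
for k in range(1, 200001):           # iteration (4.26) with alpha_k = 1/k
    g = sum(square_subgradient(x, c, half_side) for c in centers)
    x = proj_disk(x - g / k, disk_center, disk_radius)
    best = min(best, D(x))           # value sequence V_k of (4.28)

print(x, best)   # should approach the values reported in the table above
```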
Example 4.3. Consider the generalized Heron problem (1.2) for (not necessarily disjoint) cubes of right position in R³ subject to a ball constraint. In this case the projection Π((x, y, z); Ω) and the quantities v_{ik} are computed similarly to Example 4.2. Once again, we implemented this algorithm with a MATLAB program. Figure 6 and the corresponding table present the calculation results for the ball constraint Ω with center (0, 2, 0) and radius 1, for the cubes Ω_i with centers (0, −4, 0), (−4, 2, −3), (−3, −4, 2), (−5, 4, 4), and (−1, 8, 1) of the same half side length 1, for the starting point x_1 = (0, 2, 0), and for the sequence α_k = 1/k in (4.26) satisfying (4.27). The approximate optimal solution and optimal value are x̄ ≈ (−0.92531, 1.62907, 0.07883) and V̂ ≈ 22.23480.
MATLAB RESULT

    k         x_k                               V_k
    1         (0, 2, 0)                         24.18180
    10        (−0.92583, 1.63052, 0.07947)      22.23480
    100       (−0.92531, 1.62908, 0.07884)      22.23480
    1,000     (−0.92531, 1.62907, 0.07883)      22.23480
    10,000    (−0.92531, 1.62907, 0.07883)      22.23480
    20,000    (−0.92531, 1.62907, 0.07883)      22.23480
    30,000    (−0.92531, 1.62907, 0.07883)      22.23480
Figure 6. Generalized Heron problem for cubes with a ball constraint.
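The only new ingredients in Example 4.3 are the three-dimensional projections; a brief sketch (again an illustration, not the authors' MATLAB code) is:

```python
import numpy as np

def proj_ball(x, center, radius):
    """Projection onto a closed ball in R^3 (same formula as the disk, one more coordinate)."""
    d = x - center
    n = np.linalg.norm(d)
    return x if n <= radius else center + radius * d / n

def proj_cube(x, center, half_side):
    """Projection onto an axis-aligned cube; clamping each coordinate handles all cases."""
    return np.clip(x, center - half_side, center + half_side)
```

With v_{ik} built from proj_cube exactly as in the previous sketch, iteration (4.26) applies verbatim.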
ACKNOWLEDGMENTS. Research of the first author was partially supported by the US National Science Foundation under grant DMS-1007132 and by the Australian Research Council under grant DP-12092508.
REFERENCES

1. D. Bertsekas, A. Nedic, A. Ozdaglar, Convex Analysis and Optimization, Athena Scientific, Belmont, MA, 2003.
2. J. M. Borwein, A. S. Lewis, Convex Analysis and Nonlinear Optimization: Theory and Examples, second edition, Springer, New York, 2006.
3. J.-B. Hiriart-Urruty, C. Lemaréchal, Fundamentals of Convex Analysis, Springer-Verlag, Berlin, 2001.
4. G. G. Magaril-Il'yaev, M. V. Tikhomirov, Convex Analysis: Theory and Applications, American Mathematical Society, Providence, RI, 2003.
5. H. Martini, K. J. Swanepoel, G. Weiss, The Fermat-Torricelli problem in normed planes and spaces, J. Optim. Theory Appl. 115 (2002) 283–314; available at http://dx.doi.org/10.1023/A:1020884004689.
6. B. S. Mordukhovich, N. M. Nam, Applications of variational analysis to a generalized Fermat-Torricelli problem, J. Optim. Theory Appl. 148 (2011) 431–454; available at http://dx.doi.org/10.1007/s10957-010-9761-7.
7. R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.

BORIS MORDUKHOVICH is Distinguished University Professor and President of the Academy of Scholars at Wayne State University. He has more than 300 publications including monographs and patents. Among his best known achievements are the introduction of powerful constructions of generalized differentiation (bearing his name), their development, and applications to broad classes of problems in variational analysis, optimization, equilibrium, control, economics, engineering, and other fields. Mordukhovich is a SIAM Fellow and a recipient of many international awards and honors including Doctor Honoris Causa degrees from four universities. Department of Mathematics, Wayne State University, Detroit, MI 48202
[email protected] NGUYEN MAU NAM received his B.S. from Hue University, Vietnam, in 1998 and his Ph.D. from Wayne State University in 2007 under the direction of Boris Mordukhovich. He is currently an Assistant Professor of Mathematics at the University of Texas-Pan American. Department of Mathematics, University of Texas-Pan American, Edinburg, TX 78539
[email protected] JUAN SALINAS JR. received his B.S. in Electrical Engineering from the University of Texas-Pan American in 1999. He is currently a graduate student at the University of Texas-Pan American. Department of Mathematics, University of Texas-Pan American, Edinburg, TX 78539
[email protected]
Jacobi Sum Matrices Sam Vandervelde
Abstract. In this article we identify several beautiful properties of Jacobi sums that become evident when these numbers are organized as a matrix and studied via the tools of linear algebra. In the process we reconsider a convention employed in computing Jacobi sum values by illustrating how these properties become less elegant or disappear entirely when the standard definition for Jacobi sums is utilized. We conclude with a conjecture regarding polynomials that factor in an unexpected manner.
1. JACOBI SUMS. Carl Jacobi’s formidable mathematical legacy includes such contributions as the Jacobi triple product, the Jacobi symbol, the Jacobi elliptic functions with associated Jacobi amplitudes, and the Jacobian in the change of variables theorem, to but scratch the surface. Among his many discoveries, Jacobi sums stand out as one of the most brilliant gems. Very informally, a Jacobi sum adds together certain roots of unity in a manner prescribed by the arithmetic structure of the finite field on which it is based. (We will supply a precise definition momentarily.) For a given finite field a Jacobi sum depends on two parameters, so it is natural to assemble these values into a matrix. We have done so below for the Jacobi sums arising from the field with eight elements. We invite the reader to study this collection of numbers and identify as many properties as are readily apparent.
      6          −1             −1             −1             −1             −1             −1
     −1       −1 + i√7     (5 − i√7)/2     −1 − i√7     (5 − i√7)/2     −1 + i√7         −1
     −1     (5 − i√7)/2     −1 + i√7       −1 + i√7     (5 − i√7)/2       −1          −1 − i√7
     −1       −1 − i√7      −1 + i√7       −1 − i√7        −1         (5 + i√7)/2   (5 + i√7)/2
     −1     (5 − i√7)/2    (5 − i√7)/2        −1         −1 + i√7      −1 − i√7      −1 + i√7
     −1       −1 + i√7         −1        (5 + i√7)/2    −1 − i√7      −1 − i√7     (5 + i√7)/2
     −1          −1          −1 − i√7    (5 + i√7)/2    −1 + i√7     (5 + i√7)/2     −1 − i√7        (1)
Before enumerating the standard properties of Jacobi sums we offer a modest background on their development and applications. According to [2] Jacobi first proposed these sums as mathematical objects worthy of study in a letter mailed to Gauss in 1827. Ten years later he published his findings, with extensions of his work provided soon after by Cauchy, Gauss, and Eisenstein. It is interesting to note that while Gauss sums will suffice for a proof of quadratic reciprocity, a demonstration of cubic reciprocity along similar lines requires a foray into the realm of Jacobi sums; Eisenstein formulated a generalization of Jacobi sums (see [3]) in order to prove biquadratic reciprocity. As shown in [5], Jacobi sums may be used to estimate the number of integral solutions to congruences such as x 3 + y 3 ≡ 1 mod p. These estimates played an important role in the development of the Weil conjectures [6]. Jacobi sums were also employed by Adleman, Pomerance, and Rumely [1] for primality testing. http://dx.doi.org/10.4169/amer.math.monthly.119.02.100 MSC: Primary 11T24
100
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
Although Jacobi sums have been around for a long time, several of the results presented below seem to have gone unnoticed. We suspect this has to do in part with the fact that the usual definition of Jacobi sums differs slightly from the one we use. Conventional wisdom would have us forego the 6 in the upper left corner of (1) in favor of an 8 and replace each −1 along the top row and left column by a 0. However, some of the most compelling features of Jacobi sum matrices evaporate when the standard definition is used. Therefore one of our purposes in presenting these results is to suggest that this alternative warrants serious consideration as the “primary” definition, at least in the setting of finite fields. To be fair, the version of Jacobi sums we study does appear in the literature: e.g., in [2, Section 2.5], which discusses the relationship between Jacobi sums and cyclotomic numbers. 2. PRELIMINARIES. Recall that there exists a finite field Fq with q elements if and only if q = pr is a power of a prime, and such a field is unique up to isomorphism. We shall not require any specialized knowledge of finite fields beyond the fact that the multiplicative group Fq∗ of nonzero elements forms a cyclic group of order q − 1. The quantity q − 1 appears throughout our discussion, so we set m = q − 1 from here on. Thus Fq∗ has m elements. Fix a generator g of Fq∗ and let ξ = e2πi/m . The function χ defined by χ (g k ) = ξ k for 1 ≤ k ≤ m is an example of a multiplicative character on Fq∗ ; that is, a function χ : Fq∗ → C satisfying χ(1) = 1,
χ(uv) = χ (u)χ (v),
u, v ∈ Fq∗ .
(2)
We use an mth root of unity since (χ(g))m = χ (g m ) = χ (1) = 1. As the reader may verify, there are precisely m multiplicative characters on Fq∗ , namely χ, χ 2 , . . . , χ m , where χ a (g k ) = (χ(g k ))a = ξ ak as one would expect. Note that χ m (g k ) = 1 for all k, so we call χ m the trivial character. It follows that the value of the exponent a only matters mod m. In particular, the inverse of χ a (which is also the complex conjugate) may be written either as χ −a or as χ m−a . By the same token, we will usually write the trivial character as χ 0 . To define a Jacobi sum it is necessary to extend each character χ a to all of Fq by defining χ a (0). The multiplicative condition forces χ a (0) = 0 whenever 1 ≤ a < m. But for the trivial character a seemingly arbitrary choice1 must be made, since taking either χ 0 (0) = 0 or χ 0 (0) = 1 satisfies (2). Convention dictates that we declare χ 0 (0) = 1 for the trivial character. However, we opt for setting χ a (0) = 0 for all a. As the opportunity arises we will point out the ramifications of this choice. Properties of roots of unity now imply that X
χ a (u) = 0,
u∈Fq
1 ≤ a < m,
X
χ 0 (u) = q − 1 = m.
(3)
u∈Fq
(One rationale behind taking χ 0 (0) = 1 is presumably rooted in the fact that the latter sum would come to q rather than q − 1, giving a more pleasing value.) A Jacobi sum takes as its arguments a pair of multiplicative characters on a given finite field and returns a complex number: 1 Ireland and Rosen explain that Jacobi sums arise when counting solutions to equations over F . In this p context χ 0 (0) tallies solutions to x e = 0, which would seem to motivate the value χ 0 (0) = 1. However, one might also argue that the zero solution should not be included since the equations are homogenous, leading to χ 0 (0) = 0 instead.
February 2012]
JACOBI SUM MATRICES
101
Jq (χ a , χ b ) =
X
χ a (u)χ b (1 − u) =
u∈Fq
X
χ a (u)χ b (v).
(4)
u,v∈Fq u+v=1
The middle expression is more utilitarian, while the final one highlights the symmetry in the definition. When the field Fq is clear we will drop the subscript q. We will also often omit χ and refer to a particular Jacobi sum simply as J (a, b). Because the terms of the sum corresponding to u = 0 and u = 1 always vanish, we may write J (a, b) =
X
χ a (u)χ b (1 − u),
(5)
u6 =0,1
where it is understood that the sum is over u ∈ Fq . Thus a Jacobi sum adds together q − 2 not necessarily distinct mth roots of unity. In a marvelous manner this sum plays the additive and multiplicative structures of the field off one another, yielding a collection of numbers with extraordinary properties. To illustrate how these numbers are computed we return to matrix (1), which catalogs the values J8 (χ a , χ b ) for 0 ≤ a, b ≤ 6 for a particular generator g of F∗8 . (For aesthetic reasons we begin numbering rows and columns of this matrix at 0.) The generator g of F∗8 chosen satisfies g 1 + g 3 = 1,
g 2 + g 6 = 1,
g 4 + g 5 = 1,
g 7 + 0 = 1.
(6)
Letting ξ = e2πi/7 we may now calculate, for instance, J (1, 2) = χ(g)χ 2 (1 − g) + χ(g 2 )χ 2 (1 − g 2 ) + · · · + χ (g 6 )χ 2 (1 − g 6 )
(7)
= χ(g)χ 2 (g 3 ) + χ(g 2 )χ 2 (g 6 ) + · · · + χ (g 6 )χ 2 (g 2 ) = χ(g 7 ) + χ(g 14 ) + χ(g 5 ) + χ (g 14 ) + χ (g 13 ) + χ (g 10 ) = ξ 7 + ξ 14 + ξ 5 + ξ 14 + ξ 13 + ξ 10 = 1 + 1 + ξ5 + 1 + ξ6 + ξ3 √ = 25 − 21 i 7, which explains the entry in row 1, column 2 of (1). For 1 ≤ a ≤ 6 we find J (a, 0) = χ a (g)χ 0 (1 − g) + χ a (g 2 )χ 0 (1 − g 2 ) + · · · + χ a (g 6 )χ 0 (1 − g 6 ) 2
(8)
6
= χ (g) + χ (g ) + · · · + χ (g ) a
a
a
= −χ a (g 7 ) = −1, where the penultimate step follows from (3). If we had employed the conventional value for χ 0 (0) the term χ a (g 7 ) would also appear in the sum, giving a total of 0 instead. By √ way of further orientation the reader is encouraged to confirm that J (5, 1) = −1 + i 7 and that J (3, 4) = −1. A cursory examination shows that matrix (1) is symmetric, that the top left entry equals q − 2, and that the remaining entries along the top row, the left column, and the secondary diagonal are √ −1. Slightly less obvious is the fact that all other entries have an absolute value of 8. The sum of the entries along the top row is 0; a quick check 102
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
reveals the same is true for every row and column. We summarize these properties below without proof. (One may consult [5] for details.) Proposition 1. Fix a generator g of Fq∗ , let χ (g k ) = ξ k with ξ = e2πi/m be the corresponding character, take χ a (0) = 0 for all a, and abbreviate Jq (χ a , χ b ) to J (a, b). Then for 0 ≤ a, b ≤ m − 1 we have (i) (ii) (iii) (iv) (v) (vi)
J (a, b) = J (b, a), J (0, 0) = q − 2, J (a, 0) = J (0, b) = −1 when a, b 6 = 0, J (a, m − a) = −χ a (−1), |J (a, b)|2 = q when a, b 6 = 0 and a + b 6 = m, Pm−1 Pm−1 k=0 J (a, k) = k=0 J (k, b) = 0.
Observe that |J (a, b) + 1|2 and |J (a, b) + 8|2 are either 0 or of the form 2r 7s with r, s ∈ N for every entry of (1). In general the quantities Jq (a, b) + 1 and Jq (a, b) + q satisfy interesting congruences. We also remark that all the results presented here continue to be valid regardless of the generator g of Fq∗ used to define χ. The value of Jq (χ as , χ bs ) obtained by using the generator g s is identical to that of Jq (χ a , χ b ) using the original generator g, so altering the generator only permutes the rows and columns of a Jacobi sum matrix in a symmetric fashion. 3. EIGENVALUES. Thus far our discussion has focused on properties of Jacobi sums taken individually. However, we are primarily interested in what can be said about the set of all Jacobi sum values for a particular finite field, viewed collectively as a matrix. It may have occurred to the curious individual to calculate the eigenvalues of matrix (1). We are rewarded for our efforts upon finding that its characteristic polynomial factors as p(x) = −x(x − 7)2 (x − 7ω)2 (x − 7ω)2 ,
(9)
where ω = e2πi/3 . One might speculate that cube roots of unity make an appearance since we used characters of F8 , and 8 = 23 . But in fact the same phenomenon occurs for every value of q. This is explained by the fact that powers of these matrices (suitably scaled) cycle with period three, a property that depends on using the nonstandard value for χ 0 (0). Theorem 1. Defining J (a, b) as in Proposition 1, let B be the m × m matrix with entries J (a, b) for 0 ≤ a, b ≤ m − 1. Then the powers of B satisfy (i) B 2 = m B, (ii) B 3 = m 3 I − m 2 U , (iii) B n = m 3 B n−3 for n ≥ 4, where I is the m × m identity matrix and U is the matrix all of whose entries are 1. Proof. The first claim is equivalent to the assertion that m−1 X
J (a, k)J (k, b) = m J (a, b)
(10)
k=0
February 2012]
JACOBI SUM MATRICES
103
for all a and b. Using definition (5) for J (a, b) we expand the left-hand side as m−1 X X X χ a (u)χ k (1 − u) χ k (v)χ b (1 − v) . k=0
(11)
v6=0,1
u6 =0,1
It is a standard opening gambit in these sorts of proofs to move the summation over k P k to the inside and then use the fact that m−1 k=0 χ (u) = 0 unless u = 1, in which case the sum equals m. (It is this feature of characters that make them useful for counting arguments.) Employing this strategy leads to X X
χ a (u)χ b (1 − v)
u6 =0,1 v6 =0,1
m−1 X
χ k ((1 − u)v).
(12)
k=0
The final sum vanishes unless (1 − u)v = 1, or u = 1 − v1 . Hence our expression reduces to X X 1 v 1 a b −a −b m χ 1− χ (1 − v) = m χ χ , (13) v v−1 1−v v6 =0,1 v6 =0,1 where we have used χ a (u) = χ −a ( u1 ). Next observe that J (a, b) = J (−a, −b) since χ a = χ −a ; we introduced negative exponents in anticipation of this fact. And now in v 1 a beautiful stroke we realize that v−1 + 1−v = 1, and as v runs through all elements of v Fq other than 0 and 1 so does v−1 . Hence the right-hand side of (13) is precisely the sum defining J (−a, −b), thus proving the first part. With this result in hand the second part will follow once we show B B = m 2 I − mU . This is equivalent to demonstrating that m−1 X k=0
J (a, k)J (k, b) =
m2 − m −m
a=b a 6 = b.
(14)
The same ingredients are needed as above (but without negative exponents at the end), so we omit the proof in favor of permitting the reader to supply the steps. There are no major surprises along the way, and the explanation is quite satisfying. The final claim is an immediate consequence of the second part. For n ≥ 4 we compute B n = B n−4 B(m 3 I − m 2 U ) = m 3 B n−3 ,
(15)
where we have used the fact that BU is the zero matrix because the entries within each row of B sum to 0. This completes the proof. Corollary 1. If λ is an eigenvalue of a Jacobi sum matrix B then λ ∈ {0, m, mω, mω}, where ω = e2πi/3 . Proof. Suppose that Bv = λv for some nonzero vector v. Then multiplying B 4 = m 3 B on the right by v yields λ4 v = m 3 λv. Therefore λ4 = m 3 λ since v 6 = 0, which implies that λ ∈ {0, m, mω, mω}. Corollary 2. Every Jacobi sum matrix B has an orthogonal basis of eigenvectors. 104
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
Proof. Since B is symmetric its conjugate transpose B ∗ is just B. But B is a scalar multiple of B 2 , so we deduce that B and B ∗ commute, and hence B is normal. The assertion now follows from well-known properties of normal matrices as furnished by [4], for instance. The fact that the eigenspaces for λ = 7, 7ω, and 7ω have the same dimension has probably not escaped notice. In general the eigenspaces are always as close in size as possible, a fact that depends upon ascertaining the traces of Jacobi sum matrices. Proposition 2. The trace tr(B) of a Jacobi sum matrix B is equal to 0, m, or 2m according to whether m − 1 ≡ 0, 1, or 2 mod 3. Proof. The values occurring along the main diagonal of B are J (a, a). Hence tr(B) =
m−1 X a=0
J (a, a) =
m−1 X X
χ (u)χ (1 − u) = a
a
a=0 u∈Fq
m−1 XX
χ a (u − u 2 ).
(16)
u∈Fq a=0
But the inner sum vanishes unless u − u 2 = 1, in which case its value is m. When Fq has characteristic 3 we find that u = −1 is a double root of the equation, while for other characteristics u = −1 is not a root. Since (u 2 − u + 1)(u + 1) = u 3 + 1, in these cases we seek values of u with u 3 = −1 other than u = −1. For m ≡ 0 mod 3 there will be two such values, while for m ≡ 1 mod 3 there are no such values, because Fq∗ is a cyclic group of order m. In summary, u − u 2 = 1 has one, two, or zero distinct roots when m ≡ 2, 0, 1 mod 3, as claimed. Proposition 3. The characteristic polynomial of a Jacobi sum matrix B has the form p B (x) = ±x(x − m)r (x − mω)s (x − mω)s
(17)
for nonnegative integers r ≥ s satisfying r + 2s = m − 1 with r as close to s as possible. Proof. Clearly p B (x) is monic. Furthermore, the sign will be positive unless m is odd; i.e., when q = 2r . We will also see below that rank(B) = m − 1, giving the single factor of x. Now let us show that p B (x) has real coefficients, meaning that the eigenvalues mω and mω occur in pairs. This follows from the relationship J (m − a, m − b) = J (a, b). In other words, swapping rows a and m − a as well as columns b and m − b for all a and b with 1 ≤ a, b ≤ m2 does not alter p B (x) but does conjugate every entry of B, and therefore p B (x) is real. The trace of B is the sum of the eigenvalues, and each triple m, mω, mω will cancel. The result now follows from the fact that tr(B) is equal to 0, m, or 2m. 4. RELATED MATRICES. Observe that the list of eigenvalues for a Jacobi sum matrix constructed using the conventional definition is nearly identical to the list given by Proposition 3, the difference being that the eigenvalue λ = 0 is replaced by λ = 1 and a single occurrence of λ = m changes to λ = m + 1 = q. This is a consequence of the close relationship in each case between the characteristic polynomial of the entire matrix and that of the lower right (m − 1) × (m − 1) submatrix of values they share. Lemma 1. Let M be an n × n matrix whose first column contains the entries c − 1, −1, . . . , −1 for some number c, as shown below, and such that the sum of the entries in February 2012]
JACOBI SUM MATRICES
105
each row of M is 0. Let M 0 be the (n − 1) × (n − 1) submatrix obtained by deleting the first row and column of M. Then the list of eigenvalues of M 0 (with multiplicity) is given by removing the values λ = 0, c from the list of eigenvalues for M and including λ = 1. c−1 ··· −1 M = .. (18) . M0 −1 Proof. Multiplying the first column of M − x I by (1 − x) before taking the determinant yields (c − 1 − x)(1 − x) · · · x −1 .. (1 − x) det(M − x I ) = . (19) 0 0 . M − xI x −1 We next add columns 2 through n to the first column. Since the sum of the entries within each row of M is zero this operation cancels every term in the first column below the top entry, which becomes (c − 1 − x)(1 − x) + (1 − c) = x(x − c). Therefore the value of the determinant may be rewritten as x(x − c) · · · 0 .. (1 − x) det(M − x I ) = = x(x − c) det(M 0 − x I 0 ). . M0 − x I 0 0
(20)
(21)
The assertion follows. If Jq (a, b) were computed in the traditional manner the top row of our Jacobi sum matrix would be q followed by a row of 0’s, so the list of eigenvalues would consist of those of the lower right submatrix, augmented by the value λ = q. Invoking the lemma now leads to the statement made above comparing lists of eigenvalues. Purely to satisfy our curiosity, we now propose permuting the rows and columns of a Jacobi sum matrix B before computing the eigenvalues. For example, take B to equal matrix (1), let P be any 7 × 7 permutation matrix, and consider the degree-seven polynomial det(PB − x I ). Compiling the roots to all 5040 polynomials that arise in this manner produces a list with somewhat more than 3500 distinct complex numbers; locating them in the complex plane yields the scatterplot on the left in Figure 1. The roots, whose locations are marked by small solid discs, form a nearly unbroken chain along the circle of radius 7 centered at the origin, with discernible gaps located only near the real axis. By way of comparison, the related 7 × 7 matrix B˜ of conventional Jacobi sum values yields the right-hand plot in Figure 1. Put another way, matrix B generates in excess of 3500 algebraic integers, each of degree 14 or less over Q and each having absolute value 7. As one might hope, this property is shared by all Jacobi sum matrices. The following result was conjectured by the author and proved by Ron Evans (personal communication, Jan. 2011); we present this proof below. 106
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
–6
–4
6
6
4
4
2
2
–2
2
4
6
–6
–4
–2
2
–2
–2
–4
–4
–6
–6
4
6
Figure 1. A plot of the roots of det(P B − x I ) on the left and the roots of det(P B˜ − x I ) on the right for all 7 × 7 permutation matrices P.
Proposition 4. Let B denote a Jacobi sum matrix for the finite field Fq and let P be any m × m permutation matrix, where m = q − 1. Then every nonzero eigenvalue λ of the matrix PB satisfies |λ| = m. Proof. Let λ be a nonzero eigenvalue of PB, so that PBv = λv for some nonzero vector v. Letting M ∗ denote the conjugate transpose of a matrix M, it follows that (PBv)∗ (PBv) = (λv)∗ (λv). Expanding yields v ∗B ∗P ∗PBv = |λ|2 v ∗ v, which implies v ∗ m1 B 3 v = |λ|2 v ∗ v, (22) since P ∗P = I for any permutation matrix and B ∗ = B = m1 B 2 using Theorem 1 and the fact that B is symmetric. Appealling once more to Theorem 1, we find that 1 3 B = m 2 I − mU , where every entry of U equals 1. Next observe that U v = 0, since m multiplying PBv = λv on the left by U gives UPBv = λU v, and UPB = UB = 0 while λ 6= 0. Therefore (22) becomes v ∗(m 2 I − mU )v = m 2 v ∗ v − mv ∗ U v = m 2 v ∗ v = |λ|2 v ∗ v.
(23)
But v ∗ v > 0 since v is a nonzero vector, and hence |λ| = m, as desired. 5. DETERMINANTS. One of the more striking properties of Jacobi sum matrices emerges once we begin to examine submatrices and their determinants, in particular. Thus the alert reader may have wondered about the determinant of (1). Since the sum of the entries in each row is zero, it is clear that det(B) = 0 for any Jacobi sum matrix. Not content, the truly enterprising individual next computes det(B 0 ) for the lower right 6 × 6 submatrix B 0 of matrix (1), obtaining the intriguing value det(B 0 ) = 16807 = 75 . The obvious generalization is true, and the groundwork for a proof has largely been laid. We need only one further observation, which is a nice result in its own right. Proposition 5. Let B denote a Jacobi sum matrix with lower right (m − 1) × (m − 1) submatrix B 0 . Then (B 0 )−1 = m12 (B 0 + (m + 1)U 0 ), where every entry of U 0 is 1. Proof. The statement follows readily from the equality B B = m 2 I − mU stated in Theorem 1. We omit the details. February 2012]
JACOBI SUM MATRICES
107
Proposition 6. With B and B 0 as above, we have det(B 0 ) = m m−2 . Proof. According to Corollary 1 the eigenvalues of B belong to the set {0, m, mω, mω}. We know det(B) = 0, so rank(B) < m. But by the previous lemma B 0 is nonsingular; therefore rank(B) = m − 1, implying that exactly one eigenvalue of B is 0. We next apply Lemma 1 to conclude that the eigenvalues of B 0 are among {1, m, mω, mω}, with the value 1 occurring precisely once. Finally, the discussion within Proposition 3 indicates that the values mω and mω come in pairs. Hence the product of the m − 1 eigenvalues, which is det(B 0 ), comes to m m−2 . Corollary 3. Let A be the submatrix of a Jacobi sum matrix B obtained by deleting row i and column j. Then we have det(A) = (−1)i+ j m m−2 . Proof. The case i = j = 0 is handled by Proposition 6. When j > 0 note that adding all other columns of B 0 to column j effectively replaces that column with the negative of column 0 of B, since the sum of the entries within every row is 0. Moving this column back to the far left and negating it introduces a sign of (−1) j to the value of the determinant. The same reasoning applies to the rows; therefore B 0 is transformed into A by operations that change the sign of det(B 0 ) by (−1)i+ j . But why stop there? √ If A is the lower right 5 × 5 submatrix of (1), we discover that det(A) = 343(7 − i 7). The power of 7 is nice, but even more interesting is √ (24) 7 − i 7 = J (0, 0) − J (0, 1) − J (1, 0) + J (1, 1). In other words, the determinant of this submatrix appears to be related to the conjugates of the entries in the “complementary” upper left 2 × 2 submatrix. The same phenomenon occurs elsewhere; for instance, if√ A is the upper left 5 × 5 submatrix of (1) then we find that det(A) = 343(−7 + 3i 7), and sure enough √ (25) −7 + 3i 7 = J (5, 5) − J (5, 6) − J (6, 5) + J (6, 6). These computations hint at a beautiful extension to Corollary 3. We first formalize a few of the above ideas. A k × k submatrix A is determined by a subset r1 , . . . , rk of the rows of B, where 0 ≤ r1 < · · · < rk ≤ m − 1, and a similar subset c1 , . . . , ck of k columns. Deleting these rows and columns yields the complementary submatrix Ac , which contains exactly those entries of B that are not in the same row or column as any element of A. The sign of the submatrix, denoted by A , is based on its position within B. It is given by A = (−1)r1 +···+rk +c1 +···+ck .
(26)
It is routine to verify that A = Ac . Finally, the diminished determinant ddet(A) of A is an alternating sum of the determinants of all maximal submatrices of A. Letting Aij represent the matrix obtained by deleting row i and column j of A we have ddet(A) =
k X i, j=1
(−1)i+ j det(Aij ) =
X
A0 det(A0 ),
(27)
A0 ⊂A
where A0 ⊂ A signifies a (k − 1) × (k − 1) submatrix of A. We have chosen the term “diminished” since the degree of ddet(A) as a polynomial in the entries of A is one less than the degree of det(A). 108
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
So that the upcoming result will apply to all possible submatrices of B, we adopt the convention that ddet(A) = 1 for a 1 × 1 matrix A, while ddet(A) = 0, det(A) = 1, and A = 1 when A is the 0 × 0 “empty” matrix. With the foregoing definitions in hand we are now prepared to state our main result. Theorem 2. Given a Jacobi sum matrix B, let A be any k × k submatrix of B, where 0 ≤ k ≤ m. Denote the complementary submatrix to A and its sign by Ac and Ac , respectively. Then the following identity holds: ddet(Ac ) det(A) c = . A mk m m−k
(28)
Observe that the power of m in each denominator corresponds to the size of the matrix in the numerator. Also, the examples outlined above illustrate the case m = 7, k = 5; in both examples the sign happened to be Ac = 1. We provide a proof of this result in the appendix. The reader is encouraged to peruse the argument—among other things, a number of steps would make excellent exercises for linear algebra students. Before considering a collection of multivariable polynomials with unlikely factorizations, we pause to present a couple of elementary facts concerning the diminished determinant, which arose naturally in the preceding discussion. Early in the proof of Theorem 2 we will need an analogue to expansion by minors to handle the transition between diminished determinants for matrices of different sizes. To clarify the analogy, let M be an n × n matrix with entries m i j and let M ij denote the submatrix obtained by deleting row i and column j from M. Then expansion by minors implies that det(M) =
n 1 X (−1)i+ j m i j det(M ij ). n i, j=1
(29)
Lemma 2. With M and M ij as above we have ddet(M) =
n 1 X (−1)i+ j m i j ddet(M ij ). n − 1 i, j=1
(30)
Proof. Applying (29) to the definition of ddet(M) yields ddet(M) =
n X
(−1)k+l det(Mlk )
k,l=1
X = (−1)k+l k,l
1 X 0 0 (−1)i + j m i j det(M ik jl ). n − 1 i6=k
(31)
j6=l
Here M ik jl is the submatrix of M obtained by deleting rows i, k and columns j, l. Note that if row i is below row k then we must use i − 1 in the exponent when applying (29) to det(Mlk ); otherwise i is the correct value. Hence we set i 0 = i − 1 when i > k and i 0 = i when i < k, and similarly for j 0 relative to l. 0 0 The key to ensuring that the signs behave is to realize that (−1)i +k = (−1)i+k +1 , where k 0 = k − 1 when k > i and k 0 = k otherwise. Defining l 0 in the same manner February 2012]
JACOBI SUM MATRICES
109
relative to j enables us to rewrite (31) as ddet(M) =
X 1 X 0 0 (−1)k +l det(M ik (−1)i+ j m i j jl ) n − 1 i, j k6=i
(32)
l6= j
=
n X
1 (−1)i+ j m i j ddet(M ij ). n − 1 i, j=1
This completes the proof. Diminished determinants also resemble determinants with respect to row and column transpositions. Lemma 3. Interchanging a pair of adjacent rows or columns in a matrix M negates the value of ddet(M). P Proof. Every term in the sum (−1)i+ j det(M ij ) is negated by such an operation, for one of two reasons. If column i and row j stay put then a pair of rows or columns within M ij trade places, negating det(M ij ) without affecting (−1)i+ j . On the other hand, if column i or row j is involved in the exchange then M ij still appears in the sum with entries intact, but now with an attached sign of (−1)i+ j±1 . 6. FURTHER INQUIRY. To conclude we offer an observation regarding Jacobi sum matrices that suggests there is still gold left to be mined. Define the three permutation matrices 1000000 1000000 1000000 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 P1 = 0 0 0 1 0 0 0 , P2 = 0 0 0 0 0 1 0 , P4 = 0 0 0 0 0 0 1 . 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0000001 0001000 0000010 0001000 0000010 0000001 (33) In each case the 1s are situated along a “line through the origin,” where the origin is the upper left entry and we reduce coordinates mod 7; the subscript indicates the slope of the line. We have already observed that the characteristic polynomial of matrix (1), which we shall denote as B once again, splits completely over the field Q(ω): p B (x) = det(B − x P1 ) = −x(x − 7)2 (x − 7ω)2 (x − 7ω)2 .
(34)
Remarkably, much more is true: det(B − x P1 − y P2 − z P4 ) = −(x + y + z)(x + y + z − 7)2
(35)
(x + ωy + ωz − 7ω)(x + ωy + ωz − 7ω) (x + ωy + ωz − 7ω)(x + ωy + ωz − 7ω). Further experimentation suggests that it is not a coincidence that the slopes used for P1 , P2 , and P4 are powers of 2. For instance, det(B − w P1 − x P2 − y P4 − z P8 ) splits into 110
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
linear and quadratic factors, where B is the Jacobi sum matrix for F16 and all matrices are 15 × 15 in size. This phenomenon persists for finite fields of odd characteristic as well. Thus when working over F9 we find that det(B − x P1 − y P3 ) splits completely over Q(ω). We also point out the related beautiful factorization det(B − x P5 − y P7 ) = (x + y)(x + y + 8)(x − y + 8)(x − y − 8)2 (x + y − 8)3 . (36) Based on these observations we surmise the following. Conjecture 1. Let B be a Jacobi sum matrix for the finite field Fq , where q = pr and m = q − 1. For (k, m) = 1 denote by Pk the m × m permutation matrix whose entry in row s, column t is 1 for all 0 ≤ s, t < m with s ≡ kt mod m. Then the polynomial det(B − x0 P1 − x1 Pp − x2 Pp2 − · · · − xr −1 Ppr −1 )
(37)
in the r variables x0 , x1 , . . . , xr −1 may be written as a product of factors each of which has degree at most two in these variables. Other evidence that we have not included here suggests that this conjecture can be extended in scope. In summary, we have examined an elegant tool from number theory via the lens of linear algebra and uncovered several nice results in the process. At the very least this approach demonstrates a tidy manner in which many of the elementary (though perhaps not fully mapped out) facts concerning Jacobi sums may be packaged. On an optimistic note, this avenue of inquiry may even lead to a more complete understanding of Jacobi sums. 7. APPENDIX. Our main result relates the determinant of a submatrix of a Jacobi sum matrix to the diminished determinant of the conjugate complementary submatrix. Theorem 2. Given a Jacobi sum matrix B, let A be any k × k submatrix of B, where 0 ≤ k ≤ m. Denote the complementary submatrix to A and its sign by Ac and Ac , respectively. Then the following identity holds: ddet(Ac ) det(A) = Ac . k m m m−k
(38)
Proof. For k = 0 and k = m the statement to be proved reduces to 1=
ddet(B) , mm
det(B) = 0. mm
(39)
The former is a consequence of Corollary 3, while the latter is clear. Furthermore, the statement for k = m − 1 is equivalent to Corollary 3. Hence we need only show that case k follows from case k + 1 for 1 ≤ k ≤ m − 2. In the interest of presenting a lucid argument, we will provide a sketch of the proof in the case k = m − 3, followed by a summary of the algebra for the general case, which is qualitatively no different. Therefore suppose the result holds for k = m − 2 and that A is an (m − 3) × (m − 3) submatrix of B. For the sake of organization we permute the rows and columns of B in order to situate the entries of Ac in the upper left corner, but otherwise maintain the February 2012]
JACOBI SUM MATRICES
111
original order of the rows and columns within A and Ac . Let us label the permuted matrix as C, having entries γi j for 0 ≤ i, j ≤ m − 1. γ00 γ10 γ 20 C = γ30 γ40
γ01 γ11 γ21 γ31 γ41 .. .
γ02 γ03 γ04 γ12 γ13 γ14 · · · γ22 γ23 γ24 γ32 γ42 A
(40)
We claim that the result for k = m − 2 continues to hold for matrix C, up to a sign which we now determine. For exchanging a pair of adjacent rows or columns of B will negate exactly one of det(A), ddet(Ac ) or Ac , according as the pair of rows or columns both intersect A, both intersect Ac (by Lemma 3), or intersect both. If the entries of Ac reside in rows r1 , r2 , r3 and columns c1 , c2 , c3 then it requires r1 + (r2 − 1) + (r3 − 2) + c1 + (c2 − 1) + (c3 − 2)
(41)
swaps of adjacent rows or columns to transform B into C; hence we must include a factor of Ac when applying (28) to C. In other words, if D is an (m − 2) × (m − 2) submatrix of C then Ac
ddet(D c ) det(D) c = . D m m−2 m2
(42)
The final observation to be made before embarking upon a grand calculation is that the dot product of any row vector of C with the conjugate of another row vector is −m, while the dot product of a row vector with its own conjugate is m 2 − m. This relationship holds for B since B is symmetric and B B = m 2 I − mU , as noted in the proof of Theorem 1. Permuting rows and columns of B does not destroy this property, which consequently holds for C as well. Now to begin. We wish to relate ddet(Ac ) to det(A). By Lemma 2 we may begin 1 γ γ γ γ γ 00 ddet γ 11 γ 12 − γ 01 ddet γ 10 γ 12 + · · · 21 22 20 22 2 Ac 12 12 12 = γ det(C ) + γ det(C ) + γ det(C ) + · · · , (43) 00 01 02 12 02 01 2m m−4
ddet(Ac ) =
since the result holds for k = m − 2 with a correction factor of Ac . As before C ikjl denotes the submatrix of C obtained by deleting rows i, k and columns j, l. We then 12 12 12 expand each of det(C12 ), det(C02 ), and det(C01 ) by minors along row 0, which gives (γ 00 γ00 + γ 01 γ01 + γ 02 γ02 ) det(A), along with a fair number of other terms. We next collect the remaining terms according to whether they involve γ03 , γ04 , γ05 , and so on. The reader may verify that the sum of the terms containing a factor of γ03 is γ03 γ 03 det(A) + m det(Aγ03 F3 ),
(44)
where Aγ03 F3 refers to matrix A with all entries in the left column replaced by γ03 . Combining the terms involving γ04 , γ05 , . . . in the same manner, we may rewrite the 112
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
first three terms of (43) as (γ00 γ 00 + γ01 γ 01 + γ02 γ 02 + γ03 γ 03 + γ04 γ 04 + γ05 γ 05 + · · · ) det(A) + m det(Aγ03 F3 ) + m det(Aγ04 F4 ) + m det(Aγ05 F5 ) + · · · .
(45)
The coefficient of det(A) is the dot product of row 0 of C with its own conjugate, so this simplifies to (m 2 − m) det(A) + m det(Aγ03 F3 ) + m det(Aγ04 F4 ) + m det(Aγ05 F5 ) + · · · .
(46)
Finally, this entire sequence of steps may be performed on the second trio and third trio of terms in (43), yielding 3(m 2 − m) det(A) + m(det(Aγ03 F3 ) + det(Aγ04 F4 ) + det(Aγ05 F5 ) + · · · ) + m(det(Aγ13 F3 ) + det(Aγ14 F4 ) + det(Aγ15 F5 ) + · · · ) + m(det(Aγ23 F3 ) + det(Aγ24 F4 ) + det(Aγ25 F5 ) + · · · ).
(47)
Since the sum of the entries of any column of C is 0, we have det(Aγ03 F3 ) + det(Aγ13 F3 ) + det(Aγ23 F3 ) = − det( A˜ 3 ),
(48)
where A˜ 3 represents matrix A with each entry in its leftmost column replaced by the sum of all the entries in that column. Defining A˜ j similarly for j ≥ 3, (47) reduces to 3(m 2 − m) det(A) − m(det( A˜ 3 ) + det( A˜ 4 ) + det( A˜ 5 ) + · · · ). (49) P It is a neat exercise in linear algebra to confirm that det( A˜ j ) = (m − 3) det(A). Each term of det(A) appears in det( A˜ j ) for every j and hence appears m − 3 times in the sum; all other terms cancel in pairs, as the reader may verify. Hence we are left with 3(m 2 − m) det(A) − m(m − 3) det(A) = 2m 2 det(A).
(50)
In summary, we have shown that ddet(Ac ) =
det(A) Ac (2m 2 det(A)) = Ac m−6 . 2m m−4 m
(51)
Dividing through by Ac m 3 gives the desired equality. The calculation proceeds in an identical fashion for other values of k. One arrives at the expression k(m 2 − m) det(A) − m(m − k) det(A) = (k − 1)m 2 det(A)
(52)
in place of (50), yielding ddet(Ac ) =
Ac det(A) ((k − 1)m 2 det(A)) = Ac m−2k . m−2k+2 (k − 1)m m
(53)
Rearranging gives the result. ACKNOWLEDGMENTS. I would like to thank the referees for many helpful remarks and suggestions. In particular, the idea of generating the right-hand scatterplot in the figure as well as the insightful remarks contained in the footnote were both due to the referees. I am also grateful to Ron Evans for sharing the (quite rapidly found) proof appearing in this article.
February 2012]
JACOBI SUM MATRICES
113
REFERENCES 1. L. Adleman, C. Pomerance, R. Rumely, On distinguishing prime numbers from composite numbers, Ann. of Math. 117 (1983) 173–206; available at http://dx.doi.org/10.2307/2006975. 2. B. C. Berndt, R. J. Evans, K. S. Williams, Gauss and Jacobi Sums. Wiley, New York, 1998. 3. G. Eisenstein, Einfacher beweis und verallgemeinerung des fundamental theorems f¨ur die biquadratischen reste, in Mathematische Werke, Band I, 223–245, Chelsea, New York, 1975. 4. R. A. Horn, C. R. Johnson, Matrix Analysis. Cambridge University Press, Cambridge, 1985. 5. K. Ireland, M. Rosen, A Classical Introduction to Modern Number Theory, second edition. Springer, New York, 1990. 6. A. Weil, Number of solutions of equations in a finite field, Bull. Amer. Math. Soc. 55 (1949) 497–508; available at http://dx.doi.org/10.1090/S0002-9904-1949-09219-4. SAMUEL K. VANDERVELDE is an assistant professor of mathematics at St. Lawrence University. His mathematical interests include number theory, graph theory, and partitions. He is an enthusiastic promoter of mathematics—he conducts math circles for students of all ages, helped to found the Stanford Math Circle and the Teacher’s Circle, and composes problems for the USA Math Olympiad. He also writes and coordinates the Mandelbrot Competition, a nationwide contest for high schools. He is an active member of his church and enjoys singing, hiking, and teaching his boys to program. Department of Mathematics, Computer Science, and Statistics, St. Lawrence University, Canton, NY 13617
[email protected]
114
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
Alcuin’s Sequence Donald J. Bindner and Martin Erickson
Abstract. Alcuin of York (c. 740–804) lived over four hundred years before Fibonacci. Like Fibonacci, Alcuin has a sequence of integers named after him. Although not as well known as the Fibonacci sequence, Alcuin’s sequence has several interesting properties. The purposes of this note are to acquaint the reader with Alcuin’s sequence, to give the simplest available proofs of various formulas for Alcuin’s sequence, and to showcase a new discovery about the period of Alcuin’s sequence modulo a fixed integer.
1. INTRODUCTION. A famous problem posed by Alcuin of York gives rise to an integer sequence not unlike the famous Fibonacci sequence, though Alcuin’s problem predates Fibonacci by about 400 years. We will describe and prove some basic properties of Alcuin’s sequence and demonstrate a new property. Little is known about the background and early life of Alcuin of York (c. 740–804). Apparently, he was born in Northumbria in what is now England. He was a scholar and teacher who became instrumental in Charlemagne’s court. He is noted for his letters, poetry, and other writings, and his fostering of an educational system focused on humanitarianism and religious teaching. Alcuin’s collection of mathematical story problems, Propositiones ad acuendos juvenes (“Problems to Sharpen the Young”) contains 53 problems that can be classified as puzzles or recreational mathematics. A notable example is the problem of the goat, wolf, and cabbage, where a man must transport all three across a river but his boat will only carry himself and one of the three at a time. The problem that interests us is number 12 in Alcuin’s list. A certain father died and left as an inheritance to his three sons 30 glass flasks, of which 10 were full of oil, another 10 were half full, while another 10 were empty. Divide the oil and flasks so that an equal share of the commodities should equally come down to the three sons, both of oil and glass [4]. There are five solutions, of which Alcuin gives only the first: Solution 1
Solution 2
Solution 3
Solution 4
Solution 5
F
E
H
F
E
H
F
E
H
F
E
H
F
E
H
Son 1
5
5
0
5
5
0
5
5
0
4
4
2
4
4
2
Son 2
5
5
0
4
4
2
3
3
4
4
4
2
3
3
4
Son 3
0
0
10
1
1
8
2
2
6
2
2
6
3
3
4
The numbers of full, empty, and half-full flasks are represented by the columns F, E, and H, respectively. We don’t regard solutions with the sons permuted as distinct. Notice that in each solution, each son receives an equal number of full and empty http://dx.doi.org/10.4169/amer.math.monthly.119.02.115 MSC: Primary 11B50
February 2012]
ALCUIN’S SEQUENCE
115
flasks, and half-full flasks are used to make a total of ten flasks. Hence, the number of full flasks that a son receives completely determines his share. The problem may be generalized to any number 3n of flasks, n full, n empty, and n half full, to be distributed among three sons. The numbers of full flasks for each son form a triple of nonnegative integers (a, b, c), with a ≤ b ≤ c and a + b + c = n, satisfying the weak triangle inequality: a + b ≥ c. The reason is that a, b, c ≤ n/2, since the number of full flasks must be balanced by an equal number of empty flasks. Because the integer triple (a, b, c) satisfies the weak triangle inequality if and only if the triple (a + 1, b + 1, c + 1) satisfies the triangle inequality, the number of solutions to the flask-sharing problem is the same as the number of incongruent triangles with integer sides and perimeter n + 3. A triangle with integer sides is called an integer triangle. For n ≥ 0, let t (n) be the number of incongruent integer triangles of perimeter n. The sequence {t (n)} is called Alcuin’s sequence, after a tradition that seems to have started in [10]. The first few terms are 0, 0, 0, 1, 0, 1, 1, 2, 1, 3, 2, 4, 3, 5, 4, 7, 5, 8, 7, 10, 8, 12, . . . . 2. FORMULAS FOR ALCUIN’S SEQUENCE. When we have an intriguing sequence, it is natural to want to find a direct formula for it as well as a recurrence relation. Formulas for {t (n)}, the number of incongruent integer triangles of perimeter n, have been derived many times (see, e.g., [1], [5], [6], [7], [8]). We give a streamlined version of the derivation in [11]. The key idea is to relate integer triangles to partitions of integers into three parts. Theorem 1. Let kxk be the nearest integer to x. Then for all n ≥ 0,
n2 if n is even,
48
t (n) = 2 if n is odd.
(n+3) 48 (There is no ambiguity because n 2 /48 is never a half-integer.) Proof. Let pk (n) denote the number of partitions of the nonnegative integer n into k positive integer parts (summands). The order of the parts is unimportant, and for uniformity we list them in nonincreasing order. For example, p3 (5) = 2, and the two relevant partitions are 3 + 1 + 1 and 2 + 2 + 1. The number of integer triangles of even perimeter is equal to the number of partitions of half the perimeter into three parts: t (2n) = p3 (n),
n ≥ 0.
This is seen via the bijection (a, b, c) ↔ {n − a, n − b, n − c}, where (a, b, c) represents the sides of an integer triangle of perimeter 2n and {n − a, n − b, n − c} represents the summands in a partition of n into three parts. The point is that a, b, and c satisfy the triangle inequality if and only if each of these quantities is less than n, and this occurs if and only if the three parts n − a, n − b, and n − c are positive. 116
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
The sequence { p3 (n)} satisfies the recurrence relation p3 (n + 6) = p3 (n) + n + 3,
n ≥ 0.
To prove this, consider the last summand in a partition of n + 6 into three parts. If this summand is at least 3, then subtract 2 from each summand to obtain a partition of n into three parts. If the last summand is a 2, then subtract 2 from each part to obtain a partition of n into one or two parts. If the last summand is a 1, then subtract 1 from each part to obtain a partition of n + 3 into one or two parts. Using the obvious formulas p1 (n) = 1 and p2 (n) = bn/2c, where bxc is the greatest integer less than or equal to x, we obtain j n k n+3 + 1+ , n ≥ 0, p3 (n + 6) = p3 (n) + 1 + 2 2 which simplifies to our desired recurrence relation by examining the cases n even and n odd. Initial values of the sequence { p3 (n)} can be obtained from the relation p3 (n) = t (2n) or “from scratch”: 0, 0, 0, 1, 1, 2, . . . . We will prove the formula
2
n
p3 (n) =
12 ,
n ≥ 0,
where kxk is the nearest integer to x. (It is easy to show that n 2 /12 is never a halfinteger.) This formula for p3 (n) has the correct six initial values, and it satisfies the recurrence relation:
(n + 6)2 n 2
12 = 12 + n + 3, n ≥ 0. Hence, the formula holds for all n ≥ 0 by mathematical induction. We have shown that for n even,
2
n
t (n) = p3 (n/2) =
48 . Via the bijection (a, b, c) ↔ (a + 1, b + 1, c + 1), where (a, b, c) represents an integer triangle with odd perimeter n, we have t (n) = t (n + 3),
for n odd.
From the above formula, we notice a couple of interesting features of Alcuin’s sequence. The only fixed point is t (48) = 48. Furthermore, the sequence has the zig-zag property that t (2n) < t (2n + 1) > t (2n + 2), February 2012]
ALCUIN’S SEQUENCE
n ≥ 3. 117
The ordinary generating function for a sequence {a(n)} is the power series ∞ X
a(n)x n .
n=0
Like the Fibonacci sequence, Alcuin’s sequence has a rational generating function. Theorem 2. Alcuin’s sequence {t (n)} has the generating function ∞ X
t (n)x n =
n=0
x3 . (1 − x 2 )(1 − x 3 )(1 − x 4 )
Proof. We claim first that the triples (a, b, c) that represent integer triangles (i.e., triples of positive integers satisfying a ≤ b ≤ c < a + b) are precisely the triples of the form (a, b, c) = (1, 1, 1) + α(0, 1, 1) + β(1, 1, 1) + γ (1, 1, 2), where α, β, and γ are nonnegative integers. Furthermore, the choice of α, β, and γ in this representation is unique. It is easy to see that every triple of this form represents an integer triangle. And given a triple (a, b, c) that represents an integer triangle, the formulas α = b − a, β = a + b − c − 1, γ = c − b give the unique coefficients α, β, and γ . Since 2α + 3β + 4γ = a + b + c − 3 = n − 3, it follows that t (n) is equal to the number of partitions of n − 3 as a sum of 2’s, 3’s, and 4’s. Our claimed generating function may be written as ∞ ∞ ∞ X X X X 2 α 3 β x (x ) (x ) (x 4 )γ = x 2α+3β+4γ +3 , 3
α=0
β=0
γ =0
α,β,γ ≥0
and each way of writing n − 3 as a sum of 2’s, 3’s, and 4’s leads to an x n term in the sum on the right. The denominator of the rational generating function yields an order-nine linear recurrence relation for Alcuin’s sequence. Theorem 3. Alcuin’s sequence is determined by the recurrence relation t (n) = t (n − 2) + t (n − 3) + t (n − 4) − t (n − 5) − t (n − 6) − t (n − 7) + t (n − 9), for n ≥ 9, together with the values of t (n) for 0 ≤ n ≤ 8. Proof. From the generating function, we have x 3 = (1 − x 2 )(1 − x 3 )(1 − x 4 )
∞ X
t (n)x n
n=0
=
∞ X
t (n)(x n − x n+2 − x n+3 − x n+4 + x n+5 + x n+6 + x n+7 − x n+9 ).
n=0
Now the order-nine recurrence relation can be read off by equating coefficients of x n for n ≥ 9. 118
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
This recurrence relation is palindromic, so that if we use it to define {t (n)} for n < 0, then the sequence is the same forwards and backwards: t (−n) = t (n − 3). 3. THE PERIOD OF ALCUIN’S SEQUENCE. An integer sequence {a(n)} is periodic if there exists a positive integer L such that a(n + L) = a(n) for all n. The least such L is called the period of the sequence. Every integer recurrent sequence modulo a fixed integer is periodic, where the period is the least common multiple of the periods with respect to the prime powers which divide the modulus. The period of the Fibonacci sequence with respect to prime power moduli is unknown in general, except when the modulus is a power of 2 or 5 (see [12]). The period of Alcuin’s sequence was first given in [2]. Theorem 4. For any integer m ≥ 2, the sequence {t (n) mod m} is periodic with period 12m. Moreover, the range of the sequence consists of all integers modulo m if and only if m is one of the following: 7, 10, 19, 2 j , 3 j , 5 j , 11 j , 13 j , 41 j , 2 · 3 j , 5 · 3 j ,
for j ≥ 1.
This collection parallels a similar family of moduli for which the Fibonacci sequence contains all residues [3, p. 318]. Proof. For even values of the argument, we have t (2n + 12m) = k(2n + 12m)2 /48k = k(2n)2 /48k + nm + 3m 2 ≡ t (2n) (mod m). Since t (n) = t (n + 3) for n odd, the sequence {t (n) mod m} is periodic with period L ≤ 12m. Now we will show that L ≥ 12m. Let λ = L or L − 1, so that λ is even. We then have kλ2 /48k ≡ 0 (mod m) and k(λ + 2)2 /48k ≡ 0 (mod m), since t (−1) = t (0) = t (1) = t (2) = 0. Hence m divides k(λ + 2)2 /48k − kλ2 /48k, which is nonzero because L > 12 (since {t (n)} is an aperiodic pattern of 0’s and 1’s for −4 ≤ n ≤ 8). This difference is less than (λ + 2)2 /48 − λ2 /48 + 1 = (λ + 13)/12, and it follows that 12m < λ + 13 ≤ L + 13. By definition of period, L is a divisor of 12m. If L < 12m, then L ≤ 6m, but this contradicts the inequality 12m < L + 13, for m ≥ 3. The result is checked by inspection for m = 2. We conclude that L ≥ 12m, and therefore L = 12m. The question of when the range of {t (n) mod m} contains all integers modulo m reduces to a question of when a certain set of polynomials represents all integers modulo m. Our proof will use two venerable tools of number theory, namely Hensel’s lemma and quadratic nonresidues. Letting n = 12k + r , where 0 ≤ r < 12, we find from Theorem 1 that the values of t (n) are given by six quadratic polynomials: 3k 2 , 3k 2 + k, 3k 2 + 2k, 3k 2 + 3k + 1, 3k 2 + 4k + 1, 3k 2 + 5k + 2. By the substitution k 7 → −(k + 1), the last two polynomials may be eliminated as redundant. Now the problem is to show that the range of the first four polynomials, where k is an integer modulo m, consists of all m residues if and only if m is as stated. To show that these values of m have the required property, the solitary cases 7, 10, and 19 are checked. For the remaining values, we will use a version of Hensel’s lemma that is slightly more general than the version in [9]. If f (x) is a polynomial with integer coefficients, p is a prime, f (a) ≡ 0 (mod kpi ), and f 0 (a) 6 ≡ 0 (mod p), then February 2012]
ALCUIN’S SEQUENCE
119
there exists a unique integer t such that f (a + tkpi ) ≡ 0 (mod kpi+1 ). We check that for m equal to each of the “base primes” 2, 3, 5, 11, 13, and 41, every integer modulo m is equal to f (x) for some x modulo m, where f is one of the polynomials and the derivative f 0 (x) is nonzero modulo m. By induction, the solution to each congruence can be lifted to a solution modulo 2 j , 3 j , 5 j , etc., for j > 1. A computer search establishes that there are such solutions for each of the base primes. Similarly, solutions modulo 6 and modulo 15 can be lifted to solutions modulo 2 · 3 j and 5 · 3 j , for j > 1, respectively. This entails checking that every integer modulo 6 is congruent to f (x) for some x with f 0 (x) 6 ≡ 0 (mod 3), and every integer modulo 15 is congruent to f (x) for some x with f 0 (x) 6 ≡ 0 (mod 3). This shows that the stated values of m have the required property. To show that the only valid moduli m are those on our list, we begin by showing that if the prime p is sufficiently large, then the four polynomials do not represent every integer modulo p; that is, some integer modulo p is not represented by any of the polynomials. We complete the squares of the polynomials and observe that if the integer s is represented, then 12s is of at least one of the following forms: (6k)2 , (6k + 1)2 − 1, (6k + 2)2 − 4, (6k + 3)2 + 3. Hence, at least one of 12s, 12s + 1, 12s + 4, 12s − 3 must be a square modulo p. However, by a theorem of Andr´e Weil (see, e.g., [9]), there exist arbitrarily many consecutive quadratic nonresidues modulo p, if p is large enough. The number N of sequences of h consecutive quadratic nonresidues modulo p satisfies p √ N − ≤ 3h p. 2h For h = 8, if N > 0 then we are guaranteed a sequence of eight consecutive quadratic nonresidues modulo p. Any such sequence includes four quadratic nonresidues which yield a value of s not represented by the four polynomials. Thus, we check primes up to 37,748,736 (in order to exclude all primes other than our base primes), and this is done in a matter of seconds by computer. Finally, we must eliminate all composite numbers except those in our list. If m is a valid modulus, then m 0 is a valid modulus for any divisor m 0 of m. So it remains to check that 72 and 192 do not work, nor does any modulus that is a product of two primes in our list, except for the products 2 · 5, 2 · 3, and 3 · 5. We must also rule out 22 · 5, 2 · 52 , 3 · 52 , 22 · 3, and 2 · 3 · 5. All of this is done instantly by computer. ACKNOWLEDGMENTS. The authors wish to thank the referees for valuable suggestions that improved the motivation and clarity of this paper.
REFERENCES 1. E. G. Andrews, A note on partitions and triangles with integer sides, Amer. Math. Monthly 86 (1979) 477–478; available at http://dx.doi.org/10.2307/2320420. 2. M. Erickson, Aha! Solutions, Mathematical Association of America, Washington, DC, 2009. 3. R. L. Graham, D. E. Knuth, O. Patashnik, Concrete Mathematics: A Foundation for Computer Science, second edition, Addison-Wesley, Reading, MA, 1994.
120
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
4. J. Hadley, D. Singmaster, Problems to Sharpen the Young, An annotated translation of Propsitiones ad acuendos juvenes, the oldest mathematical collection in Latin, attributed to Alcuin of York, Math. Gaz. 76 (1992) 102–126; available at http://dx.doi.org/10.2307/3620384. 5. D. M. Hirschorn, Triangles with integer sides, Math. Mag. 76 (2003) 306–308; available at http://dx. doi.org/10.2307/3219089. 6. T. Jenkyns, E. Muller, Triangular triples from ceilings to floors, Amer. Math. Monthly 107 (2000) 634– 639; available at http://dx.doi.org/10.2307/2589119. 7. R. H. Jordan, R. Walch, R. J. Wisner, Triangles with integer sides, Amer. Math. Monthly 86 (1979) 686– 689; available at http://dx.doi.org/10.2307/2321300. 8. N. Krier, B. Manvel, Counting integer triangles, Math. Mag. 71 (1998) 291–295; available at http: //dx.doi.org/10.2307/2690701. 9. I. Niven, H. Zuckerman, H. Montgomery, An Introduction to the Theory of Numbers, fifth edition, Wiley, New York, 1991. 10. D. Olivastro, Ancient Puzzle: Classic Brainteasers and Other Timeless Mathematical Games of the Last 10 Centuries, Bantam, New York, 1993. 11. J. Tanton, Young students approach integer triangles, FOCUS 22 (2002) no 5, 4–6. 12. D. D. Wall, Fibonacci series modulo m, Amer. Math. Monthly 67 (1960) 525–532; available at http: //dx.doi.org/10.2307/2309169.
DONALD J. BINDNER received his B.S. from Truman State University in 1992 and his Ph.D. from the University of Georgia–Athens in 2001. His interests include computer programming and free software. He and Martin recently wrote the book A Student’s Guide to the Study, Practice, and Tools of Modern Mathematics (CRC Press). Department of Mathematics and Computer Science, Truman State University, Kirksville, MO 63501
[email protected]
MARTIN ERICKSON received his B.S. and M.S. from the University of Michigan in 1985 and his Ph.D. from the University of Michigan in 1987. His mathematical interests are combinatorics, number theory, and problem solving. Department of Mathematics and Computer Science, Truman State University, Kirksville, MO 63501
[email protected]
February 2012]
ALCUIN’S SEQUENCE
121
A Case of Continuous Hangover Burkard Polster, Marty Ross, and David Treeby
Abstract. We consider a continuous analogue of the classic problem of stacking identical bricks to construct a tower of maximal overhang.
1. INTRODUCTION. How much of an overhang can we produce by stacking identical rectangular blocks at the edge of a table? Most mathematicians know that the overhang can be as large as desired: we arrange the blocks in the form of a staircase as shown in Figure 1. This stack will (just) fail to topple over, and with n blocks of length 2 the overhang sums to 1+
1 1 1 1 + + + ··· + . 2 3 4 n
Since the harmonic series diverges, it follows that the overhang can be arranged to be as large as desired, simply by using a suitably large number of blocks. 2
table
... 1 1 1 1 65 4 3
1 2
1
Figure 1. The total overhang of this tower of twenty blocks is 1 +
1 2
+
1 3
+ ··· +
1 20 .
In practice, these special stacks are constructed from top to bottom: the top block is placed so that its middle, balancing point is at the upper right corner of the second block. Then the top two blocks are placed together so that their combined balancing point is at the upper right corner of the third block, and so on.1 Okay, review is over; now for something new. Let’s rescale our special stacks in the vertical direction, so that each stack has height 1; the resulting stacks resemble decks of playing cards, as indicated in Figure 2. We’ll call these stacks the harmonic http://dx.doi.org/10.4169/amer.math.monthly.119.02.122 MSC: Primary 26A06 1 Recently, stacks have been investigated for which it is permitted to place two blocks upon any lower block. Stacking in this way, one can use some blocks as counterweights and thus achieve significantly greater overhangs than with the staircases. See [5] for these recent results and a comprehensive bibliography for the stacking problem.
122
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
y
2
1 table
x
y = 1 – e –x table Figure 2. The harmonic staircases converge to the harmonic stack.
staircases. Notice that we’ve arranged for the top right corner of the table to coincide with the origin of our coordinate system. We will prove (Theorem 2.1) that the sequence of harmonic staircases converges to the harmonic stack, determined by the function 1 − e−x . And, just like the harmonic staircases, the harmonic stack won’t topple over. In fact, we will show (Theorem 3.2) that the harmonic stack is stable in a correspondingly stricter sense. Motivated by the limiting process above, in this article we shall consider general stacks of width 2 and height 1. A general stack is not one solid piece, but rather consists of infinitely many infinitely thin and unconnected horizontal blocks. Similar to a tower of finite blocks, a general stack is capable of toppling at any level: to avoid toppling at a given height, the center of mass of the stack above that height must lie directly above the cross-section of the stack at that height. (By comparison, a solid stack is safe from toppling just as long as its total center of mass lies above its base.) As indicated above, the harmonic staircases are distinguished in the framework of the original problem by each block being extended as far as possible. We will show (Theorem 3.2) that the harmonic stack is similarly characterized among the stable stacks: cutting the stack horizontally at any height into two pieces, the center of mass of the top piece lies directly above the upper right corner of the lower piece. The harmonic staircase consisting of n blocks has maximum overhang within a natural class of stacks made up of the same blocks. Similarly, we will show (Theorem 6.1) that the harmonic stack is a fastest growing stable stack. What may be surprising is that the harmonic stack is not the uniquely fastest growing stable stack (Theorem 6.2). Other results include various methods of transforming stable stacks into new stable stacks, and further characterizations of the harmonic stack amongst stable stacks. All the arguments employed are elementary. Getting ready to stack. The original stacking problem is posed in terms of threedimensional blocks. However, the harmonic staircases and all stacks that we are interested in are simply figures in the x y-plane, orthogonally extended in the z-direction. Clearly, the two-dimensional stacks will be stable if and only if their extensions are. So, we lose nothing by restricting ourselves to discussing and drawing two-dimensional stacks. (In Section 7, we make a short excursion into the world of 3D blocks). February 2012]
A CASE OF CONTINUOUS HANGOVER
123
For a further simplification, note that scaling a stack horizontally or vertically cannot affect its stability. It follows that we can normalize, making all stacks one unit high and two units wide. There will be one further normalization in Section 3, once we introduce the notion of the gravity curve of a stack: effectively, all stacks considered will just balance at the table level. We can now formally define stacks and the stability of stacks. 2. WHAT IS A STACK? The stack S given by the stack function f : [0, 1) → R is the region in the plane bordered by the graphs of x = f (y) (not y = f (x)!) and x = f (y) − 2: S = {(x, y) : 0 ≤ y < 1, f (y) − 2 ≤ x ≤ f (y)}. Though more general functions can be considered, it is natural to restrict to stack functions that are integrable,2 and piecewise continuous and right continuous: that is, the one-sided limits of f exist at any y ∈ [0, 1), and lim f (t) = f (y). We shall t→y + assume this throughout.3 As the simplest example, the constant stack function f (y) = 1 gives a vertical stack. The harmonic stack HAR, pictured in Figure 2, has stack function har(y) given by the inverse of 1 − e−x : har(y) = − log(1 − y). The n-block harmonic staircase HARn has piecewise constant stack function harn (y) =
n X
1 , m m=n−bnyc
where the floor function bnyc is the largest integer m ≤ ny. We’ve shaded the graph of har6 (y) in the picture of HAR6 in Figure 3. Essentially, it is comprised of the vertical right-hand borders of the rectangular blocks in the stack. y
1 table
1 1 1 6 5 4
1 3
1 2
1
x
Figure 3. The 6-block harmonic staircase.
As indicated in the picture, all stacks are resting on the x-axis, and the table extends from −∞ to 0, making the upper right corner of the table the origin. The weight of part of a stack is simply its area. Since stacks are of height 1 and constant width 2, it follows that all stacks have total weight 2. 2 The stack function being either Riemann or Lebesgue integrable suffices. It is also sufficient that the stack have an improper Riemann integral as y → 1− . 3 For some of what follows it is only required that f be Lebesgue integrable. However, such considerations are a bit arcane, even for us.
124
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
Theorem 2.1 (The harmonic staircases converge to the harmonic stack). The harmonic stack function har(y) is the pointwise limit as n → ∞ of the harmonic staircase functions harn (y).4 Proof. We use the estimate n X 1 = log(n + 1) + γ + n , m m=1
where γ is the Euler-Mascheroni constant and n → 0. Now let y ∈ [0, 1). Then harn (y) =
n−1−bnyc n X X 1 1 − m m m=1 m=1
= (log(n + 1) + γ + n ) − log (n − bnyc) + γ + n−1−bnyc bnyc n+1 + log + n − n−1−bnyc . = − log 1 − n n Now, 0 ≤ ny − bnyc < 1, from which it follows that
bnyc n
→ y. Also, since y < 1,
n − 1 − bnyc = ny − bnyc − 1 + n(1 − y) → ∞. Consequently lim harn (y) = − log(1 − y) = har(y).
n→∞
3. THE STABILITY OF A STACK AND ITS GRAVITY CURVE. Consider the stack in Figure 4 below. Thought of as a stack of six unconnected rectangles, it is clear that the top two rectangles will fall down: the center of mass of the top rectangle is not above the second rectangle, and the center of mass of the combined top two rectangles is not above the stack consisting of the remaining four rectangles. However, considered as one solid piece, the center of mass of this stack is above the table, and so it will not topple over. y
1 x
table
Figure 4. As one solid piece, this stack has a huge overhang and will not topple over. 4 The proof actually shows that, for h < 1, har (y) converges uniformly to har(y) on [0, h]. Note also that n Theorem 2.1 can be related to the Maclaurin expansion of the harmonic stack function:
har(y) = − log(1 − y) = y +
y2 y3 y4 + + + ··· . 2 3 4
For the limiting value y = 1 this identity says that the overhang of the harmonic stack is equal to the limit of the harmonic series.
February 2012]
A CASE OF CONTINUOUS HANGOVER
125
We want to consider our stacks as unconnected, and thus capable of toppling, at any height; in effect, we are thinking of stacks as consisting of infinitely many infinitely thin blocks. It is then natural to define a stack to be stable if it balances at every possible level. The gravity function and the gravity curve of a stack. More formally, consider a general stack S, and fix a height y. Consider the slab of S lying above y, and let g(y) be the x-coordinate of the center of mass of this slab. Then we call (g(y), y) the gravity point of S at height y. Notice that a gravity point always lies directly below the center of mass of the slab defining it, as pictured in Figure 5. We also call the function g(y) the gravity function of the stack S, and the graph of x = g(y) is the gravity curve of S. center of mass of top slab
gravity point at height y
y
x
table
Figure 5. The gravity point at height y is directly below the center of mass of the top slab.
Proposition 3.1 (Equation of the gravity function). Suppose S is a stack with stack function f . Then the gravity function of S is continuous and is given by R1 y
g(y) =
( f (t) − 1) dt 1−y
.
Wherever f (y) is continuous, the gravity function g(y) is differentiable and f (y) = 1 − [g(y)(1 − y)]0 . At every point, g is differentiable from the right, and the above equation for f holds everywhere with the derivative so interpreted. Consequently, the stack function uniquely determines the gravity function and vice versa.5 Proof. If we consider the stack S to be made of infinitesimally thin blocks, then the block at height t has mass 2dt, and the x-coordinate of its center of mass is f (t) − 1. It then follows that R1 g(y) =
y
mass of the slab above height y R1
=
( f (t) − 1) × 2 dt
y
( f (t) − 1) dt (1 − y)
.
5 In the more general setting of Lebesgue integrable functions, the gravity function g is absolutely continuous, and determines f almost everywhere.
126
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
The rest of the proposition follows by multiplying by 1 − y and applying the fundamental theorem of calculus, recalling that we are only considering stack functions that are continuous from the right. From here on, g 0 (y) shall always denote, if need be, the right derivative of the gravity function g. The previous result then promises that this right derivative always exists. Normalizing the tables. We say that a stack S is stable if S contains its gravity curve. If f is the stack function of S, this is the case exactly when f (y) − 2 ≤ g(y) ≤ f (y)
for all y ∈ [0, 1).
Further, we say that a stack is balanced at 0 if g(0) = 0: that is, if the center of mass of the whole stack is above the top right corner of the table. It is easy to check that all harmonic staircases are stable stacks balanced at 0. In fact, a harmonic staircase is constructed exactly so that its gravity curve will contain the top right corner of the table, as well the top right corners of all but the topmost block; through the top block, the gravity curve is simply a vertical line directly up the middle. As part of the next result, we prove that HAR is balanced at 0. As well, it is obvious that any stack can be translated to be balanced at 0. This justifies the following normalization: From here on, we shall consider only those stacks that are balanced at 0. y
table
y
x
table
x
Figure 6. The gravity curves of the vertical stack and a harmonic staircase.
The gravity curve of the harmonic stack. Intuitively, the stack functions of HARn approximate the gravity functions of HARn , suggesting that the stack and gravity functions of HAR should coincide. This is indeed the case. Theorem 3.2 (The harmonic stack and gravity functions coincide). The harmonic stack HAR is the unique stack whose gravity function and stack function coincide. In particular, HAR is stable. Proof. Suppose S is a stack with stack function f . By Proposition 3.1, the stack and gravity functions of S will coincide if and only if f is continuous and R1 y
( f (t) − 1) dt 1−y
February 2012]
= f (y).
A CASE OF CONTINUOUS HANGOVER
127
It is easy to verify that the harmonic stack function satisfies this equation. Conversely, suppose the equation holds. It follows that the integral is differentiable, and thus f must be differentiable (in fact infinitely differentiable). Multiplying the equation by 1 − y and differentiating, 1 − f (y) = f 0 (y)(1 − y) − f (y), and so f 0 (y) =
1 . 1−y
Antidifferentiating gives f (y) = − log(1 − y) + C. Since we have normalized to have all stacks balanced at 0, the only possibility is C = 0, giving the harmonic stack. For completeness, we also prove the following. Proposition 3.3. The gravity functions of the stacks HARn converge pointwise to har. Proof. Let gn be the gravity function of HARn and let g be the gravity function of HAR. We prove that, as suggested above, gn (y) − harn (y) → 0 pointwise on [0, 1). The proposition then follows immediately from Theorem 2.1 and Theorem 3.2. Fix y ∈ [0, 1), suppose n ∈ N, and set m = bnyc. Then y ∈ mn , m+1 . If n is large n then n > m + 1, and therefore m m 1 m+1 gn ≤ gn (y) ≤ harn (y) = gn = gn + . n n n n−m From the proof of Theorem 2.1, we know that gn (y) − harn (y) → 0, as desired.
1 n−m
=
1 n−bnyc
→ 0. It follows that
4. CUT AND PASTE. The following diagram shows two stacks S and T together with their gravity curves. We now slice both stacks at height t, and then combine them to make a new stack as pictured: the bottom slab of S stays fixed, and the top slab of T is horizontally translated so that the ends of the gravity curves coincide. We will denote this new stack by S \t T . From Proposition 3.1 we know that gravity functions are continuous. It then follows immediately from the definition of the gravity curve that the gravity curve of S \t T is exactly the union of the two part-curves. We immediately conclude the following. Proposition 4.1 (Properties of cut and pasted stacks). Let S and T be two stacks. If both S and T are stable then so is S \t T . Here is a nice application of this construction. Suppose that S and T are stable stacks. Then St = S \t T, t ∈ [0, 1] is a continuous deformation of T into S with all intermediate stacks being stable. 128
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
1 t
S
T
2 t
3 t
S\tT
Figure 7. Combining parts of two stacks by aligning the gravity curves.
1 1 2
1 2
1
Figure 8. The stack har2 \ 1 har4 . The gray curve is the gravity curve. 2
Cutting and pasting is also a useful technique for transforming finite stacks of rectangular blocks. As an example, the stack har2 \ 1 har4 consists of the three blocks 2 pictured in Figure 8, with the lower two overhangs of length 12 . Now replace the top block, by cutting and pasting with har8 at t = 34 , giving (har2 \ 1 har4 ) \ 3 har8 . Continuing this process forever and taking the limit, we arrive 2 4 at the infinite-block stack HALF = ((((har2 \ 1 har4 ) \ 3 har8 ) \ 7 har16 ) \ 15 har32 ) · · · 2
8
4
16
The stack HALF consists of blocks of heights 12 , 41 , 18 , 161 , . . . , with all overhangs of length 12 . So, HALF has infinite overhang. Also, by applying Proposition 3.1 one
1 1 2
1 2
1 2
1 2
1 2
Figure 9. Finite stacks converging to HALF. The gray curve is the gravity curve.
February 2012]
A CASE OF CONTINUOUS HANGOVER
129
can show that the gravity curves of HALF and the nth finite approximation to HALF n coincide on the interval 0, 2 2−1 . It follows that HALF is stable and that its gravity n curve passes through the top right corners of the blocks, as indicated in Figure 9. 5. STRETCHING AND TRANSLATING. Another useful method of modifying a stack S is to take the top slab of S, above some height t, and then dilate and translate this slab to form a new stack; see Figure 10. We denote the resulting stack by ↓t S. 1
t
S
discard bottom part
2 rescale down to make new stack
t
3 translate to balance at 0
4 ↓t S
Figure 10. Stretching and translating the top part of a stack into a new stack.
The stack function for ↓t S is easily determined from the construction, and then the gravity function is easily determined from Proposition 3.1. We summarize this in the following proposition. Proposition 5.1 (The defining functions of stretched stacks). Let S be a stack with stack function f and gravity function g, and let t ∈ [0, 1). Then ↓t S has stack function f (y(1 − t) + t) − g(t), and gravity function g(y(1 − t) + t) − g(t). If S is stable then so is ↓t S. A direct calculation using Proposition 5.1 shows that HAR is self-similar: ↓t HAR = HAR for any t ∈ [0, 1). 130
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
However, HAR is not the only self-similar stack. Theorem 5.2 (Classification of self-similar stacks). Let S be a stack such that ↓t S = S for all t ∈ [0, 1). Then the stack function f and gravity function g of S are of the form f (y) = a log(1 − y) + a + 1 and g(y) = a log(1 − y), for some a ∈ R. Notice that a = 0 gives the vertical stack, and a = −1 gives HAR. Also, a = 1 gives HAR reflected in the y-axis. It is clear that the stable stacks are given by a ∈ [−1, 1], and so a = ±1 give the extreme stacks. Proof of Theorem 5.2. Since ↓t S = S, the gravity curves of the two stacks have to be identical. So, by Proposition 5.1, g(y(1 − t) + t) − g(t) = g(y) for all y, t ∈ [0, 1). Rearranging and dividing by t (1 − y), we have g(t) g(y + t (1 − y)) − g(y) = . t (1 − y) t (1 − y) Letting t → 0+ and applying g(0) = 0 gives g 0 (y) =
g 0 (0) , 1−y
where g 0 (y) denotes the right derivative of g. Since g(0) = 0, we can conclude that g(y) = a log(1 − y) for some a ∈ R.6 Then, from Proposition 3.1, f (y) = 1 − (g(y)(1 − y))0 = a log(1 − y) + a + 1. Notice that the HALF stack introduced in the previous section is self-similar for infinitely many values of t: it is easy to check that ↓1− 1 HALF = HALF 2n
for all n ∈ N.
6. WHICH STACKS GROW THE FASTEST? When stacking n identical blocks, with each block supporting at most one block above, it can be shown that the harmonic staircase HARn has maximal overhang: see [4, Section II]. However, HARn is 6 Even though g may only have a one-sided derivative, the continuity of g still permits us to antidifferentiate g 0 , obtaining the desired expression for g. See, for example, [1, Theorem 7.1].
February 2012]
A CASE OF CONTINUOUS HANGOVER
131
Figure 11. Converting HAR6 into another stable stack of equal, maximal overhang.
not unique in this regard, as an adaptation of HARn does just as well, as shown in Figure 11. In this section we will prove similar results for HAR. First, we prove (Theorem 6.1) that HAR has the fastest growing gravity function amongst stable stacks. We then prove (Theorem 6.2) that HAR has one of the fastest growing stack functions, but that it is not unique in this regard. In what follows, we will consider stacks growing to the right of the table. Of course stacks can also grow to the left, and there are obvious left versions of all our results below. Theorem 6.1 (HAR has the fastest growing gravity function). Suppose S is a stable stack with gravity function g. Then har − g is a nondecreasing and nonnegative function. Note that it is possible that g = har on an initial interval [0, t]. However, Theorem 6.1 implies that once har gets in front of g, it remains so. Proof of Theorem 6.1. Since har(0) = g(0) = 0, we only need to prove that har − g is nondecreasing. To do this, assume by way of contradiction that har0 (t) − g 0 (t) < 0 for some t ∈ [0, 1), where g 0 (t) refers to the right derivative of g. Let h be the gravity function of ↓t S. From Proposition 5.1 and the self-similarity of HAR, it follows that har =↓t har < h on some interval (0, s]. Now consider the stack T = (↓t S) \s HAR. By Proposition 4.1 and Proposition 5.1, T is stable. Further, if k is the gravity function and f is the stack function of T , then the stability of T and the choice of s ensures that har < k ≤ f on (0, 1); see Figure 12.
har s
k
Figure 12. The gravity function k of the stack T is to the right of har.
132
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
But this is a contradiction: since HAR is balanced at 0, har < f implies that the center of mass of T will be to the right of the origin. Having proved the gravity function har of HAR is dominant, what can we say about har as a stack function relative to other stack functions? Certainly, HAR need not always be in front of other stacks. For example, the vertical stack begins in front of HAR, before being overtaken. On the other hand, to have a stable stack S strictly in front of HAR from a certain height on is impossible: if this were the case, then above that height the gravity function of S would also be in front of har, contradicting our previous theorem. However, there do exist stable stacks that effectively compete for the lead. Theorem 6.2 (The fastest growing stack functions). There are no stable stacks that stay ahead of HAR for all y near 1. However, there do exist stacks that grow as fast as HAR. That is, there is a stable stack S, with stack function f , such that f (y) − har(y) changes sign infinitely often as y approaches 1. Proof of Theorem 6.2. We have already argued that a stable stack cannot stay ahead of HAR for all y near 1. We will now construct a stack S that repeatedly alternates with HAR for the lead: S has stack function f satisfying har(y) < f (y)
for values of y arbitrarily close to 1.
To see how to construct such a stack S, we first assume that S has the desired leadchanging property, and we use this to derive explicit sufficient conditions for the stack function f . We then show that f can indeed be chosen so that these conditions are satisfied. We begin by considering the gravity function g of S. Since S is assumed stable, it follows that for any y for which har(y) < f (y), we must also have har(y) − g(y) < 2. But, by Theorem 6.1, har − g is nondecreasing. It follows that if S is forever changing the lead with HAR then har(y) − g(y) < 2 for all y near 1, and thus that har − g converges to some m ∈ [0, 2] as y goes to 1. We can therefore write g(y) = har(y) − m + (y), where ≥ 0 is a nonincreasing function, and (y) → 0 as y → 1. Any m ∈ (0, 2) will suffice for what follows, but for definiteness we take m = 1. Then, since S is to be balanced at 0, we must have (0) = 1. We now assume that (y) is differentiable. Then, using Proposition 3.1, we can calculate f (y) = 1 − [g(y)(1 − y)]0 = 1 − [(har(y) − 1 + (y))(1 − y)]0 = −[har(y)(1 − y)]0 − 0 (y)(1 − y) + (y). February 2012]
A CASE OF CONTINUOUS HANGOVER
133
By Theorem 3.2, we also know that 1 − [har(y)(1 − y)]0 = har(y), and so f (y) = har(y) − 1 + (y) − 0 (y)(1 − y). We want S to be stable, for which we require f − 2 ≤ g ≤ f . Since g = har − 1 + , this is equivalent to 0 ≤ − 0 (y)(1 − y) ≤ 2
for all y.
Finally, S being ahead of HAR at height y amounts to f (y) > har(y). So, for the lead-changing property, it suffices to have 1 < − 0 (y)(1 − y)
for values of y arbitrarily close to 1.
In summary, it suffices for us to find a nonincreasing and differentiable function : [0, 1) → R such that (0) = 1 lim (y) = 0 y→1−
− 0 (y)(1 − y) ≤ 2 for all y − 0 (y)(1 − y) > 1 for values of y arbitrarily close to 1. These conditions will guarantee that the corresponding stack S is stable and balanced at 0, and will repeatedly overtake HAR. To construct an explicit function satisfying these conditions, we shall take to have constant segments that are connected by small S-bends of just the right size and slope. To do this, we first define a suitable prototype S-bend; see Figure 13: B(y) =
5 1 y − y5, 4 4
y ∈ [−1, 1].
Note that B(±1) = ±1 and B 0 (±1) = 0. Also, B 0 ≥ 0 on [−1, 1], with a maximum of 54 . We also take B(y) = 1 for y > 1 and B(y) = −1 for y < −1. We now define by subtracting a sum of suitable linear transformations of B. Specifically, define (y) = 1 −
∞ X
Bn (y),
n=1
where Bn is the S-bend B transformed to rise from 0 to 21n on the interval n 2 −1 1 2n − 1 − 2n , ; 2n 2 2n see Figure 14. Clearly is nonincreasing, with (0) = 1 and (y) → 0 as y → 1. Also, Bn has a maximum slope of 5 · 2n−2 . And, on the interval where Bn is bending, we have 21n ≤ 3 1 − y ≤ 2n+1 . It follows that at the point of maximum slope of Bn , we have − 0 (y)(1 − y) ≥ 5 · 2n−2 · 134
1 5 = > 1. 2n 4
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
7/8 3/4 1
x = (y) x = B (y)
–1
1/2
1 –1 1/8 1/4
0
1/2
1
Figure 14. The function .
Figure 13. The prototype S-bend B.
Furthermore, at every point on the bending interval of Bn , we have 15 3 ≤ 2. − 0 (y)(1 − y) ≤ 5 · 2n−2 · n+1 = 2 8 It follows that has exactly the properties desired. The resulting stack S is shown shaded in Figure 15, with har and the gravity curve of S superimposed.
Figure 15. A stable stack that grows as fast as HAR.
7. AFTERTHOUGHT 1: FLIPPING. Figure 16 illustrates another natural operation for transforming a stack S into a new stack: take the slab above a certain height t ∈ [0, 1), and reflect that slab about the vertical line through its gravity point. We denote the new stack by ↔t S. It is clear that if S is stable then so is ↔t S. We now apply this flipping procedure to construct a stable stack that has infinite overhang to both the left and the right. (The following was inspired by similar constructions involving finitely many blocks, described in [2, Chapters 12.5 and 12.7].)
t
t
S
↔t S
Figure 16. Reflecting the top slab of a stack about an axis through its gravity point.
February 2012]
A CASE OF CONTINUOUS HANGOVER
135
1
Figure 17. Constructing a stable stack with infinite overhang to both the left and right.
Recall the stack HALF constructed at the end of Section 4, consisting of blocks of heights 21n , each placed with overhang 12 . We now flip this stack above the 1st, 3rd, th block; each pair of flips results in the stack extending 6th, and in general the (n+1)n 2 1 further in both directions. Using Proposition 3.1, it is then easy to show that the 2 limiting result of these flips is a stable stack S, with stack function having unbounded oscillation in both directions as y → 1. Note also that this flipping procedure can be used to construct a stack that continually overtakes the harmonic stack, similar to that constructed at the end of the previous section. For this we modify the harmonic stack by flipping out infinitely many small horizontal slivers that get arbitrarily close to the top of the stack. It is then possible to arrange for these slivers to jut beyond the harmonic stack. We now momentarily venture into the world of 3-dimensional blocks. We’ll create a stack that casts a shadow over the whole x z-plane. (We’ll continue to label the vertical direction as y.) Begin with the oscillating stack S just constructed. Notice that there are infinitely many blocks of S such that a top corner of the block lies above the origin and the gravity curve of S also passes through that corner. Now, thicken the blocks in S to have a thickness d in the z direction, giving a 3D stack b S. Next, take any fixed irrational number a, and consider the angle α = aπ. Finally, at the height of each of the distinguished corners of S above the origin, successively rotate the top slab of b S the angle α around the y-axis. Any angle θ is closely approximated by arbitrarily large integer multiples of α. It follows that, no matter how small the thickness d, any point in the direction θ will eventually lie under some block of b S. It follows that b S casts a shadow over the whole x z-plane. 8. AFTERTHOUGHT 2: BALANCING THE EXPONENTIAL FUNCTION. In this section, we derive an interesting balancing property of the exponential function. We use this property to reprove the result from Theorem 3.2, that the stack function and gravity function of the harmonic stack coincide. Proposition 8.1 (Balancing the exponential function). The region under the graph of the function e x to the left of the point x = a balances over a fulcrum at x = a − 1. Proof. Using the standard formula, we find that the x-coordinate of the center of mass of the tail region is Ra −∞ Ra
xe x dx = a − 1. e x dx
−∞
136
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
y = ex
a a–1 Figure 18. The tail of the region under y = e x always balances 1 unit to the left of the cut.
Figure 19 shows the region between the graph of the exponential function e x and its horizontal translate e x−2 , truncated at a certain height. If the height is less than 1, then this is exactly a top slab of the harmonic stack, rotated 180 degrees about the point (0, 1/2). What we want to prove is that no matter where we cut, the shaded region balances with the fulcrum at a: this establishes again that the gravity function and the stack function of HAR are identical. 2 gravity y = ex
a Figure 19. The truncated region between y = e x and y = e x−2 balances at x = a.
Consider the region lying between y = e x and the horizontal translate y = e x−d , and to the left of x = a; see Figure 20. By the previous proposition this sliver is the difference of two regions that balance over a − 1, and hence the sliver also balances over a − 1. y = ex y = e x –d
a–1
a
Figure 20. The sliver trapped by y = e x and its translate balances over a pivot at a − 1.
Now set the horizontal difference to be d = n2 . Then n + 1 copies of the sliver fit together seamlessly into the shaded region in Figure 19, with a few curvy triangles missing at the top and an extra sliver sticking out on the right; see Figure 21. February 2012]
A CASE OF CONTINUOUS HANGOVER
137
a–1
a
Figure 21. Translates of the sliver combine to approximate the shaded region in Figure 19.
The fulcrums of these n + 1 slivers are equally spaced from a − 1 to a + 1 meaning the fulcrum of the entire region is at the middle point a. Letting n → ∞, the triangular gaps at the top and the extra sliver on the right vanish, proving again that the gravity function and stack function of HAR coincide. It can also be shown that, within a natural class of functions, the exponential function is characterized by the balancing property established in Theorem 8.1. The proof is a straightforward exercise. 9. AFTERTHOUGHT 3: A CANDIDATE BUT NO WINNER? Our stacks provide an interesting situation, where the natural candidate for optimality is not optimal. Suppose, ignoring what we have learned from Theorem 6.2, that we try to prove HAR is the stack with eventual greatest overhang: that is, for every stable stack function f it is eventually the case that har ≥ f . We now argue that if there is a stack with eventual greatest overhang, then it is HAR. This is reminiscent of Jakob Steiner’s original and famously flawed approach to the isoperimetric problem, that the circle maximizes area amongst closed curves of a given perimeter; see, for example, [3]. Consider any other stable stack S with stack function f and gravity function g. Then, by Theorem 3.2, there exists an interval [s, t] on which g < f . Figure 22 shows the stack S and its gravity curve subdivided into a top, middle, and bottom slab, with the middle slab corresponding to the interval [s, t].
a b
Figure 22. A stable non-harmonic stack together with its gravity curve.
Now, if we slide the top slab slightly to the right, then its gravity point a will stay within the top of the middle slab, ensuring that the top slab will not topple. However, sliding the top slab will move the gravity point of the whole stack to the right, and the stack will not be balanced at 0. However, we can avoid this by simultaneously sliding the top slab to the right and the middle slab to the left. Clearly, we can do this in such a way that the gravity point b of the combined top and middle slabs stays fixed, and so leaving the gravity curve inside the bottom slab unchanged. This means that the adjusted stack still balances, 138
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
with the top slab being further to the right than for the original stack: the adjusted stack eventually has greater overhang. This argument can be applied to any stack other than HAR, and thus establishes that if there is a stack of eventual maximum overhang, then it must be HAR. Note that this does imply that HAR can be regarded as the fastest growing stable stack in a certain sense: HAR is the only stable stack that cannot be completely overtaken by any other stable stack. REFERENCES 1. D. Bressoud, A Radical Approach to Lebesgue’s Theory of Integration, MAA Textbooks, Cambridge University Press, Cambridge, 2008. 2. J. Bryant, C. Sangwin, How Round is Your Circle? Princeton University Press, Princeton, 2008. 3. R. Courant, H. Robbins, I. Stewart, What is Mathematics? Oxford University Press, New York, 1996. 4. J. F. Hall, Fun with stacking blocks, Amer. J. Phys. 73 (2005) 1107–1116; available at http://dx.doi. org/10.1119/1.2074007. 5. M. Paterson, Y. Peres, M. Thorup, P. Winkler, U. Zwick, Maximum overhang, Amer. Math. Monthly 116 (2009) 763–787; available at http://dx.doi.org/10.4169/000298909X474855. BURKARD POLSTER received his Ph.D. in 1993 from the University of Erlangen-N¨urnberg in Germany. He currently teaches at Monash University in Melbourne, Australia. Readers may be familiar with some of his books dealing with fun and beautiful mathematics such as The Mathematics of Juggling, Q.E.D.: Beauty in Mathematical Proof, or the Shoelace Book. School of Mathematical Sciences, Monash University, Victoria 3800, Australia
[email protected] MARTY ROSS is a mathematical nomad. He received his Ph.D. in 1991 from Stanford University. Burkard, Marty, and their mascot the QED cat are Australia’s tag team of mathematics. They have a weekly column in Melbourne’s AGE newspaper and are heavily involved in the popularization of mathematics. Their various activities can be checked out at http://www.qedcat.com. When he is not partnering Burkard, Marty enjoys smashing calculators with a hammer. PO Box 83, Fairfield, Victoria 3078, Australia
[email protected]. DAVID TREEBY studied mathematics at Monash University in Australia where he graduated in 2005. He currently teaches mathematics to high school students at Presbyterian Ladies’ College in Melbourne, Australia. He delights in exploring beautiful mathematics with students at his school. Presbyterian Ladies’ College, 141 Burwood Hwy, Burwood, Victoria 3125, Australia
[email protected].
February 2012]
A CASE OF CONTINUOUS HANGOVER
139
NOTES Edited by Sergei Tabachnikov
Riemann Maps and Diameter Distance David A. Herron Abstract. We use the intrinsic diameter distance to describe when a Riemann map has a continuous extension to the closed unit disk.
1. INTRODUCTION. Many basic complex analysis courses include the celebrated Riemann mapping theorem; a few discuss the Carath´eodory-Osgood extension theorem which asserts that a Riemann map admits an extension to the closed unit disk that is a homeomorphism if and only if the target is a Jordan domain (i.e., bounded by a plane Jordan curve). When the target domain is not Jordan, one considers prime ends, a theory beyond the scope of beginning complex analysis courses and texts. The purpose of this note is to provide a written record of the following intermediate result; of special interest is the interplay between complex analysis, plane topology, and metric geometry. This by-product of a more general theory is folklore among the experts (cf. [3, 6.1.(4)]), but perhaps not widely known. f
Theorem. Let D → be a holomorphic homeomorphismfrom the open unit disk D in the complex plane C onto a bounded domain in C. The following are equivalent: g ¯ → ¯ with g|D = f . (a) There exists a continuous map D h ¯ → ¯ d with h|D = f . (b) There exists a homeomorphism D
(c) ∂d is a closed Jordan curve, i.e., a topological circle. (d) ∂ is a closed curve. (e) ∂ is locally connected. ¯ d and ∂d := ¯ d \ are the metric completion and metric boundary of the Here metric space d := (, d) where d is the diameter distance on . Also, in (a) =⇒ ¯d → ¯ is the extension of the identity map d → . (b) we have g = i B h where i : See §2.C for definitions. Figure 1 illustrates a simple non-Jordan domain that satisfies conditions (a) through (e); Figure 2 pictures two domains that do not. It is straightforward to check that (b) implies (c) and that (d) implies (e). We refer to [4, Theorem 2.1, p. 20] for a proof of the nontrivial fact that (e) implies (a). In §3 we verify that (a) implies (b) and that (c) implies (d). Our ideas and proofs should be accessible to students possessing basic knowledge of complex analysis and plane topology, and could possibly serve as a capstone experience for undergraduate mathematics majors. The reader is forewarned that our proofs employ basic plane and metric topology arguments; the rˆole of holomorphicity is explained in §2.B. http://dx.doi.org/10.4169/amer.math.monthly.119.02.140 MSC: Primary 30C20, Secondary 30C35, 30J99
140
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
slit disk
f
+
unit disk
+
A point in ∂ that has two preimages in ∂d and in ∂D f (z) =
√ √ (z+1)− 2 z 2 +1 √ √ (z+1)+ 2 z 2 +1
Figure 1. A Riemann map onto a slit disk.
2. PRELIMINARIES. 2.A. Basic notation and terminology. Throughout this article denotes a simply connected bounded domain in the complex plane C. We write D(a; r ) := {z ∈ C | |z − a| < r } for the open disk centered at a ∈ C with radius r > 0. Then D := D(0; 1) is the unit disk with boundary T := ∂D = {eit | t ∈ [0, 2π]}, the unit circle. A path is a continuous map of a compact interval, and unless explicitly indicated otherwise, we assume that the parameter interval is [0, 1]. We use the phrase path in with a terminal endpoint ζ in ∂ to describe a path γ
[0, 1] → ∪ {ζ } with γ ([0, 1)) ⊂ and γ (1) = ζ ∈ ∂. We write |γ | := γ ([0, 1]) for the image of the path γ . However, we write [a, b] both for the Euclidean line segment joining a and b as well as the affine path [0, 1] 3 t 7→ a + t (b − a); the reader can distinguish these two meanings by context. A path γ joins γ (0) to γ (1). When γ (0) = γ (1), we call γ a closed path. By a curve we mean the image of a path, a closed curve is the image of a closed path, and an arc is the image of an injective path. A crosscut of is an arc with endpoints in ∂ and all other points in . An endcut of is an arc having one endpoint in ∂ and all other points in . We note that every path contains an injective subpath that joins its endpoints; see [5]. A (closed) Jordan curve is a topological circle, that is, the homeomorphicimage of T. We call 0 a plane Jordan curve if 0 is a Jordan curve in C; in this setting, the Jordan curve theorem asserts that C \ 0 has exactly two components: the bounded component int(0) called the interior of 0 and the unbounded component ext(0) called the exterior of 0. f
2.B. Riemann maps. A Riemann map D → is a holomorphic homeomorphism; complex analysts call such an f a conformal map. We require two properties of Riemann maps. First, according to [4, Prop. 2.14, p. 29], each path in with a terminal endpoint in ∂ will have a preimage that is a path in D with a terminal endpoint in T. This fact, whose proof is based on a length-area estimate known as Wolf’s lemma (see [4, Prop. 2.2, p. 20]), does not require that f have a continuous extension to the closed disk. February 2012]
NOTES
141
¯ → . ¯ Second, suppose a Riemann map f has a continuous extension to a map g : D According to Lemma 2.2, each nondegenerate subarc of T has a nondegenerate image; that is, for all subarcs A ⊂ T,
diam(A) > 0 ⇐⇒ diam(g(A)) > 0.
(2.1)
2.2 Lemma. Let f be holomorphic and bounded in D. Suppose there is a nondegenerate subarc A ⊂ T such that for all ζ ∈ A,
lim f (z) = 0.
z→ζ z∈D
Then f = 0. Proof. We assume that A ⊃ {eit | 0 ≤ t ≤ 2π/n} for some positive integer n. Define F(z) := f 1 (z) f 2 (z) · · · f n (z) where f k (z) := f (e2πik/n z). Then F is holomorphic and bounded in D with the property that for each ζ ∈ T, lim F(z) = 0.
z→ζ z∈D
An appeal to the maximum principle reveals that F = 0, so f = 0. 2.C. Diameter distance. The diameter distance1 d on is defined by d(a, b) := inf{diam(γ ) | γ a path in joining a and b }. It is easy to check that d is a distance function on and that for all a, b ∈ , ¯ d and ∂d := ¯ d \ are the |a − b| ≤ d(a, b). We write d := (, d), and then metric completion and metric boundary of the metric space d . id
In fact, the identity map d − → is a 1-Lipschitz homeomorphism. Therefore, it ¯ d → . ¯ In general, i need follows that id extends naturally to a 1-Lipschitz map i : −1 not be surjective nor injective, but using continuity of id we see that i(∂d ) ⊂ ∂. For the slit disk, pictured in Figure 1, the map i is surjective but not injective. For the two domains pictured in Figure 2, i is neither surjective nor injective; here the left-
1 8
+
1 4
...
1 8
7i 8
+
3i 4 1 2
+
7i 8
+
i 2
1 2
...
1 4
+
+
i 2
i 4
Figure 2. Domains with i noninjective and nonsurjective. 1 This
142
is sometimes called the inner or internal or intrinsic diameter distance.
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
hand-side has ∂d homeomorphic to (0, 1] whereas the right-hand-side has ∂d homeomorphic to (0, 1). In our setting we can visualize ∂d as the set ∂ pa of path accessible boundary points of . More precisely, ∂ pa = i(∂d ) is the set of all points ζ ∈ ∂ with the property that there exists a path γ in with terminal endpoint ζ ; we recall that this means γ
[0, 1] → ∪ {ζ } with γ ([0, 1)) ⊂ and γ (1) = ζ. There is also a natural connection between ∂d and endcuts of . To see this, let γ be a path in with terminal endpoint γ (1) = ζ in ∂. By choosing a sequence of points along γ that is d-Cauchy but nonconvergent in , we obtain a point ξ ∈ ∂d with i(ξ ) = ζ . The continuity of γ at t = 1 ensures that limt→1− d(γ (t), ξ ) = 0, so we can define a continuous path γd in d by ( γd (t) :=
γ (t) if t ∈ [0, 1), ξ if t = 1.
Thus each such path γ corresponds to a path γd in d with a unique terminal endpoint γd (1) ∈ ∂d and with the property that γ = i B γd , so γ (1) = i(γd (1)). Let α and β be two paths in with terminal endpoints in ∂. We declare α and β to be d-equivalent provided limt→1− d(α(t), β(t)) = 0. There is a natural one-toone correspondence between ∂d and the equivalence classes of such paths. A general discussion, with detailed proofs, can be found in [2]. In §3 we use the following elementary facts. ¯ d is compact, then i is surjective; in particular, i(∂d ) = ∂. 2.3 Lemma. If ¯ d is compact. Let ζ ∈ ∂. Choose a sequence (z n )∞ Proof. Assume n=1 in with ¯ d such |z n − ζ | → 0 as n → ∞. There exists a subsequence (z nk )∞ and a point ξ ∈ k=1 that d(z n k , ξ ) → 0 as k → ∞. As id is continuous, ξ 6 ∈ , so ξ ∈ ∂d . Since |ζ − i(ξ )| ≤ |ζ − z n k | + |z nk − i(ξ )| ≤ |ζ − z n k | + d(z n k , ξ ) → 0
as k → ∞,
it follows that ζ = i(ξ ). ¯ d is compact if and only if ∂d is compact. 2.4 Lemma. The space ¯ d , compactness of the latter implies compactProof. Since ∂d is a closed subset of ness of the former. Assume that ∂d is compact. Let U be a collection of d-open sets ¯ d . There is a finite subcollection V ⊂ U such that that cover U :=
[
V ⊃ ∂d .
V ∈V
¯ d \ U = d \ U = \ U is d-closed, so also closed. As Since U is d-open, A := A is closed and bounded, it is compact, and hence d-compact too. So, there is another finite subcollection W ⊂ U that covers A, and V ∪ W is a finite subset of U that forms ¯ d. a d-open cover of February 2012]
NOTES
143
f
3. PROOFS. Let D → be a holomorphic homeomorphismwith ⊂ C bounded. We continue with the notation introduced in §2. Here we demonstrate that (a) implies (b) and that (c) implies (d). g
¯ → ¯ with g|D = f . For each 3.A. (a) =⇒ (b). Assume there is a continuous map D point a ∈ T, consider the path α := g B [0, a] in ∪ {g(a)} defined by α(t) := g(t a). As described in §2.C, this determines a unique point αd (1) ∈ ∂d with the property ¯ → ¯ d by that i(αd (1)) = α(1) = g(a). Now define h : D ( g(z) if z ∈ D, h(z) := (g B [0, z])d (1) if z ∈ T. ¯ is compact and ¯d Note that g = i B h. We claim that h is a homeomorphism. Since D is Hausdorff (being a metric space), it suffices to show that h is a continuous bijection. id
Proof that h is continuous. Using the fact that d → is a homeomorphism we see that h|D = id−1 B g|D is continuous. Thus it suffices to check that h is continuous at ¯ ∩ D(a; δ)) ⊂ each point of T. Let a ∈ T and ε > 0 be given. Select δ > 0 so that g(D ¯ D(g(a); ε/4). Let z ∈ D ∩ D(a; δ/2). The continuity of g, in conjunction with the definition of h, guarantees that lim d(h(ta), h(a)) = 0 = lim d(h(t z), h(z)).
t→1−
t→1−
In particular, we may choose t0 ∈ (0, 1) so that a0 := t0 a and z 0 := t0 z satisfy a0 , z 0 ∈ D ∩ D(a; δ) and d(h(a0 ), h(a)) < ε/4, d(h(z 0 ), h(z)) < ε/4. As D ∩ D(a; δ) is convex, the line segment path λ := [a0 , z 0 ] lies in D ∩ D(a; δ), so g B λ lies in D(g(a); ε/4). Thus d(h(a0 ), h(z 0 )) = d(g(a0 ), g(z 0 )) ≤ d(g B λ) < ε/2 and therefore d(h(z), h(a)) ≤ d(h(z), h(z 0 )) + d(h(z 0 ), h(a0 )) + d(h(a0 ), h(a)) < ε. Proof that h is surjective. Let ξ ∈ ∂d . Pick a path γ in with terminal endpoints γd (1) = ξ and γ (1) = ζ := i(ξ ). As described in §2.B, the preimage of γ is a path β (so, g B β = γ ) in D with terminal endpoint a := β(1) ∈ T = ∂D. Since h is continu¯ d . For each t ∈ [0, 1), h(β(t)) = g(β(t)) = γ (t), so it follows ous, h B β is a path in that h B β = γd . Thus h(a) = h(β(1)) = γd (1) = ξ . Proof that h is injective. Let a and b be distinct points in T = ∂D and let I and J be the components of T \ {a, b} (so I and J are open subarcs of T). We demonstrate that if h(a) = h(b), then min{diam[g(I )], diam[g(J )]} = 0; since this would contradict (2.1), it follows that h(a) 6 = h(b). As there is no harm in doing so, we assume that g(0) = 0. Then the paths α := g B [0, a] and β := g B [0, b], given by α(t) := g(t a) and β(t) := g(t b), define endcuts of both having initial endpoint α(0) = g(0) = 0 = β(0) and with terminal endpoints α(1) = g(a), β(1) = g(b) in ∂. See Figures 3 and 4. 144
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
ζ
α
∂ β
0 Figure 3. The paths α and β.
Now suppose that ξ := h(a) = h(b). Then ζ := i(ξ ) = g(a) = g(b), α and β are d-equivalent paths that join 0 to ζ ∈ ∂, and C := |α| ∪ |β| is a plane Jordan curve in ∪ {ζ }. Note that C ∩ ∂ = {ζ }; however, in general, D := int(C) need not be contained in . See Figure 3. Let ε > 0 be given. We show that either g(I ) or g(J ) has diameter smaller than ε. ¯ disk picture” We assume that ε < |ζ |/10. It is helpful to examine topology in the “D in Figure 4, but the reader must remember that g is only a homeomorphism in D. a g −1 (10 ) ra
0 g −1 (11 )
λ
sb
I
b
Figure 4. The disk picture. g
¯ → , ¯ in conjunction with the d-equivalence of α and β, guarThe continuity of D antees the existence of a τ ∈ (0, 1) such that diam(α[τ, 1]) < ε/10, diam(β[τ, 1]) < ε/10, and ∀t ∈ [τ, 1), d(α(t), β(t)) < ε/10. In particular, there is an injective path γ in that joins α(τ ) = g(τ a) to β(τ ) = g(τ b) with diam(γ ) < ε/10. Note that |γ | ⊂ ∩ D(ζ ; ε/5), so 0 ∈ / |γ | and thus γ 6⊂ C (because the subarc of C that joins α(τ ) and β(τ ) in passes through the origin). See Figure 5. Next we select an appropriate subpath κ of γ . We exhibit R, S satisfying 0 ≤ R < S ≤ 1 and such that γ (R) is the last point of γ in α[τ, 1] and γ (S) is the first point of γ |[R,1] in β[τ, 1], and such that κ := γ |[R,S] is injective and meets C only at the endpoints of κ. To this end, let R := sup{t ∈ [0, 1] | γ (t) ∈ α[τ, 1]}. Then 0 ≤ R < 1 and there exists an r ∈ [τ, 1) so that γ (R) = α(r ) = g(r a). Next, let S := inf{t ∈ [R, 1] | γ (t) ∈ β[τ, 1]}. Then R < S ≤ 1 and there exists an s ∈ [τ, 1) so that γ (S) = β(s) = g(s b). We examine the paths κ := γ |[R,S] February 2012]
and λ := g −1 B κ. NOTES
145
γ (R)
ζ
α(τ ) κ γ
γ (S)
α 0
β
β(τ )
Figure 5. The path γ and its subpath κ.
By its definition, κ is an injective path in that joins α(r ) to β(s) and is such that |κ| ∩ C = {α(r ), β(s)}. Thus |κ| is a crosscut either of D = int(C) or of E := ext(C) (and |κ| separates 0 and ζ , either in D or in E). To see that the former holds (i.e., that |κ| is a crosscut of D), note that there are (closed) Jordan curves 00 := g([0, ra] ∪ |λ| ∪ [sb, 0]) = α[0, r ] ∪ |κ| ∪ β[s, 0], 01 := g([a, ra] ∪ |λ| ∪ [sb, b]) = α[1, r ] ∪ |κ| ∪ β[s, 1] with 00 ⊂ , 01 ⊂ ( ∪ {ζ }) ∩ D(ζ ; ε/5), and 10 := int(00 ) ⊂ , 11 := int(01 ) ⊂ D(ζ ; ε/5). ¯ 0 and 0 6 ∈ 1 ¯ 1 . If |κ| were a crosscut of E, then either |κ| would In particular, ζ ∈ /1 separate 0 from ∞ in E or |κ| would separate ζ from ∞ in E: the first case would imply that 0 ∈ 11 while the second case would mean that ζ ∈ 10 . Since neither of these holds, |κ| is not a crosscut of E and so is a crosscut of D. Note that 10 and 11 are the components of D \ |κ|. See Figure 4. Evidently, |λ| is a crosscut of one of the components, call it W , of D \ ([0, a] ∪ [0, b]). The boundary of W consists of [0, a], [0, b], and one of the subarcs I or J ; we ¯ 1. assume that ∂ W = [0, a] ∪ I ∪ [0, b]. Again, see Figure 4. We claim that g(I ) ⊂ 1 ¯ 1 ⊂ D(ζ ; ε/5), this means that diam[g(I )] < ε as required. Since 1 To verify this claim, we first check that g(I ) ⊂ D ∪ {ζ }. Let c ∈ I . Then K := g([0, c]) is an endcut of and K ∩ C ⊂ {0, ζ }, so either K ⊂ D ∪ {0, ζ } or K ⊂ E ∪ {0, ζ }. However, as [0, c] is a crosscut of W that joins 0 to c with ∅ 6 = [0, c] ∩ |λ| ⊂ W , we see that ∅ 6 = K ∩ |κ| ⊂ D. Thus, K ⊂ D ∪ {0, ζ }, so g(c) ∈ D ∪ {ζ }. ¯ As g(I ) is connected and Finally, recall that 0 and ζ are separated by |κ| in D. ¯ 1 \ |κ|. Thus g(I ) ⊂ 1 ¯ 1. ζ ∈ g(I ), g(I ) lies in the ζ -component of D¯ \ |κ|, which is 1 3.B. (c) =⇒ (d). We assume that ∂d is a closed Jordan curve. In particular then, ∂d is compact. To see that ∂ is a closed curve, it suffices to verify that i(∂d ) = ∂. This latter requirement follows from Lemmas 2.3 and 2.4. 146
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
In closing we mention that in the literature one also encounters the (inner) length distance ` which is defined similarly to d but using the length `(γ ) (of a joining path γ ) instead of diam(γ ). However, there are bounded Jordan plane domains that have ¯ ` noncompact, so nonhomeofinite length distance and ∂ locally connected but ¯ morphic to D. Such an example can be constructed by starting with an open square and taking a convergent sequence (ζn ) of boundary points; at the points ζn we attach disjoint spiraling tentacles that have length approximately one and diameters tending to zero. See [1, Example 2.10] for more details. ACKNOWLEDGMENTS. The author thanks the referees for their helpful comments. He was partially supported by the Charles Phelps Taft Research Center.
REFERENCES 1. D. Freeman, D. Herron, Bilipschitz homogeneity and inner diameter distance, J. Anal. Math. 111 (2010) 1–46; available at http://dx.doi.org/10.1007/s11854-010-0011-6. 2. D. Herron, Geometry and topology of intrinsic distances, J. Anal. 18 (2010) 197–231. 3. R. N¨akki, J. V¨ais¨al¨a, Jon disks, Expo. Math. 9 (1991) 3–43. 4. C. Pommerenke, Boundary Behavior of Conformal Maps, Grunlehren der mathematischen Wissenschaften, no. 299, Springer-Verlag, Berlin, 1992. 5. J. V¨ais¨al¨a, Exhaustions of John domains, Ann. Acad. Sci. Fenn. Math. 19 (1994) 47–57. Department of Mathematics, University of Cincinnati, OH 45221
[email protected]
A Power Series Approach to Some Inequalities Cristinel Mortici
Abstract. The aim of this note is to introduce a new technique for proving and discovering some inequalities.
1. INTRODUCTION. We give here a method for proving a class of inequalities using infinite series. This method is useful, because many difficult problems can be easily solved and often they can be extended. To begin, let us consider the well-known Nesbitt inequality [4] a b c 3 + + ≥ , b+c c+a a+b 2
a, b, c > 0.
(1)
To introduce our method, we assume, without loss of generality, that a + b + c = 1, so we have to prove that http://dx.doi.org/10.4169/amer.math.monthly.119.02.147 MSC: Primary 26D15
February 2012]
NOTES
147
a b c 3 + + ≥ , 1−a 1−b 1−c 2
a, b, c ∈ (0, 1) .
If we look carefully, then we discover that the fractions from the left-hand side are the sums of some convergent geometric series. By also using the generalized means inequality, we deduce that ∞
∞
∞
X X X a b c + + = an + bn + cn 1−a 1−b 1−c n=1 n=1 n=1 =3·
∞ X a n + bn + cn
3
n=1
=3·
∞ n X 1 n=1
3
∞ X a+b+c n
≥3·
3
n=1
=3·
1 3
1−
1 3
=
3 , 2
which justifies (1). Next we discuss and refine some inequalities to illustrate our technique. 2. DISCOVERING NEW INEQUALITIES. Vasile Cˆırtoaje [2] states that for all nonnegative real numbers a1 , a2 , . . . , ak < 1 satisfying q √ a= a12 + a22 + · · · + ak2 /k ≥ 3/3, we have ka a2 ak a1 ≥ + + ··· + . 1 − a2 1 − a12 1 − a22 1 − ak2
(2)
Here we give a sort of extension. Note that, by using (2) and the inequality between the quadratic mean and the arithmetic mean, we obtain
a1 1 − a12
2
+
a2 1 − a22
2
+ ··· +
ak 1 − ak2
2 ≥ ≥
a1 1−a12
+
a2 1−a22
+ ··· +
ak 1−ak2
2
k ka 2 . (1 − a 2 )2
Hence, the inequality
a1 1 − a12
2
+
a2 1 − a22
2
+ ··· +
ak 1 − ak2
2 ≥
ka 2 , (1 − a 2 )2
(3)
as √ a consequence of (2), holds for all a1 , a2 , . . . , ak ∈ (0, 1), with the condition a ≥ 3/3. √ We use our method to prove that inequality (3) holds without the condition a ≥ 3/3. In this case, using ∞ X n=1
148
nx n =
x , (1 − x)2
x ∈ (0, 1) ,
(4)
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
we have:
a1 1 − a12
2
+
a2 1 − a22
2
+ ··· +
ak 1 − ak2
2 =
∞ ∞ ∞ X X X na12n + na22n + · · · + nak2n n=1
=k·
n=1
∞ X
n·
n=1
n=1
a12n + a22n + · · · + ak2n k
2 n ∞ X a1 + a22 + · · · + ak2 ≥k· n k n=1 ∞ X =k· a 2n n=1
=k·
a2 . (1 − a 2 )2
New inequalities can be discovered via this method, as we will see next. Consider the well-known inequality: a 2 + b2 + c2 ≥ ab + bc + ca,
(5)
which is true for all a, b, c, but we take a, b, c ∈ (−1, 1). For every n ∈ N, a 2n + b2n + c2n ≥ (ab)n + (bc)n + (ca)n ,
(6)
and by addition, ∞ X
a 2n +
n=0
∞ X
b2n +
n=0
∞ X
c2n ≥
n=0
∞ ∞ ∞ X X X (ab)n + (bc)n + (ca)n . n=0
n=0
n=0
We obtain the following nice inequality: 1 1 1 1 1 1 + + ≥ + + , 2 2 2 1−a 1−b 1−c 1 − ab 1 − bc 1 − ca
a, b, c ∈ (−1, 1) .
Further, if we multiply (6) by n and sum, then by (4), we obtain
a 1 − a2
2
+
b 1 − b2
2
+
c 1 − c2
2 ≥
ab bc ca + + . (1 − ab)2 (1 − bc)2 (1 − ca)2
By applying inequality (5) twice, we get a 4n + b4n + c4n ≥ (a 2 bc)n + (ab2 c)n + (abc2 )n , and then using the summation method with respect to n again, we obtain 1 1 1 1 1 1 + + ≥ + + . 4 4 4 2 2 1−a 1−b 1−c 1 − a bc 1 − ab c 1 − abc2 February 2012]
NOTES
149
If we multiply by n before summing, we obtain
a2 1 − a4
2
+
≥ abc
b2 1 − b4
2
+
c2 1 − c4
2
a b c + + . (1 − a 2 bc)2 (1 − ab2 c)2 (1 − abc2 )2
Vasile Cˆırtoaje [1] proved that for all x, y, z ∈ R, 17(x 3 + y 3 + z 3 ) + 45x yz ≥ 32(x 2 y + y 2 z + z 2 x). If we consider x, y, z ∈ (−1, 1) and n ∈ N, then by adding the inequalities n n n 17(x 3n + y 3n + z 3n ) + 45(x yz)n ≥ 32 x 2 y + y 2 z + z 2 x , we obtain the following interesting inequality: 17 17 17 45 32 32 32 + + + ≥ + + . 1 − x3 1 − y3 1 − z3 1 − x yz 1 − x2y 1 − y2z 1 − z2 x Our method is also suitable for obtaining new results related to convexity. Walter Janous [3] showed (z n − x n ) f (y) ≥ (z n − y n ) f (x) + (y n − x n ) f (z),
(7)
where f : [0, ∞) → R is any increasing and concave function, 0 < x ≤ y ≤ z, and n a positive integer. By adding inequalities (7) for n = 0, 1, 2, . . . , with 0 < x ≤ y ≤ z < 1, we get
1 1 − 1−z 1−x
f (y) ≥
1 1 − 1−z 1−y
f (x) +
1 1 − 1−y 1−x
f (z).
By multiplying by (1 − x)(1 − y)(1 − z), we obtain the interesting inequality: (z − x)(1 − y) f (y) ≥ (z − y)(1 − x) f (x) + (y − x)(1 − z) f (z), which shows that the function g : (0, 1) → R given by g(x) = (1 − x) f (x) is also concave. Finally, we use H¨older’s inequality to establish two inequalities that are new, as far we know. Theorem 1. For a, b ∈ (0, 1) and p, q > 0 with p −1 + q −1 = 1, we have q p pq + ≥ 1 − ap 1 − bq 1 − ab and ap bp ab + ≥ . p(1 − a p )2 q(1 − bq )2 (1 − ab)2 150
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
Proof. We use H¨older’s inequality:
ap p
+
bq q
≥ ab. We have
∞ ∞ 1 1 1 1 1 X p n 1 X q n · + · = · (a ) + · (b ) p 1 − ap q 1 − bq p n=0 q n=0
=
∞ X (a n ) p
p
n=0
(bn )q + q
∞ X ≥ (ab)n = n=0
1 . 1 − ab
For the second inequality, we have ∞ ∞ 1 ap 1 bq 1 X 1 X p n · + · = · n(a ) + · n(bq )n p (1 − a p )2 q (1 − bq )2 p n=1 q n=1
n p X ∞ ∞ X (a ) (bn )q ab = n + ≥ n(ab)n = . p q (1 − ab)2 n=1 n=1 ACKNOWLEDGMENTS. This work was supported by a grant of the Romanian National Authority for Scientific Research, CNCS-UEFISCDI, project number PN-II-ID-PCE-2011-3-0087.
REFERENCES 1. V. Cˆırtoaje, Problem 2972, Crux Mathematicorum 6 (2004) 372. 2. , Problem 2983, Crux Mathematicorum 7 (2004) 430. 3. W. Janous, Problem 1861, Crux Mathematicorum 7 (1993) 203. 4. A. M. Nesbitt, Problem 15114, Educational Times 2 (1903) 37–38. Department of Mathematics, Valahia University of Tˆargovis¸te, Romania
[email protected]
Chebyshev Mappings of Finite Fields Julian Rosen, Zachary Scherr, Benjamin Weiss, and Michael E. Zieve Abstract. For a fixed prime p, we consider the set of maps Z/ pZ → Z/ pZ of the form a 7→ Tn (a), where Tn (x) is the degree-n Chebyshev polynomial of the first kind. We observe that these maps form a semigroup, and we determine its size and structure.
1. INTRODUCTION. Some of the “world’s most interesting” polynomials [2] are the Chebyshev polynomials [4], which are defined for any positive integer n to be bn/2c
n − k n−2k−1 n−2k n 2 x . Tn (x) = (−1) n−k k k=0 X
k
The following table lists the first few Chebyshev polynomials: http://dx.doi.org/10.4169/amer.math.monthly.119.02.151 MSC: Primary 37P25, Secondary 11T06, 33C45
February 2012]
NOTES
151
n
Tn (x)
1
x
2
2x 2 − 1
3
4x 3 − 3x
4
8x 4 − 8x 2 + 1
5
16x 5 − 20x 3 + 5x
6
32x 6 − 48x 4 + 18x 2 − 1
7
64x 7 − 112x 5 + 56x 3 − 7x
Chebyshev polynomials have integer coefficients and satisfy Tn (cos θ ) = cos nθ for any θ ∈ R. The induced mappings a 7 → Tn (a) are of particular interest, in part because the identity Tn ◦ Tm = Tnm = Tm ◦ Tn implies that any two such mappings commute. The Chebyshev polynomials induce especially remarkable mappings on the rings Z/ pZ for prime p: for instance, if f (x) ∈ (Z/ pZ)[x] has degree at most p 1/4 , and the map a 7 → f (a) describes a bijection Z/ pZ → Z/ pZ, then f is a composition of Chebyshev polynomials, cyclic polynomials x d , and linear polynomials1 [1, 5]. The purpose of this note is to analyze the collection of maps Z/ pZ → Z/ pZ which are induced by Chebyshev polynomials. Since there are only finitely many maps Z/ pZ → Z/ pZ of any sort, there must be infinitely many pairs (n, m) of distinct positive integers such that Tn and Tm induce the same map Z/ pZ → Z/ pZ. This leads to the following questions. (1) When do Tn and Tm induce the same map Z/ pZ → Z/ pZ? (2) For fixed p, how many distinct maps Z/ pZ → Z/ pZ are induced by Chebyshev polynomials? We can say more about the structure of the collection of maps Z/ pZ → Z/ pZ induced by Chebyshev polynomials, which we will call Chebyshev maps. For, the identity Tn ◦ Tm = Tnm implies that (for a fixed prime p) the set of Chebyshev maps Z/ pZ → Z/ pZ is closed under composition, and hence forms a semigroup. This fact already distinguishes Chebyshev polynomials from most other classes of polynomials, and raises the following question. (3) What is the structure of the semigroup of Chebyshev maps Z/ pZ → Z/ pZ? As often happens, the prime p = 2 behaves differently from other primes. The answers to our questions for p = 2 are as follows. Theorem 1. The polynomials Tn and Tm induce the same map Z/2Z → Z/2Z if and only if n ≡ m (mod 2). There are a total of two Chebyshev maps Z/2Z → Z/2Z, namely the identity and the constant map 1. These form a semigroup isomorphic to Z/2Z under the operation of multiplication. For odd primes p, the answers to our questions are as follows. Theorem 2. Let p be an odd prime. The polynomials Tn and Tm induce the same map Z/ pZ → Z/ pZ if and only if n is congruent to either ±m or ± pm modulo ( p 2 − 1)/2. 1 The
152
coefficients of these linear polynomials are only required to lie in the algebraic closure of Z/ pZ.
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
The number of distinct Chebyshev maps Z/ pZ → Z/ pZ is ( p + 1)( p + 3)/8. The semigroup of Chebyshev maps Z/ pZ → Z/ pZ is isomorphic to the quotient of the multiplicative semigroup Z/(( p 2 − 1)/2)Z by the subgroup {1, −1, p, − p}. Before proving these results, we illustrate Theorem 2 by writing it out in the two smallest cases. •
•
When p = 3, there are three Chebyshev maps on Z/ pZ, induced by T1 , T2 , and T4 . Here T1 is the identity, T4 is the constant map 1, and T2 ◦ T2 = T4 . These three maps comprise the quotient of the semigroup Z/4Z (under multiplication) by the subgroup {1, 3}; the cosets of this subgroup are {1, 3}, {0}, and {2}. When p = 5, there are six Chebyshev maps on Z/ pZ. These maps correspond to the cosets of the subgroup {1, 5, 7, 11} of the semigroup Z/12Z (under multiplication), namely, {0}, {1, 5, 7, 11}, {2, 10}, {3, 9}, {4, 8}, {6}. Here a prescribed coset corresponds to the map a 7 → Tn (a), where n is any positive integer whose image in Z/12Z lies in the prescribed coset. The coset containing 1 is the identity element, and in this case it is the only invertible element in the quotient semigroup. Note that the cosets have sizes 1, 2, and 4. This also holds for larger primes, and will be made explicit in the proof of Theorem 2.
2. EVEN CHARACTERISTIC. In this section we prove the following result, which implies Theorem 1. Proposition 3. If n is even then Tn (x) ≡ 1 (mod 2); if n is odd then Tn (x) ≡ x (mod 2). We begin with an alternate development of Chebyshev polynomials. For any positive integer n, the fundamental theorem of symmetric polynomials [6, p. 99] implies that there is a unique f ∈ Z[x, y] such that f (u + v, uv) = u n + v n . MorePbn/2c over, f (t x, t 2 y) is homogeneous in t of degree n, so f (x, y) = i=0 f i x n−2i y i for Pbn/2c some integers f i . Now put g(x) := f (x, 1) = i=0 f i x n−2i , so that g(u + u −1 ) = u n + u −n . Then h(x) := g(2x)/2 satisfies h((z + z −1 )/2) = (z n + z −n )/2, which for z = eiθ implies that h(cos θ) = cos nθ . Hence h − Tn vanishes at cos θ , and since θ is arbitrary it follows that h = Tn . We now determine the lowest-degree term of h, and use it to compute the reduction of h mod 2. If n is even then substituting u = −v yields 2v n = (−v)n + v n = f (0, −v 2 ) = f n/2 · (−v 2 )n/2 , Pbn/2c so that f n/2 = 2 · (−1)n/2 . Since h = i=0 f i x n−2i 2n−2i−1 and each f i is an integer, it follows for even n that h ≡ 1 (mod 2). If n is odd then (n−1)/2 X u n + vn = f i (u + v)n−1−2i (uv)i ; u+v i=0
substituting u = −v on the right yields f (n−1)/2 (−v 2 )(n−1)/2 , and evaluating the left side at u = −v (for instance, via l’Hˆopital’s rule) yields nv n−1 . Thus we find that February 2012]
NOTES
153
f (n−1)/2 = n · (−1)(n−1)/2 is odd, so that h ≡ x (mod 2). This proves the proposition (and more). 3. ODD CHARACTERISTIC. In this section we prove Theorem 2. Let p be an odd prime, and write F p and F p for the field Z/ pZ and its algebraic closure. As noted in the previous section, Tn ((z + z −1 )/2) = (z n + z −n )/2. ∗
Lemma 4. For any α ∈ F p , the number of elements β ∈ F p such that β + β −1 = α is either one or two, and if it is two then the elements are reciprocals of one another. ∗
Proof. For β ∈ F p , the equality β + β −1 = α holds precisely when β is a root of ∗ x 2 − αx + 1. But this polynomial has either one or two roots in F p , and if it has two then they are reciprocals. For any α ∈ F p , write α = β + β −1 with β as in the lemma; then, since pth powering is an automorphism of F p which fixes α, we have β p + β − p = α p = α = β + β −1 , so the lemma implies that β p ∈ {β, β −1 }, whence β p±1 = 1. Conversely, if β ∈ F p satisfies β p±1 = 1, then β + β −1 is fixed by pth powering, and hence lies in F p . Thus the elements of F p are precisely the elements (β + β −1 )/2 where β ∈ F p and β p±1 = 1. Now, if β p±1 = 1 then Tn
β + β −1 2
= Tm
β + β −1 2
⇔ β n + β −n = β m + β −m ⇔ either
βn = βm
⇔ either
β n−m = 1
or or
β n = β −m β n+m = 1.
Letting σn denote the map F p → F p defined by a 7 → Tn (a), it follows that σn = σm if and only if every ( p ± 1)th root of unity in F p is either an (n − m)th root of unity or an (n + m)th root of unity. Since F p contains both primitive ( p + 1)th roots of unity and primitive ( p − 1)th roots of unity, this says that σn = σm if and only if n ≡ ±m (mod p + 1) and n ≡ ±m (mod p − 1), or equivalently, n ≡ ±m or ± pm (mod ( p 2 − 1)/2). We have shown that the number of maps F p → F p induced by Chebyshev polynomials equals the number of orbits of the action of multiplication by {1, −1, p, − p} on residue classes mod ( p 2 − 1)/2. There are precisely two orbits of size 1, namely {0} and {( p 2 − 1)/4}. The orbits of size 2 are {±k( p − 1)/2} for k = 1, 2, . . . , ( p − 1)/2 and {±`( p + 1)/2} for ` = 1, 2, . . . , ( p − 3)/2. The remaining ( p 2 − 4 p + 3)/2 residue classes split into orbits of size 4. Hence the number of distinct orbits, which equals the number of distinct maps σn , is ( p 2 + 4 p + 3)/8. Finally, since Tn ◦ Tm = Tnm , the map n 7 → σn is a semigroup homomorphism from the multiplicative semigroup of positive integers to the semigroup of maps F p → F p induced by Chebyshev polynomials. Since we showed above that σn = σm precisely when n ≡ ±m or ± pm (mod ( p 2 − 1)/2), it follows that the semigroup of Chebyshev maps F p → F p is isomorphic to the quotient of the multiplicative semigroup Z/(( p 2 − 1)/2)Z by the subgroup {±1, ± p}. 154
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
4. FINAL REMARKS. It would be interesting to consider similar questions over more general fields or rings. Proposition 3 shows that if K is any commutative ring of characteristic 2 then the identity map and the constant map 1 are the only maps K → K induced by Chebyhsev polynomials. If K is a finite field whose order q is odd, then the proof of Theorem 2 shows that the number of Chebyshev maps K → K is (q + 1)(q + 3)/8, and the semigroup of Chebyshev maps is the quotient of the multiplicative semigroup Z/((q 2 − 1)/2)Z by the subgroup {1, −1, q, −q}. Theorem 2 implies that, for any odd prime p, the group of permutations of Z/ pZ induced by Chebyshev polynomials, or equivalently the group of invertible elements in our semigroup, is the quotient group (Z/(( p 2 − 1)/2)Z)∗ /h−1, pi. This recovers the main result of [3]. Finally, we note that when examining Chebyshev-like mappings of arbitrary fields K , it is often convenient to treat the related class of Dickson polynomials. These are defined for any positive integer n and any α ∈ K by bn/2c
Dn (x, α) =
X k=0
n−k n (−α)k x n−2k n−k k
n n−k (it turns out that n−k is an integer). If 2α 6 = 0 then the Dickson polynomial is k related to the Chebyshev polynomial via the change of variables √ n x . Dn (x, α) = 2 α · Tn √ 2 α ACKNOWLEDGMENTS. We thank Florian Block, Kevin Carde, Jeff Lagarias, and the referees for valuable suggestions which improved the exposition in this paper. The first, third and fourth authors were partially supported by the NSF under grants DMS-0502170, DMS-0801029, and DMS-0903420, respectively. The second author was supported by an NSF Graduate Research Fellowship.
REFERENCES 1. M. Fried, On a conjecture of Schure, Michigan Math. J. 17 (1970) 41–55; available at http://dx.doi. org/10.1307/mmj/1029000374. 2. R. Lidl, G. L. Mullen, The world’s most interesting class of integral polynomials, J. Combin. Math. Comnin. Comput. 37 (2001) 87–100. ¨ 3. W. N¨obauer, Uber eine Klasse von Permutationspolynomen und die dadurch dargestellten Gruppen, J. Reine Angew. Math. 231 (1968) 216–219. 4. T. J. Rivlin, Chebyshev Polynomials: From Approximation Theory to Algebra and Number Theory, second edition. John Wiley, New York, 1990. 5. G. Turnwald, On Schur’s conjecture, J. Austral. Math. Soc. Ser. A 58 (1995) 312–357; available at http: //dx.doi.org/10.1017/S1446788700038349. 6. B. L. van der Waerden, Algebra, Vol. I. Springer-Verlag, New York, 1991. Department of Mathematics, University of Michigan, Ann Arbor, MI 48109 {rosenjh, zscherr, blweiss, zieve}@umich.edu
February 2012]
NOTES
155
Collapsing Walls Theorem Igor Pak and Rom Pinchasi
Abstract. Let P ⊂ R3 be a pyramid with the base a convex polygon Q. We show that when other faces are collapsed (rotated around the edges onto the plane spanned by Q), they cover the whole base Q.
1. INTRODUCTION. Let P be a convex pyramid in R 3 over the base Q, which is a convex polygon in a horizontal plane. Think of the other faces F of P as the “walls” of a wooden box, and that each wall F is hinged to the base Q along the edge. Suppose now that the walls are “collapsed,” i.e., rotated around the edges towards the base onto the horizontal plane. The question is: do they cover the whole base Q? At first, this may seem obvious, but in fact the problem is already nontrivial even in the case of four-sided pyramids, which can possibly have some obtuse dihedral angles (see Figure 1). Formally, we have the following result:
Figure 1. An impossible configuration of four collapsing walls of a pyramid leaving a hole in the base.
Collapsing Walls Theorem. Let P ⊂ R3 be a pyramid over a convex polygon Q. For a face F of P, denote by e F the edge between F and the base: e F = F ∩ Q, and let A F denote the result of rotation of F around e F in the direction of P, onto the plane which contains Q. Then Q ⊆ ∪F A F , where the union is over all faces F of P, different from Q. For example, suppose pyramid P in the theorem has a very large height, and all walls are nearly vertical. The theorem then implies that every point O ∈ Q has an orthogonal projection into the interior of some edge e of Q. This is a classical result with a number of far-reaching generalizations (see [4, §9]). Thus, the collapsing walls theorem can be viewed as yet another generalization of this result (see Section 3). 2. PROOF OF THE THEOREM. Consider R3 endowed with the standard Cartesian coordinates (x, y, z). Without loss of generality assume that the plane H spanned by Q is horizontal, i.e., given by z = 0, and that P is contained in the half-space z ≥ 0. http://dx.doi.org/10.4169/amer.math.monthly.119.02.156 MSC: Primary 52B10
156
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
Denote by F1 , . . . , Fm the faces of P different from Q. For 1 ≤ i ≤ m, denote by Hi the plane spanned by Fi and by ei = Fi ∩ Q the edge of Q adjacent to Fi . Denote by 8i the rotation about ei of Hi onto H (the rotation is performed in the direction dictated by P, so that throughout the rotation Hi intersects the interior of P). Similarly, let Ai = 8i (Fi ) be the rotation of the face F of P onto Q, 1 ≤ i ≤ m. We m need to show that every point in Q lies in ∪i=1 Ai . Without loss of generality and in order to simplify the presentation, we can take this point to be the origin O. Further, denote by L i = Hi ∩ H the line through ei . Let ri be the distance from the origin to L i , and let αi be the dihedral angle of P at ei , i.e., the angle between H and Hi which contains P. Suppose now F1 is a face such that τi = ri · tan
αi is minimized at τ1 . 2
We will show that the origin O is contained in A1 . In other words, we prove that if O∈ / A1 , then τi < τ1 for some i > 1. Let B ∈ H1 be such that the rotation of B onto Q is the origin: 81 (B) = O. It suffices to show that B ∈ F1 . Let v = (a, b, 0) be the unit vector that is normal to L 1 in the horizontal plane and pointing outwards from Q. It is easy to see that −→ O B = r1 (1 − cos α1 )a, r1 (1 − cos α1 )b, r1 sin α1 . To prove the theorem, assume to the contrary that B ∈ / F1 . Then there exists a face of P, say F2 , such that H2 separates B from the origin. Denote by C the closest point to B on L 2 , and by α 0 the angle between the line BC and the horizontal plane H , where the angle is taken with the half-plane of H which contains Q (and thus the origin). In this notation, the above condition implies that α 0 > α2 . Without loss of generality we may assume that line L 2 is given by the equations y = r2 and z = 0. Then C = r1 (1 − cos α1 )a, r2 , 0 , and r2 − r1 (1 − cos α1 )b d =q . cos α 0 = cos OCB r12 sin2 α1 + (r2 − r1 (1 − cos α1 )b)2 √ Note that the quantity t/ a 2 + t 2 is monotone increasing as a function of t, and that b ≤ 1. We get r2 − r1 (1 − cos α1 ) cos α 0 ≥ q . r12 sin2 α1 + (r2 − r1 (1 − cos α1 ))2 Applying cos α 0 < cos α2 , we conclude: r2 − r1 (1 − cos α1 ) q < cos α2 . r12 sin2 α1 + (r2 − r1 (1 − cos α1 ))2
(1)
Recall the assumption that τ1 ≤ τ2 . This gives r1 tan α21 ≤ r2 tan α22 , or February 2012]
NOTES
157
tan α21 r2 ≥ . r1 tan α22
(2)
The rest of this section is dedicated to showing that (1) and (2) cannot both be true. This gives a contradiction with our assumptions and proves the claim. We split the proof into two cases depending on whether the dihedral angle α2 is acute or obtuse. In each case we repeatedly rewrite (1) and (2), eventually leading to a contradiction. Case 1 (obtuse angles). Suppose cos α1 ) < 0. Now (1) implies 1+
π 2
< α2 < π. In this case cos α2 < 0 and r2 − r1 (1 −
r12 sin2 α1 1 < , (r2 − r1 (1 − cos α1 ))2 cos2 α2
(3)
r1 sin α1 > tan α2 . r2 − r1 (1 − cos α1 )
(4)
and
This can be further rewritten as: r2 sin α1 < 1 − cos α1 + . r1 tan α2
(5)
Now (5) and (2) together imply tan α21 sin α1 , α2 < 1 − cos α1 + tan 2 tan α2 which is impossible. Indeed, suppose for some α and β satisfying 0 < α, β < π we have tan α2 tan
β 2
< 1 − cos α +
sin α . tan β
(6)
Dividing both sides by tan α2 , after some easy manipulations, we conclude that (6) is equivalent to 1 tan
β 2
1 + cos α , tan β
(7)
sin β < cos(α − β).
(8)
< sin α +
which in turn is equivalent to 1 tan β2
1 − tan β
!
Since the left-hand side of (8) is equal to 1, we get a contradiction and complete the proof in Case 1. Case 2 (right and acute angles). Suppose now that 0 < α2 ≤ π2 . Then cos α2 ≥ 0, and 0 < tan α22 ≤ 1. Let us first show that the numerator of (1) is nonnegative, i.e., that r2 ≥ r1 (1 − cos α1 ). From the contrary assumption we have r2 /r1 < (1 − cos α1 ). Together 158
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
with (2), this implies: tan α21 r2 α1 1 − cos α1 > ≥ ≥ tan , r1 tan α22 2 which is impossible whenever 0 < α1 < π. From the above, we can exclude the right angle case α2 = π2 , for otherwise the left-hand side of (1) is nonnegative, while the right-hand side is equal to zero. Thus, cos α2 > 0. Therefore, the inequality (1) in this case can be rewritten as 1+
r12 sin2 α1 1 > , (r2 − r1 (1 − cos α1 ))2 cos2 α2
(9)
and r1 sin α1 > tan α2 . r2 − r1 (1 − cos α1 )
(10)
Note now that (10) coincides with (4). Since (6) does not hold for any α and β satisfying 0 < α, β < π , we obtain the contradiction verbatim as in the proof of Case 1. This completes the analysis of Case 2 and finishes the proof of the theorem. 3. FINAL REMARKS. 3.1. The collapsing walls theorem extends verbatim to higher dimensions. Moreover, it also extends to every polytope P ⊂ Rd , as follows. For each facet F of P, let HF denote the hyperplane containing F. Fix one facet Q of P. If all other facets F of P are rotated around the affine subspace HF ∩ HQ onto HQ (or if HF is parallel to HQ we just consider the orthogonal projection of F onto HQ ), then they cover the whole facet Q. We refer to [5], where this result is proved in full generality. We should mention that after we advertised the result in this paper, other people (we should mention here personal communications with Arseniy Akopyan and independently with G¨unter Rote) found alternative elementary and beautiful proofs that are not more complicated and perhaps technically even easier for the simple three-dimensional case presented in this paper. However, we have not yet seen another proof (simple or not) that generalizes to higher dimensions or to the case of a general convex polytope, as does the argument in this paper. 3.2. Let us note that when the walls of a pyramid are collapsed outside, rather than onto the base, they are pairwise nonintersecting (see Figure 2). We leave this easy exercise to the reader.
Figure 2. Walls of a pyramid collapsing outside the base do not intersect.
February 2012]
NOTES
159
3.3. Continuing with the example of “vertical walls” as given in the introduction right after the theorem, recall that for the center of mass O = cm(Q), there are at least two edges onto which orthogonal projection of O lies in the interior (see, e.g., [4, §9]).1 It would be interesting to see if this result extends to the setting of the theorem (of course, the notion of the center of mass would have to be modified appropriately). Let us note here that the center of mass result is closely related to the four vertex theorem [6], and fails in higher dimension [2]. 3.4. The proof of the theorem is based on an implicit subdivision of Q given by the smallest of the linear functions τi at every point O ∈ Q. Recall that τi is a weighted distance to the edge ei . Thus this subdivision is in fact a weighted analogue of the dual Voronoi subdivision in the plane (see [1, 3]). As a consequence, computing this subdivision can be done efficiently, both theoretically and practically. ACKNOWLEDGMENTS. The authors are thankful to Yuri Rabinovich for his interest in the problem. The first author was partially supported by the National Security Agency and the National Science Foundation. The second author was supported by the Israeli Science Foundation (grant No. 938/06).
REFERENCES 1. F. Aurenhammer, Voronoi diagrams—A survey of a fundamental geometric data structure, ACM Comput. Surv. 23 (1991) 345–405; available at http://dx.doi.org/10.1145/116873.116880. 2. J. H. Conway, M. Goldberg, R. K. Guy, Problem 66-12, SIAM Review 11 (1969) 78–82; available at http://dx.doi.org/10.1137/1011014. 3. S. Fortune, Voronoi Diagrams and Delaunay Triangulations, Computing in Euclidean Geometry, second edition. Edited by F. Hwang and D.-Z. Du, Lecture Notes Ser. Comput. 4, 225–265, World Scientific, Singapore, 1995. 4. I. Pak, Lectures on Discrete and Polyhedral Geometry, monograph (to appear). Available at http://www. math.ucla.edu/~pak/book.htm 5. I. Pak, R. Pinchasi, How to cut out a convex polyhedron, (to appear). 6. S. Tabachnikov, Around four vertices, Russian Math. Surveys 45 (1990) 229–230; available at http: //dx.doi.org/10.1070/RM1990v045n01ABEH002326. Department of Mathematics, UCLA, Los Angeles, CA
[email protected] Mathematics Department, Technion—Israel Institute of Technology, Haifa 32000, Israel
[email protected]
1 One can give a construction in which there is only one such edge, if the center of mass is replaced by a general point in Q (see [2] and [4, §9]).
160
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 119
PROBLEMS AND SOLUTIONS Edited by Gerald A. Edgar, Doug Hensley, Douglas B. West with the collaboration of Mike Bennett, Itshak Borosh, Paul Bracken, Ezra A. Brown, Randall Dougherty, Tam´as Erd´elyi, Zachary Franco, Christian Friesen, Ira M. Gessel, L´aszl´o Lipt´ak, Frederick W. Luttmann, Vania Mascioni, Frank B. Miles, Bogdan Petrenko, Richard Pfiefer, Cecil C. Rousseau, Leonard Smiley, Kenneth Stolarsky, Richard Stong, Walter Stromquist, Daniel Ullman, Charles Vanden Eynden, Sam Vandervelde, and Fuzhen Zhang.
Proposed problems and solutions should be sent in duplicate to the MONTHLY problems address on the back of the title page. Proposed problems should never be under submission concurrently to more than one journal. Submitted solutions should arrive before April 30, 2012. Additional information, such as generalizations and references, is welcome. The problem number and the solver’s name and address should appear on each solution. An asterisk (*) after the number of a problem or a part of a problem indicates that no solution is currently available.
PROBLEMS 11621. Proposed by Z. K. Silagadze, Budker Institute of Nuclear Physics and Novosibirsk State University, Novosibirsk, Russia. Find Z ∞ Z s1 Z s2 Z s3 cos(s12 − s22 ) cos(s32 − s42 ) ds4 ds3 ds2 ds1 . s1 =−∞
s2 =−∞
s3 =−∞
s4 =−∞
11622. Proposed by Oleh Faynshteyn, Leipzig, Germany. In triangle ABC, let A1 , B1 , C1 be the points opposite A, B, C at which symmedians of the triangle meet the opposite sides. Prove that m a (c cos α1 − b cos α2 ) + m b (a cos β1 − c cos β2 ) + m c (b cos γ1 − a cos γ2 ) = 0, m a (sin α1 − sin α2 ) + m b (sin β1 − sin β2 ) + m c (sin γ1 − sin γ2 ) = 0, and m a (cos α1 + cos α2 ) + m b (cos β1 + cos β2 ) + m c (cos γ1 + cos γ2 ) = 3s, where a, b, c are the lengths of the sides, m a , m b , m c are the lengths of the medians, s is the semiperimeter, α1 = ∠C A A1 , α2 = ∠A1 AB, and similarly with the β j and γ j . 11623. Proposed by Aruna Gabhe, Pendharkar’s College, Dombivali, India, and M. N. Deshpande, Nagpur, India. A fair coin is tossed n times and the results recorded as a bit string. A run is a maximal subsequence of (possibly just one) identical tosses. Let the random variable X n be the number of runs in the bit string not immediately followed by a longer run. (For instance, with bit string 1001101110, there are six runs, of lengths 1, 2, 2, 1, 3, and 1. Of these, the 2nd, 3rd, 5th, and 6th are not followed by a longer run, so X 10 = 4.) Find E(X n ). 11624. Proposed by David Callan, University of Wisconsin, Madison, WI, and Emeric Deutsch, Polytechnic Institute of NYU, Brooklyn, NY. A Dyck n-path is a lattice path of http://dx.doi.org/10.4169/amer.math.monthly.119.02.161
n upsteps U (changing by (1, 1)) and n downsteps D (changing by (1, −1)) that starts at the origin and never goes below the x-axis. A peak is an occurrence of UD, and the peak height is the y-coordinate of the vertex between its U and D. The peak heights multiset of a Dyck path is the set of peak heights for that Dyck path, with multiplicity. For instance, the peak heights multiset of the Dyck 3-path UUDUDD is {2, 2}. In terms of n, how many different multisets occur as the peak heights multiset of a Dyck n-path?

11625. Proposed by Lane Bloome, Peter Johnson, and Nathan Saritzky (students), Auburn University Research Experience for Undergraduates in Algebra and Discrete Mathematics 2011. Let V(G), E(G), and χ(G) denote respectively the vertex set, edge set, and chromatic number of a simple graph G. For each positive integer n, let g(n) and h(n) respectively denote the maximum and the minimum of χ(G) + χ(H) − χ(G ∪ H) over all pairs of simple graphs G and H with |V(G) ∪ V(H)| ≤ n and E(G) ∩ E(H) = ∅. Find g(n) and $\lim_{n\to\infty} h(n)/n$.

11626. Proposed by Cezar Lupu, University of Pittsburgh, Pittsburgh, PA. Let $x_1$, $x_2$, and $x_3$ be positive numbers such that $x_1 + x_2 + x_3 = x_1x_2x_3$. Treating indices modulo 3, prove that
$$\sum_{k=1}^{3}\frac{1}{\sqrt{x_k^2+1}} \;\le\; \sum_{k=1}^{3}\frac{1}{x_k^2+1} + \frac{1}{2}\sum_{k=1}^{3}\frac{1}{\sqrt{(x_k^2+1)(x_{k+1}^2+1)}} \;\le\; \frac{3}{2}.$$
11627. Proposed by Samuel Alexander, The Ohio State University, Columbus, Ohio. Let $\mathbb{N}$ be the set of nonnegative integers. Let M be the set of all functions from $\mathbb{N}$ to $\mathbb{N}$. For a function $f_0$ from an interval [0, m] in $\mathbb{N}$ to $\mathbb{N}$, say that f extends $f_0$ if $f(n) = f_0(n)$ for $0 \le n \le m$. Let $F(f_0)$ be the set of all extensions in M of $f_0$, and equip M with the topology in which the open sets of M are unions of sets of the form $F(f_0)$. Thus, $\{f \in M : f(0) = 7 \text{ and } f(1) = 11\}$ is an open set. Let S be a proper subset of M that can be expressed both as $\bigcup_{i\in\mathbb{N}}\bigcap_{j\in\mathbb{N}} X_{i,j}$ and as $\bigcap_{i\in\mathbb{N}}\bigcup_{j\in\mathbb{N}} Y_{i,j}$, where each set $X_{i,j}$ or $Y_{i,j}$ is a subset of M that is both closed and open (clopen). Show that there is a family $Z_{i,j}$ of clopen sets such that $S = \bigcup_{i\in\mathbb{N}}\bigcap_{j\in\mathbb{N}} Z_{i,j}$ and $S = \bigcap_{i\in\mathbb{N}}\bigcup_{j\in\mathbb{N}} Z_{i,j}$.
SOLUTIONS

Eigenvalues of Sums and Differences of Idempotent Matrices

11466 [2009, 845]. Proposed by Tian Yongge, Central University of Finance and Economics, Beijing, China. For a real symmetric n × n matrix A, let $r(A)$, $i_+(A)$, and $i_-(A)$ denote the rank, the number of positive eigenvalues, and the number of negative eigenvalues of A, respectively. Let $s(A) = i_+(A) - i_-(A)$. Show that if P and Q are symmetric n × n matrices, $P^2 = P$, and $Q^2 = Q$, then $i_+(P - Q) = r(P + Q) - r(Q)$, $i_-(P - Q) = r(P + Q) - r(P)$, and $s(P - Q) = r(P) - r(Q)$.

Solution by Oskar Maria Baksalary, Adam Mickiewicz University, Poznań, Poland, and Götz Trenkler, Dortmund University of Technology, Dortmund, Germany. We solve the more general problem in which idempotent P and Q are Hermitian with complex entries. We view them as n × n complex orthogonal projectors.
The solution is based on a joint decomposition of the projectors P and Q. Let P have rank ρ, where 0 < ρ ≤ n. By the Spectral Theorem, there is an n × n unitary
matrix U such that
$$P = U\begin{pmatrix} I_\rho & 0 \\ 0 & 0 \end{pmatrix}U^*,$$
where $I_\rho$ is the identity matrix of order ρ and $U^*$ denotes the conjugate transpose of U. We use this expression for P to partition the projector Q. Using the same matrix U, we write
$$Q = U\begin{pmatrix} A & B \\ B^* & D \end{pmatrix}U^*,$$
where A and D are Hermitian matrices of orders ρ and n − ρ, respectively. Let $\bar A = I_\rho - A$. Since $Q^2 = Q$, we have $\bar A = \bar A^2 + BB^* = \bar A\bar A^* + BB^*$. Since $\bar A\bar A^*$ and $BB^*$ are both nonnegative definite, $\bar A$ is also nonnegative definite. With $\mathcal{R}(\cdot)$ denoting the column space of a matrix argument, we obtain
$$\mathcal{R}(\bar A) = \mathcal{R}(\bar A\bar A^* + BB^*) = \mathcal{R}(\bar A\bar A^*) + \mathcal{R}(BB^*) = \mathcal{R}(\bar A) + \mathcal{R}(B),$$
and hence $\mathcal{R}(B) \subseteq \mathcal{R}(\bar A)$. Other relationships among A, B, and D are found in Lemmas 1–5 of [1]; we use two of these. The first expresses the orthogonal projector $P_D$ onto the column space of D as $P_D = D + B^*\bar A^{\dagger}B$, where $\bar A^{\dagger}$ is the Moore–Penrose inverse of $\bar A$. The second expresses the rank of $\bar A$ as $r(\bar A) = \rho - r(A) + r(B)$. Furthermore, Theorem 1 of [1] gives $r(Q) = r(A) - r(B) + r(D)$, and Lemma 6 of [1] gives $r(P + Q) = \rho + r(D)$. Taking differences of these expressions yields $r(P + Q) - r(Q) = \rho - r(A) + r(B)$ and $r(P + Q) - r(P) = r(D)$. Since the third of the desired equations is just the difference of the first two, it suffices to show that $P - Q$ has $r(\bar A)$ positive eigenvalues and $r(D)$ negative eigenvalues.
Theorem 5 in [1] expresses $P - Q$ as
$$P - Q = U\begin{pmatrix} \bar A & -B \\ -B^* & -D \end{pmatrix}U^*,$$
which can be rewritten as
$$P - Q = U\begin{pmatrix} I_\rho & 0 \\ -B^*\bar A^{\dagger} & I_{n-\rho} \end{pmatrix}\begin{pmatrix} \bar A & 0 \\ 0 & -P_D \end{pmatrix}\begin{pmatrix} I_\rho & -\bar A^{\dagger}B \\ 0 & I_{n-\rho} \end{pmatrix}U^*.$$
The matrices before and after the central matrix in the product on the right are nonsingular and are conjugate transposes of each other. By Sylvester’s Law of Inertia (see [2, Section 1.3]), the numbers of positive and negative eigenvalues are unchanged by conjugation. Since $\bar A$ and $P_D$ are nonnegative definite (the eigenvalues of an idempotent matrix lie in the interval [0, 1]), we conclude that $P - Q$ has $r(\bar A)$ positive eigenvalues and $r(D)$ negative eigenvalues, as desired.
[1] O. M. Baksalary and G. Trenkler, Eigenvalues of functions of orthogonal projectors, Linear Alg. Appl. 431 (2009) 2172–2186.
[2] R. A. Horn, F. Zhang, Basic properties of the Schur complement, in The Schur Complement and its Applications, edited by F. Zhang, Springer Verlag, New York, 2005, 17–46.
Also solved by R. Chapman (U. K.), E. A. Herman, O. Kouba (Syria), J. H. Lindsey II, K. Schilling, J. Simons (U. K.), R. Stong, Z. Vörös (Hungary), S. Xiao (Canada), GCHQ Problem Solving Group (U. K.), and the proposer.
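For readers who want to experiment, here is a small numerical check of the three identities. It is not part of the published solution; the helper functions random_projector and inertia are our own illustrative choices.

```python
import numpy as np

def random_projector(n, k, rng):
    """Orthogonal projector onto a random k-dimensional subspace of C^n."""
    M = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
    Q, _ = np.linalg.qr(M)          # orthonormal basis of the column space
    return Q @ Q.conj().T

def inertia(H, tol=1e-9):
    """Numbers of positive and negative eigenvalues of a Hermitian matrix."""
    w = np.linalg.eigvalsh(H)
    return int(np.sum(w > tol)), int(np.sum(w < -tol))

rank = lambda A: np.linalg.matrix_rank(A, tol=1e-9)

rng = np.random.default_rng(0)
P = random_projector(6, 3, rng)
Q = random_projector(6, 2, rng)

i_plus, i_minus = inertia(P - Q)
print(i_plus == rank(P + Q) - rank(Q))        # expect True
print(i_minus == rank(P + Q) - rank(P))       # expect True
print(i_plus - i_minus == rank(P) - rank(Q))  # expect True
```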
A Hankel Determinant Limit

11471 [2009, 941]. Proposed by Finbarr Holland, University College Cork, Cork, Ireland. Let A be an r × r matrix with distinct eigenvalues $\lambda_1, \ldots, \lambda_r$. For n ≥ 0, let a(n) be the trace of $A^n$. Let H(n) be the r × r Hankel matrix with (i, j) entry $a(i + j + n - 2)$. Show that
$$\lim_{n\to\infty} |\det H(n)|^{1/n} = \prod_{k=1}^{r} |\lambda_k|.$$
Solution by Jim Simons, Cheltenham, U. K. The eigenvalues of $A^n$ are $\lambda_1^n, \ldots, \lambda_r^n$, so $a(n) = \sum_{k=1}^{r}\lambda_k^n$. Therefore, $H(n)_{i,j} = \sum_{k=1}^{r}\lambda_k^{n+i+j-2}$. It is well known that the Vandermonde matrix V, given by $V_{i,j} = \lambda_i^{j-1}$ for $i, j \in \{1, \ldots, r\}$, has determinant $\prod_{j<i}(\lambda_i - \lambda_j)$. Let $V(n)$ be the matrix with $V(n)_{i,j} = \lambda_i^{n+j-1}$; factoring $\lambda_k^n$ from row k shows that $\det V(n) = \prod_{k=1}^{r}\lambda_k^{n}\,\det V$. Then
$$\bigl(V'V(n)\bigr)_{i,j} = \sum_{k=1}^{r}\lambda_k^{i-1}\lambda_k^{n+j-1} = \sum_{k=1}^{r}\lambda_k^{n+i+j-2} = H(n)_{i,j}.$$
Therefore,
$$\det H(n) = \det\bigl(V'V(n)\bigr) = \prod_{k=1}^{r}\lambda_k^{n}\prod_{j<i}(\lambda_i - \lambda_j)^2.$$
The second factor is constant, so its nth root tends to 1. Editorial comment. Simons observed that the computation of det H (n) is valid over any field. Also solved by R. Chapman (U. K.), M. Goldenberg & M. Kaplan, J.-P. Grivaux (France), E. A. Herman, O. Kouba (Syria), J. H. Lindsey II, O. P. Lossers (Netherlands), R. Stong, E. I. Verriest, GCHQ Problem Solving Group (U. K.), Microsoft Research Problems Group, and the proposer.
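As an illustrative aside, not part of the published solution, the limit is easy to observe numerically; the matrix and eigenvalues below are an arbitrary choice.

```python
import numpy as np

eigs = np.array([1.3, -1.2, 1.1])   # arbitrary distinct eigenvalues
A = np.diag(eigs)                   # any matrix with these eigenvalues would do
r = len(eigs)

def a(m):
    """Trace of A^m."""
    return np.trace(np.linalg.matrix_power(A, m))

target = np.prod(np.abs(eigs))      # the claimed limit |l1 l2 l3|
for n in (5, 20, 60):
    H = np.array([[a(i + j + n - 2) for j in range(1, r + 1)]
                  for i in range(1, r + 1)])
    print(n, abs(np.linalg.det(H)) ** (1.0 / n), target)
# The printed estimate approaches the target value as n grows.
```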
Pretty Boxes All in a Row

11477 [2010, 86]. Proposed by Antonio González, Universidad de Sevilla, Seville, Spain, and José Heber Nieto, Universidad del Zulia, Maracaibo, Venezuela. Several boxes sit in a row, numbered from 0 on the left to n on the right. A frog hops from box to box, starting at time 0 in box 0. If at time t, the frog is in box k, it hops one box to the left with probability k/n and one box to the right with probability 1 − k/n. Let $p_t(k)$ be the probability that the frog launches its (t + 1)th hop from box k. Find $\lim_{i\to\infty} p_{2i}(k)$ and $\lim_{i\to\infty} p_{2i+1}(k)$.

Solution by Robin Chapman, University of Exeter, Exeter, U. K. We show that $\lim_{i\to\infty} p_{2i}(k)$ is $\binom{n}{k}/2^{n-1}$ when k is even and 0 when k is odd. Also, $\lim_{i\to\infty} p_{2i+1}(k)$ is $\binom{n}{k}/2^{n-1}$ when k is odd and 0 when k is even.
In standard language, we have a Markov chain with states 0, . . . , n and transition probabilities $p_{k,k-1} = k/n$ and $p_{k,k+1} = 1 - k/n$ (all others equal 0). This Markov chain is periodic with period 2, since the state switches parity on each move. Thus $p_j(k) = 0$ when j and k have opposite parity. Taking two hops at once converts the Markov chain into two others, one on the odd states and one on the even states. Each is ergodic and thus has a unique stationary
distribution. (For a chain to be ergodic it suffices that one can reach any state from any other and that it is possible to remain in the current state at any step.) The stationary distributions by definition are the limits to be computed, so the limits exist.
Let $a_k = \lim_{i\to\infty} p_{2i}(k)$ for even k and $a_k = \lim_{i\to\infty} p_{2i-1}(k)$ for odd k. In order for these to form stationary distributions, we must have $\sum_{k=0}^{n} a_k = 2$ and
$$a_k = \frac{n-k+1}{n}\,a_{k-1} + \frac{k+1}{n}\,a_{k+1}$$
for 0 ≤ k ≤ n (with $a_{-1} = a_{n+1} = 0$). These linear equations determine the n + 1 values $\{a_k\}_{k=0}^{n}$. Therefore, it suffices to check that setting $a_k = \binom{n}{k}/2^{n-1}$ for all k satisfies the equations.
Editorial comment. Stephen J. Herschkorn wrote “The problem begs the question as to why, from a probabilistic point of view, the binomial should be the stationary distribution for this simple random walk.” Herschkorn communicated the following intuition from Sheldon Ross: Flip n fair coins; the number of heads has a binomial distribution. Pick a random coin and turn it over. The new number of heads arises from the old by the same transition probability as in the random-walk model, but the new number of heads still has the binomial distribution, because each coin still has probability 1/2 of being heads. Daniel M. Rosenblum noted a similarity to Problem 11032 (2003, 637), in which the frog’s probabilities of jumping to the right and left are reversed, yielding the same Markov chain as the Ehrenfest urn model (see, for example, Sections 4 and 5 of M. Kac, Random Walk and the Theory of Brownian Motion, Amer. Math. Monthly 54 (1947) 369–391). Some solvers used generating functions. It is also possible to avoid mentioning the theorem on stationary distributions and instead prove that the limits exist by direct methods particular to the problem.
Also solved by A. Agnew, M. Andreoli, D. Beckwith, K. David & P. Fricano, D. Fleischman, O. Geupel (Germany), C. González-Alcón & Á. Plaza (Spain), S. J. Herschkorn, O. Kouba (Syria), J. H. Lindsey II, O. P. Lossers (Netherlands), D. M. Rosenblum, R. K. Schwartz, J. Simons (U. K.), N. C. Singer, R. Stong, R. Tauraso (Italy), M. Tetiva (Romania), GCHQ Problem Solving Group (U. K.), NSA Problems Group, Skidmore College Problem Group, and the proposer.
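The stationary-distribution claim is also easy to test by iteration. The sketch below is not part of the published solution; it simply runs the chain for an even number of hops and compares the result with the binomial limit.

```python
import numpy as np
from math import comb

n = 6
T = np.zeros((n + 1, n + 1))
for k in range(n + 1):
    if k > 0:
        T[k, k - 1] = k / n          # hop one box to the left
    if k < n:
        T[k, k + 1] = 1 - k / n      # hop one box to the right

p = np.zeros(n + 1)
p[0] = 1.0                           # frog starts in box 0
for _ in range(2000):                # an even number of hops
    p = p @ T

expected = np.array([comb(n, k) / 2 ** (n - 1) if k % 2 == 0 else 0.0
                     for k in range(n + 1)])
print(np.allclose(p, expected, atol=1e-9))   # expect True
```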
Separating the Degrees of Polynomials

11478 [2010, 87]. Proposed by Marius Cavachi, “Ovidius” University of Constanta, Constanta, Romania. Let K be a field of characteristic 0, and let f and g be relatively prime polynomials in K[x] with deg(g) < deg(f). Suppose that for infinitely many λ in K there is a sublist of the roots of f + λg (counting multiplicity) that sums to 0. Show that deg(g) < deg(f) − 1 and that the sum of all the roots of f (again counting multiplicity) is 0.

Solution by Richard Stong, Center for Communications Research, San Diego, CA. For a monic polynomial p of degree n with roots $\alpha_1, \ldots, \alpha_n$ (taken with multiplicity) the product $Q_k$ defined by
$$Q_k = \prod_{1 \le i_1 < \cdots < i_k \le n} (\alpha_{i_1} + \cdots + \alpha_{i_k})$$
is a symmetric function in the roots of p. Hence $Q_k$ is given by a universal polynomial in the coefficients of p. When p is a constant multiple of f + λg (choosing the constant to make p monic), $Q_k$ is a polynomial in λ. By hypothesis, there are infinitely many
values of λ such that $\prod_{k=1}^{n} Q_k$ vanishes. Hence one of these polynomials, say $Q_j$, is the 0 polynomial. Thus $Q_j$ vanishes for all λ, and the desired sublist exists for all λ. The same conclusion holds even when we replace K by a larger field, specifically the field K(t) of rational functions in a new indeterminate t.
By Gauss’s Lemma, if the polynomial f(x) + tg(x) is reducible over K(t), then it is reducible over the polynomial ring K[t]. However, since it is linear in t, one of the factors would be independent of t and would give a common factor of f and g. Thus f(x) + tg(x) is irreducible over K(t). Hence its Galois group G acts transitively on the roots $\alpha_1, \ldots, \alpha_n$ of f(x) + tg(x). Suppose without loss of generality that $\alpha_1 + \cdots + \alpha_k = 0$. Now
$$0 = \sum_{\phi\in G}\phi(\alpha_1 + \cdots + \alpha_k) = \frac{|G|\,k}{n}(\alpha_1 + \cdots + \alpha_n).$$
Thus $\alpha_1 + \cdots + \alpha_n = 0$, and hence the coefficient of $x^{n-1}$ in f + tg vanishes. Now deg(g) ≤ n − 2, and the sum of the roots of f vanishes as desired.
Also solved by R. Chapman (U. K.), O. P. Lossers (Netherlands), and the proposer.
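A tiny numerical illustration of the conclusion, not drawn from the solution: when the coefficient of x^{n-1} in f vanishes and deg g ≤ n − 2, the full list of roots of f + λg sums to 0 for every λ. The particular f and g below are an arbitrary choice.

```python
import numpy as np

f = np.array([1.0, 0.0, -1.0, 0.0])   # x^3 - x (no x^2 term)
# g = 1, a constant, so deg g = 0 <= deg f - 2
for lam in (0.3, -1.7, 5.0):
    coeffs = f.copy()
    coeffs[-1] += lam                  # coefficients of f + lam * g
    print(lam, np.roots(coeffs).sum())  # approximately 0 each time
```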
Orthogonality of Matrices under Additivity of Traces of Powers

11483 [2010, 182]. Proposed by Éric Pité, Paris, France. Let A and B be real n × n symmetric matrices such that $\operatorname{tr}(A + B)^k = \operatorname{tr} A^k + \operatorname{tr} B^k$ for every nonzero integer k. Show that AB = 0.

Composite solution by the editors. We prove the stronger statement that if A and B are n × n Hermitian matrices such that $\operatorname{tr}(A + B)^k = \operatorname{tr} A^k + \operatorname{tr} B^k$ for every integer k such that 1 ≤ k ≤ 3n, then AB = 0.
We show first that if the sums of the kth powers of two lists of complex numbers, of length l and m respectively, are equal for 1 ≤ k ≤ l + m, then the lists are the same (up to order of the entries). To see this, let the first list have distinct entries $\alpha_1, \ldots, \alpha_r$ with multiplicities $a_1, \ldots, a_r$, and let the second list have distinct entries $\beta_1, \ldots, \beta_s$ with multiplicities $b_1, \ldots, b_s$. The hypothesis is now that $\sum_{i=1}^{r} a_i\alpha_i^k - \sum_{j=1}^{s} b_j\beta_j^k = 0$ for $1 \le k \le \sum a_i + \sum b_j = l + m$. Since the Vandermonde matrix is invertible, the hypothesis requires the lists to have the same entries. This immediately yields the following: If $S_1$, $S_2$, and $S_3$ are three lists of complex numbers, and the sum of the kth powers of the entries in $S_1$ and $S_2$ equals the sum of the kth powers of the entries in $S_3$ whenever k is at most the sum of the lengths of the three lists, then the entries of the concatenation of $S_1$ and $S_2$ are the same as the entries in $S_3$.
Now let A and B be Hermitian matrices, and let C = A + B. Let the lists of nonzero eigenvalues of these matrices be $\{\alpha_i\}_{i=1}^{r}$, $\{\beta_i\}_{i=1}^{s}$, and $\{\gamma_i\}_{i=1}^{t}$, respectively. The condition $\operatorname{tr}(A + B)^k = \operatorname{tr} A^k + \operatorname{tr} B^k$ is the same as $\sum_{i=1}^{t}\gamma_i^k = \sum_{i=1}^{r}\alpha_i^k + \sum_{i=1}^{s}\beta_i^k$, imposed for 1 ≤ k ≤ 3n. Hence, the nonzero eigenvalues of C are exactly the nonzero eigenvalues of A and B, including multiplicities. Consequently, rank(C) = rank(A) + rank(B). On the other hand, the images satisfy Im(A + B) ⊆ Im(A) + Im(B). Thus, Im(A + B) = Im(A) + Im(B). Let V = Im(A + B). Viewed as a linear transformation on V, C is invertible.
Finally, we argue that AB = 0. By spectral factorization, since A and B are Hermitian, there are orthonormal vectors $\{u_i\}_{i=1}^{r}$ for A and $\{v_i\}_{i=1}^{s}$ for B such that $A = \sum_{i=1}^{r}\alpha_i u_i u_i^*$ and $B = \sum_{j=1}^{s}\beta_j v_j v_j^*$. Moreover, the space V is spanned by $\{u_i\}_{i=1}^{r}$ and $\{v_i\}_{i=1}^{s}$. Since r + s = dim V, it follows that $\{u_i\}_{i=1}^{r} \cup \{v_i\}_{i=1}^{s}$ is a linearly independent
set and forms a basis for V. It follows that
$$(A + B)u_i = \alpha_i u_i + \sum_{j=1}^{s}\beta_j v_j(v_j^* u_i)\quad\text{for } 1 \le i \le r$$
and
$$(A + B)v_j = \sum_{i=1}^{r}\alpha_i u_i(u_i^* v_j) + \beta_j v_j\quad\text{for } 1 \le j \le s.$$
Under the basis $\{u_1, \ldots, u_r, v_1, \ldots, v_s\}$, the matrix representation of A + B is
$$\begin{pmatrix} D_\alpha & D_\alpha E \\ D_\beta E^* & D_\beta \end{pmatrix} = \begin{pmatrix} D_\alpha & 0 \\ 0 & D_\beta \end{pmatrix}\begin{pmatrix} I & E \\ E^* & I \end{pmatrix}, \qquad (1)$$
where $D_\alpha = \operatorname{diag}(\alpha_1, \ldots, \alpha_r)$, $D_\beta = \operatorname{diag}(\beta_1, \ldots, \beta_s)$, and $E_{i,j} = u_i^* v_j$. Since the nonzero eigenvalues of A + B are $\{\alpha_i\}_{i=1}^{r}$ and $\{\beta_i\}_{i=1}^{s}$, the determinants of both sides of (1) equal $\prod_{i=1}^{r}\alpha_i\prod_{i=1}^{s}\beta_i$. This yields $\det\begin{pmatrix} I & E \\ E^* & I \end{pmatrix} = 1$. Also, $\begin{pmatrix} I & E \\ E^* & I \end{pmatrix}$ is just the Gram matrix of $u_1, \ldots, u_r, v_1, \ldots, v_s$. By the Hadamard determinant inequality, E = 0; that is, $u_i^* v_j = 0$ for all i and j. It follows that
$$AB = \left(\sum_{i=1}^{r}\alpha_i u_i u_i^*\right)\left(\sum_{j=1}^{s}\beta_j v_j v_j^*\right) = \sum_{i=1}^{r}\sum_{j=1}^{s}\alpha_i\beta_j u_i(u_i^* v_j)v_j^* = 0.$$
Editorial comment. It would be nice to extend the result to normal matrices. The problem is that C = A + B need not be normal when A and B are normal. Thus the rank of C is not necessarily the same as the number of nonzero eigenvalues of C. Other than this, everything works for normal matrices. One may wonder whether the condition “k ≤ 3n” can be replaced with “k ≤ n”. This fails at least when n = 1, since tr(A + B) = tr A + tr B for all numbers A and B, but AB ≠ 0 in general.
Also solved by J. Simons (U. K.), R. Stong, and the proposer.
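The easy converse direction is simple to check numerically: if A and B are Hermitian with AB = 0, then BA = 0 as well and (A + B)^k = A^k + B^k, so the trace condition holds for every k. The sketch below is not part of the solution; it builds such a pair from disjoint sets of orthonormal eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
U, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
# A and B use disjoint sets of columns of the unitary U, so AB = 0.
A = U[:, :2] @ np.diag([2.0, -1.0]) @ U[:, :2].conj().T
B = U[:, 2:4] @ np.diag([0.5, 3.0]) @ U[:, 2:4].conj().T

print(np.allclose(A @ B, 0))                       # expect True
for k in range(1, 3 * n + 1):
    lhs = np.trace(np.linalg.matrix_power(A + B, k))
    rhs = np.trace(np.linalg.matrix_power(A, k)) + np.trace(np.linalg.matrix_power(B, k))
    assert np.isclose(lhs, rhs)
print("trace condition holds for k = 1, ..., 3n")
```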
Friendly Paths

11484 [2010, 182]. Proposed by Giedrius Alkauskas, Vilnius University, Vilnius, Lithuania. An uphill lattice path is the union of a (doubly infinite) sequence of directed line segments in $\mathbb{R}^2$, each connecting an integer pair (a, b) to an adjacent pair, either (a, b + 1) or (a + 1, b). A downhill lattice path is defined similarly, but with b − 1 in place of b + 1, and a monotone lattice path is an uphill or downhill lattice path. Given a finite set P of points in $\mathbb{Z}^2$, a friendly path is a monotone lattice path for which there are as many points in P on one side of the path as on the other. (Points that lie on the path do not count.)
(a) Show that if $N = a^2 + b^2 + a + b$ for some positive integer pair (a, b) satisfying $a \le b \le a + \sqrt{2a}$, then for some set of N points there is no friendly path.
(b)* Is it true that for every odd-sized set of points there is a friendly path?

Solution to (a) by the proposer. Let P be the centrally symmetric configuration consisting of triangles of points in four quadrants as in the figure (where a = 4 and b = 7). The first and third quadrants contain triangles meeting a diagonals, comprising a(a + 1)/2 points. The second and fourth quadrants contain triangles meeting b
[Figure: the centrally symmetric point configuration, with triangular blocks of points labeled A, B, C, D in the four quadrants (drawn for a = 4 and b = 7).]
diagonals, comprising b(b + 1)/2 points. In total, |P| = N. Let A, B, C, D denote the subsets in the four quadrants. We prove that there is no friendly path for P.
If the first and last points of P on a monotone path Q lie in neighboring quadrants, then at least N/2 points lie on one side, and Q is not friendly. If the first and last points are in C and A, then Q hits one point in each of 2a + 1 diagonals. Since N is even, this leaves an odd number of points of P outside Q, and they cannot be split equally.
It remains to consider a downhill lattice path Q whose first and last points are in B and D. If Q hits a point of P at every step between these extremes, then Q hits 2b + 1 points of P, and again the remainder cannot be split equally. Hence, we may assume that an odd number of lattice points along Q between its ends are not in P. By symmetry, we may assume these points are in A. We claim that every such path has more points of P below it than above it.
Consider the point x just above the leftmost column of the triangle in A. The downhill path Q containing x that has the most points of P above and to its right goes directly rightward to x and then down. There are $\binom{b-a-1}{2}$ points of P above Q in B, $\binom{a}{2}$ points of P to the right of Q in A, and $\binom{b}{2}$ points of P to the right of Q in D. Meanwhile, on the other side of Q are $\binom{a+1}{2} + \binom{b+1}{2} - \binom{b-a+1}{2}$ points. An equal split requires
$$\binom{a}{2} + \binom{b}{2} + \binom{b-a-1}{2} \;\ge\; \binom{a+1}{2} + \binom{b+1}{2} - \binom{b-a+2}{2},$$
which simplifies to $a + b \le (b - a)^2$. The left side is at least 2a, and the right side is at most 2a, so b = a is necessary, but then 2a ≤ 0. As we move from x to any other point in the first quadrant outside A as a point of Q outside P between points of Q in P, the number of points above Q decreases, while the number of points below Q increases. Hence the two sides can never have equal size.
Editorial comment. We do not know the answer to part (b)*. Parity considerations made part (a) easy using a centrally symmetric configuration. However, a centrally symmetric configuration of odd size has a central point. Any symmetric path through that point is a friendly path. This makes it difficult to construct a counterexample.
No other solutions were received.
REVIEWS Edited by Jeffrey Nunemacher Mathematics and Computer Science, Ohio Wesleyan University, Delaware, OH 43015
Roads to Infinity. The mathematics of truth and proof. By John Stillwell. A K Peters, Natick, MA, 2010. xi + 203 pp., ISBN 978-1-56881-466-7. $39.95.
Reviewed by José Ferreirós

Mathematics appears to be an ever-unfolding dialectic between the finite and the infinite, between discrete structures and continuous forms, but also between symbolic content and idealisation. In a well-known paper ‘On the infinite’, Hilbert proposed a distinction between the “contentual” in mathematics—which he tentatively identified with the study of strictly finite structures—and the “ideal elements” that are constantly being introduced to help explore the realm of mathematical truths. Hilbert also claimed that the infinite is not to be found in Nature, yet “it may still be the case that the infinite occupies a well-justified place in our thinking, that it plays the role of an indispensable concept” [6]. Broadly construed, that is the topic of the book under review. More than dealing with roads to infinity, or with infinity studied for its own sake (the topic of higher set theory), the book’s core focus is roads from infinity, in the sense explained below.
In that same paper of 1925, Hilbert offered new clarifications of his celebrated program to vindicate infinitarian mathematics by methods employing only finitary, “contentual” mathematics—by means of a proof theory studying the structure of proofs inside any given (axiomatized) theory, which would show that mathematics is consistent, i.e., free from contradiction.¹ This became the springboard for Gödel’s surprising and celebrated Incompleteness Theorems, but also for new developments in proof theory initiated in the 1930s by Gentzen, who was able to establish the consistency of the axiomatic system called PA (Peano Arithmetic) by an extension of Hilbertian methods. This is the “mathematics of truth and proof” to which Stillwell’s book is devoted, and in which he chooses to emphasize the contributions of Emil Post and Gerhard Gentzen.
¹ He also had the courage to propose a way of solving the Continuum Problem—which turned out to be seriously flawed.
Stillwell is a master expositor and does a very good job explaining and weaving together many core issues in mathematical logic and foundational studies. Less than halfway through the book the reader has reached the limitation results that affect formal systems capable of codifying a certain amount of arithmetic: incompleteness, unprovability of consistency, the halting problem in computation, the decision problem. But, although Stillwell sets himself the task of dispelling “the myth that incompleteness is a difficult concept,” I doubt that he has managed to do so. Diagonalization is a simple and clear technique, and the author does very well presenting it in several versions—from the original set theoretic one (Chapter 1) to the relevant proof-theoretic
versions. But the idea of incompleteness is itself a sophisticated concept (as is even the notion of a formal system), so less mature readers will have some trouble adequately grasping the contents of Chapter 3, where these ideas are discussed. I believe many readers will not be quite satisfied with the proof sketch provided on p. 75, and particularly not with the (vague) considerations of soundness and consistency offered at this point in the book. Even more difficult—since vague—is the problem of obtaining an adequate understanding of the links and differences between formal systems and mathematical practice, which however is central to fully digesting the implications of Gödel’s and other limitation results.²
In this part of the book, Stillwell chooses to underscore the contributions of the American (Polish-born) mathematician Emil Leon Post (1897–1954). In 1921, having completed his Ph.D., Post developed ideas about unsolvable problems that brought him close to the path-breaking results of Gödel and Turing, but he also could not find the direct link to arithmetic and Principia Mathematica later established by Gödel. And, feeling the need for “full generality” and “a complete analysis,” he refrained from publishing. (Stillwell cites a 1938 postcard to Gödel: “As for any claims I might make perhaps the best I can say is that I would have proved Gödel’s Theorem in 1921—had I been Gödel.”) Nonetheless, Post had found a notion of “canonical” or “normal system” that he would publish many years later, and which turned out to be equivalent to Turing machines (Church’s thesis says that all computation can be done by normal systems).³
The core of the book deals with results that (to the best of my knowledge) have not yet made it into the popular, or semi-popular, literature: Gentzen’s 1936 proof of the consistency of Peano Arithmetic (PA) and later work on “natural” unprovable arithmetic sentences. Both are crucial post-Gödel developments in proof theory. Gerhard Gentzen (1909–1945) employed an extension of Hilbertian methods to include induction up to ε0 as the basis to establish the consistency of PA (this is a “miniature version” of transfinite induction; ε0 is a countable transfinite ordinal, i.e., ω0 < ε0 < ω1, to be defined below). And, years after Gödel’s incompleteness result, mathematicians started looking for unprovable sentences that arise more “naturally,” unlike Gödel’s sentence which is concocted metatheoretically. The examples that Stillwell discusses are Goodstein’s theorem, the finite and infinite Ramsey theorems, Kruskal’s theorem in graph theory, and Friedman’s finite form of it.⁴ One of the most beautiful examples is Goodstein’s theorem, which is about sequences of natural numbers arising from the following process. Take any natural number and express it in base 2 normal form (with all digits at most 2), then replace each base 2 by a 3; subtract 1 and write the result in base 3 normal form (with digits at most 3) and then repeat the procedure, replacing each 3 by a 4, subtracting 1, and writing the result in base 4 normal form; and continue repeating. The numbers obtained grow bigger and bigger, since the procedure goes on for very long, and the bases keep rising, but only up to a point. Goodstein’s theorem states that the sequence of numbers thus obtained is finite and the process terminates at 0.
² This has given rise to all kinds of misunderstandings, even among expert mathematicians. To suggest some readings, I find very interesting Cellucci’s distinction between “closed” and “open” symbolic systems (see, e.g., his [2]); a masterful exposition of mistaken readings of Gödel’s theorems is provided in [4].
³ Stillwell does not mention that Post “suffered all his adult life from crippling manic-depressive disease at a time when no drug therapy was available for this malady” (M. Davis). His case, together with those of Cantor and Gödel, has reinforced the popular, “romantic” notion that there is some link between logic and madness.
⁴ Kruskal’s theorem has to do with trees, finite graphs that are connected and contain no closed paths: for any infinite sequence of trees $T_1, T_2, T_3, \ldots$ there are indices i < j such that $T_i$ embeds into $T_j$ (i.e., the infinite sequence of trees contains an infinite subsequence of trees each embedding into the next).
If we start with 3 = 2 + 1, the Goodstein sequence of numbers is 2 + 1, 3, 3, 2, 1, 0. Very simple. Starting with number 4 the process already becomes quite intractable; the sequence begins
$$2^2,\quad 3^2\cdot 2 + 3\cdot 2 + 2,\quad 4^2\cdot 2 + 4\cdot 2 + 1,\quad 5^2\cdot 2 + 5\cdot 2,\quad 6^2\cdot 2 + 6 + 5,\ \ldots$$
and, according to Stillwell (following Kirby & Paris, see pages 49–50), it reaches 0 at base $3\cdot 2^{402653211} - 1$.
The Goodstein process looks very natural if you have studied some set theory, to the point of becoming acquainted with countable ordinals and with the fact that any descending sequence of countable ordinals will be finite. For the Goodstein process is a finitary, arithmetic translation of the finiteness of descending sequences of ordinals. The interesting fact is that Goodstein’s theorem, albeit a truth of arithmetic which can be stated in PA, and whose statement involves only the basic operations of addition, product and exponentiation, cannot be proved in PA. Indeed, it can be proved in PA that the Goodstein theorem (expressed in this theory by means of an axiom schema) implies the consistency of PA. But then, by Gödel’s second incompleteness theorem, a.k.a. the unprovability of consistency, it follows that Goodstein’s theorem is a “formally unprovable” sentence of PA (better: schema of sentences). Of course, Goodstein’s theorem can be proved in set theory, for instance in ZFC.
The process by which the numbers in any Goodstein sequence (like the two we showed above) change and eventually come down to 0 can be understood by considering descending sequences of countable ordinals. Countable ordinals are objects like ω, ω², or ω²·2 + ω + 5, or even $\omega^{\omega^{\omega}+\omega} + \omega^{\omega+3} + 3$, written in what is called the Cantor normal form; obviously there are infinitely many such “regimented” polynomials in ω. And they are bounded above by ε0, which can be defined as the “limit” of the sequence $\omega, \omega^{\omega}, \omega^{\omega^{\omega}}, \ldots$, or alternatively as the first ordinal α such that $\omega^{\alpha} = \alpha$;⁵ Cantor knew this number already and established that it is merely countable. That is, our friend ε0 is the first countable ordinal that cannot be written as one of the above polynomials in ω. For instance, the sequence for number 4 above corresponds to this (one may simply interpret ω as a variable that takes value n + 1 in row n):
$$\omega^{\omega},\quad \omega^2\cdot 2 + \omega\cdot 2 + 2,\quad \omega^2\cdot 2 + \omega\cdot 2 + 1,\quad \omega^2\cdot 2 + \omega\cdot 2,\quad \omega^2\cdot 2 + \omega + 5,\ \ldots$$
Given that such descending sequences of ordinals must be finite, Goodstein’s theorem is true. Actually the result can be proved in a theory much simpler than set theory: it is sufficient to employ Gentzen’s expanded arithmetic, which adds ε0-induction to the system of Peano Arithmetic.
⁵ Of course there are uncountably many such numbers, even among the countable ordinals!
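For readers who want to see the process concretely, the following short computational sketch, which is not taken from the review, computes Goodstein sequences by rewriting a number in hereditary base-b notation, bumping the base, and subtracting 1; it reproduces the two examples above.

```python
def bump(m, b):
    """Rewrite m in hereditary base-b notation and replace every b by b + 1."""
    if m == 0:
        return 0
    result, power = 0, 0
    while m > 0:
        m, digit = divmod(m, b)
        if digit:
            result += digit * (b + 1) ** bump(power, b)  # exponents are bumped too
        power += 1
    return result

def goodstein(start, steps):
    """First `steps` terms of the Goodstein sequence beginning at `start`."""
    seq, m, b = [start], start, 2
    for _ in range(steps):
        if m == 0:
            break
        m = bump(m, b) - 1
        b += 1
        seq.append(m)
    return seq

print(goodstein(3, 10))  # [3, 3, 3, 2, 1, 0] -- terminates quickly
print(goodstein(4, 5))   # [4, 26, 41, 60, 83, 109] -- matches the sequence above
```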
Here we observe the delicate interplays between truth and proof that Stillwell wants to present to his readers. We see a beautiful example of infinitary mathematics translated into the finitary, which is actually the main topic of Roads to Infinity; for the book is not so much devoted to the higher infinite as to proof-theoretic results, e.g., concrete arithmetical statements that are obtained by mirroring certain methods that—true—were introduced in the set-theoretic study of infinity. That is what I mean by roads from infinity. Stillwell himself quotes Terence Tao talking about principles that “allow one to tap the power of the infinitary world in order to establish results in the finitary world, or at least to take the intuition gained in the infinitary world and transfer it to a finitary setting” (p. 163).
Now, what are the roads to infinity? And how do they turn into roads from infinity? Stillwell is explicit about this. He considers two main roads to infinity, explored in the first two chapters:
1. The road to transfinite cardinals through diagonalization, which led mathematicians from the simple infinity of N to the infinity of real numbers R, namely, to $2^{\aleph_0}$ (here I’ll have something to qualify, see below). And
2. The road of transfinite ordinals through principles of ordinal generation, which led Cantor to ω (= ω0) and ω + 1 and all the ordinals mentioned above, and which in a bold step he “brought to completion” by considering the set of all countable ordinals, ω1, and proceeding beyond.
The attractive idea of Stillwell’s presentation is to emphasize the parallel between those two upward roads, and the crucial methods employed in proof theory by Gödel and Post, and by Gentzen. For diagonalization in proof-theoretic dressing is crucial to the proof of incompleteness for systems that codify primitive recursive arithmetic. And the set of all countable ordinals that can be constructed from ω, hence a “miniaturization” of Cantor’s boundless generation of ordinals, is crucial to Gentzen’s proof of the consistency of PA. (In order to prove that the set of all countable ordinals is uncountable, Cantor had to introduce into his considerations a new bigger ordinal, ω1; similarly, Gentzen has to introduce a new ordinal bigger than those constructible from ω, namely ε0.)
Despite all of the admirable coverage of this book (and I recommend it highly), at some points Stillwell presents rather conventional views that deserve closer scrutiny and revision. An unimportant example is his comments on Principia Mathematica (p. 68ff), which historically was very important, but has some rather serious shortcomings in content and presentation; for instance he makes no distinction between the complex formal system of Principia and the simple type theory which was actually studied by Gödel in his famous paper. By doing so Stillwell echoes and amplifies the myth of Principia, which looms large in the secondary literature. Also, in connection with Frege’s predicate logic (p. 90), the author does not remark on the difference between the first-order part and the full Fregean system. You may think that the difference between first-order and second-order logic is sophisticated and not appropriate for a popular book, but this one is not a usual popular book. It is rather a semi-popular exposition, which often enters into technical material, and the above-mentioned distinction plays a role in several passages of the exposition.
More important are some aspects of the exposition of set theory, in particular the (important) side-comments of a conceptual nature. The following subsection is devoted to a number of comments concerning this issue, pertaining to the roads to infinity and the status of the Axiom of Choice. It is not that Stillwell’s technical exposition, sketchy as it may be, has shortcomings; the author is very proficient and very clever in finding ways to present his material. But in general the book offers more on the side of proof theory and computability, than
on set theory proper⁶ (an indication of this is the fact that Chapter 7, entitled ‘Axioms of infinity’ and devoted to higher set theory, is presented as a kind of “epilogue” on p. 165). In Chapter 2, while presenting Cantor’s transfinite ordinals, nothing is said about the details of Zermelo’s proof of the well-ordering theorem, but above all what the author says regarding the axiom of choice (AC) is conventional wisdom that I find misguided. The reader will excuse me for being opinionated on this topic, since it is one to which I have contributed and in which I’d like to help change the received view.
The received view is that AC is a fishy axiom, which is necessary for some technical purposes but which also has very inconvenient consequences in some other contexts, a principle that is not quite clear and that we rather want to avoid. AC is ritually blamed for the appearance of such inconvenient facts as the existence of subsets of R that are not Lebesgue-measurable, and the fact that a ball of unit volume can be decomposed into two balls of unit volume (Banach-Tarski paradox). The received view is that there is not much difference between the evidence we have for AC and the evidence for Cantor’s continuum hypothesis, that $2^{\aleph_0} = \aleph_1$. I disagree strongly [3]. And of course, I am not alone. The view I insist upon has been urged by great experts in set theory and foundations such as Gödel and Bernays, who, starting in 1935, emphasized that AC is a natural principle of set theory.
One of the key ingredients of set theory is the viewpoint called quasi-combinatorialism, according to which arbitrary sets and arbitrary functions are just as much mathematical objects as any set or function which may be explicitly defined through some condition. That is to say, set theory (and classical mathematics, in particular classical analysis) accepts arbitrary or “random” sets of natural numbers on a par with definable sets (e.g., the set of prime numbers or the set of multiples of 2011). Classical analysis reasons about the totality of arbitrary sets of real numbers, and set theory generalizes this to reason about the totality of arbitrary sets of numbers, points, functions, or sets (in the cumulative hierarchy $V = \bigcup_\alpha V_\alpha$, which has levels $V_\alpha$ corresponding to all transfinite ordinals α). Classical mathematics follows the lead of Dedekind, Cantor, Hilbert and Zermelo in not making any requirement of definability, even when we are thinking about infinite sets, however big in cardinality. It treats infinite sets in analogy with the combinatorial nature of finite sets, which we take for granted, and this analogy explains the label “quasi-combinatorialism”. The Axiom of Choice AC is the one axiom in Zermelo’s set theory that implements on a technical level this crucial idea.⁷
But many mathematicians around 1900 had second thoughts about admitting arbitrary sets and functions, and they were not fully aware of the extent to which such sets were involved in the traditional results of analysis. This is how the practice of treating AC as “different” and dangerous started. However, it is not difficult to show that the simple idea of the real number system as the totality of “all possible” infinite decimals leads directly to quasi-combinatorialism (hence to choice, AC). The real numbers in any interval of unit length correspond to all possible decimal expansions after the comma (but for the well-known identification in the case of sequences ending in 9s).
This much was clear, well understood, and accepted as early as 1800. Now, these expansions are infinitary objects, and as such they can be understood in several different ways: most simply perhaps as infinite sequences of ciphers; but
⁶ Readers interested in this topic will find a lot in the philosophically oriented [1].
⁷ Bernays in 1935: “The axiom of choice is an immediate application of the quasi-combinatorial concepts in question”. Gödel, assuming the extensional, combinatorial notion of “class” or set, says in 1944: “nothing can express better the meaning of the term “class” than the axiom of [separation] and the axiom of choice” (Collected Papers, vol. II, 139; compare p. 131).
also as functions f : N → {0, 1}, if we assume we are using binary notation. Quasi-combinatorialism is nothing but the denial of any definability requirement, and this is a common ingredient to both of the reconstructions just mentioned.
Let me spell this out. There is a notion of sequence which is arguably more elementary than sets or functions: the concept of lawlike sequence that is a generalization of the most basic mathematical structure, the sequence of natural numbers. (The notion of lawlike sequence is usual in constructive mathematics, referring to sequences determined by an explicitly given rule. Examples are the sequence of Fibonacci numbers, and any explicitly given sequence you have ever encountered; think, e.g., of a determinate sequence of rationals converging to √2, like 1/1, 7/5, 41/29, 239/169, . . . ,⁸ or even the algebraic numbers ordered in a sequence according to their corresponding minimal polynomials.⁹) This simple notion is however not the one involved in understanding the reals as decimals: to get the real numbers by means of sequences of ciphers, one has to move beyond lawlike sequences and consider also arbitrary sequences.¹⁰ Thus the classical real numbers force consideration of arbitrary infinitary objects, they force acceptance of quasi-combinatorialism. And AC is nothing but an expression of this general idea.
Hence, contrary to the received view, the axiom AC is clearly required in a system that aims to establish foundations for classical analysis. Its consequences (complicating as they may be: non-measurable sets of reals, non-determinacy of sets of reals, etc.) are important discoveries regarding the subject matter of classical mathematics. And there is much difference between the evidence we have for AC and the evidence for Cantor’s continuum hypothesis.
Let me now come to the methodological point of the role of AC in implying strange consequences. What I want to argue is just that AC is never alone—in particular, the axiom asserting the existence of power sets (APow) is always required too. For example, APow is no less present in the proof of Zermelo’s well-ordering theorem than is AC. To show that a well-ordering of S exists for any given set S, one employs a choice function f : ℘(S) → S, and this step is essential. (Then what is required is to define a substructure of ℘(S) that will play the role of the family of “rests” of (well-ordered) set S, in the sense that the rest of an element s ∈ S is the set of all elements x > s in the well-ordering. Once a family of sets having the structure of the set of “rests” is defined, the choice function f mentioned above specifies the ordering of elements, beginning with f(S).)
This brings me to the promised criticism of Stillwell’s identification of the cardinal road to infinity with diagonalization. I must disagree. The first road to infinity consists in heavy reliance on the axiom of powersets, understood in the sense of quasi-combinatorialism explained above. It is the assumption that “all possible” subsets of N form a new well-determined totality, a set (which is just as much a mathematical object as the number 1), which brings us boldly into the realm of uncountable infinite sets. In a paper published this year I wrote: “The importance of Cantor’s diagonal procedure is, precisely, that it constitutes a method for transcending any given sequence of
⁸ Where numerators and denominators obey the rule $t_{n+2} = 6t_{n+1} - t_n$.
⁹ This is what Cantor (and Dedekind) used to prove that the set of algebraic numbers is denumerable.
¹⁰ From a didactic point of view, it is important to reflect on the fact that the essential difference between sequences and functions applies only if we restrict to lawlike sequences. Once we are considering arbitrary sequences, this is essentially the same as functions f : N → {0, 1, . . . , 9}.
definable subsets of N (analogously for other sets). In and by itself, however, Cantor’s diagonal method does not lead to arbitrary sets [nor, I add here, to higher infinities]. In fact, if a countable sequence of definable sets of natural numbers is given explicitly, so that we can compute whether n belongs to the nth set, the diagonal procedure yields a computation of the truth value of n ∈ B (where B refers to the new set defined by Cantor’s method).” [3]¹¹ Diagonalization establishes that a certain domain (e.g., the set of real numbers) is not exhaustible by a given countable sequence—which is transcended by employing the diagonal procedure. However, to establish that the domain is an uncountable set, a higher infinity, obviously one needs to postulate that it is a set, and this is based on the axioms of ZFC, very especially on APow.
Actually, the axiom of powersets is also behind the second, ordinal road to infinity. The formation of countable ordinals ω, ω + 1, . . . , ω², . . . goes on and on, leading up to ε0 and much, much farther. But to follow Cantor in his path towards the higher infinite, one needs to claim that there is a set of all countable ordinals, denoted by ω1. Now, the existence of ω1 is again a claim that depends crucially on APow: in the system ZFC without APow one cannot prove that ω1 exists. And of course APow is crucially behind the kind of infinitary structure by which the continuum is modelled in classical mathematics, hence behind results such as the Banach-Tarski paradox.
The question now is, why do we put all the blame on AC, and disregard APow completely? The answer is, merely by tradition and conventional wisdom. AC has been regarded as intuitively more dubious than APow; but this may be a superficial impression and invites revision insofar as powersets are interpreted in the standard, quasi-combinatorial way. Notice that the classical idea of the real numbers involves both (1) the assumption of particular arbitrary (non-definable) objects, and (2) the assumption of a totality of such objects. The first corresponds, roughly speaking, to a randomly chosen real, the second to the set of all reals. In the system of axiomatic set theory, these two assumptions are paralleled by AC, corresponding to (1), and the axiom of powersets, corresponding to (2). From a conceptual point of view, both are equally dubious, and any qualm we may have concerning AC would apply equally to APow. If anything, such doubts should be multiplied, for the powerset ℘(N) contains uncountably many arbitrary sets of naturals! Thus, your doubts concerning a single application of AC should be multiplied times $2^{\aleph_0}$ in the case of the simplest powerset of an infinite set. And the powerset ℘(R) has $2^{2^{\aleph_0}}$-many arbitrary sets of reals!
The above may help emphasize, once again, that there is something “inconcrete” in the subject matter of classical set theory—in ω1 and $2^{\aleph_0}$, hence on both sides of Cantor’s equation $2^{\aleph_0} = \aleph_1$. We have pinned it down to the heavily idealizing tendency—and the associated vagueness—of quasi-combinatorialism, which is expressed directly in the contentious axiom AC and indirectly in the standard reading of the axiom of powersets.
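The quoted point, that membership in Cantor's diagonal set B is computable whenever the enumeration of sets is given explicitly, can be made concrete in a few lines. The enumeration below is a made-up example, not anything from [3] or from Stillwell's book.

```python
def nth_set_contains(i, n):
    """Membership test for the i-th set in an explicitly given enumeration.
    Here the i-th set is, arbitrarily, the set of multiples of i + 1."""
    return n % (i + 1) == 0

def in_diagonal_set(n):
    """n lies in B exactly when n is NOT in the n-th set of the enumeration."""
    return not nth_set_contains(n, n)

# B differs from the n-th listed set at n, so B is none of the listed sets,
# yet its membership function is perfectly computable.
print([n for n in range(20) if in_diagonal_set(n)])
```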
Coming back to Stillwell’s discussion of the finitary counterparts of those ideas, we find that the material on arithmetic and proof theory is rather concrete, “contentual”: there is something tangible about ε0 and about Gentzen’s measuring the proof-theoretic strength of the proposition that PA is consistent, while there is something intangible about the real number system and about the problem of measuring its uncountability. Yet mathematicians have always found a need for ideal horizons
¹¹ Compare the case with the algebraic numbers: one can define in the strict sense a transcendental number by using Cantor’s diagonal argument applied to an explicit enumeration of the set of algebraic numbers. See R. Gray, ‘Georg Cantor and Transcendental Numbers,’ The American Mathematical Monthly 101 (1994), pp. 819–832; available at http://dx.doi.org/10.2307/2975129.
against which to determine our more concrete procedures. Prominent among these are the step to infinity, and the step to the continuum, which are always moves that allow us to determine in new ways (top-down) significant procedures which are more constructive (bottom-up). We find again Hilbert’s topic of the contentual vs. the ideal, symbols and processes vs. (possibly vague) ideas!¹²
To sum up, Stillwell’s book is highly commendable, very informative and well organized. It is very carefully produced. I was able to find only minor errors: an unfortunate one on p. 8 in Euclid’s proof, on p. 26 a misplaced sentence, and a few others. I should mention also a historical mistake on p. 62 (Cantor did know about sets larger than R before 1891), and an opinion that I do not share on p. 135 (Dirichlet’s geometric proof of ab = ba probably had a didactic intention). There is a place where the reader may easily be misled: on pp. 68–69 it is said that Gödel proved the “logical” completeness (!) of Principia Mathematica before proving its “mathematical” incompleteness. This must be understood in the sense that the “logical” completeness of PM is that of its first-order subsystem—for the type theory of PM is incomplete.¹³ As already said, I believe the book is not really appropriate for high school students, since it presupposes some measure of mathematical maturity (e.g., comfort with functions). So it seems to presuppose the level of a mature first-year university student.
A feature that deserves special mention is that Stillwell has included rich and interesting historical notes, which he often employs to mention more advanced material. Thus, in the early chapters, some commentary on Gödel’s constructible hierarchy (the basis for proving the consistency of AC and the Continuum Hypothesis relative to the axiomatic system ZF) and Cohen’s method of forcing; later on, a discussion of Chaitin’s incompleteness theorem and of links between logic and computation (Cook’s theorem, the problem P = NP). In this same spirit, the last chapter discusses large cardinals and projective determinacy.

REFERENCES
1. T. Arrigoni, What is Meant by V? Reflections on the Universe of All Sets, Mentis-Verlag, Paderborn, 2007.
2. C. Cellucci, Why Proof? What is Proof? in Deduction, Computation, Experiment. Exploring the Effectiveness of Proof. Edited by G. Corsi and R. Lupacchini. Springer, Berlin, 2008.
3. J. Ferreirós, On arbitrary sets and ZFC, Bull. of Symbolic Logic 17 (2011) 361–393.
4. T. Franzén, Gödel’s theorem: An incomplete guide to its use and abuse, A K Peters, Natick, MA, 2005.
5. L. Henkin, Completeness in the theory of types, J. of Symbolic Logic 15 (1950) 81–91; available at http://dx.doi.org/10.2307/2266967.
6. D. Hilbert, On the Infinite, in From Frege to Gödel, a Source Book in Mathematical Logic. Edited by J. van Heijenoort. Harvard University Press, Cambridge, MA, 2002.

Institute of Philosophy, CCHS CSIC (Spanish Higher Council for Scientific Research), Madrid
[email protected]
¹² Two more examples given by Stillwell: we have both the finite and infinite Ramsey theorems, but few Ramsey numbers are actually known! (pp. 152–153); also what is said about the Green-Tao theorem (p. 158).
¹³ Unless one does not employ the full set-theoretic semantics, but so-called Henkin semantics, which is actually quite natural as a semantics for higher-order logics [5].
New from the MAA The Hungarian Problem Book IV Edited and Translated by Robert Barrington Leigh and Andy Liu The Eötvös Mathematics Competition is the oldest high school mathematics competition in the world, dating back to 1894. This book is a continuation of Hungarian Problem Book III and takes the contest through 1963. Forty-eight problems in all are presented in this volume. Problems are classified under combinatorics, graph theory, number theory, divisibility, sums and differences, algebra, geometry, tangent lines and circles, geometric inequalities, combinatorial geometry, trigonometry and solid geometry. Multiple solutions to the problems are presented along with background material. There is a substantial chapter entitled "Looking Back," which provides additional insights into the problems.
Hungarian Problem Book IV is intended for beginners, although the experienced student will find much here. Beginners are encouraged to work the problems in each section, and then to compare their results against the solutions presented in the book. They will find ample material in each section to help them improve their problem-solving techniques. 114 pp., Paperbound, 2011 ISBN 978-0-88385-831-8 Catalog Code: HP4 List: $40.95 Member: $33.95
To order visit us online at www.maa.org or call us at 1-800-331-1622.
MATHEMATICAL ASSOCIATION OF AMERICA
1529 Eighteenth St., NW • Washington, DC 20036
New title from the MAA!

Rediscovering Mathematics: You Do the Math
Shai Simonson

Rediscovering Mathematics is an eclectic collection of mathematical topics and puzzles aimed at talented youngsters and inquisitive adults who want to expand their view of mathematics. By focusing on problem solving, and discouraging rote memorization, the book shows how to learn and teach mathematics through investigation, experimentation, and discovery. Rediscovering Mathematics is also an excellent text for training math teachers at all levels. Topics range in difficulty and cover a wide range of historical periods, with some examples demonstrating how to uncover mathematics in everyday life, including:
• number theory and its application to secure communication over the Internet,
• the algebraic and combinatorial work of a medieval mathematician Rabbi, and
• applications of probability to sports, casinos, and everyday life.
Rediscovering Mathematics provides a fresh view of mathematics for those who already like the subject, and offers a second chance for those who think they don’t.
To order call 1-800-331-1622 or visit us online at www.maa.org!