MONTHLY THE AMERICAN MATHEMATICAL
Volume 117, Number 2
Natalie Priebe Frank Sean M. Hart David Sherman Tam´as Keleti Elliot Paquette Lutz D¨umbgen Sara A. van de Geer Mark C. Veraar Jon A. Wellner NOTES John E. Wetzel Lara K. Pudwell Eric S. Rowland Sam B. Nadler, Jr. Emmanuel Lesigne
®
February 2010
A Dynamical System Using the Vorono¨ı Tessellation Variations on Kuratowski’s 14-Set Theorem The Trouble with von Koch Curves Built from n-gons Nemirovski’s Inequalities Revisited
99
138
An Ancient Elliptic Locus Counting Interesting Elections
161 167
A Proof of Darboux’s Theorem On the Behavior at Infinity of an Integrable Function
174 175
113 124
PROBLEMS AND 182 SOLUTIONS REVIEWS Daniel S. Silver Tools of American Mathematics Teaching, 1800–2000 190 By Peggy Aldrich Kidwell, Amy Ackerberg-Hastings, and David Lindsay Roberts
AN OFFICIAL PUBLICATION OF THE MATHEMATICAL ASSOCIATION OF AMERICA
MONTHLY THE AMERICAN MATHEMATICAL
Volume 117, Number 2
®
February 2010
EDITOR Daniel J. Velleman Amherst College ASSOCIATE EDITORS William Adkins Jeffrey Nunemacher Louisiana State University Ohio Wesleyan University David Aldous Bruce P. Palka University of California, Berkeley National Science Foundation Roger Alperin Joel W. Robbin San Jose State University University of Wisconsin, Madison David Banks Rachel Roberts Duke University Washington University, St. Louis Anne Brown Judith Roitman Indiana University South Bend University of Kansas, Lawrence Edward B. Burger Edward Scheinerman Williams College Johns Hopkins University Scott Chapman Abe Shenitzer Sam Houston State University York University Ricardo Cortez Karen E. Smith Tulane University University of Michigan, Ann Arbor Joseph W. Dauben Susan G. Staples City University of New York Texas Christian University Beverly Diamond John Stillwell College of Charleston University of San Francisco Gerald A. Edgar Dennis Stowe The Ohio State University Idaho State University, Pocatello Gerald B. Folland Francis Edward Su University of Washington, Seattle Harvey Mudd College Sidney Graham Serge Tabachnikov Central Michigan University Pennsylvania State University Doug Hensley Daniel Ullman Texas A&M University George Washington University Roger A. Horn Gerard Venema University of Utah Calvin College Steven Krantz Douglas B. West Washington University, St. Louis University of Illinois, Urbana-Champaign C. Dwight Lahr Dartmouth College EDITORIAL ASSISTANT Nancy R. Board
NOTICE TO AUTHORS The M ONTHLY publishes articles, as well as notes and other features, about mathematics and the profession. Its readers span a broad spectrum of mathematical interests, and include professional mathematicians as well as students of mathematics at all collegiate levels. Authors are invited to submit articles and notes that bring interesting mathematical ideas to a wide audience of M ONTHLY readers. The M ONTHLY’s readers expect a high standard of exposition; they expect articles to inform, stimulate, challenge, enlighten, and even entertain. M ONTHLY articles are meant to be read, enjoyed, and discussed, rather than just archived. Articles may be expositions of old or new results, historical or biographical essays, speculations or definitive treatments, broad developments, or explorations of a single application. Novelty and generality are far less important than clarity of exposition and broad appeal. Appropriate figures, diagrams, and photographs are encouraged. Notes are short, sharply focused, and possibly informal. They are often gems that provide a new proof of an old theorem, a novel presentation of a familiar theme, or a lively discussion of a single issue. Articles and notes should be sent to the Editor: DANIEL J. VELLEMAN American Mathematical Monthly Amherst College P. O. Box 5000 Amherst, MA 01002-5000
[email protected] For an initial submission, please send a pdf file as an email attachment to:
[email protected]. (Pdf is the only electronic file format we accept.) Please put “Submission to the Monthly” in the subject line, and include the title of the paper and the name and postal address of the corresponding author in the body of your email. If submitting more than one paper, send each in a separate email. In lieu of a pdf, an author may submit a single paper copy of the manuscript, printed on only one side of the paper. Manuscript pages should be numbered, and left and right margins should be at least one inch wide. Authors who use LATEX are urged to use article.sty, or a similar generic style, and its standard environments with no custom formatting. See recent articles in the M ONTHLY for the style of citations for journal articles and books. Follow the link to Electronic Publication Information for authors at http://www.maa.org/pubs/monthly.html for information about figures and files as well as general editorial guidelines. Letters to the Editor on any topic are invited. Comments, criticisms, and suggestions for making the M ONTHLY more lively, entertaining, and informative are welcome. The online M ONTHLY archive at www.jstor.org is a valuable resource for both authors and readers; it may be searched online in a variety of ways for any specified keyword(s). MAA members whose institutions do not provide JSTOR access may obtain individual access for a modest annual fee; call 800-3311622. See the M ONTHLY section of MAA Online for current information such as contents of issues and descriptive summaries of forthcoming articles: http://www.maa.org/
Proposed problems or solutions should be sent to: DOUG HENSLEY, M ONTHLY Problems Department of Mathematics Texas A&M University 3368 TAMU College Station, TX 77843-3368 In lieu of duplicate hardcopy, authors may submit pdfs to
[email protected].
Advertising Correspondence: MAA Advertising 1529 Eighteenth St. NW Washington DC 20036 Phone: (866) 821-1221 Fax: (866) 387-1208 E-mail:
[email protected] Further advertising information can be found online at www.maa.org Change of address, missing issue inquiries, and other subscription correspondence: MAA Service Center,
[email protected] All at the address: The Mathematical Association of America 1529 Eighteenth Street, N.W. Washington, DC 20036 Recent copies of the M ONTHLY are available for purchase through the MAA Service Center.
[email protected], 1-800-331-1622 Microfilm Editions: University Microfilms International, Serial Bid coordinator, 300 North Zeeb Road, Ann Arbor, MI 48106. The AMERICAN MATHEMATICAL MONTHLY (ISSN 0002-9890) is published monthly except bimonthly June-July and August-September by the Mathematical Association of America at 1529 Eighteenth Street, N.W., Washington, DC 20036 and Montpelier, VT, and copyrighted by the Mathematical Association of America (Incorporated), 2010, including rights to this journal issue as a whole and, except where otherwise noted, rights to each individual contribution. Permission to make copies of individual articles, in paper or electronic form, including posting on personal and class web pages, for educational and scientific use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear the following copyright notice: [Copyright the Mathematical Association of America 2010. All rights reserved.] Abstracting, with credit, is permitted. To copy otherwise, or to republish, requires specific permission of the MAA’s Director of Publications and possibly a fee. Periodicals postage paid at Washington, DC, and additional mailing offices. Postmaster : Send address changes to the American Mathematical Monthly, Membership/Subscription Department, MAA, 1529 Eighteenth Street, N.W., Washington, DC, 20036-1385.
A Dynamical System Using the Vorono¨ı Tessellation Natalie Priebe Frank and Sean M. Hart
1. INTRODUCTION. Suppose you know the locations of post offices or cell phone satellites, and you want to know what regions they serve. Or maybe you know the locations of atoms in a crystal, and you want to know what a fundamental region looks like. There are lots of reasons you might want to make a tiling around a given discrete set of points. A natural way to do it is with “Vorono¨ı tessellations”—so natural, in fact, that it has been rediscovered numerous times over the years. On the other hand, if you have a tiling, you might want to decorate each tile with a few points to create or destroy symmetry. Or you might look at all the vertices of the tiling—points where three or more tiles meet—to extract combinatorial information. If you have a tiling, there are many ways to obtain a point set from it. So, you can get tilings from point sets and point sets from tilings: doesn’t this give you a way to associate point sets to point sets or tilings to tilings? Once you have a map from a class of objects back to itself, you can take a dynamical systems viewpoint to analyze the situation. In this paper we are going to do exactly that, with a new dynamical system based on the vertices of Vorono¨ı tessellations. For those uninitiated with the Vorono¨ı tessellation, we begin with its definition and then give the definition of our dynamical system. From there, the remainder of §1 is spent exploring the evolution of simple point sets, using these simplified examples to develop both the intuition and vocabulary needed for more interesting cases. In §2 we give a new proof of a theorem, first proved in [4], quantifying the growth in size of point sets over repeated iteration. Following that we will point out some interesting corollaries and give some estimates on the growth rate. We devote §3 to discussing what questions interest us most from the dynamical systems viewpoint. For now, let’s turn to the definitions. 1.1. Our Dynamical System. We start with a finite point set P ⊂ R2 , which we call the generating set, the members of which we refer to as the generators. (We relax the assumption that P be finite and/or planar in §3.) The Vorono¨ı polygon of a point p ∈ P, denoted V ( p), is given by V ( p) = x ∈ R2 | || p − x|| ≤ || p − x|| for all p ∈ P . Simply put, the Vorono¨ı polygon of p contains every point in the plane that is closer to p than to any other member of P, or is equidistant between p and a nearby generator point. The Vorono¨ı tessellation of P is given by ϒ(P) = {V ( p) | p ∈ P} . A point set and its Vorono¨ı tessellation are given in Figure 1. Given distinct p, p ∈ P, the sets V ( p) and V ( p ) are not necessarily disjoint; in fact, the boundary of each Vorono¨ı polygon is shared with other Vorono¨ı polygons. If doi:10.4169/000298910X476022
February 2010]
A DYNAMICAL SYSTEM USING THE VORONO¨I TESSELLATION
99
(a)
(b)
Figure 1. A point set P, its Vorono¨ı tessellation, and ν(P).
V ( p) ∩ V ( p ) is a line, ray, or line segment, we call it a Vorono¨ı edge and denote it e p, p . If the intersection of three or more Vorono¨ı tiles is a point, we call that point a Vorono¨ı vertex. The sets of all Vorono¨ı edges and vertices are denoted by E (ϒ(P)) and V (ϒ(P)), respectively. It is useful to notice that e p, p always lies on the perpendicular bisector of the line between p and p . This gives a method for constructing Vorono¨ı diagrams: for p ∈ P, sketch each perpendicular bisector between p and another member of P. Then the Vorono¨ı polygon V ( p) is the intersection of all half-planes created by the perpendicular bisectors. It is also useful to notice that a Vorono¨ı vertex is equidistant from the generator points of the tiles it is in. For proofs of these properties and a wealth of other information about Vorono¨ı tessellations, [7] is an excellent source. Now, the set V (ϒ(P)) constitutes a point set in its own right, and so one might naturally wonder what its Vorono¨ı tessellation looks like. And we need not stop there— the Vorono¨ı tessellation of V (ϒ(P)) will have a vertex set, too, so how does its Vorono¨ı tessellation behave? We have the makings of a dynamical system on the set P (R2 ) of all finite point sets in the plane. Definition 1.1. Let P ∈ P (R2 ). We define the Vorono¨ı iteration1 of P to be ν(P) = V (ϒ(P)). For n = 2, 3, . . . we define the nth Vorono¨ı iteration of P recursively: ν n (P) = V ϒ ν n−1 (P) = ν ν n−1 (P) . Figure 1(b) depicts the Vorono¨ı tessellation of the six points depicted in 1(a), and so the lighter seven points—the vertices of the tessellation—constitute the Vorono¨ı iteration of the original set. Let’s begin looking at the simplest cases this dynamical system has to offer. 1.2. The Really Trivial Cases. We follow the convention of using |P| to denote the cardinality of P. If |P| = 1, then the single Vorono¨ı polygon is all of R2 . Since R2 is just one big tile, it has no vertices, and so ν n (P) = ∅ for all n ≥ 1. Next, we up the 1 This is not to be confused with Lloyd’s algorithm from computer graphics (see [3] and references therein), which is sometimes also called “Vorono¨ı iteration.”
100
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
ante: suppose P = { p1 , p2 }. The Vorono¨ı tessellation then fractures the plane into two half-planes, split along the perpendicular bisector of p1 p2 , the line segment joining p1 to p2 . The vertex set, however, is still empty, and so ν n (P) = ∅ for all n ≥ 1. There are two possibilities when P has three points: either all three are collinear, or they lie on a circle. The former yields ν n (P) = ∅ for all n ≥ 1 and in the latter, ν(P) is a single point. No matter how large P is, the special cases of collinearity and cocircularity always work out like this: Proposition 1.2. All the points in P are collinear if and only if ν n (P) = ∅ for all n ≥ 1. All the points in P are cocircular if and only if |ν(P)| = 1 and ν n (P) = ∅ for n > 1. The first fact follows naturally from the observation that if p, q ∈ P are such that e p,q ∈ E (ϒ(P)), then e p,q lies on the perpendicular bisector of the line segment pq. When all the points are collinear, the perpendicular bisectors are all parallel and thus do not intersect to produce new vertices. Conversely, if ν(P) = ∅ then no Vorono¨ı edges intersect, which implies all Vorono¨ı edges are parallel, and hence P must be collinear. To prove the cocircularity result, we introduce the concept of an empty circle. This is a circle whose interior does not contain any generator points and whose boundary passes through three or more generator points. Such a circle gets its name since it is “empty” of generator points. Proposition 1.2 follows immediately from the next Proposition. Proposition 1.3. A point q ∈ R2 is a Vorono¨ı vertex in V (ϒ(P)) if and only if it is the center of an empty circle.2 In addition to proving Proposition 1.2, this proposition also lets us quickly find the vertex set of ϒ(P). To do so, pick a noncollinear triple p, q, r ∈ P, and look at the unique circle passing through them. If this circle is empty, then place a vertex at its center. Once you have checked every triple, every vertex will be accounted for—and you never had to sketch an edge! So, we know what happens when the point set is really small, or collinear, or cocircular. Let’s move on to . . . 1.3. A Slightly Less Trivial Case. Suppose |P| = 4. To discriminate between configurations that aren’t collinear or cocircular, we require a more sophisticated vocabulary. We say that a subset A ⊂ R2 is convex if, given any distinct x, y ∈ A, the line segment x y is contained in A. (For practice, try proving that Vorono¨ı polygons are convex!) The convex hull of a set P is defined to be the smallest convex set containing P. We write C H (P) to mean the convex hull of P, and write ∂C H (P) for its boundary. Definition 1.4. For p ∈ P, we say p is on the boundary of P and write p ∈ Bd(P) if p ∈ P ∩ ∂C H (P). Otherwise, we say p is in the interior of P and write p ∈ Int(P). In general it is clear that |Int(P)| + |Bd(P)| = |P|. 2 See
(1.5)
[7, p. 61].
February 2010]
A DYNAMICAL SYSTEM USING THE VORONO¨I TESSELLATION
101
We say that two distinct points p, p ∈ Bd(P) are neighbors on the boundary of P if pp ⊂ ∂C H (P) and no p ∈ P − { p, p } lies on pp . For noncollinear finite point sets, any point on the boundary has two distinct neighbors. It is not surprising that the Vorono¨ı tiles of boundary points would have infinite edges. We denote by E F (ϒ(P)) and E I (ϒ(P)) the sets of finite and infinite edges of ϒ(P), respectively. Proposition 1.6. Assume not all p ∈ P are collinear, and let e p, p ∈ E (ϒ(P)). Then e p, p ∈ E I (ϒ(P)) if and only if p and p are neighbors on the boundary of P. Moreover, e p, p ∈ E F (ϒ(P)) if and only if p and p are not neighbors on the boundary of P.3 This then implies |E I (ϒ(P))| = |Bd(P)|.
(1.7)
Returning to the case of |P| = 4, let us assume that the points in P are neither collinear nor cocircular. We may have |Bd(P)| = 3 or 4, and we are going to prove that Figure 2 represents the only possible types of Vorono¨ı iterations.
(a)
(b)
Figure 2. Configurations satisfying (a) |Bd(P)| = 3 and (b) |Bd(P)| = 4.
First let’s assume |Bd(P)| = 3. Since |Int(P)| + |Bd(P)| = |P| = 4, say Int(P) = { p}. Proposition 1.6 tells us that each of the edges of its Vorono¨ı polygon V ( p) must be finite. As |P − { p}| = 3, there can be at most three edges; since V ( p) must be bounded, there must be exactly three edges. So V ( p) is a triangle, and the three infinite edges promised by Proposition 1.6 extend from its vertices. Since |ν(P)| = 3, we conclude that |ν 2 (P)| = 1 and ν n (P) = ∅ for all n ≥ 3. Now let’s assume that |Bd(P)| = 4 and that P is neither collinear nor cocircular. We will prove that |ν(P)| = 2. By Proposition 1.3 we know that |ν(P)| > 1. To prove that |ν(P)| ≤ 2, we use a trick that will come in handy again, during the proof of our main theorem. For q ∈ ν(P), let ρ(q) be the degree of the vertex q: the number of edges touching q. Then we have ρ(q) ≥ 3 for each q ∈ ν(P). The sum of the degrees of all the vertices therefore is greater than or equal to 3 · |ν(P)|. Counting edges instead we see 3 Follows
102
from [7, p. 59].
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
that each infinite edge touches exactly one vertex but each finite edge touches exactly two vertices. Thus ρ(q) = |E I (ϒ(P))| + 2 · |E F (ϒ(P))| . 3 · |ν(P)| ≤ q∈ν(P)
Since there are only two pairs of points in P that are not neighbors, Proposition 1.6 implies |E F (ϒ(P))| ≤ 2. Thus 3 · |ν(P)| ≤ 4 + 2 · 2 = 8, and so |ν(P)| ≤ 2, proving that in this case, |ν(P)| = 2. Moreover we can conclude that ν n (P) = ∅ for all n > 1. 1.4. The Rest of the Cases Are All Hard. The case |P| = 4 turns out to be the last case for which ν n (P) is guaranteed to equal the null set for large enough n. Indeed, the case |P| = 5 has resisted our attempts to understand it! Figure 3 depicts configurations of larger cardinality where the point set does not iterate to a simpler configuration, which in turn enriches (and complicates) matters. We must suspend our case-by-case analysis at this point.
(a)
(b)
Figure 3. Configurations for which |ν(P)| is (a) equal to or (b) greater than |P|.
2. HOW BIG IS ν(P)? Having played with the dynamical system for a while, we, along with our colleagues Allison Edgren and Olivia Gillham, decided to explore the question of what happens to the size of ν n (P) as n goes to infinity. We had some success: a formula [4] (unpublished) that tells us exactly how many points will be in ν(P)! We are going to share this result with you as soon as we describe one more essential piece of geometric information. 2.1. Relating Degeneracy and Cocircularity. Compare the configuration given in Figure 3(a) with those sketched in Figure 4. Each is a configuration of five points, yet each has a distinctly different iteration. We already have the vocabulary necessary to distinguish the configuration sketched in Figure 4(a) from the others: they differ with respect to the number of points on the boundary. The difference between the other two, however, is more subtle. A meticulous reader might notice that all of the vertices in Figure 3(a) are of degree three while there is a vertex in 4(b) with degree four. A Vorono¨ı diagram with a vertex February 2010]
A DYNAMICAL SYSTEM USING THE VORONO¨I TESSELLATION
103
(a)
(b) Figure 4. Two configurations of 5 points.
of degree greater than three is called degenerate; if all the vertices of the Vorono¨ı diagram are degree three we call it nondegenerate. We can quantify this concept of degeneracy with the number Ic (P): Definition 2.1. Let {C1 , . . . , Ck } be the set of all empty circles of a point set P. The number of instances of cocircularity of P is given by k Ic (P) = |Ci ∩ P| − 3 . i=1
In a nondegenerate point set, every vertex will be of degree three, and so every empty circle will intersect exactly three points in P. In such a case, we compute Ic (P) = 0. When a vertex q is of degree k > 3, this implies that |Cq ∩ P| = k. In this case, we say that q contributes (k − 3) instances of cocircularity. This argument proves the following: Proposition 2.2. Let ρ(q) denote the degree of a vertex q ∈ ν(P). Then Ic (P) = (ρ(q) − 3) . q∈ν(P)
As an aside, we note that another way to think about instances of cocircularity involves Delaunay (pre)-triangulations. These tilings arise by connecting the generators whose Vorono¨ı polygons share an edge. If the generator set is nondegenerate, then the resulting graph is a triangulation called the Delaunay triangulation; when the generator set is degenerate, however, there exist Delaunay tiles that are not triangles. The graph is then called the Delaunay pre-triangulation, and the number of edges we need to add to obtain a triangulation is precisely the number of instances of cocircularity. When given a degenerate configuration, one can obtain a nondegenerate point set by simply shifting any cocircular points by an arbitrarily small amount; further, when given a nondegenerate configuration, it remains nondegenerate under sufficiently small perturbations. Hence, when studying or applying Vorono¨ı tessellations it is customary to ignore degenerate configurations: the modified nondegenerate configuration is usually “close enough” to the original, and if the points are collected from real-world instruments we can never assume that our measurements are exact anyway. 104
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
If only we could make such an assumption! In our system, computing Ic (P) is usually only possible if you already know where the vertices are, and, as we shall see, incorporating degeneracy makes determining the long-term behavior much more difficult. But we cannot simply throw out these configurations; there exist situations, like that which is depicted in Figure 5, that ruin things for us. A nondegenerate point set may iterate to a degenerate one, and so assuming P is nondegenerate does not guarantee ν(P) is as well.
(a)
(b) Figure 5. Ic (P) = 0 while Ic (ν(P)) = 1.
2.2. The Theorem on Counting Vertices. We are ready to state and prove the theorem discovered by the second author and his colleagues in [4] (unpublished). The proof we present here, due only to the second author, is much simpler than the original. Theorem 2.3. If not all p ∈ P are collinear, then |ν(P)| = 2 · |P| − |Bd(P)| − Ic (P) − 2. Proof. We can use the trick from §1.3, summing the degrees of the vertices in ν(P). Since P is noncollinear, every infinite edge intersects exactly one vertex and every finite edge intersects exactly two. Hence, recalling from §1.3 that |E I (ϒ(P))| = |Bd(P)|, we get ρ(q) = |E I (ϒ(P))| + 2 · |E F (ϒ(P))| = |Bd(P)| + 2 · |E F (ϒ(P))|. q∈ν(P)
Euler’s formula for finite planar graphs4 tells us that V − E + F = 1. We can apply Euler’s formula to ϒ(P) by ignoring the infinite edges and infinite Vorono¨ı polygons. Doing this yields |ν(P)| − |E F (ϒ(P))| + |Int(P)| = 1, and so |E F (ϒ(P))| = |ν(P)| + |Int(P)| − 1. Thus ρ(q) = |Bd(P)| + 2 · (|ν(P)| + |Int(P)| − 1) . q∈ν(P)
Recalling from §1.3 that |P| = |Bd(P)| + |Int(P)|, we have ρ(q) = 2 · |ν(P)| + 2 · |P| − |Bd(P)| − 2. q∈ν(P) 4 See
[5, p. 5].
February 2010]
A DYNAMICAL SYSTEM USING THE VORONO¨I TESSELLATION
105
On the other hand, by Proposition 2.2, ρ(q) = 3 · |ν(P)| + Ic (P). q∈ν(P)
Thus 2 · |ν(P)| + 2 · |P| − |Bd(P)| − 2 = 3 · |ν(P)| + Ic (P), and so, solving for |ν(P)|, we find |ν(P)| = 2 · |P| − |Bd(P)| − Ic (P) − 2, as desired. From this and our preliminary analysis in §1, several interesting observations may be made: Corollary 2.4. (i) (ii) (iii) (iv)
If |P| < 5 then ν n (P) = ∅ for some n ∈ N. If |P| = 5 then either ν n (P) = ∅ for some n ∈ N or |ν n (P)| = 5 for all n ∈ N. If |ν(P)| > |P| then |P| > 5. If |Int(P)| = 2 and Ic (P) = 0, then |ν(P)| = |P|.
Part (ii) of this corollary suggests that the case when |P| = 5 may be of special interest, not only because it marks a transition in the behavior of the system, but because it forms a natural subset of phase space that is closed under Vorono¨ı iteration. We discuss this further in §3. Theorem 2.3 alone is not enough to predict the long-term behavior of our system, since computing |ν n (P)| requires that we can also compute |Bd(ν n−1 (P))| and Ic (ν n−1 (P)). We have been able to find bounds on the size of ν n (P), and we present one of interest next. 2.3. Bounds on the Size of νn (P). With no assumptions on the geometry of ν n (P) we can give an upper bound on the size of ν n (P), which we conjecture to be sharp. Proposition 2.5. For all n > 0, we have |ν n (P)| ≤ 2n (|P| − 5) + 5. Proof. By induction on n. For n = 1, plugging the inequalities |Bd(P)| ≥ 3 and Ic (P) ≥ 0 into Theorem 2.3 immediately gives us what we want. Proceeding inductively, we get |ν n (P)| = 2 · |ν n−1 (P)| − |Bd(ν n−1 (P))| − Ic (ν n−1 (P)) − 2 ≤ 2 · |ν n−1 (P)| − 5 ≤ 2 · 2n−1 · (|P| − 5) + 5 − 5 = 2n · (|P| − 5) + 5, which completes the proof. We have also been able to find an upper bound on the size of the boundary of ν(P). Theorem 2.6. The number of points on the boundary of ν(P) does not exceed |Bd(P)|. 106
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
We shall need the following lemma: Lemma 2.7. The interior angle for any vertex in a Vorono¨ı polygon is strictly less than π. Proof. Let p ∈ P generate the Vorono¨ı polygon V ( p) ∈ ϒ(P). Any vertex of V ( p) lies at the intersection of two Vorono¨ı edges e p, p and e p, p , generated by distinct p , p ∈ P. From the argument given in §1.3, we deduce that V ( p) is convex, and hence immediately have that the interior angle made by these edges must be less than or equal to π. Seeking a contradiction, assume that this angle is equal to π. By the perpendicular bisector property from §1.2, we may obtain p by reflecting p across e p, p and p by reflecting p across e p, p . But then p = p , contradicting the assumption that p and p are distinct. Proof of Theorem 2.6. We have two cases: when all q ∈ ν(P) are collinear and when they are not. For the former, since any Vorono¨ı tessellation is connected we must have each vertex joined to its neighbor(s). Since each vertex must at least be of degree three, there must be at least |ν(P)| + 2 infinite edges, implying that |Bd(ν(P))| = |ν(P)| < |ν(P)| + 2 ≤ |E I (ϒ(P))| = |Bd(P)|, as needed. To prove the result for noncollinear vertex sets, we seek to show |Bd (ν(P))| ≤ |E I (ϒ(P))| by proving that every q ∈ Bd (ν(P)) intersects at least one infinite Vorono¨ı edge. This, with the equality |E I (ϒ(P)) | = |Bd(P)| from Proposition 1.6, gives us what we want. Since q ∈ Bd (ν(P)), we may find neighbors q , q ∈ Bd(ν(P)) of q on the boundary of the convex hull of ν(P). Define H (q, q ) to be the closed half-plane containing ν(P) that is bounded by the line through q and q , and similarly define H (q, q ); these half-planes exist by the properties of the convex hull. Thus ν(P) is contained in the intersection H (q, q ) ∩ H (q, q ), which we denote H . Find a point p ∈ P such that q ∈ V ( p) and V ( p) has a nontrivial intersection with the complement of H , and let e ∈ E (ϒ(P)) be an edge of V ( p) that touches q. If e is infinite, then we are done. If e terminates in a vertex, then by the convexity of H we have that e is contained in H . Let e ∈ E (ϒ(P)) be the other edge of V ( p) that touches q. By Lemma 2.7, the interior angle made at q must be strictly less than π. Since V ( p) must have a nontrivial intersection with the complement of H , we find that e cannot be contained in H since the convexity of H would force the interior angle made at q to be greater than or equal to π. Thus it cannot terminate in a vertex, and instead is infinite, as desired. Note that the size of the boundary does not steadily decrease to three in all cases. There seem to be configurations with boundaries whose sizes stay stable over many iterations. Theorems 2.3 and 2.6 combine to give a lower bound on the size of ν n (P) in the generic situation where there are no instances of cocircularity. Theorem 2.8. Let P ∈ P (R2 ) and N ∈ N. Suppose that for all n = 0, 1, 2, . . . , N − 1, Ic (ν n (P)) = 0. Then |ν N (P)| ≥ 2 N |P| − (2 N − 1) (|Bd(P)| + 2). A point set P with |Int(P)| > 2 and with Vorono¨ı iterations that never contain instances of cocircularity will have exponential growth on the order of 2n . (If |Int(P)| ≤ 2 then the size does not increase). Since having instances of cocircularity is a rare occurrence, we conjecture that such a point set exists. We show a typical situation in Figure 6: a few iterates of a randomly-selected point set of size 9. As Theorem 2.3 February 2010]
A DYNAMICAL SYSTEM USING THE VORONO¨I TESSELLATION
107
predicts, we see the size going to 13, then 21 points; if there continues not to be any cocircularity then the growth will escalate rapidly: the next four iterations contain 37, 69, 133, and then 261 points, respectively. 50
50
40 30
0
20 10 –50
0 –10 –20
0
10
20
30
40
50
60
–100 –20
70
0
20
40
60
80
100
120
140
200 150 100 50 0 –50 –100 –150 –200 –150 –100 –50
0
50
100 150 200 250 300
Figure 6. A few iterations of a point set with no instances of cocircularity.
3. SO WHAT’S NEXT? QUESTIONS FROM THE DYNAMICAL SYSTEMS VIEWPOINT. In dynamical systems terminology, P (R2 ) is called the phase space of the system, and elements of P (R2 ) are states of the system that evolve over time according to the map ν. The orbit or trajectory of a state P is defined to be the sequence {P, ν(P), ν 2 (P), . . . } ([1] and [6] are good references). So far we have only talked about one property of an orbit: the growth rate of |ν n (P)| over time. But there are many other interesting questions we can ask about this dynamical system. 3.1. What Is the Right Topology? In order to use the machinery of modern dynamics, we ought to have a topology on P (R2 ) making Vorono¨ı iteration continuous, at least some of the time. We need this to study major dynamical features such as recurrence—how orbits return to open sets over time—or topological entropy—a measure of the tendency of the system to become disordered. Finding the right topology is of great importance, but it has proved to be an interesting problem in its own right. We will mention some of the subtlety here. To begin, notice that the scale of the diagrams in Figure 6 increases with each iteration. We want our topology to care about the shape of the Vorono¨ı polygons rather than their size. Similar triangles produce similar Vorono¨ı diagrams, so the topology ought to respect that. Let us be more precise. Definition 3.1. A similarity transformation t ∈ Sim(2) is a map of the form t (x) = kU x + x0 , 108
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
where x, x0 ∈ R2 , U is a 2 × 2 orthogonal matrix, and k ∈ R+ . For P, Q ⊆ P (R2 ), we say P is similar to Q, written P Q, if there exists t ∈ Sim(2) such that t (P) = Q. A similarity transformation is a composition of translations, rotations, reflections, or dilatations of Euclidean space. One can show that Sim(2) forms a group under the composition of functions and that the relation is an equivalence relation. Similarity transformations play nice with our dynamical system: Theorem 3.2. For t ∈ Sim(2), we have t (ν(P)) = ν(t (P)). Proof. Let q ∈ ν(P). By Proposition 1.3, there is a unique empty circle Cq centered at q. Since similarity transformations preserve circles, t maps Cq to an empty circle centered at t (q). Thus t (q) ∈ ν (t (P)), and so t (ν(P)) ⊂ ν (t (P)). By similar logic, since t (P) is a point set and t −1 is a similarity transformation, we have t −1 (ν (t (P))) ⊂ ν t −1 (t (P)) = ν(P). Thus ν (t (P)) ⊂ t (ν(P)), and so we have equality. The right topology should identify similar point sets, so we aren’t really looking at P (R2 ) but rather the quotient of P (R2 ) under similarity. On the other hand, forgetting the question of similarity, we do have an intuitive idea of what it means for point sets P and Q in P (R2 ) to be “close.” In fact there are already metrics, for instance the Hausdorff metric, for this. The problem is that we need our metric to respect Vorono¨ı iteration. Suppose we draw little -balls around all the points of P, and discover that each ball contains exactly one point of Q. If so, it is very likely that ν(P) and ν(Q) are close as well. (The obvious exception is with point sets that have nontrivial cocircularity— jiggling the points a little bit destroys the cocircularity and thus produces extra vertices in the Vorono¨ı iteration.) This idea for a metric is promising but has a serious flaw: it can’t compare sets that aren’t the same cardinality. Since there are sets of the same cardinality that iterate to sets of different cardinalities, it would be nice to have the ability to consider whether those iterates are close. Unfortunately, trying to measure distances between sets of different cardinalities opens Pandora’s box. If several points of Q are inside the -ball around a point of P, we can create all sorts of bizarre patterns in ν(Q) that may be quite dissimilar to ν(P). We find ourselves unsure how to resolve these issues. 3.2. Are There Periodic Points? For k ∈ N, we say P is a period-k point if P
ν k (P). As a special case, if k = 1 then P ν(P) and we call P a fixed point. The orbit of a periodic point repeats itself over and over again, and can be thought of as finite (modulo similarity). Periodic orbits are of substantial interest in dynamical systems theory: if periodic orbits are dense in phase space it is an indication of chaos. Despite our attempts so far, we have not been able to find any finite periodic point set P. Curiously, it’s not difficult to find an infinite set P of period 1 or 2! In Figure 7(a), the square lattice iterates to a shifted copy of itself and so has period 1; in Figure 7(b), points evenly spaced on the diagonals y = ±x iterate to points on the x- and y-axes, which then iterate back for a period of 2. In both examples one sees a high degree of cocircularity and we assume that this may be a key component in finding examples with other periods as well. Once a fixed (or periodic) point has been located, we may ask whether it is attracting, repelling, or hyperbolic. We look at “nearby” points (again the need for a February 2010]
A DYNAMICAL SYSTEM USING THE VORONO¨I TESSELLATION
109
(a)
(b)
Figure 7. Infinite configurations which are (a) period-1 and (b) period-2.
well-defined notion of distance!) and look at what their orbits do. If all nearby points come closer and closer to P, it is attracting; if they all get pushed away we call it repelling. If there is a mixture it may be hyperbolic. There’s some indication that the grid in Figure 7(a) may be hyperbolic: we know it repels grids with “defects” like the one in Figure 8. However, if we shift an entire row of points up by a fixed amount, the orbit will be pulled back towards the orbit in 7(a).
Figure 8. Introducing a defect into the square lattice of 7(a).
3.3. Sensitive Dependence on Initial Conditions. This is the so-called “butterfly effect,” where a small change in an initial state can produce a large change in its orbit. Even though we lack a metric on our phase space P (R2 ), there is evidence of sensitive dependence on initial conditions. For instance, if we shift a point on the grid in Figure 7(a), it introduces a defect in the Vorono¨ı iteration, which we see in Figure 8. As the system evolves, that defect will grow to include ever larger regions of the plane until the original grid is no longer recognizable. One can see this effect in finite configurations as well. Consider the set depicted in Figure 4(b). Since |ν(P)| = 4 we know it is doomed to iterate to the empty set. However, if we were to move any one of the four cocircular points even the slightest amount, we would see the single vertex of degree four become two vertices of degree three. This five-point configuration has a chance of iterating to five-point sets indefinitely. 110
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
3.4. Can We Go Backwards? A sensible question to ask is, what happens when we try to invert the system? Most discrete point sets in the plane are not the vertex set of a Vorono¨ı diagram. So we must ask first when a preimage of a point set exists, i.e., given a set Q, does there exist a set P such that ν(P) = Q? If there is a preimage, when is it unique? When it is not unique, how many preimages are possible, and is there a “best” one? Some literature exists about inverting Vorono¨ı tessellations (see, e.g., [2, 8]), and we know relatively straightforward conditions that must be satisfied for a preimage to exist. What is more subtle in our situation is that we don’t have the whole Vorono¨ı diagram—we just have its vertices. Without the edge adjacencies the problem becomes richer. For example, in Figure 9 we depict two preimages of the lighter colored set of six points, with distinctly different Vorono¨ı diagrams.
(a)
(b)
Figure 9. A configuration of six points with two nontrivially different preimages.
It would be nice if there was a subset of P (R2 ) that was invariant under Vorono¨ı iteration both forward and backwards in time. We conjecture that there is a subset S ⊂ P (R2 ) of five-point configurations with 2 in the interior and 3 on the boundary for which (1) the set is invariant under forward Vorono¨ı iteration, and (2) there exist preimages of all orders within the set. If we could identify criteria for membership in S , then we could begin to examine the orbits of S for evidence of chaos. 3.5. Generalizations. In §3.2 we mentioned generalizing the Vorono¨ı iteration to infinite point sets. However, this allows for a host of strange phenomena and so we must restrict our point sets somewhat. For example, we might require that the derived set— the set of all accumulation points—of P be empty. Or we might assume that P is a Delone set; i.e., both relatively dense and uniformly discrete. Of course for a property to have any use to us, it must be preserved under iteration. One can prove that the derived set of ν(P) is empty whenever the derived set of P is empty, while the property of being Delone is actually not preserved under iteration. We conclude that a natural phase space consists of all generator sets with empty derived sets. And what if we were to no longer work on the plane? The sphere with the great circle metric immediately comes to mind. All Vorono¨ı polygons are bounded, and so infinite edges are no longer an issue; this gives rise to a more elegant version of Theorem 2.3. Finite periodic point sets are suddenly easy to find: each Platonic solid iterates to its dual. Further, one can prove that a set of noncoplanar points will never iterate to the empty set. And this is all besides the fact that we humans live on a sphere, February 2010]
A DYNAMICAL SYSTEM USING THE VORONO¨I TESSELLATION
111
so maybe the system has relevance to real-world problems such as the distribution of cell-phone satellites or the spread of diseases. The fact that one can define the Vorono¨ı tessellation for a set of points in any metric space means that vast generalizations are possible, whether it be to infinite sets, a higher dimension, a new metric, or a less geometric setting. Part of the appeal of this dynamical system is its intuitive definition—a definition that depends little on the size of the point sets or the space they are in. We urge interested readers to think about how the ideas in this paper play out in their favorite metric spaces! 3.6. Conclusion. We admit that we don’t know much about the behavior of this dynamical system yet. We know exactly what happens for small point sets, but as soon as there are five or more points we encounter difficulties. We have a theorem [4] that measures how the point sets grow or shrink in size, but it requires geometric information that isn’t always readily available. We have evidence that many point sets grow without bound, but have been unable to determine which conditions guarantee this. We’ve noticed evidence of sensitive dependence on initial conditions, but we don’t have the machinery to measure the phenomenon. In the course of our study we’ve developed tools to help us with our insight on this finicky dynamical system, and we’ve shared some of that here. There is one thing we know for certain: plenty of problems remain. Some are simple enough to be studied by budding mathematicians, and some may be subtle enough to interest their advisors as well. REFERENCES 1. K. T. Alligood, T. D. Sauer, and J. A. Yorke, Chaos: An Introduction to Dynamical Systems, SpringerVerlag, New York, 1997. 2. P. F. Ash and E. D. Bolker, Recognizing Dirichlet tessellations, Geom. Dedicata 19 (1985) 175–206. doi: 10.1007/BF00181470 3. Q. Du, M. Emelianenko, and L. Ju, Convergence of the Lloyd algorithm for computing centroidal Vorono¨ı tessellations, SIAM J. Numer. Anal. 44 (2006) 102–119. doi:10.1137/040617364 4. A. Edgren, O. Gillham, and S. M. Hart, On a dynamical system based on Vorono¨ı tessellations, 2005 URSI Undergraduate Project Report (advisor: Natalie Priebe Frank), available at http://math.vassar.edu/Faculty/Frank/NPFpublications.html. 5. M. Henle, A Combinatorial Introduction to Topology, Dover, New York, 1994. 6. A. Katok and B. Hasselblatt, Introduction to the Modern Theory of Dynamical Systems, Encyclopedia of Mathematics and its Applications, vol. 54, Cambridge University Press, Cambridge, 1995. 7. A. Okabe, B. Boots, K. Sugihara, and S. N. Chiu, Spatial Tessellations: Concepts and Applications of Vorono¨ı Diagrams, 2nd ed., John Wiley, Chichester, UK, 2000. 8. F. P. Schoenberg, T. Ferguson, and C. Li, Inverting Dirichlet tessellations, Comput. J. 46 (2003) 76–83. doi:10.1093/comjnl/46.1.76
NATALIE PRIEBE FRANK received her B.S. in mathematics from Tulane University in 1991. She thought up the Vorono¨ı dynamical system while working on her Ph.D., which she received from the University of North Carolina at Chapel Hill in 1997. She currently teaches at Vassar College, where she was delighted to recruit her coauthor for mathematical duty after his freshman year. Department of Mathematics, Vassar College, Poughkeepsie, NY 12604
[email protected]
SEAN M. HART received his B.A. in mathematics from Vassar College in 2008. He is currently working on his Ph.D. at the University of Texas at Austin with the hope that one day he, too, will be struck with a great idea. He will always be grateful to his coauthor and the rest of the faculty at Vassar for nurturing his mathematical abilities. Department of Mathematics, The University of Texas at Austin, 1 University Station, C1200, Austin, Texas 78712
[email protected]
112
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
Variations on Kuratowski’s 14-Set Theorem David Sherman
1. INTRODUCTION. The following result, from Kuratowski’s 1920 dissertation, is known as the 14-set theorem. Theorem 1.1. [8] Let E ⊆ X be a subset of a topological space. The number of distinct sets which can be obtained from E by successively taking closures and complements (in any order) is at most 14. Moreover, there are subsets of the Euclidean line for which 14 is attained. In this article we will see what happens when “closure” and “complement” are replaced or supplemented with other basic topological operations. Question 1.2. Let I be a subcollection of {closure, interior, complement, intersection, union}. What is the maximum number of distinct sets which can be generated from a single subset of a topological space by successive applications of members of I ? Apparently Question 1.2 will require us to solve 25 = 32 different problems . . . well, not really. Many of these are redundant, either because different choices of I allow us to perform the same operations, or because different choices of I raise algebraically isomorphic questions. (This is explained in Section 4.) Theorem 1.1 answers at least one case, and certainly many others are trivial. Finally, an example of Kuratowski shows that if we allow all five operations, we may obtain infinitely many sets. After a full reckoning, only two questions remain. Question 1.3. (1) What is the maximum number of distinct sets that can be generated from a fixed set in a topological space by successively taking closures, interiors, and intersections (in any order)? (2) Same question, but with closures, interiors, intersections, and unions. Question 1.3(1) was posed by Smith in a 1974 M ONTHLY as Problem 5996, with published solution given later by Yu [12]. The numerical answer to Question 1.3(2) was stated by Langford in an abstract in the 1975 Notices of the AMS [9], but it seems that no proof had appeared before the first version of the present article was circulated. Recently Gardner and Jackson published a nice article [3] which also solves Question 1.3(2) and gives a thorough discussion of many other aspects of Kuratowski-type problems. With a little additional work, we will finally solve a generalization which has not appeared elsewhere. doi:10.4169/000298910X476031
February 2010]
VARIATIONS ON KURATOWSKI’S 14-SET THEOREM
113
Question 1.4. Same as Question 1.2, with n ≥ 2 sets initially given. Our approach to this topic, like Kuratowski’s, is almost entirely algebraic. The basic language is the “topological calculus” which was developed by the Polish school during the first half of the twentieth century. Prominent figures such as Birkhoff, Stone, Halmos, and Tarski kept this program dynamic, intertwining topology with the related fields of set theory, logic, and lattice theory. (And the incorporation of Hilbert spaces opened up new realms of noncommutative analysis, with von Neumann at the center.) In the present article, the relevant topological calculus is known as a “closure algebra,” and the questions above reduce to the calculation of certain algebras generated by a specific partially ordered set. So while our subject is apparently point-set topology, membership of points in sets plays a very minor role! Theorem 1.1 is actually easy to prove, almost certainly having acquired some cachet from the unusual presence of the number 14. Readers who have never seen it before may enjoy experimenting at Mark Bowron’s website, currently hosted at http: //mathdl.maa.org/mathDL/60/?pa=content&sa=viewDocument&nodeId=3343, which allows a visitor to construct an initial set, then counts the number of sets generated by taking closures and complements. (A score of 14 “wins.”) Within the substantial literature surrounding this theorem, the idea of considering different collections of topological operations goes back at least to 1927 [13]. Other authors have abstracted the algebraic content, or isolated the specific conditions under which a topological space and subset generate 14 sets. Some of these variations are described in the last section. This article is intended for the nonspecialist in universal algebra—indeed, it was written by one. Thus we define even basic terms, and do not always give the most general formulations. It is hoped that many readers will find the methods at least as interesting as the answers. 2. MONOIDS, POSETS, AND THE PROOF OF THEOREM 1.1. We start with the basics. Let X be an arbitrary set, and let P (X ) be the set of subsets of X . To endow X with a topology means to choose a distinguished subset of P (X ), called the open sets, which is closed under arbitrary unions and finite intersections, and contains both X and the empty set. The complement of an open set is a closed set. The (topological) closure of E ∈ P (X ) is the smallest closed set containing E; the interior of E is the largest open set contained in E. Therefore the three functions “closure of,” “interior of,” and “complement of” can naturally be viewed as operations on P (X ). We will denote them by k, i, and c, respectively, and write them to the left of the set, as is usual for operators (or English sentences). We also denote the collection of maps P (X ) → P (X ) as End(P (X )) (for “endomorphisms”—but these maps need not preserve any of the algebraic structure we later attach to P (X )). Thus ki E should be read as “the closure of the interior of E.” The reader should be aware that some authors place topological operations to the right of the set, with an opposite rule for composition, and the letters k and c are sometimes switched. With regard to the latter, our choice was made with Kuratowski closure operators in mind. Definition 2.1. [8] A Kuratowski closure operator on a set X is a map k ∈ End(P (X )) which satisfies, for any E, F ∈ P (X ), (1) (2) (3) (4) 114
k∅ = ∅; kk E = k E; k E ⊇ E; k E ∪ k F = k(E ∪ F). c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
Any Kuratowski closure operator k ∈ End(P (X )) is exactly the topological closure operator for the topology on X whose open sets are {ck E | E ⊆ X } [8]. So the choice of k is equivalent to the choice of topology on X , while c is independent of topology. We write I ∈ End(P (X )) for the identity map and record the following (redundant) consequences of our definitions: k 2 = k,
c2 = I,
i = ckc,
i 2 = i,
ic = ck,
kc = ci.
(2.1)
Next we recall some definitions from algebra. A partial order on a set is a reflexive antisymmetic transitive relation. A standard example is ≤ on R, but it is not necessary that any two elements be comparable: P (X ) is a partially ordered set—a poset—with partial order given by inclusion. One notices that k and i preserve the ordering, while c reverses it. (This means, for example, that E ⊇ F ⇒ k E ⊇ k F—use Definition 2.1(4).) Now the collection of functions from any set into a poset can also be made into a poset, where one function dominates another if and only if this is true pointwise. This induces a partial order on End(P (X )): for ϕ, ψ ∈ End(P (X )), ϕ≥ψ
⇐⇒
ϕ(E) ⊇ ψ(E),
∀E ∈ P (X ).
Then item (3) of Definition 2.1 can be rewritten as k ≥ I , and evidently i ≤ I . The poset End(P (X )) is also a monoid: a set with an associative binary operation (in this case composition) and a unit. (So a monoid is a “group without inverses.”) Note that order is preserved by an arbitrary right-composition: ϕ ≥ ψ ⇒ ϕσ ≥ ψσ,
ϕ, ψ, σ ∈ End(P (X )).
Order is also preserved by left-composition with k or i, but reversed by left-composition with c. “Ordered monoid” sounds frighteningly abstract, but we will only be concerned here with subsets of End(P (X )). The advantage in the situation at hand is that we may invoke a familiar friend from group theory (or universal algebra, to those in the know): presentations. This just means that we will describe sets of operations in terms of generators and relations, as demonstrated in the following lemma. Lemma 2.2. (1) Let k, i ∈ End(P (X )) be the closure and interior operators of a topological space. Then the cardinality of the monoid generated by k and i is at most 7. (2) For a subset of a topological space, the number of distinct sets which can be obtained by successively taking closures and interiors (in any order) is at most 7. Proof. Composing k ≥ I on the left and right with i gives iki ≥ i. Composing i ≤ I on the left and right with k gives kik ≤ k. We use both of these to calculate (i)k ≤ (iki)k = i(kik) ≤ i(k) ⇒ ik = ikik; k(i) ≤ k(iki) = (kik)i ≤ (k)i ⇒ ki = kiki. Since k 2 = k and i 2 = i, the monoid generated by k and i contains exactly {I, i, ik, iki, k, ki, kik} (which may not all be distinct). This proves the first part, and the second part is a direct consequence. February 2010]
VARIATIONS ON KURATOWSKI’S 14-SET THEOREM
115
Using parentheses for “the monoid generated by,” the second sentence of Lemma 2.2(1) can be rewritten as |(k, i)| ≤ 7. Proof of Theorem 1.1. It follows from (2.1) that any word in k, i, c can be reduced to a form in which c appears either as the leftmost element only, or not at all. So by the previous lemma (k, c) = (k, i, c) = {I, i, ik, iki, k, ki, kik, c, ci, cik, ciki, ck, cki, ckik}.
(2.2)
Thus 14 is an upper bound. To conclude the proof, it suffices to exhibit a so-called (Kuratowski) 14-set: a subset of a topological space for which all of these operations produce distinct sets. One example is S = {0} ∪ (1, 2) ∪ (2, 3) ∪ [Q ∩ (4, 5)] ⊂ R. We now investigate the order structures of (k, i) and (k, i, c) a little further. Via our basic rules for order we find that i ≤ I ≤ k;
i ≤ iki ≤ [either of ki, ik] ≤ kik ≤ k.
By considering the set S we see that these (and the consequences from transitivity) are the only order relations in (k, i), at least when X contains a copy of R. The order structure of (k, i) is depicted in Figure 1, which is called the Hasse diagram of the poset. Here a segment from ϕ up to ψ means that ϕ < ψ and there is no σ satisfying ϕ < σ < ψ. (We write ϕ < ψ for ϕ ≤ ψ and ϕ = ψ.) k C CC CC
{{ {{ I
<< << << << << <<
ki
kik
CC CC { {{ {{
CC CC {{ {{
ik
iki
i Figure 1. The Hasse diagram of (k, i) for topological spaces containing a copy of R.
Again using S, we obtain that there are no necessary relations between the first seven and last seven elements of (2.2). Since left composition with c reverses order, the Hasse diagram of (k, i, c) (Figure 2) consists of two disjoint copies of Figure 1, one of which has been left-composed with c and vertically inverted. It was first drawn by Kuratowski [8]; in essence all of the arguments in this section go back to his dissertation. 3. BOOLEAN LATTICES AND THE ANSWER TO QUESTION 1.3. We have already mentioned that P (X ) is a poset. Now we want to take advantage of its richer structure as a Boolean lattice. Recall that a lattice is a poset in which any two elements have a least upper bound and a greatest lower bound. We write these binary operations as ∨ and ∧, respectively, and refer to them as join and meet. A lattice is distributive if it satisfies the equality x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z), 116
∀ x, y, z
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
k C CC CC
{{ {{ I
<< << << << << <<
ki
kik
CC CC { {{ {{
CC CC {{ {{
ik
iki
i
c
== == == == == ==
ci
FF FF F xx xx
cki
ciki
FF FF x xx xx
FF FF xx xx
cik
ckik
ck
Figure 2. The Hasse diagram of (k, i, c) for topological spaces containing a copy of R.
(or equivalently, the same equation with ∨ and ∧ everywhere interchanged). Finally, a distributive lattice is Boolean if (i) it contains a least element 0 and a greatest element 1, and (ii) for every element a there is an element b, called a complement of a, such that a ∨ b = 1 and a ∧ b = 0. Complements in Boolean lattices are unique [5, Lemma I.6.1], so that we may view complementation as a third (unary) operation. (Strictly speaking our use of operations means that we have turned our Boolean lattice into a Boolean algebra, but we will ignore the distinction.) The poset P (X ) is a Boolean lattice in which the three operations are nothing but union, intersection, and set complementation. As the collection of all functions from P (X ) into the Boolean lattice P (X ), the poset End(P (X )) also acquires structure as a Boolean lattice, with pointwise operations. For ϕ, ψ ∈ End(P (X )), E ∈ P (X ), we have (ϕ ∨ ψ)E = (ϕ E) ∪ (ψ E),
(ϕ ∧ ψ)E = (ϕ E) ∩ (ψ E).
The complement (in the Boolean sense) of ϕ is the composition cϕ. We approach Question 1.3 in the same way as Theorem 1.1, by enumerating (k, i, ∧) (respectively (k, i, ∧, ∨)), the algebra of operations in End(P (X )) which can be expressed in terms of {I, k, i, ∧} (respectively {I, k, i, ∧, ∨}). Then we identify a single set which distinguishes all the operations under consideration. Closures, Interiors, Intersections. Assuming X contains a copy of R, the order structure of (k, i) is as shown in Figure 1. A first step is to add all irredundant meets to this diagram; we start with ki ∧ ik and then notice that the meet of any two elements, both different from I , is already in our poset. It suffices to add the meet of each element with I , and since k ∧ I = I and i ∧ I = i, this gives five more elements. The resulting structure is the meet semi-lattice generated by the poset (k, i), since it has ∧ but not ∨. A diagram of this 13-element poset is given in Figure 3. By construction it is closed under ∧, and since i distributes across ∧ it is closed under i. Perhaps surprisingly, it is also closed under k. To prove this, we need to show that for each element ϕ in Figure 3, kϕ already appears in Figure 3. February 2010]
VARIATIONS ON KURATOWSKI’S 14-SET THEOREM
117
i k XXXXXXX XXXXX iiii i i i XXXXX ii i XXXXX i i i i XX i i I UUUU ff kik G f f f UUUU GG ffff f f ww UUUU f f GG ww ffff UU f f w f f f I ∧ kik ee ki G gg ik NNNeeeeeeeeee GG gggggggg ww pp eNNN e p g G w e g e p g e G w g pp eeeeeee gg w gggg ee I ∧ ki I ∧ ik ki ∧ ik f NNN fffff NNN ppp ffffffff p p f p ffff I ∧ ki ∧ ik f iki fffff fffff f f f f ff fffff I ∧ iki i Figure 3. The meet semi-lattice generated by the poset (k, i) (also the Hasse diagram of (k, i, ∧)) for topological spaces containing a copy of R.
This is clear for the seven elements from Figure 1. For the remaining elements, we start with an easy observation. Since k preserves order, for any E, F ∈ P (X ) we have k(E ∩ F) ⊆ k E, k(E ∩ F) ⊆ k F ⇒ k(E ∩ F) ⊆ k E ∩ k F. This means that k(ϕ ∧ ψ) ≤ kϕ ∧ kψ,
ϕ, ψ ∈ End(P (X )).
(3.1)
Now let σ be any of ki ∧ ik, I ∧ iki, I ∧ ki ∧ ik, and I ∧ ki. Applying (3.1) to σ and reducing gives kσ ≤ ki. But σ ≥ i, so kσ ≥ ki. We conclude that kσ = ki. It is left to consider the two elements k(I ∧ ik) and k(I ∧ kik). Applying (3.1) shows that each is ≤ kik. We claim that k(I ∧ ik) = kik, whence the larger element k(I ∧ kik) is kik as well. We calculate as follows: ik = ik ∧ k(I ) = ik ∧ k[(I ∧ ik) ∨ (I ∧ cik)] = ik ∧ [k(I ∧ ik) ∨ k(I ∧ cik)] = [ik ∧ k(I ∧ ik)] ∨ [ik ∧ k(I ∧ cik)]. Inspecting the last term, ik ∧ k(I ∧ cik) ≤ ik ∧ k(cik) = ik ∧ cik = 0. Here 0 ∈ End(P (X )) is the map which sends every set to the empty set. We may therefore omit this term from the previous equation, which gives ik = ik ∧ k(I ∧ ik) ⇒ ik ≤ k(I ∧ ik) ⇒ kik ≤ k(I ∧ ik). Since the opposite inequality was already established, the claim is proved. 118
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
It follows that (k, i, ∧) has at most thirteen elements, and it can be checked that each of the operations in Figure 3 produces a distinct set when applied to the set 1 1 T = : n ∈ N ∪ [2, 4] − 3 + : n ∈ N (3.2) n n ∞ 1 1 ,6 + ∪ (5, 7] ∩ Q ∪ 6+ . 2nπ (2n − 1)π n=1 So the answer to Question 1.3(1) is thirteen. (We remind the reader that this result first appeared as the solution to a M ONTHLY problem in 1978 [12].) Closures, Interiors, Intersections, Unions. Still assuming that R embeds in X to guarantee that (k, i, ∧) is as large as possible, our first task here is to add all irredundant joins of operations from Figure 3. Since End(P (X )) is distributive, the resulting set will be closed under joins and meets: it is in fact the distributive lattice generated by Figure 1. Let us add in the two elements ki ∨ ik and (I ∧ ki) ∨ (I ∧ ik), and partition our poset into four classes: (1) (2) (3) (4)
i, k; I; iki, ki ∧ ik, ik, ki, ki ∨ ik, kik; I ∧ iki, I ∧ ki ∧ ik, I ∧ ik, I ∧ ki, (I ∧ ki) ∨ (I ∧ ik), I ∧ kik.
It may help to notice that the third class is the right-hand five of Figure 3, plus ki ∨ ik, while the fourth class is the middle five of Figure 3, plus (I ∧ ki) ∨ (I ∧ ik). Each class above is already a sublattice of End(P (X )), so an irredundant join x1 ∨ x2 ∨ · · · ∨ xn can contain at most one x j from each class. Elements in the first class occur in no irredundant joins. The identity I cannot be involved in an irredundant join except with elements of the third class, which produces six more elements. It is left to consider joins of the third and fourth classes. Using the distributive law, this turns up 14 more elements: (I ∧ kik) ∨ ki ∨ ik, (I ∧ kik) ∨ ki, (I ∧ kik) ∨ ik, (I ∧ kik) ∨ (ki ∧ ik), (I ∧ kik) ∨ iki, (I ∧ ki) ∨ (I ∧ ik) ∨ (ki ∧ ik), (I ∧ ki) ∨ (I ∧ ik) ∨ iki, (I ∧ ki) ∨ ik, (I ∧ ki) ∨ (ki ∧ ik), (I ∧ ki) ∨ iki, (I ∧ ik) ∨ ki, (I ∧ ik) ∨ (ki ∧ ik), (I ∧ ik) ∨ iki, (I ∧ ki ∧ ik) ∨ iki. We conclude that the distributive lattice generated by the poset (k, i) has at most 35 elements, each of which is a join of elements from Figure 3. Since k distributes across joins, this set is closed under left composition with k. The roles of k and i are dual— see the next section for explanation—in the self-dual distributive lattice generated by (k, i), so it is also closed under left composition with i. Finally, fans of drudgery can check that the 35 operations are distinguished by the set T from (3.2). (Skeptics may also consult the more sophisticated example given in [3].) Therefore the answer to Question 1.3(2) is thirty-five. February 2010]
VARIATIONS ON KURATOWSKI’S 14-SET THEOREM
119
4. ANSWERS TO QUESTION 1.2. A complete answer to Question 1.2 is given in Table 1. All of the numbers ≤ 4 in Table 1 are trivial to verify, and some of the repetition is due to the fact that in the presence of c, the inclusion of k or i (respectively ∧ or ∨) is equivalent to the inclusion of k and i (respectively ∧ and ∨). Other repetition is due to duality, which we now describe. Table 1. Solution to Question 1.2. Each entry is the maximum cardinality of the algebras generated by the topological operations in its row and column, and also the maximum number of operations in these algebras which can be distinguished by a single subset.
{I }
{∧}
{∨}
{∧, ∨}
{I }
1
1
1
1
{i}
2
2
2
2
{k}
2
2
2
2
{c}
2
4
4
4
{i, k}
7
13
13
35
14
∞
∞
∞
Operations
{i, c} = {k, c} = {i, k, c}
The dual of a poset is the same underlying set, with the ordering reversed. (So its diagram is turned upside-down.) The Boolean lattice P (X ) is isomorphic with its own dual, via the complementation map c. This means that any operation ϕ on P (X ) has a dual operation, ϕ¯ = cϕcn , where by cn we mean the application of c to each of the n arguments of ϕ. The action of ϕ¯ is vertically opposite to ϕ; k and i are dual, as are ∧ and ∨. We note that the duality map distributes over composition. For suppose that ϕ has n arguments, and the jth argument is filled by a function of k j arguments, with k total arguments in the composition. (It is not necessary that k = k j , because the n functions may have some arguments in common.) Then ϕ(ψ1 , ψ2 , . . . , ψn ) = cϕ(ψ1 , ψ2 , . . . , ψn )ck = cϕ(ψ1 ck1 , ψ2 ck2 , . . . , ψn ckn ) ¯ ψ¯1 , ψ¯2 , . . . , ψ¯n ). = cϕcn (cψ1 ck1 , cψ2 ck2 , . . . , cψn ckn ) = ϕ( Thus the restriction of the duality map to operations built out of {I, i, k, c, ∧, ∨} amounts exactly to the interchanges i ↔ k and ∧ ↔ ∨. This explains, for example, why |(k, i, ∧)| = |(k, i, ∨)|. As for the assertion at the end of the previous section, just observe that for any operation ϕ in the distributive lattice generated by (k, i), iϕ = k ϕ, ¯ which belongs to the lattice since it is self-dual and closed under left-composition with k. Finally we give a simplified version of Kuratowski’s example [8] showing that (k, c, ∧) = (i, k, c, ∧, ∨) can be infinite. Define a closure operator k on P (N) by
∅, A = ∅; k(A) = (4.1) [min A, ∞) = {min A, 1 + min A, . . . }, otherwise; for any A ∈ P (N). Since k satisfies Definition 2.1, it determines a topology on N, the so-called “left order topology.” Let ϕ = I ∧ [k(k ∧ c)], and let E ⊂ N be the even 120
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
natural numbers. The reader can easily check that ϕ j (E) = E ∩ [2 j + 2, ∞), so the subset E distinguishes infinitely many operations of (i, k, c, ∧, ∨) for this topological space. Unlike our other examples, E does not distinguish all the different operations: for instance i(E) = ik(E) while i({1}) = ik({1}). The existence of a set which does distinguish all the different operations is addressed in the next section. 5. CLOSURE ALGEBRAS. We pause here for a digression which affords us a more convenient language and sharpens some of the preceding results in surprising ways. These ideas are taken from a beautiful 1944 article of McKinsey and Tarski [10]. A closure algebra is a Boolean lattice which is equipped with a closure operator k satisfying the lattice version of Definition 2.1, i.e., with ∅, ∪, and ⊇ replaced with 0, ∨, and ≥. We naturally define i as the dual operation of k. A closure algebra is singlygenerated if every element can be obtained from a certain fixed generator by some unary operation built out of {I, i, k, c, ∧, ∨}. Of particular interest is the unique (up to isomorphism) free singly-generated closure algebra [10, Theorem 5.1], which we refer to here as F . Freeness means that the set of relations is minimal. In other words, if two unary operations agree on the generator of F , they agree on every element of every closure algebra. This has the convenient consequence that inside F , we can identify elements with unary operations; F is the algebra of unary operations expressible in {I, i, k, c, ∧, ∨}. The key observation is that any closure algebra can be identified with a Boolean sublattice of some P (X ), where X is a topological space and k becomes the associated closure operator [10, Theorem 2.4]. Thus Question 1.3 asks about the cardinalities of subalgebras of F in which only some of {I, i, k, c, ∧, ∨} can be used. Here are some other striking facts about F and the theory of closure algebras [10, Theorem 5.10, Theorem 5.17, Appendix IV]. •
•
•
F is isomorphic to a sub-closure algebra of the closure algebra of the Euclidean line, so that a certain subset of the line distinguishes all unequal unary closure algebraic operations. The problem of deciding whether two expressions define the same operation in all closure algebras is “effectively solvable”: there is a decision procedure whose run time is bounded by a function of the complexity of the expressions. Whenever two expressions define the same operation in all closure algebras, there is a formal proof—an analogue for closure algebras of G¨odel’s completeness theorem.
Logicians also know closure algebras (sometimes called “interior algebras”) as metamathematical objects, since they can be used as frameworks for modal and intuitionistic logic [11, Chapter III]. 6. ANSWERS TO QUESTION 1.4. Now we alter our hypotheses by supposing that n ≥ 2 sets are initially given. The theorems of McKinsey and Tarski apply to this case as well, so that we may consider the free closure algebra generated by n elements. We will denote it as Fn , with generators {F jn }1≤ j ≤n . Remarkably, Fn also embeds in the closure algebra of the Euclidean line. At first glance Question 1.4 may seem intractable or at least extremely tedious. In the presence of ∧ or ∨ the cardinalities grow at least exponentially: for example, the two sets T × R, R × T ⊂ R2 obviously generate at least 132 distinct subsets under k, i, and ∧. It turns out, however, that all of the hard work is done, and the situation stabilizes nicely for n ≥ 2. A complete solution to Question 1.4 is given in Table 2; below we explain the key points. February 2010]
VARIATIONS ON KURATOWSKI’S 14-SET THEOREM
121
Table 2. Solution to Question 1.4. Each number is the cardinality of the subalgebra of Fn generated by the operations in its row and column.
{I }
{∧}
{∨}
{∧, ∨}
{I }
n
2 −1
2 −1
Dn
{i}
2n
3n − 1
∞
∞
2n
∞
3 −1
∞
{c}
2n
2n
2
2n
2
22
{i, k}
7n
∞
∞
∞
{i, c} = {k, c} = {i, k, c}
14n
∞
∞
∞
Operations
{k}
n
n
n
n
Using a subscript to denote the ambient algebra, we first claim that |(k, ∧)F2 | = ∞, proved in almost the same way as |F | = ∞. Let k be the closure operator of (4.1), and let E = E 0 and O be the even and odd natural numbers, respectively. For j ≥ 1, define inductively elements of the closure algebra generated by E and O by E j = E j −1 ∩ k(k E j −1 ∩ O). Then the E j = E ∩ [2 j + 2, ∞) are all distinct, establishing the claim. By duality and inclusions, this justifies every occurrence of ∞ in Table 2. The first column of Table 2 consists of algebras with unary operations only, so the results of Table 1 can be applied to one generator at a time. For the last three entries in the fourth row of Table 2, the algebra under consideration is the free Boolean algebra n with n generators, which is well known to have 22 elements [5, Theorem II.2.2(iii)]. The elements of (∧)Fn have the form F jn1 ∧ F jn2 ∧ · · · ∧ F jnk , where 1 ≤ k ≤ n and the jk ∈ {1, 2, . . . n} are distinct. Apparently they are in one-to-one correspondence with the 2n − 1 nonempty subsets of the n generators. The situation for (i, ∧)Fn is similar; now any element is a meet in which for each F jn , one of three possibilities holds: F jn is present, i F jn is present, or both are absent. (Recall that i distributes across ∧.) Since we do not admit the empty meet, this allows 3n − 1 possibilities. The other occurrences of 2n − 1 and 3n − 1 in Table 2 follow from duality. Finally, (∧, ∨)Fn is the free distributive lattice on n generators; determining its cardinality Dn is sometimes called Dedekind’s problem. No explicit formula for Dn is known, but asymptotically log2 Dn ∼ C(n, n2 ), where C(n, k) denotes the binomial coefficient and x the greatest integer ≤ x [7]. Each of the finite formulas of Table 2 extends to the case n = 1. 7. OTHER VARIATIONS. It has not been our goal to survey the wealth of literature concerning extensions of the 14-set theorem. The interested reader may consult the article [3], which takes a sophisticated algebraic approach and contains many references. Here we simply indicate some of the other directions in which the theorem has been generalized. A natural idea is to study subalgebras of F in which the generating operations include other topological operations built out of {k, ∧, c}. This need not be too abstract— for example, the operation “boundary of” (considered in Kuratowski-type theorems by many authors) is k ∧ kc. And of course many basic topological constructions are not closure algebraic. Kuratowski himself [8] noted that the operation “accumulation points of,” which sends E ∈ P (X ) to its derived set D E, is not expressible in terms of 122
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
k, ∧, and c. (Let E = {0} ∪ {1/n : n ∈ N} ⊂ R.) See [10, Appendix I] for remarks on “derivative algebra.” One may also generalize Definition 2.1 by relaxing the postulates and/or replacing P (X ) with non-Boolean lattices. Among the sizable research in this direction, we point out [6], which deals specifically with the 14-set theorem. The present article is about the algebraic content of the 14-set theorem, but there are also investigations into its topological content. We have seen that a topological space X contains 14-sets when it is sufficiently rich, in particular when it contains a copy of the Euclidean line. One can ask for characterizations of topological spaces in which certain limitations occur, and a classification along these lines was given by Aull [1]. Several authors have given necessary and sufficient conditions for a specific subset E ⊆ X to be a 14-set; Chapman [2] gave an exhaustive description of all possible degeneracies of the poset in Figure 1. Once again the reader is referred to [3] for precise statements of these and other results. ACKNOWLEDGMENTS. The author benefited from several conversations with John D’Angelo, including banter over pizza which sparked the writing of this article. Thanks are also due to Mark Bowron and the anonymous referees for useful comments on an earlier version.
REFERENCES 1. C. E. Aull, Classification of topological spaces, Bull. Acad. Polon. Sci. S´er. Sci. Math. Astronom. Phys. 15 (1967) 773–778. 2. T. A. Chapman, A further note on closure and interior operators, this M ONTHLY 69 (1962) 524–529. doi:10.2307/2311193 3. B. J. Gardner and M. Jackson, The Kuratowski closure-complement problem, New Zealand J. Math (to appear). 4. G. Gr¨atzer, Universal Algebra, Springer-Verlag, New York, 1979. 5. ———, General Lattice Theory, 2nd ed., Birkh¨auser, Boston, 2003. 6. P. C. Hammer, Kuratowski’s closure theorem, Nieuw Arch. Wisk. 8 (1960) 74–80. 7. D. Kleitman, On Dedekind’s problem: The number of monotone Boolean functions, Proc. Amer. Math. Soc. 21 (1969) 677–682. doi:10.2307/2036446 8. K. Kuratowski, L’op´eration A¯ de l’analysis situs, Fund. Math. 3 (1922) 182–199. 9. E. Langford, Problems of Kuratowski type involving unions and intersections, Notices Amer. Math. Soc. 22 (1975) A-485. 10. J. C. C. McKinsey and A. Tarski, The algebra of topology, Ann. of Math. 45 (1944) 141–191. doi: 10.2307/1969080 11. H. Rasiowa and R. Sikorski, The Mathematics of Metamathematics, 3rd ed., Monografie Matematyczne, no. 41, PWN–Polish Scientific Publishers, Warsaw, 1970. 12. C. Y. Yu, Solution to Problem 5996, this M ONTHLY 85 (1978) 283–284. doi:10.2307/2321184 13. M. Zarycki, Quelques notions fondamentales de l’Analysis Situs au point de vue de l’Alg`ebre de la Logique, Fund. Math. 9 (1927) 3–15. DAVID SHERMAN is assistant professor of mathematics at the University of Virginia. He received his Ph.D. from UCLA in 2001 and works primarily in operator algebras and related areas of functional analysis. He once had hobbies but now has young children, which he considers a wonderful trade. No kidding, his favorite number has always been fourteen. Department of Mathematics, University of Virginia, P.O. Box 400137, Charlottesville, VA 22904-4137
[email protected]
February 2010]
VARIATIONS ON KURATOWSKI’S 14-SET THEOREM
123
The Trouble with von Koch Curves Built from n-gons Tam´as Keleti and Elliot Paquette 1. INTRODUCTION. The classical von Koch curve is an oft-cited fractal (see Figure 1). It is an everywhere continuous, nowhere differentiable curve (a recent proof of ˇ Ungar [10]) with a straightforward construcwhich was given in the M ONTHLY by S. tion, and it was specifically designed with simplicity of construction in mind. After Weierstrass had exhibited such a curve analytically, Helge von Koch sought a geometrically constructed curve that, beyond being another example of a nowhere differentiable curve, would intuitively exhibit this property.1 Von Koch’s curve is constructed by a recursive process that begins with a line segment. The first step is to delete the interior middle third of this line segment and replace it by two faces of an equilateral triangle. The same operation is then performed on each segment of the resulting curve, with the equilateral triangles pointing “outwards.” This process is repeated, and a limit is taken to finalize the curve.
Figure 1. The classical von Koch curve.
The von Koch curve has a history of generalizations as old as the curve itself. In the very paper in which he introduced the curve, von Koch provided a family of curves generalizing his original construction. He allowed that the deleted section might vary in size and position at every stage of the process, so long as the curve remained bounded. This is a greatly flexible definition that allows the final curve to have many different properties; here we will focus on a more restrictive and more symmetric generalization. In this vein, we vary the length of the middle segment that is removed. When the length of the removed segment is less than a third, this generalization has been cited as the “modified” von Koch curve [4]. More significantly, we replace the equilateral triangle that is glued to the middle third by a regular n-gon. This gives a family of curves with two parameters: a scalar c with 0 < c < 1 that defines the length of the removed subinterval, and an integer n ≥ 3 that defines which n-gon will be used in the construction. The construction of the (n, c)-von Koch curve then goes as follows (see Figure 2). Start with a closed line segment, which will be called the curve’s base, of length L, doi:10.4169/000298910X476040 courbe . . . est d´efinie par une construction g´eom´etrique, suffisamant simple, je crois, pour que tout le monde puisse pressentir, d´ej`a par «l’intuition na¨ıve», l’impossibilit´e d’une tangente d´etermin´ee.” [8] 1 “La
124
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
glue a regular n-gon with side length cL to the center of this line segment, and delete the interior of the length cL middle subinterval of the line segment. This procedure took one input line segment and produced n + 1 output line segments. Thus it may be applied recursively, always gluing the regular n-gons outwards.
Figure 2. The (7, 0.15)-von Koch curve.
This process creates a sequence of curves that converges uniformly to a limiting curve, which is called the (n, c)-von Koch curve. In this paper, we are concerned primarily with the self-intersection of the von Koch curve. If the curve is a simple curve, then it is said to be self-avoiding or non-self-intersecting. Regardless of whether or not the curve self-intersects, the image of the (n, c)-von Koch curve is a self-similar set: there are finitely many contractive similarities (affine transformations of the plane that shrink distances by a constant factor) so that the union of the images of the set under these similarities equals the set. These similarity maps are best understood through the construction of the curve. Namely, they are the n + 1 similarities that map the initial line segment to one of the n + 1 resultant line segments and preserve the outward orientation of the curve. Because of this self-similarity, the von Koch curve is the union of smaller von Koch curves, and so a von Koch curve contains an entire hierarchy of smaller, geometrically similar curves. Some additional terminology is helpful for discussing these relations. An (n, c)-von Koch curve is said to have n + 1 children. Two lie on either off-center (1 − c)/2 segment and have bases that are collinear with that of their parent. The other n − 1 primary children have bases that are faces of a regular n-gon, the primary n-gon of the curve. There are two fractals related to the (n, c)-von Koch curve which are worth mentioning at this juncture because they share many of the same properties as the curve. First, if instead of starting from a lone line segment, the construction begins with the boundary of a regular n-gon, then the limiting closed curve is called the (n, c)von Koch snowflake, which is self-avoiding exactly when the von Koch curve is self-avoiding. Second, instead of constructing a curve, it is possible to construct a von Koch-type planar domain. In a method analogous to how the snowflake was constructed, begin with a solid regular n-gon and glue solid regular n-gons on to the boundary. The closure of the union of all these finite approximations is called the closed (n, c)-von Koch domain. It is straightforward to prove that the von Koch snowflake is the boundary of the von Koch domain when the snowflake is selfavoiding, and that the closed von Koch domain overlaps itself exactly when the von Koch curve is self-intersecting. When the (n, c)-von Koch curve is self-avoiding, it is straightforward to calculate its dimension. Because the n + 1 children of the curve do not overlap (more precisely, the union of the open solid regular n-gons used in the construction satisfies the open February 2010]
VON KOCH CURVES BUILT FROM n-GONS
125
Figure 3. The (6, 18 )-von Koch snowflake.
set condition), all the dimensions (Hausdorff, packing, and box-counting) agree with the similarity dimension (see [4]), which is the unique positive s for which
1−c (n − 1)c + 2 2 s
s = 1.
When the curve is self-intersecting, the children of the curve overlap. In such situations, it is usually very hard to calculate the dimension, although results about overlapping self-similar sets suggest that we can expect that the dimension is the s defined above for almost every c so long as s < 2. The von Koch domains have been the setting for study of the heat equation, with the classical (3, 13 )-von Koch domain being considered by Fleckinger et al. [5] and the (4, c)-von Koch domain being studied by van den Berg [1]. Other studies of the heat equation have led to the consideration of a simplified (n, c)-von Koch domain that restricts the construction to the primary n-gons [2]. M. van den Berg noticed in his work on the (4, c)-snowflake that the curve is selfavoiding if and only if c < 13 and asked for an analogous result for the (3, c)-snowflake. In [6] it was proved that the (3, c)-snowflake curve is non-self-intersecting if and only if c < 12 . In a communication to the first author, M. van den Berg asked what the critical c is for other n. The main goal of the present paper is to show that there is no such critical c in general: for some n there exist c1 < c2 such that the (n, c1 )-snowflake curve is selfintersecting but the (n, c2 )-snowflake curve is not (Theorem 4.5). It is worth remarking that other families of generalizations of the von Koch curve do exhibit this critical c effect [3]. Figure 4 shows how the critical c phenomenon can fail, and Table 1 shows a list of triplets (n, c1 , c2 ) with the aforementioned property. The critical c phenomenon is a peculiarity of the low-order cases, where a certain symmetry effects a self-intersection. A shorter proof of the result that the (3, c)snowflake curve self-intersects for c ≥ 12 is presented in Section 2 to illustrate this symmetry. To be able to talk precisely about the von Koch curve, we will name some of the points of an (n, c)-von Koch curve that are useful in describing its geometry (see 126
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
n = 14, c = 0.025
n = 14, c = 0.028
n = 14, c = 0.032
n = 14, c = 0.037
Figure 4. These figures demonstrate the meshing phenomenon.
Table 1. These are some values of n up to 50 where the result given here may be demonstrated using precisely the same methodology. At c1 the curve can be shown to self-intersect, and at c2 the curve can be shown to be self-avoiding.
n
c1
c2
n
c1
c2
14 19 20 26 27 28 29 30
0.032 0.014670 0.014571 0.0074988 0.0074905 0.0074653 0.0074555 0.0074584
0.037 0.018424 0.018028 0.0090564 0.0089699 0.0089045 0.0088275 0.0087413
37 38 39 40 41 42 43
0.0038070 0.0037970 0.0037992 0.0037966 0.0038046 0.0037935 0.0037928
0.0043831 0.0043604 0.0043343 0.0043054 0.0042890 0.0042692 0.0042601
Figure 5). A (n, c)-von Koch curve with base AC (or just curve AC for short) is built upon the closed horizontal line segment from A on the left to C on the right, having length 1. Let M denote the vertex of the primary n-gon touching the right non-primary child. Let S denote the vertex of the primary n-gon adjacent to M on the right, and lastly let T denote the vertex adjacent to S on the right.
T
S
M
C
Figure 5. The wedge-shaped region between segments M S and MC is singularly important in the search for self-intersection.
February 2010]
VON KOCH CURVES BUILT FROM n-GONS
127
2. LOW-ORDER SYMMETRY. In [6] Theorem 2.1 and its converse were proved. The proof of Theorem 2.1 presented here is substantially simpler than the proof in [6]. The converse of Theorem 2.1 is also a consequence of Theorem 3.1. Theorem 2.1. The (3, c)-snowflake curve self-intersects if c ≥ 12 . Proof. The proof requires a few definitions to be clear. Let D F be the inward-facing primary child of curve MC. Let E M be the lower non-primary child of S M. Figure 6 illustrates these. S
F
E
M
D
C
1 2.
Figure 6. The value of c is The line segment is contained in the closure of the von Koch domain. This is sufficiently long to effect self-intersection.
The key realization is that D F and E M are the same size for any c: |D F| = |E M| =
c(1 − c) . 2
Because of this, the region E M D F is an isoceles trapezoid in which the curve is fixed by the reflection that interchanges E with F and M with D. Therefore, if curve E M crosses the line of symmetry of this reflection, then curve F D crosses the line at the same point, and so the von Koch curve is self-intersecting. To this end define a line segment that begins on the midpoint of segment E M and extends horizontally to the right until it exits the closed von Koch domain (see Figure 6). This curve goes through the midpoint of the base of the right primary child of E M, and it goes through the midpoint of the base of the left primary child thereof. The base segment of this grandchild is parallel to the base of E M, and the line segment goes through both of their midpoints. Thus by self-similarity, this horizontal line segment contains the midpoints of the bases of a countably infinite family of curves. These curves may be enumerated by letting the first be E M, by letting each evenindexed curve be the right primary child of the curve that came before, and by letting each odd-indexed curve be the left primary child of the curve that came before. Again, by self-similarity the line segment contains the midpoints of the bases of all of these curves, and so a lower bound for the length of the horizontal line segment is c2 (1 − c) c2 (1 + c + c2 + c3 + · · · ) = . 4 4 128
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
The horizontal distance separating the midpoints of the bases of E M and D F is
1−c 2
2 +2
1 c c(1 − c) = − . 8 4 4
Thus curve E M crosses the line of symmetry if c2 c 1 + ≥ . 2 4 4 This inequality is satisfied so long as c ≥ 12 , establishing that the horizontal line segments overlap, and the curve self-intersects. An analogous tight bound on self-intersecting c can be formulated for the (4, c)-von Koch curve using the same sequences. In larger n-gons, these sequences can still be used to provide a value of c above which the curve self-intersects. However, the curves self-intersect long before the line segments overlap. 3. TO AN EXTENT, ANY n WORKS. There will always be an interval of values of c for which the (n, c)-von Koch curve is self-avoiding. This interval of c values, bounded below by 0, can be computed explicitly using only geometry. Better still, we will give an interval of c values which is maximal in the sense that any larger interval of c values would have to take the meshing phenomenon into account. Theorem 3.1. The (n, c)-von Koch curve is self-avoiding if c satisfies c<
sin2 (π/n) cos2 (π/n) + 1
when n is even or if c satisfies c < 1 − cos(π/n) when n is odd. Note that when n = 3, this inequality becomes c < 12 , which gives the converse of Theorem 2.1. Proof. Our method of proof is a generalization of the method used by the first author in [6]. In what follows, θ is the internal angle of an n-gon, given in terms of n by the formula θ=
(n − 2)π . n
The key is to approximate an (n, c)-von Koch curve by a polygon so that three properties hold. First, the polygon encloses the base and primary n-gon of the curve. Second, the approximation of each of the curve’s children fits entirely within the curve’s approximation. Third, all of the approximations of the curve’s children are pairwise disjoint (except for adjacent children, whose approximations are disjoint save for the shared vertex). By self-similarity, these three properties ensure that the curve is selfavoiding. February 2010]
VON KOCH CURVES BUILT FROM n-GONS
129
π–θ
S
φ M
A
C
Figure 7. The even-n (n, c)-von Koch curve is best approximated by triangles.
To get optimal bounds on the intervals, some care is required in choosing the approximating polygons. It turns out that the (n, c)-von Koch curves differ in this regard depending upon whether n is even or odd. If n is even, then approximate the curve AC by an isoceles triangle having base AC and going through the endpoints of the uppermost primary child of the curve (see Figure 7). Call the base angle of this approximating triangle φ. The three required properties needed to show that the curve is self-avoiding are satisified exactly when π − θ > 2φ. After some computation, this inequality is shown to be equivalent to c<
1 2 tan2 (θ/2)
+1
.
If n is odd, then approximate the curve AC by an isoceles trapezoid with base AC. In order to define the top side of the trapezoid, consider the line that is parallel to the top left segment of the primary n-gon and goes through the top left vertex of the primary n-gon thereof (see Figure 8). Let D be the point where this line intersects line AC, and let E be the point where this line intersects the horizontal line through the apex of AC’s primary n-gon. The point E is the top left corner of the trapezoid. The top right corner is defined by the same process reflected across AC’s vertical axis of symmetry.
E
h
S
π–θ 2
φ A
w
D
π–θ φ
M
C
Figure 8. The odd-n (n, c)-von Koch curve is best approximated by trapezoids.
Let w be the length of the top segment, let h be the height of the trapezoid, and let φ be the acute angle of the trapezoid. Again the three self-avoidance properties are satisfied exactly when π − θ > 2φ. To express this as a constraint on c, it is helpful to note that this constraint is equivalent to point D being on line segment AC, as the 130
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
angle EDC is
π−θ . 2
Point D is on line segment AC so long as
θ 1−w > h tan . 2 2 Note that h = 2c tan θ2 + sec θ2 and that w = 2ch sec θ2 . After some calculation, the previous inequality is shown to be equivalent to θ c < 1 − sin . 2 4. THE DIFFICULTY WITH LARGER n. We will show here that the set of c for which the (n, c)-von Koch curve self-intersects is not always an interval. The difficulty with showing this is that there are so many places, with seemingly so little order, to check for self-intersection. However, the self-similarity of the curve greatly reduces the amount of work that needs to be done. First of all, it is superfluous to search for intersections between every pair of points on the curve. Instead, one only needs to check a wedge-shaped region between segments M S and MC (see Lemma 4.1). This region is illustrated in Figures 5 and 9. S
M
C
Figure 9. n = 14, c = 0.037, with circular approximations. The grayed circles do not need to be checked by self-similarity.
Furthermore, within the wedge shaped region between segments M S and MC, there is a considerable amount of repetition. In order to simplify the search for selfintersection, this repetition needs to be eliminated, and so we introduce the rescaling map that fixes the point M and shrinks all distances by a factor of (1 − c)/2. It sends curve M S into itself, and it sends curve MC into itself (and so it is a similarity of both M S and MC). Lemma 4.1. Curve AC self-intersects if and only if there is an intersection between curve M S and curve MC. Proof. Note that because of self-similarity, AC self-intersects if and only if there is a self-intersection between two distinct children of AC. Suppose the curve MC is not strictly below ray M S. This implies that curve MC must intersect ray M S at some point besides M. Iterated applications of the rescaling map bring this point arbitrarily close to M. Hence there is a curve descended from curve MC having endpoint M whose other endpoint is arbitrarily close to M, and so February 2010]
VON KOCH CURVES BUILT FROM n-GONS
131
it can be ensured that part of curve MC crosses line segment M S. As a result, there must be an intersection between curves M S and MC. Suppose the curve MC is strictly below ray M S (except of course at M). Recall that curve ST is the primary child of AC adjacent to M S. By similarity, curve ST is strictly above ray M S, and thus curve ST is separated from MC. All other children of curve AC are strictly separated from curve MC by line M S. Furthermore, all primary children of curve AC are pairwise disjoint, and thus if an intersection occurs, it must be between curve M S and curve MC (or between the left counterparts of curve M S and MC, but by the natural left-right symmetry of curve AC, it is unnecessary to search both the left and right sides for intersection). Using the rescaling map, Lemma 4.1 can be bettered. Instead of checking the whole wedge, our focus can be restricted to a properly chosen trapezoid. In essence, this is because iterated applications of the rescaling map cover the whole wedge. Before this trapezoid can be introduced, however, some more machinery for approximating the curve is needed. In order to isolate parts of the snowflake domain from others, we circumscribe a circle around the n − 1 primary children of each curve. Only some of the circles are relevant to showing self-avoidance, and these are introduced here (see Figure 9). • • • • •
The circle ζ is associated to curve M S. The circle α is the largest between ζ and S; it is associated to the outer non-primary child of M S. The circle α˜ is the rescaled image of α. The family of circles {ξn } is recursively defined by letting ξ1 be the circle associated to curve MC and letting ξn+1 be the rescaled image of ξn . The family of circles {βn } is defined by letting βn be the largest circle between ξn and ξn+1 .
From the two infinite families of circles, only a few that are close to ζ will be needed. So formally, let k be the natural number such that the horizontal coordinate of the center of ζ is between the horizontal coordinates of the centers ξk and ξk+1 . Roughly speaking, the region around ζ is the critical region that needs to be checked for intersection. To nail down the definition of this region we will cut up the wedge using a family of lines. Let {lm }, with m ≥ k, be a family of lines recursively defined by letting lm+1 be the rescaled image of lm and letting lk be chosen so that lk+1 separates ζ and βk on the right from ξk+1 and α˜ on the left. This family of lines will not always exist, so before using them for a particular value of c, it must be shown that they exist. We are now able to define the critical trapezoid in which intersections need to be checked. The trapezoid is the region of the plane bounded by line segments M S and MC and lines lk and lk+1 . This trapezoid was defined to contain the four circles ζ , ξk , α, and βk . To be able to demonstrate that M S and MC are disjoint, we will need approximations that cover all of either curve in the trapezoid. However, these circles alone do not cover all the portions of M S and MC inside the critical trapezoid. In order to account for all parts of M S and MC, we define two strips, an upper one and a lower one. Let the upper strip be the minimum constant-radius tubular neighborhood of segment M S so that the union of the strip, α, and ζ contains the intersection of curve M S with the critical trapezoid. Likewise, let the lower strip be the minimum constant-radius tubular neighborhood of segment MC so that the union of the strip, ξk , and βk contains the intersection of curve MC with the critical trapezoid. Let μ be the radius of the upper strip, and let λ be the radius of the lower strip. 132
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
Lemma 4.2 collects the importance of the preceding definitions. Lemma 4.2. If for a given c, the lines {lm } exist, and the upper objects are disjoint from the lower, (i.e., α, ζ , and the upper strip are disjoint from ξk , βk , the lower strip) then curve AC does not self-intersect. Proof. Because the lines lk and lk+1 exist, the critical trapezoid, denoted T , exists. Since the rescaling map takes lm to lm+1 for any m ≥ k, the iterated rescaled images of T cover the entire wedge between M and lk . If the curves MC and M S intersect, then the image of any intersection point under the rescaling map is also an intersection point, and by iterating the rescaling map we get a sequence of intersection points that converges to M. Thus, if the curves intersect then there is an intersection point in the wedge between M and lk . Therefore, to determine whether or not MC and M S intersect it suffices to check T for intersections, and by the definition of the upper and lower strips, curve M S is disjoint from curve MC inside T . So by Lemma 4.1, the curve AC does not self-intersect. Thus, once the existence of lk has been shown, there are just a few distances that need to be computed to show that the curve is self-avoiding. One method to demonstrate the existence of lk is to explicitly look for lines having a particular direction. To that end, choose M as origin of a cartesian coordinate system for the plane, and define v to be the unit vector pointing along segment M S. This choice is arbitrary, and any vector pointing between M S and MC would work. Lemma 4.3. Define a function v ∗ by x → x · v, using the standard inner product. The line lk exists if sup v ∗ (x) ≤ sup v ∗ (x) < inf v ∗ (x) ≤ inf v ∗ (x). x∈α˜
x∈ξk+1
x∈ζ
x∈βk
Proof. Choose some y between supx∈ξk+1 v ∗ (x) and infx∈ζ v ∗ (x), and let lk+1 be defined as the preimage of y under v ∗ . All of these lemmas together give a method for checking that the curve is selfavoiding. However, we also need a method for checking that the curve is selfintersecting. The base segments of the primary children of a curve form all but one side of a regular n-gon, and so inside this n-gon, we inscribe a circle. Note that this inscribed circle is concentric with the outer approximation constructed earlier. Lemma 4.4. If the inscribed circles associated to two curves overlap, then the von Koch curve self-intersects. Proof. If the inscribed circles overlap, then the domains that the two curves bound overlap. Thus, the snowflake domain overlaps, which is equivalent to the von Koch curve self-intersecting. We now have the machinery to prove the main result. Theorem 4.5. The set of c for which the (n, c)-von Koch curve is self-avoiding is not in general an interval. February 2010]
VON KOCH CURVES BUILT FROM n-GONS
133
Proof. We present numerical proof that when n is 14, the set of self-avoiding c is not an interval. Formulae to compute these numerical results are provided in the appendix. We will reason explicitly that when c = 0.037, the curve is self-avoiding, but when c = 0.032, the curve self-intersects (see Figure 4). Thus, there is a self-avoiding value of c that is disconnected from the interval of self-avoiding c near 0 (see Theorem 3.1). Table 2. These values establish the existence of the {lm } for the (14, 0.037)-von Koch curve.
supx∈α˜ v ∗ (x)
supx∈ξk+1 v ∗ (x)
infx∈ζ v ∗ (x)
infx∈βk v ∗ (x)
0.014275
0.014828
0.015270
0.017055
For both of these values of c, the value of k is 4. Using this value of k, the distances in Table 2 were computed for c = 0.037. They show that the criteria of Lemma 4.3 are satisfied, and hence the critical trapezoid exists. Table 3 shows the minimal distances between the upper objects and the lower objects. Note that the minimal distance between the upper strip and lower strip is along the line lk+1 . For the value in Table 3, the line lk+1 is chosen so that it bisects the v ∗ -distance between ξk+1 and ζ . These values show that the upper objects are disjoint from the lower objects, and so by Lemma 4.2 the curve is self-avoiding when c = 0.037. Table 3. Here, μ refers to the upper strip, and λ refers to the lower strip.
α ζ μ
β4
ξ4
λ
0.009066 0.000314 0.005084
0.000352 0.001038 0.001600
0.008323 0.001085 0.004682
To show that the curve self-intersects when c = 0.032, we follow Lemma 4.4 and compute the distance between the centers of ζ and ξ5 minus the sum of inscribed radii of ζ and ξ5 as approximately −0.0003895. As the inscribed circles overlap, the von Koch curve self-intersects when c = 0.032. The methodology used here may also be used to show that the set of self-avoiding c is not an interval for larger n. In Table 1, some values of n, c1 , and c2 with c1 < c2 are given such that the (n, c)-von Koch curve can be shown to self-intersect when c = c1 and can be shown to be self-avoiding at c = c2 using the machinery developed here. While the set of self-avoiding c is not necessarily an interval, it is unknown how complex the set might be. For example, a challenging open question is whether or not the set of self-avoiding c can be written as a finite union of intervals. The aid of computer tools has been indispensable in studying the von Koch curve. Java software for exploring the fractal and generating figures (such as Figures 1 and 2) is available on the MAA website [7]. APPENDIX. All the formulae for the computations are presented here. In what follows, the variable θ is the interior angle of a regular n-gon, which is given explicitly by Formula (1) of Table 4. 134
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
Table 4. These are all the formulae, besides the circle formulae in Table 5, required to employ Theorem 4.5.
The interior angle of an n-gon
θ=
The radius of a circumscribed circle
The index of the ξ immediately to the right of ζ
The thickness of the lower strip
θ c 1+c tan 21−c 2 θ c h(c) = tan 2 2 ln [c (− cos θ + c tan(θ/2) sin θ )] k= ln[(1 − c)(1/2)] 1−c μ=c h(c) 2 1 − c k+2 h(c) λ= 2
R(c) =
The height of the center of a circle above its base
The thickness of the upper strip
(n − 2)π n
(1) (2) (3) (4) (5) (6)
The radius R(c) of the circumscribed circle of a unit-base-length curve’s primary children is given by Formula (2) of Table 4. This formula may be derived by analyzing the relation between the radius of the minimal circle enclosing a von Koch curve’s primary child and the minimal circle enclosing a primary child of a von Koch curve’s primary child. The grandchild’s circle is tangent to and contained in the child’s circle. Thus, by writing the radius of the circle enclosing the grandchild in two ways, the following relation is derived: θ c c R(c) = R(c) − tan (1 + c). 2 2 This immediately yields the presented formula. The distance h(c) from the center of the minimal circle to the base segment of a unit-base-length curve is given by Formula (3) of Table 4. To derive this, note that this minimal circle is concentric with the polygon formed by this curve’s primary children. Furthermore, the inscibed circle of the bases of a curve’s primary children is also concentric with these, and so the inscribed circle’s radius r (c) is the same of the height of the center, r (c) = h(c). Recall that μ and λ are defined to be the radii of the upper and lower strips. They are given as Formulae (5) and (6) of Table 4, and they can be computed presupposing the existence of the critical trapezoid by making the following observations. It is straightforward to verifty that 1−c (h(c) + R(c)) = h(c), from which relation it follows that 2 the circle associated to a non-primary child rises exactly as high as the center of its parent’s primary n-gon. On curve M S, the next largest circle inside the critical trapezoid after α is a non-primary child of the curve that α approximates. On curve MC, the next largest circle inside the critical trapezoid after βk is a non-primary child of the curve that βk approximates. Recall that k is defined as the natural number such that the horizontal coordinate of the center of ζ is between the horizontal coordinates of the centers ξk and ξk+1 . This definition translates into the following inequality, which can be derived from the February 2010]
VON KOCH CURVES BUILT FROM n-GONS
135
Table 5. These are the relevant formulae for the circles shown to be disjoint. The origin for these formulae is taken to be M.
Horizontal Coordinate ξk βk ζ α
1−c 2 1−c 2
k
1 2
k+1
Vertical Coordinate
3+c 4
1−c 2 1−c 2
Radius
k h(c)
k+2 h(c)
c − cos(θ ) + c sin(θ )h(c) 2
c sin(θ ) + c cos(θ )h(c) 2
−c (3 + c) cos(θ ) 4 c + (1 − c) sin(θ )h(c) 2
c (3 + c) sin(θ ) 4 c + (1 − c) cos(θ )h(c) 2
1−c 2 1−c 2
k R(c) k+2 R(c)
cR(c)
c
1−c R(c) 2
formulae in Tables 5 and 4:
1−c 2
k+1
θ 1−c k ≤ c − cos θ + c tan . sin θ ≤ 2 2
Solving this equation for k yields Formula (4). The circle formulae in Table 5 can be deduced directly from their definitions. The coordinates for the centers of α and ζ are easiest to deduce if they are first computed in the rotated coordinate system where segment M S is rotated to be the negative direction on the horizontal axis. ACKNOWLEDGMENTS. The research of the first author was partially supported by Hungarian Scientific Foundation grant No. T72655. This work began when the first author visited M. van den Berg at the University of Bristol.
REFERENCES 1. M. van den Berg, Heat equation on the arithmetic von Koch snowflake, Probab. Theory Related Fields 118 (2000) 17–36. doi:10.1007/PL00008740 2. M. van den Berg and F. den Hollander, Asymptotics for the heat content of a planar region with a fractal polygonal boundary, Proc. London Math. Soc. 78 (1999) 627–661. doi:10.1112/S0024611599001781 3. R. Darst, J. Palagallo, and T. Price, Generalizations of the Koch curve, Fractals 16 (2008) 1–8. doi: 10.1142/S0218348X08003971 4. K. Falconer, Fractal Geometry: Mathematical Foundations and Applications, John Wiley, Chichester, UK, 1990. 5. J. Fleckinger, M. Levitin, and D. Vassiliev, Heat equation on the triadic von Koch snowflake: Asymptotic and numerical analysis, Proc. London Math. Soc. 71 (1995) 372–396. doi:10.1112/plms/s3-71.2. 372 6. T. Keleti, When is the modified von Koch snowflake non-self-intersecting? Fractals 14 (2006) 245–249. doi:10.1142/S0218348X06003234 7. T. Keleti and E. Paquette, The Trouble with von Koch curves built from n-gons, an online supplement (2009), available at http://mathdl.maa.org/mathDL/60/. 8. H. von Koch, Une m´ethode g´eom´etrique e´ l´ementaire pour l’´etude de certaines questions de la th´eorie des courbes planes, Acta Math. 30 (1906) 145–174. doi:10.1007/BF02418570 9. B. B. Mandelbrot, The Fractal Geometry of Nature, W. H. Freeman, New York, 1983. ˇ Ungar, The Koch curve: A geometric proof, this M ONTHLY 114 (2007) 61–66. 10. S.
136
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
´ KELETI received his M.Sc. in 1993 and Ph.D. in 1997 under the supervision of Mikl´os Laczkovich TAMAS at the E¨otv¨os Lor´and University in Budapest, Hungary, at which institution he is currently an associate professor. Often, he teaches at the Budapest Semester in Mathematics, a study abroad program for American undergraduate students. Each year he organizes and usually leads the (often extremely successful) team of his university to the International Mathematics Competition for University Students. He loves geometric measure theory, fractals, real functions, sports, music, and, more than anything, his wife Gabi and his children Hanga and Doma. Department of Analysis, E¨otv¨os Lor´and University, P´azm´any P´eter s´et´any 1/C, H-1117 Budapest, Hungary
[email protected]
ELLIOT PAQUETTE is a graduate student at the University of Washington, where he intends to earn his Ph.D. He likes to cycle, although presently he is having difficulty with his bicycle. Its front wheel is badly deformed, which gives him a personal appreciation for the geometry of cylinders. He is particularly glad that his wheel is not shaped like a von Koch snowflake, although the novelty of such a wheel does appeal to him. He received his B.A. from Kalamazoo College. Department of Mathematics, University of Washington, Box 354350 Seattle, WA 98195
[email protected]
Mathematics Is . . . “Mathematics is much like the Mississippi. There are side-shoots and dead ends and minor tributaries; but the mainstream is there, and you can find it where the current—the mathematical power—is strongest. Its delta is research mathematics: it is growing, it is going somewhere (but it may not always be apparent where), and what today looks like a major channel may tomorrow clog up with silt and be abandoned. Meanwhile a minor trickle may suddenly open out into a roaring torrent. The best mathematics always enriches the mainstream, sometimes by diverting it in an entirely new direction.” Ian Stewart, From Here to Infinity, Oxford University Press, Oxford, 1996, p. 11. —Submitted by Carl C. Gaither, Killeen, TX
February 2010]
VON KOCH CURVES BUILT FROM n-GONS
137
Nemirovski’s Inequalities Revisited ¨ Lutz Dumbgen, Sara A. van de Geer, Mark C. Veraar, and Jon A. Wellner
1. INTRODUCTION. Our starting point is the following well-known theorem from probability: Let X 1 , . . . , X n be independent random variables with finite second mon ments, and let Sn = i=1 X i . Then Var(Sn ) =
n
Var(X i ).
(1)
i=1
If we suppose that each X i has mean zero, EX i = 0, then (1) becomes ESn2 =
n
EX i2 .
(2)
i=1
This equality generalizes easily to vectors in a Hilbert space H with inner product 2 ·, ·: If the X i ’s are independent n with values in H such that EX i = 0 and EX i < 2 ∞, then Sn = Sn , Sn = i, j =1 X i , X j , and since EX i , X j = 0 for i = j by independence, ESn 2 =
n
EX i , X j =
i, j =1
n
EX i 2 .
(3)
i=1
What happens if the X i ’s take values in a (real) Banach space (B, · )? In such cases, in particular when the square of the norm · is not given by an inner product, we are aiming at inequalities of the following type: Let X 1 , X 2 , . . . , X n be indepen2 dent random n vectors with values in (B, · ) with EX i = 0 and EX i < ∞. With Sn := i=1 X i we want to show that ESn 2 ≤ K
n
EX i 2
(4)
i=1
for some constant K depending only on (B, · ). For statistical applications, the case (B, · ) = rd := (Rd , · r ) for some r ∈ [1, ∞] is of particular interest. Here the r -norm of a vector x ∈ Rd is defined as
xr :=
⎧ d 1/r ⎪ r ⎪ ⎨ |x j |
if 1 ≤ r < ∞, (5)
j =1
⎪ ⎪ ⎩ max |x j | 1≤ j ≤d
if r = ∞.
An obvious question is how the exponent r and the dimension d enter an inequality of type (4). The influence of the dimension d is crucial, since current statistical research doi:10.4169/000298910X476059
138
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
often involves small or moderate “sample size” n (the number of independent units), say on the order of 102 or 104 , while the number d of items measured for each independent unit is large, say on the order of 106 or 107 . The following two examples for the random vectors X i provide lower bounds for the constant K in (4): Example 1.1 (A lower bound in dr ). Let b1 , b2 , . . . , bd denote the standard basis of Rd , and let 1 , 2 , . . . , d be independent Rademacher variables, i.e., random variables taking the values +1 and −1 each with probability 1/2. Define X i := i bi for 1 ≤ i ≤ n n := d. Then EX i = 0, X i r2 = 1, and Sn r2 = d 2/r = d 2/r −1 i=1 X i r2 . Thus any 2/r −1 candidate for K in (4) has to satisfy K ≥ d . Example 1.2 (A lower bound in d∞ ). Let X 1 , X 2 , X 3 , . . . be independent random vectors, each uniformly distributed on {−1, 1}d . Then EX i = 0 and X i ∞ = 1. On the other hand, according to the Central Limit Theorem, n −1/2 Sn converges in distribution as n → ∞ to a random vector Z = (Z j )dj =1 with independent, standard Gaussian components, Z j ∼ N (0, 1). Hence ESn 2∞ sup n = sup En −1/2 Sn 2∞ ≥ EZ2∞ = E max Z 2j . 2 1≤ j ≤d EX n≥1 n≥1 i ∞ i=1
But it is well known that max1≤ j ≤d |Z j | − 2 log d → p 0 as d → ∞. Thus candidates K (d) for the constant in (4) have to satisfy lim inf d→∞
K (d) ≥ 1. 2 log d
At least three different methods have been developed to prove inequalities of the form given by (4). The three approaches known to us are: (a) deterministic inequalities for norms; (b) probabilistic methods for Banach spaces; (c) empirical process methods. Approach (a) was used by Nemirovski [14] to show that in the space rd with d ≥ 2, inequality (4) holds with K = C min(r, log(d)) for some universal (but unspecified) constant C. In view of Example 1.2, this constant has the correct order of magnitude if r = ∞. For statistical applications see Greenshtein and Ritov [7]. Approach (b) uses special moment inequalities from probability theory on Banach spaces which involve nonrandom vectors in B and Rademacher variables as introduced in Example 1.1. Empirical process theory (approach (c)) in general deals with sums of independent random elements in infinite-dimensional Banach spaces. By means of chaining arguments, metric entropies, and approximation arguments, “maximal inequalities” for such random sums are built from basic inequalities for sums of independent random variables or finite-dimensional random vectors, in particular, “exponential inequalities”; see, e.g., Dudley [4], van der Vaart and Wellner [26], Pollard [21], de la Pena and Gin´e [3], or van de Geer [25]. Our main goal in this paper is to compare the inequalities resulting from these different approaches and to refine or improve the constants K obtainable by each method. The remainder of this paper is organized as follows: In Section 2 we review several deterministic inequalities for norms and, in particular, key arguments of Nemirovski [14]. Our exposition includes explicit and improved constants. While finishing the present paper we became aware of yet unpublished work of Nemirovski [15] and Juditsky and February 2010]
NEMIROVSKI’S INEQUALITIES REVISITED
139
Nemirovski [12] who also improved some inequalities of [14]. Rio [22] uses similar methods in a different context. In Section 3 we present inequalities of type (4) which follow from type and cotype inequalities developed in probability theory on Banach spaces. In addition, we provide and utilize a new type inequality for the normed space d∞ . To do so we utilize, among other tools, exponential inequalities of Hoeffding [9] and Pinelis [17]. In Section 4 we follow approach (c) and treat d∞ by means of a truncation argument and Bernstein’s exponential inequality. Finally, in Section 5 we compare the inequalities resulting from these three approaches. In that section we relax the assumption that EX i = 0 for a more thorough understanding of the differences between the three approaches. Most proofs are deferred to Section 6. 2. NEMIROVSKI’S APPROACH: DETERMINISTIC INEQUALITIES FOR NORMS. In this section we review and refine inequalities of type (4) based on deterministic inequalities for norms. The considerations for (B, · ) = rd follow closely the arguments of [14]. 2.1. Some Inequalities for Rd and the Norms · r . Throughout this subsection let B = Rd , equipped with one of the norms · r defined in (5). For x ∈ Rd we think of x as a column vector and write x for the corresponding row vector. Thus x x is a d × d matrix with entries xi x j for i, j ∈ {1, . . . , d}. A first solution. Recall that for any x ∈ Rd , xr ≤ xq ≤ d 1/q−1/r xr
for 1 ≤ q < r ≤ ∞.
(6)
Moreover, as mentioned before, ESn 22 =
n
EX i 22 .
i=1
Thus for 1 ≤ q < 2, ESn q2 ≤ (d 1/q−1/2 )2 ESn 22 = d 2/q−1
n
EX i 22 ≤ d 2/q−1
i=1
n
EX i q2 ,
i=1
whereas for 2 < r ≤ ∞, ESn r2 ≤ ESn 22 =
n
EX i 22 ≤ d 1−2/r
i=1
n
EX i r2 .
i=1
Thus we may conclude that (4) holds with 2/r −1 (d, r ) := d K =K d 1−2/r
if 1 ≤ r ≤ 2, if 2 ≤ r ≤ ∞.
(d, r ) is indeed optimal for 1 ≤ r ≤ 2. Example 1.1 shows that this constant K (d, r ) = d 1−2/r with A refinement for r > 2. In what follows we shall replace K substantially smaller constants. The main ingredient is the following result: 140
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
Lemma 2.1. For arbitrary fixed r ∈ [2, ∞) and x ∈ Rd \ {0} let
d h(x) := 2xr2−r |xi |r −2 xi i=1 while h(0) := 0. Then for arbitrary x, y ∈ Rd , xr2 + h(x) y ≤ x + yr2 ≤ xr2 + h(x) y + (r − 1)yr2 . [16] and [14] stated Lemma 2.1 with the factor r − 1 on the right side replaced with Cr for some (absolute) constant C > 1. Lemma 2.1, which is a special case of the more general Lemma 2.4 in the next subsection, may be applied to the partial sums k S0 := 0 and Sk := i=1 X i , 1 ≤ k ≤ n, to show that for 2 ≤ r < ∞,
ESk r2 ≤ E Sk−1 r2 + h(Sk−1 ) X k + (r − 1)X k r2 = ESk−1 r2 + Eh(Sk−1 ) EX k + (r − 1)EX k r2 = ESk−1 r2 + (r − 1)EX k r2 , and inductively we obtain a second candidate for K in (4): ESn r2 ≤ (r − 1)
n
EX i r2
for 2 ≤ r < ∞.
i=1
Finally, we apply (6) again: For 2 ≤ q ≤ r ≤ ∞ with q < ∞, ESn r2 ≤ ESn q2 ≤ (q − 1)
n
EX i q2 ≤ (q − 1)d 2/q−2/r
i=1
n
EX i r2 .
i=1
This inequality entails our first (q = 2) and second (q = r < ∞) preliminary result, and we arrive at the following refinement: Theorem 2.2. For arbitrary r ∈ [2, ∞], ESn r2 ≤ K Nem (d, r )
n
EX i r2
i=1
with K Nem (d, r ) :=
inf (q − 1)d 2/q−2/r .
q∈[2,r ]∩R
This constant K Nem (d, r ) satisfies the (in)equalities ⎧ if d ≤ 7 ⎨ = d 1−2/r K Nem (d, r ) ≤ r − 1 ⎩ ≤ 2e log d − e if d ≥ 3, and K Nem (d, ∞) ≥ 2e log d − 3e. February 2010]
NEMIROVSKI’S INEQUALITIES REVISITED
141
Corollary 2.3. In the case (B, · ) = d∞ with d ≥ 3, inequality (4) holds with constant K = 2e log d − e. If the X i ’s are also identically distributed, then En −1/2 Sn 2∞ ≤ (2e log d − e)EX 1 2∞ . Note that lim
d→∞
K Nem (d, ∞) 2e log d − e = lim = e. d→∞ 2 log d 2 log d
Thus Example 1.2 entails that for large dimension d, the constants K Nem (d, ∞) and . 2e log d − e are optimal up to a factor close to e = 2.7183. 2.2. Arbitrary Lr -spaces. Lemma 2.1 is a special case of a more general inequality: Let (T, , μ) be a σ -finite measure space, and for 1 ≤ r < ∞ let L r (μ) be the set of all measurable functions f : T → R with finite norm 1/r f r := | f |r dμ , where two such functions are viewed as equivalent if they coincide almost everywhere with respect to μ. In what follows we investigate the functional f → V ( f ) := f r2 on L r (μ). Note that (Rd , · r ) corresponds to (L r (μ), · r ) if we take T = {1, 2, . . . , d} equipped with counting measure μ. Note that V (·) is convex; thus for fixed f, g ∈ L r (μ), the function v(t) := V ( f + tg) = f + tgr2 ,
t ∈R
is convex with derivative
v (t) = v
1−r/2
(t)
2| f + tg|r −2 ( f + tg)g dμ.
By convexity of v, the directional derivative DV ( f, g) := v (0) satisfies DV ( f, g) ≤ v(1) − v(0) = V ( f + g) − V ( f ). This proves the lower bound in the following lemma. We will prove the upper bound in Section 6 by computation of v and application of H¨older’s inequality. Lemma 2.4. Let r ≥ 2. Then for arbitrary f, g ∈ L r (μ), DV ( f, g) = h( f )g dμ with h( f ) := 2 f r2−r | f |r −2 f ∈ L q (μ), where q := r/(r − 1). Moreover, V ( f ) + DV ( f, g) ≤ V ( f + g) ≤ V ( f ) + DV ( f, g) + (r − 1)V (g). 142
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
Remark 2.5. The upper bound for V ( f + g) is sharp in the following sense: Suppose that μ(T ) < ∞, and let f, go : T → R be measurable such that | f | ≡ |go | ≡ 1 and f go dμ = 0. Then our proof of Lemma 2.4 reveals that V ( f + tgo ) − V ( f ) − DV ( f, tgo ) → r − 1 as t → 0. V (tgo ) Remark 2.6. If r = 2, Lemma 2.4 is well known and easily verified. Here the upper bound for V ( f + g) is even an equality, i.e., V ( f + g) = V ( f ) + DV ( f, g) + V (g). Remark 2.7. Lemma 2.4 improves on an inequality of [16]. After writing this paper we realized Lemma 2.4 is also proved by Pinelis [18]; see his (2.2) and Proposition 2.1, page 1680. Lemma 2.4 leads directly to the following result: Corollary 2.8. In the case B = L r (μ) for r ≥ 2, inequality (4) is satisfied with K = r − 1. 2.3. A Connection to Geometrical Functional Analysis. For any Banach space (B, · ) and Hilbert space (H, ·, ·, · ), their Banach-Mazur distance D(B, H) is defined to be the infimum of T · T −1 over all linear isomorphisms T : B → H, where T and T −1 denote the usual operator norms T := sup T x : x ∈ B, x ≤ 1 , T −1 := sup T −1 y : y ∈ H, y ≤ 1 . (If no such bijection exists, one defines D(B, H) := ∞.) Given such a bijection T , ESn 2 ≤ T −1 2 ET Sn 2 n = T −1 2 ET X i 2 i=1
≤ T −1 2 T 2
n
EX i 2 .
i=1
This leads to the following observation: Corollary 2.9. For any Banach space (B, · ) and any Hilbert space (H, , ·, ·, , · ) with finite Banach-Mazur distance D(B, H), inequality (4) is satisfied with K = D(B, H)2 . A famous result from geometrical functional analysis is John’s theorem √ (see [24], ) ≤ [11]) for finite-dimensional normed spaces. It entails that D(B, dim(B) dim(B). 2 This entails the following fact: February 2010]
NEMIROVSKI’S INEQUALITIES REVISITED
143
Corollary 2.10. For any normed space (B, · ) with finite dimension, inequality (4) is satisfied with K = dim(B). Note that Example 1.1 with r = 1 provides an example where the constant K = dim(B) is optimal. 3. THE PROBABILISTIC APPROACH: TYPE AND COTYPE INEQUALITIES. 3.1. Rademacher Type and Cotype Inequalities. Let {i } denote a sequence of independent Rademacher random variables. Let 1 ≤ p < ∞. A Banach space B with norm · is said to be of (Rademacher) type p if there is a constant T p such that for all finite sequences {xi } in B, p n n p i xi ≤ T xi p . E p i=1
i=1
Similarly, for 1 ≤ q < ∞, B is of (Rademacher) cotype q if there is a constant Cq such that for all finite sequences {xi } in B, q n n −q i xi ≥ C xi q . E q i=1
i=1
Ledoux and Talagrand [13, p. 247] note that type and cotype properties appear as dual notions: if a Banach space B is of type p, its dual space B is of cotype q = p/( p − 1). One of the basic results concerning Banach spaces with type p and cotype q is the following proposition: Proposition 3.1. [13, Proposition 9.11, p. 248]. If B is of type p ≥ 1 with constant T p , then ESn p ≤ (2T p ) p
n
EX i p .
i=1
If B is of cotype q ≥ 1 with constant Cq , then ESn ≥ (2Cq ) q
−q
n
EX i q .
i=1
As shown in [13, p. 27], the Banach space L r (μ) with 1 ≤ r < ∞ (cf. Section 2.2) is of type min(r, 2). Similarly, L r (μ) is cotype max(r, 2). If r ≥ 2 = p, explicit values for the constant T p in Proposition 3.1 can be obtained from the optimal constants in Khintchine’s inequalities due to Haagerup [8]. Lemma 3.2. For 2 ≤ r < ∞, the space L r (μ) is of type 2 with constant T2 = Br , where ((r + 1)/2) 1/r Br := 21/2 . √ π 144
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
Corollary 3.3. For B = L r (μ), 2 ≤ r < ∞, inequality (4) is satisfied with K = 4Br2 . Note that B2 = 1 and 1 Br √ →√ e r
as r → ∞.
Thus for large values of r , the conclusion of Corollary 3.3 is weaker than that of Corollary 2.8. 3.2. The Space d∞ . The preceding results apply only to r < ∞, so the special space d∞ requires different arguments. First we deduce a new type inequality based on Hoeffding’s [9] exponential inequality: if 1 , 2 , . . . , n are independent Rademacher n 2 2 random variables, a1 , a2 , . . . , an are real numbers, and v := a , i=1 i then the tail n ai i may be bounded as follows: probabilities of the random variable i=1 n z2 ai i ≥ z ≤ 2 exp − 2 , z ≥ 0. (7) P 2v i=1 At the heart of these tail bounds is the following exponential moment bound: n E exp t ai i ≤ exp(t 2 v 2 /2), t ∈ R.
(8)
i=1
From the latter bound we shall deduce the following type inequality in Section 6:
Lemma 3.4. The space d∞ is of type 2 with constant 2 log(2d). Using this upper bound together with Proposition 3.1 yields another Nemirovskitype inequality: Corollary 3.5. For (B, · ) = d∞ , inequality (4) is satisfied with K = K Type2 (d, ∞) = 8 log(2d). Refinements. Let T2
(d∞ ) be the optimal type-2 constant for the space d∞ . So far we know that T2 (d∞ ) ≤ 2 log(2d). Moreover, by a modification of Example 1.2 one can show that d (9) T2 (∞ ) ≥ cd := E max Z 2j . 1≤ j ≤d
The constants cd can be expressed or bounded in terms of the distribution function √ z of N (0, 1), i.e., (z) = −∞ φ(x) d x with φ(x) = exp(−x 2 /2)/ 2π. Namely, with W := max1≤ j ≤d |Z j |, ∞ ∞ cd2 = E(W 2 ) = E 2t1[t≤W ] dt = 2tP(W ≥ t) dt, 0
0
and for any t > 0, = 1 − P(W < t) = 1 − P(|Z 1 | < t)d = 1 − (2(t) − 1)d , P(W ≥ t) ≤ dP(|Z 1 | ≥ t) = 2d(1 − (t)). February 2010]
NEMIROVSKI’S INEQUALITIES REVISITED
145
These considerations and various bounds for will allow us to derive explicit bounds for cd . On the other hand, Hoeffding’s inequality (7) has been refined by Pinelis [17, 20] as follows: n P (10) ai i ≥ z ≤ 2K (1 − (z/v)), z > 0, i=1
where K satisfies 3.18 ≤ K ≤ 3.22. This will be the main ingredient for refined upper bounds for T2 (d∞ ). The next lemma summarizes our findings: Lemma 3.6. The constants cd and T2 (d∞ ) satisfy the following inequalities: ⎧
T2 (d∞ ) ≤ 2 log d + h 2 (d), ⎪ ⎪ ⎨
2 log d + h 1 (d) ≤ cd ≤ 2 log d, ⎪ ⎪ ⎩
2 log d + h 3 (d),
d≥1 d≥3
(11)
d≥1
where h 2 (d) ≤ 3, h 2 (d) becomes negative for d > 4.13795 × 1010 , h 3 (d) becomes negative for d ≥ 14, and h j (d) ∼ − log log d as d → ∞ for j = 1, 2, 3. In particular, one could replace K Type2 (d, ∞) in Corollary 3.5 with 8 log d + 4h 2 (d). 4. THE EMPIRICAL PROCESS APPROACH: TRUNCATION AND BERNSTEIN’S INEQUALITY. An alternative to Hoeffding’s exponential tail inequality (7) is a classical exponential bound due to Bernstein (see, e.g., [2]): Let Y1 , Y2 , . . . , Yn be independent random variables with mean zero such that |Yi | ≤ κ. Then for n v 2 = i=1 Var(Yi ), n Yi ≥ x ≤ 2 exp − P i=1
x2 , 2(v 2 + κ x/3)
x > 0.
(12)
We will not use this inequality itself but rather an exponential moment inequality underlying its proof: Lemma 4.1. For L > 0 define e(L) := exp(1/L) − 1 − 1/L . Let Y be a random variable with mean zero and variance σ 2 such that |Y | ≤ κ. Then for any L > 0, Y σ 2 e(L) σ 2 e(L) ≤1+ E exp ≤ exp . κL κ2 κ2 With the latter exponential moment bound we can prove a moment inequality for random vectors in Rd with bounded components: 146
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
Lemma 4.2. Suppose that X i = (X i, j )dj =1 satisfies X i ∞ ≤ κ, and let be an upper n Var(X i, j ). Then for any L > 0, bound for max1≤ j ≤d i=1
ESn 2∞ ≤ κ L log(2d) +
L e(L) . κ
Now we return to our general random vectors X i ∈ Rd with mean zero and EX i 2∞ < ∞. They are split into two random vectors via truncation: X i = X i(a) + X i(b) with X i(a) := 1[X i ∞ ≤κo ] X i
and
X i(b) := 1[X i ∞ >κo ] X i
for some constant κo > 0 to be specified later. Then we write Sn = An + Bn with the centered random sums An :=
n
X i(a) − EX i(a)
and
Bn :=
i=1
n
X i(b) − EX i(b) .
i=1
The sum An involves centered random vectors in [−2κo , 2κo ]d and will be treated by means of Lemma 4.2, while Bn will be bounded with elementary methods. Choosing the threshold κ and the parameter L carefully yields the following theorem. Theorem 4.3. In the case (B, · ) = d∞ for some d ≥ 1, inequality (4) holds with
2 K = K TrBern (d, ∞) := 1 + 3.46 log(2d) . If each of the random vectors X i is symmetrically distributed around 0, one may even set
2 (symm) K = K TrBern (d, ∞) = 1 + 2.9 log(2d) . 5. COMPARISONS. In this section we compare the three approaches just described for the space d∞ . As to the random vectors X i , we broaden our point of view and consider three different cases: General case: The random vectors X i are independent with EX i 2∞ < ∞ for all i. Centered case: In addition, EX i = 0 for all i. Symmetric case: In addition, X i is symmetrically distributed around 0 for all i. In view of the general case, we reformulate inequality (4) as follows: ESn − ESn 2∞ ≤ K
n
EX i 2∞ .
(13)
i=1
One reason for this extension is that in some applications, particularly in connection with empirical processes, it is easier and more natural to work with uncentered summands X i . Let us discuss briefly the consequences of this extension in the three frameworks: February 2010]
NEMIROVSKI’S INEQUALITIES REVISITED
147
Nemirovski’s approach. Between the centered and symmetric cases there is no difference. If (4) holds in the centered case for some K , then in the general case ESn − ESn 2∞ ≤ K
n
EX i − EX i 2∞ ≤ 4K
i=1
n
EX i 2∞ .
i=1
The latter inequality follows from the general fact that
EY − EY 2 ≤ E (Y + EY )2 ≤ 2EY 2 + 2EY 2 ≤ 4EY 2 . This looks rather crude at first glance, but in the case of the maximum norm and high dimension d, the factor 4 cannot be reduced. For let Y ∈ Rd have independent components Y1 , . . . , Yd ∈ {−1, 1} with P(Y j = 1) = 1 − P(Y j = −1) = p ∈ [1/2, 1). Then Y ∞ ≡ 1, while EY = (2 p − 1)dj =1 and Y − EY ∞ =
2(1 − p) if Y1 = · · · = Yd = 1, 2p otherwise.
Hence
EY − EY 2∞ = 4 (1 − p)2 p d + p 2 (1 − p d ) . 2 EY ∞ If we set p = 1 − d −1/2 for d ≥ 4, then this ratio converges to 4 as d → ∞. The approach via Rademacher type-2 inequalities. The first part of Proposition 3.1, involving the Rademacher type constant T p , remains valid if we drop the assumption that EX i = 0 and replace Sn with Sn − ESn . Thus there is no difference between the general and centered cases. In the symmetric case, however, the factor 2 p in Proposition 3.1 becomes superfluous. Thus, if (4) holds with a certain constant K in the general and centered cases, we may replace K with K /4 in the symmetric case. The approach via truncation and Bernstein’s inequality. Our proof for the centered case does not utilize that EX i = 0, so again there is no difference between the centered and general cases. However, in the symmetric case, the truncated random vectors 1{X i ∞ ≤ κ}X i and 1{X i ∞ > κ}X i are centered, too, which leads to the substantially smaller constant K in Theorem 4.3. Summaries and comparisons. Table 1 summarizes the constants K = K (d, ∞) we have found so far by the three different methods and for the three different cases. Table 2 contains the corresponding limits K ∗ := lim
d→∞
K (d, ∞) . log d
Interestingly, there is no global winner among the three methods. But for the centered 148
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
Table 1. The different constants K (d, ∞).
General case
Centered case
Symmetric case
8e log d − 4e
2e log d − e
2e log d − e
8 log(2d) 8 log d + 4h 2 (d)
2
1 + 3.46 log(2d)
8 log(2d) 8 log d + 4h 2 (d)
2
1 + 3.46 log(2d)
2 log(2d) 2 log d + h 2 (d)
2
1 + 2.9 log(2d)
Nemirovski Type-2 inequalities Truncation/Bernstein
Table 2. The different limits K ∗ .
Nemirovski
General case
Centered case
Symmetric case
. 8e = 21.7463
. 2e = 5.4366
. 2e = 5.4366
Type-2 inequalities
8.0
8.0
2.0
3.46 = 11.9716
3.46 = 11.9716
2.9 = 8.41
2
Truncation/Bernstein
2
2
case, Nemirovski’s approach yields asymptotically the smallest constants. In particular, lim
3.462 . K TrBern (d, ∞) = = 2.20205, K Nem (d, ∞) 2e
lim
K Type2 (d, ∞) 4 . = = 1.47152, K Nem (d, ∞) e
lim
3.462 . K TrBern (d, ∞) = = 1.49645. K Type2 (d, ∞) 8
d→∞
d→∞
d→∞
The conclusion at this point seems to be that Nemirovski’s approach and the type 2 inequalities yield better constants than Bernstein’s inequality and truncation. Figure 1 shows the constants K (d, ∞) for the centered case over a certain range of dimensions d. 120 100 80 60 40 20 0
200
400
600
800
1000
Figure 1. Comparison of K (d, ∞) obtained via the three proof methods: Medium dashing (bottom) = Nemirovski; Small and tiny dashing (middle) = type 2 inequalities; Large dashing (top) = truncation and Bernstein inequality.
February 2010]
NEMIROVSKI’S INEQUALITIES REVISITED
149
6. PROOFS. 6.1. Proofs for Section 2. Proof of (6). In the case r = ∞, the asserted inequalities read x∞ ≤ xq ≤ d 1/q x∞
for 1 ≤ q < ∞
and are rather obvious. For 1 ≤ q < r < ∞, (6) is an easy consequence of H¨older’s inequality. Proof of Lemma 2.4. In the case r = 2, V ( f + g) is equal to V ( f ) + DV ( f, g) + V (g). If r ≥ 2 and f r = 0, both DV ( f, g) and h( f )g dμ are equal to zero, and the asserted inequalities reduce to the trivial statement that V (g) ≤ (r − 1)V (g). Thus let us restrict our attention to the case r > 2 and f r > 0. Note first that the mapping R t → h t := | f + tg|r is pointwise twice continuously differentiable with derivatives h˙ t = r | f + tg|r −1 sign( f + tg)g = r | f + tg|r −2 ( f + tg)g, h¨ t = r (r − 1)| f + tg|r −2 g 2 .
By means of the inequality |x + y|b ≤ 2b−1 |x|b + |y|b for real numbers x, y and b ≥ 1, a consequence of Jensen’s inequality, we can conclude that for any bound to > 0,
max |h˙ t | ≤ r 2r −2 | f |r −1 |g| + tor −1 |g|r , |t|≤to
max |h¨ t | ≤ r (r − 1)2r −3 | f |r −2 |g|2 + tor −2 |g|r . |t|≤to
The latter two envelope functions belong to L 1 (μ). This follows from H¨older’s inequality which we rephrase for our purposes in the form | f |(1−λ)r |g|λr dμ ≤ f r(1−λ)r grλr for 0 ≤ λ ≤ 1. (14) Hence we may conclude via dominated convergence that t → v(t) ˜ := f + tgrr is twice continuously differentiable with derivatives v˜ (t) = r | f + tg|r −2 ( f + tg)g dμ, v˜ (t) = r (r − 1)
| f + tg|r −2 g 2 dμ.
This entails that t → v(t) := V ( f + tg) = v(t) ˜ 2/r 150
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
is continuously differentiable with derivative
˜ v (t) = (2/r )v(t)
2/r −1
v˜ (t) = v˜
2/r −1
(t)
h( f + tg)g dμ.
For t = 0 this entails the asserted expression for DV ( f, g). Moreover, v(t) is twice continuously differentiable on the set {t ∈ R : f + tgr > 0} which equals either R or R \ {to } for some to = 0. On this set the second derivative equals ˜ 2/r −1 v˜ (t) + (2/r )(2/r − 1)v(t) ˜ 2/r −2 v˜ (t)2 v (t) = (2/r )v(t) 2 | f + tg|r −2 2 | f + tg|r −2 ( f + tg) = 2(r − 1) g dμ − 2(r − 2) g dμ f + tgrr −2 f + tgrr −1 f + tg r −2 2 |g| dμ ≤ 2(r − 1) f + tgr ≤ 2(r − 1)gr2 = 2(r − 1)V (g) by virtue of H¨older’s inequality (14) with λ = 2/r . Consequently, by using
t
v (t) − v (0) =
v (s) ds ≤ 2(r − 1)V (g)t,
0
we find that V ( f + g) − V ( f ) − DV ( f, g) 1 (v (t) − v (0)) dt = v(1) − v(0) − v (0) =
0 1
t dt = (r − 1)V (g).
≤ 2(r − 1)V (g) 0
Proof of Theorem 2.2. The first part is an immediate consequence of the considerations preceding the theorem. It remains to prove the (in)equalities and expansion for K Nem (d, r ). Note that K Nem (d, r ) is the infimum of h(q)d −2/r over all real q ∈ [2, r ], where h(q) := (q − 1)d 2/q satisfies the equation h (q) =
d 2/q (q − log d)2 − (log d − 2) log d . 2 q
Since 7 < e2 < 8, this shows that h is strictly increasing on [2, ∞) if d ≤ 7. Hence K Nem (d, r ) = h(2)d −2/r = d 1−2/r
if d ≤ 7.
For d ≥ 8, one can easily show that log d − (log d − 2) log d < 2, so that h is strictly decreasing on [2, rd ] and strictly increasing on [rd , ∞), where rd := log d + February 2010]
< 2 log d, (log d − 2) log d > 2 log d − 2.
NEMIROVSKI’S INEQUALITIES REVISITED
151
Thus for d ≥ 8, K Nem (d, r ) =
h(r )d −2/r = r − 1 < 2 log d − 1 h(rd )d −2/r ≤ h(2 log d) = 2e log d − e
if r ≤ rd , if r ≥ rd .
Moreover, one can verify numerically that K Nem (d, r ) ≤ d ≤ 2e log d − e for 3 ≤ d ≤ 7. Finally, for d ≥ 8, the inequalities rd := 2 log d − 2 < rd < rd := 2 log d yield
K Nem (d, ∞) = h(rd ) ≥ (rd − 1)d 2/rd = 2e log d − 3e, and for 1 ≤ d ≤ 7, the inequality d = K Nem (d, ∞) ≥ 2e log(d) − 3e is easily verified. 6.2. Proofs for Section 3. Proof of Lemma 3.2. The following proof is standard; see, e.g., [1, p. 160], [13, p. 247]. Let x1 , . . . , xn be fixed functions in L r (μ). Then by [8], for any t ∈ T , r 1/r 1/2 n n 2 i xi (t) ≤ Br |xi (t)| . E i=1
(15)
i=1
To use inequality (15) for finding an upper bound for the type constant for L r , rewrite it as r r/2 n n r 2 i xi (t) ≤ Br |xi (t)| . E i=1
i=1
It follows from Fubini’s theorem and the previous inequality that r n x = E E i i r
i=1
=
r n dμ(t) x (t) i i i=1
r n E i xi (t) dμ(t) i=1
≤
Brr
n
r/2 |xi (t)|
2
dμ(t).
i=1
Using the triangle inequality (or Minkowski’s inequality), we obtain r 2/r 2/r r/2 n n 2 2 i xi ≤ Br |xi (t)| dμ(t) E i=1
r
i=1
≤
Br2
n
2/r
|xi (t)| dμ(t) r
i=1
= Br2
n
xi r2 .
i=1
152
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
Furthermore, since g(v) = v 2/r is a concave function of v ≥ 0, the last display implies that 2 r 2/r n n n 2 x ≤ E x ≤ B xi r2 . E i i i i r r
i=1
r
i=1
i=1
d Proof of Lemma 3.4. n For 1 ≤ i ≤ n let xi = (xim )m=1 be an arbitrary fixed vector in d R , andset S := i=1 i xi . Further let Sm be the mth component of S with n variance n 2 vm2 := i=1 xim , and define v 2 := max1≤m≤d vm2 , which is not greater than i=1 xi 2∞ . It suffices to show that
ES2∞ ≤ 2 log(2d)v 2 . To this end note first that h : [0, ∞) → [1, ∞) with h(t) := cosh(t 1/2 ) =
∞ tk (2k)! k=0
is bijective, increasing, and convex. Hence its inverse function h −1 : [1, ∞) → [0, ∞) is increasing and concave, and one easily verifies that
2 h −1 (s) = log(s + (s 2 − 1)1/2 ) ≤ (log(2s))2 . Thus it follows from Jensen’s inequality that for arbitrary t > 0,
ES2∞ = t −2 Eh −1 cosh(t S∞ ) ≤ t −2 h −1 E cosh(t S∞ )
2 ≤ t −2 log 2E cosh(t S∞ ) . Moreover, E cosh(t S∞ ) = E max cosh(t Sm ) ≤ 1≤m≤d
d
E cosh(t Sm ) ≤ d exp(t 2 v 2 /2),
m=1
according to (8), whence
2 2 ES2∞ ≤ t −2 log 2d exp(t 2 v 2 /2) = log(2d)/t + tv 2 /2 . Now the assertion follows if we set t =
2 log(2d)/v 2 .
Proof of (9). We may replace the random sequence {X i } in Example 1.2 with the random sequence {i X i }, where {i } is a Rademacher sequence independent of {X i }. Thereafter on {X i }, i.e., we view it as a deterministic sequence such nwe condition that n −1 i=1 X i X i converges to the identity matrix Id as n → ∞, by the strong law of large numbers. Now Lindeberg’s version of the multivariate Central Limit Theorem shows that n 2 2 n −1/2 E i=1 i X i ∞ 2 sup n ≥ sup En i X i ≥ cd . 2 n≥1 n≥1 ∞ i=1 X i ∞ i=1 February 2010]
NEMIROVSKI’S INEQUALITIES REVISITED
153
Inequalities for Φ. The subsequent results will rely on (10) and several inequalities for 1 − (z). The first of these is: 1 − (z) ≤ z −1 φ(z),
z > 0,
(16)
which is known as Mills’ ratio; see [6] and [19] for related results. The proof of this upper bound is easy: since φ (z) = −zφ(z) it follows that ∞ ∞ −1 ∞ t φ(z) φ(t) dt = . (17) φ(t) dt ≤ φ (t) dt = 1 − (z) = z z z z z z A very useful pair of upper and lower bounds for 1 − (z) is as follows: 2 4 φ(z) ≤ 1 − (z) ≤ φ(z), √ √ 2 z+ z +4 3z + z 2 + 8
z > −1;
(18)
the inequality on the left is due to Komatsu (see, e.g., [10, p. 17]), while the inequality on the right is an improvement of an earlier result of Komatsu due to Szarek and Werner [23]. Proof of Lemma 3.6. To prove the upper bound for T2 (d∞ ), let (i )i≥1 be a Rademacher sequence. With S and Sm as in the proof of Lemma 3.4, for any δ > 0 we may write ∞ 2 2tP sup |Sm | > t dt ES∞ = 0
≤ δ2 + ≤ δ2 +
1≤m≤d
∞
2tP δ
1≤m≤d
d m=1
sup |Sm | > t dt
δ
∞
2tP |Sm | > t dt.
Now by (10) with v 2 and vm2 as in the proof of Lemma 3.4, followed by Mills’ ratio (16), ∞ ∞ 4K vm −t 2 /(2vm2 ) 2tP(|Sm | > t) dt ≤ dt te √ 2π t δ δ ∞ −t 2 /(2vm2 ) e 4K vm ∞ −t 2 /(2vm2 ) e dt = 4K vm2 dt = √ √ 2π δ 2πvm δ = 4K vm2 (1 − (δ/vm )) ≤ 4K v 2 (1 − (δ/v)).
(19)
Now instead of the Mills’ ratio bound (16) for the tail of the normal distribution, we use the upper bound part of (18). This yields ∞ 4cv 2 2 2
2tP(|Sm | > t) dt ≤ 4K v 2 (1 − (δ/v)) ≤ e−δ /(2v ) , 2 2 3δ/v + δ /v + 8 δ √ √ where we have defined c := 4K / 2π = 12.88/ 2π , and hence ES2 ≤ δ 2 + 154
4cdv 2 2 2
e−δ /(2v ) . 3δ/v + δ 2 /v 2 + 8
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
Taking δ 2 = v 2 2 log
cd/2 2 log(cd/2)
gives ⎧ ⎨ ES2 ≤ v 2 2 log d + 2 log(c/2) − log(2 log(dc/2)) ⎩ + 3 2 log √ 2
⎫ ⎬
8 2 log(cd/2) + 2 log √
cd 2 log(cd/2)
2
cd 2 log(cd/2)
+8
⎭
=: v 2 2 log d + h 2 (d) where it is easily checked that h 2 (d) ≤ 3 for all d ≥ 1. Moreover h 2 (d) is negative for d > 4.13795 × 1010 . This completes the proof of the upper bound in (11). To prove the lower bound for cd in (11), we use the lower bound of [13, Lemma 6.9, p. 157] (which is, in this form, due to Gin´e and Zinn [5]). This yields ∞ λ 2 1 2 t + d 4t (1 − (t)) dt (20) cd ≥ 1 + λ o 1 + λ to for any to > 0, where λ = 2d(1 − (to )). By using Komatsu’s lower bound (18), we find that ∞ ∞ 2t t (1 − (t)) dt ≥ φ(t) dt √ t2 + 4 to to t + ∞ 2to
≥ φ(t) dt to + to2 + 4 to =
1+
2 1 + 4/to2
(1 − (to )).
Using this lower bound in (20) yields 8 λ 2 1
to + d (1 − (to )) 1+λ 1 + λ 1 + 1 + 4/to2 4 2d(1 − (to ))
to2 + = 1 + 2d(1 − (to )) 1 + 1 + 4/to2
cd2 ≥
≥ Now we let c ≡
February 2010]
to +
1+
4d √ 2
to +4 4d √
to +
φ(to )
to2 +4
φ(to )
to2 +
1+
4 1 + 4/to2
.
(21)
√ 2/π and δ > 0 and choose cd 2 to = 2 log . (2 log(cd))(1+δ)/2 NEMIROVSKI’S INEQUALITIES REVISITED
155
For this choice we see that to → ∞ as d → ∞, (2 log(cd))(1+δ)/2 2d · = 2(2 log(cd))(1+δ)/2 , 4dφ(to ) = √ cd 2π and 4dφ(to ) 2(2 log(cd))(1+δ)/2 = →∞ to {2 log(cd/(2 log(cd))(1+δ)/2 )}1/2 as d → ∞, so the first term on the right-hand side of (21) converges to 1 as d → ∞, and it can be rewritten as 4
Ad to2 + 1 + 1 + 4/to2 cd 4
= Ad 2 log + (2 log(cd))(1+δ)/2 1 + 1 + 4/to2 ∼ 1 · 2 log d + 2 log c − (1 + δ) log(2 log(cd)) + 2 . To prove the upper bounds for cd , we will use the upper bound of [13, Lemma 6.9, p. 157] (which is, in this form, due to Gin´e and Zinn [5]). For every to > 0 ∞ 2t P(|Z 1 | > t) dt cd2 ≡ E max |Z j |2 ≤ to2 + d 1≤ j ≤d
to
= to2 + 4d ≤
to2
+ 4d
∞
t (1 − (t)) dt
to ∞
φ(t) dt
(by Mills’ ratio)
to
= to2 + 4d(1 − (to )). √ Evaluating this bound at to = 2 log(d/ 2π ) and then using Mills’ ratio again yields
√
√ 2 2 log d/ 2π cd ≤ 2 log d/ 2π + 4d 1 −
1 ≤ 2 log d − 2 log(2π) + 4d 2
φ
√ 2 log d/ 2π
√ 2 log d/ 2π
√ 2 2 = 2 log d − log(2π) + √ log d/ 2π
(22)
≤ 2 log d, where the last inequality holds if √ 2 2 √ ≤ log(2π), log d/ 2π 156
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
or equivalently if log d ≥
log(2π) 8 = 3.28735 . . . , + (log(2π))2 2
. and hence if d ≥ 27 > e3.28735... = 26.77. The claimed inequality is easily verified numerically for d = 3, . . . , 26. (It fails for d = 2.) As can be seen from (22), 2 log d − log(2π) gives a reasonable approximation to E max1≤ j ≤d Z 2j for large d. Using the upper bound in (18) instead of the second application of Mills’ ratio and choosing √ to2 = 2 log(cd/ 2 log(cd)) with c := 2/π yields the third bound for cd in (11) with h 3 (d) = − log(π) − log(log(cd)) + 3 1−
log(2 log(cd)) 2 log(cd)
8 + 1−
log(2 log(cd)) 2 log(cd)
+
4 log(cd)
.
6.3. Proofs for Section 4. Proof of Lemma 4.1. It follows from EY = 0, the Taylor expansion of the exponential function, and the inequality E|Y |m ≤ σ 2 κ m−2 for m ≥ 2 that Y Y Y E exp = 1 + E exp −1− κL κL κL ≤ 1+
∞ ∞ 1 E|Y |m 1 1 σ2 σ 2 e(L) ≤ 1 + = 1 + . m! (κ L)m κ 2 m=2 m! L m κ2 m=2
Proof of Lemma 4.2. Applying Lemma 4.1 to the jth components X i, j of X i and Sn, j of Sn yields for all L > 0, ! ! n n ±Sn, j ±X i, j Var(X i, j )e(L) e(L) E exp E exp exp ≤ exp . = ≤ κL κL κ2 κ2 i=1 i=1 Hence E cosh
Sn ∞ κL
= E max cosh 1≤ j ≤d
Sn, j κL
≤
d
E cosh
j =1
Sn, j κL
≤ d exp
e(L) . κ2
As in the proof of Lemma 3.4 we conclude that 2 Sn ∞ ≤ (κ L) log 2E cosh κL e(L) 2 ≤ (κ L)2 log(2d) + κ2 L e(L) 2 = κ L log(2d) + , κ
ESn 2∞
2
which is equivalent to the inequality stated in the lemma. February 2010]
NEMIROVSKI’S INEQUALITIES REVISITED
157
Proof of Theorem 4.3. For fixed κo > 0 we split Sn into An + Bn as described before. Let us bound the sum Bn first: For this term we have Bn ∞ ≤
n 1[X i ∞ >κo ] X i ∞ + E(1[X i ∞ >κo ] X i ∞ ) i=1
n = 1[X i ∞ >κo ] X i ∞ − E(1[X i ∞ >κo ] X i ∞ ) i=1
+2
n
E(1[X i ∞ >κo ] X i ∞ )
i=1
=: Bn1 + Bn2 . Therefore, since EBn1 = 0, 2 2 + Bn2 EBn 2∞ ≤ E(Bn1 + Bn2 )2 = EBn1
=
n i=1
≤
n i=1
2 n
Var 1[X i ∞ >κo ] X i ∞ + 4 E(X i ∞ 1[X i ∞ >κo ] ) i=1
n EX i 2∞ 2 2 EX i ∞ + 4 κo i=1
2 , κo2 n where we define := i=1 EX i 2∞ . The first sum, An , may be bounded by means of Lemma 4.2 with κ = 2κo , utilizing the bound
Var(X i,(a)j ) = Var 1[X i ∞ ≤κo ] X i, j ≤ E 1[X i ∞ ≤κo ] X i,2 j ≤ EX i 2∞ . =+4
Thus
L e(L) 2 . EAn 2∞ ≤ 2κo L log(2d) + 2κo
Combining the bounds we find that ESn 2∞ ≤ EAn 2∞ + EBn 2∞ ≤ 2κo L log(2d) + = ακo +
Le(L) √ + +2 2κo κo
√ β + , κo
where α := 2L log(2d) and β := (L e(L) + 4)/2. This bound is minimized if κo = √ β/α with minimum value
√
√ 2 αβ + = 1 + 2 L 2 e(L) + 4L log(2d) , and for L = 0.407 the latter bound is not greater than
√
1 + 3.46 log(2d) . 158
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
In the special case of symmetrically distributed random vectors X i , our treatment of the sum Bn does not change, but in the bound for EAn 2∞ one may replace 2κo with κo , because EX i(a) = 0. Thus
Le(L) √ + +2 κo κo
β √ = α κo + + with α := L log(2d), β := (L e(L) + 2) κo √
= 1 + 2 L 2 e(L) + 2L log(2d) if κo = β /α .
ESn 2∞ ≤ κo L log(2d) +
For L = 0.5 the latter bound is not greater than √
1 + 2.9 log(2d) . ACKNOWLEDGMENTS. The authors owe thanks to the referees for a number of suggestions which resulted in a considerable improvement in the article. The authors are also grateful to Ilya Molchanov for drawing their attention to Banach-Mazur distances, and to Stanislaw Kwapien and Vladimir Koltchinskii for pointers concerning type and cotype proofs and constants. This research was initiated during the opening week of the program on “Statistical Theory and Methods for Complex, High-Dimensional Data” held at the Isaac Newton Institute for Mathematical Sciences from 7 January to 27 June, 2008, and was made possible in part by the support of the Isaac Newton Institute for visits of various periods by D¨umbgen, van de Geer, and Wellner. The research of Wellner was also supported in part by NSF grants DMS-0503822 and DMS-0804587. The research of D¨umbgen and van de Geer was supported in part by the Swiss National Science Foundation.
REFERENCES 1. A. Araujo and E. Gin´e, The Central Limit Theorem for Real and Banach Valued Random Variables, Wiley Series in Probability and Mathematical Statistics, John Wiley, New York, 1980. 2. G. Bennett, Probability inequalities for the sum of independent random variables, J. Amer. Statist. Assoc. 57 (1962) 33–45. doi:10.2307/2282438 3. V. H. de la Pe˜na and E. Gin´e, Decoupling: From Dependence to Independence, Probability and its Applications, Springer-Verlag, New York, 1999. 4. R. M. Dudley, Uniform Central Limit Theorems, Cambridge Studies in Advanced Mathematics, vol. 63, Cambridge University Press, Cambridge, 1999. 5. E. Gin´e and J. Zinn, Central limit theorems and weak laws of large numbers in certain Banach spaces, Z. Wahrsch. Verw. Gebiete 62 (1983) 323–354. doi:10.1007/BF00535258 6. R. D. Gordon, Values of Mills’ ratio of area to bounding ordinate and of the normal probability integral for large values of the argument, Ann. Math. Statistics 12 (1941) 364–366. doi:10.1214/aoms/ 1177731721 7. E. Greenshtein and Y. Ritov, Persistence in high-dimensional linear predictor selection and the virtue of overparametrization, Bernoulli 10 (2004) 971–988. doi:10.3150/bj/1106314846 8. U. Haagerup, The best constants in the Khintchine inequality, Studia Math. 70 (1981) 231–283. 9. W. Hoeffding, Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc. 58 (1963) 13–30. doi:10.2307/2282952 10. K. Itˆo and H. P. McKean, Jr., Diffusion Processes and their Sample Paths, Classics in Mathematics, Springer-Verlag, Berlin, 1974. 11. W. B. Johnson and J. Lindenstrauss, Basic concepts in the geometry of Banach spaces, in Handbook of the Geometry of Banach Spaces, vol. I, North-Holland, Amsterdam, 2001, 1–84. 12. A. Juditsky and A. S. Nemirovski, Large deviations of vector-valued martingales in 2-smooth normed spaces, Tech. report, Georgia Institute of Technology, Atlanta, GA, 2008. 13. M. Ledoux and M. Talagrand, Probability in Banach Spaces: Isoperimetry and Processes, Ergebnisse der Mathematik und ihrer Grenzgebiete 3. Folge / A Series of Modern Surveys in Mathematics, vol. 23, Springer-Verlag, Berlin, 1991. 14. A. S. Nemirovski, Topics in non-parametric statistics, in Lectures on Probability Theory and Statistics (Saint-Flour, 1998), Lecture Notes in Mathematics, vol. 1738, Springer, Berlin, 2000, 85–277.
February 2010]
NEMIROVSKI’S INEQUALITIES REVISITED
159
15. , Regular Banach spaces and large deviations of random sums (working paper) 2004. 16. A. S. Nemirovski and D. B. Yudin, Problem Complexity and Method Efficiency in Optimization, John Wiley, Chichester, UK, 1983. 17. I. Pinelis, Extremal probabilistic problems and Hotelling’s T 2 test under a symmetry condition, Ann. Statist. 22 (1994) 357–368. doi:10.1214/aos/1176325373 18. , Optimum bounds for the distributions of martingales in Banach spaces, Ann. Probab. 22 (1994) 1679–1706. doi:10.1214/aop/1176988477 19. , Monotonicity properties of the relative error of a Pad´e approximation for Mills’ ratio, J. Inequal. Pure Appl. Math. 3/2 (2002). , Toward the best constant factor for the Rademacher-Gaussian tail comparison, ESAIM Probab. 20. Stat. 11 (2007) 412–426. doi:10.1051/ps:2007027 21. D. Pollard, Empirical Processes: Theory and Applications, NSF-CBMS Regional Conference Series in Probability and Statistics, 2, Institute of Mathematical Statistics, Hayward, CA, 1990. 22. E. Rio, Moment inequalities for sums of dependent random variables under projective conditions, J. Theoret. Probab. 22 (2009) 146–163. doi:10.1007/s10959-008-0155-9 23. S. J. Szarek and E. Werner, A nonsymmetric correlation inequality for Gaussian measure, J. Multivariate Anal. 68 (1999) 193–211. doi:10.1006/jmva.1998.1784 24. N. Tomczak-Jaegermann, Banach-Mazur Distances and Finite-Dimensional Operator Ideals, Pitman Monographs and Surveys in Pure and Applied Mathematics, vol. 38, Longman Scientific & Technical, Harlow, UK, 1989. 25. S. A. van de Geer, Applications of Empirical Process Theory, Cambridge Series in Statistical and Probabilistic Mathematics, vol. 6, Cambridge University Press, Cambridge, 2000. 26. A. W. van der Vaart and J. A. Wellner, Weak Convergence and Empirical Processes: With Applications to Statistics, Springer Series in Statistics, Springer-Verlag, New York, 1996. ¨ LUTZ DUMBGEN received his Ph.D. from Heidelberg University in 1990. From 1990–1992 he was a Miller research fellow at the University of California at Berkeley. Thereafter he worked at the Universities of Bielefeld, Heidelberg, and L¨ubeck. Since 2002 he has been professor of statistics at the University of Bern. His research interests are nonparametric, multivariate, and computational statistics. Institute of Mathematical Statistics and Actuarial Science, University of Bern, Alpeneggstrasse 22, CH-3012 Bern, Switzerland
[email protected] SARA A. VAN DE GEER obtained her Ph.D. at Leiden University in 1987. She worked at the Center for Mathematics and Computer Science in Amsterdam, at the Universities of Bristol, Utrecht, Leiden, and Toulouse, and at the Eidgen¨ossische Technische Hochschule in Z¨urich (2005–present). Her research areas are empirical processes, statistical learning, and statistical theory for high-dimensional data. Seminar for Statistics, ETH Zurich, CH-8092 Zurich, Switzerland
[email protected] MARK C. VERAAR received his Ph.D. from Delft University of Technology in 2006. In the year 2007 he was a postdoctoral researcher in the European RTN project “Phenomena in High Dimensions” at the IMPAN institute in Warsaw (Poland). In 2008 he spent one year as an Alexander von Humboldt fellow at the University of Karlsruhe (Germany). Since 2009 he has been Assistant Professor at Delft University of Technology (the Netherlands). His main research areas are probability theory, partial differential equations, and functional analysis. Delft Institute of Applied Mathematics, Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands
[email protected],
[email protected] JON A. WELLNER received his B.S. from the University of Idaho in 1968 and his Ph.D. from the University of Washington in 1975. He got hooked on research in probability and statistics during graduate school at the UW in the early 1970s, and has enjoyed both teaching and research at the University of Rochester (1975–1983) and the University of Washington (1983–present). If not for probability theory and statistics, he might be a ski bum. Department of Statistics, Box 354322, University of Washington, Seattle, WA 98195-4322
[email protected]
160
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
NOTES Edited by Ed Scheinerman
An Ancient Elliptic Locus John E. Wetzel The elliptic locus discussed in this article is of ancient origin, but it seems not as well known as it ought to be. We follow D¨orrie [4, pp. 214–216] and attribute it to the Dutch mathematician Frans van Schooten the younger (1615–1660). Theorem (Schooten [9]). If the vertices P and Q of a triangular tile PQR move respectively on two intersecting lines p and q, the locus of the third vertex R is a (possibly degenerate) ellipse whose center is at the point in which the lines p and q meet (Figure 1). q
P
p
T R
Q
Figure 1. Schooten’s locus.
This locus was apparently well known in the 18th and 19th century. It can be found in 19th century textbooks on analytic geometry, commonly as an exercise. See, for example, Baltzer [1, pp. 109–110], Salmon [7, Ex. 1, p. 188 and Ex. 16, p. 190]. After sketching the classical proof of Schooten’s theorem, we present an apparently new, short, coordinate proof based on motions from an unpublished 1996 article by Sullivan [10]. TRAMMELS. The special case of the theorem in which the lines p and q are perpendicular and the three points P, Q, and R are collinear is the familiar elliptic compass or trammel of Archimedes (Figure 2a), also sometimes called the ellipsograph, although the latter term is also used for any mechanism that generates an elliptic locus. The configuration in which the lines p and q are not required to be perpendicular is called the oblique trammel (see Figure 2b). The point R may be taken anywhere on the line PQ. The history of trammels is obscure, but their use as mechanisms to draw ellipses is ancient, the trammel of Archimedes dating back at least to Proclus (410–485 CE) and possibly even to Archimedes (c. 287–212 BCE), and the oblique trammel at least to doi:10.4169/000298910X476068
February 2010]
NOTES
161
1579 (del Monte [3]) and probably much earlier. Taimina [11] remarks without citing historical sources that they were known to Leonardo da Vinci (1452–1519). In any case, the trammel was a familiar mechanism for drawing ellipses in the 16th and 17th centuries. q
q l
R
Q S
Q S
p
P
P
R l
Q S
P
R Q S
R
a. Trammel of Archimedes.
p l
P
l
b. Oblique trammel.
Figure 2. Trammels.
A Proof of the Trammel Locus. A trigonometric proof of the trammel locus is not difficult. Take the line p to be the x-axis, with the origin O at the point in which p and q meet (Figure 3). Let p, q = ϕ (with 0 < ϕ < 180◦ ), and suppose c > 0 is given. Take P on p and Q on q with PQ = c, and assume that neither P nor Q is at the origin O. Let R be a point on the line PQ. Then sin ϑ sin ϕ = OQ PQ in triangle OPQ, where ϑ = ∠OPQ. Writing a = PR, b = QR, and h = c cot ϕ, we find the cartesian coordinates (x, y) of R: x = OQ cos ϕ − b cos ϑ = h sin ϑ − b cos ϑ y = a sin ϑ. R
(1)
q
y
Q ϕ
O
ϑ
P
x=p
Figure 3. Proof of the trammel locus.
162
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
Note that a and b cannot both equal 0. If both are different from 0, then, from (1), y a hy − ax , cos ϑ = ab sin ϑ =
and eliminating the parameter ϑ gives the second-order equation a 2 x 2 − 2ahx y + (b2 + h 2 )y 2 = a 2 b2 .
(2)
The discriminant (−2ah)2 − 4a 2 (b2 + h 2 ) = −4a 2 b2 of this quadratic is negative, the quadratic is not a square, and the locus is a nondegenerate ellipse, as claimed. (It is easy to confirm directly that the coordinates of R satisfy (2) in the excluded cases P = O and Q = O.) When a = 0 and b = 0 it is clear from (1) that the locus is a line segment along p. And similarly when a = 0 and b = 0, the locus is a line segment along q. We have proved the following result. Theorem 1. Suppose points P and Q fixed on a rigid rod move respectively on two intersecting lines p and q. Then the locus of a point R on the rod different from P and from Q is an ellipse whose center is at the point in which the lines p and q meet. If R is P or Q, then (and only then) the ellipse collapses to a line segment. SCHOOTEN’S LOCUS. The theorem of Schooten has historically been deduced from the trammel—a somewhat surprising instance of P´olya’s observation that “the equivalence of the particular and the general . . . is . . . quite usual in mathematics.” Schooten [9] based his proof of the theorem on his study of the properties of a particular “mechanism,” a configuration of four points A, B, C, and D with AB = BC = BD (Figure 4). He takes A to be a fixed point and regards isosceles triangle BCD as linked to A by the segment AB, and he investigates the loci that are traced by points on the sidelines of triangle BCD as it moves subject to various additional constraints. C B
D
A
Figure 4. Schooten’s mechanism.
D¨orrie [4, #47, pp. 214–216] gave a deduction of Schooten’s theorem from the trammel of Archimedes, an argument that he attributed to Schooten. This argument is repeated in Honsberger [5, pp. 173–177]. John Casey [2, p. 175] gave a somewhat similar deduction of the theorem from the oblique trammel, an argument he too attributed to Schooten [9]. More recently Dan Pedoe [6] has given a deduction of Schooten’s theorem from the so-called “Tusi-couple” (i.e., the two-cusped hypocycloid), a result well known to engineers working with planetary gears that has been traced to Nasir al-Din al-Tusi (1201–1274) (see Taimina [11, pp. 92–93]). Pedoe goes so far as to atFebruary 2010]
NOTES
163
tribute the general Schooten theorem to Leonardo da Vinci (1452–1519), although he cites no historical sources. We are aware of only two proofs of Schooten’s theorem in the literature in which the argument does not depend on a previous study of the trammel. One is a quite straightforward trigonometric proof given in 1882 by Richard Baltzer [1, pp. 109– 110]. And Chris Sangwin [8] has very recently given a direct coordinate proof, using Gr¨obner bases to deal with the involved algebraic manipulations involved; this is the only direct coordinate proof of the theorem in the literature of which we are aware. It is worth pointing out that we take the generating triangle T = PQR to be an oriented triangle. In particular, we take P as positive if triangle PQR is positively oriented (i.e., if the cyclic order P → Q → R → P is counterclockwise) and negative otherwise. We turn next to a version of Casey’s elegant argument. Casey’s Argument. Suppose the generating triangle T = PQR is positively oriented, and for the sake of formulating the argument suppose the points and lines are arranged as pictured (Figure 5). (Minor modifications deal similarly with other configurations and with the case in which T is negatively oriented.) Suppose the given lines p and q meet at a point O, and ∠ p, q = ϕ (measured counterclockwise from p to q) with 0 < ϕ < 180◦ . The line through P and R meets the circle C through O, P, and Q at P and hence at a second point S; let s be the line through O and S (Figure 5). Then ∠PSQ = ϕ, and ∠QOS = ∠P, so that ∠POS = ϕ + ∠P, and it follows that the line s is completely determined by the given angle ϕ and the angles of the given triangle PQR (but independent of the positions of P on p and Q on q). Furthermore, the distances PR, RS, and PS are determined by ϕ and triangle PQR. It follows that the locus of R as T moves with P on p and Q on q is the locus of the point R on the line PS as P moves on p and S moves on s, which by Theorem 1 is an ellipse. s
Q
q C
S T R O
p P
Figure 5. Casey’s argument.
Sullivan’s Proof. The underlying idea of this proof is straightforward. Place the given triangle T with the vertex P at the intersection I of the two given lines and the vertex Q on its line q, then rotate it about I through a parametric angle θ to obtain triangle Tθ (Figure 6b). Translate Tθ , keeping Pθ on the line p, until Q θ lies on q, giving triangle Tθ = P¯θ Q¯ θ R¯ θ . The curve sought is then traced by the point R¯ θ as the parametric angle θ varies through one complete revolution. Since the rotation and translation are readily 164
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
_ Qθ
Qθ R0 Rθ
R d
P
Tθ
Q0
T
θ
_ _ Tθ Rθ
P0 = I
Q
a. Triangle T.
q
d _ Pθ
p
b. The construction. Figure 6. Sullivan’s idea.
described in coordinate terms, parametric equations for the locus of R¯ θ can easily be found. It is worth pointing out that this construction provides an easy way to automate this construction using a geometry program such as Geometer’s SketchPad. We begin by restating the theorem to include the condition for degeneracy. Theorem 2. If the vertices P and Q of a triangular tile PQR move respectively on two intersecting lines p and q, the locus of the third vertex R is a (possibly degenerate) ellipse whose center is at the point I in which the lines p and q meet. The ellipse degenerates to a line segment if and only if triangle PQR has circumradius equal to 1 PQ csc ∠ p, q. 2 Proof. Introduce rectangular coordinates with the origin at I and the x-axis along the given line p. Let ϕ, 0◦ < ϕ < 180◦ , be the angle from p to q, let d = PQ, and observe that ∠ P¯θ Q¯ θ I = θ. Then the law of sines applied to triangle I P¯θ Q¯ θ gives sin θ sin ϕ , = d I P¯θ so that the appropriate translation I −→ P¯θ is through the distance I P¯θ = k sin θ, where k = d csc ϕ. If R0 has coordinates (x0 , y0 ), then R¯ θ is the point (x, y), where x cos θ = y sin θ
− sin θ cos θ
x0 k sin θ + , 0 y0
and the parametric equations for the locus are x = (k − y0 ) sin θ + x0 cos θ y = x0 sin θ + y0 cos θ,
(3)
0 ≤ θ ≤ 360◦ . Suppose that k − y0 x0 February 2010]
x0 = ky0 − (x02 + y02 ) = 0. y0 NOTES
(4)
165
Then solving this system gives sin θ =
x y0 − x0 y ky0 − (x02 + y02 )
cos θ =
ky − (x0 x + y0 y) . ky0 − (x02 + y02 )
To eliminate the parameter we square and add, obtaining after a little algebra the equation 2 (x02 + y02 )x 2 − 2kx0 x y + x02 + (k − y0 )2 y 2 = ky0 − (x02 + y02 ) . (5) Since the locus is surely bounded, this central conic must be an ellipse (or degenerate), as claimed. One could also employ the standard test from analytic geometry. A little algebra shows that the discriminant of the quadratic (5) is =−
1 (x02
+
y02
− ky0 )2
< 0.
Further, since the left side of (5) is not a square (because = 0), the locus does not degenerate to a line segment. If ky0 − (x02 + y02 ) = 0, then it follows at once from (3) that y0 x − x0 y = 0, so the locus is a line segment through R0 = (x0 , y0 ). The equation ky − (x 2 + y 2 ) = 0, i.e., 2 k k 2 2 = , x + y− 2 2 is the equation of the circumcircle of triangle P0 Q 0 R0 (it is satisfied by the coordinates of the vertices), so the circumradius of triangle PQR is 12 k. Conversely, if the circumradius of PQR is 12 k, then ky0 − (x02 + y02 ) = 0. Thus the ellipse collapses to a line segment if and only if ky0 − (x02 + y02 ) = 0, i.e., if and only if triangle PQR has circumradius 12 k, as asserted. It is worth stressing again that the generating triangle T = PQR must be an oriented triangle. Writing P for the oriented angle ∠QPR (taken positive in the counterclockwise direction and negative in the clockwise direction), we see that R0 has polar coordinates R0 = PR, ϕ − P so that
(x0 , y0 ) = PR cos(ϕ − P), PR sin(ϕ − P) .
In this way everything can be expressed explicitly in terms of the given oriented triangle PQR. The resulting formulas are nasty. ACKNOWLEDGMENT. I want to express my gratitude to John Sullivan for allowing me to include his proof of Schooten’s theorem. Only the pressure of his other responsibilities prevented him from joining with me as a coauthor. I also want to acknowledge with pleasure some insightful conversations with J. Ralph Alexander and Richard Bishop and to thank John Maki for his invaluable help in translating the relevant parts of Schooten [9].
166
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
REFERENCES 1. R. Baltzer, Analytische Geometrie, S. Hirzel, Leipzig, 1882. 2. J. Casey, A Treatise on the Analytical Geometry of the Point, Line, Circle, and Conic, Hodges, Figgis, Dublin, 1885. 3. G. del Monte, Planisphaeriorum Universalium Theorica, Pisauri, 1579. 4. H. D¨orrie, 100 Great Problems of Elementary Mathematics, Dover, New York, 1965. 5. R. Honsberger, Ingenuity in Mathematics, New Mathematical Library vol. 23, Mathematical Association of America, Washington, DC, 1970. 6. D. Pedoe, The ellipse as an hypotrochoid, Math. Mag. 48 (1975) 228–230. 7. G. Salmon, A Treatise on Conic Sections, 3rd ed., Longman, Brown, Green, and Longmans, London, 1855. 8. C. J. Sangwin, The wonky trammel of Archimedes, Teaching Mathematics and Its Applications 28 (2009) 48–52. doi:10.1093/teamat/hrn019 9. F. van Schooten, Eerste Bouck der Mathematische Oeffeningen, Begrijpende Dijftigh Arithmetische, en Dijftigh Geometrische Doorstellen, 4. Tuych-werckelijcke beschrijving van Kegelsneden, Gerrit van Goedesburgh, Amsterdam, 1659; translation into Dutch by van Schooten of his De organica conicarum sectionum in plano descriptione constructione tractatus, Elsevier, Leiden, 1646. 10. J. Sullivan, Polygon in triangle: Generalizing a theorem of Post (1996), available at http://torus. math.uiuc.edu/jms/Papers/post.pdf (accessed July, 2008). 11. D. Taimina, Historical mechanisms for drawing curves, in Hands On History: A Resource for Teaching Mathematics, MAA Notes, vol. 72, A. Shell-Gellasch, ed., Mathematical Association of America, Washington, DC, 2007, 89–104. Department of Mathematics, University of Illinois at Urbana-Champaign, 1409 W. Green St., Urbana, IL 61801-2975
[email protected]
Counting Interesting Elections Lara K. Pudwell and Eric S. Rowland Suppose that on election day a TV news network of questionable morality wants to increase their viewership as polling results come in. While the reporters cannot control the outcome of the election, they can control the order in which votes are reported to the public. If one candidate is ahead in the tally throughout the entire day, viewership will wane since it is clear that she will win the election. On the other hand, a more riveting broadcast occurs when one candidate is ahead at certain times and the other candidate is ahead at others. In fact, the network employs a group of psychologists and market analysts who have worked out certain margins they would like to achieve at certain points in the tally. The director of programming needs to know the number of ways this can be done. 1. THE BALLOT PROBLEM. We will work up to the general question by first examining the special (low ratings) case when one candidate has at least as many votes as the other throughout the tally. This is the classical “ballot problem,” in which candidate E and candidate N are competing for a public office. Candidate E wins the election with n votes. How many ways are there to report the votes so that at all times during the tally N is not ahead of E? doi:10.4169/000298910X476077
February 2010]
NOTES
167
REFERENCES 1. R. Baltzer, Analytische Geometrie, S. Hirzel, Leipzig, 1882. 2. J. Casey, A Treatise on the Analytical Geometry of the Point, Line, Circle, and Conic, Hodges, Figgis, Dublin, 1885. 3. G. del Monte, Planisphaeriorum Universalium Theorica, Pisauri, 1579. 4. H. D¨orrie, 100 Great Problems of Elementary Mathematics, Dover, New York, 1965. 5. R. Honsberger, Ingenuity in Mathematics, New Mathematical Library vol. 23, Mathematical Association of America, Washington, DC, 1970. 6. D. Pedoe, The ellipse as an hypotrochoid, Math. Mag. 48 (1975) 228–230. 7. G. Salmon, A Treatise on Conic Sections, 3rd ed., Longman, Brown, Green, and Longmans, London, 1855. 8. C. J. Sangwin, The wonky trammel of Archimedes, Teaching Mathematics and Its Applications 28 (2009) 48–52. doi:10.1093/teamat/hrn019 9. F. van Schooten, Eerste Bouck der Mathematische Oeffeningen, Begrijpende Dijftigh Arithmetische, en Dijftigh Geometrische Doorstellen, 4. Tuych-werckelijcke beschrijving van Kegelsneden, Gerrit van Goedesburgh, Amsterdam, 1659; translation into Dutch by van Schooten of his De organica conicarum sectionum in plano descriptione constructione tractatus, Elsevier, Leiden, 1646. 10. J. Sullivan, Polygon in triangle: Generalizing a theorem of Post (1996), available at http://torus. math.uiuc.edu/jms/Papers/post.pdf (accessed July, 2008). 11. D. Taimina, Historical mechanisms for drawing curves, in Hands On History: A Resource for Teaching Mathematics, MAA Notes, vol. 72, A. Shell-Gellasch, ed., Mathematical Association of America, Washington, DC, 2007, 89–104. Department of Mathematics, University of Illinois at Urbana-Champaign, 1409 W. Green St., Urbana, IL 61801-2975
[email protected]
Counting Interesting Elections Lara K. Pudwell and Eric S. Rowland Suppose that on election day a TV news network of questionable morality wants to increase their viewership as polling results come in. While the reporters cannot control the outcome of the election, they can control the order in which votes are reported to the public. If one candidate is ahead in the tally throughout the entire day, viewership will wane since it is clear that she will win the election. On the other hand, a more riveting broadcast occurs when one candidate is ahead at certain times and the other candidate is ahead at others. In fact, the network employs a group of psychologists and market analysts who have worked out certain margins they would like to achieve at certain points in the tally. The director of programming needs to know the number of ways this can be done. 1. THE BALLOT PROBLEM. We will work up to the general question by first examining the special (low ratings) case when one candidate has at least as many votes as the other throughout the tally. This is the classical “ballot problem,” in which candidate E and candidate N are competing for a public office. Candidate E wins the election with n votes. How many ways are there to report the votes so that at all times during the tally N is not ahead of E? doi:10.4169/000298910X476077
February 2010]
NOTES
167
We may represent the state of the tally at any moment by the pair (x, y), where the coordinates x and y count the votes received by E and N respectively. Then a tally consists of a sequence of points on the integer lattice in the plane made in steps of E = 1, 0 and N = 0, 1. Such a sequence is called a northeast lattice path. We say that the lattice path q is restricted by the lattice path p if no part of q lies directly above p. For example, Figure 1 shows two northeast lattice paths from (0, 0) to (n, n) that are restricted by the “staircase” p = E N E N · · · E N , or, equivalently, that do not go above the line y = x. The ballot problem asks for the number Cn of these paths. (Note that if the tally ends at (n, m), we may uniquely continue it to a northeast lattice path ending at (n, n).)
Figure 1. Two northeast lattice paths from (0, 0) to (7, 7) restricted by (E N )7 .
The ballot problem can be solved by constructing a simple recurrence. Let q be a northeast lattice path restricted by the staircase p. Consider the point on q where it first revisits the line y = x, and let i be the x-coordinate of this point. (This point exists since q ends at (n, n).) For the upper path in Figure 1, i = 3; for the lower path, i = 7. Notice that since q does not go above y = x and begins at (0, 0) its first step is E; further, its last step before reaching the point (i, i) is N . Therefore we may delete these steps to obtain a northeast lattice path from (1, 0) to (i, i − 1) that does not go above the line y = x − 1. There are Ci−1 ways to form such a path, and there are Cn−i n ways to continue this path from (i, i) to (n, n), so we have that Cn = i=1 Ci−1 Cn−i . we recognize as the familiar recurrence satisfied by the Catalan numbers Cn = This 2n /(n + 1) [8, Exercise 6.19(h)], so we simply check that the initial condition C0 = 1 n agrees. 2. NOTATION AND THEOREM. We now consider a generalization of the ballot problem. Let LP( p) be the number of northeast lattice paths restricted by an arbitrary northeast lattice path p from (0, 0) to (n, m). The path p represents the network’s predetermined restrictions on the tally. It was known by MacMahon [5, p. 242] that the sum of LP( p) over all such paths is p
LP( p) =
(m + n)! (m + n + 1)! . m! n! (m + 1)! (n + 1)!
However, we are interested in computing LP( p) for specific p. 168
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
First we develop notation for lattice paths. It is possible to represent a northeast lattice path as a word on {E, N }, such as q = EENENNEENENNEN for the upper path in Figure 1. However, this representation is redundant, because the location of each E step determines the path uniquely. Therefore, we may represent a northeast lattice path by the sequence of heights qi of the path along each interval from x = i − 1 to x = i. For example, for the upper path in Figure 1 we have q = (0, 0, 1, 3, 3, 4, 6). This representation is always a nondecreasing tuple of integers, and it is our primary representation of lattice paths in this note. A lattice path q = (q1 , q2 , . . . , qn ) is restricted by the lattice path p = ( p1 , p2 , . . . , pn ) precisely when q ≤ p componentwise, i.e., qi ≤ pi whenever 1 ≤ i ≤ n. To write the main result, however, it turns out to be more natural to use still another representation of a northeast lattice path p—its difference sequence p = ( p1 , p2 − p1 , . . . , pn − pn−1 ). Let (v1 , v2 , . . . , vn ) = v = p. Since p is a northeast lattice path, the entries of v are nonnegative integers. The entry vi is the number of N steps taken along the line x = i − 1, so we can think of this representation as determining a path by the location of each N step. The operator has an inverse , which produces the sequence of partial sums: p = v = (v1 , v1 + v2 , . . . , v1 + v2 + · · · + vn ). The relationship between p and v = p can be interpreted in another way. If v = (v1 , v2 , . . . , vn ) is a tuple of nonnegative integers, the Pitman–Stanley polytope [7] defined by v is n (v) := x ∈ Rn≥0 : x ≤ v componentwise . Thus a tuple x = (x1 , x2 , . . . , xn ) of nonnegative integers is a lattice point inside n (v) precisely when the northeast lattice path x is restricted by v. In other words, provides a bijection from the northeast lattice paths restricted by p to the lattice points in n (p). We now return to the question at hand: How many northeast lattice paths are restricted by the path p = ( p1 , p2 , . . . , pn−1 , pn )? Equivalently, how many lattice points lie inside n (p)? One answer to this question is the following enumer i +1determinant ation. Let A = (ai j ) be the n × n matrix with entries ai j = jp−i+1 . Then the number of northeast lattice paths restricted by p is LP( p) = det A, as given by Kreweras [3] and Mohanty [6, Theorem 2.1]. This fact can be obtained from the triangular system of equations j +1 pi + 1 1 j −i+1 (−1) LP(( p1 , p2 , . . . , pi−1 )) = j −i +1 0 i=1
if j = 0 if j ≥ 1
for 0 ≤ j ≤ n (where LP(()) = 1), which comes from an inclusion–exclusion argument; solve for LP(( p1 , p2 , . . . , pn )) using Cramer’s rule, and in the numerator expand by minors along the last column. February 2010]
NOTES
169
The following theorem presents a formula for LP( p) in which the lattice points in n ((1, 1, . . . , 1, 1)) play a central role. This gives a non-determinantal formula for the number of northeast lattice paths restricted by p. A generalization of the formula has been independently discovered by Gessel and by Pitman and Stanley [7, Equation (33)] in more advanced contexts. Our proof uses elementary combinatorial methods. Theorem. Let p be a northeast lattice path from (0, 0) to (n, m), and let v = p. The number of northeast lattice paths restricted by p is LP( p) =
n
vn+1−i + xi − 1 , xi x i=1
(1)
where the sum is over all Cn+1 lattice points x in n ((1, 1, . . . , 1, 1)). We immediately obtain two well-known results as special cases. For p = (m, m, . . . , m, m) we see that v = p = (m, 0, . . . , 0, 0), which gives n−1 m + xn − 1
xi − 1 LP((m, m, . . . , m, m)) = . xn xi x i=1 Since
xi − 1 1 = xi 0
if xi = 0 if xi ≥ 1
m j m− j to (from the generalization of the binomial theorem (a + b)m = ∞ j =0 j a b m = −1), the only nonzero terms in the sum come from lattice points of the form (0, 0, . . . , 0, xn ), and therefore LP((m, m, . . . , m, m)) =
n m + xn − 1 m+n = n xn x n =0
as expected. For p = (1, 2, . . . , n − 1, n) we recover the ballot problem. Namely, v = p = (1, 1, . . . , 1, 1), so LP((1, 2, . . . , n − 1, n)) =
n
xi x
i=1
xi
=
1 = Cn+1 .
x
Equation (1) allows one to compute LP( p) not only for explicit integer paths but for symbolic paths, and the resulting expressions have the pleasant property that they are written in the basis of rising factorials a (m) = a(a + 1) · · · (a + m − 1). For example, LP((v1 )) = v1(0) + v1(1) = 1 + v1 . For a general path of length 2, we have 1 LP((v1 , v1 + v2 )) = v2(0) v1(0) + v2(0) v1(1) + v2(0) v1(2) + v2(1) v1(0) + v2(1) v1(1) 2 1 = 1 + v1 + v1 (v1 + 1) + v2 + v2 v1 , 2 170
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
and LP((v1 , v1 + v2 , v1 + v2 + v3 )) is v3(0) v2(0) v1(0) + v3(0) v2(0) v1(1) 1 + v3(0) v2(0) v1(2) + 2 1 (0) (2) (0) + v3 v2 v1 + 2
1 (0) (0) (3) v3 v2 v1 + v3(0) v2(1) v1(0) + v3(0) v2(1) v1(1) + 6 1 (0) (2) (1) v v v + v3(1) v2(0) v1(0) + v3(1) v2(0) v1(1) + 2 3 2 1
1 (0) (1) (2) v v v 2 3 2 1 1 (1) (0) (2) v v v 2 3 2 1
+ v3(1) v2(1) v1(0) + v3(1) v2(1) v1(1) . Putting equation (1) together with the determinantal formula for LP( p), we obtain a formula for a certain symbolic determinant in the same basis: n
pi + 1 1 (xi ) v det = , x ! n+1−i j − i + 1 n×n x i=1 i where again v = p. We note that Amdeberhan and Stanley [1, Corollary 4.7] show that LP( p) also gives the number of monomials in the expanded form of the multivariate polynomial pi +1 n
aj
i=1 j =1
in the variables a j . Moreover, LP( p) is the number of noncrossing matchings of a certain type [1, Corollary 4.9]. 3. PROOF OF THE THEOREM. Let lp(v) be the number of lattice points in n (v), where v = (v1 , v2 , . . . , vn−1 , vn ). That is, lp(v) = LP(v). The following recurrence will be used. Proposition. We have 1 lp(v) = v1
if n = 0 lp((v + v − j, v , . . . , v , v )) if n ≥ 1. 1 2 3 n−1 n j =0
Proof. The only lattice point in 0 (()) is (); hence lp(()) = 1. For n ≥ 1, we partition the lattice points w in n (v) according to the first entry j = w1 . Since w is a lattice point in n (v), then w1 + w2 ≤ v1 + v2 , so w2 ≤ v1 + v2 − j. Therefore, lattice points w = ( j, w2 , . . . , wn−1 , wn ) in n (v) are in bijection (by deleting the first entry j) with lattice points in n−1 ((v1 + v2 − j, v3 , . . . , vn−1 , vn )). Thus lp((v1 + v2 − j, v3 , . . . , vn−1 , vn )) is the number of lattice points in n (v) with first entry j, giving the recurrence. To prove the theorem, then, it suffices to show that equation (1) satisfies this recurrence. The base case n = 0 is easily checked, since the product is empty; we have n
vn+1−i + xi − 1 1=1 = xi x i=1 x since again 0 (()) has only one lattice point. February 2010]
NOTES
171
The remainder of this note is devoted to showing that for n ≥ 1 n
vn+1−i + xi − 1 xi x i=1 n−2 v1 v1 + v2 − j + yn−1 − 1
vn+1−i + yi − 1 , = yn−1 yi j =0 y i=1
(2)
where the left sum is over all Cn+1 lattice points x in n ((1, 1, . . . , 1, 1)) and the right sum is over all Cn lattice points y in n−1 ((1, 1, . . . , 1)). We proceed by simplifying this equation until it becomes a statement about sums of binomial coefficients, given in the lemma below. First interchange the two summations on the right side of equation (2). Next, fix y = (y1 , y2 , . . . , yn−1 ) on the right side, and break up the sum on the left according to the choice of y in the following way. The children of y = (y1 , y2 , . . . , yn−1 ) are the elements of the set { (y1 , y2 , . . . , yn−2 , yn−1 , 0) } ∪ {(y1 , y2 , . . . , yn−2 , yn−1 − i, i + 1) : 0 ≤ i ≤ yn−1 }. For example, the children of the lattice point (0, 3, 2) are (0, 3, 2, 0), (0, 3, 2, 1), (0, 3, 1, 2), and (0, 3, 0, 3). It is immediate that each lattice point x has a unique parent y. This definition is central to the proof. The reason for defining children in this way is that x is a lattice point in n ((1, 1, . . . , 1, 1)) if and only if x’s parent is a lattice point in n−1 ((1, 1, . . . , 1)). This property provides a many-to-one correspondence between the n-dimensional lattice points in n ((1, 1, . . . , 1, 1)) and the (n − 1)-dimensional lattice points in n−1 ((1, 1, . . . , 1)). Using this correspondence to break up equation (2), we obtain n
vn+1−i + xi − 1 xi x i=1 n−2 v1 v1 + v2 − j + yn−1 − 1
vn+1−i + yi − 1 = yn−1 yi j =0 i=1
(3)
for each y, where the left sum is over all children x = (x1 , x2 , . . . , xn−1 , xn ) of y. It now suffices to prove equation (3) for a fixed y, since summing both sides of equation (3) over all Cn lattice points y in n−1 ((1, 1, . . . , 1)) produces equation (2). Note that if x is a child of y then xi = yi for 1 ≤ i ≤ n − 2, so we may divide both sides of equation (3) by the product n−2
vn+1−i + yi − 1 yi i=1 to obtain v1 v2 + xn−1 − 1v1 + xn − 1 v1 + v2 − j + yn−1 − 1 = . xn−1 xn yn−1 x j =0 172
(4)
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
We know what the children of y look like, so the sum on the left side can be written as yn−1 v2 + yn−1 − 1 v1 + 0 − 1 v2 + (yn−1 − i) − 1 v1 + (i + 1) − 1 + . yn−1 0 yn−1 − i i +1 i=0 The first term in this expression, which corresponds to the child (y1 , y2 , . . . , yn−1 , 0) of y, is equal to the j = v1 term on the right side of equation (4). Removing this term from both sides leaves v yn−1 1 −1 v2 + yn−1 − i − 1 v1 + i v1 + v2 − j + yn−1 − 1 , = yn−1 − i i +1 yn−1 i=0 j =0 which is proved in the following lemma under the substitution a = v1 , b = v2 , and c = yn−1 . Lemma. Let a, b, and c be nonnegative integers. Then c a−1 b+c−i −1 a+i a+b+c− j −1 = . c−i i +1 c i=0 j =0 Proof. We show that both sides of the equation are equal to a+b+c b+c − . c+1 c+1 The right side is a telescoping sum: a−1 a−1 a+b+c− j −1 a+b+c− j a+b+c− j −1 = − c c+1 c+1 j =0 j =0 a+b+c b+c = − . c+1 c+1 The result for the left side follows from a generalization of the Vandermonde identity, namely f d +k e−k k=0
k
f −k
=
d +e+1 f
[4, Problem 1.42(i)]. The summand on the left side of this equation counts the (d + e + 1 − f )-element subsets of {1, 2, . . . , d + e + 1} whose (d + 1)st element is d + k + 1 by choosing k of the first d + k elements to be not in the set and f − k of the last e − k elements to be not in the set. The right side counts all (d + e + 1 − f )-element subsets of {1, 2, . . . , d + e + 1} by selecting the elements not in the set. Subtract ef from both sides of this equation and substitute d = a − 1, e = b + c, f = c + 1, and k = i + 1 to obtain c b+c−i −1 a+i a+b+c b+c = − . c−i i +1 c+1 c+1 i=0 February 2010]
NOTES
173
Thus the director of programming may, for example, determine the likelihood that a random tally of votes will satisfy the network’s needs. ACKNOWLEDGMENTS. We thank Dimitrije Kostic for generating our interest in this topic [2], Tewodros Amdeberhan for pointing us to relevant literature, Doron Zeilberger for encouraging us to simplify, and an anonymous referee for a number of helpful suggestions.
REFERENCES 1. T. Amdeberhan and R. Stanley, Polynomial coefficient enumeration (to appear), available at http:// arxiv.org/abs/0811.3652. 2. D. Kostic, How to experiment with logjams, Rutgers Experimental Mathematics Seminar, New Brunswick/Piscataway, NJ, November 15, 2007. 3. G. Kreweras, Sur une classe de probl`emes li´es au treillis des partitions d’entiers, Cahiers du Bureau Universitaire de Recherche Op´erationnelle 6 (1965) 5–105. 4. L. Lov´asz, Combinatorial Problems and Exercises, 2nd ed., Elsevier, Amsterdam, 1993. 5. P. A. MacMahon, Combinatory Analysis, vol. 2, Chelsea, New York, 1960; reprint of Cambridge University Press, Cambridge, 1916. 6. S. G. Mohanty, Lattice Path Counting and Applications, Academic Press, New York, 1979. 7. J. Pitman and R. Stanley, A polytope related to empirical distributions, plane trees, parking functions, and the associahedron, Discrete Comput. Geom. 27 (2002) 603–634. 8. R. Stanley, Enumerative Combinatorics, vol. 2, Cambridge University Press, New York, 1999. Department of Mathematics and Computer Science, Valparaiso University, Valparaiso, IN 46383
[email protected] Department of Mathematics, Tulane University, New Orleans, LA 70118
[email protected]
A Proof of Darboux’s Theorem Sam B. Nadler, Jr. Darboux [2] proved the following theorem: Darboux’s Theorem. Let I be an interval in R1 , and let f be a differentiable realvalued function on I . Then the derivative f has the intermediate value property on I (i.e., every number between two values of f is a value of f ). Various proofs of Darboux’s Theorem are in our references. We offer a simple, transparent proof of Darboux’s Theorem that we feel shows systematically and conceptually why the theorem is true. Proof of Darboux’s Theorem. We let D denote the set of all values of the derivative of f , and we let C denote the set of all slopes of chords joining (distinct) points of the graph of f . We prove that D is an interval. The Mean Value Theorem says that C ⊂ D, and the definition of the derivative gives that D ⊂ C (the closure of C). Thus, it suffices to prove that C is an interval. Hence, it suffices to prove that any two points of C can be connected by an interval doi:10.4169/000298910X476086
174
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
Thus the director of programming may, for example, determine the likelihood that a random tally of votes will satisfy the network’s needs. ACKNOWLEDGMENTS. We thank Dimitrije Kostic for generating our interest in this topic [2], Tewodros Amdeberhan for pointing us to relevant literature, Doron Zeilberger for encouraging us to simplify, and an anonymous referee for a number of helpful suggestions.
REFERENCES 1. T. Amdeberhan and R. Stanley, Polynomial coefficient enumeration (to appear), available at http:// arxiv.org/abs/0811.3652. 2. D. Kostic, How to experiment with logjams, Rutgers Experimental Mathematics Seminar, New Brunswick/Piscataway, NJ, November 15, 2007. 3. G. Kreweras, Sur une classe de probl`emes li´es au treillis des partitions d’entiers, Cahiers du Bureau Universitaire de Recherche Op´erationnelle 6 (1965) 5–105. 4. L. Lov´asz, Combinatorial Problems and Exercises, 2nd ed., Elsevier, Amsterdam, 1993. 5. P. A. MacMahon, Combinatory Analysis, vol. 2, Chelsea, New York, 1960; reprint of Cambridge University Press, Cambridge, 1916. 6. S. G. Mohanty, Lattice Path Counting and Applications, Academic Press, New York, 1979. 7. J. Pitman and R. Stanley, A polytope related to empirical distributions, plane trees, parking functions, and the associahedron, Discrete Comput. Geom. 27 (2002) 603–634. 8. R. Stanley, Enumerative Combinatorics, vol. 2, Cambridge University Press, New York, 1999. Department of Mathematics and Computer Science, Valparaiso University, Valparaiso, IN 46383
[email protected] Department of Mathematics, Tulane University, New Orleans, LA 70118
[email protected]
A Proof of Darboux’s Theorem Sam B. Nadler, Jr. Darboux [2] proved the following theorem: Darboux’s Theorem. Let I be an interval in R1 , and let f be a differentiable realvalued function on I . Then the derivative f has the intermediate value property on I (i.e., every number between two values of f is a value of f ). Various proofs of Darboux’s Theorem are in our references. We offer a simple, transparent proof of Darboux’s Theorem that we feel shows systematically and conceptually why the theorem is true. Proof of Darboux’s Theorem. We let D denote the set of all values of the derivative of f , and we let C denote the set of all slopes of chords joining (distinct) points of the graph of f . We prove that D is an interval. The Mean Value Theorem says that C ⊂ D, and the definition of the derivative gives that D ⊂ C (the closure of C). Thus, it suffices to prove that C is an interval. Hence, it suffices to prove that any two points of C can be connected by an interval doi:10.4169/000298910X476086
174
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
f (b) f (d) in C. To prove this, fix p, q ∈ C, p = f (a)− and q = f (c)− , where a < b and a−b c−d c < d in I . We define a continuous function g : [0, 1] → C; g gives the slopes of the chords one obtains by sliding the points (a, f (a)) and (b, f (b)) “linearly” with respect to the x-axis along the graph of f to the points (c, f (c)) and (d, f (d)):
g(t) =
f ((1 − t)a + tc) − f ((1 − t)b + td) . ((1 − t)a + tc) − ((1 − t)b + td)
By the Intermediate Value Theorem, g([0, 1]) is an interval (obviously in C); also, p = g(0) and q = g(1). ACKNOWLEDGMENT. We thank a referee for several of our references.
REFERENCES 1. R. Bagby, Introductory Analysis: A Deeper View into Calculus, Harcourt/Academic Press, Burlington, MA, 2001. ´ 2. G. Darboux, M´emoire sur les fonctions discontinues, Ann. Sci. Ecole Norm. Sup. 4 (1875) 57–122. 3. G. Folland, Advanced Calculus, Prentice Hall, Upper Saddle River, NJ, 2002. 4. N. R. Nandakumar, A note on the intermediate value theorem for derivatives, J. Math. Phys. Sci. 26 (1992) 425–426. 5. M. Olinick and B. Peterson, Darboux’s theorem and points of inflection, Two-Year College Mathematics Journal 7 (1976) 5–9. doi:10.2307/3027142 6. L. Olsen, A new proof of Darboux’s theorem, this M ONTHLY 111 (2004) 713–715. doi:10.2307/ 4145046 University of Toledo, Toledo, Ohio 43606
[email protected]
On the Behavior at Infinity of an Integrable Function Emmanuel Lesigne We denote by x a real variable and by n a positive integer variable. The reference measure on the real line R is the Lebesgue measure. In this note we will use only basic properties of the Lebesgue measure and integral on R. It is well known that the fact that a function tends to zero at infinity is a condition neither necessary nor sufficient for this function to be integrable. However, we have the following result. Theorem 1. Let f be an integrable function on the real line R. For almost all x ∈ R, we have lim f (nx) = 0.
(1)
NOTES
175
n→∞
doi:10.4169/000298910X476095
February 2010]
f (b) f (d) in C. To prove this, fix p, q ∈ C, p = f (a)− and q = f (c)− , where a < b and a−b c−d c < d in I . We define a continuous function g : [0, 1] → C; g gives the slopes of the chords one obtains by sliding the points (a, f (a)) and (b, f (b)) “linearly” with respect to the x-axis along the graph of f to the points (c, f (c)) and (d, f (d)):
g(t) =
f ((1 − t)a + tc) − f ((1 − t)b + td) . ((1 − t)a + tc) − ((1 − t)b + td)
By the Intermediate Value Theorem, g([0, 1]) is an interval (obviously in C); also, p = g(0) and q = g(1). ACKNOWLEDGMENT. We thank a referee for several of our references.
REFERENCES 1. R. Bagby, Introductory Analysis: A Deeper View into Calculus, Harcourt/Academic Press, Burlington, MA, 2001. ´ 2. G. Darboux, M´emoire sur les fonctions discontinues, Ann. Sci. Ecole Norm. Sup. 4 (1875) 57–122. 3. G. Folland, Advanced Calculus, Prentice Hall, Upper Saddle River, NJ, 2002. 4. N. R. Nandakumar, A note on the intermediate value theorem for derivatives, J. Math. Phys. Sci. 26 (1992) 425–426. 5. M. Olinick and B. Peterson, Darboux’s theorem and points of inflection, Two-Year College Mathematics Journal 7 (1976) 5–9. doi:10.2307/3027142 6. L. Olsen, A new proof of Darboux’s theorem, this M ONTHLY 111 (2004) 713–715. doi:10.2307/ 4145046 University of Toledo, Toledo, Ohio 43606
[email protected]
On the Behavior at Infinity of an Integrable Function Emmanuel Lesigne We denote by x a real variable and by n a positive integer variable. The reference measure on the real line R is the Lebesgue measure. In this note we will use only basic properties of the Lebesgue measure and integral on R. It is well known that the fact that a function tends to zero at infinity is a condition neither necessary nor sufficient for this function to be integrable. However, we have the following result. Theorem 1. Let f be an integrable function on the real line R. For almost all x ∈ R, we have lim f (nx) = 0.
(1)
NOTES
175
n→∞
doi:10.4169/000298910X476095
February 2010]
Remark 1. It is too much to hope in Theorem 1 for a result for all x because we consider an integrable function f , which can take arbitrary values on a set of zero measure. Even if we consider only continuous functions, the result does not hold for all x. Indeed a classical result, using a Baire category argument, tells us that if f is a continuous function on R such that for all nonzero x, limn→∞ f (nx) = 0, then limx→±∞ f (x) = 0. Thus for a continuous integrable function f which does not tend to zero at infinity, property (1) is true for almost all x and not for all x. Remark 2. Let f be an integrable and nonnegative function on R. We have 1 f (x) dx. f (nx) dx = n Hence for any nonnegative real sequence (εn ) such that n n /n < +∞, we have εn f (nx) dx < +∞, n
and the monotone convergence theorem (or Fubini’s theorem) ensures that the func tion x → n εn f (nx) is integrable, hence almost everywhere finite. In particular, for almost all x, we have limn→∞ εn f (nx) = 0. This argument is not sufficient to prove Theorem 1. Now we will state that, in a sense, Theorem 1 gives an optimal result. The strength of the following theorem lies in the fact that the sequence (an ) can tend to infinity arbitrarily slowly. Theorem 2. Let (an ) be a real sequence which tends to +∞. There exists a continuous and integrable function f on R such that, for almost all x, lim sup an f (nx) = +∞. n→∞
Moreover, there exists an integrable function f on R such that, for all x, lim sup an f (nx) = +∞. n→∞
Question. Under the hypothesis of Theorem 2, does there exist a continuous and integrable f such that, for all x, lim supn→∞ an f (nx) = +∞? We do not know the answer to this question, and we propose it to the reader. However, the next remark shows that the answer is positive under a slightly more demanding hypothesis. 1 Remark 3. If the sequence (an ) is nondecreasing and satisfies n nan < +∞, then there exists a continuous and integrable function f on R such that for all x, lim supn→∞ an f (nx) = +∞. Remark 4. In Theorem 2 we cannot replace the hypothesis limn an = +∞ by lim supn an = +∞. Indeed, by a simple change of variable we can deduce from Theorem 1 the following result: for all integrable functions f on R, lim n f (n 2 x) = 0
n→∞
176
for almost all x.
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
(Apply Theorem 1 to the function x → x f (x 2 ).) Thus the conclusion of Theorem 2 is false for the sequence (an ) defined by √ n if n is a square of integer, an = 0 if not. In the remainder of this note, we give proofs of the two theorems and of Remark 3. Proof of Theorem 1. The function f is integrable on R. Let us fix ε > 0 and denote by E the set of points x > 0 such that | f (x)| ≥ ε. We know that E has finite measure. We are going to show that, for almost all x ∈ [0, 1], we have nx ∈ E for only finitely many n’s. (If A is a measurable subset of R, we denote by |A| its Lebesgue measure.) For each integer m ≥ 1, let E m := E ∩ (m − 1, m]. Let us fix a ∈ (0, 1). For each integer n ≥ 1, we consider the set
1 1 1 E ∩ [a, 1) = E m ∩ [a, 1) = Fn := (E m ∩ [na, n)) . n n m≥1 n m≥1 We have +∞
|Fn | =
n=1
+∞ +∞ 1 |E m ∩ [na, n)| . n n=1 m=1
In this doubly indexed sum of positive numbers, we can invert the order of summation. Moreover, noticing that E m ∩ [na, n) = ∅ if n > m/a or n ≤ m − 1, we obtain +∞ n=1
|Fn | =
+∞ [m/a] 1 m=1 n=m
n
|E m ∩ [na, n)| ≤
+∞ m=1
|E m |
[m/a] n=m
1 . n
By comparison of the discrete sum with an integral, we see that, for all m ≥ 1, [m/a] 1 n=m n ≤ (1 − ln a). Thus we have +∞ n=1
|Fn | ≤ (1 − ln a)
+∞
|E m | = (1 − ln a)|E| < +∞.
m=1
This implies that almost every x belongs to only finitely many sets Fn . (This statement is the Borel-Cantelli lemma, which has a one-line proof: Fn < +∞ almost everywhere since Fn (x) dx = Fn (x) < +∞.) Returning to the definition of Fn , we conclude that, for almost all x ∈ [a, 1], for all large enough n , x ∈ / Fn , i.e., nx ∈ / E. Since a is arbitrary, we have in fact: for almost all x ∈ [0, 1], for all large enough n, nx ∈ / E. We have proved that, for all ε > 0, for almost all x ∈ [0, 1], for all large enough n, | f (nx)| ≤ ε. Since we have to consider only countably many ε’s, we can invert for all ε > 0 and for almost all x ∈ [0, 1]. We conclude that, for almost all x ∈ [0, 1], limn→∞ f (nx) = 0. It is immediate, by a linear change of variable (for example), that this result extends to almost all x ∈ R. February 2010]
NOTES
177
Proof of Theorem 2. We will utilize the following theorem, a fundamental result in the metric theory of Diophantine approximation [1, Theorem 32]. positive real numbers such that the Kinchin’s Theorem. Let (bn ) be a sequence of sequence (nbn ) is nonincreasing and the series n bn diverges. For almost all real numbers x, there are infinitely many integers n such that dist(nx, Z) < bn . (In the preceding statement, dist(nx, Z) denotes min{|nx − k| | k ∈ Z}.) We will also make use of the following lemma, which will be proved in the sequel. Lemma 1. Let (cn ) be a sequence of nonnegative real numbers going to zero. There exists a sequence of positive real numbers (bn ) such that the sequence (nbn ) is nonincreasing, n bn = +∞, and n bn cn < +∞. Let us prove Theorem 2. Replacing if necessary an by infk≥n ak , we can suppose that the sequence√(an ) is nondecreasing. Applying the preceding lemma to the the sequence cn = 1/ an , we obtain a sequence (b ) such that the sequence (nb ) is nonincreasing, n n n bn = +∞, √ and n bn / an < +∞. The sequence (bn ) tends to zero, and we can impose the additional requirement that bn < 1/2 for all n. We consider the function f 1 defined on R by √ 1/ an if |x − n| ≤ bn for an integer n ≥ 1, f 1 (x) = 0 if not. This function is integrable, due to the last condition imposed on (bn ). By Khinchin’s theorem, for almost all x > 0, there exist pairs of positive integers (n, k(n)), with arbitrarily large n, such that |nx − k(n)| ≤ bn . Let us consider one fixed such x in the interval (0, 1). We have limn→∞ k(n) = +∞ and, since limn→+∞ bn = 0, we have k(n) ≤ n for all large enough n. For such an n, we have |nx − k(n)| ≤ bk(n)
and hence
1 f 1 (nx) = √ . ak(n)
(We used here the fact that the sequence (bn ) is nonincreasing.) Thus, for arbitrarily large n, we have an √ an f 1 (nx) = √ ≥ ak(n) . ak(n) (We used here the fact that the sequence (an ) is nondecreasing.) This proves that lim supn→∞ an f 1 (nx) = +∞. This argument applies to almost all x between 0 and 1. For each integer m ≥ 1, let us denote by f m the function f m (x) = f 1 (x/m). This function f m is nonnegative and integrable on R. It is locally a step function. For almost all x between 0 and m, we have lim sup an f m (nx) = +∞. n→∞
178
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
From this, it is not difficult to construct a continuous and integrable function f on R such that, for all m > 0, there exists Am > 0 with f ≥ f m on [Am , +∞). (For example, we can choose an increasing sequence of numbers (Am ) such that +∞ 1 f 1 (x) + f 2 (x) + · · · + f m (x) dx ≤ 2 ; m Am then we define g = f 1 + f 2 + · · · + f m on the interval [Am , Am+1 ). Since Am+1 f 1 (x) + f 2 (x) + · · · + f m (x) dx < ∞, m
Am
this function g is integrable. Then we just have to find a continuous and integrable function f which dominates g; this can be achieved since the function g is locally a step function: choose f to be zero on (−∞, 0] and continuous +∞on R such that g ≤ m f and, for all m > 0, m−1 f (x) − g(x) dx ≤ 1/m 2 , so that 0 f (x) − g(x) dx < +∞.) For almost all x ≥ 0, we have lim supn→∞ an f (nx) = +∞. A symmetrization procedure extends this property to almost all real numbers. The first part of Theorem 2 is proved. The second part is a direct consequence. We consider the function f constructed above, and we denote by F the set of x such that the sequence (an f (nx)) is bounded. The set {nx | x ∈ F, n ∈ N} has zero measure. We modify the function f on this set, choosing for example the value 1. The new function is integrable and satisfies, for all x, lim supn→∞ an f (nx) = +∞. Proof of Lemma 1. The sequence (cn ) is given, and it goes to zero. We will construct by induction an increasing sequence of integers (n k ) and a nonincreasing sequence of positive numbers (dk ), and we will define bn = dk /n for n k−1 ≤ n < n k . The numbers nk −1 dk will be chosen so that i=n b = 1 ; thus we require that k−1 i dk :=
n k −1
i=n k−1
1 i
−1 .
We start from n 0 = 1, and then we choose n 1 > n 0 such that, for all n ≥ n 1 , |cn | ≤ 1/2. In the next step, we choose n 2 > n 1 such that d2 ≤ d1 and, for all n ≥ n 2 , |cn | ≤ 1/4. More generally, if n 1 , n 2 , . . . , n k−1 have been constructed, we choose n k > n k−1 such that dk ≤ dk−1 and, for all n ≥ n k , |cn | ≤ 2−k . (Of course, this is possible because
−1 n 1 = 0.) limn→+∞ i=n k−1 i This defines the sequence (bn ) by blocks. The sequence (nbn ) is nonincreasing and, for all k ≥ 1, we have n k −1
n k −1
bi = 1 and
i=n k−1
This guarantees that
n
bn = +∞ and
bi ci ≤ 21−k .
i=n k−1
n
bn cn < +∞. The lemma is proved.
About Remark 3. Dirichlet’s lemma in Diophantine approximation (based on the pigeon-hole principle) concerns the particular case bn = 1/n in Khinchin’s theorem and it gives a result for all x. February 2010]
NOTES
179
Lemma 2 (Dirichlet’s Lemma). For all real numbers x, there exist infinitely many integers n such that dist(nx, Z) ≤ n1 . Now, we justify Remark 3. We consider a nondecreasing sequence of positive real numbers (an ) such that 1 < +∞. nan n We claim that there exists a sequence of positive real numbers (bn ) such that bn an → +∞
and
bn n
< +∞.
Here is a proof of this claim: for each k ≥ 1, there exists n(k) such that
1 1 ≤ 2. na k n n≥n(k) We have
card{k | n(k) ≤ n}
n
1 1 = < +∞, nan nan k≥1 n≥n(k)
and we can define bn := card{k | n(k) ≤ n}/an . Given this sequence (bn ), we consider the function f defined on R by f (x) =
bk 0
if |x − k| ≤ 1/k, k an integer, k ≥ 2, if not.
This function is integrable. Using Dirichlet’s lemma, we have the following: for each fixed x in (0, 1), there exist pairs of positive integers (n, k(n)), with n arbitrarily large, such that |nx − k(n)| ≤ 1/n. We have limn→∞ k(n) = +∞ and, for all large enough n, k(n) ≤ n. Hence there exist infinitely many n’s such that |nx − k(n)| ≤
1 k(n)
and so
f (nx) = bk(n) .
For such an n, we have an f (nx) = an bk(n) ≥ ak(n) bk(n) . (We used here the fact that the sequence (an ) is nondecreasing.) This proves that lim supn→∞ an f (nx) = +∞. This result obtained for all numbers x between 0 and 1 extends to all real numbers by the same argument as the one used in the proof of Theorem 2. We can also replace the local step function by a continuous one as we did before. Theorem 1 answers a question asked by Aris Danilidis. 180
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
REFERENCE 1. A. Ya. Khinchin, Continued Fractions, Dover, Mineola, NY, 1997; reprint of (trans. Scripta Technica, Inc.) University of Chicago Press, Chicago, 1961; reprint of 3rd Russian ed., State Publishing House of Physical-Mathematical Literature, Moscow, 1961. Laboratoire de Math´ematiques et Physique Th´eorique, Universit´e Franc¸ois-Rabelais Tours, F´ed´eration Denis Poisson–CNRS, Parc de Grandmont, 37200 Tours, France
[email protected]
Another Proof of the Infinitude of the Prime Numbers Theorem. There are infinitely many prime numbers. Proof. Assume the contrary. Let k be any positive integer. Then k! = p p f ( p,k) , where the product is taken over all primes p and k k k k k f ( p, k) = ≤ k. + ··· < + 2 + ··· = + p p2 p p p−1 Therefore k! < ( p p)k . But limk→∞ ( p p)k /k! = 0, so this is impossible. The formula for the prime factorization of k! is sometimes referred to as de Polignac’s formula, but it is actually due to Legendre [2, p. 8]; see [1, p. 263].
—Submitted by Junho Peter Whang, student, Queen’s University REFERENCE 1. L. E. Dickson, History of the Theory of Numbers, vol. I, Carnegie Institution of Washington, Washington, DC, 1919; reprinted by Dover Publications, Mineola, NY, 2005. 2. A. M. Legendre, Essai sur la Th´eorie des nombres, 2nd ed., Chez Courcier, Paris, 1808.
February 2010]
NOTES
181
PROBLEMS AND SOLUTIONS Edited by Gerald A. Edgar, Doug Hensley, Douglas B. West with the collaboration of Itshak Borosh, Paul Bracken, Ezra A. Brown, Randall Dougherty, Tam´as Erd´elyi, Zachary Franco, Christian Friesen, Ira M. Gessel, L´aszl´o Lipt´ak, Frederick W. Luttmann, Vania Mascioni, Frank B. Miles, Bogdan Petrenko, Richard Pfiefer, Cecil C. Rousseau, Leonard Smiley, Kenneth Stolarsky, Richard Stong, Walter Stromquist, Daniel Ullman, Charles Vanden Eynden, Sam Vandervelde, and Fuzhen Zhang. Proposed problems and solutions should be sent in duplicate to the MONTHLY problems address on the inside front cover. Submitted solutions should arrive at that address before June 30, 2010. Additional information, such as generalizations and references, is welcome. The problem number and the solver’s name and address should appear on each solution. An asterisk (*) after the number of a problem or a part of a problem indicates that no solution is currently available.
PROBLEMS 11481. Proposed by Ron Rietz, Gustavus Adolphus College, St. Peter, MN. Let X be a countable dense subset of a separable metric space M with no isolated points. Show that there exists a countable partition (X 1 , X 2 , . . . ) of X such that each X n is dense in M. 11482. Proposed by Marius Cavachi, “Ovidius” University of Constanta, Constanta, Romania. Let n be a positive integer, and let (a1 , . . . , an ), (b1 , . . . , bn ), and (c1 , . . . , cn ) be n-tuples of points in R2 with noncollinear centroids. For u ∈ R2 , let u be the usual euclidean norm of u. Show that there is a point p ∈ R2 such that n
p − ak =
k=1
n k=1
p − bk =
n
p − ck .
k=1
´ 11483. Proposed by Eric Pit´e, Paris, France. Let A and B be real n × n symmetric matrices such that tr((A + B)k ) = tr(Ak ) + tr(B k ) for every nonzero integer k. Show that AB = 0. 11484. Proposed by Giedrius Alkauskas, Vilnius University, Vilnius, Lithuania. An uphill lattice path is the union of a (doubly infinite) sequence of directed line segments in R2 , each connecting an integer pair (a, b) to an adjacent pair, either (a, b + 1) or (a + 1, b). A downhill lattice path is defined similarly, but with b − 1 in place of b + 1, and a monotone lattice path is an uphill or downhill lattice path. Given a finite set P of points in Z2 , a friendly path is a monotone lattice path for which there are as many points in P on one side of the path as on the other. (Points that lie on the path do not count.) (a) Show that√ if N = a 2 + b2 + a + b for some positive integer pair (a, b) satisfying a ≤ b ≤ a + 2a, then for some set of N points there is no friendly path. (b)∗ Is it true that for every odd-sized set of points there is a friendly path? doi:10.4169/000298910X476103
182
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
11485. Proposed by Neetu Badhoniya, K. S. Bhanu, and M. N. Deshpande, Institute of Science, Nagpur, India. An urn contains a white balls and b black balls, and a ≥ 2b + 3. Balls are drawn at random from the urn and placed in a row as they are drawn. Drawing halts when three white balls are drawn in succession. Let X be the number of paired white balls in the lineup produced during play, and let Y be the number of isolated white balls. Show that E[X ] =
b , a+1
E[Y ] =
b(a + b + 1) . (a + 1)(a + 2)
11486. Proposed by Cezar Lupu, student, University of Bucharest, Bucharest, Romania. Show that in an acute triangle with sides of lengths a1 , a2 , a3 and opposite angles of radian measure A1 , A2 , A3 , 3 3 2 ( 3k=1 tan Ak )3 (1 − cos Ak ) 8 k=1 ak ≥ 3 . cos Ak 9 ( k=1 ak )2 3k=1 (tan Ak + tan Ak+1 ) k=1 11487. Proposed by Stephen Gagola Jr., Kent State University, Kent, OH. Let A be a 0,1-matrix of order n with the property that tr(Ak ) = 0 for every positive integer k. Prove or disprove: A is similar by way of a permutation matrix to a strictly uppertriangular 0,1-matrix.
SOLUTIONS A Combinatorial Identity from Arcsin 11356 [2008, 365]. Proposed by Michael Poghosyan, Yerevan State University, Yerevan, Armenia. Prove that for any positive integer n, n2 n 24n (n!)4 k . 2n = (2n)!(2n + 1)! k=0 (2k + 1) 2k Solution by Omran Kouba, Higher Institute for Applied Sciences and Technology, Damascus, Syria. Recall that 2n ∞ ∞ 1 −1/2 n 2 n = x 2n (−x ) = √ 2n n 2 1 − x2 n=0 n=0 for |x| < 1. Integrating yields arcsin(x) =
∞ n=0
2n n
(2n + 1)22n
x 2n+1 .
√ 2n+1 Define f on (−1, 1) by f (x) = arcsin(x)/ 1 − x 2 , and let ∞ be the n=0 A n x power series expansion of f . Now 2n−2k 2k 2n−2k 2k n2 n n n (2n)! 1 n−k k n−k k k = 2n = 2n An = . 2k 22n−2k 2 (2k + 1)2 2 (2k + 1) 2 (n!) (2k + 1) 2n k=0 k=0 k=0 2k February 2010]
PROBLEMS AND SOLUTIONS
183
On (−1, 1), we have
√ √ 1 − x2 1 − x 2 f (x) = 1, which reduces to (1 − x 2 ) f (x) − x f (x) = 1.
Hence 1=
∞ ∞ ∞ (2n + 1)An x 2n − (2n − 1)An−1 x 2n − An−1 x 2n n=0
= A0 +
n=1
n=1
∞
(2n + 1)An − 2n An−1 x 2n .
n=1
It follows that A0 = 1 and An = 2n/(2n + 1) An−1 for all n ≥ 1, and so An =
22n (n!)2 2 · 4 · 6 · · · (2n) = , 3 · 5 · 7 · · · (2n + 1) (2n + 1)!
for all n. Comparing this to our first expression for An yields the desired identity. Editorial comment. Both Maple and Mathematica can evaluate the sum in terms of gamma functions, from which the result follows by simple manipulations. The GosperZeilberger algorithm can produce a human-readable proof of the identity automatically. Several solvers noted that the result appears in thin disguise as Problem 10(a) on page 121 of J. Riordan, Combinatorial Identities, Wiley, 1968, or as result 3.95 of H. W. Gould, Combinatorial Identities, 1972. Several others derived the result from Saalsch¨utz’s theorem on hypergeometric functions, quoting a variety of sources including equation 5.36 in R. L. Graham, D. E. Knuth, and O. Patashnik, Concrete Mathematics, Addison-Wesley, 1998. Also solved by R. Bagby, M. Bataille (France), D. Beckwith, D. Borwein (Canada), S. Casey (Ireland), R. Chapman (U. K.), C. Curtis, M. L. Glasser, J. Grivaux (France), C. Krattenthaler (Austria), M. E. Larsen ´ Plaza (Denmark), O. P. Lossers (Netherlands), K. McInturff, T. Nakata (Japan), J. H. Nieto (Venezuela), A. & S. Falc´on (Spain), C. R. Pranesachar (India), R. Pratt, O. G. Ruehr, N. C. Singer, A. Stadler (Switzerland), R. Stong, R. Tauraso (Italy), M. Tetiva (Romania), M. Vowe (Switzerland), S. Wagon, B. Ward, P. Xi & Y. Yi (China), Y. Yu, BSI Problems Group (U. K.), CMC 328, GCHQ Problem Solving Group, Microsoft Research Problems Group, NSA Problems Group, UIUC Hypergeometric Functions Class, and the proposer.
An Inradius-Exradius Equation 11357 [2008, 365]. Proposed by Mehmet S¸ahin, Ankara, Turkey. Let Ia , Ib , Ic and ra , rb , rc be respectively the excenters and exradii of the triangle ABC. If ρa , ρb , ρc are the inradii of triangles Ia BC, Ib C A, and Ic AB, show that ρa ρb ρc + + = 1. ra rb rc Solution by Robin Chapman, University of Exeter, Exeter, U. K. Let α, β, and γ denote the angles of triangle ABC. Let α = 12 (π − α), β = 12 (π − β), and γ = 12 (π − γ ). Now α + β + γ = π, and thus there is a triangle A B C with angles α , β , and γ . Let J denote the incenter of triangle A B C , and let Ja be the incenter of triangle Ia BC. In that triangle, the angle B is half the exterior angle of triangle ABC at B; that is, (π − β)/2 = β . Similarly, the angle of triangle Ia BC at C is γ . Therefore triangle 184
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
Ia BC is similar to triangle A B C . Letting F(X Y Z) denote the area of a triangle with given vertices X , Y , Z, we have the following identities: ρa F(J B C ) F(Ja BC) = , = ra F(Ia BC) F(A B C )
ρb F(J C A ) , = rb F(A B C )
and
ρc F(J A B ) . = rc F(A B C )
Now F(A B C ) = F(J B C ) + F(J C A ) + F(J A B ), so ρb ρc ρa + + = 1. ra rb rc Also solved by M. Bataille (France), G. L. Bello-Burguet (Spain), H. Caerois (Chile), S. Casey (Ireland), C. Curtis, P. De (India), V. V. Garc´ıa (Spain), M. Goldenberg & M. Kaplan, D. Gove, K. Greeson & J. Zacharias, J. Grivaux (France), J. G. Heuver (Canada), E. J. Ionascu, I. M. Isaacs, D. Karagulyan (Armenia), N. Khachatryan (Armenia), O. Kouba (Syria), J. H. Lindsey II, O. P. Lossers (Netherlands), J. Minkus, A. Munaro (Italy), J. H. Nieto (Venezuela), C. R. Pranesachar (India), J. Rooin & M. Bayat (Iran), M. Sahin, A. Stadler (Switzerland), R. Stong, M. Tetiva (Romania), D. Vacaru (Romania), V. Verdiyan (Armenia), Z. V¨or¨os (Hungary), M. Vowe (Switzerland), B. Ward (Canada), L. Zhou, GCHQ Problem Solving Group (U. K.), Microsoft Research Problems Group, and the proposer.
Bernstein’s Envelope 11359 [2008, 365]. Proposed by J. Monterde, University of Valencia, Valencia, Spain. Find an explicit parametric formula for the geometric envelope onthe interval (0, 1) of the family of Bernstein polynomials Bsn (t), defined by Bsn (t) = ns t s (1 − t)n−s for n! . s ∈ [0, n], where ns = (s+1)(n−s+1) Solution by Max Stampfli, University of Applied Sciences, Bern, Switzerland. The envelope is given by solving simultaneously n s ∂ n B (t) = 0. Bsn (t) = t (1 − t)n−s and s ∂s s Logarithmic differentiation yields ∂ Bsn (t) = Bsn (t) [ψ(n − s + 1) − ψ(s + 1) + ln(t/(1 − t))] , ∂s where ψ is the digamma function given by ψ(x) = (x)/ (x). Because Bsn (t) > 0 for all t ∈ (0, 1), the equation ∂s∂ Bsn (t) = 0 simplifies to ψ(n − s + 1) − ψ(s + 1) + ln(t/(1 − t)) = 0. Although exact methods of solving this for s do not exist, by solving for t = xn (s) we obtain the envelope as a parameterized curve (xn (s), yn (s)): xn (s) =
eψ(s+1) , eψ(s+1) + eψ(n−s+1)
yn (s) =
Bsn (xn (s))
sψ(s+1) (n−s)ψ(n−s+1) n e e = . ψ(s+1) s (e + eψ(n−s+1) )n
Also solved by R. Mabry, N. C. Singer, A. Stadler (Switzerland), R. Stong, M. Vowe (Switzerland), GCHQ Problem Solving Group (U. K.), and the proposer.
February 2010]
PROBLEMS AND SOLUTIONS
185
An Ellipse of Lemoine Points 11361 [2008, 366]. Proposed by Finbarr Holland, University College Cork, Cork, Ireland. The Lemoine point of a triangle is the unique point L inside the triangle such that the distances from L to the sides are proportional to the corresponding side lengths. Given a circle G and distinct fixed points B and C on G, let K be the locus of the Lemoine point of ABC as A traverses the circle. Show that K is an ellipse. Solution by Omran Kouba, H.I.A.S.T., Damascus, Syria. As usual, write a, b, c for the lengths of the sides opposite vertices A, B, C, respectively. Lemma. The Lemoine point L of ABC is the barycenter of the weighted points (A, a 2 ), (B, b2 ), (C, c2 ); that is, −→ −→ −→ a 2 L A + b2 L B + c2 LC = 0. (1) Proof. Write F(X Y Z) for the area of X Y Z. Let la , lb , lc be the distances from the Lemoine point to the sides BC, C A, AB, respectively. By definition there is a positive constant κ such that la = 2κa, lb = 2κb, and lc = 2κc. Thus F(L BC) = κa 2 , F(LC A) = κb2 , and F(L AB) = κc2 . Now L is inside ABC, so there exist positive λ, μ, ν with λ + μ + ν = 1 such that L is the weighted barycenter of −→ −→ −→ (A, λ), (B, μ), (C, ν). To determine λ, μ, ν, note that λ L A + μ L B + ν LC = 0 implies
−→ −→ −→ −→ −→ −→ −→ −→ λ det L A, L B + ν det LC, L B = 0 = λ det L A, LC + μ det L B, LC . Consequently λF(L AB) = ν F(L BC) and λF(LC A) = μF(L BC), so λ : μ : ν = F(L BC) : F(LC A) : F(L AB) = a 2 : b2 : c2 . Without loss of generality, the points B and C and the center of G have Cartesian coordinates (−1, 0), (1, 0), and (0, t), respectively. The coordinates (α, β) of any point A on G must satisfy α 2 + (β − t)2 = A2 = 1 + t 2 , or α 2 + β 2 − 2tβ − 1 = 0.
(2)
Now a 2 = BC 2 = 4, b2 = AC 2 = (α − 1)2 + β 2 = 2 − 2α + 2tβ, and c2 = AB 2 = (α + 1)2 + β 2 = 2 + 2α + 2tβ, so a 2 + b2 + c2 = 8 + 4tβ. Hence −→ OL =
c2 − b2 −→ a2 α −→ 1 −→ −→ OC + OA = OC + O A. 2 2 2 2 2 2 a +b +c a +b +c 2 + tβ 2 + tβ
Therefore, the coordinates (x, y) of L satisfy x = 2α/(2 + tβ) and y = β/(2 + tβ), so α = x/(1 − t y) and β = 2y/(1 − t y). Therefore, by (2), L ∈ K if and only if 2 2 2y 4t y x − 1 = 0, + − 1 − ty 1 − ty 1 − ty which is equivalent to x 2 + (4 + 3t 2 )y 2 − 2t y − 1 = 0 or 2 t 4 + 4t 2 x 2 + (4 + 3t 2 ) y − = . 4 + 3t 2 4 + 3t 2 As required, this is the equation of an ellipse. Also solved by M. Bataille (France), M. Brozinsky, M. Goldenberg & M. Kaplan, J.-P. Grivaux (France), L. R. King, J. H. Lindsey II, J. Minkus, C. R. Pranesachar (India), A. Stadler (Switzerland), R. Stong, H. L. Stubbs,
186
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
R. S. Tiberio, Z. V¨or¨os (Hungary), M. Vowe (Switzerland), H. Widmer (Germany), J. B. Zacharias, GCHQ Problem Solving Group (U. K.), Microsoft Research Problems Group, and the proposer.
Counting Good Bit String Arc Diagrams 11362 [2008, 461]. Proposed by David Callan, University of Wisconsin, Madison, WI. A bit string arc diagram is an undirected graph in which the vertices are the positions in a single string of bits and the edges are called arcs due to the visual representation in which they are drawn joining positions in the string. To be a good diagram, arcs must occur only between unequal bits, and each bit may be the left endpoint of at most one arc. Thus the first diagram is good but, for two reasons, the second is not.
1
1
0
0
1
1
0
1
1
0
0
1
1
0
There are six good diagrams on two bits, four with no arc and two with a single arc. How many good diagrams are there on n bits? Solution by Jody Lockhart, U. S. Naval Academy, Annapolis, MD. The answer is (n + 1)!. We use induction on n. Letting g(n) be the number of good diagrams on n bits, it is immediate that g(1) = 2. Given that g(n − 1) = n!, consider a good diagram on n bits. Removing the leftmost bit and any arc emanating from it leaves a good diagram on n − 1 bits. Each good diagram on n − 1 bits arises in this way n + 1 times, since when one vertex is added to the left of the others, the diagram is completed either by adding an arc from it to another vertex (there are n − 1 choices, and each determines the label of the new vertex) or by adding no new arcs and labeling the new vertex with 0 or 1 (two more choices). Thus g(n) = (n + 1)g(n − 1) = (n + 1)!. Editorial comment. Robin Chapman observed that the answer and the proof remain unchanged if the words “only between unequal bits” in the problem statement are changed to “only between equal bits”. Also solved by S. Abbott (U. K.), M. Andreoli, D. Beckwith, J. C. Binz (Switzerland), D. R. Bridges, R. Chapman (U. K.), M. T. Clay, P. Corn, C. Curtis, D. Degiorgi (Switzerland), J. Freeman, J. Gately, J. Guerreiro & J. Matias (Portugal), E. Guliyev & M. Sterin, T. Hopp, I. M. Isaacs, R. Johnsonbaugh, J. H. Lindsey II, S. C. Locke, O. P. Lossers (Netherlands), S. Metcalf, D. Nacin, J. H. Nieto (Venezuela), M. A. Prasad (India), R. Pratt, E. Raduns, M. Renault, J. B. Robertson, T. Rucker, P. Spanoudakis (U. K.), R. Stong, P. D. Straffin, J. Swenson, R. Tauraso (Italy), P. Weiner, BSI Problems Group (Germany), GCHQ Problem Solving Group (U. K.), and the proposer.
Nested Radicals 11367 [2008, 461]. Proposed by Andrew Cusumano, Great Neck, NY. Let x1 = √ √ 1 + 2, x2 = 1 + 2 1 + 3, and in general, let xn+1 be the number obtained by replacing the innermost √ expression (1 + (n + 1)) in the nested square root formula for xn with 1 + (n + 1) 1 + (n + 2). Show that lim
n→∞
xn − xn−1 = 2. xn+1 − xn
Solution by John H. Lindsey II, Cambridge, MA. We introduce two auxiliary, doubly indexed, sequences an,i and bn,i , defined for 0 ≤ i ≤ n. We put an,n = 1, and for 0 ≤ i ≤ n − 1, set an,i = 1 + (i + 2)an,i+1 , so that an,0 = xn . We put bn,i = (i + 3 − February 2010]
PROBLEMS AND SOLUTIONS
187
an,i )/(i + 2), noting that bn,n = 1, and we put m = 2 log n. A computation shows that for 0 ≤ i ≤ n − 1, bn,i 1 = , i+2 bn,i+1 2 − i+3 bn,i
(*)
from which it follows that 1 = bn,n > bn,n−1 > · · · > bn,0 > 0. Next, we note that 2 /(i + 3) < (1 − bn,i )2 . 1 − bn,i+1 = (1 − bn,i )2 − bn,i
(**)
We shall need some less immediate properties of these sequences. We claim 1<
an,i an−1,i−1
bn,n−m ≤
1 n
<
i +2 i +1
for 1 ≤ i ≤ n − 1
(1)
for n ≥ 1000
2−i bn,n−m < bn,n−m−i ≤
1 n
i 2 3
(2) for n ≥ 1000, 1 ≤ i ≤ n − m
bn,i /bn,i+1 i +1 i +3 < < i +2 bn−1,i−1 /bn−1,i i +2 2−(n−m) <
bn,0 bn,n−m
< 2−(n−m) e2/(2n−1)
(3) (4)
for n ≥ 1000
(5)
bn,n−m n−m+1 n+2 < . < n+m bn−1,n−m+1 n−m+2
(6)
Postponing the proof of the claims, we now compute bn,0 bn,n−m lim = lim · n→∞ bn−1,0 n→∞ bn−1,n−m−1
bn,0 bn,n−m bn−1,0 bn−1,n−m−1
=1·
1 2
so that lim
n→∞
xn − xn−1 an,0 − an−1,0 bn−1,0 − bn,0 = lim = lim n→∞ an+1,0 − an,0 n→∞ bn,0 − bn+1,0 xn+1 − xn bn−1,0 /bn,0 − 1 2−1 = 2. = n→∞ 1 − bn+1,0 /bn,0 1 − 1/2
= lim
Claim (1) follows from√ backwards induction on i, starting with i = n − 1 where we have √ an,n−1 /an−1,n−2 = (n + 2)/(n + 1) and then using the definition of an,i together with (i + 2)/i < (i + 1)/i. For (2), suppose instead that bn,n−m > 1/n. Then (1 − m−1 bn,n−m ) < (1 − 1/n). Using (**) m − 1 times now yields 1 − bn,n−1 < (1 − 1/n)2 . Thus m (1 − bn,n−1 )2 < (1 − 1/n)2 < exp (−2m /n) < exp −n 2 log 2−1 . On the other hand, bn,n−1 = (n + 2)/(n + 3) from (*), so (n + 3)2 > exp n 2 log 2−1 . For n sufficiently large, this fails, and some mundane calculation confirms that n = 188
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
1000 is sufficiently large. For (3), when n > 1000, (*) gives bn,n−m−1 < bn,n−m , then 1 bn,n−m−1 1 n−m+1 < < 2 bn,n−m 2 − n−m+2 bn,n−m−1 <
2 1 1 < , < 2 − bn,n−m 2 − 1/n 3
and eventually i i 1 2 bn,n−m−i < < . 2 bn,n−m 3 For (4), the definition of bn,i , (*), and (1) give a
1 + n−1,i−1 bn,i /bn,i+1 i+2 = an,i bn−1,i−1 /bn−1,i 1 + i+3 which has the form X = (1 + θa/(i + 2))/(1 + a/(i + 3)) with a > 1 and (i + 1)/(i + 2) < θ < 1. For any such X , (i + 1)/(i + 2) < X < (i + 3)/(i + 2). The lower bound in (5) is immediate from (3). For the upper bound, from (*) and (3) we have n−m−1 bn,i log bn,0 = log(bn,n−m ) + log bn,i+1 i=0
i + 2 1 2 n−m−i n−m−1 · < log(bn,n−m ) − i =0 log 2 − i +3 n 3 n−m 1 2 i log 1 − < log(bn,n−m ) − (n − m) log(2) − 2n 3 i=1 < log(bn,n−m ) − (n − m) log(2) +
2 . 2n − 1
The two ends of this chain of inequalities are equivalent to the required upper bound. Finally, for (6), using both estimates of (4) gives n−1 n−1 n−m+1 i +1 bn,i /bn,i+1 = < n+m i + 2 i=n−m bn−1,i−1 /bn−1,i i=n−m
=
bn,n−m bn−1,n−m−1
<
i=n−m
n−1
n+2 i +3 = . i +2 n−m+2
Also solved by F. Alayont, K. Andersen (Canada), R. Bagby, D. & J. Borwein (Canada), P. Bourdon, P. Bracken, R. Chapman (U. K.), H. Chen, K. Dale (Norway), P. P. D´alyay (Hungary), K. Endo, G. C. Greubel, J. Grivaux (France), J. A. Grzesik, S. J. Herschkorn, M. Hildebrand, F. Holland (Ireland), A. Incognito & T. Mengesha, V. K. Jenner (Switzerland), O. Kouba (Syria), K.-W Lau (China), W. R. Livingston, O. P. Lossers ´ (Netherlands), K. McInturff, K. Nagasaki (Japan), T. Nakata (Japan), O. Pad´e (Israel), P. Perfetti (Italy), A. Plaza & J. M. Pacheco (Spain), D. S. Ross, V. Rutherfoord, B. Schmuland (Canada), A. Stadler (Switzerland), R. Stong, R. Tauraso (Italy), M. Tetiva (Romania), M. Thaler (Australia), J. Vinuesa (Spain), Z. V¨or¨os (Hungary), T. Wilkerson, Y. Yu, BSI Problems Group (Germany), GCHQ Problem Solving Group (U. K.), Microsoft Research Problems Group, and the proposer.
February 2010]
PROBLEMS AND SOLUTIONS
189
REVIEWS Edited by Jeffrey Nunemacher Mathematics and Computer Science, Ohio Wesleyan University, Delaware, OH 43015
Tools of American Mathematics Teaching, 1800–2000. By Peggy Aldrich Kidwell, Amy Ackerberg-Hastings, and David Lindsay Roberts, Johns Hopkins University Press, Baltimore, 2008, xviii+418 pp., ISBN 13-978-0-8018-8814-4, $70.
Reviewed by Daniel S. Silver The chalk-covered overhead projector in the classroom corner has a story to tell. So does the pitted blackboard. Even the abandoned scrap of graph paper has a tale. These objects find their voice in Tools of American Mathematics Teaching. Other, less familiar or forgotten devices such as cube-root blocks and Cuisenaire rods get the occasion to speak as well. Tools of American Mathematics Teaching is divided into four parts: Tools of Presentation and General Pedagogy; Tools of Calculation; Tools of Measurement and Representation; Electronic Technology and Mathematical Learning. The impatient reader can dip in anywhere without fear of getting lost, since the chapters are independent. Careful references and generous notes at the end make this book a valuable supplement for a course in history of mathematics, especially one designed for education students. Briefly, Tools of American Mathematics Teaching is a highly unusual, well-written book that will entice those who have been on either side of the lectern. “When do you suppose blackboards were introduced in the United States?” I eavesdropped as one of the authors, Peggy Kidwell, posed the question to a dinner table full of mathematicians attending a conference. Her account of indignant Yale students whose resistance to such an innovation led to the Conic Section Rebellion of 1830 (and their ultimate dismissal) now often intrudes on my subconscious when I drag an eraser across a board. Ostensibly, mathematical tools are the focus of the book. They are treated carefully, even lovingly. One chapter is titled “The Slide Rule,” another “The Protractor.” More than 80 black and white photographs aid the reader’s imagination. However, Tools of American Mathematics Teaching is in fact a catalog of enthusiasms. Each of the chronicled devices embodies someone’s hopes. Consider, for example, this early 19th-century description of the abacus: “Suffice it to say, that it was one of the best instruments that was ever introduced into an infant school.” Or this later assertion: “The single most important catalyst for today’s mathematics education reform movement is the continuing exponential growth in personal access to powerful computing technology.” The most common teaching tools of our day are textbooks. They were not always common. Textbooks were rare and expensive in the early United States. When A New and Complete System of Arithmetic was published in 1786 by a Newbury-Port schoolmaster, Nicholas Pike, the weight of the event was recognized by the former commander of the Continental Army, George Washington. “I flatter myself,” he wrote, “that the idea of its being an American production, and the first of its kind which has appeared, doi:10.4169/000298910X476112
190
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
will induce every patriotic character to give it all the countenance in his power” [2, pp. 181–182]. Those who traded in anchors of brandy, pipes of water, or butts of ale were thankful for Pike’s book. So great was its need that it remained in general use for more than 50 years. Other authors followed Pike’s example. Both Jeremiah Day of Yale and John Farrar of Harvard made handsome profits from their textbooks. Yet both continued to regard themselves primarily as university professors rather than successful authors. Charles Davies was different. A hasty product of the United States Military Academy at West Point, he returned after the War of 1812 as an assistant professor of mathematics. Perhaps it was his depressing salary of ten dollars a month that motivated Davies to begin writing mathematics textbooks. He would churn out fifty books during the same number of years from 1826 to 1876. His dealings with various publishers suggest that he was equally fond of business. Davies created a new, lucrative career that would survive longer than his writing. A sense of civic duty also propelled most mathematical authors and inventors. However, another, more alarming motive was gaining strength. As schools and students multiplied, cries for uniform standards and assessment became loud. By the end of the nineteenth century, testing had become an industry that would survive recessions and even the Great Depression. When knowledgeable enthusiasm for teaching gives way to other concerns, trouble follows. Administrators and psychologists soon began a symbiotic relationship, the former dreaming of more efficient classrooms, the latter of huge classroomlaboratories. Together they began to transform teachers into data gatherers. Some of the tests, such as those designed by Stuart Appleton Courtis which timed students’ arithmetic computations, were at best a waste of time. One series of “diagnostic tests,” designed by Courtis for the schools of Cleveland and Grand Rapids, required nearly six hours for completion. A colleague, Walter Scott Monroe, warned, “Any series of classroom tests must not require a large amount of time if they are to be used by any besides the most enthusiastic workers.” Schools today must be full of enthusiastic workers. The need for greater classroom efficiency also spurred the use of many of the devices found in the pages of Tools of American Mathematics Teaching. Perhaps the oddest instance involved the overhead projector. As the Second World War began, the United States military discovered that it was now in the business of teaching. Thousands of new recruits needed training in technical matters, and many had little education. While movies and filmstrips were helpful, one such device, the overhead projector, was drawing special attention and funding of the U.S. government. Before the war, overhead projectors were more likely to be bathed in beer than chalk. The “Tel-E-Score,” manufactured in Chicago by the Brunswick-BalkeCollender Company, projected written bowling scores on screens at the head of the alley. After the war ended, aggressive marketing combined with government grants promoted “bowling-alley projectors” to the classroom. Unlike blackboards, overhead projectors permitted teachers to remain facing unruly students, thereby satisfying needs of both pedagogy and protection. “Projectuals,” devices specifically designed for the projectors, offered teachers the opportunity to present material in a dynamic fashion. One such device that is still available for purchase used iron filings in a soft bag of viscous fluid together with magnets to help students visualize magnetic fields. While some teaching tools promoted more efficient teaching, others demanded precious time. Cube-root blocks (a visual aid for cube root extraction) and Cuisenaire rods (colorful manipulatives for demonstrating basic arithmetic operations, reciprocal February 2010]
REVIEWS
191
numbers, proportion, and other elementary concepts) are two examples. As early as the mid 1800s we find educators contending that children should discover mathematical principles for themselves. I do not know if the use of such tools initiated the teaching debate between “traditionalists” and “reformers,” the former emphasizing manipulation of symbols, the latter objects. Certainly, it accelerated this ongoing fight. The slide rule was a different sort of device. It was invented by a seventeenthcentury Anglican minister and mathematician, William Oughtred. Specialized rules for tradesmen quickly followed. One such “gauging rule” was used to determine excise taxes on wine, ale, and spirits contained in barrels. When in the late nineteenth century the slide rule was introduced to American college classrooms, it arrived as a tool only for advanced calculations. Nevertheless, there was unease among some educators. The authors of Tools of American Mathematics Teaching might have included the following warning of Oughtred, reported in 1916 by mathematics historian Florian Cajori [1, p. 93]. A boy may learn to use a slide rule mechanically and, because of his ability to obtain practical results, feel justified in foregoing the mastery of underlying theory; or he may consider the ability of manipulating a surveying instrument quite sufficient, even though he be ignorant of geometry and trigonometry; or he may learn how to operate a dynamo and an electric switchboard and be altogether satisfied, though having no grasp of electrical science. Thus instruments draw a youth aside from the path leading to real intellectual attainments and real efficiency; they allure him into lanes which are often blind alleys. Such were the views of Oughtred. Many of the devices described in Tools of American Mathematics Teaching were made possible by new or improved materials. The commercial success of the overhead projector, for example, was dependent upon cellophane, invented in 1912 and later used to make inexpensive transparencies. The popularity of graph paper depended on production methods that could ensure large supplies at low cost. However, one innovation had a greater impact on mathematical tools than all previous ones combined. With the introduction of miniaturized electronic circuits in 1959, the shore of a previously unimagined, virtual world was glimpsed. As affordable handheld calculators arrived in the 1970s, traditional mathematics curricula and attitudes about teaching began to change. Fifty years later we are still struggling in the ocean currents, making our way to an uncertain shore. It is fitting that the final quarter of Tools of American Mathematics Teaching is devoted to electronic technology. Computers have made possible new educational tools. More significantly, they have changed the way we discover and, arguably, the way we think. (Today it is easier to find a slide rule “app” for an iPod Touch than to purchase the real thing.) The search during the past two centuries for new tools of teaching is now carried to the jungles of our virtual world. I can’t wait to read the sequel to this extraordinary book 200 years from now. REFERENCES 1. W. Cajori, William Oughtred, A Great Seventeenth-Century Teacher of Mathematics, Open Court, Chicago, 1916. 2. G. E. Littlefield, Early Schools and School-Books of New England, The Club of Odd Volumes, Boston, 1904. University of South Alabama, Mobile, AL 36688
[email protected]
192
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117
Mathematical Association of America
Presents:
Visual Group Theory by Nathan Carter Group Theory is the branch of mathematics that studies symmetry, found in crystals, art, architecture, music and many other contexts. Its beauty is often lost on students because it is typically taught in a technical style that is difficult to understand. Visual Group Theory assumes only a high school mathematics background and covers a typical undergraduate course in group theory from a thoroughly visual perspective. The more than 300 illustrations in Visual Group Theory bring groups, subgroups, homomorphisms, products, and quotients into clear view. Every topic and theorem is accompanied with a visual demonstration of its meaning and import, from the basics of groups and subgroups through advanced structural concepts such as semidirect products and Sylow theory. Although the book stands on its own, the free software Group Explorer makes an excellent companion. It enables the reader to interact visually with groups, including asking questions, creating subgroups, defining homomorphisms, and saving visualizations for use in other media. It is open source software available for Windows, Macintosh, and Unix systems from http://groupexplorer.sourceforge.net. Catalog Code: VGT 334 pp., Hardbound 2009 MAA Member: $57.50
ISBN: 978-0-88385-757-1 List: $71.95
Order your copy today! 1.800.331.1622 www.maa.org Mathematical Association of America
®
THE MATHEMATICAL ASSOCIATION OF AMERICA 1529 Eighteenth Street, N.W. Washington, DC 20036