Springer Monographs in Mathematics
József Beck
Probabilistic Diophantine Approximation: Randomness in Lattice Point Counting
More information about this series at http://www.springer.com/series/3733
József Beck Department of Mathematics Rutgers University Piscataway, NJ, USA
ISSN 1439-7382 ISSN 2196-9922 (electronic) ISBN 978-3-319-10740-0 ISBN 978-3-319-10741-7 (eBook) DOI 10.1007/978-3-319-10741-7 Springer Cham Heidelberg New York Dordrecht London Library of Congress Control Number: 2014950069 © Springer International Publishing Switzerland 2014 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. 
The publisher makes no warranty, express or implied, with respect to the material contained herein. Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
Preface
We could choose “randomness of $\sqrt{2}$” as an alternative subtitle of the book. Indeed, the book connects two seemingly unrelated concepts, namely, (1) $\sqrt{2}$, symbolizing the class of quadratic irrationals, including the theory of quadratic number fields in general, and (2) randomness. These two concepts, representing algebra (the science of order and structure) and probability theory (the science of disorder), are the endpoints of a long chain of relations/implications. The periodicity of the continued fraction of $\sqrt{2}$ (or any other quadratic irrational) means self-similarity. Self-similarity leads to independence (e.g., via Markov chains; here we refer to the well-known probabilistic concept), and independence ensures (nearly) perfect randomness. In particular, we prove some unexpected probabilistic results:

quadratic irrational $\Longrightarrow$ periodic continued fraction $\Longrightarrow$ self-similarity $\Longrightarrow$ independence (or independence via Markov chains) $\Longrightarrow$ randomness: central limit theorem and the law of the iterated logarithm.

This diagram may summarize the book in a nutshell.

The reason why we decided not to choose “randomness of $\sqrt{2}$” as the subtitle is that it would perhaps mislead the reader. The reader would probably expect us to prove the apparent randomness of the digit distribution in the usual decimal expansion

$$\sqrt{2} = 1.414213562373095048801688724209698078569671875376948\ldots$$

Unfortunately, we cannot make any progress with this famous old problem; it remains open and hopeless (to read more about this and other related famous open problems the reader may jump ahead right now to Sect. 2.5: A Giant Leap in number theory). What we study instead is the “irrational rotation” by any
quadratic irrational, say, by $\sqrt{2}$. We study the global and local behavior of the irrational rotation from a probabilistic viewpoint; this explains the title of the book, “probabilistic diophantine approximation.”

Consider the linear sequence $n\alpha$, $n = 1, 2, 3, \ldots$: it is perfectly regular, it is an infinite arithmetic progression. Even if we take it modulo one, and $\alpha$ is an arbitrary (but fixed) irrational, the sequence $n\alpha$ (mod 1), called irrational rotation, still features a lot of regularities. For example, (1) we have infinitely many Bounded Error Intervals, (2) we have infinitely many Bounded Error Initial Segments, (3) every initial segment has at most three different “gaps,” and (4) there is an extremely strong restriction on the induced permutations. These are all strong “anti-randomness” type regularity properties of the irrational rotation $n\alpha$ (mod 1), $n = 1, 2, 3, \ldots$ (properties (1)–(4) will be explained in depth in Sect. 1.1). These regularities show that the irrational rotation is highly non-random in many respects. This is why the irrational rotation (with an underlying nested structure) is also called a quasi-periodic sequence.

We also know from number theory that the key to understanding the irrational rotation $n\alpha$ (mod 1), $n = 1, 2, 3, \ldots$, is to know the continued fraction for $\alpha$. The quadratic irrationals have the most regular continued fractions: the class of quadratic irrationals is characterized by the property of an (ultimately) periodic continued fraction, for example,

$$\sqrt{2} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cdots}} = [1; 2, 2, 2, \ldots] = [1; \overline{2}].$$
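The periodicity of the continued fraction of a quadratic irrational can be observed numerically. The following is a minimal sketch, not taken from the book: it assumes the classical exact integer recurrence for the partial quotients of $\sqrt{d}$, and the function name `sqrt_continued_fraction` and its parameters are illustrative choices.

```python
from math import isqrt


def sqrt_continued_fraction(d, terms):
    """Return the first `terms` partial quotients of sqrt(d), d a non-square.

    Works with exact integers only: the complete quotient at each step is
    (sqrt(d) + m) / q, and a is its integer part.
    """
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, q, a = 0, 1, a0
    digits = [a0]
    while len(digits) < terms:
        m = q * a - m            # next numerator shift
        q = (d - m * m) // q     # next denominator (division is exact)
        a = (a0 + m) // q        # next partial quotient
        digits.append(a)
    return digits


print(sqrt_continued_fraction(2, 8))  # [1, 2, 2, 2, 2, 2, 2, 2]
print(sqrt_continued_fraction(7, 9))  # [2, 1, 1, 1, 4, 1, 1, 1, 4]
```

The first call reproduces $[1; \overline{2}]$; the second shows a longer period, $\sqrt{7} = [2; \overline{1, 1, 1, 4}]$.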
Despite these regularities of the irrational rotation, our first main result exhibits “full-blown randomness.” For example, how much time does the irrational rotation $n\alpha$ (mod 1), $n = 1, 2, 3, \ldots$, spend in the first half $[0, 1/2)$ of the unit interval $[0, 1)$? Well, we prove a central limit theorem for every quadratic irrational $\alpha$ (e.g., $\alpha = \sqrt{2}$). More precisely, let $\alpha$ be an arbitrary irrational real root of a quadratic equation with integer coefficients, say, $\alpha = \sqrt{2}$. Given any rational number $0 < x < 1$ (say, $x = 1/2$) and any positive integer $n$, we count the number of elements of the sequence $\alpha, 2\alpha, 3\alpha, \ldots, n\alpha$ modulo 1 that fall into the subinterval $[0, x]$. We prove that this counting number satisfies a central limit theorem in the following sense. First, we subtract the “expected number” $nx$ from the counting number and study the typical fluctuation of this difference as $n$ runs in a long interval $1 \le n \le N$. Depending on $\alpha$ and $x$, we may need an extra additive correction of a constant times the logarithm of $N$; furthermore, what we always need is a multiplicative correction: division by (another) constant times the square root of the logarithm of $N$. If $N$ is large, the distribution of this renormalized counting number, as $n$ runs in $1 \le n \le N$, is very close to the standard normal distribution (bell-shaped curve), and the corresponding error term tends to zero as $N$ tends to infinity. This is one of the main results of the book (see Theorem 1.1).

The proof is rather complicated and long; it has many interesting detours and by-products. For example, the exact determination of the
key constant factors (in the additive and multiplicative norming), which depend on $\alpha$ and $x$, requires surprisingly deep algebraic tools such as Dedekind sums, the class number of quadratic fields, and generalized class number formulas.

Perhaps the reader is wondering: why are the quadratic irrationals (like $\sqrt{2}$) special and worth spending hundreds of pages on? The answer is that the quadratic irrationals play a central role in diophantine approximation for several reasons. They are the “most anti-rational real numbers” (officially called badly approximable numbers), and at the same time they represent the most uniformly distributed irrational rotations. A third reason is Pell’s equation $x^2 - dy^2 = \pm 1$ ($d \ge 2$ is square-free), which is of course closely related to $\sqrt{d}$. Also, and this is the message of our book, the best way to understand the local and global randomness of the irrational rotation is to focus on the class of quadratic irrationals. This class gives the most elegant and striking results with the simplest proofs. Some of these results extend to almost every real number, some of them do not. We will elaborate on each one of these issues later.

The quadratic irrational rotation demonstrates the coexistence of order and randomness; a novelty here is the much smaller norming factor $\sqrt{\log n}$ (instead of the usual $\sqrt{n}$). The $\sqrt{\log n}$ comes from the fact that the underlying problem is about “generalized digit sums” with the surprising twist that the base of the number system is an irrational number (namely, the fundamental unit, e.g., it is $1 + \sqrt{2}$ for $\alpha = \sqrt{2}$). Also $\sqrt{\log n}$ represents the minimum; it corresponds to the most uniformly distributed irrational rotations.

Our second main subject is motivated by the classical Pell’s equation.
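As a small illustration of how Pell’s equation and the fundamental unit $1 + \sqrt{2}$ fit together, here is a hedged sketch (the function name `pell_solutions` is an illustrative choice, not notation from the book). It relies on the standard fact that multiplying $x + y\sqrt{2}$ by $1 + \sqrt{2}$ gives $(x + 2y) + (x + y)\sqrt{2}$ and flips the sign of $x^2 - 2y^2$.

```python
def pell_solutions(count):
    """Yield `count` positive integer solutions (x, y) of x^2 - 2y^2 = +-1.

    Starts from 1 + sqrt(2) and repeatedly multiplies by the fundamental
    unit 1 + sqrt(2), which maps (x, y) to (x + 2y, x + y).
    """
    x, y = 1, 1
    for _ in range(count):
        yield x, y
        x, y = x + 2 * y, x + y


for x, y in pell_solutions(6):
    # The last column alternates between -1 and +1.
    print(x, y, x * x - 2 * y * y)
```

The ratios $x/y$ produced this way (1/1, 3/2, 7/5, 17/12, ...) are the continued fraction convergents of $\sqrt{2}$.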
Finding the integral solutions of (say) $x^2 - 2y^2 = \pm 1$ means counting lattice points in a long and narrow tilted hyperbolic region that we call a “hyperbolic needle.” Of course, we basically know everything about Pell’s equation (this is why Pell’s equation is included in every undergraduate number theory course), but what happens if we translate the “hyperbolic needle”? What is the asymptotic number of lattice points inside (note that the area is infinite)? Well, for a typical translated copy of the “hyperbolic needle,” which corresponds to an “inhomogeneous Pell inequality,” we prove a “law of the iterated logarithm,” which describes the asymptotic number of integral solutions in a strikingly precise way. In other words, the classical Circle Problem of Gauss is wide open, but here we can solve an analogous Hyperbola Problem. This result is a good illustration of the full power of the probabilistic viewpoint in number theory.

In general, consider the inhomogeneous diophantine inequality

$$\|n\alpha - \beta\| < \frac{c}{n}, \tag{0.1}$$
where $\alpha$ is an arbitrary irrational, $\beta$ and $c > 0$ are arbitrary real numbers, and $n$ is the variable. An old result of Kronecker states that inequality (0.1) has infinitely many integral solutions $n$ if $c = 3$; this is how Kronecker proved that the irrational
rotation $n\alpha$ (mod 1) is dense in the unit interval. What can we say about the number of solutions $n$ of inequality (0.1)? Consider the special case $\alpha = \sqrt{2}$ of (0.1):

$$\|n\sqrt{2} - \beta\| < \frac{c}{n}, \tag{0.2}$$
and let $F(\sqrt{2}; \beta; c; N)$ denote the number of integral solutions $n$ of inequality (0.2) satisfying $1 \le n \le N$; this counting function is about the local behavior of the irrational rotation $n\sqrt{2}$ (mod 1). We can describe the true order of $F(\sqrt{2}; \beta; c; N)$, as $N \to \infty$, in an extremely precise way for almost every $\beta$. We prove that the number of solutions $F(\sqrt{2}; \beta; c; e^n)$ of (0.2) oscillates between the sharp bounds ($\varepsilon > 0$)

$$2cn - \sigma\sqrt{n}\,\sqrt{(2 + \varepsilon)\log\log n} < F(\sqrt{2}; \beta; c; e^n) < 2cn + \sigma\sqrt{n}\,\sqrt{(2 + \varepsilon)\log\log n} \tag{0.3}$$

as $n \to \infty$ for almost every $\beta$; see Theorem 5.6 in Part II of the book. Note that $\sigma = \sigma(\sqrt{2}, c) > 0$ is a positive constant, and (0.3) fails with $2 - \varepsilon$ instead of $2 + \varepsilon$. (The reason why in (0.3) we switched from $N$ to the exponentially sparse sequence $e^n$ is that the counting function $F(\sqrt{2}; \beta; c; N)$ is slowly changing in the sense that, as $N$ runs in $e^n < N < e^{n+1}$, $F(\sqrt{2}; \beta; c; N)$ makes only an additive constant change.)

Observe that inequality (0.2) is (basically) equivalent to the inhomogeneous Pell inequality

$$-c' \le (x + \beta)^2 - 2y^2 \le c', \tag{0.4}$$
where $c' = 2\sqrt{2}\,c$. Notice that inequality (0.4) determines a long and narrow tilted hyperbola region (“hyperbolic needle”). The message of (0.3) is, roughly speaking, that for almost all translations, the number of lattice points in long and narrow hyperbola segments of any fixed quadratic irrational slope equals the area plus an error term which is never much larger than the square root of the area. Notice that (0.3) is a perfect analog of Khinchin’s law of the iterated logarithm in probability theory (describing the maximum fluctuations of the digit sums of a typical real number $\beta$; the factor $\log\log n$ in (0.3) explains the name “iterated logarithm”).

We also have an analogous central limit theorem: the renormalized counting function

$$\frac{F(\sqrt{2}; \beta; c; e^n) - 2cn}{\sigma\sqrt{n}}, \qquad 0 \le \beta < 1,$$

has a standard normal limit distribution with error term $O(n^{-1/4}(\log n)^3)$ as $n \to \infty$ [$\sigma = \sigma(\sqrt{2}, c) > 0$ is the same positive constant as in (0.3)].
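The counting function $F(\sqrt{2}; \beta; c; N)$ of inequality (0.2) can be explored by brute force. The sketch below is only an illustration under stated assumptions: it uses double-precision floating point (adequate for moderate $N$, not for a rigorous count), the helper names `dist_to_nearest_int` and `count_solutions` are hypothetical, and the sample values of $\beta$ are arbitrary and need not be “typical” in the almost-every sense of (0.3). It prints the count next to the main term $2c \log N$, which corresponds to $2cn$ when $N = e^n$.

```python
from math import log, sqrt


def dist_to_nearest_int(t):
    """Distance ||t|| from the real number t to the nearest integer."""
    return abs(t - round(t))


def count_solutions(beta, c, N, alpha=sqrt(2)):
    """Brute-force count of n in 1..N with ||n*alpha - beta|| < c/n."""
    return sum(
        1
        for n in range(1, N + 1)
        if dist_to_nearest_int(n * alpha - beta) < c / n
    )


c = 1.0
N = 50_000
for beta in (0.1, 0.3, 0.7):  # arbitrary sample translations
    # Compare the observed count with the main term 2c log N.
    print(beta, count_solutions(beta, c, N), 2 * c * log(N))
```

Because the count changes slowly in $N$, large ranges are needed before the fluctuation term in (0.3) becomes visible; this sketch only shows how the quantities are defined.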
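Formally,

$$\max_{\lambda}\left|\,\mathrm{measure}\left\{\beta \in [0, 1) : \frac{F(\sqrt{2}; \beta; c; e^n) - 2cn}{\sigma\sqrt{n}} \le \lambda\right\} - \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\lambda} e^{-u^2/2}\,du\,\right| = O\!\left(n^{-1/4}(\log n)^3\right), \tag{0.5}$$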
where the maximum is taken over all $-\infty < \lambda < \infty$ (and of course measure means the one-dimensional Lebesgue measure). The proofs of the innocent-looking results (0.3) and (0.5) are quite difficult (in spite of the fact that most of the arguments are “elementary”). Note that here “independence” comes from a good approximation by modified Rademacher functions.

The book is basically “lattice point counting” in disguise. This explains the subtitle, “randomness in lattice point counting.” The main results are proved by the same scheme: we represent a natural lattice point counting function in the form

$$X_1 + X_2 + X_3 + \cdots + \text{negligible},$$

where $X_1, X_2, X_3, \ldots$ are independent random variables. This way we can directly apply some classical results of probability theory (such as the central limit theorem and the law of the iterated logarithm). We have the following questions: (a) how to construct the independent random variables $X_1, X_2, X_3, \ldots$, (b) how to compute the expectation, and finally (c) how to compute the variance. These are surprisingly difficult questions.

Of course (0.3) and (0.5) extend to all quadratic irrationals. They also extend to some other special numbers for which we know the continued fraction expansion (e.g., $e$, $e^2$, $\sqrt{e}$). Some of the main results about quadratic irrationals (e.g., Theorems 1.1 and 1.2) do not extend to almost every $\alpha$. The reason is that the continued fraction digits (officially called partial quotients) of a typical real number $\alpha$ exhibit a very irregular behavior (see Sect. 6.10). Some other results, including (0.3) and (0.5), do have an analog for almost every $\alpha$. There is, however, a difference: the norming factor $\sqrt{n}$ is replaced by $\sqrt{n \log n}$, and also the error term is much weaker (see Sect. 6.10).

The kind of “randomness” we prove in the book requires some knowledge about the continued fraction expansion of the real number $\alpha$. This is why the best way to demonstrate this “randomness” is to study the class of quadratic irrationals.
Unfortunately, we know very little about the continued fraction of algebraic numbers of degree $\ge 3$. This explains why we cannot prove anything about (say) the “randomness of $\sqrt[3]{2}$”; this is why we can prove strong results about the “randomness of $e$,” and can prove nothing about the “randomness of $\pi$.”
Besides “randomness,” the other main subject of the book is “Area Principle versus superirregularity” (see Part II, starting with Sect. 5.1).

The traditional meaning of probabilistic diophantine approximation is that it is a collection of results best illustrated by the following classical 0–1 law of Khinchin. If $\psi(n) > 0$ is a nonincreasing sequence, then the diophantine inequality $n\|n\alpha\| < \psi(n)$ has infinitely many integral solutions $n$ for almost every $\alpha$ if $\sum_{n=1}^{\infty} \psi(n) = \infty$; on the other hand, if $\sum_{n=1}^{\infty} \psi(n) < \infty$ then $n\|n\alpha\| < \psi(n)$ has only finitely many integral solutions $n$ for almost every $\alpha$. The subtitle of our book (randomness in lattice point counting) emphasizes the fact that what we do here is very different. We develop a new direction of research on the borderline of probability theory and number theory (including algebraic number theory). We switch the focus from almost every $\alpha$ to special numbers (like quadratic irrationals and $e$), and switch from 0–1 laws to more sophisticated probabilistic results such as the central limit theorem and the law of the iterated logarithm.

One of the challenges we faced in writing this book was that the experts in probability theory tend to know very little algebraic number theory and vice versa: the experts in algebraic number theory do not really care much about probability theory. These two groups, “algebraists” and “probabilists,” are in fact very different kinds of mathematicians with totally different tastes and different intuitions. It is hard to find a middle ground satisfying both groups, not to mention the readers who know little probability theory and little algebraic number theory. This forced us to include a lot of examples and “detours.”

The book grew from five partly-survey-partly-research papers of ours written between 1991 and 2000 (see [Be1, Be2, Be3, Be4, Be5]) and four more recent papers starting from 2010 (see [Be7, Be8, Be9, Be10]).
In a nutshell, our work is a far-reaching extension of some classical results of Hardy–Littlewood and Ostrowski from the period of 1914–1920. In particular, we added the unifying “probabilistic viewpoint,” which is completely missing from the old papers. It is interesting to point out that for the generation of Hardy, number theory and probability sounded like a strange mismatch. Hardy once dismissively declared: “probability is not a notion of pure mathematics but of philosophy or physics” (Hardy made this statement before Kolmogorov’s axioms “legitimized” probability theory as a well-founded chapter in measure theory).

The main results of the book are Theorems 1.1, 1.2, 5.4, 5.6 (all about “randomness”) and the subject of “Area Principle versus superirregularity” (see, respectively, Proposition 1.18, Theorems 5.7 and 5.3, Sects. 5.4–5.10). Since the two parts of the book are quite independent, the reader may start reading Part II first. We would recommend that the reader start with Sects. 1.1, 1.2, 5.1, and 5.2. An alternative way is to start with Sect. 2.5 and then go to Sects. 1.1, 1.2, 5.1, and 5.2.

The book is more or less self-contained. It should be readable to everybody with some basic knowledge of mathematics (second-year graduate students and up) who is interested in number theory and probability theory.
A few words about the notation. We constantly use the (rather standard) notation $\{x\}$, $\|x\|$, $\lfloor x \rfloor$, $\lceil x \rceil$, which mean, in this order, the fractional part of a real number $x$, the distance of $x$ from the nearest integer, and the lower and upper integral parts of $x$ (for example, $x = \{x\} + \lfloor x \rfloor$ and $\|x\| = \min\{\{x\}, 1 - \{x\}\}$). A less well-known notation is

$$((x)) = \begin{cases} \{x\} - \frac{1}{2}, & \text{if } x \text{ is not an integer;} \\ 0, & \text{otherwise} \end{cases}$$

for the “sawtooth function,” which is used throughout Part I of the book starting from Sect. 2.1.

Throughout, the letter $c$ (or $c_0, c_1, c_2, \ldots$) denotes a generic constant, i.e., a positive constant that we could but do not care to determine. This constant may be absolute, or may depend upon the parameters involved in the theorem in question; it will not generally be the same constant. The well-known $O$-notation involves constants implicitly. It will generally be obvious on what, if any, parameters these constants depend.

The natural (base $e$) logarithm is denoted by $\log$ (instead of $\ln$, which we do not use in the book). We use $\log_2$ for the iterated logarithm, so $\log_2 x = \log\log x$; we use $\log x / \log 2$ to denote the binary (i.e., base 2) logarithm of $x$.

We are sure there are many errors in this first version of the book. We welcome any corrections, suggestions, and comments.

Piscataway, NJ, USA
March 2014
József Beck
Contents

Part I Global Aspects: Randomness of the Irrational Rotation

1 What Is “Probabilistic” Diophantine Approximation?
  1.1 The Giant Leap in Uniform Distribution
    1.1.1 From Quasi-Periodicity to Randomness
    1.1.2 Summary in a Nutshell
  1.2 Randomness in Lattice Point Counting
    1.2.1 A Key Tool: Ostrowski’s Explicit Formula
    1.2.2 Counting Lattice Points in General
  1.3 First Warm-Up: Van der Corput Sequence—When Independence Is Given
    1.3.1 Digit Sums and Generalized Digit Sums
    1.3.2 A Decomposition Trick
    1.3.3 Concluding Remark
  1.4 Second Warm-Up: Markov Chains and the Area Principle
    1.4.1 Statistical Independence and Markov Chains
    1.4.2 Long Runs of Heads
  1.5 The Golden Ratio and Markov Chains: The Simplest Case of Theorem 1.2
    1.5.1 Constructing the Underlying (Homogeneous) Markov Chain
    1.5.2 How to Approximate with a Sum of Independent Random Variables
    1.5.3 Solving the Parity Problem
    1.5.4 Concluding Remarks

2 Expectation, and Its Connection with Quadratic Fields
  2.1 Computing the Expectation in General (I)
    2.1.1 An Important Detour: How to Guess Proposition 2.1?
    2.1.2 Quadratic Fields in a Nutshell
    2.1.3 Another Detour: Formulating a “Positivity Conjecture”
    2.1.4 Proposition 2.1 and Some Works of Hardy and Littlewood
  2.2 Computing the Expectation in General (II)
    2.2.1 The Expectation in Theorem 1.1
    2.2.2 An Analog of Proposition 2.1
    2.2.3 Periodicity in Proposition 2.9
  2.3 Fourier Series and a Problem of Hardy and Littlewood (I)
    2.3.1 Badly Approximable Numbers
    2.3.2 The Hardy–Littlewood Series
    2.3.3 Doubling and Halving in Continued Fractions
    2.3.4 A Geometric Interpretation
  2.4 Fourier Series and a Problem of Hardy and Littlewood (II)
  2.5 A Detour: The Giant Leap in Number Theory
    2.5.1 Looking at the “Big Picture”
  2.6 Connection with Quadratic Fields (I)
    2.6.1 A Detour: Another Class Number Formula
    2.6.2 How to Compute the Class Number in General: The Complex Case

3 Variance, and Its Connection with Quadratic Fields
  3.1 Computing the Variance
    3.1.1 Guiding Intuition
    3.1.2 An Alternative Form of the Guiding Intuition
  3.2 Connection with Quadratic Fields (II)
    3.2.1 A Convenient Special Case: When the Class Number Is One
    3.2.2 The Class Number for Real Quadratic Fields: Illustrations
    3.2.3 The Dedekind’s Zeta Function at s=2: A Formula Involving Characters
    3.2.4 An Alternative Formula Due to Siegel: Proposition 3.7
  3.3 Connection with Quadratic Fields (III)
    3.3.1 The General Case: Computing the Variance for an Arbitrary Quadratic Irrational
    3.3.2 Computing the Variance in Theorem 1.1: A Special Case
    3.3.3 Computing the Variance in Theorem 1.1: The General Case
    3.3.4 The Case of Symmetric Intervals

4 Proving Randomness
  4.1 Completing the Proof of Theorem 1.2
    4.1.1 Renewal Versus Self-Similarity
    4.1.2 Ergodic Markov Chains: Exponentially Fast Convergence to the Stationary Distribution
  4.2 How to Use Lemma 4.2 to Find the Analog of (1.223) in General?
  4.3 Completing the Proof of Theorem 1.1
  4.4 The Fourier Series Approach
    4.4.1 Guiding Intuition
    4.4.2 Constructing a Sum $\tilde{X}_1 + \tilde{X}_2 + \tilde{X}_3 + \cdots$ of Almost Independent Random Variables
    4.4.3 Defining the Truly Independent Random Variables $X_1, X_2, X_3, \ldots$
  4.5 More Results in a Nutshell

Part II Local Aspects: Inhomogeneous Pell Inequalities

5 Pell’s Equation, Superirregularity and Randomness
  5.1 From Pell Equation to Superirregularity
    5.1.1 Pell’s Equation: Bounded Fluctuations
    5.1.2 The Area Principle
    5.1.3 The Giant Leap in the Inhomogeneous Case: Extra Large Fluctuations
  5.2 Randomness and the Area Principle
  5.3 Proving Theorem 5.3 and the Lemmas
  5.4 The Riesz Product
    5.4.1 The Method of Nested Intervals vs. the Riesz Product
    5.4.2 The “Rectangle Property”, and a Key Result: Theorem 5.11
  5.5 Starting the Proof of Theorem 5.11 Using Riesz Product
    5.5.1 What are the Trivial Errors and How to Synchronize Them
    5.5.2 Geometric Ideas
    5.5.3 An Important Consequence of the “Rectangle Property”
    5.5.4 Choosing a Short Vertical Translation
    5.5.5 Summarizing the Vague Geometric Intuition
  5.6 More on the Riesz Product
    5.6.1 Applying Super-Orthogonality
    5.6.2 Single Term Domination: Clarifying the Technical Details
    5.6.3 A Combination of the Rectangle Property and the Pigeonhole Principle
  5.7 Completing the Case Study
    5.7.1 Verifying (5.152)
    5.7.2 A Combination of the Rectangle Property and the Pigeonhole Principle
    5.7.3 A Combination of the Rectangle Property and the Pigeonhole Principle
    5.7.4 A Combination of the Rectangle Property and the Pigeonhole Principle
  5.8 Completing the Proof of Theorem 5.11
  5.9 Yet Another Generalization of Theorem 5.3
    5.9.1 Step One
    5.9.2 Step Two: Small “Digit” $a_i$ Implies “Local” Rectangle Property
    5.9.3 Step Three: Employing the Riesz Product Technique
    5.9.4 Step Four: Constructing a Cantor Set
  5.10 General Point Sets: Theorem 5.19
    5.10.1 Statistical Version of the Rectangle Property: An Average Argument
    5.10.2 Consequences of Inequality (5.327)
  5.11 The Area Principle in General

6 More on Randomness
  6.1 Starting the Proof of Theorem 5.4: Blocks-and-Gaps Decomposition
  6.2 Completing the Blocks-and-Gaps Decomposition
  6.3 Estimating the Variance
  6.4 Applying Probability Theory
    6.4.1 Central Limit Theorem with Explicit Error Term
  6.5 Conclusion of the Proof of Theorem 5.4
  6.6 Proving the Three Lemmas: Part One
    6.6.1 Properties of the Auxiliary Functions in (6.222) and (6.223)
    6.6.2 Deduction of Lemma 6.6 from Lemmas 6.4 and 6.5
  6.7 Proving the Three Lemmas: Part Two
  6.8 Starting the Proof of Theorem 5.6
  6.9 Completing the Proof of Theorem 5.6
  6.10 More Results in a Nutshell
    6.10.1 Combining the Logarithmic Density with the Central Limit Theorem

References
Index
Part I
Global Aspects: Randomness of the Irrational Rotation
Chapter 1
What Is “Probabilistic” Diophantine Approximation?
1.1 The Giant Leap in Uniform Distribution

We discuss some surprising new developments concerning $\sqrt{2}$, and in general the class of quadratic irrationals. We use $\sqrt{2}$ as the representative for the whole class. These results provide some rigorous evidence for a mysterious general phenomenon that we call the Giant Leap. In a nutshell, it is about the unexpected randomness of explicit sequences (Giant Leap to full-blown randomness). The reader may jump ahead to Sect. 2.5 for a detailed discussion of this issue.

The history of $\sqrt{2}$ is quite remarkable. Every mathematician knows that the discovery of irrational numbers by the Pythagorean school (namely, that numbers like $\sqrt{2}$ and the golden ratio $(1+\sqrt{5})/2$ are irrational; the Ancient Greeks called them “incommensurable”) caused a great deal of shock. The Pythagoreans looked upon integers as the essence of all things in the universe. When they realized that the integers did not suffice to measure even a simple geometric object such as the length of the diagonal of a unit square, they must have felt cheated by the gods.

However, a modern student (say, a good undergraduate student) has a hard time understanding the magnitude of this philosophical crisis 2,500 years ago. The modern student remembers the well-known theorem from high school that a real number is rational if and only if its decimal expansion (an infinite series(!)) is eventually periodic. Now it is very easy to construct decimal expansions which are obviously not periodic. For example, take a decimal expansion which is increasingly dominated by zeros:
\[
\alpha = 0.01001000100001000001000000100000001\ldots \tag{1.1}
\]
It is clearly nonperiodic, since the length of the blocks of consecutive 0s (separated by 1s) tends to infinity; of course, there are infinitely many similar examples.

The Ancient Greeks had a totally different way of discovering irrational numbers. Instead of studying infinite series (the Ancient Greeks knew little calculus), they were focusing on intuitive geometry, thoroughly studying regular polygons (equilateral triangle, square, regular pentagon, etc.) and also the regular polyhedra (regular tetrahedron, cube, etc.). By using Pythagoras’ theorem, they were able to express many natural geometric distances, say, the height of the equilateral triangle, the diagonal of the square, the diagonal of the regular pentagon, the height of the regular tetrahedron, and the space diagonal of the cube in terms of square roots (i.e., quadratic irrationals). If each side is one, we obtain the numbers $\sqrt{3}/2$, $\sqrt{2}$, $(1+\sqrt{5})/2$, $\sqrt{2/3}$, and $\sqrt{3}$ in this order.

The Ancient Greeks called two (positive) distances $d_0$ and $d_1$ commensurable (i.e., their ratio is rational) if they can be both measured with the same unit $u$ so that $d_0 = m \cdot u$ and $d_1 = n \cdot u$, where $m$ and $n$ are natural numbers. Before discovering irrational numbers, the Ancient Greeks probably felt intuitively that the process of finding such a common unit (basically the Euclidean algorithm) would always terminate. It was a shock, therefore, when in the fifth century B.C. a member of the Pythagorean school, probably Hippasus of Metapont, discovered examples of incommensurable (i.e., irrational) geometric distances. The first example was most likely the ratio diagonal/side in the regular pentagon (i.e., the golden ratio $(1+\sqrt{5})/2$), due to the fact that the pentagram (regular pentagon with the five diagonals) was the official symbol of the Pythagorean brotherhood. By iterating the pentagram for the inscribed pentagon, we obtain an infinity of smaller and smaller similar pentagons. Converting this self-similar picture into a continued fraction, we obtain
\[
\frac{\text{diagonal}}{\text{side}} = \frac{1+\sqrt{5}}{2} = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}} = [1; 1, 1, 1, 1, 1, \ldots] = [1; \overline{1}], \tag{1.2}
\]
that is, we have an example where the Euclidean algorithm never terminates.

We emphasize the difference between the artificially constructed irrational number in (1.1) and the quadratic numbers that the Ancient Greeks proved to be irrational. The quadratic numbers represent genuinely interesting natural geometric distances; they deserve to be called special numbers. The real number in (1.1), on the other hand, is just an artificial counterexample.

Equation (1.2) gives the continued fraction for the golden ratio. The irrationality of $\sqrt{2}$ and $\sqrt{3}$ was probably proved by the Greeks using analogous geometric considerations, by studying self-similar pictures. A self-similar picture can be converted into a recurrence relation, for example,
\[
\sqrt{2} = 1 + (\sqrt{2} - 1) = 1 + \frac{1}{\sqrt{2} + 1} = 1 + \frac{1}{2 + (\sqrt{2} - 1)}, \tag{1.3}
\]
and the recurrence relation in (1.3) leads to the familiar continued fraction for $\sqrt{2}$:
\[
\sqrt{2} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + (\sqrt{2} - 1)}} = [1; 2, 2, 2, \ldots] = [1; \overline{2}]. \tag{1.4}
\]
Now we jump ahead in time a couple of thousand years to Lagrange’s famous theorem, which generalizes (1.2) and (1.4) as follows. A real number $\alpha$ is said to be a quadratic irrational if it can be written in the form $\alpha = (a + \sqrt{d})/b$, where $a$, $b \ne 0$, $d \ge 2$ are integers and $d$ is not a perfect square. An equivalent definition is that $\alpha$ is a root of a quadratic equation $Ax^2 + Bx + C = 0$ with integral coefficients such that the discriminant $B^2 - 4AC \ge 2$ is not a perfect square.

Lagrange’s Theorem. The continued fraction which represents a quadratic irrational is always (ultimately) periodic.

For example,
\[
\frac{24 - \sqrt{15}}{17} = [1; 5, \overline{2, 3}]
\]
(the bar indicates the period). We also have the

Converse of Lagrange’s Theorem. If the continued fraction of $\alpha$ is (ultimately) periodic then $\alpha$ is a quadratic irrational.

Continued fractions play a key role in the theory of the Pell equation $x^2 - dy^2 = 1$. We know that the Pell equation has infinitely many integral solutions if the integer $d \ge 2$ is not a perfect square. The well-known cyclic structure of all integral solutions is a by-product of the periodicity of the continued fraction of $\sqrt{d}$. It is also well known how to read out the least solution from the period of the continued fraction of $\sqrt{d}$. As an illustration, take $d = 29$, and consider
\[
\sqrt{29} = [5; 2, 1, 1, 2, 10, 2, 1, 1, 2, 10, 2, 1, 1, 2, 10, \ldots] = [5; \overline{2, 1, 1, 2, 10}],
\]
where the bar again indicates the period. The length of the period is 5, an odd number, implying that the numerator and the denominator of the fifth convergent
\[
[5; 2, 1, 1, 2] = \frac{70}{13}
\]
give the least positive solution $x = 70$, $y = 13$ (i.e., $x = x_0 > 0$, $y = y_0 > 0$ for which $y_0$ is least) of the Pell equation $x^2 - 29y^2 = -1$ with “$-1$” instead of “$+1$.” In order to get the least solution of $x^2 - 29y^2 = +1$ we need the tenth convergent (i.e., we repeat the period)
\[
[5; 2, 1, 1, 2, 10, 2, 1, 1, 2] = \frac{9801}{1820},
\]
and the least solution is the pair $x = 9801$ and $y = 1820$.

Sometimes the least solution is huge. A striking example is the Pell equation $x^2 - 61y^2 = 1$ for which the least solution is $x = 1{,}766{,}319{,}049$ and $y = 226{,}153{,}980$; another one is $x^2 - 109y^2 = 1$ for which the least solution is $x = 158{,}070{,}671{,}986{,}249$ and $y = 15{,}140{,}424{,}455{,}100$. Roughly speaking, the length of the period of $\sqrt{d}$ describes the logarithm of the least solution of the Pell equation (for example, the length of the period of $\sqrt{61}$ is 11).
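The recipe above (expand $\sqrt{d}$, stop just before the end of the period, repeat the period once if its length is odd) can be tried out numerically. The following is a minimal sketch, not taken from the text: the function names `sqrt_cf`, `convergent` and `least_pell_solutions` are illustrative, and the code relies on the standard facts that the complete quotients of $\sqrt{d}$ have the form $(m + \sqrt{d})/q$ with integers $m$, $q$, and that the period ends at the first partial quotient equal to $2a_0$.

```python
from math import isqrt

def sqrt_cf(d):
    """Return (a0, period) such that sqrt(d) = [a0; period, period, ...]."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, q, a = 0, 1, a0
    period = []
    # Integer recurrence for the complete quotients (m + sqrt(d)) / q;
    # the period of sqrt(d) ends with the partial quotient 2 * a0.
    while a != 2 * a0:
        m = a * q - m
        q = (d - m * m) // q
        a = (a0 + m) // q
        period.append(a)
    return a0, period

def convergent(digits):
    """Evaluate the finite continued fraction [digits[0]; digits[1], ...] as (p, q)."""
    p_prev, q_prev, p, q = 1, 0, digits[0], 1
    for a in digits[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
    return p, q

def least_pell_solutions(d):
    """Least positive solutions of x^2 - d*y^2 = -1 (if solvable) and = +1."""
    a0, period = sqrt_cf(d)
    first = convergent([a0] + period[:-1])
    if len(period) % 2 == 0:
        return {+1: first}
    # Odd period: the first convergent solves the -1 equation,
    # and repeating the period once gives the +1 equation.
    second = convergent([a0] + period + period[:-1])
    return {-1: first, +1: second}

print(sqrt_cf(29))
print(least_pell_solutions(29))
print(least_pell_solutions(61)[+1])
```

For $d = 61$ the period has odd length 11, so the least solution of the $+1$ equation only appears after two periods, which is one way to see why it is so large.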
The remarkable connection between continued fractions and higher arithmetic, especially quadratic fields, is a well-known story, and it can be found in many books on number theory (see, e.g., [Ha-Wr]). Here we focus on a completely different, hardly known angle: the equally fascinating connection between quadratic irrationals and randomness. As a first illustration, we formulate and prove a central limit theorem related to the uniform distribution of the sequence $n\sqrt{2}$ (mod 1), $n = 1, 2, 3, \ldots$.

If $\alpha$ is rational then the sequence $n\alpha$ (mod 1), $n = 1, 2, 3, \ldots$, is clearly periodic. On the other hand, if $\alpha$ is irrational, then the fractional parts $0 < \{n\alpha\} < 1$, $n = 1, 2, 3, \ldots$, represent distinct points in the unit interval $(0, 1)$. The sequence $n\alpha$ (mod 1), $n = 1, 2, 3, \ldots$, is often called the irrational rotation, due to the familiar representation of the unit torus as a circle of unit circumference. What can we say about the distribution of the irrational rotation? We are going to achieve a “Giant Leap” from the perfectly regular, periodic behavior to randomness in three steps.

[Figure: the points $\alpha$, $2\alpha$, $3\alpha$, ..., $12\alpha$ of the irrational rotation on a circle of unit circumference, where $0 = 1 = 2 = \ldots$ label the same point.]
Step One: The irrational rotation is dense in $(0, 1)$.
Step Two: The irrational rotation is uniformly distributed in $(0, 1)$.
Step Three: The quadratic irrational rotation, counted in any fixed interval $(0, x)$ with rational endpoint $x$, exhibits a central limit theorem.

Step Three is the new result here. Why quadratic irrationals? Well, the quadratic irrationals play a special role. Besides the deep connection with number theory (Pell’s equation is just one example), we have to point out that the quadratic irrationals are in the class of the “most anti-rational” real numbers (officially called badly approximable numbers; this will be explained below). This “anti-rational” property of the quadratic irrationals is a consequence of the boundedness of the continued fraction “digits” (= partial quotients); boundedness follows from periodicity.
How about the “anti-rational” property of other interesting numbers such as $e$, $\pi$, $\sqrt[3]{2}$, and $\log 2$? Well, $e$ is almost as “anti-rational” as (say) $\sqrt{2}$, but we know hardly anything about the “anti-rational” property of $\pi$ or $\sqrt[3]{2}$ or $\log 2$ (because we can prove very little about the continued fraction for $\pi$ or $\sqrt[3]{2}$ or $\log 2$).

For better understanding of Step Three, we have to briefly talk about Step One and Step Two, which are of course well-known classical results. Notice that Step One is just a one-dimensional special case of Kronecker’s famous general theorem that he proved in 1884: if $1, \alpha_1, \alpha_2, \ldots, \alpha_k$ are linearly independent over the rationals, then the $k$-dimensional sequence
\[
(n\alpha_1, n\alpha_2, \ldots, n\alpha_k), \quad n = 1, 2, 3, \ldots \ \text{modulo one}, \tag{1.5}
\]
is dense in the $k$-dimensional unit cube.

Density, important as it is, does not tell the whole truth about the global distribution of the irrational rotation: Step Two above claims the much stronger property of uniform distribution. We recall that an infinite sequence in the unit interval is said to be uniformly distributed if for any subinterval $I \subset (0, 1)$ the density of the elements of the sequence that fall into $I$ exists, and it equals the length $|I|$ of the subinterval. The uniform distribution of the irrational rotation was discovered and proved around 1910 (Bohl, Sierpinski, H. Weyl). For later purposes we include a short proof of this important result.

Short Proof of Uniform Distribution. It is based on a simple but very useful observation of Hecke that if subintervals have some special length then the counting error is bounded. First a notation: for any interval $I \subset (0, 1)$ write
\[
Z_\alpha(N; I) = \sum_{\substack{1 \le n \le N: \\ n\alpha \in I \ (\mathrm{mod}\ 1)}} 1, \tag{1.6}
\]
and call $Z_\alpha(N; I)$ the “counting function.” The counting function (1.6) is simply the partial sum of the interval-hitting sequence.

Lemma on Bounded Error Intervals. Let $I \subset (0, 1)$ be a half-open interval of length $|I| = \{k\alpha\}$ (fractional part) where $k \ge 1$ is some integer. Then for every $N$
\[
|Z_\alpha(N; I) - N|I|| < k. \tag{1.7}
\]

Proof. First let $k = 1$. Since each step $\alpha$ of the irrational rotation is the same as the length of interval $I$, the equality
\[
Z_\alpha(N; I) = \lfloor N\alpha \rfloor \ \text{or} \ \lceil N\alpha \rceil \tag{1.8}
\]
(meaning the lower or upper integral part) is obvious: every interval $[m, m + 1)$, where $m$ is an integer, contains exactly one multiple $n\alpha$ with $n\alpha \in I$ (mod 1).
If $k \ge 2$ then we simply decompose the sequence $n\alpha$, $n = 1, 2, 3, \ldots$, into $k$ arithmetic progressions of the same gap $k$ and apply (1.8) for each. This implies (1.7). □

Using this lemma we can quickly prove the uniform distribution of the irrational rotation. It clearly suffices to deal with intervals of the type $I = [0, \gamma)$ where $0 < \gamma < 1$ is arbitrary. Since the irrational rotation is dense (“Step One”), for every $\varepsilon > 0$ there exist natural numbers $m_1$ and $m_2$ such that
\[
\gamma - \varepsilon < \{m_1\alpha\} < \gamma < \{m_2\alpha\} < \gamma + \varepsilon. \tag{1.9}
\]
Write $I_1 = [0, \{m_1\alpha\})$ and $I_2 = [0, \{m_2\alpha\})$; then clearly
\[
Z_\alpha(N; I_1) \le Z_\alpha(N; I) \le Z_\alpha(N; I_2). \tag{1.10}
\]
By (1.7) for every $N$ and $j = 1, 2$
\[
|Z_\alpha(N; I_j) - N|I_j|| < m_j. \tag{1.11}
\]
Combining (1.9)–(1.11), for every $N$
\[
|Z_\alpha(N; I) - N|I|| < \max\{m_1, m_2\} + \varepsilon N. \tag{1.12}
\]
Dividing (1.12) by $N$ and taking $\varepsilon \to 0$, uniform distribution follows. □
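Hecke’s bound (1.7) is easy to probe numerically. The sketch below is an illustration under stated assumptions rather than part of the proof: it takes $\alpha = \sqrt{2}$ in ordinary floating-point arithmetic (adequate for the modest ranges used here, since the points $\{n\alpha\}$ stay well separated from the interval endpoints), and the helper names `frac` and `max_counting_error` are made up for this example.

```python
import math

def frac(x):
    """Fractional part of a real number."""
    return x - math.floor(x)

def max_counting_error(alpha, k, n_max, left=0.0):
    """Largest |Z_alpha(N; I) - N|I|| over 1 <= N <= n_max,
    for the half-open interval I = [left, left + {k*alpha}) modulo one."""
    length = frac(k * alpha)
    hits, worst = 0, 0.0
    for n in range(1, n_max + 1):
        if frac(n * alpha - left) < length:   # n*alpha lands in I (mod 1)
            hits += 1
        worst = max(worst, abs(hits - n * length))
    return worst

alpha = math.sqrt(2)
for k in range(1, 7):
    # Each printed error should stay below the corresponding k, as in (1.7).
    print(k, round(max_counting_error(alpha, k, 5000), 4))
```

The same function can be pointed at an interval whose length is not of the form $\{k\alpha\}$ by replacing `length`; in that case no bounded-error guarantee of this kind is available.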
Note that the usual proof is based on Weyl’s criterion [We], which is by far the most flexible approach: it easily generalizes to higher dimensions, gives nontrivial results for power sequences like $n^2\alpha$ and $n^3\alpha$, and for many other cases. Weyl’s criterion says that a sequence $x_n$, $n = 1, 2, 3, \ldots$, is uniformly distributed modulo one if and only if
\[
\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} e^{2\pi i k x_n} = \int_0^1 e^{2\pi i k x}\, dx = 0 \tag{1.13}
\]
for every integer $k \ne 0$ (notice that the case $k = 0$ is trivial).

There is a third proof, using continued fractions, which has the great advantage of providing a sharp estimate of the error term. This quantitative approach goes back to Ostrowski [Os] and to Hardy and Littlewood [Ha-Li1, Ha-Li2] (independent work around 1920). First we recall some well-known facts from the theory of continued fractions (see, e.g., the books [Kh2] or [La]). If
\[
\alpha = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots}} = [a_0; a_1, a_2, \ldots],
\]
then the $j$th convergent
\[
\frac{p_j}{q_j} = [a_0; a_1, \ldots, a_{j-1}]
\]
has the property that
\[
p_j q_{j-1} - p_{j-1} q_j = (-1)^j, \tag{1.14}
\]
implying that $p_j$ and $q_j$ are relatively prime; the denominators $q_j$ satisfy the recurrence formula $q_1 = 1$, $q_2 = a_1$, $q_j = a_{j-1} q_{j-1} + q_{j-2}$ for all $j \ge 3$, and finally,
\[
\left| \alpha - \frac{p_j}{q_j} \right| < \frac{1}{q_j q_{j+1}}, \tag{1.15'}
\]
implying the weaker inequality that will suffice for our purposes here:
\[
\left| \alpha - \frac{p_j}{q_j} \right| < \frac{1}{q_j^2}. \tag{1.15''}
\]
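The indexing of the convergents used here ($q_1 = 1$, $q_2 = a_1$) differs from the convention in some other texts, so a small numerical check may help. The following sketch is only an illustration: it uses $\alpha = \sqrt{2} = [1; 2, 2, 2, \ldots]$, evaluates $\sqrt{2}$ with Python's `decimal` module at 60 digits, and the names `convergents` and `satisfies_1_14_and_1_15` are hypothetical helpers.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # enough digits for the denominators used below

def convergents(digits):
    """Pairs (p_j, q_j), j = 1, 2, ..., with p_j/q_j = [a_0; a_1, ..., a_{j-1}],
    so that q_1 = 1 and q_2 = a_1 as in the text; digits = [a_0, a_1, ...]."""
    p_prev, q_prev = 1, 0          # formal values p_0 = 1, q_0 = 0
    p, q = digits[0], 1            # p_1 / q_1 = a_0 / 1
    result = [(p, q)]
    for a in digits[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.append((p, q))
    return result

def satisfies_1_14_and_1_15(digits, alpha):
    """Check (1.14) for j >= 2 and (1.15') for every j that has a successor."""
    conv = convergents(digits)
    for j in range(2, len(conv) + 1):
        (p_j, q_j), (p_i, q_i) = conv[j - 1], conv[j - 2]
        if p_j * q_i - p_i * q_j != (-1) ** j:
            return False
    for (p, q), (_, q_next) in zip(conv, conv[1:]):
        if not abs(alpha - Decimal(p) / Decimal(q)) < Decimal(1) / Decimal(q * q_next):
            return False
    return True

SQRT2_DIGITS = [1] + [2] * 20      # first partial quotients of sqrt(2)
SQRT2 = Decimal(2).sqrt()

print(convergents(SQRT2_DIGITS)[:6])
print(satisfies_1_14_and_1_15(SQRT2_DIGITS, SQRT2))
```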
Quantitative proof of uniform distribution. It is based on the following

Lemma on Bounded Error Initial Segments. The special initial segment $k\alpha$, $1 \le k \le q_n$, where $q_n$ is a convergent denominator, is particularly well distributed in the sense that, for every subinterval $I \subset (0, 1)$ and for every integer $n \ge 1$, the discrepancy of the counting function [see (1.6)] is bounded:
\[
|Z_\alpha(q_n; I) - q_n|I|| \le 3. \tag{1.16}
\]

Proof. By (1.15'')
\[
\left| k\alpha - \frac{k p_n}{q_n} \right| < \frac{k}{q_n^2} \le \frac{1}{q_n} \tag{1.17}
\]
for all $1 \le k \le q_n$. Since $p_n$ and $q_n$ are relatively prime, the sequence $k p_n / q_n$, $1 \le k \le q_n$ (mod 1) is just a permutation of the equidistant set $j/q_n$, $1 \le j \le q_n$, for which we have
\[
Z_{1/q_n}(q_n; I) = \lfloor q_n |I| \rfloor \ \text{or} \ \lceil q_n |I| \rceil. \tag{1.18}
\]
By (1.17)
\[
|Z_\alpha(q_n; I) - Z_{1/q_n}(q_n; I)| \le 2,
\]
and combining this with (1.18), the lemma follows. □

By using this lemma we can easily estimate the discrepancy
\[
|Z_\alpha(N; I) - N|I|| \tag{1.19}
\]
for a general $N$. Assume $q_{n-1} \le N < q_n$. In view of the recurrence relation $q_j = a_{j-1} q_{j-1} + q_{j-2}$ (for all $j \ge 3$) we can write $N$ in the form
\[
N = b_{n-1} q_{n-1} + b_{n-2} q_{n-2} + \ldots + b_1 q_1, \tag{1.20}
\]
where $1 \le b_{n-1} \le a_{n-1}$, $0 \le b_j \le a_j$ for $2 \le j \le n - 2$, and $0 \le b_1 \le a_1 - 1$. Combining the trivial identity
\[
Z_\alpha(m + q_j; I) - Z_\alpha(m; I) = Z_\alpha(q_j; I - m\alpha) \tag{1.21}
\]
with (1.16) and (1.20), we have
\[
|Z_\alpha(N; I) - N|I|| \le 3(b_{n-1} + b_{n-2} + \ldots + b_1),
\]
which, in view of $b_j \le a_j$, immediately implies the following

Discrepancy Lemma. For every integer $N \ge 1$ and every subinterval $I \subset (0, 1)$
\[
|Z_\alpha(N; I) - N|I|| \le 3(a_1 + a_2 + \ldots + a_{n-1}), \tag{1.22}
\]
where $q_{n-1} \le N < q_n$. In fact, we have the slightly sharper form
\[
|Z_\alpha(N; I) - N|I|| \le 3(a_1 + \ldots + a_{n-2} + N/q_{n-1}). \tag{1.23}
\]
□
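The representation (1.20) can be produced by a greedy algorithm: repeatedly subtract the largest possible multiple of the largest available convergent denominator. The sketch below assumes exactly that greedy rule (the text does not spell out how the $b_j$ are chosen), uses the indexing $q_1 = 1$, $q_2 = a_1$ from above, and the function name `ostrowski_digits` is illustrative.

```python
def ostrowski_digits(N, digits):
    """Greedy expansion N = b_1*q_1 + ... + b_{n-1}*q_{n-1} with q_{n-1} <= N < q_n.

    digits = [a_0, a_1, a_2, ...] are the partial quotients (a_0 is not used);
    returns (b, q) with b[i] = b_{i+1} and q[i] = q_{i+1}.
    """
    if N < 1:
        raise ValueError("N must be a positive integer")
    q = [1, digits[1]]                       # q_1 = 1, q_2 = a_1
    while q[-1] <= N:                        # extend until the last entry exceeds N
        q.append(digits[len(q)] * q[-1] + q[-2])
    q = q[:-1]                               # keep q_1, ..., q_{n-1}
    b, rest = [], N
    for q_j in reversed(q):                  # largest denominator first
        b.append(rest // q_j)
        rest -= b[-1] * q_j
    b.reverse()
    return b, q

sqrt2_digits = [1] + [2] * 30                # sqrt(2) = [1; 2, 2, 2, ...]
b, q = ostrowski_digits(100, sqrt2_digits)
print(q)                                     # convergent denominators up to 100
print(b)                                     # digits b_1, ..., b_{n-1}
print(3 * sum(b))                            # the resulting discrepancy bound
```

For $N = 100$ and $\alpha = \sqrt{2}$ this gives $100 = 70 + 29 + 1$, so only three blocks of convergent-denominator length are needed.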
To prove uniform distribution we have to check that
\[
\frac{Z_\alpha(N; I)}{N} \to |I| \tag{1.24}
\]
as $N \to \infty$ for all subintervals $I \subset (0, 1)$. From the recurrence formula $q_j = a_{j-1} q_{j-1} + q_{j-2}$ (for all $j \ge 3$) we have
\[
q_{2j+1} \ge (1 + a_1 a_2)(1 + a_3 a_4) \cdots (1 + a_{2j-1} a_{2j}), \tag{1.25}
\]
and trivially
\[
q_{2j+2} \ge a_{2j+1} q_{2j+1}. \tag{1.26}
\]
Using the general fact
\[
\frac{a_1 + \ldots + a_k}{(1 + a_1 a_2)(1 + a_3 a_4) \cdots (1 + a_{k-1} a_k)} \to 0 \tag{1.27}
\]
as $k \to \infty$ through the even integers, and combining (1.23), (1.25) and (1.26), we obtain (1.24) where $q_{n-1} \le N < q_n$. This completes the quantitative proof of uniform distribution. □

Let’s return to (1.22) in the Discrepancy Lemma: note without proof that the upper bound $(a_1 + a_2 + \ldots + a_{n-1})$ is basically sharp apart from the constant factor. The max-discrepancy, i.e., the discrepancy taken over all $N$ in $q_{n-1} \le N < q_n$ and over all subintervals $I \subset (0, 1)$, does fluctuate as much as constant times $(a_{n-1} + a_{n-3} + a_{n-5} + \ldots)$; this result is due to Hardy and Littlewood and, independently, to Ostrowski.

If $q_{n-1} \le N < q_n$, then from $q_j = a_{j-1} q_{j-1} + q_{j-2}$, very roughly,
\[
q_n \approx (1 + a_1)(1 + a_2) \cdots (1 + a_{n-1}). \tag{1.28}
\]
Under side condition (1.28) the minimum of the critical digit sum $(a_1 + a_2 + \ldots + a_{n-1})$ is attained when
\[
\max_n \frac{a_1 + a_2 + \ldots + a_n}{n} = O(1), \tag{1.29}
\]
i.e., when the average digit size is bounded, and so the smallest possible max-discrepancy for all irrational rotations is (positive) constant times $\log N$, with equality (apart from a constant factor) for the class of $\alpha$ satisfying (1.29).

For quadratic irrationals the average digit size is clearly bounded (a by-product of periodicity), so (1.29) applies, and implies that the quadratic irrational rotation $n\alpha$, $n = 1, 2, 3, \ldots$ (mod 1), has max-discrepancy $c_\alpha \log N$. The smallest values of the constant factor $c_\alpha > 0$ occur for numbers like the golden ratio $(1 + \sqrt{5})/2 = [1; 1, 1, 1, \ldots]$ and $\sqrt{2} = [1; 2, 2, 2, \ldots]$ that have very small continued fraction digits; see the more recent works of Dupain [Du] and Dupain and Sós [Du-So].

Summarizing, we have a very good understanding of the max-discrepancy of the quadratic irrational rotation: it is always (positive) constant times $\log N$ (i.e., as small as possible), where the constant factor depends on $\alpha$. The numbers $\alpha$ which are badly approximable by rationals give the “most uniform” irrational rotation and vice versa.

The first new result is about the typical discrepancy (instead of the max-discrepancy).

Step Three: The quadratic irrational rotation, counted in any fixed interval $(0, x)$ with rational endpoint $x$, exhibits a central limit theorem with standard deviation $c\sqrt{\log N}$.
Step Three is in perfect harmony with the mysterious Giant Leap phenomenon that we will discuss in detail in Sect. 2.5. The Giant Leap refers to the dramatic change that happens when we switch from rationals to irrationals, and especially to quadratic irrationals. The rational rotation exhibits extremely simple periodic behavior; the quadratic irrational rotation, on the other hand, exhibits full-blown randomness, including a delicate central limit theorem. Note that the quadratic irrational rotation is at the other end of the spectrum, since the quadratic irrationals are (among) the most “anti-rational” numbers.

Here is the precise statement.

Theorem 1.1 (Central limit theorem). Let $\alpha$ be any quadratic irrational and consider any interval $I = [0, x)$ with rational endpoint $0 < x < 1$. There are effectively computable constants $C_1 = C_1(\alpha, x)$ and $C_2 = C_2(\alpha, x) > 0$ such that, for any real numbers $-\infty < A < B < \infty$, the density of integers $N \ge 2$ for which
\[
A < \frac{(Z_\alpha(N; I) - Nx) - C_1 \log N}{C_2 \sqrt{\log N}} < B \tag{1.30}
\]
is given by the familiar integral
\[
\frac{1}{\sqrt{2\pi}} \int_A^B e^{-u^2/2}\, du. \tag{1.31}
\]
In fact, we prove the following quantitative version:
\[
\frac{1}{N} \left| \left\{ 0 \le n < N : A \le \frac{(Z_\alpha(n; I) - nx) - C_1 \log N}{C_2 \sqrt{\log N}} \le B \right\} \right| = \frac{1}{\sqrt{2\pi}} \int_A^B e^{-u^2/2}\, du + O\left( (\log N)^{-1/10} \log\log N \right),
\]
where the implicit constant in the error term is absolute. The result remains true for any subinterval $cN < n < N$, where $0 < c < 1$ is a fixed constant (say, $c = 1/2$).

For any quadratic irrational $\alpha$ and for any rational endpoint $x$, there is an explicit finite formula for the “expectation constant” $C_1 = C_1(\alpha, x)$ (explained in Sects. 2.1 and 2.2), and similarly there is an explicit finite formula for the “variation constant” $C_2 = C_2(\alpha, x) > 0$ (explained in Sects. 3.1–3.3).

Remarks. Note without proof that the central limit theorem can be extended to a delicate Large Deviation Theorem:
\[
\frac{\frac{1}{N} \left| \left\{ 0 \le n < N : Z_\alpha(n; I) > nx + C_1 \log N + \lambda C_2 \sqrt{\log N} \right\} \right|}{\frac{1}{\sqrt{2\pi}} \int_\lambda^\infty e^{-u^2/2}\, du} \to 1
\]
as long as $\lambda = O\left( (\log N)^{1/10} \right)$.
The exponent $1/10$ is certainly not best possible, and with a little extra effort we could easily prove a better constant, but to find the best exponent is not our main goal here.

Hecke’s Lemma on Bounded Error Intervals shows that our condition “endpoint $x$ is rational” cannot be relaxed to “any $x$”; indeed, if $x = \{\alpha\}$, or $x = \{k\alpha\}$ for some integer $k \ge 1$ (i.e., $x$ is the fractional part of an integer multiple of $\alpha$), then the fluctuation is bounded (instead of having average size $\sqrt{\log N}$).

Note that the first constant factor $C_1 = C_1(\alpha, x)$ in (1.30) can be zero or nonzero, but the second factor $C_2 = C_2(\alpha, x) > 0$ is always strictly positive. For example, if $I = [0, 1/2)$ (i.e., $x = 1/2$) and $\alpha = \sqrt{2}$, then [see (2.86)]
\[
C_1 = C_1(\sqrt{2}, 1/2) = \frac{1}{8 \log(1 + \sqrt{2})} \tag{1.32}
\]
and [see (3.127)]
\[
C_2 = C_2(\sqrt{2}, 1/2) = \frac{1}{8} \left( \frac{3}{\sqrt{2} \log(1 + \sqrt{2})} \right)^{1/2}. \tag{1.33}
\]
On the other hand, if $I$ remains the first half $[0, 1/2)$ of the unit interval, but $\sqrt{2}$ is replaced by $\sqrt{3}$ or the golden ratio $(1 + \sqrt{5})/2$, then the corresponding first constant factor $C_1$ is zero [see (2.90) and (2.91)], that is, we don’t need the additive logarithmic term in the numerator of (1.30).

Another example is $\alpha = \sqrt{7}$ and $I = [0, 1/2)$; then [see (2.92)]
\[
C_1(\sqrt{7}, 1/2) = -\frac{1}{4 \log(8 + 3\sqrt{7})}.
\]
Note that the number $8 + 3\sqrt{7}$ in the denominator comes from the least positive solution $x = 8$, $y = 3$ of Pell’s equation $x^2 - 7y^2 = \pm 1$; this $8 + 3\sqrt{7}$ is called the fundamental unit in the real quadratic field $\mathbb{Q}(\sqrt{7})$. The reason why the fundamental unit shows up in both $C_1$ and $C_2$ will be explained in the proofs.

Note also that Theorem 1.1 can be easily generalized for any interval $I = (x_1, x_2)$ where both endpoints are rational. For example, taking the symmetric intervals $I = (-x, x)$ (instead of $I = [0, x)$) the first constant factor $C_1$ is always zero.

Note in advance that the explicit evaluation of the variance constants $C_2$ is based on explicit finite formulas that we call “generalized class number formulas.” It involves surprisingly deep number theory (see Sects. 3.1–3.3).

The basic idea of the proof of Theorem 1.1 is the following. As $n$ runs in an interval $0 < n < N$, we set up an approximation of $Z_\alpha(n; I) - nx$ with a sum of independent and identically distributed random variables. (Note in advance that the independence will come from an underlying homogeneous Markov chain.) Despite the simplicity of this approach, the details are complicated, and the proof of Theorem 1.1 is rather long.
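To get a feel for the size of the quantities in Theorem 1.1, one can tabulate $Z_\alpha(n; I) - nx$ for $\alpha = \sqrt{2}$ and $I = [0, 1/2)$. The sketch below is a numerical illustration only and proves nothing: membership $\{n\sqrt{2}\} < 1/2$ is decided in exact integer arithmetic via $\lfloor n\sqrt{2} \rfloor = \lfloor \sqrt{2n^2} \rfloor$, the centering $C_1 \log N$ uses the value of $C_1(\sqrt{2}, 1/2)$ quoted above, and the function names are made up for this example.

```python
from math import isqrt, log, sqrt

def in_first_half(n):
    """True when {n*sqrt(2)} < 1/2, decided without floating point."""
    f = isqrt(2 * n * n)                  # floor(n * sqrt(2))
    return 8 * n * n < (2 * f + 1) ** 2   # equivalent to n*sqrt(2) < f + 1/2

def discrepancies(n_max):
    """D(n) = Z(n; [0, 1/2)) - n/2 for n = 1, ..., n_max and alpha = sqrt(2)."""
    values, hits = [], 0
    for n in range(1, n_max + 1):
        hits += in_first_half(n)
        values.append(hits - n / 2)
    return values

def summary(n_max):
    """Empirical mean and variance of D(n), next to the centering C_1 * log(n_max)."""
    d = discrepancies(n_max)
    mean = sum(d) / n_max
    variance = sum((v - mean) ** 2 for v in d) / n_max
    c1 = 1 / (8 * log(1 + sqrt(2)))       # value of C_1(sqrt(2), 1/2) quoted above
    return mean, variance, c1 * log(n_max)

print(discrepancies(6))
print(summary(100_000))
```

Since both the mean and the variance grow only like $\log N$, very large ranges are needed before the Gaussian shape becomes visible; a run of this size mainly shows how small the fluctuations are.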
1.1.1 From Quasi-Periodicity to Randomness

Let’s return to Hecke’s Lemma on Bounded Error Intervals: it is a very strong “anti-randomness” type limitation on the irrational rotation. By the way, later we need the following stronger form of Hecke’s Lemma.

Lemma on Just Intervals. Let $I \subset (0, 1)$ be an arbitrary half-open interval of length $|I| = \{q_k\alpha\}$ for some integer $k \ge 0$, where $q_k$ is the $k$-th convergent denominator of $\alpha$. Then for any integer $N \ge 1$,
\[
|Z_\alpha(N; I) - N|I|| < 2.
\]
We give a proof of this lemma at the end of the section.

Another strong regularity property of the irrational rotation is the Lemma on Bounded Error Initial Segments. A third strong regularity property is the so-called Three-distance theorem. We don’t need it for the rest, but this elegant result is definitely worth mentioning.

Let $0 < \alpha < 1$ be an arbitrary irrational number, let $n$ be a natural number, and let $0 < y_1 < y_2 < \ldots < y_n < 1$ be the first $n$ terms of the fractional part sequence $\{k\alpha\}$, $1 \le k \le n$, arranged in increasing order. H. Steinhaus made the surprising conjecture that the set of gaps $y_{j+1} - y_j$, $j = 0, 1, \ldots, n$ (where $y_0 = 0$ and $y_{n+1} = 1$), attains at most three different values. Moreover, if there are three different values, say, $0 < \delta_1 < \delta_2 < \delta_3$, then $\delta_1 + \delta_2 = \delta_3$. This beautiful conjecture was proved by Sós [So1] and Swierczkowski [Sw], and it is now called the “three-distance theorem.”

It was Sós [So1] who noticed a very interesting by-product of the proof of the Three-distance theorem.

Lemma on Restricted Permutations. Let $\alpha$ be an arbitrary irrational, and let $P$ be the permutation of the set $1, 2, \ldots, n$ such that
\[
0 < \{p(1)\alpha\} < \{p(2)\alpha\} < \ldots < \{p(n)\alpha\} < 1.
\]
Then the whole permutation $P: p(1), p(2), \ldots, p(n)$ can be reconstructed from the knowledge of $p(1)$ and $p(n)$; the point is that we don’t need to know $\alpha$.

It is worth mentioning that there is another interesting “three-distance theorem,” which goes as follows. Besides $\alpha$ and $n$, let $0 < b < 1$ be an arbitrary real number. The “gaps” between the successive values of $k$, $1 \le k \le n$, for which $\{k\alpha\} < b$ can have at most three lengths, and if there are three, one will be the sum of the other two (this was also a conjecture of Steinhaus).
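The first three-distance theorem is pleasant to observe experimentally. The following sketch, written under the assumption that floating-point arithmetic with a small tolerance is accurate enough for moderate $n$, lists the distinct gap lengths cut out of $[0, 1]$ by the points $\{k\alpha\}$, $1 \le k \le n$; the function name `gap_lengths` and the tolerance value are choices made for this illustration.

```python
import math

def gap_lengths(alpha, n, tol=1e-9):
    """Distinct gap lengths between consecutive points of
    0 = y_0 < y_1 < ... < y_n < y_{n+1} = 1, where the y_j are the sorted {k*alpha}."""
    points = sorted(math.fmod(k * alpha, 1.0) for k in range(1, n + 1))
    points = [0.0] + points + [1.0]
    gaps = sorted(b - a for a, b in zip(points, points[1:]))
    distinct = []
    for g in gaps:
        # Gaps closer than tol are treated as the same length (rounding noise).
        if not distinct or g - distinct[-1] > tol:
            distinct.append(g)
    return distinct

alpha = math.sqrt(2)
for n in (5, 10, 50, 200):
    d = gap_lengths(alpha, n)
    print(n, [round(g, 6) for g in d])
```

Whenever three lengths show up, the largest printed value should equal the sum of the other two, up to rounding.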
1.1.2 Summary in a Nutshell

The linear sequence $n\alpha$, $n = 1, 2, 3, \ldots$, is perfectly regular: it is an infinite arithmetic progression. Even if we take it modulo one, a lot of regularities are still preserved. For example, (1) Hecke’s Lemma on Bounded Error Intervals and its stronger form, (2) the Lemma on Just Intervals, (3) the Lemma on Bounded Error Initial Segments, (4) the Three-distance theorem, and (5) the Lemma on Restricted Permutations are all strong “anti-randomness” type regularity properties of the irrational rotation. These regularities demonstrate that the irrational rotation is highly non-random in many respects, and explain why the irrational rotation is called a quasi-periodic sequence. Nevertheless, our Theorem 1.1, a central limit theorem, clearly exhibits full-blown “randomness.” The price that we pay is the much smaller norming factor $\sqrt{\log n}$ instead of the usual $\sqrt{n}$. The message (in fact, the basic message of the book) is that, even under very restrictive regularity conditions such as quasi-periodicity, randomness eventually prevails.

We have a very good understanding of the irrational rotation $n\alpha$ (mod 1), $n = 1, 2, 3, \ldots$, which is a linear sequence. By comparison, we know much, much less about the polynomial sequences such as $n^2\alpha$ (mod 1), $n^3\alpha$ (mod 1), $n^4\alpha$ (mod 1), and so on, where $\alpha$ is a given special number, say, a quadratic irrational. Computer experimentation indicates full-blown randomness with standard deviation $\sqrt{n}$ (instead of $\sqrt{\log n}$), but basically there is no mathematical tool to prove it (especially for degree $\ge 3$).

Finally, as we promised, we conclude this section with a

Proof of the Lemma on Just Intervals. Let $q_1 = 1$, $q_2 = a_1$, $q_3 = a_2 a_1 + 1$, $\ldots$ be the convergent denominators for $\alpha$. In the special case $q_1 = 1$ we already proved the statement, see (1.8). Now assume that $q_k$ is an arbitrary convergent denominator, $I \subset [0, 1)$ is an arbitrary half-open interval of length $|I| = \{q_k\alpha\} < 1/2$, and we study the counting function
\[
Z_\alpha(N; I) = \sum_{\substack{1 \le n \le N: \\ n\alpha \in I \ (\mathrm{mod}\ 1)}} 1.
\]
First assume that $N$ is divisible by $q_k$, and consider the arithmetic progressions for $a = 1, 2, \ldots, q_k$:
\[
i q_k + a, \quad i = 0, 1, 2, \ldots, M - 1 \quad \text{with} \quad M = \frac{N}{q_k}. \tag{1.34}
\]
For brevity write $\gamma = |I| = \{q_k\alpha\}$, then by (1.34)
\[
Z_\alpha(N; I) = \sum_{a=1}^{q_k} Z_\gamma(M; I - (q_k - a)\alpha), \tag{1.35}
\]
where $I - t$ denotes the translated copy of interval $I$ modulo one. The point here is that the intervals $I - (q_k - a)\alpha$, as $a = 1, 2, \ldots, q_k$, are pairwise disjoint and also uniformly distributed in the unit interval.
To prove disjointness, notice that if $I - j\alpha$ and $I - l\alpha$ overlap for some $0 \le j < l < q_k$, then
\[
\|(l - j)\alpha\| < |I| = \|q_k\alpha\|,
\]
which contradicts the well-known local minimum property of $\|q_k\alpha\|$ ($\|m\alpha\| < \|q_k\alpha\|$ implies that $m > q_k$).

To prove uniform distribution of the translated intervals, we simply refer to the Lemma on Bounded Error Initial Segments.

Combining disjointness with uniform distribution, by (1.8) we have
\[
Z_\alpha(N; I) = \sum_{a=1}^{q_k} Z_\gamma(M; I - (q_k - a)\alpha) = q_k \lfloor M|I| \rfloor + \rho, \tag{1.36}
\]
where
\[
\rho = \lfloor q_k \{M|I|\} \rfloor \ \text{or} \ \lceil q_k \{M|I|\} \rceil \tag{1.37}
\]
(lower or upper integral part). Since $N = q_k M$, we can rewrite (1.36) and (1.37) as follows:
\[
Z_\alpha(N; I) = \lfloor N|I| \rfloor \ \text{or} \ \lceil N|I| \rceil, \tag{1.38}
\]
which proves the lemma in the special case when $N$ is divisible by $q_k$.

In the general case we write $N = N_1 + r$, where $N_1$ is divisible by $q_k$ and $0 \le r < q_k$. Clearly
\[
Z_\alpha(N; I) = Z_\alpha(N_1; I) + Z_\alpha(r; I - N_1\alpha). \tag{1.39}
\]
Since $0 \le r < q_k$, and again using the local minimum property of $\|q_k\alpha\| = |I|$, we have
\[
0 \le Z_\alpha(r; I - N_1\alpha) \le 1. \tag{1.40}
\]
Also,
\[
|I| = |q_k\alpha - p_k| < \frac{1}{q_k}. \tag{1.41}
\]
Combining (1.38)–(1.41) we have
\[
|Z_\alpha(N; I) - N|I|| \le |Z_\alpha(N_1; I) - N_1|I|| + |Z_\alpha(r; I - N_1\alpha) - r|I|| < 1 + 1 = 2,
\]
completing the proof of the lemma. □
1.2 Randomness in Lattice Point Counting

First note that the counting function
\[
Z_\alpha(N; I) = \sum_{\substack{1 \le n \le N: \\ n\alpha \in I \ (\mathrm{mod}\ 1)}} 1
\]
of the irrational rotation has an alternative geometric meaning: it counts lattice points in a long tilted narrow strip of slope $\alpha$.
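Indeed, let $I$ be the interval $(0, \gamma)$; we push down the line $y = \alpha x$ of slope $\alpha$ by the length of interval $I$, and consider the long tilted narrow parallelogram with vertices
\[
(0, 0), \quad (0, -\gamma), \quad \left(N + \tfrac{1}{2},\ \alpha\left(N + \tfrac{1}{2}\right)\right), \quad \left(N + \tfrac{1}{2},\ \alpha\left(N + \tfrac{1}{2}\right) - \gamma\right);
\]
we denote this parallelogram with $P(\gamma; N)$. Clearly the area of parallelogram $P(\gamma; N)$ is $\gamma(N + \frac{1}{2})$. Let $L(\gamma; N)$ denote the number of lattice points in parallelogram $P(\gamma; N)$. It is easy to see that, with $I = (0, \gamma)$,
\[
Z_\alpha(N; I) - N\gamma = \sum_{\substack{1 \le n \le N: \\ 0 < \{n\alpha\} < \gamma}} 1 - N\gamma = L(\gamma; N) - \gamma\left(N + \tfrac{1}{2}\right) + O(1). \tag{1.42}
\]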
In view of (1.42), Theorem 1.1 is equivalent to a central limit theorem for counting lattice points in long tilted narrow strips $P(\gamma; N)$, $N = 1, 2, 3, \ldots$, of slope $\alpha$ (note that the upper left corner of $P(\gamma; N)$ is the origin).

Another natural lattice point counting problem is closely related to the classical Diophantine sum
\[
S_\alpha(n) = \sum_{k=1}^{n} \left( \{k\alpha\} - \frac{1}{2} \right). \tag{1.43}
\]
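For a concrete impression of the sum (1.43), it can be tabulated for $\alpha = \sqrt{2}$. The sketch below is illustrative only: it rewrites $\{k\sqrt{2}\} = k\sqrt{2} - \lfloor k\sqrt{2} \rfloor$ and obtains the floors exactly from integer square roots, so that only the closed-form term $\sqrt{2}\, n(n+1)/2$ is evaluated in floating point; the function name `sawtooth_sums_sqrt2` is made up for this example.

```python
from math import isqrt, sqrt

def sawtooth_sums_sqrt2(n_max):
    """S(n) = sum_{k=1}^{n} ({k*sqrt(2)} - 1/2) for n = 1, ..., n_max."""
    sums, floor_total = [], 0
    for n in range(1, n_max + 1):
        floor_total += isqrt(2 * n * n)        # floor(n*sqrt(2)), computed exactly
        # sum of k*sqrt(2) for k <= n is sqrt(2)*n*(n+1)/2;
        # subtracting the floors leaves the fractional parts, then remove n/2
        sums.append(sqrt(2) * n * (n + 1) / 2 - floor_total - n / 2)
    return sums

s = sawtooth_sums_sqrt2(10_000)
print([round(v, 5) for v in s[:3]])
print(min(s), max(s))                          # range of S(n) over 1 <= n <= 10000
```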
Sum (1.43) already shows up in the pioneering papers of Hardy–Littlewood and Ostrowski mentioned in Sect. 1.1. To explain how the sum (1.43) is related to a problem about counting lattice points in a right triangle, we begin with the (almost) trivial case of a lattice triangle (when the slope is rational). Assume that the natural numbers $n$ and $m$ are relatively prime, and let $L(n, m)$ denote the number of lattice points inside the right triangle with vertices $A = (0, 0)$, $B = (n, 0)$, and $C = (n, m)$. Since $n$ and $m$ are relatively prime, $L(n, m)$ is exactly one half of the number of lattice points inside the rectangle with vertices $A$, $B$, $C$, and $D = (0, m)$, that is,
$$L(n, m) = \frac12 (n - 1)(m - 1). \tag{1.44}$$
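Consider now the smaller similar right triangle with vertices
$$A_1 = \left( \frac{n}{2m}, \frac12 \right), \quad B_1 = \left( n - \frac12, \frac12 \right), \quad C_1 = \left( n - \frac12,\ m - \frac{m}{2n} \right).$$
The area of the $A_1 B_1 C_1$ triangle is
$$\frac12 \left( n - \frac12 - \frac{n}{2m} \right) \left( m - \frac{m}{2n} - \frac12 \right) = \frac12 (n - 1)(m - 1) + \frac18 \left( \frac{m}{n} + \frac{n}{m} - 2 \right). \tag{1.45}$$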
Comparing (1.44) with (1.45) we see that the area of the $A_1 B_1 C_1$ triangle is almost $L(n, m)$; the difference is the "negligible" last term in (1.45). Next consider the case of irrational slope. Let $L_\alpha(n + 1)$ denote the number of lattice points inside the right triangle with vertices $A = (0, 0)$, $B = (n + 1, 0)$, and $C = (n + 1, \alpha(n + 1))$, where $\alpha$ is irrational. Motivated by the case of rational slope above, consider the smaller similar right triangle with vertices
$$A_1 = \left( \frac{1}{2\alpha}, \frac12 \right), \quad B_1 = \left( n + \frac12, \frac12 \right), \quad C_1 = \left( n + \frac12,\ \alpha n + \frac{\alpha}{2} \right).$$
The area of the $A_1 B_1 C_1$ triangle is
$$\frac12 \left( n + \frac12 - \frac{1}{2\alpha} \right) \left( \alpha n + \frac{\alpha}{2} - \frac12 \right). \tag{1.46}$$
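On the other hand, by counting the lattice points inside the $ABC$ triangle vertically, we have ($\lfloor x \rfloor$ denotes the lower integral part of $x$)
$$L_\alpha(n + 1) = \lfloor \alpha \rfloor + \lfloor 2\alpha \rfloor + \lfloor 3\alpha \rfloor + \cdots + \lfloor n\alpha \rfloor = \sum_{k=1}^{n} \left( k\alpha - \{k\alpha\} \right) = \sum_{k=1}^{n} \left( k\alpha - \frac12 \right) - \sum_{k=1}^{n} \left( \{k\alpha\} - \frac12 \right) = \alpha \binom{n+1}{2} - \frac{n}{2} - S_\alpha(n). \tag{1.47}$$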
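Identity (1.47) is easy to test numerically. The following is a minimal sketch in floating-point arithmetic; the function names are illustrative and not taken from the text, and the comparison is only accurate up to rounding error.

```python
import math

def S_alpha(alpha, n):
    """Direct evaluation of S_alpha(n) = sum_{k=1}^n ({k*alpha} - 1/2), see (1.43)."""
    return sum((k * alpha) % 1.0 - 0.5 for k in range(1, n + 1))

def lattice_count(alpha, n):
    """L_alpha(n+1), counted vertically as floor(alpha) + ... + floor(n*alpha)."""
    return sum(math.floor(k * alpha) for k in range(1, n + 1))

def check_identity(alpha, n):
    """Return both sides of (1.47) so that they can be compared."""
    lhs = lattice_count(alpha, n)
    rhs = alpha * math.comb(n + 1, 2) - n / 2 - S_alpha(alpha, n)
    return lhs, rhs
```

For instance, `check_identity(math.sqrt(2), 200)` returns two numbers that agree up to floating-point error.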
Since the area (1.46) equals
$$\alpha \binom{n+1}{2} - \frac{n}{2} + \frac{\alpha + \alpha^{-1} - 2}{8},$$
we can rewrite (1.47) as follows:
$$L_\alpha(n + 1) - E(n + 1) = -S_\alpha(n), \tag{1.48}$$
where
$$E(n + 1) = \alpha \binom{n+1}{2} - \frac{n}{2} \tag{1.49}$$
is the area of the $A_1 B_1 C_1$ triangle minus the negligible term
$$\frac{\alpha + \alpha^{-1} - 2}{8}$$
depending only on the slope. The letter "E" emphasizes the intuition that $E(n)$, the area of a triangle, is what we consider the "expected value" of $L_\alpha(n)$ as $n$ runs in a long interval. In view of (1.48) and (1.49) the problem of describing the fluctuations of the counting function $L_\alpha(n)$ around the "expected value" $E(n)$ is equivalent to describing the fluctuations of sum (1.43) as $n$ runs in a long interval. This common fluctuation satisfies a central limit theorem; we have an analog of Theorem 1.1.

Theorem 1.2 (Central limit theorem). Let $\alpha$ be any quadratic irrational. There are effectively computable constants $C_3 = C_3(\alpha)$ and $C_4 = C_4(\alpha) > 0$ such that, for any real numbers $-\infty < A < B < \infty$, the density of integers $N \ge 2$ for which
$$A < \frac{S_\alpha(N) - C_3 \log N}{C_4 \sqrt{\log N}} < B \tag{1.50}$$
is given by the usual integral
$$\frac{1}{\sqrt{2\pi}} \int_A^B e^{-u^2/2}\, du. \tag{1.51}$$
In fact, we prove the following quantitative version:
$$\frac{1}{N} \left| \left\{ 0 \le n < N :\ A \le \frac{S_\alpha(n) - C_3 \log N}{C_4 \sqrt{\log N}} \le B \right\} \right| = \frac{1}{\sqrt{2\pi}} \int_A^B e^{-u^2/2}\, du + O\!\left( (\log N)^{-1/10} \log\log N \right),$$
where the implicit constant in the error term is absolute. Also, the result remains true for any subinterval $cN < n < N$ where $0 < c < 1$ is a fixed constant (say, $c = 1/2$).

Remarks. We note without proof that Theorem 1.2 can also be extended to a Large Deviation Theorem, similarly to Theorem 1.1.
Consider the special cases $\alpha = \sqrt2$ and $\sqrt3$ in Theorem 1.2. Then the critical constants are equal to
$$C_3(\sqrt2) = 0 \quad\text{and}\quad C_4(\sqrt2) = \frac{1}{4\sqrt3} \left( \frac{1}{\sqrt2\, \log(1 + \sqrt2)} \right)^{1/2}, \tag{1.52}$$
and
$$C_3(\sqrt3) = \frac{1}{12 \log(2 + \sqrt3)} \quad\text{and}\quad C_4(\sqrt3) = \frac{1}{2\sqrt6} \left( \frac{1}{\sqrt3\, \log(2 + \sqrt3)} \right)^{1/2}. \tag{1.53}$$
Note that the constant factors $C_3(\sqrt2)$ and $C_3(\sqrt3)$ will be computed at the beginning of Sect. 2.6, and the constant factors $C_4(\sqrt2)$ and $C_4(\sqrt3)$ will be computed in Sect. 3.2.

There is a surprising difference between $\sqrt2$ and $\sqrt3$: for the latter, in the numerator of (1.50) we need the additive term $C_3 \log N$ with a nonzero $C_3$. The underlying reason is asymmetry: the Pell equation $x^2 - 2y^2 = -1$ has infinitely many integral solutions, but $x^2 - 3y^2 = -1$ has no integral solution (which is clear from a routine modulo 3 analysis).

The constants $C_1$ and $C_2$ in Theorem 1.1 and $C_3$ and $C_4$ in Theorem 1.2 are intimately bound up with the deeper arithmetic properties of real quadratic number fields; this will be explained later in Sects. 2.6 and 3.2–3.3. Here we just give two informal illustrations in advance (the definitions and proofs will come much later). First let $\alpha = \sqrt7$; then
$$C_3(\sqrt7) = \frac{h(-7)}{4 \log(8 + 3\sqrt7)} = \frac{1}{4 \log(8 + 3\sqrt7)},$$
where $x = 8$, $y = 3$ is the least positive solution of Pell's equation $x^2 - 7y^2 = 1$, and the class number $h(-7)$ of the complex quadratic field $\mathbb{Q}(\sqrt{-7})$ is one ("unique factorization").

The second example is $\alpha = \sqrt{71}$; then
$$C_3(\sqrt{71}) = \frac{h(-71)}{4 \log(3480 + 413\sqrt{71})} = \frac{7}{4 \log(3480 + 413\sqrt{71})},$$
where $x = 3480$, $y = 413$ is the least positive solution of Pell's equation $x^2 - 71y^2 = 1$, and the class number $h(-71)$ of the complex quadratic field $\mathbb{Q}(\sqrt{-71})$ is 7. (Here is the complete list of the seven nonequivalent binary quadratic forms of discriminant $-71$: $x^2 + xy + 18y^2$, $2x^2 + xy + 9y^2$, $2x^2 + 3xy + 10y^2$, $3x^2 + xy + 6y^2$, $3x^2 + 5xy + 8y^2$, $5x^2 + 3xy + 4y^2$, $5x^2 + 7xy + 6y^2$.)

Notice that in these examples the constant factor $C_3(\sqrt{d})$ was always nonnegative. Is it true in general that $C_3(\sqrt{d}) \ge 0$? We will return to this "positivity conjecture" in Sect. 2.6 after Eq. (2.20).
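The arithmetic facts quoted in these examples can be checked by brute force. The sketch below is only an illustration (the function names are hypothetical): it searches for Pell solutions by trial over $y$ and counts class numbers by enumerating reduced primitive positive definite forms $ax^2 + bxy + cy^2$ with $b^2 - 4ac = D$. The representatives it lists are the reduced ones, so they need not coincide with the representatives quoted in the text.

```python
import math

def least_pell_solution(d):
    """Smallest (x, y) with y >= 1 and x^2 - d*y^2 = 1, by trial over y."""
    y = 1
    while True:
        x2 = d * y * y + 1
        x = math.isqrt(x2)
        if x * x == x2:
            return x, y
        y += 1

def has_negative_pell_solution(d, y_max=10_000):
    """Search 1 <= y <= y_max for a solution of x^2 - d*y^2 = -1."""
    for y in range(1, y_max + 1):
        x2 = d * y * y - 1
        x = math.isqrt(x2)
        if x * x == x2:
            return True
    return False

def reduced_forms(D):
    """Reduced primitive forms (a, b, c) with b^2 - 4ac = D < 0,
    i.e. -a < b <= a <= c, and b >= 0 whenever a == c."""
    forms = []
    a = 1
    while 3 * a * a <= -D:          # a reduced form has a <= sqrt(|D|/3)
        for b in range(-a + 1, a + 1):
            num = b * b - D
            if num % (4 * a):
                continue
            c = num // (4 * a)
            if c < a or (a == c and b < 0):
                continue
            if math.gcd(a, math.gcd(b, c)) == 1:
                forms.append((a, b, c))
        a += 1
    return forms
```

The length of `reduced_forms(-7)` and `reduced_forms(-71)` gives the class numbers used above, and `has_negative_pell_solution` separates $d = 2$ from $d = 3$ within the search bound.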
Switching to the constant factors $C_4$ in the standard deviation, we have the formula
$$C_4(\sqrt7) = \left( \frac{1}{240 \sqrt7\, \log(8 + 3\sqrt7)} \sum_{\substack{b^2 + ac = 7:\\ a > 0,\ c > 0}} a \right)^{1/2},$$
where the sum is over all ways of writing $7 = b^2 + ac$ with $a$, $c$ positive integers (the integer $b$ can be positive, negative, or zero); see Proposition 3.7 (due to Siegel). Clearly
$$\sum_{\substack{b^2 + ac = 7:\\ a > 0,\ c > 0}} a = (1 + 7) + 2(1 + 6 + 2 + 3) + 2(1 + 3) = 40,$$
where $(1 + 7)$ corresponds to $b = 0$, $2(1 + 6 + 2 + 3)$ corresponds to $b = \pm 1$, and $2(1 + 3)$ corresponds to $b = \pm 2$. Thus we have
$$C_4(\sqrt7) = \left( \frac{40}{240 \sqrt7\, \log(8 + 3\sqrt7)} \right)^{1/2} = \left( \frac{1}{6 \sqrt7\, \log(8 + 3\sqrt7)} \right)^{1/2}.$$
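Finally, we have the analogous formula
$$C_4(\sqrt{71}) = \left( \frac{1}{240 \sqrt{71}\, \log(3480 + 413\sqrt{71})} \sum_{\substack{b^2 + ac = 71:\\ a > 0,\ c > 0}} a \right)^{1/2}.$$
Since
$$\sum_{\substack{b^2 + ac = 71:\\ a > 0,\ c > 0}} a = 1160,$$
we have
$$C_4(\sqrt{71}) = \left( \frac{1160}{240 \sqrt{71}\, \log(3480 + 413\sqrt{71})} \right)^{1/2} = \left( \frac{29}{6 \sqrt{71}\, \log(3480 + 413\sqrt{71})} \right)^{1/2}.$$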
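The two sums in Siegel's formula are sums of divisor sums, $\sum_{b} \sigma(d - b^2)$ over all integers $b$ with $b^2 < d$, and can be recomputed directly. The following sketch uses illustrative function names; `c4` simply evaluates the displayed formula in floating point, given the least Pell solution $(x, y)$.

```python
import math

def siegel_sum(d):
    """Sum of a over all (a, b, c) with b^2 + a*c = d, a > 0, c > 0, b any integer."""
    total = 0
    b = 0
    while b * b < d:
        r = d - b * b
        divisor_sum = sum(a for a in range(1, r + 1) if r % a == 0)
        # b and -b contribute equally; b = 0 is counted once
        total += divisor_sum if b == 0 else 2 * divisor_sum
        b += 1
    return total

def c4(d, x, y):
    """Evaluate (siegel_sum(d) / (240 sqrt(d) log(x + y sqrt(d))))^(1/2)."""
    unit_log = math.log(x + y * math.sqrt(d))
    return math.sqrt(siegel_sum(d) / (240 * math.sqrt(d) * unit_log))
```

Here `siegel_sum(7)` and `siegel_sum(71)` reproduce the values 40 and 1160 used above.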
Note that both real quadratic fields $\mathbb{Q}(\sqrt7)$ and $\mathbb{Q}(\sqrt{71})$ have class number one: this is why we could use Siegel's elegant formula. If the class number of the real quadratic field is not one, then we have to switch to a more complicated algorithm.

The basic idea of the proof of Theorem 1.2 is the same as that of Theorem 1.1: as $n$ runs in the interval $0 < n < N$, we approximate $S_\alpha(n)$ with a sum of independent and identically distributed random variables. Again the independence comes from an underlying (homogeneous) Markov chain.

Theorems 1.1 and 1.2 are our main results describing the asymptotic behavior of the irrational rotation from a global viewpoint. The proofs are very long. This is why we decided to include two warm-up sections: Sects. 1.3 and 1.4.
1.2.1 A Key Tool: Ostrowski's Explicit Formula

Our proof of Theorem 1.2 will use a somewhat complicated but very useful formula, due to Ostrowski (see [Os]), expressing the sum $S_\alpha(n)$ in terms of the basic parameters of the continued fraction expansion of $\alpha$. First we recall the well-known recurrence relations for the denominators $q_i$ of the convergents $p_i/q_i$ of $\alpha = [a_0; a_1, a_2, \ldots]$:
$$q_1 = 1, \quad q_2 = a_1, \quad\text{and for all } i \ge 1, \quad q_{i+2} = a_{i+1} q_{i+1} + q_i.$$
In view of this, there is a unique way to express an arbitrary positive integer $n$ as a linear combination of the $q_i$'s as follows:
$$n = \sum_{i}{}^{*}\, b_i q_i, \qquad 0 \le b_i = b_i(n) \le a_i \ \text{ for } i \ge 2, \qquad 0 \le b_1 = b_1(n) \le a_1 - 1, \tag{1.54}$$
where $*$ indicates the Extra Rule that if $b_i = a_i$ then $b_{i-1} = 0$. The only new parameter in Ostrowski's explicit formula below is $\varepsilon_i = \varepsilon_i(\alpha) = q_i \alpha - p_i$, where $\operatorname{sign}(\varepsilon_i) = \pm 1$ denotes the usual sign. It is well known that, for every $\alpha$, as $i$ runs, $\varepsilon_i$ forms an alternating sequence (in fact, an alternating decreasing sequence that tends to zero at least exponentially fast).

Proposition 1.3 (Ostrowski's explicit formula). Let $q_\ell \le n < q_{\ell+1}$, and write $n = \sum_{1 \le i \le \ell} b_i q_i$ as in (1.54). Then
$$S_\alpha(n) = \sum_{i=1}^{\ell} \operatorname{sign}(\varepsilon_i)\, b_i \left( -\frac12 + \frac{b_i q_i |\varepsilon_i|}{2} + \sum_{1 \le j < i} b_j q_j |\varepsilon_i| + \frac{|\varepsilon_i|}{2} \right). \tag{1.55}$$
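For the sake of completeness, we include a proof.

Proof of Proposition 1.3. Write $n = b_\ell q_\ell + n'$ where $n' = \sum_{1 \le i < \ell} b_i q_i$. We have $S_\alpha(n) = S_1 + S_2$, where
$$S_1 = \sum_{k=1}^{b_\ell q_\ell} \left( \{k\alpha\} - \frac12 \right) \quad\text{and}\quad S_2 = \sum_{m = b_\ell q_\ell + 1}^{n} \left( \{m\alpha\} - \frac12 \right).$$
We recall (1.15): $\alpha - p_i/q_i = \varepsilon_i/q_i$ where $|\varepsilon_i| < 1/q_{i+1}$. We can rewrite $S_1$ as follows:
$$S_1 = S_\alpha(b_\ell q_\ell) = \sum_{k=1}^{b_\ell q_\ell} \left\{ \frac{k p_\ell}{q_\ell} + \frac{k \varepsilon_\ell}{q_\ell} \right\} - \frac{b_\ell q_\ell}{2}.$$
We distinguish two cases.

Case 1: $\varepsilon_\ell > 0$. In this case
$$\sum_{k=1}^{b_\ell q_\ell} \left\{ \frac{k p_\ell}{q_\ell} + \frac{k \varepsilon_\ell}{q_\ell} \right\} = \sum_{k=1}^{b_\ell q_\ell} \left\{ \frac{k p_\ell}{q_\ell} \right\} + \sum_{k=1}^{b_\ell q_\ell} \frac{k \varepsilon_\ell}{q_\ell} = b_\ell \frac{q_\ell (q_\ell - 1)}{2 q_\ell} + \frac{b_\ell q_\ell (b_\ell q_\ell + 1) \varepsilon_\ell}{2 q_\ell} = \frac{b_\ell q_\ell}{2} - \frac{b_\ell}{2} + \frac{b_\ell}{2} (b_\ell q_\ell + 1) \varepsilon_\ell.$$
In the last step we used the facts that $k \varepsilon_\ell / q_\ell < 1/q_\ell$ for $k \le b_\ell q_\ell$ $(< q_{\ell+1})$, and also that the residue of $k p_\ell$ modulo $q_\ell$, as $k$ runs, is a permutation of $0, 1, 2, \ldots, q_\ell - 1$. Therefore,
$$S_1 = -\frac{b_\ell}{2} + \frac{b_\ell}{2} (b_\ell q_\ell + 1)\, \varepsilon_\ell.$$

Case 2: $\varepsilon_\ell < 0$. In this case, if $k \not\equiv 0 \pmod{q_\ell}$, we have
$$\left\{ \frac{k p_\ell}{q_\ell} + \frac{k \varepsilon_\ell}{q_\ell} \right\} = \left\{ \frac{k p_\ell}{q_\ell} \right\} + \frac{k \varepsilon_\ell}{q_\ell},$$
since $|k \varepsilon_\ell / q_\ell| < 1/q_\ell$ for $1 \le k \le b_\ell q_\ell$; and if $k \equiv 0 \pmod{q_\ell}$ and $1 \le k \le b_\ell q_\ell$, then
$$\left\{ \frac{k p_\ell}{q_\ell} + \frac{k \varepsilon_\ell}{q_\ell} \right\} = 1 + \frac{k \varepsilon_\ell}{q_\ell}.$$
Repeating the summations in Case 1, with the $b_\ell$ extra terms equal to 1 coming from the multiples of $q_\ell$, we have
$$S_1 = \frac{b_\ell}{2} + \frac{b_\ell}{2} (b_\ell q_\ell + 1)\, \varepsilon_\ell.$$
In both cases $S_1 = -\frac{b_\ell}{2} \left( 1 - (b_\ell q_\ell + 1) |\varepsilon_\ell| \right) \operatorname{sign}(\varepsilon_\ell)$.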
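Next we rewrite $S_2$:
$$S_2 = \sum_{m=1}^{n'} \{ b_\ell q_\ell \alpha + m\alpha \} - \frac{n'}{2}, \quad\text{where } n' = \sum_{1 \le i < \ell} b_i q_i.$$
To evaluate $S_2$, we again distinguish two cases.

Case 1: $\varepsilon_\ell > 0$. In this case, $\{ b_\ell q_\ell \alpha + m\alpha \} = \{ b_\ell q_\ell \alpha \} + \{ m\alpha \}$, since the sum of the fractional parts on the right-hand side is less than 1. Indeed, using the standard notation $\|x\|$ for the distance of $x$ from the nearest integer, we have $\| b_\ell q_\ell \alpha \| \le a_\ell \| q_\ell \alpha \|$, $\| m\alpha \| \ge \| q_{\ell-1} \alpha \|$ for all $m < q_\ell$, and $(a_\ell q_\ell + q_{\ell-1}) \alpha = q_{\ell+1} \alpha$ has the property that
$$\operatorname{sign}(\varepsilon_{\ell+1}) = \operatorname{sign}(q_{\ell+1} \alpha - p_{\ell+1}) = \operatorname{sign}(\varepsilon_{\ell-1}) = \operatorname{sign}(q_{\ell-1} \alpha - p_{\ell-1}).$$
So we have
$$\{ b_\ell q_\ell \alpha + m\alpha \} = \{ b_\ell q_\ell (p_\ell/q_\ell + \varepsilon_\ell/q_\ell) \} + \{ m\alpha \} = b_\ell \varepsilon_\ell + \{ m\alpha \},$$
and
$$S_2 = \sum_{m=1}^{n'} \{ b_\ell q_\ell \alpha + m\alpha \} - \frac{n'}{2} = n' b_\ell \varepsilon_\ell + S_\alpha(n').$$

Case 2: $\varepsilon_\ell < 0$. Then $\{ b_\ell q_\ell \alpha + m\alpha \} = \{ b_\ell q_\ell \alpha \} + \{ m\alpha \} - 1$. Indeed,
$$\{ b_\ell q_\ell \alpha \} = \{ b_\ell q_\ell (p_\ell/q_\ell + \varepsilon_\ell/q_\ell) \} = b_\ell \varepsilon_\ell + 1, \qquad \{ m\alpha \} = \{ m (p_\ell/q_\ell + \varepsilon_\ell/q_\ell) \} = \{ m p_\ell / q_\ell \} + \frac{m}{q_\ell} \varepsilon_\ell,$$
and
$$\{ m p_\ell / q_\ell \} + \frac{(b_\ell q_\ell + m)}{q_\ell} \varepsilon_\ell + 1 > 1,$$
since $b_\ell q_\ell + n' < q_{\ell+1}$ and $|\varepsilon_\ell| < 1/q_{\ell+1}$. Thus we have $\{ b_\ell q_\ell \alpha + m\alpha \} = \{ b_\ell q_\ell \alpha \} + \{ m\alpha \} - 1$, which equals
$$(1 + b_\ell \varepsilon_\ell) + \{ m\alpha \} - 1 = b_\ell \varepsilon_\ell + \{ m\alpha \},$$
and so again we have $S_2 = n' b_\ell \varepsilon_\ell + S_\alpha(n')$.

Summarizing,
$$S_\alpha(n) = S_\alpha(n') - \frac{b_\ell}{2} \left( 1 - b_\ell q_\ell |\varepsilon_\ell| - 2 n' |\varepsilon_\ell| - |\varepsilon_\ell| \right) \operatorname{sign}(\varepsilon_\ell),$$
and Ostrowski's formula (1.55) follows by induction. □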
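Ostrowski's formula (1.55) can be compared with the direct definition (1.43) on a computer. The sketch below is an illustration for $\alpha = \sqrt2 = [1; 2, 2, 2, \ldots]$ only, using the indexing $q_1 = 1$, $q_2 = a_1$ from above; the function names are hypothetical, the digits $b_i$ are obtained by the greedy algorithm (which satisfies the Extra Rule), and everything is done in floating point, so agreement holds only up to rounding error.

```python
import math

def convergents_sqrt2(count):
    """Lists q, p with q[i] = q_i, p[i] = p_i for 1 <= i <= count (index 0 unused)."""
    q = [None, 1, 2]   # q_1 = 1, q_2 = a_1 = 2
    p = [None, 1, 3]   # p_1 = a_0 = 1, p_2 = a_0*a_1 + 1 = 3
    while len(q) <= count:
        q.append(2 * q[-1] + q[-2])   # all partial quotients a_i = 2 for i >= 1
        p.append(2 * p[-1] + p[-2])
    return q, p

def ostrowski_digits(n, q):
    """Greedy expansion n = sum b_i q_i as in (1.54); returns {i: b_i}. Requires n >= 1."""
    digits = {}
    i = max(j for j in range(1, len(q)) if q[j] <= n)
    while n > 0:
        digits[i], n = divmod(n, q[i])
        i -= 1
    return digits

def S_ostrowski(n, alpha, q, p):
    """Right-hand side of (1.55)."""
    digits = ostrowski_digits(n, q)
    total = 0.0
    prefix = 0                       # sum of b_j q_j over j < i
    for i in sorted(digits):
        b = digits[i]
        eps = q[i] * alpha - p[i]
        sign = 1.0 if eps > 0 else -1.0
        a_eps = abs(eps)
        total += sign * b * (-0.5 + b * q[i] * a_eps / 2 + prefix * a_eps + a_eps / 2)
        prefix += b * q[i]
    return total

def S_direct(n, alpha):
    """Left-hand side: the definition (1.43)."""
    return sum((k * alpha) % 1.0 - 0.5 for k in range(1, n + 1))
```

With `q, p = convergents_sqrt2(12)`, the values `S_ostrowski(n, math.sqrt(2), q, p)` and `S_direct(n, math.sqrt(2))` can be compared for every $n$ below the largest $q_i$ generated.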
Ostrowski used his formula to study the maximum fluctuation of the sum $S_\alpha(n)$ as $\alpha$ is fixed and $n$ runs in a long interval. As an illustration, we mention without proof the following result.

Proposition 1.4 (Ostrowski's large fluctuation result). Suppose the partial quotients of $\alpha = [a_0; a_1, \ldots]$ form a bounded sequence: $a_i \le A$ for all $i$ (this covers the class of quadratic irrationals). Then there are positive constants $0 < c_1 < 1$ and $c_2 > 0$ (possibly depending on $A$) such that, for every sufficiently large $N$, the interval $c_1 N < n < N$ contains an integer $n_1$ with the property
$$S_\alpha(n_1) > c_2 \log N,$$
and also the interval $c_1 N < n < N$ contains another integer $n_2$ with
$$S_\alpha(n_2) < -c_2 \log N.$$
1.2.2 Counting Lattice Points in General

We conclude this section with a short general discussion about lattice point problems. It is fair to say that there is no such thing as a coherent "lattice point theory" (yet). What we have instead are two unrelated subjects: (a) the two famous old lattice point problems and a lot of related partial results and (b) Minkowski's well-known lattice point theorem(s), as the basic result(s) of the so-called geometry of numbers.

A possible vague description of what "lattice point theory" should mean may go like this: the main question is to determine, or at least estimate, the number of lattice points in a "reasonable" region in the plane and in higher dimensions.

Notice that the one-dimensional problem is trivial. The only "reasonable" set in the real line is an interval, and every interval $[a, b) \subset \mathbb{R}$ contains either $\lfloor b - a \rfloor$ or $\lceil b - a \rceil$ integers (lower or upper integral part).

By contrast, the two-dimensional problem is far from trivial. What are the "reasonable" sets in the plane? The first novelty here is that we have many natural candidates, such as

1. polygons,
2. smooth regions like the circle, and other quadratic shapes (ellipse, hyperbola), and
3. all convex regions.

Some natural questions have an easy answer (e.g., Pick's theorem about lattice polygons; see below); other problems are extremely hard and have been open for more than 200 years (e.g., Gauss's well-known Circle Problem). Theorem 1.1 (or Theorem 1.2) is in the middle in the sense that it is a lattice point counting result that is neither simple nor hopeless.

In the rest of the section we collect some simple results that will be repeatedly used later.
1.2.2.1 Pick's Theorem: Complete Answer for Lattice Polygons

A polygon is called simple if it does not intersect itself. A simple polygon divides the plane into two regions: a bounded and simply connected "inside" (or interior) and an unbounded "outside"; this is a special case of a well-known theorem of Jordan. In the rest a polygon always means a simple polygon. Let $P$ be a lattice polygon, meaning that every vertex of the polygon $P$ is a lattice point $(k, l) \in \mathbb{Z}^2$. Let $B(P)$ denote the number of lattice points on the boundary of $P$, let $I(P)$ denote the number of lattice points inside $P$, and finally let $A(P)$ denote the area of $P$.

Proposition 1.5 (Pick's theorem). Every simple lattice polygon $P$ satisfies the equation
$$\frac12 B(P) + I(P) = A(P) + 1.$$

A lattice triangle or parallelogram is called empty if it contains no lattice point inside, and contains, respectively, 3 or 4 lattice points on the boundary (the "vertices"). We have the following simple corollary of Proposition 1.5.

Corollary 1.6. Every empty lattice triangle or parallelogram has area, respectively, 1/2 or 1.

The standard way of proving Pick's theorem is to prove Corollary 1.6 first and then extend it to arbitrary polygons by induction (since every polygon is a union of triangles).

It is fair to say that Theorem 1.2 (i.e., counting lattice points in right triangles of irrational slope $\alpha$, where one vertex is the origin and one side is on the $x$-axis) is the simplest case beyond Pick's theorem. And the simplest case already exhibits a central limit theorem.

Pick's theorem (Proposition 1.5) was an "exact result"; here is another one. Consider the following "half-open" version of the unit square:
$$P = \{ (x, y) :\ 0 \le x < 1 \text{ and } 0 \le y < 1 \}. \tag{1.56}$$
In other words, from the closed unit square $[0, 1]^2$ we remove the top side and the right-hand side (each a unit interval); this is how we get $P$. $P$ contains exactly one lattice point (the origin), and every translated copy $P + v$ of $P$ contains exactly one lattice point in the plane.

Similarly, let $P$ be an arbitrary (not necessarily empty) lattice parallelogram; in fact, we assume that $P$ is "half open" the same way as (1.56). Then again every translated copy $P + v$ of $P$ contains the same number of lattice points, and the common value is the area of $P$.

In general, we can extend it to all centrally symmetric polygons. Indeed, every centrally symmetric polygon can be decomposed into parallelograms; we leave the easy proof to the reader. Thus we obtain the following simple but elegant result.

Proposition 1.7. Let $P$ be a centrally symmetric lattice polygon with half-open border the same way as (1.56). Then every translated copy $P + v$ of $P$ contains the same number of lattice points, and the common value is the area of $P$. □

Here is another simple result.

Proposition 1.8. Let $A \subset \mathbb{R}^2$ be a Lebesgue measurable set in the plane with finite measure (that we call the "area"). Then
$$\int_0^1 \!\! \int_0^1 \left| (A + x) \cap \mathbb{Z}^2 \right| dx = \operatorname{area}(A), \tag{1.57}$$
where $A + x$ is the translated copy of the set $A$, translated by the vector $x \in \mathbb{R}^2$. □
Finally, we mention the almost trivial

Proposition 1.9. Let $S \subset \mathbb{R}^2$ be a region inside a simple curve, and assume that the curve has a well-defined finite arc length (= perimeter of $S$). Then
$$\operatorname{Area}(S) - O(\operatorname{Perimeter}(S)) \le |S \cap \mathbb{Z}^2| \le \operatorname{Area}(S) + O(\operatorname{Perimeter}(S)) + 1. \tag{1.58}$$

Note that Proposition 1.9 is basically best possible. Indeed, let $S$ be the square $[-\varepsilon, n + \varepsilon]^2$: it has area $(n + 2\varepsilon)^2 = n^2 + 4\varepsilon n + 4\varepsilon^2 = n^2 + o(1)$ if $\varepsilon > 0$ is small enough, the perimeter of $S$ is $4n + o(1)$, and the number of lattice points inside $S$ is $(n + 1)^2 = n^2 + 2n + 1$. Thus we have
$$\text{number of lattice points inside } S = \operatorname{Area} + \frac12 \operatorname{Perimeter} + 1 + o(1). \tag{1.59}$$
Here $S$ is an axis-parallel square; the situation is completely different for tilted squares where the slope is a (say) quadratic irrational, such as $\sqrt2$ or $\sqrt3$. Then the maximum fluctuation (around the area) drops from $\pm\operatorname{Perimeter}$ in (1.59) to $\pm\log(\operatorname{Perimeter})$, and the typical fluctuation drops further to $\pm\sqrt{\log(\operatorname{Perimeter})}$. In fact, we have a central limit theorem, a variant of Theorem 1.2; see Sect. 4.5.
1.3 First Warm-Up: Van der Corput Sequence (When Independence Is Given)

In 1935 van der Corput [Co] constructed his famous "digit reversal sequence" $t_0, t_1, t_2, \ldots$, which in many respects can be considered an oversimplified model for the irrational rotation. At the same time, it is the simplest example of a "most uniform" infinite sequence in the unit interval. The van der Corput sequence goes as follows:
$$0,\ \frac12,\ \frac14,\ \frac34,\ \frac18,\ \frac58,\ \frac38,\ \frac78,\ \frac1{16},\ \frac9{16},\ \frac5{16},\ \frac{13}{16},\ \frac3{16},\ \frac{11}{16},\ \frac7{16},\ \frac{15}{16},$$
$$\frac1{32},\ \frac{17}{32},\ \frac9{32},\ \frac{25}{32},\ \frac5{32},\ \frac{21}{32},\ \frac{13}{32},\ \frac{29}{32},\ \frac3{32},\ \frac{19}{32},\ \frac{11}{32},\ \frac{27}{32},\ \frac7{32},\ \frac{23}{32},\ \frac{15}{32},\ \frac{31}{32},\ \ldots$$
Note that here $t_1 = 1/2$ is obtained from $t_0 = 0$ by a shift of $1/2$, then the first two elements $t_0, t_1$ are shifted by $1/4$, then the first four elements $t_0, t_1, t_2, t_3$ are shifted by $1/8$, then the first eight elements are shifted by $1/16$, then the first sixteen elements are shifted by $1/32$, and so on.

An alternative definition of $t_n$ is the following. We write down $n$ in binary form (say, $13 = 8 + 4 + 1 = 1101$), then we write its digits in reverse order and prefix it with "0" and "." like this:
$$t_{13} = 0.1011 = \frac12 + \frac18 + \frac1{16} = \frac{11}{16}.$$
In general, if
$$n = 2^{k_1} + 2^{k_2} + 2^{k_3} + \cdots \quad\text{with } k_1 > k_2 > k_3 > \cdots \ge 0, \tag{1.60}$$
then
$$t_n = 2^{-k_1 - 1} + 2^{-k_2 - 1} + 2^{-k_3 - 1} + \cdots. \tag{1.61}$$
The van der Corput sequence $t_0, t_1, t_2, \ldots$ exhibits a clear-cut dyadic nested structure; it is well illustrated by the following three properties of the sequence.

Property A: The set $\{ t_i :\ 0 \le i < 2^k \}$ of the first $2^k$ elements of the van der Corput sequence is the equidistant set $\{ j 2^{-k} :\ 0 \le j < 2^k \}$ in different order.

Property B: Let $I \subset [0, 1)$ be an arbitrary half-open subinterval of length $2^{-k}$ for some integer $k \ge 1$, and let $n$ be an arbitrary integer divisible by $2^k$. Then the number of elements of the set $\{ t_i :\ 0 \le i < n \}$ that fall into the interval $I$ is exactly $n 2^{-k}$.

Property C ("Two Distances"): If $2^k \le n < 2^{k+1}$ then the consecutive points of the set $\{ t_i :\ 0 \le i < n \}$ have at most two distances: $2^{-k}$ and $2^{-k-1}$.
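The digit-reversal definition (1.60)–(1.61) and Properties A and C are easy to check with exact rational arithmetic. The following is a small sketch; the function names are illustrative, and `gaps` measures distances between neighbouring points inside $[0, 1)$ without wrapping around.

```python
from fractions import Fraction

def van_der_corput(n):
    """t_n: reverse the binary digits of n about the binary point, as in (1.60)-(1.61)."""
    t, denom = Fraction(0), 2
    while n:
        n, bit = divmod(n, 2)        # peel off the lowest binary digit
        t += Fraction(bit, denom)    # lowest digit of n becomes the first digit of t_n
        denom *= 2
    return t

def gaps(points):
    """Set of distances between consecutive points after sorting."""
    pts = sorted(points)
    return {b - a for a, b in zip(pts, pts[1:])}
```

For example, `van_der_corput(13)` reproduces the worked value $11/16$, and `gaps(van_der_corput(i) for i in range(n))` can be inspected for any $n$ between $2^k$ and $2^{k+1}$.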
We have the perfect analogs of Properties A–C for the irrational rotation $\alpha, 2\alpha, 3\alpha, \ldots$ (mod 1). The Three-distance theorem mentioned in Sect. 1.1 is an obvious analog of Property C. The Lemma on Bounded Error Initial Segments in Sect. 1.1 is an analog of Property A, and the Lemma on Just Intervals is the analog of Property B.

We can say, intuitively speaking, that the van der Corput sequence $t_0, t_1, t_2, \ldots$ behaves like a "fake irrational rotation where $\alpha = \sqrt2$ is replaced by $1/2$ (and $1/4$ and $1/8$ and so on)."

Since $t_k$ is uniformly distributed in the unit interval $[0, 1)$ (see Properties A and B), it is natural to take the difference $t_k - 1/2$; in fact, we study the sum
$$S(n) = \sum_{k=0}^{n-1} \left( t_k - \frac12 \right), \tag{1.62}$$
which is a perfect analog of sum (1.43). As a warm-up result for Theorem 1.1 (and Theorem 1.2), we are going to prove the following central limit theorem for $S(n)$ as $n$ runs in the interval $0 \le n < 2^m$, where $m \ge 2$ is any integer.

Proposition 1.10 (Central limit theorem for the van der Corput sequence). For any integer $m \ge 2$ and any real numbers $-\infty < A < B < \infty$,
$$\frac{1}{2^m} \left| \left\{ 0 \le n < 2^m :\ A \le \frac{S(n) + m/8}{\sqrt{m/192}} \le B \right\} \right| = \frac{1}{\sqrt{2\pi}} \int_A^B e^{-u^2/2}\, du + O\!\left( m^{-1/10} \log m \right). \tag{1.63}$$
Note that the implicit constant in the error term is absolute.

Remarks. Before proving Proposition 1.10, we want to make a short detour about irregularities of distribution (a counterpart of uniform distribution) and how it is related to our subject (for more about "irregularities of distribution," see the books [Be-Ch, Cha, Ma]). The introduction of the van der Corput sequence was motivated by the following question of van der Corput himself (raised in the very same paper where the van der Corput sequence was defined [Co]): What is the most uniformly distributed infinite sequence in the unit interval?

Of course, the most uniformly distributed $n$-element point set in the unit interval is the equidistant set $S = \{ 0, 1/n, 2/n, \ldots, (n-1)/n \}$ (or any translate of $S$ modulo one). Indeed, given any two subintervals $I_1$ and $I_2$, of equal length, in the unit interval $[0, 1)$, the number of elements of the set $S$ that lie in $I_1$ differs from the number of elements of $S$ that lie in $I_2$ by at most one (the actual number is the upper or lower integral part of $n|I|$ where $|I|$ is the length of the interval $I$). An error of at most 1 is clearly a very strong result.

The simplest variant of van der Corput's problem goes as follows:

Question 1.11 ([Co]). Does there exist an infinite sequence $\alpha_1, \alpha_2, \alpha_3, \ldots$ in $[0, 1)$ such that for any integer $n \ge 1$ and for any subinterval $I \subset [0, 1)$,
$$\left| \sum_{\substack{1 \le i \le n:\\ \alpha_i \in I}} 1 - n|I| \right| < \text{absolute constant?}$$
Here $|I|$ is the length of the interval $I$.

In 1945 van Aardenne-Ehrenfest [Aa1] proved that the answer to Question 1.11 is "no," and a few years later she proved the following quantitative result. First we introduce a notation for the max-discrepancy:
$$\Delta(N) = \Delta(N; \alpha_1, \ldots, \alpha_N) = \max_{1 \le n \le N}\ \sup_{I \subset [0, 1)} \left| \sum_{\substack{1 \le i \le n:\\ \alpha_i \in I}} 1 - n|I| \right|.$$
In 1949 van Aardenne-Ehrenfest [Aa2] proved that for any infinite sequence in the unit interval
$$\Delta(N) > c_1 \frac{\log\log N}{\log\log\log N},$$
where $c_1 > 0$ is a positive absolute constant. After van Aardenne-Ehrenfest's breakthrough results the main question was

Question 1.12. How large is the max-discrepancy $\Delta(N)$?

In 1972 Schmidt [Schm] settled this problem by proving that
$$\Delta(N) > c_2 \log N,$$
where $c_2 > 0$ is a positive absolute constant (e.g., $c_2 = 1/50$ is a good choice). The order of magnitude $\log N$ in Schmidt's theorem is the best possible. There are several infinite sequences with max-discrepancy $\Delta(N) = O(\log N)$; the van der Corput sequence is perhaps the simplest construction. Further examples are the irrational rotation $k\alpha$ (mod 1), $k = 1, 2, 3, \ldots$, where $\alpha$ is any quadratic irrational (this follows from the Discrepancy Lemma in Sect. 1.1, and it goes back to the early works of Hardy–Littlewood [Ha-Li1, Ha-Li2] and Ostrowski [Os]).

Between van Aardenne-Ehrenfest (1945–1949) and W. M. Schmidt (1972), the most important work was done by Roth [Ro], who proved in 1954 that the $L_2$-discrepancy exceeds a constant multiple of $\sqrt{\log N}$. More precisely, let $\alpha_1, \alpha_2, \ldots, \alpha_N$ be an arbitrary $N$-element point set in the unit interval $[0, 1)$, and define the $L_2$-discrepancy as
$$\Delta_2(N) = \Delta_2(N; \alpha_1, \ldots, \alpha_N) = \left( \frac{1}{N} \sum_{n=1}^{N} \int_0^1 \left( \sum_{\substack{1 \le i \le n:\\ 0 \le \alpha_i < x}} 1 - nx \right)^{2} dx \right)^{1/2}.$$
Roth's theorem says that for any $N$-element set in the unit interval
$$\Delta_2(N) = \Delta_2(N; \alpha_1, \ldots, \alpha_N) > c_3 \sqrt{\log N},$$
where $c_3 > 0$ is a positive absolute constant (e.g., $c_3 = 1/20$ is a good choice).

In 1956 Davenport [Da] proved that the order of magnitude $\sqrt{\log N}$ in Roth's theorem is best possible. Davenport considered the following "symmetric" $2N$-element point set coming from the irrational rotation:
$$S_\alpha^{\pm} = S_\alpha^{\pm}(N) = \{ k\alpha \ (\mathrm{mod}\ 1) :\ k = \pm 1, \pm 2, \pm 3, \ldots, \pm N \},$$
where $\alpha$ is a badly approximable number, meaning that $a_n = O(1)$ where $a_n$ is the $n$th partial quotient in the continued fraction $\alpha = [a_0; a_1, a_2, a_3, \ldots]$ of $\alpha$ (in other words, the partial quotients are bounded; this is certainly the case for the quadratic irrationals, since periodicity implies boundedness). Davenport actually proved that for any $n \ge 2$
$$\int_0^{1/2} \left( \sum_{\substack{1 \le k \le n:\\ k\alpha \in (-x, x) \ (\mathrm{mod}\ 1)}} 1 - 2nx \right)^{2} dx = O(\log n).$$
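Davenport's result is important for us, so we outline the short proof. The proof is in fact a rather straightforward application of Parseval's formula and goes as follows. Let $\chi_x(t)$ denote the characteristic function of the interval $(-x, x)$ with $0 < x < 1/2$, extended periodically modulo one. It is easy to compute the Fourier series:
$$\chi_x(t) - 2x = \sum_{\ell=1}^{\infty} \frac{2 \sin(2\pi \ell x)}{\pi \ell} \cos(2\pi \ell t).$$
Thus we have
$$\sum_{\substack{1 \le k \le n:\\ k\alpha \in (-x, x) \ (\mathrm{mod}\ 1)}} 1 - 2nx = \sum_{k=1}^{n} \left( \chi_x(k\alpha) - 2x \right) = \sum_{\ell=1}^{\infty} \frac{2 \sin(2\pi \ell x)}{\pi \ell} \left( \sum_{k=1}^{n} \cos(2\pi \ell k \alpha) \right) = \sum_{\ell=1}^{\infty} \frac{2 \sin(2\pi \ell x)}{\pi \ell} \left( \frac{\sin((2n + 1) \pi \ell \alpha)}{2 \sin(\pi \ell \alpha)} - \frac12 \right).$$
By Parseval's formula we have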
$$\int_0^{1/2} \left( \sum_{\substack{1 \le k \le n:\\ k\alpha \in (-x, x) \ (\mathrm{mod}\ 1)}} 1 - 2nx \right)^{2} dx = O\!\left( \sum_{\ell=1}^{\infty} A^2(\ell; n) \right),$$
where
$$|A(\ell; n)| = \min\left\{ \frac{n}{\ell},\ \frac{1}{\ell \|\ell\alpha\|} \right\}.$$
Using the trivial fact that the sum $\sum_k k^{-2} = O(1)$, it is easy to see that the main contribution of
$$\sum_{\ell=1}^{\infty} A^2(\ell; n)$$
comes from the convergent denominators $q_i \le n$ of $\alpha$ (where $p_i/q_i$ is the $i$th convergent of $\alpha$). Because $\alpha$ is badly approximable, we have $\inf_{\ell \ge 1} \ell \|\ell\alpha\| > 0$, and so the contribution of the convergent denominators satisfying $q_i \le n$ is $O(\log n)$. Thus we have
$$\sum_{\ell=1}^{\infty} A^2(\ell; n) = O\!\left( \sum_{q_i \le n} 1 \right) = O(\log n),$$
proving Davenport's theorem. This simple argument, an application of Parseval's formula, is the best (and quickest) way to understand why the norming factor in Theorems 1.1 and 1.2 is $\sqrt{\log N}$. See also the closely related paper of Sós and Zaremba [So-Za].

Unfortunately the proof of Theorem 1.1 is not simple at all; this is why we decided to include an intermediate result such as Proposition 1.10. We conclude Sect. 1.3 with its proof.

Proof of Proposition 1.10. First we give an "explicit formula" for the sum $S(n)$ [defined in (1.62)] in terms of the binary digits of $n$. The simplest case is $n = 2^m$ where $m$ is an integer; then by Property A we have
$$S(2^m) = \sum_{k=0}^{2^m - 1} \left( k 2^{-m} - \frac12 \right) = 2^{-m} \binom{2^m}{2} - 2^{m-1} = -\frac12. \tag{1.64}$$
If
$$n = 2^{k_1} + 2^{k_2} + 2^{k_3} + \cdots \quad\text{with } k_1 > k_2 > k_3 > \cdots \ge 0,$$
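then by repeated application of (1.64) and (1.61) we have
$$S(n) = -\frac12 + \left( -\frac12 + \frac{1}{2^{k_1 - k_2 + 1}} \right) + \left( -\frac12 + \frac{1}{2^{k_2 - k_3 + 1}} + \frac{1}{2^{k_1 - k_3 + 1}} \right) + \left( -\frac12 + \frac{1}{2^{k_3 - k_4 + 1}} + \frac{1}{2^{k_2 - k_4 + 1}} + \frac{1}{2^{k_1 - k_4 + 1}} \right) + \cdots \tag{1.65}$$
We prefer to rewrite (1.65) in the following way: if $2^{m-1} \le n < 2^m$ and
$$n = \delta_0 2^{m-1} + \delta_1 2^{m-2} + \delta_2 2^{m-3} + \ldots + \delta_{m-1}, \tag{1.66}$$
where $\delta_0 = 1$, $\delta_i \in \{0, 1\}$ for $1 \le i \le m - 1$, then
$$S(n) = \frac12 \left( -\sum_{i=0}^{m-1} \delta_i + \sum_{0 \le i < j < m} \delta_i \delta_j 2^{i-j} \right). \tag{1.67}$$
Notice that (1.67) is a simplified version of Ostrowski's formula (1.55).

We can clearly extend the range $2^{m-1} \le n < 2^m$ in (1.67) to $0 \le n < 2^m$ by allowing $\delta_0 = 0$; then, as $n$ runs in the interval $0 \le n < 2^m$, the binary digits $\delta_0 = \delta_0(n)$, $\delta_1 = \delta_1(n)$, $\delta_2 = \delta_2(n)$, $\ldots$, $\delta_{m-1} = \delta_{m-1}(n)$ of $n$ form independent random variables. It is very easy to evaluate the "expectation"
$$E_m = 2^{-m} \sum_{n=0}^{2^m - 1} S(n) = \frac12 \left( -\sum_{i=0}^{m-1} \frac12 + \sum_{0 \le i < j < m} \frac14\, 2^{i-j} \right), \tag{1.68}$$
since $\Pr[\delta_i \delta_j = 1] = \Pr[\delta_i = 1] \Pr[\delta_j = 1] = 1/2 \cdot 1/2 = 1/4$ and so $\Pr[\delta_i \delta_j = 0] = 3/4$ for all $i \ne j$; here of course $\Pr$ stands for probability. We can easily find the exact value of sum (1.68):
$$E_m = -\frac{m}{4} + \frac18 \left( \frac12 + \left( \frac12 + \frac14 \right) + \left( \frac12 + \frac14 + \frac18 \right) + \ldots + \left( \frac12 + \frac14 + \ldots + \frac{1}{2^{m-1}} \right) \right)$$
$$= -\frac{m}{4} + \frac18 \left( \left( 1 - \frac12 \right) + \left( 1 - \frac14 \right) + \left( 1 - \frac18 \right) + \ldots + \left( 1 - \frac{1}{2^{m-1}} \right) \right)$$
$$= -\frac{m}{4} + \frac18 \left( m - 1 - \left( 1 - \frac{1}{2^{m-1}} \right) \right) = -\frac{m}{8} - \frac14 + 2^{-m-2}. \tag{1.69}$$
Next we compute the "variance" ($\mathbf{E}$ means expectation)
$$V_m = \mathbf{E} \left( \frac12 \left( -\sum_{i=0}^{m-1} \left( \delta_i - \frac12 \right) + \sum_{0 \le i < j < m} \left( \delta_i \delta_j - \frac14 \right) 2^{i-j} \right) \right)^{2} = \text{DiagPart} + \text{OffDiagPart}, \tag{1.70}$$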
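where the diagonal and off-diagonal parts are defined as follows:
$$\text{DiagPart} = \frac14\, \mathbf{E} \left( \sum_{i=0}^{m-1} \left( \delta_i - \frac12 \right)^{2} + \sum_{0 \le i < j < m} \left( \delta_i \delta_j - \frac14 \right)^{2} 4^{i-j} \right) \tag{1.71}$$
and
$$\text{OffDiagPart} = \frac14\, \mathbf{E} \left( \sum_{\substack{0 \le i_1, i_2 < m:\\ i_1 \ne i_2}} \left( \delta_{i_1} - \frac12 \right) \left( \delta_{i_2} - \frac12 \right) \right) - \frac14\, \mathbf{E} \left( 2 \sum_{i_1 = 0}^{m-1}\ \sum_{0 \le i_2 < j < m} \left( \delta_{i_1} - \frac12 \right) \left( \delta_{i_2} \delta_j - \frac14 \right) 2^{i_2 - j} \right) + \frac14\, \mathbf{E} \left( {\sum\sum}^{*} \right), \tag{1.72}$$
where
$${\sum\sum}^{*} = \sum_{0 \le i_1 < j_1 < m}\ \sum_{\substack{0 \le i_2 < j_2 < m:\\ (i_1, j_1) \ne (i_2, j_2)}} \left( \delta_{i_1} \delta_{j_1} - \frac14 \right) \left( \delta_{i_2} \delta_{j_2} - \frac14 \right) 2^{i_1 - j_1 + i_2 - j_2}. \tag{1.73}$$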
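Since
$$\mathbf{E} \left( \delta_i \delta_j - \frac14 \right)^{2} = \frac14 \left( \frac34 \right)^{2} + \frac34 \left( \frac14 \right)^{2} = \frac34 \cdot \frac14 \left( \frac34 + \frac14 \right) = \frac{3}{16},$$
the diagonal part is easy:
$$\text{DiagPart} = \frac14 \left( \sum_{i=0}^{m-1} (\pm 1/2)^2 + \sum_{0 \le i < j < m} \frac{3}{16}\, 4^{i-j} \right) = \frac{m}{16} + \frac{3}{64} \left( \frac14 + \left( \frac14 + \frac{1}{4^2} \right) + \left( \frac14 + \frac{1}{4^2} + \frac{1}{4^3} \right) + \ldots + \left( \frac14 + \frac{1}{4^2} + \ldots + \frac{1}{4^{m-1}} \right) \right)$$
$$= \frac{m}{16} + \frac{3}{64} \cdot \frac13 \left( \left( 1 - \frac14 \right) + \left( 1 - \frac{1}{4^2} \right) + \left( 1 - \frac{1}{4^3} \right) + \ldots + \left( 1 - \frac{1}{4^{m-1}} \right) \right)$$
$$= \frac{m}{16} + \frac{m - 1}{64} - \frac{1}{192} \left( 1 - \frac{1}{4^{m-1}} \right) = \frac{5m}{64} - \frac{1}{48} + \frac{1}{48 \cdot 4^{m}}. \tag{1.74}$$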
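To evaluate the off-diagonal part, we repeatedly use the well-known "product rule" $\mathbf{E}XY = \mathbf{E}X \cdot \mathbf{E}Y$ if $X$ and $Y$ are independent random variables. This eliminates the contribution of the first part on the right-hand side of (1.72); also in the middle part we can assume that either $i_1 = i_2$ or $i_1 = j$ (otherwise the contribution is zero). Note that with $i \ne k$,
$$\mathbf{E} \left( \delta_i - \frac12 \right) \left( \delta_i \delta_k - \frac14 \right) = \mathbf{E} \left( \delta_i^2 \delta_k - \frac12 \delta_i \delta_k - \frac14 \delta_i + \frac18 \right) = \mathbf{E} \left( \delta_i \delta_k - \frac12 \delta_i \delta_k - \frac14 \delta_i + \frac18 \right) = \frac14 - \frac18 - \frac18 + \frac18 = \frac18,$$
so, leaving aside the factor $\frac14$ in front, we have
$$\text{middle part of (1.72)} = -\frac28 \sum_{i_1 = 0}^{m-1}\ \sum_{0 \le i_2 = i_1 < j < m} 2^{i_1 - j} - \frac28 \sum_{i_1 = 0}^{m-1}\ \sum_{0 \le i_2 < j = i_1 < m} 2^{i_2 - i_1} = -\frac{m}{2} + O(1). \tag{1.75}$$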
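Finally, we study the expected value of (1.73). In view of the "product rule" we have four cases only: (1) $i_1 = i_2$, $j_1 \ne j_2$, (2) $i_1 = j_2$, (3) $i_2 = j_1$, (4) $i_1 \ne i_2$, $j_1 = j_2$. Note that with $i < j_1$, $i < j_2$, $j_1 \ne j_2$,
$$\mathbf{E} \left( \delta_i \delta_{j_1} - \frac14 \right) \left( \delta_i \delta_{j_2} - \frac14 \right) = \mathbf{E} \left( \delta_i^2 \delta_{j_1} \delta_{j_2} - \frac14 \delta_i (\delta_{j_1} + \delta_{j_2}) + \frac{1}{16} \right) = \mathbf{E} \left( \delta_i \delta_{j_1} \delta_{j_2} - \frac14 \delta_i (\delta_{j_1} + \delta_{j_2}) + \frac{1}{16} \right) = \frac18 - \frac18 + \frac{1}{16} = \frac{1}{16},$$
so, using $\sum_{j > i} 2^{i-j} = 1 - 2^{-(m-1-i)}$ and $\sum_{j > i} 4^{i-j} = \frac13 \left( 1 - 4^{-(m-1-i)} \right)$, we have
$$\sum_{0 \le i < j_1 < m}\ \sum_{\substack{0 \le i < j_2 < m:\\ j_1 \ne j_2}} \frac{1}{16}\, 2^{i - j_1 + i - j_2} = \frac{1}{16} \sum_{i=0}^{m-1} \left( \Big( \sum_{j > i} 2^{i-j} \Big)^{2} - \sum_{j > i} 4^{i-j} \right) = \frac{m}{24} + O(1), \tag{1.76}$$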
which is the contribution of case (1). Repeating the same argument, we obtain $m/24 + O(1)$ in case (4), and $m/16 + O(1)$ in each one of cases (2) and (3), where the two sums over the free indices factor and no diagonal term has to be removed. Summarizing, by (1.72), (1.75), and (1.76) we have
$$\text{OffDiagPart} = \frac14 \left( -\frac{m}{2} + O(1) + 2 \cdot \frac{m}{24} + 2 \cdot \frac{m}{16} + O(1) \right) = -\frac{7m}{96} + O(1). \tag{1.77}$$
By using (1.70), (1.74), and (1.77) we can evaluate the variance:
$$V_m = 2^{-m} \sum_{n=0}^{2^m - 1} (S(n) - E_m)^2 = \frac{5m}{64} + O(1) - \frac{7m}{96} + O(1) = \frac{m}{192} + O(1), \tag{1.78}$$
where $E_m$ is the expectation [see (1.69)]. □
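For small $m$ the explicit formula (1.67), the expectation $E_m$, and the variance $V_m$ can be computed exactly by enumerating all $0 \le n < 2^m$. The sketch below uses exact rational arithmetic; the function names are illustrative. In such a computation the mean agrees with the closed form $-m/8 - 1/4 + 2^{-m-2}$, and the variance increases by roughly $1/192$ per additional binary digit.

```python
from fractions import Fraction

def S_from_digits(n, m):
    """Explicit formula (1.67); delta_0 is the most significant of the m binary digits."""
    d = [(n >> (m - 1 - i)) & 1 for i in range(m)]
    quad = sum((Fraction(d[i] * d[j], 2 ** (j - i))
                for i in range(m) for j in range(i + 1, m)), Fraction(0))
    return (quad - sum(d)) / 2

def S_direct(n):
    """S(n) = sum_{k<n} (t_k - 1/2), with t_k obtained by binary digit reversal."""
    total = Fraction(0)
    for k in range(n):
        t, denom = Fraction(0), 2
        while k:
            k, bit = divmod(k, 2)
            t += Fraction(bit, denom)
            denom *= 2
        total += t - Fraction(1, 2)
    return total

def mean_and_variance(m):
    """Exact mean E_m and variance V_m of S(n) over 0 <= n < 2^m."""
    values = [S_from_digits(n, m) for n in range(2 ** m)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var
```

Calling `mean_and_variance(m)` for consecutive values of `m` shows both the linear drift of the mean and the much slower growth of the variance.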
1.3.1 Digit Sums and Generalized Digit Sums

Let's return to explicit formula (1.67): if $0 \le n < 2^m$ and
$$n = \delta_0 2^{m-1} + \delta_1 2^{m-2} + \delta_2 2^{m-3} + \ldots + \delta_{m-1},$$
where $\delta_i \in \{0, 1\}$ for $0 \le i \le m - 1$, then
$$S(n) = \frac12 \left( -\sum_{i=0}^{m-1} \delta_i + \sum_{0 \le i < j < m} \delta_i \delta_j 2^{i-j} \right). \tag{1.79}$$
As $n$ runs in the interval $0 \le n < 2^m$, the binary digits $\delta_0 = \delta_0(n)$, $\delta_1 = \delta_1(n)$, $\delta_2 = \delta_2(n)$, $\ldots$, $\delta_{m-1} = \delta_{m-1}(n)$ of $n$ form a sequence of independent random variables, each taking the values 0 and 1 with probability 1/2. (We assume that the reader is familiar with the elements of probability theory, including such concepts as independence, expectation, and variance.) It follows, therefore, from probability theory that the plain digit sum
$$\sum_{i=0}^{m-1} \delta_i = \sum_{i=0}^{m-1} \delta_i(n) \tag{1.80}$$
exhibits a central limit theorem. This special case, the binomial distribution, in fact goes back to De Moivre and Laplace. Namely, for any integer $m \ge 2$ and any real numbers $-\infty < A < B < \infty$,
$$\frac{1}{2^m} \left| \left\{ 0 \le n < 2^m :\ A \le \frac{\sum_{i=0}^{m-1} \delta_i - m/2}{\sqrt{m}/2} \le B \right\} \right| = \frac{1}{\sqrt{2\pi}} \int_A^B e^{-u^2/2}\, du + o(1),$$
where the error term $o(1)$ tends to zero as $m \to \infty$. In view of (1.79) the van der Corput sum $S(n)$ is more complicated than the plain digit sum (1.80). Nevertheless, by using a decomposition trick in (1.79), we are able to prove a central limit theorem.
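The De Moivre–Laplace statement for the plain digit sum can be illustrated numerically, since the number of $0 \le n < 2^m$ with digit sum $s$ is the binomial coefficient $\binom{m}{s}$. The following is a minimal sketch with illustrative function names; it compares the exact proportion with the normal integral for given $m$, $A$, $B$.

```python
import math

def digit_sum_probability(m, A, B):
    """Proportion of 0 <= n < 2^m whose digit sum s has A <= (s - m/2)/(sqrt(m)/2) <= B."""
    lo = m / 2 + A * math.sqrt(m) / 2
    hi = m / 2 + B * math.sqrt(m) / 2
    count = sum(math.comb(m, s) for s in range(m + 1) if lo <= s <= hi)
    return count / 2 ** m

def normal_probability(A, B):
    """(1/sqrt(2*pi)) * integral from A to B of exp(-u^2/2) du, via the error function."""
    return 0.5 * (math.erf(B / math.sqrt(2)) - math.erf(A / math.sqrt(2)))
```

For moderate $m$ the two quantities are already close; the remaining gap comes from the discreteness of the digit sum and shrinks as $m$ grows.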
1.3.2 A Decomposition Trick

First we introduce two parameters: let
$$L = L(m) = 3 (\log m / \log 2) \quad\text{and}\quad 2L < R = R(m) = m^{\eta}, \tag{1.81}$$
where the value of the constant $0 < \eta < 1/2$ will be specified later. Notice that
$$2^{-L} = 2^{-3 \log m / \log 2} = m^{-3}. \tag{1.82}$$
We decompose the digit sequence ı0 D ı0 .n/, ı1 D ı1 .n/, ı2 D ı2 .n/, : : :, ım1 D ım1 .n/ into groups of size R D R.m/: ı0 ; ı1 ; ı2 ; : : : ; ıR1 ; ıR ; ıRC1 ; ıRC2 ; : : : ; ı2R1 ; ı2R ; ı2RC1 ; ı2RC2 ; : : : ; ı3R1 ; and so on. For notational convenience assume that m is divisible by R D m , i.e., both R and m0 D m=R are integers. We can rewrite (1.79) in the form S.n/ D X1 C X2 C X3 C : : : C Xm0 C C Y1 C Y2 C Y3 C : : : C Ym0 1 C W; where for k D 1; 2; 3; : : : ; m0 we define 0 1B Xk D B 2@
X
ıi
.k1/Ri
for k D 1; 2; 3; : : : ; m0 1 we define
(1.83)
1 X
.k1/Ri <j
C ıi ıj 2i j C A;
(1.84)
1.3 First Warm-Up: Van der Corput Sequence—When Independence Is Given
Yk D
1 2
X
ıi ıj 2i j ;
39
(1.85)
0i
and finally W D
1 2
X
ıi ıj 2i j :
(1.86)
0i <j <m j i >L
First notice that W is negligible: by (1.82) and (1.86) W DO
! ! m m3 D O.1=m/: 2
(1.87)
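The bookkeeping behind the decomposition can be illustrated on a toy example. The sketch below is an illustration under stated assumptions rather than the author's construction: it assumes that every pair $i<j$ with $j-i\le L$ goes either to the block containing both indices (the pair part of $X_k$) or to the boundary it straddles (the $Y_k$ part), while all pairs with $j-i>L$ go to $W$. The helper name `decompose_pair_sum` and the toy parameters are hypothetical.

```python
import random
from math import comb

def decompose_pair_sum(delta, R, L):
    """Split sum_{i<j} delta_i*delta_j*2**(i-j) into block, boundary and far parts."""
    m = len(delta)
    assert m % R == 0 and 2 * L < R
    blocks = m // R
    x_part = [0.0] * blocks        # pairs inside one block with j - i <= L
    y_part = [0.0] * (blocks - 1)  # pairs straddling a block boundary with j - i <= L
    w_part = 0.0                   # pairs with j - i > L
    for i in range(m):
        for j in range(i + 1, m):
            term = delta[i] * delta[j] * 2.0 ** (i - j)
            if j - i > L:
                w_part += term
            elif i // R == j // R:
                x_part[i // R] += term
            else:
                # j - i <= L < R forces j into the block right after the block of i
                y_part[i // R] += term
    return x_part, y_part, w_part

if __name__ == "__main__":
    random.seed(1)
    m, R, L = 24, 8, 3  # toy sizes only; the text takes R = m**eta and L = 3*log2(m)
    delta = [random.randint(0, 1) for _ in range(m)]
    xs, ys, w = decompose_pair_sum(delta, R, L)
    total = sum(delta[i] * delta[j] * 2.0 ** (i - j)
                for i in range(m) for j in range(i + 1, m))
    print(total, sum(xs) + sum(ys) + w, w, comb(m, 2) * 2.0 ** (-L))
```

Every pair lands in exactly one of the three parts, and the far part is bounded by $\binom{m}{2}2^{-L}$, which is the estimate used for $W$.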
In view of (1.84) the random variables $X_1$, $X_2$, $X_3$, $\ldots$ depend on disjoint sets of $\delta_i$'s; therefore, they are independent and identically distributed random variables. What we used here was the following general result about functions of independent random variables.

Lemma 1.13. If $X$ and $Y$ are independent random variables and $f, g: \mathbb{R} \to \mathbb{R}$ are arbitrary functions, then $f(X)$ and $g(Y)$ are independent also.

The proof is trivial assuming the reader has a good understanding of the concept of independence; in fact, this is the level of expertise we assume from probability theory.

Similarly, since $L < R/2$, the random variables $Y_1$, $Y_2$, $Y_3$, $\ldots$ [see (1.85)] also depend on disjoint sets of $\delta_i$'s; therefore, they are also independent and identically distributed random variables. Note, however, that $Y_k$ does depend on both $X_k$ and $X_{k+1}$. Finally, $Y_k$ is independent of all $X_\ell$ with $\ell \notin \{k, k+1\}$. It follows from probability theory that the sum

$$S_1 = X_1 + X_2 + X_3 + \ldots + X_{m'} \quad\text{with}\quad m' = \frac{m}{R} = m^{1-\eta} \tag{1.88}$$

exhibits a central limit theorem, and the other sum

$$S_2 = Y_1 + Y_2 + Y_3 + \ldots + Y_{m'-1} \tag{1.89}$$

also exhibits a central limit theorem (with different parameters). We show that $S_2$ is negligible compared to $S_1$ [see (1.88) and (1.89)]; this is why the central limit theorem for $S_1$ also describes the asymptotic behavior of the van der Corput sum $S(n)$ apart from a negligible error term. To make this argument precise, we need a quantitative form of the central limit theorem with an explicit error term (unfortunately most probability theory books give only a "soft" qualitative form). In fact, we formulate a more general result where the random components are not necessarily identically distributed (the proof is almost the same anyway); see, e.g., Feller's book [Fe1, Fe2].

Central Limit Theorem with Explicit Error Term (Berry–Esseen version). Let $Z_1, Z_2, \ldots, Z_n$ be independent random variables with expectation $EZ_i = 0$, variance $EZ_i^2 < \infty$, and also $E|Z_i|^3 < \infty$ for all $1 \le i \le n$. Write

$$T = \sum_{i=1}^{n} E|Z_i|^3 \quad\text{and}\quad V = \sum_{i=1}^{n} EZ_i^2.$$

Then for any real $\lambda$,

$$\left|\Pr\left[\frac{Z_1 + Z_2 + \ldots + Z_n}{\sqrt{V}} \ge \lambda\right] - \frac{1}{\sqrt{2\pi}}\int_{\lambda}^{\infty} e^{-u^2/2}\,du\right| < \frac{40T}{V^{3/2}}. \tag{1.90}$$
Let's return to (1.84) and (1.85): the boundedness of geometric series implies

$$X_k = O(R) = O(m^{\eta}), \qquad Y_k = O(1), \tag{1.91}$$

and independence implies

$$E\left[(X_k - EX_k)(Y_l - EY_l)\right] = 0 \ \text{or}\ O(1) \tag{1.92}$$

depending on whether $l \notin \{k, k+1\}$ or $l \in \{k, k+1\}$. Note that the upper bound $O(1)$ for the correlation comes from routine calculations by using the fact that the factor $2^{i-j}$ in (1.85) decreases exponentially fast as the difference $j - i\ (> 0)$ increases. In other words, the correlation is $O(1)$, because it is estimated from above by a convergent geometric series as follows: for $l = k$,

$$\left|E\left[(X_k - EX_k)(Y_k - EY_k)\right]\right| \le \frac{1}{4}\sum_{(k-1)R\le i<j}\left|E\left[(\delta_i - E\delta_i)\left(\delta_i\delta_j 2^{i-j} - E\delta_i\delta_j 2^{i-j}\right)\right]\right|$$

$$+\ \frac{1}{4}\sum_{(k-1)R\le i<j_1,j_2}\left|E\left[\left(\delta_i\delta_{j_1} 2^{i-j_1} - E\delta_i\delta_{j_1} 2^{i-j_1}\right)\left(\delta_i\delta_{j_2} 2^{i-j_2} - E\delta_i\delta_{j_2} 2^{i-j_2}\right)\right]\right|$$

$$+\ \frac{1}{4}\sum_{(k-1)R\le i_1,i_2<j}\left|E\left[\left(\delta_{i_1}\delta_j 2^{i_1-j} - E\delta_{i_1}\delta_j 2^{i_1-j}\right)\left(\delta_{i_2}\delta_j 2^{i_2-j} - E\delta_{i_2}\delta_j 2^{i_2-j}\right)\right]\right|$$

$$\le \frac{1}{4}\sum_{(k-1)R\le i<j} 2^{i-j} + \frac{1}{4}\sum_{(k-1)R\le i<j_1,j_2} 2^{(i-j_1)+(i-j_2)} + \frac{1}{4}\sum_{(k-1)R\le i_1,i_2<j} 2^{(i_1-j)+(i_2-j)}$$

$$\le \sum_{h=1}^{\infty} h\,2^{-h-1} + 2\left(\sum_{h=1}^{\infty} h\,2^{-h}\right)^2 = \frac{1/4}{(1-(1/2))^2} + 2\left(\frac{1/2}{(1-(1/2))^2}\right)^2 = 1 + 8 = 9,$$

and similar estimation for $l = k+1$. This proves (1.92).

With "Var" meaning the variance, we have [see (1.78)]

$$V_m = \mathrm{Var}(S(n)) = \sum_{i=1}^{m'}\mathrm{Var}(X_i) + \sum_{j=1}^{m'-1}\mathrm{Var}(Y_j) + \sum_i\sum_j 2E\left[(X_i - EX_i)(Y_j - EY_j)\right] + O(1), \tag{1.93}$$
where the term $O(1)$ comes from the negligible contribution of $W$ [see (1.86) and (1.87)]. Applying (1.91) and (1.92) in (1.93) we have

$$V_m = \mathrm{Var}(S(n)) = \sum_{i=1}^{m'}\mathrm{Var}(X_i) + O(m'). \tag{1.94}$$

By (1.78) and (1.94)

$$\sum_{i=1}^{m'}\mathrm{Var}(X_i) = V_m + O(m') = \frac{m}{16} + O(m') = \frac{m}{16} + O(m^{1-\eta}). \tag{1.95}$$

Since trivially

$$E|X_i - EX_i|^3 = O(R^3) = O(m^{3\eta}),$$

we can apply the central limit theorem above [see (1.90)]: for any real $\lambda$,

$$\left|\Pr\left[\frac{S_1 - ES_1}{\sqrt{V_1}} \ge \lambda\right] - \frac{1}{\sqrt{2\pi}}\int_{\lambda}^{\infty} e^{-u^2/2}\,du\right| < \frac{40T_1}{V_1^{3/2}} \tag{1.96}$$

with

$$S_1 = X_1 + X_2 + X_3 + \ldots + X_{m'}, \qquad V_1 = \mathrm{Var}(S_1) = \frac{m}{16} + O(m^{1-\eta}),$$

$$T_1 = \sum_i E|X_i - EX_i|^3 = m' \cdot O(m^{3\eta}) = O(m^{1+2\eta}). \tag{1.97}$$

Equation (1.97) implies that

$$\sqrt{V_1} = \frac{\sqrt{m}}{4} + O(m^{1/2-\eta}), \tag{1.98}$$

and the right-hand side of (1.96) equals

$$\frac{40T_1}{V_1^{3/2}} = O(m^{2\eta - 1/2}). \tag{1.99}$$
Next we apply the central limit theorem to $S_2 = Y_1 + Y_2 + Y_3 + \ldots + Y_{m'-1}$ and obtain that for any real $\lambda$,

$$\left|\Pr\left[\frac{S_2 - ES_2}{\sqrt{V_2}} \ge \lambda\right] - \frac{1}{\sqrt{2\pi}}\int_{\lambda}^{\infty} e^{-u^2/2}\,du\right| < \frac{40T_2}{V_2^{3/2}} \tag{1.100}$$

with the trivial upper bounds for the corresponding variance and third moment:

$$V_2 = \mathrm{Var}(S_2) = O(m') = O(m^{1-\eta}) \quad\text{and}\quad T_2 = \sum_i E|Y_i - EY_i|^3 = O(m') = O(m^{1-\eta}),$$

which are obvious from $Y_i = O(1)$ [see (1.91)]. Since $e^{-u^2/2}$ is super-exponentially small in terms of $u$, (1.100) implies with $\lambda = \log m$

$$\Pr\left[|S_2 - ES_2| \ge m^{(1-\eta)/2}\log m\right] = O(m^{(\eta-1)/2}). \tag{1.101}$$

Since

$$S(n) - E_m = (S_1 - ES_1) + (S_2 - ES_2) + o(1),$$

where the $o(1)$ stands for the contribution of $W$ in (1.83), by (1.96)–(1.101) we can describe the asymptotic behavior of $S(n) - E_m$ as $n$ runs in $0 \le n < 2^m$: for any real $\lambda$,

$$2^{-m}\left|\left\{0 \le n < 2^m:\ \frac{S(n) - E_m}{\sqrt{m}/4} \ge \lambda + O(m^{-\eta/2}\log m)\right\}\right| = \frac{1}{\sqrt{2\pi}}\int_{\lambda}^{\infty} e^{-u^2/2}\,du + O(m^{2\eta - 1/2}). \tag{1.102}$$

With the choice of $\eta = 1/5$ we can make both error terms in (1.102) small (around $m^{-1/10}$). Combining (1.69) with (1.102), Proposition 1.10 follows. □
1.3.3 Concluding Remark

In Proposition 1.10 the integral parameter $n$ runs in the special interval $0 \le n < 2^m$, but the result can be easily extended to any interval $0 \le n < N$ with arbitrary ("large") $N$. Indeed, let

$$N = N(a_1, \ldots, a_\ell) = 2^m + a_1 2^{m-1} + a_2 2^{m-2} + a_3 2^{m-3} + \ldots + a_\ell 2^{m-\ell} \tag{1.103}$$

where $a_i \in \{0,1\}$, $1 \le i \le \ell$, and $\ell = c(\log m/\log 2)$ (of course $\log m/\log 2$ is the "binary logarithm") is an integer; the value of the constant factor $c > 0$ will be specified later. Since $\ell = O(\log m)$ is negligible compared to $m$, by switching from $N = 2^m$ in Proposition 1.10 to any number of the form $N = N(a_1, \ldots, a_\ell)$, the expectation and the variance hardly change value, and Proposition 1.10 remains true with the same error term:

$$\frac{1}{N(a_1, \ldots, a_\ell)}\left|\left\{0 \le n < N(a_1, \ldots, a_\ell):\ A \le \frac{S(n) + m/8}{\sqrt{m}/4} \le B\right\}\right| = \frac{1}{\sqrt{2\pi}}\int_A^B e^{-u^2/2}\,du + O\!\left(m^{-1/10}\log m\right). \tag{1.104}$$

On the other hand, for every $N$ in $2^m \le N < 2^{m+1}$ there is a "short binary number" $N(a_1, \ldots, a_\ell)$ defined in (1.103) such that

$$0 \le N - N(a_1, \ldots, a_\ell) < 2^{m-\ell} = m^{-c} 2^m. \tag{1.105}$$

By choosing $c \ge 1/10$, the relative factor $m^{-c}$ in (1.105) becomes less than the error term $O\!\left(m^{-1/10}\log m\right)$ in (1.104). This proves that Proposition 1.10 holds for any interval $0 \le n < N$ with arbitrary $N$ (not just for the special intervals $0 \le n < 2^m$).
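The "short binary number" in (1.103) and (1.105) is simply $N$ with all binary digits below position $m-\ell$ set to zero. A minimal sketch follows; the function name is hypothetical and `ell` stands for $\ell$.

```python
def short_binary_approximation(N, ell):
    """Keep the leading 1 of N and the next `ell` binary digits; zero out the rest."""
    m = N.bit_length() - 1      # 2**m <= N < 2**(m + 1)
    shift = max(m - ell, 0)     # number of low-order digits that are dropped
    return (N >> shift) << shift

if __name__ == "__main__":
    N, ell = 1000, 3            # here m = 9
    N_short = short_binary_approximation(N, ell)
    print(N_short, N - N_short, 2 ** (9 - ell))
```

For $N = 1000$ and $\ell = 3$ this gives $960 = 2^9 + 2^8 + 2^7 + 2^6$, and the difference $40$ is below $2^{m-\ell} = 64$, as (1.105) requires.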
1.4 Second Warm-Up: Markov Chains and the Area Principle

In this section we make a short excursion to the local side of diophantine approximation, formulated in terms of lattice point counting. (We will return to the local aspects with a more detailed discussion in Part 1.3 of the book, starting with Sect. 5.1.) Perhaps the simplest and most natural intuition in "lattice point theory" is the following: if a "nice region" has infinite area, then it should contain infinitely many lattice points. We refer to this intuition as the Area Principle. Of course, the heart of the matter is how to define "nice region" precisely. Consider, for example, the infinite horizontal strip of height one, $0 < y < 1$, $-\infty < x < \infty$; it has infinite area, but it contains no lattice point. We think the reader agrees that the infinite strip is a "nice region," so the Area Principle is clearly violated here. A less trivial example is the Pell inequality

$$-1 < x^2 - 2y^2 < 1, \tag{1.106}$$

which is a hyperbolic region of infinite area and contains no lattice point except the origin. Assume that we are in the positive quadrant of the plane with $x \ge 1$, $y \ge 1$; then inequality (1.106) is equivalent to

$$1 > |x^2 - 2y^2| = |y\sqrt{2} - x| \cdot |y\sqrt{2} + x| = |y\sqrt{2} - x| \cdot (2y\sqrt{2} + o(1)), \tag{1.107}$$

that is,

$$\|y\sqrt{2}\| < \frac{1}{2\sqrt{2}\,y + o(1)}, \tag{1.108}$$

where, as usual, $\|u\|$ denotes the distance of a real $u$ from the nearest integer. In view of (1.106)–(1.108) the inequality

$$\|n\sqrt{2}\| < \frac{c}{n} \tag{1.109}$$

does not have infinitely many integer solutions $n \ge 1$ if the constant $c < 2^{-3/2}$. On the other hand, the hyperbolic region

$$|y| < \frac{c}{x}, \quad 1 \le x \le N, \tag{1.110}$$

with any constant $c > 0$, has area

$$2c\int_1^N \frac{dx}{x} = 2c\log N, \tag{1.111}$$
which tends to infinity as $N \to \infty$. We think the reader agrees that the hyperbolic region (1.110) is "nice," so this is again a violation of the Area Principle. A natural way to "save" the Area Principle is to guess that perhaps almost every translated copy of hyperbola segment (1.110) does satisfy the Area Principle. More formally, we can ask the following

Question 1.14. Consider the translated hyperbola segment

$$|y - \beta| < \frac{c}{x}, \quad x \ge 1, \tag{1.112}$$

where $0 \le \beta < 1$ is a fixed real, and study the inhomogeneous version of (1.109):

$$\|n\sqrt{2} - \beta\| < \frac{c}{n}. \tag{1.113}$$

Let $c > 0$ be arbitrarily small but fixed. Is it true that, for almost every $\beta$ (in the sense of the Lebesgue measure), inequality (1.113) does have infinitely many integer solutions $n \ge 1$?

We may go further, and consider the smaller region

$$|y| < \frac{1}{x\log x}, \quad\text{and the even smaller region}\quad |y| < \frac{1}{x\log x\log\log x}, \tag{1.114}$$

and so on. They all have infinite area, since

$$\int_e^N \frac{dx}{x\log x} = \log\log N, \quad\text{and}\quad \int_{e^e}^N \frac{dx}{x\log x\log\log x} = \log\log\log N, \tag{1.115}$$

and the rest all tend to infinity as $N \to \infty$. It is natural to ask the analog of Question 1.14.

Question 1.15. Consider the inequalities

$$\|n\sqrt{2} - \beta\| < \frac{c}{n\log n} \quad (n \ge 2), \tag{1.116}$$

$$\|n\sqrt{2} - \beta\| < \frac{c}{n\log n\log\log n} \quad (n \ge 3), \tag{1.117}$$

and so on; $0 \le \beta < 1$ is a fixed constant. Is it true that, for almost every $\beta$ (in the sense of the Lebesgue measure), inequality (1.116) (and (1.117), and so on) does have infinitely many integer solutions $n \ge 1$?

The objective of this section is to give an affirmative answer to these questions; see Proposition 1.18 at the end of the section.
Let's return to Questions 1.14 and 1.15. Our answer to Questions 1.14 and 1.15 is based on the idea that, instead of applying the familiar decimal representation, we switch to the unusual "$(1+\sqrt{2})$ scale representation" of the translation constant $\beta$. In Sect. 1.2 we already met an unusual representation: the Ostrowski representation of integers, see (1.54) and Ostrowski's Explicit Formula (1.55); the "$(1+\sqrt{2})$ scale representation" of real numbers is a very similar idea. Since $2 < 1 + \sqrt{2} < 3$, we can certainly write every real $0 < \beta < 1$ in the form

$$\beta = \frac{b_1}{1+\sqrt{2}} + \frac{b_2}{(1+\sqrt{2})^2} + \frac{b_3}{(1+\sqrt{2})^3} + \ldots \tag{1.118}$$

where $b_i \in \{0, 1, 2\}$ for every $i \ge 1$, but the representation is not unique. Since

$$2(1+\sqrt{2})^{-1} + (1+\sqrt{2})^{-2} = 2(\sqrt{2}-1) + (3 - 2\sqrt{2}) = 1, \tag{4.14a}$$

and in general

$$2(1+\sqrt{2})^{-i} + (1+\sqrt{2})^{-i-1} = (1+\sqrt{2})^{-i+1}, \tag{4.14b}$$

we can guarantee uniqueness by enforcing the Extra Rule in (1.118):

$$b_i = 2 \ \text{implies}\ b_{i+1} = 0. \tag{1.118}$$
Let's go back to inequality (1.113) (or (1.116), or (1.117), and so on); to "match" representation (1.118) of $\beta$, we use an "Ostrowski like" representation of integer $n$; the novelty is alternation. To motivate this Ostrowski like alternating representation, we recall the fact that $(1+\sqrt{2})^j = p_j + q_j\sqrt{2}$, $j = 1, 2, 3, \ldots$, describes the whole family of positive solutions $1 \le x = p_j$, $1 \le y = q_j$ of the Pell equation $x^2 - 2y^2 = \pm 1$. It follows that $(1-\sqrt{2})^j = p_j - q_j\sqrt{2}$, implying

$$q_j\sqrt{2} - p_j = -(1-\sqrt{2})^j = \frac{(-1)^{j+1}}{(1+\sqrt{2})^j}, \tag{1.119}$$

that is, the distance of $q_j\sqrt{2}$ from the nearest integer has an alternating positive–negative behavior as $j = 1, 2, 3, \ldots$. This alternating nature of $q_j\sqrt{2} - p_j$ motivates the following "alternating version of the Ostrowski representation": we search for $n$ in the form, with $k$ odd,

$$n = d_k q_k - d_{k-1} q_{k-1} + d_{k-2} q_{k-2} - d_{k-3} q_{k-3} + d_{k-4} q_{k-4} - \ldots \tag{1.120}$$

where $d_j \in \{0, 1, 2\}$ for all $j \le k$, and $d_k \ne 0$. Here $q_j$ is the "$y$" in the $j$th positive solution of the Pell equation $x^2 - 2y^2 = \pm 1$, that is,

$$q_j = \frac{(1+\sqrt{2})^j - (1-\sqrt{2})^j}{2\sqrt{2}}. \tag{1.121}$$
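The Pell solutions $(p_j, q_j)$ are easy to generate in exact integer arithmetic, because multiplying $p + q\sqrt{2}$ by $1+\sqrt{2}$ gives $(p + 2q) + (p + q)\sqrt{2}$. The sketch below (the function name is hypothetical) checks the sign pattern behind (1.119) and the closed form (1.121) for small $j$.

```python
import math

def pell_solutions(count):
    """(p_j, q_j) for j = 1..count, defined by (1 + sqrt 2)**j = p_j + q_j * sqrt 2."""
    p, q = 1, 1
    out = []
    for _ in range(count):
        out.append((p, q))
        p, q = p + 2 * q, p + q   # multiply p + q*sqrt(2) by 1 + sqrt(2)
    return out

if __name__ == "__main__":
    s2 = math.sqrt(2.0)
    for j, (p, q) in enumerate(pell_solutions(8), start=1):
        binet = ((1 + s2) ** j - (1 - s2) ** j) / (2 * s2)
        print(j, p, q, p * p - 2 * q * q, round(binet))
```

Since $p_j^2 - 2q_j^2 = (-1)^j$ and $p_j + q_j\sqrt{2} > 0$, the sign of $q_j\sqrt{2} - p_j$ is $(-1)^{j+1}$, which is the alternation used in (1.120).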
Note that (1.121) is a perfect analog of the well-known Binet formula for the Fibonacci numbers

$$F_n = \frac{1}{\sqrt{5}}\left(\left(\frac{1+\sqrt{5}}{2}\right)^n - \left(\frac{1-\sqrt{5}}{2}\right)^n\right)$$

where $F_1 = F_2 = 1$, $F_3 = 2$, $F_4 = 3$, $F_5 = 5$, and $F_{n+2} = F_{n+1} + F_n$ for $n \ge 1$. (By the way, Binet's formula was already published by De Moivre in 1730, a hundred years before Binet.)

The reason why (1.118) and (1.120) form a perfect match is explained by the following argument:

$$n\sqrt{2} - \beta = \sum_{j=1}^{k}(-1)^{j+1} d_j q_j\sqrt{2} - \sum_{j=1}^{k} b_j(1+\sqrt{2})^{-j} - \sum_{i>k} b_i(1+\sqrt{2})^{-i},$$

and so by working modulo one we have [see (1.119)]

$$n\sqrt{2} - \beta \equiv \sum_{j=1}^{k}(-1)^{j+1} d_j (q_j\sqrt{2} - p_j) - \sum_{j=1}^{k} b_j(1+\sqrt{2})^{-j} - \sum_{i>k} b_i(1+\sqrt{2})^{-i}$$

$$= \sum_{j=1}^{k}(d_j - b_j)(1+\sqrt{2})^{-j} - \sum_{i>k} b_i(1+\sqrt{2})^{-i}, \tag{1.122}$$

where (1.122) is a (mod 1) equality. Formula (1.122) tells us how to find an integer $n$ such that $\|n\sqrt{2} - \beta\|$ is "very small." Assume that the $(1+\sqrt{2})$ scale representation of $\beta$ [see (1.118)] has a long block of consecutive 0s: there is an odd $k$ such that

$$b_k \ne 0, \quad b_{k+1} = b_{k+2} = b_{k+3} = \ldots = b_{k+\ell} = 0, \tag{1.123}$$

where $\ell$ is "large." Choose an integer $n$ in the form (1.120) such that

$$d_j = b_j \ \text{for}\ 1 \le j \le k. \tag{1.124}$$

Then by (1.122)

$$\|n\sqrt{2} - \beta\| \le \sum_{i>k+\ell} b_i(1+\sqrt{2})^{-i} \le \sum_{i>k+\ell} 2(1+\sqrt{2})^{-i} = \frac{\sqrt{2}}{(1+\sqrt{2})^{k+\ell}}, \tag{1.125}$$

where $\ell$ is defined by (1.123); it is the length of the zero-block. Let's apply (1.124) in (1.120); we claim that the resulting $n$ satisfies the lower bound (recall that $k$ is odd)

$$n \ge q_{k-1}. \tag{1.126}$$
Indeed, since $d_k = b_k \ne 0$, by the Extra Rule we have $d_{k-1} = b_{k-1} \ne 2$, and so

$$n \ge q_k - q_{k-1} - 2q_{k-3} - 2q_{k-5} - 2q_{k-7} - \ldots. \tag{1.127}$$

We have

$$q_k - q_{k-1} = q_{k-1} + q_{k-2}, \quad q_{k-2} - 2q_{k-3} = q_{k-4}, \quad q_{k-4} - 2q_{k-5} = q_{k-6},$$

and so on, and using these identities in (1.127) we obtain (1.126). On the other hand, we have the easy upper bound

$$n \le q_{k+1}. \tag{1.128}$$

Indeed,

$$n \le 2q_k + 2q_{k-2} + 2q_{k-4} + 2q_{k-6} + \ldots, \tag{1.129}$$

and because $2q_i = q_{i+1} - q_{i-1}$, the right-hand side of (1.129) is a telescoping sum, implying (1.128). By using the generalized Binet formula (1.121) in (1.125) we have

$$\|n\sqrt{2} - \beta\| \le \frac{\sqrt{2}}{(1+\sqrt{2})^{k+\ell}} = \frac{2\sqrt{2}}{(1+\sqrt{2})^{k+1}} \cdot \frac{1}{2(1+\sqrt{2})^{\ell-1}} < \frac{1}{q_{k+1}} \cdot \frac{1}{(1+\sqrt{2})^{\ell-1}} \le \frac{1}{n} \cdot \frac{1}{(1+\sqrt{2})^{\ell-1}}, \tag{1.130}$$

where in the last step we used (1.128), $q_{k-1} \le n \le q_{k+1}$, and $\ell$ is defined by (1.123) (it is the length of the zero-block). In view of (1.130) and (1.123) we can say that the longer the zero-block in (1.123), the better inequality (1.130). This leads to the following question: Why is it true that
almost all real numbers $0 < \beta < 1$ contain "long" zero-blocks of the "digits" in the $(1+\sqrt{2})$ scale representation [see (1.118)]? The digit sequence $b_1 = b_1(\beta)$, $b_2 = b_2(\beta)$, $b_3 = b_3(\beta)$, $\ldots$ in (1.118) does not form independent random variables as $\beta$ runs in the unit interval $0 < \beta < 1$: the Extra Rule "$b_i = 2$ implies $b_{i+1} = 0$" clearly contradicts (statistical) independence. What we have instead is a homogeneous Markov chain.

1.4.1 Statistical Independence and Markov Chains

Beyond games of chance and digit sums (Borel's theorem on normal numbers) it is very hard to find perfect independence. At the end of the nineteenth century a very useful relaxation of the concept of independence was introduced and studied by the Russian A.A. Markov, which led to a rapid development of the subject of Markov chains (see, e.g., the well-known book of Feller [Fe1, Fe2]). A (finite) Markov chain is an asymmetric random walk on a (finite directed) graph (with possible loops). For example, our concrete Markov chain $b_1 = b_1(\beta)$, $b_2 = b_2(\beta)$, $b_3 = b_3(\beta)$, $\ldots$ can be visualized as an asymmetric random walk on a three-vertex graph (where the vertices are labeled 0, 1, 2). The vertices of the graph are called the states of the Markov chain. Our particular Markov chain has three states: 0, 1, and 2, representing the three possible values $b_k = b_k(\beta) = 0$ or 1 or 2. A homogeneous Markov chain has a short-term memory: conditional upon the present, the future does not depend on the past. This means that the transition matrix $A = (p_{i,j})_{i,j}$ with the transition probabilities $p_{i,j}$ = "probability to go from state $i$ to state $j$ (in one step)" completely describes a homogeneous Markov chain.
In our special case we have

$$A = \begin{pmatrix} p_{0,0} & p_{0,1} & p_{0,2} \\ p_{1,0} & p_{1,1} & p_{1,2} \\ p_{2,0} & p_{2,1} & p_{2,2} \end{pmatrix} = \begin{pmatrix} \gamma & \gamma & \gamma^2 \\ \gamma & \gamma & \gamma^2 \\ 1 & 0 & 0 \end{pmatrix} \tag{1.131}$$

where $\gamma = \sqrt{2} - 1 = (\sqrt{2}+1)^{-1}$. It is customary to label the (directed) edges of the corresponding graph with the transition probabilities. In our special case, the directed edges $0 \to 1$, $1 \to 0$, $0 \to 0$, $1 \to 1$ are all labeled with $\gamma = \sqrt{2} - 1$ (the last two are loops), $0 \to 2$, $1 \to 2$ are labeled with $\gamma^2$, and $2 \to 0$ is labeled with 1 (meaning a mandatory step). The steady-state behavior (or long-term behavior) of the Markov chain is described by the stationary distribution $q = (q_0, q_1, q_2)$, which is a probability distribution satisfying the fixpoint equation

$$q = qA, \ \text{that is,}\ q_j = q_0 p_{0,j} + q_1 p_{1,j} + q_2 p_{2,j}, \quad j = 0, 1, 2. \tag{1.132}$$

A simple calculation gives

$$q_0 = \frac{1}{2}, \quad q_1 = \frac{\sqrt{2}}{4}, \quad q_2 = \frac{2-\sqrt{2}}{4} = \gamma q_1. \tag{1.133}$$

The stationary (or fixpoint) distribution is also a limit distribution, see (1.135), justifying the name "long-term behavior." Indeed, the $k$th power $A^k$ of the transition matrix represents the $k$-step transition probabilities $p_{i,j}(k)$ = "probability to go from state $i$ to state $j$ in $k$ steps": $A^k = (p_{i,j}(k))_{i,j}$ ($0 \le i, j \le 2$). The eigenvalues $\lambda_1$, $\lambda_2$, $\lambda_3$ of the transition matrix are $\lambda_1 = 1$, $\lambda_2 = -\gamma^2 = 2\sqrt{2} - 3$, $\lambda_3 = 0$, so we can rewrite the transition matrix as

$$A = B^{-1}\begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix} B$$

for some invertible matrix $B$, and we obtain

$$A^k = B^{-1}\begin{pmatrix} \lambda_1^k & 0 & 0 \\ 0 & \lambda_2^k & 0 \\ 0 & 0 & \lambda_3^k \end{pmatrix} B = B^{-1}\begin{pmatrix} 1 & 0 & 0 \\ 0 & (2\sqrt{2}-3)^k & 0 \\ 0 & 0 & 0 \end{pmatrix} B. \tag{1.134}$$

By (1.134) we have the following simple formula for the $k$-step transition probabilities:

$$p_{i,j}(k) = q_j + c_{i,j}(2\sqrt{2}-3)^k, \tag{1.135}$$
where $q_0 = 1/2$, $q_1 = \sqrt{2}/4$, $q_2 = (2-\sqrt{2})/4$ is the stationary distribution [see (1.132) and (1.133)], and $c_{i,j}$ are appropriate constants independent of $k$. Since $|2\sqrt{2} - 3| < 1$ (in fact $< 1/5$), (1.135) tells us that the $k$-step transition probability $p_{i,j}(k)$ converges to $q_j$, as $k \to \infty$, exponentially fast.

It is worthwhile to know the recipe how to determine the constant factors $c_{i,j}$ in (1.135). Comparing (1.131) to (1.135) with $k = 1$, and using $2\sqrt{2} - 3 = -\gamma^2$, we have

$$\gamma = p_{0,0} = q_0 - c_{0,0}\gamma^2 \implies c_{0,0} = \frac{1/2 - \gamma}{\gamma^2},$$

and similarly

$$c_{1,0} = \frac{1/2 - \gamma}{\gamma^2}, \quad c_{2,0} = -\frac{1}{2\gamma^2},$$

$$c_{0,1} = \frac{\sqrt{2}/4 - \gamma}{\gamma^2} = c_{1,1}, \quad c_{2,1} = \frac{\sqrt{2}/4}{\gamma^2},$$

$$c_{0,2} = \frac{2-\sqrt{2}}{4\gamma^2} - 1 = c_{1,2}, \quad c_{2,2} = \frac{2-\sqrt{2}}{4\gamma^2}.$$

Even if equation (1.135) is relatively simple, it is rather inconvenient to work with the $k$-step transition probabilities $p_{i,j}(k)$. Luckily there is a simple way to go back to independence: the trick is to switch to "0," "1," "20" (instead of the original values "0," "1," "2"); formally,

$$b_1 b_2 b_3 \ldots = B_1 B_2 B_3 \ldots \ \text{where}\ B_i = 0 \ \text{or}\ 1 \ \text{or}\ 20. \tag{1.136}$$
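The transition matrix (1.131), the stationary distribution (1.133), and the $k$-step formula (1.135) can be verified numerically. The following sketch uses plain Python lists; the names `GAMMA`, `mat_pow`, and `k_step_from_formula` are hypothetical, and the constants $c_{i,j}$ are obtained from the $k = 1$ comparison described in the text.

```python
import math

GAMMA = math.sqrt(2.0) - 1.0
LAMBDA2 = 2.0 * math.sqrt(2.0) - 3.0          # second eigenvalue, equal to -GAMMA**2
A = [[GAMMA, GAMMA, GAMMA ** 2],
     [GAMMA, GAMMA, GAMMA ** 2],
     [1.0, 0.0, 0.0]]
Q = [0.5, math.sqrt(2.0) / 4, (2.0 - math.sqrt(2.0)) / 4]

def mat_mul(X, Y):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(X[i][t] * Y[t][j] for t in range(3)) for j in range(3)]
            for i in range(3)]

def mat_pow(X, k):
    """k-th power of X for an integer k >= 1."""
    result = X
    for _ in range(k - 1):
        result = mat_mul(result, X)
    return result

def k_step_from_formula(i, j, k):
    """p_{i,j}(k) = q_j + c_{i,j} * LAMBDA2**k, with c_{i,j} fixed by the case k = 1."""
    c = (A[i][j] - Q[j]) / LAMBDA2
    return Q[j] + c * LAMBDA2 ** k

if __name__ == "__main__":
    A5 = mat_pow(A, 5)
    for i in range(3):
        print([round(A5[i][j], 6) for j in range(3)],
              [round(k_step_from_formula(i, j, 5), 6) for j in range(3)])
```

Already for $k = 5$ every row of $A^k$ agrees with the stationary distribution to about four decimal places, reflecting $|2\sqrt{2}-3|^5 \approx 1.5 \cdot 10^{-4}$.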
The sequence defined by (1.136),

$$B_1(\beta), B_2(\beta), B_3(\beta), \ldots \ \text{as}\ 0 < \beta < 1, \tag{1.137}$$

does form independent random variables with common distribution

$$\Pr[B_i = 0] = \Pr[B_i = 1] = \sqrt{2} - 1 \quad\text{and}\quad \Pr[B_i = 20] = (\sqrt{2}-1)^2 = 3 - 2\sqrt{2}, \tag{1.138}$$

where Pr ("probability") means the ordinary one-dimensional Lebesgue measure. The independence in (1.137) comes from self-similarity, that is to say, it is a corollary of the one-digit periodicity of the continued fraction $\sqrt{2} = [1; 2, 2, 2, \ldots]$. Notice that (1.138) is just a restatement of (1.119) and (1.118), or using the Markov chain terminology, (1.138) comes from the first two rows of the transition matrix in (1.131) (the third row explains the use of "20" instead of "2"). A long zero-block of $b_i$s is equivalent to a long zero-block of $B_i$s (with a possible loss of "20" at the beginning), so we reduced our number-theoretic problem to a purely probabilistic question (long runs of 0s) for independent trials. The simplest probabilistic model comes from tossing a fair coin repeatedly, and then the analogous problem is to study the long runs of Heads. Surprisingly, this natural problem was somehow ignored by all textbooks of probability theory, so we have to make a detour here to briefly discuss it.
1.4.2 Long Runs of Heads

Suppose that we toss a fair coin $N$ times and write down the outcomes; thus we obtain a TH sequence where T and H stand for Tails and Heads: say, THHTHTHTT...THH. Let $L = L(N)$ denote the length of the longest block of consecutive Heads. This $L$ is a random variable with possible values $0 \le L \le N$. What is the typical size of $L = L(N)$? It is easy to guess that $L = L(N) \approx \log N/\log 2$ (binary logarithm of $N$). What is quite surprising is that $L = L(N)$ is in fact concentrated on a constant number of values $\log N/\log 2 + O(1)$ centered at $\log N/\log 2$ with probability close to one. The following elegant result gives the complete answer (we don't really need such a sharp result, so we skip the proof).

Proposition 1.15. For simplicity assume that $N$ is a power of two (i.e., $\log N/\log 2$ is an integer); then for any fixed integer $d$ we have

$$\Pr\left[L = L(N) = \frac{\log N}{\log 2} + d\right] = e^{-2^{-d-2}} - e^{-2^{-d-1}} + o(1), \tag{1.139}$$

where the error term $o(1)$ tends to zero if $d$ is fixed and $N \to \infty$.

Remarks. As far as we know this is an unpublished result. We learned it from János Komlós; it is probably his theorem, or perhaps it is folklore. The maximum of the exponential expression in (1.139) is attained at $d = -1$, and $d = -2, 0, -3, 1$ give the remaining relatively large values in decreasing order:

$e^{-1/2} - e^{-1} = 0.2387$ for $d = -1$;
$e^{-1} - e^{-2} = 0.2325$ for $d = -2$;
$e^{-1/4} - e^{-1/2} = 0.1723$ for $d = 0$;
$e^{-2} - e^{-4} = 0.1170$ for $d = -3$;
$e^{-1/8} - e^{-1/4} = 0.1037$ for $d = 1$.

Notice that the five values $d = -1, -2, 0, -3, 1$ represent more than 85 % probability, so the longest run of Heads $L = L(N)$ is basically concentrated on a constant number of values $\log N/\log 2 + O(1)$. There is nothing surprising about the binary logarithm $\log N/\log 2$, but the extreme concentration around $\log N/\log 2 + O(1)$ and the elegant limit theorem above are truly surprising facts.

The goal of Proposition 1.15 was to justify the binary logarithm main term $\log N/\log 2$. But what we are really interested in is the behavior of the long runs in an infinite sequence of independent trials (Heads-and-Tails). Let $L_\infty(N)$ denote the length of the longest run of Heads among the first $N$ trials (coin tossings), $N = 1, 2, 3, \ldots$. Proposition 1.15 describes the typical behavior for a fixed $N$: the longest run is $\log N/\log 2 + O(1)$ with probability close to one. We are going to prove that, with probability one, there are infinitely many values of $N$ such that the surplus $L_\infty(N) - (\log N/\log 2)$ tends to infinity with the rate of the iterated logarithm. More precisely, we prove

Proposition 1.16. With probability one, there are infinitely many values of $N$ such that

$$L_\infty(N) > (\log N/\log 2) + (\log\log N/\log 2). \tag{1.140}$$

On the other hand, with probability one, for any $\varepsilon > 0$ we have the upper bound

$$L_\infty(N) < (\log N/\log 2) + (1+\varepsilon)(\log\log N/\log 2) \tag{1.141}$$

for all sufficiently large $N$.

Remark. The intuitive reason behind (1.140) and (1.141) is divergence–convergence:

$$\sum_{N\ge 2} 2^{-(\log N/\log 2) - (\log\log N/\log 2)} = \sum_{N\ge 2}\frac{1}{N\log N} = \infty, \tag{1.142}$$

but

$$\sum_{N\ge 2} 2^{-(\log N/\log 2) - (1+\varepsilon)(\log\log N/\log 2)} = \sum_{N\ge 2}\frac{1}{N(\log N)^{1+\varepsilon}} < \infty. \tag{1.143}$$
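The limit law (1.139) of Proposition 1.15 can be compared with the exact distribution of the longest run for a moderate $N$. The sketch below is an illustration only; the function names are hypothetical. It computes the exact probabilities by tracking the length of the current run of Heads.

```python
import math

def prob_longest_run_at_most(n, x):
    """Exact probability that n fair tosses contain no run of more than x Heads."""
    if x < 0:
        return 0.0
    # state[r]: probability that the current trailing run of Heads has length r
    # and that no run longer than x has occurred so far
    state = [1.0] + [0.0] * x
    for _ in range(n):
        total = sum(state)
        # Tails resets the run to 0; Heads extends a run of length r < x by one
        state = [0.5 * total] + [0.5 * s for s in state[:-1]]
    return sum(state)

def exact_prob_longest_run(n, x):
    """Exact probability that the longest run of Heads in n tosses equals x."""
    return prob_longest_run_at_most(n, x) - prob_longest_run_at_most(n, x - 1)

def limit_prob(d):
    """The main term of (1.139) for the shift d."""
    return math.exp(-2.0 ** (-d - 2)) - math.exp(-2.0 ** (-d - 1))

if __name__ == "__main__":
    N, log2N = 1024, 10
    for d in (-3, -2, -1, 0, 1):
        print(d, round(exact_prob_longest_run(N, log2N + d), 4), round(limit_prob(d), 4))
```

For $N = 2^{10}$ the exact values are already close to the limiting ones, which illustrates the concentration on $\log N/\log 2 + O(1)$.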
Proof of Proposition 1.16. We use Chebyshev's inequality and combine it with the "nonoverlapping property of the T-closed H-blocks." This technical trick greatly simplifies the calculations. The idea is to include the "T" at both endpoints of any long run of Heads. More precisely, if $H\cdots H$ is a long block of consecutive Heads that we cannot extend further, then there is a T at both ends (unless the block is already at the end, i.e., it begins at 1 or ends at $N$), and we consider the T-closed H-block $TH\cdots HT$, or possibly $H\cdots HT$ (if it begins at 1) or $TH\cdots H$ (if it ends at $N$). The crucial property of the T-closed H-blocks is that they cannot overlap except that they may share a common T at the end.

Let $E(i, j)$ denote the event that the $i$th outcome is Tails, the $r$th outcome is Heads with $i+1 \le r \le i+j$, and the $(i+j+1)$th outcome is Tails again; this is a typical T-closed H-block $TH\cdots HT$. Also, let $E_{start}(j)$ denote the event that the $r$th outcome is Heads with $1 \le r \le j$, and the $(j+1)$th outcome is Tails; this is a T-closed H-block $H\cdots HT$ that begins at 1. Finally, let $E_{end}(j)$ denote the event that the $(N-j)$th outcome is Tails, and the $r$th outcome is Heads with $N-j+1 \le r \le N$; this is a T-closed H-block $TH\cdots H$ that ends at $N$.

Let $F_n$ denote the event that the $(n-k)$th outcome is Tails, the $r$th outcome is Heads with $n-k+1 \le r \le n$, and the $(n+1)$th outcome is Tails again, where $k = k(n) = (\log n/\log 2) + (\log\log n/\log 2)$ (take the lower integral part, and assume that $n > 10$). Event $F_n$ means a particular T-closed H-block $TH\cdots HT$; the probability is $2^{-k-2}$. Then by (1.142)

$$\sum_{n>10}\Pr[F_n] = \infty. \tag{1.144}$$

Let $\xi_n$ denote the characteristic function of event $F_n$: $\xi_n = 1$ or 0 depending on whether $F_n$ holds or fails. The sum

$$X_M = \sum_{10<n\le M}\xi_n \tag{1.145}$$

is less than or equal to the number of times inequality (1.140) holds for some $10 < N \le M$. Its expected value is

$$EX_M = \sum_{10<n\le M}\Pr[F_n] = \sum_{10<n\le M} 2^{-(\log n/\log 2) - (\log\log n/\log 2) - 2} = \sum_{10<n\le M}\frac{1}{4n\log n} = \frac{1}{4}\log\log M + O(1). \tag{1.146}$$

We want to show that the random variable $X_M$ is typically "close" to its expected value (1.146). The standard way to do this is to apply the Chebyshev inequality. We need to compute the variance:

$$\mathrm{Var}(X_M) = E\left(\sum_{10<n\le M}(\xi_n - E_n)\right)^2 = \sum_{10<n\le M} E(\xi_n - E_n)^2 + 2\sum_{10<n_1<n_2\le M} E(\xi_{n_1} - E_{n_1})(\xi_{n_2} - E_{n_2}), \tag{1.147}$$

where $E_n = E\xi_n = \Pr[F_n]$
is the expectation of $\xi_n$. Clearly ($n_1 \ne n_2$)

$$E(\xi_{n_1} - E_{n_1})(\xi_{n_2} - E_{n_2}) = E(\xi_{n_1}\xi_{n_2} - E_{n_1}E_{n_2}), \tag{1.148}$$

so we have to study $E\xi_{n_1}\xi_{n_2}$. There are three cases.

Case 1: If the corresponding T-closed H-blocks are disjoint, then $E\xi_{n_1}\xi_{n_2} = E_{n_1}E_{n_2}$, which has zero contribution in (1.148).

Case 2: If the corresponding T-closed H-blocks are "touching," i.e., they share a common T, then $E\xi_{n_1}\xi_{n_2} = 2E_{n_1}E_{n_2}$.

Case 3: If the corresponding T-closed H-blocks are overlapping (more than just touching), then $\Pr[F_{n_1} \cap F_{n_2}] = 0$ (i.e., this case is impossible), which implies $E(\xi_{n_1}\xi_{n_2}) = 0$. Therefore this case has negative(!) contribution in (1.148).

Let's return to (1.147). The contribution of the diagonal part

$$\sum_{10<n\le M} E(\xi_n - E_n)^2$$

in (1.147) is less than $EX_M$, since

$$E(\xi_n - E_n)^2 = \Pr[F_n](1 - \Pr[F_n]) < \Pr[F_n].$$

Also, the contribution of Case 2 in the off-diagonal part of (1.147) is less than

$$2\sum_{10<n\le M}\Pr[F_n] = 2EX_M.$$

Thus we have the upper bound

$$\mathrm{Var}(X_M) = E\left(\sum_{10<n\le M}(\xi_n - E_n)\right)^2 < 3EX_M. \tag{1.149}$$

We apply Chebyshev's inequality

$$\Pr[|X - EX| \ge \lambda] \le \frac{\mathrm{Var}(X)}{\lambda^2}, \tag{1.150}$$

which holds for any random variable $X$ (with finite variance) and any positive real $\lambda$: let $X = X_M$ and

$$\lambda = \frac{1}{2}EX_M.$$

Then by (1.149)

$$\Pr\left[X_M \ge \frac{1}{2}EX_M\right] \ge 1 - \frac{12}{EX_M}. \tag{1.151}$$
In view of (1.146) we have $EX_M \to \infty$ as $M \to \infty$. Combining this with (1.151) we conclude that, with probability one, inequality (1.140) has infinitely many solutions. This proves the first part of Proposition 1.16.

The second part is almost trivial. Indeed, let $G_n$ denote the event that the $(n-\ell)$th outcome is Tails, the $r$th outcome is Heads with $n-\ell+1 \le r \le n$, and the $(n+1)$th outcome is Tails again, where $\ell = \ell(n) = (\log n/\log 2) + (1+\varepsilon)(\log\log n/\log 2)$ with some fixed $\varepsilon > 0$ (again we take the lower integral part, and assume that $n > 10$). This means a particular T-closed H-block $TH\cdots HT$ that has probability $2^{-\ell-2}$. Then by (1.143)

$$\sum_{n>10}\Pr[G_n] < \infty. \tag{1.152}$$

Consider the event

$$H = \bigcap_{n>10}\ \bigcup_{m\ge n} G_m; \tag{1.153}$$

by (1.152) the probability of $H$ is zero. This proves the second part of Proposition 1.16. □

Remarks. The last argument is often called the "easy part of the Borel–Cantelli Lemma"; see, e.g., the well-known book of Feller [Fe1, Fe2]. The "harder part of the Borel–Cantelli Lemma" is some kind of a converse: it states that if $G_n$ is an infinite sequence of independent events with

$$\sum_n \Pr[G_n] = \infty, \tag{1.154}$$

then the probability of event $H$ [see (1.153)] is one. Notice that with $G_n = F_n$ we cannot apply this criterion, since the events $F_n$ are not independent if the corresponding T-closed H-blocks are touching or overlapping. This is why we couldn't apply the "harder part of the Borel–Cantelli Lemma" and had to turn to the Chebyshev inequality instead. It is important to notice that the correct proof with the Chebyshev inequality gives exactly the same divergence condition as the incorrect argument applying the "harder part of the Borel–Cantelli Lemma."

Repeating the proof of Proposition 1.16 one can easily prove the following more general convergence–divergence type result.
Proposition 1.17. If $\psi(N)$ is any increasing function of $N$ for which

$$\sum_N \frac{1}{\psi(N)} = \infty,$$

then with probability one, the longest run of Heads up to $N$ satisfies the lower bound

$$L_\infty(N) > \frac{\log\psi(N)}{\log 2} \tag{1.155}$$

for infinitely many values of $N$. On the other hand, if

$$\sum_N \frac{1}{\psi(N)} < \infty,$$

then with probability one, the longest run of Heads up to $N$ satisfies the upper bound

$$L_\infty(N) < \frac{\log\psi(N)}{\log 2}$$

for all sufficiently large values of $N$.

Notice that in the special cases $\psi(N) = N\log N$ and $\psi(N) = N(\log N)^{1+\varepsilon}$ we get back Proposition 1.16.

Now we are ready to solve our number-theoretic problem: the Area Principle for slope $\sqrt{2}$; see Questions 1.14 and 1.15 and (1.112)–(1.117). Let's return to (1.130) and (1.136)–(1.138). In view of (1.138) we have to replace the fair coin with an asymmetric discrete probability distribution: $B_1$, $B_2$, $B_3$, $\ldots$ are independent and identically distributed random variables having values 0, 1, and "20" with the distribution

$$\Pr[B_i = 0] = \Pr[B_i = 1] = \sqrt{2} - 1 \quad\text{and}\quad \Pr[B_i = 20] = (\sqrt{2}-1)^2 = 3 - 2\sqrt{2}, \tag{1.156}$$

and we are interested in the long runs of 0s. Let $L_\infty(N)$ denote the length of the longest run of 0s among $B_1$, $B_2$, $\ldots$, $B_N$; of course, $L_\infty(N)$ is a random variable. Proposition 1.17 is about the longest run of Heads in tossing a fair coin repeatedly, and for a fair coin $\Pr[\text{Heads}] = 1/2$. Since in our case

$$\Pr[B_i = 0] = \sqrt{2} - 1 = \frac{1}{1+\sqrt{2}},$$

it is perfectly reasonable to expect the following analog of (1.155): with probability one

$$L_\infty(N) > \frac{\log\psi(N)}{\log b} \tag{1.157}$$

for infinitely many values of $N$, where $b = 1 + \sqrt{2}$ and $\psi(N)$ is any positive increasing function of $N$ for which

$$\sum_N \frac{1}{\psi(N)} = \infty. \tag{1.158}$$
The proof of (1.157) and (1.158) is the same as that of Proposition 1.17; we leave it to the reader. If condition (1.158) applies, then by (1.157) there are infinitely many zero-blocks

$$B_k \ne 0, \quad B_{k+1} = B_{k+2} = \ldots = B_{k+\ell} = 0 \tag{1.159}$$

where $\ell = \ell(k)$ satisfies the equation $\ell = \log\psi(k+\ell)/\log b$ with $b = 1 + \sqrt{2}$. Because of independence, we have

$$\Pr[B_{k+1} = B_{k+2} = \ldots = B_{k+\ell} = 0] = b^{-\ell} = \frac{1}{\psi(k+\ell)}. \tag{1.160}$$

There is a technical nuisance due to the slight difference between the $B$-indexing and the $b$-indexing in (1.136): this is the effect of the pairs "20." More precisely, if $B_i = 0$ then $B_i = b_j$ where $j = i + i_2$ and $i_2$ denotes the number of pairs $B_k = 20$ with $k < i$. By the strong law of large numbers, with probability one, $i_2/i \to 3 - 2\sqrt{2}$ as $i \to \infty$ [see (1.156)]. It follows that $0 = B_i = b_j = b_{j(i)}$ implies

$$j = j(i) = (1 + (3 - 2\sqrt{2}) + o(1))\,i \ \text{as}\ i \to \infty. \tag{1.161}$$

Thus we can rewrite (1.159):

$$b_{j+1} = b_{j+2} = \ldots = b_{j+\ell} = 0 \ \text{where}\ j = (1 + (3 - 2\sqrt{2}) + o(1))\,k. \tag{1.162}$$

Then by (1.124)–(1.126) there is an integer $n$ in $q_j \le n < q_{j+1}$ such that

$$\|n\sqrt{2} - \beta\| \le \frac{1}{n\,\psi(k+\ell)} \le \frac{1}{n\,\psi(\log n/\log c)}; \tag{1.163}$$

here $c > 1$ is some appropriate constant.
1.5 The Golden Ratio and Markov Chains: The Simplest Case of Theorem 1.2
59
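Note that, by using the substitution $y = \log x$, we have the equidivergence property:
\[
\sum_{n=2}^{\infty} \frac{1}{n\,\varphi(\log n)} = \infty
\iff \int_{2}^{\infty} \frac{dx}{x\,\varphi(\log x)} = \infty
\iff \int_{1}^{\infty} \frac{dy}{\varphi(y)} = \infty
\iff \sum_{n=1}^{\infty} \frac{1}{\varphi(n)} = \infty. \tag{1.164}
\]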
In view of this, the technical nuisance due to the discrepancy between the $B$-indexing and $b$-indexing [see (1.161)] is also irrelevant. Therefore, by (1.163) and (1.164) we obtain

Proposition 1.18 ("Area principle for $\sqrt{2}$"). Let $\psi(x)$ be any positive decreasing function of the real variable $x$ with
\[
\sum_{n} \psi(n) = \infty.
\]
Then the inhomogeneous inequality
\[
\|n\sqrt{2} - \beta\| < \psi(n)
\]
has infinitely many integral solutions for almost every $0 \le \beta < 1$ (in the sense of Lebesgue measure).

This gives a positive answer to both Questions 1.14 and 1.15 at the beginning of the section. Note that Proposition 1.18 holds in general, when $\sqrt{2}$ is replaced by an arbitrary (irrational) $\alpha$; see Theorem 5.7.
1.5 The Golden Ratio and Markov Chains: The Simplest Case of Theorem 1.2

The case of the van der Corput sequence in Sect. 1.3 was particularly simple, because the explicit formula (1.67) contains independent random variables. This is why the computation of the expectation [see (1.68) and (1.69)] and the variance [see (1.70)–(1.78)] was rather simple. By contrast, in Sect. 1.4 we had to work with the unusual $(1 + \sqrt{2})$ scale representation of real numbers, in which the digit sequence did not form independent random variables (instead it was a homogeneous Markov chain). There was, however, a simple trick to reduce the problem to the familiar case of independent random variables: namely, to work with the pair "20" instead of the single "2." In this section we combine the ideas of Sects. 1.3 and 1.4 and settle Theorem 1.2 in the simplest special case $\alpha = (\sqrt{5} - 1)/2$ (golden ratio)
\[
\left\{ \frac{\sqrt{5}+1}{2} \right\} = \frac{\sqrt{5}-1}{2} = \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}}} = [1, 1, 1, \ldots].
\]
(Because we work modulo one, we often use the term golden ratio for both $(\sqrt{5}+1)/2$ and its fractional part $(\sqrt{5}-1)/2$; we hope this slight ambiguity will not confuse the reader.) The golden ratio, which has the simplest continued fraction among all quadratic irrationals, is intimately associated with the Fibonacci sequence. The convergents $p_j/q_j$ of $\alpha = (\sqrt{5}-1)/2$ satisfy $q_1 = q_2 = 1$, $q_3 = 2$, $q_j = q_{j-1} + q_{j-2}$ for all $j \ge 3$, and so we have the well-known Binet formula
\[
q_j = \frac{1}{\sqrt{5}} \left( \alpha^{-j} - (-\alpha)^{j} \right) \quad\text{and}\quad p_j = q_{j-1} \tag{1.165}
\]
with $p_1 = 0$. (The Fibonacci numbers are usually denoted by $F_j$, but here we follow the standard notation $p_j/q_j$ used for the convergents in continued fractions.) The Ostrowski representation (1.54) for $\alpha = (\sqrt{5}-1)/2$ is particularly simple: every integer $0 \le n < q_m$ can be uniquely represented in the form
\[
n = b_{m-1} q_{m-1} + b_{m-2} q_{m-2} + \ldots + b_1 q_1, \tag{1.166}
\]
where $q_i$ is the $i$th Fibonacci number [$q_1 = q_2 = 1$, $q_3 = 2$, and see (1.165) for the rest], $b_i \in \{0, 1\}$ for $2 \le i < m$, we apply the Extra Rule that $b_i = 1$ implies $b_{i-1} = 0$ [i.e., we never take two consecutive Fibonacci numbers in (1.166)], and finally $b_1 = 0$ (which seems ridiculous, but it is the special case of the natural general inequality $0 \le b_1 < a_1$ for arbitrary $\alpha$).

Similar to Sects. 1.3 and 1.4, we follow a probabilistic approach/terminology: we view the integral parameter $n$ as a discrete random variable, taking the values $0, 1, 2, \ldots, q_m - 1$ with the same probability $1/q_m$. As a warm-up, we compute first the expectation
\[
\mathbf{E}(b_i) = \Pr[b_i = 1] \tag{1.167}
\]
of the $i$th coefficient ("digit") $b_i = b_i(n)$ of $n$ in the Ostrowski representation (1.166). We claim that
\[
\Pr[b_i = 1] = \frac{q_{i-1}\, q_{m-i}}{q_m}. \tag{1.168}
\]
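For small $m$ the claim (1.168) can be checked by brute force. The sketch below is an illustration only; the helper names are assumptions introduced here. It computes the digits of (1.166) greedily (largest Fibonacci number first, which automatically respects the Extra Rule) and counts how often $b_i = 1$ as $n$ runs over $0 \le n < q_m$.

```python
from fractions import Fraction


def fibonacci(m):
    """Return [q_0, q_1, ..., q_m] with q_0 = 0 and q_1 = q_2 = 1."""
    q = [0, 1]
    while len(q) <= m:
        q.append(q[-1] + q[-2])
    return q


def ostrowski_digits(n, m, q):
    """Digits b_1, ..., b_{m-1} of n (0 <= n < q_m) in (1.166), as a dict i -> b_i."""
    digits = {i: 0 for i in range(1, m)}
    for i in range(m - 1, 1, -1):      # b_1 is always 0, so we stop at i = 2
        if q[i] <= n:                  # greedy choice: take q_i whenever it fits
            digits[i] = 1
            n -= q[i]
    if n != 0:
        raise ValueError("n must satisfy 0 <= n < q_m")
    return digits


def digit_probability(i, m):
    """Exact value of Pr[b_i = 1] when n is uniform on 0, 1, ..., q_m - 1."""
    q = fibonacci(m)
    count = sum(ostrowski_digits(n, m, q)[i] for n in range(q[m]))
    return Fraction(count, q[m])


if __name__ == "__main__":
    m = 12
    q = fibonacci(m)
    for i in range(2, m):
        # Left: brute-force count; right: the closed form claimed in (1.168).
        print(i, digit_probability(i, m), Fraction(q[i - 1] * q[m - i], q[m]))
```

Exact rational arithmetic (`Fraction`) is used so that the comparison with the right-hand side of (1.168) does not depend on rounding.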
The proof of (1.168) is a simple corollary of the uniqueness of the Ostrowski representation (1.166). Indeed, if $b_i = 1$ then $b_{i-1} = 0$, and the number of ways to choose the vector $(b_1, b_2, \ldots, b_{i-2})$ is exactly $q_{i-1}$. Similarly, $b_i = 1$ implies $b_{i+1} = 0$, and the second factor $q_{m-i}$ in the numerator in (1.168) is the number of ways to choose the vector $(b_{i+1}, b_{i+2}, \ldots, b_{m-1})$ [for the latter equals the number of ways to choose the vector $(b_1, b_2, \ldots, b_{m-i-1})$]. These two arguments give (1.168).

For the special case $\alpha = (\sqrt{5}-1)/2$ the Ostrowski formula (1.55) is particularly simple. We begin with evaluating $\varepsilon_i$: by (1.165) we have
\[
\begin{aligned}
\varepsilon_i = q_i\alpha - p_i = q_i\alpha - q_{i-1}
&= \frac{1}{\sqrt{5}}\left(\alpha^{-i} - (-\alpha)^{i}\right)\alpha - \frac{1}{\sqrt{5}}\left(\alpha^{-i+1} - (-\alpha)^{i-1}\right) \\
&= \frac{1}{\sqrt{5}}\left((-\alpha)^{i-1} - (-\alpha)^{i+1}\right)
= \frac{(-\alpha)^{i}}{\sqrt{5}}\left(-\frac{1}{\alpha} + \alpha\right)
= \frac{(-1)^{i+1}\alpha^{i}}{\sqrt{5}}.
\end{aligned} \tag{1.169}
\]
Thus we have
\[
q_i|\varepsilon_i| = \frac{1}{\sqrt{5}}\left(\alpha^{-i} - (-\alpha)^{i}\right)\frac{\alpha^{i}}{\sqrt{5}} = \frac{1 + (-1)^{i+1}\alpha^{2i}}{5}. \tag{1.170}
\]
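By using Ostrowski's formula (1.55) for $\alpha = (\sqrt{5}-1)/2$ and also (1.169) and (1.170) for $0 \le n < q_m$ we have (we recall that $b_1 = 0$)
\[
\begin{aligned}
S_\alpha(n) &= \sum_{i=2}^{m-1} (-1)^{i+1} b_i \left( -\frac{1}{2} + \left( \frac{b_i}{2} + \sum_{2 \le j < i} b_j \frac{q_j}{q_i} \right) q_i|\varepsilon_i| + \frac{|\varepsilon_i|}{2} \right) \\
&= \sum_{i=2}^{m-1} (-1)^{i+1} b_i \left( -\frac{1}{2} + \frac{b_i}{10} + \frac{1}{5} \sum_{2 \le j < i} b_j \alpha^{i-j} \right) + O(1),
\end{aligned} \tag{1.171}
\]
since by (1.165)
\[
\frac{q_j}{q_i} = \frac{\alpha^{-j} - (-\alpha)^{j}}{\alpha^{-i} - (-\alpha)^{i}} = \alpha^{i-j}\, \frac{1 - (-1)^{j}\alpha^{2j}}{1 - (-1)^{i}\alpha^{2i}}. \tag{1.172}
\]
In order to compute the expectation of $S_\alpha$ as $0 \le n < q_m$, we of course use (1.167) and (1.168), and in addition we need to know
\[
\mathbf{E}(b_j b_i) = \Pr[b_j = b_i = 1]. \tag{1.173}
\]
We have a perfect analog of (1.168) (the proof is similar): if $j < i$ then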
\[
\Pr[b_j = b_i = 1] = \frac{q_{j-1}\, q_{i-j-1}\, q_{m-i}}{q_m}. \tag{1.174}
\]
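Formula (1.174) can also be confirmed by direct enumeration for small $m$. The sketch below is illustrative, with hypothetical helper names. It relies on the uniqueness of (1.166): a uniformly random $n$ in $0 \le n < q_m$ corresponds to a uniformly random digit vector $(b_2, \ldots, b_{m-1})$ with no two adjacent 1s, so it suffices to count such vectors.

```python
from fractions import Fraction
from itertools import product


def fib(k):
    """k-th Fibonacci number with fib(0) = 0 and fib(1) = fib(2) = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def admissible_strings(m):
    """All digit vectors (b_2, ..., b_{m-1}) obeying the Extra Rule, as dicts i -> b_i."""
    for bits in product((0, 1), repeat=m - 2):
        if all(not (x and y) for x, y in zip(bits, bits[1:])):
            yield dict(zip(range(2, m), bits))


def pair_probability(j, i, m):
    """Exact Pr[b_j = b_i = 1] under the uniform law on admissible digit vectors."""
    strings = list(admissible_strings(m))
    hits = sum(1 for s in strings if s[j] == 1 and s[i] == 1)
    return Fraction(hits, len(strings))


if __name__ == "__main__":
    m = 10
    for j in range(2, m):
        for i in range(j + 1, m):
            closed_form = Fraction(fib(j - 1) * fib(i - j - 1) * fib(m - i), fib(m))
            print(j, i, pair_probability(j, i, m), closed_form)
```

Note that for $i = j + 1$ the factor $q_{i-j-1} = q_0 = 0$ makes the right-hand side of (1.174) vanish, in agreement with the Extra Rule.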
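We leave the proof to the reader. Applying (1.165) in (1.168), we have
\[
\Pr[b_i = 1] = \frac{q_{i-1}\, q_{m-i}}{q_m}
= \frac{\frac{1}{\sqrt{5}}\left(\alpha^{-i+1} - (-\alpha)^{i-1}\right) \frac{1}{\sqrt{5}}\left(\alpha^{i-m} - (-\alpha)^{m-i}\right)}{\frac{1}{\sqrt{5}}\left(\alpha^{-m} - (-\alpha)^{m}\right)}
= \frac{\alpha}{\sqrt{5}} \cdot \frac{\left(1 + O(\alpha^{2i})\right)\left(1 + O(\alpha^{2(m-i)})\right)}{1 + O(\alpha^{2m})} \tag{1.175}
\]
and the same argument for (1.174) ($j < i$):
\[
\Pr[b_j = b_i = 1] = \left(\frac{\alpha}{\sqrt{5}}\right)^{2} \left(1 + (-1)^{i-j}\alpha^{2(i-j-1)}\right) \frac{\left(1 + O(\alpha^{2j})\right)\left(1 + O(\alpha^{2(m-i)})\right)}{1 + O(\alpha^{2m})}. \tag{1.176}
\]
Applying (1.175) and (1.176) in (1.171), we obtain [$\alpha = (\sqrt{5}-1)/2$ and $n$ runs in $0 \le n < q_m$]
\[
\mathbf{E}S_\alpha = \mathbf{E}S_\alpha(n) = \sum_{i=2}^{m-1} (-1)^{i+1} \left( -\frac{1}{2}\cdot\frac{\alpha}{\sqrt{5}} + \frac{1}{10}\cdot\frac{\alpha}{\sqrt{5}} + \frac{1}{5} \sum_{2 \le j < i} \left(\frac{\alpha}{\sqrt{5}}\right)^{2} \left(1 + (-1)^{i-j}\alpha^{2(i-j-1)}\right) \alpha^{i-j} \right) + O(1) = O(1), \tag{1.177}
\]
since (1.177) is basically a telescoping sum. In other words, the expectation is trivial: $\mathbf{E}S_\alpha = O(1)$ independently of the size of $m$ (note that $n$ runs in $0 \le n < q_m$).

Unfortunately, the computation of the variance is much more complicated: it leads to a long, unpleasant case study. To get the variance, by (1.171) and (1.177) we have to evaluate the square [$b_i = b_i(n)$ as $n$ runs in $0 \le n < q_m$]
\[
\frac{1}{q_m} \sum_{n=0}^{q_m - 1} \left( \sum_{i=2}^{m-1} (-1)^{i+1} \left( -\frac{2}{5} b_i + \frac{1}{5} \sum_{2 \le j < i} b_j b_i \alpha^{i-j} \right) \right)^{2}. \tag{1.178}
\]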
The expansion of (1.178) is a complicated sum, so for the sake of simplicity we just focus on the following subsum of the expansion:
\[
\frac{1}{q_m} \sum_{n=0}^{q_m - 1} \sum_{i_1=2}^{m-1} \sum_{j_1=2}^{i_1 - 1} \sum_{i_2=2}^{m-1} \sum_{j_2=2}^{i_2 - 1} \frac{(-1)^{i_1+i_2}}{25}\, b_{j_1} b_{i_1} b_{j_2} b_{i_2}\, \alpha^{i_1 - j_1 + i_2 - j_2}. \tag{1.179}
\]
To evaluate (1.179) we need to know the probability
\[
\Pr[b_{j_1} = b_{i_1} = b_{j_2} = b_{i_2} = 1]. \tag{1.180}
\]
For simplicity assume that $j_1, i_1, j_2, i_2$ are all different. We know $j_1 < i_1$ and $j_2 < i_2$, and without loss of generality we may assume that $i_1 < i_2$. Then we have three options for the range of $j_2$: either (1) $j_2 < j_1$, or (2) $j_1 < j_2 < i_1$, or (3) $i_1 < j_2 < i_2$. As an illustration, consider the first case (the rest are similar):

Case 1: $j_2 < j_1 < i_1 < i_2$. We have the following analog of (1.168) and (1.174):
\[
\Pr[b_{j_1} = b_{i_1} = b_{j_2} = b_{i_2} = 1] = \frac{q_{j_2 - 1}\, q_{j_1 - j_2 - 1}\, q_{i_1 - j_1 - 1}\, q_{i_2 - i_1 - 1}\, q_{m - i_2}}{q_m}, \tag{1.181}
\]
and using (1.165) in (1.181) we obtain the following analog of (1.176):
\[
\Pr[b_{j_2} = b_{j_1} = b_{i_1} = b_{i_2} = 1] = (1 + o(1)) \left(\frac{\alpha}{\sqrt{5}}\right)^{4} A(j_2, j_1, i_1, i_2), \tag{1.182}
\]
where $A(j_2, j_1, i_1, i_2)$ equals the product
\[
\left(1 + (-1)^{j_1 - j_2}\alpha^{2(j_1 - j_2 - 1)}\right) \left(1 + (-1)^{i_1 - j_1}\alpha^{2(i_1 - j_1 - 1)}\right) \left(1 + (-1)^{i_2 - i_1}\alpha^{2(i_2 - i_1 - 1)}\right).
\]
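Note that the $o(1)$ in (1.182) makes a negligible contribution in the variance. To evaluate sum (1.179) in Case 1, by (1.182) we have to determine the following sum
\[
\sum_{2 \le j_2 < j_1 < i_1 < i_2 < m} A(j_2, j_1, i_1, i_2)\, B(j_2, j_1, i_1, i_2) \tag{1.183}
\]
where
\[
B(j_2, j_1, i_1, i_2) = (-1)^{i_1 + i_2} \alpha^{i_1 - j_1 + i_2 - j_2}.
\]
It is convenient to introduce new variables for the differences $d_1 = i_2 - i_1$, $d_2 = i_1 - j_1$, $d_3 = j_1 - j_2$. Multiplying out in (1.183) and using the new variables, we have
\[
\text{sum (1.183)} = m \sum_{d_1 \ge 1} \sum_{d_2 \ge 1} \sum_{d_3 \ge 1} (C_1 + \ldots + C_8) + O(1), \tag{1.184}
\]
where
\[
\begin{aligned}
C_1 &= (-1)^{d_1} \alpha^{d_1 + 2d_2 + d_3}, &
C_2 &= \alpha^{3d_1 + 2d_2 + d_3 - 2}, \\
C_3 &= (-1)^{d_1 + d_2} \alpha^{d_1 + 4d_2 + d_3 - 2}, &
C_4 &= (-1)^{d_1 + d_3} \alpha^{d_1 + 2d_2 + 3d_3 - 2}, \\
C_5 &= (-1)^{d_2} \alpha^{3d_1 + 4d_2 + d_3 - 4}, &
C_6 &= (-1)^{d_3} \alpha^{3d_1 + 2d_2 + 3d_3 - 4}, \\
C_7 &= (-1)^{d_1 + d_2 + d_3} \alpha^{d_1 + 4d_2 + 3d_3 - 4}, &
C_8 &= (-1)^{d_2 + d_3} \alpha^{3d_1 + 4d_2 + 3d_3 - 6}.
\end{aligned}
\]
For illustration, we just evaluate the contribution of $C_1$:
\[
\sum_{d_1 \ge 1} \sum_{d_2 \ge 1} \sum_{d_3 \ge 1} C_1 = \sum_{d_1 \ge 1} \sum_{d_2 \ge 1} \sum_{d_3 \ge 1} (-1)^{d_1} \alpha^{d_1 + 2d_2 + d_3} = (D_1 - 1)(D_2 - 1)(D_3 - 1) \tag{1.185}
\]
where
\[
D_1 = 1 - \alpha + \alpha^2 - \alpha^3 + \alpha^4 - \cdots = \frac{1}{1+\alpha}, \quad
D_2 = 1 + \alpha^2 + \alpha^4 + \alpha^6 + \cdots = \frac{1}{1-\alpha^2}, \quad
D_3 = 1 + \alpha + \alpha^2 + \alpha^3 + \alpha^4 + \cdots = \frac{1}{1-\alpha}.
\]
Since $1 - \alpha^2 = \alpha$, it follows that
\[
\text{sum (1.185)} = \frac{-\alpha}{1+\alpha} \cdot \frac{\alpha^2}{1-\alpha^2} \cdot \frac{\alpha}{1-\alpha} = -\frac{\alpha^4}{(1-\alpha^2)^2} = -\alpha^2 = \frac{\sqrt{5}-3}{2}.
\]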
Repeating the same elementary argument (evaluation of geometric series) a few dozen times (due to the large number of cases), eventually we conclude that the variance [see (1.178)] is $c_0 m + O(1)$, where $c_0 > 0$ is some positive absolute constant [its value happens to be $c_0 = (60\sqrt{5})^{-1}$]. Unfortunately, this kind of direct computation of the constant factor is annoyingly cumbersome even for the golden ratio, which has the simplest continued fraction. For an arbitrary quadratic irrational, where the period of the continued fraction can be very complicated, the direct approach to determine the variance (or the expectation) outlined above is hopelessly messy. In the next sections we will show an alternative, indirect method (involving Dedekind sums, Fourier analysis, and deeper algebraic number theory), which gives the expectation and the variance in a faster/simpler way (in the form of an explicit finite sum) in all cases. Note that in general the expectation is not $O(1)$, i.e., we don't have the luxury of a telescoping sum leading to a total cancellation like (1.177).

Let's return to the variance of the golden ratio $\alpha = (\sqrt{5}-1)/2$ [see (1.178)]:
\[
\frac{1}{q_m} \sum_{n=0}^{q_m - 1} \left( \sum_{i=2}^{m-1} (-1)^{i+1} \left( -\frac{2}{5} b_i + \frac{1}{5} \sum_{2 \le j < i} b_j b_i \alpha^{i-j} \right) \right)^{2} = \frac{m}{60\sqrt{5}} + O(1). \tag{1.186}
\]
For a shortcut way of computing $c_0$, see Sect. 3.2 [in particular, formula (3.102)].

Next we study the distribution of the sum [the critical part of $S_\alpha(n)$, see (1.171)]
\[
S(n) = \sum_{i=2}^{m-1} (-1)^{i+1} \left( -\frac{2}{5} b_i + \frac{1}{5} \sum_{2 \le j < i} b_j b_i \alpha^{i-j} \right) \tag{1.187}
\]
as $0 \le n < q_m$ [note that $b_i\left(\frac{1}{2} - \frac{b_i}{10}\right) = \frac{2}{5} b_i$ if $b_i = 1$ or $0$]. Here $b_i = b_i(n)$ form 0-1 valued random variables with the distribution [see (1.175)]
\[
\Pr[b_i = 1] = \frac{\alpha}{\sqrt{5}} + o(1) \quad\text{and}\quad \Pr[b_i = 0] = 1 - \frac{\alpha}{\sqrt{5}} + o(1). \tag{1.188}
\]
The random variables bi D bi .n/ are not independent (as n runs in the interval 0 n < qm )—instead they form a Markov chain (a consequence of the Extra Rule that bi D 1 implies bi 1 D 0). However, due to the small error o.1/ in (1.188), the Markov chain is not homogeneous, which implies that the simple trick used in Sect. 1.4 breaks down: working with “0” and “10” instead of “0” and “1” does not lead to independent random variables. Instead it leads to almost independent random variables (since we just make an exponentially small error by deleting the term o.1/ in (1.188), and the total effect of these small errors is negligible). This means, we cannot directly apply the central limit theorem of probability theory (which is developed for independent random variables), but instead we are forced to repeat the whole proof (hoping that almost independence is just as good as perfect independence). To avoid repeating the whole probability theory proof, we try to preserve perfect independence.
1.5.1 Constructing the Underlying (Homogeneous) Markov Chain

To preserve independence, we sacrifice the Ostrowski representation (1.166) and switch to a "golden ratio scale" representation of real numbers. Instead of taking any integer $n$ in $0 \le n < q_m$ where
\[
q_m = \frac{1}{\sqrt{5}} \left( \left(\frac{\sqrt{5}+1}{2}\right)^{m} - \left(\frac{1-\sqrt{5}}{2}\right)^{m} \right) = \frac{1}{\sqrt{5}} \left( \beta^{m} - (-\beta)^{-m} \right),
\]
we write an arbitrary real number $\rho$ in $0 < \rho < \frac{1}{\sqrt{5}}\beta^{m}$, where $\beta = (\sqrt{5}+1)/2$, in the form
\[
\rho = \frac{1}{\sqrt{5}} \left( b^*_{m-1}\beta^{m-1} + b^*_{m-2}\beta^{m-2} + b^*_{m-3}\beta^{m-3} + \cdots \right), \tag{1.189}
\]
which is a perfect analog of (1.118). That is, (1.189) is a kind of $\beta = (\sqrt{5}+1)/2$ scale representation of $\rho$. Since $1 < \beta = (\sqrt{5}+1)/2 < 2$, in (1.189) we can take $b^*_i = b^*_i(\rho) \in \{0, 1\}$ for every $i < m$. Also, by using the equation
\[
1 + \beta = \beta^2 \iff \beta^{i} + \beta^{i+1} = \beta^{i+2}, \tag{1.190}
\]
we can guarantee uniqueness in (1.189) by enforcing the Extra Rule
\[
b^*_i = 1 \text{ implies } b^*_{i-1} = 0. \tag{1.191}
\]
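The digits in (1.189) can be generated greedily. The following sketch is an illustration under stated assumptions: it writes $\rho$ for the real number being expanded and $b^*_i$ for its digits, the function name and the cutoff `lowest` are hypothetical, and floating-point arithmetic is used, so digits at very small scales are not reliable. Taking the power $\beta^i$ whenever it fits is one standard way to produce a digit sequence that obeys the Extra Rule (1.191).

```python
import math

BETA = (math.sqrt(5.0) + 1.0) / 2.0
SQRT5 = math.sqrt(5.0)


def golden_scale_digits(rho, m, lowest=-30):
    """Greedy digits b*_i, i = m-1, ..., lowest, with sqrt(5)*rho ~ sum_i b*_i * BETA**i.

    Floating-point sketch: the expansion is truncated at index `lowest`.
    """
    if not 0 < rho < BETA ** m / SQRT5:
        raise ValueError("rho must lie in the interval (0, BETA**m / sqrt(5))")
    remainder = SQRT5 * rho
    digits = {}
    for i in range(m - 1, lowest - 1, -1):
        power = BETA ** i
        if power <= remainder:         # take BETA**i whenever it fits
            digits[i] = 1
            remainder -= power
        else:
            digits[i] = 0
    return digits


if __name__ == "__main__":
    d = golden_scale_digits(7.3, 8)
    # Print the digits b*_7 ... b*_{-5} as one string, highest index first.
    print("".join(str(d[i]) for i in range(7, -6, -1)))
```

Because the expansion is cut off at `lowest`, the partial sum reproduces $\rho$ only up to an error of order $\beta^{\,\text{lowest}}$.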
The fundamental difference between Ostrowski's expansion (1.166) and the golden ratio expansion (1.189) is that the latter is infinite: for $\rho = n$ (i.e., for an integer) representation (1.189) has infinitely many digits $b^*_i = b^*_i(\rho)$ (namely, for all $i < m$, including all negative integers for $i$). Since the key equation (1.187) is expressed in terms of the Ostrowski coefficients $b_i = b_i(n)$, we have to clarify the relation between $b_i$ and $b^*_i$. The idea is to replace $S(n)$ in (1.187) with the new sum
\[
S^*(\rho) = \sum_{i=2}^{m-1} (-1)^{i+1} \left( -\frac{2}{5} b^*_i(\rho) + \frac{1}{5} \sum_{2 \le j < i} b^*_j(\rho)\, b^*_i(\rho)\, \alpha^{i-j} \right) \tag{1.192}
\]
where the real number $\rho$ runs in the interval $0 < \rho < \beta^{m}/\sqrt{5}$ and $b^*_i = b^*_i(\rho) \in \{0, 1\}$ are the digits in the $\beta = (\sqrt{5}+1)/2$ scale representation of $\rho$ [see (1.189); we ignore the digits $b^*_i$ with $i < 2$; instead of 2 we could take any other starting value in (1.192)]. The advantage of working with $b^*_i$ instead of $b_i$ is that the random variables $b^*_i = b^*_i(\rho)$ [as $\rho$ runs in the interval $0 < \rho < \beta^{m}/\sqrt{5}$ with $\beta = (\sqrt{5}+1)/2$] form a homogeneous Markov chain; consequently, we can restore independence by using the familiar trick to switch from "0" and "1" to "0" and "10."

Let's compare (1.166) to (1.189). We study the equation
$b_i(n) = b^*_i(\rho)$ for all $n$
1 1
(1.193)
It is easy to see that for the overwhelming majority of integers $n$ in $0 \le n < q_m$ ($m$th Fibonacci number), (1.193) holds for every $m > i > C$, where $C = O(1)$ is a large constant (the larger the constant the larger the majority). It follows that for the overwhelming majority of integers $n$ in $0 \le n < q_m$ = $m$th Fibonacci number [see (1.187) and (1.192)]
$S(n) = S^*(\rho) + O(1)$ for all $n$
1 1
(1.194)
By (1.194) it suffices to prove the central limit theorem for the sum $S^*(\rho)$ defined in (1.192). Let's return to representation (1.189); the random variables $b^*_i = b^*_i(\rho)$ (as the real number $\rho$ runs in the special interval $0 < \rho < \beta^{m}/\sqrt{5}$) do form a homogeneous Markov chain. (In fact, the special interval $0 < \rho < \beta^{m}/\sqrt{5}$ guarantees that the initial distribution is the stationary distribution.) This Markov chain has only two states, 0 and 1, and we can visualize the whole thing as an asymmetric random walk on a two-vertex graph, where the vertices are labeled 0 and 1. The transition probabilities are the following: the loop $0 \to 0$ has probability $\alpha = (\sqrt{5}-1)/2$, $0 \to 1$ has probability $\alpha^2$, and $1 \to 0$ has probability 1 (mandatory step).

In other words, the two-by-two transition matrix $A = (p_{i,j})$ with the transition probabilities $p_{i,j}$ = "probability to go from state $i$ to state $j$ (in one step)" is
\[
A = \begin{pmatrix} p_{0,0} & p_{0,1} \\ p_{1,0} & p_{1,1} \end{pmatrix} = \begin{pmatrix} \alpha & \alpha^2 \\ 1 & 0 \end{pmatrix} \tag{1.195}
\]
where $\alpha = (\sqrt{5}-1)/2$. The long-term behavior of the Markov chain is described by the stationary distribution $q = (q_0, q_1)$, which is a probability distribution satisfying the equation
\[
q = qA; \quad\text{that is,}\quad q_j = q_0 p_{0,j} + q_1 p_{1,j}, \quad j = 0, 1.
\]
A simple calculation gives that
\[
q_0 = 1 - \frac{\alpha}{\sqrt{5}}, \qquad q_1 = \frac{\alpha}{\sqrt{5}},
\]
which is exactly (1.188) without the (exponentially small) error terms $o(1)$.
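The "simple calculation" for the stationary distribution can be double-checked numerically. The sketch below is only an illustration with hypothetical helper names; it uses plain Python lists rather than a linear algebra library. It verifies that $q = qA$ for the matrix (1.195) and that high powers of $A$ have rows close to $q$.

```python
import math

ALPHA = (math.sqrt(5.0) - 1.0) / 2.0
A = [[ALPHA, ALPHA ** 2],      # row of state 0 in (1.195)
     [1.0, 0.0]]               # row of state 1: mandatory step 1 -> 0
Q = [1.0 - ALPHA / math.sqrt(5.0), ALPHA / math.sqrt(5.0)]   # claimed stationary law


def mat_mul(x, y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_pow(x, k):
    """k-th power of a 2x2 matrix by repeated multiplication (k >= 0)."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(k):
        result = mat_mul(result, x)
    return result


def left_apply(q, x):
    """Row vector times matrix: (qA)_j = q_0 * x[0][j] + q_1 * x[1][j]."""
    return [sum(q[i] * x[i][j] for i in range(2)) for j in range(2)]


if __name__ == "__main__":
    print(left_apply(Q, A), Q)     # the two vectors should agree up to rounding
    print(mat_pow(A, 40))          # both rows should be close to Q
```

The rows of $A$ sum to 1 because $\alpha + \alpha^2 = 1$ for $\alpha = (\sqrt{5}-1)/2$.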
The $k$th power $A^k$ of the transition matrix represents the $k$-step transition probabilities $p_{i,j}(k)$ = "probability to go from state $i$ to state $j$ in $k$ steps": $A^k = (p_{i,j}(k))_{i,j}$ where $i = 0, 1$ and $j = 0, 1$. The eigenvalues $\lambda_1$ and $\lambda_2$ of the transition matrix are $\lambda_1 = 1$ and $\lambda_2 = -\alpha^2 = (\sqrt{5}-3)/2$, so we can rewrite the transition matrix as
\[
A = B^{-1} \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} B
\]
for some invertible matrix $B$, and so we have
\[
A^k = B^{-1} \begin{pmatrix} \lambda_1^k & 0 \\ 0 & \lambda_2^k \end{pmatrix} B = B^{-1} \begin{pmatrix} 1 & 0 \\ 0 & \left((\sqrt{5}-3)/2\right)^{k} \end{pmatrix} B. \tag{1.196}
\]
By (1.196) we have the following simple formula for the $k$-step transition probabilities:
\[
p_{i,j}(k) = q_j + c_{i,j} \left( (\sqrt{5}-3)/2 \right)^{k}, \tag{1.197}
\]
where $q_0 = 1 - \alpha/\sqrt{5}$, $q_1 = \alpha/\sqrt{5}$ is the stationary distribution, and $c_{i,j}$ are appropriate constants independent of $k$. Since $|(\sqrt{5}-3)/2| < 1$ (in fact $< 1/2$), (1.197) tells us that the $k$-step transition probability $p_{i,j}(k)$ converges to $q_j$, as $k \to \infty$, exponentially fast.

Working with the special interval $0 < \rho < \beta^{m}/\sqrt{5}$ guarantees that the initial distribution of the Markov chain is in fact the stationary distribution. We can switch from the Markov chain back to independence by using the familiar trick of using "10" instead of "1." To use "10" instead of "1" means that we decompose the digit sequence $b^*_{m-1} b^*_{m-2} \cdots b^*_2$ into 0s and 10s, that is,
\[
b^*_{m-1} b^*_{m-2} \cdots b^*_2 = 0\ 0\ 10\ 0\ 0\ 10\ 10\ 0\ 10\ 0 \cdots,
\]
or formally
\[
b^*_{m-1} b^*_{m-2} \cdots b^*_2 = B_1 B_2 \cdots B_t \tag{1.198}
\]
where $B_i = 0$ or $10$, and $t$ is a random variable. We have
\[
\Pr[B_i = 0] = \alpha \quad\text{and}\quad \Pr[B_i = 10] = \alpha^2, \tag{1.199}
\]
where Pr ("probability") means $\sqrt{5}/\beta^{m}$ times the one-dimensional Lebesgue measure (as $\rho$ runs in $0 < \rho < \beta^{m}/\sqrt{5}$). Equation (1.199) comes from the first row of the transition matrix in (1.195) (the second row explains the use of "10" instead of "1").
By using standard probability theory, we can easily describe the asymptotic behavior of the random variable $t$ in (1.198). By the strong law of large numbers the sequence $B_1 B_2 \cdots B_t$ in (1.198) consists of $t_0$ 0s and $t_1$ 10s (of course both $t_0$ and $t_1$ are random variables), where $t_0 + t_1 = t$. By (1.199) we have
\[
\frac{t_1}{t} = \Pr[B_i = 10] + o(1) = \alpha^2 + o(1). \tag{1.200}
\]
On the other hand, (1.198) consists of $t_0 + t_1$ 0s and $t_1$ 1s, and so $t_0 + 2t_1 = t + t_1 = m + O(1)$. Thus by (1.200) we have
\[
t = \frac{m}{1 + \frac{t_1}{t}} + O(1) = \frac{m}{1 + \alpha^2} + o(m) = \frac{\beta}{\sqrt{5}}\, m + o(m) \tag{1.201}
\]
where $\beta = (\sqrt{5}+1)/2 = \alpha^{-1}$.
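The block count in (1.201) is easy to simulate. The sketch below is an illustration only: the function name, the seed, and the value of $m$ are assumptions. It draws i.i.d. blocks "0" and "10" with the probabilities (1.199) until about $m$ digits are written, and compares the number $t$ of blocks with $\beta m/\sqrt{5}$.

```python
import math
import random

ALPHA = (math.sqrt(5.0) - 1.0) / 2.0
BETA = 1.0 / ALPHA


def sample_block_sequence(m, rng):
    """Draw i.i.d. blocks "0" (prob. ALPHA) or "10" (prob. ALPHA**2) until at least m digits."""
    blocks, digits = [], 0
    while digits < m:
        block = "0" if rng.random() < ALPHA else "10"
        blocks.append(block)
        digits += len(block)
    return blocks


if __name__ == "__main__":
    rng = random.Random(5)          # arbitrary seed, only for reproducibility
    m = 100_000
    blocks = sample_block_sequence(m, rng)
    t = len(blocks)
    t1 = blocks.count("10")
    print(t / m, BETA / math.sqrt(5.0))   # compare with (1.201)
    print(t1 / t, ALPHA ** 2)             # compare with (1.200)
```

By construction the concatenated digit string never contains two adjacent 1s, which is exactly the Extra Rule (1.191) expressed in terms of blocks.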
1.5.2 How to Approximate with a Sum of Independent Random Variables

Next we develop a variant of the decomposition trick in Sect. 1.3. Let
\[
1 \le L = L(m) < R = R(m) \tag{1.202}
\]
be two integral parameters to be specified later; $L$ will be around constant times $\log m$ and $R = m^{\delta}$ with some constant $0 < \delta < 1/2$. We decompose the sequence $B_1 B_2 \cdots B_t$ in (1.198) into groups of size $R = R(m)$:
\[
\begin{aligned}
& B_1, B_2, B_3, \ldots, B_R; \\
& B_{R+1}, B_{R+2}, B_{R+3}, \ldots, B_{2R}; \\
& B_{2R+1}, B_{2R+2}, B_{2R+3}, \ldots, B_{3R}; \text{ and so on.}
\end{aligned} \tag{1.203}
\]
Let $B_{(i-1)R+j}$ be an arbitrary element of the $i$th group in (1.203), where $i = 1, 2, 3, \ldots$ and $1 \le j \le R$. We have $B_{(i-1)R+j} = 0$ or $10$, that is, $B_{(i-1)R+j} = b^*_k$ or $b^*_k b^*_{k-1}$ for some $k$ [see (1.198)]; let $I_i$ denote the set of all indices $k$ or $k, k-1$ that occur this way in the $i$th group. Notice that $I_i$ is an interval (of consecutive integers), and these intervals are decreasing: the elements of $I_{i+1}$ are all smaller than those of $I_i$. By using decomposition (1.203) we can rewrite (1.192) in the form
\[
S^*(\rho) = X_1 + X_2 + X_3 + \cdots + Y_1 + Y_2 + Y_3 + \cdots + W, \tag{1.204}
\]
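where [$\alpha = (\sqrt{5}-1)/2$]
\[
X_i = \frac{2}{5} \sum_{k \in I_i} (-1)^{k} b^*_k - \frac{1}{5} \sum_{k, \ell \in I_i:\ \ell < k} (-1)^{k} b^*_k b^*_\ell\, \alpha^{k-\ell}, \tag{1.205}
\]
\[
Y_i = \frac{1}{5} \sum_{k \in I_i,\ \ell \in I_{i+1}:\ 0 < k - \ell \le L} (-1)^{k+1} b^*_k b^*_\ell\, \alpha^{k-\ell}, \tag{1.206}
\]
and finally
\[
W = \frac{1}{5} \sum_{k > \ell:\ k - \ell > L} (-1)^{k+1} b^*_k b^*_\ell\, \alpha^{k-\ell}. \tag{1.207}
\]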
Again the main idea is that the first sum $X_1 + X_2 + X_3 + \cdots$ in (1.204) is the dominant part, that is, we plan to show that both $Y_1 + Y_2 + Y_3 + \cdots$ and $W$ are negligible compared to $X_1 + X_2 + X_3 + \cdots$. Since $0 < \alpha < 1$, the term $W$ becomes negligible if we choose $L = c \log m$ (say, $c = 6$ is a safe choice). Since the intervals $I_i$ do not break up the pairs $B_j = 10$, and $B_1, B_2, B_3, \ldots$ are independent and identically distributed random variables, we almost obtain that $X_1, X_2, X_3, \ldots$ are independent and identically distributed random variables, and similarly we almost obtain that $Y_1, Y_2, Y_3, \ldots$ are also independent and identically distributed random variables. We said almost, because the alternating nature of the terms $(-1)^{k} b^*_k$ in (1.205) and (1.206) causes a minor technical problem. But it is easy to solve this "parity problem" by enforcing that each group in (1.203) contains an even number of singles $B_j = 0$ (notice that a pair $B_j = 10$ does not change the parity). If each group has even size, then the parity inside a group is the same as the absolute parity. As a by-product, it follows that if we know a group in terms of a sequence of 0s and 10s but we don't know the location of the group, then we can still evaluate the corresponding sum (1.205). Similarly, if we know two consecutive groups in terms of a sequence of 0s and 10s but we don't know the location, then we can still evaluate the corresponding sum (1.206).
1.5.3 Solving the Parity Problem

Here is the simple solution: assume that the first group $B_1, B_2, B_3, \ldots, B_R$ in (1.203) contains an odd number of singles $B_j = 0$, then we just keep searching the sequence $B_{R+1}, B_{R+2}, B_{R+3}, \ldots$ for the first single $B_{R+i} = 0$; let $i = \nu_1$ denote the first $i \ge 1$ such that $B_{R+i} = 0$. This $\nu_1$ is a random variable. We show that $\nu_1$ is "negligible" compared to $R$. Indeed, by (1.199),
\[
\Pr[\nu_1 = i] = (1 - \alpha)^{i-1} \alpha = \alpha^{2i-1}, \quad\text{where } \alpha = (\sqrt{5}-1)/2. \tag{1.208}
\]
Due to the exponentially small tail probability in (1.208), we may say that the unbounded random variable $\nu_1$ is "essentially bounded." More precisely, with probability extremely close to one, we have $\nu_1 = O(\log m)$. This clearly implies that $\nu_1$ is negligible compared to $R$. If the first group $B_1, B_2, B_3, \ldots, B_R$ in (1.203) contains an even number of singles $B_j = 0$, then we define $\nu_1 = 0$. The introduction of $\nu_1$ means that we extend the first row in (1.203) by adding $\nu_1$ extra terms:
\[
B_1, B_2, B_3, \ldots, B_R, \ldots, B_{R+\nu_1}.
\]
Next assume that $B_{R+\nu_1+1}, B_{R+\nu_1+2}, B_{R+\nu_1+3}, \ldots, B_{2R+\nu_1}$ in (1.203) contains an odd number of singles $B_j = 0$, then we again keep searching the sequence $B_{2R+\nu_1+1}, B_{2R+\nu_1+2}, B_{2R+\nu_1+3}, \ldots$ for the first single $B_{2R+\nu_1+i} = 0$; let $i = \nu_2$ denote the first $i \ge 1$ such that $B_{2R+\nu_1+i} = 0$. This $\nu_2$ is a random variable, and of course we have the analog of (1.208):
\[
\Pr[\nu_2 = i] = (1 - \alpha)^{i-1} \alpha = \alpha^{2i-1}, \quad\text{where } \alpha = (\sqrt{5}-1)/2,
\]
implying that, with probability extremely close to one, we have $\nu_2 = O(\log m)$. That is, $\nu_2$ is negligible compared to $R$. If the group $B_{R+\nu_1+1}, B_{R+\nu_1+2}, B_{R+\nu_1+3}, \ldots, B_{2R+\nu_1}$ in (1.203) contains an even number of singles $B_j = 0$, then we define $\nu_2 = 0$. The introduction of $\nu_2$ means that we modify the second row in (1.203) as follows:
\[
B_{R+\nu_1+1}, B_{R+\nu_1+2}, B_{R+\nu_1+3}, \ldots, B_{2R+\nu_1}, \ldots, B_{2R+\nu_1+\nu_2},
\]
and so on. At the end of the procedure we obtain the following modified version of (1.203):
\[
\begin{aligned}
& B_1, B_2, B_3, \ldots, B_{R+\nu_1}; \\
& B_{R+\nu_1+1}, B_{R+\nu_1+2}, B_{R+\nu_1+3}, \ldots, B_{2R+\nu_1+\nu_2}; \\
& B_{2R+\nu_1+\nu_2+1}, B_{2R+\nu_1+\nu_2+2}, B_{2R+\nu_1+\nu_2+3}, \ldots, B_{3R+\nu_1+\nu_2+\nu_3}; \text{ and so on.}
\end{aligned} \tag{1.209}
\]
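The regrouping procedure that leads to (1.209) can be written as a short routine. This is a sketch under stated assumptions: blocks are represented as the strings "0" and "10", the function name is hypothetical, and an incomplete last group is simply dropped, in the spirit of ignoring the last "broken piece."

```python
import random


def parity_groups(blocks, group_size):
    """Split a list of blocks ("0" or "10") into consecutive groups as in (1.209).

    Each group starts with `group_size` blocks. If it then contains an odd number
    of singles "0", it is extended up to and including the next single "0".
    A trailing group that cannot be completed is dropped.
    """
    groups, start, n = [], 0, len(blocks)
    while start + group_size <= n:
        end = start + group_size
        if blocks[start:end].count("0") % 2 == 1:
            # Search for the first single "0" after the initial R blocks.
            while end < n and blocks[end] != "0":
                end += 1
            if end == n:
                break                  # no further single: drop the broken piece
            end += 1                   # include that single in the group
        groups.append(blocks[start:end])
        start = end
    return groups


if __name__ == "__main__":
    rng = random.Random(3)             # arbitrary seed, only for reproducibility
    alpha = (5 ** 0.5 - 1) / 2
    blocks = ["0" if rng.random() < alpha else "10" for _ in range(60)]
    for group in parity_groups(blocks, 10):
        print(len(group), group.count("0"))   # second number is always even
```

The number of extra blocks appended to a group plays the role of the waiting time in (1.208); since each block is a single "0" with probability $\alpha$, this waiting time has a geometric distribution.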
There is no change in the rest: again let $B$ be an arbitrary element of the $i$th group in (1.209); we have $B = 0$ or $10$, that is, $B = b^*_k$ or $b^*_k b^*_{k-1}$ for some $k$ [see (1.198)]; let $I_i$ denote the set of all indices $k$ or $k, k-1$ that occur this way in the $i$th group. Notice that $I_i$ is an interval (of consecutive integers), and these intervals are decreasing. We redefine (1.204)–(1.207) by using the new decomposition (1.209) instead of the old (1.203). This way we really obtain that $X_1, X_2, X_3, \ldots$ are independent and identically distributed random variables, and similarly $Y_1, Y_2, Y_3, \ldots$ are also independent and identically distributed random variables. Note, however, that $Y_i$ does depend on both $X_i$ and $X_{i+1}$, but $Y_i$ is independent of all $X_\ell$ with $\ell \notin \{i, i+1\}$.

It is perfectly natural to expect that the subsums
\[
S_X = X_1 + X_2 + X_3 + \cdots \quad\text{and}\quad S_Y = Y_1 + Y_2 + Y_3 + \cdots \tag{1.210}
\]
in (1.204) both satisfy the central limit theorem. But there is a minor technical difficulty here: the number of terms in both sums is a random variable. By (1.201)
\[
S_X = X_1 + X_2 + \cdots + X_\tau \quad\text{and}\quad S_Y = Y_1 + Y_2 + \cdots + Y_{\tau - 1} \tag{1.211}
\]
where the random variable $\tau = \tau(\rho)$ approximately equals
\[
\tau = \tau(\rho) = (1 + o(1))\, \frac{\beta m}{\sqrt{5}\, R}. \tag{1.212}
\]
Note that the last, possibly "broken pieces" of $X_i$ and $Y_i$ are ignored in (1.211); the corresponding error is a negligible $O(R)$. We may look at the random variable $\tau = \tau(\rho)$ as a "random stopping," since one can easily extend $X_1, X_2, X_3, \ldots$ into an infinite sequence of independent and identically distributed random variables. [Indeed, $\rho$ in (1.189) has infinitely many digits $b^*_i = b^*_i(\rho)$, so we can define $X_i$ and $Y_i$ in (1.205) and (1.206) for infinitely many $i$.]

Let's return to (1.212): by Chebyshev's inequality we can replace the weak (1.212) with the following stronger square root law:
\[
\left| \tau(\rho) - \frac{\beta m}{\sqrt{5}\, R} \right| = O\left( (m/R)^{1/2} \right), \tag{1.213}
\]
which holds for the overwhelming majority of $\rho$s in $0 < \rho < \beta^{m}/\sqrt{5}$. Since the number $\tau = \tau(\rho)$ of terms in the random sum $S_X$ [see (1.211)] is a random variable typically in the range [see (1.213)]
\[
\tau = \tau(\rho) = \tau_0 + O(\sqrt{\tau_0}) \quad\text{where } \tau_0 = \frac{\beta m}{\sqrt{5}\, R}, \tag{1.214}
\]
it would be nice to have a good upper bound on the probability that, with $d = O(\sqrt{\tau_0})$,
\[
\max_{1 \le j \le d} \left| X_{\tau_0 + 1} + X_{\tau_0 + 2} + \cdots + X_{\tau_0 + j} \right| \text{ is "large".} \tag{1.215}
\]
Notice that $d = O(\sqrt{\tau_0})$ is the typical deviation of the random variable $\tau = \tau(\rho)$ [see (1.214)]. Taking $\max_{1 \le j \le d}$ in (1.215) means that we can handle even the worst case scenario in (1.211). To solve subproblem (1.215), we use the well-known Kolmogorov's inequality (a combination of Chebyshev's inequality and the reflection principle; see, e.g., in Feller's book [Fe1, Fe2]), which is designed exactly to estimate the probability of maximum-type questions such as (1.215).

Kolmogorov's inequality. If $X_1, X_2, X_3, \ldots, X_d$ are independent random variables with variance $V_i = \mathbf{E}(X_i - \mathbf{E}X_i)^2$, $1 \le i \le d$, then for any $\lambda > 0$,
\[
\Pr\left[ \max_{1 \le j \le d} \left| \sum_{i=1}^{j} (X_i - \mathbf{E}X_i) \right| \ge \lambda \left( \sum_{i=1}^{d} V_i \right)^{1/2} \right] \le \frac{1}{\lambda^2}. \tag{1.216}
\]
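As a sanity check, the bound (1.216) can be compared with a simulation in the simplest case of fair $\pm 1$ steps (mean 0, variance 1, so $\sum V_i = d$). The sketch below is purely illustrative: the function names, the seed, and the parameters are assumptions, and an empirical frequency can only make the inequality plausible, not prove it.

```python
import math
import random


def max_partial_sum_exceeds(d, lam, rng):
    """One trial with d fair +-1 steps: True if max_j |S_j| >= lam * sqrt(d)."""
    threshold = lam * math.sqrt(d)     # lam * (V_1 + ... + V_d)**0.5 with V_i = 1
    partial_sum = 0
    for _ in range(d):
        partial_sum += rng.choice((-1, 1))
        if abs(partial_sum) >= threshold:
            return True
    return False


def empirical_frequency(d, lam, trials, rng):
    """Fraction of trials in which the maximal partial sum reaches the threshold."""
    hits = sum(max_partial_sum_exceeds(d, lam, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(1)             # arbitrary seed, only for reproducibility
    for lam in (1.5, 2.0, 3.0):
        # Left: simulated frequency; right: Kolmogorov's upper bound 1/lam**2.
        print(lam, empirical_frequency(100, lam, 5000, rng), 1.0 / lam ** 2)
```

For these symmetric steps the observed frequencies are typically well below $1/\lambda^2$, which reflects the fact that (1.216) uses only the variances and no other information about the distribution.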
Now we are ready to compare the sum
\[
S_X = X_1 + X_2 + \cdots + X_\tau \tag{1.217}
\]
with a random number of terms (i.e., $\tau$ is a random variable, a "random stopping") to the sum
\[
S_X' = X_1 + X_2 + \cdots + X_{\tau_0} \tag{1.218}
\]
with a fixed number of terms; notice that $\tau_0 = \beta m (\sqrt{5} R)^{-1}$ is a constant. By Kolmogorov's inequality and (1.215), with probability $1 - o(1)$ [i.e., by choosing $\lambda$ in (1.216) as a large constant] the discrepancy between (1.217) and (1.218) is estimated from above by
\[
O\left( (dV)^{1/2} \right) \quad\text{where } d = O(\sqrt{\tau_0}) = O\left( (m/R)^{1/2} \right) \text{ and } V = O(R). \tag{1.219}
\]
The main point here is that the discrepancy in (1.219) is negligible compared to $m^{1/2}$ = the standard deviation of the central limit theorem (see Theorem 1.2): indeed, by (1.219)
\[
O\left( (dV)^{1/2} \right) = O\left( \sqrt{R \sqrt{m/R}} \right) = O\left( m^{1/4} R^{1/4} \right) \tag{1.220}
\]
is negligible compared to $m^{1/2}$ if we choose the parameter $R = R(m)$ to be much smaller than $m$ (note in advance that we will choose $R = m^{1/5}$). It is worthwhile to know that, by using the plain Chebyshev's inequality instead of the subtle Kolmogorov's inequality, formula (1.220) would become
\[
O\left( d V^{1/2} \right) = O\left( \sqrt{m/R}\, \sqrt{R} \right) = O\left( m^{1/2} \right),
\]
and the too large error would kill Theorem 1.2. It is critical, therefore, to apply Kolmogorov's inequality.

Now we are ready to repeat the argument of Sect. 1.3; see (1.81)–(1.102). We choose
\[
L = L(m) = O(\log m) \quad\text{and}\quad R = R(m) = m^{\delta}
\]
with some appropriate constant $0 < \delta < 1/2$ (to be specified later). We have the analog of (1.196):
\[
X_k = O(R) = O(m^{\delta}), \qquad Y_k = O(1), \tag{1.221}
\]
and also, by independence, we have the analog of the first half of (1.197):
\[
\mathbf{E}\left[ (X_k - \mathbf{E}X_k)(Y_\ell - \mathbf{E}Y_\ell) \right] = 0 \quad\text{if } \ell \notin \{k, k+1\}, \tag{1.222}
\]
but the second half
\[
\mathbf{E}\left[ (X_k - \mathbf{E}X_k)(Y_\ell - \mathbf{E}Y_\ell) \right] = O(1) \quad\text{if } \ell \in \{k, k+1\} \tag{1.223}
\]
requires a new argument. To prove (1.223), we use (1.197) to estimate the covariance
\[
\operatorname{cov}(b^*_j, b^*_i) = \mathbf{E}\left[ (b^*_j - \mathbf{E}b^*_j)(b^*_i - \mathbf{E}b^*_i) \right] = \mathbf{E}b^*_j b^*_i - \mathbf{E}b^*_j\, \mathbf{E}b^*_i = q_1\, p_{1,1}(i-j) - q_1 q_1 = \frac{\alpha}{\sqrt{5}}\, p_{1,1}(i-j) - \left( \frac{\alpha}{\sqrt{5}} \right)^{2},
\]
where $j < i$, $p_{1,1}(k)$ is the $k$-step transition probability from 1 to 1, and $q_1 = \alpha/\sqrt{5}$ is the "steady-state probability" of state 1 in the stationary distribution. By using (1.197), a routine calculation gives that, with $\alpha = (\sqrt{5}-1)/2$,
\[
\operatorname{cov}(b^*_j, b^*_i) = O\left( \alpha^{2(i-j)} \right). \tag{1.224}
\]
We also need the similar results
\[
\operatorname{cov}(b^*_{j_1} b^*_{i_1},\, b^*_{j_2} b^*_{i_2}) = O\left( \alpha^{2(j_2 - i_1)} \right) \tag{1.225}
\]
for $j_1 < i_1 < j_2 < i_2$, and
\[
\operatorname{cov}(b^*_{j_1} b^*_{i_1},\, b^*_{i_2}) = O\left( \alpha^{2(i_2 - i_1)} \right) \tag{1.226}
\]
for $j_1 < i_1 < i_2$, and
\[
\operatorname{cov}(b^*_{i_1},\, b^*_{j_2} b^*_{i_2}) = O\left( \alpha^{2(j_2 - i_1)} \right) \tag{1.227}
\]
for $i_1 < j_2 < i_2$. By using (1.224)–(1.227), the verification of (1.223) is just a routine calculation that we briefly outline here. By (1.205) and (1.206) we have
\[
\mathbf{E}\left[ (X_k - \mathbf{E}X_k)(Y_\ell - \mathbf{E}Y_\ell) \right] = \frac{1}{25}\, \mathbf{E}\left[ A(i_1, j_1)\, A'(i_2, j_2) \right] \tag{1.228}
\]
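where
\[
A(i_1, j_1) = 2 \sum_{i_1 \in I_k} (-1)^{i_1} \left( b^*_{i_1} - \mathbf{E}b^*_{i_1} \right) - \sum_{i_1, j_1 \in I_k:\ j_1 < i_1} (-1)^{i_1} \left( b^*_{i_1} b^*_{j_1} - \mathbf{E}b^*_{i_1} b^*_{j_1} \right) \alpha^{i_1 - j_1} \tag{1.229}
\]
and
\[
A'(i_2, j_2) = \sum_{i_2 \in I_\ell,\ j_2 \in I_{\ell+1}:\ 0 < i_2 - j_2 \le L} (-1)^{i_2 + 1} \left( b^*_{i_2} b^*_{j_2} - \mathbf{E}b^*_{i_2} b^*_{j_2} \right) \alpha^{i_2 - j_2}. \tag{1.230}
\]
First assume that $\ell = k+1$. Multiplying out (1.228) [and using (1.229) and (1.230)], we obtain terms like
\[
\operatorname{cov}(b^*_{i_1},\, b^*_{j_2} b^*_{i_2})\, \alpha^{i_2 - j_2} \quad\text{where } j_2 < i_2 \tag{1.231}
\]
and
\[
\operatorname{cov}(b^*_{j_1} b^*_{i_1},\, b^*_{j_2} b^*_{i_2})\, \alpha^{i_1 - j_1 + i_2 - j_2} \quad\text{where } j_1 < i_1 \text{ and } j_2 < i_2. \tag{1.232}
\]
By using (1.225) in (1.232), using (1.226) and (1.227) in (1.231), and also using the (absolute) convergence of the geometric series
\[
\sum_{n=0}^{\infty} \alpha^{n} = \frac{1}{1 - \alpha} \quad\text{where } \alpha = \frac{\sqrt{5}-1}{2},
\]
the upper bound
\[
\mathbf{E}\left[ (X_k - \mathbf{E}X_k)(Y_\ell - \mathbf{E}Y_\ell) \right] = O(1)
\]
is a simple routine calculation. The same holds for $\ell = k$. This completes the proof of (1.223).

Since (1.223) is the analog of (1.92), we can proceed just like in Sect. 1.3, and by (1.207) we obtain the error terms
\[
O\left( m^{-\delta/2} \log m \right) + O\left( m^{2\delta - 1/2} \right). \tag{1.233}
\]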
By (1.220) we have the additional error term [due to the randomness of the number of terms in sum $S_X$, see (1.217)]
\[
\frac{O\left( m^{1/4} R^{1/4} \right)}{m^{1/2}} = O\left( m^{(\delta - 1)/4} \right); \tag{1.234}
\]
note that the factor $m^{1/2}$ in the denominator is the standard norming factor in the central limit theorem. To minimize the sum [see (1.233) and (1.234)]
\[
m^{-\delta/2} \log m + m^{2\delta - 1/2} + m^{(\delta - 1)/4},
\]
we choose $\delta = 1/5$ (the same way as we did in Sect. 1.3). Therefore, combining (1.171), (1.187), (1.192)–(1.194), and (1.204), we obtain the following special case of Theorem 1.2 (an analog of Proposition 1.10). First we recall (1.43):
\[
S_\alpha(n) = \sum_{k=1}^{n} \left( \{k\alpha\} - \frac{1}{2} \right). \tag{1.235}
\]
Proposition 1.19 (Central limit theorem for the golden ratio). Consider the special case $\alpha = (\sqrt{5}-1)/2$ in (1.235). Then for any integer $m \ge 2$ and any real numbers $-\infty < A < B < \infty$
\[
\frac{1}{q_m} \left| \left\{ 0 \le n < q_m :\ A \le \frac{S_\alpha(n)}{\sqrt{(60\sqrt{5})^{-1} m}} \le B \right\} \right| = \frac{1}{\sqrt{2\pi}} \int_{A}^{B} e^{-u^2/2}\, du + O\left( m^{-1/10} \log m \right), \tag{1.236}
\]
where $q_m$ is the $m$th Fibonacci number, and the implicit constant in the error term is absolute.
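The quantities in Proposition 1.19 are straightforward to compute for moderate $m$. The sketch below is an illustration only; the function names are assumptions and floating-point fractional parts are used. It evaluates $S_\alpha(n)$ from (1.235) for $0 \le n < q_m$ and reports the empirical mean and variance next to the normalizing variance $m/(60\sqrt{5})$ that appears in (1.236). Since the error term in (1.236) decays only like $m^{-1/10} \log m$, close agreement should not be expected for small $m$.

```python
import math

ALPHA = (math.sqrt(5.0) - 1.0) / 2.0


def fibonacci_number(m):
    """m-th Fibonacci number q_m with q_1 = q_2 = 1."""
    a, b = 0, 1
    for _ in range(m):
        a, b = b, a + b
    return a


def diophantine_sums(limit, alpha=ALPHA):
    """Return [S(0), ..., S(limit-1)] with S(n) = sum_{k=1}^{n} ({k*alpha} - 1/2)."""
    sums, running = [0.0], 0.0
    for k in range(1, limit):
        running += (k * alpha) % 1.0 - 0.5
        sums.append(running)
    return sums


def empirical_moments(m):
    """Mean and variance of S(n) when n is uniform on 0, 1, ..., q_m - 1."""
    values = diophantine_sums(fibonacci_number(m))
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


if __name__ == "__main__":
    for m in (10, 15, 20, 25):
        mean, variance = empirical_moments(m)
        print(m, round(mean, 4), round(variance, 4), round(m / (60 * math.sqrt(5.0)), 4))
```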
1.5.4 Concluding Remarks

(1) In Proposition 1.19 the integral parameter $n$ runs in the special interval $0 \le n < q_m$ (where $q_m$ is the $m$th Fibonacci number), but the result can be easily extended to any interval $0 \le n < N$ with arbitrary ("large") $N$. We simply repeat the argument of the Concluding Remark at the end of Sect. 1.3; see (1.208)–(1.210). Let
\[
N = N(a_1, \ldots, a_\ell) = q_m + a_1 q_{m-1} + a_2 q_{m-2} + a_3 q_{m-3} + \ldots + a_\ell q_{m-\ell} \tag{1.237}
\]
where $a_i \in \{0, 1\}$, $1 \le i \le \ell$, and $\ell = c(\log m/\log b)$ ["base $b = (1+\sqrt{5})/2$ logarithm"] is an integer; the value of the constant factor $c > 0$ will be specified later. Since $\ell = O(\log m)$ is negligible compared to $m$, by switching from $N = q_m$ in Proposition 1.19 to any number of the form $N = N(a_1, \ldots, a_\ell)$, the expectation and the variance hardly change value, and Proposition 1.19 remains true with the same error term:
\[
\frac{1}{N(a_1, \ldots, a_\ell)} \left| \left\{ 0 \le n < N(a_1, \ldots, a_\ell) :\ A \le \frac{S_\alpha(n)}{\sqrt{(60\sqrt{5})^{-1} m}} \le B \right\} \right| = \frac{1}{\sqrt{2\pi}} \int_{A}^{B} e^{-u^2/2}\, du + O\left( m^{-1/10} \log m \right). \tag{1.238}
\]
On the other hand, for every $N$ in $q_m \le N < q_{m+1}$ there is a short sum $N(a_1, \ldots, a_\ell)$ defined in (1.237) such that
\[
0 \le N - N(a_1, \ldots, a_\ell) = O\left( m^{-c} q_m \right). \tag{1.239}
\]
By choosing $c \ge 1/10$, the relative factor $m^{-c}$ in (1.239) becomes less than the error term $O(m^{-1/10} \log m)$ in (1.238). This proves that Proposition 1.19 holds for any interval $0 \le n < N$ with arbitrary $N$ (not just for the special intervals $0 \le n < q_m$). An alternative explanation is that an arbitrary interval $0 \le n < N$ means that we face a Markov chain where the initial distribution is not necessarily the stationary distribution. But in the long run it does not really matter, since the convergence to the stationary distribution is exponentially fast.

(2) Finally, we briefly summarize our proof technique above. Since $M_\alpha(N) = O(1)$ with $\alpha = (1+\sqrt{5})/2$, we had to evaluate $S_\alpha(n)$ as $n$ runs in $0 < n < N$. We used Ostrowski's explicit formula (1.55), and we extended the integral parameter $n$ to a real variable $\rho$ in $0 < \rho < N$. We worked with the $(1+\sqrt{5})/2$ scale representation of $\rho$ [see (1.189)], which led us to a Markov chain: it was the sequence of coefficients $b^*_i = b^*_i(\rho)$ in that representation that formed a
homogeneous Markov chain. By heavily relying on this Markov chain, we could approximate $S_\alpha(x)$ (as $x$ runs in $0 < x < N$) with a sum of independent random variables:
\[
S_\alpha(x) = X_1 + X_2 + X_3 + \cdots + X_\nu + Y, \tag{1.240}
\]
where $X_1, X_2, X_3, \ldots$ were independent, and as a bonus, they were also identically distributed random variables; the last term $Y$ in (1.240) turned out to be negligible. The unusual feature in (1.240) was that the number $\nu$ of terms formed a random variable; in fact, a "random stopping." However, by invoking an inequality of Kolmogorov, we could easily neutralize the uncertainty caused by the "random stopping." We concluded the proof of Proposition 1.19 with a direct reference to a basic result in probability theory: the central limit theorem for a sum of independent random variables.

Before proving Theorems 1.1 and 1.2 in the general case, we need to compute the expectation and the variance for arbitrary quadratic irrationals, and this task leads us to quadratic number fields. This innocent-looking technical problem (computing the first two moments) is a rich mini-theory itself, with remarkable ramifications (Dedekind sums, class number formulas, diophantine series of Hardy and Littlewood, just to mention a few).
Chapter 2
Expectation, and Its Connection with Quadratic Fields
2.1 Computing the Expectation in General (I)

The diophantine sum
\[
S_\alpha(n) = \sum_{k=1}^{n} \left( \{k\alpha\} - \frac12 \right) \tag{2.1}
\]
introduced in Sect. 1.2 [see (1.43)] is highly irregular as $n \to \infty$, but its mean value
\[
M_\alpha(N) = \frac1N \sum_{n=1}^{N} S_\alpha(n) \tag{2.2}
\]
exhibits a particularly simple and elegant asymptotic behavior for quadratic irrationals. Let
\[
\alpha = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots}} = [a_0; a_1, a_2, a_3, \ldots] \tag{2.3}
\]
denote the continued fraction for $\alpha$; the $a_i$ denote the partial quotients, and $[a_0; a_1, \ldots, a_{j-1}] = p_j/q_j$ is the $j$th convergent. By using (2.3) we can formulate

Proposition 2.1. For any irrational $\alpha > 0$ given with (2.3) and any integer $N \ge 1$,
\[
M_\alpha(N) = \frac{-a_1 + a_2 - a_3 \pm \cdots + (-1)^k a_k}{12} + O\Big(\max_{1 \le j \le k} a_j\Big), \tag{2.4}
\]
where $k = k(\alpha, N)$ is the last index $j$ for which the $j$th convergent denominator $q_j \le N$, i.e., $q_k \le N < q_{k+1}$, and the implicit constant on the right-hand side of (2.4) is absolute (less than 10).

© Springer International Publishing Switzerland 2014 J. Beck, Probabilistic Diophantine Approximation, Springer Monographs in Mathematics, DOI 10.1007/978-3-319-10741-7__2

Proposition 2.1 is particularly useful for quadratic irrationals. Indeed, for a periodic sequence $a_i$ it is easy to evaluate the alternating sum in (2.4). As an illustration, consider first
\[
\alpha = \sqrt3 = [1; 1, 2, 1, 2, 1, 2, \ldots] = [1; \overline{1, 2}]. \tag{2.5}
\]
The least solution of Pell's equation $x^2 - 3y^2 = 1$ is $x = 2$, $y = 1$, and so
\[
p_{2j} \pm q_{2j}\sqrt3 = (2 \pm \sqrt3)^j, \quad j = 1, 2, 3, \ldots \tag{2.6}
\]
where $p_{2j}/q_{2j}$ is the $2j$th convergent of $\sqrt3$ (we get every second convergent in (2.6), because the length of the period of $\sqrt3$ is 2 [see (2.5)]). By (2.6)
\[
q_{2j} = \frac{1}{2\sqrt3} \left( (2 + \sqrt3)^j - (2 - \sqrt3)^j \right),
\]
and so we have
\[
N = q_{2j} \implies j = \frac{\log N}{\log(2 + \sqrt3)} + O(1). \tag{2.7}
\]
Combining (2.4) with (2.7), for $\alpha = \sqrt3$ we have with $k = 2j$
\[
M_{\sqrt3}(N) = \frac{-a_1 + a_2 - a_3 \pm \cdots + (-1)^k a_k}{12} + O(1)
= \frac{-1 + 2 - 1 + 2 - \cdots - 1 + 2}{12} + O(1)
= \frac{-1 + 2}{12} \cdot \frac{\log N}{\log(2 + \sqrt3)} + O(1)
= \frac{\log N}{12 \log(2 + \sqrt3)} + O(1), \tag{2.8}
\]
proving our claim in (1.53).

Here are two more examples like (2.8): for $\sqrt7 = [2; \overline{1, 1, 1, 4}]$ the least solution $x = 8$, $y = 3$ of Pell's equation $x^2 - 7y^2 = 1$ comes from the fourth convergent $[2; 1, 1, 1] = 8/3$ of $\sqrt7$, and so
\[
M_{\sqrt7}(N) = \frac{-1 + 1 - 1 + 4}{12} \cdot \frac{\log N}{\log(8 + 3\sqrt7)} + O(1) = \frac{\log N}{4 \log(8 + 3\sqrt7)} + O(1),
\]
and for $\sqrt{67} = [8; \overline{5, 2, 1, 1, 7, 1, 1, 2, 5, 16}]$ the least solution $x = 48{,}842$, $y = 5{,}967$ of Pell's equation $x^2 - 67y^2 = 1$ comes from the tenth convergent $[8; 5, 2, 1, 1, 7, 1, 1, 2, 5] = 48842/5967$ of $\sqrt{67}$, and so
\[
M_{\sqrt{67}}(N) = \frac{-5 + 2 - 1 + 1 - 7 + 1 - 1 + 2 - 5 + 16}{12} \cdot \frac{\log N}{\log(48842 + 5967\sqrt{67})} + O(1) = \frac{\log N}{4 \log(48842 + 5967\sqrt{67})} + O(1).
\]
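The three worked examples can be reproduced mechanically. The following is a minimal sketch, not part of the original argument: it computes the period of $\sqrt d$ with the standard integer recurrence, forms the alternating sum over one period, and reads off the Pell solution from the convergent just before the end of the period. The function names (`sqrt_cf`, `convergent`, `alternating_period_sum`, `log_coefficient`) are our own illustrative choices.

```python
from math import isqrt, log, sqrt


def sqrt_cf(d):
    """Return (a0, period) such that sqrt(d) = [a0; period, period, ...]."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, q, a = 0, 1, a0
    period = []
    while a != 2 * a0:  # the period of sqrt(d) ends with the partial quotient 2*a0
        m = q * a - m
        q = (d - m * m) // q
        a = (a0 + m) // q
        period.append(a)
    return a0, period


def convergent(a0, quotients):
    """Return (p, q) with p/q = [a0; quotients]."""
    p_prev, p = 1, a0
    q_prev, q = 0, 1
    for a in quotients:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
    return p, q


def alternating_period_sum(period):
    """-a_1 + a_2 - a_3 +- ... taken over one period."""
    return sum((-1) ** j * a for j, a in enumerate(period, start=1))


def log_coefficient(d):
    """Coefficient c in M_sqrt(d)(N) = c log N + O(1), as in the examples above."""
    a0, period = sqrt_cf(d)
    if len(period) % 2 == 1:
        return 0.0  # odd period: the alternating sum cancels over two periods
    x, y = convergent(a0, period[:-1])  # least solution of x^2 - d y^2 = 1
    assert x * x - d * y * y == 1
    return alternating_period_sum(period) / (12 * log(x + y * sqrt(d)))
```

For instance, `sqrt_cf(67)` returns the period listed above, and `log_coefficient(7)` agrees with $1/(4\log(8+3\sqrt7))$ up to rounding.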
In sharp contrast, for $\alpha = \sqrt2 = [1; \overline{2}]$ the alternating sum in (2.4) cancels out, and $M_{\sqrt2}(N) = O(1)$; this proves (1.52). Similarly, any quadratic irrational $\alpha$, for which the length of the period (of the continued fraction) is odd, has the property that the mean value is basically zero: $M_\alpha(N) = O(1) = O_\alpha(1)$ (because the alternating sum in (2.4) cancels out). Note that in Sect. 1.5 we proved the fact $M_\alpha(N) = O(1)$ in the special case of the golden ratio $\alpha = (\sqrt5 - 1)/2 = [1, 1, 1, 1, \ldots] = [\overline{1}]$ by a long, direct computation; see (1.177). This direct computation becomes hopelessly messy even for an arbitrary quadratic irrational, not to mention the general case of an arbitrary irrational number.

Unfortunately, we cannot characterize the quadratic irrationals for which the period is odd/even (what we mean here is that the length of the period in the continued fraction is odd or even). However, if $\alpha = \sqrt p$ where $p$ is an odd prime, we have a perfect characterization: the period is odd if $p \equiv 1$ (mod 4), and the period is even if $p \equiv 3$ (mod 4).

The proof of this elegant characterization is based on the well-known number-theoretic fact that the "negative" Pell equation $x^2 - dy^2 = -1$ (where $d > 0$ is an integer, but not a complete square) has an integral solution if and only if the period of $\sqrt d$ is odd. If $p$ is a prime with $p \equiv 1$ (mod 4), then we will find an integral solution of $x^2 - py^2 = -1$, and this will imply that the period of $\sqrt p$ is odd. To find a solution of $x^2 - py^2 = -1$, we start with the fundamental solution $(x_1, y_1)$ of the ordinary Pell's equation $x^2 - py^2 = 1$, which always has a solution (the fundamental solution is the least positive solution). The equation $x^2 - 1 = py^2$ leads to the factorization
\[
(x_1 - 1)(x_1 + 1) = p y_1^2. \tag{2.9}
\]
If $p \equiv 1$ (mod 4) then (2.9) implies that $x_1$ is odd, and also by using that $p$ is a prime, we have either (1) $x_1 - 1 = 2pu^2$ and $x_1 + 1 = 2v^2$ or (2) $x_1 + 1 = 2pu^2$ and $x_1 - 1 = 2v^2$ holds for some positive integers $u$ and $v$ satisfying $y_1 = 2uv$. Hence $v^2 - pu^2 = \pm 1$. The case $v^2 - pu^2 = 1$ is impossible, since $(v, u)$ would be a smaller solution than $(x_1, y_1)$, the fundamental solution, a contradiction. Thus $v^2 - pu^2 = -1$, i.e., the negative Pell's equation does have a solution, and we obtain the following.

Corollary 2.2. If $p$ is a prime with $p \equiv 1$ (mod 4) then $M_{\sqrt p}(N) = O(1)$.

The proof above is prime specific: if $d \equiv 1$ (mod 4) is not a prime, then the length of the period of $\sqrt d$ can be both even and odd. For example, $\sqrt{21} = [4; \overline{1, 1, 2, 1, 1, 8}]$ gives length 6 (even) and $\sqrt{65} = [8; \overline{16}]$ gives length 1 (odd). On the other hand, if $d \equiv 3$ (mod 4), then by a simple (mod 4) analysis we have $x^2 - dy^2 \not\equiv -1$ (mod 4) (it is irrelevant whether $d$ is a prime or not), implying that the length of the period of $\sqrt d$ has to be even. Actually, we have a stronger result: if $d$ has a prime factor $q \equiv 3$ (mod 4), the period of $\sqrt d$ is always even. Indeed, then $x^2 - dy^2 = -1$ implies $x^2 \equiv -1$ (mod $q$), which contradicts Fermat's little theorem:
\[
1 \equiv x^{q-1} = (x^2)^{(q-1)/2} \equiv (-1)^{(q-1)/2} = -1 \pmod q.
\]

What happens in Proposition 2.1 if we go beyond quadratic irrationals? How about the special number $e$:
\[
e = [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, \ldots, 1, 2i, 1, \ldots]?
\]
Well, the alternating sum $(-1 + 2 - 1) + (1 - 4 + 1) + (-1 + 6 - 1) + \cdots + (-1)^i (1 - 2i + 1)$ equals $i - 1$ if $i$ is odd and $-i$ if $i$ is even. Thus by Proposition 2.1 we have
\[
M_e(N) = O(\log N / \log\log N), \tag{2.10}
\]
which is the true order of magnitude.

Note in advance that Proposition 2.1 also gives the constant factor $C_1(\alpha, x)$ in Theorem 1.1 in the special case $x = 1/2$. It is a consequence of the identity
\[
\chi_{1/2}(y) - \frac12 = \left( \{2y\} - \frac12 \right) - 2\left( \{y\} - \frac12 \right),
\]
where of course $\{y\}$ denotes the fractional part of $y$, and $\chi_{1/2}(y)$ is 1 if $\{y\} < 1/2$ and 0 otherwise. We will return to this later in Sect. 2.2; see (2.87) and (2.88).
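The parity rule for $\sqrt p$ and the two composite examples above are easy to check by machine. The sketch below is only an illustration under our own naming (`period_length`, `is_prime`); it uses the standard integer recurrence for the continued fraction of $\sqrt d$ and trial division for primality.

```python
from math import isqrt


def period_length(d):
    """Length of the period of the continued fraction of sqrt(d), d a non-square."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, q, a = 0, 1, a0
    length = 0
    while a != 2 * a0:  # the period of sqrt(d) ends with the partial quotient 2*a0
        m = q * a - m
        q = (d - m * m) // q
        a = (a0 + m) // q
        length += 1
    return length


def is_prime(n):
    """Trial division; adequate for the small range used here."""
    if n < 2:
        return False
    return all(n % k for k in range(2, isqrt(n) + 1))


def parity_rule_holds(limit):
    """Check: for odd primes p < limit, the period is odd iff p = 1 (mod 4)."""
    return all(
        (period_length(p) % 2 == 1) == (p % 4 == 1)
        for p in range(3, limit)
        if is_prime(p)
    )
```

A call such as `parity_rule_holds(500)` tests every odd prime below 500; `period_length(21)` and `period_length(65)` reproduce the lengths 6 and 1 quoted in the text.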
2.1.1 An Important Detour: How to Guess Proposition 2.1?

The proof of Proposition 2.1 is not easy, but it was equally difficult to find the right conjecture. What was our motivation to guess formula (2.4)? Well, this is an interesting long story, which involves algebraic number theory. To explain it, we briefly outline an alternative approach to find the average $M_\alpha(N)$. We start with the well-known Fourier series expansion of the fractional part function (warning: it is not absolutely convergent)
\[
\{x\} = \frac12 - \sum_{n=1}^{\infty} \frac{\sin(2\pi n x)}{\pi n}. \tag{2.11}
\]
Substituting it back to (2.1) and (2.2), after some long but standard manipulations we end up with
\[
M_\alpha(N) = -\frac{1}{2\pi} \sum_{n=1}^{N} \frac{1}{n \tan(\pi n \alpha)} + O(1), \tag{2.12}
\]
if $a_i = O(1)$, i.e., the partial quotients of $\alpha$ are bounded (this is certainly true for the quadratic irrationals). (Note that Eq. (2.12) is exactly our Proposition 2.16 coming later.)

Let $\alpha = \sqrt d$, where $d \equiv 3$ (mod 4) is a positive square-free integer. We clearly have ($m$ denotes the nearest integer to $n\sqrt d$)
\[
\tan(\pi n \sqrt d) \approx \pm \pi \|n\sqrt d\| = \pi (n\sqrt d - m) \approx -\frac{\pi (m^2 - d n^2)}{2 n \sqrt d}. \tag{2.13}
\]
In view of (2.12) and (2.13), the following formula is not too surprising:
\[
M_{\sqrt d}(N) = \frac{\sqrt d}{\pi^2} \Bigg( \sum_{\substack{(x,y) \ne (0,0):\\ \text{primary representations}}} \frac{1}{x^2 - d y^2} \Bigg) \frac{\log N}{\log \eta_d} + O\big( (\log\log N)^3 \big), \tag{2.14}
\]
where $\eta_d$ is the fundamental unit of $\mathbb{Q}(\sqrt d)$. Note that Eq. (2.14) is exactly Proposition 2.20; the meaning of "primary representations" will be explained later at the beginning of Sect. 2.6 (the reader can jump ahead and read it right now). If $d \equiv 3$ (mod 4) then $x^2 - dy^2$ is the norm of the algebraic integer $x + y\sqrt d$ in the real quadratic field $\mathbb{Q}(\sqrt d)$.
2.1.2 Quadratic Fields in a Nutshell

Let $D$ be a square-free positive or negative integer, and consider the quadratic field $\mathbb{Q}(\sqrt D)$. The discriminant of $\mathbb{Q}(\sqrt D)$ is $4D$ if $D \equiv 2$ or 3 (mod 4), and $D$ if $D \equiv 1$ (mod 4). The quadratic irrational $(a + b\sqrt D)/2$ is an algebraic integer in $\mathbb{Q}(\sqrt D)$ iff $a$ and $b$ are integers satisfying $a \equiv b \equiv 0$ (mod 2) when $D \equiv 2$ or 3 (mod 4), and $a \equiv b$ (mod 2) when $D \equiv 1$ (mod 4). So the norm
\[
\frac{a + b\sqrt D}{2} \cdot \frac{a - b\sqrt D}{2} = \frac{a^2 - b^2 D}{4}
\]
of $(a + b\sqrt D)/2$ is always an integer. An algebraic integer in $\mathbb{Q}(\sqrt D)$ is called a unit if its norm is $\pm 1$. If $D > 0$, then there exists a unit $\eta_D$ in $\mathbb{Q}(\sqrt D)$ such that any unit in $\mathbb{Q}(\sqrt D)$ is representable as $\pm \eta_D^n$, $n = 0, \pm 1, \pm 2, \ldots$. This number $\eta_D$ is called the fundamental unit in $\mathbb{Q}(\sqrt D)$.

Let $F(x, y) = ax^2 + bxy + cy^2$ be an integral binary quadratic form of discriminant $\Delta = b^2 - 4ac$ ($a, b, c$ are integers). If an integral binary quadratic form $F(x, y)$ is transformed into the form $F_1(x_1, y_1)$ by an integral unimodular transformation $x = Ux_1 + Vy_1$, $y = Wx_1 + Zy_1$ where $UZ - VW = 1$, then $F$ and $F_1$ are called equivalent. The class number $h(D)$ (where $\Delta = 4D$ or $D$) is basically the number of nonequivalent integral binary quadratic forms of discriminant $\Delta$. More precisely, by computing the class number we do not distinguish a quadratic form from its negative, though they may be nonequivalent (which is exactly the case if $D > 0$, and $x^2 - Dy^2 = -1$ does not have an integer solution). For example, let $D = 79$; then the discriminant is $4 \cdot 79 = 316$, and there are six nonequivalent integral binary forms of discriminant 316:
\[
F_1 = x^2 - 79y^2, \quad -F_1 = -x^2 + 79y^2, \quad F_2 = 3x^2 + 4xy - 25y^2, \quad -F_2 = -3x^2 - 4xy + 25y^2, \quad F_3 = 3x^2 + 2xy - 26y^2, \quad -F_3 = -3x^2 - 2xy + 26y^2.
\]
So the class number $h(79)$ of the quadratic field $\mathbb{Q}(\sqrt{79})$ is 3 (and not 6).

If $h(D) = 1$ then the algebraic integers in $\mathbb{Q}(\sqrt D)$ have unique factorization into algebraic primes. The "first" quadratic field with class number $> 1$ is $\mathbb{Q}(\sqrt{-5})$. The discriminant is $4 \cdot (-5) = -20$, and there are two nonequivalent integral binary quadratic forms of discriminant $-20$: $x^2 + 5y^2$ and $2x^2 + 2xy + 3y^2$. So the class number $h(-5)$ is 2. A counterexample to the unique prime factorization is
\[
(1 + \sqrt{-5}) \cdot (1 - \sqrt{-5}) = 6 = 2 \cdot 3,
\]
where all the 4 factors $(1 + \sqrt{-5})$, $(1 - \sqrt{-5})$, 2, and 3 are primes in the ring of integers of $\mathbb{Q}(\sqrt{-5})$.

Now let us return to (2.14). If we make the extra hypothesis that $d = p \equiv 3$ (mod 4) is a prime and the class number $h(p)$ of the real quadratic field $\mathbb{Q}(\sqrt p)$ is one, then the middle sum on the right-hand side of (2.14) becomes a special L-function at $s = 1$:
\[
\sum_{\substack{(x,y) \ne (0,0):\\ \text{primary representations}}} \frac{1}{x^2 - p y^2} = L(1, \chi). \tag{2.15}
\]
Here $\chi$ is the so-called norm-sign character: a unique character with values $\pm 1$ defined for all ideals in the ring of the algebraic integers of $\mathbb{Q}(\sqrt d)$ (in fact, $\chi$ depends only on the narrow ideal class), and satisfies $\chi((a)) = \operatorname{sign} \operatorname{Norm}(a)$ for the principal ideals $(a)$. Note that, in our special case $d = p$ with $h(p) = 1$, every ideal is principal. The L-function
\[
L(s, \chi) = \sum_{A:\ \text{ideals}} \frac{\chi(A)}{\operatorname{Norm}(A)^s}
\]
(here we don't have to write $|\operatorname{Norm}(A)|$, because the norm of an ideal is by definition an integer $\ge 1$; in sharp contrast the norm of an algebraic integer in a real field can be both positive and negative) has the product decomposition
\[
L(s, \chi) = L(s, \chi_{-4})\, L(s, \chi_{-p}) \tag{2.16}
\]
where
\[
L(s, \chi_{-4}) = \sum_{n=1}^{\infty} \frac{\chi_{-4}(n)}{n^s} \quad \text{and} \quad L(s, \chi_{-p}) = \sum_{n=1}^{\infty} \frac{\chi_{-p}(n)}{n^s}
\]
are the (ordinary) L-functions of the complex quadratic fields $\mathbb{Q}(\sqrt{-4}) = \mathbb{Q}(\sqrt{-1})$ ("Gauss integers") and $\mathbb{Q}(\sqrt{-p})$; the characters $\chi_{-4}$ and $\chi_{-p}$ are defined as follows: $\chi_{-4}(n) = \pm 1$ if $n \equiv \pm 1$ (mod 4) and $\chi_{-4}(n) = 0$ if $n$ is even, and $\chi_{-p}(n) = \left(\frac{n}{p}\right)$ is the usual Legendre symbol (quadratic residue symbol). Note that (2.16) is basically an Euler product, and it is "explained" by the elementary factorization $4p = (-4)(-p)$ of the discriminant of $x^2 - py^2$; see, e.g., Zagier's book [Za4]. In the special case $s = 1$ Eq. (2.16) gives
\[
L(1, \chi) = L(1, \chi_{-4})\, L(1, \chi_{-p}), \tag{2.17$'$}
\]
and by Dirichlet's (analytic) class number formula,
\[
L(1, \chi_{-4}) = \frac{\pi}{4} \quad \text{and} \quad L(1, \chi_{-p}) = \frac{\pi\, h(-p)}{\sqrt p}, \tag{2.17$''$}
\]
if $p > 3$. Now this is where the remarkable Hirzebruch–Meyer–Zagier formula (HMZ-formula, in short) enters the story: $h(-p)$ can be expressed in terms of an alternating sum of the partial quotients (i.e., the "digits" of the continued fraction) in the period of $\sqrt p$; see, e.g., Zagier [Za1].

But before formulating the HMZ-formula, we note that quadratic irrationals all have periodic continued fractions, and the least solution of Pell's equation $x^2 - dy^2 = 1$ can be determined from the period of $\sqrt d$; the least solution is basically the fundamental unit. Moreover, the parity of the length of the period describes the sign of the norm of the fundamental unit: odd length means $-1$ (the negative Pell equation is solvable exactly when the period is odd); even length means $+1$.

Combining Dirichlet's class number formulas with the ineffective Siegel theorem, we obtain the deep asymptotic formulas
\[
h(d) \log \eta_d = d^{1/2 \pm \varepsilon}, \tag{2.18$'$}
\]
\[
h(-d) = d^{1/2 \pm \varepsilon}, \tag{2.18$''$}
\]
where $h(d)$ and $h(-d)$ are the class numbers of the real and complex quadratic fields $\mathbb{Q}(\sqrt d)$ and $\mathbb{Q}(\sqrt{-d})$, respectively, $\eta_d$ is the fundamental unit of $\mathbb{Q}(\sqrt d)$, and $\varepsilon > 0$ is arbitrarily small but fixed. Note that the order of magnitude of $\log \eta_d$ is roughly around the length of the period of the continued fraction for $\sqrt d$.

The elegant Hirzebruch–Meyer–Zagier formula (HMZ-formula) was discovered in the 1970s. It states that
\[
h(-p) = \frac{-a_1 + a_2 - a_3 \pm \cdots + a_{2s}}{3}, \tag{2.19}
\]
where $p \equiv 3$ (mod 4) is a prime $> 3$, $h(p) = 1$, and $a_1, a_2, \ldots, a_{2s}$ forms the period of $\sqrt p$ (since $p \equiv 3$ (mod 4), the length of the period has to be even). (Note that both (2.17) and (2.19) fail for $p = 3$, because $\mathbb{Q}(\sqrt{-3})$ has too many automorphisms: 6 instead of the usual 2, a technical nuisance in algebraic number theory.)

Combining the HMZ-formula with (2.14)–(2.17), we conclude
\[
M_{\sqrt p}(N) = \frac{h(-p)}{4} \cdot \frac{\log N}{\log \eta} + O\big( (\log\log N)^3 \big)
= \frac{-a_1 + a_2 - \cdots + a_{2s}}{12} \cdot \frac{\log N}{\log \eta} + O\big( (\log\log N)^3 \big)
= \frac{-a_1 + a_2 - a_3 \pm \cdots + (-1)^\ell a_\ell}{12} + O\big( (\log\log N)^3 \big), \tag{2.20}
\]
where $\ell$ is the last index for which $q_\ell \le N$ and $\eta$ is the fundamental unit of $\mathbb{Q}(\sqrt p)$ (in the last equation we heavily used the periodicity of the continued fraction for $\sqrt p$).

Summarizing, by using the HMZ-formula, we just managed to prove (2.20), at least under some strong technical conditions (for example, we assumed that $p \equiv 3$ (mod 4) is a prime $> 3$ with $h(p) = 1$, and also in (2.20) we have the ugly but negligible error term $O((\log\log N)^3)$). Nevertheless, from (2.20) it was quite easy to guess that Proposition 2.1 must hold for arbitrary $\alpha$ (not just for quadratic irrationals), and this is exactly how we came up with the right conjecture (2.4).
Because we know a completely elementary proof of Proposition 2.1, reversing the argument, we can produce an elementary proof for the HMZ-formula. Later we will give a precise proof of (2.12) and (2.14); (2.12) is Proposition 2.16 and (2.14) is Proposition 2.20. (The interested reader can find all the details, and much more, about quadratic fields in the well-written book of Zagier [Za4] (it is in German), or in the classic Borevich–Safarevich: Number Theory.)
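The HMZ-formula (2.19) can be tested numerically for small primes. The sketch below is an illustration under our own assumptions: $h(-p)$ is computed by counting reduced positive definite binary quadratic forms of discriminant $-p$ (the classical reduction theory, not a method taken from this text), and the helper names are hypothetical. The primes $7 \le p \le 71$ with $p \equiv 3$ (mod 4) are used because, as noted above, $p = 79$ is a case with $h(p) = 3$.

```python
from math import gcd, isqrt


def sqrt_period(d):
    """Period a_1, ..., a_L of the continued fraction of sqrt(d), d a non-square."""
    a0 = isqrt(d)
    m, q, a = 0, 1, a0
    period = []
    while a != 2 * a0:  # the period of sqrt(d) ends with 2*a0
        m = q * a - m
        q = (d - m * m) // q
        a = (a0 + m) // q
        period.append(a)
    return period


def alternating_sum(period):
    """-a_1 + a_2 - a_3 +- ... over one period."""
    return sum((-1) ** j * a for j, a in enumerate(period, start=1))


def class_number_imaginary(D):
    """Number of reduced primitive forms a x^2 + b xy + c y^2 with b^2 - 4ac = -D."""
    count = 0
    a = 1
    while 3 * a * a <= D:  # a reduced form has a <= sqrt(D / 3)
        for b in range(-a + 1, a + 1):  # -a < b <= a
            if (b * b + D) % (4 * a):
                continue
            c = (b * b + D) // (4 * a)
            if c < a or (a == c and b < 0):
                continue
            if gcd(gcd(a, b), c) == 1:
                count += 1
        a += 1
    return count


def hmz_holds(p):
    """Check 3 * h(-p) == -a_1 + a_2 - ... + a_{2s} for a prime p = 3 (mod 4)."""
    return 3 * class_number_imaginary(p) == alternating_sum(sqrt_period(p))
```

With these definitions `hmz_holds(p)` can be evaluated for $p = 7, 11, 19, 23, 31, 43, 47, 59, 67, 71$; for $p = 79$ the two sides differ, consistent with the hypothesis $h(p) = 1$ being necessary.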
2.1.3 Another Detour: Formulating a "Positivity Conjecture"

The first line in (2.20) raises a very interesting question. If a prime $p$ satisfies the condition of the HMZ-formula, the expectation equals
\[
M_{\sqrt p}(N) = \frac{h(-p)}{4} \cdot \frac{\log N}{\log \eta} + \text{negligible error}.
\]
Here the class number is trivially $\ge 1$, and also $\eta = \eta_p > 1$, implying $\log \eta > 0$; therefore,
\[
M_{\sqrt p}(N) = c \log N + \text{negligible error},
\]
where $c = c(p) > 0$ is a positive constant. By Proposition 2.1, the error term here is in fact $O(1)$, and in general, for any quadratic irrational $\alpha$,
\[
M_\alpha(N) = c \log N + O(1),
\]
where $c = c(\alpha)$ is a constant (expressed in terms of the period of $\alpha$). Is it true that if $\alpha = \sqrt d$, the corresponding constant factor is always nonnegative, that is,
\[
M_{\sqrt d}(N) = c \log N + O(1) \quad \text{with } c \ge 0?
\]
We guess the answer is "yes," and I refer to this as the "positivity conjecture."

If the length of the period of $\sqrt d$ is odd, the "positivity conjecture" is trivial. Indeed, by formula (2.4) the corresponding alternating sum "cancels out," implying that the constant factor is zero, i.e., $M_{\sqrt d}(N) = O(1)$ (the same holds for any quadratic irrational with odd period). Thus, the nontrivial case is when the length of the period of $\sqrt d$ is even. It is well known that then the period has the symmetric form with a central term
\[
\sqrt d = [a_0; \overline{a_1, a_2, \ldots, a_t, a_{t+1}, a_t, \ldots, a_2, a_1, 2a_0}]
\]
where $a_0 = \lfloor \sqrt d \rfloor$ and $a_{t+1}$ denotes the central term. Applying the alternating sum in formula (2.4), we have
\[
M_{\sqrt d}(N) = \frac{-a_1 + a_2 - a_3 \pm \cdots}{12} + O(1)
= \Bigg( 2 \Bigg( \sum_{j=1}^{t} (-1)^j a_j \Bigg) + (-1)^{t+1} a_{t+1} + 2a_0 \Bigg) \frac{\log N}{12 \log \eta_d} + O(1).
\]
The positivity of the constant factor $c = c(d)$ in $M_{\sqrt d}(N) = c \log N + O(1)$ is, therefore, equivalent to the positivity of the alternating sum formed from the period
\[
2 \sum_{j=0}^{t} (-1)^j a_j + (-1)^{t+1} a_{t+1} > 0.
\]
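The positivity condition just displayed is simply the alternating sum $-a_1 + a_2 - \cdots + 2a_0$ over one full period, so it can be checked directly for small $d$. The following sketch (with our own hypothetical function names) does this for all non-square $d$ below a given bound whose period has even length.

```python
from math import isqrt


def sqrt_period(d):
    """Period a_1, ..., a_L of the continued fraction of sqrt(d), d a non-square."""
    a0 = isqrt(d)
    m, q, a = 0, 1, a0
    period = []
    while a != 2 * a0:  # the last partial quotient of the period is 2*a0
        m = q * a - m
        q = (d - m * m) // q
        a = (a0 + m) // q
        period.append(a)
    return period


def period_alternating_sum(d):
    """-a_1 + a_2 - a_3 +- ... + a_L over one period of sqrt(d)."""
    return sum((-1) ** j * a for j, a in enumerate(sqrt_period(d), start=1))


def positivity_table(limit):
    """Map each non-square d < limit with even period to its alternating sum."""
    table = {}
    for d in range(2, limit):
        if isqrt(d) ** 2 == d:
            continue
        if len(sqrt_period(d)) % 2 == 0:
            table[d] = period_alternating_sum(d)
    return table
```

For example, `positivity_table(100)` lists the values for every $d < 100$ with even period, which is the range mentioned in the text; it says nothing about larger $d$.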
We checked the tables for $d < 100$, and this alternating sum is indeed positive when the period of $\sqrt d$ is even. Since the "positivity conjecture" is certainly not true for arbitrary quadratic irrational $\alpha$, its hypothetical truth in the special case $\alpha = \sqrt d$ is probably closely related to the arithmetic of the real quadratic field $\mathbb{Q}(\sqrt d)$ (or perhaps the complex field $\mathbb{Q}(\sqrt{-d})$).

Let's return now to Proposition 2.1. We include an elementary (but far from easy) proof.

Proof of Proposition 2.1. We use Dedekind sums. To explain where the Dedekind sum comes from, we rewrite (2.1) and (2.2) in the following form:
\[
M_\alpha(N) = \frac1N \sum_{k=1}^{N} (N + 1 - k) \left( \{k\alpha\} - \frac12 \right)
= \left( \frac{N+1}{N} - \frac12 \right) \sum_{k=1}^{N} \left( \{k\alpha\} - \frac12 \right) - \sum_{k=1}^{N} \left( \frac{k}{N} - \frac12 \right) \left( \{k\alpha\} - \frac12 \right), \tag{2.21}
\]
where the last sum
\[
\sum_{k=1}^{N} \left( \frac{k}{N} - \frac12 \right) \left( \{k\alpha\} - \frac12 \right)
\]
in (2.21) strongly resembles a Dedekind sum
\[
D(H, K) = \sum_{j=1}^{K-1} \left( \frac{j}{K} - \frac12 \right) \left( \{jH/K\} - \frac12 \right), \tag{2.22}
\]
where we always assume that $H$ and $K \ge 1$ are relatively prime integers.
Dedekind sums [i.e., (2.22)] originally appeared in Dedekind's study of elliptic functions and theta-functions. Luckily we don't need to know anything about these (rather technical) subjects; we can just work with definition (2.22). The key fact about Dedekind sums is the following reciprocity formula, a highly surprising and nontrivial result.

Lemma 2.3 (Dedekind's reciprocity formula). We have
\[
D(H, K) + D(K, H) = \frac{1}{12} \left( \frac{H}{K} + \frac{K}{H} + \frac{1}{HK} \right) - \frac14. \tag{2.23}
\]
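Since definition (2.22) involves only rational numbers, the reciprocity formula (2.23) can be verified exactly for small arguments. The sketch below uses Python's `fractions.Fraction`; the function names `dedekind` and `reciprocity_rhs` are our own, and the code is a numerical check, not a proof.

```python
from fractions import Fraction
from math import gcd


def dedekind(H, K):
    """D(H, K) as defined in (2.22), computed exactly; H and K must be coprime."""
    if H < 1 or K < 1 or gcd(H, K) != 1:
        raise ValueError("H and K must be relatively prime positive integers")
    half = Fraction(1, 2)
    total = Fraction(0)
    for j in range(1, K):
        frac_part = Fraction((j * H) % K, K)  # the fractional part {jH/K}
        total += (Fraction(j, K) - half) * (frac_part - half)
    return total


def reciprocity_rhs(H, K):
    """Right-hand side of (2.23)."""
    return Fraction(1, 12) * (
        Fraction(H, K) + Fraction(K, H) + Fraction(1, H * K)
    ) - Fraction(1, 4)
```

For instance, `dedekind(1, 3)` equals $1/18$, `dedekind(3, 1)` is an empty sum, and their total matches `reciprocity_rhs(1, 3)`.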
Note that the definition of $D(H, K)$ and $D(K, H)$ automatically includes the condition that "$H \ge 1$ and $K \ge 1$ are relatively prime integers." For a proof of this classical result, see, e.g., the book [Ra-Gr]. From Lemma 2.3 we will derive

Lemma 2.4. If $1 \le H < K$ are relatively prime then
\[
D(H, K) = \frac{a_1 - a_2 + a_3 - \cdots + (-1)^{\ell-1} a_\ell}{12} + O(1), \tag{2.24}
\]
where
\[
\frac{H}{K} = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}} = [a_1, a_2, a_3, \ldots, a_\ell]. \tag{2.25}
\]
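Lemma 2.4 can also be checked exactly for small denominators. In the sketch below the partial quotients of $H/K$ are produced by the Euclidean algorithm (so the last quotient is at least 2 whenever $K \ge 2$), and the gap between $D(H, K)$ and the alternating sum divided by 12 is returned as an exact fraction. The names `dedekind`, `cf_digits` and `lemma_2_4_gap` are hypothetical.

```python
from fractions import Fraction
from math import gcd


def dedekind(H, K):
    """D(H, K) as defined in (2.22), computed exactly; H and K must be coprime."""
    half = Fraction(1, 2)
    return sum(
        (Fraction(j, K) - half) * (Fraction((j * H) % K, K) - half)
        for j in range(1, K)
    )


def cf_digits(H, K):
    """Partial quotients a_1, ..., a_l of H/K = [a_1, ..., a_l], for 1 <= H < K."""
    digits = []
    num, den = K, H
    while den:
        digits.append(num // den)  # K = a_1 H + H_1, H = a_2 H_1 + H_2, ...
        num, den = den, num % den
    return digits


def lemma_2_4_gap(H, K):
    """D(H, K) minus (a_1 - a_2 + a_3 - ...)/12, as an exact fraction."""
    digits = cf_digits(H, K)
    alternating = sum((-1) ** i * a for i, a in enumerate(digits))
    return dedekind(H, K) - Fraction(alternating, 12)
```

The quantity returned by `lemma_2_4_gap` is the $O(1)$ term of (2.24); on small inputs it can be compared against whatever explicit bound one wishes to test.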
Note that the error term $O(1)$ in (2.24) has absolute value $\le 1/4$.

Proof. The continued fraction $H/K = [a_1, a_2, a_3, \ldots, a_\ell]$ is equivalent to the Euclidean algorithm
\[
K = a_1 H + H_1, \quad H = a_2 H_1 + H_2, \quad H_1 = a_3 H_2 + H_3, \quad \ldots, \quad H_{\ell-2} = a_\ell H_{\ell-1},
\]
where $H_{\ell-1} = \gcd(H, K) = 1$ (gcd denotes the greatest common divisor). We apply Lemma 2.3 with the short notation
\[
g(x, y) = \frac{1}{12} \left( \frac{x}{y} + \frac{y}{x} + \frac{1}{xy} \right) - \frac14
\]
as follows: write $K = H_{-1}$, $H = H_0$, then
\[
D(H, K) = D(H_0, H_{-1}) = g(H_{-1}, H_0) - D(H_{-1}, H_0) = g(H_{-1}, H_0) - D(H_1, H_0);
\]
here we used the first equation of the Euclidean algorithm. Repeating the same argument, we have
\[
D(H, K) = g(H_{-1}, H_0) - D(H_1, H_0) = g(H_{-1}, H_0) - \big( g(H_0, H_1) - D(H_0, H_1) \big) = g(H_{-1}, H_0) - g(H_0, H_1) + D(H_2, H_1);
\]
here we used the second equation of the Euclidean algorithm. Repeating the same argument several times, we have
\[
D(H, K) = g(H_{-1}, H_0) - g(H_0, H_1) + g(H_1, H_2) - g(H_2, H_3) \pm \cdots + (-1)^{\ell-1} g(H_{\ell-2}, H_{\ell-1}) + (-1)^{\ell} D(H_{\ell-2}, H_{\ell-1}).
\]
Note that the last term here is in fact zero; indeed, $H_{\ell-1} = \gcd(H, K) = 1$ implies that $D(H_{\ell-2}, H_{\ell-1}) = 0$. Moreover, by using the notation
\[
f(x, y) = \frac{x}{y} + \frac{y}{x},
\]
we have (with $H_\ell = 0$)
\[
\sum_{i=0}^{\ell-1} (-1)^i f(H_{i-1}, H_i) = \sum_{i=0}^{\ell-1} (-1)^i \left( \frac{H_{i-1}}{H_i} + \frac{H_i}{H_{i-1}} \right)
= \frac{H_0}{H_{-1}} + \sum_{i=0}^{\ell-1} (-1)^i \frac{H_{i-1} - H_{i+1}}{H_i}
= \frac{H}{K} + \sum_{i=0}^{\ell-1} (-1)^i \frac{a_{i+1} H_i}{H_i}
= \frac{H}{K} + \sum_{i=0}^{\ell-1} (-1)^i a_{i+1}.
\]
Since
\[
g(x, y) = \frac{1}{12} f(x, y) + \frac{1}{12xy} - \frac14,
\]
combining the facts above, we conclude
\[
D(H, K) = g(H_{-1}, H_0) - g(H_0, H_1) + g(H_1, H_2) - g(H_2, H_3) \pm \cdots + (-1)^{\ell-1} g(H_{\ell-2}, H_{\ell-1})
\]
\[
= \frac{H}{12K} - \frac{1 + (-1)^{\ell-1}}{8} + \frac{a_1 - a_2 + a_3 - \cdots + (-1)^{\ell-1} a_\ell}{12} + \frac{1}{12} \left( \frac{1}{KH} - \frac{1}{HH_1} + \frac{1}{H_1 H_2} - \cdots + \frac{(-1)^{\ell-1}}{H_{\ell-2} H_{\ell-1}} \right).
\]
The last alternating sum has absolute value $\le 1/12$, and because $1 \le H < K$, the total error is at most $\max\{1/4, 1/12 + 1/12\} = 1/4$, completing the deduction of Lemma 2.4 from Lemma 2.3. □

Next we derive Proposition 2.1 from Lemma 2.4 in the special case $N = q_r$, i.e., when $N$ happens to be a convergent denominator of $\alpha$; see Lemma 2.5. But first we introduce a notation that simplifies the treatment of Dedekind sums. Let
\[
((x)) = \begin{cases} \{x\} - \frac12, & \text{if } x \text{ is not an integer}; \\ 0, & \text{otherwise}. \end{cases}
\]
Note that $y = ((x))$ is usually called the "sawtooth function." By using this new notation, we can rewrite (2.22) in a shorter form:
\[
D(H, K) = \sum_{j=1}^{K-1} \Big(\!\Big( \frac{j}{K} \Big)\!\Big) \Big(\!\Big( \frac{jH}{K} \Big)\!\Big), \tag{2.26}
\]
where, as usual, we assume that $H$ and $K \ge 1$ are relatively prime integers. Notice that extending the summation in (2.26) from $1$ to $K$ makes no difference (just adds a zero to the sum).

Now we are ready to formulate and prove an important special case of Proposition 2.1.

Lemma 2.5. We have
\[
M_\alpha(q_r) = \frac{-a_1 + a_2 - a_3 \pm \cdots + (-1)^{r-1} a_{r-1}}{12} + O(1), \tag{2.27}
\]
where $\alpha = [a_1, a_2, a_3, \ldots]$ and $p_r/q_r = [a_1, a_2, \ldots, a_{r-1}]$ is the $r$th convergent of $\alpha$. The implicit error term $O(1)$ is less than 5 for all $\alpha$ and $r$.
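Lemma 2.5 lends itself to a quick floating-point experiment. The sketch below evaluates $M_\alpha(q_r)$ directly from (2.1) and (2.2) and subtracts the main term of (2.27). It is only an illustration: the function names are ours, the fractional parts are computed in double precision, and the sample value $\alpha = \sqrt3 - 1 = [1, 2, 1, 2, \ldots]$ is our choice.

```python
from math import floor, sqrt


def mean_value(alpha, N):
    """M_alpha(N) = (1/N) * sum_{n=1}^{N} S_alpha(n), with S_alpha as in (2.1)."""
    partial = 0.0  # running value of S_alpha(n)
    total = 0.0
    for k in range(1, N + 1):
        x = k * alpha
        partial += (x - floor(x)) - 0.5
        total += partial
    return total / N


def convergent_denominators(digits):
    """q_1, q_2, ... for alpha = [a_1, a_2, ...], where p_r/q_r = [a_1, ..., a_{r-1}]."""
    q = [1, digits[0]]  # q_1 = 1 (empty continued fraction), q_2 = a_1
    for a in digits[1:]:
        q.append(a * q[-1] + q[-2])
    return q


def lemma_2_5_gap(alpha, digits, r):
    """M_alpha(q_r) minus the main term (-a_1 + a_2 - ... + (-1)^(r-1) a_(r-1)) / 12."""
    q_r = convergent_denominators(digits)[r - 1]
    main = sum((-1) ** i * a for i, a in enumerate(digits[: r - 1], start=1)) / 12
    return mean_value(alpha, q_r) - main
```

With `alpha = sqrt(3) - 1` and `digits = [1, 2] * 8`, the values `lemma_2_5_gap(alpha, digits, r)` for moderate `r` give a feel for how small the $O(1)$ term is in practice.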
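Proof. We recall (2.21) with $N = q_r$:
\[
M_\alpha(q_r) = \left( \frac{q_r + 1}{q_r} - \frac12 \right) \sum_{k=1}^{q_r} \left( \{k\alpha\} - \frac12 \right) - \sum_{k=1}^{q_r} \left( \frac{k}{q_r} - \frac12 \right) \left( \{k\alpha\} - \frac12 \right). \tag{2.28}
\]
First we focus on the following subsum of (2.28):
\[
S = \sum_{k=1}^{q_r} \left( \frac{k}{q_r} - \frac12 \right) \left( \{k\alpha\} - \frac12 \right) = \sum_{k=1}^{q_r} \Big(\!\Big( \frac{k}{q_r} \Big)\!\Big) ((k\alpha)). \tag{2.29}
\]
We compare $S$ to the Dedekind sum
\[
D(p_r, q_r) = \sum_{k=1}^{q_r} \Big(\!\Big( \frac{k}{q_r} \Big)\!\Big) \Big(\!\Big( \frac{k p_r}{q_r} \Big)\!\Big), \tag{2.30}
\]
where $p_r/q_r$ is the $r$th convergent of $\alpha$. We recall the well-known fact from diophantine approximation that
\[
\left| \alpha - \frac{p_r}{q_r} \right| < \frac{1}{q_r^2},
\]
which implies that the inequality
\[
\left| k\alpha - \frac{k p_r}{q_r} \right| < \frac{k}{q_r^2} \le \frac{1}{q_r} \tag{2.31}
\]
holds for all $1 \le k \le q_r$. By (2.31) we have
\[
|S - D(p_r, q_r)| < 1. \tag{2.32}
\]
On the other hand, by Lemma 2.4,
\[
\left| D(p_r, q_r) - \frac{a_1 - a_2 + a_3 - \cdots + (-1)^r a_{r-1}}{12} \right| \le \frac14. \tag{2.33}
\]
Combining (2.32) and (2.33) we have
\[
\left| S - \frac{a_1 - a_2 + a_3 - \cdots + (-1)^r a_{r-1}}{12} \right| \le \frac14 + 1 = \frac54. \tag{2.34}
\]
Another application of (2.31) gives
\[
\left| \sum_{k=1}^{q_r - 1} \big( \{k\alpha\} - 1/2 \big) \right| \le \left| \sum_{j=1}^{q_r - 1} \left( \frac{j}{q_r} \pm \frac{1}{q_r} - \frac12 \right) \right| \le \left| \sum_{j=1}^{q_r - 1} \left( \frac{j}{q_r} - \frac12 \right) \right| + q_r \cdot \frac{1}{q_r} = 0 + 1 = 1. \tag{2.35}
\]
Applying (2.34) and (2.35) in (2.28), we conclude that
\[
\left| M_\alpha(q_r) + \frac{a_1 - a_2 + a_3 - \cdots + (-1)^r a_{r-1}}{12} \right|
\le \frac54 + \left| \frac{q_r + 1}{q_r} - \frac12 \right| + \left| \frac{q_r + 1}{q_r} - \frac12 \right| \cdot \left| \{q_r \alpha\} - \frac12 \right|
\le \frac54 + 2 \left| \frac{q_r + 1}{q_r} - \frac12 \right| < 5,
\]
and Lemma 2.5 follows. □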
The last step is to derive the general Proposition 2.1 from the special case Lemma 2.5. There are many ways to reduce the general case to Lemma 2.5; see, e.g., Beck [Be4]. Here we follow a nice idea of Schoissengeier [Scho], involving telescoping sums, which seems to be the best treatment of the general case.

Let $N \ge 1$ be an arbitrary integer. Consider the Ostrowski expansion of $N$ [see (1.54)]:
\[
N = \sum_{i=1}^{r} b_i q_i, \quad \text{where } 0 \le b_i \le a_i \text{ and } b_i = a_i \text{ implies } b_{i-1} = 0 \text{ ("Extra Rule")}. \tag{2.36}
\]
Here $a_i$ is the $i$th partial quotient of the continued fraction of $\alpha = [a_1, a_2, a_3, \ldots]$ and $p_i/q_i = [a_1, \ldots, a_{i-1}]$ is the $i$th convergent of $\alpha$. We are motivated by the following telescoping sum equation:
\[
\frac1N \sum_{i=1}^{N} (N + 1 - i) \Big(\!\Big( \frac{i p_r}{q_r} \Big)\!\Big)
= \frac1N \sum_{k=1}^{r} \Bigg( \sum_{i=1}^{N_k} (N_k + 1 - i) \Big(\!\Big( \frac{i p_k}{q_k} \Big)\!\Big) - \sum_{j=1}^{N_{k-1}} (N_{k-1} + 1 - j) \Big(\!\Big( \frac{j p_{k-1}}{q_{k-1}} \Big)\!\Big) \Bigg), \tag{2.37}
\]
where $N_k$ is the $k$th partial sum of (2.36): $N_k = \sum_{i=1}^{k} b_i q_i$.
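The Ostrowski expansion (2.36) can be produced by a greedy algorithm: repeatedly divide by the largest convergent denominator not exceeding the remainder. The sketch below is an assumption-laden illustration (the function names and the greedy strategy are ours; the text itself only refers to (1.54)). It uses the indexing of this section, $q_1 = 1$, $q_2 = a_1$, $q_{i+1} = a_i q_i + q_{i-1}$.

```python
def ostrowski(N, digits):
    """Return (b, q) with N = sum b[i] * q[i]; b[i], q[i] stand for b_{i+1}, q_{i+1}.

    digits = [a_1, a_2, ...] are the partial quotients of alpha = [a_1, a_2, ...].
    """
    if N < 1:
        raise ValueError("N must be a positive integer")
    q = [0, 1]  # q_0 = 0, q_1 = 1
    i = 0
    while q[-1] <= N:  # extend until q_{r+1} > N
        if i == len(digits):
            raise ValueError("not enough partial quotients for this N")
        q.append(digits[i] * q[-1] + q[-2])  # q_{i+2} = a_{i+1} q_{i+1} + q_i
        i += 1
    r = len(q) - 2  # q_r <= N < q_{r+1}
    b = [0] * (r + 1)
    remainder = N
    for k in range(r, 0, -1):  # greedy step: largest denominator first
        b[k], remainder = divmod(remainder, q[k])
    return b[1:], q[1 : r + 1]


def satisfies_extra_rule(b, digits):
    """Check 0 <= b_i <= a_i, and that b_i = a_i forces b_{i-1} = 0."""
    for k, coefficient in enumerate(b):
        if not 0 <= coefficient <= digits[k]:
            return False
        if k >= 1 and coefficient == digits[k] and b[k - 1] != 0:
            return False
    return True
```

For the golden-ratio digits `[1] * 20` this reduces to a Fibonacci-type representation: `ostrowski(100, [1] * 20)` writes $100 = 89 + 8 + 3$.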
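We are going to evaluate the terms of the telescoping sum (2.37). The next lemma, clearly motivated by Eq. (2.37), can be considered as a generalization, or new version, of Lemma 2.5. The idea is to involve the Dedekind sum $D(p_k, q_k)$, just like we did in the proof of Lemma 2.5.

Lemma 2.6. If $N_j = \sum_{i=1}^{j} b_i q_i$ then
\[
\sum_{i=1}^{N_k} (N_k + 1 - i) \Big(\!\Big( \frac{i p_k}{q_k} \Big)\!\Big) - \sum_{j=1}^{N_{k-1}} (N_{k-1} + 1 - j) \Big(\!\Big( \frac{j p_{k-1}}{q_{k-1}} \Big)\!\Big)
\]
\[
= -b_k q_k D(p_k, q_k) + \frac{b_{k-1}}{4} \big( 1 + (-1)^k \big) \big( 2N_{k-1} + 1 - (b_{k-1} + 1) q_{k-1} \big) + (-1)^{k+1} \frac{N_{k-1} (N_{k-1} + 1)(N_{k-1} + 2)}{6 q_k q_{k-1}}. \tag{2.38}
\]

Proof of Lemma 2.6. We basically repeat the proof of Lemma 2.5. Write
\[
\sum_{i=1}^{N_k} (N_k + 1 - i) \Big(\!\Big( \frac{i p_k}{q_k} \Big)\!\Big) = \Sigma_1 + \Sigma_2, \tag{2.39}
\]
where
\[
\Sigma_1 = \sum_{i=1}^{b_k q_k} (N_k + 1 - i) \Big(\!\Big( \frac{i p_k}{q_k} \Big)\!\Big)
\quad \text{and} \quad
\Sigma_2 = \sum_{i = b_k q_k + 1}^{N_k} (N_k + 1 - i) \Big(\!\Big( \frac{i p_k}{q_k} \Big)\!\Big).
\]
We evaluate $\Sigma_1$ first. Since $((x)) = 0$ if $x$ is an integer, we take out the $i$'s that are divisible by $q_k$:
\[
\Sigma_1 = \sum_{t=0}^{b_k - 1} \sum_{i = t q_k + 1}^{(t+1) q_k - 1} (N_k + 1 - i) \Big(\!\Big( \frac{i p_k}{q_k} \Big)\!\Big)
= \sum_{t=0}^{b_k - 1} \sum_{j=1}^{q_k - 1} (N_k + 1 - t q_k - j) \Big(\!\Big( \frac{j p_k}{q_k} \Big)\!\Big)
= -b_k \sum_{j=1}^{q_k - 1} j \Big(\!\Big( \frac{j p_k}{q_k} \Big)\!\Big), \tag{2.40}
\]
since
\[
\sum_{j=1}^{K-1} \Big(\!\Big( \frac{jH}{K} \Big)\!\Big) = 0.
\]
Thus by (2.40),
\[
\Sigma_1 = -b_k q_k \sum_{j=1}^{q_k - 1} \left( \frac{j}{q_k} - \frac12 \right) \Big(\!\Big( \frac{j p_k}{q_k} \Big)\!\Big) = -b_k q_k D(p_k, q_k), \tag{2.41}
\]
justifying the first term on the right-hand side of (2.38).

Next we evaluate $\Sigma_2 - \Sigma_3$, where $\Sigma_2$ is the second term in (2.39) and $\Sigma_3$ is the negative term on the left-hand side of (2.38):
\[
\Sigma_3 = \sum_{j=1}^{N_{k-1}} (N_{k-1} + 1 - j) \Big(\!\Big( \frac{j p_{k-1}}{q_{k-1}} \Big)\!\Big). \tag{2.42}
\]
We recall the well-known fact from the theory of continued fractions:
\[
\frac{p_k}{q_k} = \frac{p_{k-1}}{q_{k-1}} + \frac{(-1)^{k-1}}{q_{k-1} q_k}, \tag{2.43}
\]
and so, if $j \le N_{k-1}$ then
\[
\Big(\!\Big( \frac{j p_k}{q_k} \Big)\!\Big) = \Big(\!\Big( \frac{j p_{k-1}}{q_{k-1}} + \frac{(-1)^{k-1} j}{q_{k-1} q_k} \Big)\!\Big) = \Big(\!\Big( \frac{j p_{k-1}}{q_{k-1}} \Big)\!\Big) + \frac{(-1)^{k-1} j}{q_{k-1} q_k}, \tag{2.44}
\]
when $j$ is not divisible by $q_{k-1}$, and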
jpk qk
D
jpk1 qk1
C
.1/k1 j 1 C .1/k1 ; C qk1 qk 2
when j is divisible by qk1 . Thus we can rewrite
Nk X
.Nk C 1 i /
i Dbk qk C1
D
N k1 X
2
ipk qk
.Nk bk qk C 1 j /
j D1
D
P
N k1 X j D1
.Nk C 1 j /
[see (2.39)] in the form D
jpk1 qk1
jpk1 qk1
D
;
(2.45)
and applying (2.44) and (2.45) we have [note that X 2
C bk1
D
P 3
is defined in (2.42)]
Nk1 .1/k1 j X C .Nk C 1 j /j C 3 qk1 qk j D1
X
.bk1 C 1/qk1 Nk1 C 1 : 2
1 C .1/k1 2
(2.46) t u
Combining (2.41), (2.42), and (2.46), Lemma 2.6 follows.
By using Lemma 2.6, we are ready to complete the proof of Proposition 2.1. Let's return to (2.36). First we extend the definition of $N_k = \sum_{i=1}^{k} b_i q_i$ for all $k > r$ in the trivial way: put $b_i = 0$ for $i > r$. We sum up both sides of Lemma 2.6 as $k = 1, 2, 3, \ldots$; the left-hand side of (2.38) gives
\[
\sum_{k=1}^{N} (N + 1 - k)((k\alpha)), \tag{2.47}
\]
and the right-hand side of (2.38) gives
\[
-\Sigma_1 + \Sigma_2 + \Sigma_3, \tag{2.48}
\]
where
\[
\Sigma_1 = \sum_{i=1}^{r} b_i q_i D(p_i, q_i),
\]
\[
\Sigma_2 = \sum_{j=1}^{r} \frac{b_j}{4} \big( 1 + (-1)^{j+1} \big) \big( 2N_j + 1 - (b_j + 1) q_j \big),
\]
\[
\Sigma_3 = \sum_{j=1}^{\infty} (-1)^j \frac{N_j (N_j + 1)(N_j + 2)}{6 q_j q_{j+1}}
= \sum_{j=1}^{r} (-1)^j \frac{N_j (N_j + 1)(N_j + 2)}{6 q_j q_{j+1}} + \frac{N(N + 1)(N + 2)}{6} \left( \alpha - \frac{p_{r+1}}{q_{r+1}} \right),
\]
where in the last step we used (2.43) and the fact $p_i/q_i \to \alpha$ as $i \to \infty$.

First we evaluate $\Sigma_1$. By Lemma 2.4,
\[
\sum_{i=1}^{r} b_i q_i D(p_i, q_i) = \sum_{i=1}^{r} b_i q_i \left( \frac{a_1 - a_2 \pm \cdots + (-1)^i a_{i-1}}{12} + \frac{\theta_i}{4} \right)
= \sum_{j=2}^{r} \frac{(-1)^j a_{j-1}}{12} (N - N_{j-1}) + \frac{\theta N}{4}
\]
\[
= N \Bigg( \sum_{j=2}^{r} \frac{(-1)^j a_{j-1}}{12} + \sum_{j=2}^{r} \frac{(-1)^{j-1} a_{j-1} N_{j-1}}{12 N} + \frac{\theta}{4} \Bigg), \tag{2.49}
\]
where $|\theta_i| < 1$ and $|\theta| < 1$ are appropriate constants. Since the sequence $N_j = \sum_{i=1}^{j} b_i q_i$ increases at least exponentially fast, an upper bound like
\[
\sum_{i=1}^{k} N_i \le 4 N_{k+1} \tag{2.50}
\]
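is trivial. Combining (2.49) and (2.50),
\[
\sum_{i=1}^{r} b_i q_i D(p_i, q_i) = N \left( \frac{a_1 - a_2 \pm \cdots + (-1)^r a_{r-1}}{12} + \theta' \Big( \max_{1 \le j \le r} a_j \Big) + \theta'' \right), \tag{2.51}
\]
where $|\theta'| \le 4$ and $|\theta''| \le 1/4$.

Next we estimate $\Sigma_2$ from above:
\[
|\Sigma_2| \le \frac12 \sum_{i=1}^{r} b_i N_i \le \frac12 \Big( \max_{1 \le j \le r} a_j \Big) \sum_{i=1}^{r} N_i \le 3N \Big( \max_{1 \le j \le r} a_j \Big), \tag{2.52}
\]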
where in the last step wePused (2.50). Finally, we estimate 3 from above. Since Nj D
j X
bi qi and qj C1 aj qj bj qj ;
i D1
we have ˇ ˇ ˇ ˇ r r ˇ X ˇX N .N C 1/.N C 2/ j ˇ ˇ .1/j j j .bj C 1/2 qj 2N. max aj /: ˇ ˇ 1j r 6qj qj C1 ˇ j D1 ˇj D1
(2.53)
We also have N.N C 1/.N C 2/ 6
ˇ ˇ ˇ N3 N prC1 ˇˇ ˇ : ˇ˛ ˇ 2 qrC1 3 3qrC1
(2.54)
Combining (2.47), (2.48), (2.51)–(2.54), we obtain
\[
M_\alpha(N) = \frac1N \sum_{k=1}^{N} (N + 1 - k)((k\alpha)) = -\frac{a_1 - a_2 \pm \cdots + (-1)^r a_{r-1}}{12} + \theta \Big( \max_{1 \le j \le r} a_j \Big), \tag{2.55}
\]
where $|\theta| < 10$. Equation (2.55) completes the proof of Proposition 2.1. □

Note that our original proof of Proposition 2.1 was a much longer, brute force deduction from Ostrowski's formula (1.55) (see [Be2, Be3]). Later Schoissengeier [Scho] pointed out the connection with Dedekind sums and some related results of Knuth [Kn1], which made the proof substantially shorter. The proof above follows the Schoissengeier–Knuth approach.
2.1.4 Proposition 2.1 and Some Works of Hardy and Littlewood

It is interesting to note that, a few weeks after we completed our proof of Proposition 2.1 (November 1995), we accidentally noticed the following technical lemma in Hardy–Littlewood [Ha-Li2].

"Lemma 14": If $\alpha = [a_0; a_1, a_2, \ldots]$ then
\[
M_\alpha(N) = \frac{1}{12} \sum_{i=1}^{l} (-1)^i \left( \alpha_i + \frac{1}{\alpha_i} \right) + O\Big( \big( \max_{1 \le i \le l} a_i \big)^2 \Big), \tag{2.56}
\]
where $l$ is the least index such that $q_l \ge N$, and
\[
\alpha_i = a_i + \cfrac{1}{a_{i+1} + \cfrac{1}{a_{i+2} + \cdots}} = [a_i; a_{i+1}, a_{i+2}, \ldots].
\]
By using the trivial identity $\alpha_i = a_i + \frac{1}{\alpha_{i+1}}$, the alternating sum in "Lemma 14" becomes
\[
-\left( \alpha_1 + \frac{1}{\alpha_1} \right) + \left( \alpha_2 + \frac{1}{\alpha_2} \right) - \left( \alpha_3 + \frac{1}{\alpha_3} \right) \pm \cdots = -a_1 + a_2 - a_3 \pm \cdots + (-1)^i a_i \pm \cdots. \tag{2.57}
\]
The surprising conclusion is that from "Lemma 14" we can obtain a somewhat weaker version of Proposition 2.1 in one line. Note that (2.56) is weaker, because the error term $O\big( (\max_{1 \le i \le l} a_i)^2 \big)$ is the square of the linear error term $O(\max_{1 \le i \le l} a_i)$ in Proposition 2.1. Note that Hardy and Littlewood proved their "Lemma 14" by using a different kind of reciprocity formula (namely, the reciprocity formula for the theta functions).

A related development is that, about 10 years later, in 1930, Hardy and Littlewood [Ha-Li3] studied the following (diophantine) series:
\[
\sum_{n=1}^{\infty} \frac{1}{n \sin(\pi n \alpha)} \tag{2.58}
\]
and made a very interesting discovery. Though the terms of the series (2.58) do not tend to zero for any $\alpha$, Hardy and Littlewood managed to prove the next best thing; namely, that for the special value $\alpha = \sqrt2$ the partial sums of (2.58) remain uniformly bounded, i.e.,
\[
\sum_{n=1}^{N} \frac{1}{n \sin(\pi n \alpha)} = O(1). \tag{2.59}
\]
In general, if $\alpha = \sqrt{a^2+1}$, $a$ is odd, then the partial sums are similarly $O(1)$. On the other hand, Hardy and Littlewood noticed that for $\alpha = \sqrt{6}/2 - 1$ the $N$th partial sum is $c\log N + O(1)$ with $c \ne 0$.

What is going on here? The proof of the "$O(1)$-theorem" for $\alpha = \sqrt{a^2+1}$, $a$ is odd, was so complicated, mysterious, and ad hoc that in his Introduction to the Collected Papers of G.H. Hardy, Vol. 1, Davenport listed the "real understanding" of this paper as a major research problem in diophantine approximation.

Now here is our "real understanding": the "$O(1)$-theorem" of Hardy and Littlewood is a simple corollary of Proposition 2.1. Indeed, all that we need is the simple identity
$$\sum_{n=1}^{N} \frac{1}{n\sin(\pi n\alpha)} = 2\pi M_\alpha(N) - 2\pi M_{\alpha/2}(N) + O\Big(\max_{1\le i\le l} a_i\Big), \tag{2.60}$$
where $l$ is the last index such that $q_l \le N$.
Equation (2.60) is an easy consequence of two facts. The first one is (2.12):
$$M_\alpha(N) = -\frac{1}{2\pi}\sum_{n=1}^{N} \frac{1}{n\tan(\pi n\alpha)} + O\Big(\max_{1\le i\le k} a_i\Big),$$
where $k$ is the last index for which $q_k \le N$, and the second fact is a simple trigonometric identity:
$$\frac{1}{\tan(\beta)} - \frac{1}{\tan(2\beta)} = \frac{2\cos^2(\beta) - \cos(2\beta)}{2\sin(\beta)\cos(\beta)} = \frac{1}{\sin(2\beta)}.$$
It seems very likely that Hardy and Littlewood overlooked the simple application of Proposition 2.1 via (2.60) (the weaker error term (2.56) would be fine here). This is why they had to develop a complicated ad hoc method in [Ha-Li3].

We will return to the Hardy–Littlewood series $\sum_n 1/(n\sin(\pi n\alpha))$ in Sect. 2.3.
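The contrast between bounded and logarithmically growing partial sums can be observed numerically. The following is only a rough floating-point sketch, not part of any proof: the function name `hl_partial_sum`, the cutoffs $N$, and the choice of $\alpha = \sqrt{5}$ (that is, $a = 2$, an even value) as a comparison case are assumptions made for illustration.

```python
import math

def hl_partial_sum(alpha, N):
    """Partial sum of 1/(n*sin(pi*n*alpha)) for n = 1..N, in floating point."""
    total = 0.0
    for n in range(1, N + 1):
        total += 1.0 / (n * math.sin(math.pi * n * alpha))
    return total

for N in (10**2, 10**3, 10**4):
    s2 = hl_partial_sum(math.sqrt(2), N)   # alpha = sqrt(a^2 + 1) with a = 1 (odd)
    s5 = hl_partial_sum(math.sqrt(5), N)   # alpha = sqrt(a^2 + 1) with a = 2 (even)
    # Third column: ratio of the sqrt(5) sum to log N, to make growth visible.
    print(N, round(s2, 3), round(s5, 3), round(s5 / math.log(N), 3))
```

Double precision should be adequate for these moderate $N$, because $|\sin(\pi n\alpha)|$ stays of order at least $1/n$ for quadratic irrationals; for much larger $N$ an exact or higher-precision evaluation would be safer.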
2.2 Computing the Expectation in General (II)

2.2.1 The Expectation in Theorem 1.1

Next we switch from the saw-tooth function $((x))$ to the characteristic function
$$\chi_\gamma(x) = \begin{cases} 1, & \text{if } 0 \le x < \gamma; \\ 0, & \text{if } \gamma \le x < 1, \end{cases} \tag{2.61}$$
of the interval $[0,\gamma)$, where $0 < \gamma < 1$, and extend it periodically modulo 1. Then we get the simple equation
$$\chi_\gamma(x) - \gamma = ((x-\gamma)) - ((x)). \tag{2.62}$$
The sum
$$\sum_{k=1}^{n} \chi_\gamma(k\alpha)$$
is the counting function for the irrational rotation: it counts the integers $k$ in $1 \le k \le n$ for which $k\alpha \in [0,\gamma)$ modulo 1. Theorem 1.1 is about this counting function. Therefore, to prove Theorem 1.1, we have to determine the corresponding expectation: by (2.62) we need to evaluate the generalized Dedekind sum
$$D(H,K;c) = \sum_{j=1}^{K-1} \left(\!\!\left(\frac{j}{K}\right)\!\!\right)\left(\!\!\left(\frac{jH+c}{K}\right)\!\!\right), \tag{2.63}$$
where $c$, the "shift constant," is an arbitrary real number (by (2.62) we use $c = \gamma$ or $c = 1-\gamma$; it doesn't matter which one).

The following lemma, a reciprocity law due to Dieter [Di], describes the connection between the ordinary Dedekind sum and its generalization (2.63). For later application, we have to include a proof.

Lemma 2.7. Let $1 \le H < K$ be relatively prime integers, and let $0 < c < K$ be a real number. Then
$$D(H,K;c) + D(K,H;c) = D(H,K) + D(K,H) + \frac{\lfloor c\rfloor\lceil c\rceil}{2HK} - \frac{1}{2}\left\lfloor \frac{c}{H}\right\rfloor + \frac{1}{4}E(H,c), \tag{2.64}$$
where
$$E(H,c) = \begin{cases} 0, & \text{if } c \not\equiv 0 \bmod H; \\ 1, & \text{if } c \equiv 0 \bmod H. \end{cases} \tag{2.65'}$$
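Proof. First assume that $c$ is a natural number; we prove (2.64) by induction on $c$. Clearly
$$\left(\!\!\left(\frac{jH+c+1}{K}\right)\!\!\right) = \left(\!\!\left(\frac{jH+c}{K}\right)\!\!\right) + \frac{1}{K} - \frac{1}{2}\,\delta\!\left(\frac{jH+c}{K}\right) - \frac{1}{2}\,\delta\!\left(\frac{jH+c+1}{K}\right), \tag{2.66}$$
where in this section we use the notation $\delta(x) = 1$ if $x$ is an integer and 0 otherwise ("Kronecker delta"). By (2.63) and (2.66),
$$D(H,K;c+1) = \sum_{j=1}^{K-1}\left(\!\!\left(\frac{j}{K}\right)\!\!\right)\left(\!\!\left(\frac{jH+c}{K}\right)\!\!\right) + \frac{1}{K}\sum_{j=1}^{K-1}\left(\!\!\left(\frac{j}{K}\right)\!\!\right) - \frac{1}{2}\sum_{j=1}^{K-1}\left(\!\!\left(\frac{j}{K}\right)\!\!\right)\left(\delta\!\left(\frac{jH+c}{K}\right) + \delta\!\left(\frac{jH+c+1}{K}\right)\right). \tag{2.67}$$
Since $1 \le H < K$ are relatively prime, there exist two integers $h'$ and $k'$ such that
$$Hh' + Kk' = 1. \tag{2.68}$$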
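If $j \equiv -h'c \pmod K$ then $jH + c \equiv 0 \pmod K$; and because the saw-tooth function $((x))$ is odd, we can rewrite (2.67) as follows:
$$D(H,K;c+1) = D(H,K;c) + \frac{1}{2}\left(\!\!\left(\frac{h'c}{K}\right)\!\!\right) + \frac{1}{2}\left(\!\!\left(\frac{h'(c+1)}{K}\right)\!\!\right).$$
It follows by induction on $c$ that
$$D(H,K;c) = D(H,K;0) + \sum_{j=1}^{c-1}\left(\!\!\left(\frac{h'j}{K}\right)\!\!\right) + \frac{1}{2}\left(\!\!\left(\frac{h'c}{K}\right)\!\!\right). \tag{2.69}$$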
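For every $j$ with $1 \le j \le K-1$ [see (2.68)]
$$\left(\!\!\left(\frac{h'j}{K}\right)\!\!\right) = \left(\!\!\left(\frac{j - k'Kj}{HK}\right)\!\!\right) = \left(\!\!\left(\frac{j}{HK} - \frac{k'j}{H}\right)\!\!\right) = \frac{j}{HK} - \left(\!\!\left(\frac{k'j}{H}\right)\!\!\right) - \frac{1}{2}\,\delta\!\left(\frac{k'j}{H}\right). \tag{2.70}$$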
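Adding (2.69) to itself with $H$ and $K$ interchanged, and using (2.70), we have
$$D(H,K;c) + D(K,H;c) = D(H,K) + D(K,H) + S,$$
where
$$S = \sum_{j=1}^{c-1}\left(\frac{j}{HK} - \frac{1}{2}\,\delta\!\left(\frac{k'j}{H}\right)\right) + \frac{c}{2HK} - \frac{1}{4}\,\delta\!\left(\frac{k'c}{H}\right). \tag{2.71}$$
The evaluation of the last line in (2.71) is easy: we have
$$S = \frac{c^2}{2HK} - \frac{1}{2}\left\lfloor\frac{c}{H}\right\rfloor + \frac{1}{4}\,\delta\!\left(\frac{c}{H}\right). \tag{2.72}$$
Equations (2.71) and (2.72) complete the proof when $c$ is any integer. For an arbitrary real number $c$ we use the identity
$$D(H,K;c+\theta) = D(H,K;c) + \frac{1}{2}\left(\!\!\left(\frac{h'c}{K}\right)\!\!\right), \tag{2.73}$$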
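where $c \ge 0$ is an integer and $0 < \theta < 1$ [$h'$ is defined by (2.68)]. The proof of (2.73) is easy:
$$\sum_{j=1}^{K-1}\left(\!\!\left(\frac{j}{K}\right)\!\!\right)\left(\!\!\left(\frac{jH+c+\theta}{K}\right)\!\!\right) = \sum_{j=1}^{K-1}\left(\!\!\left(\frac{j}{K}\right)\!\!\right)\left(\left(\!\!\left(\frac{jH+c}{K}\right)\!\!\right) + \frac{\theta}{K} - \frac{1}{2}\,\delta\!\left(\frac{jH+c}{K}\right)\right) = D(H,K;c) + 0 + \frac{1}{2}\left(\!\!\left(\frac{h'c}{K}\right)\!\!\right),$$
because $-h'Hc + c \equiv 0 \pmod K$, and (2.73) follows. When $0 < \theta < 1$, Eqs. (2.73) and (2.70) imply that
$$D(H,K;c+\theta) + D(K,H;c+\theta) = D(H,K;c) + D(K,H;c) + \frac{c}{2HK} - \frac{1}{4}\,\delta\!\left(\frac{c}{H}\right).$$
This completes the proof of Lemma 2.7. □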
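The reciprocity law (2.64) can be spot-checked with exact rational arithmetic. The sketch below is a brute-force illustration, not an efficient algorithm; the helper names `saw`, `dedekind`, and `reciprocity_gap` are assumptions introduced here, and the saw-tooth function is taken to vanish at integers, as in the text.

```python
from fractions import Fraction
from math import floor, ceil

def saw(x):
    """Saw-tooth ((x)) = {x} - 1/2 for non-integer x, and 0 for integer x."""
    frac = x - floor(x)
    return Fraction(0) if frac == 0 else frac - Fraction(1, 2)

def dedekind(H, K, c=0):
    """Generalized Dedekind sum D(H, K; c) of (2.63); c = 0 gives D(H, K)."""
    c = Fraction(c)
    return sum(saw(Fraction(j, K)) * saw((j * H + c) / K) for j in range(1, K))

def reciprocity_gap(H, K, c):
    """Left side of (2.64) minus its right side; zero when the law holds."""
    c = Fraction(c)
    E = 1 if (c / H).denominator == 1 else 0      # E(H, c) of (2.65')
    lhs = dedekind(H, K, c) + dedekind(K, H, c)
    rhs = (dedekind(H, K) + dedekind(K, H)
           + Fraction(floor(c) * ceil(c), 2 * H * K)
           - Fraction(floor(c / H), 2)
           + Fraction(E, 4))
    return lhs - rhs

print(reciprocity_gap(2, 5, 2), reciprocity_gap(2, 5, Fraction(5, 2)))
```

Using `Fraction` avoids any rounding issue at the jump points of the saw-tooth function, which is where a floating-point check would be least reliable.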
Lemma 2.7 leads to the following analog of Lemma 2.4; see Knuth [Kn1]. Again we need the proof.

Lemma 2.8. Let $1 \le H < K$ be relatively prime integers and let $0 < c < K$ be a real number. Let
$$\frac{H}{K} = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots}} = [a_1, a_2, a_3, \ldots, a_\ell];$$
then
$$D(H,K;c) - D(H,K) = \frac{-b_1 + b_2 - b_3 \pm \cdots + (-1)^\ell b_\ell}{2} + \frac{c_0^2}{2KH} - \frac{c_1^2}{2HH_1} + \frac{c_2^2}{2H_1H_2} \mp \cdots + (-1)^{\ell-1}\frac{c_{\ell-1}^2}{2H_{\ell-2}H_{\ell-1}} + O(1), \tag{2.74}$$
where the terms $b_i$, $c_i$, $H_i$ in (2.74) are determined by two Euclidean algorithms as follows. Let $H_{-1} = K$, $H_0 = H$, and define $H_i$ by the first Euclidean algorithm
$$K = a_1H + H_1,\quad H = a_2H_1 + H_2,\quad H_1 = a_3H_2 + H_3,\quad \ldots,\quad H_{\ell-2} = a_\ell H_{\ell-1}, \tag{2.75}$$
where $H_{\ell-1} = \gcd(H,K) = 1$ (gcd denotes the greatest common divisor); then by using (2.75), we define the integers $b_i$ and the real numbers $c_i$ via the second Euclidean algorithm
$$c = c_0 = b_1H_0 + c_1,\quad c_1 = b_2H_1 + c_2,\quad c_2 = b_3H_2 + c_3,\quad \ldots,\quad c_{\ell-1} = b_\ell H_{\ell-1} + c_\ell, \tag{2.76}$$
where $0 \le c_1 < H_0$, $0 \le c_2 < H_1$, $\ldots$, and $0 \le c_\ell < 1$ (note that $H_\ell = 0$). The error term $O(1)$ in (2.74) has absolute value $\le 1$.

Proof. First assume that $c$ is an integer; then $c_\ell = 0$. Write
$$\Delta(h,k;c) = D(h,k;c) - D(h,k)$$
and
$$F(h,k,c) = \frac{c^2}{2hk} - \frac{1}{2}\left\lfloor\frac{c}{h}\right\rfloor + \frac{1}{4}\,\delta\!\left(\frac{c}{h}\right);$$
then by Lemma 2.7,
$$\Delta(h,k;c) = F(h,k,c) - \Delta(k,h;c) = F(h,k,c) - \Delta\big(k \ (\mathrm{mod}\ h),\, h;\, c \ (\mathrm{mod}\ h)\big). \tag{2.77}$$
Combining the Euclidean algorithms (2.75) and (2.76) with (2.77), we have
$$\Delta(H_j, H_{j-1}; c_j) = F(H_j, H_{j-1}, c_j) - \Delta(H_{j+1}, H_j; c_{j+1}) \tag{2.78}$$
for $j = 0, 1, 2, \ldots, \ell-1$. Write $F_j = F(H_j, H_{j-1}, c_j)$; then by repeated application of (2.78), we have
$$\Delta(H,K;c) = F_0 - F_1 + F_2 - F_3 \pm \cdots + (-1)^{\ell-1}F_{\ell-1} = \sum_{j=0}^{\ell-1} (-1)^j\left(\frac{c_j^2}{2H_{j-1}H_j} - \frac{1}{2}b_{j+1} + \frac{1}{4}\,\delta\!\left(\frac{c_j}{H_j}\right)\right)$$
$$= \frac{-b_1 + b_2 - b_3 \pm \cdots + (-1)^\ell b_\ell}{2} + \sum_{j=0}^{\ell-1} (-1)^j\frac{c_j^2}{2H_{j-1}H_j} + \frac{(-1)^{\ell-1}}{4}. \tag{2.79}$$
Equation (2.79) proves Lemma 2.8 if $c$ is an integer. If $c$ is not an integer then we simply apply (2.73). □
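The two Euclidean algorithms (2.75) and (2.76) are easy to run mechanically. The following is a minimal sketch under the indexing of Lemma 2.8 ($H_{-1} = K$, $H_0 = H$, $c_0 = c$); the function name `two_euclidean_algorithms` and the list layout are assumptions made for illustration, and the sample input $H/K = 29/70$ with $c = K/2$ is just one convenient choice.

```python
from fractions import Fraction
from math import floor

def two_euclidean_algorithms(H, K, c):
    """Return (a, Hs, b, cs) following (2.75) and (2.76).

    Hs[0] = K plays the role of H_{-1}, Hs[1] = H is H_0, and so on;
    cs[0] = c is c_0.  Assumes gcd(H, K) = 1 and 1 <= H < K.
    """
    a, Hs = [], [K, H]
    while Hs[-1] != 0:                      # first algorithm (2.75)
        q, r = divmod(Hs[-2], Hs[-1])
        a.append(q)
        Hs.append(r)
    b, cs = [], [Fraction(c)]
    for i in range(len(a)):                 # second algorithm (2.76)
        divisor = Hs[i + 1]                 # this is H_i
        b_next = floor(cs[-1] / divisor)
        b.append(b_next)
        cs.append(cs[-1] - b_next * divisor)
    return a, Hs, b, cs

a, Hs, b, cs = two_euclidean_algorithms(29, 70, Fraction(70, 2))
print(a)   # partial quotients a_1, ..., a_l: [2, 2, 2, 2, 2]
print(b)   # digits b_1, ..., b_l: [1, 0, 1, 0, 1]
```

Keeping $c$ as a `Fraction` matters because $c$ is in general a non-integer multiple of $K$, and the digits $b_i$ are floors of exact quotients.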
2.2.2 An Analog of Proposition 2.1

Let $0 < \alpha < 1$ be any irrational and let $0 < \gamma < 1$ be any rational number. To prove Theorem 1.1 about the irrational rotation, first we need to know the average ("expectation")
$$M_\alpha(\gamma; N) = \frac{1}{N}\sum_{n=1}^{N} S_\alpha(\gamma; n), \tag{2.80}$$
where
$$S_\alpha(\gamma; n) = \sum_{k=1}^{n} \big(\chi_\gamma(k\alpha) - \gamma\big) \tag{2.81}$$
and the characteristic function $\chi_\gamma(x)$ is defined in (2.61). By using (2.62) we have
$$S_\alpha(\gamma; n) = \sum_{k=1}^{n} \big(((k\alpha - \gamma)) - ((k\alpha))\big),$$
and
$$M_\alpha(\gamma; N) = \frac{1}{N}\sum_{k=1}^{N} (N+1-k)\big(((k\alpha - \gamma)) - ((k\alpha))\big).$$
Repeating the proof of Proposition 2.1 with some natural modifications, we obtain the following analogous result.

Proposition 2.9. For any irrational $\alpha > 0$, any real number $0 < \gamma < 1$, and any integer $N \ge 1$,
$$M_\alpha(\gamma; N) = \frac{b_1 - b_2 + b_3 \mp \cdots + (-1)^{\ell-1}b_\ell}{2} - \frac{c_0^2}{2KH} + \frac{c_1^2}{2HH_1} - \frac{c_2^2}{2H_1H_2} \pm \cdots + (-1)^\ell\frac{c_{\ell-1}^2}{2H_{\ell-2}H_{\ell-1}} + \theta\max_{1\le j\le\ell} b_j, \tag{2.82}$$
where $|\theta| < 10$, $\alpha = [a_1, a_2, \ldots]$, the index $\ell = \ell(\alpha, N)$ is defined as the last integer $j$ such that $q_j \le N$, where $p_j/q_j$ is the $j$-th convergent of $\alpha$, and finally the terms $b_i$, $c_i$, $H_i$ in (2.82) are determined by the two Euclidean algorithms (2.75) and (2.76) with $c = c_0 = (1-\gamma)K$, $K = q_\ell$, $H = p_\ell$ (i.e., $H/K = p_\ell/q_\ell$). □

Next we show some illustrations.
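Example 2.10. First let $\gamma = 1/2$. We begin with $\alpha = \sqrt{2}$, and evaluate $M_{\sqrt2}(1/2; N)$, i.e., the corresponding expectation in Theorem 1.1. The continued fraction $\sqrt{2} - 1 = [2, 2, 2, \ldots] = [\overline{2}]$ gives that $2 = a_1 = a_2 = a_3 = \cdots$ in (2.75). Next we compute $b_i$, $c_i$, $H_i$ in (2.76) as follows:
$$c = c_0 = (1-\gamma)K = \tfrac12(2H + H_1) = H + \tfrac12 H_1,$$
implying $b_1 = 1$, and
$$c_1 = \tfrac12 H_1 = 0\cdot H_1 + \tfrac12 H_1, \text{ implying } b_2 = 0; \text{ and}$$
$$c_2 = \tfrac12 H_1 = \tfrac12(2H_2 + H_3) = H_2 + \tfrac12 H_3, \text{ implying } b_3 = 1;$$
and so on. Thus we obtain the periodic sequences
$$b_1 = 1,\ b_2 = 0,\ b_3 = 1,\ b_4 = 0,\ \ldots,\ b_i = \tfrac12\big(1 + (-1)^{i-1}\big);$$
$$c_0 = \tfrac12 K,\quad c_1 = c_2 = \tfrac12 H_1,\quad c_3 = c_4 = \tfrac12 H_3,\quad c_5 = c_6 = \tfrac12 H_5,\ \ldots$$
Hence we have
$$\frac{b_1 - b_2 + b_3 - b_4 \pm \cdots}{2} = \frac{1 - 0 + 1 - 0 + 1 - 0 + \cdots}{2} \tag{2.83}$$
and
$$-\frac{c_0^2}{2KH} + \frac{c_1^2}{2HH_1} - \frac{c_2^2}{2H_1H_2} \pm \cdots = -\frac{K}{8H} - \frac18\left(\frac{H_1}{H_2} - \frac{H_1}{H}\right) - \frac18\left(\frac{H_3}{H_4} - \frac{H_3}{H_2}\right) - \frac18\left(\frac{H_5}{H_6} - \frac{H_5}{H_4}\right) - \cdots. \tag{2.84}$$
Since
$$\frac18\left(\frac{H_{2i+1}}{H_{2i+2}} - \frac{H_{2i+1}}{H_{2i}}\right) = \frac18\cdot\frac{H_{2i+1}(H_{2i} - H_{2i+2})}{H_{2i+2}H_{2i}} = \frac18\cdot\frac{2H_{2i+1}\cdot H_{2i+1}}{H_{2i+2}H_{2i}} = \frac{H_{2i+1}^2}{4H_{2i+2}H_{2i}} = \frac14 + \text{exponentially small}, \tag{2.85}$$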
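applying (2.83)–(2.85) in Proposition 2.9, by (2.82) we have
$$M_{\sqrt2}\left(\tfrac12; N\right) = \left(\frac{1-0}{2} - \frac14\right)\frac{\log N}{2\log(1+\sqrt2)} + O(1),$$
where in the last step we used the fact that [see (2.79)]
$$q_\ell = \frac{(1+\sqrt2)^\ell - (1-\sqrt2)^\ell}{2\sqrt2} \approx N \quad\text{implies}\quad \ell = \frac{\log N}{\log(1+\sqrt2)} + O(1).$$
Thus we obtain
$$M_{\sqrt2}\left(\tfrac12; N\right) = \frac18\cdot\frac{\log N}{\log(1+\sqrt2)} + O(1), \tag{2.86}$$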
which proves (1.32).

In the special case $\gamma = 1/2$ we have the ad hoc identity
$$\chi_{1/2}(x) - \frac12 = ((2x)) - 2((x)), \tag{2.87}$$
which gives the equation [see (2.62) and (2.80)]
$$M_\alpha\left(\tfrac12; N\right) = M_{2\alpha}(N) - 2M_\alpha(N). \tag{2.88}$$
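Identity (2.88) holds term by term, so it can be confirmed numerically for any irrational $\alpha$. The sketch below is a floating-point illustration only; the function names are assumptions introduced here, $S_\alpha(\gamma; n)$ is taken in the centered form $\sum_k (\chi_\gamma(k\alpha) - \gamma)$, and $\alpha = \sqrt2$, $N = 5000$ are arbitrary sample values.

```python
import math

def saw(x):
    """Saw-tooth ((x)) in floating point (x is assumed not to be an integer)."""
    return x - math.floor(x) - 0.5

def mean_saw_sum(alpha, N):
    """M_alpha(N) = (1/N) * sum_{k=1}^{N} (N + 1 - k) * ((k*alpha))."""
    return sum((N + 1 - k) * saw(k * alpha) for k in range(1, N + 1)) / N

def mean_interval_sum(alpha, gamma, N):
    """M_alpha(gamma; N), with S_alpha(gamma; n) = sum_k (chi_gamma(k*alpha) - gamma)."""
    total = 0.0
    for k in range(1, N + 1):
        frac = k * alpha - math.floor(k * alpha)
        indicator = 1.0 if frac < gamma else 0.0
        total += (N + 1 - k) * (indicator - gamma)
    return total / N

alpha, N = math.sqrt(2), 5000
left = mean_interval_sum(alpha, 0.5, N)
right = mean_saw_sum(2 * alpha, N) - 2 * mean_saw_sum(alpha, N)
print(left, right)   # the two values should agree up to rounding error
```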
By using (2.88), we can easily double-check (2.86). What it means is that we apply Proposition 2.1 for both $\alpha = \sqrt2 = [1; \overline{2}]$ and
$$2\alpha = 2\sqrt2 = \sqrt8 = [2; 1, 4, 1, 4, 1, 4, \ldots] = [2; \overline{1, 4}].$$
The length of the period of $\alpha = \sqrt2$ is odd, so the corresponding alternating sum in Proposition 2.1 cancels out. Thus we have
$$M_{\sqrt2}\left(\tfrac12; N\right) = M_{2\sqrt2}(N) + O(1) = \frac{-1 + 4 - 1 + 4 - 1 + 4 \mp \cdots}{12} + O(1) = \frac{-1+4}{12}\cdot\frac12\cdot\frac{\log N}{\log(1+\sqrt2)} + O(1) = \frac18\cdot\frac{\log N}{\log(1+\sqrt2)} + O(1), \tag{2.89}$$
which gives back (2.86). In Eq. (2.89) we used the fact that the $(2i)$th convergent $p_{2i}/q_{2i}$ of $\sqrt8$ satisfies the equation
$$p_{2i} \pm q_{2i}\sqrt8 = (3 \pm \sqrt8)^i$$
(due to the fact that the least positive solution of $x^2 - 8y^2 = \pm1$ is $x = 3$, $y = 1$), which implies
$$q_{2i} = \frac{1}{2\sqrt8}\Big((3+\sqrt8)^i - (3-\sqrt8)^i\Big) \approx (3+\sqrt8)^i = (1+\sqrt2)^{2i}.$$
The ad hoc equation (2.88) gives a shortcut for $\gamma = 1/2$ with any quadratic irrational $\alpha$. For example, if $\alpha = \sqrt3 = [1; \overline{1, 2}]$ then
$$2\alpha = 2\sqrt3 = \sqrt{12} = [3; \overline{2, 6}].$$
Thus by (2.88) and Proposition 2.1,
$$M_{\sqrt3}\left(\tfrac12; N\right) = M_{2\sqrt3}(N) - 2M_{\sqrt3}(N) = \frac{1}{12}\left(\frac{-2+6}{2}\cdot\frac{\log N}{\log(2+\sqrt3)} - 2\cdot\frac{-1+2}{2}\cdot\frac{2\log N}{\log(2+\sqrt3)}\right) + O(1) = O(1), \tag{2.90}$$
since the $(2i)$th convergent $p_{2i}/q_{2i}$ of $\sqrt3$ satisfies the equation
$$p_{2i} \pm q_{2i}\sqrt3 = (2 \pm \sqrt3)^i,$$
which implies
$$q_{2i} = \frac{1}{2\sqrt3}\Big((2+\sqrt3)^i - (2-\sqrt3)^i\Big) \approx (2+\sqrt3)^i;$$
similarly, the $i$th convergent denominator for $2\sqrt3$ is about $(2+\sqrt3)^i$ (because the least positive solution of $x^2 - 12y^2 = \pm1$ is $x = 7$, $y = 2$, and $7 + 2\sqrt{12} = (2+\sqrt3)^2$).

Next consider the golden ratio $\alpha = (\sqrt5+1)/2$. Then $\alpha = [1; \overline{1}]$ and $2\alpha = [3; \overline{4}]$. Since the length of the period is odd for both continued fractions, by (2.88) and Proposition 2.1,
$$M_{(\sqrt5+1)/2}\left(\tfrac12; N\right) = O(1). \tag{2.91}$$
The last example in this section is $\alpha = \sqrt7$ (again $\gamma = 1/2$). We need the following facts: $\sqrt7 = [2; \overline{1, 1, 1, 4}]$, $\sqrt{28} = [5; \overline{3, 2, 3, 10}]$, the least positive solutions of $x^2 - 7y^2 = \pm1$ and $x^2 - 28y^2 = \pm1$ are, respectively, $x = 8$, $y = 3$ and $x = 127$, $y = 24$ with the relation $127 + 24\sqrt{28} = (8 + 3\sqrt7)^2$. Combining these facts with (2.88) and Proposition 2.1, we have
$$M_{\sqrt7}\left(\tfrac12; N\right) = M_{2\sqrt7}(N) - 2M_{\sqrt7}(N) = \frac{\log N}{12}\left(\frac{-3+2-3+10}{\log(127+24\sqrt{28})} - 2\cdot\frac{-1+1-1+4}{\log(8+3\sqrt7)}\right) + O(1) = -\frac{\log N}{4\log(8+3\sqrt7)} + O(1). \tag{2.92}$$
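The Pell-equation facts used in (2.89), (2.90), and (2.92) can be verified in exact integer arithmetic. The sketch below is an illustration with assumed helper names (`convergents`, `unit_power`); it uses the standard indexing in which the first convergent is $a_0/1$, so the text's "$(2i)$th convergent" for a period of length 2 appears at list position $2i - 1$.

```python
def convergents(a0, period, count):
    """First `count` convergents (p, q) of [a0; period, period, ...].

    Standard indexing: the first pair is a0/1.
    """
    digits = [a0]
    while len(digits) < count:
        digits.extend(period)
    p_prev, q_prev, p, q = 1, 0, digits[0], 1
    out = [(p, q)]
    for a in digits[1:count]:
        p_prev, q_prev, p, q = p, q, a * p + p_prev, a * q + q_prev
        out.append((p, q))
    return out

def unit_power(x, y, d, i):
    """Integers (X, Y) with X + Y*sqrt(d) = (x + y*sqrt(d))**i."""
    X, Y = 1, 0
    for _ in range(i):
        X, Y = X * x + d * Y * y, X * y + Y * x
    return X, Y

sqrt8 = convergents(2, [1, 4], 12)
print([sqrt8[2 * i - 1] for i in range(1, 4)])    # compare with powers of 3 + sqrt(8)
print([unit_power(3, 1, 8, i) for i in range(1, 4)])
```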
Next we discuss examples where $\gamma \ne 1/2$.

Example 2.11. Next let $\gamma = 1/3$ and $\alpha = \sqrt2$. Then $\sqrt2 = [1; \overline{2}]$ gives that $2 = a_1 = a_2 = a_3 = \cdots$ in (2.75). We compute $b_i$, $c_i$, $H_i$ in (2.76) as follows:
$$c = c_0 = (1-\gamma)K = \tfrac23 K = \tfrac23(2H + H_1) = H + \tfrac13 H + \tfrac23 H_1,$$
implying $b_1 = 1$, and similarly
$$c_1 = \tfrac13 H + \tfrac23 H_1 = \tfrac13(2H_1 + H_2) + \tfrac23 H_1 = H_1 + \tfrac13 H_1 + \tfrac13 H_2, \text{ implying } b_2 = 1; \text{ and}$$
$$c_2 = \tfrac13 H_1 + \tfrac13 H_2 = \tfrac13(2H_2 + H_3) + \tfrac13 H_2 = H_2 + \tfrac13 H_3, \text{ implying } b_3 = 1; \text{ and}$$
$$c_3 = \tfrac13 H_3 = 0\cdot H_3 + \tfrac13 H_3, \text{ implying } b_4 = 0; \text{ and}$$
$$c_4 = \tfrac13 H_3 = \tfrac13(2H_4 + H_5) = 0\cdot H_4 + \tfrac23 H_4 + \tfrac13 H_5, \text{ implying } b_5 = 0; \text{ and}$$
$$c_5 = \tfrac23 H_4 + \tfrac13 H_5 = \tfrac23(2H_5 + H_6) + \tfrac13 H_5 = H_5 + \tfrac23 H_5 + \tfrac23 H_6, \text{ implying } b_6 = 1; \text{ and}$$
$$c_6 = \tfrac23 H_5 + \tfrac23 H_6 = \tfrac23(2H_6 + H_7) + \tfrac23 H_6 = 2H_6 + \tfrac23 H_7, \text{ implying } b_7 = 2; \text{ and}$$
$$c_7 = \tfrac23 H_7 = 0\cdot H_7 + \tfrac23 H_7, \text{ implying } b_8 = 0; \text{ and}$$
$$c_8 = \tfrac23 H_7 = \tfrac23(2H_8 + H_9) = H_8 + \tfrac13 H_8 + \tfrac23 H_9, \text{ implying } b_9 = 1; \text{ and so on;}$$
back to the beginning. Thus we get the periodic sequence for $b_1, b_2, b_3, \ldots$:
$$1, 1, 1, 0, 0, 1, 2, 0,\ 1, 1, 1, 0, 0, 1, 2, 0,\ 1, 1, 1, 0, 0, 1, 2, 0,\ \ldots$$
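Therefore, we obtain
$$\frac{b_1 - b_2 + b_3 - b_4 \pm \cdots}{2} = \frac{1-1+1-0+0-1+2-0}{2}\cdot\frac{\log N}{8\log(1+\sqrt2)} + O(1), \tag{2.93}$$
and
$$-\frac{c_0^2}{2KH} + \frac{c_1^2}{2HH_1} - \frac{c_2^2}{2H_1H_2} \pm \cdots = -\frac{1}{18}\left(\frac{(2K)^2}{KH} - \frac{(H+2H_1)^2}{HH_1} + \frac{(H_1+H_2)^2}{H_1H_2} - \frac{H_3^2}{H_2H_3}\right)\frac{\log N}{8\log(1+\sqrt2)}$$
$$-\ \frac{1}{18}\left(\frac{H_3^2}{H_3H_4} - \frac{(2H_4+H_5)^2}{H_4H_5} + \frac{(2H_5+2H_6)^2}{H_5H_6} - \frac{(2H_7)^2}{H_6H_7}\right)\frac{\log N}{8\log(1+\sqrt2)} + O(1). \tag{2.94}$$
Since by (2.75)
$$\frac{H_i - H_{i+2}}{H_{i+1}} = a_{i+2} = 2,$$
we can rewrite the sum over one period in (2.94) as follows:
$$\frac{1}{18}\left(-\frac{4(K-H_1)}{H} + 4 + \frac{H-H_2}{H_1} - 2 - \frac{H_1-H_3}{H_2} - \frac{H_3-H_5}{H_4} + 4 + \frac{4(H_4-H_6)}{H_5} - 8 - \frac{4(H_5-H_7)}{H_6}\right)$$
$$= \frac{1}{18}(-8 + 4 + 2 - 2 - 2 - 2 + 4 + 8 - 8 - 8) = -\frac23,$$
implying
$$\text{sum}(2.94) = -\frac{\log N}{12\log(1+\sqrt2)} + O(1). \tag{2.95}$$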
Applying (2.93)–(2.95) in (2.82), we have
$$M_{\sqrt2}\left(\tfrac13; N\right) = \left(\frac18 - \frac{1}{12}\right)\frac{\log N}{\log(1+\sqrt2)} + O(1) = \frac{\log N}{24\log(1+\sqrt2)} + O(1). \tag{2.96}$$
Next let $\gamma = 2/3$ and $\alpha = \sqrt2$, then a similar calculation gives the same answer:
$$M_{\sqrt2}\left(\tfrac23; N\right) = \frac{\log N}{24\log(1+\sqrt2)} + O(1). \tag{2.97}$$
We can easily double-check (2.96) and (2.97) by using the ad hoc equation
$$\left(\chi_{1/3}(x) - \frac13\right) + \left(\chi_{2/3}(x) - \frac23\right) = ((3x)) - 3((x)), \tag{2.98}$$
which leads to [see (2.62) and (2.80)]
$$M_\alpha\left(\tfrac13; N\right) + M_\alpha\left(\tfrac23; N\right) = M_{3\alpha}(N) - 3M_\alpha(N). \tag{2.99}$$
Notice that (2.98) and (2.99) are analogs of (2.87) and (2.88).

We have $3\sqrt2 = \sqrt{18} = [4; \overline{4, 8}]$, and so by Proposition 2.1,
$$M_{3\sqrt2}(N) = \frac{1}{12}\cdot\frac{-4+8}{2}\cdot\frac{\log N}{2\log(1+\sqrt2)} + O(1), \tag{2.100}$$
because the least positive solution of $x^2 - 18y^2 = \pm1$ is $x = 17$, $y = 4$, and so the $(2i)$th convergent $p_{2i}/q_{2i}$ of $\sqrt{18}$ satisfies the equation
$$p_{2i} \pm q_{2i}\sqrt{18} = (17 \pm 4\sqrt{18})^i,$$
which implies
$$q_{2i} \approx (17 + 4\sqrt{18})^i = (1+\sqrt2)^{4i}.$$
Since the length of the period of $\sqrt2$ is odd, by (2.99) and (2.100),
$$M_{\sqrt2}\left(\tfrac13; N\right) + M_{\sqrt2}\left(\tfrac23; N\right) = M_{3\sqrt2}(N) - 3M_{\sqrt2}(N) = \frac{\log N}{12\log(1+\sqrt2)} + O(1),$$
which is in agreement with (2.96) and (2.97).

Example 2.12. Let $\gamma = 1/4$ and $\alpha = (\sqrt5+1)/2 = [1; \overline{1}]$. Then $1 = a_1 = a_2 = a_3 = \cdots$ in (2.75),
$$c = c_0 = (1-\gamma)K = \tfrac34 K = \tfrac34(H + H_1) = H + \tfrac14(3H_1 - H), \tag{2.101}$$
implying $b_1 = 1$. Note that $3H_1 > H$, since $H/H_1$ is very close to the golden ratio $\alpha = (\sqrt5+1)/2 < 3$. We have
$$3H_1 - H = 3H_1 - (H_1 + H_2) = 2H_1 - H_2 = 2(H_2 + H_3) - H_2 = H_2 + 2H_3, \tag{2.102}$$
and so
$$c_1 = \tfrac14 H_2 + \tfrac12 H_3 = 0\cdot H_1 + c_2 = 0\cdot H_2 + c_3, \text{ implying } b_2 = b_3 = 0; \text{ and}$$
$$c_3 = \tfrac14 H_2 + \tfrac12 H_3 = \tfrac14(H_3 + H_4) + \tfrac12 H_3 = \tfrac34 H_3 + \tfrac14 H_4 < H_3, \text{ implying } b_4 = 0; \text{ and}$$
$$c_4 = \tfrac34 H_3 + \tfrac14 H_4 = \tfrac34(H_4 + H_5) + \tfrac14 H_4 = H_4 + \tfrac34 H_5, \text{ implying } b_5 = 1; \text{ and}$$
$$c_5 = \tfrac34 H_5 = 0\cdot H_5 + \tfrac34 H_5, \text{ implying } b_6 = 0; \text{ and}$$
$$c_6 = \tfrac34 H_5 = \tfrac34(H_6 + H_7) = H_6 + \tfrac14(3H_7 - H_6),$$
which is the same as the beginning. Thus we get the periodic sequence for $b_1, b_2, b_3, \ldots$:
$$1, 0, 0, 0, 1, 0,\ 1, 0, 0, 0, 1, 0,\ 1, 0, 0, 0, 1, 0,\ \ldots,$$
implying
$$\frac{b_1 - b_2 + b_3 - b_4 \pm \cdots}{2} = \frac{1-0+0-0+1-0}{2}\cdot\frac{\log N}{6\log\frac{\sqrt5+1}{2}} + O(1), \tag{2.103}$$
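and
$$-\frac{c_0^2}{2KH} + \frac{c_1^2}{2HH_1} - \frac{c_2^2}{2H_1H_2} \pm \cdots = \frac{1}{32}\,S_0\cdot\frac{\log N}{6\log\frac{\sqrt5+1}{2}} + O(1), \tag{2.104}$$
where
$$S_0 = -\frac{9K^2}{KH} + (H_2 + 2H_3)^2\left(\frac{1}{HH_1} - \frac{1}{H_1H_2} + \frac{1}{H_2H_3}\right) - \frac{(3H_3 + H_4)^2}{H_3H_4} + \frac{9H_5^2}{H_4H_5}.$$
The critical sum $S_0$ in the middle of (2.104) equals (with $\alpha = (\sqrt5+1)/2$)
$$S_0 = -9\alpha + (\alpha+2)^2\left(\alpha^{-5} - \alpha^{-3} + \alpha^{-1}\right) - \frac{(3\alpha+1)^2}{\alpha} + 9\alpha^{-1}, \tag{2.105}$$
and using the simple facts $\alpha^2 = 1 + \alpha$ and $\alpha^{-2} = 1 - \alpha^{-1}$, it is easy to evaluate (2.105): $S_0 = -24$. Returning to (2.104), we have
$$\text{sum}(2.104) = \frac{1}{32}(-24)\cdot\frac{\log N}{6\log\frac{\sqrt5+1}{2}} + O(1). \tag{2.106}$$
Applying (2.103)–(2.106) in (2.82), we have
$$M_{(\sqrt5+1)/2}\left(\tfrac14; N\right) = \left(1 - \frac{24}{32}\right)\frac{\log N}{6\log\frac{\sqrt5+1}{2}} + O(1) = \frac{\log N}{24\log\frac{\sqrt5+1}{2}} + O(1). \tag{2.107}$$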
2.2.3 Periodicity in Proposition 2.9

Let's return to Proposition 2.9 and Eq. (2.82). The periodicity of $b_1, b_2, b_3, \ldots$ in the examples above was not an accident: we prove that if the sequence $a_1, a_2, a_3, \ldots$ is periodic and $c/K$ is a rational number, then $b_1, b_2, b_3, \ldots$ is also periodic (but the length of the period is not necessarily the same). Indeed, write $c/K = s/t$ where $1 \le s < t$ are relatively prime integers. Then by (2.75) and (2.76),
$$c = c_0 = \frac{s}{t}K = \frac{s}{t}(a_1H + H_1) = b_1H + c_1,$$
where ($\lfloor x\rfloor$ and $\{x\}$ denote the lower integral part and the fractional part of $x$)
$$b_1 = \left\lfloor\frac{sa_1}{t}\right\rfloor \quad\text{and}\quad c_1 = \left\{\frac{sa_1}{t}\right\}H + \frac{s}{t}H_1 = \frac{s_1}{t}H + \frac{s}{t}H_1,$$
and here we assume that $c_1 < H$. Similarly,
$$c_1 = \frac{s_1}{t}H + \frac{s}{t}H_1 = \frac{s_1}{t}(a_2H_1 + H_2) + \frac{s}{t}H_1 = b_2H_1 + c_2,$$
where
$$b_2 = \left\lfloor\frac{s_1a_2 + s}{t}\right\rfloor \quad\text{and}\quad c_2 = \left\{\frac{s_1a_2 + s}{t}\right\}H_1 + \frac{s_1}{t}H_2 = \frac{s_2H_1 + s_1H_2}{t},$$
and again we assume that $c_2 < H_1$. Repeating this argument, for every $i \ge 0$ we have
$$c_i = \frac{s_iH_{i-1} + s_{i-1}H_i}{t}, \tag{2.108}$$
where $0 \le s_i, s_{i-1} < t$ are integers, and we always assume that $c_i < H_{i-1}$.

The periodicity of $a_i$ means that
$$a_i = a_{i+L} \text{ holds for (say) } M_1 \le i \le M_2, \tag{2.109}$$
and here we assume that $(M_2 - M_1)/L$ is a very large integer. Consider now the sequence with gap $L$ [see (2.109)]:
$$c_{M_1},\ c_{M_1+L},\ c_{M_1+2L},\ c_{M_1+3L},\ \ldots,\ c_{M_2};$$
by (2.108) we have
$$c_{M_1+jL} = \frac{s'_jH_{M_1+jL-1} + s''_jH_{M_1+jL}}{t} < H_{M_1+jL-1}, \tag{2.110}$$
where $0 \le s'_j, s''_j < t$ are integers. If $(M_2 - M_1)/L$ is larger than $t^2$, then by the Pigeonhole Principle there is a repetition among the pairs $(s'_j, s''_j)$, $j = 0, 1, 2, \ldots$, and the first repetition implies the periodicity of the sequence $b_1, b_2, b_3, \ldots$ in the rest of the interval $M_1 \le i \le M_2$ [see (2.109)]. Of course, we cannot predict the length of the period, but it is certainly less than $L(t^2+1)$.

Warning! It may happen that our assumption
$$c_i = \frac{s_iH_{i-1} + s_{i-1}H_i}{t} < H_{i-1}, \quad 0 \le s_i, s_{i-1} < t,$$
in (2.108) is violated; for example, see Eq. (2.101) in Example 2.12 (where $\alpha = (\sqrt5+1)/2$ and $\gamma = 1/4$):
$$c_0 = \tfrac34(H + H_1) > H,$$
since $H/H_1$ is very close to $\alpha = (\sqrt5+1)/2 < 3$. This is why we cannot write
$$c_0 = 0\cdot H + c_1 \quad\text{with}\quad c_1 = \tfrac34(H + H_1);$$
instead we have to use
$$c_0 = H + \frac{3H_1 - H}{4} = H + c_1,$$
where in $c_1$ we face a negative(!) coefficient:
$$0 < c_1 = -\tfrac14 H + \tfrac34 H_1 < H. \tag{2.111}$$
For $\alpha = (\sqrt5+1)/2 < 3$ we can use the ad hoc fact [see (2.102)]
$$3H_1 - H = H_2 + 2H_3, \tag{2.112}$$
which simply eliminates the "negativity problem" in (2.111).

Next we show that this trick always works; we can always eliminate the "negativity problem." To prove this, assume that for some $i$ we have, just like in (2.110), the reverse of (2.108):
$$c_i = \frac{s_iH_{i-1} + s_{i-1}H_i}{t} > H_{i-1}, \quad 0 \le s_i, s_{i-1} < t. \tag{2.113}$$
Then we rewrite (2.113) in the form $c_i = H_{i-1} + c'_i$ where
$$c'_i = \frac{s_{i-1}H_i - (t - s_i)H_{i-1}}{t}$$
and $0 \le c'_i < H_{i-1}$. In (2.75) we have the recurrence formula $H_{i-1} = a_{i+1}H_i + H_{i+1}$, so with $r_i = t - s_i$,
$$s_{i-1}H_i - r_iH_{i-1} = s_{i-1}H_i - r_i(a_{i+1}H_i + H_{i+1}) = s_i^{(1)}H_i - r_iH_{i+1},$$
where $s_i^{(1)} = s_{i-1} - r_ia_{i+1} \ge 1$.

Case 1: $s_i^{(1)} \ge r_i$.
By using $H_i = a_{i+2}H_{i+1} + H_{i+2}$, we have the following analog of (2.112):
$$s_i^{(1)}H_i - r_iH_{i+1} = s_i^{(1)}(a_{i+2}H_{i+1} + H_{i+2}) - r_iH_{i+1} = (s_i^{(1)}a_{i+2} - r_i)H_{i+1} + s_i^{(1)}H_{i+2}, \tag{2.114}$$
which eliminates the "negativity problem."

Case 2: $s_i^{(1)} < r_i$.

Then again we use (2.114):
$$s_i^{(1)}H_i - r_iH_{i+1} = (s_i^{(1)}a_{i+2} - r_i)H_{i+1} + s_i^{(1)}H_{i+2}. \tag{2.115}$$
If $(s_i^{(1)}a_{i+2} - r_i)$ is positive, then we are done; if it is negative, then clearly $r_{i+2} = |s_i^{(1)}a_{i+2} - r_i| < s_i^{(1)}$, and we can rewrite (2.115) in the form
$$s_i^{(1)}H_i - r_iH_{i+1} = s_i^{(1)}H_{i+2} - r_{i+2}H_{i+1} \quad\text{where } r_i > r_{i+2} \ge 0. \tag{2.116}$$
The decreasing property in (2.116) guarantees that, repeating this argument less than $t$ times, the negative coefficient eventually disappears [i.e., turns into a positive coefficient like in (2.112)]. In other words, in both cases we can eliminate the "negativity problem."

By getting rid of the "negativity problem," we are safe to say that the Pigeonhole Principle argument above always works. As a consequence, we obtain the periodicity of $b_1, b_2, b_3, \ldots$. Combining this periodicity with Lemma 2.7 and Proposition 2.9 [see Eq. (2.82)], we have

Proposition 2.13. If $\alpha$ is a quadratic irrational and $0 < \gamma < 1$ is a rational number, then there is a constant $c = c(\alpha, \gamma)$ such that
$$M_\alpha(\gamma; N) = c\log N + O(1) \tag{2.117}$$
holds for every integer $N \ge 2$.
2.3 Fourier Series and a Problem of Hardy and Littlewood (I)

It is a standard exercise in every Fourier analysis course to compute the Fourier coefficients of the sawtooth function
$$((x)) = -\sum_{j=1}^{\infty} \frac{\sin(2\pi jx)}{\pi j}, \tag{2.118}$$
where $((x)) = \{x\} - 1/2$ if $x$ is not an integer and 0 otherwise. We want to apply (2.118) in both
$$S_\alpha(n) = \sum_{k=1}^{n} ((k\alpha)) \quad\text{and}\quad M_\alpha(N) = \frac{1}{N}\sum_{n=1}^{N} S_\alpha(n) = \frac{1}{N}\sum_{k=1}^{N} (N+1-k)((k\alpha)),$$
but we have to be a little bit careful, since the Fourier series in (2.118) is not absolutely convergent. Instead of (2.118) we actually use a finite version with a small error term.

First we recall Abel's transformation ("discrete integration by parts"):
$$\sum_{j=1}^{m} a_jb_j = a_1(b_1 - b_2) + (a_1 + a_2)(b_2 - b_3) + (a_1 + a_2 + a_3)(b_3 - b_4) + \cdots + (a_1 + \cdots + a_{m-1})(b_{m-1} - b_m) + (a_1 + \cdots + a_m)b_m. \tag{2.119}$$
We also need the well-known summation formula
$$\sum_{j=1}^{m} \sin(j\beta) = \frac{\cos(\beta/2) - \cos((2m+1)\beta/2)}{2\sin(\beta/2)}, \tag{2.120}$$
which implies the useful upper bound
$$\left|\sum_{j=1}^{m} \sin(j\beta)\right| \le \frac{1}{|\sin(\beta/2)|}. \tag{2.121}$$
The pointwise convergence of the Fourier series in (2.118) follows from (2.119) and (2.121), and the equality of the two sides in (2.118) follows from Fejér's well-known theorem in Fourier analysis. By (2.119) and (2.121), for any $T \ge 1$,
$$\left|((x)) + \sum_{j=1}^{T} \frac{\sin(2\pi jx)}{\pi j}\right| < \frac{2}{\pi T|\sin(\pi x)|} \le \frac{1}{T\|x\|}, \tag{2.122}$$
where $\|x\|$ denotes, as usual, the distance of $x$ from the nearest integer. It follows that
$$\left|S_\alpha(n) + \sum_{j=1}^{T}\sum_{k=1}^{n} \frac{\sin(2\pi jk\alpha)}{\pi j}\right| < \frac{1}{T}\sum_{k=1}^{n} \frac{1}{\|k\alpha\|}. \tag{2.123}$$
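The truncation estimate (2.122) is easy to probe numerically. The following floating-point sketch compares the truncated Fourier series with the saw-tooth function and checks the weaker bound $1/(T\|x\|)$; the function names and the sample points are assumptions chosen for illustration.

```python
import math

def saw(x):
    """Saw-tooth ((x)): {x} - 1/2 for non-integer x, and 0 at integers."""
    frac = x - math.floor(x)
    return 0.0 if frac == 0 else frac - 0.5

def truncated_series(x, T):
    """-sum_{j=1}^{T} sin(2*pi*j*x)/(pi*j), the finite version of (2.118)."""
    return -sum(math.sin(2 * math.pi * j * x) / (math.pi * j) for j in range(1, T + 1))

def dist_to_nearest_integer(x):
    """The quantity ||x|| used in (2.122)."""
    return abs(x - round(x))

for x in (0.1, 0.37, math.sqrt(2) - 1):
    for T in (10, 100, 1000):
        error = abs(saw(x) - truncated_series(x, T))
        print(x, T, error, 1 / (T * dist_to_nearest_integer(x)))
```

The observed error is typically much smaller than the bound, which is consistent with (2.122) being a worst-case estimate.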
2.3.1 Badly Approximable Numbers

We need to estimate the diophantine sum
$$\sum_{k=1}^{n} \frac{1}{\|k\alpha\|}$$
from above for the class of quadratic irrational $\alpha$. Our argument below (a standard application of the Pigeonhole Principle) will work even for a larger class of reals, called badly approximable numbers. A real number $\alpha$ is called badly approximable, if there is a positive constant $c_0 = c_0(\alpha) > 0$ such that
$$k\|k\alpha\| \ge c_0 > 0 \text{ holds for all integers } k \ge 1.$$
One can easily characterize this class in terms of the continued fraction: $\alpha$ is badly approximable if and only if the sequence $a_1, a_2, a_3, \ldots$ of partial quotients in $\alpha = [a_0; a_1, a_2, a_3, \ldots]$ is bounded, i.e., there is a threshold $M_0 = M_0(\alpha) < \infty$ such that $a_k \le M_0$ holds for all $k \ge 1$. The well-known fact from diophantine approximation
$$\alpha = \frac{p_i}{q_i} + \frac{(-1)^{i+1}}{q_i(q_{i+1} + \theta q_i)},$$
where $p_i/q_i = [a_0; a_1, \ldots, a_{i-1}]$ is the $i$th convergent of $\alpha$, $q_{i+1} = a_iq_i + q_{i-1}$, and $0 < \theta = \theta(i) < 1$, implies that $c_0$ and $M_0$ are basically reciprocals of each other (apart from an absolute constant factor). Note that every quadratic irrational is badly approximable, since periodicity implies boundedness.

Lemma 2.14. Assume that $\alpha$ is badly approximable, and $k\|k\alpha\| \ge c_0 > 0$ holds for all integers $k \ge 1$. Then for any integer $n$,
$$\sum_{k=1}^{n} \frac{1}{\|k\alpha\|} \le \frac{4n}{c_0}\cdot\frac{\log(n/c_0)}{\log 2}.$$
1 D O.n log m/: kk˛k
Proof. What we do is a routine application of the Pigeonhole Principle. To prove the first part, we define the set 2j 1 2j Aj D 1 k n W c0 kk˛k < c0 : n n
Of course $A_j$ is empty if
$$\frac{2^{j-1}c_0}{n} > \frac12. \tag{2.124}$$
We claim that the set $A_j$ has at most $2^{j+1}$ elements. Indeed, if $|A_j| > 2^{j+1}$ then by the Pigeonhole Principle there exist $1 \le k_1 < k_2 \le n$ such that $k_i \in A_j$, $i = 1, 2$, and
$$|\{k_1\alpha\} - \{k_2\alpha\}| < \frac{c_0}{n}.$$
By choosing $\ell = k_2 - k_1$, we have $\|\ell\alpha\| < c_0/n$, which contradicts the hypothesis $\ell\|\ell\alpha\| \ge c_0 > 0$. Thus we have
$$\sum_{k=1}^{n} \frac{1}{\|k\alpha\|} = \sum_{j\ge1}\sum_{k\in A_j} \frac{1}{\|k\alpha\|} \le \sum_{j\ge1} \frac{n}{2^{j-1}c_0}|A_j| \le \sum_{j\ge1} \frac{n}{2^{j-1}c_0}\,2^{j+1} = \frac{4n}{c_0}\sum_{j\ge1:\ 2^j\le n/c_0} 1 < \frac{4n}{c_0}\log\frac{n}{c_0}\Big/\log 2,$$
where at the end we used (2.124). This proves the first part in Lemma 2.14.

The same Pigeonhole Principle argument proves the second part. □
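The first bound of Lemma 2.14 can be compared with actual values. This is only a floating-point sketch: the function name `check_lemma_bound` is an assumption, $\alpha = \sqrt2$ is a sample choice, and $c_0$ is replaced by the empirical minimum of $k\|k\alpha\|$ over $1 \le k \le n$, which is all that the pigeonhole argument uses for a fixed $n$.

```python
import math

def dist_to_nearest_integer(x):
    """The quantity ||x||."""
    return abs(x - round(x))

def check_lemma_bound(alpha, n):
    """Return (c0, total, bound) for the first inequality of Lemma 2.14."""
    norms = [dist_to_nearest_integer(k * alpha) for k in range(1, n + 1)]
    c0 = min(k * d for k, d in enumerate(norms, start=1))   # empirical c_0 up to n
    total = sum(1.0 / d for d in norms)
    bound = 4 * n / c0 * math.log(n / c0) / math.log(2)
    return c0, total, bound

for n in (10, 100, 1000):
    print(n, check_lemma_bound(math.sqrt(2), n))
```

In these experiments the true sum should sit well below the bound, reflecting the generous constants in the dyadic counting argument.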
By Eqs. (2.120), (2.123) and by Lemma 2.14, we obtain

Lemma 2.15. Assume that $\alpha$ is badly approximable, and $k\|k\alpha\| \ge c_0 > 0$ for all integers $k \ge 1$. Then for any $n$ and $T$,
$$S_\alpha(n) = \sum_{j=1}^{T} \frac{\cos((2n+1)\pi j\alpha) - \cos(\pi j\alpha)}{2\pi j\sin(\pi j\alpha)} + \theta_1\frac{4n\log(n/c_0)}{(\log 2)\,c_0T} \tag{2.125}$$
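and
$$M_\alpha(N) = \frac{1}{N}\sum_{n=1}^{N} S_\alpha(n) = -\sum_{j=1}^{T} \frac{1}{2\pi j\tan(\pi j\alpha)} - \sum_{j=1}^{T} \frac{\sin(2\pi j\alpha) - \sin(2\pi(N+1)j\alpha)}{4\pi Nj\sin^2(\pi j\alpha)} + \theta_2\frac{4N\log(N/c_0)}{(\log 2)\,c_0T}, \tag{2.126}$$
where $|\theta_1| < 1$ and $|\theta_2| < 1$. □

The only novelty in the proof of (2.126) is the use of the summation formula
$$\sum_{n=1}^{N} \cos(n\beta + \gamma) = \frac{\sin\big((N+\tfrac12)\beta + \gamma\big) - \sin\big(\tfrac12\beta + \gamma\big)}{2\sin(\beta/2)}, \tag{2.127}$$
instead of (2.120).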
2.3.2 The Hardy–Littlewood Series

Now we return to the numerical series
$$\sum_{n=1}^{\infty} \frac{1}{n\sin(\pi n\alpha)}, \quad \alpha \text{ is irrational}, \tag{2.128}$$
briefly mentioned at the end of Sect. 2.1. First notice that the series (2.128) cannot be convergent, since the terms do not tend to zero for any $\alpha$. Indeed, the inequality $\|n\alpha\| < 1/n$ holds for infinitely many values of $n$, for example, let $n = q_j$ where $p_j/q_j$ is the $j$th convergent of $\alpha$. The inequality $\|n\alpha\| < 1/n$ combined with the trivial fact $|\sin(\pi n\alpha)| \le \pi\|n\alpha\|$ implies that (2.128) contains infinitely many terms that have absolute value $\ge 1/\pi$. Thus the convergence is out of the question.

Nevertheless, Hardy and Littlewood made the very interesting discovery that for the special value $\alpha = \sqrt2$ the partial sums of (2.128) remain uniformly bounded, that is,
$$\sum_{n=1}^{N} \frac{1}{n\sin(\pi n\alpha)} = O(1). \tag{2.129}$$
Equation (2.129) represents a miraculous cancellation; we can consider it the next best thing to convergence.

Note that Hardy and Littlewood actually proved the slightly more general result that if $\alpha = \sqrt{a^2+1}$, $a$ is odd, then the partial sums always remain bounded. On the other hand, for many other quadratic irrationals the $N$th partial sum is $c\log N + O(1)$ with $c \ne 0$ (Hardy and Littlewood gave the example $\alpha = \sqrt6/2 - 1$). What is going on here?

We will give a very transparent proof of (2.129) by using the following improved version of (2.126).

Proposition 2.16. If $\alpha$ is badly approximable, then for any $N$,
$$M_\alpha(N) = -\sum_{j=1}^{N} \frac{1}{2\pi j\tan(\pi j\alpha)} + O(1), \tag{2.130}$$
where the implicit constant $O(1) = O_\alpha(1)$ is independent of $N$.

We postpone the proof of Proposition 2.16 to the next section. Besides Proposition 2.16, we also need the following simple trigonometric identity:
$$\frac{1}{\tan(\beta)} - \frac{1}{\tan(2\beta)} = \frac{2\cos^2(\beta) - \cos(2\beta)}{2\sin(\beta)\cos(\beta)} = \frac{1}{\sin(2\beta)}. \tag{2.131}$$
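By using (2.131), we obtain
$$\sum_{n=1}^{N} \frac{1}{n\sin(\pi n\alpha)} = \sum_{n=1}^{N} \frac{1}{n\tan(\pi n\alpha/2)} - \sum_{n=1}^{N} \frac{1}{n\tan(\pi n\alpha)},$$
and combining this with Proposition 2.16, we get the equation
$$\sum_{n=1}^{N} \frac{1}{n\sin(\pi n\alpha)} = 2\pi M_\alpha(N) - 2\pi M_{\alpha/2}(N) + O(1). \tag{2.132}$$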
If $\alpha$ is a quadratic irrational, then $\alpha/2$ is also a quadratic irrational; therefore, combining Eq. (2.132) with Proposition 2.1, we obtain

Proposition 2.17. If $\alpha$ is a quadratic irrational, then there is a constant $c^* = c^*(\alpha)$ such that
$$\sum_{n=1}^{N} \frac{1}{n\sin(\pi n\alpha)} = c^*\log N + O(1), \tag{2.133}$$
where the constant factor $c^* = c^*(\alpha)$ can be determined by using (2.132) and Proposition 2.16.

Now we are in a position to understand why the constant factor $c^*(\alpha)$ in (2.133) equals 0 for $\alpha = \sqrt2$, and why in general it equals 0 for any $\alpha = \sqrt{m^2+1}$ where $m \ge 1$ is an odd integer. The advantage of $\alpha = \sqrt{m^2+1}$ is that it has a particularly simple continued fraction:
$$\alpha = [m; 2m, 2m, 2m, \ldots] = [m; \overline{2m}]$$
and

Case 1: if $m$ is odd, then $\alpha/2 = [(m-1)/2; \overline{1, 1, m-1}]$;

Case 2: if $m$ is even, then $\alpha/2 = [m/2; \overline{4m, m}]$.
In Case 1 both $\alpha$ and $\alpha/2$ have periods of odd length, so by Proposition 2.1 and (2.132), the partial sums of the series (2.128) are $O(1)$. On the other hand, in Case 2, $\alpha/2$ has a period of even length, so the partial sums of the series (2.128) have the form $c^*(\alpha)\log N + O(1)$ where $c^*(\alpha)$ is never zero.

Now we clearly understand why in the "$O(1)$-theorem" of Hardy and Littlewood the condition "$m$ is odd" was necessary. Indeed, if $\alpha = \sqrt{m^2+1}$ and $m$ is even, then there is no $O(1)$-theorem: by (2.132) and Case 2 above,
$$\sum_{n=1}^{N} \frac{1}{n\sin(\pi n\alpha)} = O(1) - 2\pi M_{\alpha/2}(N) = -2\pi\cdot\frac{-4m + m}{12}\cdot\frac{\log N}{2\log(m + \sqrt{m^2+1})} + O(1) = \frac{\pi m}{4\log(m + \sqrt{m^2+1})}\log N + O(1),$$
since $x = m$ and $y = 1$ is the least solution of Pell's equation $x^2 - (m^2+1)y^2 = \pm1$.

In view of (2.132) it is natural to ask the following related question: How to compute the continued fraction for $\alpha/2$ from the continued fraction for $\alpha$? Well, if
$$\alpha = [a_0; a_1, a_2, a_3, \ldots] \quad\text{then}\quad \alpha/2 = [a_0/2; 2a_1, a_2/2, 2a_3, a_4/2, \ldots, a_{2i}/2, 2a_{2i+1}, \ldots]$$
if this formula does make sense, i.e., if $a_{2i}$ is even for every $i \ge 0$. Under this "parity condition," by using (2.132) and Proposition 2.16, it is very easy to characterize those quadratic irrationals for which the partial sums of the series (2.128) are $O(1)$. Indeed, if the length $s$ of the period $a_{j+1}, a_{j+2}, \ldots, a_{j+s}$ of $\alpha$ is odd, then the necessary and sufficient condition for an "$O(1)$-theorem" is $\sum_{i=j+1}^{j+s} (-1)^ia_i = 0$. On the other hand, if the length of the period is even, then there is no "$O(1)$-theorem" whatsoever.

For example, if $\alpha = \sqrt{41} = [6; 2, 2, 12, 2, 2, 12, \ldots] = [6; \overline{2, 2, 12}]$ then the "parity condition" holds:
$$\frac{\alpha}{2} = \frac{\sqrt{41}}{2} = [3; 4, 1, 24, 1, 4, 6, 4, 1, 24, 1, 4, 6, \ldots] = [3; \overline{4, 1, 24, 1, 4, 6}],$$
and by (2.132) we have
$$\sum_{n=1}^{N} \frac{1}{n\sin(\pi n\alpha)} = O(1) - 2\pi M_{\alpha/2}(N) = -2\pi\cdot\frac{-4+1-24+1-4+6}{12}\cdot\frac{\log N}{6\cdot3\log(32+5\sqrt{41})} + O(1) = \frac{2\pi}{9}\cdot\frac{\log N}{\log(32+5\sqrt{41})} + O(1),$$
since $x = 32$ and $y = 5$ is the least solution of Pell's equation $x^2 - 41y^2 = \pm1$.

The general case, when the "parity condition" is violated, is technically more complicated and somewhat unpleasant. We guess that this technical difficulty was the reason why Hardy and Littlewood restricted their study to the very special quadratic irrationals $\alpha = \sqrt{m^2+1} = [m; \overline{2m}]$ having the simplest possible ("one digit period") continued fraction.

How to obtain the continued fraction for $\alpha/2$ in general, assuming we know $\alpha = [a_0; a_1, a_2, a_3, \ldots]$? There is an interesting general procedure to answer this question, even when the "parity condition" is violated. We learned it from Richard Bumby (Rutgers University), an expert in continued fractions, who claims that the procedure goes back to Hurwitz. What Hurwitz was really interested in was to find the continued fraction for $e/2$ and $2e$, based on the knowledge of Euler's classical solution for $e$:
$$e = [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, \ldots, 1, 2i, 1, \ldots]. \tag{2.134}$$
2.3.3 Doubling and Halving in Continued Fractions

The procedure consists of three operations. The first two, H = "halving" and D = "doubling," are perfectly natural; the third, S = "special operation," is the tricky one. For example, to get the continued fraction for $e/2$, first we apply the "halving operation" H to the first "digit" 2 in (2.134): this gives 1, and next comes the "doubling operation" D applied to the second "digit" 1 in (2.134), and so on.

There are nine rules.

1. H(2n) = n D (i.e., D comes next)
2. D n = (2n) H
3. H(2n + 1) = n, 1 S
4. D n, 1 = (2n + 1) S
5. S(2n) = 1, n − 1, 1 S
6. S(2n + 1) = 1, n D
7. S 1, n = (2n + 1) H
8. S 1, n, 1 = (2n + 2) S
9. S 2 = 2 S

Note that rules 1 and 2 are obvious, but the rest of the rules require a little bit of work with continued fractions. For example, to prove rule 3, we may proceed as follows ($n \ge 1$, $m \ge 1$ are integers and $x > 1$ is a real):
$$\frac{(2n+1) + \cfrac{1}{m + \frac1x}}{2} = n + \frac12 + \frac{1}{2m + \frac2x} = n + \frac{m + 1 + \frac1x}{2\left(m + \frac1x\right)} = n + \cfrac{1}{\cfrac{2m + \frac2x}{m + 1 + \frac1x}} = n + \cfrac{1}{1 + \cfrac{m - 1 + \frac1x}{m + 1 + \frac1x}}$$
$$= n + \cfrac{1}{1 + \cfrac{1}{\cfrac{m + 1 + \frac1x}{m - 1 + \frac1x}}} = n + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{2}{m - 1 + \frac1x}}} = n + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{\frac{m - 1 + \frac1x}{2}}}}.$$
Assume now that $m = 2k + 1$ where $k \ge 1$ is an integer, then
$$\frac{(2n+1) + \cfrac{1}{m + \frac1x}}{2} = n + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{k + \frac{1}{2x}}}},$$
which proves the combination of rules 3 and 6. Similar argument proves the rest of the cases; we leave the details to the reader.

We illustrate the application of these rules by determining the continued fractions of $e/2$ and $2e$ (first published by Hurwitz). To get $e/2$ we proceed on the "digits" in (2.134); we start with the "halving operation" applied on 2 (the first "digit" of $e$):

H2 ⟹ rule 1 ⟹ 1 (D comes next)
D1 ⟹ rule 2 ⟹ 2 (H comes next)
H2 ⟹ rule 1 ⟹ 1 (D comes next)
D1,1 ⟹ rule 4 ⟹ 3 (S comes next)
S4 ⟹ rule 5 ⟹ 1,1,1 (S comes next)
S1,1 ⟹ rule 7 ⟹ 3 (H comes next)
H6 ⟹ rule 1 ⟹ 3 (D comes next)
D1,1 ⟹ rule 4 ⟹ 3 (S comes next)
S8 ⟹ rule 5 ⟹ 1,3,1 (S comes next)
S1,1 ⟹ rule 7 ⟹ 3 (H comes next)
H(10) ⟹ rule 1 ⟹ 5 (D comes next)
D1,1 ⟹ rule 4 ⟹ 3 (S comes next)
S(12) ⟹ rule 5 ⟹ 1,5,1 (S comes next)
S1,1 ⟹ rule 7 ⟹ 3 (H comes next)

and so on. We applied the following rules:

1, 2, 1, 4, 5, 7, 1, 4, 5, 7, 1, 4, 5, 7, 1, 4, 5, 7, …
This sequence shows periodicity; the period is 1, 4, 5, 7, and we obtain
\[ e/2 = [1; 2, 1, 3, 1, 1, 1, 3, 3, 3, 1, 3, 1, 3, 5, 3, 1, 5, 1, 3, \ldots]. \tag{2.135} \]
It is easy to recognize the linear pattern in (2.135):
\[ e/2 = [1; 2, 1, 3, 1, 1, 1, 3, 3, 3, 1, 3, 1, 3, 5, 3, 1, 5, 1, 3, \ldots, 2i+1, 3, 1, 2i+1, 1, 3, \ldots]. \]
Similarly, to get $2e$ we proceed on the “digits” in (2.134), but of course here we start with the “doubling operation” applied to 2:

D2,1 ⟹ rule 4 ⟹ 5 (S comes next)
S2 ⟹ rule 9 ⟹ 2 (S comes next)
S1,1 ⟹ rule 7 ⟹ 3 (H comes next)
H4 ⟹ rule 1 ⟹ 2 (D comes next)
D1,1 ⟹ rule 4 ⟹ 3 (S comes next)
S6 ⟹ rule 5 ⟹ 1,2,1 (S comes next)
S1,1 ⟹ rule 7 ⟹ 3 (H comes next)
H8 ⟹ rule 1 ⟹ 4 (D comes next)
D1,1 ⟹ rule 4 ⟹ 3 (S comes next)
S(10) ⟹ rule 5 ⟹ 1,4,1 (S comes next)
S1,1 ⟹ rule 7 ⟹ 3 (H comes next)
H(12) ⟹ rule 1 ⟹ 6 (D comes next)
D1,1 ⟹ rule 4 ⟹ 3 (S comes next)
S(14) ⟹ rule 5 ⟹ 1,6,1 (S comes next)

and so on. We applied the following rules: 4, 9, 7, 1, 4, 5, 7, 1, 4, 5, 7, 1, 4, 5, 7, 1, 4, 5, … This sequence shows periodicity with the same period as for $e/2$, and we obtain
\[ 2e = [5; 2, 3, 2, 3, 1, 2, 1, 3, 4, 3, 1, 4, 1, 3, 6, 3, 1, 6, 1, \ldots]. \]
It is easy to recognize the linear pattern here:
\[ 2e = [5; 2, 3, 2, 3, 1, 2, 1, 3, 4, 3, 1, 4, 1, 3, 6, 3, 1, 6, 1, \ldots, 2i, 3, 1, 2i, 1, 3, \ldots]. \tag{2.136} \]
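The expansions (2.135) and (2.136) can be cross-checked numerically. The following is a minimal Python sketch, and it is not the H/D/S procedure itself: it brackets $e$ between two rationals obtained from the series $\sum 1/k!$ and runs the ordinary continued fraction algorithm on both endpoints. The helper names (`cf_digits`, `e_bounds`, `cf_of_multiple_of_e`) are illustrative choices, not notation from the text.

```python
from fractions import Fraction
from math import factorial


def cf_digits(x, count):
    """Return up to `count` partial quotients of the positive rational x."""
    digits = []
    for _ in range(count):
        a = x.numerator // x.denominator  # integral part
        digits.append(a)
        frac = x - a                      # fractional part
        if frac == 0:
            break
        x = 1 / frac
    return digits


def e_bounds(terms=40):
    """Rational lower and upper bounds for e from the series sum 1/k!."""
    lower = sum(Fraction(1, factorial(k)) for k in range(terms))
    # The neglected tail sum_{k >= terms} 1/k! is smaller than 2/terms!.
    upper = lower + Fraction(2, factorial(terms))
    return lower, upper


def cf_of_multiple_of_e(c, count=20):
    """First `count` partial quotients of c*e for a positive rational c."""
    lower, upper = e_bounds()
    lo_digits = cf_digits(c * lower, count)
    hi_digits = cf_digits(c * upper, count)
    # If both endpoints share the prefix, so does every number in between.
    if lo_digits != hi_digits:
        raise ValueError("bracket too wide; increase `terms` in e_bounds")
    return lo_digits


half_e = cf_of_multiple_of_e(Fraction(1, 2))
twice_e = cf_of_multiple_of_e(Fraction(2))
print("e/2:", half_e)
print("2e: ", twice_e)
```

Working with exact rationals on both sides of $e$ avoids any floating-point doubt about the later partial quotients: a digit is reported only when the lower and the upper bound agree on it.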
2.3.4 A Geometric Interpretation

We conclude Sect. 2.3 with the interesting observation that the partial sums of the Hardy–Littlewood series [see (2.128)]
\[ \sum_{n=1}^{N} \frac{1}{n\sin(\pi n\alpha)}, \quad \alpha \text{ is irrational}, \tag{2.137} \]
have a nice geometric meaning: the partial sums represent the “average error” in yet another natural lattice point counting problem. To justify this claim, we go back to Sect. 1.2, where we counted lattice points inside the axes-parallel right triangle bounded by the lines $y = \alpha x$, $y = 0$, $x = n$ (we excluded the lattice points on the boundary). Here we slightly modify the problem: let $0 < \gamma < 1$; we shift the line $y = \alpha x$ to the parallel line $y = \alpha(x - \gamma)$ passing through the point $(\gamma, 0)$. This point is the left corner of our new triangle; the lines $y = 0$, $x = n$ remain unchanged. In other words, we just shift the left corner of the right triangle from the origin $(0,0)$ to $(\gamma, 0)$. Counting the lattice points inside the new triangle vertically, we obtain the following sum [an analog of (1.47)]:
\[
\lfloor \alpha - \gamma\alpha \rfloor + \lfloor 2\alpha - \gamma\alpha \rfloor + \lfloor 3\alpha - \gamma\alpha \rfloor + \cdots + \lfloor (n-1)\alpha - \gamma\alpha \rfloor
= \sum_{k=1}^{n-1} \left( \left(k\alpha - \gamma\alpha - \frac12\right) - \left(\{k\alpha - \gamma\alpha\} - \frac12\right) \right)
= E_{\alpha,\gamma}(n-1) - S_{\alpha,\gamma}(n-1), \tag{2.138}
\]
where
\[ E_{\alpha,\gamma}(m) = \alpha\binom{m+1}{2} - m\left(\gamma\alpha + \frac12\right) \]
and
\[ S_{\alpha,\gamma}(m) = \sum_{k=1}^{m} ((k\alpha - \gamma\alpha)). \]
Just like in Sect. 1.2, we consider $E_{\alpha,\gamma}(n-1)$ the “expectation,” and $S_{\alpha,\gamma}(n-1)$ is the “error term” (i.e., the deviation from the expected value). By using the Fourier series of the sawtooth function [see (2.118)], we have
\[ ((x - \gamma\alpha)) = -\sum_{j=1}^{\infty} \frac{\sin(2\pi j(x - \gamma\alpha))}{\pi j}, \]
and so we have the (formal) equation
\[
S_{\alpha,\gamma}(m) = -\sum_{j=1}^{\infty} \frac{1}{\pi j} \sum_{k=1}^{m} \sin(2\pi j(k\alpha - \gamma\alpha))
= -\sum_{j=1}^{\infty} \frac{1}{\pi j} \cdot \frac{\cos\left(2\pi j\alpha\left(\frac12 - \gamma\right)\right) - \cos\left(2\pi j\alpha\left(m + \frac12 - \gamma\right)\right)}{2\sin(\pi j\alpha)}.
\]
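Both ingredients of this computation are easy to test numerically. The following Python sketch is only an illustration, under the assumptions that the shift parameter is called `gamma` and that $\alpha = \sqrt2$: it compares a brute-force lattice point count with the decomposition "expectation minus error term" of (2.138), and it compares the inner sine sum with the closed form used in the last display. All function names are hypothetical.

```python
import math


def lattice_count(alpha, gamma, n):
    """Brute-force count of lattice points strictly inside the triangle
    bounded by y = alpha*(x - gamma), y = 0 and x = n."""
    count = 0
    for x in range(1, n):                 # integer columns gamma < x < n
        top = alpha * (x - gamma)
        y = 1
        while y < top:                    # integer heights 0 < y < top
            count += 1
            y += 1
    return count


def expectation(alpha, gamma, m):
    """E(m) = alpha * binom(m+1, 2) - m * (gamma*alpha + 1/2)."""
    return alpha * m * (m + 1) / 2 - m * (gamma * alpha + 0.5)


def error_term(alpha, gamma, m):
    """S(m) = sum of the sawtooth values {k*alpha - gamma*alpha} - 1/2."""
    return sum((k * alpha - gamma * alpha) % 1.0 - 0.5 for k in range(1, m + 1))


def sine_sum_direct(alpha, gamma, j, m):
    """sum_{k=1}^{m} sin(2 pi j (k*alpha - gamma*alpha)), term by term."""
    return sum(math.sin(2 * math.pi * j * (k * alpha - gamma * alpha))
               for k in range(1, m + 1))


def sine_sum_closed(alpha, gamma, j, m):
    """Closed form of the same sum obtained by telescoping."""
    first = math.cos(2 * math.pi * j * alpha * (0.5 - gamma))
    last = math.cos(2 * math.pi * j * alpha * (m + 0.5 - gamma))
    return (first - last) / (2 * math.sin(math.pi * j * alpha))


alpha, gamma, n = math.sqrt(2), 0.5, 40
print(lattice_count(alpha, gamma, n),
      expectation(alpha, gamma, n - 1) - error_term(alpha, gamma, n - 1))
print(sine_sum_direct(alpha, gamma, 3, 30), sine_sum_closed(alpha, gamma, 3, 30))
```

The two numbers printed on each line should agree up to floating-point rounding.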
Now we choose $\gamma = 1/2$, that is, the left corner of our right triangle is the point $(1/2, 0)$ (instead of the origin). Then
\[ S_{\alpha,1/2}(m) = \sum_{j=1}^{\infty} \frac{\cos(2\pi mj\alpha) - 1}{2\pi j\sin(\pi j\alpha)}, \tag{2.139} \]
implying that in the average
\[ M_{\alpha,1/2}(N) = \frac1N \sum_{m=1}^{N} S_{\alpha,1/2}(m) \]
we have the new factor $\sin(\pi j\alpha)$ in the denominator instead of the $\tan(\pi j\alpha)$ that we have in $M_\alpha(N)$; see (2.125), (2.126) and (2.139). Now assume that $\alpha$ is badly approximable; then the proof of Proposition 2.16 can be easily adapted for the similar $M_{\alpha,1/2}(N)$, and it gives the following analog of (2.130):
\[ M_{\alpha,1/2}(N) = -\sum_{j=1}^{N} \frac{1}{2\pi j\sin(\pi j\alpha)} + O(1), \tag{2.140} \]
where the implicit constant $O(1) = O_\alpha(1)$ is independent of $N$. Comparing (2.137) to (2.140), we see the geometric interpretation of the initial segment of the Hardy–Littlewood series. It represents the “average error” in a lattice point counting problem: namely, counting lattice points in axes-parallel right triangles of slope $\alpha$ (where $\alpha$ is badly approximable), bounded by the horizontal axis, where the left corner is the fixed half-integer point $(1/2, 0)$; see the picture below.
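2.4 Fourier Series and a Problem of Hardy and Littlewood (II)

The whole section is devoted to the proof of Proposition 2.16. By using Lemma 2.15 with the choice $T \approx N\log N$, we have
\[ M_\alpha(N) = -\sum_{j=1}^{N} \frac{1}{2\pi j\tan(\pi j\alpha)} - S_1 - S_2 + O(1), \tag{2.141} \]
where
\[ S_1 = \sum_{j=1}^{N} \frac{\sin(2\pi j\alpha) - \sin(2\pi(N+1)j\alpha)}{4\pi Nj\sin^2(\pi j\alpha)} \tag{2.142} \]
and
\[ S_2 = \sum_{j=N+1}^{T} \frac{1}{2\pi j} \left( \frac{1}{\tan(\pi j\alpha)} + \frac{\sin(2\pi j\alpha) - \sin(2\pi(N+1)j\alpha)}{2N\sin^2(\pi j\alpha)} \right). \tag{2.143} \]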
Since the irrational rotation is uniformly distributed, we have the “plausible” approximation
\[ \frac{1}{M_2 - M_1} \sum_{M_1 \le k < M_2} f(k\alpha) \approx \int_0^1 f(x)\,dx, \tag{2.144} \]
where $f(x)$ is a “nice” periodic function with period one. We can make the “plausible” approximation (2.144) precise by using the so-called Koksma’s inequality.

Lemma 2.18 (“Koksma’s inequality”). Let $X = \{x_1, \ldots, x_n\}$ be an arbitrary $n$-element point set in the unit interval $[0,1)$; then
\[ \left| \frac1n \sum_{i=1}^{n} f(x_i) - \int_0^1 f(x)\,dx \right| \le \frac{\Delta(X)}{n} \int_0^1 |f'(x)|\,dx, \]
where of course $f'$ is the derivative of $f$ (i.e., we assume that $f$ is smooth), and
\[ \Delta(X) = \sup_{0 < y < 1} \left| \sum_{x_i \le y} 1 - ny \right| \tag{2.145} \]
is the discrepancy of the set $X$.
Notice that the discrepancy defined in (2.145) measures the deviation of $X = \{x_1, \ldots, x_n\}$ from the perfect uniform distribution in the unit interval. The integral
\[ \int_0^1 |f'(x)|\,dx \]
is usually called the variation of $f$.

Proof of Lemma 2.18. Assume that the elements of $X$ are in increasing order: $0 \le x_1 \le x_2 \le \ldots \le x_n \le 1$. Using integration by parts, we have
\[ \int_0^1 f(x)\,dx = f(1) - \int_0^1 x f'(x)\,dx. \tag{2.146} \]
The discrete analog of (2.146) is
\[ \frac1n \sum_{i=1}^{n} f(x_i) = f(1) - \sum_{i=1}^{n} \frac{i}{n} \left( f(x_{i+1}) - f(x_i) \right), \tag{2.147} \]
where $x_{n+1} = 1$; Equation (2.147) is a routine application of Abel’s transformation (2.119). Putting $x_0 = 0$, by (2.146) and (2.147),
\[
\left| \frac1n \sum_{i=1}^{n} f(x_i) - \int_0^1 f(x)\,dx \right|
= \left| \int_0^1 x f'(x)\,dx - \sum_{i=0}^{n} \frac{i}{n} \left( f(x_{i+1}) - f(x_i) \right) \right|
\le \sum_{i=0}^{n} \int_{x_i}^{x_{i+1}} \left| x - \frac{i}{n} \right| |f'(x)|\,dx
\le \frac{\Delta(X)}{n} \int_0^1 |f'(x)|\,dx,
\]
and Lemma 2.18 follows. □
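Koksma’s inequality is easy to observe on a concrete point set. The Python sketch below is an illustration only: it takes the first 200 points of the irrational rotation by $\sqrt2$ (an arbitrary choice), computes the discrepancy as the largest value of $|\#\{x_i \le y\} - ny|$ over the jump points of the counting function, and prints both sides of the inequality for two smooth test functions. The function names are hypothetical.

```python
import math


def discrepancy(points):
    """sup over 0 < y < 1 of |#{x_i <= y} - n*y| for points in [0, 1)."""
    xs = sorted(points)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        # For distinct points, i points lie below x and i + 1 points are <= x;
        # between jumps the counting error is linear, so the sup sits at a jump.
        worst = max(worst, abs(i - n * x), abs(i + 1 - n * x))
    return worst


def koksma_sides(points, f, integral, variation):
    """Return (left side, right side) of Koksma's inequality for f on [0, 1]."""
    n = len(points)
    mean = sum(f(x) for x in points) / n
    return abs(mean - integral), discrepancy(points) / n * variation


rotation = [(k * math.sqrt(2)) % 1.0 for k in range(1, 201)]

# f(x) = x^2: integral 1/3, variation = integral of |2x| = 1.
print(koksma_sides(rotation, lambda x: x * x, 1 / 3, 1.0))
# f(x) = sin(2 pi x): integral 0, variation = integral of |2 pi cos(2 pi x)| = 4.
print(koksma_sides(rotation, lambda x: math.sin(2 * math.pi * x), 0.0, 4.0))
```

In both printed pairs the first number should not exceed the second.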
It is easy to rescale Lemma 2.18 to any interval $[a,b]$: if $a \le x_1 \le x_2 \le \ldots \le x_n \le b$ then
\[ \left| \frac1n \sum_{i=1}^{n} f(x_i) - \frac{1}{b-a} \int_a^b f(x)\,dx \right| \le \frac{\Delta}{n} \int_a^b |f'(x)|\,dx, \tag{2.148} \]
where
\[ \Delta = \sup_{a < y < b} \left| \sum_{x_i \le y} 1 - n\,\frac{y-a}{b-a} \right|, \tag{2.149} \]
an analog of the discrepancy in (2.145). Let’s return to the Discrepancy Lemma in Sect. 1.1 [see (1.22) and (1.23)]: it implies that the discrepancy of the irrational rotation $k\alpha$ (mod 1), $1 \le k \le n$,
is $O(\log n)$, if $\alpha$ is badly approximable. Next we show that, for a given interval $I \subset [0,1]$, we can replace the upper bound $O(\log n)$ for the discrepancy in $I$ with $O(\log(n|I| + 2))$. This is a substantial improvement if $|I|$ is “close” to $1/n$.

Lemma 2.19. If $\alpha$ is badly approximable then
\[ Z_\alpha(n; I) = \sum_{\substack{1 \le k \le n:\\ k\alpha \in I \ (\mathrm{mod}\ 1)}} 1 = n|I| + O(\log(n|I| + 2)). \]

Proof. We repeat the argument in (1.15)–(1.21) with a twist at the end. Assume $q_{\ell-1} \le n < q_\ell$; in view of (1.20) we can write
\[ n = b_{\ell-1}q_{\ell-1} + b_{\ell-2}q_{\ell-2} + \ldots + b_1 q_1, \]
where $1 \le b_{\ell-1} \le a_{\ell-1}$, $0 \le b_j \le a_j$ for $2 \le j < \ell - 1$, $0 \le b_1 \le a_1 - 1$, and
\[ \sum_{i=1}^{j-1} b_i q_i < q_j \quad \text{for } 1 \le j \le \ell. \]
Let $r$ be the largest index $j$ such that $\|q_j\alpha\| > |I|$, and write $n = M + m$ where
\[ M = b_{\ell-1}q_{\ell-1} + b_{\ell-2}q_{\ell-2} + \ldots + b_r q_r \quad\text{and}\quad m = b_{r-1}q_{r-1} + b_{r-2}q_{r-2} + \ldots + b_1 q_1 < q_r. \]
By (1.22) and (1.23),
\[ |Z_\alpha(M; I) - M|I|| \le 3(b_{\ell-1} + b_{\ell-2} + \ldots + b_r). \tag{2.150} \]
Notice that the end sequence $(M + j)\alpha$ (mod 1), $1 \le j \le m$, of the irrational rotation contains at most one member in the interval $I$. Indeed, otherwise there exist $n_1 < n_2$ such that $1 \le n_2 - n_1 < m < q_r$ with $n_i\alpha \in I$ (mod 1), $i = 1, 2$, and so $\|(n_2 - n_1)\alpha\| \le |I| < \|q_r\alpha\|$. But this contradicts the following well-known minimum property of the convergent denominators $q_j$ of $\alpha$: $\|p\alpha\| < \|q_j\alpha\|$ implies that $p > q_j$. Thus we have $Z_\alpha(n; I) = Z_\alpha(M; I) + O(1)$, and so by (2.150), using $n = M + m$,
\[
|Z_\alpha(n; I) - n|I|| = |Z_\alpha(M; I) - M|I| + O(1) - m|I||
\le |Z_\alpha(M; I) - M|I|| + O(1) + m|I|
= \left( \max_{r \le j < \ell} b_j \right) O(\ell - r) + m|I| = O(\ell - r) + O(1). \tag{2.151}
\]
In the last step we used that $\alpha$ is badly approximable, and also
\[ m|I| < q_r|I| < q_r\|q_r\alpha\| = O(1), \]
where in the last step we used (1.9). Again using the fact that $\alpha$ is badly approximable, we have
\[ n \asymp q_\ell \quad\text{and}\quad |I| \asymp \|q_r\alpha\| \asymp \frac{1}{q_r}, \]
implying
\[ \frac{q_\ell}{q_r} \asymp n|I| \quad\text{and}\quad \ell - r = O(\log(n|I| + 2)). \tag{2.152} \]
Combining (2.151) and (2.152), Lemma 2.19 follows. □
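The point of Lemma 2.19 is that the counting error depends on the expected number $n|I|$ of hits rather than on $n$ itself. A small Python experiment illustrates this; it is a sketch under the assumption $\alpha = \sqrt2$ (a badly approximable number), with half-open intervals and a hypothetical helper `count_hits`.

```python
import math


def count_hits(alpha, n, left, length):
    """Z_alpha(n; I): number of 1 <= k <= n with k*alpha (mod 1) in the
    half-open interval I = [left, left + length), taken modulo 1."""
    hits = 0
    for k in range(1, n + 1):
        frac = (k * alpha) % 1.0
        if (frac - left) % 1.0 < length:
            hits += 1
    return hits


alpha, n = math.sqrt(2), 10_000
for length in (5 / n, 50 / n, 0.05, 0.5):
    hits = count_hits(alpha, n, 0.3, length)
    expected = n * length
    print(f"n|I| = {expected:8.1f}   hits = {hits:5d}   error = {hits - expected:+.2f}")
```

For the shortest interval the expected count is 5, and the observed error stays of bounded size, in line with the bound $O(\log(n|I| + 2))$.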
Now we are ready to estimate $S_2$ in (2.143). We define the set
\[ A_k = \left\{ N < j \le T :\ \frac{2^k}{N} \le \|j\alpha\| < \frac{2^{k+1}}{N} \right\}. \tag{2.153} \]
Depending on whether $\{j\alpha\}$ is small or $1 - \{j\alpha\}$ is small, we split $A_k$ into two parts: $A_k = A_k^+ \cup A_k^-$. More precisely, let $\|x\|^+ = \|x\|$ if the interval $(x - 1/2, x]$ contains an integer and 0 otherwise, and similarly let $\|x\|^- = \|x\|$ if the interval $(x, x + 1/2]$ contains an integer and 0 otherwise. Then $\|x\| = \|x\|^+ + \|x\|^-$, and write
\[ A_k^+ = \left\{ N < j \le T :\ \frac{2^k}{N} \le \|j\alpha\|^+ < \frac{2^{k+1}}{N} \right\} \tag{2.154} \]
and
\[ A_k^- = \left\{ N < j \le T :\ \frac{2^k}{N} \le \|j\alpha\|^- < \frac{2^{k+1}}{N} \right\}. \tag{2.155} \]
The proof of Proposition 2.16 proceeds in several steps: Step One, Step Two, and so on.

Step One: We estimate the sum
\[ \sum_{j \in A_k} \frac{1}{j\tan(\pi j\alpha)} \quad \text{for every } k \ge 1, \tag{2.156} \]
where $A_k$ is defined in (2.153). For technical reasons, we decompose $A_k$ into several parts: for $1 \le \ell \le k^2$ let
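\[ A_{k,\ell} = \left\{ j \in A_k :\ N\left(1 + \frac{1}{k^2}\right)^{\ell-1} < j \le N\left(1 + \frac{1}{k^2}\right)^{\ell} \right\}, \tag{2.157} \]
and for $\ell \ge k^2 + 1$ let
\[ A_{k,\ell} = \{ j \in A_k :\ N(k, \ell-1) < j \le N(k, \ell) \}, \tag{2.158} \]
where
\[ N(k, k^2) = N\left(1 + \frac{1}{k^2}\right)^{k^2} \quad\text{and}\quad N(k, \ell) = N(k, \ell-1)\left(1 + \frac{1}{\ell}\right). \tag{2.159} \]
Again we split
\[ A_{k,\ell} = A_{k,\ell}^+ \cup A_{k,\ell}^- \tag{2.160} \]
exactly the same way as we did in (2.153)–(2.155). We estimate the sum
\[ \sum_{j \in A_{k,\ell}} \frac{1}{j\tan(\pi j\alpha)} \]
by using Lemmas 2.18 and 2.19. The reason why we defined the “short” sets $A_{k,\ell}$ is that the factor $j$ hardly changes in such a short set. We apply Eq. (2.148) with
\[ a = \frac{2^k}{N}, \quad b = \frac{2^{k+1}}{N}, \quad f(x) = \frac{1}{\tan(\pi x)}, \tag{2.161} \]
and the finite point set $X$ in the interval $[a,b]$ is the following [see (2.160)]:
\[ X = \{ j\alpha \ (\mathrm{mod}\ 1) :\ j \in A_{k,\ell}^+ \}; \tag{2.162} \]
then we have
\[ \left| \sum_{j \in A_{k,\ell}^+} \frac{1}{\tan(\pi j\alpha)} - \frac{|A_{k,\ell}^+|}{b-a} \int_a^b f(x)\,dx \right| \le \Delta \int_a^b |f'(x)|\,dx. \tag{2.163} \]
Notice the difference between (2.156) and (2.163): in the latter the factor $j$ is missing from the denominator. Write
\[ E(k, \ell) = \left( N(k, \ell) - N(k, \ell-1) \right) \frac{2^k}{N}, \tag{2.164} \]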
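where $N(k, \ell)$ is defined in (2.159) for $\ell \ge k^2$, and
\[ N(k, \ell) = N\left(1 + \frac{1}{k^2}\right)^{\ell} \quad \text{for } 0 \le \ell < k^2. \tag{2.165} \]
We may call $E(k, \ell)$ the “expectation,” because by Lemma 2.19,
\[ \Delta = O(\log(E(k, \ell) + 2)), \tag{2.166} \]
and
\[ \int_a^b |f'(x)|\,dx = |f(b) - f(a)|, \tag{2.167} \]
because $f(x) = (\tan(\pi x))^{-1}$ is monotonic in $a \le x \le b$ as long as $2^k \le N/4$. Combining (2.161)–(2.167), we have
\[ \sum_{j \in A_{k,\ell}^+} \frac{1}{\tan(\pi j\alpha)} = \frac{E(k, \ell)}{b-a} \int_a^b f(x)\,dx + O\left( N2^{-k} \log(E(k, \ell) + 2) \right). \tag{2.168} \]
We repeat the same argument for $A_{k,\ell}^-$: the only difference is that $a_1 = 1 - 2^{k+1}N^{-1}$ and $b_1 = 1 - 2^k N^{-1}$ are the new endpoints instead of $a, b$ in (2.161). Thus we have the analog of (2.168):
\[ \sum_{j \in A_{k,\ell}^-} \frac{1}{\tan(\pi j\alpha)} = \frac{E(k, \ell)}{b_1 - a_1} \int_{a_1}^{b_1} f(x)\,dx + O\left( N2^{-k} \log(E(k, \ell) + 2) \right). \tag{2.169} \]
Since $\tan(\pi x)$ is an odd function,
\[ \int_{a_1}^{b_1} f(x)\,dx + \int_a^b f(x)\,dx = 0, \]
and by (2.168) and (2.169),
\[ \sum_{j \in A_{k,\ell}} \frac{1}{\tan(\pi j\alpha)} = O\left( N2^{-k} \log(E(k, \ell) + 2) \right). \tag{2.170} \]
By (2.157)–(2.159), if $j_1, j_2 \in A_{k,\ell}$ with $j_1 < j_2$ then we have
\[ 1 < \frac{j_2}{j_1} \le 1 + \frac{1}{k^2} \quad \text{if } \ell \le k^2 \tag{2.171} \]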
and
\[ 1 < \frac{j_2}{j_1} \le 1 + \frac{1}{\ell} \quad \text{if } \ell > k^2. \tag{2.172} \]
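By (2.170)–(2.172), we can control the effect of the extra factor of $j$ in the denominator as follows:
\[ \sum_{j \in A_{k,\ell}} \frac{1}{j\tan(\pi j\alpha)} = O\left( (N(k, \ell-1))^{-1} N2^{-k} \log(E(k, \ell) + 2) \right) + \min\{k^{-2}, \ell^{-2}\} \sum_{j \in A_{k,\ell}} \frac{1}{j|\tan(\pi j\alpha)|}. \tag{2.173} \]
By the definition of $A_{k,\ell}$ [see (2.153)–(2.159)]
\[ \sum_{j \in A_{k,\ell}} \frac{1}{j|\tan(\pi j\alpha)|} \le \frac{|A_{k,\ell}|\, N2^{-k}}{N(k, \ell-1)}, \]
so by (2.173) we have
\[ \sum_{j \in A_{k,\ell}} \frac{1}{j\tan(\pi j\alpha)} = O(H_{k,\ell}) \tag{2.174} \]
where
\[ H_{k,\ell} = (N(k, \ell-1))^{-1} N2^{-k} \left( \log(E(k, \ell) + 2) + \min\{k^{-2}, \ell^{-2}\}\, |A_{k,\ell}| \right). \tag{2.175} \]
By (2.164)–(2.166),
\[
\sum_{\ell \ge 1} H_{k,\ell} = \sum_{1 \le \ell \le k^2} H_{k,\ell} + \sum_{k^2 < \ell} H_{k,\ell}
= \sum_{1 \le \ell \le k^2} \left( 2^{-k} O(k) + O(k^{-4}) \right) + \sum_{k^2 < \ell} \left( \frac{2^{-k}}{e^{\ell/k^2}}\, O(\ell k^{-2} + k) + O(\ell^{-2}) \right)
= O(k^{-2}). \tag{2.176}
\]
Combining (2.175) and (2.176), for every $k \ge 1$ we have
\[ \sum_{j \in A_k} \frac{1}{j\tan(\pi j\alpha)} = O(k^{-2}), \tag{2.177} \]
which completes Step One.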
Adding up (2.177) for all $k = 1, 2, 3, \ldots$ we have
\[ \sum_{\substack{N < j \le T:\\ \|j\alpha\| \ge 2/N}} \frac{1}{j\tan(\pi j\alpha)} = O\left( \sum_{k=1}^{\infty} k^{-2} \right) = O(1). \tag{2.178} \]
Of course, if $\|j\alpha\|$ is “around” $1/N$, then the method of Step One still works, for example,
\[ \sum_{\substack{N < j \le T:\\ 2/N > \|j\alpha\| \ge 1/16N}} \frac{1}{j\tan(\pi j\alpha)} = O(1), \tag{2.179} \]
but if $\|j\alpha\|$ is much smaller than $1/N$, then we switch to

Step Two: Let
\[ B_k = \left\{ N < j \le T :\ \frac{1}{2^k N} \ge \|j\alpha\| > \frac{1}{2^{k+1}N} \right\}; \tag{2.180} \]
then we estimate the sum [see (2.143)]
\[ \sum_{j \in B_k} \left( \frac{1}{j\tan(\pi j\alpha)} + \frac{\sin(2\pi j\alpha) - \sin(2\pi(N+1)j\alpha)}{2jN\sin^2(\pi j\alpha)} \right) \]
for every $k \ge 4$. We repeat the argument of Step One with the new function
\[ g(x) = \frac{1}{\tan(\pi x)} + \frac{\sin(2\pi x) - \sin(2\pi(N+1)x)}{2N\sin^2(\pi x)} \tag{2.181} \]
instead of $f(x) = 1/\tan(\pi x)$ [see (2.161)] that we used in Step One. Note that $g(x)$ is also odd (which is crucial for the cancellation part); and $g(x)$ is also monotonic at least in the interval $0 < x < 1/16N$; and
\[ g(x) \approx \frac{2\pi}{3} N^2 x \quad \text{if } 0 < x < \frac{1}{16N}. \tag{2.182} \]
Applying the method of Step One with $B_k$ and $g(x)$ instead of $A_k$ and $f(x)$, and heavily relying on (2.182) (what we need is monotonicity: smaller $x = \|j\alpha\|$ leads to smaller $g(x)$), we obtain the following analog of (2.178):
\[ \sum_{k \ge 4} \sum_{j \in B_k} \frac{g(j\alpha)}{j} = O(1). \tag{2.183} \]
This completes Step Two.
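The kernel $g$ of Step Two can be probed numerically. The Python sketch below assumes the form of (2.181) with the factors of $\pi$ written out, and checks three things: that $g(x)/2$ equals the average over $n \le N$ of the sine sums $\sum_{k \le n} \sin(2\pi kx)$ (which is where the summand of $S_2$ comes from), that $g$ is odd, and that $g(x)$ is close to $\frac{2\pi}{3}N^2x$ for small $x$. The function names are illustrative.

```python
import math


def g(x, N):
    """cot(pi x) + (sin(2 pi x) - sin(2 pi (N+1) x)) / (2 N sin^2(pi x))."""
    s = math.sin(math.pi * x)
    numerator = math.sin(2 * math.pi * x) - math.sin(2 * math.pi * (N + 1) * x)
    return math.cos(math.pi * x) / s + numerator / (2 * N * s * s)


def averaged_sine_sum(x, N):
    """(1/N) * sum_{n=1}^{N} sum_{k=1}^{n} sin(2 pi k x), computed directly."""
    total = 0.0
    partial = 0.0
    for n in range(1, N + 1):
        partial += math.sin(2 * math.pi * n * x)   # inner sum up to k = n
        total += partial
    return total / N


x, N = 3 * math.sqrt(2), 50
print(averaged_sine_sum(x, N), g(x, N) / 2)        # should agree

small_x, big_N = 1e-5, 1000                         # here N*x = 0.01
print(g(small_x, big_N), 2 * math.pi / 3 * big_N ** 2 * small_x)
```

The second printed pair differs by a relative amount of order $1/N$, which is the accuracy one expects from the leading-order approximation (2.182).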
Let’s return to $S_2$ in (2.143). In view of (2.178)–(2.183), the last step is

Step Three: We have to estimate the sum
\[ \sum_{\substack{N < j \le T:\\ \|j\alpha\| \ge 1/16N}} \frac{\sin(2\pi(N+1)j\alpha) - \sin(2\pi j\alpha)}{2jN\sin^2(\pi j\alpha)}. \tag{2.184} \]
Again we repeat the argument of Step One: this time with the function
\[ h(x) = \frac{\sin(2\pi(N+1)x) - \sin(2\pi x)}{2N\sin^2(\pi x)}, \tag{2.185} \]
and as an analog of the set $A_{k,\ell}$ [see (2.157)–(2.159)], we introduce the new set $A_{k,\ell}$ defined as
\[ \left\{ N\left(1 + \frac{1}{\log^2\left(\frac{T}{N} + 2\right)}\right)^{\ell-1} < j \le N\left(1 + \frac{1}{\log^2\left(\frac{T}{N} + 2\right)}\right)^{\ell} :\ \frac{k}{16N} < \|j\alpha\| \le \frac{k+1}{16N} \right\}, \]
where $k = 1, 2, 3, \ldots$ and $\ell = 1, 2, 3, \ldots$. Similarly to Step One, we estimate the sum
\[ \sum_{j \in A_{k,\ell}} \frac{h(j\alpha)}{j} \]
by combining Koksma’s inequality [in fact, we use the form (2.148) and (2.149)] with Lemma 2.19 and taking advantage of the fact that the function $h(x)$ is odd (which gives the crucial cancellation); also we use the fact that the factor $j$ hardly changes in the “short” set $A_{k,\ell}$. A simple calculation gives
\[ \text{sum (2.184)} = O(1); \tag{2.186} \]
a key reason why (2.186) holds is that the square $\sin^2(\pi x)$ in the denominator of (2.185) implies the appearance of the convergent series $\sum_{k \ge 1} k^{-2} = O(1)$ (instead of the divergent harmonic series). Summarizing, by (2.178)–(2.184) and (2.186) we have
\[ S_2 \text{ in (2.143)} = O(1). \tag{2.187} \]
It remains to show that
\[ S_1 \text{ in (2.142)} = O(1). \tag{2.188} \]
To prove (2.188) we don’t need the sophisticated method of Step One; instead we can succeed by simply using the trivial upper bound
\[ |S_1| \le \sum_{k=1}^{N} \frac{1}{Nk\|k\alpha\|^2}. \tag{2.189} \]
By repeating the proof of Lemma 2.14 (Pigeonhole Principle), we obtain
\[ \sum_{k=1}^{N} \frac{1}{Nk\|k\alpha\|^2} = O(1), \tag{2.190} \]
due to the fact that the square $\|k\alpha\|^2$ leads to the convergent series $\sum_{k \ge 1} k^{-2} = O(1)$ (instead of the divergent harmonic series). Combining (2.141)–(2.143) with (2.187)–(2.190), Proposition 2.16 follows. □

The next section is a (very important) detour: it is a short essay about the paradigm of determinism versus randomness, providing a broader perspective for our main results, Theorems 1.1 and 1.2.
2.5 A Detour: The Giant Leap in Number Theory

2.5.1 Looking at the “Big Picture”

As we already said in the Preface, we did not choose the (otherwise catchy and quite fitting) subtitle randomness of $\sqrt2$, in order to avoid misleading the reader. Our objective is not to prove the apparent “randomness” of the digit distribution of $\sqrt2$ (which, unfortunately, remains open). Nevertheless, this notorious and totally untouchable problem is a perfect illustration of what we like to call the “Giant Leap” in number theory. Historically the first attempt to prove something vaguely similar to the apparent randomness of the digit distribution of $\sqrt2$ was a measure-theoretic result. About 100 years ago, in 1909, E. Borel proved that almost every real number is normal in all bases $b = 2, 3, \ldots, 10, \ldots$. Of course, almost every means “all but a set of Lebesgue measure zero,” and a real number is said to be normal in a particular base if every block of digits of any length occurs with the same density depending only on the length and the base. In particular, if the base is $b \ge 2$ and the length is $l \ge 1$ then the density is $b^{-l}$, that is, normality is an equidistribution property. Unfortunately, the measure-theoretic approach says nothing about individual numbers such as $\sqrt2$ or $\pi$. This is why now, 100 years later, we still don’t know any explicit example of a number that is normal in all bases (such a number is often called absolutely normal).
To be fair, we have to admit that there are some very indirectly defined numbers, such as Chaitin’s number (defined as the halting probability of a universal Turing machine) and the so-called Sierpinski’s number (which gives a little bit of extra information beyond Borel’s measure-theoretic existential proof), that are absolutely normal, but most mathematicians are not happy with them: they are not considered “properly explicit.” For example, the so-called Champernowne number, see below, is undoubtedly “properly explicit,” and perfectly satisfies everybody. The core problem is that we don’t have a rigorous definition of “concrete example.” For example, Sierpinski, mainly a set theorist, has a very broad interpretation and considers everything “explicit” if it does not use the Axiom of Choice. Sierpinski’s “explicit example” is the minimum of a bounded countable set of real numbers. For most number theorists this is some sort of cheating; they want something more explicit, something “similar” to the Champernowne number. We concede that at this point the discussion becomes very murky, so we just stop this inserted remark. When we say we don’t know any explicit example of an absolutely normal number, we mean that we don’t have a rigorous mathematical proof. We have, however, a very convincing “experimental proof,” because there is overwhelming numerical evidence that the famous special numbers, such as $\pi = 3.14\ldots$, $e = 2.718\ldots$, $\sqrt2$, $\sqrt3$, $\sqrt[3]{2}$, $\log 2$ (meaning the natural logarithm of 2), and $\log 3/\log 2$ (meaning the base 2 logarithm of 3), are all absolutely normal. We cannot help but insert here two historic remarks. One of the early (pure mathematical) experimentations with the electronic computer (in 1949, by von Neumann and his group working on ENIAC, the first fully electronic computer) was to determine the first two thousand decimal digits of $\pi$ and to carry out a statistical treatment of the digit distribution.
The second remark is a prediction of the great Dutch mathematician L.E.J. Brouwer. Almost 100 years ago, well before the revolution of the electronic computer, Brouwer wanted to show an example of an “unsolvable” problem (or at least unsolvable in his lifetime), and he came up with the following question: In the decimal expression for $\pi$, do we ever come to a place where a thousand consecutive digits are all zero? The answer is still unknown (but of course we all expect a positive answer). As illustration, here are the first 50 digits of $\pi$ in bases 10 and 2:

$\pi$ = 3.141592653589793238462643383279502884197169399375105820…
$\pi$ = 11.00100100001111110110101010001000100001011010001100001…

And here are the first 50 digits of $\sqrt2$ in bases 10 and 2:

$\sqrt2$ = 1.414213562373095048801688724209698078569671875376948…
$\sqrt2$ = 1.0110101000001001111001100110011111110011101111001100…

But much more is true, or seems to be true, here: according to Wolfram’s book A New Kind of Science (especially Chap. 4), every single irrational special number
ever tried so far seems to be normal in all bases. This observation is supported by enormous computational evidence. For example, the frequency of digit 7 among the first $10^n$ decimal digits of $\pi$ is 8 %, 9.5 %, 9.7 %, 10.025 %, 9.980 %, 10.002 % as $n = 2, 3, 4, 5, 6, 7$; the occurrence ratios for digit 7 seem to be converging to $\frac{1}{10}$. The vaguely defined notion special number means a real number expressed in terms of standard mathematical functions. The rational numbers are trivial exceptions: they are eventually periodic in every base, and periodicity (i.e., the repetition of the same block) is the complete opposite of the equidistribution of the blocks. Note that normality is much less than “randomness”: the number

0.123456789101112131415161718192021…99100101102…

is normal in base 10 in spite of exhibiting a very clear and predictable antirandomness pattern. The pattern is that the digits are those of all natural numbers in succession; this is called the Champernowne number. Is the Champernowne number normal in base 2 or base 3? No one knows. Irrational special numbers seem to exhibit digit equidistribution (i.e., normality), and what is more, far beyond normality they all seem to exhibit “full-blown randomness,” including the trademark square root size fluctuation of the random walk (physicists call it the “square root law”). For example, a statistical analysis of the first 10 million decimal digits of $\pi$ tells us something interesting. The frequencies of 0, 1, 2, …, 9 differ from the expected number $10^6$ by

560, 667, 306, 36, 1093, 466, 663, 207, 186, 40.

Since the standard deviation $\sqrt{np(1-p)}$ of the corresponding binomial distribution with $n = 10^7$, $p = 1/10$ is roughly 950, the fluctuations are close to what one would expect by the central limit theorem.
Among the first $2 \cdot 10^{11}$ (200 billion) decimal digits of $\pi$, the frequencies of 0, 1, 2, …, 9 differ from the expected number $2 \cdot 10^{10}$ by

30841, 85289, 136978, 69393, 78309, 82947, 118485, 32406, 291044, 130820;

the data are from Wolfram’s book, see p. 912. Now the standard deviation $\sqrt{np(1-p)}$ of the corresponding binomial distribution with $n = 2 \cdot 10^{11}$, $p = 1/10$ is roughly 135,000, and again the fluctuations are well predicted by the central limit theorem. We have similar data for $\sqrt2$. The decimal expansions of $\pi$ and $\sqrt2$ seem to exhibit normality, or using an alternative probabilistic name: the law of large numbers, and what is much more, they also seem to exhibit the square root law, or perhaps even the delicate central limit theorem. (Note that these results, the law of large numbers, the square root law, and the central limit theorem, are the benchmarks of Probability Theory.)
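The comparison between the observed digit-count deviations and the binomial standard deviation $\sqrt{np(1-p)}$ takes only a few lines to reproduce. The Python sketch below uses the deviations quoted in the text (as magnitudes, the way they are listed) and evaluates the formula for both sample sizes; the variable and function names are illustrative.

```python
import math


def binomial_sigma(n, p):
    """Standard deviation sqrt(n p (1 - p)) of a binomial count."""
    return math.sqrt(n * p * (1 - p))


# Deviations of the digit counts 0..9 from n/10, as quoted above.
deviations_1e7 = [560, 667, 306, 36, 1093, 466, 663, 207, 186, 40]
deviations_2e11 = [30841, 85289, 136978, 69393, 78309,
                   82947, 118485, 32406, 291044, 130820]

for n, deviations in ((10 ** 7, deviations_1e7), (2 * 10 ** 11, deviations_2e11)):
    sigma = binomial_sigma(n, 0.1)
    worst = max(deviations) / sigma
    print(f"n = {n:.0e}: sigma = {sigma:,.0f}, largest deviation = {worst:.2f} sigma")
```

In both samples every deviation stays within about two standard deviations, which is the square root size fluctuation discussed above.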
Summarizing, we can say that for the “interesting” real numbers (or “special” numbers) the decimal expansion, and in general any base $b \ge 2$ expansion, either features a simple behavior (such as the periodicity for the rationals) or features full-blown “randomness” (which seems to be the case for all special irrationals ever tried). We refer to this striking phenomenon as the Giant Leap. What makes the Giant Leap so uniquely interesting is the sharp contrast between the overwhelming numerical evidence and the total lack of rigorous mathematical proof. We don’t even know whether or not each of the ten digits keeps occurring infinitely often in the decimal form of $\pi$ (or $\sqrt2$, or $e$, etc.). How come these questions are mathematically untouchable? We are sure the reader’s first reaction is to turn to Probability Theory for help. But here is the big dilemma: the decimal expansion of $\pi$ (or $\sqrt2$ or $e$) is an individual sequence, and traditional probability theory says nothing about the “randomness” of individual sequences. In fact, the basic idea of Kolmogorov’s axiomatic foundation for probability theory is to scrupulously avoid the notion of “individual random sequence,” and right now we simply do not have any workable, agreed-on definition of “randomness.” Note that in the 1920s, before Kolmogorov’s axioms, von Mises made an attempt to come up with a definition, but his work remained incomplete and controversial (we can actually say that von Mises’s failure was a key motivation for Kolmogorov’s axiomatic approach). Von Mises’s basic idea was to express the apparent lack of successful gambling schemes in a formal definition for random sequences. Many years later Information Theory (Shannon) suggested the new idea to define randomness via inability to compress data. Combining von Mises’s old idea with this new idea, people like Chaitin, Kolmogorov, Solomonoff, and Martin-Löf introduced and developed the notion of algorithmic randomness.
An individual sequence of length $n$ features algorithmic randomness if the program-size complexity (i.e., the length of the shortest program describing the sequence) is close to $n$ (i.e., the length of the sequence). The intuitive meaning is that the sequence is “patternless”; we cannot really compress the information: we have to write down the whole sequence. Notice that algorithmic randomness is an extremely restrictive notion. Any sequence generated by a simple program (i.e., every “long” sequence we know) can by definition never be algorithmically random. For example, we know very long initial segments of the decimal digits of $\sqrt2$ and $\pi$; they are generated by simple programs. For $\sqrt2$ we have the ancient Babylonian Algorithm: let $a_0 = 1$ and define a sequence $a_1, a_2, a_3, \ldots$ inductively by letting
\[ a_{n+1} = \frac{a_n + \frac{2}{a_n}}{2}, \quad n \ge 0. \tag{2.191} \]
The convergence $a_n \to \sqrt2$ is extremely rapid: the number of correct decimal digits doubles with each iteration. Since (2.191) is a very short program, the program-size complexity of the digit sequence of $\sqrt2$ is very low, so the algorithmic randomness of the digit sequence of $\sqrt2$ is also very low. This means the concept of algorithmic randomness is quite irrelevant in our quest for understanding the apparent randomness we clearly see in these digit sequences.
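To make the “very short program” concrete, here is a minimal Python sketch of the recursion (2.191) in exact rational arithmetic, together with a helper that counts how many leading decimal digits of $a_n$ agree with those of $\sqrt2$. The helper names and the 80-digit comparison window are illustrative choices.

```python
from fractions import Fraction
from math import isqrt


def babylonian(steps):
    """Iterate a_{n+1} = (a_n + 2/a_n)/2 from a_0 = 1, in exact arithmetic."""
    a = Fraction(1)
    for _ in range(steps):
        a = (a + 2 / a) / 2
    return a


def matching_digits(a, precision=80):
    """Number of leading decimal digits of a that agree with those of sqrt(2)."""
    scale = 10 ** precision
    approx = str(a.numerator * scale // a.denominator)  # floor(a * 10**precision)
    exact = str(isqrt(2 * scale * scale))               # floor(sqrt(2) * 10**precision)
    count = 0
    for d1, d2 in zip(approx, exact):
        if d1 != d2:
            break
        count += 1
    return count


for n in range(1, 7):
    print(n, matching_digits(babylonian(n)))
```

The printed counts roughly double from one step to the next, as claimed for the convergence $a_n \to \sqrt2$.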
The message of von Mises’s failure is that there is no “absolute randomness”; in each case one has to decide on a cutoff. For example, in this book we say “enough” and stop around the central limit theorem; this is where we draw the line in the infinite hierarchy of notions of randomness. Most mathematicians would agree that “randomness up to the central limit theorem” is already a high, advanced level in the hierarchy. For more readings about “randomness” and “random numbers,” we recommend Chap. 3 in Knuth [Kn2].

In our search for finding further evidence supporting the Giant Leap, we switch now from the decimal expansion to the continued fraction. To represent a real number $x$ as a continued fraction, first we take the integral part of $x$, then we take the reciprocal $1/\{x\}$ of the fractional part of $x$, write it as the sum of the integral part and the fractional part, then take the reciprocal of the fractional part, and keep repeating the process:
\[ x = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \ldots}}}, \tag{2.192} \]
or by using the space-saving notation, $x = [a_0; a_1, a_2, a_3, \ldots]$. Note that continued fractions play a key role in diophantine approximation, in uniform distribution, and in the solution of some diophantine equations. Continued fractions provide another perfect illustration for the Giant Leap phenomenon. Indeed, for every “interesting” real number ever tried the continued fraction either has a simple behavior or it exhibits full-blown randomness.

Examples of Simple Behavior:

1. rational numbers have finite continued fraction;

2. quadratic irrationals, such as $\sqrt2$, $\sqrt3$, $\sqrt5$, $\sqrt6$, $\sqrt7$, all have periodic continued fractions; here are a few examples:
\[ \sqrt2 = [1; 2, 2, 2, 2, 2, \ldots], \]
\[ \sqrt3 = [1; 1, 2, 1, 2, 1, 2, 1, 2, \ldots], \]
\[ \sqrt5 = [2; 4, 4, 4, 4, 4, \ldots], \]
\[ \sqrt6 = [2; 2, 4, 2, 4, 2, 4, \ldots], \]
\[ \sqrt7 = [2; 1, 1, 1, 4, 1, 1, 1, 4, 1, 1, 1, 4, \ldots], \]
\[ \frac{1 + \sqrt5}{2} = [1; 1, 1, 1, 1, 1, \ldots], \]
where the last one, representing the golden ratio, has the simplest form. A more complicated example is
\[ \sqrt{67} = [8; 5, 2, 1, 1, 7, 1, 1, 2, 5, 16, 5, 2, 1, 1, 7, 1, 1, 2, 5, 16, \ldots], \]
where the period of $\sqrt{67}$ is the block 5, 2, 1, 1, 7, 1, 1, 2, 5, 16 of length 10. Note that the length of the period of $\sqrt n$ in general remains a big mystery. The maximum length of the period for $\sqrt n$ can be asymptotically as large as (roughly) $\sqrt n$ itself, or it can be very short like $\sqrt{65} = [8; 16, 16, 16, \ldots]$, where the period has length one.

3. special number $e$ and its “family”: we know from Euler that
\[ e = [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, \ldots, 1, 2n, 1, \ldots], \]
\[ \sqrt e = [1; 1, 1, 1, 5, 1, 1, 9, 1, 1, 13, 1, \ldots, 1, 4n+1, 1, \ldots], \]
\[ e^2 = [7; 2, 1, 1, 3, 18, 5, 1, 1, 6, 30, \ldots, 3n-1, 1, 1, 3n, 12n+6, \ldots], \]
\[ \sqrt[3]{e} = [1; 2, 1, 1, 8, 1, 1, 14, 1, 1, 20, 1, \ldots, 1, 6n+2, 1, \ldots]. \]
Notice that they all have a simple linear pattern. The list is in fact infinite, including all numbers of the form $e^{2/k}$ where $k \ge 1$ is an integer; for more about it, see, e.g., Lang [La]. By the way, the “simplest” member of the family is
\[ \frac{e^2 - 1}{e^2 + 1} = [1, 3, 5, 7, 9, \ldots, 2n+1, \ldots] \]
(when the integral part is zero, we often delete 0 and the semicolon from the beginning).

Examples of Random Behavior: The rest of the special numbers, including $e^3$, all seem to exhibit full-blown randomness with a common limit distribution for the digits. Unlike the familiar decimal expansion, where we have ten possible digits, in the continued fraction the $j$th digit $a_j$ (often called the $j$th partial quotient) can be any integer $\ge 1$, so equidistribution does not make any sense. The particular limit distribution for the continued fraction comes from the invariant measure of the relevant mapping
\[ T : x \to \{1/x\}, \tag{2.193} \]
which maps the open unit interval $(0,1)$ onto itself. Note that $T$ is not one-to-one: the inverse image of an interval $(a, b)$, where $0 < a < b < 1$, is the infinite union of disjoint intervals
\[ \left( \frac{1}{1+b}, \frac{1}{1+a} \right),\ \left( \frac{1}{2+b}, \frac{1}{2+a} \right),\ \left( \frac{1}{3+b}, \frac{1}{3+a} \right),\ \ldots; \tag{2.194} \]
each one of these intervals is mapped to the whole $(a, b)$ by $T$. If we define the measure of an interval $(a, b)$ to be
\[ m(a, b) = \frac{1}{\log 2} \int_a^b \frac{dx}{1+x} = \frac{1}{\log 2} \log\frac{1+b}{1+a}, \tag{2.195} \]
then one can easily check that this $m$-measure of the interval $(a, b)$ equals the sum of the $m$-measures of the intervals in (2.194). We can extend (2.195) to any measurable set $A \subset (0,1)$ by the integral
\[ m(A) = \frac{1}{\log 2} \int_A \frac{dx}{1+x}, \tag{2.196} \]
where log stands for the natural (base $e$) logarithm. Measure (2.195) and (2.196) was already known to Gauss (who, for number-theoretic reasons, carried out an extensive numerical experimentation on continued fractions). The key property of measure (2.195) and (2.196) is that it is preserved by the transformation $T$. By definition the first partial quotient $a_1$ of a real $x \in (0,1)$ equals an integer $k \ge 1$ if and only if $x$ falls into the interval $\left(\frac{1}{k+1}, \frac1k\right)$, which has $m$-measure
\[
\frac{1}{\log 2} \int_{1/(k+1)}^{1/k} \frac{dx}{1+x}
= \frac{1}{\log 2} \left( \log\left(1 + \frac1k\right) - \log\left(1 + \frac{1}{k+1}\right) \right)
= \frac{\log\frac{(k+1)^2}{k(k+2)}}{\log 2}
= \frac{1}{\log 2} \log\left(1 + \frac{1}{k(k+2)}\right). \tag{2.197}
\]
A well-known theorem of Kusmin states that, for almost every $x \in (0,1)$, the density with which an arbitrary integer $k \ge 1$ appears in the sequence $a_1, a_2, a_3, \ldots$ of partial quotients in (2.192) is exactly (2.197). For example, for almost every $x \in (0,1)$, the density of the digit 1 is exactly
\[ \frac{\log(4/3)}{\log 2} = 0.415\ldots \approx 41.5\,\%. \tag{2.198} \]
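The two computational claims about the Gauss measure can be checked directly. The Python sketch below is an illustration with hypothetical function names: it evaluates the measure (2.195), sums the measures of the first 100,000 preimage intervals from (2.194) to compare with the measure of $(a, b)$ itself, and evaluates the limit density (2.197) for the first few partial quotients.

```python
import math


def gauss_measure(a, b):
    """m(a, b) = log((1 + b)/(1 + a)) / log 2, as in (2.195)."""
    return math.log((1 + b) / (1 + a)) / math.log(2)


def preimage_measure(a, b, terms=100_000):
    """Total m-measure of the first `terms` intervals (1/(i+b), 1/(i+a))."""
    return sum(gauss_measure(1 / (i + b), 1 / (i + a))
               for i in range(1, terms + 1))


def kusmin_density(k):
    """Limit density (2.197) of the partial quotient k."""
    return math.log(1 + 1 / (k * (k + 2))) / math.log(2)


print(gauss_measure(0.2, 0.7), preimage_measure(0.2, 0.7))   # nearly equal
for k in range(1, 6):
    print(k, round(kusmin_density(k), 4))
print(sum(kusmin_density(k) for k in range(1, 1001)))        # close to 1
```

The truncated sum over preimage intervals falls short of $m(a, b)$ only by a term of order $1/\text{terms}$, and the densities add up to 1 in the limit because the product $\prod_k \frac{(k+1)^2}{k(k+2)}$ telescopes to 2.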
It was realized later that both Borel’s theorem and Kusmin’s theorem are special cases of the very general Ergodic Theorem of Birkhoff. Note, however, that Birkhoff’s general theorem doesn’t give any error term; on the other hand, in Borel’s theorem and also in Kusmin’s theorem we can prove a basically square root size error term (the sharpest form of Borel’s theorem is the well-known Law of the Iterated Logarithm).
Kusmin’s theorem clearly fails for $x=e$ [where the frequency of the digit 1 is 2/3, which differs from the 41.5 % in (2.198)] and fails for the quadratic irrationals (whose continued fractions are periodic). By contrast, higher roots (cube roots, fourth roots, etc.) never appear to show any simple pattern like what $e$ or $\sqrt{e}$ or $e^2$ does. Instead of “regularity,” they all seem to show “randomness” with Kusmin’s rescaling [see (2.197)]. For example, among the first million partial quotients in the continued fraction for the cube root of 2 the digit 1 appears 414,983 times, which is remarkably close to the 41.5 % in (2.198), i.e., Kusmin’s limit (2.197) with $k=1$. The same remarkable fact holds for the special number $\pi$: among the first million partial quotients the digit 1 appears 414,526 times, again very close to 41.5 %. These are striking numerical facts, but, unfortunately, we cannot prove any theorem, not even the most plausible conjecture. For example, we don’t know for sure whether the sequence $a_1,a_2,a_3,\ldots$ of partial quotients for the cube root of 2 is bounded or not. What is worse, we don’t know a single algebraic number of degree $\ge3$ for which the sequence $a_1,a_2,a_3,\ldots$ of partial quotients is unbounded. We don’t know this in spite of the well-known conjecture (raised by Khinchin in the 1930s) claiming that $a_1,a_2,a_3,\ldots$ is unbounded for every single real algebraic number of degree $\ge3$. Summarizing, we can safely say that computer experimentation strongly supports the Giant Leap phenomenon for both the decimal (or any other base) expansion and the continued fraction expansion of special numbers: they either exhibit very simple behavior or they exhibit full-blown randomness. The only technical difference is in the scaling: in continued fractions the ordinary uniform Lebesgue measure in the unit interval (0,1) has to be replaced by the nonuniform Gauss measure (2.195) and (2.196).
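A small-scale version of the experiment with the cube root of 2 can be sketched as follows. This is only an illustration under stated assumptions: the partial quotients are obtained by bracketing $\sqrt[3]{2}$ between two rationals with `decimal_digits` correct decimals and keeping the common prefix of their continued fractions; the function names are hypothetical, and a few hundred partial quotients are of course far from the million quoted above.

```python
from fractions import Fraction


def integer_cube_root(n: int) -> int:
    """Largest integer r with r**3 <= n, found by bisection on integers."""
    lo, hi = 0, 1
    while hi ** 3 <= n:
        hi *= 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid
    return lo


def cf_of_rational(x: Fraction) -> list:
    """Continued fraction digits of a positive rational (Euclidean algorithm)."""
    digits = []
    num, den = x.numerator, x.denominator
    while den:
        q, r = divmod(num, den)
        digits.append(q)
        num, den = den, r
    return digits


def cube_root_two_partial_quotients(decimal_digits: int = 400) -> list:
    """Leading partial quotients [a0; a1, a2, ...] of the cube root of 2."""
    scale = 10 ** decimal_digits
    lower = integer_cube_root(2 * scale ** 3)   # floor(2^(1/3) * scale)
    cf_low = cf_of_rational(Fraction(lower, scale))
    cf_high = cf_of_rational(Fraction(lower + 1, scale))
    common = []
    for a, b in zip(cf_low, cf_high):           # keep only the agreed prefix
        if a != b:
            break
        common.append(a)
    return common


if __name__ == "__main__":
    quotients = cube_root_two_partial_quotients()[1:]   # drop a0
    ones = sum(1 for a in quotients if a == 1)
    print(len(quotients), ones / len(quotients))
```

With so few terms the observed frequency of the digit 1 fluctuates noticeably around 41.5 %; the sketch only shows how such counts can be produced with exact integer arithmetic.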
In spite of the overwhelming numerical evidence, we don’t have the slightest idea how to prove the Giant Leap phenomenon. A good illustration of what contemporary mathematics can do versus the conjectured truth is the concrete special number $x=\sqrt[3]{2}$ and a brief discussion of the celebrated works of two Fields medal winners, K.F. Roth and A. Baker. We begin by recalling a classical result of Dirichlet: for every irrational $\alpha$ there are infinitely many rationals $p/q$ such that
$$\left|\alpha-\frac{p}{q}\right|<\frac{1}{q^2}. \tag{2.199}$$
In the 1950s K.F. Roth completed a long line of research initiated by Thue and Siegel and proved the following basic theorem in diophantine approximation (he was awarded a Fields medal in 1958): for any real algebraic number $\alpha$ of degree $\ge3$, including the case $\alpha=\sqrt[3]{2}$, and for any $\varepsilon>0$,
$$\left|\alpha-\frac{p}{q}\right|>\frac{c(\alpha,\varepsilon)}{q^{2+\varepsilon}}, \tag{2.200}$$
where $c=c(\alpha,\varepsilon)>0$ is a constant (note that the case of quadratic irrationals is trivial). In view of (2.199) Roth’s inequality (2.200) is nearly best possible (since $\varepsilon>0$ can be arbitrarily small), but a more delicate analysis reveals that there is plenty of room for improvement in (2.200). Indeed, (2.200) is equivalent to
$$q\,\|q\alpha\|>\frac{c(\alpha,\varepsilon)}{q^{\varepsilon}} \tag{2.201}$$
for every integer $q\ge1$, where $\|x\|$ denotes the distance of a real $x$ from the nearest integer. On the other hand, for every real algebraic number $\alpha$ of degree $\ge3$, including $\alpha=\sqrt[3]{2}$, computer experimentation seems to support the much stronger inequality
$$q\,\|q\alpha\|>\frac{c(\alpha,\varepsilon)}{\log q\,(\log\log q)^{1+\varepsilon}} \tag{2.202}$$
for every integer $q\ge3$, and also that (2.202) is best possible in the sense that we cannot delete $\varepsilon>0$. Notice that there is an exponential gap between (2.201) and (2.202). By the way, (2.202) is certainly true for almost every real $\alpha$; the proof is easy. A serious handicap of Roth’s theorem (or the Thue–Siegel–Roth theorem) is that the constant $c=c(\alpha,\varepsilon)>0$ is ineffective: we cannot replace it with an explicit constant. The reason is that the proof technique (“Thue method”) is indirect: it involves a hypothetical assumption that there is a large “bad” $q$, which behaves wickedly, and the constant $c=c(\alpha,\varepsilon)>0$ depends on the size of this “bad” $q$ ($q$ is finite, but in principle it can be arbitrarily large). Nevertheless, effective results were obtained by A. Baker in the 1960s (for which he was awarded the Fields medal in 1970). For example, in 1964 Baker proved the explicit result
$$q\,\|q\sqrt[3]{2}\|>\frac{10^{-6}}{q^{0.955}} \tag{2.203}$$
that holds for every integer $q\ge1$. The point here is the effective constant $10^{-6}$ in the numerator and the exponent $0.955<1$ in the denominator (notice that (2.203) with 1 instead of 0.955 is trivial, since $\sqrt[3]{2}$ is a cubic number). We have to admit, therefore, that there is a humiliating exponential gap between the apparent truth [i.e., conjecture (2.202)] and what contemporary mathematics can do: the ineffective (2.201) and the effective (2.203), due to two Fields medalists. (Nevertheless, even a “weak” result like (2.203) has remarkable consequences in the theory of diophantine equations.) Conjecture (2.202) for real algebraic numbers (of degree $\ge3$), a special case of the vague Giant Leap phenomenon, features “randomness.” Where does this pseudorandomness come from? This is a fundamental open problem, and we are nowhere near understanding it (not to mention answering it). For more about this exciting general issue, see Wolfram [Wo] and Beck [Be6].
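To get a feeling for the quantity $q\,\|q\sqrt[3]{2}\|$ in (2.201) to (2.203), one can tabulate it for small $q$. The sketch below is illustrative only: the helper names are assumptions, and the distance to the nearest integer is computed from the exact integer $|2q^3-p^3|$ via the identity $a^3-b^3=(a-b)(a^2+ab+b^2)$, which avoids floating-point cancellation.

```python
def integer_cube_root(n: int) -> int:
    """Largest integer r with r**3 <= n (n small enough for a float seed)."""
    r = int(round(n ** (1.0 / 3.0)))
    while r ** 3 > n:
        r -= 1
    while (r + 1) ** 3 <= n:
        r += 1
    return r


def scaled_distance(q: int) -> float:
    """Return q * ||q * 2^(1/3)||, the left-hand side of (2.201)-(2.203)."""
    n = 2 * q ** 3                 # (q * 2^(1/3))^3, an exact integer
    c = n ** (1.0 / 3.0)           # float approximation of q * 2^(1/3)
    r = integer_cube_root(n)       # floor(q * 2^(1/3))
    # |c - p| = |n - p^3| / (c^2 + c*p + p^2); the numerator is exact.
    dist = min(abs(n - p ** 3) / (c * c + c * p + p * p) for p in (r, r + 1))
    return q * dist


def baker_bound(q: int) -> float:
    """Right-hand side of Baker's inequality (2.203)."""
    return 1e-6 / q ** 0.955


if __name__ == "__main__":
    # List the successive record minima of q * ||q * 2^(1/3)||.
    record = float("inf")
    for q in range(1, 5001):
        value = scaled_distance(q)
        if value < record:
            record = value
            print(q, value, baker_bound(q))
```

In this range the record minima stay many orders of magnitude above Baker's bound, which illustrates the gap between (2.203) and the conjectured (2.202).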
With some exaggeration we may even include the celebrated Riemann Hypothesis as another example of the Giant Leap. In the history of mathematics the set of primes served as the first example of what one would call a “random set.” The Riemann Hypothesis (arguably the most famous open problem in mathematics) is equivalent to a problem about the “randomness” of the primes in the following way. The starting point is Riemann’s remarkable Explicit Formula for the prime-counting function $\pi(x)=\sum_{p\le x}1$, which involves the nontrivial zeros of the Riemann zeta function. Instead of the original formula, nowadays it is customary to discuss a simplified version, due to von Mangoldt, where the plain prime-counting function $\pi(x)$ is replaced with a weighted version (“Mangoldt sum”)
$$\psi_0(x)=\sum_{1\le n\le x}\Lambda(n), \tag{2.204}$$
where $\Lambda(n)=\log p$ if $n$ is a power of $p$ ($p$ always stands for a prime) and $\Lambda(n)=0$ if $n$ is not a prime power. Riemann’s Explicit Formula in prime number theory goes as follows:
$$\psi_0(x)=x-\sum_{\rho}\frac{x^{\rho}}{\rho}+O(1), \tag{2.205}$$
where $\rho$ runs through the nontrivial zeta-zeros (meaning the zeros in the vertical strip with real part between 0 and 1). Riemann described the number of the nontrivial zeta-zeros in the vertical box where the imaginary part lies between 0 and $T$ ($T$ is “large”): the number is
$$\frac{1}{2\pi}T\log T-\frac{1+\log(2\pi)}{2\pi}T+O(\log T). \tag{2.206}$$
In sharp contrast to the number, we can prove very little about the location of the nontrivial zeta-zeros. What we can prove is much, much less than the Riemann Hypothesis, which claims that the nontrivial zeta-zeros are all on the critical line (the vertical line with real part 1/2). In view of (2.205), the Riemann Hypothesis is equivalent to
$$\psi_0(x)=x+O\left(x^{1/2+o(1)}\right), \tag{2.207}$$
or equivalently (via integration by parts)
$$\pi(x)=\int_2^x\frac{dt}{\log t}+O\left(x^{1/2+o(1)}\right). \tag{2.208}$$
The square root size error term $O(x^{1/2+o(1)})$ nicely fits the so-called random set simulation of the primes. By the Prime Number Theorem, the density of the primes at $x$ is $\frac{1}{\log x}$. This motivates the following simulation (due to Cramér): starting from $n=3$, for every integer $n\ge3$ we toss a “loaded $n$-coin” that shows Heads with probability $\frac{1}{\log n}$ and shows Tails with probability $1-\frac{1}{\log n}$. Keeping $n$ if the outcome of the trial is Heads and rejecting it if the outcome is Tails, we obtain a Random Subset of the natural numbers; we call the elements of this random set “random primes.” The expected number of “random primes” up to $x$ is exactly
$$\sum_{n=3}^{x}\frac{1}{\log n}=\int_2^x\frac{dt}{\log t}+O(1), \tag{2.209}$$
and the actual number of “random primes” $\le x$ fluctuates around the expected number (2.209) with the usual square root size standard deviation $O(x^{1/2+o(1)})$. In other words, formula (2.208), which is equivalent to the Riemann Hypothesis, is in perfect harmony with the $O(x^{1/2+o(1)})$ size fluctuation of the Random Subset (i.e., the Monte Carlo simulation of the primes). The converse is also true: if the Riemann Hypothesis fails then the fluctuation in (2.205) is much larger than the standard deviation $O(x^{1/2+o(1)})$. Indeed, if there is a nontrivial zeta-zero $\rho=\beta+i\gamma$ with $\beta\ne1/2$, then $(1-\beta)+i\gamma$ is another zeta-zero (this follows from a symmetry of the Functional Equation of the zeta function), and $\max\{\beta,1-\beta\}=\alpha>1/2$. Then in (2.205) the fluctuation around $x$ is at least as large as $x^{\alpha-o(1)}$, and also the fluctuation of $\pi(x)$ around the logarithmic integral is at least as large as $x^{\alpha-o(1)}$, which is asymptotically much larger than the standard deviation $O(x^{1/2+o(1)})$ of the Random Subset (it is not too difficult to make this argument precise). In other words, the failure of the Riemann Hypothesis implies that the “random prime” model is grossly incorrect. Even if no one has a rigorous mathematical proof, everyone would agree that the Riemann Hypothesis is “true,” just like everyone would agree that $\pi$, $e$, $\sqrt2$ are all normal. Indeed, we have an overwhelming “computer science proof”: it cannot be an accident that the first billion zeta-zeros are all on the critical line. Since the Riemann Hypothesis is “true,” the Random Prime model predicts the fluctuations in the global distribution of primes very accurately. The common feature of the digit sequences of special numbers and the set of primes is the “apparent randomness” and the (almost) total lack of rigorous proofs. Our main goal is to prove results, such as Theorems 1.1 and 1.2, which support the Giant Leap phenomenon. These results are admittedly modest first steps only.
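The random set simulation described above can be tried out directly. The following is a minimal sketch of one run of the loaded-coin experiment, under the assumption that Python's standard pseudorandom generator is an acceptable source of coin tosses; the function names and the seed are illustrative choices.

```python
import math
import random


def cramer_sample_count(x: int, rng: random.Random) -> int:
    """Number of 'random primes' <= x in one run of the loaded-coin model."""
    return sum(1 for n in range(3, x + 1) if rng.random() < 1.0 / math.log(n))


def expected_count(x: int) -> float:
    """Expected number of 'random primes' <= x, the left-hand side of (2.209)."""
    return sum(1.0 / math.log(n) for n in range(3, x + 1))


def standard_deviation(x: int) -> float:
    """Standard deviation of the count: independent coins with p = 1/log n."""
    variance = sum((1.0 / math.log(n)) * (1.0 - 1.0 / math.log(n))
                   for n in range(3, x + 1))
    return math.sqrt(variance)


if __name__ == "__main__":
    rng = random.Random(2014)      # fixed seed, for reproducibility only
    x = 100_000
    count = cramer_sample_count(x, rng)
    print(count, expected_count(x), standard_deviation(x))
```

The standard deviation is of order $\sqrt{x/\log x}$, which is the square root size fluctuation referred to in the text.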
Our second goal is to challenge the reader to participate in the long-term research project of exploring this exciting mystery. What we do here has some vague formal similarities to the Erdős–Kac theorem (about the number of prime divisors of typical integers) and other probabilistic results about multiplicative and additive number theoretic functions (see, e.g., Elliott’s book [El] or Kac [Ka]). However, in spite of the formal similarity, the two subjects are rather different.
2.6 Connection with Quadratic Fields (I)

After the philosophical detour of Sect. 2.5, we now return to the proofs of our central limit theorems (Theorems 1.1 and 1.2); in particular, to the computation of the expectation and the variance. In Sect. 2.4 we proved Proposition 2.16, which evaluates the mean value as follows:
$$M_\alpha(N)=-\frac{1}{2\pi}\sum_{n=1}^{N}\frac{1}{n\tan(\pi n\alpha)}+O(1), \tag{2.210}$$
assuming $\alpha$ is a badly approximable number. The following result is an alternative formula for $M_\alpha(N)$ in the special case when $\alpha=\sqrt d$, where $d\equiv3$ (mod 4) is a square-free positive integer. The necessary distinction between the cases $d\equiv1$ or 3 (mod 4) is one of the characteristic peculiarities of algebraic number theory, a subject that we are going to use heavily below.

Proposition 2.20. Assume that $d$ is a square-free positive integer with $d\equiv3$ (mod 4). Then
$$M_{\sqrt d}(N)=\frac{\sqrt d}{\pi^2}\left(\sum_{\substack{(x,y)\ne(0,0):\\ \text{primary representations}}}\frac{1}{x^2-dy^2}\right)\frac{\log N}{\log\eta_d}+O\left((\log\log N)^3\right), \tag{2.211}$$
where $\eta_d=u_0+v_0\sqrt d$ comes from the least solution $x=u_0$, $y=v_0$ of Pell’s equation $x^2-dy^2=1$ (“least” means that $u_0>0$, $v_0>0$ and $v_0$ is least). The meaning of “primary representations” in (2.211) will be explained in the proof below.

Proof. First we give a precise definition of the infinite series
$$\sum_{\substack{(x,y)\ne(0,0):\\ \text{primary representations}}}\frac{1}{x^2-dy^2} \tag{2.212}$$
in the middle of (2.211), and prove the convergence. Note that $x^2-dy^2$ is the principal (binary quadratic) form of discriminant $4d$, and the theory of quadratic forms of discriminant $4d$ is equivalent to the theory of the real quadratic field $\mathbb{Q}(\sqrt d)$. We assume that the reader is somewhat familiar with the simplest concepts and facts about quadratic forms and quadratic fields (see, for example, the book [Za4]). We recall the well-known fact that, given any integer $A\ne0$, if the equation $x^2-dy^2=A$ has one integral solution $(x,y)$, then the equation has infinitely many integral solutions. Indeed, if $x_1^2-dy_1^2=A$ and $u^2-dv^2=1$, then the product formula
$$(x_1+y_1\sqrt d)(u+v\sqrt d)=(x_1u+y_1vd)+(x_1v+y_1u)\sqrt d=x_2+y_2\sqrt d \tag{2.213}$$
leads to a new solution $x_2=x_1u+y_1vd$, $y_2=x_1v+y_1u$ of the equation $x^2-dy^2=A$. Since Pell’s equation $u^2-dv^2=1$ has infinitely many solutions, generated by the least solution, the product formula (2.213) gives rise to infinitely many solutions of $x^2-dy^2=A$. Two solutions $(x_1,y_1)$ and $(x_2,y_2)$ related by the product formula (2.213) are called associates; this defines an equivalence relation on the set of all solutions of $x^2-dy^2=A$. Let $R_d(A)$ denote the number of equivalence classes. Note that $R_d(A)$ is always finite and satisfies the inequality
$$R_d(A)\le\tau(|A|), \tag{2.214}$$
where $\tau(n)$ is the divisor function, i.e., $\tau(n)$ is the number of (positive) divisors of $n$, including 1 and $n$ itself. Inequality (2.214) is a classical result (it is in fact a corollary of an exact formula for $R_d(A)$, due to Dirichlet). Now we are ready to define the precise meaning of the series (2.212):
$$\sum_{\substack{(x,y)\ne(0,0):\\ \text{primary representations}}}\frac{1}{x^2-dy^2}=\sum_{A\ne0}\frac{R_d(A)}{A}=\sum_{n=1}^{\infty}\frac{R_d(n)-R_d(-n)}{n}. \tag{2.215}$$
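The product formula (2.213) is easy to experiment with. The sketch below finds the least solution of Pell's equation by a direct search over $v$ (an assumption made for simplicity; the search is practical only when the least solution is small, and the continued fraction of $\sqrt d$ is the usual tool otherwise) and then produces associates of a given solution. The function names are illustrative.

```python
import math


def least_pell_solution(d: int) -> tuple:
    """Least (u, v) with u, v > 0 and u^2 - d*v^2 = 1, by direct search over v."""
    v = 1
    while True:
        u = math.isqrt(d * v * v + 1)
        if u * u == d * v * v + 1:
            return u, v
        v += 1


def associate(x1: int, y1: int, u: int, v: int, d: int) -> tuple:
    """New solution (x2, y2) given by the product formula (2.213)."""
    return x1 * u + y1 * v * d, x1 * v + y1 * u


if __name__ == "__main__":
    d = 3
    u, v = least_pell_solution(d)          # (2, 1) for d = 3
    x, y = 3, 1                            # 3^2 - 3*1^2 = 6
    for _ in range(4):
        print((x, y), x * x - d * y * y)   # the value A = 6 never changes
        x, y = associate(x, y, u, v, d)
```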
To prove the convergence in (2.215), we describe a definite way of selecting a representative solution from each equivalence class; we call these representatives the primary solutions of $x^2-dy^2=A$. First we take the conjugate of the product formula (2.213):
$$(x_1-y_1\sqrt d)(u-v\sqrt d)=x_2-y_2\sqrt d, \tag{2.216}$$
and then take the ratio of (2.213) and (2.216):
$$\frac{x_1+y_1\sqrt d}{x_1-y_1\sqrt d}\cdot\frac{u+v\sqrt d}{u-v\sqrt d}=\frac{x_2+y_2\sqrt d}{x_2-y_2\sqrt d}. \tag{2.217}$$
We have $u+v\sqrt d=\pm\eta^m$ for some integer $m$ (where $\eta=\eta_d$ is the fundamental unit), and so $u-v\sqrt d=\pm\eta^{-m}$. Returning to (2.217), we have
$$\frac{x_2+y_2\sqrt d}{x_2-y_2\sqrt d}=\frac{x_1+y_1\sqrt d}{x_1-y_1\sqrt d}\,\eta^{2m}. \tag{2.218}$$
In view of (2.218) there is just one choice of $m$ (for a given $x_1$ and $y_1$) which will ensure that
$$1<\frac{x_2+y_2\sqrt d}{x_2-y_2\sqrt d}\le\eta_d^2. \tag{2.219}$$
Equation (2.219) does not change if we replace $(x_2,y_2)$ with $(-x_2,-y_2)$, so we can further ensure that
$$x_2-y_2\sqrt d>0. \tag{2.220}$$
The particular solution $x=x_2$, $y=y_2$ of $x^2-dy^2=A$ that satisfies (2.219) and (2.220) will be called primary. To prove the convergence in (2.215), we estimate the sums
$$\sum_{n=1}^{N}R_d(n)\quad\text{and}\quad\sum_{n=1}^{N}R_d(-n)$$
by employing a simple lattice point counting argument. (It is worthwhile to point out that the same lattice point counting argument is used in the proof of Dirichlet’s class number formula for real quadratic fields, $h(d)\log\eta_d=\sqrt d\,L(1,\chi_d)$.) We will show that
$$\sum_{n=1}^{N}R_d(n)=c_0(d)N+O(\sqrt N) \tag{2.221$'$}$$
and
$$\sum_{n=1}^{N}R_d(-n)=c_0(d)N+O(\sqrt N) \tag{2.221$''$}$$
with the same constant factor $c_0(d)$ (which is of course independent of $N$).
To prove (2.221$'$), we use (2.219) and (2.220), which tell us that the sum $\sum_{n=1}^{N}R_d(n)$ equals the number of lattice points $(x,y)\in\mathbb{Z}\times\mathbb{Z}$ satisfying the three requirements:
$$0<x^2-dy^2\le N,\qquad x-y\sqrt d>0,\qquad 1<\frac{x+y\sqrt d}{x-y\sqrt d}\le\eta_d^2. \tag{2.222}$$
The region defined by Eq. (2.222) is a sector of a hyperbola bounded by two half lines through the origin; we call it a “hyperbolic triangle,” and denote it by $H(N)=H_d(N)$; see the picture. The left corner of the “hyperbolic triangle” $H(N)=H_d(N)$ is the origin $(0,0)$, the lower right corner is the point $(\sqrt N,0)$, and the upper right corner is the intersection of the hyperbola $x^2-dy^2=N$ and the positive side of the line
$$\frac{x+y\sqrt d}{x-y\sqrt d}=\eta_d^2.$$
It is not too difficult to determine the area of $H(N)$: we have
$$\mathrm{Area}(H_d(N))=\frac{N}{2\sqrt d}\log\eta_d. \tag{2.223}$$
We outline the proof of (2.223). First we change the coordinates from $x,y$ to $u,v$, where $u=x-y\sqrt d$ and $v=x+y\sqrt d$, and compute the determinant
$$\frac{\partial(u,v)}{\partial(x,y)}=\begin{vmatrix}1&-\sqrt d\\1&\sqrt d\end{vmatrix}=2\sqrt d. \tag{2.224}$$
In the $u,v$-plane, the hyperbolic triangle $H(N)$ [defined by (2.222)] is given by
$$0<uv\le N,\qquad u>0,\qquad u<v\le\eta^2u.$$
These conditions are equivalent to
$$0<u\le\sqrt N,\qquad u<v\le\min\{\eta^2u,\,N/u\}. \tag{2.225}$$
Since $\eta^2u<N/u$ is equivalent to $u<\sqrt N/\eta$, the area of (2.225) is
$$\int_0^{\sqrt N/\eta}(\eta^2u-u)\,du+\int_{\sqrt N/\eta}^{\sqrt N}\left(\frac{N}{u}-u\right)du=N\log\eta.$$
This has to be divided by the determinant in (2.224) to obtain the area in the $x,y$-plane, and this gives (2.223).
To estimate the number of lattice points inside the hyperbolic triangle $H(N)$, we use the general inequality (see Proposition 1.9)
$$\mathrm{Area}(H)-O(\mathrm{Perimeter}(H))\le|H\cap\mathbb{Z}^2|\le\mathrm{Area}(H)+O(\mathrm{Perimeter}(H))+1. \tag{2.226}$$
The perimeter of the hyperbolic triangle $H(N)$ is $O(\sqrt N)$. Indeed, the three vertices of $H(N)$ are $(0,0)$, $(\sqrt N,0)$, and $(x_0,y_0)$, where the point $(x_0,y_0)$ satisfies both equations
$$x^2-dy^2=N,\qquad\frac{x+y\sqrt d}{x-y\sqrt d}=\eta_d^2. \tag{2.227}$$
It follows from (2.227) that $x_0+y_0\sqrt d=\sqrt N\,\eta_d$. The coordinates of the vertices of $H(N)$ are all in the range $O(\sqrt N)$, implying that the perimeter of $H(N)$ is $O(\sqrt N)$. Applying (2.226) we have
$$\sum_{n=1}^{N}R_d(n)=\mathrm{Area}(H(N))+O(\mathrm{Perimeter}(H(N)))=\frac{N}{2\sqrt d}\log\eta_d+O(\sqrt N). \tag{2.228}$$
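The lattice point count behind (2.228) can be reproduced for small $d$ and $N$. The sketch below counts the integer points satisfying the three requirements in (2.222) and compares the result with the area (2.223). It is an illustration under stated assumptions: the helper names are hypothetical, the least Pell solution is found by direct search, and the boundary condition with $\eta_d^2$ is tested exactly in integers, because boundary points such as $(x,y)=(2,1)$ for $d=3$ would be unreliable in floating point.

```python
import math


def least_pell_solution(d: int) -> tuple:
    """Least (u, v) with u, v > 0 and u^2 - d*v^2 = 1, by direct search."""
    v = 1
    while True:
        u = math.isqrt(d * v * v + 1)
        if u * u == d * v * v + 1:
            return u, v
        v += 1


def surd_is_nonpositive(P: int, Q: int, d: int) -> bool:
    """Exact test of P + Q*sqrt(d) <= 0 for integers P, Q and non-square d."""
    if P <= 0 and Q <= 0:
        return True
    if P >= 0 and Q >= 0:
        return False
    if P < 0:                      # Q > 0: Q*sqrt(d) <= -P
        return Q * Q * d <= P * P
    return P * P <= Q * Q * d      # P > 0, Q < 0: P <= -Q*sqrt(d)


def count_primary_points(d: int, N: int) -> int:
    """Number of lattice points (x, y) satisfying the conditions (2.222)."""
    u0, v0 = least_pell_solution(d)
    U, V = u0 * u0 + d * v0 * v0, 2 * u0 * v0     # eta^2 = U + V*sqrt(d)
    eta = u0 + v0 * math.sqrt(d)
    y_max = int(eta * math.sqrt(N) / (2.0 * math.sqrt(d))) + 1
    count = 0
    for y in range(1, y_max + 1):                 # the ratio > 1 forces y >= 1
        x_min = math.isqrt(d * y * y) + 1         # x > y*sqrt(d)
        x_max = math.isqrt(N + d * y * y)         # x^2 - d*y^2 <= N
        for x in range(x_min, x_max + 1):
            # x + y*sqrt(d) <= eta^2 * (x - y*sqrt(d)), rewritten as P + Q*sqrt(d) <= 0
            P = x - U * x + d * V * y
            Q = y - V * x + U * y
            if surd_is_nonpositive(P, Q, d):
                count += 1
    return count


def hyperbolic_triangle_area(d: int, N: int) -> float:
    """Area (2.223) of the hyperbolic triangle H_d(N)."""
    u0, v0 = least_pell_solution(d)
    return N * math.log(u0 + v0 * math.sqrt(d)) / (2.0 * math.sqrt(d))


if __name__ == "__main__":
    for N in (100, 1000, 10000):
        print(N, count_primary_points(3, N), hyperbolic_triangle_area(3, N))
```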
Repeating the same argument for $0<dy^2-x^2\le N$ instead of $0<x^2-dy^2\le N$, we obtain the same right-hand side:
$$\sum_{n=1}^{N}R_d(-n)=\frac{N}{2\sqrt d}\log\eta_d+O(\sqrt N), \tag{2.229}$$
proving (2.221$'$) and (2.221$''$). Taking the difference of (2.228) and (2.229), we have
$$\sum_{n=1}^{N}\left(R_d(n)-R_d(-n)\right)=O(\sqrt N). \tag{2.230}$$
Now it is easy to prove the convergence of the series in (2.215). Indeed, by using (2.230) and Abel’s transformation (2.119), we have for any $1<N<M$,
$$\begin{aligned}\sum_{n=N}^{M}\frac{R_d(n)-R_d(-n)}{n}&=\sum_{m=N}^{M-1}\frac{\sum_{n=N}^{m}\left(R_d(n)-R_d(-n)\right)}{m(m+1)}+\frac{1}{M}\sum_{n=N}^{M}\left(R_d(n)-R_d(-n)\right)\\&=\sum_{m=N}^{M-1}\frac{O(\sqrt m)}{m^2}+\frac{O(\sqrt M)}{M}=O(N^{-1/2}).\end{aligned} \tag{2.231}$$
Equation (2.231) immediately implies the convergence of the infinite series in (2.215):
$$\sum_{\substack{(x,y)\ne(0,0):\\ \text{primary representations}}}\frac{1}{x^2-dy^2}=\sum_{n=1}^{\infty}\frac{R_d(n)-R_d(-n)}{n}\quad\text{is convergent.} \tag{2.232}$$
If $x=w\ge0$, $y=z\ge0$ is a primary solution of $x^2-dy^2=A$ with $A>0$, then by definition
$$A=w^2-dz^2=(w+z\sqrt d)(w-z\sqrt d),\qquad1<\frac{w+z\sqrt d}{w-z\sqrt d}\le\eta_d^2,$$
implying
$$\sqrt A<w+z\sqrt d\le\sqrt A\,\eta_d. \tag{2.233}$$
It follows from the product formula (2.213) that for every integer $j$, $(w+z\sqrt d)\,\eta_d^{\,j}$ gives another solution of $x^2-dy^2=A$, and by (2.233) we have
$$(w+z\sqrt d)\,\eta_d^{\,j}\le(2+o(1))N\sqrt d\iff j\le\frac{\log(N/\sqrt A)}{\log\eta_d}+O(1). \tag{2.234}$$
The same holds for $x^2-dy^2=A$ with $A<0$; the only minor difference is that in (2.234) we have to replace $\sqrt A$ with $\sqrt{|A|}$. Thus by (2.234) we obtain the key formula:
$$\sum_{\substack{1\le y\le N,\ 1\le x\le N\sqrt d:\\ |x^2-dy^2|\le m}}\frac{1}{x^2-dy^2}=\sum_{1\le A\le m}\frac{R_d(A)-R_d(-A)}{A}\left(\frac{\log(N/\sqrt A)}{\log\eta_d}+O(1)\right), \tag{2.235}$$
which holds for any $1<m<N$. Equation (2.235) is the key to proving Proposition 2.20; in the application below we will use (2.235) with the choice $m\approx(\log N)^c$, where $c>1$ is an absolute constant to be specified later. We divide the left-hand side of (2.235) into two parts:
$$\sum_{\substack{1\le y\le N,\ 1\le x\le N\sqrt d:\\ |x^2-dy^2|\le m}}\frac{1}{x^2-dy^2}=\Sigma_1+\Sigma_2, \tag{2.236}$$
where
$$\Sigma_1=\sum_{\substack{1\le y\le N,\ 1\le x\le N\sqrt d:\\ |x^2-dy^2|\le m,\ |x-y\sqrt d|<1/2}}\frac{1}{x^2-dy^2} \tag{2.237}$$
and
$$\Sigma_2=\sum_{\substack{1\le y\le N,\ 1\le x\le N\sqrt d:\\ |x^2-dy^2|\le m,\ |x-y\sqrt d|\ge1/2}}\frac{1}{x^2-dy^2}. \tag{2.238}$$
First we show that
$$\Sigma_2=O\left((\log m)^3\right). \tag{2.239}$$
To prove (2.239), notice that the conditions
$$|x^2-dy^2|\le m,\qquad|x-y\sqrt d|\ge1/2$$
in (2.238) clearly imply
$$0<x+y\sqrt d\le2m. \tag{2.240}$$
Since the number of solutions of $x^2-dy^2=A$ with $x\ge0$, $y\ge0$, $x+y\sqrt d\le2m$ is estimated from above by $R_d(A)\cdot O(\log m)$, by (2.238) and (2.240) we have the following trivial upper bound on $\Sigma_2$:
$$\Sigma_2=O\left(\log m\sum_{A=1}^{m}\frac{R_d(A)+R_d(-A)}{A}\right). \tag{2.241}$$
We recall (2.214): $R_d(A)+R_d(-A)\le\tau(A)$, where $\tau(n)$ is the divisor function (number of divisors of $n$), and using this in (2.241) we obtain
$$\Sigma_2=O\left(\log m\sum_{k=1}^{m}\frac{\tau(k)}{k}\right). \tag{2.242}$$
We recall the following well-known fact about the divisor function (see, e.g., [Ha-Wr]):
$$\sum_{k=1}^{n}\tau(k)=O(n\log n). \tag{2.243}$$
An application of (2.243) in formula (2.242), combined with Abel’s transformation (2.119), gives (2.239). Next we study $\Sigma_1$ defined in (2.237). We are motivated by the vague approximation
$$\tan(\pi k\sqrt d)\approx\pi(k\sqrt d-\ell)=\pi(k\sqrt d-\ell)\,\frac{k\sqrt d+\ell}{k\sqrt d+\ell}\approx\frac{\pi(dk^2-\ell^2)}{2k\sqrt d}, \tag{2.244}$$
where $\ell=\ell(k,d)$ is the nearest integer to $k\sqrt d$. It is easy to make (2.244) precise by using the beginning of the Taylor series of $\tan(x)$: $\tan(x)=x+O(x^3)$; then a simple calculation gives the following precise equality:
$$\frac{1}{k\tan(\pi k\sqrt d)}-\frac{2\sqrt d}{\pi(dk^2-\ell^2)}=O(\|k\sqrt d\|/k)+O(1/k^2). \tag{2.245}$$
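Thus we have [see (2.237) and (2.245)]
$$\begin{aligned}\sum_{\substack{1\le k\le N:\\ 2\sqrt d\,k\|k\sqrt d\|\le m}}\frac{1}{k\tan(\pi k\sqrt d)}&=\sum_{\substack{1\le k\le N,\ 1\le\ell\le N\sqrt d:\\ |\ell^2-dk^2|\le m,\ |\ell-k\sqrt d|<1/2}}\frac{-2\sqrt d}{\pi(\ell^2-dk^2)}+O\Bigg(\sum_{k\ge1}k^{-2}\Bigg)+O\Bigg(\sum_{\substack{1\le k\le N:\\ 2\sqrt d\,k\|k\sqrt d\|\le m}}\|k\sqrt d\|/k\Bigg)\\&=-\frac{2\sqrt d}{\pi}\,\Sigma_1+O(1)+O\Bigg(\sum_{\substack{1\le k\le N:\\ 2\sqrt d\,k\|k\sqrt d\|\le m}}\|k\sqrt d\|/k\Bigg),\end{aligned} \tag{2.246}$$
provided $m$ is a half-integer, i.e.,
$$m=\text{integer}+\frac12. \tag{2.247}$$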
To explain the role of the “half-integer $m$” [see condition (2.247)] in (2.246), note that $|dk^2-\ell^2|=(k\sqrt d+\ell)\,|k\sqrt d-\ell|$ is clearly an integer, and
$$2\sqrt d\,k\|k\sqrt d\|=2\sqrt d\,k\,|k\sqrt d-\ell|=\left((\sqrt d\,k+\ell)+(\sqrt d\,k-\ell)\right)|k\sqrt d-\ell|=|dk^2-\ell^2|\pm(\sqrt d\,k-\ell)^2=\text{integer}\pm(\sqrt d\,k-\ell)^2. \tag{2.248}$$
Since $\ell$ is the nearest integer to $\sqrt d\,k$, $(\sqrt d\,k-\ell)^2\le1/4$, and so by (2.247) and (2.248) with $m=m_1+1/2$, where $m_1$ is an integer, we have
$$2\sqrt d\,k\|k\sqrt d\|\le m=m_1+\frac12\iff|dk^2-\ell^2|\le m_1. \tag{2.249}$$
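The equivalence (2.249) can be checked numerically for small parameters. The sketch below is only a sanity check in floating-point arithmetic (adequate here because the two sides of (2.248) differ from the threshold by at least 1/4); the function name and the tested ranges are illustrative assumptions.

```python
import math


def check_half_integer_equivalence(d: int, k_max: int, m1: int) -> bool:
    """Check (2.249) for all 1 <= k <= k_max, with m = m1 + 1/2."""
    root = math.sqrt(d)
    m = m1 + 0.5
    for k in range(1, k_max + 1):
        ell = round(k * root)                              # nearest integer to k*sqrt(d)
        lhs = 2.0 * root * k * abs(k * root - ell)         # 2*sqrt(d)*k*||k*sqrt(d)||
        left_condition = lhs <= m
        right_condition = abs(d * k * k - ell * ell) <= m1  # exact integer test
        if left_condition != right_condition:
            return False
    return True


if __name__ == "__main__":
    for d in (3, 7):
        print(d, all(check_half_integer_equivalence(d, 2000, m1)
                     for m1 in range(0, 6)))
```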
It is easy to estimate the error term in (2.246):
$$\sum_{\substack{1\le k\le N:\\ 2\sqrt d\,k\|k\sqrt d\|\le m}}\|k\sqrt d\|/k\le\sum_{1\le k\le m}1/k+\sum_{k>m}m/k^2=O(\log m). \tag{2.250}$$
Next we apply the following general result, which holds for any badly approximable $\alpha$ (we will choose $\alpha=\sqrt d$).

Lemma 2.21. If $\alpha$ is badly approximable, then for any $N\ge2$ and $\lambda\ge(\log N)^6$,
$$M_\alpha(N)=-\frac{1}{2\pi}\sum_{\substack{1\le n\le N:\\ n\|n\alpha\|\le\lambda}}\frac{1}{n\tan(\pi n\alpha)}+O(1).$$
Here the error term $O(1)$ depends only on the upper bound on the partial quotients of the badly approximable $\alpha$. First we show how to use Lemma 2.21 to complete the proof of Proposition 2.20. We make the choice $\lambda=(\log N)^6+O(1)$; here we choose the constant $O(1)$ in such a way that
$$2\sqrt d\,\lambda=m=\text{integer}+\frac12. \tag{2.251}$$
Combining (2.246)–(2.251) with Lemma 2.21, where $\alpha=\sqrt d$, we obtain
$$M_{\sqrt d}(N)=\frac{\sqrt d}{\pi^2}\,\Sigma_1+O(\log\log N). \tag{2.252}$$
By (2.235)–(2.239) and (2.252),
$$M_{\sqrt d}(N)=\frac{\sqrt d}{\pi^2}\sum_{1\le A\le m}\frac{R_d(A)-R_d(-A)}{A}\left(\frac{\log(N/\sqrt A)}{\log\eta_d}+O(1)\right)+O\left((\log\log N)^3\right), \tag{2.253}$$
where by condition (2.251),
$$m=2\sqrt d\,\lambda=2\sqrt d\,(\log N)^6+O(1)=\text{half-integer}. \tag{2.254}$$
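Note that
$$\sum_{1\le A\le m}\frac{R_d(A)-R_d(-A)}{A}\left(\frac{\log(N/\sqrt A)}{\log\eta_d}+O(1)\right)=\frac{\log N}{\log\eta_d}\sum_{1\le A\le m}\frac{R_d(A)-R_d(-A)}{A}+O\left(\sum_{1\le A\le m}\frac{(R_d(A)-R_d(-A))\log A}{A}\right). \tag{2.255}$$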
Again using (2.214), (2.243), and Abel’s transformation (2.119), a routine calculation gives
$$\sum_{1\le A\le m}\frac{(R_d(A)-R_d(-A))\log A}{A}=O\left((\log m)^3\right). \tag{2.256}$$
Moreover, by (2.231) and (2.254),
$$\sum_{A>m}\frac{R_d(A)-R_d(-A)}{A}=O(m^{-1/2})=O\left((\log N)^{-3}\right). \tag{2.257}$$
Combining (2.255)–(2.257), we have
$$\sum_{1\le A\le m}\frac{R_d(A)-R_d(-A)}{A}\left(\frac{\log(N/\sqrt A)}{\log\eta_d}+O(1)\right)=\frac{\log N}{\log\eta_d}\sum_{A=1}^{\infty}\frac{R_d(A)-R_d(-A)}{A}+O\left((\log\log N)^3\right). \tag{2.258}$$
Finally, (2.253) and (2.258) imply Proposition 2.20. It remains to give a

Proof of Lemma 2.21. We basically repeat the argument of Step One in the proof of Proposition 2.16 (see Sect. 2.4). This means that we are going to combine Koksma’s inequality [in fact, we use the form (2.148) and (2.149)] with Lemma 2.19 and try to force the usual cancellation of the “positive and negative sides.” Since the notation “$\|x\|=$ small” does not tell us whether $x$ is slightly less or slightly more than an integer, we will use the notation $\|x\|^{+}$ and $\|x\|^{-}$ introduced in Sect. 2.4; see the definition between (2.153) and (2.154). Let
$$A^{+}(M,p,q,r)=\left\{M\left(1-\frac1r\right)<k\le M:\ \frac{p}{M}\le\|k\alpha\|^{+}<\frac{p}{M}\left(1+\frac1q\right)\right\}$$
and
$$A^{-}(M,p,q,r)=\left\{M\left(1-\frac1r\right)<k\le M:\ \frac{p}{M}\le\|k\alpha\|^{-}<\frac{p}{M}\left(1+\frac1q\right)\right\},$$
where $M\ge2p$, $p\ge2$, $q\ge1$, $r\ge1$ are real numbers (to be specified later). We apply Lemma 2.18 (in fact, we use Eq. (2.148)) with
$$a=\frac{p}{M},\qquad b=\frac{p}{M}\left(1+\frac1q\right),\qquad f(x)=\frac{1}{\tan(\pi x)},$$
and the finite point set in the interval $[a,b]$ is
$$X=\{k\alpha\ (\mathrm{mod}\ 1):\ k\in A^{+}(M,p,q,r)\};$$
then we have
$$\left|\sum_{k\in A^{+}(M,p,q,r)}\frac{1}{\tan(\pi k\alpha)}-\frac{|A^{+}(M,p,q,r)|}{b-a}\int_a^b f(x)\,dx\right|\le\Delta\int_a^b|f'(x)|\,dx,$$
where by Lemma 2.19, $\Delta=O(\log p)$. Also, we have
$$\int_a^b|f'(x)|\,dx=|f(b)-f(a)|=O\left(\frac{M}{p}-\frac{M}{p}\left(1+\frac1q\right)^{-1}\right)=O\left(\frac{M}{pq}\right),$$
and again using Lemma 2.19 (in fact, we use it twice: first for $n=M$, then for $n=M(1-1/r)$, and finally we take the difference) we can estimate the number of elements $|A^{+}(M,p,q,r)|$ of the set $A^{+}(M,p,q,r)$ as follows:
$$|A^{+}(M,p,q,r)|=\frac{M}{r}(b-a)+O(\log(M(b-a)+2))=\frac{M}{r}(b-a)+O(\log((p/q)+2)).$$
It follows that
$$\begin{aligned}\sum_{k\in A^{+}(M,p,q,r)}\frac{1}{\tan(\pi k\alpha)}&=\frac{M}{r}\int_a^b f(x)\,dx+O\left(\frac{M\log p}{pq}\right)+O(\log((p/q)+2))\,\frac{1}{b-a}\int_a^b f(x)\,dx\\&=\frac{M}{r}\int_a^b f(x)\,dx+O\left(\frac{M\log p}{pq}\right)+O\left(\frac{M\log((p/q)+2)}{p}\right).\end{aligned}$$
If $k_1,k_2\in A^{+}(M,p,q,r)$ then $k_1/k_2=1+O(1/r)$, and so we have
$$\sum_{k\in A^{+}(M,p,q,r)}\frac{1}{k\tan(\pi k\alpha)}=\frac1r\int_a^b f(x)\,dx+O\left(\frac{1}{pq}\log p\right)+O\left(\frac1p\log((p/q)+2)\right)+O(r^{-2})\int_a^b f(x)\,dx. \tag{2.259}$$
Note that
$$\int_a^b f(x)\,dx=\int_a^b\frac{dx}{\tan(\pi x)}\le\log(b/a)=\log(1+1/q)=O(1/q). \tag{2.260}$$
Since we can repeat the argument for $A^{-}(M,p,q,r)$, by (2.259) and (2.260) the estimate
$$\sum_{k\in A^{\delta}(M,p,q,r)}\frac{1}{k\,|\tan(\pi k\alpha)|}=\frac1r\int_a^b f(x)\,dx+O\left(\frac{1}{pq}\log p\right)+O\left(\frac1p\log((p/q)+2)\right)+O(r^{-2}q^{-1}) \tag{2.261}$$
holds for both “$\delta=+$” and “$\delta=-$.” Applying (2.261) with $p_j=p(1+1/q)^j$, $j=0,1,2,\ldots$, we have for both $\|x\|^{+}$ and $\|x\|^{-}$, i.e., formally for both “$\delta=+$” and “$\delta=-$” (note that the value of the parameter $q\ge1$ will be specified later):
$$\sum_{\substack{M(1-1/r)<k\le M:\\ \|k\alpha\|^{\delta}\ge p/M}}\frac{1}{k\,|\tan(\pi k\alpha)|}=\frac1r\int_a^{1/2}f(x)\,dx+O\left(\frac{1}{pq}\log p\right)O(q\log M)+O(q\log M)\,O\left(\frac1p\log((p/q)+2)\right)+O(q\log M)\,O(r^{-2}q^{-1}), \tag{2.262}$$
since we can clearly stop at $j=O(q\log M)$. What we really want to estimate is a slightly different variant of (2.262), where the condition $\|k\alpha\|^{\delta}\ge p/M$ is replaced by $k\|k\alpha\|^{\delta}\ge p$:
$$\sum_{\substack{M(1-1/r)<k\le M:\\ k\|k\alpha\|^{\delta}\ge p}}\frac{1}{k\,|\tan(\pi k\alpha)|}. \tag{2.263}$$
Since $M(1-1/r)<k\le M$, the condition $\|k\alpha\|^{\delta}\ge p/M$ in (2.262) implies $k\|k\alpha\|^{\delta}\ge(1-1/r)p$. By changing $p$ to $p'=(1-1/r)p$, $a=p/M$ changes to $a'=(1-1/r)p/M$, and this gives the additional error term
$$\frac1r\int_{a'}^{a}f(x)\,dx=\frac1r\int_{(1-1/r)p/M}^{p/M}\frac{dx}{\tan(\pi x)}\le\frac1r\cdot\frac{p}{Mr}\cdot\frac{M}{p}=O(r^{-2}). \tag{2.264}$$
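Thus, by using (2.262) and (2.264) in (2.263), we have (“$\delta=+$” and “$\delta=-$”)
$$\sum_{\substack{M(1-1/r)<k\le M:\\ k\|k\alpha\|^{\delta}\ge p}}\frac{1}{k\,|\tan(\pi k\alpha)|}=\frac1r\int_a^{1/2}f(x)\,dx+O\left(\frac{\log M\log p}{p}\right)+O\left(\frac{q}{p}\log M\log((p/q)+2)\right)+O(r^{-2}\log M)+O(r^{-2}). \tag{2.265}$$
In (2.265) we take the difference for “$\delta=+$” and “$\delta=-$”:
$$\sum_{\substack{M(1-1/r)<k\le M:\\ k\|k\alpha\|\ge p}}\frac{1}{k\tan(\pi k\alpha)}=O\left(\frac{\log M\log p}{p}\right)+O\left(\frac{q}{p}\log M\log((p/q)+2)\right)+O(r^{-2}\log M)+O(r^{-2}). \tag{2.266}$$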
Next we choose $r=(\log M)^3$ and apply (2.266) with $M_j=M(1-1/r)^j$, $j=0,1,2,\ldots,r-1$. Since $(1-1/r)^r=e^{-1}+o(1)$, (2.266) implies that for every $M$ there is a constant times smaller $M^{*}=(1+o(1))M/e$ such that
$$\sum_{\substack{M^{*}<k\le M:\\ k\|k\alpha\|\ge p}}\frac{1}{k\tan(\pi k\alpha)}=O\left(\frac{(\log M)^4\log p}{p}\right)+O\left(\frac{q}{p}(\log M)^4\log((p/q)+2)\right)+O\left((\log M)^{-2}\right). \tag{2.267}$$
We use (2.267) repeatedly: with $M=N$, $M=(1+o(1))Ne^{-1}$, $M=(1+o(1))Ne^{-2}$, $M=(1+o(1))Ne^{-3}$, and so on; at the end we obtain
$$\sum_{\substack{1\le k\le N:\\ k\|k\alpha\|\ge p}}\frac{1}{k\tan(\pi k\alpha)}=O\left(\frac{(\log N)^5\log p}{p}\right)+O\left(\frac{q}{p}(\log N)^5\log((p/q)+2)\right)+O\left((\log N)^{-1}\right). \tag{2.268}$$
By choosing $q=1$ and $p\ge(\log N)^6$ in (2.268), we conclude that
$$\sum_{\substack{1\le k\le N:\\ k\|k\alpha\|\ge p}}\frac{1}{k\tan(\pi k\alpha)}=o(1).$$
Combining this with Proposition 2.16, Lemma 2.21 follows. □

This completes the proof of Proposition 2.20. □
2.6.1 A Detour: Another Class Number Formula

We recall that Proposition 2.20 is exactly Eq. (2.14) in Sect. 2.1, and it quickly leads to a proof of the elegant Hirzebruch–Meyer–Zagier class number formula (HMZ-formula, in short) as follows. Assume that $d=p\equiv3$ (mod 4) is a prime $>3$, and the class number of the real quadratic field $\mathbb{Q}(\sqrt p)$ is one, or using the traditional $h$-notation, $h(d)=h(p)=1$. Then we have the equality
$$\sum_{\substack{(x,y)\ne(0,0):\\ \text{primary representations}}}\frac{1}{x^2-py^2}=L(1,\chi), \tag{2.269}$$
where $\chi$ is the so-called norm-sign character and $L(1,\chi)$ is the corresponding L-function at $s=1$. More precisely, $\chi$ is a unique character with values $\pm1$ defined for all ideals in the ring of the algebraic integers of $\mathbb{Q}(\sqrt d)$ (in fact, $\chi$ depends only on the narrow ideal class) and satisfies $\chi((a))=\mathrm{sign}\,\mathrm{Norm}(a)$ for the principal ideals $(a)$. Notice that, in our special case $d=p$ with $h(p)=1$, every ideal is principal. The special L-function
$$L(s,\chi)=\sum_{A:\ \text{ideals}}\frac{\chi(A)}{\mathrm{Norm}(A)^s}$$
has the product decomposition
$$L(s,\chi)=L(s,\chi_{-4})\,L(s,\chi_{-p}), \tag{2.270}$$
where
$$L(s,\chi_{-4})=\sum_{n=1}^{\infty}\frac{\chi_{-4}(n)}{n^s}\quad\text{and}\quad L(s,\chi_{-p})=\sum_{n=1}^{\infty}\frac{\chi_{-p}(n)}{n^s}$$
are the (ordinary) L-functions of the complex quadratic fields $\mathbb{Q}(\sqrt{-4})=\mathbb{Q}(\sqrt{-1})$ (“Gauss integers”) and $\mathbb{Q}(\sqrt{-p})$; the characters $\chi_{-4}$ and $\chi_{-p}$ are defined as follows: $\chi_{-4}(n)=\pm1$ if $n\equiv\pm1$ (mod 4) and $\chi_{-4}(n)=0$ if $n$ is even, and
$$\chi_{-p}(n)=\left(\frac{n}{p}\right)$$
is the usual Legendre symbol (i.e., the quadratic residue symbol). Note that (2.270) is “explained” by the elementary factorization $4p=(-4)(-p)$ of the discriminant of $x^2-py^2$; for a precise proof, see, e.g., Zagier’s book [Za4]. In the special case $s=1$ Eq. (2.270) gives
$$L(1,\chi)=L(1,\chi_{-4})\,L(1,\chi_{-p}), \tag{2.271}$$
and by Dirichlet’s class number formula,
$$L(1,\chi_{-4})=\frac{\pi}{4}\quad\text{and}\quad L(1,\chi_{-p})=\frac{\pi\,h(-p)}{\sqrt p}, \tag{2.272}$$
if $p>3$. Let $a_1,a_2,\ldots,a_{2s}$ be the period of the continued fraction for $\sqrt p$ (since $p\equiv3$ (mod 4) is a prime, the length of the period has to be even). (We have to exclude $p=3$, because $\mathbb{Q}(\sqrt{-3})$ has too many automorphisms, a technical nuisance in algebraic number theory.) By Proposition 2.1,
$$M_{\sqrt p}(N)=\frac{-a_1+a_2-a_3\pm\cdots+(-1)^{\ell}a_{\ell}}{12}+O(1)=\frac{-a_1+a_2-\cdots+a_{2s}}{12}\cdot\frac{\log N}{\log\eta}+O(1), \tag{2.273}$$
where $\ell$ is the last index for which $q_\ell\le N$ and $\eta$ is the fundamental unit of $\mathbb{Q}(\sqrt p)$ (in the last equation we heavily used the periodicity of the continued fraction of $\sqrt p$). On the other hand, combining Proposition 2.20 with (2.269)–(2.273), we have
$$M_{\sqrt p}(N)=\frac{h(-p)}{4}\cdot\frac{\log N}{\log\eta}+O\left((\log\log N)^3\right). \tag{2.274}$$
Comparing (2.273) and (2.274), we obtain the beautiful equation
$$h(-p)=\frac{-a_1+a_2-a_3\pm\cdots+a_{2s}}{3}. \tag{2.275}$$
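Formula (2.275) is straightforward to evaluate once the period of the continued fraction of $\sqrt p$ is known. The sketch below uses the standard integer recurrence for the continued fraction of a quadratic surd; the function names are illustrative, and the result equals $h(-p)$ only under the hypotheses above ($p\equiv3$ (mod 4) a prime $>3$ with $h(p)=1$), which the code does not verify.

```python
import math


def sqrt_cf_period(n: int) -> list:
    """Period a_1, ..., a_L of the continued fraction of sqrt(n), n not a square."""
    a0 = math.isqrt(n)
    if a0 * a0 == n:
        raise ValueError("n must not be a perfect square")
    m, den, a = 0, 1, a0
    period = []
    while a != 2 * a0:             # the period ends with the term 2*a0
        m = den * a - m
        den = (n - m * m) // den
        a = (a0 + m) // den
        period.append(a)
    return period


def hmz_class_number(p: int) -> int:
    """Right-hand side of (2.275): (-a_1 + a_2 - a_3 +- ... + a_2s) / 3."""
    period = sqrt_cf_period(p)
    alternating = sum((-1) ** i * a for i, a in enumerate(period, start=1))
    if alternating % 3 != 0:
        raise ValueError("alternating sum is not divisible by 3")
    return alternating // 3


if __name__ == "__main__":
    for p in (7, 11, 19, 23):
        print(p, sqrt_cf_period(p), hmz_class_number(p))
```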
As far as we know this equation was discovered (or rediscovered) in the 1970s by Hirzebruch, and it is called the Hirzebruch or Hirzebruch–Meyer–Zagier class number formula. Note that, among the primes $p\equiv3$ (mod 4), the majority (in fact, about 80 %) seems to satisfy the requirement $h(p)=1$ (i.e., the real quadratic field $\mathbb{Q}(\sqrt p)$ has class number one); at least this is what we can read out from the numerical tables. Unfortunately, despite the overwhelming computational evidence, nothing is proved here. It is more than surprising that the “mean value” $M_{\sqrt p}(N)$, associated with the irrational rotation $k\sqrt p$ (mod 1), $k=1,2,\ldots,N$, is intimately bound up with the class number $h(-p)$ of the complex quadratic field $\mathbb{Q}(\sqrt{-p})$. This leads to the following question.
2.6.2 How to Compute the Class Number in General: The Complex Case

One way to do it is to use Dirichlet's finite class number formula, which expresses the class number in terms of the Dirichlet character of the corresponding discriminant. The formula is the simplest when $d = p$, where $p \equiv 3 \pmod 4$ is a prime with $p > 3$. We form the sum, say $R$, of all quadratic residues (mod $p$), and the sum, say $N$, of all quadratic non-residues. Then $h(-p) = (N - R)/p$. For example, if $p = 7$, the quadratic residues are $1^2$, $2^2$, and $3^2 \equiv 2 \pmod 7$, and the quadratic non-residues are the remaining $3, 5, 6 \pmod 7$. The formula gives
$$h(-7) = \frac{(3 + 5 + 6) - (1 + 4 + 2)}{7} = \frac{14 - 7}{7} = 1.$$
In the general case, the formula is the following: if $K = \mathbb{Q}(\sqrt{-d})$ is a complex quadratic field, then
$$h(-d) = \frac{w(-d)}{2D} \sum_{k=1}^{|D|} \chi_D(k)\,k,$$
where $D$ ($= -d$ or $-4d$) is the discriminant of $K$, $\chi_D(k)$ is the real character of $K$ periodic modulo $|D|$ (it is a product of certain Legendre symbols), and finally $w(-1) = 4$, $w(-3) = 6$, $w(-d) = 2$ for the rest (the number of roots of unity in the field). An equivalent form is
$$h(-d) = \frac{1}{2 - \chi_D(2)} \sum_{0 < k < |D|/2} \chi_D(k)$$
for all square-free $d \ge 2$.
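The simplest case of the formula, $h(-p) = (N - R)/p$, is easy to try out. The following is only a minimal sketch: the function name `class_number_neg_prime` is ours, and the code assumes (without checking primality) that $p$ is a prime with $p \equiv 3 \pmod 4$ and $p > 3$.

```python
def class_number_neg_prime(p):
    """h(-p) = (N - R)/p for a prime p = 3 (mod 4), p > 3 (primality is assumed, not checked)."""
    if p <= 3 or p % 4 != 3:
        raise ValueError("expected a prime p > 3 with p = 3 (mod 4)")
    residues = {(x * x) % p for x in range(1, p)}  # nonzero quadratic residues mod p
    R = sum(residues)                              # sum of the quadratic residues
    N = p * (p - 1) // 2 - R                       # sum of the quadratic non-residues
    quotient, remainder = divmod(N - R, p)
    if remainder != 0:
        raise ArithmeticError("N - R is not divisible by p; is p really prime?")
    return quotient


if __name__ == "__main__":
    for p in (7, 11, 23, 47):
        print(p, class_number_neg_prime(p))
```

For $p = 7$ this reproduces the hand computation above, and for $p = 23$ it gives the value 3 that is found below by reduction theory.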
An alternative (in fact, more efficient) way to compute the class number is to use “reduction theory.” There is an elegant reduction theory for positive definite quadratic forms (i.e., when the discriminant is negative; we denote it $-D$), which leads to a surprisingly simple algorithm to determine the class number $h(-D)$ of a complex quadratic field $\mathbb{Q}(\sqrt{-D})$. We summarize it in a nutshell. By using a finite sequence of simple unimodular substitutions of the form $x = -y'$, $y = x'$ and $x = x' \pm y'$, $y = y'$, any binary form can be transformed into another binary form $ax^2 + bxy + cy^2$, for which $|b| \le a \le c$. In fact, we can even force that
$$\text{either } -a < b \le a < c \quad\text{or}\quad 0 \le b \le a = c.$$
Such a form is called a reduced form. It is an important theorem that there is one and only one reduced form equivalent to any given form. The number of reduced forms with discriminant $-D$ is the class number $h(-D)$.

For example, to calculate the class number when $D = 7$, the inequality $b^2 \le a^2 \le ac$ and the fact $4ac - b^2 = D$ give $3b^2 \le D$, i.e., $|b| \le \sqrt{D/3} = \sqrt{7/3} < 2$. Since $4ac - b^2 = D = 7$ implies that $b$ is odd, we have $b = \pm 1$. Now $4ac = 1 + 7 = 8$ gives $a = 1$, $c = 2$. The requirement $-a < b \le a < c$ excludes the case $b = -1$, so there is only one reduced form of discriminant $-7$, namely, $x^2 + xy + 2y^2$, yielding $h(-7) = 1$.

A more complicated example is $D = 23$. The inequality $|b| \le \sqrt{D/3} = \sqrt{23/3} < 3$ and the fact $4ac = b^2 + 23$ imply that $b$ is odd and $b = \pm 1$. Now $4ac = 1 + 23 = 24$ gives $a = 1$, $c = 6$ or $a = 2$, $c = 3$. The requirement $-a < b \le a < c$ excludes the case $a = 1$, $b = -1$, $c = 6$, so there are three reduced forms of discriminant $-23$, namely, $x^2 + xy + 6y^2$ and $2x^2 \pm xy + 3y^2$, yielding $h(-23) = 3$.

Since $h(7) = h(23) = 1$ (i.e., the class numbers in the real cases are both one; we omit the proof), we can double-check the facts $h(-7) = 1$ and $h(-23) = 3$ by using the HMZ-formula, see (2.275). Since $\sqrt{7} = [2; \overline{1, 1, 1, 4}]$ and $\sqrt{23} = [4; \overline{1, 3, 1, 8}]$, we have
$$h(-7) = \frac{-1 + 1 - 1 + 4}{3} = 1 \quad\text{and}\quad h(-23) = \frac{-1 + 3 - 1 + 8}{3} = 3.$$
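Both computations above can be mechanized. The sketch below is an illustration under stated assumptions, not a definitive implementation: `reduced_forms` enumerates the primitive reduced forms of discriminant $-D$ using the bound $3a^2 \le D$, and `hmz_class_number` evaluates the alternating sum in (2.275) over the period of $\sqrt{p}$. The function names are ours, and the HMZ check is only meaningful for primes $p \equiv 3 \pmod 4$, $p > 3$, with $h(p) = 1$, which the code does not verify.

```python
from math import gcd, isqrt


def reduced_forms(D):
    """Primitive reduced forms (a, b, c) with 4ac - b^2 = D > 0."""
    forms = []
    a = 1
    while 3 * a * a <= D:                  # reduced forms satisfy 3a^2 <= D
        for b in range(-a + 1, a + 1):     # -a < b <= a
            numerator = b * b + D
            if numerator % (4 * a) != 0:
                continue
            c = numerator // (4 * a)
            if c < a or (c == a and b < 0):  # need a < c, or a = c with b >= 0
                continue
            if gcd(gcd(a, b), c) == 1:
                forms.append((a, b, c))
        a += 1
    return forms


def sqrt_period(n):
    """Period a_1, ..., a_k of the continued fraction of sqrt(n), n not a square."""
    a0 = isqrt(n)
    if a0 * a0 == n:
        raise ValueError("n must not be a perfect square")
    m, d, a = 0, 1, a0
    period = []
    while a != 2 * a0:                     # the period ends with the partial quotient 2*a0
        m = d * a - m
        d = (n - m * m) // d
        a = (a0 + m) // d
        period.append(a)
    return period


def hmz_class_number(p):
    """(-a_1 + a_2 - ... + a_2s)/3, see (2.275); assumes h(p) = 1."""
    period = sqrt_period(p)
    alternating = sum((-1) ** i * a for i, a in enumerate(period, start=1))
    return alternating // 3


if __name__ == "__main__":
    for p in (7, 23, 47):
        print(p, len(reduced_forms(p)), hmz_class_number(p))
```

The two columns printed by the usage loop agree for each of these primes, which is the double-check described in the text.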
We conclude this section with the remark that if $\alpha$ is an arbitrary quadratic irrational
$$\alpha = \frac{-B + \sqrt{D}}{2A},\quad\text{that is, } \alpha \text{ is a root of } Ax^2 + Bx + C = 0, \text{ and } D = B^2 - 4AC > 0,$$
then we have the following analog of formula (2.211):
$$M_\alpha(N) = \frac{\sqrt{D}}{2\pi^2} \left( \sum_{\substack{(x,y) \ne (0,0):\\ \text{primary representations}}} \frac{1}{Ax^2 + Bxy + Cy^2} \right) \frac{\log N}{\log \eta} + \text{negligible}, \tag{2.276}$$
where $\eta$ is the fundamental unit in $\mathbb{Q}(\sqrt{D})$. The proof of (2.276) is the same as that of Proposition 2.20. The guiding intuition is that if $y\alpha$ is very close to an integer $x$, then
$$\|y\alpha\| \sqrt{D}\, y = \pm A(x - y\alpha)(y\alpha - y\alpha') \approx \pm A(x - y\alpha)(x - y\alpha') = \pm (Ax^2 + Bxy + Cy^2),$$
where $\alpha' = (-B - \sqrt{D})/2A$ is the other root of $Ax^2 + Bx + C = 0$.
Chapter 3
Variance, and Its Connection with Quadratic Fields
3.1 Computing the Variance

It is a rule of thumb in probability theory that if we have a random variable, then it is a good idea to know the expectation and the variance. This is particularly true when we need to prove a central limit theorem, since these parameters (the expectation and the variance) both show up in the statement itself. Theorem 1.2 is about the diophantine sum (see (1.43); $\{x\}$ denotes the fractional part of $x$)
$$S_\alpha(n) = \sum_{k=1}^{n} \left( \{k\alpha\} - \frac{1}{2} \right)$$
as $n$ runs in $1 \le n \le N$. Proposition 2.1 describes the expectation
$$M_\alpha(N) = \frac{1}{N} \sum_{n=1}^{N} S_\alpha(n)$$
in terms of the partial quotients of $\alpha$. This gives a perfectly satisfying answer when we know the partial quotients; it is the case, for example, for quadratic irrationals. It is time now to evaluate the variance
$$V_\alpha(N) = \frac{1}{N} \sum_{n=1}^{N} \big( S_\alpha(n) - M_\alpha(N) \big)^2. \tag{3.1}$$
Unfortunately, we cannot prove an analog of Proposition 2.1; instead, what we can prove is an analog of Proposition 2.20, which will lead us back to the deeper arithmetic of quadratic number fields.
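The three quantities $S_\alpha(n)$, $M_\alpha(N)$ and $V_\alpha(N)$ can be computed directly from their definitions. The following is a minimal floating-point sketch (the function names are ours); for large $N$ the rounding error in $\{k\alpha\}$ accumulates, so it is meant for experimentation only.

```python
import math


def diophantine_sums(alpha, N):
    """Return [S_alpha(1), ..., S_alpha(N)], where S_alpha(n) = sum_{k<=n} ({k*alpha} - 1/2)."""
    sums, running = [], 0.0
    for k in range(1, N + 1):
        x = k * alpha
        running += (x - math.floor(x)) - 0.5   # {k*alpha} - 1/2
        sums.append(running)
    return sums


def mean_and_variance(alpha, N):
    """Return (M_alpha(N), V_alpha(N)) as defined before and in (3.1)."""
    S = diophantine_sums(alpha, N)
    M = sum(S) / N
    V = sum((s - M) ** 2 for s in S) / N
    return M, V


if __name__ == "__main__":
    for N in (10**2, 10**3, 10**4):
        M, V = mean_and_variance(math.sqrt(2), N)
        print(N, round(M, 4), round(V, 4))
```

The usage loop prints the empirical mean and variance for $\alpha = \sqrt{2}$ at a few values of $N$.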
© Springer International Publishing Switzerland 2014 J. Beck, Probabilistic Diophantine Approximation, Springer Monographs in Mathematics, DOI 10.1007/978-3-319-10741-7__3
If $\alpha$ is badly approximable, then, combining (2.125) in Lemma 2.15 with Proposition 2.16, for any $T$ we have ($1 \le n \le N$)
$$S_\alpha(n) - M_\alpha(N) = \sum_{j=1}^{T} \frac{\cos((2n+1)\pi j\alpha)}{2\pi j \sin(\pi j\alpha)} + {\sum}' + O\!\left(\frac{n \log n}{T}\right) + O(1), \quad\text{where } {\sum}' = -\sum_{j=N+1}^{T} \frac{1}{2\pi j \tan(\pi j\alpha)}. \tag{3.2}$$
By repeating the argument of Step One and Step Two in the proof of Proposition 2.16 (see Sect. 2.4), we obtain the upper bound
$${\sum}' = O\!\left(\log\!\left(\frac{T}{N} + 1\right)\right).$$
We choose
$$T = T(N) = N \log N. \tag{3.3}$$
By using these in (3.2), we have
$$S_\alpha(n) - M_\alpha(N) = \sum_{j=1}^{T} \frac{\cos((2n+1)\pi j\alpha)}{2\pi j \sin(\pi j\alpha)} + O(\log\log N). \tag{3.4}$$
3.1.1 Guiding Intuition

As $n$ runs in the interval $1 \le n \le N$, it is plausible to expect that the difference $S_\alpha(n) - M_\alpha(N)$ can be approximated by the following “stochastic variant” of (3.4):
$$S_\alpha(n) - M_\alpha(N) = \sum_{j=1}^{N} \frac{\cos(2\pi Y_j)}{2\pi j \sin(\pi j\alpha)} + \text{negligible}, \tag{3.5}$$
where $Y_1, Y_2, \ldots, Y_N$ are independent random variables, each one uniformly distributed in the unit interval $0 \le Y_j \le 1$. Since
$$\int_0^1 \cos^2(2\pi x)\, dx = \frac{1}{2},$$
the guiding intuition (3.5) suggests that, as $n$ runs in $1 \le n \le N$, the variance of $S_\alpha(n) - M_\alpha(N)$ equals
$$\sum_{j=1}^{N} \frac{1}{8\pi^2 j^2 \sin^2(\pi j\alpha)} + \text{negligible}.$$
To prove this, i.e., to evaluate the variance $V_\alpha(N)$ in (3.1), it is natural to focus on the sum
$$V_\alpha^*(N) = \frac{1}{N} \sum_{n=1}^{N} \left( \sum_{j=1}^{T} \frac{\cos((2n+1)\pi j\alpha)}{2\pi j \sin(\pi j\alpha)} \right)^2, \tag{3.6}$$
since by (3.4)
$$V_\alpha(N) = V_\alpha^*(N) + O\!\left(\log\log N \sqrt{V_\alpha^*(N)}\right) + O\big((\log\log N)^2\big). \tag{3.7}$$
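Equation (3.7) follows from the general inequality
$$\left| \frac{1}{N} \sum_{n=1}^{N} (a_n + b_n)^2 - \frac{1}{N} \sum_{n=1}^{N} a_n^2 \right| \le \frac{2}{N} \sum_{n=1}^{N} |a_n b_n| + \frac{1}{N} \sum_{n=1}^{N} b_n^2 \le \frac{2B}{N} \sum_{n=1}^{N} |a_n| + B^2 \le 2B \left( \frac{1}{N} \sum_{n=1}^{N} a_n^2 \right)^{1/2} + B^2,$$
where $B = \max_n |b_n|$. Multiplying out the square in (3.6), we have
$$V_\alpha^*(N) = \mathrm{Diag}(N) + \mathrm{OffDiag}(N), \tag{3.8}$$
where the diagonal part is
$$\mathrm{Diag}(N) = \sum_{j=1}^{T} \frac{1}{(2\pi j \sin(\pi j\alpha))^2} \cdot \frac{1}{N} \sum_{n=1}^{N} \cos^2((2n+1)\pi j\alpha), \tag{3.9}$$
and the off-diagonal part is
$$\mathrm{OffDiag}(N) = \sum_{j=1}^{T} \sum_{\substack{1 \le k \le T:\\ k \ne j}} \frac{1}{4\pi^2\, j \sin(\pi j\alpha)\, k \sin(\pi k\alpha)} \cdot \frac{\Omega(j,k)}{N}, \tag{3.10}$$
where
$$\Omega(j,k) = \sum_{n=1}^{N} \cos((2n+1)\pi j\alpha) \cos((2n+1)\pi k\alpha). \tag{3.11}$$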
3.1.2 An Alternative Form of the Guiding Intuition

If $\beta$ is a “poorly approximable number” in the more precise sense that $\beta$ is not very close to any rational number with small denominator, then the sequence $n\beta \pmod 1$, $n = 1, 2, \ldots, N$ is “very uniformly distributed” in the unit interval. Therefore, it is plausible to approximate the sum (3.6) with the integral (see (3.9)–(3.11))
$$\sum_{j=1}^{N} \frac{1}{(2\pi j \sin(\pi j\alpha))^2} \int_0^1 \cos^2(2\pi jx)\, dx + \sum_{1 \le j < k \le N} \frac{2}{4\pi^2\, j \sin(\pi j\alpha)\, k \sin(\pi k\alpha)} \int_0^1 \cos(2\pi jx) \cos(2\pi kx)\, dx.$$
By using the trivial facts
$$\int_0^1 \cos^2(2\pi jx)\, dx = \frac{1}{2} \ \text{ for all integers } j \ge 1, \qquad \int_0^1 \cos(2\pi jx) \cos(2\pi kx)\, dx = 0 \ \text{ for all integers } 1 \le j < k,$$
and also that $T$ is “close” to $N$ (see (3.3)), it is plausible to expect that
$$V_\alpha(N) = V_\alpha^*(N) + \text{negligible} = \sum_{j=1}^{N} \frac{1}{8\pi^2 j^2 \sin^2(\pi j\alpha)} + \text{negligible}. \tag{3.12}$$
Now consider the special case when $\alpha = \sqrt{d}$ and $d \ge 2$ is a square-free integer. If $\|j\sqrt{d}\|$ is small then
$$\sin^2(\pi j\sqrt{d}) \approx \pi^2 \|j\sqrt{d}\|^2 = \pi^2 (j\sqrt{d} - m)^2 = \frac{\pi^2 (m^2 - dj^2)^2}{(m + j\sqrt{d})^2} \approx \frac{\pi^2 (m^2 - dj^2)^2}{4dj^2}, \tag{3.13}$$
where $m = m(j, d)$ is the nearest integer to $j\sqrt{d}$. Using approximation (3.13) in (3.12), we can rightly expect that the following formula holds:
$$V_{\sqrt{d}}(N) = \frac{d}{2\pi^4} \sum_{y=1}^{N} \sum_{1 \le x \le N\sqrt{d}} \frac{1}{(x^2 - dy^2)^2} + \text{negligible}. \tag{3.14}$$
If $d \equiv 2$ or $3 \pmod 4$ then $x^2 - dy^2$ is the norm of the algebraic integer $x + y\sqrt{d}$ ($x, y \in \mathbb{Z}$) in the real quadratic field $\mathbb{Q}(\sqrt{d})$. It is clear, therefore, that the evaluation of the variance $V_\alpha(N)$ for $\alpha = \sqrt{d}$ is intimately associated with the arithmetic of the real quadratic field $\mathbb{Q}(\sqrt{d})$.

Here we stop the intuition. The rest of the section, and also the beginning of the next section, is about how to justify the vague approximations in (3.12) and (3.14). First we prove the following precise statement.

Proposition 3.1. If $\alpha$ is badly approximable then with $T = N \log N$ we have
$$V_\alpha^*(N) = \frac{1}{N} \sum_{n=1}^{N} \left( \sum_{j=1}^{T} \frac{\cos((2n+1)\pi j\alpha)}{2\pi j \sin(\pi j\alpha)} \right)^2 = \sum_{j=1}^{N} \frac{1}{8\pi^2 j^2 \sin^2(\pi j\alpha)} + O\big((\log\log N)^2\big). \tag{3.15}$$
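Before turning to the proof, the two sides of (3.15) can be compared numerically. The sketch below is an illustration only: `v_star` evaluates the double sum (3.6) by brute force with $T = \lfloor N \log N \rfloor$ (rounding $T$ down is our choice), `main_term` evaluates the single sum on the right-hand side of (3.15), and the function names are ours. The cost of `v_star` is of order $N \cdot T$, so only moderate $N$ is practical.

```python
import math


def v_star(alpha, N):
    """Brute-force value of the sum (3.6) with T = floor(N log N)."""
    T = int(N * math.log(N))
    # coefficients 1 / (2 pi j sin(pi j alpha)), j = 1..T
    coeff = [1.0 / (2 * math.pi * j * math.sin(math.pi * j * alpha))
             for j in range(1, T + 1)]
    total = 0.0
    for n in range(1, N + 1):
        inner = sum(c * math.cos((2 * n + 1) * math.pi * j * alpha)
                    for j, c in enumerate(coeff, start=1))
        total += inner * inner
    return total / N


def main_term(alpha, N):
    """sum_{j<=N} 1 / (8 pi^2 j^2 sin^2(pi j alpha)), the main term in (3.15)."""
    return sum(1.0 / (8 * math.pi ** 2 * j * j * math.sin(math.pi * j * alpha) ** 2)
               for j in range(1, N + 1))


if __name__ == "__main__":
    alpha = math.sqrt(2)
    for N in (50, 100, 200):
        print(N, v_star(alpha, N), main_term(alpha, N))
```

The proposition only asserts that the two printed columns differ by $O((\log\log N)^2)$; the experiment gives a feeling for the sizes involved but proves nothing.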
Proof. We return to (3.6) and (3.8). First we estimate the diagonal part (see (3.9))
$$\mathrm{Diag}(N) = \sum_{j=1}^{T} \frac{1}{(2\pi j \sin(\pi j\alpha))^2} \cdot \frac{1}{N} \sum_{n=1}^{N} \cos^2((2n+1)\pi j\alpha). \tag{3.16}$$
Let $1 \le j \le N$ be fixed; we claim that the set
$$\{ n \cdot j\alpha \pmod 1 : n = 1, 2, \ldots, N \}$$
is uniformly distributed in the unit interval with discrepancy $O(j \log N)$. (Note that the concept of discrepancy is defined in (1.19), or see (2.145).) Indeed, let $I = (a, b)$ be an arbitrary subinterval of $(0, 1)$, and for $\ell = 1, 2, \ldots, j$ consider the subinterval
$$I_\ell = \left( \frac{\ell - 1 + a}{j}, \frac{\ell - 1 + b}{j} \right).$$
Since $\alpha$ is badly approximable (this concept is defined before Lemma 2.14), the Discrepancy Lemma in Sect. 1.1 (see (1.22) and (1.23)) implies that
$$\sum_{\substack{1 \le n \le N:\\ n\alpha \in I_\ell \ (\mathrm{mod}\ 1)}} 1 = N |I_\ell| + O(\log N),$$
and so
$$\sum_{\substack{1 \le n \le N:\\ nj\alpha \in I \ (\mathrm{mod}\ 1)}} 1 = \sum_{\ell=1}^{j} \sum_{\substack{1 \le n \le N:\\ n\alpha \in I_\ell \ (\mathrm{mod}\ 1)}} 1 = \sum_{\ell=1}^{j} \big( N |I_\ell| + O(\log N) \big) = N |I| + O(j \log N), \tag{3.17}$$
as we have claimed above.
Now let $N_1 = N (\log N)^{-2}$, and assume $1 \le j \le N_1$; then by (3.17) and Koksma's inequality (Lemma 2.18),
$$\frac{1}{N} \sum_{n=1}^{N} \cos^2((2n+1)\pi j\alpha) = \int_0^1 \cos^2(2\pi x)\, dx + O\big((\log N)^{-1}\big) = \frac{1}{2} + O\big((\log N)^{-1}\big). \tag{3.18}$$
By repeating the proof of Lemma 2.14 (Pigeonhole Principle), we obtain
$$\sum_{j=N_1+1}^{T} \frac{1}{(j \|j\alpha\|)^2} = O(\log(T/N_1)) = O(\log\log N) \tag{3.19}$$
and
$$\sum_{j=1}^{T} \frac{1}{(j \|j\alpha\|)^2} = O(\log T) = O(\log N). \tag{3.20}$$
By using (3.18)–(3.20) in (3.16), we have
$$\mathrm{Diag}(N) = \sum_{j=1}^{N} \frac{1}{8\pi^2 j^2 \sin^2(\pi j\alpha)} + O(\log\log N). \tag{3.21}$$
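Next we estimate the off-diagonal part, see (3.10) and (3.11). Assume $j \ne k$, then
$$\Omega(j,k) = \sum_{n=1}^{N} \cos((2n+1)\pi j\alpha) \cos((2n+1)\pi k\alpha) = \sum_{n=1}^{N} \frac{\cos((2n+1)\pi (j-k)\alpha) + \cos((2n+1)\pi (j+k)\alpha)}{2} =$$
$$= \frac{\sin(2\pi (N+1)(j-k)\alpha) - \sin(2\pi (j-k)\alpha)}{4 \sin(\pi (j-k)\alpha)} + \frac{\sin(2\pi (N+1)(j+k)\alpha) - \sin(2\pi (j+k)\alpha)}{4 \sin(\pi (j+k)\alpha)},$$
implying
$$|\Omega(j,k)| \le \min\left\{ \frac{1}{\|(j-k)\alpha\|},\ N^2 \|(j-k)\alpha\| \right\} + \min\left\{ \frac{1}{\|(j+k)\alpha\|},\ N^2 \|(j+k)\alpha\| \right\}.$$
Applying this in (3.10), we have the trivial upper bound
$$\mathrm{OffDiag}(N) = O(\Omega_1) + O(\Omega_2), \tag{3.22}$$
where
$$\Omega_1 = \sum_{1 \le k < j \le T} \frac{1}{j \|j\alpha\|\, k \|k\alpha\|} \min\left\{ \frac{1}{\|(j-k)\alpha\|},\ N^2 \|(j-k)\alpha\| \right\} \tag{3.23}$$
and
$$\Omega_2 = \sum_{1 \le k < j \le T} \frac{1}{j \|j\alpha\|\, k \|k\alpha\|} \min\left\{ \frac{1}{\|(j+k)\alpha\|},\ N^2 \|(j+k)\alpha\| \right\}. \tag{3.24}$$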
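The closed form of the cosine sum (3.11) used above rests on the telescoping identity $2 \sin\theta \cos((2n+1)\theta) = \sin((2n+2)\theta) - \sin(2n\theta)$. It can be checked numerically with the following minimal sketch (function names are ours; $\alpha$ is assumed irrational, so that the sines in the denominators do not vanish).

```python
import math


def cosine_sum_direct(j, k, alpha, N):
    """The sum (3.11), term by term."""
    return sum(math.cos((2 * n + 1) * math.pi * j * alpha)
               * math.cos((2 * n + 1) * math.pi * k * alpha)
               for n in range(1, N + 1))


def cosine_sum_closed(j, k, alpha, N):
    """Closed form of (3.11) for j != k, obtained by telescoping."""
    def part(m):
        theta = math.pi * m * alpha
        return (math.sin(2 * (N + 1) * theta) - math.sin(2 * theta)) / (4 * math.sin(theta))
    return part(j - k) + part(j + k)


if __name__ == "__main__":
    alpha = math.sqrt(2)
    print(cosine_sum_direct(5, 3, alpha, 50), cosine_sum_closed(5, 3, alpha, 50))
```

Both printed values agree up to rounding error.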
First we estimate $\Omega_1$. We begin with a straightforward corollary of Lemma 2.14:
$$\sum_{j=1}^{T} \frac{1}{j \|j\alpha\|} = O\big((\log T)^2\big) = O\big((\log N)^2\big). \tag{3.25}$$
If
$$\|(j-k)\alpha\| \ge \frac{(\log N)^4}{N}, \tag{3.26}$$
then by (3.23) and (3.25), the contribution of $\Omega_1$ under condition (3.26), denoted by $\Omega_1(3.26)$, can be estimated from above as follows:
$$\Omega_1(3.26) \le \left( \sum_{j=1}^{T} \frac{1}{j \|j\alpha\|} \right) \left( \sum_{k=1}^{T} \frac{1}{k \|k\alpha\|} \right) (\log N)^{-4} = O\big( (\log N)^2 (\log N)^2 (\log N)^{-4} \big) = O(1), \tag{3.27}$$
which is negligible.
Thus we can assume the complement of (3.26):
$$\|(j-k)\alpha\| < \frac{(\log N)^4}{N}. \tag{3.28}$$
Since $\alpha$ is badly approximable, (3.28) implies that the inequality
$$j - k \ge c_0 N (\log N)^{-4} \tag{3.29}$$
holds for some positive constant $c_0 = c_0(\alpha) > 0$. Since the factor $\min\{\ldots\}$ in $\Omega_1$ (see (3.23)) is clearly $\le 1$, we have the trivial upper bound
$$\Omega_1 \le \sum_{1 \le k < j \le T} \frac{1}{j \|j\alpha\|\, k \|k\alpha\|}. \tag{3.30}$$
If
$$j \|j\alpha\| \ge (\log N)^2, \tag{3.31}$$
then by using (3.25) and (3.30) we can estimate the corresponding contribution of $\Omega_1$ as follows:
$$\Omega_1(3.31) = O\big( (\log N)^{-2} (\log N)^2 \big) = O(1), \tag{3.32}$$
which is again negligible. We can thus assume that
$$j \|j\alpha\| < (\log N)^2, \tag{3.33}$$
and by a similar argument,
$$k \|k\alpha\| < (\log N)^2. \tag{3.34}$$
Since $j > k$, by (3.29) and (3.33),
$$\|j\alpha\| < \frac{(\log N)^2}{j} < \frac{(\log N)^2}{c_0 N (\log N)^{-4}} = O\big( (\log N)^6 / N \big). \tag{3.35}$$
By (3.28) and (3.35),
$$\|k\alpha\| \le \|(j-k)\alpha\| + \|j\alpha\| = O\big( (\log N)^6 / N \big), \tag{3.36}$$
implying (since $\alpha$ is badly approximable)
$$k > c_1 N (\log N)^{-6} \quad\text{with some constant } c_1 = c_1(\alpha) > 0. \tag{3.37}$$
We also need the simple fact
$$\sum_{\substack{A \le j \le B:\\ C \le j \|j\alpha\| \le D}} \frac{1}{j \|j\alpha\|} = O\!\left( \log\!\left(\frac{B}{A} + 1\right) \log\!\left(\frac{D}{C} + 1\right) \right), \tag{3.38}$$
which can be proved the same way as Lemma 2.14 (Pigeonhole Principle). We recall the trivial upper bound (3.30):
$$\Omega_1 \le \sum_{1 \le k < j \le T} \frac{1}{j \|j\alpha\|\, k \|k\alpha\|}. \tag{3.39}$$
We can clearly assume that (3.28), (3.33), (3.34) are all true (since the contribution of the rest is $O(1)$); combining these conditions with (3.29), (3.37), and using (3.38) in (3.39), we obtain
$$\Omega_1 = O\big( (\log\log N)^2 \big).$$
Similarly (see (3.24))
$$\Omega_2 = O\big( (\log\log N)^2 \big).$$
Thus by (3.22) we have
$$\mathrm{OffDiag}(N) = O\big( (\log\log N)^2 \big). \tag{3.40}$$
Finally, combining (3.6), (3.8), (3.21), and (3.40), Proposition 3.1 follows. □

If $\alpha$ is badly approximable, then the following inequality is obvious:
$$c_1 \log N < \sum_{j=1}^{N} \frac{1}{j^2 \sin^2(\pi j\alpha)} < c_2 \log N \tag{3.41}$$
with some appropriate constants $0 < c_1 = c_1(\alpha) < c_2 = c_2(\alpha)$. Indeed, to prove the lower bound in (3.41), we simply take the convergent denominators $j = q_i$, $i = 1, 2, 3, \ldots$, with $q_i \le N$ (where $p_i/q_i$ is the $i$th convergent of $\alpha$), and use the well-known approximation property $\|q_i \alpha\| < 1/q_i$. The upper bound in (3.41) is a straightforward corollary of (3.20). By (3.41) the main term in Proposition 3.1 is of order $\log N$; consequently, the error term $O((\log\log N)^2)$ is negligible.
3.2 Connection with Quadratic Fields (II)

For a general badly approximable $\alpha$ we cannot say more than (3.41):
$$c_1 \log N < \sum_{j=1}^{N} \frac{1}{j^2 \sin^2(\pi j\alpha)} < c_2 \log N,$$
but for quadratic irrationals we can prove much more.

Proposition 3.2. If $\alpha$ is a quadratic irrational, then
$$\sum_{j=1}^{N} \frac{1}{j^2 \sin^2(\pi j\alpha)} = c(\alpha) \log N + O(1), \tag{3.42}$$
where $c(\alpha) > 0$ is a positive constant depending only on $\alpha$, and also the bounded error term $O(1) = O_\alpha(1)$ depends only on $\alpha$.

Proof. Since Proposition 3.2 is somewhat similar to Proposition 2.20, it is not surprising that we can closely follow the proof of Proposition 2.20. For simplicity assume first that $\alpha = \sqrt{d}$ where $d \equiv 3 \pmod 4$ is a positive square-free integer. The next formula is a perfect analog of (2.235) (the proof is exactly the same):
$$\sum_{\substack{1 \le y \le N,\ 1 \le x \le N\sqrt{d}:\\ |x^2 - dy^2| \le m}} \frac{1}{(x^2 - dy^2)^2} = \sum_{1 \le A \le m} \frac{R_d(A) + R_d(-A)}{A^2} \left( \frac{\log(N/\sqrt{A})}{\log \eta_d} + O(1) \right), \tag{3.43}$$
which holds for any $1 < m < N$. In the application below we will use (3.43) with the choice $m \approx (\log N)^c$, where $c > 1$ is an absolute constant to be specified later, see (3.52). Similarly to (2.236), we divide the left-hand side of (3.43) into two parts:
$$\sum_{\substack{1 \le y \le N,\ 1 \le x \le N\sqrt{d}:\\ |x^2 - dy^2| \le m}} \frac{1}{(x^2 - dy^2)^2} = {\sum}_1 + {\sum}_2, \tag{3.44}$$
where
$${\sum}_1 = \sum_{\substack{1 \le y \le N,\ 1 \le x \le N\sqrt{d}:\\ |x^2 - dy^2| \le m,\ |x - y\sqrt{d}| < 1/2}} \frac{1}{(x^2 - dy^2)^2} \tag{3.45}$$
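and
$${\sum}_2 = \sum_{\substack{1 \le y \le N,\ 1 \le x \le N\sqrt{d}:\\ |x^2 - dy^2| \le m,\ |x - y\sqrt{d}| \ge 1/2}} \frac{1}{(x^2 - dy^2)^2}. \tag{3.46}$$
First we show that
$${\sum}_2 = O(1). \tag{3.47}$$
To prove (3.47), we use the factorization
$$|x^2 - dy^2| = |x + y\sqrt{d}|\, |x - y\sqrt{d}|,$$
which implies
$${\sum}_2 \le \sum_{\substack{x \ge 1,\ y \ge 1:\\ |x - y\sqrt{d}| \ge 1/2}} \frac{1}{(x + y\sqrt{d})^2} \cdot \frac{1}{(x - y\sqrt{d})^2} \le \sum_{\substack{x \ge 1,\ y \ge 1:\\ |x - y\sqrt{d}| \ge 1/2}} \frac{1}{x^2 + y^2} \cdot \frac{1}{(x - y\sqrt{d})^2} =$$
$$= \sum_{k=1}^{\infty} \sum_{\ell=1}^{\infty} \sum_{\substack{x \ge 1,\ y \ge 1:\ 2^k \le x^2 + y^2 < 2^{k+1},\\ \ell/2 \le |x - y\sqrt{d}| < (\ell+1)/2}} \frac{1}{x^2 + y^2} \cdot \frac{1}{(x - y\sqrt{d})^2} \le \sum_{k=1}^{\infty} 2^{-k}\, O\big(2^{k/2}\big) \left( \sum_{\ell=1}^{\infty} \ell^{-2} \right) = O\!\left( \sum_{k=1}^{\infty} 2^{-k/2} \right) = O(1),$$
proving (3.47).

Next we study $\sum_1$. We recall the vague approximation in (3.13): if $\|j\sqrt{d}\|$ is small then
$$\sin^2(\pi j\sqrt{d}) \approx \frac{\pi^2 (\ell^2 - dj^2)^2}{4dj^2}, \tag{3.48}$$
where $\ell = \ell(j, d)$ is the nearest integer to $j\sqrt{d}$. By using the short Taylor series $\sin(x) = x - x^3/6 + O(x^5)$ it is easy to turn the vague approximation in (3.48) into a precise equation. A simple calculation gives
$$\frac{1}{j^2 \sin^2(\pi j\sqrt{d})} - \frac{4d}{\pi^2 (\ell^2 - dj^2)^2} = O\big(j^{-3/2}\big), \tag{3.49}$$
assuming
$$j \|j\sqrt{d}\| < (\log j)^2. \tag{3.50}$$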
Furthermore, we need the fact that for any badly approximable $\alpha$
$$\sum_{\substack{1 \le j \le N:\\ j \|j\alpha\| \ge (\log j)^2}} \frac{1}{(j \|j\alpha\|)^2} = O(1), \tag{3.51}$$
which can be proved by a straightforward adaptation of the proof technique of Lemma 2.14 (Pigeonhole Principle). Combining (3.49)–(3.51) we have
$$\frac{\pi^2}{4d} \sum_{j=1}^{N} \frac{1}{j^2 \sin^2(\pi j\sqrt{d})} = \frac{\pi^2}{4d} \sum_{\substack{1 \le j \le N:\\ j \|j\sqrt{d}\| < (\log j)^2}} \frac{1}{j^2 \sin^2(\pi j\sqrt{d})} + O(1) =$$
$$= \sum_{\substack{1 \le j \le N:\\ |\ell^2 - dj^2| < 2\sqrt{d} (\log j)^2}} \frac{1}{(\ell^2 - dj^2)^2} + O(1) = \sum_{\substack{1 \le j \le N,\ 1 \le \ell \le N\sqrt{d}:\\ |\ell^2 - dj^2| < m}} \frac{1}{(\ell^2 - dj^2)^2} + O(1),$$
where $\ell = \ell(j, d)$ is the nearest integer to $j\sqrt{d}$ and
$$m = 2\sqrt{d} (\log N)^2. \tag{3.52}$$
Comparing this to the definition of $\sum_1$ in (3.45), and using (3.47), we have
$$\frac{\pi^2}{4d} \sum_{j=1}^{N} \frac{1}{j^2 \sin^2(\pi j\sqrt{d})} = {\sum}_1 + O(1) = {\sum}_1 + {\sum}_2 + O(1) = \sum_{1 \le A \le m} \frac{R_d(A) + R_d(-A)}{A^2} \left( \frac{\log(N/\sqrt{A})}{\log \eta_d} + O(1) \right), \tag{3.53}$$
where in the last step we used (3.43) and (3.44). Following (2.255) and (2.258) we have the equation
$$\sum_{1 \le A \le m} \frac{R_d(A) + R_d(-A)}{A^2} \left( \frac{\log(N/\sqrt{A})}{\log \eta_d} + O(1) \right) = \frac{\log N}{\log \eta_d} \sum_{1 \le A \le m} \frac{R_d(A) + R_d(-A)}{A^2} + O\!\left( \sum_{1 \le A \le m} \frac{(R_d(A) + R_d(-A)) \log A}{A^2} \right). \tag{3.54}$$
Again using (2.214), (2.243), and Abel's transformation (2.119), a routine calculation gives the following analog of (2.256): for any $m > 1$ we have
$$\sum_{1 \le A \le m} \frac{(R_d(A) + R_d(-A)) \log A}{A^2} = O(1). \tag{3.55}$$
Moreover, by repeating the argument in (2.230) and (2.231),
$$\sum_{A > m} \frac{R_d(A) + R_d(-A)}{A^2} = O\big(m^{-3/2}\big). \tag{3.56}$$
Combining (3.52) with (3.54)–(3.56), we have
$$\sum_{1 \le A \le m} \frac{R_d(A) + R_d(-A)}{A^2} \left( \frac{\log(N/\sqrt{A})}{\log \eta_d} + O(1) \right) = \frac{\log N}{\log \eta_d} \sum_{A=1}^{\infty} \frac{R_d(A) + R_d(-A)}{A^2} + O(1). \tag{3.57}$$
Finally, (3.53) and (3.57) imply Proposition 3.2 in the special case $\alpha = \sqrt{d}$ where $d \equiv 3 \pmod 4$ is a positive square-free integer.

The proof of the special case $\alpha = \sqrt{d}$ was based on Eq. (3.43), which is a consequence of the cyclic group structure of the solutions of Pell's equation $x^2 - dy^2 = 1$. The group structure of the solutions of the Pell equation is equivalent to the fact that the principal (binary quadratic) form $x^2 - dy^2$ has infinitely many nontrivial automorphisms (generated by a single automorphism). What is more, every (binary quadratic) form of discriminant $D > 0$ has infinitely many nontrivial automorphisms, and these are determined by the solutions of Pell's equation
$$t^2 - Du^2 = 4. \tag{3.58}$$
For a binary form $Ax^2 + Bxy + Cy^2$ with $D = B^2 - 4AC > 0$ ($D$ is not a perfect square), the automorphisms are given by the substitution
$$x = \frac{t - uB}{2}\, x_1 - Cu\, y_1, \qquad y = Au\, x_1 + \frac{t + uB}{2}\, y_1, \tag{3.59}$$
where $t$ and $u$ come from (3.58). The corresponding determinant is one:
$$\begin{vmatrix} (t - uB)/2 & -Cu \\ Au & (t + uB)/2 \end{vmatrix} = \frac{t^2 - u^2 B^2}{4} + ACu^2 = \frac{t^2 - Du^2}{4} = 1,$$
so (3.59) is a unimodular substitution. There are two trivial automorphisms, namely, the identity $x = x_1$, $y = y_1$ and the negative identity $x = -x_1$, $y = -y_1$; they correspond to the trivial solutions
$t = \pm 2$, $u = 0$ of (3.58). Pell's equation (3.58) has infinitely many nontrivial solutions, and all solutions are given by
$$\frac{t + u\sqrt{D}}{2} = \pm \left( \frac{t_0 + u_0 \sqrt{D}}{2} \right)^{n}, \tag{3.60}$$
where $(t_0, u_0)$ is the least positive solution (i.e., $t_0 > 0$, $u_0 > 0$ for which $u_0$ is least). Substitution (3.59) becomes more transparent if we start with the factorization
$$Ax^2 + Bxy + Cy^2 = A(x - \alpha y)(x - \alpha' y), \quad\text{where}\quad \alpha = \frac{-B + \sqrt{D}}{2A} \ \text{ and } \ \alpha' = \frac{-B - \sqrt{D}}{2A}. \tag{3.61}$$
The effect of the unimodular substitution (3.59) is expressed by the product formula
$$x - \alpha y = \frac{t - u\sqrt{D}}{2}\, (x_1 - \alpha y_1), \tag{3.62'}$$
$$x - \alpha' y = \frac{t + u\sqrt{D}}{2}\, (x_1 - \alpha' y_1). \tag{3.62''}$$
Since every quadratic irrational $\alpha$ can be written in the form (3.61), and the set of all automorphisms of the corresponding form $Ax^2 + Bxy + Cy^2$ is described by (3.60), (3.62') and (3.62''), we have the following analog of (3.53) and (3.57) for an arbitrary quadratic irrational $\alpha = (-B + \sqrt{D})/2A$:
$$\frac{\pi^2}{D} \sum_{j=1}^{N} \frac{1}{j^2 \sin^2(\pi j\alpha)} = \left( \sum_{\substack{(x,y) \ne (0,0):\\ \text{primary representations}}} \frac{1}{(Ax^2 + Bxy + Cy^2)^2} \right) \frac{\log N}{\log \eta} + \text{negligible}, \tag{3.63'}$$
where $\eta$ is the fundamental unit in $\mathbb{Q}(\sqrt{D})$. Moreover, the infinite sum
$$\sum_{\substack{(x,y) \ne (0,0):\\ \text{primary representations}}} \frac{1}{(Ax^2 + Bxy + Cy^2)^2} = c_0(\alpha) \tag{3.63''}$$
clearly defines a strictly positive constant $c_0(\alpha) > 0$. This completes the proof of Proposition 3.2. □
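The automorphisms (3.59) used in the proof are easy to test with exact integer arithmetic. The sketch below is an illustration with hypothetical helper names (`automorphism`, `apply_substitution`, `form_value`); it builds the substitution matrix from a solution $(t, u)$ of (3.58) and checks that the form value is unchanged. Note that $t \equiv uB \pmod 2$ follows from $t^2 = Du^2 + 4 \equiv B^2 u^2 \pmod 4$, so the divisions by 2 are exact.

```python
def form_value(A, B, C, x, y):
    """Value of the binary form Ax^2 + Bxy + Cy^2."""
    return A * x * x + B * x * y + C * y * y


def automorphism(A, B, C, t, u):
    """Matrix (p, q, r, s) of the substitution (3.59): x = p*x1 + q*y1, y = r*x1 + s*y1."""
    D = B * B - 4 * A * C
    if t * t - D * u * u != 4:
        raise ValueError("(t, u) does not solve t^2 - D u^2 = 4")
    return ((t - u * B) // 2, -C * u, A * u, (t + u * B) // 2)


def apply_substitution(M, x1, y1):
    p, q, r, s = M
    return p * x1 + q * y1, r * x1 + s * y1


if __name__ == "__main__":
    # Principal form x^2 - 2y^2 (D = 8) with the Pell solution 6^2 - 8*2^2 = 4.
    M = automorphism(1, 0, -2, 6, 2)
    x, y = apply_substitution(M, 5, 1)
    print(M, form_value(1, 0, -2, 5, 1), form_value(1, 0, -2, x, y))
```

In the usage example the two printed form values coincide, and the matrix has determinant one.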
3.2.1 A Convenient Special Case: When the Class Number Is One

The evaluation of the constant factor $c(\alpha)$ in Proposition 3.2 is particularly simple if $\alpha = \sqrt{p}$ where $p \equiv 3 \pmod 4$ is a prime and the class number $h(p)$ of the real quadratic field $\mathbb{Q}(\sqrt{p})$ is one. Note that, according to the mathematical tables, for more than half of the primes in the residue class $p \equiv 3 \pmod 4$ the class number is one, i.e., $h(p) = 1$. For example, there are 13 primes $p \equiv 3 \pmod 4$ with $p < 100$, and all but one have class number one. The only exception is $p = 79$ with class number $h(79) = 3$. (Unfortunately, there is no mathematical proof even for the much weaker conjecture that $h(p) = 1$ occurs for infinitely many primes.) Class number one (i.e., $h(p) = 1$) simply means that the principal form $x^2 - py^2$ is the only relevant quadratic form of discriminant $4p$. Therefore, if $h(p) = 1$ then
$$\sum_{A=1}^{\infty} \frac{R_p(A) + R_p(-A)}{A^2} = \zeta_K(2), \tag{3.64}$$
where $\zeta_K(s)$ is the Dedekind zeta function of $K = \mathbb{Q}(\sqrt{p})$. The value of the constant $\zeta_K(2)$ is crucial in computing the variance. Indeed, by (3.6) and (3.7), Proposition 3.1, (3.53), (3.57), and (3.64), the variance
$$V_{\sqrt{p}}(N) = \frac{1}{N} \sum_{n=1}^{N} \big( S_{\sqrt{p}}(n) - M_{\sqrt{p}}(N) \big)^2$$
equals
$$V_{\sqrt{p}}(N) = V_{\sqrt{p}}^*(N) + O\big( \log\log N \sqrt{\log N} \big) = \frac{p}{2\pi^4}\, \zeta_K(2)\, \frac{\log N}{\log \eta_p} + O\big( \log\log N \sqrt{\log N} \big). \tag{3.65'}$$
Note that the seemingly large error term $O(\log\log N \sqrt{\log N})$ in (3.65') is in fact negligible. Indeed, what we really need in our central limit theorem is the standard deviation, that is, we take the square root in (3.65'):
$$\big( V_{\sqrt{p}}(N) \big)^{1/2} = \frac{1}{\pi^2} \left( \frac{p\, \zeta_K(2)}{2 \log \eta_p} \right)^{1/2} \sqrt{\log N} + O(\log\log N), \tag{3.65''}$$
and here the error term $O(\log\log N)$ is clearly negligible.
3.2.2 The Class Number for Real Quadratic Fields: Illustrations

A very interesting (but admittedly very inefficient) way to compute the class number of real quadratic fields is to use Dirichlet's “finite formula” for the class number of real quadratic fields. For simplicity assume that $p$ is a prime with $p \not\equiv 1 \pmod 4$; then the formula is (of course $i = \sqrt{-1}$)
$$\eta_p^{h} = \prod_{n=1}^{4p-1} \big( 1 - e^{2\pi i n/4p} \big)^{-\chi_p(n)},$$
where $h = h(p)$ is the class number, $4p$ is the discriminant, $\eta_p = x_0 + y_0 \sqrt{p} > 1$ where $x = x_0$, $y = y_0$ is the least positive solution of Pell's equation $x^2 - py^2 = 1$, and $\chi_p(n)$ is the corresponding real character (mod $4p$), namely,
$$\chi_p(n) = (-1)^{(n-1)/2} \left( \frac{n}{p} \right) \quad\text{for odd } n$$
(using the Legendre symbol) and 0 for even $n$, assuming $p$ is odd, and finally $\chi_2(n)$ is a real character (mod 8) defined as $\chi_2(n) = 1$ or $-1$ depending on whether $n \equiv \pm 1$ or $\pm 3 \pmod 8$ and 0 for $n$ even.

For example, let $p = 2$, then $\eta_2 = 3 + 2\sqrt{2}$ (since $x = 3$, $y = 2$ is the least positive solution of $x^2 - 2y^2 = 1$), $e^{2\pi i/8} = (1 + i)/\sqrt{2} = \beta$, and using the formula,
$$(3 + 2\sqrt{2})^{h} = \frac{(1 - \beta^3)(1 - \beta^5)}{(1 - \beta)(1 - \beta^7)} = \frac{1 - \beta^3 - \beta^{-3} + 1}{1 - \beta - \beta^{-1} + 1} = \frac{2 + \sqrt{2}}{2 - \sqrt{2}} = 3 + 2\sqrt{2},$$
implying that the class number of $\mathbb{Q}(\sqrt{2})$ is one: $h(2) = 1$.

There are more efficient methods to compute the class number, and they are all based on some definition of reduced forms. Already Gauss was able to compute the class number for hundreds of discriminants, and now by using computers, the threshold is in the millions. Number theory books routinely contain tables with the class number of all discriminants $|D| < 500$, see, e.g., the classic Borevich–Shafarevich: Number Theory. Here we briefly outline one possible approach, which is based on continued fractions. Given a quadratic form $Q(x, y) = ax^2 + bxy + cy^2$ with integral coefficients and discriminant $D = b^2 - 4ac > 0$, we call the root
$$\varphi = \frac{-b + \sqrt{D}}{2a} \quad\text{of } ax^2 + bx + c = 0$$
the “first root” (where the square root has positive sign) and distinguish it from its conjugate
$$\psi = \frac{-b - \sqrt{D}}{2a}, \quad\text{the other root of } ax^2 + bx + c = 0,$$
that we call the “second root” (where the square root has negative sign). Clearly $Q(x, y) = a(x - \varphi y)(x - \psi y)$. The form $Q(x, y)$ is called reduced if
$$|\varphi| < 1, \quad |\psi| > 1, \quad \varphi\psi < 0. \tag{3.66}$$
The following definition is clearly equivalent:
$$0 < b < \sqrt{D}, \qquad 0 < \sqrt{D} - b < 2|a| < \sqrt{D} + b. \tag{3.67'}$$
Another equivalent definition is
$$0 < b < \sqrt{D}, \qquad 0 < \sqrt{D} - b < 2|c| < \sqrt{D} + b. \tag{3.67''}$$
Indeed, the first root $\varphi$ and the first coefficient $a$ have the same sign, so $4ac = b^2 - D < 0$ implies that the last coefficient $c$ has opposite sign; finally, note that
$$0 < D - b^2 = (\sqrt{D} - b)(\sqrt{D} + b) = 4|ac|.$$
The equivalence of (3.67') and (3.67'') yields that $ax^2 + bxy + cy^2$ and $cx^2 + bxy + ay^2$ are reduced at the same time. The key fact here is that every binary quadratic form of discriminant $D > 0$ (“indefinite form”) is equivalent to a reduced form, see, e.g., [Za4]. (Since the quadratic field $\mathbb{Q}(\sqrt{D})$ is basically equivalent to the binary quadratic forms of discriminant $D$, it suffices to deal with the quadratic forms.) Combining this key fact with (3.67'), we obtain that the list of candidates is finite, i.e., the class number is finite.

Example 3.3. Let $d = 19$, then the discriminant $D = 4 \cdot 19 = 76 = b^2 - 4ac$, implying that $b$ is even. By condition (3.67'),
$$0 < b < \sqrt{D} = \sqrt{76} < 9, \quad\text{so } b = 2, 4, 6, 8,$$
$$\text{and, respectively, } \tfrac{1}{4}(D - b^2) = |ac| = 18, 15, 10, 3.$$
By using the second half of (3.67'),
$$0 < \frac{\sqrt{D} - b}{2} < |a| < \frac{\sqrt{D} + b}{2},$$
which gives the following list of reduced forms with discriminant $D = 76$:
$$\pm 3x^2 + 4xy \mp 5y^2,\quad \pm 5x^2 + 4xy \mp 3y^2,\quad \pm 2x^2 + 6xy \mp 5y^2,$$
$$\pm 5x^2 + 6xy \mp 2y^2,\quad \pm x^2 + 8xy \mp 3y^2,\quad \pm 3x^2 + 8xy \mp y^2.$$
The list has 12 forms, but the class number of $\mathbb{Q}(\sqrt{19})$ is less than 12. To find the exact number of pairwise inequivalent forms of discriminant $D > 0$ (i.e., the class number) we can make use of the continued fraction (see, e.g., Chap. 13 in Zagier [Za4]). Consider the “first and second roots” of the form $3x^2 + 4xy - 5y^2$ (which happens to be the first on the list):
$$\varphi = \frac{-2 + \sqrt{19}}{3}, \qquad \psi = \frac{-2 - \sqrt{19}}{3}.$$
Clearly
$$\frac{1}{\varphi} > 1 \quad\text{and}\quad -1 < \frac{1}{\psi} < 0.$$
It is well known that under this condition the continued fraction for $1/\varphi$ is purely periodic; in fact, we have
$$\frac{1}{\varphi} = \frac{3}{\sqrt{19} - 2} = \frac{\sqrt{19} + 2}{5} = [\overline{1; 3, 1, 2, 8, 2}].$$
The period $1, 3, 1, 2, 8, 2$ of $1/\varphi$ is basically the same as the period of $\sqrt{19}$:
$$\sqrt{19} = [4; \overline{2, 1, 3, 1, 2, 8}].$$
Here “basically the same” means that the two periods, $1, 3, 1, 2, 8, 2$ and $2, 1, 3, 1, 2, 8$, can be transformed into each other by a cyclic permutation. Checking the remaining 11 forms on the list, it is easy to see that each has a period that can be transformed into the period of $\sqrt{19}$ by a cyclic permutation. Thus the class number of $\mathbb{Q}(\sqrt{19})$ is one.

Example 3.4. Let $d = 79$, then the discriminant $D = 4 \cdot 79 = 316 = b^2 - 4ac$, implying that $b$ is even. By condition (3.67'),
$$0 < b < \sqrt{D} = \sqrt{316} < 18, \quad\text{so } b = 2, 4, 6, 8, 10, 12, 14, 16,$$
$$\text{and, respectively, } \tfrac{1}{4}(D - b^2) = |ac| = 78, 75, 70, 63, 54, 43, 30, 15.$$
By using the second half of (3.67'),
$$0 < \frac{\sqrt{D} - b}{2} < |a| < \frac{\sqrt{D} + b}{2},$$
which gives the following long list of reduced forms with discriminant $D = 316$:
$$\pm 7x^2 + 6xy \mp 10y^2,\quad \pm 10x^2 + 6xy \mp 7y^2,\quad \pm 7x^2 + 8xy \mp 9y^2,\quad \pm 9x^2 + 8xy \mp 7y^2,$$
$$\pm 6x^2 + 10xy \mp 9y^2,\quad \pm 9x^2 + 10xy \mp 6y^2,\quad \pm 2x^2 + 14xy \mp 15y^2,\quad \pm 15x^2 + 14xy \mp 2y^2,$$
$$\pm 3x^2 + 14xy \mp 10y^2,\quad \pm 10x^2 + 14xy \mp 3y^2,\quad \pm 5x^2 + 14xy \mp 6y^2,\quad \pm 6x^2 + 14xy \mp 5y^2,$$
$$\pm x^2 + 16xy \mp 15y^2,\quad \pm 15x^2 + 16xy \mp y^2,\quad \pm 3x^2 + 16xy \mp 5y^2,\quad \pm 5x^2 + 16xy \mp 3y^2.$$
The list has 32 forms, but the class number of $\mathbb{Q}(\sqrt{79})$ is much smaller than 32. To find the exact value of the class number, again we make use of the continued fraction. Consider the “first and second roots” of the form $7x^2 + 6xy - 10y^2$ on the list:
$$\varphi = \frac{-3 + \sqrt{79}}{7}, \qquad \psi = \frac{-3 - \sqrt{79}}{7}.$$
Clearly
$$\frac{1}{\varphi} > 1 \quad\text{and}\quad -1 < \frac{1}{\psi} < 0,$$
so the continued fraction for $1/\varphi$ is purely periodic:
$$\frac{1}{\varphi} = \frac{7}{\sqrt{79} - 3} = \frac{\sqrt{79} + 3}{10} = [\overline{1; 5, 3, 2, 1, 1}].$$
The period $1, 5, 3, 2, 1, 1$ of $1/\varphi$ cannot be obtained from the period of $\sqrt{79}$ by cyclic permutation:
$$\sqrt{79} = [8; \overline{1, 7, 1, 16}].$$
This implies that the class number is $\ge 2$. The form $x^2 + 16xy - 15y^2$ on the list has the following “first and second roots”:
$$\varphi = -8 + \sqrt{79}, \qquad \psi = -(8 + \sqrt{79}).$$
Clearly
$$\frac{1}{\varphi} > 1 \quad\text{and}\quad -1 < \frac{1}{\psi} < 0,$$
so the continued fraction for $1/\varphi$ is purely periodic:
$$\frac{1}{\varphi} = [\overline{1; 7, 1, 16}].$$
The period $1, 7, 1, 16$ of $1/\varphi$ is exactly the same as the period of $\sqrt{79} = [8; \overline{1, 7, 1, 16}]$. Next consider the form $3x^2 + 14xy - 10y^2$ on the list; it has the following “first and second roots”:
$$\varphi = \frac{-7 + \sqrt{79}}{3}, \qquad \psi = \frac{-7 - \sqrt{79}}{3}.$$
Again
$$\frac{1}{\varphi} > 1 \quad\text{and}\quad -1 < \frac{1}{\psi} < 0,$$
so the continued fraction for $1/\varphi$ is purely periodic:
$$\frac{1}{\varphi} = \frac{3}{\sqrt{79} - 7} = \frac{\sqrt{79} + 7}{10} = [\overline{1; 1, 1, 2, 3, 5}].$$
Despite the fact that the reverse $5, 3, 2, 1, 1, 1$ of this period can be transformed into the previous period $1, 5, 3, 2, 1, 1$ by some cyclic permutation, the three periods
$$1, 5, 3, 2, 1, 1 \quad\text{and}\quad 1, 7, 1, 16 \quad\text{and}\quad 1, 1, 1, 2, 3, 5 \tag{3.68}$$
are substantially different (i.e., neither period can be transformed into another by a cyclic permutation). Moreover, checking the remaining forms on the list, it is easy to see that each has a period that can be obtained by some cyclic permutation from one of the three periods in (3.68). Thus the class number of $\mathbb{Q}(\sqrt{79})$ is 3.

These examples illustrate an effective method (studying the periods of continued fractions) to determine the class number of positive discriminants in general. Let's now return to (3.65'), and in particular to the value of $\zeta_K(2)$.
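The procedure of Examples 3.3 and 3.4 can be sketched in code as follows. This is an illustration under stated assumptions, not a definitive implementation: we assume $D > 0$ is a non-square discriminant for which every form is primitive (as for $D = 76$ and $D = 316$), we list only the forms with $a > 0$ (the forms $\pm$ share the same period), we expand $1/\varphi = (b + \sqrt{D})/(2|c|)$ with the standard integer recurrence for quadratic surds, and we count periods up to cyclic permutation. All function names are ours.

```python
from math import gcd, isqrt


def reduced_forms_positive_a(D):
    """Primitive forms (a, b, c), a > 0, b^2 - 4ac = D, reduced in the sense of (3.67')."""
    r = isqrt(D)
    if r * r == D:
        raise ValueError("D must not be a perfect square")
    forms = []
    for b in range(1, r + 1):                      # 0 < b < sqrt(D)
        if (D - b * b) % 4 != 0:
            continue
        ac = (D - b * b) // 4                      # equals a * |c|
        for a in range(1, (r + b) // 2 + 1):
            # sqrt(D) - b < 2a < sqrt(D) + b, tested with integers
            if ac % a == 0 and 2 * a + b > r and 2 * a - b <= r:
                c = -(ac // a)
                if gcd(gcd(a, b), c) == 1:
                    forms.append((a, b, c))
    return forms


def period_of_inverse_first_root(D, form):
    """Period of the purely periodic continued fraction of 1/phi = (b + sqrt(D)) / (2|c|)."""
    _, b, c = form
    r = isqrt(D)
    P0, Q0 = b, -2 * c
    P, Q = P0, Q0
    digits = []
    while True:
        q = (P + r) // Q                           # floor((P + sqrt(D)) / Q) for Q > 0
        digits.append(q)
        P = q * Q - P
        Q = (D - P * P) // Q
        if (P, Q) == (P0, Q0):
            return tuple(digits)


def canonical_rotation(period):
    """Smallest cyclic rotation, so that cyclically equal periods compare equal."""
    return min(period[i:] + period[:i] for i in range(len(period)))


def class_number_by_periods(D):
    forms = reduced_forms_positive_a(D)
    return len({canonical_rotation(period_of_inverse_first_root(D, f)) for f in forms})


if __name__ == "__main__":
    for D in (76, 316):
        print(D, len(reduced_forms_positive_a(D)), class_number_by_periods(D))
```

For $D = 76$ and $D = 316$ the usage loop finds 6 and 16 forms with $a > 0$ (half of the 12 and 32 forms listed above) and reproduces the class numbers 1 and 3.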
3.2.3 The Dedekind Zeta Function at s = 2: A Formula Involving Characters

Our goal is to supply an explicit evaluation of $\zeta_K(2)$ in the form of a finite character sum, see Proposition 3.6. Since it has a relatively short and elementary proof, we decided to include it. It illustrates the kind of number-theoretic arguments that are needed here. Proposition 3.6 is based on the well-known product formula
$$\zeta_K(s) = \zeta(s) L(s, \chi_D), \tag{3.69}$$
where $\zeta(s)$ is the Riemann zeta function and $L(s, \chi_D)$ is the Dirichlet L-function of the real Dirichlet character $\chi_D$ corresponding to discriminant $D$ (= the discriminant of $K$):
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} \quad\text{and}\quad L(s, \chi_D) = \sum_{n=1}^{\infty} \frac{\chi_D(n)}{n^s}.$$
We know since Euler that
$$\zeta(2) = \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}.$$
The difficult part is to determine $L(2, \chi_D)$.

Lemma 3.5. For positive discriminant $D$ we have
$$L(2, \chi_D) = \frac{\pi^2}{D^{5/2}} \sum_{n=1}^{D-1} \chi_D(n)\, n^2. \tag{3.70}$$
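Formula (3.70) can be sanity-checked numerically in the simplest case. The sketch below assumes that $D = p$ is a prime with $p \equiv 1 \pmod 4$, so that $\chi_D$ is the Legendre symbol (computed here by Euler's criterion); the function names and the truncation point of the series are our choices.

```python
import math


def legendre(n, p):
    """Legendre symbol (n/p) for an odd prime p, via Euler's criterion."""
    n %= p
    if n == 0:
        return 0
    return 1 if pow(n, (p - 1) // 2, p) == 1 else -1


def l2_finite(p):
    """Right-hand side of (3.70) for D = p, a prime with p = 1 (mod 4)."""
    s = sum(legendre(n, p) * n * n for n in range(1, p))
    return math.pi ** 2 * s / p ** 2.5


def l2_series(p, terms=20000):
    """Partial sum of the series L(2, chi) = sum chi(n) / n^2."""
    chi = [legendre(r, p) for r in range(p)]       # table of character values
    return sum(chi[n % p] / (n * n) for n in range(1, terms + 1))


if __name__ == "__main__":
    for p in (5, 13, 17):
        print(p, l2_finite(p), l2_series(p))
```

For each prime in the usage loop the finite sum and the truncated series agree to several decimal places.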
Proof. Since the quadratic field $K$ is real, its character $\chi_D(n)$ is even, i.e., $\chi_D(-n) = \chi_D(n)$; thus we have
$$L(2, \chi_D) = \sum_{n=1}^{\infty} \chi_D(n)\, n^{-2} = \frac{1}{2} \sum_{a=1}^{D-1} \chi_D(a)\, \varphi(a, D), \tag{3.71}$$
where
$$\varphi(a, D) = \sum_{n \equiv a \ (\mathrm{mod}\ D)} n^{-2}, \tag{3.72}$$
and the summation in (3.72) is extended over all integers (positive and negative). We clearly have
$$\varphi(a, D) = \sum_{m \in \mathbb{Z}} (mD + a)^{-2} = D^{-2} f(a/D),$$
where
$$f(x) = \sum_{m \in \mathbb{Z}} \frac{1}{(m + x)^2}.$$
It is not too hard to see that
$$f(x) = \frac{\pi^2}{\sin^2(\pi x)}. \tag{3.73}$$
Indeed, we have the well-known formula
$$\frac{\pi}{\tan(\pi x)} = \sum_{m \in \mathbb{Z}} \frac{1}{m + x},$$
and taking derivative of both sides, (3.73) follows. Combining (3.71)–(3.73) we already obtain a representation of $L(2, \chi_D)$ in the form of a finite sum, but because of the denominators $\sin^2(\pi a/D)$, the evaluation of this sum is very inconvenient for large $D$. To obtain the elegant/convenient formula (3.70), we involve a Fourier expansion and the so-called Gauss sum.

Let $F(x) = 0$ if $x$ is an integer, and let $F(b/D) = f(b/D)$ for all $b \in \mathbb{Z}$ when $b$ is not divisible by $D$. The function $F(b/D)$ is periodic in the integral variable $b$ with period $D$ and therefore has a finite Fourier expansion
$$F(b/D) = \sum_{n=0}^{D-1} \gamma_n\, e^{2\pi i n b/D}. \tag{3.74}$$
It is easy to determine the constant term $\gamma_0$ in (3.74):
$$\gamma_0 = \frac{1}{D} \sum_{b=0}^{D-1} F(b/D) = \frac{1}{D} \sum_{b=1}^{D-1} f(b/D) = \frac{D^2}{D} \sum_{b=1}^{D-1} \sum_{m \in \mathbb{Z}} (mD + b)^{-2} = 2D \sum_{\substack{n \ge 1:\\ n \not\equiv 0 \ (\mathrm{mod}\ D)}} n^{-2} =$$
$$= 2D \left( \sum_{n \ge 1} n^{-2} - D^{-2} \sum_{n \ge 1} n^{-2} \right) = 2D (1 - D^{-2}) \frac{\pi^2}{6} = \frac{\pi^2 (D - 1)(D + 1)}{3D}. \tag{3.75}$$
To determine a general coefficient $\gamma_n$, $1 \le n < D$, the standard recipe is to multiply (3.74) by $e^{-2\pi i n b/D}$ and take the sum over $b = 0, 1, \ldots, D - 1$:
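$$\frac{1}{D} \sum_{b=1}^{D-1} f(b/D)\, e^{-2\pi i n b/D} = \frac{1}{D} \sum_{b=0}^{D-1} F(b/D)\, e^{-2\pi i n b/D} = \frac{1}{D} \sum_{m=0}^{D-1} \gamma_m \sum_{b=0}^{D-1} e^{2\pi i (m - n) b/D} = \gamma_n.$$
Since $f(x)$ is an even function (see (3.73)), we have
$$\gamma_n = \frac{1}{2D} \sum_{b=1}^{D-1} f(b/D) \big( e^{2\pi i n b/D} + e^{-2\pi i n b/D} \big) = -\frac{2\pi^2}{D} \sum_{b=1}^{D-1} \frac{e^{2\pi i n b/D} + e^{-2\pi i n b/D}}{(e^{\pi i b/D} - e^{-\pi i b/D})^2}. \tag{3.76}$$
Write $y = y(b) = e^{\pi i b/D}$; then, motivated by (3.76), we study the rational function
$$\frac{y^{2n} + y^{-2n} - 2}{(y - y^{-1})^2} = \frac{y^{2n} - 1}{y^2 - 1} \cdot \frac{1 - y^{-2n}}{1 - y^{-2}} = (1 + y^2 + y^4 + \cdots + y^{2n-2})(1 + y^{-2} + y^{-4} + \cdots + y^{-2n+2}) =$$
$$= n + (n-1)(y^2 + y^{-2}) + (n-2)(y^4 + y^{-4}) + (n-3)(y^6 + y^{-6}) + \cdots + (y^{2n-2} + y^{-2n+2}). \tag{3.77}$$
By using (3.77), we can easily evaluate the Fourier coefficient $\gamma_n$ with $1 \le n < D$:
$$\gamma_n = -\frac{2\pi^2}{D}\, n(D - n) + \gamma_0. \tag{3.78}$$
Indeed, for $1 \le k < D$ we have
$$\sum_{b=1}^{D-1} e^{2\pi i k b/D} = -1 + \frac{1 - e^{2\pi i k}}{1 - e^{2\pi i k/D}} = -1. \tag{3.79}$$
Using (3.79) in (3.77), we have with $y = y(b) = e^{\pi i b/D}$:
$$\sum_{b=1}^{D-1} \frac{y^{2n} + y^{-2n} - 2}{(y - y^{-1})^2} = (D - 1)n - 2\big( (n-1) + (n-2) + \cdots + 1 \big) = (D - 1)n - n(n - 1) = n(D - n). \tag{3.80}$$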
On the other hand, by (3.76), D1 X bD1
D1 y 2n C y 2n 2 2 n D 1 X D D C 2 1 2 2 2 .y y / 2 2 sin .b=D/ bD1
D
n D 0 D C : 2 2 2 2
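Combining this with (3.80), (3.78) follows. Returning to (3.71)–(3.74), we have
$$L(2,\chi_D)=\frac12\sum_{a=1}^{D-1}\chi_D(a)D^{-2}f(a/D)=\frac{1}{2D^2}\sum_{a=1}^{D-1}\chi_D(a)\Big(\sum_{n=0}^{D-1}\gamma_n e^{2\pi ina/D}\Big)=$$
$$=\frac{1}{2D^2}\sum_{a=1}^{D-1}\chi_D(a)\Big(\sum_{n=0}^{D-1}\Big(-\frac{2\pi^2}{D}n(D-n)+\gamma_0\Big)e^{2\pi ina/D}\Big)=$$
$$=-\frac{\pi^2}{D^3}\sum_{a=1}^{D-1}\chi_D(a)\Big(\sum_{n=0}^{D-1}n(D-n)e^{2\pi ina/D}\Big)=-\frac{\pi^2}{D^3}\sum_{n=0}^{D-1}n(D-n)\Big(\sum_{a=1}^{D-1}\chi_D(a)e^{2\pi ina/D}\Big). \tag{3.81}$$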
Here the sum
$$S(n,D)=\sum_{a=1}^{D-1}\chi_D(a)e^{2\pi ina/D}$$
is the famous Gauss sum, which shows up everywhere in algebraic number theory. We need two facts about the Gauss sum.

Fact 1: $S(n,D)=\chi_D(n)S(1,D)$, where of course
$$S(1,D)=\sum_{a=1}^{D-1}\chi_D(a)e^{2\pi ia/D}.$$

Fact 2: $S(1,D)=\sqrt{D}$.
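Both facts can be checked numerically for small discriminants. The sketch below is only an illustration: the tables in `CHI` are assumptions supplied here (the Legendre symbol modulo 5, and the character modulo 8 that equals $1$ at $\pm1$ and $-1$ at $\pm3$), and the helper names are hypothetical.

```python
import cmath
import math

# Assumed character tables: values on residues coprime to D, zero elsewhere.
CHI = {
    5: {1: 1, 2: -1, 3: -1, 4: 1},   # Legendre symbol modulo 5
    8: {1: 1, 3: -1, 5: -1, 7: 1},   # even real character modulo 8
}

def chi(D: int, n: int) -> int:
    return CHI[D].get(n % D, 0)

def gauss_sum(n: int, D: int) -> complex:
    """S(n, D) = sum over a = 1..D-1 of chi_D(a) * exp(2*pi*i*n*a/D)."""
    return sum(chi(D, a) * cmath.exp(2j * math.pi * n * a / D) for a in range(1, D))

for D in CHI:
    assert abs(gauss_sum(1, D) - math.sqrt(D)) < 1e-9                 # Fact 2
    for n in range(D):
        assert abs(gauss_sum(n, D) - chi(D, n) * math.sqrt(D)) < 1e-9  # Fact 1
```

Note that the check of Fact 1 includes the values of $n$ that share a factor with $D$, where both sides vanish.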
Note that the proof of Fact 1 is easy if $n$ and $D$ are relatively prime: indeed, then $\chi_D^2(n)=1$, and so
$$S(n,D)=\chi_D(n)\sum_{a=1}^{D-1}\chi_D(an)e^{2\pi ina/D}=\chi_D(n)S(1,D),$$
since the numbers $an$, $a=1,2,\ldots,D-1$, run through all nonzero residues modulo $D$. If $n$ and $D$ have a common divisor $\ge2$, then both sides of the equality in Fact 1 are equal to 0; the proof of this case requires some extra work that we skip. The proof of Fact 2 is far from easy; we have to refer to the textbooks. Now using Facts 1 and 2 in (3.81), we obtain
$$L(2,\chi_D)=-\frac{\pi^2}{D^3}\sum_{n=0}^{D-1}n(D-n)\chi_D(n)\sqrt{D}. \tag{3.82}$$
By using the fact $\chi_D(D-n)=\chi_D(n)$, we can simplify (3.82):
$$\sum_{n=1}^{D-1}n\chi_D(n)=\frac12\sum_{n=1}^{D-1}\big(n\chi_D(n)+(D-n)\chi_D(D-n)\big)=\frac12\sum_{n=1}^{D-1}(n+D-n)\chi_D(n)=\frac D2\sum_{n=1}^{D-1}\chi_D(n)=0.$$
Using this in (3.82), formula (3.70) follows, and the proof of Lemma 3.5 is complete. $\square$

Returning to (3.69), and applying Lemma 3.5 with $K=\mathbb{Q}(\sqrt d)$, where $D$ is the discriminant, we obtain the following result.

Proposition 3.6. We have
$$\zeta_K(2)=\zeta(2)L(2,\chi_D)=\frac{\pi^2}{6}\cdot\frac{\pi^2}{D^{5/2}}\sum_{n=1}^{D-1}\chi_D(n)n^2=\frac{\pi^4}{6D^{5/2}}\sum_{n=1}^{D-1}\chi_D(n)n^2. \tag{3.83}$$
This is a well-known “folklore” result (as far as we know, it goes back to Hecke, Klingen, and Siegel, and was simplified by others). For example, with $d=5$ we have $D=5$ and $\chi_D(n)$ is the classical quadratic residue symbol (“Legendre symbol”), so for $K=\mathbb{Q}(\sqrt5)$ formula (3.83) gives
$$\zeta_K(2)=\frac{\pi^4}{150\sqrt5}\,(1^2-2^2-3^2+4^2)=\frac{2\pi^4}{75\sqrt5}. \tag{3.84}$$
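The value (3.84) can be cross-checked in two ways: by evaluating the finite sum in (3.83), and by truncating the Dirichlet series of $\zeta(2)L(2,\chi_5)$. The following Python sketch is an illustration under these assumptions; the function names and the truncation point are choices made here, not taken from the text.

```python
import math

def legendre_mod5(n: int) -> int:
    """Quadratic residue symbol modulo 5."""
    r = n % 5
    return 0 if r == 0 else (1 if r in (1, 4) else -1)

def zeta_K_at_2(D: int, chi) -> float:
    """Right-hand side of (3.83): pi^4 / (6 D^(5/2)) * sum of chi(n) n^2."""
    s = sum(chi(n) * n * n for n in range(1, D))
    return math.pi ** 4 * s / (6 * D ** 2.5)

finite_sum_value = zeta_K_at_2(5, legendre_mod5)
closed_form = 2 * math.pi ** 4 / (75 * math.sqrt(5))        # (3.84)

# Independent check: zeta(2) * L(2, chi_5) with a truncated Dirichlet series.
L_partial = sum(legendre_mod5(n) / (n * n) for n in range(1, 200001))
series_value = math.pi ** 2 / 6 * L_partial

assert abs(finite_sum_value - closed_form) < 1e-12
assert abs(series_value - closed_form) < 1e-4
```

The tolerance in the last line reflects only the truncation of the series, whose tail is of order $10^{-5}$.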
3.2.4 An Alternative Formula Due to Siegel: Proposition 3.7

Proposition 3.6 is an elegant, satisfying result, but there is a more efficient way to evaluate the Dedekind zeta function $\zeta_K(s)$ at $s=2$. Since the proof is much more difficult than that of Proposition 3.6, we have to skip it and just briefly state the result itself. This new approach is based on two deep arithmetic facts. The first fact is the so-called functional equation of the Dedekind zeta function, applied at the special value $s=2$:
$$\zeta_K(-1)=\frac{1}{4\pi^4}(4p)^{3/2}\zeta_K(2) \tag{3.85}$$
(note that $4p$ is the discriminant). The second fact is a remarkable formula that implies that $60\,\zeta_K(-1)$ is always an integer(!):
$$\zeta_K(-1)=\frac{1}{60}\sum_{\substack{b^2+ac=p:\\ a>0,\ c>0}}a, \tag{3.86}$$
where the sum is over all ways of writing $p=b^2+ac$ with $a$, $c$ positive integers (the integer $b$ can be positive, negative, or zero). Formula (3.86) is due to Siegel (see, e.g., Zagier [Za2]), and it gives a fast way of computing $\zeta_K(2)$. Combining (3.85) and (3.86), we have
$$\zeta_K(2)=\frac{\pi^4}{120p^{3/2}}\sum_{\substack{b^2+ac=p:\\ a>0,\ c>0}}a. \tag{3.87}$$
Applying (3.87) in (3.65′), we have
$$V_{\sqrt p}(N)=\frac1N\sum_{n=1}^{N}\big(S_{\sqrt p}(n)-M_{\sqrt p}(N)\big)^2=\Bigg(\frac{1}{240\sqrt p}\sum_{\substack{b^2+ac=p:\\ a>0,\ c>0}}a\Bigg)\frac{\log N}{\log\eta_p}+O\big(\log\log N\sqrt{\log N}\big), \tag{3.88}$$
where $p\equiv3$ (mod 4) is a prime, the class number $h(p)=1$, and $\eta_p$ is the fundamental unit in $K=\mathbb{Q}(\sqrt p)$, i.e., $\eta_p=t_0+u_0\sqrt p$ where $(t_0,u_0)$ is the least positive solution of Pell's equation $t^2-pu^2=1$.

Remark. It seems very likely that (3.88) holds for any $\alpha=\sqrt d$ where $d\ge2$ is a square-free integer (generalizing the special case $d=p\equiv3$ (mod 4) prime with class number $h(p)=1$; of course, for arbitrary $d\ge2$ the factor $\eta_p$ should be replaced by $\eta_d$, the fundamental unit in $K=\mathbb{Q}(\sqrt d)$). We call it a conjecture and challenge the experts of algebraic number theory to prove it. This conjecture of mine on the variance is clearly motivated by the results that we know about the expectation. We recall that at the beginning of Sect. 2.1 we explained how a deep result in algebraic number theory (the Hirzebruch–Meyer–Zagier class number formula, see (2.19)) implies Proposition 2.1 in the special case $\alpha=\sqrt p$, where $p\equiv3$ (mod 4) is a prime $>3$ and $h(p)=1$. But Proposition 2.1 is a far more general result, which holds for any real $\alpha$ (not just for the special quadratic irrationals $\alpha=\sqrt p$). Similarly, here we derived our variance formula (3.88) from another deep result in algebraic number theory (Siegel's formula (3.86)), and it is reasonable to expect that (3.88) has a far-reaching generalization (something like Proposition 2.1), far beyond the reach of Siegel's formula. Unfortunately, we have no clue what this (hypothetical) generalization may look like; perhaps the reader can help me. What we certainly know is that Eq. (3.88) remains true for any positive $d\equiv2$ (mod 4) if the class number of $\mathbb{Q}(\sqrt d)$ is one; for example, this happens for $d=2,6,14,22$.

Let $f(d)$ denote the critical sum in (3.88):
$$f(d)=\sum_{\substack{b^2+ac=d:\\ a>0,\ c>0}}a. \tag{3.89}$$
For example, we have
$$f(2)=5,\quad f(3)=10,\quad f(6)=30,\quad f(7)=40. \tag{3.90}$$
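The values in (3.90) are quick to reproduce by brute force. For a fixed integer $b$ with $b^2<d$, the pairs $(a,c)$ with $ac=d-b^2$ contribute the sum of divisors of $d-b^2$, and $b$ and $-b$ contribute equally. The sketch below is an illustration added here; the helper names `sigma` and `f` are hypothetical.

```python
def sigma(n: int) -> int:
    """Sum of the positive divisors of n."""
    return sum(a for a in range(1, n + 1) if n % a == 0)

def f(d: int) -> int:
    """Critical sum (3.89): sum of a over all b^2 + a*c = d with a, c > 0."""
    total, b = 0, 0
    while b * b < d:
        # b = 0 occurs once; every other b occurs together with -b
        total += (1 if b == 0 else 2) * sigma(d - b * b)
        b += 1
    return total

print([f(d) for d in (2, 3, 6, 7)])   # prints [5, 10, 30, 40]
```

Since only $O(\sqrt d)$ values of $b$ occur, this is indeed a fast way to evaluate the sum, as the text remarks about Siegel's formula.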
By (3.88),
$$\lim_{N\to\infty}\frac{V_{\sqrt d}(N)}{\log N}=c(d)=\frac{f(d)}{240\sqrt d\,\log\eta_d}. \tag{3.91}$$
For example, by (3.90) and (3.91) we have
$$c(2)=\frac{1}{48\sqrt2\,\log(1+\sqrt2)},\quad c(3)=\frac{1}{24\sqrt3\,\log(2+\sqrt3)}, \tag{3.92}$$
$$c(6)=\frac{1}{8\sqrt6\,\log(5+2\sqrt6)},\quad c(7)=\frac{1}{6\sqrt7\,\log(8+3\sqrt7)}. \tag{3.93}$$
Notice that (3.92) justifies the values of the constant factors $C_4(\sqrt2)$ and $C_4(\sqrt3)$ in the denominator in Theorem 1.2, see (1.52) and (1.53).

The golden ratio $\alpha=(1+\sqrt5)/2$ is not covered by the key formula (3.88). One of the (annoying) peculiarities of algebraic number theory is that the real quadratic fields $\mathbb{Q}(\sqrt d)$ require a slightly different treatment if $d$ is a square-free positive integer with $d\equiv1$ (mod 4) ($d\ge5$). Then the algebraic integers of $\mathbb{Q}(\sqrt d)$ have the form
$$x+y\frac{\sqrt d-1}{2},\quad x,y\in\mathbb{Z},\quad\text{with norm}\quad\Big(x+y\frac{\sqrt d-1}{2}\Big)\Big(x-y\frac{\sqrt d+1}{2}\Big)=\frac{(2x-y)^2-dy^2}{4}. \tag{3.94}$$
Therefore, if $\alpha=(1+\sqrt d)/2$ and $\|j\alpha\|$ is “small,” then
$$\sin^2(\pi j\alpha)\approx\pi^2\|j\alpha\|^2=\pi^2(j\alpha-\ell)^2\approx\frac{\pi^2}{dj^2}\Big(\ell+j\frac{\sqrt d-1}{2}\Big)^2\Big(\ell-j\frac{\sqrt d+1}{2}\Big)^2=\frac{\pi^2\big((2\ell-j)^2-dj^2\big)^2}{4\cdot4dj^2}, \tag{3.95}$$
where $\ell=\ell(j,d)$ is the nearest integer to $j\alpha$. Notice that (3.95) is an analog of (3.48). We can repeat the argument of the case $d\equiv3$ (mod 4) above with the slight modification that the new Pell equation is
$$\frac{(2x-y)^2-dy^2}{4}=\pm1, \tag{3.96}$$
which is equivalent to $t^2-du^2=\pm4$ and $t\equiv u$ (mod 2). For example, if $d=5,13,17$ then the fundamental units are
$$\eta_5=\frac{1+\sqrt5}{2},\quad\eta_{13}=\frac{3+\sqrt{13}}{2},\quad\eta_{17}=4+\sqrt{17}, \tag{3.97}$$
and they all have norm $-1$. The following version of (3.88) covers all cases.

Proposition 3.7. Assume that the class number of the real quadratic field $\mathbb{Q}(\sqrt d)$ is one. Let
$$\alpha=\sqrt d\ \text{ if } d\equiv2,3\ (\mathrm{mod}\ 4)\quad\text{and}\quad\alpha=\frac{1+\sqrt d}{2}\ \text{ if } d\equiv1\ (\mathrm{mod}\ 4).$$
Then the variance
$$V_\alpha(N)=\frac1N\sum_{n=1}^{N}\big(S_\alpha(n)-M_\alpha(N)\big)^2=\Bigg(\frac{1}{120\sqrt D}\sum_{\substack{B^2+4ac=D:\\ a>0,\ c>0}}a\Bigg)\frac{\log N}{\log\eta_d}+O\big(\log\log N\sqrt{\log N}\big), \tag{3.98}$$
where $\eta_d$ is the fundamental unit in $\mathbb{Q}(\sqrt d)$ and $D$ is the discriminant of $\mathbb{Q}(\sqrt d)$, i.e., $D=4d$ if $d\equiv2,3$ (mod 4) and $D=d$ if $d\equiv1$ (mod 4) (and, accordingly, $B=2b$ or $b$).

Remarks. When we compute the standard deviation, i.e., when we take the square root, the relatively large error term $O(\log\log N\sqrt{\log N})$ in (3.98) becomes a negligible $O(\log\log N)$:
$$\sqrt{V_\alpha(N)}=c_\alpha\sqrt{\log N}+O(\log\log N).$$
Let $F(D)$ denote the critical sum in (3.98):
$$F(D)=\sum_{\substack{B^2+4ac=D:\\ a>0,\ c>0}}a. \tag{3.99}$$
For example, we have
$$F(5)=2,\quad F(13)=10,\quad F(17)=20. \tag{3.100}$$
By (3.98),
$$\lim_{N\to\infty}\frac{V_\alpha(N)}{\log N}=c(d)=\frac{F(D)}{120\sqrt D\,\log\eta_d}. \tag{3.101}$$
For example, by (3.97), (3.100), and (3.101) we have
$$c(5)=\frac{1}{60\sqrt5\,\log((1+\sqrt5)/2)},\quad c(13)=\frac{1}{12\sqrt{13}\,\log((3+\sqrt{13})/2)}, \tag{3.102}$$
$$c(17)=\frac{1}{6\sqrt{17}\,\log(4+\sqrt{17})}. \tag{3.103}$$
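The sums in (3.100) and the constants in (3.102)–(3.103) can be reproduced directly from (3.99) and (3.101). The Python sketch below is an illustration added here; the function names `F` and `c` are hypothetical, and the fundamental units are passed in by hand from (3.97).

```python
import math

def F(D: int) -> int:
    """Critical sum (3.99): sum of a over all B^2 + 4ac = D with a, c > 0."""
    total, B = 0, 0
    while B * B < D:
        rest = D - B * B
        if rest % 4 == 0:
            m = rest // 4                      # a * c = m
            sig = sum(a for a in range(1, m + 1) if m % a == 0)
            total += sig if B == 0 else 2 * sig   # B and -B contribute equally
        B += 1
    return total

def c(D: int, unit: float) -> float:
    """Limit constant from (3.101) for discriminant D and fundamental unit `unit`."""
    return F(D) / (120 * math.sqrt(D) * math.log(unit))

s5, s13, s17 = math.sqrt(5), math.sqrt(13), math.sqrt(17)
assert (F(5), F(13), F(17)) == (2, 10, 20)
assert abs(c(5, (1 + s5) / 2) - 1 / (60 * s5 * math.log((1 + s5) / 2))) < 1e-12
assert abs(c(13, (3 + s13) / 2) - 1 / (12 * s13 * math.log((3 + s13) / 2))) < 1e-12
assert abs(c(17, 4 + s17) - 1 / (6 * s17 * math.log(4 + s17))) < 1e-12
```

With $D=4d$ the same function recovers the earlier sums, for example `F(8) == 5` and `F(12) == 10`, matching $f(2)$ and $f(3)$ in (3.90).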
3.3 Connection with Quadratic Fields (III)

3.3.1 The General Case: Computing the Variance for an Arbitrary Quadratic Irrational

In Sect. 3.2 we explained how to compute the variance for the special quadratic irrationals $\alpha=\sqrt d$ for $d\equiv2$ or $3$ (mod 4) and $\alpha=(1+\sqrt d)/2$ for $d\equiv1$ (mod 4), assuming the class number of the real quadratic field $K=\mathbb{Q}(\sqrt d)$ is one. In these cases the computation of the variance is equivalent to finding the exact value of the Dedekind zeta function $\zeta_K(s)$ at $s=2$. The general form of a quadratic irrational is
$$\alpha=\frac{-b\pm\sqrt{\Delta}}{2a},$$
that is, $\alpha$ is a root of $ax^2+bx+c=0$ and $\Delta=b^2-4ac>0$ is the discriminant. We have $\Delta=Dm^2$, where $D$ is a fundamental discriminant (i.e., the discriminant of a real quadratic field). To evaluate the corresponding variance
$$\sum_{\substack{(x,y)\ne(0,0):\\ \text{primary representations}}}\frac{1}{(ax^2+bxy+cy^2)^2}, \tag{3.104}$$
the Dedekind zeta function $\zeta_K(s)$ has to be replaced by the zeta function $\zeta(s,M)$ of the complete $\mathbb{Z}$-module $M=\mathbb{Z}\cdot1+\mathbb{Z}\alpha$ of $K=\mathbb{Q}(\sqrt d)$. For notational simplicity, we assume that $\Delta=D$, that is, the discriminant of the primitive indefinite form $ax^2+bxy+cy^2$ is a fundamental discriminant (since the switch from $\Delta=D$ to the general case $\Delta=Dm^2$ is a routine matter in algebraic number theory). The fundamental discriminant means that we work with the zeta function $\zeta_K(s,A)$ of the corresponding ideal class $A$ of $K=\mathbb{Q}(\sqrt d)$. In general $\zeta_K(s,A)$ does not have an Euler-type decomposition as (3.130), but it does have a functional equation relating $\zeta_K(s,A)$ and $\zeta_K(1-s,A)$ (proved by Hecke): with
$$F(s)=D^{s/2}\pi^{-s}\Gamma^2(s/2)\zeta_K(s,A) \tag{3.105}$$
we have $F(s)=F(1-s)$. In the special case $s=2$ we have
$$\zeta_K(2,A)=\frac{4\pi^4}{D^{3/2}}\zeta_K(-1,A). \tag{3.106}$$
We know two effective algorithms to evaluate $\zeta_K(-1,A)$ in the form of a finite sum: one is due to Zagier [Za3] and the other one is due to Shintani [Shi]. Both are developed to the point of explicit calculations. Since the details become quite involved and technical, we stop here and refer the interested reader to the readable papers [Za3] and [Shi].

3.3.2 Computing the Variance in Theorem 1.1: A Special Case

Theorem 1.1 is about the asymptotic behavior of the discrepancy
$$S_\alpha(\rho;n)=\sum_{\substack{1\le k\le n:\\ k\alpha\in(0,\rho)\ (\mathrm{mod}\ 1)}}1-n\rho \tag{3.107}$$
of the counting function of the irrational rotation $k\alpha$ (mod 1). For simplicity, assume that $\rho=1/2$. The characteristic function $\chi_{1/2}$ of the first half $(0,1/2)$ of the unit interval has the Fourier series
$$\chi_{1/2}(x)=\frac12+\sum_{j\ge1:\ \mathrm{odd}}\frac{2\sin(2\pi jx)}{\pi j},$$
and so we have
$$S_\alpha(1/2;n)=\sum_{k=1}^{n}\chi_{1/2}(k\alpha)-\frac n2=-\sum_{j\ge1:\ \mathrm{odd}}\frac{\cos((2n+1)\pi j\alpha)-\cos(\pi j\alpha)}{\pi j\sin(\pi j\alpha)}.$$
By repeating the argument of Sects. 2.3 and 2.4 we can prove the following analog of Proposition 2.16:
$$M_\alpha(1/2;N)=\frac1N\sum_{n=1}^{N}S_\alpha(1/2;n)=\sum_{1\le j\le N:\ \mathrm{odd}}\frac{1}{\pi j\tan(\pi j\alpha)}+O(1), \tag{3.108}$$
assuming $\alpha$ is badly approximable.

For simplicity, we just consider the special case $\alpha=\sqrt2$. Repeating the arguments of Sect. 3.1, we can easily prove the following result for the corresponding variance:
$$V_{\sqrt2}(1/2;N)=\frac1N\sum_{n=1}^{N}\big(S_{\sqrt2}(1/2;n)-M_{\sqrt2}(1/2;N)\big)^2=\sum_{k=1:\ \mathrm{odd}}^{N}\frac{1}{2\pi^2k^2\sin^2(\pi k\sqrt2)}+\text{negligible}=$$
$$=\frac{4}{\pi^4}\sum_{1\le x\le\sqrt2N}\ \sum_{y=1:\ \mathrm{odd}}^{N}\frac{1}{(x^2-2y^2)^2}+\text{negligible}. \tag{3.109}$$
From (3.109) we will derive that
$$V_{\sqrt2}(1/2;N)=\frac{3\sqrt2}{2^7}\cdot\frac{\log N}{\log(1+\sqrt2)}+O\big(\log\log N\sqrt{\log N}\big).$$
The new difficulty in (3.109) is the condition “$y$ is odd”; without it the evaluation of the sum on the right-hand side of (3.109) is easy:
$$\sum_{1\le x\le\sqrt2N}\ \sum_{y=1}^{N}\frac{1}{(x^2-2y^2)^2}=\Bigg(\sum_{\substack{(x,y)\ne(0,0):\\ \text{primary representations}}}\frac{1}{(x^2-2y^2)^2}\Bigg)\frac{\log N}{\log(1+\sqrt2)}+\text{negligible}, \tag{3.110}$$
where the sum on the right-hand side of (3.110) is $\zeta_K(2)$ with $K=\mathbb{Q}(\sqrt2)$ (and $1+\sqrt2$ is the fundamental unit). The condition “$y$ is odd” in (3.109) means that we have to involve an extra case study modulo 8. We have
$$\zeta_K(s)=\sum_{n=1}^{\infty}\frac{R_2(n)}{n^s}, \tag{3.111}$$
where $R_2(n)$ is the number of primary representations of $x^2-2y^2=\pm n$. We decompose (3.111) modulo 8 as follows:
$$\zeta_K(s)=\sum_{a=-3}^{4}\zeta_{K,a}(s)\quad\text{where}\quad\zeta_{K,a}(s)=\sum_{\substack{n\ge1:\\ n\equiv a\ (\mathrm{mod}\ 8)}}\frac{R_2(n)}{n^s}. \tag{3.112}$$
Clearly
(1) $x$ even and $y$ even imply $x^2-2y^2\equiv0$ or $4$ (mod 8);
(2) $x$ odd and $y$ even imply $x^2-2y^2\equiv1$ (mod 8);
(3) $x$ odd and $y$ odd imply $x^2-2y^2\equiv-1$ (mod 8);
(4) $x$ even and $y$ odd imply $x^2-2y^2\equiv2$ or $-2$ (mod 8).

We also need to study the parity effect of multiplying with the fundamental unit $1+\sqrt2$ and its square:
$$(x+y\sqrt2)(1+\sqrt2)=(x+2y)+(x+y)\sqrt2, \tag{3.113}$$
$$(x+y\sqrt2)(1+\sqrt2)^2=(x+y\sqrt2)(3+2\sqrt2)=(3x+4y)+(2x+3y)\sqrt2. \tag{3.114}$$
In (3.113) $y$ and $x+y$ have the same parity if $x$ is even and different parity if $x$ is odd. In (3.114) $y$ and $2x+3y$ have the same parity. Combining these facts, we have
$$\sum_{1\le x\le\sqrt2N}\ \sum_{y=1:\ \mathrm{odd}}^{N}\frac{1}{(x^2-2y^2)^2}=\big(\zeta_{K,2}(2)+\zeta_{K,-2}(2)\big)\frac{\log N}{\log(1+\sqrt2)}+ \tag{3.115'}$$
$$+\big(\zeta_{K,1}(2)+\zeta_{K,-1}(2)\big)\frac{\log N}{2\log(1+\sqrt2)}+\text{negligible}. \tag{3.115''}$$
Note that (3.115') corresponds to (3.113) paired with case (4) above, and (3.115'') corresponds to (3.113) paired with case (2) and (3.114) paired with case (3). We have to evaluate $\zeta_{K,a}(s)$ for $a=-2,-1,1,2$. We recall the well-known factorization
$$\zeta_K(s)=\zeta(s)L(s,\chi_8) \tag{3.116}$$
of the Dedekind zeta function $\zeta_K$ with $K=\mathbb{Q}(\sqrt2)$; here
$$\zeta(s)=\sum_{k=1}^{\infty}\frac{1}{k^s}\quad\text{and}\quad L(s,\chi_8)=\sum_{\ell=1}^{\infty}\frac{\chi_8(\ell)}{\ell^s}, \tag{3.117}$$
where $\chi_8$ denotes the following character modulo 8: $\chi_8(n)=1$ if $n\equiv\pm1$ (mod 8), $\chi_8(n)=-1$ if $n\equiv\pm3$ (mod 8), and $\chi_8(n)=0$ if $n$ is even. In view of (3.111)–(3.112) and (3.116)–(3.117) it suffices to evaluate the sum
$$S(b)=\sum_{\substack{n\in\mathbb{Z}:\\ n\equiv b\ (\mathrm{mod}\ 8)}}\frac{1}{n^2}=\sum_{m\in\mathbb{Z}}\frac{1}{(8m+b)^2}. \tag{3.118}$$
We recall the well-known formula
$$\frac{\pi}{\tan(\pi x)}=\sum_{m\in\mathbb{Z}}\frac{1}{m+x},$$
and differentiating it term-wise we obtain the new formula
$$\frac{\pi^2}{\sin^2(\pi x)}=\sum_{m\in\mathbb{Z}}\frac{1}{(m+x)^2}.$$
Using it in (3.118) we have
$$S(b)=\frac{(\pi/8)^2}{\sin^2(\pi b/8)}=\frac{\pi^2}{2^5}\cdot\frac{1}{1-\cos(\pi b/4)}. \tag{3.119}$$
Combining (3.111) and (3.112), (3.116) and (3.117), (3.119), and the numerical fact $1\cdot1\equiv3\cdot3\equiv5\cdot5\equiv7\cdot7\equiv1$ (mod 8), we have
$$\zeta_{K,1}(2)=\frac{1}{2^2}\cdot\frac{\pi^4}{2^{10}}\,(A_1B_1+A_3B_3+A_5B_5+A_7B_7), \tag{3.120}$$
where
$$A_j=\frac{1}{1-\cos(\pi j/4)}\quad\text{and}\quad B_j=\frac{\chi_8(j)}{1-\cos(\pi j/4)}. \tag{3.121}$$
Note that the factor $1/2^2$ at the beginning of the right-hand side of (3.120) comes from the difference between the summation $n\in\mathbb{Z}$ in (3.118) and $n\ge1$ in the zeta function. Since
$$A_1=A_7=B_1=B_7=\frac{1}{1-\cos(\pi/4)}=\frac{1}{1-\frac{1}{\sqrt2}}=\frac{\sqrt2}{\sqrt2-1} \tag{3.122}$$
and
$$A_3=A_5=-B_3=-B_5=\frac{1}{1-\cos(3\pi/4)}=\frac{1}{1+\frac{1}{\sqrt2}}=\frac{\sqrt2}{\sqrt2+1}, \tag{3.123}$$
we have
$$\zeta_{K,1}(2)=\frac{\pi^4}{2^{12}}\Bigg(2\Big(\frac{\sqrt2}{\sqrt2-1}\Big)^2-2\Big(\frac{\sqrt2}{\sqrt2+1}\Big)^2\Bigg)=\frac{\pi^4\sqrt2}{2^8}. \tag{3.124}$$
Since $\cos(x)$ is an even function, $\zeta_{K,1}(2)=\zeta_{K,7}(2)$.
Based on the numerical fact $2\cdot1\equiv6\cdot3\equiv2\cdot5\equiv6\cdot7\equiv2$ (mod 8), we have the following analog of (3.120):
$$\zeta_{K,2}(2)=\frac{\pi^4}{2^{12}}\,(A_2B_1+A_6B_3+A_2B_5+A_6B_7). \tag{3.125}$$
Note that
$$A_2=\frac{1}{1-\cos(\pi/2)}=1=A_6,$$
and combining this with (3.122) and (3.123), we can rewrite (3.125) as follows:
$$\zeta_{K,2}(2)=\frac{\pi^4}{2^{12}}\big(A_2(B_1+B_5)+A_6(B_3+B_7)\big)=\frac{\pi^4}{2^{12}}\Big(2\frac{\sqrt2}{\sqrt2-1}-2\frac{\sqrt2}{\sqrt2+1}\Big)=\frac{\pi^4\sqrt2}{2^{10}}=\zeta_{K,6}(2). \tag{3.126}$$
Combining (3.109), (3.115), (3.124), and (3.126), we have
$$V_{\sqrt2}(1/2;N)=\frac1N\sum_{n=1}^{N}\big(S_{\sqrt2}(1/2;n)-M_{\sqrt2}(1/2;N)\big)^2=\frac{4}{\pi^4}\sum_{1\le x\le\sqrt2N}\ \sum_{y=1:\ \mathrm{odd}}^{N}\frac{1}{(x^2-2y^2)^2}+\text{negligible}=$$
$$=\frac{2}{\pi^4}\big(2\zeta_{K,2}(2)+2\zeta_{K,-2}(2)+\zeta_{K,1}(2)+\zeta_{K,-1}(2)\big)\frac{\log N}{\log(1+\sqrt2)}+\text{negligible}=\frac{3\sqrt2}{2^7}\cdot\frac{\log N}{\log(1+\sqrt2)}+\text{negligible}.$$
Again “negligible” means $O(\log\log N\sqrt{\log N})$, that is,
$$V_{\sqrt2}(1/2;N)=\frac{3\sqrt2}{2^7}\cdot\frac{\log N}{\log(1+\sqrt2)}+O\big(\log\log N\sqrt{\log N}\big). \tag{3.127}$$
Note that (3.127) justifies the value of the constant factor $C_2=C_2(\sqrt2,1/2)$ in the denominator in Theorem 1.1, see (1.33).
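The arithmetic leading from (3.120)–(3.126) to the constant $3\sqrt2/2^7$ in (3.127) can be replayed numerically. The following Python sketch is an illustration added here; it uses the symmetry $\zeta_{K,-a}(2)=\zeta_{K,a}(2)$ stated in the text, and all function and variable names are hypothetical.

```python
import math

def chi8(n: int) -> int:
    """Character modulo 8 from the text: +1 at ±1, -1 at ±3, 0 at even n."""
    r = n % 8
    return 1 if r in (1, 7) else (-1 if r in (3, 5) else 0)

def A(j: int) -> float:
    return 1.0 / (1.0 - math.cos(math.pi * j / 4))     # (3.121)

def B(j: int) -> float:
    return chi8(j) * A(j)                              # (3.121)

pi4 = math.pi ** 4
zeta_K1 = pi4 / 2 ** 12 * sum(A(j) * B(j) for j in (1, 3, 5, 7))                  # (3.120)
zeta_K2 = pi4 / 2 ** 12 * (A(2) * B(1) + A(6) * B(3) + A(2) * B(5) + A(6) * B(7))  # (3.125)

# Leading constant of (3.127), using zeta_{K,-a}(2) = zeta_{K,a}(2).
constant = (2 / pi4) * (2 * zeta_K2 + 2 * zeta_K2 + zeta_K1 + zeta_K1)

assert abs(zeta_K1 - pi4 * math.sqrt(2) / 2 ** 8) < 1e-12     # (3.124)
assert abs(zeta_K2 - pi4 * math.sqrt(2) / 2 ** 10) < 1e-12    # (3.126)
assert abs(constant - 3 * math.sqrt(2) / 2 ** 7) < 1e-12      # (3.127)
```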
3.3.3 Computing the Variance in Theorem 1.1: The General Case

In order to compute the variance of
$$S_\alpha(\rho;n)=\sum_{\substack{1\le k\le n:\\ k\alpha\in(0,\rho)\ (\mathrm{mod}\ 1)}}1-n\rho,$$
we consider the characteristic function $\chi_\rho$ of the interval $(0,\rho)$ with a rational $0<\rho<1$. We write $\rho=r_1/r_2$ where $1\le r_1<r_2$ are relatively prime integers. It is easy to compute the Fourier series of $\chi_\rho(x)$:
$$\chi_\rho(x)-\rho=\sum_{j\in\mathbb{Z}:\ j\ne0}\frac{1-e^{-2\pi ijr_1/r_2}}{2\pi ij}\,e^{2\pi ijx}$$
(where of course $i=\sqrt{-1}$), and so we have
$$S_\alpha(\rho;n)=\sum_{k=1}^{n}\chi_\rho(k\alpha)-n\rho=\sum_{j\in\mathbb{Z}:\ j\ne0}\frac{1-e^{-2\pi ijr_1/r_2}}{2\pi ij}\Big(\sum_{k=1}^{n}e^{2\pi ijk\alpha}\Big)=\sum_{j\in\mathbb{Z}:\ j\ne0}\frac{1-e^{-2\pi ijr_1/r_2}}{2\pi ij}\cdot\frac{e^{2\pi ijn\alpha}-1}{1-e^{-2\pi ij\alpha}}.$$
N 1 X S˛ .I n/ D N nD1
D
X jj jN W j ¤0
1 e 2ijr1 =r2 C O.1/ D 2ij .1 e 2ij˛ /
N X cos2 .jr1 =r2 / C O.1/; j tan.j˛/ j D1
assuming ˛ is badly approximable. Repeating the arguments of Sect. 3.1, we can easily prove the following result for the corresponding variance: V˛ .I N / D
N 1 X .S˛ .I n/ M˛ .I N //2 D N nD1
3.3 Connection with Quadratic Fields (III)
D
D
D 2 4
203
N X sin2 .kr1 =r2 / C negligible D 2 2 k 2 sin2 .k˛/ kD1
N X X
sin2 .yr1 =r2 / C negligible; .Ax 2 C Bxy C Cy 2 /2 1x˛N yD1
(3.128)
where $\alpha=(-B+\sqrt D)/(2A)$ is the general form of a quadratic irrational, i.e., $\alpha$ is a root of $Ax^2+Bx+C=0$ and $D=B^2-4AC>0$ is not a perfect square. The integral solutions of $Ax^2+Bxy+Cy^2=m$, where $m\ne0$ is a given integer, are described by the automorphisms of the binary form (see the unimodular substitution in (3.59)):
$$x=\frac{t-uB}{2}\,x_1-Cu\,y_1,\quad y=Au\,x_1+\frac{t+uB}{2}\,y_1, \tag{3.129}$$
where $(t,u)$ is a solution of Pell's equation $t^2-Du^2=4$. Since all solutions of Pell's equation $t^2-Du^2=4$ are given by the formula (see (3.60))
$$\frac{t+u\sqrt D}{2}=\pm\Big(\frac{t_0+u_0\sqrt D}{2}\Big)^n,$$
where $(t_0,u_0)$ is the least positive solution, it suffices to study (3.129) with $t=t_0$, $u=u_0$:
$$x_2=\frac{t_0-u_0B}{2}\,x_1-Cu_0\,y_1,\quad y_2=Au_0\,x_1+\frac{t_0+u_0B}{2}\,y_1. \tag{3.130}$$
Replacing $(x_1,y_1)$ in (3.130) with $(x_2,y_2)$, we obtain $(x_3,y_3)$. Replacing $(x_2,y_2)$ in (3.130) with $(x_3,y_3)$, we obtain $(x_4,y_4)$, and so on. Also, we can go backward and obtain $(x_0,y_0)$, $(x_{-1},y_{-1})$, $(x_{-2},y_{-2})$, and so on. In view of (3.128), we have to study the sequence $(x_i,y_i)$ modulo $r_2$ (= the denominator of $\rho$):
$$\big(x_i\ (\mathrm{mod}\ r_2),\ y_i\ (\mathrm{mod}\ r_2)\big),\quad i\in\mathbb{Z}. \tag{3.131}$$
Equation (3.130) implies that (3.131) is a periodic infinite sequence, and the length of the period is clearly $\le r_2^2$. This periodicity explains why the proof of the special case $\rho=1/2$ (and $\alpha=\sqrt2$) above perfectly illustrates the general case, and we have
$$\sum_{1\le x\le\alpha N}\ \sum_{y=1}^{N}\frac{\sin^2(\pi yr_1/r_2)}{(Ax^2+Bxy+Cy^2)^2}=c'(\alpha,\rho)\log N+\text{negligible}.$$
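The automorphism (3.130) and the periodicity of (3.131) can be illustrated on the form $x^2-2y^2$ (so $A=1$, $B=0$, $C=-2$, $D=8$), where $(t_0,u_0)=(6,2)$ solves $t^2-8u^2=4$ and the substitution becomes $(x,y)\mapsto(3x+4y,\,2x+3y)$, in agreement with (3.114). The Python sketch below is an illustration added here with hypothetical function names; it relies on the fact that the substitution has determinant 1, so it is invertible modulo $r_2$ and every orbit is purely periodic.

```python
def form(x, y, A, B, C):
    """Value of the binary quadratic form A x^2 + B x y + C y^2."""
    return A * x * x + B * x * y + C * y * y

def step(x, y, A, B, C, t0, u0):
    """One application of the substitution (3.130)."""
    p = (t0 - u0 * B) // 2          # integral because t0 and u0*B have equal parity
    q = (t0 + u0 * B) // 2
    return p * x - C * u0 * y, A * u0 * x + q * y

def period_mod(x, y, r2, A, B, C, t0, u0):
    """Length of the cycle of (x_i mod r2, y_i mod r2) through the starting pair."""
    start = (x % r2, y % r2)
    cur, n = start, 0
    while True:
        cur = tuple(v % r2 for v in step(cur[0], cur[1], A, B, C, t0, u0))
        n += 1
        if cur == start:
            return n

params = (1, 0, -2, 6, 2)           # A, B, C, t0, u0 for x^2 - 2 y^2
assert step(1, 0, *params) == (3, 2)
assert form(3, 2, 1, 0, -2) == form(1, 0, 1, 0, -2) == 1
assert all(period_mod(1, 1, r2, *params) <= r2 * r2 for r2 in range(2, 12))
```

For example, modulo $r_2=3$ the orbit of $(1,1)$ is $(1,1)\to(1,2)\to(2,2)\to(2,1)\to(1,1)$, of length 4.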
Combining this with (3.128), we obtain
$$V_\alpha(\rho;N)=\frac1N\sum_{n=1}^{N}\big(S_\alpha(\rho;n)-M_\alpha(\rho;N)\big)^2=c''(\alpha,\rho)\log N+\text{negligible}. \tag{3.132}$$
Note that the constant factor $c''(\alpha,\rho)$ in (3.132) is clearly nonzero. Indeed, by the periodicity of (3.131) and by (3.128), it suffices to find a single integer $m\ne0$ such that $Ax^2+Bxy+Cy^2=m$ with some $y\not\equiv0$ (mod $r_2$) (because then $\sin^2(\pi yr_1/r_2)\ne0$). But this is trivial: $x=0$, $y=1$ is a good choice. Again we can guarantee that “negligible” in (3.132) actually means $O(\log\log N\sqrt{\log N})$. Thus we have

Proposition 3.8. If $\alpha$ is a quadratic irrational, and $0<\rho<1$ is a rational number, then
$$V_\alpha(\rho;N)=\frac1N\sum_{n=1}^{N}\big(S_\alpha(\rho;n)-M_\alpha(\rho;N)\big)^2=c''(\alpha,\rho)\log N+O\big(\log\log N\sqrt{\log N}\big),$$
where $c''(\alpha,\rho)>0$ is a strictly positive constant. $\square$
3.3.4 The Case of Symmetric Intervals

What happens if the asymmetric interval $(0,\rho)$ in Proposition 3.8 is replaced with the symmetric interval $(-\rho,\rho)$? It is easy to answer this question. Let $\chi_{\pm\rho}$ denote the characteristic function of the interval $(-\rho,\rho)$ with a rational $0<\rho<1/2$. We write $\rho=r_1/r_2$ where $1\le r_1<r_2$ are relatively prime integers. It is easy to compute the Fourier series of $\chi_{\pm\rho}(x)$:
$$\chi_{\pm\rho}(x)-2\rho=\sum_{j\in\mathbb{Z}:\ j\ne0}\frac{\sin(2\pi jr_1/r_2)}{\pi j}\,e^{2\pi ijx}.$$
Our goal is to study
$$S_\alpha(\pm\rho;n)=\sum_{\substack{1\le k\le n:\\ k\alpha\in(-\rho,\rho)\ (\mathrm{mod}\ 1)}}1-2\rho n.$$
By using the Fourier series, we have
$$S_\alpha(\pm\rho;n)=\sum_{k=1}^{n}\chi_{\pm\rho}(k\alpha)-2\rho n=\sum_{j\in\mathbb{Z}:\ j\ne0}\frac{\sin(2\pi jr_1/r_2)}{\pi j}\Big(\sum_{k=1}^{n}e^{2\pi ijk\alpha}\Big)=\sum_{j\in\mathbb{Z}:\ j\ne0}\frac{\sin(2\pi jr_1/r_2)}{\pi j}\cdot\frac{e^{2\pi ijn\alpha}-1}{1-e^{-2\pi ij\alpha}}.$$
By repeating the argument of Sects. 2.3 and 2.4 we can prove the following analog of Proposition 2.1: if $\alpha$ is badly approximable,
$$M_\alpha(\pm\rho;N)=\frac1N\sum_{n=1}^{N}S_\alpha(\pm\rho;n)=-\sum_{|j|\le N:\ j\ne0}\frac{\sin(2\pi jr_1/r_2)}{\pi j\,(1-e^{-2\pi ij\alpha})}+O(1)=-\sum_{j=1}^{N}\frac{\sin(2\pi jr_1/r_2)}{\pi j}+O(1),$$
because
$$\frac{1}{1-e^{2\pi ij\alpha}}+\frac{1}{1-e^{-2\pi ij\alpha}}=1.$$
Since
$$\sum_{j=1}^{r_2}\sin(2\pi jr_1/r_2)=0,$$
by Abel's transformation (2.119) we have
$$\sum_{j=1}^{N}\frac{\sin(2\pi jr_1/r_2)}{j}=O(1).$$
Therefore, with $\rho=r_1/r_2$,
$$M_\alpha(\pm\rho;N)=\frac1N\sum_{n=1}^{N}S_\alpha(\pm\rho;n)=O(1).$$
Repeating the arguments of Sect. 3.1, we can easily prove the following result for the corresponding variance:
$$V_\alpha(\pm\rho;N)=\frac1N\sum_{n=1}^{N}S_\alpha^2(\pm\rho;n)=\sum_{k=1}^{N}\frac{\sin^2(2\pi kr_1/r_2)}{2\pi^2k^2\sin^2(\pi k\alpha)}+\text{negligible}.$$
Comparing this to (3.128), we find the following equality between the variances of the “symmetric” and “asymmetric” cases:
$$V_\alpha(\pm\rho;N)=V_\alpha(2\rho;N)+\text{negligible},$$
where “negligible” is the usual error term $O(\log\log N\sqrt{\log N})$.
Chapter 4
Proving Randomness
4.1 Completing the Proof of Theorem 1.2

In Sect. 1.5 we almost proved Theorem 1.2 in the special case of the golden ratio. The missing part was the variance, but we took care of this particular issue in Sect. 3.2. The golden ratio has the simplest continued fraction among all quadratic irrationals, and this extreme simplicity (the length of the period is one, and every partial quotient is one) is rather misleading. What we do in this section is basically a “proof by examples.” We replace the golden ratio with some increasingly more complicated quadratic irrationals, such as
$$\alpha=\sqrt3=[1;\overline{1,2}]\quad\text{and}\quad\sqrt7=[2;\overline{1,1,1,4}]\quad\text{and}\quad\sqrt{19}=[4;\overline{2,1,3,1,2,8}]$$
in Theorem 1.2, and explain how to modify the arguments of Sect. 1.5. We show that the numbers $\sqrt3$, $\sqrt7$, $\sqrt{19}$ represent the general case well. The method of Sect. 1.5 was to set up an approximation (in the special case of the golden ratio $\alpha=(1+\sqrt5)/2$)
$$S_\alpha(n)-M_\alpha(N)=X_1+X_2+X_3+\cdots+\text{negligible},$$
where the $X_i$s are independent and identically distributed random variables (as $n$ runs in $0<n<N$). For an arbitrary quadratic irrational $\alpha$ the construction of these independent random variables is somewhat more complicated, and this construction is the bulk of Sect. 4.1.

We start with $\sqrt3$. Again we are going to use Ostrowski's explicit formula (1.55), so the first step is Ostrowski's expansion of $n$ with respect to $\alpha=\sqrt3$, see (1.54). Since
$$\alpha=\sqrt3=[1;1,2,1,2,1,2,\ldots]=[a_0;a_1,a_2,a_3,\ldots],$$
it is easy to determine the denominators $q_i$ of the convergents $p_i/q_i$ of $\sqrt3$. We have
$$q_1=1,\quad q_2=a_1=1,\quad q_i=a_{i-1}q_{i-1}+q_{i-2}\ \text{ for } i\ge3;$$
$$\text{that is,}\quad q_{2j}=q_{2j-1}+q_{2j-2}\quad\text{and}\quad q_{2j-1}=2q_{2j-2}+q_{2j-3}\ \text{ for } j\ge2. \tag{4.1}$$
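We also have
$$p_{2j}\pm q_{2j}\sqrt3=(p_2\pm q_2\sqrt3)^j=(2\pm\sqrt3)^j, \tag{4.2}$$
implying
$$q_{2j}=\frac{1}{2\sqrt3}\Big((2+\sqrt3)^j-(2-\sqrt3)^j\Big). \tag{4.3}$$
By (4.1), $q_{2j}=q_{2j-1}+q_{2j-2}$, and so (4.3) gives that
$$q_{2j-1}=q_{2j}-q_{2j-2}=\frac{1}{2\sqrt3}\Big((2+\sqrt3)^j-(2-\sqrt3)^j\Big)-\frac{1}{2\sqrt3}\Big((2+\sqrt3)^{j-1}-(2-\sqrt3)^{j-1}\Big)=$$
$$=\frac{1}{2\sqrt3}\Big((\sqrt3+1)(2+\sqrt3)^{j-1}+(\sqrt3-1)(2-\sqrt3)^{j-1}\Big). \tag{4.4}$$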
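The recurrence (4.1) and the closed forms (4.3) and (4.4) can be compared numerically; for instance $j=1$ in (4.4) must give $q_1=1$. The Python sketch below is an illustration added here, with hypothetical function names.

```python
import math

def denominators_sqrt3(count: int) -> list:
    """q_1, ..., q_count from (4.1); the list is 1-indexed (entry 0 is unused).
    The partial quotients are a_1, a_2, a_3, ... = 1, 2, 1, 2, ..."""
    q = [None, 1, 1]
    for i in range(3, count + 1):
        a_prev = 1 if (i - 1) % 2 == 1 else 2      # a_{i-1}
        q.append(a_prev * q[i - 1] + q[i - 2])
    return q

S3 = math.sqrt(3)

def q_even(j: int) -> float:
    """Closed form (4.3) for q_{2j}."""
    return ((2 + S3) ** j - (2 - S3) ** j) / (2 * S3)

def q_odd(j: int) -> float:
    """Closed form (4.4) for q_{2j-1}."""
    return ((S3 + 1) * (2 + S3) ** (j - 1) + (S3 - 1) * (2 - S3) ** (j - 1)) / (2 * S3)

q = denominators_sqrt3(12)
print(q[1:])   # prints [1, 1, 3, 4, 11, 15, 41, 56, 153, 209, 571, 780]
for j in range(1, 7):
    assert round(q_even(j)) == q[2 * j] and abs(q_even(j) - q[2 * j]) < 1e-6
    assert round(q_odd(j)) == q[2 * j - 1] and abs(q_odd(j) - q[2 * j - 1]) < 1e-6
```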
Following (1.54), the corresponding Ostrowski expansion of $n$ is a unique way to express an arbitrary positive integer $n$ as a linear combination of the $q_i$s as follows:
$$n=\sum_i b_iq_i,\quad 0\le b_{2j}\le^{*}2=a_{2j},\quad 0\le b_{2j-1}\le^{*}1=a_{2j-1}\quad\text{for } j\ge1 \tag{4.5}$$
and $b_1=b_1(n)=0$, where $*$ indicates the Extra Rule that if $b_i=b_i(n)=a_i$ then $b_{i-1}=b_{i-1}(n)=0$ ($i\ge2$).

The only new parameter in Ostrowski's formula (1.55) is $\varepsilon_i=\varepsilon_i(\sqrt3)=q_i\sqrt3-p_i$. By (4.2) we have
$$\varepsilon_{2j}=q_{2j}\sqrt3-p_{2j}=-(2-\sqrt3)^j, \tag{4.6}$$
and by (4.1),
$$\varepsilon_{2j-1}=q_{2j-1}\sqrt3-p_{2j-1}=(q_{2j}-q_{2j-2})\sqrt3-(p_{2j}-p_{2j-2})=\varepsilon_{2j}-\varepsilon_{2j-2}=(\sqrt3-1)(2-\sqrt3)^{j-1}. \tag{4.7}$$
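The Ostrowski expansion (4.5) can be computed explicitly. The sketch below is a minimal illustration, assuming the greedy algorithm (divide by the largest $q_i\le n$ first), which is the standard way to obtain such an expansion but is not spelled out in the text; the function name `ostrowski_sqrt3` is hypothetical.

```python
def ostrowski_sqrt3(n: int):
    """Return (digits, q) with n = sum of digits[i] * q[i] over i >= 2.
    Digits are found greedily, starting from the largest q_i."""
    q = [None, 1, 1]                                  # q_1 = q_2 = 1, 1-indexed
    while q[-1] <= n:
        i = len(q)
        a_prev = 1 if (i - 1) % 2 == 1 else 2         # a_{i-1} for sqrt(3)
        q.append(a_prev * q[i - 1] + q[i - 2])
    digits = {}
    for i in range(len(q) - 1, 1, -1):                # stop at i = 2, so b_1 = 0
        digits[i], n = divmod(n, q[i])
    return digits, q

def satisfies_rules(digits) -> bool:
    """Check 0 <= b_i <= a_i and the Extra Rule: b_i = a_i forces b_{i-1} = 0."""
    for i, b in digits.items():
        a_i = 1 if i % 2 == 1 else 2
        if not 0 <= b <= a_i:
            return False
        if b == a_i and digits.get(i - 1, 0) != 0:
            return False
    return True

for n in range(1, 500):
    digits, q = ostrowski_sqrt3(n)
    assert sum(b * q[i] for i, b in digits.items()) == n
    assert satisfies_rules(digits)
```

For example, $n=10$ gives $10=2q_4+2q_2=2\cdot4+2\cdot1$, with $b_3=0$ forced by the Extra Rule.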
Thus we have
$$q_{2j}|\varepsilon_{2j}|=\frac{1}{2\sqrt3}\Big(1-(2-\sqrt3)^{2j}\Big), \tag{4.8}$$
and
$$q_{2j-1}|\varepsilon_{2j-1}|=\frac{1}{\sqrt3}\Big(1+(2-\sqrt3)^{2j-1}\Big). \tag{4.9}$$
By using these facts in Ostrowski’s formula (1.55), for 0 n < qm we have (we recall that b1 D 0) 1 0 1 X q q j" j j j" b 1 j i i i i AD Sp3 .n/ D .1/i C1 bi @ C bj A qi j"i j C C@ 2 2 q 2 i i D2 2j
m1 X
0
1 X p p b 1 1 2i ij D b2i @ C p C p .b2j C . 3 C 1/b2j C1 /.2 3/ A C 2 4 3 2 3 1j
X
0
1 b2i1 1 X b2j 1 C b2i1 @ C p C p C 2 2 3 3 1j
p
3C1 b2j 2
! .2
1
p
ij A
3/
C O.1/; (4.10)
where of course bi D bi .n/ depends on n (see (4.5)). Following the method of Sect. 1.5, to restore independence, we sacrifice the p Ostrowski representation (4.5) and switch to a .2 C 3/ scale representation of real numbers. Instead of taking any integer n in 0 n < qm , where qm is defined in (4.3) and (4.4), we write an arbitrary real number in the interval p 1 0 < p ˇ m D q2m C o.1/; where ˇ D 2 C 3; 2 3 in the form
p p 1
D p b2m1 .1C 3/ˇ m1 Cb2m2 ˇ m1 Cb2m3 .1C 3/ˇ m2 Cb2m4 ˇ m2 C ; 2 3 (4.11)
which is analogous to (1.189). We can guarantee uniqueness in (4.11) by choosing D b2i . / 2 f0; 1; 2g; b2i b2i 1 D b2i 1 . / 2 f0; 1g
and by enforcing the Extra Rule: D 2 implies b2i b2i 1 D 0 and b2i 1 D 1 implies b2i 2 D 0:
(4.12)
Notice that expansion (4.11) is infinite: it is defined for all i < m, including all negative integers for i . Since the key Eq. (4.10) is expressed in terms of the Ostrowski coefficients bi D bi .n/, we have to clarify the relation between bi .n/pand bi . /. First we replace Sp3 .n/ in (4.10) with the new sum (where ˇ D 2 C 3) S . /DSp . /D 3
0
X 1i m1
1 X p b 1 1 2i @ j i A C p C p C b2i .b2j C. 3C1/b2j C1 /ˇ 2 4 3 2 3 1j
0
1 X b2i 1 @ p1 C p C b2i C b2j 1 1 C 2 2 3 3 2i m 1j
X
p
1 ! 3C1 b2j ˇ j i A C O.1/; 2
(4.13) which is an analog of (1.192). In (4.13) we ignored the (negligible!) contribution of the digits bi with i < 2 (instead of 2 we could take any other starting value in (4.13)). The advantage of working with bi instead of bi p is that the random variables bi D bi . / (as runs p m in the interval 0 < < ˇ =2 3 with ˇ D 2 C 3, which forces that the initial distribution is the stationary distribution) form a homogeneous Markov chain, and again we can restore independence by a simple trick (see below). By repeating the arguments in (1.192)–(1.194), we see that for the overwhelming majority of integer n Sp3 .n/ D Sp . / C O.1/ for all n 3
1 1
In view of this, it suffices to prove the central limit theorem for the sum Sp . / 3 defined in (4.13).
4.1.1 Renewal Versus Self-Similarity

The key technical idea of Sect. 1.5 was to restore independence by decomposing the digit sequence $b_{m-1}b_{m-2}\cdots b_2$ into 0s and 10s, that is,
$$b_{m-1}b_{m-2}\cdots b_2=0\ 0\ 10\ 0\ 0\ 10\ 10\ 0\ 10\ 0\ \cdots,$$
or formally
$$b_{m-1}b_{m-2}\cdots b_2=B_1B_2\cdots B_t,$$
where $B_i=0$ or $10$, and $t$ is a random variable (see (1.198)).
Here 0 and 10 mean that we always stop at the state 0, and start a “new life” until we visit state 0 again. Since the Markov chain has a short-term memory, these “new lives” are independent of each other. This argument can be easily generalized to any (homogeneous and ergodic) Markov chain. Let $i$ be an arbitrary state, and let $\text{start}=0<T_1<T_2<T_3<\cdots$ denote the times of successive visits to state $i$. We call the restriction of the Markov chain to the time interval $(T_{j-1},T_j]$ the $j$th “life,” and denote it by $X_j$ (we set $T_0=0$). Again the short-term memory of the Markov chain implies that $X_1,X_2,X_3,\ldots$ are independent and that $X_2,X_3,X_4,\ldots$ are identically distributed. (On the other hand, $X_1$ does not have this common distribution in general, unless the Markov chain began in state $i$.) We refer to this argument as the “renewal principle” of the Markov chain.

The application of the renewal principle for $\alpha=\sqrt2$ would give the infinitely many blocks 0, 20, 10, 120, 110, 1120, 1110, 11120, ... in (1.136). Instead, in Sect. 1.4 we managed to achieve independence with a very short list of three: 0, 1, 20, and the reason behind it was self-similarity. We can say that “self-similarity” is an elegant shortcut for the renewal principle. What we do below is exactly this shortcut.

Let's return to Sect. 1.5: the decomposition in (1.198) was motivated by the simple equation $1+\Phi^{-1}=\Phi$ with $\Phi=(1+\sqrt5)/2$. The geometric motivation is the following: we split the interval $(0,\Phi)=(0,(1+\sqrt5)/2)$ into the unit interval $(0,1)$ and the shorter interval $(1,\Phi)=(1,(1+\sqrt5)/2)$; $(0,1)$ represents 0 and $(1,\Phi)$ represents the block 10. Since the length of $(1,\Phi)$ is $\Phi^{-1}$, we have self-similarity.

For $\alpha=\sqrt3$ recurrence (4.1) motivates the following more complicated picture. We divide the interval $(0,\beta)=(0,2+\sqrt3)$ into 4 subintervals: the first two unit intervals, $(0,1)$ and $(1,2)$, the shorter interval $(2,1+\sqrt3)$, and the last interval $(1+\sqrt3,2+\sqrt3)$ of length one. Here $(0,1)$ and $(1,2)$ represent 00 and 01, $(2,1+\sqrt3)$ represents the block 020, and $(1+\sqrt3,2+\sqrt3)$ represents the block 10. But here comes the technical problem: the length of the short interval $(2,1+\sqrt3)$ is $\sqrt3-1$, which is not $\beta^{-1}$. To achieve self-similarity, this “short distance” forces us into an infinite cycle as follows. The short interval $(2,1+\sqrt3)$ of length $\sqrt3-1$ (representing 020) can be decomposed into two subintervals of length $\beta^{-1}$ (representing 0200 and 0201) and a shorter subinterval of length $(\sqrt3-1)\beta^{-1}$ (representing 02020). Next we apply to the shorter subinterval of length $(\sqrt3-1)\beta^{-1}$ the same kind of decomposition that we applied to $(2,1+\sqrt3)$, and so on. Formally, we have the infinite series:
$$\beta=2+\sqrt3=3+(\sqrt3-1)=3+2(2-\sqrt3)+(\sqrt3-1)(2-\sqrt3)=3+2\beta^{-1}+(\sqrt3-1)\beta^{-1}=$$
$$=3+2\beta^{-1}+2\beta^{-2}+(\sqrt3-1)\beta^{-2}=\cdots=3+2\sum_{k=1}^{\infty}\beta^{-k}. \tag{4.14}$$
An alternative way of obtaining (4.14) is to use the recurrence formulas in (4.1):
$$q_{2k}=q_{2k-1}+q_{2k-2}\quad\text{and}\quad q_{2k-1}=2q_{2k-2}+q_{2k-3},$$
which imply
$$q_{2k}=(2q_{2k-2}+q_{2k-3})+q_{2k-2}=3q_{2k-2}+q_{2k-3}=3q_{2k-2}+(2q_{2k-4}+q_{2k-5})=3q_{2k-2}+2q_{2k-4}+q_{2k-5}=$$
$$=3q_{2k-2}+2q_{2k-4}+2q_{2k-6}+q_{2k-7}=3q_{2k-2}+2q_{2k-4}+2q_{2k-6}+2q_{2k-8}+\cdots \tag{4.14'}$$
Equation (4.14) “means” that, instead of a simple decomposition into two types of “blocks” (0s and 10s), we need a decomposition into infinitely many types of “blocks,” such as
$$10,\ 00,\ 01,\ 0200,\ 0201,\ 020200,\ 020201,\ 02020200,\ 02020201,\ \ldots \tag{4.15}$$
The pattern in (4.15) is the following: every “block” must have even length (since the length of the period of $\sqrt3=[1;\overline{1,2}]$ is two), and no “block” can end with a “2” (since the Extra Rule forces an additional “0,” and that violates the parity). One can visualize (4.15) as an infinite “caterpillar” graph with infinitely long spine and with two legs on each vertex of the spine (the legs represent the endings 00 and 01).

Dividing Eq. (4.14) by $\beta=2+\sqrt3$, we have a probability distribution
$$1=3\beta^{-1}+2\beta^{-2}+2\beta^{-3}+2\beta^{-4}+\cdots, \tag{4.16}$$
meaning that every “block” in (4.15) has probability $\beta^{-j}$ if the length of the block is $2j$. As we said above, we decompose the digit sequence $b_{2m-1}b_{2m-2}b_{2m-3}b_{2m-4}\ldots b_2$ into “blocks”
$$b_{2m-1}b_{2m-2}b_{2m-3}b_{2m-4}\ldots b_2=B_1B_2\ldots B_t \tag{4.17}$$
where $B_i$ = 10 or 00 or 01 or 0200 or 0201 or 020200 or 020201 or 02020200 or 02020201 or 0202020200 or 0202020201 or ...; see (4.15), and we have
$$\Pr[B_i=\text{block listed in (4.15) of length } 2j]=\beta^{-j} \tag{4.18}$$
where $\beta=2+\sqrt3$. To describe the random variable $t$ in (4.17), we need to know the expected length of a “block” listed in (4.15). The expected length is
$$2\cdot3\beta^{-1}+4\cdot2\beta^{-2}+6\cdot2\beta^{-3}+8\cdot2\beta^{-4}+\cdots=2\beta^{-1}+4\beta^{-1}\sum_{k=1}^{\infty}k\beta^{-k+1}=2\beta^{-1}+4\beta^{-1}\frac{1}{(1-\beta^{-1})^2}=6-2\sqrt3=2.536\ldots \tag{4.19}$$
What happens if $\sqrt{3} = [1; \overline{1, 2}]$ is replaced with some more complicated quadratic irrational, say, $\sqrt{7} = [2; \overline{1, 1, 1, 4}]$ or $\sqrt{19} = [4; \overline{2, 1, 3, 1, 2, 8}]$?
Let’s begin with $\sqrt{7} = [2; \overline{1, 1, 1, 4}]$. Because its period has length four, to enforce independence, every “block” in the decomposition analogous to (4.15) must have a length that is divisible by 4. To figure out the analog of (4.15), we work with the denominators $q_i$ of the convergents $p_i/q_i$ of $\sqrt{7}$ the same way as we did with $\sqrt{3}$ above, see (15.14’). We use the recurrence formulas
\[ q_{4k} = q_{4k-1} + q_{4k-2},\quad q_{4k-1} = q_{4k-2} + q_{4k-3},\quad q_{4k-2} = q_{4k-3} + q_{4k-4},\quad q_{4k-3} = 4q_{4k-4} + q_{4k-5}, \]
and putting them together, we obtain an analog of (15.14’):
\[ q_{4k} = q_{4k-1} + q_{4k-2} = (q_{4k-2} + q_{4k-3}) + q_{4k-2} = 2q_{4k-2} + q_{4k-3} = 2(q_{4k-3} + q_{4k-4}) + q_{4k-3} = 3(4q_{4k-4} + q_{4k-5}) + 2q_{4k-4} = 14q_{4k-4} + 3q_{4k-5}. \tag{4.20} \]
Note that (4.20) is a special case of the general recurrence formula
\[ q_{4k} = q_5\, q_{4k-4} + q_4\, q_{4k-5}, \tag{4.21} \]
since $[2; 1, 1, 1] = 8/3$ gives $q_4 = 3$ and $[2; 1, 1, 1, 4] = 37/14$ gives $q_5 = 14$. For $\sqrt{19} = [4; \overline{2, 1, 3, 1, 2, 8}]$ the analog of (4.20) and (4.21) is
\[ q_{6k} = q_7\, q_{6k-6} + q_6\, q_{6k-7} = 326\, q_{6k-6} + 39\, q_{6k-7}. \tag{4.22} \]
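The recurrences (4.20)–(4.22) are easy to check by machine. The following is a minimal Python sketch, not part of the original argument; the function name `denominators` is hypothetical, and the indexing convention $q_0 = 0$, $q_1 = 1$ is an assumption inferred from the values $q_4 = 3$, $q_5 = 14$ quoted above.

```python
def denominators(a0, period, count):
    """Return [q_0, q_1, ..., q_count] for the continued fraction [a0; period, period, ...].

    Assumed convention (inferred from the text): q_0 = 0, q_1 = 1, and
    q_{i+1} = a_i * q_i + q_{i-1}, where a_1, a_2, ... run through the period.
    """
    quotients = [a0] + [period[i % len(period)] for i in range(count - 1)]
    q_prev, q = 0, 1
    qs = [q_prev, q]
    for a in quotients[1:]:
        q_prev, q = q, a * q + q_prev
        qs.append(q)
    return qs


if __name__ == "__main__":
    q7 = denominators(2, [1, 1, 1, 4], 13)          # sqrt(7)
    q19 = denominators(4, [2, 1, 3, 1, 2, 8], 19)   # sqrt(19)
    print("sqrt(7):  q_4 =", q7[4], " q_5 =", q7[5])
    print("sqrt(19): q_6 =", q19[6], " q_7 =", q19[7])
    # Check (4.20) and (4.22) for the first few k.
    for k in (2, 3):
        print(q7[4 * k] == 14 * q7[4 * k - 4] + 3 * q7[4 * k - 5],
              q19[6 * k] == 326 * q19[6 * k - 6] + 39 * q19[6 * k - 7])
```

The same function can be applied to any other purely periodic tail by changing `a0` and `period`.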
The description of the “blocks” (i.e., the analog of (4.15)) for $\sqrt{7}$ and $\sqrt{19}$ is more complicated than that for $\sqrt{3}$. We recall Eq. (4.20) for $\sqrt{7} = [2; \overline{1, 1, 1, 4}]$:
\[ q_{4k} = 14q_{4k-4} + 3q_{4k-5}; \]
it means that there are 14 admissible sequences $S = (S_1, S_2, S_3, S_4)$ (i.e., satisfying the corresponding Extra Rule) with $S_4 \ne 4$ (= last digit of the period):
0000, 0001, 0002, 0003, 0010, 0100, 0101, 0102, 0103, (15.23’)
1000, 1001, 1002, 1003, 1010. (15.23”)
Furthermore, there are three admissible sequences $S = (S_1, S_2, S_3, S_4)$ satisfying $S_4 = 4$, and by the Extra Rule each one of these sequences is extended to length five with an additional extra 0:
\[ 0004 \to 0,\quad 0104 \to 0,\quad 1004 \to 0. \tag{4.24} \]
To describe the “blocks” for $\sqrt{7}$, we can use the following Markov chain (= asymmetric random walk on a graph with 4 vertices, labeled as $A_1$, $A_2$, $BH(1)$, $BH(2)$).
State $A_1$ consists of the single sequence 1004; $A_1$ represents the case $S_1 \ne 0$, $S_4 = 4$. State $A_2$ consists of the two sequences 0004 and 0104; $A_2$ represents the case $S_1 = 0$, $S_4 = 4$. Note that $A_1$ and $A_2$ together give (4.24). Moreover, there are two “black holes”: $BH(1)$ and $BH(2)$. $BH(1)$ is an isolated black hole, and $BH(2)$ is a non-isolated black hole. The non-isolated black hole $BH(2)$ consists of the 9 admissible sequences listed in (15.23’); the isolated black hole $BH(1)$ consists of the 5 sequences listed in (15.23”). $A_1$ is a “no-return” state: from $A_1$ we must go either to $A_2$ or to $BH(2)$. $A_2$ is a “transient” state: from $A_2$ we may go either to $BH(2)$ or to $A_2$ (“return”).
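The four states can be reproduced by brute-force enumeration. The sketch below is illustrative only: it assumes that, within one period, the Extra Rule reads “a digit equal to its maximum value $a_i$ must be followed by 0,” which is an inference from the lists above rather than a quotation of the rule; the names `admissible` and `classify` are hypothetical.

```python
from itertools import product


def admissible(period):
    """All digit tuples (S_1, ..., S_l) with 0 <= S_i <= a_i such that a digit equal
    to its maximum a_i is followed by 0 (assumed reading of the Extra Rule)."""
    length = len(period)
    result = []
    for s in product(*(range(a + 1) for a in period)):
        if all(s[i] < period[i] or s[i + 1] == 0 for i in range(length - 1)):
            result.append(s)
    return result


def classify(period):
    """Split the admissible sequences into the states A1, A2, BH(1), BH(2)."""
    last = period[-1]
    states = {"A1": [], "A2": [], "BH(1)": [], "BH(2)": []}
    for s in admissible(period):
        if s[-1] == last:                       # last digit maximal: extended by an extra 0
            key = "A1" if s[0] != 0 else "A2"
        else:                                   # "black holes"
            key = "BH(1)" if s[0] != 0 else "BH(2)"
        states[key].append("".join(map(str, s)))
    return states


if __name__ == "__main__":
    for name, seqs in classify((1, 1, 1, 4)).items():   # period of sqrt(7)
        print(name, len(seqs), seqs)
```

Because `classify` only needs the period, it can be run with other periods as well to count the sequences in each state.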
Now we are ready to classify the “blocks” for $\sqrt{7}$; the following is an analog of (4.15):
\[ \begin{gathered}
BH(1) \\
BH(2) \\
A_1\,BH(2) \\
A_1A_2\,BH(2),\ A_1A_2A_2\,BH(2),\ A_1A_2A_2A_2\,BH(2),\ \ldots,\ A_1A_2\ldots A_2\,BH(2),\ \ldots \\
A_2\,BH(2),\ A_2A_2\,BH(2),\ A_2A_2A_2\,BH(2),\ \ldots,\ A_2\ldots A_2\,BH(2),\ \ldots
\end{gathered} \tag{4.25} \]
In (4.25) there are $9 + 5 = 14$ “blocks” of length 4; $1\cdot 9 + 2\cdot 9 = 27$ “blocks” of length 8; $1\cdot 2\cdot 9 + 2\cdot 2\cdot 9 = 54$ “blocks” of length 12; and in general there are $27\cdot 2^{k-2}$ “blocks” of length $4k$ ($k \ge 2$). We have the following analog of (4.17) and (4.18): we decompose the digit sequence $b_{4m-1} b_{4m-2} b_{4m-3} b_{4m-4} \ldots b_2$ into “blocks”
\[ b_{4m-1} b_{4m-2} b_{4m-3} b_{4m-4} \ldots b_2 = B_1 B_2 \ldots B_t \tag{4.26} \]
where $B_i$ is a “block” defined by (4.25), and we have the probability distribution
\[ \Pr[B_i = \text{block listed in (4.25) of length } 4j] = \beta^{-j} \tag{4.27} \]
where $\beta = 8 + 3\sqrt{7}$ is the fundamental unit in the real quadratic field $\mathbb{Q}(\sqrt{7})$. To describe the random variable $t$ in (4.26), we need to know the expected length of the “blocks” listed in (4.25). The expected length is
\[ 4\cdot 14\beta^{-1} + 8\cdot 27\beta^{-2} + 12\cdot 54\beta^{-3} + \cdots = 2\beta^{-1} + 54\beta^{-1}\sum_{k=1}^{\infty} k(2\beta^{-1})^{k-1} = 2\beta^{-1} + 54\beta^{-1}\frac{1}{(1-2\beta^{-1})^2}, \tag{4.28} \]
which is a number between 4 and 5.
Next we switch from $\sqrt{7}$ to $\sqrt{19} = [4; \overline{2, 1, 3, 1, 2, 8}]$. We recall Eq. (4.22),
\[ q_{6k} = 326\, q_{6k-6} + 39\, q_{6k-7}, \]
which implies that there are 326 admissible sequences $S = (S_1, \ldots, S_6)$ of length six with $S_6 \ne 8$ (= last digit of the period 2, 1, 3, 1, 2, 8 of $\sqrt{19}$), and there are 39 sequences with $S_6 = 8$. The Extra Rule forces that each one of these 39 sequences is extended to length seven with an additional 0. Due to the large numbers, 326 and 39, we are not going to give the list of these sequences. Instead, we just give a brief description of the corresponding 4-state Markov chain, which is basically the same as that for $\sqrt{7}$ above.
Again we have four states: the no-return $A_1$, the transient $A_2$, the isolated black hole $BH(1)$, and the non-isolated black hole $BH(2)$. State $A_1$ represents the case $S_1 \ne 0$, $S_6 = 8$; it consists of 25 sequences. State $A_2$ represents the case $S_1 = 0$, $S_6 = 8$; it consists of 14 sequences. The isolated black hole $BH(1)$ represents the case $S_1 \ne 0$, $S_6 \ne 8$; it consists of 209 sequences. The non-isolated black hole $BH(2)$ represents the case $S_1 = 0$, $S_6 \ne 8$; it consists of 117 sequences. $A_1$ is a no-return state: from $A_1$ we must go either to $A_2$ or to $BH(2)$. $A_2$ is a transient state: from $A_2$ we may go either to $BH(2)$ or to $A_2$ (“return”).
Equation (4.25) gives the complete classification of the “blocks” for $\sqrt{19}$. The only novelty is in the number of admissible sequences corresponding to the four states $A_1$, $A_2$, $BH(1)$, $BH(2)$; let $|A_1|$, $|A_2|$, $|BH(1)|$, $|BH(2)|$ denote these numbers. We recall that
\[ |A_1| + |A_2| = q_6 = \text{denominator of } [0; 2, 1, 3, 1, 2] = \frac{14}{39} \]
and
\[ |BH(1)| + |BH(2)| = q_7 = \text{denominator of } [0; 2, 1, 3, 1, 2, 8] = \frac{117}{326}, \]
where 2, 1, 3, 1, 2, 8 is the period of $\sqrt{19} = [4; \overline{2, 1, 3, 1, 2, 8}]$.
To determine $|A_2|$ and $|BH(2)|$, we simply delete the first digit “2” in the period:
\[ |A_2| = \text{denominator of } [0; 1, 3, 1, 2] = \frac{11}{14} \quad\text{and}\quad |BH(2)| = \text{denominator of } [0; 1, 3, 1, 2, 8] = \frac{92}{117}, \]
that is, $|A_2| = 14$ and $|BH(2)| = 117$.
We leave the proof of this simple recipe to the reader; we just note that for $\sqrt{7} = [2; \overline{1, 1, 1, 4}]$ the same recipe gives the correct numbers $|A_2| = 2$ and $|BH(2)| = 9$:
\[ |A_2| = \text{denominator of } [0; 1, 1] = \frac{1}{2} \quad\text{and}\quad |BH(2)| = \text{denominator of } [0; 1, 1, 4] = \frac{5}{9}. \]
Therefore, in (4.25) with $\sqrt{19}$ we have the following block distribution: 326 “blocks” of length 6; $25\cdot 117 + 14\cdot 117$ “blocks” of length 12; $25\cdot 14\cdot 117 + 14\cdot 14\cdot 117$ “blocks” of length 18; and in general, there are $117\cdot 39\cdot 14^{k-2}$ “blocks” of length $6k$ ($k \ge 2$). We have the following analog of (4.17) and (4.18): we decompose the digit sequence $b_{6m-1} b_{6m-2} b_{6m-3} \ldots b_2$ into “blocks”
\[ b_{6m-1} b_{6m-2} b_{6m-3} \ldots b_2 = B_1 B_2 \ldots B_t \tag{4.29} \]
where $B_i$ is a “block” defined by (4.25), and we have the probability distribution
\[ \Pr[B_i = \text{block listed in (4.25) of length } 6j] = \beta^{-j} \tag{4.30} \]
where $\beta = 170 + 39\sqrt{19}$ is the fundamental unit in the real quadratic field $\mathbb{Q}(\sqrt{19})$. To describe the random variable $t$ in (4.29), we need to know the expected length of the “blocks” listed in (4.25) with $\sqrt{19}$. The expected length is
\[ 6\cdot 326\beta^{-1} + \frac{6\cdot 39\cdot 117}{14^2}\sum_{k=2}^{\infty} k(14\beta^{-1})^{k}, \tag{4.31} \]
which is a number between 6 and 7.
Of course, we need to know the expectation and the variance. By Proposition 2.1, for
\[ \sqrt{3} = [1; \overline{1, 2}],\quad \sqrt{7} = [2; \overline{1, 1, 1, 4}],\quad \sqrt{19} = [4; \overline{2, 1, 3, 1, 2, 8}] \]
we have
\[ \begin{aligned}
M_{\sqrt{3}}(N) &= \frac{-1+2}{12}\cdot\frac{\log N}{\log(2+\sqrt{3})} + O(1) = \frac{\log N}{12\log(2+\sqrt{3})} + O(1), \\
M_{\sqrt{7}}(N) &= \frac{-1+1-1+4}{12}\cdot\frac{\log N}{\log(8+3\sqrt{7})} + O(1) = \frac{\log N}{4\log(8+3\sqrt{7})} + O(1), \\
M_{\sqrt{19}}(N) &= \frac{-2+1-3+1-2+8}{12}\cdot\frac{\log N}{\log(170+39\sqrt{19})} + O(1) = \frac{\log N}{4\log(170+39\sqrt{19})} + O(1).
\end{aligned} \]
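As a sanity check on the block distributions (4.16), (4.27), (4.30) and the expected block lengths (4.19), (4.28), (4.31), one can simply sum the series numerically. The following Python sketch does this from the block counts stated in the text; the helper name `block_stats` and its parameters are hypothetical, and the series are truncated after a fixed number of terms, which is harmless here because they converge geometrically.

```python
from math import sqrt, isclose


def block_stats(beta, period_len, first_count, base_count, ratio, terms=80):
    """Total probability and expected block length.

    Model taken from the text: there are `first_count` blocks of length period_len,
    and base_count * ratio**(k - 2) blocks of length k * period_len for k >= 2;
    every block of length k * period_len has probability beta**(-k).
    """
    x = 1.0 / beta
    total = first_count * x
    mean = period_len * first_count * x
    for k in range(2, terms + 1):
        mass = base_count * x * x * (ratio * x) ** (k - 2)   # probability of all blocks of length k*period_len
        total += mass
        mean += k * period_len * mass
    return total, mean


if __name__ == "__main__":
    cases = {
        "sqrt(3)": (2 + sqrt(3), 2, 3, 2, 1),
        "sqrt(7)": (8 + 3 * sqrt(7), 4, 14, 27, 2),
        "sqrt(19)": (170 + 39 * sqrt(19), 6, 326, 117 * 39, 14),
    }
    for name, args in cases.items():
        total, mean = block_stats(*args)
        print(f"{name}: total probability = {total:.12f}, expected length = {mean:.6f}")
```

In each case the total probability should come out as 1 up to rounding, and the expected length should fall in the range stated in the text.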
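Since the real quadratic fields $\mathbb{Q}(\sqrt{3})$, $\mathbb{Q}(\sqrt{7})$, $\mathbb{Q}(\sqrt{19})$ all have class number one, by using Proposition 3.7 we can easily compute the variance:
\[ \begin{aligned}
V_{\sqrt{3}}(N) &= \frac{1}{240\sqrt{3}}\Biggl(\sum_{\substack{b^2+ac=3:\\ a>0,\,c>0}} a\Biggr)\frac{\log N}{\log(2+\sqrt{3})} + O\bigl(\log\log N\,\sqrt{\log N}\bigr), \\
V_{\sqrt{7}}(N) &= \frac{1}{240\sqrt{7}}\Biggl(\sum_{\substack{b^2+ac=7:\\ a>0,\,c>0}} a\Biggr)\frac{\log N}{\log(8+3\sqrt{7})} + O\bigl(\log\log N\,\sqrt{\log N}\bigr), \\
V_{\sqrt{19}}(N) &= \frac{1}{240\sqrt{19}}\Biggl(\sum_{\substack{b^2+ac=19:\\ a>0,\,c>0}} a\Biggr)\frac{\log N}{\log(170+39\sqrt{19})} + O\bigl(\log\log N\,\sqrt{\log N}\bigr).
\end{aligned} \]
What we actually need is the standard deviation:
\[ \begin{aligned}
\sqrt{V_{\sqrt{3}}(N)} &= \left(\frac{1}{24\sqrt{3}\,\log(2+\sqrt{3})}\right)^{1/2}\sqrt{\log N} + O(\log\log N), \\
\sqrt{V_{\sqrt{7}}(N)} &= \left(\frac{1}{6\sqrt{7}\,\log(8+3\sqrt{7})}\right)^{1/2}\sqrt{\log N} + O(\log\log N), \\
\sqrt{V_{\sqrt{19}}(N)} &= \left(\frac{33}{40\sqrt{19}\,\log(170+39\sqrt{19})}\right)^{1/2}\sqrt{\log N} + O(\log\log N).
\end{aligned} \]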
Note that the effect of the error term $O(\log\log N)$ is obviously negligible. The rest of the proof of Theorem 1.2 for an arbitrary quadratic irrational $\alpha$ goes along the lines of the arguments in Sect. 1.5 (i.e., the special case of the golden ratio $\alpha = (\sqrt{5}+1)/2$); we mean the arguments from (1.202) till the end. Note that in Sect. 1.5 we had a minor “parity problem,” due to the fact that the length of the period of the golden ratio is odd, that we could easily eliminate by the ad hoc trick of “slightly extending the groups of odd size in (1.203) going up to the first single 0.” Our illustrative examples in this section, $\alpha = \sqrt{3}, \sqrt{7}, \sqrt{19}$, all have the property that the length of the period is even, avoiding the “parity problem.” In general, if the length of the period of the continued fraction for $\alpha$ is odd, then we have two options to eliminate the “parity problem.” Either we use the ad hoc trick of Sect. 1.5, or we simply double the period (pretending that the double period is the period).
Note that in Proposition 1.19 the integral parameter $n$ runs in the special interval $0 \le n < q_m$, where $q_m$ is the $m$th Fibonacci number. Similarly, for $\alpha = \sqrt{3}$ we studied the special interval $0 \le n < q_{2m}$, where $q_{2m}$ is the denominator of the $(2m)$th convergent of $\sqrt{3}$, see (4.11). We made this restriction for the sake of simplicity: to guarantee that the initial distribution is the stationary distribution. We can easily generalize from the special interval $0 \le n < q_{2m}$ (for $\sqrt{3}$) to an arbitrary interval $0 \le n < N$ by using the simple trick explained in the Concluding Remarks at the end of Sect. 1.5, see (1.237)–(1.239). Of course, the same applies for any quadratic irrational $\alpha$ (not just for $\sqrt{3}$).
There are two more technical details that we have to address here: namely, (1) how to prove the analog of (1.223) in general, that is, the fact that the correlation tends to zero exponentially fast and (2) how to prove the analog of (1.208) in general (“exponentially small tail probability for the return time”). We begin with the generalization of (1.223). In Sect. 1.5 we could carry out a direct approach: we could prove (1.223) by a direct computation, because the transition matrix of the corresponding Markov chain was a small 2-by-2 matrix, and it was easy to determine the eigenvalues explicitly. In the general case we can avoid the explicit calculation of the eigenvalues by using the following indirect “metric space” approach.
4.1.2 Ergodic Markov Chains: Exponentially Fast Convergence to the Stationary Distribution
This is a basic result in the theory of Markov chains. In spite of its importance, most books in probability theory either avoid the proof, or just give a (trivial) illustration on a 2-by-2 matrix, or refer to very general results in ergodic theory. To make our book more or less self-contained, we decided to include a short proof.
A finite homogeneous Markov chain is called ergodic if there is an integer $s_0 \ge 1$ such that for any two states, say, states $i$ and $j$, the $s_0$-step transition probability $p_{i,j}(s_0)$ from $i$ to $j$ is strictly positive: $p_{i,j}(s_0) > 0$ for any $i$ and $j$. Ergodicity is equivalent to the fact that the $s_0$th power $A^{s_0}$ of the transition matrix $A = (p_{i,j})_{i,j}$ has the property that every entry is strictly positive. The term ergodic (which comes from statistical mechanics) will be justified by Lemma 4.2.
Assume that a finite and homogeneous Markov chain with transition matrix $A$ has $r$ states $1, 2, \ldots, r$, and let $\pi = \{\pi_1, \ldots, \pi_r\}$ be a probability distribution on $1, 2, \ldots, r$. Then $\pi A^k$ defines a probability distribution for any integer $k \ge 1$. Let $\pi' = \{\pi'_1, \ldots, \pi'_r\}$ and $\pi'' = \{\pi''_1, \ldots, \pi''_r\}$ be two arbitrary probability distributions on $1, 2, \ldots, r$. We define the “distance”
\[ \operatorname{dist}(\pi', \pi'') = \frac{1}{2}\sum_{i=1}^{r} |\pi'_i - \pi''_i|, \]
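which turns the space of all probability distributions on $1, 2, \ldots, r$ into a complete metric space. Trivially $0 \le \operatorname{dist}(\pi', \pi'') \le 1$. The following simple result plays a key role here. For the sake of completeness we include the short proof.
Lemma 4.1. If $Q = (q_{i,j})_{i,j}$ is an $r$-by-$r$ stochastic matrix, and $q_{i,j} \ge \delta > 0$ for all $1 \le i, j \le r$, then
\[ \operatorname{dist}(\pi' Q, \pi'' Q) \le (1 - \delta)\operatorname{dist}(\pi', \pi''). \]
Proof. First note that
\[ \operatorname{dist}(\pi', \pi'') = \frac{1}{2}\sum_{i=1}^{r} |\pi'_i - \pi''_i| = \frac{1}{2}\sum_{i=1}^{r} (\pi'_i - \pi''_i)^+ + \frac{1}{2}\sum_{i=1}^{r} (\pi''_i - \pi'_i)^+ = \sum_{i=1}^{r} (\pi'_i - \pi''_i)^+, \tag{4.32} \]
where $(x)^+ = x$ if $x > 0$ and $(x)^+ = 0$ if $x \le 0$ (the “positive part” of $x$). In the last step of (4.32) we used the trivial fact that
\[ 0 = 1 - 1 = \sum_{i=1}^{r} \pi'_i - \sum_{i=1}^{r} \pi''_i = \sum_{i=1}^{r} (\pi'_i - \pi''_i) = \sum_{i=1}^{r} (\pi'_i - \pi''_i)^+ - \sum_{i=1}^{r} (\pi''_i - \pi'_i)^+. \]
By (4.32) we have
\[ \operatorname{dist}(\pi' Q, \pi'' Q) = \sum_{j=1}^{r}\left(\sum_{i=1}^{r} (\pi'_i - \pi''_i)\, q_{i,j}\right)^{+}. \tag{4.33} \]
We claim that for some $j$
\[ \left(\sum_{i=1}^{r} (\pi'_i - \pi''_i)\, q_{i,j}\right)^{+} = 0. \tag{4.34} \]
Indeed, otherwise
\[ \sum_{i=1}^{r} \pi'_i q_{i,j} > \sum_{i=1}^{r} \pi''_i q_{i,j} \quad\text{for all } j, \]
and adding them up leads to the contradiction
\[ 1 = \sum_{j=1}^{r}\sum_{i=1}^{r} \pi'_i q_{i,j} > \sum_{j=1}^{r}\sum_{i=1}^{r} \pi''_i q_{i,j} = 1. \]
Therefore, at least one index $j = j_0$ is “missing” in (4.33), and so we have
\[ \begin{aligned}
\operatorname{dist}(\pi' Q, \pi'' Q) &= \sum_{\substack{j=1\\ j\ne j_0}}^{r}\left(\sum_{i=1}^{r} (\pi'_i - \pi''_i)\, q_{i,j}\right)^{+} \le \sum_{i=1}^{r} (\pi'_i - \pi''_i)^+ \Biggl(\sum_{\substack{j=1\\ j\ne j_0}}^{r} q_{i,j}\Biggr) \\
&= \sum_{i=1}^{r} (\pi'_i - \pi''_i)^+ \bigl(1 - q_{i,j_0}\bigr) \le \sum_{i=1}^{r} (\pi'_i - \pi''_i)^+ (1 - \delta) = (1 - \delta)\operatorname{dist}(\pi', \pi''),
\end{aligned} \]
completing the proof of Lemma 4.1. □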
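The contraction property of Lemma 4.1 can be illustrated numerically. The sketch below is only a spot check on randomly generated data, not a proof; all function names are hypothetical, and the random matrices are built so that every entry is strictly positive.

```python
import random


def dist(p, q):
    """Half of the l1 distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def push(p, Q):
    """Row vector times matrix: the distribution p after one step of the chain Q."""
    r = len(Q)
    return [sum(p[i] * Q[i][j] for i in range(r)) for j in range(r)]


def random_distribution(r, rng):
    weights = [0.05 + rng.random() for _ in range(r)]   # strictly positive weights
    total = sum(weights)
    return [w / total for w in weights]


def random_positive_stochastic(r, rng):
    return [random_distribution(r, rng) for _ in range(r)]


def check_contraction(r=5, trials=200, seed=0):
    """Return the largest observed ratio dist(pQ, qQ) / ((1 - delta) dist(p, q))."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        Q = random_positive_stochastic(r, rng)
        delta = min(min(row) for row in Q)              # q_{i,j} >= delta > 0
        p, q = random_distribution(r, rng), random_distribution(r, rng)
        lhs = dist(push(p, Q), push(q, Q))
        rhs = (1 - delta) * dist(p, q)
        worst = max(worst, lhs / rhs)
    return worst


if __name__ == "__main__":
    print("largest observed ratio:", check_contraction())
```

Lemma 4.1 predicts that the printed ratio never exceeds 1 (up to floating-point rounding).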
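Now let $\pi_0$ be a probability distribution on $1, 2, \ldots, r$, and let $\pi_n = \pi_0 A^n$ where $A$ is the $r$-by-$r$ transition matrix of an ergodic Markov chain. Lemma 4.1 implies that the sequence $\pi_n = \pi_0 A^n$, $n = 1, 2, 3, \ldots$, of probability distributions forms a Cauchy sequence in our complete metric space. Indeed, by Lemma 4.1 (applied with $Q = A^{s_0}$),
\[ \operatorname{dist}(\pi_n, \pi_{n+m}) = \operatorname{dist}(\pi_0 A^n, \pi_0 A^{n+m}) \le (1 - \delta)\operatorname{dist}(\pi_0 A^{n-s_0}, \pi_0 A^{n+m-s_0}) = (1 - \delta)\operatorname{dist}(\pi_{n-s_0}, \pi_{n+m-s_0}), \tag{4.35} \]
where every entry of $A^{s_0}$ is $\ge \delta > 0$ (“ergodicity”). Iterating (4.35) we have
\[ \begin{aligned}
\operatorname{dist}(\pi_n, \pi_{n+m}) &\le (1 - \delta)\operatorname{dist}(\pi_{n-s_0}, \pi_{n+m-s_0}) \le (1 - \delta)^2\operatorname{dist}(\pi_{n-2s_0}, \pi_{n+m-2s_0}) \\
&\le (1 - \delta)^3\operatorname{dist}(\pi_{n-3s_0}, \pi_{n+m-3s_0}) \le \cdots \le (1 - \delta)^k\operatorname{dist}(\pi_{n-ks_0}, \pi_{n+m-ks_0}) \le (1 - \delta)^k
\end{aligned} \tag{4.36} \]
as long as $n \ge k s_0$. By choosing $k \to \infty$ in (4.36) (which is possible as $n \to \infty$), we conclude that $\pi_n = \pi_0 A^n$, $n = 1, 2, 3, \ldots$, forms a Cauchy sequence. Since the metric space is complete, the limit exists, and $\lim_n \pi_n = \pi$ is a probability distribution. We have
\[ \pi A = \lim_{n\to\infty} \pi_n A = \lim_{n\to\infty} \pi_0 A^n A = \lim_{n\to\infty} \pi_{n+1} = \pi. \tag{4.37} \]
Next we show that the invariance property $\pi = \pi A$ uniquely determines the probability distribution $\pi$ (see (4.37)). Indeed, assume that $\pi' = \pi' A$ and $\pi'' = \pi'' A$; then
\[ \pi' = \pi' A = \pi' A^2 = \ldots = \pi' A^{s_0}, \]
and similarly
\[ \pi'' = \pi'' A = \pi'' A^2 = \ldots = \pi'' A^{s_0}, \]
and by Lemma 4.1,
\[ \operatorname{dist}(\pi', \pi'') = \operatorname{dist}(\pi' A^{s_0}, \pi'' A^{s_0}) \le (1 - \delta)\operatorname{dist}(\pi', \pi'') \]
for some $\delta > 0$, which implies that $\operatorname{dist}(\pi', \pi'') = 0$, i.e., $\pi' = \pi''$, proving the uniqueness.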
We have thus obtained that for any initial probability distribution $\pi_0$ on $1, 2, \ldots, r$, the limit
\[ \lim_{n\to\infty} \pi_0 A^n = \pi \]
exists, and it is independent of $\pi_0$.
We call the uniquely determined probability distribution $\pi$ the stationary distribution of the ergodic Markov chain. Let $i$ be an arbitrary state; by choosing the initial distribution $\pi_0$ = 1 on state $i$ and 0 on the rest of the states, $\pi_0 A^n$ is equal to the sequence of $n$-step transition probabilities $p_{i,j}(n)$, $j = 1, 2, \ldots, r$ (the starting point, state $i$, is fixed). Let $\pi = \{p_1, \ldots, p_r\}$ denote the stationary distribution. Then by (4.36),
\[ \frac{1}{2}\sum_{j=1}^{r} |p_{i,j}(n) - p_j| = \operatorname{dist}\Bigl(\pi_0 A^n, \lim_{\ell\to\infty} \pi_0 A^{\ell}\Bigr) \le (1 - \delta)^{n/s_0 - 1} = (1 - \delta)^{-1}(1 - \varepsilon)^n \tag{4.38} \]
with $1 - \varepsilon = (1 - \delta)^{1/s_0} < 1$. Since Eq. (4.38) holds for every $i = 1, 2, \ldots, r$, it proves
Lemma 4.2. In every finite ergodic Markov chain the speed of convergence of the $n$-step transition probability $p_{i,j}(n)$ to the limit $p_j$ (= the stationary probability of state $j$) is exponential (independently of state $i$).
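To see the bound (4.38) at work, here is a small Python sketch on a toy chain. The 3-state matrix `A` is a made-up example (not taken from the text) whose square is strictly positive, so $s_0 = 2$; the function names are hypothetical, and the stationary distribution is approximated by plain power iteration.

```python
def dist(p, q):
    """Half of the l1 distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def step(p, A):
    """One step of the chain: row vector p times the transition matrix A."""
    r = len(A)
    return [sum(p[i] * A[i][j] for i in range(r)) for j in range(r)]


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def convergence_table(A, s0, steps):
    """Rows (i, n, dist(e_i A^n, pi), bound), with bound = (1 - delta)^(n/s0 - 1)."""
    r = len(A)
    P = A
    for _ in range(s0 - 1):
        P = mat_mul(P, A)
    delta = min(min(row) for row in P)          # every entry of A^{s0} is >= delta
    assert delta > 0, "A^{s0} must be strictly positive"
    pi = [1.0 / r] * r
    for _ in range(10_000):                     # crude power iteration for the stationary distribution
        pi = step(pi, A)
    rows = []
    for i in range(r):
        p = [1.0 if j == i else 0.0 for j in range(r)]   # start in state i
        for n in range(1, steps + 1):
            p = step(p, A)
            rows.append((i, n, dist(p, pi), (1 - delta) ** (n / s0 - 1)))
    return rows


# Toy example: zero diagonal, so A itself is not positive, but A^2 is.
A = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]

if __name__ == "__main__":
    for i, n, d, bound in convergence_table(A, s0=2, steps=10):
        if i == 0:
            print(f"n={n:2d}  distance={d:.3e}  bound={bound:.3e}")
```

For this particular chain the actual distance decays faster than the bound; (4.38) only guarantees an exponential rate, not the optimal one.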
4.2 How to Use Lemma 4.2 to Find the Analog of (1.223) in General?
Let $\mathcal{A}$ denote the set of all admissible sequences $(B_1, B_2, \ldots, B_\ell)$ (i.e., satisfying the Extra Rule), where $\ell \ge 1$ is the length of the period of the quadratic irrational $\alpha$ in Theorem 1.2. Consider now the Markov chain, where the set of states is $\mathcal{A}$, that is, every admissible sequence represents a state, see the picture below. Why is it ergodic, and what is the corresponding $s_0$? We represent an admissible sequence (a vector) with a boldface letter, so $\mathbf{i} = (B_1, B_2, \ldots, B_\ell)$ denotes a state of the Markov chain. Let $\mathbf{0} = (0, 0, \ldots, 0)$; since $p_{\mathbf{i},\mathbf{0}} > 0$ for any state $\mathbf{i}$, and also $p_{\mathbf{0},\mathbf{j}} > 0$ for any state $\mathbf{j}$, the 2-step transition probabilities $p_{\mathbf{i},\mathbf{j}}(2) \ge p_{\mathbf{i},\mathbf{0}}\, p_{\mathbf{0},\mathbf{j}} > 0$ are all strictly positive. This proves the ergodicity with $s_0 = 2$. Therefore, the $n$-step transition probability $p_{\mathbf{i},\mathbf{j}}(n)$ converges to the limit $p_{\mathbf{j}}$ (= the stationary probability of state $\mathbf{j}$) exponentially fast.
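Let $\mathbf{i}^{(k)}$ denote the $k$th coordinate of state $\mathbf{i}$ (which is an $\ell$-dimensional vector). By using the exponentially fast convergence in Lemma 4.2, we have
\[ \begin{aligned}
p(b', b''; k_1, k_2; n) &= \sum_{\mathbf{i}:\ \mathbf{i}^{(k_1)} = b'} p_{\mathbf{i}} \sum_{\mathbf{j}:\ \mathbf{j}^{(k_2)} = b''} p_{\mathbf{i},\mathbf{j}}(n) \\
&= \sum_{\mathbf{i}:\ \mathbf{i}^{(k_1)} = b'} p_{\mathbf{i}} \sum_{\mathbf{j}:\ \mathbf{j}^{(k_2)} = b''} p_{\mathbf{j}} + O\bigl((1 - \varepsilon)^n\bigr) \\
&= \Biggl(\sum_{\mathbf{i}:\ \mathbf{i}^{(k_1)} = b'} p_{\mathbf{i}}\Biggr)\Biggl(\sum_{\mathbf{j}:\ \mathbf{j}^{(k_2)} = b''} p_{\mathbf{j}}\Biggr) + O\bigl((1 - \varepsilon)^n\bigr).
\end{aligned} \]
By using the notation
\[ p(b', b''; k_1, k_2) = \Biggl(\sum_{\mathbf{i}:\ \mathbf{i}^{(k_1)} = b'} p_{\mathbf{i}}\Biggr)\Biggl(\sum_{\mathbf{j}:\ \mathbf{j}^{(k_2)} = b''} p_{\mathbf{j}}\Biggr), \]
we conclude that $p(b', b''; k_1, k_2; n)$ converges to $p(b', b''; k_1, k_2)$ exponentially fast (as $n \to \infty$). Now this is the analog of (1.223) in the general case.
Next, we formulate a general form of inequality (1.208). Similarly to Lemma 4.2, this is another important result in the theory of Markov chains. Again, for the sake of completeness, we include a short proof.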
Lemma 4.3 (Exponentially small tail probability for the return time). Consider an arbitrary finite ergodic (homogeneous) Markov chain; let $s$ denote the number of states. Let 0 be any state, and let $\tau$ denote the return time from state 0 to itself (of course $\tau$ is a random variable). Then there is a real number $\gamma$ with $0 < \gamma < 1$ (possibly depending on the Markov chain) such that
\[ \Pr[\tau > ns] < \gamma^n \quad\text{for every integer } n \ge 1. \]
Proof. Let $\ldots, X_{-1}, X_0, X_1, X_2, \ldots$ denote our Markov chain. We have
\[ \Pr[X_1 = a_1, X_2 = a_2, \ldots, X_\ell = a_\ell \mid X_0 = a_0, X_{-1} = a_{-1}, \ldots, X_{-k} = a_{-k}] = \Pr[X_1 = a_1, X_2 = a_2, \ldots, X_\ell = a_\ell \mid X_0 = a_0], \]
which expresses the basic property that, conditional upon the present (“$X_0 = a_0$”), the future (“$X_1 = a_1, X_2 = a_2, \ldots, X_\ell = a_\ell$”) does not depend on the past (“$X_{-1} = a_{-1}, \ldots, X_{-k} = a_{-k}$”). Let $a$ be an arbitrary state, and write
\[ r_a(n) = \Pr[X_1 \ne 0, \ldots, X_n \ne 0 \mid X_0 = a]. \]
In the special case $a = 0$ we obtain $r_0(n) = \Pr[\tau > n]$. We prove the lemma in two steps. First, we show that, for any integer $n \ge 1$,
\[ \Pr[\tau > ns] = r_0(ns) \le \Bigl(\max_a r_a(s)\Bigr)^n, \]
where $s$ is the number of states in our Markov chain. Second, we show that
\[ \max_a r_a(s) < 1. \]
Combining the two steps, the lemma immediately follows.
The proof of the first step is a simple induction: for any state $b$,
\[ r_b(n_1 + n_2) \le r_b(n_1)\max_a r_a(n_2) \le \max_a r_a(n_1)\,\max_a r_a(n_2) = r(n_1)\, r(n_2), \quad\text{where } r(k) = \max_a r_a(k). \]
Similarly,
\[ r_b(n_1 + n_2 + n_3) \le r(n_1)\, r(n_2)\, r(n_3), \]
and in general $r_b(ns) \le r^n(s)$, proving the first step.
The proof of the second step is equally simple. Let $m$ be the minimum value of $k \ge 1$ such that $r_a(k) < 1$ (ergodicity implies that $m$ is finite). This is equivalent to the fact that there is a positive product
\[ p_{a_0,a_1}\, p_{a_1,a_2} \cdots p_{a_{m-1},a_m} > 0 \]
of transition probabilities $p_{a,b} = \Pr[X_{i+1} = b \mid X_i = a]$, where $a_0 = a, a_1, a_2, \ldots, a_{m-1}, a_m = 0$ is some sequence of states, and $m$ has the minimum property. We claim the inequality
\[ m \le s = \text{number of states}. \]
Suppose $m > s$; then the sequence $a_1, a_2, \ldots, a_{m-1}, a_m = 0$ must have a repetition: $a_i = a_j$ for some $1 \le i < j \le m$. But this clearly contradicts the minimum property of $m$. Indeed, if $j = m$, i.e., $a_i = a_m = 0$, then
\[ p_{a_0,a_1}\, p_{a_1,a_2} \cdots p_{a_{i-1},a_i} > 0 \]
is a shorter product; otherwise consider
\[ p_{a_0,a_1}\, p_{a_1,a_2} \cdots p_{a_{i-1},a_i}\, p_{a_j,a_{j+1}} \cdots p_{a_{m-1},a_m} > 0, \]
which is also shorter. This completes the proof of Lemma 4.3. □
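The two steps of the proof translate directly into a short computation. In the sketch below the quantities $r_a(n)$ are obtained from the one-step recursion $r_a(n) = \sum_{b \ne 0} p_{a,b}\, r_b(n-1)$, $r_a(0) = 1$ (which follows from the Markov property); the 4-state “lazy cycle” matrix is a made-up toy example, and the function names are hypothetical.

```python
def no_return_probs(A, n):
    """r_a(n) = Pr[X_1 != 0, ..., X_n != 0 | X_0 = a] for every state a (state 0 is the target)."""
    s = len(A)
    r = [1.0] * s                                # r_a(0) = 1
    for _ in range(n):
        # Condition on the first step: it must avoid state 0.
        r = [sum(A[a][b] * r[b] for b in range(1, s)) for a in range(s)]
    return r


def tail_check(A, n_max):
    """Rows (n, Pr[tau > n*s], (max_a r_a(s))**n) for n = 1, ..., n_max."""
    s = len(A)
    rho = max(no_return_probs(A, s))             # second step of the proof: rho < 1
    return rho, [(n, no_return_probs(A, n * s)[0], rho ** n) for n in range(1, n_max + 1)]


# Toy ergodic chain: stay put or move one step around a 4-cycle, each with probability 1/2.
LAZY_CYCLE = [[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.5, 0.0, 0.0, 0.5]]

if __name__ == "__main__":
    rho, rows = tail_check(LAZY_CYCLE, 6)
    print("max_a r_a(s) =", rho)
    for n, tail, bound in rows:
        print(f"n={n}  Pr[tau > n*s]={tail:.6f}  bound={bound:.6f}")
```

Any $\gamma$ strictly between $\max_a r_a(s)$ and 1 then serves as the constant in Lemma 4.3.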
Finally, note that, by using the above-mentioned analog of (1.223) and Lemma 4.3 (as an analog of (1.208)) the same way as we used (1.223) and (1.208) in Sect. 1.5 (i.e., in the special case of the golden ratio), we can easily complete the proof of Theorem 1.2 for any quadratic irrational $\alpha$.
4.3 Completing the Proof of Theorem 1.1
4.4 The Fourier Series Approach
In Sect. 4.1 we completed the proof of Theorem 1.2 by using Ostrowski’s explicit formula (1.55). An alternative approach is to work with the (truncated) Fourier series representation (see (3.4)):
\[ S_\alpha(n) - M_\alpha(N) = \sum_{j=1}^{T} \frac{\cos((2n+1)\pi j\alpha)}{2\pi j \sin(\pi j\alpha)} + O(\log\log N) \tag{4.39} \]
that holds for every $n$ in $1 \le n \le N$, where $T = T(N) = N\log N$ and $\alpha$ is any badly approximable number.
In this section we prove Theorem 1.1 by working out the details of the Fourier series approach. We recall that Theorem 1.1 is about the asymptotic behavior of the discrepancy
\[ S_\alpha(\gamma; n) = \sum_{\substack{1\le k\le n:\\ k\alpha \in (0,\gamma)\ (\mathrm{mod}\ 1)}} 1 \; - \; n\gamma \tag{4.40} \]
of the counting function of the irrational rotation $k\alpha$ (mod 1). Note that Sós [So2] has developed an analog of Ostrowski’s formula (1.55) for $S_\alpha(\gamma; n)$, and it is possible to prove Theorem 1.1 by basically repeating the arguments of Sect. 4.1 with Sós’s formula instead of Ostrowski’s formula. The advantage of the Fourier series approach is that it is far more flexible and works well even when we don’t know any “explicit formula.”
For simplicity, we discuss first the special case $\alpha = \sqrt{2}$ and $\gamma = 1/2$ of Theorem 1.1. We then have the following analog of (4.39):
\[ S_{\sqrt{2}}(1/2; n) - M_{\sqrt{2}}(1/2; N) = \sum_{\substack{1\le j\le T:\\ j\ \mathrm{odd}}} \frac{\cos((2n+1)\pi j\sqrt{2})}{\pi j \sin(\pi j\sqrt{2})} + O(\log\log N) \tag{4.41} \]
for every $n$ in $1 \le n \le N$, and again $T = T(N) = N\log N$. The proof is the same as that of (4.39). The only difference between (4.39) and (4.41) is that in the latter $j$ is restricted to the odd integers (i.e., half of the integers, explaining the extra factor of 2 in the denominator in (4.39)). Note that both (4.39) and (4.41) are based on the Fourier series
\[ ((x)) = -\sum_{j=1}^{\infty} \frac{\sin(2\pi j x)}{\pi j}, \]
and also, (4.41) uses identity (2.62), which expresses the centered characteristic function of the interval $(0, \gamma)$ modulo one as the difference $((x - \gamma)) - ((x))$.
4.4.1 Guiding Intuition
As $n$ runs in the interval $1 \le n \le N$, it is plausible to expect that the difference $S_{\sqrt{2}}(1/2; n) - M_{\sqrt{2}}(1/2; N)$ can be approximated by the following “stochastic variant” of (3.4):
\[ S_{\sqrt{2}}(1/2; n) - M_{\sqrt{2}}(1/2; N) = \sum_{\substack{1\le j\le N:\\ j\ \mathrm{odd}}} \frac{\cos(2\pi Y_j)}{2\pi j \sin(\pi j\alpha)} + \text{negligible}, \tag{4.42} \]
where $Y_1, Y_2, \ldots, Y_N$ are independent random variables, each one uniformly distributed in the unit interval $0 \le Y_j \le 1$. The guiding intuition gives a good insight, but it is far too vague. The actual method of the proof of Theorem 1.1 follows the technique of Sects. 1.5 and 4.1: as $n$ runs in the interval $0 < n < N$, we approximate $S_{\sqrt{2}}(1/2; n) - M_{\sqrt{2}}(1/2; N)$ with a sum of independent and identically distributed random variables:
\[ S_{\sqrt{2}}(1/2; n) - M_{\sqrt{2}}(1/2; N) = X_1 + X_2 + X_3 + \ldots + \text{negligible}; \]
see (4.83). And again, the independence comes from an underlying (homogeneous) Markov chain, see (4.45). We carry out the approximation in two steps: first we construct a sequence $\widetilde{X}_1, \widetilde{X}_2, \widetilde{X}_3, \ldots$ of almost independent random variables (see (4.71) and (4.72)) in a way similar to (1.204)–(1.206), and it is in the second step, after some “truncation” and “linearization,” that we obtain the truly independent random variables $X_1, X_2, X_3, \ldots$, see (4.83).
Similarly to Sect. 1.4, we obtain our Markov chain from working with the $(1 + \sqrt{2})$ scale representation of real numbers. Write $\lambda = 1 + \sqrt{2}$ for the fundamental unit in $\mathbb{Q}(\sqrt{2})$. For simplicity assume that the $N$ in (4.41) has the special form
\[ N = \lambda^m = (1 + \sqrt{2})^m. \]
(Note in advance that, to go from the special values $N = \lambda^m$ to arbitrary $N$s, we just repeat the argument of the Concluding Remark at the end of Sect. 1.5, see (1.237)–(1.239).) We recall that $p_j \pm q_j\sqrt{2} = (1 \pm \sqrt{2})^j$, and so
\[ q_j = \frac{(1 + \sqrt{2})^j - (1 - \sqrt{2})^j}{2\sqrt{2}} \quad\text{and}\quad p_j = \frac{(1 + \sqrt{2})^j + (1 - \sqrt{2})^j}{2}, \tag{4.43} \]
where $p_j/q_j$ is the $j$th convergent of $\sqrt{2}$. Similarly to (4.11) and (1.189), we write an arbitrary real number $\rho$ in the interval $0 < \rho < N = \lambda^m$ in the $\lambda = (1 + \sqrt{2})$ scale form
\[ \rho = b_{m-1}\lambda^{m-1} + b_{m-2}\lambda^{m-2} + b_{m-3}\lambda^{m-3} + \ldots, \tag{4.44} \]
where $b_j \in \{0, 1, 2\}$, and, as usual, (4.44) satisfies the Extra Rule: $b_j = 2$ implies $b_{j-1} = 0$ (this makes representation (4.44) unique). As $\rho$ runs in the interval $0 < \rho < N = \lambda^m$, the sequence of coefficients
\[ b_{m-1} = b_{m-1}(\rho),\quad b_{m-2} = b_{m-2}(\rho),\quad b_{m-3} = b_{m-3}(\rho),\ \ldots \tag{4.45} \]
forms a homogeneous Markov chain described in Sect. 1.4 (see (1.131) and (1.138)), and the initial distribution is in fact the stationary distribution.
We know that every integer $j \ge 1$ can be written in the form
\[ j = \sum_{k\ge 1} c_k q_k, \quad\text{where } c_k \in \{0, 1, 2\}, \tag{4.46} \]
and again we can make the representation (4.46) unique by enforcing the Extra Rule: $c_k = 2$ implies $c_{k-1} = 0$. This is the special case of Ostrowski’s representation (1.54) for $\alpha = \sqrt{2}$. For notational convenience, we rewrite (4.46) in a slightly different form. We replace a possible term $2q_k$ in (4.46) (i.e., $c_k = 2$) with $q_{k+1} - q_{k-1}$, and by repeated application of this operation we eventually obtain the new form:
\[ j = \sum_{k\ge 1} \varepsilon_k q_k, \quad\text{where } \varepsilon_k \in \{-1, 0, 1\}. \tag{4.47} \]
More precisely, for every integer $j \ge 1$ there exist integers $H = H(j) \ge h = h(j) \ge 1$, and $\varepsilon_k \in \{-1, 0, 1\}$ for all $h \le k \le H$ such that $\varepsilon_h \ne 0$, $\varepsilon_H \ne 0$, and
\[ j = \sum_{h\le k\le H} \varepsilon_k q_k, \quad\text{where } \varepsilon_k \in \{-1, 0, 1\}. \tag{4.48} \]
p
2D
X
X X p p p " k qk 2 "k .qk 2pk /D "k .1 2/k modulo 1:
hkH
hkH
hkH
(4.49) Combining (4.44) with (4.49), we have
j
p
2 .bm1 m1 C bm2 m2 C : : :/.
X
"k .1
p
2/k / D
hkH
D
X X
p .1/kC1 b` "k . 2 1/k` modulo 1:
(4.50)
`<m hkH
p p If ` > k then the factor . 2 1/k` D .1 C 2/`k in (4.50) is close to an integer: by (4.43) with j 1, .1 C
p
2/j D 2pj .1
p j 2/ :
Combining (4.50) with (4.51), we have
j
p 2
X `<mW h.j /`H.j /
.1/`C1 b` "` C
(4.51)
230
4 Proving Randomness
C
X
Xp . 2 1/d d 1
C
X
.1
.1/`Cd C1 b` "`Cd C
`<mW h.j /`Cd H.j /
X
p d 2/
d 1
.1/`d b` "`d modulo 1:
(4.52)
`<mW h.j /`d H.j /
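By using the notation
\[ G(\rho, j) = \sum_{\substack{\ell<m:\\ h(j)\le \ell\le H(j)}} (-1)^{\ell+1} b_\ell\, \varepsilon_\ell
+ \sum_{d\ge 1} (\sqrt{2} - 1)^d \Biggl( \sum_{\substack{\ell<m:\\ h(j)-d\le \ell\le H(j)-d}} (-1)^{\ell+d+1} b_\ell\, \varepsilon_{\ell+d} + \sum_{\substack{\ell<m:\\ h(j)+d\le \ell\le H(j)+d}} (-1)^{\ell} b_\ell\, \varepsilon_{\ell-d} \Biggr), \tag{4.53} \]
we can rewrite (4.52) as follows:
\[ \rho j\sqrt{2} \equiv G(\rho, j) \quad\text{modulo } 1, \tag{4.54} \]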
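and also we can rewrite the numerator in (4.41), which is the special case $\rho = n$ = integer:
\[ \cos\bigl((2\rho + 1)\pi j\sqrt{2}\bigr) = \cos\bigl(2\pi\rho j\sqrt{2} + \pi j\sqrt{2}\bigr) = \cos\bigl(2\pi G(\rho, j) + \pi j\sqrt{2}\bigr). \tag{4.55} \]
Next we estimate the denominator $j\sin(\pi j\sqrt{2})$ in (4.41). By (4.49) we have (where $\|y\|$ denotes, as usual, the distance of $y$ from the nearest integer)
\[ \|j\sqrt{2}\| = \Bigl\|\sum_{h\le k\le H} \varepsilon_k (1 - \sqrt{2})^k\Bigr\| \ge (\sqrt{2} - 1)^h - \sum_{h<k\le H} (\sqrt{2} - 1)^k \ge (\sqrt{2} - 1)^h - \sum_{k=h+1}^{\infty} (\sqrt{2} - 1)^k = (\sqrt{2} - 1)^h (1 - 1/\sqrt{2}). \tag{4.56} \]
By (4.48) and (4.56),
\[ \frac{1}{j\,|\sin(\pi j\sqrt{2})|} = O\bigl((1 + \sqrt{2})^{-H} (\sqrt{2} - 1)^{-h}\bigr) = O\bigl((\sqrt{2} - 1)^{H-h}\bigr). \tag{4.57} \]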
Let’s return to (4.41) (which is the special case $\rho = n$ = integer): we throw away those terms from the sum
\[ S(\rho) = \sum_{\substack{1\le j\le T:\\ j\ \mathrm{odd}}} \frac{\cos((2\rho + 1)\pi j\sqrt{2})}{\pi j \sin(\pi j\sqrt{2})} \tag{4.58} \]
for which the corresponding (odd) $j$ has the property that in representation (4.48)
\[ H(j) - h(j) > L = L(N) = 3\log\log N. \tag{4.59} \]
This particular choice of $L = L(N)$ is motivated by the fact that
\[ (\sqrt{2} - 1)^L = (\sqrt{2} - 1)^{3\log\log N} < (\log N)^{-2} \tag{4.60} \]
is “very small.” This “throw away” procedure formally means that we write
\[ S(\rho) = S^{(1)}(\rho) + S^{(2)}(\rho), \tag{4.61} \]
where
\[ S^{(1)}(\rho) = \sum_{\substack{1\le j\le T:\ \mathrm{odd}\\ H(j)-h(j)\le L}} \frac{\cos((2\rho + 1)\pi j\sqrt{2})}{\pi j \sin(\pi j\sqrt{2})}, \tag{4.62} \]
\[ S^{(2)}(\rho) = \sum_{\substack{1\le j\le T:\ \mathrm{odd}\\ H(j)-h(j)> L}} \frac{\cos((2\rho + 1)\pi j\sqrt{2})}{\pi j \sin(\pi j\sqrt{2})}, \tag{4.63} \]
and $S^{(1)}(\rho)$ is expected to be the dominant part. First we estimate the quadratic average of $S^{(2)}(n)$ (which is expected to be the minority part of $S(n)$) as $n$ runs through the integers in $1 \le n \le N$.
Lemma 4.4. We have
\[ \frac{1}{N}\sum_{n=1}^{N} \bigl(S^{(2)}(n)\bigr)^2 = O\Biggl(\sum_{\substack{1\le j\le T:\ \mathrm{odd}\\ H(j)-h(j)>L}} \frac{1}{j^2\sin^2(\pi j\sqrt{2})}\Biggr) + O\bigl((\log\log N)^2\bigr). \]
The proof of Lemma 4.4 is basically the same as that of Proposition 3.1; we leave the details of the proof to the reader. In fact, the proof of Lemma 4.4 is simpler, because if $H(j) - h(j) > L = L(N) = 3\log\log N$, then by (4.57), (4.59), and (4.60)
\[ \frac{1}{j\,|\sin(\pi j\sqrt{2})|} = O\bigl((\log N)^{-2}\bigr), \tag{4.64} \]
and combining (4.64) with inequality (3.113), we have
\[ \sum_{\substack{1\le j\le T:\ \mathrm{odd}\\ H(j)-h(j)>L}} \frac{1}{j^2\sin^2(\pi j\sqrt{2})} = O(1). \tag{4.65} \]
Combining Lemma 4.4 with (4.65), we obtain
\[ \frac{1}{N}\sum_{n=1}^{N} \bigl(S^{(2)}(n)\bigr)^2 = O\bigl((\log\log N)^2\bigr). \tag{4.66} \]
Applying Chebyshev’s inequality (see (1.150)) with (4.66), we see that the contribution of $S^{(2)}(n)$ in our central limit theorem (Theorem 1.1) is totally negligible. It means that we can focus on the dominant part $S^{(1)}(n)$ of $S(n)$, see (4.58), (4.61), and (4.62). We study $S^{(1)}(n)$ as the integral variable $n$ runs in $0 < n < N = \lambda^m$. In fact, we extend $S^{(1)}(n)$ to $S^{(1)}(\rho)$ (see (4.62)), where $\rho$ is any real number in $0 < \rho < N = \lambda^m = (1 + \sqrt{2})^m$. The $\lambda$-scale representation of $\rho$ (see (4.44)) gives the Markov chain (4.45):
\[ b_{m-1} = b_{m-1}(\rho),\quad b_{m-2} = b_{m-2}(\rho),\quad b_{m-3} = b_{m-3}(\rho),\ \ldots, \]
and the initial distribution is in fact the stationary distribution. To get independence, we apply the basic trick of Sect. 1.4: we work with “0,” “1,” and “20” (instead of “0,” “1,” “2”). Formally,
\[ b_{m-1}(\rho), b_{m-2}(\rho), b_{m-3}(\rho), \ldots = B_1(\rho), B_2(\rho), B_3(\rho), \ldots \tag{4.67} \]
where $B_i = B_i(\rho)$ = 0 or 1 or 20, and the right-hand side of (4.67) forms a sequence of independent random variables with common distribution (see (4.73))
\[ \Pr[B_i = 0] = \Pr[B_i = 1] = \sqrt{2} - 1 \quad\text{and}\quad \Pr[B_i = 20] = (\sqrt{2} - 1)^2 = 3 - 2\sqrt{2}, \tag{4.68} \]
where Pr (“probability”) means $\lambda^{-m}$ times the ordinary one-dimensional Lebesgue measure (as $\rho$ runs in $0 < \rho < N = \lambda^m = (1 + \sqrt{2})^m$).
4.4.2 Constructing a Sum $\widetilde{X}_1 + \widetilde{X}_2 + \widetilde{X}_3 + \ldots$ of Almost Independent Random Variables
We apply the usual decomposition technique of Sects. 1.3 and 1.5; here $\log N$ plays the role of $m$. Let $0 < \kappa < 1/2$ be a constant to be specified later, and define the parameter $R$ as
\[ R = R(N) = \lfloor (\log N)^{\kappa} \rfloor. \tag{4.69} \]
We decompose the sequence $B_1, B_2, B_3, \ldots$ on the right-hand side of (4.67) into groups of size $R = R(N)$ (defined in (4.69)):
\[ B_1, B_2, B_3, \ldots, B_R;\quad B_{R+1}, B_{R+2}, B_{R+3}, \ldots, B_{2R};\quad B_{2R+1}, B_{2R+2}, B_{2R+3}, \ldots, B_{3R};\quad \text{and so on.} \tag{4.70} \]
Let $B_{(i-1)R+j}$ be an arbitrary element of the $i$th group in (4.70), where $i = 1, 2, 3, \ldots$ and $1 \le j \le R$. We have $B_{(i-1)R+j}$ = 0 or 1 or 20, that is, $B_{(i-1)R+j} = b_k$ or $b_k b_{k-1}$ for some $k$ (see (4.67)); let $I_i$ denote the set of all indices $k$ or $k, k-1$ that occur this way in the $i$th group. Notice that $I_i$ is an interval (of consecutive integers), and these intervals are disjoint and decreasing: the elements of $I_{i+1}$ are all smaller than those of $I_i$. By using decomposition (4.70) and these intervals $I_i$, we can rewrite $S^{(1)}(\rho)$ in (4.62) in the form
\[ S^{(1)}(\rho) = \sum_{\substack{1\le j\le T:\ \mathrm{odd}\\ H(j)-h(j)\le L}} \frac{\cos((2\rho + 1)\pi j\sqrt{2})}{\pi j \sin(\pi j\sqrt{2})} = \widetilde{X}_1 + \widetilde{X}_2 + \widetilde{X}_3 + \cdots + \widetilde{Y}_1 + \widetilde{Y}_2 + \widetilde{Y}_3 + \cdots, \tag{4.71} \]
where the $\widetilde{X}_i$s are defined by the disjoint intervals $I_i$ above:
\[ \widetilde{X}_i = \widetilde{X}_i(\rho) = \sum_{\substack{1\le j\le T:\ \mathrm{odd}\\ H(j)-h(j)\le L,\ \{h(j)-L,\,H(j)+L\}\subset I_i}} \frac{\cos((2\rho + 1)\pi j\sqrt{2})}{\pi j \sin(\pi j\sqrt{2})}, \tag{4.72} \]
and
\[ \widetilde{Y}_i = \widetilde{Y}_i(\rho) = \sum_{\substack{1\le j\le T:\ \mathrm{odd}\\ H(j)-h(j)\le L,\ h(j)-L\in I_{i+1},\ H(j)+L\in I_i}} \frac{\cos((2\rho + 1)\pi j\sqrt{2})}{\pi j \sin(\pi j\sqrt{2})}. \tag{4.73} \]
We emphasize the similarity to (1.204)–(1.206). We have the upper bounds
\[ |\widetilde{X}_i| \le \mathrm{const}\cdot\Biggl(\sum_{\substack{1\le j\le T = N\log N:\\ H(j)-h(j)\le L,\ \{h(j)-L,\,H(j)+L\}\subset I_i}} \frac{1}{j\,\|j\sqrt{2}\|}\Biggr) = O(|I_i|\cdot L) = O\bigl((\log N)^{\kappa}\log\log N\bigr), \]
and
\[ |\widetilde{Y}_i| \le \mathrm{const}\cdot\Biggl(\sum_{\substack{1\le j\le T = N\log N:\\ H(j)-h(j)\le L,\ h(j)-L\in I_{i+1},\ H(j)+L\in I_i}} \frac{1}{j\,\|j\sqrt{2}\|}\Biggr) = O(L^2) = O\bigl((\log\log N)^2\bigr), \]
where in both cases we applied Lemma 2.14. These upper bounds play the role of (1.221) in Sect. 1.5. As usual, our plan is to show that the first sum $\widetilde{X}_1 + \widetilde{X}_2 + \widetilde{X}_3 + \cdots$ in (4.71) is the dominating part.
The new condition in (4.72) means that we keep those $j$s for which the whole $L$-neighborhood of the interval $[h(j), H(j)]$ is still inside of $I_i$. The motivation for this comes from the fact (see (4.60))
\[ (\sqrt{2} - 1)^L = (\sqrt{2} - 1)^{3\log\log N} < (\log N)^{-2} = \text{very small}, \tag{4.74} \]
which implies that the tail $\sum_{d>L}$ of the series in (4.53) is negligible, so we can safely cut the series off at $d = L = 3\log\log N$. The “cutoff-the-tail” argument means that we consider the following truncated version of $G(\rho, j)$ in (4.53):
\[ G^*(\rho, j) = \sum_{\substack{\ell<m:\\ h(j)\le \ell\le H(j)}} (-1)^{\ell+1} b_\ell\, \varepsilon_\ell
+ \sum_{d=1}^{L} (\sqrt{2} - 1)^d \Biggl( \sum_{\substack{\ell<m:\\ h(j)-d\le \ell\le H(j)-d}} (-1)^{\ell+d+1} b_\ell\, \varepsilon_{\ell+d} + \sum_{\substack{\ell<m:\\ h(j)+d\le \ell\le H(j)+d}} (-1)^{\ell} b_\ell\, \varepsilon_{\ell-d} \Biggr). \tag{4.75} \]
By (4.54) and (4.74), we can rewrite the numerator: p
p
cos .2 C 1/j 2 D cos 2G. ; j / C j 2 D
4.4 The Fourier Series Approach
235
p
D cos 2G . ; j / C j 2 C O .log N /2 : Next we study the numerator j sin.j j
p
(4.76)
2/. By (4.48) and (4.49),
X p p 2D " k qk 2 D hkH
D
X
X p "k .qk 2 pk / C "k pk D
hkH
hkH
X
D
"k .1
p
X
2/k C
hkH
" k pk :
(4.77)
hkH
p Every convergent numerator pk of 2 is odd (since p1 D 1, p2 D 3, and pi D 2pi 1 C pi 2 for all i 3), thus (4.77) implies that (we also use (4.48)) 0 j sin.j
p
2/ D j sin @
X
"k .1
p
2/k C
hkH
0 D .1/1C
P hkH
"k
@
X
1
X
1 "k A D
hkH
0
"k qk A sin @
hkH
X
1 "k .1
p
2/k A :
(4.78)
hkH
p Sum S .1/ .n/ (see (4.62)) is defined for j s with very small kj 2k (apart from a very few “small” j s), so we can safely apply the approximation sin.x/ D x C O.x 3 /: 0 sin @
X
1 X p k p "k .1 2/ A D "k .1 2/k C negligible;
hkH
(4.79)
hkH
and for the same reason in (4.76) we can ignore the term j
p
2:
p
cos .2 C 1/j 2 D cos 2G . ; j / C negligible: Let’s return to (4.78): by (4.43) we have 0 @
X
hk1 H
10 "k1 qk1 A @
X hk2 H
1 "k2 .1
p
2/k2 A D
(4.80)
0 D@
X
"k1
.1 C
hk1 H
D
X
X
p
2/k1 .1 C p 2 2
p k 10 2/ 1 A @ X
1 "k2 .1
p
2/k2 A D
hk2 H
"k1 "k2 .1/k2 .1 C
p k k 2/ 1 2 C negligible:
(4.81)
hk1 H hk2 H
It is easy to see that the single largest term in (4.81) comes from the choice k1 D H.j /, k2 D h.j /, and thepabsolute value of sum (4.81) is between two positive constant multiples of .1 C 2/H h .
4.4.3 Defining the Truly Independent Random Variables X1 ; X2 ; X3 ; : : : Besides G . ; j / in (4.75), we also need the new functions: G1 .j / D 1 C
X
"k and
hkH
G2 .j / D
X
X
"k1 "k2 .1/k2 .1 C
p k k 2/ 1 2 :
(4.82)
hk1 H hk2 H
We approximate XQ i (see (4.72)) with Xi D Xi . / D
X
.1/G1 .j /
1j T W od d H.j /h.j /L;fh.j /L;H.j /CLgIi
cos.2G . ; j // : 2 G2 .j / (4.83)
Similarly, we approximate YQi (see (4.73)) with Yi D Yi . / D
X
.1/G1 .j /
1j T W od d H.j /h.j /L;h.j /L2Ii C1 ;H.j /CL2Ii
cos.2G . ; j // : 2 G2 .j /
(4.84) Equations (4.78)–(4.82) make both approximations perfectly reasonable. Let’s return p to (4.67): as the real variable runs in the interval 0 < < N D m D .1 C 2/m , the sequence B1 . /; B2 . /; B3 . /; : : :
(4.85)
forms independent and identically distributed random variables (where Bi D Bi . / D 0 or 1 or 20, and the common distribution is described in (4.68)). The
random variables X1 D X1 . /, X2 D X2 . /, X3 D X3 . /; : : : in (4.83) were defined in such a way that they depend only on disjoint sets of B` ’s. More precisely, Xi D Xi . / depends only on the i th group in (4.70) (described by the interval Ii ). It follows that X1 ; X2 ; X3 ; : : : are independent random variables. Similarly, Y1 ; Y2 ; Y3 ; : : : are independent random variables. Yi may depend on both Xi and Xi C1 , but Yi is independent of all X` with ` 62 fi; i C 1g. By independence, EX` Yi D 0 if ` 62 fi; i C 1g. As an analog of (1.223), we need an upper bound for the absolute value of m
Z
m
EX` Yi D
X` . /Yi . / d 0
if ` 2 fi; i C 1g. It is more convenient to go back to (4.72) and (4.73): EXQ` YQi D m
Z
m
XQ` . /YQi . / 0
if ` 2 fi; i C 1g. By repeating the argument of Proposition 3.2, X
jEXQ` YQi j
1
1j T W od d H.j /h.j /L;h.j /L2Ii C1 ;H.j /CL2Ii
2
j 2 sin .j
p
2/
C
CO .log log N /2 D O.L/ C O .log log N /2 D O .log log N /2 if ` 2 fi; i C 1g. Since X` D XQ ` Cnegligible and Yi D YQi Cnegligible, the same upper bound holds for jEX` Yi j as well: EX` Yi D O .log log N /2 if ` 2 fi; i C 1g. Note that X1 ; X2 ; X3 ; : : : are identically distributed. This is a consequence of the “translation invariance” of the functions G1 .j /; G2 .j / (see (4.82) and (4.83)) and the linearity of G . ; j / (see (4.75)). The meaning of “translation invariance” is explained in Eq. (4.86). First we recall (4.48): j D
X
"k qk ; where "k 2 f1; 0; 1g;
(16.48’)
h.j /kH.j /
and, replacing each qk with qkC2i , define the “translate” j.i / D
X h.j /kH.j /
"k qkC2i
(16.48”)
where $i = 0, \pm 1, \pm 2, \pm 3, \ldots$. Then $j$ and $j^{(i)}$ have the same parity (due to the recurrence $q_1 = 1$, $q_2 = 2$, $q_\ell = 2q_{\ell-1} + q_{\ell-2}$ for all $\ell \ge 3$; the parity is important, since in (4.41) the integral parameter $j$ has to be odd), and also (see (4.82))
$$G_1(j) = G_1(j^{(i)}) \quad\text{and}\quad G_2(j) = G_2(j^{(i)}). \tag{4.86}$$
Similarly, Y1 ; Y2 ; Y3 ; : : : are also identically distributed. Equation (4.83) is an analog of (1.205), and similarly Eq. (4.84) is an analog of (1.206). Repeating the arguments that follow (1.205) in Sect. 1.5 (e.g., applying p Kolmogorov’s inequality), we obtain Theorem 1.1 in the special case ˛ D 2 and D 1=2. To prove the general case, where ˛ is an arbitrary quadratic irrational and is an arbitrary rational number in 0 < < 1, we just have to repeat the arguments of Sect. 4.1. To compute the expectation, we use the results of Sects. 2.1 and 2.2. To compute the variance, we use the results of Sect. 3.1–3.3. Finally, as an analog of (4.56) and (4.57) in the general case, we use the following technical lemma describing an important property of the so-called Ostrowski representation (see also (1.54)). Let ˛ be an arbitrary irrational: ˛ D Œa0 I a1 ; a2 ; a3 ; : : :; and write Œa0 I a1 ; : : : ; ai 1 D
pi : qi
(4.87)
By using the denominators $q_i$ in (4.87), every integer $n \ge 1$ can be written in the form (where $q_1 = 1$, $q_2 = a_1$)
$$n = \sum_{i \ge m} b_i q_i, \quad\text{where } m \ge 0,\ b_m \ne 0,\ b_i \in \{0, 1, \ldots, a_i\} \text{ for } i > m,\ b_i = a_i \text{ implies } b_{i-1} = 0, \text{ and } b_1 < a_1. \tag{4.88}$$
This is the Ostrowski representation of $n$ defined by $\alpha$ (see (4.87)).

Lemma 4.5. For every $n$ written in the form (4.88) we have the lower bound
$$\|n\alpha\| = \Bigl|\sum_{i \ge m} b_i (q_i\alpha - p_i)\Bigr| \ge |q_{m+1}\alpha - p_{m+1}|, \tag{4.89}$$
and also the upper bound
$$\|n\alpha\| \le |q_{m-1}\alpha - p_{m-1}| \quad\text{if } m \ge 2. \tag{4.90}$$
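Since the proof is tricky, we include it. To prove (4.89), we write $\delta_i = q_i\alpha - p_i$; then
$$\delta_i = a_{i-1}\delta_{i-1} + \delta_{i-2}; \tag{4.91}$$
note that $\delta_i = (-1)^{i-1}|\delta_i|$, and
$$|\delta_{i-2}| = a_{i-1}|\delta_{i-1}| + |\delta_i|. \tag{4.92}$$
We have
$$(-1)^{m-1}\Bigl(\sum_{j=m}^{\infty} b_j\delta_j\Bigr) = b_m|\delta_m| - b_{m+1}|\delta_{m+1}| + b_{m+2}|\delta_{m+2}| - b_{m+3}|\delta_{m+3}| \pm \cdots \ge b_m|\delta_m| - b_{m+1}|\delta_{m+1}| - b_{m+3}|\delta_{m+3}| - b_{m+5}|\delta_{m+5}| - \cdots. \tag{4.93}$$
Since $b_m \ne 0$ we have $b_{m+1} \le a_{m+1} - 1$, and using the recurrence formula (4.92), $|\delta_{i-2}| = a_{i-1}|\delta_{i-1}| + |\delta_i|$, repeatedly, we obtain
$$b_m|\delta_m| - b_{m+1}|\delta_{m+1}| \ge (b_m - 1)|\delta_m| + |\delta_{m+1}| + |\delta_{m+2}|,$$
$$|\delta_{m+2}| - b_{m+3}|\delta_{m+3}| \ge |\delta_{m+4}|,$$
$$|\delta_{m+4}| - b_{m+5}|\delta_{m+5}| \ge |\delta_{m+6}|,$$
and so on. Applying these inequalities in (4.93), we have
$$(-1)^{m-1}\Bigl(\sum_{j=m}^{\infty} b_j\delta_j\Bigr) \ge (b_m - 1)|\delta_m| + |\delta_{m+1}|,$$
which proves (4.89). On the other hand, by a telescoping sum argument,
$$(-1)^{m-1}\Bigl(\sum_{j=m}^{\infty} b_j\delta_j\Bigr) \le b_m|\delta_m| + b_{m+2}|\delta_{m+2}| + b_{m+4}|\delta_{m+4}| + \cdots \le b_m|\delta_m| + (|\delta_{m+1}| - |\delta_{m+3}|) + (|\delta_{m+3}| - |\delta_{m+5}|) + (|\delta_{m+5}| - |\delta_{m+7}|) + \cdots = b_m|\delta_m| + |\delta_{m+1}| \le |\delta_{m-1}|$$
by (4.92); this proves (4.90), and Lemma 4.5 follows. Thus the proof of Theorem 1.1 is complete. □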
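The two bounds of Lemma 4.5 can be checked numerically. The following is a minimal sketch for $\alpha = \sqrt{2} = [1; 2, 2, 2, \ldots]$, assuming the indexing of (4.87) (so $q_1 = 1$, $q_2 = a_1$) and a greedy construction of the Ostrowski digits; the function names and the floating-point tolerance are illustrative choices, not part of the text.

```python
import math

SQRT2 = math.sqrt(2.0)

def convergents(partial_quotients):
    """Return dicts p, q with p[i]/q[i] = [a_0; a_1, ..., a_{i-1}], indexed as in (4.87)."""
    a = partial_quotients
    p = {0: 1, 1: a[0]}
    q = {0: 0, 1: 1}
    for i in range(1, len(a)):
        p[i + 1] = a[i] * p[i] + p[i - 1]
        q[i + 1] = a[i] * q[i] + q[i - 1]
    return p, q

def ostrowski_digits(n, q):
    """Greedy digits {i: b_i} with n = sum of b_i * q_i (zero digits are omitted)."""
    i = max(k for k in q if k >= 1 and q[k] <= n)
    digits = {}
    while n > 0:
        b, n = divmod(n, q[i])
        if b:
            digits[i] = b
        i -= 1
    return digits

def dist_to_nearest_integer(x):
    return abs(x - round(x))

def check_lemma(n, p, q, alpha=SQRT2, tol=1e-9):
    """Check |q_{m+1} alpha - p_{m+1}| <= ||n alpha|| <= |q_{m-1} alpha - p_{m-1}| (upper bound only if m >= 2)."""
    digits = ostrowski_digits(n, q)
    m = min(digits)  # smallest index with a nonzero digit
    delta = lambda i: abs(q[i] * alpha - p[i])
    value = dist_to_nearest_integer(n * alpha)
    lower_ok = value >= delta(m + 1) - tol
    upper_ok = m < 2 or value <= delta(m - 1) + tol
    return lower_ok and upper_ok

# sqrt(2) = [1; 2, 2, 2, ...]; 16 partial quotients are plenty for n <= 2000.
P, Q = convergents([1] + [2] * 15)
```

For instance, `ostrowski_digits(10, Q)` gives the single digit $b_3 = 2$ (since $q_3 = 5$), and `check_lemma(n, P, Q)` can be run over a range of $n$ to see both bounds hold.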
4.5 More Results in a Nutshell

There are many more results that can be proved by this method. As a first illustration, consider the following equally interesting and natural problem of counting lattice points in a right-angled triangle, a variant of the problem in Sect. 1.2. It was Hardy–Littlewood [Ha-Li1, Ha-Li2] and Ostrowski [Os] who, independently of each other and at about the same time (around 1914–1920), started to investigate the problem of counting lattice points inside the right-angled triangle whose perpendicular sides are on the two coordinate axes and whose long side is on the line $\alpha x + y = t$; here $\alpha$ (the negative of the slope) is a fixed irrational. Let $\Delta = \Delta(\alpha; t)$ denote this right triangle; the vertices are the origin $O = (0, 0)$, $A = (t/\alpha, 0)$, and $B = (0, t)$; we assume that $t$ is a “large” positive number. The number of lattice points inside $\Delta(\alpha; t)$ equals (by vertical counting)
$$T_\alpha(t) = \sum_{k=1}^{\lfloor t/\alpha \rfloor} \lfloor t - k\alpha \rfloor = \frac{t^2}{2\alpha} - \frac{t}{2\alpha} - \frac{t}{2} + O(1) + Z_\alpha(t), \tag{4.94}$$
where
$$Z_\alpha(t) = \sum_{k=1}^{\lfloor t/\alpha \rfloor} \Bigl(\{k\alpha - t\} - \frac{1}{2}\Bigr) \tag{4.95}$$
is analogous to $S_\alpha(n)$ in (1.43) ($\{y\}$ denotes, as usual, the fractional part of $y$). Note that the quadratic function
$$\frac{t^2}{2\alpha} - \frac{t}{2\alpha} - \frac{t}{2}$$
in (4.94) represents the area (= expectation), up to an additive constant. Indeed, consider the smaller and similar right-angled triangle where the lower left corner is $(1/2, 1/2)$ (instead of $(0, 0)$), but the long side is still on the $AB$-line; the area of this smaller triangle is exactly
$$\frac{1}{2\alpha}\Bigl(t - \frac{1+\alpha}{2}\Bigr)^2 = \frac{t^2}{2\alpha} - \frac{t}{2\alpha} - \frac{t}{2} + \frac{(1+\alpha)^2}{8\alpha}.$$
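To study the crucial part $Z_\alpha(t)$ in (4.94), we use the familiar Fourier series of the sawtooth function
$$((x)) = \{x\} - \frac{1}{2} = -\sum_{j=1}^{\infty} \frac{\sin(2\pi j x)}{\pi j}$$
in (4.95) and obtain
$$Z_\alpha(t) = -\frac{1}{\pi}\sum_{m=1}^{\infty} \frac{1}{m}\Bigl(\sum_{k=1}^{\lfloor t/\alpha \rfloor} \sin(2\pi m(k\alpha - t))\Bigr) = \sum_{m=1}^{\infty} \frac{\cos(2\pi m\alpha\{t/\alpha\} - \pi m\alpha) - \cos(2\pi m\{t\} - \pi m\alpha)}{2\pi m \sin(\pi m\alpha)}, \tag{4.96}$$
where we used the identity
$$\sum_{k=1}^{n} \sin(k\beta) = \frac{\cos(\tfrac{1}{2}\beta) - \cos((n + \tfrac{1}{2})\beta)}{2\sin(\tfrac{1}{2}\beta)}.$$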
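The closed form for $\sum_{k=1}^{n} \sin(k\beta)$ used above is easy to confirm numerically. The sketch below compares the direct sum with the closed form for $\beta = 2\pi m\sqrt{2}$; the function names and the choice of test values are illustrative assumptions.

```python
import math

def sine_sum_direct(n, beta):
    """Left-hand side: sum of sin(k * beta) for k = 1..n."""
    return sum(math.sin(k * beta) for k in range(1, n + 1))

def sine_sum_closed_form(n, beta):
    """Right-hand side: (cos(beta/2) - cos((n + 1/2) * beta)) / (2 * sin(beta/2))."""
    return (math.cos(0.5 * beta) - math.cos((n + 0.5) * beta)) / (2.0 * math.sin(0.5 * beta))

if __name__ == "__main__":
    beta = 2.0 * math.pi * math.sqrt(2.0)
    for n in (1, 10, 100):
        print(n, sine_sum_direct(n, beta), sine_sum_closed_form(n, beta))
```

The denominator $\sin(\beta/2) = \sin(\pi m \alpha)$ never vanishes for irrational $\alpha$, which is why the same small denominators reappear in (4.96).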
If the real variable $t$ runs in the “long” interval $0 < t < N$ and $N \to \infty$, then $\alpha\{t/\alpha\}$ (mod 1) and $\{t\}$ are “asymptotically independent” variables (a corollary of the uniform distribution of $n\alpha$ (mod 1)). It follows via routine calculations that
$$\frac{1}{N}\int_0^N Z_\alpha(t)\,dt = \text{negligible}$$
and
$$\frac{1}{N}\int_0^N Z_\alpha^2(t)\,dt = \sum_{m=1}^{N} \frac{1}{4\pi^2 m^2 \sin^2(\pi m\alpha)} + \text{negligible}. \tag{4.97}$$
Comparing this to the variance of
$$S_\alpha(n) = \sum_{k=1}^{n} \Bigl(\{k\alpha\} - \frac{1}{2}\Bigr)$$
(as $n$ runs in $1 \le n \le N$) in (3.12), we see an extra factor of 2 in (4.97), and so in the special case $\alpha = \sqrt{2}$, by (3.63), (3.87), (3.92) we have
$$\frac{1}{N}\int_0^N Z_{\sqrt{2}}^2(t)\,dt = \sum_{m=1}^{N} \frac{1}{4\pi^2 m^2 \sin^2(\pi m\sqrt{2})} + \text{negligible} = \frac{1}{24\sqrt{2}} \cdot \frac{\log N}{\log(1 + \sqrt{2})} + \text{negligible}. \tag{4.98}$$
By using the proof technique of Sect. 4.3 (“Fourier series approach”) and applying it to (4.96), we can easily obtain the following analog of Theorem 1.1: for any integer $N \ge 3$ and any real numbers $-\infty < A < B < \infty$,
$$\frac{1}{N}\,\text{measure}\Bigl\{0 < t < N :\ A \le \frac{Z_{\sqrt{2}}(t)}{c_1\sqrt{\log N}} \le B\Bigr\} = \frac{1}{\sqrt{2\pi}}\int_A^B e^{-u^2/2}\,du + O\bigl((\log N)^{-1/10}\log\log N\bigr), \tag{4.99}$$
where $c_1 = (24\sqrt{2}\log(1 + \sqrt{2}))^{-1/2}$ and measure stands for the usual one-dimensional Lebesgue measure. Note that (4.99) remains true if the long interval $0 < t < N$ is replaced with a constant size interval, say, $N - 1 < t < N$.

Another similar lattice point problem arises when the large square $[-t, t]^2$ centered at the origin is rotated by an angle $\theta$, where the slope $\tan(\theta) = \alpha$ is irrational; the center remains the origin. For simplicity, let $\alpha = \sqrt{2}$; by a routine application of Poisson's summation formula, the number $L(t) = L(\sqrt{2}; t)$ of lattice points inside the tilted square (of slope $\sqrt{2}$) equals
$$L(t) = 4t^2 + \sum_{n = (n_1, n_2) \in \mathbb{Z}^2:\ n \ne 0} \frac{\sin(2\pi t(n_1\sqrt{2} - n_2)/\sqrt{3})}{\pi(n_1\sqrt{2} - n_2)/\sqrt{3}} \cdot \frac{\sin(2\pi t(n_1 + n_2\sqrt{2})/\sqrt{3})}{\pi(n_1 + n_2\sqrt{2})/\sqrt{3}}, \tag{4.100}$$
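where the term $4t^2$ (the area of the tilted square) comes from the contribution of $n = 0$. It follows via routine calculations that
$$\frac{1}{N}\int_0^N \bigl(L(t) - 4t^2\bigr)\,dt = \text{negligible}$$
and (let $|n| = \sqrt{n_1^2 + n_2^2}$)
$$\frac{1}{N}\int_0^N \bigl(L(t) - 4t^2\bigr)^2\,dt = \sum_{n \in \mathbb{Z}^2:\ 0 < |n| \le N} \frac{9}{4\pi^4 (n_1\sqrt{2} - n_2)^2 (n_1 + n_2\sqrt{2})^2} + \text{negligible}.$$
Since
$$n_1\sqrt{2} - n_2 = \frac{2n_1^2 - n_2^2}{n_1\sqrt{2} + n_2} \quad\text{and}\quad n_1 + n_2\sqrt{2} = \frac{n_1^2 - 2n_2^2}{n_1 - n_2\sqrt{2}},$$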
we have
$$\frac{1}{N}\int_0^N \bigl(L(t) - 4t^2\bigr)^2\,dt = \frac{9}{4\pi^4}\sum_{n \in \mathbb{Z}^2:\ |n| \le N,\ n_1 n_2 > 0} \frac{1}{(2n_1^2 - n_2^2)^2}\Bigl(\frac{n_1\sqrt{2} + n_2}{n_1 + n_2\sqrt{2}}\Bigr)^2 + \frac{9}{4\pi^4}\sum_{n \in \mathbb{Z}^2:\ |n| \le N,\ n_1 n_2 < 0} \frac{1}{(n_1^2 - 2n_2^2)^2}\Bigl(\frac{n_1 - n_2\sqrt{2}}{n_1\sqrt{2} - n_2}\Bigr)^2 + \text{negligible}$$
$$= \frac{9}{2\pi^4}\sum_{n \in \mathbb{Z}^2:\ |n| \le N,\ n_1 > 0,\ n_2 > 0} \Bigl(\frac{1}{(2n_1^2 - n_2^2)^2} + \frac{1}{(n_1^2 - 2n_2^2)^2}\Bigr)\Bigl(\frac{\sqrt{2} + \sqrt{2}}{1 + 2}\Bigr)^2 + \text{negligible}$$
$$= \frac{8}{\pi^4}\sum_{n \in \mathbb{Z}^2:\ |n| \le N,\ n_1 > 0,\ n_2 > 0} \frac{1}{(n_1^2 - 2n_2^2)^2} + \text{negligible}.$$
Clearly
$$\sum_{n \in \mathbb{Z}^2:\ |n| \le N,\ n_1 > 0,\ n_2 > 0} \frac{1}{(n_1^2 - 2n_2^2)^2} = \zeta_K(2)\,\frac{\log N}{\log(1 + \sqrt{2})} + \text{negligible},$$
where $\zeta_K(s)$ is the Dedekind zeta function of the real quadratic field $K = \mathbb{Q}(\sqrt{2})$ (see (3.64)). By Siegel's formula (3.87) and (3.92),
$$\zeta_K(2) = \frac{\pi^4}{120 \cdot 2^{3/2}} \sum_{b^2 + ac = 2:\ a > 0,\ c > 0} a = \frac{\pi^4}{120 \cdot 2^{3/2}} \cdot 5 = \frac{\pi^4}{48\sqrt{2}},$$
and so
$$\frac{1}{N}\int_0^N \bigl(L(t) - 4t^2\bigr)^2\,dt = \frac{1}{6\sqrt{2}} \cdot \frac{\log N}{\log(1 + \sqrt{2})} + \text{negligible}.$$
Comparing this to (4.43), we see an extra factor of 4 here. This factor of 4 is “explained” by the geometric intuition that our square has four tilted sides; on the other hand, the right-angled triangle has only one tilted side (the horizontal and vertical sides do not count). By using the proof technique of Sect. 4.3 (“Fourier series approach”) and applying it to (4.100), we can easily obtain the following analog of Theorem 1.1: for any integer $N \ge 3$ and any real numbers $-\infty < A < B < \infty$,
$$\frac{1}{N}\,\text{measure}\Bigl\{0 < t < N :\ A \le \frac{L(\sqrt{2}; t) - 4t^2}{c_2\sqrt{\log N}} \le B\Bigr\} = \frac{1}{\sqrt{2\pi}}\int_A^B e^{-u^2/2}\,du + O\bigl((\log N)^{-1/10}\log\log N\bigr), \tag{4.101}$$
where $c_2 = (6\sqrt{2}\log(1 + \sqrt{2}))^{-1/2} = 2c_1$ (see (4.99)) and again measure stands for the one-dimensional Lebesgue measure. Again (4.101) remains true if the long interval $0 < t < N$ is replaced with a short one like $N - 1 < t < N$.

Of course, we can generalize both (4.99) and (4.101) from $\alpha = \sqrt{2}$ to any quadratic irrational. Note that (4.99) and (4.101) are both central limit theorems, similar to Theorems 1.1 and 1.2. What happens when we switch from $\alpha = \sqrt{2}$ (or any quadratic irrational) to a typical real $\alpha$, i.e., in the case of almost every $\alpha$? Perhaps it surprises the reader to learn that for a typical $\alpha$ we cannot expect a central limit theorem. The reason behind it is in the distribution of the partial quotients $a_1, a_2, a_3, \ldots$ in the continued fraction for $\alpha = [a_0; a_1, a_2, a_3, \ldots]$. If $\alpha$ is a quadratic irrational then the partial quotients form a bounded (and periodic) sequence. In sharp contrast, for a typical $\alpha$, by a well-known theorem of Kusmin, the density of $a_n = 1$ is
$$\frac{\log(4/3)}{\log 2} = .415\ldots \approx 41.5\,\%,$$
see (2.198), and in general, for any fixed integer $k \ge 1$, the density of $a_n = k$ is
$$\frac{\log\frac{(k+1)^2}{k(k+2)}}{\log 2} = \frac{1}{\log 2}\log\Bigl(1 + \frac{1}{k(k+2)}\Bigr) \approx \frac{1}{\log 2} \cdot \frac{1}{k^2}, \tag{4.102}$$
see (2.197). So the average size of the partial quotients equals
$$\sum_{k=1}^{\infty} k \cdot \frac{\log\frac{(k+1)^2}{k(k+2)}}{\log 2} \approx \frac{1}{\log 2}\sum_{k=1}^{\infty} k \cdot \frac{1}{k^2} = \frac{1}{\log 2}\sum_{k=1}^{\infty} \frac{1}{k} = \infty \tag{4.103}$$
for almost every $\alpha$. The divergence in (4.103) is the reason behind the failure of the central limit theorem for almost every $\alpha$. (Note in advance that the fact $\sum_{k=1}^{n} \frac{1}{k} = \log n + O(1)$ will be used again in Sect. 6.10, explaining the extra factor of $\log n$ for the “almost every $\alpha$” type results in Part 1.3.) There is a limit theorem here, proved by Kesten [Ke], which is not a central limit theorem. It goes as follows:
$$\text{area}\Bigl\{(\alpha, \beta) \in [0, 1)^2 :\ \frac{\sum_{k=1}^{N}(\|k\alpha + \beta\| - 1/4)}{c_3\log N} \le A\Bigr\} \to \frac{1}{\pi}\int_{-\infty}^{A} \frac{du}{1 + u^2}, \tag{4.104}$$
where $c_3 > 0$ is an absolute constant and, of course, $\|y\|$ is the distance of $y$ from the nearest integer (its average value is 1/4).
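The divergence in (4.103) can be made concrete with a few lines of code. The sketch below evaluates the density (4.102) and the truncated mean $\sum_{k \le K} k \cdot \text{density}(k)$, which grows roughly like $\log_2 K$; the function names and truncation levels are illustrative assumptions.

```python
import math

def gauss_kuzmin_density(k):
    """Density of the partial quotient value k for almost every alpha, as in (4.102)."""
    return math.log2(1.0 + 1.0 / (k * (k + 2)))

def truncated_mean(K):
    """Partial sum of k * density(k) for k = 1..K; it grows without bound as K increases."""
    return sum(k * gauss_kuzmin_density(k) for k in range(1, K + 1))

if __name__ == "__main__":
    print(gauss_kuzmin_density(1))  # density of a_n = 1
    for K in (10**2, 10**4, 10**6):
        print(K, truncated_mean(K))
```

The densities themselves sum to 1 (the sum telescopes), so it is only the first moment that fails to exist, in line with the Cauchy-type limit law (4.104).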
The limit distribution on the right-hand side of (4.104) is called the Cauchy distribution, and it is “degenerate” in the sense that it has neither expectation nor variance (both give divergent integrals). The appearance of the “degenerate” Cauchy distribution is quite natural, since the square-integral
$$\int_0^1\!\!\int_0^1 \Bigl(\sum_{k=1}^{N} (\|k\alpha + \beta\| - 1/4)\Bigr)^2 d\alpha\,d\beta = \text{const} \cdot N \tag{4.105}$$
is exponentially larger than the norming factor $\log N$ in (4.104). (Note that the proof of (4.105) is based on (4.102).)

On the other hand, by switching from $\alpha = \sqrt{2}$ to $\alpha = e$ (or its relatives such as $\sqrt{e}$, $e^2$, $\sqrt[3]{e}$), we can save the central limit theorem. As an illustration, we show an analog of Theorem 1.2. First we recall
$$e = [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, \ldots, 1, 2i, 1, \ldots];$$
(4.106)
and next we apply (3.12) with $\alpha = e$:
$$\text{variance} = V_e(N) = \sum_{j=1}^{N} \frac{1}{8\pi^2 j^2 \sin^2(\pi j e)} + \text{negligible} = \frac{1}{8\pi^4}\sum_{j=1}^{N} \frac{1}{j^2\|je\|^2} + \text{negligible}. \tag{4.107}$$
Let $p_i/q_i$ denote the $i$th convergent of $e$, and let $q_k \le N < q_{k+1}$. By using (4.106) and the recurrence formula $q_i = a_{i-1}q_{i-1} + q_{i-2}$ (and also formula (1.28)), we have
$$k = (3 + o(1))\frac{\log N}{\log\log N}.$$
It is easy to see that the main contribution of (4.107) comes from the $j$'s of the form $j = mq_i$ (with $m = 1, 2, 3, \ldots$):
$$V_e(N) = \frac{1}{8\pi^4}\Bigl(\sum_{m \ge 1} \frac{1}{m^4}\Bigr)\Bigl(\sum_{1 \le i \le 3\log N/\log\log N} a_i^2\Bigr) + \text{negligible} = \frac{1}{8\pi^4} \cdot \frac{\pi^4}{90}\Bigl(\sum_{1 \le \ell \le \log N/\log\log N} (2\ell)^2\Bigr) + \text{negligible}$$
$$= \frac{1}{8\pi^4} \cdot \frac{\pi^4}{90} \cdot \frac{4}{3}\Bigl(\frac{\log N}{\log\log N}\Bigr)^3 + \text{negligible} = \frac{1}{540}\Bigl(\frac{\log N}{\log\log N}\Bigr)^3 + \text{negligible}, \tag{4.108}$$
describing the variance. On the other hand, to determine the mean value, we use Proposition 2.1: for $\alpha = e$ the continued fraction (4.106) gives the alternating sum
$$(-1 + 2 - 1) + (1 - 4 + 1) + (-1 + 6 - 1) + \cdots + (-1)^i(1 - 2i + 1),$$
which equals $(i - 1)$ if $i$ is odd and $-i$ if $i$ is even. Thus by Proposition 2.1,
$$M_e(N) = \frac{1}{N}\sum_{n=1}^{N} S_e(n) = O(\log N/\log\log N), \tag{4.109}$$
which is the true order of magnitude. Equations (4.108) and (4.109) tell us that the mean value $M_e(N)$ is negligible compared to the standard deviation
$$\sqrt{V_e(N)} = \frac{1}{6\sqrt{15}}\Bigl(\frac{\log N}{\log\log N}\Bigr)^{3/2} + \text{negligible}. \tag{4.110}$$
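Two ingredients of this computation are easy to verify by machine: the pattern (4.106) of the partial quotients of $e$, and the closed form of the alternating sum used for the mean value. The sketch below is illustrative only; the function names and the number of blocks are assumptions, and exact rational arithmetic is used for the convergent.

```python
import math
from fractions import Fraction

def e_partial_quotients(blocks):
    """Partial quotients [2; 1, 2, 1, 1, 4, 1, ..., 1, 2i, 1] with `blocks` blocks (1, 2i, 1)."""
    quotients = [2]
    for i in range(1, blocks + 1):
        quotients += [1, 2 * i, 1]
    return quotients

def convergent(quotients):
    """Exact value of the finite continued fraction [a_0; a_1, ..., a_r]."""
    value = Fraction(quotients[-1])
    for a in reversed(quotients[:-1]):
        value = a + 1 / value
    return value

def alternating_block_sum(i):
    """(-1 + 2 - 1) + (1 - 4 + 1) + ... + (-1)^i (1 - 2i + 1)."""
    return sum((-1) ** r * (1 - 2 * r + 1) for r in range(1, i + 1))

if __name__ == "__main__":
    print(float(convergent(e_partial_quotients(8))), math.e)
    print([alternating_block_sum(i) for i in range(1, 7)])
```

The alternating sum stays of size about $i$, while $\sum a_i^2$ grows like $i^3$; this is the reason the mean (4.109) is negligible next to the standard deviation (4.110).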
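We can prove the following analog of Theorem 1.2 for $\alpha = e$: for any integer $N \ge 3$ and any real numbers $-\infty < A < B < \infty$,
$$\frac{1}{N}\,\text{measure}\Bigl\{1 \le n \le N :\ A \le \frac{S_e(n)}{c_4(\log N/\log\log N)^{3/2}} \le B\Bigr\} = \frac{1}{\sqrt{2\pi}}\int_A^B e^{-u^2/2}\,du + O\bigl((\log N)^{-3/10}\log\log N\bigr),$$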
(4.111)
where $c_4 = 6^{-1} \cdot 15^{-1/2}$ (see (4.110)).

What happens for an arbitrary irrational $\alpha = [a_0; a_1, a_2, a_3, \ldots]$? How far can we generalize Theorem 1.2? Proposition 2.1 expresses the mean value $M_\alpha(N)$ in terms of the partial quotients $a_i$. For the variance we don't have a similar elegant formula, but we still have (3.12):
$$V_\alpha(N) = \frac{1}{N}\sum_{n=0}^{N} \bigl(S_\alpha(n) - M_\alpha(N)\bigr)^2 = \frac{1}{8\pi^2}\sum_{n=1}^{N} \frac{1}{(n\sin(\pi n\alpha))^2} + \text{negligible}. \tag{4.112}$$
It is easy to see that the right-hand side of (4.112) is between two absolute constant multiples of $\sum_{i:\ q_i \le N} a_i^2$. We can prove the following central limit theorem: assume that
$$\frac{a_m^2}{\sum_{i=1}^{m} a_i^2} \to 0 \quad\text{as } m \to \infty; \tag{4.113}$$
then
$$\frac{1}{N}\,\text{measure}\Bigl\{1 \le n \le N :\ A \le \frac{S_\alpha(n) - M_\alpha(N)}{\sqrt{V_\alpha(N)}} \le B\Bigr\} \to \frac{1}{\sqrt{2\pi}}\int_A^B e^{-u^2/2}\,du \tag{4.114}$$
for any fixed values of $-\infty < A < B < \infty$ as $N \to \infty$. Observe that (4.113) is basically the necessary condition (called Lindeberg condition) that the components are “individually negligible.” This is why (4.114) is the most general result that we can hope for.

Can we prove a similar central limit theorem for, say, $\alpha = \sqrt[3]{2}$? Well, unfortunately we know almost nothing about the continued fraction for $\sqrt[3]{2}$ (or any other real algebraic number of degree $\ge 3$). In particular, we don't have the slightest clue whether criterion (4.113) applies for $\alpha = \sqrt[3]{2}$ or not.

The proofs of (4.111) and (4.114) are somewhat more complicated than that of Theorem 1.1 (see Sect. 4.3), due to the fact that in general the continued fraction is not periodic, so we do not obtain a homogeneous Markov chain. Nevertheless, the usual decomposition technique (see Sects. 1.5, 4.1, 4.3) still works as the Markov chain (“short-term memory”) can be successfully replaced by exponentially weak dependence. The best way to handle exponentially weak dependence is to involve martingales. We will give the full details somewhere else.
Part II
Local Aspects Inhomogeneous Pell Inequalities
Chapter 5
Pell’s Equation, Superirregularity and Randomness
5.1 From Pell Equation to Superirregularity

5.1.1 Pell's Equation: Bounded Fluctuations

Our starting point is the well-known Pell's equation, a standard part of any introductory course on number theory. The theory of Pell's equation, while mostly elementary, is nevertheless one of the most beautiful chapters in the whole of mathematics. Also, it is very important, since the concept of units plays a key role in algebraic number theory. We illustrate the main results on the concrete equation $x^2 - 2y^2 = \pm 1$. This equation has infinitely many integral solutions; in fact, the set of all integral solutions $(x_k, y_k) \in \mathbb{Z}^2$ forms a cyclic group generated by the least positive solution. More precisely, we have
$$x_k + y_k\sqrt{2} = \pm(1 + \sqrt{2})^k, \quad k \in \mathbb{Z}.$$
All integral solutions of $x^2 - 2y^2 = 1$ are given by
$$x_k + y_k\sqrt{2} = \pm(1 + \sqrt{2})^{2k}$$
and all of $x^2 - 2y^2 = -1$ by
$$x_k + y_k\sqrt{2} = \pm(1 + \sqrt{2})^{2k+1}.$$
In particular, all positive integer solutions of $x^2 - 2y^2 = 1$ are given by
$$x_k + y_k\sqrt{2} = (1 + \sqrt{2})^{2k} = (3 + 2\sqrt{2})^k, \quad k = 1, 2, 3, \ldots$$
© Springer International Publishing Switzerland 2014 J. Beck, Probabilistic Diophantine Approximation, Springer Monographs in Mathematics, DOI 10.1007/978-3-319-10741-7__5
Taking the algebraic conjugate $x_k - y_k\sqrt{2} = (3 - 2\sqrt{2})^k$ and adding/subtracting these two equations, we obtain the explicit formulas
$$x_k = \frac{(3 + 2\sqrt{2})^k + (3 - 2\sqrt{2})^k}{2} \quad\text{and}\quad y_k = \frac{(3 + 2\sqrt{2})^k - (3 - 2\sqrt{2})^k}{2\sqrt{2}}. \tag{5.1}$$
Since $0 < 3 - 2\sqrt{2} < 1$ (in fact, $0 < 3 - 2\sqrt{2} < 1/5$), we have
$$x_k = \text{the nearest integer to } \tfrac{1}{2}(3 + 2\sqrt{2})^k$$
and
$$y_k = \text{the nearest integer to } \tfrac{1}{2\sqrt{2}}(3 + 2\sqrt{2})^k.$$
If $k$ is large, the error is very small. For example, the tenth solution of $x^2 - 2y^2 = 1$ in positive integers is the pair
$$x_{10} = 22{,}619{,}537 \quad\text{and}\quad y_{10} = 15{,}994{,}428.$$
On the other hand,
$$\tfrac{1}{2}(3 + 2\sqrt{2})^{10} = 22{,}619{,}536.99999998895\ldots$$
and
$$\tfrac{1}{2\sqrt{2}}(3 + 2\sqrt{2})^{10} = 15{,}994{,}428.000000007815\ldots.$$
Let $F(N) = F(\sqrt{2}; 1; N)$ denote the number of positive integer solutions of the Pell equation $x^2 - 2y^2 = 1$ up to $N$ in the sense $x \ge 1$ and $1 \le y \le N$. (For the simplicity of notation it is more convenient to restrict the second variable $y$.) We have
$$k \le F(N) \iff \frac{(3 + 2\sqrt{2})^k - (3 - 2\sqrt{2})^k}{2\sqrt{2}} \le N,$$
which implies the asymptotic formula
$$F(N) = F(\sqrt{2}; 1; N) = \frac{\log N}{\log(3 + 2\sqrt{2})} + O(1).$$
(5.2)
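The explicit formulas (5.1) and the counting formula (5.2) can be reproduced with exact integer arithmetic. The sketch below generates the positive solutions of $x^2 - 2y^2 = 1$ by repeated multiplication by $3 + 2\sqrt{2}$, i.e., $(x, y) \mapsto (3x + 4y,\ 2x + 3y)$; the function names are illustrative, not notation from the text.

```python
import math

def pell_solutions(count):
    """First `count` positive solutions (x_k, y_k) of x^2 - 2y^2 = 1, k = 1, 2, ..."""
    x, y = 3, 2  # least positive solution, 3 + 2*sqrt(2)
    solutions = []
    for _ in range(count):
        solutions.append((x, y))
        # (x + y*sqrt2) * (3 + 2*sqrt2) = (3x + 4y) + (2x + 3y)*sqrt2
        x, y = 3 * x + 4 * y, 2 * x + 3 * y
    return solutions

def count_solutions_up_to(N):
    """F(N): number of positive solutions of x^2 - 2y^2 = 1 with 1 <= y <= N."""
    x, y, k = 3, 2, 0
    while y <= N:
        k += 1
        x, y = 3 * x + 4 * y, 2 * x + 3 * y
    return k

if __name__ == "__main__":
    print(pell_solutions(10)[-1])
    for N in (10**3, 10**6, 10**12):
        print(N, count_solutions_up_to(N), math.log(N) / math.log(3 + 2 * math.sqrt(2)))
```

Printing `count_solutions_up_to(N)` next to $\log N / \log(3 + 2\sqrt{2})$ shows the bounded error term in (5.2).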
Formula (5.2) says that the counting function $F(N) = F(\sqrt{2}; 1; N)$ has an extremely predictable, almost deterministic behavior: it is $\text{const} \cdot \log N$ plus some totally negligible bounded error term.

Note that (5.2) has some far-reaching generalizations. Let $[\gamma_1, \gamma_2]$ be an arbitrary interval, and let $F(\sqrt{2}; [\gamma_1, \gamma_2]; N)$ denote the number of positive integer solutions of the Pell inequality $\gamma_1 \le x^2 - 2y^2 \le \gamma_2$, $x \ge 1$ and $1 \le y \le N$. By using the theory of indefinite binary quadratic forms, it is easy to prove the following analog of (5.2):
$$F(\sqrt{2}; [\gamma_1, \gamma_2]; N) = c_0(\sqrt{2}; \gamma_1, \gamma_2)\log N + O(1), \tag{5.3}$$
where the constant factor $c_0(\sqrt{2}; \gamma_1, \gamma_2)$ is independent of $N$. What is more, we can switch from $\sqrt{2}$ to any other quadratic irrational $\alpha$ (i.e., $\alpha$ is a root of a quadratic equation $Ax^2 + Bx + C = 0$ with integral coefficients such that the discriminant $B^2 - 4AC \ge 2$ is not a complete square).

Let's go back to (5.3), that is, to the special case $\alpha = \sqrt{2}$. For example, if $-2 < \gamma_1 \le -1 < 1 \le \gamma_2 < 2$, then
$$c_0(\sqrt{2}; \gamma_1, \gamma_2) = \frac{1}{\log(1 + \sqrt{2})} = \frac{2}{\log(3 + 2\sqrt{2})}; \tag{5.4}$$
if $-1 < \gamma_1 \le 1 \le \gamma_2 < 2$, then
$$c_0(\sqrt{2}; \gamma_1, \gamma_2) = \frac{1}{\log(3 + 2\sqrt{2})}; \tag{5.5}$$
and finally if $-1 < \gamma_1 \le \gamma_2 < 1$, then of course
$$c_0(\sqrt{2}; \gamma_1, \gamma_2) = 0. \tag{5.6}$$
5.1.2 The Area Principle

It is very interesting to compare these well-known asymptotic results about the number of solutions of the Pell equation/inequality to what we like to call the “naive area principle,” a natural guiding intuition in “lattice point theory.” It goes as follows: if a “nice region” has a “large” area, then it should contain a “large” number of lattice points, “close” to the area. In the rest we refer to this vague intuition as the Area Principle. Of course, the heart of the matter is how to define “nice region” precisely. Consider, for example, the infinite open horizontal strip of height one: $0 < y < 1$, $-\infty < x < \infty$; it has infinite area, but it contains no lattice point. The reader is likely to agree that the infinite strip is a “nice region,” so the Area Principle is clearly violated here.
A less trivial example comes from the Pell inequality
$$-\frac{1}{2} \le x^2 - 2y^2 \le \frac{1}{2}, \tag{5.7}$$
which is a hyperbolic region of infinite area, and contains no lattice point except the origin. The reader is again likely to agree that the hyperbolic region (5.7) is also “nice,” so this is again a violation of the Area Principle. Next we switch from (5.7) to the general Pell inequality
$$\gamma_1 \le x^2 - 2y^2 \le \gamma_2, \tag{5.8}$$
where $-\infty < \gamma_1 < \gamma_2 < \infty$ are arbitrary real numbers. Of course, the hyperbolic region (5.8) has infinite area. What we want to compute is the area of a finite segment. Consider the finite region
$$H(\sqrt{2}; [\gamma_1, \gamma_2]; N) = \bigl\{(x, y) \in \mathbb{R}^2 :\ \gamma_1 \le x^2 - 2y^2 \le \gamma_2 \text{ where } x \ge 1 \text{ and } 1 \le y \le N\bigr\}. \tag{5.9}$$
If $N$ is very large compared to the pair of constants $\gamma_1, \gamma_2$, then the finite region $H(\sqrt{2}; [\gamma_1, \gamma_2]; N)$ looks like a “hyperbolic needle.” It is easy to estimate the area of the “hyperbolic needle” $H(\sqrt{2}; [\gamma_1, \gamma_2]; N)$:
$$\text{area}\bigl(H(\sqrt{2}; [\gamma_1, \gamma_2]; N)\bigr) = \frac{\gamma_2 - \gamma_1}{2\sqrt{2}}\log N + O(1), \tag{5.10}$$
where the implicit constant in $O(1)$ is independent of $N$ (but may depend on $\gamma_1$ and $\gamma_2$). The proof of (5.10) is based on the familiar factorization
$$x^2 - 2y^2 = (x + y\sqrt{2})(x - y\sqrt{2}) \tag{5.11}$$
and on the computation of the Jacobian of the corresponding substitution [this explains the factor $2\sqrt{2}$ in the denominator in (5.10)]. The details are easy, and go as follows. In view of the factorization (5.11), it is more convenient to compute the area of the following slight variant of region (5.9): let
$$H^*(\sqrt{2}; [\gamma_1, \gamma_2]; N) = \bigl\{(x, y) \in \mathbb{R}^2 :\ \gamma_1 \le x^2 - 2y^2 \le \gamma_2 \text{ where } 1 \le x + y\sqrt{2} \le 2\sqrt{2}N\bigr\}. \tag{5.12}$$
Consider the substitution
$$u_1 = x + y\sqrt{2}, \quad u_2 = x - y\sqrt{2}, \tag{5.13$'$}$$
which is equivalent to
$$x = \frac{u_1 + u_2}{2}, \quad y = \frac{u_1 - u_2}{2\sqrt{2}}, \tag{5.13$''$}$$
and the corresponding Jacobian is
$$\frac{\partial(x, y)}{\partial(u_2, u_1)} = \begin{vmatrix} 1/2 & 1/2 \\ -2^{-3/2} & 2^{-3/2} \end{vmatrix} = 2^{-3/2}.$$
Applying the substitution (5.13$'$) and (5.13$''$), we have
$$\text{area}\bigl(H^*(\sqrt{2}; [\gamma_1, \gamma_2]; N)\bigr) = \iint_{H^*(\sqrt{2}; [\gamma_1, \gamma_2]; N)} 1\,dx\,dy = \frac{1}{2\sqrt{2}}\int_{1 \le u_1 \le 2\sqrt{2}N}\Bigl(\int_{\gamma_1/u_1 \le u_2 \le \gamma_2/u_1} 1\,du_2\Bigr)du_1$$
$$= \frac{1}{2\sqrt{2}}\int_1^{2\sqrt{2}N} \frac{\gamma_2 - \gamma_1}{u_1}\,du_1 = \frac{\gamma_2 - \gamma_1}{2\sqrt{2}}\log N + O(1). \tag{5.14}$$
A simple geometric consideration shows that
$$\text{area}\bigl(H(\sqrt{2}; [\gamma_1, \gamma_2]; N)\bigr) = \text{area}\bigl(H^*(\sqrt{2}; [\gamma_1, \gamma_2]; N)\bigr) + O(1),$$
and so (5.14) implies (5.10).

Now let's return to the Area Principle. Comparing (5.3) with (5.9) and (5.10), it is “reasonable” to expect, in view of the Area Principle, that the counting function $F(\sqrt{2}; [\gamma_1, \gamma_2]; N)$ is “close” to the area of the hyperbolic needle $H(\sqrt{2}; [\gamma_1, \gamma_2]; N)$. In other words, it is “reasonable” to expect that
$$c_0(\sqrt{2}; \gamma_1, \gamma_2) = \frac{\gamma_2 - \gamma_1}{2\sqrt{2}}. \tag{5.15}$$
Unfortunately, the Area Principle is “almost always” violated in the quantitative sense that (5.15) fails for the overwhelming majority of the choices $-\infty < \gamma_1 < \gamma_2 < \infty$. In fact, the left-hand side and the right-hand side of (5.15) have completely different behavior: the left-hand side of (5.15) has discrete jumps and the right-hand side is a continuous function of $\gamma_1$ and $\gamma_2$. For example, as $\gamma_1$ and $\gamma_2$ run in the interval $-2 < \gamma_1 < \gamma_2 < 2$, the constant factor $c_0(\sqrt{2}; \gamma_1, \gamma_2)$ has only three possible values [see (5.4)–(5.6)]:
$$0, \quad \frac{1}{\log(3 + 2\sqrt{2})}, \quad \frac{2}{\log(3 + 2\sqrt{2})}.$$
This shows, in a quantitative way, how the general Pell inequality [see (5.8)] $\gamma_1 \le x^2 - 2y^2 \le \gamma_2$ violates the Area Principle.
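The mismatch between the lattice point count and the area can be observed directly by brute force. The following sketch counts the integer points $(x, y)$ with $x \ge 1$, $1 \le y \le N$ whose value $x^2 - 2y^2$ lies in a given closed interval, and compares the count with the area coefficient on the right-hand side of (5.15). The function names and the parameters `lower`, `upper` (the two endpoints of the interval in the Pell inequality) are illustrative assumptions.

```python
import math

def count_pell_inequality(lower, upper, N):
    """Number of integers x >= 1, 1 <= y <= N with lower <= x^2 - 2y^2 <= upper."""
    total = 0
    for y in range(1, N + 1):
        lo = math.ceil(lower + 2 * y * y)   # x^2 must be at least lo
        hi = math.floor(upper + 2 * y * y)  # x^2 must be at most hi
        if hi < 1:
            continue
        x_min = math.isqrt(max(lo, 1) - 1) + 1  # smallest x >= 1 with x^2 >= lo
        x_max = math.isqrt(hi)                  # largest x with x^2 <= hi
        if x_max >= x_min:
            total += x_max - x_min + 1
    return total

def area_coefficient(lower, upper):
    """Coefficient of log N in the area of the hyperbolic needle, as in (5.10)."""
    return (upper - lower) / (2 * math.sqrt(2))

if __name__ == "__main__":
    N = 10**5
    for lower, upper in [(-0.5, 0.5), (1, 1), (-1.5, 1.5)]:
        count = count_pell_inequality(lower, upper, N)
        print((lower, upper), count, count / math.log(N), area_coefficient(lower, upper))
```

For the interval $[-1/2, 1/2]$ the count is zero although the area grows like $\log N$, while for $[-3/2, 3/2]$ the count per $\log N$ approaches $1/\log(1 + \sqrt{2})$ rather than $3/(2\sqrt{2})$.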
5.1.3 The Giant Leap in the Inhomogeneous Case: Extra Large Fluctuations

Using the familiar factorization (5.11), we can rewrite the Pell equation $x^2 - 2y^2 = \pm 1$, restricted to positive integers, as follows:
$$|x^2 - 2y^2| \le 1 \iff |y\sqrt{2} - x| \cdot (y\sqrt{2} + x) \le 1 \iff \|y\sqrt{2}\| \cdot (y\sqrt{2} + x) \le 1, \tag{5.16}$$
where $\|z\|$ denotes, as usual, the distance of a real number $z$ from the nearest integer. Notice that in (5.16) $x$ is the nearest integer to $y\sqrt{2}$ (an irrational number, namely, an integral multiple of $\sqrt{2}$, where $y \ge 1$). Since $y\sqrt{2} = x + o(1)$, (5.16) is “basically” equivalent to the “vague” inequality
$$\|y\sqrt{2}\| \le \frac{1 + o(1)}{2\sqrt{2}\,y}.$$
(5.17)
The vagueness of (5.17) comes from the additive term $o(1)$, which tends to 0 as $y \to \infty$. Formula (5.17) is ambiguous, but we are sure every mathematician understands what we are talking about here. An expert in number theory would classify (5.17) as a basic problem in diophantine approximation. Next we give a nutshell summary of diophantine approximation.

The classical problem in the theory of diophantine approximation is to find “good” rational approximations of irrational numbers. More precisely, we want to decide whether an inequality
$$\|n\alpha\| < \frac{1}{n\,\psi(n)} \iff \Bigl|\alpha - \frac{m}{n}\Bigr| < \frac{1}{n^2\,\psi(n)}, \tag{5.18}$$
or in general
$$\|n\alpha - \beta\| < \frac{1}{n\,\psi(n)}, \tag{5.19}$$
where $\alpha$ is a given irrational and $\beta$ is a given real number, has infinitely many integral solutions in $n$, and if this is the case, to determine the solutions, or at least determine the asymptotic number of integral solutions. As usual, $\|z\|$ denotes the distance of a real $z$ from the nearest integer, and $\psi(n)$ is a positive increasing function of $n$.
The diophantine inequality (5.18) is said to be homogeneous, whereas the diophantine inequality (5.19) is said to be inhomogeneous. For example, in the homogeneous case the best possible result is Hurwitz's well-known theorem: for any irrational $\alpha$,
$$\|n\alpha\| < \frac{1}{\sqrt{5}\,n}$$
has infinitely many positive integer solutions. In the inhomogeneous case we can mention an old result of Kronecker that, for any irrational $\alpha$ and for any real $\beta$,
$$\|n\alpha - \beta\| < \frac{3}{n}$$
has infinitely many positive integer solutions. Perhaps the strongest inhomogeneous result is Minkowski's theorem: for any irrational $\alpha$,
$$\|n\alpha - \beta\| < \frac{1}{4|n|}$$
has infinitely many integer solutions (not necessarily positive), unless $0 < \beta < 1$ is an integral multiple of $\alpha$ modulo one.

The homogeneous case (5.18) has a complete theory based on the effectiveness of the tool of continued fractions. These are classical results mostly due to Euler and Lagrange. In Part I we gave some new results in the inhomogeneous case: they cover the case where $\alpha$ is an arbitrary quadratic irrational and $\beta$ is a “typical” real number. Before formulating the main results, first we want to elaborate on the connection between homogeneous/inhomogeneous diophantine inequalities, such as (5.18) and (5.19), and homogeneous/inhomogeneous Pell inequalities.
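As a small illustration of the homogeneous case, the sketch below lists the positive integers $n$ up to a bound that satisfy Hurwitz's inequality $\|n\alpha\| < 1/(\sqrt{5}\,n)$ for $\alpha = \sqrt{2}$. It is a plain floating-point search under illustrative function names; for $\sqrt{2}$ the solutions in this range turn out to be the denominators of the convergents, i.e., the $y$-values of the Pell equation $x^2 - 2y^2 = \pm 1$.

```python
import math

def dist_to_nearest_integer(x):
    """||x||: distance of x from the nearest integer."""
    return abs(x - round(x))

def hurwitz_solutions(alpha, n_max):
    """All n in 1..n_max with ||n * alpha|| < 1 / (sqrt(5) * n)."""
    bound = 1.0 / math.sqrt(5.0)
    return [n for n in range(1, n_max + 1)
            if n * dist_to_nearest_integer(n * alpha) < bound]

if __name__ == "__main__":
    print(hurwitz_solutions(math.sqrt(2.0), 100))
```

Along these solutions the product $n\|n\sqrt{2}\|$ stays close to $1/(2\sqrt{2})$, which is exactly the constant appearing in (5.17).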
5.1.3.1 Homogeneous and Inhomogeneous Pell Inequalities

The general form of a quadratic curve on the plane is
$$a_{11}x^2 + a_{12}xy + a_{22}y^2 + a_{13}x + a_{23}y + a_{33} = 0. \tag{5.20}$$
We are interested in the integral solutions $(x, y) \in \mathbb{Z}^2$ of an arbitrary inequality
$$\gamma_1 \le a_{11}x^2 + a_{12}xy + a_{22}y^2 + a_{13}x + a_{23}y \le \gamma_2, \tag{5.21}$$
where $\gamma_1 < \gamma_2$ are given real numbers. Equation (5.21) defines a plane region; the boundary consists of two curves of type (5.20). If the discriminant is negative: $D = a_{12}^2 - 4a_{11}a_{22} < 0$, then (5.21) defines a bounded region where the boundary curves are two ellipses. Of course in this case there are only a finite number of integral solutions of (5.21). Nevertheless, this case is far from trivial. It includes the classical Circle Problem of Gauss, which is about the number of lattice points in the plane contained in a large circle centered at the origin; let $Z(R)$ denote this number for radius $R$. More precisely, the problem is to describe the true order of the fluctuations of the error term $Z(R) - \pi R^2$ as $R \to \infty$ (where of course $\pi R^2$ is the area of the circle). It is a famous, long-standing open problem to prove the conjecture
$$Z(R) - \pi R^2 = O\bigl(R^{\frac{1}{2} + o(1)}\bigr).$$
Let $Z(c; R)$ denote the analogous lattice point counting number for a circle of radius $R$ centered at $c \in \mathbb{R}^2$ (of course we can assume that $c \in [0, 1)^2$). The analogous conjecture is
$$Z(c; R) - \pi R^2 = O\bigl(R^{\frac{1}{2} + o(1)}\bigr).$$
Unfortunately, this conjecture is not proved for any center $c \in [0, 1)^2$.

What we are really interested in is the case of positive discriminant $D = a_{12}^2 - 4a_{11}a_{22} > 0$. It is very different, because for $D > 0$ (5.21) defines an unbounded region, where the boundary curves are two hyperbolas; thus we have a chance for infinitely many integral solutions of (5.21).

For simplicity assume that the coefficients $a_{11}, a_{12}, a_{22}$ in (5.21) are integers and $D = a_{12}^2 - 4a_{11}a_{22} > 0$. We can factorize the quadratic part as follows:
$$a_{11}x^2 + a_{12}xy + a_{22}y^2 = a_{11}(x - \alpha y)(x - \alpha' y), \tag{5.22}$$
where
$$\alpha = \frac{-a_{12} + \sqrt{D}}{2a_{11}}, \quad \alpha' = \frac{-a_{12} - \sqrt{D}}{2a_{11}}. \tag{5.23}$$
Using (5.22) we can rewrite (5.21) in the form 1 .x ˛y C 1 /.x ˛ 0 y C 2 / 2 ;
(5.24)
where 1 C 2 D
a13 a23 ; ˛ 0 1 C ˛2 D a11 a11
[note that 1 ; 2 are generic numbers; the pair 1 ; 2 in (5.21) is not (necessarily) the same as the pair 1 ; 2 in (5.24)].
p Without loss of generality we can assume that ja12 j a11 D=3 (this is a well-known fact from the Reduction Theory of binary quadratic forms; we omit the proof, see, e.g., [Za4]), and then we have ˛ > 0 > ˛ 0 . For simplicity assume that the interval Œ1 ; 2 is symmetric to 0, i.e., Œ1 ; 2 D Œ; . Also, assume that we are interested in the positive integral solutions of (5.24). Since ˛ > 0 > ˛ 0 , for “large” positive x and y the second factor .x ˛ 0 y C 2 / in (5.24) is also “large” positive, implying that the first factor .x ˛y C 1 / in (5.24) has to be very small. That is, x has to be the nearest integer to .y˛ 1 /. It follows that the symmetric version of (5.21) a11 x 2 C a12 xy C a22 y 2 C a13 x C a23 y ;
(5.25)
where > 0 is a given real number, is equivalent to the diophantine inequality ky˛ 1 k <
c a11 D p : where c D 0 y C O.1/ ˛˛ D
(5.26)
Let’s return to Eq. (5.21). If the linear part a13 x C a23 y is missing, i.e., a13 D a23 D 0, then we have a complete theory based on the Pell equation. More precisely, 1 Q.x; y/ 2 ” Q.x; y/ D m; 1 m 2 ; m 2 ZZ with Q.x; y/ D a11 x 2 Ca12 xy Ca22 y 2 , and we have a complete characterization of the integral solutions of Q.x; y/ D m for any integer m as follows. For any integer m there is a finite list of “primary solutions,” say, .xj ; yj /, j 2 J where jJ j < 1 and Q.xj ; yj / D m such that every solution x D u; y D v of Q.x; y/ D m can be written in the form p !n u0 C v0 D .xj ˛yj / u ˛v D ˙ 2 for some j 2 J and n 2 ZZ, where x D u0 > 0; y D v0 > 0 is the least positive solution of Pell’s equation x 2 Dy 2 D 4. As a by-product, we obtain that the number of positive integral solutions of 1 Q.x; y/ 2 with 1 x N; 1 y N has the simple asymptotic form c log N C O.1/, where c D c.a11 ; a12 ; a22 ; 1 ; 2 / is a constant and the error term O.1/ is uniformly bounded as N ! 1. (For a more detailed proof, see Lang’s book [La].) Exactly the same holds if there is a nonzero linear part a13 x C a23 y in (5.21), but its effect “cancels out”: 1 in (5.24) is an integer.
Finally, if the constant term of the first factor in (5.24) is not an integer, then we call (5.24) an inhomogeneous Pell inequality. In view of (5.26), an inhomogeneous Pell inequality (5.24) is basically equivalent to an inhomogeneous diophantine inequality
\[ \|n\alpha - \beta\| < \frac{c}{n} \tag{5.27} \]
with $c = \gamma a_{11}/\sqrt{D}$, where $\alpha$ is a quadratic irrational defined in (5.23). Inequality (5.27) is the special case of (5.19) in which the function of $n$ appearing there is a constant.

Our results describe the asymptotic behavior of the number of positive integral solutions of (5.21) for every non-square integer discriminant $D > 0$ and for almost every $a_{13}, a_{23}$. The number of solutions

1. exhibits extra large fluctuations (proportional to the area!),
2. satisfies an elegant central limit theorem, and
3. satisfies a shockingly precise law of the iterated logarithm,

see Theorems 5.3, 5.4, and 5.6.

Because it represents the whole difficulty, for notational simplicity we formulate the results in the special case of discriminant $D = 8$. It corresponds to the most famous quadratic irrational $\alpha = \sqrt{2}$. Since the class number of discriminant $D = 8$ is one, the general form of an inhomogeneous Pell inequality of discriminant $D = 8$ is
\[ \gamma_1 \le (x + \beta_1)^2 - 2(y + \beta_2)^2 \le \gamma_2, \tag{5.28} \]
where $\gamma_1 < \gamma_2$ and $\beta_1, \beta_2 \in [0,1)$ are fixed constants. For notational simplicity we restrict ourselves to symmetric intervals $[-\gamma, \gamma]$ in (5.28); note that everything works similarly for general intervals $[\gamma_1, \gamma_2]$. The factorization
\[ (x + \beta_1)^2 - 2(y + \beta_2)^2 = (x + \beta - y\sqrt{2})(x + \beta' + y\sqrt{2}), \tag{5.29} \]
where $\beta = \beta_1 - \beta_2\sqrt{2}$ and $\beta' = \beta_1 + \beta_2\sqrt{2}$, clearly indicates that the asymptotic number of integral solutions of (5.28) heavily depends on the "local" behavior of $n\sqrt{2} \bmod 1$. In fact, (5.28) is essentially equivalent to the inhomogeneous diophantine inequality
\[ \|n\sqrt{2} - \beta\| < \frac{c}{n} \tag{5.30} \]
with $c = \gamma/(2\sqrt{2})$. To turn the vague term "essentially equivalent" into a precise statement, let $F(\sqrt{2}; \beta_1, \beta_2; \gamma; N)$ be the number of integral solutions $(x,y) \in \mathbb{Z}^2$ of (5.28) with $\gamma_2 = \gamma$, $\gamma_1 = -\gamma$ satisfying $1 \le y \le N$ and $x \ge 1$. It means counting lattice points in a long and narrow hyperbola segment.
Let $F(\sqrt{2}; \beta; c; N)$ be the number of integral solutions $n$ of (5.30) satisfying $1 \le n \le N$, where $\beta = \beta_1 - \beta_2\sqrt{2}$. Now essentially equivalent means that, for almost every pair $\beta_1, \beta_2$, $F(\sqrt{2}; \beta_1, \beta_2; \gamma; N) - F(\sqrt{2}; \beta; c; N) = O(1)$ as $N \to \infty$, where $c = \gamma/(2\sqrt{2})$ (and of course $\beta = \beta_1 - \beta_2\sqrt{2}$). More precisely, we have

Lemma 5.1. Let $\gamma > 0$ and $\beta_2$ be arbitrary real numbers. Then for almost every $\beta_1$ there exists a finite $0 < C(\beta_1, \beta_2, \gamma) < \infty$ such that
\[ \int_0^1 C(\beta_1, \beta_2, \gamma)\, d\beta < \infty \]
and
\[ \left| F(\sqrt{2}; \beta_1, \beta_2; \gamma; N) - F(\sqrt{2}; \beta; c; N) \right| < C(\beta_1, \beta_2, \gamma) \quad \text{for all } N \ge 1, \]
where $c = \gamma/(2\sqrt{2})$ and $\beta = \beta_1 - \beta_2\sqrt{2}$.

We postpone the simple proof to Sect. 5.3. In view of Lemma 5.1 it suffices to study the special case $\beta_2 = 0$, $\beta_1 = \beta$:
\[ -\gamma \le (x + \beta)^2 - 2y^2 \le \gamma, \tag{5.31} \]
where $\gamma > 0$ and $\beta \in [0,1)$ are fixed constants. For simplicity, let $F(\sqrt{2}; \beta; \gamma; N)$ denote the number of integral solutions $(x,y) \in \mathbb{Z}^2$ of (5.31) satisfying $1 \le y \le N$ and $x \ge 1$. $F(\sqrt{2}; \beta; \gamma; N)$ counts the number of lattice points in a long and narrow hyperbola segment ("hyperbolic needle") located along a line of slope $1/\sqrt{2}$ (if $\beta = 0$ then the line is $y = x/\sqrt{2}$), see Fig. 5.1.
Fig. 5.1
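The counting function attached to (5.31) can be evaluated by brute force for moderate $N$. The following is only a minimal sketch, not the author's method: the function name `count_needle_points` is hypothetical, and the scan relies on ordinary double-precision arithmetic, which is adequate for the small ranges used here.

```python
import math

def count_needle_points(beta: float, gamma: float, N: int) -> int:
    """Count integer pairs (x, y) with |(x + beta)^2 - 2*y^2| <= gamma,
    1 <= y <= N and x >= 1 (a brute-force reading of (5.31))."""
    total = 0
    for y in range(1, N + 1):
        target = 2 * y * y
        # x + beta must lie between sqrt(target - gamma) and sqrt(target + gamma);
        # scan that window with a one-step safety margin and test each x directly.
        lo = math.sqrt(max(0.0, target - gamma)) - beta
        hi = math.sqrt(target + gamma) - beta
        for x in range(max(1, math.floor(lo) - 1), math.floor(hi) + 2):
            if abs((x + beta) ** 2 - target) <= gamma:
                total += 1
    return total

# beta = 0, gamma = 1 counts the solutions of x^2 - 2y^2 = +-1 with y <= 1000,
# namely y = 1, 2, 5, 12, 29, 70, 169, 408, 985.
print(count_needle_points(0.0, 1.0, 1000))
# An arbitrary inhomogeneous example (no particular value is claimed).
print(count_needle_points(0.3, 1.0, 1000))
```

For a fixed $\beta$ the cost is proportional to $N$, so this is only practical for illustration, not for the exponential scale $e^N$.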
In the special case $\gamma = 1$ and $\beta = 0$, (5.31) becomes the simplest Pell equation $x^2 - 2y^2 = \pm 1$. The integral solutions $(x_k, y_k)$ form a cyclic group generated by the smallest positive solution $x = y = 1$ in the well-known way: $x_k + y_k\sqrt{2} = (1 + \sqrt{2})^k$, implying the familiar asymptotic formula
\[ F(\sqrt{2}; \beta = 0; \gamma = 1; N) = \frac{\log N}{\log(1 + \sqrt{2})} + O(1), \tag{5.32} \]
where $1 + \sqrt{2}$ is the fundamental unit of the real quadratic field $\mathbb{Q}(\sqrt{2})$.

In sharp contrast to the bounded fluctuation in the homogeneous case $\beta = 0$, the inhomogeneous case can exhibit "extra large fluctuations proportional to the area," see Theorem 5.3. To explain this, first we have to compute the mean value of $F(\sqrt{2}; \beta; \gamma; N)$ as $\beta$ runs in the unit interval $0 \le \beta < 1$.

Lemma 5.2. We have
\[ \int_0^1 F(\sqrt{2}; \beta; \gamma; N)\, d\beta = \frac{\gamma}{\sqrt{2}} \log N + O(1), \tag{5.33} \]
where the implicit constant in $O(1)$ is independent of $N$ (but may depend on $\gamma$). Moreover, for an arbitrary subinterval $0 \le a < b \le 1$ we have the limit formula
\[ \lim_{N \to \infty} \frac{\frac{1}{b-a}\int_a^b F(\sqrt{2}; \beta; \gamma; N)\, d\beta}{\log N} = \frac{\gamma}{\sqrt{2}}. \tag{5.34} \]

Formulas (5.33) and (5.34) express the almost trivial geometric fact that the average number of lattice points contained in all the translated copies of a given region (a hyperbola segment in our special case) is precisely the area of the region, see Lemma 5.8. We will give a detailed proof of Lemma 5.2 in Sect. 5.3.

Now we are ready to formulate our first (and weakest) extra large fluctuation result, demonstrating that the fluctuations can be proportional to the area. This result is hardly more than a warm-up for, or simplest illustration of, the main results that will come later.

Theorem 5.3. For $\gamma = 1/2$ there are continuum many "divergence points" $\beta \in [0,1)$ in the sense that
\[ \limsup_{n \to \infty} \frac{F(\sqrt{2}; \beta; \gamma = 1/2; n)}{\log n} > \liminf_{n \to \infty} \frac{F(\sqrt{2}; \beta; \gamma = 1/2; n)}{\log n}. \tag{5.35} \]
Note that the fluctuation const log n in (5.35) is as large as possible apart from a constant factor. This follows from Lemma 5.5 in the next section. It is fair to say that Theorem 5.3 represents a sophisticated violation of the Area Principle.
We postpone the proof of Theorem 5.3 to Sect. 5.3. Note that Theorem 5.3 has a far-reaching generalization: it holds for every $\gamma > 0$, and we actually have the stronger inequality
\[ \limsup_{n \to \infty} \frac{F(\sqrt{2}; \beta; \gamma; n)}{\log n} > \frac{\gamma}{\sqrt{2}} > \liminf_{n \to \infty} \frac{F(\sqrt{2}; \beta; \gamma; n)}{\log n}. \tag{5.36} \]
We will return to the stronger (5.36) later in Sect. 5.4, see Theorem 5.11. Another far-reaching generalization of Theorem 5.3 will be discussed in Sect. 5.9, see Theorem 5.14. Finally, an extra large fluctuation type result for arbitrary point sets (instead of the set $\mathbb{Z}^2$ of lattice points) will be discussed in Sect. 5.9, see Theorem 5.19. We refer to these extra large fluctuation type results as super-irregularity. Sections 5.4–5.9 are all devoted to super-irregularity.
5.2 Randomness and the Area Principle

Equations (5.32) and (5.35) display the two extreme cases: (1) the totally negligible bounded fluctuations around the main value $\text{const} \cdot \log n$ (which is in the range of the area) and (2) the extra large fluctuations proportional to the area ("superirregularity"). But what kind of fluctuations do we have for a typical $0 < \beta < 1$? We show that for a typical $\beta$, the asymptotic number of solutions $F(\sqrt{2}; \beta; \gamma; N)$, as $N \to \infty$, justifies the Area Principle. And beyond that a more thorough look reveals "randomness."

Talking about randomness, we note that the two most important parameters of a random variable are the expectation (or mean value) and the variance. By (5.33)
\[ \text{expectation} = \int_0^1 F(\sqrt{2}; \beta; \gamma; N)\, d\beta = \frac{\gamma}{\sqrt{2}} \log N + O(1). \]

Explaining why the natural scaling is exponential. Note that for any $1 < M < N$, the counting function is "slowly changing" in the following sense:
\[ F(\sqrt{2}; \beta; \gamma; N) - F(\sqrt{2}; \beta; \gamma; M) = O(\log(N/M)), \tag{5.37} \]
where $\text{const} \cdot \log(N/M)$ is the corresponding area. The geometric reason behind this is the exponentially sparse occurrence of the lattice points in the corresponding long and narrow tilted hyperbola. The proof of (5.37) is a straightforward application of Lemma 5.5. We have the following corollary of (5.37). If $M = cN$, i.e., $n$ runs in $cN < n < N$ with some constant $0 < c < 1$, then the fluctuation of $F(\sqrt{2}; \beta; \gamma; N)$ is a trivial $O(1)$. This negligible constant size change $O(1)$ in (5.37), as $n$ runs in $cN < n < N$, explains why it is more natural to switch to the exponential scaling $F(\sqrt{2}; \beta; \gamma; e^N)$. In the rest of this discussion we will often prefer the exponential scaling.
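The mean value formula (5.33) can be probed numerically by averaging the count over a grid of $\beta$ values. This is a rough sketch under stated assumptions: the helper names `count_needle_points` and `mean_count` are hypothetical, the integral over $\beta$ is replaced by a midpoint rule, and $N$ is kept small so that the run is quick.

```python
import math

def count_needle_points(beta: float, gamma: float, N: int) -> int:
    """Integer pairs (x, y) with |(x + beta)^2 - 2*y^2| <= gamma, 1 <= y <= N, x >= 1."""
    total = 0
    for y in range(1, N + 1):
        target = 2 * y * y
        lo = math.sqrt(max(0.0, target - gamma)) - beta
        hi = math.sqrt(target + gamma) - beta
        # Scan the admissible window for x with a one-step margin; test directly.
        for x in range(max(1, math.floor(lo) - 1), math.floor(hi) + 2):
            if abs((x + beta) ** 2 - target) <= gamma:
                total += 1
    return total

def mean_count(gamma: float, N: int, samples: int) -> float:
    """Midpoint-rule approximation of the integral of the count over beta in [0, 1)."""
    return sum(count_needle_points((i + 0.5) / samples, gamma, N)
               for i in range(samples)) / samples

gamma, N = 1.0, 100
mean = mean_count(gamma, N, samples=1000)
area_term = gamma / math.sqrt(2) * math.log(N)
# By (5.33) the two numbers should differ by a bounded amount.
print(mean, area_term)
```

Increasing `N` while keeping `samples` well above `N` lets one watch the difference `mean - area_term` stay bounded, which is the content of the $O(1)$ term.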
The variance comes from the following result: for any $\gamma > 0$ there is a positive effective constant $\sigma = \sigma(\gamma) > 0$ such that
\[ \lim_{N \to \infty} \frac{1}{N} \int_0^1 \left( F(\sqrt{2}; \beta; \gamma; e^N) - \frac{\gamma}{\sqrt{2}} N \right)^2 d\beta = \sigma^2(\gamma). \]
The proof of this limit formula is far from easy: it is based on a combination of Fourier analysis (Poisson's summation formula, Parseval's formula) and the arithmetic of the quadratic number field $\mathbb{Q}(\sqrt{2})$.

The first probabilistic result, nicely fitting our general scheme of "determinism vs. randomness," is the following.

Theorem 5.4 ("central limit theorem"). The renormalized counting function
\[ \frac{F(\sqrt{2}; \beta; \gamma; e^N) - \frac{\gamma}{\sqrt{2}} N}{\sigma(\gamma)\sqrt{N}}, \quad 0 \le \beta < 1, \]
has a standard normal limit distribution with error term $O(N^{-1/4}(\log N)^3)$ as $N \to \infty$. Formally,
\[ \max_{\lambda} \left| \operatorname{meas}\left\{ \beta \in [0,1) : F(\sqrt{2}; \beta; \gamma; e^N) - \frac{\gamma}{\sqrt{2}} N \le \lambda\, \sigma(\gamma) \sqrt{N} \right\} - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\lambda} e^{-u^2/2}\, du \right| = O\left( N^{-1/4} (\log N)^3 \right), \]
where the maximum is taken over all $-\infty < \lambda < \infty$.

To give at least a very vague intuition behind Theorem 5.4, we write
\[ G_j(\beta) = F(\sqrt{2}; \beta; \gamma; e^j) - F(\sqrt{2}; \beta; \gamma; e^{j-1}), \quad j = 1, 2, \ldots, N. \]
That is, $G_j(\beta)$ is the number of integral solutions $n \in \mathbb{N}$ of (5.31) satisfying $e^{j-1} < n \le e^j$. Note that $G_j(\beta)$ is a bounded function. This follows from Lemma 5.5, and from the obvious geometric fact that any short hyperbola segment, corresponding to $G_j$, is "basically a rectangle." More precisely, any short hyperbola segment, corresponding to $G_j$, can be approximated by an inscribed rectangle $R_1$ of slope $1/\sqrt{2}$ and a circumscribed rectangle $R_2$ of slope $1/\sqrt{2}$ such that the ratio of the two areas is uniformly bounded by an absolute constant. It is time now to formulate

Lemma 5.5. Every tilted rectangle of slope $1/\sqrt{2}$ and area $1/5$ contains at most one lattice point.
We postpone the proof of this simple but important result to the next section. Lemma 5.5 can be easily generalized: the same proof gives that for any quadratic irrational $\alpha$ there is a positive constant $c_0 = c_0(\alpha) > 0$ such that every tilted rectangle of slope $\alpha$ and area $c_0$ contains at most one lattice point.

Our key intuition is that the bounded function $G_j(\beta)$ resembles the $j$th Rademacher function, so the sum
\[ F(\sqrt{2}; \beta; \gamma; e^N) - \frac{\gamma}{\sqrt{2}} N = \sum_{j=1}^{N} \left( G_j(\beta) - \frac{\gamma}{\sqrt{2}} \right), \]
as a function of $\beta \in [0,1)$, behaves like a sum of $N$ independent Bernoulli variables ("$N$-step random walk"):
\[ F(\sqrt{2}; \beta; \gamma; e^N) - \frac{\gamma}{\sqrt{2}} N \approx \pm 1 \pm 1 \pm \cdots \pm 1 \quad (N \text{ terms}). \tag{5.38} \]

Our next result, Theorem 5.6, can be interpreted as an analog of the famous Law of the iterated logarithm in probability theory. We show that the number of solutions $F(\sqrt{2}; \beta; \gamma; e^n)$ of (5.31) oscillates between the sharp bounds ($\varepsilon > 0$)
\[ \frac{\gamma}{\sqrt{2}} n - \sigma\sqrt{n}\sqrt{(2+\varepsilon)\log\log n} < F(\sqrt{2}; \beta; \gamma; e^n) < \frac{\gamma}{\sqrt{2}} n + \sigma\sqrt{n}\sqrt{(2+\varepsilon)\log\log n} \tag{5.39} \]
as $n \to \infty$ for almost every $\beta$. Note that $\sigma = \sigma(\gamma) > 0$ is the same as in Theorem 5.4, and (5.39) fails for $2 - \varepsilon$ instead of $2 + \varepsilon$ (where $\varepsilon > 0$). Here the main term $\frac{\gamma}{\sqrt{2}} n$ means the "area," so (5.39) can be considered as a highly sophisticated justification of the Area Principle.

Equation (5.39) is particularly interesting in view of the fact that the classical Circle Problem is unsolved, and seems to be hopeless by the current techniques. As we pointed out in Sect. 5.1, we do not know the true order of the fluctuations of the error term $Z(\mathbf{c}; R) - \pi R^2$, $R \to \infty$, for any fixed center $\mathbf{c}$. What (5.39) means is that we can solve a Hyperbola Problem instead of the Circle Problem. More precisely, we can solve the hyperbola version of the Circle Problem at least for almost every "center". We show that, for almost every "center" (i.e., for almost every value of the translation parameter $\beta$), the number of lattice points asymptotically equals the area plus an error, which even in the worst case scenario is roughly around the square root of the area. (For circles the corresponding maximum error is conjectured to be roughly around the square root of the circumference, which is the fourth root of the area.)

The law of the iterated logarithm is one of the most famous results in classical probability theory, and it describes the "maximum fluctuation" in the infinite (one-dimensional) random walk. The term infinite random walk refers to an infinite sequence of random Bernoulli trials, where each trial is tossing a fair coin. Of course, "coin tossing" belongs to the physical world; it is not a mathematical concept. But there is a well-known pure mathematical problem, which is considered
"equivalent": we can study the digit distribution of a typical real number written in the binary form
\[ \beta = \frac{b_1}{2} + \frac{b_2}{2^2} + \frac{b_3}{2^3} + \cdots, \]
where each $b_i = 0$ or $1$ (for simplicity assume that $0 < \beta < 1$). The infinite 0-1 sequence
\[ b_1 = b_1(\beta),\ b_2 = b_2(\beta),\ b_3 = b_3(\beta),\ \ldots, \]
i.e., the sequence of binary digits of $0 < \beta < 1$, represents an infinite Heads-and-Tails sequence, say, 1 is Heads and 0 is Tails. The sum
\[ B_n = B_n(\beta) = b_1 + b_2 + b_3 + \cdots + b_n \]
counts the number of 1s ("Heads") among the first $n$ binary digits of $0 < \beta < 1$. Borel's classical theorem about normal numbers asserts that
\[ \frac{B_n(\beta)}{n} \to \frac{1}{2} \quad \text{for almost all } 0 < \beta < 1. \]
Let $S_n = S_n(\beta)$ denote the corresponding error term
\[ S_n = S_n(\beta) = 2B_n(\beta) - n = \text{number of Heads} - \text{number of Tails}. \]
That is, $S_n = S_n(\beta)$ represents the number of Heads minus the number of Tails among the first $n$ random trials ("coin tossings"). A well-known theorem of Khinchin (see [Kh1]) asserts that
\[ \limsup_{n \to \infty} \frac{S_n(\beta)}{\sqrt{2n \log\log n}} = 1 \quad \text{for almost all } 0 < \beta < 1. \]

Notice that Khinchin's theorem is a far-reaching quantitative improvement on Borel's famous theorem on "normal numbers." The "long form" of Khinchin's theorem says that, for any $\varepsilon > 0$ and for almost every $\beta$, we have the following two statements:

1. $S_n(\beta) < (1 + \varepsilon)\sqrt{2n \log\log n}$ for all sufficiently large values of $n$, and
2. $S_n(\beta) > (1 - \varepsilon)\sqrt{2n \log\log n}$ holds for infinitely many values of $n$.
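The quantities $B_n$, $S_n$ and the Khinchin normalization are easy to compute for a concrete digit sequence. The sketch below is illustrative only: the function names are hypothetical, and a seeded pseudo-random bit stream stands in for the binary digits of a "typical" $\beta$, which is an assumption and not part of the theorem.

```python
import math
import random

def heads_minus_tails(bits):
    """Return [S_1, ..., S_n], where S_n = 2*B_n - n for a 0-1 digit sequence."""
    walk, s = [], 0
    for b in bits:
        s += 1 if b == 1 else -1
        walk.append(s)
    return walk

def khinchin_ratio(s_n: int, n: int) -> float:
    """S_n / sqrt(2 n log log n); requires n >= 3 so that log log n > 0."""
    return s_n / math.sqrt(2 * n * math.log(math.log(n)))

rng = random.Random(0)  # pseudo-random digits as a stand-in for a typical beta
bits = [rng.getrandbits(1) for _ in range(100_000)]
walk = heads_minus_tails(bits)

# Largest normalized value over 1000 <= n <= 100000; Khinchin's theorem says the
# limsup is 1 almost surely, but convergence is far too slow to see it sharply here.
largest = max(khinchin_ratio(walk[n - 1], n) for n in range(1000, len(bits) + 1))
print(largest)
```

Because $\log\log n$ grows so slowly, any finite experiment of this kind only hints at the law; it does not verify it.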
This strikingly elegant and precise result is the simplest form of the so-called law of the iterated logarithm, usually called Khinchin's form.

About 20 years later Erdős proved the following ultimate convergence-divergence criterion (conjectured by Kolmogorov), which contains Khinchin's form as a simple corollary (see [Ko, Er, Fe3]). Let $\varphi(n)$ be an arbitrary positive increasing function of $n$. Then for almost every $\beta$,
\[ S_n(\beta) \le \varphi(n)\sqrt{n} \quad \text{or} \quad S_n(\beta) > \varphi(n)\sqrt{n} \tag{5.40} \]
hold, respectively, for all sufficiently large $n$ or for infinitely many $n$ if and only if the series
\[ \sum_{n=1}^{\infty} \frac{\varphi(n)}{n} e^{-\varphi^2(n)/2} \quad \text{converges or diverges.} \tag{5.41} \]
Exactly the same holds for the other inequality
\[ S_n(\beta) < -\varphi(n)\sqrt{n}. \]
Notice that Khinchin's theorem is a special case of Erdős's theorem with
\[ \varphi(n) = \left( (2 + \varepsilon) \log\log n \right)^{1/2}. \]
Indeed, the series (5.41) is convergent or divergent depending on whether we have $\varepsilon > 0$ or $\varepsilon \le 0$. We can obtain a much more precise result by choosing an arbitrarily large but fixed value of the integer $k \ge 5$ and an arbitrarily small but fixed value of $\varepsilon > 0$, and write
\[ \varphi(n) = \varphi_\varepsilon(n) = \left( 2\log_2 n + 3\log_3 n + 2\log_4 n + \cdots + 2\log_{k-1} n + 2(1+\varepsilon)\log_k n \right)^{1/2}. \tag{5.42} \]

Warning: here we use the space-saving notation $\log_2 n = \log\log n$, i.e., it means the iterated logarithm (and not the base 2 logarithm), and in general $\log_k n = \log(\log_{k-1} n)$ denotes the $k$ times iterated logarithm of $n$. With this choice of $\varphi_\varepsilon(n)$,
\[ \sum_{n} \frac{\varphi_\varepsilon(n)}{n} e^{-\varphi_\varepsilon^2(n)/2} \approx \sum_{n} \frac{1}{n \log n\, \log_2 n\, \log_3 n \cdots \log_{k-1} n\, (\log_k n)^{1+\varepsilon}}, \]
where the last sum is convergent or divergent depending on whether we have $\varepsilon > 0$ or $\varepsilon \le 0$.
This example clearly illustrates the remarkable precision of Erdős's theorem.

Let's return to (5.39). The fact that it is an analog of Khinchin's law of the iterated logarithm suggests the vague intuition that the lattice point counting function $F(\sqrt{2}; \beta; \gamma; e^n)$ behaves like a "generalized digit sum" (as $\beta$ runs in $0 < \beta < 1$). What we are going to actually formulate below (see Theorem 5.6) is a refinement of (5.39): we work with the $\varphi_\varepsilon(n)$ defined in (5.42).

Theorem 5.6 ("law of the iterated logarithm"). Let $k \ge 5$ be an integer, and write
\[ \varphi_\varepsilon(n) = \left( 2\log_2 n + 3\log_3 n + 2\log_4 n + \cdots + 2\log_{k-1} n + 2(1+\varepsilon)\log_k n \right)^{1/2}, \tag{5.43} \]
where $\varepsilon \ge 0$ is a fixed constant. Choosing $\varepsilon > 0$ in (5.43), for almost every $\beta$,
\[ F(\sqrt{2}; \beta; \gamma; e^n) \le \frac{\gamma}{\sqrt{2}} n + \sigma\, \varphi_\varepsilon(n) \sqrt{n} \tag{5.44} \]
holds for all sufficiently large integers $n$. On the other hand, choosing $\varepsilon = 0$ in (5.43), for almost every $\beta$,
\[ F(\sqrt{2}; \beta; \gamma; e^n) > \frac{\gamma}{\sqrt{2}} n + \sigma\, \varphi_0(n) \sqrt{n} \tag{5.45} \]
holds for infinitely many $n$. Exactly the same holds for the negative direction
\[ F(\sqrt{2}; \beta; \gamma; e^n) \ge \frac{\gamma}{\sqrt{2}} n - \sigma\, \varphi_\varepsilon(n) \sqrt{n} \]
and
\[ F(\sqrt{2}; \beta; \gamma; e^n) < \frac{\gamma}{\sqrt{2}} n - \sigma\, \varphi_0(n) \sqrt{n}, \]
respectively.

Remarks. We could also prove an analog of the Erdős type ultimate convergence-divergence criterion, but the proof of Theorem 5.6 is already very long, and the convergence-divergence criterion would require some (annoyingly long) extra technical discussions.

By Lemma 5.1, $F(\sqrt{2}; \beta; c; N) = F(\sqrt{2}; \beta; \gamma; N) + O(1)$ as $N \to \infty$, where $c = \gamma/(2\sqrt{2})$. So Lemma 5.1 implies that Theorems 5.4 and 5.6 remain true if $F(\sqrt{2}; \beta; \gamma; N)$ is replaced with the number of solutions $F(\sqrt{2}; \beta; c; N)$ of the inhomogeneous diophantine inequality (5.30).

Theorem 5.6 emphasizes the dramatic difference between rational $\beta$ and almost every $\beta$. For every rational $\beta$ the counting function has the form
\[ F(\sqrt{2}; \beta; \gamma; N) = c(\gamma) \log N + O(1) \quad \text{as } N \to \infty \tag{5.46} \]
for all $\gamma > 0$, and it remains true if $\sqrt{2}$ is replaced by any quadratic irrational. This bounded size fluctuation around the main term $c \cdot \log N$ (which is typically not the area, but it is in the range of the area) jumps up considerably for a "typical" $\beta$. By Theorem 5.6 we have square root size fluctuations (square root of the area) around the main term (= area), and this holds for almost every $\beta$ and all $\gamma > 0$.

The bounded size fluctuation around $c \cdot \log N$ in (5.46) follows from a general principle that recurrence implies periodicity in a fixed modulus. We illustrate this general principle in the special case of $\alpha = \sqrt{2}$. Let $\beta = r/s$ be a rational number; then
\[ \gamma_1 \le (x + \beta)^2 - 2y^2 \le \gamma_2 \iff \gamma_1 s^2 \le (xs + r)^2 - 2(ys)^2 \le \gamma_2 s^2, \]
so it suffices to study the asymptotic number of integral solutions $u, v \in \mathbb{Z}$ of the quadratic equation
\[ u^2 - 2v^2 = m \quad \text{where } \gamma_1 s^2 \le m \le \gamma_2 s^2,\ u \equiv r \pmod{s},\ v \equiv 0 \pmod{s}. \]
Every solution of $u^2 - 2v^2 = m$ has the form of either $u_{2k} + v_{2k}\sqrt{2} = (1 + \sqrt{2})^{2k}(a + b\sqrt{2})$, where $a + b\sqrt{2}$ is a fundamental solution of $x^2 - 2y^2 = m$, or $u_{2k+1} + v_{2k+1}\sqrt{2} = (1 + \sqrt{2})^{2k+1}(a + b\sqrt{2})$, where $a + b\sqrt{2}$ is a fundamental solution of $x^2 - 2y^2 = -m$ (there are only a finite number of fundamental solutions). To prove (5.46) it suffices to show that the sequence of vectors $(u_{2k}, v_{2k})$, $k \in \mathbb{Z}$, is periodic modulo $s$, and similarly the sequence of vectors $(u_{2k+1}, v_{2k+1})$, $k \in \mathbb{Z}$, is periodic modulo $s$.

We start with the even case $(u_{2k}, v_{2k})$, $k \in \mathbb{Z}$. We have
\[ u_{2k} + v_{2k}\sqrt{2} = (1 + \sqrt{2})^{2k}(a + b\sqrt{2}) = (p_k + q_k\sqrt{2})(a + b\sqrt{2}), \]
which implies that
\[ u_{2k} = a p_k + 2b q_k \quad \text{and} \quad v_{2k} = a q_k + b p_k, \]
so it suffices to show that the sequence of vectors $(p_k, q_k)$, $k \in \mathbb{Z}$, is periodic modulo $s$. Since
\[ p_{k+1} + q_{k+1}\sqrt{2} = (p_k + q_k\sqrt{2})(3 + 2\sqrt{2}), \]
we have the recurrence formula
\[ p_{k+1} = 3p_k + 4q_k \quad \text{and} \quad q_{k+1} = 2p_k + 3q_k. \]
The Pigeonhole Principle implies that there are two integers $1 \le \ell_1 < \ell_2 \le s^2 + 1$ such that the vectors $(p_{\ell_1}, q_{\ell_1})$ and $(p_{\ell_2}, q_{\ell_2})$ are the same modulo $s$.
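The recurrence for $(p_k, q_k)$ and the pigeonhole bound $s^2$ can be checked directly for small moduli. The following is a minimal sketch; the function name `pell_period_mod` is hypothetical, and the starting vector $(p_0, q_0) = (1, 0)$, corresponding to $(3 + 2\sqrt{2})^0$, is an assumption of the sketch.

```python
def pell_period_mod(s: int) -> int:
    """Smallest k >= 1 with (p_k, q_k) == (p_0, q_0) modulo s, where
    p_k + q_k*sqrt(2) = (3 + 2*sqrt(2))^k and (p_0, q_0) = (1, 0)."""
    start = (1 % s, 0)
    p, q = start
    # The map (p, q) -> (3p + 4q, 2p + 3q) has determinant 1, so it permutes the
    # s*s residue vectors; the orbit of `start` closes up after at most s*s steps.
    for k in range(1, s * s + 1):
        p, q = (3 * p + 4 * q) % s, (2 * p + 3 * q) % s
        if (p, q) == start:
            return k
    raise AssertionError("unreachable for an invertible map modulo s")

for s in (2, 3, 5, 7, 12):
    print(s, pell_period_mod(s))
```

Tracking the residues only, rather than the integers $p_k, q_k$ themselves, keeps every number below $s$, so the check stays cheap even though $p_k$ and $q_k$ grow exponentially.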
Combining this fact with the recurrence formula, we conclude that the sequence of vectors $(p_k, q_k)$,
$k \in \mathbb{Z}$, is periodic modulo $s$ with period $\ell_2 - \ell_1$. This settles the even case $(u_{2k}, v_{2k})$, $k \in \mathbb{Z}$. The odd case $(u_{2k+1}, v_{2k+1})$, $k \in \mathbb{Z}$, goes similarly.

Next we focus on a simple consequence of Theorem 5.6. Let $c > 0$ be arbitrarily small but fixed; then by Theorem 5.6 the inhomogeneous diophantine inequality
\[ \|n\sqrt{2} - \beta\| < \frac{c}{n} \tag{5.47} \]
has infinitely many integer solutions $n \ge 1$ for almost every $\beta$ (in the sense of the Lebesgue measure). Inequality (5.47) corresponds to the hyperbola segment ($\beta$ is fixed):
\[ |y - \beta| < \frac{c}{x}, \quad x \ge 1, \]
which has infinite area. But we may go further, and consider the smaller region
\[ |y - \beta| < \frac{1}{x \log x}, \quad \text{and the even smaller region} \quad |y - \beta| < \frac{1}{x \log x \log\log x}, \]
and so on. They all have infinite area, since
\[ \int_e^N \frac{dx}{x \log x} = \log\log N, \quad \text{and} \quad \int_{e^e}^N \frac{dx}{x \log x \log\log x} = \log\log\log N, \]
and the rest all tend to infinity as $N \to \infty$. Proposition 1.18 ("Area Principle for $\sqrt{2}$") gives that the inhomogeneous inequalities
\[ \|n\sqrt{2} - \beta\| < \frac{c}{n \log n} \quad (n \ge 2), \tag{5.48$'$} \]
\[ \|n\sqrt{2} - \beta\| < \frac{c}{n \log n \log\log n} \quad (n \ge 3), \tag{5.48$''$} \]
and so on, in the most general case
\[ \|n\sqrt{2} - \beta\| < \psi(n), \tag{5.49} \]
where $\psi(x)$ is a positive decreasing function of the real variable $x$ with
\[ \sum_{n} \psi(n) = \infty, \]
all have infinitely many integer solutions $n \ge 1$ for almost every $\beta$ (in the sense of the Lebesgue measure).
Next we discuss the generalization where $\sqrt{2}$ is replaced by an arbitrary real $\alpha$. To explain this generalization (see Theorem 5.7), we recall what is considered the basic problem of classical diophantine approximation. The basic problem is to decide whether or not an inequality
\[ \left| \alpha - \frac{p}{q} \right| < \frac{1}{q^2}, \quad \text{or equivalently,} \quad |q\alpha - p| < \frac{1}{q} \tag{5.50} \]
with integers $p$, $q$, or more generally an inequality
\[ \|q\alpha\| < \psi(q) \tag{5.51} \]
has infinitely many integral solutions in $q$, and if this is the case, to determine the solutions, or at least determine the asymptotic number of integral solutions. Here $\|x\|$ denotes the distance of a real $x$ from the nearest integer, and $\psi(q)$ is a positive decreasing function of $q$. The inhomogeneous analog of (5.51) is
\[ \|q\alpha - \beta\| < \psi(q), \tag{5.52} \]
where $\beta$ is an arbitrary fixed real number. Of course, we may assume $0 \le \beta < 1$.

Is there any connection between the solvability of the homogeneous (5.51) and the inhomogeneous (5.52)? Proposition 1.18 is about the special case $\alpha = \sqrt{2}$, and it justifies the Area Principle. The Area Principle is a vague intuition claiming that a "nice region of infinite area must contain infinitely many lattice points." We know that the Area Principle is false for the hyperbolic region $-1/2 \le x^2 - 2y^2 \le 1/2$, which has infinite area and contains only one lattice point (the origin). This Pell inequality is basically equivalent to the diophantine inequality
\[ \|q\sqrt{2}\| < \frac{c}{q} \quad \text{with } c \approx 2^{-5/2}, \tag{5.53} \]
and (5.53) does not have infinitely many integral solutions in $q$ if the constant $c < 2^{-5/2}$. The failure of the Area Principle for (5.53) is compensated by the success of the Area Principle for the inhomogeneous inequality
\[ \|q\sqrt{2} - \beta\| < \psi(q), \tag{5.54} \]
which has infinitely many integral solutions $q$ for almost every $\beta$, provided $\psi(x)$ is any positive decreasing function of the real variable $x$ with
\[ \sum_{n=1}^{\infty} \psi(n) = \infty. \tag{5.55} \]
This is the statement of Proposition 1.18. The next result generalizes the special case $\alpha = \sqrt{2}$ for arbitrary real $\alpha$.
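The contrast between the homogeneous inequality (5.53) and its inhomogeneous counterpart can be explored by direct counting. This is a small sketch with hypothetical names (`dist_to_nearest_integer`, `count_solutions`), using floating-point arithmetic that is reliable only for moderate $q$; the value $\beta = 0.3$ is an arbitrary choice for illustration.

```python
import math

def dist_to_nearest_integer(t: float) -> float:
    """The distance ||t|| from t to the nearest integer."""
    return abs(t - round(t))

def count_solutions(beta: float, c: float, N: int) -> int:
    """Number of integers 1 <= q <= N with ||q*sqrt(2) - beta|| < c / q."""
    root2 = math.sqrt(2)
    return sum(1 for q in range(1, N + 1)
               if dist_to_nearest_integer(q * root2 - beta) < c / q)

# Homogeneous case beta = 0 with c = 2^(-5/2): no solutions appear at all.
print(count_solutions(0.0, 2 ** -2.5, 10_000))
# Same constant with an arbitrarily chosen shift beta = 0.3 (no value claimed).
print(count_solutions(0.3, 2 ** -2.5, 10_000))
```

In the homogeneous case the product $q\,\|q\sqrt{2}\|$ never drops below roughly $0.343$ (attained at $q = 2$), which is why small constants $c$ admit no solutions.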
Theorem 5.7 ("Area Principle in general"). Let $\psi(x)$ be any positive decreasing function of the real variable $x$ with
\[ \sum_{n=1}^{\infty} \psi(n) = \infty. \tag{5.56} \]
For any real number $\alpha$, at least one of the following two cases always holds:

(i) the homogeneous inequality
\[ \|q\alpha\| < \psi(q) \tag{5.57} \]
has infinitely many integral solutions,

(ii) the inhomogeneous inequality
\[ \|q\alpha - \beta\| < \psi(q) \tag{5.58} \]
has infinitely many integral solutions for almost every $0 \le \beta < 1$ (in the sense of Lebesgue measure).

Remarks. Note that the divergence condition (5.56) is necessary. Indeed, if
\[ \sum_{n=1}^{\infty} \psi(n) < \infty \tag{5.59} \]
then the set of pairs $(\alpha, \beta)$ for which the inequality
\[ \|q\alpha - \beta\| < \psi(q) \tag{5.60} \]
has infinitely many integral solutions $q$ has two-dimensional Lebesgue measure zero. This statement immediately follows from the other statement that, for every fixed $\beta$, the set of $\alpha$ which satisfy (5.60) for infinitely many $q$ has Lebesgue measure zero. The second statement has an easy proof as follows: every such $\alpha$ in $0 < \alpha < 1$ is contained in infinitely many intervals of the form
\[ \left( \frac{p + \beta}{q} - \frac{\psi(q)}{q},\ \frac{p + \beta}{q} + \frac{\psi(q)}{q} \right) \]
with $q \ge N$, $1 \le p \le q$ integers, and the total length of these intervals is less than
\[ 2 \sum_{q \ge N} \psi(q), \]
which by (5.59) tends to zero as $N \to \infty$.
This means that Theorem 5.7 is a precise convergence-divergence type result, or, to borrow a well-known concept from probability theory, we may call it a "zero-one law."

Let's return to the inhomogeneous inequality (5.60). If $\alpha$ is rational and $\beta$ is irrational, then (5.60) has only a finite number of integral solutions for any $\psi(q) \to 0$ as $q \to \infty$. Well, this is trivial. It is less trivial to find an irrational $\alpha$ and a decreasing function $\psi$ with $\sum_q \psi(q) = \infty$ such that for almost all $\beta$ (5.60) has only a finite number of integral solutions. One can find such an $\alpha$ by taking any irrational $0 < \alpha < 1$ with "sufficiently large" partial quotients in the following quantitative sense:
\[ \alpha = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots}} = [a_1, a_2, a_3, \ldots] \quad \text{where } a_k \approx k^{(\log k)^2}, \tag{5.61} \]
and take
\[ \psi(q) = \frac{1}{q \log q}. \tag{5.62} \]
Then the denominator $q_k$ of the $k$th convergent of $\alpha$ is roughly
\[ q_k \approx a_1 a_2 \cdots a_k \approx k^{k(\log k)^2}, \tag{5.63} \]
and so
\[ \sum_k \frac{1}{\log q_k} \approx \sum_k \frac{\text{const}}{k(\log k)^3} < \infty. \]
We recall the well-known fact
\[ \left| \alpha - \frac{p_k}{q_k} \right| < \frac{1}{q_k q_{k+1}}, \]
which implies
\[ \left| n\alpha - \frac{n p_k}{q_k} \right| < \frac{n}{q_k q_{k+1}}. \tag{5.64} \]
If $q_k \le n < q_{k+1} k^{-2}$ and
\[ \|n\alpha - \beta\| < \frac{1}{n \log n}, \]
then by (5.63) and (5.64)
\[ \left\| \beta - \frac{n p_k}{q_k} \right\| < \frac{1}{k^2 q_k} + \frac{1}{n \log n} < \frac{2}{k(\log k)^3 q_k}. \tag{5.65} \]
If $q_{k+1} k^{-2} \le n < q_{k+1}$ then define the set
\[ A_k = \bigcup_{n} \left( n\alpha - \frac{1}{n \log n},\ n\alpha + \frac{1}{n \log n} \right) \pmod{1}, \tag{5.66} \]
where the union in (5.66) is extended over all $n$ with $q_{k+1} k^{-2} \le n < q_{k+1}$, and motivated by (5.65) define the set
\[ B_k = \bigcup_{0 \le j < q_k} \left( \frac{j}{q_k} - \frac{2}{k(\log k)^3 q_k},\ \frac{j}{q_k} + \frac{2}{k(\log k)^3 q_k} \right) \pmod{1}. \tag{5.67} \]
Clearly
\[ \sum_k \operatorname{meas}(B_k) \le \sum_k \frac{4}{k(\log k)^3} < \infty, \tag{5.68} \]
where meas stands for the usual Lebesgue measure, and
\[ \sum_k \operatorname{meas}(A_k) \le \text{const} \sum_k \frac{\log(k^2)}{k(\log k)^3} \le \text{const} \sum_k \frac{1}{k(\log k)^2} < \infty. \tag{5.69} \]
It follows from (5.68) and (5.69) that almost every $\beta$ is contained only in a finite number of $A_k$ and in a finite number of $B_k$. In view of (5.65)–(5.67) this implies that, for almost every $\beta$, inequality (5.60) has only a finite number of integral solutions [where $\alpha$ and $\psi$ are defined by (5.61) and (5.62)].

For the proofs of Theorems 5.4 and 5.6, see Chap. 6; for the proof of Theorem 5.7, see Sect. 5.11. Section 5.3 is technical: it contains the proofs of Theorem 5.3 and Lemmas 5.1, 5.2, and 5.5. Sections 5.4–5.9 contain the "super-irregularity" results.
5.3 Proving Theorem 5.3 and the Lemmas

Proof of Lemma 5.2. First we prove formula (5.33). Consider the hyperbolic needle $H_N(\gamma) = H_N(\sqrt{2}; \gamma)$ defined as
\[ H_N(\gamma) = \left\{ (x,y) \in \mathbb{R}^2 : -\gamma \le x^2 - 2y^2 \le \gamma \text{ where } 1 \le x + y\sqrt{2} \le 2\sqrt{2}N \right\}. \tag{5.70} \]
Comparing (5.12) with (5.70), we see that
\[ H_N(\gamma) = H(\sqrt{2}; [-\gamma, \gamma]; N), \]
so by (5.14) we obtain the area:
\[ \operatorname{area}(H_N(\gamma)) = \frac{\gamma}{\sqrt{2}} \log N + O(1). \tag{5.71} \]
Next we need the following almost trivial result.

Lemma 5.8. Let $S \subset \mathbb{R}^2$ be a Lebesgue measurable set in the plane with finite measure (that we call the "area"). Then
\[ \int_0^1 \int_0^1 |(S + \mathbf{x}) \cap \mathbb{Z}^2|\, d\mathbf{x} = \operatorname{area}(S), \]
where $S + \mathbf{x}$ is the translated copy of the set $S$, translated by the vector $\mathbf{x} \in \mathbb{R}^2$.

First we derive Lemma 5.2 from Lemma 5.8. By Lemma 5.8,
\[ \int_0^1 \int_0^1 |(H_N(\gamma) + \mathbf{v}) \cap \mathbb{Z}^2|\, d\mathbf{v} = \operatorname{area}(H_N(\gamma)), \tag{5.72} \]
where $S + \mathbf{v}$ denotes the translated copy of a set $S$, translated by the vector $\mathbf{v} \in \mathbb{R}^2$. If $\mathbf{v} = (v_1, v_2) \in [0,1)^2$ is chosen in such a way that $v_1 - v_2\sqrt{2} \equiv \beta \pmod{1}$ is fixed, then clearly
\[ \left| F(\sqrt{2}; \beta; \gamma; N) - |(H_N(\gamma) + \mathbf{v}) \cap \mathbb{Z}^2| \right| < c_0(\gamma), \tag{5.73} \]
where $c_0(\gamma) < \infty$ is a constant independent of $\beta$ and $N$. Combining (5.71)–(5.73), Eq. (5.33) follows.

Next we prove (5.34). Let $0 \le a < b \le 1$ be fixed, and for any $M \ge 1$ define the parallelogram
\[ P_M = \left\{ \mathbf{v} = (v_1, v_2) \in \mathbb{R}^2 : a \le v_1 - v_2\sqrt{2} \le b,\ 0 \le v_1 + v_2\sqrt{2} \le M \right\}. \tag{5.74} \]
If $M$ is large, then $P_M$ is a long and narrow parallelogram, but we can turn it into a "round" shape by applying an appropriate automorphism of the quadratic form $x^2 - 2y^2$. The substitution $x_1 = x + 2y$, $y_1 = x + y$ is a fundamental automorphism of $x^2 - 2y^2$ (indeed, $x_1^2 - 2y_1^2 = (x + 2y)^2 - 2(x + y)^2 = -(x^2 - 2y^2)$), and
\[ A^k = \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}^{k}, \quad k \in \mathbb{Z}, \]
give rise to infinitely many automorphisms preserving the lattice points and the area. The eigenvectors of $A = \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}$ are parallel to the sides of the parallelogram $P_M$, so applying an appropriate power $A^k$ on the long and narrow parallelogram $P_M$, we obtain a "round" shape parallelogram $A^k P_M$ with sides parallel to those of $P_M$, and of course
\[ \operatorname{area}(A^k P_M) = \operatorname{area}(P_M) = \text{const} \cdot M. \]
"Round" shape means that the diameter of the parallelogram $A^k P_M$ is $O(\sqrt{M})$, so the number of unit squares $[0,1)^2 + \mathbf{n}$, $\mathbf{n} \in \mathbb{Z}^2$, intersecting the boundary of $A^k P_M$ is $O(\sqrt{M})$. Combining this geometric fact with Lemma 5.8 [see (5.72)], we have
\[ \frac{1}{\operatorname{area}(P_M)} \int_{P_M} |(H_N(\gamma) + \mathbf{v}) \cap \mathbb{Z}^2|\, d\mathbf{v} = \operatorname{area}(H_N(\gamma)) \left( 1 + O(M^{-1/2}) \right). \tag{5.75} \]
If $\mathbf{v} = (v_1, v_2) \in [0,1)^2$ is chosen in such a way that $v_1 - v_2\sqrt{2} \equiv \beta \pmod{1}$ is fixed, then clearly
\[ \left| F(\sqrt{2}; \beta; \gamma; N) - |(H_N(\gamma) + \mathbf{v}) \cap \mathbb{Z}^2| \right| < c_0(\gamma, M), \tag{5.76} \]
where $c_0(\gamma, M) < \infty$ is a constant independent of $\beta$ and $N$. Combining (5.71), (5.75), and (5.76),
\[ \frac{\frac{1}{b-a} \int_a^b F(\sqrt{2}; \beta; \gamma; N)\, d\beta}{\log N} = \left( \frac{\gamma}{\sqrt{2}} + O(1/\log N) \right) \left( 1 + O(M^{-1/2}) \right) + \frac{c_0(\gamma, M)}{\log N}. \tag{5.77} \]
Since $M$ can be arbitrarily large, (5.77) implies (5.34). The proof of Lemma 5.2 is complete. □

For the sake of completeness we include a

Proof of Lemma 5.8. First assume that $S$ is bounded. Let $N$ be a "large" integer. By using the periodicity of $\mathbb{Z}^2$ we have
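\[ \int_0^N \int_0^N |(S + \mathbf{x}) \cap \mathbb{Z}^2|\, d\mathbf{x} = N^2 \int_0^1 \int_0^1 |(S + \mathbf{x}) \cap \mathbb{Z}^2|\, d\mathbf{x}. \]
On the other hand,
\[ \int_0^N \int_0^N |(S + \mathbf{x}) \cap \mathbb{Z}^2|\, d\mathbf{x} = \sum_{\mathbf{n} \in \mathbb{Z}^2} \operatorname{area}\left\{ \mathbf{x} \in [0,N]^2 : \mathbf{n} \in S + \mathbf{x} \right\} = \sum_{\mathbf{n} \in \mathbb{Z}^2} \operatorname{area}\left( (\mathbf{n} - S) \cap [0,N]^2 \right). \]
Without loss of generality we can assume that the origin is inside $S$. Let $d(S)$ denote the diameter of $S$. Then $(\mathbf{n} - S) \subset [0,N]^2$ if $\mathbf{n} \in [d(S), N - d(S)]^2$, and $(\mathbf{n} - S) \cap [0,N]^2 = \emptyset$ if $\mathbf{n} \notin [-d(S), N + d(S)]^2$. Thus we have
\[ (N + 2d(S))^2 \operatorname{area}(S) \ge \sum_{\mathbf{n} \in \mathbb{Z}^2} \operatorname{area}\left( (\mathbf{n} - S) \cap [0,N]^2 \right) \ge (N - 2d(S))^2 \operatorname{area}(S). \]
Dividing the last line by $N^2$, and combining the equations above, Lemma 5.8 follows as $N$ tends to infinity.

If $S$ is unbounded, then we approximate $S$ with an increasing sequence $S_1 \subset S_2 \subset S_3 \subset \cdots$ of subsets of $S$ such that each $S_k$ is bounded and $\operatorname{area}(S \setminus S_k) \to 0$. The last step is to use the continuity of the Lebesgue measure. □

Proof of Lemma 5.1. For notational simplicity we just prove the special case $\beta_2 = 0$ (the general case is the same). Again the key step is to apply Lemma 5.8. For $1 \le K < L \le \infty$ we define the following four regions:
\[ H_{K,L}(\beta; \gamma) = \left\{ (x,y) \in \mathbb{R}^2 : -\gamma \le (x + \beta)^2 - 2y^2 \le \gamma \text{ where } K \le y \le L,\ x > 0 \right\}, \]
\[ \tilde{H}_{K,L}(\beta; \gamma) = \left\{ (x,y) \in \mathbb{R}^2 : |x + \beta - y\sqrt{2}| \cdot 2\sqrt{2}y < \gamma \text{ where } K \le y \le L,\ x > 0 \right\}, \]
\[ \tilde{H}^{+}_{K,L}(\beta; \gamma) = \left\{ (x,y) \in \mathbb{R}^2 : |x + \beta - y\sqrt{2}| \cdot (2\sqrt{2}y + 1) < \gamma \text{ where } K \le y \le L,\ x > 0 \right\}, \]
\[ \tilde{H}^{-}_{K,L}(\beta; \gamma) = \left\{ (x,y) \in \mathbb{R}^2 : |x + \beta - y\sqrt{2}| \cdot (2\sqrt{2}y - 1) < \gamma \text{ where } K \le y \le L,\ x > 0 \right\}. \]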
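In view of factorization (5.29), $(x,y) \in H_{K,L}(\beta; \gamma)$ implies that $x + \beta = y\sqrt{2} + o(1)$; in fact, we have the stronger form $x + \beta = y\sqrt{2} + O(1/y)$. Thus there is a threshold $c_1 = c_1(\gamma)$ such that
\[ \tilde{H}^{+}_{K,L}(\beta; \gamma) \subset H_{K,L}(\beta; \gamma) \subset \tilde{H}^{-}_{K,L}(\beta; \gamma) \]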
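holds for all $L > K > c_1(\gamma)$. On the other hand, it is trivial that
\[ \tilde{H}^{+}_{K,L}(\beta; \gamma) \subset \tilde{H}_{K,L}(\beta; \gamma) \subset \tilde{H}^{-}_{K,L}(\beta; \gamma). \]
Consider now the special case $K = 1$, $L = \infty$, $\beta = 0$, and study the difference set
\[ D(\gamma) = \tilde{H}^{-}_{1,\infty}(0; \gamma) \setminus \tilde{H}^{+}_{1,\infty}(0; \gamma). \]
We estimate the area of the difference set $D(\gamma)$:
\[ \operatorname{area}(D(\gamma)) = O(1) \int_1^{\infty} \left( \frac{1}{2\sqrt{2}y - 1} - \frac{1}{2\sqrt{2}y + 1} \right) dy = O(1) \int_1^{\infty} \frac{dy}{8y^2 - 1} = O(1). \]
Combining this with Lemma 5.8, we have
\[ \int_0^1 \int_0^1 |(D(\gamma) + \mathbf{v}) \cap \mathbb{Z}^2|\, d\mathbf{v} = \operatorname{area}(D(\gamma)) < \infty. \tag{5.78} \]
If $\mathbf{v} = (v_1, v_2) \in [0,1)^2$ is chosen in such a way that $v_1 - v_2\sqrt{2} \equiv \beta \pmod{1}$ is fixed, then
\[ D(\gamma) + \mathbf{v} \supset H_{K,L}(\beta; \gamma) \,\triangle\, \tilde{H}_{K,L}(\beta; \gamma), \tag{5.79} \]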
where $A \,\triangle\, B = (A\setminus B)\cup(B\setminus A)$ is the symmetric difference of $A$ and $B$. Combining (5.78) and (5.79), Lemma 5.1 easily follows. □

Proof of Lemma 5.5. Consider a rectangle of slope $1/\sqrt2$ which contains two lattice points $P = (k,\ell)$ and $Q = (m,n)$; in fact, assume that $P$, $Q$ are two corner points of the rectangle. We denote the $PQ$ vector as $\mathbf{v} = (m-k, n-\ell)$, and consider the two perpendicular unit vectors

$$\mathbf{e}_1 = \left(\frac{\sqrt2}{\sqrt3}, \frac{1}{\sqrt3}\right) \quad\text{and}\quad \mathbf{e}_2 = \left(\frac{1}{\sqrt3}, -\frac{\sqrt2}{\sqrt3}\right).$$

Then the two sides, $a$ and $b$, of the rectangle can be expressed in terms of the inner products $\mathbf{e}_1\cdot\mathbf{v}$ and $\mathbf{e}_2\cdot\mathbf{v}$:

$$a = |\mathbf{e}_1\cdot\mathbf{v}| = \frac{|p\sqrt2+q|}{\sqrt3} \quad\text{and}\quad b = |\mathbf{e}_2\cdot\mathbf{v}| = \frac{|p-q\sqrt2|}{\sqrt3},$$

where $p = m-k$ and $q = n-\ell$. Thus we have

$$\mathrm{area} = ab = \frac{|p\sqrt2+q|\cdot|p-q\sqrt2|}{3}.$$

Without loss of generality we can assume that $p \ge 0$ and $q \ge 0$. Since $(p,q)\ne(0,0)$, we have

$$|p-q\sqrt2| = \frac{|p^2-2q^2|}{p+q\sqrt2} \ge \frac{1}{p+q\sqrt2},$$

and so

$$\mathrm{area} = \frac{|p\sqrt2+q|}{\sqrt3}\cdot\frac{|p-q\sqrt2|}{\sqrt3} \ge \frac13\cdot\frac{p\sqrt2+q}{p+q\sqrt2} \ge \frac13\cdot\frac{p/\sqrt2+q}{p+q\sqrt2} = \frac{1}{3\sqrt2} > \frac15,$$

proving Lemma 5.5. □
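The bound in Lemma 5.5 can be spot-checked numerically. The following is a minimal sketch, not part of the original argument: it evaluates the product $ab$ from the proof for every nonzero integer vector $(p,q)$ in a small box and compares the smallest value with $1/(3\sqrt2)$ and $1/5$. The function names and the search bound 60 are arbitrary choices made here for illustration.

```python
import math

SQRT2 = math.sqrt(2.0)

def tilted_rectangle_area(p, q):
    """Area a*b of the tilted rectangle spanned by the lattice vector (p, q)."""
    a = abs(p * SQRT2 + q) / math.sqrt(3.0)   # |e1 . v|
    b = abs(p - q * SQRT2) / math.sqrt(3.0)   # |e2 . v|
    return a * b

def min_area(bound):
    """Smallest area over all nonzero integer vectors with |p|, |q| <= bound."""
    return min(
        tilted_rectangle_area(p, q)
        for p in range(-bound, bound + 1)
        for q in range(-bound, bound + 1)
        if (p, q) != (0, 0)
    )

smallest = min_area(60)
print(smallest, 1 / (3 * SQRT2), 1 / 5)
```

A finite search of this kind only illustrates the lemma; the proof above is what covers all $(p,q)$.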
Proof of Theorem 5.3. We show that the set of $\beta$s in question (“set of divergence points”) contains a Cantor set. This guarantees that the cardinality of the set is continuum. We make a standard Cantor set construction, i.e., we apply the method of “nested intervals.” For notational convenience, we write $F(\sqrt2;\beta;\gamma;N) = F(\beta;\gamma;N)$. By (5.33),

$$\int_0^1 F(\beta;\gamma;N)\,d\beta = \frac{\gamma}{\sqrt2}\log N + O(1),$$

and applying it with $\gamma = 1/4$, we obtain the existence of a $0 < \beta_1 < 1$ and an arbitrarily large integer $N_1$ such that

$$F(\beta_1;\gamma=1/4;N_1) > \frac18\log N_1.$$

Since $1/4 < 1/2$, there is an interval $I_1 = [a,b]$ with $0 < a < b < 1$ such that $\beta_1\in I_1$ and

$$F(\beta;\gamma=1/2;N_1) > \frac18\log N_1 \quad\text{for all } \beta\in I_1. \tag{5.80}$$
Next let $\mathbf{n} = (n_1,n_2)\in\mathbb{Z}^2$ be a lattice point such that $\beta_2 = n_1 - n_2\sqrt2 \in I_1$. Since the equation $|x^2-2y^2| \le 3/4$ does not have a nonzero integral solution, trivially

$$F(\beta_2;\gamma=3/4;N) < \frac{1}{100}\log N \quad\text{for all } N \ge N_2,$$

where $N_2 < \infty$ is a sufficiently large threshold. We can clearly assume that $N_2 > N_1$. Since $3/4 > 1/2$, there is an interval $I_2 = [a,b]$ with some $0 < a < b < 1$ ($a$ and $b$ are generic numbers) such that $\beta_2\in I_2$ and

$$F(\beta;\gamma=1/2;N_2) < \frac{1}{100}\log N_2 \quad\text{for all } \beta\in I_2. \tag{5.81}$$
We can clearly assume that $I_2$ is a proper subinterval of $I_1$. Let $I(0) = I_2$, and repeating the second argument, there is another closed subinterval $I(1)$ such that $I(0)\cup I(1)\subset I_1$, $I(0)$ and $I(1)$ are disjoint, and

$$F(\beta;\gamma=1/2;N_2^{(1)}) < \frac{1}{100}\log N_2^{(1)} \quad\text{for all } \beta\in I(1). \tag{5.82}$$

We can clearly assume that $N_2^{(1)} > N_1$. By (5.34),

$$\frac{1}{|I(0)|}\int_{I(0)} F(\beta;\gamma;N)\,d\beta = (1+o(1))\,\frac{\gamma}{\sqrt2}\log N,$$

and applying it with $\gamma = 1/4$, we obtain the existence of a $0 < \beta_3 < 1$ and a large integer $N_3$ such that

$$F(\beta_3;\gamma=1/4;N_3) > \frac18\log N_3.$$

Since $1/4 < 1/2$, there is an interval $I_3 = [a,b]$ with $0 < a < b < 1$ such that $\beta_3\in I_3$ and

$$F(\beta;\gamma=1/2;N_3) > \frac18\log N_3 \quad\text{for all } \beta\in I_3. \tag{5.83}$$

We can clearly assume that $I_3$ is a proper subinterval of $I(0)$. Write $I(0,0) = I_3$. Similarly, there is another subinterval $I(0,1)$ such that $I(0,0)\cup I(0,1)\subset I(0)$, $I(0,0)$ and $I(0,1)$ are disjoint, and

$$F(\beta;\gamma=1/2;N_3^{(1)}) > \frac18\log N_3^{(1)} \quad\text{for all } \beta\in I(0,1). \tag{5.84}$$
There are similar disjoint subintervals $I(1,0)$ and $I(1,1)$ of $I(1)$.

Next let $\mathbf{n} = (n_1,n_2)\in\mathbb{Z}^2$ be a lattice point such that $\beta_4 = n_1 - n_2\sqrt2 \in I(0,0)$. Since the equation $|x^2-2y^2| \le 3/4$ does not have a non-trivial integral solution,

$$F(\beta_4;\gamma=3/4;N) < \frac{1}{100}\log N \quad\text{for all } N \ge N_4,$$

where $N_4 < \infty$ is a sufficiently large threshold. We can clearly assume that $N_4 > N_3$. Since $3/4 > 1/2$, there is an interval $I_4 = [a,b]$ with $0 < a < b < 1$ such that $\beta_4\in I_4$ and

$$F(\beta;\gamma=1/2;N_4) < \frac{1}{100}\log N_4 \quad\text{for all } \beta\in I_4. \tag{5.85}$$

We can clearly assume that $I_4$ is a proper subinterval of $I(0,0)$. Let $I(0,0,0) = I_4$, and repeating the last argument, there is another closed subinterval $I(0,0,1)$ such that $I(0,0,0)\cup I(0,0,1)\subset I(0,0)$, $I(0,0,0)$ and $I(0,0,1)$ are disjoint, and

$$F(\beta;\gamma=1/2;N_4^{(1)}) < \frac{1}{100}\log N_4^{(1)} \quad\text{for all } \beta\in I(0,0,1), \tag{5.86}$$
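and so on. Repeating this argument, we build an infinite binary tree:

$$I_1 \supset I_{\varepsilon_1} \supset I_{\varepsilon_1,\varepsilon_2} \supset I_{\varepsilon_1,\varepsilon_2,\varepsilon_3} \supset \cdots$$

where $\varepsilon_1 = 0$ or $1$, $\varepsilon_2 = 0$ or $1$, $\varepsilon_3 = 0$ or $1$, and so on. For an arbitrary infinite 0–1 sequence $\varepsilon_1, \varepsilon_2, \varepsilon_3, \ldots$, let

$$\beta \in I_1 \cap I_{\varepsilon_1} \cap I_{\varepsilon_1,\varepsilon_2} \cap I_{\varepsilon_1,\varepsilon_2,\varepsilon_3} \cap \cdots;$$

then by (5.80)–(5.86) there is an infinite sequence $1 < M_1 < M_2 < M_3 < M_4 < \ldots$ of integers such that

$$F(\beta;\gamma=1/2;M_{2k-1}) > \frac18\log M_{2k-1}$$

and

$$F(\beta;\gamma=1/2;M_{2k}) < \frac{1}{100}\log M_{2k},$$

where $k = 1, 2, 3, \ldots$. This proves Theorem 5.3. □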
5.4 The Riesz Product

5.4.1 The Method of Nested Intervals vs. the Riesz Product

At the end of Sect. 5.1 we formulated a far-reaching generalization of Theorem 5.3: see (5.36). It states that Theorem 5.3 actually holds for every $\gamma > 0$, and we have the stronger inequality

$$\limsup_{n\to\infty}\frac{F(\sqrt2;\beta^*;\gamma;n)}{\log n} > \frac{\gamma}{\sqrt2} > \liminf_{n\to\infty}\frac{F(\sqrt2;\beta^*;\gamma;n)}{\log n}, \tag{5.87}$$
where $\frac{\gamma}{\sqrt2}\log n + O(1)$ is the area of the corresponding hyperbolic region. What is more, (5.87) holds for continuum many “divergence points” $\beta^* = \beta^*(\gamma)\in[0,1)$.

The proof of Theorem 5.3 was based on an elementary argument that we may call the “method of nested intervals.” To prove (5.87) we need a new idea: we apply a more sophisticated “Riesz product argument.” The Riesz product is a powerful tool in Fourier analysis. A typical application is to prove large fluctuations for lacunary trigonometric series. To compare the “method of nested intervals” to the method of Riesz product, we give a simple illustration, see Facts 1 and 2 below. (Fact 2 is actually a well-known theorem of S. Sidon.) Consider a finite cosine sum

$$F(x) = \sum_{j=1}^{N} a_j\cos(2\pi n_j x), \quad\text{where } a_j = \pm1 \text{ for all } 1\le j\le N, \tag{5.88}$$

and $1 \le n_1 < n_2 < \ldots < n_N$ are integers. We study the following question: What can we say about $\max_{0\le x\le1}F(x)$? Well, under different extra conditions we have different results. We begin with

Fact 1. If the strong gap condition $n_{j+1}/n_j \ge 8$ holds for every $1\le j\le N-1$, then

$$\max_{0\le x\le1}F(x) \ge \frac N2.$$
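Fact 1 can be explored numerically with a greedy nested-interval search. The sketch below is an illustration under stated assumptions rather than the author's construction verbatim: the frequencies $n_j = 2\cdot 8^{j}$ and the signs are arbitrary test data, the function names are made up here, and the code additionally assumes $n_1 \ge 2$ so that the first interval fits inside $[0,1]$. At each step it picks the first interval inside the current one on which $a_j\cos(2\pi n_j x)\ge 1/2$.

```python
import math

def nested_interval_point(freqs, signs):
    """Return x in [0, 1] with signs[j]*cos(2*pi*freqs[j]*x) >= 1/2 for every j.

    Assumes freqs[0] >= 2 and freqs[j+1] >= 8*freqs[j].
    """
    lo, hi = 0.0, 1.0
    for n, a in zip(freqs, signs):
        phase = 0.0 if a == 1 else 0.5          # a*cos(2*pi*n*x) peaks at x = (k+phase)/n
        half = 1.0 / (6.0 * n)                  # a*cos >= 1/2 within this distance of a peak
        k = math.ceil(lo * n + 1.0 / 6.0 - phase)   # first peak whose interval starts at or after lo
        centre = (k + phase) / n
        new_lo, new_hi = centre - half, centre + half
        if new_lo < lo - 1e-12 or new_hi > hi + 1e-12:
            raise ValueError("gap condition too weak for this construction")
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2.0

def cosine_sum(freqs, signs, x):
    """F(x) from (5.88)."""
    return sum(a * math.cos(2.0 * math.pi * n * x) for n, a in zip(freqs, signs))

freqs = [2 * 8 ** j for j in range(8)]          # ratio exactly 8
signs = [1, -1, -1, 1, -1, 1, 1, -1]
x_star = nested_interval_point(freqs, signs)
value = cosine_sum(freqs, signs, x_star)
print(x_star, value, len(freqs) / 2)
```

Each term contributes at least $1/2$ at the point found, so the printed value of $F$ should be at least $N/2$.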
The proof of Fact 1 is almost trivial. Let

$$J_1 = \left\{x\in[0,1] :\ \cos(2\pi n_1 x) \text{ falls between } \frac{a_1}{2} \text{ and } a_1\right\}.$$

Since $a_1 = \pm1$, the set $J_1$ contains a closed subinterval $I_1$ of length

$$|I_1| \ge \frac{1}{4n_1}.$$

Next let

$$J_2 = \left\{x\in I_1 :\ \cos(2\pi n_2 x) \text{ falls between } \frac{a_2}{2} \text{ and } a_2\right\}.$$

Since $a_2 = \pm1$, the set $J_2$ contains a closed subinterval $I_2$ of length

$$|I_2| \ge \frac{1}{4n_2}.$$

Next let

$$J_3 = \left\{x\in I_2 :\ \cos(2\pi n_3 x) \text{ falls between } \frac{a_3}{2} \text{ and } a_3\right\},$$

and so on. At the end of this process, we obtain a nested sequence of closed intervals $[0,1]\supset I_1\supset I_2\supset\cdots\supset I_N$ such that $a_k\cos(2\pi n_k x) \ge 1/2$ for all $x\in I_k$, $k = 1, 2, \ldots, N$. Then we clearly have

$$x\in I_N \Longrightarrow F(x) \ge \frac N2,$$
proving Fact 1. This was a typical application of the “method of nested intervals.”

Next comes the “Riesz product argument.” The problem that we study is the following: What happens if the strong gap condition $n_{j+1}/n_j \ge 8$ is replaced by the weaker $n_{j+1}/n_j \ge 1+\varepsilon > 1$ where $\varepsilon > 0$ is an arbitrarily small but fixed constant? Can we still prove a linear lower bound like $\max_{0\le x\le1}F(x) \ge c\cdot N$ with some constant $c = c(\varepsilon) > 0$ depending only on the value of $\varepsilon$? Unfortunately, the “method of nested intervals” hopelessly collapses, and we need a new approach: it is exactly the “Riesz product argument.” The following result, a well-known theorem of Sidon in Fourier analysis, is much deeper than that of Fact 1.

Fact 2 (Sidon’s theorem). If the weak gap condition

$$\frac{n_{j+1}}{n_j} \ge 1+\varepsilon > 1 \tag{5.89}$$

holds for every $1\le j\le N-1$, where $0 < \varepsilon < 1/2$ is a fixed constant, then for $F(x)$ defined in (5.88) we have

$$\max_{0\le x\le1}F(x) \ge c\cdot N \quad\text{with}\quad c = \frac{\varepsilon}{4}\cdot\frac{1}{\log\frac{2}{\varepsilon}}.$$

To prove Sidon’s theorem, we define a Riesz product as follows. Let $1 = i(1) < i(2) < \ldots < i(M)$ be a subsequence of $1, 2, 3, \ldots, N$ such that

$$\frac{n_{i(j+1)}}{n_{i(j)}} \ge \frac{2}{\varepsilon} \quad\text{holds for all } j = 1, 2, \ldots, M-1, \tag{5.90}$$

and consider the product

$$R(x) = \prod_{j=1}^{M}\left(1 + a_{i(j)}\cos(2\pi n_{i(j)}x)\right). \tag{5.91}$$
Since $a_{i(j)} = \pm1$, the Riesz product $R(x)$ is obviously nonnegative, i.e., $R(x) \ge 0$. The Riesz product $R(x)$ is used as a test function: first we evaluate the integral

$$\int_0^1 F(x)R(x)\,dx = \sum_{j=1}^{M} a_{i(j)}^2\int_0^1\cos^2(2\pi n_{i(j)}x)\,dx = \frac M2. \tag{5.92}$$

Indeed, multiplying out the Riesz product $R(x)$, and using Euler’s formula $\cos y = (e^{iy}+e^{-iy})/2$, we obtain terms like

$$a_{i(j_1)}a_{i(j_2)}a_{i(j_3)}\cdots a_{i(j_k)}\,e^{2\pi i\left(\pm n_{i(j_1)}\pm n_{i(j_2)}\pm n_{i(j_3)}\pm\cdots\pm n_{i(j_k)}\right)x}, \tag{5.93}$$
where we call (5.93) a product of length $k \ge 1$. We distinguish two cases.

Case 1: $k = 1$ (“short products”). Multiplying the corresponding terms with $F(x)$ and integrating from 0 to 1, we obtain

$$\sum_{j=1}^{M} a_{i(j)}^2\int_0^1\cos^2(2\pi n_{i(j)}x)\,dx = \frac M2,$$

which is exactly (5.92). Next assume

Case 2: $k \ge 2$ (“long products”). We can clearly write $1 \le j_1 < j_2 < \ldots < j_k$, then by using the elementary inequalities

$$1 + \frac{\varepsilon}{2} + \left(\frac{\varepsilon}{2}\right)^2 + \left(\frac{\varepsilon}{2}\right)^3 + \cdots < 1+\varepsilon \quad\text{and}\quad 1 - \frac{\varepsilon}{2} - \left(\frac{\varepsilon}{2}\right)^2 - \left(\frac{\varepsilon}{2}\right)^3 - \cdots > \frac{1}{1+\varepsilon}$$

if $0 < \varepsilon < 1/2$, we obtain that

$$\left|\pm n_{i(j_1)}\pm n_{i(j_2)}\pm n_{i(j_3)}\pm\cdots\pm n_{i(j_k)}\right| \text{ falls between } (1+\varepsilon)\,n_{i(j_k)} \text{ and } \frac{1}{1+\varepsilon}\,n_{i(j_k)}.$$

Comparing this to the gap condition (5.89), we see that $F(x)$ and the “long products” of $R(x)$ represent disjoint sets of exponential functions $e^{2\pi i\ell x}$, $\ell\in\mathbb{Z}$; and using the orthogonality of these functions, the contribution of Case 2 in $\int_0^1 F(x)R(x)\,dx$ is zero. This proves (5.92).
The same argument shows that

$$\int_0^1 R(x)\,dx = 1. \tag{5.94}$$

Since $R(x) \ge 0$, (5.94) means that the integral $\int_0^1 F(x)R(x)\,dx$ is a “weighted average” of $F(x)$ (with nonnegative weights). So by (5.92),

$$\max_{0\le x\le1}F(x) \ge \int_0^1 F(x)R(x)\,dx = \frac M2. \tag{5.95}$$

The inequality

$$(1+\varepsilon)^r > \frac{2}{\varepsilon} \quad\text{clearly holds with}\quad r = \frac{2}{\varepsilon}\log\frac{2}{\varepsilon},$$

thus by (5.89) and (5.90) we can choose

$$M \ge \frac Nr = \frac{\varepsilon}{2}\cdot\frac{N}{\log\frac{2}{\varepsilon}}, \tag{5.96}$$

and (5.95), (5.96) complete the proof of Sidon’s theorem. □
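The identities (5.92) and (5.94) and the bound (5.95) can be checked on a small example. The following sketch is only an illustration: the frequency list, the signs and the value $\varepsilon = 0.4$ (so that $2/\varepsilon = 5$) are test data chosen here, and the subsequence $i(1) < i(2) < \ldots$ is picked greedily, which is one possible way to satisfy (5.90). Because every function involved is a trigonometric polynomial of degree below the number of grid points, the uniform Riemann sums equal the integrals up to rounding.

```python
import math

RATIO = 5          # 2/eps for the assumed eps = 0.4
freqs = [1, 2, 3, 5, 7, 10, 14, 20, 28, 40, 56, 80]   # consecutive ratios are all >= 1.4
signs = [1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1]

# Greedy subsequence i(1) < i(2) < ... with ratio >= 2/eps, as required in (5.90).
chosen = [0]
for idx in range(1, len(freqs)):
    if freqs[idx] >= RATIO * freqs[chosen[-1]]:
        chosen.append(idx)
M = len(chosen)

def F(x):
    """Cosine sum (5.88)."""
    return sum(a * math.cos(2 * math.pi * n * x) for n, a in zip(freqs, signs))

def R(x):
    """Riesz product (5.91) over the chosen subsequence."""
    value = 1.0
    for idx in chosen:
        value *= 1.0 + signs[idx] * math.cos(2 * math.pi * freqs[idx] * x)
    return value

POINTS = 1024      # exceeds every frequency occurring in F*R, so grid means are exact
grid = [k / POINTS for k in range(POINTS)]
mean_FR = sum(F(x) * R(x) for x in grid) / POINTS   # integral of F*R over [0, 1]
mean_R = sum(R(x) for x in grid) / POINTS           # integral of R over [0, 1]
max_F = max(F(x) for x in grid)
print(M, mean_FR, mean_R, max_F)
```

With these data the greedy step keeps the frequencies 1, 5 and 28, so the first integral should come out as $M/2 = 3/2$ and the second as 1.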
5.4.2 The “Rectangle Property”, and a Key Result: Theorem 5.11

Let’s return now to Theorem 5.3 and (5.87). We restate Theorem 5.3 in a slightly different form. We recall the notation in (5.70):

$$H_N(\sqrt2;\gamma) = \left\{(x,y)\in\mathbb{R}^2 :\ -\gamma \le x^2-2y^2 \le \gamma \ \text{ where } 1 \le x+y\sqrt2 \le 2\sqrt2\,N\right\}, \tag{5.97}$$

that is, $H_N(\sqrt2;\gamma)$ is a long, narrow, tilted hyperbolic needle of slope $1/\sqrt2$. Its area is $\frac{\gamma}{\sqrt2}\log N + O(1)$, see (5.71). Theorem 5.3 states, roughly speaking, that in the special case $\gamma = 1/2$ there are two translated copies of the same tilted hyperbolic needle $H_N(\sqrt2;\gamma=1/2)$ such that one is substantially richer in lattice points than the other: the discrepancy is proportional to the area (“extra large deviation”). More precisely, there is a positive absolute constant $c > 0$ such that, for infinitely many integers $N = N_i$ (where $N_i\to\infty$), there are two translated copies $\mathbf{x}_1^{(i)} + H_N(\sqrt2;\gamma)$ and $\mathbf{x}_2^{(i)} + H_N(\sqrt2;\gamma)$ of the tilted hyperbolic needle $H_N(\sqrt2;\gamma=1/2)$ such that

$$\left|\,\left|\mathbb{Z}^2\cap\left(\mathbf{x}_1^{(i)} + H_N(\sqrt2;\gamma=1/2)\right)\right| - \left|\mathbb{Z}^2\cap\left(\mathbf{x}_2^{(i)} + H_N(\sqrt2;\gamma=1/2)\right)\right|\,\right| > c\cdot\log N = c\cdot\log N_i. \tag{5.98}$$
Because of the periodicity of the lattice points, we can clearly assume that the pairs $\mathbf{x}_1^{(i)}$, $\mathbf{x}_2^{(i)}$ of vectors are all in the unit square $[0,1)^2$ ($i\to\infty$). The extra large deviation result (5.98), which is equivalent to Theorem 5.3, can be generalized in several stages. The first generalization is (5.87) or at least an equivalent form as follows.

Proposition 5.9. Let $\gamma > 0$ be an arbitrary but fixed real number and let $N \ge 2$ be an integer. We study the number of lattice points in the translated copies of the tilted hyperbolic needle $H_N(\sqrt2;\gamma)$ of area $\frac{\gamma}{\sqrt2}\log N + O(1)$. There is a translated copy $\mathbf{x}_1 + H_N(\sqrt2;\gamma)$ such that

$$\left|\mathbb{Z}^2\cap\left(\mathbf{x}_1 + H_N(\sqrt2;\gamma)\right)\right| > \frac{\gamma}{\sqrt2}\log N + \delta'\log N, \tag{5.99$'$}$$

where $\delta' = \delta'(\gamma) > 0$ is a positive constant, independent of $N$. Similarly, there is another translated copy $\mathbf{x}_2 + H_N(\sqrt2;\gamma)$ such that

$$\left|\mathbb{Z}^2\cap\left(\mathbf{x}_2 + H_N(\sqrt2;\gamma)\right)\right| < \frac{\gamma}{\sqrt2}\log N - \delta'\log N, \tag{5.99$''$}$$
where $\delta' = \delta'(\gamma) > 0$ is the same positive constant as in (5.99$'$).

Note that Proposition 5.9 immediately gives the existence of a single “divergence point” $\beta^* = \beta^*(\gamma)\in[0,1)$ in (5.87). To prove continuum many “divergence points” $\beta^* = \beta^*(\gamma)\in[0,1)$, we just have to combine Proposition 5.9 with the routine Cantor set argument in the proof of Theorem 5.3.

Next comes the second stage of generalization: we replace the set $\mathbb{Z}^2$ of lattice points in the plane with an arbitrary subset $A\subset\mathbb{Z}^2$ of positive density. Here is an illustration. We say that a lattice point $\mathbf{n} = (n_1,n_2)\in\mathbb{Z}^2$ is coprime (or visible) if the coordinates $n_1$ and $n_2$ are relatively prime. The alternative name visible is explained by the geometric fact that if a lattice point $\mathbf{n} = (n_1,n_2)\in\mathbb{Z}^2$ is not coprime, then $\mathbf{n}$ is not visible from the origin (since $\mathbf{n}$ is “behind the back” of $(n_1/d, n_2/d)\in\mathbb{Z}^2$, where $d \ge 2$ is the greatest common divisor of $n_1$ and $n_2$, i.e., $(n_1/d, n_2/d)$ is between $(0,0)$ and $(n_1,n_2)$). Let $\mathbb{Z}^2_{\mathrm{coprime}}$ denote the set of coprime lattice points in the plane. It is well known from number theory that $\mathbb{Z}^2_{\mathrm{coprime}}$ is a positive density subset of $\mathbb{Z}^2$, and the density is $6/\pi^2$.

Now let $A$ be an arbitrary subset of $\mathbb{Z}^2$ of positive density $\delta = \delta(A) > 0$. There is a natural generalization of Proposition 5.9 where we replace $\mathbb{Z}^2$ with $A$; the price that we pay is that, due to the lack of periodicity of a general subset $A$, the translations are not necessarily in the unit square anymore.
Proposition 5.10. Let $A\subset\mathbb{Z}^2$ be an arbitrary subset of positive density $\delta = \delta(A) > 0$. Let $\gamma > 0$ be an arbitrary but fixed real number and let $N \ge 2$ be an integer. We study the number of elements of $A$ in the translated copies of the tilted hyperbolic needle $H_N(\sqrt2;\gamma)$ of area $\frac{\gamma}{\sqrt2}\log N + O(1)$. We restrict our attention to the translated copies inside a square $[0,M]^2$. Assume that $M/N$ is sufficiently large depending only on $\gamma$ and $\delta$. Then there is a translated copy $\mathbf{x}_1 + H_N(\sqrt2;\gamma)$ such that $\mathbf{x}_1 + H_N(\sqrt2;\gamma)\subset[0,M]^2$ and

$$\left|A\cap\left(\mathbf{x}_1 + H_N(\sqrt2;\gamma)\right)\right| > \delta\,\frac{\gamma}{\sqrt2}\log N + \delta'\log N, \tag{5.100$'$}$$

where $\delta' = \delta'(\gamma,\delta) > 0$ is a positive constant, independent of $N$ and $M$. Similarly, there is another translated copy $\mathbf{x}_2 + H_N(\sqrt2;\gamma)$ such that $\mathbf{x}_2 + H_N(\sqrt2;\gamma)\subset[0,M]^2$ and

$$\left|A\cap\left(\mathbf{x}_2 + H_N(\sqrt2;\gamma)\right)\right| < \delta\,\frac{\gamma}{\sqrt2}\log N - \delta'\log N, \tag{5.100$''$}$$
where $\delta' = \delta'(\gamma,\delta) > 0$ is the same positive constant as in (5.100$'$).

It turns out that the only relevant property of a lattice point set $A\subset\mathbb{Z}^2$ that we really use in the proof of Proposition 5.10 is the “rectangle property” in Lemma 5.5: every tilted rectangle of slope $1/\sqrt2$ and area $1/5$ contains at most one lattice point. (Of course, the concrete value $1/5$ of the constant is secondary.) The third stage of generalization goes far beyond the family of lattice point sets $A\subset\mathbb{Z}^2$: the only requirement is that the point set satisfies the “rectangle property.”

Theorem 5.11. Let $\mathcal{P}$ be a finite set of points in the square $[0,M]^2$ with density $\delta$, i.e., the number of elements of $\mathcal{P}$ is $|\mathcal{P}| = \delta\cdot M^2$. We study the number of elements of $\mathcal{P}$ in the translated copies of the tilted hyperbolic needle $H_N(\sqrt2;\gamma)$ of area $\frac{\gamma}{\sqrt2}\log N + O(1)$. We restrict our attention to the translated copies inside a square $[0,M]^2$. Assume that $\mathcal{P}$ satisfies the following “rectangle property”: there is a positive constant $c_1 = c_1(\mathcal{P}) > 0$ such that every tilted rectangle of slope $1/\sqrt2$ and of area $c_1$ contains at most one element of the set $\mathcal{P}$. Furthermore, assume that both $N$ and $M/N$ are “large” in the precise sense of (5.102).

Then there is a translated copy $\mathbf{x}_1 + H_N(\sqrt2;\gamma)$ such that $\mathbf{x}_1 + H_N(\sqrt2;\gamma)\subset[0,M]^2$ and

$$\left|\mathcal{P}\cap\left(\mathbf{x}_1 + H_N(\sqrt2;\gamma)\right)\right| > \delta\,\frac{\gamma}{\sqrt2}\log N + \delta'\log N, \tag{5.101$'$}$$

where $\delta' = \delta'(c_1,\gamma,\delta) > 0$ is a positive constant, independent of $N$ and $M$.

Similarly, there is another translated copy $\mathbf{x}_2 + H_N(\sqrt2;\gamma)$ such that $\mathbf{x}_2 + H_N(\sqrt2;\gamma)\subset[0,M]^2$ and

$$\left|\mathcal{P}\cap\left(\mathbf{x}_2 + H_N(\sqrt2;\gamma)\right)\right| < \delta\,\frac{\gamma}{\sqrt2}\log N - \delta'\log N \tag{5.101$''$}$$
with the same ı 0 D ı 0 .c1 ; ; ı/ > 0 as in (5.1010); namely, 107 c1 107 c12 p 0 0 12 ı D ı .c1 ; ; ı/ D 10 ı min : ; c1 ; ; 20 2 2 Finally, the assumption that both N and M=N are “large” goes as follows: N 2
10 C 1
;
1 N < 2n N; 2
C 1 .N C 2 / n p o: M > 1011 7 107 c 2 ı c1 min 20 ; c1 ; 10 2 c1 ; 2 1
(5.102)
Note that Propositions 1.15 and 1.16 are special cases of Theorem 5.11 (with P D ZZ2 and P D A). Unfortunately, the proof of Theorem 5.11 is rather difficult and long. The complicated details cover the next four sections. But the main idea is quite simple: it is basically a sophisticated application of the Riesz product.
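To make the counting problem concrete, here is a minimal sketch that counts lattice points in a translated copy $\mathbf{x} + H_N(\sqrt2;\gamma)$. It assumes the reading of (5.97) as $-\gamma \le x^2-2y^2 \le \gamma$, $1 \le x+y\sqrt2 \le 2\sqrt2\,N$; the function name and the sample parameters are choices made here, not part of the text. For the untranslated needle the count is governed by the solutions of $a^2-2b^2 = 0, \pm1$, which gives a convenient sanity check.

```python
import math

SQRT2 = math.sqrt(2.0)

def count_in_needle(x1, x2, gamma, N):
    """Number of integer points (a, b) lying in (x1, x2) + H_N(sqrt2; gamma)."""
    count = 0
    # In the needle, y ranges over [(1-gamma)/(2*sqrt2), N + gamma/(2*sqrt2)];
    # the loop bounds are padded by one on each side.
    b_min = math.floor(x2 + (1 - gamma) / (2 * SQRT2)) - 1
    b_max = math.ceil(x2 + N + gamma / (2 * SQRT2)) + 1
    for b in range(b_min, b_max + 1):
        y = b - x2
        centre = x1 + y * SQRT2            # in the needle, |x - y*sqrt2| <= gamma
        for a in range(math.floor(centre - gamma) - 1, math.ceil(centre + gamma) + 2):
            x = a - x1
            in_strip = abs(x * x - 2 * y * y) <= gamma
            in_range = 1 <= x + y * SQRT2 <= 2 * SQRT2 * N
            if in_strip and in_range:
                count += 1
    return count

N = 1000
for gamma in (0.5, 1.0):
    area = gamma / SQRT2 * math.log(2 * SQRT2 * N)
    print(gamma, count_in_needle(0.0, 0.0, gamma, N), round(area, 2))
```

For $\gamma = 1/2$ the untranslated needle contains no lattice point at all, since $|a^2-2b^2| \le 1/2$ forces $a = b = 0$, while its area grows like $\log N$; translating the needle changes the count, which is the phenomenon Propositions 5.9 and 5.10 and Theorem 5.11 quantify.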
5.5 Starting the Proof of Theorem 5.11 Using Riesz Product

Since the proof is long and complicated, a convenient notation makes a big difference. It is much simpler for us to work with hyperbolic regions in the usual horizontal–vertical position (instead of the tilted position). It means that, instead of working with the set $\mathbb{Z}^2$ of lattice points in the plane and the family of tilted hyperbolic needles of a fixed (quadratic) irrational slope (i.e., the setup of Theorem 5.11), we rotate back. In other words, we rotate $\mathbb{Z}^2$ by a (quadratic) irrational slope, and consider the family of hyperbolic needles in the usual horizontal–vertical position.

Let $\gamma > 0$ be an arbitrary real number and let $N \ge 2$ be a (large) integer: let $H_\gamma(N)$ denote the hyperbolic region $-\gamma \le xy \le \gamma$, $1 \le x \le N$, see Fig. 5.2. Notice that $H_\gamma(N)$ is basically the horizontal–vertical version of the tilted hyperbolic needle $H_N(\sqrt2;\gamma)$ [see (5.70) or (5.97)]. To emphasize the difference between the tilted and the horizontal–vertical versions, we made a major change in the notation: notice that we switched the location of the parameters $\gamma$ and $N$. Again we refer to $H_\gamma(N)$ as a “hyperbolic needle.” The area of $H_\gamma(N)$ equals the integral

Fig. 5.2

$$\mathrm{area}(H_\gamma(N)) = 2\gamma\int_1^N\frac{dx}{x} = 2\gamma\log N. \tag{5.103}$$
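For comparison, the area $\frac{\gamma}{\sqrt2}\log N + O(1)$ quoted for the tilted needle can be recovered by a change of variables. The derivation below is a sketch that assumes the reading of (5.97) as $-\gamma \le x^2-2y^2 \le \gamma$, $1 \le x+y\sqrt2 \le 2\sqrt2\,N$; the variable names $u$, $v$ are introduced here for illustration only.

```latex
\begin{align*}
u &= x - y\sqrt2, \qquad v = x + y\sqrt2, \qquad uv = x^2 - 2y^2, \qquad
\left|\frac{\partial(u,v)}{\partial(x,y)}\right| = 2\sqrt2, \\
\operatorname{area}\bigl(H_N(\sqrt2;\gamma)\bigr)
  &= \frac{1}{2\sqrt2}\int_{1}^{2\sqrt2\,N}\int_{-\gamma/v}^{\gamma/v} du\,dv
   = \frac{1}{2\sqrt2}\int_{1}^{2\sqrt2\,N}\frac{2\gamma}{v}\,dv \\
  &= \frac{\gamma}{\sqrt2}\log\bigl(2\sqrt2\,N\bigr)
   = \frac{\gamma}{\sqrt2}\log N + O(1).
\end{align*}
```

In the $(u,v)$ coordinates the tilted needle has the same shape as the region $-\gamma \le xy \le \gamma$ of (5.103), and the factor $1/(2\sqrt2)$ is the Jacobian of the linear map, which is not a pure rotation.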
Let $\mathrm{rot}_\alpha\mathbb{Z}^2$ denote the rotated copy of $\mathbb{Z}^2$ by the angle $\theta$ where $\tan\theta = \alpha = \text{slope}$ (we assume that the origin is the fixpoint of the rotation). If $\alpha$ ($\ne 0$) is a quadratic irrational, then the continued fraction for $\alpha$ is (eventually) periodic. This is a well-known number-theoretic fact; for example, if $\alpha = \sqrt2/2$ then

$$\frac{\sqrt2}{2} = \frac{1}{\sqrt2} = \cfrac{1}{1+\cfrac{1}{2+\cfrac{1}{2+\cdots}}} = [1,2,2,2,\ldots] = [1,\overline{2}].$$

Periodicity implies that the continued fraction “digits” (officially called partial quotients) form a bounded sequence. Boundedness yields (via some well-known elementary facts from the theory of continued fractions) that

$$k\,\|k\alpha\| \ge c_0 = c_0(\alpha) > 0 \quad\text{for all integers } k \ge 1, \tag{5.104}$$

where $c_0 = c_0(\alpha) > 0$ is some positive constant depending only on $\alpha$, and $\|\cdot\|$ denotes the distance from the nearest integer. For example, if $\alpha = \sqrt2/2 = 1/\sqrt2$ then (5.104) follows from the factorization $x^2-2y^2 = (x-y\sqrt2)(x+y\sqrt2)$: if $x$ and $y$ are integers then

$$1 \le |x^2-2y^2| = |x-y\sqrt2|\cdot|x+y\sqrt2| = |x\alpha-y|\,\sqrt2\,|x+y\sqrt2|,$$

and we choose $x = k$, $y$ = the nearest integer to $k\alpha$; this explains why in the special case $\alpha = 1/\sqrt2$ the choice $c_0 = 1/4$ in (5.104) works.
Inequality (5.104) has an important geometric interpretation:

every axes-parallel rectangle of area $c_1(\alpha) > 0$ contains at most one element of the rotated copy $\mathrm{rot}_\alpha\mathbb{Z}^2$ of $\mathbb{Z}^2$, (5.105)

where $c_1 = c_1(\alpha) > 0$ is another positive constant depending only on $\alpha$. For example, if $\alpha = \sqrt2/2$, then by Lemma 5.5, $c_1 = 1/5$ is a good choice in (5.105). The following statement is just a slight generalization of Theorem 5.11.

Proposition 5.12. Let $\mathcal{P}$ be a finite set of points in the square $[0,M]^2$ with density $\delta$, i.e., the number of elements of $\mathcal{P}$ is $|\mathcal{P}| = \delta\cdot M^2$. We study the number of elements of $\mathcal{P}$ in the translated copies of the hyperbolic needle $H_\gamma(N)$. Assume that $\mathcal{P}$ satisfies the following “rectangle property”: there is a positive constant $c_1 = c_1(\mathcal{P}) > 0$ such that every axes-parallel rectangle of area $c_1$ contains at most one element of the set $\mathcal{P}$. Furthermore, assume that both $N$ and $M/N$ are “large” in the precise sense of (5.107). Then there is a translated copy $\mathbf{x}_1 + H_\gamma(N)$ of the hyperbolic needle $H_\gamma(N)$ such that $\mathbf{x}_1 + H_\gamma(N)\subset[0,M]^2$ and

$$\left|\mathcal{P}\cap\left(\mathbf{x}_1 + H_\gamma(N)\right)\right| \ge \delta\cdot2\gamma\log N + \delta'\log N, \tag{5.106$'$}$$

where $\delta' = \delta'(c_1,\gamma,\delta) > 0$ is a positive constant, independent of $N$ and $M$, to be defined below. Similarly, there is a translated copy $\mathbf{x}_2 + H_\gamma(N)$ of the hyperbolic needle $H_\gamma(N)$ such that $\mathbf{x}_2 + H_\gamma(N)\subset[0,M]^2$ and

$$\left|\mathcal{P}\cap\left(\mathbf{x}_2 + H_\gamma(N)\right)\right| \le \delta\cdot2\gamma\log N - \delta'\log N \tag{5.106$''$}$$
with the same ı 0 D ı 0 .c1 ; ; ı/ > 0 as in (5.1060); namely, 0
0
ı D ı .c1 ; ; ı/ D 10
12
107 c1 107 c12 p ı min : ; c1 ; ; 20 2 2
Finally, the assumption that both N and M=N are “large” goes as follows: N 2
10 C 1
;
1 N < 2n N; 2
C 1 .N C 2 / o: M > 1011 n p 7 107 c 2 ı c1 min 20 ; c1 ; 10 2 c1 ; 2 1
(5.107)
Remarks. The term $\delta\cdot2\gamma\log N$ in (5.106$'$) and (5.106$''$) represents the “expectation,” since the set $\mathcal{P}$ has density $\delta$ and the hyperbolic needle $H_\gamma(N)$ has area $2\gamma\log N$. The extra terms $\pm\delta'\log N$ mean that the deviation from the expectation is proportional to the expectation, justifying the name “extra large deviation.”

The constant factors $10^{-12}$ and $10^{11}$ are certainly very far from the best possible. Since the proof is complicated, my primary goal is to present the basic ideas in the simplest form; I don’t care too much about optimizing the constant factors.

Proof of Proposition 5.12. Let $f(\mathbf{x})$ denote the point-counting function

$$f(\mathbf{x}) = \left|\mathcal{P}\cap\left(\mathbf{x} + H_\gamma(N)\right)\right|. \tag{5.108}$$

If $\mathbf{x}\in[0,M-N]\times[\gamma,M-\gamma]$ then clearly

$$\mathbf{x} + H_\gamma(N)\subset[0,M]^2. \tag{5.109}$$

This explains why we choose the rectangle $[0,M-N]\times[\gamma,M-\gamma]$ to be our underlying domain in the proof. Let

$$\Delta(\mathbf{x}) = f(\mathbf{x}) - \delta\cdot\mathrm{area}(H_\gamma(N)) = f(\mathbf{x}) - \delta\cdot2\gamma\log N \tag{5.110}$$

denote the discrepancy function; $\Delta(\mathbf{x})$ deserves its name if (5.109) holds. In order to show that $\Delta(\mathbf{x}) > \delta'\log N > 0$ holds for some $\mathbf{x} = \mathbf{x}_1$, we apply the “test function method,” initiated by Roth [Ro]. The basic idea of the proof is to construct a positive test function $T(\mathbf{x}) > 0$ such that

$$\frac{1}{(M-N)(M-2\gamma)}\int_0^{M-N}\!\!\int_\gamma^{M-\gamma}\Delta(\mathbf{x})T(\mathbf{x})\,d\mathbf{x} > \mathrm{const}\cdot\log N > 0, \tag{5.111}$$

and

$$\frac{1}{(M-N)(M-2\gamma)}\int_0^{M-N}\!\!\int_\gamma^{M-\gamma}T(\mathbf{x})\,d\mathbf{x} < \mathrm{const}. \tag{5.112}$$

Combining (5.111) and (5.112) with the general (trivial) inequality

$$\int\Delta(\mathbf{x})T(\mathbf{x})\,d\mathbf{x} \le \max_{\mathbf{x}}\Delta(\mathbf{x})\int T(\mathbf{x})\,d\mathbf{x},$$

which holds for any positive function $T(\mathbf{x}) > 0$, we obtain what we want:

$$\max_{\mathbf{x}}\Delta(\mathbf{x}) > \mathrm{const}\cdot\log N \tag{5.113}$$

with some positive constant $\mathrm{const} > 0$.
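Similarly, to verify the other direction $\Delta(\mathbf{x}) < -\delta'\log N < 0$ for some $\mathbf{x} = \mathbf{x}_2$, we construct a positive test function $T(\mathbf{x}) > 0$ such that

$$\frac{1}{(M-N)(M-2\gamma)}\int_0^{M-N}\!\!\int_\gamma^{M-\gamma}\Delta(\mathbf{x})T(\mathbf{x})\,d\mathbf{x} < -\mathrm{const}\cdot\log N < 0, \tag{5.114}$$

and again

$$\frac{1}{(M-N)(M-2\gamma)}\int_0^{M-N}\!\!\int_\gamma^{M-\gamma}T(\mathbf{x})\,d\mathbf{x} < \mathrm{const}. \tag{5.115}$$

Clearly (5.114) and (5.115) imply the other direction

$$\min_{\mathbf{x}}\Delta(\mathbf{x}) < -\mathrm{const}\cdot\log N < 0$$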
with some positive constant $\mathrm{const} > 0$.

Let’s return to (5.111) and (5.112). We will express the test function $T(\mathbf{x})$ in terms of “modified Rademacher functions” (sometimes called Haar wavelets); this is another idea that we borrow from Roth’s pioneering paper [Ro]. The benefit of working with modified Rademacher functions is that we have orthogonality, and what is more, we have “super-orthogonality,” see the key property below. Note that Roth himself simply took the sum of certain “modified Rademacher functions” (and applied the Cauchy–Schwarz inequality, instead of (5.113); for his purposes orthogonality was sufficient). It was Halász’s innovation to express $T(\mathbf{x})$ as a Riesz product of modified Rademacher functions, see Halász [Ha]. The main point is that the Riesz product takes advantage of the “super-orthogonality.” (Halász used this method, among many other things, to give an elegant new proof of Schmidt’s well-known discrepancy theorem, see [Schm].) Here we develop an adaptation of the Roth–Halász method for hyperbolic regions.

Following the Roth–Halász method, we will express the test function $T(\mathbf{x})$ in the form of a Riesz product of “modified Rademacher functions”

$$T(\mathbf{x}) = \prod_{j\in J}\left(1 + \rho R_j(\mathbf{x})\right), \tag{5.116}$$

where $0 < \rho < 1$ is an appropriate constant to be specified later, and $R_j(\mathbf{x})$, $j\in J$, are certain modified Rademacher functions to be defined below ($J$ is some appropriate index-set). We assume that the test function $T(\mathbf{x})$ is zero outside of the rectangle $[0,M-N]\times[\gamma,M-\gamma]$.

Suppose that $10^{-2} > \eta_1 > 0$ and $10^{-2} > \eta_2 > 0$ are (small) positive real numbers (to be specified later) such that

$$\frac{M-N}{\eta_1} = \frac{M-2\gamma}{\eta_2} = 2^m, \quad m \ge 1 \text{ is an integer}. \tag{5.117}$$
Let $j$ be an arbitrary integer in the interval $0 \le j \le n$ where $2^n \le N$, that is, $n = \log N/\log 2 + O(1)$ (binary logarithm): we decompose the rectangle $[0,M-N]\times[\gamma,M-\gamma]$ into $2^m\cdot2^m = 4^m$ disjoint translated copies of the small rectangle

$$[0,2^j\eta_1]\times[0,2^{-j}\eta_2]. \tag{5.118}$$

We call these congruent copies of the small rectangle (5.118) $j$-cells. For each one of the $4^m$ $j$-cells we independently choose one of the following three patterns: $-+$, $+-$, and $0$, see Fig. 5.3. □

Fig. 5.3

As Fig. 5.3 shows, the pattern $-+$ actually means a two-dimensional pattern as follows: we divide the $j$-cell into four congruent subrectangles, and define a step function on the $j$-cell, which is $+1$ on the upper right, $-1$ on the upper left, $-1$ on the lower right, and $+1$ on the lower left subrectangle. Similarly, the pattern $+-$ means the step function with $-1$ on the upper right, $+1$ on the upper left, $+1$ on the lower right, and $-1$ on the lower left subrectangle. Finally, pattern $0$ means that the step function is zero on the whole $j$-cell. In the rest we refer to these two-dimensional patterns simply as $0$, $-+$, and $+-$ (representing the top rows in Fig. 5.3).

By making an independent choice of $-+$, $+-$, and $0$ for each $j$-cell, we obtain a particular modified Rademacher function $R_j(\mathbf{x})$ of order $j$, defined over the whole rectangle $[0,M-N]\times[\gamma,M-\gamma]$. We define $R_j(\mathbf{x})$ to be $0$ outside of the rectangle $[0,M-N]\times[\gamma,M-\gamma]$. Since for each one of the $4^m$ $j$-cells there are three options (namely, $-+$, $+-$, and $0$), the total number of modified Rademacher functions $R_j(\mathbf{x})$ of order $j$ is $3^{4^m}$. Let $\mathcal{R}(j)$ denote the family of all $3^{4^m}$ modified Rademacher functions of order $j$. This means the notation $R_j(\mathbf{x})$ is somewhat ambiguous in the sense that it represents any element of the huge family $\mathcal{R}(j)$.

Super-orthogonality: Key Property of the modified Rademacher functions. If $k \ge 1$ and $0 \le j_1 < \ldots < j_k \le n$, then any product $R_{j_1}(\mathbf{x})\cdots R_{j_k}(\mathbf{x})$ of $k$ modified Rademacher functions has the property that, in every elementary cell of size $2^{j_1}\eta_1\times2^{-j_k}\eta_2$, the product $R_{j_1}(\mathbf{x})\cdots R_{j_k}(\mathbf{x})$ equals one of the three familiar patterns: $0$, $-+$, and $+-$ (meaning the two-dimensional patterns).
Note that an elementary cell of size $2^{j_1}\eta_1\times2^{-j_k}\eta_2$ arises as a nonempty intersection of a $j_1$-cell and a $j_k$-cell (where $j_1 < j_k$). The proof of the “key property” is almost trivial: it is based on the fact that, for any $k \ge 2$, the intersection of any $k$ cells of different orders $j_1 < \ldots < j_k$ is either empty or equals the intersection of the $j_1$-cell and the $j_k$-cell (i.e., the intersection of the first and the last). We emphasize that in each one of the three patterns the integral of the corresponding step function is zero.

Since every modified Rademacher function $R_j(\mathbf{x})$ has values $\pm1$ or $0$, and $0 < \rho < 1$, it is clear that the Riesz product (5.116) defines a positive test function $T(\mathbf{x})$. The index-set $J$, a subset of $\{0,1,2,\ldots,n\}$, will be specified later. Note in advance that $J$ is a “large” subset of $\{0,1,2,\ldots,n\}$ in the sense that $|J| \ge \mathrm{const}\cdot(n+1)$.

Next we check the second requirement (5.112) of the test function:

$$\frac{1}{(M-N)(M-2\gamma)}\int_0^{M-N}\!\!\int_\gamma^{M-\gamma}T(\mathbf{x})\,d\mathbf{x} = O(1). \tag{5.119}$$
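Multiplying out the Riesz product (5.116), we have

$$T(\mathbf{x}) = \prod_{j\in J}\left(1 + \rho R_j(\mathbf{x})\right) = 1 + \rho\sum_{j\in J}R_j(\mathbf{x}) + \rho^2\!\!\sum_{\substack{j_1<j_2:\\ j_i\in J}}\!\!R_{j_1}(\mathbf{x})R_{j_2}(\mathbf{x}) + \rho^3\!\!\sum_{\substack{j_1<j_2<j_3:\\ j_i\in J}}\!\!R_{j_1}(\mathbf{x})R_{j_2}(\mathbf{x})R_{j_3}(\mathbf{x}) + \cdots \tag{5.120}$$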
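Using (5.120) in (5.119), we have

$$\frac{1}{(M-N)(M-2\gamma)}\int_0^{M-N}\!\!\int_\gamma^{M-\gamma}T(\mathbf{x})\,d\mathbf{x} = 1 + \sum_{k\ge1}\frac{\rho^k}{(M-N)(M-2\gamma)}\sum_{\substack{j_1<\ldots<j_k:\\ j_i\in J}}\int_0^{M-N}\!\!\int_\gamma^{M-\gamma}R_{j_1}(\mathbf{x})\cdots R_{j_k}(\mathbf{x})\,d\mathbf{x} = 1. \tag{5.121}$$

The zero integrals in the last step come from the super-orthogonality of the modified Rademacher functions: in each one of the three patterns the integral is zero.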
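The normalization (5.121) can be observed on a small discretized model. The sketch below is illustrative only: it samples modified Rademacher functions on a grid whose boxes are half a $0$-cell wide and half an $n$-cell high, so each function is constant on every grid box and grid averages coincide with normalized integrals. The parameter values $m = n = 3$, the weight $0.5$ for the Riesz product constant, the use of all orders $0,\ldots,n$ as the index set, and the random choice of patterns are assumptions made here; the proof itself chooses the patterns carefully rather than at random.

```python
import random

def modified_rademacher(j, m, n, rng):
    """Random order-j modified Rademacher function sampled on a grid.

    The grid has 2**(m+1) columns and 2**(m+n+1) rows; grid[r][c] is the
    value of the function on that grid box (row index grows upward).
    """
    cols, rows = 2 ** (m + 1), 2 ** (m + n + 1)
    cell_w, cell_h = 2 ** (j + 1), 2 ** (n - j + 1)    # j-cell size in grid boxes
    grid = [[0] * cols for _ in range(rows)]
    for bottom in range(0, rows, cell_h):
        for left in range(0, cols, cell_w):
            s = rng.choice((1, -1, 0))                 # pattern -+, +- or 0
            for r in range(cell_h):
                for c in range(cell_w):
                    upper = r >= cell_h // 2
                    right = c >= cell_w // 2
                    # value s on upper-right and lower-left, -s elsewhere
                    grid[bottom + r][left + c] = s if upper == right else -s
    return grid

def riesz_product(functions, weight):
    """Pointwise product of (1 + weight * R_j) over the given functions."""
    rows, cols = len(functions[0]), len(functions[0][0])
    T = [[1.0] * cols for _ in range(rows)]
    for R in functions:
        for r in range(rows):
            for c in range(cols):
                T[r][c] *= 1.0 + weight * R[r][c]
    return T

rng = random.Random(0)
m, n, weight = 3, 3, 0.5
functions = [modified_rademacher(j, m, n, rng) for j in range(n + 1)]
T = riesz_product(functions, weight)
values = [v for row in T for v in row]
mean_T = sum(values) / len(values)
print(mean_T, min(values))
```

Whatever patterns the random generator picks, the grid average of $T$ should be 1 and every value should be positive, mirroring (5.121) and the positivity of the Riesz product.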
It is obvious that (5.121) proves (5.119) with $O(1) = 1$.

Finally, we turn to requirement (5.111):

$$\frac{1}{(M-N)(M-2\gamma)}\int_0^{M-N}\!\!\int_\gamma^{M-\gamma}\Delta(\mathbf{x})T(\mathbf{x})\,d\mathbf{x} > \mathrm{const}\cdot\log N > 0 \tag{5.122}$$

holds with some positive constant $\mathrm{const} > 0$. The verification of (5.122) is by far the most difficult part of the proof. This is where we make the critical decision of how to choose an appropriate modified Rademacher function $R_j(\mathbf{x})$ from the huge family $\mathcal{R}(j)$ of size $3^{4^m}$. We choose the “best” $R_j(\mathbf{x})\in\mathcal{R}(j)$ in order to “synchronize the trivial errors.” (If we don’t synchronize the trivial errors, then they might cancel out and we cannot guarantee extra large deviation.) The synchronization argument is the heart of the proof.
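5.5.1 What are the Trivial Errors and How to Synchronize Them

By (5.108) and (5.110), the discrepancy function equals

$$\Delta(\mathbf{x}) = \left|\mathcal{P}\cap\left(\mathbf{x} + H_\gamma(N)\right)\right| - \delta\cdot\mathrm{area}(H_\gamma(N)), \tag{5.123}$$

and so we can write

$$\int_0^{M-N}\!\!\int_\gamma^{M-\gamma}\Delta(\mathbf{x})T(\mathbf{x})\,d\mathbf{x} = \int_0^{M-N}\!\!\int_\gamma^{M-\gamma}\Biggl(\sum_{P_i\in\mathcal{P}\cap(\mathbf{x}+H_\gamma(N))}1 \;-\; \delta\cdot\mathrm{area}(H_\gamma(N))\Biggr)T(\mathbf{x})\,d\mathbf{x}$$

$$= \int_0^{M-N}\!\!\int_\gamma^{M-\gamma}\Biggl(\sum_{P_i\in\mathcal{P}\cap(\mathbf{x}+H_\gamma(N))}1\Biggr)T(\mathbf{x})\,d\mathbf{x} \;-\; \delta\cdot\mathrm{area}(H_\gamma(N))\,(M-N)(M-2\gamma), \tag{5.124}$$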
where in the last step we used (5.121), and $P_1$, $P_2$, $P_3, \ldots$ denote the elements of the given point set $\mathcal{P}$. Next we change the order of summation and integration:

$$\int_0^{M-N}\!\!\int_\gamma^{M-\gamma}\Biggl(\sum_{P_i\in\mathcal{P}\cap(\mathbf{x}+H_\gamma(N))}1\Biggr)T(\mathbf{x})\,d\mathbf{x} = \sum_{P_i\in\mathcal{P}}\int_{P_i-H_\gamma(N)}T(\mathbf{x})\,d\mathbf{x}, \tag{5.125}$$

where $P_i - H_\gamma(N)$ denotes the reflected and translated copy of the hyperbolic needle $H_\gamma(N)$:

$$P_i - H_\gamma(N) = \{P_i - \mathbf{w} :\ \mathbf{w}\in H_\gamma(N)\}. \tag{5.126}$$
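Combining (5.124) and (5.125), we have

$$\frac{1}{(M-N)(M-2\gamma)}\int_0^{M-N}\!\!\int_\gamma^{M-\gamma}\Delta(\mathbf{x})T(\mathbf{x})\,d\mathbf{x} = \sum_{P_i\in\mathcal{P}}\frac{1}{(M-N)(M-2\gamma)}\int_{P_i-H_\gamma(N)}T(\mathbf{x})\,d\mathbf{x} \;-\; \delta\cdot\mathrm{area}(H_\gamma(N)). \tag{5.127}$$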
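To evaluate (5.127), we multiply out the Riesz product [see (5.120)]:

$$T(\mathbf{x}) = \prod_{j\in J}\left(1 + \rho R_j(\mathbf{x})\right) = 1 + \rho\sum_{j\in J}R_j(\mathbf{x}) + \rho^2\!\!\sum_{\substack{j_1<j_2:\\ j_i\in J}}\!\!R_{j_1}(\mathbf{x})R_{j_2}(\mathbf{x}) + \rho^3\!\!\sum_{\substack{j_1<j_2<j_3:\\ j_i\in J}}\!\!R_{j_1}(\mathbf{x})R_{j_2}(\mathbf{x})R_{j_3}(\mathbf{x}) + \cdots, \tag{5.128}$$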
that is, we have 1 plus the “linear part” plus the “quadratic part” plus the “cubic part” and so on. Note that “1” in fact means the characteristic function $\chi_B$ of the rectangle $B = [0,M-N]\times[\gamma,M-\gamma]$ (since by definition the modified Rademacher functions are all zero outside of $B$).

We begin with the contribution of $1 = \chi_B$ in (5.127):

$$\sum_{P_i\in\mathcal{P}}\frac{1}{(M-N)(M-2\gamma)}\int_{B\cap(P_i-H_\gamma(N))}1\,d\mathbf{x} = \sum_{P_i\in\mathcal{P}}\frac{1}{(M-N)(M-2\gamma)}\,\mathrm{area}\bigl(B\cap(P_i-H_\gamma(N))\bigr), \tag{5.129}$$

where $B = [0,M-N]\times[\gamma,M-\gamma]$.
5.5.2 Geometric Ideas

Next we study the contribution of the “linear part” [see (5.128)] in (5.127). Synchronization means that we want to make the sum

$$\sum_{P_i\in\mathcal{P}}\int_{P_i-H_\gamma(N)}R_j(\mathbf{x})\,d\mathbf{x} \tag{5.130}$$

“large positive” (for every $j\in J$, where the index-set $J\subset\{0,1,2,\ldots,n\}$ will be specified later). We decompose the underlying rectangle $B = [0,M-N]\times[\gamma,M-\gamma]$ into $j$-cells. Let $\mathcal{C}$ be an arbitrary $j$-cell; it has size $2^j\eta_1\times2^{-j}\eta_2$. Consider a single term in (5.130) and restrict it to the $j$-cell $\mathcal{C}$. The geometric meaning of the integral

$$\int_{\mathcal{C}\cap(P_i-H_\gamma(N))}R_j(\mathbf{x})\,d\mathbf{x} \tag{5.131}$$

plays a crucial role in the argument below; see Fig. 5.4 below.

Fig. 5.4
Since the j -cell is very small, the hyperbola arc Pi H .N / can be approximated by its tangent line locally—this explains the tilted straight line segment in Fig. 5.4. The arrows indicate the inside of the hyperbolic needle (i.e., the arc on the picture is the upper arc of the needle). The value of integral (5.131) heavily depends on which one of the three patterns happens to show up in the restriction of Rj .x/ to the j -cell C: the C pattern and the C pattern give two integrals where the values are negatives of each other, and of course the 0 pattern gives zero integral. How to choose the right pattern (C or C or 0) in an arbitrary j -cell C? Well, for a fixed point the choice is trivial: for every fixed point Pi 2 P, exactly one of the two patterns C and C will make the integral (5.131) positive (since the sum of the two integrals is zero). The problem is that we are dealing with a large sum
298
5 Pell’s Equation, Superirregularity and Randomness
XZ Pi 2P
C\.Pi H .N //
Rj .x/ d x
(5.132)
instead of a single term (5.131), and we have to make (5.132) positive. The difficulty is that different points may prefer different patterns, say, for Pi1 the pattern C will make the integral (5.131) positive, and for another point Pi2 the pattern C will make the integral (5.131) positive. To overcome this difficulty, we will apply the Single Dominant Term Rule, which means the following. If the sum (5.132) is “dominated” by a single term (5.131), then by an appropriate choice between the patterns C and C, we can always make this dominant term positive, and show that the contribution of the rest of the terms in (5.132) is relatively negligible. If there is no dominant term in (5.132), then we choose the 0 pattern. Of course, we have to precisely define what “domination” means. The success of the Single Dominant Term Rule is based on the fact that “single term domination” is quite typical: it happens very often among the 4m j -cells. What is a “single term domination” in (5.132)? To explain this, we have to talk about slopes. The slope of the diagonal of a j -cell is 4j 2 =1 4j ; since 1 and 2 are almost equal (we don’t distinguish between positive and negative slopes). Since the hyperbola is a smooth curve, the intersection of a (translated and reflected) hyperbolic needle Pi H .N / with the j -cell C is “almost” like the intersection of C with a half plane or the intersection of C with two “nearly parallel” half planes. Since half planes have well-defined constant slopes, as an intuitive oversimplification, I will use the terms “half plane” and “slope” for the intersections C \ .Pi H .N // (we don’t distinguish between positive and negative slopes). 
“Single term domination” occurs if there is exactly one half plane (meaning some $C \cap (P_i - H_\gamma(N))$) with slope close to $4^{-j}$, and $P_i - H_\gamma(N)$ intersects only one of the four subrectangles of $C$ (where the pattern is constant), namely, the lower right subrectangle, and this intersection is a “large triangle.” The intersection requirement “large triangle from the lower right subrectangle” guarantees that the integral (5.131) is “far from zero.” The integral (5.131) of this dominant term (“far from zero”) is called the trivial error. Note that the reflected hyperbolic needle $-H_\gamma(N)$ has two long arcs: the upper arc, which is increasing, and the lower arc, which is decreasing (the lower arc is under the upper arc). When we say “$P_i - H_\gamma(N)$ intersects $C$”, it always means that (at least) one of the two long arcs of $P_i - H_\gamma(N)$ intersects $C$. For example, in the trivial error mentioned above the intersection comes from the upper arc.
5.5.3 An Important Consequence of the “Rectangle Property”

As we said above, “single term domination” means that there is exactly one half plane (i.e., some $C \cap (P_i - H_\gamma(N))$) with slope very close to $4^{-j}$. It is important to point out that we cannot have two half planes with slopes very close to $4^{-j}$ such that both are upper arcs. Indeed, if $C \cap (P_{i_1} - H_\gamma(N))$ and $C \cap (P_{i_2} - H_\gamma(N))$ are both upper arcs with slopes very close to $4^{-j}$ (see Fig. 5.5), then the two points $P_{i_1}$ and $P_{i_2}$ have to be in the same axes-parallel rectangle of area $c_1$ (namely, in an axes-parallel rectangle where the slope of the diagonal is close to $4^{-j}$). But two points in the same axes-parallel rectangle of area $c_1$ is impossible: it contradicts the hypothesis of Proposition 5.12.
Fig. 5.5
What can happen, however, is that we have two half planes with slopes very close to $4^{-j}$ such that one is an upper arc and the other one is a lower arc. For example, it can happen that $C \cap (P_{i_1} - H_\gamma(N))$ is an upper arc and $C \cap (P_{i_2} - H_\gamma(N))$ is a lower arc with both slopes close to $4^{-j}$ (we don’t distinguish between positive and negative slopes). To overcome this difficulty, we switch to a $2 \times 2$ configuration of $j$-cells. More precisely, instead of working with a single $j$-cell $C$, we switch to a $2 \times 2$ configuration of four neighboring $j$-cells $C_1$, $C_2$, $C_3$, and $C_4$, where $C_1$ denotes the upper left, $C_2$ the upper right, $C_3$ the lower left, and $C_4$ the lower right member of the $2 \times 2$ configuration. The simple geometric idea is the following. Assume that the upper arc of $P_{i_1} - H_\gamma(N)$ intersects both $C_2$ and $C_3$ satisfying the requirement “large triangle from the lower right subrectangle (where the pattern is constant)”. Then obviously the lower arc of $P_{i_2} - H_\gamma(N)$ cannot intersect both of $C_2$ and $C_3$
(we assumed that the slopes are close to $4^{-j}$). Therefore, either $C_2$ or $C_3$ will be a $j$-cell with “single term domination.” That is, we can always save at least one of the four neighboring $j$-cells $C_1$, $C_2$, $C_3$, and $C_4$; see Fig. 5.6 (where $C_3$ has “single term domination”).
Fig. 5.6
5.5.4 Choosing a Short Vertical Translation

Next we explain how to satisfy the intersection requirement “large triangle from the lower right subrectangle (where the pattern is constant).” This is very important, since this requirement guarantees that the dominant integral (5.131) is “far from zero.” First we pick an arbitrary point $P_i \in \mathcal{P}$; then of course the hyperbolic needle $P_i - H_\gamma(N)$ has a “long” arc such that the slope is close to $4^{-j}$; “long” in fact means “length of roughly $2^j$.” Therefore, for each point $P_i \in \mathcal{P}$ there is a $j$-cell $C$ such that the intersection $C \cap (P_i - H_\gamma(N))$ has slope close to $4^{-j}$. But, unfortunately, nothing guarantees that $P_i - H_\gamma(N)$ intersects only one of the four subrectangles (where the pattern is constant). The solution is very simple: we apply a “short” vertical translation to the point set $\mathcal{P}$ (but of course the modified Rademacher functions and the test function $T(x)$ remain fixed in the rectangle $B = [0, M-N] \times [\gamma, M-\gamma]$). “Short” vertical translation means that the length of the vertical translation runs from 0 to 1. For a $j$-cell, already a translation of length from 0 to $2^{-j}\eta_2$ suffices: as the point $P_i$ is moving up vertically, the intersection $C \cap (P_i - H_\gamma(N))$ changes and has “good” positions where $P_i - H_\gamma(N)$ intersects only the lower right subrectangle (where the pattern is constant), and at the same time, this intersection is a “large triangle.” Since the slope is close to $4^{-j}$, a positive constant percentage of the translations is “good.” If we apply translations from 0 to 1, then it will work for all $j$.
It follows from a standard averaging argument that there is a vertical translation $0 < t_0 < 1$ (in fact, the majority will do) which is “good” for “many” pairs $(P_i, j)$ at the same time, where $P_i \in \mathcal{P}$ is a given point and $j \in \{0, 1, 2, \ldots, n\}$ is an order (of the modified Rademacher function). Here “many” means a positive constant percentage of all pairs. Of course, a vertical translation has a bad side effect: some points leave the underlying square $[0, M]^2$. But, luckily for us, it suffices to use “short” translations of length at most 1, which means that we just lose relatively few points, namely, those that are close to the border. The hypothesis of Proposition 5.12 (see the “rectangle property”) guarantees that there are just $O(M)$ points close to the border, which is negligible compared to the number $\delta M^2$ of the points in $\mathcal{P}$ (linear is negligible compared to quadratic; we assume that $\delta$ is fixed and $M$ is “large”).
5.5.5 Summarizing the Vague Geometric Intuition

A “typical” vertical translation of length $0 < t_0 < 1$ has the property that, for a positive constant percentage of the pairs $(j, C)$, where $j \in \{0, 1, 2, \ldots, n\}$ and $C$ is a $j$-cell, we have “single term domination,” implying (and here we skipped a lot of technical details!)
\[ \sum_{P_i \in \mathcal{P}} \int_{C \cap (P_i - H_\gamma(N))} R_j(x)\,dx \ \ge\ \frac{1}{2} \int_{C \cap (P_{i_0} - H_\gamma(N))} R_j(x)\,dx \ \ge\ \text{const} > 0, \tag{5.133} \]
where $P_{i_0}$ is the “dominating” point, i.e., the intersection $C \cap (P_{i_0} - H_\gamma(N))$ has slope close to $4^{-j}$, and this intersection is a “large triangle from the lower right subrectangle of $C$ (where the pattern is constant).” We will explain the missing details of (5.133) later, including an explicit value for “const.” The Single Term Domination Rule and (5.133) give
\[ \sum_{j \in J} \sum_{P_i \in \mathcal{P}} \frac{1}{(M-N)(M-2\gamma)} \int_{P_i - H_\gamma(N)} R_j(x)\,dx \ \ge\ \text{const} \cdot |J| \ \ge\ \text{const} \cdot (n+1) > 0. \tag{5.134} \]
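The geometric intuition requires that $j \in J$ satisfies an inequality like
\[ \max\{1, 1/\gamma\} \le 2^j \le \min\{N, N/\gamma\}. \tag{5.135} \]
To guarantee (5.135), we choose $J$ to be the interval
\[ J:\quad \frac{\log \max\{1, 1/\gamma\}}{\log 2} \ \le\ j \ \le\ \frac{\log N}{\log 2} - \frac{\log(\max\{1, \gamma\})}{\log 2}. \tag{5.136} \]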
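As a quick sanity check on how this index interval behaves, the following minimal sketch lists the integer orders $j$ allowed by (5.135), read here as $\max\{1, 1/\gamma\} \le 2^j \le \min\{N, N/\gamma\}$. The function name `order_interval` and the sample values of `N` and `gamma` are illustrative assumptions, not notation from the text.

```python
import math


def order_interval(N: float, gamma: float) -> range:
    """Integer orders j with max(1, 1/gamma) <= 2**j <= min(N, N/gamma).

    Hypothetical helper: it only restates (5.135)-(5.136) in base-2 logarithms.
    """
    lower = math.log2(max(1.0, 1.0 / gamma))
    upper = math.log2(N) - math.log2(max(1.0, gamma))
    return range(math.ceil(lower), math.floor(upper) + 1)


# Sample values (assumed for illustration only).
print(list(order_interval(1024, 0.5)))  # prints [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

For fixed $\gamma$ the number of admissible orders grows like $\log N$, which is the source of the factor $n + 1$ in (5.134).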
We emphasize that this was just an “intuitive” proof for (5.134). We will return to (5.133) and (5.134) later and show how to make the whole thing perfectly precise and explicit. This concludes Sect. 5.5. We complete the proof of Proposition 5.12 in the next three sections. Note that (5.134) is the most difficult part.
5.6 More on the Riesz Product

5.6.1 Applying Super-Orthogonality

Next we turn to the contribution of the “quadratic,” “cubic,” and even higher order parts of the Riesz product [see (5.128)] in (5.127). Let $k \ge 2$ and let $0 \le j_1 < \cdots < j_k \le n$ be $k$ orders written as an increasing sequence. Let $C$ be an elementary cell of size $2^{j_1}\eta_1 \times 2^{-j_k}\eta_2$: $C$ is the intersection of $k$ cells of orders $j_1 < \cdots < j_k$. Super-orthogonality yields that the product $R_{j_1}(x) \cdots R_{j_k}(x)$ (of $k$ modified Rademacher functions of given orders) restricted to $C$ equals one of the following three patterns: $+-$ or $-+$ or 0. Assume that the translated and reflected hyperbolic needle $P_i - H_\gamma(N)$ intersects $C$; let slope = slope$(C \cap (P_i - H_\gamma(N)))$ denote the slope of the intersection $C \cap (P_i - H_\gamma(N))$ (we don’t distinguish between positive and negative slopes). A simple geometric consideration shows that, roughly speaking, the integral
\[ \frac{1}{\text{area}(C)} \int_{C \cap (P_i - H_\gamma(N))} R_{j_1}(x) \cdots R_{j_k}(x)\,dx \]
is “negligible” unless the slope of the intersection $C \cap (P_i - H_\gamma(N))$ is “close” to $2^{-(j_1+j_k)}$ (= the slope of the diagonal of $C$, which has size $2^{j_1}\eta_1 \times 2^{-j_k}\eta_2$). The precise statement goes as follows:
\[ \frac{1}{\text{area}(C)} \left| \int_{C \cap (P_i - H_\gamma(N))} R_{j_1}(x) \cdots R_{j_k}(x)\,dx \right| \le \min\left\{ \frac{1}{\text{slope} \cdot 2^{j_1+j_k}},\ \text{slope} \cdot 2^{j_1+j_k} \right\}. \tag{5.137} \]
Note that (5.137) is a straightforward corollary of the geometry of the three possible patterns of $R_{j_1}(x) \cdots R_{j_k}(x)$ in $C$. The hyperbolic needle $H_\gamma(N)$ is bounded by the long curve $y = \gamma/x$ and its negative $y = -\gamma/x$ (where $1 \le x \le N$), and the slope is the derivative $(\gamma/x)' = -\gamma x^{-2}$. The number of elementary cells $C$ of size $2^{j_1}\eta_1 \times 2^{-j_k}\eta_2$ intersecting a fixed hyperbolic needle $P_i - H_\gamma(N)$ is estimated from above by the simple expression
\[ 2\left( \frac{N}{2^{j_1}\eta_1} + \frac{2\gamma}{2^{-j_k}\eta_2} \right). \tag{5.138} \]
Here the factor 2 comes from the two long boundary curves (hyperbolas), the first fraction comes from the pointed end of the hyperbolic needle, and the second fraction comes from the wide part of the needle. A more detailed explanation of (5.138) goes as follows. Let’s start with the pointed end of the hyperbolic needle $H_\gamma(N)$ (bounded by the curves $y = \gamma/x$ and $y = -\gamma/x$, where $1 \le x \le N$).

Case A: When $x$ runs in the interval $N \ge x \ge \sqrt{\gamma}\,2^{(j_1+j_k)/2}$, then the slope of the intersection $C \cap (P_i - H_\gamma(N))$ is $\gamma x^{-2}$, which is less than $2^{-(j_1+j_k)}$ (= the slope of the diagonal of $C$). Therefore, in this range $P_i - H_\gamma(N)$ intersects less than
\[ 2\,\frac{N}{2^{j_1}\eta_1} \]
elementary cells $C$ of size $2^{j_1}\eta_1 \times 2^{-j_k}\eta_2$.

Case B: When $x$ runs in the interval $\sqrt{\gamma}\,2^{(j_1+j_k)/2} \ge x \ge 1$, then the slope of the intersection $C \cap (P_i - H_\gamma(N))$ is larger than $2^{-(j_1+j_k)}$ (= the slope of the diagonal of $C$). Therefore, in this range $P_i - H_\gamma(N)$ intersects less than
\[ 2\,\frac{2\gamma}{2^{-j_k}\eta_2} \]
elementary cells $C$ (of size $2^{j_1}\eta_1 \times 2^{-j_k}\eta_2$).

In Case A we look at the hyperbola $xy = \gamma$ as $y = \gamma/x$ and in Case B we look at it as $x = \gamma/y$, i.e., in Case B we switch the role of the coordinate axes. Thus by (5.137) and (5.138) we have
\[ \left| \int_{P_i - H_\gamma(N)} R_{j_1}(x) \cdots R_{j_k}(x)\,dx \right| \le 2 \cdot \frac{2}{2^{j_1}\eta_1}\,\text{area}(C) \int_{x = \sqrt{\gamma}\,2^{(j_1+j_k)/2}}^{N} \frac{\gamma 2^{j_1+j_k}}{x^2}\,dx + 2 \cdot \frac{2}{2^{-j_k}\eta_2}\,\text{area}(C) \int_{y = \sqrt{\gamma}\,2^{-(j_1+j_k)/2}}^{\infty} \frac{\gamma 2^{-(j_1+j_k)}}{y^2}\,dy. \tag{5.139} \]
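Since $\text{area}(C) = 2^{j_1}\eta_1 \cdot 2^{-j_k}\eta_2$, using this fact in (5.139) we have
\[ \left| \int_{P_i - H_\gamma(N)} R_{j_1}(x) \cdots R_{j_k}(x)\,dx \right| \le 4\eta_2 2^{-j_k}\left( \sqrt{\gamma}\,2^{(j_1+j_k)/2} - \frac{\gamma 2^{j_1+j_k}}{N} \right) + 4\eta_1 2^{j_1}\,\sqrt{\gamma}\,2^{(j_1+j_k)/2}\,2^{-(j_1+j_k)} \le 4\sqrt{\gamma}\,(\eta_1 + \eta_2)\,2^{(j_1-j_k)/2}. \tag{5.140} \]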
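To see where the final bound in (5.140) comes from, the two elementary integrals behind it can be evaluated explicitly. The sketch below assumes the notation as read in this section ($\gamma$ for the needle parameter, $\eta_1, \eta_2$ for the cell-size constants), writes $s = j_1 + j_k$, takes the prefactors $4\eta_2 2^{-j_k}$ and $4\eta_1 2^{j_1}$ from (5.140), and integrates the second term up to infinity, which is an upper bound chosen here for simplicity.

```latex
\begin{align*}
\int_{\sqrt{\gamma}\,2^{s/2}}^{N} \frac{\gamma 2^{s}}{x^{2}}\,dx
  &= \gamma 2^{s}\left(\frac{1}{\sqrt{\gamma}\,2^{s/2}} - \frac{1}{N}\right)
   = \sqrt{\gamma}\,2^{s/2} - \frac{\gamma 2^{s}}{N}, \\
\int_{\sqrt{\gamma}\,2^{-s/2}}^{\infty} \frac{\gamma 2^{-s}}{y^{2}}\,dy
  &= \frac{\gamma 2^{-s}}{\sqrt{\gamma}\,2^{-s/2}}
   = \sqrt{\gamma}\,2^{-s/2}, \\
4\eta_2 2^{-j_k}\cdot\sqrt{\gamma}\,2^{s/2} + 4\eta_1 2^{j_1}\cdot\sqrt{\gamma}\,2^{-s/2}
  &= 4\sqrt{\gamma}\,(\eta_1+\eta_2)\,2^{(j_1-j_k)/2}.
\end{align*}
```

Dropping the nonnegative term $\gamma 2^{s}/N$ only weakens the first bound, which is why it does not appear in the final estimate.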
Let’s return now to (5.127)–(5.129). We recall the notation B D Œ0; M N Œ; M . We have 1 .M N /.M 2 /
Z
M N
Z
M
.x/T .x/ d x D 0
0
1 X area B \ .Pi H .N // A ı area.H .N // C D@ .M N /.M 2 / P 2P i
C
X X j 2J Pi 2P
C
X
X
k
1 .M N /.M 2 /
X
j1 <:::<jk W Pi 2P ji 2J
k2
Z
1 .M N /.M 2 /
Pi H .N /
Rj .x/ d x C
Z Pi H .N /
Rj1 .x/ Rjk .x/ d x: (5.141)
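By using (5.140) it is easy to estimate the last line in (5.141). By (5.140) we have
\[ \sum_{k \ge 2} \rho^k \sum_{\substack{j_1 < \cdots < j_k:\\ j_i \in J}} \sum_{P_i \in \mathcal{P}} \left| \frac{1}{(M-N)(M-2\gamma)} \int_{P_i - H_\gamma(N)} R_{j_1}(x) \cdots R_{j_k}(x)\,dx \right| \le \sum_{k \ge 2} \rho^k \sum_{0 \le j_1 < \cdots < j_k \le n} \sum_{P_i \in \mathcal{P}} \frac{1}{(M-N)(M-2\gamma)}\,4\sqrt{\gamma}\,(\eta_1 + \eta_2)\,2^{(j_1-j_k)/2}. \tag{5.142} \]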
For convenience, let us put $q = j_k - j_1$. We estimate the sum
\[ \sum_{k \ge 2} \rho^k \sum_{j_1 = 0}^{n-k+1} \sum_{q = k-1}^{n-j_1} \ \sum_{j_1 < j_2 < \cdots < j_{k-1} < j_1 + q} 2^{-q/2}. \tag{5.143} \]
In the innermost sum in (5.143), the indices $j_2, \ldots, j_{k-1}$ can be chosen, among the $q - 1$ numbers lying between $j_1$ and $j_1 + q$, in $\binom{q-1}{k-2}$ ways. To simplify (5.143), we can let the indices $j_1$ and $q$ run up to $n$. Then we change the order of summation. Thus we have
\[ \sum_{k \ge 2} \rho^k \sum_{j_1 = 0}^{n-k+1} \sum_{q = k-1}^{n-j_1} \ \sum_{j_1 < j_2 < \cdots < j_{k-1} < j_1 + q} 2^{-q/2} \le \sum_{k \ge 2} \rho^k \sum_{j_1 = 0}^{n} \sum_{q = k-1}^{n} \binom{q-1}{k-2} 2^{-q/2} = \sum_{j_1 = 0}^{n} \sum_{q = 1}^{n} 2^{-q/2} \sum_{k = 2}^{q+1} \rho^k \binom{q-1}{k-2}. \tag{5.144} \]
The innermost sum in (5.144) equals
\[ \sum_{k = 2}^{q+1} \rho^k \binom{q-1}{k-2} = \rho^2 \sum_{k = 2}^{q+1} \rho^{k-2} \binom{q-1}{k-2} = \rho^2 (1 + \rho)^{q-1}. \]
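Using this in (5.144), we have with $0 < \rho < \sqrt{2} - 1$
\[ \sum_{j_1 = 0}^{n} \sum_{q = 1}^{n} 2^{-q/2} \sum_{k = 2}^{q+1} \rho^k \binom{q-1}{k-2} = \sum_{j_1 = 0}^{n} \sum_{q = 1}^{n} 2^{-q/2} \rho^2 (1 + \rho)^{q-1} = (n+1) \frac{\rho^2}{\sqrt{2}} \sum_{q = 1}^{n} \left( \frac{1 + \rho}{\sqrt{2}} \right)^{q-1} \le (n+1) \frac{\rho^2}{\sqrt{2}} \sum_{q = 1}^{\infty} \left( \frac{1 + \rho}{\sqrt{2}} \right)^{q-1} = (n+1) \frac{\rho^2}{\sqrt{2}} \cdot \frac{1}{1 - \frac{1+\rho}{\sqrt{2}}} = (n+1) \frac{\rho^2}{\sqrt{2} - 1 - \rho}. \tag{5.145} \]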
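The binomial identity and the geometric-series bound used above are easy to confirm numerically. The following is a minimal sketch, assuming the Riesz-product parameter is a number `rho` with $0 < \rho < \sqrt{2} - 1$; the function names are illustrative and do not come from the text.

```python
from math import comb, isclose, sqrt


def inner_sum(rho: float, q: int) -> float:
    """Sum over k = 2..q+1 of rho**k * C(q-1, k-2), the innermost sum in (5.144)."""
    return sum(rho ** k * comb(q - 1, k - 2) for k in range(2, q + 2))


def closed_form(rho: float, q: int) -> float:
    """Binomial-theorem closed form rho**2 * (1 + rho)**(q - 1)."""
    return rho ** 2 * (1 + rho) ** (q - 1)


def tail_bound(rho: float, n: int) -> tuple[float, float]:
    """Return (sum over q = 1..n of 2**(-q/2) * closed form, rho**2 / (sqrt(2) - 1 - rho)).

    Both sides are the quantities of (5.145) divided by the common factor n + 1.
    """
    lhs = sum(2 ** (-q / 2) * closed_form(rho, q) for q in range(1, n + 1))
    rhs = rho ** 2 / (sqrt(2) - 1 - rho)
    return lhs, rhs


lhs, rhs = tail_bound(0.3, 50)
print(isclose(inner_sum(0.3, 7), closed_form(0.3, 7)), lhs <= rhs)
```

The bound degenerates as $\rho$ approaches $\sqrt{2} - 1$, where the ratio $(1 + \rho)/\sqrt{2}$ of the geometric series reaches 1; this is why the restriction on the parameter is needed.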
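Combining (5.142)–(5.145), we obtain

Lemma 5.13. If $0 < \rho < \sqrt{2} - 1$ then
\[ \sum_{k \ge 2} \rho^k \sum_{\substack{j_1 < \cdots < j_k:\\ j_i \in J}} \sum_{P_i \in \mathcal{P}} \left| \frac{1}{(M-N)(M-2\gamma)} \int_{P_i - H_\gamma(N)} R_{j_1}(x) \cdots R_{j_k}(x)\,dx \right| \le \frac{|\mathcal{P}|}{(M-N)(M-2\gamma)}\,4\sqrt{\gamma}\,(\eta_1 + \eta_2)\,(n+1)\,\frac{\rho^2}{\sqrt{2} - 1 - \rho}. \tag{5.146} \]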
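Now we return to (5.141). We show that the contribution of the second line in (5.141) is $o(1)$, i.e., it is negligible. We recall that $|\mathcal{P}| = \delta M^2$, and $P_i - H_\gamma(N) \subset B = [0, M-N] \times [\gamma, M-\gamma]$ for all but $O(M)$ points $P_i \in \mathcal{P}$. Thus we have
\[ \left( \sum_{P_i \in \mathcal{P}} \frac{\text{area}\big(B \cap (P_i - H_\gamma(N))\big)}{(M-N)(M-2\gamma)} \right) - \delta \cdot \text{area}(H_\gamma(N)) = \frac{\big(\delta M^2 + O(M)\big)\,\text{area}(H_\gamma(N))}{(M-N)(M-2\gamma)} - \delta \cdot \text{area}(H_\gamma(N)) = \frac{O(N \log N)}{M} = o(1). \tag{5.147} \]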
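The $O(N \log N)/M$ estimate can be unpacked as follows. This is only a sketch: it assumes the needle is $H_\gamma(N) = \{(x, y) : 1 \le x \le N,\ |y| \le \gamma/x\}$, as described in Sect. 5.6.1, and it treats $\delta$ and $\gamma$ as fixed.

```latex
\begin{align*}
\operatorname{area}(H_\gamma(N)) &= \int_1^N \frac{2\gamma}{x}\,dx = 2\gamma \log N, \\
\frac{M^2}{(M-N)(M-2\gamma)} - 1 &= \frac{M(N+2\gamma) - 2\gamma N}{(M-N)(M-2\gamma)}
  = O\!\left(\frac{N}{M}\right)
  \quad \text{if } N \le M/2 \text{ and } 2\gamma \le M/2, \\
\frac{O(M)\cdot\operatorname{area}(H_\gamma(N))}{(M-N)(M-2\gamma)} &= O\!\left(\frac{\log N}{M}\right).
\end{align*}
```

Multiplying the second line by $\delta \cdot \operatorname{area}(H_\gamma(N)) = 2\delta\gamma \log N$ gives a term of size $O(N \log N / M)$, and the third line is smaller still.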
Next we turn to the third line in (5.141). By (5.134) we have
\[ \sum_{j \in J} \sum_{P_i \in \mathcal{P}} \frac{1}{(M-N)(M-2\gamma)} \int_{P_i - H_\gamma(N)} R_j(x)\,dx \ \ge\ \text{const} \cdot (n+1) > 0, \tag{5.148} \]
where const stands for some as yet unspecified positive constant. Combining (5.141), (5.146)–(5.148), we obtain that the left-hand side of (5.141) satisfies
\[ \text{(left-hand side of (5.141))} \ \ge\ \text{const}_1 \cdot (n+1) - \frac{\rho^2}{\sqrt{2} - 1 - \rho}\,\text{const}_2 \cdot (n+1) - o(1), \tag{5.149} \]
where we have two positive constants, both denoted by const (the first one is unspecified yet), and $0 < \rho < \sqrt{2} - 1$. By choosing a sufficiently small $\rho$ in $0 < \rho < \sqrt{2} - 1$, we clearly have
\[ \text{(left-hand side of (5.141))} \ \ge\ \text{const} \cdot (n+1) > \text{const} \cdot \log N > 0, \tag{5.150} \]
proving (5.111), and thus proving Proposition 5.12 in the positive direction [see (5.1060)]. It remains to clarify the missing details in (5.133) and (5.134), see Summarizing the intuition at the end of Sect. 5.5.
5.6.2 Single Term Domination: Clarifying the Technical Details

The geometric ideas introduced in Sect. 5.5 lead to the following conclusion. At least half of the short vertical translations $\mathcal{P} + (0, t_0)$ (where $0 < t_0 < 1$) of the given point set $\mathcal{P}$ have the property that, for at least 1% of the pairs $(j, C)$, where $j \in \{0, 1, 2, \ldots, n\}$ and $C$ is a $j$-cell of the underlying rectangle $B = [0, M-N] \times [\gamma, M-\gamma]$, there is a “single term domination.” This property includes (among other requirements to be specified later) that there is a dominating point $P_{i_0} = P_{i_0}(j, C) \in \mathcal{P}$ such that

1. $C \cap (P_{i_0} - H_\gamma(N))$ has slope between $\frac{5}{6} 4^{-j}$ and $\frac{7}{6} 4^{-j}$;
2. $P_{i_0} - H_\gamma(N)$ intersects only the lower right subrectangle of $C$, and the intersection is a “large triangle”;
3. to be a “large triangle” means that the area is at least $\frac{1}{32}$ of the area of $C$, that is, the area is at least $\frac{1}{32}\eta_1\eta_2$.

Then, by choosing the appropriate pattern ($+-$ or $-+$) in the $j$-cell $C$, we have
\[ \int_{C \cap (P_{i_0} - H_\gamma(N))} R_j(x)\,dx \ \ge\ \frac{1}{32}\eta_1\eta_2. \tag{5.151} \]
To justify the notion single term domination, we show that for a “typical” pair $(j, C)$, the contribution of the rest of the points $P_i \in \mathcal{P}$, $i \ne i_0$, in the $j$-cell $C$ is “negligible”:
\[ \left| \sum_{P_i \in \mathcal{P}:\ i \ne i_0} \int_{C \cap (P_i - H_\gamma(N))} R_j(x)\,dx \right| \ \le\ \frac{1}{40}\eta_1\eta_2. \tag{5.152} \]
To prove (5.152), let $P_i \ne P_{i_0}$ be another point in $\mathcal{P}$ such that $P_i - H_\gamma(N)$ intersects $C$, i.e., the upper or lower arc of the boundary of the hyperbolic needle $P_i - H_\gamma(N)$ does intersect the $j$-cell $C$. We are going to distinguish four cases, depending on the type of the intersection of $P_i - H_\gamma(N)$ with $C$, i.e., upper or lower arc, and close to horizontal or close to vertical (relative to the diagonals of $C$). We begin with

Case 1: The upper arc of $P_i - H_\gamma(N)$ intersects $C$, and the slope is smaller than the slope of the “dominant needle” $P_{i_0} - H_\gamma(N)$ (see Fig. 5.7).
Fig. 5.7
Let $P_{i_0} = (a_{i_0}, b_{i_0})$, $P_i = (a_i, b_i)$ denote the coordinates of the two points, see Fig. 5.7. By the hypothesis of Case 1, $a_i > a_{i_0}$. Write $h = a_i - a_{i_0} > 0$ and $v = b_i - b_{i_0}$, where of course $h$ stands for horizontal and $v$ stands for vertical. The “rectangle property” guarantees that $h|v| \ge c_1 > 0$. Let $(A_1, A_2)$ denote the coordinates of the lower left corner of the $j$-cell $C$. The intersection of the vertical line $x = A_1$ with the upper arcs of $P_{i_0} - H_\gamma(N)$ and $P_i - H_\gamma(N)$ gives two points: the hypothesis of Case 1 implies that these intersection points are “close” to each other. More precisely, with $x = 1 + a_{i_0} - A_1$ (where $a_{i_0} - A_1 > 0$ and the additive term “1” comes from the fact that the hyperbolic needle $H_\gamma(N)$ begins at 1), we have the upper bound
\[ \left| \left( b_{i_0} + \frac{\gamma}{x} \right) - \left( b_i + \frac{\gamma}{x + h} \right) \right| < 2 \cdot 2^{-j}\eta_2. \tag{5.153} \]
Since $b_i - b_{i_0} = v$, we can rewrite (5.153) as follows:
\[ \left| \frac{\gamma}{x} - \frac{\gamma}{x + h} - v \right| = \left| \frac{\gamma h}{x(x + h)} - v \right| < 2^{-j+1}\eta_2. \tag{5.154} \]
On the other hand, we know that the slope of the upper arc of $C \cap (P_{i_0} - H_\gamma(N))$ satisfies the inequality
\[ \frac{5}{6} 4^{-j} \le \frac{\gamma}{x^2} \le \frac{7}{6} 4^{-j}. \tag{5.155} \]
We claim that if $\eta_1$ (and so $\eta_2$) is a small constant, then the upper arc of $P_{i_0} - H_\gamma(N)$ intersects a “large number” of $j$-cells (different from $C$) such that the slope is still almost equal to $4^{-j}$. Indeed, the horizontal size of $C$ is $2^j\eta_1$, and, assuming that (5.155) holds, the inequality
\[ \frac{5}{6} 4^{-j} \le \frac{\gamma}{(x + \ell 2^j\eta_1)^2} \le \frac{7}{6} 4^{-j} \]
has constant times $\eta_1^{-1}$ consecutive integer solutions in $\ell$. If $\eta_1 > 0$ is small then of course $\eta_1^{-1}$ is a “large number,” justifying our claim. Returning to (5.154) and (5.155), and using the substitution $x \to x + \ell 2^j\eta_1$, we have the inequalities
\[ \left| \frac{\gamma h}{(x + \ell 2^j\eta_1)(x + \ell 2^j\eta_1 + h)} - v \right| < 2^{-j+1}\eta_2 \tag{5.156} \]
and
\[ \frac{5}{6} 4^{-j} \le \frac{\gamma}{(x + \ell 2^j\eta_1)^2} \le \frac{7}{6} 4^{-j}. \tag{5.157} \]
If (5.155) holds, then there are at least $\frac{\sqrt{\gamma}}{10\eta_1}$ consecutive integer solutions $\ell$ of (5.157). The basic idea is the following: if $\ell$ runs through these integer solutions of (5.157), and of course $\gamma$, $x$, $h$, $v$ remain fixed, then the function (= function of $\ell$)
\[ \frac{\gamma h}{(x + \ell 2^j\eta_1)(x + \ell 2^j\eta_1 + h)} \tag{5.158} \]
has “substantially different” values, and we expect only a very few of them to be “very close” to a fixed $v$ in the quantitative sense of (5.156) (of course, here we assume that $\eta_2$ is “small”). Next we work out the details of this intuition. We begin with the following corollary of (5.157):
\[ \sqrt{\frac{6\gamma}{5}}\,2^j \ \ge\ x + \ell 2^j\eta_1 \ \ge\ \sqrt{\frac{6\gamma}{7}}\,2^j, \tag{5.159} \]
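and using this in (5.158), we have the good approximation
\[ \frac{\gamma h}{(x + \ell 2^j\eta_1)(x + \ell 2^j\eta_1 + h)} \approx \frac{\gamma h}{\sqrt{\gamma}\,2^j(\sqrt{\gamma}\,2^j + h)} = \frac{h}{2^j\left(2^j + \frac{h}{\sqrt{\gamma}}\right)}. \tag{5.160} \]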
Next we distinguish two cases. First assume that $0 < h \le \sqrt{c_1}\,2^{j-1}$, where $c_1 > 0$ is the positive constant in the “rectangle property.” Then the “rectangle property” yields that
\[ |v| \ge \frac{c_1}{h} \ge \frac{c_1}{\sqrt{c_1}\,2^{j-1}} = 2\sqrt{c_1}\,2^{-j}, \tag{5.161} \]
and also
\[ \frac{h}{2^j\left(2^j + \frac{h}{\sqrt{\gamma}}\right)} < \frac{h}{2^j 2^j} \le \frac{\sqrt{c_1}}{2}\,2^{-j}. \tag{5.162} \]
Assuming
\[ \eta_2 < \frac{\sqrt{c_1}}{2}, \tag{5.163} \]
(5.160)–(5.163) imply that (5.156) has no solution. We can assume, therefore, that the following lower bound holds for $h$:
\[ h > \sqrt{c_1}\,2^{j-1}. \tag{5.164} \]
Now we go back to the basic idea. We claim that if we switch $\ell$ to $\ell + 1$ in the function [see (5.158)]
\[ \frac{\gamma h}{(x + \ell 2^j\eta_1)(x + \ell 2^j\eta_1 + h)}, \tag{5.165} \]
then (5.165) changes at least as much as 1 q 2j 2 : 1 C c1
(5.166)
Indeed, by (5.159),
.x C
h 1 h p j p j : j j C `2 1 C h/ 2 C 2 1 2 C h
`2j 1 /.x
(5.167)
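We have the routine estimation
\[ \frac{1}{\sqrt{\gamma}\,2^j} - \frac{1}{\sqrt{\gamma}\,2^j + 2^j\eta_1} = \frac{1}{\sqrt{\gamma}\,2^j}\left( 1 - \frac{1}{1 + \frac{\eta_1}{\sqrt{\gamma}}} \right) = \frac{1}{\sqrt{\gamma}\,2^j}\left( \frac{\eta_1}{\sqrt{\gamma}} - \left(\frac{\eta_1}{\sqrt{\gamma}}\right)^2 + \left(\frac{\eta_1}{\sqrt{\gamma}}\right)^3 \mp \cdots \right) \approx \frac{\eta_1}{\gamma}\,2^{-j}. \tag{5.168} \]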
Furthermore, by (5.164),
\[ \frac{h}{\sqrt{\gamma}\,2^j + h} > \frac{h}{\sqrt{\frac{\gamma}{c_1}}\,2h + h} = \frac{1}{2\sqrt{\frac{\gamma}{c_1}} + 1}. \tag{5.169} \]
Combining (5.167)–(5.169), (5.166) follows. Let’s return to (5.165) and (5.166), and apply it in (5.156). We obtain that, among the $\text{const} \cdot \sqrt{\gamma}/\eta_1$ consecutive integer values of $\ell$ satisfying (5.157), only $\text{const} \cdot (1 + \sqrt{\gamma/c_1})$ will satisfy (5.156). More explicitly, it is safe to say that
\[ \text{at most } 10\left( 1 + \sqrt{\frac{\gamma}{c_1}} \right) \text{ values of } \ell \text{ will satisfy both (5.156) and (5.157).} \tag{5.170} \]
The next step is
5.6.3 A Combination of the Rectangle Property and the Pigeonhole Principle

We recall (5.164): $h > \sqrt{c_1}\,2^{j-1}$. We consider the following power-of-two type decomposition:
\[ 2^{r-1}\sqrt{c_1}\,2^j < h \le 2^r\sqrt{c_1}\,2^j, \qquad r = 0, 1, 2, \ldots \tag{5.171} \]
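We claim that, for a fixed point $P_{i_0} = (a_{i_0}, b_{i_0}) \in \mathcal{P}$ and for a fixed integer $r \ge 0$, there are at most
\[ 10\sqrt{\frac{\gamma}{c_1}}\,2^r \tag{5.172} \]
other points $P_i = (a_i, b_i) \in \mathcal{P}$ ($P_i \ne P_{i_0}$) such that $h = h_i = a_i - a_{i_0} > 0$ and $v = v_i = b_i - b_{i_0}$ satisfy (5.156) [and implicitly (5.157)] and (5.171). To prove (5.172), first note that if $h = h_i$ satisfies (5.171), then by (5.160) and (5.171),
\[ \frac{\gamma h}{(x + \ell 2^j\eta_1)(x + \ell 2^j\eta_1 + h)} \approx \frac{h}{2^j\left(2^j + \frac{h}{\sqrt{\gamma}}\right)} \approx \frac{2^r\sqrt{c_1}\,2^j}{2^j\left(2^j + \frac{1}{\sqrt{\gamma}}\,2^r\sqrt{c_1}\,2^j\right)} = \frac{2^{-j}}{\frac{1}{\sqrt{\gamma}} + \frac{1}{\sqrt{c_1}\,2^r}}, \]
so a solution of (5.156) gives the approximation
\[ v = v_i \approx 2^{-j}\left( \frac{1}{\frac{1}{\sqrt{\gamma}} + \frac{1}{2^r\sqrt{c_1}}} \pm 2\eta_2 \right). \tag{5.173} \]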
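Assuming
\[ \eta_2 < \frac{1}{8\left( \frac{1}{\sqrt{\gamma}} + \frac{1}{\sqrt{c_1}} \right)}, \tag{5.174} \]
(5.173) yields the good approximation
\[ v = v_i \approx 2^{-j}\,\frac{1}{\frac{1}{\sqrt{\gamma}} + \frac{1}{2^r\sqrt{c_1}}}. \tag{5.175} \]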
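Now suppose that (5.172) is not true. If there are more than
\[ 10\sqrt{\frac{\gamma}{c_1}}\,2^r \]
other points $P_i = (a_i, b_i) \in \mathcal{P}$ ($P_i \ne P_{i_0}$) such that $h = h_i = a_i - a_{i_0} > 0$ and $v = v_i = b_i - b_{i_0}$ satisfy (5.156) [and implicitly (5.157)] and (5.171), then by the Pigeonhole Principle and (5.175) there must exist two points $P_{i_1}, P_{i_2} \in \mathcal{P}$ ($i_1 \ne i_2$) such that
\[ v_{i_1} \approx 2^{-j}\,\frac{1}{\frac{1}{\sqrt{\gamma}} + \frac{1}{2^r\sqrt{c_1}}} \approx v_{i_2} \]
and
\[ |h_{i_1} - h_{i_2}| \le \frac{2^r\sqrt{c_1}\,2^j}{10\sqrt{\frac{\gamma}{c_1}}\,2^r} = \frac{c_1 2^j}{10\sqrt{\gamma}}. \]
Since the product
\[ 2^{-j}\,\frac{1}{\frac{1}{\sqrt{\gamma}} + \frac{1}{2^r\sqrt{c_1}}} \cdot \frac{c_1 2^j}{\sqrt{\gamma}} = \frac{c_1}{1 + \sqrt{\frac{\gamma}{c_1}}\,2^{-r}} \]
is less than $c_1$, we obtain that there exists an axes-parallel rectangle of area less than $c_1$ containing at least two points of $\mathcal{P}$ (namely, $P_{i_1}$ and $P_{i_2}$). This contradicts the rectangle property and proves (5.172).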
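The algebra behind the final product in the pigeonhole step can be checked numerically. The following is a minimal sketch with hypothetical helper names; it assumes the approximate value $2^{-j} / (1/\sqrt{\gamma} + 1/(2^r\sqrt{c_1}))$ for $v$ from (5.175) and the gap $c_1 2^j/\sqrt{\gamma}$ for $h$ (the harmless factor $1/10$ is omitted, as in the product above).

```python
from math import isclose, sqrt


def v_approx(j: int, r: int, gamma: float, c1: float) -> float:
    """Approximate common value of v_i, as read from (5.175)."""
    return 2.0 ** (-j) / (1 / sqrt(gamma) + 1 / (2 ** r * sqrt(c1)))


def h_gap(j: int, gamma: float, c1: float) -> float:
    """Horizontal gap c1 * 2**j / sqrt(gamma), without the factor 1/10."""
    return c1 * 2.0 ** j / sqrt(gamma)


def rectangle_area_bound(j: int, r: int, gamma: float, c1: float) -> float:
    """Product of the two quantities above; it should be independent of j."""
    return v_approx(j, r, gamma, c1) * h_gap(j, gamma, c1)


# Sample parameters (assumed for illustration only).
area = rectangle_area_bound(j=5, r=3, gamma=2.0, c1=0.5)
closed = 0.5 / (1 + sqrt(2.0 / 0.5) * 2.0 ** (-3))
print(isclose(area, closed), area < 0.5)
```

The product does not depend on $j$ and stays strictly below $c_1$ for every $r \ge 0$, which is exactly what the contradiction with the rectangle property needs.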
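If $h = h_i$ falls into the interval (5.171), then
\[ \text{slope of } C \cap (P_i - H_\gamma(N)) = \frac{\gamma}{(x + h)^2} \le \frac{\gamma}{h^2} \le \frac{4\gamma}{c_1 4^r}\,4^{-j}, \tag{5.176} \]
where $4^{-j}$ (almost) equals the slope of the diagonals of the $j$-cell $C$. By (5.176) we have
\[ \frac{1}{\text{area}(C)} \left| \int_{C \cap (P_i - H_\gamma(N))} R_j(x)\,dx \right| \le \frac{10\gamma}{c_1 4^r}. \tag{5.177} \]
What is more, (5.177) holds for all $j$-cells $C$ satisfying the property
\[ \frac{5}{6} 4^{-j} \le \text{slope of } C \cap (P_{i_0} - H_\gamma(N)) \le \frac{7}{6} 4^{-j}. \tag{5.178} \]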
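Let’s return now to (5.152). Combining (5.170)–(5.172) and (5.177) we have
\[ \sum_{\substack{P_i \in \mathcal{P}:\ i \ne i_0\\ \text{Case 1}}} \ \sum_{\substack{C:\ \text{all } j\text{-cells}\\ \text{satisfying (5.178)}}} \frac{1}{\text{area}(C)} \left| \int_{C \cap (P_i - H_\gamma(N))} R_j(x)\,dx \right| \le \sum_{r \ge 0} 10\left( 1 + \sqrt{\frac{\gamma}{c_1}} \right) \cdot 10\sqrt{\frac{\gamma}{c_1}}\,2^r \cdot \frac{10\gamma}{c_1 4^r} = 10^3\left( \left(\frac{\gamma}{c_1}\right)^{3/2} + \left(\frac{\gamma}{c_1}\right)^2 \right)\left( \sum_{r \ge 0} 2^{-r} \right) = 2 \cdot 10^3\left( \left(\frac{\gamma}{c_1}\right)^{3/2} + \left(\frac{\gamma}{c_1}\right)^2 \right). \tag{5.179} \]
Since there are at least $\frac{\sqrt{\gamma}}{10\eta_1}$ consecutive integer solutions $\ell$ of (5.157) (assuming (5.155) holds), we have
\[ \sum_{\substack{C:\ \text{all } j\text{-cells}\\ \text{satisfying (5.178)}}} 1 \ \ge\ \frac{\sqrt{\gamma}}{10\eta_1}. \tag{5.180} \]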
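The count of consecutive integer solutions $\ell$ used here can be illustrated by brute force. The following is a minimal sketch, assuming (5.157) reads $\frac{5}{6}4^{-j} \le \gamma/(x + \ell 2^j\eta_1)^2 \le \frac{7}{6}4^{-j}$; the function name `solutions`, the search window, and the sample parameters are illustrative assumptions.

```python
import math


def solutions(x: float, j: int, gamma: float, eta1: float, search: int = 1000) -> list[int]:
    """Integers l in [-search, search] with 5/6*4**-j <= gamma/(x + l*2**j*eta1)**2 <= 7/6*4**-j."""
    step = 2.0 ** j * eta1                      # horizontal size of a j-cell
    lo, hi = (5 / 6) * 4.0 ** (-j), (7 / 6) * 4.0 ** (-j)
    found = []
    for l in range(-search, search + 1):
        t = x + l * step
        # Only positive abscissas lie on the arc of the needle.
        if t > 0 and lo <= gamma / t ** 2 <= hi:
            found.append(l)
    return found


# Sample parameters (assumed): gamma = 1, eta1 = 0.01, j = 3, x = sqrt(gamma) * 2**j.
sols = solutions(x=8.0, j=3, gamma=1.0, eta1=0.01)
print(sols[0], sols[-1], len(sols))  # prints -7 9 17
```

For these sample values the solutions form one block of 17 consecutive integers, comfortably above the lower bound $\sqrt{\gamma}/(10\eta_1) = 10$.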
Let’s return to (5.152). As we said above, in order to prove (5.152), we distinguish four cases. Inequalities (5.179) and (5.180) complete Case 1. The remaining three cases will be discussed in the next section. Note that these cases are quite similar to Case 1, but there are some annoying differences in the minor details. We will be able to finish the proof of Proposition 1.19 in Sect. 5.8.
5.7 Completing the Case Study

5.7.1 Verifying (5.152)

Let’s return to (5.151) and (5.152). Again we assume that there is a dominating point $P_{i_0} = P_{i_0}(j, C) \in \mathcal{P}$ such that

1. $C \cap (P_{i_0} - H_\gamma(N))$ has slope between $\frac{5}{6} 4^{-j}$ and $\frac{7}{6} 4^{-j}$;
2. $P_{i_0} - H_\gamma(N)$ intersects only the lower right subrectangle of the $j$-cell $C$, and the intersection is a “large triangle”;
3. to be a “large triangle” means that the area is at least $\frac{1}{32}$ of the area of $C$, that is, the area is at least $\frac{1}{32}\eta_1\eta_2$.

Let $P_i \ne P_{i_0}$ be another point in $\mathcal{P}$ such that $P_i - H_\gamma(N)$ intersects $C$, i.e., the upper or lower arc of the boundary of the hyperbolic needle $P_i - H_\gamma(N)$ does intersect the $j$-cell $C$. Next we discuss the second case, which is similar to the first case: roughly speaking, we switch the roles of horizontal and vertical.

Case 2: The upper arc of $P_i - H_\gamma(N)$ intersects $C$, and the slope is larger than the slope of the “dominant needle” $P_{i_0} - H_\gamma(N)$ (see Fig. 5.8 below).

Let $P_{i_0} = (a_{i_0}, b_{i_0})$, $P_i = (a_i, b_i)$ denote the coordinates of the two points. By the hypothesis of Case 2, $a_{i_0} > a_i$. Write $h = a_{i_0} - a_i > 0$ and $v = b_{i_0} - b_i$, where again $h$ stands for horizontal and $v$ stands for vertical. The “rectangle property” guarantees that $h|v| \ge c_1 > 0$. Let $(A_1, A_2)$ denote the coordinates of the upper left corner of the $j$-cell $C$. The intersection of the horizontal line $y = A_2$ with the upper arcs of $P_{i_0} - H_\gamma(N)$ and $P_i - H_\gamma(N)$ gives two points: the hypothesis of Case 2 implies that these intersection points are “close” to each other in the following quantitative sense. Write $y = A_2 - b_{i_0}$; then we have the upper bound
\[ \left| \left( a_i - \frac{\gamma}{y + v} \right) - \left( a_{i_0} - \frac{\gamma}{y} \right) \right| < 2 \cdot 2^j\eta_1. \tag{5.181} \]
Since $a_{i_0} - a_i = h > 0$, we can rewrite (5.181) as follows:
\[ \left| \frac{\gamma}{y} - \frac{\gamma}{y + v} - h \right| = \left| \frac{\gamma v}{y(y + v)} - h \right| < 2 \cdot 2^j\eta_1. \tag{5.182} \]
We emphasize that $y + v > 0$. Indeed, otherwise $0 \ge y + v = (A_2 - b_{i_0}) + (b_{i_0} - b_i) = A_2 - b_i$,
implying $b_i \ge A_2$, which means that the whole upper arc of $P_i - H_\gamma(N)$ is above the $j$-cell $C$. But this is impossible, since in Case 2 we assumed that the upper arc of $P_i - H_\gamma(N)$ does intersect $C$.
Fig. 5.8
Since we switch the roles of horizontal and vertical, we focus on the reciprocal of the slope: we know that the reciprocal of the slope of the upper arc of $C \cap (P_{i_0} - H_\gamma(N))$ satisfies the inequality
\[ \frac{6}{7} 4^{j} \le \frac{\gamma}{y^2} \le \frac{6}{5} 4^{j}. \tag{5.183} \]
We claim that if $\eta_2$ (and so $\eta_1$) is a small constant, then the upper arc of $P_{i_0} - H_\gamma(N)$ intersects a “large number” of $j$-cells (different from $C$) such that the reciprocal of the slope is still almost equal to $4^{j}$. Indeed, the vertical size of $C$ is $2^{-j}\eta_2$, and, assuming that (5.183) holds, the inequality
\[ \frac{6}{7} 4^{j} \le \frac{\gamma}{(y + \ell 2^{-j}\eta_2)^2} \le \frac{6}{5} 4^{j} \]
has constant times $\eta_2^{-1}$ consecutive integer solutions in $\ell$. If $\eta_2 > 0$ is small then of course $\eta_2^{-1}$ is a “large number,” justifying our claim. Returning to (5.182) and (5.183), and using the substitution $y \to y + \ell 2^{-j}\eta_2$, we have the inequalities
\[ \left| \frac{\gamma v}{(y + \ell 2^{-j}\eta_2)(y + \ell 2^{-j}\eta_2 + v)} - h \right| < 2^{j+1}\eta_1 \tag{5.184} \]
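and
\[ \frac{6}{7} 4^{j} \le \frac{\gamma}{(y + \ell 2^{-j}\eta_2)^2} \le \frac{6}{5} 4^{j}. \tag{5.185} \]
If (5.183) holds, then there are at least $\frac{\sqrt{\gamma}}{10\eta_2}$ consecutive integer solutions $\ell$ of (5.185). The basic idea is the same as in Case 1: if $\ell$ runs through these integer solutions of (5.185), and of course $\gamma$, $y$, $h$, $v$ remain fixed, then the function (= function of $\ell$)
\[ \frac{\gamma v}{(y + \ell 2^{-j}\eta_2)(y + \ell 2^{-j}\eta_2 + v)} \tag{5.186} \]
has “substantially different” values, and we expect only a very few of them to be “close” to a fixed $h$ in the quantitative sense of (5.184) (of course, here we assume that $\eta_1$ is “small”). Next we work out the details of this intuition. We begin with the following corollary of (5.185):
\[ \sqrt{\frac{6\gamma}{7}}\,2^{-j} \ \le\ y + \ell 2^{-j}\eta_2 \ \le\ \sqrt{\frac{6\gamma}{5}}\,2^{-j}, \tag{5.187} \]
and using this in (5.186), we have the good approximation
\[ \frac{\gamma v}{(y + \ell 2^{-j}\eta_2)(y + \ell 2^{-j}\eta_2 + v)} \approx \frac{\gamma v}{\sqrt{\gamma}\,2^{-j}(\sqrt{\gamma}\,2^{-j} + v)} = \frac{v}{2^{-j}\left(2^{-j} + \frac{v}{\sqrt{\gamma}}\right)}. \tag{5.188} \]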
Next we distinguish three cases. First assume that $v$ is negative. Since $y + v > 0$, we have $y^{-1} < (y + v)^{-1}$, and so by (5.184) we have
\[ 2^{j+1}\eta_1 > |h| = h. \tag{5.189} \]
Combining (5.189) with the rectangle property, we obtain
\[ |v| \ge \frac{c_1}{h} > \frac{c_1}{2\eta_1}\,2^{-j}, \tag{5.190} \]
and using (5.190) in (5.188), we have
2j
v 2j C
D
p1
pv
D
2j
jvj pv
2j
2j 2j p > 1 D 2j ; 1 p jvj2j
D
(5.191)
assuming c1 1 < p : 2
(5.192)
Combining (5.184) and (5.188)–(5.191), we conclude 2j C11 > h >
1 p j 2 2j C1 1 ; 2
which is an obvious contradiction if p 1 < : 8
(5.193)
This proves that $v > 0$.

Next assume that $0 < v \le \sqrt{c_1}\,2^{-j-1}$, where $c_1 > 0$ is the positive constant in the rectangle property. The rectangle property yields that
\[ h \ge \frac{c_1}{v} \ge \frac{c_1}{\sqrt{c_1}\,2^{-j-1}} = 2\sqrt{c_1}\,2^{j}, \tag{5.194} \]
and also
\[ \frac{v}{2^{-j}\left(2^{-j} + \frac{v}{\sqrt{\gamma}}\right)} < \frac{v}{2^{-j} 2^{-j}} \le \frac{\sqrt{c_1}}{2}\,2^{j}. \tag{5.195} \]
Assuming
\[ \eta_1 < \frac{\sqrt{c_1}}{2}, \tag{5.196} \]
(5.188), (5.194)–(5.196) imply that (5.184) has no solution. We can assume, therefore, that the following lower bound holds for $v$:
\[ v > \sqrt{c_1}\,2^{-j-1}. \tag{5.197} \]
Now we go back to the basic idea. We claim that if we switch $\ell$ to $\ell + 1$ in the function [see (5.186)]
\[ \frac{\gamma v}{(y + \ell 2^{-j}\eta_2)(y + \ell 2^{-j}\eta_2 + v)}, \tag{5.198} \]
then (5.198) changes at least as much as 1 q 2j 2 : 1 C c1
(5.199)
Indeed, by (5.187),
.y C
v v 1 p j p j : j C `2 2 C v/ 2 2 C v
(5.200)
`2j 2 /.y
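We have the routine estimation
\[ \frac{1}{\sqrt{\gamma}\,2^{-j}} - \frac{1}{\sqrt{\gamma}\,2^{-j} + 2^{-j}\eta_2} = \frac{1}{\sqrt{\gamma}\,2^{-j}}\left( 1 - \frac{1}{1 + \frac{\eta_2}{\sqrt{\gamma}}} \right) = \frac{1}{\sqrt{\gamma}\,2^{-j}}\left( \frac{\eta_2}{\sqrt{\gamma}} - \left(\frac{\eta_2}{\sqrt{\gamma}}\right)^2 + \left(\frac{\eta_2}{\sqrt{\gamma}}\right)^3 \mp \cdots \right) \approx \frac{\eta_2}{\gamma}\,2^{j}. \tag{5.201} \]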
Furthermore, by (5.197),
\[ \frac{v}{\sqrt{\gamma}\,2^{-j} + v} > \frac{v}{\sqrt{\frac{\gamma}{c_1}}\,2v + v} = \frac{1}{2\sqrt{\frac{\gamma}{c_1}} + 1}. \tag{5.202} \]
Combining (5.200)–(5.202), (5.199) follows. Let’s return to (5.198) and (5.199) and apply it in (5.184). We obtain that, among the $\text{const} \cdot \sqrt{\gamma}/\eta_2$ consecutive integer values of $\ell$ satisfying (5.185), only $\text{const} \cdot (1 + \sqrt{\gamma/c_1})$ will satisfy (5.184). More explicitly, it is safe to say that
\[ \text{at most } 10\left( 1 + \sqrt{\frac{\gamma}{c_1}} \right) \text{ values of } \ell \text{ will satisfy both (5.184) and (5.185).} \tag{5.203} \]
Just like in Case 1, the next step is
5.7.2 A Combination of the Rectangle Property and the Pigeonhole Principle

We recall (5.197): $v > \sqrt{c_1}\,2^{-j-1}$. We consider the following power-of-two type decomposition:
\[ 2^{r-1}\sqrt{c_1}\,2^{-j} < v \le 2^r\sqrt{c_1}\,2^{-j}, \qquad r = 0, 1, 2, \ldots \tag{5.204} \]
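We claim that, for a fixed point $P_{i_0} = (a_{i_0}, b_{i_0}) \in \mathcal{P}$ and for a fixed integer $r \ge 0$, there are at most
\[ 10\sqrt{\frac{\gamma}{c_1}}\,2^r \tag{5.205} \]
other points $P_i = (a_i, b_i) \in \mathcal{P}$ ($P_i \ne P_{i_0}$) such that $h = h_i = a_{i_0} - a_i > 0$ and $v = v_i = b_{i_0} - b_i > 0$ satisfy (5.184) [and implicitly (5.185)] and (5.204). To prove (5.205), first note that if $v = v_i$ satisfies (5.204), then by (5.188) and (5.204),
\[ \frac{\gamma v}{(y + \ell 2^{-j}\eta_2)(y + \ell 2^{-j}\eta_2 + v)} \approx \frac{v}{2^{-j}\left(2^{-j} + \frac{v}{\sqrt{\gamma}}\right)} \approx \frac{2^r\sqrt{c_1}\,2^{-j}}{2^{-j}\left(2^{-j} + \frac{1}{\sqrt{\gamma}}\,2^r\sqrt{c_1}\,2^{-j}\right)} = \frac{2^{j}}{\frac{1}{\sqrt{\gamma}} + \frac{1}{\sqrt{c_1}\,2^r}}, \]
so a solution of (5.184) gives the approximation
\[ h = h_i \approx 2^{j}\left( \frac{1}{\frac{1}{\sqrt{\gamma}} + \frac{1}{2^r\sqrt{c_1}}} \pm 2\eta_1 \right). \tag{5.206} \]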
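Assuming
\[ \eta_1 < \frac{1}{8\left( \frac{1}{\sqrt{\gamma}} + \frac{1}{\sqrt{c_1}} \right)}, \tag{5.207} \]
(5.206) yields the good approximation
\[ h = h_i \approx 2^{j}\,\frac{1}{\frac{1}{\sqrt{\gamma}} + \frac{1}{2^r\sqrt{c_1}}}. \tag{5.208} \]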
Now suppose that (5.205) is not true. If there are more than
\[ 10\sqrt{\frac{\gamma}{c_1}}\,2^r \]
other points $P_i = (a_i, b_i) \in \mathcal{P}$ ($P_i \ne P_{i_0}$) such that $h = h_i = a_{i_0} - a_i > 0$ and $v = v_i = b_{i_0} - b_i > 0$ satisfy (5.184) [and implicitly (5.185)] and (5.204), then by the Pigeonhole Principle and (5.208) there must exist two points $P_{i_1}, P_{i_2} \in \mathcal{P}$ ($i_1 \ne i_2$) such that
\[ h_{i_1} \approx 2^{j}\,\frac{1}{\frac{1}{\sqrt{\gamma}} + \frac{1}{2^r\sqrt{c_1}}} \approx h_{i_2} \]
and
\[ |v_{i_1} - v_{i_2}| \le \frac{2^r\sqrt{c_1}\,2^{-j}}{10\sqrt{\frac{\gamma}{c_1}}\,2^r} = \frac{c_1 2^{-j}}{10\sqrt{\gamma}}. \]
Since the product
\[ 2^{j}\,\frac{1}{\frac{1}{\sqrt{\gamma}} + \frac{1}{2^r\sqrt{c_1}}} \cdot \frac{c_1 2^{-j}}{\sqrt{\gamma}} = \frac{c_1}{1 + \sqrt{\frac{\gamma}{c_1}}\,2^{-r}} \]
is less than $c_1$, we obtain that there exists an axes-parallel rectangle of area less than $c_1$ containing at least two points of $\mathcal{P}$ (namely, $P_{i_1}$ and $P_{i_2}$). This contradicts the rectangle property, and proves (5.205).

If $v = v_i$ falls into the interval (5.204), then
\[ \text{the reciprocal of the slope of } C \cap (P_i - H_\gamma(N)) = \frac{\gamma}{(y + v)^2} \le \frac{\gamma}{v^2} \le \frac{4\gamma}{c_1 4^r}\,4^{j}, \tag{5.209} \]
where $4^{j}$ (almost) equals the reciprocal of the slope of the diagonals of the $j$-cell $C$. By (5.209) we have
\[ \frac{1}{\text{area}(C)} \left| \int_{C \cap (P_i - H_\gamma(N))} R_j(x)\,dx \right| \le \frac{10\gamma}{c_1 4^r}. \tag{5.210} \]
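What is more, (5.210) holds for all $j$-cells $C$ satisfying (5.178):
\[ \frac{5}{6} 4^{-j} \le \text{slope of } C \cap (P_{i_0} - H_\gamma(N)) \le \frac{7}{6} 4^{-j}. \]
Let’s return now to (5.152). Combining (5.203)–(5.205) and (5.210) we have the perfect analog of (5.179):
\[ \sum_{\substack{P_i \in \mathcal{P}:\ i \ne i_0\\ \text{Case 2}}} \ \sum_{\substack{C:\ \text{all } j\text{-cells}\\ \text{satisfying (5.178)}}} \frac{1}{\text{area}(C)} \left| \int_{C \cap (P_i - H_\gamma(N))} R_j(x)\,dx \right| \le \sum_{r \ge 0} 10\left( 1 + \sqrt{\frac{\gamma}{c_1}} \right) \cdot 10\sqrt{\frac{\gamma}{c_1}}\,2^r \cdot \frac{10\gamma}{c_1 4^r} = 10^3\left( \left(\frac{\gamma}{c_1}\right)^{3/2} + \left(\frac{\gamma}{c_1}\right)^2 \right)\left( \sum_{r \ge 0} 2^{-r} \right) = 2 \cdot 10^3\left( \left(\frac{\gamma}{c_1}\right)^{3/2} + \left(\frac{\gamma}{c_1}\right)^2 \right). \tag{5.211} \]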
This completes Case 2.
Next comes

Case 3: The lower arc of $P_i - H_\gamma(N)$ intersects $C$, and the slope is smaller than the slope of the “dominant needle” $P_{i_0} - H_\gamma(N)$ (see Fig. 5.9). We emphasize that we don’t distinguish between positive and negative slopes.

Again let $P_{i_0} = (a_{i_0}, b_{i_0})$, $P_i = (a_i, b_i)$ denote the coordinates of the two points. By the hypothesis of Case 3, $a_i > a_{i_0}$. Write $h = h_i = a_i - a_{i_0} > 0$ and $v = v_i = b_i - b_{i_0}$, where of course $h$ stands for horizontal and $v$ stands for vertical. It is obvious from the geometry of Case 3 that $v = v_i = b_i - b_{i_0} > 0$. The rectangle property guarantees that $hv \ge c_1 > 0$. Let $(A_1, A_2)$ denote the coordinates of the lower left corner of the $j$-cell $C$. The intersection of the vertical line $x = A_1$ with the upper arc of $P_{i_0} - H_\gamma(N)$ and the lower arc of $P_i - H_\gamma(N)$ gives two points: the hypothesis of Case 3 implies that these intersection points are “close” to each other. More precisely, just like in Case 1 we write $x = 1 + a_{i_0} - A_1$; then we have the upper bound
\[ \left| \left( b_{i_0} + \frac{\gamma}{x} \right) - \left( b_i - \frac{\gamma}{x + h} \right) \right| < 2 \cdot 2^{-j}\eta_2. \tag{5.212} \]
Fig. 5.9
Since $b_i - b_{i_0} = v$, we can rewrite (5.212) as follows:
\[ \left| \frac{\gamma}{x} + \frac{\gamma}{x + h} - v \right| < 2^{-j+1}\eta_2. \tag{5.213} \]
Notice that (5.213) is an analog of (5.154) in Case 1: the only difference is that the “minus” is replaced by “plus.” This means that we can basically repeat the argument in Case 1; in fact, the “plus” just helps and makes Case 3 simpler than Case 1. Furthermore, we know that the slope of the upper arc of $C \cap (P_{i_0} - H_\gamma(N))$ satisfies the inequality
\[ \frac{5}{6} 4^{-j} \le \frac{\gamma}{x^2} \le \frac{7}{6} 4^{-j}. \tag{5.214} \]
Again, if $\eta_1$ (and so $\eta_2$) is a small constant, then the upper arc of $P_{i_0} - H_\gamma(N)$ intersects a “large number” of $j$-cells (different from $C$) such that the slope is still almost equal to $4^{-j}$. Indeed, the horizontal size of $C$ is $2^j\eta_1$, and, assuming that (5.214) holds, the inequality
\[ \frac{5}{6} 4^{-j} \le \frac{\gamma}{(x + \ell 2^j\eta_1)^2} \le \frac{7}{6} 4^{-j} \]
has constant times $\eta_1^{-1}$ consecutive integer solutions in $\ell$. Using the substitution $x \to x + \ell 2^j\eta_1$ in (5.213) and (5.214), we have the inequalities
\[ \left| \frac{\gamma}{x + \ell 2^j\eta_1} + \frac{\gamma}{x + \ell 2^j\eta_1 + h} - v \right| < 2^{-j+1}\eta_2 \tag{5.215} \]
and
\[ \frac{5}{6} 4^{-j} \le \frac{\gamma}{(x + \ell 2^j\eta_1)^2} \le \frac{7}{6} 4^{-j}. \tag{5.216} \]
p
If (5.214) holds, then there are at least 101 consecutive integer solutions ` of (5.216). The basic idea is the same: if ` runs through these integer solutions of (5.216), and of course , x, h, v remain fixed, then the function (=function of `) C x C `2j 1 x C `2j 1 C h
(5.217)
has “substantially different” values, and we expect only a very few of them to be “very close” to a fixed v in the quantitative sense of (5.215) (we assume that 2 is “small”). To work out the details of this intuition, we begin with the following corollary of (5.216): r
6 j 2 x C `2j 1 5
r
6 j 2 ; 7
(5.218)
and using this in (5.217), we have the good approximation C p j Cp j : x C `2j 1 x C `2j 1 C h 2 2 C h
(5.219)
Next we distinguish two cases. First assume that c1 0 < h p 2j 2 ; where c1 > 0 is the positive constant in the rectangle property. Then jvj
p c1 c1 p j 1 D 2 c1 2j : h c1 2
(5.220)
On the other hand, by (5.215) and (5.175), v2 p
p C 2j C1 2 < 4 2j ; j 2
(5.221)
assuming p : 2 < 2
(5.222)
Since (5.220) and (5.221) contradict each other, we can assume that
\[
  h > \frac{c_1}{\sqrt{\gamma}}\, 2^{j-2},
  \tag{5.223}
\]
which is analogous to (5.164) in Case 1. Next, similarly to Case 1, we go back to the basic idea. We claim that if we switch ` to ` C 1 in the function [see (5.217)] ; C j x C `2 1 x C `2j 1 C h
(5.224)
then (5.224) changes at least as much as 1 2j 2 :
(5.225)
(Notice that (5.225) is analogous to (5.166).) Indeed, (5.225) immediately follows from the routine estimation ! 1 1 1 1 1 D p j 1 : p j p j 1 j p 2 2 C 2 1 2 1C 2j
Let’s return to (5.224) and (5.225) and apply it in (5.215). We obtain that at most 10 values of ` will satisfy both (5.215) and (5.216):
(5.226)
Just like in Case 1, the next step is
5.7.3 A Combination of the Rectangle Property and the Pigeonhole Principle.

We recall (5.223):
\[
  h > \frac{c_1}{\sqrt{\gamma}}\, 2^{j-2}.
\]
We consider the following power-of-two type decomposition:
\[
  2^{r-1}\,\frac{c_1}{\sqrt{\gamma}}\, 2^{j-1} < h \le 2^{r}\,\frac{c_1}{\sqrt{\gamma}}\, 2^{j-1},
  \qquad r = 0, 1, 2, \ldots
  \tag{5.227}
\]
We claim that, for a fixed point Pi0 D .ai0 ; bi0 / 2 P and for a fixed integer r 0, there are at most 10 2r
(5.228)
other points Pi D .ai ; bi / 2 P (Pi ¤ Pi0 ) such that h D hi D ai ai0 > 0 and v D vi D bi bi0 satisfy (5.215) [and implicitly (5.216)] and (5.227). To prove (5.228), first note that if h D hi satisfies (5.227), then by (5.219) and (5.227), C p j Cp j j j x C `2 1 x C `2 1 C h 2 2 C h
p
2
j
1C
!
1 1C
c1 r1 2
;
so a solution of (5.215) gives the approximation p v D vi 2j
1C
1 1C
c1 r1 2
! ˙ 2j C1 2 :
(5.229)
Assuming 2 <
p ; 100
(5.230)
(5.229) yields the good approximation v D vi
p
2
j
1C
!
1 1C
c1 r1 2
:
(5.231)
Now suppose that (5.228) is not true. If there are more than 10 2r other points Pi D .ai ; bi / 2 P (Pi ¤ Pi0 ) such that h D hi D ai ai0 > 0 and v D vi D bi bi0 satisfy (5.215) [and implicitly (5.216)] and (5.227), then by the Pigeonhole Principle and (5.231) there must exist two points Pi1 ; Pi2 2 P (i1 ¤ i2 ) such that p vi1 2j
1C
!
1 1C
c1 r1 2
vi2
and jhi1 hi2 j
2r pc1 2j 1 10
2r
D
c1 2j p : 20
Since the product p
2
j
1C
1 1C
c1 r1 2
!
c 1 2j p 2
is less than c1 , we obtain that there exists an axes-parallel rectangle of area less than c1 containing at least two points of P (namely, Pi1 and Pi2 ). This contradicts the rectangle property and proves (5.228). If h D hi falls into the interval (5.227), then slope of C \ .Pi H .N // D
.=c1 /2 j 4 ; .x C h/2 h2 4r2
(5.232)
where 4j (almost) equals the slope of the diagonals of the j -cell C. By (5.232) we have ˇ ˇZ ˇ ˇ 1 .=c1 /2 ˇ ˇ Rj .x/ d xˇ 10 r2 : (5.233) ˇ ˇ area.C/ ˇ C\.Pi H .N // 4
What is more, (5.233) holds for all j -cells C satisfying (5.178): 5 j 7 4 slope of C \ .Pi0 H .N // 4j : 6 6 Let’s return now to (5.152). Combining (5.226)–(5.228) and (5.233) we have X
X
Pi 2PW i ¤i0 CW all j cel ls satisfying (5.178) C ase 3
X
ˇZ ˇ ˇ ˇ 1 ˇ ˇ Rj .x/ d xˇ ˇ ˇ ˇ area.C/ C\.Pi H .N //
10 10 2r 10
r0
D 42 103
.=c1 /2 D 4r2
0 1 2 X 2 @ 2r A D 32 103 : c1 c1
(5.234)
r0
This completes Case 3. Finally, we study Case 4: The lower arc of Pi H .N / intersects C, and the slope is larger than the slope of the “dominant needle” Pi0 H .N / (see Fig. 5.10). As usual, we don’t distinguish between positive and negative slopes. Again let Pi0 D .ai0 ; bi0 /, Pi D .ai ; bi / denote the coordinates of the two points. By the hypothesis of Case 4, ai0 > ai . We want positive real numbers: we write h D hi D ai0 ai > 0 and v D vi D bi bi0 > 0; where h stands for horizontal and v stands for vertical. The rectangle property guarantees that hv c1 > 0. Let .A1 ; A2 / denote the coordinates of the lower left corner of the j -cell C. In Case 4 we have bi > A2 > bi0 and bi A2 > A2 bi0 . The intersection of the horizontal line y D A2 with the upper arc of Pi0 H .N / and the lower arc of Pi H .N / gives two points: the hypothesis of Case 4 implies that these intersection points are relatively close to each other in the following quantitative sense. Write y D A2 bi0 > 0; then bi A2 D .bi bi0 / y D v y > y, and we have the upper bound ˇ ˇ ˇ ˇˇ ˇ ai a < 2 2j 1 : i0 ˇ vy y ˇ
(5.235)
Since ai0 ai D h > 0, we can rewrite (5.235) as follows: ˇ ˇ ˇ ˇ ˇ ˇ < 2 2j 1 : h ˇ y vy ˇ
(5.236)
Fig. 5.10
Now we basically repeat the argument of Case 2. But, just like Case 3 was a simpler version of Case 1, Case 4 is a simpler version of Case 2. Case 4 is similar to Case 3 in the following technical detail: the two critical functions f3 .y/ D
C and f4 .y/ D y yCh y vy
(5.237)
are similar in the sense that if y increases (decreases) then both parts of the function increase (decrease) at the same time. We refer to this fact by saying “f3 .y/ and f4 .y/ are in synchrony.” Since in both Cases 2 and 4 we switch the role of horizontal and vertical, here again we focus on the reciprocal of the slope: we know that the reciprocal of the slope of the upper arc of C \ .Pi0 H .N // satisfies the inequality 6 6 j 4 2 4j : 7 y 5
(5.238)
We know that if 2 (and so 1 ) is a small constant, then the upper arc of Pi0 H .N / intersects a “large number” of j -cells (different from C) such that the reciprocal of the slope is still almost equal to 4j . Returning to (5.236) and (5.238), and using the substitution y D y C `2j 2 , we have the inequalities ˇ ˇ ˇ ˇ j C1 ˇ ˇ ˇ y C `2j v .y C `2j / hˇ < 2 1 2 2
(5.239)
and 6 6 j 4j : 4 j 2 7 .y C `2 2 / 5
(5.240)
p
If (5.238) holds, then there are at least 102 consecutive integer solutions ` of (5.240). The basic idea is the same: if ` runs through these integer solutions of (5.240), and of course , x, h, v remain fixed, then the function (=function of `) j y C `2 2 v .y C `2j 2 /
(5.241)
has “substantially different” values, and we expect only a very few of them to be “close” to a fixed h in the quantitative sense of (5.240) (of course, here we assume that 1 is “small”). Next we work out the details of this intuition. We begin with the following corollary of (5.240): r
6 j 2 y C `2j 2 7
r
6 j 2 : 5
(5.242)
Since “f3 .y/ and f4 .y/ are in synchrony” [see (5.237)], we can basically repeat the argument of (5.224)–(5.226) in Case 3 and obtain that if we switch ` to ` C 1 in the function [see (5.241)] ; y C `2j 2 v .y C `2j 2 /
(5.243)
then (5.243) changes at least as much as 2 2j 2 :
(5.244)
(Notice that (5.244) is analogous to (5.199) and (5.225).) Thus we obtain that at most 10 values of ` will satisfy both (5.239) and (5.240):
(5.245)
Just like in Cases 1–3, the next step is
5.7.4 A Combination of the Rectangle Property and the Pigeonhole Principle. Since in Case 4 we have v .y C `2j 2 / > y C `2j 2 H) v > 2.y C `2j 2 /;
(5.246)
by (5.242) we can assume that r v>2
6 j 2 : 7
(5.247)
We consider the following power-of-two type decomposition: r 2
r1
6 j C2 < v 2r 2 7
r
6 j C2 r D 0; 1; 2; : : : 2 7
(5.248)
We claim that, for a fixed point Pi0 D .ai0 ; bi0 / 2 P and for a fixed integer r 0, there are at most 100
r 2 c1
(5.249)
other points Pi D .ai ; bi / 2 P (Pi ¤ Pi0 ) such that h D hi D ai0 ai > 0 and v D vi D bi bi0 > 0 satisfy (5.239) [and implicitly (5.240)] and (5.248). To prove (5.249), first note that if v D vi satisfies (5.248), then by (5.239), (5.242) and (5.246), h D hi <
C 2j C11 y C `2j 2
p C 2j C11 2 2j ;
q
6 7
2j
(5.250)
assuming 1 <
p : 4
(5.251)
Now suppose that (5.249) is not true. If there are more than 100
r 2 c1
other points Pi D .ai ; bi / 2 P (Pi ¤ Pi0 ) such that h D hi D ai0 ai > 0 and v D vi D bi bi0 > 0 satisfy (5.239) [and implicitly (5.240)] and (5.248), then by the Pigeonhole Principle and (5.250) there must exist two points Pi1 ; Pi2 2 P (i1 ¤ i2 ) such that p maxfhi1 ; hi2 g 2 2j ; and 2r jvi1 vi2 j
q
6 j C2 7 2 100 c1 2r
q c1
D
6 7
p 2j : 25
Since the product p
q r c1 67 6 j j 2 p 2 D c1 7
is less than c1 , we obtain that there exists an axes-parallel rectangle of area less than c1 containing at least two points of P (namely, Pi1 and Pi2 ). This contradicts the rectangle property and proves (5.249). If v D vi falls into the interval (5.248), then 1 2 r 4j ; 2 .y C v/ v 4 (5.252) j where 4 (almost) equals the reciprocal of the slope of the diagonals of the j -cell C. By (5.252) we have ˇ ˇZ ˇ 10 ˇ 1 ˇ ˇ R .x/ d x (5.253) ˇ r: ˇ j ˇ 4 area.C/ ˇ C\.Pi H .N // the reciprocal of the slope of C \ .Pi H .N // D
What is more, (5.253) holds for all j -cells C satisfying (5.178): 5 j 7 4 slope of C \ .Pi0 H .N // 4j : 6 6
Let’s return now to (5.152). Combining (5.245), (5.248), (5.249), and (5.253) we have ˇZ ˇ ˇ ˇ X X 1 ˇ ˇ Rj .x/ d xˇ ˇ ˇ area.C/ ˇ C\.Pi H .N // Pi 2PW i ¤i0 CW all j cel ls satisfying (5.178) C ase 4
X
10 100
r0
r 10 2 r D c1 4
0 1 X D 104 @ 2r A D 2 104 : c1 r0 c1
(5.254)
This completes Case 4.
5.8 Completing the Proof of Theorem 5.11 Here we finally complete the proof of Proposition 5.12. Let’s return to (5.151) and (5.152); we are now ready to clarify the technical details of the Single Term Domination. Let Pi0 2 P and j 2 J be arbitrary. The slope x2 of the hyperbolic needle Pi0 H .N / satisfies Property I: 5 j 7 4 2 4j 6 x 6
(5.255)
if and only if r
6p j 2 x 7
r
6p j 2 ; 5
which is an interval of length r
6 5
r ! p j 6 p j 2 > 2 : 7 6
Since a j -cell C has horizontal side 1 2j , there are more than p j 2 6 1 2j
p D 61
j -cells C such that the slope of the intersection C \ .Pi0 H .N // satisfies Property I [see (5.255)].
It would be not too difficult to prove directly—by usingpsome familiar arguments from uniform distribution—that, among these more than 61 j -cells C, at least 1 % has the following additional property that we call Property II: Pi0 H .N / intersects only the lower right subrectangle of C, and the 1 intersection is a “large triangle”, meaning that the area is 32 of the area of C, i.e., 1 2 the area is 32 . It is technically simpler, however, to force Property II in an indirect way: by using the trick of short vertical translations; see Fig. 5.11 below. This geometric trick was already mentioned at the end of Sect. 5.5. More precisely, for every real number t0 in 0 < t0 < 1, consider all j -cells C such that with B D Œ0; M N Œ; M we have C \ ..Pi0 C .0; t0 // H .N // B
(5.256)
and 5 j 7 4 slope of C \ ..Pi0 C .0; t0 // H .N // 4j : 6 6
(5.257)
A simple geometric consideration shows that, for (say) at least 5 % of the pairs .t0 ; C/, where C satisfies (5.256) and (5.257), C \ ..Pi0 C .0; t0 // H .N // also satisfies Property II. That is, .Pi0 C .0; t0 // H .N / intersects only the lower right subrectangle of C, and the intersection is a “large triangle,” meaning that the area is 1 1 2 32 of the area of C, i.e., the area is 32 .
Fig. 5.11
For the proof of the positive direction (5.1060), we choose the pattern C in every j -cell C satisfying (5.256) and (5.257) [and of course we choose the negative pattern C for the negative direction (5.10600)]. Then we have Z C\..Pi0 C.0;t0 //H .N //
Rj .x/ d x
1 2 : 32
(5.258)
Finally, if the j -cell C does not satisfy both (5.256) and (5.257), then we choose the 0 pattern. Therefore, by (5.258) and Cases 1–4, we have
Z
0 1
@
t0 D0
.Pi0 C.0;t0 //H .N /
X
X
j 2J
Pi0 2PW for all 0
X
Z
X
CW all j cells Pi 2PW i¤i0 C ases 1;2;3;4 satisfying (5.257)
1
X X Z
j 2J Pi0 2P
ˇZ ˇ ˇ ˇ t0 D0 ˇ C\..Pi 1
0
X
X
j 2J
Pi0 2PW for all 0
3
1 2 2 2 10
c1
3=2
C
c1
Rj .x/ d xA dt0
p 1 2 1 20 61 32 ˇ ! ˇ ˇ Rj .x/ d xˇ dt0 ˇ C.0;t0 //H .N // p 1 2 1 20 61 32
2 !
C 32 10
3
c1
2
C 2 10
4
c1
!! ; (5.259)
1 where the factor 20 comes from the “5 %” mentioned above, in the last step we used (5.179), (5.211), (5.234), and (5.254) for every t0 in 0 < t0 < 1, and, finally, we assume that p c1 1
Œsee (5.174); Œsee (5.163); 2 < 2 < 1 2 8 p C p1c1
c1 1 < p Œsee (5.192); 1 < 2
1 < 8
1 p1
C
p 2 <
p1 c1
p
c1 Œsee (5.196); 2
Œsee (5.207); 2 <
p Œsee (5.222); 2
p Œsee (5.230); 1 < Œsee (5.251): 100 4
Since p1
1 C
p1 c1
p p C c1 ; 2
we can guarantee all of these requirements by forcing the following single inequality: p c1 c1 : ; ; p maxf1 ; 2 g < min 100 8 2 p
(5.260)
Let’s return to (5.259): we have 3=2 2 ! 2 p 1 2 1 3 C C 32 103 1 2 2 2 10 20 61 32 c1 c1 c1 C 2 104
c1
p 1 2 1 ; 40 61 32
(5.261)
assuming (5.260) holds, and also 1 satisfies the inequality
1 108 p 1
c1
C
c1
2 ! :
(5.262)
Since 1 and 1 are almost equal [see (5.117)], we can satisfy both (5.260) and (5.262) by the choice p 1 2 D min
p c1 108 c1 108 c12 : ; ; p ; 200 10 2 2 3=2
(5.263)
By using (5.263) in (5.261), and then returning to (5.259), we have Z
0 1
@
t0 D0
1
X XZ j 2J Pi 2P
.Pi C.0;t0 //H .N /
X
X
j 2J
Pi 2PW for all 0
D
p 1 2 1 D 40 61 32
X
X
j 2J
Pi0 2PW for all 0
where 2 is defined in (5.263).
Rj .x/ d xA dt0
p
2 ; 240 32
(5.264)
It is straightforward from (5.264) that there is a t0 in 0 < t0 < 1 such that X XZ j 2J Pi 2P
.Pi C.0;t0 //H .N /
Rj .x/ d x p
X
X
j 2J
Pi 2PW for all 0
2 : 240 32
(5.265)
Now we return to (5.141). Applying Lemma 5.13 for the translated point set P C .0; t0 /, we have 1 .M N /.M 2 /
Z
M N
Z
M
.x/T .x/ d x D 0
0
1 X area B \ ..Pi C .0; t0 // H .N // A ı area.H .N // C D@ .M N /.M 2 / P 2P i
C
X X j 2J Pi 2P
1 .M N /.M 2 /
Z .Pi C.0;t0 //H .N /
Rj .x/ d x C error;
(5.266)
where by (5.146),
jerrorj
jPj 2 p 4 .1 C 2 / .n C 1/ p : .M N /.M 2 / 21
(5.267)
We recall that P is a finite subset of the square Œ0; M 2 with cardinality jPj D ıM 2 . Since 0 < t0 < 1, the rectangle property implies—via elementary calculations— that .Pi C .0; t0 // H .N / B D Œ0; M N Œ; M
(5.268)
holds for all but
.2N C 4 C 1/M points Pi 2 P: c1
(5.269)
Thus we have 0 1 X area B \ ..Pi C .0; t0 // H .N // @ A ı area.H .N // D .M N /.M 2 / P 2P i
D
D
ıM 2 C .2N C4C1/M c1 .M N /.M 2 /
area.H .N // ı area.H .N // D
! .2N C4 C1/M M2 c1 1 ı area.H .N // C area.H .N // .M N /.M 2 / .M N /.M 2 / (5.270)
with some constant 1 1. Since area.H .N // D 2 log N , by (5.270) we have ˇ0 ˇ 1 ˇ ˇ ˇ X area B \ ..Pi C .0; t0 // H .N // ˇ ˇ@ A ı area.H .N //ˇ ˇ ˇ .M N /.M 2 / ˇ ˇ Pi 2P
3 N C 6 C M1
2 log N: M M N 1 2 1 M M
(5.271)
Combining (5.271) with (5.266) and (5.267), we obtain 1 .M N /.M 2 / D
X X j 2J Pi 2P
Z
M N
Z
M
.x/T .x/ d x D 0
1 .M N /.M 2 /
Z .Pi C.0;t0 //H .N /
Rj .x/ d x C error; (5.272)
where jerrorj
jPj 2 p 4 .1 C 2 / .n C 1/ p C .M N /.M 2 / 21 3 N C 6 C M1
2 log N: C M M N 1 2 1 M M
(5.273)
Applying (5.265) in (5.272) and (5.273), we have 1 .M N /.M 2 /
X j 2J
Z
M N
Z
M
.x/T .x/ d x 0
1 .M N /.M 2 /
p
X Pi 2PW for all 0
2 240 32
jPj 2 p 4 .1 C 2 / .n C 1/ p .M N /.M 2 / 21 3 N C 6 C M1
2 log N: M M N 1 M 1 2 M
(5.274)
By (5.136),
jJ j .n C 1/
log C 1 log 2
;
and by (5.268) and (5.269), X
1 ıM 2
Pi 2PW for all 0
.2N C 4 C 1/M : c1
Thus we have, X j 2J
1 .M N /.M 2 /
X
1
Pi 2PW for all 0
1 2N C 4 C 1 .n C 1/ log C log 2 ı 1 : c1 M
(5.275)
Now let’s go back to (5.274). If is small then 2 is negligible compared to . Let (say) D 106 . Using this and (5.275) in (5.274), we obtain 1 area.B/
Z .x/T .x/ d x B
1 p log C 1 A ı 1 2N C 4 C 1 @.n C 1/ 2 ; log 2 c1 M 104 0
(5.276)
assuming N and M=N are both “large.” More precisely, we assume that N 2
10 C 1
;
1 N < 2n N; 2
C 1 .N C 2 / o: M > 1011 n p 7 107 c 2 ı c1 min 20 ; c1 ; 10 2 c1 ; 2 1
(5.277)
By (5.276) and (5.277) and the definition of 2 [see (5.263)] we conclude that 1 area.B/
Z
.x/T .x/ d x ı 0 log N;
(5.278)
B
where ı 0 D ı 0 .c1 ; ; ı/ > 0 is a positive constant independent of N and M : ı 0 D ı 0 .c1 ; ; ı/ D 1012 ı min
107 c1 107 c12 p : ; c1 ; ; 20 2 2
(5.279)
By (5.278) there must exist a translated copy $x_1 + H_\gamma(N)$ of the hyperbolic needle $H_\gamma(N)$ such that $x_1 + H_\gamma(N) \subset [0, M]^2$ and
\[
  |\mathcal{P} \cap (x_1 + H_\gamma(N))| \ge \delta \cdot 2\gamma \log N + \delta' \log N,
\]
where $\delta' = \delta'(c_1, \gamma, \delta) > 0$ is defined in (5.279). This proves the positive direction (5.106′). The proof of the negative direction (5.106″) is the same, except that we replace the pattern chosen in each $j$-cell with its negative. Thus the long proof of Proposition 5.12 is complete. This finally completes the proof of Theorem 5.11. □
5.9 Yet Another Generalization of Theorem 5.3

Let $\alpha > 0$, $0 \le \beta < 1$, $\gamma > 0$ be arbitrary but fixed real numbers, and let $F(\alpha; \beta; \gamma; N)$ denote the number of integral solutions of the diophantine inequality
\[
  \|n\alpha - \beta\| < \frac{\gamma}{n}, \qquad 1 \le n \le N.
  \tag{5.280}
\]
Note that the special case $\alpha = \sqrt{2}$ was already introduced in Sect. 5.1, see (5.30). Inequality (5.280) motivates the hyperbolic region
\[
  |y - \beta| < \frac{\gamma}{x}, \qquad 1 \le x \le N,
\]
which has area $2\gamma \log N + O(1)$.
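As a concrete illustration of the counting function, the following minimal sketch evaluates $F(\alpha; \beta; \gamma; N)$ by brute force. The function names are hypothetical (they do not come from the text), and floating-point arithmetic is assumed to be accurate enough, which is reasonable only for moderate $N$.

```python
import math


def dist_to_nearest_int(t: float) -> float:
    """Return ||t||, the distance from t to the nearest integer."""
    return abs(t - round(t))


def count_solutions(alpha: float, beta: float, gamma: float, N: int) -> int:
    """Brute-force F(alpha; beta; gamma; N): count n in [1, N]
    with ||n*alpha - beta|| < gamma / n, as in (5.280)."""
    return sum(
        1
        for n in range(1, N + 1)
        if dist_to_nearest_int(n * alpha - beta) < gamma / n
    )


# Homogeneous case beta = 0 for alpha = sqrt(2) and gamma = 1/2.
print(count_solutions(math.sqrt(2), 0.0, 0.5, 100))
```

For these parameters the solutions up to 100 are $n = 1, 2, 5, 12, 29, 70$, the denominators coming from the Pell equation $x^2 - 2n^2 = \pm 1$.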
Let’s return to the special case $\alpha = \sqrt{2}$. Combining Lemmas 5.1 and 5.2, we have
\[
  \int_0^1 F(\sqrt{2}; \beta; \gamma; N)\, d\beta = 2\gamma \log N + O(1),
  \tag{5.281}
\]
and for an arbitrary subinterval $[a, b]$ with $0 \le a < b \le 1$ we have the limit formula
\[
  \lim_{N \to \infty} \frac{\frac{1}{b-a} \int_a^b F(\sqrt{2}; \beta; \gamma; N)\, d\beta}{\log N} = 2\gamma.
  \tag{5.282}
\]
There is a straightforward generalization of (5.281) and (5.282) for arbitrary $\alpha > 0$ (the proof is the same):
\[
  \int_0^1 F(\alpha; \beta; \gamma; N)\, d\beta = 2\gamma \log N + O(1),
  \tag{5.283}
\]
and for an arbitrary subinterval $[a, b]$ with $0 \le a < b \le 1$ we have the limit formula
\[
  \lim_{N \to \infty} \frac{\frac{1}{b-a} \int_a^b F(\alpha; \beta; \gamma; N)\, d\beta}{\log N} = 2\gamma.
  \tag{5.284}
\]
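The averaging formula (5.283) can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, $\beta$ is sampled on a midpoint grid, and the comparison value $\sum_{n \le N} \min(1, 2\gamma/n)$ is the exact integral, because for each fixed $n$ the set of $\beta \in [0,1)$ with $\|n\alpha - \beta\| < \gamma/n$ is an arc of length $\min(1, 2\gamma/n)$ on the circle.

```python
import math


def dist_to_nearest_int(t: float) -> float:
    """Return ||t||, the distance from t to the nearest integer."""
    return abs(t - round(t))


def count_solutions(alpha: float, beta: float, gamma: float, N: int) -> int:
    """Brute-force F(alpha; beta; gamma; N) from (5.280)."""
    return sum(
        1
        for n in range(1, N + 1)
        if dist_to_nearest_int(n * alpha - beta) < gamma / n
    )


def average_over_beta(alpha: float, gamma: float, N: int, grid: int = 2000) -> float:
    """Approximate the integral of F over beta in [0, 1) on a midpoint grid."""
    total = 0
    for s in range(grid):
        beta = (s + 0.5) / grid
        total += count_solutions(alpha, beta, gamma, N)
    return total / grid


def exact_average(gamma: float, N: int) -> float:
    """Exact value of the integral: each n contributes min(1, 2*gamma/n)."""
    return sum(min(1.0, 2.0 * gamma / n) for n in range(1, N + 1))


alpha, gamma, N = math.sqrt(2), 0.5, 60
print(average_over_beta(alpha, gamma, N), exact_average(gamma, N), 2 * gamma * math.log(N))
```

The first two printed numbers should agree up to the grid error, and both differ from $2\gamma \log N$ only by a bounded amount, in line with the $O(1)$ term in (5.283).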
Formulas (5.281)–(5.284) express the almost trivial geometric fact that the average number of lattice points contained in all the translated copies of a given region equals the area of the region (see Lemma 5.8). It is natural, therefore, to study the limit
\[
  \lim_{N \to \infty} \frac{F(\alpha; \beta; \gamma; N)}{2\gamma \log N}.
  \tag{5.285}
\]
The case of rational $\alpha$ in (5.285) is trivial. Indeed, if $N \to \infty$ then $F(\alpha; \beta; \gamma; N)$ remains bounded for all but a finite number of values of $\beta = \beta(\alpha)$ in the unit interval. When $F(\alpha; \beta; \gamma; N)$ tends to infinity, then it behaves like a linear function $\mathrm{const} \cdot N$, which is much faster than the logarithmic function $\log N$.

If $\alpha$ is irrational, then we have the following nontrivial result, which can be considered a far-reaching generalization of Theorem 5.3.

Theorem 5.14. Let $\alpha > 0$ be an arbitrary irrational and let $\gamma > 0$ be an arbitrary real number. There are continuum many “divergence points” $\beta^* = \beta^*(\alpha, \gamma) \in [0, 1)$ such that
\[
  \limsup_{n \to \infty} \frac{F(\alpha; \beta^*; \gamma; n)}{\log n} > \liminf_{n \to \infty} \frac{F(\alpha; \beta^*; \gamma; n)}{\log n}.
  \tag{5.286}
\]
Proof. We can clearly assume that $0 < \alpha < 1$; we need the continued fraction
\[
  \alpha = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots}} = [a_1, a_2, a_3, \ldots].
  \tag{5.287}
\]
For irrational $\alpha$ the “digits” $a_1, a_2, a_3, \ldots$ form an infinite sequence ($a_i \ge 1$ for all $i \ge 1$). The fractions ($k \ge 2$)
\[
  \frac{p_k}{q_k} = [a_1, \ldots, a_{k-1}]
\]
are known as the convergents to $\alpha$. It is well known that $p_k, q_k$ are generated by the recurrence relations
\[
  p_k = a_{k-1} p_{k-1} + p_{k-2}, \qquad q_k = a_{k-1} q_{k-1} + q_{k-2},
  \tag{5.288}
\]
where $p_1 = 0$, $q_1 = 1$ and $p_2 = 1$, $q_2 = a_1$. Another well-known fact about the convergents is the inequality
\[
  \left| \alpha - \frac{p_k}{q_k} \right| \le \frac{1}{q_k q_{k+1}},
  \tag{5.289}
\]
which clearly implies
\[
  \|q_k \alpha\| < \frac{1}{a_k q_k}.
  \tag{5.290}
\]
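The recurrence (5.288) and the bounds (5.289) and (5.290) can be checked in exact arithmetic. The sketch below is an illustration only: the function names are hypothetical, and a long finite continued fraction (a rational number) is used as a stand-in for an irrational $\alpha$, so the checks are meaningful only for indices $k$ up to the number of digits supplied. Indexing follows the text: $p_1 = 0$, $q_1 = 1$, $p_2 = 1$, $q_2 = a_1$.

```python
from fractions import Fraction
import math


def cf_value(digits):
    """Exact value of [a_1, ..., a_L] = 1/(a_1 + 1/(a_2 + ...))."""
    x = Fraction(0)
    for a in reversed(digits):
        x = 1 / (a + x)
    return x


def convergents(digits):
    """Lists p, q with p[k]/q[k] = [a_1, ..., a_{k-1}]; index 0 is unused."""
    p = [None, 0, 1]
    q = [None, 1, digits[0]]
    for k in range(3, len(digits) + 2):
        a = digits[k - 2]  # this is a_{k-1} in the notation of (5.288)
        p.append(a * p[k - 1] + p[k - 2])
        q.append(a * q[k - 1] + q[k - 2])
    return p, q


def dist_to_nearest_int(x):
    """Return ||x|| for a Fraction x."""
    frac = x - math.floor(x)
    return min(frac, 1 - frac)


def check_convergent_bounds(digits):
    """Check (5.289) and (5.290) for k = 1, ..., len(digits)."""
    alpha = cf_value(digits)
    p, q = convergents(digits)
    for k in range(1, len(digits) + 1):
        a_k = digits[k - 1]
        if abs(alpha - Fraction(p[k], q[k])) > Fraction(1, q[k] * q[k + 1]):
            return False
        if not dist_to_nearest_int(q[k] * alpha) < Fraction(1, a_k * q[k]):
            return False
    return True


digits = [3, 10, 2, 50, 3, 7, 1, 20, 5, 9, 4, 6]  # arbitrary sample digits
print(check_convergent_bounds(digits))
```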
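Write $n = \ell q_k$; then by (5.290),
\[
  \|n\alpha\| = \|\ell q_k \alpha\| < \frac{\ell}{a_k q_k} = \frac{\ell^2}{a_k \ell q_k} = \frac{\ell^2}{a_k n},
\]
and so $\|n\alpha\| < \gamma/n$ holds whenever
\[
  \frac{\ell^2}{a_k} \le \gamma \iff 1 \le \ell \le \sqrt{\gamma a_k}.
  \tag{5.291}
\]
Now let
\[
  N_k = \lfloor \sqrt{\gamma a_k} \rfloor\, q_k,
  \tag{5.292}
\]
where, as usual, $\lfloor \ldots \rfloor$ denotes the lower integral part. Equation (5.291) implies that the homogeneous diophantine inequality $\|n\alpha\| < \gamma/n$ has at least
\[
  \sum_{i=1}^{k} \lfloor \sqrt{\gamma a_i} \rfloor
\]
integer solutions $n$ in $1 \le n \le N_k$. Or formally, we have
\[
  F(\alpha; \beta = 0; \gamma; N_k) \ge \sum_{i=1}^{k} \lfloor \sqrt{\gamma a_i} \rfloor.
  \tag{5.293}
\]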
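The mechanism behind (5.291) can be checked directly: every $n = \ell q_k$ with $1 \le \ell \le \sqrt{\gamma a_k}$ should solve $\|n\alpha\| < \gamma/n$. The sketch below verifies this in exact rational arithmetic. It is an illustration under assumptions: the names are hypothetical, $\gamma$ is taken rational, and a finite continued fraction stands in for an irrational $\alpha$. It checks only that each such $n$ is a solution; it does not check that the resulting values of $n$ are pairwise distinct.

```python
from fractions import Fraction
import math


def cf_value(digits):
    """Exact value of [a_1, ..., a_L] = 1/(a_1 + 1/(a_2 + ...))."""
    x = Fraction(0)
    for a in reversed(digits):
        x = 1 / (a + x)
    return x


def denominators(digits):
    """q[k] for k = 1..L+1 with q_1 = 1, q_2 = a_1; index 0 is unused."""
    q = [None, 1, digits[0]]
    for k in range(3, len(digits) + 2):
        q.append(digits[k - 2] * q[k - 1] + q[k - 2])
    return q


def dist_to_nearest_int(x):
    """Return ||x|| for a Fraction x."""
    frac = x - math.floor(x)
    return min(frac, 1 - frac)


def floor_sqrt(x):
    """floor(sqrt(x)) for a non-negative Fraction x, in exact arithmetic."""
    return math.isqrt(x.numerator * x.denominator) // x.denominator


def multiples_of_denominators(digits, gamma):
    """List (n, ok) for n = l*q_k, 1 <= l <= floor(sqrt(gamma*a_k)),
    where ok records whether ||n*alpha|| < gamma/n holds."""
    alpha = cf_value(digits)
    q = denominators(digits)
    gamma = Fraction(gamma)
    out = []
    for k in range(1, len(digits) + 1):
        a_k = digits[k - 1]
        for l in range(1, floor_sqrt(gamma * a_k) + 1):
            n = l * q[k]
            out.append((n, dist_to_nearest_int(n * alpha) < gamma / n))
    return out


digits = [3, 10, 2, 50, 3, 7, 1, 20, 5, 9, 4, 6]  # arbitrary sample digits
result = multiples_of_denominators(digits, 1)
print(len(result), all(ok for _, ok in result))
```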
We distinguish two cases. We start with the much harder one.

Case 1: For all sufficiently large values of $k$ we have
\[
  \sum_{i=1}^{k} \lfloor \sqrt{\gamma a_i} \rfloor \le 100 \cdot 2\gamma \log N_k.
  \tag{5.294}
\]
We proceed in four steps.
5.9.1 Step One

The crucial first step in the argument is to show that condition (5.294) implies the exponential upper bound
\[
  \prod_{i=1}^{k} (a_i + 1) \le e^{c' k}
  \tag{5.295}
\]
for all sufficiently large values of $k$, where $c' = c'(\gamma)$ is a finite constant independent of $k$. To derive (5.295), we use the well-known principle “exponential beats polynomial” in the form of an elementary inequality as follows.

Lemma 5.15. For any positive $c > 0$,
\[
  (x + 1)^c \le \left( 8 c^2 e^{-2} \right)^c e^{\sqrt{x}} \quad \text{for all } x \ge 1.
\]

Proof. It is just routine calculus. We start with the trivial $x + 1 \le 2x$ for all $x \ge 1$, which leads us to the function
\[
  g(x) = (2x)^c e^{-\sqrt{x}},
\]
which we want to maximize. It is easy to compute the derivative of $g(x)$, and after routine calculations we obtain
\[
  \max_{x \ge 0} g(x) = g(4c^2) = \left( 8 c^2 e^{-2} \right)^c,
\]
which completes the proof of the lemma.

By repeated application of (5.288) we have
\[
  q_k = a_{k-1} q_{k-1} + q_{k-2} \le (a_{k-1} + 1) q_{k-1} \le (a_{k-1} + 1)(a_{k-2} + 1) q_{k-2} \le \cdots \le \prod_{i=1}^{k-1} (a_i + 1).
  \tag{5.296}
\]
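The product bound (5.296) is easy to confirm on sample digit sequences. The following sketch uses hypothetical helper names and the indexing of the text ($q_1 = 1$, $q_2 = a_1$); the digit lists are arbitrary examples.

```python
def denominators(digits):
    """q[k] for k = 1..L+1 with q_1 = 1, q_2 = a_1; index 0 is unused."""
    q = [None, 1, digits[0]]
    for k in range(3, len(digits) + 2):
        q.append(digits[k - 2] * q[k - 1] + q[k - 2])
    return q


def product_bound_holds(digits):
    """Check q_k <= prod_{i=1}^{k-1} (a_i + 1) for every available k."""
    q = denominators(digits)
    prod = 1  # empty product, corresponding to k = 1
    for k in range(1, len(digits) + 2):
        if q[k] > prod:
            return False
        if k <= len(digits):
            prod *= digits[k - 1] + 1  # extend the product by (a_k + 1)
    return True


for sample in ([3, 10, 2, 50, 3, 7, 1, 20], [1] * 20, [1, 1, 1000, 1, 2]):
    print(sample, product_bound_holds(sample))
```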
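Combining this with (5.292) and (5.294), we have
\[
  \sum_{i=1}^{k} \left( \sqrt{\gamma a_i} - 1 \right)
  \le 100 \cdot 2\gamma \left( \log \sqrt{\gamma} + \log \sqrt{a_k} + \log \prod_{i=1}^{k-1} (a_i + 1) \right)
  \le 200\gamma \left( \log \sqrt{\gamma} + \log \prod_{i=1}^{k} (a_i + 1) \right).
  \tag{5.297}
\]
Switching to the exponential function, the sum in (5.297) becomes the product
\[
  \prod_{i=1}^{k} e^{\sqrt{\gamma a_i} - 1} \le \gamma^{100\gamma} \prod_{i=1}^{k} (a_i + 1)^{200\gamma},
  \tag{5.298}
\]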
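and this inequality holds for all sufficiently large $k$, i.e., for all $k \ge k_0$.

Applying Lemma 5.15 with $c = 400\sqrt{\gamma}$ and $x = a_i$, $i = 1, 2, \ldots, k$, and multiplying these inequalities together, we obtain
\[
  \prod_{i=1}^{k} (a_i + 1)^{400\sqrt{\gamma}} \le \left( 400\sqrt{\gamma} \right)^{800\sqrt{\gamma}\, k} \prod_{i=1}^{k} e^{\sqrt{a_i}},
  \tag{5.299}
\]
and, finally, raising (5.299) to the $\sqrt{\gamma}$th power, we have
\[
  \prod_{i=1}^{k} (a_i + 1)^{400\gamma} \le \left( 400\sqrt{\gamma} \right)^{800\gamma k} \prod_{i=1}^{k} e^{\sqrt{\gamma a_i}}.
  \tag{5.300}
\]
Combining (5.298) and (5.300), we have
\[
  \prod_{i=1}^{k} (a_i + 1)^{400\gamma} \le \left( 400\sqrt{\gamma} \right)^{800\gamma k} e^{k}\, \gamma^{100\gamma} \prod_{i=1}^{k} (a_i + 1)^{200\gamma}.
\]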
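Simplifying both sides with the product
\[
  \prod_{i=1}^{k} (a_i + 1)^{200\gamma}
\]
and taking the $200\gamma$th root, we conclude that
\[
  \prod_{i=1}^{k} (a_i + 1) \le \left( 400\sqrt{\gamma} \right)^{4k} e^{\frac{k}{200\gamma}} \sqrt{\gamma}
  = \sqrt{\gamma} \left( 20^8 \gamma^2 e^{\frac{1}{200\gamma}} \right)^{k},
  \tag{5.301}
\]
which holds for all $k \ge k_0$. This proves (5.295).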
5.9.2 Step Two: Small “Digit” $a_i$ Implies “Local” Rectangle Property

It follows from (5.301) that, for all sufficiently large $k$,
\[
  a_i + 1 \le 20^{17} \gamma^4 e^{\frac{1}{100\gamma}}
  \tag{5.302}
\]
holds for at least $k/2$ values of $i$ in $1 \le i \le k$. In other words, at least half of the (continued fraction) “digits” $a_i$ of $\alpha$ are “small” (namely, less than a constant depending only on $\gamma$) in the precise quantitative sense of (5.302). Next we show that for every small digit $a_i$ the rectangle property holds locally: in some power-of-two range around $q_i$. To prove this, we basically repeat the proof of Lemma 5.5 and use some facts from the theory of continued fractions (see Lemma 5.16). The details go as follows.

Similarly to the proof of Lemma 5.5, we consider a rectangle of slope $1/\alpha$ which contains two lattice points $P = (k, \ell)$ and $Q = (m, n)$; in fact, assume that $P, Q$ are two corner points of the rectangle. We denote the $PQ$ vector as $v = (m - k, n - \ell)$, and consider the two perpendicular unit vectors
\[
  e_1 = \left( \frac{\alpha}{\sqrt{1 + \alpha^2}}, \frac{1}{\sqrt{1 + \alpha^2}} \right)
  \quad \text{and} \quad
  e_2 = \left( \frac{1}{\sqrt{1 + \alpha^2}}, \frac{-\alpha}{\sqrt{1 + \alpha^2}} \right).
\]
Then the two sides, $a$ and $b$, of the rectangle can be expressed in terms of the inner products $e_1 \cdot v$ and $e_2 \cdot v$:
\[
  a = |e_1 \cdot v| = \frac{|p\alpha + q|}{\sqrt{1 + \alpha^2}}
  \quad \text{and} \quad
  b = |e_2 \cdot v| = \frac{|p - q\alpha|}{\sqrt{1 + \alpha^2}},
\]
where $p = m - k$ and $q = n - \ell$. Thus we have
\[
  \text{area} = ab = \frac{|p\alpha + q| \cdot |p - q\alpha|}{1 + \alpha^2}.
  \tag{5.303}
\]
Without loss of generality we can assume that $p \ge 0$, $q \ge 0$ and $p$ is the nearest integer to $q\alpha$. Thus we have $|p - q\alpha| = \|q\alpha\|$. Next we need the following facts from the theory of continued fractions.

Lemma 5.16. If $1 \le q < q_{i+1}$ then
\[
  \|q\alpha\| \ge \|q_i \alpha\| > \frac{1}{(a_i + 2) q_i}.
\]
(We postpone the proof of Lemma 5.16.) Now assume that
\[
  \frac{q_i}{4} \le q < q_{i+1}.
  \tag{5.304}
\]
Applying Lemma 5.16 and (5.304), we have
\[
  |p - q\alpha| = \|q\alpha\| \ge \|q_i \alpha\| > \frac{1}{(a_i + 2) q_i} \ge \frac{1}{4 (a_i + 2) q}.
  \tag{5.305}
\]
Applying (5.305) in (5.303), we have
\[
  \text{area} = \frac{(p\alpha + q)\, |p - q\alpha|}{1 + \alpha^2} \ge \frac{q\, |p - q\alpha|}{1 + \alpha^2} \ge \frac{1}{4 (a_i + 2)(1 + \alpha^2)},
  \tag{5.306}
\]
assuming (5.304). Let me elaborate on the meaning of (5.306). It was about a rectangle of slope $1/\alpha$ which contains two lattice points $P = (k, \ell)$ and $Q = (m, n)$; in fact, $P, Q$ are two corner points of the rectangle. We write the $PQ$-vector as $v = (p, q)$; without loss of generality we can assume that $p \ge 0$, $q \ge 0$ and $p$ is the nearest integer to $q\alpha$. If $q$ is large, then $\sqrt{1 + \alpha^2}\, q$ is very close to the diameter of this long and narrow rectangle. It means that $q$ is basically a size parameter of the rectangle. Assume that (5.304) holds:
\[
  \frac{q_i}{4} \le q < q_{i+1};
\]
then Eq. (5.306) tells us that the area of this long and narrow rectangle is
\[
  \ge \frac{1}{4 (a_i + 2)(1 + \alpha^2)},
\]
that is, the area is not too small if $a_i$ is not too large.
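The local rectangle property (5.306) can be tested exhaustively for small indices. The sketch below is an illustration under assumptions: the names are hypothetical, a finite continued fraction stands in for an irrational $\alpha$, and the sample digits start with $a_1 \ge 2$ so that $\alpha < 1/2$. For each $q$ in the range (5.304) it takes $p$ as the nearest integer to $q\alpha$, computes the area from (5.303) in exact arithmetic, and compares it with the lower bound.

```python
from fractions import Fraction


def cf_value(digits):
    """Exact value of [a_1, ..., a_L] = 1/(a_1 + 1/(a_2 + ...))."""
    x = Fraction(0)
    for a in reversed(digits):
        x = 1 / (a + x)
    return x


def denominators(digits):
    """q[k] for k = 1..L+1 with q_1 = 1, q_2 = a_1; index 0 is unused."""
    q = [None, 1, digits[0]]
    for k in range(3, len(digits) + 2):
        q.append(digits[k - 2] * q[k - 1] + q[k - 2])
    return q


def area_bound_holds(digits, i):
    """Check (5.306) for every q with q_i/4 <= q < q_{i+1}."""
    alpha = cf_value(digits)
    q = denominators(digits)
    a_i = digits[i - 1]
    bound = 1 / (4 * (a_i + 2) * (1 + alpha**2))
    start = max(1, -(-q[i] // 4))  # ceil(q_i / 4), but at least 1
    for qq in range(start, q[i + 1]):
        pp = round(qq * alpha)  # nearest integer to q * alpha
        area = (pp * alpha + qq) * abs(pp - qq * alpha) / (1 + alpha**2)
        if area < bound:
            return False
    return True


digits = [3, 10, 2, 50, 3, 7, 1, 20, 5, 9, 4, 6]  # arbitrary sample digits
print([area_bound_holds(digits, i) for i in range(1, 6)])
```

Note how the bound degrades for the large digit $a_4 = 50$ in this sample: the guaranteed area is roughly $1/(4 \cdot 52)$, which is the sense in which only small digits give a useful rectangle property.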
Therefore, we can rephrase (5.304) and (5.306) together in a nutshell as follows: a small digit $a_i$ yields the rectangle property locally. This means that we have a good chance to adapt the Riesz product technique. But first, for the convenience of the reader, I interrupt the argument and include a proof of Lemma 5.16 (which is surprisingly tricky).

Proof of Lemma 5.16. We recall (5.288):
\[
  p_i = a_{i-1} p_{i-1} + p_{i-2}, \qquad q_i = a_{i-1} q_{i-1} + q_{i-2},
  \tag{5.307}
\]
and because these recurrence relations hold for any $a_i$, including arbitrary real values, writing $\alpha = [a_1, \ldots, a_{k-1}, \alpha_k]$ with
\[
  \alpha_k = a_k + \cfrac{1}{a_{k+1} + \cfrac{1}{a_{k+2} + \cdots}} = [a_k; a_{k+1}, a_{k+2}, a_{k+3}, \ldots],
\]
we obtain the useful formula
\[
  \alpha = \frac{\alpha_k p_k + p_{k-1}}{\alpha_k q_k + q_{k-1}}.
  \tag{5.308}
\]
By using (5.308), we can rewrite $q_k \alpha - p_k$ as follows:
\[
  q_k \alpha - p_k = q_k \frac{\alpha_k p_k + p_{k-1}}{\alpha_k q_k + q_{k-1}} - p_k
  = \frac{q_k (\alpha_k p_k + p_{k-1}) - p_k (\alpha_k q_k + q_{k-1})}{\alpha_k q_k + q_{k-1}}
  = \frac{q_k p_{k-1} - p_k q_{k-1}}{\alpha_k q_k + q_{k-1}}.
  \tag{5.309}
\]
Note that
\[
  q_k p_{k-1} - p_k q_{k-1} = (a_{k-1} q_{k-1} + q_{k-2}) p_{k-1} - (a_{k-1} p_{k-1} + p_{k-2}) q_{k-1}
  = -(q_{k-1} p_{k-2} - p_{k-1} q_{k-2}).
  \tag{5.310}
\]
Since $p_1 = 0$, $q_1 = 1$ and $p_2 = 1$, $q_2 = a_1$, we have $q_2 p_1 - p_2 q_1 = -1$, and using it in (5.310), we obtain by induction
\[
  q_k p_{k-1} - p_k q_{k-1} = (-1)^{k-1}.
  \tag{5.311}
\]
Using this in (5.309), we have
\[
  q_k \alpha - p_k = \frac{(-1)^{k-1}}{\alpha_k q_k + q_{k-1}},
  \tag{5.312}
\]
which implies
\[
  \|q_k \alpha\| = |q_k \alpha - p_k| = \frac{1}{\alpha_k q_k + q_{k-1}} > \frac{1}{(a_k + 2) q_k}.
  \tag{5.313}
\]
It remains to prove that if $p, q$ are integers with $0 < q < q_{k+1}$ then
\[
  |q\alpha - p| \ge |q_k \alpha - p_k|.
  \tag{5.314}
\]
To prove (5.314), we define integers $u, v$ by the equations
\[
  p = u p_k + v p_{k+1}, \qquad q = u q_k + v q_{k+1}.
  \tag{5.315}
\]
Note that (5.315) is solvable in integers $u, v$, since the determinant is $\pm 1$, see (5.311). Since $0 < q < q_{k+1}$, $u \ne 0$. Moreover, if $v \ne 0$, then $u, v$ must have opposite signs. Since $q_k \alpha - p_k$ and $q_{k+1} \alpha - p_{k+1}$ also have opposite signs [see (5.312)], we conclude
\[
  |q\alpha - p| = |u (q_k \alpha - p_k) + v (q_{k+1} \alpha - p_{k+1})|
  = |u (q_k \alpha - p_k)| + |v (q_{k+1} \alpha - p_{k+1})|
  \ge |u (q_k \alpha - p_k)| \ge |q_k \alpha - p_k|,
\]
proving (5.314). Since (5.313) and (5.314) imply Lemma 5.16, we are done. □
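Both the determinant identity (5.311) and the statement of Lemma 5.16 lend themselves to a brute-force check. The sketch below is an illustration only, with hypothetical names. It assumes a finite continued fraction as a stand-in for an irrational $\alpha$, and it assumes sample digits with $a_1 \ge 2$, so that $\alpha < 1/2$ and $\|q_1 \alpha\| = |q_1 \alpha - p_1|$ as used in (5.313).

```python
from fractions import Fraction
import math


def cf_value(digits):
    """Exact value of [a_1, ..., a_L] = 1/(a_1 + 1/(a_2 + ...))."""
    x = Fraction(0)
    for a in reversed(digits):
        x = 1 / (a + x)
    return x


def convergents(digits):
    """Lists p, q with p[k]/q[k] = [a_1, ..., a_{k-1}]; index 0 is unused."""
    p = [None, 0, 1]
    q = [None, 1, digits[0]]
    for k in range(3, len(digits) + 2):
        a = digits[k - 2]  # a_{k-1}
        p.append(a * p[k - 1] + p[k - 2])
        q.append(a * q[k - 1] + q[k - 2])
    return p, q


def dist_to_nearest_int(x):
    """Return ||x|| for a Fraction x."""
    frac = x - math.floor(x)
    return min(frac, 1 - frac)


def lemma_5_16_holds(digits, max_i):
    """Check (5.311) for all k, and Lemma 5.16 for i = 1, ..., max_i."""
    alpha = cf_value(digits)
    p, q = convergents(digits)
    for k in range(2, len(digits) + 2):
        if q[k] * p[k - 1] - p[k] * q[k - 1] != (-1) ** (k - 1):
            return False
    for i in range(1, max_i + 1):
        a_i = digits[i - 1]
        d_i = dist_to_nearest_int(q[i] * alpha)
        if not d_i > Fraction(1, (a_i + 2) * q[i]):
            return False
        for qq in range(1, q[i + 1]):  # every q with 1 <= q < q_{i+1}
            if dist_to_nearest_int(qq * alpha) < d_i:
                return False
    return True


digits = [3, 10, 2, 50, 3, 7, 1, 20, 5, 9, 4, 6]  # arbitrary sample digits
print(lemma_5_16_holds(digits, 5))
```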
5.9.3 Step Three: Employing the Riesz Product Technique

Let’s go back to Theorem 5.11 and, what is basically the same, to Proposition 5.12. A trivial novelty is that in Sect. 5.9 the slope is $1/\alpha$ [note that in Theorem 5.11 the slope was $1/\sqrt{2}$ and in Proposition 5.12 the slope was 0 (=horizontal)]. The Riesz product (5.116) was defined by using some appropriate modified Rademacher functions $R_j(x) \in \mathcal{R}(j)$ for $j$ with $1 \le 2^j \le N$, i.e., for $\log N / \log 2 + O(1)$ values of $j$. In the hypothesis of Theorem 5.11 (and Proposition 5.12) we had the unrestricted rectangle property; here we have a “restricted Rectangle Property” instead, meaning that the rectangle property holds only for $\mathrm{const} \cdot \log N$ values of the power-of-two parameter $j$, where $0 \le j \le \log N / \log 2 + O(1)$. Indeed, by (5.295) and (5.296),
log Nk D log qk C O.1/ log
k Y
.ai C 1/ C O.1/ const log N;
i D1
and by (5.302), the continued fraction “digit” $a_i$ of $\alpha$ is “small” for at least $k/2$ values of $i$ in $1 \le i \le k$, if $k$ is sufficiently large. For these “small” values of $a_i$ the rectangle property holds in the power-of-two range around $q_i$, i.e., when $2^j \approx q_i$, see (5.304) and (5.306). This means we can easily save the Riesz product technique developed in Sects. 5.5–5.8; the minor price that we pay is a constant factor loss, due to the fact that $\log N / \log 2$ is replaced by $\mathrm{const} \cdot \log N$ where const is a (small) positive constant depending only on $\gamma > 0$. Thus we obtain the following result.

Lemma 5.17. Let $I = [a, b]$, $0 \le a < b < 1$ be an arbitrary subinterval of the unit interval. Assume that (5.294) holds. Then for all sufficiently large integers $N$, there is a subinterval $I_1 = [a_1, b_1]$, $a < a_1 < b_1 < b$ of $I$ ($I_1$ may depend on $N$) such that for all $\beta_1 \in I_1$,
\[
  F(\alpha; \beta_1; \gamma; N) > 2\gamma \log N + \delta' \log N
  \tag{5.316}
\]
where $\delta' = \delta'(\gamma) > 0$ is a positive constant depending only on $\gamma > 0$. Similarly, for all sufficiently large integers $N$, there is a subinterval $I_2 = [a_2, b_2]$, $a < a_2 < b_2 < b$ of $I$ ($I_2$ may depend on $N$) such that for all $\beta_2 \in I_2$,
\[
  F(\alpha; \beta_2; \gamma; N) < 2\gamma \log N - \delta' \log N,
\]
where $\delta' = \delta'(\gamma) > 0$ is the same positive constant as in (5.316).
5.9.4 Step Four: Constructing a Cantor Set

The last step is routine. Combining the method of nested intervals with Lemma 5.17, we can easily build an infinite binary tree of nested intervals the same way as we did in the proof of Theorem 5.3. The “divergence points” $\beta^*$ arise as the intersection of infinitely many decreasing intervals, which correspond to an infinite branch of the binary tree. Since a binary tree of countably infinite height has continuum many infinite branches, we obtain continuum many divergence points, proving Theorem 5.14 in Case 1.

Case 2: The inequality
\[
  \sum_{i=1}^{k} \lfloor \sqrt{\gamma a_i} \rfloor > 100 \cdot 2\gamma \log N_k
  \tag{5.317}
\]
holds for infinitely many integers $k \ge 1$, where $N_k = \lfloor \sqrt{\gamma a_k} \rfloor\, q_k$, see (5.292).
Equation (5.283) tells us that $2\gamma \log N_k$ is the average value of $F(\alpha; \beta; \gamma; N_k)$ as $\beta$ runs in the unit interval. On the other hand, combining (5.293) with (5.317), we have that
\[
  F(\alpha; \beta = 0; \gamma; N_k) > 100 \cdot 2\gamma \log N_k
  \tag{5.318}
\]
holds for infinitely many integers $k \ge 1$. In other words, for infinitely many values $N = N_k$, the homogeneous case $\beta = 0$ gives at least 100 times more integer solutions than the average value $2\gamma \log N_k$. This represents an “extreme bias”; in fact, an extreme surplus. Note that the proof of Theorem 5.3 was based on a somewhat similar “extreme bias”: a violation of the Area Principle. We mean the trivial fact that the Pell inequality $-1 < x^2 - 2y^2 < 1$ has no integer solution (except $x = 0 = y$), but the corresponding hyperbolic region has infinite area. The only difference is that in Theorem 5.3 we had an extreme shortage of solutions for the homogeneous case $\beta = 0$; here, on the other hand, we have an extreme surplus. But this difference is irrelevant for the method of nested intervals: it works in both cases. This means that in Case 2 we can simply repeat the Cantor set construction in the proof of Theorem 5.3. This completes the proof of Theorem 5.14. □

Theorem 5.14 is a qualitative result. In contrast, we finish Sect. 5.9 with a quantitative result.

Proposition 5.18. Let $\alpha > 0$ and $\gamma > 0$ be arbitrary real numbers. Then there is an effectively computable positive constant $\delta' = \delta'(\gamma) > 0$ depending only on $\gamma > 0$ such that for every sufficiently large integer $N$ there exist two real numbers $\beta_1(N)$, $\beta_2(N)$ in the unit interval $0 \le \beta_1(N) < \beta_2(N) < 1$ such that
\[
  |F(\alpha; \beta_1(N); \gamma; N) - F(\alpha; \beta_2(N); \gamma; N)| > \delta' \log N.
\]
We just outline the proof in a couple of sentences, since it is basically the same as that of Theorem 5.14 (without the Cantor set construction). Indeed, let $q_\ell \le N < q_{\ell+1}$. Since $q_{\ell+1} = a_\ell q_\ell + q_{\ell-1} \le (a_\ell + 1) q_\ell$, we have
\[
  1 \le \frac{N}{q_\ell} \le a_\ell + 1.
\]
Again we distinguish two cases. Case 1: We have `1 k X p ˘ jp ai C N=q` 100 2 log N: i D1
Then, by repeating the argument of Case 1 in the proof of Theorem 5.14, we obtain Proposition 5.18 (see Lemma 5.17).
Next comes Case 2: We have `1 X p
k ˘ jp ai C N=q` > 100 2 log N:
i D1
Then $F(\alpha;\beta=0;\gamma;N) > 100\cdot 2\gamma\log N$, and so we can choose $\beta_1(N)=0$. Finally, for $\beta_2(N)$ we can choose any “below average point”: any $\beta_2(N)=\beta$ with $F(\alpha;\beta;\gamma;N)\le(2\gamma+o(1))\log N$, see (5.283).
5.10 General Point Sets: Theorem 5.19

Let’s return to Theorem 5.11. What happens if we drop the “rectangle property” in Theorem 5.11, or (what is basically the same thing) in Proposition 5.12? Can we still prove “extra large deviations” for hyperbolic needles? This is what we discuss here.

Let $\mathcal{P}$ be a finite point set of density $\delta>0$ in a “large” square $[0,M]^2$, i.e., $|\mathcal{P}|=\delta M^2$. We just make a very mild technical assumption: we assume that $\mathcal{P}$ is “not clustered.” More precisely, we introduce a new concept called the “separation constant” $\sigma=\sigma(\mathcal{P})$. We say that $\mathcal{P}$ is $\sigma$-separated for some constant $\sigma\ge 0$ if the (usual Euclidean) distance between any two points of the set $\mathcal{P}$ is at least $\sigma$. For example, the set of lattice points in the plane is clearly 1-separated, i.e., $\sigma=1$.

Our basic idea is the following: we show that if $\mathcal{P}$ is $\sigma$-separated with some not too small constant $\sigma>0$, then the rectangle property holds, at least in a weak statistical sense, for the majority of the directions; we call them the “good” directions. (For example, in Theorem 5.11 the slope $1/\sqrt{2}$ is a concrete “good” direction.) This is how we will be able to save the Riesz product argument in the proof of Theorem 5.11 (or Proposition 5.12), and still prove “extra large deviations” (proportional to the area) for hyperbolic needles, at least for the majority of the directions. In the rest of the section we work out the details of this vague intuition; this will give us Theorem 5.19.

The obvious handicap of this “majority approach” is that, for an arbitrary point set $\mathcal{P}$ which is “not clustered,” we cannot predict whether a given concrete direction is “good” or not. Another, purely technical, shortcoming is that in Theorem 5.19 we cannot get rid of the assumption that $\mathcal{P}$ is “not clustered.” This technical difficulty is rather counterintuitive, since, at least at first sight, “clusters” actually seem to help create
“extra large deviations.” Nevertheless, some technical difficulties prevent me from adapting the Riesz product technique for “clustered” point sets $\mathcal{P}$. It remains an interesting open problem to decide whether or not in Theorem 5.19 the separation constant $\sigma=\sigma(\mathcal{P})$ plays any role.

In Theorem 5.19 we change the underlying set: we switch from the “large” square $[0,M]^2$ to the “large” disk
$$\mathrm{disk}(0;M)=\{\mathbf{x}\in\mathbb{R}^2:\ |\mathbf{x}|\le M\} \tag{5.319}$$
of radius $M$ centered at the origin. (The reason behind this change is rotation invariance: Theorems 5.3 and 5.11 were about the translated copies, and Theorem 5.19 is about the rotated and translated copies of the hyperbolic needle.)

Let $\mathcal{P}$ be a finite point set of density $\delta>0$ in the “large” disk $\mathrm{disk}(0;M)$, i.e., $|\mathcal{P}|=\delta\pi M^2$ (we assume that the radius $M$ is “large”). We also assume that $\mathcal{P}$ is “not clustered.” More precisely, we assume that $\mathcal{P}$ is $\sigma$-separated for some positive constant $\sigma=\sigma(\mathcal{P})>0$. The goal is to count the number of elements of $\mathcal{P}$ in the rotated and translated copies of our usual hyperbolic needle $H_\gamma(N)$.

Let $10^{-2}>\eta>0$ be a (small) positive real number (to be specified later). Let $j$ be an arbitrary integer in the interval $0\le j\le n$ where $2^n\approx N$, that is, $n=\log N/\log 2+O(1)$ (binary logarithm). We decompose the “large” disk $\mathrm{disk}(0;M)$ [see (5.319)] into disjoint translated copies of the small rectangle
$$[0,2^j\eta]\times[0,2^{-j}\eta], \tag{5.320}$$
i.e., we form a rectangle lattice starting from the origin. We just focus on the copies of (5.320) which are inside the large disk $\mathrm{disk}(0;M)$, i.e., we ignore the copies of (5.320) that intersect the boundary circle or are outside of the disk. Note that there are $O(2^jM)$ copies of (5.320) that intersect the boundary circle of the large disk. If $2^j=o(M)$ then there are $(1+o(1))\pi M^2\eta^{-2}$ copies of (5.320) that are inside the large disk $\mathrm{disk}(0;M)$; we call these translated copies of the small rectangle (5.320) $j$-cells. More precisely, we call them $j$-cells of angle 0.

In general, let $\theta$ be an arbitrary angle in $0\le\theta<\pi$. Let $\mathrm{Rot}_\theta$ denote the rotation of the plane by the angle $\theta$; we assume that the fixpoint of $\mathrm{Rot}_\theta$ is the origin. We decompose the “large” disk $\mathrm{disk}(0;M)$ into disjoint translates of the rotated copy
$$\mathrm{Rot}_\theta\left([0,2^j\eta]\times[0,2^{-j}\eta]\right) \tag{5.321}$$
of the small rectangle (5.320). We just focus on the translated copies of (5.321) which are inside the large disk $\mathrm{disk}(0;M)$. Again, if $2^j=o(M)$ then there are $(1+o(1))\pi M^2\eta^{-2}$ translated copies of (5.321) that are inside the large disk $\mathrm{disk}(0;M)$; we call these translated copies of the small rectangle (5.321) $j$-cells of angle $\theta$.
We want to prove, in a quantitative form, that if $\mathcal{P}$ is “not clustered”, then for a “typical” angle $\theta\in[0,\pi)$, the overwhelming majority of the $j$-cells of angle $\theta$ that contain at least one point from $\mathcal{P}$ contain exactly one point from $\mathcal{P}$. A quantitative result like this, a statistical version of the “rectangle property,” will serve as a substitute for the “rectangle property,” and it will suffice to save the Riesz product technique developed in Sects. 5.5–5.8.
5.10.1 Statistical Version of the Rectangle Property: An Average Argument

Let $P_{i_1},P_{i_2}\in\mathcal{P}$ ($i_1\ne i_2$) be an arbitrary pair of points. Define the “angle-set”
$$\mathrm{angle}(P_{i_1},P_{i_2};j)=\{\theta\in[0,\pi):\ \text{there is a } j\text{-cell of angle } \theta \text{ containing both } P_{i_1},P_{i_2}\}.$$
The angle-set $\mathrm{angle}(P_{i_1},P_{i_2};j)$ is clearly measurable; let $|\mathrm{angle}(P_{i_1},P_{i_2};j)|$ denote the usual one-dimensional Lebesgue measure (“length”). A simple geometric consideration shows that
$$|\mathrm{angle}(P_{i_1},P_{i_2};j)| < 2\pi\,\frac{2^{-j}\eta}{|P_{i_1}-P_{i_2}|}, \tag{5.322}$$
where $2^{-j}\eta$ is the length of the short side of a $j$-cell and $|P_{i_1}-P_{i_2}|$ is the (usual Euclidean) distance of the two points. The basic idea is to estimate the following double sum:
$$\sum_{P_{i_1},P_{i_2}\in\mathcal{P}:\ i_1\ne i_2}|\mathrm{angle}(P_{i_1},P_{i_2};j)| < \frac{1}{2}\sum_{P_{i_1}\in\mathcal{P}}\Bigg(\sum_{P_{i_2}\in\mathcal{P}:\ i_1\ne i_2} 2\pi\,\frac{2^{-j}\eta}{|P_{i_1}-P_{i_2}|}\Bigg) = \pi 2^{-j}\eta\sum_{P_{i_1}\in\mathcal{P}}\Bigg(\sum_{P_{i_2}\in\mathcal{P}:\ i_1\ne i_2}\frac{1}{|P_{i_1}-P_{i_2}|}\Bigg), \tag{5.323}$$
where in (5.323) we applied (5.322).
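Since $\mathcal{P}$ is $\sigma$-separated, it is easy to give an upper bound to the inner sum in (5.323). We use a standard power-of-two decomposition argument:
$$\sum_{P_{i_2}\in\mathcal{P}:\ i_1\ne i_2}\frac{1}{|P_{i_1}-P_{i_2}|} \le \sum_{1\le\ell\le L}\ \sum_{\substack{P_{i_2}\in\mathcal{P}:\ i_1\ne i_2\\ 2^{\ell-1}\sigma<|P_{i_1}-P_{i_2}|\le 2^{\ell}\sigma}}\frac{1}{|P_{i_1}-P_{i_2}|} \le \sum_{1\le\ell\le L}\frac{1}{2^{\ell-1}\sigma}\left(2^{\ell+1}\right)^2 = \sum_{1\le\ell\le L}\frac{8}{\sigma}2^{\ell} < \frac{16}{\sigma}2^{L}. \tag{5.324}$$
Here $L$ denotes the largest integer such that $2^L\sigma<2^{j+1}\eta$ (where $2^j\eta$ is the length of the long side of a $j$-cell), and the upper bound $\left(2^{\ell+1}\right)^2$ in (5.324) comes from the fact that a square of side $\sigma/2$ cannot contain two points from $\mathcal{P}$ (since $\mathcal{P}$ is $\sigma$-separated). Using the fact $2^L\sigma<2^{j+1}\eta$ in (5.324), we have
$$\sum_{P_{i_2}\in\mathcal{P}:\ i_1\ne i_2}\frac{1}{|P_{i_1}-P_{i_2}|} < \frac{16}{\sigma}2^{L} < \frac{16\cdot 2^{j+1}\eta}{\sigma^2} = \frac{2^5\,2^j\eta}{\sigma^2}. \tag{5.325}$$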
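The dyadic bound (5.324) can be sanity-checked numerically on the simplest separated set mentioned in the text, the integer lattice, which is 1-separated. The sketch below is only an illustration under that assumption ($\sigma=1$, base point at the origin); the function name `inverse_distance_sum` is hypothetical. For simplicity it also counts the four lattice points at distance exactly $\sigma$, which only makes the check more demanding.

```python
import math


def inverse_distance_sum(L: int) -> float:
    """Sum of 1/|P| over lattice points P in Z^2 with 0 < |P| <= 2**L.

    This plays the role of the inner sum in (5.323) for the 1-separated
    point set Z^2 (sigma = 1), with the base point at the origin.
    """
    radius = 2 ** L
    total = 0.0
    for x in range(-radius, radius + 1):
        for y in range(-radius, radius + 1):
            dist = math.hypot(x, y)
            if 0 < dist <= radius:
                total += 1.0 / dist
    return total


def dyadic_bound(L: int, sigma: float = 1.0) -> float:
    """Right-hand side of (5.324): (16 / sigma) * 2**L."""
    return 16.0 / sigma * 2 ** L


if __name__ == "__main__":
    for L in range(1, 6):
        print(L, round(inverse_distance_sum(L), 2), dyadic_bound(L))
```

For the lattice the sum grows roughly like $2\pi\cdot 2^L$, so the constant 16 in (5.324) leaves comfortable room; the point of (5.324) is only that the growth is linear in $2^L$.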
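Applying (5.325) in (5.323), and using the fact $|\mathcal{P}|=\delta\pi M^2$, we obtain
$$\sum_{P_{i_1},P_{i_2}\in\mathcal{P}:\ i_1\ne i_2}|\mathrm{angle}(P_{i_1},P_{i_2};j)| < \pi 2^{-j}\eta\,|\mathcal{P}|\,\frac{2^5\,2^j\eta}{\sigma^2} = \frac{2^5\pi^2\eta^2\delta M^2}{\sigma^2}. \tag{5.326}$$
We recall that the disk $\mathrm{disk}(0;M)$ of radius $M$ contains $(1+o(1))\pi M^2\eta^{-2}$ $j$-cells of a given angle $\theta$, and $\theta$ runs in the interval $0\le\theta<\pi$. It is natural, therefore, to consider the following renorming of (5.326):
$$\text{average}=\frac{1}{\pi^2M^2\eta^{-2}}\sum_{P_{i_1},P_{i_2}\in\mathcal{P}:\ i_1\ne i_2}|\mathrm{angle}(P_{i_1},P_{i_2};j)| < \frac{1}{\pi^2M^2\eta^{-2}}\cdot\frac{2^5\pi^2\eta^2\delta M^2}{\sigma^2} = \frac{2^5\eta^4\delta}{\sigma^2}. \tag{5.327}$$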
5.10.2 Consequences of Inequality (5.327)

Let’s go back to Sect. 5.8. I recall that the last step in the proof of Proposition 5.12 (i.e., basically the proof of Theorem 5.11) was to choose the parameters $\eta_1$ and $\eta_2$ as sufficiently small positive constants independent of $M$ and $N$, see (5.263) (in view of (5.117), $\eta_1$ and $\eta_2$ are almost equal). Similarly to that, here we assume that $\gamma$ (=parameter of the hyperbolic needle), $\delta$ (=density of $\mathcal{P}$) and $\sigma$ (=separation constant of $\mathcal{P}$) are fixed positive constants, and consider $\eta$ (which of course plays the role of $\eta_1,\eta_2$) as a parameter that we will eventually choose as a sufficiently small positive constant independent of $M$ and $N$.

Since the area of a $j$-cell is $\eta^2$, we can roughly say that the “probability” that a $j$-cell of any angle contains a point from $\mathcal{P}$ is
$$\text{density}\times\text{area}=\delta\cdot\eta^2=\mathrm{const}\cdot\eta^2. \tag{5.328}$$
On the other hand, in view of (5.327), the “probability” that a $j$-cell of any angle contains exactly two points from $\mathcal{P}$ is $\mathrm{const}\cdot\eta^4$, which is negligible compared to the $\mathrm{const}\cdot\eta^2$ in (5.328) if $\eta$ is small enough. In general, the “probability” that a $j$-cell of any angle contains exactly $p$ points from $\mathcal{P}$ where $2^\ell<p\le 2^{\ell+1}$ ($\ell=1,2,3,\ldots$) is $\mathrm{const}\cdot\eta^4 4^{-\ell}$, where the constant factor const is independent of $\ell$. Indeed, $p$ points from $\mathcal{P}$ means that we can choose $\binom{p}{2}$ pairs $P_{i_1},P_{i_2}$, implying that those “rich” $j$-cells show up with multiplicity
$$\binom{p}{2} > 2^\ell\,2^{\ell-1}=\frac{1}{2}4^\ell$$
in (5.327), explaining the factor $4^{-\ell}$ in $\mathrm{const}\cdot\eta^4 4^{-\ell}$. The point here is that even the sum of the products
$$\sum_{\ell\ge 1}2^{\ell+1}\eta^4 4^{-\ell}$$
is negligible compared to the $\mathrm{const}\cdot\eta^2$ in (5.328) if $\eta$ is small enough.

Summarizing, we can say that (5.327) implies the following general picture about the distribution of the elements of $\mathcal{P}$ in the $j$-cells of any angle. Let $\theta\in[0,\pi)$ be a “typical” angle, and consider the $j$-cells of angle $\theta$. The overwhelming majority of the points $P\in\mathcal{P}$ turn out to be “singles”: if the point $P\in\mathcal{P}$ is contained by some $j$-cell $C$ of angle $\theta$, then $C$ does not contain any other point from $\mathcal{P}$. Here the vague term “overwhelming majority” in fact has the quantitative meaning “$1-O(\eta^2)$ part of $\mathcal{P}$.” (Of course, $1-O(\eta^2)$ is almost 1 if $\eta$ is small.) Furthermore, “rich” $j$-cells turn out to be very rare in the following sense. Let $\ell\ge 0$ be a fixed integer; the proportion of the $j$-cells $C$ of angle $\theta$ containing $p$ points from $\mathcal{P}$ where $2^\ell<p\le 2^{\ell+1}$ compared to those $j$-cells which contain
at least one point from $\mathcal{P}$ is $\mathrm{const}\cdot\eta^2 4^{-\ell}$, where the constant factor const is independent of $\ell$. Since $2^\ell$ is negligible compared to $4^\ell$ if $\ell$ is large, the term “very rare” is well justified.

We can say, therefore, that a weaker statistical version of the “rectangle property” holds for the majority of the angles $\theta\in[0,\pi)$, assuming $\eta>0$ is a sufficiently small constant depending only on the positive constants $\gamma$ (=parameter of the hyperbolic needle), $\delta$ (=density of $\mathcal{P}$), and $\sigma$ (=separation constant of $\mathcal{P}$). A simple analysis of the Riesz product argument in Sects. 5.5–5.8 shows that this weaker statistical version of the “rectangle property” is a good substitute (for the strict “rectangle property”), and thus we can prove the following result.

Theorem 5.19. Let $\mathcal{P}$ be a finite set of points in the disk $\mathrm{disk}(0;M)$ with density $\delta$, i.e., the number of elements of $\mathcal{P}$ is $|\mathcal{P}|=\delta\cdot\pi M^2$. We study the number of elements of $\mathcal{P}$ in the rotated and translated copies of the hyperbolic needle $H_\gamma(N)$. Assume that $\mathcal{P}$ is $\sigma$-separated with some $\sigma>0$. Furthermore, assume that both $N$ and $M/N$ are sufficiently large depending only on $\gamma$, $\delta$, and $\sigma$. Then there is a measurable subset $A\subset[0,2\pi)$ such that $A$ is larger than (say) 99 % of the interval $[0,2\pi)$ (i.e., the Lebesgue measure of $A$ is larger than $\frac{99}{100}2\pi$), and for every angle $\theta\in A$ there is a translate $\mathbf{x}_1+\mathrm{Rot}_\theta H_\gamma(N)$ of the rotated copy $\mathrm{Rot}_\theta H_\gamma(N)$ of the hyperbolic needle $H_\gamma(N)$, rotated by angle $\theta$, such that $\mathbf{x}_1+\mathrm{Rot}_\theta H_\gamma(N)\subset\mathrm{disk}(0;M)$ and
$$|\mathcal{P}\cap(\mathbf{x}_1+\mathrm{Rot}_\theta H_\gamma(N))|\ge\delta\cdot 2\gamma\log N+\delta'\log N, \tag{5.329}$$
where $\delta'=\delta'(\gamma,\sigma,\delta)>0$ is a positive constant, independent of $N$ and $M$. Similarly, there is another translate $\mathbf{x}_2+\mathrm{Rot}_\theta H_\gamma(N)$ of the rotated copy $\mathrm{Rot}_\theta H_\gamma(N)$ of the hyperbolic needle $H_\gamma(N)$ such that $\mathbf{x}_2+\mathrm{Rot}_\theta H_\gamma(N)\subset\mathrm{disk}(0;M)$ and
$$|\mathcal{P}\cap(\mathbf{x}_2+\mathrm{Rot}_\theta H_\gamma(N))|\le\delta\cdot 2\gamma\log N-\delta'\log N, \tag{5.330}$$
where $\delta'=\delta'(\gamma,\sigma,\delta)>0$ is the same positive constant as in (5.329).

As we said at the beginning of this section, it is reasonable to guess that clusters just help to create extra large fluctuations. This intuition motivates the following

Open Problem. Can one prove a version of Theorem 5.19 which has no reference to the separation constant $\sigma=\sigma(\mathcal{P})$? In other words, can we simply drop $\sigma=\sigma(\mathcal{P})$ from the conditions of Theorem 5.19?

We guess the answer is yes, but, unfortunately, we cannot prove it. Finally, we briefly mention a closely related problem, where we cannot drop the separation constant $\sigma=\sigma(\mathcal{P})$ from the conditions. Note that Theorems 5.3, 5.11, and 5.14 were all about the extra large fluctuations of the “measure-theoretic discrepancy,” meaning the number of points of $\mathcal{P}$ minus the expectation (=density times the area). What we study last is the large fluctuations of the “±1-discrepancy”
(or 2-coloring discrepancy). It means that we have an arbitrary “2-coloring” $\varphi:\mathcal{P}\to\{-1,+1\}$ of the given point set $\mathcal{P}$ (say, “+1” represents red and “−1” represents blue). Extra large fluctuations of the ±1-discrepancy means that there is a translated (or rotated and translated) copy $H'$ of the hyperbolic needle $H_\gamma(N)$ such that
$$\sum_{P\in\mathcal{P}\cap H'}\varphi(P) > \mathrm{const}\cdot\mathrm{area}(H')=\mathrm{const}\cdot\log N>0$$
with some positive constant, and similarly there is another translated (or rotated and translated) copy $H''$ of the hyperbolic needle $H_\gamma(N)$ with negative ±1-discrepancy:
$$\sum_{P\in\mathcal{P}\cap H''}\varphi(P) < \mathrm{const}\cdot\mathrm{area}(H'')=\mathrm{const}\cdot\log N<0$$
with some negative constant. Note that the Riesz product technique can be easily adapted to prove extra large fluctuations for the ±1-discrepancy. For example, here is the ±1-discrepancy analog of Proposition 5.12 (=basically Theorem 5.11).

Proposition 5.20 (“±1-discrepancy for translated copies”). Let $\mathcal{P}$ be a finite set of points in the square $[0,M]^2$ with density $\delta$, i.e., the number of elements of $\mathcal{P}$ is $|\mathcal{P}|=\delta\cdot M^2$. Let $\varphi:\mathcal{P}\to\{-1,+1\}$ be an arbitrary “2-coloring” of the point set $\mathcal{P}$. We study the ±1-discrepancy
$$\sum_{P\in\mathcal{P}\cap H}\varphi(P)$$
for the translated copies $H$ of the hyperbolic needle $H_\gamma(N)$. Assume that $\mathcal{P}$ satisfies the following “rectangle property”: there is a positive constant $c_1=c_1(\mathcal{P})>0$ such that every axes-parallel rectangle of area $c_1$ contains at most one element of the set $\mathcal{P}$. Furthermore, assume that both $N$ and $M/N$ are “large” in the precise sense of (5.334). Then there is a translated copy $H=\mathbf{x}_1+H_\gamma(N)$ of the hyperbolic needle $H_\gamma(N)$ such that $H\subset[0,M]^2$ and
$$\sum_{P\in\mathcal{P}\cap H}\varphi(P)\ge\delta'\log N, \tag{5.331}$$
where $\delta'=\delta'(c_1,\gamma,\delta)>0$ is a positive constant, independent of $N$ and $M$, to be specified below in (5.333). Similarly, there is another translated copy $H=\mathbf{x}_2+H_\gamma(N)$ of the hyperbolic needle $H_\gamma(N)$ such that $H\subset[0,M]^2$ and
$$\sum_{P\in\mathcal{P}\cap H}\varphi(P)\le-\delta'\log N, \tag{5.332}$$
with the same ı 0 D ı 0 .c1 ; ; ı/ > 0 as in (5.331); namely, 0
0
ı D ı .c1 ; ; ı/ D 10
12
107 c1 107 c12 p ı min : ; c1 ; ; 20 2 2
(5.333)
Finally, the assumption that both N and M=N are “large” goes as follows:
10 C 1
N 2
;
1 N < 2n N; 2
C 1 .N C 2 / o: n p M > 1011 7 107 c 2 ı c1 min 20 ; c1 ; 10 2 c1 ; 2 1
(5.334)
As we said, the proof is a straightforward adaptation of the arguments in Sects. 5.5–5.8. Similarly, one can easily prove the following analog of Theorem 5.19.

Proposition 5.21 (“±1-discrepancy for rotated and translated copies”). Let $\mathcal{P}$ be a finite set of points in the disk $\mathrm{disk}(0;M)$ with density $\delta$, i.e., the number of elements of $\mathcal{P}$ is $|\mathcal{P}|=\delta\cdot\pi M^2$. Let $\varphi:\mathcal{P}\to\{-1,+1\}$ be an arbitrary “2-coloring” of the point set $\mathcal{P}$. We study the ±1-discrepancy
$$\sum_{P\in\mathcal{P}\cap H}\varphi(P)$$
for the rotated and translated copies $H$ of the hyperbolic needle $H_\gamma(N)$. Assume that $\mathcal{P}$ is $\sigma$-separated with some $\sigma>0$. Furthermore, assume that both $N$ and $M/N$ are sufficiently large depending only on $\gamma$, $\delta$, and $\sigma$. Then there is a measurable subset $A\subset[0,2\pi)$ such that $A$ is larger than (say) 99 % of the interval $[0,2\pi)$ (i.e., the Lebesgue measure of $A$ is larger than $\frac{99}{100}2\pi$), and for every angle $\theta\in A$ there is a translate $H=\mathbf{x}_1+\mathrm{Rot}_\theta H_\gamma(N)$ of the rotated copy $\mathrm{Rot}_\theta H_\gamma(N)$ of the hyperbolic needle $H_\gamma(N)$, rotated by angle $\theta$, such that $H\subset\mathrm{disk}(0;M)$ and
$$\sum_{P\in\mathcal{P}\cap H}\varphi(P)\ge\delta'\log N, \tag{5.335}$$
where $\delta'=\delta'(\gamma,\sigma,\delta)>0$ is a positive constant, independent of $N$ and $M$. Similarly, there is another translate $H=\mathbf{x}_2+\mathrm{Rot}_\theta H_\gamma(N)$ of the rotated copy $\mathrm{Rot}_\theta H_\gamma(N)$ such that $H\subset\mathrm{disk}(0;M)$ and
$$\sum_{P\in\mathcal{P}\cap H}\varphi(P)\le-\delta'\log N, \tag{5.336}$$
where $\delta'=\delta'(\gamma,\sigma,\delta)>0$ is the same positive constant as in (5.335).
We want to point out that in Proposition 5.21, which is about the ±1-discrepancy of hyperbolic needles, we definitely need some extra condition implying “$\mathcal{P}$ is not too clustered.” Indeed, it is easy to construct an extremely clustered point set $\mathcal{P}$ for which the ±1-discrepancy of the hyperbolic needles is negligible. For example, we can start with a “typical” point set in general position and split up every point into a pair of points extremely close to each other. The two points in each extremely close pair are joined with a straight line segment; we refer to these line segments as the “very short line segments.” Consider the particular 2-coloring of the point set where the two points in each extremely close pair have different “colors”: one is +1 and the other one is −1. We can easily guarantee that this particular 2-coloring has negligible ±1-discrepancy for the family of all hyperbolic needles congruent to $H_\gamma(N)$. If the original point set was in general position and the point pairs are close enough, then each arc of any congruent copy of $H_\gamma(N)$ intersects at most two “very short line segments.” Since the boundary of $H_\gamma(N)$ consists of four arcs, the ±1-discrepancy is at most $4\cdot 2=8$, which is indeed negligible.
5.11 The Area Principle in General

Proof of Theorem 5.7. We use the theory of continued fractions. This is of course not surprising, since the complete solution of the homogeneous inequality (5.57), or (5.18), was determined by Euler and Lagrange exactly by using the tool of continued fractions. We note in advance that the last step in the proof is an application of the Chebyshev inequality.

We use the Ostrowski representation of integers with respect to any fixed irrational $0<\alpha<1$, given by the continued fraction
$$\alpha=\cfrac{1}{a_1+\cfrac{1}{a_2+\cdots}}=[a_1,a_2,a_3,\ldots],$$
$[a_1,a_2,\ldots,a_{k-1}]=p_k/q_k$ with $q_1=1$, $q_2=a_1$, $q_n=a_{n-1}q_{n-1}+q_{n-2}$ for all $n\ge 3$. Since $q_n=a_{n-1}q_{n-1}+q_{n-2}$, every positive integer $n$ can be written in the form
$$n=\sum_{i=1}^{k}d_iq_i,\quad d_i \text{ are integers}, \tag{5.337}$$
where $0\le d_i\le a_i$ (see [Os]).

An analog of the Ostrowski representation of integers can be developed for the representation of the real number $\beta$. Write $\varepsilon_n=q_n\alpha-p_n$; then
$$\varepsilon_n=a_{n-1}\varepsilon_{n-1}+\varepsilon_{n-2}. \tag{5.338}$$
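The representation (5.337) can be produced greedily, starting from the largest denominator. The following is a minimal sketch, assuming the indexing of the text ($q_1=1$, $q_2=a_1$, $q_n=a_{n-1}q_{n-1}+q_{n-2}$) and a finite list of partial quotients; the function names `denominators` and `ostrowski_digits` are hypothetical and not taken from the book.

```python
from typing import List


def denominators(a: List[int]) -> List[int]:
    """Return [q_1, ..., q_{K+1}] for partial quotients a = [a_1, ..., a_K].

    Indexing as in the text: q_1 = 1, q_2 = a_1 and
    q_n = a_{n-1} q_{n-1} + q_{n-2} for n >= 3.
    """
    q = [1, a[0]]
    for n in range(2, len(a) + 1):
        # a[n - 1] is a_n (1-based); the new entry is q_{n+1}.
        q.append(a[n - 1] * q[-1] + q[-2])
    return q


def ostrowski_digits(n: int, a: List[int]) -> List[int]:
    """Greedy digits [d_1, ..., d_K] with n = sum d_i q_i and 0 <= d_i <= a_i."""
    q = denominators(a)
    if not 0 <= n < q[-1]:
        raise ValueError("n must satisfy 0 <= n < q_{K+1}")
    digits = [0] * len(a)
    for i in range(len(a), 0, -1):  # i = K, K-1, ..., 1
        digits[i - 1], n = divmod(n, q[i - 1])
    return digits


if __name__ == "__main__":
    a = [2] * 10  # alpha = sqrt(2) - 1 = [2, 2, 2, ...]
    print(denominators(a)[:7])
    print(ostrowski_digits(100, a))
```

The greedy choice keeps every digit within range because the remainder after step $i$ is smaller than $q_i\le(a_{i-1}+1)q_{i-1}$.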
Note that
$$\varepsilon_n=(-1)^{n-1}|\varepsilon_n|,\quad\text{and}\quad|\varepsilon_{n-2}|=a_{n-1}|\varepsilon_{n-1}|+|\varepsilon_n|. \tag{5.339}$$
In the theorem we can assume without loss of generality that $0<\alpha<1$, so $\varepsilon_1=\alpha>0$ and $\varepsilon_2=a_1\alpha-1<0$. Now every real number $\beta$ in the interval $-\alpha\le\beta<1-\alpha$ of length one (any interval of length one is fine, since the theorem is about modulo one) can be written in the form
$$\beta=\sum_{i=1}^{\infty}b_i\varepsilon_i,\quad b_i \text{ are integers}, \tag{5.340}$$
where $0\le b_1\le a_1-1$ and $0\le b_i\le a_i$ for $i\ge 2$. We can make representation (5.340) unique by enforcing the Extra Rule
$$b_i=a_i \text{ implies } b_{i-1}=0 \text{ for all } i\ge 2, \tag{5.341}$$
and we also require that
$$b_{2i+1}\ne a_{2i+1} \text{ for infinitely many } i. \tag{5.342}$$
Note that the minimum value of representation (5.340)–(5.342) is attained at
$$a_2\varepsilon_2+a_4\varepsilon_4+a_6\varepsilon_6+\cdots=(-\varepsilon_1+\varepsilon_3)+(-\varepsilon_3+\varepsilon_5)+(-\varepsilon_5+\varepsilon_7)+\cdots=-\varepsilon_1=-\alpha, \tag{5.343}$$
and similarly the maximum value of representation (5.340)–(5.342) is attained at
$$(a_1-1)\varepsilon_1+a_3\varepsilon_3+a_5\varepsilon_5+\cdots=(a_1-1)\varepsilon_1+(-\varepsilon_2+\varepsilon_4)+(-\varepsilon_4+\varepsilon_6)+\cdots=(a_1-1)\varepsilon_1-\varepsilon_2=(a_1-1)\alpha-(a_1\alpha-1)=1-\alpha, \tag{5.344}$$
but because of (5.342), equality in (5.344) cannot occur. This explains the interval $-\alpha\le\beta<1-\alpha$.

Inserted Remark. Note that representation (5.340)–(5.342) was independently introduced by Cassels [Ca2], Descombes [De], and Sós [So1], and it was constantly used by Sós in her research on the irregularities of the irrational rotation (see, e.g., [So2, So3]).
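By (5.337) and (5.340) (we use $\equiv$ to indicate equality modulo one)
$$n\alpha-\beta=\sum_{i=1}^{k}d_iq_i\alpha-\sum_{i=1}^{\infty}b_i\varepsilon_i\equiv\sum_{i=1}^{k}d_i(q_i\alpha-p_i)-\sum_{i=1}^{\infty}b_i(q_i\alpha-p_i)\equiv\sum_{i=1}^{k}(d_i-b_i)\varepsilon_i-\sum_{j>k}b_j\varepsilon_j\pmod 1. \tag{5.345}$$
The term $\|n\alpha-\beta\|$ is particularly small if
$$d_i=b_i\ \text{ for } 1\le i\le k \tag{5.346}$$
and also
$$0=b_{k+1}=b_{k+2}=\cdots=b_{k+\ell}, \tag{5.347}$$
meaning a relatively long zero-block of $\ell$ consecutive coefficients $b_j$ (the same idea as in Sect. 5.4). By (5.345)–(5.347)
$$\|n\alpha-\beta\|\le\Bigg|\sum_{j>k+\ell}b_j\varepsilon_j\Bigg|; \tag{5.348}$$
the larger $\ell$, the better inequality (5.348). First we need the technical

Lemma 5.22. If $b_m\ne 0$ then
$$\Bigg|\sum_{j=m}^{\infty}b_j\varepsilon_j\Bigg|\le b_m|\varepsilon_m|+|\varepsilon_{m+1}|. \tag{5.349}$$

Proof. We have
$$(-1)^{m-1}\Bigg(\sum_{j=m}^{\infty}b_j\varepsilon_j\Bigg)=b_m|\varepsilon_m|-b_{m+1}|\varepsilon_{m+1}|+b_{m+2}|\varepsilon_{m+2}|-b_{m+3}|\varepsilon_{m+3}|\pm\cdots\ge b_m|\varepsilon_m|-b_{m+1}|\varepsilon_{m+1}|-b_{m+3}|\varepsilon_{m+3}|-b_{m+5}|\varepsilon_{m+5}|-\cdots. \tag{5.350}$$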
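Since $b_m\ne 0$ we have $b_{m+1}\le a_{m+1}-1$, and using the recurrence formula (5.339): $|\varepsilon_{n-2}|=a_{n-1}|\varepsilon_{n-1}|+|\varepsilon_n|$ repeatedly, we obtain
$$|\varepsilon_m|-b_{m+1}|\varepsilon_{m+1}|\ge|\varepsilon_{m+1}|+|\varepsilon_{m+2}|,$$
$$|\varepsilon_{m+2}|-b_{m+3}|\varepsilon_{m+3}|\ge|\varepsilon_{m+4}|,$$
$$|\varepsilon_{m+4}|-b_{m+5}|\varepsilon_{m+5}|\ge|\varepsilon_{m+6}|,$$
and so on. Applying these inequalities in (5.350), we have
$$(-1)^{m-1}\Bigg(\sum_{j=m}^{\infty}b_j\varepsilon_j\Bigg)\ge(b_m-1)|\varepsilon_m|+|\varepsilon_{m+1}|. \tag{5.351}$$
On the other hand, by a telescoping sum argument
$$(-1)^{m-1}\Bigg(\sum_{j=m}^{\infty}b_j\varepsilon_j\Bigg)\le b_m|\varepsilon_m|+b_{m+2}|\varepsilon_{m+2}|+b_{m+4}|\varepsilon_{m+4}|+\cdots\le b_m|\varepsilon_m|+(|\varepsilon_{m+1}|-|\varepsilon_{m+3}|)+(|\varepsilon_{m+3}|-|\varepsilon_{m+5}|)+(|\varepsilon_{m+5}|-|\varepsilon_{m+7}|)+\cdots=b_m|\varepsilon_m|+|\varepsilon_{m+1}|. \tag{5.352}$$
Equations (5.351) and (5.352) prove Lemma 5.22. □

We recall the following well-known fact from the theory of continued fractions:
$$\left|\alpha-\frac{p_m}{q_m}\right|<\frac{1}{q_mq_{m+1}}\iff|\varepsilon_m|=|q_m\alpha-p_m|<\frac{1}{q_{m+1}}. \tag{5.353}$$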
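The sign pattern and recurrence in (5.339) and the bound (5.353) are easy to check numerically for a concrete irrational. The sketch below does this for $\alpha=\sqrt{2}-1=[2,2,2,\ldots]$ in floating point; the choice of $\alpha$, the cut-off at twelve partial quotients and the helper names are assumptions made for illustration only.

```python
import math
from typing import List, Tuple


def convergents(a: List[int]) -> Tuple[List[int], List[int]]:
    """Return ([p_1..p_{K+1}], [q_1..q_{K+1}]) with the indexing of the text.

    p_1/q_1 = 0/1, p_2/q_2 = 1/a_1, and both sequences satisfy
    x_n = a_{n-1} x_{n-1} + x_{n-2} for n >= 3.
    """
    p, q = [0, 1], [1, a[0]]
    for n in range(2, len(a) + 1):
        p.append(a[n - 1] * p[-1] + p[-2])
        q.append(a[n - 1] * q[-1] + q[-2])
    return p, q


def errors(a: List[int], alpha: float) -> List[float]:
    """Return [eps_1, ..., eps_K] where eps_n = q_n * alpha - p_n."""
    p, q = convergents(a)
    return [q[i] * alpha - p[i] for i in range(len(a))]


if __name__ == "__main__":
    a = [2] * 12
    alpha = math.sqrt(2) - 1
    _, q = convergents(a)
    for n, e in enumerate(errors(a, alpha), start=1):
        # |eps_n| should stay below 1 / q_{n+1}, and signs should alternate.
        print(n, e, 1 / q[n])
```

Floating point is adequate here only because the denominators stay small; for long expansions exact or high-precision arithmetic would be needed.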
By Lemma 5.22 and (5.353) we have the following upper bound in (5.348):
$$\|n\alpha-\beta\|<\frac{1+b_{k+\ell+1}}{q_{k+\ell+2}}, \tag{5.354}$$
assuming $b_{k+\ell+1}\ne 0$ and (5.347) holds. Condition (5.347) defines an integer $n$ such that
$$b_kq_k\le n=\sum_{i=1}^{k}b_iq_i\le(b_k+2)q_k. \tag{5.355}$$
Now assume that the Area Principle fails for the homogeneous inequality (5.354); then by (5.353)
$$\psi(q_m)<\frac{1}{q_{m+1}}\quad\text{for all } m\ge m_0. \tag{5.356}$$
Let $B_1\in\{1,\ldots,a_k\}$ be fixed with $k\ge m_0$, and, motivated by (5.355), we find a $j=j(B_1)$ such that
$$q_j<\frac{1}{\psi((B_1+2)q_k)}<q_{j+1}. \tag{5.357}$$
By (5.356)
$$\frac{1}{\psi((B_1+2)q_k)}>\frac{1}{\psi(q_k)}>q_{k+1},$$
implying j D j.B1 / k C 1. We choose a B2 2 f1; : : : ; aj g such that qj C1 1=10 ..B1 C 2/qk / B2
1 : ..B1 C 2/qk /
(5.358)
Since $j=j(B_1)\ge k+1$, with some appropriate integer $\ell\ge 0$ we can write $j=k+1+\ell$, and define the set $S(b_k=B_1;b_{k+\ell+1}=B_2)$ as the following subset of $[-\alpha,1-\alpha)$ [see expansion (5.340)]:
$$S(b_k=B_1;b_{k+\ell+1}=B_2)=\{\beta\in[-\alpha,1-\alpha):\ b_k=B_1,\ 0=b_{k+1}=\cdots=b_{k+\ell},\ b_{k+\ell+1}=B_2\}. \tag{5.359}$$
If $\beta\in S(b_k=B_1;b_{k+\ell+1}=B_2)$ [see (5.357)–(5.359)] then by (5.354), (5.355), (5.358) the inhomogeneous inequality
$$\|n\alpha-\beta\|=O(\psi(n)),\ \text{where the implicit constant is absolute}, \tag{5.360}$$
has an integral solution $n$ with
$$B_1q_k\le n\le(B_1+2)q_k. \tag{5.361}$$
Next we compute the Lebesgue measure $\mathrm{meas}(S)$ of the sets $S=S(b_k=B_1;b_{k+\ell+1}=B_2)$ defined by (5.357)–(5.359).

Lemma 5.23. With any $B_2\in\{1,\ldots,a_{k+\ell+1}\}$ we have
$$\mathrm{meas}\left(S(b_k=B_1;b_{k+\ell+1}=B_2)\right)=\begin{cases}q_k|\varepsilon_{k+\ell+1}|,&\text{if } B_1\ne a_k;\\ q_{k-1}|\varepsilon_{k+\ell+1}|,&\text{if } B_1=a_k.\end{cases}$$
Proof. Let $\#(b_1,\ldots,b_{k-1})$ denote the number of permissible sequences $(b_1,\ldots,b_{k-1})$ satisfying (5.340)–(5.342). Clearly $\#(b_1)=a_1=q_2$, $\#(b_1,b_2)=a_1a_2+1=q_3$, and $\#(b_1,\ldots,b_{k-1})$ satisfies the same recurrence as $q_i$: $q_i=a_{i-1}q_{i-1}+q_{i-2}$, and so we have
$$\#(b_1,\ldots,b_{k-1})=\begin{cases}q_k,&\text{if } b_k=B_1\ne a_k;\\ q_{k-1},&\text{if } b_k=B_1=a_k.\end{cases} \tag{5.362}$$
Next we study the tail series
$$\sum_{i=k+\ell+2}^{\infty}b_i\varepsilon_i=\rho. \tag{5.363}$$
Since bkC`C1 D B2 ¤ 0, we have 0 bkC`C2 akC`C2 1. Repeating the argument (5.343) and (5.344) we have .1/kC` .akC`C2 1/jkC`C2 j C jkC`C3 j;
(5.364)
and also .1/kC` jkC`C2 j
(5.365)
[note that (5.365) is analogous to (5.343), and (5.364) is analogous to (5.344)]. It follows that the tail series (5.363) covers an interval of length akC`C2 jkC`C2 j C jkC`C3 j D jkC`C1 j: Equations (5.362) and (5.366) prove Lemma 5.23. Next we estimate the total sum of the measures: X
meas .S.bk D B1 ; bkC`C1 D B2 //
(5.357)–(5.359)W km0 ak X X X qk or qk1 const qkC`C2 B D1 B km0
const
1
2
ak X X qk or qk1 ..B1 C 2/qk /qkC`C2 D qkC`C2 B D1
km0
1
(5.366) t u
D const
X km0
0 @
aX k 1
1 qk ..B1 C 2/qk / C qk1 ..B1 C 2/qk /A
B1 D1
const
X
.n/ D 1;
(5.367)
nqm0
where we used Lemma 5.23, (5.358), $m_0$ is defined by (5.356), and as usual, const stands for a positive absolute constant factor. In view of (5.360) and (5.361) it suffices to show that almost every $\beta\in[-\alpha,1-\alpha)$ is contained by infinitely many sets $S(b_k=B_1;b_{k+\ell+1}=B_2)$ defined by (5.357)–(5.359). Equation (5.367) was the first step in this direction. But we also need information about the Lebesgue measure of the pairwise intersections
$$S(b_{k_1}=B_1;b_{k_1+\ell_1+1}=B_2)\cap S(b_{k_2}=B_3;b_{k_2+\ell_2+1}=B_4). \tag{5.368}$$
We can assume $k_1<k_2$; then intersection (5.368) is the empty set, unless $k_1+\ell_1+1<k_2$, or possibly $k_1+\ell_1+1=k_2$, $B_2=B_3$. Let $d=k_2-k_1-\ell_1-1$ denote the “distance”; we prove that (5.368) is exponentially close to the product rule in terms of the distance $d$. This means “exponentially weak dependence,” a phenomenon well known among the experts of continued fractions. For example, this fact has been constantly used by Sós in her research concerning the “strong irregularities” of the irrational rotation, see [So3]. The following useful counting lemma is taken from Sós’s paper.

Lemma 5.24. For every $r\le t$, let $A_{r,t}(B)$ denote the number of sequences $(b_r,b_{r+1},\ldots,b_t)$ such that $b_r=B\in\{1,\ldots,a_r\}$, $0\le b_i\le a_i$ and $b_i=a_i$ implies $b_{i-1}=0$ for every $i$ in $r<i\le t$. Then
$$A_{r,t}(B)=q_{t+1}|\varepsilon_r|+(-1)^{t-r}q_r|\varepsilon_{t+1}|. \tag{5.369}$$

Proof. By definition $A_{r,r}(B)=1$. We double-check (5.369) in the special case $t=r$ by computing the right-hand side of (5.369), using the sign rule $|\varepsilon_n|=(-1)^{n-1}\varepsilon_n$ from (5.339):
$$q_{r+1}|q_r\alpha-p_r|+q_r|q_{r+1}\alpha-p_{r+1}|=q_{r+1}(-1)^{r-1}(q_r\alpha-p_r)+q_r(-1)^{r}(q_{r+1}\alpha-p_{r+1})=(-1)^{r-1}(p_{r+1}q_r-q_{r+1}p_r)=1,$$
proving (5.369) in the simplest case $t=r$.
We also have $A_{r,r+1}(B)=a_{r+1}$, and
$$q_{r+2}(-1)^{r-1}(q_r\alpha-p_r)+(-1)^{(r+1)-r}q_r(-1)^{r+1}(q_{r+2}\alpha-p_{r+2})=(-1)^{r-1}(p_{r+2}q_r-q_{r+2}p_r)=$$
$$=(-1)^{r-1}\left((a_{r+1}p_{r+1}+p_r)q_r-(a_{r+1}q_{r+1}+q_r)p_r\right)=(-1)^{r-1}a_{r+1}(p_{r+1}q_r-q_{r+1}p_r)=a_{r+1},$$
proving (5.369) for $t=r+1$. Since $b_i=a_i$ implies $b_{i-1}=0$, we have the recurrence relation
$$A_{r,t}(B)=a_tA_{r,t-1}(B)+A_{r,t-2}(B)\quad\text{for all } t>r+1. \tag{5.370}$$
Now we are ready to prove (5.369) by induction on $(t-r)$. We have
$$A_{r,t-j}(B)=q_{t-j+1}|\varepsilon_r|+(-1)^{t-j-r}q_r|\varepsilon_{t-j+1}|$$
for both $j=1,2$, and returning to (5.370), we conclude
$$A_{r,t}(B)=a_t\left(q_t|\varepsilon_r|+(-1)^{t-1-r}q_r|\varepsilon_t|\right)+q_{t-1}|\varepsilon_r|+(-1)^{t-2-r}q_r|\varepsilon_{t-1}|=$$
$$=|\varepsilon_r|(a_tq_t+q_{t-1})+(-1)^{t-r}q_r\left(-a_t|\varepsilon_t|+|\varepsilon_{t-1}|\right)=q_{t+1}|\varepsilon_r|+(-1)^{t-r}q_r|\varepsilon_{t+1}|,$$
proving (5.369), and this completes the proof of Lemma 5.24. □
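Lemma 5.24 can be tested by brute force on small cases. Substituting $\varepsilon_n=q_n\alpha-p_n$ and $|\varepsilon_n|=(-1)^{n-1}\varepsilon_n$ into (5.369) eliminates $\alpha$ and leaves the integer $(-1)^{r-1}(q_rp_{t+1}-q_{t+1}p_r)$, so the comparison needs no floating point. The sketch below enumerates the sequences directly; it assumes the indexing of the text ($p_1/q_1=0/1$, $p_2/q_2=1/a_1$), and all function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple


def convergents(a: List[int]) -> Tuple[List[int], List[int]]:
    """Return ([p_1..p_{K+1}], [q_1..q_{K+1}]); x_n = a_{n-1} x_{n-1} + x_{n-2}."""
    p, q = [0, 1], [1, a[0]]
    for n in range(2, len(a) + 1):
        p.append(a[n - 1] * p[-1] + p[-2])
        q.append(a[n - 1] * q[-1] + q[-2])
    return p, q


def count_sequences(a: List[int], r: int, t: int, B: int) -> int:
    """Brute-force A_{r,t}(B): sequences (b_r, ..., b_t) with b_r = B,
    0 <= b_i <= a_i, and b_i = a_i forcing b_{i-1} = 0 for r < i <= t."""
    ranges = [range(a[i - 1] + 1) for i in range(r + 1, t + 1)]
    total = 0
    for tail in product(*ranges):
        b = (B,) + tail  # b[k] is b_{r+k}
        ok = all(
            not (b[k] == a[r + k - 1] and b[k - 1] != 0)
            for k in range(1, len(b))
        )
        total += ok
    return total


def formula(a: List[int], r: int, t: int) -> int:
    """Right-hand side of (5.369) with alpha eliminated:
    (-1)^(r-1) * (q_r p_{t+1} - q_{t+1} p_r)."""
    p, q = convergents(a)
    return (-1) ** (r - 1) * (q[r - 1] * p[t] - q[t] * p[r - 1])


if __name__ == "__main__":
    a = [2, 1, 3, 2, 1]
    for r in range(1, len(a) + 1):
        for t in range(r, len(a) + 1):
            print(r, t, count_sequences(a, r, t, 1), formula(a, r, t))
```

As the lemma states, the count does not depend on the particular value of $B\ge 1$; only the positions $r$ and $t$ matter.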
Now it is easy to compute the measure of the intersection (5.368). First assume that the distance $d=k_2-k_1-\ell_1-1$ is $\ge 1$. We know from the proof of Lemma 5.23 that the number of permissible sequences $(b_1,b_2,\ldots,b_{k_1-1})$ satisfying (5.340)–(5.342) is $q_{k_1}$ if $b_{k_1}=B_1\ne a_{k_1}$ and $q_{k_1-1}$ if $b_{k_1}=B_1=a_{k_1}$. By Lemma 5.24 the number of permissible sequences
$$(b_{k_1+\ell_1+1}=B_2\ne 0,\ b_{k_1+\ell_1+2},\ldots,b_{k_2-1})$$
of length $d$ is
$$q_{k_2}|\varepsilon_{k_1+\ell_1+1}|+(-1)^{d+1}q_{k_1+\ell_1+1}|\varepsilon_{k_2}|\quad\text{if } b_{k_2}=B_3\ne a_{k_2}$$
and
$$q_{k_2-1}|\varepsilon_{k_1+\ell_1+1}|+(-1)^{d}q_{k_1+\ell_1+1}|\varepsilon_{k_2-1}|\quad\text{if } b_{k_2}=B_3=a_{k_2}.$$
Finally, note that, just like in Lemma 5.23, the tail series
$$\sum_{i=k_2+\ell_2+2}^{\infty}b_i\varepsilon_i$$
completely fills out an interval of length $|\varepsilon_{k_2+\ell_2+1}|$. Write
$$X=S(b_{k_1}=B_1;b_{k_1+\ell_1+1}=B_2) \tag{5.371a}$$
and
$$Y=S(b_{k_2}=B_3;b_{k_2+\ell_2+1}=B_4). \tag{5.371b}$$

Lemma 5.25. We have
$$\frac{|\mathrm{meas}(X\cap Y)-\mathrm{meas}(X)\mathrm{meas}(Y)|}{\mathrm{meas}(X)\mathrm{meas}(Y)}\le 2^{2-d},$$
where $d=k_2-(k_1+\ell_1+1)\ge 1$ is the “distance”.

Proof. We distinguish four cases. We begin with

Case 1: Assume that $d=k_2-k_1-\ell_1-1$ is $\ge 1$, $B_1\ne a_{k_1}$, $B_3\ne a_{k_2}$.

Then we have
$$\mathrm{meas}(X\cap Y)=q_{k_1}\left(q_{k_2}|\varepsilon_{k_1+\ell_1+1}|+(-1)^{d+1}q_{k_1+\ell_1+1}|\varepsilon_{k_2}|\right)|\varepsilon_{k_2+\ell_2+1}|.$$
On the other hand, by Lemma 5.23,
$$\mathrm{meas}(X)=q_{k_1}|\varepsilon_{k_1+\ell_1+1}|\quad\text{and}\quad\mathrm{meas}(Y)=q_{k_2}|\varepsilon_{k_2+\ell_2+1}|.$$
It follows that
$$\frac{|\mathrm{meas}(X\cap Y)-\mathrm{meas}(X)\mathrm{meas}(Y)|}{\mathrm{meas}(X)\mathrm{meas}(Y)}=\frac{q_{k_1+\ell_1+1}|\varepsilon_{k_2}|}{q_{k_2}|\varepsilon_{k_1+\ell_1+1}|}. \tag{5.372}$$
We need the almost trivial inequality
$$\frac{q_{i+d}}{q_i}\ge 2^{\lfloor d/2\rfloor}, \tag{5.373a}$$
which follows from the successive application of the recurrence
$$q_i=a_{i-1}q_{i-1}+q_{i-2}\ge q_{i-1}+q_{i-2}\ge 2q_{i-2},$$
and we also need the following analog of (5.373a):
$$\frac{|\varepsilon_i|}{|\varepsilon_{i+d}|}\ge 2^{\lfloor d/2\rfloor}. \tag{5.373b}$$
By (5.372) and (5.373), we have
$$\frac{|\mathrm{meas}(X\cap Y)-\mathrm{meas}(X)\mathrm{meas}(Y)|}{\mathrm{meas}(X)\mathrm{meas}(Y)}\le 2^{1-d}, \tag{5.374}$$
where $d=k_2-(k_1+\ell_1+1)\ge 1$ is the “distance.” Inequality (5.374) justifies the term exponentially weak dependence, which is the reason behind the Area Principle (a “zero–one law”).

Case 2: Assume that $d=k_2-(k_1+\ell_1+1)\ge 1$, $B_1=a_{k_1}$, $B_3=a_{k_2}$.

Then [see (5.371)]
$$\mathrm{meas}(X\cap Y)=q_{k_1-1}\left(q_{k_2-1}|\varepsilon_{k_1+\ell_1+1}|+(-1)^{d}q_{k_1+\ell_1+1}|\varepsilon_{k_2-1}|\right)|\varepsilon_{k_2+\ell_2+1}|,$$
and by Lemma 5.23,
$$\mathrm{meas}(X)=q_{k_1-1}|\varepsilon_{k_1+\ell_1+1}|\quad\text{and}\quad\mathrm{meas}(Y)=q_{k_2-1}|\varepsilon_{k_2+\ell_2+1}|.$$
Combining these facts with (5.373), we obtain
$$\frac{|\mathrm{meas}(X\cap Y)-\mathrm{meas}(X)\mathrm{meas}(Y)|}{\mathrm{meas}(X)\mathrm{meas}(Y)}=\frac{q_{k_1+\ell_1+1}|\varepsilon_{k_2-1}|}{q_{k_2-1}|\varepsilon_{k_1+\ell_1+1}|}\le 2^{2-d}, \tag{5.375}$$
which is basically the same as (5.374) (we lost an irrelevant factor of 2). It is easy to check that (5.375) remains true for the remaining two cases with $d\ge 1$: Case 3: $B_1\ne a_{k_1}$, $B_3=a_{k_2}$ and Case 4: $B_1=a_{k_1}$, $B_3\ne a_{k_2}$. In all four cases we have exponentially weak dependence. This completes the proof of Lemma 5.25. □

Now we are ready to complete the proof of Theorem 5.7: we simply use the exponentially weak dependence in Chebyshev’s inequality as follows. (The most difficult part is to find a good notation.) Let $\chi_{k,\ell,B_1,B_2}$ denote the characteristic function of the set $S(b_k=B_1;b_{k+\ell+1}=B_2)$ defined by (5.357)–(5.359):
$$\chi_{k,\ell,B_1,B_2}(\beta)=\begin{cases}1,&\text{if } \beta\in S(b_k=B_1;b_{k+\ell+1}=B_2);\\ 0,&\text{if } \beta\notin S(b_k=B_1;b_{k+\ell+1}=B_2).\end{cases}$$
We have a probabilistic viewpoint: the interval $-\alpha\le\beta<1-\alpha$ of length one is considered the whole probability space, and the usual “length” (one-dimensional Lebesgue measure), denoted by $\mathrm{meas}(\ldots)$, is the probability. So the expectation
$$E\chi_{k,\ell,B_1,B_2}=\mathrm{meas}\left(S(b_k=B_1;b_{k+\ell+1}=B_2)\right),$$
and the sum [see (5.356)]
$$\sum_{m_0\le k\le M-2}\chi_{k,\ell,B_1,B_2}(\beta) \tag{5.376}$$
counts the number of integral solutions of the diophantine inequality
$$\|n\alpha-\beta\|=O(\psi(n)) \tag{5.377}$$
(the implicit constant in (5.377) is absolute) in the range $1\le n\le q_M$, since by (5.361)
$$B_1q_k\le n\le(B_1+2)q_k\le q_M.$$
Here $M$ is a parameter; we choose $M\to\infty$ at the end of the proof. To apply Chebyshev’s inequality, we need to compute the variance
$$E\Bigg(\sum_{m_0\le k\le M-2}\left(\chi_{k,\ell,B_1,B_2}-E\chi_{k,\ell,B_1,B_2}\right)\Bigg)^2=\sum_{m_0\le k\le M-2}E\left(\chi_{k,\ell,B_1,B_2}-E\chi_{k,\ell,B_1,B_2}\right)^2+2\sum_{m_0\le k_1\le k_2\le M-2}E\left(\chi_{k_1,\ell_1,B_1,B_2}-E_1\right)\left(\chi_{k_2,\ell_2,B_3,B_4}-E_2\right), \tag{5.378}$$
where, for notational convenience, we use the brief notation
$$E_1=E\chi_{k_1,\ell_1,B_1,B_2}\quad\text{and}\quad E_2=E\chi_{k_2,\ell_2,B_3,B_4}.$$
Write
$$A_1=S(b_{k_1}=B_1;b_{k_1+\ell_1+1}=B_2)\quad\text{and}\quad A_2=S(b_{k_2}=B_3;b_{k_2+\ell_2+1}=B_4). \tag{5.379}$$
Note that with $k_1\le k_2$ we have
$$E\chi_{A_1}\chi_{A_2}=\mathrm{meas}(A_1\cap A_2)\ \begin{cases}=0,&\text{if } k_2\le k_1+\ell_1;\\ \ne 0,&\text{if } k_2>k_1+\ell_1+1 \text{ or } k_2=k_1+\ell_1+1,\ B_2=B_3.\end{cases}$$
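By (5.374) and (5.375)
$$|E\chi_{A_1}\chi_{A_2}-\Pr(A_1)\Pr(A_2)|\le 2^{2-d}\Pr(A_1)\Pr(A_2), \tag{5.380}$$
where $d=k_2-(k_1+\ell_1+1)\ge 1$. Using these facts in (5.378), we have
$$\text{Variance in (5.378)}\le\sum_{m_0\le k_1\le M-2}\Pr(A_1)+{\sum}_1+{\sum}_2, \tag{5.381}$$
where
$${\sum}_1=\sum_{A_1:\ m_0\le k_1\le M-2}\ \sum_{\substack{A_2:\ k_1+\ell_1+1=k_2\le M-2\\ B_2=B_3}}\Pr(A_1\cap A_2) \tag{5.382}$$
and, by (5.380),
$${\sum}_2=\sum_{A_1:\ m_0\le k_1\le M-2}\Pr(A_1)\Bigg(\sum_{d\ge 1}\ \sum_{A_2:\ k_1+\ell_1+1+d=k_2\le M-2}\Pr(A_2)\cdot 2^{2-d}\Bigg). \tag{5.383}$$
Since the sets $A_2$ with fixed $k_2$ are pairwise disjoint, we have
$${\sum}_1\le\sum_{m_0\le k_1\le M-2}\Pr(A_1), \tag{5.384}$$
and similarly
$${\sum}_2\le\sum_{m_0\le k_1\le M-2}\Pr(A_1)\Bigg(\sum_{d\ge 1}2^{2-d}\Bigg)=4\sum_{m_0\le k_1\le M-2}\Pr(A_1). \tag{5.385}$$
Combining (5.381)–(5.385) we obtain
$$\text{Variance in (5.378)}\le 6\sum_{m_0\le k_1\le M-2}\Pr(A_1). \tag{5.386}$$
By Chebyshev’s inequality and (5.386), for any $\lambda>0$
$$\Pr\Bigg[\Bigg|\sum_{m_0\le k_1\le M-2}\chi_{A_1}-\sum_{m_0\le k_1\le M-2}\Pr(A_1)\Bigg|\ge\lambda\Bigg]\le\lambda^{-2}\Bigg(6\sum_{m_0\le k_1\le M-2}\Pr(A_1)\Bigg). \tag{5.387}$$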
Write T D T .M / D
X
Pr.A1 /;
m0 k1 M 2
then by (5.367) and (5.379), T D T .M / ! 1 as M ! 1:
(5.388)
We choose D .M / D
1 T .M /; 2
then by (5.387), 2 Pr 4
X m0 k1 M 2
A1
3 1 24 T .M /5 1 : 2 T .M /
(5.389)
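The arithmetic behind (5.385) and (5.389) can be checked mechanically. The following Python sketch is only an illustration: the function names are hypothetical, and it assumes the variance bound of (5.386) in the form "variance at most 6T" together with a deviation threshold equal to T/2, as chosen in the text.

```python
from fractions import Fraction

def geometric_tail(terms: int) -> Fraction:
    """Partial sum of sum_{d>=1} 2^(2-d), the series appearing in (5.385)."""
    return sum(Fraction(4, 2 ** d) for d in range(1, terms + 1))

def chebyshev_bound(T: Fraction) -> Fraction:
    """Chebyshev bound variance / threshold^2, assuming variance <= 6T and threshold T/2."""
    variance_bound = 6 * T
    threshold = T / 2
    return variance_bound / threshold ** 2

# The partial sums approach 4 from below.
print(float(geometric_tail(30)))
# The bound equals 24/T, so it tends to 0 as T grows.
for T in (Fraction(48), Fraction(480), Fraction(4800)):
    print(T, chebyshev_bound(T))
```

Exact rational arithmetic is used so that the identity "bound = 24/T" is visible without rounding noise.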
Taking M ! 1, by (5.388) and (5.389) we obtain X km0
k;`;B1 ;B2 .ˇ/ D
X
A1 D 1
km0
for almost every ˇ 2 Œ˛; 1 ˛/, and by (5.376) and (5.377) this gives infinitely many integral solutions of the diophantine inequality kn˛ ˇk D O. .n//:
(5.390)
Since the implicit constant in (5.390) is absolute, the proof of Theorem 5.7 is complete. □
Chapter 6
More on Randomness
6.1 Starting the Proof of Theorem 5.4: Blocks-and-Gaps Decomposition p We recall [see (5.31)] that F . 2I ˇI I N / denotes the number of lattice points in the long and narrow p hyperbolapsegment (“hyperbolic needle”) located along the line y D .x C ˇ/= 2 of slope 1= 2 p ˚ H . 2I ˇI N / D .x; y/ 2 ZZ2 W .x C ˇ/2 2y 2 ; 0 < y N; x > 0 : (6.1) p In the special case ˇ D 0 the line is y D x= 2 passing through the origin, and we simply write p p ˚ H . 2I N / D H . 2I 0I N / D .x; y/ 2 ZZ2 W x 2 2y 2 ; 0 < y N; x > 0g :
(6.2)
In Theorem 5.4 we study the case where ˇ runs in the unit interval 0 ˇ < 1; then ˇ ˇ
p p ˇ ˇ F . 2I ˇI I N / D ˇ H . 2I N / v.ˇ/ \ ZZ2 ˇ ;
(6.3)
where we use the standard notation that $S + v$ means the translated copy of a set $S$, translated by the vector $v$, and in (6.3) the vector is $v(\beta) = (\beta, 0)$. We also recall the well-known fact that the set of all positive integral solutions $(p_i, q_i) \in \mathbb{Z}^2$ of Pell's equation $x^2 - 2y^2 = \pm 1$ forms a cyclic group generated by the least positive solution; formally,
\[ p_i \pm q_i\sqrt{2} = (1 \pm \sqrt{2})^i, \quad i \ge 0, \]
© Springer International Publishing Switzerland 2014 J. Beck, Probabilistic Diophantine Approximation, Springer Monographs in Mathematics, DOI 10.1007/978-3-319-10741-7__6
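where all positive integral solutions of $x^2 - 2y^2 = 1$ are given by
\[ p_{2i} \pm q_{2i}\sqrt{2} = (1 \pm \sqrt{2})^{2i} \]
and all of $x^2 - 2y^2 = -1$ by
\[ p_{2i+1} \pm q_{2i+1}\sqrt{2} = (1 \pm \sqrt{2})^{2i+1}. \]
It follows that
\[ p_i = \frac{1}{2}\left( (1 + \sqrt{2})^i + (1 - \sqrt{2})^i \right) \quad\text{and}\quad q_i = \frac{1}{2\sqrt{2}}\left( (1 + \sqrt{2})^i - (1 - \sqrt{2})^i \right), \tag{6.4} \]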
and in particular we have $(p_0, q_0) = (1, 0)$, $(p_1, q_1) = (1, 1)$, $(p_2, q_2) = (3, 2)$, $(p_3, q_3) = (7, 5)$, and so on. For every integer $i \ge 0$ we define a “hyperbolic triangle” $T_i = T_i(\gamma) = T_i(\sqrt{2}; \gamma)$ as follows. Let $L_i$ denote the half line starting from the origin $(0, 0)$ and passing through the lattice point $(p_i, q_i)$. The “hyperbolic triangle” $T_i = T_i(\gamma)$ is bounded by the lines $L_i$, $L_{i+2}$ and the hyperbola $x^2 - 2y^2 = -\gamma$ in the positive quadrant if $i \ge 1$ is odd, and bounded by the lines $L_i$, $L_{i+2}$ and the hyperbola $x^2 - 2y^2 = \gamma$ in the positive quadrant if $i \ge 0$ is even. This means that $T_i = T_i(\gamma)$ is below or above the line $y = x/\sqrt{2}$ depending on whether $i \ge 0$ is even or odd. Note that $T_i = T_i(\gamma)$ has vertices $(0, 0)$, $(p_i, q_i)$, and $(p_{i+2}, q_{i+2})$.
We also use the fact that the matrix $A = \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}$ is a fundamental automorphism of $\pm(x^2 - 2y^2)$ (indeed, with $(x_1, y_1) = (x + 2y, x + y)$ we have $x_1^2 - 2y_1^2 = (x + 2y)^2 - 2(x + y)^2 = -(x^2 - 2y^2)$), and the powers
\[ A^i = \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}^i, \quad i \in \mathbb{Z}, \]
give rise to infinitely many automorphisms preserving the lattice points and the area. In particular, we have
\[ A \begin{pmatrix} p_i \\ q_i \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} p_i \\ q_i \end{pmatrix} = \begin{pmatrix} p_{i+1} \\ q_{i+1} \end{pmatrix}, \]
which implies $A T_i = T_{i+1}$. Thus we have $T_i = A^i T_0$, and in general $A^j T_i = T_{i+j}$. The matrix $A$ has determinant $-1$ (explaining why it preserves the area), and all hyperbolic triangles have the same area
\[ \operatorname{area}(T_i(\gamma)) = \frac{\gamma \log(1 + \sqrt{2})}{\sqrt{2}}. \tag{6.5} \]
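The recursion $(p_{i+1}, q_{i+1})^T = A (p_i, q_i)^T$ and the closed form (6.4) are easy to cross-check numerically. The Python sketch below is only an illustration; the helper names `apply`, `pell_solutions` and `closed_form` are hypothetical and do not come from the text.

```python
from math import sqrt

A = ((1, 2), (1, 1))  # the matrix A from the text

def apply(m, v):
    """Multiply a 2x2 integer matrix by a column vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

def pell_solutions(count):
    """Return (p_0, q_0), ..., (p_{count-1}, q_{count-1}) by iterating A from (1, 0)."""
    v = (1, 0)
    out = []
    for _ in range(count):
        out.append(v)
        v = apply(A, v)
    return out

def closed_form(i):
    """Evaluate (6.4) in floating point and round to the nearest integers."""
    a, b = (1 + sqrt(2)) ** i, (1 - sqrt(2)) ** i
    return round((a + b) / 2), round((a - b) / (2 * sqrt(2)))

sols = pell_solutions(12)
for i, (p, q) in enumerate(sols):
    assert p * p - 2 * q * q == (-1) ** i   # the norm alternates between +1 and -1
    assert closed_form(i) == (p, q)         # agrees with (6.4)

det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(sols[:4], det)
```

The alternating sign of $p_i^2 - 2q_i^2$ is the same sign change that makes $A$ swap the two hyperbolas $x^2 - 2y^2 = \pm\gamma$.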
What we are interested in is the one-dimensional family of translations by the vectors $v(\beta) = (\beta, 0)$, $0 \le \beta < 1$ [see (6.3)]; nevertheless it turns out to be very useful to involve an extra dimension, and to study translations by all two-dimensional vectors $v \in \mathbb{R}^2$ (so we can take advantage of the rich geometry of the plane). This explains why we focus on the lattice point counting function
\[ f_i(v) = \left| (T_i - v) \cap \mathbb{Z}^2 \right|, \quad v \in \mathbb{R}^2, \tag{6.6} \]
where $T_i = T_i(\gamma)$. Since $\mathbb{Z}^2$ is periodic, the function $f_i(v)$ is defined on the unit torus $v \in [0, 1)^2 = \mathbb{R}^2/\mathbb{Z}^2$. The fact $A^j T_i = T_{i+j}$ implies that $f_i(v)$, $v \in [0, 1)^2$, $i = 0, 1, 2, 3, \ldots$, is a stationary sequence. This term in probability theory means that the joint cumulative distribution is invariant under the time shift, which in this special case is equivalent to
\[ \operatorname{area}\left\{ v \in [0, 1)^2 : f_i(v) \le a_0, f_{i+1}(v) \le a_1, \ldots, f_{i+\ell}(v) \le a_\ell \right\} = \operatorname{area}\left\{ v \in [0, 1)^2 : f_{j+i}(v) \le a_0, f_{j+i+1}(v) \le a_1, \ldots, f_{j+i+\ell}(v) \le a_\ell \right\} \]
for all integers $i, \ell \ge 0$, $j \ge 1$ and reals $a_0, a_1, \ldots, a_\ell$, where $j$ is the time shift. Classical probability theory is mainly about independent random variables. The study of mixing stationary processes in discrete (and continuous) time came up later as a natural extension of independent identically distributed random variables. It has been well known since the 1960s (or perhaps even earlier) that a discrete stationary process with exponentially fast mixing exhibits a central limit theorem (CLT). Exponentially fast mixing in our special case would mean the following:
\[ \sup_{(E_1, E_2) \text{ with time gap } j:\ \Pr[E_1] > 0} \left| \Pr[E_2 \mid E_1] - \Pr[E_2] \right| \le c^{-j} \quad \text{with some } c > 1 \text{ for all } j \ge 1, \tag{6.7} \]
where the pair $(E_1, E_2)$ runs through all possible events of the form
\[ E_1 = \left\{ v \in [0, 1)^2 : f_i(v) \le a_0, f_{i+1}(v) \le a_1, \ldots, f_{i+\ell}(v) \le a_\ell \right\}, \]
\[ E_2 = \left\{ v \in [0, 1)^2 : f_{j+i}(v) \le a_0, f_{j+i+1}(v) \le a_1, \ldots, f_{j+i+\ell}(v) \le a_\ell \right\} \]
with time gap $j$, and of course
\[ \Pr[E_2 \mid E_1] = \frac{\Pr[E_1 \cap E_2]}{\Pr[E_1]} \]
denotes the conditional probability with Pr = area = two-dimensional Lebesgue measure.
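To make the counting function $f_i(v)$ concrete, the following Python sketch counts lattice points in translated hyperbolic triangles and compares the average over $v \in [0,1)^2$ with the common area (6.5). It is a rough illustration under stated assumptions: $T_0(\gamma)$ is taken to be the region between the $x$-axis and the half line through $(3, 2)$ cut off by $x^2 - 2y^2 \le \gamma$, the boundary conventions are chosen arbitrarily, $f_i(v)$ is read as the number of lattice points $n$ with $n + v \in T_i$, and all function names are hypothetical.

```python
from math import sqrt, log, ceil

A_INV = ((-1, 2), (1, -1))  # inverse of A = ((1, 2), (1, 1))

def in_T0(x, y, gamma):
    """Membership in T_0(gamma): between L_0 (the x-axis) and L_2 (through (3, 2)),
    cut off by the hyperbola x^2 - 2y^2 = gamma. Boundary conventions are an assumption."""
    return x > 0 and 0 <= y and 3 * y <= 2 * x and x * x - 2 * y * y <= gamma

def in_Ti(x, y, i, gamma):
    """Membership in T_i = A^i T_0, tested by pulling the point back with A^{-i}."""
    for _ in range(i):
        x, y = A_INV[0][0] * x + A_INV[0][1] * y, A_INV[1][0] * x + A_INV[1][1] * y
    return in_T0(x, y, gamma)

def f(i, v, gamma):
    """Number of lattice points n with n + v in T_i, for v in [0, 1)^2."""
    p, q = 1, 0
    for _ in range(i + 2):              # (p, q) becomes (p_{i+2}, q_{i+2})
        p, q = p + 2 * q, p + q
    count = 0
    # T_i lies in the box [0, sqrt(gamma) p] x [0, sqrt(gamma) q].
    for nx in range(-1, ceil(sqrt(gamma) * p) + 1):
        for ny in range(-1, ceil(sqrt(gamma) * q) + 1):
            if in_Ti(nx + v[0], ny + v[1], i, gamma):
                count += 1
    return count

def mean_f(i, gamma, grid=40):
    """Average of f_i over the midpoints of a grid x grid partition of the unit square."""
    total = 0
    for a in range(grid):
        for b in range(grid):
            total += f(i, ((a + 0.5) / grid, (b + 0.5) / grid), gamma)
    return total / grid ** 2

gamma = 2.0
area = gamma * log(1 + sqrt(2)) / sqrt(2)
for i in range(2):
    print(i, round(mean_f(i, gamma), 3), round(area, 3))
```

The grid average only approximates the integral of $f_i$ over the torus, so the two printed numbers should be close but need not coincide.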
Unfortunately we cannot prove (6.7) (it may be false). This means we don’t see any shortcut way to prove our CLT (Theorem 5.4) by directly applying some existing result in probability theory. What we can prove is the weaker version of (6.7):
\[ \left| \Pr[E_2 \mid E_1] - \Pr[E_2] \right| \le c^{-j} \quad \text{with some } c > 1 \text{ for all } j \ge 1 \tag{6.8} \]
holds for the “majority” of the pairs E1 ; E2 of events with PrŒE1 > 0 and time gap j . We refer to (6.8) as “exponentially fast majority mixing.” Unfortunately it is a long, nontrivial technical task to make “exponentially fast majority mixing” precise, and to derive from it a CLT. To do so, we borrow a decomposition technique from probability theory. It goes back to the works of S.N. Bernstein in the 1920s; we call it a “blocks-and-gaps” decomposition. Sections 6.1 and 6.2 are about the application of this method. We summarize the results of this method at the beginning of Sect. 6.3 in Lemma 6.3. (A reader in rush may jump ahead to Lemma 6.3 right now.) Another idea is to employ “Rademacher like functions”. Let 0 r0 < r1 < r2 < r3 < : : : be an arbitrary sequence of integers. A sequence '1 .x/; '2 .x/; '3 .x/; : : : of functions defined on the unit interval 0 x < 1 is called a sequence of Rademacher like functions of type 0 r0 < r1 < r2 < r3 < : : : if the following two properties hold: 1. 'j .x/ is a step function such that it is constant on every subinterval a2rj x < .a C 1/2rj , 0 a < 2rj integer, j 1; 2. the distribution of 'j .x/ on the longer subinterval a2rj 1 x < .a C 1/2rj 1 is independent of the value of a, where 0 a < 2rj 1 integer. It is obvious from the definition that a sequence of Rademacher like functions forms a sequence of independent random variables. Let 0 1 < 2 be arbitrary integers, and consider the lattice point counting function representing a “block” [see (6.6)] f .1 ; 2 I v/ D
X 1 i 2
fi .v/ D
X ˇ ˇ ˇ.Ti v/ \ ZZ2 ˇ ; v 2 R I 2;
(6.9)
1 i 2
where Ti D Ti . /. Since ZZ2 is periodic, the function f .1 ; 2 I v/ is actually defined on the unit torus v 2 Œ0; 1/2 D R I 2 =ZZ2 . p p 12 has eigenvalues 1 C 2 and 1 2; the eigenvector The matrix A D 11 p p . 2; 1/ of 1 C 2 represents the magnifying p p direction for the positive powers of A, and the eigenvector . 2; 1/ of 1 2 represents the “shrinking” direction. 2 The magnifying direction explains why we tilt the p half-open unit square Œ0; 1/ in such a way that the vertical side has slope 1= p2, thatp is, we consider the halfopen parallelogram with vertices .0; 0/; .1; 0/; . 2; 1/; . 2 C 1; 1/; let P0 denote this half-open parallelogram. Notice that P0 is equivalent to the unit square Œ0; 1/2 modulo one, i.e., the distribution of (6.9) is exactly the same as that of
f .1 ; 2 I v/ D
X
fi .v/ D
1 i 2
X ˇ ˇ ˇ.Ti v/ \ ZZ2 ˇ ; v 2 P0 ;
(6.10)
1 i2
where the longer sides of the parallelogram P0 are parallel to the magnifying 12 direction of matrix A D . 11 Given integers r 0 and 0 a < 2r , let P0 .rI p a/ denote thephalf-open parallelogram with vertices .a2r ; 0/, ..a C 1/2r ; 0/, . 2 C a2r ; 1/, . 2 C .a C 1/2r ; 1/. Notice that P0 is the disjoint union of P0 .rI a/, 0 a < 2r . We say that an interval .a2r ; .a C 1/2r / is 0-robust with respect to the lattice point counting function f .1 ; 2 I v/ if f .1 ; 2 I v/ is constant on the parallelogram v 2 P0 .rI a/. For later application we introduce now a generalization of the concept of 0robust intervals. Let s 0 be an arbitrary integer, p and let P ps denote the half-open parallelogram with vertices .0; 0/, .1; 0/, .2s 2; 2s /, .2s 2 C 1; 2s /. Again let r 0, 0 a < 2r be integers, and let Ps .rI parallelogram p a/ denote the half-open p with vertices .a2r ; 0/, ..aC1/2r ; 0/, .2s 2Ca2r ; 2s /, .2s 2C.aC1/2r ; 2s /. Notice that Ps is the disjoint union of Ps .rI a/, 0 a < 2r . We say that an interval .a2r ; .a C 1/2r / is s-robust with respect to the lattice point counting function f .1 ; 2 I v/ if f .1 ; 2 I v/ is constant on the parallelogram v 2 Ps .rI a/. If fi .v/ is constant on the parallelogram Ps .rI a/ for every 1 i 2 then of course f .1 ; 2 I v/ is also constant on the parallelogram Ps .rI a/. Let Ps;0 .r/ denote the parallelogram satisfying the following three properties: 1. Ps;0 .r/ is centered at the origin; 2. Ps;0 .r/ has two horizontal sides of length 2rC1 on the lines y D 2s and y D 2s ; p 3. the other two sides have slope 1= 2. Let 2Ps;0 .r/ D f2x W x 2 Ps;0 .r/g denote the twice as large magnified copy of Ps;0 .r/. Let i be an integer with 1 i 2 . We define the Ps;0 .r/-neighborhood of the boundary curve @Ti of the hyperbolic triangle Ti D Ti . / as follows (@ denotes the boundary) Ps;0 .r/-neighborhood-of-@Ti D fx C y W x 2 @Ti and y 2 Ps;0 .r/g :
(6.11)
If the translated copy .Ps;0 .r/-neighborhood-of-@Ti / .a2r ; 0/ of (6.11) [translated by the vector .a2r ; 0/] does not contain a lattice point 2 ZZ2 , then fi .v/ is clearly constant on the parallelogram Ps .rI a/. It follows that if .Ps;0 .r/-neighborhood-of-@Ti / .a2r ; 0/ does not contain a lattice point 2 ZZ2 for any 1 i 2 , then f .1 ; 2 I v/ is constant on the parallelogram Ps .rI a/.
We clearly have 2r
r 1 2X
ˇ ˇ ˇ..Ps;0 .r/-neighborhood-of-@Ti / .a2r ; 0// \ ZZ2 ˇ
aD0
Z
ˇ ˇ ˇ..2Ps;0 .r/-neighborhood-of-@Ti / v/ \ ZZ2 ˇ d v D
v2Ps
Z D v2Ai Ps
ˇ ˇ ˇ..2Ps;0 .r/-neighborhood-of-@Ti / v/ \ ZZ2 ˇ d v;
(6.12)
12 is measure-preserving, and of where we used the fact that the matrix A D 11 course 2Ps;0 .r/-neighborhood-of-@Ti means that in (6.11) we replace Ps;0 .r/ with the twice as large copy 2Ps;0 .r/. We have Z ˇ ˇ ˇ..2Ps;0 .r/-neighborhood-of-@Ti / v/ \ ZZ2 ˇ d v D v2Ai Ps
Z D v2Ai Ps
Z D
ˇ i ˇ ˇA ..2Ps;0 .r/-neighborhood-of-@Ti / v/ \ ZZ2 ˇ d v D
ˇ i ˇ ˇ A .2Ps;0 .r// -neighborhood-of-@T0 w \ ZZ2 ˇ d w;
(6.13)
w2Ps
since Ai Ti D Ai Ai T0 D T0 , where T0 D T0 . / is the hyperbolic triangle with vertices .0; 0/; .; 0/; .3; 2 /. We say that a lattice point n 2 ZZ2 is relevant in equation (6.13) if n 2 Ai .2Ps;0.r// -neighborhood-of-@T0 w holds for some w 2 Ps : (6.14) i The sides of .2Ps;0 .r// are parallel to the magnifying p the parallelogram A sC4 p that eigenvector . 2; 1/ p have length 2 .1 C 2/i and the other two sides have rC4 length 2 .1 C 2/i . Combining this with (6.14), we obtain that there are less than
p p
104 1 C 2s .1 C 2/i 1 C 2r .1 C 2/i .1 C 2 / (6.15) lattice points that are relevant in equation (6.13) [see (6.14)].
Similarly, we obtain the trivial upper bound area of Ai .2Ps;0 .r// -neighborhood-of-@T0 p p
104 2s .1 C 2/i C 2r .1 C 2/i :
(6.16)
Combining the trivial fact [see (6.14)] n 2 Ai .2Ps;0.r// -neighborhood-of-@T0 w ” ” w 2 Ai .2Ps;0 .r// -neighborhood-of-@T0 n with Fubini’s theorem (“continuous double counting”), we obtain the upper bound Z
ˇ i ˇ ˇ A .2Ps;0.r// -neighborhood-of-@T0 w \ ZZ2 ˇ d w w2Ps
Œnumber of relevant lattice points in (6.13) AREA;
(6.17)
AREA D area of Ai .2Ps;0.r// -neighborhood-of-@T0 :
(6.18)
where
Combining (6.12)–(6.18), we have 2r
r 1 2X
ˇ ˇ ˇ..Ps;0 .r/-neighborhood-of-@Ti / .a2r ; 0// \ ZZ2 ˇ 108 .1 C 2 /
aD0
p p p p
1 C 2s .1 C 2/i 1 C 2r .1 C 2/i 2s .1 C 2/i C 2r .1 C 2/i : (6.19) Switching to the union set [
Ti ;
1 i2
by (6.19) we obtain 2r
ˇ ˇ ˇ ˇ
r 1 2X ˇ
aD0
2
r
r 1 2X
Ps;0 .r/-neighborhood-of-@
[ 1 i 2
! Ti
ˇ ˇ ˇ .a2r ; 0/ \ ZZ2 ˇ ˇ !
X ˇ ˇ ˇ..Ps;0 .r/-neighborhood-of-@Ti / .a2r ; 0// \ ZZ2 ˇ
aD0 1 i 2
108 .1 C 2 /
X p 1 C 2s .1 C 2/i 1 i 2
p p p
1 C 2r .1 C 2/i 2s .1 C 2/i C 2r .1 C 2/i
p 108 .1 C 2 /.2 1 C 1/ 1 C 2s .1 C 2/1
p p p 1 C 2r .1 C 2/2 2s .1 C 2/1 C 2r .1 C 2/2 :
(6.20)
Trivial geometric consideration gives that if Ps;0 .r/-neighborhood-of-@
[
! .a2r ; 0/
Ti
1 i 2
does not contain a lattice point 2 ZZ2 then f .1 ; 2 I v/ is constant on the parallelogram Ps .rI a/. Combining this with (6.20) we obtain that there are at most
p 2r 108 .1 C 2 /.2 1 C 1/ 1 C 2s .1 C 2/1
p p p 1 C 2r .1 C 2/2 2s .1 C 2/1 C 2r .1 C 2/2 integers a with 0 a < 2r such that the set Ps;0 .r/-neighborhood-of-@
[
! Ti
.a2r ; 0/
1 i 2
contains a lattice point. This proves the following lemma. Lemma 6.1. There are at most
p 2r 108 .1 C 2 /.2 1 C 1/ 1 C 2s .1 C 2/1
p p p 1 C 2r .1 C 2/2 2s .1 C 2/1 C 2r .1 C 2/2 integers a in 0 a < 2r such that the interval .a2r ; .a C 1/2r / is not s-robust with respect to the lattice point counting function f .1 ; 2 I v/. Now we are ready to start the “blocks-and-gaps” decomposition and to define our Rademacher like functions. We proceed by induction.
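As a reminder of what the two defining properties of Rademacher like functions demand, the classical Rademacher functions give the simplest example, with type $r_j = j$. The Python sketch below is only an illustration of the definition, not the construction used in the proof; the function names are hypothetical. Property 1 holds by construction, since the value depends only on the integer part of $2^j x$; the code checks property 2 and the resulting independence on a dyadic grid.

```python
from fractions import Fraction
from itertools import product
from collections import Counter

def rademacher(j, x):
    """Classical Rademacher function: +1 or -1 according to the j-th binary digit of x."""
    return 1 if int(x * 2 ** j) % 2 == 0 else -1

def check_type(n):
    """Check property 2 and independence for phi_1, ..., phi_n with r_j = j."""
    # Midpoints of the 2^n dyadic cells of length 2^-n.
    cells = [Fraction(2 * a + 1, 2 ** (n + 1)) for a in range(2 ** n)]
    for j in range(1, n + 1):
        # Property 2: on every interval of length 2^-(j-1) the value distribution is the same.
        per_interval = {}
        for x in cells:
            per_interval.setdefault(int(x * 2 ** (j - 1)), Counter())[rademacher(j, x)] += 1
        assert len({tuple(sorted(c.items())) for c in per_interval.values()}) == 1
    # Independence: every sign pattern occurs on exactly one cell of length 2^-n.
    patterns = Counter(tuple(rademacher(j, x) for j in range(1, n + 1)) for x in cells)
    return all(patterns[s] == 1 for s in product((1, -1), repeat=n))

print(check_type(6))
```

In the text the functions take many values rather than two, but the same mechanism (identical distribution on every longer dyadic subinterval) is what forces independence.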
Let B1 D
[
[
Ti and B2 D
`
Ti ;
(6.21)
4`Ck
so the gap between the two blocks B1 and B2 is 3`. We apply Lemma 6.1 with .1/
.1/
s D r0 D 0; 1 D k1 D ` C 1; 2 D k2 D ` C k $ and r D r1 D
% p log.1 C 2/ .2` C k/ ; log 2
(6.22)
where the positive integers k and ` will be specified later. Since r D r1 is the lower integer part of the fraction in (6.22), we have p p 1 .1 C 2/2`Ck < 2r1 .1 C 2/2`Ck : 2
(6.23)
The lattice point counting function .1/
X
.1/
f .1 ; 2 I v/ D f .k1 ; k2 I v/ D .1/
ˇ ˇ ˇ.Ti v/ \ ZZ2 ˇ .1/
k1 i k2
[see (6.22), where Ti D Ti . /] has the property that it is constant on the parallelograms P0 .r1 I a/ for which the intervals .a2r1 ; .a C 1/2r1 / are 0-robust, and it follows from Lemma 6.1 with the choice (6.22) that there are at most "1 2r1 integers a in 0 a < 2r1 such that the interval .a2r1 ; .a C 1/2r1 / is not 0-robust, where
p .1/ .1/ .1/ "1 D 108 .1 C 2 /.k2 k1 C 1/ 1 C 2r0 .1 C 2/k1 p .1/ r p p .1/
.1/ 1 C 2r1 .1 C 2/k2 2 0 .1 C 2/k1 C 2r1 .1 C 2/k2
p 108 .1 C 2 /k 1 C .1 C 2/`
p p p 1 C .1 C 2/`C1 .1 C 2/` C .1 C 2/`C1 ; and in the last step we used (6.22) and (6.23).
(6.24)
Now we are ready to define our first Rademacher like function $\varphi_1(x)$, $0 \le x < 1$. The nonnegative integers $r_0 = 0$ and $r_1$ are already defined in (6.22), and for the interval $a2^{-r_1} < x < (a + 1)2^{-r_1}$ let
\[ \varphi_1(x) = f(k_1^{(1)}, k_2^{(1)}; v), \quad v \in P_0(r_1; a). \tag{6.25} \]
Notice that (6.25) is somewhat ambiguous: it is well defined for 0-robust intervals $(a2^{-r_1}, (a + 1)2^{-r_1})$ (since then $f(k_1^{(1)}, k_2^{(1)}; v)$ is constant on the parallelograms $v \in P_0(r_1; a)$), but (6.25) is not well defined for the intervals $(a2^{-r_1}, (a + 1)2^{-r_1})$ that are not 0-robust. In that case we eliminate the ambiguity in (6.25) by choosing any of the values $f(k_1^{(1)}, k_2^{(1)}; v)$ on the parallelogram $v \in P_0(r_1; a)$ (it does not matter which one we choose). This way we defined a Rademacher like function $\varphi_1(x)$: it is a step function, constant on every interval $[a2^{-r_1}, (a + 1)2^{-r_1})$, $0 \le a < 2^{r_1}$ integer. Note that the lattice point counting function $f(k_1^{(1)}, k_2^{(1)}; v)$, $v \in P_0$, has at most $10^4 (1 + \gamma^2) k$ different values. Indeed, the set
\[ B_1 = \bigcup_{k_1^{(1)} \le i \le k_2^{(1)}} T_i(\gamma) \]
.1/
.1/
can be easily p covered by less than 104 .1 C 2 /.k2 k1 C 1/ rectangles that all have slope 1= 2 and area 1/5, and the last step is to apply Lemma 5.5. It follows that the Rademacher like function '1 .x/, 0 x < 1, also has at most 104 .1 C 2 /k different values. Let m1 ; m2 ; m3 ; : : : ; mM be the complete list of the different values of the Rademacher like function '1 .x/ on the unit interval 0 x < 1; we have M 104 .1 C 2 /k. Let i be the density of the value mi , i.e., the one-dimensional Lebesgue measure of the subset fx 2 Œ0; 1/ W '1 .x/ D mi g is i (1 i M ). Since '1 .x/ is a step function, constant on every interval Œa2r1 ; .a C1/2r1 /, 0 a < 2r1 integer, it follows that every density i has the form i D
positive integer ; 1 i M; 2r 1
(6.26)
where M 104 .1 C 2 /k. Let i D area fv 2 P0 W f .1 ; 2 I v/ D mi g ; 1 i M
(6.27)
and 0 D area fv 2 P0 W f .1 ; 2 I v/ 62 fmi W 1 i M gg D 1
M X i D1
i ;
(6.28)
where, as usual, area stands for the two-dimensional Lebesgue measure. It follows from Lemma 6.1 that M X
ji i j "1 ;
(6.29)
i D1
which immediately implies 0 "1 . We extend the Rademacher like function '1 .x/ from the unit interval 0 x < 1 to the parallelogram P0 (which is equivalent to the unit square Œ0; 1/2 D R I 2 =ZZ2 modulo one) in the following trivial way: for every v 2 P0 .r1 I a/, 0 a < 2r1 integer, ˆ1 .v/ equals the constant value of '1 .x/ on the interval a2r1 x < .a C 1/2r1 : (6.30) It follows from the construction that .1/
.1/
ˆ1 .v/ D f .k1 ; k2 I v/; v 2 P0 .r1 I a/ with the possible exception of at most "1 2r1 integers a in 0 a < 2r1 :
(6.31)
Next we apply Lemma 6.1 with $ s D s2 D
% p log.1 C 2/ .2/ .2/ .3` C k/ ; 1 D k1 D 4`Ck C1; 2 D k2 D 4`C2k log 2 $ and r D r2 D
% p log.1 C 2/ .5` C 2k/ ; log 2
(6.32)
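where $k$ and $\ell$ are the same as in (6.22) (they will be specified later). We clearly have
\[ \frac{1}{2}(1 + \sqrt{2})^{3\ell + k} < 2^{s_2} \le (1 + \sqrt{2})^{3\ell + k} \tag{6.33} \]
and
\[ \frac{1}{2}(1 + \sqrt{2})^{5\ell + 2k} < 2^{r_2} \le (1 + \sqrt{2})^{5\ell + 2k}. \tag{6.34} \]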
We emphasize that the definition of our second Rademacher like function '2 .x/, 0 x < 1, is somewhat more complicated than that of '1 .x/. Indeed, now we have to satisfy two requirements:
1. $\varphi_2(x)$ is a step function such that it is constant on every subinterval $a2^{-r_2} \le x < (a + 1)2^{-r_2}$, $0 \le a < 2^{r_2}$ integer;
2. the distribution of $\varphi_2(x)$ on the longer subinterval $a2^{-r_1} \le x < (a + 1)2^{-r_1}$ is independent of the value of $a$, where $0 \le a < 2^{r_1}$ integer.
Here requirement (2) is a new challenge, so we are not done in one step by simply applying Lemma 5.5, as we did in the case of $\varphi_1(x)$. (In the definition of $\varphi_1(x)$ we did not have to worry about requirement (2), because of the choice $r_0 = 0$ in (6.22).) Besides applying Lemma 5.5 with the choice (6.32), we also need a new argument. It is the same “reshaping by a power of matrix $A = \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}$” argument that we already used in the proof of Lemma 5.2 (see Sect. 5.3). For the convenience of the reader we put it in the form of a new lemma. First we recall the notation that, given integers $s \ge 0$, $r \ge 0$, and $0 \le a < 2^r$, $P_s(r; a)$ denotes the parallelogram with vertices $(a2^{-r}, 0)$, $((a + 1)2^{-r}, 0)$, $(2^s\sqrt{2} + a2^{-r}, 2^s)$, $(2^s\sqrt{2} + (a + 1)2^{-r}, 2^s)$. Notice that the area of $P_s(r; a)$ is $2^{s-r}$.
Lemma 6.2. Suppose that $s \ge r \ge 0$ and $0 \le a < 2^r$ are integers; then the parallelogram $P_s(r; a)$ contains at least $2^{s-r} - 200 \cdot 2^{(s-r)/2}$ pairwise disjoint empty lattice parallelograms.
Proof. One can visualize $P_s(r; a)$ as a “long and narrow” parallelogram, for which the “long” side is parallel to the magnifying eigenvector of the matrix $A = \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}$ (the other sides of $P_s(r; a)$ are horizontal). We can turn the “long and narrow” parallelogram $P_s(r; a)$ into a “round” shape by applying an appropriate power of $A$. We recall that $A$ has eigenvalues $1 + \sqrt{2}$ and $1 - \sqrt{2}$; the eigenvector $(\sqrt{2}, 1)$ of $1 + \sqrt{2}$ represents the magnifying direction (for the positive powers of $A$), and the eigenvector $(-\sqrt{2}, 1)$ of $1 - \sqrt{2}$ represents the “shrinking” direction. It follows that, applying an appropriate negative power $A^{-k}$ of $A$, we obtain a “round” shape parallelogram $A^{-k} P_s(r; a)$ that is equivalent to $P_s(r; a)$ modulo $\mathbb{Z}^2$ (and of course has the same area).
Here “round” shape means that the diameter of the parallelogram $A^{-k} P_s(r; a)$ is less than $10\sqrt{\text{area}} = 10 \cdot 2^{(s-r)/2}$. It follows that the parallelogram $A^{-k} P_s(r; a)$ contains at least
\[ \text{area} - 4 \cdot \text{perimeter} \ge \text{area} - 4 \cdot 4 \cdot \text{diameter} \ge 2^{s-r} - 4 \cdot 4 \cdot 10 \cdot 2^{(s-r)/2} \ge 2^{s-r} - 200 \cdot 2^{(s-r)/2} \]
pairwise disjoint empty lattice squares. Applying the linear transformation $A^k$ of determinant $\pm 1$, we conclude that $P_s(r; a)$ contains at least $2^{s-r} - 200 \cdot 2^{(s-r)/2}$ pairwise disjoint empty lattice parallelograms, completing the proof of Lemma 6.2. □
We discuss the definition of our second Rademacher like function $\varphi_2(x)$, $0 \le x < 1$ (by using Lemmas 6.1 and 6.2), in the next section.
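The “reshaping” step in the proof of Lemma 6.2 can be tried out numerically. The following Python sketch is a floating-point illustration under simple assumptions: the parallelogram is represented only by its two side vectors (the translation by $a2^{-r}$ is irrelevant for the shape), the search stops at the first power of $A^{-1}$ that no longer reduces the diameter, and the names `reshape` and `diameter` are hypothetical.

```python
from math import sqrt, hypot

A_INV = ((-1.0, 2.0), (1.0, -1.0))  # inverse of A = ((1, 2), (1, 1)); det(A) = -1

def apply(m, v):
    """Multiply a 2x2 matrix by a column vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

def diameter(u, w):
    """Diameter of the parallelogram spanned by u and w (its longer diagonal)."""
    return max(hypot(u[0] + w[0], u[1] + w[1]), hypot(u[0] - w[0], u[1] - w[1]))

def reshape(s, r):
    """Apply A^{-1} repeatedly to the sides of P_s(r; a) while the diameter decreases.
    Returns (k, diameter, area) for the resulting power A^{-k}."""
    u = (2.0 ** (-r), 0.0)                # horizontal side of length 2^-r
    w = (2.0 ** s * sqrt(2), 2.0 ** s)    # long side, parallel to (sqrt(2), 1)
    k, best = 0, diameter(u, w)
    while True:
        u2, w2 = apply(A_INV, u), apply(A_INV, w)
        d = diameter(u2, w2)
        if d >= best:
            break
        u, w, best, k = u2, w2, d, k + 1
    area = abs(u[0] * w[1] - u[1] * w[0])  # unchanged by A^{-1}, since |det| = 1
    return k, best, area

for s, r in ((0, 0), (8, 3), (20, 5)):
    k, d, area = reshape(s, r)
    print(s, r, k, round(d, 3), round(10 * sqrt(area), 3))
```

In each printed row the diameter should stay below $10\sqrt{\text{area}}$, which is the “round” shape condition used in the proof.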
6.2 Completing the Blocks-and-Gaps Decomposition We have to define the second Rademacher like function '2 .x/ to satisfy requirement (2): for every integer j in 0 j < 2r1 and for every 1 i M (meas stands for the one-dimensional Lebesgue measure) 2r1 meas fj 2r1 x < .j C 1/2r1 W '2 .x/ D mi g D D i D meas f0 x < 1 W '1 .x/ D mi g ;
(6.35)
where m1 ; m2 ; : : : ; mM is the complete list of values of '1 .x/, 0 x < 1. .2/ .2/ We apply Lemma 6.1 with the choice (6.32). By definition f .k1 ; k2 I v/ is constant on the parallelograms v 2 Ps2 .r2 I a/ with s2 -robust intervals .a2r2 ; .a C 1/2r2 /. For every integer 0 j < 2r1 , let "2;j 2r2 r1 denote the number of intervals .a2r2 ; .a C 1/2r2 / with j 2r2 r1 a < .j C 1/2r2r1 [r1 is defined in (6.22)] that are not s2 -robust. Of course 0 "2;j 1, and by Lemma 6.1, 2
r1
r1 1 2X
"2;j "2 ;
(6.36)
j D0
where
p .2/ .2/ .2/ "2 D 108 .1 C 2 /.k2 k1 C 1/ 1 C 2s2 .1 C 2/k1 p .2/ s p p .2/
.2/ 2 2 .1 C 2/k1 C 2r2 .1 C 2/k2 1 C 2r2 .1 C 2/k2
p 108 .1 C 2 /k 1 C .1 C 2/`C1
p p p 1 C .1 C 2/`C1 .1 C 2/`C1 C .1 C 2/`C1 D
2 p p D 108 .1 C 2 /2k 1 C .1 C 2/`C1 .1 C 2/`C1 ;
(6.37)
and in the last step we used (6.32)–(6.34). For every 1 i M and 0 j < 2r1 write i .hI j / D 2
ˇ( ˇ ˇ j 2r2 r1 a < .j C 1/2r2r1 W ˇ
r2 Cr1 ˇ
)ˇ ˇ ˇ .2/ .2/ f .k1 ; k2 I v/ D mi for all v 2 Ps2 .r2 I a/ ˇ; ˇ
(6.38)
.2/
.2/
noting that the requirement “f .k1 ; k2 I v/ is constant for all v 2 Ps2 .r2 I a/” in (6.38) implies that the interval .a2r2 ; .a C 1/2r2 / is s2 -robust. It follows from the definition of "2;j that M X
ji .2I j / i .2I j /j "2;j ;
(6.39)
i D1
where o n .2/ .2/ i .2I j / D 2s2 Cr1 area v 2 Ps2 .r1 I j / W f .k1 ; k2 I v/ D mi ;
(6.40)
and we used the trivial fact [
Ps2 .r2 I a/ D Ps2 .r1 I j /:
j 2r2 r1 a<.j C1/2r2 r1
Next we apply Lemma 6.2 with s D s2 , r D r1 , and a D j (where of course 0 .2/ .2/ j < 2r1 ). The distribution of f .k1 ; k2 I v/ on a half-open lattice parallelogram is exactly the same as that of on the unit square Œ0; 1/2 D R I 2 =ZZ2 . Thus by Lemma 6.2 and (6.27), M X
j i .2I j / i j
i D1
200 2.s2 r1 /=2
400 .1 C
p
2/`=2 ;
(6.41)
where in the last step we used (6.23), and (6.34). Adding up the inequalities (6.29), (6.39) and (6.41), the triangle inequality yields M X
ji i .2I j /j
i D1
C
M X
M X i D1
ji i j C
M X
j i i .2I j /j C
i D1
j i .2I j / i .2I j /j "1 C 400 .1 C
p
2/`=2 C "2;j
(6.42)
i D1
for every integer 0 j < 2r1 . Note that the fractions i all have denominator 2r1 [see (6.26)], and the fractions i .2I j / all have denominator 2r2 r1 [see (6.38)], so 2r2 r1 is a common denominator, since [see (6.22) and (6.32)] $ r1 D
% % $ p p log.1 C 2/ log.1 C 2/ .2` C k/ and r2 D .5` C 2k/ ; log 2 log 2
implying r2 r1 r1 .
Now we are ready to define '2 .x/ as a step function, constant on every interval a2r2 x < .a C 1/2r2 , where 0 j < 2r1 and j 2r2 r1 a < .j C 1/2r2 r1 are integers. Let 0 j < 2r1 be fixed. If i .2I j / i holds for some 1 i M — representing “surplus”—then we choose i 2r2 r1 integers a from the set [see (6.38)] )
( j2
r2 r1
a < .j C 1/2
r2 r1
W
.2/ .2/ f .k1 ; k2 I v/
D mi for all v 2 Ps2 .r2 I a/ ; (6.43)
and define '2 .x/ to be mi on the chosen intervals a2r2 x < .a C 1/2r2 . Next we turn to the indices 1 i M for which i .2I j / < i holds, representing “deficit.” We proceed by induction on these “deficit indices” i , taking them in increasing order (say). We define '2 .x/ to be mi on the intervals a2r2 x < .a C 1/2r2 for which (6.43) holds. If there remain intervals a2r2 x < .a C 1/2r2 for which '2 .x/ is undefined yet, then we choose .i i .2I j // 2r2 r1
(6.44)
undefined intervals a2r2 x < .a C 1/2r2 , and define '2 .x/ to be mi on these intervals (the inequality r2 r1 r1 implies that (6.44) is an integer). We carry out this “fixing the deficit by enforcing density i ” algorithm by induction on i . It follows from the construction that both requirements (1) and (2) are satisfied, so '1 .x/ and '2 .x/ are independent and identically distributed random variables defined on the unit interval 0 x < 1. As usual, we extend the Rademacher like function '2 .x/ from the unit interval 0 x < 1 to the parallelogram P0 in the following trivial way: for every v 2 P0 .r2 I a/, 0 a < 2r2 integer, ˆ2 .v/ equals the constant value of '2 .x/ on the interval a2r2 x < .a C 1/2r2 : (6.45) It follows from the construction and the key inequality (6.42) that, for a fix j in 0 j < 2r1 , .2/
\[ \Phi_2(v) = f(k_1^{(2)}, k_2^{(2)}; v), \quad v \in P_0(r_2; a) \tag{6.46} \]
for all integers $a$ in $j2^{r_2 - r_1} \le a < (j + 1)2^{r_2 - r_1}$ with the possible exception of at most
\[ \left( \varepsilon_1 + 400 \cdot (1 + \sqrt{2})^{-\ell/2} + \varepsilon_{2,j} \right) 2^{r_2 - r_1}. \tag{6.47} \]
Adding up $j = 0, 1, 2, \ldots, 2^{r_1} - 1$ in (6.46) and (6.47), and applying (6.36), we obtain the following analog of (6.31):
\[ \Phi_2(v) = f(k_1^{(2)}, k_2^{(2)}; v), \quad v \in P_0(r_2; a), \text{ with the possible exception of at most } \left( \varepsilon_1 + 400 \cdot (1 + \sqrt{2})^{-\ell/2} + \varepsilon_2 \right) 2^{r_2} \text{ integers } a \text{ in } 0 \le a < 2^{r_2}. \tag{6.48} \]
We define the Rademacher like functions $\varphi_3, \varphi_4, \varphi_5, \ldots$ by induction exactly the same way as we defined $\varphi_2$. More precisely, assume that for some integer $h \ge 3$ we already defined $\varphi_1, \varphi_2, \ldots, \varphi_{h-1}$ with the associated sequence $r_0 = 0 < r_1 < r_2 < \ldots < r_{h-1}$ of integers as the type, where
\[ r_i = \left\lfloor \frac{\log(1 + \sqrt{2})}{\log 2} \left( (3i - 1)\ell + ik \right) \right\rfloor, \quad 1 \le i \le h - 1. \tag{6.49} \]
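The bookkeeping of the blocks-and-gaps decomposition is easier to follow with the index formulas written out. The Python sketch below tabulates $r_h$ from (6.49) together with $s_h$ and the block endpoints $k_1^{(h)}$, $k_2^{(h)}$ as they are read from (6.22), (6.32) and (6.50), and checks the gap length $3\ell$ and the inequality $r_h - r_{h-1} \ge r_1$ used for the common denominator. It is an illustration only: the function names are hypothetical, and floating-point `floor` is assumed to be accurate enough for the small parameters shown.

```python
from math import floor, log, sqrt

C = log(1 + sqrt(2)) / log(2)   # the constant log(1 + sqrt 2) / log 2

def r(h, ell, k):
    """r_h from (6.49), with r_0 = 0."""
    return 0 if h == 0 else floor(C * ((3 * h - 1) * ell + h * k))

def s(h, ell, k):
    """s_h as read from (6.50); s_1 = 0."""
    return floor(C * ((3 * h - 3) * ell + (h - 1) * k))

def block(h, ell, k):
    """Index range (k_1^(h), k_2^(h)) of the h-th block of hyperbolic triangles."""
    return (3 * h - 2) * ell + (h - 1) * k + 1, (3 * h - 2) * ell + h * k

def check(ell, k, blocks=8):
    """Verify block length, gap length and the denominator inequality for the first blocks."""
    for h in range(1, blocks + 1):
        assert r(h, ell, k) - r(h - 1, ell, k) >= r(1, ell, k)      # common denominator
        assert block(h, ell, k)[1] - block(h, ell, k)[0] + 1 == k   # each block has k triangles
        if h > 1:
            gap = block(h, ell, k)[0] - block(h - 1, ell, k)[1] - 1
            assert gap == 3 * ell                                   # gap between consecutive blocks
    return True

print(check(ell=5, k=40))
print([(h, s(h, 5, 40), r(h, 5, 40), block(h, 5, 40)) for h in range(1, 4)])
```

The values `ell=5` and `k=40` are arbitrary sample parameters; in the text $k$ and $\ell$ are specified only later.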
We show how to define 'h satisfying the following two properties: 1. 'h .x/ is a step function such that it is constant on every subinterval a2rh x < .a C 1/2rh , 0 a < 2rh integer; 2. the distribution of 'h .x/ on every longer subinterval j 2rh1 x < .j C 1/2rh1 , where 0 j < 2rj 1 is an integer, is the same as the distribution of '1 .x/ on the unit interval 0 x < 1. Requirement (2) is equivalent to the following: for every integer j in 0 j < 2rh1 and for every 1 i M (as usual, meas stands for the one-dimensional Lebesgue measure) 2rh1 meas fj 2rh1 x < .j C 1/2rh1 W 'h .x/ D mi g D D i D meas f0 x < 1 W '1 .x/ D mi g ; where m1 ; m2 ; : : : ; mM is the complete list of values of '1 .x/, 0 x < 1. We apply Lemma 6.1 with the choice $ s D sh D
% p log.1 C 2/ .h/ ..3h 3/` C .h 1/k/ ; 1 D k1 log 2 D .3h 2/` C .h 1/k C 1; $
2 D
.h/ k2
D .3h 2/` C hk and rh D
% p log.1 C 2/ ..3h 1/` C hk/ ; log 2 (6.50)
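where $k$ and $\ell$ are the same as in (6.22) (they will be specified later). We clearly have
\[ \frac{1}{2}(1 + \sqrt{2})^{(3h-3)\ell + (h-1)k} < 2^{s_h} \le (1 + \sqrt{2})^{(3h-3)\ell + (h-1)k} \tag{6.51} \]
and
\[ \frac{1}{2}(1 + \sqrt{2})^{(3h-1)\ell + hk} < 2^{r_h} \le (1 + \sqrt{2})^{(3h-1)\ell + hk}. \tag{6.52} \]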
.h/
By definition f .k1 ; k2 I v/ is constant on the parallelograms v 2 Psh .rh I a/ with sh -robust intervals .a2rh ; .a C 1/2rh /. For every integer 0 j < 2rh1 , let "h;j 2rh rh1 denote the number of intervals .a2rh ; .a C 1/2rh / with j 2rh rh1 a < .j C 1/2rh rh1 that are not sh -robust. Of course 0 "h;j 1, and by Lemma 6.1, 2rh1
2rh1 X1
"h;j "h ;
(6.53)
j D0
where
p .h/ .h/ .h/ "h D 108 .1 C 2 /.k2 k1 C 1/ 1 C 2sh .1 C 2/k1 p .h/ s p p .h/
.h/ 1 C 2rh .1 C 2/k2 2 h .1 C 2/k1 C 2rh .1 C 2/k2
p 108 .1 C 2 /k 1 C .1 C 2/`C1
p p p 1 C .1 C 2/`C1 .1 C 2/`C1 C .1 C 2/`C1 D
2 p p D 108 .1 C 2 /2k 1 C .1 C 2/`C1 .1 C 2/`C1 ;
(6.54)
and in the last step we used (6.51) and (6.52). For every 1 i M and 0 j < 2rh1 write i .hI j / D 2
ˇ( ˇ .h/ .h/ ˇ j 2rh rh1 a < .j C1/2rh rh1 W f .k1 ; k2 I v/ D mi ˇ
rh Crh1 ˇ
)ˇ ˇ ˇ for all v 2 Psh .rh I a/ ˇ; ˇ
(6.55)
.h/
.h/
noting that the requirement “f .k1 ; k2 I v/ is constant for all v 2 Psh .rh I a/” in (6.55) implies that the interval .a2rh ; .a C 1/2rh / is sh -robust. It follows from the definition of "h;j that M X
ji .hI j / i .hI j /j "h;j ;
(6.56)
i D1
where o n .h/ .h/ i .hI j / D 2sh Crh1 area v 2 Psh .rh1 I j / W f .k1 ; k2 I v/ D mi ; and we used the trivial fact [
(6.57)
Psh .rh I a/ D Psh .rh1 I j /:
j 2rh rh1 a<.j C1/2rh rh1
Next we apply Lemma 6.2 with s D sh , r D rh1 , and a D j (where of .k/ .h/ course 0 j < 2rh1 ). The distribution of f .k1 ; k2 I v/ on a half-open lattice parallelogram is exactly the same as that of on the unit square Œ0; 1/2 D R I 2 =ZZ2 . Thus by Lemma 6.2 and (6.27), M X
j i .hI j / i j
i D1
200 2.sh rh1 /=2
400 .1 C
p `=2 2/ ;
(6.58)
where in the last step we used (6.49) and (6.51), (6.52). Adding up the inequalities (6.29), (6.56) and (6.58), the triangle inequality yields M X i D1
C
M X
ji i .hI j /j
M X i D1
ji i j C
M X
j i i .hI j /j C
i D1
j i .hI j / i .hI j /j "1 C 400 .1 C
p `=2 2/ C "h;j
(6.59)
i D1
for every integer 0 j < 2rh1 . Note that the fractions i all have denominator 2r1 [see (6.26)], and the fractions i .hI j / all have denominator 2rh rh1 [see (6.55)], so 2rh rh1 is a common denominator, since [see (6.22) and (6.50)] $ r1 D
$ % % p p log.1 C 2/ log.1 C 2/ .2` C k/ ; rh1 D ..3h 4/` C .h 1/k/ log 2 log 2
$ and rh D
% p log.1 C 2/ ..3h 1/` C hk/ ; log 2
implying rh rh1 r1 . Now we are ready to define 'h .x/ as a step function, constant on every interval a2rh x < .a C 1/2rh , where 0 j < 2rh1 and j 2rh rh1 a < .j C 1/2rh rh1 are integers. Let 0 j < 2rh1 be fixed. If i .hI j / i holds for some 1 i M — representing “surplus”—then we choose i 2rh rh1 integers a from the set [see (6.55)] (
) j 2rh rh1 a < .j C 1/2rh rh1 W
.h/ .h/ f .k1 ; k2 I v/
D mi for all v 2 Psh .rh I a/ ; (6.60)
and define 'h .x/ to be mi on the chosen intervals a2rh x < .a C 1/2rh . Next we turn to the indices 1 i M for which i .hI j / < i holds, representing “deficit”. We proceed by induction on these “deficit indices” i , taking them in increasing order (say). We define 'h .x/ to be mi on the intervals a2rh x < .a C 1/2rh for which (6.60) holds. If there remain intervals a2rh x < .a C 1/2rh for which 'h .x/ is undefined yet, then we choose .i i .hI j // 2rh rh1
(6.61)
undefined intervals a2rh x < .a C 1/2rh , and define 'h .x/ to be mi on these intervals (the inequality rh rh1 r1 implies that (6.61) is an integer). We carry out this “fixing the deficit by enforcing density i ” algorithm by induction on i . It follows from the construction that both requirements (1) and (2) are satisfied, so '1 .x/; '2 .x/; : : : ; 'h .x/ are independent and identically distributed random variables defined on the unit interval 0 x < 1. As usual, we extend the Rademacher like function 'h .x/ from the unit interval 0 x < 1 to the parallelogram P0 in the following trivial way: for every v 2 P0 .rh I a/, 0 a < 2rh integer, ˆh .v/ equals the constant value of 'h .x/ on the interval a2rh x < .a C 1/2rh : (6.62) It follows from the construction and the key inequality (6.59) that .h/
\[ \Phi_h(v) = f(k_1^{(h)}, k_2^{(h)}; v), \quad v \in P_0(r_h; a) \tag{6.63} \]
for all integers $a$ in $j2^{r_h - r_{h-1}} \le a < (j + 1)2^{r_h - r_{h-1}}$ with the possible exception of at most
\[ \left( \varepsilon_1 + 400 \cdot (1 + \sqrt{2})^{-\ell/2} + \varepsilon_{h,j} \right) 2^{r_h - r_{h-1}}. \tag{6.64} \]
Adding up $j = 0, 1, 2, \ldots, 2^{r_{h-1}} - 1$ in (6.63) and (6.64), and applying (6.53), we obtain the following perfect analog of (6.48):
\[ \Phi_h(v) = f(k_1^{(h)}, k_2^{(h)}; v), \quad v \in P_0(r_h; a), \text{ with the possible exception of at most } \left( \varepsilon_1 + 400 \cdot (1 + \sqrt{2})^{-\ell/2} + \varepsilon_h \right) 2^{r_h} \text{ integers } a \text{ in } 0 \le a < 2^{r_h}. \tag{6.65} \]
Since $\varphi_1(x), \varphi_2(x), \ldots, \varphi_h(x)$, $0 \le x < 1$, and $\Phi_1(\mathbf{v}), \Phi_2(\mathbf{v}), \ldots, \Phi_h(\mathbf{v})$, $\mathbf{v} \in P_0$, have the same joint distribution, the latter are also independent and identically distributed random variables.

Let me summarize what we did so far. We were studying the “blocks” (see (6.21) for the special cases $h = 1, 2$)

$$B_h = \bigcup_{k_1^{(h)} \le i \le k_2^{(h)}} T_i \tag{6.66}$$

with $h = 1, 2, 3, \ldots$, where $T_i = T_i(\gamma)$ is the “hyperbolic triangle” defined at the beginning of Sect. 6.1. More precisely, we were studying the lattice point counting functions [see (6.9)]

$$f(k_1^{(h)},k_2^{(h)};\mathbf{v}) = \sum_{k_1^{(h)} \le i \le k_2^{(h)}} f_i(\mathbf{v}) = \sum_{k_1^{(h)} \le i \le k_2^{(h)}} \bigl|(T_i - \mathbf{v}) \cap \mathbb{Z}^2\bigr| = \bigl|(B_h - \mathbf{v}) \cap \mathbb{Z}^2\bigr|, \quad \mathbf{v} \in P_0, \tag{6.67}$$
where $P_0$ is the half-open parallelogram with vertices $(0,0)$, $(1,0)$, $(\sqrt{2},1)$, $(\sqrt{2}+1,1)$. Note that $P_0$ is equivalent to the unit square $[0,1)^2 = \mathbb{R}^2/\mathbb{Z}^2$ modulo one.

We constructed a sequence of Rademacher-like functions $\varphi_1(x), \varphi_2(x), \ldots, \varphi_h(x), \ldots$ with $0 \le x < 1$ of type [see (6.50)]

$$r_0 = 0 \quad \text{and} \quad r_h = \left\lfloor \frac{\log(1+\sqrt{2})}{\log 2}\,\bigl((3h-1)\ell+hk\bigr) \right\rfloor \ \text{for } h \ge 1 \tag{6.68}$$

satisfying Property One and Property Two below.

Property One: the sequence of extensions

$$\Phi_1(\mathbf{v}), \Phi_2(\mathbf{v}), \ldots, \Phi_h(\mathbf{v}), \ldots \tag{6.69}$$

defined by (6.62) on $\mathbf{v} \in P_0$ represents independent and identically distributed random variables.

Property Two: the sequence (6.69) exhibits a very good approximation property in the quantitative sense that

$$\Phi_h(\mathbf{v}) = f(k_1^{(h)},k_2^{(h)};\mathbf{v}) \ \text{for all } \mathbf{v} \in P_0(r_h;a) \ \text{with the possible exception of at most } \Bigl(\varepsilon_1 + 400\,(1+\sqrt{2})^{-\ell/2} + \varepsilon_h\Bigr)2^{r_h} \ \text{integers } a \text{ in } 0 \le a < 2^{r_h}, \tag{6.70}$$

and (6.70) holds for every $h \ge 1$ [see (6.65)], where

$$\varepsilon_h \le 10^8(1+\gamma^2)\,2k\,\Bigl(1+(1+\sqrt{2})^{-\ell+1}\Bigr)^2 (1+\sqrt{2})^{-\ell+1} \tag{6.71}$$
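[see (6.24), (6.37), and (6.54)]. This completes the first part of the “blocks-and-gaps” decomposition method.

The second part is to study the “gaps” between $B_h$ and $B_{h+1}$:

$$G_h = \bigcup_{k_2^{(h)} < i < k_1^{(h+1)}} T_i, \quad h = 1, 2, 3, \ldots \tag{6.72}$$

The second part of the “blocks-and-gaps” decomposition method simply means that we repeat the argument of the first part above by switching the roles of blocks and gaps. It formally means that we interchange the roles of $k$ and $3\ell$ (for simplicity assume that $k$ is divisible by 3) and replace the parameters $k_1^{(h)}$, $k_2^{(h)}$, $r_h$, $s_h$ in (6.50) with the “overlines”

$$\overline{k}_1^{(h)} = (3h-2)\ell + hk + 1 \quad \text{and} \quad \overline{k}_2^{(h)} = (3h+1)\ell + hk \ \text{for } h \ge 1,$$

$$\overline{r}_h = \left\lfloor \frac{\log(1+\sqrt{2})}{\log 2}\,\Bigl((3h+1)\ell + \bigl(h+\tfrac{1}{3}\bigr)k\Bigr) \right\rfloor \ \text{for } h \ge 0,$$

$$\text{and} \quad \overline{s}_h = \left\lfloor \frac{\log(1+\sqrt{2})}{\log 2}\,\Bigl((3h-2)\ell + \bigl(h-\tfrac{1}{3}\bigr)k\Bigr) \right\rfloor \ \text{for } h \ge 1. \tag{6.73}$$

Equation (6.73) gives

$$\overline{r}_0 = \left\lfloor \frac{\log(1+\sqrt{2})}{\log 2}\Bigl(\ell + \tfrac{1}{3}k\Bigr) \right\rfloor, \quad \overline{r}_1 = \left\lfloor \frac{\log(1+\sqrt{2})}{\log 2}\Bigl(4\ell + \tfrac{4}{3}k\Bigr) \right\rfloor \quad \text{and} \quad \overline{r}_2 = \left\lfloor \frac{\log(1+\sqrt{2})}{\log 2}\Bigl(7\ell + \tfrac{7}{3}k\Bigr) \right\rfloor,$$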
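so $\overline{r}_2 - \overline{r}_1 > \overline{r}_0$, and in general $\overline{r}_h - \overline{r}_{h-1} > \overline{r}_0$ for all $h \ge 2$. This plays the role of the inequality $r_h - r_{h-1} \ge r_1$, $h \ge 2$, that was used in the “fixing the deficit by enforcing density” algorithm, see (6.44).

By repeating the argument above, we obtain via induction a sequence of Rademacher-like functions $\overline{\varphi}_1(x), \overline{\varphi}_2(x), \ldots, \overline{\varphi}_h(x), \ldots$, $0 \le x < 1$, of type $\overline{r}_h$, $h \ge 0$, defined in (6.73) that satisfy Property One and Property Two below. To formulate these two properties, we need to extend the Rademacher-like functions $\overline{\varphi}_h(x)$ from the unit interval $0 \le x < 1$ to the parallelogram $P_0$ in the usual way:

for every $\mathbf{v} \in P_0(\overline{r}_h;a)$, $0 \le a < 2^{\overline{r}_h}$ integer, $\overline{\Phi}_h(\mathbf{v})$ equals the constant value of $\overline{\varphi}_h(x)$ on the interval $a2^{-\overline{r}_h} \le x < (a+1)2^{-\overline{r}_h}$. (6.74)

Now we are ready to formulate the two properties.

Property One: the sequence of extensions

$$\overline{\Phi}_1(\mathbf{v}), \overline{\Phi}_2(\mathbf{v}), \ldots, \overline{\Phi}_h(\mathbf{v}), \ldots \tag{6.75}$$

defined by (6.74) on $\mathbf{v} \in P_0$ represents independent and identically distributed random variables.

Property Two: the sequence (6.75) exhibits a very good approximation property in the quantitative sense that

$$\overline{\Phi}_h(\mathbf{v}) = f(\overline{k}_1^{(h)},\overline{k}_2^{(h)};\mathbf{v}) \ \text{for all } \mathbf{v} \in P_0(\overline{r}_h;a) \ \text{with the possible exception of at most } \Bigl(\overline{\varepsilon}_1 + 400\,(1+\sqrt{2})^{-k/6} + \overline{\varepsilon}_h\Bigr)2^{\overline{r}_h} \ \text{integers } a \text{ in } 0 \le a < 2^{\overline{r}_h}, \tag{6.76}$$

and (6.76) holds for every $h \ge 1$, where

$$\overline{\varepsilon}_h \le 10^8(1+\gamma^2)\,6\ell\,\Bigl(1+(1+\sqrt{2})^{-\frac{k}{3}+1}\Bigr)^2 (1+\sqrt{2})^{-\frac{k}{3}+1}. \tag{6.77}$$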
Notice that (6.77) is the analog of (6.71) obtained by switching the roles of $k$ and $3\ell$ (since we switch the roles of “blocks” and “gaps”).

It follows from the definition of the Rademacher-like functions $\varphi_h$ and $\overline{\varphi}_h$ that we have some extra independence beyond Property One in (6.69) and (6.75). Indeed, if a “block” and a “gap” are not neighbors then the associated random variables are independent. More precisely,

$$\{\varphi_1, \ldots, \varphi_h\} \ \text{and} \ \{\overline{\varphi}_{h+1}, \ldots\} \ \text{are independent};$$

and similarly

$$\{\overline{\varphi}_1, \ldots, \overline{\varphi}_h\} \ \text{and} \ \{\varphi_{h+2}, \ldots\} \ \text{are independent}.$$

Of course the same holds for the extensions $\Phi_h$ and $\overline{\Phi}_h$. This completes the “blocks-and-gaps” decomposition method.
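The index bookkeeping behind the decomposition can be checked mechanically. The following is a minimal sketch, assuming the index formulas for $k_1^{(h)}, k_2^{(h)}, \overline{k}_1^{(h)}, \overline{k}_2^{(h)}$ in the form restated in Lemma 6.3, with an offset $d$ that is zero throughout Sect. 6.2. The function names and the sample values $b = 4$, $\ell = 2$, $k = 9$ are illustrative choices, not taken from the text. It lists the blocks and gaps and confirms that they tile the index range without overlap.

```python
def block_gap_indices(b, ell, k, d=0):
    """Index ranges of the blocks B_h and gaps G_h, 1 <= h <= b (d = 0 in Sect. 6.2)."""
    if k % 3 != 0:
        raise ValueError("k is assumed to be divisible by 3")
    kappa1 = d + ell + 1                       # first index covered by the decomposition
    kappa2 = d + (3 * b + 1) * ell + b * k     # last index covered by the decomposition
    blocks, gaps = [], []
    for h in range(1, b + 1):
        k1 = d + (3 * h - 2) * ell + (h - 1) * k + 1   # first index of block B_h
        k2 = d + (3 * h - 2) * ell + h * k             # last index of block B_h
        g1 = k2 + 1                                    # first index of gap G_h
        g2 = d + (3 * h + 1) * ell + h * k             # last index of gap G_h
        blocks.append((k1, k2))
        gaps.append((g1, g2))
    return kappa1, kappa2, blocks, gaps


def is_partition(kappa1, kappa2, blocks, gaps):
    """True if blocks and gaps cover kappa1..kappa2 exactly once."""
    covered = sorted(i for lo, hi in blocks + gaps for i in range(lo, hi + 1))
    return covered == list(range(kappa1, kappa2 + 1))


kappa1, kappa2, blocks, gaps = block_gap_indices(b=4, ell=2, k=9)
print("range:", kappa1, kappa2)
print("blocks:", blocks)   # each block contains k consecutive indices
print("gaps:", gaps)       # each gap contains 3*ell consecutive indices
print("partition:", is_partition(kappa1, kappa2, blocks, gaps))
```

Blocks and gaps alternate, every block has length $k$ and every gap has length $3\ell$, which is why the two families can trade roles in the second part of the argument.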
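6.3 Estimating the Variance

The following lemma summarizes what we did in Sects. 6.1 and 6.2. The only novelty is the introduction of an extra parameter $d$ representing the “starting point” (this slight generalization will be needed in some later applications).

Lemma 6.3 (“blocks-and-gaps lemma”). Let $b \ge 1$, $d \ge 0$, $\ell \ge 1$, and $k \ge 3$ be integers; we also assume that $k$ is divisible by 3. Write

$$\kappa_1 = d + \ell + 1, \quad \kappa_2 = d + (3b+1)\ell + bk, \tag{6.78}$$

and consider the lattice point counting function

$$f(\kappa_1,\kappa_2;\mathbf{v}) = \sum_{\kappa_1 \le i \le \kappa_2} f_i(\mathbf{v}) = \sum_{\kappa_1 \le i \le \kappa_2} \bigl|(T_i - \mathbf{v}) \cap \mathbb{Z}^2\bigr|, \quad \mathbf{v} \in \mathbb{R}^2, \tag{6.79}$$

where $T_i = T_i(\gamma)$ is the hyperbolic triangle defined at the beginning of Sect. 6.1. Note that $f(\kappa_1,\kappa_2;\mathbf{v})$ is actually defined on the unit torus $\mathbf{v} \in [0,1)^2 = \mathbb{R}^2/\mathbb{Z}^2$; or, equivalently, we can take the half-open parallelogram $P_0$ defined at the beginning of Sect. 6.1 (since $P_0$ is equivalent modulo one to the unit torus $[0,1)^2$).

There exist two sequences of Rademacher-like functions such that the first sequence $\varphi_1, \varphi_2, \ldots, \varphi_b$ has type $r_0 = d < r_1 < r_2 < \ldots < r_b$ where

$$r_h = \left\lfloor \frac{\log(1+\sqrt{2})}{\log 2}\,\bigl(d + (3h-1)\ell + hk\bigr) \right\rfloor \ \text{for } 1 \le h \le b, \tag{6.80}$$

the second sequence $\overline{\varphi}_1(x), \overline{\varphi}_2(x), \ldots, \overline{\varphi}_b(x)$ has type $\overline{r}_0 < \overline{r}_1 < \overline{r}_2 < \ldots < \overline{r}_b$ where

$$\overline{r}_h = \left\lfloor \frac{\log(1+\sqrt{2})}{\log 2}\,\Bigl(d + (3h+1)\ell + \bigl(h+\tfrac{1}{3}\bigr)k\Bigr) \right\rfloor \ \text{for } 0 \le h \le b, \tag{6.81}$$

and the extensions $\Phi_h$, $\overline{\Phi}_h$, $1 \le h \le b$, defined in (6.62) and (6.74), have the following approximation property:

$$f(\kappa_1,\kappa_2;\mathbf{v}) = \sum_{h=1}^{b} \Phi_h(\mathbf{v}) + \sum_{h=1}^{b} \overline{\Phi}_h(\mathbf{v}) \ \text{for all } \mathbf{v} \in P_0(\overline{r}_b;a) \tag{6.82}$$

with the possible exception of at most $\varepsilon(\gamma;k,\ell)\,b\,2^{\overline{r}_b}$ integers $a$ in $0 \le a < 2^{\overline{r}_b}$, where

$$\varepsilon(\gamma;k,\ell) = 10^8(1+\gamma^2)\,4k\,\Bigl(1+(1+\sqrt{2})^{-\ell+1}\Bigr)^2(1+\sqrt{2})^{-\ell+1} + 400\,(1+\sqrt{2})^{-\ell/2} + 10^8(1+\gamma^2)\,12\ell\,\Bigl(1+(1+\sqrt{2})^{-\frac{k}{3}+1}\Bigr)^2(1+\sqrt{2})^{-\frac{k}{3}+1} + 400\,(1+\sqrt{2})^{-k/6}. \tag{6.83}$$

In particular, we have

$$f(\kappa_1,\kappa_2;\mathbf{v}) = \sum_{h=1}^{b} f(k_1^{(h)},k_2^{(h)};\mathbf{v}) + \sum_{h=1}^{b} f(\overline{k}_1^{(h)},\overline{k}_2^{(h)};\mathbf{v}),$$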
where

$$k_1^{(h)} = d + (3h-2)\ell + (h-1)k + 1, \quad k_2^{(h)} = d + (3h-2)\ell + hk,$$

$$\overline{k}_1^{(h)} = d + (3h-2)\ell + hk + 1, \quad \overline{k}_2^{(h)} = d + (3h+1)\ell + hk, \quad 1 \le h \le b;$$

moreover, for every $1 \le h \le b$ we have

$$f(k_1^{(h)},k_2^{(h)};\mathbf{v}) = \Phi_h(\mathbf{v}) \ \text{for all } \mathbf{v} \in P_0(r_h;a)$$

with the possible exception of at most

$$\left(10^8(1+\gamma^2)\,4k\,\Bigl(1+(1+\sqrt{2})^{-\ell+1}\Bigr)^2(1+\sqrt{2})^{-\ell+1} + 400\,(1+\sqrt{2})^{-\ell/2}\right)2^{r_h} \tag{6.84}$$

integers $a$ in $0 \le a < 2^{r_h}$, and similarly

$$f(\overline{k}_1^{(h)},\overline{k}_2^{(h)};\mathbf{v}) = \overline{\Phi}_h(\mathbf{v}) \ \text{for all } \mathbf{v} \in P_0(\overline{r}_h;a)$$

with the possible exception of at most

$$\left(10^8(1+\gamma^2)\,12\ell\,\Bigl(1+(1+\sqrt{2})^{-\frac{k}{3}+1}\Bigr)^2(1+\sqrt{2})^{-\frac{k}{3}+1} + 400\,(1+\sqrt{2})^{-k/6}\right)2^{\overline{r}_h} \tag{6.85}$$

integers $a$ in $0 \le a < 2^{\overline{r}_h}$.

Finally, we have the following independence relations:

1. $\varphi_1(\mathbf{v}), \varphi_2(\mathbf{v}), \ldots, \varphi_b(\mathbf{v})$, $\mathbf{v} \in P_0$, represent independent and identically distributed random variables;
2. $\overline{\varphi}_1(\mathbf{v}), \overline{\varphi}_2(\mathbf{v}), \ldots, \overline{\varphi}_b(\mathbf{v})$, $\mathbf{v} \in P_0$, represent independent and identically distributed random variables;
3. for all $1 \le h \le b-1$, $\{\varphi_1, \ldots, \varphi_h\}$ and $\{\overline{\varphi}_{h+1}, \ldots, \overline{\varphi}_b\}$ are independent;
4. for all $1 \le h \le b-2$, $\{\overline{\varphi}_1, \ldots, \overline{\varphi}_h\}$ and $\{\varphi_{h+2}, \ldots, \varphi_b\}$ are independent.

Of course the same holds for the extensions $\Phi_h$ and $\overline{\Phi}_h$, $1 \le h \le b$.

Proof. Sections 6.1 and 6.2 prove the special case $d = 0$. In our first application we need this special case only. In later applications, however, we need the general case $d \ge 0$. Luckily the arguments of Sects. 6.1 and 6.2 work for the general case $d \ge 0$ without any modification. $\square$

To prove a CLT, we certainly need some information about the variance. The following somewhat complicated lemma (see Lemma 6.4) provides the necessary information about the variance. Let $H_{K,L} = H_{K,L}(\gamma)$ denote the hyperbolic region

$$H_{K,L}(\gamma) = \Bigl\{ (x,y) \in \mathbb{R}^2 :\ -\gamma \le x^2 - 2y^2 \le \gamma \ \text{where } K \le x + y\sqrt{2} \le L \Bigr\}. \tag{6.86}$$

(Note that the choice $K = 1$, $L = 2\sqrt{2}\,e^{N}$ in (6.86) basically gives back the hyperbolic needle $H_\gamma(\sqrt{2};N)$ in (6.2).)
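We claim

$$\mathrm{Area}(H_{K,L}(\gamma)) = \frac{\gamma}{\sqrt{2}} \log(L/K). \tag{6.87}$$

To prove (6.87), we switch to the new variables $u = x + y\sqrt{2}$ and $v = x - y\sqrt{2}$, which is equivalent to

$$x = \frac{u+v}{2} \ \text{and} \ y = \frac{u-v}{2\sqrt{2}} \quad \text{with Jacobian} \quad D = \left|\frac{\partial(x,y)}{\partial(v,u)}\right| = \left|\det\begin{pmatrix} 1/2 & 1/2 \\ -2^{-3/2} & 2^{-3/2} \end{pmatrix}\right| = 2^{-3/2}.$$

Thus we have

$$\mathrm{area}(H_{K,L}(\gamma)) = 2^{-3/2}\,\mathrm{area}\bigl\{(u,v) \in \mathbb{R}^2 :\ -\gamma \le uv \le \gamma,\ K \le u \le L\bigr\} = 2^{-3/2} \int_{u=K}^{L} \int_{v=-\gamma/u}^{\gamma/u} 1 \, dv \, du = \frac{2\gamma}{2^{3/2}} \int_{u=K}^{L} \frac{1}{u}\, du = \frac{\gamma}{\sqrt{2}} \log(L/K),$$

proving (6.87).

Lemma 6.4 (“variance lemma”). We have

$$\int_{P_0} \Bigl( \bigl|(H_{K,L}(\gamma) - \mathbf{v}) \cap \mathbb{Z}^2\bigr| - \mathrm{Area}(H_{K,L}(\gamma)) \Bigr)^2 d\mathbf{v} = \sigma^2(\gamma) \log\frac{L}{K} + O(1), \tag{6.88}$$

where

$$\sigma^2(\gamma) = \frac{4}{\log(1+\sqrt{2})} \sum_{n=1}^{\infty} \frac{R_{\pm}(n)}{n^2} \Bigl( \Theta^2(\pi^2\gamma n/2) + \Psi^2(\pi^2\gamma n/2) \Bigr). \tag{6.89}$$

Here $R_{\pm}(n)$ denotes the number of primary representations of $x^2 - 2y^2 = \pm n$ (the definition of primary representation is given at the beginning of the proof of Proposition 2.20 in Sect. 2.6), and $\Theta(z)$, $\Psi(z)$ are two nonelementary functions defined as improper integrals:

$$\Theta(z) = \int_0^{\infty} \cos(x) \sin(z/x)\, dx \tag{6.90}$$

and

$$\Psi(z) = \int_0^{\infty} \sin(x) \sin(z/x)\, dx. \tag{6.91}$$

The proof of Lemma 6.4 is nontrivial; it is postponed to Sects. 6.6 and 6.7.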
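The area formula (6.87) is easy to sanity-check numerically. The sketch below assumes the reading of (6.86) as the set where $-\gamma \le x^2 - 2y^2 \le \gamma$ and $K \le x + y\sqrt{2} \le L$. For each fixed $y$ it computes the exact length of the horizontal slice of that set and integrates the slice lengths with a midpoint rule. The function names, the step count, and the sample parameters are illustrative assumptions.

```python
import math


def slice_length(y, gamma, K, L):
    """Length of {x : |x^2 - 2y^2| <= gamma and K <= x + y*sqrt(2) <= L} for a fixed y."""
    lo = K - math.sqrt(2) * y
    hi = L - math.sqrt(2) * y
    outer = math.sqrt(2 * y * y + gamma)
    inner = math.sqrt(max(0.0, 2 * y * y - gamma))
    total = 0.0
    # |x| must lie in [inner, outer]; treat the positive and negative branch separately
    for a, b in ((inner, outer), (-outer, -inner)):
        total += max(0.0, min(b, hi) - max(a, lo))
    return total


def area_numeric(gamma, K, L, steps=20000):
    """Midpoint-rule integral of the slice lengths over a y-range containing the region."""
    y_min = (K - gamma / K) / (2 * math.sqrt(2))
    y_max = (L + gamma / K) / (2 * math.sqrt(2))
    h = (y_max - y_min) / steps
    return h * sum(slice_length(y_min + (j + 0.5) * h, gamma, K, L) for j in range(steps))


def area_formula(gamma, K, L):
    """Right-hand side of (6.87)."""
    return gamma / math.sqrt(2) * math.log(L / K)


for gamma, K, L in [(1.0, 1.0, 8.0), (0.5, 2.0, 20.0), (2.5, 1.0, 30.0)]:
    print(gamma, K, L, area_numeric(gamma, K, L), area_formula(gamma, K, L))
```

The two columns should agree to several decimal places, which reflects that the area depends on $K$ and $L$ only through the ratio $L/K$.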
Next we explain why the infinite series in (6.89) is convergent. We need the explicit formula for $R_{\pm}(n)$ (probably due to Dirichlet) that is based on the prime factorization of $n$, where the primes are classified (mod 8). Let

$$n = 2^{a}\, p_1^{b_1} \cdots p_\ell^{b_\ell}\, q_1^{c_1} \cdots q_m^{c_m}$$

be the prime factorization of $n$ such that the first group of primes $p_i$ satisfy $p_i \equiv \pm 1$ (mod 8) and the second group of primes $q_j$ satisfy $q_j \equiv \pm 3$ (mod 8). Then

$$R_{\pm}(n) = (1+b_1) \cdots (1+b_\ell)$$

if every power $c_j$ is even, and $R_{\pm}(n) = 0$ if at least one power $c_j$ is odd. This explicit formula implies the upper bound

$$0 \le R_{\pm}(n) \le (1+a)(1+b_1) \cdots (1+b_\ell)(1+c_1) \cdots (1+c_m) = \tau(n) = \sum_{d \mid n} 1, \tag{6.92}$$

where $\tau(n) = \sum_{d \mid n} 1$ denotes the divisor function.

Since the terms in (6.89) are all nonnegative, to prove convergence it suffices to show boundedness. We need to know the asymptotic behavior of the nonelementary functions $\Theta(z)$ and $\Psi(z)$; it is described by the following lemma.
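Lemma 6.5. For $z > 1$ we have

$$\Theta(z) = \int_0^{\infty} \cos(x) \sin(z/x)\, dx = \frac{\sqrt{\pi}}{2\sqrt{2}}\, z^{1/4} \Bigl( \sin(2\sqrt{z}) + \cos(2\sqrt{z}) + O(z^{-1/24}) \Bigr) \tag{6.93}$$

and

$$\Psi(z) = \int_0^{\infty} \sin(x) \sin(z/x)\, dx = \frac{\sqrt{\pi}}{2\sqrt{2}}\, z^{1/4} \Bigl( \sin(2\sqrt{z}) - \cos(2\sqrt{z}) + O(z^{-1/24}) \Bigr), \tag{6.94}$$

and finally for $0 < z \le 1$ we have

$$|\Theta(z)| \le 3\sqrt{z} \quad \text{and} \quad |\Psi(z)| \le 2\sqrt{z}. \tag{6.95}$$

We postpone the proof of Lemma 6.5 to Sects. 6.6 and 6.7. By Lemma 6.5,

$$\Theta^2(\pi^2\gamma n/2) + \Psi^2(\pi^2\gamma n/2) = O\bigl((\gamma n)^{1/2}\bigr) + O(1). \tag{6.96}$$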
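The explicit formula for $R_{\pm}(n)$ quoted before Lemma 6.5 and the divisor bound (6.92) can be evaluated directly. The following minimal sketch implements only that quoted formula by trial-division factoring. It does not count primary representations independently, and the function names are illustrative.

```python
def factorize(n):
    """Prime factorization of n >= 1 as a dict {prime: exponent} (trial division)."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def r_pm(n):
    """R_±(n) according to the formula quoted in the text (primes classified mod 8)."""
    result = 1
    for p, e in factorize(n).items():
        if p == 2:
            continue                    # the power of 2 does not affect the count
        if p % 8 in (1, 7):             # p ≡ ±1 (mod 8): contributes a factor 1 + exponent
            result *= e + 1
        elif e % 2 == 1:                # p ≡ ±3 (mod 8) with an odd exponent: no representation
            return 0
    return result


def tau(n):
    """Divisor function: the number of positive divisors of n."""
    result = 1
    for e in factorize(n).values():
        result *= e + 1
    return result


for n in (1, 3, 7, 9, 14, 49, 63):
    print(n, r_pm(n), tau(n))
print(all(0 <= r_pm(n) <= tau(n) for n in range(1, 2001)))
```

Since the formula drops the factors coming from the prime 2 and from the primes $\equiv \pm 3$ (mod 8), the bound $R_{\pm}(n) \le \tau(n)$ is immediate, and the loop confirms it for small $n$.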
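Also we use the well-known number-theoretic fact that the divisor function is relatively small: $\tau(n) = O(n^{\varepsilon})$ for any $\varepsilon > 0$. Combining this with (6.92) and (6.96), we have

$$\sum_{n=1}^{\infty} \frac{R_{\pm}(n)}{n^2} \Bigl( \Theta^2(\pi^2\gamma n/2) + \Psi^2(\pi^2\gamma n/2) \Bigr) = O(\sqrt{\gamma}) \left( \sum_{n=1}^{\infty} n^{\varepsilon - 3/2} \right) + O(1) \left( \sum_{n=1}^{\infty} n^{\varepsilon - 2} \right) = O(\sqrt{\gamma}) + O(1), \tag{6.97}$$

proving the boundedness of series (6.89).

By (6.89), the “variance constant” $\sigma^2 = \sigma^2(\gamma)$ is a sum of infinitely many terms $\ge 0$, but this fact alone does not guarantee that $\sigma^2 > 0$, and it is even less clear why $\sigma^2 = \sigma^2(\gamma)$ cannot be “extremely close to zero.” The following lemma settles this issue.

Lemma 6.6. There are absolute constants $0 < c_1 < c_2$ (independent of $\gamma$) such that

$$c_1 \gamma < \sigma^2(\gamma) < c_2 \gamma \ \text{for all } 0 < \gamma \le 1$$

and

$$c_1 \sqrt{\gamma} < \sigma^2(\gamma) < c_2 \sqrt{\gamma} \ \text{for all } \gamma > 1.$$

Moreover, we have the asymptotic formula

$$\lim_{\gamma \to \infty} \frac{\sigma^2(\gamma)}{\sqrt{\gamma}} = \frac{\pi^2}{\sqrt{2}\,\log(1+\sqrt{2})} \sum_{n=1}^{\infty} \frac{R_{\pm}(n)}{n^{3/2}}.$$

We postpone the proof of Lemma 6.6 to Sects. 6.6 and 6.7.

The following lemma is the link between Lemmas 6.3 and 6.4. Let $E\Phi_h$ denote the expectation of the random variable $\Phi_h(\mathbf{v})$, $\mathbf{v} \in P_0$; formally,

$$E\Phi_h = \int_{P_0} \Phi_h(\mathbf{v})\, d\mathbf{v}. \tag{6.98}$$

Similarly, let $E\overline{\Phi}_h$ denote the expectation of the random variable $\overline{\Phi}_h(\mathbf{v})$, $\mathbf{v} \in P_0$. Write

$$\Phi_{h,0} = \Phi_h - E\Phi_h \quad \text{and} \quad \overline{\Phi}_{h,0} = \overline{\Phi}_h - E\overline{\Phi}_h. \tag{6.99}$$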
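Lemma 6.7. Under the condition of Lemma 6.3, we have (using the same notation)

$$\left| \left( \mathrm{Var} \sum_{h=1}^{b} \bigl( \Phi_h(\mathbf{v}) + \overline{\Phi}_h(\mathbf{v}) \bigr) \right)^{1/2} - \Bigl( \sigma^2(\gamma)\, b(k+3\ell) \log(1+\sqrt{2}) + O(1) \Bigr)^{1/2} \right| \le 10^4(1+\gamma^2) + b\,\varepsilon(\gamma;k,\ell)\,10^4(1+\gamma^2)(\kappa_2-\kappa_1+1) + \sqrt{b\,\varepsilon(\gamma;k,\ell)}\;10^4(1+\gamma^2)(\kappa_2-\kappa_1+1),$$

where $\varepsilon(\gamma;k,\ell)$ is defined in (6.83). Similarly,

$$\left| \bigl( \mathrm{Var}_{\mathbf{v} \in P_0}\, \overline{\Phi}_h(\mathbf{v}) \bigr)^{1/2} - \Bigl( \sigma^2(\gamma)\, 3\ell \log(1+\sqrt{2}) + O(1) \Bigr)^{1/2} \right| \le 10^4(1+\gamma^2) + \varepsilon(\gamma;k,\ell)\,10^4(1+\gamma^2)\,3\ell + \sqrt{\varepsilon(\gamma;k,\ell)}\;10^4(1+\gamma^2)\,3\ell.$$

Finally, we have

$$\left| E_{\mathbf{v} \in P_0} \left( f(\kappa_1,\kappa_2;\mathbf{v}) - \sum_{h=1}^{b} \bigl( \Phi_h(\mathbf{v}) + \overline{\Phi}_h(\mathbf{v}) \bigr) \right) \right| \le \varepsilon(\gamma;k,\ell)\,10^4(1+\gamma^2)\,b^2(k+3\ell).$$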
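Proof of Lemma 6.7. By Lemma 5.8 and (6.5),

$$\int_{P_0} f_i(\mathbf{v})\, d\mathbf{v} = \int_{P_0} \bigl|(T_i(\gamma) - \mathbf{v}) \cap \mathbb{Z}^2\bigr|\, d\mathbf{v} = \int_{[0,1)^2} \bigl|(T_i(\gamma) - \mathbf{v}) \cap \mathbb{Z}^2\bigr|\, d\mathbf{v} = \mathrm{area}(T_i(\gamma)) = \frac{\gamma \log(1+\sqrt{2})}{\sqrt{2}}. \tag{6.100}$$

Equation (6.100) means that the random variable $f_i(\mathbf{v})$, $\mathbf{v} \in P_0$, has expectation

$$E f_i = \frac{\gamma \log(1+\sqrt{2})}{\sqrt{2}}. \tag{6.101}$$

We are going to apply Lemma 6.4 with [see (6.78)]

$$K = \frac{1}{2\sqrt{2}} (1+\sqrt{2})^{\kappa_1 - 1} \quad \text{and} \quad L = \frac{1}{2\sqrt{2}} (1+\sqrt{2})^{\kappa_2}, \quad \text{where } \kappa_1 = d + \ell + 1 \ \text{and} \ \kappa_2 = d + (3b+1)\ell + bk. \tag{6.102}$$

In view of (6.4), $q_{\kappa_1 - 1}$ is the nearest integer to $K$ and $q_{\kappa_2}$ is the nearest integer to $L$. Combining this with (6.86), we obtain that the symmetric set-difference

$$\left( H_{K,L}(\gamma) \setminus \bigcup_{\kappa_1 \le i \le \kappa_2} T_i(\gamma) \right) \cup \left( \bigcup_{\kappa_1 \le i \le \kappa_2} T_i(\gamma) \setminus H_{K,L}(\gamma) \right)$$

can be easily covered by less than $10^4(1+\gamma^2)$ rectangles that all have slope $1/\sqrt{2}$ and area $1/5$. So by Lemma 5.5,

$$\Bigl| \bigl|(H_{K,L}(\gamma) - \mathbf{v}) \cap \mathbb{Z}^2\bigr| - f(\kappa_1,\kappa_2;\mathbf{v}) \Bigr| \le 10^4(1+\gamma^2). \tag{6.103}$$

Moreover, by (6.87) and (6.5) we have

$$\mathrm{area}(H_{K,L}(\gamma)) = \frac{\gamma}{\sqrt{2}} \log(L/K) = \frac{\gamma}{\sqrt{2}} \log (1+\sqrt{2})^{\kappa_2-\kappa_1+1} = \frac{\gamma}{\sqrt{2}} (\kappa_2-\kappa_1+1) \log(1+\sqrt{2}) = \mathrm{area}\left( \bigcup_{\kappa_1 \le i \le \kappa_2} T_i(\gamma) \right) = E f(\kappa_1,\kappa_2), \tag{6.104}$$

where $E f(\kappa_1,\kappa_2)$ denotes the expected value of the random variable $f(\kappa_1,\kappa_2;\mathbf{v})$, $\mathbf{v} \in P_0$.

By (6.103),

$$E_{\mathbf{v} \in P_0} \Bigl( \bigl|(H_{K,L}(\gamma) - \mathbf{v}) \cap \mathbb{Z}^2\bigr| - f(\kappa_1,\kappa_2;\mathbf{v}) \Bigr)^2 \le 10^8(1+\gamma^2)^2. \tag{6.105}$$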
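We recall Minkowski’s inequality:

$$\|F + G\|_p \le \|F\|_p + \|G\|_p \ \text{for } 1 \le p \le \infty, \tag{6.106}$$

where $\|\cdot\|_p$ denotes the $L_p$-norm. Note that (6.106) plays the role of the triangle inequality in the $L_p$-space, and it will be repeatedly used below.

Combining (6.103)–(6.105), and Minkowski’s inequality in the special case $p = 2$, we have (Var stands for the variance)

$$\left| \Bigl( \mathrm{Var}_{\mathbf{v} \in P_0} \bigl|(H_{K,L}(\gamma) - \mathbf{v}) \cap \mathbb{Z}^2\bigr| \Bigr)^{1/2} - \bigl( \mathrm{Var}_{\mathbf{v} \in P_0} f(\kappa_1,\kappa_2;\mathbf{v}) \bigr)^{1/2} \right| \le \left( E_{\mathbf{v} \in P_0} \Bigl( \bigl|(H_{K,L}(\gamma) - \mathbf{v}) \cap \mathbb{Z}^2\bigr| - f(\kappa_1,\kappa_2;\mathbf{v}) \Bigr)^2 \right)^{1/2} \le 10^4(1+\gamma^2). \tag{6.107}$$

By repeated application of Minkowski’s inequality with $p = 2$, we have

$$\left| \bigl( \mathrm{Var}_{\mathbf{v} \in P_0} f(\kappa_1,\kappa_2;\mathbf{v}) \bigr)^{1/2} - \left( \mathrm{Var}_{\mathbf{v} \in P_0} \sum_{h=1}^{b} \bigl( \Phi_h(\mathbf{v}) + \overline{\Phi}_h(\mathbf{v}) \bigr) \right)^{1/2} \right| \le \left( \mathrm{Var}_{\mathbf{v} \in P_0} \left( f(\kappa_1,\kappa_2;\mathbf{v}) - \sum_{h=1}^{b} \bigl( \Phi_h(\mathbf{v}) + \overline{\Phi}_h(\mathbf{v}) \bigr) \right) \right)^{1/2}$$

$$\le \left( E_{\mathbf{v} \in P_0} \left( f(\kappa_1,\kappa_2;\mathbf{v}) - \sum_{h=1}^{b} \bigl( \Phi_h(\mathbf{v}) + \overline{\Phi}_h(\mathbf{v}) \bigr) \right)^2 \right)^{1/2} + \left| E_{\mathbf{v} \in P_0} \left( f(\kappa_1,\kappa_2;\mathbf{v}) - \sum_{h=1}^{b} \bigl( \Phi_h(\mathbf{v}) + \overline{\Phi}_h(\mathbf{v}) \bigr) \right) \right|. \tag{6.108}$$

We recall the following corollary of Lemma 6.3:

$$\mathrm{area}\left\{ \mathbf{v} \in P_0 :\ f(\kappa_1,\kappa_2;\mathbf{v}) \ne \sum_{h=1}^{b} \bigl( \Phi_h(\mathbf{v}) + \overline{\Phi}_h(\mathbf{v}) \bigr) \right\} \le b\,\varepsilon, \tag{6.109}$$

where

$$\varepsilon = \varepsilon(\gamma;k,\ell) = 10^8(1+\gamma^2)\,4k\,\Bigl(1+(1+\sqrt{2})^{-\ell+1}\Bigr)^2(1+\sqrt{2})^{-\ell+1} + 400\,(1+\sqrt{2})^{-\ell/2} + 10^8(1+\gamma^2)\,12\ell\,\Bigl(1+(1+\sqrt{2})^{-\frac{k}{3}+1}\Bigr)^2(1+\sqrt{2})^{-\frac{k}{3}+1} + 400\,(1+\sqrt{2})^{-k/6}. \tag{6.110}$$

Furthermore, by Lemma 5.5,

$$\max_{\mathbf{v} \in P_0} f(\kappa_1,\kappa_2;\mathbf{v}) \le 10^4(1+\gamma^2)(\kappa_2-\kappa_1+1), \tag{6.111}$$

and similarly

$$\max_{\mathbf{v} \in P_0} \sum_{h=1}^{b} \bigl( \Phi_h(\mathbf{v}) + \overline{\Phi}_h(\mathbf{v}) \bigr) \le 10^4(1+\gamma^2)(\kappa_2-\kappa_1+1). \tag{6.112}$$

By (6.109)–(6.112),

$$\left| E_{\mathbf{v} \in P_0} \left( f(\kappa_1,\kappa_2;\mathbf{v}) - \sum_{h=1}^{b} \bigl( \Phi_h(\mathbf{v}) + \overline{\Phi}_h(\mathbf{v}) \bigr) \right) \right| \le b\,\varepsilon\,10^4(1+\gamma^2)(\kappa_2-\kappa_1+1) = b\,\varepsilon\,10^4(1+\gamma^2)\,b(k+3\ell), \tag{6.113}$$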
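and

$$\left( E_{\mathbf{v} \in P_0} \left( f(\kappa_1,\kappa_2;\mathbf{v}) - \sum_{h=1}^{b} \bigl( \Phi_h(\mathbf{v}) + \overline{\Phi}_h(\mathbf{v}) \bigr) \right)^2 \right)^{1/2} \le \sqrt{b\,\varepsilon}\;10^4(1+\gamma^2)(\kappa_2-\kappa_1+1). \tag{6.114}$$

Combining (6.107), (6.108), (6.113), and (6.114), the triangle inequality gives

$$\left| \Bigl( \mathrm{Var}_{\mathbf{v} \in P_0} \bigl|(H_{K,L}(\gamma) - \mathbf{v}) \cap \mathbb{Z}^2\bigr| \Bigr)^{1/2} - \left( \mathrm{Var}_{\mathbf{v} \in P_0} \sum_{h=1}^{b} \bigl( \Phi_h(\mathbf{v}) + \overline{\Phi}_h(\mathbf{v}) \bigr) \right)^{1/2} \right|$$

$$\le \left| \Bigl( \mathrm{Var}_{\mathbf{v} \in P_0} \bigl|(H_{K,L}(\gamma) - \mathbf{v}) \cap \mathbb{Z}^2\bigr| \Bigr)^{1/2} - \bigl( \mathrm{Var}_{\mathbf{v} \in P_0} f(\kappa_1,\kappa_2;\mathbf{v}) \bigr)^{1/2} \right| + \left| \bigl( \mathrm{Var}_{\mathbf{v} \in P_0} f(\kappa_1,\kappa_2;\mathbf{v}) \bigr)^{1/2} - \left( \mathrm{Var}_{\mathbf{v} \in P_0} \sum_{h=1}^{b} \bigl( \Phi_h(\mathbf{v}) + \overline{\Phi}_h(\mathbf{v}) \bigr) \right)^{1/2} \right|$$

$$\le 10^4(1+\gamma^2) + b\,\varepsilon\,10^4(1+\gamma^2)(\kappa_2-\kappa_1+1) + \sqrt{b\,\varepsilon}\;10^4(1+\gamma^2)(\kappa_2-\kappa_1+1). \tag{6.115}$$

By using (6.88) in Lemma 6.4 with the choice (6.102), we have

$$\mathrm{Var}_{\mathbf{v} \in P_0} \bigl|(H_{K,L}(\gamma) - \mathbf{v}) \cap \mathbb{Z}^2\bigr| = \sigma^2(\gamma)(\kappa_2-\kappa_1+1) \log(1+\sqrt{2}) + O(1). \tag{6.116}$$

Combining (6.115) and (6.116), we have

$$\left| \left( \mathrm{Var}_{\mathbf{v} \in P_0} \sum_{h=1}^{b} \bigl( \Phi_h(\mathbf{v}) + \overline{\Phi}_h(\mathbf{v}) \bigr) \right)^{1/2} - \Bigl( \sigma^2(\gamma)(\kappa_2-\kappa_1+1) \log(1+\sqrt{2}) + O(1) \Bigr)^{1/2} \right| \le 10^4(1+\gamma^2) + b\,\varepsilon\,10^4(1+\gamma^2)(\kappa_2-\kappa_1+1) + \sqrt{b\,\varepsilon}\;10^4(1+\gamma^2)(\kappa_2-\kappa_1+1). \tag{6.117}$$

Repeating the proof of (6.117) with $\kappa_2-\kappa_1+1 = 3\ell$ instead of $\kappa_2-\kappa_1+1 = b(k+3\ell)$, we obtain

$$\left| \bigl( \mathrm{Var}_{\mathbf{v} \in P_0}\, \overline{\Phi}_h(\mathbf{v}) \bigr)^{1/2} - \Bigl( \sigma^2(\gamma)\,3\ell \log(1+\sqrt{2}) + O(1) \Bigr)^{1/2} \right| \le 10^4(1+\gamma^2) + \varepsilon\,10^4(1+\gamma^2)\,3\ell + \sqrt{\varepsilon}\;10^4(1+\gamma^2)\,3\ell. \tag{6.118}$$

Combining (6.113), (6.117)–(6.118), and (6.110), Lemma 6.7 follows. $\square$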
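The mechanism of the proof is that two functions which agree outside a small exceptional set have nearly the same standard deviation, because the $L_2$ triangle inequality bounds the difference of the standard deviations by the $L_2$ distance. The toy sketch below illustrates this on a finite uniform sample space. The sample size, the sup bound `M`, the exceptional-set size and the seed are hypothetical choices, unrelated to the lattice functions of the text.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def std(values):
    """Standard deviation with respect to the uniform measure on the sample points."""
    m = mean(values)
    return math.sqrt(mean([(x - m) ** 2 for x in values]))


def l2_distance(xs, ys):
    return math.sqrt(mean([(x - y) ** 2 for x, y in zip(xs, ys)]))


rng = random.Random(0)                   # fixed seed, so the run is reproducible
n, M = 10_000, 50                        # number of sample points and sup bound (hypothetical)
F = [rng.randint(0, M) for _ in range(n)]
G = list(F)
exceptional = rng.sample(range(n), 25)   # small set on which G may differ from F
for i in exceptional:
    G[i] = rng.randint(0, M)
p = len(exceptional) / n                 # measure of the exceptional set

std_gap = abs(std(F) - std(G))           # left-hand side of the Minkowski-type bound
l2_gap = l2_distance(F, G)               # at most sqrt(p) * M, as in (6.114)
mean_gap = abs(mean(F) - mean(G))        # at most p * M, as in (6.113)
print(std_gap, "<=", l2_gap, "<=", math.sqrt(p) * M)
print(mean_gap, "<=", p * M)
```

The factor $\sqrt{p}$ in the $L_2$ bound is why both $b\varepsilon$ and $\sqrt{b\varepsilon}$ appear in Lemma 6.7.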
6.4 Applying Probability Theory

We are now ready to prove Theorem 5.4. Theorem 5.4 is about the typical fluctuations of the lattice point counting function

$$F(\sqrt{2};\beta;\gamma;N) = \Bigl| \bigl( H_\gamma(\sqrt{2};N) - \mathbf{v}(\beta) \bigr) \cap \mathbb{Z}^2 \Bigr|, \tag{6.119}$$

where parameter $\beta$ runs in the interval $0 \le \beta < 1$, i.e., we study the effect of the one-dimensional family of translations by the vectors $\mathbf{v}(\beta) = (\beta, 0)$ [see (6.1)–(6.3)]. As we explained at the beginning of Sect. 5.1, it is natural to switch from the linear scale $N$ to the exponential scale $e^N$.

Let $I_0 = I_0(\gamma;N)$ denote the largest integer $i$ such that the hyperbolic triangle $T_i = T_i(\gamma)$ is still contained in the hyperbolic needle $H_\gamma(\sqrt{2};e^N)$. By definition,

$$I_0 = I_0(\gamma;N) = \max\bigl\{ i \in \mathbb{Z} :\ \gamma\, q_{i+2} \le e^N \bigr\},$$

and using (6.4):

$$q_i = \frac{1}{2\sqrt{2}} \Bigl( (1+\sqrt{2})^i - (1-\sqrt{2})^i \Bigr) = \text{nearest integer to } \frac{1}{2\sqrt{2}} (1+\sqrt{2})^i,$$

we obtain that

$$I_0 = I_0(\gamma;N) = \frac{N + \log(2\sqrt{2}/\gamma)}{\log(1+\sqrt{2})} - 2, \tag{6.120}$$

where the slightly ambiguous (6.120) means either the upper or the lower integral part of the right-hand side. The set-difference

$$H_\gamma(\sqrt{2};e^N) \setminus \bigcup_{0 \le i \le I_0(\gamma;N)} T_i(\gamma)$$

can be easily covered by less than $10^4(1+\gamma^2)$ rectangles that all have slope $1/\sqrt{2}$ and area $1/5$. The first consequence of this fact is the straightforward inequality

$$\mathrm{area}\bigl( H_\gamma(\sqrt{2};e^N) \bigr) \ge \mathrm{area}\left( \bigcup_{0 \le i \le I_0(\gamma;N)} T_i(\gamma) \right) \ge \mathrm{area}\bigl( H_\gamma(\sqrt{2};e^N) \bigr) - \frac{10^4}{5}(1+\gamma^2),$$

and the second consequence via Lemma 5.5 is the following:

$$\sum_{0 \le i \le I_0(\gamma;N)} f_i(\mathbf{v}(\beta)) \le F(\sqrt{2};\beta;\gamma;e^N) \le \sum_{0 \le i \le I_0(\gamma;N)} f_i(\mathbf{v}(\beta)) + 10^4(1+\gamma^2) \tag{6.121}$$

for every vector $\mathbf{v}(\beta) = (\beta, 0)$, $0 \le \beta < 1$ (and of course for every $\gamma > 0$).

We choose

$$b = \bigl\lfloor \sqrt{I_0} \bigr\rfloor \quad \text{and} \quad k = \bigl\lfloor \sqrt{I_0} - (\log I_0)^2 \bigr\rfloor - \{0 \text{ or } 1 \text{ or } 2\} \tag{6.122}$$

in such a way that $k$ is divisible by 3. Then

$$I_0 - \sqrt{I_0}\,(\log I_0)^2 = \sqrt{I_0}\,\bigl( \sqrt{I_0} - (\log I_0)^2 \bigr) \ge bk > \bigl( \sqrt{I_0} - 1 \bigr)\bigl( \sqrt{I_0} - (\log I_0)^2 - 3 \bigr) > I_0 - \sqrt{I_0}\,\bigl( (\log I_0)^2 + 4 \bigr). \tag{6.123}$$

For simplicity, we assume first that $I_0$ [defined in (6.120)] has the special form

$$I_0 = I_0(\gamma;N) = \kappa_2 = (3b+1)\ell + bk \tag{6.124}$$

(see (6.78) with $d = 0$). By (6.123) and (6.124),

$$\frac{\sqrt{I_0}\,(\log I_0)^2}{3b+1} \le \ell < \frac{\sqrt{I_0}\,\bigl( (\log I_0)^2 + 4 \bigr)}{3b+1},$$

and by (6.122),

$$\frac{1}{3} + \frac{1}{3b} > \frac{b+1}{3b+1} > \frac{\sqrt{I_0}}{3b+1} > \frac{b}{3b+1} > \frac{1}{3} - \frac{1}{3b},$$

so we have the upper and lower bounds

$$\frac{1}{3}(\log I_0)^2 - 1 \le \ell < \frac{1}{3}(\log I_0)^2 + 2. \tag{6.125}$$
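The parameter choice (6.122) and the resulting bounds (6.123) and (6.125) can be checked for concrete values of $I_0$. The sketch below assumes natural logarithms and treats $\ell = (I_0 - bk)/(3b+1)$ as a real number, which is an integer exactly when $I_0$ has the special form (6.124). The function name and the sample values of $I_0$ are illustrative.

```python
import math


def choose_parameters(I0):
    """Return (b, k, ell, log_sq) following (6.122) and (6.124); ell is returned as a float."""
    log_sq = math.log(I0) ** 2
    b = math.isqrt(I0)                          # b = floor(sqrt(I0))
    k = math.floor(math.sqrt(I0) - log_sq)
    k -= k % 3                                  # subtract 0, 1 or 2 so that k is divisible by 3
    ell = (I0 - b * k) / (3 * b + 1)            # solves I0 = (3b + 1) * ell + b * k
    return b, k, ell, log_sq


for I0 in (10**4, 10**5, 10**6, 10**8):
    b, k, ell, log_sq = choose_parameters(I0)
    root = math.sqrt(I0)
    bounds_6_123 = I0 - root * (log_sq + 4) < b * k <= I0 - root * log_sq
    bounds_6_125 = log_sq / 3 - 1 <= ell < log_sq / 3 + 2
    print(I0, b, k, round(ell, 3), bounds_6_123, bounds_6_125)
```

In each case $\ell$ stays within a bounded distance of $\tfrac{1}{3}(\log I_0)^2$, so the gaps of length $3\ell$ are tiny compared with the blocks of length $k \approx \sqrt{I_0}$.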
By (6.124),

$$\sum_{0 \le i \le I_0(\gamma;N)} f_i(\mathbf{v}(\beta)) = f(\kappa_1,\kappa_2;\mathbf{v}(\beta)) + f(0,\kappa_1-1;\mathbf{v}(\beta)), \quad \text{where } \kappa_1 - 1 = \ell \tag{6.126}$$

(see (6.78) with $d = 0$). Since $\ell$ is “relatively small,” the dominating part of (6.126) is $f(\kappa_1,\kappa_2;\mathbf{v}(\beta))$. In view of Lemma 6.3 the distribution of

$$f(\kappa_1,\kappa_2;\mathbf{v}(\beta)), \quad \mathbf{v}(\beta) = (\beta,0), \quad 0 \le \beta < 1,$$

is “almost the same” as that of $f(\kappa_1,\kappa_2;\mathbf{v})$, $\mathbf{v} \in P_0$. Moreover, we have the equality

$$f(\kappa_1,\kappa_2;\mathbf{v}) = \sum_{h=1}^{b} \Phi_h(\mathbf{v}) + \sum_{h=1}^{b} \overline{\Phi}_h(\mathbf{v}) \tag{6.127}$$

for the “overwhelming majority” of $\mathbf{v} \in P_0$. Since parameter $\ell$ is “small” compared to $k$, the sum

$$\sum_{h=1}^{b} \Phi_h(\mathbf{v})$$

is the dominating part in (6.127). This is a sum of independent and identically distributed random variables, so it is natural to apply the standard CLT in probability theory. For later applications we use a more general version that goes beyond identically distributed components. (Note that we already used such a version in Sect. 1.3, see (1.90).)
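6.4.1 Central Limit Theorem with Explicit Error Term (Berry–Esseen version)

Let $Z_1, Z_2, \ldots, Z_n$ be independent random variables with expectation $E Z_i = 0$, variance $E Z_i^2 < \infty$, and also $E|Z_i|^3 < \infty$ for all $1 \le i \le n$. Write

$$W = \sum_{i=1}^{n} E|Z_i|^3 \quad \text{and} \quad V = \sum_{i=1}^{n} E Z_i^2.$$

Then for every real $\lambda$

$$\left| \Pr\left[ \frac{Z_1 + Z_2 + \ldots + Z_n}{\sqrt{V}} \le \lambda \right] - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\lambda} e^{-u^2/2}\, du \right| < \frac{40W}{V^{3/2}}. \tag{6.128}$$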
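To get a feeling for the size of the error term in (6.128), one can compare it with the exact distance in the simplest case. The sketch below takes $Z_i = \pm 1$ with probability $1/2$ each (so $E Z_i^2 = E|Z_i|^3 = 1$, $V = W = n$), computes the exact distribution of the sum from the binomial probabilities, and evaluates the largest deviation from the normal distribution function. The choice of example and the function names are illustrative assumptions.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def binom_pmf(n, j):
    """Pr[Binomial(n, 1/2) = j], computed through logarithms to avoid overflow."""
    log_p = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
             - n * math.log(2.0))
    return math.exp(log_p)


def kolmogorov_distance_signs(n):
    """sup over lambda of |Pr[S_n / sqrt(n) <= lambda] - normal_cdf(lambda)| for n random signs."""
    worst, cdf = 0.0, 0.0
    for j in range(n + 1):
        x = (2 * j - n) / math.sqrt(n)        # location of an atom of S_n / sqrt(V), with V = n
        phi = normal_cdf(x)
        worst = max(worst, abs(cdf - phi))    # left limit at the atom
        cdf += binom_pmf(n, j)
        worst = max(worst, abs(cdf - phi))    # value at the atom
    return worst


def berry_esseen_bound(third_moments, variances):
    """Right-hand side 40 W / V^(3/2) of (6.128)."""
    W, V = sum(third_moments), sum(variances)
    return 40.0 * W / V ** 1.5


n = 10_000
print(kolmogorov_distance_signs(n), berry_esseen_bound([1.0] * n, [1.0] * n))
```

For random signs both quantities decay like $n^{-1/2}$; the constant 40 in (6.128) is generous, which is harmless here because only the order of magnitude is used.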
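In order to apply (6.128) we need some information about the second and third central moments of the sum $\sum_{h=1}^{b} \Phi_h(\mathbf{v})$, $\mathbf{v} \in P_0$. By using the notation (6.98) and (6.99), and the independence relations at the end of Lemma 6.3, we have (Var stands for variance)

$$\mathrm{Var} \sum_{h=1}^{b} \bigl( \Phi_h + \overline{\Phi}_h \bigr) = E \left( \sum_{h=1}^{b} \bigl( \Phi_{h,0} + \overline{\Phi}_{h,0} \bigr) \right)^2 = \sum_{h=1}^{b} \mathrm{Var}(\Phi_h) + \sum_{h=1}^{b} \mathrm{Var}(\overline{\Phi}_h) + \sum_{h=1}^{b} E\,\Phi_{h,0} \overline{\Phi}_{h,0} + \sum_{h=1}^{b-1} E\,\Phi_{h+1,0} \overline{\Phi}_{h,0}. \tag{6.129}$$

We apply the Cauchy–Schwarz inequality:

$$\bigl| E\,\Phi_{h,0} \overline{\Phi}_{h,0} \bigr| \le \sqrt{\mathrm{Var}(\Phi_h)}\, \sqrt{\mathrm{Var}(\overline{\Phi}_h)},$$

and similarly

$$\bigl| E\,\Phi_{h+1,0} \overline{\Phi}_{h,0} \bigr| \le \sqrt{\mathrm{Var}(\Phi_{h+1})}\, \sqrt{\mathrm{Var}(\overline{\Phi}_h)}.$$

Using these inequalities in (6.129), we obtain

$$\left| \mathrm{Var} \sum_{h=1}^{b} \bigl( \Phi_h + \overline{\Phi}_h \bigr) - \sum_{h=1}^{b} \mathrm{Var}(\Phi_h) \right| \le b\,\mathrm{Var}(\overline{\Phi}_1) + (2b-1) \sqrt{\mathrm{Var}(\Phi_1)}\, \sqrt{\mathrm{Var}(\overline{\Phi}_1)},$$

which implies

$$\left| \left( \mathrm{Var} \sum_{h=1}^{b} \bigl( \Phi_h + \overline{\Phi}_h \bigr) \right)^{1/2} - \left( \sum_{h=1}^{b} \mathrm{Var}(\Phi_h) \right)^{1/2} \right| \le \frac{b\,\mathrm{Var}(\overline{\Phi}_1) + (2b-1) \sqrt{\mathrm{Var}(\Phi_1)}\, \sqrt{\mathrm{Var}(\overline{\Phi}_1)}}{\left( \mathrm{Var} \sum_{h=1}^{b} \bigl( \Phi_h + \overline{\Phi}_h \bigr) \right)^{1/2} + \left( \sum_{h=1}^{b} \mathrm{Var}(\Phi_h) \right)^{1/2}}$$

$$\le \frac{b\,\mathrm{Var}(\overline{\Phi}_1)}{\left( \mathrm{Var} \sum_{h=1}^{b} \bigl( \Phi_h + \overline{\Phi}_h \bigr) \right)^{1/2}} + \frac{2b-1}{\sqrt{b}} \sqrt{\mathrm{Var}(\overline{\Phi}_1)}. \tag{6.130}$$

We recall (6.120) and (6.125):

$$I_0 = I_0(\gamma;N) = \frac{N + \log(2\sqrt{2}/\gamma)}{\log(1+\sqrt{2})} - 2$$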
and

$$\frac{1}{3}(\log I_0)^2 - 1 \le \ell < \frac{1}{3}(\log I_0)^2 + 2.$$

We use the elementary fact that given arbitrary constants $C_1 > 1$ and $C_2 < \infty$, the inequality

$$C_1^{(\log N)^2} > N^{C_2} \tag{6.131}$$

holds for every sufficiently large value of $N$. It follows via simple calculations that the choice of parameters $k$ [see (6.122)] and $\ell$ implies the following upper bound for $\varepsilon(\gamma;k,\ell)$ [defined in (6.83)]:

$$\varepsilon(\gamma;k,\ell) \le \frac{10^{12}(1+\gamma^2)^2}{N^8}.$$

Thus by Lemma 6.7,

$$\left| \left( \mathrm{Var} \sum_{h=1}^{b} \bigl( \Phi_h + \overline{\Phi}_h \bigr) \right)^{1/2} - \Bigl( \sigma^2(\gamma)\, b(k+3\ell) \log(1+\sqrt{2}) + O(1) \Bigr)^{1/2} \right| \le 10^4(1+\gamma^2) + \frac{10^{10}(1+\gamma^2)^4}{N^2}, \tag{6.132}$$

and

$$\left| \bigl( \mathrm{Var}\,\overline{\Phi}_h \bigr)^{1/2} - \Bigl( \sigma^2(\gamma)\,3\ell \log(1+\sqrt{2}) + O(1) \Bigr)^{1/2} \right| \le 10^4(1+\gamma^2) + \frac{10^{10}(1+\gamma^2)^2}{N^2}. \tag{6.133}$$

Next we study the third moment. To estimate the third central moment of $\Phi_1(\mathbf{v})$, $\mathbf{v} \in P_0$, we are going to use the following well-known moment inequality: let $X$ be a random variable, then

$$\bigl( E|X|^3 \bigr)^{1/3} \le \bigl( E|X|^4 \bigr)^{1/4}. \tag{6.134}$$

(Note that (6.134) is a special case of the general inequality $(E|X|^u)^{1/u} \le (E|X|^v)^{1/v}$ for all $0 < u \le v$, which follows from Jensen’s inequality applied for the convex function $x^{v/u}$, $x > 0$.)
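The monotonicity of moment norms behind (6.134) can be seen on any small example. The sketch below uses an arbitrary four-point distribution (the values and probabilities are illustrative, not from the text) and prints $(E|X|^u)^{1/u}$ for increasing $u$.

```python
def moment_norm(values, probs, u):
    """(E|X|^u)^(1/u) for a discrete random variable X taking values[i] with probability probs[i]."""
    return sum(p * abs(x) ** u for x, p in zip(values, probs)) ** (1.0 / u)


values = [-3.0, 0.0, 1.0, 5.0]     # hypothetical values of X
probs = [0.1, 0.4, 0.3, 0.2]       # hypothetical probabilities, summing to 1

norms = [moment_norm(values, probs, u) for u in (1, 2, 3, 4)]
print(norms)                       # the sequence is non-decreasing in u
```

Only the step from $u = 3$ to $v = 4$ is needed in the text: it replaces the third absolute moment, which is awkward to expand, by the fourth moment of a sum, which expands cleanly under independence.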
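In view of (6.134), it suffices to estimate the fourth central moment

$$E_{\mathbf{v} \in P_0} \bigl( \Phi_{1,0}(\mathbf{v}) \bigr)^4. \tag{6.135}$$

It is based on another application of Lemma 6.3 where the blocks and the gaps all have the same size $3\ell$. First we divide $k$ by $6\ell$ [see (6.122)–(6.125)]:

$$k = b^* \cdot 6\ell + r^*, \quad \text{where the remainder is in the interval } 0 \le r^* < 6\ell. \tag{6.136}$$

We specify the integral parameters “$b \ge 1$, $d \ge 0$, $\ell \ge 1$, $k \ge 3$” in Lemma 6.3 to be $b^*, d^*, \ell^*, k^*$ as follows:

$$\ell^* = \ell = k^*/3, \quad d^* = 0, \quad b^* \text{ is defined in (6.136)}; \tag{6.137}$$

and of course $\ell$ is defined in (6.124)–(6.125). Write [see (6.78)]

$$\kappa_1^* = \ell^* + 1 = \ell + 1 \quad \text{and} \quad \kappa_2^* = (3b^*+1)\ell^* + b^* k^* = \ell + 6\ell b^*. \tag{6.138}$$

By Lemma 6.3 there exist two sequences of Rademacher-like functions

$$\varphi_1^*, \varphi_2^*, \ldots, \varphi_{b^*}^* \quad \text{and} \quad \overline{\varphi}_1^*(x), \overline{\varphi}_2^*(x), \ldots, \overline{\varphi}_{b^*}^*(x)$$

such that the extensions $\Phi_h^*$, $\overline{\Phi}_h^*$, $1 \le h \le b^*$, defined in (6.62) and (6.74), have the following approximation property:

$$f(\kappa_1^*,\kappa_2^*;\mathbf{v}) = \sum_{h=1}^{b^*} \Phi_h^*(\mathbf{v}) + \sum_{h=1}^{b^*} \overline{\Phi}_h^*(\mathbf{v}) \ \text{for all } \mathbf{v} \in P_0 \tag{6.139}$$

with the possible exception of $\mathbf{v}$’s of total area at most

$$2b^* \left( 10^8(1+\gamma^2)\,12\ell\,\Bigl(1+(1+\sqrt{2})^{-\ell+1}\Bigr)^2 (1+\sqrt{2})^{-\ell+1} + 400\,(1+\sqrt{2})^{-\ell/2} \right). \tag{6.140}$$

We also need the simple fact

$$\max_{1 \le h \le b^*,\ \mathbf{v} \in P_0} \Bigl\{ |\Phi_h^*(\mathbf{v})|,\ \bigl|\overline{\Phi}_h^*(\mathbf{v})\bigr| \Bigr\} \le 10^4(1+\gamma^2)\,3\ell, \tag{6.141}$$

which is a standard application of Lemma 5.5.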
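Write

$$\Phi_{h,0}^* = \Phi_h^* - E\Phi_h^* \quad \text{and} \quad \overline{\Phi}_{h,0}^* = \overline{\Phi}_h^* - E\overline{\Phi}_h^*,$$

that is, the extra 0 in the index indicates that the expectation is 0. By using the independence of the Rademacher-like functions, we have

$$E \left( \sum_{h=1}^{b^*} \Phi_{h,0}^* \right)^4 = b^*\, E\bigl( \Phi_{1,0}^* \bigr)^4 + 3b^*(b^*-1) \Bigl( E\bigl( \Phi_{1,0}^* \bigr)^2 \Bigr)^2 \le b^* \max_{\mathbf{v} \in P_0} \bigl( \Phi_{1,0}^*(\mathbf{v}) \bigr)^4 + 3b^*(b^*-1) \max_{\mathbf{v} \in P_0} \bigl( \Phi_{1,0}^*(\mathbf{v}) \bigr)^4$$

$$\le 3 (b^*)^2 \max_{\mathbf{v} \in P_0} \bigl( \Phi_{1,0}^*(\mathbf{v}) \bigr)^4 \le 3 (b^*)^2 \bigl( 10^4(1+\gamma^2)\,3\ell \bigr)^4, \tag{6.142}$$

where in the last step we used (6.141). Similarly,

$$E \left( \sum_{h=1}^{b^*} \overline{\Phi}_{h,0}^* \right)^4 \le 3 (b^*)^2 \bigl( 10^4(1+\gamma^2)\,3\ell \bigr)^4. \tag{6.143}$$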
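The identity used in the first step of (6.142), namely $E(X_1 + \cdots + X_b)^4 = b\,EX^4 + 3b(b-1)(EX^2)^2$ for independent, identically distributed, mean-zero summands, can be confirmed by exact enumeration. The sketch below uses a hypothetical two-point distribution with mean zero and rational arithmetic; the distribution and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def fourth_moment_of_sum(values, probs, b):
    """E(X_1 + ... + X_b)^4 for i.i.d. X_i, by enumerating all outcomes exactly."""
    total = Fraction(0)
    for outcome in product(range(len(values)), repeat=b):
        weight = Fraction(1)
        s = Fraction(0)
        for idx in outcome:
            weight *= probs[idx]
            s += values[idx]
        total += weight * s ** 4
    return total


def fourth_moment_formula(values, probs, b):
    """b * E X^4 + 3 b (b - 1) (E X^2)^2, valid when E X = 0."""
    m2 = sum(p * x ** 2 for x, p in zip(values, probs))
    m4 = sum(p * x ** 4 for x, p in zip(values, probs))
    return b * m4 + 3 * b * (b - 1) * m2 ** 2


values = [Fraction(-2), Fraction(1)]         # hypothetical mean-zero distribution:
probs = [Fraction(1, 3), Fraction(2, 3)]     # E X = -2/3 + 2/3 = 0
for b in (1, 2, 3, 4, 5):
    print(b, fourth_moment_of_sum(values, probs, b), fourth_moment_formula(values, probs, b))
```

The mixed terms containing a single factor $X_i$ or a factor $X_i^3 X_j$ vanish because $EX_j = 0$, which is why only the $X_i^4$ and $X_i^2 X_j^2$ terms survive and the fourth moment grows like $b^2$ rather than $b^4$.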
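Applying Minkowski’s inequality with $p = 4$ [see (6.106)], by (6.142) and (6.143) we have

$$E \left( \sum_{h=1}^{b^*} \bigl( \Phi_{h,0}^* + \overline{\Phi}_{h,0}^* \bigr) \right)^4 \le 2^4 \cdot 3 (b^*)^2 \bigl( 10^4(1+\gamma^2)\,3\ell \bigr)^4. \tag{6.144}$$

Note that (6.144) is the main step toward the estimation of (6.135). The rest is routine estimations with a few more applications of Minkowski’s inequality. The details go as follows. We have

$$\Phi_1(\mathbf{v}) - \sum_{h=1}^{b^*} \bigl( \Phi_h^* + \overline{\Phi}_h^* \bigr) = \Phi_1(\mathbf{v}) - f(k_1^{(1)},k_2^{(1)};\mathbf{v}) + f(k_1^{(1)},k_2^{(1)};\mathbf{v}) - \sum_{h=1}^{b^*} \bigl( \Phi_h^* + \overline{\Phi}_h^* \bigr) = \Delta_1(\mathbf{v}) + \Delta_2(\mathbf{v}) + \Delta_3(\mathbf{v}), \tag{6.145}$$

where

$$\Delta_1(\mathbf{v}) = \Phi_1(\mathbf{v}) - f(k_1^{(1)},k_2^{(1)};\mathbf{v}), \tag{6.146}$$

$$\Delta_2(\mathbf{v}) = f(\kappa_2^*+1, k_2^{(1)};\mathbf{v}), \tag{6.147}$$

$$\Delta_3(\mathbf{v}) = f(\kappa_1^*,\kappa_2^*;\mathbf{v}) - \sum_{h=1}^{b^*} \bigl( \Phi_h^* + \overline{\Phi}_h^* \bigr), \tag{6.148}$$

since $\kappa_1^* = \ell + 1 = k_1^{(1)}$ and $\kappa_2^* = \ell + 6\ell b^* < k_2^{(1)} = \ell + k$ [so $k_2^{(1)} - \kappa_2^* < 6\ell$, see (6.136)–(6.138)].

Combining (6.135)–(6.139) with (6.131), we have that $\Delta_3(\mathbf{v})$, $\mathbf{v} \in P_0$, is zero except for a possible subset of $P_0$ with area $\le 10^{10}(1+\gamma^2)N^{-6}$, and also

$$\max_{\mathbf{v} \in P_0} |\Delta_3(\mathbf{v})| \le 10^4(1+\gamma^2)\,k < 10^4(1+\gamma^2)\,2\sqrt{N}.$$

It follows that

$$E_{\mathbf{v} \in P_0} \bigl( \Delta_3(\mathbf{v}) - E\Delta_3 \bigr)^4 \le 10^{10}(1+\gamma^2)N^{-6} \bigl( 10^4(1+\gamma^2)\,2\sqrt{N} \bigr)^4 < \frac{10^{32}(1+\gamma^2)^5}{N^4}. \tag{6.149}$$

Next we study $\Delta_2(\mathbf{v})$; see (6.147). Since

$$\max_{\mathbf{v} \in P_0} |\Delta_2(\mathbf{v})| \le 10^4(1+\gamma^2)\,r^* < 10^4(1+\gamma^2)\,6\ell,$$

we clearly have

$$E_{\mathbf{v} \in P_0} \bigl( \Delta_2(\mathbf{v}) - E\Delta_2 \bigr)^4 \le \bigl( 10^4(1+\gamma^2)\,6\ell \bigr)^4 < 10^{20}(1+\gamma^2)^4 \ell^4. \tag{6.150}$$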
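Finally we study $\Delta_1(\mathbf{v})$, see (6.146). We apply Lemma 6.3: using the facts starting below (6.83) and ending at (6.84), we can basically repeat the argument of (6.149). Indeed, the function $\Delta_1(\mathbf{v})$, $\mathbf{v} \in P_0$, is zero except for a possible subset of $P_0$ with area $\le 10^{10}(1+\gamma^2)N^{-6}$, and also

$$\max_{\mathbf{v} \in P_0} |\Delta_1(\mathbf{v})| \le 10^4(1+\gamma^2)\,k < 10^4(1+\gamma^2)\,2\sqrt{N}.$$

It follows that

$$E_{\mathbf{v} \in P_0} \bigl( \Delta_1(\mathbf{v}) - E\Delta_1 \bigr)^4 \le 10^{10}(1+\gamma^2)N^{-6} \bigl( 10^4(1+\gamma^2)\,2\sqrt{N} \bigr)^4 < \frac{10^{32}(1+\gamma^2)^5}{N^4}. \tag{6.151}$$

Combining (6.145)–(6.151), and using Minkowski’s inequality we have

$$\left( E \left( \Phi_{1,0}(\mathbf{v}) - \sum_{h=1}^{b^*} \bigl( \Phi_{h,0}^* + \overline{\Phi}_{h,0}^* \bigr) \right)^4 \right)^{1/4} \le \sum_{j=1}^{3} \Bigl( E \bigl| \Delta_j(\mathbf{v}) - E\Delta_j \bigr|^4 \Bigr)^{1/4} \le 2\,\frac{10^8(1+\gamma^2)^{5/4}}{N} + 10^5(1+\gamma^2)\,\ell. \tag{6.152}$$

Combining (6.144) and (6.152), and using Minkowski’s inequality one more time, we have

$$\Bigl( E \bigl( \Phi_{1,0}(\mathbf{v}) \bigr)^4 \Bigr)^{1/4} \le \left( E \left( \sum_{h=1}^{b^*} \bigl( \Phi_{h,0}^* + \overline{\Phi}_{h,0}^* \bigr) \right)^4 \right)^{1/4} + \left( E \left( \Phi_{1,0}(\mathbf{v}) - \sum_{h=1}^{b^*} \bigl( \Phi_{h,0}^* + \overline{\Phi}_{h,0}^* \bigr) \right)^4 \right)^{1/4}$$

$$\le 2 \cdot 3\,(b^*)^{1/2}\, 10^4(1+\gamma^2)\,3\ell + 2\,\frac{10^8(1+\gamma^2)^{5/4}}{N} + 10^5(1+\gamma^2)\,\ell$$

$$\le 9 \cdot 10^4(1+\gamma^2)\,N^{1/4} \log N + 2\,\frac{10^8(1+\gamma^2)^{5/4}}{N} + 10^5(1+\gamma^2)(\log N)^2 \le 10^5(1+\gamma^2)\,N^{1/4} \log N, \tag{6.153}$$

where we used (6.120), (6.122), (6.125), and (6.136). Combining (6.134) and (6.153), we have

$$\Bigl( E \bigl| \Phi_{1,0}(\mathbf{v}) \bigr|^3 \Bigr)^{1/3} \le \Bigl( E \bigl( \Phi_{1,0}(\mathbf{v}) \bigr)^4 \Bigr)^{1/4} \le 10^5(1+\gamma^2)\,N^{1/4} \log N. \tag{6.154}$$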
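Next we combine (6.130) and (6.132):

$$\left| \left( \sum_{h=1}^{b} \mathrm{Var}(\Phi_h) \right)^{1/2} - \Bigl( \sigma^2(\gamma)\, b(k+3\ell) \log(1+\sqrt{2}) + O(1) \Bigr)^{1/2} \right| \le \frac{b\,\mathrm{Var}(\overline{\Phi}_1)}{\left( \mathrm{Var} \sum_{h=1}^{b} \bigl( \Phi_h + \overline{\Phi}_h \bigr) \right)^{1/2}} + \frac{2b-1}{\sqrt{b}} \sqrt{\mathrm{Var}(\overline{\Phi}_1)} + 10^4(1+\gamma^2) + \frac{10^{10}(1+\gamma^2)^2}{N^2}. \tag{6.155}$$

By (6.120)–(6.125),

$$b(k+3\ell) \log(1+\sqrt{2}) = N + O\bigl( (\log N)^2 \bigr) + O_\gamma(1), \tag{6.156}$$

and combining it with (6.132), we have

$$\left( \mathrm{Var} \sum_{h=1}^{b} \bigl( \Phi_h + \overline{\Phi}_h \bigr) \right)^{1/2} = \sigma(\gamma) \Bigl( N + O\bigl( (\log N)^2 \bigr) + O_\gamma(1) \Bigr)^{1/2}. \tag{6.157}$$

Moreover, by (6.133),

$$\bigl( \mathrm{Var}\,\overline{\Phi}_h \bigr)^{1/2} = \sigma(\gamma) \Bigl( (\log N)^2 \log(1+\sqrt{2}) + O_\gamma(1) \Bigr)^{1/2}. \tag{6.158}$$

Combining (6.155)–(6.158) and using the fact $b < 2\sqrt{N}$ [see (6.120) and (6.122)], we have

$$\left| \left( \sum_{h=1}^{b} \mathrm{Var}(\Phi_h) \right)^{1/2} - \sigma(\gamma) \Bigl( N + O\bigl( (\log N)^2 \bigr) + O_\gamma(1) \Bigr)^{1/2} \right| \le \frac{2\sqrt{N}\, \sigma^2(\gamma)\, O\bigl( (\log N + O_\gamma(1))^2 \bigr)}{\sigma(\gamma) \Bigl( N + O\bigl( (\log N)^2 \bigr) + O_\gamma(1) \Bigr)^{1/2}} + N^{1/4} \sigma(\gamma)\, O\bigl( \log N + O_\gamma(1) \bigr) + 10^4(1+\gamma^2) + \frac{10^{10}(1+\gamma^2)^2}{N^2}$$

$$= N^{1/4} \sigma(\gamma)\, O\bigl( \log N + O_\gamma(1) \bigr). \tag{6.159}$$

Finally we recall the last statement of Lemma 6.7:

$$\left| E_{\mathbf{v} \in P_0} \left( f(\kappa_1,\kappa_2;\mathbf{v}) - \sum_{h=1}^{b} \bigl( \Phi_h(\mathbf{v}) + \overline{\Phi}_h(\mathbf{v}) \bigr) \right) \right| \le \varepsilon(\gamma;k,\ell)\,10^4(1+\gamma^2)\,b^2(k+3\ell) = O_\gamma(1), \tag{6.160}$$

where in the last step (6.131) was used again.

Now we have everything ready to complete the proof of Theorem 5.4. We work out the details in the next section.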
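6.5 Conclusion of the Proof of Theorem 5.4

Theorem 5.4 is a CLT about the lattice point counting function

$$F(\sqrt{2};\beta;\gamma;N) = \Bigl| \bigl( H_\gamma(\sqrt{2};N) - \mathbf{v}(\beta) \bigr) \cap \mathbb{Z}^2 \Bigr|,$$

where parameter $\beta$ runs in the interval $0 \le \beta < 1$, i.e., we study the effect of the one-dimensional family of translations by the vectors $\mathbf{v}(\beta) = (\beta, 0)$. More precisely, Theorem 5.4 states that the renormalized lattice point counting function

$$\frac{F(\sqrt{2};\beta;\gamma;e^N) - \frac{\gamma}{\sqrt{2}} N}{\sigma(\gamma) \sqrt{N}}, \quad 0 \le \beta < 1,$$

has a standard normal limit distribution as $N \to \infty$.

We recall some facts from the beginning of Sect. 6.4: for every vector $\mathbf{v}(\beta) = (\beta, 0)$, $0 \le \beta < 1$, and $\gamma > 0$,

$$\sum_{0 \le i \le I_0(\gamma;N)} f_i(\mathbf{v}(\beta)) \le F(\sqrt{2};\beta;\gamma;e^N) \le \sum_{0 \le i \le I_0(\gamma;N)} f_i(\mathbf{v}(\beta)) + 10^4(1+\gamma^2), \tag{6.161}$$

where

$$I_0 = I_0(\gamma;N) = \frac{N + \log(2\sqrt{2}/\gamma)}{\log(1+\sqrt{2})} - 2. \tag{6.162}$$

We choose

$$b = \bigl\lfloor \sqrt{I_0} \bigr\rfloor \quad \text{and} \quad k = \bigl\lfloor \sqrt{I_0} - (\log I_0)^2 \bigr\rfloor - \{0 \text{ or } 1 \text{ or } 2\} \tag{6.163}$$

in such a way that $k$ is divisible by 3.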
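For simplicity, assume first that

$$I_0 = I_0(\gamma;N) = \kappa_2 = (3b+1)\ell + bk \tag{6.164}$$

holds for some integer $\ell$. Then

$$\frac{1}{3}(\log I_0)^2 - 1 \le \ell < \frac{1}{3}(\log I_0)^2 + 2. \tag{6.165}$$

Clearly

$$\sum_{0 \le i \le I_0(\gamma;N)} f_i(\mathbf{v}(\beta)) = f(\kappa_1,\kappa_2;\mathbf{v}(\beta)) + f(0,\kappa_1-1;\mathbf{v}(\beta)), \quad \text{where } \kappa_1 - 1 = \ell.$$

Since $\ell < (\log N)^2$ is “small,” the dominating part is $f(\kappa_1,\kappa_2;\mathbf{v}(\beta))$. More precisely,

$$\left| \sum_{0 \le i \le I_0(\gamma;N)} f_i(\mathbf{v}(\beta)) - f(\kappa_1,\kappa_2;\mathbf{v}(\beta)) \right| \le \max_{0 \le \beta \le 1} f(0,\ell;\mathbf{v}(\beta)) \le 10^4(1+\gamma^2)\,\ell < 10^4(1+\gamma^2)(\log N)^2. \tag{6.166}$$

In view of Lemma 6.3 the distribution of $f(\kappa_1,\kappa_2;\mathbf{v}(\beta))$, $\mathbf{v}(\beta) = (\beta,0)$, $0 \le \beta < 1$, is “almost the same” as that of $f(\kappa_1,\kappa_2;\mathbf{v})$, $\mathbf{v} \in P_0$, and we have the equality

$$f(\kappa_1,\kappa_2;\mathbf{v}) = \sum_{h=1}^{b} \Phi_h(\mathbf{v}) + \sum_{h=1}^{b} \overline{\Phi}_h(\mathbf{v})$$

for the “overwhelming majority” of $\mathbf{v} \in P_0$. The following three corollaries of Lemma 6.3 make the vague terms “almost the same” and “overwhelming majority” precise.

Corollary (1) of Lemma 6.3: We have

$$f(\kappa_1,\kappa_2;\mathbf{v}) = \sum_{h=1}^{b} \Phi_h(\mathbf{v}) + \sum_{h=1}^{b} \overline{\Phi}_h(\mathbf{v}) \ \text{for all } \mathbf{v} \in P_0(\overline{r}_b;a) \tag{6.167}$$

with the possible exception of at most

$$b\,\varepsilon(\gamma;k,\ell)\,2^{\overline{r}_b} < \frac{10^{10}(1+\gamma^2)}{N^4}\, 2^{\overline{r}_b} \tag{6.168}$$

integers $a$ in $0 \le a < 2^{\overline{r}_b}$, where

$$\overline{r}_b = \left\lfloor \frac{\log(1+\sqrt{2})}{\log 2}\,\Bigl( (3b+1)\ell + \bigl(b+\tfrac{1}{3}\bigr)k \Bigr) \right\rfloor,$$

and in the last step of (6.168) we used (6.131).

Corollary (2) of Lemma 6.3: The right-hand side sum

$$\sum_{h=1}^{b} \Phi_h(\mathbf{v}) + \sum_{h=1}^{b} \overline{\Phi}_h(\mathbf{v})$$

in (6.167) is a step function: it is constant on every parallelogram $\mathbf{v} \in P_0(\overline{r}_b;a)$, $0 \le a < 2^{\overline{r}_b}$.

Corollary (3) of Lemma 6.3: $\sum_{h=1}^{b} \Phi_h(\mathbf{v})$ and $\sum_{h=1}^{b} \overline{\Phi}_h(\mathbf{v})$ represent two sums of independent and identically distributed random variables.

The next step is to apply the Berry–Esseen form of the CLT in probability theory, see (6.128). Write

$$V_1 = b\,E\,\Phi_{1,0}^2 \quad \text{and} \quad W_1 = b\,E\,|\Phi_{1,0}|^3; \tag{6.169}$$

then by (6.128), for every real $\lambda_1$,

$$\left| \mathrm{area}\left\{ \mathbf{v} \in P_0 :\ \sum_{h=1}^{b} \Phi_{h,0}(\mathbf{v}) \le \lambda_1 \sqrt{V_1} \right\} - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\lambda_1} e^{-u^2/2}\, du \right| < \frac{40 W_1}{V_1^{3/2}}. \tag{6.170}$$

Similarly, write

$$V_2 = b\,E\,\overline{\Phi}_{1,0}^2 \quad \text{and} \quad W_2 = b\,E\,\bigl|\overline{\Phi}_{1,0}\bigr|^3; \tag{6.171}$$

then by (6.128), for every real $\lambda_2$,

$$\left| \mathrm{area}\left\{ \mathbf{v} \in P_0 :\ \sum_{h=1}^{b} \overline{\Phi}_{h,0}(\mathbf{v}) \le \lambda_2 \sqrt{V_2} \right\} - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\lambda_2} e^{-u^2/2}\, du \right| < \frac{40 W_2}{V_2^{3/2}}. \tag{6.172}$$

Using (6.154) and (6.159), we have

$$\Bigl( E\,|\Phi_{1,0}(\mathbf{v})|^3 \Bigr)^{1/3} \le 10^5(1+\gamma^2)\,N^{1/4} \log N, \tag{6.173}$$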
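and

$$\sqrt{V_1} = \sigma(\gamma) \Bigl( N + O\bigl( (\log N)^2 \bigr) + O_\gamma(1) \Bigr)^{1/2} + \sigma(\gamma)\, N^{1/4}\, O\bigl( \log N + O_\gamma(1) \bigr). \tag{6.174}$$

By (6.173) and (6.174),

$$\frac{40 W_1}{V_1^{3/2}} = 40b \left( \frac{\bigl( E\,|\Phi_{1,0}|^3 \bigr)^{1/3}}{\sqrt{V_1}} \right)^3 \le 40b \left( \frac{10^5(1+\gamma^2)\,N^{1/4} \log N}{\sigma(\gamma) \Bigl( N + O\bigl( (\log N)^2 \bigr) + O_\gamma(1) \Bigr)^{1/2} + \sigma(\gamma)\, N^{1/4}\, O\bigl( \log N + O_\gamma(1) \bigr)} \right)^3 = O\bigl( N^{-1/4} (\log N)^3 \bigr), \tag{6.175}$$

where we applied Lemma 6.6 and the fact $b < 2\sqrt{N}$.

The usual application of Lemma 5.5 gives

$$\Bigl( E\,\bigl|\overline{\Phi}_{1,0}\bigr|^3 \Bigr)^{1/3} \le \max_{\mathbf{v} \in P_0} \bigl|\overline{\Phi}_{1,0}(\mathbf{v})\bigr| \le 10^4(1+\gamma^2)\,3\ell < 10^4(1+\gamma^2)\,2(\log N)^2. \tag{6.176}$$

By (6.158),

$$\sqrt{V_2} = \left( \mathrm{Var} \sum_{h=1}^{b} \overline{\Phi}_h \right)^{1/2} = \sqrt{b}\, \sigma(\gamma) \Bigl( (\log N)^2 \log(1+\sqrt{2}) + O_\gamma(1) \Bigr)^{1/2}. \tag{6.177}$$

Combining (6.176) and (6.177),

$$\frac{40 W_2}{V_2^{3/2}} = 40b \left( \frac{\bigl( E\,\bigl|\overline{\Phi}_{1,0}\bigr|^3 \bigr)^{1/3}}{\sqrt{V_2}} \right)^3 \le 40b \left( \frac{10^4(1+\gamma^2)\,2(\log N)^2}{\sqrt{b}\, \sigma(\gamma) \Bigl( (\log N)^2 \log(1+\sqrt{2}) + O_\gamma(1) \Bigr)^{1/2}} \right)^3 = O\bigl( N^{-1/4} (\log N)^3 \bigr), \tag{6.178}$$

where we applied Lemma 6.6 and the fact $\sqrt{N} < b < 2\sqrt{N}$.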
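Combining (6.161), (6.166), Corollary (1), and Corollary (2) of Lemma 6.3 [see (6.167)–(6.168)] and using the fact that $\varphi_h(\beta)$ and $\overline{\varphi}_h(\beta)$ are the “tilted projections” of $\Phi_h(\mathbf{v})$ and $\overline{\Phi}_h(\mathbf{v})$, respectively, from the “unit parallelogram” $\mathbf{v} \in P_0$ to the unit interval $0 \le \beta < 1$, we have

$$F(\sqrt{2};\beta;\gamma;e^N) - \frac{\gamma}{\sqrt{2}} N = \sum_{h=1}^{b} \bigl( \varphi_h(\beta) - E_{\beta \in [0,1)}\, \varphi_h(\beta) \bigr) + \sum_{h=1}^{b} \bigl( \overline{\varphi}_h(\beta) - E_{\beta \in [0,1)}\, \overline{\varphi}_h(\beta) \bigr) + \left( E_{\beta \in [0,1)} \sum_{h=1}^{b} \bigl( \varphi_h(\beta) + \overline{\varphi}_h(\beta) \bigr) - \frac{\gamma}{\sqrt{2}} N \right) + O\bigl( (\log N)^2 \bigr) \tag{6.179}$$

except possibly for a set of $\beta \in [0,1)$ of one-dimensional Lebesgue measure less than $10^{10}(1+\gamma^2)N^{-4}$.

Next we estimate the subsum

$$E_{\beta \in [0,1)} \sum_{h=1}^{b} \bigl( \varphi_h(\beta) + \overline{\varphi}_h(\beta) \bigr) - \frac{\gamma}{\sqrt{2}} N$$

in (6.179). We recall (6.160) and (6.104):

$$\left| E_{\mathbf{v} \in P_0}\, f(\kappa_1,\kappa_2;\mathbf{v}) - E_{\mathbf{v} \in P_0} \sum_{h=1}^{b} \bigl( \Phi_h(\mathbf{v}) + \overline{\Phi}_h(\mathbf{v}) \bigr) \right| = O_\gamma(1) \tag{6.180}$$

and

$$E_{\mathbf{v} \in P_0}\, f(\kappa_1,\kappa_2;\mathbf{v}) = \frac{\gamma}{\sqrt{2}} (\kappa_2-\kappa_1+1) \log(1+\sqrt{2}) = \frac{\gamma}{\sqrt{2}}\, b(k+3\ell) \log(1+\sqrt{2}). \tag{6.181}$$

By (6.162) and (6.164),

$$\bigl( b(k+3\ell) + \ell \bigr) \log(1+\sqrt{2}) = I_0 \log(1+\sqrt{2}) = N + O_\gamma(1). \tag{6.182}$$

By (6.181) and (6.182),

$$E_{\mathbf{v} \in P_0}\, f(\kappa_1,\kappa_2;\mathbf{v}) = \frac{\gamma}{\sqrt{2}}\, b(k+3\ell) \log(1+\sqrt{2}) = \frac{\gamma}{\sqrt{2}} N + O_\gamma(\ell) = \frac{\gamma}{\sqrt{2}} N + O_\gamma\bigl( (\log N)^2 \bigr). \tag{6.183}$$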
Since Eˇ2Œ0;1/
b X
.'h .ˇ/ C ' h .ˇ// D Ev2P0
hD1
b X
ˆh .v/ C ˆh .v/ ;
hD1
combining (6.180) and (6.183), we have Eˇ2Œ0;1/
b X
.'h .ˇ/ C ' h .ˇ// D p N C O .log N /2 : 2 hD1
(6.184)
Using (6.184) in (6.179), we obtain b X p 'h .ˇ/ Eˇ2Œ0;1/ 'h .ˇ/ C F . 2I ˇI I e N / p N D 2 hD1
C
b X ' h .ˇ/ Eˇ2Œ0;1/ ' h .ˇ/ C O .log N /2
(6.185)
hD1
except possibly for a set ˇ 2 S1 Œ0; 1/ of Lebesgue measure meas .S1 / <
1010 .1 C 2 / : N4
(6.186)
Write 'h;0 .ˇ/ D 'h .ˇ/ E'h and ' h;0 D ' h .ˇ/ E' h : Combining (6.170), (6.174), (6.175), and using the trivial fact that b X
'h;0 .ˇ/; 0 ˇ < 1 and
hD1
b X
ˆh;0 .v/; v 2 P0
hD1
have the same distribution, we obtain ˇ ˇ ( ) Z 1 b ˇ ˇ X p 1 ˇ ˇ u2 =2 'h;0 .ˇ/ 1 V1 p e d uˇ D max ˇmeas ˇ 2 Œ0; 1/ W ˇ 1<1 <1 ˇ 2 1 hD1 D O N 1=4 .log N /3 ;
(6.187)
where p 1=2 V1 D . / N C O .log N /2 C O .1/ C . /N 1=4 O log N C O .1/ D p D 1 C O .N 1=4 log N / . / N :
(6.188)
Similarly, by (6.173), (6.177), and (6.178), ˇ ˇ ) ( Z 1 b ˇ ˇ X p 1 2 ˇ ˇ max ˇmeas ˇ 2 Œ0; 1/ W ' h;0 .ˇ/ 2 V2 p e u =2 d uˇ D ˇ ˇ 1<2 <1 2 2 hD1 D O N 1=4 .log N /3 ;
(6.189)
where
1=2 p p p V2 D b. / .log N /2 log.1 C 2/ C O .1/ D O N 1=4 log N (6.190) p (here we used the fact b < 2 N ). Combining (6.185)–(6.190), we can easily derive the statement of Theorem 5.4 ˇ ˇ Z 1 p p ˇ ˇ 1 2 max ˇˇmeas ˇ 2 Œ0; 1/ W F . 2I ˇI I e N / p N . / N p e u =2 d uˇˇ 2 2
D O N 1=4 .log N /3
(6.191)
via routine calculations. The details go as follows. We apply (6.189) with $\lambda_2 = \log N$. Then
\[
\int_{\lambda_2}^{\infty} e^{-u^2/2}\,du \le \int_{\lambda_2}^{\infty} u\,e^{-u^2/2}\,du = e^{-\lambda_2^2/2} = e^{-(\log N)^2/2} = N^{-(\log N)/2},
\]
and so the set ˇ ˇ ) b ˇX ˇ p ˇ ˇ S2 D ˇ 2 Œ0; 1/ W ˇ ' h;0 .ˇ/ˇ log N V2 ˇ ˇ (
(6.192)
hD1
has Lebesgue measure meas .S2 / D O N 1=4 .log N /3 :
(6.193)
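The Gaussian tail estimate that yields the bound $N^{-(\log N)/2}$ above can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, and the closed form $\sqrt{\pi/2}\,\operatorname{erfc}(\lambda/\sqrt{2})$ for the tail integral is a standard identity that is not taken from the text. The comparison of the two integrals needs a threshold of at least 1, that is, $N \ge 3$ when the threshold is $\log N$.

```python
import math

def gaussian_tail(lam):
    """Integral of exp(-u^2/2) over [lam, infinity), via the complementary error function."""
    return math.sqrt(math.pi / 2.0) * math.erfc(lam / math.sqrt(2.0))

def tail_bound(lam):
    """exp(-lam^2/2), which is the integral of u*exp(-u^2/2) over [lam, infinity)."""
    return math.exp(-lam * lam / 2.0)

def compare(n_values):
    """For each N return (N, tail integral, bound, N^(-(log N)/2)) at the threshold log N."""
    rows = []
    for n in n_values:
        lam = math.log(n)
        rows.append((n, gaussian_tail(lam), tail_bound(lam), n ** (-lam / 2.0)))
    return rows
```

Calling `compare([3, 10, 100, 1000])` lists the tail integral next to the bound, so one can see how quickly both decay as $N$ grows.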
Given 2 .1; 1/ in (6.191), first we choose 1 D C 1 in (6.187) to satisfy the equality p p p C V1 log N V2 C O .log N /2 D . / N : 1
(6.194)
Then by (6.185)–(6.187) and (6.192)–(6.152), ( ˇ 2 Œ0; 1/ W
b X
'h;0 .ˇ/
C 1
p
) V1 n fS1 [ S2 g
hD1
p p ˇ 2 Œ0; 1/ W F . 2I ˇI I e N / p N . / N ; 2 which implies ( meas ˇ 2 Œ0; 1/ W
b X
'h;0 .ˇ/
C 1
p V1
)
hD1
p p meas ˇ 2 Œ0; 1/ W F . 2I ˇI I e N / p N . / N C meas .S1 / C meas .S2 / : 2 (6.195)
Next we choose 1 D 1 in (6.187) to satisfy the equality p p p V1 C log N V2 C O .log N /2 D . / N : 1
(6.196)
Then by (6.185)–(6.187), (6.192)–(6.151), and (6.196), ( ˇ 2 Œ0; 1/ W
b X
p
)
'h;0 .ˇ/ V1 1
hD1
p
p
ˇ 2 Œ0; 1/ W F . 2I ˇI I e / p N . / N 2 N
n fS1 [ S2 g;
which implies ( meas ˇ 2 Œ0; 1/ W
b X
'h;0 .ˇ/
1
p V1
)
hD1
p p meas ˇ 2 Œ0; 1/ W F . 2I ˇI I e N / p N . / N meas .S1 /meas .S2 /: 2 (6.197)
Combining (6.187), (6.195), and (6.197), we have 1 p 2
Z
1 C 1
e u
2 =2
d u C O N 1=4 .log N /3 meas .S1 / meas .S2 /
p p N meas ˇ 2 Œ0; 1/ W F . 2I ˇI I e / p N . / N 2 Z 1 1 2 p e u =2 d u C O N 1=4 .log N /3 C meas .S1 / C meas .S2 / : 2 1 (6.198) p Next we divide (6.194) by . / N : p
C 1
p V1 log N V2 p p C O N 1=2 .log N /2 D ; . / N . / N
and combining it with (6.188) and (6.190), we obtain 1=4 C log N C O N 1=4 .log N /2 D ; 1 1 C O N or equivalently 1=4 C log N C O N 1=4 .log N /2 : 1 D 1 C O N
(6.199)
p Similarly, if we divide (6.196) by . / N , and use again (6.188) and (6.190), we have the analog of (6.199): 1=4 log N C O N 1=4 .log N /2 : 1 D 1 C O N
(6.200)
By (6.186) and (6.193): meas .S1 / C meas .S2 / D O N 1=4 .log N /3 :
(6.201)
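We also need the following simple estimation: for all $\lambda > 0$ and $\varepsilon > 0$ we have
\[
\int_{\lambda}^{(1+\varepsilon)\lambda+\varepsilon} e^{-u^2/2}\,du \le 2\varepsilon. \tag{6.202}
\]
Indeed,
\[
\int_{\lambda}^{(1+\varepsilon)\lambda+\varepsilon} e^{-u^2/2}\,du \le 2\int_{\lambda}^{(1+\varepsilon)\lambda+\varepsilon} e^{-u}\,du = 2\left(e^{-\lambda} - e^{-(1+\varepsilon)\lambda-\varepsilon}\right) = 2e^{-\lambda}\left(1 - e^{-\varepsilon\lambda-\varepsilon}\right) \le 2e^{-\lambda}(\lambda+1)\varepsilon \le 2\varepsilon,
\]
where we used the elementary inequalities $e^{-u^2/2} \le 2e^{-u}$, $1 - e^{-u} \le u$, and $(u+1)e^{-u} \le 1$ that hold for all $u \ge 0$. Finally notice that (6.191) immediately follows from (6.198)–(6.202). This settles the special case where
\[
I_0 = I_0(\gamma; N) = (3b+1)\ell + bk \tag{6.203}
\]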
holds for some integer b [see (6.164)]. In the general case we have I0 D I0 . I N / D .3b C 1/` C bk C %
(6.204)
with some integers b and 0 % < k C 3`, where % is the “remainder.” Then we choose 2 D .3b C 1/` C bk C minf%; kg: Again using 1 1 D `, we clearly have X
fi .v.ˇ// D f .1 ; 2 I v.ˇ// C f .0; 1 1I v.ˇ// C f .2 C 1; I0 I v.ˇ//;
0i I0 . IN /
and so ˇ ˇ ˇ ˇ ˇ ˇ X ˇ fi .v.ˇ// f .1 ; 2 I v.ˇ//ˇˇ O .log N /2 ; ˇ ˇ ˇ0i I0 . IN / which is analogous to (6.166). A straightforward adaptation of Lemma 6.3 with “remainder” gives that the distribution of f .1 ; 2 I v.ˇ//; v.ˇ/ D .ˇ; 0/; 0 ˇ < 1 is “almost the same” as that of f .1 ; 2 I v/, v 2 P0 , and we have the equality f .1 ; 2 I v/ D
b X hD1
ˆh .v/ C
b X
ˆh .v/ C ˆhC1 .v/
hD1
for the “overwhelming majority” of v 2 P0 . Here the last term ˆhC1 .v/ corresponds to the “short tail sum” f ..3b C1/`Cbk C1; 2 I v.ˇ//, representing the contribution of the “remainder.”
Since the “remainder” can be smaller than k, b X
ˆh .v/ C ˆhC1 .v/
hD1
is a sum of independent, but not necessarily identically distributed random variables (due to the last term ˆhC1 .v/). This change does not lead to a new problem, since the Berry–Esseen form of the CLT (6.128) still applies. The rest of the proof of the general case is the same as it was in the special case (6.203). This proves (6.191) in the general case (6.204). It does not mean, however, that the proof of Theorem 5.4 is complete. We still have to prove three lemmas from Sect. 6.3, namely, Lemmas 6.4– 6.6. It is the subject of the next two sections.
6.6 Proving the Three Lemmas: Part One

To prove Lemmas 6.4–6.6 we will apply Poisson's summation formula, study some nonelementary functions, estimate integrals of exponential functions, and finally, to compute the variance, we use Parseval's formula. The details will be rather troublesome.

To prove Lemma 6.4, we need the two-dimensional Poisson's summation formula, which basically means that we work with Fourier series. For the convenience of the reader we start with the one-dimensional case. Assume that a series
\[
G(x) = \sum_{k\in\mathbb{Z}} g(x+k) \tag{6.205}
\]
converges uniformly for $0 \le x < 1$, and also assume that the sum $G(x)$ is represented by its Fourier series (where of course $i = \sqrt{-1}$):
\[
G(x) = \lim_{N\to\infty} \sum_{n=-N}^{N} c_n e^{2\pi i n x} = \sum_{n=-\infty}^{\infty} c_n e^{2\pi i n x}, \tag{6.206}
\]
where the Fourier coefficients $c_n$ in (6.206) are calculated in the usual way:
\[
c_n = \int_0^1 G(t)e^{-2\pi i n t}\,dt = \sum_{k\in\mathbb{Z}} \int_0^1 g(t+k)e^{-2\pi i n t}\,dt = \sum_{k\in\mathbb{Z}} \int_k^{k+1} g(t)e^{-2\pi i n t}\,dt = \int_{-\infty}^{\infty} g(t)e^{-2\pi i n t}\,dt. \tag{6.207}
\]
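Setting $x = 0$ in (6.206) and inserting (6.207) equates the lattice sum $\sum_k g(k)$ with the sum of the Fourier integrals $\sum_n \int g(t)e^{-2\pi i n t}\,dt$. A minimal Python sketch of a numerical sanity check follows. It assumes the test function $g(t) = e^{-\pi a t^2}$, which is not used in the text, and relies on the standard Gaussian integral $\int_{-\infty}^{\infty} g(t)e^{-2\pi i n t}\,dt = a^{-1/2}e^{-\pi n^2/a}$; the function names are hypothetical.

```python
import math

def lattice_sum(a, terms=50):
    """Sum of g(k) = exp(-pi*a*k^2) over the integers k with |k| <= terms."""
    return sum(math.exp(-math.pi * a * k * k) for k in range(-terms, terms + 1))

def fourier_sum(a, terms=50):
    """Sum over |n| <= terms of the Fourier integral of g at frequency n.

    For the Gaussian g the integral equals a**(-1/2) * exp(-pi*n^2/a).
    """
    total = sum(math.exp(-math.pi * n * n / a) for n in range(-terms, terms + 1))
    return total / math.sqrt(a)
```

For any `a > 0` the two sums should agree up to rounding, for example `lattice_sum(0.3)` against `fourier_sum(0.3)`.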
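Applying (6.207) in (6.206) with $x = 0$, we obtain (the one-dimensional form of) Poisson's summation formula:
\[
\sum_{k\in\mathbb{Z}} g(k) = G(0) = \sum_{n\in\mathbb{Z}} \int_{-\infty}^{\infty} g(t)e^{-2\pi i n t}\,dt. \tag{6.208}
\]
What we really need here is the two-dimensional form of (6.208), which can be proved exactly the same way:
\[
\sum_{\mathbf{k}\in\mathbb{Z}^2} g(\mathbf{k}) = \sum_{\mathbf{n}\in\mathbb{Z}^2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(\mathbf{t})e^{-2\pi i\,\mathbf{n}\cdot\mathbf{t}}\,d\mathbf{t}, \tag{6.209}
\]
where $\mathbf{n}\cdot\mathbf{t} = n_1t_1 + n_2t_2$ is the usual dot (or inner) product. If $g = \chi_B$, that is, if the function $g$ is the characteristic function of a bounded region $B \subset \mathbb{R}^2$ in the plane, then the left-hand side of (6.209) counts the number of lattice points in the region $B$:
\[
\sum_{\mathbf{k}\in\mathbb{Z}^2} \chi_B(\mathbf{k}) = |B \cap \mathbb{Z}^2| = \sum_{\mathbf{n}\in\mathbb{Z}^2} \int_B e^{-2\pi i\,\mathbf{n}\cdot\mathbf{t}}\,d\mathbf{t} \tag{6.210}
\]
with the usual dot product $\mathbf{n}\cdot\mathbf{t} = n_1t_1 + n_2t_2$. If we switch from region $B$ to the translated copy $B + \mathbf{v}$, then by (6.210) and the equality $\mathbf{t} \in B + \mathbf{v} \iff \mathbf{t} - \mathbf{v} \in B$, we have
\[
|(B+\mathbf{v}) \cap \mathbb{Z}^2| = \sum_{\mathbf{n}\in\mathbb{Z}^2} \int_{B+\mathbf{v}} e^{-2\pi i\,\mathbf{n}\cdot\mathbf{t}}\,d\mathbf{t} = \sum_{\mathbf{n}\in\mathbb{Z}^2} \int_B e^{-2\pi i\,\mathbf{n}\cdot(\mathbf{t}+\mathbf{v})}\,d\mathbf{t} = \sum_{\mathbf{n}\in\mathbb{Z}^2} \left(\int_B e^{-2\pi i\,\mathbf{n}\cdot\mathbf{t}}\,d\mathbf{t}\right) e^{-2\pi i\,\mathbf{n}\cdot\mathbf{v}}, \tag{6.211}
\]
which is a Fourier series in terms of the translation vector $\mathbf{v} \in [0,1)^2$ (since the set of lattice points is periodic modulo one).

We recall the definition of the hyperbolic region $H_{K,L} = H_{K,L}(\gamma)$ [see (6.86)]:
\[
H_{K,L}(\gamma) = \left\{ (x,y) \in \mathbb{R}^2 : -\gamma \le x^2 - 2y^2 \le \gamma \ \text{where}\ K \le x + y\sqrt{2} \le L \right\}. \tag{6.212}
\]
For every $\mathbf{v} \in \mathbb{R}^2$ let $H_{K,L}(\gamma) + \mathbf{v}$ denote the translated copy of region $H_{K,L}(\gamma)$, and consider the periodic function
\[
h(K,L;\mathbf{v}) = h(\gamma;K,L;\mathbf{v}) = |(H_{K,L}(\gamma) + \mathbf{v}) \cap \mathbb{Z}^2| - \operatorname{Area}(H_{K,L}(\gamma)). \tag{6.213}
\]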
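The discrepancy function in (6.213) can be evaluated by brute force for small parameters. The Python sketch below is a hedged illustration, not part of the original argument: all function names are hypothetical, the region is taken to be $|x^2 - 2y^2| \le \gamma$, $K \le x + y\sqrt{2} \le L$ with $K > 0$, and the area formula $\gamma\log(L/K)/\sqrt{2}$ is an assumption derived here by integrating in the coordinates $u_1 = x + y\sqrt{2}$, $u_2 = x - y\sqrt{2}$ (Jacobian $2^{-3/2}$).

```python
import math

SQRT2 = math.sqrt(2.0)

def area(gamma, K, L):
    """Area of the region: integrate 2*gamma/u1 over K <= u1 <= L, times the Jacobian 1/(2*sqrt(2))."""
    return gamma * math.log(L / K) / SQRT2

def in_region(x, y, gamma, K, L):
    """Membership test: |x^2 - 2y^2| <= gamma and K <= x + y*sqrt(2) <= L."""
    return abs(x * x - 2.0 * y * y) <= gamma and K <= x + y * SQRT2 <= L

def lattice_count(gamma, K, L, v=(0.0, 0.0)):
    """Number of integer points in the translated region (requires K > 0)."""
    # Every point of the region has |x|, |y| <= (L + gamma/K)/2, so a box of this
    # radius, enlarged by the size of the translation, contains all candidates.
    radius = math.ceil((L + gamma / K) / 2.0 + max(abs(v[0]), abs(v[1]))) + 1
    count = 0
    for m in range(-radius, radius + 1):
        for n in range(-radius, radius + 1):
            # (m, n) lies in the translated region exactly when (m, n) - v lies in the region.
            if in_region(m - v[0], n - v[1], gamma, K, L):
                count += 1
    return count

def discrepancy(gamma, K, L, v=(0.0, 0.0)):
    """Lattice point count minus area, as in (6.213)."""
    return lattice_count(gamma, K, L, v) - area(gamma, K, L)
```

With $\gamma = 1$, $K = 0.9$, $L = 10$ and no translation, the only lattice points found are $(1,0)$, $(1,1)$, $(3,2)$, the solutions of $x^2 - 2y^2 = \pm 1$ in that range. The quadratic search is meant for small $L$ only.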
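The first statement of Lemma 6.4 is about the integral
\[
\int_{P_0} \left( |(H_{K,L}(\gamma) + \mathbf{v}) \cap \mathbb{Z}^2| - \operatorname{Area}(H_{K,L}(\gamma)) \right)^2 d\mathbf{v} = \int_0^1\!\!\int_0^1 h^2(K,L;\mathbf{v})\,d\mathbf{v}. \tag{6.214}
\]
By using (6.211) for the function $h(K,L;\mathbf{v})$ introduced in (6.213), we have
\[
h(K,L;\mathbf{v}) = \sum_{\mathbf{n}\in\mathbb{Z}^2:\ \mathbf{n}\ne\mathbf{0}} \left( \int_{H_{K,L}(\gamma)} e^{-2\pi i\,\mathbf{n}\cdot\mathbf{t}}\,d\mathbf{t} \right) e^{-2\pi i\,\mathbf{n}\cdot\mathbf{v}}. \tag{6.215}
\]
Combining (6.215) with Parseval's formula,
\[
\int_0^1\!\!\int_0^1 h^2(\gamma;K,L;\mathbf{v})\,d\mathbf{v} = \sum_{\mathbf{n}\in\mathbb{Z}^2:\ \mathbf{n}\ne\mathbf{0}} \left| \int_{H_{K,L}(\gamma)} e^{-2\pi i\,\mathbf{n}\cdot\mathbf{t}}\,d\mathbf{t} \right|^2. \tag{6.216}
\]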
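We want to evaluate the integral
\[
\int_{H_{K,L}(\gamma)} e^{-2\pi i\,\mathbf{n}\cdot\mathbf{t}}\,d\mathbf{t}
\]
for every $\mathbf{n} \in \mathbb{Z}^2$ with $\mathbf{n} \ne \mathbf{0}$. Unfortunately, there is no explicit formula for this integral; instead we are going to express it in terms of two nonelementary functions $\Omega(z)$ and $\Psi(z)$.

Motivated by the factorization $x^2 - 2y^2 = (x + y\sqrt{2})(x - y\sqrt{2})$, we apply the following substitution:
\[
u_1 = x + y\sqrt{2}, \quad u_2 = x - y\sqrt{2}, \tag{6.217}
\]
which is equivalent to
\[
x = \frac{u_1 + u_2}{2}, \quad y = \frac{u_1 - u_2}{2\sqrt{2}} \quad\text{with Jacobian}\quad \left| \frac{\partial(x,y)}{\partial(u_1,u_2)} \right| = \left| \det\begin{pmatrix} 1/2 & 1/2 \\ 2^{-3/2} & -2^{-3/2} \end{pmatrix} \right| = 2^{-3/2}. \tag{6.218}
\]
Applying the substitution (6.217) and (6.218),
\[
\begin{aligned}
\int_{H_{K,L}(\gamma)} e^{-2\pi i(n_1x + n_2y)}\,dx\,dy
&= \int_{H_{K,L}(\gamma)} e^{-2\pi i\left(n_1(u_1+u_2)/2 + n_2(u_1-u_2)/(2\sqrt{2})\right)}\,dx\,dy \\
&= \frac{1}{2\sqrt{2}} \int_{K \le u_1 \le L} e^{-2\pi i u_1(n_1\sqrt{2}+n_2)/(2\sqrt{2})} \left( \int_{-\gamma/u_1 \le u_2 \le \gamma/u_1} e^{-2\pi i u_2(n_1\sqrt{2}-n_2)/(2\sqrt{2})}\,du_2 \right) du_1 \\
&= \frac{1}{2\sqrt{2}} \int_K^L e^{-2\pi i u_1(n_1\sqrt{2}+n_2)/(2\sqrt{2})}\, \frac{2\sqrt{2}}{\pi(n_1\sqrt{2}-n_2)} \sin\!\left( \frac{\pi\gamma(n_1\sqrt{2}-n_2)}{u_1\sqrt{2}} \right) du_1.
\end{aligned} \tag{6.219}
\]
Making the substitution $u = 2\pi u_1(n_1\sqrt{2}+n_2)/(2\sqrt{2})$ in the last line of (6.219), we conclude that
\[
\begin{aligned}
\int_{H_{K,L}(\gamma)} e^{-2\pi i(n_1x + n_2y)}\,dx\,dy
&= \int_{\pi(n_1\sqrt{2}+n_2)K/\sqrt{2}}^{\pi(n_1\sqrt{2}+n_2)L/\sqrt{2}} e^{-iu}\, \frac{\sqrt{2}}{\pi^2(2n_1^2-n_2^2)} \sin\!\left( \frac{\pi^2\gamma(2n_1^2-n_2^2)}{2u} \right) du \\
&= \frac{\sqrt{2}}{\pi^2(2n_1^2-n_2^2)} \int_{\pi(n_1\sqrt{2}+n_2)K/\sqrt{2}}^{\pi(n_1\sqrt{2}+n_2)L/\sqrt{2}} \cos(u) \sin\!\left( \frac{\pi^2\gamma(2n_1^2-n_2^2)}{2u} \right) du \\
&\quad - i\,\frac{\sqrt{2}}{\pi^2(2n_1^2-n_2^2)} \int_{\pi(n_1\sqrt{2}+n_2)K/\sqrt{2}}^{\pi(n_1\sqrt{2}+n_2)L/\sqrt{2}} \sin(u) \sin\!\left( \frac{\pi^2\gamma(2n_1^2-n_2^2)}{2u} \right) du,
\end{aligned} \tag{6.220}
\]
where $\mathbf{n} = (n_1,n_2) \in \mathbb{Z}^2$ is a lattice point $\ne \mathbf{0}$. By (6.211),
\[
|(H_{K,L}(\gamma) + \mathbf{v}) \cap \mathbb{Z}^2| - \operatorname{Area}(H_{K,L}(\gamma)) = \sum_{\mathbf{n}\in\mathbb{Z}^2:\ \mathbf{n}\ne\mathbf{0}} \frac{\sqrt{2}}{\pi^2(2n_1^2-n_2^2)}\, e^{-2\pi i\,\mathbf{n}\cdot\mathbf{v}} \int(\mathbf{n}), \tag{6.221}
\]
where $\int(\mathbf{n})$ is defined as
\[
\int_{\pi(n_1\sqrt{2}+n_2)K/\sqrt{2}}^{\pi(n_1\sqrt{2}+n_2)L/\sqrt{2}} \left( \cos(u) \sin\!\left( \frac{\pi^2\gamma(2n_1^2-n_2^2)}{2u} \right) - i \sin(u) \sin\!\left( \frac{\pi^2\gamma(2n_1^2-n_2^2)}{2u} \right) \right) du.
\]
To evaluate the right-hand side of (6.220), we need to study the auxiliary functions
\[
\Omega(z) = \int_0^{\infty} \cos(x)\sin(z/x)\,dx \tag{6.222}
\]
and
\[
\Psi(z) = \int_0^{\infty} \sin(x)\sin(z/x)\,dx. \tag{6.223}
\]
In particular, we have to show that the infinite integrals in (6.222) and (6.223) are both convergent (i.e., the functions are well defined).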
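Two ingredients of the computation leading to (6.220) can be verified numerically: the evaluation of the inner $u_2$-integral in (6.219), and the constant in front of the integrals in (6.220), which is the product of the Jacobian, the constant from the inner integral, and the factor $du_1/du$. The Python sketch below is an illustration only. The helper names and the use of a composite Simpson rule are assumptions, and the signs of the exponents follow the convention $e^{-2\pi i\,\mathbf{n}\cdot\mathbf{t}}$.

```python
import cmath
import math

SQRT2 = math.sqrt(2.0)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] for a real- or complex-valued f; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3.0

def inner_integral_numeric(n1, n2, gamma, u1):
    """Integrate exp(-2*pi*i*u2*(n1*sqrt2 - n2)/(2*sqrt2)) over |u2| <= gamma/u1."""
    a = (n1 * SQRT2 - n2) / (2.0 * SQRT2)
    c = gamma / u1
    return simpson(lambda u2: cmath.exp(-2j * math.pi * a * u2), -c, c)

def inner_integral_closed(n1, n2, gamma, u1):
    """Closed form of the inner integral, as in the last line of (6.219)."""
    d = n1 * SQRT2 - n2
    return 2.0 * SQRT2 / (math.pi * d) * math.sin(math.pi * gamma * d / (u1 * SQRT2))

def prefactor(n1, n2):
    """Jacobian times inner-integral constant times du1/du; should equal sqrt2/(pi^2*(2*n1^2 - n2^2))."""
    jacobian = 1.0 / (2.0 * SQRT2)
    inner_constant = 2.0 * SQRT2 / (math.pi * (n1 * SQRT2 - n2))
    du1_du = SQRT2 / (math.pi * (n1 * SQRT2 + n2))
    return jacobian * inner_constant * du1_du
```

The product in `prefactor` collapses because $(n_1\sqrt{2}-n_2)(n_1\sqrt{2}+n_2) = 2n_1^2 - n_2^2$, which is never zero for a nonzero lattice point.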
6.6.1 Properties of the Auxiliary Functions in (6.222) and (6.223)

In fact, we study the integrals
\[
\int_a^b \cos(x)\sin(z/x)\,dx \quad\text{for all } 0 \le a < b \le \infty \text{ and } z \ne 0,
\]
and
\[
\int_a^b \sin(x)\sin(z/x)\,dx \quad\text{for all } 0 \le a < b \le \infty \text{ and } z \ne 0.
\]
First we show that the limit
\[
\lim_{N\to\infty} \int_{1/N}^{N} \cos(x)\sin(z/x)\,dx \quad\text{exists for all } z \ne 0.
\]
This limit is the formal definition of $\Omega(z)$. To prove the limit, we assume $z > 0$; by using integration by parts,
\[
\int_{\sqrt{z}}^{N} \cos(x)\sin(z/x)\,dx = \sin(N)\sin(z/N) - \sin^2(\sqrt{z}) + \int_{\sqrt{z}}^{N} \sin(x)\cos(z/x)\,zx^{-2}\,dx. \tag{6.224}
\]
Also, by making the substitution $y = z/x$,
\[
\int_{1/N}^{\sqrt{z}} \cos(x)\sin(z/x)\,dx = \int_{\sqrt{z}}^{zN} \cos(z/y)\sin(y)\,zy^{-2}\,dy. \tag{6.225}
\]
We assume that $N$ is large enough to yield $1/N < \sqrt{z} < \min\{N, zN\}$; then by (6.224) and (6.225),
\[
\int_{1/N}^{N} \cos(x)\sin(z/x)\,dx = \sin(N)\sin(z/N) - \sin^2(\sqrt{z}) + \int_{\sqrt{z}}^{N} \sin(x)\cos(z/x)\,zx^{-2}\,dx + \int_{\sqrt{z}}^{zN} \sin(x)\cos(z/x)\,zx^{-2}\,dx. \tag{6.226}
\]
Taking the limit $N \to \infty$ in (6.226), we have
\[
\Omega(z) = \lim_{N\to\infty} \int_{1/N}^{N} \cos(x)\sin(z/x)\,dx = 2\int_{\sqrt{z}}^{\infty} \sin(x)\cos(z/x)\,zx^{-2}\,dx - \sin^2(\sqrt{z}), \tag{6.227}
\]
and the infinite integral in the second line is clearly convergent, since
\[
\int_{\sqrt{z}}^{\infty} x^{-2}\,dx = z^{-1/2} < \infty.
\]
Of course, $\Omega(0) = 0$ and $\Omega(-z) = -\Omega(z)$. Next we show that the limit
\[
\lim_{N\to\infty} \int_{1/N}^{N} \sin(x)\sin(z/x)\,dx \quad\text{exists for all } z \ne 0.
\]
This limit is the formal definition of $\Psi(z)$. To prove the limit, let $z > 0$, and repeating the arguments above, we have
\[
\int_{\sqrt{z}}^{N} \sin(x)\sin(z/x)\,dx = -\cos(N)\sin(z/N) + \cos(\sqrt{z})\sin(\sqrt{z}) - \int_{\sqrt{z}}^{N} \cos(x)\cos(z/x)\,zx^{-2}\,dx, \tag{6.228}
\]
and
\[
\int_{1/N}^{\sqrt{z}} \sin(x)\sin(z/x)\,dx = \int_{\sqrt{z}}^{zN} \sin(z/y)\sin(y)\,zy^{-2}\,dy,
\]
and also
\[
\int_{1/N}^{N} \sin(x)\sin(z/x)\,dx = -\cos(N)\sin(z/N) + \cos(\sqrt{z})\sin(\sqrt{z}) + \int_{\sqrt{z}}^{zN} \sin(x)\sin(z/x)\,zx^{-2}\,dx - \int_{\sqrt{z}}^{N} \cos(x)\cos(z/x)\,zx^{-2}\,dx.
\]
Taking the limit $N \to \infty$, we have
\[
\Psi(z) = \lim_{N\to\infty} \int_{1/N}^{N} \sin(x)\sin(z/x)\,dx = -\int_{\sqrt{z}}^{\infty} \cos\!\left(x + \frac{z}{x}\right) zx^{-2}\,dx + \sin(2\sqrt{z})/2, \tag{6.229}
\]
and again the infinite integral in the second line is clearly convergent for the same reason as (6.227). Of course, $\Psi(0) = 0$ and $\Psi(-z) = -\Psi(z)$.

Equations (6.227) and (6.229) show that the functions $\Omega(z)$ and $\Psi(z)$ are well defined. Their asymptotic behavior is described by Lemma 6.5. On the other hand, the limit constant $\sigma^2(\gamma)$ in Lemma 6.4 is described by Lemma 6.6. We conclude Sect. 6.6 deriving Lemma 6.6 from Lemmas 6.4 and 6.5. The proofs of Lemmas 6.4 and 6.5 are postponed to the next section.
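The integration-by-parts identities (6.224) and (6.228), on which the definitions of the two auxiliary functions rest, hold on every finite interval from $\sqrt{z}$ to $N$ and can therefore be tested numerically. The following Python sketch is a hedged illustration: the function names are hypothetical, and the composite Simpson rule is simply one convenient quadrature, not something used in the text.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3.0

def cos_identity(z, N):
    """Return (left side, right side) of (6.224) for z > 0 and N > sqrt(z)."""
    r = math.sqrt(z)
    left = simpson(lambda x: math.cos(x) * math.sin(z / x), r, N)
    right = (math.sin(N) * math.sin(z / N) - math.sin(r) ** 2
             + simpson(lambda x: math.sin(x) * math.cos(z / x) * z / (x * x), r, N))
    return left, right

def sin_identity(z, N):
    """Return (left side, right side) of (6.228) for z > 0 and N > sqrt(z)."""
    r = math.sqrt(z)
    left = simpson(lambda x: math.sin(x) * math.sin(z / x), r, N)
    right = (-math.cos(N) * math.sin(z / N) + math.cos(r) * math.sin(r)
             - simpson(lambda x: math.cos(x) * math.cos(z / x) * z / (x * x), r, N))
    return left, right
```

The point of both identities is visible in the code: the right-hand integrands carry the factor $zx^{-2}$, which makes them absolutely integrable at infinity, while the boundary terms at $N$ tend to zero.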
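6.6.2 Deduction of Lemma 6.6 from Lemmas 6.4 and 6.5

First note that the asymptotic formula at the end of Lemma 6.6 immediately follows from Lemma 6.5. Again applying Lemma 6.5, there is an absolute constant $c_3 > 0$ such that if $z \ge c_3$ then
\[
\Omega^2(z) + \Psi^2(z) \ge \frac{1}{2}\cdot\frac{\pi}{8}\, z^{1/2} \left( \left(\sin(2\sqrt{z}) + \cos(2\sqrt{z})\right)^2 + \left(\sin(2\sqrt{z}) - \cos(2\sqrt{z})\right)^2 \right) = \frac{1}{2}\cdot\frac{\pi}{8}\, z^{1/2}\cdot 2 = \frac{\pi}{8}\, z^{1/2}, \tag{6.230}
\]
and also
\[
\Omega^2(z) + \Psi^2(z) \le 2\cdot\frac{\pi}{8}\, z^{1/2}\cdot 2 = \frac{\pi}{2}\, z^{1/2}. \tag{6.231}
\]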
We distinguish three cases. Case 1: > c3 Then by Lemma 6.4 with n D 1, and also by (6.230), we have 2 . / D
8 log.1 C
8 log.1 C p
p
2/
2 . 2 n=2/ C ‰ 2 . 2 n=2/
1=2 2 2 p =2 p D : 2/ 8 log.1 C 2/
(6.232)
On the other hand, by Lemma 6.4 and (6.97),
\[
\sigma^2(\gamma) \le c_4\sqrt{\gamma} \tag{6.233}
\]
with some absolute constant $c_4$.

Next we assume that $\gamma > 0$ is "small." We recall (6.227):
\[
\Omega(z) = 2\int_{\sqrt{z}}^{\infty} \sin(x)\cos(z/x)\,zx^{-2}\,dx - \sin^2(\sqrt{z}) = 2\int_{\sqrt{z}}^{1} \sin(x)\cos(z/x)\,zx^{-2}\,dx + 2\int_{1}^{\infty} \sin(x)\cos(z/x)\,zx^{-2}\,dx - \sin^2(\sqrt{z}).
\]
If $z > 0$ is "small" then
\[
\int_{\sqrt{z}}^{1} \sin(x)\cos(z/x)\,zx^{-2}\,dx = \int_{\sqrt{z}}^{1} x\,zx^{-2}\,dx + O(z) = z\int_{\sqrt{z}}^{1} x^{-1}\,dx + O(z) = z\log\frac{1}{\sqrt{z}} + O(z) = \frac{1}{2}\,z\log\frac{1}{z} + O(z),
\]
and
\[
\int_{1}^{\infty} \sin(x)\cos(z/x)\,zx^{-2}\,dx = O\!\left( z\int_{1}^{\infty} x^{-2}\,dx \right) = O(z).
\]
Thus, for $0 < z < 1/2$ we have
\[
\Omega(z) = z\log\frac{1}{z} + O(z).
\]
It follows that there is a (possibly small) constant $c_5 > 0$ such that, for all $0 < z < c_5$,
\[
\frac{1}{2}\,z\log\frac{1}{z} < \Omega(z) < 2z\log\frac{1}{z}. \tag{6.234}
\]
Next we switch from $\Omega(z)$ to $\Psi(z)$: by definition,
\[
\Psi(z) = \int_0^{\pi/2} \sin(x)\sin(z/x)\,dx + \int_{\pi/2}^{\infty} \sin(x)\sin(z/x)\,dx,
\]
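and clearly
\[
\left| \int_0^{\pi/2} \sin(x)\sin(z/x)\,dx \right| \le \int_0^{\pi/2} x\cdot(z/x)\,dx = \frac{\pi z}{2}.
\]
By integration by parts [similarly to (6.228)], and since the boundary terms at $\pi/2$ and at infinity both vanish,
\[
\left| \int_{\pi/2}^{\infty} \sin(x)\sin(z/x)\,dx \right| = \left| \cos(\pi/2)\sin(2z/\pi) - \int_{\pi/2}^{\infty} \cos(x)\cos(z/x)\,zx^{-2}\,dx \right| \le z\int_{\pi/2}^{\infty} x^{-2}\,dx = \frac{2z}{\pi}.
\]
Therefore,
\[
|\Psi(z)| \le \frac{\pi z}{2} + \frac{2z}{\pi} < 3z \quad\text{for all } z > 0. \tag{6.235}
\]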
By (6.234) and (6.235) there is a (small) constant $c_6 > 0$ such that for all $0 < z \le c_6$ $(< 1/2)$,
\[
\frac{1}{4}\,z^2\log^2\frac{1}{z} < \Omega^2(z) + \Psi^2(z) < 5z^2\log^2\frac{1}{z}. \tag{6.236}
\]
Now we are ready to discuss Case 2: 0 < < c6 =10 Then by Lemma 6.4 and (6.236), 2 . /
1 log.1 C
p
X 2/ 1nc6 =5
2 R˙ .n/ 2 n=2 log2 2 n
2 ; 2 n
(6.237)
where R˙ .n/ denotes the number of primary representations of x 2 2y 2 D ˙n. The special case d D 2 in (2.221) gives log.1 C 1 X R˙ .n/ D p N 1nN 2
p
2/
C O N 1=2 :
(6.238)
Combining (6.237) and (6.238) with Abel’s transformation (2.119), 2 . / with some absolute constant c7 > 0.
c7 2 D c7
(6.239)
On the other hand, by Lemma 6.4, (6.236), (6.238), and Lemma 6.5, 2 . /
C
24 log.1 C
p
X 2/ 1nc6 =5
2 R˙ .n/ 2 n=2 log2 2 n
2 C 2 n
X R˙ .n/ X R˙ .n/ p p O. n/ D O. / C O. / D O. /: n2 n3=2
n>c6 =5
(6.240)
n>c6 =5
By (6.239) and (6.240) there are constants $0 < c_8 < c_9$ such that
\[
0 < c_8\gamma < \sigma^2(\gamma) < c_9\gamma \quad\text{for all } 0 < \gamma < c_6/10. \tag{6.241}
\]
It remains to discuss

Case 3: $c_6/10 \le \gamma \le c_3$

We show that there are constants $0 < c_{10} < c_{11}$ such that in this range of $\gamma$,
\[
c_{10} < \sigma^2(\gamma) < c_{11}. \tag{6.242}
\]
The upper bound is trivial from Lemma 6.4 and (6.97). To prove the lower bound, we simply choose the least complete square $m^2$ such that $z = \pi^2\gamma m^2/2 \ge c_3$. Then by (6.230),
\[
\Omega^2(z) + \Psi^2(z) \ge \frac{\pi}{8}\,z^{1/2} \ge c_{12} > 0,
\]
and of course $R_{\pm}(m^2) \ge 1$ (since $x^2 - 2y^2 = -m^2$ has the solution $x = y = m$). Now the lower bound in (6.242) is trivial from Lemma 6.4: we just use the single term $n = m^2$. Combining (6.232), (6.233), (6.239), (6.240), and (6.242), Lemma 6.6 follows.

Concluding Remark. Lemma 6.6 tells us that in the two different ranges $0 < \gamma < 1$ and $\gamma > 1$ we have two different exponents of $\gamma$, namely, 1 and 1/2, to describe the order of $\sigma^2(\gamma)$. Here we give an intuitive explanation for this somewhat surprising phenomenon. We recall the definition of region $H_\gamma(\sqrt{2};N)$:
\[
\left\{ (x,y) \in \mathbb{R}^2 : -\gamma \le x^2 - 2y^2 \le \gamma, \ \text{where}\ 0 \le y \le e^N,\ x \ge 0 \right\},
\]
i.e., $H_\gamma(\sqrt{2};N)$ denotes an exponentially long and narrow tilted "hyperbolic needle" of area $\gamma N/\sqrt{2} + O(1)$.

First assume that $\gamma$ is "very small"; say, $0 < \gamma < 10^{-2}$. Divide the region $H_\gamma(\sqrt{2};N)$ into segments $H_1, H_2, H_3, \ldots$ such that each $H_i$ is covered by a rectangle of slope $1/\sqrt{2}$ and area 1/5. (Note that slope $1/\sqrt{2}$ comes from $x^2 = 2y^2$, which is equivalent to $y/x = \pm 1/\sqrt{2}$; on the other hand, area 1/5 comes from Lemma 5.5.) Then the area of each segment $H_i$ is about $\gamma\log(1/\gamma)$, and the number of segments $H_i$ is about $N/\log(1/\gamma)$. By Lemma 5.5, each translate $H_i + \mathbf{v}$, $\mathbf{v} \in \mathbb{R}^2$, contains at most one lattice point. Note that $H_i$ and $H_{i+k}$ have dramatically different shapes as the gap $k$ is increasing: the change is larger than $k$ times iterated doubling–halving (doubling in one direction, halving in another direction). Therefore, it is plausible to assume that the occurrence of a lattice point in $H_i + \mathbf{v}$ and in $H_{i+k} + \mathbf{v}$, as $\mathbf{v}$ runs in the unit square, is (almost) an independent event if $k$ is "large." By using the additivity of the variance for independent components, we have
\[
\begin{aligned}
\operatorname{Variance}\left( |(H_\gamma(\sqrt{2};N) + \mathbf{v}) \cap \mathbb{Z}^2| - \operatorname{Area}(H_\gamma(\sqrt{2};N)) \right)
&\approx \sum_{1 \le i \le O(N/\log(1/\gamma))} \operatorname{Variance}\left( |(H_i + \mathbf{v}) \cap \mathbb{Z}^2| - \operatorname{Area}(H_i) \right) \\
&= \sum_{1 \le i \le O(N/\log(1/\gamma))} \operatorname{Area}(H_i) \approx \gamma\log(1/\gamma)\cdot N/\log(1/\gamma) = \gamma N,
\end{aligned}
\]
which perfectly fits Lemma 6.6 for the range $0 < \gamma < 1$.

Next assume that $\gamma$ is "very large," say, $\gamma > 10^2$. In this case we divide the region $H_\gamma(\sqrt{2};N)$ into segments $H_1, H_2, H_3, \ldots$ such that each $H_i$ has area $\gamma$. The parts $H_i$, $1 \le i \le N/\sqrt{2}$, have a doubling–halving behavior: the next part $H_{i+1}$ is twice as long and half as narrow as $H_i$, which is a dramatic change in the shapes. We recall that $x_1 = x + 2y$, $y_1 = x + y$ is a basic automorphism of the quadratic form $x^2 - 2y^2$. Indeed,
\[
x_1^2 - 2y_1^2 = (x + 2y)^2 - 2(x + y)^2 = -(x^2 - 2y^2).
\]
Applying a proper power $A^k$ for any segment $H_i$, the automorphism $A$ with $A = \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}$ maps the long and narrow tilted region $H_i$ into a "round" shape of size about $\sqrt{\gamma} \times \sqrt{\gamma}$ (the area is $\gamma$). Since the perimeter of such a "round" shape is $O(\sqrt{\gamma})$, it is clear that
\[
\operatorname{Variance}\left( |(H_i + \mathbf{v}) \cap \mathbb{Z}^2| - \operatorname{Area}(H_i) \right) \approx \sqrt{\gamma}.
\]
Again assuming independence for the different parts $H_i$, we obtain
\[
\operatorname{Variance}\left( |(H_\gamma(\sqrt{2};N) + \mathbf{v}) \cap \mathbb{Z}^2| - \operatorname{Area}(H_\gamma(\sqrt{2};N)) \right) \approx \sum_{1 \le i \le N/\sqrt{2}} \sqrt{\gamma} \approx \sqrt{\gamma}\,N,
\]
which fits Lemma 6.6 for $\gamma > 1$. This completes our "intuitive understanding" of Lemma 6.6.
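The automorphism used in the remark above is easy to experiment with in exact integer arithmetic. The short Python sketch below (function names are hypothetical, chosen for illustration only) applies the map $(x,y) \mapsto (x+2y,\ x+y)$ repeatedly and evaluates the form $x^2 - 2y^2$, whose sign flips at every step while its absolute value stays fixed.

```python
def form(x, y):
    """The quadratic form x^2 - 2*y^2."""
    return x * x - 2 * y * y

def automorphism(x, y):
    """One application of the matrix A = [[1, 2], [1, 1]] to the column vector (x, y)."""
    return x + 2 * y, x + y

def orbit(x, y, steps):
    """The points (x, y), A(x, y), A^2(x, y), ..., A^steps(x, y)."""
    points = [(x, y)]
    for _ in range(steps):
        x, y = automorphism(x, y)
        points.append((x, y))
    return points
```

Starting from $(1,0)$, the orbit runs through $(1,1)$, $(3,2)$, $(7,5)$, $(17,12)$, which are solutions of $x^2 - 2y^2 = \pm 1$; the coordinates grow by a factor of about $1+\sqrt{2}$ per step.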
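6.7 Proving the Three Lemmas: Part Two

It remains to prove Lemmas 6.4 and 6.5. We begin with the

Proof of Lemma 6.5. In view of (6.229) it is natural to study the integral
\[
I = \int_{\sqrt{z}}^{\infty} \cos\!\left(x + \frac{z}{x}\right) zx^{-2}\,dx; \tag{6.243}
\]
also, we assume $z > 1$. We make the substitution $x = \sqrt{z} + y$:
\[
\begin{aligned}
x + \frac{z}{x} &= (\sqrt{z} + y) + \frac{z}{\sqrt{z} + y} = (\sqrt{z} + y) + \frac{\sqrt{z}}{1 + \frac{y}{\sqrt{z}}} \\
&= (\sqrt{z} + y) + \sqrt{z}\left( 1 + \sum_{k=1}^{\infty} \left( -\frac{y}{\sqrt{z}} \right)^k \right) \\
&= 2\sqrt{z} + \frac{y^2}{\sqrt{z}}\left( 1 + \sum_{k=1}^{\infty} \left( -\frac{y}{\sqrt{z}} \right)^k \right).
\end{aligned} \tag{6.244}
\]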
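The expansion (6.244) is a geometric series in $-y/\sqrt{z}$, so it converges for $|y| < \sqrt{z}$. A minimal Python sketch for checking it numerically is given below; the function names and the truncation length are assumptions made for illustration.

```python
import math

def exact_phase(z, y):
    """x + z/x evaluated at x = sqrt(z) + y."""
    x = math.sqrt(z) + y
    return x + z / x

def series_phase(z, y, terms=60):
    """Right-hand side of (6.244) with the infinite sum truncated after `terms` terms.

    The truncation is only accurate when |y| is well below sqrt(z).
    """
    r = math.sqrt(z)
    t = -y / r
    geometric = 1.0 + sum(t ** k for k in range(1, terms + 1))
    return 2.0 * r + (y * y / r) * geometric
```

For $z = 9$ and $y = 1$ both functions return $6.25$ up to rounding, since $4 + 9/4 = 6 + \tfrac{1}{3}\cdot\tfrac{3}{4}$.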
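Before applying (6.244), first we split the integral (6.243) into two parts: $I = I_1 + I_2$ where
\[
I_1 = \int_{\sqrt{z}}^{\sqrt{z}+z^{\delta}} \cos\!\left(x + \frac{z}{x}\right) zx^{-2}\,dx \tag{6.245}
\]
and
\[
I_2 = \int_{\sqrt{z}+z^{\delta}}^{\infty} \cos\!\left(x + \frac{z}{x}\right) zx^{-2}\,dx, \tag{6.246}
\]
where the value of the constant parameter $\delta$ in $1/4 < \delta < 1/2$ will be specified later (note in advance that $\delta = 7/24$ will be a good choice). To evaluate the integral in (6.245), we use the substitution $y = x - \sqrt{z}$ and (6.244), and also use the trigonometric identity $\cos(\alpha + \beta) = \cos(\alpha)\cos(\beta) - \sin(\alpha)\sin(\beta)$ as follows:
\[
\begin{aligned}
I_1 &= \int_{\sqrt{z}}^{\sqrt{z}+z^{\delta}} \cos\!\left(x + \frac{z}{x}\right) zx^{-2}\,dx \\
&= \int_0^{z^{\delta}} \cos\!\left( 2\sqrt{z} + \frac{y^2}{\sqrt{z}}\left( 1 + \sum_{k=1}^{\infty} \left( -\frac{y}{\sqrt{z}} \right)^k \right) \right) \frac{z}{(\sqrt{z}+y)^2}\,dy \\
&= \cos(2\sqrt{z}) \int_0^{z^{\delta}} \cos\!\left( \frac{y^2}{\sqrt{z}}\left( 1 + \sum_{k=1}^{\infty} \left( -\frac{y}{\sqrt{z}} \right)^k \right) \right) \frac{1}{(1 + y/\sqrt{z})^2}\,dy \\
&\quad - \sin(2\sqrt{z}) \int_0^{z^{\delta}} \sin\!\left( \frac{y^2}{\sqrt{z}}\left( 1 + \sum_{k=1}^{\infty} \left( -\frac{y}{\sqrt{z}} \right)^k \right) \right) \frac{1}{(1 + y/\sqrt{z})^2}\,dy.
\end{aligned} \tag{6.247}
\]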
Making the substitution u D yz1=4 in (6.247), we have Z I1 D Z
p D z1=4 cos.2 z/
p zCz
cos.x C
p z
z1=4
cos u2 1 C 0
1=4
z
p sin.2 z/
Z
z / zx 2 dx D x
1 X
!! .uz1=4 /k
kD1 1 X .uz1=4 /k 1C
z1=4
sin u
2
0
!!
kD1
To evaluate (6.248), we use Lemma 6.8. We have Z 1 Z cos.u2 / d u D 0
1 0
1 du .1 C uz1=4 /2 1 d u: .1 C uz1=4 /2 (6.248) t u
p sin.u2 / d u D p ; 2 2
(6.249)
and for any M > 1, ˇZ ˇ ˇ ˇ
1
M
ˇ ˇZ ˇ ˇ ˇ 2 ˇˇ 1 2 2 ˇ cos.u / d uˇ < 2 ; ˇ sin.u / d uˇˇ < 2 : M M M 2
(6.250)
Remark. The two integrals in (6.249) are the so-called Fresnel integrals. For the sake of completeness we include a proof.

Proof of Lemma 6.8. To prove (6.249) we use Cauchy's integral theorem for complex variables. Let $\Gamma = \Gamma_1 \cup \Gamma_2 \cup \Gamma_3$ be the closed curve, where $\Gamma_1$ is the interval $[0,R]$ on the real axis; $\Gamma_2$ is the arc $Re^{i\vartheta}$ where $0 \le \vartheta \le \pi/4$ (of course $i = \sqrt{-1}$), and $\Gamma_3$ is the line segment $\{re^{i\pi/4} : R \ge r \ge 0\}$ returning to the origin. Since $f(w) = e^{-w^2}$ is an analytic function (where $w = x + iy$), by Cauchy's theorem,
\[
0 = \int_{\Gamma} f(w)\,dw = \sum_{j=1}^{3} \int_{\Gamma_j} f(w)\,dw.
\]
We have
\[
\int_{\Gamma_1} f(w)\,dw = \int_0^R e^{-x^2}\,dx \to \int_0^{\infty} e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2} \quad\text{as } R \to \infty,
\]
\[
\int_{\Gamma_2} f(w)\,dw \to 0 \quad\text{as } R \to \infty,
\]
\[
\int_{\Gamma_3} f(w)\,dw = -\frac{1+i}{\sqrt{2}} \int_0^R e^{-ix^2}\,dx \to -\frac{1+i}{\sqrt{2}} \int_0^{\infty} \left( \cos(x^2) - i\sin(x^2) \right) dx
\]
as $R \to \infty$. Summarizing, with $R \to \infty$ we have
\[
0 = \frac{\sqrt{\pi}}{2} - \frac{1+i}{\sqrt{2}} \int_0^{\infty} \left( \cos(x^2) - i\sin(x^2) \right) dx,
\]
and (6.249) follows.

Next we prove (6.250). We work with sin; the same argument works for cos. Let $m$ be the least integer such that $\pi m \ge M^2$. We have
\[
\int_M^{\infty} \sin(x^2)\,dx = \int_M^{(\pi m)^{1/2}} \sin(x^2)\,dx + \sum_{j=0}^{\infty} A_j
\]
where
\[
A_j = \int_{(\pi(m+j))^{1/2}}^{(\pi(m+j+1))^{1/2}} \sin(x^2)\,dx.
\]
Notice that $\sum_{j=0}^{\infty} A_j$ is an alternating series such that $|A_j| \ge |A_{j+1}|$ and $A_j \to 0$ as $j \to \infty$. Thus we have
\[
\left| \sum_{j=0}^{\infty} A_j \right| \le |A_0|,
\]
and so ˇZ ˇ ˇ ˇ
1 M
ˇ Z ˇ sin.x 2 / dx ˇˇ
..mC1//1=2
j sin.x 2 /j dx M
..m C 1//1=2 M .M 2 C 2/1=2 M <
2 ; M2 t u
which proves (6.250), and Lemma 6.8 follows. Let’s return to (6.248): Z I1 D
1=4
Dz
p zCz
cos.x C
p z
Z
p cos.2 z/
z1=4
z / zx 2 dx D x
cos.u / d u C O .z1=4 /4 z1=4
!
2
0
Z
p
z1=4
z1=4 sin.2 z/
! sin.u2 / d u C O .z1=4 /4 z1=4 :
(6.251)
0
By Lemma 6.8, Z
Z
z1=4
Z
1
2
cos.u / d u D
cos.u2 / d u D
cos.u / d u z1=4
0
0
1
2
p 1 D p C O.z 2 2 /; 2 2
(6.252)
and similarly Z
z1=4
p 1 sin.u / d u D p C O.z 2 2 /: 2 2 2
0
(6.253)
To minimize the total error [see (6.251)–(6.253)], we choose in such a way that 1
.z1=4 /4 z1=4 D z 2 2 ; that is; D 7=24: Combining (6.251)–(6.254), with D 7=24 and z > 1, we have Z I1 D
p zCz
p z
cos.x C
z / zx 2 dx D x
(6.254)
p p p D p z1=4 cos.2 z/ sin.2 z/ C O z1=12 ; 2 2
(6.255)
which gives a good estimate for the first integral I1 in (6.245). It remains to estimate the second integral I2 in (6.245): Z I2 D
1 p
cos.x C zCz
z / zx 2 dx with D 7=24: x
(6.256)
To estimate $I_2$ we apply a general lemma about exponential sums.

Lemma 6.9. Let $F(x)$ and $G(x)$ be real-valued functions, $F$ is differentiable with derivative $F'$, $F(x)$ and $F'(x)/G(x)$ are both monotonic throughout the interval $a \le x \le b$. Then
\[
\left| \int_a^b e^{iF(x)}G(x)\,dx \right| \le 2\left( \left| \frac{G(a)}{F'(a)} \right| + \left| \frac{G(b)}{F'(b)} \right| \right).
\]
Remark. This is a standard tool in analytic number theory; nevertheless, for the sake of completeness, we include a proof.

Proof. The basic idea is the same as that of the simpler inequality (6.249). Suppose, for example, that $F(x)$ is monotone increasing, i.e., $F'(x) > 0$ for $a \le x \le b$. Let $F^{-1}$ denote the inverse function to $F$; it is also increasing. Applying the substitution $x = F^{-1}(u)$,
\[
\int_a^b e^{iF(x)}G(x)\,dx = \int_{F(a)}^{F(b)} e^{iu}\,\frac{G(F^{-1}(u))}{F'(F^{-1}(u))}\,du = \int_{F(a)}^{F(b)} e^{iu}h(u)\,du \quad\text{with}\quad h(u) = \frac{G(F^{-1}(u))}{F'(F^{-1}(u))}; \tag{6.257}
\]
note that $h(u)$ is a monotone function. By integration by parts,
\[
\int_{F(a)}^{F(b)} e^{iu}\,dh(u) + \int_{F(a)}^{F(b)} ie^{iu}h(u)\,du = e^{iF(b)}h(F(b)) - e^{iF(a)}h(F(a)). \tag{6.258}
\]
The first integral is estimated from above as follows:
\[
\left| \int_{F(a)}^{F(b)} e^{iu}\,dh(u) \right| \le \left| \int_{F(a)}^{F(b)} 1\,dh(u) \right| = |h(F(b)) - h(F(a))|. \tag{6.259}
\]
Combining (6.257)–(6.259), Lemma 6.9 follows. □
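The inequality of Lemma 6.9 can be tried out on the pair $F(x) = x + z/x$, $G(x) = zx^{-2}$ to which it is applied next. The Python sketch below is a hedged illustration: the function names are hypothetical, the composite Simpson rule is an arbitrary choice of quadrature, and the upper limit $b$ is finite (the text lets it tend to infinity). On an interval to the right of $\sqrt{z}$ both $F$ and $F'/G = (x^2 - z)/z$ are increasing, so the hypotheses hold.

```python
import cmath
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] for a complex-valued f; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3.0

def lemma_6_9_sides(z, a, b):
    """Return (|integral of exp(iF)*G over [a, b]|, 2*(|G(a)/F'(a)| + |G(b)/F'(b)|))."""
    F = lambda x: x + z / x
    dF = lambda x: 1.0 - z / (x * x)
    G = lambda x: z / (x * x)
    integral = simpson(lambda x: cmath.exp(1j * F(x)) * G(x), a, b)
    bound = 2.0 * (abs(G(a) / dF(a)) + abs(G(b) / dF(b)))
    return abs(integral), bound
```

Taking the lower limit $a = \sqrt{z} + z^{7/24}$ mirrors the splitting point between the two parts of the integral; moving $a$ closer to $\sqrt{z}$ makes $F'(a)$ small and the bound correspondingly weak, which is why that neighbourhood is treated separately.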
To estimate (6.256) we use Lemma 6.9 with F .x/ D x C
z and G.x/ D zx 2 : x
Then F 0 .x/ D 1
z F 0 .x/ z x2 x2 z and D 1 D ; x2 G.x/ x2 z z
and both are positive for x >
p p z. If x > z C z with D 7=24, then 1
F 0 .x/ x2 z 2z 2 C 1 D > D z 2 ; G.x/ z z and by Lemma 6.9, ˇZ ˇ ˇ 1 ˇ z 1 ˇ ˇ 2 cos.x C / zx dx ˇ 2z 2 D 2z5=24 : jI2 j D ˇ p ˇ zCz ˇ x
(6.260)
Combining (6.243), (6.245), (6.255), and (6.260), we obtain for z > 1, Z
1 p
cos.x C z
z / zx 2 dx D x
p p p D p z1=4 sin.2 z/ cos.2 z/ C O z1=24 : 2 2
(6.261)
Using (6.261) and (6.229), we have the asymptotic formula in Lemma 6.8 for ‰.z/ with z > 1. If 0 < z 1, then we just use the trivial estimation in (6.229): ˇZ ˇ ˇ 1 p z 2 ˇˇ ˇ j‰.z/j ˇ p cos x C zx dx ˇ C j sin.2 z/=2j ˇ z ˇ x Z z
1
p
z
x 2 dx C
p p p z z D p C z D 2 z: z
This completes the proof of Lemma 6.8 for ‰.z/. Next we discuss .z/, see (6.227). Using the trigonometric identity z z z 2 sin.x/ cos. / D sin.x C / C sin.x / x x x
in (6.227), we have Z .z/ D 2
1
p
p sin.x/ cos.z=x/zx 2 dx sin2 . z/ D
z
Z D Z C
1
p
sin.x C z
1
p
sin.x z
z /zx 2 dxC x
p z /zx 2 dx sin2 . z/: x
(6.262)
The first integral Z I D
1
sin.x C
p z
z /zx 2 dx x
is analogous to (6.243), so, not surprisingly, we just repeat the arguments above. Similarly to (6.245), I D I1 C I2 where Z I1 D
p zCz p
sin.x C z
Z I2 D
z / zx 2 dx and x
1 p zCz
sin.x C
z / zx 2 dx; x
and similarly to (6.248) I1 D z
1=4
p sin.2 z/
Z
z1=4 2
cos u
1C
0
1=4
Cz
p cos.2 z/
Z
1 X
!! .uz
1=4 k
/
kD1
z1=4
sin u
2
1C
0
1 X
!! 1=4 k
.uz
/
kD1
1 du .1 C uz1=4 /2
1 d u: .1 C uz1=4 /2
By using Lemma 6.9 as above, we eventually obtain the following analog of (6.261): for z > 1, Z
1
p
sin.x C z
z / zx 2 dx D x
p p p D p z1=4 sin.2 z/ C cos.2 z/ C O z1=24 : 2 2
(6.263)
Next we estimate the second integral in (6.262): Z
1 p
sin.x z
z /zx 2 dx: x
Now we apply Lemma 6.9 with F .x/ D x
z and G.x/ D zx 2 : x
Then F 0 .x/ D 1 C
z F 0 .x/ z x2 x2 C z and D 1 C D ; 2 2 x G.x/ x z z
and by Lemma 6.9, ˇ ˇZ ˇ ˇ 1 z ˇ ˇ sin.x / zx 2 dx ˇ 2: ˇ p ˇ ˇ zCz x
(6.264)
By (6.262)–(6.264) we obtain the asymptotic formula in Lemma 6.8 for .z/ with z > 1. If 0 < z 1, then we just use the trivial estimation in (6.262): Z j.z/j 2z
1 p z
p p z x 2 dx C sin2 . z/ 2 p C z D 3 z: z t u
This completes the proof of Lemma 6.8. Next we discuss the Proof of Lemma 6.4. By (6.220) and Parseval’s formula, Z 1Z
1
2 j.HK;L . / C v/ \ ZZ2 j Area.HK;L . // d v D
0 0
X
D
2
n2ZZ Wn¤0
C
1 .2n21 n22 /2
X 2
n2ZZ Wn¤0
1 2 .2n1 n22 /2
Z
.n1
.n1
Z
p
p
2Cn2 /K=
.n1 .n1
2Cn2 /L=
p 2
p 2
cos.u/ sin
p p 2Cn2 /L= 2
p
2Cn2 /K=
p 2
2
sin.u/ sin
.2n21
2
.2n21
!2
n22 /=2u
du
n22 /=2u
C !2
du
:
(6.265)
Equation (6.265) displays the integrals
\[
\int_a^b \cos(u)\sin\!\left(\frac{z}{u}\right) du \quad\text{and}\quad \int_a^b \sin(u)\sin\!\left(\frac{z}{u}\right) du \tag{6.266}
\]
with
with p p p a D a.n/ D .n1 2 C n2 /K= 2; b D b.n/ D 2.n1 2 C n2 /L; z D 2 .2n21 n22 /=2: (6.267)
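Clearly
\[
\int_a^b \cos(u)\sin\!\left(\frac{z}{u}\right) du = \Omega(z) - \int_0^a \cos(u)\sin\!\left(\frac{z}{u}\right) du - \int_b^{\infty} \cos(u)\sin\!\left(\frac{z}{u}\right) du, \tag{6.268}
\]
and
\[
\int_a^b \sin(u)\sin\!\left(\frac{z}{u}\right) du = \Psi(z) - \int_0^a \sin(u)\sin\!\left(\frac{z}{u}\right) du - \int_b^{\infty} \sin(u)\sin\!\left(\frac{z}{u}\right) du. \tag{6.269}
\]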
To estimate the tail integrals in (6.268) and (6.269), we use the simple

Lemma 6.10. If $0 < a < b < \infty$ and $z > 0$ then
\[
\left| \int_0^a \cos(u)\sin\!\left(\frac{z}{u}\right) du \right| \le a, \qquad \left| \int_0^a \sin(u)\sin\!\left(\frac{z}{u}\right) du \right| \le a,
\]
\[
\left| \int_b^{\infty} \cos(u)\sin\!\left(\frac{z}{u}\right) du \right| \le \frac{2z}{b}, \qquad \left| \int_b^{\infty} \sin(u)\sin\!\left(\frac{z}{u}\right) du \right| \le \frac{2z}{b}.
\]
Proof. The first line is trivial. To prove the second line, we apply integration by parts:
\[
\left| \int_b^{\infty} \cos(u)\sin\!\left(\frac{z}{u}\right) du \right| = \left| -\sin(b)\sin(z/b) + \int_b^{\infty} \sin(u)\cos\!\left(\frac{z}{u}\right) zu^{-2}\,du \right| \le (z/b) + z\int_b^{\infty} u^{-2}\,du = \frac{2z}{b}.
\]
Similar argument works for the other one. □
To prove Lemma 6.4, we basically repeat the proof of Proposition 2.20, or, what is very similar, the proof of Proposition 3.2 (in the special case $\alpha = \sqrt{2}$). In fact, what we are going to do next is a somewhat simpler version. Let $A > 0$ be a positive integer; if $x = v \ge 0$, $y = w \ge 0$ is a primary solution of $x^2 - 2y^2 = \pm A$, then by definition [see (2.219)]
\[
\pm A = v^2 - 2w^2 = (v + w\sqrt{2})(v - w\sqrt{2}) \quad\text{with}\quad 1 < \left| \frac{v + w\sqrt{2}}{v - w\sqrt{2}} \right| \le (1 + \sqrt{2})^2,
\]
implying
\[
\sqrt{A} < v + w\sqrt{2} \le (1 + \sqrt{2})\sqrt{A}. \tag{6.270}
\]
It follows from the classical product formula (2.213) that for every integer j , p p p .vCw 2/.1C 2/j D X CY 2 gives a solution xDX; y D Y of x 2 2y 2 D ˙A: (6.271) 2 Now let’s return to (6.265). Let a fixed integer; p j write z D A=2. If p A > 0 be p 2 2 2n1 n2 D ˙A then n1 C n2 2 D .v C w 2/.1 C 2/ for some integer j . We begin with [see (6.267)] Case 1: Suppose that
0 < a D a.n/ D .n1
p
p p 2Cn2 /K= 2 < 1; and bDb.n/ D 2.n1 2Cn2 /L > z D 2 A=2:
By using Lemma 6.10 in (6.268) and (6.269), Z
b
z cos.u/ sin. / d u D .z/ C O.a/ C O.z=b/; u
b
z sin.u/ sin. / d u D ‰.z/ C O.a/ C O.z=b/; u
a
Z a
and so Z a
b
z cos.u/ sin. / d u u
!2
Z
b
C a
z sin.u/ sin. / d u u
!2 D
D 2 .z/ C ‰ 2 .z/ C O.a C z=b/.j.z/j C j‰.z//j/ D p D 2 .z/ C ‰ 2 .z/ C O.a C z=b/ 2 .z/ C ‰ 2 .z/; where in the last step we used the Cauchy–Schwartz inequality.
(6.272)
By (6.271), for every fixed integer A > 0 there are as many as log.b=a/ 2 log A log.L=K/ 2 log A p C O.1/ D p C O.1/ log.1 C 2/ log.1 C 2/
(6.273)
p p p integer values of j such that n1 C n2 2 D .v C w 2/.1 C 2/j satisfies the conditions of Case 1. The total contribution of Case 1 with a fixed integer A > 0 (i.e., 2n21 n22 D ˙A) in (6.265) is equal to 2
A
log.L=K/ 2 log A p C O.1/ 2 .z/ C ‰ 2 .z/ C log.1 C 2/
p C O A2 2 .z/ C ‰ 2 .z/ ;
(6.274)
where z D 2 A=2. Note that (6.274) is a consequence of (6.272) and (6.273); also, the error term comes from a convergent geometric series [due to the exponential nature of (6.271) and the effect of the factor .a.n/ C z=b.n// in (6.272)]. For a fixed integer A > 0, the contribution of Case p 1 represents thepoverwhelming p majority in (6.265): the rest of the j s with n1 C n2 2 D .v C w 2/.1 C 2/j make a total contribution p A2 O.log A/ 2 .z/ C ‰ 2 .z/ D A2 O.log A/O. A/I
(6.275)
this is a corollary of Lemma 6.10. Following the proof of Proposition 2.20 (or Proposition 3.2), we split the big sum (6.265) into two parts depending on a threshold M D .log.L=K//c (where the value of the constant c > 1 in the exponent will be specified soon): X 1
X
D
2
n2ZZ Wn¤0 j2n21 n22 jM
X
C
2
n2ZZ Wn¤0 j2n21 n22 jM
1 2 .2n1 n22 /2
1 .2n21 n22 /2
Z
Z
!2
b.n/
cos.u/ sin.z.n/=u/ d u
C
a.n/
!2
b.n/
sin.u/ sin.z.n/=u/ d u
;
(6.276)
a.n/
and X 2
D
X 2
n2ZZ Wn¤0 j2n21 n22 j>M
1 .2n21 n22 /2
Z
!2
b.n/
cos.u/ sin.z.n/=u/ d u a.n/
C
X
C
2
n2ZZ Wn¤0 j2n21 n22 j>M
1 2 .2n1 n22 /2
Z
!2
b.n/
sin.u/ sin.z.n/=u/ d u
;
(6.277)
a.n/
p p p where b.n/ D 2.n1 2 C n2 /L, a.n/ D .n1 2 C n2 /K= 2, z.n/ D 2 .2n21 n22 /=2. By Lemma 6.5, (6.274), and (6.275), X 2
! X R˙ .A/ p DO .log.L=K/ C O.log A// A : A2 A>M
Using the upper bound with the divisor function 0 R˙ .A/ .A/ D Ao.1/ , by (6.278) we have X
D log.L=K/ O
2
X
(6.278) P d jA
1D
! A3=2Co.1/
D log.L=K/ O.M 1=3 / D O.1/;
A>M
(6.279) if M D .log.L=K//c with c D 3. Returning to (6.276), by (6.274) and (6.275), X
X
D
1
2
n2ZZ Wn¤0 j2n21 n22 jM
X
C
2
n2ZZ Wn¤0 j2n21 n22 jM
D
.2n21
log.L=K/ C O.log j2n21 n22 j/ X 1 .n/C p 2 2 0 n2 / log.1 C 2/
1=2 O.1/ 2 . 2 .2n21 n22 /=2/ C ‰ 2 . 2 .2n21 n22 /=2/ D 2 2 2 .2n1 n2 /
X 4 log.L=K/ X R˙ .n/ 2 2 . n=2/ C ‰ 2 . 2 n=2/ C ; p 2 3 log.1 C 2/ 1nM n (6.280)
where X 0
.n/ D 2 . 2 .2n21 n22 /=2/ C ‰ 2 . 2 .2n21 n22 /=2/
and X 3
D
X R˙ .n/ O.log n/ 2 . 2 n=2/ C ‰ 2 . 2 n=2/ : 2 n 1nM
(6.281)
6 More on Randomness
By (6.97), X 3
D O.1/:
(6.282)
Again by (6.97), X R˙ .n/ 2 . 2 n=2/ C ‰ 2 . 2 n=2/ D 2 n n>M X
DO
! 3=2Co.1/
n
D O.M 1=3 / D O .log.L=K//1 ;
(6.283)
n>M
since M D .log.L=K//3 . By (6.279)–(6.283), 1 4 log.L=K/ X R˙ .n/ 2 2 . n=2/ C ‰ 2 . 2 n=2/ C O.1/: p 2 1 2 log.1 C 2/ nD1 n (6.284) By (6.265), (6.276), (6.277), and (6.284),
X
C
Z 1Z 0
1
X
D
2 j.HK;L . / C v/ \ ZZ2 j Area.HK;L . // d v D 2 . / log.L=K/CO.1/;
0
(6.285)
where 2 . / D
4 log.1 C
p
1 X R˙ .n/ 2 2 . n=2/ C ‰ 2 . 2 n=2/ : 2 2/ nD1 n
(6.286)
Since $P_0$ and $[0,1)^2$ are equivalent modulo one [see (6.213) and (6.214)], Lemma 6.4 follows from (6.285) and (6.286). This completes the proof of Theorem 5.4. □
6.8 Starting the Proof of Theorem 5.6 The proof is based on Lemma 6.3 and a general form of the law of the iterated logarithm (LIL) in probability theory (see Feller’s theorem below). We apply Lemma 6.3 for every integer j 20 with the following choice of parameters. Let 1 D 1 .j / D 3 2j C 1; 2 D 2 .j / D 3 2j C1 I
(6.287)
moreover, let i D i.j / denote the integer satisfying the inequality 2i j 3 C j 2 < 2i C1 ;
(6.288)
and define k D kj and ` D `j such that kj C 3`j D 3 2i and `j D b22i=3 c:
(6.289)
So `j is in the range of j 2 ; formally, `j j 2 ; furthermore, kj j 3 and kj is divisible by 3. Finally, let d D dj D 1 .j / 1 `j 3 2i `j :
(6.290)
Combining (6.287)–(6.290), we have 2 .j / dj `j D 3 2j C1 3 2j D 3 2j D 2j i .kj C 3`j / D bj .kj C 3`j /; i:e:; the choice b D bj D 2j i satisfies (6.78):
(6.291)
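The parameter choices (6.287)–(6.291) are easy to tabulate. The following Python sketch computes $i(j)$, $\ell_j$, $k_j$, $d_j$, and $b_j$ for a given $j$ and checks the identity in (6.291). It assumes that (6.288) reads $2^i \le j^3 + j^2 < 2^{i+1}$ and that (6.290) amounts to $d_j = 3\cdot 2^j - \ell_j$, the form used in (6.291). The names `icbrt`, `parameters`, `first`, and `last` are illustrative only and do not come from the text.

```python
def icbrt(n: int) -> int:
    """Integer cube root: the largest x with x**3 <= n (for n >= 0)."""
    x = round(n ** (1.0 / 3.0))
    # Correct the floating-point guess in both directions.
    while x ** 3 > n:
        x -= 1
    while (x + 1) ** 3 <= n:
        x += 1
    return x


def parameters(j: int) -> dict:
    """Tabulate the quantities of (6.287)-(6.291) for an integer j >= 20."""
    first = 3 * 2 ** j + 1                    # first parameter in (6.287)
    last = 3 * 2 ** (j + 1)                   # second parameter in (6.287)
    i = (j ** 3 + j ** 2).bit_length() - 1    # 2^i <= j^3 + j^2 < 2^(i+1), (6.288)
    ell = icbrt(4 ** i)                       # floor(2^(2i/3)), (6.289)
    k = 3 * 2 ** i - 3 * ell                  # from k + 3*ell = 3 * 2^i, (6.289)
    d = first - 1 - ell                       # (6.290), as read above
    b = 2 ** (j - i)                          # (6.291)
    return {"first": first, "last": last, "i": i,
            "ell": ell, "k": k, "d": d, "b": b}


if __name__ == "__main__":
    for j in (20, 30, 40):
        p = parameters(j)
        # The identity of (6.291): last - d - ell = b * (k + 3 * ell) = 3 * 2^j.
        assert p["last"] - p["d"] - p["ell"] == p["b"] * (p["k"] + 3 * p["ell"])
        print(j, p)
```

In this tabulation $\ell_j$ grows like $j^2$, $k_j$ like $j^3$, and $b_j$ like $2^j/j^3$, in line with the ranges quoted in the text.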
Note that bj is in the range of 2j =j 3 : bj 2j =j 3 . By Lemma 6.3, for every j 20 there exist two sequences of Rademacher like functions such that the first sequence 'j;1; 'j;2 ; : : : ; 'j;bj has type rj;0 D dj < rj;1 < rj;2 < : : : < rj;bj where $ rj;h D
% p log.1 C 2/ .dj C .3h 1/`j C hkj / for 1 h bj ; log 2
(6.292)
the second sequence ' j;1 .x/; ' j;2 .x/; : : : ; ' j;bj .x/ has type r j;0 < r j;1 < r j;2 < : : : < r j;bj where $
p % log.1 C 2/ 1 r j;h D dj C .3h C 1/`j C .h C /kj for 0 h bj ; log 2 3 (6.293) and the usual extensions ˆj;h , ˆj;h , 1 h bj , defined in (6.62) and (6.74) have the following approximation property:
f .1 .j /; dj C `j C h.kj C 3`j /I v/ D
h X
ˆj;h .v/ C ˆj;h .v/
sD1
for all integers 1 h bj and for all v 2 P0 .r bj I a/
(6.294)
with the possible exception of at most bj ".I kj ; `j /2r bj integers a in 0 a < 2r bj , where ".I k; `/ is defined in (6.83). Note that (6.294) follows from (6.84) and (6.85). The special case h D bj in (6.294) is particularly useful: it gives f .1 .j /; dj C `j C bj .kj C 3`j /I v/ D f .1 .j /; 2 .j /I v/:
(6.295)
Next we apply Lemma 6.7: for every j 20 we have ˇ0 ˇ 11=2 ˇ ˇ bj ˇ
X p 1=2 ˇˇ ˇ@ 2 A ˆj;h .v/ C ˆj;h .v/ . /bj .kj C 3`j / log.1 C 2/ C O.1/ ˇ Var ˇ ˇ ˇ ˇ ˇ hD1
104 .1 C 2 / C bj ".I kj ; `j /104 .1 C 2 /.2 .j / 1 .j / C 1/C q C
bj ".I kj ; `j /104 .1 C 2 /.2 .j / 1 .j / C 1/;
(6.296)
where ".I k; `/ is defined in (6.83). Furthermore, ˇ
1=2 ˇˇ p ˇ ˇ Varv2P ˆj;h .v/ 1=2 2 . /3`j log.1 C 2/ C O.1/ ˇ 0 ˇ ˇ 104 .1 C 2 / C ".I kj ; `j /104 .1 C 2 /3`j C
q
".I kj ; `j /104 .1 C 2 /3`j ; (6.297)
and, finally, ˇ 0 1ˇ ˇ ˇ bj X ˇ ˇ ˇEv2P @f .1 .j /; 2 .j /I v/ ˇ A ˆ .v/ C ˆ .v/ j;h j;h 0 ˇ ˇ ˇ ˇ hD1 ".I kj ; `j /104 .1 C 2 /bj2 .kj C 3`j /:
(6.298)
We claim
$$r_{j+1,0} \ge r_{j,b_j}. \qquad (6.299)$$
Indeed, in view of (6.292) it suffices to check the inequality dj C .3bj 1/`j C bj kj dj C1 :
(6.300)
We can derive (6.300) from (6.287)–(6.291) as follows: dj C .3bj 1/`j C bj kj D .dj C `j / C bj .kj C 3`j / 2`j D 2 .j / 2`j D 3 2j C1 2`j ;
and dj C1 D 3 2j C1 `j C1 ; so it remains to show that `j C1 2`j , and it is trivial from the definition of `j [see (6.288) and (6.289)] and the fact .j C 1/3 C .j C 1/2 < 2 for j 4: j3 C j2 For every j 20 and 1 hj bj let Xm D Xm .v/ D ˆj;hj .v/; v 2 P0 where m D
X
b C hj :
(6.301)
20 <j
Inequality (6.299) implies that the infinite sequence X1 ; X2 ; X3 ; : : : represents independent random variables:
(6.302)
Next we prove the following analog of (6.299): r j C1;0 r j;bj :
(6.303)
Indeed, in view of (6.293) it suffices to check the inequality 1 1 dj C .3bj C 1/`j C .bj C /kj dj C1 C `j C1 C kj C1 : 3 3
(6.304)
We can derive (6.304) from (6.287)–(6.291) as follows: 1 1 1 1 dj C.3bj C1/`j C.bj C /kj D .dj C`j /Cbj .kj C3`j /C kj D 2 .j /C kj D 32j C1 C kj ; 3 3 3 3
and 1 1 dj C1 C `j C1 C kj C1 D 3 2j C1 C kj C1 ; 3 3
so it remains to show that kj C1 kj , and it is trivial from the definition of kj [see (6.289)] kj D 3 2i.j / b22i.j /=3c and the fact i.j C 1/ i.j /: For every j 20 and 1 hj bj let X
Ym D Ym .v/ D ˆj;hj .v/; v 2 P0 where m D
b C hj :
(6.305)
20 <j
Inequality (6.303) implies that the infinite sequence Y1 ; Y2 ; Y3 ; : : : represents independent random variables:
(6.306)
We use the elementary fact that given arbitrary constants $c_1 > 1$ and $c_2 < \infty$, the inequality
$$c_1^{\,j^2} > 2^{c_2 j}$$
(6.307)
holds for every sufficiently large integer j . Combining this elementary fact with (6.83) and the definitions of kj ; `j ; bj (see (6.287)–(6.291), in particular, `j j 2 , kj j 3 , and bj 2j =j 3 ), we obtain via routine calculations bj ".I kj ; `j / D O
1 j2
:
P Since j 1 1=j 2 is convergent, the Borel–Cantelli lemma and (6.294) and (6.295) imply the following [we use the notation of (6.301) and (6.305)]: for almost every ˇ 2 Œ0; 1/, with v.ˇ/ D .ˇ; 0/ we have that the sum
m X
.Xn .v.ˇ// C Yn .v.ˇ/// D
nD1
b X X ' ;h .ˇ/ C ' ;h .ˇ/ C 20 <j hD1
C
hj X 'j;s .ˇ/ C ' j;s .ˇ/ differs from sD1
X
f .1 . /; 2 . /I v.ˇ// C f .1 .j /; 3 2j C 3hj 2i.j / I v.ˇ// by not more
20 <j
than an absolute constant C . I ˇ/ < 1:
(6.308)
We emphasize that, for almost every ˇ 2 Œ0; 1/, the same C . I ˇ/ holds in (6.308) simultaneously for all m 1.
The next step is to apply a deep theorem from probability theory: it is Feller’s general form of the LIL (see Feller [Fe3]). Feller’s Theorem. Let Z1 ; Z2 ; Z3 ; : : : be an infinite sequence of independent random variables such that the upper bounds (Var stands for variance) max jZm j ƒm .Var.Z1 C Z2 C : : : C Zm //1=2 hold for some sequence ƒm , m 1. Let m , m 1, denote an increasing sequence 1 1 2 3 : : :. Assume that ƒm D O 3 m :
(6.309)
If the series 1 X
VarZm 2 m e m =2 is divergent Var.Z C Z C : : : C Z / 1 2 m mD1
(6.310a)
then with probability one m X
.Zs EZs / > m .Var.Z1 C Z2 C : : : C Zm //1=2
sD1
hold for infinitely many integers m; on the other hand, if the series 1 X
VarZm 2 m e m =2 is convergent Var.Z1 C Z2 C : : : C Zm / mD1
(6.310b)
then with probability one m X
.Zs EZs / m .Var.Z1 C Z2 C : : : C Zm //1=2
sD1
hold for all sufficiently large integers m. The same statements hold for the negative side m X
.Zs EZs / < m .Var.Z1 C Z2 C : : : C Zm //1=2 :
sD1
The deduction of Theorem 5.6 from Feller’s theorem is similar to how we derived Theorem 5.4 from the Berry–Esseen theorem in Sect. 6.5. A novelty is that we heavily use the fact that log log x is a very slowly changing function (see, e.g., Lemma 6.11), and also the different choice of the parameters in Lemma 6.3 leads to different estimations.
The key step in the proof is to apply Feller’s theorem to the infinite sequence X1 ; X2 ; X3 ; : : : defined in (6.301): Xm D Xm .v/ D ˆj;hj .v/; v 2 P0 where m D
X
b C hj :
20 <j
This means we have to estimate the variance Var.X1 C X2 C : : : C Xm / D
m X
Var.Xn /:
nD1
Formula (6.291) implies that bj .kj C 3`j / D 3 2j , so by (6.296) for every 20,
Var
b X
ˆ ;h .v/ C ˆ ;h .v/
!1=2
p 1=2 D 2 . /3 2 log.1 C 2/ C O .1/;
hD1
(6.311) where we used the definitions (6.287)–(6.291) and the definition of ".I kj ; `j / [see (6.83)]:
2 p p p ".I kj ; `j / D 108 .1C 2 /4kj 1 C .1 C 2/`j C1 .1C 2/`j C1 C400.1C 2/`j =2 C
kj p p p kj C1 2 3 C10 .1 C /12`j 1 C .1 C 2/ .1 C 2/ 3 C1 C 400 .1 C 2/kj =6 ; 8
2
and applied the elementary fact (6.307). Repeating the same argument for (6.297), we have p 1=2 1=2 2 Varv2P0 ˆ ;h .v/ D . /3` log.1 C 2/ C O .1/:
(6.312)
We recall (6.130): ˇ !1=2 ˇˇ !1=2 ˇ b b X X ˇ ˇ ˇ ˇ Var Var.ˆ ;h / ˆ ;h C ˆ ;h ˇ ˇ ˇ ˇ hD1 hD1 .2b 1/ b Var.ˆ ;1 / 1=2 C pb Pb Var hD1 ˆ ;h C ˆ ;h
q Var.ˆ ;1 /:
(6.313)
Combining (6.311), (6.312), and (6.313), and using the fact that b and ` are, respectively, in the range of 2 = 3 and 2 , we have
b X
!1=2 Var.ˆ ;h /
p 1=2 D 2 . /3 2 log.1 C 2/ C
hD1
p
CO .1/ C O b ` 2 =2 C O b ` D p 1=2 p D 2 . /3 2 log.1 C 2/ C O 2 =2 = :
(6.314)
Combining (6.314) with the equality A2 B 2 D .A B/.A C B/, we have ˇb ˇ
ˇX p ˇˇ ˇ 2
Var.ˆ ;h / . /3 2 log.1 C 2/ˇ ˇ ˇ ˇ hD1
p p O 2 =2 = O 2 =2 D O 2 = :
(6.315)
By P using (6.315) we are ready now to estimate the variance: with m D 20 <j b C hj we have
Var.X1 C X2 C : : : C Xm / D
b X X 20 <j hD1
D
X
Varˆ ;h C
hj X
Varˆj;s D
sD1
X p p p 2 . /3 2 log.1 C 2/ C 2 . /hj 3 2i.j / log.1 C 2/ C O 2 = D
20 <j
20 j
p
p D 2 . /3 2j 220 C hj 2i.j / log.1 C 2/ C O 2j = j ;
(6.316)
where i.j / is defined in (6.288). By (6.315) and (6.316), p p
1 2 . /3 2j log.1 C 2/ C O 2j = j VarXm bj p D p ; Var.X1 C X2 C : : : C Xm / 2 . /3 2j 220 C hj 2i.j / log.1 C 2/ C O 2j = j
which implies
p 1 p
1 1 VarXm C O 1= j 1 C O 1= j : 2 bj Var.X1 C X2 C : : : C Xm / bj (6.317) Now we recall the statement of Theorem 5.6: let q 5 be an arbitrarily large but fixed integer, and write
1=2 " .n/ D 2 log2 n C 3 log3 n C 2 log4 n C : : : C 2 logq1 n C 2.1 C "/ logq n ; (6.318) where log2 n D log log n means the iterated logarithm (and not the base 2 logarithm), and in general logk n D log.logk1 n/ denotes the k times iterated logarithm of n. Note that with this choice of " .n/, the sum 1 X " .n/ nD1
n
e "
2
.n/=2
X n
1 n log n log2 n log3 n logq1 n.logq n/1C"
is convergent or divergent depending on whether we have " > 0 or " 0. Let > 0 be fixed, then Theorem 5.6 states that for almost every 0 ˇ < 1, p p F . 2I ˇI I e n / > p n C 0 .n/. / n 2
(6.319)
hold for infinitely many n’s [i.e., if we choose " D 0 in (6.318)]. Exactly the same holds for the “negative side” p p F . 2I ˇI I e n / < p n 0 .n/. / n: 2 On the other hand, choosing " > 0 in (6.318), for almost every 0 ˇ < 1 we have the opposite inequality p p F . 2I ˇI I e n / p n C " .n/. / n 2
(6.320)
for all sufficiently large values of n, and the same holds for the “negative side” p p F . 2I ˇI I e n / p n " .n/. / n: 2 To prove (6.319), we recall (6.161) and (6.162) (with e n instead of e N ): for every vector v.ˇ/ D .ˇ; 0/, 0 ˇ < 1 and > 0, X 0i I0 . In/
p fi .v.ˇ// F . 2I ˇI I e n /
X
fi .v.ˇ// C 104 .1 C 2 /;
0i I0 . In/
(6.321) where p n C log.2 2= / I0 D I0 . I n/ D p 2; log.1 C 2/
(6.322)
Let j denote the integer satisfying 3 2j I0 . I n/ < 3 2j C1; and write I0 . I n/ D 32j C3hj 2i.j / C% with 0 hj < bj and 0 % < 32i.j /;
(6.323)
where % is the (negligible) “remainder.” Write 1 D 1 .20/ D 3 220 C 1 and 2 D 3 2j C 3hj 2i.j /: Then X
(6.324)
fi .v.ˇ// D f .1 ; 2 I v.ˇ// C f .0; 1 1I v.ˇ// C f .2 C 1; I0 I v.ˇ//;
0i I0 . In/
and so ˇ ˇ ˇ X ˇ ˇ ˇ ˇ fi .v.ˇ// f .1 ; 2 I v.ˇ//ˇˇ O j 3 D O .log n/3 : ˇ ˇ0i I0 . In/ ˇ
(6.325)
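The decomposition (6.323) is a plain division with remainder, which a few lines of Python make concrete. This is a minimal sketch, assuming that (6.288) reads $2^i \le j^3 + j^2 < 2^{i+1}$ and restricting to $j \ge 20$ as in the text; the names `i_of` and `decompose` are illustrative only.

```python
def i_of(j: int) -> int:
    """The integer i = i(j) with 2^i <= j^3 + j^2 < 2^(i+1), as in (6.288)."""
    return (j ** 3 + j ** 2).bit_length() - 1


def decompose(I0: int) -> tuple:
    """Split I0 as 3*2^j + 3*h*2^i(j) + rho, following (6.323).

    Returns (j, h, rho) with 0 <= h < 2^(j - i(j)) and 0 <= rho < 3*2^i(j).
    """
    if I0 < 3 * 2 ** 20:
        raise ValueError("only j >= 20 is used, i.e. I0 >= 3 * 2^20")
    j = (I0 // 3).bit_length() - 1       # 3*2^j <= I0 < 3*2^(j+1)
    block = 3 * 2 ** i_of(j)             # one block has length k_j + 3*l_j
    h, rho = divmod(I0 - 3 * 2 ** j, block)
    return j, h, rho


if __name__ == "__main__":
    print(decompose(3 * 2 ** 20 + 5 * 3 * 2 ** 13 + 7))
```

The remainder satisfies $0 \le \varrho < 3\cdot 2^{i(j)} \le 3(j^3 + j^2)$, which is the $O(j^3)$ scale appearing in (6.325).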
Combining (6.308), (6.321), and (6.325), we have that, for almost every ˇ 2 Œ0; 1/ and every integer n 2, hj b X X X p F . 2I ˇI I e n / D ' ;h .ˇ/ C ' ;h .ˇ/ C 'j;s .ˇ/ C ' j;s .ˇ/ C 20 <j hD1
sD1
C O;ˇ .1/ C O .log n/3 : Involving the expectations in (6.326), we obtain p F . 2I ˇI I e n / p n D 2 D
C
b X X
' ;hI0 .ˇ/ C
hj X
20 <j hD1
sD1
b X X
hj X
20 <j hD1
' ;hI0 .ˇ/ C
sD1
'j;sI0 .ˇ/C
' j;sI0 .ˇ/C
(6.326)
0 CEˇ2Œ0;1/ @
b X X 20 <j hD1
1 hj X ' ;h.ˇ/ C ' ;h .ˇ/ C 'j;s .ˇ/ C ' j;s .ˇ/ A p nC 2 sD1
C O;ˇ .1/ C O .log n/3 ;
(6.327)
' ;hI0 .ˇ/ D ' ;h.ˇ/ Eˇ2Œ0;1/ ' ;h .ˇ/
(6.328)
where
means that we subtracted the expectation, and the same for ' ;hI0 .ˇ/ (the extra 0 in the index refers to zero expectation). We recall that by definition the two families ' ;h.ˇ/; 0 ˇ < 1 and ˆ ;h .v/; v 2 P0 have the same distribution, and similarly ' ;h .ˇ/; 0 ˇ < 1 and ˆ ;h .v/; v 2 P0 have the same distribution. This is why we can replace the sum in (6.327) b X X
' ;hI0 .ˇ/ C
20 <j hD1
hj X
'j;sI0 .ˇ/
sD1
with .X1 EX1 /C.X2 EX2 /C: : :C.Xm EXm / D
b X X
ˆ ;hI0 .v/ C
20 <j hD1
hj X
ˆj;sI0 .v/;
sD1
and similarly, we can replace the other sum in (6.327) b X X
' ;hI0 .ˇ/ C
20 <j hD1
hj X
' j;sI0 .ˇ/
sD1
with .Y1 EY1 /C.Y2 EY2 /C: : : C.Ym EYm / D
b X X 20 <j hD1
ˆ ;hI0 .v/ C
hj X
ˆj;sI0 .v/;
sD1
where again ˆ ;hI0 .v/ and ˆ ;hI0 .v/ mean, as usual, that we subtracted the expectations (the extra 0 in the index refers to zero expectation).
6.9 Completing the Proof of Theorem 5.6 As we already said after formulating Feller’s theorem, the key step in the proof of Theorem 5.6 is to apply Feller’s theorem to the sequence X1 ; X2 ; X3 ; : : :. We choose 1=2 m D 2 log2 m C 3 log3 m C 2 log4 m C : : : C 2 logq m C 2 logqC1 m ; (6.329) that is, in (6.318) we replace q with q C 1 and write " D 0. Then the sum 1 X m 2m =2 X 1 e D1 m m log m log m log m logq m logqC1 m 2 3 m mD1 (6.330) is divergent. This implies requirement (6.310a). Indeed, we have
mD
X 20 <j
b C hj
X
b
20 j
and bj j 3 2j , which imply the existence of absolute constants 0 < C1 < C2 such that C1 m=bj C2 . Combining this fact with (6.317) and (6.330), we obtain that requirement (6.310a) is satisfied with the “deviation factor” m in (6.329). Moreover, we clearly have max jXm j D O j 3 D O .log m/3 ; where we use the fact that m is in the range of j 3 2j . It follows from (6.316) that .Var.X C : : : C Xm //1=2 is in the range of 2j=2 , or equivalently in the range p 1 C X2 3=2 of m.log m/ , so
max jXm j D O .log m/3 D O ƒm .Var.X1 C X2 C : : : C Xm //1=2 holds with the choice ƒm D .log m/2 m1=2 . It follows that requirement (6.309) ƒm D O 3 m is trivially satisfied with the “deviation factor” m in (6.329). Since both requirements (6.309) and (6.310a) are satisfied, we can apply Feller’s theorem for X1 ; X2 ; X3 ; : : :, and obtain that with probability one (i.e., for almost every v 2 P0 ) m X
.Xs EXs / > m .Var.X1 C X2 C : : : C Xm //1=2
sD1
hold for infinitely many integers m. We extend the discrete sequence (6.329) to the continuous function
(6.331)
1=2 .x/ D 2 log2 x C 3 log3 x C 2 log4 x C : : : C 2 logq x C 2 logqC1 x ; (6.332) so .m/ D m for (sufficiently large) positive integers m (note that logqC1 x, q 5 is well defined only for relatively large real numbers). For later application (motivated by the fact that bj j 3 2j ) we need to estimate the difference x.log x/4 .x/: The following simple lemma is based on the fact that log log x is a very slowly increasing function. Lemma 6.11. For q 5 we have log x x.log x/4 .x/ < 8.q C 1/ 2 : log x Moreover, we have logqC1 x ; .x/ > 0 .x/ C p 2 log2 x where 0 .x/ is defined in (6.318) by choosing " D 0. Proof. Note that .y/ is monotone increasing. By using the mean value theorem in calculus, we have .2y/ .y/ y max 0 .u/ D yu2y
D y max
yu2y
1
1
1
1 2.log u/ C 3.log2 u/ .log u/ C 2.log3 u/1 .log2 u/1 .log u/1 C : : :
1=2 u 2 2 log2 u C 3 log3 u C 2 log4 u C : : : C 2 logq u C 2 logqC1 u
qC1 : log y
(6.333)
Let p denote the integer satisfying 2p1 < .log x/4 2p . Using a standard powerof-two decomposition, and applying (6.333), we have X qC1 log x x.log x/4 .x/ < 8.q C 1/ 2 ; .2s x/ .2s1 x/ p log x log x 1sp proving the first statement of the lemma.
To prove the second statement, we use the fact A2 B 2 D .A B/.A C B/: .x/ 0 .x/ D
2 logqC1 x .x/ C 0 .x/
2 logqC1 x logqC1 x p > p ; 3 log2 x 2 log2 x t u
which completes the proof of Lemma 6.11. Next we estimate the “expectation part” of (6.327): 0
1 hj b X X X Eˇ2Œ0;1/ @ ' ;h .ˇ/ C ' ;h .ˇ/ C 'j;s .ˇ/ C ' j;s .ˇ/ A p n D 2 20 <j hD1 sD1 0
1 hj b X X X D Ev2P0 @ ˆ ;h .v/ C ˆ ;h .ˇ/ C ˆj;s .v/ C ˆj;s .v/ A p n: 2 20 <j hD1 sD1 (6.334) By using (6.298) and (6.324), we have ˇ ˇ 0 1 ˇ ˇ hj b X X X ˇ ˇ ˇEv2P @ A ˆ ;h .v/ C ˆ ;h .ˇ/ C ˆj;s .v/ C ˆj;s .v/ f .1 ; 2 I v/ˇˇ 0 ˇ ˇ ˇ 20 <j hD1 sD1
ˇ !ˇ b X ˇˇ X ˇˇ ˆ ;h .v/ C ˆ ;h .v/ ˇ C ˇEv2P0 f .1 . /; 2 . /I v/ ˇ ˇ 20 <j
hD1
ˇ 0 1ˇ ˇ ˇ hj X ˇ ˇ ˇ @ A ˆj;s .v/ C ˆj;s .v/ ˇˇ C ˇEv2P0 f .1 .j /; 2 I v/ ˇ ˇ sD1
X
". I k ; ` /104 .1 C 2 /b 2 .k C 3` / D O .1/;
(6.335)
20 j
where the last step is the usual routine estimation combining the key inequality (6.307) with the definition of ".I k; `/ in (6.83) and the definitions of kj ; `j ; bj (see (6.287)–(6.291), in particular, `j j 2 , kj j 3 , bj j 3 2j ). Moreover, by (6.5) and (6.324), p Ev2P0 f .1 ; 2 I v/ D p .2 1 C 1/ log.1 C 2/ D 2 p D p 3 2j C hj 2i.j / 220 log.1 C 2/: (6.336) 2
Combining (6.335) and (6.336), we have 1 hj b X X X Ev2P0 @ ˆ ;h .v/ C ˆ ;h .ˇ/ C ˆj;s .v/ C ˆj;s .v/ A D 0
20 <j hD1
sD1
p D p 3 2j C hj 2i.j / 220 log.1 C 2/ C O .1/: 2
(6.337)
By (6.322) and (6.323), p p 3 2j C hj 2i.j / 220 log.1 C 2/ D .I0 %/ log.1 C 2/ D n C O .log n/3 ; (6.338) and combining (6.338) with (6.334) and (6.337), we have 1 hj b X X X ' ;h .ˇ/ C ' ;h .ˇ/ C 'j;s .ˇ/ C ' j;s .ˇ/ A p n D Eˇ2Œ0;1/ @ 2 20 <j hD1 sD1 0
0 D Ev2P0 @
b X X
20 <j hD1
1 hj X ˆ ;h .v/ C ˆ ;h .ˇ/ C ˆj;s .v/ C ˆj;s .v/ A p n D 2 sD1
D O .log n/3 :
(6.339)
By (6.301), mD
X
b C hj ;
(6.340)
20 <j
and by (6.287)–(6.291) and (6.338), X
X
b 3 2i. / C hj 3 2i.j / D 3
20 <j
2 C hj 3 2i.j / D
20 <j
D 3 2j C hj 2i. / 220 D
n log.1 C
p
2/
C O .log n/3 :
(6.341)
By (6.288), 2i.j / j 3 C j 2 < 2i.j /C1; and combining this fact with (6.340) and (6.341), we have m.log m/4 > n for all sufficiently large n:
(6.342)
By Lemma 6.11 and (6.342), m D .m.log m/4 / .m.log m/4 / .m/ .n/ 8.q C 1/
log2 m log n .n/ 10.q C 1/ 2 log m log n
logqC1 n logqC1 n log n 0 .n/ C p 10.q C 1/ 2 0 .n/ C p : log n 2 log2 n 3 log2 n
(6.343)
By using (6.316), (6.338), and (6.343), we can estimate the right-hand side of (6.331) as follows: m .Var.X1 C X2 C : : : C Xm //1=2 D p
1=2 p D m 2 . /3 2j 220 C hj 2i.j / log.1 C 2/ C O 2j = j D p p
p p
p p D m . / n C O n= j D m . / n C O n= log n p
p logqC1 n p p 0 .n/. / n C p . / n C O n log log n= log n : 3 log2 n
(6.344)
Combining (6.331) and (6.344) we obtain that for almost every v 2 P0 , b X X 20 <j hD1
ˆ ;hI0 .v/ C
hj X
ˆj;sI0 .v/
sD1
p
logqC1 n p p 0 .n/. / n C p . / n C O n log log n= log n 3 log2 n
(6.345)
hold for infinitely many positive integers n, where the connection between the pair .j; hj / and the integer n is described by (6.291)–(6.293). Next we estimate .Y1 EY1 / C.Y2 EY2 / C : : : C.Ym EYm / D
b X X
ˆ ;hI0 .v/ C
20 <j hD1
hj X
ˆj;sI0 .v/
sD1
and show that its contribution is negligible. We apply Feller’s theorem to the sequence Y1 ; Y2 ; Y3 ; : : : with the choice p p m D 2 log2 m D 2 log log m:
(6.346)
(In this step we do not have to be very careful; this explains the “generous” choice in (6.346).) Then the sum p 1 X m 2m =2 X log log m <1 e m m.log m/2 m mD1
(6.347)
is convergent. This implies requirement (6.310b). Indeed, as we explained after (6.330), there are absolute constants 0 < C1 < C2 such that C1 m=bj C2 . Moreover, by (6.312), VarYm D Var.Y1 C Y2 C : : : C Ym / p 2 . /3`j log.1 C 2/
D D .1 C o.1// p P 2 . /3 log.1 C 2/ b ` C h `
j j 20 <j D .1 C o.1// P 20 <j
`j : b ` C hj `j
(6.348)
Since `j j 2 , bj j 3 2j , and 0 hj bj , (6.348) implies the existence of two absolute constants 0 < C 0 < C 00 such that C 00 C0 VarYm : bj Var.Y1 C Y2 C : : : C Ym / bj
(6.349)
Combining (6.349) with C1 m=bj C2 and (6.347), we obtain that requirement (6.310b) is satisfied with the “deviation factor” m defined in (6.346). Moreover, we clearly have max jYm j max jXm j, so repeating the argument between (6.330) and (6.331), we conclude that, with the choice of ƒm D .log m/2 m1=2 and the “deviation factor” m in (6.346), requirement (6.309) ƒm D O 3 m is trivially satisfied. Since both requirements (6.309) and (6.310b) are satisfied, we can apply Feller’s theorem for Y1 ; Y2 ; Y3 ; : : :, and obtain that with probability one (i.e., for almost every v 2 P0 ) m X
p .Ys EYs / 2 log2 m .Var.Y1 C Y2 C : : : C Ym //1=2
sD1
hold for all sufficiently large integers m.
(6.350)
Again by (6.312), 0 1 X p Var.Y1 CY2 C: : :CYm / D .1Co.1// 2 . /3 log.1C 2/ @ b ` C hj `j A D 20 <j
D O .2j =j / D O .n= log n/;
(6.351)
where we used (6.322), (6.323) and `j j 2 , bj j 3 2j , 0 hj bj . Combining (6.350) and (6.351), we obtain that for almost every v 2 P0 , b X X
ˆ ;hI0 .v/ C
20 <j hD1
hj X
ˆj;sI0 .v/ D .Y1 EY1 /C.Y2 EY2 /C: : :C.Ym EYm /
sD1
p
p
p 2 log2 mO n= log n D O n log log n= log n
(6.352)
hold for all sufficiently large integers n, where the connection between the pair .j; hj / and the integer n is described by (6.291)–(6.293). Combining (6.345) and (6.352), we have that for almost every v 2 P0 , b X X
ˆ ;hI0 .v/ C
20 <j hD1
C
b X X
hj X
ˆj;sI0 .v/C
sD1
ˆ ;hI0 .v/ C
20 <j hD1
hj X
ˆj;sI0 .v/
sD1
p
logqC1 n p p 0 .n/. / n C p . / n C O n log log n= log n 3 log2 n
(6.353)
hold for infinitely many positive integers n. Since they have the same distribution, (6.353) implies that for almost every 0 ˇ < 1, b X X 20 <j hD1
hj X ' ;h.ˇ/ C ' ;h .ˇ/ C 'j;s .ˇ/ C ' j;s .ˇ/
sD1
p
p p logqC1 n 0 .n/. / n C . / n p C O n log log n= log n 3 log2 n
(6.354)
hold for infinitely many positive integers n, where the connection between the pair .j; hj / and the integer n is described by (6.291)–(6.293).
Combining (6.327) and (6.354), for almost every 0 ˇ < 1, p F . 2I ˇI I e n / p n 2 p
p p logqC1 n C O n log log n= log n C 0 .n/. / n C . / n p 3 log2 n 1 hj b X X X ' ;h .ˇ/ C ' ;h .ˇ/ C 'j;s .ˇ/ C ' j;s .ˇ/ A C CEˇ2Œ0;1/ @ 0
20 <j hD1
sD1
C O;ˇ .1/ C O .log n/3
(6.355)
hold for infinitely many positive integers n. Finally, using (6.339) in (6.355), we have p F . 2I ˇI I e n / p n 2 p p logqC1 n 0 .n/. / n C . / n p C 3 log2 n CO
p n log log n= log n CO .log n/3 CO;ˇ .1/CO .log n/3 ;
(6.356)
which implies that for almost every 0 ˇ < 1, p p F . 2I ˇI I e n / > p n C 0 .n/. / n 2 hold for infinitely many positive integers n. Indeed, it follows from the fact that the term p logqC1 n . / n p 3 log2 n is larger than the dominant term O
p
n log log n= log n
in the last line of (6.356) if n is large enough. This proves (6.319), so the proof of the first half of Theorem 5.6 is complete. Next we prove the other half (6.320). Again the key step is to apply Feller’s theorem to the sequence X1 ; X2 ; X3 ; : : :. This time we choose
m D "=2 .m/ D
1=2 "
D 2 log2 m C 3 log3 m C 2 log4 m C : : : C 2 logq1 m C 2 1 C logq m ; 2 (6.357) that is, in (6.318) we replace " > 0 with "=2 > 0. Then the sum 1 X m 2m =2 X 1 e <1 m m log m log2 m log3 m logq1 m.logq m/1C."=2/ m mD1 (6.358) is convergent. This implies requirement (6.310b). Indeed, we just repeat the argument between (6.330) and (6.331), which includes that requirement (6.309) is also satisfied. Since both requirements (6.309) and (6.310b) are satisfied, we can apply Feller’s theorem for X1 ; X2 ; X3 ; : : :, and obtain that with probability one (i.e., for almost every v 2 P0 ) m X
.Xs EXs / "=2 .m/ .Var.X1 C X2 C : : : C Xm //1=2
sD1
"=2 .n/ .Var.X1 C X2 C : : : C Xm //1=2
(6.359)
hold for all sufficiently large integers m. By using (6.316) and (6.338), we have .Var.X1 C X2 C : : : C Xm //1=2 D p
1=2 p D 2 . /3 2j 220 C hj 2i.j / log.1 C 2/ C O 2j = j D p p
p p
p p D . / n C O n= j D . / n C O n= log n :
(6.360)
Combining (6.359) and (6.360) we obtain that for almost every v 2 P0 , b X X 20 <j hD1
ˆ ;hI0 .v/ C
hj X
ˆj;sI0 .v/
sD1
p
p "=2 .n/. / n C O n log log = log n
(6.361)
hold for all sufficiently large integers n, where the connection between the pair .j; hj / and the integer n is described by (6.291)–(6.293).
Again the next step is to show that the contribution of b X X
.Y1 EY1 / C .Y2 EY2 / C : : : C .Ym EYm / D
ˆ ;hI0 .v/ C
20 <j hD1
hj X
ˆj;sI0 .v/
sD1
is negligible. Again we apply Feller’s theorem to the sequence Y1 ; Y2 ; Y3 ; : : : with the choice in (6.346) p p m D 2 log2 m D 2 log log m; and repeating the arguments of (6.346)–(6.352), we obtain that for almost every v 2 P0 , b X X
ˆ ;hI0 .v/ C
20 <j hD1
hj X
ˆj;sI0 .v/ D .Y1 EY1 /C.Y2 EY2 /C: : :C.Ym EYm / D
sD1
D O
p n log log n= log n
(6.362)
hold for all sufficiently large integers n, where the connection between the pair .j; hj / and the integer n is described by (6.291)–(6.293). Combining (6.361) and (6.362), we have that for almost every v 2 P0 , b X X
ˆ ;hI0 .v/ C
20 <j hD1
C
b X X 20 <j hD1
hj X
ˆj;sI0 .v/C
sD1
ˆ ;hI0 .v/ C
hj X
ˆj;sI0 .v/
sD1
p
p "=2 .n/. / n C O n log log n= log n
(6.363)
hold for all sufficiently large integers n. Since they have the same distribution, (6.363) implies that for almost every 0 ˇ < 1, b X X 20 <j hD1
j X ' ;h.ˇ/ C ' ;h .ˇ/ C 'j;s .ˇ/ C ' j;s .ˇ/
h
sD1
p p "=2 .n/. / n C O n log log n= log n
(6.364)
hold for all sufficiently large integers n, where the connection between the pair .j; hj / and the integer n is described by (6.291)–(6.293). Combining (6.327) and (6.364), for almost every 0 ˇ < 1, p F . 2I ˇI I e n / p n 2
p p "=2 .n/. / n C O n log log n= log n C 0
1 hj b X X X CEˇ2Œ0;1/ @ ' ;h .ˇ/ C ' ;h .ˇ/ C 'j;s .ˇ/ C ' j;s .ˇ/ A C 20 <j hD1
sD1
C O;ˇ .1/ C O .log n/3
(6.365)
hold for all sufficiently large integers n. Finally, using (6.339) in (6.365), we have p p F . 2I ˇI I e n / p n "=2 .n/. / nC 2 p
CO n log log n= log n CO .log n/3 CO;ˇ .1/CO .log n/3 ;
(6.366)
which implies that for almost every 0 ˇ < 1, p p F . 2I ˇI I e n / p n C " .n/. / n 2
(6.367)
hold for all sufficiently large integers n. Indeed,
."=2/ logq n p p " .n/ "=2 .n/ . / n > p . / n; 4 log2 n
(6.368)
and the right-hand side of (6.368) is larger than the dominant term O
p
n log log n= log n
in the last line of (6.366) if n is large enough. Since (6.367) is exactly (6.320), the proof of Theorem 5.6 is complete. □
6.10 More Results in a Nutshell

There are many more results that can be proved by the method of Chap. VI. As a first illustration, consider the following modification of Theorem 5.6. In Theorem 5.6 we choose a fixed $\gamma > 0$ and study the asymptotic behavior of $F(\sqrt{2};\beta;\gamma;e^n)$ as $n \to \infty$ for almost every $\beta$. This raises the following natural question: Is there an analog of Theorem 5.6 which works simultaneously for all $\gamma > 0$? The answer is yes; for simplicity we just formulate such a result in the Khinchin form (LIL stands for the law of the iterated logarithm).

Simultaneous version of the LIL: Let $\varepsilon > 0$ be an arbitrarily small but fixed constant. Then for almost every $\beta$,
$$\frac{\gamma}{\sqrt{2}}\,n - \sigma(\gamma)\sqrt{n}\,\sqrt{(2+\varepsilon)\log\log n} < F(\sqrt{2};\beta;\gamma;e^n) < \frac{\gamma}{\sqrt{2}}\,n + \sigma(\gamma)\sqrt{n}\,\sqrt{(2+\varepsilon)\log\log n}$$
(6.369)
hold for all $\gamma > 0$ and for all sufficiently large n, i.e., for all $n > n_0(\beta,\gamma)$. Note that (6.369) is sharp in the sense that $2+\varepsilon$ cannot be replaced by $2-\varepsilon$. One can even upgrade the Simultaneous LIL from “all $\gamma > 0$” to “all intervals $[\gamma_1,\gamma_2]$.”

Let's compare Theorems 5.4 and 5.6. The former is a “global” result describing the case where $N = e^n$ is fixed and $\beta$ runs in the unit interval $0 < \beta < 1$; the latter describes the “individual” behavior of almost every $\beta$ as $N = e^n \to \infty$. One may think that, in the long run as $N \to \infty$, we should have an “individual” CLT as follows: for almost every $0 < \beta < 1$,
$$\frac{1}{M}\left|\left\{1 \le n \le M :\ F(\sqrt{2};\beta;\gamma;e^n) - \gamma n/\sqrt{2} > \lambda\,\sigma(\gamma)\sqrt{n}\right\}\right| \to \frac{1}{\sqrt{2\pi}}\int_{\lambda}^{\infty} e^{-u^2/2}\,du. \qquad (6.370)$$
But (6.370) is false. We just give a vague, intuitive explanation of why it fails (this vague argument can be easily turned into a precise proof). In view of our basic probabilistic intuition (5.38), it suffices to study the analogous question for the symmetric random walk. Let $X_1, X_2, X_3, \ldots$ be an infinite sequence of independent random variables with $\Pr[X_i = 1] = \Pr[X_i = -1] = 1/2$ for all $i \ge 1$, and write $S_n = X_1 + X_2 + \ldots + X_n$. If $S_k \ge 0$ then we say that, after the kth step, the random walk is “on the positive side” (zero is included). We need the following well-known combinatorial/probabilistic result.
Lemma 6.12. The probability that the symmetric random walk spends exactly 2k of the first 2n steps on the positive side equals
$$\binom{2k}{k}\binom{2n-2k}{n-k}\,2^{-2n}.$$
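For small n the formula of Lemma 6.12 can be checked by exhaustive enumeration. The sketch below is only an illustration and rests on one assumption that should be stated loudly: it uses the standard convention that the kth step counts as “positive” when $S_k > 0$, or when $S_k = 0$ and $S_{k-1} > 0$ (so a return to zero is credited to the side the walk comes from). With this convention the time on the positive side is always even. The function names are illustrative. The last function evaluates the cumulative probability from the lemma's formula so that it can be compared with $\frac{2}{\pi}\arcsin\sqrt{x}$.

```python
from fractions import Fraction
from itertools import product
from math import asin, comb, pi, sqrt


def brute_force_distribution(n: int) -> list:
    """Exact law of the time spent on the positive side in the first 2n steps.

    Entry k is Pr[time on the positive side == 2k], obtained by enumerating
    all 2^(2n) paths (assumed convention: step k is positive if S_k > 0,
    or if S_k == 0 and S_{k-1} > 0).
    """
    counts = [0] * (n + 1)
    for steps in product((1, -1), repeat=2 * n):
        position, time_positive = 0, 0
        for step in steps:
            previous, position = position, position + step
            if position > 0 or (position == 0 and previous > 0):
                time_positive += 1
        counts[time_positive // 2] += 1
    return [Fraction(c, 4 ** n) for c in counts]


def lemma_distribution(n: int) -> list:
    """The probabilities C(2k,k) * C(2n-2k,n-k) / 2^(2n) for k = 0, ..., n."""
    return [Fraction(comb(2 * k, k) * comb(2 * n - 2 * k, n - k), 4 ** n)
            for k in range(n + 1)]


def cdf_positive_time(n: int, x: float) -> float:
    """Pr[time on the positive side <= 2nx], summed from the lemma's formula."""
    law = lemma_distribution(n)
    return float(sum(law[k] for k in range(n + 1) if k <= n * x))


if __name__ == "__main__":
    for n in range(1, 6):
        assert brute_force_distribution(n) == lemma_distribution(n)
    x = 0.25
    print(cdf_positive_time(200, x), 2 / pi * asin(sqrt(x)))
```

The enumeration is exponential in n and is meant only as a sanity check for small cases; the closed formula is what makes larger n accessible.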
We postpone the proof to the end of the section. From Lemma 6.12 we can easily derive the so-called Arc Sine Law, a well-known “paradox” of the random walk, which is the reason why (6.370) fails. Let $T_{2n}^{+}$ denote the total time of the random walk $S_1, S_2, S_3, \ldots, S_{2n}$ (of the first 2n steps) on the positive side. Then by Lemma 6.12, for any $0 < x < 1$,
$$\Pr[T_{2n}^{+} \le 2nx] = \sum_{k \le nx} \binom{2k}{k}\binom{2n-2k}{n-k}\,2^{-2n}.$$
By using Stirling's formula $n! = (n/e)^n \sqrt{2\pi n}\,(1+o(1))$,
$$\Pr[T_{2n}^{+} \le 2nx] = (1+o(1)) \sum_{k \le nx} \frac{1}{\pi\sqrt{k(n-k)}} = (1+o(1))\,\frac{1}{\pi}\int_0^{nx} \frac{dt}{\sqrt{t(n-t)}} =$$
$$= (1+o(1))\,\frac{1}{\pi}\int_0^{x} \frac{du}{\sqrt{u(1-u)}} = (1+o(1))\,\frac{2}{\pi}\int_0^{\sqrt{x}} \frac{dy}{\sqrt{1-y^2}} = (1+o(1))\,\frac{2}{\pi}\arcsin\sqrt{x}, \qquad (6.371)$$
where we applied two substitutions: first $u = t/n$ and then $y = \sqrt{u}$ (of course arcsin means the “inverse sine,” and $o(1) \to 0$ as $n \to \infty$). Equation (6.371) tells us that $T_{2n}^{+}/2n$ has a distribution function which is approximately $\frac{2}{\pi}\arcsin\sqrt{x}$ when n is sufficiently large. It is often called the “Arc Sine Law of visiting the positive side.” This Arc Sine Law is rather surprising; with some exaggeration one may even call it a “mathematical paradox.” One may think that, in a long run of 2n steps, the random walk typically spends close to one-half of the time on the positive side and one-half of the time on the negative side (here “close” means a possible discrepancy tending to zero as n tends to infinity). But the truth is quite different. By (6.371), the probability that a random walk of 2n steps spends 2nx steps or fewer on the positive side is approximately $\frac{2}{\pi}\arcsin\sqrt{x}$. Since $u = 1/2$ is in fact the minimum(!) of the density function $\frac{1}{\pi}(u(1-u))^{-1/2}$ in (6.371), we see that, with relatively large probability, the proportion of the time spent on the positive side is near to 0 or 1, but not near to 1/2. We can say, therefore, that in a long run of tosses of a fair coin,
there is a good chance that one face (either Heads or Tails) will dominate (meaning that it will lead the other for a disproportionate amount of time). The Arc Sine Law (more or less) explains why (6.370) is false. But we can restore common sense in the form of an individual CLT, if we switch from the ordinary asymptotic density
$$\mathrm{density}(A) = \lim_{M\to\infty} \frac{1}{M} \sum_{n \in A:\ n \le M} 1 \qquad (6.372)$$
to the so-called
$$\mathrm{logarithmic\ density}(A) = \lim_{M\to\infty} \frac{1}{\log M} \sum_{n \in A:\ n \le M} \frac{1}{n}, \qquad (6.373)$$
where $A \subseteq \mathbb{N}$ is an arbitrary (usually infinite) subset of the natural numbers. (Of course, the limits in (6.372) and (6.373) do not necessarily exist for an arbitrary A.) The name “logarithmic density” comes from the well-known fact that
$$\sum_{n=1}^{M} \frac{1}{n} = \log M + O(1) = \log M + \gamma_0 + O(1/M)$$
(here $\gamma_0 = 0.5772\ldots$ is Euler's constant, but its value is irrelevant). In the following result CLT stands for the central limit theorem.

Individual CLT for the logarithmic density: For any real numbers $\gamma > 0$ and $\lambda$, the set
$$\left\{ n \in \mathbb{N} :\ F(\sqrt{2};\beta;\gamma;e^n) - \gamma n/\sqrt{2} > \lambda\,\sigma(\gamma)\sqrt{n} \right\}$$
has logarithmic density equal to
$$\frac{1}{\sqrt{2\pi}} \int_{\lambda}^{\infty} e^{-u^2/2}\,du$$
for almost every $0 < \beta < 1$. Our two results so far (Simultaneous LIL and Individual CLT) are stated, for the sake of simplicity, in the special case $\alpha = \sqrt{2}$. Needless to say, both can be extended to every quadratic irrational $\alpha$. The next natural question is: What happens for almost every $\alpha$? Let $F(\alpha;\beta;c;N)$ denote the number of integral solutions $1 \le n \le N$ of the inhomogeneous diophantine inequality
$$\|n\alpha - \beta\| < \frac{c}{n}.$$
(6.374)
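The counting function attached to (6.374) can be evaluated directly for small $N$. The following is a minimal sketch, not the author's method: the names `dist_to_nearest_int` and `count_solutions` and the sample parameters are our own, and floating-point arithmetic limits it to moderate $N$. The last printed column, $2c\log N$, is the centering term that appears in the CLT (where $N = e^{n}$ gives $2cn$).

```python
import math

def dist_to_nearest_int(x: float) -> float:
    """||x||: distance from x to the nearest integer."""
    return abs(x - round(x))

def count_solutions(alpha: float, beta: float, c: float, N: int) -> int:
    """Number of integers 1 <= n <= N with ||n*alpha - beta|| < c/n."""
    return sum(
        1
        for n in range(1, N + 1)
        if dist_to_nearest_int(n * alpha - beta) < c / n
    )

if __name__ == "__main__":
    # Illustrative parameters only (assumed, not taken from the text).
    alpha, beta, c = math.sqrt(2), 0.3, 1.0
    for N in (10, 100, 1000, 10000, 100000):
        print(N, count_solutions(alpha, beta, c, N), round(2 * c * math.log(N), 2))
```

Since $\|x\| \le 1/2$ always, every $n < 2c$ is automatically a solution; this gives a quick sanity check on the implementation.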
(Note that this notation was already used in Sect. 5.9, see (5.280).) It is easy to see that $F(\alpha;\beta;c;N)$ is basically the same as the number of integral solutions $(m,n) \in \mathbb{Z}^{2}$ of the quadratic inequality
$$-\gamma < (x+\beta)^{2} - \alpha^{2}y^{2} < \gamma \tag{6.375}$$
with $\gamma = 2c\alpha$ and $1 \le n \le N$. Note that choosing $\alpha = \sqrt{2}$ in Eq. (6.375), we get back the inhomogeneous Pell inequality $-\gamma < (x+\beta)^{2} - 2y^{2} < \gamma$.

CLT for almost every $\alpha$: Given a positive real number $c > 0$, for almost every $\alpha$
$$\max_{\lambda} \left| \operatorname{meas}\left\{ \beta \in [0,1) :\ F(\alpha;\beta;c;e^{n}) - 2cn \ge \lambda\,\sigma_0(c)\sqrt{n\log n} \right\} - N(\lambda) \right| \to 0 \tag{6.376}$$
as $n \to \infty$, where
$$N(\lambda) = \frac{1}{\sqrt{2\pi}} \int_{\lambda}^{\infty} e^{-u^{2}/2}\,du$$
is the tail probability of the normal distribution; the maximum at the beginning of (6.376) is taken over all $-\infty < \lambda < \infty$; $\sigma_0(c) > 0$ denotes a constant depending only on $c$.

Similarly to the proofs of Theorems 5.4 and 5.6, the proof of the “CLT for almost every $\alpha$” is also based on an approximation with a sum of independent random variables:
$$F(\alpha;\beta;c;e^{n}) - 2cn = X_1 + X_2 + X_3 + \ldots + X_n + U_n, \tag{6.377}$$
where $X_1, X_2, X_3, \ldots$ are independent random variables, and $U_n$ is “negligible” (they all have zero expectation). Combining the “variance lemma” Lemma 6.4 and Kusmin’s theorem (see (4.102) and (4.103)), it is not too difficult to see that the distribution of the dominating part $X_1 + X_2 + X_3 + \ldots + X_n$ in (6.377) is similar to the distribution of the simpler sum $Z_1 + Z_2 + Z_3 + \ldots + Z_n$ of independent random variables with zero expectations:
$$X_1 + X_2 + X_3 + \ldots + X_n \approx Z_1 + Z_2 + Z_3 + \ldots + Z_n, \tag{6.378}$$
where, for simplicity, we assume that $n$ is a power of 2, $n = 2^{k}$, and that
$2^{k-1}$ of the random variables $Z_i$ in (6.378) have the distribution $\Pr[Z_i = 1] = \Pr[Z_i = -1] = 1/2$;
$2^{k-2}$ of the random variables $Z_i$ in (6.378) have the distribution $\Pr[Z_i = 2] = 1/4$ and $\Pr[Z_i = -2/3] = 3/4$;
$2^{k-3}$ of the random variables $Z_i$ in (6.378) have the distribution $\Pr[Z_i = 4] = 1/8$ and $\Pr[Z_i = -4/7] = 7/8$;
$2^{k-4}$ of the random variables $Z_i$ in (6.378) have the distribution $\Pr[Z_i = 8] = 1/16$ and $\Pr[Z_i = -8/15] = 15/16$;
and so on; finally assume that two of the random variables $Z_i$ in (6.378) have the distribution $\Pr[Z_i = 2^{k-2}] = 2^{-k+1}$ and $\Pr[Z_i = -2^{k-2}/(2^{k-1}-1)] = (2^{k-1}-1)/2^{k-1}$; and the last two of the random variables $Z_i$ in (6.378) have the distribution $\Pr[Z_i = 2^{k-1}] = 2^{-k}$ and $\Pr[Z_i = -2^{k-1}/(2^{k}-1)] = (2^{k}-1)/2^{k}$.

Indeed, $Z_1, Z_2, \ldots, Z_n$ reflects the distribution in (4.102) in the following sense:
$$\sum_{i=2^{k-1}}^{2^{k}-1} \frac{\log\frac{(i+1)^{2}}{i(i+2)}}{\log 2} \approx \frac{1}{\log 2} \sum_{i=2^{k-1}}^{2^{k}-1} \frac{1}{i^{2}} \approx \frac{1}{\log 2}\, 2^{-k}.$$
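The family $Z_1, \ldots, Z_n$ described above is easy to tabulate. The sketch below (function names `z_family` and `kuzmin_mass` are ours) builds the groups with exact rational arithmetic, so one can confirm that there are $2^{k}$ variables in total and that each has zero expectation. It also evaluates the left-hand sum of the last display. We assume, as the display suggests, that the Gauss–Kusmin weight of the value $i$ is $\log\frac{(i+1)^2}{i(i+2)}/\log 2$.

```python
from fractions import Fraction
import math

def z_family(k: int):
    """Groups (count, pos_value, pos_prob, neg_value, neg_prob) for j = 1..k.

    Group j has 2^(k-j) variables, except that the text assigns two
    variables (not one) to the last group j = k, so the total is n = 2^k.
    """
    family = []
    for j in range(1, k + 1):
        count = 2 ** (k - j) if j < k else 2
        p = Fraction(1, 2 ** j)
        pos = Fraction(2 ** (j - 1))
        neg = -Fraction(2 ** (j - 1), 2 ** j - 1)
        family.append((count, pos, p, neg, 1 - p))
    return family

def kuzmin_mass(k: int) -> float:
    """Sum over 2^(k-1) <= i < 2^k of log2((i+1)^2 / (i(i+2)))."""
    return sum(
        math.log1p(1.0 / (i * (i + 2))) / math.log(2)
        for i in range(2 ** (k - 1), 2 ** k)
    )

if __name__ == "__main__":
    k = 10
    fam = z_family(k)
    print("number of variables:", sum(c for c, *_ in fam))
    print("all means zero:", all(v * p + w * q == 0 for _, v, p, w, q in fam))
    print("Kuzmin mass * log 2 * 2^k:", kuzmin_mass(k) * math.log(2) * 2 ** k)
```

The last printed quantity should be close to 1, which is the content of the approximation $\approx \frac{1}{\log 2} 2^{-k}$.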
Note that
$$\sum_{(k/2)+C \le j \le k} 2^{k-j}\,2^{-j} = \sum_{(k/2)+C \le j \le k} 2^{k-2j} < 2^{-2C+1} \tag{6.379}$$
is very small if $C$ is a large constant. Inequality (6.379) implies that in (6.378) we can “truncate the tail” in the sense that it suffices to restrict (6.378) to the cases $1 \le j < (k/2)+C$ with the property that $2^{k-j}$ of the random variables $Z_i$ in (6.378) have the distribution $\Pr[Z_i = 2^{j-1}] = 2^{-j}$ and $\Pr[Z_i = -2^{j-1}/(2^{j}-1)] = (2^{j}-1)/2^{j}$. Indeed, by (6.379) the probability of the event “$Z_i \ge 2^{(k/2)+C}$ occurs for at least one index $1 \le i \le n = 2^{k}$” is less than $2^{-2C+1}$, which is a negligible probability if $C$ is a large constant. Let $\sum_{i}^{*} Z_i$ denote the truncated version of the sum $\sum_{i=1}^{n} Z_i$ in (6.378). The good news is that we can apply the Berry–Esseen form of the CLT [see (6.128)] for the truncated sum $\sum_{i}^{*} Z_i$. Indeed,
$$V = \sum_{i}^{*} \mathrm{E}Z_i^{2} \approx \sum_{1 \le j < (k/2)+C} 2^{k-j}\,2^{2(j-1)}\,2^{-j} = 2^{k-2} \sum_{1 \le j < (k/2)+C} 1 = ((k/2)+C)\,2^{k-2}, \tag{6.380}$$
which explains the order of the standard deviation $\sqrt{V} \approx \sqrt{n\log n}$.
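The tail bound (6.379) and the variance estimate (6.380) can be checked with exact arithmetic. In the sketch below (names `tail_mass` and `truncated_variance` are ours) the variance is computed from both atoms of each $Z_i$, whereas (6.380) keeps only the dominant contribution $2^{2(j-1)}2^{-j}$ of the positive atom. The exact value is therefore slightly larger than $2^{k-2}$ times the number of groups, by a bounded multiple of $2^{k-2}$.

```python
from fractions import Fraction

def tail_mass(k: int, C: int) -> Fraction:
    """Left side of (6.379): sum of 2^(k-j) * 2^(-j) over (k/2)+C <= j <= k."""
    return sum(
        (Fraction(2 ** (k - j), 2 ** j) for j in range(1, k + 1) if 2 * j >= k + 2 * C),
        Fraction(0),
    )

def truncated_variance(k: int, C: int) -> Fraction:
    """Exact sum of E[Z_i^2] over the groups 1 <= j < (k/2)+C."""
    total = Fraction(0)
    for j in range(1, k + 1):
        if 2 * j >= k + 2 * C:
            break
        p = Fraction(1, 2 ** j)
        pos = Fraction(2 ** (j - 1))
        neg = Fraction(2 ** (j - 1), 2 ** j - 1)   # absolute value of the negative atom
        total += 2 ** (k - j) * (pos ** 2 * p + neg ** 2 * (1 - p))
    return total

if __name__ == "__main__":
    k, C = 20, 3
    groups = sum(1 for j in range(1, k + 1) if 2 * j < k + 2 * C)
    print("tail mass:", float(tail_mass(k, C)), "bound:", 2.0 ** (-2 * C + 1))
    print("V / 2^(k-2):", float(truncated_variance(k, C) / 2 ** (k - 2)), "groups:", groups)
```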
Furthermore,
$$W = \sum_{i}^{*} \mathrm{E}|Z_i|^{3} < \sum_{1 \le j < (k/2)+C} 2^{k-j}\,2^{3j}\,2^{-j} = 2^{k} \sum_{1 \le j < (k/2)+C} 2^{j} < 2^{(3k/2)+C+1}. \tag{6.381}$$
Using (6.380) and (6.381), the error term in (6.128) can be estimated from above as
$$\frac{40W}{V^{3/2}} = O\left(k^{-3/2}\right) = O\left((\log n)^{-3/2}\right). \tag{6.382}$$
The error term $O\left((\log n)^{-3/2}\right)$ in (6.382) is much weaker than the roughly $n^{-1/4}$ error term in Theorem 5.4. Nevertheless, the weak error term (6.382) is still sufficient to prove the “CLT for almost every $\alpha$” formulated above. This completes our outline of the proof.

Note that we can also prove a LIL for almost every $\alpha$. Compared to Theorem 5.6, a substantial difference is that the order of the maximum fluctuation gains an extra factor of $\sqrt{\log n}$: it jumps up from $\sqrt{n\log\log n}$ to $\sqrt{n\log n\log\log n}$. Another novelty is the use of martingales. We will give the full details of the proofs of these results somewhere else. Next we give more details of the proof of the “Individual CLT,” and finally we conclude Sect. 6.10 with a proof of Lemma 6.12.
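To see how the Berry–Esseen ratio $40W/V^{3/2}$ of (6.382) behaves for concrete $k$, one can compute $V$ and $W$ exactly for the truncated family and convert to floating point only at the end. This is a sketch under our own naming (`moments`, `berry_esseen_ratio`); the constant 40 is the one quoted in the text from (6.128). The output illustrates that the bound is an asymptotic statement: the ratio decreases like $k^{-3/2}$ but need not be small for moderate $k$.

```python
from fractions import Fraction

def moments(k: int, C: int):
    """Exact (V, W): sums of E[Z^2] and E|Z|^3 over groups 1 <= j < (k/2)+C."""
    V = Fraction(0)
    W = Fraction(0)
    for j in range(1, k + 1):
        if 2 * j >= k + 2 * C:
            break
        count = 2 ** (k - j)
        p = Fraction(1, 2 ** j)
        pos = Fraction(2 ** (j - 1))
        neg = Fraction(2 ** (j - 1), 2 ** j - 1)   # absolute value of the negative atom
        V += count * (pos ** 2 * p + neg ** 2 * (1 - p))
        W += count * (pos ** 3 * p + neg ** 3 * (1 - p))
    return V, W

def berry_esseen_ratio(k: int, C: int) -> float:
    """The quantity 40 * W / V^(3/2) from (6.382)."""
    V, W = moments(k, C)
    return 40.0 * float(W) / float(V) ** 1.5

if __name__ == "__main__":
    C = 3
    for k in (20, 40, 80, 160):
        print(k, berry_esseen_ratio(k, C), berry_esseen_ratio(k, C) * k ** 1.5)
```

The last column, the ratio multiplied by $k^{3/2}$, should stay bounded as $k$ grows.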
6.10.1 Combining the Logarithmic Density with the Central Limit Theorem

Similarly to the proofs of Theorems 5.4 and 5.6, the proof of the “Individual CLT” is also based on the approximation of the function $F(\sqrt{2};\beta;\gamma;e^{n}) - \gamma n/\sqrt{2}$ with a sum of independent random variables
$$F(\sqrt{2};\beta;\gamma;e^{n}) - \gamma n/\sqrt{2} = X_1 + X_2 + X_3 + \ldots + U_n, \tag{6.383}$$
where $X_1, X_2, X_3, \ldots$ are independent random variables, and $U_n$ is “negligible.” In view of (6.383) it is natural to study the following purely probabilistic problem. Let $X_1, X_2, X_3, \ldots$ be independent random variables, and consider the sums ($n \ge 1$)
$$S_n = X_1 + X_2 + X_3 + \ldots + X_n. \tag{6.384}$$
Assume that the expectation $\mathrm{E}X_i = 0$ for every $i \ge 1$, write $\sigma_i^{2} = \mathrm{E}X_i^{2}$ and $V_n = \sigma_1^{2} + \sigma_2^{2} + \ldots + \sigma_n^{2}$; that is, $V_n$ is the variance of the sum $S_n$. The following result is basically due to Erdős and Hunt [Er-Hu].
Lemma 6.13 (Vague form). With probability one, for any real number $\lambda$,
$$\lim_{M\to\infty} \frac{1}{\log M} \sum_{n \le M:\ S_n/\sqrt{V_n} > \lambda} \frac{1}{n} = \frac{1}{\sqrt{2\pi}} \int_{\lambda}^{\infty} e^{-u^{2}/2}\,du$$
holds under mild conditions.

Remark. An old result of Erdős and Hunt [Er-Hu] proves the special case $\lambda = 0$ of the lemma for the random walk (i.e., where each $X_i$ is $\pm 1$ with probability 1/2). The argument below is a straightforward generalization of the Erdős–Hunt proof.

Outline of the proof of Lemma 6.13. Let $A$ be an arbitrary but fixed rational number. Let
$$\xi_n = \xi_{A,n} = \begin{cases} 1, & \text{if } S_n > A\sqrt{V_n}; \\ 0, & \text{otherwise}, \end{cases} \tag{6.385}$$
and write
$$N(A) = \frac{1}{\sqrt{2\pi}} \int_{A}^{\infty} e^{-u^{2}/2}\,du.$$
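For the simple $\pm 1$ random walk (so $V_n = n$) the objects just defined are easy to compute. The sketch below uses our own names `normal_tail` and `log_average_frequency`. It evaluates $N(A)$ through the complementary error function and computes, for one simulated path, the harmonic-weighted frequency of the event $S_n > A\sqrt{n}$ up to $M$. Because the normalization is only $\log M$, the convergence asserted by Lemma 6.13 is very slow; a single run with feasible $M$ may differ noticeably from $N(A)$, so the output is an illustration and not a numerical confirmation.

```python
import math
import random

def normal_tail(A: float) -> float:
    """N(A) = (1/sqrt(2*pi)) * integral from A to infinity of exp(-u^2/2) du."""
    return 0.5 * math.erfc(A / math.sqrt(2.0))

def log_average_frequency(A: float, M: int, seed: int = 0) -> float:
    """Harmonic-weighted frequency of {S_n > A*sqrt(n)} for one +-1 walk, n <= M."""
    rng = random.Random(seed)
    s = 0
    weighted_hits = 0.0
    harmonic = 0.0
    for n in range(1, M + 1):
        s += 1 if rng.random() < 0.5 else -1
        harmonic += 1.0 / n
        if s > A * math.sqrt(n):      # V_n = n for +-1 steps
            weighted_hits += 1.0 / n
    return weighted_hits / harmonic

if __name__ == "__main__":
    A = 0.5
    print("N(A) =", normal_tail(A))
    for seed in range(3):
        print("seed", seed, "->", log_average_frequency(A, 10 ** 6, seed))
```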
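By the CLT (which holds under mild conditions), $\mathrm{E}\xi_n = N(A) + O(n^{-\delta})$ with some positive absolute constant $\delta > 0$. If $m > n$ then we compute the covariance:
$$\begin{aligned} \operatorname{cov}(\xi_m,\xi_n) &= \mathrm{E}\big((\xi_m - \mathrm{E}\xi_m)(\xi_n - \mathrm{E}\xi_n)\big) \\ &= \sum_{k > A\sqrt{V_n}} \Pr[S_n = k]\,\Pr[S_m > A\sqrt{V_m} \mid S_n = k] - N^{2}(A) \\ &= \sum_{k > A\sqrt{V_n}} \Pr[S_n = k]\,\Pr[S_m - S_n > A\sqrt{V_m} - k] - N^{2}(A). \end{aligned} \tag{6.386}$$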
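If we make the stronger assumption $m > 2n$, then by the CLT,
$$\begin{aligned} \operatorname{cov}(\xi_m,\xi_n) \approx{} & \frac{1}{\sqrt{2\pi}} \int_{A}^{\infty} e^{-u^{2}/2} \left( \frac{1}{\sqrt{2\pi}} \int_{\frac{A\sqrt{V_m} - u\sqrt{V_n}}{\sqrt{V_m - V_n}}}^{\infty} e^{-t^{2}/2}\,dt \right) du \\ & - \frac{1}{\sqrt{2\pi}} \int_{A}^{\infty} e^{-u^{2}/2} \left( \frac{1}{\sqrt{2\pi}} \int_{A}^{\infty} e^{-t^{2}/2}\,dt \right) du, \end{aligned}$$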
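where, of course, the last line is just $N^{2}(A)$ in disguise. Thus we have
$$\operatorname{cov}(\xi_m,\xi_n) = O\left(\sqrt{\frac{V_n}{V_m - V_n}}\right) = O\left(\sqrt{\frac{n}{m-n}}\right). \tag{6.387}$$
If $n < m \le 2n$ then we simply use the trivial upper bound
$$|\operatorname{cov}(\xi_m,\xi_n)| \le 1. \tag{6.388}$$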
Notice that Lemma 6.13 is equivalent to the following strong law of large numbers: for every rational number $A$, the random variables $\xi_n = \xi_{A,n}$ defined in (6.385) have the property
$$\frac{1}{\sum_{n=1}^{M} 1/n} \sum_{n=1}^{M} \frac{\xi_n - \mathrm{E}\xi_n}{n} \to 0 \quad \text{with probability one.} \tag{6.389}$$
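(Indeed, the set of rationals is countable, and a countable union of zero measure sets is still a zero measure set.) To prove (6.389), we compute the corresponding second moment:
$$\begin{aligned} \mathrm{E}\left[\left(\sum_{n=1}^{M} \frac{\xi_n - \mathrm{E}\xi_n}{n}\right)^{2}\right] &= \sum_{n=1}^{M} \frac{\mathrm{E}(\xi_n - \mathrm{E}\xi_n)^{2}}{n^{2}} + 2 \sum_{1 \le n < m \le M} \frac{\operatorname{cov}(\xi_m,\xi_n)}{mn} \\ &= O(1) + 2 \sum_{1 \le n < m \le 2n \le M} \frac{\operatorname{cov}(\xi_m,\xi_n)}{mn} + 2 \sum_{1 \le 2n < m \le M} \frac{\operatorname{cov}(\xi_m,\xi_n)}{mn}. \end{aligned} \tag{6.390}$$
First we use (6.388):
$$\sum_{1 \le n < m \le 2n \le M} \frac{\operatorname{cov}(\xi_m,\xi_n)}{mn} = O\left(\sum_{n=1}^{M} \frac{n}{n^{2}}\right) = O\left(\sum_{n=1}^{M} \frac{1}{n}\right) = O(\log M). \tag{6.391}$$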
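Next we use (6.387):
$$\sum_{1 \le 2n < m \le M} \frac{\operatorname{cov}(\xi_m,\xi_n)}{mn} = O\left(\sum_{1 \le 2n < m \le M} \frac{1}{mn} \sqrt{\frac{n}{m-n}}\right) = O\left(\sum_{1 < m \le M} \frac{1}{m} \sum_{1 \le n < m/2} \frac{1}{n} \sqrt{\frac{n}{m-n}}\right). \tag{6.392}$$
We have
$$\sum_{1 \le n < m/2} \frac{1}{n} \sqrt{\frac{n}{m-n}} = \sum_{k \ge 1}\ \sum_{m2^{-k-1} \le n < m2^{-k}} \frac{1}{n} \sqrt{\frac{n}{m-n}} = \sum_{k \ge 1} O\left(\frac{m}{2^{k+1}} \cdot \frac{2^{k+1}}{m} \cdot 2^{-k/2}\right) = \sum_{k \ge 1} O\left(2^{-k/2}\right) = O(1).$$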
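The $O(1)$ bound for the inner sum can also be seen numerically. The sketch below (the function name `inner_sum` is ours) evaluates $\sum_{1 \le n < m/2} \frac{1}{n}\sqrt{n/(m-n)} = \sum_{1 \le n < m/2} (n(m-n))^{-1/2}$ for several $m$. As a side remark not taken from the text, comparing the sum with the integral $\int_0^{1/2} (x(1-x))^{-1/2}\,dx = \pi/2$ of a decreasing function shows that the sum never exceeds $\pi/2$.

```python
import math

def inner_sum(m: int) -> float:
    """Sum over 1 <= n < m/2 of (1/n) * sqrt(n / (m - n))."""
    upper = (m + 1) // 2          # smallest integer >= m/2, excluded by range()
    return sum(math.sqrt(n / (m - n)) / n for n in range(1, upper))

if __name__ == "__main__":
    for m in (10, 100, 1000, 10 ** 4, 10 ** 5):
        print(m, inner_sum(m), "<= pi/2 =", math.pi / 2)
```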
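Using this in (6.392), we have
$$\sum_{1 \le 2n < m \le M} \frac{\operatorname{cov}(\xi_m,\xi_n)}{mn} = O\left(\sum_{1 \le m \le M} \frac{1}{m}\right) = O(\log M). \tag{6.393}$$
Combining (6.390), (6.391), and (6.393), we have
$$\text{Variance} = \mathrm{E}\left[\left(\sum_{n=1}^{M} \frac{\xi_n - \mathrm{E}\xi_n}{n}\right)^{2}\right] = O(\log M). \tag{6.394}$$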
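By Chebyshev’s inequality and (6.394), for any $\varepsilon > 0$ we have the upper bound
$$\Pr\left[\frac{1}{\log M}|L_M| > \varepsilon\right] = \frac{O(1)}{\varepsilon^{2}\log M},$$
where
$$L_M = \sum_{n=1}^{M} \frac{\xi_n - \mathrm{E}\xi_n}{n}.$$
By choosing $M = M_k = e^{k^{2}}$, we have
$$\sum_{k=1}^{\infty} \frac{1}{\log M_k} = \sum_{k=1}^{\infty} \frac{1}{k^{2}} = O(1). \tag{6.395}$$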
So by (6.395), for every $\varepsilon > 0$,
$$\sum_{k=1}^{\infty} \Pr\left[\frac{1}{\log M_k}|L_{M_k}| > \varepsilon\right] < \infty.$$
By the (trivial) Borel–Cantelli lemma, for every $\varepsilon > 0$, with probability one,
$$\frac{1}{\log M_k}|L_{M_k}| < \varepsilon \quad \text{for all sufficiently large } k. \tag{6.396}$$
If $M_k = e^{k^{2}} < n < M_{k+1} = e^{(k+1)^{2}}$ then
$$\frac{1}{\log M_k}|L_n - L_{M_k}| = O\left(\frac{1}{k^{2}} \sum_{e^{k^{2}} < j < e^{(k+1)^{2}}} \frac{1}{j}\right) = O\left(\frac{1}{k^{2}} \log\frac{e^{(k+1)^{2}}}{e^{k^{2}}}\right) = O\left(\frac{1}{k^{2}} \log e^{2k+1}\right) = O\left(\frac{1}{k}\right),$$
which tends to zero as $k \to \infty$. Combining this with (6.396), (6.389) follows, completing the outline of the proof of Lemma 6.13. □

Finally, we include the

Proof of Lemma 6.12. It is a combinatorial argument based on a recurrence formula. Let $p_{2n}(2k)$ denote the probability in question. By symmetry,
$$p_{2n}(2k) = p_{2n}(2n-2k). \tag{6.397}$$
Let $1 \le k \le n-1$, and consider the event corresponding to $p_{2n}(2k)$. Then the time $T$ of the first return to the origin equals $T = 2r$ for some $1 \le r \le n-1$. The time interval $(0,T)$ is spent entirely on the strictly positive or strictly negative side, and each possibility has probability 1/2. Thus we have the key recurrence formula
$$p_{2n}(2k) = \sum_{r=1}^{k} \frac{1}{2} \Pr[T = 2r]\, p_{2n-2r}(2k-2r) + \sum_{r=1}^{n-k} \frac{1}{2} \Pr[T = 2r]\, p_{2n-2r}(2k),$$
where the first sum represents the case of “strictly positive in the time interval $(0,T)$” and the second sum represents the case of “strictly negative in the time interval $(0,T)$.”
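Since $1 \le k \le n-1$, we can apply induction:
$$p_{2n}(2k) = \sum_{r=1}^{k} \frac{1}{2} \Pr[T = 2r]\, u_{2k-2r}\, u_{2n-2k} + \sum_{r=1}^{n-k} \frac{1}{2} \Pr[T = 2r]\, u_{2k}\, u_{2n-2r-2k}, \tag{6.398}$$
where
$$u_{2\ell} = \Pr[S_{2\ell} = 0] = \binom{2\ell}{\ell} 2^{-2\ell}.$$
We use the obvious equality (where, as usual, $\Pr[A|B]$ denotes the conditional probability)
$$u_{2\ell} = \Pr[S_{2\ell} = 0] = \sum_{r=1}^{\ell} \Pr[S_{2\ell} = 0 \mid T = 2r]\,\Pr[T = 2r] = \sum_{r=1}^{\ell} \Pr[S_{2\ell-2r} = 0]\,\Pr[T = 2r] = \sum_{r=1}^{\ell} u_{2\ell-2r}\,\Pr[T = 2r]$$
in (6.398), and obtain
$$p_{2n}(2k) = \frac{1}{2} u_{2k} u_{2n-2k} + \frac{1}{2} u_{2n-2k} u_{2k} = u_{2k} u_{2n-2k} = \binom{2k}{k} \binom{2n-2k}{n-k} 2^{-2n},$$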
proving Lemma 6.12 for $1 \le k \le n-1$. It remains to prove the missing cases $k = 0$ and $k = n$. They follow from the symmetry (6.397) and the identity
$$\sum_{k=0}^{n} \binom{2k}{k} \binom{2n-2k}{n-k} 2^{-2n} = 1. \tag{6.399}$$
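Both ingredients used so far, the first-return decomposition of $u_{2\ell}$ and the identity (6.399), can be verified with exact rational arithmetic for small indices. In the sketch below the names `u` and `first_return_probs` are ours; the probabilities $\Pr[T = 2r]$ are obtained by counting paths that avoid the origin before time $2r$, so no closed formula for them is assumed.

```python
from fractions import Fraction
from math import comb

def u(two_l: int) -> Fraction:
    """u_{2l} = Pr[S_{2l} = 0] = C(2l, l) * 2^(-2l) for the simple +-1 walk."""
    l = two_l // 2
    return Fraction(comb(2 * l, l), 4 ** l)

def first_return_probs(n: int):
    """f[r] = Pr[T = 2r] for 1 <= r <= n (f[0] is unused and equals 0)."""
    f = [Fraction(0)] * (n + 1)
    ways = {1: 1, -1: 1}   # position -> number of paths that have not yet returned to 0
    for t in range(2, 2 * n + 1):
        new_ways = {}
        returns = 0
        for s, w in ways.items():
            for step in (1, -1):
                nxt = s + step
                if nxt == 0:
                    returns += w          # first return happens exactly at time t
                else:
                    new_ways[nxt] = new_ways.get(nxt, 0) + w
        if t % 2 == 0:
            f[t // 2] = Fraction(returns, 2 ** t)
        ways = new_ways
    return f

if __name__ == "__main__":
    f = first_return_probs(8)
    for l in range(1, 9):
        print(l, u(2 * l) == sum(u(2 * l - 2 * r) * f[r] for r in range(1, l + 1)))
    for n in range(0, 6):
        print(n, sum(u(2 * k) * u(2 * n - 2 * k) for k in range(n + 1)))
```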
To prove (6.399), we start from the binomial series with power $-1/2$:
$$(1-x)^{-1/2} = \sum_{k=0}^{\infty} \binom{-1/2}{k} (-1)^{k} x^{k} = \sum_{k=0}^{\infty} \binom{2k}{k} 2^{-2k} x^{k}. \tag{6.400}$$
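Squaring both sides of (6.400), we have
$$\begin{aligned} \sum_{n=0}^{\infty} x^{n} = (1-x)^{-1} = \left((1-x)^{-1/2}\right)^{2} &= \left(\sum_{k=0}^{\infty} \binom{2k}{k} 2^{-2k} x^{k}\right) \left(\sum_{m=0}^{\infty} \binom{2m}{m} 2^{-2m} x^{m}\right) \\ &= \sum_{n=0}^{\infty} \left(\sum_{k=0}^{n} \binom{2k}{k} \binom{2n-2k}{n-k} 2^{-2n}\right) x^{n}. \end{aligned}$$
Comparing the coefficients of $x^{n}$, identity (6.399) follows. This completes the proof of Lemma 6.12. □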
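The closed form just proved can be compared with a brute-force enumeration of all $2^{2n}$ paths for small $n$. The sketch below makes one assumption about the statement of Lemma 6.12, which is not repeated in this section: the $i$-th step counts as “on the positive side” when $\max(S_{i-1}, S_i) > 0$ (the usual convention in Feller’s treatment), so that the number of positive steps is always even. The function names are ours. The final lines compare the exact distribution function with the arc sine approximation $\frac{2}{\pi}\arcsin\sqrt{x}$.

```python
from fractions import Fraction
from itertools import product
from math import asin, comb, pi, sqrt

def positive_time_distribution(n: int):
    """Exact law of the number of positive-side steps among 2n steps (enumeration)."""
    counts = [0] * (2 * n + 1)
    for steps in product((1, -1), repeat=2 * n):
        s, positive = 0, 0
        for x in steps:
            prev, s = s, s + x
            if max(prev, s) > 0:      # assumed convention for "positive side"
                positive += 1
        counts[positive] += 1
    return [Fraction(c, 4 ** n) for c in counts]

def closed_form(n: int, k: int) -> Fraction:
    """C(2k, k) * C(2n-2k, n-k) * 2^(-2n), the value of p_{2n}(2k) in Lemma 6.12."""
    return Fraction(comb(2 * k, k) * comb(2 * n - 2 * k, n - k), 4 ** n)

def arcsine_cdf(x: float) -> float:
    """(2/pi) * arcsin(sqrt(x)) for 0 <= x <= 1."""
    return 2.0 / pi * asin(sqrt(x))

if __name__ == "__main__":
    for n in range(1, 6):
        dist = positive_time_distribution(n)
        print(n, all(dist[2 * k] == closed_form(n, k) for k in range(n + 1)))
    n = 500
    for x in (0.1, 0.25, 0.5):
        exact = float(sum(closed_form(n, k) for k in range(int(x * n) + 1)))
        print(x, exact, arcsine_cdf(x))
```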
References

[Aa1] van Aardenne-Ehrenfest, T.: Proof of the impossibility of a just distribution of an infinite sequence of points over an interval, Proc. Kon. Ned. Akad. v. Wetensch. 48 (1945), 266–271.
[Aa2] van Aardenne-Ehrenfest, T.: On the impossibility of a just distribution, Proc. Kon. Ned. Akad. v. Wetensch. 52 (1949), 734–739.
[Be1] Beck, J.: Randomness of $n\sqrt{2}$ mod 1 and a Ramsey property of the hyperbola, Colloquia Math. Soc. János Bolyai 60, Sets, Graphs and Numbers, Budapest, Hungary (1992), 23–66.
[Be2] Beck, J.: Diophantine approximation and quadratic fields, Number Theory, Eds.: Győry/Pethő/Sós, Walter de Gruyter GmbH, Berlin - New York 1998, pp. 55–93.
[Be3] Beck, J.: From probabilistic diophantine approximation to quadratic fields, Random and Quasi-Random Point Sets, Lecture Notes in Statistics 138, Springer-Verlag New York 1998, pp. 1–49.
[Be4] Beck, J.: Randomness in lattice point problems, Discrete Mathematics 229 (2001), pp. 29–45.
[Be5] Beck, J.: Lattice point problems: crossroads of number theory, probability theory, and Fourier analysis, Fourier Analysis and Convexity (conference in Milan, Italy, July 2001) Eds.: L. Brandolini et al., Applied and Numerical Harmonic Analysis, Birkhäuser-Verlag, Boston MA 2004, pp. 1–35.
[Be6] Beck, J.: Inevitable Randomness in Discrete Mathematics, University Lecture Series Vol. 49, Amer. Math. Soc. 2009.
[Be7] Beck, J.: Randomness of the square root of 2 and the giant leap, Part 1, Periodica Math. Hungarica, 60, no. 2 (2010), 137–242.
[Be8] Beck, J.: Randomness of the square root of 2 and the giant leap, Part 2, Periodica Math. Hungarica, 62, no. 2 (2011), 127–246.
[Be9] Beck, J.: Lattice point counting and the probabilistic method, Journal of Combinatorics, 1, no. 2 (2010), 171–232.
[Be10] Beck, J.: Superirregularity in Panorama of Discrepancy Theory, Editors: William Chen, Anand Srivastav, Giancarlo Travaglini, Springer Verlag 2014, pp. 1–87.
[Be-Ch] Beck, J. and Chen, W.W.L.: Irregularities of Distribution, Cambridge Tracts in Mathematics 89, Cambridge University Press, Cambridge, 1987.
[Ca2] Cassels, J.W.: Über $\lim x\,|\vartheta x + \alpha - y|$, Math. Ann. 127 (1954), 288–304.
[Cha] Chazelle, B.: The Discrepancy Method, Cambridge University Press, Cambridge, 2000.
[Co] van der Corput, J.G.: Verteilungsfunktionen. I and II. Proc. Kon. Ned. Akad. v. Wetensch. 38 (1935), 813–821 and 1058–1066.
[Da] Davenport, H.: Note on irregularities of distribution, Mathematika 3 (1956), 131–135.
[De] Descombes, I.R.: Sur la répartition des sommets d’une ligne polygonale réguliere nonfermée, Ann. Sci. de l’École Normale Sup. 75 (1956), 284–355.
[Di] Dieter, U.: Das Verhalten der Kleinschen Functionen gegenüber Modultransformationen und verallgemeinerte Dedekindsche Summen, Journ. Reine Angew. Math. 201 (1959), 37–70.
[Du] Dupain, Y.: Discrépance de la suite, Ann. Inst. Fourier 29 (1979), 81–106.
[Du-So] Dupain, Y. and Sós, Vera T.: On the discrepancy of $n\alpha$ sequences, Topics in classical number theory, Colloquium, Budapest 1981, vol. 1, Colloq. Math. Soc. János Bolyai 34, 355–387.
[El] Elliott, P.D.T.A.: Probabilistic number theory, vol. 1 and 2, Springer 1979–80.
[Er] Erdős, P.: On the law of the iterated logarithm, Ann. of Math. 43 no. 2 (1942), 419–436.
[Er-Hu] Erdős, P. and Hunt, G.A.: Changes of sign of sums of random variables, Pacific Journal Math. 3 (1953), 673–687.
[Fe1] Feller, W.: An Introduction to Probability Theory and its Applications, Vol. 1 (3rd edn), Wiley, New York, 1969.
[Fe2] Feller, W.: An Introduction to Probability Theory and its Applications, Vol. 2 (2nd edn), Wiley, New York, 1971.
[Fe3] Feller, W.: The general form of the so-called law of the iterated logarithm, Trans. Amer. Math. Soc. 54 (1943), 373–402.
[Ha] Halász, G.: On Roth’s method in the theory of irregularities of point distributions, Recent Progress in Analytic Number Theory, Vol. 2, pp. 79–94, London, Academic Press 1981.
[Ha-Li1] Hardy, G.H. and Littlewood, J.: The lattice-points of a right-angled triangle. I, Proc. London Math. Soc. 3 (1920), 15–36.
[Ha-Li2] Hardy, G.H. and Littlewood, J.: The lattice-points of a right-angled triangle. II, Abh. Math. Sem. Hamburg 1 (1922), 212–249.
[Ha-Li3] Hardy, G.H. and Littlewood, J.: Some problems of Diophantine approximation: A series of cosecants, Bull. Calcutta Math. Soc. 20 (1930), 251–266.
[Ha-Wr] Hardy, G.H. and Wright, E.M.: An introduction to the theory of numbers, 5th edition, Clarendon Press, Oxford 1979.
[Ka] Kac, M.: Probability methods in some problems of analysis and number theory, Bull. Amer. Math. Soc. 55 (1949), 641–665.
[Ke] Kesten, H.: Uniform distribution mod 1, Ann. of Math. 71 (1960), 445–471, and Part II in Acta Arithm. 7 (1961), 355–360.
[Kh1] Khinchin, A.: Über einen Satz der Wahrscheinlichkeitsrechnung, Fundamenta Math. 6 (1924), 9–20.
[Kh2] Khinchin, A.: Continued Fractions, English translation, P. Noordhoff, Groningen, The Netherlands 1963.
[Kn1] Knuth, D.E.: Notes on generalized Dedekind sums, Acta Arithmetica 33 (1977), 297–325.
[Kn2] Knuth, D.E.: The art of computer programming, volume 3, Addison-Wesley 1998.
[Ko] Kolmogorov, A.: Das Gesetz des iterierten Logarithmus, Math. Annalen 101 (1929), 126–135.
[La] Lang, S.: Introduction to Diophantine Approximations, Addison-Wesley 1966.
[Ma] Matousek, J.: Geometric Discrepancy, Algorithms and Combinatorics 18, Springer-Verlag, Berlin 1999.
[Os] Ostrowski, A.: Bemerkungen zur Theorie der Diophantischen Approximationen. I. Abh. Hamburg Sem. 1 (1922), 77–99.
[Ra-Gr] Rademacher, H. and Grosswald, E.: Dedekind Sums, Math. Assoc. Amer., Carus Monograph No. 16 (1972).
[Ro] Roth, K.F.: Irregularities of distribution, Mathematika 1 (1954), 73–79.
[Schm] Schmidt, W.M.: Irregularities of distribution. VII, Acta Arithmetica 21 (1972), 45–50.
[Scho] Schoissengeier, J.: Another proof of a theorem of J. Beck, Monatshefte für Mathematik 129 (2000), 147–151.
[Shi] Shintani, T.: On evaluation of zeta functions of totally real algebraic number fields at non-positive integers, Journ. Fac. Sci. Univ. Tokyo 23 (1976), 393–417.
[So1] Sós, Vera T.: On the distribution mod 1 of the sequence $\{n\alpha\}$, Ann. Univ. Sci. Budapest 1 (1958), 127–234.
[So2] Sós, Vera T.: On the discrepancy of the sequence $\{n\alpha\}$, Coll. Math. Soc. János Bolyai 13 (1974), 359–367.
[So3] Sós, Vera T.: On strong irregularities of the distribution of $\{n\alpha\}$ sequences, Studies in Pure Math. (1983), 685–700.
[So-Za] Sós, Vera T. and Zaremba, S.K.: The mean-square discrepancies of some two-dimensional lattices, Studia Sci. Math. Hungarica 14 (1979), 255–271.
[Sw] Swierczkowski, S.: On successive settings of an arc on the circumference of a circle, Fund. Math. 46 (1958), 187–189.
[We] Weyl, H.: Über die Gleichverteilung von Zahlen mod Eins, Math. Ann. 77 (1916), 313–352.
[Wo] Wolfram, S.: A new kind of science, Wolfram Media 2002.
[Za1] Zagier, D.B.: Nombres de classes et fractions continues, Journ. Arithmetiques de Bordeaux, Asterisque 24–25 (1975), 81–97.
[Za2] Zagier, D.B.: On the values at negative integers of the zeta-function of a real quadratic field, Enseignement Math. (2) 22 (1976), 55–95.
[Za3] Zagier, D.B.: Valeurs des functions zeta des corps quadratiques reels aux entiers negatifs, Journ. Arithmetiques de Caen, Asterisque 41–42 (1977), 135–151.
[Za4] Zagier, D.B.: Zeta-funktionen und quadratische Körper, Hochschultext, Springer 1981.