Sieve Methods

SIEVE METHODS Graduate course, Rutgers, spring 1996 Henryk Iwaniec Dedicated to the memory of Ted Richert CONTENTS ...

Author: Iwaniec H.


SIEVE METHODS Graduate course, Rutgers, spring 1996

Henryk Iwaniec

Dedicated to the memory of Ted Richert

CONTENTS

Preface

CHAPTER 1. Evolution of Sieve Ideas
1.1. Eratosthenes sieve
1.2. The inclusion-exclusion formula of Legendre
1.3. Sieve problems
1.4. Digressions on the Legendre formula
1.5. The leading and the error terms
1.6. Hypothesis on g(d) and r_d(A)
1.7. A general application of the Eratosthenes-Legendre sieve
1.8. A simple case of a local sieve
1.9. Epilogue - the sieve weights
Appendix 1. Arithmetic Functions
A1.1. The Möbius inversion formulas
A1.2. The Dirichlet convolution
A1.3. The Tchebyshev and Mertens estimates

CHAPTER 2. Combinatorial Sieves
2.1. Buchstab’s formula
2.2. Pure sieve
2.3. Setting up a sieve by iterations
2.4. Choosing the truncation parameters
2.5. A variation of Brun’s method
2.6. Two applications to one problem
2.7. Buchstab’s iterations

CHAPTER 3. The Beta-Sieve
3.1. Introduction
3.2. Estimates for V_n(D, z)
3.3. The functions F(s), f(s)
3.4. The functions H(s), h(s)
3.5. The convergence problem. Conclusion
3.6. The main theorems
3.7. Numerical tables
Appendix 3. The Differential-Difference Equations

CHAPTER 4. The Λ^2-Sieve
4.1. General Results
4.2. Explicit estimates for H(D, z)
4.3. Explicit estimates for R(A, P, Λ^2)
4.4. Selected applications
Appendix 4. Mean-Values of Multiplicative Functions
A4.1. Simple estimates
A4.2. Asymptotic formulas for full sums
A4.3. Asymptotic formulas for restricted sums
A4.4. The linear case

CHAPTER 5. The Linear Sieve Theory
5.1. A summary of previous results
5.2. The true asymptotics for special sifted sums
5.3. The optimality of the linear sieve
5.4. A refinement of estimates for error terms
5.5. The remainder in a well-factorable form
5.6. Estimates for bilinear forms in the error terms
Appendix 5. Separation of Variables Techniques

CHAPTER 6. Weighted Sieves
6.1. Almost-primes
6.2. Sieve limits for almost-primes
6.3. Some applications
6.4. Twin almost-primes

CHAPTER 7. Sublinear Sieves
7.1. The half-linear sieve
7.2. Sieves of fractional dimension
7.3. Small sieve of Eratosthenes-Legendre

CHAPTER 8. The Large Sieve
8.1. The basic inequalities
8.2. The large sieve inequalities for additive characters
8.3. Equidistribution over residue classes
8.4. Arithmetic large sieve
8.5. The large sieve inequality for multiplicative characters

CHAPTER 9. Bombieri’s Sieve
9.1. Heuristic arguments for sums over primes
9.2. Asymptotics for sums over almost-primes
9.3. Basic arrangements
9.4. Handling the sieve mollifier
9.5. Estimation of S0(Aλk, x, y)
9.6. Evaluation of S(Aλk, x, y)
9.7. Some applications
9.8. The parity problem
9.9. Asymptotic sieve for primes
Appendix 9. The functions Λk

REFERENCES

PREFACE

These are lecture notes from a graduate course in sieve methods which I delivered in the spring of 1996 at Rutgers. Though generally regarded as part of elementary number theory, the sieve methods are not quite as easy to teach because of the complexity of arguments and diversity of techniques. A modern graduate student has little, if any, exposure to this nevertheless fascinating area. Thus I assumed nothing of the student at the start. For five out of nine chapters I provided appendices in which extensive side material is presented that is needed to make the exposition of sieve theory self-contained to a great extent. Anyone who studies these notes from scratch should be able to check proofs thoroughly, even though some technical details are being suppressed gradually toward the end. If one prefers a more relaxed style I recommend the master book by H. Halberstam and H. E. Richert [HR].

The sieve theory as well as its applications are equally interesting. Of course, one cannot practice the latter without the former. Therefore we spend considerable time in developing theoretical background. In fact I have placed strong emphasis on generality, though not so far as to lose touch with practice. To make the theory intelligible I introduce the involved concepts in an intuitive fashion rather than by abstract definitions; this approach serves the purpose without compromising the clarity and precision of results needed for applications.

The abundance of applications of sieve methods is overwhelming but we show only the core. Because of the time limit we have chosen a few problems of one flavor (twin primes, Goldbach conjecture, etc.) for which we make several improvements over and over again so as to illustrate progress in theoretical studies. I hope these lectures will give a student sufficient skill to employ the theory and her/his imagination for finding new applications.

For further studies of related topics and for learning more sieve theory, I recommend the following books [HR], [R2], [B5], [M] and the articles [S3], [B4], [JR], [I4], [G2].

These notes cover almost exactly the material presented in classes. The original drafts which I have been regularly distributing among students are slightly different. A few modifications and improvements were made as these drafts were being typed during the term.

I thank Barbara Miller for her generous help in editing and high quality typing. Texts in sieve theory have the reputation of looking formidable but Barbara made her part beautiful. I also thank David Farmer for preparing figures and Sandra Davis for computing certain integrals.

Henryk Iwaniec June 11, 1996


CHAPTER I

EVOLUTION OF SIEVE IDEAS

We often apply, consciously or not, some kind of sieve procedure whenever the subject of investigation is not directly recognizable. We begin by making a long list of suspects and then we sort it out gradually by excluding obvious cases with respect to the available information. The process of exclusion itself may yield new data which influence our decision about what to exclude or include in the next run. When no clue is provided to drive us further the process terminates and we are left with objects which can be examined by other means to determine their exact identity. These universal ideas were formalized in the context of arithmetic back in the third century BC by Eratosthenes, and are still used today. I hope the sieve methods will continue to inspire developments in arithmetic.

1.1. Eratosthenes Sieve

In the set of natural numbers (positive integers) $\mathbb{N} = \{1, 2, 3, 4, 5, 6, 7, 8, \dots\}$ there are primes $\mathbb{P} = \{2, 3, 5, 7, \dots\}$. These are the natural numbers $p > 1$ which are not divisible by any $d > 1$ other than $d = p$. We cannot say when the concept of prime number was formulated; clearly it was known to Euclid (circa 300 BC), who is credited with the first rigorous proof that there exist infinitely many primes. This follows from the unique factorization theorem which asserts that every $n \in \mathbb{N}$ factors into powers of distinct primes,

$$n = \prod_p p^{n_p}$$

where the $n_p$ are non-negative integers determined uniquely by $n$ with $n_p = 0$ for almost all $p$. Therefore the primes are fundamental elements of arithmetic. Properties of prime numbers, beautiful and mysterious, challenged the sharpest minds in the history of mathematics. After Euclid anyone would wish to know


HOW MANY PRIME NUMBERS ARE THERE?

To get a hold on the problem we consider natural numbers in a finite interval, say $1, 2, 3, 4, 5, 6, 7, 8, \dots, x$ and we ask how to recognize which of these are primes. By convention 1 is not prime so we cross it out. The even numbers greater than 2 are composite so we cross them out as well. The next number after 2 which is not crossed out is 3, and every third number exceeding 3 is divisible by 3, so we cross out each of these. Then we pass to the number 5 and we cross out every fifth number greater than 5. If one continues this procedure until no new number can be found, the numbers which are not crossed out are all the primes $p \le x$. For example if $x = 48$ we obtain

[Table: the integers 1, 2, ..., 48, with 1 and every composite number crossed out.]

Note that some numbers are crossed out more than one time. Hence we extract the complete table of primes $p \le 48$:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47.
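The crossing-out procedure just described can be sketched in a few lines of Python. This is only an illustration of the text above; the function name `eratosthenes_sieve` is our own and does not come from the notes.

```python
def eratosthenes_sieve(x: int) -> list[int]:
    """Return the primes p <= x by the crossing-out procedure of Section 1.1."""
    crossed_out = [False] * (x + 1)
    primes = []
    for n in range(2, x + 1):          # 1 is crossed out by convention
        if crossed_out[n]:
            continue
        primes.append(n)               # n survived every earlier pass, so it is prime
        # Cross out every n-th number greater than n.
        for multiple in range(2 * n, x + 1, n):
            crossed_out[multiple] = True
    return primes


print(eratosthenes_sieve(48))
```

For x = 48 the returned list should coincide with the table of fifteen primes displayed above.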

The above procedure was invented by Eratosthenes and was named “Eratosthenes Sieve” already during his lifetime (he was born in 276 BC and died blind from hunger in 196 BC, see [Pol]). It was observed by Leonardo Pisano [Pis] that in order to obtain the list of all primes $\sqrt{x} < p \le x$ one only needs to remove from the integers $1 < n \le x$ the multiples of primes $\le \sqrt{x}$. Such an observation proved to be useful for counting the number of all primes $p \le x$,

$$\pi(x) = |\{p \in \mathbb{P} : p \le x\}|.$$

For example in our case of $x = 48$, in order to complete the sifting process one must cast out the multiples of 2, 3, 5; it shows that $\pi(48) = 15$.

1.2. The Inclusion-Exclusion Formula of Legendre

Building the table of primes $p \le x$ by means of the Eratosthenes sieve and counting primes $p \le x$ are not exactly the same problems. The latter can be reduced to the former but it is not necessary to do so. In reality if one seeks an analytic expression for $\pi(x)$ it helps not at all to look at the table of all primes $p \le x$.


In 1808 A.-M. Legendre [Leg] published the following formula

$$\pi(x) - \pi(\sqrt{x}) + 1 = [x] - \sum_{p_1 \le \sqrt{x}} \left[\frac{x}{p_1}\right] + \sum_{p_1 < p_2 \le \sqrt{x}} \left[\frac{x}{p_1 p_2}\right] - \sum_{p_1 < p_2 < p_3 \le \sqrt{x}} \left[\frac{x}{p_1 p_2 p_3}\right] + \dots \tag{1.1}$$

where $[x]$ denotes the integral part of $x$ (the “entier” function) which for $x > 0$ is the same as $[x] = |\{n \in \mathbb{N} : n \le x\}|$. To verify this formula observe that $[x/p_1 \dots p_r]$ is the number of $n \le x$ which are multiples of $p_1 \dots p_r$, therefore the right-hand side of (1.1) represents the number of $n \le x$ counted with suitable multiplicities. Namely if $n$ has exactly $s$ prime divisors $\le \sqrt{x}$ it is counted with multiplicity

$$1 - \binom{s}{1} + \binom{s}{2} - \binom{s}{3} + \dots = (1 - 1)^s = 0 \tag{1.2}$$

provided $s \ge 1$, otherwise $n$ is counted once. The left-hand side of (1.1) represents the same thing, i.e., it counts $n \le x$ exactly once if $n$ is free of prime divisors $\le \sqrt{x}$.

Legendre fully realized that his formula holds true for general sequences (before him L. Euler [E1] and after him V. A. Lebesgue [Leb] used the Eratosthenes sieve to pick up primes in special arithmetic progressions). His ideas can be put in an abstract, purely combinatorial setting. Suppose we have a finite set $\mathcal{A}$ and a finite collection $\mathcal{P}$ of properties applicable to the elements of $\mathcal{A}$. Given $p \in \mathcal{P}$ we denote by $\mathcal{A}_p$ the subset of elements in $\mathcal{A}$ which have property $p$. Then for any collection $(d) \subset \mathcal{P}$ we obtain

$$\mathcal{A}_{(d)} = \bigcap_{p \in (d)} \mathcal{A}_p,$$

the set of elements in $\mathcal{A}$ which possess every property in $(d)$. We would like to know how many elements in $\mathcal{A}$ possess no property in $\mathcal{P}$. Let $\mathcal{A}(\mathcal{P})$ denote the set of these exceptional elements. By (1.2) one verifies the following identity

$$|\mathcal{A}(\mathcal{P})| = \sum_{(d) \subset \mathcal{P}} (-1)^{|(d)|} |\mathcal{A}_{(d)}| \tag{1.3}$$

where $|B|$ is the cardinality of $B$. We shall call this identity the inclusion-exclusion formula.
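As a sanity check, the Legendre formula (1.1) can be evaluated numerically by summing over all subsets of the primes not exceeding the square root of x. The sketch below is illustrative only: the helper names are ours, and the prime list is produced by naive trial division, which is adequate for small x.

```python
from itertools import combinations
from math import isqrt, prod


def primes_up_to(x):
    """Primes p <= x by trial division (fine for small x)."""
    return [n for n in range(2, x + 1)
            if all(n % d for d in range(2, isqrt(n) + 1))]


def legendre_right_side(x):
    """Right-hand side of (1.1): alternating sum of [x / (p1 ... pr)]."""
    small_primes = primes_up_to(isqrt(x))      # the primes p <= sqrt(x)
    total = 0
    for r in range(len(small_primes) + 1):
        for subset in combinations(small_primes, r):
            # prod(()) == 1, so r = 0 contributes the term [x]
            total += (-1) ** r * (x // prod(subset))
    return total


def legendre_left_side(x):
    """Left-hand side of (1.1): pi(x) - pi(sqrt(x)) + 1."""
    return len(primes_up_to(x)) - len(primes_up_to(isqrt(x))) + 1


for x in (48, 100, 1000):
    print(x, legendre_left_side(x), legendre_right_side(x))
```

The number of subsets doubles with each additional prime, so this direct evaluation is only practical for small x.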

The Legendre formula (1.1) is a special case of the inclusion-exclusion formula (1.3) in which $\mathcal{A}$ is the set of integers $1 \le n \le x$ and $\mathcal{P}$ is the collection of primes $p \le \sqrt{x}$. A number $n$ has property $p$ iff $n \equiv 0 \pmod p$. We write $d = p_1 \dots p_r$ in place of $(d) = (p_1, \dots, p_r)$ so the set $\mathcal{A}_d$ consists of numbers $n \equiv 0 \pmod d$ and its cardinality is $|\mathcal{A}_d| = [x/d]$. Using the Möbius function (see A1.1) we write (1.1) as

$$\pi(x) - \pi(\sqrt{x}) + 1 = \sum_{d \mid P(\sqrt{x})} \mu(d) \left[\frac{x}{d}\right] \tag{1.4}$$

where $P(\sqrt{x})$ is the product of all primes $\le \sqrt{x}$. For actual computations of $\pi(x)$ this formula is not very efficient because there are a lot of terms on its right side. Even if one discards the divisors $d > x$, for which the terms vanish, a lot will remain. In this connection E. Meissel [Mei] introduced interesting modifications. To reduce the number of terms he took a smaller set of primes in the inclusion-exclusion process and exploited values of $\pi(y)$ for small $y$ which are known from the preceding computations. For any $z \le x$ we put

$$\Phi(x, z) = |\{n \le x : (n, P(z)) = 1\}| \tag{1.5}$$

where $P(z)$ is the product of all primes $p \le z$, thus $\Phi(x, z)$ counts the natural numbers $n \le x$ which are free of prime divisors $p \le z$. In particular for $\sqrt{x} \le z \le x$ we have

$$\Phi(x, z) = \pi(x) - \pi(z) + 1. \tag{1.6}$$

By the inclusion-exclusion formula we have the expression

$$\Phi(x, z) = \sum_{d \mid P(z)} \mu(d) \left[\frac{x}{d}\right]. \tag{1.7}$$

On the other hand, for special $z$ Meissel expressed $\Phi(x, z)$ in terms of $\pi(y)$, for example

$$\Phi(x, \sqrt[3]{x}) = \pi(x) + \sum_{\sqrt[3]{x} < \dots} \pi\!\left(\frac{x}{p}\right) \dots$$

[...] $d \mid P(\sqrt{x})$ [...]

1.3. Sieve Problems

Along the lines of the previous section, one is able to treat an arbitrary but finite subset of natural numbers, which amounts to considering the whole set of the natural numbers taken with certain multiplicities. Even more generally we may as well attach to each $n \in \mathbb{N}$ any coefficient $a_n \ge 0$. Let

$$\mathcal{A} = (a_n) \tag{1.10}$$

be the sequence of coefficients (non-negative numbers). We assume the series

$$|\mathcal{A}| = \sum_n a_n \tag{1.11}$$

converges and that we have some knowledge about every congruent partial series

$$|\mathcal{A}_d| = \sum_{n \equiv 0 \pmod d} a_n. \tag{1.12}$$

Given a number $P$ we wish to evaluate the sum

$$S(\mathcal{A}, P) = \sum_{(n, P) = 1} a_n. \tag{1.13}$$

Usually $P$ is the product of a large number of distinct primes from a certain set. We call $\mathcal{A}$ the sifting sequence, $P$ the sifting range and $S(\mathcal{A}, P)$ the sifted sum. By the inclusion-exclusion formula we verify that

$$S(\mathcal{A}, P) = \sum_{d \mid P} \mu(d) |\mathcal{A}_d|. \tag{1.14}$$

Using (1.82) one can derive the generalized Legendre formula (1.14) somewhat formally as follows

$$S(\mathcal{A}, P) = \sum_{(n, P) = 1} a_n = \sum_n a_n \Big( \sum_{d \mid (n, P)} \mu(d) \Big) = \sum_{d \mid P} \mu(d) \sum_{d \mid n} a_n = \sum_{d \mid P} \mu(d) |\mathcal{A}_d|.$$
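The identity (1.14) is easy to test on a small sequence. The following sketch compares the sifted sum computed directly from (1.13) with the right-hand side of (1.14). The dictionary representation of the coefficients, the function names and the two sample sequences are assumptions made for this illustration only.

```python
from math import gcd


def mobius(d):
    """Möbius function mu(d), computed by trial division."""
    result = 1
    p = 2
    while p * p <= d:
        if d % p == 0:
            d //= p
            if d % p == 0:          # p^2 divides the original d
                return 0
            result = -result
        p += 1
    if d > 1:                       # one prime factor is left over
        result = -result
    return result


def sifted_sum_direct(a, P):
    """S(A, P) as in (1.13): sum of a_n over n coprime to P."""
    return sum(a_n for n, a_n in a.items() if gcd(n, P) == 1)


def sifted_sum_legendre(a, P):
    """S(A, P) as in (1.14): sum over d | P of mu(d) |A_d|."""
    total = 0
    for d in range(1, P + 1):
        if P % d == 0:
            A_d = sum(a_n for n, a_n in a.items() if n % d == 0)   # (1.12)
            total += mobius(d) * A_d
    return total


P = 2 * 3 * 5 * 7
ones = {n: 1 for n in range(1, 101)}        # indicator of 1 <= n <= 100
weights = {n: n for n in range(1, 101)}     # hypothetical weights a_n = n
print(sifted_sum_direct(ones, P), sifted_sum_legendre(ones, P))
print(sifted_sum_direct(weights, P), sifted_sum_legendre(weights, P))
```

With the indicator sequence the sifted sum is the quantity Φ(100, 7) of (1.5), so by (1.6) it should equal π(100) − π(7) + 1.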

The additional generality of the Legendre formula is introduced for good reasons. A vast exploration of (1.14) will be more interesting for coefficients $a_n$ not necessarily differentiable such as $a_n = n$ in the case of (1.9). Very often $\mathcal{A} = (a_n)$ will be the characteristic function of a finite set of special natural numbers such as polynomial values or shifted primes. We shall offer numerous examples in due course, and we present a few right now. The two most inspiring problems which laid the foundation of modern sieve theory were the twin primes and the Goldbach conjectures; they assert that:

- There exist infinitely many primes $p$ such that $p - 2$ is a prime,
- Every even number greater than 2 is a sum of two primes.

To put the twin primes conjecture in the framework of the sieve consider the numbers $n = m(m - 2)$ with $2 < m \le x$, $2 \nmid m$. If such an $n$ is free of prime divisors $p \le \sqrt{x}$ then it yields a pair of primes $\{m, m - 2\}$. Observe that $n = m(m - 2)$ has the property $n \equiv 0 \pmod p$ iff $m \equiv 0, 2 \pmod p$. Therefore the task before us is to sift out the odd numbers in $2 < m \le x$ which are congruent to either of the two specified residue classes modulo odd primes $p \le \sqrt{x}$. Paraphrasing, if $\mathcal{A} = (a_n)$ is the characteristic function of the set of numbers of the above type and $P$ is the product of odd primes $\le \sqrt{x}$ then $S(\mathcal{A}, P)$ counts the prime numbers $m$ with $\sqrt{x} < m \le x$ such that $m - 2$ is a prime.

Similarly we proceed with the Goldbach problem. Let $N$ be an even integer, $N > 2$, and $\mathcal{A} = (a_n)$ be the characteristic function of the set of numbers $n = m(N - m)$ with $2 < m < N$ and $(m, N) = 1$. Let $P$ be the product of all primes $p < \sqrt{N}$, $p \nmid N$. Then $S(\mathcal{A}, P)$ counts the solutions to $\ell + m = N$ with $\ell, m$ both primes $> \sqrt{N}$. Now we need to sift out the numbers $m$ with $2 < m < N$ and $(m, N) = 1$ such that $m \equiv 0, N \pmod p$ for any $p < \sqrt{N}$ with $p \nmid N$.

There is yet another setting for catching twin primes in a sieve. Take $a_n$ supported on numbers $n = m - 2$ where $m$ is a prime with $2 < m \le x$. Let $P(z)$ be the product of all primes $2 < p \le z$. Then $S(\mathcal{A}, P(z))$ scores the primes $m$ for which $m - 2$ has no prime factors $\le z$. Choosing $z = \sqrt{x}$ we force $m - 2$ to be a prime. This approach requires removing only one residue class of $m$ for each prime modulus $p \le z$. However, the drawback of such simplification is that one has to deal with the sequence of primes in place of odd integers. The prime numbers are known to be equidistributed over the primitive residue classes (see (1.27)) but not as deeply as the odd integers. Yet, in spite of our incomplete knowledge about primes in this respect, the latter approach produces stronger results. A similar alternative approach can be set up for the Goldbach problem.

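The first twin prime setting can be tried out on a small range. The sketch below sifts the odd numbers m by the odd primes not exceeding the square root of x and compares the result with a direct count of the pairs of primes {m, m − 2}, m ≤ x, in which both members exceed the square root of x. The function names are ours, and the comparison assumes x ≥ 9, so that the prime 3 belongs to the sifting range and the value m = 3 (for which m − 2 = 1) is sifted out.

```python
from math import gcd, isqrt, prod


def is_prime(n):
    """Primality by trial division (fine for small n)."""
    return n > 1 and all(n % d for d in range(2, isqrt(n) + 1))


def twin_sifted_sum(x):
    """S(A, P) for n = m(m - 2), m odd, 2 < m <= x, P = product of odd primes <= sqrt(x)."""
    P = prod(p for p in range(3, isqrt(x) + 1) if is_prime(p))
    return sum(1 for m in range(3, x + 1, 2) if gcd(m * (m - 2), P) == 1)


def twin_pairs_above_sqrt(x):
    """Pairs of primes {m, m - 2} with m <= x and both members > sqrt(x)."""
    return sum(1 for m in range(3, x + 1)
               if is_prime(m) and is_prime(m - 2) and m - 2 > isqrt(x))


for x in (100, 1000, 5000):
    print(x, twin_sifted_sum(x), twin_pairs_above_sqrt(x))
```

For x = 100 both counts pick up the pairs (11, 13), (17, 19), (29, 31), (41, 43), (59, 61) and (71, 73).
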
Next, given a $k$-tuple $a = [a_1, \dots, a_k]$ of distinct integers one may ask if there exist infinitely many $m$ such that $m - a_1, \dots, m - a_k$ are simultaneously primes. This amounts to sifting out natural numbers $m$ congruent to one of the residue classes $a_j \pmod p$ to prime moduli in an appropriate range.

In the examples presented so far we were led to a problem of casting out integers which have fixed residue classes to prime moduli. We now show a problem which requires frequent change of residue classes according to the modulus. Consider the quadratic polynomial values $n = m^2 + 1$. For each prime $p$ the property $n \equiv 0 \pmod p$ translates into $m \equiv \nu \pmod p$ where $\nu$ are roots of the congruence $\nu^2 + 1 \equiv 0 \pmod p$. There are no roots if $p \equiv 3 \pmod 4$ and there are exactly two roots if $p \equiv 1 \pmod 4$; these vary with $p$ quite a lot. It is known that the set of fractional parts $\{\nu/p\}$ is dense in $(0, 1)$ (see [DFI]).

More generally we may consider a system of distinct, irreducible polynomials $f_1(x), \dots, f_k(x) \in \mathbb{Z}[x]$ with positive leading coefficients, and we ask if there are infinitely many integers $m$ such that each of $f_1(m), \dots, f_k(m)$ is prime. Of course, one needs a necessary condition that $F(x) = f_1(x) \dots f_k(x)$ has no fixed prime divisor, in other words the congruence $F(\nu) \equiv 0 \pmod p$ must have fewer than $p$ solutions for every $p$ (by the theorem of Lagrange it is enough to verify this condition for $p \le \deg F$). The celebrated hypothesis of A. Schinzel [SW] asserts that these conditions are sufficient.

Having examined basic cases and famous conjectures we now have some idea what the formulation of the general sieve problem should be. We assume $\mathcal{M}$ is a finite set of integers, $\mathcal{P}$ is a set of primes and for each $p \in \mathcal{P}$ we are given a collection $\Omega_p$ of residue classes modulo $p$. The general sieve problem concerns the set

$$(\mathcal{M}, \mathcal{P}, \Omega) = \{m \in \mathcal{M} : m \pmod p \notin \Omega_p \text{ for any } p \in \mathcal{P}\}.$$

We wish to know when this set is not empty, and how many elements it contains. Most often in sieve problems we encounter collections of classes $\Omega_p \subset \mathbb{Z}/p\mathbb{Z}$ of which the number of elements, say $\omega(p) = |\Omega_p|$, is constant. There are also interesting problems which require a lot of residue classes for large moduli to be excluded. For instance if one wants to estimate the number of squares in $\mathcal{M}$ then by virtue of the following property

$$m = \ell^2 \ \Rightarrow\ \left(\frac{m}{p}\right) = 0 \text{ or } 1 \text{ for all } p,$$

where $\left(\frac{m}{p}\right)$ is the Legendre symbol, one sees that

$$\Omega_p = \left\{\nu \pmod p : \left(\frac{\nu}{p}\right) = -1\right\}$$

is the appropriate collection of classes for the job; it has $\omega(p) = \frac{p - 1}{2}$ elements. One extends the subset of squares in $\mathcal{M}$ to the set $(\mathcal{M}, \mathcal{P}, \Omega)$ where $\mathcal{P}$ is chosen at will and estimates the cardinality of the latter. This is a typical situation of the large sieve (see Chapter 8).

There is some demand for sieve problems in several dimensions. For example consider the question of rational zeros of ternary quadratic forms $\varphi_{ab}(x, y, z) = ax^2 + by^2 - z^2$ with positive integer coefficients $a, b$. According to the Hasse principle $\varphi_{ab}$ represents zero in rationals if and only if it does so in every $p$-adic field, and using Hilbert’s symbol this property translates into

$$\left(\frac{a, b}{p}\right) = 1 \text{ for all } p.$$

Since this always holds for $p \nmid 2ab$ it suffices to verify the local conditions for a finite number of places $p$ with $p \mid 2ab$. Hence the quadratic forms $\varphi_{ab}$ with $1 \le a, b \le X$ which represent zero are outnumbered by elements of the set

$$E(X) = \{1 \le a, b \le X : (a, b) \pmod p \notin \Omega_p \text{ for all } p > 2\}$$

where $\Omega_p \subset (\mathbb{Z}/p\mathbb{Z})^2$ consists of pairs of residue classes $(\alpha, \beta)$ such that

$$\left(\frac{\alpha, \beta}{p}\right) = -1.$$

Here the number of exceptional (forbidden) classes is $\omega(p) = |\Omega_p| = p - 1$ for each $p > 2$. J.-P. Serre [Ser] showed (among other things) that $|E(X)| \ll X^2 (\log X)^{-1}$.

1.4. Digressions on the Legendre Formula

Elementary as it looks, the formula (1.14) nevertheless does not give a good insight into the quantity $S(\mathcal{A}, P)$; we cannot say in simple terms what its true order of magnitude is, nor can one even directly recover the obvious estimates

$$0 \le S(\mathcal{A}, P) \le |\mathcal{A}|$$

(recall that the coefficients $a_n$ are real, non-negative). Of course, one should have a much better upper bound than this since $S(\mathcal{A}, P)$ takes only a few lucky terms $a_n$ which survive the sifting process, i.e., those with $(n, P) = 1$. At first glance one feels the expansion (1.14) is inferior as the right side has more terms than we had initially in (1.11). On a positive note one thinks of the congruent partial series (1.12) as being tractable and hopes to take advantage of the expansion (1.14) by observing a good deal of cancellation due to random sign change of $\mu(d)$. Well, except for certain abstract choices of $a_n$ for which $|\mathcal{A}_d|$ is asymptotically proportional to an arithmetic function which neutralizes the variation of sign of $\mu(d)$ (see Section 5.3). However such choices are biased. In true settings of sieve theory $|\mathcal{A}_d|$ satisfy approximations of type

$$|\mathcal{A}_d| = g(d) X + r_d(\mathcal{A}) \tag{1.15}$$

where $g(d)$ is a nice multiplicative function with

$$0 \le g(p) < 1 \quad \text{if } p \mid P, \tag{1.16}$$

$X$ is a suitable positive number (independent of $d$) and $r_d(\mathcal{A})$ is regarded as an error term. This approximation is meaningful when $|r_d(\mathcal{A})|$ is much smaller than $|\mathcal{A}_d|$. The function $g(d)$ and the number $X$ are not determined uniquely by the sequence $\mathcal{A}$, yet these have to be fixed carefully to minimize the resulting error terms. For $d = 1$ we have $|\mathcal{A}| = X + r(\mathcal{A})$ which suggests choosing $X = |\mathcal{A}|$, making $r(\mathcal{A}) = r_1(\mathcal{A}) = 0$. Indeed, this natural choice is often used but one gains some flexibility by holding $X$ slightly untied. At any rate $X$ is close to $|\mathcal{A}|$, therefore $g(d)$ can be thought of as a probability of finding $a_n \in \mathcal{A}$ with $n \equiv 0 \pmod d$. We shall refer to $g(d)$ as the density function and to $g(d) X$ as the leading term.

Replacing each $|\mathcal{A}_d|$ in (1.14) by the approximation (1.15) we can execute completely the summation of leading terms getting

$$S(\mathcal{A}, P) = X V(P) + R(\mathcal{A}, P) \tag{1.17}$$

where

$$V(P) = \sum_{d \mid P} \mu(d) g(d) = \prod_{p \mid P} (1 - g(p)) \tag{1.18}$$

$$R(\mathcal{A}, P) = \sum_{d \mid P} \mu(d) r_d(\mathcal{A}). \tag{1.19}$$
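The decomposition (1.17) through (1.19) can be verified exactly on a toy case. In the sketch below the sifting sequence is the indicator of the integers 1 ≤ n ≤ x, and we take X = x and g(d) = 1/d, so that r_d(A) = [x/d] − x/d. These choices, the value of x and the helper name are assumptions made for illustration. Exact rational arithmetic is used so that both sides can be compared without rounding.

```python
from fractions import Fraction
from math import gcd


def squarefree_divisors(primes):
    """Return the pairs (d, mu(d)) for all divisors d of the product of distinct primes."""
    divisors = [(1, 1)]
    for p in primes:
        divisors += [(d * p, -mu) for d, mu in divisors]
    return divisors


primes = [2, 3, 5, 7]           # sifting range P = 2 * 3 * 5 * 7 = 210
P = 210
x = 1000                        # A = indicator of 1 <= n <= x, with X = x and g(d) = 1/d

divs = squarefree_divisors(primes)

V_sum = sum(mu * Fraction(1, d) for d, mu in divs)           # sum in (1.18)
V_prod = Fraction(1)
for p in primes:
    V_prod *= 1 - Fraction(1, p)                             # product in (1.18)

R = sum(mu * (x // d - Fraction(x, d)) for d, mu in divs)    # remainder (1.19)
S = sum(1 for n in range(1, x + 1) if gcd(n, P) == 1)        # sifted sum (1.13)

print(V_sum, V_prod)            # the two forms of V(P)
print(S, x * V_prod + R)        # both sides of (1.17)
```

Since each |r_d(A)| is less than 1 here, the total remainder is bounded by the number of divisors of P, which already hints at why the number of terms in (1.19) matters.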

Here, if the total remainder term $R(\mathcal{A}, P)$ is ignored, one is led to believe that

$$S(\mathcal{A}, P) \sim X V(P) \tag{1.20}$$

so we call $X V(P)$ the expected asymptotic value of $S(\mathcal{A}, P)$. Moreover, since $X$ approximates $|\mathcal{A}|$, the product $V(P)$ represents a probability of finding $a_n \in \mathcal{A}$ with $(n, P) = 1$. We shall refine the above concepts in due course, and as the theory develops we learn that some of the heuristics are not right.

1.5. The Leading and the Error Terms

In this section we give a glimpse of available estimates for the error term $r_d(\mathcal{A})$ in the approximation (1.15) for various sequences. We shall also realize what kind of properties are adequate for the density function $g(d)$ in the leading term. These selected examples will guide us to make reasonable hypotheses of general type. We begin by considering values of an integral polynomial $F$,

$$n = F(m) \quad \text{with } 1 \le m \le x. \tag{1.21}$$

Letting $\mathcal{A} = (a_n)$ be the indicator of these numbers we infer by splitting into residue classes modulo $d$ that

$$|\mathcal{A}_d| = \sum_{\nu} |\{1 \le m \le x : m \equiv \nu \pmod d\}| = \sum_{\nu} \left( \left[\frac{x - \nu}{d}\right] - \left[\frac{-\nu}{d}\right] \right) = \frac{\omega(d)}{d} x + \theta \omega(d) \tag{1.22}$$

where $\nu$ ranges over the roots and $\omega(d)$ is the number of roots of the congruence

$$F(\nu) \equiv 0 \pmod d. \tag{1.23}$$

Here and hereafter $\theta$ denotes any complex number with $|\theta| \le 1$, not the same one in each occurrence. This symbol may change its value even within one formula so one should get used to strange rules such as $\theta - \theta = 2\theta$.

For a linear polynomial $F(m) = qm + a$ with $q \ge 1$ and $(a, q) = 1$ there is one root of (1.23) to every modulus $d$ coprime with $q$; it is given by $\nu \equiv -a\bar{q} \pmod d$ where $\bar{q}$ denotes the multiplicative inverse of $q$ modulo $d$, i.e., $\bar{q} q \equiv 1 \pmod d$. The residue class $-a\bar{q} \pmod d$ varies with the modulus considerably if $q$ is large but not as broadly as for irreducible polynomials of degree 2 (see [DFI]).

For polynomials of degree $> 2$ we do not know enough about the actual distribution of roots of (1.23), however the number of these roots is well understood (on average at any rate). For applications of the sieve it is sufficient to know that $\omega(d)$ is a multiplicative function such that

$$\sum_{p \le z} \frac{\omega(p)}{p} \log p = k \log z + O(1) \tag{1.24}$$

where $k$ is the number of irreducible factors of $F$. The latter follows from the Prime Ideal Theorem in the decomposition field of $F$ (cf. [Lan]). After the above considerations for the polynomial sequence (1.21) it is easy to recommend the approximations (1.15) with $X = x$ and

$$g(d) = \frac{\omega(d)}{d}. \tag{1.25}$$

These choices yield error terms as small as (see (1.22))

$$|r_d(\mathcal{A})| \le \omega(d). \tag{1.26}$$
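The bound (1.26) can be observed on the polynomial F(m) = m² + 1 discussed in Section 1.3. In the sketch below ω(d) is found by brute force over the residues modulo d, and the remainder is computed with X = x and g(d) = ω(d)/d as in (1.25). The function names, the range of moduli and the value of x are our own choices for illustration.

```python
def omega(d):
    """Number of roots nu (mod d) of nu^2 + 1 = 0 (mod d), cf. (1.23)."""
    return sum(1 for nu in range(d) if (nu * nu + 1) % d == 0)


def A_d(x, d):
    """|A_d| for the sequence n = m^2 + 1 with 1 <= m <= x."""
    return sum(1 for m in range(1, x + 1) if (m * m + 1) % d == 0)


def remainder(x, d):
    """r_d(A) in (1.15) with X = x and g(d) = omega(d) / d."""
    return A_d(x, d) - x * omega(d) / d


x = 1000
for d in range(1, 60):
    # Each root class contributes x/d up to an error below 1, which gives (1.26).
    assert abs(remainder(x, d)) <= omega(d)

print([(p, omega(p)) for p in (2, 3, 5, 7, 11, 13, 17)])
```

The printed pairs illustrate the rule stated earlier: no roots for p ≡ 3 (mod 4) and two roots for p ≡ 1 (mod 4).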

The error terms of that size are manageable for moduli $d < x (\log x)^{-B}$.

Next we examine the polynomial values (1.21) with $m$ restricted to primes (for $F(m) = m - 2$ this is the case of the twin primes problem). As before we infer by splitting into residue classes modulo $d$ that

$$|\mathcal{A}_d| = \sum_{\nu} \pi(x; d, \nu)$$

where $\nu$ ranges over the roots of the congruence (1.23) and $\pi(x; d, \nu)$ denotes the number of primes $m \le x$ in the arithmetic progression $m \equiv \nu \pmod d$. If $(\nu, d) = 1$ we have the Siegel-Walfisz theorem (cf. [Dav])

$$\pi(x; d, \nu) = \frac{\pi(x)}{\varphi(d)} + O(x (\log x)^{-A}) \tag{1.27}$$

for any $A > 0$ with the implied constant depending only on $A$. If $(\nu, d) \ne 1$ then $m = (\nu, d)$ so these classes contribute to $|\mathcal{A}_d|$ at most $\omega(d)$. Hence

$$|\mathcal{A}_d| = \frac{\omega^*(d)}{\varphi(d)} \pi(x) + r_d(\mathcal{A}) \tag{1.28}$$

where

$$r_d(\mathcal{A}) = \sideset{}{^*}\sum_{\nu} \left( \pi(x; d, \nu) - \frac{\pi(x)}{\varphi(d)} \right) + \theta \omega(d). \tag{1.29}$$

Here the star in the summation restricts $\nu$ to the roots of (1.23) coprime with the modulus, and $\omega^*(d)$ denotes the number of such roots. By (1.27) we obtain $r_d(\mathcal{A}) \ll \omega(d) x (\log x)^{-A}$. This bound is inadequate (it exceeds the leading term in (1.28)) as soon as $d > (\log x)^A$. By the Riemann hypothesis for $L$-functions with Dirichlet characters of modulus $d$ one infers the formula (1.27) with a much better error term $O(x^{1/2} \log x)$, whence

$$r_d(\mathcal{A}) \ll \omega(d) x^{1/2} \log x$$

which is useful for $d$ as large as $x^{1/2} (\log x)^{-B}$. However we hesitate to use this or any other unproven assertions. In sieve applications it does not matter how sharp the bound for individual error terms $r_d(\mathcal{A})$ is; rather it is important to have adequate results with moduli $d$ as large as possible, and one really needs estimates only on average. In this respect, when it comes to counting primes in arithmetic progressions for the purpose of sieve theory, we have a satisfactory substitute for the Riemann hypothesis, namely the following result

$$\sum_{d \le D} \max_{(\nu, d) = 1} \left| \pi(x; d, \nu) - \frac{\pi(x)}{\varphi(d)} \right| \ll x (\log x)^{-A} \tag{1.30}$$

where $D = x^{1/2} (\log x)^{-B}$ with $B = B(A) > 0$ and $A$ is any positive number, the implied constant depending on $A$ alone. This result is due to E. Bombieri [B2] and in a slightly weaker form to A. I. Vinogradov [V]. The Riemann hypothesis implies not much more, nevertheless it is plausible that (1.30) holds with larger $D$. It is conjectured by P. Elliott and H. Halberstam [EH] that (1.30) holds true with $D = x^{1 - \varepsilon}$ for any $\varepsilon > 0$, the implied constant depending on $\varepsilon$ and $A$. On the other hand, as shown by J. Friedlander and A. Granville [FG], it would be false to claim (1.30) with $D = x (\log x)^{-B}$. In the case of (1.29) one derives from (1.30) by Cauchy’s inequality that

$$\sum_{d \le D} |r_d(\mathcal{A})| \ll x (\log x)^{-A} \tag{1.31}$$

for any $A > 0$ with $D = x^{1/2} (\log x)^{-B(A)}$. The conjecture of Elliott-Halberstam would give (1.31) with $D = x^{1 - \varepsilon}$.

1.6. Hypotheses on g(d) and r_d(A)

Let $\mathcal{A} = (a_n)$ be a sequence to be sifted by primes of a certain set $\mathcal{P}$. Naturally we do not need in $\mathcal{P}$ any prime $p$ for which all $a_n$ with $n \equiv 0 \pmod p$ vanish, i.e., such that $|\mathcal{A}_p| = 0$, because such $p$ does nothing in the sieve process. For example if $a_n$ carries the polynomial values $n = m^2 + 1$ with $m$ even then we can restrict $\mathcal{P}$ to primes $p \equiv 1 \pmod 4$. A finite collection of primes in $\mathcal{P}$ which are taken to the sieve process will be called the sifting range, and so also the product of these primes, say $P$.

For a given $\mathcal{A}$ there is at most one natural choice of the multiplicative function $g$ which produces an adequate approximation (1.15). Since $g(p)$ occurs at primes $p \in \mathcal{P}$ only we can (for notational convenience) arbitrarily revise the natural values of $g(p)$ at $p \notin \mathcal{P}$; we set

$$g(p) = 0 \quad \text{if } p \notin \mathcal{P}. \tag{1.32}$$

The modified error terms $r_d(\mathcal{A})$ with $d$ having a prime divisor outside the sifting range have no effect on results. Next, in order to be able to examine the main terms in sieve estimates, we are going to assume some regularity in the distribution of $g$ over primes. A reasonable condition (cf. (1.24)) would be that

$$\sum_{p \le z} g(p) \log p = \kappa \log z + O(1) \tag{1.33}$$

where $\kappa > 0$ is a constant. Comparing with Mertens’ formula (1.97) one can interpret (1.33) as saying that $g(p) p$ is $\kappa$ on average. By partial summation (1.33) yields

$$\sum_{p \le z} g(p) = \kappa \log\log z + \alpha + O((\log z)^{-1}) \tag{1.34}$$

where $\alpha$ is a certain constant, hence

$$\prod_{p \le z} (1 - g(p)) = \beta (\log z)^{-\kappa} \left( 1 + O((\log z)^{-1}) \right) \tag{1.35}$$

where $\beta$ is a positive constant. Precisely, by another formula of Mertens (1.99) we have

$$\beta = e^{-\gamma\kappa} \prod_p (1 - g(p)) \left( 1 - \frac{1}{p} \right)^{-\kappa}.$$
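The asymptotic (1.35) can be observed numerically in the simplest case g(p) = 1/p, for which κ = 1 and the product defining β is empty, so that β = e^{−γ}. The sketch below compares the product over p ≤ z with e^{−γ}/log z. The choice of g, the sample values of z, the hard-coded numerical value of Euler's constant and the helper names are assumptions made for this illustration.

```python
from math import exp, isqrt, log

EULER_GAMMA = 0.5772156649015329     # Euler's constant gamma (numerical value)


def primes_up_to(z):
    """Primes p <= z by the Eratosthenes sieve (z >= 2 assumed)."""
    is_prime = [True] * (z + 1)
    is_prime[0] = is_prime[1] = False
    for n in range(2, isqrt(z) + 1):
        if is_prime[n]:
            for multiple in range(n * n, z + 1, n):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]


def V(z):
    """Product over p <= z of (1 - g(p)) with the assumed density g(p) = 1/p."""
    product = 1.0
    for p in primes_up_to(z):
        product *= 1 - 1 / p
    return product


def ratio(z):
    """Left side of (1.35) divided by its main term beta * (log z)^(-kappa), kappa = 1."""
    return V(z) * log(z) / exp(-EULER_GAMMA)


for z in (10**2, 10**3, 10**4, 10**5):
    print(z, ratio(z))
```

The printed ratios should approach 1 as z grows, in line with the factor 1 + O((log z)^{-1}) in (1.35).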

Much of sieve theory requires weaker conditions and some methods require nothing of $g(p)$ at all. In these lectures we shall be working with a one-sided inequality of type

$$\prod_{w \le p < z} (1 - g(p))^{-1} \le K \left( \frac{\log z}{\log w} \right)^{\kappa} \tag{1.36}$$

for any $z > w \ge 2$ where $K$ is a constant $> 1$. Note that this inequality implies

$$g(p) \le 1 - \frac{1}{K}.$$

The upper bound in (1.36) controls the probabilities $g(p)$ on average over the primes in the sifting range. The exponent $\kappa$ captures the size of the sieve; it combines in one number a measure of the density of the set $\mathcal{P}$ and a measure of the susceptibility of the elements of $\mathcal{A}$ to being sifted by the primes in $\mathcal{P}$. We shall call $\kappa$ the sieve dimension; note that any larger value of $\kappa$ also serves.

There are some advantages to postulating the one-sided inequality (1.36) instead of the asymptotic (1.35). First of all the sieve results for dimension $\kappa$ will be automatically valid for any smaller dimension. In particular we can remove any inconvenient prime from the sifting range without losing results. These adjustments must not be performed on a grand scale but rather occasionally to clear technical obstacles. One of the problems involved could be with parameters upon which the sequence $\mathcal{A}$ depends and which distort its asymptotic behaviour, yet by making slight modifications one may achieve an adequate uniformity without altering the condition (1.36).

In practice we can establish by elementary means the following bound

$$\prod_{w \le p < z} (1 - g(p))^{-1} \le \left( \frac{\log z}{\log w} \right)^{\kappa} \left( 1 + \frac{L}{\log w} \right) \tag{1.37}$$

for any $z > w \ge 2$ where $L$ is a positive constant. Note that this inequality implies

$$g(p) \le \frac{L}{L + \log p}$$

and it yields (1.36) with the constant $K = 1 + L/\log 2$. If this $K$ is too large (not acceptable) one can put aside a few primes from the sifting range, say all $p < y$ for some $y$, and deduce (1.36) with $K = 1 + L/\log y$. Thus (1.36) can be assumed to hold with $K$ close to 1 provided $\mathcal{P}$ contains no small primes. For small primes one can apply exact sifting by means of Möbius’ inversion or Legendre’s formula (see A1.1). Another way to secure (1.36) with $K$ close to 1 goes by enlarging the dimension $\kappa$. Indeed, the right-hand side of (1.37) is bounded by

$$\left( \frac{\log z}{\log w} \right)^{\kappa + \varepsilon} \left( 1 + \frac{L}{\log y} \left( \frac{\log y}{\log z} \right)^{\varepsilon} \right)$$

provided $\mathcal{P}$ contains no primes $p < y$. Hence we have (1.36) with

$$K = 1 + L (\log y)^{\varepsilon - 1} (\log z)^{-\varepsilon} \tag{1.38}$$


and $\kappa + \varepsilon$ in place of $\kappa$. Since $z$ does not change through the applied arguments, and it is usually large, the constant $K$ given by (1.38) is fine, even for $y = 2$ (no preliminary sifting is needed). In particular if we increase the dimension by 1 we have (1.36) with $K = 1 + L(\log z)^{-1}$.

For numerous applications we derive from (1.36) the following handy estimates:

Lemma 1.1. Let $g$ be a multiplicative function with $0 \le g(p) < 1$ such that (1.36) holds for all $w$ with $y \le w \le z$. Let $h$ be a continuous, non-negative and non-decreasing function on the segment $[y, z]$. Then we have two inequalities
$$\sum_{y \le p < z} g(p)h(p)V(p) \le -KV(z)\int_y^z h(w)\, d\left(\frac{\log z}{\log w}\right)^{\kappa} + (K - 1)h(z)V(z),$$
$$\sum_{y \le p < z} g(p)h(p)V(p) \le -V(z)\int_y^z h(w)\, d\left(\frac{\log z}{\log w}\right)^{\kappa} + \left(1 - \frac{1}{K}\right)h(z)V(y).$$

Proof. By partial summation the sum is equal to
$$G(y)h(y) + \int_y^z G(w)\, dh(w)$$
where
$$G(w) = \sum_{w \le p < z} g(p)V(p) = V(w) - V(z) < \left\{K\left(\frac{\log z}{\log w}\right)^{\kappa} - 1\right\}V(z)$$
by (1.36). Applying this bound and then integrating by parts we obtain the first inequality. By a slightly different bound
$$G(w) < \left\{\left(\frac{\log z}{\log w}\right)^{\kappa} - 1\right\}V(z) + \left(1 - \frac{1}{K}\right)V(y)$$

we obtain the second inequality.

Concerning the error terms $r_d(\mathcal{A})$ in the approximation (1.15), we do not need to control each of them separately. As we pass through a sieve the relevant error terms $r_d(\mathcal{A})$ are accumulated into a sum of type
$$R(\mathcal{A}, P, \Lambda) = \sum_{d \mid P} \lambda_d r_d(\mathcal{A})$$


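with certain weights $\lambda_d$. The weights are not large; very often $|\lambda_d| \le 1$ and $\lambda_d = 0$ if $d \ge D$ for some $D \ge 2$. In this case $\lambda_d r_d(\mathcal{A})$ can be replaced by $|r_d(\mathcal{A})|$. One could postulate that
$$R(\mathcal{A}, P; D) = \sum_{d \mid P,\ d < D} |r_d(\mathcal{A})| \le \varepsilon XV(P) \tag{1.39}$$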
with a small $\varepsilon > 0$ (recall that $XV(P)$ is the expected value of $S(\mathcal{A}, P)$, see (1.20)). However we prefer to make no such assertion in the general framework of sieve theory. The total remainder term will be left unestimated in the general results; only in specific cases do we examine it thoroughly to complete the job. For example, as we know, in many sieve problems there holds
$$|r_d(\mathcal{A})| \le g(d)d \quad \text{for } d \mid P. \tag{1.40}$$
In this simple, yet common, case we derive
$$R(\mathcal{A}, P; D) \le D\sum_{d \mid P} g(d) = D\prod_{p \mid P}(1 + g(p)).$$
Hence
$$R(\mathcal{A}, P; D) \le DV(P)^{-1}. \tag{1.41}$$
This bound satisfies (1.39) for any $D \le \varepsilon XV(P)^{2}$. Assuming $g(d)d \ge 1$ we can do slightly better by using (4.54), namely
$$R(\mathcal{A}, P; D) \le \frac{D\varphi(P)}{PV(P)}. \tag{1.42}$$

In general we call $R(\mathcal{A}, P; D)$ the remainder term of range $P$ and level $D$. The larger the level $D$ one can afford subject to (1.39), the stronger the sieve bounds obtained. When constructing a sieve one tends to focus on the main terms; however, one should not lose contact with $R(\mathcal{A}, P, \Lambda)$ completely. In particular one should not rush to insert absolute values, since there is a chance to exploit cancellation among the error terms $r_d(\mathcal{A})$ with the effect of admitting larger $D$. We shall develop various shapes of the remainder term which have special features in due course.

1.7. A General Application of the Eratosthenes-Legendre Sieve

We show quickly what the inclusion-exclusion formula is capable of when (1.40) holds. Inserting (1.41) with $D = P$ into (1.17) we obtain
$$S(\mathcal{A}, P) = XV(P) + \theta PV(P)^{-1}. \tag{1.43}$$
Suppose all primes of the sifting range $P$ are $\le z$. Then $V(P)^{-1} \le K(2\log z)^{\kappa}$ by (1.36), and $P \le 4^{z}$ by an elementary inequality of Tchebyshev's type. Therefore (1.43) yields
$$S(\mathcal{A}, P) = V(P)\{X + O(5^{z})\}. \tag{1.44}$$

Hence we conclude that the expected asymptotic formula (1.20) holds true as $X \to \infty$ uniformly in $z \le \frac{1}{2}\log X$.

1.8. A Simple Case of a Local Sieve

In this section we demonstrate a device for improving the result of the previous section. There is not enough profit from working in a general context so we stick to the simplest of all important sequences $\mathcal{A} = (a_n)$, namely that with $a_n = 1$ for $1 \le n \le x$. Our aim is merely to point out how to reduce the number of terms in the Legendre formula by taking advantage of the fact that a natural number has no divisors larger than itself. We begin by repeating the argument of the previous section in the context of
$$\varphi(x, m) = |\{1 \le n \le x : (n, m) = 1\}| \tag{1.45}$$
where $m$ is a positive integer and $x \ge 1$. In the Legendre formula
$$\varphi(x, m) = \sum_{d \mid m} \mu(d)\left[\frac{x}{d}\right]$$
we insert
$$\left[\frac{x}{d}\right] = \frac{x}{d} - \left\{\frac{x}{d}\right\}$$
where $\{y\}$ denotes the fractional part of $y$, getting
$$\varphi(x, m) = x\prod_{p \mid m}\left(1 - \frac{1}{p}\right) - R$$
where
$$R = \sum_{d \mid m} \mu(d)\left\{\frac{x}{d}\right\}.$$


At this point one usually treats the remainder term $R$ by estimating its individual terms trivially by 1 (surely, one cannot find a much better approximation than this because the cardinality of sets is an integer-valued function), which gives $|R| \le 2^{\omega(m)}$ where $\omega(m)$ denotes the number of distinct prime factors of $m$. Hence
$$\varphi(x, m) = x\prod_{p \mid m}\left(1 - \frac{1}{p}\right) + \theta 2^{\omega(m)}. \tag{1.46}$$
Actually one can show (1.46) with the error term $\theta 2^{\omega(m)-1}$ if $m > 1$. Sometimes the Legendre formula yields a pure product without error terms. As an example we give the Euler phi-function $\varphi(m) = |\{1 \le n \le m : (n, m) = 1\}|$, which is the case $x = m$. Indeed we have $\{m/d\} = 0$ for any $d \mid m$, therefore $R = 0$ and
$$\varphi(m) = m\prod_{p \mid m}\left(1 - \frac{1}{p}\right). \tag{1.47}$$
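As a quick numerical illustration of the Legendre formula and of (1.46), (1.47), the following Python sketch compares the inclusion-exclusion count with a direct count. The helper names (`prime_factors`, `phi_legendre`, `phi_direct`) and the sample values of `x` and `m` are illustrative choices, not part of the text.

```python
from itertools import combinations
from math import gcd, prod

def prime_factors(m):
    """Distinct prime factors of m by trial division."""
    ps, p = [], 2
    while p * p <= m:
        if m % p == 0:
            ps.append(p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:
        ps.append(m)
    return ps

def phi_legendre(x, m):
    """phi(x, m) as the sum over squarefree d | m of mu(d) * floor(x / d)."""
    ps = prime_factors(m)
    total = 0
    for r in range(len(ps) + 1):
        for combo in combinations(ps, r):
            total += (-1) ** r * (x // prod(combo))
    return total

def phi_direct(x, m):
    """Count 1 <= n <= x coprime to m one by one."""
    return sum(1 for n in range(1, x + 1) if gcd(n, m) == 1)

x, m = 1000, 2 * 3 * 5 * 7 * 11          # sample values (assumption)
ps = prime_factors(m)
main_term = x * prod(1 - 1 / p for p in ps)
count = phi_legendre(x, m)
# (1.46): the count differs from the main term by at most 2^omega(m)
within_bound = abs(count - main_term) <= 2 ** len(ps)
```

Taking `x = m` makes every fractional part vanish, so `phi_legendre(m, m)` reproduces the Euler function as in (1.47).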

L. Euler [E2] established this formula already in 1784, using the idea of inclusion-exclusion two decades before Legendre.

The asymptotic formula (1.46), although not exact for all $x$, is quite strong if $m$ has few prime divisors (it does not matter how large they are) but it loses its meaning very quickly as $\omega(m)$ increases with $x$. The remainder term exceeds the main term already for $m$ with $\omega(m) > 2\log x$. Subtracting $\varphi(x, m)$ from $\varphi(x + y, m)$ we infer from (1.46) that any interval of length $y > m\varphi(m)^{-1}2^{\omega(m)}$ contains an integer coprime with $m$, whereas the Jacobsthal conjecture asserts this is true for any interval of length $y > c\,\omega(m)^2$ where $c$ is a positive constant (a very strong result is stated in Corollary 5.4).

Our estimates of the error terms $\{x/d\}$ were poor for $d > x$. We now improve upon these by applying the following convexity inequality
$$\left\{\frac{x}{d}\right\} \le \min\left\{1, \frac{x}{d}\right\} \le \left(\frac{x}{d}\right)^{\alpha}$$
where $\alpha$ can be chosen at will subject to $0 \le \alpha \le 1$. This device yields
$$|R| \le x^{\alpha}\prod_{p \mid m}(1 + p^{-\alpha}) \le x^{\alpha}\frac{\varphi(m)}{m}\prod_{p \le z}\left(1 - \frac{1}{p}\right)^{-1}\left(1 + \frac{1}{p^{\alpha}}\right)$$


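where $z$ is any number with $\pi(z) \ge \omega(m)$. We choose $\alpha$ such that $z^{1-\alpha} = 3$, so $p \le 3p^{\alpha}$ for any $p \le z$, getting
$$|R| \le x\, 3^{-\frac{\log x}{\log z}}\, \frac{\varphi(m)}{m}\prod_{p \le z}\left(1 - \frac{1}{p}\right)^{-1}\left(1 + \frac{3}{p}\right).$$
Hence we conclude the following approximate formula
$$\varphi(x, m) = \frac{\varphi(m)}{m}\, x\left\{1 + O\left(3^{-\frac{\log x}{\log z}}\log^4 z\right)\right\}. \tag{1.48}$$
Choosing $z = x^{1/(5\log\log x)}$ we get
$$\varphi(x, m) = \frac{\varphi(m)}{m}\, x\left\{1 + O\left(\frac{1}{\log x}\right)\right\} \tag{1.49}$$
provided $\omega(m) < x^{1/(6\log\log x)}$. For $m = P(z)$, the product of all primes $p \le z$, the formula (1.48) becomes
$$\Phi(x, z) = x\prod_{p \le z}\left(1 - \frac{1}{p}\right)\left\{1 + O\left(3^{-s}\log^4 z\right)\right\} \tag{1.50}$$
where $s = \log x/\log z \ge 1$ and the implied constant is absolute. Hence
$$\Phi(x, z) = x\prod_{p \le z}\left(1 - \frac{1}{p}\right)\left\{1 + O\left(\frac{1}{\log x}\right)\right\} \tag{1.51}$$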

uniformly for $z \le x^{1/(6\log\log x)}$. This range of uniformity is much larger than that achieved in the last section; however, (1.51) is not as general as (1.44) (our device would fail if the sequence $\mathcal{A}$ were lacunary or supported in a short segment located far away from the origin).

It would be naive to think that (1.51) remains true for all $z \le x$. Suppose $\sqrt{x} \le z \le \varepsilon(x)x$ with $\varepsilon(x) \to 0$ as $x \to \infty$. In this range we have $\Phi(x, z) = \pi(x) - \pi(z) + 1$, which is constant in $z$ asymptotically, whereas the product in (1.51) varies in $z$ according to the Mertens formula (1.99). This observation reveals that not all of the error terms $\mu(d)\{x/d\}$ can be treated crudely or ignored. To make a point we return to the Legendre formula
$$\Phi(x, z) = \sum_{d \mid P(z)} \mu(d)\left[\frac{x}{d}\right]. \tag{1.52}$$


First we delete the terms with $d > x$ since they vanish, then we write
$$\Phi(x, z) = x\sum_{d \mid P(z),\ d \le x} \mu(d)d^{-1} - \sum_{d \mid P(z),\ d \le x} \mu(d)\left\{\frac{x}{d}\right\} = xG(x, z) + R(x, z),$$
say.

Now it turns out that $xG(x, z)$ is the right main term, in other words $R(x, z)$ is small, but proving this is the crux of the matter. A quick inspection reveals that $R(x, z)$ is small not because it contains fewer terms but rather due to the sign changes of the Möbius function, which cause considerable cancellation. Using $\Phi(x, x) = 1$ and the trivial bound $|R(x, x)| \le x$ one derives
$$\left|\sum_{d \le x} \mu(d)d^{-1}\right| \le 1. \tag{1.53}$$
Hence one sees indeed that $\mu(d)$ changes sign quite often but, of course, the inequality (1.53) is not sufficient to estimate successfully the remainder term $R(x, z)$ when $z$ is large.


1.9. Epilogue - The Sieve Weights

In view of the attempts made so far to derive a reasonable estimate for the sifted sum
$$S(\mathcal{A}, P) = \sum_{(n, P) = 1} a_n$$
from the Legendre expansion
$$S(\mathcal{A}, P) = \sum_{d \mid P} \mu(d)|\mathcal{A}_d|,$$
and of the problems involved with the errors $r_d(\mathcal{A})$, one comes up promptly with the suggestion to drop some $|\mathcal{A}_d|$ before employing the approximations $|\mathcal{A}_d| = g(d)X + r_d(\mathcal{A})$, so that these are applied to fewer terms. Thus one is willing to trade the exact expansion for inequalities of type
$$\sum_{d \mid P,\ d \in D^-} \mu(d)|\mathcal{A}_d| \le S(\mathcal{A}, P) \le \sum_{d \mid P,\ d \in D^+} \mu(d)|\mathcal{A}_d| \tag{1.54}$$
where $D^-$ and $D^+$ are relatively small sets, provided the resulting bounds are meaningful. There are still more possibilities with weighted inequalities
$$\sum_{d \mid P} \lambda_d^-|\mathcal{A}_d| \le S(\mathcal{A}, P) \le \sum_{d \mid P} \lambda_d^+|\mathcal{A}_d| \tag{1.55}$$
where $\Lambda^- = \{\lambda_d^-\}$ and $\Lambda^+ = \{\lambda_d^+\}$ are real sequences supported on relatively small sets, say we have
$$\lambda_d^- = \lambda_d^+ = 0 \quad \text{if } d \ge D. \tag{1.56}$$
At this point we assume no estimate for $|\mathcal{A}_d|$ other than the obvious one $|\mathcal{A}_d| \ge 0$. These quantities are not completely arbitrary; in addition to positivity we know that $|\mathcal{A}_d|$ can be expressed as the congruence sum
$$|\mathcal{A}_d| = \sum_{n \equiv 0\ (\mathrm{mod}\ d)} a_n.$$


Based on this information only we seek two universal systems of weights $\Lambda^-$ and $\Lambda^+$ such that the inequalities (1.55) are valid for any sequence $\mathcal{A} = (a_n)$ with $a_n \ge 0$. This requirement is equivalent to the assertion that for any $m \mid P$
$$\sum_{d \mid m} \lambda_d^- \le \sum_{d \mid m} \mu(d) \le \sum_{d \mid m} \lambda_d^+. \tag{1.57}$$
For a proof use the sequence
$$a_n = \delta_{m,n} = \begin{cases} 1 & \text{if } n = m \\ 0 & \text{otherwise.} \end{cases} \tag{1.58}$$
To secure (1.57) for $m = 1$ we set
$$\lambda_1^- = \lambda_1^+ = 1 \tag{1.59}$$
whereas for $m \ne 1$ the properties (1.57) read as
$$\sum_{d \mid m} \lambda_d^- \le 0 \le \sum_{d \mid m} \lambda_d^+. \tag{1.60}$$

The replacement (or deformation) of the Möbius function $\mu(d)$ by $\lambda_d^-$ and $\lambda_d^+$ in the above fashion constitutes the underlying principle of any sieve method. We call $\Lambda^- = \Lambda^-(D)$ and $\Lambda^+ = \Lambda^+(D)$ the lower bound sieve and the upper bound sieve of level $D$ respectively. Actual constructions of $\Lambda^- = \{\lambda_d^-\}$ and $\Lambda^+ = \{\lambda_d^+\}$ take place out of the context of the sequence $\mathcal{A} = (a_n)$ to be sifted. When $\Lambda^-$, $\Lambda^+$ are obtained by truncating the Möbius function these are called combinatorial sieves.

Certain types of sieves, the combinatorial ones included, can be constructed naturally by introducing some restrictions to the inclusion-exclusion steps in every second position depending on the sign of $\mu(d)$. A procedure of this kind was first carried out by Jean Merlin [M1,2]; unfortunately he was killed during WWI when he was very young and was deprived of the chance for his ideas to mature. Soon after, Viggo Brun [Br1,2,3] presented a powerful idea of his own with impressive applications to the twin prime conjecture. Brun's works started the true life of sieve theory. I would compare this moment in the history of prime numbers to that of the publication of the memoir on the zeta-function by Bernhard Riemann [R].

Introducing the approximations (1.15) into (1.55) we obtain
$$XV(\Lambda^-) + R(\mathcal{A}, P, \Lambda^-) \le S(\mathcal{A}, P) \le XV(\Lambda^+) + R(\mathcal{A}, P, \Lambda^+) \tag{1.61}$$


where
$$V(\Lambda) = \sum_{d \mid P} \lambda_d g(d), \tag{1.62}$$
$$R(\mathcal{A}, P, \Lambda) = \sum_{d \mid P} \lambda_d r_d(\mathcal{A}). \tag{1.63}$$

We shall call $XV(\Lambda)$ and $R(\mathcal{A}, P, \Lambda)$ the main and the remainder terms respectively. Hence a strategy for the construction of the sieves $\{\lambda_d^-\}$ and $\{\lambda_d^+\}$ should aim to make $V(\Lambda^-)$ maximal and $V(\Lambda^+)$ minimal subject to the normalization condition (1.59), the linear inequalities (1.60) and the restriction (1.56). The last restriction on the support of $\lambda_d$ controls the total number of error terms $r_d(\mathcal{A})$. What about the size of $\lambda_d$? Fortunately, it comes as a bonus that any sieve weights which are chosen wisely according to the above strategy are not large. The following bound
$$|\lambda_d| \le \gamma^{\omega(d)} \tag{1.64}$$
with some constant $\gamma \ge 1$ can be verified after the construction of various types of sieves, even though it plays no role in the main term optimization strategy. In the case of a combinatorial sieve we have (1.64) with $\gamma = 1$, and in the case of the Selberg sieve with $\gamma = 3$ (see (4.21)). If (1.64) holds then the remainder term $R(\mathcal{A}, P, \Lambda)$ is usually treated by taking absolute values, so one deals with
$$R_\gamma(\mathcal{A}, D) = \sum_{d < D} \gamma^{\omega(d)}|r_d(\mathcal{A})|. \tag{1.65}$$
In many cases (a few of these are presented in Section 1.5) analytic number theory provides estimates of type
$$R_\gamma(\mathcal{A}, D) \ll X(\log X)^{-A} \tag{1.66}$$
for $D = X^{\alpha - \varepsilon}$ with some $0 < \alpha \le 1$ and any $\varepsilon > 0$, $\gamma \ge 1$, $A > 0$, the implied constant depending on $\varepsilon, \gamma, A$. In this case we say $\mathcal{A}$ has level $D$ and level exponent $\alpha$. Thus (1.66) allows us to treat $\mathcal{A}$ by any sieve $\Lambda$ of level $D$. Of course, (1.66) implies
$$R(\mathcal{A}, P, \Lambda) \ll X(\log X)^{-A}; \tag{1.67}$$
however, (1.67) is known to hold for sieves of larger level due to a cancellation of terms $\lambda_d r_d(\mathcal{A})$. If (1.67) holds for a sieve $\Lambda(D)$ we say that the sequence $\mathcal{A}$ has level $D(\Lambda)$ relative to $\Lambda$.


Some sieves are difficult to perform because of technical problems caused by small primes in the sifting range. For example our hypotheses (1.36), (1.37) cannot be sharp if $w$ is small (it requires long averaging to create a strong approximation to a smooth function). For this reason one may prefer to apply first a simple, yet quite precise sieve $\Lambda_1$ to sift out numbers having small prime divisors and a different, more sophisticated sieve $\Lambda_2$ in the complementary range for the remaining numbers. With this aim in mind we describe the following principles of composition of sieves.

Let $(d)$ denote the set of all prime divisors of $d$. Suppose $P$ is the union of two disjoint sets of primes $P_1$ and $P_2$. Thus any $d$ with $(d) \subset P = P_1 \cup P_2$ factors uniquely as $d = d_1 d_2$ with $(d_1) \subset P_1$ and $(d_2) \subset P_2$. Given two sequences $\Lambda_1 = \{\lambda_{d_1}\}$ and $\Lambda_2 = \{\lambda_{d_2}\}$ we define the product $\Lambda = \Lambda_1\Lambda_2 = \{\lambda_d\}$ by setting
$$\lambda_d = \lambda_{d_1}\lambda_{d_2} \quad \text{if } d = d_1 d_2.$$
Clearly, if $\Lambda_1^+$, $\Lambda_2^+$ are upper bound sieves of level $D_1$, $D_2$ respectively then
$$\Lambda^+ = \Lambda_1^+\Lambda_2^+ \tag{1.68}$$
is an upper bound sieve of level $D = D_1 D_2$, for we have
$$\lambda^+ * 1 = (\lambda_1^+ * 1)(\lambda_2^+ * 1) \ge (\mu * 1)(\mu * 1) = \mu * 1.$$
Moreover we have
$$V(\Lambda^+) = V(\Lambda_1^+)V(\Lambda_2^+). \tag{1.69}$$
Observe that upper bounds for $V(\Lambda_1^+)$, $V(\Lambda_2^+)$ yield an upper bound for $V(\Lambda^+)$.

To compose a $\Lambda^-$ sieve we use the four sieves $\Lambda_1^-, \Lambda_1^+, \Lambda_2^-, \Lambda_2^+$ in the following fashion (there are other options as well)
$$\Lambda^- = \Lambda_1^-\Lambda_2^+ + \Lambda_2^-\Lambda_1^+ - \Lambda_1^+\Lambda_2^+. \tag{1.70}$$
This is a lower bound sieve of level $D_1 D_2$. Indeed we have
$$\begin{aligned}
\lambda^- * 1 &= (\lambda_1^- * 1)(\lambda_2^+ * 1) + (\lambda_2^- * 1 - \lambda_2^+ * 1)(\lambda_1^+ * 1) \\
&\le (\mu * 1)(\lambda_2^+ * 1) + (\lambda_2^- * 1 - \lambda_2^+ * 1)(\mu * 1) \\
&= (\mu * 1)(\lambda_2^- * 1) \le (\mu * 1)(\mu * 1) = \mu * 1
\end{aligned}$$
because $\lambda_2^+ * 1 \ge \lambda_2^- * 1$. Moreover we have
$$\begin{aligned}
V(\Lambda^-) &= V(\Lambda_1^-)V(\Lambda_2^+) + V(\Lambda_2^-)V(\Lambda_1^+) - V(\Lambda_1^+)V(\Lambda_2^+) \\
&= V(\Lambda_1^-)V(\Lambda_2^-) - [V(\Lambda_1^+) - V(\Lambda_1^-)][V(\Lambda_2^+) - V(\Lambda_2^-)].
\end{aligned} \tag{1.71}$$
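The composition rules (1.68) and (1.70) can be checked by brute force on a small example. The sketch below is only an illustration under stated assumptions: the two prime sets, the use of truncated Möbius weights (Möbius function restricted to fewer than $r$ prime factors, with $r$ odd for an upper bound and $r$ even for a lower bound) and all function names are choices made here, not taken from the text.

```python
from itertools import combinations

P1, P2 = (2, 3, 5), (7, 11, 13, 17)      # disjoint prime sets (assumption)

def divisors(primes):
    """All squarefree divisors of prod(primes), each as a tuple of primes."""
    return [c for k in range(len(primes) + 1) for c in combinations(primes, k)]

def pure(r):
    """Truncated Moebius weights: d -> mu(d) if omega(d) < r, else 0."""
    return lambda d: (-1) ** len(d) if len(d) < r else 0

# r odd gives an upper bound sieve, r even a lower bound sieve
up1, low1 = pure(3), pure(2)
up2, low2 = pure(3), pure(2)

def split(d):
    """Factor d = d1 * d2 with d1 built from P1 and d2 from P2."""
    return tuple(p for p in d if p in P1), tuple(p for p in d if p in P2)

def lam_plus(d):                          # product sieve (1.68)
    d1, d2 = split(d)
    return up1(d1) * up2(d2)

def lam_minus(d):                         # composite lower bound sieve (1.70)
    d1, d2 = split(d)
    return low1(d1) * up2(d2) + low2(d2) * up1(d1) - up1(d1) * up2(d2)

def composition_is_valid():
    """Check (1.57) for every m dividing the product of P1 and P2."""
    for m in divisors(P1 + P2):
        delta = 1 if len(m) == 0 else 0   # sum of mu(d) over d | m
        s_plus = sum(lam_plus(d) for d in divisors(m))
        s_minus = sum(lam_minus(d) for d in divisors(m))
        if not (s_minus <= delta <= s_plus):
            return False
    return True
```

Calling `composition_is_valid()` runs through all 128 squarefree `m` built from the seven primes; the normalization (1.59) corresponds to `lam_minus(()) == lam_plus(()) == 1`.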


Hence observe that in order to get a lower bound for $V(\Lambda^-)$ one needs lower bounds for $V(\Lambda_1^-)$, $V(\Lambda_2^-)$ and upper bounds for $V(\Lambda_1^+)$, $V(\Lambda_2^+)$.

If each of $\Lambda_1^-, \Lambda_1^+, \Lambda_2^-, \Lambda_2^+$ is a combinatorial sieve (a suitable truncation of the Möbius function) then the composite sieves $\Lambda^+$ and $\Lambda^-$ take only the three values $0, \pm 1$ (for $\Lambda^-$ these values may disagree with the Möbius function, so $\Lambda^-$ is not a combinatorial sieve in the strict sense).

Another generally useful property is the following principle of monotonicity of sieves:
$$\sum_{d \mid P} \lambda_d^+ g(d) \ge \frac{g(q)}{h(q)}\sum_{d \mid P,\ (d, q) = 1} \lambda_d^+ g(d), \tag{1.72}$$
$$\sum_{d \mid P} \lambda_d^- g(d) \le \frac{g(q)}{h(q)}\sum_{d \mid P,\ (d, q) = 1} \lambda_d^- g(d) \tag{1.73}$$
for any $q \mid P$, where $h$ is the multiplicative function with
$$h(p) = \frac{g(p)}{1 - g(p)}.$$
To see these inequalities we write $\rho = 1 * \lambda$, so $\lambda = \mu * \rho$ and
$$\begin{aligned}
\sum_{(d, q) = 1} \lambda_d g(d) &= \sum\sum_{(ab, q) = 1} \mu(a)\rho(b)g(a)g(b) \\
&= \sum_{(b, q) = 1} \rho(b)g(b)\prod_{p \nmid bq}(1 - g(p)) \\
&= \frac{h(q)}{g(q)}\sum_{(b, q) = 1} \rho(b)g(b)\prod_{p \nmid b}(1 - g(p)).
\end{aligned}$$
Here we assumed that $g$ is supported on divisors of $P$. If we drop the condition $(b, q) = 1$ in the last summation we obtain the asserted inequalities. From these we deduce by the inclusion-exclusion argument that for any $p \mid P$
$$\sum_{d \mid P,\ p \mid d} \lambda_d^+ g(d) \ge -h(p)\sum_{d \mid P} \lambda_d^+ g(d), \tag{1.74}$$
$$\sum_{d \mid P,\ p \mid d} \lambda_d^- g(d) \le -h(p)\sum_{d \mid P} \lambda_d^- g(d). \tag{1.75}$$
Now suppose $u$, $v$ are non-negative additive and multiplicative functions respectively. We consider the weight function of type
$$w(d) = \sum_{q \mid d} u(q)v(q) = \left(\sum_{p \mid d} \frac{u(p)v(p)}{1 + v(p)}\right)\prod_{p \mid d}(1 + v(p)). \tag{1.76}$$


Let $f$ be the multiplicative function with
$$f(p) = g(p)(1 + v(p)). \tag{1.77}$$
Suppose
$$0 \le f(p) < 1. \tag{1.78}$$
Applying the above inequalities with $f$ in place of $g$ and primes $p \le D$ we get
$$\sum_{d \mid P} \lambda_d^+ g(d)w(d) \ge -\sigma(D)\sum_{d \mid P} \lambda_d^+ f(d), \tag{1.79}$$
$$\sum_{d \mid P} \lambda_d^- g(d)w(d) \le -\sigma(D)\sum_{d \mid P} \lambda_d^- f(d), \tag{1.80}$$
where
$$\sigma(D) = \sum_{p < D} u(p)v(p)g(p)(1 - f(p))^{-1}. \tag{1.81}$$
These inequalities will be employed in the theory of Bombieri's sieve (see Chapter 9).


APPENDIX FOR CHAPTER 1

Arithmetic functions

A1.1. The Möbius Inversion Formulas

If $m = p_1 \cdots p_r$ is the product of $r$ distinct primes, we define $\mu(m) = (-1)^r$, which is $1$ or $-1$ according to whether $r$ is even or odd; in particular $\mu(1) = 1$. For any other $m$, i.e., if $m$ is not squarefree, we set $\mu(m) = 0$. This makes $\mu$ a function on $\mathbb{N}$; it was introduced in 1832 by A.F. Möbius [Möb]. Besides its place in sieve theory the Möbius function appears in central areas of analytic number theory. Through the use of the Euler product for the Riemann zeta-function
$$\zeta(s) = \sum_{n=1}^{\infty} n^{-s} = \prod_{p}(1 - p^{-s})^{-1}$$
one observes that $\mu(m)$ are the coefficients of
$$\zeta(s)^{-1} = \sum_{m=1}^{\infty} \mu(m)m^{-s} = \prod_{p}(1 - p^{-s}).$$
Multiplying these Dirichlet series and Euler products one gets
$$\sum_{\ell=1}^{\infty}\left(\sum_{\ell = mn} \mu(m)\right)\ell^{-s} = 1$$
and comparing the coefficients one infers the fundamental property of the Möbius function
$$\sum_{m \mid \ell} \mu(m) = \begin{cases} 1 & \text{if } \ell = 1 \\ 0 & \text{otherwise.} \end{cases} \tag{1.82}$$
This property is just another way of expressing the relation (1.2). Using (1.82) one proves the Möbius inversion formula; it asserts that for any functions $f, g : \mathbb{N} \to \mathbb{C}$ the two relations
$$g(n) = \sum_{d \mid n} f(d), \tag{1.83}$$


$$f(n) = \sum_{d \mid n} \mu(d)g(n/d) \tag{1.84}$$
are equivalent. To be accurate with the history, the inversion formulas in the above form were stated in 1857 by R. Dedekind [Ded]. The Möbius version was that for any real variable functions $F, G : [1, x] \to \mathbb{C}$ the two relations
$$G(x) = \sum_{n \le x} F(x/n), \qquad F(x) = \sum_{m \le x} \mu(m)G(x/m)$$
are equivalent (well, originally Möbius put these functions into coefficients of infinite Dirichlet series).

A1.2. The Dirichlet Convolution

Functions $f : \mathbb{N} \to \mathbb{C}$ are called arithmetic. Besides the usual operations of addition and multiplication, for two arithmetic functions $f, g$ one has the Dirichlet convolution
$$(f * g)(n) = \sum_{ad = n} f(a)g(d). \tag{1.85}$$
The function $\delta : \mathbb{N} \to \mathbb{C}$ defined by
$$\delta(n) = \begin{cases} 1 & \text{if } n = 1 \\ 0 & \text{otherwise} \end{cases} \tag{1.86}$$
is a unit in the algebra of arithmetic functions with respect to the convolution, i.e., $\delta * f = f$. Now (1.82) reads as
$$\mu * 1 = \delta$$
which means $\mu$ is the inverse of the constant function 1. The Möbius inversion formulas (1.83)-(1.84) read as the equivalence
$$g = 1 * f \iff f = \mu * g.$$
Let $L$ denote the logarithm function,
$$L(n) = \log n. \tag{1.87}$$


By the property $\log ab = \log a + \log b$ (which means $L$ is additive) it follows that
$$L(f * g) = Lf * g + f * Lg.$$
This shows that the multiplication by $L$ is a derivation in the Dirichlet algebra. Put $\Lambda = \mu * L$, i.e.,
$$\Lambda(n) = \sum_{d \mid n} \mu(d)\log\frac{n}{d}. \tag{1.88}$$
Writing $\log\frac{n}{d} = \log n - \log d$ we also have by (1.82)
$$\Lambda(n) = -\sum_{d \mid n} \mu(d)\log d. \tag{1.89}$$
This important arithmetic function (named after von Mangoldt) is supported on powers of primes, precisely
$$\Lambda(n) = \begin{cases} \log p & \text{if } n = p^{\ell},\ \ell > 0 \\ 0 & \text{otherwise.} \end{cases} \tag{1.90}$$
Indeed by (1.88)
$$-\frac{\zeta'}{\zeta}(s) = \sum_{n=1}^{\infty} \Lambda(n)n^{-s}$$
since the convolution corresponds to the multiplication of Dirichlet series; on the other hand
$$-\frac{\zeta'}{\zeta}(s) = \frac{d}{ds}\log\prod_{p}(1 - p^{-s}) = \sum_{p}\sum_{\ell} p^{-\ell s}\log p$$
by the Euler product formula and the power series expansion for $\log(1 - x)$. Hence (1.90) is verified by comparing the coefficients of both series.

An arithmetic function $f$ is multiplicative if it has the property
$$f(mn) = f(m)f(n) \quad \text{if } (m, n) = 1.$$
The Möbius function is multiplicative. If $f, g$ are multiplicative then so are $fg$ and $f * g$. If $g$ is multiplicative then
$$\sum_{d \mid P} \mu(d)g(d) = \prod_{p \mid P}(1 - g(p)). \tag{1.91}$$
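The identities $\mu * 1 = \delta$ and $\Lambda = \mu * L$ are easy to test numerically. The following Python sketch does so for small $n$; the function names and the cutoff `N = 200` are illustrative assumptions, and the von Mangoldt function is computed independently from its description (1.90).

```python
from math import isclose, log

def mobius(n):
    """Moebius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:        # p^2 divides the original n
                return 0
            result = -result
        p += 1
    if n > 1:                     # one remaining prime factor
        result = -result
    return result

def convolve(f, g, n):
    """Dirichlet convolution (f * g)(n) as in (1.85)."""
    return sum(f(a) * g(n // a) for a in range(1, n + 1) if n % a == 0)

def von_mangoldt(n):
    """log p if n is a power of the prime p, else 0, as in (1.90)."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return log(p) if n == 1 else 0.0
        p += 1
    return log(n)                 # n itself is prime

one = lambda n: 1
delta = lambda n: 1 if n == 1 else 0

N = 200                           # cutoff (assumption)
unit_ok = all(convolve(mobius, one, n) == delta(n) for n in range(1, N + 1))
mangoldt_ok = all(
    isclose(convolve(mobius, log, n), von_mangoldt(n), abs_tol=1e-9)
    for n in range(1, N + 1)
)
```

The second check compares floating-point logarithms, hence the tolerance in `isclose`.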


A1.3. The Tchebyshev and Mertens Estimates

In analytic number theory it is more convenient to sum the von Mangoldt function $\Lambda(n)$ instead of counting primes. We have
$$\psi(x) = \sum_{n \le x} \Lambda(n) = x + O\left(x(\log x)^{-A}\right) \tag{1.92}$$
for any $A > 0$, the implied constant depending on $A$. This is not an easy result (the Prime Number Theorem) although it can be proved by elementary means (cf. E. Bombieri [B1] and E. Wirsing [W2]). In sieve theory we often appeal to somewhat weaker estimates of P. Tchebyshev [T] and F. Mertens [Mer]. Tchebyshev's method resembles some sort of sieve so it is right to present it here (we shall return to $\Lambda(n)$ and its cousins in advanced parts of sieve theory in Chapter 9).

Tchebyshev's method begins with
$$\sum_{d \mid n} \Lambda(d) = \log n.$$
Summing over $1 \le n \le x$ we get
$$\sum_{d \le x} \Lambda(d)\left[\frac{x}{d}\right] = \sum_{m \le x} \psi\left(\frac{x}{m}\right) = \sum_{n \le x} \log n.$$
Here
$$\sum_{n \le x} \log n = \int_1^x (\log y)\, dy + O(\log x) = x\log x - x + O(\log x).$$
Hence
$$\sum_{m \le x} \psi\left(\frac{x}{m}\right) = x\log x - x + O(\log x). \tag{1.93}$$
We subtract this for $x/2$ with multiplicity 2 getting
$$\psi(x) - \psi\left(\frac{x}{2}\right) + \psi\left(\frac{x}{3}\right) - \cdots = x\log 2 + O(\log x).$$
Hence
$$\psi(x) > x\log 2 + O(\log x) \tag{1.94}$$


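and
$$\psi(x) - \psi\left(\frac{x}{2}\right) < x\log 2 + O(\log x).$$
Adding up the latter for $x, x/2, x/4, \ldots$ we get
$$\psi(x) < x\log 4 + O(\log^2 x). \tag{1.95}$$
Moreover we have by (1.93)
$$\sum_{d \le x} \Lambda(d)\left[\frac{x}{d}\right] = x\log x - x + O(\log x).$$
Relax the "entier" symbol and estimate the error by (1.95), then divide by $x$ to get
$$\sum_{d \le x} \frac{\Lambda(d)}{d} = \log x + O(1). \tag{1.96}$$
Here the composite numbers yield a constant, therefore
$$\sum_{p \le x} \frac{\log p}{p} = \log x + O(1). \tag{1.97}$$
By partial summation we derive
$$\sum_{p \le x} \frac{1}{p} = \log\log x + \alpha + O\left(\frac{1}{\log x}\right) \tag{1.98}$$
where $\alpha$ is a constant. Since
$$\prod_{p \le x}\left(1 - \frac{1}{p}\right) = \exp\left(\sum_{p \le x}\log\left(1 - \frac{1}{p}\right)\right) = \exp\left(-\sum_{p \le x}\frac{1}{p} + \beta + O\left(\frac{1}{x}\right)\right)$$
we derive by (1.98) the following formula of F. Mertens
$$\prod_{p \le x}\left(1 - \frac{1}{p}\right) = \frac{e^{-\gamma}}{\log x}\left(1 + O\left(\frac{1}{\log x}\right)\right), \tag{1.99}$$
where $\gamma$ is a constant. Mertens showed that $\gamma$ is exactly the Euler constant,
$$\gamma = \lim_{x \to \infty}\left(\sum_{n \le x}\frac{1}{n} - \log x\right) = 0.577\ldots.$$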
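For a rough numerical feel for the Tchebyshev bounds (1.94), (1.95) and the Mertens formula (1.99), one can tabulate $\psi(x)$ and the product over primes directly. The sketch below is a minimal illustration; the function names, the hard-coded value of Euler's constant and the choice of $x$ are assumptions made here.

```python
from math import exp, log

EULER_GAMMA = 0.5772156649015329   # Euler's constant, hard-coded

def primes_up_to(n):
    """Sieve of Eratosthenes returning all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]

def mertens_ratio(x):
    """prod_{p <= x} (1 - 1/p) divided by e^{-gamma} / log x; tends to 1 by (1.99)."""
    product = 1.0
    for p in primes_up_to(x):
        product *= 1.0 - 1.0 / p
    return product * log(x) * exp(EULER_GAMMA)

def psi(x):
    """Chebyshev function: sum of log p over prime powers p^k <= x."""
    total = 0.0
    for p in primes_up_to(x):
        pk = p
        while pk <= x:
            total += log(p)
            pk *= p
    return total

x = 10 ** 5                        # sample size (assumption)
ratio = mertens_ratio(x)
psi_over_x = psi(x) / x            # lies between log 2 and log 4, cf. (1.94), (1.95)
```

The ratio is already very close to 1 at this size, which reflects the $O(1/\log x)$ error term in (1.99) being quite pessimistic in practice.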


CHAPTER II

COMBINATORIAL SIEVES

2.1. Buchstab's Formula

As mentioned in the Epilogue of Chapter 1, the combinatorial sieve was invented eighty years ago by Jean Merlin and Viggo Brun. Brun's original approach appeared so complicated that after nine publications within ten years the inventor quit the subject for the rest of his life (Brun died in 1978). In retrospect Brun's ideas look clear; it was rather the formidable notation in which he was trapped. We do not follow the original lines of Brun here. An easy way to start a construction of combinatorial sieves (a pair consisting of a lower bound and an upper bound sieve) today employs the following recurrence formula
$$S(\mathcal{A}, P) = |\mathcal{A}| - \sum_{p \mid P} S(\mathcal{A}_p, P(p)) \tag{2.1}$$
where $P(z)$ denotes the product of primes in $P$ strictly smaller than $z$,
$$P(z) = \prod_{p \mid P,\ p < z} p. \tag{2.2}$$
This identity is obvious. A. Buchstab [Buc] made good use of it to improve sieve bounds by iterations (see Section 2.7) so (2.1) is named after him.

2.2. Pure Sieve

If we terminate the inclusion-exclusion process after $r$ steps we get the upper bound
$$S(\mathcal{A}, P) \le \sum_{d \mid P,\ \omega(d) < r} \mu(d)|\mathcal{A}_d| \tag{2.3}$$
or the lower bound
$$S(\mathcal{A}, P) \ge \sum_{d \mid P,\ \omega(d) < r} \mu(d)|\mathcal{A}_d| \tag{2.4}$$


according to whether $r$ is odd or even. Actually we can find the missing terms from the identity
$$S(\mathcal{A}, P) = \sum_{d \mid P,\ \omega(d) < r} \mu(d)|\mathcal{A}_d| + (-1)^r\sum_{d \mid P,\ \omega(d) = r} S(\mathcal{A}_d, P(p(d))) \tag{2.5}$$
where $p(d)$ denotes the least prime divisor of $d$. This identity is obtained by applying Buchstab's formula $r$ times. Hence one derives (2.3) and (2.4) by dropping the terms $S(\mathcal{A}_d, P(p(d)))$ since these are non-negative. The general inequalities (2.3), (2.4) show that the sequence
$$\lambda_d = \begin{cases} \mu(d) & \text{if } \omega(d) < r \\ 0 & \text{if } \omega(d) \ge r \end{cases} \tag{2.6}$$
makes an upper bound sieve if $r$ is odd or a lower bound sieve if $r$ is even, i.e., the corresponding inequality (1.60) holds for all $m \ne 1$. One can also see this property directly from the identity (which follows by (2.5) for the sequence (1.58))
$$\sum_{d \mid m,\ \omega(d) < r} \mu(d) = -(-1)^r\binom{\omega(m) - 1}{r - 1}. \tag{2.7}$$
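The identity (2.7), and with it the sign property (1.60) of the truncated Möbius weights, can be confirmed by direct enumeration. In the Python sketch below a squarefree $m \ne 1$ is represented by the tuple of its prime factors; the function names and the prime list used in the check are illustrative assumptions.

```python
from itertools import combinations
from math import comb

def truncated_mobius_sum(primes_of_m, r):
    """Sum of mu(d) over all d | m with omega(d) < r, for squarefree m."""
    total = 0
    for k in range(min(r, len(primes_of_m) + 1)):
        # every choice of k primes gives one divisor d with mu(d) = (-1)^k
        total += sum((-1) ** k for _ in combinations(primes_of_m, k))
    return total

def rhs_of_2_7(omega_m, r):
    """Right-hand side of (2.7): -(-1)^r * binom(omega(m) - 1, r - 1)."""
    return -((-1) ** r) * comb(omega_m - 1, r - 1)

def check_pure_sieve(primes, max_r):
    """Verify (2.7) and the sign condition (1.60) for every m != 1 built from primes."""
    for size in range(1, len(primes) + 1):
        for m in combinations(primes, size):
            for r in range(1, max_r + 1):
                s = truncated_mobius_sum(m, r)
                if s != rhs_of_2_7(len(m), r):
                    return False
                if r % 2 == 1 and s < 0:      # r odd: upper bound sieve
                    return False
                if r % 2 == 0 and s > 0:      # r even: lower bound sieve
                    return False
    return True
```

For instance `check_pure_sieve([2, 3, 5, 7, 11, 13], 8)` runs through all 63 admissible `m` and all `r` up to 8.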
Similar identities and inequalities are satisfied by the expected asymptotic main term $XV(P)$. Recall that
$$V(P) = \sum_{d \mid P} \mu(d)g(d) = \prod_{p \mid P}(1 - g(p)).$$
The Buchstab formula becomes
$$V(P) = 1 - \sum_{p \mid P} g(p)V(P(p)). \tag{2.8}$$
After $r$ iterations we arrive at
$$V(P) = \sum_{d \mid P,\ \omega(d) < r} \mu(d)g(d) + (-1)^r\sum_{d \mid P,\ \omega(d) = r} g(d)V(P(p(d))). \tag{2.9}$$
Inserting the approximations (1.15) for each $|\mathcal{A}_d|$ in (2.5) and comparing the resulting main term with (2.9) we obtain
$$S(\mathcal{A}, P) = XV(P) + \sum_{d \mid P,\ \omega(d) < r} \mu(d)r_d(\mathcal{A}) + (-1)^r\sum_{d \mid P,\ \omega(d) = r}\{S(\mathcal{A}_d, P(p(d))) - g(d)XV(P(p(d)))\}.$$


Applying the trivial estimates $0 \le S(\mathcal{A}_d, P(p(d))) \le |\mathcal{A}_d| = g(d)X + r_d(\mathcal{A})$ and $0 \le g(d)XV(P(p(d))) \le g(d)X$ we get
$$S(\mathcal{A}, P) = XV + \theta XG_r + \theta R \tag{2.10}$$
where $V = V(P)$,
$$G_r = \sum_{d \mid P,\ \omega(d) = r} g(d) \quad \text{and} \quad R = \sum_{d \mid P,\ \omega(d) \le r} |r_d(\mathcal{A})|.$$
For $r = 1$ we have
$$G = \sum_{p \mid P} g(p) \le \sum_{p \mid P} -\log(1 - g(p)) = -\log V.$$
Hence for any $r \ge 1$ we deduce by $r! \ge e(r/e)^r$ that
$$G_r \le \frac{G^r}{r!} \le \frac{1}{e}\left(\frac{eG}{r}\right)^r \le \frac{1}{e}\left(\frac{e|\log V|}{r}\right)^r.$$
Let $c$ be the number which solves the equation
$$\left(\frac{c}{e}\right)^c = e, \qquad c = 3.591\ldots. \tag{2.11}$$
Then for $b \ge c$ we have $b^b \ge e^{2b - c + 1}$. We apply this inequality for $b = r/|\log V|$ getting
$$G_r \le e^{-r-1}V^{1-c}$$
provided $r \ge c|\log V|$. We choose $r = [s - c\log V]$ where $s \ge 1$ getting
$$G_r \le e^{-s}V. \tag{2.12}$$
Suppose all primes of the sifting range $P$ are $\le z$, then every $d \mid P$ with $\omega(d) \le r$ satisfies $d \le z^r \le D$ where
$$D = z^{s - c\log V}. \tag{2.13}$$
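The constant $c$ in (2.11) is easy to compute numerically. Taking logarithms, $(c/e)^c = e$ becomes $c(\log c - 1) = 1$, which the following Python sketch solves by bisection; the function name, the bracketing interval and the tolerance are choices made here for illustration. The last lines test the inequality $b^b \ge e^{2b - c + 1}$ in logarithmic form on a grid of $b > c$.

```python
from math import e, log

def solve_c(lo=e, hi=10.0, tol=1e-12):
    """Bisection for c * (log c - 1) = 1, equivalent to (c / e)**c = e."""
    f = lambda c: c * (log(c) - 1.0) - 1.0   # increasing for c > 1, f(e) < 0 < f(10)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

c = solve_c()

# b^b >= e^(2b - c + 1) in logarithmic form: b * log b >= 2b - c + 1 for b >= c
grid = [c + 0.01 * k for k in range(1, 500)]
inequality_ok = all(b * log(b) >= 2 * b - c + 1 for b in grid)
```

The inequality holds with equality at $b = c$ and the derivative of $b\log b$, namely $\log b + 1$, exceeds 2 for $b \ge c$, which is why the grid starts slightly above $c$ to stay clear of rounding effects.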

By (2.10), (2.12) and (2.13) we obtain


Theorem 2.1. Suppose all primes of the sifting range $P$ are $\le z$. Then we have
$$S(\mathcal{A}, P) = XV(P)\{1 + \theta e^{-s}\} + \theta R(\mathcal{A}, P; D) \tag{2.14}$$
where $s \ge 1$, the remainder term $R(\mathcal{A}, P; D)$ is defined in (1.39) and $D$ is given by (2.13).

Suppose the error terms $r_d(\mathcal{A})$ satisfy (1.40), then $R(\mathcal{A}, P; D) \le DV(P)^{-1}$ (see (1.41)). In this case (2.14) yields
$$S(\mathcal{A}, P) = X\{V + \theta e^{1-s}\} + \theta V^{-1}z^{s - c\log V} \tag{2.15}$$
for any $s$ (note (2.15) is trivial if $s \le 1$). We choose $s = c\log V + \log X/\log ez$ getting
$$S(\mathcal{A}, P) = X\{V + 4\theta V^{-c}e^{-\log X/\log ez}\}. \tag{2.16}$$
Since $V^{-1} < K(2\log z)^{\kappa}$ by (1.36) we conclude that
$$S(\mathcal{A}, P) = XV(P)\{1 - 4\theta K^5(2\log z)^{5\kappa}e^{-\log X/\log ez}\}. \tag{2.17}$$
Compare this with the result of Section 1.7.

Corollary 2.2. Suppose (1.40) holds and the sifting range $P$ has its primes $\le z$. Then
$$S(\mathcal{A}, P) \sim XV(P) \quad \text{as } X \to \infty \tag{2.18}$$
uniformly for $z \le X^{1/(5\kappa\log\log X)}$.

Let us apply (2.17) for the sequence (1.21) where $F$ is the product of $k$ distinct irreducible polynomials over $\mathbb{Z}$ with positive leading coefficients and having no fixed prime divisor. This will be a sieve of dimension $k$ by virtue of (1.24). Suppose the sifting range is $P = P(z)$, the product of primes $< z$. Put
$$\pi_F(x, z) = |\{1 \le m \le x : (F(m), P(z)) = 1\}|. \tag{2.19}$$
By (2.17) we obtain
$$\pi_F(x, z) \ll x(\log z)^{-k} \tag{2.20}$$
provided $\log z \ll (\log x)(\log\log x)^{-1}$. Hence


Corollary 2.3. The number of integers $1 \le m \le x$ for which each of the irreducible factors of $F(m)$ is a prime satisfies
$$\pi_F(x) \ll x\left(\frac{\log\log x}{\log x}\right)^{k}. \tag{2.21}$$
In particular for $F(m) = m(m - 2)$ we get an upper bound for the number $\pi_2(x)$ of twin primes $p, p - 2$ with $p \le x$, which was the target of Brun's original study. He could not establish a lower bound for $\pi_2(x)$ but nevertheless put his upper bound into the striking form that the series of reciprocals of twin primes converges. Brun's constant has been computed by Shanks-Wrench and Brent:
$$\sum_{p,\ p-2 \text{ primes}}\left(\frac{1}{p} + \frac{1}{p - 2}\right) = 1.9021602393\ldots. \tag{2.22}$$

2.3. Setting up a Sieve by Iterations

In the previous section we considered a combinatorial sieve (2.6) which takes the Möbius function on integers having a number of prime divisors limited by a certain fixed (though large) parameter $r$. The parity of $r$ alone determines whether this is a lower bound or an upper bound sieve. If $r$ is chosen properly the expected asymptotic formula derived from such a pure sieve is valid in a range much larger than that allowed by the complete Eratosthenes-Legendre sieve. Subsequently Brun [Br3] improved these results in still wider ranges by choosing the sequences $\Lambda^+ = \{\lambda_d^+\}$ and $\Lambda^- = \{\lambda_d^-\}$ to be the Möbius function truncated to sets of type
$$D^+ = \{d = p_1 \cdots p_\ell : p_m < y_m \text{ for } m \text{ odd}\}$$
$$D^- = \{d = p_1 \cdots p_\ell : p_m < y_m \text{ for } m \text{ even}\}$$
where $d$ is written as the product of distinct primes enumerated in decreasing order,
$$d = p_1 \cdots p_\ell \quad \text{with } p_1 > \cdots > p_\ell. \tag{2.23}$$
By convention both sets $D^+$ and $D^-$ contain $d = 1$. Here $y_m$ are suitable parameters. By the inclusion-exclusion principle it follows that the conditions (1.60) are satisfied no matter how the $y_m$ are chosen.

Brun's construction can be explained by Buchstab's iterations. Suppose we seek an upper bound sieve. We begin by (see (2.1))
$$S(\mathcal{A}, P) = |\mathcal{A}| - \sum_{p_1} S(\mathcal{A}_{p_1}, P(p_1))$$


where $p_1$ runs over the prime divisors of $P$ (the sifting range). For large $p_1$, say $p_1 \ge y_1$, the subsequence $\mathcal{A}_{p_1}$ is short and the sifting range $P(p_1)$ is relatively large, so we may have nothing better to use than the trivial bound $S(\mathcal{A}_{p_1}, P(p_1)) \ge 0$. Therefore we drop these terms, hoping that not much is lost. We get an upper bound
$$|\mathcal{A}| - \sum_{p_1 < y_1} S(\mathcal{A}_{p_1}, P(p_1)).$$
Next we apply the Buchstab formula for each remaining term $S(\mathcal{A}_{p_1}, P(p_1))$ getting
$$|\mathcal{A}| - \sum_{p_1 < y_1} |\mathcal{A}_{p_1}| + \sum\sum_{p_2 < p_1 < y_1} S(\mathcal{A}_{p_1 p_2}, P(p_2)).$$
Now the sifted sums $S(\mathcal{A}_{p_1 p_2}, P(p_2))$ appear with positive sign so we cannot drop any of these. At this point, instead, we could use an upper bound but nothing good is available, so we must apply the Buchstab formula to each and every $S(\mathcal{A}_{p_1 p_2}, P(p_2))$ getting
$$|\mathcal{A}| - \sum_{p_1 < y_1} |\mathcal{A}_{p_1}| + \sum\sum_{p_2 < p_1 < y_1} |\mathcal{A}_{p_1 p_2}| - \sum\sum\sum_{p_3 < p_2 < p_1 < y_1} S(\mathcal{A}_{p_1 p_2 p_3}, P(p_3)).$$
Now again the sign is negative so we can drop $S(\mathcal{A}_{p_1 p_2 p_3}, P(p_3))$ for large $p_3$, say $p_3 \ge y_3$. Continuing this procedure we end up with
$$S(\mathcal{A}, P) \le \sum_{d \mid P,\ d \in D^+} \mu(d)|\mathcal{A}_d| = S^+(\mathcal{A}, P), \tag{2.24}$$
say. Similarly we arrive at the lower bound
$$S(\mathcal{A}, P) \ge \sum_{d \mid P,\ d \in D^-} \mu(d)|\mathcal{A}_d| = S^-(\mathcal{A}, P). \tag{2.25}$$
Tracking the iteration steps we can recover the missing terms in the above inequalities. To this end we group divisors of $P$ not in $D^+$ or $D^-$ respectively in accordance with the first failure of the condition $p_m < y_m$, getting sums of type
$$S_n(\mathcal{A}, P) = \sum \cdots \sum_{\substack{y_n \le p_n < \cdots < p_1 \\ p_m < y_m,\ m < n,\ m \equiv n\ (\mathrm{mod}\ 2)}} S(\mathcal{A}_{p_1 \cdots p_n}, P(p_n)). \tag{2.26}$$
Putting these back to (2.24) or (2.25) according to the parity of $n$ we have two identities for the same $S(\mathcal{A}, P)$:
$$S(\mathcal{A}, P) = S^+(\mathcal{A}, P) - \sum_{n \text{ odd}} S_n(\mathcal{A}, P) \tag{2.27}$$


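$$S(\mathcal{A}, P) = S^-(\mathcal{A}, P) + \sum_{n \text{ even}} S_n(\mathcal{A}, P). \tag{2.28}$$
The same rules apply to the expected main term $XV(P)$, thus we have
$$V(P) = V^+(P) - \sum_{n \text{ odd}} V_n(P) \tag{2.29}$$
$$V(P) = V^-(P) + \sum_{n \text{ even}} V_n(P) \tag{2.30}$$
where
$$V^{\pm}(P) = \sum_{d \mid P,\ d \in D^{\pm}} \mu(d)g(d) \tag{2.31}$$
$$V_n(P) = \sum \cdots \sum_{\substack{y_n \le p_n < \cdots < p_1 \\ p_m < y_m,\ m < n,\ m \equiv n\ (\mathrm{mod}\ 2)}} g(p_1 \cdots p_n)V(P(p_n)). \tag{2.32}$$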
One can arrange the above identities in a still more general form by applying an arbitrary partition of unity in each step of the iterations. This leads us to the following combinatorial identity:

Lemma 2.4. Let $\{\rho_d\}$ be any sequence of real numbers with $\rho_1 = 1$. Put $\lambda_1 = 1$, $\sigma_1 = 0$ and for $d = p_1\cdots p_\ell$ with $p_1 > \dots > p_\ell$ put
\[ \lambda_d = \mu(d)\prod_{1\le k\le \ell}\rho_{p_1\cdots p_k}, \qquad \sigma_d = \mu(d)\,(1-\rho_{p_1\cdots p_\ell})\prod_{1\le k<\ell}\rho_{p_1\cdots p_k}. \]
Then we have
\[ S(A,P) = \sum_{d\mid P}\lambda_d|A_d| + \sum_{d\mid P}\sigma_d\,S(A_d, P(p(d))) \]
\[ V(P) = \sum_{d\mid P}\lambda_d\,g(d) + \sum_{d\mid P}\sigma_d\,g(d)\,V(P(p(d))). \]
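The first identity of Lemma 2.4 lends itself to an exact machine check. The following sketch makes several assumptions that are not in the text: $A = \{1,\dots,N\}$, $P$ is the product of a short list of primes, $p(d)$ is read as the least prime factor of $d$, and the $\rho_d$ are random rationals in $[0,1]$ (with $\rho_1 = 1$). Exact fractions are used so that the two sides can be compared without rounding.

```python
from fractions import Fraction
from itertools import combinations
from math import prod
import random

def lemma_2_4_sides(N, primes, seed=0):
    """Return (right-hand side, S(A, P)) of Lemma 2.4 for A = {1, ..., N}."""
    rng = random.Random(seed)
    rho = {1: Fraction(1)}                        # rho_1 = 1

    def get_rho(d):
        if d not in rho:                          # random rational in [0, 1]
            rho[d] = Fraction(rng.randint(0, 10), 10)
        return rho[d]

    def sifted(d, z):
        """S(A_d, P(z)): multiples of d up to N free of primes of P below z."""
        small = [p for p in primes if p < z]
        return sum(1 for n in range(d, N + 1, d) if all(n % p for p in small))

    total = Fraction(0)
    for r in range(len(primes) + 1):
        for combo in combinations(primes, r):
            ps = sorted(combo, reverse=True)      # p_1 > ... > p_r
            d = prod(ps)
            mu = (-1) ** r
            prefix = [get_rho(prod(ps[:k])) for k in range(1, r + 1)]
            lam = mu * prod(prefix, start=Fraction(1))
            total += lam * (N // d)               # lambda_d |A_d|
            if r >= 1:                            # sigma_1 = 0, so skip d = 1
                sigma = mu * (1 - prefix[-1]) * prod(prefix[:-1], start=Fraction(1))
                total += sigma * sifted(d, ps[-1])
    return total, sifted(1, max(primes) + 1)
```

Choosing every $\rho_d = 1$ recovers the Legendre formula, and choosing $\rho_d \in \{0,1\}$ recovers the truncated identities (2.27) and (2.28).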


There is a prospect for good use of the above identities with diverse coefficients $0 \le \rho_d \le 1$, but we shall stick to the two values $\rho_d = 0, 1$ only, since such a sieve is already capable of a lot. Even with arbitrary $\rho_d$ the sieve weights $\lambda_d$ derived from Lemma 2.4 cannot be completely general. These weights are constructed in time periods, so they are special in this respect: at any given moment of the construction one takes into account the history, not the future. The time aspect of our construction will be exploited in our analysis of the systems of relevant differential-difference equations when searching for best solutions in Chapter 3. In Chapter 4 we construct an upper bound sieve due to Atle Selberg which is based on principles of global optimization. This does not mean that Selberg's sieve produces optimal bounds in every case, because it begins with a special type of weights.

There are numerous advantages of the combinatorial sieves. One of these is that the weights $\lambda_d$ depend only on the dimension $\kappa$ and the level $D$. On the other hand Selberg's construction does not depend explicitly on $\kappa$; instead it involves the density function $g(d)$ from the approximation (1.15), so it is somewhat linked to the particular sequence $A$ being sifted.

2.4. Choosing the Truncation Parameters

From now on our combinatorial sieve depends on the truncation parameters $y_m$ alone. Brun tried various parameters; his best choice was
\[ y_m = D^{\alpha\beta^m} \tag{2.33} \]
for suitable constants $0 < \alpha, \beta < 1$. Over the years after him a number of refinements have been given. An important innovation appeared in two short papers by W. Tartakowski [T1,2], in which for the first time the parameters $y_m$ depend on the primes $p_1, \dots, p_m$ being used prior to the $m$-th step of iteration. This, of course, is allowed; however, in view of the multitude of choices, it requires some motivation to make a reasonable choice $y_m = y_m(p_1, \dots, p_m)$. Tartakovski seemed to pick his parameters from numerical experience. We give a treatment of a Brun type sieve that follows from the choice
\[ y_m = (D/p_1\cdots p_m)^{1/\beta} \tag{2.34} \]

where $\beta > 1$ is a fixed number. There is a natural motivation for this choice based upon the concepts of sieve level and sieve limit. Recall the analysis which led us to the inequalities
\[ S^-(A,P) \le S(A,P) \le S^+(A,P) \tag{2.35} \]
by deleting some of the $S(A_{p_1\cdots p_m}, P(p_m))$ in the iteration procedure for $S(A,P)$. Now we ask: how do we decide which terms to delete? To get a glimpse we assume there is a number $\beta > 1$ (which depends only on the dimension $\kappa$) with the property that a good sieve $\Lambda^-(D)$ of level $D$ is capable of showing that
\[ S(A,P(z)) \gg XV(P(z)) \tag{2.36} \]
for any $A$ whose remainder term of level $D$ is under control (see (1.39)), provided the sifting range $P(z)$ is not too large, namely it satisfies
\[ s = \frac{\log D}{\log z} > \beta. \tag{2.37} \]
We shall refer to $s$ as the relative sifting range, and $\beta = \beta(\kappa)$ can be regarded as the sieve limit. We do not dwell on the precise definition of $\beta(\kappa)$ (cf. [Sel 2]); its intuitive meaning is clear enough to explain our choice (2.34). If the sifting range exceeds this limit, i.e., $s < \beta$, then we expect (2.36) to fail at least for some $A$ and $P$ of dimension $\kappa$. Now we apply the concept of $\beta(\kappa)$ to the subsequence $A_{p_1\cdots p_m}$, pretending it is a genuine sifting sequence of the appropriate level $D/p_1\cdots p_m$. Accordingly we expect no loss by dropping any term $S(A_{p_1\cdots p_m}, P(p_m))$ such that
\[ \frac{\log(D/p_1\cdots p_m)}{\log p_m} < \beta. \tag{2.38} \]

On the contrary, if the reverse of (2.38) holds we should retain this term since it yields a positive contribution, namely we have $S(A_{p_1\cdots p_m}, P(p_m)) \gg g(p_1\cdots p_m)XV(P(p_m))$ by (2.36) (note that $g(p_1\cdots p_m)X$ is the adequate approximation to $|A_{p_1\cdots p_m}|$). The above reasoning indicates that the truncation parameter $y_m$ which resolves (2.38) is critical. This suggests our choice (2.34). We shall refer to the combinatorial sieve induced by the parameters (2.34) as the β-sieve of level $D$.

At first glance one tends to believe that our choice (2.34) must lead to the best possible results by taking the optimal $\beta = \beta(\kappa)$. This turns out to be true in some important cases, but false for sieve problems of any dimension $\kappa > 1$. Hence the question: what goes wrong with the described heuristic when $\kappa > 1$? One of many possible explanations is that we were allowed to drop sifted sums only on every second period. We shall address this matter in due course.

2.5. A Variation of Brun's Method

Before we proceed to the essential analysis of the β-sieve we give in this section a relatively simple treatment. The argument has some features in common with Brun's original work [Br3], though his truncation parameters were different. Since $\beta > 1$ the resulting sequences $\Lambda^+ = \{\mu(d) : d \in D^+\}$ and $\Lambda^- = \{\mu(d) : d \in D^-\}$ have level of support $D$, except for $\mu(p)$ in $\Lambda^-$. We control this by restricting the range $P$ to primes $p < z < D$. By (2.24) and (2.25) we obtain
\[ \begin{aligned} S(A,P) &\le S^+(A,P) \le XV^+(P) + R(A,P;D)\\ S(A,P) &\ge S^-(A,P) \ge XV^-(P) - R(A,P;D) \end{aligned} \tag{2.39} \]
where $R(A,P;D)$ is the remainder term of level $D$ (see (1.39)) and $V^+(P)$, $V^-(P)$ are expressed in terms of $V(P)$ and $V_n(P)$ in (2.29) and (2.30) respectively. We seek an upper bound for $V^+(P)$ and a lower bound for $V^-(P)$; therefore in both cases we need upper bounds for $V_n(P)$. We have $V^-(P) \le V(P) \le V^+(P)$ and
\[ V^+(P) - V^-(P) = \sum_{n} V_n(P). \]
Keep in mind that $V^+(P)$, $V^-(P)$ and $V_n(P)$ depend on the sieve level $D$. We shall use a simplified notation by writing $z$ in place of $P(z)$, since $P$ does not change throughout the argument. Therefore (2.32) becomes
\[ V_n(z) = \mathop{\sum\cdots\sum}_{\substack{y_n\le p_n<\dots<p_1<z\\ p_m<y_m,\ m<n,\ m\equiv n\ (\mathrm{mod}\ 2)}} g(p_1\cdots p_n)V(p_n). \tag{2.40} \]
Since the terms of $V_n(z)$ are non-negative, some of the summation conditions can be relaxed or forgotten. Thus we have
\[ V_n(z) \le \mathop{\sum\cdots\sum}_{\substack{y_n\le p_n<\dots<p_1<z\\ p_1\cdots p_{m-1}p_m^{\beta}<D}} g(p_1\cdots p_n)V(p_n) \]
where the conditions hold for all $1 \le m < n$ regardless of the parity. These conditions imply that $p_1\cdots p_m < D^{1-(\frac{\beta-1}{\beta})^m}$ if $m < n$, by induction on $m$. In particular for $m = n-1$ this yields the following lower bound for the least prime in the range of $V_n(z)$,
\[ p_n \ge (D/p_1\cdots p_{n-1})^{\frac{1}{\beta+1}} > D^{\frac{1}{\beta+1}(\frac{\beta-1}{\beta})^{n-1}} > D^{\frac{1}{\beta}(\frac{\beta-1}{\beta})^{n}} > z_n, \]
say, where
\[ z_n = z^{(\frac{\beta-1}{\beta})^n}, \qquad z = D^{1/s}, \qquad s > \beta. \]

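Having this estimate we drop the other conditions and estimate as follows:
\[ V_n(z) \le \mathop{\sum\cdots\sum}_{z_n\le p_n<\dots<p_1<z} g(p_1\cdots p_n)V(p_n) \le \frac{V(z_n)}{n!}\Bigl(\sum_{z_n\le p<z} g(p)\Bigr)^{n} \le \frac{V(z_n)}{n!}\Bigl(\log\frac{V(z_n)}{V(z)}\Bigr)^{n}. \]
By (1.36)
\[ V(z_n) \le K\Bigl(\frac{\beta}{\beta-1}\Bigr)^{\kappa n}V(z) < Ke^{n/b}V(z) \]
by the inequality $1 + x \le e^x$, where $\beta = \kappa b + 1$. Hence
\[ V_n(z) < \frac{1}{n!}\Bigl(\frac{n}{b} + \log K\Bigr)^{n}Ke^{n/b}V(z) < \frac{1}{n!}\Bigl(\frac{n}{b}\Bigr)^{n}e^{n/b}K^{b+1}V(z). \]
Inserting $n! \ge e\,(n/e)^n$ we derive
\[ V_n(z) < e^{-1}a^nK^{b+1}V(z) \tag{2.41} \]
where
\[ a = b^{-1}e^{1+b^{-1}}. \tag{2.42} \]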

We choose $\beta = \kappa b + 1$ large enough to get $a < 1$, a condition which will be required for the convergence of the series of $V_n(z)$; this means we require $b > c$, where $c$ solves the equation $c^{-1}e^{1+c^{-1}} = 1$ (see (2.11)). Note that for $b > c$ we have
\[ a < 1 - e(b-c)b^{-2}. \tag{2.43} \]
The condition $p_n \ge y_n$ in the range of summation (2.40) implies $p_1^{n+\beta} \ge D$, and this shows that $V_n(z)$ is void if $n \le s - \beta$. Therefore by (2.41)
\[ \sum_{n>0} V_n(z) = \sum_{n>s-\beta} V_n(z) < e^{-1}(1-a)^{-1}a^{s-\beta}K^{b+1}V(z). \]
Hence we conclude the following fundamental lemma.

Lemma 2.5. Let $\Lambda^+(D)$, $\Lambda^-(D)$ be the β-sieves of level $D$ for $\beta = \kappa b + 1$ with $b > c$, where $c = 3.591\ldots$ is the number which solves $(c/e)^c = e$. Then for any multiplicative function $g(d)$ satisfying (1.16), (1.36) and $z = D^{1/s}$ with $s > \beta$ we have
\[ V^+(D,z) - V^-(D,z) < e^{-1}(1-a)^{-1}a^{s-\beta}K^{b+1}V(z) \tag{2.44} \]
where $a$ is given by $(ab/e)^b = e$, so $a < 1$. In particular for $b = 9$ we have $a < e^{-1}$, whence
\[ V^{\pm}(D,z) = \{1 + 2\theta e^{9\kappa-s}K^{10}\}V(z) \quad\text{if } s > 9\kappa+1. \tag{2.45} \]
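The numerical constants in Lemma 2.5 follow from the single function $a(b) = b^{-1}e^{1+1/b}$ of (2.42). A short sketch (function names are hypothetical; bisection is just one convenient way to solve $a(c) = 1$, using that $a$ is decreasing in $b$) recovers $c$ and confirms the two special values of $b$ used in this section.

```python
import math

def a_of_b(b):
    """a = b^{-1} e^{1 + 1/b} from (2.42); equivalently (a b / e)^b = e."""
    return math.exp(1 + 1 / b) / b

def solve_c(lo=3.0, hi=4.0, iterations=60):
    """Bisection for the root of a(c) = 1, with a(lo) > 1 > a(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if a_of_b(mid) > 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

c = solve_c()
print(c)             # about 3.5911
print(a_of_b(9))     # below 1/e, as used in (2.45)
print(a_of_b(4))     # below 7/8, as used in (2.49)
```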

By Lemma 2.5, for a sieve of dimension $\kappa$ and level $D$ we obtain the following estimate for the sifted sum $S(A,z)$.

Theorem 2.6. Let $\kappa > 0$, $z \ge 2$ and $D > z^{9\kappa+1}$. Suppose (1.36) holds for all $w < z$ with some $K \ge 1$. Then
\[ S(A,z) = XV(z)\{1 + 2\theta e^{9\kappa-s}K^{10}\} + \theta R(A,P;D) \tag{2.46} \]
where $s = \log D/\log z$ and $R(A,P;D)$ is the remainder term
\[ R(A,P;D) = \sum_{d\mid P,\ d<D} |r_d(A)|. \]
Remark. By increasing $\kappa$ to $\kappa + 1$ one may have $K = 1 + L(\log z)^{-1}$ by (1.37), which is close to 1 for large $z$.

From the fundamental lemma we shall infer a few bounds for the sieve limit $\beta(\kappa)$. By (2.44) we have
\[ V^-(D,z) > \{1 - e^{-1}(1-a)^{-1}a^{s-\beta}K^{b+1}\}V(z). \tag{2.47} \]
For $s > \beta + \log(1-a)/\log a$ this lower bound is positive provided $K$ is sufficiently close to 1, namely $K^{b+1} < e$. Taking $b = c + \kappa^{-1/2}$ for $\kappa > 1$ we get $a < 1 - 1/(8\sqrt{\kappa})$ by (2.43), so $\log(1-a)/\log a < 8\sqrt{\kappa}\log 8\sqrt{\kappa}$. Hence (2.47) gives $V^-(D,z) > (1 - e^{-1}K^6)V(z)$ provided $s > c\kappa + \sqrt{\kappa} + 1 + 8\sqrt{\kappa}\log 8\sqrt{\kappa}$. This implies that the sieve limit of dimension $\kappa > 1$ satisfies
\[ \beta(\kappa) < c\kappa + 4\sqrt{\kappa}\log\kappa + (1 + 8\log 8)\sqrt{\kappa} + 1. \tag{2.48} \]

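One can be more precise with the lower bound (2.47). Using $n! \ge \frac{e^2}{2}(\frac{n}{e})^n$ for $n \ge 2$ we improve (2.41) by the factor $2e^{-1}$, whence
\[ \sum_{n\ \mathrm{even}} V_n(z) < 2e^{-2}a^2(1-a^2)^{-1}K^{b+1}V(z). \]
For $b = 4$ we have $a < 7/8$ and the above inequality yields
\[ V^-(D,z) > \bigl(1 - \tfrac{8}{9}K^5\bigr)V(z) \quad\text{if } D > z^{4\kappa+1}. \tag{2.49} \]
This shows that the sieve limit of any dimension $\kappa > 0$ satisfies
\[ \beta(\kappa) \le 4\kappa + 1. \tag{2.50} \]
If $\kappa \le 1$ we may take $b = 4/\sqrt{\kappa}$; it gives $a < \frac{7}{8}\sqrt{\kappa}$ and
\[ V^-(D,z) > \bigl(1 - \tfrac{8}{9}\kappa K^{5/\sqrt{\kappa}}\bigr)V(z) \quad\text{if } D > z^{4\sqrt{\kappa}+1}. \tag{2.51} \]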

Hence if $\kappa$ is small, $V^-(D,z)$ is close to $V(z)$ in the whole range.

Remark. If one allows $\beta$ (and so also the sieve weights) to depend on the sifting range, then (2.45) can be improved substantially in terms of $s = \log D/\log z$. Choosing $\beta = s/\log s$ for sufficiently large $s$ one can derive from (2.44) that for $\kappa > 1$ (2.52)

\[ V^+(D,z) - V^-(D,z) \ll \bigl(e\kappa s^{-1}\log s\bigr)^{s}V(z). \]

Therefore $V^+(D,z)$ and $V^-(D,z)$ tend to $V(z)$ very rapidly as the sifting range decays (relative to the level) and the dimension is fixed. In the next chapter we shall see that an even stronger estimate than (2.52) with respect to $s$ is true for a fixed choice of $\beta$ (depending only on the dimension).

2.6. Two Applications to One Problem

Perhaps the most frequent application of the sieve stems from the fact that, given even a modest level of $A$, we get an upper bound for $S(A,z)$ of the correct order of magnitude in a wide range of $z$. Furthermore, in a restricted range we get a lower bound of the correct order of magnitude as well. Assuming (1.66) we obtain
\[ S(A,z) \asymp XV(z) \quad\text{if } z < X^{\alpha/\beta-\varepsilon} \tag{2.53} \]
where $\alpha = \alpha(A, D)$ is the level exponent of $A$ and $\beta = \beta(\kappa)$ is the sieve limit of dimension $\kappa$.

To compare the strength of the pure sieve (Section 2.2) with the β-sieve version of Brun's sieve (its simple treatment in Section 2.5) we reconsider the problem of primes represented by polynomials. By Theorem 2.6 we derive $\pi_F(x) \ll x(\log x)^{-k}$ in place of (2.21), where the implied constant depends on $F$. For the quadratic polynomial $F(m) = m(am+b)$ with $(a,b) = 1$ we get
\[ \pi_F(x) \ll \frac{ab}{\varphi(ab)}\,\frac{x}{(\log x)^2} \tag{2.54} \]
where the implied constant is absolute. Hence for the number of twin primes we get
\[ \pi_2(x) \ll x(\log x)^{-2} \tag{2.55} \]
while it is conjectured that
\[ \pi_2(x) \sim 2\prod_{p>2}\bigl(1 - (p-1)^{-2}\bigr)\,x(\log x)^{-2}. \tag{2.56} \]
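For orientation, the constant in the conjecture (2.56) is easy to approximate. The sketch below (helper names are hypothetical, and the product is simply truncated at a finite limit, which is an approximation) evaluates $2\prod_{p>2}(1-(p-1)^{-2})$ and counts twin prime pairs directly for comparison with the upper bound (2.55).

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes returning all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]

def twin_constant(limit):
    """Truncation of 2 * prod_{2 < p <= limit} (1 - (p - 1)^{-2})."""
    c = 2.0
    for p in primes_up_to(limit):
        if p > 2:
            c *= 1 - 1 / (p - 1) ** 2
    return c

def twin_count(x):
    """Number of primes p with p + 2 also prime and p + 2 <= x."""
    ps = set(primes_up_to(x))
    return sum(1 for p in ps if p + 2 in ps)

x = 10 ** 5
print(twin_constant(x))                               # about 1.3203
print(twin_count(x), x / math.log(x) ** 2)            # count vs. x / (log x)^2
```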

Moreover we now know that (2.20) holds in a wider range, namely
\[ \pi_F(x,z) \asymp x(\log z)^{-k} \quad\text{if } z < x^{\frac{1}{\beta}-\varepsilon} \tag{2.57} \]
with $\beta = \beta(k) = 4k + 1$ by (2.49) and (2.50). In particular for $F(m) = m(m-2)$ with $1 \le m \le x$ this shows

Theorem 2.7. There are infinitely many $m$ such that both $m$ and $m-2$ have at most nine prime divisors.

By the same method one derives

Theorem 2.8. Every sufficiently large even number can be represented as the sum of two numbers each of which has at most nine prime divisors.

These are the original results of Viggo Brun [Br3], though our derivation is somewhat different. Had we used more accurate estimates, the number of prime divisors in both problems could be reduced easily from nine to eight. In 1939 V. A. Tartakovski [T2] reduced it further down to four by applying his own refinement of Brun's sieve. There is a long list of improvements obtained by Brun's method before and after Tartakovski.

Another approach to the twin primes problem (introduced in 1947 by A. Rényi [Ren]) applies the one dimensional sieve to the numbers $n = m - 2$ where $m$ takes prime values $\le x$ (rather than the two dimensional sieve applied before to the polynomial values $n = m(m-2)$). At present we know by the Bombieri-Vinogradov theorem (1.30) that the sequence of shifted primes has the level exponent $\alpha = \frac{1}{2}$ (see (1.31)), and the sieve limit is $\beta \le 5$ by (2.50) for $\kappa = 1$. Therefore we deduce by (2.53)

Theorem 2.9. There are infinitely many primes $p$ such that $p - 2$ has at most ten prime divisors.

A similar assertion is established for the Goldbach problem by considering the numbers $N - m$ with $m$ prime, $2 < m < N$. Apart from a slight difference in the sifting range, the twin primes and the Goldbach problems are indistinguishable by sieve methods.

2.7. Buchstab's Iterations

Sieve methods produce general inequalities of type (see for example Theorem 2.6)
\[ S(A,z) \ge XV(z)f(s) - R(A,D) \tag{2.58} \]
\[ S(A,z) \le XV(z)F(s) + R(A,D) \tag{2.59} \]
where $f, F$ are continuous functions of $s = \log D/\log z$ such that
\[ 0 \le f(s) < 1 < F(s), \tag{2.60} \]
$f(s)$ is increasing and $F(s)$ is decreasing to 1 exponentially,
\[ f(s) = 1 + O(e^{-s}), \qquad F(s) = 1 + O(e^{-s}). \tag{2.61} \]
Our objective is to find the best pair $(f, F)$. In 1938 A. A. Buchstab [Buc 1] discovered, by applying Brun's sieve to each $S(A_p, p)$ in the recurrence formula
\[ S(A,z) = |A| - \sum_{p<z} S(A_p, p), \tag{2.62} \]
that the resulting estimates for $S(A,z)$ can be better than those obtained directly. More generally, given a pair of functions $(f, F)$ for which (2.58) and (2.59) hold, one can produce a new pair $(\hat f, \hat F)$ by passing through the recurrence formula (2.62). To accelerate production we first subtract the expected values, getting
\[ S(A,z) - XV(z) = |A| - X - \sum_{p<z}\bigl(S(A_p,p) - g(p)XV(p)\bigr). \tag{2.63} \]
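The recurrence (2.62) is an exact identity: every $n$ that is removed by sifting is counted once, according to its least prime factor. A minimal check, assuming the toy sequence $A = \{1,\dots,N\}$ and using hypothetical helper names, can be written as follows.

```python
def primes_below(z):
    """All primes p < z, by trial division (adequate for small z)."""
    return [p for p in range(2, z)
            if all(p % q for q in range(2, int(p ** 0.5) + 1))]

def S(N, d, z):
    """S(A_d, z) for A = {1, ..., N}: multiples of d with no prime factor below z."""
    small = primes_below(z)
    return sum(1 for n in range(d, N + 1, d) if all(n % p for p in small))

def buchstab_rhs(N, z):
    """Right-hand side of (2.62): |A| minus the sum of S(A_p, p) over p < z."""
    return N - sum(S(N, p, p) for p in primes_below(z))

for z in (2, 3, 10, 30):
    print(z, S(500, 1, z), buchstab_rhs(500, z))   # the two counts agree
```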
Suppose the lower bound (2.58) is true for any $A$ in the range $s > \beta$; in particular
\[ S(A_p,p) \ge g(p)XV(p)f(s_p) - R(A_p, D_p) \]
is true for all $p < z$ provided $s > \beta + 1$, where $D_p = D/p$ and $s_p = \log D_p/\log p$. Inserting this into (2.63) we obtain
\[ S(A,z) - XV(z) \le X\sum_{p<z} g(p)V(p)\bigl(1 - f(s_p)\bigr) + R(A,D). \]

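By Lemma 1.1 the above sum over $p < z$ is bounded by $V(z)$ times
\[ K\int_{1}^{z}\bigl(1 - f(s_t)\bigr)\,d\Bigl(-\Bigl(\frac{\log z}{\log t}\Bigr)^{\kappa}\Bigr) + (K-1)\bigl(1 - f(s-1)\bigr) = K\hat f(s) - (K-1)f(s-1) - 1 \]
where
\[ \hat f(s) = 1 + s^{-\kappa}\int_s^{\infty}\bigl(1 - f(t-1)\bigr)\,dt^{\kappa}. \tag{2.64} \]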
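The transform (2.64) is straightforward to evaluate numerically. The sketch below is an illustration only: the function name, the trapezoid rule and the truncation of the infinite integral at a finite length are all choices made here, and the test function $f(t) = 1 - e^{-t}$ is a made-up example (not an actual sieve function) chosen because for $\kappa = 1$ the transform has the closed form $1 + e^{1-s}/s$.

```python
import math

def buchstab_transform(f, s, kappa, length=60.0, steps=20000):
    """Approximate 1 + s^{-kappa} * int_s^inf (1 - f(t-1)) d(t^kappa).

    The integral is truncated to [s, s + length] and evaluated by the
    trapezoid rule, using d(t^kappa) = kappa * t^(kappa-1) dt.
    """
    h = length / steps
    total = 0.0
    for i in range(steps + 1):
        t = s + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * (1 - f(t - 1)) * kappa * t ** (kappa - 1)
    return 1 + s ** (-kappa) * total * h

f = lambda t: 1 - math.exp(-t)
print(buchstab_transform(f, 3.0, 1.0), 1 + math.exp(-2.0) / 3.0)
```

The constant function $f = 1$ is returned unchanged, in line with the remark below that $f(s) = F(s) = 1$ is a fixed point of the transform.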

Hence we get the upper bound
\[ S(A,z) \le XV(z)\{K\hat f(s) - (K-1)f(s-1)\} + R(A,D) \tag{2.65} \]
in the range $s > \beta + 1$. Similarly, if the upper bound (2.59) is true for any $A$ in the range $s > \beta$, then we derive the lower bound
\[ S(A,z) \ge XV(z)\{K\hat F(s) - (K-1)F(s-1)\} - R(A,D) \tag{2.66} \]
in the range $s > \beta + 1$. We call the map $(f, F) \mapsto (\hat F, \hat f)$ the Buchstab transform. Keep in mind that $K$ is close to 1; thus if $(f, F)$ is a pair of admissible functions then so is $(\hat F, \hat f)$ in appropriate ranges, up to a small correction of order $O(K-1)$. One can refine the new pair by taking $(\tilde f, \tilde F)$ with
\[ \tilde f = \max(f, \hat F), \qquad \tilde F = \min(F, \hat f). \]
Usually this refinement makes a difference only for small $s$, since the transforms $\hat F, \hat f$ improve on $f, F$ if $s$ is sufficiently large. Note that $\hat F, \hat f$ are defined by (2.64) in the range $s > \beta + 1$ provided the initial functions $f, F$ are given in $s > \beta$. To get a result in complete ranges, in every second step of the iterations we appeal to the obvious bound
\[ S(A,z) \le S\bigl(A, D^{\frac{1}{\beta+1}}\bigr) \quad\text{if } s \le \beta+1. \tag{2.67} \]
Hence if (2.59) holds for $s = \beta + 1$ then it holds for all $s \le \beta + 1$ with $F(s)$ defined by
\[ s^{\kappa}F(s) = (\beta+1)^{\kappa}F(\beta+1) \quad\text{if } 0 < s \le \beta+1 \tag{2.68} \]
(to be precise it holds with $KF(s)$ by applying (1.36); however, the correction is small). By virtue of this extension we have $F(s)$ for all $s > 0$ and $\hat F(s)$ for all $s > 1$.

Now we are ready to carry out the iteration process. We start from the upper bound (2.59) with some $F_0(s)$ for $s > 0$, getting the lower bound (2.58) with $f_0(s) = \hat F_0(s)$ for $s > 1$ (up to a small correction). Let $\beta_0 > 1$ be the largest root of $f_0(s) = 0$, or put $\beta_0 = 1$ if such a root does not exist. Next apply (2.58) with $f_0(s)$ for $s > \beta_0$, getting (2.59) with $F(s) = \hat f_0(s)$ for $s > \beta_0 + 1$ (up to a small correction). Take $F_1(s) = \min\{F_0(s), F(s)\}$ and extend this function to all $s > 0$ according to (2.68). Continuing this process we produce a sequence of admissible pairs $\{f_j, F_j\}$, each one at least as good as its predecessor, i.e.,
\[ 0 \le f_0 \le f_1 \le \dots < 1 < \dots \le F_1 \le F_0. \]
The question is what one gets after an infinite number of iterations. If the process converges, the limit pair $(f, F)$ is a fixed point of the Buchstab transform, i.e., $f, F$ solve the system of integral equations
\[ \hat f(s) = F(s), \qquad \hat F(s) = f(s) \]
for sufficiently large $s$ at any rate. This is equivalent to the system of differential-difference equations
\[ (s^{\kappa}f(s))' = \kappa s^{\kappa-1}F(s-1) \tag{2.69} \]
\[ (s^{\kappa}F(s))' = \kappa s^{\kappa-1}f(s-1) \tag{2.70} \]
subject to the decay conditions (2.61) at infinity. With no initial conditions at hand there are many solutions to this system, the obvious one being $f(s) = F(s) = 1$. However, given a decent initial function $F_0(s)$ one should be able (theoretically speaking) to come up with the unique limit. When Brun's sieve is used to ignite the iterations one comes up with initial conditions of type
\[ s^{\kappa}F(s) = A \quad\text{if } s \le \beta+1 \tag{2.71} \]
\[ s^{\kappa}f(s) = B \quad\text{if } s \le \beta \tag{2.72} \]

where β, A, B are constants to be determined in terms of κ. The β is chosen as the smallest number > 1 for which the process of iterations converges. Given β, A, B one can find f (s), F (s) in step-by-step integrations of (2.69 - 2.70) starting from (2.71 - 2.72). These will depend uniquely on the constants A, B which are finally determined by the conditions (2.61) at infinity. Therefore, there is exactly one solution to the problem. Yet, in order to establish results rigorously, we must take care of error terms, and that would be a painstaking enterprise. Instead, we shall apply directly the β-sieve construction, actually a combination of two such sieves to solve another convergence problem (see Section 3.6). The results will be essentially the same as these which could be achieved by infinite number of iterations. One may begin Buchstab’s iterations with any pair of a lower and upper bound for the sifted sums. Ankeny-Onishi [AO] employed Selberg’s upper bound and the trivial lower bound. By a very sophisticated analysis Diamond-Halberstam-Richert [DHR] reached the limit of iterations starting from the β-sieve and the Λ2 -sieve. The results are very good for κ slightly larger than 1.


CHAPTER III

THE BETA-SIEVE

3.1. Introduction

Recall that the β-sieve of level $D$ is the combinatorial sieve whose truncation parameters are
\[ y_m(p_1,\dots,p_m) = (D/p_1\cdots p_m)^{1/\beta} \tag{3.1} \]
where $\beta \ge 1$. We have already given a simple and quick treatment of this sieve in Section 2.5. In this chapter we establish the optimal results (up to the leading terms); this amounts to determining the best $\beta = \beta(\kappa)$. The same results (essentially) were found independently by A. Selberg [S1,2]; however, his account of the sieve theory based on the choice (3.1) is rather different (Selberg reveals that he saw some relevant developments in unpublished notes by B. Rosser). The case of $\beta = 1$ is somewhat complicated in technical details; therefore, in these lectures we only consider the β-sieve with $\beta > 1$. This restriction turns out to hold for the sieve of any dimension $\kappa > \frac{1}{2}$. The half-linear sieve is very interesting and we just miss results; nevertheless we shall get these indirectly by increasing the dimension slightly.

3.2. Estimates for $V_n(D, z)$

Our crude estimates (2.41) were derived by discarding a lot of the summation conditions in
\[ V_n(D,z) = \mathop{\sum\cdots\sum}_{\substack{p_n<\dots<p_1<z\\ p_1\cdots p_m p_m^{\beta}<D,\ m<n,\ m\equiv n\ (\mathrm{mod}\ 2)\\ p_1\cdots p_n p_n^{\beta}\ge D}} g(p_1\cdots p_n)V(p_n). \tag{3.2} \]

In spite of this extravagant treatment the results were good, particularly if the dimension κ is large. For example it turns out that the estimate (2.48) for the sieve limit is close to the optimal one for the β-sieve as κ tends to infinity (see Section 3.7). However, if κ is large a combination of Selberg sieve with Buchstab iterations produces still stronger estimates. For small κ one can do well with the β-sieve alone.


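In this section we give a precise treatment of $V_n(D,z)$. Since we assume only the one-sided inequality (see (1.36))
\[ \frac{V(w)}{V(z)} \le K\Bigl(\frac{\log z}{\log w}\Bigr)^{\kappa} \tag{3.3} \]
... ($s > 0$)
\[ f_n(s) = \kappa^n s^{-\kappa}\int\cdots\int (t_1\cdots t_n)^{-1}t_n^{-\kappa}\,dt_1\cdots dt_n. \tag{3.4} \]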

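Naturally one expects that $f_n(s)V(z)$ approximates $V_n(D,z)$. Recall that $z^s = D$, i.e.,
\[ s = \log D/\log z. \tag{3.5} \]
Note that the range of summation in (3.2) and the range of integration in (3.4) are void if $s \ge \beta + n$, so
\[ V_n(D,z) = 0 \quad\text{if } s \ge \beta+n \tag{3.6} \]
\[ f_n(s) = 0 \quad\text{if } s \ge \beta+n. \tag{3.7} \]
We shall examine $V_n(D,z)$ and $f_n(s)$ in the range
\[ s \ge \beta_n = \beta - \varepsilon_n \quad\text{with}\quad \varepsilon_n = \begin{cases} 1 & \text{if } n \text{ is odd}\\ 0 & \text{if } n \text{ is even.}\end{cases} \tag{3.8} \]
For $n = 1$ and $\beta - 1 \le s < \beta + 1$ we have
\[ V_1(D,z) = V\bigl(D^{\frac{1}{\beta+1}}\bigr) - V(z) \tag{3.9} \]
and
\[ s^{\kappa}f_1(s) = (\beta+1)^{\kappa} - s^{\kappa}. \tag{3.10} \]
For $n \ge 2$ and $\beta_n \le s \le \beta + n$ we have the following recurrence formulas
\[ V_n(D,z) = \sum_{z_n\le p<x_n} g(p)\,V_{n-1}\Bigl(\frac{D}{p}, p\Bigr), \tag{3.11} \]
where $z_n = D^{(\beta+n)^{-1}}$, $x_n = \min\bigl(z, D^{(\beta+\varepsilon_n)^{-1}}\bigr)$, and
\[ s^{\kappa}f_n(s) = \int_{\max(s,\,\beta+\varepsilon_n)}^{\beta+n} f_{n-1}(t-1)\,dt^{\kappa}. \tag{3.12} \]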

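Hence if $n > 1$, $n$ odd, we have in the range $\beta - 1 \le s \le \beta + 1$
\[ V_n(D,z) = V_n\bigl(D, D^{\frac{1}{\beta+1}}\bigr) \tag{3.13} \]
\[ s^{\kappa}f_n(s) = (\beta+1)^{\kappa}f_n(\beta+1). \tag{3.14} \]
Notice that $s^{\kappa}f_n(s)$ is continuous and non-increasing for $s \ge \beta_n$, since the integrand in (3.12) is non-negative. Moreover by (3.12) we derive for $n \ge 2$
\[ f_n(\beta_n) \le \Bigl(\frac{\beta+n}{\beta_n}\Bigr)^{\kappa}f_{n-1}(\beta_{n-1}) \tag{3.15} \]
whence for all $n \ge 1$
\[ f_n(\beta_n) \le \prod_{m\le n}\Bigl(\frac{\beta+m}{\beta_m}\Bigr)^{\kappa}. \tag{3.16} \]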

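Lemma 3.1. For $n \ge 1$ and $s \ge \beta_n$ we have
\[ V_n(D,z) \le \{f_n(s) + (K-1)K^n\Delta_n\}V(z) \tag{3.17} \]
with
\[ \Delta_n = 2n\Bigl(\frac{\beta+n}{\beta-1}\Bigr)^{\kappa n}. \tag{3.18} \]
Proof. First for $n = 1$ we obtain by (3.3), (3.9) and (3.10) that
\[ V_1(D,z)V(z)^{-1} < K\Bigl(\frac{\beta+1}{s}\Bigr)^{\kappa} - 1 \le f_1(s) + (K-1)\Bigl(\frac{\beta+1}{\beta-1}\Bigr)^{\kappa} \]
so (3.17) holds with $\Delta_1 = \bigl(\frac{\beta+1}{\beta-1}\bigr)^{\kappa}$. Suppose $n \ge 2$ and that (3.17) holds for $n - 1$. Then by the recurrence formulas (3.11), (3.12) and by Lemma 1.1 we obtain
\[ \begin{aligned} V_n(D,z) &\le \sum_{z_n\le p<x_n} g(p)\Bigl\{f_{n-1}\Bigl(\frac{\log D/p}{\log p}\Bigr) + (K-1)K^{n-1}\Delta_{n-1}\Bigr\}V(p)\\ &\le V(x_n)\int_{z_n}^{x_n} f_{n-1}\Bigl(\frac{\log D/w}{\log w}\Bigr)\,d\Bigl(-\Bigl(\frac{\log x_n}{\log w}\Bigr)^{\kappa}\Bigr) + \Bigl(1 - \frac{1}{K}\Bigr)f_{n-1}(\beta_{n-1})V(z_n) + (K-1)K^{n-1}\Delta_{n-1}V(z_n). \end{aligned} \]
Here the integral is equal to $f_n(s)(\log x_n/\log z)^{\kappa}$ by (3.12). Applying (3.3) for $w = x_n$ and $w = z_n$ we get
\[ V_n(D,z) \le Kf_n(s)V(z) + (K-1)\Bigl(\frac{\beta+n}{s}\Bigr)^{\kappa}\bigl[f_{n-1}(\beta_{n-1}) + K^n\Delta_{n-1}\bigr]V(z). \]
Hence by (3.15)
\[ V_n(D,z)V(z)^{-1} \le f_n(s) + (K-1)K^n\Bigl(\frac{\beta+n}{\beta_n}\Bigr)^{\kappa}\bigl[2f_{n-1}(\beta_{n-1}) + \Delta_{n-1}\bigr]. \]
This proves (3.17) with
\[ \Delta_n = 2n\prod_{m\le n}\Bigl(\frac{\beta+m}{\beta_m}\Bigr)^{\kappa} \tag{3.19} \]
by (3.16), which is slightly better than (3.18).

We need two estimates for sums of $V_n(D,z)$ over odd and even numbers (see (2.29) and (2.30) respectively). By Lemma 3.1 we can only estimate finite sums because our error terms $\Delta_n$ grow rapidly with $n$. Put
\[ \Delta(n) = n^2\Bigl(\frac{\beta+n}{\beta-1}\Bigr)^{\kappa n} \tag{3.20} \]
\[ T_n(s) = \sum_{m\le n,\ m\equiv n\ (\mathrm{mod}\ 2)} f_m(s). \tag{3.21} \]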

Corollary 3.2. For $n \ge 2$ and $s \ge \beta_n$ we have
\[ \sum_{m\le n,\ m\equiv n\ (\mathrm{mod}\ 2)} V_m(D,z) < \{T_n(s) + (K-1)K^n\Delta(n)\}V(z). \tag{3.22} \]

3.3. The Functions $F(s)$, $f(s)$

Our next task is to find $\beta > 1$ as small as possible for which the series (3.21) converge, and to represent the infinite sums by simple integrals for practical computations. In this section we give a preliminary analysis. Assuming the convergence in question we form two series
\[ F(s) = 1 + \sum_{n\ \mathrm{odd}} f_n(s) \quad\text{if } s \ge \beta-1 \tag{3.24} \]
\[ f(s) = 1 - \sum_{n\ \mathrm{even}} f_n(s) \quad\text{if } s \ge \beta. \tag{3.25} \]

Assuming also the condition (2.61) at infinity we find by the recurrence formula (3.12) that $F, f$ are invariant under the Buchstab transform (2.64), or equivalently $F, f$ satisfy the system of differential-difference equations
\[ \begin{aligned} (s^{\kappa}F(s))' &= \kappa s^{\kappa-1}f(s-1) &&\text{if } s > \beta+1\\ (s^{\kappa}f(s))' &= \kappa s^{\kappa-1}F(s-1) &&\text{if } s > \beta. \end{aligned} \tag{3.26} \]
Moreover $s^{\kappa}F(s)$ is constant in the segment $\beta - 1 \le s \le \beta + 1$. Therefore we are led to the initial conditions
\[ \begin{aligned} s^{\kappa}F(s) &= A &&\text{if } \beta-1 \le s \le \beta+1\\ s^{\kappa}f(s) &= B &&\text{at } s = \beta. \end{aligned} \tag{3.27} \]
As $s$ tends to infinity it will suffice to assume that
\[ F(s) = 1 + o(s^{-2\kappa}), \qquad f(s) = 1 + o(s^{-2\kappa}). \tag{3.28} \]

We shall see that our problem is posed correctly, namely the constants $A, B$ are determined by $\kappa$ and $\beta$. Letting for $s \ge \beta$
\[ P(s) = F(s) + f(s), \qquad Q(s) = F(s) - f(s) \tag{3.29} \]
we turn the system (3.26) into two independent equations
\[ \begin{aligned} sP'(s) &= -\kappa P(s) + \kappa P(s-1)\\ sQ'(s) &= -\kappa Q(s) - \kappa Q(s-1) \end{aligned} \tag{3.30} \]
for $s > \beta + 1$. The initial conditions (3.27) become
\[ \begin{aligned} s^{\kappa}P(s) &= A + B + A\int_{\beta}^{s}(t-1)^{-\kappa}\,dt^{\kappa}\\ s^{\kappa}Q(s) &= A - B - A\int_{\beta}^{s}(t-1)^{-\kappa}\,dt^{\kappa} \end{aligned} \tag{3.31} \]
for $\beta \le s \le \beta + 1$, and the conditions at infinity become
\[ P(s) = 2 + o(s^{-2\kappa}), \qquad Q(s) = o(s^{-2\kappa}). \tag{3.32} \]

Now we can treat each of the problems for $P(s)$ and $Q(s)$ separately. We solve these problems by the method of adjoint equation. A general account of this method is given in A3.1. The adjoint equations to (3.30) are
\[ \begin{aligned} (sp(s))' &= \kappa p(s) - \kappa p(s+1)\\ (sq(s))' &= \kappa q(s) + \kappa q(s+1). \end{aligned} \tag{3.33} \]
The first one holds for
\[ p(s) = \int_0^{\infty}\exp\Bigl(-sz - \kappa\int_0^{z}\frac{1-e^{-u}}{u}\,du\Bigr)dz. \tag{3.34} \]
A solution to the second equation is given in Lemma 3.9 for $a = b = \kappa$. For example we have
\[ q(s) = s^{2\kappa-1} + \frac{1}{\Gamma(1-2\kappa)}\int_0^{\infty}e^{-sz}\Bigl(\exp\Bigl(\kappa\int_0^{z}\frac{1-e^{-u}}{u}\,du\Bigr) - 1\Bigr)z^{-2\kappa}\,dz \tag{3.35} \]
if $\frac{1}{2} < \kappa < 1$ by the formula (3.87). If $2\kappa$ is a positive integer then $q(s)$ is a monic polynomial of degree $2\kappa - 1$. For any $\kappa > \frac{1}{2}$ we have
\[ p(s) \sim s^{-1}, \qquad q(s) \sim s^{2\kappa-1} \quad\text{as } s \to \infty. \tag{3.36} \]

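Next we compute the inner products (see (3.81) and (3.82)); the signs below are those consistent with (3.30), (3.33) and with the integral equations (3.45), (3.46):
\[ \begin{aligned} \langle P, p\rangle &= sP(s)p(s) + \kappa\int_{s-1}^{s}P(x)p(x+1)\,dx\\ &= (s-1)P(s-1)p(s-1) + \int_{s-1}^{s}x^{1-\kappa}p(x)\,d\bigl(x^{\kappa}P(x)\bigr) \end{aligned} \tag{3.37} \]
and
\[ \begin{aligned} \langle Q, q\rangle &= sQ(s)q(s) - \kappa\int_{s-1}^{s}Q(x)q(x+1)\,dx\\ &= (s-1)Q(s-1)q(s-1) + \int_{s-1}^{s}x^{1-\kappa}q(x)\,d\bigl(x^{\kappa}Q(x)\bigr). \end{aligned} \tag{3.38} \]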

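First by the behaviour at infinity (see (3.32) and (3.36)) we derive from the first expressions in (3.37) and (3.38) that
\[ \langle P, p\rangle = 2, \qquad \langle Q, q\rangle = 0. \tag{3.39} \]
These equations hold for all $s \ge \beta + 1$. Putting $s = \beta + 1$ in the second expressions for the inner products we derive by inserting the initial conditions (3.31) that
\[ \begin{aligned} \langle P, p\rangle &= (A+B)\beta^{1-\kappa}p(\beta) + A\int_{\beta}^{\beta+1}x^{1-\kappa}p(x)(x-1)^{-\kappa}\,dx^{\kappa}\\ &= (A+B)\beta^{1-\kappa}p(\beta) - A\int_{\beta-1}^{\beta}d\bigl(x^{1-\kappa}p(x)\bigr) \end{aligned} \]
by (3.33). Hence
\[ \langle P, p\rangle = A(\beta-1)^{1-\kappa}p(\beta-1) + B\beta^{1-\kappa}p(\beta) = 2. \tag{3.40} \]
Similarly we derive
\[ \langle Q, q\rangle = A(\beta-1)^{1-\kappa}q(\beta-1) - B\beta^{1-\kappa}q(\beta) = 0. \tag{3.41} \]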

We know by Lemma 3.11 that $q(s)$ has at most $2\kappa - 1$ real zeros. For the sake of simplicity we restrict our analysis to $\kappa > \frac{1}{2}$. In this case $q(s)$ has indeed a positive zero, and we choose $\beta - 1$ to be the largest zero of $q(s)$,
\[ q(\beta - 1) = 0. \tag{3.42} \]
This yields by (3.41)
\[ B = 0. \tag{3.43} \]
Then by (3.40)
\[ A = 2(\beta-1)^{\kappa-1}p(\beta-1)^{-1}. \tag{3.44} \]
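As an illustration of (3.34), (3.33) and (3.44), one can evaluate $p(s)$ by quadrature. The sketch below makes assumptions of its own: the function name, the trapezoid rule, and the truncation of the $z$-integral at a finite point are all choices made here. For $\kappa = 1$ the monic polynomial $q(s) = s - 1$ satisfies the second equation of (3.33), so (3.42) gives $\beta = 2$, and (3.44) then reads $A = 2/p(1)$; numerically this comes out as $2e^{\gamma}$ with $\gamma$ Euler's constant.

```python
import math

def p_adjoint(s, kappa, zmax=40.0, steps=40000):
    """Trapezoid-rule value of (3.34), with the z-integral truncated at zmax."""
    h = zmax / steps
    inner = 0.0      # running value of int_0^z (1 - e^{-u}) / u du
    prev = 1.0       # (1 - e^{-u}) / u tends to 1 as u -> 0
    total = 0.5      # outer integrand equals 1 at z = 0 (half weight)
    for i in range(1, steps + 1):
        z = i * h
        cur = -math.expm1(-z) / z
        inner += 0.5 * (prev + cur) * h
        prev = cur
        val = math.exp(-s * z - kappa * inner)
        total += 0.5 * val if i == steps else val
    return total * h

kappa, s, d = 1.0, 2.0, 1e-3
# Check (s p(s))' = kappa p(s) - kappa p(s+1) by a central difference.
lhs = ((s + d) * p_adjoint(s + d, kappa) - (s - d) * p_adjoint(s - d, kappa)) / (2 * d)
rhs = kappa * (p_adjoint(s, kappa) - p_adjoint(s + 1, kappa))
print(lhs, rhs)

# (3.44) for kappa = 1, beta = 2.
A = 2 / p_adjoint(1.0, 1.0)
print(A, 2 * math.exp(0.5772156649015329))
```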

It turns out that our $\beta$ is the smallest possible one which works. Any value for $\beta - 1$ between the last two zeros of $q(s)$ yields negative $B$ by (3.41); consequently $f(s)$ is negative at $s = \beta$ (see (3.27)), which is worse than the trivial value $f(s) = 0$ acceptable for any lower bound sieve. Furthermore the second largest zero of $q(s)$ (if it exists) is not a good choice because it is too small for the convergence of the series (3.24) and (3.25) (the two largest zeros of $q(s)$ are at distance at least 1 from each other; actually the distance increases to infinity with $\kappa$). We do not bother to prove all these properties here; we are only required to show that the choice (3.42) does secure the convergence in question. Incidentally, if $\kappa$ is a half of an integer then $\beta$ is an algebraic number, since $q(s)$ is a polynomial of degree $2\kappa - 1$ with rational coefficients.

Having chosen $\beta, A, B$ as above one can improve the conditions at infinity (3.28) quite substantially. Indeed, by the inner product formulas we have two integral equations
\[ sq(s)Q(s) = \kappa\int_{s-1}^{s}q(x+1)Q(x)\,dx, \tag{3.45} \]
\[ sp(s)P(s) + \kappa\int_{s-1}^{s}p(x+1)P(x)\,dx = 2. \tag{3.46} \]

The latter holds also for $P(s) = 2$ (because it is another solution to (3.30) and (3.32)), so by subtracting we get
\[ sp(s)(P(s) - 2) = -\kappa\int_{s-1}^{s}p(x+1)(P(x) - 2)\,dx. \tag{3.47} \]
Using (3.36) we infer
\[ s|Q(s)| \ll \int_{s-1}^{s}|Q(x)|\,dx \]
by (3.45) and
\[ s|P(s) - 2| \ll \int_{s-1}^{s}|P(x) - 2|\,dx \]
by (3.47). From these integral inequalities it is easy to derive
\[ Q(s) = O(s^{-s}), \qquad P(s) = 2 + O(s^{-s}) \tag{3.48} \]
whence
\[ F(s) = 1 + O(s^{-s}), \qquad f(s) = 1 + O(s^{-s}). \tag{3.49} \]

Remark. Since $p(s)$ is positive for all $s > 0$ (see the formula (3.34)) it follows by (3.47) that $P(s) - 2$ changes sign in every interval of length one.

3.4. The Functions $H(s)$, $h(s)$

In order to establish the convergence of the series (3.24) and (3.25) we consider the pair of functions $(H, h)$ defined as the continuous solution to the system of differential-difference equations
\[ \begin{aligned} (s^{\kappa+1}H(s))' &= -\kappa s^{\kappa}h(s-1) &&\text{if } s > \beta+1\\ (s^{\kappa+1}h(s))' &= -\kappa s^{\kappa}H(s-1) &&\text{if } s > \beta \end{aligned} \tag{3.50} \]
with the initial conditions
\[ \begin{aligned} s^{\kappa+1}H(s) &= (\beta-1)^{\kappa} &&\text{if } \beta-1 \le s \le \beta+1\\ s^{\kappa+1}h(s) &= \beta^{\kappa} &&\text{if } s = \beta. \end{aligned} \tag{3.51} \]

We shall prove that these functions majorize the partial sums (3.22) for $n$ odd and even respectively, and this shows the convergence. The plan is simple, yet its realization requires some skill. There will be no room for waste in the induction inequalities. Somewhere we have to appeal to our choice of $\beta$ as the largest root of
\[ q(\beta - 1) = 0, \tag{3.52} \]
and we do it neatly by employing again the method of adjoint equations. This time, however, our targets are reversed: we construct the inner products to deduce conditions for $H, h$ at infinity from the initial values (3.51). As before we turn the system (3.50) into two independent equations by forming the linear combinations
\[ U(s) = H(s) + h(s), \qquad V(s) = H(s) - h(s) \tag{3.53} \]
for any $s \ge \beta$. These satisfy
\[ \begin{aligned} (sU(s))' &= -\kappa U(s) - \kappa U(s-1)\\ (sV(s))' &= -\kappa V(s) + \kappa V(s-1) \end{aligned} \tag{3.54} \]
for $s > \beta + 1$, and the initial conditions for $\beta \le s \le \beta + 1$ are
\[ \begin{aligned} s^{\kappa+1}U(s) &= (\beta-1)^{\kappa} + \beta^{\kappa} + (\beta-1)^{\kappa}\int_{\beta}^{s}t^{\kappa}\,d(t-1)^{-\kappa}\\ s^{\kappa+1}V(s) &= (\beta-1)^{\kappa} - \beta^{\kappa} - (\beta-1)^{\kappa}\int_{\beta}^{s}t^{\kappa}\,d(t-1)^{-\kappa}. \end{aligned} \tag{3.55} \]
The adjoint equations are
\[ \begin{aligned} su'(s) &= \kappa u(s) + \kappa u(s+1)\\ sv'(s) &= \kappa v(s) - \kappa v(s+1). \end{aligned} \tag{3.56} \]
Equivalently
\[ \begin{aligned} (s^{-\kappa}u(s))' &= \kappa s^{-\kappa-1}u(s+1)\\ (s^{-\kappa}v(s))' &= -\kappa s^{-\kappa-1}v(s+1). \end{aligned} \tag{3.57} \]

Note that any solution to (3.56) yields a solution to (3.33) by differentiating, so we can link these for convenience. In what follows we use the standard solutions $u(s)$, $v(s)$ which are derived from (3.85); these are
\[ v(s) = 1, \qquad u(s) \sim s^{2\kappa}, \qquad u'(s) = 2\kappa q(s). \tag{3.58} \]
By (3.52) we have $u'(\beta - 1) = 0$, whence by (3.56) we have $u(\beta - 1) = -u(\beta)$. Using this fact we show that the inner product
\[ \langle U, u\rangle = su(s)U(s) - \kappa\int_{s-1}^{s}u(x+1)U(x)\,dx \]
vanishes. Indeed from the general theory $\langle U, u\rangle$ is constant for all $s \ge \beta + 1$. By the other expression (see (3.82))
\[ \langle U, u\rangle = (s-1)u(s-1)U(s-1) + \int_{s-1}^{s}x^{-\kappa}u(x)\,d\bigl(x^{\kappa+1}U(x)\bigr) \]
at $s = \beta + 1$ we infer by inserting the initial conditions (3.55)
\[ \langle U, u\rangle = \beta u(\beta)U(\beta) + (\beta-1)^{\kappa}\int_{\beta}^{\beta+1}u(x)\,d(x-1)^{-\kappa}. \]
Using (3.57) we get
\[ \langle U, u\rangle = \beta u(\beta)U(\beta) + (\beta-1)^{\kappa}\bigl[(\beta-1)^{-\kappa}u(\beta-1) - \beta^{-\kappa}u(\beta)\bigr] = 0 \]
as claimed. In other words we have for all $s \ge \beta + 1$
\[ su(s)U(s) = \kappa\int_{s-1}^{s}u(x+1)U(x)\,dx. \tag{3.59} \]
Similarly one can show that $\langle V, v\rangle$ vanishes; hence for all $s \ge \beta + 1$ we have
\[ sV(s) = -\kappa\int_{s-1}^{s}V(x)\,dx. \tag{3.60} \]
Of course, this can also be checked directly using the initial conditions (3.55). From the integral representations (3.59) and (3.60) we obtain
\[ s|U(s)| \ll \int_{s-1}^{s}|U(x)|\,dx, \qquad s|V(s)| \ll \int_{s-1}^{s}|V(x)|\,dx \]
for all sufficiently large $s$, because $u(s) \sim s^{2\kappa}$ as $s \to \infty$. Hence one derives the following estimates
\[ U(s) \ll s^{-s}, \qquad V(s) \ll s^{-s}. \tag{3.61} \]
Thus we also have
\[ H(s) \ll s^{-s}, \qquad h(s) \ll s^{-s}. \]

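Lemma 3.3. We have
\[ -\int_{\beta}^{\beta+1}t^{\kappa}\,d(t-1)^{-\kappa} < \Bigl(\frac{\beta}{\beta-1}\Bigr)^{\kappa}. \tag{3.62} \]
Proof. We make use of the properties of $u(s)$ already established. First note that $u(s)$ is increasing for $s > \beta - 1$ because $u'(s) = 2\kappa q(s) > 2\kappa q(\beta - 1) = 0$. Hence the function
\[ \delta(t) = \frac{1}{2}\Bigl(\frac{\beta}{t}\Bigr)^{\kappa}\Bigl(1 + \frac{u(t)}{u(\beta)}\Bigr) \]
is increasing for $t \ge \beta$ because
\[ \delta'(t) = \frac{\kappa}{2t}\Bigl(\frac{\beta}{t}\Bigr)^{\kappa}\Bigl(\frac{u(t+1)}{u(\beta)} - 1\Bigr) > 0. \]
Therefore $\delta(t) > \delta(\beta) = 1$ for $t > \beta$. Hence the left-hand side of (3.62) is strictly bounded by
\[ -\int_{\beta}^{\beta+1}\delta(t)\,t^{\kappa}\,d(t-1)^{-\kappa} = -\frac{\beta^{\kappa}}{2}\int_{\beta-1}^{\beta}\Bigl(1 + \frac{u(t+1)}{u(\beta)}\Bigr)dt^{-\kappa}. \]
Integrating the equations (3.57) over $\beta - 1 < t < \beta$ one gets from the above
\[ \frac{1}{2}\Bigl(\frac{\beta}{\beta-1}\Bigr)^{\kappa}\Bigl(1 - \frac{u(\beta-1)}{u(\beta)}\Bigr) = \Bigl(\frac{\beta}{\beta-1}\Bigr)^{\kappa}, \]
which is exactly the right-hand side of (3.62).

Letting
\[ \gamma = \Bigl(\frac{\beta}{\beta-1}\Bigr)^{\kappa} + \int_{\beta}^{\beta+1}t^{\kappa}\,d(t-1)^{-\kappa} \tag{3.63} \]
we have $0 < \gamma < 1$, the lower bound following by Lemma 3.3 and the upper bound being obvious (replace $t^{\kappa}$ by $\beta^{\kappa}$). Note that for $\beta \le s \le \beta + 1$ we have
\[ s^{\kappa+1}h(s) \ge (\beta+1)^{\kappa+1}h(\beta+1) = \gamma(\beta-1)^{\kappa} > 0 \]
so $h(s)$ is positive in the initial segment.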


Lemma 3.4. There exists a constant $0 < \eta < 1$ such that
\[
|V(s)| < \eta U(s) \qquad \text{if } s \ge \beta.
\tag{3.65}
\]

Proof. By the initial conditions (3.55) we get
\[
\frac{(\beta-1)^{\kappa} - \beta^{\kappa}}{(\beta-1)^{\kappa} + \beta^{\kappa}} \le \frac{V(s)}{U(s)} \le \frac{1-\gamma}{1+\gamma}
\]
in $\beta \le s \le \beta + 1$, so (3.65) is true in the initial segment. We shall prove that (3.65) holds true for all $s > \beta + 1$ with the same constant $\eta$. If it failed there would exist $s > \beta + 1$ such that $|V(t)| < \eta U(t)$ for all $t < s$ and $|V(s)| = \eta U(s)$. By (3.60) and (3.59), and since $u$ is increasing, we derive
\[
s|V(s)| \le \kappa\int_{s-1}^{s}|V(t)|\,dt < \eta\kappa\int_{s-1}^{s}U(t)\,dt < \frac{\eta\kappa}{u(s)}\int_{s-1}^{s}U(t)u(t+1)\,dt = \eta sU(s)
\]

which is the desired contradiction.

The immediate consequences of Lemma 3.4 are the following bounds
\[
\frac{1-\eta}{2}\,U(s) < H(s),\ h(s) < \frac{1+\eta}{2}\,U(s)
\tag{3.66}
\]
which show that both functions $H(s)$, $h(s)$ are positive in the whole range $s \ge \beta$ and have the same order of magnitude as $U(s)$ (they decay to zero faster than the exponential function $e^{-s}$).

3.5. The Convergence Problem. Conclusion

As planned we now proceed to estimation of the partial sums of $F(s)$, $f(s)$ given by (3.21).

Lemma 3.5. There exists a constant $\mu > 0$ such that
\[
T_n(s) \le \mu H(s) \qquad \text{if $n$ is odd, } s \ge \beta - 1,
\tag{3.67}
\]
\[
T_n(s) \le \mu h(s) \qquad \text{if $n$ is even, } s \ge \beta.
\tag{3.68}
\]

Proof. By induction on $n$. If $n = 1$ we have for $\beta - 1 \le s \le \beta + 1$
\[
s^{\kappa}T_1(s) = (\beta+1)^{\kappa} - s^{\kappa} < (\beta+1)^{\kappa} \le (\beta+1)^{\kappa+1}(\beta-1)^{-\kappa}s^{\kappa}H(s).
\]


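Now suppose $n \ge 2$ and that the result holds for $n - 1$. If $n$ is even we get by the recurrence formula (3.12) and (3.67)
\[
s^{\kappa}T_n(s) = \int_{s}^{\infty}T_{n-1}(t-1)\,dt^{\kappa} \le \mu\int_{s}^{\infty}H(t-1)\,\kappa t^{\kappa-1}\,dt = \mu s^{\kappa}h(s)
\]
by (3.50). The same argument works for $n$ odd if $s > \beta + 1$, just interchange $H$ and $h$. If $\beta - 1 \le s \le \beta + 1$ we get along similar lines
\[
\begin{aligned}
s^{\kappa}T_n(s) &= (\beta+1)^{\kappa}T_n(\beta+1) + s^{\kappa}f_1(s) \le \mu(\beta+1)^{\kappa}H(\beta+1) + (\beta+1)^{\kappa} - s^{\kappa}\\
&\le \frac{\mu}{\beta+1}(\beta-1)^{\kappa} + (\beta+1)^{\kappa} - s^{\kappa} \le \frac{\mu}{s}(\beta-1)^{\kappa} = \mu s^{\kappa}H(s)
\end{aligned}
\]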

by (3.51) provided $\mu$ is sufficiently large. By Lemma 3.5 we conclude that both series (3.24), (3.25) converge and the resulting functions $F(s)$, $f(s)$ have all the properties derived previously in Section 3.3.

3.6. The Main Theorems

In this section we collect various results from the previous sections and assemble them into the main theorems of the β-sieve. We begin by applying Corollary 3.2, but only for relatively small $n$ for which the error term
\[
\delta = (K-1)K^{n}n^{2}\Bigl(\frac{\beta+n}{\beta-1}\Bigr)^{\kappa n}
\]
can be controlled. Since we do not have adequate estimates for all $n$, at first we delete small primes from the sifting range to force the sums $V_n(D, z)$ to be void for all sufficiently large $n$. Precisely, if
\[
P(z, w) = P(z)/P(w) = \prod_{w \le p < z} p
\]
is the sifting range and $V_n(D, z, w)$ denotes the sum (3.2) restricted to this range, then the summation condition $p_1 \cdots p_m\, p_m^{\beta} < D$ for $m = n - 2$ implies $w^{n+\beta-1} < D$, or else $V_n(D, z, w)$ vanishes. Recall that
\[
V^{+}(D, z, w) = \sum_{d \mid P(z,w)} \lambda_d^{+} g(d)
\]
where $\lambda_d^{+}$ are the weights of the β-sieve of level $D$, and we have
\[
V^{+}(D, z, w) = V(z)/V(w) + \sum_{n\ \mathrm{odd}} V_n(D, z, w).
\]


A similar expansion holds for $V^{-}(D, z, w)$. By Corollary 3.2 we infer
\[
\begin{aligned}
V^{+}(D, z, w) &\le (F(s) + \delta)\,V(z)/V(w),\\
V^{-}(D, z, w) &\ge (f(s) - \delta)\,V(z)/V(w),
\end{aligned}
\tag{3.69}
\]
where $s = \log D/\log z$ and $\delta$ is computed with the largest $n$. Put $\nu = \log D/\log w$ so $n + \beta - 1 < \nu$ and $n < \nu$, whence
\[
\delta < (K-1)K^{\nu}\nu^{2}\Bigl(\frac{\nu+1}{\beta-1}\Bigr)^{\kappa\nu}.
\]
Suppose (1.37) holds with $L > 1$ so (1.36) holds with $K = 1 + L(\log w)^{-1} = 1 + \nu L(\log D)^{-1}$. Take $w = D^{-\varepsilon/\log\varepsilon}$ so $\nu = \varepsilon^{-1}\log\varepsilon^{-1}$ and assume that
\[
\log D > L + e^{4\kappa\varepsilon^{-1}\log^{2}\varepsilon}.
\tag{3.70}
\]
Then $\delta \ll \varepsilon L$ where the implied constant depends only on $\kappa$. In the range $P(w)$ we apply the α-sieve of level $D^{\varepsilon}$ with $\alpha = 9\kappa + 1$ getting by (2.45)
\[
\begin{aligned}
V^{+}(D^{\varepsilon}, w) &\le (1 + \eta)V(w),\\
V^{-}(D^{\varepsilon}, w) &\ge (1 - \eta)V(w),
\end{aligned}
\tag{3.71}
\]
with $\eta = 2\varepsilon e^{9\kappa}(3L)^{10}$ provided $\varepsilon \le e^{-\alpha}$. Combining (3.69) with (3.71) as in Section 1.9 we obtain the sieves $\Lambda^{+}$ and $\Lambda^{-}$ of level $D^{1+\varepsilon}$ such that
\[
\begin{aligned}
V^{+}(D, z) &\le (1 + \eta)(F(s) + \delta)V(z),\\
V^{-}(D, z) &\ge (1 - \eta)(f(s) - \delta)V(z) - 2\eta(F(s) - f(s) + 2\delta)V(z).
\end{aligned}
\]
Hence we conclude the following

Theorem 3.6. Suppose $\kappa > \frac12$. Let $\beta - 1$ be the largest zero of $q(s)$ and $\alpha = 9\kappa + 1$. Choose any $\varepsilon \le e^{-\alpha}$. Let $\Lambda^{+}$ and $\Lambda^{-}$ be the upper bound and the lower bound sieves composed of the β-sieve of level $D$ and the α-sieve of level $D^{\varepsilon}$. Suppose (1.37) and (3.70) hold. Then we have
\[
\begin{aligned}
V^{+}(D, z) &< (F(s) + O(\varepsilon L^{11}))V(z), && \text{if } s > \beta - 1,\\
V^{-}(D, z) &> (f(s) + O(\varepsilon L^{11}))V(z), && \text{if } s > \beta,
\end{aligned}
\tag{3.72}
\]
with the implied constant depending only on $\kappa$. Here $s = \log D/\log z$ and $F(s)$, $f(s)$ are the continuous solutions to the system of differential-difference equations (3.26) with the initial conditions (3.27) (see also (3.34), (3.42), (3.43) and (3.44)).

Some cosmetic refinements of Theorem 3.6 can be made before applications. First of all one can change $D^{1+\varepsilon}$ to $D$ so that the composite sieves $\Lambda^{+}$ and $\Lambda^{-}$ have level $D$. This reduces $s$ to $(1 + \varepsilon)^{-1}s$ but it does not alter the results since the functions $F(s)$, $f(s)$ are of Lipschitz type. Furthermore one can choose $\varepsilon = (\log\log\log D)^{3}/\log\log D$. This choice requires $D$ to be sufficiently large in terms of $\kappa$, and $\log D > 2L$ in place of (3.70). With these modifications, applying Theorem 3.6 to the sifting sequence $A$ we get


Theorem 3.7. Suppose (1.37) holds with $\kappa > \frac12$ and $L > 1$. Let $D > e^{2L}$. Then
\[
(f(s) - \Delta)XV(z) - R(A, D) \le S(A, z) \le (F(s) + \Delta)XV(z) + R(A, D)
\tag{3.74}
\]
where $s = \log D/\log z$, $\Delta = cL^{11}(\log\log\log D)^{3}(\log\log D)^{-1}$ and
\[
R(A, D) = \sum_{d < D,\ d \mid P(z)} |r_d(A)|.
\]
We shall analyze particular cases of Theorem 3.7 in the forthcoming chapters. Two cases, $\kappa = 1$ and $\kappa = \frac12$, will receive special attention. Although Theorems 3.6 and 3.7 do not cover the case of the half-dimensional sieve, one can derive the desired bounds from the available results by continuity; namely one can use the results for $\kappa = \frac12 + \varepsilon$ (recall that this choice satisfies (1.37) if $\kappa = \frac12$ does). The functions $F_\kappa(s)$, $f_\kappa(s)$ were considered only for $\kappa > \frac12$, yet one can extend our analysis to the case $\kappa = \frac12$ directly or by continuity. In the limit $\kappa \to \frac12$ the resulting functions $F(s)$, $f(s)$ inherit some of the properties of $F_\kappa(s)$, $f_\kappa(s)$; in particular $F(s)$, $f(s)$ solve the system of differential-difference equations (3.26) with $\kappa = \frac12$. However, the auxiliary functions $H_\kappa(s)$, $h_\kappa(s)$, which were employed in our approach to the convergence problem in Section 3.5, are lost. The sieve limit $\beta(\kappa) > 1$ moves continuously towards $\beta(\frac12) = 1$ while the initial conditions (3.27) become $f(1) = 0$ and
\[
F(s) = 2(e^{\gamma}/\pi s)^{1/2} \qquad \text{if } 0 < s \le 2.
\tag{3.75}
\]
Here the constant $A = 2(e^{\gamma}/\pi)^{1/2}$ is derived as the limit of (3.44) by Lemma 3.10 for $a = -b = \kappa = \frac12$.

For $\kappa = 1$ (the linear sieve) we have $q(s) = s - 1$ so $\beta(1) = 2$, and $A = 2p(1)^{-1}$ by (3.44) where $p(s)$ is given by (3.34). To compute $p(1)$ we write
\[
p(s) = e^{-\gamma}\int_{0}^{\infty}\exp(-sz - E(z))\,z^{-1}\,dz
\]
by (3.90). Hence and by the differential-difference equation (3.33) we get
\[
p(s+1) = -sp'(s) = e^{-\gamma}\int_{0}^{\infty}\exp(-z - E(z/s))\,dz \to e^{-\gamma}
\]
as $s \to 0$. Therefore $p(1) = e^{-\gamma}$ and $A = 2e^{\gamma}$. The initial conditions (3.27) become $f(2) = 0$ and
\[
F(s) = 2e^{\gamma}s^{-1} \qquad \text{if } 1 \le s \le 3.
\tag{3.76}
\]


3.7. Numerical Tables

Since the sieve limit demonstrates most evidently the power of the method, we end this chapter by giving a few insightful inequalities and a table of numerical values. We have the following asymptotics
\[
\beta - 1 \sim \pi e^{\gamma}(2\kappa - 1)^{2} \quad \text{as } \kappa \to \tfrac12+, \qquad \beta - 1 \sim c\kappa \quad \text{as } \kappa \to \infty,
\]
where $\gamma = .577\ldots$ is the Euler constant and $c = 3.591\ldots$ is the number which solves the equation $(c/e)^{c} = e$; it is the same one which was encountered in Sections 2.2 and 2.5. We also have the following neat inequalities
\[
\begin{aligned}
0 &< \beta - 1 < \kappa && \text{if } \tfrac12 < \kappa < 1,\\
\kappa &< \beta - 1 < \kappa + \sqrt{\kappa(\kappa-1)} && \text{if } 1 < \kappa < \tfrac32,\\
\kappa + \sqrt{\kappa(\kappa-1)} &< \beta - 1 < c\kappa - \frac{2c}{c+1} && \text{if } \kappa > \tfrac32.
\end{aligned}
\]
The last inequality is due to H.G. Diamond, H. Halberstam and H.-E. Richert [DHR]. For the constant $A = A(\kappa)$ in the initial conditions for $F(s)$ one can show that
\[
1 + \frac{\kappa}{\beta} < \frac{A}{2(\beta-1)^{\kappa}} < 1 + \frac{\kappa}{\beta-1}
\]
and $A < 2(c\kappa)^{\kappa}$, see [I4].
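The constant $c$ above is defined only implicitly. As a minimal sketch (the function name `solve_c` and the bracketing interval are illustrative choices, not part of the notes), one can recover its value by bisection on the equivalent equation $c(\log c - 1) = 1$, whose left-hand side is increasing for $c > 1$:

```python
import math


def solve_c(lo=math.e, hi=10.0, tol=1e-12):
    """Bisection for the root of c*(log c - 1) = 1, i.e. (c/e)^c = e.

    The bracket [e, 10] is an assumption: the function is -1 at c = e
    and positive at c = 10, and it is increasing in between.
    """
    def f(c):
        return c * (math.log(c) - 1.0) - 1.0

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


c = solve_c()
```

The computed root can then be compared with the quoted value $3.591\ldots$ and substituted back into $(c/e)^{c}$ as a consistency check.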


The following numerical values were computed by F. Romani (in Pisa, 1977) and J. Van de Lune-H.J.J. Te Riele (in Amsterdam, 1979).

    κ       β        A
    0.5     1        2(e^γ/π)^{1/2}
    0.55    1.0340   1.6066
    0.6     1.1042   1.7254
    0.65    1.1922   1.8631
    0.7     1.2912   2.0212
    0.75    1.3981   2.2020
    0.8     1.5107   2.4082
    0.85    1.6279   2.6431
    0.9     1.7489   2.9106
    0.95    1.8731   3.2152

    1       2        2e^γ
    1.1     2.2605   4.4084
    1.2     2.5286   5.5109
    1.3     2.8028   6.9528
    1.4     3.0822   8.8464
    1.5     3.3660   11.3442

    2/3     1.2242   1.9134
    3/4     1.3981   2.2020
    4/5     1.5107   2.4082
    5/6     1.5884   2.5614

Note that $\beta(\kappa) < 2\kappa$ if $\frac12 < \kappa < 1$.


APPENDIX FOR CHAPTER 3

The Differential-Difference Equations

Differential equations with delayed argument occur in mathematical models of real time processes. In number theory one encounters solutions to differential-difference equations in asymptotics for the average value of a multiplicative function over integers having prime divisors of certain type in selected segments. All we need in sieve theory are the linear equations of type
\[
sQ'(s) + aQ(s) + bQ(s-1) = 0 \qquad \text{if } s > \beta
\tag{3.77}
\]
where $a$, $b$ are constant, real coefficients and $\beta > 1$. For example the β-sieve theory makes use of this for $a = \kappa$ and $b = \pm\kappa$. Different coefficients are relevant to other sieves (see A4.3). Equivalently we write (3.77) as
\[
(s^{a}Q(s))' = -bs^{a-1}Q(s-1) \qquad \text{if } s > \beta.
\tag{3.78}
\]

Hence, given $Q(s)$ in the initial segment $\beta - 1 < s \le \beta$, it extends as a continuous function uniquely to all $s > \beta$ by repeated integration. Suppose $Q(s)$ is smooth inside the initial segment; then it is smooth everywhere except for the points $\beta + n$ with $n = 0, 1, 2, \ldots$ at which it has continuous derivatives of order $\le n$. In practice $Q(s)$ is not smooth at these points so we are not able to write a simple formula for $Q(s)$. A standard technique of solving the differential-difference equation (3.77) applies the Laplace transform to get an ordinary differential equation, and when the latter is solved one gets $Q(s)$ by the Laplace inverse transform (in the complex plane). In these lectures we employ a technique of the adjoint equation which, besides being more elementary, reveals the properties of a solution required for the development of the β-sieve at once. The adjoint to (3.77) is the following equation with advanced argument
\[
(sq(s))' = aq(s) + bq(s+1) \qquad \text{if } s > 0.
\tag{3.79}
\]
Equivalently we write this as
\[
(s^{1-a}q(s))' = bs^{-a}q(s+1) \qquad \text{if } s > 0.
\tag{3.80}
\]

The equation with advanced argument as above usually has a nice solution $q(s) \not\equiv 0$ of $C^{\infty}$-class. There is a reasonable explanation. Think of $s$ as a time variable. The equation (3.77) describes the process $Q(s)$ at the moment $s$ which happens after elapse of a finite time from the initial period. To the contrary, (3.80) describes $q(s)$ at the present moment from which the beginning of the process is distanced infinitely far away towards the future. That is why $q(s)$ is smooth at any $s$; in fact $q(s)$ is holomorphic in $\operatorname{Re} s > 0$, and consequently it will be possible to give a simple expression for $q(s)$. With a pair of functions $Q(s)$, $q(s)$ we associate the "inner product"
\[
\langle Q, q\rangle = sQ(s)q(s) - b\int_{s-1}^{s}Q(x)q(x+1)\,dx
\tag{3.81}
\]

which is defined for $s > \beta$. The key property of this product is that it is constant for adjoint functions. Indeed, differentiating (3.81) we get
\[
sQ'(s)q(s) + Q(s)(sq(s))' - bQ(s)q(s+1) + bQ(s-1)q(s) = 0
\]
by invoking (3.77) and (3.79). Using (3.78) and (3.80) we can also write the inner product by partial integration as follows
\[
\langle Q, q\rangle = (s-1)Q(s-1)q(s-1) + \int_{s-1}^{s}x^{1-a}q(x)\,d\bigl(x^{a}Q(x)\bigr).
\tag{3.82}
\]
The inner product is instrumental for studying local properties of $Q(s)$ whose behavior is often erratic and is difficult to grasp by methods of complex analysis. Since the inner product is constant, one can look at it as an integral representation for $Q(s)$ over a unit segment. Concerning the kernel function $q(s)$, think of it as a simple given function having regular asymptotic behavior. For instance, if $a + b = n + 1$ is a positive integer then $q(s)$ is a polynomial of degree $n$,
\[
q(s) = \sum_{0 \le \ell \le n}\binom{n}{\ell}a_{\ell}\,s^{n-\ell}
\]
where $a_0 = 1$ and $a_\ell$ with $\ell > 0$ are determined by the recurrence formula
\[
\ell a_\ell + b\sum_{0 \le j < \ell}\binom{\ell}{j}a_j = 0.
\]
In particular for $a + b = 1, 2, 3$ we get
\[
q(s) = 1, \qquad s - b, \qquad s^{2} - 2bs + b(b - \tfrac12)
\tag{3.83}
\]
respectively, hence for $a = b = \kappa = \frac12, 1, \frac32$ we have
\[
q(s) = 1, \qquad s - 1, \qquad s^{2} - 3s + \tfrac32.
\tag{3.84}
\]
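The recurrence for the coefficients $a_\ell$ is easy to run mechanically. The following is a small sketch in exact rational arithmetic (the helper names `q_coeffs`, `q_poly`, `evaluate`, `adjoint_residual` and `beta_three_halves` are hypothetical, introduced only for illustration). It builds the polynomial $q(s)$ for integer $a + b = n + 1$, evaluates the residual of the adjoint equation (3.79), and, for $a = b = \frac32$, computes $\beta = \alpha + 1$ from the largest zero $\alpha$ of $q$.

```python
import math
from fractions import Fraction
from math import comb


def q_coeffs(b, n):
    """a_0..a_n from l*a_l + b*sum_{j<l} C(l,j)*a_j = 0 with a_0 = 1."""
    a = [Fraction(1)]
    for l in range(1, n + 1):
        s = sum(comb(l, j) * a[j] for j in range(l))
        a.append(-Fraction(b) * s / l)
    return a


def q_poly(b, n):
    """Coefficients of q(s) = sum_l C(n,l) a_l s^(n-l), highest power first."""
    a = q_coeffs(b, n)
    return [comb(n, l) * a[l] for l in range(n + 1)]


def evaluate(coeffs, s):
    """Horner evaluation of a polynomial given highest power first."""
    val = Fraction(0)
    for c in coeffs:
        val = val * s + c
    return val


def adjoint_residual(b, n, s):
    """(s q(s))' - a q(s) - b q(s+1) with a = n + 1 - b; zero if (3.79) holds."""
    a = n + 1 - Fraction(b)
    q = q_poly(b, n)
    sq = q + [Fraction(0)]                      # coefficients of s*q(s)
    deg = len(sq) - 1
    dsq = [c * (deg - i) for i, c in enumerate(sq[:-1])]  # derivative of s*q(s)
    return evaluate(dsq, s) - a * evaluate(q, s) - Fraction(b) * evaluate(q, s + 1)


def beta_three_halves():
    """beta = 1 + largest zero of q for a = b = 3/2 (quadratic case only)."""
    c2, c1, c0 = (float(x) for x in q_poly(Fraction(3, 2), 2))
    alpha = (-c1 + math.sqrt(c1 * c1 - 4 * c2 * c0)) / (2 * c2)
    return alpha + 1
```

For $a = b = \frac32$ the value returned by `beta_three_halves()` can be compared with the entry for $\kappa = 1.5$ in the table of Section 3.7.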

In general $q(s)$ can be given by the contour integral
\[
q(s) = \frac{\Gamma(a+b)}{2\pi i}\int_{C}z^{-a-b}\exp\Bigl(sz + b\int_{0}^{z}(1 - e^{u})u^{-1}\,du\Bigr)\,dz
\tag{3.85}
\]
where $C$ has the shape

[figure: the contour $C$]

and the power function is defined using $-\pi < \arg z \le \pi$. Indeed, the function being integrated has exponential decay as $z \to \infty$ along $C$, thus we can integrate by parts and verify (3.79) as follows
\[
\begin{aligned}
sq'(s) &= \frac{\Gamma(a+b)}{2\pi i}\int_{C}z^{1-a-b}\exp\Bigl(b\int_{0}^{z}(1 - e^{u})u^{-1}\,du\Bigr)\,de^{sz}\\
&= -\frac{\Gamma(a+b)}{2\pi i}\int_{C}e^{sz}\,d\Bigl(z^{1-a-b}\exp\Bigl(b\int_{0}^{z}(1 - e^{u})u^{-1}\,du\Bigr)\Bigr)\\
&= \frac{\Gamma(a+b)}{2\pi i}\int_{C}z^{-a-b}(be^{z} + a - 1)\exp\Bigl(sz + b\int_{0}^{z}(1 - e^{u})u^{-1}\,du\Bigr)\,dz\\
&= (a-1)q(s) + bq(s+1).
\end{aligned}
\]
It is possible and convenient to arrange $q(s)$ as an integral over the positive reals, simply by pressing $C$ to the negative real axis and changing the variable $z$ into $-z$. This works right if $a + b < 1$, otherwise there is a problem of convergence. We overcome the problem by expanding
\[
R(z) = \exp\Bigl(b\int_{0}^{z}(1 - e^{u})u^{-1}\,du\Bigr)
\]
into Taylor series
\[
R(z) = \sum_{0 \le \ell < N}R^{(\ell)}(0)\frac{z^{\ell}}{\ell!} + R_N(z),
\]


say, and integrating all but the remainder term $R_N(z)$ explicitly. Using the formula
\[
\frac{1}{2\pi i}\int_{C}e^{sz}z^{-\nu}\,dz = \frac{s^{\nu-1}}{\Gamma(\nu)}
\]
we obtain
\[
q(s) = \sum_{0 \le \ell < N}\binom{a+b-1}{\ell}R^{(\ell)}(0)\,s^{a+b-1-\ell} + \frac{\Gamma(a+b)}{2\pi i}\int_{C}e^{sz}R_N(z)z^{-a-b}\,dz
\]
where
\[
\binom{a+b-1}{\ell} = \frac{\Gamma(a+b)}{\ell!\,\Gamma(a+b-\ell)}.
\]
Here $R_N(z) \ll |z|^{N}$ as $z \to 0$, so if $a + b \le N$ one can press the contour of integration to the two copies of the half-line $(-\infty, 0)$ having different orientations. When approaching from below we reach the value $z^{-a-b} = |z|^{-a-b}e^{-\pi i(a+b)}$ while from above we reach the complex conjugate of this value. Having done this the contour integral becomes
\[
\frac{1}{2\pi i}\int_{C}e^{sz}R_N(z)z^{-a-b}\,dz = \pi^{-1}\sin(\pi(a+b))\int_{0}^{\infty}e^{-sz}R_N(-z)z^{-a-b}\,dz.
\]
Finally, using the functional equation for the gamma function
\[
\Gamma(a+b)\Gamma(1-a-b)\sin\pi(a+b) = \pi
\]
we conclude the following

Lemma 3.9. If $a + b \le N$ we have
\[
q(s) = \sum_{0 \le \ell < N}\binom{a+b-1}{\ell}R^{(\ell)}(0)\,s^{a+b-1-\ell} + \frac{1}{\Gamma(1-a-b)}\int_{0}^{\infty}e^{-sz}R_N(-z)z^{-a-b}\,dz.
\]

We shall call $q(s) = q(s; a, b)$ given by (3.85) the standard solution to (3.79). It follows from Lemma 3.9 that $q(s; a, b)$ is holomorphic in $\operatorname{Re} s > 0$ and it is an entire function of either variable $a$ or $b$. If $a + b < 1$ the formula of Lemma 3.9 yields
\[
q(s) = \frac{1}{\Gamma(1-a-b)}\int_{0}^{\infty}\exp\Bigl(-sz + b\int_{0}^{z}(1 - e^{-u})u^{-1}\,du\Bigr)z^{-a-b}\,dz
\tag{3.86}
\]
and if $a + b < 2$ it yields
\[
q(s) = s^{a+b-1} + \frac{1}{\Gamma(1-a-b)}\int_{0}^{\infty}e^{-sz}\Bigl\{\exp\Bigl(b\int_{0}^{z}(1 - e^{-u})u^{-1}\,du\Bigr) - 1\Bigr\}z^{-a-b}\,dz.
\tag{3.87}
\]
In general it follows by Lemma 3.9 that
\[
q(s) \sim s^{a+b-1}, \qquad \text{as } s \to \infty.
\tag{3.88}
\]


Lemma 3.10. If $a + b < 1$ and $a < 1$ then
\[
q(s) \sim e^{b\gamma}\,\frac{\Gamma(1-a)}{\Gamma(1-a-b)}\,s^{a-1}, \qquad \text{as } s \to 0+.
\tag{3.89}
\]

Proof. We have
\[
\gamma = \int_{0}^{1}(1 - e^{-u})u^{-1}\,du - \int_{1}^{\infty}e^{-u}u^{-1}\,du
\tag{3.90}
\]
and
\[
E(z) = \int_{z}^{\infty}e^{-u}u^{-1}\,du = \int_{0}^{z}(1 - e^{-u})u^{-1}\,du - \log z - \gamma.
\tag{3.91}
\]
Hence by (3.86) we get
\[
\begin{aligned}
e^{-b\gamma}\,\Gamma(1-a-b)\,q(s) &= \int_{0}^{\infty}\exp(-sz + bE(z))\,z^{-a}\,dz\\
&= \Gamma(1-a)s^{a-1} + \int_{0}^{\infty}e^{-sz}(e^{bE(z)} - 1)z^{-a}\,dz.
\end{aligned}
\]
The last integral is bounded by its absolute value at $s = 0$, hence the asymptotic (3.89).

We have seen that the zeros of $q(s)$ for $(a, b) = (\kappa, \kappa)$ play a role in the β-sieve theory. In general the distribution of zeros of $q(s)$ is a fascinating subject of its own merit. Here we present only a few elementary results.

Lemma 3.11. The number of real, positive zeros of $q(s)$ is less than $a + b$.

Proof. If $a + b$ is a positive integer then $q(s)$ is a polynomial of degree $a + b - 1$ so the assertion is true. If $a + b < 1$ then $q(s)$ is positive by (3.86) so the assertion is true. Now suppose $n < a + b < n + 1$ where $n$ is a positive integer. Differentiating the equation (3.79) or the integral formula (3.85) $n$ times we find that $\Gamma(a+b)^{-1}\Gamma(a+b-n)q^{(n)}(s)$ is the standard solution to (3.79) with the coefficient $a$ reduced to $a - n$, therefore $q^{(n)}(s)$ is positive by virtue of the previous observation. This implies $q(s)$ has at most $n$ zeros.

Lemma 3.11. If $b \le 0$ then $q(s)$ is positive for all $s > 0$.

Proof. By (3.88) it follows that $q(s)$ is positive for all sufficiently large $s$. If $b = 0$ then $q(s) = s^{a-1} > 0$ for all $s > 0$. Let $b < 0$ and suppose $q(s)$ has a zero, say $\alpha$ is the largest one. Then $q'(\alpha) > 0$, $q(\alpha + 1) > 0$ and $\alpha q'(\alpha) = bq(\alpha + 1) < 0$, which is a contradiction.


Lemma 3.12. If $b > 0$ and $a + b > 1$ then $q(s)$ has a zero.

Proof. First we show that if $q'(s)$ has a zero, say $\alpha_0 > 0$, then $q(s)$ has a zero $\alpha > \alpha_0$. Indeed, assuming $\alpha_0$ is the largest zero of $q'(s)$ we derive $q(\alpha_0 + 1) > q(\alpha_0)$ and
\[
0 = \alpha_0 q'(\alpha_0) = (a-1)q(\alpha_0) + bq(\alpha_0 + 1) > (a + b - 1)q(\alpha_0),
\]
hence $q(\alpha_0)$ is negative so $q(s)$ must have a zero $\alpha > \alpha_0$. Now by this property we can reduce the proof by repeated differentiation to the case $1 < a + b \le 2$. If $a + b = 2$ then $q(s) = s - b$ so $\alpha = b$ is the zero of $q(s)$. If $1 < a + b < 2$ then by (3.87) it follows that $q(s)$ is negative as $s \to 0+$ while it is positive as $s \to \infty$, so $q(s)$ has a zero.

Remarks. Arguing as above one can show that the largest zero $\alpha$ of $q(s)$ is simple and it is a continuous function of the coefficients $a$, $b$ for $b > 0$ and $a + b > 1$. Quite easily one can show by playing with the equation (3.79) that
\[
\begin{aligned}
0 &< \alpha < b && \text{if } 1 < a + b < 2,\\
b &< \alpha < b + \bigl(\tfrac{b}{2}(a + b - 2)\bigr)^{1/2} && \text{if } 2 < a + b < 3,\\
\alpha &> b + \bigl(\tfrac{b}{2}(a + b - 2)\bigr)^{1/2} && \text{if } a + b > 3.
\end{aligned}
\tag{3.92}
\]

The theory of the equations (3.77) and (3.79) has been developed further by several researchers in sieve methods, notably by H. Diamond, H. Halberstam, H. -E. Richert and their students. I refer to two recent Ph.D. theses by F. Wheeler [W] and D. Bradley [B] where one can find up-to-date progress and references.


CHAPTER IV

THE Λ2 -SIEVE

A powerful and elegant method of an upper bound sieve came from Atle Selberg [S1]. Selberg's method yields results of great generality; it is simpler than combinatorial sieves at the start, though equally complex in its advanced forms.

4.1. General Results

Recall that an upper bound sieve of level $D$ is a sequence of real numbers $\lambda_d$ for $d < D$ with $\lambda_1 = 1$ such that
\[
\sum_{d \mid m}\lambda_d \ge 0 \qquad \text{for all } m \in \mathbb{N}.
\tag{4.1}
\]
Hereafter the superscript $+$ is omitted for notational simplicity. This positivity condition was quite difficult to get a hold on in the combinatorial sieve. Selberg made it very easy by choosing $\lambda_d$ such that
\[
\sum_{d \mid m}\lambda_d = \Bigl(\sum_{d \mid m}\rho_d\Bigr)^{2}
\tag{4.2}
\]
where $\{\rho_d\}$ is another sequence of real numbers with
\[
\rho_1 = 1.
\tag{4.3}
\]
Since the squares are non-negative such a choice guarantees (4.1) no matter what $\rho_d$ are! Selberg's choice amounts to
\[
\lambda_d = \sum_{[d_1, d_2] = d}\rho_{d_1}\rho_{d_2}.
\tag{4.4}
\]
In order to control the level we assume $\rho_d$ are supported on integers $< \sqrt{D}$,
\[
\rho_d = 0 \qquad \text{if } d \ge \sqrt{D}.
\tag{4.5}
\]


Hence the resulting sieve $\{\lambda_d\}$ has level of support $D$. Following Selberg we call it the $\Lambda^2$-sieve of level $D$. Applying the $\Lambda^2$-sieve to the sequence $A = (a_n)$ in the sifting range $P$ we get
\[
S(A, P) = \sum_{(n,P)=1}a_n \le \sum_{n}a_n\Bigl(\sum_{d \mid (n,P)}\rho_d\Bigr)^{2} = \mathop{\sum\sum}_{d_1, d_2 \mid P}\rho_{d_1}\rho_{d_2}\,|A_{[d_1,d_2]}| = XG + R(A, P, \Lambda^2)
\]
where
\[
G = \mathop{\sum\sum}_{d_1, d_2 \mid P}g([d_1, d_2])\rho_{d_1}\rho_{d_2}
\tag{4.6}
\]
and
\[
R(A, P, \Lambda^2) = \mathop{\sum\sum}_{d_1, d_2 \mid P}\rho_{d_1}\rho_{d_2}\,r_{[d_1,d_2]}(A).
\tag{4.7}
\]
The task before us is to make this general inequality optimal. Forgetting for a moment about the remainder term $R(A, P, \Lambda^2)$ we wish to minimize $G$ with respect to the unknown numbers $\rho_d$ subject to (4.3) and (4.5). The ensuing numbers will satisfy
\[
|\rho_d| \le 1,
\tag{4.8}
\]
hence the remainder term is automatically under control. The expression (4.6) is a quadratic form in $\rho_d$. In order to find the minimum of $G$ it helps to diagonalize. In the presentation below it goes without saying that $\rho_d$ is supported on, and the relevant variables of summation run over, the divisors of $P$ (thus over squarefree numbers). Furthermore we can assume that
\[
0 < g(p) < 1 \ \text{ if } p \mid P, \qquad g(p) = 0 \ \text{ if } p \nmid P.
\tag{4.9}
\]
Let $h(d)$ be the multiplicative function defined by
\[
h(p) = \frac{g(p)}{1 - g(p)}.
\tag{4.10}
\]


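We obtain
\[
\begin{aligned}
G &= \mathop{\sum\sum\sum}_{abc \mid P}g(abc)\rho_{ac}\rho_{bc}
= \sum_{c}g(c)\mathop{\sum\sum}_{(a,b)=1}g(a)g(b)\rho_{ac}\rho_{bc}\\
&= \sum_{c}g(c)\sum_{d}\mu(d)g(d)^{2}\Bigl(\sum_{m}g(m)\rho_{cdm}\Bigr)^{2}
= \sum_{d \mid P}h(d)^{-1}\Bigl(\sum_{m \equiv 0\,(\mathrm{mod}\ d)}g(m)\rho_m\Bigr)^{2}.
\end{aligned}
\]
Hence by the linear change of variables
\[
\xi_d = \mu(d)\sum_{m \equiv 0\,(\mathrm{mod}\ d)}g(m)\rho_m
\tag{4.11}
\]
we obtain the diagonal form
\[
G = \sum_{d \mid P}h(d)^{-1}\xi_d^{2}.
\tag{4.12}
\]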
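The diagonalization can be checked by brute force on a small sifting range. The sketch below is only an illustration: the choice $P = 2\cdot 3\cdot 5\cdot 7$, the density $g(p) = 1/(p+1)$ and the test vector `RHO` are all hypothetical. It compares the quadratic form (4.6) with the diagonal form (4.12) in exact arithmetic, and also inverts the change of variables (4.11) by Möbius inversion over the divisors of $P$.

```python
from fractions import Fraction
from itertools import combinations
from math import gcd, prod

PRIMES = [2, 3, 5, 7]  # illustrative sifting range, P = 210
DIVS = sorted(prod(c) for r in range(len(PRIMES) + 1)
              for c in combinations(PRIMES, r))


def mu(d):
    """Moebius function on squarefree divisors of P."""
    return -1 if sum(1 for p in PRIMES if d % p == 0) % 2 else 1


def g(d):
    """Hypothetical multiplicative density with g(p) = 1/(p+1)."""
    return prod((Fraction(1, p + 1) for p in PRIMES if d % p == 0),
                start=Fraction(1))


def h(d):
    """h(p) = g(p)/(1 - g(p)), extended multiplicatively."""
    return prod((g(p) / (1 - g(p)) for p in PRIMES if d % p == 0),
                start=Fraction(1))


# An arbitrary test vector rho_d on the divisors of P (not the optimal one).
RHO = {d: Fraction(1 + d % 5, d + 2) for d in DIVS}

# Quadratic form (4.6).
G_direct = sum(g(d1 * d2 // gcd(d1, d2)) * RHO[d1] * RHO[d2]
               for d1 in DIVS for d2 in DIVS)

# Change of variables (4.11) and diagonal form (4.12).
XI = {d: mu(d) * sum(g(m) * RHO[m] for m in DIVS if m % d == 0) for d in DIVS}
G_diag = sum(XI[d] ** 2 / h(d) for d in DIVS)

# Recover rho from xi by summing xi over multiples of l.
RHO_back = {l: mu(l) / g(l) * sum(XI[d] for d in DIVS if d % l == 0)
            for l in DIVS}
```

Because everything is computed with `Fraction`, agreement of `G_direct` and `G_diag` is an exact identity rather than a floating-point coincidence.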

We still have to reinterpret the condition (4.3) in terms of the new variables $\xi_d$. To this end we use the Möbius inversion (see A1.1) to convert (4.11) into
\[
\rho_\ell = \frac{\mu(\ell)}{g(\ell)}\sum_{d \equiv 0\,(\mathrm{mod}\ \ell)}\xi_d.
\tag{4.13}
\]
In particular for $\ell = 1$ this gives the linear equation
\[
\sum_{d \mid P}\xi_d = 1.
\tag{4.14}
\]
Moreover one observes by (4.11) and (4.13) that the support conditions (4.5) for $\rho_d$ are equivalent to these for $\xi_d$,
\[
\xi_d = 0 \qquad \text{if } d \ge \sqrt{D}.
\tag{4.15}
\]


Now our target is to minimize (4.12) on the hyperplane (4.14). Applying Cauchy's inequality to (4.14) we derive $GH \ge 1$ where
\[
H = \sum_{d < \sqrt{D},\ d \mid P}h(d)
\tag{4.16}
\]
so $G$ cannot be smaller than $H^{-1}$. The equality
\[
GH = 1
\tag{4.17}
\]
holds for
\[
\xi_d = h(d)H^{-1} \qquad \text{if } d < \sqrt{D}.
\tag{4.18}
\]
Note that
\[
H \le \sum_{d \mid P}h(d) = \prod_{p \mid P}(1 + h(p)) = \prod_{p \mid P}(1 - g(p))^{-1} = V(P)^{-1}
\tag{4.19}
\]
so $G \ge V(P)$. Next we compute $\rho_\ell$ by inserting (4.18) into (4.13) getting
\[
\mu(\ell)g(\ell)\rho_\ell H = \sum_{\substack{m < \sqrt{D}\\ m \equiv 0\,(\mathrm{mod}\ \ell)}}h(m),
\]
that is
\[
\rho_\ell = \mu(\ell)\,\frac{h(\ell)}{g(\ell)}\,H^{-1}\sum_{\substack{d < \sqrt{D}/\ell\\ (d,\ell)=1}}h(d).
\tag{4.20}
\]
Now we show (4.8). To this end we group the terms in (4.16) according to the greatest common divisor of $d$ and $\ell$ getting
\[
H = \sum_{k \mid \ell}\sum_{\substack{d < \sqrt{D}\\ (d,\ell)=k}}h(d) = \sum_{k \mid \ell}h(k)\sum_{\substack{m < \sqrt{D}/k\\ (m,\ell)=1}}h(m) \ge \Bigl(\sum_{k \mid \ell}h(k)\Bigr)\sum_{\substack{m < \sqrt{D}/\ell\\ (m,\ell)=1}}h(m) = \mu(\ell)\rho_\ell H
\]
and this proves (4.8) (this neat estimate is due to J.H. van Lint and H.-E. Richert [LR]). From this one gets directly by (4.4)
\[
|\lambda_d| \le \tau_3(d).
\tag{4.21}
\]
From the above results we conclude the following


Theorem 4.1. Let $A = (a_n)$ be a finite sequence of non-negative numbers and $P$ be a finite product of distinct primes. For every $d \mid P$ we write
\[
|A_d| = \sum_{n \equiv 0\,(\mathrm{mod}\ d)}a_n = g(d)X + r_d(A)
\tag{4.22}
\]
where $X > 0$ and $g(d)$ is a multiplicative function with $0 < g(p) < 1$ for $p \mid P$. Let $h(d)$ be the multiplicative function given by $h(p) = g(p)(1 - g(p))^{-1}$ and
\[
H = \sum_{d < \sqrt{D},\ d \mid P}h(d)
\tag{4.23}
\]
for some $D > 1$. Then we have
\[
S(A, P) = \sum_{(n,P)=1}a_n \le XH^{-1} + R(A, P, \Lambda^2)
\tag{4.24}
\]
where
\[
R(A, P, \Lambda^2) = \sum_{d \mid P}\lambda_d r_d(A)
\tag{4.25}
\]
with $\lambda_d$ given by (4.4) and (4.20).

Using (4.21) one estimates the remainder term crudely by
\[
|R(A, P, \Lambda^2)| \le \sum_{d < D,\ d \mid P}\tau_3(d)\,|r_d(A)|.
\tag{4.26}
\]
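To make the construction concrete, here is a minimal sketch of the $\Lambda^2$-sieve for the density $g(d) = d^{-1}$ (sifting an interval). The primes in `PRIMES`, the level `D` and all helper names are illustrative assumptions. The code computes $H$ from (4.23), the weights $\rho_\ell$ from (4.20) and $\lambda_d$ from (4.4) in exact arithmetic, so that the properties $\rho_1 = 1$, $|\rho_\ell| \le 1$, $GH = 1$ and $|\lambda_d| \le \tau_3(d)$ can be inspected directly.

```python
from fractions import Fraction
from itertools import combinations
from math import gcd, prod

PRIMES = [2, 3, 5, 7, 11, 13]   # illustrative sifting range P
D = 100                          # illustrative sieve level
P_PROD = prod(PRIMES)
DIVS = sorted(prod(c) for r in range(len(PRIMES) + 1)
              for c in combinations(PRIMES, r))


def omega(d):
    return sum(1 for p in PRIMES if d % p == 0)


def mu(d):
    return -1 if omega(d) % 2 else 1


def g(d):
    """Density g(d) = 1/d on divisors of P (assumption of this sketch)."""
    return Fraction(1, d)


def h(d):
    """h(p) = g(p)/(1 - g(p)) = 1/(p - 1), extended multiplicatively."""
    return prod((Fraction(1, p - 1) for p in PRIMES if d % p == 0),
                start=Fraction(1))


def below_sqrt_D(d):
    """d < sqrt(D), tested in integers."""
    return d * d < D


H = sum(h(d) for d in DIVS if below_sqrt_D(d))          # (4.23)


def rho(l):
    """Selberg's optimal weights, formula (4.20)."""
    inner = sum(h(d) for d in DIVS
                if gcd(d, l) == 1 and below_sqrt_D(d * l))
    return mu(l) * h(l) / g(l) * inner / H


RHO = {d: rho(d) for d in DIVS}
SUPP = [d for d in DIVS if RHO[d] != 0]

# Quadratic form (4.6) at the optimal weights.
G = sum(g(d1 * d2 // gcd(d1, d2)) * RHO[d1] * RHO[d2]
        for d1 in SUPP for d2 in SUPP)

# Sieve weights (4.4): lambda_d = sum over [d1, d2] = d of rho_d1 * rho_d2.
LAM = {}
for d1 in SUPP:
    for d2 in SUPP:
        l = d1 * d2 // gcd(d1, d2)
        LAM[l] = LAM.get(l, 0) + RHO[d1] * RHO[d2]


def sifted_count(N):
    """Number of 1 <= n <= N coprime to P."""
    return sum(1 for n in range(1, N + 1) if gcd(n, P_PROD) == 1)


def sieve_upper(N):
    """Upper bound sum_n (sum_{d | (n,P)} rho_d)^2 for the interval [1, N]."""
    return sum(sum(RHO[d] for d in SUPP if n % d == 0) ** 2
               for n in range(1, N + 1))
```

Comparing `sifted_count(N)` with `sieve_upper(N)` for a few values of `N` shows the inequality behind (4.24) term by term, before the sum is split into the main term $XG$ and the remainder.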
The Λ2 -sieve is very general indeed. Its upper bound does not require any regularity in the distribution of the density function g(p) over primes, therefore the dimension of a sifting problem does not play explicitly a role. If g(p) = ω(p)p−1 , where ω(p) is the number of residue classes (mod p) which one wants to exclude, then h(p) = ω(p)(p − ω(p))−1 is the ratio of the numbers of rejected to admitted classes, and H incorporates these ratios. Of course, the larger ω(p) is the smaller the upper bound (4.24). However, a few local deviations of ω(p) from its average have insignificant global effect. To the contrary our estimates derived by combinatorial sieves are quite sensitive in this respect because the hypothesis (1.36) is assumed to hold for all segments of primes w 6 p < z, no matter how short. The Λ2 -sieve yields fantastic results for sifting problems when the number of residue classes to be excluded is large, it can compete with the large sieve method of Linnik. We shall give a connection between both methods in Section 8.4.


Note that the numbers $\rho_d$ depend on the multiplicative function $g(d)$, so (indirectly) on the sifting sequence $A$. However, assuming some regularity of $g(d)$ (as in the $\kappa$-dimensional sieve) one can establish a fairly good approximation (for small $d$ at any rate, see Theorem 4.5)
\[
\rho_d = \mu(d)\Bigl(\frac{\log\sqrt{D}/d}{\log\sqrt{D}}\Bigr)^{\kappa}\Bigl\{1 + O\Bigl(\frac{1}{\log\sqrt{D}/d}\Bigr)\Bigr\}.
\tag{4.27}
\]
Hence for small $d$ we have $\rho_d \sim \mu(d)$ and
\[
\lambda_d = \sum_{abc = d}\rho_{ac}\rho_{bc} \sim \sum_{abc = d}\mu(a)\mu(b) = \mu(d).
\]
Putting
\[
\rho_d^{+} = \mu(d)\Bigl(\frac{\log^{+}\sqrt{D}/d}{\log\sqrt{D}}\Bigr)^{\kappa} \qquad \text{and} \qquad \lambda_d^{+} = \sum_{abc = d}\rho_{ac}^{+}\rho_{bc}^{+}
\]
one gets an upper bound sieve $\Lambda^{+} = \{\lambda_d^{+}\}$ of level $D$. The modified sieve $\Lambda^{+}$ depends only on $\kappa$ and $D$; it is not the optimal one, but how good is it? Precisely, how close is $H^{-1}$ to
\[
G^{+} = \sum_{d \mid P}\lambda_d^{+}g(d)
\]
for a multiplicative function $g$ such that $g(p)p$ is $\kappa$ on average? For $g(d) = d^{-1}$, in which case $\kappa = 1$, the quasi-optimal weights satisfy (see [BV])
\[
\sum_{1 \le n \le x}\Bigl(\sum_{d \mid n}\rho_d^{+}\Bigr)^{2} \ll \frac{x}{\log\sqrt{D}} \qquad \text{if } x > \sqrt{D},
\tag{4.28}
\]
whereas the optimal weights satisfy (see (8.33))
\[
\sum_{1 \le n \le x}\Bigl(\sum_{d \mid n}\rho_d\Bigr)^{2} < \frac{x + D}{\log\sqrt{D}}.
\]

4.2. Explicit Estimates for H(D, z)

To bring the Selberg upper bound (4.24) to a practical form we need a clear lower bound for
\[
H(D, P) = \sum_{d < \sqrt{D},\ d \mid P}h(d).
\tag{4.29}
\]


The upper bound $H(D, P) \le V(P)^{-1}$ holds in general (see (4.19)) but a good lower bound requires some restrictions on the density function $g$, so on $h$, and on the sifting range $P$. Let $I(D, P)$ denote the complementary sum so $H(D, P) + I(D, P) = V(P)^{-1}$. Therefore we need an upper bound for $I(D, P)$. Using Rankin's trick we estimate as follows
\[
I(D, P) = \sum_{d \ge \sqrt{D},\ d \mid P}h(d) \le D^{-\varepsilon}\sum_{d \mid P}h(d)d^{2\varepsilon} = D^{-\varepsilon}\prod_{p \mid P}(1 + h(p)p^{2\varepsilon}),
\]
\[
V(P)I(D, P) \le D^{-\varepsilon}\prod_{p \mid P}\bigl(1 + g(p)(p^{2\varepsilon} - 1)\bigr) \le D^{-\varepsilon}\exp\Bigl(\sum_{p \mid P}g(p)(p^{2\varepsilon} - 1)\Bigr).
\]
Suppose every prime in $P$ is $p < z = D^{1/s}$ with $s \ge 1$. Choose $\varepsilon = (\log z)^{-1}$ so $p^{2\varepsilon} - 1 \le 6\varepsilon\log p$. Suppose $g(p)$ satisfies the one-sided inequality (1.36). Then we derive by partial summation
\[
\sum_{p \mid P}g(p)\log p \le (\kappa + \log K)\log z.
\]
Hence $V(P)I(D, P) \le K^{6}e^{6\kappa - s}$ and $V(P)H(D, P) \ge 1 - K^{6}e^{6\kappa - s}$. Using $(1 - x)^{-1} \le 1 + 2x$ for $0 \le x \le \frac12$ we invert the last inequality into
\[
H(D, P)^{-1} \le V(P)(1 + 2K^{6}e^{6\kappa - s}) \qquad \text{if } s \ge 6\kappa + \log 2K^{6}.
\tag{4.30}
\]
Since $H(D, P)$ is increasing with the sifting range one can extend (4.30) to all $z$. In particular for $z = \sqrt{D}$ we derive by (1.36)
\[
H(D, P)^{-1} \le 2K\,3^{\kappa}(\kappa + \log 2K)^{\kappa}\,V(P).
\]
Now we proceed to more precise estimates. It is easy to remove (for convenience) a few primes from the sifting range. Specifically, if $q \mid P$ then $H(D, P) \le H(D, q)H(D, P/q)$ whence
\[
H(D, P/q) \ge V(q)H(D, P).
\tag{4.31}
\]
This rule must not be used excessively (see also (1.72)). From now on $P = P(z)$ is the product of all primes $p < z$. We begin by considering the sifting level $z \ge \sqrt{D}$ in which case the sifting range does not obstruct (4.29) so we have
\[
H(D) = \sum^{\flat}_{d < \sqrt{D}}h(d).
\]


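We can estimate $H(D)$ strongly, elementarily and nicely for $g(d) = d^{-1}$ (this occurs when one is sifting numbers in an interval). In this case $h(d) = \varphi(d)^{-1}$ and
\[
H(D) = \sum^{\flat}_{d < \sqrt{D}}h(d) = \sum^{\flat}_{d < \sqrt{D}}\ \prod_{p \mid d}\Bigl(\frac1p + \frac1{p^{2}} + \cdots\Bigr) \ge \sum_{m < \sqrt{D}}m^{-1} > \log\sqrt{D}.
\]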
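The inequality $H(D) > \log\sqrt{D}$ for $g(d) = d^{-1}$ is easy to observe numerically. The sketch below (helper names are hypothetical; trial division is used only to keep the example self-contained) sums $1/\varphi(d)$ over squarefree $d < \sqrt{D}$ in floating point.

```python
import math


def phi_if_squarefree(d):
    """Return Euler's phi(d) if d is squarefree, else None (trial division)."""
    phi, n, p = 1, d, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:        # p^2 divides d, so d is not squarefree
                return None
            phi *= p - 1
        p += 1
    if n > 1:                      # remaining prime factor
        phi *= n - 1
    return phi


def H(D):
    """Sum of 1/phi(d) over squarefree d < sqrt(D)."""
    total = 0.0
    d = 1
    while d * d < D:
        ph = phi_if_squarefree(d)
        if ph is not None:
            total += 1.0 / ph
        d += 1
    return total
```

For instance, one may tabulate `H(D)` against `0.5 * math.log(D)` for `D = 10**2, 10**4, 10**6` and watch the gap stay positive.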

If $g(d)$ agrees with $d^{-1}$ only for $(d, q) = 1$ (as for example in the case of sifting an arithmetic progression) then by (4.31) we still have a result
\[
H(D) \ge \prod_{p \mid q}(1 - g(p))\,\log\sqrt{D}.
\tag{4.32}
\]
The above example is rather special. Now suppose $g(p)p$ is $\kappa$ not exactly but on average, say we have
\[
\sum_{p \le x}g(p)\log p = \kappa\log x + O(1)
\tag{4.33}
\]
for all $x \ge 2$ where $\kappa$ is a positive real number. Hence $g(p)\log p \ll 1$. Suppose also that
\[
\sum_{p}g(p)^{2}\log p < \infty.
\]
Since $h(p) = g(p) + O(g(p)^{2})$ it follows that (4.33) holds for $h$ as well (with a different implied constant). Applying Theorem 4.5 for $M_h(\sqrt{D})$ we get
\[
H(D) = c\,(\log\sqrt{D})^{\kappa}\bigl\{1 + O((\log D)^{-1})\bigr\}
\]
where
\[
c = \frac{1}{\Gamma(\kappa+1)}\prod_{p}\Bigl(1 - \frac1p\Bigr)^{\kappa}(1 - g(p))^{-1}
\]
and the implied constant depends on that in (4.33). If $D$ is large in terms of this constant we can invert this approximation getting
\[
H(D)^{-1} = 2^{\kappa}\Gamma(\kappa+1)H_g(\log D)^{-\kappa}\bigl\{1 + O((\log D)^{-1})\bigr\}
\tag{4.34}
\]
where
\[
H_g = \prod_{p}(1 - g(p))\Bigl(1 - \frac1p\Bigr)^{-\kappa}.
\tag{4.35}
\]


Now let $P = P(z)$ with $z \le \sqrt{D}$. Suppose $g$ satisfies (4.33). Then by Theorem 4.7 for $M_h(x, z)$ with $x = \sqrt{D}$ we obtain
\[
H(D, z)V(z) = \sigma(s) + O((\log D)^{-1})
\tag{4.36}
\]
where $s = \log D/\log z \ge 2$ and $\sigma(s)$ is the continuous solution to the differential-difference problem
\[
\begin{aligned}
s^{-\kappa}\sigma(s) &= 2^{-\kappa}e^{-\gamma\kappa}\Gamma(\kappa+1)^{-1} && \text{if } 0 < s \le 2,\\
s\sigma'(s) &= \kappa\sigma(s) - \kappa\sigma(s-2) && \text{if } s > 2,
\end{aligned}
\tag{4.37}
\]
i.e., $\mathfrak{f}(s) = \sigma(2s)$ is the solution to (4.88). By (4.36), (4.26) and (4.24) we get a more developed estimate
\[
S(A, z) < XV(z)\Bigl\{\frac{1}{\sigma(s)} + O\Bigl(\frac{1}{\log D}\Bigr)\Bigr\} + R_3(A, D)
\tag{4.38}
\]
where $s = \log D/\log z \ge 2$ and $R_3(A, D)$ is given by (1.65) (one can retain the original remainder term $R(A, P, \Lambda^2)$ to take advantage of particular properties of the $\Lambda^2$-sieve, see the comments in Section 5.5).

The estimate (4.38) should be compared with our upper bound (3.74) obtained by the β-sieve. By (4.88) we get
\[
s\mathfrak{f}'(s) = \kappa\int_{s-1}^{s}\mathfrak{f}'(t)\,dt \qquad \text{if } s > 1
\]
and $s\mathfrak{f}'(s) = s^{\kappa}e^{-\gamma\kappa}\Gamma(\kappa)^{-1}$ if $0 < s \le 1$. Hence one can show that
\[
\begin{aligned}
\mathfrak{f}'(s) &= \exp(-s\log s + O(s\log\log s)),\\
\mathfrak{f}(s) &= 1 - \exp(-s\log s + O(s\log\log s)),\\
\sigma(s) &= 1 - \exp\bigl(-\tfrac{s}{2}\log s + O(s\log\log s)\bigr)
\end{aligned}
\]
if $s$ is sufficiently large. Therefore
\[
\frac{1}{\sigma(s)} = 1 + \exp\bigl(-\tfrac{s}{2}\log s + O(s\log\log s)\bigr).
\]
On the other hand the corresponding function $F(s)$ in the upper bound (3.74) satisfies (3.49) so it goes to 1 faster than $\sigma(s)^{-1}$ does. This analysis shows that if $s$ is sufficiently large the combinatorial sieve is stronger than the Selberg sieve in any fixed dimension.


4.3. Explicit Estimates for R(A, P, Λ²)

Once again we consider a class of sifting problems in which the individual error terms satisfy
\[
|r_d(A)| \le g(d)d.
\tag{4.39}
\]
Naturally with this property one makes the condition
\[
g(d)d \ge 1 \qquad \text{if } d \mid P.
\tag{4.40}
\]
This implies $g([d_1, d_2])[d_1, d_2] \le g(d_1)g(d_2)d_1d_2$, therefore
\[
|R(A, P, \Lambda^2)| \le \Bigl(\sum_{d < \sqrt{D}}|\rho_d|\,g(d)d\Bigr)^{2} \le \Bigl(\frac{1}{H}\sum_{m < \sqrt{D}}h(m)\sigma(m)\Bigr)^{2}
\tag{4.41}
\]
by (4.7) and (4.19), where $\sigma(m)$ denotes the sum of divisors of $m$. Assuming that (see A4.1)
\[
\sum_{y \le p \le x}g(p)\log p \ll \log(2x/y)
\tag{4.42}
\]
we infer the same condition for $h(p)\sigma(p)p^{-1}$ and apply (4.57) getting
\[
\sum_{m < \sqrt{D}}h(m)\sigma(m) \ll \frac{\sqrt{D}}{\log D}\sum_{m < \sqrt{D}}h(m)\sigma(m)m^{-1}.
\]
Here we have
\[
\sum_{m < \sqrt{D}}h(m)\sigma(m)m^{-1} \le H\sum_{m}h(m)m^{-1} \ll H.
\]
Hence we conclude that
\[
R(A, P, \Lambda^2) \ll D(\log D)^{-2}.
\tag{4.43}
\]
Combining with (4.24) we get


Theorem 4.2. Suppose the conditions of Theorem 4.1 hold. Moreover assume (4.39), (4.40) and (4.42). Then we have
\[
S(A, P) \le \frac{X}{H} + O\Bigl(\frac{D}{\log^{2}D}\Bigr)
\tag{4.44}
\]
where $H = H(D)$ is given by (4.23), $D > 1$ is arbitrary, and the implied constant depends only on that in (4.42).

4.4. Selected Applications

Consider the sequence $A = (a_n)$ which is the characteristic function of an arithmetic progression in a short interval
\[
n \equiv a\ (\mathrm{mod}\ q), \qquad x < n \le x + y,
\tag{4.45}
\]
where $(a, q) = 1$ and $1 \le q < y$. Let $P$ consist of primes $p \le \sqrt{y}$ with $p \nmid q$. Then
\[
\pi(x + y; q, a) - \pi(x; q, a) \le S(A, P) + \sqrt{y}\,q^{-1}.
\tag{4.46}
\]
On the other hand the conditions of Theorem 4.2 are satisfied with $X = yq^{-1}$ and $g(d) = d^{-1}$ if $(d, q) = 1$. By (4.32) we have $H(D) > \frac{\varphi(q)}{2q}\log D$. Combining (4.44) with (4.46) we get
\[
\pi(x + y; q, a) - \pi(x; q, a) < \frac{2y}{\varphi(q)\log D} + O\Bigl(\frac{D}{\log^{2}D} + \frac{\sqrt{y}}{q}\Bigr).
\]
Choosing $D = yq^{-1}$ we conclude

Theorem 4.3. For $(a, q) = 1$ and $1 \le q < y$ we have
\[
\pi(x + y; q, a) - \pi(x; q, a) < \frac{2y}{\varphi(q)\log(y/q)} + O\Bigl(\frac{y}{q\log^{2}(y/q)}\Bigr)
\tag{4.47}
\]
where the implied constant is absolute.

Using the large sieve methods H.L. Montgomery and R.C. Vaughan [MV1] have shown that the error term in (4.47) can be deleted.

Next we take $A = (a_n)$ the characteristic function of the polynomial $n = (m - \alpha_1)\cdots(m - \alpha_k)$ with $1 \le m \le x$, where all $\alpha_j$ are distinct. In this case $g(p) = \nu(p)p^{-1}$ where $\nu(p)$ is the number of roots modulo $p$. If $p$ is sufficiently large $\nu(p) = k$ so we have a $k$-dimensional sieve problem. By (4.34) and (4.44) we deduce

APPENDIX FOR CHAPTER 3

Theorem 4.4. Let $a = (\alpha_1,\dots,\alpha_k)$ be distinct integers which do not cover all residue classes to any prime modulus. Then the number of integers $1 \le m \le x$ for which $m-\alpha_1,\dots,m-\alpha_k$ are all primes satisfies

(4.48)  $$\pi(x;a) \le 2^k k!\,B\,x(\log x)^{-k}\{1 + O(\log\log x/\log x)\}$$

where

(4.49)  $$B = \prod_p \Big(1 - \frac{\nu(p)}{p}\Big)\Big(1 - \frac{1}{p}\Big)^{-k}.$$

Remarks. The upper bound (4.48) is larger by the factor $2^k k!$ than the conjectured asymptotic $\pi(x;a) \sim Bx(\log x)^{-k}$.
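To make the size of the factor $2^k k!$ concrete, the following Python sketch treats the twin prime case $a = (0, 2)$, $k = 2$, where $\nu(2) = 1$ and $\nu(p) = 2$ for $p > 2$. It is only an illustration: the product (4.49) is truncated at $x$, the error factor in (4.48) is ignored, and the helper names `primes_up_to` and `singular_series_twin` are not from the text.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes returning the list of primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if sieve[i]]

def singular_series_twin(primes):
    """Truncated product (4.49) for a = (0, 2): nu(2) = 1, nu(p) = 2 for p > 2."""
    B = 1.0
    for p in primes:
        nu = 1 if p == 2 else 2
        B *= (1 - nu / p) * (1 - 1 / p) ** (-2)
    return B

x = 10**5
primes = primes_up_to(x)
prime_set = set(primes)
# pi(x; a) for a = (0, 2): integers m <= x with m and m - 2 both prime.
twin_count = sum(1 for p in primes if p - 2 in prime_set)
B = singular_series_twin(primes)
conjectured = B * x / math.log(x) ** 2                 # B x (log x)^{-k}
sieve_bound = 2**2 * math.factorial(2) * conjectured   # main term of (4.48)

print(twin_count, round(conjectured), round(sieve_bound), round(B, 4))
```

At this height the count lies well below the main term of (4.48), consistent with the remark that the sieve bound loses the factor $2^k k! = 8$.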

APPENDIX FOR CHAPTER 4

Mean-Values of Multiplicative Functions

In sieve theory one encounters a variety of sums of multiplicative functions over special integers having prime divisors in the sifting range. In this section we deliver a few elementary estimates for the mean value

(4.50)  $$M_f(x) = \sum_{m\le x} f(m)$$

where $m$ runs over all positive integers. Throughout $f$ is a multiplicative function supported on squarefree numbers. Sometimes $f$ is temporarily detached from a variable of summation, however its absence does not necessarily mean we give up the restriction to squarefree numbers. To stress that the summation runs over squarefree numbers we use the superscript $\flat$, or the context makes it clear when this restriction is still in place. If $P$ is the sifting range the additional summation condition $m \mid P$ can be implemented by assuming that $f$ is supported on the divisors of $P$. Together with $f$ we consider another multiplicative function

(4.51)  $$g(m) = f(m)m^{-1}.$$

A4.1. Simple Estimates

Suppose $f(p) \ge 0$. The following estimates need no explanation,

(4.52)  $$M_f(x) \le xM_g(x) \le x\prod_{p\le x}\Big(1 + \frac{f(p)}{p}\Big).$$

If $f(p) \ge 1$ one can do slightly better. Write $f = 1 * h$ with $h = \mu * f$ by the Möbius inversion. Since $h(p) = f(p) - 1 \ge 0$ we infer

$$M_f(x) = \sum_{mn\le x} h(m) \le xM_h(x),$$

hence by (4.52) we get

(4.53)  $$M_f(x) \le x\prod_{p\le x}\Big(1 + \frac{f(p)-1}{p}\Big).$$
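A quick numerical sanity check of (4.52) and (4.53) can be made for one concrete choice of $f$. The sketch below assumes $f(m) = \mu(m)^2 2^{\omega(m)}$, so that $f(p) = 2 \ge 1$; this choice and the function name `check_simple_estimates` are illustrative and not taken from the text.

```python
import math

def check_simple_estimates(x):
    """Compare M_f(x) with the bounds (4.52), (4.53) for f(m) = mu(m)^2 * 2^omega(m)."""
    # omega[m] = number of distinct prime factors; squarefree[m] flags mu(m)^2 = 1.
    omega = [0] * (x + 1)
    squarefree = [True] * (x + 1)
    primes = []
    for p in range(2, x + 1):
        if omega[p] == 0:  # no smaller prime divides p, so p is prime
            primes.append(p)
            for m in range(p, x + 1, p):
                omega[m] += 1
            for m in range(p * p, x + 1, p * p):
                squarefree[m] = False
    f = [0.0] * (x + 1)
    for m in range(1, x + 1):
        if squarefree[m]:
            f[m] = 2.0 ** omega[m]
    M_f = sum(f[1:])
    M_g = sum(f[m] / m for m in range(1, x + 1))          # g(m) = f(m)/m as in (4.51)
    prod_452 = math.prod(1 + 2 / p for p in primes)        # factors 1 + f(p)/p
    prod_453 = math.prod(1 + (2 - 1) / p for p in primes)  # factors 1 + (f(p) - 1)/p
    return M_f, x * M_g, x * prod_452, x * prod_453

M_f, bound_mid, bound_452, bound_453 = check_simple_estimates(1000)
print(M_f, bound_mid, bound_452, bound_453)
```

The four returned numbers are $M_f(x)$, $xM_g(x)$ and the right-hand sides of (4.52) and (4.53), so the ordering claimed in the text can be read off directly.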

Both results can be combined. Suppose $f(p) \ge 0$ for all $p$ and $f(p) \ge 1$ for $p \mid P$. Then we have

(4.54)  $$M_f(x) \le x\prod_{\substack{p\le x\\ p\nmid P}}\Big(1 + \frac{f(p)}{p}\Big)\prod_{\substack{p\le x\\ p\mid P}}\Big(1 + \frac{f(p)-1}{p}\Big).$$

Next suppose that $f(p) \ge 0$ and

(4.55)  $$\sum_{y<p\le x} g(p)\log p \le a\log\frac{x}{y} + b$$

for all $1 < y \le x$ where $a > 0$, $b > 1$ are constants. By partial summation we infer

(4.56)  $$\sum_{p\le x} f(p)\log p \le cx$$

where $c = a + b$. Then we get

$$\sum_{m\le x} f(m)\log m = \sum_{np\le x} f(np)\log p \le cxM_g(x).$$

Hence again by partial summation $M_f(x) \le 1 + cL(x)M_g(x)$ where

$$L(x) = \int_2^x (\log t)^{-1}\,dt < 2x(\log x)^{-1}.$$

Therefore

(4.57)  $$M_f(x) < \frac{3cx}{\log x}M_g(x) \le \frac{3cx}{\log x}\prod_{p\le x}\Big(1 + \frac{f(p)}{p}\Big).$$

A4.2. Asymptotic Formulas for Full Sums

In order to establish an asymptotic formula for $M_g(x)$ we require a condition about the distribution of $g$ at primes which is somewhat stronger than (4.55), namely that

(4.58)  $$\sum_{p\le x} g(p)\log p = k\log x + \delta(x)$$

where $\delta(x)$ is bounded for all $x \ge 2$. For many functions in practice this condition can be established by elementary methods (it is weaker than the prime number theorem). With some applications in mind we allow $g(p)$ to be negative but not too much. Precisely we assume that (4.58) holds with $k > -\frac12$. Moreover we need two minor estimates

(4.59)  $$\prod_{w\le p<z}(1 + |g(p)|) \ll \Big(\frac{\log z}{\log w}\Big)^{|k|} \quad\text{if } z > w \ge 2,$$

(4.60)  $$\sum_p g(p)^2\log p < \infty.$$

Our arguments are modelled on E. Wirsing [W1]; these are based on considering the smoothed mean value

$$\sum_{m\le x} g(m)\log\frac{x}{m} = \int_1^x M_g(t)t^{-1}\,dt.$$

As before we have (Tchebyshev's ideas)

$$\sum_{m\le x} g(m)\log m = \sum_{np\le x} g(np)\log p$$
$$= \sum_{n\le x} g(n)\sum_{p\le x/n} g(p)\log p - \sum_{np^2\le x} g(np)g(p)\log p$$
$$= \sum_{n\le x} g(n)\Big\{k\log\frac{x}{n} + \delta\Big(\frac{x}{n}\Big)\Big\} - \sum_{np^2\le x} g(np)g(p)\log p.$$

Hence

(4.61)  $$(k+1)\sum_{m\le x} g(m)\log m = kM_g(x)\log x + \Delta_g(x)$$

where

$$\Delta_g(x) = \sum_{n\le x} g(n)\delta\Big(\frac{x}{n}\Big) - \sum_{np^2\le x} g(np)g(p)\log p.$$

For any $x \ge 2$ we have by (4.59)

(4.62)  $$\Delta_g(x) \ll M_{|g|}(x)\Big(1 + \sum_p g(p)^2\log p\Big) \ll (\log x)^{|k|}.$$

We write (4.61) as follows

$$M_g(x)\log x - (k+1)\int_1^x M_g(t)t^{-1}\,dt = \Delta_g(x).$$

Here we extract the contribution of the integral over $1 \le t \le 2$, which is $(k+1)\log 2$, and putting it on the right side we get for $x \ge 2$

(4.63)  $$M_g(x)\log x - (k+1)\int_2^x M_g(t)t^{-1}\,dt = \Delta_g^*(x)$$

where $\Delta_g^*(x) = \Delta_g(x) + (k+1)\log 2$. Next we divide by $x(\log x)^{k+2}$ and integrate

$$\int_2^x M_g(t)t^{-1}(\log t)^{-k-1}\,dt - (k+1)\int_2^x t^{-1}(\log t)^{-k-2}\int_2^t M_g(u)u^{-1}\,du\,dt = \int_2^x \Delta_g^*(t)t^{-1}(\log t)^{-k-2}\,dt.$$

Changing the order of integration we find that the left-hand side is equal to

$$(\log x)^{-k-1}\int_2^x M_g(t)t^{-1}\,dt.$$

Combining this result with (4.63) we arrive at the following identity

(4.64)  $$M_g(x) = -(\log x)^k\int_2^x \Delta_g^*(t)\,d(\log t)^{-k-1} + \Delta_g^*(x)(\log x)^{-1}.$$

Since $k > -\frac12$ the above integral converges by virtue of (4.62). Extending the integration to infinity we obtain another identity

(4.65)  $$M_g(x) = \{c_g + \gamma_g(x)\}(\log x)^k$$

if $x \ge 2$ where $c_g$ is a constant

(4.66)  $$c_g = -\int_2^\infty \Delta_g^*(t)\,d(\log t)^{-k-1}$$

and $\gamma_g(x)$ is the error term given by

$$\gamma_g(x) = \int_x^\infty (\Delta_g(t) - \Delta_g(x))\,d(\log t)^{-k-1}.$$

By (4.62) we have $\gamma_g(x) \ll (\log x)^{|k|-k-1}$.

Here the exponent is negative so (4.65) becomes an asymptotic formula for $M_g(x)$. However the constant $c_g$ as given by (4.66) is not appealing. We shall compute $c_g$ in another way. Consider the zeta-function formed by $g$,

$$\zeta_g(s) = \sum_{m=1}^\infty g(m)m^{-s}.$$

The series converges absolutely for $s > 0$. We compute by partial summation using (4.65) that

$$\zeta_g(s) = \int_1^\infty x^{-s}\,dM_g(x) = -\int_1^\infty M_g(x)\,dx^{-s}$$
$$= -\int_0^\infty M_g(e^t)\,de^{-st} = -\int_0^\infty (c_g + O(t^{-\varepsilon}))t^k\,de^{-st}$$
$$= (c_g + O(s^\varepsilon))s^{-k}\Gamma(k+1) \quad\text{as } s \to 0+.$$

Comparing this with the $k$-th power of the Riemann zeta function we get $\zeta(s+1)^{-k}\zeta_g(s) \sim c_g\Gamma(k+1)$. On the other hand we have the product over primes

$$\zeta(s+1)^{-k}\zeta_g(s) = \prod_p (1 - p^{-s-1})^k(1 + g(p)p^{-s})$$

which converges absolutely for $s > 0$ and it has the limit as $s \to 0$ by virtue of (4.58). Hence the constant $c_g$ is equal to

(4.67)  $$c_g = \frac{1}{\Gamma(k+1)}\prod_p \Big(1 - \frac1p\Big)^k(1 + g(p)).$$

Thus we have established

Theorem 4.5. Suppose $g$ is a multiplicative function supported on squarefree numbers which satisfies (4.58) with $k > -\frac12$ and $\delta(x)$ bounded. Assume that (4.59) and (4.60) also hold. Then the mean value of $g$ satisfies the asymptotic formula

(4.68)  $$M_g(x) = c_g(\log x)^k + O((\log x)^{|k|-1})$$

if $x \ge 2$ where $c_g$ is given by (4.67).

A4.3. Asymptotic Formulas for Restricted Sums

The sieve theory demands asymptotics for mean-values of multiplicative functions over integers free of large prime divisors. In this section we derive the required asymptotics from the results for unrestricted mean-values in Theorem 4.5. Let $P(z)$ be the product of all primes $< z$ and

(4.69)  $$M_g(x,z) = \sum_{\substack{m\le x\\ m\mid P(z)}} g(m).$$

Throughout we assume that $g$ satisfies the hypotheses of Theorem 4.5, and to simplify notation we do not carry the subscript $g$. For $z > x \ge 2$ we have

(4.70)  $$M(x,z) = M(x) = c(\log x)^k + O((\log x)^{|k|-1}).$$

For $2 \le z < x$ we have the following recurrence formula

(4.71)  $$M(x,z) = M(x) - \sum_{z\le p<x} g(p)M\Big(\frac{x}{p},p\Big).$$

To prove this write every $m$ in $M(x)$ which is not in $M(x,z)$ as $m = np$ where $p$ is the largest prime divisor of $m$ with $z \le p < x$; such a factorization is unique and it is also determined by the condition $n \mid P(p)$. By this recurrence formula we extend (4.70) to any level $z = x^{1/s}$ with $s > 0$ by induction on $[s]$. We postulate the following formula

(4.72)  $$M(x,z) = c\,m(s)(\log x)^k + O((\log x)^{|k|-1})$$

for all $x, z \ge 2$ where $m(s)$ is a suitable continuous function of $s = \log x/\log z$ to be determined. We know this holds with

(4.73)  $$m(s) = 1 \quad\text{if } 0 < s \le 1.$$

Suppose $s > 1$. Inserting (4.72) into (4.71) we get

$$M(x,z) = c(\log x)^k - c\sum_{z\le p<x/2} g(p)\,m(s_p)\Big(\log\frac{x}{p}\Big)^k + O\Big((\log x)^{|k|-1} + \sum_{z\le p<x}|g(p)|\Big(\log\frac{2x}{p}\Big)^{|k|-1}\Big)$$

where $s_p = \log x/\log p - 1$. Hence by (4.58), (4.59) using partial summation we derive, writing $\mathfrak{m}(s) = s^k m(s)$,

$$M(x,z) = c\Big(1 + \int_1^s \mathfrak{m}(t-1)\,dt^{-k}\Big)(\log x)^k + O((\log x)^{|k|-1}).$$

Therefore (4.72) holds with $m(s)$ given by

(4.74)  $$m(s) = 1 + \int_1^s \mathfrak{m}(t-1)\,dt^{-k} \quad\text{if } s > 1.$$

Before stating a theorem we investigate the asymptotic behaviour of $m(s)$ by the method of adjoint equation (see A3). Differentiating (4.74) we get

(4.75)  $$s^{k+1}m'(s) = -k(s-1)^k m(s-1).$$

Putting $\mathfrak{m}(s) = s^k m(s)$ this equation becomes

(4.76)  $$s\,\mathfrak{m}'(s) = k\,\mathfrak{m}(s) - k\,\mathfrak{m}(s-1) \quad\text{if } s > 1.$$

Note that a constant function satisfies (4.76) but our initial values are not constant, namely

(4.77)  $$\mathfrak{m}(s) = s^{k} \quad\text{if } 0 < s \le 1.$$

Writing (4.76) as

$$s\,\mathfrak{m}'(s) = k\int_{s-1}^s \mathfrak{m}'(t)\,dt$$

it follows that

(4.78)  $$\mathfrak{m}(s) = \mathfrak{m}(\infty) + O(s^{-s})$$

where $\mathfrak{m}(\infty)$ is a constant.

The adjoint to (4.76) is the equation (3.33) for $p(s)$ with $\kappa = -k$, i.e.,

(4.79)  $$(sp(s))' = -kp(s) + kp(s+1),$$

its standard solution is given by

(4.80)  $$p(s) = \int_0^\infty \exp\Big(-sz + k\int_0^z (1 - e^{-u})u^{-1}\,du\Big)dz.$$

Therefore $p(s)$ is positive for all $s > 0$. Moreover by (3.88) and (3.89)

(4.81)  $$p(s) \sim s^{-1} \quad\text{as } s \to \infty,$$

(4.82)  $$p(s) \sim e^{\gamma k}\Gamma(k+1)s^{-k-1} \quad\text{as } s \to 0.$$

Having these asymptotics we can compute the constant $\mathfrak{m}(\infty)$ by examining the inner product

$$\langle \mathfrak{m}, p\rangle = s\,\mathfrak{m}(s)p(s) - k\int_{s-1}^s \mathfrak{m}(x)p(x+1)\,dx$$
$$= (s-1)\mathfrak{m}(s-1)p(s-1) + \int_{s-1}^s x^{k+1}p(x)\,d\big(x^{-k}\mathfrak{m}(x)\big).$$

Letting $s$ tend to infinity, from the first expression we obtain $\langle \mathfrak{m}, p\rangle = \mathfrak{m}(\infty)$ by the conditions (4.78) and (4.81). On the other hand letting $s$ tend to one, from the second expression we obtain $\langle \mathfrak{m}, p\rangle = e^{\gamma k}\Gamma(k+1)$ by the initial condition (4.77) and the asymptotic (4.82). Therefore

(4.83)  $$\mathfrak{m}(\infty) = e^{\gamma k}\Gamma(k+1).$$
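The value (4.83) can be checked numerically by integrating the differential-difference equation (4.76) forward from its initial segment. The Python sketch below writes `y` for the solution of (4.76), takes the initial values to be $s^k$ on $(0,1]$ (the values that correspond to (4.73)), and uses Heun's method on a grid aligned with the integers. The function name, the step size and the restriction to $k > 0$ are choices made here for illustration only.

```python
import math

EULER_GAMMA = 0.5772156649015329

def solve_m_bar(k, s_max, steps_per_unit=1000):
    """Integrate s*y'(s) = k*y(s) - k*y(s-1) for s > 1 with y(s) = s**k on (0, 1].

    Heun's method on a grid aligned with the integers; assumes k > 0 so y(0) = 0.
    """
    n = steps_per_unit
    h = 1.0 / n
    total = int(round(s_max * n))
    y = [(i * h) ** k for i in range(n + 1)]  # initial segment 0 <= s <= 1
    for i in range(n, total):
        s = i * h
        slope0 = k * (y[i] - y[i - n]) / s
        predictor = y[i] + h * slope0
        slope1 = k * (predictor - y[i + 1 - n]) / (s + h)
        y.append(y[i] + 0.5 * h * (slope0 + slope1))
    return y

n = 1000
for k in (1, 2):
    y = solve_m_bar(k, 12.0, n)
    limit = math.exp(EULER_GAMMA * k) * math.gamma(k + 1)  # right-hand side of (4.83)
    print(k, y[-1], limit)

# For k = 1 the equation can be solved by hand on [1, 2]: y(s) = 2s - 1 - s*log(s).
y1 = solve_m_bar(1, 12.0, n)
y2 = solve_m_bar(2, 12.0, n)
```

Because of the $O(s^{-s})$ decay in (4.78), the value at $s = 12$ is already indistinguishable from the limit at this step size.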

Since both $\mathfrak{m}(s)$ and the constant function $\mathfrak{m}(\infty)$ satisfy (4.76) we deduce by subtracting inner products of these against $p(s)$ the following integral formula

(4.84)  $$s\,p(s)(\mathfrak{m}(s) - \mathfrak{m}(\infty)) = k\int_{s-1}^s (\mathfrak{m}(x) - \mathfrak{m}(\infty))p(x+1)\,dx$$

for $s > 1$. Since $p(s)$ is positive this tells us

Corollary 4.6. If $k$ is negative then $\mathfrak{m}(s) - \mathfrak{m}(\infty)$ changes sign in every interval of length 1.

We shall state the asymptotic formula (4.72) in a form harmonized with the sieve theory, i.e., we express the main term by the product

(4.85)  $$W(z) = \prod_{p<z}(1 + g(p)).$$

We have

$$W(z) = \prod_{p<z}\Big(1 - \frac1p\Big)^{-k}\prod_{p<z}\Big(1 - \frac1p\Big)^{k}(1 + g(p)).$$

Hence by Mertens' formula (1.90) and the hypothesis (4.58)

(4.86)  $$W(z) = e^{\gamma k}\Gamma(k+1)c_g(\log z)^k\{1 + O((\log z)^{-1})\}$$

where $c_g$ is the constant (4.67). Combining (4.86) with (4.72) we conclude

Theorem 4.7. Let $g$ be a multiplicative function which satisfies (4.58) with $k > -\frac12$ and $\delta(x)$ bounded. Assume that (4.59) and (4.60) also hold. Let $P(z)$ be the product of all primes $< z$. Then

(4.87)  $$M_g(x,z) = W(z)\{\mathfrak{f}(s) + O((\log x)^{-\eta})\}$$

where $W(z)$ is the product (4.85), $s = \log x/\log z$, $\mathfrak{f}(s)$ is the continuous solution to the differential-difference problem

(4.88)
$$s^{-k}\mathfrak{f}(s) = e^{-\gamma k}\Gamma(k+1)^{-1} \quad\text{if } 0 < s \le 1,$$
$$s\,\mathfrak{f}'(s) = k\,\mathfrak{f}(s) - k\,\mathfrak{f}(s-1) \quad\text{if } s > 1,$$

and $\eta = 1 + k - |k|$. The implied constant depends on $s$.

A4.4. The Linear Case

In this section $g$ is a non-negative multiplicative function supported on squarefree numbers such that

(4.89)  $$\sum_{p\le x} g(p)\log p = \log x + O(1),$$

(4.90)  $$\sum_p g(p)^2\log p < \infty.$$

In this case Theorem 4.5 with $k = 1$ yields

(4.91)  $$\sum_{m\le x} g(m) = c\log x + O(1)$$

where

(4.92)  $$c = \prod_p \Big(1 - \frac1p\Big)(1 + g(p)).$$

However in some applications (see Chapter 9) we shall need asymptotic formulas more precise than (4.91), namely

(4.93)  $$\sum_{m\le x} g(m) = c\log x + b + O((\log x)^{-A})$$

for all $x \ge 2$ where $b$ is a constant. This could be derived by refining the proof of Theorem 4.5 subject to the correspondingly stronger hypothesis

(4.94)  $$\sum_{p\le x} g(p)\log p = \log x + a + O((\log x)^{-A}).$$

While (4.94) is as deep as the Prime Number Theorem the asymptotic formula (4.93) can be verified directly by simple arguments in many cases in practice. For example we often have

$$f(m) = mg(m) = \sum_{d\mid m} h(d)$$

where $h$ is an arithmetic function such that $h(d) \ll (\log 2d)^{-A-1}$. In this case (4.93) is derived rather easily with the constants

$$c = \sum_d \frac{h(d)}{d} \quad\text{and}\quad b = \sum_d \frac{h(d)}{d}(\gamma - \log d).$$
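As a concrete instance of the linear case, one may take $g(m) = \mu(m)^2/\varphi(m)$, so that $g(p) = 1/(p-1)$ and the product (4.92) is exactly $c = 1$. The Python sketch below (an illustration only; the choice of $g$ and the name `mean_value_linear` are assumptions made here) computes $M_g(x)$ directly so that the difference $M_g(x) - \log x$ can be seen to settle near a constant, as (4.93) predicts.

```python
import math

def mean_value_linear(x):
    """M_g(x) for g(m) = mu(m)^2 / phi(m), a linear-case example with g(p) = 1/(p - 1)."""
    phi = list(range(x + 1))
    squarefree = [True] * (x + 1)
    for p in range(2, x + 1):
        if phi[p] == p:  # p is prime: no smaller prime has modified phi[p]
            for m in range(p, x + 1, p):
                phi[m] -= phi[m] // p
            for m in range(p * p, x + 1, p * p):
                squarefree[m] = False
    return sum(1.0 / phi[m] for m in range(1, x + 1) if squarefree[m])

x = 10**5
M = mean_value_linear(x)
offset = M - math.log(x)  # plays the role of b + O((log x)^{-A}) in (4.93), with c = 1
print(M, math.log(x), offset)
```

Running the function at several values of `x` shows how quickly `offset` stabilises for this particular $g$.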

In order to keep our exposition of sieve theory on an elementary level we shall accept (4.93) as a hypothesis rather than (4.94) whenever such a strong asymptotic formula is needed. From (4.93) we derive the following

Theorem 4.8. Suppose $g$ is a multiplicative function supported on squarefree numbers with $0 \le g(p) < 1$ and which satisfies (4.93) with some $A > 2$. Then for any $q \ge 1$ we have

(4.95)  $$\sum_{\substack{m\le x\\ (m,q)=1}} g(m) = \gamma_q\{c\log x + c\delta_q + b + O(\tau(q)(\log x)^{-A})\}$$

where

(4.96)  $$\gamma_q = \prod_{p\mid q}(1 + g(p))^{-1} \quad\text{and}\quad \delta_q = \sum_{p\mid q}\frac{g(p)}{1 + g(p)}\log p,$$

the implied constant depending on $g$ and $A$.

Proof. Note that $g(p) \ll (\log p)^{-A}$ by (4.93). Denote

$$P(s) = \prod_{p\mid q}(1 + g(p)p^{-s})^{-1} = \sum_n \alpha_n n^{-s},$$
$$Q(s) = \prod_{p\mid q}(1 - g(p)p^{-s})^{-1} = \sum_n |\alpha_n| n^{-s}.$$

Observe that

$$\sum_{(m,q)=1} g(m)m^{-s} = \prod_{p\nmid q}(1 + g(p)p^{-s}) = P(s)\sum_m g(m)m^{-s}.$$

Hence the sum in (4.95) is

$$\sum_{mn\le x} \alpha_n g(m) = \sum_{n\le x} \alpha_n\Big\{c\log\frac{x}{n} + b + O\Big(\Big(\log\frac{2x}{n}\Big)^{-A}\Big)\Big\}$$
$$= \sum_n \alpha_n\Big(c\log\frac{x}{n} + b\Big) + O\Big(\Big(1 + \sum_n |\alpha_n|(\log n)^j\Big)(\log x)^{-A}\Big)$$

with $j = A + 1$. Here we have

$$\sum_n \alpha_n = P(0) = \gamma_q,$$
$$-\sum_n \alpha_n\log n = P'(0) = P(0)\delta_q = \gamma_q\delta_q,$$
$$\sum_n |\alpha_n|(\log n)^j = (-1)^jQ^{(j)}(0) \ll Q(0)\omega(q)^j \ll \gamma_q\tau(q).$$

The last estimate is obvious while the former is derived as follows. We have

$$R(s) = -\frac{Q'(s)}{Q(s)} = \sum_{p\mid q}\sum_{\ell=1}^\infty g(p)^\ell p^{-s\ell}\log p.$$

Differentiating the equation $-Q'(s) = Q(s)R(s)$ repeatedly $j - 1$ times we get the recurrence formula

$$-Q^{(j)}(0) = \sum_{0\le k<j}\binom{j-1}{k}Q^{(k)}(0)R^{(j-1-k)}(0).$$

Hence

$$|Q^{(j)}(0)| \le |Q^{(j-1)}(0)|\sum_{p\mid q}\sum_{\ell=1}^\infty g(p)^\ell(1 + 2\ell\log p)^j.$$

Since $g(p) < 1$ and $g(p) \ll (\log p)^{-j}$ the last sum over $\ell$ is bounded uniformly in $p$. This shows that $|Q^{(j)}(0)| \ll |Q^{(j-1)}(0)|\omega(q) \ll Q(0)\omega(q)^j$ by induction. This completes the proof of Theorem 4.8.

CHAPTER V

THE LINEAR SIEVE THEORY

The sieve of dimension $\kappa = 1$ is demanded in central areas of analytic number theory such as the distribution of prime numbers. In this chapter we take a closer look at the linear sieve problems and we refine some previous results.

5.1. A Summary of Previous Results

We begin by recalling the results of Chapter 3 in the context of the $\beta$-sieve for $\kappa = 1$. In this case the sieve limit is $\beta(1) = 2$ and the corresponding functions $F(s)$, $f(s)$ are the continuous solutions to the system

(5.1)
$$sF(s) = 2e^\gamma \quad\text{if } 1 \le s \le 3,$$
$$sf(s) = 0 \quad\text{at } s = 2,$$

(5.2)
$$(sF(s))' = f(s-1) \quad\text{if } s > 3,$$
$$(sf(s))' = F(s-1) \quad\text{if } s > 2.$$

Let Λ+ and Λ− be the upper bound and the lower bound sieves composed of the 2-sieve of level D and the 10-sieve of level Dε where ε < e−10 . Suppose (1.37) holds and −1

log D > L + e4ε

(5.3)

log2 ε

.

Then by Theorem 3.6

(5.4)
$$V^+(D,z) < \{F(s) + O(\varepsilon L^{11})\}V(z) \quad\text{if } s \ge 1,$$
$$V^-(D,z) > \{f(s) + O(\varepsilon L^{11})\}V(z) \quad\text{if } s \ge 2,$$

where $s = \log D/\log z$ and the implied constant is absolute. Applying these sieves to the sequence $A$ we obtain

(5.5)
$$S^+(A,z) < XV^+(D,z) + R(A,P,\Lambda^+),$$
$$S^-(A,z) > XV^-(D,z) + R(A,P,\Lambda^-),$$

where

(5.6)  $$R(A,P,\Lambda) = \sum_{d\mid P}\lambda_d r_d(A)$$

and $P = P(z)$ is the product of primes $p < z$. These estimates hold also for $S(A,z)$ because $S^- \le S \le S^+$ by (2.27) and (2.28). Some of the suspended terms $S_n$ can be shown to be positive by a modern technology so we reclaim a good part of their contribution, namely

(5.7)  $$S(A,z) \le S^+(A,z) - \sum_{n\ \mathrm{odd}} S_n(A,z),$$

(5.8)  $$S(A,z) \ge S^-(A,z) + \sum_{n\ \mathrm{even}} S_n(A,z),$$

where (5.9)

Sn (A, z) =

X

...

X

S(Ap1 ...pn , pn )

pn <···D −1

−1

if $n + 2 \le \nu = \varepsilon^{-1}\log\varepsilon^{-1}$. Note that $p_n > D^{1/(n+2)} \ge D^{1/\nu} = w$, hence the weights of the 2-sieve only have an effect on $S_n$ if $n + 2 \le \nu$. For larger $n$ the weights of the 10-sieve complicate $S_n$ somewhat so we ignore these terms entirely; their contribution is small anyway.

Note that our composite sieves $\Lambda^+ = \{\lambda_d^+\}$ and $\Lambda^- = \{\lambda_d^-\}$ have the level $D^{1+\varepsilon}$. Changing $D^{1+\varepsilon}$ to $D$ and choosing $\varepsilon$ as in Theorem 3.7 we derive

(5.10)
$$S(A,z) < (F(s) + \Delta)XV(z) + R(A,D),$$
$$S(A,z) > (f(s) - \Delta)XV(z) - R(A,D),$$

provided $D > e^{2L}$ where $s = \log D/\log z$ and $\Delta = cL^{11}(\log\log\log D)^3(\log\log D)^{-1}$. The remainder term is

(5.11)  $$R(A,D) = \sum_{d<D}|r_d(A)|.$$
The general estimates of type (5.10) with the functions (5.1), (5.2) in the leading terms were established in 1965 by W.B. Jurkat and H.-E. Richert [JR]. Actually they got a better estimate for $\Delta$ but a slightly worse expression for $R(A,D)$; however, these differences are not essential for most applications. What needs to be emphasized is that Jurkat and Richert took a somewhat reverse approach to the one we have taken for the derivation of the $\beta$-sieve in general. They started Buchstab's iterations from the interval $1 \le s \le 2$ in which the initial upper bound for $S(A,z)$ is given by the $\Lambda^2$-sieve of Selberg. It is important that in this range Selberg's upper bound has the same function $\sigma(s)^{-1} = F(s) = 2e^\gamma s^{-1}$ in the leading term (see (4.37)), which itself is an interesting coincidence (more of the same happens in the large sieve theory, see Section 8.4). However for $s > 2$ the Selberg sieve yields a weaker bound because $\sigma(s)F(s) < 1$, and it requires infinitely many Buchstab iterations to reach the functions $F(s)$, $f(s)$ in the whole range $s > 2$. Recall we have also passed infinitely many iterations but starting from large $s$ for which the Brun sieve provided very good estimates. Of course, in reality, the iterations are replaced by explicit constructions in either approach.

Recall that the functions $F(s)$ and $f(s)$ were introduced by the series (3.24) and (3.25) respectively and we have proved in Section 3.5 that these series converge absolutely. In the case of $\kappa = 1$ we have

$$F(s) = 1 + \sum_{n\ \mathrm{odd}} f_n(s) \quad\text{if } s \ge 1,$$
$$f(s) = 1 - \sum_{n\ \mathrm{even}} f_n(s) \quad\text{if } s \ge 2,$$

where sfn (s) =

Z

...

Z

(t1 . . . tn )−1 t−1 n dt1 . . . dtn

01

for $s \ge \frac12(3 + (-1)^n)$. G. Greaves [G2] has developed the following series representations

$$F(s) = 2e^\gamma\sum_{n\ \mathrm{odd}} I_n(s) \quad\text{if } s \ge 1,$$
$$f(s) = 2e^\gamma\sum_{n\ \mathrm{even}} I_n(s) \quad\text{if } s \ge 2,$$

where

$$sI_n(s) = \int\cdots\int_{\substack{u_1>\cdots>u_n>\frac1s\\ u_1+\cdots+u_n=1}} d\mu_n$$

and $d\mu_n = (u_1\cdots u_n)^{-1}du_1\cdots du_{n-1}$ is the measure on the set

$$T_n = \{(u_1,\dots,u_n) : u_1 > \cdots > u_n,\ u_1 + \cdots + u_n = 1\}$$

if $n \ge 2$ and it is the point measure if $n = 1$. Note that $I_n(s) = 0$ if $n \ge s$, $sI_1(s) = 1$ if $s \ge 1$, $sI_2(s) = \log(s-1)$ if $s \ge 2$ and

$$(sI_n(s))' = I_{n-1}(s-1) \quad\text{if } s > \tfrac12(3 + (-1)^n),\ n \ge 2.$$

Therefore the Greaves series converge absolutely to functions which satisfy (5.1) and (5.2), thus they yield $F(s)$ and $f(s)$ by the uniqueness.

5.2. The True Asymptotics for Special Sifted Sums

It is interesting to compare the linear sieve estimates with the true asymptotic for

(5.12)  $$\Phi(x,z) = |\{n \le x : (n,P(z)) = 1\}|.$$

Lemma 5.1. (Buchstab). For $s > 1$ we have

(5.13)  $$\Phi(x,x^{1/s}) \sim s\omega(s)x(\log x)^{-1}$$

as $x \to \infty$ where $\omega(s)$ is the continuous solution to

(5.14)
$$s\omega(s) = 1 \quad\text{if } 1 \le s \le 2,$$
$$(s\omega(s))' = \omega(s-1) \quad\text{if } s > 2.$$

Proof. For $x > z \ge \sqrt{x}$ we have $\Phi(x,z) = \pi(x) - \pi(z) + 1$, hence the asymptotic formula (5.13) holds if $1 < s \le 2$ by the Prime Number Theorem. For $z < \sqrt{x}$ we have the recurrence formula

$$\Phi(x,z) = \Phi(x,\sqrt{x}) + \sum_{z\le p<\sqrt{x}}\Phi\Big(\frac{x}{p},p\Big).$$

Hence by the induction hypothesis we derive

$$\Phi(x,z) \sim x(\log x)^{-1} + x\sum_{z\le p<\sqrt{x}}\omega\Big(\frac{\log x}{\log p} - 1\Big)\frac{1}{p\log p}$$
$$\sim \Big(1 + \int_2^s \omega(t-1)\,dt\Big)x(\log x)^{-1} = s\omega(s)x(\log x)^{-1}.$$

Remarks. By Mertens' formula

$$V(z) = \prod_{p<z}\Big(1 - \frac1p\Big) \sim e^{-\gamma}(\log z)^{-1}$$

we can write (5.13) with $s = \log x/\log z$ as

(5.15)  $$\Phi(x,z) \sim e^\gamma\omega(s)xV(z).$$

Note that $e^\gamma\omega(s)$ is half of $F(s)$ in the range $1 < s \le 2$. For $s > 2$ we have

(5.16)  $$\omega(s) = e^{-\gamma} + O(s^{-s}).$$
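Both (5.13) and (5.16) are easy to test numerically. The Python sketch below tabulates $\omega(s)$ from (5.14) with the trapezoidal rule and counts $\Phi(x,z)$ directly by sifting. It is a rough illustration under assumptions chosen here (step size $10^{-3}$, $x = 10^6$, $z = 100$, so $s = 3$); the function names are not from the text.

```python
import math

EULER_GAMMA = 0.5772156649015329

def buchstab_omega(s_max, steps_per_unit=1000):
    """Tabulate omega(s) on [1, s_max] from (5.14) using the trapezoidal rule.

    u(s) = s*omega(s) equals 1 on [1, 2] and satisfies u'(s) = omega(s - 1) for s > 2.
    """
    n = steps_per_unit
    h = 1.0 / n
    total = int(round((s_max - 1) * n))
    omega = [1.0 / (1 + i * h) for i in range(n + 1)]  # 1 <= s <= 2
    u = 1.0
    for i in range(n, total):
        # step from s = 1 + i*h to s + h; the delayed values sit at indices i-n, i+1-n
        u += 0.5 * h * (omega[i - n] + omega[i + 1 - n])
        omega.append(u / (1 + (i + 1) * h))
    return omega

def count_unsifted(x, z):
    """Phi(x, z): number of n <= x with no prime factor below z (n = 1 included)."""
    alive = bytearray([1]) * (x + 1)
    alive[0] = 0
    # Striking multiples of every integer 2 <= d < z removes the same n
    # as striking multiples of the primes p < z.
    for d in range(2, z):
        alive[d::d] = bytearray(len(range(d, x + 1, d)))
    return sum(alive)

n = 1000
omega = buchstab_omega(10.0, n)
x, z, s = 10**6, 100, 3
phi_count = count_unsifted(x, z)
main_term = s * omega[(s - 1) * n] * x / math.log(x)  # right-hand side of (5.13)
print(omega[-1], math.exp(-EULER_GAMMA), phi_count, round(main_term))
```

The first two printed numbers compare $\omega(10)$ with $e^{-\gamma}$, and the last two compare the exact count with the main term of (5.13).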

This follows from the general theory of adjoint equation developed in A3. The adjoint equation to (5.14) is $sq'(s) = -q(s+1)$ whose standard solution is given by

(5.17)  $$q(s) = \int_0^\infty \exp\Big(-sz - \int_0^z (1 - e^{-u})u^{-1}\,du\Big)dz$$

(see (3.86) with $(a,b) = (1,-1)$). Note that $q(s) > 0$ for $s > 0$ and $sq(s) \sim 1$ as $s \to \infty$ by (3.88). Hence (3.81) yields the inner product $\langle\omega,q\rangle = \omega(\infty) = e^{-\gamma}$. The constant function $\omega(\infty)$ has the same properties, therefore subtracting (3.81) for $\omega(\infty)$ from that for $\omega(s)$ we get the following integral formula

(5.18)  $$sq(s)(\omega(s) - \omega(\infty)) = -\int_{s-1}^s (\omega(x) - \omega(\infty))q(x+1)\,dx$$

for $s > 2$. This shows that $\omega(s) - \omega(\infty)$ changes sign in each interval of length 1. Accordingly the true asymptotic value for $\Phi(x,z)$ fluctuates around the expected value $xV(z)$ as often as $s = \log x/\log z$ moves by one. This behaviour of $\Phi(x,z)$ is exploited by J. Friedlander and A. Granville [FG] to show irregularities in the distribution of primes over residue classes to large moduli (see the comments about (1.30)).

Let $\lambda(n)$ be the totally multiplicative function such that $\lambda(p) = -1$ (the Liouville function). Put

(5.19)  $$\Psi(x,z) = -\sum_{\substack{1\le n\le x\\ (n,P(z))=1}}\lambda(n).$$

This sum coincides with $\Phi(x,z)$ if $x > z \ge \sqrt{x}$ and for $z < \sqrt{x}$ it satisfies the recurrence formula

$$\Psi(x,z) = \Psi(x,\sqrt{x}) - \sum_{z\le p<\sqrt{x}}\Psi\Big(\frac{x}{p},p\Big).$$

Hence we derive by the same argument as for the Buchstab sum

Lemma 5.2. For $s > 1$ we have

(5.20)  $$\Psi(x,x^{1/s}) \sim s\rho(s)x(\log x)^{-1}$$

as $x \to \infty$ where $\rho(s)$ is the continuous solution to

(5.21)
$$s\rho(s) = 1 \quad\text{if } 1 \le s \le 2,$$
$$(s\rho(s))' = -\rho(s-1) \quad\text{if } s > 2.$$
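The systems (5.14) and (5.21) differ only by a sign, so one integrator handles both. The Python sketch below (names and step size are choices made here) tabulates $\omega$ and $\rho$ and then checks the consequences of the relations $\omega \pm \rho = e^{-\gamma}F,\ e^{-\gamma}f$ stated in (5.23): by (5.1) the quantity $s(\omega(s) + \rho(s))$ should equal 2 on $1 \le s \le 3$, and by (5.32) the quantity $s(\omega(s) - \rho(s))$ should equal $2\log(s-1)$ on $2 \le s \le 4$.

```python
import math

def solve_delay(sign, s_max, steps_per_unit=1000):
    """Tabulate y on [1, s_max] with s*y(s) = 1 on [1, 2] and
    (s*y(s))' = sign * y(s - 1) for s > 2, by the trapezoidal rule.

    sign = +1 gives omega of (5.14); sign = -1 gives rho of (5.21).
    """
    n = steps_per_unit
    h = 1.0 / n
    y = [1.0 / (1 + i * h) for i in range(n + 1)]
    u = 1.0
    for i in range(n, int(round((s_max - 1) * n))):
        u += sign * 0.5 * h * (y[i - n] + y[i + 1 - n])
        y.append(u / (1 + (i + 1) * h))
    return y

n = 1000
omega = solve_delay(+1, 6.0, n)
rho = solve_delay(-1, 6.0, n)

def at(table, s):
    """Look up the tabulated value at a grid point s >= 1."""
    return table[int(round((s - 1) * n))]

# s*(omega + rho) = s*e^{-gamma}*F(s), expected to be 2 for 1 <= s <= 3.
upper = [(s, s * (at(omega, s) + at(rho, s))) for s in (1.5, 2.5, 3.0)]
# s*(omega - rho) = s*e^{-gamma}*f(s), expected to be 2*log(s - 1) for 2 <= s <= 4.
lower = [(s, s * (at(omega, s) - at(rho, s)), 2 * math.log(s - 1)) for s in (2.5, 3.0, 4.0)]
print(upper)
print(lower)
```

Each tuple in `lower` lists $s$, the computed value and the closed form, so agreement can be inspected entry by entry.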

Remarks. By Mertens' formula we can write (5.20) as

(5.22)  $$\Psi(x,z) \sim e^\gamma\rho(s)xV(z).$$

Combining (5.14) and (5.21) we find that for $s > 1$

(5.23)
$$\omega(s) + \rho(s) = e^{-\gamma}F(s),$$
$$\omega(s) - \rho(s) = e^{-\gamma}f(s).$$

5.3. The Optimality of the Linear Sieve

We shall show that the upper and the lower bounds for $S(A,z)$ obtained by the $\beta$-sieve for $\kappa = 1$ are essentially best possible. We have already got some clue from the equations (5.23). Consider the sequences $A^+ = (a_n^+)$ and $A^- = (a_n^-)$ with

(5.24)  $$a_n^+ = \tfrac12(1 - \lambda(n)), \qquad a_n^- = \tfrac12(1 + \lambda(n))$$

for $1 \le n \le x$ and $a_n^+ = a_n^- = 0$ if $n > x$. Therefore $A^+$ is supported on integers $n \le x$ having an odd number of prime divisors and $A^-$ on those having an even number of prime divisors. We have

$$S(A^+,z) + S(A^-,z) = \Phi(x,z),$$
$$S(A^+,z) - S(A^-,z) = \Psi(x,z).$$

Hence by (5.15), (5.22) and (5.23) we get

(5.25)
$$S(A^+,z) \sim \tfrac12 xF(s)V(z),$$
$$S(A^-,z) \sim \tfrac12 xf(s)V(z).$$
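For a feeling of how the asymptotics (5.25) look at a finite height, the sequences (5.24) can be sifted by brute force. The Python sketch below is illustrative only: it assumes $x = 10^6$ and $z = 100$ (so $s = 3$, where $sF(s) = 2e^\gamma$ and $sf(s) = 2e^\gamma\log 2$), counts prime factors with multiplicity to evaluate $\lambda(n)$, and uses names invented here. Convergence in (5.25) is slow, so only rough agreement should be expected.

```python
import math

def selberg_example(x, z):
    """Return (S(A+, z), S(A-, z)) for the sequences (5.24) by direct enumeration.

    big_omega[n] counts prime factors with multiplicity, so lambda(n) = (-1)**big_omega[n].
    """
    smallest = [0] * (x + 1)  # smallest prime factor of each n >= 2
    for p in range(2, x + 1):
        if smallest[p] == 0:
            for m in range(p, x + 1, p):
                if smallest[m] == 0:
                    smallest[m] = p
    big_omega = [0] * (x + 1)
    for n in range(2, x + 1):
        big_omega[n] = big_omega[n // smallest[n]] + 1
    s_plus = s_minus = 0
    for n in range(1, x + 1):
        if n == 1 or smallest[n] >= z:  # (n, P(z)) = 1
            if big_omega[n] % 2:
                s_plus += 1   # lambda(n) = -1, so n is counted by A+
            else:
                s_minus += 1  # lambda(n) = +1, so n is counted by A-
    return s_plus, s_minus

x, z = 10**6, 100
s_plus, s_minus = selberg_example(x, z)
gamma = 0.5772156649015329
V = math.prod(1 - 1 / p for p in range(2, z) if all(p % q for q in range(2, p)))
main_plus = 0.5 * x * (2 * math.exp(gamma) / 3) * V                 # (1/2) x F(3) V(z)
main_minus = 0.5 * x * (2 * math.exp(gamma) * math.log(2) / 3) * V  # (1/2) x f(3) V(z)
print(s_plus, round(main_plus), s_minus, round(main_minus))
```

In this range $S(A^+,z)$ simply counts the primes $100 < p \le 10^6$, while $S(A^-,z)$ counts $n = 1$ together with products of two primes exceeding 100.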

On the other hand we show that both sequences $A^+$, $A^-$ satisfy the hypotheses of the linear sieve. Indeed we have

(5.26)  $$\sum_{m\le y}\lambda(m) \ll y(\log y)^{-3}$$

by the Prime Number Theorem, whence for $d < x$ we derive

$$|A_d^\pm| = \frac12\sum_{m\le x/d}(1 \mp \lambda(d)\lambda(m)) = \frac{x}{2d}\Big\{1 + O\Big(\Big(\log\frac{x}{d}\Big)^{-3}\Big)\Big\}.$$

Therefore the approximation (1.15) holds with $g(d) = \frac1d$, $X = \frac{x}{2}$ and the total remainder term is bounded by

$$R(A,D) \ll \sum_{d<D}\frac{x}{d}\Big(\log\frac{x}{d}\Big)^{-3} \ll x(\log x)^{-2}$$

for $D = x^{1-\varepsilon}$. Hence the upper bound for $S(A^+,z)$ and the lower bound for $S(A^-,z)$ given in (5.10) coincide with the asymptotics (5.25) respectively. This shows that for $\kappa = 1$ Theorem 3.7 is essentially best possible, i.e., $F(s)$ cannot be reduced nor $f(s)$ increased for any $s > 2$. The above examples of optimal sequences $A^+$ and $A^-$ were found by A. Selberg [Sel].

5.4. A Refinement of Estimates for Error Terms

Some improvements of Theorem 3.1 are still possible in the estimation for $\Delta$ and $R(A,D)$. Let $\mathcal{D}^+$, $\mathcal{D}^-$ be the sets of support for the 2-sieves $\Lambda^+ = (\lambda_d^+)$ and $\Lambda^- = (\lambda_d^-)$ of level $D$, i.e.,

(5.27)
$$\mathcal{D}^+ = \{d = p_1\cdots p_n : p_1 > \cdots > p_n,\ p_1\cdots p_m p_m^2 < D \text{ for } m \text{ odd}\},$$
$$\mathcal{D}^- = \{d = p_1\cdots p_n : p_1 > \cdots > p_n,\ p_1\cdots p_m p_m^2 < D \text{ for } m \text{ even}\}.$$

We can show that if $z \le \sqrt{D}$ then (see [I1])

(5.28)  $$|\{d \in \mathcal{D}^+ \cup \mathcal{D}^- : d \mid P(z)\}| \ll D(\log D)^{-2}.$$

Hence if the individual error terms $r_d(A)$ are bounded then the total remainder term satisfies

(5.29)  $$\sum_{d\mid P(z)}\lambda_d^\pm r_d(A) \ll D(\log D)^{-2}.$$

Furthermore if the density function in the leading term satisfies $g(d)d \le 1$ then we can show that

(5.30)
$$V^+(D,z) < \{F(s) + O((\log D)^{-1})\}V(z),$$
$$V^-(D,z) > \{f(s) + O((\log D)^{-1})\}V(z),$$

if $s = \log D/\log z \ge 2$. Inserting (5.29) and (5.30) with $D = X$ to (1.61) one obtains

Theorem 5.3. Suppose (1.15) holds with $g(d)d \le 1$ and $|r_d(A)| \le 1$. Then for $2 \le z \le \sqrt{X}$ we have

(5.31)
$$S(A,z) \le \{F(s) + O((\log X)^{-1})\}XV(z),$$
$$S(A,z) \ge \{f(s) + O((\log X)^{-1})\}XV(z),$$

where $s = \log X/\log z$ and $F(s)$, $f(s)$ are the continuous solutions to (5.1) - (5.2). The implied constant is absolute.

The lower bound in Theorem 5.3 is quite interesting for $z$ somewhat smaller than $X^{1/2}$. We have

(5.32)  $$sf(s) = 2e^\gamma\log(s-1) \quad\text{if } 2 \le s \le 4.$$

Hence (5.31) gives

(5.33)  $$S(A,z) \gg X(\log X)^{-2} \quad\text{if } z < \eta X^{1/2}$$

where $\eta$ is a small positive constant. Applying this for an interval we get

Corollary 5.4. Any interval of length $X = cz^2$ where $c$ is a large absolute constant and $z \ge 2$ contains at least $(z/\log z)^2$ numbers having no prime divisors $< z$.

5.5. The Remainder in a Well-factorable Form

Even though the main terms of linear sieve bounds are best possible the results are not definite. These depend on the sieve level $D$ which can be chosen at will; the bigger $D$ the better the main terms. On the other hand $D$ cannot be too large because we must be able to show that the remainder term

(5.34)  $$R(A,P;\Lambda) = \sum_{d\mid P}\lambda_d r_d(A)$$

is negligible in relation to the expected value $XV(z)$. Very often we treat the remainder term by summing its terms with absolute values and consequently we require the bound

(5.35)  $$R_\gamma(A,D) = \sum_{d<D}\gamma^{\omega(d)}|r_d(A)| < X(\log X)^{-A}$$

with some constants $\gamma$, $A$ for $X$ sufficiently large. In practice (5.35) holds for $D = X^{\alpha-\varepsilon}$ with $0 < \alpha \le 1$. For example if $A = (a_n)$ with $a_n = \Lambda(n-2)$ for $2 < n < X$ then we have (5.35) for $D = X^{\frac12-\varepsilon}$ by virtue of the Bombieri-Vinogradov theorem.

The best exponent $\alpha$ one can hope for is $\alpha = 1$ since it is impossible to control the individual error terms $r_d(A)$ to moduli $d$ which exceed the number of elements in the sifting sequence. However, sometimes one can do better with $R(A,P;\Lambda)$ than $R_\gamma(A,D)$ by observing a cancellation of terms $\lambda_d r_d(A)$. This is due to the variation of signs of the sieve weights $\lambda_d$ and the error terms $r_d(A)$.

Remarks. In the case of Selberg's example (5.24) the sign of $\lambda_d$ agrees with that of $r_d(A)$ very often in $\sqrt{x} < d < x$, precisely we have

(5.36)  $$2r_d(A^-) = \lambda(d)\sum_{m\le M}\lambda(m) - \Big\{\frac{x}{d}\Big\}$$

for all $d$ in the interval $x(M+1)^{-1} < d \le xM^{-1}$ with any integer $1 \le M < \sqrt{x}$. Therefore there is no variation of sign of $\lambda_d r_d(A)$ as $d$ runs over the above intervals with $M$ such that $\sum_{m\le M}\lambda(m) \ne 0$.

Unless a choice of $A = (a_n)$ is biased, as in the Selberg example, we expect that the remainder term $r_d(A)$ changes sign in a much different fashion than the sieve weight $\lambda_d$ does (the latter agrees with $\mu(d)$ quite often) so the terms $\lambda_d r_d(A)$ do cancel considerably. However, it is not an easy job to exploit the cancellation because the sign change of $\lambda_d$ is intractable. Indeed the sign of $\lambda_d$ will play no crucial role in our developments. The key property of the linear sieve weight is hidden in its multiplicative structure which we are going to exploit now to write the remainder $R(A,P,\Lambda)$ in very flexible bilinear forms.

Let $\Lambda^+ = (\lambda_d^+)$ be the upper bound sieve of level $D > 1$, thus for $d \ne 1$ we have

$$\lambda_d^+ = \mu(d)\sum_{n=1}^\infty\ \sum_{p_1\cdots p_n=d}\psi^+(p_1,\dots,p_n)$$

where $\psi^+(p_1,\dots,p_n)$ is defined to be 1 if

(5.37)  $$p_1 > \cdots > p_n, \qquad p_1\cdots p_m p_m^2 \le D \quad\text{for } m \text{ odd}$$

and it is zero otherwise. Next we split the summation over $p_1,\dots,p_n$ into dyadic boxes,

$$\lambda_d^+ = \mu(d)\sum_{n=1}^\infty\ \sum_{(b_1,\dots,b_n)}\ \sum_{\substack{p_1\cdots p_n=d\\ b_m<p_m\le 2b_m}}\psi^+(p_1,\dots,p_n)$$

where $b_1,\dots,b_n$ run independently over the powers $1, 2, 4, 8, \dots$. The conditions (5.37) imply

(5.38)  $$b_1 \ge \cdots \ge b_n, \qquad b_1\cdots b_m b_m < D \quad\text{for all } m$$

(in fact the stronger conditions $b_1\cdots b_m b_m^2 < D$ hold for $m$ odd but we do not need these). Let $B_n$ denote the collection of vectors $b = (b_1,\dots,b_n)$ satisfying (5.38), thus its cardinality is bounded by

(5.39)  $$|B_n| \le (\log D)^n.$$

For any $d \ne 1$ we have

(5.40)  $$\lambda_d^+ = \mu(d)\sum_{n=1}^\infty\ \sum_{b\in B_n}\ \sum_{\substack{p_1\cdots p_n=d\\ b<(p_1,\dots,p_n)\le 2b}}\psi^+(p_1,\dots,p_n).$$

Now when the vectors $(p_1,\dots,p_n)$ are reasonably well located we want to forget the conditions (5.37). By Lemma 5.15 with $x = D$ we detect the conditions $p_1\cdots p_m p_m^2 \le D$ and by Lemma 5.16 with $y = \sqrt{D}$ the conditions $p_1 > \cdots > p_n$ (recall that $d = p_1\cdots p_n$ is squarefree so the strict inequalities hold) getting

$$\psi^+(p_1,\dots,p_n) = \prod_{\substack{1\le m\le n\\ m\ \mathrm{odd}}}\int_{-\infty}^\infty h(t)(p_1\cdots p_m p_m^2)^{it}\,dt\ \prod_{1\le\ell<n}\int_{-\infty}^\infty g(t)(p_\ell/p_{\ell+1})^{it}\,dt.$$

Here the number of integrals is $[\frac{3n-1}{2}] < 2n$ and each one has the $L^1$-norm bounded by $\log 6D$. By a linear change of variables we write

(5.41)  $$\psi^+(p_1,\dots,p_n) = \int_{\mathbb{R}^n}\varphi^+(t_1,\dots,t_n)\,p_1^{it_1}\cdots p_n^{it_n}\,dt_1\cdots dt_n$$

where $\varphi^+$ is a function on $\mathbb{R}^n$ with

(5.42)  $$\int_{\mathbb{R}^n}|\varphi^+(t)|\,dt \le (\log 6D)^{2n}.$$

Inserting (5.41) into (5.40) we obtain the desired representation

(5.43)  $$\lambda_d^+ = \sum_{n=1}^\infty\ \sum_{b\in B_n}\int_{\mathbb{R}^n}\varphi^+(t)\lambda_d(b,t)\,dt$$

where

(5.44)  $$\lambda_d(b,t) = \mu(d)\sum_{\substack{p_1\cdots p_n=d\\ b<(p_1,\dots,p_n)\le 2b}} p_1^{it_1}\cdots p_n^{it_n}.$$

The same argument applies for the lower bound sieve $\Lambda^- = (\lambda_d^-)$, just change $+$ into $-$ and the word "odd" into "even". For $d = 1$ we put $\lambda_1(b,t) = 1$. Note that $|\lambda_d(b,t)| \le n!$ and $\lambda_d(b,t) = 0$ if $d > 2^n\|b\|$ where $\|b\| = b_1\cdots b_n$ if $b = (b_1,\dots,b_n)$. Given a partition $b = (\mathbf{b}_1,\mathbf{b}_2)$ and the corresponding partition $t = (\mathbf{t}_1,\mathbf{t}_2)$ we have

(5.45)  $$\lambda_d(b,t) = \sum_{d_1d_2=d}\lambda_{d_1}(\mathbf{b}_1,\mathbf{t}_1)\lambda_{d_2}(\mathbf{b}_2,\mathbf{t}_2).$$

Lemma 5.5. Let $D = D_1D_2$ with $D_1, D_2 > 1$. Then any $b \in B_n$ has a partition $b = (\mathbf{b}_1,\mathbf{b}_2)$ such that

(5.46)  $$\|\mathbf{b}_1\| < D_1 \quad\text{and}\quad \|\mathbf{b}_2\| < D_2.$$

Proof. Suppose $b = (b_1,\dots,b_n)$ satisfies (5.38). We have $b_1 < D^{1/2} \le \max(D_1,D_2)$ so the assertion is true if $n = 1$. Let $b^* = (b_1,\dots,b_{n-1}) = (\mathbf{b}_1^*,\mathbf{b}_2^*)$ be a partition which satisfies (5.46). By the last condition of (5.38) we have either $\|\mathbf{b}_1^*\|b_n < D_1$ or $\|\mathbf{b}_2^*\|b_n < D_2$, therefore we can extend one of $\mathbf{b}_1^*$, $\mathbf{b}_2^*$ by $b_n$ getting a partition of $b$ which satisfies (5.46). This completes the proof by induction.

Definition. An arithmetic function $f(d)$ is said to be well-factorable of level $D$ if for any $M, N > 1$ with $MN = D$ there exist two functions $g(m)$, $h(n)$ such that $f = g * h$ and

$$|g(m)| \le 1, \qquad g(m) = 0 \quad\text{if } m > M,$$
$$|h(n)| \le 1, \qquad h(n) = 0 \quad\text{if } n > N.$$
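The induction in the proof of Lemma 5.5 is effectively a greedy algorithm: process $b_1, b_2, \dots$ in order and put each entry into the first part whose product stays below its bound. The Python sketch below spells this out; the function names are made up for the illustration, the parts are returned as index lists, and $D_1, D_2 > 1$ is assumed so that the empty product 1 is admissible.

```python
def split_vector(b, D1, D2):
    """Greedy form of the induction in Lemma 5.5.

    b is assumed to satisfy (5.38) with D = D1 * D2 and D1, D2 > 1.
    Returns index lists (part1, part2) whose entry products are below D1 and D2.
    """
    part1, part2 = [], []
    norm1 = norm2 = 1
    for i, bi in enumerate(b):
        if norm1 * bi < D1:
            part1.append(i)
            norm1 *= bi
        elif norm2 * bi < D2:
            part2.append(i)
            norm2 *= bi
        else:
            # Cannot happen under (5.38): norm1 * norm2 * bi * bi < D1 * D2.
            raise ValueError("b does not satisfy (5.38) for D = D1 * D2")
    return part1, part2

def satisfies_538(b, D):
    """Check b_1 >= ... >= b_n and b_1 ... b_m * b_m < D for every m."""
    prod = 1
    for m, bm in enumerate(b):
        if m > 0 and b[m - 1] < bm:
            return False
        prod *= bm
        if prod * bm >= D:
            return False
    return True

def norm(b, indices):
    """Product of the selected entries of b (the norm ||.|| of a sub-vector)."""
    result = 1
    for i in indices:
        result *= b[i]
    return result

b = (64, 16, 8, 4, 2)
D = 2**20
splits = {j: split_vector(b, 2**j, 2 ** (20 - j)) for j in range(1, 20)}
print(splits[12])
```

The dictionary `splits` holds one admissible partition for every factorization $D = 2^j \cdot 2^{20-j}$, which is the flexibility that the definition of a well-factorable function asks for.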

By Lemma 5.5 and the property (5.45) we derive

Corollary 5.6. For any $b \in B_n$ and $t \in \mathbb{R}^n$ the function $f(d) = \lambda_d(b,t)/n!$ is well-factorable of level $4^nD$.

Proof. Let $4^nD = D_1D_2$ with $D_1 \ge D_2 \ge 1$. If $D_2 > 2^n$ then the assertion holds by virtue of (5.45) and Lemma 5.5 with $2^{-n}D_1$, $2^{-n}D_2$ in place of $D_1$, $D_2$. If $D_2 \le 2^n$ then $D_1 \ge 2^nD$ so the trivial convolution is adequate.

Suppose $f(d)$ is a well-factorable function supported on $d \le D$, $(d,P) = 1$ and $g(c)$ is any function supported on $c \le C$, $c \mid P$ with $C \le D$ and $|g(c)| \le 1$. Then the convolution function $h = f * g$ is well-factorable of level $CD$.

We apply the above observation to the composed sieve $\Lambda^+ = (\lambda_d^+)$ described in Section 3.9. For this sieve the remainder term is

$$R(A,P,\Lambda^+) = \sum_{c\mid P(w)}\ \sum_{d\mid P(z,w)}\lambda_c^+\lambda_d^+ r_{cd}(A)$$

where $\lambda_c^+$ is the $\alpha$-sieve supported on $c < D^\varepsilon$ and $\lambda_d^+$ is the $\beta$-sieve supported on $d < D$. For the $\beta$-sieve we use the formula (5.43) getting

(5.47)  $$R(A,P,\Lambda^+) = \sum_{0\le n<\nu}\ \sum_{b\in B_n}\int_{\mathbb{R}^n}\varphi^+(t)R^+(b,t)\,dt$$

where

(5.48)  $$R^+(b,t) = \sum_{c\mid P(w)}\lambda_c^+\sum_{d\mid P(z,w)}\lambda_d(b,t)r_{cd}(A).$$

Here $n$ is restricted by $\nu = \varepsilon^{-1}\log\varepsilon^{-1}$ because for $b = (b_1,\dots,b_n)$ in $B_n$ we have $w^n < 2^nb_1\cdots b_n < 2^nb_n^{-1}D < D$. Moreover we can write

(5.49)  $$R^+(b,t) = \nu!\sum_{d\mid P(z)}\rho_d r_d(A)$$

where $\rho$ is a well-factorable function of level $4^\nu D^{1+\varepsilon}$. Similar results hold for the remainder term of the lower bound sieve $\Lambda^- = (\lambda_d^-)$ but, of course, with a different well-factorable function $\rho$. Taking the worst of all $\rho$ involved in (5.49) we conclude by (5.39), (5.42) and (5.47) (see [I5]).

Theorem 5.7. The remainder terms in the linear sieve estimates (5.5) satisfy

(5.50)  $$|R(A,P,\Lambda)| \le (\nu\log D)^{3\nu}\Big|\sum_{d\mid P}\rho_d r_d(A)\Big|$$

where $\nu = \varepsilon^{-1}\log\varepsilon^{-1}$ and $\rho$ is a well-factorable function of level $4^\nu D^{1+\varepsilon}$.

In order to polish the background from now on we change $4^\nu D^{1+\varepsilon}$ into $D$; of course, this does not affect the estimates (5.4) for the main terms $V^+(D,z)$, $V^-(D,z)$ because $F(s)$, $f(s)$ are of Lipschitz type. Now for the remainder terms we require

(5.51)  $$R(A,P,\Gamma) = \sum_{d\mid P}\rho_d r_d(A) \ll X(\log D)^{-A}$$

with any $\Gamma = (\rho_d)$ well-factorable of level $D$. Here $A$ is any positive number and the implied constant depends only on $A$. One of the most interesting sequences for which (5.51) is known to hold for $D$ considerably larger than the classical level is the sequence of shifted primes (see [BFI] and [F]).

Proposition 5.8. Let $x \ge 2$ and let $a \ne 0$ be a fixed integer. Then for any well-factorable function $f(d)$ of level $D = x^{4/7 - \varepsilon}$ we have
$$\sum_{(d,a)=1} f(d) \Big( \pi(x; d, a) - \frac{\pi(x)}{\varphi(d)} \Big) \ll x (\log x)^{-A} \tag{5.52}$$
where the implied constant depends only on $\varepsilon, a, A$.

This result together with (5.4) - (5.7) yields

Corollary 5.9. The number of pairs of twin primes $\{p, p - 2\}$ with $p \le x$ satisfies
$$\pi_2(x) < (7 + \varepsilon) \prod_{p > 2} \big(1 - (p - 1)^{-2}\big)\, x (\log x)^{-2} \tag{5.53}$$
for any $\varepsilon > 0$ provided $x$ is sufficiently large.

This bound is $3\tfrac{1}{2}$ times larger than the conjectured asymptotic value (2.56).

5.6. Estimates for Bilinear Forms in the Error Terms

By (5.47) one can arrange various special forms for $R(A, P, \Lambda)$. Given $M \ge N \ge 1$ with $MN = D$ we have
$$|R(A, P, \Lambda)| \le B(A, M, N)\, (\nu \log D)^{3\nu} \tag{5.54}$$
where $B(A, M, N)$ is a bilinear form of type
$$B(A, M, N) = \mathop{\sum_{m \le M} \sum_{n \le N}}_{mn | P} \alpha_m \beta_n \, r_{mn}(A) \tag{5.55}$$
with some $|\alpha_m| \le 1$ and $|\beta_n| \le 1$. We say $B(A, M, N)$ has size $M \times N$. Assuming $M \ge N$ we can make
$$\beta_n = 0 \quad \text{if } n \text{ has a prime divisor } < D^{1/\nu} \tag{5.56}$$
(recall that $\nu = \varepsilon^{-1} \log \varepsilon^{-1}$). This condition on $\beta_n$ helps to resolve technical problems in various methods of estimating $B(A, M, N)$. To show that the remainder term $R(A, P, \Lambda)$ has a negligible contribution to the estimates (5.5) we require by virtue of (5.54) that
$$B(A, M, N) \ll X (\log D)^{-A} \tag{5.57}$$


for any $A > 0$ with the implied constant depending on $A$. We wish $D$ to be as large as possible so we choose the $M, N$ which give the maximal $MN = D$ subject to (5.57). In practice our treatment of $B(A, M, N)$ makes no use of particular properties of the coefficients $\alpha_m, \beta_n$ other than the boundedness, hence neither of $M, N$ can be larger than $X$. Quite often we can establish (5.57) for $M$ as large as the classical level of $A$ while $N$ is a positive power of $M$. Even though $N$ is small in comparison to $M$, the resulting enlargement of $D$ improves the linear sieve bounds significantly, allowing us to pass the classical limits. We shall demonstrate some breakthroughs in the context of weighted sieves in the next chapter.

Since the coefficients $\alpha_m, \beta_n$ are supposed to be arbitrary there is no chance of showing cancellation of terms in the bilinear form $B(A, M, N)$ if the error terms $r_d(A)$ behave like a multiplicative function in $d$ over a long segment. Note that the sequence (5.24) has such a defect. In many important cases $r_d(A)$ has a nice Fourier series expansion whose terms are far from being multiplicative in $d$. In these cases the problem reduces to the estimation of certain exponential sums, and the latter are given useful estimates by various analytic methods. In particular the large sieve methods (see Chapter 8) work well. The large sieve methods are also very effective in the context of a Dirichlet series expansion for $r_d(A)$. If neither a Fourier nor a Dirichlet expansion is available one can apply to $B(A, M, N)$ the dispersion method of Linnik. There are other options as well, such as the circle method of Hardy-Ramanujan. It is the non-trivial handling of the error terms $r_d(A)$ through which analytic methods sneak into the modern sieve theory.

Below we give the bound (5.57) for three selected sequences (the proofs lie beyond the scope of this course).

Example 1.
Let $A = (a_n)$ be the characteristic function of integers in a short segment $x - y < n < x$, therefore $r_d(A) = |A_d| - \frac{y}{d}$. Suppose $x^{7/19} < y < x^{1/2}$. Then we have (5.57) for
$$X = y, \qquad M = y x^{-\varepsilon}, \qquad N = y^{19/16} x^{-7/16},$$
so $D = MN = y^{35/16} x^{-7/16 - \varepsilon}$.

Example 2. Let $A = (a_n)$ be the characteristic function of integers $n \le x$ in the arithmetic progression $n \equiv a \pmod q$, therefore for $(d, q) = 1$
$$r_d(A) = |A_d| - \frac{x}{dq}.$$
Suppose $x^{1/2} < q < x^{2/3}$. Then we have (5.57) for
$$X = q^{-1} x, \qquad M = q^{-1} x^{1 - \varepsilon}, \qquad N = q^{-3/4} x^{1/2},$$
so $D = MN = q^{-7/4} x^{3/2 - \varepsilon}$.

Example 3. Let $A = (a_n)$ be the characteristic function of integers $n = m^2 + 1 \le x$, therefore
$$r_d(A) = |A_d| - \frac{\omega(d)}{d}\, x^{1/2}$$
where $\omega(d)$ denotes the number of roots of the quadratic congruence $\nu^2 + 1 \equiv 0 \pmod d$. In this case (5.57) holds for
$$X = x^{1/2}, \qquad M = x^{1/2 - \varepsilon}, \qquad N = x^{1/18},$$
so $D = MN = x^{5/9 - \varepsilon}$.
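The exponent bookkeeping in Examples 1-3 is easy to check mechanically. The following is a minimal sketch, not part of the original notes: it stores each of $M$, $N$ as a pair of rational exponents (ignoring the $\varepsilon$ losses), multiplies them to recover $D = MN$, and computes the ratio $\log(MN)/\log M$. The helper names `multiply` and `improvement_rate` are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction as Fr

def multiply(m, n):
    """Multiply two monomials stored as (exponent of first base, exponent of x)."""
    return (m[0] + n[0], m[1] + n[1])

def improvement_rate(m, n, theta):
    """Return log(MN)/log(M) when the first base equals x**theta (epsilon ignored)."""
    log_m = m[0] * theta + m[1]
    log_n = n[0] * theta + n[1]
    return 1 + log_n / log_m

# Example 1: bases (y, x); M = y, N = y^(19/16) x^(-7/16)
M1, N1 = (Fr(1), Fr(0)), (Fr(19, 16), Fr(-7, 16))
# Example 2: bases (q, x); M = q^(-1) x, N = q^(-3/4) x^(1/2)
M2, N2 = (Fr(-1), Fr(1)), (Fr(-3, 4), Fr(1, 2))
# Example 3: only x is used; M = x^(1/2), N = x^(1/18)
M3, N3 = (Fr(0), Fr(1, 2)), (Fr(0), Fr(1, 18))

D1 = multiply(M1, N1)   # exponents of y and x in D for Example 1
D2 = multiply(M2, N2)   # exponents of q and x in D for Example 2
D3 = multiply(M3, N3)   # exponent of x in D for Example 3
```

Exact rational arithmetic is used so that the exponents can be compared with the stated fractions without rounding.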


APPENDIX FOR CHAPTER 5

Separation of Variables Techniques

In this section we develop certain integrals which are useful for separating integer variables $\ell, m, n$ constrained by inequalities of type $\ell \le x$ or $m \le n$.

Lemma 5.10. For $x \ge 1$ there exists a function $g(t)$ such that
$$\int_{-\infty}^{\infty} |g(t)|\, dt < \log 6x$$
and for every positive integer $\ell$
$$\int_{-\infty}^{\infty} g(t)\, \ell^{it}\, dt = \begin{cases} 1 & \text{if } \ell \le x \\ 0 & \text{otherwise.} \end{cases}$$

Proof. Put $f(u) = \min\{u, 1, [x] + 1 - u\}$ on $0 \le u \le [x] + 1$ and $f(u) = 0$ elsewhere. Therefore, for a positive integer $\ell$ we have $f(\ell) = 1$ if $\ell \le x$ and $f(\ell) = 0$ otherwise. On the other hand $f(u)$ is given by the inverse Mellin transform
$$f(u) = \frac{1}{2\pi i} \int_{(0)} g(s)\, u^{-s}\, ds$$
with
$$g(s) = \int_0^{\infty} f(u)\, u^{s-1}\, du = -\frac{1}{s} \int_0^1 u^s\, du + \frac{1}{s} \int_{[x]}^{[x]+1} u^s\, du = \frac{1}{s(s+1)} \Big( ([x]+1)^{s+1} - [x]^{s+1} - 1 \Big),$$
where the middle expression comes from integrating by parts. These three expressions show that on the line $\operatorname{Re} s = 0$
$$|g(s)| \le \min\Big\{ 1 + \log x,\ \frac{2}{|s|},\ \frac{2(x+1)}{|s(s+1)|} \Big\}.$$
Hence integrating separately over the intervals $[0, 1)$, $[1, x)$, $[x, \infty)$ we deduce that
$$\int_0^{\infty} |g(it)|\, dt \le 1 + \log x + 2 \log x + 2(x+1) x^{-1} < \pi \log 6x$$
which completes the proof by changing $g(it)$ into $g(t)$.
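The construction in this proof can be sanity-checked numerically. The sketch below is an illustration only, with hypothetical helper names `tent`, `mellin_closed_form` and `mellin_numeric`: it evaluates the trapezoid-shaped function $f$ at integers, and for real $s > 0$ compares a midpoint-rule value of $\int_0^\infty f(u) u^{s-1}\, du$ with the closed form $\big(([x]+1)^{s+1} - [x]^{s+1} - 1\big)/(s(s+1))$ obtained by integrating by parts.

```python
import math

def tent(u, x):
    """f(u) = min{u, 1, [x] + 1 - u} on [0, [x] + 1], and 0 elsewhere."""
    top = math.floor(x) + 1
    if u < 0 or u > top:
        return 0.0
    return min(float(u), 1.0, float(top - u))

def mellin_closed_form(s, x):
    """Closed form of the Mellin transform of f for s > 0 (integration by parts)."""
    k = math.floor(x)
    return ((k + 1) ** (s + 1) - k ** (s + 1) - 1) / (s * (s + 1))

def mellin_numeric(s, x, steps=20000):
    """Midpoint rule for the integral of f(u) * u**(s - 1) over [0, [x] + 1]."""
    top = math.floor(x) + 1
    h = top / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += tent(u, x) * u ** (s - 1)
    return total * h
```

For instance, with $x = 7.3$ and $s = 2$ the three pieces of the integral are $1/3$, $24$ and $11/3$, which sum to $28$, the value the closed form gives.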


Lemma 5.11. For $y \ge 1$ there exists a function $h(t)$ such that
$$\int_{-\infty}^{\infty} |h(t)|\, dt < \log 6y^2$$
and for all positive integers $m, n \le y$ we have
$$\int_{-\infty}^{\infty} h(t) \Big( \frac{m}{n} \Big)^{it} dt = \begin{cases} 1 & \text{if } m \le n \\ 0 & \text{otherwise.} \end{cases}$$

Proof. The distinct points $\frac{m}{n}$ are spaced by $\ge y^{-2}$, so the condition $\frac{m}{n} \le 1$ is equivalent to $\frac{m}{n} \le u$ for any $u$ with $v \le u < v + y^{-2}$, where $v$ is the largest of all points $\frac{m}{n} \le 1$. The remaining part is similar to the proof of Lemma 5.10.

6.1. ALMOST-PRIMES

CHAPTER VI

WEIGHTED SIEVES

6.1. Almost-Primes

We know by now that the sieve methods are capable of detecting almost-primes in various interesting sequences. For this purpose the sifting sequence $A = (a_n)$ must be, so to speak, in a local position, that is we require, in addition to the usual hypotheses, that $A$ is supported in the initial segment $1 \le n \le x$. Throughout we assume that $x$ is sufficiently large. Furthermore we require that the sifting range $P$ consists of all primes (except possibly for the primes $p$ such that $|A_p| = 0$ since these play no role whatsoever). In such a situation the sifted sum
$$S(A, z) = \sum_{(n, P(z)) = 1} a_n$$
takes only elements $a_n$ with $n$ having fewer than $\log x / \log z$ prime divisors. If one can show that
$$S(A, z) \gg X V(z) \tag{6.1}$$
for $z$ as large as $z = x^{1/(r+1)}$ with $r$ a positive integer then we can conclude that there are elements $a_n \in A$ with $n$ having at most $r$ prime divisors. Let us recall the following concept.

Definition. A positive integer $n$ is said to be almost-prime of order $r$ if $\nu(n) \le r$. Here $\nu(n)$ denotes the number of distinct prime divisors of $n$.

We denote the set of almost-primes of order $r$ by $P_r$, thus $P \subseteq P_1 \subseteq P_2 \subseteq \dots$ and $\mathbb{N} = \bigcup P_r$. For $r = 1$ the set $P_1$ consists of prime powers. Suppose $A$ has level
$$D = x^{(1 - \varepsilon)/g} \tag{6.2}$$

with some $g \ge 1$ and any $\varepsilon > 0$ provided $x$ is sufficiently large in terms of $\varepsilon$. We shall refer to $g$ as a degree of $A$. Indeed, if $A = (a_n)$ is the indicator of polynomial values $n = F(m)$ then $g = \deg F$, but in general $g$ can be any real number $\ge 1$. In fact $g$ is not uniquely defined. We wish to have $g$ as small as possible to prove better results, and this depends on our skill at getting a high level $D$ for the particular sequence. For example we know that the sequence of shifted primes $n = p - 2$ has degree $g = 2$ by virtue of the Bombieri-Vinogradov theorem whereas one hopes to have $g = 1$ in this case (the Elliott-Halberstam conjecture).

Immediately by definition of the sieve limit $\beta = \beta(\kappa) \ge 1$ it follows that (6.1) holds for
$$z = D^{(1 - \varepsilon)/\beta} = x^{(1 - \varepsilon)^2 / \beta g}. \tag{6.3}$$
Therefore the sequence $A = (a_n)$ contains elements supported on almost-primes of order
$$r = [\beta g + \varepsilon], \tag{6.4}$$

for any $\varepsilon > 0$, and we have
$$\sum_{n \in P_r} a_n \gg X (\log x)^{-\kappa} \tag{6.5}$$
provided $x$ is sufficiently large.

Before advancing arguments it is instructive to say a few more words about (6.5). First we recall that the proof of (6.5) goes through a lower bound for the sifted sum $S(A, z)$ and the latter is estimated below by the weighted sum
$$S^-(A, z) = \sum_{d | P(z)} \lambda_d^- |A_d| = \sum_n a_n \Big( \sum_{d | (n, P(z))} \lambda_d^- \Big)$$
which could be evaluated asymptotically though we had only given a lower bound for economy's sake. In the last sum if $n$ has more than $r$ prime divisors then $(n, P(z)) \ne 1$ so the multiplicity factor is
$$w(n) = \sum_{d | (n, P(z))} \lambda_d^- \le 0 \tag{6.6}$$

by (1.60) and the corresponding term $a_n$ yields no positive contribution. It was P. Kuhn [Kuh] who first realized that one can detect almost-primes of smaller order by estimating sums
$$S(AW) = \sum_n a_n w(n) \tag{6.7}$$
with more suitable multiplicities $w(n)$ than those given by (6.6). We seek real numbers $w(n)$ such that
$$w(n) \le 0 \quad \text{if } n \le x \text{ and } \nu(n) > r. \tag{6.8}$$
At the same time we wish to have
$$S(AW) \gg X (\log x)^{-\kappa}. \tag{6.9}$$
From both properties we conclude there must exist $a_n \in A$ with $\nu(n) \le r$. In order to be able to evaluate the sum $S(AW)$ we require the multiplicities $w(n)$ to be of convolution type
$$w(n) = \sum_{d | n} w_d. \tag{6.10}$$

By unfolding the convolution as before we obtain the weighted sum
$$S(AW) = \sum_d w_d |A_d| \tag{6.11}$$
into which we insert the approximation (1.15) for each $d$ with $w_d \ne 0$. The resulting main term is $XW$ with
$$W = \sum_d w_d\, g(d) \tag{6.12}$$
and the remainder term is
$$R(A, W) = \sum_d w_d\, r_d(A). \tag{6.13}$$
If the weights $w_d$ are nice we should be able to evaluate $W$ asymptotically, or at least to show that
$$W \gg (\log x)^{-\kappa}. \tag{6.14}$$
Hence (6.9) follows provided the remainder term is under control, and this requires the weights $w_d$ to be supported on $d \le D$.

Along the above lines we described the general principles of a weighted sieve. The goal is to prove (6.5) with small $r$ bounded in terms of $\kappa$ and $g$ only. In this framework one proposes the problem of constructing the weights $w_d$ which produce almost-primes of the lowest order. Needless to say the job is hopelessly difficult. Pretty good results were obtained in an experimental fashion by Buchstab, Miech, Ankeny-Onishi, Selberg, Richert-Halberstam, Laborde and Greaves.


6.2. Sieve Limits for Almost-Primes

Definition. Any number $\Lambda(\kappa, r) \ge 1$ such that (6.5) holds whenever
$$g < \Lambda(\kappa, r) \tag{6.14}$$
is called the sieve limit for almost-primes of order $r$.

For example we may have $\Lambda(\kappa, r) = r \beta(\kappa)^{-1}$ by virtue of (6.4). In particular by (2.50) we get
$$\Lambda(\kappa, r) = \frac{r}{4\kappa + 1}. \tag{6.15}$$
From now on we confine discussion to the linear sieve. In this case we denote $\Lambda(1, r) = \Lambda_r$. Since the sieve limit is $\beta = 2$ we have (6.5) with $r < 2g + \varepsilon$ by (6.4), therefore $\Lambda_r = \frac{r}{2}$ is possible. On the other hand Selberg (oral communication) constructed a sequence showing that
$$\Lambda_r > r \quad \text{is not possible.} \tag{6.16}$$

To see this consider the characteristic sequence $A = (a_n)$ of numbers $n = m p_1 \dots p_{r-1} \le x$ with $m > 1$ having an even number of prime divisors and $x^{1/r} < p_j < 2x^{1/r}$. Clearly $\nu(n) \ge r + 1$ and $A$ has level $D = x^{(1 - \varepsilon)/r}$ by (5.26) so its degree is $g = r$.

Conjecture. For the linear sieve
$$\Lambda_r = r. \tag{6.17}$$

The following values were obtained by different constructions:

$\Lambda_r = r - 0.261\ldots$ (Richert, 1969)
$\Lambda_r = r - 0.145\ldots$ (Laborde, 1979)
$\Lambda_r = r - 0.124\ldots$ (Greaves, 1986).

Richert's weights are simple and elegant. We shall use these to show the following


Theorem 6.1. For any $r \ge 2$ one may take
$$\Lambda_r = \frac{\log \frac{3}{4}(3^r + 1)}{\log 3}. \tag{6.18}$$

Proof. Following Richert [R1] we consider the sifted sum
$$S(AW, z) = \sum_{(n, P(z)) = 1} a_n w(n) \tag{6.19}$$
with multiplicities of type
$$w(n) = 1 - \lambda \sum_{p | n} w_p \tag{6.20}$$
where $\lambda$ is a positive number and $w_p$ are the logarithmic weights
$$w_p = \max\Big\{ 1 - \frac{\log p}{\log y},\ 0 \Big\}. \tag{6.21}$$
Thus we have three parameters to choose, namely $y$, $\lambda$ and $z$. Since we cannot handle the weight $w_p$ with $p$ beyond the level of $A$ we restrict to $z < y < D$. Let us put
$$z = x^{1/gs} \quad \text{and} \quad y = x^{1/gu} \quad \text{with } s > u > 1.$$
For $n \le x$ with $(n, P(z)) = 1$ and $\nu(n) > r$ we have
$$w(n) \le 1 - \lambda \sum_{p | n} \Big( 1 - \frac{\log p}{\log y} \Big) \le 1 - \lambda \Big( \nu(n) - \frac{\log n}{\log y} \Big) \le 1 - \lambda \Big( \nu(n) - \frac{\log x}{\log y} \Big) \le 1 - \lambda (r + 1 - gu).$$
Assuming $gu < r + 1$ we choose
$$\lambda = (r + 1 - gu)^{-1} \tag{6.22}$$
to make the multiplicity $w(n) \le 0$. This proves that
$$\sum_{n \in P_r} a_n \ge S(AW, z). \tag{6.23}$$
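The sign property behind (6.23) can be tested by brute force for a small $x$. The following sketch is an illustration under assumed parameter values ($x = 20000$, $g = 2$, $r = 2$, $u = 1.2$, $s = 4$, none of which come from the notes): it computes the weights (6.20)-(6.22) for every $n \le x$ free of prime factors below $z$ and having more than $r$ distinct prime divisors, and returns the largest weight found. The function names are hypothetical.

```python
import math

def distinct_prime_factors(n):
    """Distinct prime divisors of n by trial division."""
    primes, p = [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes

def richert_weight(primes, y, lam):
    """w(n) = 1 - lam * sum over p | n of max(1 - log p / log y, 0)."""
    return 1.0 - lam * sum(max(1.0 - math.log(p) / math.log(y), 0.0) for p in primes)

def max_weight_beyond_order(x, g, r, u, s):
    """Largest w(n) over n <= x with no prime factor < z and more than r distinct primes."""
    z = x ** (1.0 / (g * s))
    y = x ** (1.0 / (g * u))
    lam = 1.0 / (r + 1 - g * u)          # the choice (6.22), needs g*u < r + 1
    worst = -math.inf
    for n in range(2, x + 1):
        primes = distinct_prime_factors(n)
        if len(primes) > r and min(primes) >= z:
            worst = max(worst, richert_weight(primes, y, lam))
    return worst
```

With these parameters every admissible $n$ outside $P_r$ should receive a non-positive weight, so the returned maximum is expected to be at most $0$ up to rounding.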


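To proceed further we express $S(AW, z)$ in terms of pure sifted sums as follows
$$S(AW, z) = S(A, z) - \lambda \sum_{z \le p < y} \Big( 1 - \frac{\log p}{\log y} \Big) S(A_p, z).$$
Then we apply (5.10) with levels $D$ and $D_p = D/p$ respectively getting
$$S(A, z) > X V(z) \{ f(s) - \varepsilon \} - R(A, D)$$
$$S(A_p, z) < g(p) X V(z) \{ F(s_p) + \varepsilon \} + R(A_p, D_p)$$
where
$$s_p = \log(x^{1/g} / p) / \log z = \Big( 1 - g \frac{\log p}{\log x} \Big) s.$$
By Lemma 1.1 we execute the summation over $p$ getting
$$\sum_{z \le p < y} \Big( 1 - \frac{\log p}{\log y} \Big) g(p) F(s_p) < \int_u^s \Big( 1 - \frac{u}{t} \Big) F\Big( \Big(1 - \frac{1}{t}\Big) s \Big) \frac{dt}{t} + \varepsilon.$$
Hence
$$S(AW, z) > X V(z) \{ G(s, u) - \varepsilon \}$$
where
$$G(s, u) = f(s) - \lambda \int_u^s \Big( 1 - \frac{u}{t} \Big) F\Big( \Big(1 - \frac{1}{t}\Big) s \Big) \frac{dt}{t}.$$
For computational reasons we restrict to $s \le 4$; we are then in the range of elementary functions
$$s F(s) = 2 e^{\gamma} \quad \text{if } 1 \le s \le 3, \qquad s f(s) = 2 e^{\gamma} \log(s - 1) \quad \text{if } 2 \le s \le 4.$$
We compute that
$$G(s, u) = \Big\{ \log(s - 1) - \lambda \Big[ u \log \frac{s}{u} - (u - 1) \log \frac{s - 1}{u - 1} \Big] \Big\} F(s).$$
We demand $G(s, u)$ to be positive, i.e.,
$$(r + 1 - gu) \log(s - 1) > u \log \frac{s}{u} - (u - 1) \log \frac{s - 1}{u - 1}.$$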


For $s = 4$ this condition becomes
$$gu < r + \delta(u) \tag{6.24}$$
where
$$\delta(u) = \Big[ u \log \frac{3u}{4} - (u - 1) \log(u - 1) \Big] \Big/ \log 3. \tag{6.25}$$
Hence we conclude that for any $u$ with $1 < u < \frac{r+1}{g}$ the number
$$\Delta(r, u) = \frac{r + \delta(u)}{u} \tag{6.26}$$
is a limit for almost-primes of order $r$. The maximum of $\Delta(r, u)$ is attained at $u = 1 + 3^{-r}$ and is equal to (6.18), completing the proof of Theorem 6.1.

Subsequently Richert refined his construction by allowing the weights $w_p$ in (6.20) to depend on $n$. He sets
$$w_p(n) = \frac{\log p(n)}{\log y} \quad \text{if } p(n) < p < y^{1/2}, \tag{6.27}$$
where $p(n)$ is the least prime divisor of $n$, and he keeps $w_p(n) = w_p$ as in (6.21) for the other places $p$. With $\lambda$ given by (6.22) as before (assuming $\lambda > 1$, i.e., $r < gu < r + 1$) one shows that if $w(n)$ is positive for $n \le x$ then $n \in P_r$ (see [HHR] and [I6]). M. Laborde [Lab] gave further refinements. Either of these is essential only if one chooses $z < D^{1/4}$ and in this range the involved functions $f(s)$, $F(s)$ are not elementary, which complicates the computations somewhat.

The weights of G. Greaves [G2] are altogether different. These are based on the particular sets (5.27). Thus Greaves' weights reach the roots of the linear sieve before appealing to estimates for sifted sums. He predicts that the optimal weights cannot be restricted only to large prime divisors ($p \ge z$ in the former constructions). To see the progress we list three values of $\Lambda_2$:

$\Lambda_2 = 2 - 0.165\ldots$ (Richert)
$\Lambda_2 = 2 - 0.142\ldots$ (Laborde)
$\Lambda_2 = 2 - 0.044\ldots$ (Greaves).
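The optimization at the end of the proof of Theorem 6.1 and Richert's value $\Lambda_2 = 2 - 0.165\ldots$ can be reproduced with a few lines of arithmetic. This is only a sketch using the formulas (6.18), (6.25) and (6.26); the function names `delta`, `big_delta` and `richert_limit` are hypothetical.

```python
import math

def delta(u):
    """delta(u) from (6.25), defined for u > 1."""
    return (u * math.log(3 * u / 4) - (u - 1) * math.log(u - 1)) / math.log(3)

def big_delta(r, u):
    """Delta(r, u) = (r + delta(u)) / u from (6.26)."""
    return (r + delta(u)) / u

def richert_limit(r):
    """Closed form (6.18): log((3/4)(3^r + 1)) / log 3."""
    return math.log(0.75 * (3 ** r + 1)) / math.log(3)

def optimal_u(r):
    """The maximizing point u = 1 + 3^(-r)."""
    return 1 + 3.0 ** (-r)
```

Evaluating `big_delta(r, optimal_u(r))` and `richert_limit(r)` for small $r$ gives a quick check that the stated maximum and the closed form agree, and `2 - richert_limit(2)` recovers the first entry of the list above.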

In view of the conjecture $\Lambda_r = r$ our current values of $\Lambda_r$ are quite good. There is, however, room for substantial improvements in terms of the degree $g$. Indeed, the condition (6.14) is weaker if $g$ is smaller. Recall that the degree is a function of the level given by (6.2), and the level can be higher if the remainder has a suitable form. At the end of Chapter 5 we have given three sequences which have larger than the classical level in the sense of bilinear forms for the remainder term. Needless to say there are many more examples of such sequences so it is important to develop a weighted sieve theory which takes advantage of the new setting of the linear sieve.

It is easy to adapt the Richert constructions (6.19)-(6.21). Suppose (5.57) holds true for every bilinear form $B(A, M, N)$ of size $M \times N$ with $M \ge N \ge 1$. Then $D = MN$ is the level of $A$ by virtue of Theorem 5.7. Moreover $A_p$ has level $D_p = D/p$ on average over $p$ but only in the range $p \le M$. Therefore the test condition (6.24) is valid in the new setting provided all the weights $w_p$ are supported on $p \le M$. To secure this we choose $y = M$ (this is not always the best choice) which gives
$$u \sim 1 + \frac{\log N}{\log M}. \tag{6.28}$$
The positivity condition (6.24) becomes
$$h < r + \delta(u) \tag{6.29}$$
where $h = gu$ is the degree of $A$ with respect to the classical level $M$ and $g$ is the degree of $A$ with respect to the new (improved) level $MN$. Notice that if $\delta(u) = 0$ then the condition (6.29) coincides with the conjecture $\Lambda_r = r$. The function $\delta(u)$ is increasing and has a zero at
$$u_0 = 1.091\ldots. \tag{6.30}$$

Hence, if our improvement of the classical level exceeds 9.1%, we pass the classical limit for almost-primes.

6.3. Some Applications

One gets nothing for primes from weighted sieves, therefore we seek almost-primes of the second order because these are direct relatives of primes. In this section we verify the condition (6.29) for $r = 2$ with sequences from the three examples in Section 5.6.

For integers in a short interval $x - y < n \le x$ with $y = x^{\theta}$ the classical degree is $h = \theta^{-1}$ and the improvement rate is $u = 1 + (19\theta - 7)/16\theta$ if $\frac{7}{19} < \theta < \frac{1}{2}$. For $\theta = \frac{5}{11}$ we get $h = \frac{11}{5}$, $u = \frac{49}{40}$ and $\delta(u) = 0.211\ldots$ so we pass the positivity test (6.29), proving

Theorem 6.2. If $x$ is sufficiently large then there exists $n \in P_2$ with $x - x^{5/11} < n \le x$.

For integers $n \le x$ in a progression $n \equiv a \pmod q$ with $(a, q) = 1$ and $q = x^{\theta}$ the classical degree is $h = (1 - \theta)^{-1}$ and the improvement rate is $u = 1 + (2 - 3\theta)/4(1 - \theta)$ if $\frac{1}{2} < \theta < \frac{2}{3}$. For $\theta = \frac{7}{13}$ we get $h = \frac{13}{6}$, $u = \frac{29}{24}$ and $\delta(u) = 0.189\ldots$ so we pass the positivity test (6.29), proving

Theorem 6.3. If $q$ is sufficiently large then for any $a$ prime to $q$ there exists $n \in P_2$ with $n \equiv a \pmod q$ and $n < q^{13/7}$.

Our third sequence consists of numbers $n = m^2 + 1 \le x$. This has degree $h = 2$ and the improvement rate $u = \frac{10}{9} > u_0$. Therefore we pass the positivity test (6.29), proving

Theorem 6.4. There are infinitely many integers $m$ with $m^2 + 1 \in P_2$.
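The numerical claims of this section (the zero $u_0$ of $\delta$ and the three positivity tests for $r = 2$) can be reproduced as follows. This is a minimal sketch based on formula (6.25) and condition (6.29); the helper names `delta`, `delta_zero` and `passes_test`, and the bisection bracket $[1.01, 1.5]$, are choices made for this illustration.

```python
import math

def delta(u):
    """delta(u) from (6.25), defined for u > 1."""
    return (u * math.log(3 * u / 4) - (u - 1) * math.log(u - 1)) / math.log(3)

def delta_zero(lo=1.01, hi=1.5, iterations=80):
    """Bisection for the zero of delta on [lo, hi], where delta is increasing."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if delta(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def passes_test(h, u, r=2):
    """Positivity condition (6.29): h < r + delta(u)."""
    return h < r + delta(u)

# (classical degree h, improvement rate u) for the three sequences of Section 5.6
short_interval = (11 / 5, 49 / 40)     # theta = 5/11
progression = (13 / 6, 29 / 24)        # theta = 7/13
quadratic = (2.0, 10 / 9)              # n = m^2 + 1
```

Calling `passes_test(*short_interval)`, `passes_test(*progression)` and `passes_test(*quadratic)` checks the three cases; the margins in the first two are small, which reflects how the exponents $5/11$ and $7/13$ were chosen.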


6.4. Twin Almost-Primes

In connection with the twin prime and Goldbach problems J.-R. Chen [C] considered the sifted sum (6.19) with multiplicities of type
$$w(n) = 1 - \frac{1}{2} \sum_{\substack{p | n \\ z \le p < y}} 1 - \frac{1}{2} \sum_{\substack{p_1 p_2 p_3 = n \\ z \le p_3 < y \le p_2 \le p_1}} 1 \tag{6.31}$$
with $z < y = x^{1/3}$. Clearly if $n \le x$, $(n, P(z)) = 1$ and $w(n) > 0$ then $n \in P_2$. Chen takes $z = x^{1/10}$ though $z = x^{1/8}$ is sufficient. A new idea appears in the treatment of numbers $n = p_1 p_2 p_3$. When the twin prime problem is tried the numbers $n = p_1 p_2 p_3 = p - 2$ are replaced by $m = p_1 p_2 p_3 + 2 = p$. A similar treatment (called switching sieves) was given independently in [I2] to the equation
$$p = a^2 + b^2 + 1. \tag{6.32}$$
In our work a half-linear sieve is switched to a double-linear sieve whereas in Chen's work a linear sieve is switched to another linear sieve. After the switch the sequence being sifted is thinner than the original one so even a crude upper bound is fine. This idea is revealing in the case of the equation (6.32) which we are going to treat in Section 7.2 (one can see that it works without checking numerical computations). Here we treat the twin primes problem in a straightforward fashion (without weights) to re-establish the celebrated result of Chen.

Theorem 6.5. There are infinitely many primes $p$ such that $p - 2$ is almost-prime of order two.

Actually Chen's weighted sieve produces primes $p$ such that $p - 2$ has at most two prime divisors each of which is larger than $p^{1/8}$. We shall make a step closer to the genuine twin primes by requiring these two prime divisors to be larger than $p^{3/11}$. Such a result has applications to the Artin conjecture for primitive roots and to the Lang-Trotter conjecture for elliptic curves where it is vital that $\frac{3}{11} > \frac{1}{4}$ (see [Mur] for other results and contributors). Recall the hypothetical formula (2.56) for twin primes, $\pi_2(x) \sim B x (\log x)^{-2}$, where $B$ is the twin prime constant
$$B = 2 \prod_{p > 2} \big( 1 - (p - 1)^{-2} \big).$$
Let $\pi_2(x, z)$ denote the number of primes $p \le x$ such that $p - 2 = p_1 p_2$ with $p_1 \ge p_2 \ge z$. We shall prove the following estimate


Theorem 6.6. For all $x$ sufficiently large
$$\pi_2(x) + \pi_2(x, x^{3/11}) > \tfrac{1}{31}\, B x (\log x)^{-2}. \tag{6.33}$$

Let $A = (a_n)$ be the characteristic sequence of numbers $n = p - 2$ with $2 < p \le x$. By Proposition 5.8 our sequence has level $D = x^{4/7 - \varepsilon}$ in the sense of Theorem 5.7. Therefore applying the linear sieve inequality (5.5) we get
$$S(A, z) > \big( f(\tfrac{4s}{7}) - \varepsilon \big) |A| V(z)$$
for $z = x^{1/s}$ where $|A| = \pi(x) - 1 \sim x (\log x)^{-1}$ and
$$V(z) = \prod_{2 < p < z} \Big( 1 - \frac{1}{p - 1} \Big) \sim e^{-\gamma} B (\log z)^{-1}.$$
Assuming $3 < s \le 4$ the sifted sum $S(A, z)$ catches numbers $p - 2$ having at most three prime divisors each of which is $\ge z$. We just do not want those with exactly three prime divisors. Thus we need to subtract from $S(A, z)$ the number of prime solutions to $p - 2 = p_1 p_2 p_3$. We interpret this equation as $p = p_1 p_2 p_3 + 2$ and let $B = (b_m)$ be the characteristic sequence of numbers
$$m = p_1 p_2 p_3 + 2 \le x \quad \text{with } p_1 > p_2 > p_3 \ge z.$$
After this switch the number of unwanted solutions is estimated by $S(B, z)$. The new sequence $B$ is rather thin since $z$ is near $x^{1/3}$ (precisely $|B| < \frac{1}{12} |A|$, see (6.34) for $s = \frac{11}{3}$) yet it has the same level as $A$ (Proposition 5.8 holds for a great variety of almost-prime numbers). Therefore applying the linear sieve inequality (5.5) we get
$$S(B, z) < \big( F(\tfrac{4s}{7}) + \varepsilon \big) |B| V(z).$$
It remains to estimate $|B|$. By the Prime Number Theorem
$$|B| \sim \frac{1}{6} \mathop{\sum\sum}_{\substack{p_2, p_3 \ge z \\ z p_2 p_3 \le x}} \frac{x}{p_2 p_3} \Big( \log \frac{x}{p_2 p_3} \Big)^{-1} \sim G(s)\, x (\log x)^{-1}$$
where
$$G(s) = \frac{1}{6} \iint_{\substack{\alpha, \beta > 1/s \\ \alpha + \beta < 1 - 1/s}} (\alpha \beta)^{-1} (1 - \alpha - \beta)^{-1}\, d\alpha\, d\beta = \frac{s}{6} \iint_{\substack{\alpha, \beta > 1 \\ \alpha + \beta < s - 1}} \frac{d\alpha\, d\beta}{\alpha \beta (s - \alpha - \beta)}.$$
Since $\alpha \beta (s - \alpha - \beta) > s - 2$ and the area of the triangle is $\frac{1}{2}(s - 3)^2$ we easily get a quite good upper bound
$$G(s) < \frac{s (s - 3)^2}{12 (s - 2)}. \tag{6.34}$$
Combining the above estimates we arrive at
$$\pi_2(x) + \pi_2(x, z) \ge S(A, z) - S(B, z) > \big\{ f(\tfrac{4s}{7}) - F(\tfrac{4s}{7}) G(s) + o(1) \big\} V(z)\, x (\log x)^{-1} \sim \tfrac{7}{2} \big\{ \log(\tfrac{4s}{7} - 1) - G(s) \big\} B x (\log x)^{-2}.$$
For $s = \frac{11}{3}$ we compute that the above constant is greater than $\frac{1}{31}$. This completes the proof of (6.33), hence also of Theorem 6.5.
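The final numerical step of this proof is a one-line computation. The sketch below evaluates the constant $\tfrac{7}{2}\{\log(\tfrac{4s}{7} - 1) - G(s)\}$ with $G(s)$ replaced by its upper bound (6.34); the names `g_upper` and `twin_constant` are hypothetical and used only here.

```python
import math

def g_upper(s):
    """Upper bound (6.34) for G(s): s (s - 3)^2 / (12 (s - 2))."""
    return s * (s - 3) ** 2 / (12 * (s - 2))

def twin_constant(s):
    """Lower bound for the constant in front of B x (log x)^(-2), valid for 7/2 <= s <= 4."""
    return 3.5 * (math.log(4 * s / 7 - 1) - g_upper(s))

s_star = 11 / 3
constant_at_s_star = twin_constant(s_star)   # roughly 0.0332
```

At $s = \frac{11}{3}$ one has $\tfrac{4s}{7} = \tfrac{44}{21}$, so the elementary formulas for $f$ and $F$ apply, and the value obtained exceeds $\frac{1}{31} \approx 0.03226$ by a narrow margin.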

Our proof of Theorem 6.6 would work for the Goldbach equation
$$N = p + n \quad \text{with } n \in P_2$$
if we had Proposition 5.8 for $x = N$ and $a = N$ but it is not yet established (the implied constant in (5.52) depends on Hecke eigenvalues $\lambda_j(N)$ for Maass cusp forms with respect to certain congruence groups and one needs the Ramanujan-Petersson conjecture to achieve the required uniformity in $N$). However the original approach by Chen works unconditionally showing

Theorem 6.7. For every large even number $N$ we have
$$|\{ p : p + p_1 = N \ \text{or} \ p + p_1 p_2 = N \ \text{with } p_1, p_2 \ge N^{1/8} \}| > \tfrac{1}{95}\, B_N N (\log N)^{-2}$$
where
$$B_N = B \prod_{2 < p | N} \Big( 1 + \frac{1}{p - 2} \Big).$$

Here is a sketch of the proof. It comes from an estimation of the sifted sum $S(AW, z)$ for $A$ the characteristic sequence of numbers $n = N - p$ with $p < N$, $p \nmid N$ and $W$ given by (6.31) with $z = N^{1/8}$, $y = N^{1/3}$, $x = N$. The number of representations of $N$ in question is bounded below by
$$S(AW, z) \ge S(A, z) - \frac{1}{2} \sum_{z \le p < y} S(A_p, z) - \frac{1}{2} S(B, y) - \pi(y)$$
where $B$ (the outcome of switching) is the characteristic sequence of numbers $m = N - p_1 p_2 p_3$ with $p_1 p_2 p_3 < N$, $p_j \nmid N$, $p_1 \ge p_2 \ge y > p_3 \ge z$. By the large sieve theory the sequences $A$, $B$ and $A_p$ have level $D = N^{1/2 - \varepsilon}$ and $D_p = D/p$ respectively (the last one only on average over $p$). Therefore by the bounds of the linear sieve
$$4 S(AW, z) > e^{\gamma} V(z) \Big\{ |A|\, 2 \log 3 - |A| \Sigma - |B| - \frac{\varepsilon N}{\log N} \Big\}.$$
Here we have
$$V(z) = \prod_{\substack{p < z \\ p \nmid N}} \Big( 1 - \frac{1}{p - 1} \Big) \sim B_N (e^{\gamma} \log N)^{-1},$$
$$\Sigma = \sum_{z \le p < y} \frac{\log D}{p \log D_p} \sim \int_{1/8}^{1/3} \frac{dt}{t (1 - 2t)} = \log 6.$$
Moreover $|A| = \pi(N) - \nu(N) \sim N (\log N)^{-1}$ and $|B| \sim b N (\log N)^{-1}$ where $b$ is the constant given by the integral
$$b = \iint_{\substack{1/8 < \alpha < 1/3 < \beta \\ \alpha + 2\beta < 1}} (\alpha \beta)^{-1} (1 - \alpha - \beta)^{-1}\, d\alpha\, d\beta = \int_{1/8}^{1/3} \frac{\log(2 - 3\alpha)}{\alpha (1 - \alpha)}\, d\alpha = 0.363083728\ldots.$$
The last integral was computed by Rutgers graduate student S. Davis. Adding and subtracting these values we arrive at $4 S(AW, z) > (a - \varepsilon) B_N N (\log N)^{-2}$ with $a = 2 \log 3 - \log 6 - b = 0.04238138\ldots$, completing the proof.
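The one-dimensional integral for $b$ and the resulting constant $a$ are easy to recompute. The following is a sketch only, using composite Simpson quadrature written out by hand; the function names and the number of subintervals are assumptions of this illustration, not part of the notes.

```python
import math

def integrand(alpha):
    """log(2 - 3 alpha) / (alpha (1 - alpha)), the integrand of the integral for b."""
    return math.log(2 - 3 * alpha) / (alpha * (1 - alpha))

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

b_value = simpson(integrand, 1 / 8, 1 / 3)
a_value = 2 * math.log(3) - math.log(6) - b_value
```

Since $2 \log 3 - \log 6 = \log \tfrac{3}{2}$, the constant $a$ is the small difference of two numbers of size about $0.4$, and $a/4$ is only slightly larger than $\frac{1}{95}$.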


CHAPTER VII

SUBLINEAR SIEVES

The linear sieve is most often used in applications but there are also interesting cases for certain dimensions $\kappa < 1$. In this chapter we give a brief survey of developments in this area.

7.1. The Half-Linear Sieve

Our treatment of the β-sieve in Chapter 3 required the condition $\beta > 1$ (for the sake of simplicity) and this turned out to hold true for any $\kappa > \frac{1}{2}$. However the case $\kappa = \frac{1}{2}$, $\beta = 1$ can be derived from the former by continuity. Following the arguments after Theorem 3.7 we obtain

Theorem 7.1. Suppose (1.37) holds with $\kappa = \frac{1}{2}$ and $L > 1$. Then
$$S(A, z) < (F(s) + \varepsilon) X V(z) + R(A, D) \tag{7.1}$$
$$S(A, z) > (f(s) - \varepsilon) X V(z) - R(A, D) \tag{7.2}$$
for any $\varepsilon > 0$ and $D > e^L$ provided $D$ is sufficiently large in terms of $\varepsilon$. Here we have $s = \log D / \log z$ and $F, f$ are the continuous solutions to
$$F(s) = 2 (e^{\gamma} / \pi s)^{1/2} \quad \text{if } 0 < s \le 2, \qquad f(s) = 0 \quad \text{at } s = 1, \tag{7.3}$$
$$2 \sqrt{s}\, (\sqrt{s} F(s))' = f(s - 1) \quad \text{if } s > 2, \qquad 2 \sqrt{s}\, (\sqrt{s} f(s))' = F(s - 1) \quad \text{if } s > 1. \tag{7.4}$$
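On the first interval the system (7.3)-(7.4) can be solved explicitly. Integrating the second equation of (7.4) with $F$ taken from (7.3) gives, for $1 \le s \le 3$, $f(s) = (e^{\gamma}/\pi s)^{1/2} \int_1^s (t(t-1))^{-1/2}\, dt = 2 (e^{\gamma}/\pi s)^{1/2} \log(\sqrt{s} + \sqrt{s - 1})$. The sketch below, with hypothetical function names, checks this closed form against the differential equation by a central difference.

```python
import math

EULER_GAMMA = 0.5772156649015329

def upper_F(s):
    """F(s) = 2 (e^gamma / (pi s))^(1/2), valid for 0 < s <= 2."""
    return 2 * math.sqrt(math.exp(EULER_GAMMA) / (math.pi * s))

def lower_f(s):
    """Closed form of f(s) on 1 <= s <= 3 obtained by integrating (7.4)."""
    scale = math.sqrt(math.exp(EULER_GAMMA) / (math.pi * s))
    return 2 * scale * math.log(math.sqrt(s) + math.sqrt(s - 1))

def ode_left_side(s, h=1e-4):
    """2 sqrt(s) * d/ds (sqrt(s) f(s)), with the derivative taken by central difference."""
    derivative = (math.sqrt(s + h) * lower_f(s + h) - math.sqrt(s - h) * lower_f(s - h)) / (2 * h)
    return 2 * math.sqrt(s) * derivative
```

Comparing `ode_left_side(s)` with `upper_F(s - 1)` at a few points of $(1, 3]$ confirms that the closed form satisfies (7.4), and `lower_f(1.0)` returns $0$ as required by (7.3).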

The upper and the lower bounds in Theorem 7.1 are essentially best possible. To see this consider two sequences $A^{\pm} = (a_n^{\pm})$ which are the characteristic functions of numbers $n \le x$, $n \equiv \pm 1 \pmod 4$ respectively. Thus
$$|A_d^{\pm}| = \frac{x}{4d} + O(1) \quad \text{if } 2 \nmid d.$$
Let $P$ be the set of primes $p \equiv -1 \pmod 4$. Note that any $n \in A^+$ has an even number of prime divisors in $P$ and any $n \in A^-$ has an odd number of prime divisors in $P$ (counted with multiplicity). Applying analytic methods (the zero-free region of $(\zeta(s) L(s, \chi_4))^{1/2}$) one can show that
$$S(A^+, z) \sim X V(z) F(s) \quad \text{if } s > 0 \tag{7.5}$$
$$S(A^-, z) \sim X V(z) f(s) \quad \text{if } s > 1 \tag{7.6}$$
as $x \to \infty$ where $X = \frac{x}{4}$ and
$$V(z) = \prod_{\substack{p < z \\ p \equiv -1 (\mathrm{mod}\ 4)}} (1 - p^{-1}).$$
We also have
$$V(z) \sim \frac{1}{2} \Big( \frac{\pi}{e^{\gamma}} \Big)^{1/2} B (\log z)^{-1/2} \tag{7.7}$$
as $z \to \infty$, where
$$B = \sqrt{2} \prod_{p \equiv -1 (\mathrm{mod}\ 4)} (1 - p^{-2})^{1/2} \tag{7.8}$$
by $L(1, \chi_4) = \frac{\pi}{4}$ (Leibniz) and (1.99) (Mertens).
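The Mertens-type asymptotic for $V(z)$ can be compared with a direct product over primes. In the sketch below the normalization is derived independently of the displayed formulas: combining Mertens' product over all odd primes with $L(1, \chi_4) = \pi/4$ gives $V(z)^2 \sim \frac{\pi}{2 e^{\gamma}} \prod_{p \equiv -1 (4)} (1 - p^{-2}) / \log z$. The helper names and the cut-off values are assumptions of this illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329

def primes_up_to(n):
    """Sieve of Eratosthenes returning all primes <= n."""
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if flags[p]]

def sieve_product(z, primes):
    """V(z): product of (1 - 1/p) over primes p < z with p = 3 mod 4."""
    value = 1.0
    for p in primes:
        if p >= z:
            break
        if p % 4 == 3:
            value *= 1 - 1 / p
    return value

def asymptotic_product(z, primes):
    """sqrt(pi / (2 e^gamma)) * prod (1 - p^-2)^(1/2) * (log z)^(-1/2), product truncated."""
    euler = 1.0
    for p in primes:
        if p % 4 == 3:
            euler *= 1 - 1 / (p * p)
    return math.sqrt(math.pi / (2 * math.exp(EULER_GAMMA)) * euler / math.log(z))

PRIMES = primes_up_to(200000)
ratio = sieve_product(100000, PRIMES) / asymptotic_product(100000, PRIMES)
```

Already at $z = 100$ the two sides agree to within about $0.2\%$ ($0.4056$ against $0.4050$), so the ratio at $z = 10^5$ should be very close to $1$.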

There is a nice arithmetical interpretation of $S(A^+, z)$. Put
$$b(n) = \begin{cases} 1 & \text{if } n = a^2 + b^2,\ (a, b) = 1 \\ 0 & \text{otherwise.} \end{cases} \tag{7.9}$$
Note that a positive integer is represented as the sum of two co-prime squares if and only if it is not divisible by 4 nor by any prime $\equiv -1 \pmod 4$. Therefore
$$S(A^+, x) = \sum_{n \le x,\ n \text{ odd}} b(n). \tag{7.10}$$
In fact $S(A^+, z) = S(A^+, x)$ for all $\sqrt{x} < z \le x$. In this range (7.5) follows from a formula by E. Landau in his thesis of 1906. The asymptotic formula for $S(A^+, z)$ can be proved by elementary sieve arguments and in greater generality. Of course, to this end the one-sided inequality (1.37) is insufficient. Besides this we assume throughout this section that
$$\prod_{w \le p < z} (1 - g(p)) \le \Big( \frac{\log w}{\log z} \Big)^{1/2} \Big( 1 + \frac{M}{\log w} \Big) \tag{7.11}$$
for any $w < z$ where $M$ is a positive constant. Remember our former settings: $g(p) = 0$ if $p$ is not in the sifting range, and for all $p$ we assume that $0 \le g(p) < 1$.
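The characterization of sums of two co-prime squares used in (7.9)-(7.10) can be confirmed by brute force on a small range. The sketch below is an illustration only: representations are taken with integers $a, b \ge 0$ (so that $1 = 1^2 + 0^2$ counts), and the function names are hypothetical.

```python
import math

def is_sum_of_coprime_squares(n):
    """True if n = a^2 + b^2 with integers a, b >= 0 and gcd(a, b) = 1."""
    a = 0
    while a * a <= n:
        rest = n - a * a
        b = math.isqrt(rest)
        if b * b == rest and math.gcd(a, b) == 1:
            return True
        a += 1
    return False

def passes_divisibility_test(n):
    """True if n is divisible neither by 4 nor by any prime = 3 mod 4."""
    if n % 4 == 0:
        return False
    p = 2
    while p * p <= n:
        if n % p == 0:
            if p % 4 == 3:
                return False
            while n % p == 0:
                n //= p
        p += 1
    return n % 4 != 3          # the remaining cofactor is 1 or a prime

def sifted_count(x):
    """Number of n <= x with n = 1 mod 4 and no prime factor = 3 mod 4."""
    return sum(1 for n in range(1, x + 1, 4) if passes_divisibility_test(n))

def odd_representable_count(x):
    """Sum of b(n) over odd n <= x."""
    return sum(1 for n in range(1, x + 1, 2) if is_sum_of_coprime_squares(n))
```

The two counting functions correspond to the two sides of (7.10) and should return the same value for every $x$.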

7.2. SIEVES OF FRACTIONAL DIMENSION

Theorem 7.2. Let $A = (a_n)$ be a sequence of non-negative numbers for $1 \le n \le x$ and $P$ be a set of primes such that (1.37) and (7.11) hold with some constants $L, M > 1$. Suppose $A$ is supported on integers having an even number of prime divisors in $P$ counted with multiplicity. Then for any $\nu > (\log \log x)^{-1}$ we have
$$S(A, z) = X V(z) \big\{ F(s) + O\big( \nu^{1/3} + M (\log x)^{-1/3} \big) \big\} + 2 \theta R(A, x^{1 - \nu}) \tag{7.12}$$
where $s = \log x / \log z \ge 2$ and the implied constant depends only on $L$.

Note that the main term in (7.12) coincides with that of the upper bound (7.1) (we assume $\nu \to 0$ because otherwise (7.12) is not interesting). It is the power of our additional hypothesis about the parity of the number of prime divisors from the sifting range $P$ which transforms the sieve upper bound for $S(A, z)$ into the true asymptotic. A complete proof of (7.12) is quite long (see [I3]) so we discuss here only the key idea. In the general case when we seek an upper bound for $S(A, z)$ we use the combinatorial identity
$$S(A, z) = S^+(A, z) - \sum_{\ell \text{ odd}} S_{\ell}(A, z). \tag{7.13}$$
We discard every
$$S_{\ell}(A, z) = \mathop{\sum \dots \sum}_{y_{\ell} \le p_{\ell} < \dots < p_1 < z} S(A_{p_1 \dots p_{\ell}}, p_{\ell})$$
and estimate
$$S^+(A, z) = \sum_{d | P(z)} \lambda_d^+ |A_d| = X V^+(D, z) + \theta R(A, D) \tag{7.14}$$
from above by using (1.37). As a matter of fact if we used both (1.37) and (7.11) then our upper bound for $V^+(D, z)$ could be shown (by refining the arguments of Chapter 3) to represent the true asymptotic
$$V^+(D, z) = V(z) \big\{ F(s) + O\big( M (\log x)^{-1/3} \big) \big\}. \tag{7.15}$$
Therefore the only loss occurs from discarding $S_{\ell}(A, z)$. However, under the circumstances of Theorem 7.2 one can show that the $S_{\ell}(A, z)$ give a negligible contribution altogether. Thus what was the sieve upper bound for $S(A, z)$ turns out to be close to the real thing.

Why are the $S_{\ell}(A, z)$ small? To answer this question we examine the summation conditions
$$p_1 \dots p_{\ell}\, p_{\ell} \ge D, \qquad p_1 > \dots > p_{\ell} \tag{7.16}$$
$$p_1 \dots p_m\, p_m < D, \qquad m < \ell,\ m \equiv \ell \pmod 2 \tag{7.17}$$


where $D = x^{1 - 4\nu}$ is the level of support of $\Lambda^+ = (\lambda_d^+)$. Suppose $a_n$ appears in $S(A_{p_1 \dots p_{\ell}}, p_{\ell})$, then $n = p_1 \dots p_{\ell}\, q \le x$ where $q$ has a prime divisor $p \in P$, $p \ge p_{\ell}$ because $\ell$ is odd. Hence it follows that
$$p_1 \dots p_{\ell}\, p_{\ell} \le x. \tag{7.18}$$
Next we estimate each $S(A_{p_1 \dots p_{\ell}}, p_{\ell})$ by means of a sieve of level $\sqrt{p_{\ell}}$ getting
$$S(A_{p_1 \dots p_{\ell}}, p_{\ell}) \le c\, g(p_1 \dots p_{\ell}) X V(p_{\ell}) + R(A_{p_1 \dots p_{\ell}}, \sqrt{p_{\ell}}).$$
Note that $p_1 \dots p_{\ell} \sqrt{p_{\ell}} < (p_1 \dots p_{\ell - 1})^{1/4} x^{3/4} < D^{1/4} x^{3/4} = x^{1 - \nu}$ by (7.18) and (7.17), except for $\ell = 1$ which case is verified separately: $p_1^{3/2} < z^{3/2} \le x^{3/4}$. From these estimates we obtain
$$\sum_{\ell \text{ odd}} S_{\ell}(A, z) \le c X U(D, z) + R(A, x^{1 - \nu}) \tag{7.19}$$
where
$$U(D, z) = \sideset{}{^*}\sum_{d | P(z)} g(d) V(p(d))$$
and $*$ indicates the summation conditions (7.16)-(7.18) for $d = p_1 \dots p_{\ell}$. Since $D$ is near $x$ there is not much room between these conditions so the range of summation in $U(D, z)$ is a small set. This observation has the following effect
$$U(D, z) \ll \nu^{1/3} V(z). \tag{7.20}$$

Inserting the above estimates into (7.13) we obtain (7.12).

There are numerous applications of Theorem 7.2 for asymptotics of special numbers representable by binary quadratic forms. We shall derive a few formulas in the simplest case of two squares. First in general by taking the sifting range $P = \{ p \equiv -1 \pmod 4 \}$ we obtain (see the characterization after (7.9))

Theorem 7.3. Let $A = (a_n)$ be a sequence of non-negative real numbers for $n \equiv 1, 2, 5 \pmod 8$ truncated to $n \le x$. Suppose
$$R(A, D) \ll X (\log x)^{-1} \tag{7.21}$$
for $D = x^{1 - \varepsilon}$ with any $\varepsilon > 0$, the implied constant depending on $\varepsilon$. Then
$$\sum_{n \le x} a_n b(n) \sim B\, C(g)\, X (\log x)^{-1/2} \tag{7.22}$$


as $x \to \infty$ where $B$ is the absolute constant given by (7.8) and
$$C(g) = \prod_{p \equiv -1 (\mathrm{mod}\ 4)} (1 - g(p)) \Big( 1 - \frac{1}{p} \Big)^{-1}$$
(the infinite product for $C(g)$ converges by virtue of (1.37) and (7.11)).

Let $A = (a_n)$ be the characteristic sequence of an arithmetic progression $n \equiv a \pmod q$ with $8 | q$, $(a, q) = 1$ and $a \equiv 1, 2, 5 \pmod 8$. Then $g(p) = \chi_0(p) p^{-1}$ where $\chi_0$ is the principal character to modulus $q$ so
$$C(g) = C(q) = \prod_{\substack{p | q \\ p \equiv -1 (\mathrm{mod}\ 4)}} \Big( 1 - \frac{1}{p} \Big)^{-1}. \tag{7.23}$$
In this case (7.22) yields
$$\sum_{\substack{n \le x \\ n \equiv a (\mathrm{mod}\ q)}} b(n) \sim B\, C(q)\, q^{-1} x (\log x)^{-1/2}. \tag{7.24}$$
By a more precise employment of Theorem 7.2 one can show that the asymptotic formula (7.24) holds true uniformly in $q \le x^{\varepsilon(x)}$ with $\varepsilon(x) \to 0$ arbitrarily slowly. This great uniformity manifests an advantage of the elementary sieve over analytic methods; the latter work only for $\log q \ll (\log x)^{2/3}$.

Next we let $A = (a_n)$ be the characteristic sequence of shifted primes $n = p - 1$ with $p \equiv 3 \pmod 8$. In this case the density function is $g(d) = \varphi(d)^{-1}$. Unfortunately (7.21) is not known but is expected to hold (the Elliott-Halberstam conjecture); if true we would get
$$\sum_{p \le x} b(p - 1) \sim \tfrac{1}{4} B C x (\log x)^{-3/2} \tag{7.25}$$
where
$$C = \prod_{p \equiv -1 (\mathrm{mod}\ 4)} \big( 1 - (p - 1)^{-2} \big).$$
For some sequences the condition (7.21) is known for $D = x^{\alpha - \varepsilon}$ with $\frac{1}{2} < \alpha < 1$. Though this is not enough to claim the asymptotic formula (7.22) one can still derive a lower and an upper bound of the correct order of magnitude. Indeed Theorem 7.1 implies
$$X (\log x)^{-1/2} \ll \sum_{n \le x} a_n b(n) \ll X (\log x)^{-1/2}. \tag{7.26}$$

This argument is not exactly applicable for the shifted primes $n = p - 1$ since we know (7.21) only for $\alpha = \tfrac12$ (the Bombieri-Vinogradov theorem). By applying (7.2) directly one misses the lower bound (7.26) because the sieve limit is $\beta = 1$. Nevertheless, in this case we shall break the barrier without increasing $\alpha$ by switching sieves (in the same style as with the proof of Theorem 6.6).

Theorem 7.4. We have

(7.27)  $\sum_{p \le x} b(p - 1) \asymp x (\log x)^{-3/2}.$

Hence there are infinitely many primes of type $p = a^2 + b^2 + 1$.

Proof. We show the lower bound, the upper bound is obvious. Our sum is S(A, z) with $z = \sqrt{x}$ but being unable to estimate this directly we start at some lower level, say $z = x^{1/(2s)}$ with $1 < s < 2$. By (7.2) we get

(7.28)  $S(A, z) \gg (s - 1)^{1/2} x (\log x)^{-3/2}$

because

$f(s) = 2\Big(\frac{e^{\gamma}}{\pi s}\Big)^{1/2} \int_1^s (t(t - 1))^{-1/2}\,dt \asymp (s - 1)^{1/2}.$
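The order of magnitude of the integral above can be checked numerically. The following is only a sketch, not part of the original argument: it assumes the substitution $t = 1 + u^2$ (our choice), which gives the closed form $\int_1^s (t(t-1))^{-1/2}dt = 2\,\mathrm{asinh}(\sqrt{s-1})$, and the function names are hypothetical.

```python
import math

def integral_closed(s):
    """Closed form of I(s) = int_1^s (t(t-1))^(-1/2) dt, obtained via t = 1 + u^2."""
    return 2.0 * math.asinh(math.sqrt(s - 1.0))

def integral_midpoint(s, steps=20000):
    """Midpoint rule for I(s) after the substitution t = 1 + u^2, which removes
    the singularity at t = 1: I(s) = int_0^sqrt(s-1) 2 / sqrt(1 + u^2) du."""
    upper = math.sqrt(s - 1.0)
    h = upper / steps
    return h * sum(2.0 / math.sqrt(1.0 + ((j + 0.5) * h) ** 2) for j in range(steps))

def ratio(s):
    """I(s) / sqrt(s - 1); bounded above and below by positive constants on (1, 2]."""
    return integral_closed(s) / math.sqrt(s - 1.0)

if __name__ == "__main__":
    for s in (1.001, 1.1, 1.5, 2.0):
        print(s, integral_closed(s), integral_midpoint(s), ratio(s))
```

The ratio tends to 2 as s tends to 1 and stays above 1.7 on the whole range, which is all that the relation $\asymp (s-1)^{1/2}$ asserts; the constant in front of the integral plays no role in this check.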

As s tends to 1 the lower bound (7.28) vanishes slowly. On the other hand we show that the difference

$\Delta(z, \sqrt{x}) = S(A, z) - S(A, \sqrt{x})$

disappears faster. To this end we interpret $\Delta(z, \sqrt{x})$ as the number of solutions to the equation

(7.29)  $p - 1 = 2 a p_1 p_2$  with $p \le x$ and $p_1 > p_2 > z$

where $p, p_1, p_2$ run over primes $\equiv -1 \pmod 4$ and a runs over integers composed of primes $\equiv 1 \pmod 4$. Note that $a < x z^{-2}$ and $z \le p_1 < \sqrt{x}$. For given $a p_1$ we estimate the number of solutions to (7.29) in p and $p_2$ by applying two-dimensional sieve to the polynomial sequence $n = F(m) = m(2 a p_1 m + 1)$ with $m \le x/2ap_1$, see (2.54). Then we sum over a and $p_1$ getting

$\Delta(z, \sqrt{x}) \ll \frac{x}{(\log x)^2} \Big(\sum_{a < x z^{-2}} b(a)\varphi(a)^{-1}\Big) \Big(\sum_{z \le p_1 < \sqrt{x}} p_1^{-1}\Big) \ll \frac{x}{(\log x)^2} \Big(\log \frac{x}{z^2}\Big)^{1/2} \Big(\log \frac{\log \sqrt{x}}{\log z}\Big) \ll (s - 1)^{3/2} x (\log x)^{-3/2}.$

Subtracting this bound from (7.28) we get

$S(A, \sqrt{x}) > \big[c_1 (s - 1)^{1/2} - c_2 (s - 1)^{3/2}\big]\, x (\log x)^{-3/2}$

for some absolute constants $c_1, c_2 > 0$, any s with $1 < s < 2$ and all x sufficiently large in terms of s. This lower bound is positive for all s near 1 proving Theorem 7.4.

Similarly one can solve other problems including squares. We obtain

$\sum_{p < N} b(N - p) \asymp C(N)\, N (\log N)^{-3/2}$

where C(N) is given by (7.23). Hence it follows that every large number can be represented as the sum of a prime and two squares, $N = p + a^2 + b^2$. G. Greaves [G1] proved that every large $N \equiv 2, 3, 4, 6, 7 \pmod 8$ can be represented as the sum of two squares of primes and two other squares, $N = p^2 + q^2 + a^2 + b^2$. K. Indlekofer [I] established the upper and lower bounds

$\sum_{n \le x} b(n) b(n + 1) \asymp x (\log x)^{-1}.$

If the representations $n = a^2 + b^2$ are counted with multiplicity then other techniques are available and some of the above problems are easier. Let r(n) denote the number of such representations. Then we have

$\sum_{n \le x} r(n) = \pi x + O(x^{1/3}),$

$\sum_{n \le x} r(n) r(n + 1) = 8x + O(x^{2/3}).$

These results follow from the lattice circle problems in the euclidean and the hyperbolic planes acted on by $\mathbb{Z}^2$ and $SL_2(\mathbb{Z})$ respectively. The very good error terms are achieved by application of adequate harmonic analysis. No satisfactory harmonic analysis is yet available for sums over primes. A long time ago G. H. Hardy and J. E. Littlewood [HL] conjectured the asymptotic formula

$\sum_{p < N} r(N - p) \sim \pi \prod_{p \nmid N} \Big(1 + \frac{\chi_4(p)}{p(p - 1)}\Big) \prod_{p \mid N} \Big(1 - \frac{\chi_4(p)}{p}\Big) \frac{N}{\log N}$

which is now known to hold due to the works of C. Hooley [H] and Yu. V. Linnik [L2]. Hooley's work contains a delicate application of a crude sieve. We shall discuss more advanced asymptotics for sums over primes in Chapter 9.

7.2. Sieves of Fractional Dimension

In this section we dwell on the β-sieve of dimension $\tfrac12 < \kappa < 1$. The sieve limit satisfies $1 < \beta(\kappa) < 1 + \kappa$, it is the root of the equation (3.42) where q(s) is given by (3.35). It is conceivable that our upper and lower bounds (3.74) are best possible up to the error terms but nobody has yet constructed extreme cases.

Sieve problems with rational $\kappa \in (\tfrac12, 1)$ occur in questions about representability of integers by norms of ideals from number fields. Suppose $K/\mathbb{Q}$ is an abelian extension of degree $k > 1$ and conductor Δ. Then from class field theory (Artin's reciprocity law) the prime norms are fully characterized by the residue classes modulo Δ. Precisely, a prime number $p \nmid \Delta$ is a norm if and only if $p \pmod \Delta \in H$ where H is a certain subgroup of $(\mathbb{Z}/\Delta\mathbb{Z})^*$ of index k. Hence the primes which are not norms have density $\kappa = 1 - \tfrac1k$, these comprise the sifting range and κ is the dimension of the relevant sieve.

To give an example of what the sieve methods can do for additive problems in number fields I pick the latest application from [IP].

Theorem 7.5. Let $K/\mathbb{Q}$ be abelian of degree four and conductor Δ. Define $b(n) = 1$ if $n = N\mathfrak{a}$ for some integral ideal $\mathfrak{a}$ in K co-prime with Δ and put $b(n) = 0$ otherwise. Let h be a positive even integer such that $h \equiv h_1 - h_2 \pmod \Delta$ for some $h_1, h_2 \in H$. Then

$\sum_{n \le x} b(n) b(n + h) \gg x (\log x)^{-3/2},$

for all sufficiently large x, the implied constant depending on h and the field K.

I cannot imagine a sieve problem for a natural arithmetic sequence $A = (a_n)$ which would have dimension $\kappa \in (\tfrac12, 1)$ other than rational. But when κ is rational in this segment then the sieve limit β(κ) is probably a transcendental number, therefore it is unlikely that one can observe breakdown of the lower bound sieve exactly at this point. Thus it is difficult to show that the β-sieve in dimension $\tfrac12 < \kappa < 1$ yields the best possible estimates. This is merely an academic question. Let us enjoy the existing results as they are very sharp indeed.

7.3. Small Sieve of Eratosthenes-Legendre

Here we consider sieves of dimension $0 < \kappa < \tfrac12$. The arguments used in Chapter 3 are not valid for technical reasons, however the main results remain true. A rigorous proof requires substantial modifications in the treatment of numerous error terms and the involved convergence issues (for details we refer to [I4]). It turns out that Theorem 3.7 stays without change. For any $\kappa < \tfrac12$ we have

$\beta = \beta(\kappa) = 1$

and the functions F, f are continuous solutions to the system of differential-difference equations (3.26) with the initial conditions

(7.30)  $s^{\kappa} F(s) = A$ if $0 < s \le 2$,  $s^{\kappa} f(s) = B$ at $s = 1$

where the constants A, B are solutions to the linear system (3.40)-(3.41). It is important that B is positive, namely we have

(7.31)  $B \int_0^{\infty} e^{-z} z^{-\kappa} \cosh(\kappa E(z))\,dz = e^{\kappa\gamma}$

where E(z) is given by (3.91). The other constant is determined by

(7.32)  $B \int_0^{\infty} e^{-z} z^{-\kappa} \exp(\kappa E(z))\,dz = A\,\Gamma(1 - \kappa).$

Hence $B \to 1$ and $A \to 1$ as $\kappa \to 0$. For any sieve

(7.33)  $f(s) < 1 < F(s).$

Hence in particular $B < 1 < 2^{-\kappa} A$ for any $\kappa < \tfrac12$. Since B is positive there is really no sieve limit. We have in the whole range $z \le D$

(7.34)  $S(A, z) \asymp X V(z).$

The question is whether the lower and the upper bounds for S(A, z) in Theorem 3.7 are best possible (up to the error terms)? We shall see that these bounds are never sharp for sequences in a local position. Suppose $A = (a_n)$ is supported on the initial segment $1 \le n \le x$, and that the error terms $r_d(A)$ satisfy (1.40). Then by the Legendre formula (1.14) we get

$S(A, z) = \sum_{d \le x,\ d \mid P(z)} \mu(d) |A_d| = X M_g(x, z) + \theta R_g(x, z)$

where

$M_g(x, z) = \sum_{d \le x,\ d \mid P(z)} \mu(d) g(d), \qquad R_g(x, z) = \sum_{d \le x,\ d \mid P(z)} g(d) d.$

For the main sum $M_g(x, z)$ we apply Theorem 4.7 with the multiplicative function μg. This requires two hypotheses

(7.35)  $\sum_{p \le x} g(p) \log p = \kappa \log x + O(1),$

(7.36)  $\sum_{p} g(p)^2 \log p < \infty.$

Now (4.87) becomes

$M_g(x, z) = V(z)\,\{\mathfrak{f}(s) + O((\log x)^{2\kappa - 1})\}$

where $s = \log x / \log z$ and $\mathfrak{f}(s)$ is the continuous solution to the differential-difference problem

(7.37)  $s^{\kappa}\,\mathfrak{f}(s) = e^{\gamma\kappa}\,\Gamma(1 - \kappa)^{-1}$ if $0 < s \le 1$,
        $s\,\mathfrak{f}'(s) = \kappa\,\mathfrak{f}(s - 1) - \kappa\,\mathfrak{f}(s)$ if $s > 1$.

Note that by (7.35)

$V(z) = \prod_{p < z} (1 - g(p)) \asymp (\log z)^{-\kappa}.$

For the remainder sum $R_g(x, z)$ we apply (4.57) (note that (4.55) holds by virtue of (7.35)) getting

$R_g(x, z) \ll \frac{x}{\log x} \prod_{p \le x} (1 + g(p)) \ll x (\log x)^{\kappa - 1}.$

Collecting the above estimates we conclude

Theorem 7.6. If $A = (a_n)$ is supported on $1 \le n \le x$, the individual error terms $|r_d(A)|$ are bounded by $g(d)d$, and the density function satisfies (7.36) and (7.35) with $0 < \kappa < \tfrac12$ then

(7.38)  $S(A, z) = X V(z)\,\{\mathfrak{f}(s) + O((\log x)^{2\kappa - 1})\}$

for $z = x^{1/s}$ with $s > 1$ where $\mathfrak{f}(s)$ is the continuous solution to (7.37). The implied constant depends on g and s.

In view of the general lower and upper sieve bounds (3.74) it follows by comparison with the asymptotic formula (7.38) that

(7.39)  $f(s) \le \mathfrak{f}(s) \le F(s).$

We also know (7.33). Hence the question: on which of the two segments in (7.33), if any, does the true value $\mathfrak{f}(s)$ lie? Corollary 4.6 provides an answer, it says $\mathfrak{f}(s) - 1$ changes sign in every unit interval! Thus quite frequently S(A, z) is asymptotically equivalent to the heuristically expected value XV(z). Moreover the asymptotic

$S(A, x^{1/s}) \sim X V(x^{1/s})$

fails for all but a countable set of s.
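The first sign change of $\mathfrak{f}(s) - 1$ can be observed directly on $1 \le s \le 2$. The sketch below is an illustration under stated assumptions, not the method of the text: it integrates the equation $(s^{\kappa}\mathfrak{f}(s))' = \kappa s^{\kappa-1}\mathfrak{f}(s-1)$ (equivalent to (7.37) for $s > 1$) once, uses the substitution $v = (t-1)/t$ (our choice) and expands the resulting integral $\int_0^{1-1/s} v^{-\kappa}(1-v)^{-1}dv$ as a geometric series. The function name `frak_f` is hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329

def frak_f(s, kappa, terms=200):
    """Continuous solution of (7.37), evaluated only on 0 < s <= 2.

    For s <= 1 this is the initial condition.  For 1 < s <= 2 we use
    s^kappa f(s) = c * (1 + kappa * int_0^V v^(-kappa) / (1 - v) dv),
    V = 1 - 1/s <= 1/2, with the integral expanded as a power series in V.
    """
    c = math.exp(EULER_GAMMA * kappa) / math.gamma(1.0 - kappa)
    if s <= 1.0:
        return c * s ** (-kappa)
    if s > 2.0:
        raise ValueError("this sketch only covers 0 < s <= 2")
    v = 1.0 - 1.0 / s
    series = sum(v ** (n + 1 - kappa) / (n + 1 - kappa) for n in range(terms))
    return c * s ** (-kappa) * (1.0 + kappa * series)

if __name__ == "__main__":
    for s in (1.0, 1.25, 1.5, 1.75, 2.0):
        print(s, frak_f(s, 0.25))
```

For κ = 1/4 the value at s = 1 lies below 1 while the value at s = 2 lies slightly above 1, so $\mathfrak{f}(s) - 1$ vanishes somewhere in between, in agreement with the statement quoted from Corollary 4.6.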


CHAPTER VIII

THE LARGE SIEVE

8.1. The Basic Inequalities

The large sieve was invented in a short paper of 1941 by Yu. V. Linnik [L1]. While it is based on quite different principles from those of conventional sieves the method applies to the common sifting problem, i.e., it is a tool for examining sets of type

$(\mathcal{M}, \mathcal{P}, \Omega) = \{m \in \mathcal{M} : m \pmod p \notin \Omega_p \text{ for any } p \in \mathcal{P}\}.$

The large sieve methods are a lot more powerful than the combinatorial sieve when the set $\mathcal{M}$ is contained in a short interval and the number of residue classes to be excluded $\omega(p) = |\Omega_p|$ is very large in comparison to the modulus. Briefly speaking the power of large sieve methods is supplied by harmonic analysis on the circle $\mathbb{R}/\mathbb{Z}$ and this offers more precise counting of relevant integers by employment of characters. After intensive studies (notably by Rényi, Roth, Bombieri, Davenport, Halberstam, Montgomery, Vaughan, Selberg, Gallagher) the large sieve developed into abstract inequalities whose origin is hardly recognizable today.

One of many variants of the large sieve inequalities gives a bound for the trigonometric polynomial

(8.1)  $S(\alpha) = \sum_{n} a_n e(\alpha n)$

where $A = (a_n)$ are arbitrary complex numbers in a segment

(8.2)  $M < n \le M + N.$

By Cauchy's inequality we have

(8.3)  $|S(\alpha)|^2 \le N \sum_{n} |a_n|^2$

and this, of course, is best possible in general. However, by varying α one should be able to establish non-trivial estimates, even for general $A = (a_n)$ because the additive characters

$\psi_r(n) = e(\alpha_r n)$ are independent and almost orthogonal provided the points $\alpha_r$ are well spaced mod 1. Precisely, suppose that

(8.4)  $\|\alpha_r - \alpha_s\| \ge \delta$ if $r \ne s$

where $\|\alpha\|$ denotes the distance of α to the nearest integer. Note that the number of δ-spaced points cannot exceed $1 + \delta^{-1}$. The large sieve inequality asserts that

(8.5)  $\sum_{r} |S(\alpha_r)|^2 \le D(\delta, N) \sum_{n} |a_n|^2$

where D(δ, N) depends only on δ and N. We shall reduce various sifting problems to the inequality (8.5) in forthcoming sections. Before any example we give a comprehensive treatment of this basic inequality for general points $\alpha_r \in \mathbb{R}/\mathbb{Z}$.

It is clear by (8.3) that (8.5) cannot hold with D(δ, N) smaller than N. Moreover D(δ, N) must exceed the number of points $\alpha_r$ which can be as large as $\delta^{-1}$. It turns out that D(δ, N) need not be larger than both cardinalities added.

Theorem 8.1. For any set of δ-spaced points $\alpha_r \in \mathbb{R}/\mathbb{Z}$ and any complex numbers $a_n$ with $M < n \le M + N$, where $0 < \delta \le \tfrac12$ and N is a positive integer, we have

(8.6)  $\sum_{r} |S(\alpha_r)|^2 \le (\delta^{-1} + N - 1) \sum_{n} |a_n|^2.$

This neat inequality is best possible, it was proved independently by A. Selberg [S3] and H. Montgomery-R.C. Vaughan [MV2]. There are several ways of proving the large sieve inequality (8.5) which produce D(δ, N) slightly worse than

(8.7)  $D(\delta, N) = \delta^{-1} + N - 1$

but good enough for most essential applications. The first result of such kind with

$D(\delta, N) = \tfrac{11}{5} \max(\delta^{-1}, N)$

was established by H. Davenport and H. Halberstam [DH]. A very simple derivation of (8.6) with $D(\delta, N) = \delta^{-1} + \pi N$ has been given by P.X. Gallagher [G]. In this lecture we present the proof of (8.6) following the method of Montgomery and Vaughan.
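Before the proof, the inequality (8.6) can be tested numerically on a small example. This is a minimal sketch, assuming randomly jittered grid points and Gaussian coefficients as test data (both are our choices, not from the text); the helper names are hypothetical.

```python
import cmath
import math
import random

def e(x):
    """e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * math.pi * x)

def circle_dist(x):
    """Distance from x to the nearest integer, written ||x|| in (8.4)."""
    return abs(x - round(x))

def large_sieve_sides(coeffs, M, points):
    """Return (lhs, rhs) of (8.6), where coeffs[j] is a_n for n = M + 1 + j."""
    N = len(coeffs)
    # delta is the smallest circular distance between two distinct points
    delta = min(circle_dist(a - b) for i, a in enumerate(points) for b in points[:i])
    lhs = sum(abs(sum(c * e(alpha * (M + 1 + j)) for j, c in enumerate(coeffs))) ** 2
              for alpha in points)
    rhs = (1.0 / delta + N - 1) * sum(abs(c) ** 2 for c in coeffs)
    return lhs, rhs

if __name__ == "__main__":
    random.seed(1)
    R, N, M = 12, 30, 5
    points = [(r + 0.3 * random.random()) / R for r in range(R)]  # spacing >= 0.7 / R
    coeffs = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
    print(large_sieve_sides(coeffs, M, points))
```

For generic data the left side is usually far below the right side; the constant $\delta^{-1} + N - 1$ is only attained for special extremal configurations.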

Lemma 8.2 (generalized Hilbert's inequality). Suppose $\lambda_r$ are real numbers with $\lambda_{r+1} - \lambda_r \ge \delta$. Then for any complex numbers $z_r$ we have

(8.8)  $\Big| \mathop{\sum\sum}_{r \ne s} z_r \bar{z}_s (\lambda_r - \lambda_s)^{-1} \Big| \le \pi \delta^{-1} \sum_{r} |z_r|^2.$

Proof. By Cauchy's inequality it suffices to show that

(8.9)  $\sum_{r} \Big| \sum_{s \ne r} \bar{z}_s (\lambda_r - \lambda_s)^{-1} \Big|^2 \le \pi^2 \delta^{-2} \sum_{r} |z_r|^2.$

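Squaring out we arrange the left side as follows

$L = \sum_{s} \sum_{t} \bar{z}_s z_t \sum_{r \ne s, t} (\lambda_r - \lambda_s)^{-1} (\lambda_r - \lambda_t)^{-1}$

$\phantom{L} = \sum_{s} |z_s|^2 \sum_{r \ne s} (\lambda_r - \lambda_s)^{-2} + \mathop{\sum\sum}_{s \ne t} \bar{z}_s z_t (\lambda_s - \lambda_t)^{-1} \sum_{r \ne s, t} \big[(\lambda_r - \lambda_s)^{-1} - (\lambda_r - \lambda_t)^{-1}\big].$

For the last sum we have

$\sum_{r \ne s, t} = \sum_{r \ne s} (\lambda_r - \lambda_s)^{-1} - \sum_{r \ne t} (\lambda_r - \lambda_t)^{-1} + 2 (\lambda_s - \lambda_t)^{-1}.$

Hence

$L = \sum_{s} |z_s|^2 \sum_{r \ne s} (\lambda_r - \lambda_s)^{-2} + 2 \mathop{\sum\sum}_{s \ne t} \bar{z}_s z_t (\lambda_s - \lambda_t)^{-2} + \mathop{\sum\sum}_{s \ne t} \bar{z}_s z_t (\lambda_s - \lambda_t)^{-1} \Big[ \sum_{r \ne s} (\lambda_r - \lambda_s)^{-1} - \sum_{r \ne t} (\lambda_r - \lambda_t)^{-1} \Big].$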

Note that we are estimating the norm of the matrix $(\mu_{rs})$ with $\mu_{rs} = (\lambda_r - \lambda_s)^{-1}$ if $r \ne s$ and $\mu_{rr} = 0$, thus we may assume that the vector $v = (z_r)$ is extremal. Since the matrix is skew-Hermitian the extremal vector is an eigenvector, i.e.,

$\sum_{r \ne s} z_r (\lambda_r - \lambda_s)^{-1} = \nu z_s$

for some ν purely imaginary. This shows that the last two sums in L cancel out. Therefore for the extremal vector we have

$L = \sum_{s} |z_s|^2 \sum_{r \ne s} (\lambda_r - \lambda_s)^{-2} + 2 \mathop{\sum\sum}_{s \ne t} \bar{z}_s z_t (\lambda_s - \lambda_t)^{-2}.$

Applying $2|z_s z_t| \le |z_s|^2 + |z_t|^2$ we obtain

$L \le 3 \sum_{s} |z_s|^2 \sum_{r \ne s} (\lambda_r - \lambda_s)^{-2}.$

Since $|\lambda_r - \lambda_s| \ge \delta |r - s|$ the innermost sum is bounded by $2\delta^{-2}\zeta(2) = \pi^2/3\delta^2$. This gives (8.9) completing the proof of (8.8).

Corollary 8.3. For any set of δ-spaced points $\alpha_r \in \mathbb{R}/\mathbb{Z}$ and any complex numbers $z_r$ we have

(8.10)  $\Big| \mathop{\sum\sum}_{r \ne s} z_r \bar{z}_s (\sin \pi(\alpha_r - \alpha_s))^{-1} \Big| \le \delta^{-1} \sum_{r} |z_r|^2.$

Proof. We apply (8.8) for the doubly-indexed set of numbers $z_{mr} = (-1)^m z_r$ and $\lambda_{mr} = m + \alpha_r$ with $1 \le m \le K$ getting

$\Big| \mathop{\sum\sum}_{(r, m) \ne (s, n)} (-1)^{m - n} z_r \bar{z}_s (m - n + \alpha_r - \alpha_s)^{-1} \Big| \le \pi \delta^{-1} K \sum_{r} |z_r|^2.$

Here we can replace the summation condition $(r, m) \ne (s, n)$ by $r \ne s$ because for $r = s$ the remaining terms cancel out pairwise for (m, n) against (n, m). If we put $k = m - n$ and divide by K we derive

$\Big| \mathop{\sum\sum}_{r \ne s} z_r \bar{z}_s \sum_{-K < k < K} (-1)^k \Big(1 - \frac{|k|}{K}\Big) (k + \alpha_r - \alpha_s)^{-1} \Big| \le \pi \delta^{-1} \sum_{r} |z_r|^2.$

This yields (8.10) by letting $K \to \infty$ because for $\alpha \notin \mathbb{Z}$

$\sum_{k} (-1)^k (k + \alpha)^{-1} = \pi (\sin \pi\alpha)^{-1}.$
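Both bilinear inequalities (8.8) and (8.10) are easy to test on random data. The following sketch assumes jittered integer points for the $\lambda_r$ and a jittered grid on the circle for the $\alpha_r$ (test data of our choosing); the function names are hypothetical.

```python
import math
import random

def hilbert_form(z, lam):
    """Sum over r != s of z_r * conj(z_s) / (lam_r - lam_s), the form in (8.8)."""
    n = len(z)
    return sum(z[r] * z[s].conjugate() / (lam[r] - lam[s])
               for r in range(n) for s in range(n) if r != s)

def sine_form(z, alpha):
    """Sum over r != s of z_r * conj(z_s) / sin(pi (alpha_r - alpha_s)), as in (8.10)."""
    n = len(z)
    return sum(z[r] * z[s].conjugate() / math.sin(math.pi * (alpha[r] - alpha[s]))
               for r in range(n) for s in range(n) if r != s)

def circular_spacing(alpha):
    """Smallest distance mod 1 between two distinct points of alpha."""
    return min(abs(d - round(d))
               for i, a in enumerate(alpha) for b in alpha[:i] for d in [a - b])

if __name__ == "__main__":
    random.seed(7)
    R = 15
    z = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(R)]
    lam = [r + 0.4 * random.random() for r in range(R)]
    delta = min(lam[r + 1] - lam[r] for r in range(R - 1))
    print(abs(hilbert_form(z, lam)), math.pi / delta * sum(abs(w) ** 2 for w in z))
```

Note that the first form is purely imaginary up to rounding, as it must be for a skew-Hermitian matrix.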

Corollary 8.4. For any real x we have

(8.11)  $\Big| \mathop{\sum\sum}_{r \ne s} z_r \bar{z}_s \frac{\sin 2\pi x(\alpha_r - \alpha_s)}{\sin \pi(\alpha_r - \alpha_s)} \Big| \le \delta^{-1} \sum_{r} |z_r|^2.$

Proof. This follows by applying Corollary 8.3 twice with $z_r$ twisted by $e(x\alpha_r)$ and $e(-x\alpha_r)$.

Now we are ready to prove (8.6) with $D(\delta, N) = \delta^{-1} + N$. By the duality principle (a linear operator and its adjoint have the same norm in Banach spaces) it suffices to show that for any complex numbers $z_r$

(8.12)  $\sum_{M < n \le M + N} \Big| \sum_{r} z_r e(n\alpha_r) \Big|^2 \le D(\delta, N) \sum_{r} |z_r|^2.$

Squaring out we see that the diagonal terms $r = s$ yield $N \sum_r |z_r|^2$ while the off-diagonal terms yield

$\mathop{\sum\sum}_{r \ne s} z_r \bar{z}_s \sum_{n} e(n(\alpha_r - \alpha_s)) = \mathop{\sum\sum}_{r \ne s} z_r \bar{z}_s\, e\big((M + \tfrac{N + 1}{2})(\alpha_r - \alpha_s)\big) \frac{\sin \pi N(\alpha_r - \alpha_s)}{\sin \pi(\alpha_r - \alpha_s)}.$

Hence these terms contribute no more than what is on the right side of (8.11). Adding both contributions we obtain (8.12) with $D(\delta, N) = \delta^{-1} + N$.

It remains to save 1 in D(δ, N). To this end we apply Theorem 8.1 for the points $(\alpha_r + k)K^{-1}$ with $1 \le k \le K$ and for the trigonometric polynomial $T(\alpha) = S(\alpha K)$. We obtain

$K \sum_{r} |S(\alpha_r)|^2 = \sum_{r} \sum_{k} \Big| T\Big(\frac{\alpha_r + k}{K}\Big) \Big|^2 \le (\delta^{-1} K + N K - K + 1) \sum_{n} |a_n|^2$

because the points $(\alpha_r + k)K^{-1}$ are spaced by $\delta K^{-1}$ and the sum T(α) ranges over $m = nK$ with $(MK + K - 1) < m \le (MK + K - 1) + (NK - K + 1)$. Hence dividing by K and letting K tend to infinity we obtain (8.6) (this trick is due to P. Cohen).

8.2. The Large Sieve Inequality for Additive Characters

In applications of the large sieve inequality (8.6) to number theory we often take the points $\alpha_r$ to be rationals a/q with $1 \le q \le Q$ and $(a, q) = 1$. These points are spaced by $\delta = Q^{-2}$, indeed if $a/q \ne a'/q'$ then

$\Big\| \frac{a}{q} - \frac{a'}{q'} \Big\| = \Big\| \frac{a q' - a' q}{q q'} \Big\| \ge \frac{1}{q q'} \ge \frac{1}{Q^2}.$

Therefore Theorem 8.1 yields.

Theorem 8.5. For any complex numbers $a_n$ with $M < n \le M + N$ where N is a positive integer we have

(8.13)  $\sum_{q \le Q} \ \sideset{}{^*}\sum_{a \pmod q} \Big| S\Big(\frac{a}{q}\Big) \Big|^2 \le (Q^2 + N - 1) \sum_{n} |a_n|^2.$
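The spacing $\delta = Q^{-2}$ used for this theorem can be confirmed in exact arithmetic for small Q. The sketch below is illustrative only; the function names are hypothetical and the representatives are taken in [0, 1).

```python
from fractions import Fraction
from math import gcd

def farey_points(Q):
    """All reduced fractions a/q in [0, 1) with 1 <= q <= Q, in increasing order."""
    return sorted({Fraction(a, q) for q in range(1, Q + 1)
                   for a in range(q) if gcd(a, q) == 1})

def min_circular_gap(points):
    """Smallest distance mod 1 between neighbouring points (exact arithmetic)."""
    gaps = [b - a for a, b in zip(points, points[1:])]
    gaps.append(1 + points[0] - points[-1])  # gap across the wrap-around at 1
    return min(gaps)

if __name__ == "__main__":
    for Q in range(2, 8):
        pts = farey_points(Q)
        print(Q, len(pts), min_circular_gap(pts), Fraction(1, Q * Q))
```

For $Q \ge 2$ the smallest gap is in fact $1/(Q(Q-1))$, attained between $1/Q$ and $1/(Q-1)$, so $Q^{-2}$ is a slightly lossy but convenient lower bound.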

Notice that if $A = (a_n)$ is supported on an arithmetic progression $n \equiv \ell \pmod k$ and $(k, q) = 1$ we can change the variables to derive

Corollary 8.6. For any complex numbers $a_n$ with $M < n \le M + N$ we have

(8.14)  $\sum_{q \le Q,\ (q, k) = 1} \ \sideset{}{^*}\sum_{a \pmod q} \Big| \sum_{n \equiv \ell \pmod k} a_n e\Big(\frac{an}{q}\Big) \Big|^2 \le (Q^2 + k^{-1} N) \sum_{n \equiv \ell \pmod k} |a_n|^2.$

8.3. Equidistribution over Residue Classes

As an application of Theorem 8.5 we show that a general set of distinct integers $\mathcal{M} \subset (M, M + N]$ represents almost all residue classes for almost all prime moduli $p \le \sqrt{N}$ provided only that $\mathcal{M}$ is numerous. More exactly we establish the equidistribution of a general sequence of real numbers $A = (a_n)$ with $M < n \le M + N$ over distinct residue classes $\nu \pmod q$. Put

$X = \sum_{n} a_n, \qquad X(q, \nu) = \sum_{n \equiv \nu \pmod q} a_n.$

We show that

(8.15)  $\Delta(q, \nu) = X(q, \nu) - \frac{X}{q}$

is small for almost all $\nu \pmod q$. Indeed, using additive characters we have

$\Delta(q, \nu) = \frac{1}{q} \sum_{a \pmod q,\ a \not\equiv 0 \pmod q} e\Big(-\frac{\nu a}{q}\Big) S\Big(\frac{a}{q}\Big).$

Hence by the orthogonality (Plancherel's theorem)

$q \sum_{\nu \pmod q} |\Delta(q, \nu)|^2 = \sum_{a \pmod q,\ a \not\equiv 0 \pmod q} \Big| S\Big(\frac{a}{q}\Big) \Big|^2.$

Summing this over prime moduli we infer by (8.13) that

(8.16)  $\sum_{p \le Q} p \sum_{\nu \pmod p} |\Delta(p, \nu)|^2 \le (Q^2 + N) \sum_{n} |a_n|^2.$

In particular if $A = (a_n)$ is the characteristic function of a set $\mathcal{M}$ this yields

(8.17)  $\sum_{p \le \sqrt{N}} p \sum_{\nu \pmod p} |X(p, \nu) - p^{-1} X|^2 \le 2NX$

where now X(p, ν) is the number of $m \in \mathcal{M}$ with $m \equiv \nu \pmod p$ and $X = |\mathcal{M}|$ is the number of all elements in $\mathcal{M}$. In the language of probability theory (spoken by A. Rényi) this estimate is just Tchebyshev's inequality for the variance. Given $0 < \eta \le 1$ we call a prime p exceptional if the number of residue classes $\nu \pmod p$ covered by $\mathcal{M}$ does not exceed $(1 - \eta)p$. Let $E_\eta(Q)$ denote the number of exceptional $p \le Q$. By (8.17) we derive

(8.18)  $E_\eta(\sqrt{N}) \le \frac{2N}{\eta X}.$
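The variance bound (8.17) can be evaluated directly for a concrete set. As an illustration (our choice of example, not from the text) take the squares up to N, which miss about half of the residue classes modulo every odd prime; the function names below are hypothetical.

```python
import math

def primes_up_to(n):
    """Primes <= n by the sieve of Eratosthenes."""
    flags = [True] * (n + 1)
    flags[0:2] = [False, False]
    for p in range(2, math.isqrt(n) + 1):
        if flags[p]:
            for m in range(p * p, n + 1, p):
                flags[m] = False
    return [p for p, is_p in enumerate(flags) if is_p]

def variance_sum(members, N):
    """Left side of (8.17) for a set `members` of distinct integers in (0, N]."""
    X = len(members)
    total = 0.0
    for p in primes_up_to(math.isqrt(N)):
        counts = [0] * p
        for m in members:
            counts[m % p] += 1          # counts[v] is X(p, v)
        total += p * sum((c - X / p) ** 2 for c in counts)
    return total

if __name__ == "__main__":
    N = 10000
    squares = [m * m for m in range(1, math.isqrt(N) + 1)]
    print(variance_sum(squares, N), 2 * N * len(squares))
```

Even for this very irregular set the left side stays well below 2NX, because the set is thin (X is about $\sqrt{N}$); the bound becomes restrictive only when X is comparable to N.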

This bound is impressive if $\mathcal{M}$ has positive natural density, say $X \ge \delta N$ with $0 < \delta \le 1$, in which case we conclude that the number of exceptional primes $p \le \sqrt{N}$ is bounded, namely $E_\eta(\sqrt{N}) \le 2/\delta\eta$. This was (apart from the constant) the original conclusion of Linnik. An estimation for the variance (somewhat weaker than (8.17)) was first established by Rényi using Linnik's large sieve method.

Linnik gave a striking application of his method to estimate the least quadratic nonresidue (mod p), i.e., the smallest positive integer q(p) such that $\big(\frac{q(p)}{p}\big) = -1$. Note that q(p) is prime. It is conjectured that

(8.19)  $q(p) \ll p^{\varepsilon}$

for any $\varepsilon > 0$ with the implied constant depending on ε, whereas the best known estimate is (8.19) for any $\varepsilon > 1/(4\sqrt{e}) = 0.1516\ldots$. From the Riemann hypothesis for L-functions one derives that $q(p) \ll (\log p)^2$.
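For small moduli the least quadratic nonresidue is easy to tabulate. The sketch below uses Euler's criterion for the Legendre symbol, which is a standard tool but not one named in the text; the function names are hypothetical.

```python
def legendre(n, p):
    """Legendre symbol (n/p) for an odd prime p, computed by Euler's criterion."""
    t = pow(n % p, (p - 1) // 2, p)
    return -1 if t == p - 1 else t

def least_nonresidue(p):
    """Smallest positive integer q with (q/p) = -1, for an odd prime p."""
    q = 2
    while legendre(q, p) != -1:
        q += 1
    return q

def is_prime(n):
    """Trial division, adequate for the small numbers used here."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

if __name__ == "__main__":
    for p in (3, 7, 23, 71, 311):
        print(p, least_nonresidue(p))
```

The observation that q(p) is always prime follows from the multiplicativity of the Legendre symbol: a composite number all of whose prime factors are residues is itself a residue.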

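Theorem 8.7 (Linnik). The number of primes $p \le N$ such that $q(p) > N^{\varepsilon}$ is bounded by a constant depending on ε.

Proof. Consider the sifting problem $(\mathcal{M}, \mathcal{P}, \Omega)$ with

$\mathcal{M} = \{1, 2, \ldots, N\},$

$\mathcal{P} = \{p \le \sqrt{N} : (\tfrac{n}{p}) = 1 \text{ for all } n \le N^{\varepsilon}\},$

$\Omega_p = \{\nu \pmod p : (\tfrac{\nu}{p}) = -1\}.$

Thus $\omega(p) = |\Omega_p| = \frac{p - 1}{2}$ and by (8.17)

(8.20)  $\sum_{p \in \mathcal{P}} \Big(1 - \frac{1}{p}\Big) \le 4 N X^{-1}$

where X is the number of elements in

$(\mathcal{M}, \mathcal{P}, \Omega) = \{1 \le m \le N : (\tfrac{m}{p}) = 1 \text{ for any } p \in \mathcal{P}\}.$

Note that this set contains all numbers $m \le N$ free of prime divisors $> N^{\varepsilon}$, thus also all the numbers of type $m = n p_1 \ldots p_k \le N$ with $N^{\varepsilon - \varepsilon^2} < p_j < N^{\varepsilon}$ for $1 \le j \le k = \varepsilon^{-1}$. Therefore

$X > \sum_{p_1 \ldots p_k} \Big[\frac{N}{p_1 \ldots p_k}\Big] \gg N.$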

Inserting this bound to (8.20) we obtain $|\mathcal{P}| \ll 1$ as claimed.

The estimate for the variance (8.17) can be extended to all moduli. To this end we use the Ramanujan sum

(8.21)  $c_q(n) = \sideset{}{^*}\sum_{a \pmod q} e\Big(\frac{an}{q}\Big).$

We have

(8.22)  $c_q(n) = \sum_{d \mid (n, q)} d\,\mu(q/d).$
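The two expressions (8.21) and (8.22) for the Ramanujan sum can be compared numerically. This is a small sketch with hypothetical function names; the exponential sum is rounded to the nearest integer since $c_q(n)$ is an integer.

```python
import cmath
import math

def mobius(n):
    """Moebius function by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0            # a square divides the original n
            result = -result
        d += 1
    return -result if n > 1 else result

def ramanujan_by_definition(q, n):
    """c_q(n) as the exponential sum (8.21) over reduced residues a mod q."""
    total = sum(cmath.exp(2j * math.pi * a * n / q)
                for a in range(1, q + 1) if math.gcd(a, q) == 1)
    return round(total.real)

def ramanujan_by_divisors(q, n):
    """c_q(n) from (8.22): the sum of d * mu(q/d) over d dividing gcd(n, q)."""
    g = math.gcd(n, q)
    return sum(d * mobius(q // d) for d in range(1, g + 1) if g % d == 0)

if __name__ == "__main__":
    print([ramanujan_by_divisors(12, n) for n in range(1, 13)])
```

In particular $c_q(n) = \mu(q)$ whenever $(n, q) = 1$, a special case used later in this chapter.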

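Hence we derive

$\sideset{}{^*}\sum_{a \pmod q} S\Big(\frac{a}{q}\Big) e\Big(-\frac{\nu a}{q}\Big) = \sum_{d \mid q} d\,\mu(q/d)\,X(d, \nu).$

Here we can replace X(d, ν) by Δ(d, ν) and d by q/d getting

$\sideset{}{^*}\sum_{a \pmod q} S\Big(\frac{a}{q}\Big) e\Big(-\frac{\nu a}{q}\Big) = q \sum_{d \mid q} \frac{\mu(d)}{d} \Delta\Big(\frac{q}{d}, \nu\Big).$

Summing over q we infer by (8.13) the following extension of (8.16).

Theorem 8.8. For any complex numbers $a_n$ with $M < n \le M + N$ we have

(8.23)  $\sum_{q \le Q} q \sum_{\nu \pmod q} \Big| \sum_{d \mid q} \frac{\mu(d)}{d} \Delta\Big(\frac{q}{d}, \nu\Big) \Big|^2 \le (Q^2 + N) \sum_{n} |a_n|^2.$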

8.4. Arithmetic Large Sieve

The large sieve inequality (8.13) can be used to derive an upper bound for the sifted sum

(8.24)  $Z = \sum_{n \in (\mathcal{N}, \mathcal{P}, \Omega)} a_n$

with the number of residue classes $\omega(p) = |\Omega_p|$ not necessarily very large, and it is capable of producing results equivalent to those derived by the $\Lambda^2$-sieve. We may assume that

(8.25)  $a_n = 0$ unless $n \in (\mathcal{N}, \mathcal{P}, \Omega)$

so $Z = X$. Let h be the multiplicative function supported on squarefree numbers with

(8.26)  $h(p) = \frac{\omega(p)}{p - \omega(p)}.$

Lemma 8.9. Letting S(α) be the trigonometric series (8.1) we have for any q that

(8.27)  $h(q) |S(0)|^2 \le \sideset{}{^*}\sum_{a \pmod q} \Big| S\Big(\frac{a}{q}\Big) \Big|^2.$

Proof. If $q = p$ is prime then $X(p, \nu) = 0$ for all $\nu \in \Omega_p$, therefore by Cauchy's inequality we get

$|S(0)|^2 = \Big| \sum_{\nu \pmod p} X(p, \nu) \Big|^2 \le (p - \omega(p)) \sum_{\nu \pmod p} |X(p, \nu)|^2 = \Big(1 - \frac{\omega(p)}{p}\Big) \sum_{a \pmod p} \Big| S\Big(\frac{a}{p}\Big) \Big|^2$

which gives (8.27) in this case. In general if $q = q_1 q_2$ with $(q_1, q_2) = 1$ we have

$\sideset{}{^*}\sum_{a \pmod q} \Big| S\Big(\frac{a}{q}\Big) \Big|^2 = \sideset{}{^*}\sum_{a_1 \pmod{q_1}} \ \sideset{}{^*}\sum_{a_2 \pmod{q_2}} \Big| S\Big(\frac{a_1}{q_1} + \frac{a_2}{q_2}\Big) \Big|^2.$

Assuming (8.27) holds for $q_1$ and $q_2$ the above factorization yields (change $a_n$ into $a_n e(n a_1/q_1)$)

$\ge h(q_2) \sideset{}{^*}\sum_{a_1 \pmod{q_1}} \Big| S\Big(\frac{a_1}{q_1}\Big) \Big|^2 \ge h(q_2) h(q_1) |S(0)|^2.$

This completes the proof of (8.27) by induction on prime factors of q. Summing (8.27) over the moduli $q \le Q$ we derive by (8.13) the following

Theorem 8.10. For any complex numbers $a_n$ with $n \in (\mathcal{N}, \mathcal{P}, \Omega)$ and $M < n \le M + N$ we have

(8.28)  $H |S(0)|^2 \le (N + Q^2) \sum_{n} |a_n|^2$

where

(8.29)  $H = \sum_{q \le Q} h(q).$

Letting $A = (a_n)$ be the characteristic function of numbers $n \in (\mathcal{N}, \mathcal{P}, \Omega)$ in an interval $M < n \le M + N$ we get by (8.28).

Corollary 8.11. The number of $n \in (\mathcal{N}, \mathcal{P}, \Omega)$ with $M < n \le M + N$ satisfies

(8.30)  $Z \le \frac{N + Q^2}{H}.$
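To see (8.30) at work, consider the simplest sifting problem in which the single class 0 is removed modulo every prime $p \le Q$, so that $\omega(p) = 1$ and $h(q) = \mu(q)^2/\varphi(q)$. This choice of example and the function names below are ours, given as a minimal sketch.

```python
import math

def euler_phi(n):
    """Euler's totient function by trial division."""
    result, d, m = n, 2, n
    while d * d <= m:
        if m % d == 0:
            while m % d == 0:
                m //= d
            result -= result // d
        d += 1
    if m > 1:
        result -= result // m
    return result

def is_squarefree(n):
    return all(n % (d * d) for d in range(2, math.isqrt(n) + 1))

def montgomery_bound(N, Q):
    """(N + Q^2) / H with h(q) = mu(q)^2 / phi(q), i.e. omega(p) = 1 for all p <= Q."""
    H = sum(1.0 / euler_phi(q) for q in range(1, Q + 1) if is_squarefree(q))
    return (N + Q * Q) / H

def sifted_count(M, N, Q):
    """Number of n in (M, M + N] having no prime factor <= Q."""
    return sum(1 for n in range(M + 1, M + N + 1)
               if all(n % p for p in range(2, Q + 1)))

if __name__ == "__main__":
    for M in (0, 10 ** 6):
        print(M, sifted_count(M, 1000, 10), montgomery_bound(1000, 10))
```

The bound depends on the length N of the interval but not on its position M, which is the characteristic feature of large sieve estimates.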

This estimate is due to H.L. Montgomery [Mon], it is neat and only slightly weaker than the estimate of Theorem 4.2 derived by Selberg's sieve. Actually there is a close connection between the $\Lambda^2$-sieve and the large sieve. By (4.20) we have

$H \sum_{\ell \mid n} \rho_\ell = \sum_{d\ell < \sqrt{D},\ (d, \ell) = 1,\ \ell \mid n} \mu(\ell) h(d) h(\ell) g(\ell)^{-1} = \sum_{q < \sqrt{D}} h(q) \sum_{d\ell = q,\ \ell \mid n} \mu(\ell) g(\ell)^{-1} = \sum_{q < \sqrt{D}} h(q) \prod_{p \mid (n, q)} (1 - g(p)^{-1}).$

Thus

(8.31)  $H \sum_{\ell \mid n} \rho_\ell = \sum_{q < \sqrt{D}} \mu((n, q))\, h(q/(n, q)).$

We proceed further in the simplest case $g(d) = d^{-1}$ if $(d, k) = 1$ and $g(d) = 0$ otherwise. This gives $h(d) = \varphi(d)^{-1}$ if $(d, k) = 1$ and $h(d) = 0$ otherwise. In this case

$H \ge \frac{\varphi(k)}{k} \log \sqrt{D}$

by (4.32) and

$H \sum_{\ell \mid n} \rho_\ell = \sum_{q < \sqrt{D},\ (q, k) = 1} \varphi(q)^{-1} \mu((n, q)) \varphi((n, q)).$

On the other hand, the Ramanujan sum for q squarefree is $c_q(n) = \mu(q) \mu((n, q)) \varphi((n, q))$. Therefore

$\sum_{\ell \mid n} \rho_\ell = \frac{1}{H} \sum_{q < \sqrt{D},\ (q, k) = 1} \frac{\mu(q)}{\varphi(q)} c_q(n).$

Hence

(8.32)  $\sum_{n} a_n \Big( \sum_{\ell \mid n} \rho_\ell \Big)^2 = \frac{1}{H^2} \sum_{n} a_n \Big( \sum_{q < \sqrt{D},\ (q, k) = 1} \frac{\mu(q)}{\varphi(q)} c_q(n) \Big)^2.$

If $A = (a_n)$ is the characteristic function of numbers $n \equiv \ell \pmod k$ in the interval $M < n \le M + N$ then the expression (8.32) is the dual form of

$\sum_{q < \sqrt{D},\ (q, k) = 1} \ \sideset{}{^*}\sum_{a \pmod q} \Big| S\Big(\frac{a}{q}\Big) \Big|^2.$

This connection was observed by I. Kobayashi [K]. Applying the large sieve inequality (8.14) we conclude by the duality principle

(8.33)  $\sum_{M < n \le M + N,\ n \equiv \ell \pmod k} \Big( \sum_{\ell \mid n} \rho_\ell \Big)^2 \le \frac{N + kD}{\varphi(k) \log \sqrt{D}}.$

8.5. The Large Sieve Inequality for Multiplicative Characters

The multiplicative characters $\chi \pmod q$ can be expanded into additive characters by means of Gauss sums

(8.34)  $\tau(\chi) = \sum_{a \pmod q} \chi(a) e\Big(\frac{a}{q}\Big).$

Let s be the conductor of χ so $s \mid q$. If $s = q$ then χ is said to be primitive. For a primitive character $\chi \pmod q$ we have

(8.35)  $\tau(\chi) \bar{\chi}(n) = \sum_{a \pmod q} \chi(a) e\Big(\frac{an}{q}\Big)$

for all n. Hence by the orthogonality it follows that

(8.36)  $|\tau(\chi)| = \sqrt{q}.$
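For a concrete primitive character the size of the Gauss sum, $|\tau(\chi)|^2 = q$, can be checked numerically. The sketch below takes the quadratic character modulo an odd prime p as the example (our choice); the function names are hypothetical.

```python
import cmath
import math

def legendre_character(a, p):
    """The quadratic character mod an odd prime p; it is primitive of conductor p."""
    t = pow(a % p, (p - 1) // 2, p)
    return -1 if t == p - 1 else t

def gauss_sum(chi, q):
    """tau(chi) = sum over a mod q of chi(a) e(a/q), as in (8.34)."""
    return sum(chi(a, q) * cmath.exp(2j * math.pi * a / q) for a in range(q))

def twisted_sum(chi, q, n):
    """Right side of (8.35): sum over a mod q of chi(a) e(an/q)."""
    return sum(chi(a, q) * cmath.exp(2j * math.pi * a * n / q) for a in range(q))

if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 101):
        print(p, abs(gauss_sum(legendre_character, p)) ** 2)
```

Since the quadratic character is real, (8.35) reads $\tau(\chi)\chi(n) = \sum_a \chi(a)e(an/p)$ in this case, including $n \equiv 0$ where both sides vanish.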

If $\chi \pmod q$ is not primitive then (8.35) requires the condition $(n, q) = 1$. Let $\chi_s$ be the primitive character of conductor s which is induced by χ. Suppose that $q = rs$ with $(r, s) = 1$. Then we have

(8.37)  $\sum_{a \pmod q} \chi(a) e\Big(\frac{an}{q}\Big) = \sideset{}{^*}\sum_{b \pmod r} \ \sideset{}{^*}\sum_{c \pmod s} \chi(bs + cr)\, e\Big(\frac{bn}{r} + \frac{cn}{s}\Big) = \bar{\chi}(n) \chi_s(r) c_r(n) \tau(\chi_s)$

where $c_r(n)$ is the Ramanujan sum. In particular for $n = 1$ this gives

(8.38)  $\tau(\chi) = \mu(r) \chi_s(r) \tau(\chi_s).$

Hence

(8.39)  $|\tau(\chi)|^2 = \mu^2(r) s$ if $(r, s) = 1$.

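Given a finite sequence $A = (a_n)$ of complex numbers we denote for any arithmetic function $f : \mathbb{N} \to \mathbb{C}$

$T(f) = \sum_{n} a_n f(n).$

In particular if $f(n) = e(an/q)$ then $T(f) = S(a/q)$. By (8.37) we get

$T(\bar{\chi} c_r) = \bar{\chi}_s(r) \tau(\chi_s)^{-1} \sum_{a \pmod q} \chi(a) S\Big(\frac{a}{q}\Big).$

Hence by the orthogonality of characters

$\sum_{rs \le Q,\ (r, s) = 1} \frac{s}{\varphi(rs)} \ \sideset{}{^*}\sum_{\chi \pmod s} |T(\chi c_r)|^2 \le \sum_{q \le Q} \frac{1}{\varphi(q)} \sum_{\chi \pmod q} \Big| \sum_{a \pmod q} \chi(a) S\Big(\frac{a}{q}\Big) \Big|^2 = \sum_{q \le Q} \ \sideset{}{^*}\sum_{a \pmod q} \Big| S\Big(\frac{a}{q}\Big) \Big|^2.$

Applying Theorem 8.5 we derive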

Theorem 8.12. For any complex numbers $a_n$ with $M < n \le M + N$ where N is a positive integer we have

(8.40)  $\sum_{rs \le Q,\ (r, s) = 1} \frac{s}{\varphi(rs)} \ \sideset{}{^*}\sum_{\chi \pmod s} |T(\chi c_r)|^2 \le (Q^2 + N - 1) \sum_{n} |a_n|^2.$

This result (with a slightly larger bound) is due to E. Bombieri and Davenport [BD]. Taking $r = 1$ one gets

(8.41)  $\sum_{q \le Q} \frac{q}{\varphi(q)} \ \sideset{}{^*}\sum_{\chi \pmod q} |T(\chi)|^2 \le (Q^2 + N) \sum_{n} |a_n|^2.$

Subsequently Bombieri and Davenport have made a good use of the summation over r. If $(n, r) = 1$ then $c_r(n) = \mu(r)$. Thus assuming the condition

(8.42)  $a_n = 0$ if n has a prime divisor $\le Q$

we have

(8.43)  $T(\chi c_r) = \mu(r) T(\chi).$

Furthermore we have by (4.32)

(8.44)  $\sum_{r \le Q/s,\ (r, s) = 1} \frac{\mu(r)^2}{\varphi(r)} \ge \frac{\varphi(s)}{s} \log \frac{Q}{s}.$

Combining (8.40), (8.43) and (8.44) we derive

Theorem 8.13. If $A = (a_n)$ satisfies (8.42) then

(8.45)  $\sum_{s \le Q} \Big(\log \frac{Q}{s}\Big) \ \sideset{}{^*}\sum_{\chi \pmod s} |T(\chi)|^2 \le (Q^2 + N - 1) \sum_{n} |a_n|^2.$

Corollary. If $M \ge Q$ we have

(8.46)  $\sum_{s \le Q} \Big(\log \frac{Q}{s}\Big) \ \sideset{}{^*}\sum_{\chi \pmod s} |\pi(M + N, \chi) - \pi(M, \chi)|^2 \le (Q^2 + N)(\pi(M + N) - \pi(M))$

where

(8.47)  $\pi(x, \chi) = \sum_{p \le x} \chi(p).$

Taking only one term $s = 1$ on the left side of (8.46) we deduce the bound

(8.48)  $\pi(M + N) - \pi(M) \le \frac{N + Q^2}{\log Q}$

for any $Q \le M$. Choosing $Q^2 = N/\log N$ the resulting bound (8.48) is almost as good as (4.47) for $q = 1$ which fact is remarkable because a lot of characters in (8.46) have been ignored.

CHAPTER IX

BOMBIERI’S SIEVE

After a half century of creative improvements of Brun's works the sieve methods turned out to be incapable of achieving the goal for which they had been created, namely the detection of prime numbers, although one comes tantalizingly close. Selberg's examples (5.24) show that in the general sieve framework no prime can be captured while the weighted sieves are capable of producing almost primes in various interesting sets. In this connection E. Bombieri [B3] developed a method which, besides yielding complete asymptotics for almost primes, offers a great deal of insight into the difficulty of producing primes. Bombieri's method is based on somewhat different principles than Brun's sieve, it applies effectively to problems of local linear sieve. In this chapter we sketch a simplified version of Bombieri's method and we use the results to explain the parity problem of sieve theory. At the end we import an additional device to the sieve machinery by means of which one can break the parity problem to reach the primary goal of sifting out prime numbers.

9.1. Heuristic Arguments for Sums over Primes

Thus far our sieves captured elements of the sequence $A = (a_n)$ with n having its prime divisors in certain ranges, which we could not select arbitrarily but only so as to conclude that n has only a few divisors, i.e., n is almost prime. This was achieved by counting all $a_n$ with a suitable multiplicity of convolution type

(9.1)  $w(n) = \sum_{d \mid n} w_d$

where the weights $w_d$ are restricted by $d \le D$. It is the essence of any sieve method that the elements in demand are registered with positive multiplicity of a convolution form. We need to evaluate the weighted sum

(9.2)  $S(AW) = \sum_{n} a_n w(n).$

By unfolding the convolution (9.1) we express S(AW) as

$S(AW) = \sum_{d \le D} w_d |A_d|.$

This expression can be estimated, or even evaluated asymptotically, given strong enough approximations (1.15) for the congruent series $|A_d|$. For example the sums $S^+(A, P)$ and $S^-(A, P)$ in (2.24) and (2.25) respectively are of type (9.2), these could have been evaluated asymptotically if one desired (see for example (7.16)) but often one-sided estimates were sufficient. The crucial point is that the level of support D can be chosen at will so that the resulting error terms $r_d(A)$ can be estimated.

There is a price for the flexibility of choosing D, namely the factor w(n) does not annihilate every unwanted element $a_n$. In this respect the asymptotic formula for the weighted sum S(AW) does not reveal the true distribution of almost primes over the sequence $A = (a_n)$ while still proving the existence (the asymptotic is positive whereas the multiplicity w(n) of every unwanted n is not).

If we quit the requirement of limited support for the weights $w_d$ then the multiplicity w(n) can be made so precise as to represent the almost primes exclusively. Indeed, even the prime numbers can be isolated. This can be achieved by means of the Tchebyshev convolution

(9.3)  $\Lambda(n) = -\sum_{d \mid n} \mu(d) \log d.$

Consider the sum

(9.4)  $S(A\Lambda) = \sum_{n} a_n \Lambda(n).$

By (9.3) and (1.15) we write

(9.5)  $S(A\Lambda) = -\sum_{d} \mu(d) (\log d) |A_d| = HX + R$

where

(9.6)  $H = -\sum_{d} \mu(d) g(d) \log d$

and

(9.7)  $R = -\sum_{d} \mu(d) (\log d)\, r_d(A).$

Assuming some regularity in the distribution of g(p), such as (9.30), one shows (by the same method which gave (4.67)) that the series (9.6) converges to the infinite product

(9.8)  $H = \prod_{p} (1 - g(p)) \Big(1 - \frac{1}{p}\Big)^{-1}.$

The remainder term (9.7) is likely to be negligible because if d is small the error term $r_d(A)$ is small and if d is large the Möbius function μ(d) changes sign independently of $r_d(A)$ causing significant cancellation. Thus one expects under some reasonable conditions that

(9.9)  $S(A\Lambda) \sim HX.$

The above heuristic is amazingly accurate, it never led to a false asymptotic even if $A = (a_n)$ is a lacunary sequence but chosen objectively. Of course, a successful estimation of the remainder is the crux of the matter, it cannot be resolved within standard sieve axioms (the sequence $A = (a_n^-)$ given by (5.24) misses primes!).

9.2. Asymptotics for Sums over Almost-Primes

A close cousin of Λ(n) is the function defined by

(9.10)  $\Lambda_k(n) = \sum_{d \mid n} \mu(d) \Big(\log \frac{n}{d}\Big)^k$

for every $k \ge 0$ (see A.9). This is supported on positive integers having at most k distinct prime divisors. For example we have

(9.11)  $\Lambda_2(p) = (\log p)^2$ if p prime,
        $\Lambda_2(pq) = 2 (\log p)(\log q)$ if $p \ne q$ primes.
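The definition (9.10) and the examples (9.11) can be checked by direct computation. The sketch below evaluates the divisor sum naively; the function names are hypothetical.

```python
import math

def mobius(n):
    """Moebius function by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0            # a square divides the original n
            result = -result
        d += 1
    return -result if n > 1 else result

def generalized_von_mangoldt(n, k):
    """Lambda_k(n) = sum over d | n of mu(d) * (log(n/d))^k, as in (9.10)."""
    return sum(mobius(d) * math.log(n // d) ** k
               for d in range(1, n + 1) if n % d == 0)

if __name__ == "__main__":
    for n in (7, 15, 30, 49):
        print(n, generalized_von_mangoldt(n, 1), generalized_von_mangoldt(n, 2))
```

For k = 1 this reproduces Λ(n), and for k = 2 the value at n = 30 vanishes up to rounding because 30 has three distinct prime divisors.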

Now, we consider the sum

(9.12)  $S(A\Lambda_k) = \sum_{n} a_n \Lambda_k(n).$

Thus $S(A\Lambda_k)$ takes every element $a_n$ of A for which n has no more than k distinct prime divisors and no other elements are taken, either positively or negatively. By the same heuristic as for $k = 1$ one is led to the asymptotic formula

(9.13)  $S(A\Lambda_k) \sim k H X (\log x)^{k - 1}.$

In view of the problems encountered with the sum $S(A\Lambda)$ it was surprising that Bombieri has established (9.13) rigorously for all $k \ge 2$ subject to a few natural conditions. These are the same type of conditions as in general sieve theory. First we require

(9.14)  $a_n \ge 0$ for all n,

(9.15)  $a_n = 0$ if $n > x$.

CHAPTER 9. BOMBIERI’S SIEVE

The latter puts us in a local sieve situation. By virtue of this we can reduce the convolution (9.10) to $d < x$. Still this reduction is not enough to control the error terms $r_d(\mathcal{A})$. We would need the following estimate for the total remainder

$$R(\mathcal{A}, D) = \sum_{d<D}^{\flat} \tau_3(d)|r_d(\mathcal{A})| < X(\log x)^{-2} \tag{9.16}$$

with $D$ as large as $x$, a hypothesis which is unrealistic. The most we can hope for in practice is that (9.16) holds for $D = x^{1-\varepsilon}$ with any $\varepsilon > 0$, provided $x$ is sufficiently large in terms of $\varepsilon$. Indeed this is one of the two essential hypotheses in Bombieri's sieve. The other important hypothesis is that the density function $g$ satisfies

$$\sum_{d \le y}^{\flat} g(d) = c\log y + b + O((\log y)^{-A}) \tag{9.17}$$

for all $y > 2$ with some constants $c > 0$, $b$ and $A > 5$. By the way, (9.17) implies the asymptotic (see A.9)

$$\sum_{p \le y} g(p)\log p = \log y + O(1)$$

which is a stronger form of the linear sieve condition (1.36). Actually (9.17) implies

$$\sum_{n \le y} g(n)\Lambda_k(n) = P_k(\log y) + O(1) \tag{9.18}$$

where $P_k$ is a monic polynomial of degree $k$.

9.3. Basic Arrangements

Since (9.16) does not cover large moduli we split the convolution (9.10) into

$$\Lambda_k(n) = \sum_{d \le y,\ d \mid n} \mu(d)\Big(\log\frac{n}{d}\Big)^k + \sum_{d > y,\ d \mid n} \mu(d)\Big(\log\frac{n}{d}\Big)^k \tag{9.19}$$

where $y = x^{1-2\varepsilon}$. Bombieri has shown how to treat the upper range $d > y$ provided $k > 1$. To this end he observed that $(\log\frac{n}{d})^k$ is relatively small in this range (it is still small if $k = 1$, but not small enough). This gain is tiny, so we cannot afford to lose more than a constant factor in the summation over large moduli. In particular one must keep track of the almost primes over which the shifted sum $S(\mathcal{A}\Lambda_k)$ runs. The obvious fact that the left side of (9.19) carries almost primes is lost when we look instead at the right side. In order to keep partial track of this information Bombieri introduced an upper bound sieve $(\xi_\nu)$ of small level $z = x^{\varepsilon}$. Thus $\xi_\nu$ are real numbers such that

$$\xi_1 = 1, \qquad |\xi_\nu| \le 1 \qquad \text{and} \qquad \xi_\nu = 0 \ \text{ if } \nu \ge z. \tag{9.20}$$

Moreover we have

$$\rho_n = \sum_{\nu \mid n} \xi_\nu \ge 0 \quad \text{for all } n.$$

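First Bombieri shows that the modified function $\lambda_k(n) = \rho_n\Lambda_k(n)$ is not much different from $\Lambda_k(n)$. Only after this modification does he split the sum

$$S(\mathcal{A}\lambda_k) = \sum_n a_n\lambda_k(n) = S(\mathcal{A}\lambda_k, y) + S'(\mathcal{A}\lambda_k, y)$$

rather than $S(\mathcal{A}\Lambda_k)$ according to (9.19), and he estimates the contribution $S'(\mathcal{A}\lambda_k, y)$ of large moduli by trivial summation. The main part $S(\mathcal{A}\lambda_k, y)$ is evaluated asymptotically by an appeal to (9.16) and (9.17). Before this we introduce another technical modification, namely we replace $\Lambda_k(n)$ by

$$\Lambda_k(n, x) = \sum_{d \mid n} \mu(d)\Big(\log\frac{x}{d}\Big)^k \tag{9.21}$$

(for the properties of $\Lambda_k(n, x)$ see A.9). Thus we consider

$$S(\mathcal{A}\Lambda_k, x) = \sum_n a_n\Lambda_k(n, x) \tag{9.22}$$

instead of $S(\mathcal{A}\Lambda_k)$. Note that by (9.59) and (9.60), applied to $y < n \le x$, where $\log n/\log x > 1 - 2\varepsilon$,

$$S(\mathcal{A}\Lambda_k, x) \ge S(\mathcal{A}\Lambda_k) \ge (1 - 2\varepsilon)^k S(\mathcal{A}\Lambda_k, x) - A(y)(\log y)^k \tag{9.23}$$

where

$$A(y) = \sum_{n \le y} a_n.$$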

We assume that the sequence $\mathcal{A} = (a_n)$ is not very lacunary, precisely that

$$A(y) \le A(x)(\log x)^{-2} \tag{9.24}$$

for $y = x^{1-2\varepsilon}$, provided $x$ is sufficiently large in terms of $\varepsilon$. Hence the last term in (9.23) is $\ll X(\log x)^{k-2}$. Therefore it suffices to show that

$$S(\mathcal{A}\Lambda_k, x) = kHX(\log x)^{k-1}\{1 + O(\varepsilon)\}. \tag{9.25}$$

For the proof of (9.25) we can assume without loss of generality that $\mathcal{A} = (a_n)$ is supported on squarefree numbers. Indeed, writing $n = \ell m$ where $m$ is squarefree and $\ell \mid m^\infty$, we observe that for $d$ squarefree $d \mid n \Leftrightarrow d \mid m$, thus $\Lambda_k(n, x) = \Lambda_k(m, x)$. Hence, considering the sequence $\mathcal{B} = (b_m)$ with

$$b_m = \mu^2(m)\sum_{\ell \mid m^\infty} a_{\ell m} \tag{9.26}$$

in place of $\mathcal{A} = (a_n)$, one reduces the argument to squarefree numbers (verify that $|\mathcal{B}_d| = |\mathcal{A}_d|$ and $S(\mathcal{B}\Lambda_k, x) = S(\mathcal{A}\Lambda_k, x)$). Following Bombieri we mollify $\Lambda_k(n, x)$ with the sieve factor $\rho_n$. We put

$$\lambda_k(n, x) = \rho_n\Lambda_k(n, x) \tag{9.27}$$

and consider

$$S(\mathcal{A}\lambda_k, x) = \sum_n a_n\lambda_k(n, x). \tag{9.28}$$

Then we split $S(\mathcal{A}\lambda_k, x) = S(\mathcal{A}\lambda_k, x, y) + S'(\mathcal{A}\lambda_k, x, y)$ according to

$$\Lambda_k(n, x) = \sum_{d \le y,\ d \mid n} \mu(d)\Big(\log\frac{x}{d}\Big)^k + \sum_{d > y,\ d \mid n} \mu(d)\Big(\log\frac{x}{d}\Big)^k.$$

Denote these partial convolutions by $\Lambda_k(n, x, y)$ and $\Lambda'_k(n, x, y)$ respectively.

9.4. Handling the Sieve Mollifier

Here we show that the introduction of $\rho_n$ alters $S(\mathcal{A}\Lambda_k, x)$ by an acceptable amount. More precisely we show that, regardless of what kind of sieve $\{\xi_\nu\}$ is employed, one has

$$S(\mathcal{A}\lambda_k, x) = S(\mathcal{A}\Lambda_k, x) + O(\varepsilon X(\log x)^{k-1}) \tag{9.29}$$

as long as the conditions (9.20) are satisfied.

Let $P = P(z)$ denote the product of all primes $p < z$. Observe that $\rho_n = 1$ if $(n, P) = 1$. Moreover we have $0 \le \lambda_k(n, x) \le 2^k\Lambda_k(n, x)$ because $\Lambda_k(n, x)$ is supported on numbers having at most $k$ prime divisors. Let $y < n \le x$ and $(n, P) > 1$. If $q$ is the largest prime divisor of $n$ then $q \ge n^{1/k} > y^{1/k} > z$, thus $n = mq$ with $m < n^{1-1/k} \le x^{1-1/k}$. Moreover by (9.63) we have $\Lambda_k(n, x) \le k\Lambda_{k-1}(m, x)\log q$. From the above observations we infer that the difference $S(\mathcal{A}\lambda_k, x) - S(\mathcal{A}\Lambda_k, x)$ is bounded by

$$k2^k(\log x)\sum_{\substack{m < x^{1-1/k} \\ (m,P)>1}} \Lambda_{k-1}(m, x)\sum_{\substack{q > ym^{-1} \\ (q,m)=1}} a_{mq} + A(y)(2\log y)^k$$

where the last term takes care of the contribution from $n \le y$. Applying an upper bound sieve of level $ym^{-1}$ to detect the primality of $q$ we obtain

$$\ll X(\log x)\sum_{\substack{m < x^{1-1/k} \\ (m,P)>1}} \Lambda_{k-1}(m, x)\,g(m)\prod_{\substack{p < y/m \\ p \nmid m}} (1 - g(p)) + X(\log x)^{k-2}$$

where the last term absorbs the resulting remainder $k(2\log x)^kR(\mathcal{A}, y)$ by virtue of (9.16) and the contribution of $n \le y$ by (9.24). By the linear sieve condition (9.18) we deduce a crude bound

$$\prod_{p \le w} (1 - g(p)) \ll (\log w)^{-1}.$$

Using this we infer that

$$S(\mathcal{A}\lambda_k, x) - S(\mathcal{A}\Lambda_k, x) \ll X\sum_{\substack{m \le x \\ (m,P)>1}}^{\flat} h(m)\Lambda_{k-1}(m, x) + X(\log x)^{k-2}$$

where $h$ is the multiplicative function with $h(p) = g(p)(1 - g(p))^{-1}$. It is clear that $h(p)$ satisfies (9.18). Writing $m = \ell p$ with $p < z$ we derive by (9.63), (9.64) and (9.18) that

$$\sum_{m}^{\flat} \le k\Big(\sum_{p<z} h(p)\log p\Big)\Big(\sum_{\ell < x} h(\ell)\Lambda_{k-2}(\ell, x)\Big) \ll (\log z)(\log x)^{k-2}.$$

Since $\log z = \varepsilon\log x$, this completes the proof of (9.29).

9.5. Estimation of $S'(\mathcal{A}\lambda_k, x, y)$

First we estimate crudely

$$|\Lambda'_k(n, x, y)| \le \Big(\log\frac{x}{y}\Big)^k\sum_{d \mid n,\ d < x/y} 1$$

by switching the divisor $d > y$ to its complementary one. Hence

$$\begin{aligned} |S'(\mathcal{A}\lambda_k, x, y)| &\le \Big(\log\frac{x}{y}\Big)^k\sum_n a_n\rho_n\sum_{d \mid n,\ d < x/y} 1 \\ &= \Big(\log\frac{x}{y}\Big)^k\sum_{\nu<z}\xi_\nu\sum_{d < x/y} |\mathcal{A}_{[\nu,d]}| \\ &= XU(z, x/y)\Big(\log\frac{x}{y}\Big)^k + O(X(\log x)^{k-2}) \end{aligned}$$

by inserting the approximations (1.15) and estimating the resulting error terms by (9.16) for $D = zx/y = x^{3\varepsilon}$. Here we have

$$U(z, x/y) = \sum_{\nu<z}\xi_\nu\sum_{d < x/y}^{\flat} g([\nu, d])$$

and the innermost sum is equal to

$$g(\nu)\sum_{q \mid \nu}\ \sum_{\substack{m < x/yq \\ (m,q)=1}}^{\flat} g(m).$$

Applying Theorem 4.8 (it requires our hypothesis (9.17)) we get

$$g(\nu)\sum_{q \mid \nu}\gamma_q\Big\{c\log\frac{x}{yq} + b + c\delta_q + O(\tau(q)(\log z)^{-A})\Big\}.$$

Here $v(q) = \gamma_q$ is the multiplicative function with $v(p) = (1 + g(p))^{-1}$ and $u(q) = \log q - \delta_q$ is the additive function with $u(p) = v(p)\log p$. Therefore

$$\sum_{d < x/y}^{\flat} g([\nu, d]) = \Big(c\log\frac{x}{y} + b\Big)f(\nu) - cg(\nu)w(\nu) + O(g(\nu)\tau_3(\nu)(\log z)^{-A})$$

where $f(\nu)$ and $w(\nu)$ are as at the end of Section 1.9. Thus $f$ is the multiplicative function with

$$f(p) = g(p)(1 + v(p)) = g(p)(2 + g(p))(1 + g(p))^{-1}.$$

In what follows we require $f(p) < 1$. This condition holds for all but a few small primes, which we therefore exclude from the range of the mollifying sieve $(\xi_\nu)$. From (1.79) we infer

$$U(z, x/y) \le \Big(c\log\frac{x}{y} + b + c\sigma(z)\Big)\Big(\sum_\nu \xi_\nu f(\nu)\Big) + O((\log z)^{2-A})$$

where

$$\sigma(z) = \sum_{p<z} u(p)v(p)\frac{g(p)}{1 - f(p)} \ll \sum_{p<z} g(p)\log p \ll \log z.$$

Note that $f$ is a multiplicative function which satisfies (1.36) with $k = 2$. More precisely we have

$$\prod_{p<z} (1 - f(p)) \asymp (\log z)^{-2}.$$

Accordingly, choosing $(\xi_\nu)$ to be an upper bound sieve for dimension two of level $z$ and of the range consisting of primes $p < z$ such that $f(p) < 1$, we obtain

$$\sum_\nu \xi_\nu f(\nu) \ll (\log z)^{-2}.$$

Hence we derive

$$U(z, x/y) \ll \Big(\log\frac{xz}{y}\Big)(\log z)^{-2}$$

and we conclude

$$S'(\mathcal{A}\lambda_k, x, y) \ll X(\varepsilon\log x)^{k-1} \tag{9.30}$$

provided $z = x^{\varepsilon} > 2$, which condition is henceforth assumed. The above estimate lies at the core of Bombieri's sieve. This is the part of the theory which requires $k \ge 2$. For $k = 1$ the bound (9.30) is just short of the vital factor $\varepsilon$.

9.6. Evaluation of $S(\mathcal{A}\lambda_k, x, y)$

Unfolding the partial convolution $\Lambda_k(n, x, y)$ we arrange

$$S(\mathcal{A}\lambda_k, x, y) = \sum_n a_n\rho_n\Lambda_k(n, x, y) = \sum_\nu \xi_\nu\sum_{d \le y} \mu(d)\Big(\log\frac{x}{d}\Big)^k|\mathcal{A}_{[\nu,d]}|.$$

Next, inserting the approximations (1.15) we get

$$S(\mathcal{A}\lambda_k, x, y) = XV_k(x, y) + \theta R(\mathcal{A}, yz)(\log x)^k$$

where

$$V_k(x, y) = \sum_\nu \xi_\nu\sum_{d \le y} \mu(d)g([\nu, d])\Big(\log\frac{x}{d}\Big)^k. \tag{9.31}$$

Here the remainder term $R(\mathcal{A}, yz)$ is bounded by $X(\log x)^{-2}$ by virtue of the hypothesis (9.16). Therefore $S(\mathcal{A}\lambda_k, x, y) = XV_k(x, y) + O(X(\log x)^{k-2})$. Combining this with (9.30) and (9.29) we conclude that

$$S(\mathcal{A}\Lambda_k, x) = XV_k(x, y) + O(\varepsilon X(\log x)^{k-1}). \tag{9.32}$$

It remains to prove that

$$V_k(x, y) = kH(\log x)^{k-1}\{1 + O(\varepsilon)\} \tag{9.33}$$

to conclude

$$S(\mathcal{A}\Lambda_k, x) = kHX(\log x)^{k-1}\{1 + O(\varepsilon)\}. \tag{9.34}$$

We choose an indirect approach to (9.33) which exploits (9.32) for a specific sequence $\mathcal{B} = (b_n)$ having the same characteristics as $\mathcal{A} = (a_n)$ and for which (9.34) can be directly established. To this end we take $b_n = h(n)$ for $n$ in the segment $e^{-1}x < n \le x$, where $h$ is the multiplicative function supported on squarefree numbers such that $h(p) = g(p)(1 - g(p))^{-1}$. Since $g$ satisfies (9.17) so does $h$, but with different constants, namely

$$\sum_{d \le y} h(d) = c\log y + b + O((\log y)^{-A}).$$

By Theorem 4.5 we determine that

$$c = \prod_p \Big(1 - \frac{1}{p}\Big)(1 - g(p))^{-1} = H^{-1}.$$

For any $d$ we have

$$|\mathcal{B}_d| = \sum_{\substack{x/e < n \le x \\ d \mid n}} h(n) = h(d)\sum_{\substack{x/ed < m \le x/d \\ (m,d)=1}} h(m).$$

Applying Theorem 4.8 we get the following approximation

$$|\mathcal{B}_d| = cg(d) + O\Big(\tau(d)h(d)\Big(\log\frac{x}{d}\Big)^{-A}\Big).$$

Hence, with the error term $r_d(\mathcal{B}) = |\mathcal{B}_d| - cg(d)$, the remainder of level $D = x^{1-\varepsilon}$ is bounded by

$$R(\mathcal{B}, D) \ll \sum_{d < x^{1-\varepsilon}} \tau(d)h(d)\Big(\log\frac{x}{d}\Big)^{-A} < (\varepsilon\log x)^{-A}\sum_{d<x} \tau(d)h(d) \le (\varepsilon\log x)^{-A}\prod_{p<x} (1 + 2h(p)) \ll \varepsilon^{-A}(\log x)^{2-A}.$$

Therefore our model sequence $\mathcal{B}$ satisfies the hypothesis (9.16), so (9.32) is applicable, giving

$$S(\mathcal{B}\Lambda_k, x) = cV_k(x, y) + O(\varepsilon(\log x)^{k-1}).$$

On the other hand by (9.59) and (9.60) we have

$$\Big(1 - \frac{1}{\log x}\Big)^k S(\mathcal{B}\Lambda_k, x) \le S(\mathcal{B}\Lambda_k) \le S(\mathcal{B}\Lambda_k, x)$$

and by (9.18), since $P_k$ is a monic polynomial of degree $k$,

$$S(\mathcal{B}\Lambda_k) = \sum_{x/e < n \le x} h(n)\Lambda_k(n) = k(\log x)^{k-1} + O((\log x)^{k-2}).$$

Combining the above estimates we infer (9.33).

9.7. Some Applications

First we give a precise statement of the result just established.

Theorem 9.1. Let $\mathcal{A} = (a_n)$ be a sequence of non-negative numbers for $1 \le n \le x$. Suppose the density function $g(d)$ in the approximations (1.15) satisfies (9.17) with some constants $c > 0$, $b$, $A > 5$, and the error term $r_d(\mathcal{A})$ satisfies (9.16) with $D = x^{1-\varepsilon}$, where $0 < \varepsilon < 1$ and $x$ is sufficiently large in terms of $\varepsilon$. Suppose also that

$$\sum_{n < x^{1-\varepsilon}} a_n \le X(\log x)^{-2}. \tag{9.35}$$

Then for any $k \ge 2$ we have

$$\sum_{n \le x} a_n\Lambda_k(n) = kHX(\log x)^{k-1}\{1 + O(\varepsilon)\} \tag{9.36}$$

where $H$ is the constant given by (9.8). The implied constant depends on $g$ and $k$.

In the simplest case $a_n = 1$ for all $n \le x$, Theorem 9.1 with $k = 2$ (see (9.11)) yields

$$\sum_{p \le x} (\log p)^2 + \mathop{\sum\sum}_{pq \le x} (\log p)(\log q) \sim 2x\log x$$

which is essentially the formula which Erdős and Selberg used in their elementary proofs of the Prime Number Theorem. A spectacular example is the sequence $a_n = \Lambda(n - 2)$ considered in [B3]. If we assume the Elliott-Halberstam conjecture in order to have (9.16), then (9.36) yields

$$\sum_{n \le x} \Lambda(n - 2)\Lambda_k(n) \sim kHx(\log x)^{k-1}$$

for any $k \ge 2$, where $H$ is the usual twin primes constant

$$H = 2\prod_{p > 2} \big(1 - (p - 1)^{-2}\big).$$
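Two of the numerical statements above can be sanity-checked on a computer. The Python sketch below sums $\Lambda_2(n)$ over $n \le x$ using the identity $\Lambda_2 = L\Lambda + \Lambda * \Lambda$ (the case $k = 1$ of the recurrence (9.52)) and compares the result with $2x\log x$; it also evaluates a truncation of the product defining the twin primes constant. The function names and the cut-offs $x = 20000$ and $10^5$ are illustrative assumptions, not taken from the text.

```python
import math

def primes_up_to(x):
    """Sieve of Eratosthenes: list of primes p <= x."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(x ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return [p for p in range(2, x + 1) if sieve[p]]

def prime_power_logs(x):
    """Sorted list of (q, log p) over prime powers q = p^j <= x, i.e. (q, Lambda(q))."""
    out = []
    for p in primes_up_to(x):
        q = p
        while q <= x:
            out.append((q, math.log(p)))
            q *= p
    out.sort()
    return out

def sum_lambda2(x):
    """Sum of Lambda_2(n) for n <= x, via Lambda_2 = L*Lambda + Lambda * Lambda."""
    pp = prime_power_logs(x)
    total = sum(lam * math.log(q) for q, lam in pp)
    for q1, lam1 in pp:
        for q2, lam2 in pp:
            if q1 * q2 > x:      # pp is sorted, so no further q2 can qualify
                break
            total += lam1 * lam2
    return total

def twin_primes_constant(limit):
    """Truncation of H = 2 * prod_{p > 2} (1 - (p - 1)^(-2)) to primes p <= limit."""
    h = 2.0
    for p in primes_up_to(limit):
        if p > 2:
            h *= 1.0 - 1.0 / (p - 1) ** 2
    return h

x = 20000
ratio = sum_lambda2(x) / (2 * x * math.log(x))
print("sum of Lambda_2 over n <= x, divided by 2 x log x:", ratio)
print("H truncated at 10^5:", twin_primes_constant(10 ** 5))
```

The ratio approaches 1 only slowly, because the secondary term in the asymptotic is of order $x$, that is, smaller than the main term by a factor $\log x$ only.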

Even for the simpler case when the sequence $\mathcal{A} = (a_n)$ is an arithmetic progression one gets results which do not seem to follow by $L$-function techniques. For example we obtain (see [Fri])

$$\sum_{\substack{n \le x \\ n \equiv a\ (\mathrm{mod}\ q)}} \Lambda_k(n) \sim \frac{kx}{\varphi(q)}(\log x)^{k-1}$$

if $(a, q) = 1$, uniformly for $\log q = o(\log x)$ as $x \to \infty$. An interesting example is given in [FI], where the Bombieri sieve is applied to prove

$$\sum_{a^2 + b^2 \le x} \Lambda(a)\Lambda_k(a^2 + b^2) \sim kHx(\log x)^{k-1}$$

where

$$H = 2\prod_p \Big(1 - \frac{\chi(p)}{(p - 1)(p - \chi(p))}\Big)$$

and $\chi$ is the non-principal character to modulus 4. Actually this asymptotic is established also for $k = 1$ but, of course, not entirely within the sieve territory. Hence one infers

Corollary 9.2. There are infinitely many primes $p = a^2 + b^2$ such that $a$ is prime.

9.8. The Parity Problem

In the case $k = 1$ Bombieri's sieve fails for intrinsic reasons rooted in the parity problem. In this section we explain the parity problem in the style of Bombieri but without proofs. To this end Bombieri extended the asymptotic formula

$$\sum_{n \le x} a_n\Lambda_k(n) \sim kHX(\log x)^{k-1} \tag{9.37}$$

for a class of functions $G(n)$ which can be well approximated by $\Lambda_k(n)$ with $k \neq 1$. In fact, to cover a larger area he employs the functions $\Lambda^{(k)} = \Lambda_{k_1} * \cdots * \Lambda_{k_r}$ for vectors $(k) = (k_1, \dots, k_r) \neq (1, \dots, 1)$. By similar arguments as for scalars one proves

$$\sum_{n \le x} a_n\Lambda^{(k)}(n) \sim \frac{(k)!}{(|k| - 1)!}HX(\log x)^{|k|-1} \tag{9.38}$$

where $(k)! = k_1! \cdots k_r!$ and $|k| = k_1 + \cdots + k_r$, provided $|k| > r$. Then using the Weierstrass-Stone theorem one extends these formulas to

$$\sum_{n \le x,\ n \in \mathcal{P}_r} a_nG(n) \sim \delta_r(x)\Big(\int_{T_r} G_r\,d\mu_r\Big)HX(\log x)^{-1}. \tag{9.39}$$

Here $\mathcal{P}_r$ denotes the set of squarefree integers $n = p_1 \cdots p_r$ with $p_1 > \cdots > p_r$, and

$$G(n) = G_r\Big(\frac{\log p_1}{\log x}, \dots, \frac{\log p_r}{\log x}\Big)$$

where $G_r$ is a smooth, compactly supported function on

$$T_r = \{(u_1, \dots, u_r) : 1 > u_1 > \cdots > u_r > 0,\ u_1 + \cdots + u_r = 1\}$$

and $d\mu_r = (u_1 \cdots u_r)^{-1}\,du_1 \cdots du_{r-1}$ is the measure on $T_r$ (for $r = 1$ this is the point measure at $u = 1$). Since the vectors $(k) = (1, \dots, 1)$ are not allowed in the approximation we cannot fully describe $\delta_r(x)$. Bombieri realized that $\delta_r(x)$ is determined by the distribution function $\delta(x)$ on primes. Precisely, his result asserts that if one has the asymptotic

$$\sum_{p \le x} a_p \sim \delta(x)HX(\log x)^{-1} \tag{9.40}$$

then (9.39) holds true for any $r \ge 1$ with

$$\delta_r(x) = \begin{cases} \delta(x) & \text{if } r \text{ is odd}, \\ 2 - \delta(x) & \text{if } r \text{ is even}. \end{cases} \tag{9.41}$$

For many sequences $\mathcal{A} = (a_n)$ one expects the distribution function on primes to be $\delta(x) = 1$ (see (9.9)), in which case the distribution function on numbers having exactly $r$ prime factors is the same, $\delta_r(x) = 1$. However, in general every value $0 \le \delta \le 2$ is possible, as the following example shows:

$$a_n = 1 + (1 - \delta)\lambda(n)$$

where $\lambda(n)$ is the Liouville function (see (5.19) and (5.24)). This creates the parity problem. Adding (9.39) for $r \ge 1$ we get

$$\sum_{n \le x}^{\flat} a_nG^*(n) \sim (G^+ + (1 - \delta)G^-)HX(\log x)^{-1} \tag{9.42}$$

where $G^*(n) = G_r(n)$ if $\nu(n) = r$, and

$$G^+ = \sum_r \int_{T_r} G_r\,d\mu_r, \tag{9.43}$$

$$G^- = \sum_r (-1)^r\int_{T_r} G_r\,d\mu_r. \tag{9.44}$$

If $\delta$ is not known, the asymptotic formula (9.42) contains no better information than that in the following two bounds

$$G^+ - |G^-| \le G^+ + (1 - \delta)G^- \le G^+ + |G^-|. \tag{9.45}$$

These bounds are best possible given nothing about $\delta$. Assuming that the weights $G_r$ are non-negative we can infer that the lower bound is positive only if $G^+ > |G^-|$, and this means that at least two $G_r$, $G_s$ with $r \not\equiv s\ (\mathrm{mod}\ 2)$ do not vanish identically. Therefore, in order to produce a positive density of almost primes within the general sieve framework one has to take numbers having different parity for the number of prime divisors. For example we cannot produce numbers having exactly three prime factors, or two and six prime factors. However one can produce numbers having either 1995 or 1996 prime factors.
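The example $a_n = 1 + (1 - \delta)\lambda(n)$ behind the parity problem can be made concrete. The Python sketch below shows that this sequence takes the value $\delta$ on integers with an odd number of prime factors and $2 - \delta$ on those with an even number, in agreement with (9.41), while its sums over multiples of $d$ stay close to $x/d$, so that the sieve data look the same as for the constant sequence $a_n = 1$. The choice $\delta = 0.3$, the range $x = 20000$ and the helper names are illustrative assumptions.

```python
def big_omega(n):
    """Number of prime factors of n, counted with multiplicity."""
    count = 0
    p = 2
    while p * p <= n:
        while n % p == 0:
            n //= p
            count += 1
        p += 1
    if n > 1:
        count += 1
    return count

def liouville(n):
    """Liouville function lambda(n) = (-1)^Omega(n)."""
    return -1 if big_omega(n) % 2 else 1

def a(n, delta):
    """The sequence a_n = 1 + (1 - delta) * lambda(n)."""
    return 1.0 + (1.0 - delta) * liouville(n)

def congruence_sum(x, d, delta):
    """|A_d|: sum of a_n over n <= x divisible by d."""
    return sum(a(n, delta) for n in range(d, x + 1, d))

delta = 0.3
x = 20000

# Values on integers with 1, 2, 3 and 4 prime factors.
for n in (7, 15, 30, 210):
    print(n, big_omega(n), a(n, delta))

# The congruence sums are close to x/d, as for the sequence a_n = 1.
for d in (1, 2, 3, 5):
    print(d, congruence_sum(x, d, delta), x / d)
```

A sieve which only sees the sums $|\mathcal{A}_d|$ cannot distinguish different values of $\delta$ in this range, which is the content of the bounds (9.45).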

Eliminating $\delta$ in (9.42) we obtain the following upper and lower bounds

$$\sum_{n \le x}^{\flat} a_nG^*(n) \le (G^+ + |G^-| + \varepsilon)HX(\log x)^{-1} \tag{9.46}$$

$$\sum_{n \le x}^{\flat} a_nG^*(n) \ge (G^+ - |G^-| - \varepsilon)HX(\log x)^{-1} \tag{9.47}$$

Each of these alone is best possible. In particular for $G$ given by

$$G(u_1, \dots, u_r) = \begin{cases} 1 & \text{if } u_1 > \cdots > u_r > \frac{1}{s}, \\ 0 & \text{otherwise}, \end{cases}$$

the weighted sum on the left side coincides with the sifted sum $S(\mathcal{A}, z)$ with $z = x^{1/s}$. Therefore it is not a coincidence that in this case

$$F(s) = 2e^{\gamma}s(G^+ + |G^-|), \qquad f(s) = 2e^{\gamma}s(G^+ - |G^-|)$$

are the linear sieve functions defined by (5.1) and (5.2). For a proof see Section 5.1.

9.9. Asymptotic Sieve for Primes

To resolve the parity problem one has to appeal to certain properties of the sequence $\mathcal{A} = (a_n)$ not yet captured by the standard system of sieve axioms. In recent years several successful injections to the system have been made. We close these lectures with a glimpse of current developments. The following bound for special bilinear forms has been recently proposed,

$$\sum_n\ \sum_{\substack{M < m \le 2M \\ mn \le x}} \mu(mn)a_{mn} \ll X(\log x)^{-A}. \tag{9.48}$$

Suppose this holds for any $M$ with $x^{\frac{\alpha}{2} - \delta} < M < x^{\frac{1}{2} - \varepsilon}$ for some $0 < 2\delta < \alpha < 1$ and any $\varepsilon > 0$. Suppose also that

$$\sum_{d < x^{\alpha - \varepsilon}} |r_d(\mathcal{A})| \ll X(\log x)^{-A}. \tag{9.49}$$

In both hypotheses $A$ is an arbitrarily large constant.

Assuming (9.49) with $\frac{2}{3} < \alpha \le 1$ and (9.48) with the same $\alpha$, among other things of similar nature, it has been shown [FI] that the asymptotic formula (9.37) holds for $k = 1$. As an application of this enhanced sieve setting we treat the sequence $\mathcal{A} = (a_n)$ with $a_n$ being the number of representations $n = a^2 + b^4$. This is quite a lacunary sequence. Nevertheless in this case the hypothesis (9.49) is established with $\alpha = \frac{3}{4}$ and the bilinear form bound (9.48) with any $\alpha > \frac{1}{2}$, which more than suffices to prove the asymptotic formula

$$\sum_{a^2 + b^4 \le x} \Lambda(a^2 + b^4) \sim cx^{3/4} \tag{9.50}$$

where $c = 2\Gamma(\frac{1}{4})/3\sqrt{\pi}\,\Gamma(\frac{3}{4}) = 1.1128\ldots$ (see [FI2]). Hence one infers

Corollary 9.3. There are infinitely many primes of type $p = a^2 + b^4$.

There are other options for the introduction of bilinear form estimates to produce primes with the sieve, see for example [DFI], but it is not yet clear what the general strategy is. Recent research promises a lot. The future of sieve methods looks bright.
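The numerical value of the constant in (9.50) can be confirmed with the gamma function from the Python standard library. This is only an arithmetic check of the closed form $c = 2\Gamma(\frac14)/(3\sqrt{\pi}\,\Gamma(\frac34))$; the variable name `c` is ours.

```python
import math

# c = 2 * Gamma(1/4) / (3 * sqrt(pi) * Gamma(3/4)), the constant in (9.50)
c = 2 * math.gamma(0.25) / (3 * math.sqrt(math.pi) * math.gamma(0.75))
print(c)
```

The printed value agrees with the decimal expansion $1.1128\ldots$ quoted in the text.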

APPENDIX FOR CHAPTER 9

The Functions $\Lambda_k$

Recall that $\Lambda(n)$ is defined by (1.77). Here we introduce some generalizations and suitable variations of this important arithmetic function. For an integer $k \ge 0$ we define

$$\Lambda_k(n) = \sum_{d \mid n} \mu(d)\Big(\log\frac{n}{d}\Big)^k, \tag{9.51}$$

i.e., $\Lambda_k = \mu * L^k$, or equivalently $L^k = 1 * \Lambda_k$ by Möbius inversion. This satisfies the recurrence formula

$$\Lambda_{k+1} = L\Lambda_k + \Lambda * \Lambda_k. \tag{9.52}$$

Hence

$$0 \le \Lambda_k \le L^k \quad \text{if } k > 0, \tag{9.53}$$

$$\Lambda_j \le L^{j-k}\Lambda_k \quad \text{if } 0 \le j \le k. \tag{9.54}$$

Moreover it follows from (9.52) by induction that

$$\Lambda_k(n) = 0 \quad \text{if } \nu(n) > k, \tag{9.55}$$

that is, $\Lambda_k(n)$ is supported on numbers having at most $k$ distinct prime factors. We have

$$\Lambda_k(mn) = \sum_{0 \le j \le k} \binom{k}{j}\Lambda_j(m)\Lambda_{k-j}(n) \quad \text{if } (m, n) = 1. \tag{9.56}$$
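The identities (9.52) and (9.56) lend themselves to a quick numerical check. The Python sketch below evaluates both sides for small arguments and reports the largest discrepancy, which should be of the size of rounding errors. The helper names, the ranges of $n$, $m$ and $k$, and the use of floating-point arithmetic are illustrative choices.

```python
import math
from math import comb, gcd

def mobius(n):
    """Moebius function mu(n), computed by trial division."""
    mu = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            mu = -mu
        p += 1
    if n > 1:
        mu = -mu
    return mu

def lambda_k(n, k):
    """Lambda_k(n) from (9.51); for k = 0 this is 1 if n = 1 and 0 otherwise."""
    return sum(mobius(d) * math.log(n / d) ** k
               for d in range(1, n + 1) if n % d == 0)

def recurrence_gap(n, k):
    """Absolute difference between the two sides of (9.52) at n."""
    conv = sum(lambda_k(a, 1) * lambda_k(n // a, k)
               for a in range(1, n + 1) if n % a == 0)
    return abs(lambda_k(n, k + 1) - (math.log(n) * lambda_k(n, k) + conv))

def product_gap(m, n, k):
    """Absolute difference between the two sides of (9.56) for coprime m, n."""
    rhs = sum(comb(k, j) * lambda_k(m, j) * lambda_k(n, k - j)
              for j in range(k + 1))
    return abs(lambda_k(m * n, k) - rhs)

worst_rec = max(recurrence_gap(n, k) for n in range(1, 61) for k in range(4))
worst_prod = max(product_gap(m, n, k)
                 for m in range(1, 13) for n in range(1, 13)
                 if gcd(m, n) == 1 for k in range(4))
print(worst_rec, worst_prod)
```

Here `lambda_k(a, 1)` plays the role of the von Mangoldt function $\Lambda(a)$ in the convolution $\Lambda * \Lambda_k$.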

For an integer $k \ge 0$ and a real number $x > 0$ we define

$$\Lambda_k(n, x) = \sum_{d \mid n} \mu(d)\Big(\log\frac{x}{d}\Big)^k. \tag{9.57}$$

This can be expressed in terms of $\Lambda_j(n)$ as follows

$$\Lambda_k(n, x) = \sum_{0 \le j \le k} \binom{k}{j}\Lambda_j(n)\Big(\log\frac{x}{n}\Big)^{k-j}. \tag{9.58}$$

Hence it is clear that $\Lambda_k(n, x)$ carries numbers having at most $k$ distinct prime divisors. Taking only $j = k$ we get a lower bound

$$\Lambda_k(n, x) \ge \Lambda_k(n) \quad \text{if } n \le x. \tag{9.59}$$

Inserting (9.54) we get an upper bound

$$\Lambda_k(n, x)\Big(\frac{\log n}{\log x}\Big)^k \le \Lambda_k(n) \quad \text{if } n \le x. \tag{9.60}$$

These inequalities imply

$$0 \le \Lambda_k(n, x) \le (\log x)^k \quad \text{if } n \le x. \tag{9.61}$$
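The expansion (9.58) and the inequalities (9.59) to (9.61) can be tested numerically in the same spirit. In the Python sketch below the function `violations` returns, for a given $x$, the largest amount by which each of the four statements fails over $1 \le n \le x$ and $1 \le k \le 3$; all four numbers should be zero up to rounding. The helper names and the sample value $x = 60.5$ are illustrative assumptions.

```python
import math
from math import comb

def mobius(n):
    """Moebius function mu(n), computed by trial division."""
    mu = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            mu = -mu
        p += 1
    if n > 1:
        mu = -mu
    return mu

def lambda_k(n, k):
    """Lambda_k(n) as in (9.51)."""
    return sum(mobius(d) * math.log(n / d) ** k
               for d in range(1, n + 1) if n % d == 0)

def lambda_k_x(n, k, x):
    """Lambda_k(n, x) as in (9.57)."""
    return sum(mobius(d) * math.log(x / d) ** k
               for d in range(1, n + 1) if n % d == 0)

def violations(x, kmax=3):
    """Largest failures of (9.58), (9.59), (9.60), (9.61) for n <= x, 1 <= k <= kmax."""
    v58 = v59 = v60 = v61 = 0.0
    for n in range(1, int(x) + 1):
        for k in range(1, kmax + 1):
            lhs = lambda_k_x(n, k, x)
            expansion = sum(comb(k, j) * lambda_k(n, j) * math.log(x / n) ** (k - j)
                            for j in range(k + 1))
            v58 = max(v58, abs(lhs - expansion))
            v59 = max(v59, lambda_k(n, k) - lhs)
            v60 = max(v60, lhs * (math.log(n) / math.log(x)) ** k - lambda_k(n, k))
            v61 = max(v61, -lhs, lhs - math.log(x) ** k)
    return v58, v59, v60, v61

print(violations(60.5))
```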

If $(m, n) = 1$ then any divisor of $mn$ can be written uniquely as a product $d = ab$ with $a \mid m$ and $b \mid n$. Writing $\log\frac{x}{d} = \log\frac{x}{an} + \log\frac{n}{b}$ we show that

$$\Lambda_k(mn, x) = \sum_{0 \le j \le k} \binom{k}{j}\Lambda_j\Big(m, \frac{x}{n}\Big)\Lambda_{k-j}(n) \quad \text{if } (m, n) = 1. \tag{9.62}$$

If $n > 1$ then $\Lambda_0(n) = 0$, so (9.62) ranges over $0 \le j < k$. Applying the inequalities $\binom{k}{j} \le k\binom{k-1}{j}$ and $\Lambda_{k-j}(n) \le (\log n)^{k-j}$ we derive from (9.62) that

$$\Lambda_k(mn, x) \le k\Lambda_{k-1}(m, x)\log n \tag{9.63}$$

provided $(m, n) = 1$, $mn \le x$, $n > 1$ and $k \ge 1$. By iterated application of this inequality we get

$$\Lambda_k(p_1^{\alpha_1} \cdots p_r^{\alpha_r}, x) \le r!\binom{k}{r}(\log x)^{k-r}\log p_1 \cdots \log p_r.$$

Hence for a multiplicative function $g \ge 0$ we derive

$$\sum_{n \le x} g(n)\Lambda_k(n, x) \le \Big(\log x + \sum_{\ell \le x} g(\ell)\Lambda(\ell)\Big)^k. \tag{9.64}$$

Now suppose $g$ is a totally multiplicative function such that

$$P_1(x) = \sum_{n \le x} g(n)\Lambda(n) = \log x + O(1). \tag{9.65}$$

This implies

$$P_k(x) = \sum_{n \le x} g(n)\Lambda_k(n) = (\log x)^k + O((\log x)^{k-1}). \tag{9.66}$$

Indeed by the recurrence formula (9.52) we get

$$P_{k+1}(x) = \sum_{n \le x} g(n)\Lambda_k(n)\Big\{\log n + P_1\Big(\frac{x}{n}\Big)\Big\} = P_k(x)\{\log x + O(1)\},$$

hence (9.66) follows by induction. Assuming

$$\sum_p \sum_{\alpha \ge 2} g(p^\alpha)\log p < \infty \tag{9.67}$$

the asymptotic formula (9.66) remains true for multiplicative $g$ which is not necessarily totally multiplicative. In practice (9.65) can be established by the elementary method of Tchebyshev. The simplest case $g(n) = n^{-1}$ was treated in A1.3. Now suppose $g(n)$ satisfies

$$\sum_{n \le x} g(n) = c\log x + b + O((\log x)^{-2}) \tag{9.68}$$

for all $x > 2$, where $c > 0$ and $b$ are constants. Note that this implies $g(n) \ll (\log n)^{-2}$. Furthermore, by applying an upper bound sieve one can derive from (9.68) the following crude bound

$$\sum_{y < p \le 2y} g(p)\log p \ll 1. \tag{9.69}$$

Now using (9.68) and (9.69) we proceed to prove (9.65). On the one hand we deduce from (9.68) by partial summation two asymptotic formulas

$$\sum_{n \le x} g(n)n = cx + O(x(\log x)^{-2}), \tag{9.70}$$

$$\sum_{n \le x} g(n)n\log n = cx(\log x - 1) + O(x(\log x)^{-1}). \tag{9.71}$$

On the other hand, employing the first formula we deduce that the second sum is (assume that $g$ is totally multiplicative)

$$\mathop{\sum\sum}_{mn \le x} g(mn)mn\Lambda(n) = \sum_{n \le x} g(n)n\Lambda(n)\Big\{c\frac{x}{n} + O\Big(\frac{x}{n}\Big(\log\frac{x}{n}\Big)^{-2}\Big)\Big\} = cxP_1(x) + O(x)$$

where the error term is estimated by means of (9.69). Comparing this with (9.71) we get (9.65). One can refine this argument so that the requirement that $g$ is totally multiplicative can be dropped (use the bound $g(n) \ll (\log n)^{-2}$).
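For the simplest density function $g(n) = n^{-1}$ mentioned above, the hypothesis (9.65) is the classical estimate $\sum_{n \le x}\Lambda(n)/n = \log x + O(1)$, and it can be observed numerically. The Python sketch below computes $P_1(x) - \log x$ for a few values of $x$; the function names and the sample points are illustrative choices.

```python
import math

def primes_up_to(x):
    """Sieve of Eratosthenes: list of primes p <= x."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(x ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return [p for p in range(2, x + 1) if sieve[p]]

def p1(x):
    """P_1(x) = sum over n <= x of Lambda(n)/n, i.e. (9.65) with g(n) = 1/n."""
    total = 0.0
    for p in primes_up_to(x):
        q = p
        while q <= x:           # prime powers q = p^j contribute log(p)/q
            total += math.log(p) / q
            q *= p
    return total

for x in (10 ** 3, 10 ** 4, 10 ** 5):
    print(x, p1(x) - math.log(x))
```

The printed differences stay bounded as $x$ grows, which is what (9.65) asserts in this case.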

REFERENCES

[AD] N. C. Ankeny and H. Onishi, The general sieve, Acta Arith. 10 (1964), 31–62.

[BV] M. B. Barban and P. P. Vehov, On certain extremal problem, Trudy Mosk. Mat. Ob. 18 (1968), 83–90. (in Russian)

[B] D. Bradley, A sieve auxiliary function, in Analytic Number Theory, Proceedings in Honor of Heini Halberstam, Birkhäuser, Boston, 1996, pp. 173–210.

[B1] E. Bombieri, Sulle formule di A. Selberg generalizzate per classi di funzioni aritmetiche e le applicazioni al problema del resto nel “Primzahlsatz”, Riv. Mat. Univ. Parma 3 (2) (1962), 393–440.

[B2] E. Bombieri, On the large sieve, Mathematika 12 (1965), 201–225.

[B3] E. Bombieri, On twin almost primes, Acta Arith. 28 (1975), 177–193 (Corrigendum 457–461).

[B4] E. Bombieri, The asymptotic sieve, Mem. Acad. Naz. dei XL 1/2 (1976), 243–269.

[B5] E. Bombieri, Le Grand Crible dans la Théorie Analytique des Nombres, Soc. Math. France, Astérisque No. 18, 1974; second edition 1987.

[BD] E. Bombieri and H. Davenport, On the large sieve method, in Abhandlungen aus Zahlentheorie und Analysis, Plenum Press, New York 1969, 9–22.

[BFI] E. Bombieri, J. Friedlander and H. Iwaniec, Primes in arithmetic progressions to large moduli, Acta Math. 156 (1986), 203–251.

[Br1] V. Brun, Über das Goldbachsche Gesetz und die Anzahl der Primzahlpaare, Archiv for Math. og Naturvid. B34 (1915), no. 8, 19 pages.

[Br2] V. Brun, Le crible d’Eratosthène et le théorème de Goldbach, C. R. Acad. Sci. Paris 168 (1919), 544–546.

[Br3] V. Brun, Le crible d’Eratosthène et le théorème de Goldbach, Skr. Norske Vid.-Akad. Kristiania I (1920), no. 3, 36 pages.

[Buc] A. A. Buchstab, New improvements in the method of the sieve of Eratosthenes (in Russian), Mat. Sbornik 4(46) (1938), 375–387.

[C] J.-R. Chen, On the representation of a larger even integer as the sum of a prime and the product of at most two primes, Sci. Sinica 16 (1973), 157–176 (see also the announcement in Kexue Tongbao 17 (1966), 385–386).

[Dav] H. Davenport, Multiplicative Number Theory, Markham, Chicago 1967.

[DH] H. Davenport and H. Halberstam, The values of a trigonometric polynomial at well spaced points, Mathematika 14 (1967), 14–20.

[Ded] R. Dedekind, Theorie der höheren Congruenzen, Berlin (1857), Jour. für Reine und ang. Math. 54, 19–26.

[DHR] H. G. Diamond, H. Halberstam and H.-E. Richert, Combinatorial sieve of dimension exceeding one, J. Number Theory 28 (1988), 306–346.

[DFI] W. Duke, J. Friedlander and H. Iwaniec, Equidistribution of roots of a quadratic congruence to prime moduli, Annals of Math. 141 (1995), 423–441.

[EH] P. D. T. A. Elliott and H. Halberstam, A conjecture in prime number theory, Rome 1968/69, Symposia Mathematica 4, 59–72.

[E1] L. Euler, De tabula numerorum primorum, in Novi Commentarii Acad. Petropol., vol. 19, 1775, 132–133.

[E2] L. Euler, Speculationes circa quasdam insignes proprietates numerorum..., Acta Acad. Petropol. (1784), 18–30.

[F] E. Fouvry, Autour du théorème de Bombieri-Vinogradov II, Annales Scient. de l’E.N.S. 20 (1987), 617–640.

[FI] E. Fouvry and H. Iwaniec, Gaussian primes, preprint 1995.

[Fri] J. Friedlander, Selberg’s formula and Siegel’s zero, in Recent Progress in Analytic Number Theory I, Academic Press, London 1981, 15–23.

[FG] J. Friedlander and A. Granville, Limitation to the equi-distribution of primes I, Ann. Math. 129 (1989), 363–382.

[FI1] J. Friedlander and H. Iwaniec, Asymptotic sieve for primes, preprint 1996.

[FI2] J. Friedlander and H. Iwaniec, The polynomial $x^2 + y^4$ captures its primes, preprint 1996.

[G] P. X. Gallagher, The large sieve, Mathematika 14 (1967), 14–20.

[G1] G. Greaves, On the representation of a number in the form $x^2 + y^2 + p^2 + q^2$ where $p$, $q$ are odd primes, Acta Arith. 29 (1976), 257–274.

[G2] G. Greaves, A weighted sieve of Brun’s type, Acta Arith. 40 (1982), 297–332.

[HR] H. Halberstam and H.-E. Richert, Sieve Methods, Academic Press, London 1974.

[HHR] H. Halberstam, D. R. Heath-Brown and H.-E. Richert, Almost-primes in short intervals, in Recent Progress in Analytic Number Theory I, Academic Press, London 1981, 69–101.

[HL] G. H. Hardy and J. E. Littlewood, Some problems of “Partitio Numerorum”. III: On the expression of a number as a sum of primes, Acta Math. 44 (1923), 1–70.

[H] C. Hooley, On the representation of a number as the sum of two squares and a prime, Acta Math. 97 (1957), 189–210.

[I] K.-H. Indlekofer, Scharfe untere Abschätzung für die Anzahlfunktion der B-Zwillinge, Acta Arith. 26 (1974), 207–212.

[I1] H. Iwaniec, On the error term in the linear sieve, Acta Arith. 19 (1971), 1–30.

[I2] H. Iwaniec, Primes of type $\varphi(x, y) + A$ where $\varphi$ is a quadratic form, Acta Arith. 21 (1972), 203–234.

[I3] H. Iwaniec, The half-dimensional sieve, Acta Arith. 29 (1976), 69–95.

[I4] H. Iwaniec, Rosser’s sieve, Acta Arith. 36 (1980), 171–202.

[I5] H. Iwaniec, A new form of the error term in the linear sieve, Acta Arith. 37 (1980), 307–320.

[I6] H. Iwaniec, Almost primes represented by quadratic polynomials, Invent. Math. 47 (1978), 171–188.

[IP] H. Iwaniec and J. Pomykala, Sums and differences of quartic norms, Mathematika 40 (1993), 233–245.

[JR] W. B. Jurkat and H.-E. Richert, An improvement of Selberg’s sieve method. I, Acta Arith. 11 (1965), 217–240.

[K] I. Kobayashi, Remarks on the large sieve method, Proc. United States-Japan Seminar on Number Theory, Tokyo 1971.

[Kuh] P. Kuhn, Zur Viggo Brunschen Siebmethode I, Norske Vid. Selsk. Forh., Trondheim 14 (1941), 145–148.

[Lab] M. Laborde, Buchstab’s sifting weights, Mathematika 26 (1979), 250–257.

[Lan] E. Landau, Einführung in die elementare und analytische Theorie der algebraischen Zahlen und der Ideale, 2. Aufl., Leipzig (1927).

[Leb] V. A. Lebesgue, Tables diverses pour la décomposition, Mém. Soc. Sci. Ph. et Mat., Bordeaux (1864), 1–37.

[Leg] A.-M. Legendre, Théorie des Nombres, 2e éd., Paris 1808.

[L1] Yu. V. Linnik, The large sieve, Dokl. Akad. Nauk SSSR 30 (1941), 292–294. (in Russian)

[L2] Yu. V. Linnik, Asymptotic formula in an additive problem of Hardy-Littlewood, Izv. Akad. Nauk SSSR, Ser. Math. 24 (1960), 629–706. (in Russian)

[LR] J. H. van Lint and H.-E. Richert, On primes in arithmetic progressions, Acta Arith. 11 (1965), 209–216.

[Mei] E. Meissel, Ueber die Bestimmung..., Math. Ann. (1870), 636–692.

[M1] J. Merlin, Sur quelques théorèmes d’Arithmétique et un énoncé qui les contient, C. R. Acad. Sci. Paris 153 (1911), 516–518.

[M2] J. Merlin, Un travail de Jean Merlin sur les nombres premiers, Bull. Sci. Math. 2 (1915), 121–136.

[Mr] F. Mertens, Ein Beitrag zur analytischen Zahlentheorie, J. reine angew. Math. 78 (1874), 46–92.

[Möb] A. F. Möbius, Über eine besondere Art von Umkehrung der Reihen, Jour. für Math. 9, Berlin (1832), 105–123.

[Mon] H. L. Montgomery, A note on the large sieve, J. London Math. Soc. 43 (1968), 93–98.

[MV1] H. L. Montgomery and R. C. Vaughan, The large sieve, Mathematika 20 (1973), 119–134.

[MV2] H. L. Montgomery and R. C. Vaughan, Hilbert’s inequality, J. London Math. Soc. (2) 8 (1974), 73–82.

[M] Y. Motohashi, Sieve Methods and Prime Number Theory, Tata Institute of Fundamental Research, Springer, Berlin, 1983.

[Mur] R. Murty, Artin’s conjecture for primitive roots, The Math. Intell. 10 (4) (1988), 59–67.

[Pis] L. Pisano, Il Liber Abbaci, Roma 1202.

[Pol] A. de Polignac, Notice historique sur la crible d’Eratosthenes, Nouv. Ann. Math. 12, Paris 1853, 429–432.

[Rén] A. Rényi, On the representation of an even number as the sum of a single prime and a single almost-prime number, Izv. Akad. Nauk SSSR Ser. Mat. 12 (1948), 57–78. (in Russian)

[R1] H.-E. Richert, Selberg’s sieve with weights, Mathematika 16 (1969), 1–22.

[R2] H.-E. Richert, Sieve Methods, Tata Institute of Fundamental Research, Bombay 1976.

[R] B. Riemann, Über die Anzahl der Primzahlen unter einer gegebenen Grösse, Monatsberichte der Berliner Akademie (1859), 671–680.

[Rog] F. Rogel, Zur Bestimmung der Anzahlen..., Math. Ann. 36 (1890), 304–316.

[SW] A. Schinzel and W. Sierpinski, Sur certaines hypothèses concernant les nombres premiers, Acta Arith. 4 (1958), 85–208.

[S1] A. Selberg, The general sieve method and its place in prime number theory, Proc. ICM, vol. 1, Cambridge, MA, 1950, 286–292.

[S2] A. Selberg, Sieve methods, Proc. Sympos. Pure Math. vol. XX, AMS, Providence 1971, 311–351.

[S3] A. Selberg, Lectures on sieves, Collected Papers Vol. II, Springer, Berlin 1991, 66–247.

[Ser] J-P. Serre, Spécialisation des éléments de $\mathrm{Br}_2(\mathbb{Q}(T_1, \dots, T_n))$, C. R. Acad. Sci. Paris 311 (1990), 397–402.

[Syl] J. J. Sylvester, Note sur le théorème de Legendre cité dans une Note insérée dans les Comptes Rendus, Comptes Rendus 96, Paris (1883), 463–468.

[T1] V. A. Tartakovski, Sur quelques sommes du type de Viggo Brun, Dokl. Akad. Nauk SSSR 23 (1939), 121–125.

[T2] V. A. Tartakovski, La méthode du crible approximatif “électif”, Dokl. Akad. Nauk SSSR 23 (1939), 126–129.

[T] P. L. Tchebyshev, Sur la fonction qui détermine la totalité des nombres premiers inférieurs à une limite donnée, J. Math. Pures Appl. (1) 17 (1852), 366–390.

[V] A. I. Vinogradov, The density hypothesis for Dirichlet L-series (in Russian), Izv. Akad. Nauk SSSR Ser. Mat. 29 (1965), 903–934.

[W] F. Wheeler, Two differential-difference equations, Trans. Amer. Math. Soc. 318 (1990), 491–523.

[W1] E. Wirsing, Das asymptotische Verhalten von Summen über multiplikative Funktionen, Math. Ann. 143 (1961), 75–102.

[W2] E. Wirsing, Elementare Beweise des Primzahlsatzes mit Restglied I, J. Reine Angew. Math. 211 (1962), 205–214.